# Customer Retention Strategy for Model Fitness Gyms

This project analyzes customer retention and churn at Model Fitness branches across the USA. The goal is to examine churned customers in depth, identify key trends, and understand the main reasons members cancel.

The findings will inform a customer retention strategy aimed at strengthening member loyalty and reversing the rising churn rate, which has become a significant business concern.

In [1]:
# General libraries
import pandas as pd
import numpy as np
import datetime as dt
import scipy.stats as stats

# Visualization libraries
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
from scipy.cluster.hierarchy import linkage, dendrogram

# Sklearn libraries for algorithms, metrics, and clustering
from sklearn.cluster import KMeans
from sklearn.metrics import (
    mean_absolute_error, mean_squared_error, r2_score,
    accuracy_score, precision_score, recall_score, f1_score
)
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso, Ridge, LogisticRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import (
    RandomForestRegressor, GradientBoostingRegressor, RandomForestClassifier
)

# Display settings
pd.set_option("display.max_columns", None)

# Chart styling
sns.set(style='whitegrid')
facecolor = '#eaeaea'
font_color = '#525252'
csfont = {'fontname': 'Liberation Serif'}
hfont = {'fontname': 'Liberation Sans'}  # matplotlib matches on the font family name

## Read Data

In [2]:
# Load and process the gym data
gym_data = pd.read_csv("gym_churn_us.csv")
gym_data.columns = gym_data.columns.str.lower()

# Display dataset shape and detailed information
print(f"Dataset contains {gym_data.shape[0]} rows and {gym_data.shape[1]} columns.")
gym_data.info()

Dataset contains 4000 rows and 14 columns.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4000 entries, 0 to 3999
Data columns (total 14 columns):
 #   Column                             Non-Null Count  Dtype  
---  ------                             --------------  -----  
 0   gender                             4000 non-null   int64  
 1   near_location                      4000 non-null   int64  
 2   partner                            4000 non-null   int64  
 3   promo_friends                      4000 non-null   int64  
 4   phone                              4000 non-null   int64  
 5   contract_period                    4000 non-null   int64  
 6   group_visits                       4000 non-null   int64  
 7   age                                4000 non-null   int64  
 8   avg_additional_charges_total       4000 non-null   float64
 9   month_to_end_contract              4000 non-null   float64
 10  lifetime                           4000 non-null   int64  
 11  avg_class_fre

In [3]:
gym_data.head()

|   | gender | near_location | partner | promo_friends | phone | contract_period | group_visits | age | avg_additional_charges_total | month_to_end_contract | lifetime | avg_class_frequency_total | avg_class_frequency_current_month | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 1 | 0 | 6 | 1 | 29 | 14.22747 | 5.0 | 3 | 0.020398 | 0.0 | 0 |
| 1 | 0 | 1 | 0 | 0 | 1 | 12 | 1 | 31 | 113.202938 | 12.0 | 7 | 1.922936 | 1.910244 | 0 |
| 2 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 28 | 129.448479 | 1.0 | 2 | 1.859098 | 1.736502 | 0 |
| 3 | 0 | 1 | 1 | 1 | 1 | 12 | 1 | 33 | 62.669863 | 12.0 | 2 | 3.205633 | 3.357215 | 0 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 26 | 198.362265 | 1.0 | 3 | 1.113884 | 1.120078 | 0 |


In [4]:
gym_data.isnull().sum()

gender                               0
near_location                        0
partner                              0
promo_friends                        0
phone                                0
contract_period                      0
group_visits                         0
age                                  0
avg_additional_charges_total         0
month_to_end_contract                0
lifetime                             0
avg_class_frequency_total            0
avg_class_frequency_current_month    0
churn                                0
dtype: int64

The dataset has no missing values in any of its 14 columns, and every column is already stored in numeric form. This means the data can be passed to predictive models later in the project without imputation or categorical encoding.
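Both observations can also be checked in code rather than by reading the output. The sketch below is only an illustration: `summarize_quality` is a hypothetical helper that is not part of the original notebook, and the `toy` frame is made-up data that mimics a few columns of the gym dataset so the example runs without `gym_churn_us.csv`.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def summarize_quality(df: pd.DataFrame) -> dict:
    """Collect basic data-quality facts: missing cells and non-numeric columns.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    missing_per_column = df.isnull().sum()
    non_numeric = [col for col in df.columns if not is_numeric_dtype(df[col])]
    return {
        # Total number of missing cells across the whole frame
        "total_missing": int(missing_per_column.sum()),
        # Names of columns that contain at least one missing value
        "columns_with_missing": missing_per_column[missing_per_column > 0].index.tolist(),
        # Names of columns that would need encoding before modeling
        "non_numeric_columns": non_numeric,
    }


# Made-up rows that imitate a few columns of the gym dataset (assumption)
toy = pd.DataFrame({
    "gender": [1, 0, 0],
    "age": [29, 31, 28],
    "avg_additional_charges_total": [14.2, 113.2, 129.4],
    "churn": [0, 0, 0],
})

report = summarize_quality(toy)
print(report)
```

In the notebook itself, the same helper could be called as `summarize_quality(gym_data)`. An empty `columns_with_missing` list together with an empty `non_numeric_columns` list would confirm the conclusion drawn above.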