## Treadmill Customer Analysis

### Intro:

A mock dataset that imitates a survey completed by customers when purchasing a treadmill.

#### Features:

Product - Model type

Age - Customer age\
Income - Customer salary\
Education - Number of years in education\
Gender - Customer gender\
MaritalStatus - Marital status

Fitness - Self-assessed fitness level\
Usage - Number of times per week they expect to use the treadmill\
Miles - Number of miles they expect to run per week


#### Proposed analysis:
Find relationships between the model type and the other features in order to segment the market, so that advertising can be targeted at each segment.


### 0) Setup

In [5]:
import pandas as pd
import numpy as np
import scipy.stats as ss
from sklearn.preprocessing import MinMaxScaler

import seaborn as sns
import matplotlib.pyplot as plt
import plotly.graph_objects as go

In [6]:
df = pd.read_csv('CardioGoodFitness.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'CardioGoodFitness.csv'

In [None]:
df.columns

### 1) Basic Analysis

In [None]:
df.Product.unique()

In [None]:
model_tots = df.groupby('Product')['Product'].count()
# model_tots

In [None]:
plt.pie(model_tots, autopct='%.1f%%');
plt.legend(title='Model Type', labels=['TM195','TM498','TM798'], loc="center left",bbox_to_anchor=(1, 0, 0.5, 1))
plt.tight_layout()
plt.title('Model Market Share');

#### Remarks:

* TM195 is the most popular model; TM798 is the least popular

**Conclusion**

* Focus advertising on TM195
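
The shares drawn in the pie chart can also be read as a plain table, which makes the ranking explicit. A minimal sketch, assuming a small made-up `Product` column in place of the real survey data (the values below are illustrative, not results):

```python
import pandas as pd

# Hypothetical stand-in for df['Product']; the real counts come from the CSV.
products = pd.Series(
    ["TM195", "TM195", "TM195", "TM498", "TM498", "TM798"], name="Product"
)

# Fraction of purchases per model, sorted by model name.
market_share = products.value_counts(normalize=True).sort_index()
print(market_share.round(3))
```

On the real data, `df['Product'].value_counts(normalize=True)` would give the same percentages as the pie chart labels.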

### 2) Analysing Numerics

In [None]:
tm195 = df[df['Product'] == 'TM195']
tm498 = df[df['Product'] == 'TM498']
tm798 = df[df['Product'] == 'TM798']

In [None]:
# display(tm195.describe(), tm498.describe(), tm798.describe())

In [None]:
fig, axes = plt.subplots(2,3, figsize=(12,8))

sns.boxplot(ax = axes[0,0], data = df, x = 'Product', y = 'Age', hue = 'Product')
sns.boxplot(ax = axes[0,1], data = df, x = 'Product', y = 'Income', hue = 'Product')
sns.boxplot(ax = axes[0,2], data = df, x = 'Product', y = 'Education', hue = 'Product')
sns.boxplot(ax = axes[1,0], data = df, x = 'Product', y = 'Usage', hue = 'Product')
sns.boxplot(ax = axes[1,1], data = df, x = 'Product', y = 'Miles', hue = 'Product')
sns.violinplot(ax = axes[1,2], data = df, x = 'Product', y = 'Fitness', hue = 'Product')
plt.suptitle('Distribution of Numerics with Respect to Model Type (Product)')
plt.tight_layout()

#### Remarks:

* Most customers are aged between their early twenties and mid-thirties
* TM798 has a distinct user base: buyers who are wealthier, more educated and who expect to use their treadmill more
* TM195 and TM498 have similar users

**Conclusions**

* Gear advertising towards 20-35 year olds
* The customer base is segmented, so advertising can be directed accordingly
* TM195 and TM498 could possibly be combined into one model, since they share a customer base
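
The claim that two models share a customer base rests on reading boxplots by eye. One way to back it up is a two-sample test on each feature. The sketch below uses synthetic incomes (not survey values) and the Mann-Whitney U test from SciPy; the group names and parameters are assumptions chosen only to show the mechanics:

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(0)

# Synthetic incomes, NOT survey values: two groups drawn from the same
# distribution and one drawn from a clearly higher one.
income_a = rng.normal(45_000, 8_000, size=80)
income_b = rng.normal(45_000, 8_000, size=60)
income_c = rng.normal(75_000, 8_000, size=40)

# Two-sided Mann-Whitney U test: a small p-value suggests the two groups
# do not share the same income distribution.
_, p_similar = ss.mannwhitneyu(income_a, income_b, alternative="two-sided")
_, p_different = ss.mannwhitneyu(income_a, income_c, alternative="two-sided")
print(f"a vs b: p = {p_similar:.3f}")
print(f"a vs c: p = {p_different:.3g}")
```

With the real data, `tm195['Income']` and `tm498['Income']` would take the place of the synthetic arrays. A large p-value does not prove the groups are identical; it only fails to show a difference.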

### 3) Analysing Binaries

In [None]:
gender_counts = df.groupby(['Product','Gender'])['Age'].count().unstack()
marital_counts = df.groupby(['Product','MaritalStatus'])['Age'].count().unstack()

In [None]:
fig, axes = plt.subplots(1,2, figsize = (15,7))
gender_counts.plot(ax = axes[0], kind='bar', stacked=False, color=['red','blue'], ylabel = 'Number of Customers', xlabel='Model type', title='Gender')
axes[0].legend(['Female','Male'],
          loc="upper right",)
marital_counts.plot(ax=axes[1], kind='bar', stacked=False, color = ['green','orange'], ylabel='Number of Customers', xlabel='Model type',title='Marital Status')
axes[1].legend(['Partnered','Single'],
          loc="upper right",)
fig.suptitle('Number of customers against model type with respect to binaries')

[Office for National Statistics](https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/populationestimatesbymaritalstatusandlivingarrangements/2019)\
*In 2019, just over half of the population (50.4%) were in a legally recognised partnership (50.2% were married with a further 0.2% in a civil partnership). An estimated 35.0% of the population were single (never married or in a civil partnership), with divorced/dissolved civil partnership and widowed/surviving civil partner accounting for 8.2% and 6.5% of the population respectively.*

In [None]:
partner_to_non = 0.504  # share of the population in a legally recognised partnership (ONS, 2019)

In [None]:
df['MaritalStatus'].value_counts()

In [None]:
marital_counts['ratio'] = marital_counts['Partnered']/(marital_counts['Single'] + marital_counts['Partnered'])
marital_counts['above_stat'] = marital_counts['ratio']>partner_to_non
marital_counts

#### Remarks:

* Men are far more likely than women to buy the TM798
* Partnered people are more likely to buy treadmills across all models

**Conclusions**

* Gear advertising for the TM798 towards men
* All advertising should be somewhat geared towards people in relationships.
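
Comparing the partnered share with the ONS figure using `>` alone does not say whether the gap could be due to chance. A hedged sketch of a one-sided binomial test follows; the counts are hypothetical placeholders, and `scipy.stats.binomtest` requires SciPy 1.7 or later. Note also that the ONS definition of a legally recognised partnership may not match the survey's "Partnered" category, so the comparison is indicative at best.

```python
import scipy.stats as ss

partner_to_non = 0.504  # ONS 2019 share in a legally recognised partnership

# Hypothetical counts for illustration only; substitute the output of
# df['MaritalStatus'].value_counts().
n_partnered, n_single = 60, 40

# Tests whether the partnered share among buyers exceeds the ONS share.
result = ss.binomtest(
    n_partnered,
    n=n_partnered + n_single,
    p=partner_to_non,
    alternative="greater",
)
print(f"observed share = {n_partnered / (n_partnered + n_single):.3f}")
print(f"one-sided p-value = {result.pvalue:.4f}")
```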

### 4a) Further Illustrating Customer Demographics with Numerics

*inspired by*
[OBrunet](https://obrunet.github.io/data%20science/cardio/)

In [None]:
df.columns

In [None]:
mean_data = df.groupby('Product')[['Age', 'Education', 'Usage', 'Fitness', 'Income', 'Miles']].mean()
base = mean_data.loc['TM195']
scaled = mean_data/base
scaled

In [None]:
radar_data = pd.DataFrame(scaled.stack())

radar_data_195 = radar_data.loc['TM195']
radar_data_195.reset_index(inplace = True)
radar_data_195.rename({0:'scaled'},axis=1, inplace=True)

radar_data_498 = radar_data.loc['TM498']
radar_data_498.reset_index(inplace = True)
radar_data_498.rename({0:'scaled'},axis=1, inplace=True)

radar_data_798 = radar_data.loc['TM798']
radar_data_798.reset_index(inplace = True)
radar_data_798.rename({0:'scaled'},axis=1, inplace=True)

In [None]:
fig = go.Figure()
# fig.add_trace(go.Scatterpolar(r=radar_data_798['scaled'], theta=radar_data_798['index'], fill='toself', name='798'))
# fig.add_trace(go.Scatterpolar(r=radar_data_498['scaled'], theta=radar_data_498['index'],fill='toself', name='498'))
fig.add_trace(go.Scatterpolar(r=radar_data_195['scaled'], theta=radar_data_195['index'], name='195'))
fig.add_trace(go.Scatterpolar(r=radar_data_498['scaled'], theta=radar_data_498['index'], name='498'))
fig.add_trace(go.Scatterpolar(r=radar_data_798['scaled'], theta=radar_data_798['index'], name='798'))
fig.show()
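
The three near-identical blocks that build `radar_data_195`, `radar_data_498` and `radar_data_798` could be collapsed into one helper. The function name `radar_frame` and the demo values are assumptions for illustration; the output keeps the `index` and `scaled` column names that the plotting cells expect.

```python
import pandas as pd

def radar_frame(scaled: pd.DataFrame, product: str) -> pd.DataFrame:
    """Return one product's row as a two-column frame: 'index' and 'scaled'."""
    return scaled.loc[product].rename("scaled").reset_index()

# Made-up ratios standing in for the `scaled` table.
scaled_demo = pd.DataFrame(
    {"Age": [1.0, 1.02], "Income": [1.0, 1.6]},
    index=["TM195", "TM798"],
)
radar_data_798 = radar_frame(scaled_demo, "TM798")
print(radar_data_798)
```

Because the helper returns a new frame rather than modifying a slice in place, it also avoids the chained-assignment warnings that `inplace=True` on a `.loc` selection can trigger.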

In [None]:
df.select_dtypes('number').columns

In [None]:
# Min-max scale each numeric column to [0, 1], stretch to [0, 5], then average per model.
# Column names are taken from the selection itself rather than hard-coded,
# so the labels cannot drift out of step with the data.
num_cols = df.select_dtypes('number').columns
scaler = MinMaxScaler()
df_scaled = scaler.fit_transform(df[num_cols])
df_scaled = pd.DataFrame(df_scaled, columns=num_cols, index=df.index)
df_scaled = pd.concat([df['Product'], df_scaled*5], axis=1)
df_scaled = df_scaled.groupby('Product').mean()
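
For reference, min-max scaling maps each column to `(x - min) / (max - min)`, which is what `MinMaxScaler` computes with its default range of 0 to 1. A small pandas-only sketch on illustrative numbers (not survey values):

```python
import pandas as pd

# Illustrative values only, not taken from the survey.
demo = pd.DataFrame({"Age": [20, 30, 50], "Miles": [40, 100, 360]})

# Min-max scaling: (x - min) / (max - min), applied column by column,
# then stretched to the 0-5 range used for the radar plot.
scaled_demo = (demo - demo.min()) / (demo.max() - demo.min()) * 5
print(scaled_demo)
```

Since the range is set by the single smallest and largest value in each column, one extreme customer can compress everyone else's scaled values, which is worth keeping in mind when comparing the per-model means.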

In [None]:
radar_data = pd.DataFrame(df_scaled.stack())

radar_data_195 = radar_data.loc['TM195']
radar_data_195.reset_index(inplace = True)
radar_data_195.rename({0:'scaled'},axis=1, inplace=True)

radar_data_498 = radar_data.loc['TM498']
radar_data_498.reset_index(inplace = True)
radar_data_498.rename({0:'scaled'},axis=1, inplace=True)

radar_data_798 = radar_data.loc['TM798']
radar_data_798.reset_index(inplace = True)
radar_data_798.rename({0:'scaled'},axis=1, inplace=True)

In [None]:
fig = go.Figure()
fig.add_trace(go.Scatterpolar(r=radar_data_195['scaled'], theta=radar_data_195['index'], name='195'))
fig.add_trace(go.Scatterpolar(r=radar_data_498['scaled'], theta=radar_data_498['index'], name='498'))
fig.add_trace(go.Scatterpolar(r=radar_data_798['scaled'], theta=radar_data_798['index'],  name='798'))
fig.show()

#### Remarks

* Further confirms that, across the whole customer base, TM498 and TM195 have the same appeal
* Brunet adds further segmentation by splitting on gender, which is an argument for male and female versions of TM498 and TM195

**Conclusion**

* Drop TM498 and focus advertising on TM195 exclusively

### 5ai) Numerics Correlation matrix

In [None]:
correlation_matrix = df[[ 'Age', 'Education', 'Usage',
       'Fitness', 'Income', 'Miles']].corr()
matrix=np.triu(correlation_matrix)
sns.heatmap(correlation_matrix, annot=True, mask = matrix, cmap='magma', square=True, linewidth=0.5 )
plt.title('Pearson Correlation of numeric features wrt each other')
plt.tight_layout()

##### Remarks:

* Income is highly correlated with every other feature
* Expected Miles shows the strongest correlations of the group, with Usage and Fitness

##### Appending model indicators

In [None]:
df_correlation = df.copy()
df_correlation['TM195'] = df['Product'] == 'TM195'
df_correlation['TM498'] = df['Product'] == 'TM498'
df_correlation['TM798'] = df['Product'] == 'TM798'
df_correlation.drop(['Product','Gender','MaritalStatus'],inplace = True, axis=1)

In [None]:
mask = df_correlation.corr().copy()
mask.drop(['Age','Education','Usage','Fitness','Income','Miles'],axis=0, inplace =True)
mask.drop(['TM195','TM498','TM798'],axis=1, inplace =True )
sns.heatmap(mask,  annot=True, cmap='magma',square=True, linewidth=0.5 )
plt.title('Point Biserial Correlation of numeric features With Model Type')
plt.tight_layout()

In [None]:
mask

#### Remarks:

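* Parts of the full matrix are meaningless, e.g. the correlations between the indicators themselves, so only the indicator rows against the numeric columns are kept
* Quantifies the level of correlation between TM798 and the numeric variables, which is useful for strengthening the earlier arguments
* Age has no apparent bearing on model choice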
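
The heatmap is labelled as point-biserial even though it was produced with `DataFrame.corr()`. That is consistent: the point-biserial coefficient is the Pearson coefficient computed with one variable coded as 0/1. A quick check on synthetic data (all values below are made up):

```python
import numpy as np
import scipy.stats as ss

rng = np.random.default_rng(1)

# Synthetic data: a 0/1 indicator and a numeric feature that is shifted
# upwards when the indicator is 1.
is_model = rng.integers(0, 2, size=200)
income = rng.normal(50_000, 10_000, size=200) + 15_000 * is_model

r_pb, p_pb = ss.pointbiserialr(is_model, income)
r_pearson, p_pearson = ss.pearsonr(is_model, income)
print(f"point-biserial r = {r_pb:.4f}, Pearson r = {r_pearson:.4f}")
```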

### 5aii) Establishing Statistically Significant Relationships

In [None]:
df_nums = df.select_dtypes(include=['int64','float64'])
df_nums

In [None]:
df_nums.columns

In [None]:
cols_nums = df_nums.columns 
num_cols = cols_nums
num_cols_r = cols_nums

psage=[]
pseducation=[]
psusage=[]
psfitness=[]
psincome=[]
psmiles=[]
keys_nums=[]
corrcs_nums=[]
for rcol in num_cols:
    for col in num_cols_r:
        # print(col, rcol)
        correlation_coefficient, p_value = ss.pearsonr(df_nums[col], df_nums[rcol])
        # print(correlation_coefficient, p_value)
        keys_nums.append(col)
        corrcs_nums.append(correlation_coefficient)
        if rcol =='Age':
            psage.append(p_value)
        elif rcol =='Education':
            pseducation.append(p_value)
        elif rcol =='Usage':
            psusage.append(p_value)
        elif rcol =='Fitness':
            psfitness.append(p_value)
        elif rcol =='Income':
            psincome.append(p_value)
        else:
            psmiles.append(p_value)
        
    # print('\n')

In [None]:
print(len(psage),len(pseducation),len(psusage),len(psfitness))

In [None]:
df_pval = pd.DataFrame({'Age':psage, 'Education':pseducation,'Usage':psusage, 'Fitness':psfitness,\
                       'Income':psincome, 'Miles':psmiles},index=num_cols_r)
display(df_pval)
# Flag the pairs that are significant at the 5% level
display(df_pval < 0.05)
# The matrix is symmetric, so hide the upper triangle and the diagonal.
# A boolean mask is used because masking on the p-values themselves would
# leave any cell whose p-value is exactly 0 visible.
masking_m_nums = np.triu(np.ones_like(df_pval, dtype=bool))
sns.heatmap(df_pval, mask=masking_m_nums, square=True, linewidths=0.05, annot=True)
plt.title('p-value for Pearson correlation between numeric features')

In [None]:
cols = df_correlation.columns 
numerics = cols[:-3]
models = cols[-3:]

ps195=[]
ps498=[]
ps798=[]
keys=[]
corrcs=[]
for rcol in models:
    for col in numerics:
        # print(col, rcol)
        correlation_coefficient, p_value = ss.pointbiserialr(df_correlation[col], df_correlation[rcol])
        # print(correlation_coefficient, p_value)
        keys.append(col)
        corrcs.append(correlation_coefficient)
        if rcol =='TM195':
            ps195.append(p_value)
        elif rcol =='TM498':
            ps498.append(p_value)
        else:
            ps798.append(p_value)
        
    # print('\n')

In [None]:
df_pval = pd.DataFrame({'TM195':ps195, 'TM498':ps498,'TM798':ps798}, index=numerics)
display(df_pval)
display(df_pval < 0.05)  # flag feature/model pairs significant at the 5% level
df_pval=df_pval.transpose()
sns.heatmap(df_pval,square=True, linewidths=0.05, annot=True)
plt.title('p-value for numerics against model type')
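
The nested loop with one list per model can be written more compactly by filling a table directly. The helper name `pvalue_table` and the synthetic frame are assumptions for illustration; the column names mirror `df_correlation`.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

def pvalue_table(data, numerics, indicators):
    """p-values of the point-biserial correlation, one row per indicator."""
    table = pd.DataFrame(index=indicators, columns=numerics, dtype=float)
    for ind in indicators:
        for num in numerics:
            _, p_value = ss.pointbiserialr(data[ind], data[num])
            table.loc[ind, num] = p_value
    return table

# Synthetic frame for illustration; values are random, not survey data.
rng = np.random.default_rng(2)
demo = pd.DataFrame(
    {"Age": rng.normal(30, 5, 120), "Income": rng.normal(50_000, 9_000, 120)}
)
product = rng.choice(["TM195", "TM498", "TM798"], size=120)
for model in ["TM195", "TM498", "TM798"]:
    demo[model] = (product == model).astype(int)

p_demo = pvalue_table(demo, ["Age", "Income"], ["TM195", "TM498", "TM798"])
print(p_demo < 0.05)
```

Adding a fourth model or another feature then only means extending the two lists passed to the helper.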

### 5b) Categorical Pearson's correlation matrix

In [None]:
df['MaritalStatus'].unique()

In [None]:
df_encoded = df.copy()
df_encoded = df_encoded.select_dtypes(include='object')
df_encoded['male'] = df['Gender'] =='Male'
df_encoded['female'] = df['Gender'] =='Female'
df_encoded['single'] = df['MaritalStatus'] =='Single'
df_encoded['partnered'] = df['MaritalStatus'] =='Partnered'
df_encoded['TM195'] = df['Product'] == 'TM195'
df_encoded['TM498'] = df['Product'] == 'TM498'
df_encoded['TM798'] = df['Product'] == 'TM798'
df_encoded.drop(['Product','Gender','MaritalStatus'],axis=1, inplace= True)

In [None]:
encoded_cor_m = df_encoded.corr()
masked = encoded_cor_m.copy()
masked.drop(['male','female','single','partnered'],axis=0, inplace = True)
masked.drop(['TM195','TM498','TM798'],axis=1, inplace = True)
sns.heatmap(masked, annot=True,cmap ='viridis', square=True, linewidth=0.5);

### 5c) Jaccard Similarity

In [None]:
def jaccard_similarity(vec1, vec2):
    intersection = np.sum(vec1 & vec2)
    union = np.sum(vec1 | vec2)
    return intersection / union

In [None]:
# Pairwise Jaccard similarity between every pair of boolean columns
jaccards = pd.DataFrame(np.zeros((len(df_encoded.columns),len(df_encoded.columns))))
jaccards.index = df_encoded.columns
jaccards.columns = df_encoded.columns
for i in range(len(df_encoded.columns)):
    for j in range(len(df_encoded.columns)):
        jaccards.iloc[i,j] =jaccard_similarity(df_encoded.iloc[:,i], df_encoded.iloc[:,j])

jaccards

In [None]:
masked_jaccard = jaccards.copy()
masked_jaccard.drop(['male','female','single','partnered'],axis=0, inplace = True)
masked_jaccard.drop(['TM195','TM498','TM798'],axis=1, inplace = True)
sns.heatmap(masked_jaccard, annot=True,cmap ='viridis', square=True, linewidth=0.5);

### 5d) Categorical Dice Coefficient

In [None]:
def dice_coefficient(vec1, vec2):
    intersection = np.sum(vec1 & vec2)
    total = np.sum(vec1) + np.sum(vec2)
    return 2 * intersection / total

In [None]:
# Pairwise Dice coefficient between every pair of boolean columns
dices = pd.DataFrame(np.zeros((len(df_encoded.columns),len(df_encoded.columns))))
dices.index = df_encoded.columns
dices.columns = df_encoded.columns
for i in range(len(df_encoded.columns)):
    for j in range(len(df_encoded.columns)):
        dices.iloc[i,j] =dice_coefficient(df_encoded.iloc[:,i], df_encoded.iloc[:,j])

dices

In [None]:
masked_dice = dices.copy()
masked_dice.drop(['male','female','single','partnered'],axis=0, inplace = True)
masked_dice.drop(['TM195','TM498','TM798'],axis=1, inplace = True)
sns.heatmap(masked_dice, annot=True,cmap ='viridis', square=True, linewidth=0.5);
plt.title('Dice similarity of binary features wrt model choice')
plt.tight_layout()

**Remarks:**

Dice is the most effective of these measures for the set overlap of binary features.
It doesn't really make sense to check correlation here: binary features don't scale with each other, they are either present or absent. Hence a similarity score is used instead.

Dice is preferred to Jaccard because it is gentler on pairs of sets with very different sizes. It divides twice the overlap by the sum of the two set sizes rather than dividing the overlap by the size of the union, so a small set that sits largely inside a big one still receives a reasonable score.
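
To make the comparison concrete, the sketch below scores a small set that is fully contained in a larger one. The vectors are toy data. It also checks the identity Dice = 2J / (1 + J), which means the two measures always rank pairs in the same order and differ only in scale.

```python
import numpy as np

def jaccard(a, b):
    return np.sum(a & b) / np.sum(a | b)

def dice(a, b):
    return 2 * np.sum(a & b) / (np.sum(a) + np.sum(b))

# Toy boolean vectors: a small set (3 members) fully contained in a large one (9).
small = np.array([True] * 3 + [False] * 9)
large = np.array([True] * 9 + [False] * 3)

j = jaccard(small, large)
d = dice(small, large)
print(f"Jaccard = {j:.3f}, Dice = {d:.3f}, 2J/(1+J) = {2 * j / (1 + j):.3f}")
```

Here the overlap is 3 and the union is 9, so Jaccard is 1/3 while Dice is 6/12 = 0.5.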

### 6a) Simpler Visualisation of Binaries

In [None]:
df_maried=pd.concat([pd.DataFrame(df[df['Product'] =='TM195']['MaritalStatus'].value_counts()).rename({'count':'TM195'},axis=1),#,\
pd.DataFrame(df[df['Product'] =='TM498']['MaritalStatus'].value_counts()).rename({'count':'TM498'},axis=1),
pd.DataFrame(df[df['Product'] =='TM798']['MaritalStatus'].value_counts()).rename({'count':'TM798'},axis=1)], axis=1)

df_maried = df_maried[['TM195', 'TM498','TM798']].astype('float64')

df_maried.loc['Partnered',:] = df_maried.loc['Partnered',:]/df_maried.loc['Partnered',:].sum()
df_maried.loc['Single',:] = df_maried.loc['Single',:]/df_maried.loc['Single',:].sum()

df_maried = df_maried.transpose()

sns.heatmap(df_maried, annot=True,cmap ='viridis', square=True, linewidth=0.5)

In [None]:
df_maried=pd.concat([pd.DataFrame(df[df['Product'] =='TM195']['MaritalStatus'].value_counts()).rename({'count':'TM195'},axis=1),#,\
pd.DataFrame(df[df['Product'] =='TM498']['MaritalStatus'].value_counts()).rename({'count':'TM498'},axis=1),
pd.DataFrame(df[df['Product'] =='TM798']['MaritalStatus'].value_counts()).rename({'count':'TM798'},axis=1)], axis=1)

df_maried = df_maried[['TM195', 'TM498','TM798']].astype('float64')

df_maried.loc[:,'TM195'] = df_maried.loc[:,'TM195']/df_maried.loc[:,'TM195'].sum()
df_maried.loc[:,'TM498'] = df_maried.loc[:,'TM498']/df_maried.loc[:,'TM498'].sum()
df_maried.loc[:,'TM798'] = df_maried.loc[:,'TM798']/df_maried.loc[:,'TM798'].sum()

df_maried = df_maried.transpose()

sns.heatmap(df_maried, annot=True,cmap ='viridis', square=True, linewidth=0.5)
plt.title('Marital Status Distribution by Treadmill model')
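
The two marital-status heatmaps differ only in the direction of normalisation: the first makes each marital status sum to 1 across models, the second makes each model sum to 1 across statuses. `pd.crosstab` expresses both with its `normalize` argument. A sketch on hypothetical rows (not survey data):

```python
import pandas as pd

# Hypothetical rows standing in for df[['Product', 'MaritalStatus']].
demo = pd.DataFrame(
    {
        "Product": ["TM195", "TM195", "TM195", "TM498", "TM498", "TM798"],
        "MaritalStatus": [
            "Partnered", "Single", "Partnered", "Single", "Partnered", "Partnered",
        ],
    }
)

# normalize='index' makes each row (model) sum to 1;
# normalize='columns' would instead make each marital status sum to 1.
by_model = pd.crosstab(demo["Product"], demo["MaritalStatus"], normalize="index")
by_status = pd.crosstab(demo["Product"], demo["MaritalStatus"], normalize="columns")
print(by_model.round(2))
```

Either table can be passed straight to `sns.heatmap`, and the same call with `'Gender'` in place of `'MaritalStatus'` would cover the gender heatmaps.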

In [None]:
plt.pie(df['MaritalStatus'].value_counts(), autopct='%.1f%%', colors= ['green','orange']);
plt.legend(['Partnered','Single'],
          title="Marital Status",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))
plt.title('Marital Status Distribution')

In [None]:
df_gender=pd.concat([pd.DataFrame(df[df['Product'] =='TM195']['Gender'].value_counts()).rename({'count':'TM195'},axis=1),#,\
pd.DataFrame(df[df['Product'] =='TM498']['Gender'].value_counts()).rename({'count':'TM498'},axis=1),
pd.DataFrame(df[df['Product'] =='TM798']['Gender'].value_counts()).rename({'count':'TM798'},axis=1)], axis=1)

df_gender = df_gender[['TM195', 'TM498','TM798']].astype('float64')

df_gender.loc['Male',:] = df_gender.loc['Male',:]/df_gender.loc['Male',:].sum()
df_gender.loc['Female',:] = df_gender.loc['Female',:]/df_gender.loc['Female',:].sum()

df_gender = df_gender.transpose()


sns.heatmap(df_gender, annot=True,cmap ='viridis', square=True, linewidth=0.5)
plt.title('Treadmill distribution by gender')

In [None]:
fig, axe = plt.subplots(1,2)
# axe[0].pie(df_gender['Male'], labels=['TM195', 'TM498','TM798'], autopct='%.f%%');
axe[0].pie(df_gender['Male'], autopct='%.1f%%', );
axe[0].legend(['TM195','TM498','TM798'],
          title="Model type",
          loc="center left",
          bbox_to_anchor=(1, 0, 0.5, 1))
axe[0].set_title('Male')
#axe[1].pie(df_gender['Female'],labels=['TM195', 'TM498','TM798'], autopct='%.f%%');
axe[1].pie(df_gender['Female'], autopct='%.1f%%');
axe[1].set_title('Female')
plt.tight_layout()

In [None]:
df_gender=pd.concat([pd.DataFrame(df[df['Product'] =='TM195']['Gender'].value_counts()).rename({'count':'TM195'},axis=1),#,\
pd.DataFrame(df[df['Product'] =='TM498']['Gender'].value_counts()).rename({'count':'TM498'},axis=1),
pd.DataFrame(df[df['Product'] =='TM798']['Gender'].value_counts()).rename({'count':'TM798'},axis=1)], axis=1)

df_gender = df_gender[['TM195', 'TM498','TM798']].astype('float64')

df_gender.loc[:,'TM195'] = df_gender.loc[:,'TM195']/df_gender.loc[:,'TM195'].sum()
df_gender.loc[:,'TM498'] = df_gender.loc[:,'TM498']/df_gender.loc[:,'TM498'].sum()
df_gender.loc[:,'TM798'] = df_gender.loc[:,'TM798']/df_gender.loc[:,'TM798'].sum()


df_gender = df_gender.transpose()

sns.heatmap(df_gender, annot=True,cmap ='viridis', square=True, linewidth=0.5)
plt.title('Gender distribution by treadmill model')

### 6b) Customer segmentation including binaries

In [None]:
df

In [None]:
df_fr = df.copy()
df_fr['Male'] =df_fr['Gender']=='Male'
# df_fr['Female'] =df_fr['Gender']=='Female'
# df_fr['Maried'] = df_fr['MaritalStatus'] == 'Partnered'
df_fr['Single'] = df_fr['MaritalStatus'] == 'Single'
df_fr.drop(['Gender','MaritalStatus'],axis=1, inplace=True)

In [None]:
mean_data_fr = df_fr.groupby('Product')[['Age', 'Education', 'Usage', 'Fitness', 'Income', 'Miles','Male','Single']].mean()
base_fr = mean_data_fr.loc['TM195']
scaled_fr = mean_data_fr/base_fr
scaled_fr

In [None]:
radar_data = pd.DataFrame(scaled_fr.stack())

radar_data_195 = radar_data.loc['TM195']
radar_data_195.reset_index(inplace = True)
radar_data_195.rename({0:'scaled'},axis=1, inplace=True)

radar_data_498 = radar_data.loc['TM498']
radar_data_498.reset_index(inplace = True)
radar_data_498.rename({0:'scaled'},axis=1, inplace=True)

radar_data_798 = radar_data.loc['TM798']
radar_data_798.reset_index(inplace = True)
radar_data_798.rename({0:'scaled'},axis=1, inplace=True)

In [None]:
fig = go.Figure()
# fig.add_trace(go.Scatterpolar(r=radar_data_798['scaled'], theta=radar_data_798['index'], fill='toself', name='798'))
# fig.add_trace(go.Scatterpolar(r=radar_data_498['scaled'], theta=radar_data_498['index'],fill='toself', name='498'))
fig.add_trace(go.Scatterpolar(r=radar_data_195['scaled'], theta=radar_data_195['index'], name='195'))
fig.add_trace(go.Scatterpolar(r=radar_data_498['scaled'], theta=radar_data_498['index'], name='498'))
fig.add_trace(go.Scatterpolar(r=radar_data_798['scaled'], theta=radar_data_798['index'], name='798'))
fig.show()

### Further Experimentation 

In [None]:
import scipy.stats as ss

In [None]:
# ref: medium article
def cramers_v(x, y):
    confusion_matrix = pd.crosstab(x,y)
    chi2 = ss.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()
    phi2 = chi2/n
    r,k = confusion_matrix.shape
    phi2corr = max(0, phi2-((k-1)*(r-1))/(n-1))
    rcorr = r-((r-1)**2)/(n-1)
    kcorr = k-((k-1)**2)/(n-1)
    return np.sqrt(phi2corr/min((kcorr-1),(rcorr-1)))

print("Cramer's V: ", cramers_v(df_encoded['male'],df_encoded['TM195']))

In [None]:
from collections import Counter  # standard-library Counter, imported directly

from dython.nominal import conditional_entropy
from dython.nominal import associations

In [None]:
def theils_u(x, y):
    s_xy = conditional_entropy(x,y)
    x_counter = Counter(x)
    total_occurrences = sum(x_counter.values())
    p_x = list(map(lambda n: n/total_occurrences, x_counter.values()))
    s_x = ss.entropy(p_x)
    if s_x == 0:
        return 1
    else:
        return (s_x - s_xy) / s_x

print("Theils_u coef: ", theils_u(df_encoded['male'],df_encoded['TM195']))
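
Unlike Cramér's V, Theil's U is asymmetric: U(x|y) measures how much of the uncertainty in x is removed by knowing y, and swapping the arguments generally gives a different number. The sketch below is a standalone pandas/NumPy version written for illustration (the name `theils_u_sketch` and the toy data are assumptions; it does not use `dython`):

```python
import numpy as np
import pandas as pd

def entropy(labels: pd.Series) -> float:
    """Shannon entropy (natural log) of a categorical series."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(-(p * np.log(p)).sum())

def theils_u_sketch(x: pd.Series, y: pd.Series) -> float:
    """U(x|y): fraction of the entropy of x explained by knowing y."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0
    # H(x|y) = sum over values v of y of P(y=v) * H(x | y=v)
    h_x_given_y = sum(
        len(group) / len(x) * entropy(group) for _, group in x.groupby(y)
    )
    return (h_x - h_x_given_y) / h_x

# Toy data: the model determines gender exactly, but not the other way round.
model = pd.Series(["A", "A", "B", "B", "C", "C", "C", "C"])
gender = pd.Series(["F", "F", "F", "F", "M", "M", "M", "M"])

u_gender_given_model = theils_u_sketch(gender, model)
u_model_given_gender = theils_u_sketch(model, gender)
print(u_gender_given_model, u_model_given_gender)
```

In this toy case knowing the model removes all uncertainty about gender (U = 1), while knowing gender removes only two thirds of the uncertainty about the model. When reading `theils_u(df_encoded['male'], df_encoded['TM195'])`, the argument order therefore matters.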

#### Remarks:

* Male and TM798 are the most strongly associated pair, which backs up the previous argument
* Interestingly, TM195 and female are associated, which is an argument for marketing it more towards women
    * Could TM498 then be kept and marketed towards men?

### References

*Inspiration*\
[Brunet](https://obrunet.github.io/data%20science/cardio/)

*External Stats*\
[Office for National Statistics](https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/populationestimates/bulletins/populationestimatesbymaritalstatusandlivingarrangements/2019)

*Statistics Help*\
[Pearson's Coefficient](https://www.statisticssolutions.com/free-resources/directory-of-statistical-analyses/pearsons-correlation-coefficient/#:~:text=High%20Degree%3A%20Values%20between%20%C2%B1,of%20zero%20implies%20no%20relationship.)\
[Biserial Coefficient](https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.stats.pointbiserialr.html#:~:text=The%20point%20biserial%20correlation%20is,1%20imply%20a%20determinative%20relationship.) \
[Dice Similarity](https://www.sciencedirect.com/science/article/pii/S2213158216300560#:~:text=As%20described%20above%2C%20the%20Dice,high%20(0.80%20to%201.00).)

*Programming Help*\
[Piechart](https://www.statology.org/seaborn-pie-chart/)\
[Subplots](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.subplots.html)\
[pd_manipulation](https://stackoverflow.com/questions/22233488/pandas-drop-a-level-from-a-multi-level-column-index)\
[Groupedbar](https://www.geeksforgeeks.org/create-a-grouped-bar-plot-in-matplotlib/)\
[Radarplot](https://plotly.com/python/radar-chart/)\
[Masking correlation matrix](https://stackoverflow.com/questions/57414771/how-to-plot-only-the-lower-triangle-of-a-seaborn-heatmap)\
[ColorPalettes](https://seaborn.pydata.org/tutorial/color_palettes.html)
