## Working on the Cardiogoodfitness Project
https://www.kaggle.com/vikaspericherla/cardio-good-fitness

Let's collaborate using the notebook below.

#### Downloading the Required Packages

In [9]:
! pip install pandas
! pip install numpy
! pip install matplotlib
! pip install seaborn



#### Importing the Libraries Required

In [10]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import io
import requests


#### Loading Our Dataset

Create a folder named Data in your Google Drive and save the CSV file there as CardioGoodFitness.csv.

Make sure you mount your Drive when running the Jupyter Notebook.

In [43]:
url = "https://raw.githubusercontent.com/EdnahM/cardiogoodFitness/master/data/cardioGoodFitness.csv"
data = pd.read_csv(url, sep=',')
data

Unnamed: 0,Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
0,TM195,18,Male,14,Single,3,4,29562,112
1,TM195,19,Male,15,Single,2,3,31836,75
2,TM195,19,Female,14,Partnered,4,3,30699,66
3,TM195,19,Male,12,Single,3,3,32973,85
4,TM195,20,Male,13,Partnered,4,2,35247,47
...,...,...,...,...,...,...,...,...,...
175,TM798,40,Male,21,Single,6,5,83416,200
176,TM798,42,Male,18,Single,5,4,89641,200
177,TM798,45,Male,16,Single,5,5,90886,160
178,TM798,47,Male,18,Partnered,4,5,104581,120
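
The Drive instructions and the URL used in the loading cell can be combined. The following is a minimal sketch, not the notebook's own method: `load_cardio` is a hypothetical helper, and the Drive path assumes Colab's usual `/content/drive/MyDrive` mount point and the `Data` folder described above.

```python
from pathlib import Path

import pandas as pd

# URL taken from the loading cell above.
URL = "https://raw.githubusercontent.com/EdnahM/cardiogoodFitness/master/data/cardioGoodFitness.csv"

# Assumption: Drive is mounted at Colab's usual mount point, e.g. after running
#   from google.colab import drive; drive.mount('/content/drive')
DRIVE_CSV = Path("/content/drive/MyDrive/Data/CardioGoodFitness.csv")


def load_cardio(local_path=DRIVE_CSV, url=URL):
    """Hypothetical helper: read the Drive copy if it exists, else the URL."""
    local_path = Path(local_path)
    source = local_path if local_path.exists() else url
    return pd.read_csv(source)
```

Calling `data = load_cardio()` would then work both in Colab with Drive mounted and in a plain Jupyter session with network access.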


### Data Cleaning and Preprocessing 
To understand the data better, we first inspect it and clean it into the required format.

#### Data type

In [44]:
data.dtypes

Product          object
Age               int64
Gender           object
Education         int64
MaritalStatus    object
Usage             int64
Fitness           int64
Income            int64
Miles             int64
dtype: object

#### Checking for Null values

In [67]:
data.isnull().sum()

Product          0
Age              0
Gender           0
Education        0
MaritalStatus    0
Usage            0
Fitness          0
Income           0
Miles            0
dtype: int64

As the output shows, the data contains no null values.
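
A null check like the one above can also be turned into a small reusable report. This is a sketch under assumptions: `missing_report` is a hypothetical helper, and the three-row `sample` frame copies the first rows of the table shown earlier purely for illustration.

```python
import pandas as pd


def missing_report(df):
    """Return per-column missing counts and percentages (hypothetical helper)."""
    counts = df.isnull().sum()
    percent = (counts / len(df) * 100).round(2)
    return pd.DataFrame({"missing": counts, "percent": percent})


# Illustrative sample: the first three rows of the table shown earlier.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM195"],
    "Age": [18, 19, 19],
    "Gender": ["Male", "Male", "Female"],
})

report = missing_report(sample)
print(report)

# Fail loudly if a later reload of the data introduces missing values.
assert report["missing"].sum() == 0, "dataset contains missing values"
```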

#### Data columns 
-  Unique Values

In [46]:
data.columns

Index(['Product', 'Age', 'Gender', 'Education', 'MaritalStatus', 'Usage',
       'Fitness', 'Income', 'Miles'],
      dtype='object')

In [47]:
data['Product'].unique()

array(['TM195', 'TM498', 'TM798'], dtype=object)

In [48]:
data['Age'].unique()

array([18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 43, 44, 46, 47, 50, 45, 48, 42])

In [49]:
data['Gender'].unique()

array(['Male', 'Female'], dtype=object)

In [50]:
data['Education'].unique()

array([14, 15, 12, 13, 16, 18, 20, 21])

In [51]:
data['MaritalStatus'].unique()

array(['Single', 'Partnered'], dtype=object)

In [52]:
data['Usage'].unique()

array([3, 2, 4, 5, 6, 7])

In [53]:
data['Income'].unique()

array([ 29562,  31836,  30699,  32973,  35247,  37521,  36384,  38658,
        40932,  34110,  39795,  42069,  44343,  45480,  46617,  48891,
        53439,  43206,  52302,  51165,  50028,  54576,  68220,  55713,
        60261,  67083,  56850,  59124,  61398,  57987,  64809,  47754,
        65220,  62535,  48658,  54781,  48556,  58516,  53536,  61006,
        57271,  52291,  49801,  62251,  64741,  70966,  75946,  74701,
        69721,  83416,  88396,  90886,  92131,  77191,  52290,  85906,
       103336,  99601,  89641,  95866, 104581,  95508])

In [54]:
data['Fitness'].unique()

array([4, 3, 2, 1, 5])

In [55]:
data['Miles'].unique()

array([112,  75,  66,  85,  47, 141, 103,  94, 113,  38, 188,  56, 132,
       169,  64,  53, 106,  95, 212,  42, 127,  74, 170,  21, 120, 200,
       140, 100,  80, 160, 180, 240, 150, 300, 280, 260, 360])

From the unique values above, we can see that the following columns hold a small set of labels and can be treated as categorical variables:
 - 'Gender'
 - 'MaritalStatus'
 - 'Product' 
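
One way to mark these columns as categorical, without yet replacing the labels with numbers, is pandas' `category` dtype. The sketch below is an assumption about how that could look, not a step from the notebook; `sample` is a small illustrative frame built from rows shown earlier.

```python
import pandas as pd

# Illustrative sample built from rows of the table shown earlier.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM798"],
    "Gender": ["Male", "Female", "Male"],
    "MaritalStatus": ["Single", "Partnered", "Single"],
    "Age": [18, 19, 40],
})

categorical_cols = ["Gender", "MaritalStatus", "Product"]

# astype() with a dict converts only the listed columns and returns a new
# DataFrame; the labels stay readable and numeric columns are left untouched.
converted = sample.astype({col: "category" for col in categorical_cols})

print(converted.dtypes)
print(converted["Product"].cat.categories)
```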

The function below collects the unique values of every column in the dataset, which avoids repeating the same call for each column.

In [56]:
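def unique_val(df):
  # Build a dict mapping each column name to its array of unique values.
  # (Returning inside a loop would stop after the first column.)
  return {col: df[col].unique() for col in df.columns}

unique_vals = unique_val(data)
unique_vals['Product']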

array(['TM195', 'TM498', 'TM798'], dtype=object)
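
When only the number of distinct values per column matters, `DataFrame.nunique()` gives a compact overview. A minimal sketch on an illustrative `sample` frame (rows copied from the table shown earlier; the `max_levels` threshold is an arbitrary assumption):

```python
import pandas as pd

# Illustrative sample built from rows of the table shown earlier.
sample = pd.DataFrame({
    "Product": ["TM195", "TM195", "TM195", "TM798"],
    "Gender": ["Male", "Male", "Female", "Male"],
    "Age": [18, 19, 19, 40],
    "Miles": [112, 75, 66, 200],
})

# Number of distinct values in each column.
distinct = sample.nunique()
print(distinct)

# Columns with few distinct values are candidates for categorical treatment.
max_levels = 2  # arbitrary threshold chosen for this tiny sample
low_cardinality = distinct[distinct <= max_levels].index.tolist()
print(low_cardinality)
```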

#### Correlation of Our Data

In [57]:
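# Restrict to numeric columns; recent pandas versions raise an error
# when corr() meets object columns such as Product or Gender.
data.select_dtypes(include='number').corr()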

Unnamed: 0,Age,Education,Usage,Fitness,Income,Miles
Age,1.0,0.280496,0.015064,0.061105,0.513414,0.036618
Education,0.280496,1.0,0.395155,0.410581,0.625827,0.307284
Usage,0.015064,0.395155,1.0,0.668606,0.519537,0.75913
Fitness,0.061105,0.410581,0.668606,1.0,0.535005,0.785702
Income,0.513414,0.625827,0.519537,0.535005,1.0,0.543473
Miles,0.036618,0.307284,0.75913,0.785702,0.543473,1.0
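
To read the strongest relationships out of the matrix without scanning it by eye, the upper triangle can be flattened and sorted. The sketch below re-enters the correlation values printed above by hand so that it runs on its own; with the real data, `corr` would simply be the result of the correlation call.

```python
import numpy as np
import pandas as pd

cols = ["Age", "Education", "Usage", "Fitness", "Income", "Miles"]
# Values copied from the correlation output above.
corr = pd.DataFrame(
    [
        [1.0, 0.280496, 0.015064, 0.061105, 0.513414, 0.036618],
        [0.280496, 1.0, 0.395155, 0.410581, 0.625827, 0.307284],
        [0.015064, 0.395155, 1.0, 0.668606, 0.519537, 0.75913],
        [0.061105, 0.410581, 0.668606, 1.0, 0.535005, 0.785702],
        [0.513414, 0.625827, 0.519537, 0.535005, 1.0, 0.543473],
        [0.036618, 0.307284, 0.75913, 0.785702, 0.543473, 1.0],
    ],
    index=cols,
    columns=cols,
)

# Keep only the cells above the diagonal so each pair appears once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna().sort_values(ascending=False)

print(pairs.head(3))
```

On these numbers, Fitness and Miles (0.79) and Usage and Miles (0.76) are the most strongly correlated pairs, while Age is nearly uncorrelated with Usage, Fitness and Miles.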


#### Describe Our Data

In [68]:
data.describe(include='all')

Unnamed: 0,Product,Age,Gender,Education,MaritalStatus,Usage,Fitness,Income,Miles
count,180,180.0,180,180.0,180,180.0,180.0,180.0,180.0
unique,3,,2,,2,,,,
top,TM195,,Male,,Partnered,,,,
freq,80,,104,,107,,,,
mean,,28.788889,,15.572222,,3.455556,3.311111,53719.577778,103.194444
std,,6.943498,,1.617055,,1.084797,0.958869,16506.684226,51.863605
min,,18.0,,12.0,,2.0,1.0,29562.0,21.0
25%,,24.0,,14.0,,3.0,3.0,44058.75,66.0
50%,,26.0,,16.0,,3.0,3.0,50596.5,94.0
75%,,33.0,,16.0,,4.0,4.0,58668.0,114.75


#### Categorical Variables

In [59]:
categorical_col = [col for col in data.columns if data[col].dtype=='O']
categorical_col

['Product', 'Gender', 'MaritalStatus']

In [60]:
categorical = data[categorical_col]

In [61]:
categorical.head()

Unnamed: 0,Product,Gender,MaritalStatus
0,TM195,Male,Single
1,TM195,Male,Single
2,TM195,Female,Partnered
3,TM195,Male,Single
4,TM195,Male,Partnered


In [62]:
# Encode Gender as integers. assign() returns a new DataFrame, which avoids
# the SettingWithCopyWarning raised when writing into a slice of `data`.
categorical = categorical.assign(Gender=categorical['Gender'].map({"Male": 0, "Female": 1}))


In [63]:
# Encode MaritalStatus as integers. assign() returns a new DataFrame, which
# avoids the SettingWithCopyWarning raised when writing into a slice of `data`.
categorical = categorical.assign(MaritalStatus=categorical['MaritalStatus'].map({"Single": 0, "Partnered": 1}))


In [64]:
# Encode Product as integers. assign() returns a new DataFrame, which avoids
# the SettingWithCopyWarning raised when writing into a slice of `data`.
categorical = categorical.assign(Product=categorical['Product'].map({'TM195': 0, 'TM498': 1, 'TM798': 2}))


In [65]:
categorical.tail(10)

Unnamed: 0,Product,Gender,MaritalStatus
170,2,0,1
171,2,1,1
172,2,0,0
173,2,0,1
174,2,0,1
175,2,0,0
176,2,0,0
177,2,0,0
178,2,0,1
179,2,0,1
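
The three encoding cells share one pattern, so the mappings can be collected in a single dictionary and applied in a loop. This is a sketch rather than the notebook's code: `ENCODINGS` and `encode` are hypothetical names, and `sample` is a small illustrative frame. Integer codes imply an order, so if no order among the three products is intended, one-hot encoding with `pd.get_dummies` is shown as a possible alternative for `Product`.

```python
import pandas as pd

# Same label-to-integer choices as the cells above.
ENCODINGS = {
    "Gender": {"Male": 0, "Female": 1},
    "MaritalStatus": {"Single": 0, "Partnered": 1},
    "Product": {"TM195": 0, "TM498": 1, "TM798": 2},
}


def encode(df, encodings=ENCODINGS):
    """Return a copy of df with the listed columns mapped to integers."""
    out = df.copy()
    for col, mapping in encodings.items():
        # Refuse to encode silently if a label has no integer assigned.
        unknown = set(out[col].unique()) - set(mapping)
        if unknown:
            raise ValueError(f"unmapped values in {col}: {sorted(unknown)}")
        out[col] = out[col].map(mapping)
    return out


# Illustrative sample using labels that occur in the dataset.
sample = pd.DataFrame({
    "Product": ["TM195", "TM498", "TM798"],
    "Gender": ["Male", "Female", "Male"],
    "MaritalStatus": ["Single", "Partnered", "Partnered"],
})

encoded = encode(sample)
print(encoded)

# Alternative for Product: one indicator column per model instead of 0/1/2.
one_hot = pd.get_dummies(sample["Product"], prefix="Product")
print(one_hot)
```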
