## Predicting Stress & Personal Achievement based on Lifestyle Factors

### Project Summary:

Predict levels of personal achievement based on lifestyle factors, using a dataset of 12,757 survey responses with 23 lifestyle attributes.

Data source: https://www.kaggle.com/ydalat/lifestyle-and-wellbeing-data

### Project Goals:
- Build a model that identifies the lifestyle factors driving high personal achievement.
- Provide a single notebook, with the necessary helper functions and instructions, that lets a user reproduce the results on their own.

Trello board: https://trello.com/b/ebZrkO2D/lifestyle-factors-that-affect-personal-achievement

<a id='back'></a>
### Quick Links to Sections within this Notebook

- [Acquire Data](#AD)
- [Prepare Data](#PD)
- [Explore Data](#EX)
- [Split Data](#SD)
- [Hypothesis Testing](#HY)
- [Cluster Data](#CL)
- [Hypothesis Testing on Clusters](#HC)
- [Scale Data](#Scale)
- [Modeling](#Model)
- [Model on Test Data](#TD)
- [Conclusion](#Concl)

In [1]:
# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

# Wrangling
import pandas as pd
import numpy as np
import wrangle

# Statistical tests
import scipy.stats as stats

# Visualizing
import matplotlib.pyplot as plt
from matplotlib import cm
import seaborn as sns

# Modeling
from sklearn.cluster import KMeans, dbscan
from sklearn.feature_selection import SelectKBest, RFE, f_regression
from sklearn.linear_model import LinearRegression, LassoLars, TweedieRegressor
from sklearn.metrics import mean_squared_error, explained_variance_score
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.preprocessing import PolynomialFeatures

# Show up to 80 columns when displaying a DataFrame
pd.set_option('display.max_columns', 80)

<a id='AD'></a>
## ```Acquire Data```
[back](#back) | [next](#PD)

In [1]:
import pandas as pd
import acquire
import prepare




In [2]:
df = acquire.get_wellbeing_data()
df

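| | Timestamp | FRUITS_VEGGIES | DAILY_STRESS | PLACES_VISITED | CORE_CIRCLE | SUPPORTING_OTHERS | SOCIAL_NETWORK | ACHIEVEMENT | DONATION | BMI_RANGE | ... | SLEEP_HOURS | LOST_VACATION | DAILY_SHOUTING | SUFFICIENT_INCOME | PERSONAL_AWARDS | TIME_FOR_PASSION | WEEKLY_MEDITATION | AGE | GENDER | WORK_LIFE_BALANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7/7/15 | 3 | 2 | 2 | 5 | 0 | 5 | 2 | 0 | 1 | ... | 7 | 5 | 5 | 1 | 4 | 0 | 5 | 36 to 50 | Female | 609.5 |
| 1 | 7/7/15 | 2 | 3 | 4 | 3 | 8 | 10 | 5 | 2 | 2 | ... | 8 | 2 | 2 | 2 | 3 | 2 | 6 | 36 to 50 | Female | 655.6 |
| 2 | 7/7/15 | 2 | 3 | 3 | 4 | 4 | 10 | 3 | 2 | 2 | ... | 8 | 10 | 2 | 2 | 4 | 8 | 3 | 36 to 50 | Female | 631.6 |
| 3 | 7/7/15 | 3 | 3 | 10 | 3 | 10 | 7 | 2 | 5 | 2 | ... | 5 | 7 | 5 | 1 | 5 | 2 | 0 | 51 or more | Female | 622.7 |
| 4 | 7/7/15 | 5 | 1 | 3 | 3 | 10 | 4 | 2 | 4 | 2 | ... | 7 | 0 | 0 | 2 | 8 | 1 | 5 | 51 or more | Female | 663.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15967 | 3/14/21 5:42 | 3 | 3 | 0 | 4 | 0 | 10 | 0 | 4 | 2 | ... | 7 | 0 | 1 | 1 | 5 | 2 | 5 | 51 or more | Female | 644.5 |
| 15968 | 3/14/21 6:30 | 3 | 3 | 6 | 8 | 7 | 4 | 6 | 3 | 1 | ... | 6 | 0 | 0 | 2 | 10 | 5 | 8 | 21 to 35 | Female | 714.9 |
| 15969 | 3/14/21 8:35 | 4 | 3 | 0 | 10 | 10 | 8 | 6 | 5 | 1 | ... | 7 | 0 | 1 | 2 | 10 | 1 | 10 | 21 to 35 | Male | 716.6 |
| 15970 | 3/14/21 8:43 | 1 | 1 | 10 | 8 | 2 | 7 | 3 | 2 | 1 | ... | 8 | 7 | 2 | 2 | 1 | 6 | 8 | 21 to 35 | Female | 682.0 |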
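The `acquire.get_wellbeing_data()` helper is not shown in this notebook. A minimal sketch of what such a loader might look like follows, assuming the Kaggle CSV has been downloaded locally. The function body and the default file name `wellbeing.csv` are hypothetical placeholders, not the project's actual code.

```python
from pathlib import Path

import pandas as pd


def get_wellbeing_data(csv_path="wellbeing.csv"):
    """Load the survey responses from a local CSV.

    Hypothetical stand-in for acquire.get_wellbeing_data();
    the default file name is an assumption.
    """
    path = Path(csv_path)
    if not path.exists():
        # Fail early with a hint instead of a bare pandas error
        raise FileNotFoundError(
            f"{path} not found. Download the CSV from the Kaggle page first."
        )
    return pd.read_csv(path)
```

Calling `get_wellbeing_data("path/to/file.csv")` returns a DataFrame with one row per survey response.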


In [3]:
# Total number of nulls, summed column by column
df.isnull().sum(axis=0).sum()


0

In [4]:
# Same total, summed row by row
df.isnull().sum(axis=1).sum()



0
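The two cells above each return a single grand total. To see where nulls sit, the per-column and per-row counts can be inspected before the final `.sum()`. The sketch below uses a small made-up frame (`toy` and its values are illustrative only), since the survey data itself contains no nulls.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the survey data (values are made up)
toy = pd.DataFrame({
    "SLEEP_HOURS": [7, np.nan, 8],
    "DAILY_STRESS": [2, 3, np.nan],
    "FLOW": [1, 2, 3],
})

nulls_per_column = toy.isnull().sum(axis=0)  # one count per column
nulls_per_row = toy.isnull().sum(axis=1)     # one count per row
total_nulls = toy.isnull().sum().sum()       # grand total, as in the cells above

print(nulls_per_column)
print(nulls_per_row)
print(total_nulls)
```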

In [5]:
df.columns

Index(['Timestamp', 'FRUITS_VEGGIES', 'DAILY_STRESS', 'PLACES_VISITED',
       'CORE_CIRCLE', 'SUPPORTING_OTHERS', 'SOCIAL_NETWORK', 'ACHIEVEMENT',
       'DONATION', 'BMI_RANGE', 'TODO_COMPLETED', 'FLOW', 'DAILY_STEPS',
       'LIVE_VISION', 'SLEEP_HOURS', 'LOST_VACATION', 'DAILY_SHOUTING',
       'SUFFICIENT_INCOME', 'PERSONAL_AWARDS', 'TIME_FOR_PASSION',
       'WEEKLY_MEDITATION', 'AGE', 'GENDER', 'WORK_LIFE_BALANCE_SCORE'],
      dtype='object')

In [6]:
# Count of each SLEEP_HOURS value within each PERSONAL_AWARDS level
df.groupby('PERSONAL_AWARDS')['SLEEP_HOURS'].value_counts()

PERSONAL_AWARDS  SLEEP_HOURS
0                7              147
                 8              128
                 6              117
                 5               55
                 9               45
                               ... 
10               4               64
                 10              61
                 2                5
                 3                2
                 1                1
Name: SLEEP_HOURS, Length: 103, dtype: int64

In [7]:
df.PERSONAL_AWARDS.value_counts()

10    3765
5     2210
3     1881
4     1733
2     1382
6     1344
7     1118
8      946
1      713
0      545
9      335
Name: PERSONAL_AWARDS, dtype: int64
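Raw counts are easier to compare as shares ordered by award level. The sketch below rebuilds the counts printed above as a Series and converts them to proportions. On the full frame, the equivalent would be `df.PERSONAL_AWARDS.value_counts(normalize=True).sort_index()`. The variable names are illustrative.

```python
import pandas as pd

# Counts copied from the value_counts() output above
award_counts = pd.Series(
    {10: 3765, 5: 2210, 3: 1881, 4: 1733, 2: 1382, 6: 1344,
     7: 1118, 8: 946, 1: 713, 0: 545, 9: 335},
    name="PERSONAL_AWARDS",
)

# Share of respondents at each award level, ordered from 0 to 10
award_share = (award_counts / award_counts.sum()).sort_index().round(3)
print(award_share)
```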

<a id='PD'></a>
## ```Prepare Data```

[back](#back) | [next](#EX)

In [8]:
df = prepare.prep_wellbeing(df)

In [9]:
df

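| | diet | DAILY_STRESS | PLACES_VISITED | CORE_CIRCLE | SUPPORTING_OTHERS | SOCIAL_NETWORK | ACHIEVEMENT | donation | bmi | TODO_COMPLETED | ... | LIVE_VISION | sleep_hours | LOST_VACATION | DAILY_SHOUTING | PERSONAL_AWARDS | TIME_FOR_PASSION | WEEKLY_MEDITATION | age_range | is_female | WORK_LIFE_BALANCE_SCORE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | 2 | 2 | 5 | 0 | 5 | 2 | 0 | 1 | 6 | ... | 0 | 7 | 5 | 5 | 4 | 0 | 5 | 36 to 50 | Female | 609.5 |
| 1 | 2 | 3 | 4 | 3 | 8 | 10 | 5 | 2 | 2 | 5 | ... | 5 | 8 | 2 | 2 | 3 | 2 | 6 | 36 to 50 | Female | 655.6 |
| 2 | 2 | 3 | 3 | 4 | 4 | 10 | 3 | 2 | 2 | 2 | ... | 5 | 8 | 10 | 2 | 4 | 8 | 3 | 36 to 50 | Female | 631.6 |
| 3 | 3 | 3 | 10 | 3 | 10 | 7 | 2 | 5 | 2 | 3 | ... | 0 | 5 | 7 | 5 | 5 | 2 | 0 | 51 or more | Female | 622.7 |
| 4 | 5 | 1 | 3 | 3 | 10 | 4 | 2 | 4 | 2 | 5 | ... | 0 | 7 | 0 | 0 | 8 | 1 | 5 | 51 or more | Female | 663.9 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 15967 | 3 | 3 | 0 | 4 | 0 | 10 | 0 | 4 | 2 | 8 | ... | 7 | 7 | 0 | 1 | 5 | 2 | 5 | 51 or more | Female | 644.5 |
| 15968 | 3 | 3 | 6 | 8 | 7 | 4 | 6 | 3 | 1 | 7 | ... | 5 | 6 | 0 | 0 | 10 | 5 | 8 | 21 to 35 | Female | 714.9 |
| 15969 | 4 | 3 | 0 | 10 | 10 | 8 | 6 | 5 | 1 | 7 | ... | 2 | 7 | 0 | 1 | 10 | 1 | 10 | 21 to 35 | Male | 716.6 |
| 15970 | 1 | 1 | 10 | 8 | 2 | 7 | 3 | 2 | 1 | 6 | ... | 5 | 8 | 7 | 2 | 1 | 6 | 8 | 21 to 35 | Female | 682.0 |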
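The body of `prepare.prep_wellbeing` is not shown here. Comparing the column headers before and after the call suggests it renames several columns and drops `Timestamp` and `SUFFICIENT_INCOME`. The sketch below reproduces only those visible changes. The function name `prep_wellbeing_sketch` and the rename map are inferred from the output, so treat them as assumptions. The real helper may do more.

```python
import pandas as pd

# Renames inferred from the headers shown before and after preparation
RENAME_MAP = {
    "FRUITS_VEGGIES": "diet",
    "DONATION": "donation",
    "BMI_RANGE": "bmi",
    "SLEEP_HOURS": "sleep_hours",
    "AGE": "age_range",
    "GENDER": "is_female",
}

# Columns that no longer appear in the prepared output (inferred)
DROP_COLUMNS = ["Timestamp", "SUFFICIENT_INCOME"]


def prep_wellbeing_sketch(df):
    """Drop and rename columns to match the prepared output (hypothetical)."""
    out = df.drop(columns=DROP_COLUMNS, errors="ignore")
    return out.rename(columns=RENAME_MAP)


# One-row example using values from the first survey response
raw = pd.DataFrame({
    "Timestamp": ["7/7/15"],
    "FRUITS_VEGGIES": [3],
    "SUFFICIENT_INCOME": [1],
    "AGE": ["36 to 50"],
    "GENDER": ["Female"],
})
prepared = prep_wellbeing_sketch(raw)
print(prepared.columns.tolist())
```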


<a id='EX'></a>
## ```Explore Data```
[back](#back) | [next](#HY)
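One common first step in exploration is ranking features by their correlation with the target. The sketch below does this with Spearman correlation on synthetic data. The frame `toy`, its values, and the choice of Spearman are assumptions for illustration, not results from the survey.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
n = 300

# Synthetic stand-in for a few prepared survey columns
sleep = rng.integers(1, 11, n)
toy = pd.DataFrame({
    "sleep_hours": sleep,
    "DAILY_STRESS": rng.integers(0, 6, n),
    "ACHIEVEMENT": np.clip(sleep // 2 + rng.integers(0, 6, n), 0, 10),
})

# Rank-based correlation of every feature with the target
corr_with_target = (
    toy.corr(method="spearman")["ACHIEVEMENT"]
    .drop("ACHIEVEMENT")
    .sort_values(ascending=False)
)
print(corr_with_target)
```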

<a id='SD'></a>
#### ```Split Data```
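A minimal sketch of a train/validate/test split with `train_test_split`, assuming a 60/20/20 split and a fixed seed. The proportions, seed, and the synthetic frame `toy` are illustrative choices, not the project's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Synthetic stand-in for the prepared survey data
toy = pd.DataFrame({
    "sleep_hours": rng.integers(1, 11, size=100),
    "ACHIEVEMENT": rng.integers(0, 11, size=100),
})

# First carve off 20% as the held-out test set
train_validate, test = train_test_split(toy, test_size=0.2, random_state=123)

# Then split the remainder 75/25, giving 60/20/20 overall
train, validate = train_test_split(train_validate, test_size=0.25, random_state=123)

print(train.shape, validate.shape, test.shape)
```

Exploring and fitting on `train` only, and touching `test` once at the end, keeps the final evaluation honest.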

<a id='HY'></a>
## ```Hypothesis Testing - ANOVA```
[back](#back) | [next](#Pear)
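A one-way ANOVA tests whether the mean of a target differs across three or more groups. The sketch below runs `scipy.stats.f_oneway` on synthetic achievement scores for three hypothetical sleep groups. The group definitions, values, and the 0.05 threshold are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic achievement scores for three hypothetical sleep groups
low_sleep = rng.normal(3.5, 1.0, 50)
mid_sleep = rng.normal(4.0, 1.0, 50)
high_sleep = rng.normal(4.5, 1.0, 50)

# H0: all three groups share the same mean achievement
f_stat, p_value = stats.f_oneway(low_sleep, mid_sleep, high_sleep)

alpha = 0.05
reject_null = p_value < alpha
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

On the real data, the groups would come from filtering the training split, for example one array of `ACHIEVEMENT` values per level of a categorical feature.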

<a id='CL'></a>
## ```Cluster Data```
[back](#back) | [next](#all)
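A minimal clustering sketch with `KMeans`, assuming two features and three clusters. The feature pair, the value of `n_clusters`, and the synthetic frame `toy` are illustrative. Features are standardized first because k-means relies on Euclidean distance.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic stand-in for two prepared survey columns
toy = pd.DataFrame({
    "sleep_hours": rng.integers(1, 11, 200),
    "DAILY_STRESS": rng.integers(0, 6, 200),
})

# Put both features on the same scale before measuring distances
scaled = StandardScaler().fit_transform(toy)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=123)
toy["cluster"] = kmeans.fit_predict(scaled)

print(toy["cluster"].value_counts().sort_index())
```

In practice the number of clusters would be chosen by comparing inertia or another metric across several values of `n_clusters`.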