# Project 1

[This dataset](https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/Proj1.csv) is adapted from World Health Organization data on strokes (it's based on real data but is NOT REAL). Use this dataset to answer the following questions and perform the following tasks. Feel free to add extra cells as needed, but follow the structure listed here and clearly identify where each question is answered. Please remove any superfluous code.

## Data Information

- `reg_to_vote`: 0 if no, 1 if yes.
- `age`: age of the patient in years.
- `hypertension`: 0 if the patient doesn't have hypertension, 1 if the patient has hypertension.
- `heart_disease`: 0 if the patient doesn't have any heart diseases, 1 if the patient has a heart disease.
- `ever_married`: 0 if no, 1 if yes.
- `Residence_type`: 0 for Rural, 1 for Urban.
- `avg_glucose_level`: average glucose level in blood.
- `bmi`: body mass index.
- `smoking_status_smokes`, `smoking_status_formerly`: Whether or not the person smokes, or formerly smoked. If a person has 0's for both these columns, they never smoked.
- `stroke`: 1 if the patient had a stroke or 0 if not.
- `dog_owner`: 0 if no, 1 if yes.
- `er_visits`: number of recorded Emergency Room visits in lifetime.
- `raccoons_to_fight`: number of raccoons the patient believes they could fight off at once.
- `fast_food_budget_month`: amount (in US dollars) spent on fast food per month.


## Part I: Logistic Regression
Build a logistic regression model to predict whether or not someone had a `stroke` based on **all** the other variables in the dataset.

1. Count the missing data per column, and remove rows with missing data (if any).
2. Use 10-fold cross-validation for your model validation. Z-score your continuous variables only. Store both the train and test accuracies to check for overfitting. **Is the model overfit? How can you tell?**
3. After completing steps 1-2, fit another logistic regression model on ALL of the data (no model validation, but do z-score) using the same predictors as before, and put the coefficients into a dataframe called `coef`.
4. Print out a confusion matrix for the model you made in step 3. **What does this confusion matrix tell you about your model? How can you tell?**
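
As a rough illustration of steps 2 to 4, the sketch below runs 10-fold cross-validation with the z-scoring fitted on each training fold only, then refits on all rows to build `coef` and a confusion matrix. It is not the required solution: it uses a small synthetic data frame (the column names `age`, `bmi`, and `dog_owner` are borrowed from the dataset, but the values and the helper names `make_toy_data` and `build_model` are made up for illustration), so its numbers say nothing about the real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import KFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_toy_data(n=500, seed=0):
    """Hypothetical stand-in for the project data (synthetic values)."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "age": rng.uniform(1, 90, n),
        "bmi": rng.normal(28, 6, n),
        "dog_owner": rng.integers(0, 2, n),
    })
    # Make stroke more likely with age so the model has something to learn.
    p = 1 / (1 + np.exp(-(df["age"] - 60) / 10))
    df["stroke"] = (rng.uniform(size=n) < p).astype(int)
    return df


data = make_toy_data()
continuous = ["age", "bmi"]
predictors = ["age", "bmi", "dog_owner"]
X, y = data[predictors], data["stroke"]


def build_model():
    # Z-score only the continuous columns; binary columns pass through unchanged.
    z = make_column_transformer((StandardScaler(), continuous),
                                remainder="passthrough")
    return make_pipeline(z, LogisticRegression(max_iter=1000))


train_acc, test_acc = [], []
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    # KFold yields positions, so index with .iloc rather than by label.
    X_tr, X_te = X.iloc[train_idx], X.iloc[test_idx]
    y_tr, y_te = y.iloc[train_idx], y.iloc[test_idx]
    model = build_model().fit(X_tr, y_tr)  # scaler is fit on the training fold only
    train_acc.append(accuracy_score(y_tr, model.predict(X_tr)))
    test_acc.append(accuracy_score(y_te, model.predict(X_te)))

print("mean train accuracy:", np.mean(train_acc))
print("mean test accuracy: ", np.mean(test_acc))

# Step 3: refit on all rows and collect the coefficients.
full_model = build_model().fit(X, y)
# The column transformer outputs the scaled columns first, then the passthrough ones.
ordered_names = continuous + [c for c in predictors if c not in continuous]
coef = pd.DataFrame({"Name": ordered_names,
                     "Coef": full_model[-1].coef_[0]})

# Step 4: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y, full_model.predict(X))
print(coef)
print(cm)
```

A train accuracy that sits well above the test accuracy across folds would point toward overfitting; similar values would not. Fitting the scaler inside the loop is one way to keep information from the test fold out of the training step.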

## Part II: Data Exploration
The WHO has asked the following five questions. Create **at least 1 ggplot graph per question** (using the above data + model when needed) to help answer each question, and **explicitly answer the question in a Markdown cell** below your graph. You may use other calculations to help support your answer but MUST pair them with a graph. Write your answer as if you were explaining it to a non-data scientist. You will be graded on the effectiveness and clarity of your graph, as well as the completeness, clarity, and correctness of your responses and justifications.

1. In this specific data set, do dog owners over 50 have a higher average probability of stroke than non-dog owners who currently smoke? How can you tell?
2. What is the relationship between average blood glucose and BMI? Is the relationship between those two variables different for people who are and are not registered to vote? How can you tell?
3. Is your logistic regression model most accurate for people who make less than 30k, between 30-90k, or over 100k? Discuss the potential accuracy *and* ethical implications if your model *were* more accurate for different groups (you can use the full model from part I-3 to check accuracy; DO NOT create/fit new models for each income range.).
4. Which of the following variables is the strongest predictor of having a stroke (owning a dog, residence type, marriage, being registered to vote)? How were you able to tell?
5. Create a variable `er_visits_per_year` that calculates the # of visits to the ER that a person has had per year of life. Store this variable in your data frame (no need to include this variable in the previous logistic regression model). Is the # of ER visits per year different for stroke and non-stroke patients? How can you tell?
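
For question 5, one possible way to build `er_visits_per_year` and compare the two groups is sketched below. The rows are invented for illustration, and the sketch assumes the raw `age` and `er_visits` columns are still available: if those columns have been overwritten with z-scores, their ratio no longer means "visits per year of life".

```python
import pandas as pd

# Hypothetical rows, invented for illustration only.
toy = pd.DataFrame({
    "age": [60.0, 4.0, 77.0, 37.0, 44.0, 80.0],
    "er_visits": [9.0, 5.0, 8.0, 8.0, 11.0, 20.0],
    "stroke": [0, 0, 1, 0, 0, 1],
})

# Use the raw (un-z-scored) columns so the ratio keeps its units.
toy["er_visits_per_year"] = toy["er_visits"] / toy["age"]

# Summarise each group before plotting.
summary = toy.groupby("stroke")["er_visits_per_year"].agg(["mean", "median", "count"])
print(summary)
```

A summary table like this could then be paired with a plot, for example a `geom_boxplot` of `er_visits_per_year` by `stroke`, to satisfy the graph requirement.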

## PART I

In [73]:
# PART I
import numpy as np
import pandas as pd
from plotnine import *

from sklearn.linear_model import LogisticRegression # Logistic Regression Model
from sklearn.preprocessing import StandardScaler #Z-score variables
from sklearn.metrics import accuracy_score, confusion_matrix, ConfusionMatrixDisplay # plot_confusion_matrix was removed in scikit-learn 1.2

from sklearn.model_selection import train_test_split # simple TT split cv
from sklearn.model_selection import KFold # k-fold cv
from sklearn.model_selection import LeaveOneOut #LOO cv
from sklearn.model_selection import cross_val_score # cross validation metrics
from sklearn.model_selection import cross_val_predict # cross validation metrics

In [25]:
data = pd.read_csv('https://raw.githubusercontent.com/cmparlettpelleriti/CPSC392ParlettPelleriti/master/Data/Proj1.csv')
data.head()

Unnamed: 0,age,hypertension,heart_disease,ever_married,Residence_type,avg_glucose_level,bmi,stroke,smoking_status_smokes,smoking_status_formerly,reg_to_vote,dog_owner,raccoons_to_fight,fast_food_budget_month,income_in_k,er_visits
0,60.0,1.0,0.0,0.0,1.0,73.0,25.2,0,1,0,1.0,1.0,10.0,209.19,51.553645,9.0
1,4.0,0.0,0.0,0.0,0.0,110.15,17.1,0,0,0,0.0,1.0,13.0,176.46,45.405414,5.0
2,77.0,0.0,0.0,1.0,1.0,68.38,27.8,0,0,0,0.0,1.0,6.0,213.0,94.865174,8.0
3,37.0,0.0,0.0,1.0,1.0,95.08,30.1,0,0,0,1.0,1.0,12.0,161.9,84.123775,8.0
4,44.0,0.0,0.0,0.0,0.0,103.78,40.9,0,1,0,1.0,1.0,11.0,261.29,74.794596,11.0


In [26]:
data.isna().sum()

age                         13
hypertension                12
heart_disease               21
ever_married                 9
Residence_type              21
avg_glucose_level           31
bmi                        575
stroke                       0
smoking_status_smokes        0
smoking_status_formerly      0
reg_to_vote                 14
dog_owner                   21
raccoons_to_fight           27
fast_food_budget_month       8
income_in_k                 21
er_visits                   15
dtype: int64

In [86]:
data = data.dropna()
data = data.reset_index()

In [87]:
contpred=['age','avg_glucose_level','bmi','er_visits','raccoons_to_fight','fast_food_budget_month']
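# reset_index() above added 'level_0' and 'index' columns; exclude them so they are not used as predictors
predictors=data.columns.drop(['stroke','level_0','index'], errors='ignore')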
X = data[contpred]
zscore = StandardScaler()
zX = zscore.fit_transform(X)
data.loc[:,contpred]=zX
data.head()

Unnamed: 0,level_0,index,age,hypertension,heart_disease,ever_married,Residence_type,avg_glucose_level,bmi,stroke,smoking_status_smokes,smoking_status_formerly,reg_to_vote,dog_owner,raccoons_to_fight,fast_food_budget_month,income_in_k,er_visits
0,0,0,0.755732,1.0,0.0,0.0,1.0,-0.726799,-0.460635,0,1,0,1.0,1.0,0.012965,0.248335,51.553645,-0.021923
1,1,1,-1.706986,0.0,0.0,0.0,0.0,0.096425,-1.481171,0,0,0,0.0,1.0,0.75646,-0.548153,45.405414,-0.865736
2,2,2,1.503343,0.0,0.0,1.0,1.0,-0.829175,-0.133055,0,0,0,0.0,1.0,-0.978362,0.341052,94.865174,-0.232876
3,3,3,-0.255741,0.0,0.0,1.0,1.0,-0.237518,0.156726,0,0,0,1.0,1.0,0.508628,-0.902472,84.123775,-0.232876
4,4,4,0.052098,0.0,0.0,0.0,0.0,-0.044731,1.517441,0,1,0,1.0,1.0,0.260797,1.516195,74.794596,0.399983


In [88]:
Xtrain, Xtest, ytrain, ytest = train_test_split(data[predictors],data['stroke'], test_size=0.2)

In [89]:
#10fold

kf=KFold(n_splits=10)

X=Xtrain
y=ytrain

logit = LogisticRegression(solver='lbfgs',max_iter=1000)

acc_train = []
acc = []

for train, test in kf.split(X):
    # kf.split returns integer positions, so use .iloc for both X and y;
    # y[train] looks rows up by label and raises a KeyError once
    # train_test_split has shuffled and subset the index
    X_train = X.iloc[train]
    X_test = X.iloc[test]
    y_train = y.iloc[train]
    y_test = y.iloc[test]
    model = logit.fit(X_train, y_train)
    acc_train.append(accuracy_score(y_train,model.predict(X_train)))
    acc.append(accuracy_score(y_test,model.predict(X_test)))
    
print(acc_train)
print(acc)

2. ANSWER HERE
https://stackoverflow.com/questions/62658215/convergencewarning-lbfgs-failed-to-converge-status-1-stop-total-no-of-iter
https://stackoverflow.com/questions/53720684/use-kfold-split-to-fit-model-return-not-in-index
4. ANSWER HERE
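
The second Stack Overflow link above concerns the `not in index` KeyError. A minimal sketch of the usual cause: `KFold.split` returns integer positions, while indexing a pandas Series with `y[positions]` looks rows up by label. After `train_test_split`, the labels are a shuffled subset of the original index, so some positions no longer exist as labels. The toy series below (values and index invented for illustration) mimics that situation with a hand-built non-contiguous index.

```python
import pandas as pd
from sklearn.model_selection import KFold

# Toy target whose index labels are not 0..n-1, as happens after a shuffle/split.
y_tr = pd.Series([0, 1, 0, 1, 0, 1, 0, 1],
                 index=[10, 12, 14, 16, 18, 20, 22, 24],
                 name="stroke")

label_lookup_failed = False
for train_pos, test_pos in KFold(n_splits=4).split(y_tr):
    try:
        y_tr[test_pos]            # label-based lookup: positions are treated as labels
    except KeyError:
        label_lookup_failed = True
    fold_y = y_tr.iloc[test_pos]  # position-based lookup matches what KFold returns
    assert len(fold_y) == len(test_pos)

print("label lookup failed at least once:", label_lookup_failed)
```

Using `.iloc` for both the predictors and the target keeps the two aligned by position, regardless of what the index labels are.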

## PART II

In [2]:
# PART II, 1

1. DISCUSSION + ANSWER HERE

In [3]:
# PART II, 2

2. DISCUSSION + ANSWER HERE

In [4]:
# PART II, 3

3. DISCUSSION + ANSWER HERE

In [5]:
# PART II, 4

4. DISCUSSION + ANSWER HERE

In [6]:
# PART II, 5

5. DISCUSSION + ANSWER HERE