<a href="https://colab.research.google.com/github/cassidyhanna/AB-Demo/blob/master/module4-logistic-regression/%20U2%2CS1%2CM4_Logistic.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Lambda School Data Science

*Unit 2, Sprint 1, Module 4*

---

# Logistic Regression


## Assignment 🌯

You'll use a [**dataset of 400+ burrito reviews**](https://srcole.github.io/100burritos/). How accurately can you predict whether a burrito is rated 'Great'?

> We have developed a 10-dimensional system for rating the burritos in San Diego. ... Generate models for what makes a burrito great and investigate correlations in its dimensions.

- [ ] Do train/validate/test split. Train on reviews from 2016 & earlier. Validate on 2017. Test on 2018 & later.
- [ ] Begin with baselines for classification.
- [ ] Use scikit-learn for logistic regression.
- [ ] Get your model's validation accuracy. (Multiple times if you try multiple iterations.)
- [ ] Get your model's test accuracy. (One time, at the end.)
- [ ] Commit your notebook to your fork of the GitHub repo.
- [ ] Watch Aaron's [video #1](https://www.youtube.com/watch?v=pREaWFli-5I) (12 minutes) & [video #2](https://www.youtube.com/watch?v=bDQgVt4hFgY) (9 minutes) to learn about the mathematics of Logistic Regression.


## Stretch Goals

- [ ] Add your own stretch goal(s)!
- [ ] Make exploratory visualizations.
- [ ] Do one-hot encoding.
- [ ] Do [feature scaling](https://scikit-learn.org/stable/modules/preprocessing.html).
- [ ] Get and plot your coefficients.
- [ ] Try [scikit-learn pipelines](https://scikit-learn.org/stable/modules/compose.html).

In [0]:
%%capture
import sys

# If you're on Colab:
if 'google.colab' in sys.modules:
    DATA_PATH = 'https://raw.githubusercontent.com/LambdaSchool/DS-Unit-2-Linear-Models/master/data/'
    !pip install category_encoders==2.*
    !pip install pandas-profiling==2.*

# If you're working locally:
else:
    DATA_PATH = '../data/'

In [0]:
# Load data downloaded from https://srcole.github.io/100burritos/
import pandas as pd
import numpy as np
import pandas_profiling as pp
import category_encoders as ce
from sklearn.dummy import DummyClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


df = pd.read_csv(DATA_PATH+'burritos/burritos.csv')

In [0]:
# Derive binary classification target:
# We define a 'Great' burrito as having an
# overall rating of 4 or higher, on a 5 point scale.
# Drop unrated burritos.
df = df.dropna(subset=['overall'])
df['Great'] = df['overall'] >= 4

In [0]:
# Clean/combine the Burrito categories
df['Burrito'] = df['Burrito'].str.lower()

california = df['Burrito'].str.contains('california')
asada = df['Burrito'].str.contains('asada')
surf = df['Burrito'].str.contains('surf')
carnitas = df['Burrito'].str.contains('carnitas')

df.loc[california, 'Burrito'] = 'California'
df.loc[asada, 'Burrito'] = 'Asada'
df.loc[surf, 'Burrito'] = 'Surf & Turf'
df.loc[carnitas, 'Burrito'] = 'Carnitas'
df.loc[~california & ~asada & ~surf & ~carnitas, 'Burrito'] = 'Other'

In [0]:
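# Clean column names: replace spaces and dots with underscores,
# then strip the unit suffixes "(g)" and "(g/mL)"
df.columns = [c.replace(' ', '_') for c in df.columns]
df.columns = [c.replace('.', '_') for c in df.columns]
df.columns = [c.replace("(g)", "") for c in df.columns]
df.columns = [c.replace("(g/mL)", "") for c in df.columns]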


In [0]:
# Drop some high cardinality categoricals
df = df.drop(columns=['Notes', 'Location', 'Reviewer', 'Address', 'URL', 'Neighborhood'])

In [0]:
# Drop some columns to prevent "leakage"
df = df.drop(columns=['Rec', 'overall'])
# Convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'])
# Set the dataframe index to the date
df = df.set_index('Date')
# Create a column that holds the year
df['Year'] = df.index.year


In [8]:
df

Date,Burrito,Yelp,Google,Chips,Cost,Hunger,Mass_,Density_,Length,Circum,Volume,Tortilla,Temp,Meat,Fillings,Meat:filling,Uniformity,Salsa,Synergy,Wrap,Unreliable,NonSD,Beef,Pico,Guac,Cheese,Fries,Sour_cream,Pork,Chicken,Shrimp,Fish,Rice,Beans,Lettuce,Tomato,Bell_peper,Carrots,Cabbage,Sauce,Salsa_1,Cilantro,Onion,Taquito,Pineapple,Ham,Chile_relleno,Nopales,Lobster,Queso,Egg,Mushroom,Bacon,Sushi,Avocado,Corn,Zucchini,Great,Year
2016-01-18,California,3.5,4.2,,6.49,3.0,,,,,,3.0,5.0,3.0,3.5,4.0,4.0,4.0,4.0,4.0,,,x,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,California,3.5,3.3,,5.45,3.5,,,,,,2.0,3.5,2.5,2.5,2.0,4.0,3.5,2.5,5.0,,,x,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,Carnitas,,,,4.85,1.5,,,,,,3.0,2.0,2.5,3.0,4.5,4.0,3.0,3.0,5.0,,,,x,x,,,,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,Asada,,,,5.25,2.0,,,,,,3.0,2.0,3.5,3.0,4.0,5.0,4.0,4.0,5.0,,,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-27,California,4.0,3.8,x,6.59,4.0,,,,,,4.0,5.0,4.0,3.5,4.5,5.0,2.5,4.5,4.0,,,x,x,,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2019-08-27,Other,,,,6.00,1.0,,,17.0,20.5,0.57,5.0,4.0,3.5,,4.0,4.0,2.0,2.0,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2019
2019-08-27,Other,,,,6.00,4.0,,,19.0,26.0,1.02,4.0,5.0,,3.5,4.0,4.0,5.0,4.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,2019
2019-08-27,California,,,,7.90,3.0,,,20.0,22.0,0.77,4.0,4.0,4.0,3.7,3.0,2.0,3.5,4.0,4.5,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2019
2019-08-27,Other,,,,7.90,3.0,,,22.5,24.5,1.07,5.0,2.0,5.0,5.0,5.0,2.0,5.0,5.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,2019


In [9]:
df.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 421 entries, 2016-01-18 to 2019-08-27
Data columns (total 59 columns):
Burrito          421 non-null object
Yelp             87 non-null float64
Google           87 non-null float64
Chips            26 non-null object
Cost             414 non-null float64
Hunger           418 non-null float64
Mass_            22 non-null float64
Density_         22 non-null float64
Length           283 non-null float64
Circum           281 non-null float64
Volume           281 non-null float64
Tortilla         421 non-null float64
Temp             401 non-null float64
Meat             407 non-null float64
Fillings         418 non-null float64
Meat:filling     412 non-null float64
Uniformity       419 non-null float64
Salsa            396 non-null float64
Synergy          419 non-null float64
Wrap             418 non-null float64
Unreliable       33 non-null object
NonSD            7 non-null object
Beef             179 non-null object
Pico           

In [10]:
df.describe()

,Yelp,Google,Cost,Hunger,Mass_,Density_,Length,Circum,Volume,Tortilla,Temp,Meat,Fillings,Meat:filling,Uniformity,Salsa,Synergy,Wrap,Queso,Year
count,87.0,87.0,414.0,418.0,22.0,22.0,283.0,281.0,281.0,421.0,401.0,407.0,418.0,412.0,419.0,396.0,419.0,418.0,0.0,421.0
mean,3.887356,4.167816,7.067343,3.495335,546.181818,0.675277,20.038233,22.135765,0.786477,3.519477,3.783042,3.620393,3.539833,3.586481,3.428998,3.37197,3.586993,3.979904,,2016.410926
std,0.475396,0.373698,1.506742,0.812069,144.445619,0.080468,2.083518,1.779408,0.152531,0.794438,0.980338,0.829254,0.799549,0.997057,1.068794,0.924037,0.886807,1.118185,,0.896965
min,2.5,2.9,2.99,0.5,350.0,0.56,15.0,17.0,0.4,1.0,1.0,1.0,1.0,0.5,0.0,0.0,1.0,0.0,,2011.0
25%,3.5,4.0,6.25,3.0,450.0,0.619485,18.5,21.0,0.68,3.0,3.0,3.0,3.0,3.0,2.6,3.0,3.0,3.5,,2016.0
50%,4.0,4.2,6.99,3.5,540.0,0.658099,20.0,22.0,0.77,3.5,4.0,3.8,3.5,4.0,3.5,3.5,3.8,4.0,,2016.0
75%,4.0,4.4,7.88,4.0,595.0,0.721726,21.5,23.0,0.88,4.0,4.5,4.0,4.0,4.0,4.0,4.0,4.0,5.0,,2017.0
max,4.5,5.0,25.0,5.0,925.0,0.865672,26.0,29.0,1.54,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,,2026.0


In [11]:
df.head(10)

Date,Burrito,Yelp,Google,Chips,Cost,Hunger,Mass_,Density_,Length,Circum,Volume,Tortilla,Temp,Meat,Fillings,Meat:filling,Uniformity,Salsa,Synergy,Wrap,Unreliable,NonSD,Beef,Pico,Guac,Cheese,Fries,Sour_cream,Pork,Chicken,Shrimp,Fish,Rice,Beans,Lettuce,Tomato,Bell_peper,Carrots,Cabbage,Sauce,Salsa_1,Cilantro,Onion,Taquito,Pineapple,Ham,Chile_relleno,Nopales,Lobster,Queso,Egg,Mushroom,Bacon,Sushi,Avocado,Corn,Zucchini,Great,Year
2016-01-18,California,3.5,4.2,,6.49,3.0,,,,,,3.0,5.0,3.0,3.5,4.0,4.0,4.0,4.0,4.0,,,x,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,California,3.5,3.3,,5.45,3.5,,,,,,2.0,3.5,2.5,2.5,2.0,4.0,3.5,2.5,5.0,,,x,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,Carnitas,,,,4.85,1.5,,,,,,3.0,2.0,2.5,3.0,4.5,4.0,3.0,3.0,5.0,,,,x,x,,,,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-24,Asada,,,,5.25,2.0,,,,,,3.0,2.0,3.5,3.0,4.0,5.0,4.0,4.0,5.0,,,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-27,California,4.0,3.8,x,6.59,4.0,,,,,,4.0,5.0,4.0,3.5,4.5,5.0,2.5,4.5,4.0,,,x,x,,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,True,2016
2016-01-28,Other,,,,6.99,4.0,,,,,,3.0,4.0,5.0,3.5,2.5,2.5,2.5,4.0,1.0,,,,,x,x,,x,,x,,,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-30,California,3.0,2.9,,7.19,1.5,,,,,,2.0,3.0,3.0,2.0,2.5,2.5,,2.0,3.0,,,x,,,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-01-30,Carnitas,,,,6.99,4.0,,,,,,2.5,3.0,3.0,2.5,3.0,3.5,,2.5,3.0,,,,x,x,,,,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-02-01,California,3.0,3.7,x,9.25,3.5,,,,,,2.0,4.5,4.5,3.5,1.5,3.0,3.5,4.0,2.0,,,x,x,x,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016
2016-02-06,Asada,4.0,4.1,,6.25,3.5,,,,,,2.5,1.5,1.5,3.0,4.5,3.0,1.5,2.0,4.5,,,x,x,x,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,False,2016


In [12]:
df.isnull().sum()

Burrito            0
Yelp             334
Google           334
Chips            395
Cost               7
Hunger             3
Mass_            399
Density_         399
Length           138
Circum           140
Volume           140
Tortilla           0
Temp              20
Meat              14
Fillings           3
Meat:filling       9
Uniformity         2
Salsa             25
Synergy            2
Wrap               3
Unreliable       388
NonSD            414
Beef             242
Pico             263
Guac             267
Cheese           262
Fries            294
Sour_cream       329
Pork             370
Chicken          400
Shrimp           400
Fish             415
Rice             385
Beans            386
Lettuce          410
Tomato           414
Bell_peper       414
Carrots          420
Cabbage          413
Sauce            383
Salsa_1          414
Cilantro         406
Onion            404
Taquito          417
Pineapple        414
Ham              419
Chile_relleno    417
Nopales      

In [13]:
# Split by review year: train on 2016 and earlier, validate on 2017, test on 2018 and later
train = df[df['Year'] <= 2016]
val = df[df['Year'] == 2017]
test = df[df['Year'] >= 2018]
train.shape, val.shape, test.shape

((298, 59), (85, 59), (38, 59))

In [14]:
train.Great.value_counts(normalize=True)

False    0.590604
True     0.409396
Name: Great, dtype: float64

In [15]:
val.Great.value_counts(normalize=True)

False    0.552941
True     0.447059
Name: Great, dtype: float64

In [0]:
# Summary statistics for the non-numeric columns
categorical_features = train.describe(exclude='number')

In [0]:
categorical_feature_mask = train.dtypes==object

In [18]:
categorical_features

,Burrito,Chips,Unreliable,NonSD,Beef,Pico,Guac,Cheese,Fries,Sour_cream,Pork,Chicken,Shrimp,Fish,Rice,Beans,Lettuce,Tomato,Bell_peper,Carrots,Cabbage,Sauce,Salsa_1,Cilantro,Onion,Taquito,Pineapple,Ham,Chile_relleno,Nopales,Lobster,Egg,Mushroom,Bacon,Sushi,Avocado,Corn,Zucchini,Great
count,298,22,27,5,168,143,139,149,119,85,43,20,20,5,33,32,11,7,7,1,7,37,6,15,17,4,7,1,4,4,1,4,3,3,2,13,2,1,298
unique,5,2,1,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,1,2,2,2,2,2,2,2,1,1,1,1,1,1,1,1,1,2,1,2
top,California,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,x,False
freq,118,19,27,3,130,115,101,121,97,63,29,19,17,3,24,24,9,5,4,1,5,33,5,9,9,3,5,1,4,4,1,4,3,3,2,13,1,1,176


In [19]:
train.describe(include='number')

,Yelp,Google,Cost,Hunger,Mass_,Density_,Length,Circum,Volume,Tortilla,Temp,Meat,Fillings,Meat:filling,Uniformity,Salsa,Synergy,Wrap,Queso,Year
count,71.0,71.0,292.0,297.0,0.0,0.0,175.0,174.0,174.0,298.0,283.0,288.0,297.0,292.0,296.0,278.0,296.0,296.0,0.0,298.0
mean,3.897183,4.142254,6.896781,3.445286,,,19.829886,22.042241,0.77092,3.472315,3.70636,3.551215,3.519024,3.52887,3.395946,3.32464,3.540203,3.955068,,2015.979866
std,0.47868,0.371738,1.211412,0.85215,,,2.081275,1.685043,0.137833,0.797606,0.991897,0.869483,0.850348,1.040457,1.089044,0.971226,0.922426,1.167341,,0.295187
min,2.5,2.9,2.99,0.5,,,15.0,17.0,0.4,1.4,1.0,1.0,1.0,0.5,1.0,0.0,1.0,0.0,,2011.0
25%,3.5,4.0,6.25,3.0,,,18.5,21.0,0.6625,3.0,3.0,3.0,3.0,3.0,2.5,2.5,3.0,3.5,,2016.0
50%,4.0,4.2,6.85,3.5,,,19.5,22.0,0.75,3.5,4.0,3.5,3.5,4.0,3.5,3.5,3.75,4.0,,2016.0
75%,4.0,4.4,7.5,4.0,,,21.0,23.0,0.87,4.0,4.5,4.0,4.0,4.0,4.0,4.0,4.0,5.0,,2016.0
max,4.5,4.9,11.95,5.0,,,26.0,27.0,1.24,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,,2016.0


In [0]:
target = 'Great'
features = ['Yelp','Google','Cost','Length','Circum','Volume','Tortilla','Temp']

In [21]:
X_train = train[features]
y_train = train[target]
X_val = val[features]
y_val = val[target]
X_test = test[features]
y_test = test[target]
X_train.shape, y_train.shape, X_val.shape, y_val.shape, X_test.shape, y_test.shape

((298, 8), (298,), (85, 8), (85,), (38, 8), (38,))

**Baseline** 

In [22]:
y_train.value_counts(normalize=True)

False    0.590604
True     0.409396
Name: Great, dtype: float64

In [0]:
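# Majority-class baseline: predict the most frequent training label for every row
majority_class = y_train.mode()[0]
y_pred = [majority_class] * len(y_train)

In [24]:
# Training accuracy of the majority-class baseline
accuracy_score(y_train, y_pred)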

0.5906040268456376

In [25]:
# Fit DummyClassifier
baseline = DummyClassifier(strategy='most_frequent')
baseline.fit(X_train, y_train)

# Make predictions on the validation set
y_val_pred = baseline.predict(X_val)
accuracy_score(y_val, y_val_pred)

0.5529411764705883
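
The two baseline cells above compute the same kind of score: always predicting the most frequent training class earns exactly that class's share of whatever labels it is evaluated on. A minimal sketch with made-up labels (`y_train_demo`, `y_val_demo`, and the zero-filled feature arrays are illustrative, not the burrito data):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score

# Illustrative labels only; not taken from the burrito dataset.
y_train_demo = np.array([False, False, False, True, True])
y_val_demo = np.array([True, False, True, False])

# The 'most_frequent' strategy ignores features, so zeros suffice.
X_train_demo = np.zeros((len(y_train_demo), 1))
X_val_demo = np.zeros((len(y_val_demo), 1))

baseline_demo = DummyClassifier(strategy='most_frequent')
baseline_demo.fit(X_train_demo, y_train_demo)
dummy_acc = accuracy_score(y_val_demo, baseline_demo.predict(X_val_demo))

# Manual version: find the training majority class, then measure its
# share among the validation labels.
values, counts = np.unique(y_train_demo, return_counts=True)
majority_demo = values[np.argmax(counts)]
manual_acc = np.mean(y_val_demo == majority_demo)

print(dummy_acc, manual_acc)
```

Any model worth keeping should beat this number on the validation set.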

In [0]:
# Impute missing values with the training-set column means
imputer = SimpleImputer(strategy='mean')
X_train_imputed = imputer.fit_transform(X_train)
X_val_imputed = imputer.transform(X_val)

In [27]:
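# One-hot encode categorical columns. Every selected feature is numeric,
# so this step leaves the 8 columns unchanged.
encoder = ce.OneHotEncoder(use_cat_names=True)
X_train_encoded = encoder.fit_transform(X_train_imputed)
X_val_encoded = encoder.transform(X_val_imputed)

X_train_encoded.shape, X_val_encoded.shape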

((298, 8), (85, 8))
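
Both shapes still show 8 columns because none of the selected features is categorical. A minimal sketch of what one-hot encoding does to a categorical column such as `Burrito`, on a made-up frame. The rows are illustrative, and pandas `get_dummies` is swapped in for the notebook's `ce.OneHotEncoder` to keep the example small:

```python
import pandas as pd

# Illustrative rows only; not taken from the burrito dataset.
train_demo = pd.DataFrame({'Burrito': ['California', 'Asada', 'Other'],
                           'Cost': [6.49, 5.25, 6.99]})
val_demo = pd.DataFrame({'Burrito': ['Asada', 'Carnitas'],
                         'Cost': [6.25, 4.85]})

# One indicator column per category seen in the training data.
train_encoded = pd.get_dummies(train_demo, columns=['Burrito'])

# Align validation columns to the training columns: categories unseen in
# training (Carnitas) are dropped, and missing ones are filled with 0.
val_encoded = (pd.get_dummies(val_demo, columns=['Burrito'])
               .reindex(columns=train_encoded.columns, fill_value=0))

print(train_encoded.columns.tolist())
print(val_encoded.shape)
```

Aligning the validation frame to the training columns matters because the model expects the same columns, in the same order, at prediction time.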

In [28]:
# Import estimator class
from sklearn.linear_model import LinearRegression

# Fit linear regression on the boolean target for comparison.
# Its predictions are not constrained to lie between 0 and 1.
linear_reg = LinearRegression()
linear_reg.fit(X_train_imputed, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)

In [29]:
linear_reg.predict(X_val_imputed)

array([ 0.65064006,  0.34763066,  0.53138899,  0.53446251,  0.17246926,
        0.10163028,  0.73369874,  0.4302021 ,  0.61342886,  0.81861254,
        0.6814896 ,  0.0665614 ,  0.26665898,  0.1296919 ,  0.84500398,
        0.24270132,  0.34794289,  0.48765922,  0.34359257,  0.91981296,
        0.51791903,  0.37564963,  0.37306478,  0.35972964,  0.53991757,
        0.49409606,  0.09975711,  0.5192457 ,  0.17983742,  0.44558356,
        0.74921326,  0.52360612,  0.11346115,  0.41273904, -0.10735077,
       -0.05351471,  0.49436132,  0.650346  ,  0.53514757,  0.60474491,
        0.72666784,  0.27537755,  0.51826215,  0.59546248,  0.78191097,
        0.60034258,  0.82737206,  0.56122827,  0.64786661,  0.27973545,
        0.36778506,  0.53831528,  0.390122  ,  0.39110379,  0.26659157,
        0.32714524,  0.15566974,  0.04597867,  0.48729474,  0.29330056,
       -0.27886675,  0.39535041,  0.4088467 ,  0.55914427,  0.72338785,
        0.71446626,  0.6076061 ,  0.61240147,  0.36356473,  0.56

In [30]:
pd.Series(linear_reg.coef_, features)

Yelp        0.120877
Google     -0.033042
Cost        0.071247
Length      0.172563
Circum      0.309714
Volume     -4.393267
Tortilla    0.196148
Temp        0.107672
dtype: float64
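
Several of the linear-regression predictions above are negative, so they cannot be read as probabilities. Logistic regression instead passes a linear score through the sigmoid function, which maps any real number into the interval (0, 1). A minimal sketch; the `scores` array is made up for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to values strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative linear scores (intercept plus coefficients times features).
scores = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
probabilities = sigmoid(scores)

# Thresholding the probability at 0.5 corresponds to checking whether
# the underlying score is non-negative.
predictions = probabilities >= 0.5

print(probabilities.round(3))
print(predictions)
```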

**Logistic Regression** 

In [33]:
# Instantiate and fit the model
log_reg = LogisticRegression(solver='lbfgs', max_iter=1000)
log_reg.fit(X_train_imputed, y_train)
print('Validation Accuracy Score:', log_reg.score(X_val_encoded, y_val))

Validation Accuracy Score: 0.5647058823529412
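
The impute-then-fit steps above can be bundled into a single scikit-learn pipeline, which also makes it easy to add feature scaling and to read off the coefficients (two of the stretch goals). This is a sketch on synthetic data, not the notebook's result: `X_demo`, `y_demo`, and the rule that generates the labels are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; column names mirror two of the notebook's features.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({'Cost': rng.normal(7, 1.5, 200),
                       'Tortilla': rng.uniform(1, 5, 200)})
# Hypothetical labeling rule: a high Tortilla score (plus noise) means "great".
y_demo = (X_demo['Tortilla'] + rng.normal(0, 0.5, 200)) >= 3.5
X_demo.loc[::10, 'Cost'] = np.nan  # simulate missing values

# Impute, standardize, then fit; each step is fit on training rows only.
pipeline_demo = make_pipeline(
    SimpleImputer(strategy='mean'),
    StandardScaler(),
    LogisticRegression(solver='lbfgs', max_iter=1000),
)
pipeline_demo.fit(X_demo.iloc[:150], y_demo.iloc[:150])
val_acc_demo = pipeline_demo.score(X_demo.iloc[150:], y_demo.iloc[150:])
print('Demo validation accuracy:', val_acc_demo)

# Coefficients on the standardized features, labeled by column name.
coefs = pd.Series(pipeline_demo.named_steps['logisticregression'].coef_[0],
                  index=X_demo.columns)
print(coefs)
```

Because the features are standardized before fitting, the coefficient magnitudes are roughly comparable across columns, which is what makes a coefficient plot such as `coefs.sort_values().plot.barh()` meaningful.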


In [0]:
y_val_pred = log_reg.predict(X_val_encoded)


In [38]:
X_test_imputed = imputer.transform(X_test)
X_test_encoded = encoder.transform(X_test_imputed)
print('Test Accuracy Score:', log_reg.score(X_test_encoded, y_test))

Test Accuracy Score: 0.6052631578947368


In [39]:
log_reg.predict(X_test_encoded)

array([ True,  True, False,  True,  True,  True, False,  True, False,
        True,  True,  True,  True, False,  True,  True,  True, False,
       False, False, False, False,  True,  True,  True,  True, False,
        True,  True,  True, False, False,  True,  True,  True,  True,
        True, False])