# Fitting a Logistic Regression Model - Lab

## Introduction

In the last lesson you were given a broad overview of logistic regression, including an introduction to two separate packages for creating logistic regression models. In this lab, you'll fit logistic regression models with `statsmodels`. For your first foray into logistic regression, you'll build a model that classifies whether an individual survived the [Titanic](https://www.kaggle.com/c/titanic/data) shipwreck (yes, it's a bit morbid).


## Objectives

In this lab you will: 

* Implement logistic regression with `statsmodels` 
* Interpret the statistical results associated with model parameters

## Import the data

Import the data stored in the file `'titanic.csv'` and print the first few rows of the DataFrame to check its contents.

In [17]:
import pandas as pd
import numpy as np 
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm


pd.options.display.max_columns = None

In [18]:
# Import the data


df = pd.read_csv('titanic.csv')
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [19]:
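# Encode Sex as a binary indicator: 1 = male, 0 = female.
# Any other value becomes missing (NaN).
df['Sex'] = df['Sex'].map({'male': 1, 'female': 0})

# One-hot encode Cabin and Embarked and append the dummy columns.
# Cabin has many distinct values, so this adds one column per cabin string.
cabin_dummies = pd.get_dummies(df['Cabin'], prefix='cabin')
embark_dummies = pd.get_dummies(df['Embarked'], prefix='embark')
df = df.join(cabin_dummies).join(embark_dummies)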

In [20]:
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,cabin_A10,cabin_A14,cabin_A16,cabin_A19,cabin_A20,cabin_A23,cabin_A24,cabin_A26,cabin_A31,cabin_A32,cabin_A34,cabin_A36,cabin_A5,cabin_A6,cabin_A7,cabin_B101,cabin_B102,cabin_B18,cabin_B19,cabin_B20,cabin_B22,cabin_B28,cabin_B3,cabin_B30,cabin_B35,cabin_B37,cabin_B38,cabin_B39,cabin_B4,cabin_B41,cabin_B42,cabin_B49,cabin_B5,cabin_B50,cabin_B51 B53 B55,cabin_B57 B59 B63 B66,cabin_B58 B60,cabin_B69,cabin_B71,cabin_B73,cabin_B77,cabin_B78,cabin_B79,cabin_B80,cabin_B82 B84,cabin_B86,cabin_B94,cabin_B96 B98,cabin_C101,cabin_C103,cabin_C104,cabin_C106,cabin_C110,cabin_C111,cabin_C118,cabin_C123,cabin_C124,cabin_C125,cabin_C126,cabin_C128,cabin_C148,cabin_C2,cabin_C22 C26,cabin_C23 C25 C27,cabin_C30,cabin_C32,cabin_C45,cabin_C46,cabin_C47,cabin_C49,cabin_C50,cabin_C52,cabin_C54,cabin_C62 C64,cabin_C65,cabin_C68,cabin_C7,cabin_C70,cabin_C78,cabin_C82,cabin_C83,cabin_C85,cabin_C86,cabin_C87,cabin_C90,cabin_C91,cabin_C92,cabin_C93,cabin_C95,cabin_C99,cabin_D,cabin_D10 D12,cabin_D11,cabin_D15,cabin_D17,cabin_D19,cabin_D20,cabin_D21,cabin_D26,cabin_D28,cabin_D30,cabin_D33,cabin_D35,cabin_D36,cabin_D37,cabin_D45,cabin_D46,cabin_D47,cabin_D48,cabin_D49,cabin_D50,cabin_D56,cabin_D6,cabin_D7,cabin_D9,cabin_E10,cabin_E101,cabin_E12,cabin_E121,cabin_E17,cabin_E24,cabin_E25,cabin_E31,cabin_E33,cabin_E34,cabin_E36,cabin_E38,cabin_E40,cabin_E44,cabin_E46,cabin_E49,cabin_E50,cabin_E58,cabin_E63,cabin_E67,cabin_E68,cabin_E77,cabin_E8,cabin_F E69,cabin_F G63,cabin_F G73,cabin_F2,cabin_F33,cabin_F38,cabin_F4,cabin_G6,cabin_T,embark_C,embark_Q,embark_S
0,1,0,3,"Braund, Mr. Owen Harris",1,22.0,1,0,A/5 21171,7.25,,S,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",0,38.0,1,0,PC 17599,71.2833,C85,C,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0


## Define independent and target variables

Your target variable is in the column `'Survived'`. A `0` indicates that the passenger didn't survive the shipwreck. Print the total number of people who didn't survive the shipwreck. How many people survived?
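The counts asked for here can be read directly off the target column. The snippet below is a minimal sketch that uses a small made-up DataFrame named `toy` (hypothetical, not the real data) so it runs on its own; with the lab data you would apply the same calls to `df['Survived']`.

```python
import pandas as pd

# Hypothetical miniature stand-in for the 'Survived' column of titanic.csv
toy = pd.DataFrame({"Survived": [0, 1, 1, 0, 0, 1, 0]})

# Count each class: 0 = did not survive, 1 = survived
n_died = int((toy["Survived"] == 0).sum())
n_survived = int((toy["Survived"] == 1).sum())

print(f"Did not survive: {n_died}")
print(f"Survived: {n_survived}")

# value_counts() gives the same tally in a single call
print(toy["Survived"].value_counts())
```

Checking the class balance first is useful because it sets the baseline accuracy that any classifier has to beat.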

In [68]:
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import MinMaxScaler

# Drop rows with any missing value, then split into target and features
column_ignore = ['Name', 'PassengerId', 'Ticket', 'Cabin', 'Embarked']
complete = df.dropna()
y = complete['Survived'].values  # 1-D target, which avoids scikit-learn's column-vector warning
# NOTE: 'Survived' is not in column_ignore, so the target itself is among the features.
# That leakage explains the near-perfect accuracy printed below.
X = complete[[i for i in df.columns if i not in column_ignore]].values

# Scale every feature to the [0, 1] range
X = MinMaxScaler().fit_transform(X)
# test_size=.85 holds out 85% of the rows for testing and trains on the remaining 15%
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=.85)

logi = LogisticRegression()
logi.fit(x_train, y_train)
predictions = logi.predict(x_test)
print(accuracy_score(y_test, predictions))

0.9935897435897436
