# Logistic Regression

#### Logistic regression is a simple machine learning method you can use to predict the value of a categorical variable (usually a binary one) based on its relationship with one or more predictor variables.
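Under the hood, logistic regression takes a linear combination of the predictors (the log-odds) and passes it through the logistic (sigmoid) function to get a probability between 0 and 1. The sketch below shows that mapping with NumPy; the intercept, coefficients, and observations are hypothetical values chosen for illustration, not fitted from any data.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted parameters for a model with two predictors (illustration only)
intercept = -1.0
coefs = np.array([0.8, -0.5])

# Two hypothetical observations, one row each
X = np.array([[2.0, 1.0],
              [0.5, 3.0]])

# Step 1: linear combination of the predictors gives the log-odds (about 0.1 and -2.1 here)
log_odds = intercept + X @ coefs

# Step 2: the sigmoid turns log-odds into probabilities of class 1
probs = sigmoid(log_odds)

# Step 3: threshold at 0.5 to get a class label (0 or 1)
labels = (probs > 0.5).astype(int)
print(probs.round(3), labels)
```

In scikit-learn, `LogisticRegression` learns `intercept_` and `coef_` from the training data, and `predict_proba` returns the probabilities computed in step 2.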

#### Use cases:
 * Customer Churn Prediction
 * Employee Attrition Modelling
 * Purchase Propensity vs. Ad Spend Analysis
 
#### Logistic Regression Assumptions:
 * Data is free of missing values
 * The target (predictant) variable is binary (that is, it takes only two values) or ordinal (that is, a categorical variable with ordered values)
 * Predictors are independent of each other (no strong multicollinearity)
 * There are at least 50 observations per predictor variable (to ensure reliable results)
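The assumptions above can be checked roughly in code before fitting a model. The sketch below is a hypothetical helper, not part of this notebook; it assumes all predictors are numeric, handles only the binary-target case, and uses an arbitrary |r| > 0.8 cutoff as a stand-in for "not independent".

```python
import numpy as np
import pandas as pd

def check_logistic_assumptions(df, target, min_obs_per_predictor=50, corr_cutoff=0.8):
    """Rough, hypothetical checks of the listed assumptions (a sketch, not a formal test)."""
    predictors = [c for c in df.columns if c != target]
    report = {
        # Assumption 1: no missing values anywhere in the data
        "no_missing_values": not df.isnull().values.any(),
        # Assumption 2 (binary case only): the target takes exactly two distinct values
        "binary_target": df[target].nunique() == 2,
        # Assumption 4: at least `min_obs_per_predictor` rows per predictor
        "enough_observations": len(df) >= min_obs_per_predictor * len(predictors),
    }
    # Assumption 3: flag predictor pairs whose absolute Pearson correlation exceeds the cutoff
    corr = df[predictors].corr().abs()
    report["highly_correlated_pairs"] = [
        (a, b)
        for i, a in enumerate(predictors)
        for b in predictors[i + 1:]
        if corr.loc[a, b] > corr_cutoff
    ]
    return report

# Synthetic data: x2 is deliberately almost a copy of x1, so that pair should be flagged
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
data = pd.DataFrame({
    "x1": x1,
    "x2": 2 * x1 + rng.normal(scale=0.01, size=n),
    "x3": rng.normal(size=n),
    "y": rng.integers(0, 2, size=n),
})
report = check_logistic_assumptions(data, target="y")
print(report)
```

Pairwise correlation only catches two-variable dependence; a variance inflation factor check is a common, more thorough alternative.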

In [1]:
# Data handling and plotting
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import sklearn
import seaborn as sb
from pandas import Series, DataFrame
from matplotlib import rcParams  # same rcParams object that pylab re-exports

# Modeling and evaluation tools from scikit-learn
from sklearn import preprocessing
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, classification_report
from collections import Counter
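The scikit-learn tools imported above are commonly chained together as in the sketch below. It uses synthetic data from `make_classification` in place of a prepared dataset (an assumption for illustration), and the 80/20 split and 5-fold cross-validation are arbitrary choices, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_predict
from sklearn.metrics import confusion_matrix, precision_score, recall_score, classification_report

# Synthetic binary data stands in for a prepared dataset (an assumption for illustration)
X, y = make_classification(n_samples=500, n_features=4, n_informative=3,
                           n_redundant=0, random_state=42)

# Hold out 20% of the rows for a final check
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression()

# Out-of-fold predictions: each training row is predicted by a model fit on the other folds
y_train_pred = cross_val_predict(model, X_train, y_train, cv=5)
cm = confusion_matrix(y_train, y_train_pred)  # rows = actual class, columns = predicted class
precision = precision_score(y_train, y_train_pred)
recall = recall_score(y_train, y_train_pred)
print(cm)
print(f"precision={precision:.3f}  recall={recall:.3f}")

# Fit on the full training split, then evaluate once on the held-out test split
model.fit(X_train, y_train)
y_test_pred = model.predict(X_test)
print(classification_report(y_test, y_test_pred))
```

`cross_val_predict` fits clones of the estimator, so `model` itself is still unfitted until the explicit `fit` call.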

In [2]:
%matplotlib inline
rcParams['figure.figsize'] = 5, 4  # default figure size in inches (width, height)
sb.set_style('whitegrid')          # seaborn style with a light background grid

#### Logistic regression on the Titanic dataset

In [3]:
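# Path on the author's machine; point this at your own copy of titanic-training-data.csv
address = 'C:/Users/Fred/Documents/GitHub/Data Science/DataScience/Exercise Files/Data/titanic-training-data.csv'
titanic_training = pd.read_csv(address)
# Assign explicit column names
titanic_training.columns = [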
    'PassengerId', 
    'Survived', 
    'Pclass', 
    'Name', 
    'Sex', 
    'Age', 
    'SibSp', 
    'Parch', 
    'Ticket', 
    'Fare', 
    'Cabin',
    'Embarked']


In [4]:
print(titanic_training.head())

   PassengerId  Survived  Pclass  \
0            1         0       3   
1            2         1       1   
2            3         1       3   
3            4         1       1   
4            5         0       3   

                                                Name     Sex   Age  SibSp  \
0                            Braund, Mr. Owen Harris    male  22.0      1   
1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   
2                             Heikkinen, Miss. Laina  female  26.0      0   
3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   
4                           Allen, Mr. William Henry    male  35.0      0   

   Parch            Ticket     Fare Cabin Embarked  
0      0         A/5 21171   7.2500   NaN        S  
1      0          PC 17599  71.2833   C85        C  
2      0  STON/O2. 3101282   7.9250   NaN        S  
3      0            113803  53.1000  C123        S  
4      0            373450   8.0500   NaN        S  


In [5]:
# info() prints its summary directly and returns None, so it does not need print()
titanic_training.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
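The `info()` summary shows gaps in `Age` (714 of 891 non-null), `Cabin` (204) and `Embarked` (889), which conflicts with the "free of missing values" assumption. The sketch below counts missing values per column on a small hypothetical frame with the same kind of gaps, then applies one simple cleanup option: drop mostly-empty columns, then drop incomplete rows. The 50% cutoff is an arbitrary assumption, and this is not necessarily how the notebook handles these columns.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame that mimics the gaps seen in the Titanic data
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", None, "S"],
})

# Count missing values per column (None and NaN both count as missing)
missing = df.isnull().sum()
print(missing)

# Share of rows missing per column, useful for deciding whether to drop or fill a column
missing_share = df.isnull().mean()

# One option: drop columns that are more than half empty, then drop any remaining incomplete rows
df_clean = df.drop(columns=missing_share[missing_share > 0.5].index).dropna()
print(df_clean)
```

Dropping rows shrinks the dataset, which matters for the "at least 50 observations per predictor" assumption; filling (imputing) values is the usual alternative when too many rows would be lost.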


##### Variable Descriptions
* Survived - Survival (0 = No, 1 = Yes)
* Pclass - Passenger class (1 = 1st, 2 = 2nd, 3 = 3rd)
* Name - Name
* Sex - Sex
* Age - Age
* SibSp - Number of siblings / spouses aboard
* Parch - Number of parents / children aboard
* Ticket - Ticket Number
* Fare - Passenger Fare (British pounds)
* Cabin - Cabin
* Embarked - Port of embarkation (C = Cherbourg, France; Q = Queenstown (now Cobh), Ireland; S = Southampton, UK)
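`Sex` and `Embarked` are stored as text, but scikit-learn's `LogisticRegression` needs numeric predictors. The sketch below shows two common encodings on hypothetical rows that use the codes listed above: `preprocessing.LabelEncoder` (the module this notebook imports) and one-hot encoding with `pd.get_dummies`. Which encoding the notebook uses later is not shown here, so treat this as an assumption. Note that scikit-learn documents `LabelEncoder` as intended for target labels, with `OrdinalEncoder` as the feature-oriented equivalent.

```python
import pandas as pd
from sklearn import preprocessing

# Hypothetical rows using the codes from the variable descriptions
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Embarked": ["S", "C", "Q", "S"],
})

# LabelEncoder assigns integers to classes in sorted (alphabetical) order
sex_encoder = preprocessing.LabelEncoder()
df["Sex_code"] = sex_encoder.fit_transform(df["Sex"])  # female -> 0, male -> 1

embarked_encoder = preprocessing.LabelEncoder()
df["Embarked_code"] = embarked_encoder.fit_transform(df["Embarked"])  # C -> 0, Q -> 1, S -> 2

# Alternative: one-hot encode Embarked, since the ports have no natural order
embarked_dummies = pd.get_dummies(df["Embarked"], prefix="Embarked")
print(df)
print(embarked_dummies)
```

Integer codes for `Embarked` imply an ordering (C < Q < S) that the model will treat as meaningful; one-hot columns avoid that, at the cost of one extra column per port.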
