### The logistic model (or logit model) estimates the probability that an observation belongs to a particular class or that an event occurs, such as pass/fail, win/lose, alive/dead or healthy/sick.
### It can be extended to several classes, for example deciding whether an image contains a cat, a dog, a lion, etc.
### Each candidate class is then assigned a probability between 0 and 1, and these probabilities sum to one. In marketing, logistic regression may be used to predict whether a given user (or group of users) will buy a certain product. An online education company might use it to predict whether a student will complete a course on time.
### In short, logistic regression predicts the likelihood of all kinds of “yes” or “no” outcomes, which helps data analysts (and the companies they work for) make informed decisions. This notebook applies it to the Titanic dataset to predict whether a passenger survived.
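
The probabilities described above come from passing a linear score through the logistic (sigmoid) function, and through the softmax function when there are several classes. The following is a minimal sketch with NumPy; the function names `sigmoid` and `softmax` and the example scores are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def softmax(scores):
    """Map a vector of class scores to probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exps = np.exp(shifted)
    return exps / exps.sum()

# Binary case: a score of 0 corresponds to a probability of 0.5
print(sigmoid(0.0))

# Multi-class case: hypothetical scores for cat, dog, lion
class_probs = softmax(np.array([2.0, 1.0, 0.1]))
print(class_probs, class_probs.sum())
```

The class with the highest score also gets the highest probability, so the ranking of the classes is preserved.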


In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
d = pd.read_csv("../input/TitanicDataset/titanic_data.csv")
# Remove the columns that we won't be using
d.drop(['Cabin', 'Name', 'Ticket', 'PassengerId', 'Fare'], axis=1, inplace = True)
d.head()

In [None]:
sns.heatmap(d.isnull(), yticklabels=False, cbar=False, cmap='viridis')
# Missing values show up as bright cells; the gaps in the Age column will be filled with the average age of each passenger class

In [None]:
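d.groupby('Pclass').mean(numeric_only=True)
# Find the average age for each Pclass
# numeric_only=True skips the text columns (Sex, Embarked), which recent pandas versions refuse to average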

In [None]:
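# Make a function to fill in the missing age values
# The fill values approximate the average age of each Pclass found above
def func(cols):
    # Access by label; positional access such as cols[0] is deprecated on a labelled Series
    age = cols['Age']
    pclass = cols['Pclass']

    if pd.isnull(age):
        if pclass == 1:
            return 38
        elif pclass == 2:
            return 30
        else:
            return 25
    return age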


In [None]:
d['Age'] = d[['Age', 'Pclass']].apply(func, axis=1)
sns.heatmap(d.isnull(), yticklabels=False, cbar=False, cmap='viridis')
# Check the heatmap again: the Age column should no longer show missing values
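
The row-by-row function above can also be written as a vectorized fill. The sketch below uses a small made-up DataFrame (`toy` is a hypothetical stand-in for the Titanic data) and fills with the exact class means rather than the rounded constants used above, so the filled values would differ slightly on the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above
toy = pd.DataFrame({
    'Pclass': [1, 1, 2, 2, 3, 3],
    'Age':    [40.0, np.nan, 30.0, np.nan, 20.0, 24.0],
})

# Mean age of each class, broadcast back to one value per row
class_mean_age = toy.groupby('Pclass')['Age'].transform('mean')

# Keep known ages and use the class mean only where Age is missing
toy['Age'] = toy['Age'].fillna(class_mean_age)
print(toy)
```

This avoids hard-coding the averages, at the cost of tying the fill values to whatever data the means are computed on.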

In [None]:
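# Create a dummy variable for the Sex column so that it can be fed to our ML model
# With drop_first=True a single column remains: 1 means the passenger is male and 0 means female
# dtype=int gives 0/1 values; recent pandas versions return True/False by default
d.dropna(inplace=True)  # drop any rows that still contain missing values
gender = pd.get_dummies(d['Sex'], drop_first=True, dtype=int)
gender.head()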

In [None]:
# Do the same for the Embarked column
embark = pd.get_dummies(d['Embarked'], drop_first=True, dtype=int)
embark.head()
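
To see what `drop_first=True` does, the sketch below encodes a small made-up column (`ports` is a hypothetical example, not the notebook's data) with and without it. The full set of dummy columns always sums to one per row, so one column is redundant and can be dropped without losing information.

```python
import pandas as pd

# Hypothetical toy column with three embarkation ports
ports = pd.Series(['S', 'C', 'Q', 'S'], name='Embarked')

all_dummies = pd.get_dummies(ports, dtype=int)
reduced = pd.get_dummies(ports, drop_first=True, dtype=int)

print(all_dummies)  # columns C, Q, S
print(reduced)      # columns Q, S; a row of zeros means 'C'
```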

In [None]:
# Append the dummy columns to the main DataFrame
d = pd.concat([d, gender, embark], axis=1)
d.head()

In [None]:
# Drop the original text columns, which can't be fed to the ML model and are now encoded as dummy variables
d.drop(['Sex','Embarked'],axis=1, inplace=True)
d.head()

In [None]:
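from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Separate the features (x) from the target (y)
x = d.drop('Survived', axis=1)
y = d['Survived']

# train_test_split returns the splits in the order: x_train, x_test, y_train, y_test
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=101)

# max_iter is raised from the default of 100 to give the solver room to converge on unscaled features
lm = LogisticRegression(max_iter=1000)
lm.fit(x_train, y_train)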
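
The order in which `train_test_split` returns its outputs is easy to mix up, and a swapped order silently trains on the smaller split. The sketch below checks the order on made-up arrays (`features` and `labels` are hypothetical names) by printing the shapes.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples with 2 features each
features = np.arange(20).reshape(10, 2)
labels = np.array([0, 1] * 5)

# The return order is: train features, test features, train labels, test labels
f_train, f_test, l_train, l_test = train_test_split(
    features, labels, test_size=0.3, random_state=101
)

print(f_train.shape, f_test.shape)  # (7, 2) (3, 2)
```

With `test_size=0.3`, the training split should hold roughly 70% of the rows; checking the shapes is a quick sanity test.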

In [None]:
from sklearn.metrics import classification_report

# Predict on the held-out test set and report precision, recall and F1-score per class
pr = lm.predict(x_test)
print(classification_report(y_test, pr))
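
`predict` returns hard class labels, but a logistic regression model produces probabilities underneath. The sketch below fits a model on randomly generated one-feature data (all names and data here are hypothetical) and shows how `predict_proba` relates to `predict` in the binary case.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data: larger values tend to belong to class 1
rng = np.random.default_rng(0)
feature = np.concatenate(
    [rng.normal(-1.0, 1.0, 50), rng.normal(1.0, 1.0, 50)]
).reshape(-1, 1)
target = np.array([0] * 50 + [1] * 50)

model = LogisticRegression().fit(feature, target)

# Column 0 is P(class 0), column 1 is P(class 1); each row sums to one
probabilities = model.predict_proba(feature)

# For a binary model, predict() corresponds to thresholding P(class 1) at 0.5
thresholded = (probabilities[:, 1] > 0.5).astype(int)
print(np.array_equal(thresholded, model.predict(feature)))
```

Working with the probabilities directly makes it possible to choose a threshold other than 0.5 when false positives and false negatives have different costs.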

In [None]:
from sklearn.metrics import confusion_matrix

# Rows are the true classes and columns are the predicted classes
confusion_matrix(y_test, pr)
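
The figures in the classification report can be derived from the confusion matrix. The sketch below does this by hand for made-up labels (`true_labels` and `pred_labels` are hypothetical and are not the notebook's results).

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted labels (1 = survived, 0 = did not survive)
true_labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
pred_labels = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# For binary labels the matrix is laid out as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(true_labels, pred_labels).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)  # of the predicted survivors, how many survived
recall = tp / (tp + fn)     # of the actual survivors, how many were found
print(accuracy, precision, recall)
```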