### ALGORITHM 2: LOGISTIC REGRESSION

This is a classification algorithm, that is used to predict binary values i.e 1/0, True/False.
It classifies the results within a specific category, such as if an animal is a cat or not a cat.
The occurrence is calculated based on occurrence of independent factors, by fitting data into a logit function.
This algorithm requires the dataset to contain only numerical values.

```
odds = occurrence / not occurrence = P/(1-P)
ln(odds) = ln(P/1-P)
logit(P) = b + b1*x1 + b2*x2 + ... + bn*xn -> [0/1]
```

Logistic regression aims to solve classification problems. It does this by predicting categorical outcomes, unlike linear regression that predicts a continuous outcome.


In [56]:
# importing required libraries
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# read the train and test dataset [titanic survivors]
dataset = pd.read_csv("../data/titanic.csv")
train_data, test_data = train_test_split(dataset, test_size=0.2, shuffle=False)

# separate the train and test X and y dataset
train_X = train_data.drop("Survived", axis=1)  # drop survived column for X
test_X = test_data.drop("Survived", axis=1)

train_y = train_data["Survived"]               # select survived column for y
test_y = test_data["Survived"]


In [57]:
# scale the values to units comparable to each other
scaler = StandardScaler()

train_X = scaler.fit_transform(train_X)
test_X = scaler.fit_transform(test_X)

# create the model and train with data
model = LogisticRegression()
model.fit(train_X, train_y)

print("coefficient of model : ", model.coef_)
print("intercept of model   : ", model.intercept_, "\n")

print("test data :")
display(test_data.head(4))


coefficient of model :  [[-0.40061022  0.09783385  0.44224518  0.05221185 -0.42184603  0.63701492
  -0.63701492  0.1938273   0.15025706  0.03860307 -0.33623311 -0.31837562
  -0.36139869 -0.44085767 -0.02269506  0.14635141  0.00353597  0.01576255
  -0.3541281  -0.11816649 -0.18741514  0.08151949  0.06476753 -0.11170643]]
intercept of model   :  [-0.64120973] 

test data :


Unnamed: 0,Survived,Age,Fare,Pclass_1,Pclass_2,Pclass_3,Sex_female,Sex_male,SibSp_0,SibSp_1,...,Parch_0,Parch_1,Parch_2,Parch_3,Parch_4,Parch_5,Parch_6,Embarked_C,Embarked_Q,Embarked_S
712,0,35.0,7.125,0,0,1,0,1,1,0,...,1,0,0,0,0,0,0,0,0,1
713,0,20.0,7.05,0,0,1,0,1,1,0,...,1,0,0,0,0,0,0,0,0,1
714,0,26.0,7.8958,0,0,1,0,1,1,0,...,1,0,0,0,0,0,0,0,0,1
715,1,58.0,146.5208,1,0,0,1,0,1,0,...,1,0,0,0,0,0,0,1,0,0


In [58]:
# predict the results
pred_y = model.predict(test_X)
print("predicted survivors : ", pred_y[:4])

# score of the model
score = accuracy_score(test_y, pred_y)
print("score of model      : ", score * 100, "%")


predicted survivors :  [0 0 0 1]
score of model      :  83.24022346368714 %
