Logistic regression is a linear model that predicts the probability of a binary outcome, such as whether a passenger survived the Titanic. It computes a weighted sum of the input features and passes that sum through the sigmoid function, which maps it to a probability between 0 and 1. A probability of 0.5 or higher is typically assigned to the positive class.
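
As a minimal sketch of this idea, the snippet below turns one feature vector into a probability. The weights, the bias, and the helper names `sigmoid` and `predict_proba` are made up for illustration; they are not values learned from the Titanic data.

```python
import math

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability of the positive class for a single sample."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical weights for two features (not fitted to any data)
weights = [-1.0, 2.0]
bias = 0.5

p = predict_proba([1.0, 0.25], weights, bias)  # z = 0.5 - 1.0 + 0.5 = 0.0
print(p)  # 0.5

# Threshold the probability to get a class label
label = 1 if p >= 0.5 else 0
```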

A decision tree is a non-linear model that classifies a data point by following a tree of feature tests. The tree is built by repeatedly splitting the data into smaller groups, choosing at each step the split that makes the groups purest (for example, by Gini impurity), until each group contains only data points with the same outcome or a stopping rule such as a maximum depth is reached.
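
The splitting step can be sketched for a single numeric feature as follows. This is an illustration under stated assumptions rather than scikit-learn's implementation: the toy `ages` and `survived` lists are invented, and `gini` and `best_threshold` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 when every label in the group is identical."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    n = len(values)
    # Skip the largest value: splitting there would leave the right group empty
    for t in sorted(set(values))[:-1]:
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best

# Invented toy data: one feature and a binary outcome
ages = [4, 8, 30, 45, 60]
survived = [1, 1, 0, 0, 0]
print(best_threshold(ages, survived))  # (8, 0.0)
```

A full tree applies the same search to every feature and then recurses into the two resulting groups.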

Random forest is an ensemble model that combines many decision trees. Each tree is trained on a bootstrap sample of the data (rows drawn with replacement) and considers only a random subset of the features at each split. The trees' predictions are then combined, typically by majority vote, to make the final prediction.
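
The two ingredients, resampling the rows and combining the trees' outputs, can be sketched without fitting any trees. The helper names `bootstrap_sample` and `majority_vote` and the per-tree predictions are hypothetical; a real forest would train one tree on each sample.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement, so some rows repeat and others are left out."""
    return [rng.choice(rows) for _ in rows]

def majority_vote(predictions):
    """Return the label predicted by the largest number of trees."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
rows = list(range(10))  # stand-in for ten row indices

# One bootstrap sample per tree
samples = [bootstrap_sample(rows, rng) for _ in range(3)]

# Hypothetical predictions of three trees for a single passenger
tree_predictions = [1, 0, 1]
print(majority_vote(tree_predictions))  # 1
```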

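A support vector machine predicts the outcome of a classification problem by finding the hyperplane that best separates the data points into two groups. The best hyperplane is the one that maximizes the margin, that is, the distance between the hyperplane and the nearest data points of each group (the support vectors). In its basic form the model is linear; with a kernel such as the RBF kernel it can also learn non-linear boundaries.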
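
To make the notion of margin concrete, the sketch below measures it for two candidate hyperplanes on invented two-dimensional points. It only evaluates given hyperplanes and does not solve the optimization problem an SVM actually solves; `margin` is a hypothetical helper name.

```python
import math

def margin(w, b, points, labels):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    Labels are +1 or -1. The result is positive only if every point
    lies on its correct side of the hyperplane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Invented toy data: two groups separated along the first coordinate
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two candidate separating hyperplanes: x1 = 1 and x1 = 2
print(margin((1.0, 0.0), -1.0, points, labels))  # 1.0
print(margin((1.0, 0.0), -2.0, points, labels))  # 2.0
```

Both hyperplanes separate the groups, but the second sits midway between them and has the larger margin, so it is the one a maximum-margin classifier would prefer.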

K-nearest neighbors is a non-parametric model that classifies a new data point by finding the k training points closest to it, usually by Euclidean distance. The predicted outcome is the most common outcome among those k nearest neighbors.
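
The whole procedure fits in a few lines, as the sketch below shows. It assumes Euclidean distance and Python 3.8 or later (for `math.dist`); the training points are invented and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort the training points by their distance to the query
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Invented toy data: three points of class 0 and two of class 1
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.5)]
train_labels = [0, 0, 0, 1, 1]

print(knn_predict(train_points, train_labels, (1.2, 1.4)))  # 0
print(knn_predict(train_points, train_labels, (8.5, 8.0)))  # 1
```

Because the prediction depends on raw distances, features with large numeric ranges dominate the result unless the features are scaled first.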

In [1]:
import pandas as pd
import numpy as np

# Load the Titanic passenger data and preview the first five rows
data = pd.read_csv("titanic.csv")
data.head()

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [3]:
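from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

# Encode Sex as integers (LabelEncoder sorts the classes: female -> 0, male -> 1)
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])
print(data["Sex"])

# Drop rows with missing values in the numeric features
data = data.dropna(subset=["Age", "SibSp", "Parch", "Fare"])

# Feature matrix and target
x = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

model = LogisticRegression()
model.fit(x, y)

# Note: this is accuracy on the rows used for fitting, not on held-out data
score = model.score(x, y)
print("Accuracy: ", score)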

0      1
1      0
2      0
3      0
4      1
      ..
886    1
887    0
888    0
889    1
890    1
Name: Sex, Length: 891, dtype: int64
Accuracy:  0.8025210084033614


In [5]:
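from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Sex was already integer-encoded in an earlier cell, so re-encoding leaves it unchanged
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Drop rows with missing values in the numeric features
data = data.dropna(subset=["Age", "SibSp", "Parch", "Fare"])

# Feature matrix and target
x = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

model = DecisionTreeClassifier()
model.fit(x, y)

# Note: training accuracy; an unpruned tree can nearly memorize the rows it was fitted on
score = model.score(x, y)
print("accuracy: ", score)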


accuracy:  0.9859943977591037


In [6]:
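from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier

# Sex was already integer-encoded in an earlier cell, so re-encoding leaves it unchanged
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Drop rows with missing values in the numeric features
data = data.dropna(subset=["Age", "SibSp", "Parch", "Fare"])

# Feature matrix and target
x = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

model = RandomForestClassifier()
model.fit(x, y)

# Note: this is accuracy on the rows used for fitting, not on held-out data
score = model.score(x, y)
print("accuracy: ", score)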

accuracy:  0.9859943977591037


In [7]:
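from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Sex was already integer-encoded in an earlier cell, so re-encoding leaves it unchanged
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Drop rows with missing values in the numeric features
data = data.dropna(subset=["Age", "SibSp", "Parch", "Fare"])

# Feature matrix and target
x = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

# SVC uses an RBF kernel by default; the features are not scaled here
model = SVC()
model.fit(x, y)

# Note: this is accuracy on the rows used for fitting, not on held-out data
score = model.score(x, y)
print("accuracy: ", score)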

accuracy:  0.6764705882352942


In [10]:
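from sklearn.preprocessing import LabelEncoder
from sklearn.neighbors import KNeighborsClassifier

# Sex was already integer-encoded in an earlier cell, so re-encoding leaves it unchanged
encoder = LabelEncoder()
data["Sex"] = encoder.fit_transform(data["Sex"])

# Drop rows with missing values in the numeric features
data = data.dropna(subset=["Age", "SibSp", "Parch", "Fare"])

# Feature matrix and target
x = data[["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]]
y = data["Survived"]

# KNeighborsClassifier uses k = 5 neighbors by default; the features are not scaled here
model = KNeighborsClassifier()
model.fit(x, y)

# Note: this is accuracy on the rows used for fitting, not on held-out data
score = model.score(x, y)
print("accuracy: ", score)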

accuracy:  0.7941176470588235
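
Every accuracy reported above is computed on the same rows the model was fitted on, so it says little about how the model handles unseen passengers. A minimal sketch of a held-out evaluation follows. It uses a synthetic dataset from `make_classification` as a stand-in so that it runs without `titanic.csv`; the variable names, the 25% test size, and the random seeds are assumptions for illustration, not part of the original notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with six features (assumption: this is NOT the Titanic data)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, flip_y=0.1, random_state=0
)

# Hold out 25% of the rows; the model never sees them during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

model_demo = DecisionTreeClassifier(random_state=0)
model_demo.fit(X_train, y_train)

train_acc = model_demo.score(X_train, y_train)
test_acc = model_demo.score(X_test, y_test)
print("training accuracy:", train_acc)
print("held-out accuracy:", test_acc)
```

The same pattern should apply to the Titanic features by passing `x` and `y` to `train_test_split` and scoring each model on the held-out part.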
