# Titanic - Machine Learning from Disaster

We'll use **machine learning** to create a model that predicts which passengers survived the Titanic shipwreck.

### 1. Importing needed packages

In [2]:
import pandas as pd
import numpy as np
from sklearn import linear_model

### 2. Importing the dataset and cleaning the data

In [71]:
data = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")
# Drops every training row that has a missing value in any column.
data.dropna(inplace=True)

# Encode Sex as 0 (male) / 1 (female), assigning the result back to the column.
data["Sex"] = data["Sex"].map({"male": 0, "female": 1})
test["Sex"] = test["Sex"].map({"male": 0, "female": 1})
# Fill missing test ages with the median so every test passenger gets a prediction.
test["Age"] = test["Age"].fillna(test["Age"].median())
test_ids = test["PassengerId"]
test.describe()

| | PassengerId | Pclass | Sex | Age | SibSp | Parch | Fare |
|---|---|---|---|---|---|---|---|
| count | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 417.0 |
| mean | 1100.5 | 2.26555 | 0.363636 | 29.599282 | 0.447368 | 0.392344 | 35.627188 |
| std | 120.810458 | 0.841838 | 0.481622 | 12.70377 | 0.89676 | 0.981429 | 55.907576 |
| min | 892.0 | 1.0 | 0.0 | 0.17 | 0.0 | 0.0 | 0.0 |
| 25% | 996.25 | 1.0 | 0.0 | 23.0 | 0.0 | 0.0 | 7.8958 |
| 50% | 1100.5 | 3.0 | 0.0 | 27.0 | 0.0 | 0.0 | 14.4542 |
| 75% | 1204.75 | 3.0 | 1.0 | 35.75 | 1.0 | 0.0 | 31.5 |
| max | 1309.0 | 3.0 | 1.0 | 76.0 | 8.0 | 9.0 | 512.3292 |
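
The `count` row above shows 417 values for `Fare` against 418 for the other columns, so one fare is missing in the test set. The following is a minimal sketch of how missing values could be inspected per column, and how `dropna` can be limited to chosen columns. The toy frame and its values are made up for illustration and are not the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the real dataset) with gaps in different columns.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, np.nan, 51.86],
    "Cabin": [np.nan, "C85", np.nan, "E46"],
})

# Number of missing values in each column.
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() with no arguments removes a row if ANY column is missing...
rows_all = len(toy.dropna())
# ...while subset= only considers the listed columns.
rows_age_only = len(toy.dropna(subset=["Age"]))
print(rows_all, rows_age_only)
```

In this toy frame only one row survives a plain `dropna()`, while three survive when only `Age` is checked, which is why it can be worth looking at the per-column counts before dropping rows.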


### 3. Simple Regression Model

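Linear regression fits a linear model with coefficients B = (B1, ..., Bn) by minimizing the residual sum of squares between the actual values y in the dataset and the values yhat predicted by the linear approximation. Because `Survived` is a 0/1 label, the model's predictions are continuous scores, which we round to the nearest class afterwards.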
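
To make the residual sum of squares idea concrete, here is a minimal sketch on synthetic data (the data, seed, and variable names are assumptions for illustration only). It solves the least-squares problem with `np.linalg.lstsq` and checks that `LinearRegression` arrives at the same parameters.

```python
import numpy as np
from sklearn import linear_model

# Synthetic data (illustrative only): y = 1.0 + 2.0*x1 - 3.0*x2 + small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Least squares with an explicit column of ones for the intercept.
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

# The same fit through scikit-learn.
model = linear_model.LinearRegression().fit(X, y)

# Residual sum of squares of the fitted model.
rss = np.sum((y - model.predict(X)) ** 2)

print("lstsq:  ", beta)
print("sklearn:", model.intercept_, model.coef_)
print("RSS:", rss)
```

The first entry of `beta` corresponds to the intercept and the remaining entries to the coefficients, so both approaches should print essentially the same numbers.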

In [8]:
features = ["Pclass", "Sex", "Age", "SibSp", "Parch"]

linea = linear_model.LinearRegression()
x_t = np.asanyarray(data[features])
y_t = np.asanyarray(data["Survived"])
linea.fit(x_t, y_t)

print("Coefficient:", linea.coef_)
print('Intercept: ', linea.intercept_)

# Predict once on the test features, then round the continuous scores to labels.
x = np.asanyarray(test[features])
# Clipping keeps any score outside [0, 1] from rounding to a label other than 0 or 1.
submission_preds = np.clip(np.round(linea.predict(x), 0), 0, 1)

Coefficient: [-0.09639983  0.49457377 -0.00578526  0.03080681 -0.04160372]
Intercept:  0.7530729994245616




**Coefficient** and **Intercept** are the parameters of the fitted linear model.
Because the model uses five features, this is a multiple linear regression with six parameters: one coefficient per feature (in the order `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`) plus the intercept. sklearn estimates them directly from our data.
Notice that the whole training set must be available at once, because the fit traverses all of it to calculate the parameters.
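
As a rough illustration of how these parameters turn into a prediction, the sketch below plugs the coefficients and intercept printed above into the linear formula for two hypothetical passengers. The passengers are invented for this example; the feature order follows the training columns `Pclass`, `Sex`, `Age`, `SibSp`, `Parch`.

```python
import numpy as np

# Parameters copied from the fit output above.
coef = np.array([-0.09639983, 0.49457377, -0.00578526, 0.03080681, -0.04160372])
intercept = 0.7530729994245616

# Hypothetical passengers: [Pclass, Sex (0=male, 1=female), Age, SibSp, Parch].
passengers = np.array([
    [3, 0, 25, 0, 0],  # third-class man, aged 25, travelling alone
    [1, 1, 30, 0, 0],  # first-class woman, aged 30, travelling alone
])

# Linear model: score = intercept + sum(coefficient * feature).
scores = intercept + passengers @ coef
labels = np.round(scores).astype(int)

print(scores)
print(labels)
```

The first passenger scores about 0.32 and the second about 0.98, so rounding gives the labels 0 and 1 respectively.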

### 4. Save output

In [78]:
# The submission file needs the columns PassengerId and Survived.
df = pd.DataFrame({"PassengerId": test_ids.values,
                   "Survived": submission_preds.astype(int)})
df.to_csv("submission.csv", index=False)
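
Before uploading, it may be worth sanity-checking the result. The sketch below is one possible check, assuming the competition's expected layout of a `PassengerId` column followed by a 0/1 `Survived` column. The helper name `check_submission` and the toy frames are hypothetical.

```python
import pandas as pd

def check_submission(frame: pd.DataFrame) -> list:
    """Return a list of problems found in a submission frame (empty if none)."""
    problems = []
    if list(frame.columns) != ["PassengerId", "Survived"]:
        problems.append(f"unexpected columns: {list(frame.columns)}")
    elif not frame["Survived"].isin([0, 1]).all():
        problems.append("Survived contains values other than 0 and 1")
    if frame.isna().any().any():
        problems.append("missing values present")
    return problems

# Toy frames for illustration only.
good = pd.DataFrame({"PassengerId": [892, 893], "Survived": [0, 1]})
bad = pd.DataFrame({"PassengerID": [892, 893], "Survived": [0, 2]})

print(check_submission(good))
print(check_submission(bad))
```

Calling `check_submission(df)` on the frame built above would apply the same checks to the real predictions.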