# Classification Exercise

### First Step: Download the __[Titanic Data](https://www.kaggle.com/c/titanic/data)__
- **NOTE**: use only the file `train.csv` from the data.
#### 1. Import pandas and NumPy

In [219]:
import pandas as pd 
import numpy as np

#### 2. Load the data using pandas

In [220]:
df = pd.read_csv('train.csv')
df.head()

| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.25 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.925 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.05 | NaN | S |


#### 3. Check the columns in the dataset and drop the useless ones

- **Hint**: the useless columns are `Name`, `Ticket`, `PassengerId`, and `Cabin`

In [221]:
df.drop(['Name','Ticket','PassengerId','Cabin'],axis=1,inplace=True)

#### 4. Check for null values in each column and fill them with the mode

- **Hint**:
  1. For `Age`, use the mode, since there are outliers in this column.
  2. For `Embarked`, use the mode, since it is a categorical variable.
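
To see why an outlier matters for the choice of fill value, here is a minimal sketch on made-up ages (not the Titanic data). The `ages` values are invented for illustration; the only assumption is that `Series.mode()` returns a Series (ties are possible), so the first entry is taken.

```python
import pandas as pd

# Invented ages with one outlier (80); illustration only
ages = pd.Series([22, 24, 24, 28, 30, 80], dtype=float)

age_mean = ages.mean()        # pulled upward by the outlier
age_median = ages.median()    # middle of the sorted values
age_mode = ages.mode()[0]     # most frequent value

print(age_mean, age_median, age_mode)
```

The mean is dragged toward the outlier, while the median and the mode stay close to the bulk of the values.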


In [222]:
#import mode from stats
from scipy.stats import mode

In [223]:
#check null values
df.isnull().sum()

Survived      0
Pclass        0
Sex           0
Age         177
SibSp         0
Parch         0
Fare          0
Embarked      2
dtype: int64

In [224]:
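#fill null values with mode
#pandas' Series.mode() skips NaN by default; [0] takes the first mode if there are ties
#(recent SciPy versions of scipy.stats.mode do not accept string columns such as Embarked)
values = {'Age' : float(df.Age.mode()[0]) , 'Embarked' : str(df.Embarked.mode()[0])}
df.fillna(value = values,inplace=True)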

#### 5. Handle categorical data using `get_dummies()` in pandas

- **Hint**: handle only the columns `Sex` and `Embarked`
- Read this document on how to use [`get_dummies()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html)

In [225]:
#count the unique values in Sex and Embarked
df.Sex.nunique(),df.Embarked.nunique()

(2, 3)

In [226]:
df = df.join(pd.get_dummies(df[['Sex','Embarked']]))
df.drop(['Sex','Embarked'],axis=1,inplace=True)
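
For a quick look at what one-hot encoding produces, the sketch below runs `pd.get_dummies` on a small made-up frame. The `toy` frame is invented for illustration, and passing `columns=` is just one way to get the same columns as the join-and-drop approach above.

```python
import pandas as pd

# Invented rows; illustration only
toy = pd.DataFrame({
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "Q"],
    "Fare": [7.25, 71.28, 7.92],
})

# Encode only the listed columns; the originals are dropped automatically
encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"])

print(list(encoded.columns))
```

Each category becomes its own indicator column named `<column>_<value>`, which is why `Sex` (2 values) and `Embarked` (3 values) add five columns in total.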

#### 6. Separate X (features) from y (labels)
**Hint**:
- Goal: predict whether a passenger survived or not

In [227]:
x = df.drop('Survived',axis=1)
y = df.Survived

#### 7. Split the data into training and test sets with `random_state=5` and `test_size=0.25`

In [228]:
# Splitting Data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.25,random_state=5)
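
As a rough check of what these two parameters do, the sketch below splits stand-in arrays with the same number of rows as `train.csv` (891). The names `x_demo` and `y_demo` are placeholders, not part of the exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays with 891 rows; illustration only
x_demo = np.arange(891).reshape(-1, 1)
y_demo = np.arange(891) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=5
)
print(X_tr.shape[0], X_te.shape[0])

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=5
)
print(np.array_equal(X_te, X_te2))
```

The test set gets 223 rows, which matches the total support shown in the classification reports further down.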

#### 8. Scale all data using `StandardScaler`

In [229]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
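
The scaler is fitted on the training set only and then reused on the test set. A minimal sketch with made-up numbers (the `train_demo` and `test_demo` arrays are invented) shows what that means in practice:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented single-feature data; illustration only
train_demo = np.array([[1.0], [2.0], [3.0]])
test_demo = np.array([[4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train_demo)  # learns mean and std from train
test_scaled = scaler.transform(test_demo)        # reuses the train mean and std

print(scaler.mean_, scaler.scale_)
print(train_scaled.ravel(), test_scaled.ravel())
```

The scaled training column has mean 0 and standard deviation 1, while the test value is expressed relative to the training statistics, so no information from the test set leaks into the fit.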

#### 9. Build your model (LogisticRegression model)
 Use the default sklearn parameters and `random_state=33`

In [230]:
# Fitting Logistic Regression to the Training set
from sklearn.linear_model import LogisticRegression
# Object
LogisticRegressionModel = LogisticRegression(random_state=33)
#Fitting
LogisticRegressionModel.fit(X_train, y_train)

LogisticRegression(random_state=33)

#### 10. Calculate Accuracy of the model
Hint: use `LogisticRegressionModel.score(X_test, y_test)`

In [231]:
print('LogisticRegressionModel Test Score is : ' , LogisticRegressionModel.score(X_test, y_test))

LogisticRegressionModel Test Score is :  0.820627802690583


#### 11. Calculate the confusion matrix, precision, recall, and F1-score for the first model

In [232]:
y_pred = LogisticRegressionModel.predict(X_test)

In [233]:
from sklearn.metrics import confusion_matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

Confusion Matrix is : 
 [[125  15]
 [ 25  58]]


In [234]:
from sklearn.metrics import classification_report
ClassificationReport = classification_report(y_test,y_pred)
print('Classification Report is : \n', ClassificationReport )

Classification Report is : 
               precision    recall  f1-score   support

           0       0.83      0.89      0.86       140
           1       0.79      0.70      0.74        83

    accuracy                           0.82       223
   macro avg       0.81      0.80      0.80       223
weighted avg       0.82      0.82      0.82       223
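
The numbers in the report for class `1` can be recomputed by hand from the confusion matrix above. This is a small sketch in plain Python, assuming sklearn's layout of rows as true classes and columns as predicted classes:

```python
# Confusion matrix printed above: rows = true class, columns = predicted class
tn, fp = 125, 15   # true class 0
fn, tp = 25, 58    # true class 1

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)   # of predicted survivors, how many survived
recall = tp / (tp + fn)      # of actual survivors, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 2), round(precision, 2), round(recall, 2), round(f1, 2))
# 0.82 0.79 0.7 0.74
```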



#### 12. Build your model (K-NN model)
 Use the default sklearn parameters with `n_neighbors=5`

In [235]:
from sklearn.neighbors import KNeighborsClassifier
KNNClassifierModel = KNeighborsClassifier(n_neighbors= 5)
KNNClassifierModel.fit(X_train, y_train)

KNeighborsClassifier()

#### 13. Calculate Accuracy of the K-NN model
Hint: use `KNNClassifierModel.score(X_test, y_test)`

In [236]:
print('KNNClassifierModel Test Score is : ' , KNNClassifierModel.score(X_test, y_test))

KNNClassifierModel Test Score is :  0.8430493273542601


#### 14. Calculate the confusion matrix, precision, recall, and F1-score for the K-NN model

In [237]:
y_pred = KNNClassifierModel.predict(X_test)

In [238]:
from sklearn.metrics import confusion_matrix
CM = confusion_matrix(y_test, y_pred)
print('Confusion Matrix is : \n', CM)

Confusion Matrix is : 
 [[127  13]
 [ 22  61]]


In [239]:
from sklearn.metrics import classification_report
ClassificationReport = classification_report(y_test,y_pred)
print('Classification Report is : \n', ClassificationReport )

Classification Report is : 
               precision    recall  f1-score   support

           0       0.85      0.91      0.88       140
           1       0.82      0.73      0.78        83

    accuracy                           0.84       223
   macro avg       0.84      0.82      0.83       223
weighted avg       0.84      0.84      0.84       223



#### Hint: use a for loop to calculate the score of the K-NN model at different K values

In [242]:
k_range = range(1,10,2)
scores = []

In [243]:
from sklearn import metrics

for k in k_range:
    knn = KNeighborsClassifier(n_neighbors= k)
    knn.fit(X_train , y_train)
    y_pred = knn.predict(X_test)
    scores.append(metrics.accuracy_score(y_test , y_pred) )
    
print(scores)

[0.8116591928251121, 0.8295964125560538, 0.8430493273542601, 0.8385650224215246, 0.8385650224215246]
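
To read off the best K from the printed list, one option is to pair each score with its K value. The sketch below copies the scores from the output above; `k_values`, `best_score`, and `best_k` are names introduced here for illustration.

```python
# K values and scores copied from the loop output above
k_values = list(range(1, 10, 2))   # 1, 3, 5, 7, 9
scores = [0.8116591928251121, 0.8295964125560538, 0.8430493273542601,
          0.8385650224215246, 0.8385650224215246]

best_score = max(scores)
best_k = k_values[scores.index(best_score)]

print(best_k, best_score)
```

Here K = 5 gives the highest test accuracy. Since K is being chosen on the test set, this score is likely a slightly optimistic estimate; a separate validation split or cross-validation would be a more careful way to pick K.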
