## Titanic Competition with Logistic Regression

**This notebook walks you through the steps needed to train a basic logistic regression model, with some feature engineering.**

**Logistic regression is a supervised learning model. Even though it has "regression" in its name, it is a classification model.**

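**It performs classification with the sigmoid (logistic) function, which maps any real number into the range (0, 1) and traces an S-shaped curve. The output is read as a probability and thresholded (typically at 0.5) to assign a class.**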

![Sigmoid (logistic) curve](https://upload.wikimedia.org/wikipedia/commons/thumb/8/88/Logistic-curve.svg/1200px-Logistic-curve.svg.png)
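The sigmoid is $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z$ is the model's linear score (feature weights times feature values, plus an intercept). A minimal NumPy sketch of that formula and of the 0.5 threshold follows; the scores in `z` are made-up values for illustration, not outputs of the Titanic model.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Made-up linear scores z = w.x + b for four hypothetical passengers
z = np.array([-5.0, -1.0, 1.0, 5.0])

# Squash the scores into probabilities: roughly 0.0067, 0.269, 0.731 and 0.9933
probs = sigmoid(z)

# Threshold at 0.5 to turn each probability into a class label (0 or 1)
labels = (probs > 0.5).astype(int)
print(labels)  # [0 0 1 1]
```

A score of exactly 0 sits at the midpoint of the S-curve (probability 0.5), so the sign of the linear score alone decides the predicted class.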

**This notebook is an introduction to logistic regression. It shows you how to:**

**1. Convert *categorical data* into *numeric* data**

**2. Fill *missing values* in the dataset**

**3. Perform *one-hot encoding* on categorical data**

**4. Create a *logistic regression* model**

Note: as this notebook is intended for beginners, we won't perform any hyperparameter tuning.


In [None]:
# importing our necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
from sklearn.linear_model import LogisticRegression

In [None]:
# Importing our training and testing data

train=pd.read_csv('/kaggle/input/titanic/train.csv')
test=pd.read_csv('/kaggle/input/titanic/test.csv')
display(train.head(),
        '-'*90,
        test.head())

## Data Preprocessing and Feature Engineering

In [None]:
# Transform the object (string) column 'Sex' into numeric values: male -> 1, female -> 0
# You could also use LabelEncoder (from sklearn.preprocessing import LabelEncoder) to transform the values

train['Sex']=(train['Sex']=='male').astype(int)
test['Sex']=(test['Sex']=='male').astype(int)
train['Sex']
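The `LabelEncoder` alternative mentioned in the comment can be sketched on a toy column (made-up values, not the real dataset). Because `LabelEncoder` numbers the classes in sorted order, it happens to produce the same 0/1 coding as the manual `== 'male'` comparison for this two-valued column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy column standing in for the Titanic 'Sex' column (illustrative data only)
sex = pd.Series(['male', 'female', 'female', 'male'])

# Manual encoding used in the notebook: male -> 1, female -> 0
manual = (sex == 'male').astype(int)

# LabelEncoder assigns integer codes in sorted order of the class labels,
# so 'female' -> 0 and 'male' -> 1 here
encoder = LabelEncoder()
encoded = encoder.fit_transform(sex)

print(encoder.classes_.tolist())  # ['female', 'male']
print(encoded.tolist())           # [1, 0, 0, 1]
```

For the test set you would call `encoder.transform(...)` with the already-fitted encoder so both sets share the same mapping.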

In [None]:
# We won't use these columns in this notebook; advanced feature
# engineering on them is out of scope for a beginner notebook.
# The original 'test' frame still keeps PassengerId, which the submission file needs.

train=train.drop(['Ticket','PassengerId','Cabin','Name'],axis=1)
x_test=test.drop(['Name','Ticket','Cabin','PassengerId'],axis=1)

In [None]:
# Split the data into features (x_train) and the target variable (y_train)

x_train=train.drop(['Survived'],axis=1)
y_train=train['Survived']


In [None]:
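# Fill missing values: the column mean for the numeric columns Age and Fare,
# and the most frequent value (mode) for the categorical column Embarked.
# .mode() returns a Series, so [0] takes its first (most frequent) value as a scalar.

x_train['Age']=x_train['Age'].fillna(x_train['Age'].mean())
x_test['Age']=x_test['Age'].fillna(x_test['Age'].mean())
x_test['Fare']=x_test['Fare'].fillna(x_test['Fare'].mean())
x_train['Embarked']=x_train['Embarked'].fillna(x_train['Embarked'].mode()[0])
x_test['Embarked']=x_test['Embarked'].fillna(x_test['Embarked'].mode()[0])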
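A small sketch, on toy data, of why the most frequent value should be passed to `fillna` as a scalar: `Series.mode()` returns a Series, and `fillna` aligns a Series argument on the index instead of broadcasting it to every missing entry.

```python
import numpy as np
import pandas as pd

# Toy 'Embarked'-like column with missing values (illustrative data only)
embarked = pd.Series(['S', 'C', np.nan, 'S', np.nan, 'Q'])

# .mode() returns a Series (there can be ties), not a scalar
print(type(embarked.mode()))  # <class 'pandas.core.series.Series'>

# Passing that Series to fillna aligns on the index: only index 0 has a
# fill value, and it is not missing, so the NaNs at index 2 and 4 stay missing
wrong = embarked.fillna(embarked.mode())
print(wrong.isna().sum())  # 2

# Taking the first mode gives a scalar, which fills every missing entry
right = embarked.fillna(embarked.mode()[0])
print(right.isna().sum())  # 0
print(right.tolist())      # ['S', 'C', 'S', 'S', 'S', 'Q']
```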

In [None]:
# Some rows have a Fare of 0.00; we treat these as invalid and replace them with the median fare

x_train['Fare']=x_train['Fare'].replace(0.0,x_train['Fare'].median())
x_test['Fare']=x_test['Fare'].replace(0.0,x_test['Fare'].median())

In [None]:
# Perform one-hot encoding to create dummy variables for Embarked
# This can also be done with OneHotEncoder (from sklearn.preprocessing import OneHotEncoder)

dummy_Embarked_x=pd.get_dummies(x_train['Embarked'],dtype=int)
dummy_Embarked_y=pd.get_dummies(x_test['Embarked'],dtype=int)

In [None]:
# Adding dummy variables to training data
x_train['C']=dummy_Embarked_x['C']
x_train['Q']=dummy_Embarked_x['Q']
x_train['S']=dummy_Embarked_x['S']

# Adding dummy variables to test data
x_test['C']=dummy_Embarked_y['C']
x_test['Q']=dummy_Embarked_y['Q']
x_test['S']=dummy_Embarked_y['S']

In [None]:
# After creating the dummy variables, the original Embarked column is no longer needed

x_train=x_train.drop(['Embarked'],axis=1)
x_test=x_test.drop(['Embarked'],axis=1)

## Model Creation

In [None]:
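# Create our logistic regression model
# max_iter is raised from the default (100) because the default lbfgs solver
# may otherwise stop before converging on these unscaled features

reg=LogisticRegression(max_iter=1000)
reg_model=reg.fit(x_train,y_train)

# Predicted class labels (0 = did not survive, 1 = survived) for the test passengers
y_pred=reg_model.predict(x_test)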
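To connect the fitted model back to the sigmoid described at the top of the notebook, this sketch fits `LogisticRegression` on synthetic data from `make_classification` (a stand-in, not the Titanic features). It checks that `predict_proba` equals the sigmoid of the linear score built from `coef_` and `intercept_`, and that `predict` amounts to thresholding that probability at 0.5.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification data (stand-in for x_train / y_train)
X, y = make_classification(n_samples=200, n_features=5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Column 1 of predict_proba is the probability of class 1
p_class1 = model.predict_proba(X)[:, 1]

# The same probability recomputed by hand: sigmoid of the linear score w.x + b
z = X @ model.coef_.ravel() + model.intercept_[0]
manual = 1.0 / (1.0 + np.exp(-z))

# predict() labels a row 1 when that probability exceeds 0.5
labels_from_threshold = (p_class1 > 0.5).astype(int)

print(np.allclose(p_class1, manual))                           # True
print(np.array_equal(labels_from_threshold, model.predict(X)))  # True
```

Using `predict_proba` instead of `predict` is handy when you want to inspect how confident the model is, or try a threshold other than 0.5.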

In [None]:
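# Accuracy on the training data; this is optimistic because the model is scored on the rows it was fit on
reg_model.score(x_train,y_train)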
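Because the score above is computed on the same rows the model was trained on, it tends to overstate accuracy on unseen passengers. Here is a hedged sketch of k-fold cross-validation with `cross_val_score`, run on synthetic data shaped like the cleaned `x_train` (891 rows, 9 features) rather than on the real data; in the notebook you would pass `x_train` and `y_train` instead.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in with the same shape as the cleaned Titanic training features
X, y = make_classification(n_samples=891, n_features=9, random_state=0)

model = LogisticRegression(max_iter=1000)

# Training accuracy: the model is scored on the same rows it was fit on
train_acc = model.fit(X, y).score(X, y)

# 5-fold cross-validation: each fold is scored on rows held out from fitting
cv_scores = cross_val_score(model, X, y, cv=5)

print(f"training accuracy:        {train_acc:.3f}")
print(f"cross-validated accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Cross-validation does not tune anything here; it only gives a fairer estimate of how the untuned model generalizes.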

In [None]:
# Create the submission frame: one row per test passenger with PassengerId and the predicted Survived label

submit=pd.DataFrame(test['PassengerId'])
submit['Survived']=y_pred

In [None]:
# Save the predictions as a CSV file (the header row is written by default)
submit.to_csv('submission.csv',index=False)
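A hypothetical helper, `check_submission`, for sanity-checking the submission frame before saving it. It assumes the expected format is the two columns `PassengerId` and `Survived`, with one 0/1 prediction per test passenger, as in the competition's sample submission; the toy frame below stands in for `submit`.

```python
import pandas as pd

def check_submission(df: pd.DataFrame, expected_rows: int) -> None:
    """Hypothetical helper: raise AssertionError if a Titanic-style submission looks malformed."""
    # Exactly the two expected columns, in this order
    assert list(df.columns) == ['PassengerId', 'Survived'], list(df.columns)
    # One prediction per test passenger
    assert len(df) == expected_rows, len(df)
    # No passenger predicted twice
    assert df['PassengerId'].is_unique
    # Binary predictions only
    assert set(df['Survived'].unique()) <= {0, 1}

# Toy frame standing in for the notebook's 'submit' DataFrame
toy = pd.DataFrame({'PassengerId': [892, 893, 894], 'Survived': [0, 1, 0]})
check_submission(toy, expected_rows=3)
```

In the notebook this would be called as `check_submission(submit, expected_rows=len(test))` before `to_csv`.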