# TITANIC - LOGISTIC REGRESSION WITH GridSearchCV
This notebook applies logistic regression to the Kaggle Titanic dataset to predict passenger survival, covering preprocessing, model training, evaluation, and hyperparameter tuning with GridSearchCV.

The workflow includes:
- Data cleaning and preprocessing
- Handling missing values
- Feature engineering (encoding categorical variables)
- Splitting the data into training and test sets
- Building a logistic regression model
- Hyperparameter tuning with GridSearchCV
- Model evaluation and interpretation of results

The goal is to demonstrate a basic supervised machine learning workflow on a classic dataset.

In [None]:
import pandas as pd 
import numpy as np 
import seaborn as sns 

In [None]:
df = pd.read_csv("Titanic-Dataset.csv")
df.head()

In [None]:
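# Assign the result back: fillna(inplace=True) on a column selection is chained assignment, which pandas 2.x warns about and which does not modify df under Copy-on-Write
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])
df['Sex'] = df['Sex'].map({'male': 0, 'female': 1})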
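The cell above computes the median age and the most common port on the whole dataset, before the train/test split, so a little information from the test rows reaches preprocessing. A minimal sketch of the leakage-free alternative, assuming scikit-learn's `SimpleImputer` and a small hypothetical toy frame in place of the real CSV, fits each imputer on the training rows only and reuses the learned values on the test rows:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for the Titanic data (not the real CSV).
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 38.0, 26.0, np.nan, 35.0, 54.0, 2.0, 27.0, 14.0],
    "Embarked": ["S", "C", np.nan, "S", "Q", "S", "S", "S", "C", np.nan],
})

# Split first, then learn imputation statistics from the training rows only.
train, test = train_test_split(toy, test_size=0.3, random_state=42)

age_imputer = SimpleImputer(strategy="median")
emb_imputer = SimpleImputer(strategy="most_frequent")

# fit_transform on train learns the statistic; transform on test only applies it.
train_age = age_imputer.fit_transform(train[["Age"]])
test_age = age_imputer.transform(test[["Age"]])
train_emb = emb_imputer.fit_transform(train[["Embarked"]])
test_emb = emb_imputer.transform(test[["Embarked"]])

print("median learned from train:", age_imputer.statistics_[0])
print("mode learned from train:", emb_imputer.statistics_[0])
```

In a full workflow the same imputers would usually sit inside a `Pipeline`, so that cross-validation also refits them on each training fold.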

In [None]:
df = pd.get_dummies(df,columns=['Embarked'],drop_first =True)
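To see what `drop_first=True` does to `Embarked`, here is a small illustration on a hypothetical four-row column. `get_dummies` sorts the categories (C, Q, S) and drops the first one, so Cherbourg becomes the baseline encoded as all zeros; this removes a column that would otherwise be perfectly predictable from the other two.

```python
import pandas as pd

# Hypothetical toy column containing the three Titanic ports: C, Q, S.
ports = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

encoded = pd.get_dummies(ports, columns=["Embarked"], drop_first=True)
print(list(encoded.columns))  # ['Embarked_Q', 'Embarked_S']

# 'C' sorts first and is dropped: a row with Embarked_Q == Embarked_S == 0
# embarked at Cherbourg. Recent pandas returns bool dummies, so cast for display.
print(encoded.astype(int).values.tolist())  # [[0, 1], [0, 0], [1, 0], [0, 1]]
```

This is why the feature list below names `Embarked_Q` and `Embarked_S` but no `Embarked_C`.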

In [None]:
y = df['Survived']
X = df[['Pclass','Sex','Age','SibSp','Parch','Fare','Embarked_Q','Embarked_S']]

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
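Survival is imbalanced, so a purely random split can leave the two sets with noticeably different survival rates. A minimal sketch, using hypothetical labels with 38% positives and random features, shows that passing `stratify=` to `train_test_split` keeps the class ratio nearly identical in both splits:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels (38 positives out of 100) and random features.
rng = np.random.default_rng(0)
y_toy = np.array([1] * 38 + [0] * 62)
X_toy = rng.normal(size=(100, 3))

# stratify=y_toy preserves the positive rate in both the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)
print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

Applied to this notebook, the equivalent change would be adding `stratify=y` to the split call.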

In [None]:
from sklearn.linear_model import LogisticRegression 
classifier = LogisticRegression()

In [None]:
classifier.fit(X_train,y_train)
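After fitting, logistic regression scores each passenger linearly, $z = w \cdot x + b$, and maps that score to a survival probability with the sigmoid $p = 1 / (1 + e^{-z})$. The sketch below, on hypothetical synthetic data from `make_classification` rather than the Titanic features, recomputes the probabilities from `coef_` and `intercept_` and checks them against `predict_proba`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data in place of the Titanic features.
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
model = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = w . x + b, then the logistic (sigmoid) function.
z = X_demo @ model.coef_.ravel() + model.intercept_[0]
p_manual = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba is P(y = 1 | x); predict() labels a row 1 when it exceeds 0.5.
p_sklearn = model.predict_proba(X_demo)[:, 1]
print(np.allclose(p_manual, p_sklearn))  # True
```

Because predictions are thresholded at 0.5, a positive coefficient pushes a passenger toward the "survived" class as that feature increases.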

In [None]:
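from sklearn.model_selection import GridSearchCV
# The default 'lbfgs' solver supports only 'l2'; 'liblinear' supports both 'l1' and 'l2'.
# ('elastic' is not a valid penalty; 'elasticnet' would need solver='saga' plus 'l1_ratio'.)
parameter = {'solver': ['liblinear'], 'penalty': ['l2', 'l1'], 'C': [1, 2, 3, 4, 10], 'max_iter': [100, 200, 300]}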

In [None]:
classifier_pgrm = GridSearchCV(classifier, param_grid=parameter, scoring='accuracy', cv=5)

In [None]:
classifier_pgrm.fit(X_train,y_train)
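Features such as `Fare` and `Age` live on very different scales, which slows solver convergence and makes the penalty act unevenly across features. One possible extension, sketched here on hypothetical synthetic data, wraps a `StandardScaler` and the classifier in a scikit-learn `Pipeline` so the scaler is refit on each cross-validation training fold, and uses the `saga` solver so that `l1`, `l2`, and `elasticnet` can all be searched. The `C` and `l1_ratio` values are illustrative choices, not taken from the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for X_train / y_train.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# The scaler is refit inside every CV training fold, so validation folds never
# influence it. 'saga' supports all three penalties but needs scaled inputs.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(solver="saga", max_iter=5000)),
])

# Parameters of a pipeline step are addressed as <step>__<param>.
# A list of dicts keeps l1_ratio out of the non-elastic-net candidates.
param_grid = [
    {"clf__penalty": ["l1", "l2"], "clf__C": [0.1, 1, 10]},
    {"clf__penalty": ["elasticnet"], "clf__l1_ratio": [0.2, 0.5, 0.8],
     "clf__C": [0.1, 1, 10]},
]

search = GridSearchCV(pipe, param_grid=param_grid, scoring="accuracy", cv=5)
search.fit(X_demo, y_demo)
print(search.best_params_)
print(round(search.best_score_, 3))
```

The grid above evaluates 2 × 3 + 3 × 3 = 15 candidates, each with 5 folds.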

In [None]:
classifier_pgrm.best_params_

In [None]:
classifier_pgrm.best_score_
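`best_score_` is not a test-set score: it is the mean validation accuracy of the best parameter combination across the 5 cross-validation folds, computed on the training data only. The sketch below, on hypothetical synthetic data with an illustrative `C` grid, recovers it from the per-fold entries of `cv_results_`:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical synthetic data and a small illustrative grid.
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=1)
search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1, 10]}, scoring="accuracy", cv=5)
search.fit(X_demo, y_demo)

# best_index_ points at the winning candidate; split{k}_test_score holds
# its accuracy on validation fold k.
i = search.best_index_
fold_scores = [search.cv_results_[f"split{k}_test_score"][i] for k in range(5)]
print(search.best_score_, sum(fold_scores) / 5)
```

The held-out `X_test` only enters the picture when `predict` is called on it in the next step.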

In [None]:
y_pred = classifier_pgrm.predict(X_test)

In [None]:
from sklearn.metrics import accuracy_score, classification_report
# scikit-learn metrics take the true labels first, then the predictions
score = accuracy_score(y_test, y_pred)
score

In [None]:
# Order matters: true labels first, otherwise the precision and recall columns are swapped
print(classification_report(y_test, y_pred))
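Accuracy gives the same value whichever argument comes first, but precision and recall do not. With hypothetical labels, the example below counts TP = 3, FN = 1, and FP = 2 for class 1 and shows that reversing the arguments exchanges the two metrics, which is exactly what happens to the columns of `classification_report`:

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels: 1 = survived, 0 = did not survive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

# Class 1: TP = 3, FN = 1, FP = 2.
print(precision_score(y_true, y_pred))  # 3 / (3 + 2) = 0.6
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75

# Swapping the arguments swaps precision and recall.
print(precision_score(y_pred, y_true))  # 0.75
print(recall_score(y_pred, y_true))     # 0.6
```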

In [None]:
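# Restrict to the modelling columns; PassengerId is only an identifier and Name/Ticket/Cabin are text
sns.pairplot(df[['Survived', 'Pclass', 'Sex', 'Age', 'SibSp', 'Parch', 'Fare']], hue='Survived')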

# Conclusion

In this notebook, we applied **Logistic Regression** to the Titanic dataset to predict passenger survival.

Key steps:

- We handled missing data and encoded categorical features.
- We used **train-test split** to evaluate model performance.
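- We used **GridSearchCV** with 5-fold cross-validation to tune the regularization penalty, the inverse regularization strength `C`, and `max_iter`.
- The tuned model's accuracy, precision, and recall on the held-out 30% test set are reported by `accuracy_score` and `classification_report`.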

### Possible Improvements:

- Try more advanced models (Random Forest, XGBoost, etc.)
- Perform more extensive feature engineering.
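- Use stratified or repeated k-fold cross-validation to get a lower-variance estimate of generalization performance.
- Visualize feature importance (for logistic regression, the coefficients of standardized features).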
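For logistic regression, a simple form of feature importance is the coefficient vector, provided the inputs are standardized so that the magnitudes are comparable. A minimal sketch, assuming a `StandardScaler` plus `LogisticRegression` pipeline and hypothetical random data that only reuses a few of the notebook's column names (the label rule is invented for the demo), ranks features by absolute coefficient and reports odds ratios:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: random values under the notebook's column names.
rng = np.random.default_rng(0)
cols = ["Pclass", "Sex", "Age", "Fare"]
X_demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=cols)
# Invented label rule for the demo: driven mostly by 'Sex', then 'Pclass'.
logit = 2.0 * X_demo["Sex"] - 1.0 * X_demo["Pclass"]
y_demo = (logit + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X_demo, y_demo)
coefs = pd.Series(model[-1].coef_.ravel(), index=cols)

# On standardized inputs |coef| is comparable across features; exp(coef) is
# the odds ratio for a one-standard-deviation increase in that feature.
importance = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(importance)
print(np.exp(importance).round(2))
```

With matplotlib installed, `importance.plot.barh()` turns the ranking into a bar chart.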

### Summary:

This project serves as a simple end-to-end demonstration of a **supervised machine learning pipeline** with Logistic Regression on a well-known dataset. It can be extended further to improve both performance and interpretability.
