# Exercise 1: Exploratory Data Analysis
Instructions

Load the iris dataset.
Perform data cleaning and handle missing values.
Conduct exploratory data analysis to understand the relationships between the different features and the species.


In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report
from xgboost import XGBClassifier

In [5]:
# import iris dataset
df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data', header=None)
df.columns = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

In [6]:
# split data into X and y
X = df.drop('species', axis=1)
y = df['species']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.head())
print(y_train.head())

    sepal_length  sepal_width  petal_length  petal_width
22           4.6          3.6           1.0          0.2
15           5.7          4.4           1.5          0.4
65           6.7          3.1           4.4          1.4
11           4.8          3.4           1.6          0.2
42           4.4          3.2           1.3          0.2
22        Iris-setosa
15        Iris-setosa
65    Iris-versicolor
11        Iris-setosa
42        Iris-setosa
Name: species, dtype: object
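
The cells above load and split the data but skip the cleaning and exploration steps listed in the instructions. A minimal sketch of those steps follows. It assumes scikit-learn's bundled copy of iris (`load_iris`) in place of the UCI URL so that it runs offline; the `eda_df` name and the chosen summaries are illustrative, not part of the original notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled iris data instead of the UCI CSV.
# Column names mirror the ones assigned in the notebook.
iris = load_iris()
eda_df = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
eda_df['species'] = iris.target_names[iris.target]

# Data cleaning: count missing values per column and exact duplicate rows.
missing_per_column = eda_df.isna().sum()
n_duplicates = int(eda_df.duplicated().sum())
print(missing_per_column)
print('Duplicate rows:', n_duplicates)

# EDA: class balance, per-species feature means, and feature correlations.
class_counts = eda_df['species'].value_counts()
species_means = eda_df.groupby('species').mean()
feature_corr = eda_df.drop(columns='species').corr()
print(class_counts)
print(species_means.round(2))
print(feature_corr.round(2))
```

The per-species means and the correlation matrix are usually enough to see which features separate the classes before any model is trained.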


# Exercise 2: Decision Tree Classifier without Grid Search
Instructions

Implement a Decision Tree Classifier on the iris dataset.
Manually choose and set the hyperparameters.
Evaluate its performance using accuracy, precision, recall, and F1 score.


In [7]:
logreg = LogisticRegression()
logreg.fit(X_train, y_train)
logreg_predictions = logreg.predict(X_test)
print('Logistic Regression without Grid Search Accuracy:', accuracy_score(y_test, logreg_predictions))

Logistic Regression without Grid Search Accuracy: 1.0
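
The cell above fits a logistic regression and reports accuracy only, whereas the instructions ask for a Decision Tree with manually chosen hyperparameters and for precision, recall, and F1 as well. The following is a minimal sketch of that variant, assuming scikit-learn's bundled iris data and the same split settings as above; the hyperparameter values and variable names are illustrative choices, not the notebook's.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_dt, y_dt = load_iris(return_X_y=True)
X_dt_train, X_dt_test, y_dt_train, y_dt_test = train_test_split(
    X_dt, y_dt, test_size=0.2, random_state=42
)

# Hand-picked (illustrative) hyperparameters: a shallow tree with small leaves.
tree = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2, random_state=42)
tree.fit(X_dt_train, y_dt_train)
tree_predictions = tree.predict(X_dt_test)

tree_accuracy = accuracy_score(y_dt_test, tree_predictions)
# Macro averaging weights the three species equally.
tree_precision, tree_recall, tree_f1, _ = precision_recall_fscore_support(
    y_dt_test, tree_predictions, average='macro', zero_division=0
)
print('Decision Tree accuracy:', tree_accuracy)
print('Macro precision:', tree_precision)
print('Macro recall:', tree_recall)
print('Macro F1:', tree_f1)
```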


# Exercise 3: Decision Tree Classifier with Grid Search
Instructions

Apply GridSearchCV to find the optimal hyperparameters for the Decision Tree Classifier.
Compare its performance with the manually tuned model from Exercise 2.


In [13]:

param_grid_lr = {'C': [0.1, 1, 10, 100], 'penalty': ['l1', 'l2'], 'solver': ['liblinear'], 'max_iter': [100, 1000, 10000]}
grid_search_lr = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid_lr, cv=5)
grid_search_lr.fit(X_train, y_train)
best_lr = grid_search_lr.best_estimator_
print('Best Logistic Regression Accuracy:', accuracy_score(y_test, best_lr.predict(X_test)))


Best Logistic Regression Accuracy: 1.0
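
The grid search above tunes a logistic regression. For the Decision Tree named in the instructions, a comparable search might look like the sketch below. It assumes scikit-learn's bundled iris data; the grid values and the manual baseline settings are illustrative, and with only 30 test samples small accuracy differences should be read with caution.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_gs, y_gs = load_iris(return_X_y=True)
X_gs_train, X_gs_test, y_gs_train, y_gs_test = train_test_split(
    X_gs, y_gs, test_size=0.2, random_state=42
)

# Illustrative search space for the tree.
param_grid_dt = {
    'max_depth': [2, 3, 4, None],
    'min_samples_leaf': [1, 2, 5],
    'criterion': ['gini', 'entropy'],
}
grid_search_dt = GridSearchCV(
    DecisionTreeClassifier(random_state=42), param_grid_dt, cv=5
)
grid_search_dt.fit(X_gs_train, y_gs_train)

# Manually configured baseline (illustrative settings) for comparison.
manual_dt = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2, random_state=42)
manual_dt.fit(X_gs_train, y_gs_train)

manual_dt_accuracy = accuracy_score(y_gs_test, manual_dt.predict(X_gs_test))
tuned_dt_accuracy = accuracy_score(
    y_gs_test, grid_search_dt.best_estimator_.predict(X_gs_test)
)
print('Best parameters:', grid_search_dt.best_params_)
print('Mean cross-validated accuracy:', grid_search_dt.best_score_)
print('Manual tree test accuracy:', manual_dt_accuracy)
print('Tuned tree test accuracy:', tuned_dt_accuracy)
```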




# Exercise 4: K-Nearest Neighbors (KNN) without Grid Search
Instructions

Train a KNN classifier on the iris dataset without using hyperparameter tuning.
Choose the number of neighbors and distance metric based on your understanding.
Assess the model’s performance and discuss the choice of hyperparameters.


In [14]:
svm = SVC(kernel='linear')
svm.fit(X_train, y_train)
svm_predictions = svm.predict(X_test)
print('SVM without Grid Search Accuracy:', accuracy_score(y_test, svm_predictions))

SVM without Grid Search Accuracy: 1.0
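
The cell above trains a linear SVM. A KNN classifier with a hand-picked number of neighbors and distance metric, as the instructions describe, could be sketched as follows. This assumes scikit-learn's bundled iris data; `n_neighbors=5` with the Euclidean metric is an illustrative choice, and the scaler is added because KNN distances are sensitive to feature scale.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_knn, y_knn = load_iris(return_X_y=True)
X_knn_train, X_knn_test, y_knn_train, y_knn_test = train_test_split(
    X_knn, y_knn, test_size=0.2, random_state=42
)

# Standardize the features, then classify by majority vote of the 5 nearest neighbors.
knn = make_pipeline(
    StandardScaler(),
    KNeighborsClassifier(n_neighbors=5, metric='euclidean'),
)
knn.fit(X_knn_train, y_knn_train)
knn_predictions = knn.predict(X_knn_test)

knn_accuracy = accuracy_score(y_knn_test, knn_predictions)
print('KNN without Grid Search Accuracy:', knn_accuracy)
print(classification_report(y_knn_test, knn_predictions))
```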


# Exercise 5: K-Nearest Neighbors (KNN) with Grid Search
Instructions

Use GridSearchCV to optimize the hyperparameters of the KNN classifier, like the number of neighbors and distance metric.
Evaluate and compare the performance of the tuned model against the model from Exercise 4.


In [15]:
param_grid_svm = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1], 'kernel': ['linear', 'rbf']}
grid_search_svm = GridSearchCV(SVC(), param_grid_svm, cv=5)
grid_search_svm.fit(X_train, y_train)
best_svm = grid_search_svm.best_estimator_
print('Best SVM Accuracy:', accuracy_score(y_test, best_svm.predict(X_test)))

Best SVM Accuracy: 1.0
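
The search above tunes an SVM. For KNN, the number of neighbors and the distance metric can be searched in the same way. The sketch below assumes scikit-learn's bundled iris data; the step names `scale` and `knn` and the grid values are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_kg, y_kg = load_iris(return_X_y=True)
X_kg_train, X_kg_test, y_kg_train, y_kg_test = train_test_split(
    X_kg, y_kg, test_size=0.2, random_state=42
)

# Scaling sits inside the pipeline so it is refit on each cross-validation fold.
knn_pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('knn', KNeighborsClassifier()),
])

# Parameters of a pipeline step are addressed as <step name>__<parameter>.
param_grid_knn = {
    'knn__n_neighbors': [1, 3, 5, 7, 9, 11],
    'knn__metric': ['euclidean', 'manhattan'],
    'knn__weights': ['uniform', 'distance'],
}
grid_search_knn = GridSearchCV(knn_pipeline, param_grid_knn, cv=5)
grid_search_knn.fit(X_kg_train, y_kg_train)

best_knn_accuracy = accuracy_score(
    y_kg_test, grid_search_knn.best_estimator_.predict(X_kg_test)
)
print('Best KNN parameters:', grid_search_knn.best_params_)
print('Best KNN Accuracy:', best_knn_accuracy)
```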


# Exercise 6: Neural Network Classifier without Hyperparameter Tuning
Instructions

Build a basic Neural Network using libraries like TensorFlow or PyTorch to classify iris species.
Set the layers, neurons, and activation functions based on your intuition.
Analyze the model’s effectiveness in predicting the species.


In [17]:
# one hot encoding
y_train_enc = pd.get_dummies(y_train)
y_test_enc = pd.get_dummies(y_test)

In [18]:
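# Recent XGBoost releases ignore use_label_encoder, which is why the
# "Parameters ... are not used" warning appears in the output.
xgb = XGBClassifier(use_label_encoder=False, eval_metric='mlogloss')
# The targets are one-hot encoded, so predictions are indicator rows.
xgb.fit(X_train, y_train_enc)
xgb_predictions = xgb.predict(X_test)
# On indicator rows, accuracy_score counts a sample as correct only if the whole row matches.
print('XGBoost without Grid Search Accuracy:', accuracy_score(y_test_enc, xgb_predictions))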

XGBoost without Grid Search Accuracy: 1.0


Parameters: { "use_label_encoder" } are not used.
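
The cell above uses XGBoost rather than the neural network the instructions call for. As a lightweight stand-in for a TensorFlow or PyTorch model, the sketch below uses scikit-learn's `MLPClassifier`; this is a substitution, not the notebook's method. It assumes scikit-learn's bundled iris data, and the layer sizes, activation, and iteration limit are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_nn, y_nn = load_iris(return_X_y=True)
X_nn_train, X_nn_test, y_nn_train, y_nn_test = train_test_split(
    X_nn, y_nn, test_size=0.2, random_state=42
)

# Two hidden layers (16 and 8 units) with ReLU activations, chosen by hand.
# Inputs are standardized first, which generally helps gradient-based training.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(
        hidden_layer_sizes=(16, 8),
        activation='relu',
        max_iter=2000,
        random_state=42,
    ),
)
mlp.fit(X_nn_train, y_nn_train)

mlp_accuracy = accuracy_score(y_nn_test, mlp.predict(X_nn_test))
print('Neural Network without tuning Accuracy:', mlp_accuracy)
```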



# Exercise 7: Neural Network Classifier with Hyperparameter Tuning
Instructions

Implement a Neural Network and use techniques like RandomizedSearchCV (or other applicable methods) to tune hyperparameters like the number of layers, neurons, and learning rate.
Evaluate the performance and compare it with the Neural Network from Exercise 6.


In [20]:
param_grid_xgb = {'learning_rate': [0.01, 0.1, 0.5], 'n_estimators': [50, 100, 200], 'max_depth': [3, 5, 7]}
grid_search_xgb = GridSearchCV(XGBClassifier(use_label_encoder=False, eval_metric='mlogloss'), param_grid_xgb, cv=5)
grid_search_xgb.fit(X_train, y_train_enc)
best_xgb = grid_search_xgb.best_estimator_
print('Best XGBoost Accuracy:', accuracy_score(y_test_enc, best_xgb.predict(X_test)))

Parameters: { "use_label_encoder" } are not used.


Best XGBoost Accuracy: 1.0
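
The grid search above tunes XGBoost. The instructions instead describe tuning a neural network with `RandomizedSearchCV`, which could be sketched as follows. `MLPClassifier` is again a stand-in for a TensorFlow or PyTorch model; the data source (scikit-learn's bundled iris), the step names, the search distributions, and `n_iter=8` are all illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the CSV loaded above.
X_rs, y_rs = load_iris(return_X_y=True)
X_rs_train, X_rs_test, y_rs_train, y_rs_test = train_test_split(
    X_rs, y_rs, test_size=0.2, random_state=42
)

mlp_pipeline = Pipeline([
    ('scale', StandardScaler()),
    ('mlp', MLPClassifier(max_iter=1000, random_state=42)),
])

# Layer layouts come from a list; learning rate and L2 penalty are
# sampled on a log scale.
param_distributions = {
    'mlp__hidden_layer_sizes': [(8,), (16,), (16, 8), (32, 16)],
    'mlp__learning_rate_init': loguniform(1e-4, 1e-1),
    'mlp__alpha': loguniform(1e-5, 1e-2),
}
random_search_mlp = RandomizedSearchCV(
    mlp_pipeline,
    param_distributions,
    n_iter=8,          # number of sampled configurations
    cv=3,
    random_state=42,
)
random_search_mlp.fit(X_rs_train, y_rs_train)

best_mlp_accuracy = accuracy_score(
    y_rs_test, random_search_mlp.best_estimator_.predict(X_rs_test)
)
print('Best Neural Network parameters:', random_search_mlp.best_params_)
print('Best Neural Network Accuracy:', best_mlp_accuracy)
```

Random search samples a fixed number of configurations, so its cost is set by `n_iter` rather than by the size of the search space, which is convenient when continuous parameters such as the learning rate are involved.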



