# Stage 07 — Machine Learning Modeling

## Objectives
Use the prepared Falcon 9 dataset to build and evaluate supervised models that predict the **first-stage landing outcome**.


**Goals**
- Perform exploratory data analysis (EDA) to validate assumptions and spot key patterns.
- Define and create the **training label** (`Class`): 1 = success, 0 = failure.
- Standardize/normalize features as needed.
- Split data into **train** and **test** sets.
- Train and tune:
  - **Support Vector Machine (SVM)**
  - **Decision Tree Classifier**
  - **Logistic Regression**
  - **K-Nearest Neighbors (KNN)**
- Select the **best model** based on test performance.

## Import Libraries and Define Auxiliary Functions


In [None]:
# pandas: data manipulation and analysis
import pandas as pd
# NumPy: multi-dimensional arrays and matrices, plus high-level mathematical functions that operate on them
import numpy as np
# Matplotlib: plotting library; pyplot provides a MATLAB-like interface, used by the plotting helper below
import matplotlib.pyplot as plt
# Seaborn: statistical data visualization built on Matplotlib
import seaborn as sns
# preprocessing: used to standardize the data
from sklearn import preprocessing
# train_test_split: splits the data into training and test sets
from sklearn.model_selection import train_test_split
# GridSearchCV: searches a hyperparameter grid with cross-validation and keeps the best combination
from sklearn.model_selection import GridSearchCV
# Logistic Regression classification algorithm
from sklearn.linear_model import LogisticRegression
# Support Vector Machine classification algorithm
from sklearn.svm import SVC
# Decision Tree classification algorithm
from sklearn.tree import DecisionTreeClassifier
# K-Nearest Neighbors classification algorithm
from sklearn.neighbors import KNeighborsClassifier

Function to plot confusion matrix:


In [None]:
def plot_confusion_matrix(y, y_predict):
    """Plot the confusion matrix for true labels y and predictions y_predict."""
    from sklearn.metrics import confusion_matrix

    # Fix the label order so the tick labels below always match the matrix rows and columns
    cm = confusion_matrix(y, y_predict, labels=[0, 1])
    ax = plt.subplot()
    sns.heatmap(cm, annot=True, fmt='d', ax=ax)  # annot=True writes the count in each cell
    ax.set_xlabel('Predicted labels')
    ax.set_ylabel('True labels')
    ax.set_title('Confusion Matrix')
    ax.xaxis.set_ticklabels(['did not land', 'landed'])
    ax.yaxis.set_ticklabels(['did not land', 'landed'])
    plt.show()
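
To make the layout of the plotted matrix concrete, here is a minimal sketch on toy labels (invented for illustration, not taken from the Falcon 9 data). It assumes the same convention as the helper above: rows are true classes, columns are predicted classes, ordered `[0, 1]`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels, invented for illustration only (1 = landed, 0 = did not land).
y_true = np.array([0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1, 0])

# Rows are true classes, columns are predicted classes, both ordered [0, 1].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# For a binary problem, ravel() yields the cells in this order.
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)  # share of predicted landings that were real landings
recall = tp / (tp + fn)     # share of real landings that were predicted

print(cm)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

Reading the four cells separately can be more informative than accuracy alone, since false positives (predicting a landing that fails) and false negatives may not be equally costly.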

## Load the dataframe


In [None]:
data = pd.read_csv('dataset_part_2.csv')

In [None]:
data.head()

In [None]:
X = pd.read_csv('dataset_part_3.csv')

In [None]:
X.head(100)

## Extract the `Class` Column as a NumPy Array


In [None]:
Y = data['Class'].to_numpy()

## Standardize the Data

In [None]:
# Standardize each feature to zero mean and unit variance
transform = preprocessing.StandardScaler()
X = transform.fit_transform(X)
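
The cell above fits `StandardScaler` on the full feature matrix before the train/test split, so test-set statistics influence the scaling. One possible alternative, sketched here on synthetic data (`make_classification` is a stand-in, not the Falcon 9 dataset), is to place the scaler inside a `Pipeline` so it is refit on the training portion only. The step names `scale` and `clf` are arbitrary choices for this sketch.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the feature matrix and labels (not the Falcon 9 data).
X_demo, y_demo = make_classification(n_samples=90, n_features=8, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2
)

# The scaler is a pipeline step, so each CV fold fits it on that fold's training part only.
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Pipeline parameters are addressed as <step name>__<parameter name>.
grid = GridSearchCV(pipe, param_grid={"clf__C": [0.01, 0.1, 1]}, cv=5)
grid.fit(X_tr, y_tr)

test_acc = grid.score(X_te, y_te)
print(grid.best_params_, test_acc)
```

With this arrangement the test rows are scaled using statistics learned from the training rows, which is closer to how the model would see unseen launches.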

## Train/Test Split


In [None]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=2)
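
The split above is a plain random split. If the two landing outcomes are imbalanced, a stratified split may be worth considering; the sketch below uses toy labels (invented for illustration) to show how the `stratify` argument of `train_test_split` keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy imbalanced labels, invented for illustration: 60 successes and 30 failures.
y_demo = np.array([1] * 60 + [0] * 30)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo asks for the same class ratio in the train and test parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2, stratify=y_demo
)

print("overall:", y_demo.mean(), "train:", y_tr.mean(), "test:", y_te.mean())
```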

In [None]:
Y_test.shape

## Logistic Regression

In [None]:
parameters ={'C':[0.01,0.1,1],
             'penalty':['l2'],
             'solver':['lbfgs']}

In [None]:
# Only the l2 (ridge) penalty is searched; the lbfgs solver does not support l1 (lasso)
lr = LogisticRegression()


In [None]:
logreg_cv = GridSearchCV(estimator=lr, param_grid=parameters, cv=10)
logreg_cv.fit(X_train, Y_train)
print("tuned hyperparameters (best parameters):", logreg_cv.best_params_)
print("best cross-validation accuracy:", logreg_cv.best_score_)
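
Note that `best_score_` is the mean cross-validated accuracy of the best parameter combination on the training data, not the test accuracy. As a rough sketch on synthetic data (a stand-in for the notebook's features), the full search results can be inspected through `cv_results_`:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (not the Falcon 9 dataset).
X_demo, y_demo = make_classification(n_samples=80, n_features=6, random_state=0)

search = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1]}, cv=5)
search.fit(X_demo, y_demo)

# One row per parameter combination, with the mean and spread of the fold scores.
results = pd.DataFrame(search.cv_results_)[
    ["param_C", "mean_test_score", "std_test_score", "rank_test_score"]
]
print(results)

# best_score_ is the mean_test_score of the row at best_index_.
best_row = results.loc[search.best_index_]
print(best_row["mean_test_score"], search.best_score_)
```

Looking at `std_test_score` alongside the mean helps judge whether the gap between two candidates is larger than the fold-to-fold noise.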

## Logistic Regression Accuracy


Accuracy on the test data, computed with the <code>score</code> method:


In [None]:
logreg_accuracy = logreg_cv.score(X_test, Y_test)
logreg_accuracy

Confusion matrix:


In [None]:
yhat=logreg_cv.predict(X_test)
plot_confusion_matrix(Y_test,yhat)

## SVM


In [None]:
parameters = {'kernel': ('linear', 'rbf', 'poly', 'sigmoid'),
              'C': np.logspace(-3, 3, 5),
              'gamma':np.logspace(-3, 3, 5)}
svm = SVC()

In [None]:
svm_cv = GridSearchCV(estimator=svm, param_grid=parameters, cv=10)
svm_cv.fit(X_train, Y_train)

In [None]:
print("tuned hyperparameters (best parameters):", svm_cv.best_params_)
print("best cross-validation accuracy:", svm_cv.best_score_)

## SVM Accuracy


Accuracy on the test data, computed with the <code>score</code> method:


In [None]:
svm_accuracy = svm_cv.score(X_test, Y_test)
svm_accuracy

Confusion matrix:


In [None]:
yhat=svm_cv.predict(X_test)
plot_confusion_matrix(Y_test,yhat)

## Decision Tree


In [None]:
parameters = {'criterion': ['gini', 'entropy'],
     'splitter': ['best', 'random'],
     'max_depth': [2*n for n in range(1,10)],
     'max_features': ['sqrt'],
     'min_samples_leaf': [1, 2, 4],
     'min_samples_split': [2, 5, 10]}

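# Fixed seed so the 'random' splitter and feature subsampling give reproducible results
tree = DecisionTreeClassifier(random_state=2)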

In [None]:
tree_cv = GridSearchCV(estimator=tree, param_grid=parameters, cv=10)
tree_cv.fit(X_train, Y_train)

In [None]:
print("tuned hyperparameters (best parameters):", tree_cv.best_params_)
print("best cross-validation accuracy:", tree_cv.best_score_)

## Decision Tree Accuracy


Accuracy on the test data, computed with the <code>score</code> method:


In [None]:
tree_accuracy = tree_cv.score(X_test, Y_test)
tree_accuracy

Confusion matrix:


In [None]:
yhat = tree_cv.predict(X_test)
plot_confusion_matrix(Y_test,yhat)

## K-Nearest Neighbors


In [None]:
parameters = {'n_neighbors': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
              'algorithm': ['auto', 'ball_tree', 'kd_tree', 'brute'],
              'p': [1,2]}

KNN = KNeighborsClassifier()

In [None]:
knn_cv = GridSearchCV(estimator=KNN, param_grid=parameters, cv=10)
knn_cv.fit(X_train, Y_train)

In [None]:
print("tuned hyperparameters (best parameters):", knn_cv.best_params_)
print("best cross-validation accuracy:", knn_cv.best_score_)

## KNN Accuracy


Accuracy on the test data, computed with the <code>score</code> method:


In [None]:
KNN_accuracy = knn_cv.score(X_test, Y_test)
KNN_accuracy

Confusion matrix:


In [None]:
yhat = knn_cv.predict(X_test)
plot_confusion_matrix(Y_test,yhat)

## Best Performance


In [None]:
models = {
    'Logistic Regression': logreg_cv,
    'SVM': svm_cv,
    'Decision Tree': tree_cv,
    'KNN': knn_cv
}

for name, model in models.items():
    print(f"{name} best CV accuracy: {model.best_score_:.4f}")
    print(f"{name} test accuracy: {model.score(X_test, Y_test):.4f}\n")
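
The loop above prints the scores but leaves the final choice to the reader. A minimal sketch of the selection step follows; `best_models` is a hypothetical helper and the scores are placeholders invented for illustration, not results from this notebook. It returns every model that ties for the top score, since several models can land on the same test accuracy.

```python
def best_models(scores, tol=1e-9):
    """Return the names of all models whose score ties for the maximum."""
    top = max(scores.values())
    return [name for name, score in scores.items() if abs(score - top) <= tol]


# Placeholder scores, invented for illustration only.
example_scores = {"model_a": 0.80, "model_b": 0.80, "model_c": 0.75}

winners = best_models(example_scores)
print(winners)
```

With the fitted searches from this notebook, the input could be built as `{name: model.score(X_test, Y_test) for name, model in models.items()}`. When several models tie, the cross-validation accuracy (`best_score_`) is one reasonable tie-breaker.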

## Authors


[Pratiksha Verma](https://www.linkedin.com/in/pratiksha-verma-6487561b1/)


<!--## Change Log--!>


<!--| Date (YYYY-MM-DD) | Version | Changed By      | Change Description      |
| ----------------- | ------- | -------------   | ----------------------- |
| 2022-11-09        | 1.0     | Pratiksha Verma | Converted initial version to Jupyterlite|--!>


### <h3 align="center"> IBM Corporation 2022. All rights reserved. </h3>
