# Launch Success Prediction Using Machine Learning

This notebook uses a classification algorithm to predict Falcon 9 launch success from features such as `PayloadMass`, `Orbit`, and `Booster_Version`.

## Model Training and Prediction

We train a logistic regression model on these features, address the class imbalance in the data, and evaluate the model's performance on a held-out test set.

In [12]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

In [13]:
df = pd.read_csv(r'C:\Users\user\Downloads\falcon9_cleaned.csv')
print("Original class distribution:")
print(df['Class'].value_counts())

Original class distribution:
Class
1    100
0      1
Name: count, dtype: int64


## Balancing the Dataset

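The dataset is highly imbalanced: 100 successful launches and only 1 failed launch. To balance the classes, we replicate the single failure row 10 times and draw a random sample of 10 success rows. The 10 failure rows are identical copies, so they balance the class counts without adding any new information about failed launches.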

In [14]:
# Separate classes
fail_df = df[df['Class'] == 0]
success_df = df[df['Class'] == 1]

# Duplicate failure rows (10 times for balancing)
fail_df = pd.concat([fail_df]*10, ignore_index=True)

# Sample equal number of success rows
success_sample = success_df.sample(n=len(fail_df), random_state=42)

# Combine and shuffle
balanced_df = pd.concat([success_sample, fail_df]).sample(frac=1, random_state=42)

# Check new class distribution
print("Balanced class distribution:")
print(balanced_df['Class'].value_counts())

Balanced class distribution:
Class
1    10
0    10
Name: count, dtype: int64
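Duplicating rows before the train-test split means copies of the same row can end up on both sides of the split. One common alternative is to split first and oversample only the training portion. The following is a minimal sketch of that idea, not the method used in this notebook: the `toy` frame and its values are made up for illustration, and only the column names are borrowed from the dataset. With a single real failure, no resampling scheme adds new information, so the sketch assumes a dataset with several failures.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Toy data standing in for the real CSV (values are invented for illustration).
toy = pd.DataFrame({
    "PayloadMass": [500, 1200, 3000, 4500, 5200, 6100,
                    7000, 8000, 9000, 9500, 10000, 12000],
    "Class":       [0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0],
})

# 1) Split first, so no test row has a copy in the training set.
train_df, test_df = train_test_split(
    toy, test_size=0.25, stratify=toy["Class"], random_state=42
)

# 2) Oversample the minority class inside the training set only.
minority = train_df[train_df["Class"] == 0]
majority = train_df[train_df["Class"] == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=42
)
train_balanced = pd.concat([majority, minority_up]).sample(frac=1, random_state=42)

print(train_balanced["Class"].value_counts())
shared = set(train_balanced.index) & set(test_df.index)
print("Rows shared between train and test:", len(shared))
```

The test set keeps the original class proportions, and the duplicated rows influence only the fitted model, not the reported metrics.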


## Feature Engineering

We use the payload mass, orbit type, and booster version as features. Categorical features are converted into numerical format using one-hot encoding.

In [15]:
features = balanced_df[['PayloadMass', 'Orbit', 'Booster_Version']]
X = pd.get_dummies(features)  # One-hot encode categorical features
y = balanced_df['Class']
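To make the encoding step concrete, here is a small sketch of what `pd.get_dummies` produces. The three rows and the labels `LEO`, `GTO`, `ISS`, `B4`, and `B5` are illustrative values, not rows from the dataset. The second half shows one way to align a new row with the training columns using `reindex`, which is an assumption about how the model might be applied later rather than something this notebook does.

```python
import pandas as pd

# Toy rows; the orbit and booster labels are illustrative values only.
toy = pd.DataFrame({
    "PayloadMass": [3000.0, 5500.0, 9600.0],
    "Orbit": ["LEO", "GTO", "LEO"],
    "Booster_Version": ["B4", "B5", "B5"],
})

# Numeric columns pass through; each category becomes its own indicator column.
encoded = pd.get_dummies(toy)
print(encoded.columns.tolist())

# A new row may contain categories never seen before (here "ISS").
new_rows = pd.DataFrame({
    "PayloadMass": [4000.0],
    "Orbit": ["ISS"],
    "Booster_Version": ["B5"],
})

# Align to the training columns: unseen categories are dropped,
# missing indicator columns are filled with 0.
new_encoded = pd.get_dummies(new_rows).reindex(columns=encoded.columns, fill_value=0)
print(new_encoded)
```

Without the `reindex` step, the new frame would have a different set of columns than the one the model was trained on.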

## Train-Test Split

We split the data into training and testing sets to evaluate model performance.

In [16]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
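A plain random split does not guarantee that both classes are represented proportionally in the test set; the classification report later in this notebook shows a test set of 3 failures and 1 success. As a hedged sketch, passing `stratify=y` to `train_test_split` keeps the class ratio the same in both parts. The `X_toy` and `y_toy` objects below are synthetic stand-ins with the same 10/10 class counts as the balanced data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 10 samples per class, like the balanced dataset.
X_toy = pd.DataFrame({"PayloadMass": range(20)})
y_toy = pd.Series([0] * 10 + [1] * 10, name="Class")

# stratify=y_toy preserves the 50/50 class ratio in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, stratify=y_toy, random_state=42
)

print("Test class counts:", y_te.value_counts().to_dict())
print("Train class counts:", y_tr.value_counts().to_dict())
```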

## Feature Scaling

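To improve model convergence, we standardize the features using `StandardScaler`. The scaler is fit on the training set only and then applied to the test set. Because it is applied to all of `X`, it rescales every column, including the one-hot encoded ones.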

In [17]:
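scaler = StandardScaler()
# Learn the mean and standard deviation from the training data only
X_train_scaled = scaler.fit_transform(X_train)
# Reuse the training statistics for the test data
X_test_scaled = scaler.transform(X_test)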
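If only the numeric column should be standardized, one option is to combine the scaling and encoding steps in a `ColumnTransformer` inside a `Pipeline`. This is a sketch of an alternative design, not the approach taken in this notebook. The `toy` frame is invented for illustration, and `OneHotEncoder` replaces `pd.get_dummies` here so that the encoding is learned from the training data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data with the same column names as the notebook (values are invented).
toy = pd.DataFrame({
    "PayloadMass": [500.0, 3000.0, 5500.0, 9600.0, 2000.0, 7000.0],
    "Orbit": ["LEO", "GTO", "LEO", "GTO", "LEO", "GTO"],
    "Booster_Version": ["B4", "B5", "B5", "B4", "B4", "B5"],
    "Class": [0, 1, 1, 0, 1, 0],
})
feature_cols = ["PayloadMass", "Orbit", "Booster_Version"]

# Scale only the numeric column; one-hot encode the categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["PayloadMass"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Orbit", "Booster_Version"]),
])

clf = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])

clf.fit(toy[feature_cols], toy["Class"])
preds = clf.predict(toy[feature_cols])
print(preds)
```

Keeping the preprocessing inside the pipeline means a single `fit` call learns the scaling statistics and categories from the training data, and `predict` applies them consistently.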

In [18]:
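# Fit logistic regression with default hyperparameters on the scaled training data
model = LogisticRegression()
model.fit(X_train_scaled, y_train)

# Predict class labels for the held-out test set
y_pred = model.predict(X_test_scaled)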
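An alternative to duplicating rows is to reweight the classes in the loss function with the `class_weight="balanced"` option of `LogisticRegression`. The sketch below uses synthetic, imbalanced data (45 samples of class 1 and 5 of class 0) generated purely for illustration. It is not the configuration used in this notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

rng = np.random.default_rng(42)

# Synthetic, imbalanced data: 45 samples of class 1 and 5 of class 0.
X_pos = rng.normal(loc=1.0, scale=1.0, size=(45, 2))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(5, 2))
X_toy = np.vstack([X_pos, X_neg])
y_toy = np.array([1] * 45 + [0] * 5)

# "balanced" weights are n_samples / (n_classes * count_per_class).
weights = compute_class_weight("balanced", classes=np.array([0, 1]), y=y_toy)
print("Class weights:", dict(zip([0, 1], weights.round(3))))

# The same weighting is applied internally when class_weight="balanced".
weighted = LogisticRegression(class_weight="balanced")
weighted.fit(X_toy, y_toy)

# predict_proba returns one column per class, ordered as in weighted.classes_.
proba = weighted.predict_proba(X_toy[:3])
print(weighted.classes_)
print(proba.round(3))
```

Here the minority class receives a weight of 50 / (2 × 5) = 5.0, so each failure counts ten times as much as each success (weight 50 / (2 × 45) ≈ 0.556) without any rows being copied.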

## Model Evaluation

We evaluate the model using a confusion matrix and classification report.

In [19]:
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Confusion Matrix:
 [[3 0]
 [0 1]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         1

    accuracy                           1.00         4
   macro avg       1.00      1.00      1.00         4
weighted avg       1.00      1.00      1.00         4
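To read the confusion matrix, recall that scikit-learn puts true labels on the rows and predicted labels on the columns. The short sketch below rebuilds the matrix printed above from hand-written label lists and unpacks its four cells. The lists `y_true` and `y_hat` are reconstructed from the reported counts, and their row order is arbitrary.

```python
from sklearn.metrics import confusion_matrix

# Labels chosen to reproduce the matrix printed above (3 failures, 1 success).
y_true = [0, 0, 0, 1]
y_hat = [0, 0, 0, 1]

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
cm = confusion_matrix(y_true, y_hat, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

With class 1 (success) treated as the positive class, this prints `TN=3 FP=0 FN=0 TP=1`: three failures and one success, all classified correctly.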



## Summary

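The logistic regression model was trained on a balanced dataset of 20 rows and classified all 4 test samples correctly. These scores should be read with caution: the test set is very small, and every failure row is a copy of the single failure in the original data, so identical rows appear in both the training and test sets. The perfect metrics therefore do not show that the model can recognize unseen failures. With more failure examples, more complex models can be explored and compared for improved accuracy.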
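When the dataset is small, a single train-test split gives a noisy estimate, so candidate models are often compared with stratified k-fold cross-validation instead. The sketch below is one possible setup under stated assumptions: the data comes from `make_classification` rather than the launch dataset, and `RandomForestClassifier` is just an example of a more complex model, not one evaluated in this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data with a 30/70 class ratio.
X_toy, y_toy = make_classification(
    n_samples=100, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.3, 0.7], random_state=42,
)

# Each fold keeps the class ratio of the full dataset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Macro-averaged F1 weights both classes equally, which suits imbalanced data.
scores = {
    name: cross_val_score(est, X_toy, y_toy, cv=cv, scoring="f1_macro")
    for name, est in candidates.items()
}
for name, vals in scores.items():
    print(f"{name}: mean macro-F1 = {vals.mean():.3f} over {len(vals)} folds")
```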