# 🚀 Space Launch Mission Success Analysis
## Group Presentation 4

**Team Members:**  
- Reham Abuarqoub, 9062922
- Erica Holden, 5490685
- Yu-Chen Chou (Tracy), 9006160 



# Term Project - Space Mission Dataset Extension

## 1-Minute Summary of Use Case

We extended our term project by integrating a new dataset: **Global Space Mission Launches**. The dataset contains historical launch records, including launch date, location, company, rocket, rocket cost, and mission status. We use it to analyze which factors are associated with mission outcomes.

🧪 **Revised Hypothesis**  
**Null (H0):** The rocket cost and company do *not* significantly affect mission success.  
**Alternative (H1):** The rocket cost and company *do* significantly affect mission success.

This hypothesis update will guide our feature analysis and predictive modeling.
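
One way to examine the company half of this hypothesis is a chi-square test of independence between company and outcome. The sketch below is a minimal illustration on synthetic data, not the project's actual test: the `toy` frame, its values, and the 0.05 significance level are assumptions, and it relies on `scipy.stats.chi2_contingency`, which the notebook itself does not import. Rocket cost is numeric and would need a different test, for example comparing cost distributions between successful and failed launches.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic stand-in for the cleaned launch data (hypothetical values).
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Company Name": rng.choice(["A", "B", "C"], size=300),
    "Success": rng.integers(0, 2, size=300),
})

# Contingency table: one row per company, one column per outcome (0 or 1).
table = pd.crosstab(toy["Company Name"], toy["Success"])

# Chi-square test of independence between company and outcome.
chi2_stat, p_value, dof, expected = chi2_contingency(table)

alpha = 0.05  # assumed significance level
reject_h0 = p_value < alpha
print(f"chi2={chi2_stat:.3f}, dof={dof}, p={p_value:.3f}, reject H0: {reject_h0}")
```

On the real data, `table` would be built from the cleaned frame's `Company Name` and `Success` columns; companies with very few launches may need to be grouped, since the test assumes reasonably large expected counts.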



In [1]:
# 📦 Imports
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.preprocessing import LabelEncoder, StandardScaler

# 📊 Data Exploration + Encapsulation
class SpaceMissions:
    def __init__(self, filepath):
        self.df = pd.read_csv(filepath)
        self.cleaned_df = None
        self.feature_df = None

    def clean_data(self):
        df = self.df.copy()
        df.columns = df.columns.str.strip()

        # Drop duplicate index columns (ignored if they are absent from the file)
        df.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'], inplace=True, errors='ignore')

        # Convert 'Datum' to datetime
        df['Datum'] = pd.to_datetime(df['Datum'], errors='coerce')

        # Encode Status Mission: 1 if the status contains 'Success', else 0
        # (astype(str) keeps missing values from raising a TypeError)
        df['Success'] = df['Status Mission'].astype(str).str.contains('Success').astype(int)

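        # Parse Rocket cost as a number; strip thousands separators (e.g. "1,160.0")
        # first, otherwise such values would be coerced to NaN
        cost = df['Rocket'].astype(str).str.replace(',', '', regex=False).str.strip()
        df['Rocket'] = pd.to_numeric(cost, errors='coerce')
        # Fill missing cost with the median (assign back; chained inplace fillna is deprecated)
        df['Rocket'] = df['Rocket'].fillna(df['Rocket'].median())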

        # Encode Company Name
        df['Company Code'] = LabelEncoder().fit_transform(df['Company Name'])

        self.cleaned_df = df
        return df

    def remove_low_variance(self):
        selector = VarianceThreshold(threshold=0.01)
        features = self.cleaned_df[['Rocket', 'Company Code']]
        selector.fit(features)
        # Keep only the surviving columns, so names and index stay correct
        # even when a feature is dropped
        self.feature_df = features.loc[:, selector.get_support()].copy()
        return self.feature_df

    def remove_high_correlation(self):
        corr_matrix = self.feature_df.corr().abs()
        upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape), k=1).astype(bool))
        to_drop = [column for column in upper.columns if any(upper[column] > 0.95)]
        self.feature_df.drop(columns=to_drop, inplace=True)
        return self.feature_df

    def apply_pca(self, n_components=2):
        # PCA cannot return more components than there are remaining features
        n_components = min(n_components, self.feature_df.shape[1])
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(self.feature_df)
        pca = PCA(n_components=n_components)
        components = pca.fit_transform(X_scaled)
        return pd.DataFrame(components, columns=[f'PC{i+1}' for i in range(n_components)])

    def feature_importance_rf(self):
        X = self.feature_df
        y = self.cleaned_df['Success']
        model = RandomForestClassifier(random_state=42)
        model.fit(X, y)
        importance = pd.Series(model.feature_importances_, index=X.columns)
        return importance.sort_values(ascending=False)

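    def forward_selection(self):
        # Note: despite the name, this scores each feature independently with chi2
        # (a univariate filter); it is not stepwise forward selection.
        # chi2 requires non-negative feature values.
        X = self.feature_df
        y = self.cleaned_df['Success']
        selector = SelectKBest(score_func=chi2, k='all')
        selector.fit(X, y)
        scores = pd.Series(selector.scores_, index=X.columns)
        return scores.sort_values(ascending=False)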
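
The two filtering steps in the class can be illustrated on a small synthetic frame. The following is a minimal sketch, not the project's pipeline: the `toy` frame and its extra columns (`Rocket Copy`, `Constant`) are hypothetical and exist only so that each filter has something to remove. It uses the same 0.01 variance threshold and 0.95 correlation cutoff as the class above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Toy feature frame (hypothetical values): one constant column and
# one column that nearly duplicates another.
rng = np.random.default_rng(42)
cost = rng.uniform(10, 500, size=100)
toy = pd.DataFrame({
    "Rocket": cost,
    "Rocket Copy": cost * 1.01,   # almost perfectly correlated with Rocket
    "Constant": np.ones(100),     # zero variance
    "Company Code": rng.integers(0, 20, size=100),
})

# Step 1: variance filter. get_support() is a boolean mask of surviving columns.
selector = VarianceThreshold(threshold=0.01)
selector.fit(toy)
kept = toy.loc[:, selector.get_support()]

# Step 2: for each pair with |correlation| > 0.95, drop the later column.
corr = kept.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
filtered = kept.drop(columns=to_drop)

print(list(filtered.columns))
```

Selecting columns with the `get_support()` mask keeps the column names tied to the features that actually survive, which matters when the filter removes something.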


## 🔍 Hypothesis Testing & Feature Selection

- **Rocket Cost** and **Company Code** are tested for their predictive power.
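- We use:
  - Median imputation for missing rocket cost,
  - Low-variance filtering (`VarianceThreshold`, threshold 0.01),
  - High-correlation removal (absolute correlation above 0.95),
  - PCA on standardized features for dimensionality reduction,
  - Random Forest for feature importance,
  - Chi² scores for univariate feature ranking (the `forward_selection` method; this is a filter-style score, not stepwise forward selection).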
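
To make the difference between chi² scoring and stepwise forward selection concrete, the sketch below runs both on synthetic data. It is an illustration under stated assumptions, not part of the project's results: the `Noise` column, the rule that generates `y`, and the choice of `SequentialFeatureSelector` with a small Random Forest are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, chi2

# Toy data (hypothetical): the outcome depends on cost only.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "Rocket": rng.uniform(10, 500, size=n),
    "Company Code": rng.integers(0, 20, size=n),
    "Noise": rng.uniform(0, 1, size=n),
})
y = (X["Rocket"] > 250).astype(int)

# Univariate ranking: each feature is scored against y on its own.
# chi2 requires non-negative feature values.
chi2_scores = pd.Series(
    SelectKBest(score_func=chi2, k="all").fit(X, y).scores_, index=X.columns
).sort_values(ascending=False)

# Forward selection: greedily add the feature that most improves the
# cross-validated score of the wrapped model.
sfs = SequentialFeatureSelector(
    RandomForestClassifier(n_estimators=50, random_state=42),
    n_features_to_select=1,
    direction="forward",
    cv=3,
)
sfs.fit(X, y)
selected = list(X.columns[sfs.get_support()])

print(chi2_scores)
print(selected)
```

Chi² scores are cheap and model-free but evaluate features one at a time, and they depend on feature scale. Forward selection is slower because it refits the model at every step, but it accounts for how features perform together.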

---
