# Problem Definition & Objective

**Project Track:** Space Market Trends & Predictive Analytics

**Problem Statement:** The Indian space sector is expanding rapidly, with a growing number of commercial satellite launches. However, launch costs are extremely high, and every mission carries a risk of failure. My project aims to analyze the historical performance of ISRO's rockets (PSLV, GSLV, and LVM3) and build a machine learning model that predicts a mission's probability of success based on its payload mass, the launch vehicle, and the target orbit.

**Motivation:** Success-probability predictions help commercial satellite operators understand the risk profile of their missions. This supports decisions about insurance and budgeting, moving from "best-guess" estimates to data-driven risk management.

# Data Understanding & Preparation

I sourced the data by scraping Wikipedia's launch history for the PSLV, GSLV, and LVM3 rockets. The raw data was very messy because Wikipedia often lists multiple satellites in a single mission block.

To clean this, I did the following:

**Mission Grouping:** I used a forward-fill technique to make sure sub-rows for small satellites were linked to the correct mission outcome.
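A minimal sketch of the forward-fill step, assuming pandas and a hypothetical scraped table in which only the first row of each mission block carries the flight ID and outcome. The column names `flight`, `outcome`, and `payload` and the mission names are illustrative, not taken from the actual scraper:

```python
import pandas as pd

# Hypothetical scraped rows: only the first row of each mission block has
# the flight ID and outcome; co-passenger sub-rows arrive with them empty.
raw = pd.DataFrame({
    "flight": ["MISSION-A", None, None, "MISSION-B", None],
    "outcome": ["Success", None, None, "Failure", None],
    "payload": ["SatA (1000 kg)", "SatB (50 kg)", "SatC (10 kg)",
                "SatD (700 kg)", "SatE (20 kg)"],
})

# Forward-fill the mission-level columns so every sub-row inherits the
# flight ID and outcome of the block it belongs to.
raw[["flight", "outcome"]] = raw[["flight", "outcome"]].ffill()
print(raw)
```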

**Mass Extraction:** I wrote a custom script that finds every value followed by "kg" in a mission's payload entries and sums them to get the total payload mass.
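The mass extraction could look like the following. The regular expression and the function name `total_payload_kg` are assumptions for illustration; real Wikipedia cells may need extra rules (for example, masses given in tonnes or as ranges):

```python
import re

# Hypothetical pattern: a number (optionally with thousands separators or a
# decimal part) immediately followed by "kg", e.g. "1,425 kg" or "44.5kg".
KG_PATTERN = re.compile(r"(\d[\d,]*(?:\.\d+)?)\s*kg\b", re.IGNORECASE)


def total_payload_kg(payload_texts):
    """Sum every 'kg' value found across the payload strings of one mission."""
    total = 0.0
    for text in payload_texts:
        for number in KG_PATTERN.findall(text):
            total += float(number.replace(",", ""))
    return total


print(total_payload_kg(["SatA (1,425 kg)", "SatB 44.5 kg", "SatC (mass unknown)"]))
# → 1469.5
```

Entries with no "kg" value simply contribute nothing, so a mission whose masses are all unknown ends up with a total of 0 and may need to be flagged separately.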

**Filtering:** I removed rows that weren't actual launches (such as repeated headers or future planned missions) and kept only those with a clear "Success" or "Failure" status.
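A sketch of the filtering step on a hypothetical table, where a repeated header row, a planned mission, and a partial failure stand in for the rows that get dropped. The 0/1 `success` column is an assumed encoding of the target for the classifier:

```python
import pandas as pd

# Hypothetical table mixing real launches with non-launch rows.
df = pd.DataFrame({
    "flight": ["MISSION-A", "Flight", "MISSION-B", "MISSION-C", "MISSION-D"],
    "outcome": ["Success", "Outcome", "Failure ", "Planned", "Partial failure"],
})

# Normalize whitespace, then keep only rows with a clear Success/Failure label.
df["outcome"] = df["outcome"].str.strip()
launches = df[df["outcome"].isin(["Success", "Failure"])].reset_index(drop=True)

# Binary target for the classifier: 1 = success, 0 = failure.
launches["success"] = (launches["outcome"] == "Success").astype(int)
print(launches)
```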

**Jan 2026 Update:** I made sure to include the most recent data, including the PSLV-C62 mission from January 12, 2026.

# Model / System Design

I chose a **Random Forest Classifier** for this task. I picked this model because the dataset is relatively small (about 90 missions), and a Random Forest tends to work well on small tabular datasets: it handles one-hot-encoded categorical features (such as rocket names and orbits) without extensive tuning and is less prone to overfitting than a single decision tree.

The system works like this:

**Inputs:** Rocket variant, Total Payload Mass (kg), Orbit Type, and Launch Site.

**Processing:** The categorical data is converted into numbers (One-Hot Encoding).

**Output:** Instead of just saying "Success" or "Failure," the model provides a probability score (e.g., "This mission has an 85% chance of success").
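The inputs, encoding, and probability output described above map naturally onto a scikit-learn `Pipeline`. This is a sketch under assumptions: the column names, hyperparameters, and toy rows are made up to show the input schema and are not real ISRO launch records (`FLP`/`SLP` stand for the First and Second Launch Pads at Sriharikota):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["vehicle", "orbit", "launch_site"]
NUMERIC = ["payload_kg"]


def build_model(random_state=42):
    """One-hot encode the categorical inputs, pass mass through, fit a Random Forest."""
    preprocess = ColumnTransformer([
        # handle_unknown="ignore" encodes an unseen category as all zeros
        # instead of raising an error at prediction time.
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", "passthrough", NUMERIC),
    ])
    return Pipeline([
        ("prep", preprocess),
        ("forest", RandomForestClassifier(n_estimators=200, random_state=random_state)),
    ])


# Made-up rows that only illustrate the input schema (1 = success, 0 = failure).
train = pd.DataFrame({
    "vehicle": ["PSLV", "PSLV", "GSLV", "LVM3", "GSLV", "PSLV"],
    "orbit": ["SSO", "LEO", "GTO", "GTO", "GTO", "SSO"],
    "launch_site": ["FLP", "SLP", "SLP", "SLP", "SLP", "FLP"],
    "payload_kg": [1500, 700, 2200, 4000, 2100, 1300],
    "success": [1, 1, 0, 1, 1, 0],
})
model = build_model().fit(train[CATEGORICAL + NUMERIC], train["success"])

new_mission = pd.DataFrame([{"vehicle": "PSLV", "orbit": "SSO",
                             "launch_site": "FLP", "payload_kg": 1400}])
# Column 1 of predict_proba is P(success) because model.classes_ is [0, 1].
p_success = model.predict_proba(new_mission)[0, 1]
print(f"Estimated success probability: {p_success:.0%}")
```

Keeping the encoder inside the pipeline means the same one-hot mapping learned at training time is applied to every new mission.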

# Core Implementation

The implementation is broken down into three logical phases:

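**Pipeline:** A script that handles the scraping and the more complex data cleaning, including the "rowspan" issue in the HTML tables: a cell that spans several rows appears only once in the markup, so the rows below it are missing that value.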
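One way to resolve rowspans, sketched in plain Python: each row lists only the cells physically present in its `<tr>`, as `(text, rowspan)` pairs, and any cell with a rowspan is carried down into the following rows. The function name and input format are hypothetical; the sketch ignores `colspan` and assumes a well-formed table:

```python
def expand_rowspans(rows):
    """Expand HTML-style rowspans into a rectangular grid of cell texts."""
    grid = []
    carry = {}  # column index -> (text, number of following rows that still need it)
    for row in rows:
        cells = list(row)
        out = []
        col = 0
        # Fill columns left to right, preferring carried-down cells.
        while cells or col in carry:
            if col in carry:
                text, left = carry.pop(col)
                if left > 1:
                    carry[col] = (text, left - 1)
            else:
                text, span = cells.pop(0)
                if span > 1:
                    carry[col] = (text, span - 1)
            out.append(text)
            col += 1
        grid.append(out)
    return grid


rows = [
    [("MISSION-A", 3), ("Success", 3), ("SatA", 1)],
    [("SatB", 1)],
    [("SatC", 1)],
    [("MISSION-B", 1), ("Failure", 1), ("SatD", 1)],
]
for line in expand_rowspans(rows):
    print(line)
```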

**EDA (Exploratory Data Analysis):** A notebook where I visualized the success rates of different rockets over time and looked for correlations between heavy payloads and mission failures.
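The per-rocket success rates and the mass/outcome correlation can be computed with a `groupby` and a correlation on a 0/1 outcome column. The vehicle names and values below are placeholders, not real launch history:

```python
import pandas as pd

# Placeholder launch table: one row per mission, 1 = success, 0 = failure.
launches = pd.DataFrame({
    "vehicle": ["VEH_A", "VEH_A", "VEH_B", "VEH_B", "VEH_C", "VEH_A"],
    "year": [2017, 2018, 2017, 2018, 2018, 2018],
    "payload_kg": [1300, 700, 2100, 2200, 4000, 1500],
    "success": [1, 1, 0, 1, 1, 0],
})

# Success rate per vehicle per year: the mean of a 0/1 column is the success fraction.
rate_by_year = launches.groupby(["vehicle", "year"])["success"].mean().unstack("year")
print(rate_by_year)

# Pearson correlation with a binary outcome (point-biserial correlation);
# a negative value would mean heavier payloads go with more failures.
mass_corr = launches["payload_kg"].corr(launches["success"])
print(round(mass_corr, 3))
```

Vehicle/year combinations with no launches show up as `NaN` in `rate_by_year`, which keeps them distinct from years with a 0% success rate.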

**Model Training:** The final script where I trained the Random Forest model and tested its accuracy using a standard train-test split.
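A sketch of the training and evaluation step on synthetic stand-in data (90 rows with one failure in ten; all values are invented). It uses a stratified split so the few failures appear in both the training and test sets, and reports balanced accuracy next to plain accuracy, since a model that always predicts "Success" would already score highly on data this imbalanced. Balanced accuracy is an addition for illustration, not necessarily the metric used in this project:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for the ~90-mission table (values are invented).
n = 90
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "vehicle": rng.choice(["VEH_A", "VEH_B", "VEH_C"], size=n),
    "orbit": rng.choice(["LEO", "SSO", "GTO"], size=n),
    "payload_kg": rng.uniform(100, 4000, size=n).round(),
    # Every 10th mission is a failure: 9 failures, 81 successes.
    "success": [0 if i % 10 == 0 else 1 for i in range(n)],
})

X = df.drop(columns="success")
y = df["success"]
# stratify=y keeps the rare failures represented in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["vehicle", "orbit"]),
        ("num", "passthrough", ["payload_kg"]),
    ])),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=42)),
])
model.fit(X_train, y_train)
pred = model.predict(X_test)

acc = accuracy_score(y_test, pred)
# Mean of per-class recall: a constant "always Success" model scores 0.5 here.
bal_acc = balanced_accuracy_score(y_test, pred)
print(f"accuracy={acc:.2f}  balanced accuracy={bal_acc:.2f}")
```

With only a handful of failures in the test set, these scores can swing noticeably between random splits; repeated stratified cross-validation would give a steadier estimate.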

# Evaluation & Analysis

The model was evaluated on its accuracy and on how well it separates successful missions from failures.

**Observation:** I noticed that the PSLV remains the most reliable "workhorse," though recent anomalies in 2025 and 2026 have slightly lowered its success rate.

**Result:** The model successfully identified that missions with higher payload masses in specific elliptical orbits (like GTO) tend to have a different risk profile compared to standard LEO missions.

**Feature Importance:** The most important factors for the model were the "Vehicle Family" and the "Payload Mass," which makes sense as these directly impact the rocket's performance.
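Because one-hot encoding splits a categorical input such as the vehicle into several indicator columns, the forest's per-column importances have to be summed back onto their source feature before features can be compared. A sketch, assuming a pipeline with steps named `prep` and `forest` (names chosen here for illustration) and synthetic data:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

CATEGORICAL = ["vehicle", "orbit"]
NUMERIC = ["payload_kg"]


def grouped_importances(model):
    """Sum the forest's per-column importances back onto each original input."""
    encoder = model.named_steps["prep"].named_transformers_["cat"]
    # ColumnTransformer outputs the one-hot block first (one column per category
    # of each categorical input), then the passthrough numeric columns.
    sources = []
    for col, cats in zip(CATEGORICAL, encoder.categories_):
        sources += [col] * len(cats)
    sources += NUMERIC
    importances = pd.Series(model.named_steps["forest"].feature_importances_, index=sources)
    return importances.groupby(level=0).sum().sort_values(ascending=False)


# Synthetic data, only to make the example runnable.
n = 60
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "vehicle": rng.choice(["VEH_A", "VEH_B", "VEH_C"], size=n),
    "orbit": rng.choice(["LEO", "SSO", "GTO"], size=n),
    "payload_kg": rng.uniform(100, 4000, size=n).round(),
    "success": [0 if i % 7 == 0 else 1 for i in range(n)],
})
model = Pipeline([
    ("prep", ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", "passthrough", NUMERIC),
    ])),
    ("forest", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(df[CATEGORICAL + NUMERIC], df["success"])
print(grouped_importances(model))
```

Scikit-learn's impurity-based importances tend to favor continuous, high-cardinality features such as payload mass, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.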

# Ethical Considerations & Responsible AI

In the context of space and defense, AI ethics are crucial. I have addressed the following:

**Transparency:** I used a model that supports feature-importance analysis, so we can see which inputs drive its risk scores instead of treating it as a "black box."

**Bias:** Rocket data can be biased toward older technology. To reduce this, I included the most recent mission data (up to Jan 2026) so that the model better reflects the current state of ISRO's hardware.

**Accountability:** AI in space should only be used as a decision-support tool. A human engineer should always make the final go/no-go call; this model is simply for cost and risk estimation.

# Conclusion & Future Scope

**Conclusion:** This project demonstrates that historical launch data can be used to create a baseline risk score for satellite launches. While space is inherently unpredictable, having a numeric probability helps in financial planning and mission design.

**Future Scope:** In the future, I would like to add weather data (wind speed, temperature) at the time of launch, as weather is a major cause of delays and anomalies. I also plan to expand the dataset to include other launch providers such as SpaceX to see how ISRO's reliability compares on a global scale.