## Introduction

Ever since 1957, when the USSR launched Sputnik, the first man-made satellite, into space during the Cold War, the world has taken great interest in exploring beyond our planet. Rocket science, cosmology and astronomy sit at the frontier of engineering and science, and they demand extreme levels of theoretical as well as experimental work.

A lot of mathematics goes into deciding when and where a launch should take place so that the vehicle reaches its destination with the least possible resistance and the highest probability of success. At the same time, a great deal of engineering goes into simulating space conditions on Earth and testing launch vehicles for possible failures. All of these space missions require years of hard work, research and testing to succeed.

The dataset consists of the following columns:


* Company Name : The space organisation undertaking the mission
* Location : The point on Earth from which the spacecraft was launched
* Datum : Date and time of liftoff
* Detail : Name and type of the spaceship
* Status Rocket : Whether the spacecraft is still in commission and active in its mission
* Rocket : Cost of the mission in million dollars
* Status Mission : Whether the mission was successful


# **1. Importing Libraries**

In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

# **2. Loading and Preprocessing the data**

In [3]:
data = pd.read_csv('/kaggle/input/all-space-missions-from-1957/Space_Corrected.csv')
data.head()

**Comment**<br>
The first two columns are leftover index columns and carry no information. Let's drop them.

In [4]:

data = data.drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1)
data.head()

In [5]:
data.info()

**Comment**<br>
The Rocket column name has a leading space. Let's fix it.

In [6]:
data = data.rename(columns = {' Rocket':'Rocket'})

In [7]:
data.shape

In [8]:
print('Null values:')
print(data.isnull().sum())
print('--'*40)
print('Percentage of Null Values:')
round(data.isnull().sum()/len(data)*100,2)

**Comment:** <br> About 77.71% of the values in the Rocket column are missing.
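
**Comment**<br>
As a small aside, the same percentage can be read off in one step, because the mean of a boolean mask is the fraction of `True` values. The frame below is made up for illustration and is not part of the dataset.

```python
import numpy as np
import pandas as pd

# Made-up frame: three of the four cost values are missing.
toy = pd.DataFrame({
    "Detail": ["a", "b", "c", "d"],
    "Rocket": ["50.0", np.nan, np.nan, np.nan],
})

# isnull() gives booleans; their mean is the fraction of missing values per column.
null_pct = (toy.isnull().mean() * 100).round(2)
print(null_pct)
```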


In [9]:
# Inspecting the rows where Rocket (the mission cost) is missing
data[data.Rocket.isnull()]

**Comment**<br>
Let us separate the country from the Location column and extract the launch year, month and weekday from the Datum column.

In [10]:
data["Country"] = data["Location"].apply(lambda location: location.split(", ")[-1])
data['DateTime'] = pd.to_datetime(data['Datum'],utc = True) 
data['Launch_Year'] = data['DateTime'].apply(lambda datetime: datetime.year)
data['Month'] = data['DateTime'].apply(lambda datetime: datetime.month)
data['WeekDay'] = data['DateTime'].apply(lambda datetime: datetime.weekday())
data["Launch_Site"] = data["Location"].apply(lambda location: ", ".join(location.split(", ")[:-1]))
data.head()
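
**Comment**<br>
To make the splitting logic easier to follow, here is a minimal sketch on two illustrative rows. The sample values and the explicit `format` string are assumptions about how the Location and Datum strings are laid out. It uses the vectorised `.str` and `.dt` accessors instead of `apply`, which should give the same columns.

```python
import pandas as pd

# Two illustrative rows shaped like the dataset's Location and Datum columns.
sample = pd.DataFrame({
    "Location": [
        "LC-39A, Kennedy Space Center, Florida, USA",
        "Site 1/5, Baikonur Cosmodrome, Kazakhstan",
    ],
    "Datum": ["Fri Aug 07, 2020 05:12 UTC", "Fri Oct 04, 1957 19:28 UTC"],
})

# The last comma-separated token is the country; everything before it is the site.
parts = sample["Location"].str.rsplit(", ", n=1)
sample["Launch_Site"] = parts.str[0]
sample["Country"] = parts.str[-1]

# Parse the timestamp once, then read year, month and weekday through .dt.
sample["DateTime"] = pd.to_datetime(
    sample["Datum"], format="%a %b %d, %Y %H:%M %Z", utc=True
)
sample["Launch_Year"] = sample["DateTime"].dt.year
sample["Month"] = sample["DateTime"].dt.month
sample["WeekDay"] = sample["DateTime"].dt.weekday  # Monday=0 ... Sunday=6

print(sample[["Country", "Launch_Site", "Launch_Year", "Month", "WeekDay"]])
```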


In [11]:
data.Country.unique()

**Comments**:
Some of these values are regions or launch sites rather than countries. Let's assign each of them to the country or territory it belongs to.

In [12]:
area_to_country = {
    'Russia' : 'Russian Federation',
    'New Mexico' : 'USA',
    "Yellow Sea": 'China',
    "Shahrud Missile Test Site": "Iran",
    "Pacific Missile Range Facility": 'USA',
    "Barents Sea": 'Russian Federation',
    "Gran Canaria": 'USA'
}

data['Country'] = data['Country'].replace(area_to_country)

In [13]:
dict_comp = {'SpaceX':'USA','CASC':'China', 'Roscosmos':'Russia', 'ULA':'USA', 'JAXA':'Japan', 'Northrop':'USA', 'ExPace':'China',
'IAI':'Israel', 'Rocket Lab':'NZ', 'Virgin Orbit':'USA', 'VKS RF':'Russia', 'MHI':'Japan', 'IRGC':'Iran',
'Arianespace':'Europe', 'ISA':'Iran', 'Blue Origin':'USA', 'ISRO':'India', 'Exos':'USA', 'ILS':'USA',
'i-Space':'China', 'OneSpace':'China', 'Landspace':'China' ,'Eurockot':'Russia', 'Land Launch':'Russia',
'CASIC':'China', 'KCST':'North Korea', 'Sandia':'USA', 'Kosmotras':'Russia', 'Khrunichev':'Russia', 'Sea Launch':'Russia',
'KARI':'South Korea', 'ESA':'Europe', 'NASA':'USA', 'Boeing':'USA', 'ISAS':'Japan', 'SRC':'Russia', 'MITT':'Russia',
 'Lockheed':'USA','AEB':'Brazil', 'Starsem':'Europe', 'RVSN USSR':'Russia', 'EER':'USA', 'General Dynamics':'USA',
 'Martin Marietta':'USA', 'Yuzhmash':'Russia', 'Douglas':'USA', 'ASI':'Europe', 'US Air Force':'USA','CNES':'Europe',
  'CECLES':'Europe', 'RAE':'Europe', 'UT':'Japan', 'OKB-586':'Russia', 'AMBA':'USA',"Arm??e de l'Air":'Europe', 'US Navy':'USA'}

data['Sat_country'] = data['Company Name'].map(dict_comp)
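
**Comment**<br>
One thing worth checking after a dictionary lookup like this: `map` returns NaN for any key that is missing from the dictionary, while `replace` leaves unknown values untouched. The sketch below shows the difference on a made-up series. The company `Hypothetical Launch Co` and the short dictionary are invented for illustration.

```python
import pandas as pd

# Made-up company names; the last one is deliberately absent from the dictionary.
companies = pd.Series(["SpaceX", "CASC", "Hypothetical Launch Co"], name="Company Name")
company_to_country = {"SpaceX": "USA", "CASC": "China"}

# map(): keys missing from the dictionary become NaN.
mapped = companies.map(company_to_country)
unmapped = companies[mapped.isna()].unique()
print("Companies without a country:", list(unmapped))

# replace(): keys missing from the dictionary keep their original value.
replaced = companies.replace(company_to_country)
print(replaced.tolist())
```

Running the same `isna()` check on `data['Sat_country']` would list any company that still needs an entry in `dict_comp`.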

In [14]:
data.Rocket.unique()

**Comment:**<br> Some values contain commas, and there are null values as well. Let's fix both.

In [15]:

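# Remove thousands separators and stray spaces, convert to float, then set missing costs to 0.0.
# (Calling fillna(0.0) before .str.replace would turn the filled floats back into NaN.)
data['Rocket'] = pd.to_numeric(data['Rocket'].str.replace(',', '', regex=False).str.strip()).fillna(0.0)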
data.Rocket.unique()
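
**Comment**<br>
A possible variation, sketched on made-up cost strings: keeping missing costs as NaN rather than 0 means that `mean()` and `sum()` skip the unknown values instead of treating those missions as free. Whether that suits the analysis is a judgment call. The values below are invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up cost strings in the style of the Rocket column:
# text values, a thousands separator, trailing spaces and a gap.
raw_cost = pd.Series(["50.0 ", "1,160.0 ", np.nan, "29.75 "])

# Strip the separator and whitespace, then convert to float.
clean_cost = pd.to_numeric(
    raw_cost.str.replace(",", "", regex=False).str.strip(),
    errors="coerce",  # anything unparseable becomes NaN instead of raising
)

# NaN is skipped by default in aggregations, so only known costs are averaged.
print(clean_cost.tolist())
print("known costs:", clean_cost.notna().sum())
print("mean of known costs:", clean_cost.mean())
```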

In [16]:
data.tail()

In [17]:
data['Status Mission'].value_counts()

**Comment:**<br>
Grouping all categories of failure into one.

In [18]:
data["StatusMission"] = np.where(data["Status Mission"]=="Success",1,0)

In [19]:
data_group = data.groupby(["Sat_country","Launch_Year","StatusMission"]).agg({'Sat_country':'count'}).rename(columns={'Sat_country':'count'}).reset_index()
data_pivot = pd.pivot_table(data_group,index=["Sat_country","Launch_Year"],columns=["StatusMission"],values = 'count').reset_index()
data_pivot.columns.name=""
data_pivot.rename(columns = {1:'Success',0:'Failure'},inplace = True)
data_pivot.fillna(0,inplace=True)
data_pivot["Total_Launch"] = data_pivot.iloc[:,2:].sum(axis = 1)
# ndg=(data_pivot.groupby(["Sat_country","Launch_Year"])["Total_Launch"].sum()).reset_index() #this step needed? 
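
**Comment**<br>
For readers who find the group-then-pivot steps hard to follow, here is a more compact sketch that should produce the same three numbers per country and year. It relies on `StatusMission` being a 0/1 flag, so its sum is the number of successes and its count is the number of launches. The toy rows are made up.

```python
import pandas as pd

# Made-up launches: one row per mission with a 0/1 success flag.
toy = pd.DataFrame({
    "Sat_country": ["USA", "USA", "USA", "China", "China"],
    "Launch_Year": [2019, 2019, 2020, 2019, 2019],
    "StatusMission": [1, 0, 1, 1, 1],
})

# Sum of the flag = successes, count of the flag = total launches.
summary = (
    toy.groupby(["Sat_country", "Launch_Year"])["StatusMission"]
    .agg(Success="sum", Total_Launch="count")
    .reset_index()
)
summary["Failure"] = summary["Total_Launch"] - summary["Success"]
print(summary)
```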

# **3. Exploratory data analysis**

## Number of Space Missions by each company

In [20]:
df1 = data['Company Name'].value_counts().reset_index()

df1.columns = [
    'Company Name', 
    'Number of Missions'
]

df1 = df1.sort_values(['Number of Missions'])
fig = px.bar(
    df1, 
    x='Number of Missions', 
    y="Company Name", 
    orientation='h', 
    title='Number of Space Missions by each company', 
    width=800,
    height=1000,
    log_x = True,
)
fig.update_traces(marker_color='pink')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

In [21]:
df1 = df1.sort_values(['Number of Missions'],ascending = False)
df1.head()

## Number of Space Missions by each country

In [22]:
df2 = data['Sat_country'].value_counts().reset_index()

df2.columns = [
    'Country', 
    'Number of Missions'
]

df2 = df2.sort_values(['Number of Missions'])
df2.head()

fig = px.bar(
    df2, 
    x='Number of Missions', 
    y="Country", 
    orientation='h', 
    title='Number of Space Missions by each country', 
    width=800,
    height=1000,
    log_x = True,
)
fig.update_traces(marker_color='green')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

In [23]:
df2 = df2.sort_values(['Number of Missions'],ascending = False)
df2.head()

## Number of Space Missions by each Year

In [24]:
df3 = data['Launch_Year'].value_counts().reset_index()

df3.columns = [
    'Launch_Year', 
    'Number of Missions'
]

df3 = df3.sort_values(['Number of Missions'])

fig = px.bar(
    df3, 
    y='Number of Missions', 
    x="Launch_Year", 
    orientation='v', 
    title='Number of Space Missions by each Year', 
    width=1000,
    height=800,
)
fig.update_traces(marker_color='yellow')
fig.update_layout(title_x=0.5,xaxis = dict(
        tickmode = 'linear'
    )) #centering the title
fig.show()

In [25]:
df3 = df3.sort_values(['Number of Missions'],ascending = False)
df3.head()


## Number of Space Missions by each month

In [26]:
df4 = data['Month'].value_counts().reset_index()

df4.columns = [
    'Month', 
    'Number of Missions'
]

df4 = df4.sort_values(['Number of Missions'])

fig = px.bar(
    df4, 
    y='Number of Missions', 
    x="Month", 
    orientation='v', 
    title='Number of Space Missions by each month', 
    width=1000,
    height=800,
)
fig.update_traces(marker_color='grey')
fig.update_layout(title_x=0.5,xaxis = dict(
        tickmode = 'linear'
    )) #centering the title
fig.show()

In [27]:
df4 = df4.sort_values(['Number of Missions'],ascending = False)
df4.head()


## Status of the Rocket

In [28]:
df5 = data['Status Rocket'].value_counts().reset_index()

df5.columns = [
    'status', 
    'count'
]
colors = ['red','green']
fig = px.pie(
    df5, 
    values='count', 
    names="status",
    color ="status",
    title='Rocket status', 
    width=500, 
    height=500,
)
fig.update_traces(textposition='inside', textinfo='percent+label',marker=dict(colors=colors, line=dict(color='white', width=2)))
fig.show()

## Status of the Mission

In [29]:
df6 = data['Status Mission'].value_counts().reset_index()

df6.columns = [
    'Mission Status', 
    'count'
]
fig = px.bar(
    df6, 
    x='Mission Status', 
    y="count",
    orientation='v',
    title='Mission Status', 
    width=800,
    height=1000
)
fig.update_traces(marker_color='brown')
fig.update_layout(title_x=0.5) #centering the title
fig.show()


## Success analysis for different countries

In [114]:
data_group = data.groupby(["Sat_country","Launch_Year","StatusMission"]).agg({'Sat_country':'count'}).rename(columns={'Sat_country':'count'}).reset_index()
data_pivot = pd.pivot_table(data_group,index=["Sat_country","Launch_Year"],columns=["StatusMission"],values = 'count').reset_index()
data_pivot.columns.name=""
data_pivot.rename(columns = {1:'Success',0:'Failure'},inplace = True)
data_pivot.fillna(0,inplace=True)
data_pivot["Total_Launch"] = data_pivot.iloc[:,2:].sum(axis = 1)
# ndg=(data_pivot.groupby(["Sat_country","Launch_Year"])["Total_Launch"].sum()).reset_index() #this step needed? 

In [115]:
#selecting top five countries
df2 = df2.sort_values(['Number of Missions'],ascending = False)
top_countries = df2[:5].Country
top_countries

In [118]:
df7 = data_pivot[data_pivot["Sat_country"].isin(top_countries)]
df_7= (df7.groupby("Sat_country")['Success'].sum()/  df7.groupby("Sat_country")["Total_Launch"].sum()).sort_values(ascending=False)
df_7=df_7.reset_index().rename(columns={0:"Success %"})
df_7["Success %"]*=100
fig = px.bar(df_7,x="Sat_country",y="Success %",color="Sat_country",title = 'Top Five Countries with Successful Launches')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

In [119]:
df_7.head(5)
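
**Comment**<br>
As a cross-check on the success percentages, the same figure can be computed directly from the 0/1 flag, since the mean of a binary column is the share of ones. This is a sketch on invented rows, not the notebook's own data.

```python
import pandas as pd

# Made-up missions with a 0/1 success flag.
toy = pd.DataFrame({
    "Sat_country": ["USA", "USA", "USA", "USA", "China", "China"],
    "StatusMission": [1, 1, 1, 0, 1, 0],
})

# Mean of the flag per country, scaled to a percentage and sorted high to low.
success_pct = (
    toy.groupby("Sat_country")["StatusMission"]
    .mean()
    .mul(100)
    .sort_values(ascending=False)
    .rename("Success %")
    .reset_index()
)
print(success_pct)
```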

In [120]:
sns.set_style("darkgrid")
row = 5
col = 1
f,ax=plt.subplots(row,col,figsize=(15,15))
for i in range(row):
    # get total and successful launches per year for this country
    country = top_countries.iloc[i]  # positional lookup, independent of index labels
    country_rows = df7.loc[df7["Sat_country"] == country]
    total_launch = country_rows["Total_Launch"]
    success = country_rows["Success"]
    launch_year = country_rows["Launch_Year"]
    frame = pd.concat([total_launch, success, launch_year], axis=1)
    sns.lineplot(data=frame, x="Launch_Year", y="Total_Launch", color="g", ax=ax[i], label='Total Launch')
    sns.lineplot(data=frame, x="Launch_Year", y="Success", color="r", ax=ax[i], label='Successful Launch')
    ax[i].set_title(country)
    ax[i].legend().set_visible(False)
handles, labels = ax[i].get_legend_handles_labels()
f.legend(handles, labels, loc='upper right',ncol = 2)
f.tight_layout(pad = 2.8)
f.suptitle('Total Launches and successful launches over time for top 5 countries',y= 1.0,fontsize = 15)
plt.show()

## Success analysis for different companies

In [121]:
data_group = data.groupby(["Company Name","Launch_Year","StatusMission"]).agg({'Company Name':'count'}).rename(columns={'Company Name':'count'}).reset_index()
data_pivot = pd.pivot_table(data_group,index=["Company Name","Launch_Year"],columns=["StatusMission"],values = 'count').reset_index()
data_pivot.columns.name=""
data_pivot.rename(columns = {1:'Success',0:'Failure'},inplace = True)
data_pivot.fillna(0,inplace=True)
data_pivot["Total_Launch"] = data_pivot.iloc[:,2:].sum(axis = 1)
# ndg=(data_pivot.groupby(["Sat_country","Launch_Year"])["Total_Launch"].sum()).reset_index() #this step needed? 

In [122]:
#selecting top ten companies
df1 = df1.sort_values(['Number of Missions'],ascending = False)
top_companies = df1[:10]['Company Name']
top_companies

In [126]:
df8 = data_pivot[data_pivot["Company Name"].isin(top_companies)]
df_8= (df8.groupby("Company Name")['Success'].sum()/  df8.groupby("Company Name")["Total_Launch"].sum()).sort_values(ascending=False)
df_8=df_8.reset_index().rename(columns={0:"Success %"})
df_8["Success %"]*=100
fig = px.bar(df_8,x="Company Name",y="Success %",color="Company Name",title = 'Top Ten Companies with Successful Launches')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

In [125]:
df_8.head(10)

In [129]:
sns.set_style("darkgrid")
row = 10
col = 1
f,ax=plt.subplots(row,col,figsize=(15,30))
for i in range(row):
    # get total and successful launches per year for this company
    company = top_companies.iloc[i]  # positional lookup, independent of index labels
    company_rows = df8.loc[df8["Company Name"] == company]
    total_launch = company_rows["Total_Launch"]
    success = company_rows["Success"]
    launch_year = company_rows["Launch_Year"]
    frame = pd.concat([total_launch, success, launch_year], axis=1)
    sns.lineplot(data=frame, x="Launch_Year", y="Total_Launch", color="g", ax=ax[i], label='Total Launch')
    sns.lineplot(data=frame, x="Launch_Year", y="Success", color="r", ax=ax[i], label='Successful Launch')
    ax[i].set_title(company)
    ax[i].legend().set_visible(False)
handles, labels = ax[i].get_legend_handles_labels()
f.legend(handles, labels, loc='upper right',ncol = 2)
f.tight_layout(pad = 2.8)
f.suptitle('Total Launches and successful launches over time for top 10 companies',y= 1.0,fontsize = 15)
plt.show()