## Introduction

Ever since the USSR launched Sputnik, the first man-made satellite, into space in 1957 during the Cold War, the world has taken a great interest in exploring beyond our planet. Rocket science, cosmology, and astronomy are among the most demanding fields of engineering and science, and they require extreme levels of theoretical as well as experimental work.

A lot of mathematics goes into deciding when and where a space launch should take place so that the vehicle reaches its destination with the least possible resistance and the highest probability of success. At the same time, extensive engineering goes into simulating space-like conditions on Earth and testing the launch vehicles for possible failures. All these space missions require years of hard work, research, and testing to succeed.

The dataset consists of the following columns:


* Company Name : The space organisation undertaking the mission
* Location : The launch site on Earth
* Datum : Date and time of liftoff
* Detail : Name of the rocket and of the mission, separated by `|`
* Status Rocket : Whether the rocket is still active or has been retired
* Rocket : Cost of the mission in millions of dollars
* Status Mission : Whether the mission was successful

 


# **1. Importing Libraries**

In [24]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px



# **2. Loading and Preprocessing the data**

In [2]:
data = pd.read_csv('/kaggle/input/all-space-missions-from-1957/Space_Corrected.csv')
data.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,0,0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,1,1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,2,2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,3,3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,4,4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success


In [3]:

data = data.drop(['Unnamed: 0', 'Unnamed: 0.1'], axis=1)
data.head()

Unnamed: 0,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success


In [4]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4324 entries, 0 to 4323
Data columns (total 7 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Company Name    4324 non-null   object
 1   Location        4324 non-null   object
 2   Datum           4324 non-null   object
 3   Detail          4324 non-null   object
 4   Status Rocket   4324 non-null   object
 5    Rocket         964 non-null    object
 6   Status Mission  4324 non-null   object
dtypes: object(7)
memory usage: 236.6+ KB


**Comment**<br>
The name of the Rocket column starts with a space. Let's fix it.

In [10]:
data = data.rename(columns = {' Rocket':'Rocket'})
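
Renaming a single column is enough here. If several headers carried stray whitespace, stripping all of them at once might be more convenient. A minimal sketch on a toy frame (the `toy` DataFrame is made up for illustration and is not the real dataset):

```python
import pandas as pd

# Hypothetical toy frame whose headers mimic the stray leading space seen above
toy = pd.DataFrame({"Company Name": ["SpaceX"], " Rocket": ["50.0 "]})

# Strip leading/trailing whitespace from every column name in one go
toy.columns = toy.columns.str.strip()

print(list(toy.columns))
```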

In [11]:
data.shape

(4324, 7)

In [12]:
print('Null values:')
print(data.isnull().sum())
print('--'*40)
print('Percentage of Null Values:')
round(data.isnull().sum()/len(data)*100,2)

Null values:
Company Name         0
Location             0
Datum                0
Detail               0
Status Rocket        0
Rocket            3360
Status Mission       0
dtype: int64
--------------------------------------------------------------------------------
Percentage of Null Values:


Company Name       0.00
Location           0.00
Datum              0.00
Detail             0.00
Status Rocket      0.00
Rocket            77.71
Status Mission     0.00
dtype: float64

**Comment:** <br> About 77.71% of the values in the Rocket column are missing.


In [13]:
# Inspect the rows where the Rocket (cost) value is missing
data[data.Rocket.isnull()]

Unnamed: 0,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
7,CASC,"LC-101, Wenchang Satellite Launch Center, China","Thu Jul 23, 2020 04:41 UTC",Long March 5 | Tianwen-1,StatusActive,,Success
13,IAI,"Pad 1, Palmachim Airbase, Israel","Mon Jul 06, 2020 01:00 UTC",Shavit-2 | Ofek-16,StatusActive,,Success
28,VKS RF,"Site 43/4, Plesetsk Cosmodrome, Russia","Fri May 22, 2020 07:31 UTC",Soyuz 2.1b/Fregat-M | Cosmos 2546,StatusActive,,Success
31,ExPace,"Site 95, Jiuquan Satellite Launch Center, China","Tue May 12, 2020 01:16 UTC",Kuaizhou 1A | Xingyun-2 01 (Wuhan) & 02,StatusActive,,Success
...,...,...,...,...,...,...,...
4319,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA","Wed Feb 05, 1958 07:33 UTC",Vanguard | Vanguard TV3BU,StatusRetired,,Failure
4320,AMBA,"LC-26A, Cape Canaveral AFS, Florida, USA","Sat Feb 01, 1958 03:48 UTC",Juno I | Explorer 1,StatusRetired,,Success
4321,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA","Fri Dec 06, 1957 16:44 UTC",Vanguard | Vanguard TV3,StatusRetired,,Failure
4322,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan","Sun Nov 03, 1957 02:30 UTC",Sputnik 8K71PS | Sputnik-2,StatusRetired,,Success
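
To see whether the missing costs are concentrated in a particular group, one option is to compute the share of missing values per rocket status. This is only a sketch on made-up rows (the `toy` frame is hypothetical; only the column names follow the dataset):

```python
import pandas as pd

# Hypothetical toy rows; only the column names follow the dataset
toy = pd.DataFrame({
    "Status Rocket": ["StatusActive", "StatusActive", "StatusRetired", "StatusRetired"],
    "Rocket": ["50.0 ", None, None, None],
})

# Share of missing cost values within each rocket status, in percent
missing_share = (
    toy["Rocket"].isnull().groupby(toy["Status Rocket"]).mean().mul(100).round(2)
)
print(missing_share)
```

On the real data the same expression could be applied to `data` instead of `toy`.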


**Comment**<br>
Let us separate the country from the Location column and the year from the Datum column.

In [39]:
# The country is the last comma-separated part of Location
data["Country"] = data["Location"].apply(lambda location: location.split(", ")[-1])
# Parse Datum as UTC timestamps and extract the launch year
data['DateTime'] = pd.to_datetime(data['Datum'], utc=True)
data['Year'] = data['DateTime'].dt.year
# Everything before the country is the launch site
data["Launch_Site"] = data["Location"].apply(lambda location: ", ".join(location.split(", ")[:-1]))
data.head()


Unnamed: 0,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission,Country,DateTime,Year,Launch_Site
0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success,USA,2020-08-07 05:12:00+00:00,2020,"LC-39A, Kennedy Space Center, Florida"
1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success,China,2020-08-06 04:01:00+00:00,2020,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce..."
2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success,USA,2020-08-04 23:57:00+00:00,2020,"Pad A, Boca Chica, Texas"
3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success,Kazakhstan,2020-07-30 21:25:00+00:00,2020,"Site 200/39, Baikonur Cosmodrome"
4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success,USA,2020-07-30 11:50:00+00:00,2020,"SLC-41, Cape Canaveral AFS, Florida"
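
The splitting and date parsing can also be illustrated on single strings with the standard library alone. The sketch below assumes the `Datum` format seen in the rows above; `DATUM_FORMAT` and `parse_datum` are hypothetical names introduced for illustration, and entries that deviate from this format would need separate handling:

```python
from datetime import datetime, timezone

# Format of strings such as "Fri Aug 07, 2020 05:12 UTC" (assumed from the sample rows)
DATUM_FORMAT = "%a %b %d, %Y %H:%M UTC"


def parse_datum(text: str) -> datetime:
    """Parse one Datum string into a timezone-aware UTC datetime."""
    return datetime.strptime(text, DATUM_FORMAT).replace(tzinfo=timezone.utc)


# Split a location once from the right: launch site first, country last
launch_site, country = "LC-39A, Kennedy Space Center, Florida, USA".rsplit(", ", 1)

launch = parse_datum("Fri Aug 07, 2020 05:12 UTC")
print(country, "|", launch_site, "|", launch.year)
```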


In [40]:
data.Rocket.unique()

array(['50.0 ', '29.75 ', nan, '65.0 ', '145.0 ', '64.68 ', '48.5 ',
       '90.0 ', '46.0 ', '28.3 ', '29.15 ', '7.5 ', '30.8 ', '5.3 ',
       '12.0 ', '112.5 ', '120.0 ', '153.0 ', '200.0 ', '85.0 ', '115.0 ',
       '41.8 ', '21.0 ', '31.0 ', '40.0 ', '164.0 ', '62.0 ', '37.0 ',
       '350.0 ', '39.0 ', '47.0 ', '35.0 ', '69.7 ', '109.0 ', '45.0 ',
       '123.0 ', '130.0 ', '25.0 ', '56.5 ', '15.0 ', '29.0 ', '80.0 ',
       '140.0 ', '55.0 ', '59.5 ', '450.0 ', '7.0 ', '20.14 ', '133.0 ',
       '190.0 ', '135.0 ', '20.0 ', '136.6 ', '5,000.0 ', '1,160.0 ',
       '59.0 ', '63.23 '], dtype=object)

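**Comment:**<br> Some values contain a thousands-separator comma, and missing values appear as nan. Let's remove the commas. Missing costs are left as NaN rather than filled with 0, since a zero cost would distort any cost statistics. Note that the values are still strings at this point.

In [41]:

# .str.replace leaves missing entries as NaN, so no fillna is needed here
data['Rocket'] = data['Rocket'].str.replace(',', '', regex=False)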
data.Rocket.unique()

array(['50.0 ', '29.75 ', nan, '65.0 ', '145.0 ', '64.68 ', '48.5 ',
       '90.0 ', '46.0 ', '28.3 ', '29.15 ', '7.5 ', '30.8 ', '5.3 ',
       '12.0 ', '112.5 ', '120.0 ', '153.0 ', '200.0 ', '85.0 ', '115.0 ',
       '41.8 ', '21.0 ', '31.0 ', '40.0 ', '164.0 ', '62.0 ', '37.0 ',
       '350.0 ', '39.0 ', '47.0 ', '35.0 ', '69.7 ', '109.0 ', '45.0 ',
       '123.0 ', '130.0 ', '25.0 ', '56.5 ', '15.0 ', '29.0 ', '80.0 ',
       '140.0 ', '55.0 ', '59.5 ', '450.0 ', '7.0 ', '20.14 ', '133.0 ',
       '190.0 ', '135.0 ', '20.0 ', '136.6 ', '5000.0 ', '1160.0 ',
       '59.0 ', '63.23 '], dtype=object)
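
The cost values shown above are strings with a trailing space, so they cannot be summed or averaged directly. One possible way to turn them into numbers, sketched on a small made-up sample (`raw_cost` is hypothetical and only mimics the strings above), is `pd.to_numeric`:

```python
import pandas as pd

# Hypothetical sample mimicking the raw strings shown above
raw_cost = pd.Series(["50.0 ", "5,000.0 ", None, "29.75 "])

# Drop thousands separators and surrounding spaces, then convert to float;
# missing entries stay NaN instead of being replaced by 0
cost = pd.to_numeric(
    raw_cost.str.replace(",", "", regex=False).str.strip()
)
print(cost.tolist())
```

Keeping the missing entries as NaN means that aggregations such as `mean()` skip them by default.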

# Number of Space Missions by each company

In [42]:
df1 = data['Company Name'].value_counts().reset_index()

df1.columns = [
    'Company Name', 
    'Number of Missions'
]

df1 = df1.sort_values(['Number of Missions'])

fig = px.bar(
    df1, 
    x='Number of Missions', 
    y="Company Name", 
    orientation='h', 
    title='Number of Space Missions by each company', 
    width=800,
    height=1000,
    log_x = True,
)
fig.update_traces(marker_color='pink')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

# Number of Space Missions by each country

In [60]:
df2 = data['Country'].value_counts().reset_index()

df2.columns = [
    'Country', 
    'Number of Missions'
]

df2 = df2.sort_values(['Number of Missions'])

fig = px.bar(
    df2, 
    x='Number of Missions', 
    y="Country", 
    orientation='h', 
    title='Number of Space Missions by each country', 
    width=800,
    height=1000,
    log_x = True,
)
fig.update_traces(marker_color='green')
fig.update_layout(title_x=0.5) #centering the title
fig.show()

# Number of Space Missions by each year

In [79]:
df3 = data['Year'].value_counts().reset_index()

df3.columns = [
    'Year', 
    'Number of Missions'
]

df3 = df3.sort_values(['Number of Missions'])

fig = px.bar(
    df3, 
    x='Number of Missions', 
    y="Year", 
    orientation='h', 
    title='Number of Space Missions by each year', 
    width=800,
    height=1000,
)
fig.update_traces(marker_color='yellow')
fig.update_layout(
    title_x=0.5,  # centering the title
    yaxis=dict(tickmode='linear'),  # evenly spaced year ticks
)
fig.show()

In [80]:
df3 = df3.sort_values(['Number of Missions'],ascending = False)
df3.head()


Unnamed: 0,Year,Number of Missions
0,1971,119
1,2018,117
2,1977,114
3,1975,113
4,1976,113


In [75]:
df3

Unnamed: 0,Year,Number of Missions
63,1957,3
62,1959,20
61,1958,28
60,2005,37
59,2010,37
...,...,...
4,1976,113
3,1975,113
2,1977,114
1,2018,117


In [72]:
df4 = data['Status Rocket'].value_counts().reset_index()

df4.columns = [
    'status', 
    'count'
]
colors = ['red','green']
fig = px.pie(
    df4, 
    values='count', 
    names="status",
    color ="status",
    title='Rocket status', 
    width=500, 
    height=500,
)
fig.update_traces(textposition='inside', textinfo='percent+label',marker=dict(colors=colors, line=dict(color='white', width=2)))
fig.show()

In [55]:
data['Status Mission'].value_counts()

Success              3879
Failure               339
Partial Failure       102
Prelaunch Failure       4
Name: Status Mission, dtype: int64
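
The raw counts are easier to compare as percentages. The sketch below simply reuses the four counts printed above in plain Python; on the DataFrame itself, `value_counts(normalize=True)` should give the same shares as fractions:

```python
# Counts copied from the value_counts() output above
status_counts = {
    "Success": 3879,
    "Failure": 339,
    "Partial Failure": 102,
    "Prelaunch Failure": 4,
}

total = sum(status_counts.values())

# Share of each mission status in percent, rounded to two decimals
status_share = {
    status: round(count / total * 100, 2)
    for status, count in status_counts.items()
}
print(total, status_share)
```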

In [68]:
df5 = data['Status Mission'].value_counts().reset_index()

df5.columns = [
    'Mission Status', 
    'count'
]
fig = px.bar(
    df5, 
    x='Mission Status', 
    y="count",
    orientation='v',
    title='Mission Status', 
    width=800,
    height=1000
)
fig.update_traces(marker_color='brown')
fig.update_layout(title_x=0.5) #centering the title
fig.show()
