>### Questions I tried to answer with this dataset

>>* How have rocket launches trended across time? Has mission success rate increased?
>>* Which countries have had the most successful space missions? Has it always been that way?
>>* Which rocket has been used for the most space missions? Is it still active?
>>* Are there any noticeable patterns in the launch locations?

## Data Cleaning and Preparation

In [1]:
import pandas as pd  # load the data as a table (DataFrame) and filter it

In [2]:
import numpy as np  # numerical calculations, if any are needed

In [3]:
import os
os.listdir('/kaggle/input/space-missions')

['space_missions.csv', 'space_missions_data_dictionary.csv']

In [4]:

space_missions_raw_df = pd.read_csv('/kaggle/input/space-missions/space_missions.csv', encoding_errors='ignore')
space_missions_raw_df

,Company,Location,Date,Time,Rocket,Mission,RocketStatus,Price,MissionStatus
0,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-10-04,19:28:00,Sputnik 8K71PS,Sputnik-1,Retired,,Success
1,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-11-03,02:30:00,Sputnik 8K71PS,Sputnik-2,Retired,,Success
2,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1957-12-06,16:44:00,Vanguard,Vanguard TV3,Retired,,Failure
3,AMBA,"LC-26A, Cape Canaveral AFS, Florida, USA",1958-02-01,03:48:00,Juno I,Explorer 1,Retired,,Success
4,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1958-02-05,07:33:00,Vanguard,Vanguard TV3BU,Retired,,Failure
...,...,...,...,...,...,...,...,...,...
4625,SpaceX,"SLC-4E, Vandenberg SFB, California, USA",2022-07-22,17:39:00,Falcon 9 Block 5,Starlink Group 3-2,Active,67,Success
4626,CASC,"LC-101, Wenchang Satellite Launch Center, China",2022-07-24,06:22:00,Long March 5B,Wentian,Active,,Success
4627,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA",2022-07-24,13:38:00,Falcon 9 Block 5,Starlink Group 4-25,Active,67,Success
4628,CAS Space,"Jiuquan Satellite Launch Center, China",2022-07-27,04:12:00,Zhongke-1A,Demo Flight,Active,,Success


In [5]:
space_missions_raw_df.shape  # (number of rows, number of columns)

(4630, 9)

In [6]:
space_missions_raw_df.describe()  # per column: count, number of unique values, most frequent value (top) and its frequency

,Company,Location,Date,Time,Rocket,Mission,RocketStatus,Price,MissionStatus
count,4630,4630,4630,4503,4630,4630,4630,1265,4630
unique,62,158,4180,1300,370,4556,2,65,4
top,RVSN USSR,"Site 31/6, Baikonur Cosmodrome, Kazakhstan",1962-04-26,12:00:00,Cosmos-3M (11K65M),DSP,Retired,450,Success
freq,1777,251,4,52,446,8,3620,136,4162
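`describe()` already hints at the rocket question: the most frequent `Rocket` value is Cosmos-3M (11K65M), used in 446 missions. Checking whether it is still active means looking up its `RocketStatus`. A minimal sketch, assuming the column names above and that a rocket's status is the same on every row (the data dictionary describes it as the status as of August 2022); the helper `most_used_rocket` is hypothetical and the demo rows are copied from the table printed earlier.

```python
import pandas as pd

def most_used_rocket(df: pd.DataFrame) -> tuple:
    """Hypothetical helper: (rocket name, number of missions, RocketStatus) of the most-used rocket."""
    counts = df['Rocket'].value_counts()   # sorted by frequency, most frequent first
    rocket = counts.index[0]
    # assumption: status is constant per rocket, so the first matching row is enough
    status = df.loc[df['Rocket'] == rocket, 'RocketStatus'].iloc[0]
    return rocket, int(counts.iloc[0]), status

# A few rows taken from the output above
sample = pd.DataFrame({
    'Rocket': ['Sputnik 8K71PS', 'Sputnik 8K71PS', 'Juno I', 'Long March 5B'],
    'RocketStatus': ['Retired', 'Retired', 'Retired', 'Active'],
})
print(most_used_rocket(sample))  # ('Sputnik 8K71PS', 2, 'Retired')
```

On the full data, `most_used_rocket(space_missions_raw_df)` would answer both parts of the rocket question in one call.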


In [7]:
space_missions_raw_df.info()  # non-null count and data type of each column

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4630 entries, 0 to 4629
Data columns (total 9 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   Company        4630 non-null   object
 1   Location       4630 non-null   object
 2   Date           4630 non-null   object
 3   Time           4503 non-null   object
 4   Rocket         4630 non-null   object
 5   Mission        4630 non-null   object
 6   RocketStatus   4630 non-null   object
 7   Price          1265 non-null   object
 8   MissionStatus  4630 non-null   object
dtypes: object(9)
memory usage: 325.7+ KB


> The **Price** column has only 1,265 non-null values out of 4,630 rows (about 27%), so it cannot give reliable information. I therefore drop it in a later step.
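The same check can be done programmatically by computing the fraction of non-missing values in each column and flagging the sparse ones. With the counts above, Price is about 27% filled (1,265 / 4,630) and Time about 97% (4,503 / 4,630). A sketch; the helper `sparse_columns` and its 50% cut-off are illustrative choices, not a rule taken from the dataset.

```python
import pandas as pd

def sparse_columns(df: pd.DataFrame, min_filled: float = 0.5) -> list:
    """Hypothetical helper: columns whose share of non-null values is below min_filled."""
    filled = df.notna().mean()   # fraction of non-null values per column
    return filled[filled < min_filled].index.tolist()

# Toy frame mimicking the situation: Price is mostly missing
demo = pd.DataFrame({
    'Company': ['A', 'B', 'C', 'D'],
    'Price': [67.0, None, None, None],
})
print(sparse_columns(demo))  # ['Price']
```

On the notebook data, `space_missions_raw_df.notna().mean()` shows the filled fraction for every column at once.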

In [8]:
space_missions_raw_df.columns

Index(['Company', 'Location', 'Date', 'Time', 'Rocket', 'Mission',
       'RocketStatus', 'Price', 'MissionStatus'],
      dtype='object')

In [9]:
space_missions_dict_raw_df = pd.read_csv('/kaggle/input/space-missions/space_missions_data_dictionary.csv', index_col = 'Field')
space_missions_dict_raw_df  # descriptions of the columns in the original dataset

Field,Description
Company,Company responsible for the space mission
Location,Location of the launch
Date,Date of the launch
Time,Time of the launch (UTC)
Rocket,Name of the rocket used for the mission
Mission,Name of the space mission (or missions)
RocketStatus,Status of the rocket as of August 2022 (Active...
Price,Cost of the rocket in millions of US dollars
MissionStatus,"Status of the mission (Success, Failure, Parti..."


In [10]:
space_missions_raw_df2 = space_missions_raw_df.copy()  # copy so the original data stays untouched
# drop Price (mostly missing) and Time (not used in this analysis)
space_missions_df = space_missions_raw_df2.drop(columns=['Price', 'Time'])
space_missions_df

,Company,Location,Date,Rocket,Mission,RocketStatus,MissionStatus
0,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-10-04,Sputnik 8K71PS,Sputnik-1,Retired,Success
1,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-11-03,Sputnik 8K71PS,Sputnik-2,Retired,Success
2,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1957-12-06,Vanguard,Vanguard TV3,Retired,Failure
3,AMBA,"LC-26A, Cape Canaveral AFS, Florida, USA",1958-02-01,Juno I,Explorer 1,Retired,Success
4,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1958-02-05,Vanguard,Vanguard TV3BU,Retired,Failure
...,...,...,...,...,...,...,...
4625,SpaceX,"SLC-4E, Vandenberg SFB, California, USA",2022-07-22,Falcon 9 Block 5,Starlink Group 3-2,Active,Success
4626,CASC,"LC-101, Wenchang Satellite Launch Center, China",2022-07-24,Long March 5B,Wentian,Active,Success
4627,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA",2022-07-24,Falcon 9 Block 5,Starlink Group 4-25,Active,Success
4628,CAS Space,"Jiuquan Satellite Launch Center, China",2022-07-27,Zhongke-1A,Demo Flight,Active,Success
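The questions at the top ask about countries, but the dataset has no country column. Judging from the rows above, the last comma-separated part of `Location` names the country of the launch site (for example "Kazakhstan" or "USA"). A sketch under that assumption; note that it yields the launch-site country, not the operator's country (Soviet launches from Baikonur come out as Kazakhstan), so a per-country comparison would still need some mapping.

```python
import pandas as pd

# Locations copied from the rows shown above
locations = pd.Series([
    'Site 1/5, Baikonur Cosmodrome, Kazakhstan',
    'LC-18A, Cape Canaveral AFS, Florida, USA',
    'LC-101, Wenchang Satellite Launch Center, China',
    'Jiuquan Satellite Launch Center, China',
])

# Assumption: the country is the last comma-separated component of Location
countries = locations.str.split(',').str[-1].str.strip()
print(countries.tolist())        # ['Kazakhstan', 'USA', 'China', 'China']
print(countries.value_counts())  # launches per launch-site country
```

On the full data, the same expression applied to `space_missions_df['Location']` could be stored in a new column (for example a hypothetical `Country`) and grouped together with `MissionStatus`.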


In [11]:
space_missions_df['RocketStatus'].value_counts()  # check for unexpected category values

Retired    3620
Active     1010
Name: RocketStatus, dtype: int64

In [12]:
space_missions_df['MissionStatus'].value_counts()  # check for unexpected category values

Success              4162
Failure               357
Partial Failure       107
Prelaunch Failure       4
Name: MissionStatus, dtype: int64
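These counts already give the overall success rate across all years: 4,162 of 4,630 missions (about 89.9%) are marked Success. Dividing by the total is what `value_counts(normalize=True)` does; the sketch below rebuilds the counts printed above so it runs on its own. Counting only "Success" as a success (and "Partial Failure" as not) is a judgment call.

```python
import pandas as pd

# Counts copied from the MissionStatus output above
status_counts = pd.Series({
    'Success': 4162,
    'Failure': 357,
    'Partial Failure': 107,
    'Prelaunch Failure': 4,
})

shares = status_counts / status_counts.sum()  # same idea as value_counts(normalize=True)
print(shares.round(3))
# On the real DataFrame: space_missions_df['MissionStatus'].value_counts(normalize=True)
```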

## Data Visualization

In [26]:
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns #to visualize the data

>### Trends in rocket launches and mission success rate
To do so, we first extract the launch year from each date so the data can be grouped by year.

In [27]:
space_missions_yearly_df = space_missions_df.copy()  # keep the cleaned data untouched
space_missions_yearly_df['year'] = pd.to_datetime(space_missions_yearly_df['Date']).dt.year  # launch year
space_missions_yearly_df.head(5)  # confirm the year column was added

,Company,Location,Date,Rocket,Mission,RocketStatus,MissionStatus,year
0,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-10-04,Sputnik 8K71PS,Sputnik-1,Retired,Success,1957
1,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan",1957-11-03,Sputnik 8K71PS,Sputnik-2,Retired,Success,1957
2,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1957-12-06,Vanguard,Vanguard TV3,Retired,Failure,1957
3,AMBA,"LC-26A, Cape Canaveral AFS, Florida, USA",1958-02-01,Juno I,Explorer 1,Retired,Success,1958
4,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA",1958-02-05,Vanguard,Vanguard TV3BU,Retired,Failure,1958


In [28]:
space_missions_yearly_df.value_counts('year').head(5)  # five years with the most launches

year
2021    157
2020    119
1971    119
2018    117
1977    114
dtype: int64
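To answer whether the success rate has increased, the yearly launch counts can be paired with the share of successful missions in each year. A sketch using named aggregation in `groupby().agg`; the demo frame reuses the first five rows shown above, and only "Success" counts as a success (treating "Partial Failure" as a failure is an assumption).

```python
import pandas as pd

# First five rows from the output above (only the columns needed here)
df = pd.DataFrame({
    'Date': ['1957-10-04', '1957-11-03', '1957-12-06', '1958-02-01', '1958-02-05'],
    'MissionStatus': ['Success', 'Success', 'Failure', 'Success', 'Failure'],
})
df['year'] = pd.to_datetime(df['Date']).dt.year

yearly = df.groupby('year').agg(
    launches=('MissionStatus', 'size'),                                  # missions per year
    success_rate=('MissionStatus', lambda s: (s == 'Success').mean()),   # share marked 'Success'
)
print(yearly)
```

On the full data, the same aggregation over `space_missions_yearly_df` gives one row per year; plotting `yearly['launches']` and `yearly['success_rate']` with the matplotlib import above would show both trends.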