# ANALYSIS OF SPACE MISSIONS SINCE 1957

One of the biggest explorations mankind has undertaken over the years is <b>Space Exploration</b>. The vastness of space has kindled mankind's relentless curiosity and kept its exploration going. So, what is space exploration? <strong>Space exploration</strong> is the physical exploration of outer space, both by human spaceflight and by robotic spacecraft. During the 20th century, the development of large liquid-fueled rocket engines made space exploration a practical possibility.

Space exploration began in earnest during the "Space Race" of the Cold War, which was dominated by the Soviet Union and the United States. On October 4, 1957, the Soviet Union launched Sputnik 1, the first human-made object to orbit Earth. Less than four years later, on April 12, 1961, Soviet cosmonaut Lt. Yuri Gagarin became the first human to orbit Earth, aboard Vostok 1.

Many space missions have been flown across the 20th and 21st centuries. More sophisticated engines and rockets have been built along the way, and companies from many countries have entered the industry.

To work with this dataset on space missions, obtained from Kaggle (insert a link to the dataset here), we have to go through a process of cleaning, structuring and enriching disorganized (raw) data into a desired format, so that we can easily access and analyze it. This process is known as <strong>Data Wrangling</strong>. Data wrangling has to come before we can move on to visualizing and analyzing the dataset. To do it, we use the Python language along with "Pandas", its library built for this purpose. Pandas is flexible to use regardless of the format of the dataset you're working with (e.g. xlsx, txt or csv).

The first step in wrangling this data set is to import the "pandas" Python library and display the raw data on space missions from 1957 to the present.

In [11]:
import pandas as pd

space_data = pd.read_csv('Space_Corrected.csv')
space_data.head()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,0,0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,1,1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,2,2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,3,3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,4,4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success


The <b>"space_data.head()"</b> method shows the first 5 rows of the dataframe. This comes in handy when you just want a quick look at what the data set contains.
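As a small aside, <b>"head()"</b> also accepts a row count, and <b>"tail()"</b> does the same from the end of the table. The sketch below uses a made-up toy table (the name "toy" and its rows are assumptions for illustration, not the Kaggle data).

```python
import pandas as pd

# Toy stand-in for the missions table (made-up rows, not the Kaggle data).
toy = pd.DataFrame({
    "Company Name": ["A", "B", "C", "D", "E", "F", "G"],
    "Status Mission": ["Success", "Failure", "Success", "Success",
                       "Failure", "Success", "Success"],
})

first_five = toy.head()    # default: the first 5 rows
first_two = toy.head(2)    # explicit row count
last_three = toy.tail(3)   # same idea, counted from the end

print(len(first_five), len(first_two), len(last_three))
```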

We can use the <b>"shape"</b> attribute to check the number of rows and columns in the data set.

## READING THE DATA

In [2]:
space_data.shape

(4324, 9)

Okay, from what we can see, the dataset contains 4324 rows and 9 columns. Let's move on to a more descriptive analysis as we read our data set.
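Since <b>"shape"</b> is a plain tuple of (rows, columns), it can be unpacked into two variables. A minimal sketch on a made-up toy table (the names "toy", "n_rows" and "n_cols" are only for illustration):

```python
import pandas as pd

# Toy table with 3 rows and 2 columns (made-up values).
toy = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# shape is an attribute, so there are no parentheses after it.
n_rows, n_cols = toy.shape
print(f"{n_rows} rows x {n_cols} columns")
```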

If we have a Pandas series (either alone or as part of a Pandas dataframe), we can use the <b>"unique()"</b> method to identify its unique values, like this:

In [3]:
space_data['Company Name'].unique()

array(['SpaceX', 'CASC', 'Roscosmos', 'ULA', 'JAXA', 'Northrop', 'ExPace',
       'IAI', 'Rocket Lab', 'Virgin Orbit', 'VKS RF', 'MHI', 'IRGC',
       'Arianespace', 'ISA', 'Blue Origin', 'ISRO', 'Exos', 'ILS',
       'i-Space', 'OneSpace', 'Landspace', 'Eurockot', 'Land Launch',
       'CASIC', 'KCST', 'Sandia', 'Kosmotras', 'Khrunichev', 'Sea Launch',
       'KARI', 'ESA', 'NASA', 'Boeing', 'ISAS', 'SRC', 'MITT', 'Lockheed',
       'AEB', 'Starsem', 'RVSN USSR', 'EER', 'General Dynamics',
       'Martin Marietta', 'Yuzhmash', 'Douglas', 'ASI', 'US Air Force',
       'CNES', 'CECLES', 'RAE', 'UT', 'OKB-586', 'AMBA',
       "Arm??e de l'Air", 'US Navy'], dtype=object)

As you can see, the "Company Name" column lists every company or agency that launched at least one of the missions in the data set.
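If we only need the number of distinct names, or how often each one appears, <b>"nunique()"</b> and <b>"value_counts()"</b> are close relatives of <b>"unique()"</b>. Here is a hedged sketch on a made-up series (the values are placeholders, not counts from the real data set):

```python
import pandas as pd

# Made-up sample of the "Company Name" column.
companies = pd.Series(
    ["SpaceX", "CASC", "SpaceX", "ULA", "CASC", "SpaceX"],
    name="Company Name",
)

distinct = companies.unique()                  # distinct values, in order of first appearance
n_distinct = companies.nunique()               # how many distinct values there are
launches_per_company = companies.value_counts()  # rows per value, most frequent first

print(n_distinct)
print(launches_per_company)
```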

In order to properly wrangle our data and to know what it contains, it is important that we derive some information from our data set. We can do this by using the <b>"info()"</b> method.

In [4]:
space_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4324 entries, 0 to 4323
Data columns (total 9 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   Unnamed: 0      4324 non-null   int64 
 1   Unnamed: 0.1    4324 non-null   int64 
 2   Company Name    4324 non-null   object
 3   Location        4324 non-null   object
 4   Datum           4324 non-null   object
 5   Detail          4324 non-null   object
 6   Status Rocket   4324 non-null   object
 7    Rocket         964 non-null    object
 8   Status Mission  4324 non-null   object
dtypes: int64(2), object(7)
memory usage: 304.2+ KB


Okay, from this output we can deduce a few things:
* There are 4324 entries in total, indexed from 0 to 4323
* There are 9 data columns in total
* Every column has 4324 non-null values except " Rocket" (note the leading space in its name), which has only 964
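The leading space in " Rocket" is easy to miss when typing column names. One optional way to deal with it is to strip whitespace from every column label; the sketch below does this on a made-up toy table. This step is an assumption of mine and is not part of the original notebook: if you apply it, later cells would have to refer to 'Rocket' instead of ' Rocket'.

```python
import pandas as pd

# Toy table that mimics the stray leading space in the " Rocket" label.
toy = pd.DataFrame({"Company Name": ["A", "B"], " Rocket": ["50.0", None]})
print(list(toy.columns))

# rename() applies str.strip to each column label and returns a new frame.
cleaned = toy.rename(columns=str.strip)
print(list(cleaned.columns))

# count() gives the non-null values per column, like the info() summary.
non_null = cleaned.count()
print(non_null)
```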

Next, we have to check for columns that appear to be duplicates of each other, to determine whether one of them is redundant. <b>'Unnamed: 0'</b> and <b>'Unnamed: 0.1'</b> look like they might match, but we can't jump to conclusions, so we have to check. We can do this with the <strong>"equals()"</strong> method.

In [5]:
equal_data = space_data[['Unnamed: 0', 'Unnamed: 0.1']]

space_data.equals(equal_data)

False

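The call returns "False", but that does not actually answer our question. <b>"space_data.equals(equal_data)"</b> compares the whole dataframe (9 columns) against the two-column subset, and two dataframes with different columns are never equal. To test whether the two columns hold the same values, one column has to be compared against the other, for example with <b>"space_data['Unnamed: 0'].equals(space_data['Unnamed: 0.1'])"</b>.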
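To make the difference concrete, here is a minimal sketch on a made-up toy table. The two "Unnamed" columns are given identical values on purpose; whether the real columns match is something only the real data can tell.

```python
import pandas as pd

# Toy table: the two "Unnamed" columns are deliberately identical here.
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "Unnamed: 0.1": [0, 1, 2],
    "Company Name": ["A", "B", "C"],
})

subset = toy[["Unnamed: 0", "Unnamed: 0.1"]]

# Whole frame vs. two-column subset: the column sets differ, so this is False.
frame_vs_subset = toy.equals(subset)

# Column vs. column: this is the comparison that answers the question.
columns_match = toy["Unnamed: 0"].equals(toy["Unnamed: 0.1"])

# If they match, one of the two can be dropped as redundant.
deduplicated = toy.drop(columns=["Unnamed: 0.1"]) if columns_match else toy

print(frame_vs_subset, columns_match, list(deduplicated.columns))
```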

### Note: 
<b>This part was not added to the Medium article!!!</b>

In [6]:
space_data.isnull()

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,False,False,False,False,False,False,False,False,False
1,False,False,False,False,False,False,False,False,False
2,False,False,False,False,False,False,False,True,False
3,False,False,False,False,False,False,False,False,False
4,False,False,False,False,False,False,False,False,False
...,...,...,...,...,...,...,...,...,...
4319,False,False,False,False,False,False,False,True,False
4320,False,False,False,False,False,False,False,True,False
4321,False,False,False,False,False,False,False,True,False
4322,False,False,False,False,False,False,False,True,False


## CLEANING THE DATA

Next, we check for null values across the whole data set.

In [7]:
space_data.isnull().sum()

Unnamed: 0           0
Unnamed: 0.1         0
Company Name         0
Location             0
Datum                0
Detail               0
Status Rocket        0
 Rocket           3360
Status Mission       0
dtype: int64

Oops! Looks like we have some cleaning to do. The result above shows 3360 null values out of 4324, all of them in the <b>" Rocket"</b> column.

We can go about this in two ways: either drop (i.e. remove) the rows where the null values are found, or fill them all with zero. What do you suggest? Here it is better to go with the latter, because dropping would lose 3360 of the 4324 rows along with the other important data they contain. Keep in mind that the zero is only a placeholder for a missing value, not a real one. So we will fill the null values with '0.0' using the <b>"fillna()"</b> method.

In [8]:
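# Assign the result back: fillna(inplace=True) on a selected column may not update space_data in recent pandas versions
space_data[' Rocket'] = space_data[' Rocket'].fillna('0.0')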
space_data.isnull().sum()

Unnamed: 0        0
Unnamed: 0.1      0
Company Name      0
Location          0
Datum             0
Detail            0
Status Rocket     0
 Rocket           0
Status Mission    0
dtype: int64

Touché! Now we've gotten rid of our null values. Let's proceed.
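To see the trade-off between the two options side by side, here is a small sketch on a made-up toy table (the rows and the names "dropped", "filled" and "cost" are assumptions for illustration). It also shows one possible follow-up: since the column holds text, <b>"pd.to_numeric()"</b> can turn it into numbers, assuming every value parses cleanly.

```python
import pandas as pd

# Toy table: two of the four rows are missing a " Rocket" value.
toy = pd.DataFrame({
    "Detail": ["m1", "m2", "m3", "m4"],
    " Rocket": ["50.0", None, "29.75", None],
})

# Option 1: drop rows with a missing value (loses m2 and m4 entirely).
dropped = toy.dropna(subset=[" Rocket"])

# Option 2: keep every row and fill the gaps with a placeholder.
filled = toy.copy()
filled[" Rocket"] = filled[" Rocket"].fillna("0.0")

# The column still holds strings; convert it before doing arithmetic.
cost = pd.to_numeric(filled[" Rocket"])

print(len(dropped), len(filled))
# The placeholder zeros pull the average down, so treat it with care.
print(cost.mean(), pd.to_numeric(dropped[" Rocket"]).mean())
```

The second print is the reason to remember that a filled-in zero is not a real measurement: any average over the filled column mixes real values with placeholders.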

Checking for duplicated rows is another way to ensure our data set is well wrangled.

In [9]:
space_data.duplicated().sum()

0
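A result of 0 means no row is an exact copy of an earlier one. For reference, here is a hedged sketch of what <b>"duplicated()"</b> counts and how duplicates could be removed if there were any, using a made-up toy table:

```python
import pandas as pd

# Toy table: the second row repeats the first one exactly.
toy = pd.DataFrame({
    "Company Name": ["A", "A", "B", "A"],
    "Detail": ["m1", "m1", "m2", "m3"],
})

# Rows that match an earlier row in every column.
n_full_duplicates = int(toy.duplicated().sum())

# Rows that repeat an earlier value in the chosen column(s) only.
n_company_repeats = int(toy.duplicated(subset=["Company Name"]).sum())

# Keep the first occurrence of each row and renumber the index.
deduplicated = toy.drop_duplicates().reset_index(drop=True)

print(n_full_duplicates, n_company_repeats, len(deduplicated))
```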

### SORTING/FILTERING THE DATA SET

In addition, we can filter our data and group similar records together. From the raw data, we can see records where the rocket is no longer active and records where the mission was not a success. With this in mind, we can select the missions whose rocket is not active and whose outcome was not a success, separating them from the rockets that are still active and were successful.

In [10]:
new_data = space_data.loc[(space_data['Status Rocket'] != 'StatusActive') & (space_data['Status Mission'] != 'Success')]

new_data.reset_index(drop=True, inplace=True)

# new_data.to_csv('filtered_data.csv')

new_data

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Company Name,Location,Datum,Detail,Status Rocket,Rocket,Status Mission
0,202,202,Landspace,"Site 95, Jiuquan Satellite Launch Center, China","Sat Oct 27, 2018 08:00 UTC",ZhuQue-1 | CCTV Future-1,StatusRetired,0.0,Failure
1,208,208,Roscosmos,"Site 1/5, Baikonur Cosmodrome, Kazakhstan","Thu Oct 11, 2018 08:40 UTC",Soyuz FG | Soyuz MS-10 (56S),StatusRetired,0.0,Failure
2,392,392,Roscosmos,"Site 1/5, Baikonur Cosmodrome, Kazakhstan","Thu Dec 01, 2016 14:52 UTC",Soyuz U | Progress MS-04,StatusRetired,0.0,Failure
3,413,413,SpaceX,"SLC-40, Cape Canaveral AFS, Florida, USA","Thu Sep 01, 2016 13:07 UTC",Falcon 9 Block 3 | AMOS-6,StatusRetired,62,Prelaunch Failure
4,499,499,SpaceX,"SLC-40, Cape Canaveral AFS, Florida, USA","Sun Jun 28, 2015 14:21 UTC",Falcon 9 v1.1 | CRS-7,StatusRetired,56.5,Failure
...,...,...,...,...,...,...,...,...,...
386,4314,4314,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA","Mon Apr 28, 1958 02:53 UTC",Vanguard | Vanguard TV5,StatusRetired,0.0,Failure
387,4315,4315,RVSN USSR,"Site 1/5, Baikonur Cosmodrome, Kazakhstan","Sun Apr 27, 1958 09:01 UTC",Sputnik 8A91 | Sputnik-3 #1,StatusRetired,0.0,Failure
388,4318,4318,AMBA,"LC-26A, Cape Canaveral AFS, Florida, USA","Wed Mar 05, 1958 18:27 UTC",Juno I | Explorer 2,StatusRetired,0.0,Failure
389,4319,4319,US Navy,"LC-18A, Cape Canaveral AFS, Florida, USA","Wed Feb 05, 1958 07:33 UTC",Vanguard | Vanguard TV3BU,StatusRetired,0.0,Failure


Awesome! We can now clearly see the retired rockets whose missions did not succeed. Note that the <b>"to_csv()"</b> line is commented out, so nothing has been written to disk yet; uncomment it to save the filtered data into a new "csv" file.
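The filter combines two conditions with <b>"&"</b>, which keeps a row only when both hold. The sketch below, on a made-up toy table, contrasts that with <b>"|"</b> (either condition) and writes the result to an in-memory buffer instead of a file; the mask names and the buffer are assumptions for illustration.

```python
import io
import pandas as pd

# Toy table with the two status columns (made-up rows).
toy = pd.DataFrame({
    "Status Rocket": ["StatusActive", "StatusRetired",
                      "StatusRetired", "StatusActive"],
    "Status Mission": ["Success", "Failure", "Success", "Prelaunch Failure"],
})

not_active = toy["Status Rocket"] != "StatusActive"
not_success = toy["Status Mission"] != "Success"

# "&" keeps rows where both conditions hold; "|" keeps rows where either does.
both = toy.loc[not_active & not_success].reset_index(drop=True)
either = toy.loc[not_active | not_success].reset_index(drop=True)

# Write the CSV text to a buffer; pass a file name instead to save to disk.
buffer = io.StringIO()
both.to_csv(buffer, index=False)

print(len(both), len(either))
print(buffer.getvalue())
```

Passing <b>"index=False"</b> leaves the row numbers out of the saved file, which avoids picking up an extra "Unnamed: 0" column when the file is read back in.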

Now we can go on to visualizing and analyzing our dataset; watch out for that in my next article. Here is the link to the GitHub repo. Thank you for reading!