# Introduction

<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>

This dataset was scraped from [nextspaceflight.com](https://nextspaceflight.com/launches/past/?page=1) and includes all the space missions since the beginning of the Space Race between the USA and the Soviet Union in 1957!

### Install Package with Country Codes

In [170]:
#%pip install iso3166

### Upgrade Plotly

Run the cell below if you are working with Google Colab.

In [171]:
#%pip install --upgrade plotly

### Import Statements

In [172]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
#import seaborn as sns
import kaleido
import plotly
import plotly.graph_objs as go


# These might be helpful:
from iso3166 import countries
from datetime import datetime, timedelta

### Notebook Presentation

In [173]:
pd.options.display.float_format = '{:,.2f}'.format

### Load the Data

In [174]:
df_data = pd.read_csv('mission_launches.csv')

# Preliminary Data Exploration

* What is the shape of `df_data`? 
* How many rows and columns does it have?
* What are the column names?
* Are there any NaN values or duplicates?
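
The questions above map onto a handful of pandas calls. The following is a minimal sketch on a small made-up frame; `toy` and its values are hypothetical and only stand in for `df_data`.

```python
import pandas as pd

# Hypothetical stand-in for df_data; the values are invented for illustration.
toy = pd.DataFrame({
    "Organisation": ["NASA", "SpaceX", "NASA", "CASC"],
    "Price": ["450.0", "62.0", "450.0", None],
})

n_rows, n_cols = toy.shape                  # (rows, columns)
column_names = list(toy.columns)            # column labels
nan_per_column = toy.isna().sum()           # NaN count for each column
n_duplicates = int(toy.duplicated().sum())  # fully identical rows after the first

print(n_rows, n_cols, column_names)
print(nan_per_column)
print(n_duplicates)
```

Passing `subset=[...]` to `duplicated` restricts the comparison to the listed columns, which is useful when an index-like column makes every row look unique.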

In [175]:
#df_data.shape
#4324 rows and 9 columns
spacedf = df_data.drop(columns=['Unnamed: 0.1', 'Unnamed: 0'])
spacedf.sample(10)

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
3153,RVSN USSR,"Site 86/1, Kapustin Yar, Russia","Thu Apr 19, 1973 10:20 UTC",Cosmos-2I (63SM) | Intercosmos-9,StatusRetired,,Success
953,NASA,"LC-39B, Kennedy Space Center, Florida, USA","Tue Jul 26, 2005 14:39 UTC",Space Shuttle Discovery | STS-114,StatusRetired,450.0,Success
3792,RVSN USSR,"Site 86/1, Kapustin Yar, Russia","Tue Mar 21, 1967 10:07 UTC",Cosmos-2I (63SM) | Cosmos 149,StatusRetired,,Success
628,Arianespace,"ELS, Guiana Space Centre, French Guiana, France","Fri Oct 12, 2012 15:15 UTC",Soyuz ST-B/Fregat-MT | Galileo IOV FM03-FM04,StatusActive,,Success
1872,Martin Marietta,"SLC-40, Cape Canaveral AFS, Florida, USA","Mon Sep 04, 1989 05:54 UTC",Titan 34D | DSCS-II-16 & DSCS-III-A2,StatusRetired,,Success
4119,General Dynamics,"LC-12, Cape Canaveral AFS, Florida, USA","Thu Oct 18, 1962 16:59 UTC",Atlas-LV3 Agena-B | Ranger 5,StatusRetired,,Success
445,SpaceX,"SLC-40, Cape Canaveral AFS, Florida, USA","Fri Apr 08, 2016 20:43 UTC",Falcon 9 Block 3 | CRS-8,StatusRetired,62.0,Success
2471,RVSN USSR,"Site 132/1, Plesetsk Cosmodrome, Russia","Thu Mar 27, 1980 07:30 UTC",Cosmos-3M (11K65M) | Cosmos 1169,StatusRetired,,Success
3292,RVSN USSR,"Site 132/1, Plesetsk Cosmodrome, Russia","Mon Nov 29, 1971 17:30 UTC",Cosmos-3M (11K65M) | Cosmos 459,StatusRetired,,Success
2141,NASA,"LC-39A, Kennedy Space Center, Florida, USA","Fri Apr 12, 1985 13:59 UTC",Space Shuttle Discovery | STS-51-D,StatusRetired,450.0,Success


In [176]:
spacedf.isna().sum()
#The price column has 3360 NaN values,the others are good

Organisation         0
Location             0
Date                 0
Detail               0
Rocket_Status        0
Price             3360
Mission_Status       0
dtype: int64

## Data Cleaning - Check for Missing Values and Duplicates

Consider removing columns containing junk data. 

In [177]:
# The first two index columns were already removed above.
# Check for duplicates: matching on both Date and Detail finds only one.
# Matching on Date alone gave 5 duplicates, but those rows had different
# rocket models, so they are separate launches.
duplicates = spacedf.duplicated(subset=["Date", "Detail"],keep='first')

spacedf[duplicates]




Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
793,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Wed Nov 05, 2008 00:15 UTC",Long March 2D | Shiyan-3 & Chuangxin-1(02),StatusActive,29.75,Success


In [178]:
spacedf = spacedf.drop_duplicates(subset=['Date', 'Detail'], keep='first')


## Descriptive Statistics

In [179]:
spacedf.describe()

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
count,4323,4323,4323,4323,4323,963.0,4323
unique,56,137,4319,4278,2,56.0,4
top,RVSN USSR,"Site 31/6, Baikonur Cosmodrome, Kazakhstan","Sun Aug 25, 1991 08:40 UTC",Cosmos-3MRB (65MRB) | BOR-5 Shuttle,StatusRetired,450.0,Success
freq,1777,235,2,6,3534,136.0,3878


In [180]:
# 56 organisations; RVSN USSR is the top one with 1777 missions
# 137 launch locations in total; Site 31/6 at Baikonur Cosmodrome is the busiest with 235 launches
# 3534 of the 4323 rockets are retired
# 3878 missions were successful

# Number of Launches per Company

Create a chart that shows the number of space mission launches by organisation.

In [181]:
launchesorgs =  spacedf['Organisation'].value_counts().sort_values(ascending=False)[:20]

In [182]:

# Plotly draws its own figure, so no matplotlib figure is needed here
bar= px.bar(launchesorgs, 
             x=launchesorgs.values, 
             y=launchesorgs.index, 
             orientation='h', 
             title='Number of Launches per Organization',
             color=launchesorgs.values,
             color_continuous_scale='Electric')
bar.update_layout(coloraxis_showscale=False, xaxis_title = "Number of Launches", yaxis_title='Organisation name')
bar.update_layout(yaxis=dict(tickfont=dict(size=10)))
bar.show()

# Number of Active versus Retired Rockets

How many rockets are active compared to those that are decommissioned? 

In [183]:
activevsretired= spacedf['Rocket_Status'].value_counts().sort_values()
activevsretired

StatusActive      789
StatusRetired    3534
Name: Rocket_Status, dtype: int64

In [184]:
# Plotly draws its own figure, so no matplotlib figure is needed here
bar = px.bar(data_frame=activevsretired,x=activevsretired.index,
             y=activevsretired.values ,
             color=activevsretired.values,
             color_continuous_scale='portland')
bar.update_layout(coloraxis_showscale=False, xaxis_title = "Active vs retired rockets", 
                  yaxis_title='Number of Rockets')

bar.update_xaxes(ticktext=['Active', 'Retired'], tickvals=[0, 1])
bar.show()

# Distribution of Mission Status

How many missions were successful?
How many missions failed?

In [185]:
successvsfail = spacedf['Mission_Status'].value_counts().sort_values()
successvsfail

Prelaunch Failure       4
Partial Failure       102
Failure               339
Success              3878
Name: Mission_Status, dtype: int64

In [186]:
# Plotly draws its own figure, so no matplotlib figure is needed here
bar = px.bar(data_frame=successvsfail,x=successvsfail.index,
             y=successvsfail.values ,
             color=successvsfail.values,
             color_continuous_scale='portland',
             log_y=True)
# Axis titles describe mission outcomes (the y-axis is logarithmic)
bar.update_layout(coloraxis_showscale=False, xaxis_title = "Mission status", 
                  yaxis_title='Number of missions (log scale)')

bar.update_xaxes(ticktext=['Prelaunch Failure', 'Partial Failure', 'Failure', 'Success'], tickvals=[0, 1,2,3])
bar.show()

# How Expensive are the Launches? 

Create a histogram and visualise the distribution. The price column is given in USD millions (careful of missing values). 

In [187]:
prices = spacedf['Price'].dropna().sort_values()
prices

3683    1,160.0
3149    1,160.0
3180    1,160.0
3243    1,160.0
3384    1,160.0
         ...   
510        90.0
365        90.0
146        90.0
236        90.0
569        90.0
Name: Price, Length: 963, dtype: object
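
The `dtype: object` in the output above suggests the prices are stored as strings, some with a thousands separator such as `1,160.0`. In that case the sort is lexicographic rather than numeric, and `pd.to_numeric(..., errors='coerce')` turns the comma-formatted values into NaN. A minimal sketch of one way to parse them, using made-up values (the `raw` series is hypothetical):

```python
import pandas as pd

# Hypothetical price strings in USD millions, including one missing value.
raw = pd.Series(["450.0", "1,160.0", None, "62.0"], name="Price")

# Coercing directly loses the comma-formatted value as well as the missing one.
coerced = pd.to_numeric(raw, errors="coerce")
n_lost = int(coerced.isna().sum())

# Drop missing values, strip the thousands separator, then convert to float.
prices = pd.to_numeric(raw.dropna().str.replace(",", "", regex=False))
prices = prices.sort_values()

print(n_lost)
print(prices.tolist())
```

With numeric values, a histogram bins the prices on a continuous axis instead of treating each distinct string as its own category.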

In [188]:
fig = px.histogram(prices, x=prices.values, 
                   nbins=20,
                   title='Distribution of Space Mission Prices',
                   #histfunc='density'
                   )

fig.update_xaxes(title_text='Price')
fig.update_yaxes(title_text='Frequency')
fig.update_layout(xaxis=dict(tickfont=dict(size=10)))
fig.update_layout(yaxis=dict(tickfont=dict(size=10)))
fig.show()

# Use a Choropleth Map to Show the Number of Launches by Country

* Create a choropleth map using [the plotly documentation](https://plotly.com/python/choropleth-maps/)
* Experiment with [plotly's available colours](https://plotly.com/python/builtin-colorscales/). I quite like the sequential colour `matter` on this map. 
* You'll need to extract a `country` feature as well as change the country names that no longer exist.

Wrangle the Country Names

You'll need to use a 3 letter country code for each country. You might have to change some country names.

* Russia is the Russian Federation
* New Mexico should be USA
* Yellow Sea refers to China
* Shahrud Missile Test Site should be Iran
* Pacific Missile Range Facility should be USA
* Barents Sea should be Russian Federation
* Gran Canaria should be USA


You can use the iso3166 package to convert the country names to Alpha3 format.
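
The wrangling steps above can be sketched on a few made-up rows. The `toy` frame and its location strings are hypothetical; the pattern is to take the text after the last comma, fix values that are not countries, and then switch to the longer names that the cells below look up in `iso3166`.

```python
import pandas as pd

# Hypothetical locations in the same "pad, site, region, country" layout.
toy = pd.DataFrame({"Location": [
    "LC-39A, Kennedy Space Center, Florida, USA",
    "Site 132/1, Plesetsk Cosmodrome, Russia",
    "Spaceport America, New Mexico",
]})

# Take the text after the last comma and trim surrounding whitespace.
toy["Country"] = toy["Location"].str.rsplit(",", n=1).str[-1].str.strip()

# Step 1: fix values that are regions or sites rather than countries.
toy["Country"] = toy["Country"].replace({"New Mexico": "USA"})

# Step 2: switch to the longer official names.
toy["Country"] = toy["Country"].replace(
    {"USA": "United States of America", "Russia": "Russian Federation"}
)

print(toy["Country"].tolist())
```

The two steps are kept separate because a single `replace` dictionary does not chain: `"New Mexico"` mapped to `"USA"` would not then be mapped on to the longer name in the same call.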

In [189]:

# Take the text after the last comma as the country and strip surrounding whitespace.
# n is passed by keyword, as newer pandas versions require.
spacedf['Country'] = spacedf['Location'].str.rsplit(',', n=1).str[-1].str.strip()
spacedf.sample(10)



Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status,Country
536,ISRO,"First Launch Pad, Satish Dhawan Space Centre, ...","Thu Oct 16, 2014 20:02 UTC",PSLV-XL | IRNSS-1C,StatusActive,31.0,Success,India
1734,ISAS,"Mu Pad, Uchinoura Space Center, Japan","Fri Aug 30, 1991 02:30 UTC",Mu-III S2 | Yohkoh,StatusRetired,,Success,Japan
2204,RVSN USSR,"Site 132/1, Plesetsk Cosmodrome, Russia","Thu May 17, 1984 14:43 UTC",Cosmos-3M (11K65M) | Cosmos 1553,StatusRetired,,Success,Russia
2968,RVSN USSR,"Site 133/3, Plesetsk Cosmodrome, Russia","Tue Jan 28, 1975 12:05 UTC",Cosmos-2I (63SM) | Cosmos 705,StatusRetired,,Success,Russia
3538,RVSN USSR,"Site 132/2, Plesetsk Cosmodrome, Russia","Wed Aug 13, 1969 22:00 UTC",Cosmos-3M (11K65M) | Cosmos 292,StatusRetired,,Success,Russia
25,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Sat May 30, 2020 19:22 UTC",Falcon 9 Block 5 | SpaceX Demo-2,StatusActive,50.0,Success,USA
3831,RVSN USSR,"Site 162, Baikonur Cosmodrome, Kazakhstan","Wed Nov 02, 1966 00:45 UTC",Tsyklon | OGTch 2,StatusRetired,,Partial Failure,Kazakhstan
1272,Arianespace,"ELA-2, Guiana Space Centre, French Guiana, France","Tue Dec 22, 1998 01:08 UTC",Ariane 42L | Panamsat-6B,StatusRetired,,Success,France
1951,RVSN USSR,"Site 133/3, Plesetsk Cosmodrome, Russia","Tue Apr 05, 1988 14:31 UTC",Cosmos-3M (11K65M) | Cosmos 1937,StatusRetired,,Success,Russia
1491,VKS RF,"Site 32/2, Plesetsk Cosmodrome, Russia","Thu Aug 31, 1995 06:50 UTC",Tsyklon-3 | Sich 1 & FASat Alfa,StatusRetired,,Success,Russia


In [190]:
# Renaming countries
# Leading/trailing spaces prevented the replacement earlier;
# they are stripped when the Country column is created, so the keys now match
spacedf['Country'] = spacedf['Country'].replace({"Russia": "Russian Federation", 
                                                 "New Mexico": "USA",
                                                 "Yellow Sea": "China",
                                                 "Shahrud Missile Test Site": "Iran",
                                                 "Pacific Missile Range Facility":"USA",
                                                 "Barents Sea": "Russian Federation",
                                                 "Gran Canaria":"USA",
                                                 "Pacific Ocean":"USA"
                                                 })
# Second pass: switch to the names used by the iso3166 package
spacedf['Country'] = spacedf['Country'].replace({'USA': 'United States of America', 
                                                 'North Korea':'Korea, Democratic People\'s Republic of',
                                                 "Iran":'Iran, Islamic Republic of',
                                                 "South Korea":'Korea, Republic of'})
spacedf.groupby("Country",as_index=False).count()

Unnamed: 0,Country,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
0,Australia,6,6,6,6,6,0,6
1,Brazil,3,3,3,3,3,0,3
2,China,268,268,268,268,268,158,268
3,France,303,303,303,303,303,95,303
4,India,76,76,76,76,76,67,76
5,"Iran, Islamic Republic of",14,14,14,14,14,0,14
6,Israel,11,11,11,11,11,0,11
7,Japan,126,126,126,126,126,40,126
8,Kazakhstan,701,701,701,701,701,46,701
9,Kenya,9,9,9,9,9,0,9


# Use a Choropleth Map to Show the Number of Failures by Country


In [191]:
# Missing countries: Iran, North Korea
# Create a dictionary of country names to 3-letter ISO codes
country_codes = {country.name: country.alpha3 for country in countries}

# Use map to convert the country names to ISO codes and label unmatched names as "Missing"
spacedf['Iso'] = spacedf['Country'].map(country_codes).fillna("Missing")

# I am losing 5 countries for some reason
# Despite its name, prizespercountry holds the number of launches per ISO code
prizespercountry = spacedf.groupby("Iso",as_index=False).agg({'Location':pd.Series.count}).sort_values(by='Location',ascending=True)
prizespercountry


Unnamed: 0,Iso,Location
1,BRA,3
10,KOR,3
12,PRK,5
0,AUS,6
9,KEN,9
6,ISR,11
11,NZL,13
5,IRN,14
4,IND,76
7,JPN,126


In [192]:
fig= px.choropleth(prizespercountry,locations='Iso',color='Location',hover_name='Iso',
                   color_continuous_scale=px.colors.sequential.matter )
fig.show()
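
The heading of this section asks for failures, while the cells above count all launches per country. A minimal sketch of a failure count per ISO code follows, on made-up rows. Treating every status other than `Success` as a failure is an assumption, and the `toy` frame is hypothetical.

```python
import pandas as pd

# Hypothetical rows with the same column names as spacedf.
toy = pd.DataFrame({
    "Iso": ["USA", "USA", "USA", "RUS", "RUS", "KAZ"],
    "Mission_Status": ["Success", "Failure", "Partial Failure",
                       "Failure", "Success", "Success"],
})

# Assumption: anything that is not a plain "Success" counts as a failure.
failed = toy[toy["Mission_Status"] != "Success"]

# One row per ISO code with the number of failed missions.
failures_per_country = (
    failed["Iso"].value_counts().rename_axis("Iso").reset_index(name="Failures")
)

print(failures_per_country)
```

A frame shaped like `failures_per_country` could be passed to `px.choropleth` with `locations='Iso'` and `color='Failures'`, in the same way as the launch counts above. Countries without any failure do not appear in it.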


# Create a Plotly Sunburst Chart of the countries, organisations, and mission status. 

In [193]:
top_organizations = spacedf.groupby(['Country', 'Organisation', 'Mission_Status'],as_index=False).agg({'Location':pd.Series.count}).sort_values(by='Location',ascending=True)
top_organizations


Unnamed: 0,Country,Organisation,Mission_Status,Location
0,Australia,AMBA,Success,1
38,Japan,ISAS,Partial Failure,1
40,Japan,JAXA,Failure,1
46,Japan,UT,Success,1
48,Kazakhstan,ILS,Partial Failure,1
...,...,...,...,...
103,United States of America,General Dynamics,Success,203
9,China,CASC,Success,230
19,France,Arianespace,Success,267
58,Kazakhstan,RVSN USSR,Success,495


In [194]:
fig = px.sunburst(top_organizations, path=['Country', 'Organisation','Mission_Status'], 
                  values='Location',
                   hover_data=['Location'],
                   title="Which organisations are doing the heavy lifting")
fig.update_layout(coloraxis_showscale = False,xaxis_title='Number of Missions', 
                    yaxis_title='Organisation')
fig.show()

# Analyse the Total Amount of Money Spent by Organisation on Space Missions

In [195]:
# Keep only launches with a recorded price; .copy() avoids the SettingWithCopyWarning
removednoprice = spacedf.dropna(subset=['Price']).copy()

# Note: errors='coerce' turns strings that fail to parse (e.g. '1,160.0') into NaN
removednoprice['Price'] = pd.to_numeric(removednoprice['Price'], errors='coerce')



In [196]:
# Total price per organisation (USD millions); groups whose prices are all NaN sum to 0.0
priceperorg = removednoprice.groupby("Organisation",as_index=False).agg({'Price':'sum'}).sort_values(by='Price',ascending=True)
priceperorg


Unnamed: 0,Organisation,Price
16,RVSN USSR,0.0
24,Virgin Orbit,12.0
19,Sandia,15.0
3,EER,20.0
6,ExPace,28.3
4,ESA,37.0
17,Rocket Lab,97.5
9,JAXA,168.0
11,Lockheed,280.0
5,Eurockot,543.4


In [197]:
# In plotly express, names= sets the slice labels; labels= is only for renaming columns
fig = px.pie(priceperorg, values='Price', names='Organisation')
fig.update_layout(showlegend = True)
fig.show()

# Analyse the Amount of Money Spent by Organisation per Launch

In [198]:

# Price was already converted to numeric above, so it is not converted again here
# Average price per launch for each organisation (USD millions)
priceperlaunch = removednoprice.groupby("Organisation",as_index=False).agg({'Price': 'mean'}).sort_values(by='Price',ascending=True)

priceperlaunch.dropna()



Unnamed: 0,Organisation,Price
17,Rocket Lab,7.5
24,Virgin Orbit,12.0
19,Sandia,15.0
3,EER,20.0
6,ExPace,28.3
10,Kosmotras,29.0
8,ISRO,32.49
11,Lockheed,35.0
4,ESA,37.0
2,CASC,40.19


In [199]:
# Drop the last row of the sorted frame and keep the bar colours aligned with the bars
plot_data = priceperlaunch.iloc[:-1]
trace = go.Bar(x=plot_data['Organisation'], y=plot_data['Price'],
               marker=dict(color=plot_data['Price'],
                           colorscale='portland', showscale=True))

data = [trace]
layout = go.Layout(title='Average cost of a space launch per organization',
                   xaxis=dict(title='Organization'),
                   yaxis=dict(title='Average cost'))
fig = go.Figure(data=data, layout=layout)
fig.show()


# Chart the Number of Launches per Year

In [200]:
spacedf["Date"] = pd.to_datetime(spacedf["Date"],utc=True)
spacedf['Date']
spacedf['Year'] = spacedf['Date'].dt.year
spacedf.sample(10)

Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status,Country,Iso,Year
367,Arianespace,"ELV-1 (SLV), Guiana Space Centre, French Guian...",2017-03-07 01:49:00+00:00,Vega | Sentinel 2B,StatusActive,37.0,Success,France,FRA,2017
411,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA",2016-09-08 23:05:00+00:00,Atlas V 411 | OSIRIS-REx,StatusActive,115.0,Success,United States of America,USA,2016
3708,RVSN USSR,"Site 41/1, Plesetsk Cosmodrome, Russia",1967-12-16 12:00:00+00:00,Voskhod | Cosmos 195,StatusRetired,,Success,Russian Federation,RUS,1967
3735,RVSN USSR,"Site 41/1, Plesetsk Cosmodrome, Russia",1967-09-26 10:20:00+00:00,Voskhod | Cosmos 180,StatusRetired,,Success,Russian Federation,RUS,1967
1228,Arianespace,"ELA-2, Guiana Space Centre, French Guiana, France",1999-11-13 22:54:00+00:00,Ariane 44LP | GE-4,StatusRetired,,Success,France,FRA,1999
2185,RVSN USSR,"Site 32/2, Plesetsk Cosmodrome, Russia",1984-08-08 12:08:00+00:00,Tsyklon-3 | Cosmos 1589,StatusRetired,,Success,Russian Federation,RUS,1984
1839,RVSN USSR,"Site 32/2, Plesetsk Cosmodrome, Russia",1990-02-28 00:55:00+00:00,Tsyklon-3 | Okean 2,StatusRetired,,Success,Russian Federation,RUS,1990
3795,RVSN USSR,"Site 81/23, Baikonur Cosmodrome, Kazakhstan",1967-03-10 11:30:00+00:00,Proton K/Block D | Cosmos 146,StatusRetired,,Success,Kazakhstan,KAZ,1967
3137,RVSN USSR,"Site 43/4, Plesetsk Cosmodrome, Russia",1973-06-06 11:30:00+00:00,Voskhod | Cosmos 563,StatusRetired,,Success,Russian Federation,RUS,1973
3755,RVSN USSR,"Site 31/6, Baikonur Cosmodrome, Kazakhstan",1967-07-04 05:59:00+00:00,Voskhod | Cosmos 168,StatusRetired,,Success,Kazakhstan,KAZ,1967


In [201]:
launchesperyear = spacedf.groupby("Year",as_index=False).agg({'Rocket_Status': pd.Series.count})
launchesperyear


Unnamed: 0,Year,Rocket_Status
0,1957,3
1,1958,28
2,1959,20
3,1960,39
4,1961,52
...,...,...
59,2016,90
60,2017,92
61,2018,117
62,2019,109


In [202]:
fig = px.line(launchesperyear[:-1], 
              x='Year', 
              y='Rocket_Status', 
              labels={'Year':'Year',
                      'Rocket_Status':'Number of Launches'},
              title='Number of Launches per Year', 
              template='plotly_dark')
fig.write_image("LaunchesPerYear.png", format='png')
fig.show()


# Chart the Number of Launches Month-on-Month until the Present

Which month has seen the highest number of launches of all time? Superimpose a rolling average on the month-on-month time series chart. 

In [203]:
# Drop the timezone before converting to monthly periods (avoids the pandas warning)
spacedf['Month'] = spacedf['Date'].dt.tz_localize(None).dt.to_period("M")
launchpermonth = spacedf.groupby(["Month"],as_index=False).agg({'Rocket_Status': pd.Series.count}).astype(str)

launchpermonth['Rocket_Status'] = pd.to_numeric(launchpermonth['Rocket_Status'])
#launchpermonth.describe()
#launchpermonth['Rocket_Status'].idxmax()
#launchpermonth.loc[164]
# Busiest month: 18 launches in 1971-12



In [204]:
fig = px.line(launchpermonth, x='Month', y='Rocket_Status', 
              labels={'Month':'Month','Rocket_Status':'Number of Launches'},
              title='Number of Launches per Month')
fig.show()
#fig.write_image("LaunchesPerYear.png", format='png')



In [205]:

launchpermonth['rolling_average'] = launchpermonth['Rocket_Status'].rolling(window=6).mean()
fig = px.line(launchpermonth, 
              x='Month', 
              y='Rocket_Status', 
              labels={'Month':'Month','Rocket_Status':'Number of Launches'},
              title='Number of Launches per Month')
rolling_average_trace = go.Scatter(x=launchpermonth['Month'], 
                                   y=launchpermonth['rolling_average'],
                                   mode='lines',
                                   line=dict(color='black', width=2))
fig.add_trace(rolling_average_trace)
fig.show()

# Launches per Month: Which months are most popular and least popular for launches?

Some months have better weather than others. Which time of year seems to be best for space missions?

In [206]:
spacedf['justmonth'] = spacedf['Date'].dt.month
monthpopularity = spacedf.groupby(["justmonth"],as_index=False).agg({'Rocket_Status': pd.Series.count}).sort_values(by='Rocket_Status', ascending=True)
monthpopularity


Unnamed: 0,justmonth,Rocket_Status
0,1,268
4,5,326
10,11,335
1,2,336
6,7,351
2,3,353
8,9,365
7,8,373
9,10,381
3,4,383


In [207]:
# In plotly express, names= sets the slice labels; labels= is only for renaming columns
fig = px.pie(monthpopularity, values='Rocket_Status', names='justmonth')
fig.update_layout(showlegend = True)
fig.show()

# How has the Launch Price varied Over Time? 

Create a line chart that shows the average price of rocket launches over time. 

In [208]:
# Build the yearly average price; the next cell needs removednoprice2 to exist
removednoprice2 = spacedf.dropna(subset=['Price']).copy()
# Note: errors='coerce' turns strings that fail to parse (e.g. '1,160.0') into NaN
removednoprice2['Price'] = pd.to_numeric(removednoprice2['Price'], errors='coerce')

removednoprice2 = removednoprice2.groupby(["Year"],as_index=False).agg({'Price': 'mean'})
removednoprice2

In [209]:
fig = px.line(removednoprice2, 
              x='Year', 
              y='Price', 
              labels={'Year':'Year',
                      'Price':'Average cost of a launch'},
              title='Average cost of a launch per year', 
              template='plotly_dark')
fig.show()


# Chart the Number of Launches over Time by the Top 10 Organisations. 

How has the dominance of launches changed over time between the different players? 

In [238]:
#top10orgs = spacedf.groupby(['Organisation','Year'],as_index=False).agg({'Rocket_Status': pd.Series.count})
org_year_launches = spacedf.groupby(['Organisation','Year'],as_index=False).count()

org_launches = spacedf.groupby('Organisation', as_index=False)['Rocket_Status'].count()

# Select the top 10 organizations by number of launches
top_orgs = org_launches.nlargest(10, 'Rocket_Status')['Organisation']
top_org_launches = org_year_launches[org_year_launches['Organisation'].isin(top_orgs)]


In [239]:
fig = px.line(top_org_launches, 
              x='Year', 
              y='Rocket_Status', 
              color='Organisation', 
              labels={'Year':'Year', 'Rocket_Status':'Number of Launches'}, 
              title='Number of Launches per Year per Top Organisation')
fig.show()

# Cold War Space Race: USA vs USSR

The Cold War lasted from the start of the dataset up until 1991. 

In [249]:
target_countries = ["Russian Federation", "United States of America"]

cold_war = spacedf.groupby(['Country','Year'],as_index=False).count()
# This only filters for countries
#cold_war = cold_war[cold_war['Country'].isin(target_countries)]

# Both of these work; each filters on year and country
#cold_war = cold_war.query('Year<=1991 and Country in @target_countries')
cold_war = cold_war[(cold_war['Year'] <= 1991) & cold_war['Country'].isin(target_countries)]

cold_war

Unnamed: 0,Country,Year,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status,Iso,Month,justmonth
288,Russian Federation,1961,2,2,2,2,2,0,2,2,2,2
289,Russian Federation,1962,8,8,8,8,8,0,8,8,8,8
290,Russian Federation,1963,8,8,8,8,8,0,8,8,8,8
291,Russian Federation,1964,8,8,8,8,8,0,8,8,8,8
292,Russian Federation,1965,10,10,10,10,10,0,10,10,10,10
...,...,...,...,...,...,...,...,...,...,...,...,...
378,United States of America,1987,6,6,6,6,6,0,6,6,6,6
379,United States of America,1988,7,7,7,7,7,3,7,7,7,7
380,United States of America,1989,16,16,16,16,16,6,16,16,16,16
381,United States of America,1990,26,26,26,26,26,10,26,26,26,26


In [250]:
fig = px.line(cold_war, 
              x='Year', 
              y='Rocket_Status', 
              color='Country', 
              labels={'Year':'Year', 'Rocket_Status':'Number of Launches'}, 
              title='How close was it? Comparing Cold War launches')
fig.show()

## Create a Plotly Pie Chart comparing the total number of launches of the USSR and the USA

Hint: Remember to include former Soviet Republics like Kazakhstan when analysing the total number of launches. 

In [None]:
#Continue here
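
One way to follow the hint is to map each launch country to a bloc before counting. The sketch below uses made-up rows; attributing launches from Russia and Kazakhstan up to 1991 to the USSR is an assumption, and `toy` and `bloc_map` are hypothetical names.

```python
import pandas as pd

# Hypothetical rows with the same column names as spacedf.
toy = pd.DataFrame({
    "Country": ["Russian Federation", "Kazakhstan", "Kazakhstan",
                "United States of America", "France"],
    "Year": [1965, 1970, 1995, 1969, 1980],
})

# Assumption: launches from Russia and Kazakhstan count towards the USSR.
bloc_map = {
    "Russian Federation": "USSR",
    "Kazakhstan": "USSR",
    "United States of America": "USA",
}

cold_war_toy = toy[toy["Year"] <= 1991].copy()
cold_war_toy["Bloc"] = cold_war_toy["Country"].map(bloc_map)  # other countries become NaN
totals = cold_war_toy.dropna(subset=["Bloc"])["Bloc"].value_counts()

print(totals)
```

The resulting series could feed a pie chart, for example `px.pie(values=totals.values, names=totals.index)`.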

## Create a Chart that Shows the Total Number of Launches Year-On-Year by the Two Superpowers

## Chart the Total Number of Mission Failures Year on Year.

## Chart the Percentage of Failures over Time

Did failures go up or down over time? Did the countries get better at minimising risk and improving their chances of success over time? 
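
Both the yearly failure count and the failure percentage can be derived from a single boolean column. This is a minimal sketch on made-up rows; counting every status other than `Success` as a failure is an assumption, and `toy` is hypothetical.

```python
import pandas as pd

# Hypothetical rows with the same column names as spacedf.
toy = pd.DataFrame({
    "Year": [1960, 1960, 1960, 1960, 2000, 2000],
    "Mission_Status": ["Failure", "Success", "Failure", "Success",
                       "Success", "Success"],
})

# Assumption: any status other than "Success" counts as a failure.
toy["Failed"] = toy["Mission_Status"] != "Success"

# Summing booleans counts the True values; their mean is the share of True values.
failures_per_year = toy.groupby("Year")["Failed"].sum()
failure_pct = toy.groupby("Year")["Failed"].mean() * 100

print(failures_per_year)
print(failure_pct)
```

Years with very few launches give noisy percentages, so a rolling mean over several years may be easier to read on a chart.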

# For Every Year Show which Country was in the Lead in terms of Total Number of Launches (up to and including 2020)

Do the results change if we only look at the number of successful launches? 
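
A possible approach is to count launches per year and country, pivot the countries into columns, and take the column with the largest value in each row. The sketch below uses made-up rows (`toy` is hypothetical) and reads the question as launches within each year rather than a running total.

```python
import pandas as pd

# Hypothetical rows with the same column names as spacedf.
toy = pd.DataFrame({
    "Year": [1957, 1957, 1957, 1958, 1958, 1958],
    "Country": ["Kazakhstan", "Kazakhstan", "United States of America",
                "United States of America", "United States of America",
                "Kazakhstan"],
})

# Count launches per (Year, Country), with one column per country.
counts = toy.groupby(["Year", "Country"]).size().unstack(fill_value=0)

# idxmax along the columns returns the country with the most launches each year.
leader = counts.idxmax(axis=1)

print(leader)
```

Ties resolve to the first column in alphabetical order. Filtering the rows to `Mission_Status == 'Success'` before the `groupby` would answer the follow-up question, and `counts.cumsum()` would give a cumulative reading instead.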

# Create a Year-on-Year Chart Showing the Organisation Doing the Most Number of Launches

Which organisation was dominant in the 1970s and 1980s? Which organisation was dominant in 2018, 2019 and 2020? 
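
For the decade-level part of the question, the years can be bucketed into decades before picking the organisation with the most launches in each bucket. The rows below are invented purely to show the mechanics and do not reflect the real dataset; `toy` is hypothetical.

```python
import pandas as pd

# Invented rows; the year/organisation pairs are not taken from the dataset.
toy = pd.DataFrame({
    "Year": [1971, 1975, 1978, 1983, 1985, 2019],
    "Organisation": ["RVSN USSR", "RVSN USSR", "NASA", "NASA", "NASA", "CASC"],
})

# Integer division maps e.g. 1975 to 1970.
toy["Decade"] = (toy["Year"] // 10) * 10

launches = (
    toy.groupby(["Decade", "Organisation"])
       .size()
       .reset_index(name="Launches")
)

# Keep the row with the most launches within each decade.
top_per_decade = launches.loc[launches.groupby("Decade")["Launches"].idxmax()]

print(top_per_decade)
```

Grouping by `Year` instead of `Decade` gives the year-on-year version, which can then be drawn as a bar chart coloured by organisation.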