<a href="https://colab.research.google.com/github/Emanuel3queijos/SpaceLaunchAnalyzer/blob/master/Space_Missions_Analysis_(start).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SpaceLaunchAnalyzer

### Technologies Used
1. **Python**
2. **Pandas**
3. **Plotly**
4. **Seaborn**
5. **Matplotlib**
6. **iso3166**

### Features
1. **Data Loading and Cleaning**: Imports the space launch data and performs an initial cleaning pass to remove missing values and duplicates.
2. **Descriptive Analysis**:
   - Launch counts per organisation.
   - Rocket status analysis (active vs. retired).
   - Distribution of mission status (success vs. failure).
   - Launch price analysis.
3. **Visualizations**:
   - Bar charts and histograms of launches, prices, and status.
   - Choropleth map showing the number of launches per country.
   - Sunburst chart showing the relationship between countries, organisations, and mission status.
4. **Spending Analysis**: Calculates and visualizes the total spent per organisation and the average spend per launch.
5. **Time-Series Analysis**: Counts launches per year and per month to identify trends over time.

### Project Goal
The goal of the project is a broad analysis of space launch data that clarifies the dynamics of the space sector: the number of launches per organisation, the status of the rockets, the geographic distribution of launches, and the associated spending. Interactive visualizations make the data easier to read and interpret and help surface insights about the space industry.

<center><img src="https://i.imgur.com/9hLRsjZ.jpg" height=400></center>




In [67]:
%pip install iso3166
%pip install plotly




In [69]:
import numpy as np
import pandas as pd
import plotly.express as px
import matplotlib.pyplot as plt
import seaborn as sns

from iso3166 import countries
from datetime import datetime, timedelta

### Load the Data

In [70]:
df_data = pd.read_csv('/content/sample_data/mission_launches.csv')
df_data.head()

,Unnamed: 0.1,Unnamed: 0,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
0,0,0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,1,1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,2,2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,3,3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,4,4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success


# Preliminary Data Exploration

* What is the shape of `df_data`?
* How many rows and columns does it have?
* What are the column names?
* Are there any NaN values or duplicates?

In [12]:
print(f"Shape: {df_data.shape}")


Shape: (4324, 9)


In [10]:
rows = df_data.shape[0]
columns = df_data.shape[1]
print(f"There are {rows} rows and {columns} columns.")

There are 4324 rows and 9 columns.


In [14]:
column_names = df_data.columns
print(f"Column names: {column_names}")

Column names: Index(['Unnamed: 0.1', 'Unnamed: 0', 'Organisation', 'Location', 'Date',
       'Detail', 'Rocket_Status', 'Price', 'Mission_Status'],
      dtype='object')


## Data Cleaning - Check for Missing Values and Duplicates

Consider removing columns containing junk data.

In [17]:
nan_values = df_data.isna().sum()
print(nan_values)

Unnamed: 0.1         0
Unnamed: 0           0
Organisation         0
Location             0
Date                 0
Detail               0
Rocket_Status        0
Price             3360
Mission_Status       0
dtype: int64
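
The counts above are easier to judge as a share of all rows. The following is a minimal sketch, not part of the original notebook: `sample` is a hypothetical four-row stand-in built from the first rows shown earlier, and the last lines redo the arithmetic for the printed `Price` count (3360 missing out of 4324 rows).

```python
import pandas as pd

# Hypothetical miniature frame standing in for df_data (first four rows shown earlier).
sample = pd.DataFrame({
    "Organisation": ["SpaceX", "CASC", "SpaceX", "Roscosmos"],
    "Price": [50.0, 29.75, None, 65.0],
})

# isna().mean() is the fraction of missing entries per column; scale to percent.
missing_share = sample.isna().mean().mul(100).round(1)
print(missing_share)

# Same arithmetic on the counts printed above: 3360 of 4324 prices are missing.
price_missing_pct = round(3360 / 4324 * 100, 1)
print(f"Price missing: {price_missing_pct}%")
```

Roughly three quarters of the launches have no price, so any spending figure later in the notebook describes only the priced subset.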


In [18]:
duplicates = df_data.duplicated().sum()
print(f"Number of duplicate rows: {duplicates}")


Number of duplicate rows: 0


In [20]:
df_data_cleaned = df_data.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])
df_data_cleaned.head()

,Organisation,Location,Date,Detail,Rocket_Status,Price,Mission_Status
0,SpaceX,"LC-39A, Kennedy Space Center, Florida, USA","Fri Aug 07, 2020 05:12 UTC",Falcon 9 Block 5 | Starlink V1 L9 & BlackSky,StatusActive,50.0,Success
1,CASC,"Site 9401 (SLS-2), Jiuquan Satellite Launch Ce...","Thu Aug 06, 2020 04:01 UTC",Long March 2D | Gaofen-9 04 & Q-SAT,StatusActive,29.75,Success
2,SpaceX,"Pad A, Boca Chica, Texas, USA","Tue Aug 04, 2020 23:57 UTC",Starship Prototype | 150 Meter Hop,StatusActive,,Success
3,Roscosmos,"Site 200/39, Baikonur Cosmodrome, Kazakhstan","Thu Jul 30, 2020 21:25 UTC",Proton-M/Briz-M | Ekspress-80 & Ekspress-103,StatusActive,65.0,Success
4,ULA,"SLC-41, Cape Canaveral AFS, Florida, USA","Thu Jul 30, 2020 11:50 UTC",Atlas V 541 | Perseverance,StatusActive,145.0,Success
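
The two dropped columns are leftover index columns, which pandas labels `Unnamed: ...`. A minimal sketch of removing them by pattern instead of by exact name is shown below. This is an alternative under the assumption that every junk column starts with `Unnamed`; `sample` is a hypothetical stand-in for `df_data`.

```python
import pandas as pd

# Hypothetical stand-in for df_data with two leftover index columns.
sample = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "Organisation": ["SpaceX", "CASC"],
    "Price": [50.0, 29.75],
})

# Boolean mask over the column labels: True where the name starts with "Unnamed".
junk_mask = sample.columns.str.startswith("Unnamed")

# Keep only the columns that are not flagged as junk.
cleaned = sample.loc[:, ~junk_mask]
print(list(cleaned.columns))
```

Matching on the prefix keeps the cell working if the file is re-exported and gains another `Unnamed` column.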


## Descriptive Statistics

# Number of Launches per Company


In [23]:
launches_by_org = df_data_cleaned['Organisation'].value_counts()

# Plot using Plotly
import plotly.express as px
fig = px.bar(launches_by_org, x=launches_by_org.index, y=launches_by_org.values,
             labels={'x': 'Organisation', 'y': 'Number of Launches'},
             title='Number of Launches per Organisation')
fig.show()

# Number of Active versus Retired Rockets



In [24]:
rocket_status_count = df_data_cleaned['Rocket_Status'].value_counts()
fig = px.pie(rocket_status_count, values=rocket_status_count.values, names=rocket_status_count.index,
             title="Number of Active vs Retired Rockets")
fig.show()


# Distribution of Mission Status


In [25]:
mission_status_count = df_data_cleaned['Mission_Status'].value_counts()
fig = px.bar(mission_status_count, x=mission_status_count.index, y=mission_status_count.values,
             labels={'x': 'Mission Status', 'y': 'Number of Missions'},
             title='Distribution of Mission Status')
fig.show()


# How Expensive are the Launches?


In [26]:
df_data_cleaned['Price'] = pd.to_numeric(df_data_cleaned['Price'], errors='coerce')
fig = px.histogram(df_data_cleaned, x='Price', nbins=50, title='Distribution of Launch Prices')
fig.show()


# Using a Choropleth Map to Show the Number of Launches by Country






In [29]:
import pandas as pd
import plotly.express as px
from iso3166 import countries
def wrangle_country(location):
    if 'Russia' in location or 'Barents Sea' in location:
        return 'Russian Federation'
    elif 'New Mexico' in location or 'Pacific Missile Range' in location or 'Gran Canaria' in location:
        return 'USA'
    elif 'Yellow Sea' in location:
        return 'China'
    elif 'Shahrud Missile Test Site' in location:
        return 'Iran'
    else:
        return location.split(',')[-1].strip()

df_data['Country'] = df_data['Location'].apply(wrangle_country)
def get_alpha3(country_name):
    try:
        return countries.get(country_name).alpha3
    except KeyError:
        return None

df_data['Country_Code'] = df_data['Country'].apply(get_alpha3)
launches_by_country = df_data.groupby('Country_Code').size().reset_index(name='Launch_Count')
fig = px.choropleth(launches_by_country, locations='Country_Code',
                    color='Launch_Count',
                    hover_name='Country_Code',
                    color_continuous_scale=px.colors.sequential.matter,
                    title="Number of Space Launches by Country")

fig.show()
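
The location-to-country step can also be written as a lookup table, which makes the special cases easier to audit. This is a sketch of the same rule as `wrangle_country`, not a replacement for it. The names `SITE_ALIASES` and `country_from_location` are hypothetical, and the Plesetsk string in the example is an illustrative location, not a row copied from the dataset.

```python
# Hypothetical alias table: keywords found in a launch site mapped to the
# country name used for the ISO lookup (same special cases as wrangle_country).
SITE_ALIASES = {
    "Russia": "Russian Federation",
    "Barents Sea": "Russian Federation",
    "New Mexico": "USA",
    "Pacific Missile Range": "USA",
    "Gran Canaria": "USA",
    "Yellow Sea": "China",
    "Shahrud Missile Test Site": "Iran",
}


def country_from_location(location: str) -> str:
    """Return the alias country if a keyword matches, else the text after the last comma."""
    for keyword, country in SITE_ALIASES.items():
        if keyword in location:
            return country
    return location.split(",")[-1].strip()


print(country_from_location("LC-39A, Kennedy Space Center, Florida, USA"))
print(country_from_location("Site 200/39, Baikonur Cosmodrome, Kazakhstan"))
print(country_from_location("Site 43/4, Plesetsk Cosmodrome, Russia"))  # illustrative string
```

Because `get_alpha3` returns `None` for names it cannot resolve, it is worth listing those names before plotting, for example with `df_data.loc[df_data['Country_Code'].isna(), 'Country'].unique()`, since such rows silently drop out of the `groupby('Country_Code')`.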


# Using a Choropleth Map to Show the Number of Failures by Country




In [34]:
df_data['Country'] = df_data['Location'].apply(wrangle_country)
failed_missions = df_data[df_data['Mission_Status'] == 'Failure'].copy()
failed_missions.loc[:, 'Country_Code'] = failed_missions['Country'].apply(get_alpha3)
failures_by_country = failed_missions.groupby('Country_Code').size().reset_index(name='Failure_Count')
fig = px.choropleth(failures_by_country, locations='Country_Code',
                    color='Failure_Count',
                    hover_name='Country_Code',
                    color_continuous_scale=px.colors.sequential.matter,
                    title="Number of Failed Space Launches by Country")

fig.show()


# Plotly Sunburst Chart of the Countries, Organisations, and Mission Status

In [35]:
import plotly.express as px

sunburst_data = df_data.groupby(['Country', 'Organisation', 'Mission_Status']).size().reset_index(name='Count')

fig = px.sunburst(sunburst_data,
                  path=['Country', 'Organisation', 'Mission_Status'],
                  values='Count',
                  title="Sunburst Chart of Countries, Organisations, and Mission Status")
fig.show()


# Analyse the Total Amount of Money Spent by Organisation on Space Missions

In [46]:
import pandas as pd
import plotly.express as px
df_data['Price'] = pd.to_numeric(df_data['Price'], errors='coerce')
total_spent = df_data.groupby('Organisation')['Price'].sum().reset_index()
total_spent = total_spent.sort_values(by='Price', ascending=False)
fig = px.bar(total_spent, x='Organisation', y='Price',
             title="Total Amount of Money Spent by Organisation on Space Missions",
             labels={'Price': 'Total Amount Spent', 'Organisation': 'Organization'},
             color='Price', color_continuous_scale='Viridis')

fig.show()
print(total_spent)


        Organisation     Price
35              NASA 61,200.00
3        Arianespace 16,345.00
48               ULA 14,798.00
7               CASC  6,340.26
46            SpaceX  5,444.00
36          Northrop  3,930.00
32               MHI  3,532.50
23              ISRO  2,177.00
49      US Air Force  1,550.92
52            VKS RF  1,548.90
19               ILS  1,320.00
6             Boeing  1,241.00
42         Roscosmos  1,187.50
34   Martin Marietta    721.40
28         Kosmotras    638.00
14          Eurockot    543.40
31          Lockheed    280.00
24              JAXA    168.00
41        Rocket Lab     97.50
13               ESA     37.00
15            ExPace     28.30
12               EER     20.00
44            Sandia     15.00
53      Virgin Orbit     12.00
43               SRC      0.00
50           US Navy      0.00
47           Starsem      0.00
40         RVSN USSR      0.00
39               RAE      0.00
38          OneSpace      0.00
37           OKB-586      0.00
51      

# Analyse the Amount of Money Spent by Organisation per Launch

In [47]:
import pandas as pd
import plotly.express as px
df_data['Price'] = pd.to_numeric(df_data['Price'], errors='coerce')
launches_count = df_data.groupby('Organisation')['Price'].count().reset_index(name='Launches')
total_spent = df_data.groupby('Organisation')['Price'].sum().reset_index()
merged_data = pd.merge(total_spent, launches_count, on='Organisation')
merged_data['Spent_per_Launch'] = merged_data['Price'] / merged_data['Launches']
merged_data = merged_data.sort_values(by='Spent_per_Launch', ascending=False)
fig = px.bar(merged_data, x='Organisation', y='Spent_per_Launch',
             title="Amount of Money Spent by Organisation per Launch",
             labels={'Spent_per_Launch': 'Amount Spent per Launch', 'Organisation': 'Organization'},
             color='Spent_per_Launch', color_continuous_scale='Viridis')

fig.show()
print(merged_data)


        Organisation     Price  Launches  Spent_per_Launch
35              NASA 61,200.00       136            450.00
6             Boeing  1,241.00         7            177.29
3        Arianespace 16,345.00        96            170.26
48               ULA 14,798.00        98            151.00
19               ILS  1,320.00        13            101.54
32               MHI  3,532.50        37             95.47
34   Martin Marietta    721.40         9             80.16
49      US Air Force  1,550.92        26             59.65
24              JAXA    168.00         3             56.00
46            SpaceX  5,444.00        99             54.99
42         Roscosmos  1,187.50        23             51.63
36          Northrop  3,930.00        83             47.35
52            VKS RF  1,548.90        33             46.94
14          Eurockot    543.40        13             41.80
7               CASC  6,340.26       158             40.13
13               ESA     37.00         1             37.
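
Note that `count()` on `Price` counts only launches with a recorded price, so the `Launches` column above is the number of priced launches and not every launch by that organisation. The sketch below shows both counts side by side using named aggregation; `sample` and its prices are hypothetical illustration values.

```python
import pandas as pd

# Hypothetical stand-in: three SpaceX launches, one of them without a price.
sample = pd.DataFrame({
    "Organisation": ["SpaceX", "SpaceX", "SpaceX", "ULA"],
    "Price": [50.0, 50.0, None, 145.0],
})

summary = (
    sample.groupby("Organisation")
    .agg(
        Total_Spent=("Price", "sum"),         # NaN prices are skipped by sum
        Priced_Launches=("Price", "count"),   # non-null prices only
        All_Launches=("Price", "size"),       # every row, priced or not
    )
    .reset_index()
)

# Average over priced launches, matching the Spent_per_Launch logic above.
summary["Spent_per_Launch"] = summary["Total_Spent"] / summary["Priced_Launches"]
print(summary)
```

Dividing by `Priced_Launches` avoids diluting the average with unpriced launches. Organisations with no recorded price at all end up with `0 / 0`, which pandas reports as `NaN`.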

# Chart the Number of Launches per Year

In [53]:
import pandas as pd
import plotly.express as px
# The raw strings look like 'Fri Aug 07, 2020 05:12 UTC'. The date-only format
# '%a %b %d, %Y' does not cover the trailing time, so parsing the full string
# coerces every row to NaT. Extract the leading date portion first.
if not pd.api.types.is_datetime64_any_dtype(df_data['Date']):
    date_part = df_data['Date'].astype(str).str.extract(r'^(\w{3} \w{3} \d{2}, \d{4})')[0]
    df_data['Date'] = pd.to_datetime(date_part, format='%a %b %d, %Y', errors='coerce')
missing_dates = df_data[df_data['Date'].isna()]
if not missing_dates.empty:
    print("Missing dates after conversion:")
    print(missing_dates)
df_data['Year'] = df_data['Date'].dt.year
yearly_launches = df_data.groupby('Year').size().reset_index(name='Number of Launches')
fig = px.bar(yearly_launches, x='Year', y='Number of Launches',
              title='Number of Launches per Year',
              labels={'Number of Launches': 'Total Launches', 'Year': 'Year'},
              color='Number of Launches',
              color_continuous_scale=px.colors.sequential.Plasma)
fig.show()


# Chart the Number of Launches


In [54]:
import pandas as pd
import plotly.express as px
df_data['Date'] = pd.to_datetime(df_data['Date'], format='%a %b %d, %Y', errors='coerce')
df_data['Year'] = df_data['Date'].dt.year
yearly_launches = df_data.groupby('Year').size().reset_index(name='Number of Launches')
fig_yearly = px.bar(yearly_launches, x='Year', y='Number of Launches',
                    title='Number of Launches per Year',
                    labels={'Number of Launches': 'Total Launches', 'Year': 'Year'},
                    color='Number of Launches',
                    color_continuous_scale=px.colors.sequential.Plasma)

fig_yearly.show()


# Launches per Month: Which months are most popular and least popular for launches?

Some months have better weather than others. Which time of year seems to be best for space missions?

In [55]:
import pandas as pd
import plotly.express as px
df_data['Date'] = pd.to_datetime(df_data['Date'], format='%a %b %d, %Y', errors='coerce')
df_data['Month'] = df_data['Date'].dt.month
df_data['Year'] = df_data['Date'].dt.year
monthly_launches = df_data.groupby('Month').size().reset_index(name='Number of Launches')
fig_monthly = px.bar(monthly_launches, x='Month', y='Number of Launches',
                      title='Number of Launches per Month',
                      labels={'Number of Launches': 'Total Launches', 'Month': 'Month'},
                      color='Number of Launches',
                      color_continuous_scale=px.colors.sequential.Viridis)
# Fixed positions 1-12 keep each label aligned even if a month has no launches.
fig_monthly.update_xaxes(tickvals=list(range(1, 13)),
                          ticktext=['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
fig_monthly.show()
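
Tick labels only line up when `tickvals` and `ticktext` correspond position by position. As a small sketch, the standard-library `calendar` module can derive the abbreviations from the month numbers themselves, so the two lists cannot drift apart. `month_numbers` is a hypothetical subset of months, and the English names assume the default locale.

```python
import calendar

# calendar.month_abbr[0] is an empty string, so month numbers 1-12 index it directly.
month_numbers = [1, 2, 12]  # hypothetical subset of months present in the data
month_labels = [calendar.month_abbr[m] for m in month_numbers]
print(month_labels)
```

Passing `month_numbers` as `tickvals` and `month_labels` as `ticktext` keeps every bar under its own month name.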


# How has the Launch Price varied Over Time?


In [56]:
import pandas as pd
import plotly.express as px
df_data['Date'] = pd.to_datetime(df_data['Date'], format='%a %b %d, %Y', errors='coerce')
df_data['Price'] = pd.to_numeric(df_data['Price'], errors='coerce')
df_data['Year'] = df_data['Date'].dt.year
average_price_per_year = df_data.groupby('Year')['Price'].mean().reset_index()
fig_price = px.line(average_price_per_year, x='Year', y='Price',
                    title='Average Launch Price Over Time',
                    labels={'Price': 'Average Price (USD millions)', 'Year': 'Year'},
                    markers=True)

fig_price.show()


# Chart the Number of Launches over Time by the Top 10 Organisations.

How has the dominance of launches changed over time between the different players?

In [57]:
import pandas as pd
import plotly.express as px
df_data['Date'] = pd.to_datetime(df_data['Date'], format='%a %b %d, %Y', errors='coerce')
df_data['Year'] = df_data['Date'].dt.year
launches_per_organisation = df_data.groupby(['Year', 'Organisation']).size().reset_index(name='Launches')
top_10_organisations = launches_per_organisation.groupby('Organisation')['Launches'].sum().nlargest(10).index
top_launches_per_year = launches_per_organisation[launches_per_organisation['Organisation'].isin(top_10_organisations)]
fig_launches = px.line(top_launches_per_year, x='Year', y='Launches', color='Organisation',
                        title='Number of Launches Over Time by Top 10 Organisations',
                        labels={'Launches': 'Number of Launches', 'Year': 'Year'},
                        markers=True)
fig_launches.show()


# Cold War Space Race: USA vs USSR

The Cold War lasted from the start of the dataset up until 1991.

In [71]:
import pandas as pd
import plotly.express as px
df_data['Date'] = pd.to_datetime(df_data['Date'], format='%a %b %d, %Y', errors='coerce')
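cold_war_data = df_data[df_data['Date'] < '1991-01-01'].copy()
# Derive the country from the launch site, not from the organisation name,
# then fold Russia and Kazakhstan (Baikonur) into the USSR.
cold_war_data['Country'] = cold_war_data['Location'].apply(wrangle_country).replace({
    'Russian Federation': 'USSR',
    'Kazakhstan': 'USSR',
})
# Keep only the two superpowers for the comparison.
cold_war_data = cold_war_data[cold_war_data['Country'].isin(['USA', 'USSR'])]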
launches_by_country = cold_war_data['Country'].value_counts().reset_index()
launches_by_country.columns = ['Country', 'Launches']
fig_cold_war = px.bar(launches_by_country, x='Country', y='Launches',
                       title='Total Number of Launches During the Cold War (USA vs USSR)',
                       labels={'Launches': 'Number of Launches', 'Country': 'Country'},
                       color='Country')
fig_cold_war.show()
cold_war_data.loc[:, 'Year'] = cold_war_data['Date'].dt.year
launches_by_year_country = cold_war_data.groupby(['Year', 'Country']).size().reset_index(name='Launches')
fig_cold_war_time = px.line(launches_by_year_country, x='Year', y='Launches', color='Country',
                             title='Number of Launches Over Time During the Cold War (USA vs USSR)',
                             labels={'Launches': 'Number of Launches', 'Year': 'Year'},
                             markers=True)
fig_cold_war_time.show()


## Create a Plotly Pie Chart comparing the total number of launches of the USSR and the USA

Hint: Remember to include former Soviet Republics like Kazakhstan when analysing the total number of launches.

In [63]:
import pandas as pd
import plotly.express as px

# Hand-made illustrative records, not rows from the mission dataset.
# A separate name (df_demo) keeps the real df_data intact.
data = {
    'Date': ['Thu Aug 29, 2019', 'Fri Jul 24, 2020', 'Sat Jun 25, 2021', 'Sun May 30, 2021'],
    'Organisation': ['USA', 'Soviet Union', 'Kazakhstan', 'Russian Federation'],
    'Price': [100, 200, 300, 400]
}
df_demo = pd.DataFrame(data)
df_demo['Date'] = pd.to_datetime(df_demo['Date'], format='%a %b %d, %Y', errors='coerce')

# Former Soviet republics are counted under the USSR.
country_map = {
    'USA': 'USA',
    'Soviet Union': 'USSR',
    'Russian Federation': 'USSR',
    'Kazakhstan': 'USSR',
}

df_demo['Country'] = df_demo['Organisation'].replace(country_map)
total_launches = df_demo['Country'].value_counts().reset_index()
total_launches.columns = ['Country', 'Total Launches']
print("Total Launches by Country:")
print(total_launches)
launch_comparison = total_launches[total_launches['Country'].isin(['USA', 'USSR'])]
print("\nLaunch Comparison:")
print(launch_comparison)
if launch_comparison.empty:
    print("No data available for the selected countries.")
else:
    fig_pie = px.pie(launch_comparison,
                     values='Total Launches',
                     names='Country',
                     title='Total Number of Launches: USA vs USSR (Including Kazakhstan)',
                     color='Country',
                     color_discrete_sequence=px.colors.qualitative.Set2)

    fig_pie.show()


Total Launches by Country:
  Country  Total Launches
0    USSR               3
1     USA               1

Launch Comparison:
  Country  Total Launches
0    USSR               3
1     USA               1



## Create a Chart that Shows the Total Number of Launches Year-On-Year by the Two Superpowers

In [64]:
import pandas as pd
import plotly.express as px
# Hand-made illustrative records, not rows from the mission dataset.
# A separate name (df_demo) keeps the real df_data intact.
data = {
    'Date': ['Thu Aug 29, 2019', 'Fri Jul 24, 2020', 'Sat Jun 25, 2021', 'Sun May 30, 2021',
             'Mon Jan 01, 1960', 'Tue Feb 01, 1961', 'Wed Mar 01, 1962', 'Thu Apr 01, 1963'],
    'Organisation': ['USA', 'Soviet Union', 'Kazakhstan', 'Russian Federation',
                    'USA', 'Soviet Union', 'Kazakhstan', 'Russian Federation'],
    'Price': [100, 200, 300, 400, 150, 250, 350, 450]
}
df_demo = pd.DataFrame(data)

df_demo['Date'] = pd.to_datetime(df_demo['Date'], format='%a %b %d, %Y', errors='coerce')

# Former Soviet republics are counted under the USSR.
country_map = {
    'USA': 'USA',
    'Soviet Union': 'USSR',
    'Russian Federation': 'USSR',
    'Kazakhstan': 'USSR',
}

df_demo['Country'] = df_demo['Organisation'].replace(country_map)

df_demo['Year'] = df_demo['Date'].dt.year
launch_counts = df_demo.groupby(['Year', 'Country']).size().reset_index(name='Total Launches')


launch_comparison = launch_counts[launch_counts['Country'].isin(['USA', 'USSR'])]


fig_line = px.line(launch_comparison,
                   x='Year',
                   y='Total Launches',
                   color='Country',
                   title='Total Number of Launches Year-on-Year: USA vs USSR (Including Kazakhstan)',
                   labels={'Total Launches': 'Number of Launches'},
                   markers=True)
fig_line.show()
