# Data Source - List of Rocket Launch Sites (Wiki)

## Dependencies and Preprocessing (CSV Module)

In this section the dataset is preprocessed with user-defined Python functions built on the csv (comma-separated values) module to parse the data, the NumPy library to structure the data as arrays, and matplotlib to create graphs. Some comments will be made regarding the advantages and disadvantages of this methodology.
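As a rough illustration of the csv-plus-NumPy approach described above, the sketch below parses CSV text with `csv.DictReader` and returns one numeric column as a NumPy array. The sample rows are copied from the transformed table shown later in this notebook; the helper name `read_numeric_column` is hypothetical, and a real run would read from a file opened with `open(path, newline='')` instead of an in-memory string.

```python
import csv
import io

import numpy as np

# Three rows copied from the transformed launch-site table; quoted fields contain commas.
SAMPLE_CSV = '''Country,Location,Number of rocket launches,lat (decimal)
Algeria,Reggane,10.0,26.71895
Zaire,"Shaba North, Kapani Tonneo OTRAG Launch Center",3.0,-7.92587
Kenya,"Broglio Space Centre (San Marco), Malindi",27.0,-2.9408
'''


def read_numeric_column(handle, column):
    """Hypothetical helper: parse a CSV file-like object and return `column` as floats.

    Empty cells become NaN so the array keeps one entry per row.
    """
    reader = csv.DictReader(handle)
    values = [float(row[column]) if row[column] else np.nan for row in reader]
    return np.array(values, dtype=float)


launches = read_numeric_column(io.StringIO(SAMPLE_CSV), 'Number of rocket launches')
latitudes = read_numeric_column(io.StringIO(SAMPLE_CSV), 'lat (decimal)')
print(launches.sum())  # 40.0
```

Every type conversion, including the handling of blank cells, has to be written by hand here; that is one of the trade-offs compared with the `pd.read_excel` call used in the cells below.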

In [1]:
All_Data = '../Data/'
wikisites = All_Data + 'wiki_sites/'
Launch_Sites_Wiki_Data = wikisites + 'launchsites_wiki/'

# Reading Transformed Data

In [2]:
import pandas as pd
transformed_df = pd.read_excel(Launch_Sites_Wiki_Data + 'launchsites_transformed.xlsx')
print(transformed_df.columns)
print(transformed_df.shape)

Index(['Unnamed: 0', 'Country', 'Location', 'Coordinates', 'Operational date',
       'Number of rocket launches', 'Heaviest rocket launched',
       'Highest achieved altitude', 'Notes', 'Continent', 'coordinates (dms)',
       'lat (dms)', 'long (dms)', 'coordinates (decimal)', 'lat (decimal)',
       'long (decimal)', 'rotational_vel', 'year opened', 'year closed',
       'operation length'],
      dtype='object')
(111, 20)


# Filtering out Launch Sites Located at Military Bases

The Notes column of the table we scraped from Wikipedia gives good information about how each launch site is used. For the purposes of this study we do not need to analyze launch sites located at military bases. First, I used a list comprehension to find the rows whose notes contain the keyword 'Military'. Then I used pandas 'isin' to build a small dataframe from that list. To create a dataframe of the rows 'not in' that list, I used a tilde '~' to negate the selection and filter out the launch sites at military bases.

reference: https://stackoverflow.com/questions/19960077/how-to-filter-pandas-dataframe-using-in-and-not-in-like-in-sql
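The same filter can also be written as a single boolean mask, without building the intermediate list. A minimal sketch, assuming a toy DataFrame whose `Notes` strings are invented for illustration; on the real data, `transformed_df['Notes'].str.contains('Military', regex=False, na=False)` should select the same rows as the `isin(my_list)` approach, since both test for the same substring.

```python
import pandas as pd

# Toy stand-in for transformed_df; the notes are invented for illustration.
df = pd.DataFrame({
    'Location': ['Site A', 'Site B', 'Site C', 'Site D'],
    'Notes': ['Military base', None, 'Civilian launches', 'Former Military range'],
})

# True where Notes contains 'Military'; missing notes are treated as False (na=False).
is_military = df['Notes'].str.contains('Military', regex=False, na=False)

military_df = df.loc[is_military]
without_military_df = df.loc[~is_military]  # '~' inverts the mask ("not in")

print(military_df['Location'].tolist())          # ['Site A', 'Site D']
print(without_military_df['Location'].tolist())  # ['Site B', 'Site C']
```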

In [3]:
my_list = [x for x in transformed_df['Notes'] if isinstance(x, str) and ('Military' in x)]
militarybases_df = transformed_df.loc[transformed_df['Notes'].isin(my_list)]
militarybases_df.shape

(7, 20)

In [5]:
my_list = [x for x in transformed_df['Notes'] if isinstance(x, str) and ('Military' in x)]
without_militarybases_df = transformed_df.loc[~transformed_df['Notes'].isin(my_list)]
#final_df = without_militarybases_df.dropna(subset=['Number of rocket launches'])
without_militarybases_df = without_militarybases_df.drop(columns=['Unnamed: 0', 'Notes','Coordinates'])

In [6]:
#without_militarybases_df.to_excel('./tableau_data.xlsx')
without_militarybases_df.head()

,Country,Location,Operational date,Number of rocket launches,Heaviest rocket launched,Highest achieved altitude,Continent,coordinates (dms),lat (dms),long (dms),coordinates (decimal),lat (decimal),long (decimal),rotational_vel,year opened,year closed,operation length
0,French Algeria,Centre interarmées d'essais d'engins spéciaux ...,1947–1967,230.0,18000.0,Orbital,Africa,31°05′58″N 2°50′09″W,31°05′58″N,2°50′09″W,31.09951°N 2.83581°W,31.09951,-2.83581,398.25327,1947,1967,20
1,Algeria,Reggane,1961–1965,10.0,,,Africa,26°43′08″N 0°16′37″E,26°43′08″N,0°16′37″E,26.71895°N 0.27691°E,26.71895,0.27691,415.439347,1961,1965,4
2,Zaire,"Shaba North, Kapani Tonneo OTRAG Launch Center",1977–1978,3.0,,50,Africa,7°55′33″S 28°31′40″E,7°55′33″S,28°31′40″E,7.92587°S 28.52766°E,-7.92587,28.52766,460.658654,1977,1978,1
3,Egypt,Jabal Hamzah ballistic missile test and launch...,1962–1973,6.0,,,Africa,30°07′32.7″N 30°36′18.5″E,30°07′32.7″N,30°36′18.5″E,30.125750°N 30.605139°E,30.12575,30.605139,402.278464,1962,1973,11
4,Kenya,"Broglio Space Centre (San Marco), Malindi",1964–1988,27.0,20000.0,Orbital,Africa,2°56′27″S 40°12′48″E,2°56′27″S,40°12′48″E,2.94080°S 40.21340°E,-2.9408,40.2134,464.489125,1964,1988,24


In [12]:
without_militarybases_df['operation length'].describe()

count    104.000000
mean      24.942308
std       23.940390
min        0.000000
25%        3.750000
50%       17.000000
75%       49.250000
max       76.000000
Name: operation length, dtype: float64
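The `operation length` values in these tables are consistent with "closing year minus opening year", with still-operational sites measured up to 2021 (for example Baikonur, `1957–`, has 64 and Wallops, `1945–`, has 76). The column was built upstream, so the sketch below is only a hedged reconstruction: the reference year 2021 is inferred from the tables, and the helper `operation_length` is hypothetical.

```python
import re

# Assumption: inferred from the tables (e.g. Baikonur '1957–' -> 64), not stated in the notebook.
REFERENCE_YEAR = 2021


def operation_length(operational_date, reference_year=REFERENCE_YEAR):
    """Years in operation for strings such as '1947–1967' or '1957–'.

    The first four-digit number is the opening year; an open-ended range
    (site still operational) is measured up to `reference_year`.
    """
    years = [int(y) for y in re.findall(r'\d{4}', operational_date)]
    if not years:
        raise ValueError(f'no year found in {operational_date!r}')
    opened = years[0]
    closed = years[1] if len(years) > 1 else reference_year
    return closed - opened


for date in ['1947–1967', '1964–1988', '1957–', '1957–[citation needed]']:
    print(date, operation_length(date))
```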

# Spaceport Wikitable

In [13]:
spaceport_satellite_launches_df = pd.read_excel('../Data/wiki_sites/spaceports_wiki/spaceport_satellite_launches_df.xlsx',index_col=0)
spaceport_satellite_launches_df['Spaceport_names'] = spaceport_satellite_launches_df['Spaceport'].map(lambda x:x.split(',')[0])

# Python Sequence Matcher to compare strings

link: https://www.kite.com/python/docs/difflib.SequenceMatcher.ratio
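`SequenceMatcher.ratio()` returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings, so 1.0 means identical. The sketch below applies the same normalisation as the matching loop (strip, remove spaces, lower-case, keep the text before the first comma) and checks the formula against `get_matching_blocks()`. The pair 'Kennedy Space Center' / 'Kennedy Space Centre' is a hypothetical spelling variant, chosen to show that the 0.9 threshold tolerates small differences.

```python
import difflib


def normalize(name):
    """Trim, drop spaces, lower-case and keep only the text before the first comma."""
    return name.strip().replace(" ", "").lower().split(',')[0]


def similarity(a, b):
    """SequenceMatcher ratio between two normalised names."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Identical once the ', Florida' suffix is cut off by normalize().
exact = similarity('Cape Canaveral Space Force Station',
                   'Cape Canaveral Space Force Station, Florida')
# Hypothetical spelling variant: 17 of 18 characters match, 2*17/36 ≈ 0.944.
variant = similarity('Kennedy Space Center', 'Kennedy Space Centre')

# ratio() == 2*M / T, with M summed from the matching blocks.
a, b = normalize('Kennedy Space Center'), normalize('Kennedy Space Centre')
matcher = difflib.SequenceMatcher(None, a, b)
m = sum(block.size for block in matcher.get_matching_blocks())
assert abs(matcher.ratio() - 2 * m / (len(a) + len(b))) < 1e-12

print(exact, round(variant, 3))  # 1.0 0.944
```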

In [41]:
import difflib


def normalize(name):
    """Trim, drop spaces, lower-case and keep only the text before the first comma."""
    return name.strip().replace(" ", "").lower().split(',')[0]


x = 0          # number of matched (spaceport, location) pairs
matches = []   # one merged row per match

for spaceport in spaceport_satellite_launches_df['Spaceport_names']:
    i = normalize(spaceport)
    for location in without_militarybases_df['Location']:
        j = normalize(location)
        # Treat the two names as the same site when they are more than 90% similar
        if difflib.SequenceMatcher(None, i, j).ratio() > 0.9:
            left = spaceport_satellite_launches_df.loc[spaceport_satellite_launches_df['Spaceport_names'] == spaceport].rename(columns={'Spaceport_names': 'name'})
            right = without_militarybases_df.loc[without_militarybases_df['Location'] == location].rename(columns={'Location': 'name'})
            # Overwrite the wiki location name with the spaceport name so the merge key lines up
            ind = right.index.values[0]
            right.loc[ind, 'name'] = spaceport
            matches.append(left.merge(right, left_on='name', right_on='name'))
            x += 1

# DataFrame.append was removed in pandas 2.0, so concatenate the collected rows once
my_dataframe = pd.concat(matches, ignore_index=True)
print(x)

20
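The nested loop compares every spaceport with every launch site. An alternative, not what this notebook does, is to let `difflib.get_close_matches` pick the single best candidate per spaceport and then merge once. Two differences from the loop are worth noting: `get_close_matches` keeps only the best match (`n=1`) and accepts scores equal to `cutoff`, whereas the loop keeps every pair with a ratio strictly above 0.9. The helper `fuzzy_join` and the toy frames are hypothetical; 'Kennedy Space Centre' is an invented spelling variant and 'Unmatched Port' is an invented row.

```python
import difflib

import pandas as pd


def normalize(name):
    """Trim, drop spaces, lower-case and keep only the text before the first comma."""
    return name.strip().replace(" ", "").lower().split(',')[0]


def fuzzy_join(left, left_col, right, right_col, cutoff=0.9):
    """Hypothetical helper: inner-join `left` to `right` on fuzzily matched names."""
    # Map normalised right-hand names back to their original spelling
    # (if two names normalise identically, the last one wins).
    lookup = {normalize(name): name for name in right[right_col]}
    keys = {}
    for name in left[left_col]:
        best = difflib.get_close_matches(normalize(name), list(lookup), n=1, cutoff=cutoff)
        if best:
            keys[name] = lookup[best[0]]
    # Attach the matched right-hand name, drop unmatched rows, then merge once.
    matched = left.assign(match_key=left[left_col].map(keys)).dropna(subset=['match_key'])
    return matched.merge(right, left_on='match_key', right_on=right_col, how='inner')


spaceports = pd.DataFrame({'Spaceport_names': ['Baikonur Cosmodrome', 'Kennedy Space Centre', 'Unmatched Port'],
                           'number of launches': [1000, 187, 5]})
sites = pd.DataFrame({'Location': ['Baikonur Cosmodrome', 'Kennedy Space Center', 'Reggane'],
                      'year opened': [1957, 1962, 1961]})

joined = fuzzy_join(spaceports, 'Spaceport_names', sites, 'Location')
print(joined[['Spaceport_names', 'Location', 'year opened']])
```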


In [29]:
my_dataframe.columns

Index(['Spaceport', 'Location', 'Years(orbital)', 'number of launches',
       'Launch vehicles(operators)', 'Sources', 'name', 'Country',
       'Operational date', 'Number of rocket launches',
       'Heaviest rocket launched', 'Highest achieved altitude', 'Continent',
       'coordinates (dms)', 'lat (dms)', 'long (dms)', 'coordinates (decimal)',
       'lat (decimal)', 'long (decimal)', 'rotational_vel', 'year opened',
       'year closed', 'operation length'],
      dtype='object')

In [22]:
my_dataframe['number of launches'].describe()['mean']

231.2

In [35]:
top_launchers = my_dataframe.loc[my_dataframe['number of launches'] > my_dataframe['number of launches'].describe()['mean']]
top_launchers

,Spaceport,Location,Years(orbital),number of launches,Launch vehicles(operators),Sources,name,Country,Operational date,Number of rocket launches,...,coordinates (dms),lat (dms),long (dms),coordinates (decimal),lat (decimal),long (decimal),rotational_vel,year opened,year closed,operation length
0,"Baikonur Cosmodrome, Baikonur/Tyuratam, Kazakh...",Kazakhstan,1957–,1000,"R-7/Soyuz, Kosmos, Proton, Tsyklon, Zenit, Ene...",[citation needed],Baikonur Cosmodrome,Kazakhstan,1957–,,...,45°57′19″N 63°21′01″E,45°57′19″N,63°21′01″E,45.95515°N 63.35028°E,45.95515,63.35028,323.348532,1957,operational,64
1,"Cape Canaveral Space Force Station, Florida, US",US,1958–,400,"Delta, Scout, Atlas, Titan, Saturn, Athena, Fa...",[citation needed],Cape Canaveral Space Force Station,United States,1949–,1000.0,...,28°28′00″N 80°33′31″W,28°28′00″N,80°33′31″W,28.46675°N 80.55852°W,28.46675,-80.55852,408.86799,1949,operational,72
2,"Vandenberg Space Force Base, California, US",US,1959–,700,"Delta, Scout, Atlas, Titan, Taurus, Athena, Mi...",[14],Vandenberg Space Force Base,United States,1958–,500.0,...,34°46′19″N 120°36′04″W,34°46′19″N,120°36′04″W,34.77204°N 120.60124°W,34.77204,-120.60124,382.047322,1958,operational,63
5,"Plesetsk Cosmodrome, Arkhangelsk Oblast, Russia",Russia,1966–,1500,"R-7/Soyuz, Kosmos, Tsyklon-3, Rokot, Angara",[20],Plesetsk Cosmodrome,Russia,1966–,1000.0,...,62°55′32″N 40°34′40″E,62°55′32″N,40°34′40″E,62.92556°N 40.57778°E,62.92556,40.57778,211.689951,1966,operational,55
7,"Guiana Space Centre, Kourou, French Guiana, Fr...",French Guiana,1970–,261,"7 Diamant, 227 Ariane, 16 Soyuz-2, 11 Vega",see 4 rockets,Guiana Space Centre,French Guiana,1968–,200.0,...,5°14′15″N 52°46′10″W,5°14′15″N,52°46′10″W,5.23739°N 52.76950°W,5.23739,-52.7695,463.159848,1968,operational,53
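Both "above the average" cells in this section (number of launches, then operation length) use the same pattern. `describe()['mean']` gives the same value as `.mean()`, since both skip missing values, so a small helper can express the filter once. The helper name `above_mean` and the toy frame are hypothetical.

```python
import pandas as pd


def above_mean(df, column):
    """Return the rows of `df` whose `column` value is strictly greater than the column mean.

    Series.mean() skips NaN, matching describe()['mean'].
    """
    return df.loc[df[column] > df[column].mean()]


toy = pd.DataFrame({'name': ['a', 'b', 'c', 'd'], 'number of launches': [1, 2, 3, 10]})
print(above_mean(toy, 'number of launches'))  # only row 'd' (mean is 4.0)
```

With it, the two cells reduce to `above_mean(my_dataframe, 'number of launches')` and `above_mean(my_dataframe, 'operation length')`. Because a few very busy spaceports (Plesetsk lists 1500 launches) pull the mean up, `.median()` would be a less outlier-sensitive threshold if that matters for the study.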


In [24]:
my_dataframe['operation length'].describe()['mean']

43.2

In [25]:
my_dataframe.loc[my_dataframe['operation length'] > my_dataframe['operation length'].describe()['mean']]

,Spaceport,Location,Years(orbital),number of launches,Launch vehicles(operators),Sources,name,Country,Operational date,Number of rocket launches,...,coordinates (dms),lat (dms),long (dms),coordinates (decimal),lat (decimal),long (decimal),rotational_vel,year opened,year closed,operation length
0,"Baikonur Cosmodrome, Baikonur/Tyuratam, Kazakh...",Kazakhstan,1957–,1000,"R-7/Soyuz, Kosmos, Proton, Tsyklon, Zenit, Ene...",[citation needed],Baikonur Cosmodrome,Kazakhstan,1957–,,...,45°57′19″N 63°21′01″E,45°57′19″N,63°21′01″E,45.95515°N 63.35028°E,45.95515,63.35028,323.348532,1957,operational,64
1,"Cape Canaveral Space Force Station, Florida, US",US,1958–,400,"Delta, Scout, Atlas, Titan, Saturn, Athena, Fa...",[citation needed],Cape Canaveral Space Force Station,United States,1949–,1000.0,...,28°28′00″N 80°33′31″W,28°28′00″N,80°33′31″W,28.46675°N 80.55852°W,28.46675,-80.55852,408.86799,1949,operational,72
2,"Vandenberg Space Force Base, California, US",US,1959–,700,"Delta, Scout, Atlas, Titan, Taurus, Athena, Mi...",[14],Vandenberg Space Force Base,United States,1958–,500.0,...,34°46′19″N 120°36′04″W,34°46′19″N,120°36′04″W,34.77204°N 120.60124°W,34.77204,-120.60124,382.047322,1958,operational,63
3,"Wallops Flight Facility, Virginia, US",US,1961–1985,19,Scout,6[16]+13[17],Wallops Flight Facility,United States,1945–,1600.0,...,37°50′46″N 75°28′46″W,37°50′46″N,75°28′46″W,37.84621°N 75.47938°W,37.84621,-75.47938,367.272354,1945,operational,76
4,"Kapustin Yar Cosmodrome, Astrakhan Oblast, Russia",Russia,1962–2008,85,Kosmos,[18][citation needed],Kapustin Yar Cosmodrome,Russia,1957–[citation needed],,...,48°34′41″N 46°15′15″E,48°34′41″N,46°15′15″E,48.57807°N 46.25420°E,48.57807,46.2542,307.710736,1957,operational,64
5,"Plesetsk Cosmodrome, Arkhangelsk Oblast, Russia",Russia,1966–,1500,"R-7/Soyuz, Kosmos, Tsyklon-3, Rokot, Angara",[20],Plesetsk Cosmodrome,Russia,1966–,1000.0,...,62°55′32″N 40°34′40″E,62°55′32″N,40°34′40″E,62.92556°N 40.57778°E,62.92556,40.57778,211.689951,1966,operational,55
6,"Kennedy Space Center, Florida, US",US,1967–,187,"17 Saturn, 135 Space Shuttle, 32 Falcon 9, 3 F...","Saturn, STS, F9",Kennedy Space Center,United States,1962–,151.0,...,28°36′30″N 80°36′14″W,28°36′30″N,80°36′14″W,28.6082°N 80.6040°W,28.6082,-80.604,408.319444,1962,operational,59
7,"Guiana Space Centre, Kourou, French Guiana, Fr...",French Guiana,1970–,261,"7 Diamant, 227 Ariane, 16 Soyuz-2, 11 Vega",see 4 rockets,Guiana Space Centre,French Guiana,1968–,200.0,...,5°14′15″N 52°46′10″W,5°14′15″N,52°46′10″W,5.23739°N 52.76950°W,5.23739,-52.7695,463.159848,1968,operational,53
8,"Jiuquan Satellite Launch Center, China",China,1970–,85,"2 LM1, 3 LM2A, 20 LM2C, 36 LM2D, 13 LM2F, 3 LM...",See 8 rockets,Jiuquan Satellite Launch Center,China,1970–,,...,40°57′38″N 100°17′54″E,40°57′38″N,100°17′54″E,40.96056°N 100.29833°E,40.96056,100.29833,351.226613,1970,operational,51
9,"Tanegashima Space Center, Japan",Japan,1975–,65,"6 N-I, 8 N-II, 9 H-I, 6 H-II, 36 H-IIA",see 5 rockets,Tanegashima Space Center,Japan,1967–,,...,30°23′27″N 130°58′05″E,30°23′27″N,130°58′05″E,30.39096°N 130.96813°E,30.39096,130.96813,401.193641,1967,operational,54


In [30]:
selected_locations_list = ['Broglio Space Centre (San Marco), Malindi',
                           'Plesetsk Cosmodrome',
                           'Cape Canaveral Space Force Station, Florida',
                           'Pacific Spaceport Complex, Kodiak, Alaska',
                           'Guiana Space Centre, Kourou',
                           'Spaceport America, Upham, New Mexico']

# Small Table

In [38]:
top_launchers.columns

Index(['Spaceport', 'Location', 'Years(orbital)', 'number of launches',
       'Launch vehicles(operators)', 'Sources', 'name', 'Country',
       'Operational date', 'Number of rocket launches',
       'Heaviest rocket launched', 'Highest achieved altitude', 'Continent',
       'coordinates (dms)', 'lat (dms)', 'long (dms)', 'coordinates (decimal)',
       'lat (decimal)', 'long (decimal)', 'rotational_vel', 'year opened',
       'year closed', 'operation length'],
      dtype='object')

In [57]:
small_table = top_launchers.loc[:,['name', 'number of launches','Highest achieved altitude','operation length','rotational_vel', 'Continent','Country']]

In [58]:
small_table

,name,number of launches,Highest achieved altitude,operation length,rotational_vel,Continent,Country
0,Baikonur Cosmodrome,1000,Interplanetary,64,323.348532,Asia,Kazakhstan
1,Cape Canaveral Space Force Station,400,Interstellar,72,408.86799,North America,United States
2,Vandenberg Space Force Base,700,Interplanetary,63,382.047322,North America,United States
5,Plesetsk Cosmodrome,1500,Orbital,55,211.689951,Europe,Russia
7,Guiana Space Centre,261,Interplanetary,53,463.159848,South America,French Guiana
