# Capstone: Task 1
Huanlin Wang
47587225

## Data needed:
Based on common sense, the factors that may influence the result are:
* The rocket's launch time, location, orbital parameters, and payload information
* The performance parameters of the rocket's first stage booster and second stage rocket, including the number of engines, thrust, fuel type, and burn time
* The rocket's recovery method (RTLS or ASDS) and recovery result (success or failure)
* The number of times the rocket has been launched and reused

## Unified variable names and their meanings

|Name|Variable Name|Meaning|
| ---|---|---|
|Launch ID |lunch_id|Unique identifier of the launch|
|Launch Number |flight_number|Sequential launch number|
|Launch Name |flight_name|Name of the launch|
|Launch Success |flight_success|True if the launch succeeded, False if it failed, None if no data|
|Launch Date UTC |date_utc|Launch date in Coordinated Universal Time, formatted as YYYY-MM-DD|
|Launch Site ID |lunchpad_id|ID of the launch site|
|Launch Site Name |lunchpad_name|Name of the launch site|
|Launch Longitude |lunchpad_longitudes|Longitude of the launch site|
|Launch Latitude |lunchpad_latitude|Latitude of the launch site|
|Payload Mass |payloads_Mass|Total mass of all payloads on the launch, in kg|
|Working Orbit |Orbit|Target orbit of the payload|
|Core ID |core_id|ID of the first stage booster (a Falcon 9 launch uses a single core)|
|Core Serial |core_serial|Serial number of the booster|
|Core Version Number |core_block|Block (version) number of the booster; Falcon 9 has blocks 1 to 5|
|Number of Flights for First Stage Booster |core_flight_cunt|Flight count of the core as of this launch|
|Core Recovery Attempted |core_landing_attempt|Whether recovery of the core was attempted|
|Core Recovery Successful |core_landing_success|Whether the core was recovered successfully|
|Core Recovery Method |core_landing_type|None (no recovery data), Ocean (landed in the ocean), ASDS (autonomous spaceport drone ship at sea), RTLS (return to launch site, landing on a ground pad)|
|Has Legs |core_is_legs|Whether the core is fitted with landing legs|
|Core Reused |core_is_reused|Whether the core on this launch had flown before|
|Number of Times Core Reused |core_reuse_count|Number of times the core has been reused|
|Current Status of Core |core_status|Current status of the core, such as lost, end-of-life, or active|
|Landing Site ID |landpad_id|ID of the landing site|
|Landing Site Name |landpad_name|Name of the landing site|
|Landing Site Longitude |landpad_longitudes|Longitude of the landing site|
|Landing Site Latitude |landpad_latitude|Latitude of the landing site|


### Get data

In [1]:
import requests

launches = requests.get('https://api.spacexdata.com/v5/launches').json()
rockets = requests.get('https://api.spacexdata.com/v4/rockets').json()
payloads = requests.get('https://api.spacexdata.com/v4/payloads').json()
launchpads = requests.get('https://api.spacexdata.com/v4/launchpads').json()
landingpads = requests.get('https://api.spacexdata.com/v4/landpads').json()
capsules = requests.get('https://api.spacexdata.com/v4/capsules').json()
cores = requests.get('https://api.spacexdata.com/v4/cores').json()
dragons = requests.get('https://api.spacexdata.com/v4/dragons').json()
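
The calls above do not set a timeout, so a stalled connection would hang the notebook. As a minimal sketch (not the method used in this notebook), the download step could be wrapped in one helper. `fetch_json` and `BASE_URL` are hypothetical names, and the sketch uses the standard library's `urllib` instead of `requests` so that it has no third-party dependency.

```python
import json
import urllib.request

BASE_URL = 'https://api.spacexdata.com'  # same host as the calls above

def fetch_json(path, opener=urllib.request.urlopen, timeout=30):
    """Download one endpoint and decode its JSON body.

    `opener` is injectable so the function can be exercised without
    network access. The default, urlopen, raises an HTTPError for
    4xx/5xx responses, so a failed request does not pass silently.
    """
    with opener(BASE_URL + path, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))

# Hypothetical usage (requires network access):
# launches = fetch_json('/v5/launches')
# cores = fetch_json('/v4/cores')
```

Keeping the host in one constant also makes it harder to mistype a single endpoint URL.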

### Create an empty DataFrame and populate it with the required data

In [2]:
import pandas as pd
from datetime import datetime

def getValueinDic(dict_list, target_key, key, value):
    result = []
    for d in dict_list:
        if target_key in d and d[key] == value:
            result.append(d[target_key])
    return result

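def getValueinList(input_list, dict_list, id_key, value_key):
    # Index the records by ID once so each lookup is a dictionary access
    lookup = {d[id_key]: d[value_key] for d in dict_list}
    # None (or an unknown ID) maps to None, so the result always has the
    # same length as input_list and stays aligned with the other columns
    return [None if value is None else lookup.get(value) for value in input_list]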


def get_total_mass(launch_ids, launches, payloads):
    # Index payload masses by payload ID to avoid rescanning the payload list
    mass_by_payload = {p['id']: p['mass_kg'] for p in payloads}
    result = []
    for launch_id in launch_ids:
        for launch in launches:
            if launch['id'] == launch_id:
                masses = [mass_by_payload[pid] for pid in launch['payloads']
                          if pid in mass_by_payload]
                # If any payload mass is unknown, the total is unknown as well.
                # (Adding further masses to None would raise a TypeError.)
                result.append(None if None in masses else sum(masses))
    return result

def get_orbits(launch_ids, payloads):
    result = []
    for launch_id in launch_ids:
        orbit = None
        for payload in payloads:
            if payload['launch'] == launch_id:
                orbit = payload['orbit']
                break
        result.append(orbit)
    return result

def get_cores(launch_ids, data, data2):
    result = []
    for launch_id in launch_ids:
        for d in data:
            if d['id'] == launch_id:
                for core in d['cores']:
                    result.append(core[data2])
    return result

df = pd.DataFrame()

# Keep only the launches flown by the Falcon 9 (selected by its rocket ID)
launch_ids = getValueinDic(launches,'id','rocket','5e9d0d95eda69973a809d1ec')
lunchpad_ids = getValueinList(launch_ids,launches,'id','launchpad')
core_ids = get_cores(launch_ids, launches,'core')
landpad_id = get_cores(launch_ids, launches,'landpad')

df['lunch_id'] = launch_ids
df['flight_number'] = getValueinList(launch_ids,launches,'id','flight_number')
df['flight_name'] = getValueinList(launch_ids,launches,'id','name')
df['flight_success'] = getValueinList(launch_ids,launches,'id','success')
date_unix = getValueinList(launch_ids,launches,'id','date_unix')
# Interpret the Unix timestamps as UTC; datetime.fromtimestamp would use the
# local timezone and could shift the date by one day
df['date_utc'] = list(pd.to_datetime(date_unix, unit='s').strftime('%Y-%m-%d'))
df['lunchpad_id'] = lunchpad_ids
df['lunchpad_name'] = getValueinList(lunchpad_ids,launchpads,'id','name')
df['lunchpad_longitudes'] = getValueinList(lunchpad_ids,launchpads,'id','longitude')
df['lunchpad_latitude'] = getValueinList(lunchpad_ids,launchpads,'id','latitude')
df['payloads_Mass'] = get_total_mass(launch_ids, launches, payloads)
df['Orbit'] = get_orbits(launch_ids, payloads)
df['core_id'] = core_ids
df['core_serial'] = getValueinList(core_ids,cores,'id','serial')
df['core_block'] = getValueinList(core_ids,cores,'id','block')
df['core_flight_cunt'] = get_cores(launch_ids, launches,'flight')
df['core_landing_attempt'] = get_cores(launch_ids, launches,'landing_attempt')
df['core_landing_success'] = get_cores(launch_ids, launches,'landing_success')
df['core_landing_type'] = get_cores(launch_ids, launches,'landing_type')
df['core_is_legs'] = get_cores(launch_ids, launches,'legs')
df['core_is_reused'] = get_cores(launch_ids, launches,'reused')
df['core_reuse_count'] = getValueinList(core_ids,cores,'id','reuse_count')
df['core_status'] = getValueinList(core_ids,cores,'id','status')
df['landpad_id'] = landpad_id
df['landpad_name'] = getValueinList(landpad_id,landingpads,'id','name')
df['landpad_longitudes'] = getValueinList(landpad_id,landingpads,'id','longitude')
df['landpad_latitude'] = getValueinList(landpad_id,landingpads,'id','latitude')
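
The helper functions above join records from different endpoints by ID and sum the payload masses per launch. The sketch below shows the same idea on toy data, using a dictionary index so that each lookup avoids a scan of the full list. The function names and the IDs (`p1`, `l1`, ...) are made up for illustration; only the field names `id`, `payloads`, and `mass_kg` come from the code above.

```python
def index_by_id(records, key='id'):
    """Map each record's ID to the record itself for constant-time lookups."""
    return {record[key]: record for record in records}

def total_payload_mass(launch, payloads_by_id):
    """Sum the payload masses of one launch; None if any mass is unknown."""
    masses = [payloads_by_id[pid]['mass_kg'] for pid in launch['payloads']]
    if any(mass is None for mass in masses):
        return None
    return sum(masses)

# Toy records with made-up IDs, shaped like the API responses used above
toy_payloads = [
    {'id': 'p1', 'mass_kg': 500},
    {'id': 'p2', 'mass_kg': 250.5},
    {'id': 'p3', 'mass_kg': None},
]
toy_launches = [
    {'id': 'l1', 'payloads': ['p1', 'p2']},
    {'id': 'l2', 'payloads': ['p1', 'p3']},
]

payloads_by_id = index_by_id(toy_payloads)
totals = {launch['id']: total_payload_mass(launch, payloads_by_id)
          for launch in toy_launches}
print(totals)  # {'l1': 750.5, 'l2': None}
```

A launch with one unknown payload mass gets `None` rather than a partial sum, which is what makes the missing values visible in the later cleaning step.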

### Unify missing values, normalize integers, remove invalid values

In [3]:
import numpy as np

# Unify missing values as NaN
df = df.fillna(value=np.nan)

# Store the count-like columns as nullable integers. A plain int() cast is
# converted back to float as soon as the column also contains NaN.
for col in ['core_block', 'core_flight_cunt', 'core_reuse_count']:
    df[col] = pd.to_numeric(df[col]).astype('Int64')

mask = df['flight_number'] >= 188
df = df.drop(index=df[mask].index)

### Export the DataFrame to CSV

In [4]:
df.to_csv('data.csv', index=False)

### Dealing with missing values in payloads

Because the dataset is small, dropping every launch with a missing payload mass would lose too much information. The missing masses are therefore looked up or estimated manually, as listed below.

|Launch ID|Payload ID|Payload Name|Payload Mass|Data Source|Possible Reason for No Data|
|---|---|---|---|---|---|
|'5eb87cddffd86e000604b32f'|'5eb0e4b7b6c3bb0006eeb1e7'|Dragon Qualification Unit|4200kg|Dragon 1's dry_mass_kg|Qualification flight of the Dragon spacecraft on Falcon 9|
|'5eb87cdeffd86e000604b330'|'5eb0e4b9b6c3bb0006eeb1e8',<br>'5eb0e4b9b6c3bb0006eeb1e9'|SpaceX COTS Demo Flight 1+Cubesats|6000kg+1kg (microsatellite)|Wikipedia|Verification launch|
|'5eb87d01ffd86e000604b350'|'5eb0e4c3b6c3bb0006eeb20c'|NROL-76|8000kg (assumed average value)|spacenews website|US intelligence mission, information is classified|
|'5eb87d10ffd86e000604b35e'|'5eb0e4c6b6c3bb0006eeb21a'|ZUMA|8000kg|Wikipedia|Military secret|
|'5eb87d3dffd86e000604b381'|'5eb0e4d0b6c3bb0006eeb250'|Crew Dragon In Flight Abort Test|6350kg|Dry mass of the crewed Dragon 2|Crewed in-flight abort test|
|'5eb87d50ffd86e000604b394'|'5eb0e4d2b6c3bb0006eeb25b'|ANASIS-II|5500kg (estimated range 4500kg to 6000kg)|americaspace.com|Military secret|
|'5eb87d4dffd86e000604b38e'|'5eb0e4d2b6c3bb0006eeb25f'|Crew-1|Dragon 2's 6350kg+7 crew members 560kg+150kg payload= 6950kg|Wikipedia and official data |Unknown|
|'5f8399fb818d8b59f5740d43'|'5f839ac7818d8b59f5740d48'|NROL-108 |8000kg |space.com |Military secret |
|'5fd386aa7faea57d297c86c1'|'5fd3871a7faea57d297c86c6'|Transporter-1 |9407.5kg |Summed from each satellite operator's published figures (itemized below) |Rideshare mission with too many separate payloads for a single published total; most payloads without data are small or micro satellites and are negligible |
|'600f9b6d8f798e2a4d5f979f'|'608ac397eb3e50044e3630e7'|Transporter-2 |9407.5kg |Reuses the Transporter-1 value |Same reason: too many separate payloads |
|'605b4b95aa5433645e37d041'|'605b4bfcaa5433645e37d048',<br>'609f48374a12e4692eae4667',<br>'609f49c64a12e4692eae4668'|Starlink-26,Capella-6,Tyvak-0130 |13520+112+11.2=13643.2kg |eoportal.org |The CubeSats are very light and are not included in the total |
|'5fe3b11eb3467846b324216c'|'5fe3c4f2b3467846b3242193'|CRS-23 |2207kg |Wikipedia |Unknown |
|'607a37565a906a44023e0866'|'607a382f5a906a44023e0867'|Inspiration4 |12519kg |Wikipedia |Unknown |
|'6161d2006db1a92bfba85356'|'6161d22a6db1a92bfba85357'|CRS-24 |2989kg |Wikipedia |Unknown |
|'6243ad8baf52800c6e919252'|'6243af62af52800c6e919260'|Transporter-3 |5000kg |Estimate from americaspace.com |Too many separate payloads |
|'61bf3e31cd5ab50b0d936345'|'6175aaacefa4314085aa9c56'|NROL-87 |8000kg |Omitted |Military secret |
|'6243ad8baf52800c6e919252'|'6243af62af52800c6e919260'|Transporter-4 |5000kg |Same as the Transporter-3 value |Too many separate payloads |
|'61eefaa89eb1064137a1bd73'|'61eefb129eb1064137a1bd74'|Ax-1 |12519kg |Wikipedia |Unknown |
|'6243adcaaf52800c6e919254'|'6243b036af52800c6e919262'|NROL-85 |8000kg |Omitted |Military secret |
|'6243ade2af52800c6e919255'|'6243b1cdaf52800c6e919265'|Crew-4 |6850kg |Crew-1 value minus the mass of 3 crew members |Unknown |
|'6243ae24af52800c6e919258'|'6243b39daf52800c6e919267'|Transporter-5 |5000kg |Same as the Transporter-3 value; the number of micro satellites is about the same |Too many separate payloads |
|'6243ae40af52800c6e919259'|'6243b835af52800c6e91926d'|CRS-25 |2600kg |Wikipedia |Unknown |
|'62dd70d5202306255024d139'|'62dd73ed202306255024d145'|Crew-5 |6850kg |Same as Crew-4 |Unknown |
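
The notebook does not show how these manual values are written back into the dataset. One possible way, sketched here on plain dictionaries to stay self-contained, is a lookup keyed by launch ID. `MANUAL_MASS_KG` and `fill_missing_mass` are hypothetical names, the three entries are copied from the table above, and missing values are represented as `None` (in the DataFrame they would be NaN after the cleaning step).

```python
# A few launch-ID -> mass (kg) pairs copied from the table above
MANUAL_MASS_KG = {
    '5eb87cddffd86e000604b32f': 4200,    # Dragon Qualification Unit
    '5eb87d10ffd86e000604b35e': 8000,    # ZUMA
    '607a37565a906a44023e0866': 12519,   # Inspiration4
}

def fill_missing_mass(rows, manual_mass):
    """Return copies of rows where a missing payloads_Mass gets the manual value."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left untouched
        if row['payloads_Mass'] is None and row['lunch_id'] in manual_mass:
            row['payloads_Mass'] = manual_mass[row['lunch_id']]
        filled.append(row)
    return filled

rows = [
    {'lunch_id': '5eb87d10ffd86e000604b35e', 'payloads_Mass': None},
    {'lunch_id': 'known-launch', 'payloads_Mass': 3000},     # hypothetical row that already has a mass
    {'lunch_id': 'unlisted-launch', 'payloads_Mass': None},  # hypothetical row with no manual value
]
filled = fill_missing_mass(rows, MANUAL_MASS_KG)
```

Only missing values are overwritten, so masses that came from the API are never replaced by an estimate. With pandas, the same idea could probably be written as `df['payloads_Mass'].fillna(df['lunch_id'].map(MANUAL_MASS_KG))`.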

~~~
Transporter-1 List
* 48 SuperDove satellites for Planet
* 36 SpaceBEE satellites for Swarm
* 10 Starlink satellites for SpaceX
* 8 GEN1 satellites for Kepler
* 8 Lemur-2 satellites for Spire
* 5 Astrocast satellites
* 3 HawkEye 360 satellites
* 3 ICEYE satellites
* 3 V-R3x satellites for NASA
* 3 ARCE-1 satellites for the University of South Florida
* 2 Capella satellites
* Sherpa-FX space tug for Spaceflight
* D-Orbit’s ION SCV Laurentius space tug
* iQPS-2 for iQPS of Japan
* YUSAT for Taiwan’s Ministry of Science and Technology
* IDEASSAT for Taiwan’s Ministry of Science and Technology
* UVQS-SAT for LATMOS of France
* ASELSAT for ASELSAN of Turkey
* Hiber Four for Hiber of the Netherlands
* SOMP2b for TU Dresden of Germany
* PIXL-1 for DLR of Germany
* Charlie for U.S.-based Aurora Insight
* Hugo for GHGSat of Canada
* PTD-1 for NASA
* Prometheus for Los Alamos National Laboratory

192+14.4+8000+672+36+5+40.2+85+3+3+220+0+0+100 (older version)+2+4+1.6+0+0+2+0.3+0+16+11+0=9407.5kg
~~~
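
The Transporter-1 total can be re-checked with a few lines of Python. The list below simply repeats the 25 per-item masses from the sum above, in the same order; `math.fsum` is used to limit floating-point rounding error.

```python
import math

# Per-item masses in kg, in the order of the Transporter-1 list above
masses_kg = [192, 14.4, 8000, 672, 36, 5, 40.2, 85, 3, 3, 220, 0, 0,
             100, 2, 4, 1.6, 0, 0, 2, 0.3, 0, 16, 11, 0]

total_kg = round(math.fsum(masses_kg), 1)
print(len(masses_kg), total_kg)  # 25 9407.5
```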

Other missing values

Launch ID '62a9f0e320413d2695d88713': flight_success = True.