# Data Extraction Test

In this test we will piggyback off the "API call by year" notebook to see if we can extract a few data variables that we want and place them into a JSON file. This will be helpful because our goal is to store the raw data as a JSON file initially, then convert the JSON to a CSV for the final product.

## API Call By Year Extraction

In [1]:
from datetime import datetime
import dateutil.parser as dateparser
from pprint import pprint
import requests

For this test code we will query the year 2010.

In [2]:
year = 2010

In [3]:
#Make a list with the first day of every month, plus January 1st of the following year
year_month_starts = [dateparser.parse(f"{month}/1/{year}") for month in range(1, 13)]
year_month_starts.append(dateparser.parse(f"1/1/{year + 1}"))

#Initialize list where all information from the API requests for the year will go - this will be a list of lists of dictionaries
year_launches = [] 

for index, month in enumerate(year_month_starts[:-1]):
    start_date = month
    end_date = year_month_starts[index + 1]
    #print(f"From {start_date} to {end_date}")

    #Set filter parameters with the start and end date:
    net_filters = f'net__gte={start_date.isoformat()}&net__lte={end_date.isoformat()}'

    #Set additional filters: 
    mode = 'mode=detailed' #setting this mode to detailed returns all related objects
    limit = 'limit=100' #this is the max!
    ordering = 'ordering=net' #orders in ascending date order 

    #Assemble the full URL for the query: 
    query_url = "https://ll.thespacedevs.com/2.3.0/launches/previous/" + "?" + "&".join(
        (net_filters, mode, limit, ordering)
    )

    print(f'Query URL: {query_url}')

    #Make the actual API call:
    response = requests.get(query_url)
    print("Status code", response.status_code)

    #Deal with data we received: 
    raw_data = response.json()
    launch_sample = raw_data['results'] #launch records are inside the 'results' key
    print("Number of launches for month:", len(launch_sample)) #just to check

    #Append to list
    year_launches.append(launch_sample)

    print("---")

Query URL: https://ll.thespacedevs.com/2.3.0/launches/previous/?net__gte=2010-01-01T00:00:00&net__lte=2010-02-01T00:00:00&mode=detailed&limit=100&ordering=net
Status code 200
Number of launches for month: 2
---
Query URL: https://ll.thespacedevs.com/2.3.0/launches/previous/?net__gte=2010-02-01T00:00:00&net__lte=2010-03-01T00:00:00&mode=detailed&limit=100&ordering=net
Status code 200
Number of launches for month: 4
---
Query URL: https://ll.thespacedevs.com/2.3.0/launches/previous/?net__gte=2010-03-01T00:00:00&net__lte=2010-04-01T00:00:00&mode=detailed&limit=100&ordering=net
Status code 200
Number of launches for month: 4
---
Query URL: https://ll.thespacedevs.com/2.3.0/launches/previous/?net__gte=2010-04-01T00:00:00&net__lte=2010-05-01T00:00:00&mode=detailed&limit=100&ordering=net
Status code 200
Number of launches for month: 9
---
Query URL: https://ll.thespacedevs.com/2.3.0/launches/previous/?net__gte=2010-05-01T00:00:00&net__lte=2010-06-01T00:00:00&mode=detailed&limit=100&ordering=n
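
The URL assembly in the loop above can also be written as a pair of small helpers. This is only a sketch: `month_windows` and `build_query_url` are hypothetical names, and no request is sent, so it can be run without touching the API quota. Note that because both bounds are inclusive (`net__gte` and `net__lte`), a launch whose `net` falls exactly at midnight on the first of a month may be returned for two adjacent months.

```python
from datetime import datetime
from urllib.parse import urlencode

BASE_URL = "https://ll.thespacedevs.com/2.3.0/launches/previous/"


def month_windows(year):
    """Return 12 (start, end) datetime pairs, one per calendar month."""
    starts = [datetime(year, month, 1) for month in range(1, 13)]
    starts.append(datetime(year + 1, 1, 1))
    return list(zip(starts[:-1], starts[1:]))


def build_query_url(start, end):
    """Assemble the same query string as the loop above (no request is made)."""
    params = {
        "net__gte": start.isoformat(),
        "net__lte": end.isoformat(),
        "mode": "detailed",
        "limit": 100,
        "ordering": "net",
    }
    # safe=":" keeps the colons in the timestamps unescaped
    return BASE_URL + "?" + urlencode(params, safe=":")


windows = month_windows(2010)
first_url = build_query_url(*windows[0])
print(first_url)
```

Each URL could then be passed to `requests.get` exactly as in the cell above.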

In [4]:
#Check on API throttle
API_throttle_URL = "https://ll.thespacedevs.com/2.3.0/api-throttle/"

response_throttle = requests.get(API_throttle_URL)

In [5]:
throttle_data = response_throttle.json()

pprint(throttle_data)

{'current_use': 12,
 'ident': '144.118.77.160',
 'limit_frequency_secs': 3600,
 'next_use_secs': 0,
 'your_request_limit': 15}
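
The throttle output shows 12 requests used out of a limit of 15 per 3600 seconds, which means one full year (12 monthly calls) nearly exhausts the hourly quota. Below is a minimal sketch of how the throttle dictionary could be checked before starting another run. The helper names are hypothetical, and reading `next_use_secs` as "seconds to wait until the next request is allowed" is an assumption based on the field name.

```python
def remaining_requests(throttle):
    """Requests left in the current window, never below zero."""
    return max(throttle["your_request_limit"] - throttle["current_use"], 0)


def seconds_to_wait(throttle):
    """Return 0 if a request can be made now, else the reported wait time."""
    if remaining_requests(throttle) > 0:
        return 0
    # Assumption: next_use_secs is the wait until the next allowed request
    return throttle["next_use_secs"]


# Values copied from the output above (the 'ident' field is left out)
sample_throttle = {
    "current_use": 12,
    "limit_frequency_secs": 3600,
    "next_use_secs": 0,
    "your_request_limit": 15,
}

print("Requests left:", remaining_requests(sample_throttle))
print("Seconds to wait:", seconds_to_wait(sample_throttle))
```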


In [6]:
#Look at data we received
total_num_for_year = 0 #initialize
for item in year_launches:
    total_num_for_year += len(item)
    
print(f"Total number of launches in {year}: {total_num_for_year}")
#pprint(year_launches)

Total number of launches in 2010: 77
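
Since `year_launches` is a list of monthly lists, the same count can be taken from a flattened list. The sketch below also drops repeated ids, which guards against a launch being returned for two adjacent months. `flatten_launches` is a hypothetical helper and the sample data is made up for illustration.

```python
from itertools import chain


def flatten_launches(launches_by_month):
    """Merge the per-month lists into one list, dropping repeated ids."""
    seen_ids = set()
    flat = []
    for launch in chain.from_iterable(launches_by_month):
        if launch["id"] not in seen_ids:
            seen_ids.add(launch["id"])
            flat.append(launch)
    return flat


# Toy stand-in for year_launches: three "months", one of them empty
sample_year = [[{"id": "a"}, {"id": "b"}], [], [{"id": "b"}, {"id": "c"}]]
flat_launches = flatten_launches(sample_year)
print("Total number of launches:", len(flat_launches))
```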


# First Data Extraction

**The first question is: how can we see only the information for the first launch?**

In [None]:
#year_launches

[[{'id': 'bb643566-508a-4f3a-a701-85669d11e2b3',
   'url': 'https://ll.thespacedevs.com/2.3.0/launches/bb643566-508a-4f3a-a701-85669d11e2b3/',
   'name': 'Long March 3C | Compass-G1',
   'response_mode': 'detailed',
   'slug': 'long-march-3c-compass-g1',
   'launch_designator': '2010-001',
   'status': {'id': 3,
    'name': 'Launch Successful',
    'abbrev': 'Success',
    'description': 'The launch vehicle successfully inserted its payload(s) into the target orbit(s).'},
   'last_updated': '2024-06-15T14:25:50Z',
   'net': '2010-01-16T16:12:04Z',
   'net_precision': {'id': 0,
    'name': 'Second',
    'abbrev': 'SEC',
    'description': 'The T-0 is accurate to the second.'},
   'window_end': '2010-01-16T16:12:04Z',
   'window_start': '2010-01-16T16:12:04Z',
   'image': {'id': 2453,
    'name': "Chang'e 2 Liftoff_Long March 3C",
    'image_url': 'https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/chang2527e_2_l_image_20250906202038.bmp',
    'thumbnail_url': 'https://th

In [None]:
# Information for the first month
#The data collected from the requests is a list of lists (one inner list per month).
#Inside each inner list there is one dictionary per launch
#year_launches[0]

[{'id': 'bb643566-508a-4f3a-a701-85669d11e2b3',
  'url': 'https://ll.thespacedevs.com/2.3.0/launches/bb643566-508a-4f3a-a701-85669d11e2b3/',
  'name': 'Long March 3C | Compass-G1',
  'response_mode': 'detailed',
  'slug': 'long-march-3c-compass-g1',
  'launch_designator': '2010-001',
  'status': {'id': 3,
   'name': 'Launch Successful',
   'abbrev': 'Success',
   'description': 'The launch vehicle successfully inserted its payload(s) into the target orbit(s).'},
  'last_updated': '2024-06-15T14:25:50Z',
  'net': '2010-01-16T16:12:04Z',
  'net_precision': {'id': 0,
   'name': 'Second',
   'abbrev': 'SEC',
   'description': 'The T-0 is accurate to the second.'},
  'window_end': '2010-01-16T16:12:04Z',
  'window_start': '2010-01-16T16:12:04Z',
  'image': {'id': 2453,
   'name': "Chang'e 2 Liftoff_Long March 3C",
   'image_url': 'https://thespacedevs-prod.nyc3.digitaloceanspaces.com/media/images/chang2527e_2_l_image_20250906202038.bmp',
   'thumbnail_url': 'https://thespacedevs-prod.nyc3.d

In [None]:
year_launches[0][0].keys()

dict_keys(['id', 'url', 'name', 'response_mode', 'slug', 'launch_designator', 'status', 'last_updated', 'net', 'net_precision', 'window_end', 'window_start', 'image', 'infographic', 'probability', 'weather_concerns', 'failreason', 'hashtag', 'launch_service_provider', 'rocket', 'mission', 'pad', 'webcast_live', 'program', 'orbital_launch_attempt_count', 'location_launch_attempt_count', 'pad_launch_attempt_count', 'agency_launch_attempt_count', 'orbital_launch_attempt_count_year', 'location_launch_attempt_count_year', 'pad_launch_attempt_count_year', 'agency_launch_attempt_count_year', 'flightclub_url', 'updates', 'info_urls', 'vid_urls', 'timeline', 'pad_turnaround', 'mission_patches'])

In [None]:
#year_launches[0][0]['rocket']

{'id': 1438,
 'configuration': {'response_mode': 'detailed',
  'id': 90,
  'url': 'https://ll.thespacedevs.com/2.3.0/launcher_configurations/90/',
  'name': 'Long March 3',
  'families': [{'response_mode': 'detailed',
    'id': 106,
    'name': 'Long March',
    'manufacturer': [{'response_mode': 'normal',
      'id': 88,
      'url': 'https://ll.thespacedevs.com/2.3.0/agencies/88/',
      'name': 'China Aerospace Science and Technology Corporation',
      'abbrev': 'CASC',
      'type': {'id': 1, 'name': 'Government'},
      'featured': True,
      'country': [{'id': 6,
        'name': 'China',
        'alpha_2_code': 'CN',
        'alpha_3_code': 'CHN',
        'nationality_name': 'Chinese',
        'nationality_name_composed': 'Sino'}],
      'description': 'The China Aerospace Science and Technology Corporation (CASC) is the main contractor for the Chinese space program. It is state-owned and has a number of subordinate entities which design, develop and manufacture a range of spac

In [None]:
#year_launches[0][1]['rocket']

{'id': 1439,
 'configuration': {'response_mode': 'detailed',
  'id': 87,
  'url': 'https://ll.thespacedevs.com/2.3.0/launcher_configurations/87/',
  'name': 'Proton-M',
  'families': [{'response_mode': 'detailed',
    'id': 130,
    'name': 'Proton / UR-500',
    'manufacturer': [{'response_mode': 'normal',
      'id': 96,
      'url': 'https://ll.thespacedevs.com/2.3.0/agencies/96/',
      'name': 'Khrunichev State Research and Production Space Center',
      'abbrev': 'KhSC',
      'type': {'id': 1, 'name': 'Government'},
      'featured': True,
      'country': [{'id': 5,
        'name': 'Russia',
        'alpha_2_code': 'RU',
        'alpha_3_code': 'RUS',
        'nationality_name': 'Russian',
        'nationality_name_composed': 'Russo'}],
      'description': 'Khrunichev State Research and Production Space Center is a Moscow-based producer of spacecraft and space-launch systems, including the Proton and Rokot rockets and is currently developing the Angara rocket family. The Prot

In [None]:
#year_launches[1][3]['rocket']

{'id': 1443,
 'configuration': {'response_mode': 'detailed',
  'id': 87,
  'url': 'https://ll.thespacedevs.com/2.3.0/launcher_configurations/87/',
  'name': 'Proton-M',
  'families': [{'response_mode': 'detailed',
    'id': 130,
    'name': 'Proton / UR-500',
    'manufacturer': [{'response_mode': 'normal',
      'id': 96,
      'url': 'https://ll.thespacedevs.com/2.3.0/agencies/96/',
      'name': 'Khrunichev State Research and Production Space Center',
      'abbrev': 'KhSC',
      'type': {'id': 1, 'name': 'Government'},
      'featured': True,
      'country': [{'id': 5,
        'name': 'Russia',
        'alpha_2_code': 'RU',
        'alpha_3_code': 'RUS',
        'nationality_name': 'Russian',
        'nationality_name_composed': 'Russo'}],
      'description': 'Khrunichev State Research and Production Space Center is a Moscow-based producer of spacecraft and space-launch systems, including the Proton and Rokot rockets and is currently developing the Angara rocket family. The Prot

In [32]:
len(year_launches)

12

In [40]:
print("List 1, Launch 1: ",year_launches[0][0]['window_start'])
print("List 1, Launch 2: ",year_launches[0][1]['window_start'])
print("List 2, Launch 1: ",year_launches[1][0]['window_start'])
#Note: index 2 (March) is not printed here
print("List 4, Launch 1: ",year_launches[3][0]['window_start'])
print("List 5, Launch 1: ",year_launches[4][0]['window_start'])
print("List 6, Launch 1: ",year_launches[5][0]['window_start'])
print("List 7, Launch 1: ",year_launches[6][0]['window_start'])
print("List 8, Launch 1: ",year_launches[7][0]['window_start'])
print("List 9, Launch 1: ",year_launches[8][0]['window_start'])
print("List 10, Launch 1: ",year_launches[9][0]['window_start'])
print("List 11, Launch 1: ",year_launches[10][0]['window_start'])
print("List 12, Launch 1: ",year_launches[11][0]['window_start'])


List 1, Launch 1:  2010-01-16T16:12:04Z
List 1, Launch 2:  2010-01-28T00:18:00Z
List 2, Launch 1:  2010-02-03T03:45:30Z
List 4, Launch 1:  2010-04-02T04:04:33Z
List 5, Launch 1:  2010-05-14T18:20:09Z
List 6, Launch 1:  2010-06-02T01:59:12Z
List 7, Launch 1:  2010-07-10T18:40:36Z
List 8, Launch 1:  2010-08-04T20:59:00Z
List 9, Launch 1:  2010-09-02T00:53:43Z
List 10, Launch 1:  2010-10-01T10:59:57Z
List 11, Launch 1:  2010-11-02T00:58:39Z
List 12, Launch 1:  2010-12-05T10:25:00Z
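
The hand-written print calls above can be generalized into a loop that covers every month and does not fail on a month with no launches. This is a sketch with a hypothetical helper name; the two timestamps in the sample are taken from the output above, and the empty third month is invented to show the edge case.

```python
def first_window_starts(launches_by_month):
    """Return {month_number: window_start of the first launch, or None}."""
    result = {}
    for month_number, launches in enumerate(launches_by_month, start=1):
        # An empty month has no first launch, so store None instead of indexing
        result[month_number] = launches[0]["window_start"] if launches else None
    return result


sample_year = [
    [{"window_start": "2010-01-16T16:12:04Z"}, {"window_start": "2010-01-28T00:18:00Z"}],
    [{"window_start": "2010-02-03T03:45:30Z"}],
    [],  # hypothetical month with zero launches
]

for month_number, window_start in first_window_starts(sample_year).items():
    print(f"List {month_number}, Launch 1: {window_start}")
```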


### Data Structure

The data appears to be formatted as a list of lists. Within each inner list there are multiple dictionaries. From my search it appears that each dictionary corresponds to one launch and describes, among other things, the rocket that was used. Here are my initial findings from looking at how the data is structured.

1. Each element within the list **year_launches** represents a month of the year
2. Multiple launches may occur within the same month, so there are dictionaries that represent each launch
    - A month can have zero launches or more than one, so we have to make sure that the filter operates correctly in those cases
3. Within each launch dictionary there are multiple keys.
    - Some keys map directly to simple values (strings, numbers)
    - Some keys have dictionaries or lists as their values, so they may have further nested levels
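
Because the nesting mixes dictionaries and lists, and some levels may be empty, a small helper that walks a path and falls back to a default can make the lookups safer. The function below is a hypothetical sketch, not part of any library; the sample dictionary mimics the shape seen in the outputs above.

```python
def dig(obj, *path, default=None):
    """Follow dictionary keys / list indexes in `path`.

    Returns `default` as soon as a step is missing (absent key,
    index out of range, or a value that cannot be indexed).
    """
    for step in path:
        try:
            obj = obj[step]
        except (KeyError, IndexError, TypeError):
            return default
    return obj


sample_launch = {
    "name": "Long March 3C | Compass-G1",
    "rocket": {"configuration": {"name": "Long March 3", "families": []}},
}

print(dig(sample_launch, "rocket", "configuration", "name"))
print(dig(sample_launch, "rocket", "configuration", "families", 0, "name"))
```

The second call hits an empty `families` list and returns the default instead of raising an `IndexError`.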

## Feature Navigation

In this section I aim to locate the specific features we want to access in order to create our own dataset.


### The Rocket

1. Rocket Name, Specific Variant Used
2. Company that Conducted the launch
3. Model Type
4. Payload

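For the examples we will mostly be using the first launch, `year_launches[0][0]`. The first cell below shows the rocket of the second January launch, `year_launches[0][1]`, for comparison.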

In [43]:
year_launches[0][1]['rocket']

{'id': 1439,
 'configuration': {'response_mode': 'detailed',
  'id': 87,
  'url': 'https://ll.thespacedevs.com/2.3.0/launcher_configurations/87/',
  'name': 'Proton-M',
  'families': [{'response_mode': 'detailed',
    'id': 130,
    'name': 'Proton / UR-500',
    'manufacturer': [{'response_mode': 'normal',
      'id': 96,
      'url': 'https://ll.thespacedevs.com/2.3.0/agencies/96/',
      'name': 'Khrunichev State Research and Production Space Center',
      'abbrev': 'KhSC',
      'type': {'id': 1, 'name': 'Government'},
      'featured': True,
      'country': [{'id': 5,
        'name': 'Russia',
        'alpha_2_code': 'RU',
        'alpha_3_code': 'RUS',
        'nationality_name': 'Russian',
        'nationality_name_composed': 'Russo'}],
      'description': 'Khrunichev State Research and Production Space Center is a Moscow-based producer of spacecraft and space-launch systems, including the Proton and Rokot rockets and is currently developing the Angara rocket family. The Prot

In [55]:
year_launches[0][0]['rocket']['configuration']['families'][0]['manufacturer'][0]

{'response_mode': 'normal',
 'id': 88,
 'url': 'https://ll.thespacedevs.com/2.3.0/agencies/88/',
 'name': 'China Aerospace Science and Technology Corporation',
 'abbrev': 'CASC',
 'type': {'id': 1, 'name': 'Government'},
 'featured': True,
 'country': [{'id': 6,
   'name': 'China',
   'alpha_2_code': 'CN',
   'alpha_3_code': 'CHN',
   'nationality_name': 'Chinese',
   'nationality_name_composed': 'Sino'}],
 'description': 'The China Aerospace Science and Technology Corporation (CASC) is the main contractor for the Chinese space program. It is state-owned and has a number of subordinate entities which design, develop and manufacture a range of spacecraft, launch vehicles, strategic and tactical missile systems, and ground equipment. It was officially established in July 1999 as part of a Chinese government reform drive, having previously been one part of the former China Aerospace Corporation. Various incarnations of the program date back to 1956.',
 'administrator': 'Chairman & Preside

In [None]:
# Rocket Name and Company Name
#Store the shared parts of the path once so each print stays short
configuration = year_launches[0][0]['rocket']['configuration']
manufacturer = configuration['families'][0]['manufacturer'][0]

print("Rocket Name : ", configuration['name'])
print("Company Name : ", manufacturer['name'])
print("Company Founding Year : ", manufacturer['founding_year'])
print("Country Affiliation : ", manufacturer['country'][0]['name'])
print("Company Type : ", manufacturer['type']['name'])


Rocket Name :  Long March 3
Company Name :  China Aerospace Science and Technology Corporation
Company Founding Year :  1999
Country Affiliation :  China
Company Type :  Government
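
The lookups above can be collected into one function that returns a flat dictionary per launch. This is a sketch: the function name and the output field names are hypothetical, and the sample is trimmed down to the values printed above. It reads the manufacturer of the first rocket family, as the cell above does; the key listing also contains `launch_service_provider`, which may be a closer match for "company that conducted the launch", but its structure is not inspected in this notebook.

```python
def extract_rocket_fields(launch):
    """Pull rocket and manufacturer details into a flat dictionary.

    Missing or empty levels yield None instead of raising an error.
    """
    configuration = (launch.get("rocket") or {}).get("configuration") or {}
    families = configuration.get("families") or [{}]
    manufacturers = families[0].get("manufacturer") or [{}]
    manufacturer = manufacturers[0]
    countries = manufacturer.get("country") or [{}]
    return {
        "rocket_name": configuration.get("name"),
        "company_name": manufacturer.get("name"),
        "company_founding_year": manufacturer.get("founding_year"),
        "company_country": countries[0].get("name"),
        "company_type": (manufacturer.get("type") or {}).get("name"),
    }


# Trimmed copy of the first launch, using values from the output above
sample_launch = {
    "rocket": {
        "configuration": {
            "name": "Long March 3",
            "families": [{
                "manufacturer": [{
                    "name": "China Aerospace Science and Technology Corporation",
                    "founding_year": 1999,
                    "type": {"id": 1, "name": "Government"},
                    "country": [{"id": 6, "name": "China"}],
                }],
            }],
        },
    },
}

print(extract_rocket_fields(sample_launch))
```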


In [58]:
year_launches[0][0].keys()

dict_keys(['id', 'url', 'name', 'response_mode', 'slug', 'launch_designator', 'status', 'last_updated', 'net', 'net_precision', 'window_end', 'window_start', 'image', 'infographic', 'probability', 'weather_concerns', 'failreason', 'hashtag', 'launch_service_provider', 'rocket', 'mission', 'pad', 'webcast_live', 'program', 'orbital_launch_attempt_count', 'location_launch_attempt_count', 'pad_launch_attempt_count', 'agency_launch_attempt_count', 'orbital_launch_attempt_count_year', 'location_launch_attempt_count_year', 'pad_launch_attempt_count_year', 'agency_launch_attempt_count_year', 'flightclub_url', 'updates', 'info_urls', 'vid_urls', 'timeline', 'pad_turnaround', 'mission_patches'])

In [89]:
year_launches[0][0]['failreason']

''

In [87]:
# Launch related parameters
print("Launch related parameters")
print("---------------")
print("Date of Launch : ", year_launches[0][0]['net'])
print("Launch Status : ", year_launches[0][0]['status']['name'])
print("Short Form Status:", year_launches[0][0]['status']['abbrev'])
print("Launch pad location name:", year_launches[0][0]['pad']['location']['name'])
print("Launch pad name:", year_launches[0][0]['pad']['name'])
print("Launch pad country name:", year_launches[0][0]['pad']['country']['name'])
print("Launch pad country id:", year_launches[0][0]['pad']['country']['id'])
print("Launch pad latitude:", year_launches[0][0]['pad']['latitude'])
print("Launch pad longitude:", year_launches[0][0]['pad']['longitude'])

print()



Launch related parameters
---------------
Date of Launch :  2010-01-16T16:12:04Z
Launch Status :  Launch Successful
Short Form Status: Success
Launch pad location name: Xichang Satellite Launch Center, People's Republic of China
Launch pad name: Launch Complex 2 (LC-2)
Launch pad country name: China
Launch pad country id: 6
Launch pad latitude: 28.245564
Launch pad longitude: 102.026751
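
The launch-level lookups can be gathered the same way. The sketch below uses hypothetical function and field names, turns an empty `failreason` string into `None`, and shows one way to parse the `net` timestamp with the standard library. The sample values are copied from the output above; whether the API returns latitude and longitude as numbers or strings is not visible from the printout, so floats are assumed here.

```python
from datetime import datetime


def parse_net(timestamp):
    """Parse a string such as '2010-01-16T16:12:04Z' into an aware datetime."""
    # Replace the trailing 'Z' so older Python versions can parse it too
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def extract_launch_fields(launch):
    """Pull launch, status and pad details into a flat dictionary."""
    status = launch.get("status") or {}
    pad = launch.get("pad") or {}
    return {
        "launch_date": launch.get("net"),
        "status": status.get("name"),
        "status_abbrev": status.get("abbrev"),
        "pad_name": pad.get("name"),
        "pad_location": (pad.get("location") or {}).get("name"),
        "pad_country": (pad.get("country") or {}).get("name"),
        "pad_latitude": pad.get("latitude"),
        "pad_longitude": pad.get("longitude"),
        "fail_reason": launch.get("failreason") or None,
    }


sample_launch = {
    "net": "2010-01-16T16:12:04Z",
    "failreason": "",
    "status": {"id": 3, "name": "Launch Successful", "abbrev": "Success"},
    "pad": {
        "name": "Launch Complex 2 (LC-2)",
        "latitude": 28.245564,   # assumed to be a float
        "longitude": 102.026751,  # assumed to be a float
        "country": {"id": 6, "name": "China"},
        "location": {"name": "Xichang Satellite Launch Center, People's Republic of China"},
    },
}

launch_fields = extract_launch_fields(sample_launch)
print(launch_fields)
print(parse_net(launch_fields["launch_date"]))
```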



## Test Data Loop

With the few parameters that we have acquired, let's see if we can grab the specific information that we want and put it into a format that is easily converted into JSON.
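
One possible shape for that loop is sketched below: walk every month, turn each launch into a flat record, then serialize the records to JSON and, for the later step, to CSV. The record field names and helper names are hypothetical choices, the sample stands in for `year_launches`, and the text is kept in memory here rather than written to a file.

```python
import csv
import io
import json


def to_record(launch):
    """Flatten one launch dictionary into JSON-friendly key/value pairs."""
    configuration = (launch.get("rocket") or {}).get("configuration") or {}
    return {
        "name": launch.get("name"),
        "net": launch.get("net"),
        "status": (launch.get("status") or {}).get("name"),
        "rocket_name": configuration.get("name"),
        "pad_name": (launch.get("pad") or {}).get("name"),
    }


def build_records(launches_by_month):
    """Loop over all months (empty ones included) and all launches."""
    return [to_record(launch) for month in launches_by_month for launch in month]


def records_to_csv(records):
    """Render the records as CSV text; returns '' when there are no records."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Toy stand-in for year_launches: one launch in January, none in February
sample_year = [
    [{
        "name": "Long March 3C | Compass-G1",
        "net": "2010-01-16T16:12:04Z",
        "status": {"name": "Launch Successful"},
        "rocket": {"configuration": {"name": "Long March 3"}},
        "pad": {"name": "Launch Complex 2 (LC-2)"},
    }],
    [],
]

records = build_records(sample_year)
json_text = json.dumps(records, indent=2)
csv_text = records_to_csv(records)
print(json_text)
print(csv_text)
```

To store the result on disk, `json.dump(records, file_handle)` with an open file would replace `json.dumps`.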