Homework

Write a Python script that loads data from the SpaceX API into DuckDB using dlt.

Use:
- @dlt.source
- @dlt.resource
- @dlt.transformer

SpaceX API URL: https://api.spacexdata.com

Docs: https://github.com/r-spacex/SpaceX-API/blob/master/docs/README.md

Endpoints for loading:
- launches
- rockets
- crew
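
As a rough illustration of how the three endpoints map to request URLs, here is a minimal sketch. The `v4` prefix is taken from the launches request used later in this notebook; that it also applies to `rockets` and `crew` is an assumption to check against the docs linked above. The names `BASE_URL`, `API_VERSION`, `ENDPOINTS`, and `endpoint_url` are hypothetical helpers and are not part of dlt or the SpaceX API.

```python
BASE_URL = "https://api.spacexdata.com"
API_VERSION = "v4"  # assumption: same version as the launches request in this notebook
ENDPOINTS = ("launches", "rockets", "crew")


def endpoint_url(endpoint: str) -> str:
    """Build the full URL for one of the homework endpoints (hypothetical helper)."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"Unknown endpoint: {endpoint!r}")
    return f"{BASE_URL}/{API_VERSION}/{endpoint}"


# Print the URL for every endpoint listed in the homework
for name in ENDPOINTS:
    print(endpoint_url(name))
```

Keeping the endpoint names in one tuple makes it harder to mistype a URL when the same pattern is repeated for each resource.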

# Install dlt with the duckdb extension

In [1]:
%%capture
!pip install "dlt[duckdb]"

# Play with SpaceX API

In [None]:
import requests
response = requests.get("https://api.spacexdata.com/v4/launches")
response.json()[0]

# Helper
Run this cell as-is; you do not need to change it.

In [6]:
from dlt.common.pipeline import LoadInfo

def assert_load_info(info: LoadInfo, expected_load_packages: int = 1) -> None:
    """Asserts that expected number of packages was loaded and there are no failed jobs"""
    assert len(info.loads_ids) == expected_load_packages
    # all packages loaded
    assert all(package.state == "loaded" for package in info.load_packages)
    # no failed jobs in any of the packages
    info.raise_on_failed_jobs()

# Task 1


Create a pipeline for the SpaceX API that loads the following endpoints: launches, rockets, crew.

- Fill in the empty lines in the functions below.
- The `get_rockets` resource should have `table_name="rockets"`.
- Create a [resource](https://dlthub.com/docs/general-usage/resource#declare-a-resource) for the `crew` endpoint from scratch.
- [Run the pipeline](https://dlthub.com/docs/walkthroughs/run-a-pipeline) without errors.
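
Before wiring anything into dlt, it can help to see the plain generator pattern that a resource function follows. The sketch below is not the homework solution and does not use dlt: `fetch_json` and `endpoint_records` are hypothetical names, the `v4` path for `crew` is an assumption, and the `fetch` parameter exists only so the generator can be tried without network access. In the actual task, a function of this shape would carry the `@dlt.resource` decorator, as `get_launches` does below.

```python
import json
from typing import Callable, Iterator
from urllib.request import urlopen


def fetch_json(url: str) -> list:
    """Download a URL and parse the body as JSON (hypothetical helper, stdlib only)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def endpoint_records(
    endpoint: str, fetch: Callable[[str], list] = fetch_json
) -> Iterator[dict]:
    """Yield the records of one SpaceX endpoint one at a time."""
    # assumption: the endpoint lives under the same v4 prefix as launches
    url = f"https://api.spacexdata.com/v4/{endpoint}"
    yield from fetch(url)


# Offline usage with a stand-in fetcher instead of a real HTTP call
fake_fetch = lambda url: [{"name": "example", "source": url}]
for record in endpoint_records("crew", fetch=fake_fetch):
    print(record)
```

The notebook's `get_launches` yields the whole response list in one step, while this sketch yields records individually; either way the function is a generator that produces the endpoint's data.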

In [10]:
import time
import dlt
import requests


@dlt.resource(table_name="launches")
def get_launches():
    # url to request launches
    url = "https://api.spacexdata.com/v4/launches"
    # make the request and check if succeeded
    response = requests.get(url)
    response.raise_for_status()
    yield response.json()

pipeline = dlt.pipeline(
    pipeline_name='spacex_with_source',
    destination='duckdb',
    dataset_name='spacex_data',
    dev_mode=True,
)

load_info = pipeline.run([get_launches()])
print(load_info)
assert_load_info(load_info)


MissingDependencyException: 
You must install additional dependencies to run duckdb destination. If you use pip you may do the following:

pip install "dlt[duckdb]"

Dependencies for specific destinations are available as extras of dlt