# Download Primary and Secondary Datasets

The primary dataset comprises eight months (May–December 2019) of flight and weather data for US airports, taken from the [Historical Flight Delay and Weather Data USA](https://www.kaggle.com/datasets/ioanagheorghiu/historical-flight-and-weather-data) dataset on [Kaggle](https://www.kaggle.com/).

The data was originally sourced from the [United States Bureau of Transportation Statistics](https://www.bts.gov/browse-statistical-products-and-data/bts-publications/airline-service-quality-performance-234-time) and the [National Oceanic and Atmospheric Administration](https://www.ncdc.noaa.gov/cdo-web/datatools/lcd).

<hr>

The secondary dataset is a list of [Airports in the United States of America](https://data.humdata.org/dataset/ourairports-usa) that (for our purposes) attaches three-dimensional coordinates (latitude, longitude, and altitude) to US airport codes.

In [1]:
# Primary dataset; format: Kaggle USERNAME/DATASET
datasource_primary = 'ioanagheorghiu/historical-flight-and-weather-data'

# Secondary dataset; format: URL
datasource_secondary = 'https://ourairports.com/countries/US/airports.csv'

In [2]:
import os

In [3]:
# Ensure the necessary folder structure exists
data_dir = os.path.join('..','resources','data')
os.makedirs(data_dir,exist_ok=True)

# api_dir = os.path.join('..','API_keys')
# os.makedirs(api_dir,exist_ok=True)

The following cell installs the Kaggle API.

If you do not already have it installed, **enable the cell** by converting it to Cell Type `Code`. (In the Jupyter Notebook menus, select `Cell` > `Cell Type` > `Code`.) Then, go to https://www.kaggle.com/docs/api and follow the `Authentication` instructions.

### Important
Your `kaggle.json` API key file must be in the proper location as specified in the `Authentication` instructions above.

Additional details about using the Kaggle API with Jupyter Notebook can be found in [this Kaggle notebook](https://www.kaggle.com/code/donkeys/kaggle-python-api/notebook), [this technowhisp article](https://technowhisp.com/kaggle-api-python-documentation/), or [this Stack Overflow answer](https://stackoverflow.com/a/60309843).
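As a quick sanity check before authenticating, the sketch below looks for `kaggle.json` where the Kaggle API conventionally reads it: the directory named by the `KAGGLE_CONFIG_DIR` environment variable if it is set, otherwise `~/.kaggle`. The helper name `find_kaggle_key` is hypothetical, and the lookup order is an assumption; defer to the `Authentication` instructions if they differ.

```python
import os


def find_kaggle_key(environ=None):
    """Return the path where kaggle.json is expected (hypothetical helper).

    Assumption: the Kaggle API reads KAGGLE_CONFIG_DIR when it is set and
    falls back to the .kaggle folder in the user's home directory.
    """
    environ = os.environ if environ is None else environ
    config_dir = environ.get('KAGGLE_CONFIG_DIR') or os.path.join(
        os.path.expanduser('~'), '.kaggle'
    )
    return os.path.join(config_dir, 'kaggle.json')


key_path = find_kaggle_key()
if os.path.isfile(key_path):
    print(f'Found API key file at {key_path}')
else:
    print(f'No API key file at {key_path}')
```

If the file is reported missing, the authentication call later in this notebook is likely to fail, so it may be worth resolving that first.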

## Download Primary Dataset

In [4]:
import kaggle
from kaggle.api.kaggle_api_extended import KaggleApi

In [5]:
kag = KaggleApi()
kag.authenticate()

In [6]:
# Download primary dataset from Kaggle
kag.dataset_download_files(
    dataset=datasource_primary,
    unzip=True,
    path=data_dir,
)
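To confirm that the download and unzip step produced files, a minimal sketch such as the following lists what landed in the data folder. It assumes the dataset unpacks to CSV files (not verified here); `list_csv_files` is a hypothetical helper, not part of the Kaggle API.

```python
import os


def list_csv_files(folder):
    """Return a sorted list of (file name, size in bytes) for CSV files in folder."""
    results = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip subfolders and anything without a .csv extension
        if os.path.isfile(path) and name.lower().endswith('.csv'):
            results.append((name, os.path.getsize(path)))
    return results


# Same folder as data_dir defined earlier in this notebook
data_dir = os.path.join('..', 'resources', 'data')
if os.path.isdir(data_dir):
    for name, size in list_csv_files(data_dir):
        print(f'{name}: {size:,} bytes')
```

An empty listing would suggest the download did not complete or the archive contains a different file type.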

## Download Secondary Dataset

In [7]:
from urllib.request import urlretrieve

In [8]:
# Download the secondary dataset
urlretrieve(
    datasource_secondary,
    filename=os.path.join(
        data_dir,
        os.path.basename(datasource_secondary)
    )
)

('..\\resources\\data\\airports.csv',
 <http.client.HTTPMessage at 0x1c75c70aec8>)
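As a rough illustration of how the secondary dataset can attach coordinates to airport codes, the sketch below builds a lookup from a CSV file. The column names (`iata_code`, `latitude_deg`, `longitude_deg`, `elevation_ft`) are assumptions about the layout of `airports.csv` and should be checked against the downloaded file; `airport_coordinates` is a hypothetical helper, and the sample rows are made-up values used only to exercise it.

```python
import csv
import io


def airport_coordinates(csv_file):
    """Map airport code -> (latitude, longitude, elevation or None).

    Assumes columns named iata_code, latitude_deg, longitude_deg,
    and elevation_ft; rows without a code or coordinates are skipped.
    """
    coords = {}
    for row in csv.DictReader(csv_file):
        code = row.get('iata_code')
        if not code:
            continue
        try:
            lat = float(row['latitude_deg'])
            lon = float(row['longitude_deg'])
        except (KeyError, TypeError, ValueError):
            continue
        try:
            alt = float(row.get('elevation_ft') or '')
        except ValueError:
            alt = None  # elevation missing or not numeric
        coords[code] = (lat, lon, alt)
    return coords


# Made-up sample rows, not real airport data
sample = io.StringIO(
    'ident,latitude_deg,longitude_deg,elevation_ft,iata_code\n'
    'KAAA,40.5,-89.25,597,AAA\n'
    'KBBB,35.0,-100.0,,BBB\n'
    'KCCC,36.0,-101.0,10,\n'
)
coords = airport_coordinates(sample)
print(coords)
```

For the real file, the same function could be called on `open(os.path.join(data_dir, 'airports.csv'), newline='', encoding='utf-8')`, assuming the file is UTF-8 encoded.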