# Turbulence Notebook - Pilot Report Downloads

**Description**: This notebook downloads pilot reports (PIREPs). PIREPs are raw point observations reported by pilots that describe the conditions they encounter in the atmosphere during flight.

**Where**:
- https://mesonet.agron.iastate.edu/request/gis/pireps.php
- The cgi-bin endpoint (a direct download link that was sent to me): https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py

## About the Data

**Data Provided**: date/time, urgent, raw PIREP, icing, turbulence, lat/lon, aircraft, ATRCC
- Example raw PIREP: BNA UA /OV BNA135020/TM 1932/FL070/TP LJ60/TB MOD-SEV DURGD 070
    - Decoded:
        - BNA: Nashville
        - UA: routine report (urgent reports use UUA)
        - OV BNA135020: relative location, on the 135° radial 20 nautical miles from BNA (southeast)
        - TM 1932: time of the report (1932 UTC)
        - FL070: altitude, 7,000 ft
        - TP LJ60: aircraft type (Learjet 60)
        - TB MOD-SEV DURGD 070: moderate to severe turbulence during descent through 7,000 ft
    - Reports can include more fields than this, such as sky cover, icing, temperature, and other remarks
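The decoding above follows the slash-delimited layout of a PIREP: a header (location and report type) followed by fields that each start with a two-letter tag. As a minimal sketch, assuming that layout holds and that no field value itself contains a slash, a raw report can be split into its fields like this. The helpers parse_pirep and decode_location are hypothetical and are not part of the IEM service or this notebook.

```python
import re
from typing import Dict, Optional, Tuple


def parse_pirep(raw: str) -> Dict[str, str]:
    """Split a raw PIREP into its header and tagged fields (hypothetical helper)."""
    head, *fields = raw.split("/")
    # the header holds the reporting location and the report type (UA or UUA)
    station, report_type = head.split()[:2]
    parsed = {"STATION": station, "TYPE": report_type}
    for field in fields:
        field = field.strip()
        if field:
            # each field starts with a two-letter tag (OV, TM, FL, TP, TB, SK, ...)
            parsed[field[:2]] = field[2:].strip()
    return parsed


def decode_location(ov: str) -> Optional[Tuple[str, int, int]]:
    """Decode a navaid + radial + distance location such as 'BNA135020'."""
    match = re.fullmatch(r"([A-Z0-9]{3,4})(\d{3})(\d{3})", ov)
    if match is None:
        return None  # other location formats (e.g. '15W LNY') are not handled here
    navaid, radial, distance_nm = match.groups()
    return navaid, int(radial), int(distance_nm)


report = parse_pirep("BNA UA /OV BNA135020/TM 1932/FL070/TP LJ60/TB MOD-SEV DURGD 070")
print(report["TB"])                   # MOD-SEV DURGD 070
print(int(report["FL"]) * 100)        # 7000
print(decode_location(report["OV"]))  # ('BNA', 135, 20)
```

Real reports are messier than this example (remarks, missing fields, free text), so a production decoder would need more cases than this sketch covers.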

**Time**: 2003-current

**Location**: United States (PIREPs can also be found over the Atlantic and Pacific Oceans)

**Data formats available**: Shapefile and comma-delimited (CSV); this notebook downloads CSVs.

## Bulk Filtering and Downloading

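We download the data one month at a time: the first cell below maps each month to its number of days, and the second builds one request URL per month.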

In [2]:
# Map each month to its number of days so each request covers the whole month.
# These are non-leap-year values; February has 29 days in leap years such as 2020.
month_days = {
    1: 31,
    2: 28,
    3: 31,
    4: 30,
    5: 31,
    6: 30,
    7: 31,
    8: 31,
    9: 30,
    10: 31,
    11: 30,
    12: 31,
}
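A hard-coded table has to special-case February in leap years. As an alternative sketch, the standard library's calendar.monthrange already returns the number of days for any given year and month, so the table can be computed per year. The helper name month_end_days is hypothetical.

```python
import calendar
from typing import Dict


def month_end_days(year: int) -> Dict[int, int]:
    """Number of days in each month of `year` (hypothetical helper)."""
    # calendar.monthrange returns (weekday of the 1st, number of days in the month)
    return {month: calendar.monthrange(year, month)[1] for month in range(1, 13)}


print(month_end_days(2020)[2])  # 29, since 2020 is a leap year
print(month_end_days(2021)[2])  # 28
```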

In [2]:
import calendar

file_list = []

# build one request URL per month for 2020 through 2023
for year in range(2020, 2024):
    for month in month_days.keys():
        # last day of the month; February has 29 days in leap years (e.g. 2020)
        day_2 = 29 if month == 2 and calendar.isleap(year) else month_days[month]
        url = f'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1={year}&month1={month}&day1=1&hour1=0&minute1=0&year2={year}&month2={month}&day2={day_2}&hour2=23&minute2=59&fmt=csv'
        file_list.append(url)

print(file_list)

['https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=1&day1=1&hour1=0&minute1=0&year2=2015&month2=1&day2=31&hour2=23&minute2=59&fmt=csv', 'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=2&day1=1&hour1=0&minute1=0&year2=2015&month2=2&day2=28&hour2=23&minute2=59&fmt=csv', 'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=3&day1=1&hour1=0&minute1=0&year2=2015&month2=3&day2=31&hour2=23&minute2=59&fmt=csv', 'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=4&day1=1&hour1=0&minute1=0&year2=2015&month2=4&day2=30&hour2=23&minute2=59&fmt=csv', 'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=5&day1=1&hour1=0&minute1=0&year2=2015&month2=5&day2=31&hour2=23&minute2=59&fmt=csv', 'https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?year1=2015&month1=6&day1=1&hour1=0&minute1=0&year2=2015&month2=6&day2=30&hour2=23&minute2=59&f
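The request URL is just the pireps.py endpoint plus a query string of start and end date/time fields. A sketch of building the same URL from a dictionary of parameters with urllib.parse.urlencode, which keeps the parameter names readable and handles escaping, is shown below. The parameter names are copied from the URLs above; the helper monthly_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py"


def monthly_url(year: int, month: int, last_day: int) -> str:
    """Request URL covering one full month (hypothetical helper)."""
    params = {
        "year1": year, "month1": month, "day1": 1, "hour1": 0, "minute1": 0,
        "year2": year, "month2": month, "day2": last_day, "hour2": 23, "minute2": 59,
        "fmt": "csv",
    }
    # urlencode keeps the dict's insertion order and converts the ints to strings
    return f"{BASE_URL}?{urlencode(params)}"


print(monthly_url(2015, 1, 31))
```

For January 2015 this reproduces the first URL in the printed list above.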

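The loop in the next code cell (In [3]) packs several steps together. For each URL, we read the CSV into a DataFrame, keeping only the VALID, REPORT, TURBULENCE, LAT, and LON columns. We then remove duplicate reports, drop rows with missing values, convert the literal string 'None' to a missing value, and drop any row without a latitude or longitude. Finally, each month is saved to its own CSV in ./pirep_downloads; a later cell combines them into one CSV for further processing.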
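To see what each cleaning step does without downloading anything, here is a small sketch that applies the same pandas calls to a hand-made DataFrame. The helper name clean_pireps and the sample rows are hypothetical, not real IEM data.

```python
import pandas as pd


def clean_pireps(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same cleaning steps as the download loop (hypothetical helper)."""
    deduped = df.drop_duplicates()   # remove repeated reports
    no_missing = deduped.dropna()    # drop rows with any missing value
    # turn the string 'None' into NaN everywhere, but only drop rows missing LAT/LON
    return no_missing.mask(no_missing.eq("None")).dropna(subset=["LAT", "LON"])


sample = pd.DataFrame({
    "VALID": ["201903010000", "201903010000", "201903010005", "201903010010", "201903010015"],
    "REPORT": ["AAA UA /TB MOD", "AAA UA /TB MOD", "BBB UA /SK CLR", "CCC UA /TB LGT", "DDD UA /TB LGT"],
    "TURBULENCE": ["MOD", "MOD", "None", "LGT", None],
    "LAT": ["30.0", "30.0", "31.0", "None", "32.0"],
    "LON": ["-92.0", "-92.0", "-93.0", "None", "-94.0"],
})

cleaned = clean_pireps(sample)
print(cleaned)
```

Row 1 is removed as a duplicate, row 4 by dropna() (its TURBULENCE is missing), and row 3 because it has no location; row 2 survives with its 'None' turbulence turned into NaN. Depending on the pandas version, read_csv may already convert the literal string 'None' to NaN when the file is read, in which case dropna() also removes reports whose TURBULENCE field was 'None'.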

In [3]:
from pathlib import Path
import pandas as pd

# create the output directory once, before the loop (no error if it already exists)
output_dir = Path('./pirep_downloads')
output_dir.mkdir(parents=True, exist_ok=True)

for f in file_list:
    try:
        # keep only the columns needed for the turbulence analysis; read everything as strings
        df = pd.read_csv(f, usecols=['VALID', 'REPORT', 'TURBULENCE', 'LAT', 'LON'], dtype=str)
    except Exception as err:
        # skip this month rather than re-saving the previous month's DataFrame under a new name
        print("Warning:", f, "could not be read:", err)
        continue

    # remove duplicate reports
    duplicates_rem = df.drop_duplicates()

    # drop rows with any missing (NaN) value
    na_drop = duplicates_rem.dropna()

    # convert the string 'None' to NaN, then drop rows missing latitude or longitude only
    turb = na_drop.mask(na_drop.eq('None')).dropna(subset=['LAT', 'LON'])

    # save each month's DataFrame, named after the URL's query string
    n = f.split("?")[-1] + ".csv"
    turb.to_csv(output_dir / n)
    print("Downloaded", n)

print("Finished.")

Downloaded year1=2015&month1=1&day1=1&hour1=0&minute1=0&year2=2015&month2=1&day2=31&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=2&day1=1&hour1=0&minute1=0&year2=2015&month2=2&day2=28&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=3&day1=1&hour1=0&minute1=0&year2=2015&month2=3&day2=31&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=4&day1=1&hour1=0&minute1=0&year2=2015&month2=4&day2=30&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=5&day1=1&hour1=0&minute1=0&year2=2015&month2=5&day2=31&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=6&day1=1&hour1=0&minute1=0&year2=2015&month2=6&day2=30&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=7&day1=1&hour1=0&minute1=0&year2=2015&month2=7&day2=31&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=8&day1=1&hour1=0&minute1=0&year2=2015&month2=8&day2=31&hour2=23&minute2=59&fmt=csv.csv
Downloaded year1=2015&month1=9&day1=1&hour1=0&minute1=0&year2=2015&month
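The file names above are the raw query strings, which are long and do not sort chronologically (month1=10 sorts before month1=2). A sketch of deriving a shorter, zero-padded name from each URL with urllib.parse is shown below; the pireps_YYYY_MM.csv naming scheme and the helper monthly_filename are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def monthly_filename(url: str) -> str:
    """Short, sortable file name for a monthly request URL (hypothetical helper)."""
    query = parse_qs(urlsplit(url).query)  # e.g. {'year1': ['2015'], 'month1': ['1'], ...}
    year = query["year1"][0]
    month = int(query["month1"][0])
    return f"pireps_{year}_{month:02d}.csv"


url = ("https://mesonet.agron.iastate.edu/cgi-bin/request/gis/pireps.py?"
       "year1=2015&month1=1&day1=1&hour1=0&minute1=0&year2=2015&month2=1&day2=31"
       "&hour2=23&minute2=59&fmt=csv")
print(monthly_filename(url))  # pireps_2015_01.csv
```

Because the month is zero-padded, sorting these names also sorts the files by date.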

#### Combining all Files

This is mainly for the exploratory data analysis (EDA), which is in a separate notebook: I also combined all the monthly DataFrames into a single DataFrame and CSV file.
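A sketch of the combining step as a reusable function is shown below, assuming the monthly files and the combined file share one directory. It skips the combined file so re-running does not double-count rows, reads the files in sorted order, and records which file each row came from. The function name combine_monthly, the SOURCE_FILE column, and the temporary sample files are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_monthly(csv_dir: Path, combined_name: str = "all_pireps.csv") -> pd.DataFrame:
    """Concatenate every monthly CSV in csv_dir except the combined file (hypothetical helper)."""
    files = sorted(p for p in csv_dir.glob("*.csv") if p.name != combined_name)
    # tag each row with its source file, then build one DataFrame with a fresh 0..n-1 index
    frames = [pd.read_csv(p).assign(SOURCE_FILE=p.name) for p in files]
    return pd.concat(frames, ignore_index=True)  # raises ValueError if no files were found


with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    pd.DataFrame({"VALID": [201801010000], "TURBULENCE": ["LGT"]}).to_csv(tmp_dir / "pireps_2018_01.csv", index=False)
    pd.DataFrame({"VALID": [201802010000], "TURBULENCE": ["MOD"]}).to_csv(tmp_dir / "pireps_2018_02.csv", index=False)
    pd.DataFrame({"VALID": [0], "TURBULENCE": ["x"]}).to_csv(tmp_dir / "all_pireps.csv", index=False)
    combined = combine_monthly(tmp_dir)

print(combined)
```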

In [4]:
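import glob
import pandas as pd

# collect the monthly CSVs in a stable (sorted) order, skipping the combined file
# written by the next cell so that re-running this cell doesn't double-count rows
pirep_files = sorted(
    p for p in glob.glob("./pirep_downloads/*.csv")
    if not p.endswith("all_pireps.csv")
)

# stack all months into one DataFrame with a fresh 0..n-1 index
df_all = pd.concat(map(pd.read_csv, pirep_files), ignore_index=True)
print(df_all)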

       Unnamed: 0         VALID  \
0               0  201903010000   
1               1  201903010000   
2               2  201903010000   
3               3  201903010000   
4               4  201903010000   
...           ...           ...   
44917       44937  201805312355   
44918       44938  201805312356   
44919       44939  201805312357   
44920       44940  201805312358   
44921       44941  201805312358   

                                                  REPORT        TURBULENCE  \
0      LNY UA /OV 15W LNY/TM 0000/FL035/TP PA31/SK SC...               NaN   
1      LFT UA /OV LFT180020 /TM 0000 /FL140 /TP E145 ...               NaN   
2      LFT UA /OV LFT180020 /TM 0000 /FL140 /TP E145 ...               NaN   
3      CLL UA /OV CLL /TM 0000 /FL250 /TP B737 /TB MO...  MOD CHOP 250-210   
4      CLL UA /OV CLL /TM 0000 /FL250 /TP B737 /TB MO...  MOD CHOP 250-210   
...                                                  ...               ...   
44917  GTF UA /OV GTF198042/TM 23

In [5]:
from pathlib import Path

output_dir = Path('./pirep_downloads')
# index=False avoids writing yet another unnamed index column into the combined file
df_all.to_csv(output_dir / "all_pireps.csv", index=False)
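For the EDA, the combined file's VALID column (YYYYMMDDHHMM, read back as an integer) and the string LAT/LON columns usually need converting first. A minimal sketch follows, assuming VALID is in UTC and has no missing values. The helper add_timestamps and the VALID_UTC column name are hypothetical.

```python
import pandas as pd


def add_timestamps(df: pd.DataFrame) -> pd.DataFrame:
    """Parse VALID into UTC timestamps and LAT/LON into floats (hypothetical helper)."""
    out = df.copy()
    # VALID looks like 201903010000 (YYYYMMDDHHMM); assumes the times are UTC
    out["VALID_UTC"] = pd.to_datetime(out["VALID"].astype(str), format="%Y%m%d%H%M", utc=True)
    # anything that isn't a number becomes NaN instead of raising
    out["LAT"] = pd.to_numeric(out["LAT"], errors="coerce")
    out["LON"] = pd.to_numeric(out["LON"], errors="coerce")
    return out


sample = pd.DataFrame({
    "VALID": [201903010000, 201805312355],
    "LAT": ["30.2", "47.5"],
    "LON": ["-92.0", "-111.3"],
})
result = add_timestamps(sample)
print(result.dtypes)
```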