## Week 5 Assignment

In this assignment we answer a series of questions about the nycflights13 data, imported into pandas from several .csv files.

In [1]:
import pandas as pd

After importing pandas, we will read the following .csv files into DataFrame objects:

In [8]:
airlines = pd.read_csv('https://raw.githubusercontent.com/tidyverse/nycflights13/main/data-raw/airlines.csv')
airports = pd.read_csv('https://raw.githubusercontent.com/tidyverse/nycflights13/main/data-raw/airports.csv')
planes = pd.read_csv('https://raw.githubusercontent.com/tidyverse/nycflights13/main/data-raw/planes.csv')
weather = pd.read_csv('https://raw.githubusercontent.com/tidyverse/nycflights13/main/data-raw/weather.csv')
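
The outputs further down include an `Unnamed: 0` column, which pandas creates when the first header field of a CSV is empty. As a minimal sketch, assuming the files store a row index in that first column, `index_col=0` tells `read_csv` to use it as the DataFrame index instead. The two-row `sample_csv` buffer is a hypothetical stand-in for the real file so the example runs without a network connection.

```python
import io

import pandas as pd

# Hypothetical two-row sample mimicking a CSV whose first header field is empty
sample_csv = io.StringIO(
    ",faa,name,lat,lon\n"
    "230,BRW,Wiley Post Will Rogers Mem,71.285446,-156.766003\n"
    "444,EPM,Eastport Municipal Airport,44.910111,-67.012694\n"
)

# Default read: the unnamed first column becomes a data column
default_read = pd.read_csv(sample_csv)

# Rewind the buffer and use the first column as the row index instead
sample_csv.seek(0)
indexed_read = pd.read_csv(sample_csv, index_col=0)

print(list(default_read.columns))
print(list(indexed_read.columns))
```

The same keyword could be added to each `read_csv` call above if the extra column is not wanted.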

Now we can begin analyzing the data to answer our questions.

**What is the northernmost airport in the United States?**

In [22]:
airports.sort_values('lat', ascending=False).head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
417,EEN,Dillant Hopkins Airport,72.270833,42.898333,149,-5,A,
230,BRW,Wiley Post Will Rogers Mem,71.285446,-156.766003,44,-9,A,America/Anchorage
110,AIN,Wainwright Airport,70.638056,-159.994722,41,-9,A,America/Anchorage
708,K03,Wainwright As,70.613378,-159.86035,35,-9,A,America/Anchorage
152,ATK,Atqasuk Edward Burnell Sr Memorial Airport,70.4673,-157.436,96,-9,A,America/Anchorage


The result shows the five northernmost rows, sorted by latitude in descending order.
However, the coordinates for Dillant Hopkins Airport are incorrect: its latitude and longitude values appear to be swapped. That row also has a missing 'tzone' value, so we can exclude it by calling dropna on the 'tzone' column, as shown below:

In [23]:
airports.dropna(subset=['tzone']).sort_values('lat', ascending=False).head(1)

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
230,BRW,Wiley Post Will Rogers Mem,71.285446,-156.766003,44,-9,A,America/Anchorage


Passing 1 to the head function limits the output to a single row, which is our **northernmost airport**: **Wiley Post Will Rogers Memorial Airport in Alaska**.
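
Dropping rows with a missing 'tzone' works here only because the bad row happens to lack that value. A more explicit alternative, sketched below under the assumption that EEN is the only row to exclude, filters by FAA code and then picks the maximum latitude with `idxmax`. The `sample` DataFrame is a hand-copied subset of the output above, not the full dataset.

```python
import pandas as pd

# Rows copied from the output above (subset of columns, for illustration only)
sample = pd.DataFrame({
    "faa": ["EEN", "BRW", "AIN"],
    "name": [
        "Dillant Hopkins Airport",
        "Wiley Post Will Rogers Mem",
        "Wainwright Airport",
    ],
    "lat": [72.270833, 71.285446, 70.638056],
    "lon": [42.898333, -156.766003, -159.994722],
    "tzone": [None, "America/Anchorage", "America/Anchorage"],
})

# Exclude the known-bad row by its FAA code instead of relying on a missing tzone
cleaned = sample[sample["faa"] != "EEN"]

# idxmax returns the index label of the largest latitude; loc fetches that row
northernmost = cleaned.loc[cleaned["lat"].idxmax()]
print(northernmost["faa"], northernmost["name"])
```

Naming the excluded row makes the cleaning step visible to a reader, at the cost of having to know the bad record in advance.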

Let's look at the next question:

**What is the easternmost airport in the United States?**

In [47]:
airports.sort_values('lon', ascending=False).head()

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
1290,SYA,Eareckson As,52.712275,174.11362,98,-9,A,America/Anchorage
942,MYF,Montgomery Field,32.4759,117.759,17,8,A,Asia/Chongqing
396,DVT,Deer Valley Municipal Airport,33.4117,112.457,1478,8,A,Asia/Chongqing
417,EEN,Dillant Hopkins Airport,72.270833,42.898333,149,-5,A,
444,EPM,Eastport Municipal Airport,44.910111,-67.012694,45,-5,A,America/New_York


Reusing the approach from the northernmost airport gives a problematic result here.

The issue is that the longitude column contains both positive and negative values. Longitudes west of the Prime Meridian are negative, so among those we want the value closest to zero, that is, the fewest degrees west. We can get this by first filtering out the positive values and then sorting in descending order:

In [46]:
airports[airports.lon < 0].sort_values('lon', ascending=False).head(1)

Unnamed: 0,faa,name,lat,lon,alt,tz,dst,tzone
444,EPM,Eastport Municipal Airport,44.910111,-67.012694,45,-5,A,America/New_York


And here we can see that the **easternmost airport** in the United States is **Eastport Municipal Airport in Maine**.
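
The filter-then-sort step can also be written with `nlargest`, which returns the rows with the largest values in a column without sorting the whole frame first. This is a minimal sketch on a hand-copied subset of the rows shown in the outputs above; the `sample` DataFrame is illustrative, not the full dataset.

```python
import pandas as pd

# Longitudes copied from the outputs above (illustrative subset)
sample = pd.DataFrame({
    "faa": ["SYA", "MYF", "DVT", "EEN", "EPM", "BRW"],
    "lon": [174.11362, 117.759, 112.457, 42.898333, -67.012694, -156.766003],
})

# Keep only western-hemisphere (negative) longitudes
western = sample[sample["lon"] < 0]

# The largest negative longitude is the one closest to zero, i.e. farthest east
easternmost = western.nlargest(1, "lon")
print(easternmost)
```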

The last question asks the following:

**On February 12th, 2013, which New York area airport had the windiest weather?**

To answer this question we start with a boolean condition that filters the rows to month 2 (February) and day 12. The following code keeps only the rows that match *both* conditions and sorts them by wind speed in descending order:

In [53]:
weather[(weather.month == 2) & (weather.day == 12)].sort_values('wind_speed', ascending=False).head()

Unnamed: 0,origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib,time_hour
1009,EWR,2013,2,12,3,39.02,26.96,61.63,260.0,1048.36058,,0.0,1008.3,10.0,2013-02-12T08:00:00Z
18417,LGA,2013,2,12,2,42.98,26.06,50.94,290.0,23.0156,31.07106,0.0,1007.1,10.0,2013-02-12T07:00:00Z
1018,EWR,2013,2,12,12,44.06,26.06,48.87,270.0,21.86482,31.07106,0.0,1012.5,10.0,2013-02-12T17:00:00Z
18428,LGA,2013,2,12,13,44.06,23.0,43.02,300.0,21.86482,25.31716,0.0,1011.7,10.0,2013-02-12T18:00:00Z
18429,LGA,2013,2,12,14,44.06,23.0,43.02,300.0,20.71404,25.31716,0.0,1011.5,10.0,2013-02-12T19:00:00Z


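However, there is an issue with this output. The wind speed of 1048.36 at EWR is an implausible outlier and seems to be erroneous. That row also has a missing wind_gust value, so calling dropna removes it, along with any other rows that contain missing values: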

In [55]:
weather[(weather.month == 2) & (weather.day == 12)].dropna().sort_values('wind_speed', ascending=False).head(1)

Unnamed: 0,origin,year,month,day,hour,temp,dewp,humid,wind_dir,wind_speed,wind_gust,precip,pressure,visib,time_hour
18417,LGA,2013,2,12,2,42.98,26.06,50.94,290.0,23.0156,31.07106,0.0,1007.1,10.0,2013-02-12T07:00:00Z


Finally, we see that the **windiest airport** on February 12th, 2013, was **LaGuardia (LGA)**.
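
The `dropna()` call removes the EWR row only because its wind_gust value is missing, and it would also discard any valid reading that lacks a gust measurement. One alternative, sketched here, adds an explicit plausibility check on wind_speed to the boolean mask. The cutoff `MAX_PLAUSIBLE_WIND` is a hypothetical value chosen for illustration, and `sample` is a hand-copied subset of the rows shown above.

```python
import pandas as pd

# Rows copied from the output above (subset of columns, for illustration only)
sample = pd.DataFrame({
    "origin": ["EWR", "LGA", "EWR", "LGA"],
    "month": [2, 2, 2, 2],
    "day": [12, 12, 12, 12],
    "hour": [3, 2, 12, 13],
    "wind_speed": [1048.36058, 23.0156, 21.86482, 21.86482],
    "wind_gust": [None, 31.07106, 31.07106, 25.31716],
})

# Hypothetical cutoff: readings above this are treated as sensor or entry errors
MAX_PLAUSIBLE_WIND = 200

# Each condition is wrapped in parentheses because & binds tighter than == and <
mask = (
    (sample["month"] == 2)
    & (sample["day"] == 12)
    & (sample["wind_speed"] < MAX_PLAUSIBLE_WIND)
)

# Row with the highest remaining wind speed
windiest = sample[mask].nlargest(1, "wind_speed")
print(windiest[["origin", "hour", "wind_speed"]])
```

On these sample rows the filter keeps every reading except the EWR outlier, so the answer does not depend on whether wind_gust happens to be recorded.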