# CSCI 490 Assignment 3
#### Instructor: Dr. David Koop
#### Programmer: Dominykas Karalius
#### Due at 11:59pm on March 3rd, Tuesday
#### Z1809478


In [259]:
import os
from urllib.request import urlretrieve

url = "https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2018-120319.txt"
local_fname = "hurdat2.txt"
# Download the file only if a local copy does not already exist.
if not os.path.exists(local_fname):
    urlretrieve(url, local_fname)

## a. Reading Data and Naming Columns (10 pts)

In [260]:
import pandas as pd
import numpy as np

df = pd.read_csv('hurdat2.txt',names=['date','time','record_id','status','latitude','longitude','max_wind','min_pressure'])

In [261]:
df

Unnamed: 0,date,time,record_id,status,latitude,longitude,max_wind,min_pressure
0,AL011851,UNNAMED,14,,,,,
1,18510625,0000,,HU,28.0N,94.8W,80.0,-999.0
2,18510625,0600,,HU,28.0N,95.4W,80.0,-999.0
3,18510625,1200,,HU,28.0N,96.0W,80.0,-999.0
4,18510625,1800,,HU,28.1N,96.5W,80.0,-999.0
...,...,...,...,...,...,...,...,...
53214,20181103,1200,,EX,57.9N,19.6W,55.0,960.0
53215,20181103,1800,,EX,58.9N,17.1W,50.0,964.0
53216,20181104,0000,,EX,59.8N,14.5W,45.0,968.0
53217,20181104,0600,,EX,60.8N,12.1W,40.0,973.0


## b. Extract and Fill Identifiers (15 pts)
Clearly, we still have an issue in that the rows that are hurricane identifiers are mixed in with rows that have tracking information. We want this information to be attached to each tracking point row. 

This involves four steps:

1. Extract the identifier and name from the identifier rows, and put them in new columns named identifier and name.
2. Fill in this information for the tracking points.
3. Delete the rows that contain only identifier information.
4. Move the new columns to the front of the data frame.
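The four steps can be sketched on a small toy frame before applying them to the full file. This is only an illustration under assumed data: the rows below are made up to mimic the layout (a header row followed by tracking rows), and the `where`/`ffill` approach is one option among several, not necessarily the one used in the cell that follows.

```python
import pandas as pd

# Toy stand-in for the raw file (illustrative values, not real records):
# header rows carry an identifier and a name, tracking rows a date and time.
toy = pd.DataFrame({
    "date": ["AL011851", "18510625", "18510625", "AL021851", "18510705"],
    "time": ["UNNAMED", "0000", "0600", "TOY", "1200"],
})

# Step 1: header rows are the ones whose first field starts with a letter.
is_header = toy["date"].str.match(r"[A-Z]")

# Steps 1-2: keep header values only on header rows, then forward-fill them
# down to the tracking rows that follow each header.
toy["identifier"] = toy["date"].where(is_header).ffill()
toy["name"] = toy["time"].where(is_header).ffill()

# Step 3: drop the header rows. Step 4: move the new columns to the front.
toy = toy[~is_header]
toy = toy[["identifier", "name", "date", "time"]]
print(toy)
```

Forward-filling works here because every tracking row belongs to the most recent header row above it.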

In [262]:
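# Header rows start with a letter (e.g. AL011851); copy their values into new
# columns and forward-fill them down to the tracking rows that follow.
df['identifier'] = df['date'].str.extract(r'([A-Z].{7})', expand=False).ffill()
df['name'] = df['time'].str.extract(r'([A-Z].*)', expand=False).ffill()
# Keep the date only on tracking rows; header rows become NaN here.
df['date'] = df['date'].str.extract(r'([0-9].{7})', expand=False)
# Whitespace-only record identifiers are treated as missing.
df['record_id'] = df['record_id'].replace(r'^\s*$', np.nan, regex=True)
# Drop the header rows, then move the new columns to the front.
df = df.dropna(subset=['date'])
df = df[['identifier', 'name', 'date', 'time', 'record_id', 'status', 'latitude', 'longitude', 'max_wind', 'min_pressure']]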
df

Unnamed: 0,identifier,name,date,time,record_id,status,latitude,longitude,max_wind,min_pressure
1,AL011851,UNNAMED,18510625,0000,,HU,28.0N,94.8W,80.0,-999.0
2,AL011851,UNNAMED,18510625,0600,,HU,28.0N,95.4W,80.0,-999.0
3,AL011851,UNNAMED,18510625,1200,,HU,28.0N,96.0W,80.0,-999.0
4,AL011851,UNNAMED,18510625,1800,,HU,28.1N,96.5W,80.0,-999.0
5,AL011851,UNNAMED,18510625,2100,L,HU,28.2N,96.8W,80.0,-999.0
...,...,...,...,...,...,...,...,...,...,...
53214,AL162018,OSCAR,20181103,1200,,EX,57.9N,19.6W,55.0,960.0
53215,AL162018,OSCAR,20181103,1800,,EX,58.9N,17.1W,50.0,964.0
53216,AL162018,OSCAR,20181104,0000,,EX,59.8N,14.5W,45.0,968.0
53217,AL162018,OSCAR,20181104,0600,,EX,60.8N,12.1W,40.0,973.0


## c. Replace Missing Value Placeholders (10 pts)
We wish to replace hurricane names of UNNAMED, the max_wind values of -99, and the minimum pressure values of -999 with NaN (np.nan).
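As a minimal sketch of the idea, the placeholders can be mapped per column in a single `DataFrame.replace` call. The toy frame below is an assumption made for illustration; only the placeholder values (UNNAMED, -99, -999) come from the task description.

```python
import numpy as np
import pandas as pd

# Toy frame with one placeholder row and one real row (illustrative values).
toy = pd.DataFrame({
    "name": ["UNNAMED", "OSCAR"],
    "max_wind": [-99.0, 55.0],
    "min_pressure": [-999.0, 960.0],
})

# The dict maps each column to its own placeholder, so a placeholder is only
# replaced in the column where it actually means "missing".
placeholders = {"name": "UNNAMED", "max_wind": -99.0, "min_pressure": -999.0}
toy = toy.replace(placeholders, np.nan)

# Count the missing values per column.
print(toy.isna().sum())
```

Scoping each placeholder to its column avoids blanking a value that happens to equal another column's sentinel.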

In [263]:
# Replace each column's placeholder with NaN using exact-value matches.
df['name'] = df['name'].replace('UNNAMED', np.nan)
df['max_wind'] = df['max_wind'].replace(-99.0, np.nan)
df['min_pressure'] = df['min_pressure'].replace(-999.0, np.nan)
df

Unnamed: 0,identifier,name,date,time,record_id,status,latitude,longitude,max_wind,min_pressure
1,AL011851,,18510625,0000,,HU,28.0N,94.8W,80.0,
2,AL011851,,18510625,0600,,HU,28.0N,95.4W,80.0,
3,AL011851,,18510625,1200,,HU,28.0N,96.0W,80.0,
4,AL011851,,18510625,1800,,HU,28.1N,96.5W,80.0,
5,AL011851,,18510625,2100,L,HU,28.2N,96.8W,80.0,
...,...,...,...,...,...,...,...,...,...,...
53214,AL162018,OSCAR,20181103,1200,,EX,57.9N,19.6W,55.0,960.0
53215,AL162018,OSCAR,20181103,1800,,EX,58.9N,17.1W,50.0,964.0
53216,AL162018,OSCAR,20181104,0000,,EX,59.8N,14.5W,45.0,968.0
53217,AL162018,OSCAR,20181104,0600,,EX,60.8N,12.1W,40.0,973.0


## d. Create a timestamp (10 pts)
Right now, we have two columns for date and time. This makes it difficult to calculate the amount of time between two different hurricane tracking points. If we convert them to a timestamp, such calculations are easy. To do this, we can use pandas’ to_datetime method. This method can convert from strings to timestamps. In our case, if we concatenate the date and time columns, and feed the concatenated series to to_datetime, things should work. Add the new column named as datetime and remove the old date and time columns. Move this column to appear after the name column.
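To illustrate why timestamps help, here is a small sketch on assumed toy data: the date and time strings are concatenated, parsed with `pd.to_datetime` using an explicit format, and the gap between consecutive points then falls out of a subtraction. The format string `%Y%m%d%H%M` assumes dates like 18510625 and times like 0600, as seen in the output above.

```python
import pandas as pd

# Toy tracking points (illustrative values in the file's date/time layout).
toy = pd.DataFrame({
    "date": ["18510625", "18510625", "18510625"],
    "time": ["0000", "0600", "2100"],
})

# Concatenate the two string columns and parse them with an explicit format.
# str.strip() guards against padding left over from the comma-separated file.
stamp = toy["date"].str.strip() + toy["time"].str.strip()
toy["datetime"] = pd.to_datetime(stamp, format="%Y%m%d%H%M")

# With real timestamps, the time between consecutive points is a subtraction.
elapsed = toy["datetime"].diff()
print(elapsed)
```

Passing `format` explicitly avoids relying on pandas to guess how a run of twelve digits should be split.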

In [264]:
df['datetime'] = df['date'].str.cat(df['time'],sep=" ")
df

Unnamed: 0,identifier,name,date,time,record_id,status,latitude,longitude,max_wind,min_pressure,datetime
1,AL011851,,18510625,0000,,HU,28.0N,94.8W,80.0,,18510625 0000
2,AL011851,,18510625,0600,,HU,28.0N,95.4W,80.0,,18510625 0600
3,AL011851,,18510625,1200,,HU,28.0N,96.0W,80.0,,18510625 1200
4,AL011851,,18510625,1800,,HU,28.1N,96.5W,80.0,,18510625 1800
5,AL011851,,18510625,2100,L,HU,28.2N,96.8W,80.0,,18510625 2100
...,...,...,...,...,...,...,...,...,...,...,...
53214,AL162018,OSCAR,20181103,1200,,EX,57.9N,19.6W,55.0,960.0,20181103 1200
53215,AL162018,OSCAR,20181103,1800,,EX,58.9N,17.1W,50.0,964.0,20181103 1800
53216,AL162018,OSCAR,20181104,0000,,EX,59.8N,14.5W,45.0,968.0,20181104 0000
53217,AL162018,OSCAR,20181104,0600,,EX,60.8N,12.1W,40.0,973.0,20181104 0600


In [265]:
# Parse the date and time strings into real timestamps with to_datetime.
# str.strip() removes any padding around the fields before concatenation.
df['datetime'] = pd.to_datetime(df['date'].str.strip() + df['time'].str.strip(), format='%Y%m%d%H%M')

# Selecting the columns drops date and time and places datetime after name.
df = df[['identifier', 'name', 'datetime', 'record_id', 'status', 'latitude', 'longitude', 'max_wind', 'min_pressure']]
df

Unnamed: 0,identifier,name,datetime,record_id,status,latitude,longitude,max_wind,min_pressure
1,AL011851,,1851-06-25 00:00:00,,HU,28.0N,94.8W,80.0,
2,AL011851,,1851-06-25 06:00:00,,HU,28.0N,95.4W,80.0,
3,AL011851,,1851-06-25 12:00:00,,HU,28.0N,96.0W,80.0,
4,AL011851,,1851-06-25 18:00:00,,HU,28.1N,96.5W,80.0,
5,AL011851,,1851-06-25 21:00:00,L,HU,28.2N,96.8W,80.0,
...,...,...,...,...,...,...,...,...,...
53214,AL162018,OSCAR,2018-11-03 12:00:00,,EX,57.9N,19.6W,55.0,960.0
53215,AL162018,OSCAR,2018-11-03 18:00:00,,EX,58.9N,17.1W,50.0,964.0
53216,AL162018,OSCAR,2018-11-04 00:00:00,,EX,59.8N,14.5W,45.0,968.0
53217,AL162018,OSCAR,2018-11-04 06:00:00,,EX,60.8N,12.1W,40.0,973.0
