# Cleaning Typhoon and Hurricane Data

### This script generates a tidy data frame of the hurricanes and typhoons that occur in the vicinity of our weather stations.

The initial dataset for this challenge comes from https://www.kaggle.com/noaa/hurricane-database. It consists of all recorded hurricanes and typhoons in the Atlantic and Pacific. As we are only interested in events occurring around Florida, we have to trim the dataset considerably. Somewhat arbitrarily, we remove all track records that fall outside the box shown below.

![title](pics/Region-of-Interest.png)

The box spans roughly 22.5° N to 31.5° N and 70° W to 93.5° W.
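As a rough illustration of the bounding box, the sketch below builds a boolean mask for rows that fall strictly inside it. It assumes the coordinates are already numeric, with longitude stored as positive degrees west; the constant names, the `in_region` helper, and the sample rows are made up for the example and are not part of the notebook.

```python
import pandas as pd

# Bounds quoted in the text; longitude is assumed to be positive degrees west.
LAT_MIN, LAT_MAX = 22.5, 31.5
LON_W_MIN, LON_W_MAX = 70.0, 93.5

def in_region(df: pd.DataFrame) -> pd.Series:
    """Return a boolean mask of rows strictly inside the box (hypothetical helper)."""
    lat_ok = (df["Latitude"] > LAT_MIN) & (df["Latitude"] < LAT_MAX)
    lon_ok = (df["Longitude"] > LON_W_MIN) & (df["Longitude"] < LON_W_MAX)
    return lat_ok & lon_ok

# Made-up sample rows: one inside the box, one too far north, one too far west.
sample = pd.DataFrame({
    "Latitude": [28.2, 35.0, 25.0],
    "Longitude": [74.2, 80.0, 96.8],
})
mask = in_region(sample)
print(sample[mask])
```

Keeping the bounds in named constants makes it easier to adjust the region later without hunting through the filter expressions.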

In [82]:
import os
import pandas as pd

atlantic_df = pd.read_csv("atlantic.csv")
#atlantic_df.set_index("Date", inplace = True)
atlantic_df.head(5)

| | ID | Name | Date | Time | Event | Status | Latitude | Longitude | Maximum Wind | Minimum Pressure | ... | Low Wind SW | Low Wind NW | Moderate Wind NE | Moderate Wind SE | Moderate Wind SW | Moderate Wind NW | High Wind NE | High Wind SE | High Wind SW | High Wind NW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | AL011851 | UNNAMED | 18510625 | 0 | | HU | 28.0N | 94.8W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 1 | AL011851 | UNNAMED | 18510625 | 600 | | HU | 28.0N | 95.4W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 2 | AL011851 | UNNAMED | 18510625 | 1200 | | HU | 28.0N | 96.0W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 3 | AL011851 | UNNAMED | 18510625 | 1800 | | HU | 28.1N | 96.5W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 4 | AL011851 | UNNAMED | 18510625 | 2100 | L | HU | 28.2N | 96.8W | 80 | -999 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
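The preview shows -999 in many columns. In this dataset that value appears to mark a missing measurement rather than a real reading. The notebook leaves these values untouched; the sketch below shows one possible way to convert them to `NaN` so they do not distort later statistics. The sample frame is invented for illustration.

```python
import numpy as np
import pandas as pd

# Made-up rows mimicking the preview: -999 stands in for "not recorded".
sample = pd.DataFrame({
    "Maximum Wind": [80, 30],
    "Minimum Pressure": [-999, 1011],
    "Low Wind NE": [-999, -999],
})

# Replace the sentinel with NaN across every column.
cleaned = sample.replace(-999, np.nan)

# Count the missing entries per column.
print(cleaned.isna().sum())
```

An alternative under the same assumption is to pass `na_values=-999` to `pd.read_csv`, which applies the conversion at load time.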


We begin by removing all records outside our dates of interest and outside the region of interest shown on the map above.

In [83]:
# keep only records dated after 2000-01-01 and before 2012-01-01 (both bounds exclusive);
# .copy() avoids a SettingWithCopyWarning when the columns are reassigned below
atlantic_df = atlantic_df[(atlantic_df['Date'] > 20000101) & (atlantic_df['Date'] < 20120101)].copy()

# strip the hemisphere letter and convert to float; this keeps unsigned magnitudes,
# so the bounds below assume latitudes in degrees north and longitudes in degrees west
atlantic_df['Latitude'] = atlantic_df['Latitude'].str.strip().str.rstrip('NS').astype(float)
atlantic_df['Longitude'] = atlantic_df['Longitude'].str.strip().str.rstrip('WE').astype(float)

# keep latitudes between the south (22.5) and north (31.5) bounds
atlantic_df = atlantic_df[(atlantic_df['Latitude'] > 22.5) & (atlantic_df['Latitude'] < 31.5)]
# keep longitudes between the east (70.0) and west (93.5) bounds
atlantic_df = atlantic_df[(atlantic_df['Longitude'] > 70.0) & (atlantic_df['Longitude'] < 93.5)]


atlantic_df.head()



| | ID | Name | Date | Time | Event | Status | Latitude | Longitude | Maximum Wind | Minimum Pressure | ... | Low Wind SW | Low Wind NW | Moderate Wind NE | Moderate Wind SE | Moderate Wind SW | Moderate Wind NW | High Wind NE | High Wind SE | High Wind SW | High Wind NW |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 41257 | AL042000 | UNNAMED | 20000808 | 1200 | | TD | 28.2 | 74.2 | 30 | 1011 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 41258 | AL042000 | UNNAMED | 20000808 | 1800 | | TD | 28.1 | 75.1 | 30 | 1010 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 41259 | AL042000 | UNNAMED | 20000809 | 0 | | TD | 28.0 | 76.0 | 30 | 1010 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 41260 | AL042000 | UNNAMED | 20000809 | 600 | | TD | 27.9 | 76.8 | 30 | 1010 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
| 41261 | AL042000 | UNNAMED | 20000809 | 1200 | | TD | 27.9 | 77.4 | 30 | 1010 | ... | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 | -999 |
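The filtering cell above strips the hemisphere letter, so a southern latitude or an eastern longitude would be treated the same as a northern or western one. That is probably harmless for a box around Florida, but a signed conversion is safer in general. The sketch below is one way to do it; `to_signed_degrees` is a hypothetical helper and the sample values are invented.

```python
import pandas as pd

def to_signed_degrees(value: str) -> float:
    """Convert strings such as '28.0N' or '94.8W' to signed decimal degrees.

    Hypothetical helper: north and east are positive, south and west negative.
    """
    text = value.strip()
    magnitude = float(text[:-1])
    return -magnitude if text[-1] in "SW" else magnitude

# Made-up coordinates covering all four hemisphere letters.
raw = pd.DataFrame({
    "Latitude": ["28.0N", "10.5S"],
    "Longitude": ["94.8W", " 5.0E"],
})

# Apply the conversion element-wise to both columns.
signed = raw.apply(lambda col: col.map(to_signed_degrees))
print(signed)
```

With signed values, the longitude bounds for the region of interest would become -93.5 to -70.0 instead of 70.0 to 93.5.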


Now that we have removed the unnecessary data, we save the result as a CSV file called "cleaned_typhoon_data.csv". This will constitute our labels for the classifier.

In [None]:
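# save the filtered records, which serve as the labels for the classifier
atlantic_df.to_csv("cleaned_typhoon_data.csv")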