# Data Preprocessing - The NHC Cyclones Dataset

This is a data preprocessing project on a dataset of observation records of Atlantic tropical cyclones released by the National Hurricane Center (NHC). The objective is to transform the original dataset into a "tidy" one (as defined by Hadley Wickham) that meets the following criteria: 1) it includes only observations from 2004-2017 that were made in the continental U.S.; 2) it drops observations of tropical cyclones that never made landfall or never became hurricanes.

First load the necessary libraries.

In [1]:
import numpy as np # library for working with matrices / also supports many different math tools
import pandas as pd # library for data wrangling and analysis

In [2]:
cyclones = pd.read_csv("https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2019-052520.txt") # read in the data



In [3]:
cyclones.head(20)

Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3.1,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14,Unnamed: 15,Unnamed: 16,AL011851,UNNAMED,14,Unnamed: 3
18510625,0000,,HU,28.0N,94.8W,80.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510625,0600,,HU,28.0N,95.4W,80.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510625,1200,,HU,28.0N,96.0W,80.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510625,1800,,HU,28.1N,96.5W,80.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510625,2100,L,HU,28.2N,96.8W,80.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510626,0000,,HU,28.2N,97.0W,70.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510626,0600,,TS,28.3N,97.6W,60.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510626,1200,,TS,28.4N,98.3W,60.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510626,1800,,TS,28.6N,98.9W,50.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
18510627,0000,,TS,29.0N,99.4W,50.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,-999.0,
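
The odd column names in the output above (`AL011851`, `UNNAMED`, `14`) appear because `read_csv` used the first header line of the file as the column header. Below is a minimal sketch of one way to avoid that. It assumes every line of the raw file ends with a trailing comma (as the empty last column above suggests); the column names are illustrative choices and are not part of the NHC file.

```python
import io
import pandas as pd

# Two sample lines: a header line and a data line.
# ASSUMPTION: the trailing commas mimic the raw file layout.
sample = (
    "AL092011,              IRENE,     39,\n"
    "20110828, 0935, L, TS, 39.4N,  74.4W,  60,  959,  230,  280,  160,  110,"
    "  150,  150,   80,   30,    0,    0,    0,    0,\n"
)

# Illustrative column names (not defined by the NHC file itself).
radii = [f"{q}{kt}" for kt in (34, 50, 64) for q in ("ne", "se", "sw", "nw")]
columns = ["date", "time", "record_id", "status", "latitude", "longitude",
           "max_wind", "min_pressure"] + radii + ["trailing"]

raw = pd.read_csv(
    io.StringIO(sample),
    header=None,            # do not consume the first line as column names
    names=columns,          # short header lines are padded with NaN
    dtype=str,              # keep everything as text, e.g. "0935" stays "0935"
    skipinitialspace=True,  # drop the padding spaces after each comma
)
print(raw[["date", "time", "record_id", "status"]])
```

Reading every column as text sidesteps mixed-type columns, at the cost of converting the numeric columns explicitly later.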


The full documentation of the dataset format can be found here: https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-nov2019.pdf


Below is a simple summary:

Each row contains several pieces of information about the status of a tropical cyclone that occurred in the Atlantic basin at a specific date and time.

Each cyclone record consists of two kinds of lines: a header line and data lines. The header line gives the basic information about the cyclone (i.e. name/year), and the data lines provide status updates, in most cases every six hours.

The header line looks something like this:

AL092011, IRENE, 39

Breakdown:

AL - Basin(Atlantic)

09 – Unique ATCF cyclone number for that year

2011 - Year

IRENE – Name, if available, or else “UNNAMED” 

39 - Number of 'status records' for the cyclone
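
The header breakdown above can be sketched as a small parser. This is only an illustration: `CycloneHeader` and `parse_header` are hypothetical names introduced here, not part of any library or of the notebook.

```python
from typing import NamedTuple


class CycloneHeader(NamedTuple):
    """Fields of a header line (hypothetical container for illustration)."""
    basin: str
    number: int
    year: int
    name: str
    n_records: int


def parse_header(line: str) -> CycloneHeader:
    # Split on commas and strip the padding around each field.
    code, name, count = [field.strip() for field in line.split(",")[:3]]
    # The code is laid out as: 2-letter basin, 2-digit number, 4-digit year.
    return CycloneHeader(code[:2], int(code[2:4]), int(code[4:8]), name, int(count))


header = parse_header("AL092011, IRENE, 39")
print(header)
```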


The data lines would look something like this:

20110828, 0935, L, TS, 39.4N, 74.4W, 60, 959, 230, 280, 160, 110, 150, 150, 80, 30, 0, 0, 0, 0

Breakdown:

2011 – Year

08 – Month

28 – Day

09 – Hours in UTC/GMT

35 – Minutes

L – Record identifier (indicates a landfall, or the reason for including a record that is not at the standard synoptic times (0000, 0600, 1200, and 1800 UTC))

TS - Type of status (for instance, TD – Tropical cyclone of tropical depression intensity (< 34 knots) / TS – Tropical cyclone of tropical storm intensity (34-63 knots), etc.)

39.4N – Latitude

74.4W – Longitude

60 – Maximum sustained wind (in knots)

959 – Minimum Pressure (in millibars)

230 – 34 kt wind radii maximum extent in northeastern quadrant (in nautical miles) 

280 – 34 kt wind radii maximum extent in southeastern quadrant (in nautical miles) 

160 – 34 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

110 – 34 kt wind radii maximum extent in northwestern quadrant (in nautical miles) 

150 – 50 kt wind radii maximum extent in northeastern quadrant (in nautical miles) 

150 – 50 kt wind radii maximum extent in southeastern quadrant (in nautical miles) 

80 – 50 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

30 – 50 kt wind radii maximum extent in northwestern quadrant (in nautical miles) 

0 – 64 kt wind radii maximum extent in northeastern quadrant (in nautical miles)

0 – 64 kt wind radii maximum extent in southeastern quadrant (in nautical miles)

0 – 64 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

0 – 64 kt wind radii maximum extent in northwestern quadrant (in nautical miles)
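
As a rough illustration of the breakdown above, the sample data line can be split into named values with the standard library alone. The variable and key names (`observed_at`, `radii`, `ne34`, ...) are choices made for this sketch.

```python
from datetime import datetime, timezone

line = ("20110828, 0935, L, TS, 39.4N, 74.4W, 60, 959, "
        "230, 280, 160, 110, 150, 150, 80, 30, 0, 0, 0, 0")
fields = [field.strip() for field in line.split(",")]

# Date (YYYYMMDD) and time (HHMM) are given in UTC.
observed_at = datetime.strptime(fields[0] + fields[1], "%Y%m%d%H%M").replace(
    tzinfo=timezone.utc
)
record_id, status = fields[2], fields[3]
max_wind_kt, min_pressure_mb = int(fields[6]), int(fields[7])

# Twelve wind radii: 34, 50 and 64 kt, each in NE, SE, SW, NW order.
radii_names = [f"{q}{kt}" for kt in (34, 50, 64) for q in ("ne", "se", "sw", "nw")]
radii = dict(zip(radii_names, map(int, fields[8:20])))

print(observed_at.isoformat(), record_id, status, max_wind_kt, min_pressure_mb)
print(radii)
```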

# Data Preprocessing

We are only interested in observations recorded between 2004 and 2017.

In [4]:
cyclones = cyclones.reset_index() # first convert all indices to columns
cyclones = cyclones[cyclones["level_0"].str.match("200[4-9][0-9]{4}|201[0-7][0-9]{4}|AL[0-9]{2}200[4-9]|AL[0-9]{2}201[0-7]")] # only include 2004-2017
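
To see what the pattern above keeps, here is a small check on a handful of made-up test strings (the sample values are assumptions chosen to hit each branch). `str.match` anchors at the start of the string, so the year digits must appear at the beginning of a date or right after the `AL` prefix and cyclone number of a header code.

```python
import pandas as pd

pattern = "200[4-9][0-9]{4}|201[0-7][0-9]{4}|AL[0-9]{2}200[4-9]|AL[0-9]{2}201[0-7]"

# Made-up sample values: header codes and dates inside and outside 2004-2017.
samples = pd.Series(
    ["AL012004", "20040731", "20171109", "AL192017", "20031231", "20180525", "AL011851"]
)
keep = samples.str.match(pattern)
print(samples[keep].tolist())
```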

In [5]:
cyclones.head(10)

Unnamed: 0,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,...,level_11,level_12,level_13,level_14,level_15,level_16,AL011851,UNNAMED,14,Unnamed: 3
45166,AL012004,ALEX,25.0,,,,,,,,...,,,,,,,,,,
45167,20040731,1800,,TD,30.3N,78.3W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45168,20040801,0000,,TD,31.0N,78.8W,25.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45169,20040801,0600,,TD,31.5N,79.0W,25.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45170,20040801,1200,,TD,31.6N,79.1W,30.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45171,20040801,1800,,TS,31.6N,79.2W,35.0,1009.0,0.0,50.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45172,20040802,0000,,TS,31.5N,79.3W,35.0,1007.0,0.0,50.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45173,20040802,0600,,TS,31.4N,79.4W,40.0,1005.0,60.0,90.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
45174,20040802,1200,,TS,31.3N,79.0W,50.0,992.0,75.0,90.0,...,20.0,30.0,30.0,0.0,0.0,0.0,0.0,0.0,0.0,
45175,20040802,1800,,TS,31.8N,78.7W,50.0,993.0,75.0,90.0,...,30.0,30.0,30.0,20.0,20.0,0.0,0.0,0.0,0.0,


It would be better to have separate columns for names (e.g. 'ALEX') and unique labels (e.g. 'AL012004').

In [6]:
# header lines 
headers = cyclones[cyclones["level_1"].str.match(r"\s+[A-Z]+")] # raw string so that \s is not treated as an invalid escape

# new column for names
n1 = headers["level_1"]*headers["level_2"].astype(int)
n2 = list(n1.str.split(" "))

import itertools
n3 = list(itertools.chain(*n2))
while "" in n3: 
    n3.remove("")

# new column for unique labels
u1 = (headers["level_0"] + " ")*headers["level_2"].astype(int)
u2 = list(u1.str.split(" "))
u3 = list(itertools.chain(*u2))
while "" in u3:
    u3.remove("")
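
The cell above repeats each header value once per status record. A different technique that reaches the same result is to forward-fill the header values down to the data rows. The sketch below uses a toy frame (values assumed for illustration) and is not the approach used in this notebook.

```python
import pandas as pd

# Toy frame mixing header rows and data rows, as in the raw file.
toy = pd.DataFrame({
    "level_0": ["AL012004", "20040731", "20040801", "AL022004", "20040803"],
    "level_1": ["               ALEX", " 1800", " 0000", "             BONNIE", " 1200"],
})

is_header = toy["level_0"].str.startswith("AL")

# Keep the value only on header rows, then carry it down to the rows below.
toy["unique_code"] = toy["level_0"].where(is_header).ffill()
toy["name"] = toy["level_1"].str.strip().where(is_header).ffill()

tidy = toy[~is_header].reset_index(drop=True)
print(tidy)
```

Forward-filling does not rely on the record count in the header being accurate, but it does assume every data row is preceded by its own header row.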

In [7]:
# drop all header lines
cyclones.drop(headers.index, inplace = True)

cyclones["name"] = pd.Series(n3, index = cyclones.index) # add the new "name" column to the dataset
cyclones["unique_code"] = pd.Series(u3, index = cyclones.index) # add the new "unique_code" column to the dataset

cyclones.head(10)

Unnamed: 0,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,...,level_13,level_14,level_15,level_16,AL011851,UNNAMED,14,Unnamed: 3,name,unique_code
45167,20040731,1800,,TD,30.3N,78.3W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45168,20040801,0,,TD,31.0N,78.8W,25.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45169,20040801,600,,TD,31.5N,79.0W,25.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45170,20040801,1200,,TD,31.6N,79.1W,30.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45171,20040801,1800,,TS,31.6N,79.2W,35.0,1009.0,0.0,50.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45172,20040802,0,,TS,31.5N,79.3W,35.0,1007.0,0.0,50.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45173,20040802,600,,TS,31.4N,79.4W,40.0,1005.0,60.0,90.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45174,20040802,1200,,TS,31.3N,79.0W,50.0,992.0,75.0,90.0,...,30.0,0.0,0.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45175,20040802,1800,,TS,31.8N,78.7W,50.0,993.0,75.0,90.0,...,30.0,20.0,20.0,0.0,0.0,0.0,0.0,,ALEX,AL012004
45176,20040803,0,,TS,32.4N,78.2W,60.0,987.0,75.0,90.0,...,30.0,20.0,20.0,0.0,0.0,0.0,0.0,,ALEX,AL012004


The above new dataset now contains the columns "name" and "unique_code". 

Next, drop all observations of tropical cyclones that never made landfall (i.e. if a cyclone made landfall at some point, keep all status rows for that cyclone).

Here the "level_2" column contains all the information for this.
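
The rule "keep every row of a cyclone if any of its rows is a landfall record" can also be written as a single group-wise mask. Below is a minimal sketch on toy rows, using `groupby(...).transform("any")` instead of the list-based approach in the notebook cells.

```python
import pandas as pd

# Toy rows: the first cyclone has no landfall record, the second has one.
toy = pd.DataFrame({
    "unique_code": ["AL012004", "AL012004", "AL022004", "AL022004"],
    "level_2": ["", "", "", " L"],
})

# Row-level flag: is this particular record a landfall record?
toy["landfall"] = toy["level_2"].str.strip() == "L"

# Group-level flag broadcast back to every row of the same cyclone.
made_landfall = toy.groupby("unique_code")["landfall"].transform("any")
kept = toy[made_landfall]
print(kept)
```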

In [8]:
# first create a temporary boolean column indicating whether the row has the value "L" for "level_2"
cyclones["landfall"] = cyclones["level_2"] == " L"

# use groupby aggregation to count the "L" records per unique code, then keep the codes with at least one
l_or_not = cyclones.groupby("unique_code")["landfall"].agg(np.sum)
l = list(l_or_not[l_or_not > 0].index) # unique_codes to keep

In [9]:
# keep every row whose unique_code belongs to a cyclone that made landfall
cyclones = cyclones[cyclones["unique_code"].isin(l)]
cyclones.head(10) # only cyclones that made landfalls

Unnamed: 0,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,...,level_14,level_15,level_16,AL011851,UNNAMED,14,Unnamed: 3,name,unique_code,landfall
45193,20040803,1200,,TD,12.9N,53.6W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45194,20040803,1800,,TD,13.2N,55.4W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45195,20040804,0,,TD,13.5N,57.4W,30.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45196,20040804,600,,TD,13.6N,59.5W,30.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45197,20040804,1200,,TD,13.6N,61.6W,30.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45198,20040804,1800,,WV,13.7N,63.7W,30.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45199,20040805,0,,WV,14.0N,65.7W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45200,20040805,600,,WV,14.9N,67.7W,25.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45201,20040805,1200,,WV,16.0N,69.7W,25.0,1011.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False
45202,20040805,1800,,WV,16.5N,71.5W,25.0,1011.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,,BONNIE,AL022004,False


All cyclones that never made landfall have been dropped. Lastly, drop cyclones that never became hurricanes. The "level_3" column has this information ("HU" indicates hurricane status).

In [10]:
# first create a temporary boolean column indicating whether the row has the value "HU" for "level_3"
cyclones["hurricane"] = cyclones["level_3"] == " HU"

# use groupby aggregation to count the "HU" records per unique code, then keep the codes with at least one
h_or_not = cyclones.groupby("unique_code")["hurricane"].agg(np.sum)
h = list(h_or_not[h_or_not > 0].index) # unique_codes to keep

In [11]:
# keep every row whose unique_code belongs to a cyclone that became a hurricane
cyclones = cyclones[cyclones["unique_code"].isin(h)]
cyclones.head(10) # only cyclones that became hurricanes

Unnamed: 0,level_0,level_1,level_2,level_3,level_4,level_5,level_6,level_7,level_8,level_9,...,level_15,level_16,AL011851,UNNAMED,14,Unnamed: 3,name,unique_code,landfall,hurricane
45238,20040809,1200,,TD,11.4N,59.2W,30.0,1010.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45239,20040809,1800,,TD,11.7N,61.1W,30.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45240,20040810,0,,TD,12.2N,63.2W,30.0,1009.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45241,20040810,600,,TS,12.9N,65.3W,35.0,1007.0,75.0,0.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45242,20040810,1200,,TS,13.8N,67.6W,40.0,1004.0,90.0,50.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45243,20040810,1800,,TS,14.9N,69.8W,45.0,1000.0,90.0,50.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45244,20040811,0,,TS,15.6N,71.8W,55.0,999.0,90.0,50.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45245,20040811,600,,TS,16.0N,73.7W,55.0,999.0,90.0,50.0,...,0.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45246,20040811,1200,,TS,16.3N,75.4W,60.0,995.0,90.0,75.0,...,40.0,0.0,0.0,0.0,0.0,,CHARLEY,AL032004,False,False
45247,20040811,1800,,HU,16.7N,76.8W,65.0,993.0,90.0,75.0,...,40.0,25.0,0.0,0.0,25.0,,CHARLEY,AL032004,False,True


In [12]:
# drop unwanted columns and change column names
cyclones.rename(columns = {"level_0": "date", "level_1": "UTCtime", "level_3": "status", "level_4": "latitude", "level_5": "longitude", "level_6": "maximum_wind_s", "level_7": "min_pressure", "level_8": "ne34", "level_9": "se34", "level_10": "sw34", "level_11": "nw34", "level_12": "ne50", "level_13": "se50", "level_14": "sw50", "level_15": "nw50", "level_16": "ne64", "AL011851": "se64", "            UNNAMED": "sw64", "     14": "nw64"}, inplace = True)
cyclones.drop(["level_2", "Unnamed: 3"], axis = 1, inplace = True)
cyclones.head(10)

Unnamed: 0,date,UTCtime,status,latitude,longitude,maximum_wind_s,min_pressure,ne34,se34,sw34,...,sw50,nw50,ne64,se64,sw64,nw64,name,unique_code,landfall,hurricane
45238,20040809,1200,TD,11.4N,59.2W,30.0,1010.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45239,20040809,1800,TD,11.7N,61.1W,30.0,1009.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45240,20040810,0,TD,12.2N,63.2W,30.0,1009.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45241,20040810,600,TS,12.9N,65.3W,35.0,1007.0,75.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45242,20040810,1200,TS,13.8N,67.6W,40.0,1004.0,90.0,50.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45243,20040810,1800,TS,14.9N,69.8W,45.0,1000.0,90.0,50.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45244,20040811,0,TS,15.6N,71.8W,55.0,999.0,90.0,50.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45245,20040811,600,TS,16.0N,73.7W,55.0,999.0,90.0,50.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45246,20040811,1200,TS,16.3N,75.4W,60.0,995.0,90.0,75.0,0.0,...,0.0,40.0,0.0,0.0,0.0,0.0,CHARLEY,AL032004,False,False
45247,20040811,1800,HU,16.7N,76.8W,65.0,993.0,90.0,75.0,0.0,...,0.0,40.0,25.0,0.0,0.0,25.0,CHARLEY,AL032004,False,True


The dataset above now contains only the **cyclones that made at least one landfall AND were a hurricane at some point**.

It would be very useful to match each pair of geographic coordinates to the corresponding U.S. county, if any. For this I have used the GeoPy library.

In [13]:
# Modify the "longitude" column: western longitudes become negative
west = cyclones[cyclones["longitude"].str.contains("W")].index
cyclones.loc[west, "longitude"] = "-" + cyclones.loc[west, "longitude"].str.replace("W", "").str.replace(" ", "")
east = cyclones[cyclones["longitude"].str.contains("E")].index
cyclones.loc[east, "longitude"] = cyclones.loc[east, "longitude"].str.replace("E", "").str.replace(" ", "")

# Modify the "latitude" column: southern latitudes become negative
north = cyclones[cyclones["latitude"].str.contains("N")].index
cyclones.loc[north, "latitude"] = cyclones.loc[north, "latitude"].str.replace("N", "").str.replace(" ", "")
south = cyclones[cyclones["latitude"].str.contains("S")].index
cyclones.loc[south, "latitude"] = "-" + cyclones.loc[south, "latitude"].str.replace("S", "").str.replace(" ", "")
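
The sign convention applied above (west and south are negative) can be captured in one small helper. This is a sketch only: `to_signed_degrees` is a hypothetical name, and it returns floats, whereas the notebook keeps the coordinates as strings so they can be concatenated for the geocoder.

```python
def to_signed_degrees(coord: str) -> float:
    """Convert a value such as '28.0N' or ' 94.8W' to signed decimal degrees."""
    coord = coord.strip()
    value, hemisphere = float(coord[:-1]), coord[-1].upper()
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError(f"unexpected hemisphere in {coord!r}")
    # South and west are negative by convention.
    return -value if hemisphere in ("S", "W") else value


print(to_signed_degrees("28.0N"), to_signed_degrees(" 94.8W"))
```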

In [14]:
# library for geocoding 
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent = "http")

# given a "latitude,longitude" string, return "county, state, country" for that location
def county_state(cset):
    try:
        location = geolocator.reverse(cset)
        return location.raw["address"]["county"] + ", " + location.raw["address"]["state"] + ", " + location.raw["address"]["country"]
    except Exception: # no match, a missing address field, or a geocoder error
        return "None, None, None"
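
Two practical points about a lookup like this: a single missing address field discards the whole result, and repeated coordinates trigger repeated requests. The sketch below illustrates one possible way to handle both without any network access. `fake_reverse` is a hypothetical stand-in for the geocoder call, and `format_address` and `cached_county_state` are names invented for this example.

```python
from functools import lru_cache


def format_address(address: dict) -> str:
    # Fall back to "None" per field instead of failing on the first missing key.
    parts = [address.get(key, "None") for key in ("county", "state", "country")]
    return ", ".join(parts)


def fake_reverse(query: str) -> dict:
    """HYPOTHETICAL stand-in for a geocoder request; returns an address dict."""
    fake_reverse.calls += 1
    table = {
        "27.2,-80.2": {
            "county": "Martin County",
            "state": "Florida",
            "country": "United States of America",
        }
    }
    return table.get(query, {})


fake_reverse.calls = 0


@lru_cache(maxsize=None)
def cached_county_state(query: str) -> str:
    # Identical coordinate strings are looked up only once.
    return format_address(fake_reverse(query))


results = [cached_county_state(q) for q in ["27.2,-80.2", "27.2,-80.2", "0.0,0.0"]]
print(results, fake_reverse.calls)
```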

In [15]:
# match location for every set
gset = cyclones["latitude"] + ", " + cyclones["longitude"]
gset = gset.str.replace(" ", "")
cs = gset.apply(county_state)

In [16]:
# create new columns for county, state, and country
cyclones["county"] = cs.str.split(",").str[0]
cyclones["state"] = cs.str.split(",").str[1]
cyclones["country"] = cs.str.split(",").str[2]
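
As a side note, splitting on `","` leaves a leading space on the state and country values. A sketch of splitting on `", "` in one pass with `expand=True` is shown below on toy strings. It assumes no county name itself contains `", "`. The comparisons later in this notebook are written for the values with the leading space, so they would need adjusting if this variant were used.

```python
import pandas as pd

# Toy geocoder results in the "county, state, country" format.
cs = pd.Series([
    "Lee County, Florida, United States of America",
    "None, None, None",
])

# expand=True returns one column per piece instead of a list per row.
parts = cs.str.split(", ", expand=True)
parts.columns = ["county", "state", "country"]
print(parts)
```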

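Only include observations made in the continental U.S. Here the continental United States means Washington, D.C. and all states except Hawaii. Rows located in Puerto Rico and the U.S. Virgin Islands also carry the country "United States of America", so they are dropped separately.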

In [17]:
cyclones = cyclones[cyclones["country"] == " United States of America"]
cyclones.drop(cyclones[cyclones["state"] == " Puerto Rico"].index, inplace = True)
cyclones.drop(cyclones[cyclones["state"] == " United States Virgin Islands"].index, inplace = True)
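
The same filter can be expressed as one boolean mask, which avoids calling `drop(..., inplace = True)` on an already filtered frame (pandas may warn about that). This is a minimal sketch on toy rows; the `excluded` list name is an assumption for this example.

```python
import pandas as pd

# Toy rows with the leading spaces left by the earlier split on ",".
toy = pd.DataFrame({
    "state": [" Florida", " Puerto Rico", " None"],
    "country": [" United States of America", " United States of America", " None"],
})

excluded = ["Puerto Rico", "United States Virgin Islands"]
in_us = toy["country"].str.strip() == "United States of America"
in_excluded = toy["state"].str.strip().isin(excluded)

kept = toy[in_us & ~in_excluded]
print(kept)
```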

Just one more thing: it would be nice to have a one-sentence synopsis for each observation, so let's do that.

In [18]:
# changing the formats for date and time
cyclones["date"] = cyclones["date"].str[0:4] + "-" + cyclones["date"].str[4:6] + "-" + cyclones["date"].str[6:8]
cyclones["UTCtime"] = cyclones["UTCtime"].str.replace(" ", "")
cyclones["UTCtime"] = cyclones["UTCtime"].str[0:2] + ":" + cyclones["UTCtime"].str[2:4]

# summary
cyclones["summary"] = "Cyclone " + cyclones["name"] + " was in " + cyclones["county"] + "," + cyclones["state"] + " on " + cyclones["date"] + "."
cyclones.loc[cyclones["status"] == " HU", "summary"] = cyclones.loc[cyclones["status"] == " HU", "summary"].str.replace("Cyclone", "Hurricane")
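
An alternative to building the sentence and then replacing "Cyclone" is to pick the label first and assemble the sentence once. The sketch below uses two toy rows and strips the stray leading spaces along the way.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["CHARLEY", "CHARLEY"],
    "status": [" HU", " TS"],
    "county": [" Lee County", " Pender County"],
    "state": [" Florida", " North Carolina"],
    "date": ["2004-08-13", "2004-08-14"],
})

# "Hurricane" for HU rows, "Cyclone" for everything else.
is_hurricane = toy["status"].str.strip() == "HU"
label = is_hurricane.map({True: "Hurricane", False: "Cyclone"})

toy["summary"] = (
    label + " " + toy["name"] + " was in " + toy["county"].str.strip()
    + ", " + toy["state"].str.strip() + " on " + toy["date"] + "."
)
print(toy["summary"].tolist())
```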

In [19]:
cyclones.head(10)

Unnamed: 0,date,UTCtime,status,latitude,longitude,maximum_wind_s,min_pressure,ne34,se34,sw34,...,sw64,nw64,name,unique_code,landfall,hurricane,county,state,country,summary
45257,2004-08-13,19:45,HU,26.6,-82.2,130.0,941.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,CHARLEY,AL032004,True,True,Lee County,Florida,United States of America,"Hurricane CHARLEY was in Lee County, Florida o..."
45258,2004-08-13,20:45,HU,26.9,-82.1,125.0,942.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,CHARLEY,AL032004,True,True,Charlotte County,Florida,United States of America,"Hurricane CHARLEY was in Charlotte County, Flo..."
45259,2004-08-14,00:00,HU,28.1,-81.6,75.0,970.0,40.0,75.0,40.0,...,20.0,10.0,CHARLEY,AL032004,False,True,Polk County,Florida,United States of America,"Hurricane CHARLEY was in Polk County, Florida ..."
45262,2004-08-14,14:00,HU,33.0,-79.4,70.0,992.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,CHARLEY,AL032004,True,True,Charleston County,South Carolina,United States of America,"Hurricane CHARLEY was in Charleston County, So..."
45263,2004-08-14,16:00,HU,33.8,-78.7,65.0,997.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,CHARLEY,AL032004,True,True,Horry County,South Carolina,United States of America,"Hurricane CHARLEY was in Horry County, South C..."
45264,2004-08-14,18:00,TS,34.5,-78.1,60.0,1000.0,75.0,75.0,30.0,...,0.0,0.0,CHARLEY,AL032004,False,False,Pender County,North Carolina,United States of America,"Cyclone CHARLEY was in Pender County, North Ca..."
45265,2004-08-15,00:00,EX,36.9,-75.9,40.0,1012.0,75.0,75.0,20.0,...,0.0,0.0,CHARLEY,AL032004,False,False,Virginia Beach (city),Virginia,United States of America,"Cyclone CHARLEY was in Virginia Beach (city), ..."
45377,2004-09-05,04:30,HU,27.2,-80.2,90.0,960.0,-999.0,-999.0,-999.0,...,-999.0,-999.0,FRANCES,AL062004,True,True,Martin County,Florida,United States of America,"Hurricane FRANCES was in Martin County, Florid..."
45378,2004-09-05,06:00,HU,27.2,-80.2,90.0,960.0,175.0,160.0,125.0,...,40.0,60.0,FRANCES,AL062004,False,True,Martin County,Florida,United States of America,"Hurricane FRANCES was in Martin County, Florid..."
45379,2004-09-05,12:00,HU,27.4,-80.7,80.0,969.0,175.0,150.0,125.0,...,30.0,30.0,FRANCES,AL062004,False,True,Okeechobee County,Florida,United States of America,"Hurricane FRANCES was in Okeechobee County, Fl..."


And there we have it. Below are brief descriptions of the columns:

date - date of observation, year-month-day

UTCtime - time of observation in UTC (GMT)

status - status type (e.g. HU for hurricane, TS for tropical storm)

latitude - latitude of the observation

longitude - longitude of the observation

maximum_wind_s - maximum sustained wind, in knots

min_pressure - minimum pressure, in millibars

ne34 – 34 kt wind radii maximum extent in northeastern quadrant (in nautical miles) 

se34 – 34 kt wind radii maximum extent in southeastern quadrant (in nautical miles) 

sw34 – 34 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

nw34 – 34 kt wind radii maximum extent in northwestern quadrant (in nautical miles) 

ne50 – 50 kt wind radii maximum extent in northeastern quadrant (in nautical miles) 

se50 – 50 kt wind radii maximum extent in southeastern quadrant (in nautical miles) 

sw50 – 50 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

nw50 – 50 kt wind radii maximum extent in northwestern quadrant (in nautical miles) 

ne64 – 64 kt wind radii maximum extent in northeastern quadrant (in nautical miles)

se64 – 64 kt wind radii maximum extent in southeastern quadrant (in nautical miles)

sw64 – 64 kt wind radii maximum extent in southwestern quadrant (in nautical miles) 

nw64 – 64 kt wind radii maximum extent in northwestern quadrant (in nautical miles)

name - name of the tropical cyclone

unique_code - unique label of the cyclone, made up of its basin, its number within the year, and the year

landfall - whether this observation is a landfall record

hurricane - whether the cyclone had hurricane status at this observation

county - county at the location

state - state at the location

country - country at the location

summary - brief one-sentence summary of the event
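
Before exporting, a quick check that the result meets the stated criteria can be useful. Below is a minimal sketch, assuming the column names listed above. `find_problems` is a hypothetical helper, and the second toy row is deliberately invalid.

```python
import pandas as pd


def find_problems(df: pd.DataFrame) -> list:
    """Return a list of violated criteria (empty if none are found)."""
    problems = []
    years = df["date"].str[:4].astype(int)
    if not years.between(2004, 2017).all():
        problems.append("date outside 2004-2017")
    if not (df["country"].str.strip() == "United States of America").all():
        problems.append("row outside the United States")
    territories = ["Puerto Rico", "United States Virgin Islands"]
    if df["state"].str.strip().isin(territories).any():
        problems.append("row in a U.S. territory")
    return problems


good = pd.DataFrame({
    "date": ["2004-08-13"],
    "state": [" Florida"],
    "country": [" United States of America"],
})
bad = pd.DataFrame({
    "date": ["2018-01-01"],
    "state": [" Puerto Rico"],
    "country": [" United States of America"],
})
print(find_problems(good), find_problems(bad))
```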

In [20]:
# export the dataset
cyclones.to_csv("/Users/Jun/Desktop/cyclones_cleaned.csv")