
# Question Description

In this assignment, we will use Python to analyze the HURDAT2 dataset of Atlantic hurricane data from 1851 through 2017. This dataset is provided by the National Hurricane Center and is documented here. You will analyze this data to answer several questions about it. I have provided code to organize the data, but feel free to improve this rudimentary organization. I have also provided functions that let you check your work. You may organize the cells as you wish, but you must clearly label each problem’s code and its solution: use a Markdown cell to add a header denoting your work for a particular problem.
You should start with the provided Jupyter Notebook, http://www.cis.umassd.edu/~dkoop/dsc201-2018fa/a1/a1.ipynb. Download this notebook (right-click to save the link) and upload it to your Jupyter workspace (on the JupyterHub server or your local notebook/lab). Make sure to execute the first two cells in the notebook (Shift+Enter). The second cell will download the data and define a variable records, which is a list of tuples, each with two entries:
1. a string with information about the hurricane, and
2. a list of strings, each of which is a tracking point for the hurricane
To access the fourth hurricane’s third tracking point, you would access records[3][1][2]. Remember indexing is zero-based! Thus [3] accesses the fourth hurricane, [1] accesses the list of tracking point strings, and [2] accesses the third tracking point.
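The indexing scheme can be tried on a small stand-in for records. The sketch below assumes a made-up variable, sample_records, holding only the first hurricane from the notebook output, with its tracking points shortened to the first seven fields.

```python
# A one-hurricane stand-in for `records` (tracking points shortened to seven fields).
sample_records = [
    ('AL011851,            UNNAMED,     14,',
     ['18510625, 0000,  , HU, 28.0N,  94.8W,  80,',
      '18510625, 0600,  , HU, 28.0N,  95.4W,  80,',
      '18510625, 1200,  , HU, 28.0N,  96.0W,  80,']),
]

# Each record unpacks into the info string and the list of tracking points.
header, points = sample_records[0]

# [0] selects the first hurricane, [1] its point list, [2] the third point.
third_point = sample_records[0][1][2]

print(header)
print(third_point)
```

Unpacking each tuple into two names, as in the first assignment above, is often easier to read than chained indexes when looping over all records.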
The provided notebook includes examples of how to check your work. For example, for Problem 1, call the check1 function with the number of unique hurricane names. After the function runs, a message indicates whether your answer is correct.

## 1. Number of Unique Hurricane Names (10 pts)

Write code that computes the number of unique hurricane names in the dataset. Note that UNNAMED is not a hurricane name.
Hints:
- You will need to extract the name from the string in the first entry of the tuple.
- The split function for strings will be useful.
- The strip function will also be useful for trimming whitespace.
- Consider using a set to keep track of all the names.
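As a minimal sketch of how these hints fit together, the example below works on three header strings instead of the full dataset. The first string comes from the notebook output; the other two, and the name sample_headers, are invented for illustration.

```python
# Header strings in the format of records[i][0]; the last two are invented.
sample_headers = [
    'AL011851,            UNNAMED,     14,',
    'AL990001,              ALPHA,      5,',
    'AL990002,              ALPHA,      8,',
]

names = set()
for header in sample_headers:
    # The name is the second comma-separated field, padded with spaces.
    name = header.split(',')[1].strip()
    if name != 'UNNAMED':  # UNNAMED is a placeholder, not a name
        names.add(name)

print(len(names))  # → 1
```

A set stores each name once, so the repeated ALPHA entry is counted a single time.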

## 2. Most Frequently Used Name (10 pts)

Write code that computes the most frequently used hurricane name. Again, UNNAMED does not count!
Hints:
- collections.Counter() is a good structure to help with counting.
- Clean up the strings in the same manner as in Problem 1.
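The counting step might look like the sketch below, which uses a short list of invented names (sample_names) in place of the names extracted from records.

```python
from collections import Counter

# Invented names for illustration; UNNAMED entries are filtered out before counting.
sample_names = ['ALPHA', 'UNNAMED', 'BETA', 'ALPHA', 'UNNAMED', 'UNNAMED']

counts = Counter(name for name in sample_names if name != 'UNNAMED')

# most_common(1) returns a one-element list of (item, count) pairs.
top_name, top_count = counts.most_common(1)[0]

print(top_name, top_count)  # → ALPHA 2
```

Note that if UNNAMED were not filtered out here, it would win with three occurrences. When two names have equal counts, most_common lists them in the order they were first encountered (Python 3.7 and later).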

## 3. Year with Most Hurricanes (10 pts)

Write code that computes the year with the most hurricanes.
Hints:
- You can extract the year from the first entry in the tuple. It is the last four characters before the first comma.

## 4. Most Northerly Hurricane (10 pts)

Write code that computes the hurricane that went furthest north as measured by the greatest latitude. You need to find the name and the year of the hurricane.
Hints:
- Check the documentation to find where the latitude is recorded.
- You will need to go through the tracking points to check all of the recorded latitudes.
- You need to keep track of three things: the maximum latitude seen so far, plus the name and year of the corresponding hurricane.
- Each latitude ends with the character N to indicate the northern hemisphere. This character needs to be removed before doing numeric comparisons.
- You can convert a string to a float or int by casting it. For example, float("81.5") returns a floating-point value of 81.5.
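One possible shape for the latitude handling is sketched below. The helper parse_latitude and the sample data are assumptions made for this example; the handling of an S suffix is also an assumption, since the hints only mention N.

```python
def parse_latitude(field):
    """Convert a latitude field such as ' 28.0N' to a float (south is negative)."""
    text = field.strip()
    value = float(text[:-1])  # drop the trailing hemisphere letter
    return value if text[-1] == 'N' else -value


# One hurricane from the notebook output, with two shortened tracking points.
sample_records = [
    ('AL011851,            UNNAMED,     14,',
     ['18510625, 0000,  , HU, 28.0N,  94.8W,  80,',
      '18510626, 1200,  , TS, 28.4N,  98.3W,  60,']),
]

# Track three things together: (maximum latitude so far, name, year).
best = (float('-inf'), None, None)
for header, points in sample_records:
    fields = header.split(',')
    name, year = fields[1].strip(), int(fields[0][-4:])
    for point in points:
        latitude = parse_latitude(point.split(',')[4])  # latitude is the fifth field
        if latitude > best[0]:
            best = (latitude, name, year)

print(best)  # → (28.4, 'UNNAMED', 1851)
```

Starting the running maximum at negative infinity means the first real latitude always replaces it, so no special case is needed for the first tracking point.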

## 5. Hurricane with Maximum Sustained Wind (10 pts)

Write code that determines the hurricane with the highest sustained wind speed. You need to find the name, year, and wind speed for this hurricane.
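In the sample tracking points, the wind speed is the seventh comma-separated field, and unrecorded fields appear as negative values such as -999. The sketch below is one way to read that field. It assumes that any negative wind value means "not recorded"; the third sample line is invented to show such a value.

```python
# Shortened tracking points; the last line is invented to show a missing value.
sample_points = [
    '18510625, 0000,  , HU, 28.0N,  94.8W,  80,',
    '18510626, 0600,  , TS, 28.3N,  97.6W,  60,',
    '18510626, 1200,  , TS, 28.4N,  98.3W, -999,',
]

# int() ignores the padding spaces around each number.
winds = [int(point.split(',')[6]) for point in sample_points]

# Assumption: negative values are placeholders for missing measurements.
recorded = [wind for wind in winds if wind >= 0]

print(max(recorded))  # → 80
```

Because the placeholders are negative, they can never win a maximum on their own unless every value for a hurricane is missing, but filtering them keeps the intent explicit.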

In [110]:
# EXECUTE BUT DO NOT CHANGE THIS CELL!
# check function definitions

import hashlib
def check1(num_names):
    if (hashlib.sha256(str(num_names).encode('utf-8')).hexdigest() == 
        '23c657f2efda7731a3c1990b25f318fa2eb1332208f97ab9cc2a7eac70ab5a76'):
        print("PROBLEM 1 CORRECT")
    else:
        print("PROBLEM 1 INCORRECT")

def check2(top_name):
    if (hashlib.sha256(str(top_name).encode('utf-8')).hexdigest() == 
        '8f7489eb3c242628d0c9d99d769669340f961652e2f25e314c659c06aac73885'):
        print("PROBLEM 2 CORRECT")
    else:
        print("PROBLEM 2 INCORRECT")

def check3(top_year):
    if (hashlib.sha256(str(top_year).encode('utf-8')).hexdigest() == 
        'a20a2b7bb0842d5cf8a0c06c626421fd51ec103925c1819a51271f2779afa730'):
        print("PROBLEM 3 CORRECT")
    else:
        print("PROBLEM 3 INCORRECT")


def check4(northmost_name, northmost_year):
    if (hashlib.sha256((str(northmost_name) + str(northmost_year)).encode('utf-8')).hexdigest() == 
        '41bd369952039f0fd6c28982fe6f6fa9eb73ab884b04477a7580f5cfe33ecd0b'):
        print("PROBLEM 4 CORRECT")
    else:
        print("PROBLEM 4 INCORRECT")

def check5(top_wind_name, top_wind_year, top_wind_speed):
    if (hashlib.sha256((str(top_wind_name) + str(top_wind_year) + str(top_wind_speed)).encode('utf-8')).hexdigest() == 
        '0a6ce2c3bbf53522f329e5ff3724f6234dca8954b6c12a91c383b41cf15cc554'):
        print("PROBLEM 5 CORRECT")
    else:
        print("PROBLEM 5 INCORRECT")
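The check functions above never store the answers themselves. Each one converts its arguments to strings, concatenates them, and compares the SHA-256 digest of the result with a stored digest. The sketch below illustrates the consequences of that design; answer_digest is a hypothetical helper written for this example and is not part of the notebook.

```python
import hashlib


def answer_digest(*parts):
    """Hash the concatenated string forms of the parts, mirroring the check functions."""
    text = ''.join(str(part) for part in parts)
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


# An int and its string form hash identically, because str() is applied first.
print(answer_digest(2005) == answer_digest('2005'))  # → True

# Case and whitespace change the digest, so names must be cleaned exactly.
print(answer_digest('allen') == answer_digest('ALLEN'))  # → False

# Multi-part answers are concatenated with no separator.
print(answer_digest('ALLEN', 1980) == answer_digest('ALLEN1980'))  # → True
```

In practice this means an answer such as a name with leading spaces, or a year passed as a float, will be reported as incorrect even if the underlying value is right.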

In [111]:
import os
from urllib.request import urlretrieve

# download the data if we don't have it locally
url = "https://www.nhc.noaa.gov/data/hurdat/hurdat2-1851-2017-050118.txt"
local_fname = "hurdat2.txt"
if not os.path.exists(local_fname):
    urlretrieve(url, local_fname)

# very primitive way of reading the data
# can be improved
records = []
with open(local_fname,'r') as f:
    for line in f:
        if line.startswith("AL"):
            record = line.strip()
            reports = []
            records.append((record, reports))
        else:
            reports.append(line.strip())
records[:3]

[('AL011851,            UNNAMED,     14,',
  ['18510625, 0000,  , HU, 28.0N,  94.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510625, 0600,  , HU, 28.0N,  95.4W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510625, 1200,  , HU, 28.0N,  96.0W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510625, 1800,  , HU, 28.1N,  96.5W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510625, 2100, L, HU, 28.2N,  96.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510626, 0000,  , HU, 28.2N,  97.0W,  70, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510626, 0600,  , TS, 28.3N,  97.6W,  60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
   '18510626, 1200,  , TS, 28.4N,  98.3W,  60, -999, -999, -999, -99

Each record is a tuple that consists of:
1. a string with the hurricane info
2. a list of track points, where each point is as documented here (https://www.nhc.noaa.gov/data/hurdat/hurdat2-format-atlantic.pdf)
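Since the organization of the data can be improved, one option is to split each track point into named fields once instead of indexing into split(',') results throughout the analysis. The sketch below is an assumption about how that might look: parse_point and its dictionary keys are invented for this example, and only the first seven fields are kept.

```python
def parse_point(line):
    """Split one tracking-point string into a dict of its first seven fields."""
    fields = [field.strip() for field in line.split(',')]
    return {
        'date': fields[0],           # YYYYMMDD
        'time': fields[1],           # HHMM
        'record_id': fields[2],      # e.g. 'L' marks a landfall; often empty
        'status': fields[3],         # e.g. 'HU' hurricane, 'TS' tropical storm
        'latitude': fields[4],       # e.g. '28.2N'
        'longitude': fields[5],      # e.g. '96.8W'
        'max_wind': int(fields[6]),  # maximum sustained wind in knots
    }


# A track point copied from the notebook output above.
line = ('18510625, 2100, L, HU, 28.2N,  96.8W,  80, -999, -999, -999, -999, '
        '-999, -999, -999, -999, -999, -999, -999, -999, -999,')
point = parse_point(line)

print(point['status'], point['latitude'], point['max_wind'])  # → HU 28.2N 80
```

With this in place, later problems can refer to point['latitude'] or point['max_wind'] by name, which makes off-by-one mistakes in field positions less likely.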

In [112]:
# the first record
records[0]

('AL011851,            UNNAMED,     14,',
 ['18510625, 0000,  , HU, 28.0N,  94.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510625, 0600,  , HU, 28.0N,  95.4W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510625, 1200,  , HU, 28.0N,  96.0W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510625, 1800,  , HU, 28.1N,  96.5W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510625, 2100, L, HU, 28.2N,  96.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510626, 0000,  , HU, 28.2N,  97.0W,  70, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510626, 0600,  , TS, 28.3N,  97.6W,  60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
  '18510626, 1200,  , TS, 28.4N,  98.3W,  60, -999, -999, -999, -999, -999, 

In [113]:
# records[0]'s hurricane info including identifier, name, and number of points
records[0][0]

'AL011851,            UNNAMED,     14,'

In [114]:
# records[0]'s list with all of the points
records[0][1]

['18510625, 0000,  , HU, 28.0N,  94.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510625, 0600,  , HU, 28.0N,  95.4W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510625, 1200,  , HU, 28.0N,  96.0W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510625, 1800,  , HU, 28.1N,  96.5W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510625, 2100, L, HU, 28.2N,  96.8W,  80, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510626, 0000,  , HU, 28.2N,  97.0W,  70, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510626, 0600,  , TS, 28.3N,  97.6W,  60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',
 '18510626, 1200,  , TS, 28.4N,  98.3W,  60, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999, -999,',


In [115]:
# print the years of the hurricanes
years = []
for record in records:
    first_entry = record[0].split(',')[0]
    year = first_entry[-4:]
    years.append(int(year))
years.sort()
print(set(years)) # unique years

{1851, 1852, 1853, 1854, 1855, 1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863, 1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871, 1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879, 1880, 1881, 1882, 1883, 1884, 1885, 1886, 1887, 1888, 1889, 1890, 1891, 1892, 1893, 1894, 1895, 1896, 1897, 1898, 1899, 1900, 1901, 1902, 1903, 1904, 1905, 1906, 1907, 1908, 1909, 1910, 1911, 1912, 1913, 1914, 1915, 1916, 1917, 1918, 1919, 1920, 1921, 1922, 1923, 1924, 1925, 1926, 1927, 1928, 1929, 1930, 1931, 1932, 1933, 1934, 1935, 1936, 1937, 1938, 1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 201

# Problem 1

In [116]:
# do your work for Problem 1 here
def total_hurricane_names(records):
    hurricane_names = []
    
    for record in records:
        hurricane_name = record[0].split(',')[1].strip()
        if hurricane_name != 'UNNAMED':  # dropping 'UNNAMED' hurricanes
            hurricane_names.append(hurricane_name)

    return hurricane_names 

print(len(set(total_hurricane_names(records))))

288


In [117]:
# check your solution
# check1(1023) # the number of unique hurricane names in the dataset
check1(len(set(total_hurricane_names(records))))  # the number of unique hurricane names in the dataset

PROBLEM 1 CORRECT


# Problem 2

In [118]:
# do your work for Problem 2 here
from collections import Counter

def most_frequent_hurricane(records):
    collection_hurricane = Counter(total_hurricane_names(records))
    return collection_hurricane.most_common(1)[0][0]  # most common hurricane name

print(most_frequent_hurricane(records))

ARLENE


In [119]:
# check your solution
# check2("DAVE") # the most frequent hurricane name
check2(most_frequent_hurricane(records)) # the most frequent hurricane name

PROBLEM 2 CORRECT


# Problem 3

In [120]:
# do your work for Problem 3 here

def year_with_most_hurricanes(records):
    years_of_hurricanes = [] 
    
    for record in records:
        hurricane_name = record[0].split(',')[1].strip()
        if hurricane_name != 'UNNAMED':  # dropping 'UNNAMED' hurricanes
            year = int(record[0].split(',')[0][-4:])
            years_of_hurricanes.append(year)

    collection_hurricane_years = Counter(years_of_hurricanes)
    return collection_hurricane_years.most_common(1)[0][0]

print(year_with_most_hurricanes(records))

2005


In [121]:
# check your solution
# check3(1912) # the year with the most hurricanes
check3(year_with_most_hurricanes(records)) # the year with the most hurricanes

PROBLEM 3 CORRECT


# Problem 4

In [122]:
# do your work for Problem 4 here
def most_northerly_hurricane(records):
    best_name = None
    best_year = None
    best_latitude = float('-inf')  # maximum latitude seen so far

    for record in records:
        name = record[0].split(',')[1].strip()
        if name == 'UNNAMED':  # dropping 'UNNAMED' hurricanes
            continue
        year = int(record[0].split(',')[0][-4:])
        for item in record[1]:  # each tracking point is a comma-separated string
            # the latitude is the fifth field, e.g. '28.0N'; remove the 'N' before converting
            latitude = float(item.split(',')[4].replace('N', '').strip())
            if latitude >= best_latitude:  # on ties, the later hurricane wins
                best_name, best_year, best_latitude = name, year, latitude
    return (best_name, best_year)

print(most_northerly_hurricane(records))

('HOW', 1951)


In [123]:
# check your solution
# check4("DAVE", 1912) # the hurricane that reached the highest latitude and year

# Unpacking the tuple returned from function call to match positional arguments 
check4(*most_northerly_hurricane(records))  # the hurricane that reached the highest latitude and year

PROBLEM 4 CORRECT


# Problem 5

In [124]:
# do your work for Problem 5 here
def most_speedy_hurricane(records):
    best_name = None
    best_year = None
    best_speed = float('-inf')  # maximum sustained wind speed seen so far

    for record in records:
        name = record[0].split(',')[1].strip()
        if name == 'UNNAMED':  # dropping 'UNNAMED' hurricanes
            continue
        year = int(record[0].split(',')[0][-4:])
        for item in record[1]:  # each tracking point is a comma-separated string
            # the maximum sustained wind is the seventh field
            speed = int(item.split(',')[6].strip())
            if speed >= best_speed:  # on ties, the later hurricane wins
                best_name, best_year, best_speed = name, year, speed
    return (best_name, best_year, best_speed)

print(most_speedy_hurricane(records))

('ALLEN', 1980, 165)


In [125]:
# check your solution
# check5("DAVE", 1912, 130) # the hurricane that had the highest sustained winds and year and wind speed
check5(*most_speedy_hurricane(records))  # the hurricane that had the highest sustained winds and year and wind speed

PROBLEM 5 CORRECT
