# Hurricane Analysis - Codecademy Project

This project is from the Codecademy Data Scientist Career Path and involves an analysis of hurricanes. For this project I will write several functions that organize and manipulate data about Category 5 hurricanes, the strongest hurricanes as rated by their wind speed. Together, these functions make use of parameters, conditionals, lists, dictionaries, string manipulation, and return statements.

The data for the project is contained in the `Data` module, set out in separate lists for the hurricanes' names, the month and year each occurred, maximum sustained winds, areas affected, damages, and deaths.

In [29]:
from Data import *

### Cleaning the Damages List

The damages are recorded as strings such as "100B", "100M" and "Damages not recorded". We want the numeric values as floats (in dollars) rather than strings; entries that say "Damages not recorded" are left as they are.

In [30]:
def clean_damages(damages):
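    """ Takes a list of damages as input and returns a new list in which values ending in "M" (millions) or
        "B" (billions) are converted to floats in dollars. Any other entry, such as "Damages not recorded",
        is kept unchanged. """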
    conversion = {"M": 1000000, "B": 1000000000}
    new_damages_list = []
    for data in damages: 
        if data.endswith("M"): 
            data_sr = data.removesuffix("M") 
            new_damages_list.append(float(data_sr) * conversion["M"])
        elif data.endswith("B"):
            data_sr = data.removesuffix("B")
            new_damages_list.append(float(data_sr) * conversion["B"])
        else:
            new_damages_list.append(data)
    return new_damages_list

In [31]:
new_damages = clean_damages(damages)
print(new_damages)

['Damages not recorded', 100000000.0, 'Damages not recorded', 40000000.0, 27900000.0, 5000000.0, 'Damages not recorded', 306000000.0, 2000000.0, 65800000.0, 326000000.0, 60300000.0, 208000000.0, 1420000000.0, 25400000.0, 'Damages not recorded', 1540000000.0, 1240000000.0, 7100000000.0, 10000000000.0, 26500000000.0, 6200000000.0, 5370000000.0, 23300000000.0, 1010000000.0, 125000000000.0, 12000000000.0, 29400000000.0, 1760000000.0, 720000000.0, 15100000000.0, 64800000000.0, 91600000000.0, 25100000000.0]
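The suffix handling in `clean_damages` can also be written as a per-value helper driven by a lookup table, so supporting another unit only means adding a table entry. This is a minimal sketch; the names `SUFFIX_MULTIPLIERS`, `parse_damage` and `clean_damages_sketch` are hypothetical, and, like the original, any string without a known suffix (such as "Damages not recorded") is returned unchanged.

```python
# Multipliers for the suffixes used in the damages data (hypothetical table).
SUFFIX_MULTIPLIERS = {"M": 1_000_000, "B": 1_000_000_000}


def parse_damage(value):
    """Convert a string such as "100M" or "1.42B" to a float number of dollars.

    Strings without a recognised suffix are returned unchanged.
    """
    suffix = value[-1:]  # last character, or "" for an empty string
    if suffix in SUFFIX_MULTIPLIERS:
        return float(value[:-1]) * SUFFIX_MULTIPLIERS[suffix]
    return value


def clean_damages_sketch(damages):
    """Apply parse_damage to every entry of the damages list."""
    return [parse_damage(d) for d in damages]


print(clean_damages_sketch(["Damages not recorded", "100M", "1.42B"]))
# ['Damages not recorded', 100000000.0, 1420000000.0]
```

Slicing with `value[:-1]` also avoids `str.removesuffix`, which only exists in Python 3.9 and later.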


### Constructing a Dictionary of the Data

We want to hold all the information in one variable. A dictionary is used for this: the name of the hurricane is the key (the "master" key) and a dictionary of that hurricane's information is the value.

In [41]:
def construct_dictionary(name, month, year, max_sustained_wind, areas_affected, damage, death):
    """ Constructs a dictionary from the lists, where the keys are the names of the hurricanes and the values
        are dictionaries containing a key for each piece of data about the hurricane (Name, Month, Year,
        Max Sustained Wind, Areas Affected, Damage, Deaths). """
    new_dict = {}
    for i in range(len(name)):
        new_dict.update({name[i]: {"Name": name[i], "Month": month[i], "Year": year[i], 
                                   "Max Sustained Wind": max_sustained_wind[i], "Areas Affected": areas_affected[i],
                                   "Damage": damage[i], "Deaths": death[i]}})
    return new_dict

In [50]:
hurricanes_data = construct_dictionary(names, months, years, max_sustained_winds, areas_affected, new_damages, deaths)
print(hurricanes_data)

{'Cuba I': {'Name': 'Cuba I', 'Month': 'October', 'Year': 1924, 'Max Sustained Wind': 165, 'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], 'Damage': 'Damages not recorded', 'Deaths': 90}, 'San Felipe II Okeechobee': {'Name': 'San Felipe II Okeechobee', 'Month': 'September', 'Year': 1928, 'Max Sustained Wind': 160, 'Areas Affected': ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], 'Damage': 100000000.0, 'Deaths': 4000}, 'Bahamas': {'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}, 'Cuba II': {'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}, 'CubaBrownsville': {'Name': 'CubaBrownsville', 'Month': 'August', 
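Because the lists are parallel (index `i` refers to the same hurricane in every list), `zip` can walk them together instead of indexing with `range(len(name))`. A minimal sketch with the hypothetical name `construct_dictionary_zip`; the sample values are those of the first entry in the output above.

```python
def construct_dictionary_zip(names, months, years, winds, areas, damages, deaths):
    """Build {name: {field: value}} from parallel lists using zip."""
    fields = ("Name", "Month", "Year", "Max Sustained Wind",
              "Areas Affected", "Damage", "Deaths")
    return {
        row[0]: dict(zip(fields, row))  # row[0] is the hurricane's name
        for row in zip(names, months, years, winds, areas, damages, deaths)
    }


sample = construct_dictionary_zip(
    ["Cuba I"], ["October"], [1924], [165],
    [["Central America", "Mexico", "Cuba", "Florida", "The Bahamas"]],
    ["Damages not recorded"], [90],
)
print(sample["Cuba I"]["Deaths"])  # 90
```

One difference from the index-based loop: `zip` stops at the shortest list, so a length mismatch is silently truncated; on Python 3.10+ `zip(..., strict=True)` raises a `ValueError` instead.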

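We might want to change the "master" key to a different data point (e.g. "Year"), and the next function takes care of that. Since several hurricanes can share the same value (two of them occurred in 1932, for example), each new key maps to a list of hurricane dictionaries rather than a single one.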

In [43]:
def change_master_key(dict1, new_master_key):
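    """ Regroups dict1 by the value stored under new_master_key (e.g. "Year"). Returns a dictionary that maps
        each value to a list of the hurricane dictionaries sharing it. """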
    new_dict = {}
    for key in dict1:
        new_key = dict1[key][new_master_key]
        current_cane = dict1[key]
        if new_key not in new_dict:
            new_dict[new_key] = [current_cane]
        else:
            new_dict[new_key].append(current_cane)
    return new_dict

In [51]:
hurricanes_data_year_first = change_master_key(hurricanes_data, "Year")
print(hurricanes_data_year_first)

{1924: [{'Name': 'Cuba I', 'Month': 'October', 'Year': 1924, 'Max Sustained Wind': 165, 'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], 'Damage': 'Damages not recorded', 'Deaths': 90}], 1928: [{'Name': 'San Felipe II Okeechobee', 'Month': 'September', 'Year': 1928, 'Max Sustained Wind': 160, 'Areas Affected': ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], 'Damage': 100000000.0, 'Deaths': 4000}], 1932: [{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}, {'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}], 1933: [{'Name': 'CubaBrownsville', 'Month': 'August', 'Year': 1933, 'Max Sustained Wind': 160, 'Areas 
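The "create the list on first sight, otherwise append" pattern in `change_master_key` is what `collections.defaultdict(list)` automates. A minimal sketch (hypothetical name `change_master_key_sketch`), run on records trimmed to two fields for brevity; the years are taken from the output above.

```python
from collections import defaultdict


def change_master_key_sketch(data, new_master_key):
    """Group the hurricane records in data by the value of new_master_key."""
    grouped = defaultdict(list)  # a missing key starts as an empty list
    for record in data.values():
        grouped[record[new_master_key]].append(record)
    return dict(grouped)  # convert back to a plain dict


# Trimmed, illustrative records (only Name and Year kept).
records = {
    "Bahamas": {"Name": "Bahamas", "Year": 1932},
    "Cuba II": {"Name": "Cuba II", "Year": 1932},
    "CubaBrownsville": {"Name": "CubaBrownsville", "Year": 1933},
}
by_year = change_master_key_sketch(records, "Year")
print({year: [r["Name"] for r in rs] for year, rs in by_year.items()})
# {1932: ['Bahamas', 'Cuba II'], 1933: ['CubaBrownsville']}
```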

### What areas were affected?

We want to find out what areas were affected by hurricanes and how frequently they were affected.

In [52]:
def count_areas(data):
    """ Counts how often each area is listed as an affected area of a hurricane and returns the results in a 
        dictionary where the keys are the affected areas and the values are counts of how many times the areas 
        were affected."""
    count = []
    areas = []
    areas_from_data = []
    for cane in data:
        for area in data[cane]["Areas Affected"]:
            areas_from_data.append(area)
    for area in areas_from_data:
        if area not in areas:
            areas.append(area)
    count = [areas_from_data.count(area) for area in areas]
    area_count_dict = {area: cnt for area, cnt in zip(areas, count)}
    return area_count_dict

In [53]:
areas_affected = count_areas(hurricanes_data)
print(areas_affected)

{'Central America': 9, 'Mexico': 7, 'Cuba': 6, 'Florida': 6, 'The Bahamas': 7, 'Lesser Antilles': 4, 'United States East Coast': 3, 'Atlantic Canada': 3, 'Northeastern United States': 2, 'Jamaica': 4, 'Cayman Islands': 1, 'Bermuda': 2, 'Texas': 4, 'Tamaulipas': 1, 'Yucatn Peninsula': 3, 'Georgia': 1, 'The Carolinas': 1, 'Virginia': 1, 'Southeastern United States': 1, 'Southwestern Quebec': 1, 'New England': 1, 'Louisiana': 1, 'Midwestern United States': 1, 'The Caribbean': 8, 'United States Gulf Coast': 6, 'United States East coast': 1, 'South Texas': 1, 'Venezuela': 3, 'Hispaniola': 1, 'South Florida': 1, 'Greater Antilles': 2, 'Bahamas': 2, 'Eastern United States': 1, 'Ontario': 1, 'Windward Islands': 1, 'Nicaragua': 1, 'Honduras': 1, 'Antilles': 1, 'Colombia': 1, 'Cape Verde': 1, 'British Virgin Islands': 1, 'U.S. Virgin Islands': 1, 'Virgin Islands': 1, 'Puerto Rico': 1, 'Dominican Republic': 1, 'Turks and Caicos Islands': 1, 'United States Gulf Coast (especially Florida Panhandle)
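`count_areas` calls `list.count` once per distinct area, rescanning the full list each time. `collections.Counter` tallies everything in a single pass and, being a `dict` subclass, keeps first-seen order. A minimal sketch (hypothetical name `count_areas_sketch`) on two records trimmed to their "Areas Affected" lists:

```python
from collections import Counter


def count_areas_sketch(data):
    """Count how many hurricanes list each area as affected."""
    return Counter(
        area
        for record in data.values()
        for area in record["Areas Affected"]
    )


# Trimmed, illustrative records taken from the hurricane dictionary.
records = {
    "Cuba I": {"Areas Affected": ["Central America", "Mexico", "Cuba", "Florida", "The Bahamas"]},
    "Bahamas": {"Areas Affected": ["The Bahamas", "Northeastern United States"]},
}
counts = count_areas_sketch(records)
print(counts["The Bahamas"])  # 2
```

Either way, the counting is by exact string, so spelling variants in the data ("The Bahamas" and "Bahamas", "United States East Coast" and "United States East coast") are counted as separate areas.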

Now that we know what areas were affected and how often they were, we can find the areas that were most frequently affected.

In [46]:
def area_most_affected(data):
    """ Finds the area(s) with the highest count in data (a dictionary of area counts) and returns them in a
        dictionary, with the area as the key and its count as the value. Tied areas are all kept. """
    vals = list(data.values())
    max_val = max(vals)
    most_affected_dict = {}
    for key in data:
        if data[key] == max_val:
            most_affected_dict[key] = max_val
    return most_affected_dict

In [55]:
area_affected_most = area_most_affected(areas_affected)
print(area_affected_most)

{'Central America': 9}
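The same "keep every area tied for the top count" logic fits in one `max` plus a dictionary comprehension. A minimal sketch (hypothetical name `areas_with_max_count`); the second call uses made-up counts only to show a tie, and the empty-input guard is an addition, since `max` raises `ValueError` on an empty sequence.

```python
def areas_with_max_count(counts):
    """Return {area: count} for every area whose count equals the maximum."""
    if not counts:
        return {}  # max() would raise ValueError on an empty dict
    top = max(counts.values())
    return {area: n for area, n in counts.items() if n == top}


print(areas_with_max_count({"Central America": 9, "The Caribbean": 8}))
# {'Central America': 9}
print(areas_with_max_count({"A": 3, "B": 3, "C": 1}))  # made-up tie
# {'A': 3, 'B': 3}
```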


### Which hurricane caused the most deaths?

We want to find which hurricane caused the most deaths, and the next function does that.

In [56]:
def most_deaths(data):
    """ Returns the hurricane that caused the most deaths and the number of deaths the hurricane caused. """
    death_count_dict = {key: data[key]["Deaths"] for key in data}
    max_deaths = max(death_count_dict.values())
    for key in death_count_dict:
        if death_count_dict[key] == max_deaths:
            return key, death_count_dict[key]

In [57]:
hurricane_most_deaths = most_deaths(hurricanes_data)
print(hurricane_most_deaths)

('Mitch', 19325)
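Instead of building an intermediate dictionary and looping over it, `max` can take a `key` function that looks up each hurricane's death count. A minimal sketch (hypothetical name `most_deaths_sketch`) using death counts that appear in the outputs of this notebook:

```python
def most_deaths_sketch(data):
    """Return (name, deaths) for the hurricane with the most deaths.

    max() returns the first maximal item, so ties resolve to whichever
    hurricane appears first in data, as in the loop in most_deaths.
    """
    name = max(data, key=lambda key: data[key]["Deaths"])
    return name, data[name]["Deaths"]


sample = {
    "Cuba I": {"Deaths": 90},
    "San Felipe II Okeechobee": {"Deaths": 4000},
    "Cuba II": {"Deaths": 3103},
    "Mitch": {"Deaths": 19325},
}
print(most_deaths_sketch(sample))  # ('Mitch', 19325)
```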


We now know which hurricane caused the most deaths, but what about the others? We want to create a mortality scale based on the number of deaths each hurricane caused. The scale runs from 0 to 5, with the following bounds:
* 0: Deaths $= 0$
* 1: $0 <$ Deaths $\leq 100$
* 2: $100 <$ Deaths $\leq 500$
* 3: $500 <$ Deaths $\leq 1000$
* 4: $1000 <$ Deaths $\leq 10000$
* 5: Deaths $> 10000$
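Because every rating covers a half-open interval (lower, upper], the bounds can be stored as a sorted list of inclusive upper limits, and `bisect.bisect_left` then returns the rating directly: it counts how many limits are strictly below the value. A minimal sketch; `MORTALITY_BOUNDS`, `rate` and `mortality_scale_sketch` are hypothetical names.

```python
from bisect import bisect_left

# Inclusive upper limit of ratings 0 to 4; anything above 10000 is rated 5.
MORTALITY_BOUNDS = [0, 100, 500, 1000, 10000]


def rate(value, bounds):
    """Return the number of bounds strictly below value, i.e. the rating."""
    return bisect_left(bounds, value)


def mortality_scale_sketch(data):
    """Group hurricane names by mortality rating (0 to 5)."""
    scale = {rating: [] for rating in range(6)}
    for name, record in data.items():
        scale[rate(record["Deaths"], MORTALITY_BOUNDS)].append(name)
    return scale


for deaths in (0, 90, 100, 101, 4000, 19325):
    print(deaths, rate(deaths, MORTALITY_BOUNDS))
# 0 0 / 90 1 / 100 1 / 101 2 / 4000 4 / 19325 5 (one pair per line)
```

Keeping the bounds in one list means changing the scale only touches data, not a chain of `elif` branches.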

In [59]:
def mortality_scale(data):
    """ Returns a dictionary of the names of hurricanes rated on a mortality scale where the mortality scale is the
        key. """
    mortality_dict = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}
    for key in data:
        if data[key]["Deaths"] == 0:
            mortality_dict[0].append(key)
        elif data[key]["Deaths"] > 0 and data[key]["Deaths"] <= 100:
            mortality_dict[1].append(key)
        elif data[key]["Deaths"] > 100 and data[key]["Deaths"] <= 500:
            mortality_dict[2].append(key)
        elif data[key]["Deaths"] > 500 and data[key]["Deaths"] <= 1000:
            mortality_dict[3].append(key)
        elif data[key]["Deaths"] > 1000 and data[key]["Deaths"] <=10000:
            mortality_dict[4].append(key)
        else:
            mortality_dict[5].append(key)
    return mortality_dict

In [61]:
mortality = mortality_scale(hurricanes_data)
print(mortality)

{0: [], 1: ['Cuba I', 'Bahamas', 'Carol', 'Carla', 'Edith', 'Anita', 'Andrew', 'Isabel', 'Emily', 'Wilma', 'Dean', 'Michael'], 2: ['CubaBrownsville', 'Tampico', 'Labor Day', 'Hattie', 'Camille', 'Allen', 'Gilbert', 'Hugo', 'Ivan', 'Rita', 'Felix', 'Irma'], 3: ['New England', 'Beulah', 'Matthew'], 4: ['San Felipe II Okeechobee', 'Cuba II', 'Janet', 'David', 'Katrina', 'Maria'], 5: ['Mitch']}


### Which hurricane caused the most damage?

Finally, we look at damages. First, we find which hurricane caused the most damage and the value of that damage.

In [62]:
def greatest_damage(data):
    """ Returns the hurricane that caused the most damage and the value of that damage in dollars. """
    name = ""
    damage = 0.0
    for key in data:
        if data[key]["Damage"] == "Damages not recorded": 
            continue
        elif data[key]["Damage"] > damage:
            name = key
            damage = data[key]["Damage"]
    return name, damage

In [63]:
most_damage = greatest_damage(hurricanes_data)
print(most_damage)

('Katrina', 125000000000.0)
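An alternative to skipping the "Damages not recorded" strings inside the loop is to filter them out first and then apply `max` with a `key`. A minimal sketch (hypothetical name `greatest_damage_sketch`) that, like the original, returns `("", 0.0)` when no damage was recorded; the sample values come from the outputs above.

```python
def greatest_damage_sketch(data):
    """Return (name, damage) for the hurricane with the largest recorded damage.

    Entries whose damage is not a number (e.g. "Damages not recorded")
    are skipped; returns ("", 0.0) if no damage was recorded at all.
    """
    recorded = {
        name: record["Damage"]
        for name, record in data.items()
        if isinstance(record["Damage"], (int, float))
    }
    if not recorded:
        return "", 0.0
    name = max(recorded, key=recorded.get)
    return name, recorded[name]


sample = {
    "Cuba I": {"Damage": "Damages not recorded"},
    "San Felipe II Okeechobee": {"Damage": 100000000.0},
    "Katrina": {"Damage": 125000000000.0},
}
print(greatest_damage_sketch(sample))  # ('Katrina', 125000000000.0)
```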


To finish, we will rate the hurricanes on a damage scale, similar to the mortality scale above, with the following bounds:
* 0: Damage $=$ \$0
* 1: \$0 $<$ Damage $\leq$ \$100,000,000
* 2: \$100,000,000 $<$ Damage $\leq$ \$1,000,000,000
* 3: \$1,000,000,000 $<$ Damage $\leq$ \$10,000,000,000
* 4: \$10,000,000,000 $<$ Damage $\leq$ \$100,000,000,000
* 5: Damage $>$ \$100,000,000,000

In [64]:
def damage_scale(data):
    """ Returns a dictionary of the names of hurricanes rated on a damage scale, where the damage rating is the
        key. Hurricanes whose damage was not recorded are left out. """
    scale = {0: [], 1: [], 2: [], 3: [], 4: [], 5: []}
    for key in data:
        if data[key]["Damage"] == "Damages not recorded":
            continue
        elif data[key]["Damage"] == 0:
            scale[0].append(key)
        elif data[key]["Damage"] > 0 and data[key]["Damage"] <= 100000000:
            scale[1].append(key)
        elif data[key]["Damage"] > 100000000 and data[key]["Damage"] <= 1000000000:
            scale[2].append(key)
        elif data[key]["Damage"] > 1000000000 and data[key]["Damage"] <= 10000000000:
            scale[3].append(key)
        elif data[key]["Damage"] > 10000000000 and data[key]["Damage"] <= 100000000000:
            scale[4].append(key)
        else:
            scale[5].append(key)
    return scale

In [65]:
damage_scl = damage_scale(hurricanes_data)
print(damage_scl)

{0: [], 1: ['San Felipe II Okeechobee', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Carol', 'Janet', 'Hattie', 'Edith'], 2: ['New England', 'Carla', 'Beulah', 'Felix'], 3: ['Camille', 'David', 'Allen', 'Gilbert', 'Hugo', 'Mitch', 'Isabel', 'Emily', 'Dean'], 4: ['Andrew', 'Ivan', 'Rita', 'Wilma', 'Matthew', 'Irma', 'Maria', 'Michael'], 5: ['Katrina']}
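`damage_scale` drops hurricanes whose damage was not recorded, so they vanish from the result (Cuba I, for instance, appears in no rating above). If you would rather keep them visible, one option is an extra bucket. A minimal sketch, where `DAMAGE_BOUNDS`, `damage_scale_sketch` and the `"Not recorded"` key are hypothetical; it encodes the (lower, upper] bounds above as inclusive upper limits and looks the rating up with `bisect_left`.

```python
from bisect import bisect_left

# Inclusive upper limit (in dollars) of damage ratings 0 to 4.
DAMAGE_BOUNDS = [0, 100_000_000, 1_000_000_000, 10_000_000_000, 100_000_000_000]


def damage_scale_sketch(data):
    """Group hurricane names by damage rating, plus a bucket for unrecorded damage."""
    scale = {rating: [] for rating in range(6)}
    scale["Not recorded"] = []
    for name, record in data.items():
        damage = record["Damage"]
        if isinstance(damage, str):  # e.g. "Damages not recorded"
            scale["Not recorded"].append(name)
        else:
            scale[bisect_left(DAMAGE_BOUNDS, damage)].append(name)
    return scale


sample = {
    "Cuba I": {"Damage": "Damages not recorded"},
    "San Felipe II Okeechobee": {"Damage": 100000000.0},
    "Katrina": {"Damage": 125000000000.0},
}
print(damage_scale_sketch(sample))
# {0: [], 1: ['San Felipe II Okeechobee'], 2: [], 3: [], 4: [], 5: ['Katrina'], 'Not recorded': ['Cuba I']}
```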
