# Python Fundamentals - Hurricane Project

## About the dataset
Data collected on the 34 strongest Atlantic hurricanes are provided in a series of lists. The lists are aligned by index: the values at each index, from 0 to 33, describe the same hurricane. The data includes:

### damages
Contains strings representing the total cost of damages in USD caused by 34 Category 5 hurricanes (wind speeds ≥ 157 mph (252 km/h)) in the Atlantic region. For some of the hurricanes, damage data was not recorded (`"Damages not recorded"`), while the rest are written in the format `"Prefix-B/M"` (for example `"1.42B"` or `"100M"`), where `B` stands for billions (1,000,000,000, i.e. 1e9) and `M` stands for millions (1,000,000, i.e. 1e6).

### names
Names of the hurricanes

### months 
Months in which the hurricanes occurred

### years
Years in which the hurricanes occurred

### max_sustained_winds
Maximum sustained winds (miles per hour) of the hurricanes

### areas_affected
List of different areas affected by each of the hurricanes

### deaths
Total number of deaths caused by each of the hurricanes

### Source of dataset
* Dataset downloaded from [codecademy](https://content.codecademy.com/PRO/independent-practice-projects/hurricane-analysis/hurricane_analysis_starting.zip)

In [1]:
# damages (USD($)) of hurricanes
damages = ['Damages not recorded', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B']
# alt damages for testing damages == 0
# damages = ['0', '100M', 'Damages not recorded', '40M', '27.9M', '5M', 'Damages not recorded', '306M', '2M', '65.8M', '326M', '60.3M', '208M', '1.42B', '25.4M', 'Damages not recorded', '1.54B', '1.24B', '7.1B', '10B', '26.5B', '6.2B', '5.37B', '23.3B', '1.01B', '125B', '12B', '29.4B', '1.76B', '720M', '15.1B', '64.8B', '91.6B', '25.1B']

# names of hurricanes
names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Labor Day', 'New England', 'Carol', 'Janet', 'Carla', 'Hattie', 'Beulah', 'Camille', 'Edith', 'Anita', 'David', 'Allen', 'Gilbert', 'Hugo', 'Andrew', 'Mitch', 'Isabel', 'Ivan', 'Emily', 'Katrina', 'Rita', 'Wilma', 'Dean', 'Felix', 'Matthew', 'Irma', 'Maria', 'Michael']

# months of hurricanes
months = ['October', 'September', 'September', 'November', 'August', 'September', 'September', 'September', 'September', 'September', 'September', 'October', 'September', 'August', 'September', 'September', 'August', 'August', 'September', 'September', 'August', 'October', 'September', 'September', 'July', 'August', 'September', 'October', 'August', 'September', 'October', 'September', 'September', 'October']

# years of hurricanes
years = [1924, 1928, 1932, 1932, 1933, 1933, 1935, 1938, 1953, 1955, 1961, 1961, 1967, 1969, 1971, 1977, 1979, 1980, 1988, 1989, 1992, 1998, 2003, 2004, 2005, 2005, 2005, 2005, 2007, 2007, 2016, 2017, 2017, 2018]

# maximum sustained winds (mph) of hurricanes
max_sustained_winds = [165, 160, 160, 175, 160, 160, 185, 160, 160, 175, 175, 160, 160, 175, 160, 175, 175, 190, 185, 160, 175, 180, 165, 165, 160, 175, 180, 185, 175, 175, 165, 180, 175, 160]

# areas affected by each hurricane
areas_affected = [['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], ['Lesser Antilles', 'The Bahamas', 'United States East Coast', 'Atlantic Canada'], ['The Bahamas', 'Northeastern United States'], ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], ['The Bahamas', 'Cuba', 'Florida', 'Texas', 'Tamaulipas'], ['Jamaica', 'Yucatn Peninsula'], ['The Bahamas', 'Florida', 'Georgia', 'The Carolinas', 'Virginia'], ['Southeastern United States', 'Northeastern United States', 'Southwestern Quebec'], ['Bermuda', 'New England', 'Atlantic Canada'], ['Lesser Antilles', 'Central America'], ['Texas', 'Louisiana', 'Midwestern United States'], ['Central America'], ['The Caribbean', 'Mexico', 'Texas'], ['Cuba', 'United States Gulf Coast'], ['The Caribbean', 'Central America', 'Mexico', 'United States Gulf Coast'], ['Mexico'], ['The Caribbean', 'United States East coast'], ['The Caribbean', 'Yucatn Peninsula', 'Mexico', 'South Texas'], ['Jamaica', 'Venezuela', 'Central America', 'Hispaniola', 'Mexico'], ['The Caribbean', 'United States East Coast'], ['The Bahamas', 'Florida', 'United States Gulf Coast'], ['Central America', 'Yucatn Peninsula', 'South Florida'], ['Greater Antilles', 'Bahamas', 'Eastern United States', 'Ontario'], ['The Caribbean', 'Venezuela', 'United States Gulf Coast'], ['Windward Islands', 'Jamaica', 'Mexico', 'Texas'], ['Bahamas', 'United States Gulf Coast'], ['Cuba', 'United States Gulf Coast'], ['Greater Antilles', 'Central America', 'Florida'], ['The Caribbean', 'Central America'], ['Nicaragua', 'Honduras'], ['Antilles', 'Venezuela', 'Colombia', 'United States East Coast', 'Atlantic Canada'], ['Cape Verde', 'The Caribbean', 'British Virgin Islands', 'U.S. Virgin Islands', 'Cuba', 'Florida'], ['Lesser Antilles', 'Virgin Islands', 'Puerto Rico', 'Dominican Republic', 'Turks and Caicos Islands'], ['Central America', 'United States Gulf Coast (especially Florida Panhandle)']]

# deaths for each hurricane
deaths = [90,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74]
# alt death for testing deaths == 0
# deaths = [0,4000,16,3103,179,184,408,682,5,1023,43,319,688,259,37,11,2068,269,318,107,65,19325,51,124,17,1836,125,87,45,133,603,138,3057,74]

In [2]:
# Check input lists
print("Names: {}".format(len(names)))
print("Damages: {}".format(len(damages)))
print("Areas: {}".format(len(areas_affected)))
print("Deaths: {}".format(len(deaths)))
print("Max Winds: {}".format(len(max_sustained_winds)))
print("Years: {}".format(len(years)))

Names: 34
Damages: 34
Areas: 34
Deaths: 34
Max Winds: 34
Years: 34
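Every later step relies on the lists lining up index by index, so the length check above can also be written as a single comparison. The following is a minimal sketch on a three-hurricane excerpt of the data; `same_length` and the `sample_*` names are illustrative helpers that are not part of the original project.

```python
def same_length(*lists):
    """Return True when every list has the same number of items."""
    return len({len(lst) for lst in lists}) <= 1

# Three-hurricane excerpt of the dataset (illustrative sample only)
sample_names = ['Cuba I', 'San Felipe II Okeechobee', 'Bahamas']
sample_years = [1924, 1928, 1932]
sample_deaths = [90, 4000, 16]

aligned = same_length(sample_names, sample_years, sample_deaths)

# zip() pairs up the items stored at the same index
first_rows = list(zip(sample_names, sample_years, sample_deaths))
print(aligned)
print(first_rows[0])
```

Collecting the lengths in a set means that any mismatch shows up as more than one distinct length.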


## Question 2
Write a function that returns a new list of updated damages where the recorded data is converted to float values and the missing data is retained as "Damages not recorded".

In [3]:
def update_damages(damages):
    """Convert damages to float numbers. Returns list"""
    updated_damages = []
    for damage in damages:
        if damage == "Damages not recorded":
            pass
        elif 'M' in damage:
            damage = float(damage.strip('M')) * 1000000
        elif 'B' in damage:
            damage = float(damage.strip('B')) * 1000000000 
        else:
            damage = float(damage)

        updated_damages.append(damage)
    return updated_damages

In [4]:
updated_damages = update_damages(damages)
print(updated_damages)

['Damages not recorded', 100000000.0, 'Damages not recorded', 40000000.0, 27900000.0, 5000000.0, 'Damages not recorded', 306000000.0, 2000000.0, 65800000.0, 326000000.0, 60300000.0, 208000000.0, 1420000000.0, 25400000.0, 'Damages not recorded', 1540000000.0, 1240000000.0, 7100000000.0, 10000000000.0, 26500000000.0, 6200000000.0, 5370000000.0, 23300000000.0, 1010000000.0, 125000000000.0, 12000000000.0, 29400000000.0, 1760000000.0, 720000000.0, 15100000000.0, 64800000000.0, 91600000000.0, 25100000000.0]
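The same conversion can also be driven by a lookup table of suffixes, which keeps the multipliers in one place. This is only a sketch of an alternative, not the approach used above; `MULTIPLIERS`, `parse_damage` and `sample_damages` are hypothetical names introduced for illustration.

```python
# Multiplier for each suffix used in the dataset
MULTIPLIERS = {'M': 1e6, 'B': 1e9}

def parse_damage(text):
    """Convert '1.42B' or '100M' to a float; return other non-numeric text unchanged."""
    suffix = text[-1:]  # slicing avoids an IndexError on an empty string
    if suffix in MULTIPLIERS:
        return float(text[:-1]) * MULTIPLIERS[suffix]
    try:
        return float(text)
    except ValueError:
        return text  # e.g. "Damages not recorded"

sample_damages = ['Damages not recorded', '100M', '1.42B', '0']
parsed = [parse_damage(d) for d in sample_damages]
print(parsed)
```

Supporting another suffix (say, `K` for thousands) would then only require a new entry in the dictionary.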


## Question 3
Write a function that constructs a dictionary made out of the lists, where the keys of the dictionary are the names of the hurricanes, and the values are dictionaries themselves containing a key for each piece of data (Name, Month, Year, Max Sustained Wind, Areas Affected, Damage, Deaths) about the hurricane.

Thus the key "Cuba I" would have the value: 
```
{'Name': 'Cuba I', 'Month': 'October', 'Year': 1924, 'Max Sustained Wind': 165, 'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], 'Damage': 'Damages not recorded', 'Deaths': 90}
```

In [5]:
def make_dictionary(names, months, years, max_winds, areas, damages, deaths):
    """Returns a dictionary with name:{hurricane_data} as pairs"""
    records = {}
    for i in range(len(names)):
        records[names[i]] = {"Name": names[i], 
                            'Month': months[i], 
                            'Year': years[i], 
                            'Max Sustained Wind': max_winds[i],
                            'Areas Affected': areas[i],
                            'Damage': damages[i],
                            'Deaths': deaths[i]}
    return records 

In [6]:
hurricane_records = make_dictionary(names, months, years, max_sustained_winds, areas_affected, updated_damages, deaths)

In [7]:
print(hurricane_records['Cuba I'])
print(len(hurricane_records.keys()))

{'Name': 'Cuba I', 'Month': 'October', 'Year': 1924, 'Max Sustained Wind': 165, 'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas'], 'Damage': 'Damages not recorded', 'Deaths': 90}
34
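Because the lists are aligned by index, the records can also be built by walking the lists in parallel with `zip()` instead of indexing. The following is a reduced sketch with only three fields; `make_records` and `sample` are illustrative names, not part of the project.

```python
def make_records(names, months, years):
    """Build {name: {field: value}} by walking the lists in parallel."""
    records = {}
    for name, month, year in zip(names, months, years):
        records[name] = {'Name': name, 'Month': month, 'Year': year}
    return records

sample = make_records(['Cuba I', 'Bahamas'],
                      ['October', 'September'],
                      [1924, 1932])
print(sample['Bahamas'])
```

Note that `zip()` stops silently at the shortest list, so it is worth checking the list lengths first.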


## Question 4
In addition to organizing the hurricanes in a dictionary with names as the key, you want to be able to organize the hurricanes by year.

Write a function that converts the current dictionary of hurricanes to a new dictionary, where the keys are years and the values are lists containing a dictionary for each hurricane that occurred in that year.

For example, the key `1932` would yield the value:
```
[{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}, {'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}]
```

In [8]:
def hurricanes_by_year(hurricane_dict):
    """Returns a dictionary with year:[records] as pairs"""
    # local name differs from the function name to avoid shadowing it
    by_year = {}
    for record in hurricane_dict.values():
        year = record['Year']
        if year not in by_year:
            by_year[year] = [record]
        else:
            by_year[year].append(record)
    return by_year

In [9]:
hurricane_records_year = hurricanes_by_year(hurricane_records)

In [10]:
print(hurricane_records_year[1932])

[{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}, {'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}]
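The "create the list on first sight, append afterwards" pattern used for grouping can also be expressed with `collections.defaultdict`. This is a sketch of that alternative on a reduced sample; `group_by_year`, `sample_records` and `sample_by_year` are illustrative names.

```python
from collections import defaultdict

def group_by_year(records):
    """Return {year: [record, ...]} for a {name: record} dictionary."""
    grouped = defaultdict(list)  # missing years start as an empty list
    for record in records.values():
        grouped[record['Year']].append(record)
    return dict(grouped)  # plain dict, so missing keys raise KeyError again

# Reduced records (illustrative sample only)
sample_records = {
    'Bahamas': {'Name': 'Bahamas', 'Year': 1932},
    'Cuba II': {'Name': 'Cuba II', 'Year': 1932},
    'Labor Day': {'Name': 'Labor Day', 'Year': 1935},
}
sample_by_year = group_by_year(sample_records)
print([record['Name'] for record in sample_by_year[1932]])
```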


## Question 5
You believe that knowing how often each area of the Atlantic is affected by these strong hurricanes is important for making preparations for future hurricanes.

Write a function that counts how often each area is listed as an affected area of a hurricane. Store and return the results in a dictionary where the keys are the affected areas and the values are counts of how many times the areas were affected.

In [11]:
def count_by_area(hurricane_dict):
    """ Counts how often each area is listed as an affected area of a hurricane. Returns a dictionary with area:count pairs"""
    hurricanes_by_area = {}
    for record in hurricane_dict.values():
        for area in record['Areas Affected']:
            if area not in hurricanes_by_area:
                hurricanes_by_area[area] = 1
            else:
                hurricanes_by_area[area] +=1
    return hurricanes_by_area        

In [12]:
hurricanes_by_area = count_by_area(hurricane_records)

In [13]:
hurricanes_by_area['The Bahamas']

7
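Counting occurrences is what `collections.Counter` from the standard library is designed for. The following is a possible alternative to the manual counting loop, shown on a reduced sample; `count_areas`, `sample_records` and `sample_counts` are illustrative names.

```python
from collections import Counter

def count_areas(records):
    """Return a Counter mapping each area to the number of hurricanes listing it."""
    return Counter(area
                   for record in records.values()
                   for area in record['Areas Affected'])

# Reduced records (illustrative sample only)
sample_records = {
    'Cuba I': {'Areas Affected': ['Central America', 'Mexico', 'Cuba', 'Florida', 'The Bahamas']},
    'Bahamas': {'Areas Affected': ['The Bahamas', 'Northeastern United States']},
    'Hattie': {'Areas Affected': ['Central America']},
}
sample_counts = count_areas(sample_records)
print(sample_counts['The Bahamas'])
```

A `Counter` returns 0 for areas it has never seen, so lookups do not need a membership check.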

## Question 6
Write a function that finds the area affected by the most hurricanes, and how often it was hit.

In [14]:
def area_most_affected(hurricane_dict):
    """Returns the area affected by the most hurricanes and how often it was hit, as a pair (area, count)"""
    # use the dictionary passed in, not the global hurricane_records
    hurricanes_by_area = count_by_area(hurricane_dict)
    max_hurricanes = max(hurricanes_by_area.values())
    index_of_max = list(hurricanes_by_area.values()).index(max_hurricanes)
    return list(hurricanes_by_area.items())[index_of_max]

In [15]:
area_most_affected(hurricane_records)

('Central America', 9)
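The lookup of the maximum by index can be shortened by passing a `key` function to `max()`, which then compares the `(area, count)` pairs by their count. This is a small sketch on made-up sample counts; `most_affected` and `sample_counts` are illustrative names.

```python
def most_affected(area_counts):
    """Return the (area, count) pair with the highest count."""
    return max(area_counts.items(), key=lambda pair: pair[1])

# Illustrative sample counts
sample_counts = {'Central America': 9, 'The Bahamas': 7, 'Bermuda': 2}
print(most_affected(sample_counts))
```

When several areas share the highest count, `max()` returns the first one it encounters, which matches the behaviour of the index-based version.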

## Question 7
Write a function that finds the hurricane that caused the greatest number of deaths, and how many deaths it caused.

In [16]:
def most_deaths(hurricane_dict):
    """Returns the hurricane that caused the most deaths and the num of deaths"""
    max_deaths = 0
    hurricane_name = ''
    for name, record in hurricane_dict.items():
        if record['Deaths'] > max_deaths: 
            max_deaths = record['Deaths']
            hurricane_name = name
    return hurricane_name, max_deaths

In [17]:
most_deaths(hurricane_records)

('Mitch', 19325)

Another option is more general. The following two functions find the maximum value for any key of the records whose values are numbers.

In [18]:
def subdict_by_name(hurricane_dict, key):
    """Simplifies the records to a dictionary with the hurricane name as key and the value of one data category as value"""
    name_dict = {}
    for name, record in hurricane_dict.items():
        name_dict[name] = record[key]
    return name_dict        

In [19]:
def most_by_key(hurricane_dict, key):
    """Finds the maximum value for a key whose values are numbers. Returns a pair (name, value)"""
    name_dict = subdict_by_name(hurricane_dict, key)
    max_value = max(name_dict.values())
    index_of_max_value = list(name_dict.values()).index(max_value)
    return list(name_dict.items())[index_of_max_value]

In [20]:
most_by_key(hurricane_records, 'Deaths')

('Mitch', 19325)
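The two helper functions can also be collapsed into a single call to `max()` over the hurricane names, using the requested field as the comparison key. This is a sketch under the assumption that every record holds a number for that field; `top_by_key` and `sample_records` are illustrative names.

```python
def top_by_key(records, key):
    """Return (name, value) for the record with the largest numeric value under key."""
    name = max(records, key=lambda n: records[n][key])
    return name, records[name][key]

# Reduced records (illustrative sample only)
sample_records = {
    'Mitch': {'Deaths': 19325, 'Max Sustained Wind': 180},
    'Allen': {'Deaths': 269, 'Max Sustained Wind': 190},
    'Carol': {'Deaths': 5, 'Max Sustained Wind': 160},
}
print(top_by_key(sample_records, 'Deaths'))
print(top_by_key(sample_records, 'Max Sustained Wind'))
```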

## Question 8
Just as hurricanes are rated by their wind speed, you want to try rating hurricanes based on other metrics.

Write a function that rates hurricanes on a mortality scale according to the following ratings, where the key is the rating and the value is the upper bound of deaths for that rating.
```
mortality_scale = {0: 0,
                   1: 100,
                   2: 500,
                   3: 1000,
                   4: 10000}
```
For example, a hurricane with a 1 mortality rating would have resulted in greater than 0 but less than or equal to 100 deaths. A hurricane with a 5 mortality rating would have resulted in greater than 10000 deaths.

Store the hurricanes in a new dictionary where the keys are mortality ratings and the values are lists containing a dictionary for each hurricane that falls into that mortality rating.

In [21]:
def mortality_rating(hurricane_dict):
    """Groups hurricanes by number of deaths and returns a dictionary with mortality_rating:[hurricane names] as pairs"""
    hurricane_by_deaths = {0:[],1:[],2:[],3:[],4:[],5:[]}
    mortality_scale = {0: 0,
                   1: 100,
                   2: 500,
                   3: 1000,
                   4: 10000}
    deaths_dict = subdict_by_name(hurricane_dict, 'Deaths')
    for name, deaths in deaths_dict.items():
        some_deaths = False 
        for index in range(4, -1, -1):
            if deaths > mortality_scale[index]:
                hurricane_by_deaths[index + 1].append(name)
                some_deaths = True
                break
        # catch for zero deaths
        if not some_deaths: hurricane_by_deaths[0].append(name)
    return hurricane_by_deaths

In [22]:
list(range(4, -1, -1))

[4, 3, 2, 1, 0]

In [23]:
hurricane_scale_by_deaths = mortality_rating(hurricane_records)

Some tests:

In [24]:
print(hurricane_records['Cuba II'])
hurricane_scale_by_deaths[4]

{'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}


['San Felipe II Okeechobee', 'Cuba II', 'Janet', 'David', 'Katrina', 'Maria']

In [25]:
print(hurricane_records['Bahamas'])
hurricane_scale_by_deaths[1]

{'Name': 'Bahamas', 'Month': 'September', 'Year': 1932, 'Max Sustained Wind': 160, 'Areas Affected': ['The Bahamas', 'Northeastern United States'], 'Damage': 'Damages not recorded', 'Deaths': 16}


['Cuba I',
 'Bahamas',
 'Carol',
 'Carla',
 'Edith',
 'Anita',
 'Andrew',
 'Isabel',
 'Emily',
 'Wilma',
 'Dean',
 'Michael']
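Since the upper bounds of the mortality scale are sorted, the rating of a hurricane is the position where its death count would be inserted into the list of bounds. The following sketch uses `bisect_left` from the standard library for that lookup, as an alternative to the backwards loop; `MORTALITY_BOUNDS`, `rate_deaths`, `group_by_mortality` and the sample data are illustrative names.

```python
from bisect import bisect_left

# Upper bounds for ratings 0..4; anything above the last bound is rating 5
MORTALITY_BOUNDS = [0, 100, 500, 1000, 10000]

def rate_deaths(deaths):
    """Return the mortality rating (0-5) for a death count."""
    # bisect_left places a value equal to a bound at that bound's index,
    # which makes every upper bound inclusive
    return bisect_left(MORTALITY_BOUNDS, deaths)

def group_by_mortality(deaths_by_name):
    """Return {rating: [names]} for a {name: deaths} dictionary."""
    groups = {rating: [] for rating in range(len(MORTALITY_BOUNDS) + 1)}
    for name, deaths in deaths_by_name.items():
        groups[rate_deaths(deaths)].append(name)
    return groups

# Excerpt of the death counts (illustrative sample only)
sample_deaths = {'Carol': 5, 'Cuba I': 90, 'Allen': 269, 'Cuba II': 3103, 'Mitch': 19325}
sample_groups = group_by_mortality(sample_deaths)
print(sample_groups)
```

With this approach the zero-deaths case needs no special handling, because 0 is inserted at index 0.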

## Question 9
Write a function that finds the hurricane that caused the greatest damage, and how costly it was.

Note: I'm using a dictionary comprehension with a conditional expression (`if`/`else`):
```
{key: (a if condition else b) for key, value in d.items()}
```

Hurricanes with "Damages not recorded" are treated as having zero damages, so they are effectively left out of the comparison.

In [26]:
def most_damages(hurricane_dict):
    """Returns the name of the hurricane with most damages (name, damage). Hurricanes with "Damages not recorded" are not considered"""
    
    name_dict = subdict_by_name(hurricane_dict, 'Damage')
    # Treat unrecorded damages as 0 so that max() only compares numbers
    name_dict_fixed = {name: (0 if damage == "Damages not recorded" else damage)
                       for name, damage in name_dict.items()}
    max_damage = max(name_dict_fixed.values())
    index_max_damage = list(name_dict_fixed.values()).index(max_damage)
    return list(name_dict_fixed.items())[index_max_damage]

In [27]:
most_damages(hurricane_records)

('Katrina', 125000000000.0)
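Instead of replacing the missing values with 0, the unrecorded entries can be filtered out before looking for the maximum. This is a sketch of that variant on a reduced sample; `costliest` and `sample_records` are illustrative names, and returning `None` when nothing is recorded is a design choice of the sketch.

```python
def costliest(records):
    """Return (name, damage) for the highest recorded damage, or None if none is recorded."""
    # keep only numeric damages; strings such as "Damages not recorded" are skipped
    recorded = {name: rec['Damage']
                for name, rec in records.items()
                if isinstance(rec['Damage'], (int, float))}
    if not recorded:
        return None
    name = max(recorded, key=recorded.get)
    return name, recorded[name]

# Reduced records (illustrative sample only)
sample_records = {
    'Cuba I': {'Damage': 'Damages not recorded'},
    'Camille': {'Damage': 1420000000.0},
    'Katrina': {'Damage': 125000000000.0},
}
print(costliest(sample_records))
```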

## Question 10

Lastly, you want to rate hurricanes according to how much damage they cause.

Write a function that rates hurricanes on a damage scale according to the following ratings, where the key is the rating and the value is the upper bound of damage for that rating.
```
damage_scale = {0: 0,
                1: 100000000,
                2: 1000000000,
                3: 10000000000,
                4: 50000000000}
```
For example, a hurricane with a 1 damage rating would have resulted in damages greater than 0 USD but less than or equal to 100000000 USD. A hurricane with a 5 damage rating would have resulted in damages greater than 50000000000 USD (talk about a lot of money).

Store the hurricanes in a new dictionary where the keys are damage ratings and the values are lists containing a dictionary for each hurricane that falls into that damage rating.

In [28]:
def damages_rating(hurricane_dict):
    """Groups hurricanes by damages and returns a dictionary with damage_rating:[hurricane names] as pairs"""
    hurricane_by_damage = {0:[],1:[],2:[],3:[],4:[],5:[],'Not Recorded':[]}
    damage_scale = {0: 0,
                1: 100000000,
                2: 1000000000,
                3: 10000000000,
                4: 50000000000}
    damages_dict = subdict_by_name(hurricane_dict, 'Damage')
    for name, damage in damages_dict.items():
        # unrecorded damages go to their own group
        if damage == "Damages not recorded":
            hurricane_by_damage['Not Recorded'].append(name)
            continue
        some_damage = False
        # walk the scale from the highest bound down to the lowest
        for index in range(4, -1, -1):
            if damage > damage_scale[index]:
                hurricane_by_damage[index + 1].append(name)
                some_damage = True
                break
        # catch damages == 0
        if not some_damage:
            hurricane_by_damage[0].append(name)
    return hurricane_by_damage

In [29]:
hurricane_scale_by_damages = damages_rating(hurricane_records)
print(hurricane_scale_by_damages)

{0: [], 1: ['San Felipe II Okeechobee', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Carol', 'Janet', 'Hattie', 'Edith'], 2: ['New England', 'Carla', 'Beulah', 'Felix'], 3: ['Camille', 'David', 'Allen', 'Gilbert', 'Hugo', 'Mitch', 'Isabel', 'Emily', 'Dean'], 4: ['Andrew', 'Ivan', 'Rita', 'Wilma', 'Matthew', 'Michael'], 5: ['Katrina', 'Irma', 'Maria'], 'Not Recorded': ['Cuba I', 'Bahamas', 'Labor Day', 'Anita']}


Some tests:

In [30]:
print(hurricane_scale_by_damages[1])
print(hurricane_records['Cuba II'])

['San Felipe II Okeechobee', 'Cuba II', 'CubaBrownsville', 'Tampico', 'Carol', 'Janet', 'Hattie', 'Edith']
{'Name': 'Cuba II', 'Month': 'November', 'Year': 1932, 'Max Sustained Wind': 175, 'Areas Affected': ['Lesser Antilles', 'Jamaica', 'Cayman Islands', 'Cuba', 'The Bahamas', 'Bermuda'], 'Damage': 40000000.0, 'Deaths': 3103}
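The task statement asks for lists containing a dictionary for each hurricane, while the function above stores only the names. A possible variant that stores the full records and looks up the rating with `bisect_left` could look like the sketch below; `DAMAGE_BOUNDS`, `group_by_damage` and the sample data are illustrative names, and the `'Not Recorded'` key follows the convention used above.

```python
from bisect import bisect_left

# Upper bounds for ratings 0..4; anything above the last bound is rating 5
DAMAGE_BOUNDS = [0, 100000000, 1000000000, 10000000000, 50000000000]

def group_by_damage(records):
    """Return {rating: [record, ...]} plus a 'Not Recorded' group."""
    groups = {rating: [] for rating in range(len(DAMAGE_BOUNDS) + 1)}
    groups['Not Recorded'] = []
    for record in records.values():
        damage = record['Damage']
        if isinstance(damage, (int, float)):
            # a value equal to a bound lands on that bound's index (inclusive)
            groups[bisect_left(DAMAGE_BOUNDS, damage)].append(record)
        else:
            groups['Not Recorded'].append(record)
    return groups

# Reduced records (illustrative sample only)
sample_records = {
    'Cuba I': {'Name': 'Cuba I', 'Damage': 'Damages not recorded'},
    'Cuba II': {'Name': 'Cuba II', 'Damage': 40000000.0},
    'Carla': {'Name': 'Carla', 'Damage': 326000000.0},
    'Hugo': {'Name': 'Hugo', 'Damage': 10000000000.0},
    'Katrina': {'Name': 'Katrina', 'Damage': 125000000000.0},
}
sample_groups = group_by_damage(sample_records)
print({rating: [r['Name'] for r in group] for rating, group in sample_groups.items()})
```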


Check that every hurricane in the dataset was assigned to exactly one group:

In [31]:
total_counted = 0
for hurricanes in hurricane_scale_by_damages.values():
    total_counted += len(hurricanes)
print("Total Counted: {}".format(total_counted))
print("Not Recorded: {}".format(updated_damages.count("Damages not recorded")))
print("Total in dataset: {}".format(len(names)))

Total Counted: 34
Not Recorded: 4
Total in dataset: 34


## References
* [Python: Retrieve the Index of the Max Value in a List](https://careerkarma.com/blog/python-max-index/)

* [Stackoverflow - Python dictionary: are keys() and values() always the same order?](https://stackoverflow.com/questions/835092/python-dictionary-are-keys-and-values-always-the-same-order)

* [How can I use if/else in a dictionary comprehension?](https://stackoverflow.com/questions/9442724/how-can-i-use-if-else-in-a-dictionary-comprehension)