# Overview
Hurricanes can be devastating in both lives lost and economic cost, as 2017 showed when Harvey, Irma and Maria struck the Atlantic Basin. We compiled a list of the costliest Atlantic hurricanes from Wikipedia. This data gives insight into which hurricanes were the most costly, which seasons were the busiest, and which areas were affected the most.

# Data & Setup
The data was collected from https://en.wikipedia.org/wiki/List_of_costliest_Atlantic_hurricanes. Data for 2017 had not been published at the time of writing, so it is not included in this dataset.

In [21]:
import pandas as pd
import numpy as np
proxy='https://proxy.mentoracademy.org/getContentFromWikiUrl/'
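# read_html returns a list with one DataFrame per HTML table on the page
df1 = pd.read_html('https://en.wikipedia.org/wiki/List_of_costliest_Atlantic_hurricanes')
# keep the first table, dropping its first row and the column labelled 5
df = df1[0].drop(df1[0].index[0]).drop(5, axis=1)
df.columns = ['Name','Cost in Billions','Season','Classification','Areas Affected']
# drop the first three characters of each classification, trim whitespace, and replace non-breaking spaces
df['Classification']=df['Classification'].apply(lambda x: str(x)[3:]).apply(lambda x: x.strip()).replace('\xa0',' ',regex=True)
# drop the leading character of each cost (presumably the currency symbol) and convert to a number
df['Cost in Billions']=df['Cost in Billions'].apply(lambda x: str(x)[1:]).apply(pd.to_numeric)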
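
The cleaning steps above can be tried without a network connection. The following is a minimal sketch on made-up rows. The hurricane names, costs, and three-character classification prefixes are hypothetical and are not taken from the Wikipedia table. It uses the vectorized `.str` accessor rather than `apply`, which is an alternative to the cell above, not a description of it.

```python
import pandas as pd

# Hypothetical raw rows (made-up values), shaped like the scraped columns.
raw = pd.DataFrame({
    'Name': ['Alpha', 'Beta'],
    'Cost in Billions': ['$12.5', '$3.25'],
    'Classification': ['xx Category\xa05 hurricane', 'yy Tropical\xa0storm'],
})

clean = raw.copy()
# Drop the first three characters, trim whitespace, replace non-breaking spaces.
clean['Classification'] = (
    raw['Classification'].str[3:].str.strip().str.replace('\xa0', ' ', regex=False)
)
# Drop the leading currency symbol and convert the remainder to a number.
clean['Cost in Billions'] = pd.to_numeric(raw['Cost in Billions'].str[1:])

print(clean)
```

Converting the cost column to a numeric dtype matters for the later questions: sorting and summing a column of strings would not give numeric results.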

# Skills & Concepts
This cell is not shown to the learner. It lists the skills and concepts that the questions below aim to test. The list is a best guess, since there are usually many ways to solve a problem.

* python-strings
* python-pandas-Series
* python-pandas-DataFrame
* python-pandas-DataFrame-groupby
* python-pandas-DataFrame-agg

In [22]:
# list the 10 most costly hurricanes and their season
# it should return a dataframe with two columns: the name of the hurricane and the season
def question1():
    data = df
    return "Answer" # put your answer here

In [23]:
# we will need to sort on the dataframe and return the top 10 rows and the two columns needed
def solution1():
    return (df.sort_values(['Cost in Billions'], ascending = False).head(10)[['Name','Season']])

solution1()

| | Name | Season |
|---|---|---|
| 1 | Katrina | 2005 |
| 2 | Sandy | 2012 |
| 3 | Ike | 2008 |
| 4 | Wilma | 2005 |
| 5 | Andrew | 1992 |
| 6 | Ivan | 2004 |
| 7 | Irene | 2011 |
| 8 | Charley | 2004 |
| 9 | Matthew | 2016 |
| 10 | Rita | 2005 |
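
As an aside, `DataFrame.nlargest` offers a shorter way to take the top rows by a column. The sketch below uses a small made-up frame (the names and costs are hypothetical) to show that it selects the same rows as sorting in descending order and taking `head`.

```python
import pandas as pd

# Hypothetical data with no tied costs.
toy = pd.DataFrame({
    'Name': ['A', 'B', 'C', 'D'],
    'Cost in Billions': [1.0, 9.0, 4.0, 7.0],
    'Season': [2001, 2002, 2003, 2004],
})

# Approach used in the solution: sort descending, then take the first rows.
by_sort = toy.sort_values('Cost in Billions', ascending=False).head(2)[['Name', 'Season']]

# Alternative: nlargest picks the rows with the largest values directly.
by_nlargest = toy.nlargest(2, 'Cost in Billions')[['Name', 'Season']]

print(by_nlargest)
```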


In [24]:
# find the four seasons with the most hurricanes on this list:
# return a Series of size 4 whose index is the season and whose
# values are the number of hurricanes in that season,
# ordered from most hurricanes to fewest:
# e.g.
# 2005  6
# 2012  2
# ... 
def question2():
    data = df
    return "Answer" # put your answer here

In [25]:
# count the number of hurricanes per season and return the top 4
def solution2():
    return (df['Season'].value_counts().head(4))

solution2()

2005    6
2004    4
1995    4
2008    3
Name: Season, dtype: int64
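
For comparison, the same counts can be produced with an explicit `groupby`. The sketch below uses made-up seasons (not the real table) to show that `value_counts` and `groupby(...).size()` agree. Note that `head(4)` simply truncates the sorted result, so a season tied with the fourth entry can be left out.

```python
import pandas as pd

# Hypothetical seasons, one entry per hurricane.
seasons = pd.Series([2005, 2005, 2005, 2004, 2004, 1995], name='Season')

# value_counts counts occurrences and sorts them in descending order.
top_counts = seasons.value_counts().head(2)

# Equivalent: group equal values, count each group, sort descending.
top_grouped = seasons.groupby(seasons).size().sort_values(ascending=False).head(2)

print(top_counts)
```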

In [26]:
# compare the cost of the damage caused by hurricanes in Cuba,
# Mexico and the United States.
# return a tuple with the three countries ordered from the highest
# total cost of hurricane damage to the lowest
# e.g ('United States','Mexico','Cuba')
# you can manually build the tuple based on the results
# tip: you can use str.contains to look for that country string in the 'Areas Affected' column
def question3():
    data = df
    return "Answer" # put your answer here

In [27]:
# sum the cost of the hurricanes that affected each country
def solution3():
    areas = df['Areas Affected']
    costs = {
        'Cuba': df[areas.str.contains('Cuba')]['Cost in Billions'].sum(),  # 43.03
        'Mexico': df[areas.str.contains('Mexico')]['Cost in Billions'].sum(),  # 26.464
        'United States': df[areas.str.contains('United States')]['Cost in Billions'].sum(),  # 411.41
    }
    # order the countries from highest to lowest total cost
    return tuple(sorted(costs, key=costs.get, reverse=True))

solution3()

('United States', 'Cuba', 'Mexico')
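
One subtlety of the `str.contains` approach is that a hurricane affecting several countries contributes its full cost to each of them, so the per-country totals can add up to more than the overall total. The following sketch illustrates this on made-up rows. The areas and costs are hypothetical, and `totals` and `ranking` are illustrative names.

```python
import pandas as pd

# Hypothetical rows: some hurricanes affect more than one country.
toy = pd.DataFrame({
    'Areas Affected': ['Cuba, United States', 'Mexico', 'United States', 'Cuba, Mexico'],
    'Cost in Billions': [10.0, 2.0, 5.0, 1.0],
})

countries = ['Cuba', 'Mexico', 'United States']
# For each country, sum the cost of every row whose area text mentions it.
totals = {
    country: toy.loc[
        toy['Areas Affected'].str.contains(country, regex=False), 'Cost in Billions'
    ].sum()
    for country in countries
}

# Rank the countries from highest to lowest total cost.
ranking = tuple(sorted(totals, key=totals.get, reverse=True))
print(totals, ranking)
```

The totals here sum to 29 although the four rows cost only 18 in total, because the shared hurricanes are counted once per country.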

In [28]:
# list the four most costly seasons along with their total cost
# put your results in a Series where the index is the season and the value is the cost
# tip: you will need to use a groupby on the Season and then sum the cost
def question4():
    data = df
    return "Answer" # put your answer here

In [29]:
def solution4():
    
    return df.groupby('Season')['Cost in Billions'].sum().sort_values(ascending=False).head(4)

solution4()

Season
2005    158.084
2012     77.390
2004     57.060
2008     45.460
Name: Cost in Billions, dtype: float64

In [30]:
# how much more costly were category 5 hurricanes than category 4 hurricanes?
# your result should be a single number representing the difference in billions
# tip: you'll need to use a groupby on the Classification column
def question5():
    data = df
    return "Answer" # put your answer here

In [31]:
def solution5():
    se = df.groupby('Classification')['Cost in Billions'].sum()
    return (se.loc['Category 5 hurricane'] - se.loc['Category 4 hurricane'])

solution5()

144.56400000000002
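
The trailing digits in the result above come from binary floating-point arithmetic, not from the data. A minimal sketch with made-up costs shows the same grouped difference and how `round` can tidy it for display. The values and the three-decimal precision are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical costs per classification (not the real table).
toy = pd.DataFrame({
    'Classification': ['Category 5 hurricane', 'Category 5 hurricane', 'Category 4 hurricane'],
    'Cost in Billions': [100.1, 200.2, 50.05],
})

# Total cost per classification, then the difference between the two labels.
per_class = toy.groupby('Classification')['Cost in Billions'].sum()
diff = per_class.loc['Category 5 hurricane'] - per_class.loc['Category 4 hurricane']

# Round for presentation; the raw value may carry floating-point noise.
diff_rounded = round(diff, 3)
print(diff_rounded)
```

When checking a learner's answer against such a value, comparing within a small tolerance is usually safer than testing for exact equality. The `.loc` labels must also match the cleaned classification strings exactly.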