## Data Preparation: Notebook shows how you prepare your data and explains why by including…  
- Instructions or code needed to get and prepare the raw data for analysis  
- Code comments and text to explain what your data preparation code does  
- Valid justifications for why the steps you took are appropriate for the problem you are solving  

# Joining Hurricane and Housing Dataframes 

In [1]:
#Importing libraries needed
import warnings

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

%matplotlib inline

#silencing warnings and showing every column when a dataframe is displayed
warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', None)

# Joining Housing Values with Hurricanes 
To run logistic regression on our data, we first need to combine the datasets. We use the pandas `join` method to join each housing dataset onto the hurricane dataset, matching rows on `City` and `HurricaneName`.

Documentation can be found here: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.join.html
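
As a minimal sketch of what the join does, the toy frames below (made-up values, not the project data) share a `('City', 'HurricaneName')` index. An inner join keeps only the key pairs that are present in both frames. The column name `HomeValue` is an assumption for illustration.

```python
import pandas as pd

# Toy data for illustration only; the values are made up.
hurricane_toy = pd.DataFrame({
    "City": ["Apalachicola", "Apalachicola", "Miami"],
    "HurricaneName": ["c", "d", "ir"],
    "AWND": [5.82, 19.46, 12.0],
}).set_index(["City", "HurricaneName"])

housing_toy = pd.DataFrame({
    "City": ["Apalachicola", "Miami", "Tampa"],
    "HurricaneName": ["c", "ir", "ir"],
    "HomeValue": [100000, 250000, 180000],
}).set_index(["City", "HurricaneName"])

# Inner join on the shared MultiIndex: only keys found in both frames survive.
joined_toy = hurricane_toy.join(housing_toy, how="inner").reset_index()
print(joined_toy)
```

Here the `('Apalachicola', 'd')` and `('Tampa', 'ir')` rows are dropped because each appears in only one of the two frames.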

In [2]:
hurricane = pd.read_csv(r'data\hurricane_clean.csv')
hurricane.head()

Unnamed: 0,DATE,AWND,WSF2,WSF5,HurricaneName,City
0,8/14/2004,5.82,13.0,15.0,1,Apalachicola
1,7/10/2005,19.46,30.0,34.9,2,Apalachicola
2,7/11/2005,17.0,32.0,38.0,2,Apalachicola
3,10/7/2016,10.74,21.9,27.1,3,Apalachicola
4,10/8/2016,8.05,15.0,21.9,3,Apalachicola


In [3]:
hurricane['HurricaneName'] = hurricane['HurricaneName'].astype(str).map({'1': 'c', '2': 'd', '3': 'ma', '4':'ir', '5':'mi'})
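
`Series.map` with a dict returns NaN for any value that has no key in the dict, so a quick check after the mapping can catch unexpected codes. The sketch below uses a toy series rather than the real column, and the code `9` is a made-up value chosen to trigger that case.

```python
import pandas as pd

code_to_name = {'1': 'c', '2': 'd', '3': 'ma', '4': 'ir', '5': 'mi'}

# Toy series; 9 is a hypothetical code with no entry in the mapping.
codes = pd.Series([1, 2, 3, 9])
mapped = codes.astype(str).map(code_to_name)

# Values missing from the dict come back as NaN, so list the codes that produced them.
unmapped = codes[mapped.isna()].unique().tolist()
print(unmapped)
```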

In [4]:
hurricane.head()

Unnamed: 0,DATE,AWND,WSF2,WSF5,HurricaneName,City
0,8/14/2004,5.82,13.0,15.0,c,Apalachicola
1,7/10/2005,19.46,30.0,34.9,d,Apalachicola
2,7/11/2005,17.0,32.0,38.0,d,Apalachicola
3,10/7/2016,10.74,21.9,27.1,ma,Apalachicola
4,10/8/2016,8.05,15.0,21.9,ma,Apalachicola


In [5]:
#saving the data
hurricane.to_csv(r'data\hurricane_name.csv', index=False)

In [6]:
#opening dataframes 
hurricane = pd.read_csv(r'data\hurricane_name.csv')
#setting the index to City and HurricaneName so that we can use .join()
hurricane.set_index(['City', 'HurricaneName'], inplace = True)

### Writing a Function
This function joins a housing dataframe with the hurricane dataframe on `City` and `HurricaneName`.

In [7]:
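def join(df):
    #setting the index on a copy so the caller's dataframe is left unchanged
    df = df.set_index(['City', 'HurricaneName'])
    #joining the housing dataframe into the hurricane dataframe
    df = hurricane.join(df, how='inner')
    #resetting the index so City and HurricaneName become columns again
    df = df.reset_index()
    return df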
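
One way to reuse a helper like this across several housing files is sketched below. The function name `join_with_hurricane`, the `HomeValue` column, and the toy frames are illustrative assumptions, not part of the notebook. The helper takes both frames as arguments so that it does not depend on a global variable.

```python
import pandas as pd

KEYS = ["City", "HurricaneName"]

def join_with_hurricane(hurricane_df, housing_df, keys=KEYS):
    """Inner-join housing rows onto hurricane rows using the shared key columns."""
    left = hurricane_df.set_index(keys)
    right = housing_df.set_index(keys)
    return left.join(right, how="inner").reset_index()

# Toy frames standing in for the CSV files (made-up values).
hurricane_demo = pd.DataFrame({
    "City": ["Apalachicola", "Apalachicola"],
    "HurricaneName": ["c", "d"],
    "AWND": [5.82, 19.46],
})
tiers = {
    "bottom": pd.DataFrame({"City": ["Apalachicola"], "HurricaneName": ["c"], "HomeValue": [90000]}),
    "top": pd.DataFrame({"City": ["Apalachicola"], "HurricaneName": ["d"], "HomeValue": [400000]}),
}

# Apply the same join to every housing frame and report the resulting shapes.
joined = {name: join_with_hurricane(hurricane_demo, df) for name, df in tiers.items()}
for name, df in joined.items():
    print(name, df.shape)
```

Because `set_index` and `reset_index` are called without `inplace=True`, the input frames are left untouched and the helper can be run repeatedly.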

## Joining Bottom Tier Home Values with Hurricane Data

In [10]:
#opening dataframes 
bottom =  pd.read_csv(r'data\bottom.csv')
middle = pd.read_csv(r'data\middle.csv')
top = pd.read_csv(r'data\top.csv')

FileNotFoundError: [Errno 2] No such file or directory: 'data\\middle.csv'
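
The paths in this notebook use Windows-style backslashes, which are not portable to macOS or Linux. As a hedged sketch, the paths can be built with `pathlib` and checked before reading. The helper name `missing_files` is an assumption; the file names are the ones from the cell above.

```python
from pathlib import Path

DATA_DIR = Path("data")

def missing_files(names, data_dir=DATA_DIR):
    """Return the names in `names` that do not exist inside `data_dir`."""
    return [name for name in names if not (data_dir / name).is_file()]

# Report absent inputs up front instead of failing partway through a cell.
missing = missing_files(["bottom.csv", "middle.csv", "top.csv"])
if missing:
    print("Missing input files:", missing)
```

`pd.read_csv` accepts a `Path` object, so `pd.read_csv(DATA_DIR / "bottom.csv")` works once the file is present.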

In [None]:
bottom.head()

In [None]:
#setting the index to City and HurricaneName so that we can use .join()
#hurricane was already indexed on these columns in cell 6, so only bottom needs it here
bottom.set_index(['City', 'HurricaneName'], inplace = True)

In [None]:
#joining the housing dataframe into the hurricane dataframe 
bottom = hurricane.join(bottom, how='inner')

In [None]:
#resetting the index
bottom.reset_index(inplace = True)
bottom.head()
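
An inner join silently drops key pairs that appear in only one frame. As a sketch for auditing this, `pd.merge` with `how='outer'` and `indicator=True` adds a `_merge` column that records where each key came from. The toy frames and the `HomeValue` column are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({
    "City": ["Apalachicola", "Miami"],
    "HurricaneName": ["c", "ir"],
    "AWND": [5.82, 12.0],
})
right = pd.DataFrame({
    "City": ["Apalachicola", "Tampa"],
    "HurricaneName": ["c", "ir"],
    "HomeValue": [100000, 180000],
})

# Outer merge keeps every key; _merge is 'both', 'left_only', or 'right_only'.
audit = left.merge(right, on=["City", "HurricaneName"], how="outer", indicator=True)
counts = audit["_merge"].value_counts().to_dict()
print(counts)

# Rows an inner join would have dropped.
dropped = audit[audit["_merge"] != "both"]
print(dropped[["City", "HurricaneName", "_merge"]])
```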

In [None]:
#saving the joined bottom-tier dataframe
bottom.to_csv(r'data\bottom_hurricane', index=False)

## Joining 6 months before and after hurricane

In [None]:
#opening dataframes 
hurricane6 = pd.read_csv(r'data\hurricane_name.csv')
housing6 =  pd.read_csv(r'data\housing_6months.csv')

In [None]:
hurricane6.head()

In [None]:
housing6.head()

In [None]:
#setting the index to City and HurricaneName so that we can use .join()
hurricane6.set_index(['City', 'HurricaneName'], inplace = True)
housing6.set_index(['City', 'HurricaneName'], inplace = True)

In [None]:
#joining the housing dataframe into the hurricane dataframe 
df6 = hurricane6.join(housing6, how='inner')

In [None]:
#resetting the index
df6.reset_index(inplace = True)
df6.head()

In [None]:
#saving the joined 6 month dataframe
df6.to_csv(r'data\sixmonths.csv', index=False)

## Joining 3 months before and after hurricane 

In [None]:
#opening dataframes 
hurricane3 = pd.read_csv(r'data\hurricane_name.csv')
housing3 =  pd.read_csv(r'data\housing_3months.csv')

In [None]:
hurricane3.head()

In [None]:
housing3.head()

In [None]:
#setting the index to City and HurricaneName so that we can use .join()
hurricane3.set_index(['City', 'HurricaneName'], inplace = True)
housing3.set_index(['City', 'HurricaneName'], inplace = True)

In [None]:
#joining the housing dataframe into the hurricane dataframe 
df3 = hurricane3.join(housing3, how='inner')

In [None]:
#resetting the index
df3.reset_index(inplace = True)
df3.head()

In [None]:
#saving the joined 3 month dataframe
df3.to_csv(r'data\threemonths.csv', index=False)

## Joining 1 year before and after hurricane (top tier)

In [None]:
#opening dataframes 
hurricanet = pd.read_csv(r'data\hurricane_name.csv')
housingt =  pd.read_csv(r'data\toptier1year.csv')

In [None]:
hurricanet.head()

In [None]:
housingt.head()

In [None]:
#setting the index to City and HurricaneName so that we can use .join()
hurricanet.set_index(['City', 'HurricaneName'], inplace = True)
housingt.set_index(['City', 'HurricaneName'], inplace = True)

In [None]:
#joining the housing dataframe into the hurricane dataframe 
dft = hurricanet.join(housingt, how='inner')

In [None]:
#resetting the index
dft.reset_index(inplace = True)
dft.head()

In [None]:
#saving the joined top-tier dataframe
dft.to_csv(r'data\top.csv', index=False)

## Joining 1 year before and after hurricane (bottom tier)

In [None]:
#opening dataframes 
hurricaneb = pd.read_csv(r'data\hurricane_name.csv')
housingb =  pd.read_csv(r'data\bottomtier1year.csv')

In [None]:
hurricaneb.head()

In [None]:
housingb.head()

In [None]:
#setting the index to City and HurricaneName so that we can use .join()
hurricaneb.set_index(['City', 'HurricaneName'], inplace = True)
housingb.set_index(['City', 'HurricaneName'], inplace = True)

In [None]:
#joining the housing dataframe into the hurricane dataframe 
dfb = hurricaneb.join(housingb, how='inner')

In [None]:
#resetting the index
dfb.reset_index(inplace = True)
dfb.head()

In [None]:
#saving the joined bottom-tier dataframe
dfb.to_csv(r'data\bottom.csv', index=False)

## Checking Crosstabs

### 1 year before and after

In [None]:
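#summary statistics for rows where bool == 0
#df1 (the 1 year dataframe) is not created earlier in this notebook and must be loaded first
df1[df1['bool'] == 0].describe()

In [None]:
#summary statistics for rows where bool == 1
df1[df1['bool'] == 1].describe()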

### 6 months before and after

In [None]:
#summary statistics for rows where bool == 0
df6[df6['bool'] == 0].describe()

In [None]:
#summary statistics for rows where bool == 1
df6[df6['bool'] == 1].describe()

### 3 months before and after

In [None]:
#summary statistics for rows where bool == 0
df3[df3['bool'] == 0].describe()

In [None]:
#summary statistics for rows where bool == 1
df3[df3['bool'] == 1].describe()

### 1 year before and after (top)

In [None]:
#summary statistics for rows where bool == 0
dft[dft['bool'] == 0].describe()

In [None]:
#summary statistics for rows where bool == 1
dft[dft['bool'] == 1].describe()

### 1 year before and after (bottom)

In [None]:
#summary statistics for rows where bool == 0
dfb[dfb['bool'] == 0].describe()

In [None]:
#summary statistics for rows where bool == 1
dfb[dfb['bool'] == 1].describe()
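
The paired `describe()` calls can also be produced in one step with `groupby`, and `pd.crosstab` gives an actual cross-tabulation of row counts. The sketch below runs on toy data. The `bool` column name follows the notebook, while the values and the choice of `AWND` as the summarized column are assumptions for illustration.

```python
import pandas as pd

# Toy data for illustration only; the values are made up.
demo = pd.DataFrame({
    "HurricaneName": ["c", "c", "d", "d", "ma"],
    "bool": [0, 1, 0, 1, 1],
    "AWND": [5.82, 7.10, 19.46, 17.0, 10.74],
})

# Summary statistics for each value of `bool` in a single table.
summary = demo.groupby("bool")["AWND"].describe()
print(summary)

# Counts of rows per hurricane and `bool` value.
table = pd.crosstab(demo["HurricaneName"], demo["bool"])
print(table)
```

The crosstab makes it easy to spot a hurricane that has rows for only one value of `bool`, which would matter for the logistic regression.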