Notebook

# Climate Displacement

Dataset Columns
- Year
- State
- Households Inflow (Number of Returns)
- Households Outflow (Number of Returns)
- Individuals Inflow (Number of Exemptions)
- Individuals Outflow (Number of Exemptions)

Disaster Types
- Chemical
- Dam/Levee Break
- Drought
- Earthquake
- Fire
- Flood
- Human Cause
- Hurricane
- Ice
- Mud/Landslide
- Other
- Snow
- Storm
- Terrorism
- Tornado
- Tsunami
- Typhoon
- Volcano
- Water
- Winter

In [None]:
"""
Folders Setup

code 
    notebook.ipynb
    data
        Disasters
            FEMA_dataset.csv
        StateMigration
            1990to1991StateMigration
                 1990to1991StateMigrationInflow
                     Alabama91in.xls
                     Alaska91in.xls
                     .
                     .
                     .
                     Wisconsin91in.xls
                     Wyoming91in.xls 
                 1990to1991StateMigrationOutflow
                     Alabama91Out.xls
                     Alaska91Out.xls 
                     .
                     .
                     .
                     Wisconsin91Out.xls
                     Wyoming91Out.xls 
            .
            .
            .
            2008to2009StateMigration
            2009to2010StateMigration
            2010to2011StateMigration 
"""

## Pseudocode:

### FEMA Dataset Pre-processing (Neely)
1. Create a new dataset from FEMA_dataset
    - contains only the columns Year, State, Disaster Type
2. Name the file "State_Disasters_by_Year"

### StateMigration Data Pre-Processing (Ben)
1. Convert all datasets in StateMigration from .xls into .csv files
2. From every state file, extract the "Total Flow" row with "Number of Returns" and "Number of Exemptions"; suffix the values with I if they come from an inflow file and O if they come from an outflow file.
3. Extract "State" and "Year" from every file
4. Create file with "State" (from file name), "Year" (from file name), "Number_of_Returns_I", "Number_of_Exemptions_I", "Number_of_Returns_O" and "Number_of_Exemptions_O"
5. Name file "State Migration by Year"

### Merge Datasets (Both)
1. Merge datasets on common attributes "Year" and "State"
2. Name dataset "State_Migration_and_Disasters_by_Year"
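
A minimal sketch of the merge step, assuming both preprocessed tables carry `Year` and `State` columns with matching types. The toy rows below are made up for illustration and are not taken from the real files.

```python
import pandas as pd

# Toy stand-ins for the two preprocessed tables (values are made up).
disasters = pd.DataFrame({
    "Year": [2005, 2005, 2006],
    "State": ["LA", "TX", "LA"],
    "Disaster Type": ["Hurricane", "Hurricane", "Flood"],
})
migration = pd.DataFrame({
    "Year": [2005, 2006, 2006],
    "State": ["LA", "LA", "TX"],
    "NOR(I)": [100, 120, 90],
})

# An inner join keeps only (Year, State) pairs present in both tables.
merged = pd.merge(disasters, migration, on=["Year", "State"], how="inner")
print(merged)
```

Switching to `how="left"` would instead keep state-years that have a disaster but no migration record, leaving the migration columns as NaN.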

### Training and Testing
1. Create training and testing datasets
    Question: How should we split training and testing data?
2. Create Neural Network models
    Input: Year, State, Disaster Type
    Output: Migration Inflow (Household/Individual), Migration Outflow (Household/Individual)
3. Put training and testing through the Neural Network models.
4. Evaluate which models are the most effective.
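
One possible way to prepare the inputs and split the data, sketched under assumptions: the categorical inputs are one-hot encoded with `pd.get_dummies`, and the most recent years are held out for testing. The cutoff year and the toy rows are hypothetical, and a random split would be an equally valid choice.

```python
import pandas as pd

# Toy merged table (made-up values).
data = pd.DataFrame({
    "Year": [2005, 2006, 2007, 2008],
    "State": ["LA", "LA", "TX", "TX"],
    "Disaster Type": ["Hurricane", "Flood", "Fire", "Hurricane"],
    "NOR(I)": [100, 120, 90, 95],
})

# One-hot encode the categorical inputs so a model can consume them.
features = pd.get_dummies(
    data[["Year", "State", "Disaster Type"]],
    columns=["State", "Disaster Type"],
)
target = data["NOR(I)"]

# Hold out the latest years for testing (cutoff_year is an assumed value).
cutoff_year = 2007
train_mask = data["Year"] < cutoff_year
X_train, y_train = features[train_mask], target[train_mask]
X_test, y_test = features[~train_mask], target[~train_mask]
print(len(X_train), len(X_test))
```
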
---
Data Augmentation
Synthetic Data
Use SVM or decision tree -- sklearn -- and compare against a neural network
could use all data for training and all data for validation - note this is a faulty practice in 
20% distribution of state 


```
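import pandas as pd

# Read the Excel workbook into a dataframe object
read_file = pd.read_excel("Test.xlsx")

# Write the dataframe object
# into csv file (no index column, header row kept)
read_file.to_csv("Test.csv", index=False, header=True)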
```

In [None]:
# adding imports
import shutil
import pandas as pd
import os
import glob
import xlrd
import csv
#from fastai import *

### FEMA Dataset Pre-processing

Creates State_Disasters_by_Year.csv with:
- State
- Disaster Type
- Start Year
- End Year

In [None]:
# FEMA Dataset Preprocessing

# copy original FEMA dataset to new file
original = r'../code/data/Disasters/FEMA_dataset.csv'
new = r'../code/data/Disasters/State_Disasters_by_Year.csv'
shutil.copyfile(original, new)

# read the copied csv file (same path as the copy target above)
data = pd.read_csv(new)

# delete irrelevant columns
data = data.drop(columns=[
    'Declaration Number',
    'Declaration Type',
    'Declaration Date',
    'County',
    'Disaster Title',
    'Close Date',
    'Individual Assistance Program',
    'Public Assistance Program',
    'Hazard Mitigation Program',
    'Individuals & Households Program',
])

# extract years
data['Start Year'] = pd.DatetimeIndex(data['Start Date']).year
data['End Year'] = pd.DatetimeIndex(data['End Date']).year

# delete start and end dates
data = data.drop(columns=['Start Date', 'End Date'])

print(data)

# save changes to csv without writing the row index as an extra column
data.to_csv(new, index=False)

### Converting State Migration data from .xls to .csv

Converting all datasets in StateMigration from .xls to .csv

In [None]:
# Convert all files in StateMigration folder from .xls to .csv

# create list of xls files
xls_list = glob.glob("/Users/ben/Desktop/climate-displacement/code/data/StateMigration/*/*/*.xls")

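# write a .csv copy of each .xls file, then delete the .xls file
for xls_file in xls_list:

    wb = xlrd.open_workbook(xls_file)
    sh = wb.sheet_by_index(0)
    csv_path = os.path.splitext(xls_file)[0] + '.csv'

    # newline='' keeps the csv module from writing blank lines on Windows
    with open(csv_path, "w", newline='') as csv_file:
        wr = csv.writer(csv_file, quoting=csv.QUOTE_ALL)
        for rownum in range(sh.nrows):
            wr.writerow(sh.row_values(rownum))

    # remove .xls files
    os.remove(xls_file)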


### More data wrangling - StateMigration dataset
- Extract "Total Flow" row with "Number of Returns" and "Number of Exemptions"
- Assign "I" if from inflow and "O" if from outflow - for every state file
- Extract "State" and "Year" from every file
- Create file with "State" (from file name), "Year" (from file name), "Number of Returns I", "Number of Exemptions I", "Number of Returns O" and "Number of Exemptions O"

### Pseudocode

1. create output path in repository for the merged StateMigration dataset

2. read each csv file in the StateMigration folder, and for each file, 
    - extract the state, year, inflow/outflow direction, and "Total Flow" values described above
3. check the csv files to make sure they are the intended data
4. remove the original csv files
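
The steps above could be sketched as follows, assuming each file has already been reduced to a tuple of state, year, direction, returns, and exemptions. The tuples are made-up placeholders, and the short column names `NOR` and `NOE` stand for "Number of Returns" and "Number of Exemptions".

```python
import io
import pandas as pd

COLUMNS = ["State", "Year", "NOR(I)", "NOE(I)", "NOR(O)", "NOE(O)"]

# Hypothetical per-file results: (state, year, direction, returns, exemptions).
parsed = [
    ("AL", 1990, "Inflow", 1000, 2500),
    ("AL", 1990, "Outflow", 900, 2200),
]

# Combine the inflow and outflow files of one state-year into a single row.
rows = {}
for state, year, direction, returns, exemptions in parsed:
    row = rows.setdefault((state, year), {"State": state, "Year": year})
    suffix = "(I)" if direction == "Inflow" else "(O)"
    row["NOR" + suffix] = returns
    row["NOE" + suffix] = exemptions

df = pd.DataFrame(list(rows.values()), columns=COLUMNS)

# Write to an in-memory buffer here; a real run would pass a file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Collecting plain dictionaries and building the DataFrame once at the end avoids growing a DataFrame row by row inside the loop.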

In [None]:
# set new file location for merged StateMigration dataset
output_path = r'../code/data/StateMigration/State_Migrations_by_Year.csv'

In [None]:
# create the output file at output_path
output = open(output_path, "w")
output.close()

In [None]:
# create empty DataFrame object with the output columns
df = pd.DataFrame(columns=['State', 'Year', 'NOR(I)', 'NOE(I)', 'NOR(O)', 'NOE(O)'])

# print(df)

In [219]:
# create dictionary of "state initial keys" with multiple "values"
# run each segment through dictionary, and convert into state initial
stateDict = {
    "AL":['Alabama', 'al', 'AL', 'alab', 'Alab'],
    "AK":['Alaska', 'ak', 'AK', 'alas', 'Alas'],
    "AZ":['Arizona', 'az', 'AZ', 'ariz', 'Ariz'],
    "AR":['Arkansas', 'ar', 'AR', 'arka', 'Arka', 'aka'],
    "CA":['California', 'ca', 'CA', 'cali', 'Cali'],
    "CO":['Colorado', 'co', 'CO', 'colo', 'Colo'],
    "CT":['Connecticut', 'ct', 'CT', 'conn', 'Conn'],
    "DE":['Delaware', 'de', 'DE', 'dela', 'Dela'],
    "DC":['DistrictofColumbia', 'Districtofcolumbia', 'District of Columbia', 'dc', 'DC', 'dist', 'Dist', 'DiCo', 'dico'],
    "FL":['Florida', 'fl', 'FL', 'flor', 'Flor'],
    "GA":['Georgia', 'ga', 'GA', 'geor', 'Geor'],
    "HI":['Hawaii', 'hi', 'HI', 'hawa', 'Hawa'],
    "ID":['Idaho', 'id', 'ID', 'idah', 'Idah'],
    "IL":['Illinois', 'il', 'IL', 'illi', 'Illi'],
    "IN":['Indiana', 'in', 'IN', 'indi', 'Indi'],
    "IA":['Iowa', 'ia', 'IA', 'iowa'],
    "KS":['Kansas', 'ks', 'KS', 'kans', 'Kans'],
    "KY":['Kentucky', 'ky', 'KY', 'kent', 'Kent'],
    "LA":['Louisiana', 'la', 'LA', 'loui', 'Loui'],
    "MA":['Massachusetts', 'ma', 'MA', 'mass', 'Mass'],
    "MD":['Maryland', 'md', 'MD', 'mary', 'Mary'],
    "ME":['Maine', 'me', 'ME', 'main', 'Main'],
    "MI":['Michigan', 'mi', 'MI', 'mich', 'Mich'],
    "MN":['Minnesota', 'mn', 'MN', 'minn', 'Minn'],
    "MO":['Missouri', 'mo', 'MO', 'Miso', 'miso'],
    "MS":['Mississippi', 'ms', 'MS', 'Misi', 'misi', 'miss', 'Miss'],
    "MT":['Montana', 'mt', 'MT', 'mont', 'Mont'],
    "NC":['North Carolina', 'NorthCarolina', 'nc', 'NC', 'NoCa', 'noca', 'ncar', 'Northcarolina'],
    "ND":['North Dakota', 'NorthDakota', 'nd', 'ND', 'NoDa', 'noda', 'ndak', 'Northdakota'],
    "NE":['Nebraska', 'ne', 'NE', 'Nebr', 'nrbt', 'nebr'],
    "NH":['New Hampshire', 'NewHampshire', 'nh', 'NH', 'NeHa', 'neha', 'newh'],
    "NJ":['New Jersey', 'NewJersey', 'nj', 'NJ', 'NeJe', 'neje', 'newj', 'Newjersey'],
    "NM":['New Mexico', 'NewMexico', 'nm', 'NM', 'NeMe', 'neme', 'newm', 'Newmexico'],
    "NV":['Nevada', 'nv', 'NV', 'Neva', 'neva'],
    "NY":['New York', 'NewYork', 'ny', 'NY', 'newy', 'NeYo', 'neyo', 'newY','Newyork'],
    "OH":['Ohio', 'oh', 'OH', 'ohio', 'nhio'],
    "OK":['Oklahoma', 'ok', 'OK', 'okla', 'Okla'],
    "OR":['Oregon', 'or', 'OR', 'oreg', 'Oreg', 'oeg'],
    "PA":['Pennsylvania', 'pa', 'PA', 'penn', 'Penn'],
    "RI":['Rhode Island', 'RhodeIsland', 'ri', 'RI', 'Rhls', 'rhod', 'Rhod', 'RhIs'],
    "SC":['South Carolina', 'SouthCarolina', 'sc', 'SC', 'SoCa', 'soca', 'scar', 'Southcarolina'],
    "SD":['South Dakota', 'SouthDakota', 'sd', 'SD', 'SoDa', 'soda', 'sdak', 'Southdakota'],
    "TN":['Tennessee', 'tn', 'TN', 'Tenn', 'tenn'],
    "TX":['Texas', 'tx', 'TX', 'texa', 'Texa'],
    "UT":['Utah', 'ut', 'UT', 'utah'],
    "VA":['Virginia', 'va', 'VA', 'virg', 'Virg', 'vrg'],
    "VT":['Vermont', 'vt', 'VT', 'verm', 'Verm'],
    "WA":['Washington', 'wa', 'WA', 'wash', 'Wash'],
    "WI":['Wisconsin', 'wi', 'WI', 'wisc', 'Wisc', 'wiso', 'wsc'],
    "WV":['West Virginia', 'WestVirginia', 'wv', 'WV', 'west', 'wevi', 'wvir', 'Westvirginia'],
    "WY":['Wyoming', 'wy', 'WY', 'wyom', 'Wyom']    
}

def getKey(val):
    # return the state initial whose value list contains val
    for key, valueList in stateDict.items():
        if val in valueList:
            return key

    return "key not found for " + val

print(getKey('Wisconsin'))

WI
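
As an alternative to scanning every list on each call, the table can be inverted once into a flat lookup. This is only a sketch: `state_aliases` is a two-state excerpt, `lookup_state` is a hypothetical helper, and it assumes the aliases remain unique after lowercasing and removing spaces.

```python
# Small excerpt of the alias table; the full table would be used in practice.
state_aliases = {
    "WI": ["Wisconsin", "wi", "WI", "wisc", "Wisc", "wiso", "wsc"],
    "WV": ["West Virginia", "WestVirginia", "wv", "WV", "west"],
}

# Invert once: alias (lowercased, spaces removed) -> state initial.
alias_to_state = {
    alias.lower().replace(" ", ""): abbreviation
    for abbreviation, aliases in state_aliases.items()
    for alias in aliases
}

def lookup_state(name):
    """Return the state initial for a name, or None if it is unknown."""
    return alias_to_state.get(name.lower().replace(" ", ""))

print(lookup_state("Wisconsin"))
print(lookup_state("Foreign"))
```

Returning `None` for unknown names, rather than a message string, lets the caller skip files that do not belong to a state.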


In [220]:
# read csv file
csv_list = glob.glob("/Users/ben/Desktop/climate-displacement/code/data/StateMigration/*/*/*.csv")

for csv_file in csv_list:
    # os.path.split returns a tuple (head, tail) where head is the parent directory path
    # and tail is the filename and extension
    temp = os.path.split(csv_file)
    temp2 = os.path.split(temp[0])
    
    # get file name and parent folder from temp, temp2 respectively
    filename = temp[1]
    parentfile = temp2[1]
    
    # print (filename, parentfile)
    # print (type(filename))
    
    # extract state, year, and inflow/outflow
    # six different naming conventions in the StateMigration dataset
    # 1) [State][Year1Year2 e.g. (0708)][in/out]
    # 2) [State][Year2 e.g. 91][In/Out]
    # 3) [Year1Year2 like 1)]inmig[in/out][state INITIAL e.g. AL]
    # 4) [first 4 letters of State][Year2][in/ot]
    # 5) s9[last digits of Year1, Year2 e.g. 56][state INITIAL][ir/or]
    # 6) same as 4) but with extra "r" at the end
    
    # naming convention 1: used for years 2004-2009
    name1 = [2004,2005,2006,2007,2008]
    # naming convention 2: used for years 1990-1993
    name2 = [1990,1991,1992]
    # naming convention 3: used for years 2009-2011
    name3 = [2009,2010]
    # naming convention 4: used for years 1993-1995, 1996-2000, 2001-2004
    name4 = [1993,1994,1996,1997, 1998, 1999, 2001,2002,2003]
    # naming convention 5: used for years 1995-1996
    name5 = 1995
    # naming convention 6: used for years 2000-2001
    name6 = 2000
    
    # extract inflow/outflow, year using parentfile, and state using filename
    if parentfile.endswith('Outflow'):
        io = 'Outflow'
    elif parentfile.endswith('Inflow'):
        io = 'Inflow'
    else:
        # skip files whose parent folder is neither an inflow nor an outflow folder
        continue
    year = int(parentfile[0:4])
    # print(io, year)
    
    if year in name1:
        if io == 'Inflow':
            state = filename[:-10]
        elif io == 'Outflow':
            state = filename[:-11]
    elif year in name2:
        if io == 'Inflow':
            state = filename[:-8]
        elif io == 'Outflow':
            state = filename[:-9]
    elif year in name3:
        state = filename[-6:-4]
    elif year in name4:
        state = filename[:4]
        if state == 'vrg9':
            state = 'vrg'
        elif state == 'vrg0':
            state = 'vrg'
        elif state == 'az94':
            state = 'az'
        elif state == 'aka9':
            state = 'aka'
        elif state == 'wsc9':
            state = 'wsc'
    elif year == name5:
        state = filename[-8:-6]
    elif year == name6:
        state = filename[:4]
        if state == 'vrg0':
            state = 'vrg'
        elif state == 'oeg0':
            state = 'oeg'
    # else:
       # state = filename
  
    si = getKey(state)
    print(filename, si)
    # print(filename)
    # data = pd.read_csv(csv_file)
    # print(data)
    
    #break

1011inmigoutmn.csv MN
1011inmigoutaz.csv AZ
1011inmigoutal.csv AL
1011inmigoutmo.csv MO
1011inmigoutnc.csv NC
1011inmigoutwv.csv WV
1011inmigoutwa.csv WA
1011inmigouttx.csv TX
1011inmigoutnv.csv NV
1011inmigoutor.csv OR
1011inmigouttn.csv TN
1011inmigoutnd.csv ND
1011inmigoutak.csv AK
1011inmigoutct.csv CT
1011inmigoutne.csv NE
1011inmigoutmi.csv MI
1011inmigoutvt.csv VT
1011inmigoutva.csv VA
1011inmigoutca.csv CA
1011inmigoutdc.csv DC
1011inmigoutid.csv ID
1011inmigoutky.csv KY
1011inmigoutri.csv RI
1011inmigoutpa.csv PA
1011inmigoutfl.csv FL
1011inmigoutde.csv DE
1011inmigoutia.csv IA
1011inmigoutks.csv KS
1011inmigoutil.csv IL
1011inmigouthi.csv HI
1011inmigoutsc.csv SC
1011inmigoutin.csv IN
1011inmigoutsd.csv SD
1011inmigoutga.csv GA
1011inmigoutwi.csv WI
1011inmigoutar.csv AR
1011inmigoutnj.csv NJ
1011inmigoutmd.csv MD
1011inmigoutnh.csv NH
1011inmigoutms.csv MS
1011inmigoutco.csv CO
1011inmigoutme.csv ME
1011inmigoutla.csv LA
1011inmigoutut.csv UT
1011inmigoutnm.csv NM
1011inmigo

Florida92Out.csv FL
Colorado92Out.csv CO
Oregon92Out.csv OR
Hawaii92Out.csv HI
Louisiana92Out.csv LA
Foreign92Out.csv key not found for Foreign
Michigan92Out.csv MI
Alabama92Out.csv AL
Tennessee92Out.csv TN
Iowa92Out.csv IA
Nevada92Out.csv NV
Wisconsin92Out.csv WI
West Virginia92Out.csv WV
Utah92Out.csv UT
Montana92Out.csv MT
Indiana92Out.csv IN
New Jersey92Out.csv NJ
Minnesota92Out.csv MN
soca95in.csv SC
idah95in.csv ID
west95in.csv WV
loui95in.csv LA
wisc95in.csv WI
okla95in.csv OK
conn95in.csv CT
nebr95in.csv NE
dela95in.csv DE
mich95in.csv MI
virg95in.csv VA
rhod95in.csv RI
miss95in.csv MS
penn95in.csv PA
newm95in.csv NM
dist95in.csv DC
alas95in.csv AK
verm95in.csv VT
miso95in.csv MO
newy95in.csv NY
soda95in.csv SD
main95in.csv ME
mont95in.csv MT
oreg95in.csv OR
noda95in.csv ND
indi95in.csv IN
alab95in.csv AL
utah95in.csv UT
newh95in.csv NH
wash95in.csv WA
colo95in.csv CO
ohio95in.csv OH
mass95in.csv MA
neva95in.csv NV
iowa95in.csv IA
illi95in.csv IL
tenn95in.csv TN
wyom95in.csv WY
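
The year and direction could also be read from the parent folder name with a regular expression. This sketch assumes every folder follows the `1990to1991StateMigrationInflow` pattern shown in the folder setup; `parse_folder` is a hypothetical helper, not part of the notebook.

```python
import re

# Folder names follow "<Year1>to<Year2>StateMigration<Inflow|Outflow>".
FOLDER_PATTERN = re.compile(r"^(\d{4})to(\d{4})StateMigration(Inflow|Outflow)$")

def parse_folder(folder_name):
    """Return (start_year, end_year, direction), or None if the name does not match."""
    match = FOLDER_PATTERN.match(folder_name.strip())
    if match is None:
        return None
    start_year, end_year, direction = match.groups()
    return int(start_year), int(end_year), direction

print(parse_folder("1990to1991StateMigrationInflow"))
print(parse_folder("1990to1991StateMigrationOutflow"))
```

A folder that does not match returns `None`, so unexpected directories can be skipped explicitly instead of being sliced at fixed character positions.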

In [None]:
# add parameters to df object

In [None]:
# convert df into csv

In [None]:
# close output csv file
output.close()

In [None]:
"""
for filename in os.listdir(directory):
    f = os.path.join(directory, filename)
    if os.path.isfile(f) and filename.endswith('.txt'):
        read_file = pd.read_excel (r'../code/data/StateMigration/1990to1991StateMigration/1990to1991StateMigrationInflow/Alabama91in.xls')
    read_file.to_csv (r'../code/data/StateMigration/1990to1991StateMigration/1990to1991StateMigrationInflow/Alabama91in.csv', index = None, header=True)


read_file = pd.read_excel (r'../code/data/StateMigration/1990to1991StateMigration/1990to1991StateMigrationInflow/Alabama91in.xls')
read_file.to_csv (r'../code/data/StateMigration/1990to1991StateMigration/1990to1991StateMigrationInflow/Alabama91in.csv', index = None, header=True)



location = "/Users/neely/Desktop/climate-displacement/code/data/StateMigration/1990to1991StateMigration/1990to1991StateMigrationInflow"
placement = "./Users/neely/Desktop/climate-displacement/code/data/StateMigration"

for file in glob.glob("*.xls"):
"""

"""
from os.path import isfile, join
onlyfiles = [f for f in listdir('../code/data/StateMigration/') if isfile(join('../code/data/StateMigration/', f))]
"""

In [None]:
# Neural Network -> fastai
from fastai.vision.all import *

# path to dataset here (./Users/neely/Desktop/climate-displacement/code/data/StateMigration)
path = untar_data(URLs.MNIST)
path.ls()

# data loader (the MNIST archive uses "training" and "testing" folders)
dls = ImageDataLoaders.from_folder(path, train="training", valid="testing")