# ETL Project
<ul>
    <li>UofMN Data Visualization and Analytics Bootcamp</li>
    <li>Week 13 | ETL Project</li>
    <li>Created by: Stephanie Hartje, Chris Howard</li>
    <li>05/18/2019</li>
</ul>

### Project Description and Purpose
<p>This project extracts (E) data from multiple sources, uses the Python pandas library to transform (T) the data into 
    useful tables, and then maps and loads (L) those tables into a SQL database. No direct analysis is performed on
    the data in this project; the goal is to have a usable database for a theoretical analysis at the end of 
    the process.</p>
<p>Our theoretical analysis looks for any (albeit spurious) correlation between solar eclipses, UFO sightings, and 
    multiple natural disasters, including hurricanes and volcanic eruptions. Each event type has its own table 
    in the database with, at minimum, an event date, some form of ID, and a location (including latitude and longitude where
    available). All dates are split into 'year', 'month', and 'day' columns so that events can easily be 
    compared by date, for example to look for clusters around certain months, as well as by year and location.</p>
<p>The SQL code for our database can be found in our <a href='https://github.com/Survifit/ETLProject/blob/master/disaster_etl.sql'>repository</a>, or opened directly in a new Jupyter window <a href='../edit/disaster_etl.sql'>using this link</a> if this notebook is being run locally within a copy of the repository.</p>
<p>Extract:</p>
     <p>We extracted the following:</p>
        <ol><li>Solar Eclipses</li>
            <ul><li>From: https://data.world/nasa/five-millennium-catalog-of-solar-eclipses-detailed</li>
                <li>Original Format: 2 CSV files, one for the 20th century (1901-2000) and one for the 21st century (2001-2100)</li></ul><br>
        <li>UFO Sightings</li>
            <ul><li>From: https://en.wikipedia.org/wiki/List_of_reported_UFO_sightings</li>
                <li>Original Format: 2 HTML tables, one for 19th century and one for 20th century</li></ul><br>
        <li>Hurricanes</li>
            <ul><li>From: https://www.kaggle.com/noaa/hurricane-database</li>
                <li>Original Format: Two CSV files, one for Atlantic and one for Pacific Storms</li></ul><br>
        <li>Volcanoes</li>
            <ul><li>From:  https://data.world/dhs/historical-significant</li>
                <li>Original Format: CSV</li></ul></ol><br>
<p>Transform:</p>
    <p>We performed the following transformation steps for each data set respectively:</p>
        <ol><li>Solar Eclipses</li>
            <ul><li>Manually modified both CSV files to remove extra columns in random rows</li>
                <li>Combine CSV files into a single dataframe, modify column names to SQL friendly structure</li>
                <li>Loop through Latitude and Longitude data to replace N/S/E/W notation with positive/negatives</li>
                <li>Loop through month column to replace month names with numeric representation</li>
            </ul><br>
        <li>UFO Sightings</li>
            <ul><li>remove label row from 20th century table and combine into one table</li>
                <li>use first row as header and re-index</li>
                <li>separate year, month, and day into separate columns</li>
                <li>select the columns to keep</li></ul><br>
        <li>Hurricanes</li>
            <ul><li>combine Atlantic and Pacific tables into one table</li>
            <li>convert date column to string and separate year, month, and day into separate columns</li>
            <li>select the columns to keep</li>
            <li>select the rows corresponding to Hurricane data</li>
            <li>keep only the first observation related to a particular Hurricane</li>
            <li>convert latitude and longitude to format consistent with other tables</li></ul><br>
        <li>Volcanoes</li>
            <ul><li>select columns to keep</li>
                <li>rename columns</li></ul></ol><br>
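<p>The month-name step listed for the eclipse data can be sketched as a small lookup. This is a minimal illustration, not the notebook's own code: <code>MONTH_ABBRS</code>, <code>MONTH_NUMBERS</code>, and <code>month_to_number</code> are hypothetical names, and returning <code>None</code> for unrecognised input is an assumption about how bad values should be handled.</p>

```python
# Hypothetical helper (not from the notebook): month abbreviation -> zero-padded number.
MONTH_ABBRS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
MONTH_NUMBERS = {abbr: f"{i:02d}" for i, abbr in enumerate(MONTH_ABBRS, start=1)}


def month_to_number(abbr):
    """Return a string such as '05', or None when the input is not a known month."""
    if not isinstance(abbr, str):
        return None
    return MONTH_NUMBERS.get(abbr.strip().title())


print([month_to_number(m) for m in ["May", "Nov", "Apr", "n/a"]])
# → ['05', '11', '04', None]
```

<p>A function like this could be applied to a DataFrame column with <code>Series.map</code> instead of an explicit loop.</p>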
<p>Load:</p>
    <p>We created a database in MySQL (disaster_etl) with one table for each dataset. We then loaded the data into MySQL using SQLAlchemy.</p>
    

In [4]:
# imports
import pandas as pd
import numpy as np
import requests
from sqlalchemy import create_engine
import config

In [5]:
## Chris Extract/Transform below


In [6]:
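# UFO data from Wikipedia: tables for the 19th and 20th centuries
ufo_url = 'https://en.wikipedia.org/wiki/List_of_reported_UFO_sightings'
ufo_tables = pd.read_html(ufo_url)  # fetch and parse the page once
ufo_df_19th = ufo_tables[5]  # table positions depend on the page layout
ufo_df_20th = ufo_tables[6]

# remove label row from 20th century data
ufo_df_20th = ufo_df_20th.drop(0)

# combine tables into single dataframe
# (DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent)
ufo_df = pd.concat([ufo_df_19th, ufo_df_20th], ignore_index=True)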

# use first row as column headers, then reindex removing top row
ufo_df.columns = ufo_df.iloc[0]
ufo_df = ufo_df.reindex(ufo_df.index.drop(0))

# create loop to extract year/month/day from formatting
dates = ufo_df['Date']
year = []
month = []
day = []
for date in dates:
    date = date.strip('s')
    date = date.split('-')
    year.append(date[0])
    if len(date) > 1:
        month.append(date[1])
    else:
        month.append(None)
    if len(date) > 2:
        day.append(date[2])
    else:
        day.append(None)

# insert 'Year' 'Month' 'Day' columns into the dataframe
ufo_df.insert(loc=0, column='Year', value=year)
ufo_df.insert(loc=1, column='Month', value=month)
ufo_df.insert(loc=2, column='Day', value=day)
ufo_df_clean = ufo_df[['Year', 'Month', 'Day', 'Date', 'Name', 'Country', 'Description']].copy()
ufo_df_clean


Unnamed: 0,Year,Month,Day,Date,Name,Country,Description
1,1909,,,1909,Mystery airships,New Zealand,Strange moving lights and some solid bodies in...
2,1917,08,1309,1917-08-1309-1310-13,Miracle of the Sun,Portugal,Thousands of people observed the sun gyrate an...
3,1940,,,1940s,Foo fighters,Over World War II theaters,Small metallic spheres and colorful balls of l...
4,1941,,,1941,Cape Girardeau UFO crash,United States,First responders and a Baptist minister allege...
5,1942,,,1942,Hopeh Incident,China,Photographs show what is reported to be a UFO.
6,1942,02,24,1942-02-24,Battle of Los Angeles,United States,Unidentified aerial objects trigger the firing...
7,1946,,,1946,The Ghost Rockets,"Mostly in Scandinavia, but also other European...",Numerous UFO sightings were reported over Scan...
8,1946,05,18,1946-05-18,UFO-Memorial Ängelholm,Sweden,Gösta Karlsson reports seeing a flying saucer ...
9,1947,06,21,1947-06-21,Maury Island incident,United States,Harold A. Dahl reported that his dog was kille...
10,1947,06,24,1947-06-24,Kenneth Arnold UFO sighting,United States,The UFO sighting that sparked the name flying ...
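
The output above shows that some Date values are not clean `YYYY-MM-DD` strings: `1940s` is a decade and `1917-08-1309-1310-13` yields a four-digit "day". The following is a minimal sketch of a stricter parser, under the assumption that only the leading year and the first two-digit month and day should be kept. `split_partial_date` is a hypothetical name, and whether `13` is the right day for the 1917 row is not confirmed by the source.

```python
import re

# Leading 4-digit year, optional decade "s", then optional -MM and -DD parts.
# Anything after the first two day digits is ignored (an assumption).
_DATE_RE = re.compile(r"(\d{4})s?(?:-(\d{2}))?(?:-(\d{2}))?")


def split_partial_date(text):
    """Return (year, month, day) as strings, with None for missing parts."""
    match = _DATE_RE.match(str(text).strip())
    if match is None:
        return (None, None, None)
    return match.groups()


for raw in ["1909", "1940s", "1942-02-24", "1917-08-1309-1310-13"]:
    print(raw, "->", split_partial_date(raw))
```

Rows that come back with a different day than the loop above produced are the ones worth checking by hand against the Wikipedia table.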


In [7]:
# Eclipse data; the .csv files first needed manual cleaning to remove extra columns from random rows
eclipse_1900 = pd.read_csv('Data/1901-2000.csv', index_col=False)
eclipse_2000 = pd.read_csv('Data/2001-2100.csv', index_col=False)
# DataFrame.append was removed in pandas 2.0; pd.concat is the equivalent
eclipse_df = pd.concat([eclipse_1900, eclipse_2000])

# Select just desired columns from dataset
eclipse_df_clean = eclipse_df[['Catalog Number', 'Calendar Year', 'Calendar Month', 'Calendar Day', 'Ecl. Type',
                              u'Lat \N{DEGREE SIGN}', u'Long \N{DEGREE SIGN}']]

# Rename columns to SQL friendly names
eclipse_df_clean = eclipse_df_clean.rename(columns={'Catalog Number':'catalog_number', 
                                 'Calendar Year': 'year', 
                                 'Calendar Month': 'month_old', 
                                 'Calendar Day': 'day', 
                                 'Ecl. Type': 'eclipse_type', 
                                 u'Lat \N{DEGREE SIGN}': 'latitude_old', 
                                 u'Long \N{DEGREE SIGN}': 'longitude_old'})

# Change lat/lon notation from N/S/E/W to -/+ notation
latitude = eclipse_df_clean['latitude_old']
new_lat = []
for lat in latitude:
    
    if lat[-1] == 'S':
        lat = lat[:-1]
        lat = '-' + ''.join(lat)
    else:
        lat = lat[:-1]
        lat = ''.join(lat)
    new_lat.append(lat)

longitude = eclipse_df_clean['longitude_old']
new_lon = []
for lon in longitude:
    if lon[-1] == 'W':
        lon = lon[:-1]
        lon = '-' + ''.join(lon)
    else:
        lon = lon[:-1]
        lon = ''.join(lon)
    new_lon.append(lon)

# Add new latitude/longitude columns, remove columns with old formatting
eclipse_df_clean['latitude'] = new_lat
eclipse_df_clean['longitude'] = new_lon
eclipse_df_clean = eclipse_df_clean.drop(columns=['latitude_old', 'longitude_old'])

# Replace month abbreviations with month numeric indicators
months = eclipse_df_clean['month_old']
months_new = []
def months_to_numbers(argument): 
    switcher = { 
        'Jan': '01', 
        'Feb': '02', 
        'Mar': '03',
        'Apr': '04',
        'May': '05',
        'Jun': '06',
        'Jul': '07',
        'Aug': '08',
        'Sep': '09',
        'Oct': '10',
        'Nov': '11',
        'Dec': '12'
    } 
    return switcher.get(argument)  # None for unrecognised values, so they load as NULL rather than the text "nothing"

for month in months:
    month = months_to_numbers(month)
    months_new.append(month)

# Insert new month column, remove old month column
eclipse_df_clean.insert(loc=2, column='month', value=months_new)
eclipse_df_clean = eclipse_df_clean.drop(columns=['month_old'])
eclipse_df_clean.head()

Unnamed: 0,catalog_number,year,month,day,eclipse_type,latitude,longitude
0,9283,1901,5,18,T,-2,98
1,9284,1901,11,11,A,11,69
2,9285,1902,4,8,Pe,72,-142
3,9286,1902,5,7,P,-70,-125
4,9287,1902,10,31,P,71,101
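
The N/S/E/W conversion is used for both the eclipse and the hurricane coordinates, so it may be worth a single shared helper. The sketch below is an illustration rather than the notebook's method: `to_signed_degrees` is a hypothetical name, and it differs from the loops in this notebook by returning a float and by raising on an unexpected hemisphere letter instead of silently treating it as positive.

```python
def to_signed_degrees(coord):
    """Convert text such as '28.0N' or '94.8W' to a signed float (S and W negative)."""
    text = str(coord).strip()
    value, hemisphere = float(text[:-1]), text[-1].upper()
    if hemisphere in ("N", "E"):
        return value
    if hemisphere in ("S", "W"):
        return -value
    raise ValueError(f"unexpected hemisphere in {coord!r}")


print(to_signed_degrees("28.0N"), to_signed_degrees("94.8W"), to_signed_degrees("2S"))
# → 28.0 -94.8 -2.0
```

A function like this could be applied to a column with `Series.map`, which would replace the two explicit loops.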


In [8]:
## Stephanie Extract/Transform below

# Extract CSVs into DataFrames
    AtlanticStorms from https://www.kaggle.com/noaa/hurricane-database
        - Each storm has up to 5 observations per day (but not all days have 5)
        - Older data appears to use -999 for wind speed and pressure instead of something like NA
        - ID: AL = Atlantic, XX = storm number within the year, YYYY = year
    PacificStorms from https://www.kaggle.com/noaa/hurricane-database
        - Each storm has up to 5 observations per day (but not all days have 5)
        - Older data appears to use -999 for wind speed and pressure instead of something like NA
        - ID: EP = Pacific, XX = storm number within the year, YYYY = year
    Volcanoes from https://data.world/dhs/historical-significant
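
The `-999` convention noted above can be handled at read time. The sketch below uses only the standard library and a one-row sample taken from the Atlantic data; `read_with_missing` and `SENTINEL` are hypothetical names. With pandas, the `na_values` parameter of `pd.read_csv` serves a similar purpose.

```python
import csv
import io

SENTINEL = "-999"  # missing-value marker described in the notes above

# One-row sample in the same shape as the storm CSVs (subset of columns).
sample = io.StringIO(
    "ID,Maximum Wind,Minimum Pressure\n"
    "AL011851,80,-999\n"
)


def read_with_missing(handle, sentinel=SENTINEL):
    """Yield CSV rows as dicts, replacing the sentinel value with None."""
    for row in csv.DictReader(handle):
        yield {key: (None if value.strip() == sentinel else value)
               for key, value in row.items()}


rows = list(read_with_missing(sample))
print(rows)
# → [{'ID': 'AL011851', 'Maximum Wind': '80', 'Minimum Pressure': None}]
```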

In [9]:
#Extract Atlantic Storm Data

AtlanticStorm = "Data/Atlantic_Storms.csv"
AtlanticStorm_df = pd.read_csv(AtlanticStorm)
AtlanticStorm_df.head()

Unnamed: 0,ID,Name,Date,Time,Event,Status,Latitude,Longitude,Maximum Wind,Minimum Pressure,...,Low Wind SW,Low Wind NW,Moderate Wind NE,Moderate Wind SE,Moderate Wind SW,Moderate Wind NW,High Wind NE,High Wind SE,High Wind SW,High Wind NW
0,AL011851,UNNAMED,18510625,0,,HU,28.0N,94.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
1,AL011851,UNNAMED,18510625,600,,HU,28.0N,95.4W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
2,AL011851,UNNAMED,18510625,1200,,HU,28.0N,96.0W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
3,AL011851,UNNAMED,18510625,1800,,HU,28.1N,96.5W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
4,AL011851,UNNAMED,18510625,2100,L,HU,28.2N,96.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999


In [10]:
#Extract Pacific Storm Data

PacificStorm = "Data/Pacific_Storms.csv"
PacificStorm_df = pd.read_csv(PacificStorm)
PacificStorm_df.head()

Unnamed: 0,ID,Name,Date,Time,Event,Status,Latitude,Longitude,Maximum Wind,Minimum Pressure,...,Low Wind SW,Low Wind NW,Moderate Wind NE,Moderate Wind SE,Moderate Wind SW,Moderate Wind NW,High Wind NE,High Wind SE,High Wind SW,High Wind NW
0,EP011949,UNNAMED,19490611,0,,TS,20.2N,106.3W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
1,EP011949,UNNAMED,19490611,600,,TS,20.2N,106.4W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
2,EP011949,UNNAMED,19490611,1200,,TS,20.2N,106.7W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
3,EP011949,UNNAMED,19490611,1800,,TS,20.3N,107.7W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
4,EP011949,UNNAMED,19490612,0,,TS,20.4N,108.6W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999


In [11]:
#Transform Hurricane Data
# Combine Atlantic and Pacific Storm Data

AtlPacStorms = [AtlanticStorm_df, PacificStorm_df]
AtlPacStorms_df = pd.concat(AtlPacStorms).reset_index(drop=True)
AtlPacStorms_df.head()

Unnamed: 0,ID,Name,Date,Time,Event,Status,Latitude,Longitude,Maximum Wind,Minimum Pressure,...,Low Wind SW,Low Wind NW,Moderate Wind NE,Moderate Wind SE,Moderate Wind SW,Moderate Wind NW,High Wind NE,High Wind SE,High Wind SW,High Wind NW
0,AL011851,UNNAMED,18510625,0,,HU,28.0N,94.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
1,AL011851,UNNAMED,18510625,600,,HU,28.0N,95.4W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
2,AL011851,UNNAMED,18510625,1200,,HU,28.0N,96.0W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
3,AL011851,UNNAMED,18510625,1800,,HU,28.1N,96.5W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
4,AL011851,UNNAMED,18510625,2100,L,HU,28.2N,96.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999


In [12]:
# Check that Pacific Storms are included in combined df

AtlPacStorms_df.loc[AtlPacStorms_df['ID'] == "EP011949"]

Unnamed: 0,ID,Name,Date,Time,Event,Status,Latitude,Longitude,Maximum Wind,Minimum Pressure,...,Low Wind SW,Low Wind NW,Moderate Wind NE,Moderate Wind SE,Moderate Wind SW,Moderate Wind NW,High Wind NE,High Wind SE,High Wind SW,High Wind NW
49105,EP011949,UNNAMED,19490611,0,,TS,20.2N,106.3W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49106,EP011949,UNNAMED,19490611,600,,TS,20.2N,106.4W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49107,EP011949,UNNAMED,19490611,1200,,TS,20.2N,106.7W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49108,EP011949,UNNAMED,19490611,1800,,TS,20.3N,107.7W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49109,EP011949,UNNAMED,19490612,0,,TS,20.4N,108.6W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49110,EP011949,UNNAMED,19490612,600,,TS,20.5N,109.4W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999
49111,EP011949,UNNAMED,19490612,1200,,TS,20.6N,110.2W,45,-999,...,-999,-999,-999,-999,-999,-999,-999,-999,-999,-999


In [13]:
AtlPacStorms_df.dtypes

ID                  object
Name                object
Date                 int64
Time                 int64
Event               object
Status              object
Latitude            object
Longitude           object
Maximum Wind         int64
Minimum Pressure     int64
Low Wind NE          int64
Low Wind SE          int64
Low Wind SW          int64
Low Wind NW          int64
Moderate Wind NE     int64
Moderate Wind SE     int64
Moderate Wind SW     int64
Moderate Wind NW     int64
High Wind NE         int64
High Wind SE         int64
High Wind SW         int64
High Wind NW         int64
dtype: object

In [14]:
# Adjust date format

# make string version of original Date column, call it 'col'
AtlPacStorms_df['col'] = AtlPacStorms_df['Date'].apply(str)

# make the new columns using string indexing
AtlPacStorms_df['Year'] = AtlPacStorms_df['col'].str[0:4]
AtlPacStorms_df['Month'] = AtlPacStorms_df['col'].str[4:6]
AtlPacStorms_df['Day'] = AtlPacStorms_df['col'].str[6:8]

# get rid of the extra variable (if you want)
AtlPacStorms_df.drop('col', axis=1, inplace=True)

#check result
AtlPacStorms_df.head()

Unnamed: 0,ID,Name,Date,Time,Event,Status,Latitude,Longitude,Maximum Wind,Minimum Pressure,...,Moderate Wind SE,Moderate Wind SW,Moderate Wind NW,High Wind NE,High Wind SE,High Wind SW,High Wind NW,Year,Month,Day
0,AL011851,UNNAMED,18510625,0,,HU,28.0N,94.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,1851,6,25
1,AL011851,UNNAMED,18510625,600,,HU,28.0N,95.4W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,1851,6,25
2,AL011851,UNNAMED,18510625,1200,,HU,28.0N,96.0W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,1851,6,25
3,AL011851,UNNAMED,18510625,1800,,HU,28.1N,96.5W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,1851,6,25
4,AL011851,UNNAMED,18510625,2100,L,HU,28.2N,96.8W,80,-999,...,-999,-999,-999,-999,-999,-999,-999,1851,6,25
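
The string slicing above assumes every Date value is exactly eight digits and a real calendar date. A minimal sketch of the same split with validation follows; `split_yyyymmdd` is a hypothetical name, and the choice to raise `ValueError` on bad input is an assumption.

```python
from datetime import datetime


def split_yyyymmdd(value):
    """Split 18510625 (int or str) into ('1851', '06', '25'), rejecting invalid dates."""
    text = str(value).strip()
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"expected 8 digits, got {value!r}")
    parsed = datetime.strptime(text, "%Y%m%d")  # raises ValueError for e.g. Feb 31
    return (f"{parsed.year:04d}", f"{parsed.month:02d}", f"{parsed.day:02d}")


print(split_yyyymmdd(18510625))
# → ('1851', '06', '25')
```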


In [15]:
#Select columns to keep

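# .copy() makes an independent frame, so later column assignments do not act on a slice of the original
AtlPacStorms_df = AtlPacStorms_df[["Year", "Month", "Day", "ID", "Status", "Time", "Latitude", "Longitude"]].copy()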
AtlPacStorms_df.head()


Unnamed: 0,Year,Month,Day,ID,Status,Time,Latitude,Longitude
0,1851,6,25,AL011851,HU,0,28.0N,94.8W
1,1851,6,25,AL011851,HU,600,28.0N,95.4W
2,1851,6,25,AL011851,HU,1200,28.0N,96.0W
3,1851,6,25,AL011851,HU,1800,28.1N,96.5W
4,1851,6,25,AL011851,HU,2100,28.2N,96.8W


In [16]:
AtlPacStorms_df["Status"] = AtlPacStorms_df['Status'].astype(str)
AtlPacStorms_df.dtypes

Year         object
Month        object
Day          object
ID           object
Status       object
Time          int64
Latitude     object
Longitude    object
dtype: object

In [17]:
# We are only interested in hurricanes, so keep only rows with Status = HU
# (Status values in this dataset carry a leading space, so strip before comparing)

AtlPacStorms_df = AtlPacStorms_df.loc[AtlPacStorms_df["Status"].str.strip() == "HU"]
AtlPacStorms_df.head()

Unnamed: 0,Year,Month,Day,ID,Status,Time,Latitude,Longitude
0,1851,6,25,AL011851,HU,0,28.0N,94.8W
1,1851,6,25,AL011851,HU,600,28.0N,95.4W
2,1851,6,25,AL011851,HU,1200,28.0N,96.0W
3,1851,6,25,AL011851,HU,1800,28.1N,96.5W
4,1851,6,25,AL011851,HU,2100,28.2N,96.8W


In [18]:
# Keep only the first observation of each unique ID

Hurricane_df = AtlPacStorms_df.drop_duplicates(subset=["ID"], keep="first")
Hurricane_df.head()

Unnamed: 0,Year,Month,Day,ID,Status,Time,Latitude,Longitude
0,1851,6,25,AL011851,HU,0,28.0N,94.8W
14,1851,7,5,AL021851,HU,1200,22.2N,97.6W
22,1851,8,17,AL041851,HU,1200,15.9N,58.5W
102,1852,8,20,AL011852,HU,0,21.2N,70.6W
143,1852,9,5,AL021852,HU,0,17.0N,64.1W
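
Because the status filter runs before the de-duplication, the row kept for each storm is its first observation at hurricane status, not necessarily the storm's first recorded observation (note the 1200 Time for AL021851 above). The keep-first behaviour itself can be illustrated in plain Python; `first_per_id` is a hypothetical helper, and the sample rows are taken from the outputs above.

```python
def first_per_id(rows, key="ID"):
    """Keep the first row seen for each key value, preserving input order."""
    seen = {}
    for row in rows:
        seen.setdefault(row[key], row)  # later rows with the same key are ignored
    return list(seen.values())


observations = [
    {"ID": "AL011851", "Time": 0, "Latitude": "28.0N"},
    {"ID": "AL011851", "Time": 600, "Latitude": "28.0N"},
    {"ID": "AL021851", "Time": 1200, "Latitude": "22.2N"},
]
for row in first_per_id(observations):
    print(row)
```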


In [19]:
# Drop Status and Time columns

Hurricane_df = Hurricane_df[["Year", "Month", "Day", "ID", "Latitude", "Longitude"]]
Hurricane_df = Hurricane_df.reset_index(drop = True)
Hurricane_df.head()


Unnamed: 0,Year,Month,Day,ID,Latitude,Longitude
0,1851,6,25,AL011851,28.0N,94.8W
1,1851,7,5,AL021851,22.2N,97.6W
2,1851,8,17,AL041851,15.9N,58.5W
3,1852,8,20,AL011852,21.2N,70.6W
4,1852,9,5,AL021852,17.0N,64.1W


In [20]:
#Re-format Latitude and Longitude to align with format in other tables
#rename columns
Hurricane_df.columns = ["Year", "Month", "Day", "ID", "latitude_old", "longitude_old"]
Hurricane_df.head()

Unnamed: 0,Year,Month,Day,ID,latitude_old,longitude_old
0,1851,6,25,AL011851,28.0N,94.8W
1,1851,7,5,AL021851,22.2N,97.6W
2,1851,8,17,AL041851,15.9N,58.5W
3,1852,8,20,AL011852,21.2N,70.6W
4,1852,9,5,AL021852,17.0N,64.1W


In [21]:
#Change format
# Strip the trailing hemisphere letter; S and W become negative values
latitude = Hurricane_df['latitude_old']
new_lat = []
for lat in latitude:
    if lat[-1] == 'S':
        lat = '-' + lat[:-1]
    else:
        lat = lat[:-1]
    new_lat.append(lat)

longitude = Hurricane_df['longitude_old']
new_lon = []
for lon in longitude:
    if lon[-1] == 'W':
        lon = '-' + lon[:-1]
    else:
        lon = lon[:-1]
    new_lon.append(lon)

In [22]:
#add new lists to dataframe
Hurricane_df['Latitude'] = new_lat
Hurricane_df['Longitude'] = new_lon
Hurricane_df.head()

Unnamed: 0,Year,Month,Day,ID,latitude_old,longitude_old,Latitude,Longitude
0,1851,6,25,AL011851,28.0N,94.8W,28.0,-94.8
1,1851,7,5,AL021851,22.2N,97.6W,22.2,-97.6
2,1851,8,17,AL041851,15.9N,58.5W,15.9,-58.5
3,1852,8,20,AL011852,21.2N,70.6W,21.2,-70.6
4,1852,9,5,AL021852,17.0N,64.1W,17.0,-64.1


In [23]:
#drop old columns
Hurricane_df = Hurricane_df[['Year','Month','Day','ID','Latitude','Longitude']]
Hurricane_df.head()

Unnamed: 0,Year,Month,Day,ID,Latitude,Longitude
0,1851,6,25,AL011851,28.0,-94.8
1,1851,7,5,AL021851,22.2,-97.6
2,1851,8,17,AL041851,15.9,-58.5
3,1852,8,20,AL011852,21.2,-70.6
4,1852,9,5,AL021852,17.0,-64.1


In [24]:
#Extract Volcano Data

Volcano = "Data/Historical_Significant_Volcanic_Eruption_Locations.csv"
Volcano_df = pd.read_csv(Volcano)
Volcano_df.head()

Unnamed: 0,X,Y,YEAR,MO,DAY,VEI,VOL_ID,FATALITIES,ASSOC_EQ,ASSOC_TSU,...,MORPHOLOGY,STATUS,TIME_ERUPT,ID,COUNTRY,NUM_SLIDES,SIG_ID,TSU_ID,HAZ_EVENT_ID,OBJECTID
0,176.0,-38.82,230,,,6.0,40107,,,,...,Caldera,Radiocarbon,D6,40107,New Zealand,0,,,1904,563
1,140.45,38.15,1867,10.0,21.0,2.0,80319,3.0,,,...,Complex volcano,Historical,D2,80319,Japan,0,,,1232,813
2,37.25,27.08,640,,,2.0,30102,,,,...,Volcanic field,Anthropology,D6,30102,Saudi Arabia,0,,,1352,827
3,140.28,37.62,1900,7.0,17.0,2.0,80317,72.0,,,...,Stratovolcano,Historical,D1,80317,Japan,0,,,1213,835
4,150.108,-5.056,1895,,,2.0,50204,,,,...,Caldera,Anthropology,D3,50204,Papua New Guinea,0,,,1353,27


In [25]:
#Transform Volcano Data
#Select columns

Volcano_df = Volcano_df[["YEAR", "MO", "DAY", "VOL_ID", "NAME", "LOCATION", "LATITUDE", "LONGITUDE"]]
Volcano_df.head()

Unnamed: 0,YEAR,MO,DAY,VOL_ID,NAME,LOCATION,LATITUDE,LONGITUDE
0,230,,,40107,Taupo,New Zealand,-38.82,176.0
1,1867,10.0,21.0,80319,Zao,Honshu-Japan,38.15,140.45
2,640,,,30102,"Uwayrid, Harrat",Arabia-W,27.08,37.25
3,1900,7.0,17.0,80317,Adatara,Honshu-Japan,37.62,140.28
4,1895,,,50204,Dakataua,New Britain-SW Pac,-5.056,150.108


In [26]:
#Rename Columns

Volcano_df.columns = ['Year', 'Month', 'Day', 'Volcano_ID', 'Volcano_Name', 'Location', 'Latitude', 'Longitude']
Volcano_df.head()

Unnamed: 0,Year,Month,Day,Volcano_ID,Volcano_Name,Location,Latitude,Longitude
0,230,,,40107,Taupo,New Zealand,-38.82,176.0
1,1867,10.0,21.0,80319,Zao,Honshu-Japan,38.15,140.45
2,640,,,30102,"Uwayrid, Harrat",Arabia-W,27.08,37.25
3,1900,7.0,17.0,80317,Adatara,Honshu-Japan,37.62,140.28
4,1895,,,50204,Dakataua,New Britain-SW Pac,-5.056,150.108
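
Month and Day appear as `10.0` and `21.0` because the columns contain missing values, which makes pandas store them as floats. If whole numbers are wanted before loading, one option is a conversion like the sketch below; `to_optional_int` is a hypothetical name, and whether the target SQL columns expect integers is an assumption. In pandas, the nullable `Int64` dtype (`astype("Int64")`) is another way to get the same effect.

```python
import math


def to_optional_int(value):
    """Return int(value), or None when the value is missing (None or NaN)."""
    if value is None:
        return None
    number = float(value)
    return None if math.isnan(number) else int(number)


print([to_optional_int(v) for v in [10.0, float("nan"), 21.0, None]])
# → [10, None, 21, None]
```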


In [27]:
## Chris Load below

In [28]:
# Create database connection & engine
conn = f"{config.username}:{config.password}@127.0.0.1/disaster_etl"
engine = create_engine(f'mysql+pymysql://{conn}')
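
If the password in `config` contains characters such as `@` or `/`, the connection string above will be parsed incorrectly. A minimal sketch of building the same URL with percent-encoded credentials follows; `build_mysql_url` is a hypothetical helper, and the user name and password shown are made-up placeholders.

```python
from urllib.parse import quote_plus


def build_mysql_url(username, password, host="127.0.0.1", database="disaster_etl"):
    """Build a mysql+pymysql URL with the credentials percent-encoded."""
    return (
        f"mysql+pymysql://{quote_plus(username)}:{quote_plus(password)}"
        f"@{host}/{database}"
    )


print(build_mysql_url("etl_user", "p@ss/word"))
# → mysql+pymysql://etl_user:p%40ss%2Fword@127.0.0.1/disaster_etl
```

The resulting string can be passed to `create_engine` in place of the f-string used above.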

In [29]:
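# Check table names from database
# (Engine.table_names() was removed in SQLAlchemy 2.0; the inspector is the replacement)
from sqlalchemy import inspect
inspect(engine).get_table_names()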

['eclipse_event', 'hurricanes', 'ufo_sightings', 'volcano_eruptions']

In [30]:
# Load UFO data into database table
ufo_df_clean.to_sql(name='ufo_sightings', con=engine, if_exists='append', index=False)

In [31]:
# Load Eclipse data into database table
eclipse_df_clean.to_sql(name='eclipse_event', con=engine, if_exists='append', index=False)

In [32]:
## Stephanie Load below

In [33]:
conn = f"{config.username}:{config.password}@127.0.0.1/disaster_etl"
engine = create_engine(f'mysql+pymysql://{conn}')

In [34]:
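# Engine.table_names() was removed in SQLAlchemy 2.0; the inspector is the replacement
from sqlalchemy import inspect
inspect(engine).get_table_names()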

['eclipse_event', 'hurricanes', 'ufo_sightings', 'volcano_eruptions']

In [35]:
# Load Volcano data into database table
Volcano_df.to_sql(name='volcano_eruptions', con=engine, if_exists='append', index=False)

In [36]:
# Load Hurricane data into database table
Hurricane_df.to_sql(name='hurricanes', con=engine, if_exists='append', index=False)
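
Once the tables are loaded, the separate year/month/day columns make it simple to count events per month. The sketch below illustrates the idea with an in-memory SQLite database standing in for MySQL so that it runs on its own. The column names are hypothetical (the real schema is in `disaster_etl.sql`), and the sample rows are copied from the outputs above.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL database; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hurricanes (year INTEGER, month INTEGER, day INTEGER, id TEXT);
    CREATE TABLE volcano_eruptions (year INTEGER, month INTEGER, day INTEGER,
                                    volcano_name TEXT);
""")
conn.executemany(
    "INSERT INTO hurricanes VALUES (?, ?, ?, ?)",
    [(1851, 6, 25, "AL011851"), (1851, 7, 5, "AL021851"), (1852, 8, 20, "AL011852")],
)
conn.executemany(
    "INSERT INTO volcano_eruptions VALUES (?, ?, ?, ?)",
    [(1867, 10, 21, "Zao"), (1900, 7, 17, "Adatara")],
)

TABLES = ("hurricanes", "volcano_eruptions")


def rows_per_month(connection, table):
    """Return (month, count) pairs for one of the known tables."""
    if table not in TABLES:  # table names cannot be bound as query parameters
        raise ValueError(f"unknown table {table!r}")
    query = f"SELECT month, COUNT(*) FROM {table} GROUP BY month ORDER BY month"
    return connection.execute(query).fetchall()


for name in TABLES:
    print(name, rows_per_month(conn, name))
```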