### ETL-Project: Extract, Transform, Load - A Tale of a Vineyard
This repository explores the concept of ETL (Extract, Transform, Load) by creating a database accessible through (SQL/Postgres?) to assess which locations in (Country/Area) are ideal for establishing a vineyard.

#### Team Members:
* Michael Bett
* Carmen Sin
* Josh Thomas
* Aline Hornoff

#### The Project
A good glass of wine to wind down after a hectic, stressful day or to celebrate a great achievement is a special treat. But where is the ideal location to turn a simple fruit like a grape into a glass of joy for a special occasion?

Making wine is a long, slow process. It can take a full three years to get from the initial planting of a brand-new grapevine through the first harvest, and the first vintage might not be bottled for another two years after that. A long-term investment is required, so it is vital to pick the right location to establish a vineyard.

The database will give insight into which locations are ideal for a particular grape variety, based on weather conditions, soil composition and grape varietal.

### Summary
What can the database be used for? Is the database relational/non-relational? Why?

In [9]:
# Import dependencies
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Import create_engine
from sqlalchemy import create_engine

### Extract
... Why did we choose these data sources?

#### Data Sources:
1. Department of Primary Industries and Regional Development WA: https://www.agric.wa.gov.au/soil-api-10
2. Kaggle (https://www.kaggle.com): WineData.csv
3. 

#### Steps:
* 
* 
* 

In [10]:
# Extraction Steps

# Read in WineData.csv
csv_path = "Resources/WineData.csv"
wine_df = pd.read_csv(csv_path)
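
Because the transform step only keeps seven of the columns, the extraction could read just those columns from the start. The following is a minimal sketch under stated assumptions: `SAMPLE_CSV` is a small in-memory stand-in for `Resources/WineData.csv` (two rows copied from the output shown in this notebook), and `extract_wine` and `WANTED_COLUMNS` are hypothetical names introduced for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for Resources/WineData.csv (assumption: the real file
# has at least these columns; the rows are copied from the notebook output).
SAMPLE_CSV = """id,country,points,price,province,region_1,title,variety,winery
1,Portugal,87,15.0,Douro,,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,87,14.0,Oregon,Willamette Valley,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
"""

# Columns the transform step keeps.
WANTED_COLUMNS = ["country", "price", "province", "region_1", "title", "winery", "variety"]


def extract_wine(source):
    """Read only the columns needed later; `source` is a path or file-like object."""
    return pd.read_csv(source, usecols=WANTED_COLUMNS)


sample_df = extract_wine(io.StringIO(SAMPLE_CSV))
print(sample_df)
```

With the real file, the call would be `extract_wine("Resources/WineData.csv")`. Note that `usecols` keeps the column order of the file, not the order of the list.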

### Transform
... Why did we transform the data in this way?

#### Step 1 - Transform WineData.csv
* Read in WineData.csv and display the data
* Create new dataframe with the following columns: country, price, province, region_1, title, winery, variety
* Rename column headers
* Filter dataframe for 'Western Australia'
* Inspect dataframe for unique values
* Reset index for each of the wineries
*  
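
The select, rename, filter and reset-index steps listed above can be sketched as one reusable function. This is an illustration, not the notebook's own code: `transform_wine` and `RENAMES` are hypothetical names, the three input rows are copied from the output shown in this notebook, and "reset index" is read here as renumbering the filtered rows from 0, which is only one possible interpretation of that step.

```python
import pandas as pd

# Old column name -> new column name, matching the rename step in this notebook.
RENAMES = {
    "country": "Country",
    "price": "Price",
    "province": "State",
    "region_1": "Region",
    "title": "Wine_Name",
    "winery": "Winery",
    "variety": "Grape_Variety",
}


def transform_wine(raw_df, state="Western Australia"):
    """Select, rename, filter by state, and renumber the rows."""
    selected = raw_df[list(RENAMES)]                    # keep only the needed columns
    renamed = selected.rename(columns=RENAMES)          # apply the new headers
    filtered = renamed.loc[renamed["State"] == state]   # keep one state
    return filtered.reset_index(drop=True)              # assumption: index restarts at 0


# Small sample built from rows shown in the notebook output.
raw = pd.DataFrame({
    "country": ["Italy", "Australia", "Australia"],
    "price": [None, 21.0, 23.0],
    "province": ["Sicily & Sardinia", "Western Australia", "Western Australia"],
    "region_1": ["Etna", "Great Southern", "Margaret River"],
    "title": [
        "Nicosia 2013 Vulkà Bianco (Etna)",
        "Plantagenet 2014 Riesling (Great Southern)",
        "Vasse River 2006 Chardonnay (Margaret River)",
    ],
    "winery": ["Nicosia", "Plantagenet", "Vasse River"],
    "variety": ["White Blend", "Riesling", "Chardonnay"],
})

wa_wine = transform_wine(raw)
print(wa_wine)
```

Wrapping the steps in a function keeps the intermediate frames out of the notebook namespace and makes it easy to run the same transformation for another state.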

In [11]:
# Transformation Steps

# Display
wine_df.head()

| | id | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | | Sicily & Sardinia | Etna | | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | | | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | 2 | US | Tart and snappy, the flavors of lime flesh and... | | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | | Alexander Peartree | | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


In [12]:
# Create new dataframe with the following columns: country, price, province, region_1, title, winery, variety
cleanwine_df = wine_df[["country", "price", "province", "region_1", "title", "winery", "variety"]]

# Display dataframe
cleanwine_df


| | country | price | province | region_1 | title | winery | variety |
|---|---|---|---|---|---|---|---|
| 0 | Italy | | Sicily & Sardinia | Etna | Nicosia 2013 Vulkà Bianco (Etna) | Nicosia | White Blend |
| 1 | Portugal | 15.0 | Douro | | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Quinta dos Avidagos | Portuguese Red |
| 2 | US | 14.0 | Oregon | Willamette Valley | Rainstorm 2013 Pinot Gris (Willamette Valley) | Rainstorm | Pinot Gris |
| 3 | US | 13.0 | Michigan | Lake Michigan Shore | St. Julian 2013 Reserve Late Harvest Riesling ... | St. Julian | Riesling |
| 4 | US | 65.0 | Oregon | Willamette Valley | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Sweet Cheeks | Pinot Noir |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 129966 | Germany | 28.0 | Mosel | | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Dr. H. Thanisch (Erben Müller-Burggraef) | Riesling |
| 129967 | US | 75.0 | Oregon | Oregon | Citation 2004 Pinot Noir (Oregon) | Citation | Pinot Noir |
| 129968 | France | 30.0 | Alsace | Alsace | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Domaine Gresser | Gewürztraminer |
| 129969 | France | 32.0 | Alsace | Alsace | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Domaine Marcel Deiss | Pinot Gris |


In [13]:
# Rename column headers
cleanwine_df = cleanwine_df.rename(columns={
    "country": "Country",
    "price": "Price",
    "province": "State",
    "region_1": "Region",
    "title": "Wine_Name",
    "winery": "Winery",
    "variety": "Grape_Variety",
})

# Display dataframe
cleanwine_df

| | Country | Price | State | Region | Wine_Name | Winery | Grape_Variety |
|---|---|---|---|---|---|---|---|
| 0 | Italy | | Sicily & Sardinia | Etna | Nicosia 2013 Vulkà Bianco (Etna) | Nicosia | White Blend |
| 1 | Portugal | 15.0 | Douro | | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Quinta dos Avidagos | Portuguese Red |
| 2 | US | 14.0 | Oregon | Willamette Valley | Rainstorm 2013 Pinot Gris (Willamette Valley) | Rainstorm | Pinot Gris |
| 3 | US | 13.0 | Michigan | Lake Michigan Shore | St. Julian 2013 Reserve Late Harvest Riesling ... | St. Julian | Riesling |
| 4 | US | 65.0 | Oregon | Willamette Valley | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Sweet Cheeks | Pinot Noir |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 129966 | Germany | 28.0 | Mosel | | Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ... | Dr. H. Thanisch (Erben Müller-Burggraef) | Riesling |
| 129967 | US | 75.0 | Oregon | Oregon | Citation 2004 Pinot Noir (Oregon) | Citation | Pinot Noir |
| 129968 | France | 30.0 | Alsace | Alsace | Domaine Gresser 2013 Kritt Gewurztraminer (Als... | Domaine Gresser | Gewürztraminer |
| 129969 | France | 32.0 | Alsace | Alsace | Domaine Marcel Deiss 2012 Pinot Gris (Alsace) | Domaine Marcel Deiss | Pinot Gris |


In [14]:
# Filter dataframe for 'Western Australia'
WaWinedf = cleanwine_df.loc[cleanwine_df["State"] == "Western Australia"]

# Display dataframe
WaWinedf

| | Country | Price | State | Region | Wine_Name | Winery | Grape_Variety |
|---|---|---|---|---|---|---|---|
| 652 | Australia | 21.0 | Western Australia | Great Southern | Plantagenet 2014 Riesling (Great Southern) | Plantagenet | Riesling |
| 1649 | Australia | 16.0 | Western Australia | Margaret River | Xanadu 2015 Exmoor Sauvignon Blanc-Semillon (M... | Xanadu | Sauvignon Blanc-Semillon |
| 1707 | Australia | 23.0 | Western Australia | Margaret River | Vasse River 2006 Chardonnay (Margaret River) | Vasse River | Chardonnay |
| 1930 | Australia | 30.0 | Western Australia | Margaret River | Robert Oatley 2011 Finisterre Chardonnay (Marg... | Robert Oatley | Chardonnay |
| 2312 | Australia | 33.0 | Western Australia | Pemberton | Picardy 2006 Chardonnay (Pemberton) | Picardy | Chardonnay |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 125506 | Australia | 15.0 | Western Australia | Margaret River | Franklin Tate 2013 Tate Chardonnay (Margaret R... | Franklin Tate | Chardonnay |
| 125578 | Australia | 37.0 | Western Australia | Western Australia | Marchand & Burch 2014 Villages Chardonnay (Wes... | Marchand & Burch | Chardonnay |
| 126392 | Australia | 16.0 | Western Australia | Margaret River | Skuttlebutt 2012 Sauvignon Blanc-Semillon (Mar... | Skuttlebutt | Sauvignon Blanc-Semillon |
| 127225 | Australia | 90.0 | Western Australia | Western Australia | Howard Park 2012 Abercrombie Cabernet Sauvigno... | Howard Park | Cabernet Sauvignon |
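
The "inspect dataframe for unique values" step is listed in the plan but has no cell yet. One possible way to do it is sketched below. `wa_sample` is a hypothetical five-row frame built from rows of the Western Australia output above, so the counts it produces describe the sample only, not the full dataset.

```python
import pandas as pd

# Five rows copied from the filtered Western Australia output (sample only).
wa_sample = pd.DataFrame({
    "Region": ["Great Southern", "Margaret River", "Margaret River",
               "Pemberton", "Western Australia"],
    "Winery": ["Plantagenet", "Vasse River", "Robert Oatley",
               "Picardy", "Howard Park"],
    "Grape_Variety": ["Riesling", "Chardonnay", "Chardonnay",
                      "Chardonnay", "Cabernet Sauvignon"],
})

# Number of distinct values in each column.
unique_counts = wa_sample.nunique()

# Sorted list of the distinct regions.
regions = sorted(wa_sample["Region"].unique())

# How many wines each region contributes.
wines_per_region = wa_sample["Region"].value_counts()

print(unique_counts)
print(regions)
print(wines_per_region)
```

A check like this would also surface rows where `Region` simply repeats the state name ("Western Australia"), which may need separate handling before regions are matched against soil or weather data.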


### Load
... Why did we choose this method?

#### Steps:
* 
* 
* 

In [None]:
# Loading Steps
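
Since the target database has not been decided yet, the following is only a sketch of what the load step might look like with `create_engine` and `DataFrame.to_sql`. It assumes an in-memory SQLite database as a stand-in for the final database, a hypothetical table name `wa_wine`, and a two-row sample frame copied from the Western Australia output in this notebook.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# Two rows copied from the filtered output in this notebook (sample only).
wa_sample = pd.DataFrame({
    "Country": ["Australia", "Australia"],
    "Price": [23.0, 21.0],
    "State": ["Western Australia", "Western Australia"],
    "Region": ["Margaret River", "Great Southern"],
    "Wine_Name": [
        "Vasse River 2006 Chardonnay (Margaret River)",
        "Plantagenet 2014 Riesling (Great Southern)",
    ],
    "Winery": ["Vasse River", "Plantagenet"],
    "Grape_Variety": ["Chardonnay", "Riesling"],
})

# Assumption: in-memory SQLite stands in for the project's real database.
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    # Write the frame to a table, replacing it if it already exists.
    wa_sample.to_sql("wa_wine", con=conn, if_exists="replace", index=False)

    # Read it back to confirm the load worked.
    loaded = pd.read_sql(
        text("SELECT Winery, Price FROM wa_wine ORDER BY Price"),
        con=conn,
    )

print(loaded)
```

Switching to Postgres would mainly mean changing the connection string passed to `create_engine` and installing a suitable driver; the `to_sql` call itself would stay the same.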