# Winemag CSV Extract and Cleanup

Comments: 
    1) Converted the CSV into a DataFrame.
    2) Created a new DataFrame with select columns.
    3) Determined that there were incomplete rows and eliminated those rows.
    4) Evaluated the data to confirm that each column was of the appropriate type. It looked good for now.
    5) Renamed Pinot Grigio to Pinot Gris to match the other dataset.
    6) Eliminated all wines that were not included in the top 6 favorite wines per https://artwinepreserver.com/pages/types-of-wine.
    7) Reindexed the DataFrame after eliminating non-top 6 wines.
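
The steps listed above can be sketched as a single function. This is a minimal illustration on made-up rows, not the notebook's actual code path: `toy_df`, `COLUMNS`, `TOP_VARIETIES`, and `clean_winemag` are hypothetical names introduced here, and the toy values are not taken from the real CSV.

```python
import pandas as pd

# Hypothetical toy rows standing in for the real CSV (illustrative values only).
toy_df = pd.DataFrame({
    "country": ["US", "France", None, "Italy", "US"],
    "description": ["a", "b", "c", "d", "e"],
    "points": [91, 100, 88, 87, 90],
    "price": [20.0, 1400.0, 15.0, None, 30.0],
    "province": ["California", "Champagne", None, "Veneto", "Oregon"],
    "variety": ["Chardonnay", "Chardonnay", "Merlot", "Pinot Grigio", "Pinot Grigio"],
    "winery": ["w1", "w2", "w3", "w4", "w5"],
    "region_1": ["x", "y", "z", None, None],  # not selected, so its gaps do not matter
})

COLUMNS = ["country", "description", "points", "price", "province", "variety", "winery"]
TOP_VARIETIES = ["Cabernet Sauvignon", "Chardonnay", "Pinot Gris",
                 "Pinot Noir", "Sauvignon Blanc", "Merlot"]


def clean_winemag(df, columns, varieties):
    """Select columns, drop incomplete rows, rename, filter, and reindex."""
    out = df[columns].copy()                 # keep only the selected columns
    out = out.dropna(how="any")              # drop rows with any missing value
    out = out.assign(                        # align the variety name across datasets
        variety=out["variety"].replace({"Pinot Grigio": "Pinot Gris"})
    )
    out = out[out["variety"].isin(varieties)]  # keep only the chosen varieties
    return out.reset_index(drop=True)          # renumber rows from 0


cleaned = clean_winemag(toy_df, COLUMNS, TOP_VARIETIES)
print(cleaned[["country", "price", "variety"]])
```

Selecting the columns before calling `dropna` matters: the last toy row has no `region_1` value but survives because that column is never selected.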
 

In [1]:
import pandas as pd
from sqlalchemy import create_engine

### Store CSV into DataFrame

In [2]:
csv_file = "Resources/winemag-data_first150k_for_project.csv"
winemag_data_df = pd.read_csv(csv_file)
winemag_data_df.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,variety,winery
0,34920,France,"A big, powerful wine that sums up the richness...",,99,2300.0,Bordeaux,Pauillac,,Bordeaux-style Red Blend,Château Latour
1,13318,US,The nose on this single-vineyard wine from a s...,Roger Rose Vineyard,91,2013.0,California,Arroyo Seco,Central Coast,Chardonnay,Blair
2,34922,France,"A massive wine for Margaux, packed with tannin...",,98,1900.0,Bordeaux,Margaux,,Bordeaux-style Red Blend,Château Margaux
3,26296,France,A wine that has created its own universe. It h...,Clos du Mesnil,100,1400.0,Champagne,Champagne,,Chardonnay,Krug
4,51886,France,A wine that has created its own universe. It h...,Clos du Mesnil,100,1400.0,Champagne,Champagne,,Chardonnay,Krug


### Create new data frame with select columns

In [3]:
new_winemag_data_df = winemag_data_df[['country', 'description', 'points', 'price', 'province', 'variety', 'winery']].copy()
new_winemag_data_df.head()

Unnamed: 0,country,description,points,price,province,variety,winery
0,France,"A big, powerful wine that sums up the richness...",99,2300.0,Bordeaux,Bordeaux-style Red Blend,Château Latour
1,US,The nose on this single-vineyard wine from a s...,91,2013.0,California,Chardonnay,Blair
2,France,"A massive wine for Margaux, packed with tannin...",98,1900.0,Bordeaux,Bordeaux-style Red Blend,Château Margaux
3,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
4,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug


### Basic Cleaning

In [4]:
# Check for missing information
new_winemag_data_df.count()

country        150925
description    150930
points         150930
price          137235
province       150925
variety        150930
winery         150930
dtype: int64

In [5]:
# Drop all rows with missing information
new_winemag_data_df = new_winemag_data_df.dropna(how='any')

In [6]:
# Check that all columns have the same number of rows now.
new_winemag_data_df.count()

country        137230
description    137230
points         137230
price          137230
province       137230
variety        137230
winery         137230
dtype: int64

In [7]:
# Check that the data types fit the data. 
new_winemag_data_df.dtypes

country         object
description     object
points           int64
price          float64
province        object
variety         object
winery          object
dtype: object

In [8]:
# Look at the values in the 'variety' column.
new_winemag_data_df['variety'].value_counts()

Chardonnay            13775
Pinot Noir            13625
Cabernet Sauvignon    12671
Red Blend              9377
Sauvignon Blanc        6054
                      ...  
Pinotage-Merlot           1
Forcallà                  1
Pinela                    1
Früburgunder              1
Erbaluce                  1
Name: variety, Length: 619, dtype: int64

Looking at the CSV file directly, it was clear that both Pinot Grigio and Pinot Gris were listed in the variety column. The cell below renames Pinot Grigio to Pinot Gris to match the other dataset. 

In [9]:
new_winemag_data_df['variety'] = new_winemag_data_df['variety'].replace(
    {'Pinot Grigio': 'Pinot Gris'})
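
A rename like this can also be checked in code rather than by reading the CSV by eye. The sketch below is an assumption-laden illustration: `varieties` is a hypothetical Series with made-up values, and the substring `"Pinot G"` is just one convenient way to match both spellings.

```python
import pandas as pd

# Hypothetical variety values (illustrative only).
varieties = pd.Series(
    ["Pinot Grigio", "Pinot Gris", "Chardonnay", "Pinot Grigio"], name="variety"
)

# Counts of both spellings before the rename.
pinot_before = varieties[varieties.str.contains("Pinot G", regex=False)].value_counts()

# Same mapping as the cell above, applied to the toy Series.
renamed = varieties.replace({"Pinot Grigio": "Pinot Gris"})

# After the rename only one spelling should remain.
pinot_after = renamed[renamed.str.contains("Pinot G", regex=False)].value_counts()

print(pinot_before)
print(pinot_after)
```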

Isolate the top 6 wine types per https://artwinepreserver.com/pages/types-of-wine: Cabernet Sauvignon, Chardonnay, Pinot Grigio (stored here as Pinot Gris after the rename), Pinot Noir, Sauvignon Blanc, and Merlot. 

In [10]:
# Keep only rows whose variety is one of the top 6 (Pinot Grigio is stored as Pinot Gris).
top_varieties = ["Cabernet Sauvignon", "Chardonnay", "Pinot Gris",
                 "Pinot Noir", "Sauvignon Blanc", "Merlot"]
new_winemag_df = new_winemag_data_df.loc[new_winemag_data_df['variety'].isin(top_varieties)]
# Count the remaining rows per variety.
count = new_winemag_df['variety'].value_counts()
count

Chardonnay            13775
Pinot Noir            13625
Cabernet Sauvignon    12671
Sauvignon Blanc        6054
Merlot                 4987
Pinot Gris             2545
Name: variety, dtype: int64

In [16]:
# Count how many of the top 6 variety rows come from the US.
us_winemag_df = new_winemag_df.loc[new_winemag_df['country'] == "US"]
us_winemag_df.count()

country        35029
description    35029
points         35029
price          35029
province       35029
variety        35029
winery         35029
dtype: int64

In [12]:
new_winemag_df.head()

Unnamed: 0,country,description,points,price,province,variety,winery
1,US,The nose on this single-vineyard wine from a s...,91,2013.0,California,Chardonnay,Blair
3,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
4,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
5,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
26,France,From arguably the finest white wine vineyard i...,98,757.0,Burgundy,Chardonnay,Bouchard Père & Fils


In [13]:
new_winemag_df = new_winemag_df.reset_index(drop=True)
new_winemag_df.head()

Unnamed: 0,country,description,points,price,province,variety,winery
0,US,The nose on this single-vineyard wine from a s...,91,2013.0,California,Chardonnay,Blair
1,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
2,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
3,France,A wine that has created its own universe. It h...,100,1400.0,Champagne,Chardonnay,Krug
4,France,From arguably the finest white wine vineyard i...,98,757.0,Burgundy,Chardonnay,Bouchard Père & Fils
