<a id="TableOfContents"></a>
# TABLE OF CONTENTS:
<li><a href='#imports'>Imports</a></li>
<li><a href="#acquiremvp">Acquire-MVP</a></li>
<li><a href='#preparemvp'>Prepare-MVP</a></li>
<li><a href="#acquire1">Acquire-V1</a></li>
<li><a href='#prepare1'>Prepare-V1</a></li>
<li><a href='#extra'>Extra</a></li>

<a id="imports"></a>
# Imports:
<li><a href='#TableOfContents'>Table of Contents</a></li>

In [1]:
# Vectorization and tables
import numpy as np
import pandas as pd

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Stats
from scipy import stats

# .py files
import wrangle as w

<a id="acquiremvp"></a>
# Acquire-MVP:
<li><a href='#TableOfContents'>Table of Contents</a></li>

Acquire the raw ("vanilla") red and white wine datasets by reading each CSV from data.world into its own dataframe.

- Red Vanilla Shape:
    - Rows: 1599
    - Columns: 12 (13 once `wine_color` is added)
- White Vanilla Shape:
    - Rows: 4898
    - Columns: 12 (13 once `wine_color` is added)

In [2]:
# Get both red and white wine dataframes
red = pd.read_csv('https://query.data.world/s/k6viyg23e4usmgc2joiodhf2pvcvao?dws=00000')
white = pd.read_csv('https://query.data.world/s/d5jg7efmkn3kq7cmrvvfkx2ww7epq7?dws=00000')

<a id="preparemvp"></a>
# Prepare-MVP:
<li><a href='#TableOfContents'>Table of Contents</a></li>

In [3]:
# Label color of wine in respective dataframes
white['wine_color'] = 'white'
red['wine_color'] = 'red'

In [4]:
# Confirm column names and their order are identical before concatenating
whitelist = white.drop(columns='wine_color').columns.to_list()
redlist = red.drop(columns='wine_color').columns.to_list()
whitelist == redlist

True

In [5]:
# Confirm dtypes are identical before concatenating
whitedtypes = white.drop(columns='wine_color').dtypes
reddtypes = red.drop(columns='wine_color').dtypes
whitedtypes == reddtypes

fixed acidity           True
volatile acidity        True
citric acid             True
residual sugar          True
chlorides               True
free sulfur dioxide     True
total sulfur dioxide    True
density                 True
pH                      True
sulphates               True
alcohol                 True
quality                 True
dtype: bool
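
The two checks above can be folded into one reusable comparison. The following is a minimal sketch, not part of `wrangle.py`: `same_schema` is a hypothetical helper name, and the toy frames merely stand in for `red` and `white`. It guards the column comparison first, because comparing two dtype Series with `==` raises an error when their labels differ.

```python
import pandas as pd


def same_schema(left: pd.DataFrame, right: pd.DataFrame, ignore=()) -> bool:
    """Hypothetical helper: True when both frames share column names, order, and dtypes."""
    left_dtypes = left.drop(columns=list(ignore), errors="ignore").dtypes
    right_dtypes = right.drop(columns=list(ignore), errors="ignore").dtypes
    # Index.equals checks both the labels and their order
    same_columns = left_dtypes.index.equals(right_dtypes.index)
    # Only compare dtypes element-wise once the labels are known to line up
    return same_columns and bool((left_dtypes == right_dtypes).all())


# Toy frames standing in for the red and white tables
red_demo = pd.DataFrame({"pH": [3.42], "quality": [5], "wine_color": ["red"]})
white_demo = pd.DataFrame({"pH": [3.18], "quality": [5], "wine_color": ["white"]})
mismatch_demo = white_demo.astype({"quality": "float64"})

print(same_schema(red_demo, white_demo, ignore=["wine_color"]))     # matching schemas
print(same_schema(red_demo, mismatch_demo, ignore=["wine_color"]))  # quality dtype differs
```

A single boolean is easier to assert on in a pipeline than a per-column Series of `True` values.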

In [6]:
# Sample one red row to spot-check its columns and values
red.sample()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | wine_color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 222 | 6.8 | 0.61 | 0.04 | 1.5 | 0.057 | 5.0 | 10.0 | 0.99525 | 3.42 | 0.6 | 9.5 | 5 | red |


In [7]:
# Sample one white row to spot-check its columns and values
white.sample()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | wine_color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 949 | 7.3 | 0.25 | 0.39 | 6.4 | 0.034 | 8.0 | 84.0 | 0.9942 | 3.18 | 0.46 | 11.5 | 5 | white |


<div style='background-color : green'>
<h1 style='text-align : center'><b><u><i>
    Red and white share identical column names and dtypes
</i></u></b></h1>
</div>

In [8]:
# Column names and dtypes match, so stack the two dataframes row-wise
wines = pd.concat([red, white], axis=0)
wines.sample()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality | wine_color |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1350 | 8.0 | 0.57 | 0.39 | 3.9 | 0.034 | 22.0 | 122.0 | 0.9917 | 3.29 | 0.67 | 12.8 | 7 | white |
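
One thing to be aware of with row-wise `pd.concat`: each input keeps its own row labels, so the combined index can contain duplicates (the white sample above still carries label 1350 from its original frame). The sketch below uses toy frames, not the real wine data, to show the effect and the `ignore_index=True` option. Whether `wrangle.py` renumbers the index is not shown in this notebook, so treat this as an illustration only.

```python
import pandas as pd

# Toy stand-ins for the red and white tables, each with a default 0-based index
red_demo = pd.DataFrame({"quality": [5, 6], "wine_color": "red"})
white_demo = pd.DataFrame({"quality": [5, 7, 6], "wine_color": "white"})

# Default behaviour: labels 0 and 1 appear once per input frame
stacked = pd.concat([red_demo, white_demo], axis=0)
print(stacked.index.is_unique)

# ignore_index=True discards the old labels and renumbers from 0
renumbered = pd.concat([red_demo, white_demo], axis=0, ignore_index=True)
print(renumbered.index.is_unique)
```

Duplicate labels are harmless for most aggregations, but label-based lookups such as `.loc[0]` then return more than one row.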


In [9]:
# Confirm there are no nulls in any column
wines.isna().sum()

fixed acidity           0
volatile acidity        0
citric acid             0
residual sugar          0
chlorides               0
free sulfur dioxide     0
total sulfur dioxide    0
density                 0
pH                      0
sulphates               0
alcohol                 0
quality                 0
wine_color              0
dtype: int64

### List o' column determinations:
- Drop Columns:
    - None
- Fix columns:
    - None
- Create columns:
    - None

In [12]:
# Test wrangle.py: the MVP pipeline should return train, validate, and test splits
train, validate, test = w.wrangle_wines_mvp()
train.shape, validate.shape, test.shape

train.shape:(3637, 12)
validate.shape:(1560, 12)
test.shape:(1300, 12)


((3637, 12), (1560, 12), (1300, 12))
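
The internals of `w.wrangle_wines_mvp()` are not shown here, but the row counts above are consistent with holding out 20% of the 6497 rows for test and then 30% of the remainder for validate, rounding each count up. The sketch below reproduces those sizes under that assumption. `split_three_ways`, its fractions, and its seed are hypothetical choices for illustration, not the author's implementation.

```python
import math

import numpy as np
import pandas as pd


def split_three_ways(df, test_frac=0.2, validate_frac=0.3, seed=123):
    """Hypothetical helper: shuffle rows, carve off test, then validate from the remainder."""
    rng = np.random.default_rng(seed)
    shuffled = df.iloc[rng.permutation(len(df))]
    # Round each held-out count up (an assumption chosen to match the sizes above)
    n_test = math.ceil(test_frac * len(df))
    n_validate = math.ceil(validate_frac * (len(df) - n_test))
    test_part = shuffled.iloc[:n_test]
    validate_part = shuffled.iloc[n_test:n_test + n_validate]
    train_part = shuffled.iloc[n_test + n_validate:]
    return train_part, validate_part, test_part


# Toy frame with the same number of rows as the combined wine data
demo = pd.DataFrame({"quality": np.arange(6497) % 7})
train_demo, validate_demo, test_demo = split_three_ways(demo)
print(len(train_demo), len(validate_demo), len(test_demo))  # 3637 1560 1300
```

This sketch does not stratify on any column; if class balance across splits matters, that would need to be added separately.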

In [21]:
# Load the combined wines table from the local CSV, using its first column as the index
wines = pd.read_csv('wines.csv', index_col=0)
wines.shape

(6497, 13)
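
The `index_col=0` argument matters for the column count. Assuming `wines.csv` was written with the default `to_csv` behaviour (which stores the row labels as a first, unnamed column), reading it back without `index_col=0` would add an extra `Unnamed: 0` column. The sketch below demonstrates this with a small in-memory CSV rather than the real file.

```python
import io

import pandas as pd

# Toy frame standing in for the combined wines table
demo = pd.DataFrame({"pH": [3.42, 3.18], "wine_color": ["red", "white"]})

buffer = io.StringIO()
demo.to_csv(buffer)  # default index=True writes row labels as a first, unnamed column
csv_text = buffer.getvalue()

# Reading the first column back as the index restores the original shape
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without index_col, the stored labels become an extra data column
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.shape, without_index.shape)
print(without_index.columns[0])
```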

- Prepped shape:
    - Rows: 6497
    - Columns: 13

<a id="acquire1"></a>
# Acquire-V1:
<li><a href='#TableOfContents'>Table of Contents</a></li>

<a id="prepare1"></a>
# Prepare-V1:
<li><a href='#TableOfContents'>Table of Contents</a></li>

<a id="extra"></a>
# Extra:
<li><a href='#TableOfContents'>Table of Contents</a></li>