# Acquire
- Data is collected from the Data.World Wine Quality Dataset

In [1]:
# imports:
import pandas as pd
import numpy as np
import nikki_env as env

Data Dictionary:

These variables are based on physicochemical tests. Physicochemical tests evaluate the materials of the container component or system to ensure purity and the absence of harmful contaminants or residuals from the manufacturing process.

|**Variable**|**Description**|
|----------|----------------|
| Fixed Acidity | The set of low-volatility organic acids, such as malic, lactic, tartaric, or citric acid, inherent to the characteristics of the sample |
| Volatile Acidity | The set of short-chain organic acids that can be extracted from the sample by distillation: formic, acetic, propionic, and butyric acid. A large amount of these acids can lead to an unpleasant, vinegar taste |
| Citric Acid | Often added to wines to increase acidity, which can give the wine a "fresh" flavor |
| Residual Sugar | Measured in grams per liter, it is the natural grape sugar left over in a wine after the alcoholic fermentation finishes. For example, dry wine has 0-4 g/L, while sweet wine has 35 g/L |
| Chlorides | Chloride ions give the perception of a salty taste in the wine |
| Free Sulfur Dioxide | Protects wine by scavenging oxygen and interrupting microbiological activity |
| Total Sulfur Dioxide | The portion of SO2 that is free in the wine plus the portion that is bound to other chemicals in the wine, such as aldehydes, pigments, or sugar. The levels are regulated by the TTB |
| Density | A measure of how tightly a material is packed together. The density of wine is slightly less than that of water |
| pH | A measure of acidity or basicity. Wine typically has a pH of 3.0-3.5; water has a pH of 7 |
| Sulphates | Protect the wine against oxidation, which can affect the color and the taste of the wine |
| Alcohol | Standard measure of how much alcohol (ethanol) there is within a given volume of the drink, in percentage terms |
| Quality | Reflects subtlety and complexity, aging potential, stylistic purity, varietal expression, ranking by experts, or consumer acceptance. Normally scored on a scale from 1 (worst) to 10 (best) |
| Wine Color | The color of the wine: red or white |
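
As a small illustration of the two sulfur dioxide entries, the bound portion can be recovered as total minus free. The sketch below uses three rows copied from the red wine data shown further down in this notebook; the column name `bound_sulfur_dioxide` is an assumption introduced here for illustration and is not part of the dataset.

```python
import pandas as pd

# three sample rows copied from the red wine data in this notebook
sample = pd.DataFrame({
    'free sulfur dioxide': [11.0, 25.0, 15.0],
    'total sulfur dioxide': [34.0, 67.0, 54.0],
})

# bound SO2 = total SO2 - free SO2
# (hypothetical helper column, not part of the original dataset)
sample['bound_sulfur_dioxide'] = (
    sample['total sulfur dioxide'] - sample['free sulfur dioxide']
)
print(sample)
```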

In [2]:
# create a table for red wine
red = pd.read_csv('winequality-red.csv')

In [3]:
# create a table for white wine
white = pd.read_csv('winequality-white.csv')

In [4]:
# create columns in both tables that describe what the wine type is 
red['wine_color'] = 'red'
white['wine_color'] = 'white'

In [5]:
# combine the two data frames
wine_df = pd.concat([red, white], ignore_index = True)
wine_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,wine_color
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
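
One possible sanity check for the combine step is to confirm that no rows were lost, that the index was rebuilt, and that every row carries a color label. This is only a sketch: it uses tiny stand-in frames rather than the real csv files, and the white wine values are placeholders, not real data.

```python
import pandas as pd

# tiny stand-in frames; the notebook itself reads these from the two csv files
red = pd.DataFrame({'alcohol': [9.4, 9.8], 'quality': [5, 5]})
white = pd.DataFrame({'alcohol': [10.0], 'quality': [6]})  # placeholder values, not real data

# label each frame, then stack them with a fresh 0..n-1 index
red['wine_color'] = 'red'
white['wine_color'] = 'white'
combined = pd.concat([red, white], ignore_index=True)

# no rows lost, index runs 0..n-1, and every row is labeled
assert len(combined) == len(red) + len(white)
assert combined.index.tolist() == list(range(len(combined)))
assert combined['wine_color'].notna().all()

# rows per color
print(combined['wine_color'].value_counts())
```

The same three checks could be run on the real `red`, `white`, and `wine_df` frames right after the `pd.concat` call.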


### Create Functions:

In [6]:
# imports
import os

In [7]:
def wine_df():
    '''
    This function will:
    - read wine.csv and return it if the file already exists
    - otherwise read in two csv files: winequality-red.csv and winequality-white.csv
    - create a new column, wine_color, in each data frame
    - combine the two data frames into one data frame
    - write the combined data frame to a new csv: wine.csv
    '''

    # use the cached csv if it already exists
    if os.path.isfile('wine.csv'):

        # read the cached file, using its first column as the index
        df = pd.read_csv('wine.csv', index_col=0)

    else:
        # create a table for red wine
        red = pd.read_csv('winequality-red.csv')

        # create a table for white wine
        white = pd.read_csv('winequality-white.csv')

        # create a column in each table that records the wine color
        red['wine_color'] = 'red'
        white['wine_color'] = 'white'

        # combine the two data frames
        df = pd.concat([red, white], ignore_index=True)

        # cache the combined data frame for later runs
        df.to_csv('wine.csv')

    return df
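
The function above follows a read-the-cache-or-build pattern. A minimal sketch of that pattern in isolation is shown below, assuming a hypothetical helper `load_or_build` and a stand-in `build_demo` function (neither is part of this notebook), and using a temporary directory so that no real files are touched.

```python
import os
import tempfile
import pandas as pd

def load_or_build(cache_path, build):
    '''Read cache_path if it exists; otherwise call build(), save the result, and return it.'''
    if os.path.isfile(cache_path):
        return pd.read_csv(cache_path, index_col=0)
    df = build()
    df.to_csv(cache_path)
    return df

calls = []

def build_demo():
    # stand-in for the read/label/concat steps; records each time it runs
    calls.append(1)
    return pd.DataFrame({'quality': [5, 6], 'wine_color': ['red', 'white']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    first = load_or_build(path, build_demo)   # builds the frame and writes the cache
    second = load_or_build(path, build_demo)  # reads the cache; build_demo is not called again

# number of times the build step actually ran
print(len(calls))
```

Passing the path and the build step as arguments is a design choice for the sketch; it makes the pattern easy to test without changing what gets cached.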

In [8]:
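# note: this rebinds the name wine_df from the function to the DataFrame it returns,
# so the function cell must be re-run before wine_df() can be called again
wine_df = wine_df()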
wine_df.head()

Unnamed: 0,fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality,wine_color
0,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
1,7.8,0.88,0.0,2.6,0.098,25.0,67.0,0.9968,3.2,0.68,9.8,5,red
2,7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.997,3.26,0.65,9.8,5,red
3,11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.998,3.16,0.58,9.8,6,red
4,7.4,0.7,0.0,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5,red
