# Python Tutorial

This notebook was derived from content in the Galvanize Intro to Python Fundamentals course.

## Plotting with Pandas

We begin by importing the data as before, then visualize some of the statistics that we found in the previous notebook.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

%matplotlib inline

In [None]:
wine_df = pd.read_csv('data/winequality-red.csv')
wine_df.head()

In [None]:
# Plot the average amount of chlorides for each quality value  

wine_df.groupby('quality')['chlorides'].mean().plot(kind='bar')
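It can help to look at the values behind the bar chart before plotting them. The following is a minimal sketch that uses a made-up toy DataFrame (`toy_df` and its values are hypothetical, not taken from the wine data) to show the Series that the grouped mean produces: one average per quality value.

```python
import pandas as pd

# Hypothetical miniature stand-in for wine_df (values are made up for illustration)
toy_df = pd.DataFrame({
    'quality': [5, 5, 6, 6, 7],
    'chlorides': [0.08, 0.10, 0.07, 0.09, 0.06],
})

# Select the column, then aggregate: the result has one mean per quality value
avg_chlorides = toy_df.groupby('quality')['chlorides'].mean()
print(avg_chlorides)
```

The index of `avg_chlorides` (the quality values) becomes the x-axis of the bar chart, and the means become the bar heights.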

The `kind` parameter of the `plot` method can also be set to other plot types, such as `scatter`, `hist`, or `box`.
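Each `kind` value also has an accessor method of the same name, so `plot(kind='scatter', ...)` and `plot.scatter(...)` are two spellings of the same plot. A small sketch, assuming a made-up `toy_df` (hypothetical values) and a non-interactive backend so that it runs as a plain script outside the notebook:

```python
import matplotlib
matplotlib.use('Agg')  # assumption: run as a script; in the notebook, %matplotlib inline is used instead
import pandas as pd

# Hypothetical values, for illustration only
toy_df = pd.DataFrame({
    'pH': [3.2, 3.3, 3.5, 3.1],
    'alcohol': [9.4, 9.8, 10.5, 9.0],
})

# Two equivalent ways to request a scatter plot
ax_kind = toy_df.plot(kind='scatter', x='pH', y='alcohol')
ax_accessor = toy_df.plot.scatter(x='pH', y='alcohol')
```

Both calls return a matplotlib `Axes` object, which can be customized further (titles, axis limits, and so on).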

In [None]:
# Let's discover if there is a relationship between pH and alcohol levels

wine_df.plot(kind='scatter', y='alcohol', x='pH')

In [None]:
# Is there a relationship between pH and fixed acidity?

wine_df.plot(kind='scatter', y='fixed acidity', x='pH')

In [None]:
# Are the quality values evenly distributed?

wine_df['quality'].hist()
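Because quality is a small set of integer scores, counting the values directly is a useful companion to the histogram. A minimal sketch, using a made-up Series (`toy_quality` is hypothetical) rather than the real column:

```python
import pandas as pd

# Hypothetical quality scores, for illustration only
toy_quality = pd.Series([5, 5, 6, 6, 6, 7, 5, 6], name='quality')

# Count how often each score occurs, ordered by score rather than by frequency
counts = toy_quality.value_counts().sort_index()
print(counts)
```

Calling `counts.plot(kind='bar')` would then draw exactly one bar per quality value.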

In [None]:
# What is the distribution of citric acid? 

wine_df['citric acid'].plot(kind='box')

## Plotting with Seaborn

Seaborn is another data visualization library, built on top of matplotlib. Install it with `pip install seaborn`. For full documentation on the available plot types, see https://seaborn.pydata.org/api.html

In [None]:
import seaborn as sns

In [None]:
_ = sns.barplot(x='quality', y='chlorides', data=wine_df)

This bar chart is similar to the one generated with pandas, but each bar now includes an error bar showing a 95% confidence interval for the mean (seaborn's default). The interval for chloride levels is wider for quality 3 wines than for quality 6 wines, so the average is estimated less precisely for quality 3. See the documentation for default parameters that can be changed.
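By default, the error bars drawn by `sns.barplot` are bootstrap confidence intervals for the mean (95% in seaborn's defaults). The sketch below is not seaborn's implementation; it is a simplified percentile bootstrap, written with NumPy on made-up samples, to show why a group with few observations gets a wider interval. The helper `bootstrap_ci` and all sample values are hypothetical.

```python
import numpy as np

def bootstrap_ci(values, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for the mean (illustrative helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Resample with replacement and record the mean of each resample
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    lower = (100 - ci) / 2
    return np.percentile(boot_means, [lower, 100 - lower])

# Made-up samples with the same spread but very different group sizes
rng = np.random.default_rng(42)
small_group = rng.normal(loc=0.09, scale=0.02, size=10)
large_group = rng.normal(loc=0.09, scale=0.02, size=600)

small_low, small_high = bootstrap_ci(small_group)
large_low, large_high = bootstrap_ci(large_group)
print(f"n=10:  [{small_low:.3f}, {small_high:.3f}]")
print(f"n=600: [{large_low:.3f}, {large_high:.3f}]")
```

The interval for the small group comes out noticeably wider, which is the same effect that makes error bars longer for sparsely populated categories.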

We can also create **histograms and scatterplots with seaborn** like we did with pandas, but with an added grouping layer called `hue`. 

In [None]:
_ = sns.scatterplot(x='pH', y='fixed acidity', hue='quality', data=wine_df)

In [None]:
# We can also specify the color scheme for plots 

_ = sns.scatterplot(x='pH', y='fixed acidity', hue='quality', data=wine_df, palette='deep')

## Plotting categorical data

What if we want to see the distribution of alcohol percentage depending on the amount of sugar in the wine? Let's first create a new column that labels each wine as high or low sugar, using the average residual sugar as the cutoff.

In [None]:
avg_sugar = wine_df['residual sugar'].mean()
wine_df['high_sugar'] = wine_df['residual sugar'].apply(lambda x: x >= avg_sugar)
wine_df[['residual sugar', 'high_sugar']]
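The row-wise `apply` in the cell above can also be written as a single vectorized comparison, which is the more common pandas idiom and is usually faster on large tables. A minimal sketch on a made-up `toy_df` (hypothetical values) showing that both forms give the same labels:

```python
import pandas as pd

# Hypothetical residual sugar values, for illustration only
toy_df = pd.DataFrame({'residual sugar': [1.9, 2.6, 2.3, 6.1, 1.8]})
avg_sugar = toy_df['residual sugar'].mean()

# Row-by-row version, as in the cell above
with_apply = toy_df['residual sugar'].apply(lambda x: x >= avg_sugar)

# Vectorized version: compare the whole column at once
vectorized = toy_df['residual sugar'] >= avg_sugar

print(vectorized.tolist())
print(with_apply.tolist() == vectorized.tolist())
```

Either result can be assigned to a new column such as `high_sugar`.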

In [None]:
_ = sns.histplot(data=wine_df, x='alcohol', hue='high_sugar', stat='density', kde=True)


These are just a few examples of common plots that you can make. Your next assignment is to create your own visuals through the **Exploratory Data Analysis (EDA) process**. See the folder `eda-examples` for an example and your next task.  