# Making better graphs in Python 3 with Matplotlib and Seaborn.
**CS 461 Sauppe, Spring 2020**<br>
***Tutorial created by Douglas Krouth***

<p><img src="Skiena_graph_guide.jpg" align="left">


# Why is graph design important?
<p>For data scientists, creating effective graphics and visualizations is crucial to presenting one's analysis and potential insights. Graphical tools let us simplify our work by removing unnecessary complexity; this leads to easier interpretation and a greater focus on key points.
    
<p>When creating a data product that will be used regularly (dashboards, interactive graphs, etc.), it is crucial to incorporate strong visuals, as this enhances the user experience and the overall efficiency of the tool. Clear data graphics reduce the time spent on interpretation, which frees up time for further analysis or questions from the user or audience.
    
<p> Without well-made graphics, we run the risk of misrepresenting central parts of our analysis. At best, a poorly made graph simply bores our audience: it is seen quickly and forgotten once the presentation ends. At worst, it misleads the audience or misrepresents the analysis. Both outcomes are suboptimal and should be avoided.

## Quick review: Five principles of effective graph design
as written by Edward R. Tufte in *The Visual Display of Quantitative Information* [1]
***

<p>Above all else show the data. 
<img src="chartjunk.jpg" width="250" align="right">
<p>Maximize the data-ink ratio.
<p>Erase non-data ink.
<p>Erase redundant data-ink.
<p>Revise and edit.


# Let's get started

Listed below are the topics that we'll go over in this tutorial:
* [Importing the required libraries](#1)
* [Loading the practice datasets](#2)
* [Exploratory data analysis: summary stats. and basic Matplotlib graphs](#3)
* [Improving our graphs using Seaborn and other tools](#4)
* [Creating animated graphs](#5)

## 1. Importing the required libraries<a name="1"></a>

***
**pandas**: Using pandas allows us to store our data in a DataFrame; this makes it easy to plot the data quickly and to call the *DataFrame.describe()* method for summary statistics in later steps.<br>
* doc: https://pandas.pydata.org/docs/<br>
* installation: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
***

**matplotlib.pyplot**: This library is extremely useful for providing ready-to-go plots that can be easily modified. Matplotlib is most useful for basic plots such as scatter plots, line plots, bar charts, and histograms. We will be using it to create our exploratory visuals.<br>
* doc: https://matplotlib.org/3.2.1/contents.html
* installation: https://matplotlib.org/users/installing.html
***

**seaborn**: Seaborn is a data visualization library built on top of Matplotlib that provides a high-level interface for a wider variety of statistical visualizations. It is best suited to statistical graphics that need more structure, such as plots grouped by a categorical variable or multiple visuals overlaid on the same graph.<br>
We will also be using the *load_dataset()* function from Seaborn to obtain the test dataset used during our demonstration.
* doc & tutorials: https://seaborn.pydata.org/tutorial.html
* installation: https://seaborn.pydata.org/installing.html
* seaborn.load_dataset(): https://seaborn.pydata.org/generated/seaborn.load_dataset.html

In [2]:
# IMPORTS
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets

## 2. Loading the practice datasets<a name="2"></a>
***
Since we will be using the toy datasets provided by seaborn, this process will be rather simple. These datasets are standard choices for testing models, plotting data, and similar tasks because they are simple and widely used. Listed below are the provided datasets along with their common use cases.
* `Information about each of these sets is provided on the Seaborn datasets doc page listed above.`


In [14]:
# seaborn code to load the sample data
tips_df = sns.load_dataset("tips")

## 3. Exploratory data analysis: summary stats. and basic Matplotlib graphs<a name="3"></a>

In [43]:
# Summary stats for tips_df
print('Shape {}\n'.format(tips_df.shape), '\n{}\n'.format(tips_df.head()))
print('Summary Stats.\n')
print(tips_df.describe())
print('\nDataFrame info\n')
tips_df.info()  # info() prints its report directly and returns None, so no print() is needed

Shape (244, 7)
 
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

Summary Stats.

       total_bill         tip        size
count  244.000000  244.000000  244.000000
mean    19.785943    2.998279    2.569672
std      8.902412    1.383638    0.951100
min      3.070000    1.000000    1.000000
25%     13.347500    2.000000    2.000000
50%     17.795000    2.900000    2.000000
75%     24.127500    3.562500    3.000000
max     50.810000   10.000000    6.000000

DataFrame info

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
total_bill    244 non-null float64
tip           244 non-null float64
sex           244 non-null category
smoker        244 non-null category
d
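
The section title also promises basic Matplotlib graphs, so here is a minimal sketch of two exploratory plots: a histogram and a scatter plot. To keep it runnable without downloading anything, it assumes a small hand-typed DataFrame (the hypothetical `sample_df`) holding only the five rows printed by `tips_df.head()` above; with the full dataset loaded you would pass `tips_df` instead.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumption: only the five rows shown by tips_df.head() above, typed in by
# hand so the sketch runs offline. Swap in tips_df to plot all 244 rows.
sample_df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "size": [2, 3, 3, 2, 4],
})

# One figure with two side-by-side axes
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 4))

# Histogram: distribution of a single numeric column
ax_hist.hist(sample_df["total_bill"], bins=5)
ax_hist.set_xlabel("Total bill")
ax_hist.set_ylabel("Count")
ax_hist.set_title("Distribution of total bill")

# Scatter plot: relationship between two numeric columns
ax_scatter.scatter(sample_df["total_bill"], sample_df["tip"])
ax_scatter.set_xlabel("Total bill")
ax_scatter.set_ylabel("Tip")
ax_scatter.set_title("Tip vs. total bill")

fig.tight_layout()
```

In a notebook the figure is displayed when the cell finishes; in a plain script, call `plt.show()` afterwards.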

## 4. Improving our graphs using Seaborn and other tools<a name="4"></a>
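
As one possible starting point for this section, the sketch below redraws a tip vs. total bill scatter plot with Seaborn and applies two of Tufte's principles: it colors points by a categorical column to show more of the data, and it erases the top and right axis borders as non-data ink. The hand-typed `sample_df` (the five rows printed by `tips_df.head()`) and the specific styling choices are assumptions made for illustration, not the tutorial's prescribed method.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Assumption: the five rows shown by tips_df.head(), typed in by hand so
# the sketch runs offline. Swap in tips_df to plot the full dataset.
sample_df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
    "size": [2, 3, 3, 2, 4],
})

# "ticks" gives a plain white background with no grid lines
sns.set_style("ticks")

fig, ax = plt.subplots(figsize=(6, 4))

# hue= colors each point by a categorical column and adds a legend
sns.scatterplot(data=sample_df, x="total_bill", y="tip", hue="sex", ax=ax)

# Erase non-data ink: remove the top and right axis borders
sns.despine(ax=ax)

ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")
ax.set_title("Tip vs. total bill, colored by sex")
```

Because Seaborn draws onto a regular Matplotlib `Axes`, the labels and title are still set with the usual Matplotlib methods.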

## 5. Creating animated graphs<a name="5"></a>
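
One common way to animate a Matplotlib figure is `matplotlib.animation.FuncAnimation`, which calls an update function once per frame. The sketch below is a minimal example under stated assumptions: it reuses the five `total_bill` and `tip` values printed by `tips_df.head()`, and the axis limits, frame interval, and the `update` helper are hypothetical choices for illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

# Assumption: the five observations shown by tips_df.head(), typed in by hand
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59]
tip = [1.01, 1.66, 3.50, 3.31, 3.61]

fig, ax = plt.subplots(figsize=(6, 4))
ax.set_xlim(0, 30)  # fixed limits so the axes do not jump between frames
ax.set_ylim(0, 5)
ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")

# Start with an empty artist; each frame fills in more points
points, = ax.plot([], [], "o")


def update(frame):
    """Draw the first frame + 1 observations."""
    points.set_data(total_bill[: frame + 1], tip[: frame + 1])
    ax.set_title(f"Observations shown: {frame + 1}")
    return (points,)


# One frame per observation, 500 ms apart, played once
anim = FuncAnimation(fig, update, frames=len(total_bill), interval=500, repeat=False)
```

To view the result in a notebook, one option is `HTML(anim.to_jshtml())` with `HTML` imported from `IPython.display`; to write a file, `anim.save("tips.gif", writer="pillow")` should work when Pillow is installed (the filename here is only an example).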

# Citations
### References
<ol>
<li> Edward R. Tufte (2001). Pt. 2, Ch. 4 Data-Ink and Graphical Redesign; Pt. 2, Ch. 6 Data-Ink Maximization and Graphical Design. The Visual Display of Quantitative Information. Second edition, seventh printing. Published by Graphics Press LLC. Cheshire, Connecticut.</li>
<li>Steven S. Skiena (2017). Chapter 6: Visualizing Data; 6.2 Developing a Visualization Aesthetic, 6.3 Chart Types, 6.4 Great Visualizations. The Data Science Design Manual. Published by Springer in Cham, Switzerland.</li>
<li>Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, NJ: Wiley.</li>
</ol>

### Misc. Images
<ol>
<li> Chart Suggestions - A Thought Starter<br>
Image scanned from Skiena's The Data Science Design Manual, pg. 171</li>
<li>'chartjunk before and after'<br>* URL: https://image.slidesharecdn.com/avoidingchartjunk-cesjune2013-130614174325-phpapp02/95/avoiding-chartjunk-18-638.jpg?cb=1371231839 </li>
<li> Loading the Practice Datasets<br>
Image taken as a screenshot from scikit-learn<br>
* URL: https://scikit-learn.org/stable/datasets/index.html</li>
</ol>