# Making better graphs in Python 3 with Matplotlib and Seaborn
**CS 461 Sauppe, Spring 2020**<br>
***Tutorial created by Douglas Krouth***

<p><img src="Skiena_graph_guide.jpg" align="left">


# Why is graph design important?
<p>For data scientists, creating effective graphics and visualizations is crucial to presenting one's analysis and potential insights. Graphical tools let us simplify our work by removing unnecessary complexity; this leads to easier interpretation and a greater focus on key points.
    
<p>When creating a data product that will be used regularly (dashboards, interactive graphs, etc.), it is crucial to incorporate strong visuals, as they improve the user experience and the overall efficiency of the tool. Clear data graphics reduce the time spent on interpretation, which frees the user or audience for further analysis and questions.
    
<p>Without well-made graphics, we risk misrepresenting central parts of our analysis. At best, a poorly made graph simply bores the audience: it is seen briefly and forgotten once the presentation ends. At worst, it misleads the audience or misrepresents the analysis. Both outcomes should be avoided.

## Quick review: Five principles of effective graph design
As written by Edward R. Tufte in *The Visual Display of Quantitative Information* [1]
***

<p>Above all else show the data. 
<img src="chartjunk.jpg" width="250" align="right">
<p>Maximize the data-ink ratio.
<p>Erase non-data ink.
<p>Erase redundant data-ink.
<p>Revise and edit.
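
As a rough illustration of the principles above, the sketch below draws a small bar chart and then removes ink that carries no data. The category names and values are made up for this example, and the specific styling choices (dropping borders, labeling bars directly) are only one possible reading of Tufte's advice.

```python
import matplotlib.pyplot as plt

# Made-up values, used only to illustrate the styling.
categories = ["A", "B", "C", "D"]
values = [12, 30, 21, 17]

fig, ax = plt.subplots(figsize=(5, 3))
bars = ax.bar(categories, values, color="steelblue")

# Erase non-data ink: drop the top and right borders and the tick marks.
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.tick_params(length=0)

# Erase redundant data-ink: label each bar directly, then hide the y-axis
# that would otherwise repeat the same information.
for bar, value in zip(bars, values):
    ax.text(bar.get_x() + bar.get_width() / 2, value + 0.5, str(value),
            ha="center", va="bottom")
ax.yaxis.set_visible(False)
ax.spines["left"].set_visible(False)

ax.set_title("Made-up counts per category")
# In a notebook the figure renders at the end of the cell;
# in a script, call plt.show() to display it.
```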


# Let's get started

Listed below are the topics that we'll go over in this tutorial:
* [Importing the required libraries](#1)
* [Loading the practice datasets](#2)
* [Exploratory data analysis: summary stats. and basic Matplotlib graphs](#3)
* [Improving our graphs using Seaborn and other tools](#4)
* [Creating animated graphs](#5)

## 1. Importing the required libraries<a name="1"></a>

***
**pandas**: pandas lets us store our data in a DataFrame; this makes it easy to plot the data quickly and to call *DataFrame.describe()* for summary statistics in later steps.<br>
* doc: https://pandas.pydata.org/docs/<br>
* installation: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html
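
A minimal sketch of the DataFrame workflow described above, assuming a tiny hand-written table (the column names and numbers are made up for illustration):

```python
import pandas as pd

# Made-up measurements, used only to show the DataFrame workflow.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
})

# describe() reports count, mean, std, min, quartiles, and max
# for each numeric column.
summary = df.describe()
print(summary)
```

The result is itself a DataFrame, so individual statistics can be looked up by label, for example `summary.loc["mean", "height_cm"]`.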
***

**matplotlib.pyplot**: This library is extremely useful for providing ready-to-go plots that can be easily modified. Matplotlib is most useful for basic plots such as scatter plots, line plots, bar charts, and histograms. We will use it to create our exploratory visuals.<br>
* doc: https://matplotlib.org/3.2.1/contents.html
* installation: https://matplotlib.org/users/installing.html
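
To give a feel for the basic plot types mentioned above, here is a short sketch that draws a line plot and a histogram side by side. The data is made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up data.
x = [0, 1, 2, 3, 4, 5]
y = [0, 1, 4, 9, 16, 25]

# One figure holding two side-by-side axes.
fig, (ax_line, ax_hist) = plt.subplots(1, 2, figsize=(8, 3))

ax_line.plot(x, y, marker="o")
ax_line.set_xlabel("x")
ax_line.set_ylabel("x squared")
ax_line.set_title("Line plot")

# hist() returns the bin counts, the bin edges, and the drawn bars.
counts, bin_edges, _ = ax_hist.hist(y, bins=5)
ax_hist.set_xlabel("value")
ax_hist.set_ylabel("count")
ax_hist.set_title("Histogram")

fig.tight_layout()
# In a script, call plt.show() to display the figure.
```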
***

**Seaborn**: Seaborn is a data visualization library built on top of Matplotlib that provides a higher-level interface and a wider variety of statistical plot types. It is best suited to statistical visualizations that need more structure, such as distribution plots, regression fits, or multiple visuals overlaid on the same graph.<br>
* doc & tutorials: https://seaborn.pydata.org/tutorial.html
* installation: https://seaborn.pydata.org/installing.html
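
As one small example of overlaying visuals on the same graph, the sketch below uses `sns.regplot`, which draws a scatter plot and a fitted regression line together. The study-hours data is made up for illustration.

```python
import pandas as pd
import seaborn as sns

# Made-up, roughly linear data.
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score": [52, 55, 61, 64, 70, 71, 78, 83],
})

# regplot overlays a fitted regression line on a scatter plot of the data.
# ci=None skips the bootstrapped confidence band to keep the example fast.
ax = sns.regplot(data=df, x="hours_studied", y="exam_score", ci=None)
ax.set_title("Made-up study data with a fitted line")
```

Seaborn functions like this one return a regular Matplotlib `Axes`, so the usual Matplotlib methods (`set_title`, `set_xlabel`, and so on) still apply.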
***

**sklearn.datasets**: This package from scikit-learn will provide us with some toy datasets to practice our data visualizations on. I would highly recommend it as a resource for testing all kinds of data science tools, as it bundles several small classic datasets that are commonly used to observe how algorithms behave under varying conditions.<br>
* doc: https://scikit-learn.org/stable/datasets/index.html
* installation: https://scikit-learn.org/stable/install.html

```python
# IMPORTS
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
```

## 2. Loading the practice datasets<a name="2"></a>
***
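
A minimal sketch of loading one of the scikit-learn toy datasets into a DataFrame. The choice of the iris dataset and the names `iris` and `iris_df` are assumptions made for this example.

```python
import pandas as pd
from sklearn import datasets

# load_iris() returns a Bunch holding the feature matrix, labels, and names.
iris = datasets.load_iris()

# One row per flower, one column per measured feature.
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Map the integer labels (0, 1, 2) to their species names.
iris_df["species"] = iris.target_names[iris.target]

print(iris_df.head())
```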



## 3. Exploratory data analysis: summary stats. and basic Matplotlib graphs<a name="3"></a>
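
One possible starting point for this step, sketched under the assumption that we explore the iris toy dataset: print summary statistics with `describe()`, then take a first, unstyled look at one distribution and one pairwise relationship using plain Matplotlib.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# Summary statistics for every numeric column.
print(iris_df.describe())

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(9, 3.5))

# Distribution of a single feature.
ax_hist.hist(iris_df["sepal length (cm)"], bins=15)
ax_hist.set_xlabel("sepal length (cm)")
ax_hist.set_ylabel("count")

# Relationship between two features.
ax_scatter.scatter(iris_df["petal length (cm)"], iris_df["petal width (cm)"])
ax_scatter.set_xlabel("petal length (cm)")
ax_scatter.set_ylabel("petal width (cm)")

fig.tight_layout()
# In a script, call plt.show() to display the figure.
```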

## 4. Improving our graphs using Seaborn and other tools<a name="4"></a>
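
A hedged sketch of the kind of improvement Seaborn makes easy, again assuming the iris toy dataset: color the points by species so the groups are visible, and remove the top and right borders in the spirit of erasing non-data ink.

```python
import pandas as pd
import seaborn as sns
from sklearn import datasets

iris = datasets.load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_df["species"] = iris.target_names[iris.target]

# Plain white background with no grid lines.
sns.set_style("white")

# hue= colors each point by its species and adds a legend automatically.
ax = sns.scatterplot(
    data=iris_df,
    x="petal length (cm)",
    y="petal width (cm)",
    hue="species",
)

# Remove the top and right borders, which carry no data.
sns.despine(ax=ax)
ax.set_title("Iris petal measurements by species")
```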

## 5. Creating animated graphs<a name="5"></a>
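
One common way to animate a Matplotlib figure is `matplotlib.animation.FuncAnimation`, which repeatedly calls an update function and redraws the result. The sketch below is a minimal example that assumes a made-up sine wave rather than any dataset from this tutorial; the frame count and interval are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots(figsize=(5, 3))
(line,) = ax.plot(x, np.sin(x))
ax.set_ylim(-1.1, 1.1)


def update(frame):
    """Shift the sine wave's phase a little further on each frame."""
    line.set_ydata(np.sin(x + 2 * np.pi * frame / 60))
    return (line,)


# 60 frames, 50 ms apart; blit=True redraws only the artists update() returns.
anim = FuncAnimation(fig, update, frames=60, interval=50, blit=True)

# In a Jupyter notebook, one way to display the result is:
#   from IPython.display import HTML
#   HTML(anim.to_jshtml())
```

Keep a reference to the returned object (here `anim`); if it is garbage-collected, the animation stops.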

# Citations
### References
<ol>
<li>Edward R. Tufte (2001). Pt. 2, Ch. 4 Data-Ink and Graphical Redesign; Pt. 2, Ch. 6 Data-Ink Maximization and Graphical Design. The Visual Display of Quantitative Information. Second edition, seventh printing. Published by Graphics Press LLC, Cheshire, Connecticut.</li>
<li>Steven S. Skiena (2017). Chapter 6: Visualizing Data; 6.2 Developing a Visualization Aesthetic, 6.3 Chart Types, 6.4 Great Visualizations. The Data Science Design Manual. Published by Springer in Cham, Switzerland.</li>
<li>Knaflic, C. N. (2015). Storytelling with data: a data visualization guide for business professionals. Hoboken, NJ: Wiley.</li>
</ol>

### Misc. Images
<ol>
<li>Chart Suggestions - A Thought Starter<br>
Image scanned from Skiena's The Data Science Design Manual, pg. 171</li>
<li>'chartjunk before and after'<br> https://image.slidesharecdn.com/avoidingchartjunk-cesjune2013-130614174325-phpapp02/95/avoiding-chartjunk-18-638.jpg?cb=1371231839 </li>
</ol>