# Introduction to Data Visualization with Python (Matplotlib)

### [Intro](#Goal:-Learn-the-basics-of-Data-Visualization-with-Python-(Matplotlib))
### [Part 1: Basic Syntax](#Part-1:-Basic-Syntax)
### [Part 2: Create a Chart](#Part-2:-Create-a-Chart)
### [Part 3: Add Multiple Charts](#Part-3:-Add-Multiple-Charts)
### [Part 4: Formatting & Aesthetics](#Part-4:-Formatting-&-AESTHETICS)
### [Part 5: Exporting](#Part-5:-Exporting)


-----
<br>

### Goal: Learn the basics of Data Visualization with Python (Matplotlib)
Python can be a powerful and flexible tool for creating Data Visualizations. It can also be frustrating and confusing to get started with. This Notebook is designed to introduce the core concepts used for creating charts and graphs in Python, and to explore common use cases.

In [None]:
### Basic Imports ###
import pandas as pd
import matplotlib.pyplot as plt

Matplotlib (the Python library covered in this notebook) is efficient, scalable, and customizable. But it can be confusing at first, and may not be the right tool for every use case.
<br> <br>
It may be a good idea to use Python when...
> - You are running an analysis in Python on a REGULAR basis
> - You can save time by AUTOMATING the creation of Data Visualizations
> - You believe a visualization (chart, graph) can reinforce your argument/analysis


It may NOT be a good idea to use Python when...
> - You are completing a "one-off" analysis (not done on a regular basis)
> - You are crafting something on a tight deadline
> - A process for creating visuals is already in place, and you won't save time by recreating it in Python

Quick Example

In [None]:
df = pd.read_csv(r'usedcars.csv')

x_values_list = df[''].value_counts().index
y_values_list = df[''].value_counts()

df.sample(3)

In [None]:
fig, ax = plt.subplots()   ### Create Figure and Axes

ax.bar(x_values_list,
      y_values_list,
      color = 'green'
      )

ax.set_title('Number of Cars by Year')

plt.show()   ### Display

<br><br><br>
-----
# Part 1: Basic Syntax

To start, it's important to understand the concepts/terms used with Matplotlib:
> - Figure: A blank "canvas" that you create graphs on
> - Axes: Your actual chart(s)

One figure can contain many Axes. Think of this as "one page" or "one slide" in a PowerPoint deck. You can fit multiple "charts" side-by-side on that page/slide.

> - Subplot: Basically the same as an Axes ("Axes" = "Subplots" = "Charts")
> - X Axis: The horizontal (left/right) frame of your chart
> - Y Axis: The vertical (up/down) frame of your chart
> - Spine: Almost the same as the X/Y axis, but it refers to the graph's "box edge" - for example, you can hide the spine so it doesn't show... but things are still plotted along the axis (still plotted where the spine would show).
> - Major/Minor Ticks: These are the little marks on the edge of a graph or on a grid that tell you what number/value you are looking at. You can define what values are "major" or "minor" (i.e. Major ticks would be whole numbers (0, 1, 2, 3, 4), and minor ticks would be partway between the whole numbers (.5, 1.5, 2.5, 3.5) ).
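To make the Spine and Major/Minor Tick terms concrete, here is a minimal sketch. The data is made up, and `MultipleLocator` (from `matplotlib.ticker`) is just one possible way to place ticks at fixed steps; it is not used elsewhere in this notebook.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3, 4], [0, 1, 4, 9, 16])   ### Made-up example data

### Major ticks on the whole numbers, minor ticks at the half steps in between
ax.xaxis.set_major_locator(MultipleLocator(1))
ax.xaxis.set_minor_locator(MultipleLocator(0.5))

### Hide the top and right spines -- the line is still plotted in the same place
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)

plt.show()
```

The X and Y axes (and their ticks) are still there after the spines are hidden; only the "box edge" lines disappear.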

Some basic functions:
> - fig, ax = plt.subplots(): This is commonly used at the START of creating a chart. It makes a blank figure and a blank chart. We'll dissect this more in a bit
> - plt.show(): Essentially a "print" statement. You can create a chart and define how it looks, but it won't display until you run plt.show()
> - ax.set_title(): Sets a title on the specific Axes/Subplot
> - fig.suptitle(): Puts a title on the entire figure (above the axes/subplots)

and some common chart types:
> - Bar Chart: ax.bar()
> - Horizontal Bar Chart: ax.barh()
> - Line Chart: ax.plot()
> - Stacked Area Chart: ax.stackplot()
> - Histogram: ax.hist()
> - Scatter Plot: ax.scatter()
> - Pie Chart: ax.pie()
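Two of the chart types in this list (Histogram and Scatter Plot) take slightly different inputs than a Bar or Line Chart, so here is a rough sketch of both. The numbers are made up purely for illustration.

```python
import matplotlib.pyplot as plt

ages = [22, 25, 25, 31, 34, 35, 35, 36, 41, 58]   ### Made-up values
heights = [160, 172, 168, 181, 175, 169, 178, 166, 183, 171]   ### Made-up values

### Histogram: you pass ONE list, and Matplotlib counts how many values fall into each bin
fig, ax = plt.subplots()
counts, bin_edges, bars = ax.hist(ages, bins = 4)
plt.show()

### Scatter Plot: you pass X values and Y values, one pair per point
fig2, ax2 = plt.subplots()
points = ax2.scatter(ages, heights)
plt.show()
```

Note that ax.hist() hands back the counts and bin edges it calculated, which can be handy if you want to check the numbers behind the picture.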



<br><br><br>
-----
# Part 2: Create a Chart

To CREATE a chart:
<br>
1. Create the Figure and the Axes
2. Fill in the Data
3. Tell it to "Display"

In [1]:
fig, ax = plt.subplots()


The common way to start is "fig, ax = plt.subplots()". This creates the blank Figure and the blank chart(s).

Technically this is a short way to say:
> fig = plt.figure() <br>
> ax = fig.add_subplot()

Which creates the Figure and adds one Axes to that Figure. But really, just focus on "fig, ax = plt.subplots()".
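As a quick check of that equivalence, the sketch below builds a figure both ways and counts the Axes on each. The variable names (fig_a, ax_b, and so on) are only for this illustration.

```python
import matplotlib.pyplot as plt

### The short way: Figure and Axes in one call
fig_a, ax_a = plt.subplots()

### The long way: make the Figure first, then add one Axes to it
fig_b = plt.figure()
ax_b = fig_b.add_subplot()

### Both figures end up holding exactly one Axes
print(len(fig_a.axes), len(fig_b.axes))
```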

Now that we've created the Figure and Axes, we need to say what data we want to plot: <br>
Typically, the chart functions ask you for X value(s) and Y value(s). You can enter a single value or a list of values. <br><br> Here is an example of a Bar Chart:

In [2]:
fig, ax = plt.subplots()

x_values = ['Jack', 'Kate', 'Locke', 'Sawyer', 'Hugo', 'Charlie']
y_values = [4, 8, 15, 16, 23, 42]

ax.bar(x_values,
      y_values
      )

plt.show()


And here is an example of a Line Chart:

In [None]:
fig, ax = plt.subplots()

x_values = ['Jack', 'Kate', 'Locke', 'Sawyer', 'Hugo', 'Charlie']
y_values = [4, 8, 15, 16, 23, 42]

ax.plot(x_values,
        y_values
      )

plt.show()

If we want to add multiple lines to one plot, we just need to "plot" more than one line before we call plt.show():

In [None]:
fig, ax = plt.subplots()

x_values = ['Jack', 'Kate', 'Locke', 'Sawyer', 'Hugo', 'Charlie']
y_values = [4, 8, 15, 16, 23, 42]
y_values_2 = [42, 23, 16, 15, 8, 4]


ax.plot(x_values,
       y_values,
       color = 'green',   ### Color lets you specify the color of the line
       label = 'Example 1'   ### The labels will be shown in a legend, along with the color
      )

ax.plot(x_values,
        y_values_2,
        color = 'red',
        label = 'Example 2'
      )

plt.legend(loc = 'upper center')

plt.show()

Specific charts may require slightly different inputs -- remember to check the documentation or search online for help when running into errors!

In [None]:
fig, ax = plt.subplots()

x_values = ['Jack', 'Kate', 'Locke', 'Sawyer', 'Hugo', 'Charlie']
y_values = [4, 8, 15, 16, 23, 42]

ax.pie(y_values,
       labels = x_values
      )

plt.show()

Regardless of the chart type, you can see the 3-step process above:
1. Create the Figure and the Axes with fig, ax = plt.subplots()
2. Fill in the data with X and Y values
3. And tell it to "Display" with plt.show()

Let's look at a practical example using the example "Used Cars" dataset:

In [None]:
df = pd.read_csv(r'usedcars.csv')
df.head(3)

We are going to take advantage of the "value_counts()" method to create our X and Y value lists. value_counts() will tell us how many times each unique value appears:

In [None]:
df[''].value_counts()

Let's take the "Left Column" as our X Values, and the numbers on the right as our Y Values:

In [None]:
x_values_list = df[''].value_counts(sort=False).index.tolist()
y_values_list = df[''].value_counts(sort=False).tolist()
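Because the column name is left blank above, here is a small sketch of the same idea on stand-in data. The DataFrame and the 'year' column are made up for illustration and are not taken from usedcars.csv.

```python
import pandas as pd

### Hypothetical stand-in data -- 'year' is a made-up column name
demo_df = pd.DataFrame({'year': [2010, 2012, 2012, 2011, 2012, 2010]})

### value_counts() returns a Series: the index holds the unique values, the values hold how often each appears
counts = demo_df['year'].value_counts(sort=False)

demo_x_values = counts.index.tolist()   ### The "Left Column"
demo_y_values = counts.tolist()         ### The numbers on the right

print(dict(zip(demo_x_values, demo_y_values)))
```

Calling value_counts() once and reusing the result keeps the X and Y lists in matching order.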

And put them into a chart

In [None]:
fig, ax = plt.subplots()

x_values_list = df[''].value_counts(sort=False).index.tolist()
y_values_list = df[''].value_counts(sort=False).tolist()

ax.bar(x_values_list,
       y_values_list,
       color = 'green'
      )

ax.set_title('Count of Cars by Year')   ### This sets up a title above your specific Axes/Subplots

plt.show()

In [None]:
fig, ax = plt.subplots()

x_values_list = df.groupby('')[''].mean().index.tolist()
y_values_list = df.groupby('')[''].mean().tolist()

ax.plot(x_values_list,
       y_values_list,
      )

ax.set_title('')

plt.show()

One final note on the charts above - we've added specific colors and titles to the charts. <br> <br>
As mentioned, think of the process as creating the figure/axes, filling in data (numbers, labels, colors), then adding additional things to the chart (titles and labels). You can check the documentation for a full list of possible attributes.<br><br> We will discuss formatting shortly (Part 4), but to illustrate the 3-part process:

In [None]:
fig, ax = plt.subplots()

x_values_list = df[''].value_counts(sort=False).index.tolist()
y_values_list = df[''].value_counts(sort=False).tolist()

ax.bar(x_values_list,
       y_values_list,
       color = 'orange',   ### Color of the chart
       edgecolor = 'blue',   ### Outline color of the bar
       label = '# of THING',   ### Used for a legend label
       linewidth = 5,   ### Thickness of bar outline
       alpha = .8,   ### Transparency
       zorder = 2   ### Used to set items on top/underneath one another. In this case, the chart is on top of the grid
      )

ax.set_title('TITLE')

ax.grid(linestyle = '-',   ### Line style of the gridlines
        which = 'major',   ### Add lines on the Major/Minor/Both values
        axis = 'y',   ### Which axis the gridlines come from ('x', 'y', or 'both'); 'y' draws horizontal lines
        alpha = .5,   ### Transparency
        linewidth = .75,   ### Thickness of gridlines
        color = 'black',   ### Color of the gridlines
        zorder = 1   ### Used to set items on top/underneath one another. In this case, the grid is underneath the chart
      )
    
ax.legend(loc = 'upper right')

plt.show()

In [None]:
fig, ax = plt.subplots()

x_values = ['Jack', 'Kate', 'Locke', 'Sawyer', 'Hugo', 'Charlie']
y_values = [4, 8, 15, 16, 23, 42]
y_values_2 = [42, 23, 16, 15, 8, 4]
y_values_3 = [23, 4, 32, 16, 42, 8]


ax.plot(x_values,
        y_values,
        color = 'green',   ### Color of the line
        label = 'Example 1',   ### Used for a legend label
        linewidth = 4,   ### Thickness of the line
       )

ax.plot(x_values,
        y_values_2,
        color = 'magenta',
        label = 'Example 2',
        linestyle = 'dashed'   ### How the line is drawn: "-----"
       )

ax.plot(x_values,
        y_values_3,
        color = 'yellow',
        label = 'Example 3',
        marker = 'o',   ### The 'Data Point' where the line meets the gridlines
        markersize = 12   ### Size of the "Data Point"
       )

ax.spines['top'].set_visible(False)   ### Spines are the 'main square/lines' of a subplot. You can select certain ones "off"
ax.spines['right'].set_visible(False)
ax.spines['left'].set_visible(False)

ax.grid(linestyle = '-',   ### Line style of the gridlines
        which = 'major',   ### Add lines on the Major/Minor/Both values
        axis = 'x',   ### Which axis the gridlines come from ('x', 'y', or 'both'); 'x' draws vertical lines
        alpha = .5,   ### Transparency
        linewidth = .5,   ### Thickness of gridlines
        color = 'red',   ### Color of the gridlines
      )
    
ax.legend(loc = 'upper center')

plt.show()

<br><br><br>
-----
# Part 3: Add Multiple Charts

So far we've been working with ONE figure, and ONE subplot (one axes, or one chart). But if we want to place multiple charts onto one figure, we need to update our syntax.

In [3]:
fig, ax = plt.subplots(2,1)


plt.subplots() lets you specify the number of rows and the number of columns. <br><br> And to add charts, we need to specify which subplot we are working with. Python uses zero-based indexing. That means the first item is [0], the second is [1], and so forth.

In [None]:
fig, ax = plt.subplots(2, 1)

x_values = ['Britta', 'Troy', 'Annie', 'Abed', 'Jeff', 'Shirley', 'Pierce']
y_values = [28, 21, 20, 22, 34, 35, 60]

ax[0].bar(x_values,
          y_values,
          color = 'blue'
         )

ax[1].bar(x_values,
          y_values,
          color = 'green'
         )

plt.show()

You can think of it like creating a grid, with each square in that grid being a different chart <br> So a 2x2 grid creates 4 squares, which is 4 charts:

In [None]:
fig, ax = plt.subplots(2,2)

From here the syntax is the same -- EXCEPT you need to specify which square you're working in:
> Top Left = ax[0,0] <br>
> Top Right = ax[0,1] <br>
> Bottom Left = ax[1,0] <br>
> Bottom Right = ax[1,1]

Again, zero-based indexing means the first item is [0], the second is [1], and so forth. So the FIRST row, SECOND column is [0,1].
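One detail that is easy to trip over: how you index "ax" depends on the shape of the grid. The sketch below (variable names are just for illustration) compares a single column of subplots with a 2x2 grid, and numbers each square of the grid using ".flat", which is one possible way to visit every subplot in turn.

```python
import matplotlib.pyplot as plt

fig_col, ax_col = plt.subplots(2, 1)     ### 2 rows, 1 column  -> index with ONE number: ax_col[0], ax_col[1]
fig_grid, ax_grid = plt.subplots(2, 2)   ### 2 rows, 2 columns -> index with [row, column]

print(ax_col.shape)    ### (2,)
print(ax_grid.shape)   ### (2, 2)

ax_grid[0, 1].set_title('Top Right')
ax_grid[1, 0].set_title('Bottom Left')

### .flat walks through every subplot, left to right, top to bottom
for number, subplot in enumerate(ax_grid.flat):
    subplot.text(0.5, 0.5, str(number), ha = 'center', va = 'center')

plt.show()
```

If you use a [row, column] index on a grid that only has one row or one column, Python raises an IndexError, so it is worth checking the shape when something goes wrong.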

In [None]:
fig, ax = plt.subplots(2, 2)   ### A 2x2 grid, so each subplot needs a [row, column] index
fig.set_size_inches(12, 6)   ### There are a variety of ways to "change the size" of the plot/figure. Ignore this for now

x_values = ['Britta', 'Troy', 'Annie', 'Abed', 'Jeff', 'Shirley', 'Pierce']
y_values = [28, 21, 20, 22, 34, 35, 60]

##############
ax[0,0].axis('off')   ### Hides all the axes edges
##############


##############
ax[0,1].bar(x_values,
            y_values,
            color = 'blue'
           )
ax[0,1].set_title('Top Right Subplot')
##############


##############
ax[1,0].barh(x_values,
             y_values,
             color = 'green'
           )
ax[1,0].spines['right'].set_visible(False)   
ax[1,0].invert_yaxis()   ### Reverses the order of the Y values
ax[1,0].set_title('Bottom Left Subplot')  
##############


##############
ax[1,1].text(0,   ### The Left/Right location: with the default axis limits, 0 is the LEFT edge and 1 is the RIGHT edge
             .5,   ### The Up/Down location: with the default axis limits, 0 is the BOTTOM and 1 is the TOP
            'You can put a text box into a subplot!',
             fontsize = 14
            )
##############


plt.show()

Sometimes when you want to change something, you need to do it on the "Figure" level, and sometimes you need to do it to the "Axes"

In [None]:
fig, ax = plt.subplots(1, 2)   ### 1 row, 2 columns (side by side)

x_values = ['Britta', 'Troy', 'Annie', 'Abed', 'Jeff', 'Shirley', 'Pierce']
y_values = [28, 21, 20, 22, 34, 35, 60]

ax[0].bar(x_values, y_values)
ax[0].set_title('First/Left Side Chart')


ax[1].bar(x_values, y_values)
ax[1].set_title('Second/Right Side')  

fig.suptitle('This is a FIGURE Title', fontsize = 24)


plt.show()

You can often perform the same action through pyplot ("plt"), which acts on whichever Axes is currently active, or directly on a specific Axes/Subplot -- but this can also cause a lot of confusion:
> plt.xticks(rotation = 45) <br>
> ax.set_xticklabels(x_values, rotation = 45)

It is important to pay attention to what you want to adjust, and how you are trying to do it. If you mix/match these methods, you may end up with unexpected results!
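Here is a small sketch of how the two approaches can differ when a figure has more than one subplot. It uses ax.tick_params() for the Axes-level version, and the names and ages are simply a shortened copy of the example data above.

```python
import matplotlib.pyplot as plt

names = ['Britta', 'Troy', 'Annie']
ages = [28, 21, 20]

fig, ax = plt.subplots(1, 2)
ax[0].bar(names, ages)
ax[1].bar(names, ages)

### pyplot call: only touches the "current" Axes (here, the last one created)
plt.xticks(rotation = 45)
left_after_plt = ax[0].get_xticklabels()[0].get_rotation()
right_after_plt = ax[1].get_xticklabels()[0].get_rotation()

### Axes call: touches exactly the Axes you name
ax[0].tick_params(axis = 'x', labelrotation = 45)
left_after_ax = ax[0].get_xticklabels()[0].get_rotation()

print(left_after_plt, right_after_plt, left_after_ax)

plt.show()
```

After the plt.xticks() call only the right-hand chart has rotated labels; the left-hand chart changes only once it is addressed directly.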

<br><br><br>
-----
# Part 4: Formatting & AESTHETICS

So far we've been making specific adjustments to the look and style of our charts. Wouldn't it be great if we could specify the look ONCE, and not have to worry about it again? <br> <br> With Matplotlib, we can change the default global style using "rcParams"
> rc = "runtime configuration" <br>
> Params = Parameters

Each time Matplotlib loads, it defines a runtime configuration (rc) containing the default styles for every plot element you create. This configuration can be customized using "plt.rcParams"

In [None]:
### Before ###
fig, ax = plt.subplots()
fig.set_size_inches(12, 6)

x_values = ['Eddard "Ned" Stark', 'Jamie Lannister', 'Tyrion Lannister', 'Daenerys Targaryen', 'Jon Snow', 'Arya Stark']
y_values = [15, 17, 49, 31, 42, 34]

ax.bar(x_values, y_values)

plt.show()

In [None]:
### You can find more items to customize in Matplotlib's Documentation: https://matplotlib.org/stable/tutorials/introductory/customizing.html
### Or some options are available here: [(param ,value) for param, value in plt.rcParams.items() if 'color' in param]
plt.rcParams['axes.facecolor'] = 'grey'
plt.rcParams['axes.edgecolor'] = 'yellow'
plt.rcParams['axes.linewidth'] = 4

In [None]:
### After ###
fig, ax = plt.subplots()
fig.set_size_inches(12, 6)

x_values = ['Eddard "Ned" Stark', 'Jamie Lannister', 'Tyrion Lannister', 'Daenerys Targaryen', 'Jon Snow', 'Arya Stark']
y_values = [15, 17, 49, 31, 42, 34]

ax.bar(x_values,
       y_values
      )

plt.show()

Here's a cleaner example

In [None]:
plt.rcParams['figure.facecolor'] = 'black'
plt.rcParams['figure.edgecolor'] = 'black'

plt.rcParams['axes.facecolor'] = 'grey'
plt.rcParams['axes.edgecolor'] = 'yellow'

plt.rcParams['axes.linewidth'] = 2

plt.rcParams['xtick.color'] = 'white'
plt.rcParams['ytick.color'] = 'white'

plt.rcParams['axes.spines.right'] = False
plt.rcParams['axes.spines.top'] = False

plt.rcParams['xtick.labelsize'] = 'smaller'

In [None]:
### After ###
fig, ax = plt.subplots()
fig.set_size_inches(12, 6)

x_values = ['Eddard "Ned" Stark', 'Jamie Lannister', 'Tyrion Lannister', 'Daenerys Targaryen', 'Jon Snow', 'Arya Stark']
y_values = [15, 17, 49, 31, 42, 34]

ax.bar(x_values,
       y_values
      )

plt.show()
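Settings made through plt.rcParams stay in effect for every later chart until you change them back. If you only want them for a single chart, one possible approach (not used elsewhere in this notebook) is "plt.rc_context", which applies the settings inside a "with" block and restores the previous values afterwards. A minimal sketch with made-up data:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

original_facecolor = plt.rcParams['axes.facecolor']

### Settings passed to rc_context only apply inside the "with" block
with plt.rc_context({'axes.facecolor': 'grey', 'axes.linewidth': 4}):
    fig, ax = plt.subplots()   ### Picks up the temporary settings
    ax.bar(['A', 'B', 'C'], [1, 2, 3])   ### Made-up example data
    plt.show()

### Outside the block, the global setting is back to what it was before
restored_facecolor = plt.rcParams['axes.facecolor']
print(original_facecolor == restored_facecolor)
```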

There are a variety of pre-set styles that you can import as well:

In [None]:
print(plt.style.available)

In [None]:
plt.style.use('fivethirtyeight')

fig, ax = plt.subplots()
fig.set_size_inches(12, 6)

x_values = ['Eddard "Ned" Stark', 'Jamie Lannister', 'Tyrion Lannister', 'Daenerys Targaryen', 'Jon Snow', 'Arya Stark']
y_values = [15, 17, 49, 31, 42, 34]

ax.bar(x_values,
       y_values
      )

plt.show()

In [None]:
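plt.style.use('seaborn-deep')   ### Newer Matplotlib releases list this style as 'seaborn-v0_8-deep' -- check plt.style.available if this name is not found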

fig, ax = plt.subplots()
fig.set_size_inches(12, 6)

x_values = ['Eddard "Ned" Stark', 'Jamie Lannister', 'Tyrion Lannister', 'Daenerys Targaryen', 'Jon Snow', 'Arya Stark']
y_values = [15, 17, 49, 31, 42, 34]

ax.bar(x_values,
       y_values
      )

plt.show()

And to return to the default settings:

In [None]:
plt.style.use('default')

<br><br><br>
-----
# Part 5: Exporting

Now that you've put in a lot of work to make effective charts, let's finish with how to share them!<br> We do this with the "plt.savefig" function, which will export the FIGURE (and all the subplots on that figure).
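Before the full example, here is a stripped-down sketch of saving one figure in two formats. It writes into a throwaway temporary folder (via Python's tempfile module) only so that it runs on any computer; the file names and data are made up.

```python
import os
import tempfile

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(['A', 'B', 'C'], [1, 2, 3])   ### Made-up example data

export_folder = tempfile.mkdtemp()   ### Throwaway folder, used here only so the sketch runs anywhere

png_path = os.path.join(export_folder, 'example.png')
pdf_path = os.path.join(export_folder, 'example.pdf')

### The file extension decides the format
fig.savefig(png_path, dpi = 150)   ### dpi controls the resolution of image formats like PNG
fig.savefig(pdf_path)

print(os.listdir(export_folder))

plt.show()
```

The save calls come before plt.show() on purpose: outside of a notebook, saving after the figure window has been shown and closed can produce a blank file.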

Let's go back to a practical example using the used cars dataframe:

In [4]:
fig, ax = plt.subplots(1, 2)   ### 1 row, 2 columns, so ax[0] and ax[1] both exist
fig.set_size_inches(12, 6)

x_values_1 = df['COLUMN'].value_counts(sort=False).index.tolist()   ### Values
y_values_1 = df['COLUMN'].value_counts(sort=False).tolist()   ### Values

x_values_2 = df['COLUMN'].value_counts(sort=False).index.tolist()   ### Values
y_values_2 = df['COLUMN'].value_counts(sort=False).tolist()   ### Values

ax[0].bar(x_values_1, y_values_1)
ax[0].set_title('First/Left Side Chart')

ax[1].bar(x_values_2, y_values_2)
ax[1].set_title('Second/Right Side Chart')

### We could specify the X Axis in both subplots, OR we can loop through each axes in the figure, like this:
for ax in fig.axes:
    ax.tick_params(axis = 'x', labelrotation = 45 )
#ax[0].set_xticklabels(x_values_1, rotation = 45)
#ax[1].set_xticklabels(x_values_2, rotation = 45)

### Similarly, we can do the same for data labels -- define the function
def autolabel(rects, ax):
    for rect in rects: ### Get the center position and the height of each bar
        x = rect.get_x() + rect.get_width()/2
        y = rect.get_height()
        ax.annotate("{}".format(y), (x, y), xytext = (0,1), textcoords = "offset points", 
                    ha = 'center', va = 'bottom')   ### and add an annotation

for ax in fig.axes:
    autolabel(ax.patches, ax)
    
fig.suptitle('Title for the Figure',
             fontsize = 24,
             y = 1.05
            )

plt.tight_layout()   ### Sometimes the text or labels can get cut off -- tight_layout will resize your figure/margins to keep everything "in frame"

plt.savefig('exports/example.png')   ### The "exports" folder must already exist; forward slashes avoid backslash-escape problems
plt.show()


savefig lets you export the figure in a variety of formats (.png, .jpeg, or even PDF if you're making a report). <br> <br> You can also specify a specific folder or directory on your computer! Try this:

In [None]:
### Create a folder on your C: drive for demo purposes
from datetime import date
import os

today_date = date.today().strftime("%Y_%m_%d")
path = os.path.join(r'C:\My Documents\Matplotlib_Exports', today_date)   ### os.path.join adds the folder separator for us

if not os.path.exists(path):
    os.makedirs(path)
    print('Making Folder...')
else:
    print('Folder already exists!')

In [None]:
fig, ax = plt.subplots(1, 2)   ### 1 row, 2 columns, so ax[0] and ax[1] both exist
fig.set_size_inches(12, 6)

x_values_1 = df['COLUMN'].value_counts(sort=False).index.tolist()   ### Values
y_values_1 = df['COLUMN'].value_counts(sort=False).tolist()   ### Values

x_values_2 = df['COLUMN'].value_counts(sort=False).index.tolist()   ### Values
y_values_2 = df['COLUMN'].value_counts(sort=False).tolist()   ### Values

ax[0].bar(x_values_1, y_values_1)
ax[0].set_title('First/Left Side Chart')

ax[1].bar(x_values_2, y_values_2)
ax[1].set_title('Second/Right Side Chart')

for ax in fig.axes:
    ax.tick_params(axis = 'x', labelrotation = 45 )

    
fig.suptitle('Title for the Figure',
             fontsize = 24,
             y = 1.05
            )

plt.tight_layout()   ### Sometimes the text or labels can get cut off -- tight_layout will resize your figure/margins to keep everything "in frame"

plt.savefig(os.path.join(path, 'example_fig.pdf'), bbox_inches = 'tight')   ### "path" is the dated folder created in the previous cell. bbox_inches is similar to tight_layout and may fix things being cut off; however, it can change the page size
plt.show()

In [None]:
print('Yuge Win')