# Practical Data Visualization with Python

By: [Paul Jeffries](https://twitter.com/ByPaulJ)

## High-Level Overview of Content:

### Why We Visualize

1. The power of visual data representation and storytelling. 
2. A few principles and heuristics of visualization.
3. The building blocks of visualization explored.

### Overview of Python Visualization Landscape

1. Intro to the visualization ecosystem: Python's Tower of Babel.
2. Smorgasbord of packages explored through a single example viz.
3. Quick & dirty (and subjective) heuristics for picking a visualization package.

### Statistical Visualization in the Wild

1. Example business use case of data visualization: debt-to-income ratios explored.
    1. Observational:
        - mean, median, and variance
        - distributions
    2. Inferential:
        - two-sample t-test
        - KS test

### Library Deep-Dive (Plotly)

1. Quick and simple data visualizations with Plotly Express.
    - Mark types, colors, facets, etc.
2. Additional control and complexity with base Plotly.
    - Choropleth maps 
    - Heatmaps 

## Setup

In [1]:
# basic packages
import numpy as np
import pandas as pd
import datetime

# for data cleaning
from janitor import clean_names, remove_empty

In [2]:
# record when this notebook was last run, as a lightweight log
most_recent_run_datetime = datetime.datetime.now().strftime("%Y-%m-%d %H:%M")
f"This notebook was last executed on {most_recent_run_datetime}"

'This notebook was last executed on 2019-07-19 15:53'

In [3]:
# pulling in our main data; for more info on the data, see the "data_prep_nb.ipynb" file
main_df = pd.read_csv(filepath_or_buffer='../data/jan_and_dec_17_acqs.csv')

# taking a peek at our data
main_df.head()

| Unnamed: 0 | loan_id | orig_chn | seller_name | orig_rt | orig_amt | orig_trm | orig_dte | frst_dte | oltv | ocltv | ... | occ_stat | state | zip_3 | mi_pct | product_type | cscore_c | mi_type | relocation_flg | cscore_min | orig_val |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100020736692 | B | CALIBER HOME LOANS, INC. | 4.875 | 492000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | CA | 920 | | FRM | | | N | 757.0 | 656000.0 |
| 1 | 100036136334 | R | OTHER | 2.75 | 190000 | 180 | 12/2017 | 01/2018 | 67 | 67 | ... | P | MD | 206 | | FRM | 798.0 | | N | 797.0 | 283582.089552 |
| 2 | 100043912941 | R | OTHER | 4.125 | 68000 | 360 | 12/2017 | 02/2018 | 66 | 66 | ... | P | OH | 432 | | FRM | | | N | 804.0 | 103030.30303 |
| 3 | 100057175226 | R | OTHER | 4.99 | 71000 | 360 | 12/2017 | 02/2018 | 95 | 95 | ... | P | NC | 278 | 30.0 | FRM | | 1.0 | N | 696.0 | 74736.842105 |
| 4 | 100060715643 | R | OTHER | 4.5 | 180000 | 360 | 12/2017 | 02/2018 | 75 | 75 | ... | I | WA | 983 | | FRM | | | N | 726.0 | 240000.0 |
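
The `Unnamed: 0` column in the preview is what pandas typically produces when a CSV was written with its index and then read back with default settings. Whether that is what happened in `data_prep_nb.ipynb` is an assumption. If it is, passing `index_col=0` to `pd.read_csv` is one way to avoid the extra column. The sketch below uses a small in-memory stand-in (`sample_csv`, with only three of the real columns) rather than the actual file:

```python
import io

import pandas as pd

# Hypothetical three-column excerpt standing in for the real CSV. The empty
# first header field is what pandas labels "Unnamed: 0" on a default read.
sample_csv = io.StringIO(
    ",loan_id,orig_rt\n"
    "0,100020736692,4.875\n"
    "1,100036136334,2.75\n"
)

# Default read: the saved index comes back as an ordinary column.
with_extra_col = pd.read_csv(sample_csv)
print(list(with_extra_col.columns))

# Rewind the buffer, then tell pandas that the first column is the index.
sample_csv.seek(0)
with_index = pd.read_csv(sample_csv, index_col=0)
print(list(with_index.columns))
```

The second read keeps only `loan_id` and `orig_rt` as columns. Writing the file with `index=False` in the first place would be an alternative fix on the data-prep side.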


## Why We Visualize

- Intro comments about not just jumping into the sea of tools. 
    - Credit to [Jake VanderPlas' talk from PyCon 2019](https://www.youtube.com/watch?v=vTingdk_pVM).
- Slide about "Anscombe's Quartet" (see other repo)
- Updated slide with Dino Dozen dataset.
- So what did we really "do" when we visualized these data points in order to glean additional information?
    - Answer to "why we visualize?": because encoding data into a visual representation can often lead to insights that we might not glean intuitively, if at all, without visualization. 
- Show an example of encoding: dino dozen, with X, Y, and color.
    - Follow that with an example of many potential encodings (facet, size, shape, color, all-in-one)
- Go over Bertin's book on encodings and levels of organization, recognizing which types of encodings are better suited to communicating different levels of information. 
    - Neither Bertin's nor VanderPlas' list is exhaustive, and neither should be taken as irrefutable. It's the concept that matters. 
    - Before starting a visualization one should always ask:
        - What type of information am I trying to encode?
        - Given that, what type of encoding would most clearly and powerfully communicate that information?
        - A simple heuristic that I find helpful: does the design of this visualization seamlessly draw the observer's attention to the most important features of the dataset?
- Example with Fannie Mae data of not using the clearest and most powerful encoding:
    - Looking at the FICO distributions of lenders, first all-in-one by color (bad idea) and then faceted.
- End on "building blocks of visualization":
    - Data
    - Transformation
    - Marks
    - Encoding
    - Scale
    - Guides
    - Reference here to L. Wilkinson's book and Hadley Wickham's book
    - Provide example highlighting each of the above building blocks
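
The Anscombe's Quartet point in the notes above can be checked numerically before any plotting. The sketch below uses only the standard library and the published quartet values. The names `quartet`, `pearson_r`, and `summary` are illustrative and do not come from the talk or the other repo.

```python
from statistics import mean, variance

# Anscombe's quartet (Anscombe, 1973). Datasets I-III share the same x values.
x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def pearson_r(xs, ys):
    """Pearson correlation, written out by hand to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return cov / (ss_x * ss_y) ** 0.5


# One row of summary statistics per dataset.
summary = {
    name: {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "var_x": variance(xs),
        "var_y": variance(ys),
        "r": pearson_r(xs, ys),
    }
    for name, (xs, ys) in quartet.items()
}

for name, stats in summary.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

All four datasets come out with a mean of 9 for x, a mean of roughly 7.5 for y, and a correlation of roughly 0.82, yet their scatter plots look nothing alike. The summary table alone hides exactly the structure that a plot makes obvious.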

## Other Notes

- To do today:
    - Sketch out what needs to be done for part 3.
- Start working on slides:
    - **Main thing today is to get an outline done.**