# Data Visualizations


So far, we have learned how to work with primitive data types such as strings, integers, floats, and booleans. We have also spent a lot of time with data collections, particularly lists and dictionaries.

We will now use Plotly, a graphing library, to produce graphical representations of the data contained in our lists and dictionaries.

## Plotly

**Step 1:** Import the `plotly` module so we can use it in this notebook.


In [2]:
# Install plotly first if it is not already available
!pip install plotly

In [3]:
# Step 1. Import the plotly module so we can use it in this notebook

import plotly

# use offline mode to avoid initial registration
plotly.offline.init_notebook_mode(connected=True)



**Step 2:** Each trace we plot on a graph is represented by a dictionary with the following keys:

- `'type'` points to the type of graph we wish to create
- `'x'` points to a list of our x-values
- `'y'` points to a list of our y-values (order matters, because each y-value is matched to the x-value at the same position)

In [None]:
dogs = ["Scooby", "Scrappy", "Pickles", "Clifford", "Lassie", "Floyd"]

In [None]:
ages = [7, 1, 13, 4, 9, 11]

In [None]:
# defaults to a scatter/line plot if 'type' isn't specified

trace1 = {'type': 'bar', 'x': dogs, 'y': ages}
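
Because each y-value is matched to its x-value by position, it can help to check the pairing before plotting. The sketch below uses a hypothetical helper, `make_trace`, which is not part of Plotly; it only builds the same kind of dictionary as `trace1` and refuses lists of different lengths.

```python
def make_trace(kind, x_values, y_values):
    """Build a trace dictionary (hypothetical helper, not a Plotly function)."""
    if len(x_values) != len(y_values):
        raise ValueError("x and y must have the same number of items")
    return {'type': kind, 'x': list(x_values), 'y': list(y_values)}


dogs = ["Scooby", "Scrappy", "Pickles", "Clifford", "Lassie", "Floyd"]
ages = [7, 1, 13, 4, 9, 11]

trace1 = make_trace('bar', dogs, ages)

# zip pairs items by position, which is how the chart pairs them too
for dog, age in zip(trace1['x'], trace1['y']):
    print(dog, age)
```

If the two lists were accidentally sorted or filtered separately, the bars would still draw, but each dog would be labeled with the wrong age.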


**Step 3:** Pass a list of trace dictionaries into the `plotly.offline.iplot` function.

In [None]:
plotly.offline.iplot([trace1])
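
`iplot` also accepts a figure dictionary with `'data'` and `'layout'` keys, which is one way to add a title and axis labels. A minimal sketch, reusing the dog data from Step 2; the title and label text are made up for illustration, and the `iplot` call is left as a comment so the snippet also runs outside a notebook.

```python
dogs = ["Scooby", "Scrappy", "Pickles", "Clifford", "Lassie", "Floyd"]
ages = [7, 1, 13, 4, 9, 11]

trace1 = {'type': 'bar', 'x': dogs, 'y': ages}

# A figure bundles the traces ('data') with display options ('layout').
# The title strings below are illustrative choices, not from the lesson.
figure = {
    'data': [trace1],
    'layout': {
        'title': 'Dog ages',
        'xaxis': {'title': 'Dog'},
        'yaxis': {'title': 'Age'},
    },
}

# In a notebook, after Step 1 has run:
# plotly.offline.iplot(figure)
```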

## Pair Coding Challenge

Work with the person next to you.

1)  **Create a graph of $y = x^2$**
    - make a list of the first 1000 square numbers
    - plot a line chart using this list




In [10]:
x_values = list(range(1,1001))

In [21]:
y_values = [num**2 for num in x_values]

In [22]:
y_values

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, ...]

In [23]:
y_equals_squared = {'type': 'scatter', 'x': x_values, 'y': y_values}

In [17]:
plotly.offline.iplot([y_equals_squared])
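
The challenge asks for a line chart. A `'scatter'` trace can draw one, and setting `'mode': 'lines'` makes that explicit rather than relying on Plotly's default. A minimal sketch, with a few sanity checks on the data before plotting:

```python
x_values = list(range(1, 1001))
y_values = [num ** 2 for num in x_values]

# 'mode': 'lines' asks for a connected line with no point markers
y_equals_squared = {
    'type': 'scatter',
    'mode': 'lines',
    'x': x_values,
    'y': y_values,
}

# quick checks that the list holds the first 1000 squares
print(len(y_values))   # 1000
print(y_values[:5])    # [1, 4, 9, 16, 25]
print(y_values[-1])    # 1000000

# In a notebook: plotly.offline.iplot([y_equals_squared])
```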

2) **Hockey statistics**

Your mission, should you choose to accept it, is to plot a bar graph of the NHL statistics contained in the `'./rangers.csv'` file. The x-axis should list all the New York Rangers players from last season. The y-axis should show each player's point total from last season.

`rangers_dict` is a dictionary containing all of the relevant information.

In [24]:
import pandas as pd


In [27]:
rangers_df = pd.read_csv('./rangers.csv')

In [29]:
rangers_dict = rangers_df.to_dict()

In [31]:
rangers_dict.keys()

dict_keys(['Player', 'Season', 'Team', 'Position', 'GP', 'TOI', 'G', 'A', 'P', 'P1', 'P/60', 'P1/60', 'GS', 'GS/60', 'CF', 'CA', 'C+/-', 'CF%', 'Rel CF%', 'GF', 'GA', 'G+/-', 'GF%', 'Rel GF%', 'xGF', 'xGA', 'xG+/-', 'xGF%', 'Rel xGF%', 'iPENT', 'iPEND', 'iP+/-', 'iCF', 'iCF/60', 'ixGF', 'ixGF/60', 'iSh%', 'PDO', 'ZSR', 'TOI%', 'TOI% QoT', 'CF% QoT', 'TOI% QoC', 'CF% QoC'])
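
With no arguments, `DataFrame.to_dict()` returns one inner dictionary per column, keyed by row index, which is why the next cells call `.values()` on each column. The sketch below uses a hand-written stand-in with the same shape; the player names and numbers are placeholders, not rows from `rangers.csv`.

```python
# Stand-in for rangers_df.to_dict(): {column name: {row index: value}}
# All values here are placeholders for illustration.
example_dict = {
    'Player': {0: 'Player A', 1: 'Player B', 2: 'Player C'},
    'P': {0: '12', 1: '--', 2: '30'},
}

# .values() drops the row-index keys and keeps the column values in row order
players = list(example_dict['Player'].values())
points = list(example_dict['P'].values())

print(players)  # ['Player A', 'Player B', 'Player C']
print(points)   # ['12', '--', '30']
```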

In [36]:
players = list(rangers_dict['Player'].values())

In [41]:
points = list(rangers_dict['P'].values())
points

['--',
 '9',
 '22',
 '8',
 '37',
 '4',
 '--',
 '28',
 '3',
 '32',
 '28',
 '5',
 '58',
 '44',
 '23',
 '2',
 '8',
 '52',
 '--',
 '36',
 '47',
 '12',
 '16',
 '14',
 '43',
 '4',
 '34',
 '3',
 '28',
 '40',
 '5',
 '--',
 '1',
 '8',
 '5',
 '48']

In [50]:
cleaned_points = []

for stat in points:
    if stat == '--':
        cleaned_points.append(0)
    else:
        cleaned_points.append(int(stat))
        
cleaned_points

[0,
 9,
 22,
 8,
 37,
 4,
 0,
 28,
 3,
 32,
 28,
 5,
 58,
 44,
 23,
 2,
 8,
 52,
 0,
 36,
 47,
 12,
 16,
 14,
 43,
 4,
 34,
 3,
 28,
 40,
 5,
 0,
 1,
 8,
 5,
 48]
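
The same cleanup can be written as a small function applied in a list comprehension. This is only a sketch: `to_points` is a hypothetical name, and treating `'--'` as 0 simply mirrors the choice made in the loop above.

```python
def to_points(stat):
    """Convert one raw 'P' value to an int, mapping '--' to 0."""
    return 0 if stat == '--' else int(stat)


# first few raw values from the 'P' column shown earlier
points = ['--', '9', '22', '8', '37']

cleaned_points = [to_points(stat) for stat in points]
print(cleaned_points)  # [0, 9, 22, 8, 37]
```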

In [37]:
# use the cleaned integers, not the raw strings (which still contain '--')
hockey_trace = {'type': 'bar', 'x': players, 'y': cleaned_points}

In [38]:
plotly.offline.iplot([hockey_trace])

3) Create a **pie chart** of Rangers players and their respective slices of the team's total goals last season.
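
Pie traces do not use `'x'` and `'y'`; Plotly's `'pie'` type reads `'labels'` and `'values'` instead. A minimal sketch of the trace shape, with placeholder names and goal counts standing in for the `'Player'` column and the `'G'` column (assumed here to hold goals):

```python
# Placeholder data for illustration; the real lists would come from rangers_dict
labels = ['Player A', 'Player B', 'Player C']
goals = [10, 5, 25]

pie_trace = {'type': 'pie', 'labels': labels, 'values': goals}

# Each slice is that player's share of the total
total = sum(goals)
for name, count in zip(labels, goals):
    print(name, round(100 * count / total, 1))

# In a notebook: plotly.offline.iplot([pie_trace])
```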

4) Plot a histogram of the words in the lyrics of the Beatles song "Let It Be". The lyrics are provided below:

In [None]:
lyrics = """When I find myself in times of trouble, Mother Mary comes to me
Speaking words of wisdom, let it be
And in my hour of darkness she is standing right in front of me
Speaking words of wisdom, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
And when the broken hearted people living in the world agree
There will be an answer, let it be
For though they may be parted, there is still a chance that they will see
There will be an answer, let it be
Let it be, let it be, let it be, let it be
There will be an answer, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
Let it be, let it be, let it be, let it be
Whisper words of wisdom, let it be
And when the night is cloudy there is still a light that shines on me
Shine until tomorrow, let it be
I wake up to the sound of music, Mother Mary comes to me
Speaking words of wisdom, let it be
Let it be, let it be, let it be, yeah, let it be
There will be an answer, let it be
Let it be, let it be, let it be, yeah, let it be
Whisper words of wisdom, let it be
"""

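One possible approach is to split the lyrics into words and count them. The sketch below runs on a single line of the song to stay short; lowercasing and stripping punctuation are assumptions about what should count as the same word. A `'histogram'` trace given the raw word list does the counting itself, while `collections.Counter` makes the counts visible for a `'bar'` trace.

```python
from collections import Counter
import string

# One line of the lyrics, used here instead of the full text
sample = "Let it be, let it be, let it be, yeah, let it be"

# Normalize: strip surrounding punctuation and lowercase each word
words = [w.strip(string.punctuation).lower() for w in sample.split()]
words = [w for w in words if w]  # drop anything that was only punctuation

counts = Counter(words)
print(counts.most_common())

# Option 1: let Plotly count repeated words
histogram_trace = {'type': 'histogram', 'x': words}

# Option 2: plot the counts we computed ourselves
bar_trace = {'type': 'bar', 'x': list(counts.keys()), 'y': list(counts.values())}
```
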
5) Find a set of CSV data that interests you. Make any type of visualization that shows off interesting characteristics of that data.