<img style="float: center;" src="images/CI_horizontal.png" width="600">
<center>
    <span style="font-size: 1.5em;">
        <a href='https://www.coleridgeinitiative.org'>Website</a>
    </span>
</center>

Ghani, Rayid, Frauke Kreuter, Julia Lane, Adrianne Bradford, Alex Engler, Nicolas Guetta Jeanrenaud, Graham Henke, Daniela Hochfellner, Clayton Hunter, Brian Kim, Avishek Kumar, Jonathan Morgan, and Ridhima Sodhi.

# Data Visualization in Python
---

<h1>Table of Contents<span class="tocSkip"><a href="#Table-of-Contents"></a></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Data-Visualization-in-Python" data-toc-modified-id="Data-Visualization-in-Python-1">Data Visualization in Python</a></span><ul class="toc-item"><li><span><a href="#Introduction" data-toc-modified-id="Introduction-1.1">Introduction</a></span><ul class="toc-item"><li><span><a href="#Learning-Objectives" data-toc-modified-id="Learning-Objectives-1.1.1">Learning Objectives</a></span></li></ul></li><li><span><a href="#Python-Setup" data-toc-modified-id="Python-Setup-1.2">Python Setup</a></span></li><li><span><a href="#Load-the-Data" data-toc-modified-id="Load-the-Data-1.3">Load the Data</a></span></li><li><span><a href="#Visual-data-exploration-with-matplotlib" data-toc-modified-id="Visual-data-exploration-with-matplotlib-1.4">Visual data exploration with <code>matplotlib</code></a></span><ul class="toc-item"><li><span><a href="#A-Note-on-Data-Sourcing" data-toc-modified-id="A-Note-on-Data-Sourcing-1.4.1">A Note on Data Sourcing</a></span></li><li><span><a href="#Layering-in-matplotlib" data-toc-modified-id="Layering-in-matplotlib-1.4.2">Layering in <code>matplotlib</code></a></span></li></ul></li><li><span><a href="#Introducing-seaborn" data-toc-modified-id="Introducing-seaborn-1.5">Introducing <code>seaborn</code></a></span><ul class="toc-item"><li><span><a href="#Combining-seaborn-and-matplotlib" data-toc-modified-id="Combining-seaborn-and-matplotlib-1.5.1">Combining <code>seaborn</code> and <code>matplotlib</code></a></span></li></ul></li><li><span><a href="#Exploring-cohort-employment" data-toc-modified-id="Exploring-cohort-employment-1.6">Exploring cohort employment</a></span><ul class="toc-item"><li><span><a href="#A-heatmap-using-Seaborn" data-toc-modified-id="A-heatmap-using-Seaborn-1.6.1">A heatmap using Seaborn</a></span></li><li><span><a href="#Full-quarter-employment" data-toc-modified-id="Full-quarter-employment-1.6.2">Full quarter 
employment</a></span></li></ul></li><li><span><a href="#Exporting-Completed-Graphs" data-toc-modified-id="Exporting-Completed-Graphs-1.7">Exporting Completed Graphs</a></span></li><li><span><a href="#Choosing-a-Data-Visualization-Package" data-toc-modified-id="Choosing-a-Data-Visualization-Package-1.8">Choosing a Data Visualization Package</a></span><ul class="toc-item"><li><span><a href="#An-Important-Note-on-Graph-Titles" data-toc-modified-id="An-Important-Note-on-Graph-Titles-1.8.1">An Important Note on Graph Titles</a></span></li></ul></li><li><span><a href="#Additional-Resources" data-toc-modified-id="Additional-Resources-1.9">Additional Resources</a></span></li></ul></li></ul></div>

## Introduction
- Back to [Table of Contents](#Table-of-Contents)

In this module, you will learn to quickly and flexibly make a wide range of visualizations, both for exploratory data analysis and for communicating results to your audience. The module is a practical introduction to data visualization in Python and covers important rules that any data visualizer should follow.

### Learning Objectives

* Become familiar with a core base of data visualization tools in Python - specifically matplotlib and seaborn

* Begin exploring what visualizations are going to best reveal various types of patterns in your data

* Learn more about our primary datasets through exploratory analyses and visualizations

## Python Setup
- Back to [Table of Contents](#Table-of-Contents)

In [None]:
# data manipulation in Python
import pandas as pd

# visualization packages
import matplotlib.pyplot as plt 
import seaborn as sns

# database connection
from sqlalchemy import create_engine

# see how long queries/etc take
import time

# so images get plotted in the notebook
%matplotlib inline

## Load the Data
- Back to [Table of Contents](#Table-of-Contents)

In [None]:
# set up sqlalchemy engine
host = 'stuffed.adrf.info'
DB = 'appliedda'

connection_string = "postgresql://{}/{}".format(host, DB)
conn = create_engine(connection_string)

We will continue exploring a selection of data similar to the one we ended with in the [Dataset Exploration](02_2_Dataset_Exploration.ipynb) notebook.

**SQL code to generate the tables we'll explore below**


In [None]:
schema = 'mo_dhe'
tbl = 'completions'

query = '''
SELECT column_name
FROM information_schema.columns 
WHERE table_schema = '{}' AND table_name = '{}'
'''.format(schema, tbl)

# read results
dhe_columns = pd.read_sql(query, conn)

In [None]:
print('dataset contains {} columns'.format(dhe_columns.shape[0]))

In [None]:
# define which columns from the table we want
select_columns = [c for c in dhe_columns['column_name'].values]
select_columns

In [None]:
####### explore 2010 DHE data

# code run to generate 2010 graduate table cohort

start_time = time.time()
sql = """
 CREATE TABLE IF NOT EXISTS ada_edwork_mo.mo_dhe_grad_2010_v2 AS
 SELECT *
 FROM  mo_dhe.completions
 WHERE calyear = 2010
 """
# run sql
conn.execute(sql)


print('query completed in {:.2f} seconds'.format(time.time()-start_time))


In [None]:
# create index on "deident_id" for joins to other tables
# conn.execute('CREATE INDEX ON ada_edwork_mo.mo_dhe_grad_2010_v2 (deident_id)')

In [None]:
df = pd.read_sql('SELECT * FROM ada_edwork_mo.mo_dhe_grad_2010_v2', conn)
df.shape

In [None]:
df.info()

In [None]:

print('In this sample, {:,.0f} individuals graduated with {} different degree types and studied {} subjects'\
.format(df['deident_id'].nunique(),
        df['degreec'].nunique(),
        df['pgm_nm_dat'].nunique()
       ))

In [None]:
# get jobs worked by our cohort

start_time = time.time()

query = """
    CREATE TABLE IF NOT EXISTS ada_edwork_mo.mo_dhe_grad2010_jobs_v2 AS
    SELECT * 
    FROM kcmo_lehd.mo_wage
    WHERE ssn IN (SELECT distinct deident_id FROM ada_edwork_mo.mo_dhe_grad_2010_v2)
    AND year = 2015
    """
conn.execute(query)

# report how long the query took
print('query ran in {:.2f} seconds'.format(time.time()-start_time))

In [None]:
# read the 2015 jobs data for our cohort into a DataFrame
df_jobs = pd.read_sql("SELECT * FROM ada_edwork_mo.mo_dhe_grad2010_jobs_v2", conn)

df_jobs.info()

In [None]:
num_grad = df['deident_id'].nunique()
num_empl = df_jobs['ssn'].nunique()
print('of {:,.0f} 2010 Missouri graduates, {:,.0f} ({:.1f}%) had at least one job in Missouri in 2015'.format(num_grad,num_empl, 
                                                                                    num_empl/num_grad*100))

In [None]:
df_jobs.head()

## Visual data exploration with `matplotlib`
- Back to [Table of Contents](#Table-of-Contents)

Under the hood, `Pandas` uses `matplotlib` to produce visualizations. `matplotlib` is the most commonly used base visualization package and provides low-level access to different chart characteristics (e.g. tick mark labels).
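Because the pandas plotting methods hand back ordinary `matplotlib` objects, we can keep adjusting a plot with `matplotlib` calls after pandas has drawn it. The following is a minimal sketch on made-up values; the `toy_jobs` frame and its numbers are hypothetical and only stand in for `df_jobs`:

```python
import pandas as pd
from matplotlib.axes import Axes

# hypothetical toy data standing in for df_jobs
toy_jobs = pd.DataFrame({'wage': [500, 1200, 1500, 3400, 4100, 8000]})

# DataFrame.hist returns an array of matplotlib Axes, one per plotted column
axes = toy_jobs.hist(column='wage', bins=3, grid=False)
ax = axes.flatten()[0]

# the returned object is a plain matplotlib Axes ...
print(isinstance(ax, Axes))

# ... so low-level properties such as axis labels can be set directly on it
ax.set_xlabel('Job earnings ($)')
ax.set_ylabel('Number of jobs')
```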

In [None]:
# simple distribution
df_jobs.hist(column='wage', grid=False);

In [None]:
# simple distribution with specified number of bins
df_jobs.hist(column='wage', bins=50, grid=False);

In [None]:
# the simple histogram produced above shows a lot of small earnings values
# what is the distribution of the higher values?
df_jobs['wage'].describe(percentiles = [.01, .1, .25, .5, .75, .9, .95, .99, .999])

In [None]:
## We can see a long tail in the earnings per job
## let's subset to below the 99th percentile and make a histogram
subset_values = df_jobs['wage'] < df_jobs['wage'].quantile(0.99)

df_jobs[subset_values].hist(column='wage', bins=50, grid=False);

> Note in the above cell we split subsetting the data into two steps:
1. We created `subset_values`, which is a boolean `Series` holding True or False for each row
2. Then we selected all rows in the `df_jobs` dataframe where `subset_values` was True
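The two steps in the note can be traced on a handful of rows. This is a small sketch with invented wages (the `toy_jobs` frame is hypothetical), using the pandas `quantile` method for the cutoff:

```python
import pandas as pd

# hypothetical toy data standing in for df_jobs
toy_jobs = pd.DataFrame({'wage': [500, 1200, 3400, 8000, 250000]})

# step 1: build a boolean Series with one True/False value per row
cutoff = toy_jobs['wage'].quantile(0.99)
below_cutoff = toy_jobs['wage'] < cutoff

# step 2: keep only the rows where the mask is True
toy_subset = toy_jobs[below_cutoff]

print(below_cutoff.tolist())  # [True, True, True, True, False]
print(len(toy_subset))        # 4
```

Only the single extreme value is dropped, which is the same effect the 99th percentile cutoff has on the long tail of `df_jobs`.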

In [None]:
## We can change options within the hist function (e.g. number of bins, color, transparency):
df_jobs[subset_values].hist(column='wage', bins=20, facecolor="purple", alpha=0.5, figsize=(10,6), grid=False)

## And we can change the plot options with `plt` (which is our alias for matplotlib.pyplot)
plt.xlabel('Job earnings ($)')
plt.ylabel('Number of jobs')
plt.title('Distribution of jobs by earnings for the cohort')

## And add Data sourcing:
### xy are measured as a fraction of the axes length, from the bottom left of the graph:
plt.annotate('Source: Missouri LEHD', 
             xy=(0.5,-0.15), xycoords="axes fraction")

## We use plt.show() to display the graph once we are done setting options:
plt.show()

### A Note on Data Sourcing
- Back to [Table of Contents](#Table-of-Contents)

Data sourcing is a critical aspect of any data visualization. Although here we are simply referencing the agencies that created the data, it is ideal to provide as direct a path as possible for the viewer to find the data the graph is based on. When this is not possible (e.g. the data is sequestered), directing the viewer to documentation or methodology for the data is a good alternative. Regardless, providing clear sourcing for the underlying data is an **absolute requirement** of any respectable visualization; it builds trust and enables reproducibility.
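Since every graph needs a source note, it can be convenient to wrap the `annotate` call in a small function. The sketch below is one possible way to do this; `add_source` is a hypothetical helper written for this example (it is not part of `matplotlib`), and the plotted numbers are made up:

```python
import matplotlib.pyplot as plt


def add_source(ax, source_text, x=0.5, y=-0.15, fontsize=10):
    """Write a 'Source: ...' note below the axes (hypothetical helper)."""
    # x and y are fractions of the axes size, measured from the bottom left
    return ax.annotate('Source: {}'.format(source_text),
                       xy=(x, y), xycoords='axes fraction', fontsize=fontsize)


fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [10, 12, 9, 14])  # made-up values, for illustration only
note = add_source(ax, 'Missouri LEHD')
```

Keeping the position arguments as defaults means the note lands in the same place on every graph unless a specific layout needs something else.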

### Layering in `matplotlib`
- Back to [Table of Contents](#Table-of-Contents)

As briefly demonstrated by changing the labels and adding the source, above, we can make consecutive changes to the same plot; that means we can also layer multiple plots on the same `figure`. By default, the first graph you create will be on the bottom with following graphs on top.

In [None]:
# demonstrate simple layering

plt.hist(df_jobs[subset_values & (df_jobs['quarter']==2)]['wage'], facecolor="blue", alpha=0.6)
plt.hist(df_jobs[subset_values & (df_jobs['quarter']==3)]['wage'], facecolor="orange", alpha=0.6)

plt.annotate('Source: Missouri LEHD', 
             xy=(0.5,-0.15), xycoords="axes fraction")
plt.show()
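The same layering can be written against an explicit `Axes` object, which makes it easier to add labels and a legend so the reader can tell the layers apart. The sketch below uses randomly generated numbers rather than the cohort data, and the shared bin edges are our own addition to keep the two sets of bars aligned:

```python
import numpy as np
import matplotlib.pyplot as plt

# synthetic wages for two quarters (random numbers, not real data)
rng = np.random.default_rng(0)
wages_q2 = rng.gamma(shape=2.0, scale=3000, size=500)
wages_q3 = rng.gamma(shape=2.0, scale=3500, size=500)

# shared bin edges keep the bars of both layers aligned
bin_edges = np.linspace(0, max(wages_q2.max(), wages_q3.max()), 21)

fig, ax = plt.subplots(figsize=(10, 6))
# the first histogram is drawn at the bottom, the second on top of it
ax.hist(wages_q2, bins=bin_edges, facecolor='blue', alpha=0.6, label='Quarter 2')
ax.hist(wages_q3, bins=bin_edges, facecolor='orange', alpha=0.6, label='Quarter 3')

ax.set_xlabel('Job earnings ($)')
ax.set_ylabel('Number of jobs')
ax.legend()
```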

## Introducing `seaborn`
- Back to [Table of Contents](#Table-of-Contents)

`Seaborn` is a popular visualization package built on top of `matplotlib` that makes some of the more cumbersome graphs easier to create. However, it does not give direct access to the lower-level objects in a `figure` (more on that later).

In [None]:
## Barplot function in seaborn
sns.barplot(x='quarter', y='wage', data=df_jobs)
plt.show()

What values does the above plot actually show us? Let's use the `help()` function to check the details of the `seaborn.barplot()` function we called above:

In [None]:
help(sns.barplot)

In the documentation, we can see that there is an `estimator` argument, which defaults to the mean.

In [None]:
## Barplot using sum of earnings rather than the default mean
sns.barplot(x='quarter', y='wage', data=df_jobs, estimator=sum)
plt.show()

In [None]:
## Barplot using median of earnings
import numpy as np  # numpy provides the median function
sns.barplot(x='quarter', y='wage', data=df_jobs, estimator=np.median)
plt.show()
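A quick way to check what each bar height represents is to compute the same summaries with a `groupby`. This sketch uses a few invented rows (the `toy_jobs` frame is hypothetical) rather than the real jobs data:

```python
import pandas as pd

# hypothetical toy data standing in for df_jobs
toy_jobs = pd.DataFrame({
    'quarter': [1, 1, 1, 2, 2],
    'wage': [1000, 3000, 8000, 2000, 4000],
})

# one row per quarter, one column per estimator used in the barplots
summary = toy_jobs.groupby('quarter')['wage'].agg(['mean', 'sum', 'median'])
print(summary)
```

For quarter 1 the mean (4000) sits well above the median (3000) because of the single large wage, which is why the choice of estimator matters for skewed data like earnings.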

In [None]:
## Seaborn has a great series of charts for showing different cuts of data
## (catplot was called factorplot before seaborn 0.9)
sns.catplot(x='state', y='wage', hue='quarter', data=df_jobs, kind='bar')
plt.show()

## Other options for the 'kind' argument can be found in the documentation

### Combining `seaborn` and `matplotlib` 
- Back to [Table of Contents](#Table-of-Contents)

There are many excellent data visualization modules available in Python, but for this tutorial we will stick to the tried and true combination of `matplotlib` and `seaborn`.

Below, we use `seaborn` for setting an overall aesthetic style and then for faceting (creating small multiples). We then use `matplotlib` to make very specific adjustments - things like adding the title, adjusting the locations of the plots, and sizing the graph space. This is a pretty prototypical use of the power of these two libraries together.

More on [`seaborn`'s set_style function](https://seaborn.pydata.org/generated/seaborn.set_style.html).
More on [`matplotlib`'s figure (fig) API](https://matplotlib.org/api/figure_api.html).

In [None]:
## Seaborn offers a powerful tool called FacetGrid for making small multiples of matplotlib graphs:

### Create an empty set of grids:
facet_histograms = sns.FacetGrid(df_jobs[subset_values], row='state', col='quarter')

## "map" a histogram to each grid:
facet_histograms = facet_histograms.map(plt.hist, 'wage')

## Data Sourcing:
plt.annotate('Source: Missouri LEHD', 
             xy=(0.5,-0.35), xycoords="axes fraction")
plt.show()

In [None]:
# Seaborn's set_style function allows us to set many aesthetic parameters.
sns.set_style("white")

### Create an empty set of grids:
facet_histograms = sns.FacetGrid(df_jobs[subset_values], row='state', col='quarter')
## "map" a histogram to each grid:
facet_histograms = facet_histograms.map(plt.hist, 'wage')

## We can still change options with matplotlib, using facet_histograms.fig
facet_histograms.fig.subplots_adjust(top=0.9)
facet_histograms.fig.suptitle("Earnings for 99% of the jobs held by the cohort", fontsize=14)
facet_histograms.fig.set_size_inches(12,8)

## Data Sourcing:
facet_histograms.fig.text(x=0.5, y=-0.05, s='Source: Missouri LEHD',
                         fontsize=12)

plt.show()

## Exploring cohort employment

Question: what are employment patterns of our cohort?

In [None]:
# reminder of what columns we have in our two DataFrames
print(df.columns.tolist())
print('') # just to add a space
print(df_jobs.columns.tolist())

In [None]:
# also check the total rows in the two datasets, and the number of unique individuals in our jobs data
print(df.shape[0], df['deident_id'].nunique())
print(df_jobs.shape[0], df_jobs['ssn'].nunique())

In [None]:
# how many in our cohort had any job during each quarter
df_jobs.groupby(['year', 'quarter'])['ssn'].nunique().plot(kind='bar', grid=False);

In [None]:
# did individuals have more than one job in a given quarter?
df_jobs.groupby(['year', 'quarter', 'ssn'])['ein'].count().sort_values(ascending=False).head()

In [None]:
# enter one of the ssn values here to see what the underlying data looks like
ssn_to_view = 'replace-this-text-with-ssn'

df_jobs[df_jobs['ssn']==ssn_to_view]

In [None]:
# count the number of jobs each individual had in each quarter
# where a "job" is simply that they had a record in the data
df_tmp = df_jobs.groupby(['year', 'quarter', 'ssn'])['ein'].count().unstack(['year', 'quarter'])

In [None]:
df_tmp.head(1)

In [None]:
# flatten all columns to a single name with an '_' separator:
df_tmp.columns = ['_'.join([str(c) for c in col]) for col in df_tmp.columns.values]

In [None]:
df_tmp.head()

In [None]:
# replace NaN with 0
df_tmp.fillna(0, inplace=True)

# and set values >0 to 1
df_tmp[df_tmp>0] = 1
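To see what the count, flatten, and binarize steps above do to the data, here is the same sequence on four invented job records (the `toy_jobs` frame and its identifiers are hypothetical). The binarizing step is written here as a single comparison, which is an equivalent alternative to the in-place assignment:

```python
import pandas as pd

# hypothetical job records: person A holds two jobs in Q1 and one in Q2,
# person B holds one job in Q2 only
toy_jobs = pd.DataFrame({
    'year': [2015, 2015, 2015, 2015],
    'quarter': [1, 1, 2, 2],
    'ssn': ['A', 'A', 'A', 'B'],
    'ein': ['e1', 'e2', 'e1', 'e3'],
})

# count records per person and quarter, then move the quarters into columns
job_counts = toy_jobs.groupby(['year', 'quarter', 'ssn'])['ein'].count().unstack(['year', 'quarter'])

# flatten the (year, quarter) column labels to strings such as '2015_1'
job_counts.columns = ['_'.join(str(c) for c in col) for col in job_counts.columns.values]

# a missing combination means no job, and any positive count becomes 1
employed = (job_counts.fillna(0) > 0).astype(int)
print(employed)
```

Person A ends up with the pattern (1, 1) and person B with (0, 1), so each row is now an employment pattern that can be grouped and counted.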

In [None]:
# make ID value a column instead of an index - then we can count it when we group by the 'year_q' columns
df_tmp.reset_index(inplace=True)
df_tmp.head()

In [None]:
# make a list of just the columns that start with '2015'
cols = [c for c in df_tmp.columns.values if c.startswith('2015')]

print(cols)

In [None]:
# aside on the above "list comprehension", here are the same steps one by one:

# 1- get an array of our columns
column_list = df_tmp.columns.values

# 2 - loop through each value in the array
for c in column_list:
    # 3 - check if the string starts with '2015'
    if c.startswith('2015'):
        # 4 - add the column to our new list (here we just print to demonstrate)
        print(c)

In [None]:
# group by all columns to count number of people with the same pattern
df_tmp = df_tmp.groupby(cols)['ssn'].count().reset_index()

In [None]:
# rename ssn to make it less confusing later
df_tmp.rename(columns={'ssn': 'count_ssn'}, inplace=True)

In [None]:
print('There are {} different patterns of employment in 2015'.format(df_tmp.shape[0]))

In [None]:
# total possible patterns of employment
poss_patterns = 2**len(cols)

pct_of_patterns = 100 * df_tmp.shape[0] / poss_patterns

print('With this definition of employment, our cohort shows {:.1f}% of the possible patterns'.format(pct_of_patterns))
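The `2**len(cols)` figure comes from each quarter being either 0 or 1. As a small sketch, assuming four quarterly columns, the patterns can be listed explicitly with `itertools.product`:

```python
from itertools import product

n_quarters = 4  # assumed: one 0/1 flag per quarter of the year

# every combination of employed (1) / not employed (0) across the quarters
all_patterns = list(product([0, 1], repeat=n_quarters))

print(len(all_patterns))  # 16, the same as 2**4
print(all_patterns[:3])   # [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0)]
```

Note that the all-zero pattern is included in this count even though it cannot show up in a table built only from people who have at least one job record, so the observed share of possible patterns can never quite reach 100%.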

In [None]:
# Look at just the top 10:
df_tmp.sort_values('count_ssn', ascending=False).head(10)

In [None]:
# and how many people follow other patterns
df_tmp.sort_values('count_ssn', ascending=False).tail(df_tmp.shape[0]-10)['count_ssn'].sum()

In [None]:
# grab the top 10 for a visualization
df_tmp_top = df_tmp.sort_values('count_ssn', ascending=False).head(10).reset_index()

In [None]:
# drop old index
df_tmp_top.drop(columns='index', inplace=True)

In [None]:
print('percent of employed in top 10 patterns is {:.1f}%'\
.format(100*df_tmp_top['count_ssn'].sum()/df_tmp['count_ssn'].sum()))

In [None]:
# calculate percentage of cohort in each group:
df_tmp_top['pct_cohort'] = df_tmp_top['count_ssn'].astype(float) / df['deident_id'].nunique()
df_tmp_top.head()

### A heatmap using Seaborn

In [None]:
# visualize with a simple heatmap
sns.heatmap(df_tmp_top[cols])

The default visualization leaves a lot to be desired. Now let's customize the same heatmap.

In [None]:
# Create the matplotlib object so we can tweak graph properties later
fig, ax = plt.subplots(figsize = (14,8))

# create the list of labels we want on our y-axis
ylabs = ['{:.2f}%'.format(x*100) for x in df_tmp_top['pct_cohort']]

# make the heatmap
sns.heatmap(df_tmp_top[cols], linewidths=0.01, linecolor='grey', yticklabels=ylabs, cbar=False, cmap="Blues")

# make y-labels horizontal and change tickmark font size
plt.yticks(rotation=360, fontsize=12)
plt.xticks(fontsize=12)

# add axis labels
ax.set_ylabel('Percent of cohort', fontsize=16)
ax.set_xlabel('Quarter', fontsize=16)

## Data Sourcing:
ax.annotate('Source: Missouri LEHD', 
            xy=(0.5,-0.15), xycoords="axes fraction", fontsize=12)

## add a title
fig.suptitle('Top 10 most common employment patterns of cohort', fontsize=18)
ax.set_title('Blue is "employed" and white is "not employed"', fontsize=12)

plt.show()

### Full quarter employment

Define full quarter employment as "paid by same employer in both the quarter before and after"

In [None]:
# check if employed full quarter either 2nd quarter or 3rd quarter of 2015
full_qtr = df_jobs.pivot_table(index=['ssn', 'ein'], columns='quarter', values=['wage', 'weeks'])

# index for full empl in 2nd quarter
idx2 = full_qtr[[('wage', 1), ('wage', 2), ('wage', 3)]].notnull().sum(1)==3

# index for full empl in 3rd quarter
idx3 = full_qtr[[('wage', 2), ('wage', 3), ('wage', 4)]].notnull().sum(1)==3
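The full-quarter rule can be checked by hand on a tiny table. The sketch below uses invented wages for four person-employer pairs (the `toy_wages` frame is hypothetical) and also computes an average over the qualifying quarters; it is an illustration of the definition, not necessarily the exact calculation used in this notebook:

```python
import pandas as pd

# hypothetical quarterly wages for four person-employer pairs (NaN = no record)
toy_wages = pd.DataFrame(
    {1: [100.0, None, 100.0, 100.0],
     2: [200.0, 200.0, 200.0, None],
     3: [300.0, 300.0, 300.0, 300.0],
     4: [None, 400.0, 400.0, 400.0]},
    index=['pair_a', 'pair_b', 'pair_c', 'pair_d'])

# full-quarter employment in Q2 needs wages in Q1, Q2 and Q3; Q3 needs Q2, Q3 and Q4
full_q2 = toy_wages[[1, 2, 3]].notnull().all(axis=1)
full_q3 = toy_wages[[2, 3, 4]].notnull().all(axis=1)

# sum the wages of the qualifying quarters, then divide by how many qualified
n_full = full_q2.astype(int) + full_q3.astype(int)
total = toy_wages[2].where(full_q2, 0) + toy_wages[3].where(full_q3, 0)
avg_full_qtr = (total / n_full).where(n_full > 0)
print(avg_full_qtr)
```

Here `pair_a` qualifies only in Q2 (average 200), `pair_b` only in Q3 (300), `pair_c` in both (250), and `pair_d` in neither, so its average is missing.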

In [None]:
full_qtr.shape

In [None]:
full_qtr.head()

In [None]:
# calculate average full quarter earnings

full_qtr['avg_full_qtr_wages'] = \
((full_qtr[('wage', 2)]*idx2) + (full_qtr[('wage', 3)]*idx3))\
/ (idx2.astype(int) + idx3.astype(int))

# subset to only those fully employed in one or both quarters
full_qtr = full_qtr[full_qtr['avg_full_qtr_wages'].notnull()]

In [None]:
full_qtr.head()

In [None]:
full_qtr.shape

In [None]:
# rename columns for easier reference
full_qtr.columns = ['_'.join([str(c) for c in col]).strip() for col in full_qtr.columns.values]
full_qtr.head()

In [None]:
full_qtr.reset_index(inplace=True)

In [None]:
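# histplot with a density curve (distplot is deprecated in recent seaborn releases)
sns.histplot(full_qtr['avg_full_qtr_wages_'], kde=True, stat='density');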

## Exporting Completed Graphs
- Back to [Table of Contents](#Table-of-Contents)

When you are satisfied with your visualization, you may want to save a copy outside of your notebook. You can do this with `matplotlib`'s `savefig` function. You simply need to run:

    plt.savefig("fileName.fileExtension")

The file extension is surprisingly important. Image formats like png and jpeg are **not ideal**: they store your graph as a grid of pixels, which is space-efficient but can't be edited later. Saving your visualizations as a PDF instead is strongly advised. PDF is a vector format, which means all the components of the graph (text, lines, shapes) are stored as separate objects.

With PDFs, you can later open the image in a program like Adobe Illustrator and make changes like the size or typeface of your text, move your legends, or adjust the colors of your visual encodings. All of this would be impossible with a png or jpeg.
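As a small sketch of how the extension drives the output format, the code below saves one made-up chart twice. The temporary folder and the file names are our own choices for the example; `bbox_inches='tight'` is an optional `savefig` argument that helps keep text placed outside the axes, such as a source note, inside the saved image:

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([2011, 2012, 2013], [1, 2, 4])  # made-up values, for illustration only
ax.set_title('Toy line chart')

# write into a temporary folder so the sketch does not touch the project files
out_dir = Path(tempfile.mkdtemp())
pdf_path = out_dir / 'toy_chart.pdf'
png_path = out_dir / 'toy_chart.png'

# the file extension selects the output format
fig.savefig(pdf_path, bbox_inches='tight')           # vector output
fig.savefig(png_path, dpi=300, bbox_inches='tight')  # raster output; dpi sets the resolution
```

If a pixel format is unavoidable, a higher `dpi` at least keeps the image sharp when it is enlarged.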

In [None]:
## Let's save the employment patterns heatmap we created earlier
## below just copied and pasted from above:

# Create the matplotlib object so we can tweak graph properties later
fig, ax = plt.subplots(figsize = (14,8))

# create the list of labels we want on our y-axis
ylabs = ['{:.2f}%'.format(x*100) for x in df_tmp_top['pct_cohort']]

# make the heatmap
sns.heatmap(df_tmp_top[cols], linewidths=0.01, linecolor='grey', yticklabels=ylabs, cbar=False, cmap="Blues")

# make y-labels horizontal and change tickmark font size
plt.yticks(rotation=360, fontsize=12)
plt.xticks(fontsize=12)

# add axis labels
ax.set_ylabel('Percent of cohort', fontsize=16)
ax.set_xlabel('Quarter', fontsize=16)

## Data Sourcing:
ax.annotate('Source: Missouri LEHD', 
            xy=(0.5,-0.15), xycoords="axes fraction", fontsize=12)

## add a title
fig.suptitle('Top 10 most common employment patterns of cohort', fontsize=18)
ax.set_title('Blue is "employed" and white is "not employed"', fontsize=12)

fig.savefig('./images/cohort_empl_patterns.pdf')

## Choosing a Data Visualization Package

- Back to [Table of Contents](#Table-of-Contents)

You can read more about different options for data visualization in Python in the [Additional Resources](#Additional-Resources) section at the bottom of this notebook. 

`matplotlib` is very expressive, meaning it has functionality that can easily account for fine-tuned graph creation and adjustment. However, this also means that `matplotlib` is somewhat more complex to code.

`seaborn` is a higher-level visualization module, which means it is much less expressive and flexible than matplotlib, but far more concise and easier to code.

It may seem like we need to choose between these two approaches, but this is not the case! Since `seaborn` is itself built on `matplotlib` (you will sometimes see `seaborn` called a `matplotlib` 'wrapper'), we can use `seaborn` for making graphs quickly and then `matplotlib` for specific adjustments. Whenever you see `plt` referenced in the code in this notebook, we are using `matplotlib`'s pyplot submodule.
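The 'wrapper' relationship is easy to see in practice: axes-level `seaborn` functions such as `barplot` draw onto a `matplotlib` `Axes` and return it. The sketch below uses invented rows (the `toy_jobs` frame is hypothetical) to show one way of combining the two:

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.axes import Axes

# hypothetical toy data standing in for df_jobs
toy_jobs = pd.DataFrame({
    'quarter': [1, 1, 2, 2, 3, 3],
    'wage': [1000, 3000, 2000, 4000, 2500, 4500],
})

# create the matplotlib objects first, then let seaborn draw onto them
fig, ax = plt.subplots(figsize=(8, 5))
returned_ax = sns.barplot(x='quarter', y='wage', data=toy_jobs, ax=ax)

# seaborn hands back the same matplotlib Axes it drew on
print(returned_ax is ax, isinstance(returned_ax, Axes))

# from here on, plain matplotlib calls handle the fine adjustments
ax.set_title('Average toy wage by quarter')
ax.set_xlabel('Quarter')
ax.set_ylabel('Average wage ($)')
```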


`seaborn` also improves on `matplotlib` in important ways, such as making it easier to visualize regression model results, creating small multiples, enabling better color palettes, and improving default aesthetics. From [`seaborn`'s documentation](https://seaborn.pydata.org/introduction.html):

> If matplotlib 'tries to make easy things easy and hard things possible', seaborn tries to make a well-defined set of hard things easy too. 

### An Important Note on Graph Titles
- Back to [Table of Contents](#Table-of-Contents)

The title of a visualization occupies the most valuable real estate on the page. If nothing else, you can be reasonably sure a viewer will at least read the title and glance at your visualization. This is why you want to put thought into making a clear and effective title that acts as a **narrative** for your chart. Many novice visualizers default to an **explanatory** title, something like: "Average Wages Over Time (2006-2016)". This title is correct - it just isn't very useful. This is particularly true since any good graph will already explain what the visualization is through its axes and legends. Instead, use the title to reinforce and explain the core point of the visualization. It should answer the question "Why is this graph important?" and focus the viewer on the most critical take-away.

---

## Additional Resources

* [Data-Viz-Extras](../notebooks_additional/Data-Viz-extras.ipynb) notebook in the "notebooks_additional" folder

* [A Thorough Comparison of Python's DataViz Modules](https://dsaber.com/2016/10/02/a-dramatic-tour-through-pythons-data-visualization-landscape-including-ggplot-and-altair)

* [Seaborn Documentation](http://seaborn.pydata.org)

* [Matplotlib Documentation](https://matplotlib.org)

* [Advanced Functionality in Seaborn](http://blog.insightdatalabs.com/advanced-functionality-in-seaborn)

* Other Python Visualization Libraries:
    * [`Bokeh`](http://bokeh.pydata.org)
    * [`Altair`](https://altair-viz.github.io)
    * [`ggplot`](http://ggplot.yhathq.com)
    * [`Plotly`](https://plot.ly)