# Statistical Charts in Plotly

In this activity, we'll explore some of the most common statistical charts made in Plotly that you will need for data science and analytics careers.

### Exercise

Read through the comments and code, then run each code block to see the chart appear. Then create your own chart using the instructions below.

In [None]:
# Import plotly and plotly express
import plotly as pt
import plotly.express as px
# Import pandas to use a dataframe
import pandas as pd

# Histogram

The histogram is often used in statistics to show the distribution of numerical data. The range of the data is split into bins, and the height of each bar shows how many values fall into that bin. The y-axis shows the aggregated value for each bin, which is the count of values by default; the `histfunc` parameter switches it to another aggregate, such as a sum. The chart below bins a numeric column from a small data set.

`.histogram()`

Parameters:
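
* `data_frame` : the data set (a pandas DataFrame or a dict of columns)
* `x` : the numeric column to bin
* `nbins` : the maximum number of bins (Plotly may choose fewer)
* `color` : a column whose values each get their own color
* `color_discrete_sequence` : the list of colors to use, in order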

More information on histograms: https://plotly.com/python/histograms/ 
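To make the binning idea concrete, here is a minimal pure-Python sketch of what a histogram computes, using the `found` and `cost_each` values from the dog toy data in the next cell. It assumes plain equal-width bins; Plotly picks its own bin edges and treats `nbins` as an upper limit, so its bars may not line up exactly with these. The per-bin count corresponds to Plotly's default `histfunc='count'`, and the per-bin total corresponds to `histfunc='sum'` with `y='cost_each'`. The helper name `binned_aggregate` is hypothetical.

```python
# A minimal sketch of equal-width binning, the idea behind a histogram.
# Assumption: plain equal-width bins. Plotly chooses its own bin edges,
# so its bars may differ from these.

found = [2403, 3410, 1743, 5321, 9450, 467, 310]
cost_each = [2.00, 4.50, 7.00, 12.00, 5.50, 19.00, 14.50]


def binned_aggregate(values, weights, nbins):
    """Hypothetical helper: split `values` into `nbins` equal-width bins and
    return (edges, counts, sums), where sums adds up `weights` per bin."""
    lo, hi = min(values), max(values)
    # Fall back to width 1.0 when all values are equal, to avoid dividing by zero
    width = (hi - lo) / nbins or 1.0
    edges = [lo + i * width for i in range(nbins + 1)]
    counts = [0] * nbins
    sums = [0.0] * nbins
    for v, w in zip(values, weights):
        # Clamp so the maximum value lands in the last bin, not one past it
        idx = min(int((v - lo) / width), nbins - 1)
        counts[idx] += 1   # like histfunc='count'
        sums[idx] += w     # like histfunc='sum' with y='cost_each'
    return edges, counts, sums


edges, counts, sums = binned_aggregate(found, cost_each, nbins=4)
for left, right, c, s in zip(edges, edges[1:], counts, sums):
    print(f"{left:7.0f} to {right:7.0f}: count={c}, cost sum={s:.2f}")
```

Most toys land in the lowest bin here, which is the kind of skew a histogram makes visible at a glance.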

In [None]:
# Data set for dog toys found in San Francisco
data = {
    'toy': ['bone','ball','rope','plushie','squeaky_toy','kibble_puzzle','wind_up'],
    'found': [2403, 3410, 1743, 5321, 9450, 467, 310],
    'cost_each': [2.00, 4.50, 7.00, 12.00, 5.50, 19.00, 14.50]
}

# color_discrete_sequence takes a list [] of CSS color names; color='toy' gives each toy its own color from that list
fig = px.histogram(data, x='found', nbins=7, color='toy',
                   color_discrete_sequence=['darkred', 'indianred', 'salmon', 'red', 'pink', 'palevioletred', 'crimson'], width=600, height=300)

# Name the axes something other than the default column names
fig.update_layout(xaxis_title='Toys found', yaxis_title='Toy type count')
fig.show()

# Box Plot

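The box plot summarizes a distribution with descriptive statistics. The quartiles divide the data into four parts: the box spans the first quartile (Q1) to the third quartile (Q3), and the line inside the box marks the median. The whiskers extend toward the lowest and highest values; by default, Plotly stops them at the most extreme points within 1.5 times the interquartile range (Q3 - Q1) of the box and draws any points beyond that as outliers.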

`.box()`

Using the built-in Tips data set from Plotly Express, we can compare how the total bill is distributed at lunch and at dinner, with separate boxes for smokers and non-smokers.

For more information on box plots: https://plotly.com/python/box-plots/
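The numbers behind a single box can be sketched with Python's standard `statistics` module. This is an illustration under stated assumptions, not Plotly's exact algorithm: it uses `statistics.quantiles(..., method="inclusive")` for the quartiles, which may differ slightly from Plotly's default quartile method, and applies the 1.5 times IQR rule to decide where the whiskers stop. It reuses the `found` values from the dog toy data above; the helper name `box_summary` is hypothetical.

```python
import statistics

# 'found' values from the dog toy data set in the histogram section
found = [2403, 3410, 1743, 5321, 9450, 467, 310]


def box_summary(values):
    """Hypothetical helper: quartiles, whisker ends, and outliers for one box."""
    # Assumption: 'inclusive' quartiles; Plotly's quartile method may differ
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # most extreme points within the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < lower_fence or v > upper_fence),
    }


print(box_summary(found))
```

With these quartiles the 9,450 value falls beyond the upper fence. A different quartile method moves the fences, so Plotly's chart could classify that point differently.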

In [None]:
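# Import Plotly's lower-level graph objects module. The px.box call below does not need it
# (it colors the boxes through its own color parameter), but graph objects allow finer control over traces.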
import plotly.graph_objects as go

In [None]:
df = px.data.tips()
# Notches mark an approximate confidence interval around each median,
# which makes it easier to see whether the medians of two boxes differ.
# The color parameter names the column whose values get separately colored boxes.
# The tips data set includes a 'smoker' column; run df.columns (or df.head())
# to see every column in a built-in data set.
fig = px.box(df, x="time", y="total_bill", notched=True, color='smoker', width=700, height=350)
fig.show()
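Plotly's box trace reference describes the notch as the interval median ± 1.57 × IQR / √N, where N is the number of points in the box. The sketch below computes that interval in plain Python for two groups and checks whether the intervals overlap; non-overlapping notches are commonly read as rough evidence that the medians differ. Because the tips data lives inside Plotly, this sketch instead splits the ice cream sales figures from the regression section at 18 degrees, an arbitrary cutoff chosen only for illustration. The helper names are hypothetical, and the quartiles come from `statistics.quantiles`, which may not match Plotly's quartile method exactly.

```python
import math
import statistics

# Ice cream data from the regression section, split at an arbitrary 18-degree cutoff
temps = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]
cooler = [s for t, s in zip(temps, sales) if t < 18]
warmer = [s for t, s in zip(temps, sales) if t >= 18]


def notch_interval(values):
    """Hypothetical helper: median +/- 1.57 * IQR / sqrt(N)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(len(values))
    return median - half_width, median + half_width


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


cool_notch = notch_interval(cooler)
warm_notch = notch_interval(warmer)
print("cooler days:", cool_notch)
print("warmer days:", warm_notch)
print("notches overlap:", intervals_overlap(cool_notch, warm_notch))
```

With only five and seven points the notches are wide and overlap, so by this rule of thumb the split gives no clear evidence that the two medians differ.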

# Regression Line Chart

Regression lines, also called trendlines, are useful in statistics for describing relationships and making forecasts. They model how a dependent variable changes with one or more independent variables. In Plotly Express, a trendline is added to a scatter plot through the `trendline` parameter.

For more options on the shape of your trendline: https://plotly.com/python/linear-fits/
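The 'ols' trendline is an ordinary least squares fit. For one independent variable it has a closed form: the slope is the sum of (x - mean x)(y - mean y) divided by the sum of (x - mean x) squared, and the intercept is mean y minus slope times mean x. The pure-Python sketch below applies those formulas to the temperature and sales data used in the next cell and reads a simple forecast off the fitted line. Plotly Express itself delegates the fit to the statsmodels package (`px.get_trendline_results(fig)` returns the fitted results), so this is only a check on the arithmetic, not Plotly's code. The helper names `ols_fit` and `predict` are hypothetical.

```python
# Outdoor temperature (x) and ice cream sales (y) from the cell below
x = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
y = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]


def ols_fit(xs, ys):
    """Hypothetical helper: least squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_mean) ** 2 for xi in xs)
    slope = sxy / sxx
    # The least squares line always passes through (x_mean, y_mean)
    intercept = y_mean - slope * x_mean
    return slope, intercept


def predict(slope, intercept, x_new):
    """Read a forecast off the fitted straight line."""
    return intercept + slope * x_new


slope, intercept = ols_fit(x, y)
print(f"sales = {intercept:.2f} + {slope:.2f} * temperature")
print("forecast at 20 degrees:", round(predict(slope, intercept, 20), 1))
```

The slope says roughly how many extra sales each additional degree is associated with in this small sample; forecasts far outside the observed temperature range deserve extra caution.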

In [None]:
# The most common regression line is linear: a straight line fitted through your data points

# Comparing outdoor temperature with ice cream total sales
x = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
y = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]

# We add the 'trendline' parameter, using the 'ols' (ordinary least squares) setting.
# Trendlines are fitted with the statsmodels package, which must be installed.
# Pass x and y as keywords: the first positional argument of px.scatter is data_frame.
fig = px.scatter(x=x, y=y, trendline='ols', width=700, height=350)
fig.show()

In [None]:
# An alternative is a non-linear trendline, using the 'lowess' (locally weighted smoothing) setting
fig = px.scatter(x=x, y=y, trendline='lowess', width=700, height=350)
fig.show()

In [None]:
# You can adjust the non-linear line to follow the data more closely by reducing frac,
# the fraction of the data used for each local fit. The default frac is 2/3 (about 0.67).
fig = px.scatter(x=x, y=y, trendline='lowess', trendline_options=dict(frac=0.5), width=700, height=350)
fig.show()
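LOWESS (locally weighted scatterplot smoothing) fits a separate weighted straight line around each point, and `frac` is the share of the data each local fit looks at. The sketch below shows that idea in pure Python with tricube weights. It is a simplified, single-pass version written for illustration: the statsmodels `lowess` function that Plotly Express calls also runs extra robustness iterations that down-weight outliers, so its curve will not match this one exactly. The helper name `lowess_one_pass` is hypothetical.

```python
import math

# Temperature (x) and sales (y) from the cells above
x = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1, 19.4, 25.1, 23.4, 18.1, 22.6, 17.2]
y = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]


def lowess_one_pass(xs, ys, frac=2 / 3):
    """Hypothetical helper: one pass of LOWESS with tricube weights, no robustness steps."""
    n = len(xs)
    # Points per local fit; the tiny epsilon guards against round-off such as 2/3 * 12
    k = min(n, max(3, int(frac * n + 1e-9)))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (x0 itself counts)
        h = sorted(abs(xi - x0) for xi in xs)[k - 1] or 1e-12
        weights = []
        for xi in xs:
            u = abs(xi - x0) / h
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)  # tricube weight
        # Weighted least squares line through the neighborhood, evaluated at x0
        w_sum = sum(weights)
        x_bar = sum(w * xi for w, xi in zip(weights, xs)) / w_sum
        y_bar = sum(w * yi for w, yi in zip(weights, ys)) / w_sum
        sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, xs))
        sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(y_bar + slope * (x0 - x_bar))
    return fitted


smooth_default = lowess_one_pass(x, y, frac=2 / 3)
smooth_tighter = lowess_one_pass(x, y, frac=0.5)
for xi, a, b in sorted(zip(x, smooth_default, smooth_tighter)):
    print(f"{xi:5.1f}: frac=0.67 -> {a:6.1f}   frac=0.5 -> {b:6.1f}")
```

A quick sanity check for any LOWESS implementation: on points that already lie on a straight line, every local fit recovers that line, whatever `frac` is.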

# Exercise

Create one of the charts presented above (the histogram, box plot, or scatterplot with regression line). Change some of the variables to get unique results. For example, what would happen if you doubled or tripled the number of data points? The graph might look very different, which means the styling might need to change to make the chart look nicer or more clearly represent the data.

In [None]:
# ADD CODE HERE


## References

* CSS Colors : https://www.w3schools.com/cssref/css_colors.asp
* Plotly Histograms in Python : https://plotly.com/python/histograms/
* Plotly Box Plots in Python : https://plotly.com/python/box-plots/ 
* Plotly Linear and Non-Linear Trendlines in Python : https://plotly.com/python/linear-fits/