# Making Box Plots

## Getting ready


In addition to `plotly`, `numpy`, and `pandas`, make sure the `scipy` Python library is available in your Python environment.
You can install it using the command:

```
pip install scipy 
```

For this recipe, we will create two data sets.

1. Import the Python modules `numpy` and `pandas`. Import the [`norm`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html) and `t` objects from `scipy.stats`. These objects allow us to generate random samples from a normal distribution and from a Student's t distribution, which we will use to create the data sets for this recipe.

In [1]:
import numpy as np
import pandas as pd
from scipy.stats import norm, t

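2. Create the two data sets used in this recipe. `data1` holds a sample of 200 draws from a normal distribution with mean 2. `data2` stacks that sample with 200 draws from a Student's t distribution with 3 degrees of freedom, and adds a `Label` column that identifies each group.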

In [2]:
n = 200
sample1 = norm(loc=2).rvs(n)
sample2 = t(df=3).rvs(n)

In [3]:
data1 = pd.DataFrame({'Normal': sample1})

In [4]:
samples =  np.concatenate( (sample1, sample2))
labels = ['Normal']*n + ['t-Student']*n 
data2 = pd.DataFrame({'Data': samples, 'Label':labels})
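
The samples above change on every run because no seed is set. If you want repeatable figures, one option is to pass `random_state` to `rvs`. The sketch below rebuilds both data sets that way; the seed values are arbitrary and are not part of the original recipe.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm, t

n = 200
seed = 42  # hypothetical seed, chosen only for illustration

# Passing random_state makes the draws repeatable across runs
sample1 = norm(loc=2).rvs(n, random_state=seed)
sample2 = t(df=3).rvs(n, random_state=seed + 1)

# Same structure as the recipe's data1 and data2
data1 = pd.DataFrame({'Normal': sample1})
data2 = pd.DataFrame({
    'Data': np.concatenate((sample1, sample2)),
    'Label': ['Normal'] * n + ['t-Student'] * n,
})

print(data1.shape, data2.shape)
print(data2['Label'].value_counts())
```

Running the cell twice should now produce identical samples, which makes it easier to compare the box plots in the rest of the recipe.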

## How to do it

1. Import the `plotly.express` module as `px`

In [5]:
import plotly.express as px

2. Make a simple box plot to illustrate the distribution of the points in the `data1` data set using the function `box`

In [6]:
df = data1
fig = px.box(df, y="Normal")
fig.show()
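
To read the figure, it helps to know which numbers a box plot summarises: the quartiles, the interquartile range (IQR), and the fences beyond which points are usually drawn as outliers. The sketch below computes them with NumPy on a stand-in sample. It assumes the conventional 1.5 × IQR rule, and Plotly's quartile method may differ slightly from `np.percentile`, so the values can be close to, but not exactly, what the chart shows.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, for illustration only
values = rng.normal(loc=2, size=200)    # stand-in for the 'Normal' column

# The box spans Q1 to Q3, with a line at the median
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences; points beyond them are usually drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}, IQR={iqr:.3f}")
print(f"fences=({lower_fence:.3f}, {upper_fence:.3f}), outliers={outliers.size}")
```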

3. Add a title to your chart by passing a string as the input `title` into the function `box`, and customise the size of the figure by using the inputs `height` and `width`. Both have to be integers and correspond to the size of the figure in pixels.

In [7]:
fig = px.box(df, y="Normal",
             height = 500, width = 600,
             title='Sample from a Normal Distribution')
fig.show()

In [8]:
fig = px.box(df, x="Normal",
             height = 500, width = 800,
             title='Sample from a Normal Distribution')
fig.show()

4. Customise the color of the box using the input `color_discrete_sequence` as follows. Note that we have to pass a list of strings, where each string corresponds to a color. In this case, we pass the color `teal`

In [9]:
fig = px.box(df, x="Normal",
             color_discrete_sequence=['teal'],
             height = 500, width = 800,
             title='Sample from a Normal Distribution')
fig.show()

In [10]:
fig = px.box(df, x="Normal", 
             notched = True,
             color_discrete_sequence=['teal'],
             height = 500, width = 800,
             title='Sample from a Normal Distribution')
fig.show()
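
Setting `notched = True` narrows the box around the median to show a confidence interval for it. The Plotly reference for the box trace describes this interval as the median ± 1.57 × IQR / √N. The sketch below reproduces that formula with NumPy on a stand-in sample; treat it as an approximation of what is drawn, not as Plotly's own code.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded generator, for illustration only
values = rng.normal(loc=2, size=200)    # stand-in for the 'Normal' column

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Half-width of the notch: 1.57 * IQR / sqrt(N)
half_width = 1.57 * iqr / np.sqrt(values.size)
notch_low, notch_high = median - half_width, median + half_width

print(f"median={median:.3f}, notch=({notch_low:.3f}, {notch_high:.3f})")
```

Because the half-width shrinks with √N, larger samples give narrower notches.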

In [11]:
fig = px.box(df, x="Normal",
             points="suspectedoutliers",
             notched = True,
             color_discrete_sequence=['orange'],
             height = 500, width = 800,
             title='Sample from a Normal Distribution')
fig.show()

In [12]:
fig = px.box(df, x="Normal",
             points="all",
             notched = True,
             color_discrete_sequence=['purple'],
             height = 500, width = 800,
             title='Sample from a Normal Distribution')
fig.show()

In [13]:
df = data2

In [14]:
df.head()

|   | Data     | Label  |
|---|----------|--------|
| 0 | 0.766134 | Normal |
| 1 | 2.936433 | Normal |
| 2 | 1.698716 | Normal |
| 3 | 3.072707 | Normal |
| 4 | 1.434231 | Normal |


In [15]:
fig = px.box(df, x="Data", 
             color="Label",
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [16]:
fig = px.box(df, x="Data", 
             color="Label",
             color_discrete_sequence=['teal', 'purple'],
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [17]:
fig = px.box(df, x="Data", 
             color="Label",
             color_discrete_sequence=px.colors.qualitative.Prism,
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [18]:
fig = px.box(df, x="Data", 
             color="Label",
             notched=True,
             color_discrete_sequence=['teal', 'purple'],
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [19]:
fig = px.box(df, x="Data", 
             color="Label",
             points="suspectedoutliers",
             notched=True,
             color_discrete_sequence=['teal', 'purple'],
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [20]:
fig = px.box(df, x="Data", 
             color="Label",
             points="all",
             notched=True,
             color_discrete_sequence=['teal', 'purple'],
             height = 500, width = 800,
             title='Box Plots')
fig.show()

In [21]:
fig = px.box(df, x="Data", 
             color="Label",
             boxmode="overlay",
             notched=True,
             color_discrete_sequence=['teal', 'purple'],
             height = 500, width = 800,
             title='Box Plots')
fig.show()