# Anscombe Quartet Dataset

By alexmahesh  

The Anscombe Quartet consists of four datasets that have nearly identical
descriptive statistics but look very different when plotted.  
It was created in 1973 by the statistician Francis Anscombe to show, among other things, the 
importance of using graphics and charts in data analysis.  
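
To make "nearly identical descriptive statistics" concrete, here is a minimal sketch that computes the mean, sample variance and correlation for all four datasets. It uses only the standard library so it runs without the CSV file; the values are hard-coded from Anscombe's published table and are assumed to match `data/anscombe.csv`. The names `quartet`, `pearson_r` and `summarize` are made up for this illustration.

```python
from statistics import mean, variance

# Values from Anscombe's 1973 table; assumed to match data/anscombe.csv.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x is shared by datasets 1-3
quartet = {
    "1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Collect the statistics that are compared across the quartet."""
    return {
        "mean_x": mean(xs), "var_x": variance(xs),
        "mean_y": mean(ys), "var_y": variance(ys),
        "corr": pearson_r(xs, ys),
    }


summaries = {name: summarize(xs, ys) for name, (xs, ys) in quartet.items()}
for name, stats in summaries.items():
    print(f"Dataset {name}:", {k: round(v, 2) for k, v in stats.items()})
```

All four rows should agree to about two decimal places (mean of x is 9, mean of y is about 7.5, correlation is about 0.82), which is why the statistics alone cannot tell the datasets apart.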


Here I do a quick visual exploratory data analysis (EDA) and try out the steps I plan to use in the Streamlit dashboard.

## Import the needed Python libraries

In [27]:
import pandas as pd
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go

## Read the Anscombe Data Set

In [28]:
df = pd.read_csv('data/anscombe.csv')
df.head()

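|   | x1 | x2 | x3 | x4 | y1 | y2 | y3 | y4 |
|---|---|---|---|---|---|---|---|---|
| 0 | 10.0 | 10.0 | 10.0 | 8.0 | 8.04 | 9.14 | 7.46 | 6.58 |
| 1 | 8.0 | 8.0 | 8.0 | 8.0 | 6.95 | 8.14 | 6.77 | 5.76 |
| 2 | 13.0 | 13.0 | 13.0 | 8.0 | 7.58 | 8.74 | 12.74 | 7.71 |
| 3 | 9.0 | 9.0 | 9.0 | 8.0 | 8.81 | 8.77 | 7.11 | 8.84 |
| 4 | 11.0 | 11.0 | 11.0 | 8.0 | 8.33 | 9.26 | 7.81 | 8.47 |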


## Compare the four datasets visually by drawing some scatter plots.

In [29]:
fig = make_subplots(
    rows=2,
    cols=2,
    subplot_titles=('Dataset 1', 'Dataset 2', 'Dataset 3', 'Dataset 4')
)

# Place dataset i in the 2x2 grid: 1 -> (1, 1), 2 -> (1, 2), 3 -> (2, 1), 4 -> (2, 2)
for i in range(1, 5):
    fig.add_trace(
        go.Scatter(x=df[f'x{i}'], y=df[f'y{i}'], mode='markers'),
        row=(i - 1) // 2 + 1,
        col=(i - 1) % 2 + 1,
    )

fig.update_layout(title_text='Anscombe Quartet', showlegend=False)
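
The scatter plots matter because a fitted straight line does not separate the four datasets either. As a rough sketch, the ordinary least-squares line can be computed for each dataset with the standard library alone. The values are again hard-coded from Anscombe's published table (assumed to match `data/anscombe.csv`), and `least_squares_line` is a hypothetical helper, not part of the notebook.

```python
from statistics import mean

# Values from Anscombe's 1973 table; assumed to match data/anscombe.csv.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x is shared by datasets 1-3
quartet = {
    "1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
          [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


lines = {name: least_squares_line(xs, ys) for name, (xs, ys) in quartet.items()}
for name, (slope, intercept) in lines.items():
    print(f"Dataset {name}: y = {slope:.2f} * x + {intercept:.2f}")
```

Each dataset should give roughly the same line, a slope near 0.5 and an intercept near 3, even though the point clouds in the four subplots look nothing alike.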

## Try out what I want to do in Streamlit.  
I want to create a simple, small dashboard where you can select one of the four datasets with an
interactive widget (e.g. a dropdown, slider or radio buttons).  
After a dataset is chosen, its values, some descriptive statistics and its graph are shown to the user.

### Interactive Widget

In [34]:
# Choose dataset
# This has to be replaced in Streamlit with the interactive widget.
dataset = "2"

### Show the values of the chosen dataset

In [35]:
df[[f"x{dataset}", f"y{dataset}"]]

|   | x2 | y2 |
|---|---|---|
| 0 | 10.0 | 9.14 |
| 1 | 8.0 | 8.14 |
| 2 | 13.0 | 8.74 |
| 3 | 9.0 | 8.77 |
| 4 | 11.0 | 9.26 |
| 5 | 14.0 | 8.1 |
| 6 | 6.0 | 6.13 |
| 7 | 4.0 | 3.1 |
| 8 | 12.0 | 9.13 |
| 9 | 7.0 | 7.26 |


### Show some descriptive statistics for the chosen dataset

In [36]:
df[[f"x{dataset}", f"y{dataset}"]].describe()

|   | x2 | y2 |
|---|---|---|
| count | 11.0 | 11.0 |
| mean | 9.0 | 7.500909 |
| std | 3.316625 | 2.031657 |
| min | 4.0 | 3.1 |
| 25% | 6.5 | 6.695 |
| 50% | 9.0 | 8.14 |
| 75% | 11.5 | 8.95 |
| max | 14.0 | 9.26 |


### Show the graph of the chosen dataset

In [37]:
fig = px.scatter(df, x=f"x{dataset}", y=f"y{dataset}")
fig.show()
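
### Sketch of the Streamlit script

The steps tried out above could be combined into a Streamlit script roughly like the following. This is only a sketch under a few assumptions: the file name `app.py`, the helper `dataset_columns`, the constants `DATASETS` and `CSV_PATH`, and the choice of `st.selectbox` as the widget are all mine (a radio button via `st.radio` would work the same way). The third-party imports sit inside `main()` so that the helper can be used and checked without Streamlit installed.

```python
# app.py -- a sketch of the dashboard; the file name is an assumption.

DATASETS = ["1", "2", "3", "4"]
CSV_PATH = "data/anscombe.csv"  # same file the notebook reads


def dataset_columns(dataset):
    """Return the x and y column names for a dataset label such as "2"."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    return [f"x{dataset}", f"y{dataset}"]


def main():
    # Imported here so dataset_columns() works without these packages.
    import pandas as pd
    import plotly.express as px
    import streamlit as st

    df = pd.read_csv(CSV_PATH)

    st.title("Anscombe Quartet")

    # Replaces the hard-coded `dataset = "2"` from the notebook.
    dataset = st.selectbox("Dataset", DATASETS)
    x_col, y_col = dataset_columns(dataset)
    chosen = df[[x_col, y_col]]

    st.subheader("Values")
    st.dataframe(chosen)

    st.subheader("Descriptive statistics")
    st.dataframe(chosen.describe())

    st.subheader("Graph")
    st.plotly_chart(px.scatter(df, x=x_col, y=y_col))


# Streamlit runs the script from top to bottom, so the real app.py
# would end with a call to main().
```

With a final `main()` line added, the script could be started with `streamlit run app.py`. Streamlit reruns the whole script whenever the widget value changes, so no explicit callback is needed to refresh the table, the statistics and the chart.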