# Anscombe Quartet Data Set

## Background

While looking for background on Anscombe's quartet, I found some interesting information about Francis Anscombe himself.

Frank Anscombe, 13 May 1918 – 17 October 2001

Born in Hove, England.

![Frank Anscombe](FrancisAnscombe.jpg)

Frank Anscombe studied at Cambridge University, graduating with first-class honors in mathematics in 1939. During the Second World War he served in the British Ministry of Supply, where he contributed to a project for aiming anti-aircraft rockets at German bombers. He also developed a mathematical solution for firing rockets during D-Day, when a bad sequence could have resulted in projectiles falling on English forces. His solution has since been deemed a success.

After the war he returned to Cambridge University as a lecturer.

In addition to his work in statistics, Francis Anscombe had many other interests, including classical music, poetry and art. He corresponded with the poet T.S. Eliot in the 1940s.

In 1951, impressed by the then little-known surrealist painter Francis Bacon, he purchased a Bacon work on behalf of the Fitzwilliam Museum at Cambridge. After displaying the painting for a few months, the museum returned it as too modernist. When Bacon became renowned, Anscombe sold the work to pay for his children's education.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
df = pd.read_csv('anscombedataset.csv')
df

Unnamed: 0,x1,x2,x3,x4,y1,y2,y3,y4
0,10,10,10,8,8.04,9.14,7.46,6.58
1,8,8,8,8,6.95,8.14,6.77,5.76
2,13,13,13,8,7.58,8.74,12.74,7.71
3,9,9,9,8,8.81,8.77,7.11,8.84
4,11,11,11,8,8.33,9.26,7.81,8.47
5,14,14,14,8,9.96,8.1,8.84,7.04
6,6,6,6,8,7.24,6.13,6.08,5.25
7,4,4,4,19,4.26,3.1,5.39,12.5
8,12,12,12,8,10.84,9.13,8.15,5.56
9,7,7,7,8,4.82,7.26,6.42,7.91
10,5,5,5,8,5.68,4.74,5.73,6.89


In [2]:
data = np.genfromtxt("anscombedataset.csv", delimiter=',')
data

array([[  nan,   nan,   nan,   nan,   nan,   nan,   nan,   nan],
       [10.  , 10.  , 10.  ,  8.  ,  8.04,  9.14,  7.46,  6.58],
       [ 8.  ,  8.  ,  8.  ,  8.  ,  6.95,  8.14,  6.77,  5.76],
       [13.  , 13.  , 13.  ,  8.  ,  7.58,  8.74, 12.74,  7.71],
       [ 9.  ,  9.  ,  9.  ,  8.  ,  8.81,  8.77,  7.11,  8.84],
       [11.  , 11.  , 11.  ,  8.  ,  8.33,  9.26,  7.81,  8.47],
       [14.  , 14.  , 14.  ,  8.  ,  9.96,  8.1 ,  8.84,  7.04],
       [ 6.  ,  6.  ,  6.  ,  8.  ,  7.24,  6.13,  6.08,  5.25],
       [ 4.  ,  4.  ,  4.  , 19.  ,  4.26,  3.1 ,  5.39, 12.5 ],
       [12.  , 12.  , 12.  ,  8.  , 10.84,  9.13,  8.15,  5.56],
       [ 7.  ,  7.  ,  7.  ,  8.  ,  4.82,  7.26,  6.42,  7.91],
       [ 5.  ,  5.  ,  5.  ,  8.  ,  5.68,  4.74,  5.73,  6.89]])

In [10]:
X1 = data[1:,0] 
X2 = data[1:,1]
X3 = data[1:,2]
X4 = data[1:,3]
Y1 = data[1:,4]
Y2 = data[1:,5]
Y3 = data[1:,6]
Y4 = data[1:,7]
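
The first row of `data` is all `nan` because `np.genfromtxt` tries to parse the header line as numbers, which is why the slices above start at row 1. As a minimal sketch of an alternative, the header can be skipped at load time and the columns unpacked in one step. The inline `csv_text` below is a three-row stand-in for `anscombedataset.csv`, an assumption made only so the example runs on its own.

```python
import io

import numpy as np

# Hypothetical stand-in for the first rows of anscombedataset.csv
csv_text = """x1,x2,x3,x4,y1,y2,y3,y4
10,10,10,8,8.04,9.14,7.46,6.58
8,8,8,8,6.95,8.14,6.77,5.76
13,13,13,8,7.58,8.74,12.74,7.71
"""

# skip_header=1 drops the header line, so no all-NaN row is produced
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# Transposing turns each CSV column into one row, ready to unpack
X1, X2, X3, X4, Y1, Y2, Y3, Y4 = data.T

print(data.shape)
print(X1, Y1)
```

With the real file, the `io.StringIO(csv_text)` argument would simply be replaced by the file name.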

In [56]:
df.describe()

Unnamed: 0,x1,x2,x3,x4,y1,y2,y3,y4
count,11.0,11.0,11.0,11.0,11.0,11.0,11.0,11.0
mean,9.0,9.0,9.0,9.0,7.500909,7.500909,7.5,7.500909
std,3.316625,3.316625,3.316625,3.316625,2.031568,2.031657,2.030424,2.030579
min,4.0,4.0,4.0,8.0,4.26,3.1,5.39,5.25
25%,6.5,6.5,6.5,8.0,6.315,6.695,6.25,6.17
50%,9.0,9.0,9.0,8.0,7.58,8.14,7.11,7.04
75%,11.5,11.5,11.5,8.0,8.57,8.95,7.98,8.19
max,14.0,14.0,14.0,19.0,10.84,9.26,12.74,12.5


The table above suggests that the four data sets should be closely related, because their summary statistics are almost identical.

All the X columns have a mean of 9 and a standard deviation of 3.32.
All the Y columns have a mean of 7.5 and a standard deviation of 2.03.
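
These figures can be checked directly. The sketch below hard-codes the eleven rows shown in the array output above and computes the mean and the sample standard deviation (`ddof=1`, the convention `describe()` uses) for each column. The names `quartet` and `summary` are only illustrative.

```python
import numpy as np

# x1, x2 and x3 share the same values in the data set
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

quartet = {
    "x1": x123,
    "x2": x123,
    "x3": x123,
    "x4": [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    "y1": [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "y2": [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74],
    "y3": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "y4": [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89],
}

summary = {}
for name, values in quartet.items():
    arr = np.asarray(values, dtype=float)
    # ddof=1 gives the sample standard deviation (divide by n - 1)
    summary[name] = (arr.mean(), arr.std(ddof=1))
    print(f"{name}: mean={summary[name][0]:.2f}, std={summary[name][1]:.2f}")
```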

In [19]:
from scipy.stats import linregress
linregress(X1, Y1)

LinregressResult(slope=0.5000909090909091, intercept=3.0000909090909103, rvalue=0.8164205163448399, pvalue=0.00216962887307879, stderr=0.11790550059563408)

In [20]:
from scipy.stats import linregress
linregress(X2, Y2)

LinregressResult(slope=0.5000000000000001, intercept=3.000909090909089, rvalue=0.816236506000243, pvalue=0.0021788162369107845, stderr=0.11796374596764074)

In [21]:
from scipy.stats import linregress
linregress(X3, Y3)

LinregressResult(slope=0.4997272727272729, intercept=3.002454545454544, rvalue=0.8162867394895984, pvalue=0.002176305279228015, stderr=0.11787766222100221)

In [22]:
from scipy.stats import linregress
linregress(X4, Y4)

The fourth data set gives essentially the same fit as the other three: a slope of about 0.4999, an intercept of about 3.0017 and a correlation coefficient of about 0.8165.
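
The regressions can also be run side by side for all four pairs, including (x4, y4). The sketch below is one way to do that, not the notebook's own method: it swaps `linregress` for `np.polyfit` (a degree-1 least-squares fit) and `np.corrcoef`, and hard-codes the rows from the array output above so that it runs without the CSV file. The names `pairs` and `fits` are illustrative.

```python
import numpy as np

# x1, x2 and x3 share the same values in the data set
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]

pairs = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.1, 6.13, 3.1, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.5, 5.56, 7.91, 6.89]),
}

fits = {}
for label, (x, y) in pairs.items():
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Degree-1 polyfit returns the coefficients as [slope, intercept]
    slope, intercept = np.polyfit(x, y, 1)
    # Pearson correlation coefficient between x and y
    r = np.corrcoef(x, y)[0, 1]
    fits[label] = (slope, intercept, r)
    print(f"{label}: slope={slope:.4f}, intercept={intercept:.4f}, r={r:.4f}")
```

All four fits land on nearly the same line even though the underlying points are arranged very differently, which is why the quartet is usually examined with scatter plots as well as summary numbers.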

## References

https://en.wikipedia.org/wiki/Anscombe%27s_quartet

https://pandas.pydata.org/pandas-docs/stable/indexing.html#slicing-ranges