# <center>Margin of Error & Confidence Interval</center>

 The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic  
 prices and the demand for clean air', J. Environ. Economics & Management,  
 vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics  
 ...', Wiley, 1980.   N.B. Various transformations are used in the table on  
 pages 244-261 of the latter.

<br>

 <b>Variables in order:</b>  
 CRIM:     per capita crime rate by town  
 ZN:       proportion of residential land zoned for lots over 25,000 sq.ft.  
 INDUS:    proportion of non-retail business acres per town  
 CHAS:     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)   
 NOX:      nitric oxides concentration (parts per 10 million)  
 RM:       average number of rooms per dwelling  
 AGE:      proportion of owner-occupied units built prior to 1940   
 DIS:      weighted distances to five Boston employment centres  
 RAD:      index of accessibility to radial highways  
 TAX:      full-value property-tax rate per dollar 10,000  
 PTRATIO:  pupil-teacher ratio by town  
 B:        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town  
 LSTAT:    % lower status of the population  
 MEDV:     Median value of owner-occupied homes in dollar 1000's  

In [7]:
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.utils import shuffle

In [2]:
data_url = "http://lib.stat.cmu.edu/datasets/boston"
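# Raw string avoids the invalid-escape warning for "\s"; the first 22 lines of the file are a text header.
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)

# Each record spans two physical lines: 11 values on the first, 3 on the second.
# Even rows plus the first two values of odd rows give the 13 features.
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])

# The third value on each odd row is MEDV, the target.
target = raw_df.values[1::2, 2]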
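feat = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

In [3]:
df = pd.DataFrame(data, columns=feat)
df['target'] = target
df = shuffle(df)  # no random_state is set, so the row order differs between runs
df.head()

|     | CRIM    | ZN   | INDUS | CHAS | NOX   | RM    | AGE   | DIS    | RAD  | TAX   | PTRATIO | B      | LSTAT | target |
|-----|---------|------|-------|------|-------|-------|-------|--------|------|-------|---------|--------|-------|--------|
| 69  | 0.12816 | 12.5 | 6.07  | 0.0  | 0.409 | 5.885 | 33.0  | 6.498  | 4.0  | 345.0 | 18.9    | 396.9  | 8.79  | 20.9   |
| 211 | 0.37578 | 0.0  | 10.59 | 1.0  | 0.489 | 5.404 | 88.6  | 3.665  | 4.0  | 277.0 | 18.6    | 395.24 | 23.98 | 19.3   |
| 78  | 0.05646 | 0.0  | 12.83 | 0.0  | 0.437 | 6.232 | 53.7  | 5.0141 | 5.0  | 398.0 | 18.7    | 386.4  | 12.34 | 21.2   |
| 420 | 11.0874 | 0.0  | 18.1  | 0.0  | 0.718 | 6.411 | 100.0 | 1.8589 | 24.0 | 666.0 | 20.2    | 318.75 | 15.02 | 16.7   |
| 152 | 1.12658 | 0.0  | 19.58 | 1.0  | 0.871 | 5.012 | 88.0  | 1.6102 | 5.0  | 403.0 | 14.7    | 343.28 | 12.12 | 15.3   |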


#### Generate the sample

In [5]:
sample_size = 200
sample = df.sample(n=sample_size, random_state=1)

#### Generate z-critical, margin of error & confidence interval

In [8]:
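sample_mean = sample.target.mean()
sample_std = sample.target.std()  # pandas default ddof=1, i.e. the sample standard deviation

# Two-sided 95% interval: 2.5% in each tail, so take the 97.5th percentile of N(0, 1).
z_critical = stats.norm.ppf(q=0.975)
# Margin of error = critical value * standard error of the mean.
margin_of_error = z_critical * (sample_std / sample_size**0.5)
confidence_interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)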
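
The cell above is the usual large-sample interval for a mean. Written out, with $\bar{x}$ for `sample_mean`, $s$ for `sample_std`, $n$ for `sample_size` and $\alpha = 0.05$ for a 95% interval (the symbols are notation introduced here, not taken from the notebook):

$$
\text{ME} = z_{1-\alpha/2}\,\frac{s}{\sqrt{n}},
\qquad
\text{CI} = \left(\bar{x} - \text{ME},\ \bar{x} + \text{ME}\right),
\qquad
z_{0.975} \approx 1.96
$$

`stats.norm.ppf(q=0.975)` returns $z_{0.975}$ because $1 - \alpha/2 = 0.975$.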

In [9]:
print("Sample mean:", sample_mean)
print("Sample standard deviation:", sample_std)
print()
print("z-critical value:", z_critical)
print("Sample margin of error:", margin_of_error)
print("Sample confidence interval:", confidence_interval)

Sample mean: 22.769000000000002
Sample standard deviation: 9.203933996124979

z-critical value: 1.959963984540054
Sample margin of error: 1.275576732429162
Sample confidence interval: (21.49342326757084, 24.044576732429164)
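
Because the standard deviation is estimated from the sample rather than known, a Student's t critical value is a common alternative to the z value used here. The following is a minimal sketch of that comparison, not part of the original notebook: the helper `mean_confidence_interval` and its `use_t` flag are hypothetical names, and the summary statistics are copied from the printed output so the block runs without downloading the data.

```python
from math import sqrt

from scipy import stats


def mean_confidence_interval(mean, std, n, confidence=0.95, use_t=False):
    """Return (lower, upper) for a mean, given summary statistics.

    Hypothetical helper for illustration only.
    """
    # Upper-tail quantile for a two-sided interval, e.g. 0.975 for 95%.
    tail = 1 - (1 - confidence) / 2
    if use_t:
        # Student's t with n - 1 degrees of freedom.
        critical = stats.t.ppf(tail, df=n - 1)
    else:
        # Standard normal, as in the notebook.
        critical = stats.norm.ppf(tail)
    margin = critical * std / sqrt(n)
    return mean - margin, mean + margin


# Summary statistics copied from the printed output above.
sample_mean = 22.769000000000002
sample_std = 9.203933996124979
sample_size = 200

z_interval = mean_confidence_interval(sample_mean, sample_std, sample_size)
t_interval = mean_confidence_interval(sample_mean, sample_std, sample_size, use_t=True)

print("z-based 95% interval:", z_interval)
print("t-based 95% interval:", t_interval)
```

The z-based call should reproduce the interval printed above. The t-based interval is slightly wider, and with 199 degrees of freedom the difference is small, which is why the normal approximation is reasonable at this sample size.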
