# Statistical inference
* Estimation
    * Point estimation
    * Interval estimation
* Hypothesis testing

# Point estimation
* Unbiasedness
* Minimum variance
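
To make unbiasedness concrete, here is a minimal simulation sketch that is not part of the original notebook. It assumes a normal population with mean 10 and variance 16 (illustrative values), samples of size 10, and 20,000 repetitions. It compares the long-run average of the sample variance computed with divisor n-1 against the version with divisor n.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the sketch is reproducible
true_var = 16.0                     # assumed population variance (illustrative)
n, reps = 10, 20_000                # assumed sample size and number of repetitions

# Each row is one simulated sample of size n.
samples = rng.normal(loc=10.0, scale=np.sqrt(true_var), size=(reps, n))

# Average each estimator over many samples to approximate its expectation.
mean_unbiased = samples.var(axis=1, ddof=1).mean()   # divisor n-1
mean_biased = samples.var(axis=1, ddof=0).mean()     # divisor n

print(round(mean_unbiased, 2), round(mean_biased, 2))
```

The n-1 version should average close to the true variance, while the n version averages (n-1)/n times that value, which is why the n-1 divisor gives the unbiased estimator.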

## Estimator of the mean
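
The usual estimator of a population mean is the sample mean. The notation below is an assumption, since the notebook does not fix one: $X_1, \dots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$.

$$
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad E[\bar{X}] = \mu, \qquad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}
$$

The first identity says the sample mean is unbiased for $\mu$. The second shows that its variance shrinks as the sample size grows.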

## Estimator of the variance
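
The usual estimator of a population variance is the sample variance with divisor $n-1$. The notation is an assumption: $X_1, \dots, X_n$ is a random sample with variance $\sigma^2$ and sample mean $\bar{X}$.

$$
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2, \qquad E[S^2] = \sigma^2
$$

In code, pandas `Series.var()` uses the $n-1$ divisor by default, whereas NumPy's `np.var` uses divisor $n$ unless `ddof=1` is passed.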

## Estimator of the proportion
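
The usual estimator of a population proportion is the sample proportion. The notation is an assumption: $X$ is the number of successes in $n$ independent trials, each with success probability $p$.

$$
\hat{p} = \frac{X}{n}, \qquad E[\hat{p}] = p, \qquad \mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}
$$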

# Interval estimation
* Confidence interval
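
A 95% confidence level describes the procedure, not a single interval: across repeated samples, about 95% of the intervals constructed this way should contain the true parameter. The sketch below illustrates this by simulation. The population values (mean 10, standard deviation 4), the sample size, and the number of repetitions are all assumptions chosen for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)      # fixed seed so the sketch is reproducible
mu, sigma = 10.0, 4.0               # assumed true mean and known standard deviation
n, reps = 10, 10_000                # assumed sample size and number of repetitions
alpha = 0.05

z = stats.norm.ppf(1 - alpha / 2)   # upper 2.5% critical value of N(0, 1)
err = z * sigma / np.sqrt(n)        # margin of error, identical for every sample

# One sample mean per simulated sample.
sample_means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# Fraction of intervals [mean - err, mean + err] that contain the true mean.
covered = (sample_means - err <= mu) & (mu <= sample_means + err)
coverage = covered.mean()
print(round(coverage, 3))
```

The printed fraction should land near 0.95, up to simulation noise.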

## Confidence interval for the mean
* Population variance known: use the normal distribution
* Population variance unknown: use the t distribution
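
The two cases correspond to the following standard $100(1-\alpha)\%$ intervals. The notation is an assumption: $\bar{x}$ is the sample mean, $\sigma$ the known population standard deviation, $s$ the sample standard deviation, $z_{\alpha/2}$ the upper $\alpha/2$ critical value of the standard normal distribution, and $t_{\alpha/2}(n-1)$ the corresponding value for the t distribution with $n-1$ degrees of freedom.

$$
\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \quad (\sigma^2 \text{ known}), \qquad
\bar{x} \pm t_{\alpha/2}(n-1)\,\frac{s}{\sqrt{n}} \quad (\sigma^2 \text{ unknown})
$$

The t-based interval assumes the data come from a normal population, which matters most when the sample is small.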

> When the population variance is known

In [1]:
### library
import pandas as pd
import numpy as np
from scipy import stats

In [2]:
pop_var = 16
x = pd.Series([5.9, 6.8, 10.2, 14.0, 17.3, 10.1, 4.6, 9.5, 7.5, 9.8])

In [3]:
# Point estimate of the mean
round(x.mean(), 4)

9.57

In [4]:
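# 95% confidence interval, i.e. 100(1-alpha)% with alpha = 0.05
alpha = 0.05
n = x.size
c_025 = stats.norm.ppf(1-alpha/2)   # upper 2.5% critical value of N(0, 1)
pop_sd = np.sqrt(pop_var)           # known population standard deviation
err = c_025 * pop_sd / np.sqrt(n)   # margin of error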
CI_95 = pd.Series([np.mean(x)-err, np.mean(x)+err])
CI_95.round(2)

0     7.09
1    12.05
dtype: float64
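
As a cross-check, the same interval can be obtained from SciPy's `interval` method on the normal distribution. This is a sketch rather than part of the original notebook. It repeats the data and the known variance of 16 from the cells above so that it runs on its own.

```python
import numpy as np
import pandas as pd
from scipy import stats

x = pd.Series([5.9, 6.8, 10.2, 14.0, 17.3, 10.1, 4.6, 9.5, 7.5, 9.8])
pop_sd = np.sqrt(16)                # known population standard deviation

# The first argument is the confidence level, passed positionally;
# loc is the sample mean and scale is the standard error sigma / sqrt(n).
lower, upper = stats.norm.interval(0.95, loc=x.mean(), scale=pop_sd / np.sqrt(x.size))
print(round(lower, 2), round(upper, 2))
```

The result should reproduce the interval from the manual calculation.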

> When the population variance is unknown

In [5]:
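# 95% confidence interval, i.e. 100(1-alpha)% with alpha = 0.05
alpha = 0.05
n = x.size
c_025 = stats.t(n-1).ppf(1-alpha/2)   # upper 2.5% critical value of t(n-1)
x_sd = x.std()                        # sample standard deviation (pandas uses ddof=1)
err = c_025 * x_sd / np.sqrt(n)       # margin of error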
CI_95 = pd.Series([np.mean(x)-err, np.mean(x)+err])
CI_95.round(2)

0     6.85
1    12.29
dtype: float64
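
The t-based interval can be cross-checked the same way with `stats.t.interval`, using `stats.sem` for the standard error (it uses divisor n-1 by default). This is a sketch rather than part of the original notebook, and it repeats the data so that it runs on its own.

```python
import pandas as pd
from scipy import stats

x = pd.Series([5.9, 6.8, 10.2, 14.0, 17.3, 10.1, 4.6, 9.5, 7.5, 9.8])

# Confidence level is passed positionally; df is the degrees of freedom n-1,
# loc is the sample mean, and scale is the estimated standard error s / sqrt(n).
lower, upper = stats.t.interval(0.95, df=x.size - 1, loc=x.mean(), scale=stats.sem(x))
print(round(lower, 2), round(upper, 2))
```

The result should reproduce the interval from the manual calculation. It is wider than the known-variance interval because the t critical value exceeds the normal one for small samples.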

In [6]:
x.size

10

## Confidence interval for the variance

In [7]:
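# 95% confidence interval, i.e. 100(1-alpha)% with alpha = 0.05
alpha = 0.05
n = x.size
x_var = x.var()                           # sample variance (pandas uses ddof=1)
c_025 = stats.chi2(n-1).ppf(1-alpha/2)    # upper 2.5% point of chi-square(n-1)
c_975 = stats.chi2(n-1).ppf(alpha/2)      # upper 97.5% point (the 2.5% quantile)
CI_95 = pd.Series([(n-1)*x_var/c_025, (n-1)*x_var/c_975])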
CI_95.round(2)

0     6.85
1    48.23
dtype: float64
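
The calculation above rests on the standard result that $(n-1)S^2/\sigma^2$ follows a chi-square distribution with $n-1$ degrees of freedom when the data come from a normal population. The notation is an assumption: $\chi^2_{a}(n-1)$ denotes the upper $a$ point, so $\chi^2_{\alpha/2}(n-1)$ corresponds to `ppf(1-alpha/2)` and $\chi^2_{1-\alpha/2}(n-1)$ to `ppf(alpha/2)`.

$$
\left( \frac{(n-1)s^2}{\chi^2_{\alpha/2}(n-1)},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}(n-1)} \right)
$$

The larger critical value goes in the denominator of the lower bound, because dividing by a larger number gives a smaller result. The interval is not symmetric around $s^2$ because the chi-square distribution is skewed.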

## Confidence interval for the proportion

In [8]:
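n = 1000
p_hat = 540/n                              # sample proportion: 540 out of 1000
# 95% confidence interval, i.e. 100(1-alpha)% with alpha = 0.05
alpha = 0.05
c_025 = stats.norm.ppf(1-alpha/2)          # upper 2.5% critical value of N(0, 1)
err = c_025 * np.sqrt(p_hat*(1-p_hat)/n)   # margin of error (normal approximation)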
CI_95 = pd.Series([p_hat-err, p_hat+err])
CI_95.round(2)

0    0.51
1    0.57
dtype: float64
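
Since the same calculation applies to any proportion, it can be wrapped in a small helper. The function name `proportion_ci` and its signature are hypothetical, introduced only for this sketch. It implements the normal-approximation (Wald) interval used above, which is generally considered reasonable only when both the number of successes and the number of failures are fairly large.

```python
import numpy as np
from scipy import stats

def proportion_ci(successes, n, alpha=0.05):
    """Normal-approximation (Wald) interval for a proportion (hypothetical helper)."""
    p_hat = successes / n                          # sample proportion
    z = stats.norm.ppf(1 - alpha / 2)              # upper alpha/2 critical value
    err = z * np.sqrt(p_hat * (1 - p_hat) / n)     # margin of error
    return p_hat - err, p_hat + err

# Same inputs as the cell above: 540 successes out of 1000 trials.
lower, upper = proportion_ci(540, 1000)
print(round(lower, 2), round(upper, 2))
```

Called with 540 and 1000, it should reproduce the interval from the manual calculation.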

Example: titanic
* Compute a 95% confidence interval for the proportion of survivors (`survived`) in the titanic.csv data.

In [9]:
data_file = 'http://youngho.iwinv.net/data/titanic.csv'
data_raw = pd.read_csv(data_file)
data_raw.shape
data_raw.head()

Unnamed: 0.1,Unnamed: 0,pclass,survived,name,sex,age,sibsp,parch,ticket,fare,cabin,embarked,boat,body,home.dest
0,1,1st,1,"Allen, Miss. Elisabeth Walton",female,29.0,0,0,24160,211.337494,B5,Southampton,2.0,,"St Louis, MO"
1,2,1st,1,"Allison, Master. Hudson Trevor",male,0.9167,1,2,113781,151.550003,C22 C26,Southampton,11.0,,"Montreal, PQ / Chesterville, ON"
2,3,1st,0,"Allison, Miss. Helen Loraine",female,2.0,1,2,113781,151.550003,C22 C26,Southampton,,,"Montreal, PQ / Chesterville, ON"
3,4,1st,0,"Allison, Mr. Hudson Joshua Crei",male,30.0,1,2,113781,151.550003,C22 C26,Southampton,,135.0,"Montreal, PQ / Chesterville, ON"
4,5,1st,0,"Allison, Mrs. Hudson J C (Bessi",female,25.0,1,2,113781,151.550003,C22 C26,Southampton,,,"Montreal, PQ / Chesterville, ON"


In [10]:
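n = data_raw.shape[0]                      # number of passengers (rows)
p_hat = data_raw.survived.sum()/n          # equivalent to data_raw.survived.mean()
# 95% confidence interval, i.e. 100(1-alpha)% with alpha = 0.05
alpha = 0.05
c_025 = stats.norm.ppf(1-alpha/2)          # upper 2.5% critical value of N(0, 1)
err = c_025 * np.sqrt(p_hat*(1-p_hat)/n)   # margin of error (normal approximation)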
CI_95 = pd.Series([p_hat-err, p_hat+err])
CI_95.round(2)

0    0.36
1    0.41
dtype: float64

In [11]:
p_hat

0.3819709702062643

In [12]:
err

0.026320727211628304