# (Exercise) Populations and Samples

**Basic setup**

Import the NumPy and Pandas libraries as `np` and `pd`, respectively.

In [1]:
import numpy as np
import pandas as pd

Enable [Copy-on-Write mode](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy), which keeps chained indexing from silently modifying a data frame.
From Pandas 3.0 onward this behavior is the default.

In [2]:
pd.options.mode.copy_on_write = True
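
As a rough illustration of what this option changes, the sketch below modifies a column taken out of a small data frame and checks that the original is left alone. The toy frame `df` and the name `col` are hypothetical, and the sketch assumes a Pandas version in which the `mode.copy_on_write` option exists (newer releases may emit a deprecation warning when it is set, which is silenced here).

```python
import warnings
import pandas as pd

# Assumption: the option is available in the installed Pandas version.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer versions may warn that the option is deprecated
    pd.options.mode.copy_on_write = True

df = pd.DataFrame({"a": [1, 2, 3]})  # hypothetical toy data
col = df["a"]       # under Copy-on-Write this behaves like an independent copy
col.iloc[0] = 100   # changes only `col`, not `df`

print(df["a"].tolist())  # the original column
print(col.tolist())      # the modified column
```

With Copy-on-Write active, an update has to be written through the data frame itself, for example with `df.loc[...] = ...`, to take effect on `df`.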

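Limit the display of floating-point numbers in the Jupyter notebook to six decimal places.
The command below is an IPython magic that works only in Jupyter notebooks; it is not regular Python code.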

In [3]:
%precision 6

'%.6f'

The code below limits floating-point output inside data frames to six decimal places.

In [4]:
pd.set_option('display.precision', 6)

Import `matplotlib.pyplot` under the alias `plt` for data visualization.

In [5]:
import matplotlib.pyplot as plt

**Data repository directory**

Set the [base directory of the data repository](https://github.com/codingalzi/DataSci/tree/master/data) used in the code.

In [6]:
data_url = 'https://raw.githubusercontent.com/codingalzi/DataSci/refs/heads/master/data/'

## Titanic dataset

Load the dataset containing information about the passengers of the Titanic and whether they survived.

In [7]:
titanic = pd.read_csv(data_url+"titanic.csv")

Each passenger is described by 12 features.

| Feature | Meaning |
| :--- | :--- |
| PassengerId | Passenger number |
| Survived | Survival status: 0 or 1, where 1 means survived |
| Pclass | Passenger class |
| Name | Passenger name |
| Sex | Passenger sex |
| Age | Passenger age |
| SibSp | Number of siblings and spouses aboard the Titanic |
| Parch | Number of children and parents aboard the Titanic |
| Ticket | Ticket number |
| Fare | Ticket fare (British pounds) |
| Cabin | Cabin number |
| Embarked | Port of embarkation: C=Cherbourg, Q=Queenstown, S=Southampton |

In [8]:
titanic

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


**Changing the index**

First, set the `PassengerId` feature as the index.

In [9]:
titanic = titanic.set_index("PassengerId")

In [10]:
titanic

PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...
887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


Only three features are used here: `Sex`, `Age`, and `Survived`.

In [11]:
titanic = titanic[['Sex', 'Age', 'Survived']]
titanic

PassengerId,Sex,Age,Survived
1,male,22.0,0
2,female,38.0,1
3,female,26.0,1
4,female,35.0,1
5,male,35.0,0
...,...,...,...
887,male,27.0,0
888,female,19.0,1
889,female,,0
890,male,26.0,1


**Checking for missing values**

A feature with fewer `non-null` values than the dataset size of 891 contains missing values.
The `info()` method of the data frame shows that the `Age` feature has only 714 `non-null` values,
which means it has 177 missing values.

In [12]:
titanic.info()

<class 'pandas.core.frame.DataFrame'>
Index: 891 entries, 1 to 891
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sex       891 non-null    object 
 1   Age       714 non-null    float64
 2   Survived  891 non-null    int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 27.8+ KB
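
The same count can also be computed directly instead of being read off the `info()` report. The sketch below uses a small hypothetical frame `toy` in place of the Titanic data; `isna().sum()` counts missing values per column, and subtracting `count()` from the number of rows gives the same figure for a single column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic frame
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, np.nan, 26.0, np.nan],
})

# Number of missing values in every column
missing_per_column = toy.isna().sum()

# Same figure for one column: rows minus non-null entries
n_missing_age = len(toy) - toy["Age"].count()

print(missing_per_column)
print(n_missing_age)
```

Applied to the data above, the second expression corresponds to 891 - 714 = 177.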


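**Handling missing `Age` values, method 1: feature median**

The missing values of `Age` will be filled in several different ways,
so the original `titanic` dataset is left untouched and a copy is used instead.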

In [13]:
titanic_median = titanic.copy()

Replace every missing value in `Age` with the median of that feature.

In [14]:
age_median = titanic_median['Age'].median()
age_median

28.000000

In [15]:
titanic_median['Age'] = titanic_median['Age'].fillna(age_median)

In [16]:
titanic_median.info()

<class 'pandas.core.frame.DataFrame'>
Index: 891 entries, 1 to 891
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sex       891 non-null    object 
 1   Age       891 non-null    float64
 2   Survived  891 non-null    int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 27.8+ KB


**Handling missing `Age` values, method 2: per-sex median**

- Approach 1: Boolean indexing

First, copy the Titanic dataset.

In [17]:
titanic_sex_median = titanic.copy()

Check the median age of women and of men.

In [18]:
f_mask = titanic_sex_median["Sex"]=="female"
f_age_median = titanic_sex_median.loc[f_mask, "Age"].median()
print("Female median age:", f_age_median)

Female median age: 27.0


In [19]:
m_mask = titanic_sex_median["Sex"]=="male"
m_age_median = titanic_sex_median.loc[m_mask, "Age"].median()
print("Male median age:", m_age_median)

Male median age: 29.0


Using Boolean indexing, replace the missing values of each sex with that sex's median age.

In [20]:
titanic_sex_median.loc[f_mask, 'Age'] = titanic_sex_median.loc[f_mask, 'Age'].fillna(f_age_median)
titanic_sex_median.loc[m_mask, 'Age'] = titanic_sex_median.loc[m_mask, 'Age'].fillna(m_age_median)

Confirm that no missing values remain.

In [21]:
titanic_sex_median.info()

<class 'pandas.core.frame.DataFrame'>
Index: 891 entries, 1 to 891
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sex       891 non-null    object 
 1   Age       891 non-null    float64
 2   Survived  891 non-null    int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 27.8+ KB


- Approach 2: `groupby()`

First, copy the Titanic dataset.

In [22]:
titanic_sex_median = titanic.copy()

Check the median age of women and of men.

In [23]:
titanic_sex_median.groupby('Sex')['Age'].median()

Sex
female    27.0
male      29.0
Name: Age, dtype: float64

The code below replaces the missing values with the median age of the corresponding sex.

In [24]:
titanic_sex_median_age = titanic_sex_median.groupby('Sex')['Age'].apply(lambda y:y.fillna(y.median()))
titanic_sex_median_age

Sex     PassengerId
female  2              38.0
        3              26.0
        4              35.0
        9              27.0
        10             14.0
                       ... 
male    884            28.0
        885            25.0
        887            27.0
        890            26.0
        891            32.0
Name: Age, Length: 891, dtype: float64

Sort in ascending order by `PassengerId`, which sits at level 1 of the multi-index.

In [25]:
titanic_sex_median_age = titanic_sex_median_age.sort_index(level=1)
titanic_sex_median_age

Sex     PassengerId
male    1              22.0
female  2              38.0
        3              26.0
        4              35.0
male    5              35.0
                       ... 
        887            27.0
female  888            19.0
        889            27.0
male    890            26.0
        891            32.0
Name: Age, Length: 891, dtype: float64

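Assign the result as the new values of the `Age` feature.
Because the series is now in the same `PassengerId` order as the data frame, its values can be assigned positionally.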

In [26]:
titanic_sex_median.loc[:, 'Age'] = titanic_sex_median_age.values

Confirm that no missing values remain.

In [27]:
titanic_sex_median.info()

<class 'pandas.core.frame.DataFrame'>
Index: 891 entries, 1 to 891
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   Sex       891 non-null    object 
 1   Age       891 non-null    float64
 2   Survived  891 non-null    int64  
dtypes: float64(1), int64(1), object(1)
memory usage: 27.8+ KB
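
As a possible alternative to the `apply()`, `sort_index()`, and `.values` steps above, `groupby(...).transform('median')` returns a series aligned with the original index, so it can be passed straight to `fillna()`. This is only a sketch on a small hypothetical frame `toy`, not the notebook's own method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with the same layout as the Titanic frame
toy = pd.DataFrame(
    {"Sex": ["female", "male", "female", "male", "female", "male", "male"],
     "Age": [20.0, 40.0, np.nan, np.nan, 30.0, 50.0, 60.0]},
    index=pd.Index(range(1, 8), name="PassengerId"),
)

# Per-row median of the row's own sex, aligned with toy's index
group_median = toy.groupby("Sex")["Age"].transform("median")

# Fill each missing age with the median of the matching sex
toy["Age"] = toy["Age"].fillna(group_median)
print(toy)
```

Since the result keeps the original `PassengerId` index, no re-sorting is needed before the assignment.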


## Exercises

The exercises use the Titanic dataset in which the missing `Age` values have been replaced with the per-sex median age.

In [28]:
titanic = titanic_sex_median
titanic

PassengerId,Sex,Age,Survived
1,male,22.0,0
2,female,38.0,1
3,female,26.0,1
4,female,35.0,1
5,male,35.0,0
...,...,...,...
887,male,27.0,0
888,female,19.0,1
889,female,27.0,0
890,male,26.0,1


**Problem 1**

(1) Write code that draws a random 10% sample of the 891 passengers and checks its sex ratio.

Answer:

In [29]:
random_sampling = titanic.sample(frac=0.1, random_state=42)
random_sampling

PassengerId,Sex,Age,Survived
710,male,29.0,1
440,male,31.0,0
841,male,20.0,0
721,female,6.0,1
40,female,14.0,1
...,...,...,...
175,male,56.0,0
494,male,71.0,0
216,female,31.0,1
310,female,30.0,1


In [30]:
random_size = len(random_sampling)
random_size

89

The sex ratio of the sample is as follows.

- Female: 41.6%
- Male: 58.4%

In [31]:
random_sampling_ratio =  random_sampling.groupby('Sex').count()/random_size
random_sampling_ratio

Sex,Age,Survived
female,0.41573,0.41573
male,0.58427,0.58427
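
The ratio can also be obtained as a single series with `value_counts(normalize=True)`, which avoids the repeated columns produced by `groupby().count()`. The sketch below uses a hypothetical five-row sample rather than the actual `random_sampling` frame.

```python
import pandas as pd

# Hypothetical five-row sample
sample = pd.DataFrame({"Sex": ["male", "female", "male", "male", "female"]})

# Share of each category among the rows of the sample
sex_ratio = sample["Sex"].value_counts(normalize=True)
print(sex_ratio)
```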


(2) Write code that draws a 10% sample of the 891 passengers by stratified sampling on sex and checks its sex ratio.

Answer:

In [32]:
stratification = titanic.groupby('Sex', observed=True, group_keys=True)

Randomly sampling 10% within each sex yields 89 passengers.

In [33]:
stratified_sampling = stratification.apply(lambda y:y.sample(frac=0.1, random_state=42), include_groups=False)
stratified_sampling

Sex,PassengerId,Age,Survived
female,357,22.0,1
female,80,30.0,1
female,597,27.0,1
female,162,40.0,1
female,717,38.0,1
...,...,...,...
male,123,32.5,0
male,838,29.0,0
male,60,11.0,0
male,757,28.0,0


Convert the `Sex` index level into a feature.

In [34]:
stratified_sampling = stratified_sampling.reset_index(level=0)
stratified_sampling

PassengerId,Sex,Age,Survived
357,female,22.0,1
80,female,30.0,1
597,female,27.0,1
162,female,40.0,1
717,female,38.0,1
...,...,...,...
123,male,32.5,0
838,male,29.0,0
60,male,11.0,0
757,male,28.0,0


The stratified sample contains 89 passengers.

In [35]:
stratified_size = len(stratified_sampling)
stratified_size

89

The sex ratio of the stratified sample is as follows.

- Female: 34.8%
- Male: 65.2%

In [36]:
stratified_sampling_count = stratified_sampling.groupby('Sex', observed=False).count()
stratified_sampling_ratio = stratified_sampling_count / stratified_size
stratified_sampling_ratio

Sex,Age,Survived
female,0.348315,0.348315
male,0.651685,0.651685
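
A shorter route to the same kind of stratified sample is the `sample()` method of the group-by object, which draws the given fraction from each group and keeps both the original index and the grouping column. The sketch below assumes a hypothetical frame `toy` with 40 women and 60 men; it is an alternative to the `apply()` and `reset_index()` steps above, not a replacement the notebook prescribes.

```python
import pandas as pd

# Hypothetical population: 40 women and 60 men
toy = pd.DataFrame(
    {"Sex": ["female"] * 40 + ["male"] * 60,
     "Survived": [1, 0] * 50},
    index=pd.Index(range(1, 101), name="PassengerId"),
)

# Draw 10% from each sex separately
stratified = toy.groupby("Sex").sample(frac=0.1, random_state=42)

# Number of sampled rows per sex
counts = stratified["Sex"].value_counts()
print(counts)
```

Each stratum contributes in proportion to its size, so the sample's sex ratio matches the population's up to rounding.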


(3) Write code that builds a data frame comparing the results of random sampling and stratified sampling.

Answer:

The sex ratio of the full dataset is as follows.

In [37]:
stratified_ratio =  titanic.groupby('Sex').count() / len(titanic)
stratified_ratio

Sex,Age,Survived
female,0.352413,0.352413
male,0.647587,0.647587


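Build a data frame holding the sex ratios of the full dataset, the stratified sample, and the random sample.
The column labels `전체`, `층화표집`, and `무작위 추출` mean overall, stratified sampling, and random sampling, respectively.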

In [38]:
proportions = pd.concat([stratified_ratio.iloc[:, [1]], 
                         stratified_sampling_ratio.iloc[:, [1]],
                         random_sampling_ratio.iloc[:, [1]]],
                        axis=1)

proportions.columns = ['전체', '층화표집', '무작위 추출']
proportions.index.name = 'Sex'
proportions

Sex,전체,층화표집,무작위 추출
female,0.352413,0.348315,0.41573
male,0.647587,0.651685,0.58427


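Next, add the relative error (`오차율`) of the stratified and random sample sex ratios with respect to the sex ratio of the full dataset.
The sex ratio of the stratified sample comes out much closer to that of the full dataset: its relative error is about 1% in magnitude, against roughly 10% to 18% for the random sample.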

In [39]:
proportions["층화표집 오차율"] = (proportions["층화표집"] / proportions["전체"] - 1)
proportions["무작위 추출 오차율"] = (proportions["무작위 추출"] / proportions["전체"] - 1)

proportions

Sex,전체,층화표집,무작위 추출,층화표집 오차율,무작위 추출 오차율
female,0.352413,0.348315,0.41573,-0.01163,0.179668
male,0.647587,0.651685,0.58427,0.006329,-0.097774
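
The relative error used above is simply the sample ratio divided by the overall ratio, minus one. As a quick check, the sketch below recomputes it for the female row from the rounded ratios shown in the table; the helper name `relative_error` is hypothetical, and small differences in the last digits are expected because the inputs are rounded.

```python
def relative_error(sample_ratio: float, overall_ratio: float) -> float:
    """Signed relative deviation of a sample ratio from the overall ratio."""
    return sample_ratio / overall_ratio - 1


# Rounded female ratios taken from the table above
overall_female = 0.352413
stratified_female = 0.348315
random_female = 0.41573

err_stratified = relative_error(stratified_female, overall_female)
err_random = relative_error(random_female, overall_female)
print(f"{err_stratified:.4f}, {err_random:.4f}")
```

A negative value means the group is under-represented in the sample, a positive value that it is over-represented.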


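**Problem 2**

(1) Write code that assigns each `Age` value to a 10-year age bracket and adds the result as an `Age_Bucket` feature.

Answer:

Instead of `pd.cut()`, the code below computes the lower bound of each 10-year bracket directly and adds it as the `Age_Bucket` feature.
The `astype('i8')` call, equivalent to `astype('int64')`, sets the `dtype` of the values in that feature to a 64-bit integer.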

In [40]:
titanic['Age_Bucket'] = (titanic["Age"] // 10 * 10).astype('i8')

In [41]:
titanic

PassengerId,Sex,Age,Survived,Age_Bucket
1,male,22.0,0,20
2,female,38.0,1,30
3,female,26.0,1,20
4,female,35.0,1,30
5,male,35.0,0,30
...,...,...,...,...
887,male,27.0,0,20
888,female,19.0,1,10
889,female,27.0,0,20
890,male,26.0,1,20
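
For comparison, the sketch below computes the brackets both by floor division, as in the answer above, and with `pd.cut()`. The ages are hypothetical, and the bin edges (0 to 90 in steps of 10, left-closed) and integer labels are choices made here so that the two results line up.

```python
import pandas as pd

# Hypothetical ages
ages = pd.Series([0.67, 9.0, 10.0, 35.0, 80.0])

# Floor division: lower bound of the 10-year bracket
by_floor = (ages // 10 * 10).astype("int64")

# pd.cut with left-closed bins [0, 10), [10, 20), ..., [80, 90)
edges = list(range(0, 91, 10))
labels = list(range(0, 90, 10))
by_cut = pd.cut(ages, bins=edges, right=False, labels=labels).astype("int64")

print(by_floor.tolist())
print(by_cut.tolist())
```

Note that `right=False` matters: with the default right-closed bins, an age of exactly 10 would fall into the 0 bracket instead of the 10 bracket.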


(2) Write code that uses the added age-bracket information to draw a stratified 10% sample of the 891 passengers.

Answer:

Group the passengers by age bracket.

In [42]:
stratification_age = titanic.groupby('Age_Bucket', observed=True, group_keys=True)

Randomly sample 10% within each age bracket.

In [43]:
stratified_sampling_age = stratification_age.apply(lambda y:y.sample(frac=0.1, random_state=42), include_groups=False)
stratified_sampling_age

Age_Bucket,PassengerId,Sex,Age,Survived
0,756,male,0.67,1
0,825,male,2.00,0
0,8,male,2.00,0
0,828,male,1.00,1
0,51,male,7.00,0
...,...,...,...,...
50,773,female,57.00,0
50,483,male,50.00,0
60,34,male,66.00,0
60,281,male,65.00,0


Convert the `Age_Bucket` index level into a feature.

In [44]:
stratified_sampling_age = stratified_sampling_age.reset_index(level=0)
stratified_sampling_age

PassengerId,Age_Bucket,Sex,Age,Survived
756,0,male,0.67,1
825,0,male,2.00,0
8,0,male,2.00,0
828,0,male,1.00,1
51,0,male,7.00,0
...,...,...,...,...
773,50,female,57.00,0
483,50,male,50.00,0
34,60,male,66.00,0
281,60,male,65.00,0


(3) Write code that builds a data frame comparing the result of stratified sampling with that of random sampling.

Answer:

The stratified sample contains 90 passengers.

In [45]:
stratified_size_age = len(stratified_sampling_age)
stratified_size_age

90

The age-bracket ratio of the stratified sample is as follows.

In [46]:
stratified_sampling_count_age = stratified_sampling_age.groupby('Age_Bucket', observed=False).count()
stratified_sampling_ratio_age = stratified_sampling_count_age / stratified_size_age
stratified_sampling_ratio_age

Age_Bucket,Sex,Age,Survived
0,0.066667,0.066667,0.066667
10,0.111111,0.111111,0.111111
20,0.444444,0.444444,0.444444
30,0.188889,0.188889,0.188889
40,0.1,0.1,0.1
50,0.055556,0.055556,0.055556
60,0.022222,0.022222,0.022222
70,0.011111,0.011111,0.011111


Randomly sampling 10% gives the following.

In [47]:
random_sampling_age = titanic.sample(frac=0.1, random_state=42)
random_sampling_age

PassengerId,Sex,Age,Survived,Age_Bucket
710,male,29.0,1,20
440,male,31.0,0,30
841,male,20.0,0,20
721,female,6.0,1,0
40,female,14.0,1,10
...,...,...,...,...
175,male,56.0,0,50
494,male,71.0,0,70
216,female,31.0,1,30
310,female,30.0,1,30


The sample contains 89 passengers.

In [48]:
random_size_age = len(random_sampling_age)
random_size_age

89

The age-bracket ratio of the random sample is as follows.

In [49]:
random_sampling_ratio_age =  random_sampling_age.groupby('Age_Bucket').count()/random_size_age
random_sampling_ratio_age

Age_Bucket,Sex,Age,Survived
0,0.033708,0.033708,0.033708
10,0.202247,0.202247,0.202247
20,0.393258,0.393258,0.393258
30,0.191011,0.191011,0.191011
40,0.089888,0.089888,0.089888
50,0.067416,0.067416,0.067416
60,0.011236,0.011236,0.011236
70,0.011236,0.011236,0.011236


The age-bracket ratio of the full dataset is as follows.

In [50]:
stratified_ratio_age =  titanic.groupby('Age_Bucket').count() / len(titanic)
stratified_ratio_age

Age_Bucket,Sex,Age,Survived
0,0.069585,0.069585,0.069585
10,0.114478,0.114478,0.114478
20,0.445567,0.445567,0.445567
30,0.18743,0.18743,0.18743
40,0.099888,0.099888,0.099888
50,0.053872,0.053872,0.053872
60,0.021324,0.021324,0.021324
70,0.006734,0.006734,0.006734
80,0.001122,0.001122,0.001122


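Build a data frame holding the age-bracket ratios of the full dataset, the stratified sample, and the random sample.
The column labels `전체`, `층화표집`, and `무작위 추출` mean overall, stratified sampling, and random sampling, respectively.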

In [51]:
proportions_age = pd.concat([stratified_ratio_age.iloc[:, [1]], 
                         stratified_sampling_ratio_age.iloc[:, [1]],
                         random_sampling_ratio_age.iloc[:, [1]]],
                        axis=1)

proportions_age.columns = ['전체', '층화표집', '무작위 추출']
proportions_age.index.name = 'Age_Bucket'
proportions_age

Age_Bucket,전체,층화표집,무작위 추출
0,0.069585,0.066667,0.033708
10,0.114478,0.111111,0.202247
20,0.445567,0.444444,0.393258
30,0.18743,0.188889,0.191011
40,0.099888,0.1,0.089888
50,0.053872,0.055556,0.067416
60,0.021324,0.022222,0.011236
70,0.006734,0.011111,0.011236
80,0.001122,,


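Next, add the relative error (`오차율`) of the stratified and random sample age-bracket ratios with respect to the age-bracket ratio of the full dataset.
In every bracket that appears in the samples, the stratified ratio is closer to that of the full dataset, usually by a wide margin.
The sparsest brackets are the exception to the wide margin: in the 70 bracket both samples are off by about 65%, and the 80 bracket, which holds only about 0.1% of the passengers, appears in neither sample and therefore shows missing values.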

In [52]:
proportions_age["층화표집 오차율"] = (proportions_age["층화표집"] / proportions_age["전체"] - 1)
proportions_age["무작위 추출 오차율"] = (proportions_age["무작위 추출"] / proportions_age["전체"] - 1)

proportions_age

Age_Bucket,전체,층화표집,무작위 추출,층화표집 오차율,무작위 추출 오차율
0,0.069585,0.066667,0.033708,-0.041935,-0.515585
10,0.114478,0.111111,0.202247,-0.029412,0.766689
20,0.445567,0.444444,0.393258,-0.002519,-0.117397
30,0.18743,0.188889,0.191011,0.007784,0.019108
40,0.099888,0.1,0.089888,0.001124,-0.100114
50,0.053872,0.055556,0.067416,0.03125,0.251404
60,0.021324,0.022222,0.011236,0.042105,-0.473093
70,0.006734,0.011111,0.011236,0.65,0.668539
80,0.001122,,,,
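
The missing values in the last row arise because a bracket absent from a sample has no entry to align with. One way to handle this, sketched below on hypothetical ratios, is to treat an absent bracket as a ratio of 0 by reindexing the sample ratios on the overall index before computing the relative error; whether 0 is the right reading is a modeling choice, not something the notebook specifies.

```python
import pandas as pd

# Hypothetical ratios; bracket 80 is absent from the sample
overall = pd.Series({0: 0.5, 10: 0.4, 80: 0.1}, name="overall")
sample = pd.Series({0: 0.6, 10: 0.4}, name="sample")

# Align on the overall brackets, counting absent brackets as ratio 0
sample_full = sample.reindex(overall.index, fill_value=0.0)

# Relative error is now defined for every bracket
rel_err = sample_full / overall - 1
print(rel_err)
```

Under this convention a bracket that is entirely missing from the sample gets a relative error of -1, that is, 100% under-representation.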
