# Final Project of Sampling Methods: Jakarta Informal Workers' Income in 2021

![title](https://www.mas-software.com/wp-content/uploads/2021/03/Laba-Adalah.jpg)

### A.	Introduction & Background

Informal workers in Jakarta often face income problems because they do not have the same rights and protections as formal workers. The main issues are as follows:

1. Inadequate wages: Many informal workers in Jakarta earn wages that are inadequate and disproportionate to the number of hours they work. They are often forced to take more than one job to make ends meet.
2. Absence of social security: Informal workers in Jakarta rarely have access to social security services such as health insurance and pensions. This means that they have no guarantee of assistance if they develop health problems or approach retirement.
3. Lack of workers' rights: Informal workers often do not have the same labor rights as formal workers, such as annual leave, rest periods, and the right to assembly and association.
4. Insecure employment: Informal workers in Jakarta often have no work contract and no guarantee of job stability. This means that they can be dismissed at any time without a clear reason.
5. Discrimination: Many informal workers in Jakarta experience discrimination and stigma from society and employers because of their status as informal workers.

Based on this background, we will build a sampling design for the income level of informal workers in DKI Jakarta in 2023, so that stakeholders can further analyze the resilience of this group of workers against potential future economic crises. The aim of this research is to obtain population parameters for income levels, namely the average and the distribution, which can later be used for public policy purposes.

### B.	Import Package

In [None]:
# Load data manipulation package
import numpy as np
import pandas as pd
import math
import copy

# Load data visualization package
import matplotlib.pyplot as plt
import seaborn as sns

# Load statistic package
from scipy import stats

### C.	Load Dataset

In [None]:
# Load dataset
prior_2021_rev = pd.read_csv('2021_Revenue.csv')
prior_2021_percent = pd.read_csv('2021_Informal.csv')
prior_2021_tingkatan = pd.read_csv('2021_Tingkat_Pendidikan.csv')
prior_2021_kota = pd.read_csv('2021_Tingkat_Kota.csv')
prior_2021_kec = pd.read_csv('2021_Tingkat_Kecamatan.csv')

### D.	Cleansing Dataset

In [None]:
# Cleanse the "Rata-rata" (average) column: drop thousands separators and whitespace, then cast to float
rev_kota_2021 = prior_2021_rev.loc[0:6,["Kabupaten/Kota","Rata-rata"]].copy()  # copy to avoid writing into a view
rev_kota_2021["Rata-rata"] = rev_kota_2021["Rata-rata"].str.replace(",","")
rev_kota_2021["Rata-rata"] = rev_kota_2021["Rata-rata"].str.strip()
rev_kota_2021["Rata-rata"] = rev_kota_2021["Rata-rata"].astype(float)
rev_kota_2021

Unnamed: 0,Kabupaten/Kota,Rata-rata
0,Kepulauan Seribu,1938617.4
1,Jakarta Selatan,2575959.43
2,Jakarta Timur,2530202.31
3,Jakarta Pusat,2593224.37
4,Jakarta Barat,2711186.59
5,Jakarta Utara,2849107.24
6,DKI Jakarta,2647851.61


### E. Sampling Plan

* Population

The population for this study is defined in three stages:
1. Stage 1 population: All regencies/cities in DKI Jakarta for each stratum
2. Stage 2 population: All sub-districts in regencies/cities in DKI Jakarta
3. Stage 3 population: All informal workers in sub-districts in DKI Jakarta

* Target population

The target population is the entire population who are informal workers in DKI Jakarta

* Sampling frame

To obtain information from the population, the authors use data from https://jakarta.bps.go.id/ based on the 2021 National Labor Force Survey (Sakernas).

* Sampling units

To study the income level of informal workers, samples are taken based on the following units:
- First unit: regencies/cities in DKI Jakarta for each stratum
- Second unit: sub-districts in regencies/cities in DKI Jakarta
- Third unit: informal workers in sub-districts in DKI Jakarta

* Observation unit

The unit of observation is the informal worker in DKI Jakarta, because the data collected is each worker's monthly income.

* Units of analysis

Because we want to examine the income level of informal workers in DKI Jakarta, the unit of analysis is the individual informal worker.

* Characteristics studied

The characteristic studied is the income level of informal workers in DKI Jakarta.

* Estimated characteristic value

The characteristic values to be estimated are the average income, the proportion of income, and the total income.

* Sampling method

The sampling method is a combination of stratification, cluster sampling, and probability proportional to size, commonly known as multistage sampling. Stratified sampling accommodates the strata of education level and regency/city, while cluster sampling selects which sub-district clusters are used. Within the selected sub-districts, workers are then sampled using simple random sampling (SRS).

### F.  Estimating the Required Sample Size

There are 4 stages in this sampling **(Multistage Sampling)**

a. Stage 1: Dividing the population into 4 strata (Stratified Sampling Stage 1)

b. Stage 2: Dividing the population into 6 strata (Stratified Sampling Stage 2)

c. Stage 3: Cluster sampling at stage 1 in each stratum (District)

d. Stage 4: Sampling with SRS

#### Stage 1: Dividing the population into 4 strata (Stratified Sampling Stage 1)

This division is based on the initial assumption that income levels differ by education level. The population is therefore divided into four strata: never attended school / did not graduate from elementary school (Tidak Sekolah), elementary school (SD / MI), junior high school (SMP / Mts), and senior high school and above (SMA ke Atas). The percentage of each stratum is obtained from DKI Jakarta BPS data.

Total Cost Formula :

$$
C = c_{0} + c_{1}n + c_{2}nm
$$

In [None]:
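# Assumptions for the cost formula
c0 = 5_000_000   # fixed cost (survey staff salary), in Rp.
c1 = 100_000     # cost per PSU, in Rp.
c2 = 100_000     # cost per SSU, in Rp.
N = 6            # number of PSUs (regencies/cities in DKI Jakarta)
n_ = 6           # number of PSUs used in the cost calculation
M = 1000         # assumed value of M for the prior data
nm = 1000        # assumed number of SSUs sampled in the prior data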


$$
m_{\text{opt}} = \sqrt{
\cfrac{c_{1} \sigma_{w}^{2}}
{c_{2} \left( \sigma_{b}^{2} - \cfrac{\sigma_{w}^{2}}{\bar{M}} \right)}
}
$$

$$
\sigma_{b}^{2} = \cfrac{1}{N-1} \sum_{i=1}^{N} (\mu_{i} - \mu)^{2}
$$

$$
\sigma_{w}^{2} = \cfrac{1}{N} \sum_{i=1}^{N} \sigma_{i}^{2}
$$
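The three formulas above can be tried out on a small example. The sketch below is only an illustration under stated assumptions: the function name `optimal_subsample_size` and the toy numbers are hypothetical, and the overall mean is taken as the unweighted mean of the cluster means, which is reasonable only when clusters are of similar size.

```python
import math
import numpy as np

def optimal_subsample_size(cluster_means, cluster_variances, c1, c2, M_bar):
    """Return (m_opt, sigma_b2, sigma_w2) following the formulas above."""
    mu_i = np.asarray(cluster_means, dtype=float)
    var_i = np.asarray(cluster_variances, dtype=float)
    N = len(mu_i)
    mu = mu_i.mean()  # assumption: unweighted mean of the cluster means
    sigma_b2 = float(((mu_i - mu) ** 2).sum() / (N - 1))  # between-cluster variance
    sigma_w2 = float(var_i.sum() / N)                      # average within-cluster variance
    denom = c2 * (sigma_b2 - sigma_w2 / M_bar)
    if denom <= 0:
        raise ValueError("sigma_b^2 must exceed sigma_w^2 / M_bar for m_opt to exist")
    return math.sqrt(c1 * sigma_w2 / denom), sigma_b2, sigma_w2

# Toy (hypothetical) inputs: three clusters with means 10, 20, 30 and variance 4 each
m_opt_demo, sb2_demo, sw2_demo = optimal_subsample_size(
    [10, 20, 30], [4, 4, 4], c1=1, c2=1, M_bar=100
)
print(round(m_opt_demo, 4), sb2_demo, sw2_demo)
```

When the between-cluster variance dominates, as in this toy case, the optimal number of units per cluster is small, because extra observations inside a cluster add little new information.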

We can calculate the variance of income levels using prior information which is data from BPS DKI Jakarta in 2021. With this data, the following values are obtained:

In [None]:
# Average income for DKI Jakarta as a whole (row 6 of the prior data)
rev_dki_2021 = rev_kota_2021.loc[6, "Rata-rata"]

In [None]:
# Squared deviation of each city's average from the DKI Jakarta average
rev_kota_2021["Variance"] = (rev_kota_2021["Rata-rata"]-rev_dki_2021)**2

In [None]:
rev_kota_2021

Unnamed: 0,Kabupaten/Kota,Rata-rata,Variance
0,Kepulauan Seribu,1938617.4,503013200000.0
1,Jakarta Selatan,2575959.43,5168486000.0
2,Jakarta Timur,2530202.31,13841360000.0
3,Jakarta Pusat,2593224.37,2984135000.0
4,Jakarta Barat,2711186.59,4011320000.0
5,Jakarta Utara,2849107.24,40503830000.0
6,DKI Jakarta,2647851.61,0.0


Next, $\sigma_{b}^{2}$ and $\sigma_{w}^{2}$ are calculated:

In [None]:
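# Values of sigma_b^2 and sigma_w^2
# Note: both are derived from the same squared deviations of the city averages;
# the within-city variances sigma_i^2 from the formula above are not used here,
# so sigma_w^2 is only a rough proxy.
sigma_b = (1/(N-1))*(rev_kota_2021["Variance"].sum())
sigma_w = (1/(N))*(rev_kota_2021["Variance"].sum())
print("The value of sigma_b^2 is", sigma_b)
print("The value of sigma_w^2 is", sigma_w)

The value of sigma_b^2 is 113904458323.65628
The value of sigma_w^2 is 94920381936.38022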


The value of $c_{0}$ is assumed to be Rp. 5,000,000, which is the salary for survey staff. The values of $c_{1}$ and $c_{2}$, the costs per PSU and per SSU, are both assumed to be Rp. 100,000. Assuming that M is 1,000 for the prior data, we obtain:

In [None]:
# Value of m_opt
m_opt = math.sqrt(c1*sigma_w/(c2*(sigma_b-(sigma_w/M))))
print("The value of m_opt is", round(m_opt,3))

The value of m_opt is 0.913


Next, the total cost (C) is calculated with 6 PSUs (N, the number of regencies/cities) and the assumption that 1,000 SSUs were sampled in the prior data (this number is not listed in the BPS data):

In [None]:
# Value of C
C = c0 + c1*n_+c2*nm
print("The value of C is Rp.",C)

The value of C is Rp. 105600000


With the C value obtained, the required number of samples can be calculated using the following equation:

$$
n = \cfrac{C - c_{0}}{c_{1} + c_{2} m_{\text{opt}}}
$$

In [None]:
# Value of n, rounded up to the next whole sample
n = (C-c0)/(c1+c2*m_opt)
n = math.ceil(n)
print("The total number of samples is", n)

The total number of samples is 526
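As a check on the arithmetic, the two steps can be written out with the assumed values (using the rounded value 0.913 for the optimal cluster subsample size, so the intermediate figures are approximate):

$$
C = 5{,}000{,}000 + 100{,}000 \times 6 + 100{,}000 \times 1{,}000 = 105{,}600{,}000
$$

$$
n = \cfrac{105{,}600{,}000 - 5{,}000{,}000}{100{,}000 + 100{,}000 \times 0.913} = \cfrac{100{,}600{,}000}{191{,}300} \approx 525.9 \;\Rightarrow\; n = 526
$$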


In [None]:
# Copy of the education strata (loaded above as prior_2021_tingkatan)
df_2021_strata = prior_2021_tingkatan.copy()

In [None]:
df_2021_strata["Persentase (%)"] = df_2021_strata["Persentase (%)"]/100

In [None]:
# Dividing the sample based on school strata
df_2021_strata["Total Sample"] = round(df_2021_strata["Persentase (%)"]*n,0)

The total sample is divided across the first stratification, the education level groups, using *proportional allocation*. The resulting number of samples per stratum is as follows:

In [None]:
df_2021_strata

Unnamed: 0,No,Strata Pendidikan,Persentase (%),Total Sample
0,1,Tidak Sekolah,0.1074,56.0
1,2,SD / MI,0.1334,70.0
2,3,SMP / Mts,0.0603,32.0
3,4,SMA ke Atas,0.6989,368.0
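Plain rounding does not always guarantee that the strata add up to the overall sample size. A minimal sketch of one common alternative, largest-remainder rounding, is shown below. The function name `proportional_allocation` is hypothetical, and the shares and the total of 526 are copied from the table and calculation above.

```python
import math

def proportional_allocation(shares, n):
    """Allocate n units proportionally to shares, using largest-remainder rounding."""
    total = sum(shares)
    raw = [n * s / total for s in shares]          # exact (fractional) allocation
    alloc = [math.floor(r) for r in raw]           # round every stratum down first
    leftover = n - sum(alloc)                      # units still to hand out
    # Give the leftover units to the strata with the largest fractional parts
    order = sorted(range(len(shares)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# Education shares from the table above: Tidak Sekolah, SD / MI, SMP / Mts, SMA ke Atas
education_shares = [0.1074, 0.1334, 0.0603, 0.6989]
allocation = proportional_allocation(education_shares, 526)
print(allocation, sum(allocation))
```

For these shares the result agrees with the rounded values in the table, and the allocation always sums to the requested total.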


#### Stage 2: Dividing the population into 6 strata (Stratified Sampling Stage 2)

This division is based on the initial assumption that income levels differ by regency/city. The population is therefore divided into Kepulauan Seribu (Thousand Islands), South Jakarta, East Jakarta, Central Jakarta, West Jakarta, and North Jakarta. The number of samples per city is determined using *proportional allocation*.

In [None]:
prior_2021_kota

Unnamed: 0,Kabupaten/Kota,Presentase
0,Kepulauan Seribu,0.27
1,Jakarta Selatan,21.05
2,Jakarta Timur,28.81
3,Jakarta Pusat,10.05
4,Jakarta Barat,23.0
5,Jakarta Utara,16.82
6,DKI Jakarta,100.0


In [None]:
tingkat_pendidikan = df_2021_strata["Strata Pendidikan"].values.tolist()
kota_dki = prior_2021_kota["Kabupaten/Kota"].values.tolist()

In [None]:
# Cross every education stratum with every city
# (the last entry of kota_dki, "DKI Jakarta", is the province total and is skipped)
rancang_sampling = pd.DataFrame(
    [(pendidikan, kota) for pendidikan in tingkat_pendidikan for kota in kota_dki[:6]],
    columns=["Strata Pendidikan", "Kabupaten/Kota"],
)

In [None]:
rancang_sampling["Total Sample"] = round((prior_2021_kota["Presentase"].iloc[0:6]/100)*df_2021_strata["Total Sample"][0])

In [None]:
# Rows 6-11 (SD / MI): use .loc on the DataFrame instead of chained indexing
rancang_sampling.loc[6:11, "Total Sample"] = ((prior_2021_kota["Presentase"].iloc[0:6]/100)*df_2021_strata["Total Sample"][1]).round().to_numpy()


In [None]:
# Rows 12-17 (SMP / Mts)
rancang_sampling.loc[12:17, "Total Sample"] = ((prior_2021_kota["Presentase"].iloc[0:6]/100)*df_2021_strata["Total Sample"][2]).round().to_numpy()


In [None]:
# Rows 18-23 (SMA ke Atas)
rancang_sampling.loc[18:23, "Total Sample"] = ((prior_2021_kota["Presentase"].iloc[0:6]/100)*df_2021_strata["Total Sample"][3]).round().to_numpy()


In [None]:
# Manual adjustments to the rounded totals
rancang_sampling.loc[0, "Total Sample"] = 1
rancang_sampling.loc[3, "Total Sample"] = 5
rancang_sampling.loc[6, "Total Sample"] = 1
rancang_sampling.loc[12, "Total Sample"] = 1
rancang_sampling.loc[13, "Total Sample"] = 6

The sample size for each combination of education stratum and city is then calculated from the proportions above, which gives the following:

In [None]:
rancang_sampling

Unnamed: 0,Strata Pendidikan,Kabupaten/Kota,Total Sample
0,Tidak Sekolah,Kepulauan Seribu,1.0
1,Tidak Sekolah,Jakarta Selatan,12.0
2,Tidak Sekolah,Jakarta Timur,16.0
3,Tidak Sekolah,Jakarta Pusat,5.0
4,Tidak Sekolah,Jakarta Barat,13.0
5,Tidak Sekolah,Jakarta Utara,9.0
6,SD / MI,Kepulauan Seribu,1.0
7,SD / MI,Jakarta Selatan,15.0
8,SD / MI,Jakarta Timur,20.0
9,SD / MI,Jakarta Pusat,7.0
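The cell-by-cell assignments above can also be written as a single outer product. The sketch below is an alternative under stated assumptions, not the notebook's own code: the stratum totals and city percentages are copied from the tables above, the name `design` is hypothetical, and the result is the rounded allocation before any manual adjustment.

```python
import numpy as np
import pandas as pd

education = ["Tidak Sekolah", "SD / MI", "SMP / Mts", "SMA ke Atas"]
education_totals = np.array([56, 70, 32, 368])
cities = ["Kepulauan Seribu", "Jakarta Selatan", "Jakarta Timur",
          "Jakarta Pusat", "Jakarta Barat", "Jakarta Utara"]
city_pct = np.array([0.27, 21.05, 28.81, 10.05, 23.0, 16.82])

# Each cell = stratum total x city share, rounded to the nearest integer
cells = np.rint(np.outer(education_totals, city_pct / 100))

# Flatten row by row so the order matches (education, city) pairs
index = pd.MultiIndex.from_product(
    [education, cities], names=["Strata Pendidikan", "Kabupaten/Kota"]
)
design = pd.DataFrame({"Total Sample": cells.ravel()}, index=index).reset_index()
print(design.head(6))
```

Cells that round to zero, such as Kepulauan Seribu in the smaller strata, still need a manual decision, which is what the adjustment cell in the notebook does.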


#### Stage 3 : Cluster sampling in stage 1 in each stratum (District)

The distribution of the number of sub-district clusters used for sampling was calculated using cluster sampling design calculations.

In [None]:
# Total sample per city, summed over the education strata
sample_kota = rancang_sampling.groupby("Kabupaten/Kota")[["Total Sample"]].sum()
sample_kota


Kabupaten/Kota,Total Sample
Jakarta Barat,121.0
Jakarta Pusat,52.0
Jakarta Selatan,110.0
Jakarta Timur,151.0
Jakarta Utara,88.0
Kepulauan Seribu,4.0


In [None]:
# M (number of sampled workers) of each city, in the same order as prior_2021_kec
urutan_kota = ["Kepulauan Seribu", "Jakarta Selatan", "Jakarta Timur",
               "Jakarta Pusat", "Jakarta Barat", "Jakarta Utara"]
M_sample_kec = sample_kota.loc[urutan_kota, "Total Sample"].tolist()
M_sample_kec

[4.0, 110.0, 151.0, 52.0, 121.0, 88.0]

In [None]:
#M of each subdistrict
prior_2021_kec["M Kecamatan"] = M_sample_kec

In [None]:
prior_2021_kec

Unnamed: 0,Kabupaten/Kota,Jumlah Kecamatan,M Kecamatan
0,Kepulauan Seribu,2,4.0
1,Jakarta Selatan,10,110.0
2,Jakarta Timur,10,151.0
3,Jakarta Pusat,8,52.0
4,Jakarta Barat,8,121.0
5,Jakarta Utara,6,88.0


In [None]:
# Assumed value of B, the bound on the error of estimation (in Rp.)
B = 15_000_000

In [None]:
rev_kota_2021["y"] =  rev_kota_2021["Rata-rata"][0:6]*prior_2021_kec["M Kecamatan"]

In [None]:
rev_kota_2021["miu x m"] = rev_dki_2021*prior_2021_kec["M Kecamatan"]

In [None]:
rev_kota_2021["(y-miu x m)^2"] = rev_kota_2021["y"] - rev_kota_2021["miu x m"]
rev_kota_2021["(y-miu x m)^2"] = rev_kota_2021["(y-miu x m)^2"]**2

In [None]:
rev_kota_2021["sigma c"] = rev_kota_2021["(y-miu x m)^2"]/(prior_2021_kec["Jumlah Kecamatan"]-1)

In [None]:
rev_kota_2021["D"] = B**2/(4*prior_2021_kec["Jumlah Kecamatan"]**2)

In [None]:
rev_kota_2021["n"] = round((prior_2021_kec["Jumlah Kecamatan"]*rev_kota_2021["sigma c"])/(prior_2021_kec["Jumlah Kecamatan"]*rev_kota_2021["D"]+rev_kota_2021["sigma c"]))
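As read from the cells above, the number of sub-district clusters per city is computed as n = N·σ_c² / (N·D + σ_c²) with D = B² / (4N²), where N is the number of sub-districts in the city. The sketch below wraps that calculation in a small function so individual rows can be checked. The function name `cluster_sample_size` is hypothetical, and the inputs are copied from the notebook's result table for Jakarta Selatan and Jakarta Timur.

```python
def cluster_sample_size(N, sigma_c2, B):
    """Number of clusters to sample: n = N*sigma_c2 / (N*D + sigma_c2), D = B^2 / (4*N^2)."""
    D = B**2 / (4 * N**2)
    return N * sigma_c2 / (N * D + sigma_c2)

B_assumed = 15_000_000  # same bound on the error of estimation as in the notebook

# Jakarta Selatan: 10 sub-districts, sigma_c^2 taken from the result table
n_selatan = cluster_sample_size(10, 6_948_742_000_000.0, B_assumed)
# Jakarta Timur: 10 sub-districts
n_timur = cluster_sample_size(10, 35_066_310_000_000.0, B_assumed)

print(round(n_selatan), round(n_timur))
```

A tighter bound B makes D smaller and pushes n toward N, so the choice of B directly controls how many sub-districts must be visited.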

In [None]:
# Zero out the DKI Jakarta total row (index 6), which is not a city
rev_kota_2021.loc[6, ["y", "miu x m", "sigma c", "D", "n"]] = 0
# Adjust n for Kepulauan Seribu so that at least one sub-district is sampled
rev_kota_2021.loc[0, "n"] = 1

In [None]:
rev_kota_2021

Unnamed: 0,Kabupaten/Kota,Rata-rata,Variance,y,miu x m,(y-miu x m)^2,sigma c,D,n
0,Kepulauan Seribu,1938617.4,503013200000.0,7754470.0,10591410.0,8048211000000.0,8048211000000.0,14062500000000.0,1.0
1,Jakarta Selatan,2575959.43,5168486000.0,283355500.0,291263700.0,62538680000000.0,6948742000000.0,562500000000.0,6.0
2,Jakarta Timur,2530202.31,13841360000.0,382060500.0,399825600.0,315596800000000.0,35066310000000.0,562500000000.0,9.0
3,Jakarta Pusat,2593224.37,2984135000.0,134847700.0,137688300.0,8069102000000.0,1152729000000.0,878906200000.0,1.0
4,Jakarta Barat,2711186.59,4011320000.0,328053600.0,320390000.0,58729730000000.0,8389962000000.0,878906200000.0,4.0
5,Jakarta Utara,2849107.24,40503830000.0,250721400.0,233010900.0,313661600000000.0,62732330000000.0,1562500000000.0,5.0
6,DKI Jakarta,2647851.61,0.0,0.0,0.0,,0.0,0.0,0.0


#### Stage 4 : Sampling with SRS

The following is a recap of the sampling design, with the allocation following the design and calculations above:

In [None]:
rancang_sampling["Total Kecamatan yang disampling"] = rev_kota_2021["n"][0:6]

In [None]:
# Rows 6-11 (SD / MI): use .loc on the DataFrame instead of chained indexing
rancang_sampling.loc[6:11, "Total Kecamatan yang disampling"] = rev_kota_2021["n"].iloc[0:6].to_numpy()


In [None]:
# Rows 12-17 (SMP / Mts)
rancang_sampling.loc[12:17, "Total Kecamatan yang disampling"] = rev_kota_2021["n"].iloc[0:6].to_numpy()


In [None]:
# Rows 18-23 (SMA ke Atas)
rancang_sampling.loc[18:23, "Total Kecamatan yang disampling"] = rev_kota_2021["n"].iloc[0:6].to_numpy()


In [None]:
rancang_sampling

Unnamed: 0,Strata Pendidikan,Kabupaten/Kota,Total Sample,Total Kecamatan yang disampling
0,Tidak Sekolah,Kepulauan Seribu,1.0,1.0
1,Tidak Sekolah,Jakarta Selatan,12.0,6.0
2,Tidak Sekolah,Jakarta Timur,16.0,9.0
3,Tidak Sekolah,Jakarta Pusat,5.0,1.0
4,Tidak Sekolah,Jakarta Barat,13.0,4.0
5,Tidak Sekolah,Jakarta Utara,9.0,5.0
6,SD / MI,Kepulauan Seribu,1.0,1.0
7,SD / MI,Jakarta Selatan,15.0,6.0
8,SD / MI,Jakarta Timur,20.0,9.0
9,SD / MI,Jakarta Pusat,7.0,1.0


In [None]:
# Samples per selected sub-district (cluster), based on the calculations and adjustments above
per_kecamatan = [[1],[2,2,2,2,2,2],[2,2,2,2,2,2,2,1,1],[5],[3,3,3,4],[2,2,2,2,1],
                 [1],[2,2,2,3,3,3],[2,2,2,2,2,2,2,3,3],[7],[4,4,4,4],[2,2,2,3,3],
                 [1],[1,1,1,1,1,1],[1,1,1,1,1,1,1,1,1],[3],[1,2,2,2],[1,1,1,1,1],
                 [1],[12,13,13,13,13,13],[12,12,12,12,12,12,12,11,11],[37],[21,21,21,22],[12,12,12,13,13]]

In [None]:
rancang_sampling["Pembagian per Kecamatan"] = per_kecamatan
rancang_sampling

Unnamed: 0,Strata Pendidikan,Kabupaten/Kota,Total Sample,Total Kecamatan yang disampling,Pembagian per Kecamatan
0,Tidak Sekolah,Kepulauan Seribu,1.0,1.0,[1]
1,Tidak Sekolah,Jakarta Selatan,12.0,6.0,"[2, 2, 2, 2, 2, 2]"
2,Tidak Sekolah,Jakarta Timur,16.0,9.0,"[2, 2, 2, 2, 2, 2, 2, 1, 1]"
3,Tidak Sekolah,Jakarta Pusat,5.0,1.0,[5]
4,Tidak Sekolah,Jakarta Barat,13.0,4.0,"[3, 3, 3, 4]"
5,Tidak Sekolah,Jakarta Utara,9.0,5.0,"[2, 2, 2, 2, 1]"
6,SD / MI,Kepulauan Seribu,1.0,1.0,[1]
7,SD / MI,Jakarta Selatan,15.0,6.0,"[2, 2, 2, 3, 3, 3]"
8,SD / MI,Jakarta Timur,20.0,9.0,"[2, 2, 2, 2, 2, 2, 2, 3, 3]"
9,SD / MI,Jakarta Pusat,7.0,1.0,[7]


The table above is the final output of the sampling design that will be implemented. The sample for each city is distributed across its selected sub-districts based on the calculated sample needs.
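The per-sub-district lists were entered by hand. A minimal sketch of how such an even split could be generated is given below. The helper name `split_evenly` is hypothetical, and the order of the values may differ from the hand-entered lists even when the counts match.

```python
def split_evenly(total, k):
    """Split total into k integer parts that differ by at most 1."""
    base, extra = divmod(total, k)
    # The first `extra` clusters receive one additional unit
    return [base + 1] * extra + [base] * (k - extra)

# 16 samples over 9 sub-districts (Tidak Sekolah, Jakarta Timur)
timur_split = split_evenly(16, 9)
# 13 samples over 4 sub-districts (Tidak Sekolah, Jakarta Barat)
barat_split = split_evenly(13, 4)
print(timur_split, barat_split)
```

Because the parts always sum to the stratum total, this avoids the arithmetic slips that are easy to make when typing 24 lists manually.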

### G.  Data Collection Method

Data are collected by telephone or online messenger. Telephone interviewing is a popular method for surveys and market research because it is relatively cheaper and faster than other methods such as in-person interviews or sending out questionnaires.

To use this method, a list of sampled respondents to call must first be created. This sample can be drawn at random from the population to be studied. The surveyor or interviewer then calls each telephone number in the sample and asks the series of questions needed to collect the data.
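Drawing the call list at random can be sketched as a simple random sample without replacement. Everything in the example is hypothetical: the frame of worker identifiers is made up, and `draw_srs` is an illustrative helper rather than part of the notebook.

```python
import numpy as np

def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n entries from frame, without replacement."""
    rng = np.random.default_rng(seed)
    positions = rng.choice(len(frame), size=n, replace=False)
    return [frame[i] for i in sorted(positions)]

# Hypothetical sampling frame of informal workers in one selected sub-district
frame = [f"worker-{i:04d}" for i in range(1, 201)]

# Draw the 5 respondents allocated to this sub-district; the seed makes the draw repeatable
call_list = draw_srs(frame, n=5, seed=2021)
print(call_list)
```

Fixing the seed lets the call list be reproduced later, which helps when documenting the survey.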

Some things to note in this method are:
1. Sample validity: make sure that the sample taken truly represents the population you want to study.
2. Skills of the interviewer: the interviewer must have good skills in managing telephone interviews and ensure that the answers from the respondents are valid and accurate.
3. Cost: although relatively cheaper than other methods, cost can still be a significant factor depending on the number of samples one wishes to study.
4. Respondents not available: some respondents may not be available or not willing to participate in telephone interviews.
5. Validity of answers: answers given by respondents may not be accurate because they may not want to open up or are uncomfortable with telephone interviews.

Despite some limitations, the telephone sampling method remains a popular choice because of its convenience and lower cost. It is feasible here because, according to the National Socio-Economic Survey (Susenas) conducted by the Central Statistics Agency (BPS) in March 2019, smartphone use in DKI Jakarta has reached 84.32%. By telephone and online messenger, the surveyor interviews respondents about general data such as ID number, name, address, number of dependents, job title, and monthly income.

### H.	Parameter Estimation Method

After sampling according to the plan described above, the sample data can be used to estimate the income level of the population of informal workers in DKI Jakarta. The parameters used for inference are the total, the mean, and the variance.
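As a rough illustration of how these parameters could be estimated, the sketch below applies the standard stratified estimators of the mean and total. It is a simplification under stated assumptions: it treats each stratum's sample as a simple random sample and ignores the cluster stage, and the function name, stratum sizes, and income values are all hypothetical.

```python
import numpy as np

def stratified_estimates(samples, stratum_sizes):
    """Stratified mean, total, and their estimated variances (SRS within strata)."""
    names = list(samples)
    N_h = np.array([stratum_sizes[k] for k in names], dtype=float)      # stratum population sizes
    n_h = np.array([len(samples[k]) for k in names], dtype=float)       # stratum sample sizes
    ybar_h = np.array([np.mean(samples[k]) for k in names])             # stratum sample means
    s2_h = np.array([np.var(samples[k], ddof=1) for k in names])        # stratum sample variances
    W_h = N_h / N_h.sum()                                               # stratum weights
    mean_st = float(np.sum(W_h * ybar_h))
    # Variance of the stratified mean, with finite population correction
    var_mean = float(np.sum(W_h**2 * (1 - n_h / N_h) * s2_h / n_h))
    N = float(N_h.sum())
    return {"mean": mean_st, "total": N * mean_st,
            "var_mean": var_mean, "var_total": N**2 * var_mean}

# Hypothetical monthly incomes (in millions of Rp.) for two strata
demo_samples = {"stratum_a": [1, 2, 3], "stratum_b": [10, 20, 30]}
demo_sizes = {"stratum_a": 100, "stratum_b": 300}
estimates = stratified_estimates(demo_samples, demo_sizes)
print(estimates)
```

A full analysis of this design would also need to account for the clustering of workers within sub-districts, which generally increases the variance relative to this simplified calculation.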

### I.	Analysis

Based on the results of the sampling design above, it can be concluded that:

1. Prior knowledge is needed in the sampling design process to support the assumed values of the sampling parameters.
2. The design uses multistage sampling: a first stratification by education level, a second stratification by regency/city, cluster sampling at the sub-district level, and simple random sampling at the lowest level.
3. Based on the sampling design calculation, at least 526 samples are needed to get a statistical picture of the income level of informal workers in DKI Jakarta, a total that is considered reasonable.
4. The cost of carrying out this sampling is Rp. 105,600,000, consisting of a fixed cost of Rp. 5,000,000 and a sampling cost of Rp. 100,000 per unit at both the PSU and SSU levels. This amount is still relatively affordable for government agencies. The per-unit cost is a "worst case" assumption and could still be reduced by up to 70-80%.
5. Telephone and online messenger are used to ask survey participants about their monthly income, so that the use of resources and the allocation of surveyors are more flexible.

The advantages of the sampling design carried out are:

1. Clear sampling stratification so that inferences can also be made for each district/city in DKI Jakarta
2. Multi-stage sampling is relatively cheaper and more efficient than SRS because the target population is large and geographically dispersed
3. The method is flexible, since a different sampling method can be used at each stage

Weaknesses of the sampling design carried out are:

1. Compared to SRS, multi-stage sampling requires a larger sample size to reach the same precision of inference
2. The choice of the best sampling method at each stage is subjective, so clear reasons are needed to avoid biased decision making
3. The sample may be non-representative because a large portion of the population may never be selected, potentially leading to undercoverage bias and selection bias

### J.	Conclusion

1. To conduct a sampling of the income level of informal workers in DKI Jakarta in 2023, a sample of 526 people is required, stratified based on their education level and city of residence (Multi Stage Sampling).
2. The cost incurred for conducting this sampling is Rp. 105,600,000 and can still be optimized again.
3. Sampling is done by telephone or online messenger.
4. The advantage of using this method is that it is cheaper and more efficient for geographically dispersed target populations.
5. The drawback of this method is that it requires a large sample size, and there is a potential for bias due to an unrepresentative sample.

### K.	Reference

- https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1a-epidemiology/methods-of-sampling-population

- https://jakarta.bps.go.id/publication/2022/02/25/5979600247867d861a1f334c/provinsi-dki-jakarta-dalam-angka-2022.html