# 01 - Dummy Variable Regression (Basics)

This notebook shows an example of linear regression with dummy variables, using the `gaji_dummy.csv` dataset.

Variables:
- `salary_million`      : salary (million per month) → Y
- `experience_years`    : work experience (years) → numeric X
- `gender`              : L (male) / P (female) → dummy
- `education_level`     : SMA (senior high school) / S1 (bachelor's) / S2 (master's) → dummy
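
To make "→ dummy" concrete, the sketch below encodes the two categorical columns as 0/1 indicator columns with `pd.get_dummies`. The five rows are the ones displayed by `df.head()` in this notebook; the variable names `sample` and `dummies` and the explicit level lists are assumptions made for illustration, not part of the original notebook.

```python
import pandas as pd

# First five rows as displayed by df.head() in this notebook.
sample = pd.DataFrame({
    "experience_years": [10, 3, 10, 0, 9],
    "gender": ["L", "L", "L", "L", "L"],
    "education_level": ["SMA", "SMA", "S2", "SMA", "S2"],
    "salary_million": [14.0, 8.4, 17.3, 5.0, 16.1],
})

# Declare the full set of levels so that categories absent from this small
# sample (P and S1) are still known to pandas.
sample["gender"] = pd.Categorical(sample["gender"], categories=["L", "P"])
sample["education_level"] = pd.Categorical(
    sample["education_level"], categories=["S1", "S2", "SMA"]
)

# drop_first=True drops the first level of each factor (L and S1), which
# then acts as the baseline; each remaining level becomes a 0/1 column.
dummies = pd.get_dummies(
    sample, columns=["gender", "education_level"], drop_first=True, dtype=int
)
print(dummies)
```

The resulting indicator columns (`gender_P`, `education_level_S2`, `education_level_SMA`) correspond to the dummy terms that the formula interface builds internally when a column is wrapped in `C(...)`.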

In [1]:
import pandas as pd
import statsmodels.formula.api as smf

# Load data
df = pd.read_csv("../data/gaji_dummy.csv")
df.head()

   id  experience_years gender education_level  salary_million
0   1                10      L             SMA            14.0
1   2                 3      L             SMA             8.4
2   3                10      L              S2            17.3
3   4                 0      L             SMA             5.0
4   5                 9      L              S2            16.1


In [2]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 5 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   id                20 non-null     int64  
 1   experience_years  20 non-null     int64  
 2   gender            20 non-null     object 
 3   education_level   20 non-null     object 
 4   salary_million    20 non-null     float64
dtypes: float64(1), int64(2), object(2)
memory usage: 932.0+ bytes


In [3]:
df[["gender", "education_level"]].value_counts()

gender  education_level
L       SMA                7
        S2                 5
P       S2                 4
        SMA                2
L       S1                 1
P       S1                 1
Name: count, dtype: int64
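
Written out as an equation, the model fitted in the next cell can be sketched as follows. The indicator notation $D^{P}$, $D^{S2}$, $D^{SMA}$ is an assumption introduced here for illustration; each indicator equals 1 when the observation belongs to that category and 0 otherwise, so gender L and education S1 form the baseline group.

```latex
\[
\text{salary\_million}_i
  = \beta_0
  + \beta_1\,\text{experience\_years}_i
  + \beta_2\,D^{P}_i
  + \beta_3\,D^{S2}_i
  + \beta_4\,D^{SMA}_i
  + \varepsilon_i
\]
```

Under this coding, $\beta_0$ is the expected salary of the baseline group at zero years of experience, and $\beta_2$, $\beta_3$, $\beta_4$ are shifts relative to that baseline.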

In [4]:
# Model: salary_million ~ experience_years + gender + education_level
model = smf.ols(
    "salary_million ~ experience_years + C(gender) + C(education_level)",
    data=df
).fit()

model.summary()


Dep. Variable:      salary_million     R-squared:            0.986
Model:              OLS                Adj. R-squared:       0.982
Method:             Least Squares      F-statistic:          263.9
Date:               Tue, 02 Dec 2025   Prob (F-statistic):   1.05e-13
Time:               13:29:47           Log-Likelihood:       -11.172
No. Observations:   20                 AIC:                  32.34
Df Residuals:       15                 BIC:                  37.32
Df Model:           4
Covariance Type:    nonrobust

                              coef   std err        t   P>|t|   [0.025   0.975]
Intercept                   7.2694     0.404   18.001   0.000    6.409    8.130
C(gender)[T.P]             -1.6411     0.237   -6.926   0.000   -2.146   -1.136
C(education_level)[T.S2]    1.6555     0.385    4.305   0.001    0.836    2.475
C(education_level)[T.SMA]  -1.9695     0.387   -5.083   0.000   -2.795   -1.144
experience_years            0.8548     0.033   25.767   0.000    0.784    0.925

Omnibus:         3.521   Durbin-Watson:      1.808
Prob(Omnibus):   0.172   Jarque-Bera (JB):   2.421
Skew:            0.852   Prob(JB):           0.298
Kurtosis:        2.946   Cond. No.           40.4
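
The summary reports `[T.S2]` and `[T.SMA]` for education, which means S1 was used as the reference level (the formula interface takes the first level in sorted order). If SMA is the more natural baseline, the reference can be set explicitly with `Treatment(reference=...)` inside `C(...)`. The sketch below uses a small, noise-free toy dataset, which is hypothetical and not `gaji_dummy.csv`, so that the effect of switching the baseline is easy to read off; the names `toy`, `default_fit`, and `sma_fit` are illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical, noise-free data (NOT gaji_dummy.csv): salary is built from
# known effects so the fitted coefficients are easy to check by eye.
edu_effect = {"SMA": 0.0, "S1": 2.0, "S2": 3.5}
rows = []
for gender in ["L", "P"]:
    for edu in ["SMA", "S1", "S2"]:
        for years in [0, 5, 10]:
            gender_effect = -1.5 if gender == "P" else 0.0
            salary = 5.0 + 1.0 * years + edu_effect[edu] + gender_effect
            rows.append({
                "experience_years": years,
                "gender": gender,
                "education_level": edu,
                "salary_million": salary,
            })
toy = pd.DataFrame(rows)

# Default treatment coding: the first level in sorted order (S1) is the baseline.
default_fit = smf.ols(
    "salary_million ~ experience_years + C(gender) + C(education_level)",
    data=toy,
).fit()

# Explicit baseline: SMA becomes the reference level instead.
sma_fit = smf.ols(
    "salary_million ~ experience_years + C(gender)"
    ' + C(education_level, Treatment(reference="SMA"))',
    data=toy,
).fit()

print(default_fit.params.round(3))
print(sma_fit.params.round(3))
```

Both fits describe the same model and give identical predictions; only the intercept and the education coefficients change, because they are now measured against a different baseline.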


In [5]:
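# Treatment coding uses the first level in sorted order as the baseline:
# L for gender and S1 for education_level (see the term names in the summary).
print("Intercept                        : predicted salary of the baseline group (gender L, education S1) at 0 years of experience")
print("experience_years                 : additional salary per extra year of experience (in millions)")
print("C(gender)[T.P]                   : salary difference of women (P) relative to men (L, baseline)")
print("C(education_level)[T.S2]/[T.SMA] : difference relative to S1 education (baseline)")

Intercept                        : predicted salary of the baseline group (gender L, education S1) at 0 years of experience
experience_years                 : additional salary per extra year of experience (in millions)
C(gender)[T.P]                   : salary difference of women (P) relative to men (L, baseline)
C(education_level)[T.S2]/[T.SMA] : difference relative to S1 education (baseline)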
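
As a worked example of reading the coefficients, the sketch below rebuilds a prediction by hand from the values in the OLS summary above. The function `predict_salary` and the dictionary keys are hypothetical names chosen for this illustration; in practice `model.predict(...)` on the fitted model would do the same job.

```python
# Coefficients copied from the OLS summary above (units: million per month).
COEF = {
    "Intercept": 7.2694,
    "experience_years": 0.8548,
    "gender_P": -1.6411,        # C(gender)[T.P]
    "education_S2": 1.6555,     # C(education_level)[T.S2]
    "education_SMA": -1.9695,   # C(education_level)[T.SMA]
}


def predict_salary(experience_years, gender, education_level):
    """Predicted salary (million per month) from the fitted coefficients.

    Baseline levels (gender L, education S1) add nothing to the intercept.
    """
    salary = COEF["Intercept"] + COEF["experience_years"] * experience_years
    if gender == "P":
        salary += COEF["gender_P"]
    if education_level == "S2":
        salary += COEF["education_S2"]
    elif education_level == "SMA":
        salary += COEF["education_SMA"]
    return salary


# A woman (P) with an S2 degree and 5 years of experience.
print(round(predict_salary(5, "P", "S2"), 4))
# First row of the dataset: 10 years, L, SMA (observed salary 14.0).
print(round(predict_salary(10, "L", "SMA"), 4))
```

For the first case the arithmetic is 7.2694 + 5 × 0.8548 − 1.6411 + 1.6555 ≈ 11.56. For the first row of the dataset the sketch gives about 13.85 against an observed 14.0, a residual of roughly 0.15.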
