# Manual Linear Regression Analysis with an Indian Car Dataset

This notebook shows only the manual calculations for simple and multiple linear regression. They are done with NumPy, without any visualization and without the scikit-learn library. All output is numeric or text.
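For reference, the cells below compute the standard ordinary least squares (OLS) estimates. Written out, they are as follows. $X$ denotes the design matrix with a leading column of ones; this is notation assumed here, and it corresponds to `X_multi_b` in the code.

```latex
\hat{y} = b_0 + b_1 x, \qquad
b_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}, \qquad
b_0 = \bar{y} - b_1 \bar{x}

\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}

R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_{i}(y_i - \hat{y}_i)^2}{\sum_{i}(y_i - \bar{y})^2}
```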

In [3]:
import pandas as pd
import numpy as np

# Load the dataset
car_data = pd.read_csv('car_dataset_india.csv')

# Show the first 5 rows of the dataset
print(car_data.head())

# Show basic information about the dataset (info() prints directly)
print("Dataset Information:")
car_data.info()

# Show descriptive statistics for the numeric columns
print(car_data.describe())

# Pairwise (Pearson) correlation between numeric features
correlation_matrix = car_data.corr(numeric_only=True)
print("\nCorrelation between numeric features:")
print(correlation_matrix)

   Car_ID          Brand   Model  Year Fuel_Type Transmission      Price  \
0       1         Toyota  Innova  2024       CNG       Manual  2020000.0   
1       2            Kia     EV6  2023    Diesel       Manual  1770000.0   
2       3  Maruti Suzuki   Dzire  2016    Petrol       Manual  3430000.0   
3       4          Honda   Amaze  2019    Petrol       Manual  1610000.0   
4       5          Honda    City  2015  Electric       Manual  1840000.0   

   Mileage  Engine_CC  Seating_Capacity  Service_Cost  
0     27.3        800                 4       24100.0  
1     16.4       2500                 7       18800.0  
2     17.6       2000                 6       24700.0  
3     19.2       2500                 6       23300.0  
4     15.8       1000                 5        5800.0  
Dataset Information:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 11 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            -------------- 
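By default, pandas' `DataFrame.corr` computes the Pearson coefficient (`method='pearson'`). The sketch below follows the notebook's manual approach and computes it from centred sums. The helper `pearson_r` is a hypothetical name introduced here. The data are only the Mileage and Price values of the five rows printed by `head()` above. They serve as a small illustration, not as the full-dataset correlation.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Hypothetical helper: Pearson correlation from centred sums."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    # Covariance term divided by the product of the spreads (the 1/n factors cancel)
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

# Five rows taken from the head() output above (illustration only)
toy = pd.DataFrame({
    'Mileage': [27.3, 16.4, 17.6, 19.2, 15.8],
    'Price': [2020000.0, 1770000.0, 3430000.0, 1610000.0, 1840000.0],
})

r_manual = pearson_r(toy['Mileage'], toy['Price'])
r_pandas = toy.corr(numeric_only=True).loc['Mileage', 'Price']
print(r_manual, r_pandas)
```

Up to floating-point rounding, both values should agree. On the full dataset, the same helper would reproduce the Mileage/Price entry of `correlation_matrix`.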

In [4]:
# Simple linear regression (manual): Mileage vs Price
X = car_data['Mileage'].values
y = car_data['Price'].values

mean_x = np.mean(X)
mean_y = np.mean(y)

# Closed-form least-squares estimates for a single predictor
numerator = np.sum((X - mean_x) * (y - mean_y))
denominator = np.sum((X - mean_x) ** 2)
slope = numerator / denominator
intercept = mean_y - slope * mean_x

print('\nSimple Linear Regression (Manual):')
print(f'Slope (b1): {slope}')
print(f'Intercept (b0): {intercept}')

# Fitted values for every row
predictions = intercept + slope * X
print('\nSample price predictions (first 5 rows):')
for i in range(5):
    print(f'Mileage: {X[i]}, Actual Price: {y[i]}, Predicted Price: {predictions[i]}')

# Coefficient of determination: 1 - SS_res / SS_tot
ss_total = np.sum((y - mean_y) ** 2)
ss_res = np.sum((y - predictions) ** 2)
r_squared = 1 - (ss_res / ss_total)
print(f'\nR-squared (Manual): {r_squared}')


Simple Linear Regression (Manual):
Slope (b1): 19397.868650808738
Intercept (b0): 1427569.000020339

Sample price predictions (first 5 rows):
Mileage: 27.3, Actual Price: 2020000.0, Predicted Price: 1957130.8141874175
Mileage: 16.4, Actual Price: 1770000.0, Predicted Price: 1745694.045893602
Mileage: 17.6, Actual Price: 3430000.0, Predicted Price: 1768971.4882745726
Mileage: 19.2, Actual Price: 1610000.0, Predicted Price: 1800008.0781158668
Mileage: 15.8, Actual Price: 1840000.0, Predicted Price: 1734055.324703117

R-squared (Manual): 0.01745964382719034
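As a cross-check, the manual slope and intercept can be compared with `np.polyfit(x, y, 1)`, which returns the least-squares line as `[slope, intercept]`. There is also a useful identity: in simple regression with an intercept, R-squared equals the square of the Pearson correlation. The sketch below verifies both on the five rows shown by `head()` above. Those rows are an illustrative subset, whereas the notebook fits all 50.

```python
import numpy as np

# Five rows from the head() output above (illustration only)
X = np.array([27.3, 16.4, 17.6, 19.2, 15.8])
y = np.array([2020000.0, 1770000.0, 3430000.0, 1610000.0, 1840000.0])

# Same closed-form estimates as the notebook cell
mean_x, mean_y = X.mean(), y.mean()
slope = np.sum((X - mean_x) * (y - mean_y)) / np.sum((X - mean_x) ** 2)
intercept = mean_y - slope * mean_x

# polyfit with deg=1 returns coefficients highest power first: [slope, intercept]
slope_np, intercept_np = np.polyfit(X, y, 1)

predictions = intercept + slope * X
r_squared = 1 - np.sum((y - predictions) ** 2) / np.sum((y - mean_y) ** 2)

# With an intercept and one predictor, R^2 equals the squared Pearson r
r = np.corrcoef(X, y)[0, 1]
print(slope, slope_np)
print(intercept, intercept_np)
print(r_squared, r ** 2)
```

Applied to the full data, this identity means the R-squared of about 0.0175 reported above corresponds to a Pearson correlation of roughly +0.13 between Mileage and Price. The sign is positive because the fitted slope is positive.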


In [6]:
# Multiple linear regression (manual): Mileage and Engine_CC vs Price
X_multi = car_data[['Mileage', 'Engine_CC']].values
y_multi = car_data['Price'].values
X_multi_b = np.c_[np.ones(X_multi.shape[0]), X_multi]  # Prepend a column of ones for the intercept
# Normal equations: beta = (X^T X)^(-1) X^T y
beta = np.linalg.inv(X_multi_b.T @ X_multi_b) @ X_multi_b.T @ y_multi
print('\nMultiple Linear Regression (Manual):')
print(f'Intercept (b0): {beta[0]}')
print(f'Mileage coefficient (b1): {beta[1]}')
print(f'Engine_CC coefficient (b2): {beta[2]}')
pred_multi = X_multi_b @ beta
print('\nSample price predictions (first 5 rows), multiple regression:')
for i in range(5):
    print(f'Mileage: {X_multi[i,0]}, Engine_CC: {X_multi[i,1]}, Actual Price: {y_multi[i]}, Predicted Price: {pred_multi[i]}')
# Coefficient of determination for the two-predictor model
ss_total_multi = np.sum((y_multi - np.mean(y_multi)) ** 2)
ss_res_multi = np.sum((y_multi - pred_multi) ** 2)
r_squared_multi = 1 - (ss_res_multi / ss_total_multi)
print(f'\nR-squared (Manual), multiple regression: {r_squared_multi}')


Multiple Linear Regression (Manual):
Intercept (b0): 739311.599504671
Mileage coefficient (b1): 23539.03298521877
Engine_CC coefficient (b2): 366.1283990284809

Sample price predictions (first 5 rows), multiple regression:
Mileage: 27.3, Engine_CC: 800.0, Actual Price: 2020000.0, Predicted Price: 1674829.9192239281
Mileage: 16.4, Engine_CC: 2500.0, Actual Price: 1770000.0, Predicted Price: 2040672.738033461
Mileage: 17.6, Engine_CC: 2000.0, Actual Price: 3430000.0, Predicted Price: 1885855.3781014832
Mileage: 19.2, Engine_CC: 2500.0, Actual Price: 1610000.0, Predicted Price: 2106582.0303920736
Mileage: 15.8, Engine_CC: 1000.0, Actual Price: 1840000.0, Predicted Price: 1477356.7196996086

R-squared (Manual), multiple regression: 0.06225480568915753
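The cell above solves the normal equations with an explicit inverse. The sketch below compares that with two NumPy alternatives. The first is `np.linalg.solve` on the same system, which avoids forming the inverse. The second is `np.linalg.lstsq` applied directly to the design matrix, which never forms X^T X at all. The data are the Mileage and Engine_CC values of the five rows shown by `head()` above, used only as an illustration.

```python
import numpy as np

# Mileage and Engine_CC for the five rows from head() above (illustration only)
X_multi = np.array([
    [27.3, 800.0],
    [16.4, 2500.0],
    [17.6, 2000.0],
    [19.2, 2500.0],
    [15.8, 1000.0],
])
y_multi = np.array([2020000.0, 1770000.0, 3430000.0, 1610000.0, 1840000.0])

# Design matrix with a leading column of ones for the intercept
X_b = np.c_[np.ones(X_multi.shape[0]), X_multi]

# 1) Normal equations with an explicit inverse, as in the notebook
beta_inv = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y_multi

# 2) Same normal equations, solved without computing the inverse
beta_solve = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_multi)

# 3) Least squares on X_b directly; lstsq returns (solution, residuals, rank, singular values)
beta_lstsq, residuals, rank, sing_vals = np.linalg.lstsq(X_b, y_multi, rcond=None)

print(beta_inv)
print(beta_solve)
print(beta_lstsq)
```

All three methods should agree closely here. The condition number of X^T X is the square of that of X, so working on X directly is the safer choice when columns differ greatly in scale, as Mileage (tens) and Engine_CC (thousands) do. In addition, `np.linalg.inv` raises `LinAlgError` when X^T X is exactly singular, for example with perfectly collinear columns. In that case `lstsq` still returns a minimum-norm solution.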
