# Khipus.ai
## Introduction to Machine Learning
### Supervised Learning - Linear Regression
### Case Study: Car Prices
<span>© Copyright Notice 2025, Khipus.ai - All Rights Reserved.</span>

In [18]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

### Load Dataset
Let's load the dataset and explore its structure.

In [19]:

file_path = 'Automobile_price_data.csv'
df = pd.read_csv(file_path)
df.head()

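| Unnamed: 0 | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 1 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 2 | 1 | 158 | audi | gas | std | four | sedan | fwd | front | 105.8 | ... | 136 | mpfi | 3.19 | 3.4 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 3 | 1 | 158 | audi | gas | turbo | four | sedan | fwd | front | 105.8 | ... | 131 | mpfi | 3.13 | 3.4 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 4 | 2 | 192 | bmw | gas | std | two | sedan | rwd | front | 101.2 | ... | 108 | mpfi | 3.5 | 2.8 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |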


### Data Preparation
We need to clean the data and prepare it for the regression model.

In [20]:
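# Treat '?' as a missing value, then drop every row with a missing value in any column
df = df.replace('?', pd.NA).dropna()
# Make sure the target and the feature are numeric (a no-op if they already are)
df['price'] = pd.to_numeric(df['price'])
df['horsepower'] = pd.to_numeric(df['horsepower'])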
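
The cleaning step above drops a row whenever any column is missing, even a column the model never uses. The sketch below illustrates the difference on a small made-up frame (the `toy` data and its values are hypothetical, not taken from the CSV), assuming missing entries are marked with `'?'`. Restricting `dropna` to the needed columns is only an alternative to consider; applying it to the real data would change the metrics reported later.

```python
import pandas as pd

# Hypothetical miniature frame that mimics '?' placeholders in a raw file
toy = pd.DataFrame({
    "normalized-losses": ["164", "?", "158", "?"],
    "horsepower": ["102", "115", "?", "140"],
    "price": ["13950", "17450", "17710", "23875"],
})

cleaned = toy.replace("?", pd.NA)

# Count missing entries per column before deciding what to drop
missing_per_column = cleaned.isna().sum()

# dropna() with no arguments removes a row if ANY column is missing
rows_all_columns = len(cleaned.dropna())

# Restricting to the columns the model uses keeps more rows
rows_needed_columns = len(cleaned.dropna(subset=["horsepower", "price"]))

print(missing_per_column)
print(rows_all_columns, rows_needed_columns)
```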


In [21]:
# Use horsepower as the single feature and price as the target variable
X = df[['horsepower']]
y = df['price']

### Split Data
We'll hold out 20% of the rows as a test set and train on the remaining 80%. Fixing `random_state` makes the split reproducible.

In [22]:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
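
As a quick illustration of what `test_size` and `random_state` do, the sketch below splits a made-up array of 20 rows twice with the same settings (`X_demo` and `y_demo` are hypothetical stand-ins, not the car data). It checks the resulting sizes and confirms that both calls pick the same test rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows with one feature
X_demo = np.arange(20).reshape(-1, 1)
y_demo = np.arange(20)

# Two splits with identical settings
a_train, a_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)
b_train, b_test, _, _ = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# 20% of 20 rows go to the test set, the rest to training
print(a_train.shape, a_test.shape)

# The same random_state selects the same rows each time
print(np.array_equal(a_test, b_test))
```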

### Train the Model
We will train a linear regression model using the training data.

In [23]:

model = LinearRegression()
model.fit(X_train, y_train)
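
A fitted `LinearRegression` stores the slope in `coef_` and the intercept in `intercept_`, so the learned line can be read directly. The sketch below uses made-up data that lies exactly on the line price = 150 × horsepower + 1000 (the numbers are illustrative assumptions, not estimates from the car dataset) to show that the fit recovers those values.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data lying exactly on price = 150 * horsepower + 1000
hp = pd.DataFrame({"horsepower": [70, 90, 110, 130, 150]})
price = 150.0 * hp["horsepower"] + 1000.0

demo_model = LinearRegression().fit(hp, price)

slope = demo_model.coef_[0]         # price change per extra unit of horsepower
intercept = demo_model.intercept_   # predicted price at horsepower = 0

# Prediction for a new car with 100 horsepower
predicted_price = demo_model.predict(pd.DataFrame({"horsepower": [100]}))[0]

print(f"slope={slope:.2f}, intercept={intercept:.2f}, prediction={predicted_price:.2f}")
```

Running the same two attribute lookups on `model` from the cell above would show the slope and intercept learned from the car data.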

### Evaluate the Model
We'll evaluate the model on the test set using the mean squared error (MSE), its square root (RMSE), and the R² score.

In [27]:

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'Root Mean Squared Error: {rmse}')
print(f'R² Score: {r2}')

Mean Squared Error: 7431602.670900858
Root Mean Squared Error: 2726.096599700909
R² Score: 0.5823682396698122


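Root Mean Squared Error (RMSE): 2726.10

The model's predictions on the test set are typically off by about $2,726 from the actual car prices. RMSE is not a plain average of the errors: the errors are squared before averaging, so large misses count more heavily.
RMSE is in the same units as the target variable (price), which makes it easy to interpret.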
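
To make the difference between RMSE and a plain average error concrete, the sketch below computes both by hand on four made-up prices (the `actual` and `predicted` values are hypothetical). One large miss pulls the RMSE well above the mean absolute error.

```python
import numpy as np

# Hypothetical prices and predictions; the last prediction misses by 6000
actual = np.array([10000.0, 15000.0, 20000.0, 25000.0])
predicted = np.array([11000.0, 14000.0, 20000.0, 31000.0])

errors = predicted - actual

mse_demo = np.mean(errors ** 2)      # mean of squared errors
rmse_demo = np.sqrt(mse_demo)        # back in price units
mae_demo = np.mean(np.abs(errors))   # plain average of error magnitudes

print(f"MSE:  {mse_demo:.1f}")
print(f"RMSE: {rmse_demo:.1f}")
print(f"MAE:  {mae_demo:.1f}")
```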

R² Score: 0.582

The model explains about 58.2% of the variance in car prices in the test set.
This indicates a moderate fit: horsepower alone accounts for a bit more than half of the variation in price.
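
R² compares the model's squared errors with those of a baseline that always predicts the mean price: R² = 1 − SS_res / SS_tot. The sketch below works this out by hand on four made-up prices (hypothetical values) and checks the result against `r2_score`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical prices and predictions
actual = np.array([10000.0, 15000.0, 20000.0, 25000.0])
predicted = np.array([11000.0, 14000.0, 20000.0, 31000.0])

# Squared error left over after using the model
ss_res = np.sum((actual - predicted) ** 2)

# Squared error of the baseline that always predicts the mean price
ss_tot = np.sum((actual - actual.mean()) ** 2)

r2_manual = 1 - ss_res / ss_tot
r2_sklearn = r2_score(actual, predicted)

print(f"manual R²:  {r2_manual:.3f}")
print(f"sklearn R²: {r2_sklearn:.3f}")
```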

Interpretation:

The model's predictions are somewhat accurate but can be improved. Horsepower alone is not enough to predict the car price.

The typical prediction error (RMSE) is around $2,726.
The model captures 58.2% of the variability in car prices, so about 41.8% remains unexplained by horsepower alone.

Next Steps:

Improve the model by adding features and trying different models:

    Use Multiple Linear Regression with additional features, or a Random Forest Regressor

Fine-tune the model settings (hyperparameters) to reduce errors.

Ensure the data is clean and well-prepared, and handle any outliers or unusual data points.
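
The first next step could be sketched as follows. This is a minimal illustration on synthetic data, not a result for the car dataset: the column names `horsepower`, `engine-size`, and `city-mpg` come from the table above, but the values, the price formula, and the choice of `n_estimators=100` are assumptions made for the example. With the real data, `X` would be built from the cleaned `df` instead.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car data (values are made up)
rng = np.random.default_rng(0)
n_rows = 200
demo = pd.DataFrame({
    "horsepower": rng.uniform(50, 250, n_rows),
    "engine-size": rng.uniform(60, 300, n_rows),
    "city-mpg": rng.uniform(15, 45, n_rows),
})
# Assumed price formula with random noise, for illustration only
demo["price"] = (
    80 * demo["horsepower"]
    + 40 * demo["engine-size"]
    - 100 * demo["city-mpg"]
    + rng.normal(0, 1000, n_rows)
)

X_demo = demo[["horsepower", "engine-size", "city-mpg"]]
y_demo = demo["price"]
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42)

# Fit both candidate models on the same split and compare test-set R²
candidates = {
    "Multiple linear regression": LinearRegression(),
    "Random forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
scores = {}
for name, candidate in candidates.items():
    candidate.fit(X_tr, y_tr)
    scores[name] = r2_score(y_te, candidate.predict(X_te))
    print(f"{name}: R² = {scores[name]:.3f}")
```

Comparing both models on the same held-out rows keeps the comparison fair; which one does better on the real car data has to be checked by running it there.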