# PriceTrack: Unlocking Bike Market Insights

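PriceTrack is a data science project that predicts the resale value of used bikes from key input parameters: age, engine power, brand, ownership history, city, and kilometres driven.
It uses a multiple linear regression model as a baseline, compares it with regularized and tree-based regressors, and provides data-driven insights to help sellers make informed decisions.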

### Import necessary libraries
This cell imports pandas and NumPy, along with the scikit-learn modules used for data splitting, scoring, modeling, and preprocessing.

In [10]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor

from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

### Load and prepare dataset
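This cell reads the dataset, separates the features from the target variable, builds a preprocessor that scales numerical columns and one-hot encodes categorical ones, and splits the data into training and test sets.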

In [11]:
# Load dataset
df = pd.read_csv("Cleaned_Bike_Data.csv")

# Separate target and features
X = df[["age", "power", "brand", "owner_encoded", "city", "kms_driven"]]
y = df["price"]

# Identify categorical and numerical columns
categorical_cols = X.select_dtypes(include=["object", "category"]).columns.tolist()
numerical_cols = X.select_dtypes(include="number").columns.tolist()  # any numeric dtype, not only int64/float64

# Preprocessor: OneHotEncode categoricals, Scale numericals
preprocessor = ColumnTransformer(
    transformers=[
        ("num", StandardScaler(), numerical_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    force_int_remainder_cols=False,
)

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

### Linear Regression pipeline
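This cell defines a pipeline with preprocessing and a Linear Regression model, fits it on the training split, and scores it on the test split. The printed "Accuracy" is the R² score expressed as a percentage, not a classification accuracy.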

In [13]:
pipe_lr = Pipeline([
    ("preprocessing", preprocessor),
    ("model", LinearRegression())
])
pipe_lr.fit(X_train, y_train)
pred_lr = pipe_lr.predict(X_test)
r2_lr = r2_score(y_test, pred_lr)
print("Linear Regression R² Score:", round(r2_lr, 4), "| Accuracy:", round(r2_lr * 100, 2), "%")

Linear Regression R² Score: 0.9278 | Accuracy: 92.78 %
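
Since every model in this notebook is judged by `r2_score`, it may help to see what that number measures. The sketch below computes R² by hand as 1 minus the ratio of residual to total sum of squares. The helper `r2_manual` and the sample values are hypothetical and exist only for illustration.

```python
import numpy as np

def r2_manual(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot (hypothetical helper for illustration)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after the model
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of predicting the mean
    return 1.0 - ss_res / ss_tot

y_true = [10.0, 20.0, 30.0, 40.0]

perfect = r2_manual(y_true, y_true)                     # predictions match exactly
baseline = r2_manual(y_true, [25.0, 25.0, 25.0, 25.0])  # always predict the mean
worse = r2_manual(y_true, [40.0, 30.0, 20.0, 10.0])     # predictions reversed

print(perfect, baseline, worse)  # 1.0 0.0 -3.0
```

R² can be negative when a model does worse than predicting the mean, which is why it should be read as the share of variance explained rather than as a percentage of correct predictions.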


### Ridge Regression pipeline
This cell defines a pipeline with preprocessing and a Ridge Regression model, which adds an L2 penalty (`alpha=1.0`) to the linear fit.

In [14]:
pipe_ridge = Pipeline([
    ("preprocessing", preprocessor),
    ("model", Ridge(alpha=1.0))
])
pipe_ridge.fit(X_train, y_train)
pred_ridge = pipe_ridge.predict(X_test)
r2_ridge = r2_score(y_test, pred_ridge)
print("Ridge Regression R² Score:", round(r2_ridge, 4), "| Accuracy:", round(r2_ridge * 100, 2), "%")

Ridge Regression R² Score: 0.9286 | Accuracy: 92.86 %
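
The effect of `alpha` is easier to see in isolation. The following sketch assumes synthetic data (not the bike dataset) and fits `Ridge` with increasing penalties; the size of the coefficient vector shrinks as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

coef_norms = {}
for alpha in [0.01, 1.0, 100.0, 10000.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    coef_norms[alpha] = float(np.linalg.norm(model.coef_))
    print(f"alpha={alpha:>8}: ||coef|| = {coef_norms[alpha]:.4f}")
```

With a small `alpha` the fit is close to ordinary least squares, which is consistent with the Ridge and Linear Regression scores above being nearly identical.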


### Lasso Regression pipeline
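This cell defines a pipeline with preprocessing and a Lasso Regression model, which adds an L1 penalty (`alpha=0.1`); `tol` and `max_iter` control when the coordinate-descent solver stops.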

In [29]:
pipe_lasso = Pipeline([
    ("preprocessing", preprocessor),
    ("model", Lasso(alpha=0.1, tol=0.035, max_iter=500))
])
pipe_lasso.fit(X_train, y_train)
pred_lasso = pipe_lasso.predict(X_test)
r2_lasso = r2_score(y_test, pred_lasso)
print("Lasso Regression R² Score:", round(r2_lasso, 4), "| Accuracy:", round(r2_lasso * 100, 2), "%")


Lasso Regression R² Score: 0.9279 | Accuracy: 92.79 %
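
Unlike Ridge, the L1 penalty can set coefficients exactly to zero. The sketch below assumes synthetic data in which two of five features carry no signal and counts the zeroed coefficients at a few `alpha` values; the exact counts depend on the data, so none are asserted in the comments.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data, assumed for illustration only: features 2 and 4 are irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_coef = np.array([3.0, -2.0, 0.0, 1.5, 0.0])
y = X @ true_coef + rng.normal(scale=0.5, size=200)

zero_counts = {}
for alpha in [0.001, 0.2, 10.0]:
    model = Lasso(alpha=alpha).fit(X, y)
    zero_counts[alpha] = int(np.sum(model.coef_ == 0.0))
    print(f"alpha={alpha}: coef={np.round(model.coef_, 3)}, zeros={zero_counts[alpha]}")
```

A sufficiently large `alpha` removes every feature, so the penalty is usually tuned rather than fixed by hand.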


### Decision Tree Regression pipeline
This cell defines a pipeline with preprocessing and a Decision Tree Regression model capped at a depth of 5.

In [30]:
pipe_dt = Pipeline([
    ("preprocessing", preprocessor),
    ("model", DecisionTreeRegressor(max_depth=5))
])
pipe_dt.fit(X_train, y_train)
pred_dt = pipe_dt.predict(X_test)
r2_dt = r2_score(y_test, pred_dt)
print("Decision Tree R² Score:", round(r2_dt, 4), "| Accuracy:", round(r2_dt * 100, 2), "%")


Decision Tree R² Score: 0.9459 | Accuracy: 94.59 %
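
A depth limit of 5 means the tree has at most 2^5 = 32 leaves, so it can output at most 32 distinct prices. The sketch below checks this on synthetic data (an assumption; the bike dataset is not used here).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 3))
y = 5.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=1.0, size=500)

tree = DecisionTreeRegressor(max_depth=5, random_state=0).fit(X, y)
pred = tree.predict(X)

print("depth:", tree.get_depth())
print("leaves:", tree.get_n_leaves())
print("distinct predictions:", len(np.unique(pred)))
```

Each leaf predicts the mean target of the training rows that fall into it, so the predictions form a step function over the feature space.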


### Random Forest Regression pipeline
This cell defines a pipeline with preprocessing and a Random Forest Regression model that averages 100 trees, with a fixed seed for reproducibility.

In [17]:
pipe_rf = Pipeline([
    ("preprocessing", preprocessor),
    ("model", RandomForestRegressor(n_estimators=100, random_state=42))
])
pipe_rf.fit(X_train, y_train)
pred_rf = pipe_rf.predict(X_test)
r2_rf = r2_score(y_test, pred_rf)
print("Random Forest R² Score:", round(r2_rf, 4), "| Accuracy:", round(r2_rf * 100, 2), "%")


Random Forest R² Score: 0.9822 | Accuracy: 98.22 %


### Gradient Boosting Regression pipeline
This cell defines a pipeline with preprocessing and a Gradient Boosting Regression model with 100 boosting stages and a learning rate of 0.1.

In [18]:
pipe_gb = Pipeline([
    ("preprocessing", preprocessor),
    ("model", GradientBoostingRegressor(n_estimators=100, learning_rate=0.1))
])
pipe_gb.fit(X_train, y_train)
pred_gb = pipe_gb.predict(X_test)
r2_gb = r2_score(y_test, pred_gb)
print("Gradient Boosting R² Score:", round(r2_gb, 4), "| Accuracy:", round(r2_gb * 100, 2), "%")


Gradient Boosting R² Score: 0.9715 | Accuracy: 97.15 %
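
All six scores above come from a single train/test split. As a possible follow-up, the sketch below compares the same model families in one loop using 5-fold cross-validation. It assumes synthetic data from `make_regression` and a plain `StandardScaler` in place of the notebook's preprocessor, so its scores will not match the ones reported here.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the bike data, assumed for illustration only.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge": Ridge(alpha=1.0),
    "Lasso": Lasso(alpha=0.1),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(
        n_estimators=100, learning_rate=0.1, random_state=0
    ),
}

cv_scores = {}
for name, model in models.items():
    # Same two-step layout as the pipelines above: preprocessing, then model.
    pipe = Pipeline([("preprocessing", StandardScaler()), ("model", model)])
    scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")
    cv_scores[name] = (scores.mean(), scores.std())
    print(f"{name}: R² = {scores.mean():.4f} ± {scores.std():.4f}")
```

Reporting the mean and spread across folds makes it easier to tell whether a gap between two models is larger than the variation caused by the choice of split.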
