<a href="https://colab.research.google.com/github/azharanowar/machine_learning/blob/main/Car_Pricing_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Demo with Dummy Data: Building a Car Price Predictor with Python
Introduction
In this demo, we'll build a simple machine learning model that predicts the price of a car from features such as mileage, age, brand, model, engine size, and condition. We'll use linear regression, a fundamental machine learning algorithm, to learn the relationship between these features and the car's price.
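Concretely, linear regression models the price as an intercept plus a weighted sum of the numeric input features. Here $x_1, \dots, x_p$ are those features after any categorical columns have been encoded, and the weights are chosen to minimize the squared error over the $n$ training cars:

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p
\qquad
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
```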

Understanding Categorical Features
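Categorical features, like "brand" and "model" in our car price dataset, should not be fed directly into a linear regression model. Even when they are stored as integer codes, as in the dummy data below, the model would treat the codes as ordered quantities. For example, it would assume that moving from brand 0 to brand 1 changes the price by the same amount as moving from brand 1 to brand 2. We need to convert these features into a numerical representation that does not impose a false ordering.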

One-Hot Encoding
One common technique to encode categorical features is One-Hot Encoding. This involves creating a new binary feature for each category within a categorical variable. For example, for the "Brand" feature:

Brand_Toyota: 1 if the brand is Toyota, 0 otherwise
Brand_Ford: 1 if the brand is Ford, 0 otherwise
Brand_Honda: 1 if the brand is Honda, 0 otherwise

1. Import Necessary Libraries

In [None]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline

2. Create Dummy Dataset


In [None]:
# Create dummy dataset
data = {
    'mileage': [35000, 50000, 75000, 10000, 60000, 20000],
    'age': [3, 5, 7, 1, 4, 2],
    'brand': [0, 1, 0, 2, 1, 2],
    'model': [1, 2, 1, 3, 2, 3],
    'engine_size': [2.0, 1.8, 2.2, 1.5, 2.0, 1.6],
    'condition': [4, 3, 2, 5, 3, 4],
    'price': [15000, 12000, 9000, 20000, 11000, 18000]
}

df = pd.DataFrame(data)
df.head()

   mileage  age  brand  model  engine_size  condition  price
0    35000    3      0      1          2.0          4  15000
1    50000    5      1      2          1.8          3  12000
2    75000    7      0      1          2.2          2   9000
3    10000    1      2      3          1.5          5  20000
4    60000    4      1      2          2.0          3  11000
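Before encoding, it is worth checking how the two categorical columns relate to each other. The following sketch copies only the brand and model columns from the dummy dataset and cross-tabulates them.

```python
import pandas as pd

# brand and model columns copied from the dummy dataset above
df = pd.DataFrame({
    'brand': [0, 1, 0, 2, 1, 2],
    'model': [1, 2, 1, 3, 2, 3],
})

# Count how often each brand code appears together with each model code
pairs = pd.crosstab(df['brand'], df['model'])
print(pairs)

# True if every brand code is paired with exactly one model code
one_model_per_brand = df.groupby('brand')['model'].nunique().eq(1).all()
print(one_model_per_brand)
```

In this dummy data every brand code always appears with the same model code. As a result, the one-hot columns for brand and model carry the same information, and the linear model cannot separate their individual effects.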


3. Build the Pipeline, Train, and Evaluate the Model


In [None]:
# Define features and target
X = df[['mileage', 'age', 'brand', 'model', 'engine_size', 'condition']]
y = df['price']

# Define categorical columns
categorical_features = ['brand', 'model']

# Create the column transformer
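preprocessor = ColumnTransformer(
    transformers=[
        # handle_unknown='ignore' encodes a category unseen during training as all zeros
        # instead of raising an error, which matters with a training set this small
        ('cat', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ],
    remainder='passthrough'  # numeric columns pass through unchanged
)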

# Create a pipeline that first transforms the data and then fits the model
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', LinearRegression())
])

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
pipeline.fit(X_train, y_train)

# Predict and evaluate
y_pred = pipeline.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f'MSE: {mse}')

# Predict price for a new car
new_car = pd.DataFrame([[40000, 1, 1, 2, 1.9, 3]], columns=['mileage', 'age', 'brand', 'model', 'engine_size', 'condition'])
predicted_price = pipeline.predict(new_car)
print(f'Predicted price: {predicted_price}')

MSE: 1026133.7312196433
Predicted price: [14072.58946819]


Conclusion
In this demo, we've built a basic linear regression model to predict car prices. It is a deliberately simplified example, but it shows the core workflow: encoding categorical features, splitting the data, training a model inside a pipeline, and evaluating its predictions. The numbers themselves should not be over-interpreted. The dataset has only six rows, so the test set contains just two cars, and each brand code always appears with the same model code, so the two encoded features are redundant.

Real-world car price prediction models would involve larger datasets, more informative features, data cleaning, feature engineering, hyperparameter tuning, and potentially more sophisticated algorithms.
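With only six rows, a single train/test split leaves two cars for testing, so the MSE depends heavily on which two cars happen to be held out. One alternative, not used in the original demo, is leave-one-out cross-validation. Each car is held out once while the pipeline is trained on the other five, and the six errors are averaged. This sketch reuses the same dummy data and assumes `handle_unknown='ignore'`. Scikit-learn's `'neg_mean_squared_error'` scorer returns negated MSE (higher is better), so the scores are flipped back. MSE is used instead of R² because R² is undefined for a single held-out sample.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Same dummy data as in step 2
df = pd.DataFrame({
    'mileage': [35000, 50000, 75000, 10000, 60000, 20000],
    'age': [3, 5, 7, 1, 4, 2],
    'brand': [0, 1, 0, 2, 1, 2],
    'model': [1, 2, 1, 3, 2, 3],
    'engine_size': [2.0, 1.8, 2.2, 1.5, 2.0, 1.6],
    'condition': [4, 3, 2, 5, 3, 4],
    'price': [15000, 12000, 9000, 20000, 11000, 18000],
})
X = df.drop(columns='price')
y = df['price']

pipeline = Pipeline(steps=[
    ('preprocessor', ColumnTransformer(
        transformers=[('cat', OneHotEncoder(handle_unknown='ignore'), ['brand', 'model'])],
        remainder='passthrough')),
    ('model', LinearRegression()),
])

# Each of the 6 rows is held out once; the pipeline is refit on the other 5 each time
scores = cross_val_score(pipeline, X, y, cv=LeaveOneOut(), scoring='neg_mean_squared_error')
mse_per_fold = -scores  # flip the sign back to ordinary MSE
print(f'Folds: {len(mse_per_fold)}, mean MSE: {mse_per_fold.mean():.0f}')
```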