# CAR PRICE PREDICTION (Used Cars)

**Objective:**
- This dataset supports research and analysis on predicting used car prices.
  It provides a foundation for building and evaluating predictive models that help both
  buyers and sellers estimate the price of a used car from its features.

**Description:**
- The dataset describes used cars through attributes such as
  make, model, year of manufacture, mileage, and fuel type.
  It is curated to help researchers and automotive enthusiasts predict used car prices accurately.

**Dataset Features:**
- **Car Make:** The brand or manufacturer of the used car.
- **Car Model:** The specific model of the used car.
- **Year of Manufacture:** The year when the car was manufactured.
- **Mileage (km):** The total distance the car has traveled, indicating its usage.
- **Fuel Type:** The type of fuel the car uses (e.g., petrol, diesel, hybrid).
- **Transmission:** The type of transmission in the car (e.g., manual, automatic).
- **Owner Type:** The number of previous owners the car has had.
- **Engine Displacement (cc):** The total cubic capacity of the car's engine.
- **Power (bhp):** The power of the car's engine in brake horsepower.
- **Seats:** The number of seats in the car.
- **Price (Lakh):** The target variable representing the price of the used car.

**Steps:**
1. **Importing Libraries:** 
   - Start by importing the necessary libraries such as Pandas, NumPy, and Scikit-learn for data manipulation and model training.

2. **Read Data:**
   - Load the dataset containing information about used cars.

3. **Data Preprocessing:** 
   - Handle missing values, clean the data, and perform any necessary transformations to prepare it for modeling.

4. **Feature Engineering:** 
   - Extract relevant features and create new ones that might enhance the predictive power of the model.

5. **Model Training:** 
   - Use machine learning algorithms to train a model that can predict the price of used cars based on the provided features.
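
The five steps above can be sketched end to end as follows. This is a minimal illustration rather than the notebook's actual pipeline: the sample rows are copied from the table preview shown in this notebook, while the derived `Car_Age` column, the feature grouping, and the choice of `LinearRegression` are assumptions made for the example.

```python
import io

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Steps 1-2: import libraries and read data. These rows are copied from the
# preview in this notebook; a real run would read the full CSV file instead.
SAMPLE_CSV = """Car_Name,Year,Selling_Price,Present_Price,Driven_kms,Fuel_Type,Selling_type,Transmission,Owner
ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
city,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0
brio,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0
city,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0
city,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0
"""
cars = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Step 3: preprocessing. Drop duplicate rows and rows with missing values.
cars = cars.drop_duplicates().dropna().reset_index(drop=True)

# Step 4: feature engineering. Car_Age is a hypothetical derived feature:
# the age of each car relative to the newest car in the data.
cars["Car_Age"] = cars["Year"].max() - cars["Year"]

numeric_features = ["Present_Price", "Driven_kms", "Owner", "Car_Age"]
categorical_features = ["Fuel_Type", "Selling_type", "Transmission"]
X = cars[numeric_features + categorical_features]
y = cars["Selling_Price"]

# Step 5: model training. One-hot encode the categorical columns, pass the
# numeric columns through unchanged, and fit a linear model (an assumed,
# deliberately simple choice).
model = Pipeline([
    ("encode", ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_features)],
        remainder="passthrough",
    )),
    ("regress", LinearRegression()),
])
model.fit(X, y)
predictions = model.predict(X)
print(predictions.round(2))
```

With only a handful of rows the fitted model is not meaningful; the point is the shape of the workflow, which stays the same when the full dataset and a held-out test set are used.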



# Importing Libraries

In [1]:
import pandas as pd 
from sklearn.model_selection import train_test_split

In [2]:
Car_data=pd.read_csv("D:/DS NOTE/OASIS INFO BYTE/Car price prediction(used cars)/car data.csv")
Car_data

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Driven_kms,Fuel_Type,Selling_type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
...,...,...,...,...,...,...,...,...,...
296,city,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0
297,brio,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0
298,city,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0
299,city,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0
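
Before cleaning, it can help to confirm what was actually loaded. The sketch below assumes the column names shown in the preview above and embeds a few of those rows inline so that it runs without the CSV file; `summarize` is a hypothetical helper, not part of the notebook.

```python
import io

import pandas as pd

# Rows copied from the preview above, used here in place of the CSV file.
SAMPLE_CSV = """Car_Name,Year,Selling_Price,Present_Price,Driven_kms,Fuel_Type,Selling_type,Transmission,Owner
ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
"""


def summarize(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and unique count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "unique": frame.nunique(),
    })


cars = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(cars)
print(summary)
```

Columns with a nonzero `missing` count or an unexpected dtype are the natural starting points for the preprocessing step.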


# Removing Unwanted Data

In [7]:
def remove_bikes(dataframe):
    # Keywords that identify two-wheelers (brand names and model names).
    # Note: brand keywords such as 'honda', 'suzuki', and 'mahindra' also match
    # any car whose name includes the brand.
    bike_keywords = ['scooter', 'bike', 'motorcycle', 'suzuki', 'hero', 'honda', 'bajaj', 'ktm', 'mahindra', 'yamaha',
                     'royal enfield', 'tvs', 'apache', 'duke', 'avenger', 'mojo', 'passion', 'splendor', 'hunk', 'discover',
                     'karizma', 'dominar', 'fz', 'rtr', 'bullet', 'thunder', 'dream', 'activa', 'jupiter']

    # Work on a copy so the caller's DataFrame is not modified
    dataframe = dataframe.copy()

    # Convert all car names to lowercase for case-insensitive matching
    dataframe['Car_Name'] = dataframe['Car_Name'].str.lower()

    # Flag rows whose "Car_Name" contains any bike-related keyword (substring match)
    is_bike = dataframe['Car_Name'].str.contains('|'.join(bike_keywords), na=False)

    # Drop the flagged rows and reset the index in one step, which avoids
    # calling an in-place method on a filtered slice
    return dataframe[~is_bike].reset_index(drop=True)

In [8]:
# Call the function to remove the bike rows
df = remove_bikes(Car_data)
df

Unnamed: 0,Car_Name,Year,Selling_Price,Present_Price,Driven_kms,Fuel_Type,Selling_type,Transmission,Owner
0,ritz,2014,3.35,5.59,27000,Petrol,Dealer,Manual,0
1,sx4,2013,4.75,9.54,43000,Diesel,Dealer,Manual,0
2,ciaz,2017,7.25,9.85,6900,Petrol,Dealer,Manual,0
3,wagon r,2011,2.85,4.15,5200,Petrol,Dealer,Manual,0
4,swift,2014,4.60,6.87,42450,Diesel,Dealer,Manual,0
...,...,...,...,...,...,...,...,...,...
197,city,2016,9.50,11.60,33988,Diesel,Dealer,Manual,0
198,brio,2015,4.00,5.90,60000,Petrol,Dealer,Manual,0
199,city,2009,3.35,11.00,87934,Petrol,Dealer,Manual,0
200,city,2017,11.50,12.50,9000,Diesel,Dealer,Manual,0
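
Because `str.contains` does substring matching, a short keyword such as `fz` or `hero` also matches any name that merely contains those letters. One possible hardening, sketched below, is to escape each keyword and anchor it on word boundaries. The function name `remove_by_keywords` and the sample names are illustrative assumptions, not taken from the dataset.

```python
import re

import pandas as pd


def remove_by_keywords(frame, keywords, column="Car_Name"):
    """Drop rows whose `column` contains any keyword as a whole word (case-insensitive)."""
    # Escape each keyword and wrap the alternation in a non-capturing group
    # so that the word boundaries apply to every alternative.
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    mask = frame[column].str.contains(pattern, case=False, regex=True, na=False)
    return frame.loc[~mask].reset_index(drop=True)


# Made-up names chosen to show the difference from plain substring matching.
sample = pd.DataFrame(
    {"Car_Name": ["city", "Hero Passion Pro", "superhero edition", "Yamaha FZ 16"]}
)
cleaned = remove_by_keywords(sample, ["hero", "fz", "yamaha"])
print(cleaned["Car_Name"].tolist())  # ['city', 'superhero edition']
```

Here "superhero edition" survives because `hero` appears only inside a longer word, whereas a plain substring match would have dropped it. The input frame is left unchanged because the function returns a filtered copy.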


# Data Split

In [9]:
train, test = train_test_split(df, train_size=0.5, shuffle=True, random_state=141)

In [10]:
train.to_csv('D:/DS NOTE/OASIS INFO BYTE/Car price prediction(used cars)/train/raw/train.csv', index=False)
test.to_csv('D:/DS NOTE/OASIS INFO BYTE/Car price prediction(used cars)/test/raw/test.csv', index=False)
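
`train_test_split` with `train_size=0.5` places half of the rows in each split, and fixing `random_state` makes the split reproducible. `DataFrame.to_csv` does not create missing directories, so the target folders must exist before saving. The sketch below illustrates both points on a toy frame with made-up values; the temporary directory and the `split_and_save` helper are assumptions for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split


def split_and_save(frame, out_dir, train_size=0.5, seed=141):
    """Split `frame`, write train/test CSVs under `out_dir`, and return both parts."""
    train, test = train_test_split(
        frame, train_size=train_size, shuffle=True, random_state=seed
    )
    for name, part in (("train", train), ("test", test)):
        target = Path(out_dir) / name / "raw"
        target.mkdir(parents=True, exist_ok=True)  # to_csv will not create folders
        part.to_csv(target / f"{name}.csv", index=False)
    return train, test


# Toy data with made-up values, used only to demonstrate the mechanics.
toy = pd.DataFrame({
    "Year": list(range(2008, 2018)),
    "Selling_Price": [float(i) for i in range(10)],
})

with tempfile.TemporaryDirectory() as tmp:
    train, test = split_and_save(toy, tmp)
    reloaded = pd.read_csv(Path(tmp) / "train" / "raw" / "train.csv")
    # Same seed, same input: the second call selects the same rows.
    train_again, _ = split_and_save(toy, tmp)

print(len(train), len(test))
```

Building the paths with `pathlib` also keeps the folder names in one place, which makes a misspelled directory easier to spot than in a long hardcoded string.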