# Dataset Description

# Upload Dataset: https://www.kaggle.com/datasets/avikasliwal/used-cars-price-prediction/data

## The dataset contains used-car price information with the following columns:

`index`: Index

`Name`: The brand and model of the car.

`Location`: The location in which the car is being sold or is available for purchase.

`Year`: The year or edition of the model.

`Kilometers_Driven`: The total distance driven by the previous owner(s), in km.

`Fuel_Type`: The type of fuel used by the car. (Petrol / Diesel / Electric / CNG / LPG)

`Transmission`: The type of transmission used by the car. (Automatic / Manual)

`Owner_Type`: Whether the car is first-hand, second-hand, or other.

`Mileage`: The standard mileage quoted by the manufacturer, in kmpl or km/kg.

`Engine`: The displacement volume of the engine in cc.


# Tasks

## 1. Data Cleaning

### Read the dataset

In [6]:
import pandas as pd 
df = pd.read_csv('../datasets/Used_Cars.csv')

In [7]:
df.head()

Unnamed: 0.1,Unnamed: 0,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price
0,0,Maruti Wagon R LXI CNG,Mumbai,2010,72000,CNG,Manual,First,26.6 km/kg,998 CC,58.16 bhp,5.0,,1.75
1,1,Hyundai Creta 1.6 CRDi SX Option,Pune,2015,41000,Diesel,Manual,First,19.67 kmpl,1582 CC,126.2 bhp,5.0,,12.5
2,2,Honda Jazz V,Chennai,2011,46000,Petrol,Manual,First,18.2 kmpl,1199 CC,88.7 bhp,5.0,8.61 Lakh,4.5
3,3,Maruti Ertiga VDI,Chennai,2012,87000,Diesel,Manual,First,20.77 kmpl,1248 CC,88.76 bhp,7.0,,6.0
4,4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,2013,40670,Diesel,Automatic,Second,15.2 kmpl,1968 CC,140.8 bhp,5.0,,17.74


### Handling Missing Values

In [8]:
df.isnull().sum()

Unnamed: 0              0
Name                    0
Location                0
Year                    0
Kilometers_Driven       0
Fuel_Type               0
Transmission            0
Owner_Type              0
Mileage                 2
Engine                 36
Power                  36
Seats                  42
New_Price            5195
Price                   0
dtype: int64
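
The raw counts above are easier to judge as a share of all rows. In the output, `New_Price` is missing in 5195 of 6019 rows, which is roughly 86%. The sketch below shows one way to tabulate counts and percentages together. The helper name `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd


def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of missing values per column."""
    counts = frame.isnull().sum()
    percent = (counts / len(frame) * 100).round(2)
    summary = pd.DataFrame({"missing": counts, "percent": percent})
    # Put the columns with the most missing values first.
    return summary.sort_values("missing", ascending=False)


# Toy frame standing in for the real data (values are illustrative only).
toy = pd.DataFrame({
    "Seats": [5.0, None, 7.0, 5.0],
    "New_Price": [None, None, "8.61 Lakh", None],
    "Price": [1.75, 12.5, 4.5, 6.0],
})
summary = missing_summary(toy)
print(summary)
```

On the real data, calling `missing_summary(df)` would give the same counts as `df.isnull().sum()` plus a percentage column, which may help decide whether a column is worth imputing at all.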

### Correct any inconsistent data entries.

In [9]:
# Assign the result back instead of calling fillna(inplace=True) on a column
# selection, which triggers a chained-assignment FutureWarning in pandas 2.x.
for col in ['New_Price', 'Mileage', 'Engine', 'Seats', 'Power']:
    # Impute with the most frequent value of the column.
    df[col] = df[col].fillna(df[col].mode()[0])
df.head()

Unnamed: 0.1,Unnamed: 0,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price
0,0,Maruti Wagon R LXI CNG,Mumbai,2010,72000,CNG,Manual,First,26.6 km/kg,998 CC,58.16 bhp,5.0,4.78 Lakh,1.75
1,1,Hyundai Creta 1.6 CRDi SX Option,Pune,2015,41000,Diesel,Manual,First,19.67 kmpl,1582 CC,126.2 bhp,5.0,4.78 Lakh,12.5
2,2,Honda Jazz V,Chennai,2011,46000,Petrol,Manual,First,18.2 kmpl,1199 CC,88.7 bhp,5.0,8.61 Lakh,4.5
3,3,Maruti Ertiga VDI,Chennai,2012,87000,Diesel,Manual,First,20.77 kmpl,1248 CC,88.76 bhp,7.0,4.78 Lakh,6.0
4,4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,2013,40670,Diesel,Automatic,Second,15.2 kmpl,1968 CC,140.8 bhp,5.0,4.78 Lakh,17.74


In [10]:
df.isnull().sum()

Unnamed: 0           0
Name                 0
Location             0
Year                 0
Kilometers_Driven    0
Fuel_Type            0
Transmission         0
Owner_Type           0
Mileage              0
Engine               0
Power                0
Seats                0
New_Price            0
Price                0
dtype: int64

### Ensure data types are appropriate for each column.

---

In [11]:
df.dtypes

Unnamed: 0             int64
Name                  object
Location              object
Year                   int64
Kilometers_Driven      int64
Fuel_Type             object
Transmission          object
Owner_Type            object
Mileage               object
Engine                object
Power                 object
Seats                float64
New_Price             object
Price                float64
dtype: object
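
The output above shows `Mileage`, `Engine`, and `Power` stored as `object` because each value carries a unit suffix (for example `998 CC` or `58.16 bhp`). A minimal sketch of one way to pull out the numeric part follows. The helper name `to_numeric_prefix` and the toy values, including the non-numeric `null bhp` entry, are assumptions made for illustration.

```python
import pandas as pd


def to_numeric_prefix(series: pd.Series) -> pd.Series:
    """Extract the leading number from strings such as '998 CC'.

    Values without a leading number (including missing values) become NaN.
    """
    extracted = series.astype(str).str.extract(
        r"^\s*(\d+(?:\.\d+)?)", expand=False
    )
    return pd.to_numeric(extracted, errors="coerce")


# Toy values in the same shape as the real columns (illustrative only).
toy = pd.DataFrame({
    "Mileage": ["26.6 km/kg", "19.67 kmpl", "18.2 kmpl"],
    "Engine": ["998 CC", "1582 CC", None],
    "Power": ["58.16 bhp", "null bhp", "88.7 bhp"],
})

engine_cc = to_numeric_prefix(toy["Engine"])
power_bhp = to_numeric_prefix(toy["Power"])
mileage = to_numeric_prefix(toy["Mileage"])
print(engine_cc.tolist(), power_bhp.tolist(), mileage.tolist())
```

Note that `Mileage` mixes two units (kmpl and km/kg), so the extracted numbers are not directly comparable across fuel types. Doing this conversion before imputation would also allow a median or mean fill instead of the mode.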

## 2. Exploratory Data Analysis (EDA)

In [12]:
df.head()

Unnamed: 0.1,Unnamed: 0,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price
0,0,Maruti Wagon R LXI CNG,Mumbai,2010,72000,CNG,Manual,First,26.6 km/kg,998 CC,58.16 bhp,5.0,4.78 Lakh,1.75
1,1,Hyundai Creta 1.6 CRDi SX Option,Pune,2015,41000,Diesel,Manual,First,19.67 kmpl,1582 CC,126.2 bhp,5.0,4.78 Lakh,12.5
2,2,Honda Jazz V,Chennai,2011,46000,Petrol,Manual,First,18.2 kmpl,1199 CC,88.7 bhp,5.0,8.61 Lakh,4.5
3,3,Maruti Ertiga VDI,Chennai,2012,87000,Diesel,Manual,First,20.77 kmpl,1248 CC,88.76 bhp,7.0,4.78 Lakh,6.0
4,4,Audi A4 New 2.0 TDI Multitronic,Coimbatore,2013,40670,Diesel,Automatic,Second,15.2 kmpl,1968 CC,140.8 bhp,5.0,4.78 Lakh,17.74


In [13]:
df.tail()

Unnamed: 0.1,Unnamed: 0,Name,Location,Year,Kilometers_Driven,Fuel_Type,Transmission,Owner_Type,Mileage,Engine,Power,Seats,New_Price,Price
6014,6014,Maruti Swift VDI,Delhi,2014,27365,Diesel,Manual,First,28.4 kmpl,1248 CC,74 bhp,5.0,7.88 Lakh,4.75
6015,6015,Hyundai Xcent 1.1 CRDi S,Jaipur,2015,100000,Diesel,Manual,First,24.4 kmpl,1120 CC,71 bhp,5.0,4.78 Lakh,4.0
6016,6016,Mahindra Xylo D4 BSIV,Jaipur,2012,55000,Diesel,Manual,Second,14.0 kmpl,2498 CC,112 bhp,8.0,4.78 Lakh,2.9
6017,6017,Maruti Wagon R VXI,Kolkata,2013,46000,Petrol,Manual,First,18.9 kmpl,998 CC,67.1 bhp,5.0,4.78 Lakh,2.65
6018,6018,Chevrolet Beat Diesel,Hyderabad,2011,47000,Diesel,Manual,First,25.44 kmpl,936 CC,57.6 bhp,5.0,4.78 Lakh,2.5


In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6019 entries, 0 to 6018
Data columns (total 14 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Unnamed: 0         6019 non-null   int64  
 1   Name               6019 non-null   object 
 2   Location           6019 non-null   object 
 3   Year               6019 non-null   int64  
 4   Kilometers_Driven  6019 non-null   int64  
 5   Fuel_Type          6019 non-null   object 
 6   Transmission       6019 non-null   object 
 7   Owner_Type         6019 non-null   object 
 8   Mileage            6019 non-null   object 
 9   Engine             6019 non-null   object 
 10  Power              6019 non-null   object 
 11  Seats              6019 non-null   float64
 12  New_Price          6019 non-null   object 
 13  Price              6019 non-null   float64
dtypes: float64(2), int64(3), object(9)
memory usage: 658.5+ KB


In [15]:
df.set_index('Name', inplace=True)

In [19]:
df = df.drop(['Unnamed: 0'], axis=1)

In [21]:
from sklearn.preprocessing import LabelEncoder

Label_encoder = LabelEncoder()

df['Location_Encoded'] = Label_encoder.fit_transform(df['Location'])

In [28]:
df[['Location', 'Location_Encoded']]

Name,Location,Location_Encoded
Maruti Wagon R LXI CNG,Mumbai,9
Hyundai Creta 1.6 CRDi SX Option,Pune,10
Honda Jazz V,Chennai,2
Maruti Ertiga VDI,Chennai,2
Audi A4 New 2.0 TDI Multitronic,Coimbatore,3
...,...,...
Maruti Swift VDI,Delhi,4
Hyundai Xcent 1.1 CRDi S,Jaipur,6
Mahindra Xylo D4 BSIV,Jaipur,6
Maruti Wagon R VXI,Kolkata,8


In [29]:
df['Fuel_Encoded'] = Label_encoder.fit_transform(df['Fuel_Type'])

In [43]:
df1 = df[['Fuel_Type', 'Fuel_Encoded']]
print(df1['Fuel_Type'].unique(), df1['Fuel_Encoded'].unique())

['CNG' 'Diesel' 'Petrol' 'LPG' 'Electric'] [0 1 4 3 2]


In [44]:
print(df1)

                                 Fuel_Type  Fuel_Encoded
Name                                                    
Maruti Wagon R LXI CNG                 CNG             0
Hyundai Creta 1.6 CRDi SX Option    Diesel             1
Honda Jazz V                        Petrol             4
Maruti Ertiga VDI                   Diesel             1
Audi A4 New 2.0 TDI Multitronic     Diesel             1
...                                    ...           ...
Maruti Swift VDI                    Diesel             1
Hyundai Xcent 1.1 CRDi S            Diesel             1
Mahindra Xylo D4 BSIV               Diesel             1
Maruti Wagon R VXI                  Petrol             4
Chevrolet Beat Diesel               Diesel             1

[6019 rows x 2 columns]
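
The codes above follow the alphabetical order of the labels, which is how `LabelEncoder` assigns them. Because the same `Label_encoder` object is refitted for each column, it only remembers the mapping of the last column it saw. As a hedged alternative (not the method used in the notebook), pandas categoricals give the same sorted codes and make the mapping easy to keep per column. The variable names here are illustrative.

```python
import pandas as pd

# Toy column with the five fuel types seen in the output above.
fuel = pd.Series(
    ["CNG", "Diesel", "Petrol", "LPG", "Electric", "Diesel"],
    name="Fuel_Type",
)

# Categories are sorted lexicographically, so the codes match the
# alphabetical assignment shown in the notebook output.
cat = pd.Categorical(fuel)
codes = pd.Series(cat.codes, index=fuel.index, name="Fuel_Encoded")

# Keep an explicit label -> code mapping for later decoding.
label_to_code = {label: code for code, label in enumerate(cat.categories)}
print(label_to_code)
print(codes.tolist())
```

Integer codes imply an order (`CNG` < `Diesel` < ...) that has no real meaning for fuel types, so for models that are sensitive to this, one-hot encoding with `pd.get_dummies` may be the safer choice.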


In [32]:
df.dtypes

Location              object
Year                   int64
Kilometers_Driven      int64
Fuel_Type             object
Transmission          object
Owner_Type            object
Mileage               object
Engine                object
Power                 object
Seats                float64
New_Price             object
Price                float64
Location_Encoded       int64
Fuel_Encoded           int64
dtype: object

### Perform summary statistics on the dataset.

In [59]:
df.describe()

Unnamed: 0.1,Unnamed: 0,Year,Kilometers_Driven,Seats,Price
count,6019.0,6019.0,6019.0,6019.0,6019.0
mean,3009.0,2013.358199,58738.38,5.27679,9.479468
std,1737.679967,3.269742,91268.84,0.806346,11.187917
min,0.0,1998.0,171.0,0.0,0.44
25%,1504.5,2011.0,34000.0,5.0,3.5
50%,3009.0,2014.0,53000.0,5.0,5.64
75%,4513.5,2016.0,73000.0,5.0,9.95
max,6018.0,2019.0,6500000.0,10.0,160.0
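
Two values in the summary look suspicious: a `Kilometers_Driven` maximum of 6,500,000 against a 75th percentile of 73,000, and a `Seats` minimum of 0. A minimal sketch of flagging such rows with the common 1.5 × IQR rule follows. The helper name `iqr_outliers`, the threshold `k`, and the toy values are assumptions, not part of the original analysis.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Toy values loosely based on the quartiles reported above (illustrative only).
km = pd.Series([34000, 53000, 73000, 41000, 6500000], name="Kilometers_Driven")
km_mask = iqr_outliers(km)

# A car cannot have zero seats, so treat non-positive values as invalid.
seats = pd.Series([5.0, 0.0, 7.0], name="Seats")
invalid_seats = seats <= 0

print(km[km_mask].tolist())
print(seats[invalid_seats].tolist())
```

Whether flagged rows should be dropped, capped, or corrected depends on the analysis; the mask only identifies candidates for a closer look.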


### Identify and analyze patterns in the data.

### Visualize the distribution of key variables.

### Explore relationships between variables.


In [13]:
# Restrict the correlation to numeric columns; string columns such as Name
# cannot be converted to float and raise a ValueError.
df.corr(numeric_only=True)

## 3. Data Visualization

* Ensure the visualizations are clear and informative.

### Create visualizations to illustrate the findings from the EDA.


### Use appropriate plots such as histograms, bar charts, pie charts, scatter plots, and heatmaps.

## 4. Insights and Conclusions

* <h3>Summarize the key insights gained from the data analysis.</h3>
* <h3>Draw conclusions based on the patterns observed in the data.</h3>