### Automobile Price Prediction Project

### Introduction

#### Objectives
- Load the dataset
- Preprocess the data
- Explore the features (characteristics) that help predict a car's price
- Develop prediction models
- Evaluate and refine the prediction models

#### Dataset Description

#### Source:

- Dataset Name: Automobile Dataset
- Source: Kaggle
- Dataset Link: Automobile Dataset on Kaggle (https://www.kaggle.com/datasets/premptk/automobile-data-changed)

#### Dataset Size:

- The dataset has 205 rows (one per car) and 26 columns (25 attributes plus the target, `price`).

#### Features (Variables) Included:
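The dataset includes features that capture different aspects of each automobile. The key features are listed below. In the CSV file itself, column names are lowercase and hyphenated (for example `fuel-type` and `num-of-doors`), and there is also a `normalized-losses` column (relative average insurance loss payment per insured vehicle year) whose missing values are marked `?`.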

1.  Car ID: A unique identifier for each car.
2.  Symboling: An insurance risk rating, from -3 (probably safe) to +3 (risky).
3.  Make: The manufacturer or brand of the car.
4.  Fuel Type: The type of fuel used by the car (e.g., gas, diesel).
5.  Aspiration: The type of aspiration used in the engine (e.g., std, turbo).
6.  Num of Doors: The number of doors in the car.
7.  Body Style: The body style of the car (e.g., sedan, hatchback).
8.  Drive Wheels: The type of drive wheels (e.g., fwd, rwd, 4wd).
9.  Engine Location: The location of the car's engine (e.g., front, rear).
10. Wheel Base: The wheelbase of the car.
11. Length: The length of the car.
12. Width: The width of the car.
13. Height: The height of the car.
14. Curb Weight: The weight of the car without passengers or cargo.
15. Engine Type: The type of engine (e.g., dohc, ohc).
16. Num of Cylinders: The number of cylinders in the engine.
17. Engine Size: The size of the car's engine.
18. Fuel System: The type of fuel system used in the engine.
19. Bore: The bore diameter of the engine.
20. Stroke: The stroke length of the engine.
21. Compression Ratio: The compression ratio of the engine.
22. Horsepower: The horsepower of the engine.
23. Peak RPM: The peak RPM (revolutions per minute) of the engine.
24. City MPG: The city fuel efficiency in miles per gallon.
25. Highway MPG: The highway fuel efficiency in miles per gallon.
26. Price: The price of the automobile (target variable for prediction).
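
Two of these features, Num of Doors and Num of Cylinders, are stored as English words rather than numbers (the CSV shows values such as `two` and `four` in `num-of-doors`; `num-of-cylinders` is presumably similar, although its values are hidden in the outputs below). A minimal sketch of one way to convert them, assuming a hand-written, hypothetical `WORD_TO_INT` mapping and helper `words_to_ints`; any value not in the mapping, such as a `?` placeholder, becomes NaN:

```python
import pandas as pd

# Hypothetical mapping from spelled-out counts to integers; extend it if other words appear.
WORD_TO_INT = {
    "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "eight": 8, "twelve": 12,
}


def words_to_ints(series: pd.Series) -> pd.Series:
    """Map spelled-out counts to floats; unmapped values such as '?' become NaN."""
    return series.map(WORD_TO_INT).astype("float64")


# Usage on values like those in the num-of-doors column.
doors = pd.Series(["two", "four", "?", "four"], name="num-of-doors")
print(words_to_ints(doors).tolist())  # [2.0, 4.0, nan, 4.0]
```

The result is cast to float because NaN cannot be stored in a plain integer column.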


#### Sample of the Dataset:
Below are the first five rows of the dataset (middle columns elided as `...`) to show its structure:


| Car ID | Symboling | Make        | Fuel Type | Aspiration | Num of Doors | Body Style  | ... | Horsepower | Peak RPM | City MPG | Highway MPG | Price |
|--------|-----------|-------------|-----------|------------|--------------|-------------|-----|------------|----------|----------|-------------|-------|
| 1      | 3         | alfa-romero | gas       | std        | two          | convertible | ... | 111        | 5000     | 21       | 27          | 13495 |
| 2      | 3         | alfa-romero | gas       | std        | two          | convertible | ... | 111        | 5000     | 21       | 27          | 16500 |
| 3      | 1         | alfa-romero | gas       | std        | two          | hatchback   | ... | 154        | 5000     | 19       | 26          | 16500 |
| 4      | 2         | audi        | gas       | std        | four         | sedan       | ... | 102        | 5500     | 24       | 30          | 13950 |
| 5      | 2         | audi        | gas       | std        | four         | sedan       | ... | 115        | 5500     | 18       | 22          | 17450 |
| ...    | ...       | ...         | ...       | ...        | ...          | ...         | ... | ...        | ...      | ...      | ...         | ...   |




This dataset describes a wide range of automobile attributes, which makes it a useful resource for building models that predict automobile prices.

```python
# Import libraries
import numpy as np
import pandas as pd
```

```python
# Load the dataset (assumes Automobile_data.csv is in the working directory)
df = pd.read_csv('Automobile_data.csv')
```

```python
# Show the first 10 rows with DataFrame.head()
print("First 10 rows of the dataframe")
df.head(10)
```


First 10 rows of the dataframe


|  | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 1 | 3 | ? | alfa-romero | gas | std | two | convertible | rwd | front | 88.6 | ... | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 2 | 1 | ? | alfa-romero | gas | std | two | hatchback | rwd | front | 94.5 | ... | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 5 | 2 | ? | audi | gas | std | two | sedan | fwd | front | 99.8 | ... | 136 | mpfi | 3.19 | 3.4 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 6 | 1 | 158 | audi | gas | std | four | sedan | fwd | front | 105.8 | ... | 136 | mpfi | 3.19 | 3.4 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 7 | 1 | ? | audi | gas | std | four | wagon | fwd | front | 105.8 | ... | 136 | mpfi | 3.19 | 3.4 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 8 | 1 | 158 | audi | gas | turbo | four | sedan | fwd | front | 105.8 | ... | 131 | mpfi | 3.13 | 3.4 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 9 | 0 | ? | audi | gas | turbo | two | hatchback | 4wd | front | 99.5 | ... | 131 | mpfi | 3.13 | 3.4 | 7.0 | 160 | 5500 | 16 | 22 | ? |
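
The output shows that missing values are encoded as the string `?` (for example `normalized-losses` in rows 0 to 2 and `price` in row 9). Because of these placeholders, pandas reads such columns as text (object dtype) instead of numbers. A minimal sketch of one way to clean them: replace `?` with NaN, convert numeric columns with `pd.to_numeric`, and drop rows whose target `price` is missing. The column list `NUMERIC_COLS` and the helper name `clean_missing` are assumptions for illustration; in the notebook it would be called as `clean_missing(df)`.

```python
import numpy as np
import pandas as pd

# Assumed list of columns that should be numeric once the '?' placeholders are removed.
NUMERIC_COLS = ["normalized-losses", "bore", "stroke", "horsepower", "peak-rpm", "price"]


def clean_missing(frame: pd.DataFrame) -> pd.DataFrame:
    """Replace '?' with NaN, coerce numeric columns, and drop rows without a price."""
    out = frame.replace("?", np.nan)
    for col in NUMERIC_COLS:
        if col in out.columns:  # the demo below only has a subset of columns
            out[col] = pd.to_numeric(out[col], errors="coerce")
    # Rows without the target value cannot be used to train or evaluate a model.
    return out.dropna(subset=["price"])


# Demo built from rows 3 to 9 of the head() output above (three columns only).
demo = pd.DataFrame({
    "normalized-losses": ["164", "164", "?", "158", "?", "158", "?"],
    "horsepower": ["102", "115", "110", "110", "110", "140", "160"],
    "price": ["13950", "17450", "15250", "17710", "18920", "23875", "?"],
})
cleaned = clean_missing(demo)
print(cleaned.isna().sum())  # remaining missing values per column
print(len(demo), "->", len(cleaned), "rows")  # 7 -> 6 rows
```

Dropping rows is only one option for the feature columns; `normalized-losses` still has gaps after cleaning and would need imputation or exclusion before modeling.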


```python
# Show the last 10 rows with DataFrame.tail()
print("Last 10 rows of the dataframe")
df.tail(10)
```

Last 10 rows of the dataframe


|  | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 195 | -1 | 74 | volvo | gas | std | four | wagon | rwd | front | 104.3 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 23 | 28 | 13415 |
| 196 | -2 | 103 | volvo | gas | std | four | sedan | rwd | front | 104.3 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 24 | 28 | 15985 |
| 197 | -1 | 74 | volvo | gas | std | four | wagon | rwd | front | 104.3 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 24 | 28 | 16515 |
| 198 | -2 | 103 | volvo | gas | turbo | four | sedan | rwd | front | 104.3 | ... | 130 | mpfi | 3.62 | 3.15 | 7.5 | 162 | 5100 | 17 | 22 | 18420 |
| 199 | -1 | 74 | volvo | gas | turbo | four | wagon | rwd | front | 104.3 | ... | 130 | mpfi | 3.62 | 3.15 | 7.5 | 162 | 5100 | 17 | 22 | 18950 |
| 200 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.1 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 23 | 28 | 16845 |
| 201 | -1 | 95 | volvo | gas | turbo | four | sedan | rwd | front | 109.1 | ... | 141 | mpfi | 3.78 | 3.15 | 8.7 | 160 | 5300 | 19 | 25 | 19045 |
| 202 | -1 | 95 | volvo | gas | std | four | sedan | rwd | front | 109.1 | ... | 173 | mpfi | 3.58 | 2.87 | 8.8 | 134 | 5500 | 18 | 23 | 21485 |
| 203 | -1 | 95 | volvo | diesel | turbo | four | sedan | rwd | front | 109.1 | ... | 145 | idi | 3.01 | 3.4 | 23.0 | 106 | 4800 | 26 | 27 | 22470 |
| 204 | -1 | 95 | volvo | gas | turbo | four | sedan | rwd | front | 109.1 | ... | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 19 | 25 | 22625 |


```python
# Show 10 randomly sampled rows with DataFrame.sample()
# (the rows differ between runs unless random_state is set)
print("10 random sample rows of the dataframe")
df.sample(10)
```

10 random sample rows of the dataframe


|  | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 143 | 0 | 102 | subaru | gas | std | four | sedan | fwd | front | 97.2 | ... | 108 | mpfi | 3.62 | 2.64 | 9.0 | 94 | 5200 | 26 | 32 | 9960 |
| 71 | -1 | ? | mercedes-benz | gas | std | four | sedan | rwd | front | 115.6 | ... | 234 | mpfi | 3.46 | 3.1 | 8.3 | 155 | 4750 | 16 | 18 | 34184 |
| 156 | 0 | 91 | toyota | gas | std | four | sedan | fwd | front | 95.7 | ... | 98 | 2bbl | 3.19 | 3.03 | 9.0 | 70 | 4800 | 30 | 37 | 6938 |
| 108 | 0 | 161 | peugot | diesel | turbo | four | sedan | rwd | front | 107.9 | ... | 152 | idi | 3.7 | 3.52 | 21.0 | 95 | 4150 | 28 | 33 | 13200 |
| 17 | 0 | ? | bmw | gas | std | four | sedan | rwd | front | 110.0 | ... | 209 | mpfi | 3.62 | 3.39 | 8.0 | 182 | 5400 | 15 | 20 | 36880 |
| 60 | 0 | 115 | mazda | gas | std | four | sedan | fwd | front | 98.8 | ... | 122 | 2bbl | 3.39 | 3.39 | 8.6 | 84 | 4800 | 26 | 32 | 8495 |
| 107 | 0 | 161 | peugot | gas | std | four | sedan | rwd | front | 107.9 | ... | 120 | mpfi | 3.46 | 3.19 | 8.4 | 97 | 5000 | 19 | 24 | 11900 |
| 179 | 3 | 197 | toyota | gas | std | two | hatchback | rwd | front | 102.9 | ... | 171 | mpfi | 3.27 | 3.35 | 9.3 | 161 | 5200 | 19 | 24 | 15998 |
| 11 | 0 | 192 | bmw | gas | std | four | sedan | rwd | front | 101.2 | ... | 108 | mpfi | 3.5 | 2.8 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
| 158 | 0 | 91 | toyota | diesel | std | four | sedan | fwd | front | 95.7 | ... | 110 | idi | 3.27 | 3.35 | 22.5 | 56 | 4500 | 34 | 36 | 7898 |
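
Even in this small random sample, larger engines tend to come with higher prices (compare the BMW with a 209 engine size at 36880 against the Toyota with 98 at 6938). One quick way to explore which features relate to price is the Pearson correlation from pandas `Series.corr`. The sketch below hard-codes the `engine-size`, `horsepower`, and `price` values of the ten sampled rows above; on the full data the same calls would run on the cleaned `df`, since ten rows are far too few to draw conclusions from.

```python
import pandas as pd

# engine-size, horsepower and price of the ten sampled rows shown above.
sample = pd.DataFrame({
    "engine-size": [108, 234, 98, 152, 209, 122, 120, 171, 108, 110],
    "horsepower": [94, 155, 70, 95, 182, 84, 97, 161, 101, 56],
    "price": [9960, 34184, 6938, 13200, 36880, 8495, 11900, 15998, 16925, 7898],
})

# Pearson correlation (the Series.corr default) of each candidate feature with the target.
correlations = {
    col: sample[col].corr(sample["price"])
    for col in ["engine-size", "horsepower"]
}
for col, r in correlations.items():
    print(f"{col}: r = {r:.2f}")
```

Correlation only captures linear association between one feature and the target, so it is a screening step, not a substitute for fitting a model.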


```python
# Shape of the dataframe via the DataFrame.shape attribute (rows, columns)
print("The dataframe shape (rows, columns) is:", df.shape)
```

The dataframe shape (rows, columns) is: (205, 26)
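
With only 205 rows, a simple first baseline for the "Develop prediction models" objective is a one-feature linear regression. A minimal sketch, assuming `engine-size` as the single predictor (an illustrative choice, not a result from this notebook) and using NumPy's `np.polyfit` for ordinary least squares. The helpers `fit_simple_linear` and `r_squared` are hypothetical names. R² is computed on the same rows the line was fit on, so it is a sanity check rather than an evaluation; a held-out test set would be needed for that. The demo reuses the ten sampled rows; in the notebook the inputs would be the cleaned `df["engine-size"]` and `df["price"]`.

```python
import numpy as np


def fit_simple_linear(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)


def r_squared(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Demo on the ten sampled rows (engine-size, price) shown earlier.
engine_size = np.array([108, 234, 98, 152, 209, 122, 120, 171, 108, 110], dtype=float)
price = np.array([9960, 34184, 6938, 13200, 36880, 8495, 11900, 15998, 16925, 7898], dtype=float)

slope, intercept = fit_simple_linear(engine_size, price)
predicted = slope * engine_size + intercept
print(f"price ~ {slope:.1f} * engine-size + {intercept:.1f}")
print(f"in-sample R^2 = {r_squared(price, predicted):.3f}")
```

A single-feature line is easy to interpret and gives a reference score that multi-feature models should beat.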
