# 🚗 Automobile Data Set

The **Automobile Data Set** is a public dataset from the UCI Machine Learning Repository that is commonly used for data analysis and machine learning practice.  
It describes 205 cars with 26 attributes, such as price, engine size, horsepower, and fuel type, which makes it well suited for practicing **data cleaning**, **visualization**, and **modeling** techniques.

---

## 📂 Dataset Information
- **📡 Data Source:** [UCI Machine Learning Repository – Automobile Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data)  
- **📁 Data Type:** CSV (Comma-Separated Values)  
- **🧩 File Format:** comma-separated rows with no header line; missing values are marked with `?`
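Because the file has no header row and uses `?` for missing values, pandas can handle both while parsing. The following is a minimal sketch that assumes the same 26 column names used later in this notebook. `load_autos` and `HEADERS` are hypothetical names chosen for illustration.

```python
import pandas as pd

# Column names for the 26 attributes (the raw file has no header row)
HEADERS = ["symboling", "normalized-losses", "make", "fuel-type", "aspiration",
           "num-of-doors", "body-style", "drive-wheels", "engine-location",
           "wheel-base", "length", "width", "height", "curb-weight", "engine-type",
           "num-of-cylinders", "engine-size", "fuel-system", "bore", "stroke",
           "compression-ratio", "horsepower", "peak-rpm", "city-mpg",
           "highway-mpg", "price"]


def load_autos(source):
    """Read imports-85 style data from a path, URL, or file-like object.

    header=None keeps the first record as data, names= supplies the column
    labels, and na_values="?" turns the "?" placeholders into NaN so that
    numeric columns are parsed as numbers instead of strings.
    """
    return pd.read_csv(source, header=None, names=HEADERS, na_values="?")
```

For example, `load_autos(url)` can read the UCI link directly, since `pd.read_csv` accepts URLs as well as local paths.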

```python
# Import necessary libraries
import pandas as pd
import numpy as np
import requests
```

```python
# Create a download function that fetches data from a URL and writes it to a local file
def download(url, filename):
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(response.text)
    else:
        print(f'Download failed: {response.status_code}')

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data'
filename = 'cars.csv'

download(url, filename)
```
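Before cleaning, it can help to see how many `?` placeholders each column holds, because `dropna()` discards an entire row if any single field is missing. The following is a minimal sketch. `count_placeholders` is a hypothetical helper, and it assumes the raw file is read as-is so that `?` is still a plain string.

```python
import pandas as pd


def count_placeholders(frame, marker="?"):
    """Count cells equal to `marker` in each column.

    Only columns with at least one placeholder are kept, sorted from
    most to fewest. Numeric columns compare as all False, so they drop out.
    """
    counts = (frame == marker).sum()
    return counts[counts > 0].sort_values(ascending=False)
```

For example, `count_placeholders(pd.read_csv(filename, header=None))` reports the affected columns by position, before any headers are assigned.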


```python
# Read the data into a DataFrame; the raw file has no header row, so pass header=None
df = pd.read_csv(filename, header=None)

# Create a list to store headers
headers = ["symboling","normalized-losses","make","fuel-type","aspiration", "num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base", "length","width","height","curb-weight","engine-type",
         "num-of-cylinders", "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
         "peak-rpm","city-mpg","highway-mpg","price"]
print("headers\n", headers)

# Assign the headers
df.columns = headers

# Replace the '?' placeholders with NaN, then drop every row that has any missing value
df = df.replace('?', np.nan)
df = df.dropna()

# Save the cleaned data
df.to_csv('cars_after_erase_NaN.csv', index=False)
```

```text
headers
 ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
```

```python
print('===The first 5 rows===')
df.head()
```

```text
===The first 5 rows===
```

| | symboling | normalized-losses | make | fuel-type | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | ... | engine-size | fuel-system | bore | stroke | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2 | 164 | audi | gas | std | four | sedan | fwd | front | 99.8 | ... | 109 | mpfi | 3.19 | 3.4 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 4 | 2 | 164 | audi | gas | std | four | sedan | 4wd | front | 99.4 | ... | 136 | mpfi | 3.19 | 3.4 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 6 | 1 | 158 | audi | gas | std | four | sedan | fwd | front | 105.8 | ... | 136 | mpfi | 3.19 | 3.4 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 1 | 158 | audi | gas | turbo | four | sedan | fwd | front | 105.8 | ... | 131 | mpfi | 3.13 | 3.4 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 10 | 2 | 192 | bmw | gas | std | two | sedan | rwd | front | 101.2 | ... | 108 | mpfi | 3.5 | 2.8 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
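The `...` column in the table above marks columns that pandas hid. The `display.max_columns` option limits how many columns a notebook shows (20 by default), and this frame has 26. The following short sketch uses `pd.option_context` to lift that limit for a single call only. `head_all_columns` is a hypothetical helper name.

```python
import pandas as pd


def head_all_columns(frame, n=5):
    """Return the text repr of the first `n` rows without column truncation.

    pd.option_context sets display.max_columns=None (no limit) only inside
    the `with` block, so the global display settings are left unchanged.
    """
    with pd.option_context("display.max_columns", None):
        return repr(frame.head(n))
```

In a notebook, the same context manager can wrap `display(df.head())` so that the rendered table shows every column.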


```python
print('===Statistic Description===')
df.describe()
```

```text
===Statistic Description===
```

| | symboling | wheel-base | length | width | height | curb-weight | engine-size | compression-ratio | city-mpg | highway-mpg |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 | 159.0 |
| mean | 0.735849 | 98.264151 | 172.413836 | 65.607547 | 53.899371 | 2461.138365 | 119.226415 | 10.161132 | 26.522013 | 32.081761 |
| std | 1.193086 | 5.167416 | 11.523177 | 1.947883 | 2.268761 | 481.941321 | 30.460791 | 3.889475 | 6.097142 | 6.459189 |
| min | -2.0 | 86.6 | 141.1 | 60.3 | 49.4 | 1488.0 | 61.0 | 7.0 | 15.0 | 18.0 |
| 25% | 0.0 | 94.5 | 165.65 | 64.0 | 52.25 | 2065.5 | 97.0 | 8.7 | 23.0 | 28.0 |
| 50% | 1.0 | 96.9 | 172.4 | 65.4 | 54.1 | 2340.0 | 110.0 | 9.0 | 26.0 | 32.0 |
| 75% | 2.0 | 100.8 | 177.8 | 66.5 | 55.5 | 2809.5 | 135.0 | 9.4 | 31.0 | 37.0 |
| max | 3.0 | 115.6 | 202.6 | 71.7 | 59.8 | 4066.0 | 258.0 | 23.0 | 49.0 | 54.0 |
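`describe()` summarizes only 10 of the 26 columns because columns that contained `?` in the raw file, such as `normalized-losses`, `horsepower`, and `price`, are still stored as strings after the placeholders are replaced and dropped. The following is a minimal sketch that assumes the six column names listed below, taken from the `headers` list. `to_numeric_columns` is a hypothetical helper that converts those columns with `pd.to_numeric`.

```python
import pandas as pd

# Columns that hold numbers but may still be stored as text after cleaning
# (assumed from the headers list; adjust if your columns differ)
NUMERIC_TEXT_COLUMNS = ["normalized-losses", "bore", "stroke",
                        "horsepower", "peak-rpm", "price"]


def to_numeric_columns(frame, columns=NUMERIC_TEXT_COLUMNS):
    """Return a copy of `frame` with `columns` converted to numbers.

    errors="coerce" turns any value that cannot be parsed into NaN
    instead of raising, so a leftover placeholder does not stop the conversion.
    """
    out = frame.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out
```

After `df = to_numeric_columns(df)`, calling `df.describe()` should also include these six columns.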
