# 🚗 Automobile Data Set

The **Automobile Data Set** is an online dataset commonly used for data analysis and machine learning practice.  
It contains 26 attributes per car, such as price, engine size, horsepower, and fuel type, making it well suited to exploring **data cleaning**, **visualization**, and **modeling** techniques.

---

## 📂 Dataset Information
- **📡 Data Source:** [UCI Machine Learning Repository – Automobile Data Set](https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data)  
- **📁 Data Type:** CSV (comma-separated values) with no header row; missing values are marked with `?`  
- **🧩 File Format Example:**

In [2]:
# Import necessary libraries

import pandas as pd
import numpy as np
import requests
from io import StringIO

In [None]:
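def download(url, filename):
    """Fetch `url` and save the response body to `filename` as UTF-8 text."""
    response = requests.get(url)
    if response.status_code == 200:
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(response.text)
    else:
        # f-string so the status code is actually interpolated into the message
        print(f'Download failed: {response.status_code}')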

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/autos/imports-85.data'
filename = 'cars.csv'

download(url, filename)
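
A slightly more defensive variant of the download step could look like the sketch below. The name `download_checked` and the 30-second default timeout are assumptions made for illustration; `timeout` and `raise_for_status()` are standard parts of the `requests` API.

```python
import requests


def download_checked(url, filename, timeout=30):
    """Save the body of `url` to `filename`, raising on HTTP errors.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    # timeout stops the call from hanging forever on an unresponsive server
    response = requests.get(url, timeout=timeout)
    # raise requests.HTTPError for 4xx/5xx responses instead of printing
    response.raise_for_status()
    with open(filename, "w", encoding="utf-8") as f:
        f.write(response.text)
    return filename
```

Raising instead of printing means a failed download stops the notebook before `pd.read_csv` tries to open a file that was never written.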


In [12]:
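# NOTE: imports-85.data has no header row, so without header=None pandas
# uses the first record as column names and that record is not kept as data.
df = pd.read_csv(filename)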

# create headers list
headers = ["symboling","normalized-losses","make","fuel-type","aspiration", "num-of-doors","body-style",
         "drive-wheels","engine-location","wheel-base", "length","width","height","curb-weight","engine-type",
         "num-of-cylinders", "engine-size","fuel-system","bore","stroke","compression-ratio","horsepower",
         "peak-rpm","city-mpg","highway-mpg","price"]
print("headers\n", headers)

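# Assign the column names to the DataFrame
df.columns = headers

# The raw file marks missing values with '?'; convert them to NaN,
# then drop every row that has at least one missing value
df = df.replace('?', np.nan)
df = df.dropna()

# Save the cleaned data without the index column
df.to_csv('cars_after_erase_NaN.csv', index=False)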

headers
 ['symboling', 'normalized-losses', 'make', 'fuel-type', 'aspiration', 'num-of-doors', 'body-style', 'drive-wheels', 'engine-location', 'wheel-base', 'length', 'width', 'height', 'curb-weight', 'engine-type', 'num-of-cylinders', 'engine-size', 'fuel-system', 'bore', 'stroke', 'compression-ratio', 'horsepower', 'peak-rpm', 'city-mpg', 'highway-mpg', 'price']
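
As an alternative to renaming the columns and replacing `?` after loading, `pd.read_csv` can do both at read time. The sketch below assumes a tiny made-up sample in the same header-less, comma-separated layout (the rows and the `acme`/`zenith` makes are invented, and only four of the 26 columns are used); `header=None`, `names`, and `na_values` are standard `read_csv` parameters.

```python
from io import StringIO

import pandas as pd

# Invented records in a header-less CSV layout (illustrative, not real data)
sample = StringIO(
    "1,100,acme,9000\n"
    "0,?,acme,?\n"
    "2,120,zenith,12000\n"
)
columns = ["symboling", "normalized-losses", "make", "price"]

# header=None keeps the first record as data, names supplies the column
# labels, and na_values turns every '?' into NaN while parsing
df_sample = pd.read_csv(sample, header=None, names=columns, na_values="?")

# Drop rows with at least one missing value, as in the cell above
clean = df_sample.dropna()

print(df_sample.isna().sum())  # missing values per column
print(len(df_sample), "rows before,", len(clean), "rows after dropna()")
```

Because `?` never reaches the DataFrame as a string, columns such as `price` are parsed as numbers straight away.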


In [None]:
print('===The first 5 rows===')
print(df.head())

===The first 5 rows===
   symboling normalized-losses  make fuel-type aspiration num-of-doors  \
2          2               164  audi       gas        std         four   
3          2               164  audi       gas        std         four   
5          1               158  audi       gas        std         four   
7          1               158  audi       gas      turbo         four   
9          2               192   bmw       gas        std          two   

  body-style drive-wheels engine-location  wheel-base  ...  engine-size  \
2      sedan          fwd           front        99.8  ...          109   
3      sedan          4wd           front        99.4  ...          136   
5      sedan          fwd           front       105.8  ...          136   
7      sedan          fwd           front       105.8  ...          131   
9      sedan          rwd           front       101.2  ...          108   

   fuel-system  bore  stroke compression-ratio horsepower  peak-rpm city-mpg  \
2
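
Columns that contained `?` when the file was read, such as `normalized-losses`, remain stored as strings even after the rows with missing values are dropped. One possible way to convert them is `pd.to_numeric`, sketched below on a small made-up frame; the frame `df_demo`, its values, and the choice of columns in `numeric_cols` are assumptions for illustration only.

```python
import pandas as pd

# Invented frame mimicking numeric columns that were read as strings
df_demo = pd.DataFrame({
    "make": ["acme", "zenith"],
    "normalized-losses": ["100", "120"],
    "price": ["9000", "12000"],
})

# Assumed subset of columns that should hold numbers
numeric_cols = ["normalized-losses", "price"]

# Apply pd.to_numeric column by column and write the result back
df_demo[numeric_cols] = df_demo[numeric_cols].apply(pd.to_numeric)

print(df_demo.dtypes)
print(df_demo["price"].mean())
```

Passing `errors="coerce"` to `pd.to_numeric` would turn any entry that cannot be parsed into NaN instead of raising an error.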

In [None]:
print('===DataFrame Info===')
df.info()

===DataFrame Info===
<class 'pandas.core.frame.DataFrame'>
Index: 159 entries, 2 to 203
Data columns (total 26 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   symboling          159 non-null    int64  
 1   normalized-losses  159 non-null    object 
 2   make               159 non-null    object 
 3   fuel-type          159 non-null    object 
 4   aspiration         159 non-null    object 
 5   num-of-doors       159 non-null    object 
 6   body-style         159 non-null    object 
 7   drive-wheels       159 non-null    object 
 8   engine-location    159 non-null    object 
 9   wheel-base         159 non-null    float64
 10  length             159 non-null    float64
 11  width              159 non-null    float64
 12  height             159 non-null    float64
 13  curb-weight        159 non-null    int64  
 14  engine-type        159 non-null    object 
 15  num-of-cylinders   159 non-null    object 
 16  eng