# Analyzing the used-car dataset from Autotrader and Carfax

Data dictionary:
- **id:** Car identification number
- **brand:** Car make
- **model:** Car model
- **year:** Year of manufacture
- **miles:** Odometer reading, in miles
- **city_mileage:** Fuel economy in city driving (miles per gallon)
- **highway_mileage:** Fuel economy in highway driving (miles per gallon)
- **horsepower:** Engine power
- **torque:** Engine torque
- **engine_capacity_litre:** Engine capacity, in litres
- **fuel_capacity:** Fuel tank capacity
- **num_cylinder:** Number of cylinders
- **num_seat:** Number of seats
- **num_owners:** Number of owners
- **price:** Car price
- **link:** URL of the listing
- **condition:** Car condition
- **type:** Body type
- **doors:** Number of doors
- **wheel_drive:** Drivetrain (number of driven wheels)
- **engine_type:** Engine type
- **speed_levels:** Number of transmission speeds
- **front_headroom:** Front headroom
- **front_legroom:** Front legroom
- **rear_headroom:** Rear headroom
- **rear_legroom:** Rear legroom
- **service_records:** Number of service records

In [1]:
# Import the libraries
import pandas as pd

In [17]:
# Load the data, treating common placeholder strings as missing values
carros = pd.read_csv("Data/cars.csv", sep=",", na_values=["NaN", "na", "N/A", ""])

In [18]:
carros.head()

,id,brand,model,year,miles,city_mileage,highway_mileage,horsepower,torque,engine_capacity_litre,...,type,doors,wheel_drive,engine_type,speed_levels,front_headroom,front_legroom,rear_headroom,rear_legroom,service_records
0,3,Chevrolet,Volt LT,2019,27173,43,42,149,294,1.5,...,sedan,4,2,hybrid_gas_electric,6.0,37.8,42.1,35.8,34.7,4
1,6,Hyundai,Elantra SE,2017,76941,29,38,146,132,2.0,...,sedan,4,2,gas,6.0,39.0,42.2,37.0,35.7,16
2,7,Ford,Focus,2014,97027,27,37,159,146,2.0,...,sedan,4,2,gas,6.0,38.3,43.7,38.0,33.2,13
3,8,Honda,Civic,2016,95396,31,42,158,138,1.5,...,sedan,4,2,gas,6.0,37.5,42.3,36.8,37.4,26
4,9,Hyundai,Elantra SE,2019,28405,29,38,147,132,2.0,...,sedan,4,2,gas,6.0,39.0,42.2,37.0,35.7,5


## Exploratory Data Analysis


In [19]:
# Check the size of the dataset (rows, columns)
carros.shape

(52, 27)

In [24]:
# Count the missing values in each column
carros.isnull().sum()

id                        0
brand                     0
model                     0
year                      0
miles                     0
city_mileage              0
highway_mileage           0
horsepower                0
torque                    0
engine_capacity_litre     0
fuel_capacity             0
num_cylinder              0
num_seat                  0
num_owners                0
price                     0
link                      0
condition                51
type                      0
doors                     0
wheel_drive               0
engine_type               0
speed_levels              1
front_headroom            0
front_legroom             0
rear_headroom             0
rear_legroom              0
service_records           0
dtype: int64
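
The counts above show `condition` missing in 51 of the 52 rows and `speed_levels` missing in a single row, which is why the cleaning step below drops the `condition` column but only the one incomplete row. A minimal sketch of the same reasoning, expressed as a share of missing values per column. The `toy` frame and the 0.5 cutoff are assumptions made for illustration, not values from `cars.csv`:

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical values (not taken from cars.csv)
toy = pd.DataFrame({
    "price": [21000, 19500, 22994, 18000],
    "condition": [np.nan, np.nan, np.nan, "good"],
    "speed_levels": [6.0, np.nan, 6.0, 9.0],
})

# Share of missing values per column (mean of the boolean mask)
missing_share = toy.isnull().mean()

# Assumed cutoff: columns missing in more than half of the rows
threshold = 0.5
mostly_empty = missing_share[missing_share > threshold].index.tolist()

print(missing_share)
print(mostly_empty)
```

Columns listed in `mostly_empty` are candidates for dropping entirely, while columns with only a few gaps can be handled by dropping the affected rows.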

### Data cleaning

In [27]:
# Derive the company name from the host part of each listing URL
carros["empresa"] = (
    carros["link"]
    .str.split("/")
    .str[2]
    .replace({"www.autotrader.com": "Autotrader", "www.carfax.com": "Carfax", "www.carvana.com": "Carvana"})
)
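
Splitting on `/` and taking element 2 works because a URL of the form `scheme://host/path` splits into the scheme, an empty string, and then the host. A small sketch of that behaviour, checked against the standard library's `urlparse` as an alternative. The URLs are made up for illustration and do not come from the dataset:

```python
from urllib.parse import urlparse

import pandas as pd

# Hypothetical listing URLs (illustrative only)
links = pd.Series([
    "https://www.autotrader.com/cars-for-sale/vehicle/123",
    "https://www.carfax.com/vehicle/ABC",
    "https://www.carvana.com/vehicle/456",
])

# "https://host/path".split("/") gives ["https:", "", "host", "path", ...]
host_by_split = links.str.split("/").str[2]

# Same result using the URL parser instead of positional splitting
host_by_parse = links.map(lambda url: urlparse(url).netloc)

names = {
    "www.autotrader.com": "Autotrader",
    "www.carfax.com": "Carfax",
    "www.carvana.com": "Carvana",
}
company = host_by_parse.replace(names)
print(company.tolist())
```

The positional split assumes every link includes the scheme. `urlparse` makes the same assumption here, but states the intent (take the network location) more explicitly.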

In [28]:
# Drop the columns that will not be used
carros.drop(["id", "link", "condition"], axis = 1, inplace = True)

In [31]:
# Drop the rows that still contain missing values
carros.dropna(inplace = True)

In [33]:
carros.describe()

,year,miles,city_mileage,highway_mileage,horsepower,torque,engine_capacity_litre,fuel_capacity,num_cylinder,num_seat,num_owners,price,doors,wheel_drive,speed_levels,front_headroom,front_legroom,rear_headroom,rear_legroom,service_records
count,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0,51.0
mean,2018.686275,33984.137255,28.745098,37.372549,176.176471,179.470588,1.839216,14.37451,4.039216,5.0,1.352941,21660.764706,4.0,2.156863,6.313725,38.672549,42.472549,37.241176,37.660784,7.431373
std,1.406985,22918.580263,5.613709,4.161542,33.758973,46.376871,0.380567,2.418582,0.280056,0.0,0.62685,3488.385097,0.0,0.543049,0.76132,0.9263,1.054908,0.448186,1.607243,4.700021
min,2014.0,5000.0,17.0,24.0,143.0,99.0,1.4,8.9,4.0,5.0,1.0,13590.0,4.0,2.0,6.0,37.5,41.1,35.8,33.2,1.0
25%,2018.0,16328.5,26.0,35.5,152.0,138.0,1.5,12.4,4.0,5.0,1.0,19496.0,4.0,2.0,6.0,37.65,42.2,37.05,37.4,4.0
50%,2019.0,27440.0,30.0,38.0,159.0,184.0,2.0,14.0,4.0,5.0,1.0,21998.0,4.0,2.0,6.0,38.5,42.3,37.2,37.4,6.0
75%,2019.5,43085.0,30.0,40.0,192.0,192.0,2.0,15.3,4.0,5.0,2.0,22994.0,4.0,2.0,6.0,39.3,42.4,37.5,38.3,10.0
max,2022.0,97027.0,55.0,49.0,288.0,294.0,3.5,19.0,6.0,5.0,3.0,33777.0,4.0,4.0,9.0,40.4,45.5,38.0,40.4,26.0
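
In the summary above, `num_seat` and `doors` have a standard deviation of 0: every remaining car has 5 seats and 4 doors, so these columns cannot distinguish one car from another. A minimal sketch of how such constant columns could be listed, using a toy frame with hypothetical values rather than the real data:

```python
import pandas as pd

# Toy frame with hypothetical values (not taken from cars.csv)
toy = pd.DataFrame({
    "price": [21000, 19500, 22994],
    "num_seat": [5, 5, 5],
    "doors": [4, 4, 4],
})

# A column is constant when it holds exactly one distinct value
constant_cols = [col for col in toy.columns if toy[col].nunique() == 1]
print(constant_cols)
```

Whether to drop such columns depends on the later analysis. The sketch only identifies them.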
