## Notebook for Used Cars in Germany ([Kaggle Dataset](https://www.kaggle.com/datasets/wspirat/germany-used-cars-dataset-2023))
### 1. General Idea
---
From the Kaggle page we can see that the dataset is mostly clean, but it still needs a little cleaning before we feed it
to our visualization tool, Tableau.
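To see how much cleaning a column needs, one option is to count empty strings and `NA`s per column and list a column's distinct values. The sketch below runs on a small hypothetical data frame `toy` that stands in for `car_df`. Its values, such as `"04/2023"`, are made up for illustration and are not taken from the dataset.

```r
# Hypothetical stand-in for car_df; the real data is read from Files/car_data.csv
toy <- data.frame(
  year      = c("1995", "2023", "04/2023", ""),
  fuel_type = c("Petrol", "Diesel", "Unknown", "Petrol"),
  stringsAsFactors = FALSE
)

# Count empty strings and NAs in every column
problems <- sapply(toy, function(col) sum(is.na(col) | col == ""))
print(problems)

# List the distinct values of one column to spot invalid entries such as "Unknown"
print(sort(unique(toy$fuel_type)))
```

On the real data, the same two calls applied to `car_df` show which columns the filters in section 2 need to handle.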

In [1]:
# load the tidyverse (dplyr, readr, ...) and janitor
library(tidyverse)
library(janitor)

# load the .csv file into a data frame
car_df <- read.csv("Files/car_data.csv")

── Attaching core tidyverse packages ──────────────────────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

### 2. Choosing Our Columns
---
The columns we care about are mainly `brand`, `model`, `color`, `year`, `price_in_euro`, `transmission_type`, `fuel_type`, `fuel_consumption_l_100km` and `mileage_in_km` (the code below also keeps the `num` column). Our main objective is to load these columns into a working data frame and clean it.

In [2]:
# choose the columns we need for the dashboard
car <- car_df[c('num', 'brand', 'model', 'color', 'year', 'price_in_euro', 'transmission_type', 'fuel_type', 'fuel_consumption_l_100km', 'mileage_in_km')]

# car2 is a working copy: every cleaning step is applied to it, leaving car untouched
car2 <- car
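Subsetting with `car_df[c(...)]` stops with "undefined columns selected" if any requested name is missing. A more defensive variant, sketched below on a small hypothetical data frame (`toy` and `wanted` are illustrative names, not part of the notebook), reports missing columns first and then keeps only the ones that exist.

```r
# Hypothetical stand-in for car_df with one of the wanted columns absent
toy <- data.frame(num = 0:1, brand = c("alfa-romeo", "audi"),
                  price_in_euro = c("1300", "4900"), stringsAsFactors = FALSE)

wanted <- c("num", "brand", "price_in_euro", "mileage_in_km")

# Report any requested column that is missing instead of failing on the subset
missing_cols <- setdiff(wanted, names(toy))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}

# Keep only the wanted columns that are actually present, in the wanted order
selected <- toy[intersect(wanted, names(toy))]
```

Whether to warn or to stop on a missing column is a design choice; for a dashboard export, stopping early with `stop()` may be the safer option.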

In [3]:
# filter the columns whose unique values turned out to contain invalid entries

# keep model years 1995-2023; non-numeric years become NA and are dropped
car2 <- car2 %>% filter(as.integer((year)) %in% c((1995:2023)))

# keep only the four main fuel types
car2 <- car2 %>% filter(fuel_type %in% c("Petrol", "Diesel", "Hybrid", "Electric"))

# drop rows with an unknown transmission type
car2 <- car2 %>% filter(transmission_type != "Unknown")

# keep only consumption values given in "l/100 km" units
car2 <- car2 %>% filter(grepl("l/100 km", fuel_consumption_l_100km, fixed = TRUE))


car2

[1m[22m[36mℹ[39m In argument: `as.integer((year)) %in% c((1995:2023))`.
[33m![39m NAs durch Umwandlung erzeugt"


| num | brand | model | color | year | price_in_euro | transmission_type | fuel_type | fuel_consumption_l_100km | mileage_in_km |
|---|---|---|---|---|---|---|---|---|---|
| `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 0 | alfa-romeo | Alfa Romeo GTV | red | 1995 | 1300 | Manual | Petrol | 10,9 l/100 km | 160500 |
| 3 | alfa-romeo | Alfa Romeo Spider | black | 1995 | 4900 | Manual | Petrol | 9,5 l/100 km | 189500 |
| 4 | alfa-romeo | Alfa Romeo 164 | red | 1996 | 17950 | Manual | Petrol | 7,2 l/100 km | 96127 |
| 5 | alfa-romeo | Alfa Romeo Spider | red | 1996 | 7900 | Manual | Petrol | 9,5 l/100 km | 47307 |
| 6 | alfa-romeo | Alfa Romeo 145 | red | 1996 | 3500 | Manual | Petrol | 8,8 l/100 km | 230000 |
| 7 | alfa-romeo | Alfa Romeo 164 | black | 1996 | 5500 | Manual | Petrol | 13,4 l/100 km | 168000 |
| 8 | alfa-romeo | Alfa Romeo Spider | black | 1996 | 8990 | Manual | Petrol | 11 l/100 km | 168600 |
| 9 | alfa-romeo | Alfa Romeo Spider | black | 1996 | 6976 | Manual | Petrol | 9,2 l/100 km | 99000 |
| 10 | alfa-romeo | Alfa Romeo Spider | silver | 1996 | 5499 | Manual | Petrol | 11,1 l/100 km | 157000 |
| 11 | alfa-romeo | Alfa Romeo Spider | silver | 1996 | 8499 | Manual | Petrol | 9,5 l/100 km | 15550 |
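The "NAs introduced by coercion" warning above comes from `as.integer()` meeting `year` strings that are not plain numbers. It does not break the filter: such values become `NA`, and `NA %in% 1995:2023` is `FALSE`, so those rows are dropped. A minimal illustration on made-up `year` strings (the values are hypothetical):

```r
years <- c("1995", "2010", "04/2023", "2023", "")

# Non-numeric strings become NA; suppressWarnings() hides the coercion warning here
as_int <- suppressWarnings(as.integer(years))
print(as_int)

# NA is never %in% an integer range, so rows with unparseable years are filtered out
keep <- as_int %in% 1995:2023
print(keep)
```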


In [None]:
# drop rows where any selected column is an empty string
# (comparing NA with "" yields NA, so rows containing NA are dropped as well)
car2 <- car2 %>% filter(if_all(everything(), ~ .x != ""))

car2

| | num | brand | model | color | year | price_in_euro | transmission_type | fuel_type | fuel_consumption_l_100km | mileage_in_km |
|---|---|---|---|---|---|---|---|---|---|---|
| | `<int>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<chr>` | `<dbl>` |
| 1 | 0 | alfa-romeo | Alfa Romeo GTV | red | 1995 | 1300 | Manual | Petrol | 10,9 l/100 km | 160500 |
| 2 | 3 | alfa-romeo | Alfa Romeo Spider | black | 1995 | 4900 | Manual | Petrol | 9,5 l/100 km | 189500 |
| 3 | 4 | alfa-romeo | Alfa Romeo 164 | red | 1996 | 17950 | Manual | Petrol | 7,2 l/100 km | 96127 |
| 4 | 5 | alfa-romeo | Alfa Romeo Spider | red | 1996 | 7900 | Manual | Petrol | 9,5 l/100 km | 47307 |
| 5 | 6 | alfa-romeo | Alfa Romeo 145 | red | 1996 | 3500 | Manual | Petrol | 8,8 l/100 km | 230000 |
| 6 | 7 | alfa-romeo | Alfa Romeo 164 | black | 1996 | 5500 | Manual | Petrol | 13,4 l/100 km | 168000 |
| 7 | 8 | alfa-romeo | Alfa Romeo Spider | black | 1996 | 8990 | Manual | Petrol | 11 l/100 km | 168600 |
| 8 | 9 | alfa-romeo | Alfa Romeo Spider | black | 1996 | 6976 | Manual | Petrol | 9,2 l/100 km | 99000 |
| 9 | 10 | alfa-romeo | Alfa Romeo Spider | silver | 1996 | 5499 | Manual | Petrol | 11,1 l/100 km | 157000 |
| 10 | 11 | alfa-romeo | Alfa Romeo Spider | silver | 1996 | 8499 | Manual | Petrol | 9,5 l/100 km | 15550 |
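An alternative to comparing every column with `""` is to turn blank fields into `NA` while reading the file, via `read.csv(..., na.strings = c("", "NA"))`, and then keep complete rows with `complete.cases()`. This is a swapped-in technique, not what the notebook does; the sketch uses a small made-up CSV string instead of `Files/car_data.csv`.

```r
# Made-up CSV text with one blank color and one blank brand
csv_text <- "num,brand,color,mileage_in_km
0,alfa-romeo,red,160500
3,alfa-romeo,,189500
4,,black,96127"

# na.strings = c("", "NA") makes read.csv store blank fields as NA
toy <- read.csv(text = csv_text, na.strings = c("", "NA"))

# complete.cases() keeps only rows without any NA
clean <- toy[complete.cases(toy), ]
print(clean)
```

One difference to keep in mind: fields containing only spaces are not blank, so they stay as strings unless `strip.white = TRUE` is also passed.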


### 3. Saving the Result
---
After cleaning the data, we save it to the "Files" directory.
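As the previews show, `price_in_euro` and `fuel_consumption_l_100km` are still character columns, and consumption uses a German decimal comma plus a unit suffix, so Tableau will likely treat them as text. An optional conversion step before saving could look like the sketch below. It assumes every consumption value follows the `"10,9 l/100 km"` pattern seen in the preview and every price is a plain digit string; values that do not fit become `NA`. The data frame `toy` is hypothetical.

```r
# Hypothetical rows copied from the preview above
toy <- data.frame(
  price_in_euro            = c("1300", "4900", "17950"),
  fuel_consumption_l_100km = c("10,9 l/100 km", "11 l/100 km", "7,2 l/100 km"),
  stringsAsFactors = FALSE
)

# Strip the unit suffix, then replace the German decimal comma with a point
consumption <- sub(" l/100 km", "", toy$fuel_consumption_l_100km, fixed = TRUE)
toy$fuel_consumption_l_100km <- as.numeric(sub(",", ".", consumption, fixed = TRUE))

# Prices are assumed to be plain digit strings, so as.numeric() is enough
toy$price_in_euro <- as.numeric(toy$price_in_euro)
str(toy)
```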

In [None]:
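# row.names = FALSE keeps R's row index out of the file, so Tableau does not see an extra unnamed column
write.csv(car2, "Files/Clean_car_data.csv", row.names = FALSE)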

In [1]:
# Python cell (run in a Python kernel): render the saved dashboard link
from IPython.display import display, HTML
display(HTML(filename="Files/Dashboard_link.html"))