## Answer questions

In [373]:
%load_ext autoreload
%autoreload 2


In [374]:
import pandas as pd
from src.paths import TRANSFORMED_DATA_DIR, MODELS_DIR, RAW_DATA_DIR

In [375]:
data = pd.read_csv(TRANSFORMED_DATA_DIR / 'transformed_data.csv')

In [376]:
data.shape

(4817, 22)

In [377]:
# Convert date columns to datetime
data['registration_date'] = pd.to_datetime(data['registration_date'])
data['sold_at'] = pd.to_datetime(data['sold_at'])

In [378]:
# Change data types from object to categorical
from src.data import convert_object_columns_to_category

data = convert_object_columns_to_category(data)

In [379]:
from src.data import get_train_test_data

In [380]:
# Read metadata json file from models folder
import json

# Use a context manager so the file handle is closed after reading
with open(MODELS_DIR / 'metadata.json') as f:
    metadata = json.load(f)

In [381]:
print(f'Model being used is: {metadata["name"]}')

Model being used is: XGBoost linear tuned with RandomizedSearchCV


In [382]:
features = metadata['features']
target = metadata['target']

In [383]:
# Print the features
features

['model_key',
 'mileage',
 'engine_power',
 'fuel',
 'paint_color',
 'car_type',
 'feature_1',
 'feature_2',
 'feature_3',
 'feature_4',
 'feature_5',
 'feature_6',
 'feature_7',
 'feature_8',
 'age_in_months_when_sold',
 'month_sold_at',
 'season_sold_at',
 'model_initial']

In [384]:
X, y, X_train, X_test, y_train, y_test = get_train_test_data(data, features, target)

In [385]:
import joblib

In [386]:
# Load the model
model = joblib.load(MODELS_DIR / 'model.pkl')

In [387]:
# Print the model
model

### Q1

In [388]:
# Read feature importance df from models folder
feature_importance = joblib.load(MODELS_DIR / 'feature_importance_df.pkl')

In [389]:
feature_importance

| | feature | importance |
|---|---|---|
| 6 | feature_1 | 0.217826 |
| 13 | feature_8 | 0.19279 |
| 11 | feature_6 | 0.156673 |
| 8 | feature_3 | 0.155754 |
| 9 | feature_4 | 0.152137 |
| 0 | model_key | 0.053437 |
| 15 | month_sold_at | 0.05067 |
| 7 | feature_2 | 0.050252 |
| 2 | engine_power | 0.043828 |
| 5 | car_type | 0.037984 |
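The top of this ranking is dominated by five binary car features (feature_1, feature_8, feature_6, feature_3 and feature_4), each with an importance above 0.15, while the next feature (model_key) drops to about 0.05. As a quick check, the printed rows can be rebuilt into a DataFrame and the top five pulled out with `nlargest`. The values below are copied from the output above, and only the ten displayed rows are included:

```python
import pandas as pd

# Rows copied from the printed feature_importance output (top 10 only)
feature_importance = pd.DataFrame({
    "feature": ["feature_1", "feature_8", "feature_6", "feature_3", "feature_4",
                "model_key", "month_sold_at", "feature_2", "engine_power", "car_type"],
    "importance": [0.217826, 0.19279, 0.156673, 0.155754, 0.152137,
                   0.053437, 0.05067, 0.050252, 0.043828, 0.037984],
})

# Five largest importances, highest first
top5 = feature_importance.nlargest(5, "importance")
print(top5["feature"].tolist())
# ['feature_1', 'feature_8', 'feature_6', 'feature_3', 'feature_4']
```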


### Q2

As found during the data exploration phase:
- Except for feature_7, cars with a True value in the other car features have, on average, higher prices than cars without them. This suggests that these features will be important for predictive models.
- There seems to be a peak in average price in Aug 2018. This may be caused by the sale of a highly priced car in that month.
- Hybrid and electric cars are more expensive on average.
- Average prices of electric cars were stable from winter to summer, and no electric cars were sold in autumn.
- Diesel and petrol cars had similar average prices, although average petrol prices dropped starting in summer 2018.
- On average, the most expensive car type is suv, although coupe was the most expensive at the start of the year and then dropped below suv, also starting in summer.
- Coupe and convertible cars were, on average, more expensive in winter than in summer.
- Vans were more expensive, on average, in spring, summer, and autumn than in winter.
- Subcompact cars generally had the lowest average prices.
- Paint color does not seem to be generally associated with average price, except for green, which consistently had much lower prices than other colors, possibly because it is less popular.
- Orange and white cars sold for more, on average, in summer than in winter and spring.
- Red cars showed the opposite pattern, with lower average prices in summer than in winter and spring.

Below, the same analysis is repeated using the model's estimated price instead of the actual price.

In [390]:
pred = model.predict(X)

In [391]:
data_q2 = data.copy()

In [392]:
data_q2['price'] = pred

In [393]:
from src.plots import plot_avg_target_time_series_by_features

In [394]:
# Load car features
car_features = joblib.load(RAW_DATA_DIR / 'car_features.pkl')

In [395]:
plot_avg_target_time_series_by_features(data_q2, car_features)

In [396]:
# Load small cardinality features
small_cardinality_features = joblib.load(RAW_DATA_DIR / 'small_cardinality_features.pkl')

In [397]:
plot_avg_target_time_series_by_features(data_q2, small_cardinality_features)

In [398]:
from src.questions import ttest_mean_price_difference_between_groups_after_filter

In [399]:
grouping_column = 'season_sold_at'
group_1 = 'winter'
group_2 = 'summer'

# Run one t-test per value of each small-cardinality feature (winter vs. summer prices)
rows = []
for feature in small_cardinality_features:
    for feature_value in data_q2[feature].unique():
        t_stat, p_val = ttest_mean_price_difference_between_groups_after_filter(
            data_q2, feature, feature_value, grouping_column, group_1, group_2)
        rows.append({'feature': feature, 'feature_value': feature_value,
                     't_stat': t_stat, 'p_val': p_val})

ttest_df = pd.DataFrame(rows)

In [400]:
ttest_df.sort_values(by='p_val')

| | feature | feature_value | t_stat | p_val |
|---|---|---|---|---|
| 8 | car_type | sedan | 4.255771 | 2.5e-05 |
| 14 | paint_color | white | -2.437928 | 0.015437 |
| 5 | car_type | coupe | 2.439044 | 0.019033 |
| 6 | car_type | estate | 2.233628 | 0.025801 |
| 1 | fuel | petrol | 1.49919 | 0.138137 |
| 10 | car_type | suv | -1.297684 | 0.194888 |
| 12 | paint_color | black | 1.125668 | 0.260642 |
| 4 | car_type | convertible | 1.153081 | 0.262472 |
| 11 | car_type | van | -0.956647 | 0.352153 |
| 20 | paint_color | brown | -0.852651 | 0.395104 |
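The helper `ttest_mean_price_difference_between_groups_after_filter` lives in `src.questions` and its code is not shown here. The following is a minimal sketch of what such a function might do, assuming it keeps the rows where `feature == feature_value`, splits their prices by the two groups of `grouping_column`, and runs Welch's t-test with `scipy.stats.ttest_ind`. The name `welch_ttest_after_filter`, the `target="price"` default and the toy data are hypothetical:

```python
import pandas as pd
from scipy import stats


def welch_ttest_after_filter(df, feature, feature_value, grouping_column,
                             group_1, group_2, target="price"):
    """Hypothetical sketch: Welch's t-test on `target` between two groups of
    `grouping_column`, restricted to rows where df[feature] == feature_value."""
    subset = df[df[feature] == feature_value]
    a = subset.loc[subset[grouping_column] == group_1, target]
    b = subset.loc[subset[grouping_column] == group_2, target]
    # equal_var=False selects Welch's test, which does not assume equal variances
    result = stats.ttest_ind(a, b, equal_var=False)
    return result.statistic, result.pvalue


# Toy data: sedans are clearly more expensive in winter than in summer
toy = pd.DataFrame({
    "car_type": ["sedan"] * 6 + ["suv"] * 2,
    "season_sold_at": ["winter", "winter", "winter", "summer", "summer", "summer",
                       "winter", "summer"],
    "price": [21000, 22000, 23000, 15000, 16000, 17000, 30000, 31000],
})
t_stat, p_val = welch_ttest_after_filter(toy, "car_type", "sedan",
                                         "season_sold_at", "winter", "summer")
```

With this argument order, a positive `t_stat` means that `group_1` (winter) has the higher mean. If the real helper uses the same convention, the positive value for sedan in the table above would indicate higher estimated winter prices, but that depends on code not shown here.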


### Q3

In [401]:
# ISO format avoids the month/day ambiguity of '3/1/2024' (which pandas parses as March 1)
today_date = '2024-03-01'

In [402]:
data_q3 = data.copy()

In [403]:
# Mock the sold_at date as today's date (registration_date was already converted to datetime above)
# Note: month_sold_at and season_sold_at are not recomputed, so they still reflect the original sale date
data_q3['sold_at'] = pd.to_datetime(today_date)

In [404]:
# Calculate mileage per month
data_q3['mileage_per_month'] = data_q3['mileage'] / data_q3['age_in_months_when_sold']

In [405]:
# Recalculate the age in months at today's date and replace it in data_q3
data_q3['age_in_months_when_sold'] = (data_q3['sold_at'].dt.to_period('M') - data_q3['registration_date'].dt.to_period('M')).apply(lambda x: x.n)

In [406]:
# Update the estimated mileage at today's date
data_q3['mileage'] = data_q3['age_in_months_when_sold'] * data_q3['mileage_per_month']

In [407]:
data_q3[['registration_date', 'sold_at', 'age_in_months_when_sold', 'mileage']].head()

| | registration_date | sold_at | age_in_months_when_sold | mileage |
|---|---|---|---|---|
| 0 | 2012-02-01 | 2024-03-01 | 145 | 286754.859155 |
| 1 | 2016-04-01 | 2024-03-01 | 95 | 60147.954545 |
| 2 | 2012-04-01 | 2024-03-01 | 143 | 374449.585714 |
| 3 | 2014-07-01 | 2024-03-01 | 116 | 345396.744186 |
| 4 | 2014-12-01 | 2024-03-01 | 111 | 269444.175 |
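The month difference computed with `to_period('M')` amounts to 12 × (difference in years) + (difference in months). The following sketch checks that arithmetic against the five registration dates printed above and reproduces only those rows:

```python
import pandas as pd

registration = pd.to_datetime(pd.Series(
    ["2012-02-01", "2016-04-01", "2012-04-01", "2014-07-01", "2014-12-01"]))
today = pd.Timestamp("2024-03-01")

# Whole calendar months between the registration month and today's month
age_in_months = (today.year - registration.dt.year) * 12 + (today.month - registration.dt.month)
print(age_in_months.tolist())  # [145, 95, 143, 116, 111]
```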


In [408]:
X_q3, y_q3, X_train_q3, X_test_q3, y_train_q3, y_test_q3 = get_train_test_data(data_q3, features, target)

In [409]:
# Get estimated prices today
pred_q3 = model.predict(X_q3)

In [410]:
# Add 1 year to the age_in_months_when_sold
X_q3['age_in_months_when_sold'] = X_q3['age_in_months_when_sold'] + 12

In [411]:
# Add 1 year worth of mileage
X_q3['mileage'] = X_q3['mileage'] + 12 * data_q3['mileage_per_month']

In [412]:
# Get estimated prices 1 year later
pred_q3_one_year_later = model.predict(X_q3)

In [413]:
X_q3['price_today'] = pred_q3
X_q3['price_one_year_later'] = pred_q3_one_year_later
X_q3['loss'] = X_q3['price_today'] - X_q3['price_one_year_later'] 
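The procedure above estimates the price at today's age and mileage, then re-estimates it after adding 12 months of age and 12 months of mileage at the car's historical rate, and takes the difference as the expected one-year loss. The following sketch replays those steps with a toy pricing function standing in for the trained model. `toy_price` and its coefficients are made up for illustration, and the thresholds mirror `price_today_threshold` and `loss_threshold`:

```python
import pandas as pd


def toy_price(df):
    # Hypothetical stand-in for model.predict: linear depreciation in age and mileage
    return 40000 - 100 * df["age_in_months_when_sold"] - 0.05 * df["mileage"]


cars = pd.DataFrame({
    "age_in_months_when_sold": [20, 60, 100],
    "mileage": [10000.0, 90000.0, 200000.0],
})
# Historical usage rate, assumed to stay constant over the next year
mileage_per_month = cars["mileage"] / cars["age_in_months_when_sold"]

price_today = toy_price(cars)

one_year_later = cars.copy()
one_year_later["age_in_months_when_sold"] += 12
one_year_later["mileage"] += 12 * mileage_per_month
price_one_year_later = toy_price(one_year_later)

loss = price_today - price_one_year_later

# Same filter as the notebook: expensive enough today, small expected loss
candidate_index = cars.index[(price_today >= 20000) & (loss <= 2000)].tolist()
```

Because the projected mileage grows at each car's own historical rate, cars that were barely driven receive very little extra mileage in the projection. This helps keep their estimated loss low. The X5 with 612 miles examined below is one such case.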

In [414]:
price_today_threshold = 20000
loss_threshold = 2000

In [415]:
# Identify cars that are candidates for buying
candidate_cars = X_q3[(X_q3['price_today'] >= price_today_threshold) & (X_q3['loss'] <= loss_threshold)]

In [416]:
# Show top 10 candidate cars with lowest loss
candidate_cars.sort_values('loss').head(10)

| | model_key | mileage | engine_power | fuel | paint_color | car_type | feature_1 | feature_2 | feature_3 | feature_4 | ... | feature_6 | feature_7 | feature_8 | age_in_months_when_sold | month_sold_at | season_sold_at | model_initial | price_today | price_one_year_later | loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4346 | X5 | 3090.6 | 183 | diesel | black | suv | True | True | False | False | ... | False | True | True | 101 | 6 | summer | X | 27745.03125 | 26574.242188 | 1170.789062 |
| 3910 | X5 | 9278.857143 | 155 | diesel | black | suv | True | True | False | True | ... | True | True | True | 92 | 2 | winter | X | 25896.519531 | 24694.556641 | 1201.962891 |
| 4121 | X6 M | 13241.25 | 423 | petrol | red | suv | True | True | True | False | ... | False | True | True | 107 | 4 | spring | X | 51002.59375 | 49790.472656 | 1212.121094 |
| 4705 | X5 M | 18844.8 | 230 | diesel | silver | suv | True | True | False | True | ... | False | False | True | 104 | 8 | summer | X | 31912.771484 | 30675.164062 | 1237.607422 |
| 17 | 650 | 39463.484375 | 270 | petrol | grey | convertible | True | False | False | False | ... | False | True | False | 206 | 9 | autumn | 6 | 22405.34375 | 21163.136719 | 1242.207031 |
| 37 | 650 | 39977.079365 | 270 | petrol | grey | convertible | True | False | False | False | ... | False | True | False | 206 | 7 | summer | 6 | 22139.833984 | 20896.513672 | 1243.320312 |
| 2980 | 525 | 23582.695652 | 160 | diesel | blue | sedan | True | True | True | False | ... | True | True | True | 106 | 4 | spring | 5 | 22330.25 | 21074.328125 | 1255.921875 |
| 4675 | X5 | 41148.046512 | 160 | diesel | grey | suv | True | True | True | False | ... | False | True | True | 122 | 8 | summer | X | 22510.773438 | 21203.923828 | 1306.849609 |
| 2664 | 530 | 37186.666667 | 195 | diesel | grey | sedan | True | True | True | False | ... | True | True | True | 100 | 2 | winter | 5 | 25655.060547 | 24332.865234 | 1322.195312 |
| 3718 | X5 | 47620.666667 | 160 | diesel | grey | suv | True | True | True | False | ... | False | True | True | 122 | 1 | winter | X | 21473.169922 | 20142.78125 | 1330.388672 |


In [417]:
# Value counts of model_key in candidate_cars if the column exists; otherwise fall back to model_initial
try:
    model_related_value_counts = candidate_cars['model_key'].value_counts()
except KeyError:
    model_related_value_counts = candidate_cars['model_initial'].value_counts()

In [418]:
model_related_value_counts

model_key
X5                  39
X6                  14
640 Gran Coupé       6
X5 M                 6
X4                   5
                    ..
530 Gran Turismo     0
318 Gran Turismo     0
318                  0
630                  0
114                  0
Name: count, Length: 75, dtype: int64
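Because `model_key` is stored as a categorical column, `value_counts()` lists every category, including the many models with zero candidate cars (hence `Length: 75`). The following sketch keeps only the models that are actually present. It uses a toy categorical Series rather than the notebook's data:

```python
import pandas as pd

model_key = pd.Series(pd.Categorical(["X5", "X5", "X6"],
                                     categories=["X5", "X6", "318", "114"]))

counts = model_key.value_counts()   # includes '318' and '114' with a count of 0
present = counts[counts > 0]        # keep only models that occur at least once
print(present.to_dict())  # {'X5': 2, 'X6': 1}
```

Applied to the notebook, the equivalent filter would be `model_related_value_counts[model_related_value_counts > 0]`.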

In [419]:
# Describe numerical features of candidate cars
candidate_cars.describe()

| | mileage | engine_power | age_in_months_when_sold | month_sold_at | price_today | price_one_year_later | loss |
|---|---|---|---|---|---|---|---|
| count | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 | 120.0 |
| mean | 138986.217068 | 222.25 | 125.316667 | 5.45 | 24375.5625 | 22725.402344 | 1650.159912 |
| std | 59000.382753 | 42.002651 | 18.218553 | 2.332993 | 4311.842285 | 4365.159668 | 206.243805 |
| min | 3090.6 | 155.0 | 92.0 | 1.0 | 20175.839844 | 18205.679688 | 1170.789062 |
| 25% | 98877.018817 | 190.0 | 113.75 | 4.0 | 21497.603516 | 19834.683105 | 1514.142578 |
| 50% | 146896.344615 | 225.0 | 125.0 | 5.0 | 23056.180664 | 21313.549805 | 1658.555664 |
| 75% | 181843.261364 | 230.0 | 132.0 | 8.0 | 26005.69043 | 24383.196289 | 1805.486816 |
| max | 260508.348837 | 423.0 | 206.0 | 9.0 | 51002.59375 | 49790.472656 | 1998.583984 |


In [420]:
# Show the car with the lowest loss
car_index = candidate_cars.sort_values('loss').index[0]
data_q3.loc[car_index]

maker_key                                  BMW
model_key                                   X5
mileage                                 2723.4
engine_power                               183
registration_date          2016-10-01 00:00:00
fuel                                    diesel
paint_color                              black
car_type                                   suv
feature_1                                 True
feature_2                                 True
feature_3                                False
feature_4                                False
feature_5                                False
feature_6                                False
feature_7                                 True
feature_8                                 True
price                                    49100
sold_at                    2024-03-01 00:00:00
age_in_months_when_sold                     89
month_sold_at                                6
season_sold_at                          summer
model_initial

In [421]:
data.loc[car_index]

maker_key                                  BMW
model_key                                   X5
mileage                                    612
engine_power                               183
registration_date          2016-10-01 00:00:00
fuel                                    diesel
paint_color                              black
car_type                                   suv
feature_1                                 True
feature_2                                 True
feature_3                                False
feature_4                                False
feature_5                                False
feature_6                                False
feature_7                                 True
feature_8                                 True
price                                    49100
sold_at                    2018-06-01 00:00:00
age_in_months_when_sold                     20
month_sold_at                                6
season_sold_at                          summer
model_initial

In [422]:
X_q3.loc[car_index]

model_key                            X5
mileage                          3090.6
engine_power                        183
fuel                             diesel
paint_color                       black
car_type                            suv
feature_1                          True
feature_2                          True
feature_3                         False
feature_4                         False
feature_5                         False
feature_6                         False
feature_7                          True
feature_8                          True
age_in_months_when_sold             101
month_sold_at                         6
season_sold_at                   summer
model_initial                         X
price_today                 27745.03125
price_one_year_later       26574.242188
loss                        1170.789062
Name: 4346, dtype: object

In [423]:
# Check cars with the same model_key
data[data['model_key'] == data['model_key'].loc[car_index]].sort_values(
    'price', ascending=False)[['model_key', 'price', 'mileage', 'age_in_months_when_sold']]

| | model_key | price | mileage | age_in_months_when_sold |
|---|---|---|---|---|
| 4588 | X5 | 55700 | 24912 | 22 |
| 3993 | X5 | 52200 | 33639 | 21 |
| 4645 | X5 | 50600 | 80307 | 54 |
| 4525 | X5 | 50000 | 82397 | 39 |
| 4346 | X5 | 49100 | 612 | 20 |
| ... | ... | ... | ... | ... |
| 4423 | X5 | 5100 | 249546 | 171 |
| 4607 | X5 | 4600 | 217045 | 169 |
| 4678 | X5 | 3600 | 220242 | 180 |
| 4347 | X5 | 3500 | 350615 | 205 |


### Q3 assuming today is the latest sold_at date in the data

In [424]:
today_date = data['sold_at'].max()
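The cell above sets "today" to the latest `sold_at` value itself, which is 2018-09-01 in the output below. To place "today" one month after the last auction instead, the date could be shifted with `pd.DateOffset`, as in this sketch. The timestamp is copied from the printed output:

```python
import pandas as pd

latest_sold_at = pd.Timestamp("2018-09-01")  # value taken from the printed output
today_date = latest_sold_at + pd.DateOffset(months=1)
print(today_date)  # 2018-10-01 00:00:00
```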

In [425]:
data_q3 = data.copy()

In [426]:
# Mock the sold_at date as today's date (registration_date was already converted to datetime above)
# Note: month_sold_at and season_sold_at are not recomputed, so they still reflect the original sale date
data_q3['sold_at'] = pd.to_datetime(today_date)

In [427]:
# Calculate mileage per month
data_q3['mileage_per_month'] = data_q3['mileage'] / data_q3['age_in_months_when_sold']

In [428]:
# Recalculate the age in months at today's date and replace it in data_q3
data_q3['age_in_months_when_sold'] = (data_q3['sold_at'].dt.to_period('M') - data_q3['registration_date'].dt.to_period('M')).apply(lambda x: x.n)

In [429]:
# Update the estimated mileage at today's date
data_q3['mileage'] = data_q3['age_in_months_when_sold'] * data_q3['mileage_per_month']

In [430]:
data_q3[['registration_date', 'sold_at', 'age_in_months_when_sold', 'mileage']].head()

| | registration_date | sold_at | age_in_months_when_sold | mileage |
|---|---|---|---|---|
| 0 | 2012-02-01 | 2018-09-01 | 79 | 156231.957746 |
| 1 | 2016-04-01 | 2018-09-01 | 29 | 18360.954545 |
| 2 | 2012-04-01 | 2018-09-01 | 77 | 201626.7 |
| 3 | 2014-07-01 | 2018-09-01 | 50 | 148877.906977 |
| 4 | 2014-12-01 | 2018-09-01 | 45 | 109234.125 |


In [431]:
X_q3, y_q3, X_train_q3, X_test_q3, y_train_q3, y_test_q3 = get_train_test_data(data_q3, features, target)

In [432]:
# Get estimated prices today
pred_q3 = model.predict(X_q3)

In [433]:
# Add 1 year to the age_in_months_when_sold
X_q3['age_in_months_when_sold'] = X_q3['age_in_months_when_sold'] + 12

In [434]:
# Add 1 year worth of mileage
X_q3['mileage'] = X_q3['mileage'] + 12 * data_q3['mileage_per_month']

In [435]:
# Get estimated prices 1 year later
pred_q3_one_year_later = model.predict(X_q3)

In [436]:
X_q3['price_today'] = pred_q3
X_q3['price_one_year_later'] = pred_q3_one_year_later
X_q3['loss'] = X_q3['price_today'] - X_q3['price_one_year_later'] 

In [437]:
# Identify cars that are candidates for buying
candidate_cars = X_q3[(X_q3['price_today'] >= price_today_threshold) & (X_q3['loss'] <= loss_threshold)]

In [438]:
# Show top 10 candidate cars with lowest loss
candidate_cars.sort_values('loss').head(10)

| | model_key | mileage | engine_power | fuel | paint_color | car_type | feature_1 | feature_2 | feature_3 | feature_4 | ... | feature_6 | feature_7 | feature_8 | age_in_months_when_sold | month_sold_at | season_sold_at | model_initial | price_today | price_one_year_later | loss |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2400 | 420 Gran Coupé | 597.333333 | 120 | diesel | blue | hatchback | True | True | False | False | ... | True | True | True | 64 | 8 | summer | 4 | 22133.398438 | 20972.039062 | 1161.359375 |
| 4346 | X5 | 1071.0 | 183 | diesel | black | suv | True | True | False | False | ... | False | True | True | 35 | 6 | summer | X | 34184.398438 | 33013.605469 | 1170.792969 |
| 3910 | X5 | 2622.285714 | 155 | diesel | black | suv | True | True | False | True | ... | True | True | True | 26 | 2 | winter | X | 32507.318359 | 31305.355469 | 1201.962891 |
| 4121 | X6 M | 5073.75 | 423 | petrol | red | suv | True | True | True | False | ... | False | True | True | 41 | 4 | spring | X | 57669.253906 | 56457.132812 | 1212.121094 |
| 4705 | X5 M | 6885.6 | 230 | diesel | silver | suv | True | True | False | True | ... | False | False | True | 38 | 8 | summer | X | 38719.613281 | 37482.003906 | 1237.609375 |
| 17 | 650 | 26819.84375 | 270 | petrol | grey | convertible | True | False | False | False | ... | False | True | False | 140 | 9 | autumn | 6 | 29237.488281 | 27995.283203 | 1242.205078 |
| 778 | 318 | 6956.0 | 100 | diesel | blue | estate | True | True | False | False | ... | False | True | False | 36 | 3 | spring | 3 | 20120.177734 | 18877.236328 | 1242.941406 |
| 37 | 650 | 27168.888889 | 270 | petrol | grey | convertible | True | False | False | False | ... | False | True | False | 140 | 7 | summer | 6 | 28978.060547 | 27734.746094 | 1243.314453 |
| 4353 | X3 | 15729.389831 | 120 | diesel | brown | suv | True | True | True | True | ... | False | True | True | 74 | 6 | summer | X | 23984.964844 | 22733.441406 | 1251.523438 |
| 2980 | 525 | 8899.130435 | 160 | diesel | blue | sedan | True | True | True | False | ... | True | True | True | 40 | 4 | spring | 5 | 29237.816406 | 27981.892578 | 1255.923828 |


In [439]:
# Value counts of model_key in candidate_cars if the column exists; otherwise fall back to model_initial
try:
    model_related_value_counts = candidate_cars['model_key'].value_counts()
except KeyError:
    model_related_value_counts = candidate_cars['model_initial'].value_counts()

In [440]:
model_related_value_counts

model_key
X5                  90
X3                  83
X6                  25
520                 24
530                 23
                    ..
335 Gran Turismo     0
418 Gran Coupé       0
630                  0
116                  0
114                  0
Name: count, Length: 75, dtype: int64

In [441]:
# Describe numerical features of candidate cars
candidate_cars.describe()

| | mileage | engine_power | age_in_months_when_sold | month_sold_at | price_today | price_one_year_later | loss |
|---|---|---|---|---|---|---|---|
| count | 476.0 | 476.0 | 476.0 | 476.0 | 476.0 | 476.0 | 476.0 |
| mean | 88757.488819 | 173.934874 | 68.535714 | 5.344538 | 26028.857422 | 24306.267578 | 1722.590576 |
| std | 41606.410865 | 45.447274 | 21.682804 | 2.12231 | 5323.515625 | 5354.304199 | 186.03981 |
| min | 597.333333 | 75.0 | 26.0 | 1.0 | 20009.070312 | 18092.205078 | 1161.359375 |
| 25% | 60277.442308 | 135.0 | 54.75 | 4.0 | 21798.53418 | 20038.561035 | 1612.322754 |
| 50% | 85945.180047 | 170.0 | 65.0 | 5.0 | 24637.391602 | 22873.609375 | 1747.697266 |
| 75% | 113972.724279 | 190.0 | 78.0 | 7.0 | 29385.67627 | 27848.645508 | 1864.964844 |
| max | 236997.405405 | 423.0 | 140.0 | 9.0 | 57669.253906 | 56457.132812 | 1998.902344 |


In [442]:
# Show the car with the lowest loss
car_index = candidate_cars.sort_values('loss').index[0]
data_q3.loc[car_index]

maker_key                                  BMW
model_key                       420 Gran Coupé
mileage                             485.333333
engine_power                               120
registration_date          2014-05-01 00:00:00
fuel                                    diesel
paint_color                               blue
car_type                             hatchback
feature_1                                 True
feature_2                                 True
feature_3                                False
feature_4                                False
feature_5                                False
feature_6                                 True
feature_7                                 True
feature_8                                 True
price                                    30300
sold_at                    2018-09-01 00:00:00
age_in_months_when_sold                     52
month_sold_at                                8
season_sold_at                          summer
model_initial

In [443]:
data.loc[car_index]

maker_key                                  BMW
model_key                       420 Gran Coupé
mileage                                    476
engine_power                               120
registration_date          2014-05-01 00:00:00
fuel                                    diesel
paint_color                               blue
car_type                             hatchback
feature_1                                 True
feature_2                                 True
feature_3                                False
feature_4                                False
feature_5                                False
feature_6                                 True
feature_7                                 True
feature_8                                 True
price                                    30300
sold_at                    2018-08-01 00:00:00
age_in_months_when_sold                     51
month_sold_at                                8
season_sold_at                          summer
model_initial

In [444]:
X_q3.loc[car_index]

model_key                  420 Gran Coupé
mileage                        597.333333
engine_power                          120
fuel                               diesel
paint_color                          blue
car_type                        hatchback
feature_1                            True
feature_2                            True
feature_3                           False
feature_4                           False
feature_5                           False
feature_6                            True
feature_7                            True
feature_8                            True
age_in_months_when_sold                64
month_sold_at                           8
season_sold_at                     summer
model_initial                           4
price_today                  22133.398438
price_one_year_later         20972.039062
loss                          1161.359375
Name: 2400, dtype: object

In [445]:
# Check cars with the same model_key
data[data['model_key'] == data['model_key'].loc[car_index]].sort_values(
    'price', ascending=False)[['model_key', 'price', 'mileage', 'age_in_months_when_sold']]

| | model_key | price | mileage | age_in_months_when_sold |
|---|---|---|---|---|
| 2170 | 420 Gran Coupé | 32100 | 27547 | 28 |
| 2400 | 420 Gran Coupé | 30300 | 476 | 51 |
| 1806 | 420 Gran Coupé | 30200 | 104133 | 34 |
| 2138 | 420 Gran Coupé | 29800 | 64380 | 30 |
| 2222 | 420 Gran Coupé | 29500 | 40168 | 39 |
| 1904 | 420 Gran Coupé | 29500 | 27966 | 22 |
| 3403 | 420 Gran Coupé | 29300 | 54836 | 52 |
| 2422 | 420 Gran Coupé | 29200 | 77358 | 41 |
| 2395 | 420 Gran Coupé | 28100 | 64500 | 45 |
| 1978 | 420 Gran Coupé | 27400 | 139258 | 41 |


### Q4

In [446]:
print(f'Test MSE: {metadata["mse_test"]:.2f}')
print(f'Test RMSE: {metadata["rmse_test"]:.2f}')
print(f'Test MAE: {metadata["mae_test"]:.2f}')
print(f'Test R2: {metadata["r2_test"]:.2f}')

Test MSE: 17458150.42
Test RMSE: 4178.30
Test MAE: 2878.18
Test R2: 0.74
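The four test metrics are related. RMSE is the square root of MSE (√17,458,150.42 ≈ 4,178.30). An R² of 0.74 means the model explains roughly 74% of the variance of prices in the test set, and the MAE indicates a typical absolute error of about 2,878. The values stored in `metadata` were presumably computed from `y_test` and the model's predictions on `X_test`. The following sketch shows the same four metrics computed with scikit-learn on toy arrays, not on the notebook's data:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Toy true and predicted prices
y_true = np.array([10000.0, 20000.0, 30000.0, 40000.0])
y_pred = np.array([12000.0, 18000.0, 33000.0, 39000.0])

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)  # back in the same units as the price
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE: {mse:.2f}, RMSE: {rmse:.2f}, MAE: {mae:.2f}, R2: {r2:.3f}")
```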


### Q5

#### Data quality

There are no null values in the data.

There are no infinite values in the data.

There are no duplicates.

From the number of unique feature values we can observe that:
- There is only one maker (BMW), so maker_key carries no information for the models.
- There are 199 different registration dates.
- feature_1 to feature_8 are binary variables.
- The auctions took place on 9 different dates.

The summary statistics of the numerical features hint at possibly erroneous observations. For example:
- a car with -64 miles,
- a car with 0 engine power (presumably hp),
- and a car sold for 100 (presumably USD).

Therefore, data needs some further cleaning.

One 640 Gran Coupé has negative mileage, which is not possible. Since there are 18 other cars with the same model_key, this row will be removed.

One car is 13 years old (159 months) and has more than a million miles. Although unusual, this is not impossible, since it corresponds to about 210 miles per day on average. Removing this observation is therefore not recommended.
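The per-day figure can be checked quickly using exactly 1,000,000 miles as a lower bound (the exact mileage is not shown) and an average month length of 365.25 / 12 days:

```python
months = 159
miles = 1_000_000  # lower bound: the actual value is "more than a million"
days = months * 365.25 / 12
miles_per_day = miles / days
print(round(miles_per_day))  # 207
```

The true mileage is slightly above a million, which is consistent with the figure of about 210 miles per day.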

One X1, which is an SUV, has an engine power of 0, which is impossible. Since there are more than 200 other X1 records, this observation can be removed.

The engine power of 25 (hp) recorded for two i3 cars is very likely wrong. These cars should have 75 (hp), so this is probably a typo. Since there are very few other i3 records, imputing the value may be a better alternative than dropping the records. Based on the most common value among the other i3 records, 75 is a good candidate for imputation.
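The following sketch shows that imputation: the implausible engine_power values of a model are replaced with the most common value among the other records of the same model. The toy rows and the exact-match rule `engine_power == 25` are illustrative assumptions, not the notebook's cleaning code:

```python
import pandas as pd

cars = pd.DataFrame({
    "model_key": ["i3", "i3", "i3", "i3", "X1"],
    "engine_power": [25, 25, 75, 75, 135],
})

is_i3 = cars["model_key"] == "i3"
suspicious = is_i3 & (cars["engine_power"] == 25)

# Most common engine power among the remaining (plausible) i3 records
fill_value = cars.loc[is_i3 & ~suspicious, "engine_power"].mode().iloc[0]
cars.loc[suspicious, "engine_power"] = fill_value
print(cars["engine_power"].tolist())  # [75, 75, 75, 75, 135]
```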

The same applies to the 316 and 318 records with engine power below 75. These models typically have at least 75 (hp), and there are more than 200 other observations of these models, so these records can also be deleted.

62 cars were sold for less than 1,000, which is very unusual. The summary statistics of the numerical features of these cars show that:
- The newest car sold at this price was less than 3 years old, which seems unusual.
- The oldest car was 24 years old.
- The minimum and maximum mileage seem sensible.

Prices below 800 are even more unusual and might correspond to cars with significant damage. Since cars with engine damage were removed earlier, these other heavily damaged cars could be removed as well and possibly priced with a different strategy.
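The removal rules above can be combined into a single boolean mask, as in the following sketch on toy rows. The column names match this notebook. Two details are assumptions: the 316/318 rule is expressed as engine power below 75, and the price cut-off is 800. The i3 rows are unaffected because their engine power is imputed rather than dropped:

```python
import pandas as pd

cars = pd.DataFrame({
    "model_key": ["640 Gran Coupé", "X1", "318", "320", "520"],
    "mileage": [-64, 50000, 120000, 80000, 150000],
    "engine_power": [230, 0, 66, 135, 120],
    "price": [20000, 15000, 9000, 100, 12000],
})

to_drop = (
    (cars["mileage"] < 0)                                                      # impossible mileage
    | (cars["engine_power"] == 0)                                              # impossible engine power
    | (cars["model_key"].isin(["316", "318"]) & (cars["engine_power"] < 75))  # implausibly low for these models
    | (cars["price"] < 800)                                                    # likely significant damage
)
cleaned = cars[~to_drop]
print(cleaned["model_key"].tolist())  # ['520']
```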

#### Data distribution (after cleaning)

Some features still have very high values after removing the records that probably contained errors.
Examples are:
- 1M miles driven. This is rare but possible for old cars.
- A few cars cost more than 100k, which is also possible depending on the car.
- Some cars have an engine power above 400, which can happen for sports cars.
- Some cars are as old as 22 years, which is also possible.

These values do not suggest that further data cleaning is needed.

#### Numerical features vs. price

Based on the scatter plots, the following observations can be made:
- As expected, prices tend to decrease with mileage.
- As expected, prices tend to increase with engine power.
- As expected, prices tend to decrease with age.
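These monotonic trends can be quantified with rank (Spearman) correlations. The following sketch uses pandas on toy values that follow the same directions. On the real data, the analogous call would be `data[['mileage', 'engine_power', 'age_in_months_when_sold', 'price']].corr(method='spearman')['price']`, which has not been run here:

```python
import pandas as pd

toy = pd.DataFrame({
    "mileage": [10000, 50000, 90000, 150000, 220000],
    "engine_power": [300, 190, 150, 120, 90],
    "age_in_months_when_sold": [12, 30, 60, 100, 180],
    "price": [45000, 30000, 20000, 12000, 5000],
})

# Spearman correlation of each numerical feature with price
corr_with_price = toy.corr(method="spearman")["price"].drop("price")
print(corr_with_price.round(2).to_dict())
# {'mileage': -1.0, 'engine_power': 1.0, 'age_in_months_when_sold': -1.0}
```

A negative value means prices tend to fall as the feature grows, and a positive value means they tend to rise.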