The used car sales service Rusty Bargain is developing an app to attract new customers. Thanks to this app, you can quickly find out the market value of your car. You have access to the history: technical specifications, equipment versions and prices. You have to create a model that determines the market value.
Rusty Bargain is interested in:
- the quality of the prediction;
- the speed of the prediction;
- the time required for training

Translated with DeepL.com (free version)

# Data Preparation 

We would like to start by importing all the libraries that we will use during this process.

In [1]:
# Pandas for dataset management
import pandas as pd
# Sklearn for machine learning practices and modelling
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler
from sklearn import metrics as skmet
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
# Light GBM for the LightGBM algorithm
import lightgbm as lgb
from lightgbm import LGBMRegressor
# XGBoost for the XGBoost algorithm
from xgboost import XGBRegressor
# CatBoost for the CatBoost algorithm
from catboost import CatBoostRegressor
# Plotly for showing the data in dynamic graphics
import matplotlib.pyplot as plt
# Numpy for specific numeric operations
import numpy as np
# Measure time on a function
import time
# Seaborn for better graphics
import seaborn as sns
# Disables all warnings
import warnings
warnings.filterwarnings("ignore")

Then we proceed to import our dataset for our project

In [359]:
df = pd.read_csv('datasets/car_data.csv')
print(df.head(5))
print()
print(f"Rows: {df.shape[0]} - Columns: {df.shape[1]}")

        DateCrawled  Price VehicleType  RegistrationYear Gearbox  Power  \
0  24/03/2016 11:52    480         NaN              1993  manual      0   
1  24/03/2016 10:58  18300       coupe              2011  manual    190   
2  14/03/2016 12:52   9800         suv              2004    auto    163   
3  17/03/2016 16:54   1500       small              2001  manual     75   
4  31/03/2016 17:25   3600       small              2008  manual     69   

   Model  Mileage  RegistrationMonth  FuelType       Brand NotRepaired  \
0   golf   150000                  0    petrol  volkswagen         NaN   
1    NaN   125000                  5  gasoline        audi         yes   
2  grand   125000                  8  gasoline        jeep         NaN   
3   golf   150000                  6    petrol  volkswagen          no   
4  fabia    90000                  7  gasoline       skoda          no   

        DateCreated  NumberOfPictures  PostalCode          LastSeen  
0  24/03/2016 00:00               

In [383]:
df

Unnamed: 0,DateCrawled,Price,VehicleType,RegistrationYear,Gearbox,Power,Model,Mileage,RegistrationMonth,FuelType,Brand,NotRepaired,DateCreated,PostalCode,LastSeen
3,2016-03-17 16:54:00,1500,small,2001,manual,75,golf,150000,6,petrol,volkswagen,False,2016-03-17 16:54:00,91074,2016-03-17 16:54:00
4,2016-03-31 17:25:00,3600,small,2008,manual,69,fabia,90000,7,gasoline,skoda,False,2016-03-31 17:25:00,60437,2016-03-31 17:25:00
5,2016-04-04 17:36:00,650,sedan,1995,manual,102,3er,150000,10,petrol,bmw,True,2016-04-04 17:36:00,33775,2016-04-04 17:36:00
6,2016-04-01 20:48:00,2200,convertible,2004,manual,109,2_reihe,150000,8,petrol,peugeot,False,2016-04-01 20:48:00,67112,2016-04-01 20:48:00
7,2016-03-21 18:54:00,0,sedan,1980,manual,50,other,40000,7,petrol,volkswagen,False,2016-03-21 18:54:00,19348,2016-03-21 18:54:00
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
354360,2016-04-02 20:37:00,3999,wagon,2005,manual,3,3er,150000,5,gasoline,bmw,False,2016-04-02 20:37:00,81825,2016-04-02 20:37:00
354362,2016-03-19 19:53:00,3200,sedan,2004,manual,225,leon,150000,5,petrol,seat,True,2016-03-19 19:53:00,96465,2016-03-19 19:53:00
354363,2016-03-27 20:36:00,1150,bus,2000,manual,0,zafira,150000,3,petrol,opel,False,2016-03-27 20:36:00,26624,2016-03-27 20:36:00
354366,2016-03-05 19:56:00,1199,convertible,2000,auto,101,fortwo,125000,3,petrol,smart,False,2016-03-05 19:56:00,26135,2016-03-05 19:56:00


In [360]:
# First we would want to check null and repeated values in the dataset
df.isna().sum()

DateCrawled              0
Price                    0
VehicleType          37490
RegistrationYear         0
Gearbox              19833
Power                    0
Model                19705
Mileage                  0
RegistrationMonth        0
FuelType             32895
Brand                    0
NotRepaired          71154
DateCreated              0
NumberOfPictures         0
PostalCode               0
LastSeen                 0
dtype: int64

In [361]:
# drop all NaN values in the dataset
df = df.dropna()
print()
print(f"Rows: {df.shape[0]} - Columns: {df.shape[1]}")


Rows: 245814 - Columns: 16


In [362]:
# We check if there are any duplicated values in the dataset
print("Number of duplicated items: ",df.duplicated().sum())
# and drop them
df = df.drop_duplicates()
print()
print(f"Rows: {df.shape[0]} - Columns: {df.shape[1]}")

Number of duplicated items:  247

Rows: 245567 - Columns: 16


In [363]:
# We check the data types of the columns to see if there are any incorrect types
df.dtypes

DateCrawled          object
Price                 int64
VehicleType          object
RegistrationYear      int64
Gearbox              object
Power                 int64
Model                object
Mileage               int64
RegistrationMonth     int64
FuelType             object
Brand                object
NotRepaired          object
DateCreated          object
NumberOfPictures      int64
PostalCode            int64
LastSeen             object
dtype: object

We can change the next types just to maintain a clearer view on our data
- DateCarawled to datetime
- Not Repaired boolean

In [364]:
# Change type to datetime
df['DateCrawled'] = pd.to_datetime(df['DateCrawled'], format='%d/%m/%Y %H:%M')
df['DateCreated'] = pd.to_datetime(df['DateCrawled'], format='%d/%m/%Y %H:%M')
df['LastSeen'] = pd.to_datetime(df['DateCrawled'], format='%d/%m/%Y %H:%M')
# Transform the column to NotRepaired into boolean
df['NotRepaired'] = df['NotRepaired'].map({'yes': True, 'no': False})
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 245567 entries, 3 to 354367
Data columns (total 16 columns):
 #   Column             Non-Null Count   Dtype         
---  ------             --------------   -----         
 0   DateCrawled        245567 non-null  datetime64[ns]
 1   Price              245567 non-null  int64         
 2   VehicleType        245567 non-null  object        
 3   RegistrationYear   245567 non-null  int64         
 4   Gearbox            245567 non-null  object        
 5   Power              245567 non-null  int64         
 6   Model              245567 non-null  object        
 7   Mileage            245567 non-null  int64         
 8   RegistrationMonth  245567 non-null  int64         
 9   FuelType           245567 non-null  object        
 10  Brand              245567 non-null  object        
 11  NotRepaired        245567 non-null  bool          
 12  DateCreated        245567 non-null  datetime64[ns]
 13  NumberOfPictures   245567 non-null  int64        

In [365]:
# Fainally we notice that there is a column that never changes, which means it is useless for our analysis
df = df.drop(columns=['NumberOfPictures'])

# Model Training

## Numerical Encoding

To get a Numerical Encoding for all our values we need to first categorize all of them to then transform each data type correctly and create dataframe with all numerical values.

In [290]:
# Gets all categorical columns in the dataset
cat_cols = df.select_dtypes(include=['object', 'bool']).columns.tolist()
print(cat_cols)

['VehicleType', 'Gearbox', 'Model', 'FuelType', 'Brand', 'NotRepaired']


In [291]:
# Gets all numeric columns in the dataset
num_cols = df.select_dtypes(include=['number']).columns.tolist()
print(num_cols)

['Price', 'RegistrationYear', 'Power', 'Mileage', 'RegistrationMonth', 'PostalCode']


In [292]:
# Gets all date columns in the dataset
date_cols = df.select_dtypes(include=['datetime64[ns]']).columns.tolist()
print(date_cols)

['DateCrawled', 'DateCreated', 'LastSeen']


In [293]:
df_encoded = pd.get_dummies(df, columns=cat_cols, drop_first=True)

In [294]:
def prepare_data(df, categorical_cols, date_cols):
    """
    Prepares the dataframe:
    - Converts date columns into year and month
    - One-Hot Encodes categorical columns
    - Returns a clean dataframe
    """
    prepared_df = df.copy()  # To avoid modifying original

    # --- 1. Handle date columns ---
    for col in date_cols:
        # Convert to datetime
        prepared_df[col] = pd.to_datetime(prepared_df[col], format='%d/%m/%Y %H:%M')

        # Create year and month features
        prepared_df[f'{col}_year'] = prepared_df[col].dt.year
        prepared_df[f'{col}_month'] = prepared_df[col].dt.month

    # Drop original date columns
    prepared_df = prepared_df.drop(columns=date_cols)

    # --- 2. Handle categorical columns ---
    prepared_df = pd.get_dummies(prepared_df, columns=categorical_cols, drop_first=True)

    return prepared_df

In [295]:
# Assign the result of the function to a new dataframe variable
df_numeric = prepare_data(df, cat_cols, date_cols)
# Get ll the booleans created on the OHE
bool_columns = df_numeric.select_dtypes(include=['bool']).columns.tolist()
# Convert boolean columns to integers (0 and 1)
# This is done to make it easier to work with the data later on
df_numeric[bool_columns] = df_numeric[bool_columns].astype(int)


In [296]:
df_numeric.dtypes.value_counts()

int64    307
int32      6
Name: count, dtype: int64

## Timer

We need to know how much our models are taking to return the results, for this we will create a function that does this for us.

In [297]:
def timer(func):
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        end = time.perf_counter()
        print(f"⏰ Function '{func.__name__}' took {end - start:.4f} seconds")
        return result
    return wrapper

## Generate a Standarized version of the data

In the span of the project we will have various types of models.
- Linear Regression
- Decision Tree
- Random Forest
- LightGBM
- CatBoost
- XGBoost

Now that we have our data processed, we can split our data into our raw values and standarized values. Having 2 dataframes can help with the usage of various models. 

* Standarized dataframes can help with our Logistic Regression and CatBoost
* Meanwhile, having a raw dataframe can help in order to use Decision Trees, Random Forest, XGBoost and LightGBM

In [298]:
scaler = StandardScaler()
# Creatre a copy of the dataframe
df_standardized = df_numeric.copy()
# Standardize the numeric columns
df_standardized[num_cols] = scaler.fit_transform(df_standardized[num_cols])

In [299]:
df_standardized.head(5)

Unnamed: 0,Price,RegistrationYear,Power,Mileage,RegistrationMonth,PostalCode,DateCrawled_year,DateCrawled_month,DateCreated_year,DateCreated_month,...,Brand_seat,Brand_skoda,Brand_smart,Brand_subaru,Brand_suzuki,Brand_toyota,Brand_trabant,Brand_volkswagen,Brand_volvo,NotRepaired_True
3,-0.768404,-0.311174,-0.322499,0.612277,-0.051555,1.532972,2016,3,2016,3,...,0,0,0,0,0,0,0,1,0,0
4,-0.323343,0.824328,-0.365526,-1.00587,0.235819,0.347318,2016,3,2016,3,...,0,1,0,0,0,0,0,0,0,0
5,-0.948547,-1.284461,-0.128878,0.612277,1.097944,-0.684503,2016,4,2016,4,...,0,0,0,0,0,0,0,0,0,1
6,-0.62005,0.17547,-0.07868,0.612277,0.523194,0.605641,2016,4,2016,4,...,0,0,0,0,0,0,0,0,0,0
7,-1.086304,-3.717679,-0.501778,-2.354325,0.235819,-1.242829,2016,3,2016,3,...,0,0,0,0,0,0,0,1,0,0


## Features and Targets

For this section we will need to adjust our parameters for the training of the models by assigning the features and the targets as well as their percentages of the train data and the test data.

In [None]:
# Assigning the features and target variables
features = df_numeric.columns.drop(['Price'])
target = 'Price'
# Split the data into training and testing sets
X_train,X_test,y_train,y_test = train_test_split(df_numeric[features], df_numeric[target], test_size=0.25, random_state=12345)
# For standarized data
X_train_std,X_test_std,y_train_std,y_test_std = train_test_split(df_standardized[features], df_standardized[target], test_size=0.25, random_state=12345)

## Sanity Check

In [301]:
def sanity_check(X_train,X_test,y_train,y_test):
    """
    Runs a linear and gradient boosting regression model on the data.
    Prints the mean squared error for both models.
    """
    # Create a linear regression model
    lr = LinearRegression()
    lr.fit(X_train, y_train)
    y_pred_lr = lr.predict(X_test)
    # Create a gradient boosting regressor model
    gbr = GradientBoostingRegressor(random_state=42)
    gbr.fit(X_train, y_train)
    y_pred_gbr = gbr.predict(X_test)
    # Calculate the mean squared error for both models
    lr_mse = skmet.root_mean_squared_error(y_test, y_pred_lr)
    gbr_mse = skmet.root_mean_squared_error(y_test, y_pred_gbr)
    # Print the mean squared error for both models
    print(f"Linear Regression MSE: {lr_mse:.4f}")
    print(f"Gradient Boosting MSE: {gbr_mse:.4f}")

In [302]:
# Run with the original data
sanity_check(X_train,X_test,y_train,y_test)
# Run with the standardized data
sanity_check(X_train_std,X_test_std,y_train_std,y_test_std)

Linear Regression MSE: 2709.8108
Gradient Boosting MSE: 1940.8392
Linear Regression MSE: 0.5743
Gradient Boosting MSE: 0.4112


In both General and Standarized data, our gradient boosting performed better, so the sanity check is passed.

## Random Forest Regressor

In [379]:
def randomForestRegression(features_train, features_test, valid_train, valid_test, depth):
    # Measure the execution time
    start_time = time.perf_counter()

    # Create and train the model
    modelRFR = RandomForestRegressor(max_depth=depth, n_estimators=25, random_state=12345)
    modelRFR.fit(features_train, valid_train)

    # Make predictions
    y_pred = modelRFR.predict(features_test)

    # Calculate RMSE
    rmse = skmet.root_mean_squared_error(valid_test, y_pred)

    # Measure the execution time
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time

    # Create a array to store the results of depth, RMSE, and execution time
    return [depth, rmse, elapsed_time]

In [380]:
RFR_results_df = pd.DataFrame()
results_rfr = []

for depth in [5, 10, 15]:
    results_rfr.append(randomForestRegression(X_train, X_test, y_train, y_test, depth))

# Create a DataFrame from the results
RFR_results_df = pd.DataFrame(results_rfr, columns=['Depth', 'RMSE', 'Execution Time'])
# Sort the DataFrame by RMSE
RFR_results_df = RFR_results_df.sort_values(by='RMSE', ascending=True).reset_index(drop=True)
# Display the DataFrame
RFR_results_df

Unnamed: 0,Depth,RMSE,Execution Time
0,15,1742.795131,78.46181
1,10,1940.369225,59.999662
2,5,2409.756568,34.320981


## Decision Tree Regressor

In [381]:
def decisionTreeRegressor(features_train, features_test, valid_train, valid_test, depth,leaves):
    # Measure the execution time
    start_time = time.perf_counter()

    # Create and train the model
    modelRFR = DecisionTreeRegressor(max_depth=depth,max_leaf_nodes=leaves, random_state=12345)
    modelRFR.fit(features_train, valid_train)

    # Make predictions
    y_pred = modelRFR.predict(features_test)

    # Calculate RMSE
    rmse = skmet.root_mean_squared_error(valid_test, y_pred)

    # Measure the execution time
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time

    # Create a array to store the results of depth, RMSE, and execution time
    return [depth, rmse, elapsed_time]

In [382]:
DT_results_df = pd.DataFrame()
results_dt = []

for depth in [5, 10, 15]:
    for leaves in [10, 20, 30]:
        results_dt.append(decisionTreeRegressor(X_train, X_test, y_train, y_test, depth,leaves))

# Create a DataFrame from the results
DT_results_df = pd.DataFrame(results_dt, columns=['Depth', 'RMSE', 'Execution Time'])
# Sort the DataFrame by RMSE
DT_results_df = DT_results_df.sort_values(by='RMSE', ascending=True).reset_index(drop=True)
# Display the DataFrame
DT_results_df

Unnamed: 0,Depth,RMSE,Execution Time
0,10,2428.911293,3.211641
1,15,2428.911293,3.21962
2,5,2454.1184,2.84931
3,10,2529.600816,2.796249
4,15,2529.600816,2.903398
5,5,2529.600816,2.992082
6,10,2734.025956,2.383446
7,5,2734.025956,2.425082
8,15,2734.025956,2.276804


## LightGBM

In [373]:
def prepare_datetimes(df, date_cols):
    """
    Prepares the dataframe:
    - Converts date columns into year and month
    - Returns a clean dataframe for LightGBM model training
    """
    prepared_df = df.copy()  # To avoid modifying original

    # --- 1. Handle date columns ---
    for col in date_cols:
        # Convert to datetime
        prepared_df[col] = pd.to_datetime(prepared_df[col], format='%d/%m/%Y %H:%M')

        # Create year and month features
        prepared_df[f'{col}_year'] = prepared_df[col].dt.year
        prepared_df[f'{col}_month'] = prepared_df[col].dt.month

    # Drop original date columns
    prepared_df = prepared_df.drop(columns=date_cols)

    return prepared_df

In [None]:
def lightGBM(df, date_cols, cat_cols, depth,leaves):

    # Creating a dataset compatible with LightGBM
    lgb_df = prepare_datetimes(df, date_cols)

    # Assigning the features and target variables
    features_lgb = lgb_df[lgb_df.columns.drop(['Price'])]
    target_lgb = lgb_df['Price']

    # Convert are categorical features to category type
    for col in [cat_cols]:
        features_lgb[col] = features_lgb[col].astype('category')

    # Create a dataset for LightGBM model
    lgb_data = lgb.Dataset(features_lgb, label=target_lgb, categorical_feature=cat_cols)

    # Set parameters for training the model
    params = {'objective': 'regression', 'metric': 'rmse', 'force_row_wise':'true', 'max_depth': depth, 'num_leaves': leaves, 'learning_rate': 0.1, 'verbose': -1}

    # Measure the execution time
    start_time = time.perf_counter()

    # Create and train the LGBMmodel
    model = lgb.train(params, lgb_data, num_boost_round=100)

    # Make predictions
    y_pred = model.predict(features_lgb)

    # Calculate RMSE
    rmse = np.sqrt(skmet.mean_squared_error(target_lgb, y_pred))

    # Measure the execution time
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time

    # Return DataFrame with the number of leaves, number of depths and execution time
    return [leaves,depth,rmse,elapsed_time]

In [None]:
LGBM_results_df = pd.DataFrame()
results_lgb = []

for depth in [5, 10, 15]:
    for leaves in [10, 20, 30]:
        results_lgb.append(lightGBM(df, date_cols, cat_cols, depth, leaves))

# Create a DataFrame from the results
LGBM_results_df = pd.DataFrame(results_lgb, columns=['Leaves', 'Depth', 'RMSE', 'Execution Time'])
# Sort the DataFrame by RMSE
LGBM_results_df = LGBM_results_df.sort_values(by='RMSE', ascending=True).reset_index(drop=True)
# Display the DataFrame
LGBM_results_df

Unnamed: 0,Leaves,Depth,RMSE,Execution Time
0,30,15,1610.989585,1.119664
1,30,10,1611.347664,1.093746
2,30,5,1642.029419,1.0825
3,20,15,1656.065253,0.966109
4,20,10,1656.898828,0.967618
5,20,5,1664.956019,0.904469
6,10,5,1738.521139,0.849649
7,10,10,1741.200177,0.849143
8,10,15,1741.200177,0.819459


## CatBoostRegressor

In [374]:
def catBoostRegressor(X_train,X_test,y_train,y_test, depth):
    # Measure the execution time
    start_time = time.perf_counter()

    # Create and train the model
    modelCBR = CatBoostRegressor(iterations=100,learning_rate=0.1,depth=depth,verbose=0,random_state=12345)
    modelCBR.fit(X_train, y_train, cat_features=cat_cols)

    # Make predictions
    y_pred = modelCBR.predict(X_test)

    # Calculate RMSE
    rmse = skmet.root_mean_squared_error(y_test, y_pred)

    # Measure the execution time
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time

    # Create a array to store the results of depth, RMSE, and execution time
    return [depth, rmse, elapsed_time]

In [377]:
# Use the prepared datetimes function to prepare the data for CatBoost
cat_df = prepare_datetimes(df, date_cols)
# Get features and target variables
features_og = cat_df.columns.drop(['Price'])
target_og = 'Price'
# Split the data into training and testing sets
X_train_og,X_test_og,y_train_og,y_test_og = train_test_split(cat_df[features_og], cat_df[target_og], test_size=0.30, random_state=12345)

In [378]:
CB_results_df = pd.DataFrame()
results_cb = []

for depth in [5, 10, 15]:
    results_cb.append(catBoostRegressor(X_train_og,X_test_og,y_train_og,y_test_og, depth))

# Create a DataFrame from the results
CB_results_df = pd.DataFrame(results_cb, columns=['Depth', 'RMSE', 'Execution Time'])
# Sort the DataFrame by RMSE
CB_results_df = CB_results_df.sort_values(by='RMSE', ascending=True).reset_index(drop=True)
# Display the DataFrame
CB_results_df

Unnamed: 0,Depth,RMSE,Execution Time
0,15,1648.777015,73.303453
1,10,1708.74143,12.864216
2,5,1863.508939,7.555382


## XGBoost

In [386]:
def xgBoost(features_train, features_test, valid_train, valid_test, depth):
    # Measure the execution time
    start_time = time.perf_counter()

    # With XGBoost we can use our standardized data because it was done using OHE and scaling
    # Create and train the model
    modelXGB = XGBRegressor(n_estimators=100, max_depth=depth, random_state=12345)
    modelXGB.fit(features_train, valid_train)
    # Make predictions
    y_pred = modelXGB.predict(features_test)
    # Calculate RMSE
    rmse = skmet.root_mean_squared_error(valid_test, y_pred)

    # Measure the execution time
    end_time = time.perf_counter()
    elapsed_time = end_time - start_time

    # Create a array to store the results of depth, RMSE, and execution time
    return [depth, rmse, elapsed_time]

In [387]:
XGB_results_df = pd.DataFrame()
results_xgb = []

for depth in [5, 10, 15]:
    results_xgb.append(xgBoost(X_train, X_test, y_train, y_test, depth))

# Create a DataFrame from the results
XGB_results_df = pd.DataFrame(results_xgb, columns=['Depth', 'RMSE', 'Execution Time'])
# Sort the DataFrame by RMSE
XGB_results_df = XGB_results_df.sort_values(by='RMSE', ascending=True).reset_index(drop=True)
# Display the DataFrame
XGB_results_df

Unnamed: 0,Depth,RMSE,Execution Time
0,10,1621.960449,4.902606
1,15,1638.045898,8.678057
2,5,1725.646729,3.706982


# Model Analysis Conclusion

All the models where put through a series of different hyperparameters for this analysis. Only the depth in every model trained persisted to see which reactions and values would the model return.

## Tree Models

**Random forest regressor** on this analysis demonstrated to be very slow and but moderalty accurate in all the records we had during this anlayisis.

Meanwhile **Decision Tree model** demonstrated to be considerably fast, meanwhile in accuracy there was more to be expected for the analysis.

## Gradient Boosting Models

**LigthGBM** demonstrated to be the best model in all the analysis being lightspeed fast and very accurate with the same values the decision tree was given. With just 1 second to get a 16% of accuracy on a depth of 15 with 30 leaves.

On the side, **XGBoost** performed very well with speed, getting the vlue to 1621 of root error on 4 seconds, which is very accurate. We can also observe on the model that passing the 10 depth the model gets worse and gives us a higher value than we expected.

## CatBoost Model

**CatBoost** Model was very poor on speed when it came to depth , but accuracy in comparison of other models was very accurate only missing the rmse by 20.