# Laptop Price Prediction in R

## Preprocessing

#### Imports

In [1]:
library(tidyverse)
library(recipes)
library(caret)

── Attaching core tidyverse packages ───────────────────────────────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.0     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ─────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Attaching package: ‘recipes’


The following object is masked from ‘package:stringr’:

    fixed


The following objec

In [2]:
df <- read.csv("../input/laptop_price_prepared.csv")

In [3]:
head(df)

,Company,Product,TypeName,Inches,Cpu,Ram,PrimaryMemSize,PrimaryMemType,SecondaryMemory,OpSys,⋯,Touchscreen,IPS_Panel,RetinaDisplay,CpuBrand,CpuProduct,CpuClockSpeed_GHz,Ram_GB,Gpu_Brand,Gpu_Product,Weight_kg
,<chr>,<chr>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<int>,<chr>,⋯,<int>,<int>,<int>,<chr>,<chr>,<dbl>,<int>,<chr>,<chr>,<dbl>
1,Apple,MacBook Pro,Ultrabook,13.3,Intel Core i5 2.3GHz,8GB,128GB,SSD,0,macOS,⋯,0,1,1,Intel,i5,5,8,Intel,Iris,1.37
2,Apple,Macbook Air,Ultrabook,13.3,Intel Core i5 1.8GHz,8GB,128GB,Flash Storage,0,macOS,⋯,0,0,0,Intel,i5,5,8,Intel,HD,1.34
3,HP,250 G6,Notebook,15.6,Intel Core i5 7200U 2.5GHz,8GB,256GB,SSD,0,No OS,⋯,0,0,0,Intel,i5,5,8,Intel,HD,1.86
4,Apple,MacBook Pro,Ultrabook,15.4,Intel Core i7 2.7GHz,16GB,512GB,SSD,0,macOS,⋯,0,1,1,Intel,i7,7,16,AMD,Radeon,1.83
5,Apple,MacBook Pro,Ultrabook,13.3,Intel Core i5 3.1GHz,8GB,256GB,SSD,0,macOS,⋯,0,1,1,Intel,i5,5,8,Intel,Iris,1.37
6,Acer,Aspire 3,Notebook,15.6,AMD A9-Series 9420 3GHz,4GB,500GB,HDD,0,Windows 10,⋯,0,0,0,AMD,,9,4,AMD,Radeon,2.1


#### Splitting data

In [4]:
# Target variable: laptop price in euros
y <- df$Price_euros

In [5]:
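# Predictors: every column except the target (selected by name, so a missing name cannot empty the frame)
X <- df[, setdiff(names(df), "Price_euros")]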
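
Negative indexing with `which()` has a pitfall that is easy to miss: if the name is absent, `which()` returns an empty vector and the negative index then selects no columns at all instead of dropping none. The sketch below shows this on a made-up `toy` data frame (hypothetical, not the laptop data) and compares it with name-based selection via `setdiff()`.

```r
# Hypothetical toy data, not the laptop data set
toy <- data.frame(a = 1:3, b = 4:6)

# The column "missing" does not exist, so which() returns integer(0)
dropped_which <- toy[, -which(names(toy) == "missing"), drop = FALSE]

# setdiff() keeps every column whose name is not listed
dropped_setdiff <- toy[, setdiff(names(toy), "missing"), drop = FALSE]

ncol(dropped_which)    # 0: every column was dropped
ncol(dropped_setdiff)  # 2: nothing was dropped
```

With the real data the target column exists, so both forms give the same result; the name-based form only protects against a misspelled or renamed column.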

#### Dropping columns

In [6]:
# Columns to drop; any_of() silently skips names that are not present in X
cols <- c("Product", "Model", "CpuProduct", "ResolutionWidth", "Inches", "Weight", "Cpu")

X <- X %>%
  select(-any_of(cols))

#### Data types

In [7]:
str(X)

'data.frame':	1295 obs. of  19 variables:
 $ Company          : chr  "Apple" "Apple" "HP" "Apple" ...
 $ TypeName         : chr  "Ultrabook" "Ultrabook" "Notebook" "Ultrabook" ...
 $ Ram              : chr  "8GB" "8GB" "8GB" "16GB" ...
 $ PrimaryMemSize   : chr  "128GB" "128GB" "256GB" "512GB" ...
 $ PrimaryMemType   : chr  "SSD" "Flash Storage" "SSD" "SSD" ...
 $ SecondaryMemory  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ OpSys            : chr  "macOS" "macOS" "No OS" "macOS" ...
 $ Inches_Binned    : int  2 2 4 4 2 4 4 2 3 3 ...
 $ ResolutionHeight : int  1600 900 1080 1800 1600 768 1800 900 1080 1080 ...
 $ DisplayType      : chr  "Retina Display" NA "Full HD" "Retina Display" ...
 $ Touchscreen      : int  0 0 0 0 0 0 0 0 0 0 ...
 $ IPS_Panel        : int  1 0 0 1 1 0 1 0 0 1 ...
 $ RetinaDisplay    : int  1 0 0 1 1 0 1 0 0 0 ...
 $ CpuBrand         : chr  "Intel" "Intel" "Intel" "Intel" ...
 $ CpuClockSpeed_GHz: num  5 5 5 7 5 9 7 5 7 5 ...
 $ Ram_GB           : int  8 8 8 16 8 4 16 8 16 

In [8]:
# Specify the preprocessing steps using recipes
rec <- recipe(~ ., data = X) %>%
  # Dummy-encode the categorical (character/factor) variables
  step_dummy(all_nominal()) %>%
  # Divide every numeric column, including the new dummies, by its standard deviation (no centering)
  step_scale(all_numeric())

# Estimate the preprocessing parameters from X (prep() takes `training`, not `data`)
prepped <- prep(rec, training = X)

# Extract the preprocessed training data
processed_X <- bake(prepped, new_data = NULL)

# View the preprocessed data
print(processed_X)


“! There are new levels in a factor: `NA`.”


# A tibble: 1,295 × 63
   SecondaryMemory Inches_Binned ResolutionHeight Touchscreen IPS_Panel
             <dbl>         <dbl>            <dbl>       <dbl>     <dbl>
 1               0          1.72             5.61           0      2.23
 2               0          1.72             3.16           0      0   
 3               0          3.44             3.79           0      0   
 4               0          3.44             6.32           0      2.23
 5               0          1.72             5.61           0      2.23
 6               0          3.44             2.69           0      0   
 7               0          3.44             6.32           0      2.23
 8               0          1.72             3.16           0      0   
 9               0          2.58             3.79           0 
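
The warning about new factor levels indicates that at least one nominal column contains missing values; the `str()` output shows an `NA` in `DisplayType`. Dummy-encoding such a column typically leaves `NA` in the resulting indicator columns. One possible remedy, sketched here on made-up data rather than on the laptop data set, is to map missing categories to an explicit level with `step_unknown()` before `step_dummy()`. The `toy` frame and the choice of the label `"unknown"` are assumptions for illustration.

```r
library(recipes)

# Hypothetical toy data with a missing category, standing in for DisplayType
toy <- data.frame(
  DisplayType = factor(c("Full HD", NA, "Retina Display", "Full HD")),
  Ram_GB = c(8, 8, 16, 4)
)

rec <- recipe(~ ., data = toy) %>%
  # Turn NA in nominal columns into an explicit "unknown" level
  step_unknown(all_nominal(), new_level = "unknown") %>%
  # Dummy-encode the now complete nominal columns
  step_dummy(all_nominal()) %>%
  # Scale all numeric columns as in the notebook's recipe
  step_scale(all_numeric())

baked <- bake(prep(rec, training = toy), new_data = NULL)

# Check whether any missing values survived preprocessing
anyNA(baked)
```

Whether "unknown" is a sensible category depends on why the value is missing; imputing the most frequent level or dropping the affected rows are alternatives.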

In [9]:
names(processed_X)

#### Train/test split

In [12]:
set.seed(123) # for reproducibility
trainIndex <- createDataPartition(y, p = .8, 
                                  list = FALSE, 
                                  times = 1)
X_train <- processed_X[trainIndex, ]
y_train <- y[trainIndex]
X_test  <- processed_X[-trainIndex, ]
y_test  <- y[-trainIndex]

#### Linear Regression

In [13]:
# Perform linear regression
lm_model <- lm(y_train ~ ., data = X_train)

# Summarize the linear regression model
summary(lm_model)

# Predictions
y_pred <- predict(lm_model, newdata = X_test)

# Evaluate the model
mse <- mean((y_pred - y_test)^2)
rmse <- sqrt(mse)
print(paste("Root Mean Squared Error (RMSE):", rmse))


Call:
lm(formula = y_train ~ ., data = X_train)

Residuals:
     Min       1Q   Median       3Q      Max 
-1616.46  -190.46   -41.26   164.72  1615.13 

Coefficients: (10 not defined because of singularities)
                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 652.937   1050.800   0.621 0.534560    
SecondaryMemory              27.547     19.805   1.391 0.164701    
Inches_Binned               -58.984     27.949  -2.110 0.035184 *  
ResolutionHeight             89.863    123.876   0.725 0.468439    
Touchscreen                  17.152     23.620   0.726 0.467996    
IPS_Panel                   -10.822     14.102  -0.767 0.443117    
RetinaDisplay               -19.173     56.403  -0.340 0.734015    
CpuClockSpeed_GHz           -44.541     28.188  -1.580 0.114535    
Ram_GB                      127.202    108.793   1.169 0.242723    
Weight_kg                   140.114     27.544   5.087 4.69e-07 ***
Company_Apple               -73.908     85

“prediction from rank-deficient fit; attr(*, "non-estim") has doubtful cases”


[1] "Root Mean Squared Error (RMSE): NA"
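
An RMSE of `NA` means that at least one prediction is missing: `predict()` returns `NA` for any test row with a missing predictor, and `mean()` propagates `NA` unless told otherwise. A likely but unverified source here is the set of dummy columns derived from the `NA` values in `DisplayType`. The sketch below reproduces the effect on made-up data (all names are hypothetical) and shows how to count the affected rows before deciding how to handle them.

```r
set.seed(1)

# Hypothetical training data with a simple linear relationship
train <- data.frame(x = 1:10)
train$y <- 2 * train$x + rnorm(10, sd = 0.1)
fit <- lm(y ~ x, data = train)

# Test data in which one predictor value is missing
test <- data.frame(x = c(11, NA, 13), y = c(22, 24, 26))
pred <- predict(fit, newdata = test)

# The missing prediction makes the naive RMSE missing as well
rmse_naive <- sqrt(mean((pred - test$y)^2))

# Count the missing predictions, then compute RMSE on the complete rows only
n_missing <- sum(is.na(pred))
rmse_complete <- sqrt(mean((pred - test$y)^2, na.rm = TRUE))

print(c(naive = rmse_naive, complete = rmse_complete, n_missing = n_missing))
```

Using `na.rm = TRUE` only scores the rows that could be predicted, so the result is not comparable with a model evaluated on the full test set; handling the missing values during preprocessing is usually the cleaner fix.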

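
The summary reports 10 coefficients "not defined because of singularities", and `predict()` warns about a rank-deficient fit. This generally means that some predictors are exact linear combinations of others. Plausible candidates here, not verified against the data, are the dummy-encoded `Ram` column alongside `Ram_GB`, or `RetinaDisplay` alongside a `DisplayType` dummy. A minimal sketch on made-up data for listing the aliased coefficients:

```r
set.seed(42)

# Hypothetical data in which x2 is an exact multiple of x1
d <- data.frame(x1 = rnorm(20))
d$x2 <- 2 * d$x1
d$x3 <- rnorm(20)
d$y <- 1 + d$x1 + d$x3 + rnorm(20)

fit <- lm(y ~ ., data = d)

# Coefficients that lm() could not estimate are reported as NA
aliased <- names(coef(fit))[is.na(coef(fit))]
print(aliased)

# Refit without the redundant columns to obtain a full-rank model
fit_full_rank <- lm(y ~ ., data = d[, setdiff(names(d), aliased)])
```

Applied to the notebook's model, the same `is.na(coef(...))` check would show which of the 63 columns are redundant and could be dropped before fitting.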