In [1]:
# ================================
# 07 FULL REGRESSION MODEL
# ================================

library(readr)
library(dplyr)

setwd("C:/Users/Graf David/R/FinalProject")

df <- read_csv("dataset/train.csv", show_col_types = FALSE)

# drop the v.id identifier column
df_model <- df %>% select(-`v.id`)

# set the same seed so the split is reproducible
set.seed(42)

sample_size <- floor(0.75 * nrow(df_model))
train_index <- sample(seq_len(nrow(df_model)), size = sample_size)

train_data <- df_model[train_index, ]
test_data  <- df_model[-train_index, ]






In [2]:
# Full regression model: current price regressed on all remaining columns
full_model <- lm(
  `current price` ~ .,
  data = train_data
)


In [3]:
summary(full_model)


In [4]:
coeff_table <- summary(full_model)$coefficients

# sort rows by p-value, smallest first
coeff_sorted <- coeff_table[order(coeff_table[, "Pr(>|t|)"]), ]
coeff_sorted
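
The sorted matrix above can be awkward to filter because the term names live in the row names. A minimal sketch of one alternative, assuming a data-frame layout is acceptable: the built-in `mtcars` data stands in for the car-price data, and the column names `estimate`, `std_error`, `t_value`, `p_value`, and `significant` are illustrative choices, not part of the original notebook.

```r
# mtcars is a built-in dataset, used here only as a stand-in for train_data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Convert the coefficient matrix to a data frame with readable column names
coefs <- as.data.frame(summary(fit)$coefficients)
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")

# Move the term names from row names into a regular column
coefs$term <- rownames(coefs)
rownames(coefs) <- NULL

# Drop the intercept, order by p-value, and flag terms below 0.05
coefs <- coefs[coefs$term != "(Intercept)", ]
coefs <- coefs[order(coefs$p_value), ]
coefs$significant <- coefs$p_value < 0.05

print(coefs[, c("term", "estimate", "std_error", "t_value", "p_value", "significant")])
```

Dropping the intercept is a judgment call: it is rarely of interest when ranking predictors, but it can be kept by removing that one filter line.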


In [5]:
model_summary <- summary(full_model)
r2     <- model_summary$r.squared
adj_r2 <- model_summary$adj.r.squared

cat("R-squared:", r2, "\n")
cat("Adjusted R-squared:", adj_r2, "\n")


R-squared: 0.9953061 
Adjusted R-squared: 0.9952425 
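
Both values above are computed on the training rows, so they do not show how the model behaves on data it has not seen. The sketch below shows one way to check that with a held-out split. It runs on synthetic data so that it is self-contained: the data frame `toy`, its columns, and the coefficients used to generate it are all made up for illustration. In the notebook itself the same pattern would presumably apply to `full_model` and `test_data`.

```r
# Synthetic stand-in for the car data; names and numbers are hypothetical
set.seed(42)
n <- 400
toy <- data.frame(
  km        = runif(n, 50000, 150000),
  condition = sample(1:10, n, replace = TRUE)
)
toy$price <- 500000 - 4 * toy$km + 4600 * toy$condition + rnorm(n, sd = 10000)

# Same 75/25 split pattern as in the first cell
idx   <- sample(seq_len(n), size = floor(0.75 * n))
train <- toy[idx, ]
test  <- toy[-idx, ]

fit <- lm(price ~ ., data = train)

# Predict on rows the model was not fitted on
pred <- predict(fit, newdata = test)

# Root mean squared error and R-squared on the held-out rows
rmse    <- sqrt(mean((test$price - pred)^2))
r2_test <- 1 - sum((test$price - pred)^2) /
               sum((test$price - mean(test$price))^2)

cat("Test RMSE:", rmse, "\n")
cat("Test R-squared:", r2_test, "\n")
```

A test R-squared close to the training value suggests the fit is not driven by overfitting. RMSE is in the units of the response, so it is easiest to read next to the mean price.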


In [6]:
cat("\n==============================\n")
cat("INTERPRETATION (FULL MODEL)\n")
cat("==============================\n")

cat("Each coefficient shows how 'current price' changes when this variable increases by 1 unit,\n")
cat("while all other variables are held constant.\n\n")

cat("Key interpretation rules:\n")
cat("- Positive coefficient  -> price increases\n")
cat("- Negative coefficient  -> price decreases\n")
cat("- Small p-value (< 0.05) -> statistically significant\n")



==============================
INTERPRETATION (FULL MODEL)
==============================
Each coefficient shows how 'current price' changes when this variable increases by 1 unit,
while all other variables are held constant.

Key interpretation rules:
- Positive coefficient  -> price increases
- Negative coefficient  -> price decreases
- Small p-value (< 0.05) -> statistically significant
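
As a worked illustration of these rules, the sketch below plugs in three estimates and standard errors copied from the coefficient table printed in this notebook. The 10,000 km increment is an arbitrary example value. The intervals use the normal quantile 1.96 as an approximation, which assumes the residual degrees of freedom are large.

```r
# Estimates and standard errors copied from the notebook's coefficient table
est <- c(km = -4.001032, condition = 4614.065, rating = 444.4516)
se  <- c(km = 0.01092471, condition = 112.1874, rating = 227.0504)

# Predicted price change for an extra 10,000 km, other predictors held fixed
delta_km <- 10000  # arbitrary example increment
cat("Change for +10,000 km:", est[["km"]] * delta_km, "\n")

# Approximate 95% intervals (1.96 is a large-sample approximation)
ci <- cbind(lower = est - 1.96 * se, upper = est + 1.96 * se)
print(round(ci, 3))

# An interval that contains zero matches a p-value above 0.05
crosses_zero <- ci[, "lower"] < 0 & ci[, "upper"] > 0
print(crosses_zero)
```

The negative `km` estimate means each additional kilometre lowers the predicted price by about 4 units. The approximate interval for `rating` just includes zero, which agrees with its p-value of about 0.0507 sitting slightly above the 0.05 threshold.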


In [7]:
# ================================
# FULL SUMMARY — STEP 07
# ================================

cat("\n==============================\n")
cat("FULL REGRESSION MODEL SUMMARY\n")
cat("==============================\n")

cat("\n--- MODEL FORMULA ---\n")
print(formula(full_model))

cat("\n--- COEFFICIENTS ---\n")
print(coeff_sorted)

cat("\n--- MODEL QUALITY ---\n")
cat("R-squared:        ", r2, "\n")
cat("Adjusted R-squared:", adj_r2, "\n")

cat("\n--- SIGNIFICANT VARIABLES (p < 0.05) ---\n")
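# drop = FALSE keeps the matrix layout even if only one row passes the filter
print(coeff_sorted[coeff_sorted[, "Pr(>|t|)"] < 0.05, , drop = FALSE])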

cat("\n--- TRAIN PRICE MEAN ---\n")
cat(mean(train_data$`current price`), "\n")

cat("\n==============================\n")
cat("END OF STEP 07\n")
cat("==============================\n")



==============================
FULL REGRESSION MODEL SUMMARY
==============================

--- MODEL FORMULA ---
`current price` ~ `on road old` + `on road now` + years + km + 
    rating + condition + economy + `top speed` + hp + torque

--- COEFFICIENTS ---
                   Estimate   Std. Error      t value      Pr(>|t|)
`on road old`  5.058015e-01 5.412251e-03   93.4549376  0.000000e+00
`on road now`  5.019115e-01 5.549887e-03   90.4363580  0.000000e+00
km            -4.001032e+00 1.092471e-02 -366.2370176  0.000000e+00
condition      4.614065e+03 1.121874e+02   41.1281917 3.116403e-193
years         -1.621826e+03 1.853476e+02   -8.7501848  1.431296e-17
(Intercept)   -1.523163e+04 6.887830e+03   -2.2113835  2.731463e-02
rating         4.444516e+02 2.270504e+02    1.9575021  5.066448e-02
hp             1.614339e+01 1.541895e+01    1.0469838  2.954493e-01
`top speed`   -1.485023e+01 1.635096e+01   -0.9082174  3.640595e-01
economy        7.055422e+01 1.419216e+02    0.4971353  6.192415e-01
torque         2.344075e+00 1.504412e+01    0.15581