###Summary
tidymodels is a collection of packages for modelling and statistical analysis that share the underlying design philosophy, grammar, and data structures of the tidyverse.
The core packages of tidymodels are:
- rsample: for sample splitting and resampling
- recipes: for preprocessing
- parsnip: for specifying the model
- yardstick: for evaluating the model
- tune: for parameter tuning
- workflows: for putting everything together
- broom: for converting model output into a user-friendly (tidy) format
- dials: for creating and managing tuning parameters
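
As a rough illustration of how tune and dials fit together with the other packages, the sketch below tunes a small decision tree. It is only a sketch under stated assumptions: the built-in `mtcars` data, the `rpart` engine, and the grid size are stand-ins chosen so that the code runs anywhere, not part of this notebook's analysis.

```r
library(tidymodels)

set.seed(1000)
# Assumption: mtcars stands in for real data
folds <- vfold_cv(mtcars, v = 4)                       # rsample: 4-fold cross-validation

# parsnip: model specification with two parameters marked for tuning
tree_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

# workflows: bundle the formula and the model
tree_wf <- workflow() %>%
  add_formula(mpg ~ wt + hp) %>%
  add_model(tree_spec)

# dials: a regular grid with 3 levels per parameter (9 candidates)
grid <- grid_regular(cost_complexity(), tree_depth(), levels = 3)

# tune: evaluate every candidate on every fold, scored by RMSE (yardstick)
tuned <- tune_grid(tree_wf, resamples = folds, grid = grid,
                   metrics = metric_set(rmse))

best <- select_best(tuned, metric = "rmse")
final_wf <- finalize_workflow(tree_wf, best)           # plug the best values back in
best
```

The finalized workflow can then be fitted with `fit(final_wf, data = ...)` like any other workflow.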
###Comparisons
There has long been debate over R's consistency problem: modelling packages are written by different people following different principles, so each has a slightly different interface. caret was developed first to provide a uniform interface to a variety of models in R. It was a great starting point; however, it can be slow even for modest-sized operations compared to tidymodels. Because caret is older than tidymodels, more resources are available for solving problems with caret. tidymodels, on the other hand, is newer and is built on tidyverse principles. Furthermore, caret is a single package made up of many functions, whereas tidymodels is split into several packages, which gives users greater flexibility.
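
To make the idea of a uniform interface concrete, here is a minimal sketch in which two different model types are specified, fitted, and used for prediction with exactly the same parsnip calls. The `mtcars` data and the choice of the `lm` and `rpart` engines are assumptions made only for illustration.

```r
library(tidymodels)

# Two different model types, described with the same grammar
specs <- list(
  lm   = linear_reg() %>% set_engine("lm"),
  tree = decision_tree() %>% set_engine("rpart") %>% set_mode("regression")
)

# The same fit() call works for both specifications
# (assumption: mtcars stands in for real data)
fits <- lapply(specs, fit, formula = mpg ~ wt + hp, data = mtcars)

# predict() returns a tibble with a .pred column in both cases
preds <- lapply(fits, predict, new_data = head(mtcars, 3))
preds
```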
Compared to mlr3, tidymodels offers more functionality in the preprocessing step. However, the nested resampling procedure looks more straightforward in mlr3.
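
For reference, the tidymodels side of nested resampling can be set up with `nested_cv()` from rsample. The sketch below only builds the resampling object; the data (`mtcars`) and the numbers of folds and bootstrap samples are illustrative assumptions.

```r
library(tidymodels)

set.seed(1000)
# Outer loop: 4-fold cross-validation; inner loop: 5 bootstrap samples per outer fold
# (assumption: mtcars stands in for real data)
nested <- nested_cv(mtcars,
                    outside = vfold_cv(v = 4),
                    inside  = bootstraps(times = 5))

nested                        # one row per outer fold
nested$inner_resamples[[1]]   # inner resamples built from the first outer analysis set
```

Tuning would then be run on each element of `inner_resamples`, and the outer splits would be used for the final performance estimate; that bookkeeping has to be written by hand.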
As for mlflow, it is useful for tracking tidymodels experiments. The tidymodels packages integrate well with mlflow, which makes it possible to automate the tracking process.
###Pros and Cons
Pros:
- It is flexible.
- It is faster in general, especially compared to caret.
- It is tidier in general.

Cons:
- It is newer, so fewer resources are available.
- It is still in development.
- It is hard to learn at first because there are many specialized packages, one for each stage.
###Links
https://cran.r-project.org/web/packages/tidymodels/tidymodels.pdf
http://www.rebeccabarter.com/blog/2020-03-25_machine_learning/
https://www.gmudatamining.com/lesson-10-r-tutorial.html
https://towardsdatascience.com/caret-vs-tidymodels-how-to-use-both-packages-together-ee3f85b381c
https://pharmacoecon.me/post/2021-05-01-tidymodels-vs-mlr3/
https://mdneuzerling.com/post/tracking-tidymodels-with-mlflow/

In [None]:
library(tidyverse)
library(tidymodels)

In [None]:
options(repr.matrix.max.rows=20, repr.matrix.max.cols=15) # limit the number of rows and columns shown when tables are printed

In [None]:
datapath <- "~/data_ad454"

In [None]:
weo_wide2 <- readRDS(sprintf("%s/rds/01_01_weo_wide2.rds", datapath))

In [None]:
weo_countries <- readRDS(sprintf("%s/rds/01_01_weo_countries.rds", datapath))
weo_subject <- readRDS(sprintf("%s/rds/01_01_weo_subject.rds", datapath))

In [None]:
# look up the description of the NGDP_RPCH series
weo_subject %>% filter(WEO_Subject_Code == "NGDP_RPCH")

In [None]:
features <- c("NID_NGDP", "NGDP_RPCH")

In [None]:
# keep the 2019 observations of the two selected series and drop incomplete rows
plot1 <- weo_wide2 %>%
  filter(year == 2019) %>%
  select(all_of(features)) %>%
  na.omit()

In [None]:
plot1 %>%
  ggplot(aes(x = NID_NGDP, y = NGDP_RPCH)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

In [None]:
set.seed(1000)
# split the data into training (60%) and testing (40%)
data_split <- initial_split(plot1, 
                             prop = 3/5)
data_split

In [None]:
train_data <- training(data_split)
test_data <- testing(data_split)
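
A single train/test split gives only one estimate of performance. As a possible alternative, the sketch below uses cross-validation with `vfold_cv()` and `fit_resamples()`. It is an illustration only: `mtcars`, the formula, and the number of folds are assumptions so that the code runs without the WEO files.

```r
library(tidymodels)

set.seed(1000)
# Assumption: mtcars stands in for the notebook's data
folds <- vfold_cv(mtcars, v = 5)            # 5 analysis/assessment splits

spec <- linear_reg() %>% set_engine("lm")

# Fit the model on each analysis set and score it on the matching assessment set
cv_res <- fit_resamples(spec, mpg ~ wt + hp, resamples = folds)

cv_metrics <- collect_metrics(cv_res)       # RMSE and R-squared averaged over folds
cv_metrics
```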

In [None]:
train_data

In [None]:
test_data

In [None]:
lm_model <- linear_reg() %>%
            set_engine("lm") %>%
            set_mode("regression") 

In [None]:
lm_model

In [None]:
lm_fit <- lm_model %>% 
          fit(NGDP_RPCH~NID_NGDP, data = train_data)

In [None]:
lm_fit
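
The model here is fitted directly from a formula. When preprocessing steps are needed, a recipe and a workflow can carry them together with the model. The following is a minimal sketch of that pattern, assuming `mtcars` as stand-in data and a normalization step chosen purely for illustration.

```r
library(tidymodels)

set.seed(1000)
# Assumption: mtcars stands in for the notebook's data
split <- initial_split(mtcars, prop = 3/5)
train <- training(split)
test  <- testing(split)

# recipes: declare the roles of the variables and the preprocessing steps
rec <- recipe(mpg ~ wt + hp, data = train) %>%
  step_normalize(all_numeric_predictors())

spec <- linear_reg() %>% set_engine("lm")

# workflows: bundle preprocessing and model, then fit both in one call
wf_fit <- workflow() %>%
  add_recipe(rec) %>%
  add_model(spec) %>%
  fit(data = train)

# The recipe, as estimated on the training data, is applied to new data automatically
wf_preds <- predict(wf_fit, new_data = test) %>% bind_cols(test)
wf_preds
```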

In [None]:
names(lm_fit)

In [None]:
summary(lm_fit$fit)

In [None]:
par(mfrow=c(2,2)) # plot all 4 plots in one

plot(lm_fit$fit, 
     pch = 16,    # optional parameters to make points blue
     col = '#006EA1')

In [None]:
tidy(lm_fit)

In [None]:
glance(lm_fit)
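
`tidy()` passes extra arguments on to the broom method of the underlying engine, so confidence intervals for the coefficients can be requested as well. A small sketch, with `mtcars` assumed as stand-in data:

```r
library(tidymodels)

# Assumption: mtcars stands in for the notebook's data
fit_demo <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = mtcars)

# conf.int and conf.level are arguments of broom's tidy() method for lm objects
coefs <- tidy(fit_demo, conf.int = TRUE, conf.level = 0.95)
coefs
```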

In [None]:
predict(lm_fit, new_data = test_data)

In [None]:
test_results <- predict(lm_fit, new_data = test_data) %>% 
                            bind_cols(test_data)


In [None]:
test_results
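
The `predict()` plus `bind_cols()` pattern can also be written with `augment()`, which returns the new data together with the predictions. A minimal sketch, assuming `mtcars` as stand-in data:

```r
library(tidymodels)

set.seed(1000)
# Assumption: mtcars stands in for the notebook's data
split <- initial_split(mtcars, prop = 3/5)

fit_demo <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = training(split))

# augment() adds a .pred column to new_data; when the outcome column is present,
# a residual column is typically added as well
aug <- augment(fit_demo, new_data = testing(split))
aug
```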

In [None]:
# RMSE on test set
rmse(test_results, 
     truth = NGDP_RPCH,
     estimate = .pred)

In [None]:
rsq(test_results,
    truth = NGDP_RPCH,
    estimate = .pred)
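
Instead of calling each metric separately, yardstick can combine several metrics into one function with `metric_set()`. The sketch below assumes `mtcars` as stand-in data and adds `mae` purely as an example of a third metric.

```r
library(tidymodels)

set.seed(1000)
# Assumption: mtcars stands in for the notebook's data
split <- initial_split(mtcars, prop = 3/5)

fit_demo <- linear_reg() %>%
  set_engine("lm") %>%
  fit(mpg ~ wt, data = training(split))

demo_results <- predict(fit_demo, new_data = testing(split)) %>%
  bind_cols(testing(split))

# metric_set() returns a function that computes all listed metrics at once
reg_metrics <- metric_set(rmse, rsq, mae)
scores <- reg_metrics(demo_results, truth = mpg, estimate = .pred)
scores
```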

In [None]:
ggplot(data = test_results,
       mapping = aes(x = .pred, y = NGDP_RPCH)) +
  geom_point(color = '#006EA1') +
  geom_abline(intercept = 0, slope = 1, color = 'orange') +
  labs(title = 'Linear Regression Results - Test Set',
       x = 'Predicted',
       y = 'Actual')