Can data argument be multivariate? #10

Closed
MislavSag opened this issue Jun 12, 2019 · 4 comments

@MislavSag MislavSag commented Jun 12, 2019

I have just tried your package. I am not sure whether the data argument in AutoTS must be a univariate time series or whether it can contain multiple variables.

I tried with more than one variable, but the final graph showed two time series (instead of only one, the target variable).

EDIT: One more issue

I have the following data:

data <- structure(list(zadnja = c(421, 425, 432, 415, 414, 409.99, 407, 
415, 424.99, 432, 425, 433, 428, 428.99, 425, 425, 420, 420, 
420, 419.98, 415, 410, 407, 407.5, 399.98, 400.05, 380, 400, 
394.99, 389.98, 395.05, 381.5, 385, 395.9, 383, 376, 390, 385.01, 
385, 379, 375.1, 380, 378.99, 368.99, 355.75, 367.97, 370, 376, 
386.98, 392), index = structure(c(13917, 13920, 13921, 13922, 
13923, 13924, 13927, 13928, 13929, 13930, 13931, 13934, 13935, 
13936, 13937, 13938, 13941, 13942, 13943, 13944, 13945, 13948, 
13949, 13950, 13951, 13952, 13955, 13956, 13957, 13958, 13963, 
13964, 13965, 13966, 13969, 13970, 13971, 13972, 13973, 13976, 
13977, 13978, 13979, 13980, 13983, 13984, 13985, 13986, 13987, 
13990), class = "Date")), row.names = c(NA, -50L), index_quo = ~index, index_time_zone = "UTC", class = c("tbl_time", 
"tbl_df", "tbl", "data.frame"))

When I tried to estimate a model using AutoTS:

stock_forecast = RemixAutoML::AutoTS(
  data = data,
  TargetName = "zadnja",
  DateName = "index",
  FCPeriods = 7,
  HoldOutPeriods = 5,
  TimeUnit = "day"
)

I got an error:

 Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : 
  NA/NaN/Inf in 'y' 

P.S. Why do you keep all the R code in one file?

@AdrianAntico AdrianAntico (Owner) commented Jun 12, 2019

Currently, AutoTS() runs a single series at a time. You should supply a data.table with a date column and a target column (in that order), like the example in the help file. If you want to run through multiple series, create a loop and subset the data before each AutoTS() run.

This works for me:

data <- structure(list(zadnja = c(421, 425, 432, 415, 414, 409.99, 407,
415, 424.99, 432, 425, 433, 428, 428.99, 425, 425, 420, 420,
420, 419.98, 415, 410, 407, 407.5, 399.98, 400.05, 380, 400,
394.99, 389.98, 395.05, 381.5, 385, 395.9, 383, 376, 390, 385.01,
385, 379, 375.1, 380, 378.99, 368.99, 355.75, 367.97, 370, 376,
386.98, 392), index = structure(c(13917, 13920, 13921, 13922,
13923, 13924, 13927, 13928, 13929, 13930, 13931, 13934, 13935,
13936, 13937, 13938, 13941, 13942, 13943, 13944, 13945, 13948,
13949, 13950, 13951, 13952, 13955, 13956, 13957, 13958, 13963,
13964, 13965, 13966, 13969, 13970, 13971, 13972, 13973, 13976,
13977, 13978, 13979, 13980, 13983, 13984, 13985, 13986, 13987,
13990), class = "Date")), row.names = c(NA, -50L), index_quo = ~index, index_time_zone = "UTC", class = c("tbl_time",
"tbl_df", "tbl", "data.frame"))

# Convert to a data.table and put the date column first, the target column second
data <- data.table::as.data.table(data)
data.table::setcolorder(data, c("index", "zadnja"))

xx <- RemixAutoML::AutoTS(data,
                          TargetName       = "zadnja",
                          DateName         = "index",
                          FCPeriods        = 1,
                          HoldOutPeriods   = 1,
                          EvaluationMetric = "MAPE",
                          TimeUnit         = "day",
                          Lags             = 1,
                          SLags            = 1,
                          NumCores         = 4,
                          SkipModels       = c("NNET","TBATS","ETS","TSLM","ARFIMA","DSHW"),
                          StepWise         = TRUE,
                          TSClean          = FALSE,
                          ModelFreq        = TRUE,
                          PrintUpdates     = FALSE)

P.S. I keep the code in a single file because it's easier for me to develop that way. I understand this makes specific code blocks harder to find, and I'll split the file up eventually, when development slows down.
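
For multiple series, the loop-and-subset approach described above could be sketched roughly as follows. This is only an illustration: the `sales` data, the `store` grouping column, and the helpers `forecast_each_series()` and `naive_last_value()` are hypothetical names, and a trivial stand-in forecaster is used so the sketch runs without RemixAutoML installed.

```r
# Hypothetical stacked input: two series identified by a `store` column.
sales <- data.table::data.table(
  store  = rep(c("A", "B"), each = 30),
  index  = rep(seq(as.Date("2019-01-01"), by = "day", length.out = 30), times = 2),
  zadnja = c(seq(100, 129), seq(200, 229))
)

# Split the data by series and call a forecasting function once per subset.
# `forecast_fun` is a placeholder for whatever single-series forecaster is used.
forecast_each_series <- function(data, group_col, date_col, target_col, forecast_fun) {
  pieces <- split(data, by = group_col)
  lapply(pieces, function(one_series) {
    # Keep only the date and target columns, in that order.
    forecast_fun(one_series[, c(date_col, target_col), with = FALSE])
  })
}

# Stand-in forecaster (an assumption for this sketch): repeats the last observed value.
naive_last_value <- function(dt) dt[[2]][nrow(dt)]

forecasts <- forecast_each_series(sales, "store", "index", "zadnja", naive_last_value)

# A wrapper around AutoTS() could be passed instead (not run here; argument
# values copied from the call earlier in this thread):
# autots_wrapper <- function(dt) {
#   RemixAutoML::AutoTS(data = dt, TargetName = "zadnja", DateName = "index",
#                       FCPeriods = 7, HoldOutPeriods = 5, TimeUnit = "day")
# }
```

The result is a list with one element per series, named after the values in the grouping column.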

@AdrianAntico AdrianAntico (Owner) commented Jun 12, 2019

I would like to add that I have added functionality that removes any extraneous columns you may have in your data and ensures the columns are in the correct order.
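
For anyone on an older version, the same clean-up can be done by hand before calling AutoTS(). The helper below, `prepare_ts_input()`, is a hypothetical sketch and not the package's own implementation; it assumes only that the date and target column names are known.

```r
# Hypothetical helper: keep only the date and target columns, date first.
prepare_ts_input <- function(data, DateName, TargetName) {
  stopifnot(all(c(DateName, TargetName) %in% names(data)))
  dt <- data.table::as.data.table(data)
  # Selecting by name both drops extra columns and fixes the column order.
  dt[, c(DateName, TargetName), with = FALSE]
}

# Illustrative input with the target first and an extraneous `volume` column.
raw <- data.frame(
  zadnja = c(421, 425, 432),
  volume = c(10, 12, 9),
  index  = as.Date(c("2008-02-08", "2008-02-11", "2008-02-12"))
)

clean <- prepare_ts_input(raw, DateName = "index", TargetName = "zadnja")
```

Here `clean` contains just `index` and `zadnja`, in that order, and can be passed as the `data` argument.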

@MislavSag MislavSag (Author) commented Jun 13, 2019

Ok, it works now.

Thanks.

It would be nice to incorporate multivariate models in the future.

@AdrianAntico AdrianAntico (Owner) commented Jun 13, 2019

You should check out AutoCatBoostCARMA(). It's a multivariate CatBoost forecasting function. It utilizes calendar, trend, and ARMA variables, and replicates an ARMA forecasting process. I tested it recently on some Walmart store / department data and was able to generate forecasts for 2,660 store / department combinations in about 15 minutes on GPU.

You need to have your data in long format: a date column, a values column, and categorical columns such that filtering on a unique combination of factor levels gives you an individual series. So basically, stacked data.
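
If your series currently sit side by side in separate columns, a reshape along these lines would produce the stacked layout described above. This is a minimal sketch with made-up data; the column names `date`, `series`, and `sales` are assumptions for illustration, and the AutoCatBoostCARMA() call itself is not shown.

```r
# Hypothetical wide data: one column per series.
wide <- data.table::data.table(
  date         = seq(as.Date("2019-01-01"), by = "week", length.out = 3),
  store1_dept1 = c(10, 11, 12),
  store1_dept2 = c(20, 21, 22)
)

# Stack the series: one row per (date, series) pair.
long <- data.table::melt(
  wide,
  id.vars       = "date",
  variable.name = "series",  # categorical column identifying each series
  value.name    = "sales"    # values column
)

# Filtering on one factor level recovers an individual series.
one_series <- long[series == "store1_dept2"]
```

With more than one categorical column (for example separate store and department columns), a single series corresponds to one unique combination of their levels.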
