diff --git a/DESCRIPTION b/DESCRIPTION index cd28bf0..078f124 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,18 +1,21 @@ Package: smooth Type: Package Title: Forecasting Using State Space Models -Version: 2.5.0 -Date: 2019-04-25 +Version: 2.5.1 +Date: 2019-06-13 Authors@R: person("Ivan", "Svetunkov", email = "ivan@svetunkov.ru", role = c("aut", "cre"), comment="Lecturer at Centre for Marketing Analytics and Forecasting, Lancaster University, UK") URL: https://github.com/config-i1/smooth BugReports: https://github.com/config-i1/smooth/issues Language: en-GB -Description: Functions implementing Single Source of Error state space models for purposes - of time series analysis and forecasting. The package includes Exponential Smoothing, - SARIMA, Complex Exponential Smoothing, Simple Moving Average, Vector Exponential - Smoothing in state space forms, several simulation functions and intermittent demand - state space models. +Description: Functions implementing Single Source of Error state space models for purposes of time series analysis and forecasting. + The package includes Exponential Smoothing (Hyndman et al., 2008, ), + SARIMA (Svetunkov & Boylan, 2019 ), + Complex Exponential Smoothing (Svetunkov & Kourentzes, 2018, ), + Simple Moving Average (Svetunkov & Petropoulos, 2018 ), + Vector Exponential Smoothing (de Silva et al., 2010, ) in state space forms, + several simulation functions and intermittent demand state space models. It also allows dealing with + intermittent demand based on the iETS framework (Svetunkov & Boylan, 2017, ). License: GPL (>= 2) Depends: R (>= 3.0.2), greybox (>= 0.5.0) Imports: Rcpp (>= 0.12.3), stats, graphics, forecast, nloptr, utils, @@ -23,9 +26,9 @@ VignetteBuilder: knitr RoxygenNote: 6.1.1 Encoding: UTF-8 NeedsCompilation: yes -Packaged: 2019-04-25 18:08:23 UTC; config +Packaged: 2019-06-13 11:00:38 UTC; config Author: Ivan Svetunkov [aut, cre] (Lecturer at Centre for Marketing Analytics and Forecasting, Lancaster University, UK) Maintainer: Ivan Svetunkov Repository: CRAN -Date/Publication: 2019-04-25 22:50:56 UTC +Date/Publication: 2019-06-14 16:10:03 UTC diff --git a/MD5 b/MD5 index 6538cb1..ea52714 100644 --- a/MD5 +++ b/MD5 @@ -1,128 +1,129 @@ -9eba23b88eda6615718235d53534ff05 *DESCRIPTION -e6470d91312ebb45f8594bc8be403565 *NAMESPACE -5329aa9a2a0109d179cb862b4d143d4d *NEWS +26d5eb602bc45701f16c01fb892596f5 *DESCRIPTION +f9c96f4753f9472634e7a90fcc01e709 *NAMESPACE +438f3e7ed9709bda1b95a0f13261675f *NEWS f6758fdbb19908b222caf93cf4ae933b *R/RcppExports.R -175b3dccded8ac6522fe13ab90a761e8 *R/autoces.R -28ccf9d2a57f03852ec18d89556f1ffb *R/autogum.R -0593a4af6356f35503904dbc34733993 *R/automsarima.R -65dcab9d8af7a3c718c57996be08a1e3 *R/autossarima.R -8f1f2f5390bc01cf17c57953c2e31a8b *R/ces.R -ea8f8684d4dcba3d1286cd1f59ecfba3 *R/cma.R -8da0895929fe2e8951684f57266388dd *R/es.R -ae3af26c65cce9ed32c3fb1a0cf46156 *R/gsi.R -80140bca51cea90e9583d71d2f3a8a76 *R/gum.R +b87267f9da0364aa9ba99afc8b587a61 *R/autoces.R +865d40b22795427d76cf78cc314b472e *R/autogum.R +c964a6bf865398cc11a030d4371c4a7a *R/automsarima.R +d27a7b84cb709567a2e0445cf179fc0b *R/autossarima.R +ff4797aec9b93877283eb27ea6117a2f *R/ces.R +04fdedf8f782d6c8173821b6a62cafbe *R/cma.R +31f124a4c711e51c07a755ad2ae4e88a *R/depricator.R +3fd63a9a8637d82a7cd86997e1e0c8c1 *R/es.R +ef2749124a11fce6654da82644cce3ee *R/gsi.R +1543db985524eafcf8af4d663dee0f44 *R/gum.R 234bdbdc6d8630a2d2c97d24b750e001 *R/isFunctions.R -242ec1d4c4bbde46856ab226c57df48b *R/iss.R -787ae4d8e6ec0837698456cf32e78c20 *R/methods.R 
-9a8a8d56664f5dd3a250793b564bf498 *R/msarima.R -2f9e89010928790b23ef654e14fbbb33 *R/oes.R -dafcfa0b9f5e478d44750ff2b0938b9a *R/oesg.R -f3002fee0c96944f78dffa1619097548 *R/simces.R -253e660eceacf33b5142b07254b30821 *R/simes.R -a48d1f49d0f04fb60f5298cae4e8cb75 *R/simgum.R -1e33c5f6d740fabcb5fee2c9b199f550 *R/simsma.R -155abfb5c7411ac5f98def3baf3eb6a0 *R/simssarima.R -01161a2a18b89b49937a7dc72061c53c *R/simves.R -aa32ef5b0848bc6bd2b79f9894d6f102 *R/sma.R +409cd8f916b16bad8a68e99934709ad2 *R/iss.R +1b191be08c624d161413b04e35bcf7ca *R/methods.R +a8c76910588afb7d6d837afab05164be *R/msarima.R +e094e7fa3956a4884b8e035904c5eb81 *R/oes.R +95659023fb64abe9c8779a3e57ac7c2e *R/oesg.R +06b350ee19d7f5e79da776ef09c98b52 *R/simces.R +3e1fee1f15be343eec450e7fe062f32d *R/simes.R +666cb469e978cab49a9f7b74f7850c7d *R/simgum.R +e032b99453b0ff4312e6ef2226bb519d *R/simsma.R +24f2b804c4e90017ab23ebe288bd857f *R/simssarima.R +e1b55572802f6e7cb539c3c0b944e922 *R/simves.R +ad02922cd5f872a3d7aba90f85330442 *R/sma.R 474c485bd44e489896ad1f7df6f8e1e0 *R/smooth-package.R -a344399fb35d267cbca0c6659aa61299 *R/smoothCombine.R +6630b9660581361b2eac2409ec4356b1 *R/smoothCombine.R 24d42f1698e1f50eb27f1a32ecab61c8 *R/sowhat.R -612bb43ff081baff6f8b552e988f1ab5 *R/ssarima.R -7bc269b0242111e81754ef3762a3b1da *R/ssfunctions.R +f89190cb7061df51e913a57c7d70e314 *R/ssarima.R +9239e77c40bb2ae994570df0a468cba9 *R/ssfunctions.R b45db627556532cfdf6e20ccc1ff34dc *R/variance-covariance.R -1cd7cc21afc92abd6baa9017b3270893 *R/ves.R -ad70c2c861a2d83a98093974868fdf80 *R/viss.R -915d9cb9dacf122e0d9004ea62c9b6c2 *R/vmethods.R -c9c0c03afeca53eeb9dfd565fa4f0225 *R/vssFunctions.R +2a413fc35dab9c53e5189cc79a6d3d74 *R/ves.R +4e9c4c8a3de3a463c32c2c1a3ec4f752 *R/viss.R +65ecf30a68069fa57d9a55be9365a7a5 *R/vmethods.R +e1c31c82fdd699a08dc1e8a9c00c4525 *R/vssFunctions.R 3113d776d98ddd16e1ee1c7cfe5ba513 *R/zzz.R 4e5507d3c906e24069458a1db2ade608 *README.md -4eb8525b4cd770af249e14334ad3fded *build/partial.rdb +35318f8438c1eb93ca90312965db7b72 *build/partial.rdb a5088d39f054b33df4ba58425511b45a *build/vignette.rds -cbee8299ea7a4e9d9b4ef303f4e6a01a *inst/doc/ces.R -9a6306fb84d1f4cf62fdecb1966aa0ab *inst/doc/ces.Rmd -d6b0b0f6743df6346f62500340a0227a *inst/doc/ces.html -dcc169d4022fdb0fac9799df36a53609 *inst/doc/es.R -aee1e5b2cf8c8bbc33a28000ea3e3489 *inst/doc/es.Rmd -da23dfd11c64ed68fbe89a00778a497f *inst/doc/es.html -78a0d7f6f8275afd57059a4c1cd03e6d *inst/doc/gum.R -3c58414f2930dc462b5befda3a20ae1c *inst/doc/gum.Rmd -bbb29ec445aa476b278626fdbd248965 *inst/doc/gum.html -1c112665017fbee58702c69a01b4fe6a *inst/doc/oes.R -f53dc1121b6721a2c211ebf585324aad *inst/doc/oes.Rmd -745d06e2857d80aec5aecfda8d50cf7d *inst/doc/oes.html +e7ff134dfef81ef90cb355df49cd20ad *inst/doc/ces.R +63102e1de32a1dfa9fca0c23576696a1 *inst/doc/ces.Rmd +35a08b39541c75de685c5daf99658fbd *inst/doc/ces.html +7a5731f4e90d943393e09cea3807eb39 *inst/doc/es.R +75d528e227579a8662d15950b92d6d55 *inst/doc/es.Rmd +66cb8f25b318ef885049912179415b26 *inst/doc/es.html +0183cb06ce8f3ad215895bff752bdf63 *inst/doc/gum.R +c757d7f4971db027d931c0543d77f79b *inst/doc/gum.Rmd +1fe10a98591cddbba17b08c1d725a626 *inst/doc/gum.html +c6dff78b989971508587a79def61ad9b *inst/doc/oes.R +a4334fbba782e484e7b037f79ddcfb46 *inst/doc/oes.Rmd +51a63207ab30e17ab3f61d584a42eaf0 *inst/doc/oes.html 990d77b6e7229994f2ae259df6f121a1 *inst/doc/simulate.R 3c284a1692f90426fba996c01a78af8c *inst/doc/simulate.Rmd -d9e5c82c7ee02055363c4df28058ba2b *inst/doc/simulate.html -5834d35fd17557d8005dd308aa351bb2 *inst/doc/sma.R 
-8cb859d21072dff613ae6439ef6bdd91 *inst/doc/sma.Rmd -0fd32bd1a80fed353f3a22477504a396 *inst/doc/sma.html +2883075c963a6c74683b837c1e6308e5 *inst/doc/simulate.html +82219c258851c03622ae04b35f645022 *inst/doc/sma.R +7a6cd5f84b090d58c565b781c2801327 *inst/doc/sma.Rmd +3c8feb687c0c07b72c26db095501df31 *inst/doc/sma.html f2be0cff7be52faff06f2a377333a8c6 *inst/doc/smooth-Documentation.pdf ab7560c1647929fa67f92f931009917d *inst/doc/smooth.R -c572cefabe7e478d3b5dca6e503a2329 *inst/doc/smooth.Rmd -454ac1811460e229fbf857572f4ac6bc *inst/doc/smooth.html -e568bdbdf1b30f81203586215b2a960c *inst/doc/ssarima.R -21442a46be54e6f5a93b9a511b9a55ec *inst/doc/ssarima.Rmd -c507e5471029c01c3066e397468e24f4 *inst/doc/ssarima.html +23f9e9e4b5aeaaf04edf034dc871c2be *inst/doc/smooth.Rmd +4ce781baebaa5a35f94c4981ed7d4ffd *inst/doc/smooth.html +45d8e57dc5084d4aafefc907917a2f34 *inst/doc/ssarima.R +3033ecc24af0b53c65632e0838b4a059 *inst/doc/ssarima.Rmd +ee6d8f76b857fd3845f8aa5921982c34 *inst/doc/ssarima.html 00cfbce0c000855b2dde75eb6c7de1d1 *inst/doc/ves.R -c1b787a1aeeb7201c31615f06bab22e7 *inst/doc/ves.Rmd -8662586abd7ef57e95bfe64a555b066f *inst/doc/ves.html -8da23ef630e39c9d6efc1994a8859f4e *man/auto.ces.Rd -ead1bd9b96d64f96c4c0f6057ba9ca21 *man/auto.gum.Rd -4d5e998efec13d22f2563f05ea59ec51 *man/auto.msarima.Rd -864705acad07a5d2c651d111bd9d2039 *man/auto.ssarima.Rd -8d47ca8dcba673728ae82d0bd53fa1e8 *man/ces.Rd -29db3246e818ef103b130f0ceb0a4656 *man/cma.Rd +4859dd9f3e5ae42e29339bc4a38effe6 *inst/doc/ves.Rmd +fb9799d249a703169a50f99861073608 *inst/doc/ves.html +127dfd17e5bf95957ba3d31ec7a8e883 *man/auto.ces.Rd +9771a33d1eef094b837811de79e51e46 *man/auto.gum.Rd +e78f2ca3e2f2d87aedbc0111f8982fe7 *man/auto.msarima.Rd +a9f340337d73bf796813e98b69c66f00 *man/auto.ssarima.Rd +4d8b52efccf47a94334eb315aa9bb958 *man/ces.Rd +0685115bc87b456a6f18afa4709164de *man/cma.Rd 682b961418970a7c88da89553be1a734 *man/covar.Rd -45c1d46b28e060e1bcdd92ec82b55c70 *man/es.Rd -8ebd4387e4c50f16eaf0191f23d971ed *man/forecast.smooth.Rd -a34910501457499fbe4c659e75a1a933 *man/gsi.Rd -ef9bdd35d5131e33a0714b8a1bd5358f *man/gum.Rd +5ba50db1b729e0e805555dee8fa7da27 *man/es.Rd +ee2a0e6ef5ade36c0e7e1aa35c21af5c *man/forecast.smooth.Rd +bc2310c7e416f9dd5b3991d6e719dc16 *man/gsi.Rd +5534d246ca6e76b7e4899895579591c9 *man/gum.Rd 7ab3cb6652f433d2b1fd9a6050b8a788 *man/isFunctions.Rd -61c3647d5bee1779bfe978b888031acd *man/iss.Rd -0150586ad50352c4448f49b268dba24d *man/msarima.Rd -4e9574a61af3751dc52c7db282010dc3 *man/oes.Rd -684ec7ab6ed6f916d497a937ad82cc98 *man/oesg.Rd +60699e209fb84df141bb6df57f5cacf5 *man/iss.Rd +6297f953f94dd98250ca8045fb8fd40a *man/msarima.Rd +996e16e792fe67fe7dcb00b30173867c *man/oes.Rd +6735bde966b989b45c7f45748a2e1922 *man/oesg.Rd 0f3268c8a858c058a3fa8b8939767d2e *man/orders.Rd -71fbcaa315a948702323a67f8205d57a *man/pls.Rd -9e7bdcc612fe485d34f821e52c43bc03 *man/sim.ces.Rd -c39b1646cfc651ed54272236948b752a *man/sim.es.Rd -b674cfb7c633db0c13752b0d9120550e *man/sim.gum.Rd -bb7a65f56db1c98a73ee02557fba3aae *man/sim.sma.Rd -8b2f8e9f3b739a7763643dc1b6c712a8 *man/sim.ssarima.Rd -2c89eafe613b249484e3e751e82900e4 *man/sim.ves.Rd -b0fc4c221b2ce4041102b564fcc9ec41 *man/sma.Rd -90b6c4a11fcbb8c3d945f4ec58f566c9 *man/smooth.Rd -23fdef7f5c3a6341c0a3897f5b2be942 *man/smoothCombine.Rd +794af59e4019d28d2250c13e112fa095 *man/pls.Rd +0a88c82743571b944e17a449d67b683d *man/sim.ces.Rd +953145e5939aa8ec528b0ebdbae9793c *man/sim.es.Rd +d9be59d43ba1e8339e9052a7b8d2f927 *man/sim.gum.Rd +1ee9bd971c6944ffe5bb7b1b78457fde *man/sim.sma.Rd 
+764d74d71353e3cf4cbf7a1f467192e6 *man/sim.ssarima.Rd +9d1b07ab398697314fbd71d624e8cfdb *man/sim.ves.Rd +6ea9f98b3c13da722e11682d0fa056e8 *man/sma.Rd +5e3f4d5d21a600d459f1ea68d4aab801 *man/smooth.Rd +ed2c775df12f5fd91fbd45d64adebb8a *man/smoothCombine.Rd a0993d7de7f0b96415bb7b69f66b60ab *man/sowhat.Rd -717f009995f5decbbddbf0dc7a068496 *man/ssarima.Rd -c06e1ec5113abf735506a257653bab32 *man/ves.Rd -9ceb1e025b3bd37d79eb21bc0c86608f *man/viss.Rd +69fe9e56655640bea87c6fddf2a6a1ed *man/ssarima.Rd +f6b15eeac1e45375873ea2c9a1b81952 *man/ves.Rd +0bed3336877e918ead66d79ba752cfe1 *man/viss.Rd 9859afefaf6500832d7469cadbbb28c8 *src/Makevars a6850c2998c396b505104b800670480e *src/Makevars.win 251c5afc2a6526d023da28e6dffce421 *src/RcppExports.cpp 9fb0915a4e7ffe192150808e5e890c69 *src/registerDynamicSymbol.c -173db5745362dcc36bd4e34680f03ffd *src/ssGeneral.cpp +8403f9e059cf0235af9eafc138387d83 *src/ssGeneral.cpp 1ff530213052894349cca6864ef67a38 *src/ssGeneral.h -808aeef1ef735df78ef5aab1cab21fc0 *src/ssOccurrence.cpp +f1c1efab9e21d3aa53e7e85caefd153e *src/ssOccurrence.cpp cccddc1650403630c1f85f6bc9a2ac63 *src/ssSimulator.cpp 71a168fa8cd1314f43dd0aeeea113dc9 *src/vSimulator.cpp 3ff978aed1beb78736800b0bf9e709d1 *src/vssGeneral.cpp 4e0f43b23ba7abbb29b225614e88f276 *tests/testthat.R -ce0c82d2b00807f0783cb0b472a4b5a1 *tests/testthat/test_ces.R -844d4d705d76fadf7ab735828bbbb7e1 *tests/testthat/test_es.R -3d75b6cb57922b211cb49f57c8a460f3 *tests/testthat/test_ges.R +9fb40d92f8f77a1429f1add4bc387a36 *tests/testthat/test_ces.R +f925ba8597ccf07763fb7b851122455c *tests/testthat/test_es.R +02ac837d0ec958cbdd82441b4c0e54d4 *tests/testthat/test_ges.R e9b748a6aae74f3650f228089b6b75be *tests/testthat/test_oes.R 69e2784165cabce3cf19514c0907df6c *tests/testthat/test_simulate.R -5903710aa5287a11fcfba27b84b5e1a7 *tests/testthat/test_ssarima.R -9e08bd4ab8213411a400e978ca465058 *tests/testthat/test_ves.R -9a6306fb84d1f4cf62fdecb1966aa0ab *vignettes/ces.Rmd -aee1e5b2cf8c8bbc33a28000ea3e3489 *vignettes/es.Rmd -3c58414f2930dc462b5befda3a20ae1c *vignettes/gum.Rmd +84b5bd3ad4f998695404cf76b9fef734 *tests/testthat/test_ssarima.R +07a3dd2e0e194debc65693f98f90f88f *tests/testthat/test_ves.R +63102e1de32a1dfa9fca0c23576696a1 *vignettes/ces.Rmd +75d528e227579a8662d15950b92d6d55 *vignettes/es.Rmd +c757d7f4971db027d931c0543d77f79b *vignettes/gum.Rmd d5c5cdaf1dcb23ce35182bc1d846ef63 *vignettes/library.bib -f53dc1121b6721a2c211ebf585324aad *vignettes/oes.Rmd +a4334fbba782e484e7b037f79ddcfb46 *vignettes/oes.Rmd 3c284a1692f90426fba996c01a78af8c *vignettes/simulate.Rmd -8cb859d21072dff613ae6439ef6bdd91 *vignettes/sma.Rmd +7a6cd5f84b090d58c565b781c2801327 *vignettes/sma.Rmd f2be0cff7be52faff06f2a377333a8c6 *vignettes/smooth-Documentation.pdf -c572cefabe7e478d3b5dca6e503a2329 *vignettes/smooth.Rmd -21442a46be54e6f5a93b9a511b9a55ec *vignettes/ssarima.Rmd -c1b787a1aeeb7201c31615f06bab22e7 *vignettes/ves.Rmd +23f9e9e4b5aeaaf04edf034dc871c2be *vignettes/smooth.Rmd +3033ecc24af0b53c65632e0838b4a059 *vignettes/ssarima.Rmd +4859dd9f3e5ae42e29339bc4a38effe6 *vignettes/ves.Rmd diff --git a/NAMESPACE b/NAMESPACE index 4f19f62..77a629b 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -4,6 +4,7 @@ S3method(AICc,smooth) S3method(AICc,vsmooth) S3method(BICc,smooth) S3method(BICc,vsmooth) +S3method(actuals,iss) S3method(actuals,smooth) S3method(actuals,smooth.forecast) S3method(coef,smooth) @@ -34,6 +35,7 @@ S3method(modelName,forecast) S3method(modelName,lm) S3method(modelName,smooth) S3method(modelType,default) +S3method(modelType,ets) S3method(modelType,iss) 
S3method(modelType,oesg) S3method(modelType,smooth) diff --git a/NEWS b/NEWS index 08b7ce7..3a44a82 100644 --- a/NEWS +++ b/NEWS @@ -1,3 +1,26 @@ +smooth v2.5.1 (Release date: 2019-06-13) +============== + +Changes: +* New initials for the smoothing parameters in oes(): 0.05 instead of 0.01. This should help in the optimisation. +* oesmodel can now be set to NA / NULL; es(), ssarima() etc. will still work, resetting it to the default "MNN". +* Corrections in the vignette for oes, in line with the most recent version of our paper. +* A little more explanation for the es() function and references to the website. +* Model selection is now available in the oes() function. I'd recommend sticking with pure models. +* Renamed the fitted values of the underlying models in oes() and oesg() to "fittedModel" (formerly "fittedBeta"). +* Rolling back "quiet" to "silent". The former is more difficult than the latter. +* The "data" parameter has been renamed to "y" for consistency (so that the input is more similar to the functions of the forecast package). +* "cfType" has been renamed to "loss". +* "workFast" in auto.ssarima() and auto.msarima() is now "fast". +* "intervals" (plural) is now "interval" (singular). This is needed so that the input is consistent with the predict() function. +* "actuals" has been renamed to "y" in the output of the functions. +* A failsafe mechanism for SSARIMA and MSARIMA for the cases when there are no models and no constant. + +Bugfixes: +* A bugfix in occurrence="d" - the error was not calculated correctly. +* A bugfix in the model selection mechanism for oes() (the forecast horizon was set to 10 instead of h). + + smooth v2.5.0 (Release date: 2019-04-25) ============== diff --git a/R/autoces.R b/R/autoces.R index 53c10fe..a8e6cbd 100644 --- a/R/autoces.R +++ b/R/autoces.R @@ -12,6 +12,9 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' seasonality and chooses the one with the lowest value of information #' criterion.
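# A minimal sketch of the renamed arguments summarised in NEWS above
# (data -> y, cfType -> loss, intervals -> interval), assuming smooth >= 2.5.1;
# the series below is simulated purely for illustration and is not part of
# the package sources:
library(smooth)
set.seed(41)
x <- ts(rnorm(120, 100, 10), frequency=12)
ourModel <- es(x, model="ANN", h=12, holdout=TRUE, loss="MSE", interval="p")
# The old names still work until 2.6.0, but now trigger deprecation warnings:
# es(data=x, cfType="MSE", intervals="p")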
#' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("ces","smooth")} +#' #' #' @template ssBasicParam #' @template ssAdvancedParam @@ -42,21 +45,20 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' library("Mcomp") #' \dontrun{y <- ts(c(M3$N0740$x,M3$N0740$xx),start=start(M3$N0740$x),frequency=frequency(M3$N0740$x)) #' # Selection between "none" and "full" seasonalities -#' auto.ces(y,h=8,holdout=TRUE,models=c("n","f"),intervals="p",level=0.8,ic="AIC")} +#' auto.ces(y,h=8,holdout=TRUE,models=c("n","f"),interval="p",level=0.8,ic="AIC")} #' -#' y <- ts(c(M3$N1683$x,M3$N1683$xx),start=start(M3$N1683$x),frequency=frequency(M3$N1683$x)) -#' ourModel <- auto.ces(y,h=18,holdout=TRUE,intervals="sp") +#' ourModel <- auto.ces(M3[[1683]],interval="sp") #' #' summary(ourModel) #' forecast(ourModel) #' plot(forecast(ourModel)) #' #' @export auto.ces -auto.ces <- function(data, models=c("none","simple","full"), +auto.ces <- function(y, models=c("none","simple","full"), initial=c("optimal","backcasting"), ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"), oesmodel="MNN", bounds=c("admissible","none"), @@ -73,6 +75,11 @@ auto.ces <- function(data, models=c("none","simple","full"), # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); @@ -94,7 +101,7 @@ auto.ces <- function(data, models=c("none","simple","full"), } models <- substr(models,1,1); - dataFreq <- frequency(data); + dataFreq <- frequency(y); # Define maximum needed number of parameters if(any(models=="n")){ @@ -140,11 +147,11 @@ auto.ces <- function(data, models=c("none","simple","full"), message("The data is not seasonal. 
Simple CES was the only solution here."); } - CESModel <- ces(data, seasonality="n", + CESModel <- ces(y, seasonality="n", initial=initialType, ic=ic, - cfType=cfType, + loss=loss, h=h, holdout=holdout,cumulative=cumulative, - intervals=intervalsType, level=level, + interval=intervalType, level=level, occurrence=occurrence, oesmodel=oesmodel, bounds=bounds, silent=silent, xreg=xreg, xregDo=xregDo, initialX=initialX, @@ -177,11 +184,11 @@ auto.ces <- function(data, models=c("none","simple","full"), if(!silentText){ cat(paste0('"',i,'" ')); } - CESModel[[j]] <- ces(data, seasonality=i, + CESModel[[j]] <- ces(y, seasonality=i, initial=initialType, ic=ic, - cfType=cfType, + loss=loss, h=h, holdout=holdout,cumulative=cumulative, - intervals=intervalsType, level=level, + interval=intervalType, level=level, occurrence=occurrence, oesmodel=oesmodel, bounds=bounds, silent=TRUE, xreg=xreg, xregDo=xregDo, initialX=initialX, @@ -210,19 +217,19 @@ auto.ces <- function(data, models=c("none","simple","full"), yUpperNew <- yUpper; yLowerNew <- yLower; if(cumulative){ - yForecastNew <- ts(rep(yForecast/h,h),start=start(yForecast),frequency=dataFreq) - if(intervals){ - yUpperNew <- ts(rep(yUpper/h,h),start=start(yForecast),frequency=dataFreq) - yLowerNew <- ts(rep(yLower/h,h),start=start(yForecast),frequency=dataFreq) + yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) + if(interval){ + yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) + yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend,main=modelname,cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main=modelname,cumulative=cumulative); } } diff --git a/R/autogum.R b/R/autogum.R index 498cf10..c00cb09 100644 --- a/R/autogum.R +++ b/R/autogum.R @@ -1,4 +1,4 @@ -utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType","ar.orders","i.orders","ma.orders")); +utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType")); #' Automatic GUM #' @@ -13,6 +13,9 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' \code{initial="b"}, because optimising GUM of arbitrary order is not a simple #' task. #' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("gum","smooth")} +#' #' @template ssBasicParam #' @template ssAdvancedParam #' @template ssInitialParam @@ -22,14 +25,14 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' @template ssGeneralRef #' @template ssIntermittentRef #' -#' @param orderMax The value of the max order to check. This is the upper bound +#' @param orders The value of the max order to check. This is the upper bound #' of orders, but the real orders could be lower than this because of the #' increasing number of parameters in the models with higher orders. -#' @param lagMax The value of the maximum lag to check. This should usually be +#' @param lags The value of the maximum lag to check. This should usually be #' a maximum frequency of the data. -#' @param type Type of model. 
Can either be \code{"Additive"} or -#' \code{"Multiplicative"}. The latter means that the GUM is fitted on -#' log-transformed data. If \code{"Z"}, then this is selected automatically, +#' @param type Type of model. Can either be \code{"additive"} or +#' \code{"multiplicative"}. The latter means that the GUM is fitted on +#' log-transformed data. If \code{"select"}, then this is selected automatically, #' which may slow down things twice. #' @param ... Other non-documented parameters. For example \code{FI=TRUE} will #' make the function also produce Fisher Information matrix, which then can be @@ -44,7 +47,7 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' x <- rnorm(50,100,3) #' #' # The best GUM model for the data -#' ourModel <- auto.gum(x,orderMax=2,lagMax=4,h=18,holdout=TRUE,intervals="np") +#' ourModel <- auto.gum(x,orders=2,lags=4,h=18,holdout=TRUE,interval="np") #' #' summary(ourModel) #' forecast(ourModel) @@ -52,11 +55,11 @@ utils::globalVariables(c("silentText","silentGraph","silentLegend","initialType" #' #' #' @export auto.gum -auto.gum <- function(data, orderMax=3, lagMax=frequency(data), type=c("A","M","Z"), +auto.gum <- function(y, orders=3, lags=frequency(y), type=c("additive","multiplicative","select"), initial=c("backcasting","optimal"), ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"), oesmodel="MNN", bounds=c("restricted","admissible","none"), @@ -70,38 +73,50 @@ auto.gum <- function(data, orderMax=3, lagMax=frequency(data), type=c("A","M","Z # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + orders <- depricator(orders, list(...), "ordersMax"); + lags <- depricator(lags, list(...), "lagsMax"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); + # If this is Mcomp data, then take the frequency from it + if(any(class(y)=="Mdata") && lags==frequency(y)){ + lags <- frequency(y$x); + } + ##### Set environment for ssInput and make all the checks ##### environment(ssAutoInput) <- environment(); ssAutoInput("auto.gum",ParentEnvironment=environment()); - if(any(is.complex(c(orderMax,lagMax)))){ + if(any(is.complex(c(orders,lags)))){ stop("Complex numbers? Really? Be serious! This is GUM, not CES!",call.=FALSE); } - if(any(c(orderMax)<0)){ + if(any(c(orders)<0)){ stop("Funny guy! How am I gonna construct a model with negative maximum order?",call.=FALSE); } - if(any(c(lagMax)<0)){ + if(any(c(lags)<0)){ stop("Right! Why don't you try complex lags then, mister smart guy?",call.=FALSE); } - if(any(c(lagMax,orderMax)==0)){ + if(any(c(lags,orders)==0)){ stop("Sorry, but we cannot construct GUM model with zero lags / orders.",call.=FALSE); } type <- substr(type[1],1,1); # Check if the multiplictive model is possible - if(any(type==c("Z","M"))){ - if(any(y<=0)){ + if(any(type==c("s","m"))){ + if(any(yInSample<=0)){ warning("Multiplicative model can only be used on positive data. 
Switching to the additive one.",call.=FALSE); - type <- "A"; + type <- "a"; } - if(type=="Z"){ - type <- c("A","M"); + if(type=="s"){ + type <- c("a","m"); } } @@ -110,23 +125,23 @@ auto.gum <- function(data, orderMax=3, lagMax=frequency(data), type=c("A","M","Z ordersFinal <- list(NA); if(!silentText){ - if(lagMax>12){ - message(paste0("You have large lagMax: ",lagMax,". So, the calculation may take some time.")); - if(lagMax<24){ + if(lags>12){ + message(paste0("You have large lags: ",lags,". So, the calculation may take some time.")); + if(lags<24){ message(paste0("Go get some coffee, or tea, or whatever, while we do the work here.\n")); } else{ message(paste0("Go for a lunch or something, while we do the work here.\n")); } } - if(orderMax>3){ - message(paste0("Beware that you have specified large orderMax: ",orderMax, + if(orders>3){ + message(paste0("Beware that you have specified large orders: ",orders, ". This means that the calculations may take a lot of time.\n")); } } for(t in 1:length(type)){ - ics <- rep(NA,lagMax); + ics <- rep(NA,lags); lagsBest <- NULL if((!silentText) & length(type)!=1){ @@ -134,26 +149,26 @@ auto.gum <- function(data, orderMax=3, lagMax=frequency(data), type=c("A","M","Z } #### Preliminary loop #### - #Checking all the models with lag from 1 to lagMax + #Checking all the models with lag from 1 to lags if(!silentText){ progressBar <- c("/","\u2014","\\","|"); cat("Starting preliminary loop: "); - cat(paste0(rep(" ",9+nchar(lagMax)),collapse="")); + cat(paste0(rep(" ",9+nchar(lags)),collapse="")); } - for(i in 1:lagMax){ - gumModel <- gum(data,orders=c(1),lags=c(i),type=type[t], + for(i in 1:lags){ + gumModel <- gum(y,orders=c(1),lags=c(i),type=type[t], silent=TRUE,h=h,holdout=holdout, - initial=initial,cfType=cfType, + initial=initial,loss=loss, cumulative=cumulative, - intervals=intervalsType, level=level, + interval=intervalType, level=level, occurrence=occurrence, oesmodel=oesmodel, bounds=bounds, xreg=xreg, xregDo=xregDo, initialX=initialX, updateX=updateX, persistenceX=persistenceX, transitionX=transitionX, ...); ics[i] <- gumModel$ICs[ic]; if(!silentText){ - cat(paste0(rep("\b",nchar(paste0(i-1," out of ",lagMax))),collapse="")); - cat(paste0(i," out of ",lagMax)); + cat(paste0(rep("\b",nchar(paste0(i-1," out of ",lags))),collapse="")); + cat(paste0(i," out of ",lags)); } } @@ -165,7 +180,7 @@ auto.gum <- function(data, orderMax=3, lagMax=frequency(data), type=c("A","M","Z lagsBest <- c(which(ics==min(ics)),lagsBest); icsBest <- 1E100; while(min(ics)1){ + if(!is.null(ncol(y))){ + if(ncol(y)>1){ stop("The provided data is not a vector! Can't construct any model!", call.=FALSE); } } # Check the data for NAs - if(any(is.na(data))){ + if(any(is.na(y))){ if(!silentText){ warning("Data contains NAs. These observations will be substituted by zeroes.",call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } # Define obs, the number of observations of in-sample - obsInsample <- length(data) - holdout*h; + obsInSample <- length(y) - holdout*h; # Define obsAll, the overal number of observations (in-sample + holdout) - obsAll <- length(data) + (1 - holdout)*h; + obsAll <- length(y) + (1 - holdout)*h; - # If obsInsample is negative, this means that we can't do anything... - if(obsInsample<=0){ + # If obsInSample is negative, this means that we can't do anything... 
+ if(obsInSample<=0){ stop("Not enough observations in sample.",call.=FALSE); } # Define the actual values - datafreq <- frequency(data); - dataStart <- start(data); - y <- ts(data[1:obsInsample], start=dataStart, frequency=datafreq); + datafreq <- frequency(y); + dataStart <- start(y); + yInSample <- ts(y[1:obsInSample], start=dataStart, frequency=datafreq); # Order of the model if(!is.null(order)){ - if(obsInsample < order){ + if(obsInSample < order){ stop("Sorry, but we don't have enough observations for that order.",call.=FALSE); } @@ -141,35 +142,35 @@ cma <- function(data, order=NULL, silent=TRUE){ } if(orderSelect){ - order <- orders(sma(y)); + order <- orders(sma(yInSample)); } - if((order %% 2)!=0){ - smaModel <- sma(y, order=order, h=order, holdout=FALSE, cumulative=FALSE, silent=TRUE); + if((order %% 2)!=0 | (order==obsInSample)){ + smaModel <- sma(yInSample, order=order, h=order, holdout=FALSE, cumulative=FALSE, silent=TRUE); yFitted <- c(smaModel$fitted[-c(1:((order+1)/2))],smaModel$forecast); logLik <- smaModel$logLik; errors <- residuals(smaModel); } else{ - ssarimaModel <- msarima(y, orders=c(order+1,0,0), AR=c(0.5,rep(1,order-1),0.5)/order, + ssarimaModel <- msarima(yInSample, orders=c(order+1,0,0), AR=c(0.5,rep(1,order-1),0.5)/order, h=order, holdout=FALSE, silent=TRUE); yFitted <- c(ssarimaModel$fitted[-c(1:(order/2))],ssarimaModel$forecast); - smaModel <- sma(y, order=1, h=order, holdout=FALSE, cumulative=FALSE, silent=TRUE); + smaModel <- sma(yInSample, order=1, h=order, holdout=FALSE, cumulative=FALSE, silent=TRUE); logLik <- ssarimaModel$logLik; errors <- residuals(ssarimaModel); } yForecast <- ts(NA, start=start(smaModel$forecast), frequency=datafreq); - yFitted <- ts(yFitted[1:obsInsample], start=dataStart, frequency=datafreq); + yFitted <- ts(yFitted[1:obsInSample], start=dataStart, frequency=datafreq); modelname <- paste0("CMA(",order,")"); nParam <- smaModel$nParam; - s2 <- sum(errors^2)/(obsInsample - 2); + s2 <- sum(errors^2)/(obsInSample - 2); cfObjective <- mean(errors^2); model <- structure(list(model=modelname,timeElapsed=Sys.time()-startTime, order=order, nParam=nParam, fitted=yFitted,forecast=yForecast,residuals=errors,s2=s2, - actuals=data, - ICs=NULL,logLik=logLik,cf=cfObjective,cfType="MSE"), + y=y, + ICs=NULL,logLik=logLik,lossValue=cfObjective,loss="MSE"), class="smooth"); ICs <- c(AIC(model),AICc(model),BIC(model),BICc(model)); @@ -177,7 +178,7 @@ cma <- function(data, order=NULL, silent=TRUE){ model$ICs <- ICs; if(!silent){ - graphmaker(data, yForecast, yFitted, legend=FALSE, vline=FALSE, + graphmaker(y, yForecast, yFitted, legend=FALSE, vline=FALSE, main=model$model); } diff --git a/R/depricator.R b/R/depricator.R new file mode 100644 index 0000000..7c563b0 --- /dev/null +++ b/R/depricator.R @@ -0,0 +1,29 @@ +depricator <- function(newValue, ellipsis, oldName){ + if(!is.null(ellipsis$data) & oldName=="data"){ + warning("You have provided 'data' parameter. This is deprecated. Please, use 'y' instead.", call.=FALSE); + return(ellipsis$data); + } + else if(!is.null(ellipsis$cfType) & oldName=="cfType"){ + warning("You have provided 'cfType' parameter. This is deprecated. Please, use 'loss' instead.", call.=FALSE); + return(ellipsis$cfType); + } + else if(!is.null(ellipsis$workFast) & oldName=="workFast"){ + warning("You have provided 'workFast' parameter. This is deprecated. 
Please, use 'fast' instead.", call.=FALSE); + return(ellipsis$workFast); + } + else if(!is.null(ellipsis$lagsMax) & oldName=="lagsMax"){ + warning("You have provided 'lagsMax' parameter. This is deprecated. Please, use 'lags' instead.", call.=FALSE); + return(ellipsis$lagsMax); + } + else if(!is.null(ellipsis$ordersMax) & oldName=="ordersMax"){ + warning("You have provided 'ordersMax' parameter. This is deprecated. Please, use 'orders' instead.", call.=FALSE); + return(ellipsis$ordersMax); + } + else if(!is.null(ellipsis$intervals) & oldName=="intervals"){ + warning("You have provided 'intervals' parameter. This is deprecated. Please, use 'interval' (singular) instead.", call.=FALSE); + return(ellipsis$intervals); + } + else{ + return(newValue); + } +} diff --git a/R/es.R b/R/es.R index c126c71..30643b2 100644 --- a/R/es.R +++ b/R/es.R @@ -1,4 +1,4 @@ -utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dataFreq","initialType", +utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","yInSample","dataFreq","initialType", "yot","maxlag","silent","allowMultiplicative","modelCurrent", "nParamOccurrence","matF","matw","pForecast","errors.mat", "results","s2","FI","occurrence","normalizer", @@ -30,6 +30,15 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' #' For the details see Hyndman et al.(2008). #' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("es","smooth")}. +#' +#' Also, there are posts about the functions of the package smooth on the +#' website of Ivan Svetunkov: +#' \url{https://forecasting.svetunkov.ru/en/tag/smooth/} - they explain the +#' underlying models and how to use the functions. +#' +#' #' @template ssBasicParam #' @template ssAdvancedParam #' @template ssPersistenceParam @@ -41,11 +50,14 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' @template ssETSRef #' @template ssIntervalsRef #' -#' @param model The type of ETS model. Can consist of 3 or 4 chars: \code{ANN}, -#' \code{AAN}, \code{AAdN}, \code{AAA}, \code{AAdA}, \code{MAdM} etc. -#' \code{ZZZ} means that the model will be selected based on the chosen -#' information criteria type. Models pool can be restricted with additive only -#' components. This is done via \code{model="XXX"}. For example, making +#' @param model The type of ETS model. The first letter stands for the type of +#' the error term ("A" or "M"), the second (and sometimes the third as well) is for +#' the trend ("N", "A", "Ad", "M" or "Md"), and the last one is for the type of +#' seasonality ("N", "A" or "M"). So, the function accepts words with 3 or 4 +#' characters: \code{ANN}, \code{AAN}, \code{AAdN}, \code{AAA}, \code{AAdA}, +#' \code{MAdM} etc. \code{ZZZ} means that the model will be selected based on the +#' chosen information criteria type. Models pool can be restricted with additive +#' only components. This is done via \code{model="XXX"}. For example, making #' selection between models with none / additive / damped additive trend #' component only (i.e. excluding multiplicative trend) can be done with #' \code{model="ZXZ"}. Furthermore, selection between multiplicative models @@ -116,9 +128,9 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' \item \code{fitted} - fitted values of ETS. In case of the intermittent model, the #' fitted are multiplied by the probability of occurrence. #' \item \code{forecast} - point forecast of ETS. 
-#' \item \code{lower} - lower bound of prediction interval. When \code{intervals="none"} +#' \item \code{lower} - lower bound of prediction interval. When \code{interval="none"} #' then NA is returned. -#' \item \code{upper} - higher bound of prediction interval. When \code{intervals="none"} +#' \item \code{upper} - higher bound of prediction interval. When \code{interval="none"} #' then NA is returned. #' \item \code{residuals} - residuals of the estimated model. #' \item \code{errors} - trace forecast in-sample errors, returned as a matrix. In the @@ -126,10 +138,10 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' it is returned just for the information. #' \item \code{s2} - variance of the residuals (taking degrees of freedom into account). #' This is an unbiased estimate of variance. -#' \item \code{intervals} - type of intervals asked by user. -#' \item \code{level} - confidence level for intervals. +#' \item \code{interval} - type of interval asked by user. +#' \item \code{level} - confidence level for interval. #' \item \code{cumulative} - whether the produced forecast was cumulative or not. -#' \item \code{actuals} - original data. +#' \item \code{y} - original data. #' \item \code{holdout} - holdout part of the original data. #' \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated. #' If the model is non-intermittent, then occurrence is \code{NULL}. @@ -142,8 +154,8 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' \item \code{transitionX} - transition matrix F for exogenous variables. #' \item \code{ICs} - values of information criteria of the model. Includes AIC, AICc, BIC and BICc. #' \item \code{logLik} - concentrated log-likelihood of the function. -#' \item \code{cf} - cost function value. -#' \item \code{cfType} - type of cost function used in the estimation. +#' \item \code{lossValue} - loss function value. +#' \item \code{loss} - type of loss function used in the estimation. #' \item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or when \code{FI} #' is not provided at all. #' \item \code{accuracy} - vector of accuracy measures for the holdout sample. In @@ -167,15 +179,15 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' \item \code{upper}, #' \item \code{residuals}, #' \item \code{s2} - variance of additive error of combined one-step-ahead forecasts, -#' \item \code{intervals}, +#' \item \code{interval}, #' \item \code{level}, #' \item \code{cumulative}, -#' \item \code{actuals}, +#' \item \code{y}, #' \item \code{holdout}, #' \item \code{occurrence}, #' \item \code{ICs} - combined ic, #' \item \code{ICw} - ic weights used in the combination, -#' \item \code{cfType}, +#' \item \code{loss}, #' \item \code{xreg}, #' \item \code{accuracy}. 
#' } @@ -187,8 +199,8 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' library(Mcomp) #' #' # See how holdout and trace parameters influence the forecast -#' es(M3$N1245$x,model="AAdN",h=8,holdout=FALSE,cfType="MSE") -#' \dontrun{es(M3$N2568$x,model="MAM",h=18,holdout=TRUE,cfType="TMSE")} +#' es(M3$N1245$x,model="AAdN",h=8,holdout=FALSE,loss="MSE") +#' \dontrun{es(M3$N2568$x,model="MAM",h=18,holdout=TRUE,loss="TMSE")} #' #' # Model selection example #' es(M3$N1245$x,model="ZZN",ic="AIC",h=8,holdout=FALSE,bounds="a") @@ -206,16 +218,16 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' # Model selection using a specified pool of models #' ourModel <- es(M3$N1587$x,model=c("ANN","AAM","AMdA"),h=18) #' -#' # Redo previous model and produce prediction intervals -#' es(M3$N1587$x,model=ourModel,h=18,intervals="p") +#' # Redo previous model and produce prediction interval +#' es(M3$N1587$x,model=ourModel,h=18,interval="p") #' -#' # Semiparametric intervals example -#' \dontrun{es(M3$N1587$x,h=18,holdout=TRUE,intervals="sp")} +#' # Semiparametric interval example +#' \dontrun{es(M3$N1587$x,h=18,holdout=TRUE,interval="sp")} #' #' # Exogenous variables in ETS example #' \dontrun{x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58))) #' y <- ts(c(M3$N1457$x,M3$N1457$xx),frequency=12) -#' es(y,h=18,holdout=TRUE,xreg=x,cfType="aTMSE",intervals="np") +#' es(y,h=18,holdout=TRUE,xreg=x,loss="aTMSE",interval="np") #' ourModel <- es(ts(c(M3$N1457$x,M3$N1457$xx),frequency=12),h=18,holdout=TRUE,xreg=x,updateX=TRUE)} #' #' # This will be the same model as in previous line but estimated on new portion of data @@ -237,11 +249,11 @@ utils::globalVariables(c("vecg","nComponents","modellags","phiEstimate","y","dat #' plot(forecast(ourModel)) #' #' @export es -es <- function(data, model="ZZZ", persistence=NULL, phi=NULL, +es <- function(y, model="ZZZ", persistence=NULL, phi=NULL, initial=c("optimal","backcasting"), initialSeason=NULL, ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"), oesmodel="MNN", bounds=c("usual","admissible","none"), @@ -253,20 +265,25 @@ es <- function(data, model="ZZZ", persistence=NULL, phi=NULL, # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + #This overrides the similar thing in ssfunctions.R but only for data generated from sim.es() - if(is.smooth.sim(data)){ - if(smoothType(data)=="ETS"){ - model <- data; - data <- data$data; + if(is.smooth.sim(y)){ + if(smoothType(y)=="ETS"){ + model <- y; + y <- y$data; } } - else if(is.smooth(data)){ - model <- data; - data <- data$actuals; + else if(is.smooth(y)){ + model <- y; + y <- y$y; } # If a previous model provided as a model, write down the variables - if(any(is.smooth(model)) | any(is.smooth.sim(model))){ + if(is.smooth(model) | is.smooth.sim(model)){ if(smoothType(model)!="ETS"){ stop("The provided model is not ETS.",call.=FALSE); } @@ -384,9 +401,9 @@ CF <- function(C){ persistenceEstimate, phiEstimate, initialType=="o", initialSeasonEstimate, xregEstimate, matFX, vecgX, updateX, FXEstimate, gXEstimate, initialXEstimate); - cfRes <- costfunc(elements$matvt, elements$matF, elements$matw, y, elements$vecg, + cfRes <- costfunc(elements$matvt, elements$matF, elements$matw, yInSample, elements$vecg, h, modellags, Etype, Ttype, Stype, - multisteps, cfType, normalizer, initialType, + multisteps, loss, normalizer, initialType, matxt, elements$matat, elements$matFX, elements$vecgX, ot, bounds, elements$errorSD); @@ -425,7 +442,7 @@ CValues <- function(bounds,Ttype,Stype,vecg,matvt,phi,maxlag,nComponents,matat){ else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - C <- c(C,mean(y[1:min(dataFreq,obsInsample)]),0); + C <- c(C,mean(yInSample[1:min(dataFreq,obsInSample)]),0); } else{ C <- c(C,abs(matvt[maxlag,1:(nComponents - (Stype!="N"))])); @@ -478,7 +495,7 @@ CValues <- function(bounds,Ttype,Stype,vecg,matvt,phi,maxlag,nComponents,matat){ else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - C <- c(C,mean(y[1:dataFreq]),0); + C <- c(C,mean(yInSample[1:dataFreq]),0); } else{ C <- c(C,abs(matvt[maxlag,1:(nComponents - (Stype!="N"))])); @@ -531,7 +548,7 @@ CValues <- function(bounds,Ttype,Stype,vecg,matvt,phi,maxlag,nComponents,matat){ else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - C <- c(C,mean(y[1:dataFreq]),0); + C <- c(C,mean(yInSample[1:dataFreq]),0); } else{ C <- c(C,abs(matvt[maxlag,1:(nComponents - (Stype!="N"))])); @@ -602,7 +619,7 @@ BasicMakerES <- function(...){ ellipsis <- list(...); ParentEnvironment <- ellipsis[['ParentEnvironment']]; - basicparams <- initparams(Etype, Ttype, Stype, dataFreq, obsInsample, obsAll, y, + basicparams <- initparams(Etype, Ttype, Stype, dataFreq, obsInSample, obsAll, yInSample, damped, phi, smoothingParameters, initialstates, seasonalCoefs); list2env(basicparams,ParentEnvironment); } @@ -650,7 +667,7 @@ EstimatorES <- function(...){ } if(rounded){ - cfType <- "MSE"; + loss <- "MSE"; } if(any(is.infinite(C))){ @@ -698,7 +715,7 @@ EstimatorES <- function(...){ j <- j+1; } j <- j+1; - C[j] <- mean(y[1:dataFreq]); + C[j] <- mean(yInSample[1:dataFreq]); j <- j+1; C[j] <- 0; } @@ -721,7 +738,7 @@ EstimatorES <- function(...){ C <- c(C,sqrt(CF(C))); CLower <- c(CLower,0); CUpper <- c(CUpper,Inf); - cfType <- "Rounded"; + loss <- "Rounded"; } res2 <- nloptr(C, CF, lb=CLower, ub=CUpper, @@ -767,7 +784,7 @@ XregSelector <- function(listToReturn){ ssFitter(ParentEnvironment=environment()); xregNames <- 
colnames(matxtOriginal); - xregNew <- cbind(errors,xreg[1:obsInsample,]); + xregNew <- cbind(errors,xreg[1:obsInSample,]); colnames(xregNew)[1] <- "errors"; colnames(xregNew)[-1] <- xregNames; xregNew <- as.data.frame(xregNew); @@ -1239,7 +1256,7 @@ CreatorES <- function(silent=FALSE,...){ initialstates[1,1] <- (mean(yot[1:min(max(dataFreq,12),obsNonzero)]) - initialstates[1,2] * mean(c(1:min(max(dataFreq,12), obsNonzero)))); - if(any(cfType=="LogisticD")){ + if(any(loss=="LogisticD")){ if(all(yot[1:min(max(dataFreq,12),obsNonzero)]==0)){ initialstates[1,1] <- -50; } @@ -1251,7 +1268,7 @@ CreatorES <- function(silent=FALSE,...){ } } if(allowMultiplicative){ - if(any(cfType=="LogisticL")){ + if(any(loss=="LogisticL")){ initialstates[1,3] <- initialstates[1,1]; initialstates[1,4] <- exp(initialstates[1,2]); initialstates[1,3] <- exp((initialstates[1,3] - 0.5)); @@ -1273,10 +1290,10 @@ CreatorES <- function(silent=FALSE,...){ else{ if(initialType!="p"){ initialstates <- matrix(rep(mean(yot[1:min(max(dataFreq,12),obsNonzero)]),4),nrow=1); - if(any(cfType=="LogisticL") & any(initialstates==0)){ + if(any(loss=="LogisticL") & any(initialstates==0)){ initialstates[initialstates==0] <- 0.001; } - if(any(cfType=="LogisticD")){ + if(any(loss=="LogisticD")){ if(all(yot[1:min(max(dataFreq,12),obsNonzero)]==0)){ initialstates[,] <- -50; } @@ -1295,8 +1312,8 @@ CreatorES <- function(silent=FALSE,...){ if(Stype!="N"){ if(is.null(initialSeason)){ initialSeasonEstimate <- TRUE; - seasonalCoefs <- decompose(ts(c(y),frequency=dataFreq),type="additive")$seasonal[1:dataFreq]; - decompositionM <- decompose(ts(c(y),frequency=dataFreq), type="multiplicative"); + seasonalCoefs <- decompose(ts(c(yInSample),frequency=dataFreq),type="additive")$seasonal[1:dataFreq]; + decompositionM <- decompose(ts(c(yInSample),frequency=dataFreq), type="multiplicative"); seasonalCoefs <- cbind(seasonalCoefs,decompositionM$seasonal[1:dataFreq]); seasonalRandomness <- c(min(decompositionM$random,na.rm=TRUE), max(decompositionM$random,na.rm=TRUE)); @@ -1324,23 +1341,23 @@ CreatorES <- function(silent=FALSE,...){ smoothingParameters <- cbind(c(0.1,0.05,0.1),c(0.05,0.01,0.01)); } - if(cfType=="HAM"){ + if(loss=="HAM"){ smoothingParameters <- cbind(rep(0.01,3),rep(0.01,3)); } } ##### Preset yFitted, yForecast, errors and basic parameters ##### - yFitted <- rep(NA,obsInsample); + yFitted <- rep(NA,obsInSample); yForecast <- rep(NA,h); - errors <- rep(NA,obsInsample); + errors <- rep(NA,obsInSample); - basicparams <- initparams(Etype, Ttype, Stype, dataFreq, obsInsample, obsAll, y, + basicparams <- initparams(Etype, Ttype, Stype, dataFreq, obsInSample, obsAll, yInSample, damped, phi, smoothingParameters, initialstates, seasonalCoefs); ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=data, Etype=Etype, xreg=xreg, updateX=updateX, ot=ot, + xregdata <- ssXreg(y=y, Etype=Etype, xreg=xreg, updateX=updateX, ot=ot, persistenceX=persistenceX, transitionX=transitionX, initialX=initialX, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=basicparams$maxlag, h=h, xregDo=xregDo, silent=silentText, allowMultiplicative=allowMultiplicative); @@ -1547,7 +1564,7 @@ CreatorES <- function(silent=FALSE,...){ persistence <- 0; persistenceEstimate <- FALSE; smoothingParameters <- matrix(0,3,2); - initialValue <- mean(y); + initialValue <- mean(yInSample); initialType <- "p"; initialstates <- matrix(rep(initialValue,2),nrow=1); warning("We did not have enough of 
non-zero observations, so persistence value was set to zero and initial was preset.", @@ -1567,7 +1584,7 @@ CreatorES <- function(silent=FALSE,...){ persistence <- 0; persistenceEstimate <- FALSE; smoothingParameters <- matrix(0,3,2); - initialValue <- y[y!=0]; + initialValue <- yInSample[yInSample!=0]; initialType <- "p"; initialstates <- matrix(rep(initialValue,2),nrow=1); warning("We did not have enough of non-zero observations, so we used Naive.",call.=FALSE); @@ -1594,14 +1611,14 @@ CreatorES <- function(silent=FALSE,...){ modelCurrent <- model; } else{ - if(!any(cfType==c("MSE","MAE","HAM","MSEh","MAEh","HAMh","MSCE","MACE","CHAM", + if(!any(loss==c("MSE","MAE","HAM","MSEh","MAEh","HAMh","MSCE","MACE","CHAM", "TFL","aTFL","Rounded","TSB","LogisticD","LogisticL"))){ if(modelDo=="combine"){ - warning(paste0("'",cfType,"' is used as cost function instead of 'MSE'.", + warning(paste0("'",loss,"' is used as loss function instead of 'MSE'.", "The produced combination weights may be wrong."),call.=FALSE); } else{ - warning(paste0("'",cfType,"' is used as cost function instead of 'MSE'. ", + warning(paste0("'",loss,"' is used as loss function instead of 'MSE'. ", "The results of the model selection may be wrong."),call.=FALSE); } } @@ -1839,7 +1856,7 @@ CreatorES <- function(silent=FALSE,...){ # If this was rounded values, extract the variance if(rounded){ s2 <- C[length(C)]^2; - s2g <- log(1 + vecg %*% as.vector(errors*ot)) %*% t(log(1 + vecg %*% as.vector(errors*ot)))/obsInsample; + s2g <- log(1 + vecg %*% as.vector(errors*ot)) %*% t(log(1 + vecg %*% as.vector(errors*ot)))/obsInSample; } ssForecaster(ParentEnvironment=environment()); @@ -1990,10 +2007,10 @@ CreatorES <- function(silent=FALSE,...){ # Produce the forecasts using AIC weights modelsNumber <- nrow(icWeights); model.current <- rep(NA,modelsNumber); - fittedList <- matrix(NA,obsInsample,modelsNumber); - # errorsList <- matrix(NA,obsInsample,modelsNumber); + fittedList <- matrix(NA,obsInSample,modelsNumber); + # errorsList <- matrix(NA,obsInSample,modelsNumber); forecastsList <- matrix(NA,h,modelsNumber); - if(intervals){ + if(interval){ lowerList <- matrix(NA,h,modelsNumber); upperList <- matrix(NA,h,modelsNumber); } @@ -2038,7 +2055,7 @@ CreatorES <- function(silent=FALSE,...){ fittedList[,i] <- yFitted; forecastsList[,i] <- yForecast; - if(intervals){ + if(interval){ lowerList[,i] <- yLower; upperList[,i] <- yUpper; } @@ -2049,10 +2066,10 @@ CreatorES <- function(silent=FALSE,...){ forecastsList <- forecastsList[,!badStuff]; model.current <- model.current[!badStuff]; yFitted <- ts(fittedList %*% icWeights[!badStuff,ic],start=dataStart,frequency=dataFreq); - yForecast <- ts(forecastsList %*% icWeights[!badStuff,ic],start=time(data)[obsInsample]+deltat(data),frequency=dataFreq); - errors <- ts(c(y) - yFitted,start=dataStart,frequency=dataFreq); + yForecast <- ts(forecastsList %*% icWeights[!badStuff,ic],start=yForecastStart,frequency=dataFreq); + errors <- ts(c(yInSample) - yFitted,start=dataStart,frequency=dataFreq); s2 <- mean(errors^2); - if(intervals){ + if(interval){ lowerList <- lowerList[,!badStuff]; upperList <- upperList[,!badStuff]; yLower <- ts(lowerList %*% icWeights[!badStuff,ic],start=yForecastStart,frequency=dataFreq); @@ -2093,12 +2110,12 @@ CreatorES <- function(silent=FALSE,...){ ##### Now let's deal with holdout ##### if(holdout){ - yHoldout <- ts(data[(obsInsample+1):obsAll],start=yForecastStart,frequency=dataFreq); + yHoldout <- ts(y[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); 
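# A self-contained sketch of the holdout accuracy computation above, with
# made-up numbers rather than package data: measures() comes from greybox
# (a declared dependency of smooth) and takes the holdout, the forecast and
# the in-sample actuals (the latter is needed for scaled measures such as MASE):
library(greybox)
yInSampleDemo <- rnorm(90, 100, 5)
yHoldoutDemo <- rnorm(10, 100, 5)
yForecastDemo <- rep(mean(yInSampleDemo), 10)
measures(yHoldoutDemo, yForecastDemo, yInSampleDemo)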
if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -2139,18 +2156,18 @@ CreatorES <- function(silent=FALSE,...){ yLowerNew <- yLower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - if(intervals){ + if(interval){ yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend, main=modelnameForGraph, cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend, main=modelnameForGraph, cumulative=cumulative); } } @@ -2163,21 +2180,21 @@ CreatorES <- function(silent=FALSE,...){ initialType=initialType,initial=initialValue,initialSeason=initialSeason, nParam=parametersNumber, fitted=yFitted,forecast=yForecast,lower=yLower,upper=yUpper,residuals=errors, - errors=errors.mat,s2=s2,intervals=intervalsType,level=level,cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=occurrenceModel, + errors=errors.mat,s2=s2,interval=intervalType,level=level,cumulative=cumulative, + y=y,holdout=yHoldout,occurrence=occurrenceModel, xreg=xreg,updateX=updateX,initialX=initialX,persistenceX=persistenceX,transitionX=transitionX, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,FI=FI,accuracy=errormeasures); + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,FI=FI,accuracy=errormeasures); return(structure(model,class="smooth")); } else{ model <- list(model=modelname,formula=esFormula,timeElapsed=Sys.time()-startTime, initialType=initialType, fitted=yFitted,forecast=yForecast, - lower=yLower,upper=yUpper,residuals=errors,s2=s2,intervals=intervalsType,level=level, + lower=yLower,upper=yUpper,residuals=errors,s2=s2,interval=intervalType,level=level, cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=occurrenceModel, + y=y,holdout=yHoldout,occurrence=occurrenceModel, xreg=xreg,updateX=updateX, - ICs=ICs,ICw=icWeights,cf=NULL,cfType=cfType,accuracy=errormeasures); + ICs=ICs,ICw=icWeights,lossValue=NULL,loss=loss,accuracy=errormeasures); return(structure(model,class="smooth")); } } diff --git a/R/gsi.R b/R/gsi.R index f1b34c5..e390fd2 100644 --- a/R/gsi.R +++ b/R/gsi.R @@ -5,16 +5,18 @@ #' Function estimates VES in a form of the Single Source of Error state space #' model, restricting the seasonal indices. The model is based on \link[smooth]{ves} #' -#' #' In case of multiplicative model, instead of the vector y_t we use its logarithms. #' As a result the multiplicative model is much easier to work with. #' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("ves","smooth")} +#' #' @template ssAuthor #' @template vssKeywords #' #' @template vssGeneralRef #' -#' @param data The matrix with data, where series are in columns and +#' @param y The matrix with data, where series are in columns and #' observations are in rows. #' @param model The type of seasonal ETS model. Currently only "MMM" is available. 
#' @param weights The vector of weights for seasonal indices of the length equal to @@ -24,25 +26,25 @@ #' @param holdout If \code{TRUE}, holdout sample of size \code{h} is taken from #' the end of the data. #' @param ic The information criterion used in the model selection procedure. -#' @param intervals Type of intervals to construct. NOT AVAILABLE YET! +#' @param interval Type of interval to construct. NOT AVAILABLE YET! #' #' This can be: #' #' \itemize{ #' \item \code{none}, aka \code{n} - do not produce prediction -#' intervals. +#' interval. #' \item \code{conditional}, \code{c} - produces multidimensional elliptic -#' intervals for each step ahead forecast. +#' interval for each step ahead forecast. #' \item \code{unconditional}, \code{u} - produces separate bounds for each series #' based on ellipses for each step ahead. These bounds correspond to min and max #' values of the ellipse assuming that all the other series but one take values in #' the centre of the ellipse. This leads to less accurate estimates of bounds -#' (wider intervals than needed), but these could still be useful. -#' \item \code{independent}, \code{i} - produces intervals based on variances of +#' (wider interval than needed), but these could still be useful. +#' \item \code{independent}, \code{i} - produces interval based on variances of #' each separate series. This does not take vector structure into account. #' } #' The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -#' conditional intervals are constructed, while the latter is equivalent to +#' a conditional interval is constructed, while the latter is equivalent to #' \code{none}. #' @param level Confidence level. Defines width of prediction interval. #' @param silent If \code{silent="none"}, then nothing is silent, everything is @@ -55,7 +57,7 @@ #' \code{silent="all"}, while \code{silent=FALSE} is equivalent to #' \code{silent="none"}. The parameter also accepts first letter of words ("n", #' "a", "g", "l", "o"). -#' @param cfType Type of Cost Function used in optimization.
\code{loss} can #' be: #' \itemize{ #' \item \code{likelihood} - which assumes the minimisation of the determinant @@ -85,7 +87,7 @@ #' \item \code{initial} - The initial values of the non-seasonal components; #' \item \code{initialSeason} - The initial values of the seasonal components; #' \item \code{nParam} - The number of estimated parameters; -#' \item \code{actuals} - The matrix with the original data; +#' \item \code{y} - The matrix with the original data; #' \item \code{fitted} - The matrix of the fitted values; #' \item \code{holdout} - The matrix with the holdout values (if \code{holdout=TRUE} in #' the estimation); @@ -93,13 +95,13 @@ #' \item \code{Sigma} - The covariance matrix of the errors (estimated with the correction #' for the number of degrees of freedom); #' \item \code{forecast} - The matrix of point forecasts; -#' \item \code{PI} - The bounds of the prediction intervals; -#' \item \code{intervals} - The type of the constructed prediction intervals; -#' \item \code{level} - The level of the confidence for the prediction intervals; +#' \item \code{PI} - The bounds of the prediction interval; +#' \item \code{interval} - The type of the constructed prediction interval; +#' \item \code{level} - The level of the confidence for the prediction interval; #' \item \code{ICs} - The values of the information criteria; #' \item \code{logLik} - The log-likelihood function; -#' \item \code{cf} - The value of the cost function; -#' \item \code{cfType} - The type of the used cost function; +#' \item \code{lossValue} - The value of the loss function; +#' \item \code{loss} - The type of the used loss function; #' \item \code{accuracy} - the values of the error measures. Currently not available. #' \item \code{FI} - Fisher information if user asked for it using \code{FI=TRUE}. #' } @@ -126,11 +128,11 @@ #' \dontrun{gsi(Y, h=10, holdout=TRUE, interval="u", silent=FALSE)} #' #' @export -gsi <- function(data, model="MNM", weights=1/ncol(data), +gsi <- function(y, model="MNM", weights=1/ncol(y), type=c(3,2,1), - cfType=c("likelihood","diagonal","trace"), + loss=c("likelihood","diagonal","trace"), ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, - intervals=c("none","conditional","unconditional","independent"), level=0.95, + interval=c("none","conditional","unconditional","independent"), level=0.95, bounds=c("admissible","usual","none"), silent=c("all","graph","output","none"), ...){ # Copyright (C) 2018 - Inf Ivan Svetunkov @@ -138,6 +140,11 @@ gsi <- function(data, model="MNM", weights=1/ncol(data), # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # If a previous model provided as a model, write down the variables # if(any(class(model)=="vsmooth")){ # if(smoothType(model)!="GSI"){ @@ -182,8 +189,8 @@ gsi <- function(data, model="MNM", weights=1/ncol(data), CF <- function(A){ elements <- BasicInitialiserGSI(matvt,matF,matG,matW,A); - cfRes <- vOptimiserWrap(y, elements$matvt, elements$matF, elements$matW, elements$matG, - modelLags, "A", "A", "A", cfType, normalizer, bounds, ot, otObs); + cfRes <- vOptimiserWrap(yInSample, elements$matvt, elements$matF, elements$matW, elements$matG, + modelLags, "A", "A", "A", loss, normalizer, bounds, ot, otObs); # multisteps, initialType, bounds, if(is.nan(cfRes) | is.na(cfRes) | is.infinite(cfRes)){ @@ -273,13 +280,13 @@ BasicMakerGSI <- function(...){ "_",statesNames),"seasonal"),NULL)); ## Deal with non-seasonal part of the vector of states XValues <- rbind(rep(1,obsInSample),c(1:obsInSample)); - initialValue <- y %*% t(XValues) %*% solve(XValues %*% t(XValues)); + initialValue <- yInSample %*% t(XValues) %*% solve(XValues %*% t(XValues)); initialValue <- matrix(as.vector(t(initialValue)),nComponentsNonSeasonal * nSeries,1); ## Deal with seasonal part of the vector of states # Matrix of dummies for seasons XValues <- matrix(rep(diag(maxlag),ceiling(obsInSample/maxlag)),maxlag)[,1:obsInSample]; - initialSeasonValue <- (y-rowMeans(y)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); + initialSeasonValue <- (yInSample-rowMeans(yInSample)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); initialSeasonValue <- matrix(colMeans(initialSeasonValue),1,maxlag); ### modelLags @@ -367,7 +374,7 @@ EstimatorGSI <- function(...){ names(A) <- AList$ANames; # First part is for the covariance matrix - if(cfType=="l"){ + if(loss=="l"){ nParam <- nSeries * (nSeries + 1) / 2 + length(A); } else{ @@ -444,61 +451,62 @@ CreatorGSI <- function(silent=FALSE,...){ } #### Check data #### - if(any(is.vsmooth.sim(data))){ - data <- data$data; - if(length(dim(data))==3){ + if(any(is.vsmooth.sim(y))){ + y <- y$data; + if(length(dim(y))==3){ warning("Simulated data contains several samples. Selecting a random one.",call.=FALSE); - data <- ts(data[,,runif(1,1,dim(data)[3])]); + y <- ts(y[,,runif(1,1,dim(y)[3])]); } } - if(!is.data.frame(data)){ - if(!is.numeric(data)){ + if(!is.data.frame(y)){ + if(!is.numeric(y)){ stop("The provided data is not a numeric matrix! Can't construct any model!", call.=FALSE); } } - if(is.null(dim(data))){ + if(is.null(dim(y))){ stop("The provided data is not a matrix or a data.frame! If it is a vector, please use es() function instead.", call.=FALSE); } - if(is.data.frame(data)){ - data <- as.matrix(data); + if(is.data.frame(y)){ + y <- as.matrix(y); } # Number of series in the matrix - nSeries <- ncol(data); + nSeries <- ncol(y); - if(is.null(ncol(data))){ + if(is.null(ncol(y))){ stop("The provided data is not a matrix! Use es() function instead!", call.=FALSE); } - if(ncol(data)==1){ + if(ncol(y)==1){ stop("The provided data contains only one column. Use es() function instead!", call.=FALSE); } # Check the data for NAs - if(any(is.na(data))){ + if(any(is.na(y))){ if(!silentText){ warning("Data contains NAs. 
These observations will be substituted by zeroes.", call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } # Define obs, the number of observations of in-sample - obsInSample <- nrow(data) - holdout*h; + obsInSample <- nrow(y) - holdout*h; # Define obsAll, the overal number of observations (in-sample + holdout) - obsAll <- nrow(data) + (1 - holdout)*h; + obsAll <- nrow(y) + (1 - holdout)*h; # If obsInSample is negative, this means that we can't do anything... if(obsInSample<=0){ stop("Not enough observations in sample.", call.=FALSE); } # Define the actual values. Transpose the matrix! - y <- matrix(data[1:obsInSample,],nSeries,obsInSample,byrow=TRUE); - dataFreq <- frequency(data); - dataDeltat <- deltat(data); - dataStart <- start(data); - dataNames <- colnames(data); + yInSample <- matrix(y[1:obsInSample,],nSeries,obsInSample,byrow=TRUE); + dataFreq <- frequency(y); + dataDeltat <- deltat(y); + dataStart <- start(y); + yForecastStart <- time(y)[obsInSample]+deltat(y); + dataNames <- colnames(y); if(!is.null(dataNames)){ dataNames <- gsub(" ", "_", dataNames, fixed = TRUE); dataNames <- gsub(":", "_", dataNames, fixed = TRUE); @@ -508,8 +516,8 @@ CreatorGSI <- function(silent=FALSE,...){ dataNames <- paste0("Series",c(1:nSeries)); } - if(all(y>0)){ - y <- log(y); + if(all(yInSample>0)){ + yInSample <- log(yInSample); } else{ stop("Cannot apply multiplicative model to the non-positive data", call.=FALSE); @@ -530,14 +538,14 @@ CreatorGSI <- function(silent=FALSE,...){ } ##### Cost function type ##### - cfType <- cfType[1]; - if(!any(cfType==c("likelihood","diagonal","trace","l","d","t"))){ - warning(paste0("Strange cost function specified: ",cfType,". Switching to 'likelihood'."),call.=FALSE); - cfType <- "likelihood"; + loss <- loss[1]; + if(!any(loss==c("likelihood","diagonal","trace","l","d","t"))){ + warning(paste0("Strange loss function specified: ",loss,". Switching to 'likelihood'."),call.=FALSE); + loss <- "likelihood"; } - cfType <- substr(cfType,1,1); + loss <- substr(loss,1,1); - normalizer <- sum(colMeans(abs(diff(t(y))),na.rm=TRUE)); + normalizer <- sum(colMeans(abs(diff(t(yInSample))),na.rm=TRUE)); ##### Define the main variables ##### # For now we only have level and trend. The seasonal component is common to all the series @@ -551,7 +559,7 @@ CreatorGSI <- function(silent=FALSE,...){ FI <- FALSE; ##### Non-intermittent model, please! - ot <- matrix(1,nrow=nrow(y),ncol=ncol(y)); + ot <- matrix(1,nrow=nrow(yInSample),ncol=ncol(yInSample)); otObs <- matrix(obsInSample,nSeries,nSeries); intermittent <- "n"; imodel <- NULL; @@ -565,42 +573,42 @@ CreatorGSI <- function(silent=FALSE,...){ ic <- "AICc"; } -##### intervals, intervalsType, level ##### - intervalsType <- intervals[1]; +##### interval, intervalType, level ##### + intervalType <- interval[1]; # Check the provided type of interval - if(is.logical(intervalsType)){ - if(intervalsType){ - intervalsType <- "c"; + if(is.logical(intervalType)){ + if(intervalType){ + intervalType <- "c"; } else{ - intervalsType <- "none"; + intervalType <- "none"; } } - if(all(intervalsType!=c("c","u","i","n","none","conditional","unconditional","independent"))){ - warning(paste0("Wrong type of interval: '",intervalsType, "'. Switching to 'conditional'."),call.=FALSE); - intervalsType <- "c"; + if(all(intervalType!=c("c","u","i","n","none","conditional","unconditional","independent"))){ + warning(paste0("Wrong type of interval: '",intervalType, "'. 
Switching to 'conditional'."),call.=FALSE); + intervalType <- "c"; } - if(intervalsType=="none"){ - intervalsType <- "n"; - intervals <- FALSE; + if(intervalType=="none"){ + intervalType <- "n"; + interval <- FALSE; } - else if(intervalsType=="conditional"){ - intervalsType <- "c"; - intervals <- TRUE; + else if(intervalType=="conditional"){ + intervalType <- "c"; + interval <- TRUE; } - else if(intervalsType=="unconditional"){ - intervalsType <- "u"; - intervals <- TRUE; + else if(intervalType=="unconditional"){ + intervalType <- "u"; + interval <- TRUE; } - else if(intervalsType=="independent"){ - intervalsType <- "i"; - intervals <- TRUE; + else if(intervalType=="independent"){ + intervalType <- "i"; + interval <- TRUE; } else{ - intervals <- TRUE; + interval <- TRUE; } if(level>1){ @@ -616,7 +624,7 @@ CreatorGSI <- function(silent=FALSE,...){ -##### Preset y.fit, y.for, errors and basic parameters ##### +##### Preset yFitted, yForecast, errors and basic parameters ##### yFitted <- matrix(NA,nSeries,obsInSample); yForecast <- matrix(NA,nSeries,h); errors <- matrix(NA,nSeries,obsInSample); @@ -667,30 +675,30 @@ CreatorGSI <- function(silent=FALSE,...){ parametersNumber[1,1] <- parametersNumber[1,1] + length(unique(as.vector(initialSeasonValue))); - matvt <- ts(t(matvt),start=(time(data)[1] - dataDeltat*maxlag),frequency=dataFreq); + matvt <- ts(t(matvt),start=(time(y)[1] - dataDeltat*maxlag),frequency=dataFreq); yFitted <- ts(t(yFitted),start=dataStart,frequency=dataFreq); errors <- ts(t(errors),start=dataStart,frequency=dataFreq); - yForecast <- ts(t(yForecast),start=time(data)[obsInSample] + dataDeltat,frequency=dataFreq); + yForecast <- ts(t(yForecast),start=yForecastStart,frequency=dataFreq); if(!is.matrix(yForecast)){ yForecast <- as.matrix(yForecast,h,nSeries); } colnames(yForecast) <- dataNames; - forecastStart <- start(yForecast) - if(any(intervalsType==c("i","u"))){ - PI <- ts(PI,start=forecastStart,frequency=dataFreq); + yForecastStart <- start(yForecast) + if(any(intervalType==c("i","u"))){ + PI <- ts(PI,start=yForecastStart,frequency=dataFreq); } - if(cfType=="l"){ - cfType <- "likelihood"; + if(loss=="l"){ + loss <- "likelihood"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries * (nSeries + 1) / 2; } - else if(cfType=="d"){ - cfType <- "diagonal"; + else if(loss=="d"){ + loss <- "diagonal"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries; } else{ - cfType <- "trace"; + loss <- "trace"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries; } @@ -699,16 +707,16 @@ CreatorGSI <- function(silent=FALSE,...){ ##### Now let's deal with the holdout ##### if(holdout){ - yHoldout <- ts(data[(obsInSample+1):obsAll,],start=forecastStart,frequency=dataFreq); + yHoldout <- ts(y[(obsInSample+1):obsAll,],start=yForecastStart,frequency=dataFreq); colnames(yHoldout) <- dataNames; - measureFirst <- measures(yHoldout[,1],yForecast[,1],y[1,]); + measureFirst <- measures(yHoldout[,1],yForecast[,1],yInSample[1,]); errorMeasures <- matrix(NA,nSeries,length(measureFirst)); rownames(errorMeasures) <- dataNames; colnames(errorMeasures) <- names(measureFirst); errorMeasures[1,] <- measureFirst; for(i in 2:nSeries){ - errorMeasures[i,] <- measures(yHoldout[,i],yForecast[,i],y[i,]); + errorMeasures[i,] <- measures(yHoldout[,i],yForecast[,i],yInSample[i,]); } } else{ @@ -744,37 +752,37 @@ CreatorGSI <- function(silent=FALSE,...){ for(j in 1:pages){ par(mar=c(4,4,2,1),mfcol=c(perPage,1)); for(i in packs[j]:(packs[j+1]-1)){ - if(any(intervalsType==c("u","i"))){ - plotRange <- 
range(min(data[,i],yForecast[,i],yFitted[,i],PI[,i*2-1]), - max(data[,i],yForecast[,i],yFitted[,i],PI[,i*2])); + if(any(intervalType==c("u","i"))){ + plotRange <- range(min(y[,i],yForecast[,i],yFitted[,i],PI[,i*2-1]), + max(y[,i],yForecast[,i],yFitted[,i],PI[,i*2])); } else{ - plotRange <- range(min(data[,i],yForecast[,i],yFitted[,i]), - max(data[,i],yForecast[,i],yFitted[,i])); + plotRange <- range(min(y[,i],yForecast[,i],yFitted[,i]), + max(y[,i],yForecast[,i],yFitted[,i])); } - plot(data[,i],main=paste0(modelname," ",dataNames[i]),ylab="Y", - ylim=plotRange, xlim=range(time(data[,i])[1],time(yForecast)[max(h,1)]), + plot(y[,i],main=paste0(modelname," ",dataNames[i]),ylab="Y", + ylim=plotRange, xlim=range(time(y[,i])[1],time(yForecast)[max(h,1)]), type="l"); lines(yFitted[,i],col="purple",lwd=2,lty=2); if(h>1){ - if(any(intervalsType==c("u","i"))){ + if(any(intervalType==c("u","i"))){ lines(PI[,i*2-1],col="darkgrey",lwd=3,lty=2); lines(PI[,i*2],col="darkgrey",lwd=3,lty=2); - polygon(c(seq(dataDeltat*(forecastStart[2]-1)+forecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat), - rev(seq(dataDeltat*(forecastStart[2]-1)+forecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat))), + polygon(c(seq(dataDeltat*(yForecastStart[2]-1)+yForecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat), + rev(seq(dataDeltat*(yForecastStart[2]-1)+yForecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat))), c(as.vector(PI[,i*2]), rev(as.vector(PI[,i*2-1]))), col = "lightgray", border=NA, density=10); } lines(yForecast[,i],col="blue",lwd=2); } else{ - if(any(intervalsType==c("u","i"))){ + if(any(intervalType==c("u","i"))){ points(PI[,i*2-1],col="darkgrey",lwd=3,pch=4); points(PI[,i*2],col="darkgrey",lwd=3,pch=4); } points(yForecast[,i],col="blue",lwd=2,pch=4); } - abline(v=dataDeltat*(forecastStart[2]-2)+forecastStart[1],col="red",lwd=2); + abline(v=dataDeltat*(yForecastStart[2]-2)+yForecastStart[1],col="red",lwd=2); } } par(parDefault); @@ -786,9 +794,9 @@ CreatorGSI <- function(silent=FALSE,...){ persistence=persistenceValue, initial=initialValue, initialSeason=initialSeasonValue, nParam=parametersNumber, - actuals=data,fitted=yFitted,holdout=yHoldout,residuals=errors,Sigma=Sigma, - forecast=yForecast,PI=PI,intervals=intervalsType,level=level, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,accuracy=errorMeasures, + y=y,fitted=yFitted,holdout=yHoldout,residuals=errors,Sigma=Sigma, + forecast=yForecast,PI=PI,interval=intervalType,level=level, + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,accuracy=errorMeasures, FI=FI); return(structure(model,class=c("vsmooth","smooth"))); } diff --git a/R/gum.R b/R/gum.R index 7618dbf..d642b3a 100644 --- a/R/gum.R +++ b/R/gum.R @@ -1,5 +1,5 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C", - "persistenceEstimate","obsAll","obsInsample","multisteps","ot","obsNonzero","ICs","cfObjective", + "persistenceEstimate","obsAll","obsInSample","multisteps","ot","obsNonzero","ICs","cfObjective", "yForecast","yLower","yUpper","normalizer","yForecastStart")); #' Generalised Univariate Model @@ -25,6 +25,8 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C", #' \eqn{F_{X}} is the \code{transitionX} matrix and \eqn{g_{X}} is the #' \code{persistenceX} matrix. Finally, \eqn{\epsilon_{t}} is the error term. 
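In plain R, the single source of error recursion that this documentation refers to reduces to two updates per observation. The sketch below is hypothetical (the package performs this loop in compiled code, e.g. via costfunc in the hunks that follow) and assumes the additive case with a single common lag and no exogenous part; gumRecursion, Fmat and the other names are illustrative only:

# Hypothetical plain-R sketch of the additive SSOE recursion behind GUM;
# not the package implementation. w: 1 x k measurement matrix,
# Fmat: k x k transition matrix, g: k x 1 persistence vector, v0: initial state.
gumRecursion <- function(y, w, Fmat, g, v0){
    v <- v0;
    fitted <- errors <- numeric(length(y));
    for(t in seq_along(y)){
        fitted[t] <- as.numeric(w %*% v);   # measurement: y[t] = w' v[t-1] + e[t]
        errors[t] <- y[t] - fitted[t];      # additive error
        v <- Fmat %*% v + g * errors[t];    # transition: v[t] = F v[t-1] + g e[t]
    }
    return(list(fitted=fitted, residuals=errors, states=v));
}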
#'
+#' For some more information about the model and its implementation, see the
+#' vignette: \code{vignette("gum","smooth")}
#'
#' @template ssBasicParam
#' @template ssAdvancedParam
@@ -83,17 +85,17 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C",
#' \item \code{fitted} - fitted values.
#' \item \code{forecast} - point forecast.
#' \item \code{lower} - lower bound of prediction interval. When
-#' \code{intervals="none"} then NA is returned.
+#' \code{interval="none"} then NA is returned.
#' \item \code{upper} - higher bound of prediction interval. When
-#' \code{intervals="none"} then NA is returned.
+#' \code{interval="none"} then NA is returned.
#' \item \code{residuals} - the residuals of the estimated model.
#' \item \code{errors} - matrix of 1 to h steps ahead errors.
#' \item \code{s2} - variance of the residuals (taking degrees of freedom
#' into account).
-#' \item \code{intervals} - type of intervals asked by user.
-#' \item \code{level} - confidence level for intervals.
+#' \item \code{interval} - type of interval requested by the user.
+#' \item \code{level} - confidence level for the interval.
#' \item \code{cumulative} - whether the produced forecast was cumulative or not.
-#' \item \code{actuals} - original data.
+#' \item \code{y} - original data.
#' \item \code{holdout} - holdout part of the original data.
#' \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated.
#' If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -107,8 +109,8 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C",
#' \item \code{ICs} - values of information criteria of the model. Includes
#' AIC, AICc, BIC and BICc.
#' \item \code{logLik} - log-likelihood of the function.
-#' \item \code{cf} - Cost function value.
-#' \item \code{cfType} - Type of cost function used in the estimation.
+#' \item \code{lossValue} - Value of the loss function.
+#' \item \code{loss} - Type of the loss function used in the estimation.
#' \item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or
#' when \code{FI} variable is not provided at all.
#' \item \code{accuracy} - vector of accuracy measures for the holdout sample.
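Backwards compatibility for the renamed arguments (y instead of data, loss instead of cfType, interval instead of intervals) is provided by the new depricator() helper from R/depricator.R, called at the top of each affected function in the hunks below. Its body is not part of this diff; a plausible minimal sketch of such a shim, with the warning text assumed rather than quoted:

# Plausible sketch of the deprecation shim; the actual depricator() in
# R/depricator.R may differ. It falls back to the legacy argument passed
# via ... and warns the user about the rename.
depricator <- function(newValue, ellipsis, oldName){
    if(is.null(ellipsis[[oldName]])){
        return(newValue);
    }
    warning(paste0("The '",oldName,"' argument is deprecated and will be removed in v2.6.0. ",
                   "Please use the new argument name instead."), call.=FALSE);
    return(ellipsis[[oldName]]);
}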
@@ -124,20 +126,20 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C",
#' @examples
#'
#' # Something simple:
-#' gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="a",intervals="p")
+#' gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="a",interval="p")
#'
#' # A more complicated model with seasonality
#' \dontrun{ourModel <- gum(rnorm(118,100,3),orders=c(2,1),lags=c(1,4),h=18,holdout=TRUE)}
#'
-#' # Redo previous model on a new data and produce prediction intervals
-#' \dontrun{gum(rnorm(118,100,3),model=ourModel,h=18,intervals="sp")}
+#' # Redo the previous model on new data and produce a prediction interval
+#' \dontrun{gum(rnorm(118,100,3),model=ourModel,h=18,interval="sp")}
#'
#' # Produce something crazy with optimal initials (not recommended)
#' \dontrun{gum(rnorm(118,100,3),orders=c(1,1,1),lags=c(1,3,5),h=18,holdout=TRUE,initial="o")}
#'
-#' # Simpler model estiamted using trace forecast error cost function and its analytical analogue
-#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",cfType="TMSE")
-#' gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",cfType="aTMSE")}
+#' # Simpler model estimated using the trace forecast error loss function and its analytical analogue
+#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",loss="TMSE")
+#' gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",loss="aTMSE")}
#'
#' # Introduce exogenous variables
#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118))}
@@ -146,8 +148,8 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C",
#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118),updateX=TRUE)}
#'
#' # Do the same but now let's shrink parameters...
-#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,xreg=c(1:118),updateX=TRUE,cfType="TMSE")
-#' ourModel <- gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,cfType="aTMSE")}
+#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,xreg=c(1:118),updateX=TRUE,loss="TMSE")
+#' ourModel <- gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,loss="aTMSE")}
#'
#' # Or select the most appropriate one
#' \dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118),xregDo="s")
@@ -158,12 +160,12 @@ utils::globalVariables(c("measurementEstimate","transitionEstimate", "C",
#'
#' @rdname gum
#' @export gum
-gum <- function(data, orders=c(1,1), lags=c(1,frequency(data)), type=c("A","M"),
+gum <- function(y, orders=c(1,1), lags=c(1,frequency(y)), type=c("additive","multiplicative"),
                persistence=NULL, transition=NULL, measurement=NULL,
                initial=c("optimal","backcasting"), ic=c("AICc","AIC","BIC","BICc"),
-               cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"),
+               loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"),
                h=10, holdout=FALSE, cumulative=FALSE,
-               intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95,
+               interval=c("none","parametric","semiparametric","nonparametric"), level=0.95,
                occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"),
                oesmodel="MNN",
                bounds=c("restricted","admissible","none"),
@@ -177,6 +179,11 @@ gum <- function(data, orders=c(1,1), lags=c(1,frequency(data)), type=c("A","M"),
    # Start measuring the time of calculations
    startTime <- Sys.time();

+    ##### Check if data was used instead of y.
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); @@ -318,9 +325,9 @@ CF <- function(C){ matFX <- elements$matFX; vecgX <- elements$vecgX; - cfRes <- costfunc(matvt, matF, matw, y, vecg, + cfRes <- costfunc(matvt, matF, matw, yInSample, vecg, h, modellags, Etype, Ttype, Stype, - multisteps, cfType, normalizer, initialType, + multisteps, loss, normalizer, initialType, matxt, matat, matFX, vecgX, ot, bounds, 0); @@ -459,14 +466,14 @@ CreatorGUM <- function(silentText=FALSE,...){ ##### Preset yFitted, yForecast, errors and basic parameters ##### matvt <- matrix(NA,nrow=obsStates,ncol=nComponents); - yFitted <- rep(NA,obsInsample); + yFitted <- rep(NA,obsInSample); yForecast <- rep(NA,h); - errors <- rep(NA,obsInsample); + errors <- rep(NA,obsInSample); ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=data, xreg=xreg, updateX=updateX, ot=ot, + xregdata <- ssXreg(y=y, xreg=xreg, updateX=updateX, ot=ot, persistenceX=persistenceX, transitionX=transitionX, initialX=initialX, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=maxlag, h=h, xregDo=xregDo, silent=silentText); if(xregDo=="u"){ @@ -546,9 +553,9 @@ CreatorGUM <- function(silentText=FALSE,...){ # If this is tiny sample, use SES instead if(tinySample){ warning("Not enough observations to fit GUM Switching to ETS(A,N,N).",call.=FALSE); - return(es(data,"ANN",initial=initial,cfType=cfType, + return(es(y,"ANN",initial=initial,loss=loss, h=h,holdout=holdout,cumulative=cumulative, - intervals=intervals,level=level, + interval=interval,level=level, occurrence=occurrence, oesmodel=oesmodel, bounds="u", @@ -748,7 +755,7 @@ CreatorGUM <- function(silentText=FALSE,...){ ssForecaster(ParentEnvironment=environment()); if(modelIsMultiplicative){ - y <- exp(y); + yInSample <- exp(yInSample); yFitted <- exp(yFitted); yForecast <- exp(yForecast); yLower <- exp(yLower); @@ -807,7 +814,7 @@ CreatorGUM <- function(silentText=FALSE,...){ } # Make some preparations - matvt <- ts(matvt,start=(time(data)[1] - deltat(data)*maxlag),frequency=frequency(data)); + matvt <- ts(matvt,start=(time(y)[1] - deltat(y)*maxlag),frequency=dataFreq); if(!is.null(xreg)){ matvt <- cbind(matvt,matat); colnames(matvt) <- c(paste0("Component ",c(1:nComponents)),colnames(matat)); @@ -834,12 +841,12 @@ CreatorGUM <- function(silentText=FALSE,...){ ##### Deal with the holdout sample ##### if(holdout){ - yHoldout <- ts(data[(obsInsample+1):obsAll],start=yForecastStart,frequency=frequency(data)); + yHoldout <- ts(y[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -887,18 +894,18 @@ CreatorGUM <- function(silentText=FALSE,...){ yLowerNew <- yLower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - if(intervals){ + if(interval){ yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - 
graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend,main=modelname,cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main=modelname,cumulative=cumulative); } } @@ -909,10 +916,10 @@ CreatorGUM <- function(silentText=FALSE,...){ initialType=initialType,initial=initialValue, nParam=parametersNumber, fitted=yFitted,forecast=yForecast,lower=yLower,upper=yUpper,residuals=errors, - errors=errors.mat,s2=s2,intervals=intervalsType,level=level,cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=occurrenceModel, + errors=errors.mat,s2=s2,interval=intervalType,level=level,cumulative=cumulative, + y=y,holdout=yHoldout,occurrence=occurrenceModel, xreg=xreg,updateX=updateX,initialX=initialX,persistenceX=persistenceX,transitionX=transitionX, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,FI=FI,accuracy=errormeasures); + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,FI=FI,accuracy=errormeasures); return(structure(model,class="smooth")); } diff --git a/R/iss.R b/R/iss.R index 9bc851f..4514105 100644 --- a/R/iss.R +++ b/R/iss.R @@ -1,4 +1,4 @@ -utils::globalVariables(c("y","obs","occurrenceModelProvided","occurrenceModel","occurrenceModel")) +utils::globalVariables(c("yInSample","obs","occurrenceModelProvided","occurrenceModel","occurrenceModel")) intermittentParametersSetter <- function(occurrence="n",...){ # Function returns basic parameters based on occurrence type @@ -6,9 +6,9 @@ intermittentParametersSetter <- function(occurrence="n",...){ ParentEnvironment <- ellipsis[['ParentEnvironment']]; if(all(occurrence!=c("n","p"))){ - ot <- (y!=0)*1; + ot <- (yInSample!=0)*1; obsNonzero <- sum(ot); - obsZero <- obsInsample - obsNonzero; + obsZero <- obsInSample - obsNonzero; # 1 parameter for estimating initial probability. 
Works for the fixed probability model nParamOccurrence <- 1; if(any(occurrence==c("o","i","d"))){ @@ -20,22 +20,22 @@ intermittentParametersSetter <- function(occurrence="n",...){ nParamOccurrence <- nParamOccurrence + 3; } # Demand sizes - yot <- matrix(y[y!=0],obsNonzero,1); + yot <- matrix(yInSample[yInSample!=0],obsNonzero,1); if(!occurrenceModelProvided){ - pFitted <- matrix(mean(ot),obsInsample,1); + pFitted <- matrix(mean(ot),obsInSample,1); pForecast <- matrix(1,h,1); } else{ - if(length(fitted(occurrenceModel))>obsInsample){ - pFitted <- matrix(fitted(occurrenceModel)[1:obsInsample],obsInsample,1); + if(length(fitted(occurrenceModel))>obsInSample){ + pFitted <- matrix(fitted(occurrenceModel)[1:obsInSample],obsInSample,1); } - else if(length(fitted(occurrenceModel))=h){ @@ -49,15 +49,15 @@ intermittentParametersSetter <- function(occurrence="n",...){ } } else{ - obsNonzero <- obsInsample; + obsNonzero <- obsInSample; obsZero <- 0; } if(occurrence=="n"){ - ot <- rep(1,obsInsample); - obsNonzero <- obsInsample; - yot <- y; - pFitted <- matrix(1,obsInsample,1); + ot <- rep(1,obsInSample); + obsNonzero <- obsInSample; + yot <- yInSample; + pFitted <- matrix(1,obsInSample,1); pForecast <- matrix(1,h,1); nParamOccurrence <- 0; } @@ -147,7 +147,7 @@ intermittentMaker <- function(occurrence="n",...){ #' \item \code{logLik} - likelihood value for the model #' \item \code{nParam} - number of parameters used in the model; #' \item \code{residuals} - residuals of the model; -#' \item \code{actuals} - actual values of probabilities (zeros and ones). +#' \item \code{y} - actual values of probabilities (zeros and ones). #' \item \code{persistence} - the vector of smoothing parameters; #' \item \code{initial} - initial values of the state vector; #' \item \code{initialSeason} - the matrix of initials seasonal states; @@ -186,11 +186,11 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb data <- data$data; } - obsInsample <- length(data) - holdout*h; + obsInSample <- length(data) - holdout*h; obsAll <- length(data) + (1 - holdout)*h; - y <- ts(data[1:obsInsample],frequency=frequency(data),start=start(data)); + yInSample <- ts(data[1:obsInSample],frequency=frequency(data),start=start(data)); - ot <- abs((y!=0)*1); + ot <- abs((yInSample!=0)*1); otAll <- abs((data!=0)*1); iprob <- mean(ot); obsOnes <- sum(ot); @@ -237,19 +237,19 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb #### Fixed probability #### if(intermittent=="f"){ if(!is.null(initial)){ - pFitted <- ts(matrix(rep(initial,obsInsample),obsInsample,1), start=start(y), frequency=frequency(y)); + pFitted <- ts(matrix(rep(initial,obsInSample),obsInSample,1), start=start(yInSample), frequency=frequency(yInSample)); } else{ initial <- iprob; - pFitted <- ts(matrix(rep(iprob,obsInsample),obsInsample,1), start=start(y), frequency=frequency(y)); + pFitted <- ts(matrix(rep(iprob,obsInSample),obsInSample,1), start=start(yInSample), frequency=frequency(yInSample)); } names(initial) <- "level"; - pForecast <- ts(rep(pFitted[1],h), start=time(y)[obsInsample]+deltat(y), frequency=frequency(y)); - errors <- ts(ot-iprob, start=start(y), frequency=frequency(y)); + pForecast <- ts(rep(pFitted[1],h), start=time(yInSample)[obsInSample]+deltat(yInSample), frequency=frequency(yInSample)); + errors <- ts(ot-iprob, start=start(yInSample), frequency=frequency(yInSample)); output <- list(model=model, fitted=pFitted, forecast=pForecast, states=pFitted, variance=pForecast*(1-pForecast), logLik=NA, 
nParam=1, - residuals=errors, actuals=otAll, + residuals=errors, y=otAll, persistence=NULL, initial=initial, initialSeason=NULL); } #### Croston's method #### @@ -258,18 +258,18 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb initial <- "o"; } # Define the matrix of states - ivt <- matrix(rep(iprob,obsInsample+1),obsInsample+1,1); + ivt <- matrix(rep(iprob,obsInSample+1),obsInSample+1,1); # Define the matrix of actuals as intervals between demands - # zeroes <- c(0,which(y!=0),obsInsample+1); - zeroes <- c(0,which(y!=0)); + # zeroes <- c(0,which(y!=0),obsInSample+1); + zeroes <- c(0,which(yInSample!=0)); ### With this thing we fit model of the type 1/(1+qt) # zeroes <- diff(zeroes)-1; zeroes <- diff(zeroes); # Number of intervals in Croston iyt <- matrix(zeroes,length(zeroes),1); - newh <- which(y!=0); + newh <- which(yInSample!=0); newh <- newh[length(newh)]; - newh <- obsInsample - newh + h; + newh <- obsInSample - newh + h; crostonModel <- es(iyt,model=model,silent=TRUE,h=newh, persistence=persistence,initial=initial, ic=ic,xreg=xreg,initialSeason=initialSeason); @@ -278,7 +278,7 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb if(any(pFitted<1)){ pFitted[pFitted<1] <- 1; } - tailNumber <- obsInsample - length(pFitted); + tailNumber <- obsInSample - length(pFitted); if(tailNumber>0){ pForecast <- crostonModel$forecast[1:tailNumber]; if(any(pForecast<1)){ @@ -292,20 +292,20 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb } if(sbaCorrection){ - pFitted <- ts((1-sum(crostonModel$persistence)/2)/pFitted,start=start(y),frequency=frequency(y)); - pForecast <- ts((1-sum(crostonModel$persistence)/2)/pForecast, start=time(y)[obsInsample]+deltat(y),frequency=frequency(y)); + pFitted <- ts((1-sum(crostonModel$persistence)/2)/pFitted,start=start(yInSample),frequency=frequency(yInSample)); + pForecast <- ts((1-sum(crostonModel$persistence)/2)/pForecast, start=time(yInSample)[obsInSample]+deltat(yInSample),frequency=frequency(yInSample)); states <- 1/crostonModel$states; intermittent <- "s"; } else{ - pFitted <- ts(1/pFitted,start=start(y),frequency=frequency(y)); - pForecast <- ts(1/pForecast, start=time(y)[obsInsample]+deltat(y),frequency=frequency(y)); + pFitted <- ts(1/pFitted,start=start(yInSample),frequency=frequency(yInSample)); + pForecast <- ts(1/pForecast, start=time(yInSample)[obsInSample]+deltat(yInSample),frequency=frequency(yInSample)); states <- 1/crostonModel$states; } output <- list(model=model, fitted=pFitted, forecast=pForecast, states=states, variance=pForecast*(1-pForecast), logLik=NA, nParam=nparam(crostonModel), - residuals=crostonModel$residuals, actuals=otAll, + residuals=crostonModel$residuals, y=otAll, persistence=crostonModel$persistence, initial=crostonModel$initial, initialSeason=crostonModel$initialSeason); } @@ -318,14 +318,14 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb initial <- "o"; } - iyt <- matrix(ot,obsInsample,1); + iyt <- matrix(ot,obsInSample,1); iyt <- ts(iyt,frequency=frequency(data)); kappa <- 1E-5; - iy_kappa <- ts(iyt*(1 - 2*kappa) + kappa,start=start(y),frequency=frequency(y)); + iy_kappa <- ts(iyt*(1 - 2*kappa) + kappa,start=start(yInSample),frequency=frequency(yInSample)); tsbModel <- es(iy_kappa,model,persistence=persistence,initial=initial, - ic=ic,silent=TRUE,h=h,cfType="TSB",xreg=xreg, + ic=ic,silent=TRUE,h=h,loss="TSB",xreg=xreg, initialSeason=initialSeason); # Correction so we can return from those 
iy_kappa values @@ -350,7 +350,7 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb output <- list(model=model, fitted=tsbModel$fitted, forecast=tsbModel$forecast, states=tsbModel$states, variance=tsbModel$forecast*(1-tsbModel$forecast), logLik=NA, nParam=nparam(tsbModel)-1, - residuals=tsbModel$residuals, actuals=otAll, + residuals=tsbModel$residuals, y=otAll, persistence=tsbModel$persistence, initial=tsbModel$initial, initialSeason=tsbModel$initialSeason); } @@ -370,32 +370,32 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb substr(model,nchar(model),nchar(model))!="X") & all(c(substr(model,1,1)!="Z", substr(model,2,2)!="Z"), substr(model,nchar(model),nchar(model))!="Z")){ - cfType <- "LogisticL"; + loss <- "LogisticL"; } else if(all(c(substr(model,1,1)!="Z", substr(model,2,2)!="Z"), substr(model,nchar(model),nchar(model))!="Z")){ - cfType <- "LogisticD"; + loss <- "LogisticD"; } else{ - cfType <- "LogisticZ"; + loss <- "LogisticZ"; } ##### Need to introduce also the one with ZZZ ##### - iyt <- ts(matrix(ot,obsInsample,1),start=start(y),frequency=frequency(y)); + iyt <- ts(matrix(ot,obsInSample,1),start=start(yInSample),frequency=frequency(yInSample)); - if(cfType=="LogisticZ"){ + if(loss=="LogisticZ"){ logisticModel <- list(NA); - cfType <- "LogisticD"; + loss <- "LogisticD"; modelNew <- gsub("Z","X",model); logisticModel[[1]] <- es(iyt,modelNew,persistence=persistence,initial=initial, - ic=ic,silent=TRUE,h=h,cfType=cfType,xreg=xreg, + ic=ic,silent=TRUE,h=h,loss=loss,xreg=xreg, initialSeason=initialSeason); - cfType <- "LogisticL"; + loss <- "LogisticL"; modelNew <- gsub("Z","Y",model); logisticModel[[2]] <- es(iyt,modelNew,persistence=persistence,initial=initial, - ic=ic,silent=TRUE,h=h,cfType=cfType,xreg=xreg, + ic=ic,silent=TRUE,h=h,loss=loss,xreg=xreg, initialSeason=initialSeason); if(logisticModel[[1]]$ICs[nrow(logisticModel[[1]]$ICs),ic] < @@ -408,24 +408,24 @@ iss <- function(data, intermittent=c("none","fixed","interval","probability","sb } else{ logisticModel <- es(iyt,model,persistence=persistence,initial=initial, - ic=ic,silent=TRUE,h=h,cfType=cfType,xreg=xreg, + ic=ic,silent=TRUE,h=h,loss=loss,xreg=xreg, initialSeason=initialSeason); } output <- list(model=modelType(logisticModel), fitted=logisticModel$fitted, forecast=logisticModel$forecast, states=logisticModel$states, variance=logisticModel$forecast*(1-logisticModel$forecast), logLik=NA, nParam=nparam(logisticModel), - residuals=logisticModel$residuals, actuals=otAll, + residuals=logisticModel$residuals, y=otAll, persistence=logisticModel$persistence, initial=logisticModel$initial, initialSeason=logisticModel$initialSeason); } #### None #### else{ - pFitted <- ts(y,start=start(y),frequency=frequency(y)); - pForecast <- ts(rep(y[obsInsample],h), start=time(y)[obsInsample]+deltat(y),frequency=frequency(y)); - errors <- ts(rep(0,obsInsample), start=start(y), frequency=frequency(y)); + pFitted <- ts(yInSample,start=start(yInSample),frequency=frequency(yInSample)); + pForecast <- ts(rep(yInSample[obsInSample],h), start=time(yInSample)[obsInSample]+deltat(yInSample),frequency=frequency(yInSample)); + errors <- ts(rep(0,obsInSample), start=start(yInSample), frequency=frequency(yInSample)); output <- list(model=NULL, fitted=pFitted, forecast=pForecast, states=pFitted, variance=rep(0,h), logLik=NA, nParam=0, - residuals=errors, actuals=pFitted, + residuals=errors, y=pFitted, persistence=NULL, initial=NULL, initialSeason=NULL); } output$intermittent <- 
intermittent; diff --git a/R/methods.R b/R/methods.R index 7bca9b1..a51c07c 100644 --- a/R/methods.R +++ b/R/methods.R @@ -415,7 +415,7 @@ nparam.iss <- function(object, ...){ #' #' # Generate data, apply es() with the holdout parameter and calculate PLS #' x <- rnorm(100,0,1) -#' ourModel <- es(x, h=10, holdout=TRUE, intervals=TRUE) +#' ourModel <- es(x, h=10, holdout=TRUE, interval=TRUE) #' pls(ourModel, type="a") #' pls(ourModel, type="e") #' pls(ourModel, type="s", obs=100, nsim=100) @@ -470,24 +470,24 @@ pls.smooth <- function(object, holdout=NULL, ...){ h <- length(holdout); Etype <- errorType(object); - cfType <- object$cfType; - if(any(cfType==c("MAE","MAEh","TMAE","GTMAE","MACE"))){ - cfType <- "MAE"; + loss <- object$loss; + if(any(loss==c("MAE","MAEh","TMAE","GTMAE","MACE"))){ + loss <- "MAE"; } - else if(any(cfType==c("HAM","HAMh","THAM","GTHAM","CHAM"))){ - cfType <- "HAM"; + else if(any(loss==c("HAM","HAMh","THAM","GTHAM","CHAM"))){ + loss <- "HAM"; } else{ - cfType <- "MSE"; + loss <- "MSE"; } - densityFunction <- function(cfType, ...){ - if(cfType=="MAE"){ + densityFunction <- function(loss, ...){ + if(loss=="MAE"){ # This is a simplification. The real multivariate Laplace is bizarre! scale <- sqrt(diag(covarMat)/2); plsValue <- sum(dlaplace(errors, 0, scale, log=TRUE)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # This is a simplification. We don't have multivariate HAM yet. scale <- (diag(covarMat)/120)^0.25; plsValue <- sum(ds(errors, 0, scale, log=TRUE)); @@ -521,7 +521,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ # Non-intermittent data if(is.null(object$occurrence)){ errors <- holdout - yForecast; - plsValue <- densityFunction(cfType, errors, covarMat); + plsValue <- densityFunction(loss, errors, covarMat); } # Intermittent data else{ @@ -529,7 +529,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ pForecast <- object$occurrence$forecast; errors <- holdout - yForecast / pForecast; if(all(ot)){ - plsValue <- densityFunction(cfType, errors, covarMat) + sum(log(pForecast)); + plsValue <- densityFunction(loss, errors, covarMat) + sum(log(pForecast)); } else if(all(!ot)){ plsValue <- sum(log(1-pForecast)); @@ -537,7 +537,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ else{ errors[!ot] <- 0; - plsValue <- densityFunction(cfType, errors, covarMat); + plsValue <- densityFunction(loss, errors, covarMat); plsValue <- plsValue + sum(log(pForecast[ot])) + sum(log(1-pForecast[!ot])); } } @@ -547,7 +547,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ # Non-intermittent data if(is.null(object$occurrence)){ errors <- log(holdout) - log(yForecast); - plsValue <- densityFunction(cfType, errors, covarMat) - sum(log(holdout)); + plsValue <- densityFunction(loss, errors, covarMat) - sum(log(holdout)); } # Intermittent data else{ @@ -555,7 +555,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ pForecast <- object$occurrence$forecast; errors <- log(holdout) - log(yForecast / pForecast); if(all(ot)){ - plsValue <- (densityFunction(cfType, errors, covarMat) - sum(log(holdout)) + + plsValue <- (densityFunction(loss, errors, covarMat) - sum(log(holdout)) + sum(log(pForecast))); } else if(all(!ot)){ @@ -564,7 +564,7 @@ pls.smooth <- function(object, holdout=NULL, ...){ else{ errors[!ot] <- 0; - plsValue <- densityFunction(cfType, errors, covarMat) - sum(log(holdout[ot])); + plsValue <- densityFunction(loss, errors, covarMat) - sum(log(holdout[ot])); plsValue <- plsValue + sum(log(pForecast[ot])) + sum(log(1-pForecast[!ot])); } } @@ -586,17 +586,17 
@@ pointLik.smooth <- function(object, ...){
    obs <- nobs(object);
    errors <- residuals(object);
    likValues <- vector("numeric",obs);
-    cfType <- object$cfType;
+    loss <- object$loss;
    if(errorType(object)=="M"){
        errors <- log(1+errors);
        likValues <- likValues - log(actuals(object));
    }
-    if(any(cfType==c("MAE","MAEh","TMAE","GTMAE","MACE"))){
+    if(any(loss==c("MAE","MAEh","TMAE","GTMAE","MACE"))){
        likValues <- likValues + dlaplace(errors, 0, mean(abs(errors)), TRUE);
    }
-    else if(any(cfType==c("HAM","HAMh","THAM","GTHAM","CHAM"))){
+    else if(any(loss==c("HAM","HAMh","THAM","GTHAM","CHAM"))){
        likValues <- likValues + ds(errors, 0, mean(sqrt(abs(errors))/2), TRUE);
    }
    else{
@@ -666,7 +666,6 @@ coef.smooth <- function(object, ...)
    return(parameters);
}

-#' @importFrom greybox actuals
#### Fitted, forecast and actual values ####
#' @export
@@ -691,7 +690,7 @@ NULL
#' @aliases forecast forecast.smooth
#' @param object Time series model for which forecasts are required.
#' @param h Forecast horizon
-#' @param intervals Type of intervals to construct. See \link[smooth]{es} for
+#' @param interval Type of interval to construct. See \link[smooth]{es} for
#' details.
#' @param level Confidence level. Defines width of prediction interval.
#' @param ... Other arguments accepted by either \link[smooth]{es},
@@ -702,13 +701,13 @@ NULL
#' \item \code{model} - the estimated model (ES / CES / GUM / SSARIMA).
#' \item \code{method} - the name of the estimated model (ES / CES / GUM / SSARIMA).
#' \item \code{fitted} - fitted values of the model.
-#' \item \code{actuals} - actuals provided in the call of the model.
+#' \item \code{y} - actual values provided in the call of the model.
#' \item \code{forecast} aka \code{mean} - point forecasts of the model
#' (conditional mean).
-#' \item \code{lower} - lower bound of prediction intervals.
-#' \item \code{upper} - upper bound of prediction intervals.
+#' \item \code{lower} - lower bound of the prediction interval.
+#' \item \code{upper} - upper bound of the prediction interval.
#' \item \code{level} - confidence level.
-#' \item \code{intervals} - binary variable (whether intervals were produced or not).
+#' \item \code{interval} - binary variable (whether an interval was produced or not).
#' \item \code{residuals} - the residuals of the original model.
#' } #' @template ssAuthor @@ -722,32 +721,32 @@ NULL #' ourModel <- ces(rnorm(100,0,1),h=10) #' #' forecast.smooth(ourModel,h=10) -#' forecast.smooth(ourModel,h=10,intervals=TRUE) -#' plot(forecast.smooth(ourModel,h=10,intervals=TRUE)) +#' forecast.smooth(ourModel,h=10,interval=TRUE) +#' plot(forecast.smooth(ourModel,h=10,interval=TRUE)) #' #' @export forecast.smooth #' @export forecast.smooth <- function(object, h=10, - intervals=c("parametric","semiparametric","nonparametric","none"), + interval=c("parametric","semiparametric","nonparametric","none"), level=0.95, ...){ smoothType <- smoothType(object); - intervals <- intervals[1]; + interval <- interval[1]; if(smoothType=="ETS"){ - newModel <- es(object$actuals,model=object,h=h,intervals=intervals,level=level,silent="all",...); + newModel <- es(actuals(object),model=object,h=h,interval=interval,level=level,silent="all",...); } else if(smoothType=="CES"){ - newModel <- ces(object$actuals,model=object,h=h,intervals=intervals,level=level,silent="all",...); + newModel <- ces(actuals(object),model=object,h=h,interval=interval,level=level,silent="all",...); } else if(smoothType=="GUM"){ - newModel <- gum(object$actuals,model=object,type=errorType(object),h=h,intervals=intervals,level=level,silent="all",...); + newModel <- gum(actuals(object),model=object,type=errorType(object),h=h,interval=interval,level=level,silent="all",...); } else if(smoothType=="ARIMA"){ if(any(unlist(gregexpr("combine",object$model))==-1)){ if(is.msarima(object)){ - newModel <- msarima(object$actuals,model=object,h=h,intervals=intervals,level=level,silent="all",...); + newModel <- msarima(actuals(object),model=object,h=h,interval=interval,level=level,silent="all",...); } else{ - newModel <- ssarima(object$actuals,model=object,h=h,intervals=intervals,level=level,silent="all",...); + newModel <- ssarima(actuals(object),model=object,h=h,interval=interval,level=level,silent="all",...); } } else{ @@ -756,14 +755,14 @@ forecast.smooth <- function(object, h=10, } } else if(smoothType=="SMA"){ - newModel <- sma(object$actuals,model=object,h=h,intervals=intervals,level=level,silent="all",...); + newModel <- sma(actuals(object),model=object,h=h,interval=interval,level=level,silent="all",...); } else{ stop("Wrong object provided. 
This needs to be either 'ETS', or 'CES', or 'GUM', or 'SSARIMA', or 'SMA' model.",call.=FALSE); } - output <- list(model=object,method=object$model,fitted=newModel$fitted,actuals=newModel$actuals, + output <- list(model=object,method=object$model,fitted=newModel$fitted,y=actuals(newModel), forecast=newModel$forecast,lower=newModel$lower,upper=newModel$upper,level=newModel$level, - intervals=intervals,mean=newModel$forecast,x=object$actuals,residuals=object$residuals); + interval=interval,mean=newModel$forecast,x=actuals(object),residuals=object$residuals); return(structure(output,class=c("smooth.forecast","forecast"))); } @@ -772,11 +771,15 @@ forecast.smooth <- function(object, h=10, #' @importFrom greybox actuals #' @export actuals.smooth <- function(object, ...){ - return(window(object$actuals,start(object$actuals),end(object$fitted))); + return(window(object$y,start(object$y),end(object$fitted))); } #' @export actuals.smooth.forecast <- function(object, ...){ - return(window(object$model$actuals,start(object$model$actuals),end(object$model$fitted))); + return(window(object$model$y,start(object$model$y),end(object$model$fitted))); +} +#' @export +actuals.iss <- function(object, ...){ + return(window(object$y,start(object$y),end(object$fitted))); } #### Function extracts lags of provided model #### @@ -1007,14 +1010,7 @@ modelName.smooth <- function(object, ...){ #### Function extracts type of model. For example "AAN" from ets #### #' @export modelType.default <- function(object, ...){ - modelType <- NA; - if(is.null(object$model)){ - if(any(gregexpr("ets",object$call)!=-1)){ - model <- object$method; - modelType <- gsub(",","",substring(model,5,nchar(model)-1)); - } - } - return(modelType); + return(NA); } #' @export @@ -1060,6 +1056,11 @@ modelType.oesg <- function(object, ...){ return(modelType(object$modelA)); } +#' @export +modelType.ets <- function(object, ...){ + return(gsub(",","",substring(object$method,5,nchar(object$method)-1))); +} + #### Function extracts orders of provided model #### #' @export orders.default <- function(object, ...){ @@ -1161,10 +1162,10 @@ plot.oes <- function(x, ...){ ellipsis <- list(...); if(is.null(ellipsis$main)){ - graphmaker(x$actuals,x$forecast,x$fitted,x$lower,x$upper,main=x$model,...); + graphmaker(actuals(x),x$forecast,x$fitted,x$lower,x$upper,main=x$model,...); } else{ - graphmaker(x$actuals,x$forecast,x$fitted,x$lower,x$upper, ...); + graphmaker(actuals(x),x$forecast,x$fitted,x$lower,x$upper, ...); } } @@ -1206,7 +1207,7 @@ plot.smooth <- function(x, ...){ } } else if(smoothType=="CMA"){ - ellipsis$actuals <- x$actuals; + ellipsis$actuals <- actuals(x); ellipsis$forecast <- x$forecast; ellipsis$fitted <- x$fitted; ellipsis$legend <- FALSE; @@ -1253,7 +1254,7 @@ plot.smooth <- function(x, ...){ #' @export plot.smoothC <- function(x, ...){ - graphmaker(x$actuals, x$forecast, x$fitted, x$lower, x$upper, x$level, + graphmaker(actuals(x), x$forecast, x$fitted, x$lower, x$upper, x$level, main="Combined smooth forecasts"); } @@ -1293,11 +1294,11 @@ plot.smooth.sim <- function(x, ...){ #' @method plot smooth.forecast #' @export plot.smooth.forecast <- function(x, ...){ - if(any(x$intervals!=c("none","n"))){ - graphmaker(x$actuals,x$forecast,x$fitted,x$lower,x$upper,x$level,main=x$method); + if(any(x$interval!=c("none","n"))){ + graphmaker(actuals(x),x$forecast,x$fitted,x$lower,x$upper,x$level,main=x$method); } else{ - graphmaker(x$actuals,x$forecast,x$fitted,main=x$method); + graphmaker(actuals(x),x$forecast,x$fitted,main=x$method); } } @@ -1321,10 
+1322,10 @@ plot.iss <- function(x, ...){ intermittent <- "None"; } if(is.null(ellipsis$main)){ - graphmaker(x$actuals,x$forecast,x$fitted,main=paste0("iSS, ",intermittent), ...); + graphmaker(actuals(x),x$forecast,x$fitted,main=paste0("iSS, ",intermittent), ...); } else{ - graphmaker(x$actuals,x$forecast,x$fitted, ...); + graphmaker(actuals(x),x$forecast,x$fitted, ...); } } @@ -1336,34 +1337,34 @@ print.smooth <- function(x, ...){ if(!is.list(x$model)){ if(smoothType=="CMA"){ holdout <- FALSE; - intervals <- FALSE; + interval <- FALSE; cumulative <- FALSE; } else{ holdout <- any(!is.na(x$holdout)); - intervals <- any(!is.na(x$lower)); + interval <- any(!is.na(x$lower)); cumulative <- x$cumulative; } } else{ holdout <- any(!is.na(x$holdout)); - intervals <- any(!is.na(x$lower)); + interval <- any(!is.na(x$lower)); cumulative <- x$cumulative; } - if(all(holdout,intervals)){ + if(all(holdout,interval)){ if(!cumulative){ - insideintervals <- sum((x$holdout <= x$upper) & (x$holdout >= x$lower)) / length(x$forecast) * 100; + insideinterval <- sum((x$holdout <= x$upper) & (x$holdout >= x$lower)) / length(x$forecast) * 100; } else{ - insideintervals <- NULL; + insideinterval <- NULL; } } else{ - insideintervals <- NULL; + insideinterval <- NULL; } - intervalsType <- x$intervals; + intervalType <- x$interval; if(!is.null(x$model)){ if(!is.list(x$model)){ @@ -1375,7 +1376,7 @@ print.smooth <- function(x, ...){ else if(smoothType=="ETS"){ # If cumulative forecast and Etype=="M", report that this was "parameteric" interval if(cumulative & substr(modelType(x),1,1)=="M"){ - intervalsType <- "p"; + intervalType <- "p"; } } } @@ -1390,9 +1391,9 @@ print.smooth <- function(x, ...){ ssOutput(x$timeElapsed, x$model, persistence=x$persistence, transition=x$transition, measurement=x$measurement, phi=x$phi, ARterms=x$AR, MAterms=x$MA, constant=x$constant, A=x$A, B=x$B,initialType=x$initialType, nParam=x$nParam, s2=x$s2, hadxreg=!is.null(x$xreg), wentwild=x$updateX, - cfType=x$cfType, cfObjective=x$cf, intervals=intervals, cumulative=cumulative, - intervalsType=intervalsType, level=x$level, ICs=x$ICs, - holdout=holdout, insideintervals=insideintervals, errormeasures=x$accuracy, + loss=x$loss, cfObjective=x$lossValue, interval=interval, cumulative=cumulative, + intervalType=intervalType, level=x$level, ICs=x$ICs, + holdout=holdout, insideinterval=insideinterval, errormeasures=x$accuracy, occurrence=occurrence); } @@ -1504,7 +1505,7 @@ print.smooth.sim <- function(x, ...){ #' @export print.smooth.forecast <- function(x, ...){ - if(any(x$intervals!=c("none","n"))){ + if(any(x$interval!=c("none","n"))){ level <- x$level; if(level>1){ level <- level/100; @@ -1564,8 +1565,8 @@ print.oes <- function(x, ...){ if(occurrence=="g"){ occurrence <- "General"; } - else if(occurrence=="p"){ - occurrence <- "Probability-based"; + else if(occurrence=="d"){ + occurrence <- "Direct probability"; } else if(occurrence=="f"){ occurrence <- "Fixed probability"; @@ -1613,8 +1614,8 @@ simulate.smooth <- function(object, nsim=1, seed=NULL, obs=NULL, ...){ # Start a list of arguments args <- vector("list",0); - cfType <- object$cfType; - if(any(cfType==c("MAE","MAEh","TMAE","GTMAE","MACE"))){ + loss <- object$loss; + if(any(loss==c("MAE","MAEh","TMAE","GTMAE","MACE"))){ randomizer <- "rlaplace"; if(!is.null(ellipsis$mu)){ args$mu <- ellipsis$mu; @@ -1630,7 +1631,7 @@ simulate.smooth <- function(object, nsim=1, seed=NULL, obs=NULL, ...){ args$scale <- mean(abs(residuals(object))); } } - else 
if(any(cfType==c("HAM","HAMh","THAM","GTHAM","CHAM"))){
+    else if(any(loss==c("HAM","HAMh","THAM","GTHAM","CHAM"))){
        randomizer <- "rs";
        if(!is.null(ellipsis$mu)){
            args$mu <- ellipsis$mu;
@@ -1668,7 +1669,7 @@
        }
    }
    args$randomizer <- randomizer;
-    args$frequency <- frequency(object$actuals);
+    args$frequency <- frequency(actuals(object));
    args$obs <- obs;
    args$nsim <- nsim;
    args$initial <- object$initial;
diff --git a/R/msarima.R b/R/msarima.R
index 3071f44..6746a0d 100644
--- a/R/msarima.R
+++ b/R/msarima.R
@@ -49,6 +49,8 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta
#' take some time... Still this should be estimated in finite time (not like
#' with \code{ssarima}).
#'
+#' For some additional details see the vignette: \code{vignette("ssarima","smooth")}
+#'
#' @template ssBasicParam
#' @template ssAdvancedParam
#' @template ssInitialParam
@@ -116,17 +118,17 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta
#' \item \code{fitted} - the fitted values.
#' \item \code{forecast} - the point forecast.
#' \item \code{lower} - the lower bound of prediction interval. When
-#' \code{intervals="none"} then NA is returned.
+#' \code{interval="none"} then NA is returned.
#' \item \code{upper} - the higher bound of prediction interval. When
-#' \code{intervals="none"} then NA is returned.
+#' \code{interval="none"} then NA is returned.
#' \item \code{residuals} - the residuals of the estimated model.
#' \item \code{errors} - The matrix of 1 to h steps ahead errors.
#' \item \code{s2} - variance of the residuals (taking degrees of freedom into
#' account).
-#' \item \code{intervals} - type of intervals asked by user.
-#' \item \code{level} - confidence level for intervals.
+#' \item \code{interval} - type of interval requested by the user.
+#' \item \code{level} - confidence level for the interval.
#' \item \code{cumulative} - whether the produced forecast was cumulative or not.
-#' \item \code{actuals} - the original data.
+#' \item \code{y} - the original data.
#' \item \code{holdout} - the holdout part of the original data.
#' \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated.
#' If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -142,8 +144,8 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta
#' \item \code{ICs} - values of information criteria of the model. Includes
#' AIC, AICc, BIC and BICc.
#' \item \code{logLik} - log-likelihood of the function.
-#' \item \code{cf} - Cost function value.
-#' \item \code{cfType} - Type of cost function used in the estimation.
+#' \item \code{lossValue} - Value of the loss function.
+#' \item \code{loss} - Type of the loss function used in the estimation.
#' \item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE}
#' or when \code{FI} is not provided at all.
#' \item \code{accuracy} - vector of accuracy measures for the holdout sample.
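Since the shim keeps the legacy names working until their removal in v2.6.0, the old and new spellings should yield identical fits during the transition. A hypothetical sanity check in the style of the examples below:

# Hypothetical check: the legacy names (intervals=, cfType=) are picked up
# from ... by depricator() and should match the new names exactly.
set.seed(41)
x <- rnorm(118,100,3)
mNew <- msarima(x, orders=list(ar=1,i=1,ma=1), lags=1, h=18, holdout=TRUE,
                interval="p", loss="MSE")
mOld <- msarima(x, orders=list(ar=1,i=1,ma=1), lags=1, h=18, holdout=TRUE,
                intervals="p", cfType="MSE")
all.equal(mNew$forecast, mOld$forecast)   # expected: TRUE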
@@ -160,7 +162,7 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' @examples #' #' # The previous one is equivalent to: -#' ourModel <- msarima(rnorm(118,100,3),orders=c(1,1,1),lags=1,h=18,holdout=TRUE,intervals="p") +#' ourModel <- msarima(rnorm(118,100,3),orders=c(1,1,1),lags=1,h=18,holdout=TRUE,interval="p") #' #' # Example of SARIMA(2,0,0)(1,0,0)[4] #' msarima(rnorm(118,100,3),orders=list(ar=c(2,1)),lags=c(1,4),h=18,holdout=TRUE) @@ -169,7 +171,7 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' ourModel <- msarima(AirPassengers,orders=list(ar=c(1,0,3),i=c(1,0,1),ma=c(0,1,2)), #' lags=c(1,6,12),h=10,holdout=TRUE,FI=TRUE) #' -#' # Construct the 95% confidence intervals for the parameters of the model +#' # Construct the 95% confidence interval for the parameters of the model #' ourCoefs <- coef(ourModel) #' ourCoefsSD <- sqrt(abs(diag(solve(ourModel$FI)))) #' # Sort values accordingly @@ -180,9 +182,9 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' ourConfInt #' #' # ARIMA(1,1,1) with Mean Squared Trace Forecast Error -#' msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="TMSE") +#' msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="TMSE") #' -#' msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="aTMSE") +#' msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="aTMSE") #' #' # SARIMA(0,1,1) with exogenous variables with crazy estimation of xreg #' ourModel <- msarima(rnorm(118,100,3),orders=list(i=1,ma=1),h=18,holdout=TRUE, @@ -193,12 +195,12 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' plot(forecast(ourModel)) #' #' @export msarima -msarima <- function(data, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), +msarima <- function(y, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), constant=FALSE, AR=NULL, MA=NULL, initial=c("backcasting","optimal"), ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"), oesmodel="MNN", bounds=c("admissible","none"), @@ -213,6 +215,11 @@ msarima <- function(data, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); @@ -322,9 +329,9 @@ msarima <- function(data, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), CF <- function(C){ cfRes <- costfuncARIMA(ar.orders, ma.orders, i.orders, lags, nComponents, ARValue, MAValue, constantValue, C, - matvt, matF, matw, y, vecg, + matvt, matF, matw, yInSample, vecg, h, modellags, Etype, Ttype, Stype, - multisteps, cfType, normalizer, initialType, + multisteps, loss, normalizer, initialType, nExovars, matxt, matat, matFX, vecgX, ot, AREstimate, MAEstimate, constantRequired, constantEstimate, xregEstimate, updateX, FXEstimate, gXEstimate, initialXEstimate, @@ -368,10 +375,10 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ if(constantEstimate){ if(all(i.orders==0)){ - C <- c(C,sum(yot)/obsInsample); + C <- c(C,sum(yot)/obsInSample); } else{ - C <- c(C,sum(diff(yot))/obsInsample); + C <- c(C,sum(diff(yot))/obsInSample); } } @@ -457,8 +464,8 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ else{ for(i in 1:nComponents){ nRepeats <- ceiling(maxlag/modellags[i]); - matvt[1:maxlag,i] <- rep(y[1:modellags[i]],nRepeats)[nRepeats*modellags[i]+(-maxlag+1):0]; - # matvt[1:maxlag,i] <- rep(y[1:modellags[i]],nRepeats)[1:maxlag]; + matvt[1:maxlag,i] <- rep(yInSample[1:modellags[i]],nRepeats)[nRepeats*modellags[i]+(-maxlag+1):0]; + # matvt[1:maxlag,i] <- rep(yInSample[1:modellags[i]],nRepeats)[1:maxlag]; } } } @@ -470,14 +477,14 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ } ##### Preset yFitted, yForecast, errors and basic parameters ##### - yFitted <- rep(NA,obsInsample); + yFitted <- rep(NA,obsInSample); yForecast <- rep(NA,h); - errors <- rep(NA,obsInsample); + errors <- rep(NA,obsInSample); ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=data, xreg=xreg, updateX=updateX, ot=ot, + xregdata <- ssXreg(y=y, xreg=xreg, updateX=updateX, ot=ot, persistenceX=persistenceX, transitionX=transitionX, initialX=initialX, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=maxlag, h=h, xregDo=xregDo, silent=silentText); if(xregDo=="u"){ @@ -558,11 +565,11 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ # If this is tiny sample, use ARIMA with constant instead if(tinySample){ warning("Not enough observations to fit ARIMA. 
Switching to ARIMA(0,0,0) with constant.",call.=FALSE); - return(msarima(data,orders=list(ar=0,i=0,ma=0),lags=1, + return(msarima(y,orders=list(ar=0,i=0,ma=0),lags=1, constant=TRUE, - initial=initial,cfType=cfType, + initial=initial,loss=loss, h=h,holdout=holdout,cumulative=cumulative, - intervals=intervals,level=level, + interval=interval,level=level, occurrence=occurrence, oesmodel=oesmodel, bounds="u", @@ -756,7 +763,7 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ } # Fill in the rest of matvt - matvt <- ts(matvt,start=(time(data)[1] - deltat(data)*maxlag),frequency=frequency(data)); + matvt <- ts(matvt,start=(time(y)[1] - deltat(y)*maxlag),frequency=dataFreq); if(!is.null(xreg)){ matvt <- cbind(matvt,matat[1:nrow(matvt),]); colnames(matvt) <- c(paste0("Component ",c(1:max(1,nComponents))),colnames(matat)); @@ -903,12 +910,12 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ ##### Deal with the holdout sample ##### if(holdout){ - yHoldout <- ts(data[(obsInsample+1):obsAll],start=yForecastStart,frequency=frequency(data)); + yHoldout <- ts(y[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -927,18 +934,18 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ yLowerNew <- yLower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - if(intervals){ + if(interval){ yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend,main=modelname,cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main=modelname,cumulative=cumulative); } } @@ -951,9 +958,9 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ initialType=initialType,initial=initialValue, nParam=parametersNumber, modelLags=modellags, fitted=yFitted,forecast=yForecast,lower=yLower,upper=yUpper,residuals=errors, - errors=errors.mat,s2=s2,intervals=intervalsType,level=level,cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=occurrenceModel, + errors=errors.mat,s2=s2,interval=intervalType,level=level,cumulative=cumulative, + y=y,holdout=yHoldout,occurrence=occurrenceModel, xreg=xreg,updateX=updateX,initialX=initialX,persistenceX=persistenceX,transitionX=transitionX, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,FI=FI,accuracy=errormeasures); + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,FI=FI,accuracy=errormeasures); return(structure(model,class=c("smooth","msarima"))); } diff --git a/R/oes.R b/R/oes.R index f81a866..30ed11d 100644 --- a/R/oes.R +++ b/R/oes.R @@ -6,8 +6,10 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax")); #' probability update and model types. #' #' The function estimates probability of demand occurrence, using the selected -#' ETS state space models. Although the function accepts all types of ETS models, -#' only the pure multiplicative models make sense. 
+#' ETS state space models. +#' +#' For details about the model and its implementation, see the respective +#' vignette: \code{vignette("oes","smooth")} #' #' @template ssIntermittentRef #' @template ssInitialParam @@ -15,9 +17,13 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax")); #' @template ssAuthor #' @template ssKeywords #' -#' @param data Either numeric vector or time series vector. +#' @param y Either numeric vector or time series vector. #' @param model The type of ETS model used for the estimation. Normally this should -#' be \code{"MNN"} or any other pure multiplicative model. +#' be \code{"MNN"} or any other pure multiplicative or additive model. The model +#' selection is available here (although it's not fast), so you can use, for example, +#' \code{"YYN"} and \code{"XXN"} for selecting between the pure multiplicative and +#' pure additive models respectively. Using mixed models is possible, but not +#' recommended. #' @param occurrence The type of model used in probability estimation. Can be #' \code{"none"} - none, #' \code{"fixed"} - constant probability, @@ -34,25 +40,25 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax")); #' @param h The forecast horizon. #' @param holdout If \code{TRUE}, holdout sample of size \code{h} is taken from #' the end of the data. -#' @param intervals The type of intervals to construct. This can be: +#' @param interval The type of interval to construct. This can be: #' #' \itemize{ #' \item \code{none}, aka \code{n} - do not produce prediction #' intervals. #' \item \code{parametric}, \code{p} - use state-space structure of ETS. In #' case of mixed models this is done using simulations, which may take longer #' time than for the pure additive and pure multiplicative models. -#' \item \code{semiparametric}, \code{sp} - intervals based on covariance +#' \item \code{semiparametric}, \code{sp} - interval based on covariance #' matrix of 1 to h steps ahead errors and assumption of normal / log-normal #' distribution (depending on error type). -#' \item \code{nonparametric}, \code{np} - intervals based on values from a +#' \item \code{nonparametric}, \code{np} - interval based on values from a #' quantile regression on error matrix (see Taylor and Bunn, 1999). The model #' used in this process is e[j] = a j^b, where j=1,..,h. #' } #' The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -#' parametric intervals are constructed, while the latter is equivalent to +#' a parametric interval is constructed, while the latter is equivalent to #' \code{none}. #' If the forecasts of the models were combined, then the intervals are combined #' quantile-wise (Lichtendahl et al., 2013). #' @param level The confidence level. Defines width of prediction interval. #' @param bounds What type of bounds to use in the model estimation.
The first @@ -98,17 +104,18 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax")); #' #' \itemize{ #' \item \code{model} - the type of the estimated ETS model; +#' \item \code{timeElapsed} - the time elapsed for the construction of the model; #' \item \code{fitted} - the fitted values for the probability; -#' \item \code{fittedBeta} - the fitted values of the underlying ETS model, where applicable +#' \item \code{fittedModel} - the fitted values of the underlying ETS model, where applicable #' (only for occurrence=c("o","i","d")); #' \item \code{forecast} - the forecast of the probability for \code{h} observations ahead; -#' \item \code{forecastBeta} - the forecast of the underlying ETS model, where applicable +#' \item \code{forecastModel} - the forecast of the underlying ETS model, where applicable #' (only for occurrence=c("o","i","d")); #' \item \code{states} - the values of the state vector; #' \item \code{logLik} - the log-likelihood value of the model; #' \item \code{nParam} - the number of parameters in the model (the matrix is returned); #' \item \code{residuals} - the residuals of the model; -#' \item \code{actuals} - actual values of occurrence (zeros and ones). +#' \item \code{y} - actual values of occurrence (zeros and ones). #' \item \code{persistence} - the vector of smoothing parameters; #' \item \code{phi} - the value of the damped trend parameter; #' \item \code{initial} - initial values of the state vector; @@ -132,10 +139,10 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax")); #' oes(y, occurrence="f") #' #' @export -oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason=NULL, phi=NULL, +oes <- function(y, model="MNN", persistence=NULL, initial="o", initialSeason=NULL, phi=NULL, occurrence=c("fixed","general","odds-ratio","inverse-odds-ratio","direct","auto","none"), ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, bounds=c("usual","admissible","none"), silent=c("all","graph","legend","output","none"), xreg=NULL, xregDo=c("use","select"), initialX=NULL, @@ -143,6 +150,13 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= ...){ # Function returns the occurrence part of the intermittent state space model +# Start measuring the time of calculations + startTime <- Sys.time(); + + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + interval <- depricator(interval, list(...), "intervals"); + # Options for the fitter and forecaster: # O: M / A odds-ratio - "odds-ratio" # I: - M / A inverse-odds-ratio - "inverse-odds-ratio" @@ -158,8 +172,8 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= # If the model is oes or oesg, use it if(is.oesg(model)){ - return(oesg(data, modelA=model$modelA, modelB=model$modelB, h=h, holdout=holdout, - intervals=intervals, level=level, bounds=bounds, + return(oesg(y, modelA=model$modelA, modelB=model$modelB, h=h, holdout=holdout, + interval=interval, level=level, bounds=bounds, silent=silent, ...)); } else if(is.oes(model)){ @@ -178,17 +192,9 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= ##### Preparations ##### occurrence <- substring(occurrence[1],1,1); - if(occurrence=="g"){ - return(oesg(data, modelA=model, modelB=model, persistenceA=persistence, persistenceB=persistence, phiA=phi, phiB=phi, - initialA=initial, initialB=initial, initialSeasonA=initialSeason, initialSeasonB=initialSeason, - ic=ic, h=h, holdout=holdout, intervals=intervals, level=level, bounds=bounds, - silent=silent, xregA=xreg, xregB=xreg, xregDoA=xregDo, xregDoB=xregDo, updateXA=updateX, updateXB=updateX, - persistenceXA=persistenceX, persistenceXB=persistenceX, transitionXA=transitionX, transitionXB=transitionX, - initialXA=initialX, initialXB=initialX, ...)); - } - if(is.smooth.sim(data)){ - data <- data$data; + if(is.smooth.sim(y)){ + y <- y$data; } # Add all the variables in ellipsis to current environment @@ -222,7 +228,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= } #### These are needed in order for ssInput to go forward - cfType <- "MSE"; + loss <- "MSE"; oesmodel <- NULL; ##### Set environment for ssInput and make all the checks ##### @@ -230,8 +236,8 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= ssInput("oes",ParentEnvironment=environment()); ### Produce vectors with zeroes and ones, fixed probability and the number of ones. 
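The depricator() helper called above is not shown in this section; a minimal sketch of how such a shim presumably behaves, assuming it merely falls back to the old argument when one is supplied via the ellipsis (the function name, body and warning text below are illustrative, not the actual implementation):

# Hypothetical sketch of a deprecation shim with the same call pattern
# as depricator(newValue, ellipsis, oldName): if the deprecated argument
# (e.g. "data", "cfType" or "intervals") was passed through ..., warn the
# user and return its value; otherwise keep the value of the new argument.
depricatorSketch <- function(newValue, ellipsis, oldName){
    if(!is.null(ellipsis[[oldName]])){
        warning(paste0("The '",oldName,"' parameter is deprecated and will be ",
                       "removed in smooth 2.6.0."), call.=FALSE);
        return(ellipsis[[oldName]]);
    }
    return(newValue);
}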
- ot <- (y!=0)*1; - otAll <- (data!=0)*1; + ot <- (yInSample!=0)*1; + otAll <- (y!=0)*1; iprob <- mean(ot); obsOnes <- sum(ot); @@ -242,9 +248,9 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= } ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=otAll, Etype="A", xreg=xreg, updateX=updateX, ot=rep(1,obsInsample), + xregdata <- ssXreg(y=otAll, Etype="A", xreg=xreg, updateX=updateX, ot=rep(1,obsInSample), persistenceX=persistenceX, transitionX=transitionX, initialX=initialX, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=1, h=h, xregDo=xregDo, silent=silentText, allowMultiplicative=FALSE); @@ -258,15 +264,12 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= xreg <- xregdata$xreg; initialXEstimate <- xreg$initialXEstimate; - # The start time for the forecasts - yForecastStart <- time(data)[obsInsample]+deltat(data); - - #### The functions for the O, I, and P models #### + #### The functions for the O, I, and P models #### if(any(occurrence==c("o","i","d"))){ ##### Initialiser of oes ##### # This creates the states, transition, persistence and measurement matrices oesInitialiser <- function(Etype, Ttype, Stype, damped, phiEstimate, occurrence, - dataFreq, obsInsample, obsAll, obsStates, ot, + dataFreq, obsInSample, obsAll, obsStates, ot, persistenceEstimate, persistence, initialType, initialValue, initialSeasonEstimate, initialSeason){ # Define the lags of the model, number of components and max lag @@ -300,7 +303,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= # Persistence vector. The initials are set here! if(persistenceEstimate){ - vecg <- matrix(0.01, nComponentsAll, 1); + vecg <- matrix(0.05, nComponentsAll, 1); } else{ vecg <- matrix(persistence, nComponentsAll, 1); @@ -343,7 +346,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= # Define the seasonals if(modelIsSeasonal){ if(initialSeasonEstimate){ - XValues <- matrix(rep(diag(modelLagsMax),ceiling(obsInsample/modelLagsMax)),modelLagsMax)[,1:obsInsample]; + XValues <- matrix(rep(diag(modelLagsMax),ceiling(obsInSample/modelLagsMax)),modelLagsMax)[,1:obsInSample]; # The seasonal values should be between -1 and 1 initialSeasonValue <- solve(XValues %*% t(XValues)) %*% XValues %*% (ot - mean(ot)); # But make sure that it lies there @@ -476,7 +479,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -534,7 +537,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -592,7 +595,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- 
c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -656,7 +659,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= return(list(A=A,ALower=ALower,AUpper=AUpper)); } -##### Cost Function for oes ##### + ##### Cost Function for oes ##### CF <- function(A, modelLags, Etype, Ttype, Stype, occurrence, damped, nComponentsAll, nComponentsNonSeasonal, nExovars, modelLagsMax, persistenceEstimate, initialType, phiEstimate, modelIsSeasonal, initialSeasonEstimate, @@ -683,36 +686,47 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= } } - ##### Fixed probability ##### - if(occurrence=="f"){ - model <- "MNN"; - if(initialType!="o"){ - pt <- ts(matrix(rep(initial,obsInsample),obsInsample,1), start=dataStart, frequency=dataFreq); - } - else{ - initial <- iprob; - pt <- ts(matrix(rep(initial,obsInsample),obsInsample,1), start=dataStart, frequency=dataFreq); + ##### Estimate the model ##### + if(modelDo=="estimate"){ + ##### General model - from oesg() ##### + if(occurrence=="g"){ + return(oesg(y, modelA=model, modelB=model, persistenceA=persistence, persistenceB=persistence, phiA=phi, phiB=phi, + initialA=initial, initialB=initial, initialSeasonA=initialSeason, initialSeasonB=initialSeason, + ic=ic, h=h, holdout=holdout, interval=interval, level=level, bounds=bounds, + silent=silent, xregA=xreg, xregB=xreg, xregDoA=xregDo, xregDoB=xregDo, updateXA=updateX, updateXB=updateX, + persistenceXA=persistenceX, persistenceXB=persistenceX, transitionXA=transitionX, transitionXB=transitionX, + initialXA=initialX, initialXB=initialX, ...)); } - names(initial) <- "level"; - pForecast <- ts(rep(pt[1],h), start=yForecastStart, frequency=dataFreq); - errors <- ts(ot-iprob, start=dataStart, frequency=dataFreq); + ##### Fixed probability ##### + else if(occurrence=="f"){ + model <- "MNN"; + if(initialType!="o"){ + pt <- ts(matrix(rep(initial,obsInSample),obsInSample,1), start=dataStart, frequency=dataFreq); + } + else{ + initial <- iprob; + pt <- ts(matrix(rep(initial,obsInSample),obsInSample,1), start=dataStart, frequency=dataFreq); + } + names(initial) <- "level"; + pForecast <- ts(rep(pt[1],h), start=yForecastStart, frequency=dataFreq); + errors <- ts(ot-iprob, start=dataStart, frequency=dataFreq); - parametersNumber[1,c(1,4)] <- 1; + parametersNumber[1,c(1,4)] <- 1; - output <- list(fitted=pt, forecast=pForecast, states=pt, - nParam=parametersNumber, residuals=errors, actuals=otAll, - persistence=matrix(0,1,1,dimnames=list("level",NULL)), - initial=initial, initialSeason=NULL); - } - ##### Odds-ratio, inverse and direct models ##### - else if(any(occurrence==c("o","i","d"))){ - if(modelDo=="estimate"){ + output <- list(fitted=pt, forecast=pForecast, states=pt, + nParam=parametersNumber, residuals=errors, y=otAll, + persistence=matrix(0,1,1,dimnames=list("level",NULL)), + initial=initial, initialSeason=NULL); + } + ##### Odds-ratio, inverse and direct models ##### + else if(any(occurrence==c("o","i","d"))){ # Initialise the model basicparams <- oesInitialiser(Etype, Ttype, Stype, damped, phiEstimate, occurrence, - dataFreq, obsInsample, obsAll, obsStates, ot, + dataFreq, obsInSample, obsAll, obsStates, ot, persistenceEstimate, persistence, initialType, initialValue, initialSeasonEstimate, initialSeason); list2env(basicparams, environment()); + # nComponentsAll, nComponentsNonSeasonal, modelLagsMax, modelLags, matvt, vecg, matF, matw if(damped){ model <- paste0(Etype,Ttype,"d",Stype); @@ -743,7 +757,7 @@ oes <- function(data, model="MNN", persistence=NULL, 
initial="o", initialSeason= ot=ot, bounds=bounds); A <- res$solution; - # Parameters estimated. The variance is not estimated, so not needed + # Parameters estimated. The variance is not estimated, so not needed parametersNumber[1,1] <- length(A); # Write down phi if it was estimated @@ -784,7 +798,7 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= # Produce forecasts if(h>0){ # yForecast is the underlying forecast, while pForecast is the probability forecast - pForecast <- yForecast <- as.vector(forecasterwrap(t(matvt[,(obsInsample+1):(obsInsample+modelLagsMax),drop=FALSE]), + pForecast <- yForecast <- as.vector(forecasterwrap(t(matvt[,(obsInSample+1):(obsInSample+modelLagsMax),drop=FALSE]), elements$matF, elements$matw, h, Etype, Ttype, Stype, modelLags, matxt[(obsAll-h+1):(obsAll),,drop=FALSE], t(matat[,(obsAll-h+1):(obsAll),drop=FALSE]), elements$matFX)); @@ -818,74 +832,332 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= parametersNumber[1,4] <- sum(parametersNumber[1,1:3]); parametersNumber[2,4] <- sum(parametersNumber[2,1:3]); + + # Merge states of vt and at if the xreg was provided + if(!is.null(xreg)){ + matvt <- rbind(matvt,matat); + xreg <- matxt; + } + + if(modelIsSeasonal){ + initialSeason <- matvt[nComponentsAll,1:modelLagsMax]; + } + else{ + initialSeason <- NULL; + } + + #### Form the output #### + output <- list(fitted=pFitted, forecast=pForecast, + states=ts(t(matvt), start=(time(y)[1] - deltat(y)*modelLagsMax), + frequency=dataFreq), + nParam=parametersNumber, residuals=errors, y=otAll, + persistence=vecg, phi=phi, initial=matvt[1:nComponentsNonSeasonal,1], + initialSeason=initialSeason, fittedModel=yFitted, forecastModel=yForecast, + initialX=matat[,1], xreg=xreg, updateX=updateX, transitionX=matFX, persistenceX=vecgX); } + #### Automatic model selection #### + else if(occurrence=="a"){ + IC <- switch(ic, + "AIC"=AIC, + "AICc"=AICc, + "BIC"=BIC, + "BICc"=BICc); + + occurrencePool <- c("f","o","i","d","g"); + occurrencePoolLength <- length(occurrencePool); + occurrenceModels <- vector("list",occurrencePoolLength); + for(i in 1:occurrencePoolLength){ + occurrenceModels[[i]] <- oes(y=y,model=model,occurrence=occurrencePool[i], + ic=ic, h=h, holdout=holdout, + interval=interval, level=level, + bounds=bounds, + silent=TRUE, + xreg=xreg, xregDo=xregDo, updateX=updateX, ...); + } + ICBest <- which.min(sapply(occurrenceModels, IC))[1] + occurrence <- occurrencePool[ICBest]; + + if(!silentGraph){ + graphmaker(actuals=otAll,forecast=occurrenceModels[[ICBest]]$forecast,fitted=occurrenceModels[[ICBest]]$fitted, + legend=!silentLegend,main=paste0(occurrenceModels[[ICBest]]$model,"_",toupper(occurrence))); + } + return(occurrenceModels[[ICBest]]); + } + #### None #### else{ - stop("The model selection and combinations are not implemented in oes just yet", call.=FALSE); + pt <- ts(ot,start=dataStart,frequency=dataFreq); + pForecast <- ts(rep(ot[obsInSample],h), start=yForecastStart, frequency=dataFreq); + errors <- ts(rep(0,obsInSample), start=dataStart, frequency=dataFreq); + parametersNumber[] <- 0; + output <- list(fitted=pt, forecast=pForecast, states=pt, + nParam=parametersNumber, residuals=errors, y=pt, + persistence=NULL, initial=NULL, initialSeason=NULL); } - - # Merge states of vt and at if the xreg was provided - if(!is.null(xreg)){ - matvt <- rbind(matvt,matat); - xreg <- matxt; + } + else if(modelDo=="select"){ + if(!is.null(modelsPool)){ + modelsNumber <- length(modelsPool); + # List for the estimated 
models in the pool + results <- as.list(c(1:modelsNumber)); + j <- 0; } ##### Use branch-and-bound from es() to form the initial pool ##### + else{ + # Define the pool of models in case of "ZZZ" or "CCC" to select from + poolErrors <- c("A","M"); + poolTrends <- c("N","A","Ad","M","Md"); + poolSeasonals <- c("N","A","M"); + + if(all(Etype!=c("Z","C"))){ + poolErrors <- Etype; + } + + # List for the estimated models in the pool + results <- list(NA); + + ### Use brains in order to define models to estimate ### + if(modelDo=="select" & + (any(c(Ttype,Stype)=="X") | any(c(Ttype,Stype)=="Y") | any(c(Ttype,Stype)=="Z"))){ + + # Define information criterion function to use + IC <- switch(ic, + "AIC"=AIC, + "AICc"=AICc, + "BIC"=BIC, + "BICc"=BICc); + + if(!silentText){ + cat("Forming the pool of models based on... "); + } + + # poolErrorsSmall is needed for the preliminary selection + if(Etype!="Z"){ + poolErrors <- poolErrorsSmall <- Etype; + } + else{ + poolErrorsSmall <- "A"; + } + + # Define the trends to check + if(Ttype!="Z"){ + if(Ttype=="X"){ + poolTrendSmall <- c("N","A"); + poolTrends <- c("N","A","Ad"); + trendCheck <- TRUE; + } + else if(Ttype=="Y"){ + poolTrendSmall <- c("N","M"); + poolTrends <- c("N","M","Md"); + trendCheck <- TRUE; + } + else{ + if(damped){ + poolTrendSmall <- paste0(Ttype,"d"); + poolTrends <- poolTrendSmall; + } + else{ + poolTrendSmall <- Ttype; + poolTrends <- Ttype; + } + trendCheck <- FALSE; + } + } + else{ + poolTrendSmall <- c("N","A"); + trendCheck <- TRUE; + } + + # Define seasonality to check + if(Stype!="Z"){ + if(Stype=="X"){ + poolSeasonalSmall <- c("N","A"); + poolSeasonals <- c("N","A"); + seasonalCheck <- TRUE; + } + else if(Stype=="Y"){ + poolSeasonalSmall <- c("N","M"); + poolSeasonals <- c("N","M"); + seasonalCheck <- TRUE; + } + else{ + poolSeasonalSmall <- Stype; + poolSeasonals <- Stype; + seasonalCheck <- FALSE; + } + } + else{ + poolSeasonalSmall <- c("N","A","M"); + seasonalCheck <- TRUE; + } + + # If ZZZ, then the vector is: "ANN" "ANA" "ANM" "AAN" "AAA" "AAM" + poolSmall <- paste0(rep(poolErrorsSmall,length(poolTrendSmall)*length(poolSeasonalSmall)), + rep(poolTrendSmall,each=length(poolSeasonalSmall)), + rep(poolSeasonalSmall,length(poolTrendSmall))); + modelTested <- NULL; + modelCurrent <- ""; + + # Counter + checks for the components + j <- 1; + i <- 0; + check <- TRUE; + besti <- bestj <- 1; + + #### Branch and bound starts here #### + while(check){ + i <- i + 1; + modelCurrent[] <- poolSmall[j]; + if(!silentText){ + cat(paste0(modelCurrent,", ")); + } + + results[[i]] <- oes(y, model=modelCurrent, occurrence=occurrence, h=h, holdout=FALSE, + bounds=bounds, silent=TRUE, xreg=xreg, xregDo=xregDo); + + modelTested <- c(modelTested,modelCurrent); + + if(j>1){ + # If the first is better than the second, then choose first + if(IC(results[[besti]]) <= IC(results[[i]])){ + # If Ttype is the same, then we checked seasonality + if(substring(modelCurrent,2,2) == substring(poolSmall[bestj],2,2)){ + poolSeasonals <- substr(modelType(results[[besti]]), + nchar(modelType(results[[besti]])), + nchar(modelType(results[[besti]]))); + seasonalCheck <- FALSE; + j <- which(poolSmall!=poolSmall[bestj] & + substring(poolSmall,nchar(poolSmall),nchar(poolSmall))==poolSeasonals); + } + # Otherwise we checked trend + else{ + poolTrends <- substr(modelType(results[[besti]]),2,2); + trendCheck <- FALSE; + } + } + else{ + if(substring(modelCurrent,2,2) == substring(poolSmall[besti],2,2)){ + poolSeasonals <-
poolSeasonals[poolSeasonals!=substr(modelType(results[[besti]]), + nchar(modelType(results[[besti]])), + nchar(modelType(results[[besti]])))]; + if(length(poolSeasonals)>1){ + # Select another seasonal model, that is not from the previous iteration and not the current one + bestj <- j; + besti <- i; + j <- 3; + } + else{ + bestj <- j; + besti <- i; + j <- which(substring(poolSmall,nchar(poolSmall),nchar(poolSmall))==poolSeasonals & + substring(poolSmall,2,2)!=substring(modelCurrent,2,2)); + seasonalCheck <- FALSE; + } + } + else{ + poolTrends <- poolTrends[poolTrends!=substr(modelType(results[[besti]]),2,2)]; + besti <- i; + bestj <- j; + trendCheck <- FALSE; + } + } - if(modelIsSeasonal){ - initialSeason <- matvt[nComponentsAll,1:modelLagsMax]; + if(all(!c(trendCheck,seasonalCheck))){ + check <- FALSE; + } + } + else{ + j <- 2; + } + } + + modelsPool <- paste0(rep(poolErrors,each=length(poolTrends)*length(poolSeasonals)), + poolTrends, + rep(poolSeasonals,each=length(poolTrends))); + + modelsPool <- unique(c(modelTested,modelsPool)); + modelsNumber <- length(modelsPool); + j <- length(modelTested); + } + else{ + # Make the corrections in the pool for combinations + if(all(Ttype!=c("Z","C"))){ + if(Ttype=="Y"){ + poolTrends <- c("N","M","Md"); + } + else if(Ttype=="X"){ + poolTrends <- c("N","A","Ad"); + } + else{ + if(damped){ + poolTrends <- paste0(Ttype,"d"); + } + else{ + poolTrends <- Ttype; + } + } + } + if(all(Stype!=c("Z","C"))){ + if(Stype=="Y"){ + poolSeasonals <- c("N","M"); + } + else if(Stype=="X"){ + poolSeasonals <- c("N","A"); + } + else{ + poolSeasonals <- Stype; + } + } + + modelsNumber <- (length(poolErrors)*length(poolTrends)*length(poolSeasonals)); + modelsPool <- paste0(rep(poolErrors,each=length(poolTrends)*length(poolSeasonals)), + poolTrends, + rep(poolSeasonals,each=length(poolTrends))); + j <- 0; + } } - else{ - initialSeason <- NULL; + + ##### Check models in the smaller pool ##### + if(!silentText){ + cat("Estimation progress: "); } + while(j < modelsNumber){ + j <- j + 1; + if(!silentText){ + if(j==1){ + cat("\b"); + } + cat(paste0(rep("\b",nchar(round((j-1)/modelsNumber,2)*100)+1),collapse="")); + cat(paste0(round(j/modelsNumber,2)*100,"%")); + } - #### Form the output #### - output <- list(fitted=pFitted, forecast=pForecast, states=ts(t(matvt), start=(time(data)[1] - deltat(data)*modelLagsMax), - frequency=dataFreq), - nParam=parametersNumber, residuals=errors, actuals=otAll, - persistence=vecg, phi=phi, initial=matvt[1:nComponentsNonSeasonal,1], - initialSeason=initialSeason, fittedBeta=yFitted, forecastBeta=yForecast, - initialX=matat[,1], xreg=xreg, updateX=updateX, transitionX=matFX, persistenceX=vecgX); - } -#### Automatic model selection #### - else if(occurrence=="a"){ - IC <- switch(ic, - "AIC"=AIC, - "AICc"=AICc, - "BIC"=BIC, - "BICc"=BICc); - - occurrencePool <- c("f","o","i","d","g"); - occurrencePoolLength <- length(occurrencePool); - occurrenceModels <- vector("list",occurrencePoolLength); - for(i in 1:occurrencePoolLength){ - occurrenceModels[[i]] <- oes(data=data,model=model,occurrence=occurrencePool[i], - ic=ic, h=h, holdout=holdout, - intervals=intervals, level=level, - bounds=bounds, - silent=TRUE, - xreg=xreg, xregDo=xregDo, updateX=updateX, ...); + modelCurrent <- modelsPool[j]; + + results[[j]] <- oes(y, model=modelCurrent, occurrence=occurrence, h=h, holdout=FALSE, + bounds=bounds, silent=TRUE, xreg=xreg, xregDo=xregDo); } - ICBest <- which.min(sapply(occurrenceModels, IC))[1] - occurrence <- occurrencePool[ICBest]; - if(!silentGraph){ -
graphmaker(actuals=otAll,forecast=occurrenceModels[[ICBest]]$forecast,fitted=occurrenceModels[[ICBest]]$fitted, - legend=!silentLegend,main=paste0(occurrenceModels[[ICBest]]$model,"_",toupper(occurrence))); + if(!silentText){ + cat("... Done! \n"); } - return(occurrenceModels[[ICBest]]); + + # Write down the ICs of all the tested models + icSelection <- sapply(results, IC); + names(icSelection) <- modelsPool; + icSelection[is.nan(icSelection)] <- 1E100; + icBest <- which.min(icSelection); + + output <- results[[icBest]]; + output$ICs <- icSelection; + occurrence[] <- output$occurrence; + model[] <- modelsPool[icBest]; } -#### None #### else{ - pt <- ts(ot,start=dataStart,frequency=dataFreq); - pForecast <- ts(rep(ot[obsInsample],h), start=yForecastStart, frequency=dataFreq); - errors <- ts(rep(0,obsInsample), start=dataStart, frequency=dataFreq); - parametersNumber[] <- 0; - output <- list(fitted=pt, forecast=pForecast, states=pt, - nParam=parametersNumber, residuals=errors, actuals=pt, - persistence=NULL, initial=NULL, initialSeason=NULL); + stop("The model combination is not implemented in oes just yet", call.=FALSE); } # If there was a holdout, measure the accuracy if(holdout){ - yHoldout <- ts(otAll[(obsInsample+1):obsAll],start=yForecastStart,frequency=dataFreq); + yHoldout <- ts(otAll[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); output$accuracy <- measures(yHoldout,pForecast,ot); } else{ @@ -901,20 +1173,18 @@ oes <- function(data, model="MNN", persistence=NULL, initial="o", initialSeason= modelname <- "oETS"; } output$occurrence <- occurrence; - output$model <- paste0(modelname,"[",toupper(occurrence),"]"); - if(any(occurrence==c("o","i","d","g"))){ - output$model <- paste0(output$model,"(",model,")"); - } + output$model <- paste0(modelname,"[",toupper(occurrence),"]","(",model,")"); + output$timeElapsed <- Sys.time()-startTime; ##### Make a plot ##### if(!silentGraph){ - # if(intervals){ + # if(interval){ # graphmaker(actuals=otAll, forecast=yForecastNew, fitted=pFitted, lower=yLowerNew, upper=yUpperNew, # level=level,legend=!silentLegend,main=output$model); # } # else{ - graphmaker(actuals=otAll,forecast=output$forecast,fitted=output$fitted, - legend=!silentLegend,main=paste0(output$model)); + graphmaker(actuals=otAll,forecast=output$forecast,fitted=output$fitted, + legend=!silentLegend,main=paste0(output$model)); # } } diff --git a/R/oesg.R b/R/oesg.R index e24c4a1..e432419 100644 --- a/R/oesg.R +++ b/R/oesg.R @@ -7,8 +7,10 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax","updateX","xreg #' The function estimates probability of demand occurrence, based on the iETS_G #' state-space model. It involves the estimation and modelling of the two #' simultaneous state space equations. Thus two parts for the model type, -#' persistence, initials etc. Although the function accepts all types -#' of ETS models, only the pure multiplicative models make sense. +#' persistence, initials etc. +#' +#' For the details about the model and its implementation, see the respective +#' vignette: \code{vignette("oes","smooth")} #' #' The model is based on: #' @@ -21,7 +23,7 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax","updateX","xreg #' @template ssAuthor #' @template ssKeywords #' -#' @param data Either numeric vector or time series vector. +#' @param y Either numeric vector or time series vector. #' @param modelA The type of the ETS for the model A. #' @param modelB The type of the ETS for the model B. 
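A brief usage sketch of the selection logic added to oes() above (values illustrative only; occurrence="auto" loops over the pool c("f","o","i","d","g") and returns the IC-best model, while pool strings such as "YYN" trigger the branch-and-bound ETS search):

# Illustrative only: an artificial intermittent series
set.seed(41)
y <- rpois(120, 0.3)
# Select the occurrence type by information criterion
oesAuto <- oes(y, model="MNN", occurrence="auto", h=12)
# Select among the pure multiplicative ETS models for the odds-ratio occurrence
oesSelect <- oes(y, model="YYN", occurrence="odds-ratio", h=12)
oesSelect$ICs  # information criteria of all the tested models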
#' @param persistenceA The persistence vector \eqn{g}, containing smoothing @@ -44,23 +46,23 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax","updateX","xreg #' @param h Forecast horizon. #' @param holdout If \code{TRUE}, holdout sample of size \code{h} is taken from #' the end of the data. -#' @param intervals Type of intervals to construct. This can be: +#' @param interval Type of interval to construct. This can be: #' #' \itemize{ #' \item \code{none}, aka \code{n} - do not produce prediction #' intervals. #' \item \code{parametric}, \code{p} - use state-space structure of ETS. In #' case of mixed models this is done using simulations, which may take longer #' time than for the pure additive and pure multiplicative models. -#' \item \code{semiparametric}, \code{sp} - intervals based on covariance +#' \item \code{semiparametric}, \code{sp} - interval based on covariance #' matrix of 1 to h steps ahead errors and assumption of normal / log-normal #' distribution (depending on error type). -#' \item \code{nonparametric}, \code{np} - intervals based on values from a +#' \item \code{nonparametric}, \code{np} - interval based on values from a #' quantile regression on error matrix (see Taylor and Bunn, 1999). The model #' used in this process is e[j] = a j^b, where j=1,..,h. #' } #' The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -#' parametric intervals are constructed, while the latter is equivalent to +#' a parametric interval is constructed, while the latter is equivalent to #' \code{none}. #' @param level Confidence level. Defines width of prediction interval. #' @param bounds What type of bounds to use in the model estimation. The first @@ -115,9 +117,9 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax","updateX","xreg #' values: #' #' \itemize{ -#' \item \code{modelA} - the model A of the class oesg, that contains the output similar +#' \item \code{modelA} - the model A of the class oes, which contains the output similar #' to the one from the \code{oes()} function; -#' \item \code{modelB} - the model B of the class oesg, that contains the output similar +#' \item \code{modelB} - the model B of the class oes, which contains the output similar #' to the one from the \code{oes()} function. #' } #' @seealso \code{\link[smooth]{es}, \link[smooth]{oes}} @@ -129,11 +131,11 @@ utils::globalVariables(c("modelDo","initialValue","modelLagsMax","updateX","xreg #' oesg(y, modelA="MNN", modelB="ANN") #' #' @export -oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenceB=NULL, +oesg <- function(y, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenceB=NULL, phiA=NULL, phiB=NULL, initialA="o", initialB="o", initialSeasonA=NULL, initialSeasonB=NULL, ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, bounds=c("usual","admissible","none"), silent=c("all","graph","legend","output","none"), xregA=NULL, xregB=NULL, initialXA=NULL, initialXB=NULL, @@ -143,11 +145,18 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc ...){ # Function returns the occurrence part of the intermittent state space model, type G +# Start measuring the time of calculations + startTime <- Sys.time(); + + ##### Check if data was used instead of y.
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + interval <- depricator(interval, list(...), "intervals"); + ##### Preparations ##### occurrence <- "g"; - if(is.smooth.sim(data)){ - data <- data$data; + if(is.smooth.sim(y)){ + y <- y$data; } # Add all the variables in ellipsis to current environment @@ -216,7 +225,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc } #### These are needed in order for ssInput to go forward - cfType <- "MSE"; + loss <- "MSE"; oesmodel <- NULL; #### First call for the environment #### @@ -233,21 +242,21 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc ssInput("oes",ParentEnvironment=environment()); ### Produce vectors with zeroes and ones, fixed probability and the number of ones. - ot <- (y!=0)*1; - otAll <- (data!=0)*1; + ot <- (yInSample!=0)*1; + otAll <- (y!=0)*1; iprob <- mean(ot); obsOnes <- sum(ot); if(all(ot==ot[1])){ warning(paste0("There is no variability in the occurrence of the variable in-sample.\n", "Switching to occurrence='none'."),call.=FALSE) - return(oes(data,occurrence="n")); + return(oes(y,occurrence="n")); } ### Prepare exogenous variables - xregdata <- ssXreg(data=otAll, Etype="A", xreg=xregA, updateX=updateXA, ot=rep(1,obsInsample), + xregdata <- ssXreg(y=otAll, Etype="A", xreg=xregA, updateX=updateXA, ot=rep(1,obsInSample), persistenceX=persistenceXA, transitionX=transitionXA, initialX=initialXA, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=1, h=h, xregDo=xregDoA, silent=silentText, allowMultiplicative=FALSE); @@ -298,9 +307,9 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc ssInput("oes",ParentEnvironment=environment()); ### Prepare exogenous variables - xregdata <- ssXreg(data=otAll, Etype="A", xreg=xregB, updateX=updateXB, ot=rep(1,obsInsample), + xregdata <- ssXreg(y=otAll, Etype="A", xreg=xregB, updateX=updateXB, ot=rep(1,obsInSample), persistenceX=persistenceXB, transitionX=transitionXB, initialX=initialXB, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=1, h=h, xregDo=xregDoB, silent=silentText, allowMultiplicative=FALSE); @@ -337,14 +346,11 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc xregB <- xregdata$xreg; initialXEstimateB <- xregdata$initialXEstimate; - # The start time for the forecasts - yForecastStart <- time(data)[obsInsample]+deltat(data); - #### The functions for the model #### ##### Initialiser of oes. This is called separately for each model ##### # This creates the states, transition, persistence and measurement matrices oesInitialiser <- function(Etype, Ttype, Stype, damped, - dataFreq, obsInsample, obsAll, obsStates, ot, + dataFreq, obsInSample, obsAll, obsStates, ot, persistenceEstimate, persistence, initialType, initialValue, initialSeasonEstimate, initialSeason, modelType){ # Define the lags of the model, number of components and max lag @@ -370,7 +376,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc # Persistence vector. The initials are set here! 
if(persistenceEstimate){ - vecg <- matrix(0.01, nComponentsAll, 1); + vecg <- matrix(0.05, nComponentsAll, 1); } else{ vecg <- matrix(persistence, nComponentsAll, 1); @@ -416,7 +422,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc # Define the seasonals if(modelIsSeasonal){ if(initialSeasonEstimate){ - XValues <- matrix(rep(diag(modelLagsMax),ceiling(obsInsample/modelLagsMax)),modelLagsMax)[,1:obsInsample]; + XValues <- matrix(rep(diag(modelLagsMax),ceiling(obsInSample/modelLagsMax)),modelLagsMax)[,1:obsInSample]; # The seasonal values should be between -1 and 1 initialSeasonValue <- solve(XValues %*% t(XValues)) %*% XValues %*% (ot - mean(ot)); # But make sure that it lies there @@ -543,7 +549,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -601,7 +607,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -659,7 +665,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc else{ if(Ttype=="A"){ # This is something like ETS(M,A,N), so set level to mean, trend to zero for stability - A <- c(A,mean(ot[1:min(dataFreq,obsInsample)]),1E-5); + A <- c(A,mean(ot[1:min(dataFreq,obsInSample)]),1E-5); ALower <- c(ALower,-Inf); AUpper <- c(AUpper,Inf); } @@ -772,11 +778,11 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc if(modelDo=="estimate"){ # Initialise the model basicparamsA <- oesInitialiser(EtypeA, TtypeA, StypeA, dampedA, - dataFreq, obsInsample, obsAll, obsStatesA, ot, + dataFreq, obsInSample, obsAll, obsStatesA, ot, persistenceEstimateA, persistenceA, initialTypeA, initialValueA, initialSeasonEstimateA, initialSeasonA, modelType="A"); basicparamsB <- oesInitialiser(EtypeB, TtypeB, StypeB, dampedB, - dataFreq, obsInsample, obsAll, obsStatesB, ot, + dataFreq, obsInSample, obsAll, obsStatesB, ot, persistenceEstimateB, persistenceB, initialTypeB, initialValueB, initialSeasonEstimateB, initialSeasonB, modelType="B"); @@ -903,12 +909,12 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc # Produce forecasts if(h>0){ # aForecast is the underlying forecast of the model A - aForecast <- as.vector(forecasterwrap(t(matvtA[,(obsInsample+1):(obsInsample+basicparamsA$modelLagsMax),drop=FALSE]), + aForecast <- as.vector(forecasterwrap(t(matvtA[,(obsInSample+1):(obsInSample+basicparamsA$modelLagsMax),drop=FALSE]), matFA, matwA, h, EtypeA, TtypeA, StypeA, basicparamsA$modelLags, matxtA[(obsAll-h+1):(obsAll),,drop=FALSE], t(matatA[,(obsAll-h+1):(obsAll),drop=FALSE]), matFXA)); # bForecast is the underlying forecast of the model B - bForecast <- as.vector(forecasterwrap(t(matvtB[,(obsInsample+1):(obsInsample+basicparamsB$modelLagsMax),drop=FALSE]), + bForecast <- as.vector(forecasterwrap(t(matvtB[,(obsInSample+1):(obsInSample+basicparamsB$modelLagsMax),drop=FALSE]), matFB, matwB, h, EtypeB, TtypeB, StypeB, basicparamsB$modelLags, 
matxtB[(obsAll-h+1):(obsAll),,drop=FALSE], t(matatB[,(obsAll-h+1):(obsAll),drop=FALSE]), matFXB)); @@ -966,7 +972,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc parametersNumberB[2,4] <- sum(parametersNumberB[2,1:3]); if(holdout){ - yHoldout <- ts(otAll[(obsInsample+1):obsAll],start=yForecastStart,frequency=dataFreq); + yHoldout <- ts(otAll[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); errormeasures <- measures(yHoldout,pForecast,ot); } else{ @@ -975,7 +981,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc } } else{ - stop("The model selection and combinations are not implemented in oes just yet", call.=FALSE); + stop("The model selection and combinations are not implemented in oesg() just yet", call.=FALSE); } # Merge states of vt and at if the xreg was provided @@ -1018,19 +1024,23 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc #### Prepare the output #### # Prepare two models - modelA <- list(model=paste0(modelnameA,"(",modelA,")_A"), states=ts(t(matvtA), start=(time(data)[1] - deltat(data)*basicparamsA$modelLagsMax), frequency=dataFreq), + modelA <- list(model=paste0(modelnameA,"(",modelA,")_A"), + states=ts(t(matvtA), start=(time(y)[1] - deltat(y)*basicparamsA$modelLagsMax), + frequency=dataFreq), nParam=parametersNumberA, residuals=errorsA, occurrence="g", persistence=vecgA, phi=phiA, initial=matvtA[1:basicparamsA$nComponentsNonSeasonal,1], initialSeason=initialSeasonA, - fittedBeta=aFitted, forecastBeta=aForecast, + fittedModel=aFitted, forecastModel=aForecast, initialX=matatA[,1], xreg=xregA); class(modelA) <- c("oes","smooth"); - modelB <- list(model=paste0(modelnameB,"(",modelB,")_B"), states=ts(t(matvtB), start=(time(data)[1] - deltat(data)*basicparamsB$modelLagsMax), frequency=dataFreq), + modelB <- list(model=paste0(modelnameB,"(",modelB,")_B"), + states=ts(t(matvtB), start=(time(y)[1] - deltat(y)*basicparamsB$modelLagsMax), + frequency=dataFreq), nParam=parametersNumberB, residuals=errorsB, occurrence="g", persistence=vecgB, phi=phiB, initial=matvtB[1:basicparamsB$nComponentsNonSeasonal,1], initialSeason=initialSeasonB, - fittedBeta=bFitted, forecastBeta=bForecast, + fittedModel=bFitted, forecastModel=bForecast, initialX=matatB[,1], xreg=xregB); class(modelB) <- c("oes","smooth"); @@ -1042,13 +1052,14 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc modelname <- "oETS"; } # Start forming the output - output <- list(model=paste0(modelname,"[G](",modelType(modelA),")(",modelType(modelB),")"), occurrence="g", actuals=otAll, + output <- list(model=paste0(modelname,"[G](",modelType(modelA),")(",modelType(modelB),")"), occurrence="g", y=otAll, fitted=pFitted, forecast=pForecast, modelA=modelA, modelB=modelB, nParam=parametersNumberA+parametersNumberB); + output$timeElapsed <- Sys.time()-startTime; # If there was a holdout, measure the accuracy if(holdout){ - yHoldout <- ts(otAll[(obsInsample+1):obsAll],start=yForecastStart,frequency=dataFreq); + yHoldout <- ts(otAll[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); output$accuracy <- measures(yHoldout,pForecast,ot); } else{ @@ -1058,7 +1069,7 @@ oesg <- function(data, modelA="MNN", modelB="MNN", persistenceA=NULL, persistenc ##### Make a plot ##### if(!silentGraph){ - # if(intervals){ + # if(interval){ # graphmaker(actuals=otAll, forecast=yForecastNew, fitted=pFitted, lower=yLowerNew, upper=yUpperNew, # 
level=level,legend=!silentLegend,main=output$model); # } diff --git a/R/simces.R b/R/simces.R index 141b192..2d6f927 100644 --- a/R/simces.R +++ b/R/simces.R @@ -3,6 +3,8 @@ #' Function generates data using CES with Single Source of Error as a data #' generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} #' #' @template ssSimParam #' @template ssAuthor diff --git a/R/simes.R b/R/simes.R index f59351a..8459f91 100644 --- a/R/simes.R +++ b/R/simes.R @@ -3,6 +3,8 @@ #' Function generates data using ETS with Single Source of Error as a data #' generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} #' #' @template ssSimParam #' @template ssAuthor diff --git a/R/simgum.R b/R/simgum.R index f65b3cd..8b991e4 100644 --- a/R/simgum.R +++ b/R/simgum.R @@ -3,6 +3,8 @@ #' Function generates data using GUM with Single Source of Error as a data #' generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} #' #' @template ssSimParam #' @template ssPersistenceParam diff --git a/R/simsma.R b/R/simsma.R index e99892d..765580a 100644 --- a/R/simsma.R +++ b/R/simsma.R @@ -3,6 +3,8 @@ #' Function generates data using SMA in a Single Source of Error state space #' model as a data generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} #' #' @template ssSimParam #' @template ssAuthor diff --git a/R/simssarima.R b/R/simssarima.R index d8fb929..a1d6e36 100644 --- a/R/simssarima.R +++ b/R/simssarima.R @@ -3,6 +3,8 @@ #' Function generates data using SSARIMA with Single Source of Error as a data #' generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} #' #' @template ssSimParam #' @template ssAuthor @@ -242,7 +244,7 @@ elementsGenerator <- function(ar.orders=ar.orders, ma.orders=ma.orders, i.orders # Get rid of duplicates in lags if(length(unique(lags))!=length(lags)){ - if(frequency(data)!=1){ + if(frequency!=1){ warning(paste0("'lags' variable contains duplicates: (",paste0(lags,collapse=","), "). Getting rid of some of them."),call.=FALSE); } diff --git a/R/simves.R b/R/simves.R index 6b5a60b..28747d6 100644 --- a/R/simves.R +++ b/R/simves.R @@ -4,6 +4,9 @@ utils::globalVariables(c("mvrnorm")); #' #' Function generates data using VES model as a data generating process. #' +#' For information about the function, see the vignette: +#' \code{vignette("simulate","smooth")} +#' #' @template ssAuthor #' @template vssKeywords #' diff --git a/R/sma.R b/R/sma.R index 050985b..1865aab 100644 --- a/R/sma.R +++ b/R/sma.R @@ -17,6 +17,9 @@ utils::globalVariables(c("yForecastStart")); #' #' Where \eqn{v_{t}} is a state vector. #' +#' For more information about the model and its implementation, see the +#' vignette: \code{vignette("sma","smooth")} +#' #' @template ssBasicParam #' @template ssAuthor #' @template ssKeywords @@ -49,23 +52,23 @@ utils::globalVariables(c("yForecastStart")); #' \item \code{fitted} - the fitted values. #' \item \code{forecast} - the point forecast. #' \item \code{lower} - the lower bound of prediction interval. When -#' \code{intervals=FALSE} then NA is returned. +#' \code{interval=FALSE} then NA is returned. #' \item \code{upper} - the higher bound of prediction interval. When -#' \code{intervals=FALSE} then NA is returned.
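For the sim-family functions documented above, a minimal usage sketch (values illustrative only; it assumes the usual sim.es() signature and relies on the generated series being stored in $data, as the is.smooth.sim() branches earlier in this diff do):

# Illustrative only: five series of 120 monthly observations
# generated from an ETS(M,N,N) process
simData <- sim.es(model="MNN", obs=120, nsim=5, frequency=12)
plot(simData$data[,1])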
+#' \code{interval=FALSE} then NA is returned. #' \item \code{residuals} - the residuals of the estimated model. #' \item \code{errors} - The matrix of 1 to h steps ahead errors. #' \item \code{s2} - variance of the residuals (taking degrees of freedom into #' account). -#' \item \code{intervals} - type of intervals asked by user. -#' \item \code{level} - confidence level for intervals. +#' \item \code{interval} - type of interval asked by user. +#' \item \code{level} - confidence level for interval. #' \item \code{cumulative} - whether the produced forecast was cumulative or not. -#' \item \code{actuals} - the original data. +#' \item \code{y} - the original data. #' \item \code{holdout} - the holdout part of the original data. #' \item \code{ICs} - values of information criteria of the model. Includes AIC, #' AICc, BIC and BICc. #' \item \code{logLik} - log-likelihood of the function. -#' \item \code{cf} - Cost function value. -#' \item \code{cfType} - Type of cost function used in the estimation. +#' \item \code{lossValue} - Cost function value. +#' \item \code{loss} - Type of loss function used in the estimation. #' \item \code{accuracy} - vector of accuracy measures for the #' holdout sample. Includes: MPE, MAPE, SMAPE, MASE, sMAE, RelMAE, sMSE and #' Bias coefficient (based on complex numbers). This is available only when @@ -86,19 +89,19 @@ utils::globalVariables(c("yForecastStart")); #' @examples #' #' # SMA of specific order -#' ourModel <- sma(rnorm(118,100,3),order=12,h=18,holdout=TRUE,intervals="p") +#' ourModel <- sma(rnorm(118,100,3),order=12,h=18,holdout=TRUE,interval="p") #' #' # SMA of arbitrary order -#' ourModel <- sma(rnorm(118,100,3),h=18,holdout=TRUE,intervals="sp") +#' ourModel <- sma(rnorm(118,100,3),h=18,holdout=TRUE,interval="sp") #' #' summary(ourModel) #' forecast(ourModel) #' plot(forecast(ourModel)) #' #' @export sma -sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), +sma <- function(y, order=NULL, ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, silent=c("all","graph","legend","output","none"), ...){ # Function constructs simple moving average in state space model @@ -108,6 +111,10 @@ sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + interval <- depricator(interval, list(...), "intervals"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); @@ -128,7 +135,7 @@ sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), occurrence <- "none"; oesmodel <- NULL; bounds <- "admissible"; - cfType <- "MSE"; + loss <- "MSE"; xreg <- NULL; nExovars <- 1; @@ -137,9 +144,9 @@ sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), ssInput("sma",ParentEnvironment=environment()); ##### Preset yFitted, yForecast, errors and basic parameters ##### - yFitted <- rep(NA,obsInsample); + yFitted <- rep(NA,obsInSample); yForecast <- rep(NA,h); - errors <- rep(NA,obsInsample); + errors <- rep(NA,obsInSample); maxlag <- 1; # These three are needed in order to use ssgeneralfun.cpp functions @@ -148,7 +155,7 @@ sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), Stype <- "N"; if(!is.null(order)){ - if(obsInsample < order){ + if(obsInSample < order){ stop("Sorry, but we don't have enough observations for that order.",call.=FALSE); } @@ -176,7 +183,7 @@ sma <- function(data, order=NULL, ic=c("AICc","AIC","BIC","BICc"), # Cost function for GES CF <- function(C){ - fitting <- fitterwrap(matvt, matF, matw, y, vecg, + fitting <- fitterwrap(matvt, matF, matw, yInSample, vecg, modellags, Etype, Ttype, Stype, initialType, matxt, matat, matFX, vecgX, ot); @@ -201,7 +208,7 @@ CreatorSMA <- function(silentText=FALSE,...){ } vecg <- matrix(1/nComponents,nComponents); matvt <- matrix(NA,obsStates,nComponents); - matvt[1:nComponents,1] <- rep(mean(y[1:nComponents]),nComponents); + matvt[1:nComponents,1] <- rep(mean(yInSample[1:nComponents]),nComponents); if(nComponents>1){ for(i in 2:nComponents){ matvt[1:(nComponents-i+1),i] <- matvt[1:(nComponents-i+1)+1,i-1] - matvt[1:(nComponents-i+1),1] * matF[i-1,1]; @@ -211,9 +218,9 @@ CreatorSMA <- function(silentText=FALSE,...){ modellags <- rep(1,nComponents); ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=data, xreg=NULL, updateX=FALSE, + xregdata <- ssXreg(y=y, xreg=NULL, updateX=FALSE, persistenceX=NULL, transitionX=NULL, initialX=NULL, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, maxlag=maxlag, h=h, silent=silentText); + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=maxlag, h=h, silent=silentText); matxt <- xregdata$matxt; matat <- xregdata$matat; matFX <- xregdata$matFX; @@ -238,7 +245,7 @@ CreatorSMA <- function(silentText=FALSE,...){ environment(ssFitter) <- environment(); if(orderSelect){ - maxOrder <- min(200,obsInsample); + maxOrder <- min(200,obsInSample); ICs <- rep(NA,maxOrder); smaValuesAll <- list(NA); for(i in 1:maxOrder){ @@ -265,12 +272,12 @@ CreatorSMA <- function(silentText=FALSE,...){ ##### Do final check and make some preparations for output ##### if(holdout==T){ - yHoldout <- ts(data[(obsInsample+1):obsAll],start=yForecastStart,frequency=frequency(data)); + yHoldout <- ts(y[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -291,18 +298,18 @@ CreatorSMA <- function(silentText=FALSE,...){ yLowerNew <- yLower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - 
if(intervals){ + if(interval){ yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend,main=modelname,cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main=modelname,cumulative=cumulative); } } @@ -313,8 +320,8 @@ CreatorSMA <- function(silentText=FALSE,...){ measurement=matw, order=order, initial=matvt[1,], initialType=initialType, nParam=parametersNumber, fitted=yFitted,forecast=yForecast,lower=yLower,upper=yUpper,residuals=errors, - errors=errors.mat,s2=s2,intervals=intervalsType,level=level,cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=NULL, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,accuracy=errormeasures); + errors=errors.mat,s2=s2,interval=intervalType,level=level,cumulative=cumulative, + y=y,holdout=yHoldout,occurrence=NULL, + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,accuracy=errormeasures); return(structure(model,class="smooth")); } diff --git a/R/smoothCombine.R b/R/smoothCombine.R index 716d1e4..a8850ff 100644 --- a/R/smoothCombine.R +++ b/R/smoothCombine.R @@ -8,11 +8,11 @@ #' framework. Due to the the complexity of some of the models, the #' estimation process may take some time. So be patient. #' -#' The prediction intervals are combined either probability-wise or +#' The prediction interval are combined either probability-wise or #' quantile-wise (Lichtendahl et al., 2013), which may take extra time, #' because we need to produce all the distributions for all the models. #' This can be sped up with the smaller value for bins parameter, but -#' the resulting intervals may be imprecise. +#' the resulting interval may be imprecise. #' #' @template ssBasicParam #' @template ssAdvancedParam @@ -29,11 +29,11 @@ #' @param initial Can be \code{"optimal"}, meaning that the initial #' states are optimised, or \code{"backcasting"}, meaning that the #' initials are produced using backcasting procedure. -#' @param bins The number of bins for the prediction intervals. +#' @param bins The number of bins for the prediction interval. #' The lower value means faster work of the function, but less #' precise estimates of the quantiles. This needs to be an even #' number. -#' @param intervalsCombine How to average the prediction intervals: +#' @param intervalCombine How to average the prediction interval: #' quantile-wise (\code{"quantile"}) or probability-wise #' (\code{"probability"}). #' @param ... This currently determines nothing. @@ -42,19 +42,19 @@ #' \item \code{timeElapsed} - time elapsed for the construction of the model. #' \item \code{initialType} - type of the initial values used. #' \item \code{fitted} - fitted values of ETS. -#' \item \code{quantiles} - the 3D array of produced quantiles if \code{intervals!="none"} +#' \item \code{quantiles} - the 3D array of produced quantiles if \code{interval!="none"} #' with the dimensions: (number of models) x (bins) x (h). #' \item \code{forecast} - point forecast of ETS. -#' \item \code{lower} - lower bound of prediction interval. When \code{intervals="none"} +#' \item \code{lower} - lower bound of prediction interval. 
When \code{interval="none"} #' then NA is returned. -#' \item \code{upper} - higher bound of prediction interval. When \code{intervals="none"} +#' \item \code{upper} - higher bound of prediction interval. When \code{interval="none"} #' then NA is returned. #' \item \code{residuals} - residuals of the estimated model. #' \item \code{s2} - variance of the residuals (taking degrees of freedom into account). -#' \item \code{intervals} - type of intervals asked by user. -#' \item \code{level} - confidence level for intervals. +#' \item \code{interval} - type of interval asked by user. +#' \item \code{level} - confidence level for interval. #' \item \code{cumulative} - whether the produced forecast was cumulative or not. -#' \item \code{actuals} - original data. +#' \item \code{y} - original data. #' \item \code{holdout} - holdout part of the original data. #' \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated. #' If the model is non-intermittent, then occurrence is \code{NULL}. @@ -77,7 +77,7 @@ #' #' library(Mcomp) #' -#' ourModel <- smoothCombine(M3[[578]],intervals="p") +#' ourModel <- smoothCombine(M3[[578]],interval="p") #' plot(ourModel) #' #' # models parameter accepts either previously estimated smoothCombine @@ -90,12 +90,12 @@ #' #' @importFrom stats fitted #' @export smoothCombine -smoothCombine <- function(data, models=NULL, +smoothCombine <- function(y, models=NULL, initial=c("optimal","backcasting"), ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, - bins=200, intervalsCombine=c("quantile","probability"), + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, + bins=200, intervalCombine=c("quantile","probability"), occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","probability"), oesmodel="MNN", bounds=c("admissible","none"), @@ -108,6 +108,11 @@ smoothCombine <- function(data, models=NULL, # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + if(any(is.smoothC(models))){ ourQuantiles <- models$quantiles; models <- models$models; @@ -137,8 +142,8 @@ smoothCombine <- function(data, models=NULL, IC <- BICc; } - # Grab the type of intervals combination - intervalsCombine <- substr(intervalsCombine[1],1,1); + # Grab the type of interval combination + intervalCombine <- substr(intervalCombine[1],1,1); modelsNotProvided <- is.null(models); if(modelsNotProvided){ @@ -156,40 +161,40 @@ smoothCombine <- function(data, models=NULL, if(!silentText){ cat("ES"); } - esModel <- es(data,initial=initial,ic=ic,cfType=cfType,h=h,holdout=holdout, - cumulative=cumulative,intervals="n",occurrence=occurrence, + esModel <- es(y,initial=initial,ic=ic,loss=loss,h=h,holdout=holdout, + cumulative=cumulative,interval="n",occurrence=occurrence, oesmodel=oesmodel,bounds=bounds,silent=TRUE, xreg=xreg,xregDo=xregDo,updateX=updateX, initialX=initialX,persistenceX=persistenceX,transitionX=transitionX); if(!silentText){ cat(", CES"); } - cesModel <- auto.ces(data,initial=initial,ic=ic,cfType=cfType,h=h,holdout=holdout, - cumulative=cumulative,intervals="n",occurrence=occurrence, + cesModel <- auto.ces(y,initial=initial,ic=ic,loss=loss,h=h,holdout=holdout, + cumulative=cumulative,interval="n",occurrence=occurrence, oesmodel=oesmodel,bounds=bounds,silent=TRUE, xreg=xreg,xregDo=xregDo,updateX=updateX, initialX=initialX,persistenceX=persistenceX,transitionX=transitionX); if(!silentText){ cat(", SSARIMA"); } - ssarimaModel <- auto.ssarima(data,initial=initial,ic=ic,cfType=cfType,h=h,holdout=holdout, - cumulative=cumulative,intervals="n",occurrence=occurrence, + ssarimaModel <- auto.ssarima(y,initial=initial,ic=ic,loss=loss,h=h,holdout=holdout, + cumulative=cumulative,interval="n",occurrence=occurrence, oesmodel=oesmodel,bounds=bounds,silent=TRUE, xreg=xreg,xregDo=xregDo,updateX=updateX, initialX=initialX,persistenceX=persistenceX,transitionX=transitionX); if(!silentText){ cat(", GUM"); } - gumModel <- auto.gum(data,initial=initial,ic=ic,cfType=cfType,h=h,holdout=holdout, - cumulative=cumulative,intervals="n",occurrence=occurrence, + gumModel <- auto.gum(y,initial=initial,ic=ic,loss=loss,h=h,holdout=holdout, + cumulative=cumulative,interval="n",occurrence=occurrence, oesmodel=oesmodel,bounds=bounds,silent=TRUE, xreg=xreg,xregDo=xregDo,updateX=updateX, initialX=initialX,persistenceX=persistenceX,transitionX=transitionX); if(!silentText){ cat(", SMA"); } - smaModel <- sma(data,ic=ic,h=h,holdout=holdout, - cumulative=cumulative,intervals="n",silent=TRUE); + smaModel <- sma(y,ic=ic,h=h,holdout=holdout, + cumulative=cumulative,interval="n",silent=TRUE); if(!silentText){ cat(". 
Done!\n"); } @@ -197,10 +202,9 @@ smoothCombine <- function(data, models=NULL, names(models) <- c("ETS","CES","SSARIMA","GUM","SMA"); } - yForecastTest <- forecast(models[[1]],h=h,intervals="none",holdout=holdout); - yForecastStart <- start(yForecastTest$mean); + yForecastTest <- forecast(models[[1]],h=h,interval="none",holdout=holdout); yHoldout <- yForecastTest$model$holdout; - y <- yForecastTest$model$actuals; + yInSample <- yForecastTest$model$y; # Calculate AIC weights ICs <- unlist(lapply(models, IC)); @@ -212,7 +216,7 @@ smoothCombine <- function(data, models=NULL, icBest <- min(ICs); icWeights <- exp(-0.5*(ICs-icBest)) / sum(exp(-0.5*(ICs-icBest))); - modelsForecasts <- lapply(models,forecast,h=h,intervals=intervals, + modelsForecasts <- lapply(models,forecast,h=h,interval=interval, level=0,holdout=holdout,cumulative=cumulative, xreg=xreg); @@ -224,13 +228,13 @@ smoothCombine <- function(data, models=NULL, lower <- upper <- NA; - if(intervalsType!="n"){ - #### This part is for combining the prediction intervals #### + if(intervalType!="n"){ + #### This part is for combining the prediction interval #### quantilesReturned <- matrix(NA,2,h,dimnames=list(c("Lower","Upper"),paste0("h",c(1:h)))); # Minimum and maximum quantiles minMaxQuantiles <- matrix(NA,2,h); - if(intervalsCombine=="p"){ + if(intervalCombine=="p"){ # Probability-based combination if((abs(bins) %% 2)<=1e-100){ bins <- bins-1; @@ -251,7 +255,7 @@ smoothCombine <- function(data, models=NULL, # If quantiles weren't provided by the previous model, produce them if(quantilesRedo){ - # This is needed for appropriate combination of prediction intervals + # This is needed for appropriate combination of prediction interval ourQuantiles <- array(NA,c(nModels,bins,h),dimnames=list(names(models), c(1:bins)/(bins+1), colnames(quantilesReturned))); @@ -260,7 +264,7 @@ smoothCombine <- function(data, models=NULL, ourQuantiles[,"0.5",] <- t(as.matrix(as.data.frame(lapply(modelsForecasts,`[[`,"lower")))); if(!silentText){ - cat("Constructing prediction intervals... "); + cat("Constructing prediction interval... 
"); } # Do loop writing down all the quantiles for(j in 1:((bins-1)/2)){ @@ -271,7 +275,7 @@ smoothCombine <- function(data, models=NULL, cat(paste0(rep("\b",nchar(round((j-1)/((bins-1)/2),2)*100)+1),collapse="")); cat(paste0(round(j/((bins-1)/2),2)*100,"%")); } - modelsForecasts <- lapply(models,forecast,h=h,intervals=intervals, + modelsForecasts <- lapply(models,forecast,h=h,interval=interval, level=j*2/(bins+1),holdout=holdout,cumulative=cumulative, xreg=xreg); @@ -305,7 +309,7 @@ smoothCombine <- function(data, models=NULL, } } else{ - modelsForecasts <- lapply(models,forecast,h=h,intervals=intervals, + modelsForecasts <- lapply(models,forecast,h=h,interval=interval, level=level,holdout=holdout,cumulative=cumulative, xreg=xreg); @@ -319,17 +323,17 @@ smoothCombine <- function(data, models=NULL, upper <- ts(quantilesReturned[2,],start=yForecastStart,frequency=dataFreq); } - y <- y[1:length(yFitted)]; - errors <- c(y[1:length(yFitted)])-c(yFitted); + yInSample <- yInSample[1:length(yFitted)]; + errors <- c(yInSample[1:length(yFitted)])-c(yFitted); s2 <- mean(errors^2); ##### Now let's deal with holdout ##### if(holdout){ if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -346,18 +350,18 @@ smoothCombine <- function(data, models=NULL, lowerNew <- lower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - if(intervals){ + if(interval){ upperNew <- ts(rep(upper/h,h),start=yForecastStart,frequency=dataFreq) lowerNew <- ts(rep(lower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=lowerNew,upper=upperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=lowerNew,upper=upperNew, level=level,legend=!silentLegend,main="Combined smooth forecasts",cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main="Combined smooth forecasts",cumulative=cumulative); } } @@ -365,9 +369,9 @@ smoothCombine <- function(data, models=NULL, model <- list(timeElapsed=Sys.time()-startTime, models=models, initialType=initialType, fitted=yFitted, quantiles=ourQuantiles, forecast=yForecast, lower=lower, upper=upper, residuals=errors, s2=s2, - intervals=intervalsType, level=level, cumulative=cumulative, - actuals=data, holdout=yHoldout, ICs=ICs, ICw=icWeights, cfType=cfType, - cf=NULL,accuracy=errormeasures); + interval=intervalType, level=level, cumulative=cumulative, + y=y, holdout=yHoldout, ICs=ICs, ICw=icWeights, loss=loss, + lossValue=NULL,accuracy=errormeasures); return(structure(model,class=c("smoothC","smooth"))); } diff --git a/R/ssarima.R b/R/ssarima.R index 0a2a9cb..865b4be 100644 --- a/R/ssarima.R +++ b/R/ssarima.R @@ -45,6 +45,9 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' #' The model selection for SSARIMA is done by the \link[smooth]{auto.ssarima} function. 
#' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("ssarima","smooth")} +#' #' @template ssBasicParam #' @template ssAdvancedParam #' @template ssInitialParam @@ -112,17 +115,17 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' \item \code{fitted} - the fitted values. #' \item \code{forecast} - the point forecast. #' \item \code{lower} - the lower bound of prediction interval. When -#' \code{intervals="none"} then NA is returned. +#' \code{interval="none"} then NA is returned. #' \item \code{upper} - the higher bound of prediction interval. When -#' \code{intervals="none"} then NA is returned. +#' \code{interval="none"} then NA is returned. #' \item \code{residuals} - the residuals of the estimated model. #' \item \code{errors} - The matrix of 1 to h steps ahead errors. #' \item \code{s2} - variance of the residuals (taking degrees of freedom into #' account). -#' \item \code{intervals} - type of intervals asked by user. -#' \item \code{level} - confidence level for intervals. +#' \item \code{interval} - type of interval asked by user. +#' \item \code{level} - confidence level for interval. #' \item \code{cumulative} - whether the produced forecast was cumulative or not. -#' \item \code{actuals} - the original data. +#' \item \code{y} - the original data. #' \item \code{holdout} - the holdout part of the original data. #' \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated. #' If the model is non-intermittent, then occurrence is \code{NULL}. @@ -138,8 +141,8 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' \item \code{ICs} - values of information criteria of the model. Includes #' AIC, AICc, BIC and BICc. #' \item \code{logLik} - log-likelihood of the function. -#' \item \code{cf} - Cost function value. -#' \item \code{cfType} - Type of cost function used in the estimation. +#' \item \code{lossValue} - Loss function value. +#' \item \code{loss} - Type of loss function used in the estimation. #' \item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} #' or when \code{FI} is not provided at all. #' \item \code{accuracy} - vector of accuracy measures for the holdout sample. 
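Throughout this release the user-facing API is renamed: the first argument `data` becomes `y`, `cfType` becomes `loss`, `intervals` becomes `interval`, and the returned slots `cf`, `cfType` and `actuals` become `lossValue`, `loss` and `y`. The old names are still accepted via the `depricator()` helper added in R/depricator.R; that file's body is not part of this diff, so the sketch below only illustrates how such a shim could behave, and its body and warning text are assumptions.

```r
# Hypothetical sketch of an argument-renaming shim matching the
# depricator(y, list(...), "data") calls in this diff; the real
# R/depricator.R is not shown here, so the body is an assumption.
depricatorSketch <- function(newValue, ellipsis, oldName){
    if(!is.null(ellipsis[[oldName]])){
        # The old argument came in via ...: warn and reuse its value
        warning(paste0("The '",oldName,"' argument is deprecated and will be ",
                       "removed in a future release."), call.=FALSE);
        newValue <- ellipsis[[oldName]];
    }
    return(newValue);
}
```

During the transition both spellings work, e.g. `sma(data=x, intervals="p")` and `sma(y=x, interval="p")`, but only the new names are documented, and the old ones are scheduled for removal by 2.6.0.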
@@ -158,11 +161,11 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' #' # ARIMA(1,1,1) fitted to some data #' ourModel <- ssarima(rnorm(118,100,3),orders=list(ar=c(1),i=c(1),ma=c(1)),lags=c(1),h=18, -#' holdout=TRUE,intervals="p") +#' holdout=TRUE,interval="p") #' #' # The previous one is equivalent to: #' \dontrun{ourModel <- ssarima(rnorm(118,100,3),ar.orders=c(1),i.orders=c(1),ma.orders=c(1),lags=c(1),h=18, -#' holdout=TRUE,intervals="p")} +#' holdout=TRUE,interval="p")} #' #' # Model with the same lags and orders, applied to a different data #' ssarima(rnorm(118,100,3),orders=orders(ourModel),lags=lags(ourModel),h=18,holdout=TRUE) @@ -184,8 +187,8 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' h=10,holdout=TRUE)} #' #' # ARIMA(1,1,1) with Mean Squared Trace Forecast Error -#' \dontrun{ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="TMSE") -#' ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="aTMSE")} +#' \dontrun{ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="TMSE") +#' ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="aTMSE")} #' #' # SARIMA(0,1,1) with exogenous variables #' ssarima(rnorm(118,100,3),orders=list(i=1,ma=1),h=18,holdout=TRUE,xreg=c(1:118)) @@ -199,12 +202,12 @@ utils::globalVariables(c("normalizer","constantValue","constantRequired","consta #' plot(forecast(ourModel)) #' #' @export ssarima -ssarima <- function(data, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), +ssarima <- function(y, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), constant=FALSE, AR=NULL, MA=NULL, initial=c("backcasting","optimal"), ic=c("AICc","AIC","BIC","BICc"), - cfType=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), + loss=c("MSE","MAE","HAM","MSEh","TMSE","GTMSE","MSCE"), h=10, holdout=FALSE, cumulative=FALSE, - intervals=c("none","parametric","semiparametric","nonparametric"), level=0.95, + interval=c("none","parametric","semiparametric","nonparametric"), level=0.95, occurrence=c("none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"), oesmodel="MNN", bounds=c("admissible","none"), @@ -214,11 +217,16 @@ ssarima <- function(data, orders=list(ar=c(0),i=c(1),ma=c(1)), lags=c(1), ##### Function constructs SARIMA model (possible triple seasonality) using state space approach # ar.orders contains vector of seasonal ARs. ar.orders=c(2,1,3) will mean AR(2)*SAR(1)*SAR(3) - model with double seasonality. # -# Copyright (C) 2016 Ivan Svetunkov +# Copyright (C) 2016 - Inf Ivan Svetunkov # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # Add all the variables in ellipsis to current environment list2env(list(...),environment()); @@ -329,9 +337,9 @@ CF <- function(C){ cfRes <- costfuncARIMA(ar.orders, ma.orders, i.orders, lags, nComponents, ARValue, MAValue, constantValue, C, - matvt, matF, matw, y, vecg, + matvt, matF, matw, yInSample, vecg, h, modellags, Etype, Ttype, Stype, - multisteps, cfType, normalizer, initialType, + multisteps, loss, normalizer, initialType, nExovars, matxt, matat, matFX, vecgX, ot, AREstimate, MAEstimate, constantRequired, constantEstimate, xregEstimate, updateX, FXEstimate, gXEstimate, initialXEstimate, @@ -373,10 +381,10 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ if(constantEstimate){ if(all(i.orders==0)){ - C <- c(C,sum(yot)/obsInsample); + C <- c(C,sum(yot)/obsInSample); } else{ - C <- c(C,sum(diff(yot))/obsInsample); + C <- c(C,sum(diff(yot))/obsInSample); } } @@ -461,11 +469,11 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ matvt[1,1:nComponents] <- initialValue; } else{ - if(obsInsample<(nComponents+dataFreq)){ - matvt[1:nComponents,] <- y[1:nComponents] + diff(y[1:(nComponents+1)]); + if(obsInSample<(nComponents+dataFreq)){ + matvt[1:nComponents,] <- yInSample[1:nComponents] + diff(yInSample[1:(nComponents+1)]); } else{ - matvt[1:nComponents,] <- (y[1:nComponents]+y[1:nComponents+dataFreq])/2; + matvt[1:nComponents,] <- (yInSample[1:nComponents]+yInSample[1:nComponents+dataFreq])/2; } } } @@ -477,14 +485,14 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ } ##### Preset yFitted, yForecast, errors and basic parameters ##### - yFitted <- rep(NA,obsInsample); + yFitted <- rep(NA,obsInSample); yForecast <- rep(NA,h); - errors <- rep(NA,obsInsample); + errors <- rep(NA,obsInSample); ##### Prepare exogenous variables ##### - xregdata <- ssXreg(data=data, xreg=xreg, updateX=updateX, ot=ot, + xregdata <- ssXreg(y=y, xreg=xreg, updateX=updateX, ot=ot, persistenceX=persistenceX, transitionX=transitionX, initialX=initialX, - obsInsample=obsInsample, obsAll=obsAll, obsStates=obsStates, + obsInSample=obsInSample, obsAll=obsAll, obsStates=obsStates, maxlag=maxlag, h=h, xregDo=xregDo, silent=silentText); if(xregDo=="u"){ @@ -565,11 +573,11 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ # If this is tiny sample, use ARIMA with constant instead if(tinySample){ warning("Not enough observations to fit ARIMA. 
Switching to ARIMA(0,0,0) with constant.",call.=FALSE); - return(ssarima(data,orders=list(ar=0,i=0,ma=0),lags=1, + return(ssarima(y,orders=list(ar=0,i=0,ma=0),lags=1, constant=TRUE, - initial=initial,cfType=cfType, + initial=initial,loss=loss, h=h,holdout=holdout,cumulative=cumulative, - intervals=intervals,level=level, + interval=interval,level=level, occurrence=occurrence, oesmodel=oesmodel, bounds="u", @@ -763,7 +771,7 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ } # Fill in the rest of matvt - matvt <- ts(matvt,start=(time(data)[1] - deltat(data)*maxlag),frequency=frequency(data)); + matvt <- ts(matvt,start=(time(y)[1] - deltat(y)*maxlag),frequency=dataFreq); if(!is.null(xreg)){ matvt <- cbind(matvt,matat[1:nrow(matvt),]); colnames(matvt) <- c(paste0("Component ",c(1:max(1,nComponents))),colnames(matat)); @@ -914,12 +922,12 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ ##### Deal with the holdout sample ##### if(holdout){ - yHoldout <- ts(data[(obsInsample+1):obsAll],start=yForecastStart,frequency=frequency(data)); + yHoldout <- ts(y[(obsInSample+1):obsAll],start=yForecastStart,frequency=dataFreq); if(cumulative){ - errormeasures <- measures(sum(yHoldout),yForecast,h*y); + errormeasures <- measures(sum(yHoldout),yForecast,h*yInSample); } else{ - errormeasures <- measures(yHoldout,yForecast,y); + errormeasures <- measures(yHoldout,yForecast,yInSample); } if(cumulative){ @@ -938,18 +946,18 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ yLowerNew <- yLower; if(cumulative){ yForecastNew <- ts(rep(yForecast/h,h),start=yForecastStart,frequency=dataFreq) - if(intervals){ + if(interval){ yUpperNew <- ts(rep(yUpper/h,h),start=yForecastStart,frequency=dataFreq) yLowerNew <- ts(rep(yLower/h,h),start=yForecastStart,frequency=dataFreq) } } - if(intervals){ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, + if(interval){ + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, lower=yLowerNew,upper=yUpperNew, level=level,legend=!silentLegend,main=modelname,cumulative=cumulative); } else{ - graphmaker(actuals=data,forecast=yForecastNew,fitted=yFitted, + graphmaker(actuals=y,forecast=yForecastNew,fitted=yFitted, legend=!silentLegend,main=modelname,cumulative=cumulative); } } @@ -962,9 +970,9 @@ CreatorSSARIMA <- function(silentText=FALSE,...){ initialType=initialType,initial=initialValue, nParam=parametersNumber, fitted=yFitted,forecast=yForecast,lower=yLower,upper=yUpper,residuals=errors, - errors=errors.mat,s2=s2,intervals=intervalsType,level=level,cumulative=cumulative, - actuals=data,holdout=yHoldout,occurrence=occurrenceModel, + errors=errors.mat,s2=s2,interval=intervalType,level=level,cumulative=cumulative, + y=y,holdout=yHoldout,occurrence=occurrenceModel, xreg=xreg,updateX=updateX,initialX=initialX,persistenceX=persistenceX,transitionX=transitionX, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,FI=FI,accuracy=errormeasures); + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,FI=FI,accuracy=errormeasures); return(structure(model,class="smooth")); } diff --git a/R/ssfunctions.R b/R/ssfunctions.R index 6696179..ed315a4 100644 --- a/R/ssfunctions.R +++ b/R/ssfunctions.R @@ -1,7 +1,7 @@ -utils::globalVariables(c("h","holdout","orders","lags","transition","measurement","multisteps","ot","obsInsample","obsAll", - "obsStates","obsNonzero","obsZero","pFitted","cfType","CF","Etype","Ttype","Stype","matxt","matFX","vecgX","xreg", - 
"matvt","nExovars","matat","errors","nParam","intervals","intervalsType","level","model","oesmodel","imodel", - "constant","AR","MA","data","yFitted","cumulative","rounded")); +utils::globalVariables(c("h","holdout","orders","lags","transition","measurement","multisteps","ot","obsInSample","obsAll", + "obsStates","obsNonzero","obsZero","pFitted","loss","CF","Etype","Ttype","Stype","matxt","matFX","vecgX","xreg", + "matvt","nExovars","matat","errors","nParam","interval","intervalType","level","model","oesmodel","imodel", + "constant","AR","MA","y","yFitted","cumulative","rounded")); ##### *Checker of input of basic functions* ##### ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ @@ -65,45 +65,46 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ } ##### data ##### - if(any(is.smooth.sim(data))){ - data <- data$data; + if(any(is.smooth.sim(y))){ + y <- y$data; } - else if(any(class(data)=="Mdata")){ - h <- data$h; + else if(any(class(y)=="Mdata")){ + h <- y$h; holdout <- TRUE; - data <- ts(c(data$x,data$xx),start=start(data$x),frequency=frequency(data$x)); + y <- ts(c(y$x,y$xx),start=start(y$x),frequency=frequency(y$x)); } - if(!is.numeric(data)){ + if(!is.numeric(y)){ stop("The provided data is not a vector or ts object! Can't construct any model!", call.=FALSE); } - if(!is.null(ncol(data))){ - if(ncol(data)>1){ + if(!is.null(ncol(y))){ + if(ncol(y)>1){ stop("The provided data is not a vector! Can't construct any model!", call.=FALSE); } } # Check the data for NAs - if(any(is.na(data))){ + if(any(is.na(y))){ if(!silentText){ warning("Data contains NAs. These observations will be substituted by zeroes.",call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } # Define obs, the number of observations of in-sample - obsInsample <- length(data) - holdout*h; + obsInSample <- length(y) - holdout*h; # Define obsAll, the overal number of observations (in-sample + holdout) - obsAll <- length(data) + (1 - holdout)*h; + obsAll <- length(y) + (1 - holdout)*h; - # If obsInsample is negative, this means that we can't do anything... - if(obsInsample<=0){ + # If obsInSample is negative, this means that we can't do anything... + if(obsInSample<=0){ stop("Not enough observations in sample.",call.=FALSE); } # Define the actual values - y <- matrix(data[1:obsInsample],obsInsample,1); - dataFreq <- frequency(data); - dataStart <- start(data); + yInSample <- matrix(y[1:obsInSample],obsInSample,1); + dataFreq <- frequency(y); + dataStart <- start(y); + yForecastStart <- time(y)[obsInSample]+deltat(y); # Number of parameters to estimate / provided parametersNumber <- matrix(0,2,4, @@ -319,7 +320,7 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ # Get rid of duplicates in lags if(length(unique(lags))!=length(lags)){ - if(frequency(data)!=1){ + if(dataFreq!=1){ warning(paste0("'lags' variable contains duplicates: (",paste0(lags,collapse=","), "). 
Getting rid of some of them."),call.=FALSE); } @@ -434,16 +435,24 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ # Number of components to use nComponents <- max(ar.orders %*% lags + i.orders %*% lags,ma.orders %*% lags); + # If there is no constant and there are no orders + if(nComponents==0 & !constantRequired){ + constantValue <- 0; + constantRequired[] <- TRUE; + nComponents <- 1; + } + nonZeroARI <- matrix(1,ncol=2); nonZeroMA <- matrix(1,ncol=2); modellags <- matrix(rep(1,nComponents),ncol=1); + if(constantRequired){ modellags <- rbind(modellags,1); } maxlag <- 1; - if(obsInsample < nComponents){ - warning(paste0("In-sample size is ",obsInsample,", while number of components is ",nComponents, + if(obsInSample < nComponents){ + warning(paste0("In-sample size is ",obsInSample,", while number of components is ",nComponents, ". Cannot fit the model."),call.=FALSE) stop("Not enough observations for such a complicated model.",call.=FALSE); } @@ -460,7 +469,7 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ if(any(smoothType==c("es","oes"))){ modelIsSeasonal <- Stype!="N"; # Check if the data is ts-object - if(!is.ts(data) & modelIsSeasonal){ + if(!is.ts(y) & modelIsSeasonal){ if(!silentText){ message("The provided data is not ts object. Only non-seasonal models are available."); } @@ -495,7 +504,7 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ else if(smoothType=="sma"){ maxlag <- 1; if(is.null(order)){ - nParamMax <- obsInsample; + nParamMax <- obsInSample; } else{ nParamMax <- order; } @@ -549,14 +558,14 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ nComponents <- sum(orders); type <- substr(type[1],1,1); - if(type=="M"){ - if(any(y<=0)){ + if(type=="m"){ + if(any(yInSample<=0)){ warning("Multiplicative model can only be used on positive data. Switching to the additive one.",call.=FALSE); modelIsMultiplicative <- FALSE; - type <- "A"; + type <- "a"; } else{ - y <- log(y); + yInSample <- log(yInSample); modelIsMultiplicative <- TRUE; } } @@ -674,8 +683,8 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ nComponents <- length(nonZeroComponents); - if(obsInsample < nComponents){ - warning(paste0("In-sample size is ",obsInsample,", while number of components is ",nComponents, + if(obsInSample < nComponents){ + warning(paste0("In-sample size is ",obsInSample,", while number of components is ",nComponents, ". 
Cannot fit the model."),call.=FALSE) stop("Not enough observations for such a complicated model.",call.=FALSE); } @@ -690,7 +699,7 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ ##### obsStates ##### # Define the number of rows that should be in the matvt - obsStates <- max(obsAll + maxlag, obsInsample + 2*maxlag); + obsStates <- max(obsAll + maxlag, obsInSample + 2*maxlag); ##### bounds ##### bounds <- substring(bounds[1],1,1); @@ -706,86 +715,86 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ ic <- "AICc"; } - ##### Cost function type ##### - cfType <- cfType[1]; - if(any(cfType==c("MSEh","TMSE","GTMSE","MSCE","MAEh","TMAE","GTMAE","MACE", + ##### Loss function type ##### + loss <- loss[1]; + if(any(loss==c("MSEh","TMSE","GTMSE","MSCE","MAEh","TMAE","GTMAE","MACE", "HAMh","THAM","GTHAM","CHAM", "TFL","aMSEh","aTMSE","aGTMSE","aTFL"))){ multisteps <- TRUE; } - else if(any(cfType==c("MSE","MAE","HAM","TSB","Rounded","LogisticD","LogisticL"))){ + else if(any(loss==c("MSE","MAE","HAM","TSB","Rounded","LogisticD","LogisticL"))){ multisteps <- FALSE; } else{ - if(cfType=="MSTFE"){ + if(loss=="MSTFE"){ warning(paste0("This estimator has recently been renamed from \"MSTFE\" to \"TMSE\". ", "Please, use the new name."),call.=FALSE); multisteps <- TRUE; - cfType <- "TMSE"; + loss <- "TMSE"; } - else if(cfType=="GMSTFE"){ + else if(loss=="GMSTFE"){ warning(paste0("This estimator has recently been renamed from \"GMSTFE\" to \"GTMSE\". ", "Please, use the new name."),call.=FALSE); multisteps <- TRUE; - cfType <- "GTMSE"; + loss <- "GTMSE"; } - else if(cfType=="aMSTFE"){ + else if(loss=="aMSTFE"){ warning(paste0("This estimator has recently been renamed from \"aMSTFE\" to \"aTMSE\". ", "Please, use the new name."),call.=FALSE); multisteps <- TRUE; - cfType <- "aTMSE"; + loss <- "aTMSE"; } - else if(cfType=="aGMSTFE"){ + else if(loss=="aGMSTFE"){ warning(paste0("This estimator has recently been renamed from \"aGMSTFE\" to \"aGTMSE\". ", "Please, use the new name."),call.=FALSE); multisteps <- TRUE; - cfType <- "aGTMSE"; + loss <- "aGTMSE"; } else{ - warning(paste0("Strange cost function specified: ",cfType,". Switching to 'MSE'."),call.=FALSE); - cfType <- "MSE"; + warning(paste0("Strange loss function specified: ",loss,". Switching to 'MSE'."),call.=FALSE); + loss <- "MSE"; multisteps <- FALSE; } } - cfTypeOriginal <- cfType; + lossOriginal <- loss; - ##### intervals, intervalsType, level ##### - #intervalsType <- substring(intervalsType[1],1,1); - intervalsType <- intervals[1]; + ##### interval, intervalType, level ##### + #intervalType <- substring(intervalType[1],1,1); + intervalType <- interval[1]; # Check the provided type of interval - if(is.logical(intervalsType)){ - if(intervalsType){ - intervalsType <- "p"; + if(is.logical(intervalType)){ + if(intervalType){ + intervalType <- "p"; } else{ - intervalsType <- "none"; + intervalType <- "none"; } } - if(all(intervalsType!=c("p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric"))){ - warning(paste0("Wrong type of interval: '",intervalsType, "'. Switching to 'parametric'."),call.=FALSE); - intervalsType <- "p"; + if(all(intervalType!=c("p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric"))){ + warning(paste0("Wrong type of interval: '",intervalType, "'. 
Switching to 'parametric'."),call.=FALSE); + intervalType <- "p"; } - if(any(intervalsType==c("none","n"))){ - intervalsType <- "n"; - intervals <- FALSE; + if(any(intervalType==c("none","n"))){ + intervalType <- "n"; + interval <- FALSE; } - else if(any(intervalsType==c("parametric","p"))){ - intervalsType <- "p"; - intervals <- TRUE; + else if(any(intervalType==c("parametric","p"))){ + intervalType <- "p"; + interval <- TRUE; } - else if(any(intervalsType==c("semiparametric","sp"))){ - intervalsType <- "sp"; - intervals <- TRUE; + else if(any(intervalType==c("semiparametric","sp"))){ + intervalType <- "sp"; + interval <- TRUE; } - else if(any(intervalsType==c("nonparametric","np"))){ - intervalsType <- "np"; - intervals <- TRUE; + else if(any(intervalType==c("nonparametric","np"))){ + intervalType <- "np"; + interval <- TRUE; } else{ - intervals <- TRUE; + interval <- TRUE; } if(level>1){ @@ -806,7 +815,12 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ occurrenceModelProvided <- FALSE; } else{ - occurrenceModel <- oesmodel; + if(is.null(oesmodel) || is.na(oesmodel)){ + occurrenceModel <- "MNN"; + } + else{ + occurrenceModel <- oesmodel; + } occurrenceModelProvided <- FALSE; } @@ -846,11 +860,11 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ "Where should we plug in the future occurences anyway?\n", "Switching to occurrence='fixed'."),call.=FALSE); occurrence <- "f"; - ot <- (y!=0)*1; + ot <- (yInSample!=0)*1; obsNonzero <- sum(ot); - obsZero <- obsInsample - obsNonzero; - yot <- matrix(y[y!=0],obsNonzero,1); - pFitted <- matrix(mean(ot),obsInsample,1); + obsZero <- obsInSample - obsNonzero; + yot <- matrix(yInSample[yInSample!=0],obsNonzero,1); + pFitted <- matrix(mean(ot),obsInSample,1); pForecast <- matrix(1,h,1); nParamOccurrence <- 1; } @@ -861,16 +875,16 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ occurrence <- (occurrence!=0)*1; } - ot <- (y!=0)*1; + ot <- (yInSample!=0)*1; obsNonzero <- sum(ot); - obsZero <- obsInsample - obsNonzero; - yot <- matrix(y[y!=0],obsNonzero,1); + obsZero <- obsInSample - obsNonzero; + yot <- matrix(yInSample[yInSample!=0],obsNonzero,1); if(length(occurrence)==obsAll){ - pFitted <- occurrence[1:obsInsample]; - pForecast <- occurrence[(obsInsample+1):(obsInsample+h)]; + pFitted <- occurrence[1:obsInSample]; + pForecast <- occurrence[(obsInSample+1):(obsInSample+h)]; } else{ - pFitted <- matrix(ot,obsInsample,1); + pFitted <- matrix(ot,obsInSample,1); pForecast <- matrix(occurrence,h,1); } @@ -904,14 +918,14 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ } } else{ - obsNonzero <- obsInsample; + obsNonzero <- obsInSample; obsZero <- 0; } if(any(smoothType==c("es"))){ # Check if multiplicative models can be fitted - allowMultiplicative <- !((any(y<=0) && occurrence=="n") | (occurrence!="n" && any(y<0))); - if(any(cfType==c("LogisticL","LogisticD"))){ + allowMultiplicative <- !((any(yInSample<=0) && occurrence=="n") | (occurrence!="n" && any(yInSample<0))); + if(any(loss==c("LogisticL","LogisticD"))){ allowMultiplicative <- TRUE; } # If non-positive values are present, check if data is intermittent, if negatives are here, switch to additive models @@ -1176,8 +1190,8 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ # 1. Seasonal model, <=2 seasons of data and no initial seasonals. # 2. Seasonal model, <=1 season of data, no initial seasonals and no persistence. 
if(is.null(modelsPool)){ - if((modelIsSeasonal & (obsInsample <= 2*dataFreq) & is.null(initialSeason)) | - (modelIsSeasonal & (obsInsample <= dataFreq) & is.null(initialSeason) & is.null(persistence))){ + if((modelIsSeasonal & (obsInSample <= 2*dataFreq) & is.null(initialSeason)) | + (modelIsSeasonal & (obsInSample <= dataFreq) & is.null(initialSeason) & is.null(persistence))){ if(is.null(initialSeason)){ warning(paste0("Sorry, but we don't have enough observations for the seasonal model!\n", "Switching to non-seasonal."),call.=FALSE); @@ -1300,17 +1314,17 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ } # Stop if number of observations is less than horizon and multisteps is chosen. - if((multisteps) & (obsNonzero < h+1) & all(cfType!=c("aMSEh","aTMSE","aGTMSE","aTFL"))){ - warning(paste0("Do you seriously think that you can use ",cfType, + if((multisteps) & (obsNonzero < h+1) & all(loss!=c("aMSEh","aTMSE","aGTMSE","aTFL"))){ + warning(paste0("Do you seriously think that you can use ",loss, " with h=",h," on ",obsNonzero," non-zero observations?!"),call.=FALSE); - stop("Not enough observations for multisteps cost function.",call.=FALSE); + stop("Not enough observations for multisteps loss function.",call.=FALSE); } - else if((multisteps) & (obsNonzero < 2*h) & all(cfType!=c("aMSEh","aTMSE","aGTMSE","aTFL"))){ - warning(paste0("Number of observations is really low for a multisteps cost function! ", + else if((multisteps) & (obsNonzero < 2*h) & all(loss!=c("aMSEh","aTMSE","aGTMSE","aTFL"))){ + warning(paste0("Number of observations is really low for a multisteps loss function! ", "We will try, but cannot guarantee anything..."),call.=FALSE); } - normalizer <- mean(abs(diff(c(y)))); + normalizer <- mean(abs(diff(c(yInSample)))); ##### Define xregDo ##### if(smoothType!="sma"){ @@ -1349,8 +1363,8 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ } else{ if(rounded){ - cfType <- "Rounded"; - cfTypeOriginal <- cfType; + loss <- "Rounded"; + lossOriginal <- loss; } } } @@ -1361,21 +1375,22 @@ ssInput <- function(smoothType=c("es","gum","ces","ssarima","smoothC"),...){ assign("silentText",silentText,ParentEnvironment); assign("silentGraph",silentGraph,ParentEnvironment); assign("silentLegend",silentLegend,ParentEnvironment); - assign("obsInsample",obsInsample,ParentEnvironment); + assign("obsInSample",obsInSample,ParentEnvironment); assign("obsAll",obsAll,ParentEnvironment); assign("obsStates",obsStates,ParentEnvironment); assign("obsNonzero",obsNonzero,ParentEnvironment); assign("obsZero",obsZero,ParentEnvironment); - assign("data",data,ParentEnvironment); assign("y",y,ParentEnvironment); + assign("yInSample",yInSample,ParentEnvironment); assign("dataFreq",dataFreq,ParentEnvironment); assign("dataStart",dataStart,ParentEnvironment); + assign("yForecastStart",yForecastStart,ParentEnvironment); assign("bounds",bounds,ParentEnvironment); - assign("cfType",cfType,ParentEnvironment); - assign("cfTypeOriginal",cfTypeOriginal,ParentEnvironment); + assign("loss",loss,ParentEnvironment); + assign("lossOriginal",lossOriginal,ParentEnvironment); assign("multisteps",multisteps,ParentEnvironment); - assign("intervalsType",intervalsType,ParentEnvironment); - assign("intervals",intervals,ParentEnvironment); + assign("intervalType",intervalType,ParentEnvironment); + assign("interval",interval,ParentEnvironment); assign("initialValue",initialValue,ParentEnvironment); assign("initialType",initialType,ParentEnvironment); 
assign("normalizer",normalizer,ParentEnvironment); @@ -1524,39 +1539,40 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. } ##### data ##### - if(any(is.smooth.sim(data))){ - data <- data$data; + if(any(is.smooth.sim(y))){ + y <- y$data; } - else if(any(class(data)=="Mdata")){ - h <- data$h; + else if(any(class(y)=="Mdata")){ + h <- y$h; holdout <- TRUE; - data <- ts(c(data$x,data$xx),start=start(data$x),frequency=frequency(data$x)); + y <- ts(c(y$x,y$xx),start=start(y$x),frequency=frequency(y$x)); } - if(!is.numeric(data)){ + if(!is.numeric(y)){ stop("The provided data is not a vector or ts object! Can't build any model!", call.=FALSE); } # Check the data for NAs - if(any(is.na(data))){ + if(any(is.na(y))){ if(!silentText){ warning("Data contains NAs. These observations will be substituted by zeroes.",call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } ##### Observations ##### # Define obs, the number of observations of in-sample - obsInsample <- length(data) - holdout*h; + obsInSample <- length(y) - holdout*h; # Define obsAll, the overal number of observations (in-sample + holdout) - obsAll <- length(data) + (1 - holdout)*h; + obsAll <- length(y) + (1 - holdout)*h; - y <- data[1:obsInsample]; - dataFreq <- frequency(data); - dataStart <- start(data); + yInSample <- matrix(y[1:obsInSample],obsInSample,1); + dataFreq <- frequency(y); + dataStart <- start(y); + yForecastStart <- time(y)[obsInSample]+deltat(y); # This is the critical minimum needed in order to at least fit ARIMA(0,0,0) with constant - if(obsInsample < 4){ + if(obsInSample < 4){ stop("Sorry, but your sample is too small. Come back when you have at least 4 observations...",call.=FALSE); } @@ -1600,65 +1616,65 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. ic <- "AICc"; } - ##### Cost function type ##### - cfType <- cfType[1]; - if(any(cfType==c("MSEh","TMSE","GTMSE","MSCE","MAEh","TMAE","GTMAE","MACE", + ##### Loss function type ##### + loss <- loss[1]; + if(any(loss==c("MSEh","TMSE","GTMSE","MSCE","MAEh","TMAE","GTMAE","MACE", "HAMh","THAM","GTHAM","CHAM", "TFL","aMSEh","aTMSE","aGTMSE","aTFL"))){ multisteps <- TRUE; } - else if(any(cfType==c("MSE","MAE","HAM","Rounded","TSB","LogisticD","LogisticL"))){ + else if(any(loss==c("MSE","MAE","HAM","Rounded","TSB","LogisticD","LogisticL"))){ multisteps <- FALSE; } else{ - warning(paste0("Strange cost function specified: ",cfType,". Switching to 'MSE'."),call.=FALSE); - cfType <- "MSE"; + warning(paste0("Strange loss function specified: ",loss,". Switching to 'MSE'."),call.=FALSE); + loss <- "MSE"; multisteps <- FALSE; } - if(!any(cfType==c("MSE","MAE","HAM","MSEh","MAEh","HAMh","MSCE","MACE","CHAM", + if(!any(loss==c("MSE","MAE","HAM","MSEh","MAEh","HAMh","MSCE","MACE","CHAM", "TFL","aTFL"))){ - warning(paste0("'",cfType,"' is used as cost function instead of 'MSE'. ", + warning(paste0("'",loss,"' is used as loss function instead of 'MSE'. 
", "The results of the model selection may be wrong."),call.=FALSE); } - ##### intervals, intervalsType, level ##### - #intervalsType <- substring(intervalsType[1],1,1); - intervalsType <- intervals[1]; + ##### interval, intervalType, level ##### + #intervalType <- substring(intervalType[1],1,1); + intervalType <- interval[1]; # Check the provided type of interval - if(is.logical(intervalsType)){ - if(intervalsType){ - intervalsType <- "p"; + if(is.logical(intervalType)){ + if(intervalType){ + intervalType <- "p"; } else{ - intervalsType <- "none"; + intervalType <- "none"; } } - if(all(intervalsType!=c("p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric"))){ - warning(paste0("Wrong type of interval: '",intervalsType, "'. Switching to 'parametric'."),call.=FALSE); - intervalsType <- "p"; + if(all(intervalType!=c("p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric"))){ + warning(paste0("Wrong type of interval: '",intervalType, "'. Switching to 'parametric'."),call.=FALSE); + intervalType <- "p"; } - if(any(intervalsType==c("none","n"))){ - intervalsType <- "n"; - intervals <- FALSE; + if(any(intervalType==c("none","n"))){ + intervalType <- "n"; + interval <- FALSE; } - else if(any(intervalsType==c("parametric","p"))){ - intervalsType <- "p"; - intervals <- TRUE; + else if(any(intervalType==c("parametric","p"))){ + intervalType <- "p"; + interval <- TRUE; } - else if(any(intervalsType==c("semiparametric","sp"))){ - intervalsType <- "sp"; - intervals <- TRUE; + else if(any(intervalType==c("semiparametric","sp"))){ + intervalType <- "sp"; + interval <- TRUE; } - else if(any(intervalsType==c("nonparametric","np"))){ - intervalsType <- "np"; - intervals <- TRUE; + else if(any(intervalType==c("nonparametric","np"))){ + intervalType <- "np"; + interval <- TRUE; } else{ - intervals <- TRUE; + interval <- TRUE; } ##### Occurrence part of the model ##### @@ -1675,11 +1691,15 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. occurrenceModelProvided <- FALSE; } else{ - occurrenceModel <- oesmodel; + if(is.null(oesmodel) || is.na(oesmodel)){ + occurrenceModel <- "MNN"; + } + else{ + occurrenceModel <- oesmodel; + } occurrenceModelProvided <- FALSE; } - ##### Occurrence part of the model ##### if(exists("intermittent",envir=ParentEnvironment,inherits=FALSE)){ intermittent <- substr(intermittent[1],1,1); warning("The parameter \"intermittent\" is obsolete. Please, use \"occurrence\" instead"); @@ -1703,7 +1723,7 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. } if(is.numeric(occurrence)){ - obsNonzero <- sum((y!=0)*1); + obsNonzero <- sum((yInSample!=0)*1); # If it is data, then it should either correspond to the whole sample (in-sample + holdout) or be equal to forecating horizon. if(all(length(c(occurrence))!=c(h,obsAll))){ warning(paste0("Length of the provided future occurrences is ",length(c(occurrence)), @@ -1720,7 +1740,7 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. } } else{ - obsNonzero <- sum((y!=0)*1); + obsNonzero <- sum((yInSample!=0)*1); occurrence <- occurrence[1]; if(all(occurrence!=c("n","a","f","g","o","i","d", "none","auto","fixed","general","odds-ratio","inverse-odds-ratio","direct"))){ @@ -1759,21 +1779,22 @@ ssAutoInput <- function(smoothType=c("auto.ces","auto.gum","auto.ssarima","auto. 
assign("silentLegend",silentLegend,ParentEnvironment); assign("bounds",bounds,ParentEnvironment); assign("FI",FI,ParentEnvironment); - assign("obsInsample",obsInsample,ParentEnvironment); + assign("obsInSample",obsInSample,ParentEnvironment); assign("obsAll",obsAll,ParentEnvironment); assign("obsNonzero",obsNonzero,ParentEnvironment); assign("initialValue",initialValue,ParentEnvironment); assign("initialType",initialType,ParentEnvironment); assign("ic",ic,ParentEnvironment); - assign("cfType",cfType,ParentEnvironment); + assign("loss",loss,ParentEnvironment); assign("multisteps",multisteps,ParentEnvironment); - assign("intervals",intervals,ParentEnvironment); - assign("intervalsType",intervalsType,ParentEnvironment); + assign("interval",interval,ParentEnvironment); + assign("intervalType",intervalType,ParentEnvironment); assign("occurrence",occurrence,ParentEnvironment); + assign("yInSample",yInSample,ParentEnvironment); assign("y",y,ParentEnvironment); - assign("data",data,ParentEnvironment); assign("dataFreq",dataFreq,ParentEnvironment); assign("dataStart",dataStart,ParentEnvironment); + assign("yForecastStart",yForecastStart,ParentEnvironment); assign("xregDo",xregDo,ParentEnvironment); } @@ -1782,21 +1803,21 @@ ssFitter <- function(...){ ellipsis <- list(...); ParentEnvironment <- ellipsis[['ParentEnvironment']]; - if(cfType=="LogisticL"){ + if(loss=="LogisticL"){ EtypeNew <- "L"; } - else if(cfType=="LogisticD"){ + else if(loss=="LogisticD"){ EtypeNew <- "D"; } else{ EtypeNew <- Etype; } - fitting <- fitterwrap(matvt, matF, matw, y, vecg, + fitting <- fitterwrap(matvt, matF, matw, yInSample, vecg, modellags, EtypeNew, Ttype, Stype, initialType, matxt, matat, matFX, vecgX, ot); statesNames <- colnames(matvt); - matvt <- ts(fitting$matvt,start=(time(data)[1] - deltat(data)*maxlag),frequency=dataFreq); + matvt <- ts(fitting$matvt,start=(time(y)[1] - deltat(y)*maxlag),frequency=dataFreq); colnames(matvt) <- statesNames; yFitted <- ts(fitting$yfit,start=dataStart,frequency=dataFreq); errors <- ts(fitting$errors,start=dataStart,frequency=dataFreq); @@ -1818,7 +1839,7 @@ ssFitter <- function(...){ } if(h>0){ - errors.mat <- ts(errorerwrap(matvt, matF, matw, y, + errors.mat <- ts(errorerwrap(matvt, matF, matw, yInSample, h, Etype, Ttype, Stype, modellags, matxt, matat, matFX, ot), start=dataStart,frequency=dataFreq); @@ -1838,13 +1859,13 @@ ssFitter <- function(...){ assign("errors",errors,ParentEnvironment); } -##### *State space intervals* ##### -ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c("a","p","sp","np"), df=NULL, +##### *State space interval* ##### +ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalType=c("a","p","sp","np"), df=NULL, measurement=NULL, transition=NULL, persistence=NULL, s2=NULL, - modellags=NULL, states=NULL, cumulative=FALSE, cfType="MSE", + modellags=NULL, states=NULL, cumulative=FALSE, loss="MSE", yForecast=rep(0,ncol(errors)), Etype="A", Ttype="N", Stype="N", s2g=NULL, iprob=1){ - # Function constructs intervals based on the provided random variable. + # Function constructs interval based on the provided random variable. # If errors is a matrix, then it is assumed that each column has a variable that needs an interval. 
# based on errors the horison is estimated as ncol(errors) @@ -1861,38 +1882,38 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } hsmN <- gamma(0.75)*pi^(-0.5)*2^(-0.75); - intervalsType <- intervalsType[1] + intervalType <- intervalType[1] # Check the provided type of interval - if(is.logical(intervalsType)){ - if(intervalsType){ - intervalsType <- "p"; + if(is.logical(intervalType)){ + if(intervalType){ + intervalType <- "p"; } else{ - intervalsType <- "none"; + intervalType <- "none"; } } - if(all(intervalsType!=c("a","p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric","asymmetric"))){ - stop(paste0("What do you mean by 'intervalsType=",intervalsType,"'? I can't work with this!"),call.=FALSE); + if(all(intervalType!=c("a","p","s","n","a","sp","np","none","parametric","semiparametric","nonparametric","asymmetric"))){ + stop(paste0("What do you mean by 'intervalType=",intervalType,"'? I can't work with this!"),call.=FALSE); } - if(intervalsType=="none"){ - intervalsType <- "n"; + if(intervalType=="none"){ + intervalType <- "n"; } - else if(intervalsType=="parametric"){ - intervalsType <- "p"; + else if(intervalType=="parametric"){ + intervalType <- "p"; } - else if(intervalsType=="semiparametric"){ - intervalsType <- "sp"; + else if(intervalType=="semiparametric"){ + intervalType <- "sp"; } - else if(intervalsType=="nonparametric"){ - intervalsType <- "np"; + else if(intervalType=="nonparametric"){ + intervalType <- "np"; } - if(intervalsType=="p"){ + if(intervalType=="p"){ if(any(is.null(measurement),is.null(transition),is.null(persistence),is.null(s2),is.null(modellags))){ - stop("measurement, transition, persistence, s2 and modellags need to be provided in order to construct parametric intervals!",call.=FALSE); + stop("measurement, transition, persistence, s2 and modellags need to be provided in order to construct parametric interval!",call.=FALSE); } if(any(!is.matrix(measurement),!is.matrix(transition),!is.matrix(persistence))){ @@ -1900,7 +1921,7 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } } - # Function allows to estimate the coefficients of the simple quantile regression. Used in intervals construction. + # Function allows to estimate the coefficients of the simple quantile regression. Used in interval construction. 
quantfunc <- function(A){ ee[] <- ye - (A[1]*xe^A[2]); return((1-quant)*sum(abs(ee[ee<0]))+quant*sum(abs(ee[ee>=0]))); @@ -1917,20 +1938,20 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" if(any(positiveLevels)){ # If this is Laplace or S, then get b values - if(cfType=="MAE"){ + if(loss=="MAE"){ sdVec <- sqrt(sdVec/2); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ sdVec <- (sdVec/120)^0.25; } # Produce lower quantiles if the probability is still lower than the lower P if(Etype=="A" | all(Etype=="M",all((1-iprob) < (1-level)/2))){ if(Etype=="M"){ - if(cfType=="MAE"){ + if(loss=="MAE"){ lowerquant[positiveLevels] <- exp(qlaplace((1-levelResidual[positiveLevels])/2,meanVec,sdVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ lowerquant[positiveLevels] <- exp(qs((1-levelResidual[positiveLevels])/2,meanVec,sdVec)); } else{ @@ -1938,10 +1959,10 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ lowerquant[positiveLevels] <- qlaplace((1-levelResidual[positiveLevels])/2,meanVec,sdVec); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ lowerquant[positiveLevels] <- qs((1-levelResidual[positiveLevels])/2,meanVec,sdVec); } else{ @@ -1957,10 +1978,10 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" # Produce upper quantiles if(Etype=="M"){ - if(cfType=="MAE"){ + if(loss=="MAE"){ upperquant[positiveLevels] <- exp(qlaplace(levelNew,meanVec,sdVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ upperquant[positiveLevels] <- exp(qs(levelNew,meanVec,sdVec)); } else{ @@ -1968,10 +1989,10 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ upperquant[positiveLevels] <- qlaplace(levelNew,meanVec,sdVec); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ upperquant[positiveLevels] <- qs(levelNew,meanVec,sdVec); } else{ @@ -1983,16 +2004,16 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" return(list(lower=lowerquant,upper=upperquant)); } - if(cfType=="MAE"){ + if(loss=="MAE"){ upperquant <- qlaplace((1+level)/2,0,1); lowerquant <- qlaplace((1-level)/2,0,1); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ upperquant <- qs((1+level)/2,0,1); lowerquant <- qs((1-level)/2,0,1); } else{ - #if(cfType=="MSE") + #if(loss=="MSE") # If degrees of freedom are provided, use Student's distribution. Otherwise stick with normal. 
if(is.null(df)){ upperquant <- qnorm((1+level)/2,0,1); @@ -2030,8 +2051,8 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" upper <- rep(NA,nVariables); lower <- rep(NA,nVariables); - #### Asymmetric intervals using HM #### - if(intervalsType=="a"){ + #### Asymmetric interval using HM #### + if(intervalType=="a"){ if(!cumulative){ for(i in 1:nVariables){ upper[i] <- ev[i] + upperquant / hsmN^2 * Re(hm(errors[,i],ev[i]))^2; @@ -2049,8 +2070,8 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" varVec <- NULL; } - #### Semiparametric intervals using the variance of errors #### - else if(intervalsType=="sp"){ + #### Semiparametric interval using the variance of errors #### + else if(intervalType=="sp"){ if(Etype=="M"){ errors[errors < -1] <- -0.999; if(!cumulative){ @@ -2061,12 +2082,12 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ varVec <- sqrt(varVec/2); upper <- exp(qlaplace((1+level)/2,0,varVec)); lower <- exp(qlaplace((1-level)/2,0,varVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ varVec <- (varVec/120)^0.25; upper <- exp(qs((1+level)/2,0,varVec)); lower <- exp(qs((1-level)/2,0,varVec)); @@ -2088,12 +2109,12 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ varVec <- sqrt(varVec/2); upper <- exp(qlaplace((1+level)/2,0,varVec)); lower <- exp(qlaplace((1-level)/2,0,varVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ varVec <- (varVec/120)^0.25; upper <- exp(qs((1+level)/2,0,varVec)); lower <- exp(qs((1-level)/2,0,varVec)); @@ -2117,11 +2138,11 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- ev + quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b^2 = s^2 / 2 varVec <- varVec / 2; } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * 50^2 varVec <- varVec/120; @@ -2139,11 +2160,11 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- sum(ev) + quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b^2 = s^2 / 2 varVec <- varVec / 2; } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * 50^2 varVec <- varVec/120; @@ -2156,8 +2177,8 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } } - #### Nonparametric intervals using Taylor and Bunn, 1999 #### - else if(intervalsType=="np"){ + #### Nonparametric interval using Taylor and Bunn, 1999 #### + else if(intervalType=="np"){ nonNAobs <- apply(!is.na(errors),1,all); ye <- errors[nonNAobs,]; @@ -2209,8 +2230,8 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" varVec <- NULL; } - #### Parametric intervals #### - else if(intervalsType=="p"){ + #### Parametric interval #### + else if(intervalType=="p"){ h <- length(yForecast); # Vector of final variances @@ -2218,7 +2239,7 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" #### Pure Multiplicative models #### if(Etype=="M"){ - # This is just an approximation of the true intervals + # This is just an approximation of the true interval covarMat <- covarAnal(modellags, h, measurement, transition, persistence, s2); ### Cumulative variance is different. 
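The repeated branches above all perform the same variance-to-scale conversion for the error distribution implied by the loss: Laplace for MAE (s^2 = 2 b^2), the S distribution for HAM (s^2 = 120 b^4), and the normal for MSE. A minimal numeric sketch, assuming the `qlaplace()` and `qs()` quantile functions exported by greybox (a declared dependency of this package), called positionally exactly as in the code above:

```r
library(greybox)

level <- 0.95;
s2 <- 4; # residual variance of a fitted model

# MSE -> normal errors, the default case
c(qnorm((1-level)/2, 0, sqrt(s2)), qnorm((1+level)/2, 0, sqrt(s2)));

# MAE -> Laplace errors: s^2 = 2 b^2, so the scale is b = sqrt(s2/2)
c(qlaplace((1-level)/2, 0, sqrt(s2/2)), qlaplace((1+level)/2, 0, sqrt(s2/2)));

# HAM -> S-distributed errors: s^2 = 120 b^4, so b = (s2/120)^0.25
c(qs((1-level)/2, 0, (s2/120)^0.25), qs((1+level)/2, 0, (s2/120)^0.25));
```

For multiplicative error models the same quantiles are exponentiated, which is what the `exp(qlaplace(...))` and `exp(qs(...))` calls above do.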
@@ -2232,12 +2253,12 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ varVec <- sqrt(varVec / 2); upper <- exp(qlaplace((1+level)/2,0,varVec)); lower <- exp(qlaplace((1-level)/2,0,varVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ varVec <- (varVec/120)^0.25; upper <- exp(qs((1+level)/2,0,varVec)); lower <- exp(qs((1-level)/2,0,varVec)); @@ -2260,13 +2281,13 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b = sqrt(s^2 / 2) varVec <- sqrt(varVec / 2); upper <- exp(qlaplace((1+level)/2,0,varVec)); lower <- exp(qlaplace((1-level)/2,0,varVec)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * 50^2 varVec <- (varVec/120)^0.25; @@ -2305,11 +2326,11 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b^2 = s^2 / 2 varVec <- varVec / 2; } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * b^2 varVec <- varVec/120; @@ -2326,11 +2347,11 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" stop("Provided expected value doesn't correspond to the dimension of errors.", call.=FALSE); } - if(intervalsType=="a"){ + if(intervalType=="a"){ upper <- ev + upperquant / hsmN^2 * Re(hm(errors,ev))^2; lower <- ev + lowerquant / hsmN^2 * Im(hm(errors,ev))^2; } - else if(any(intervalsType==c("sp","p"))){ + else if(any(intervalType==c("sp","p"))){ if(Etype=="M"){ if(any(iprob!=1)){ quants <- qlnormBin(iprob, level=level, meanVec=0, sdVec=sqrt(s2), Etype="M"); @@ -2338,13 +2359,13 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b = sqrt(s^2 / 2) s2 <- sqrt(s2 / 2); upper <- exp(qlaplace((1+level)/2,0,s2)); lower <- exp(qlaplace((1-level)/2,0,s2)); } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * 50^2 s2 <- (s2/120)^0.25; @@ -2366,11 +2387,11 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" lower <- quants$lower; } else{ - if(cfType=="MAE"){ + if(loss=="MAE"){ # s^2 = 2 b^2 => b^2 = s^2 / 2 s2 <- s2 / 2; } - else if(cfType=="HAM"){ + else if(loss=="HAM"){ # s^2 = 120 b^4 => b^4 = s^2 / 120 # S(mu, b) = S(mu, 1) * 50^2 s2 <- s2/120; @@ -2380,7 +2401,7 @@ ssIntervals <- function(errors, ev=median(errors), level=0.95, intervalsType=c(" } } } - else if(intervalsType=="np"){ + else if(intervalType=="np"){ if(Etype=="M"){ errors <- errors + 1; } @@ -2404,27 +2425,25 @@ ssForecaster <- function(...){ if(!rounded){ # If error additive, estimate as normal. 
Otherwise - lognormal if(Etype=="A"){ - s2 <- as.vector(sum((errors*ot)^2)/obsInsample); + s2 <- as.vector(sum((errors*ot)^2)/obsInSample); s2g <- 1; } else{ - s2 <- as.vector(sum(log(1 + errors*ot)^2)/obsInsample); - s2g <- log(1 + vecg %*% as.vector(errors*ot)) %*% t(log(1 + vecg %*% as.vector(errors*ot)))/obsInsample; + s2 <- as.vector(sum(log(1 + errors*ot)^2)/obsInSample); + s2g <- log(1 + vecg %*% as.vector(errors*ot)) %*% t(log(1 + vecg %*% as.vector(errors*ot)))/obsInSample; } } - yForecastStart <- time(data)[obsInsample]+deltat(data); - if(h>0){ - yForecast <- ts(c(forecasterwrap(matrix(matvt[(obsInsample+1):(obsInsample+maxlag),],nrow=maxlag), + yForecast <- ts(c(forecasterwrap(matvt[(obsInSample+1):(obsInSample+maxlag),,drop=FALSE], matF, matw, h, Etype, Ttype, Stype, modellags, - matrix(matxt[(obsAll-h+1):(obsAll),],ncol=nExovars), - matrix(matat[(obsAll-h+1):(obsAll),],ncol=nExovars), matFX)), + matxt[(obsAll-h+1):(obsAll),,drop=FALSE], + matat[(obsAll-h+1):(obsAll),,drop=FALSE], matFX)), start=yForecastStart,frequency=dataFreq); - if(any(cfType==c("LogisticL","LogisticD"))){ + if(any(loss==c("LogisticL","LogisticD"))){ if(any(is.nan(yForecast)) | any(is.infinite(yForecast))){ - yForecast[] <- matvt[obsInsample,1]; + yForecast[] <- matvt[obsInSample,1]; } } @@ -2436,13 +2455,13 @@ ssForecaster <- function(...){ if(Etype=="M" & any(yForecast<0)){ warning(paste0("Negative values produced in forecast. This does not make any sense for a model with multiplicative error.\n", "Please use another model."),call.=FALSE); - if(intervals){ - warning("And don't expect anything reasonable from the prediction intervals!",call.=FALSE); + if(interval){ + warning("And don't expect anything reasonable from the prediction interval!",call.=FALSE); } } - # Write down the forecasting intervals - if(intervals){ + # Write down the forecasting interval + if(interval){ if(h==1){ errors.x <- as.vector(errors); ev <- median(errors); @@ -2451,14 +2470,14 @@ ssForecaster <- function(...){ errors.x <- errors.mat; ev <- apply(errors.mat,2,median,na.rm=TRUE); } - if(intervalsType!="a"){ + if(intervalType!="a"){ ev <- 0; } # We don't simulate pure additive models, pure multiplicative and # additive models with multiplicative error, # because they can be approximated by the pure additive ones - if(intervalsType=="p"){ + if(intervalType=="p"){ if(all(c(Etype,Stype,Ttype)!="M") | all(c(Etype,Stype,Ttype)!="A") | (all(Etype=="M",any(Ttype==c("A","N")),any(Stype==c("A","N"))) & s2<0.1)){ @@ -2472,7 +2491,7 @@ ssForecaster <- function(...){ simulateIntervals <- FALSE; } - # It is not possible to produce parametric / semi / non intervals for cumulative values + # It is not possible to produce parametric / semi / non interval for cumulative values # of a multiplicative model. So we use simulations instead.
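# The fallback announced above boils down to: simulate many error paths, push them
# through the model, and read the bounds off empirical quantiles of the simulated
# trajectories. Stripped to its essence for a toy random walk (an illustration only,
# not smooth's internal API):
set.seed(42)
h <- 12; nSamples <- 10000; s2 <- 0.5; level <- 0.95; lastLevel <- 100;
materrors <- matrix(rnorm(h*nSamples, 0, sqrt(s2)), h, nSamples);
ySimulated <- lastLevel + apply(materrors, 2, cumsum);        # h x nSamples paths
yLower <- apply(ySimulated, 1, quantile, probs=(1-level)/2);  # per-horizon bounds
yUpper <- apply(ySimulated, 1, quantile, probs=(1+level)/2);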
# if(Etype=="M"){ # simulateIntervals <- TRUE; @@ -2482,7 +2501,7 @@ ssForecaster <- function(...){ nSamples <- 100000; matg <- matrix(vecg,nComponents,nSamples); arrvt <- array(NA,c(h+maxlag,nComponents,nSamples)); - arrvt[1:maxlag,,] <- rep(matvt[obsInsample+(1:maxlag),],nSamples); + arrvt[1:maxlag,,] <- rep(matvt[obsInSample+(1:maxlag),],nSamples); materrors <- matrix(rnorm(h*nSamples,0,sqrt(s2)),h,nSamples); if(Etype=="M"){ @@ -2499,7 +2518,7 @@ ssForecaster <- function(...){ Etype,Ttype,Stype,modellags)$matyt; if(!is.null(xreg)){ - yForecastExo <- c(yForecast) - forecasterwrap(matrix(matvt[(obsInsample+1):(obsInsample+maxlag),],nrow=maxlag), + yForecastExo <- c(yForecast) - forecasterwrap(matrix(matvt[(obsInSample+1):(obsInSample+maxlag),],nrow=maxlag), matF, matw, h, Etype, Ttype, Stype, modellags, matrix(rep(1,h),ncol=1), matrix(rep(0,h),ncol=1), matrix(1,1,1)); } @@ -2508,7 +2527,7 @@ ssForecaster <- function(...){ } if(Etype=="M"){ - yForecast <- apply(ySimulated, 1, mean); + yForecast[] <- apply(ySimulated, 1, mean); } if(rounded){ @@ -2525,8 +2544,8 @@ ssForecaster <- function(...){ quantileType <- 7; } - yForecast <- yForecast + yForecastExo; - yForecast <- c(pForecast)*yForecast; + yForecast[] <- yForecast + yForecastExo; + yForecast[] <- pForecast * yForecast; if(cumulative){ yForecast <- ts(sum(yForecast),start=yForecastStart,frequency=dataFreq); @@ -2534,21 +2553,21 @@ ssForecaster <- function(...){ yUpper <- ts(quantile(colSums(ySimulated,na.rm=T),(1+level)/2,type=quantileType),start=yForecastStart,frequency=dataFreq); } else{ - yForecast <- ts(yForecast,start=yForecastStart,frequency=dataFreq); + # yForecast <- ts(yForecast,start=yForecastStart,frequency=dataFreq); yLower <- ts(apply(ySimulated,1,quantile,(1-level)/2,na.rm=T,type=quantileType) + yForecastExo,start=yForecastStart,frequency=dataFreq); yUpper <- ts(apply(ySimulated,1,quantile,(1+level)/2,na.rm=T,type=quantileType) + yForecastExo,start=yForecastStart,frequency=dataFreq); } } else{ - quantvalues <- ssIntervals(errors.x, ev=ev, level=level, intervalsType=intervalsType, df=obsInsample, + quantvalues <- ssIntervals(errors.x, ev=ev, level=level, intervalType=intervalType, df=obsInSample, measurement=matw, transition=matF, persistence=vecg, s2=s2, - modellags=modellags, states=matvt[(obsInsample-maxlag+1):obsInsample,], - cumulative=cumulative, cfType=cfType, + modellags=modellags, states=matvt[(obsInSample-maxlag+1):obsInSample,], + cumulative=cumulative, loss=loss, yForecast=yForecast, Etype=Etype, Ttype=Ttype, Stype=Stype, s2g=s2g, iprob=pForecast); - # if(!(intervalsType=="sp" & Etype=="M")){ - yForecast <- c(pForecast)*yForecast; + # if(!(intervalType=="sp" & Etype=="M")){ + yForecast[] <- pForecast * yForecast; # } if(cumulative){ @@ -2560,7 +2579,7 @@ ssForecaster <- function(...){ yUpper <- ts(c(yForecast) + quantvalues$upper,start=yForecastStart,frequency=dataFreq); } else{ - # if(any(intervalsType==c("np","sp","a"))){ + # if(any(intervalType==c("np","sp","a"))){ # quantvalues$upper <- quantvalues$upper * yForecast; # quantvalues$lower <- quantvalues$lower * yForecast; # } @@ -2578,15 +2597,15 @@ ssForecaster <- function(...){ yLower <- NA; yUpper <- NA; if(rounded){ - yForecast <- ceiling(yForecast); + yForecast[] <- ceiling(yForecast); } - yForecast <- c(pForecast)*yForecast; + yForecast[] <- pForecast*yForecast; if(cumulative){ yForecast <- ts(sum(yForecast),start=yForecastStart,frequency=dataFreq); } - else{ - yForecast <- ts(yForecast,start=yForecastStart,frequency=dataFreq); - } + # else{ + # 
yForecast <- ts(yForecast,start=yForecastStart,frequency=dataFreq); + # } } } else{ @@ -2600,12 +2619,12 @@ ssForecaster <- function(...){ warning("Please check the input and report this error to the maintainer if it persists.",call.=FALSE); } - if(cfType=="LogisticL"){ + if(loss=="LogisticL"){ yForecast <- yForecast / (1 + yForecast); yLower <- yLower / (1 + yLower); yUpper <- yUpper / (1 + yUpper); } - else if(cfType=="LogisticD"){ + else if(loss=="LogisticD"){ # If the values are too high (hard to take exp), substitute by 1 yForecastNew <- exp(yForecast) / (1 + exp(yForecast)); yLowerNew <- exp(yLower) / (1 + exp(yLower)); @@ -2631,13 +2650,12 @@ ssForecaster <- function(...){ assign("yForecast",yForecast,ParentEnvironment); assign("yLower",yLower,ParentEnvironment); assign("yUpper",yUpper,ParentEnvironment); - assign("yForecastStart",yForecastStart,ParentEnvironment); } ##### *Check and initialisation of xreg* ##### -ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, +ssXreg <- function(y, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, persistenceX=NULL, transitionX=NULL, initialX=NULL, - obsInsample, obsAll, obsStates, maxlag=1, h=1, xregDo="u", silent=FALSE, + obsInSample, obsAll, obsStates, maxlag=1, h=1, xregDo="u", silent=FALSE, allowMultiplicative=FALSE){ # The function does general checks needed for exogenous variables and returns the list of necessary parameters @@ -2661,7 +2679,7 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, if(is.vector(xreg) | (is.ts(xreg) & !is.matrix(xreg))){ # Check if xreg contains something meaningful if(is.null(initialX)){ - if(all(xreg[1:obsInsample]==xreg[1])){ + if(all(xreg[1:obsInSample]==xreg[1])){ warning("The exogenous variable has no variability. Cannot do anything with that, so dropping xreg.", call.=FALSE); xreg <- NULL; @@ -2686,7 +2704,7 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, xreg <- xreg[1:obsAll]; } - if(all(data[1:obsInsample]==xreg[1:obsInsample])){ + if(all(y[1:obsInSample]==xreg[1:obsInSample])){ warning("The exogenous variable and the forecasted data are exactly the same. What's the point of such a regression?", call.=FALSE); xreg <- NULL; @@ -2701,19 +2719,19 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, # Fill in the initial values for exogenous coefs using OLS if(is.null(initialX)){ if(Etype=="M"){ - matat[1:maxlag,] <- cov(log(data[1:obsInsample][ot==1]), - xreg[1:obsInsample][ot==1])/var(xreg[1:obsInsample][ot==1]); + matat[1:maxlag,] <- cov(log(y[1:obsInSample][ot==1]), + xreg[1:obsInSample][ot==1])/var(xreg[1:obsInSample][ot==1]); matatMultiplicative[1:maxlag,] <- matat[1:maxlag,]; } else{ - matat[1:maxlag,] <- cov(data[1:obsInsample][ot==1],xreg[1:obsInsample][ot==1])/var(xreg[1:obsInsample][ot==1]); + matat[1:maxlag,] <- cov(y[1:obsInSample][ot==1],xreg[1:obsInSample][ot==1])/var(xreg[1:obsInSample][ot==1]); } matat[] <- matat[1,] # If Etype=="Z" or "C", estimate multiplicative stuff.
if(allowMultiplicative & all(Etype!=c("M","A"))){ - matatMultiplicative[1:maxlag,] <- cov(log(data[1:obsInsample][ot==1]), - xreg[1:obsInsample][ot==1])/var(xreg[1:obsInsample][ot==1]); + matatMultiplicative[1:maxlag,] <- cov(log(y[1:obsInSample][ot==1]), + xreg[1:obsInSample][ot==1])/var(xreg[1:obsInSample][ot==1]); } } if(is.null(names(xreg))){ @@ -2761,7 +2779,7 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, xreg <- xreg[1:obsAll,]; } - xregEqualToData <- apply(xreg[1:obsInsample,]==data[1:obsInsample],2,all); + xregEqualToData <- apply(xreg[1:obsInSample,]==y[1:obsInSample],2,all); if(any(xregEqualToData)){ warning("One of the exogenous variables is exactly the same as the forecasted data. It has been dropped.", call.=FALSE); @@ -2772,7 +2790,7 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, # If initialX is provided, then probably we don't need to check the xreg on variability and multicollinearity if(is.null(initialX)){ - checkvariability <- apply(matrix(xreg[1:obsInsample,][ot==1,]==rep(xreg[ot==1,][1,],each=sum(ot)),sum(ot),nExovars),2,all); + checkvariability <- apply(matrix(xreg[1:obsInSample,][ot==1,]==rep(xreg[ot==1,][1,],each=sum(ot)),sum(ot),nExovars),2,all); if(any(checkvariability)){ if(all(checkvariability)){ warning("None of the exogenous variables has any variability. Cannot do anything with that, so dropping xreg.", @@ -2839,22 +2857,22 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, # Fill in the initial values for exogenous coefs using OLS if(is.null(initialX)){ if(Etype=="M"){ - matat[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInsample,][ot==1,]) %*% mat.x[1:obsInsample,][ot==1,],tol=1e-50) %*% - t(mat.x[1:obsInsample,][ot==1,]) %*% log(data[1:obsInsample][ot==1]))[2:(nExovars+1)], + matat[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInSample,][ot==1,]) %*% mat.x[1:obsInSample,][ot==1,],tol=1e-50) %*% + t(mat.x[1:obsInSample,][ot==1,]) %*% log(y[1:obsInSample][ot==1]))[2:(nExovars+1)], each=maxlag); matatMultiplicative[1:maxlag,] <- matat[1:maxlag,]; } else{ - matat[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInsample,][ot==1,]) %*% mat.x[1:obsInsample,][ot==1,],tol=1e-50) %*% - t(mat.x[1:obsInsample,][ot==1,]) %*% data[1:obsInsample][ot==1])[2:(nExovars+1)], + matat[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInSample,][ot==1,]) %*% mat.x[1:obsInSample,][ot==1,],tol=1e-50) %*% + t(mat.x[1:obsInSample,][ot==1,]) %*% y[1:obsInSample][ot==1])[2:(nExovars+1)], each=maxlag); } matat[-1,] <- rep(matat[1,],each=obsStates-1); # If Etype=="Z" or "C", estimate multiplicative stuff.
if(allowMultiplicative & all(Etype!=c("M","A"))){ - matatMultiplicative[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInsample,][ot==1,]) %*% mat.x[1:obsInsample,][ot==1,],tol=1e-50) %*% - t(mat.x[1:obsInsample,][ot==1,]) %*% log(data[1:obsInsample][ot==1]))[2:(nExovars+1)], + matatMultiplicative[1:maxlag,] <- rep(t(solve(t(mat.x[1:obsInSample,][ot==1,]) %*% mat.x[1:obsInSample,][ot==1,],tol=1e-50) %*% + t(mat.x[1:obsInSample,][ot==1,]) %*% log(y[1:obsInSample][ot==1]))[2:(nExovars+1)], each=maxlag); } } @@ -2985,36 +3003,36 @@ ssXreg <- function(data, Etype="A", xreg=NULL, updateX=FALSE, ot=NULL, ##### *Likelihood function* ##### likelihoodFunction <- function(C){ #### Concentrated logLikelihood based on C and CF #### - logLikFromCF <- function(C, cfType){ + logLikFromCF <- function(C, loss){ yotSumLog <- switch(Etype, "M" = sum(log(yot)), "A" = 0); - if(Etype=="M" && any(cfType==c("TMSE","GTMSE","TMAE","GTMAE","THAM","GTHAM", + if(Etype=="M" && any(loss==c("TMSE","GTMSE","TMAE","GTMAE","THAM","GTHAM", "TFL","aTMSE","aGTMSE","aTFL"))){ yotSumLog <- yotSumLog * h; } - if(any(cfType==c("MAE","MAEh","MACE"))){ - return(- (obsInsample*(log(2) + 1 + log(CF(C))) + obsZero) - yotSumLog); + if(any(loss==c("MAE","MAEh","MACE"))){ + return(- (obsInSample*(log(2) + 1 + log(CF(C))) + obsZero) - yotSumLog); } - else if(any(cfType==c("HAM","HAMh","CHAM"))){ + else if(any(loss==c("HAM","HAMh","CHAM"))){ #### This is a temporary fix for the oes models... Needs to be done properly!!! #### - return(- 2*(obsInsample*(log(2) + 1 + log(CF(C))) + obsZero) - yotSumLog); + return(- 2*(obsInSample*(log(2) + 1 + log(CF(C))) + obsZero) - yotSumLog); } - else if(any(cfType==c("TFL","aTFL"))){ - return(- 0.5 *(obsInsample*(h*log(2*pi) + 1 + CF(C)) + obsZero) - yotSumLog); + else if(any(loss==c("TFL","aTFL"))){ + return(- 0.5 *(obsInSample*(h*log(2*pi) + 1 + CF(C)) + obsZero) - yotSumLog); } - else if(any(cfType==c("LogisticD","LogisticL","TSB","Rounded"))){ + else if(any(loss==c("LogisticD","LogisticL","TSB","Rounded"))){ return(-CF(C)); } else{ - #if(cfType==c("MSE","MSEh","MSCE")) obsNonzero - return(- 0.5 *(obsInsample*(log(2*pi) + 1 + log(CF(C))) + obsZero) - yotSumLog); + #if(loss==c("MSE","MSEh","MSCE")) obsNonzero + return(- 0.5 *(obsInSample*(log(2*pi) + 1 + log(CF(C))) + obsZero) - yotSumLog); } } if(any(occurrence==c("n","p"))){ - return(logLikFromCF(C, cfType)); + return(logLikFromCF(C, loss)); } else{ #Failsafe for exceptional cases when the probability is equal to zero / one, when it should not have been. @@ -3023,16 +3041,16 @@ likelihoodFunction <- function(C){ ptNew <- pFitted[(pFitted!=0) & (pFitted!=1)]; otNew <- ot[(pFitted!=0) & (pFitted!=1)]; if(length(ptNew)==0){ - return(logLikFromCF(C, cfType)); + return(logLikFromCF(C, loss)); } else{ return(sum(log(ptNew[otNew==1])) + sum(log(1-ptNew[otNew==0])) - + logLikFromCF(C, cfType)); + + logLikFromCF(C, loss)); } } #Failsafe for cases, when data has no variability when ot==1. 
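# logLikFromCF() above returns the likelihood concentrated with respect to the
# scale parameter: for the normal/MSE case (ignoring the intermittent obsZero and
# yotSumLog corrections) it is -n/2*(log(2*pi) + 1 + log(sigma2)), with sigma2 the
# in-sample MSE. This is an exact identity, not an approximation -- a quick check:
set.seed(3)
e <- rnorm(200, 0, 2);
n <- length(e); sigma2 <- mean(e^2);
-n/2*(log(2*pi) + 1 + log(sigma2));
sum(dnorm(e, 0, sqrt(sigma2), log=TRUE));   # matches the line above exactly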
if(CF(C)==0){ - if(cfType=="TFL" | cfType=="aTFL"){ + if(loss=="TFL" | loss=="aTFL"){ return(sum(log(pFitted[ot==1]))*h + sum(log(1-pFitted[ot==0]))*h); } else{ @@ -3042,14 +3060,14 @@ likelihoodFunction <- function(C){ if(rounded){ return(sum(log(pFitted[ot==1])) + sum(log(1-pFitted[ot==0])) - CF(C) - obsZero/2*(log(2*pi*C[length(C)]^2)+1)); } - if(cfType=="TFL" | cfType=="aTFL"){ + if(loss=="TFL" | loss=="aTFL"){ return(sum(log(pFitted[ot==1]))*h + sum(log(1-pFitted[ot==0]))*h - + logLikFromCF(C, cfType)); + + logLikFromCF(C, loss)); } else{ return(sum(log(pFitted[ot==1])) + sum(log(1-pFitted[ot==0])) - + logLikFromCF(C, cfType)); + + logLikFromCF(C, loss)); } } } @@ -3067,22 +3085,22 @@ ICFunction <- function(nParam=nParam,nParamOccurrence=nParamOccurrence, # max here is needed in order to take into account cases with higher ## number of parameters than observations ### AICc and BICc are incorrect in case of non-normal residuals! - if(cfType=="TFL"){ + if(loss=="TFL"){ coefAIC <- 2*nParamOverall*h - 2*llikelihood; - coefBIC <- log(obsInsample)*nParamOverall*h - 2*llikelihood; - coefAICc <- (2*obsInsample*(nParam*h + (h*(h+1))/2) / - max(obsInsample - nParam - 1 - h,0) + coefBIC <- log(obsInSample)*nParamOverall*h - 2*llikelihood; + coefAICc <- (2*obsInSample*(nParam*h + (h*(h+1))/2) / + max(obsInSample - nParam - 1 - h,0) -2*llikelihood); coefBICc <- (((nParam + (h*(h+1))/2)* - log(obsInsample*h)*obsInsample*h) / - max(obsInsample*h - nParam - (h*(h+1))/2,0) + log(obsInSample*h)*obsInSample*h) / + max(obsInSample*h - nParam - (h*(h+1))/2,0) -2*llikelihood); } else{ coefAIC <- 2*nParamOverall - 2*llikelihood; - coefBIC <- log(obsInsample)*nParamOverall - 2*llikelihood; - coefAICc <- coefAIC + 2*nParam*(nParam+1) / max(obsInsample-nParam-1,0); - coefBICc <- (nParam * log(obsInsample) * obsInsample) / (obsInsample - nParam - 1) -2*llikelihood; + coefBIC <- log(obsInSample)*nParamOverall - 2*llikelihood; + coefAICc <- coefAIC + 2*nParam*(nParam+1) / max(obsInSample-nParam-1,0); + coefBICc <- (nParam * log(obsInSample) * obsInSample) / (obsInSample - nParam - 1) -2*llikelihood; } ICs <- c(coefAIC, coefAICc, coefBIC, coefBICc); @@ -3095,9 +3113,9 @@ ICFunction <- function(nParam=nParam,nParamOccurrence=nParamOccurrence, ssOutput <- function(timeelapsed, modelname, persistence=NULL, transition=NULL, measurement=NULL, phi=NULL, ARterms=NULL, MAterms=NULL, constant=NULL, A=NULL, B=NULL, initialType="o", nParam=NULL, s2=NULL, hadxreg=FALSE, wentwild=FALSE, - cfType="MSE", cfObjective=NULL, intervals=FALSE, cumulative=FALSE, - intervalsType=c("n","p","sp","np","a"), level=0.95, ICs, - holdout=FALSE, insideintervals=NULL, errormeasures=NULL, + loss="MSE", cfObjective=NULL, interval=FALSE, cumulative=FALSE, + intervalType=c("n","p","sp","np","a"), level=0.95, ICs, + holdout=FALSE, insideinterval=NULL, errormeasures=NULL, occurrence="n"){ # Function forms the generic output for state space models. 
if(!is.null(modelname)){ @@ -3259,9 +3277,9 @@ ssOutput <- function(timeelapsed, modelname, persistence=NULL, transition=NULL, } } - cat(paste0("Cost function type: ",cfType)) + cat(paste0("Loss function type: ",loss)) if(!is.null(cfObjective)){ - cat(paste0("; Cost function value: ",round(cfObjective,3),"\n")); + cat(paste0("; Loss function value: ",round(cfObjective,3),"\n")); } else{ cat("\n"); @@ -3276,28 +3294,28 @@ ssOutput <- function(timeelapsed, modelname, persistence=NULL, transition=NULL, } print(round(ICs,4)); - if(intervals){ - if(intervalsType=="p"){ - intervalsType <- "parametric"; + if(interval){ + if(intervalType=="p"){ + intervalType <- "parametric"; } - else if(intervalsType=="sp"){ - intervalsType <- "semiparametric"; + else if(intervalType=="sp"){ + intervalType <- "semiparametric"; } - else if(intervalsType=="np"){ - intervalsType <- "nonparametric"; + else if(intervalType=="np"){ + intervalType <- "nonparametric"; } - else if(intervalsType=="a"){ - intervalsType <- "asymmetric"; + else if(intervalType=="a"){ + intervalType <- "asymmetric"; } if(cumulative){ - intervalsType <- paste0("cumulative ",intervalsType); + intervalType <- paste0("cumulative ",intervalType); } - cat(paste0(level*100,"% ",intervalsType," prediction intervals were constructed\n")); + cat(paste0(level*100,"% ",intervalType," prediction interval was constructed\n")); } if(holdout){ - if(intervals && !is.null(insideintervals)){ - cat(paste0(round(insideintervals,0), "% of values are in the prediction interval\n")); + if(interval && !is.null(insideinterval)){ + cat(paste0(round(insideinterval,0), "% of values are in the prediction interval\n")); } cat("Forecast errors:\n"); if(any(occurrence==c("none","n"))){ diff --git a/R/ves.R b/R/ves.R index 817e3e0..2cfe529 100644 --- a/R/ves.R +++ b/R/ves.R @@ -58,6 +58,9 @@ utils::globalVariables(c("nParamMax","nComponentsAll","nComponentsNonSeasonal"," #' In case of multiplicative model, instead of the vector y_t we use its logarithms. #' As a result the multiplicative model is much easier to work with.
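# The paragraph above is the key to the multiplicative VES: the model is estimated
# on logarithms, so the pure multiplicative specification behaves like the pure
# additive one applied in log-space. A sketch in the style of this file's examples
# (simulated positive series; the "MNN" forecasts come back on the original scale,
# while the "ANN"-on-logs fit stays in logs):
Y <- ts(exp(cbind(rnorm(100,5,0.1),rnorm(100,5,0.1))),frequency=12)
ves(Y, model="MNN", h=10)
ves(log(Y), model="ANN", h=10)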
#' +#' For some more information about the model and its implementation, see the +#' vignette: \code{vignette("ves","smooth")} +#' #' @template vssBasicParam #' @template vssAdvancedParam #' @template ssAuthor @@ -112,7 +115,7 @@ utils::globalVariables(c("nParamMax","nComponentsAll","nComponentsNonSeasonal"," #' \item \code{initialSeason} - The initial values of the seasonal components; #' \item \code{nParam} - The number of estimated parameters; #' \item \code{imodel} - The intermittent model estimated with VES; -#' \item \code{actuals} - The matrix with the original data; +#' \item \code{y} - The matrix with the original data; #' \item \code{fitted} - The matrix of the fitted values; #' \item \code{holdout} - The matrix with the holdout values (if \code{holdout=TRUE} in #' the estimation); @@ -120,13 +123,13 @@ utils::globalVariables(c("nParamMax","nComponentsAll","nComponentsNonSeasonal"," #' \item \code{Sigma} - The covariance matrix of the errors (estimated with the correction #' for the number of degrees of freedom); #' \item \code{forecast} - The matrix of point forecasts; -#' \item \code{PI} - The bounds of the prediction intervals; -#' \item \code{intervals} - The type of the constructed prediction intervals; -#' \item \code{level} - The level of the confidence for the prediction intervals; +#' \item \code{PI} - The bounds of the prediction interval; +#' \item \code{interval} - The type of the constructed prediction interval; +#' \item \code{level} - The level of the confidence for the prediction interval; #' \item \code{ICs} - The values of the information criteria; #' \item \code{logLik} - The log-likelihood function; -#' \item \code{cf} - The value of the cost function; -#' \item \code{cfType} - The type of the used cost function; +#' \item \code{lossValue} - The value of the loss function; +#' \item \code{loss} - The type of the used loss function; #' \item \code{accuracy} - the values of the error measures. Currently not available. #' \item \code{FI} - Fisher information if user asked for it using \code{FI=TRUE}. #' } @@ -152,12 +155,12 @@ utils::globalVariables(c("nParamMax","nComponentsAll","nComponentsNonSeasonal"," #' ves(Y,model="MNN",h=10,holdout=TRUE,intermittent="l") #' #' @export -ves <- function(data, model="ANN", persistence=c("group","independent","dependent","seasonal"), +ves <- function(y, model="ANN", persistence=c("group","independent","dependent","seasonal"), transition=c("group","independent","dependent"), phi=c("group","individual"), initial=c("individual","group"), initialSeason=c("group","individual"), - cfType=c("likelihood","diagonal","trace"), + loss=c("likelihood","diagonal","trace"), ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, - intervals=c("none","conditional","unconditional","independent"), level=0.95, + interval=c("none","conditional","unconditional","independent"), level=0.95, cumulative=FALSE, intermittent=c("none","fixed","logistic"), imodel="ANN", iprobability=c("dependent","independent"), @@ -168,6 +171,11 @@ ves <- function(data, model="ANN", persistence=c("group","independent","dependen # Start measuring the time of calculations startTime <- Sys.time(); + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + loss <- depricator(loss, list(...), "cfType"); + interval <- depricator(interval, list(...), "intervals"); + # If a previous model provided as a model, write down the variables if(any(is.vsmooth(model))){ if(smoothType(model)!="VES"){ @@ -212,8 +220,8 @@ ves <- function(data, model="ANN", persistence=c("group","independent","dependen CF <- function(A){ elements <- BasicInitialiserVES(matvt,matF,matG,matW,A); - cfRes <- vOptimiserWrap(y, elements$matvt, elements$matF, elements$matW, elements$matG, - modelLags, Etype, Ttype, Stype, cfType, normalizer, bounds, ot, otObs); + cfRes <- vOptimiserWrap(yInSample, elements$matvt, elements$matF, elements$matW, elements$matG, + modelLags, Etype, Ttype, Stype, loss, normalizer, bounds, ot, otObs); # multisteps, initialType, bounds, if(is.nan(cfRes) | is.na(cfRes) | is.infinite(cfRes)){ @@ -414,7 +422,7 @@ BasicMakerVES <- function(...){ } else{ XValues <- rbind(rep(1,obsInSample),c(1:obsInSample)); - initialValue <- y %*% t(XValues) %*% solve(XValues %*% t(XValues)); + initialValue <- yInSample %*% t(XValues) %*% solve(XValues %*% t(XValues)); if(Etype=="L"){ initialValue[,1] <- (initialValue[,1] - 0.5) * 20; } @@ -440,10 +448,10 @@ BasicMakerVES <- function(...){ # Matrix of dummies for seasons XValues <- matrix(rep(diag(maxlag),ceiling(obsInSample/maxlag)),maxlag)[,1:obsInSample]; # if(Stype=="A"){ - initialSeasonValue <- (y-rowMeans(y)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); + initialSeasonValue <- (yInSample-rowMeans(yInSample)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); # } # else{ - # initialSeasonValue <- (y-rowMeans(y)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); + # initialSeasonValue <- (yInSample-rowMeans(yInSample)) %*% t(XValues) %*% solve(XValues %*% t(XValues)); # } if(initialSeasonType=="g"){ initialSeasonValue <- matrix(colMeans(initialSeasonValue),1,maxlag); @@ -630,7 +638,7 @@ EstimatorVES <- function(...){ names(A) <- AList$ANames; # First part is for the covariance matrix - if(cfType=="l"){ + if(loss=="l"){ nParam <- nSeries * (nSeries + 1) / 2 + length(A); } else{ @@ -699,10 +707,10 @@ CreatorVES <- function(silent=FALSE,...){ # Number of parameters # First part is for the covariance matrix - if(cfType=="l"){ + if(loss=="l"){ nParam <- nSeries * (nSeries + 1) / 2; } - else if(cfType=="d"){ + else if(loss=="d"){ nParam <- nSeries; } else{ @@ -729,7 +737,7 @@ CreatorVES <- function(silent=FALSE,...){ } } -##### Preset y.fit, y.for, errors and basic parameters ##### +##### Preset yFitted, yForecast, errors and basic parameters ##### yFitted <- matrix(NA,nSeries,obsInSample); yForecast <- matrix(NA,nSeries,h); errors <- matrix(NA,nSeries,obsInSample); @@ -842,30 +850,30 @@ CreatorVES <- function(silent=FALSE,...){ colnames(initialSeasonValue) <- paste0("Seasonal",c(1:maxlag)); } - matvt <- ts(t(matvt),start=(time(data)[1] - dataDeltat*maxlag),frequency=dataFreq); + matvt <- ts(t(matvt),start=(time(y)[1] - dataDeltat*maxlag),frequency=dataFreq); yFitted <- ts(t(yFitted),start=dataStart,frequency=dataFreq); errors <- ts(t(errors),start=dataStart,frequency=dataFreq); - yForecast <- ts(t(yForecast),start=time(data)[obsInSample] + dataDeltat,frequency=dataFreq); + yForecast <- ts(t(yForecast),start=yForecastStart,frequency=dataFreq); if(!is.matrix(yForecast)){ yForecast <- as.matrix(yForecast,h,nSeries); } colnames(yForecast) <- dataNames; - forecastStart <- start(yForecast) - if(any(intervalsType==c("i","u"))){ - PI <- 
ts(PI,start=forecastStart,frequency=dataFreq); + yForecastStart <- start(yForecast) + if(any(intervalType==c("i","u"))){ + PI <- ts(PI,start=yForecastStart,frequency=dataFreq); } - if(cfType=="l"){ - cfType <- "likelihood"; + if(loss=="l"){ + loss <- "likelihood"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries * (nSeries + 1) / 2; } - else if(cfType=="d"){ - cfType <- "diagonal"; + else if(loss=="d"){ + loss <- "diagonal"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries; } else{ - cfType <- "trace"; + loss <- "trace"; parametersNumber[1,1] <- parametersNumber[1,1] + nSeries; } @@ -879,16 +887,16 @@ CreatorVES <- function(silent=FALSE,...){ ##### Now let's deal with the holdout ##### if(holdout){ - yHoldout <- ts(data[(obsInSample+1):obsAll,],start=forecastStart,frequency=dataFreq); + yHoldout <- ts(y[(obsInSample+1):obsAll,],start=yForecastStart,frequency=dataFreq); colnames(yHoldout) <- dataNames; - measureFirst <- measures(yHoldout[,1],yForecast[,1],y[1,]); + measureFirst <- measures(yHoldout[,1],yForecast[,1],yInSample[1,]); errorMeasures <- matrix(NA,nSeries,length(measureFirst)); rownames(errorMeasures) <- dataNames; colnames(errorMeasures) <- names(measureFirst); errorMeasures[1,] <- measureFirst; for(i in 2:nSeries){ - errorMeasures[i,] <- measures(yHoldout[,i],yForecast[,i],y[i,]); + errorMeasures[i,] <- measures(yHoldout[,i],yForecast[,i],yInSample[i,]); } } else{ @@ -924,37 +932,37 @@ CreatorVES <- function(silent=FALSE,...){ for(j in 1:pages){ par(mar=c(4,4,2,1),mfcol=c(perPage,1)); for(i in packs[j]:(packs[j+1]-1)){ - if(any(intervalsType==c("u","i"))){ - plotRange <- range(min(data[,i],yForecast[,i],yFitted[,i],PI[,i*2-1]), - max(data[,i],yForecast[,i],yFitted[,i],PI[,i*2])); + if(any(intervalType==c("u","i"))){ + plotRange <- range(min(y[,i],yForecast[,i],yFitted[,i],PI[,i*2-1]), + max(y[,i],yForecast[,i],yFitted[,i],PI[,i*2])); } else{ - plotRange <- range(min(data[,i],yForecast[,i],yFitted[,i]), - max(data[,i],yForecast[,i],yFitted[,i])); + plotRange <- range(min(y[,i],yForecast[,i],yFitted[,i]), + max(y[,i],yForecast[,i],yFitted[,i])); } - plot(data[,i],main=paste0(modelname," ",dataNames[i]),ylab="Y", - ylim=plotRange, xlim=range(time(data[,i])[1],time(yForecast)[max(h,1)]), + plot(y[,i],main=paste0(modelname," ",dataNames[i]),ylab="Y", + ylim=plotRange, xlim=range(time(y[,i])[1],time(yForecast)[max(h,1)]), type="l"); lines(yFitted[,i],col="purple",lwd=2,lty=2); if(h>1){ - if(any(intervalsType==c("u","i"))){ + if(any(intervalType==c("u","i"))){ lines(PI[,i*2-1],col="darkgrey",lwd=3,lty=2); lines(PI[,i*2],col="darkgrey",lwd=3,lty=2); - polygon(c(seq(dataDeltat*(forecastStart[2]-1)+forecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat), - rev(seq(dataDeltat*(forecastStart[2]-1)+forecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat))), + polygon(c(seq(dataDeltat*(yForecastStart[2]-1)+yForecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat), + rev(seq(dataDeltat*(yForecastStart[2]-1)+yForecastStart[1],dataDeltat*(end(yForecast)[2]-1)+end(yForecast)[1],dataDeltat))), c(as.vector(PI[,i*2]), rev(as.vector(PI[,i*2-1]))), col = "lightgray", border=NA, density=10); } lines(yForecast[,i],col="blue",lwd=2); } else{ - if(any(intervalsType==c("u","i"))){ + if(any(intervalType==c("u","i"))){ points(PI[,i*2-1],col="darkgrey",lwd=3,pch=4); points(PI[,i*2],col="darkgrey",lwd=3,pch=4); } points(yForecast[,i],col="blue",lwd=2,pch=4); } - 
abline(v=dataDeltat*(forecastStart[2]-2)+forecastStart[1],col="red",lwd=2); + abline(v=dataDeltat*(yForecastStart[2]-2)+yForecastStart[1],col="red",lwd=2); } } par(parDefault); @@ -966,9 +974,9 @@ CreatorVES <- function(silent=FALSE,...){ measurement=matW, phi=dampedValue, coefficients=A, initialType=initialType,initial=initialValue,initialSeason=initialSeasonValue, nParam=parametersNumber, imodel=imodel, - actuals=data,fitted=yFitted,holdout=yHoldout,residuals=errors,Sigma=Sigma, - forecast=yForecast,PI=PI,intervals=intervalsType,level=level, - ICs=ICs,logLik=logLik,cf=cfObjective,cfType=cfType,accuracy=errorMeasures, + y=y,fitted=yFitted,holdout=yHoldout,residuals=errors,Sigma=Sigma, + forecast=yForecast,PI=PI,interval=intervalType,level=level, + ICs=ICs,logLik=logLik,lossValue=cfObjective,loss=loss,accuracy=errorMeasures, FI=FI); return(structure(model,class=c("vsmooth","smooth"))); } diff --git a/R/viss.R b/R/viss.R index 008827a..2bb40c1 100644 --- a/R/viss.R +++ b/R/viss.R @@ -10,7 +10,7 @@ #' @template ssAuthor #' @template ssKeywords #' -#' @param data The matrix with data, where series are in columns and +#' @param y The matrix with data, where series are in columns and #' observations are in rows. #' @param intermittent Type of method used in probability estimation. Can be #' \code{"none"} - none, \code{"fixed"} - constant probability or @@ -40,6 +40,7 @@ #' If \code{NULL}, then it is estimated. See \link[smooth]{ves} for the details. #' @param xreg Vector of matrix of exogenous variables, explaining some parts #' of occurrence variable (probability). +#' @param ... Other parameters. This is not needed for now. #' @return The object of class "iss" is returned. It contains following list of #' values: #' @@ -52,7 +53,7 @@ #' \item \code{logLik} - likelihood value for the model #' \item \code{nParam} - number of parameters used in the model; #' \item \code{residuals} - residuals of the model; -#' \item \code{actuals} - actual values of probabilities (zeros and ones). +#' \item \code{y} - actual values of probabilities (zeros and ones). #' \item \code{persistence} - the vector of smoothing parameters; #' \item \code{initial} - initial values of the state vector; #' \item \code{initialSeason} - the matrix of initials seasonal states; @@ -75,14 +76,18 @@ #' viss(Y, intermittent="l", probability="i") #' #' @export viss -viss <- function(data, intermittent=c("logistic","none","fixed"), +viss <- function(y, intermittent=c("logistic","none","fixed"), ic=c("AICc","AIC","BIC","BICc"), h=10, holdout=FALSE, probability=c("dependent","independent"), model="ANN", persistence=NULL, transition=NULL, phi=NULL, - initial=NULL, initialSeason=NULL, xreg=NULL){ + initial=NULL, initialSeason=NULL, xreg=NULL, ...){ # Function returns intermittent State-Space model # probability="i" - assume that ot[,1] is independent from ot[,2], but has similar dynamics; # probability="d" - assume that ot[,1] and ot[,2] are dependent, so that sum(P)=1; + + ##### Check if data was used instead of y. 
Remove by 2.6.0 ##### + y <- depricator(y, list(...), "data"); + intermittent <- substring(intermittent[1],1,1); if(all(intermittent!=c("n","f","l"))){ warning(paste0("Unknown value of intermittent provided: '",intermittent,"'.")); @@ -156,25 +161,25 @@ viss <- function(data, intermittent=c("logistic","none","fixed"), } } - if(is.data.frame(data)){ - data <- as.matrix(data); + if(is.data.frame(y)){ + y <- as.matrix(y); } # Number of series in the matrix - nSeries <- ncol(data); + nSeries <- ncol(y); - if(is.null(ncol(data))){ + if(is.null(ncol(y))){ stop("The provided data is not a matrix! Use iss() function instead!", call.=FALSE); } - if(ncol(data)==1){ + if(ncol(y)==1){ stop("The provided data contains only one column. Use iss() function instead!", call.=FALSE); } # Check the data for NAs - if(any(is.na(data))){ if(!silentText){ warning("Data contains NAs. These observations will be substituted by zeroes.", call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } if(intermittent=="n"){ @@ -182,23 +187,24 @@ viss <- function(data, intermittent=c("logistic","none","fixed"), } # Define obs, the number of observations of in-sample - obsInSample <- nrow(data) - holdout*h; + obsInSample <- nrow(y) - holdout*h; # Define obsAll, the overall number of observations (in-sample + holdout) - obsAll <- nrow(data) + (1 - holdout)*h; + obsAll <- nrow(y) + (1 - holdout)*h; # If obsInSample is negative, this means that we can't do anything... if(obsInSample<=2){ stop("Not enough observations in sample.", call.=FALSE); } # Define the actual values. - dataFreq <- frequency(data); - dataDeltat <- deltat(data); - dataStart <- start(data); - y <- ts(matrix(data[1:obsInSample,],obsInSample,nSeries),start=dataStart,frequency=dataFreq); + dataFreq <- frequency(y); + dataDeltat <- deltat(y); + dataStart <- start(y); + yInSample <- ts(matrix(y[1:obsInSample,],obsInSample,nSeries),start=dataStart,frequency=dataFreq); + yForecastStart <- time(y)[obsInSample]+deltat(y); - ot <- (y!=0)*1; - otAll <- (data!=0)*1; + ot <- (yInSample!=0)*1; + otAll <- (y!=0)*1; obsOnes <- apply(ot,2,sum); pFitted <- matrix(NA,obsInSample,nSeries); @@ -322,11 +328,11 @@ viss <- function(data, intermittent=c("logistic","none","fixed"), states <- ts(states, start=dataStart, frequency=dataFreq); pFitted <- ts(pFitted, start=dataStart, frequency=dataFreq); - pForecast <- ts(pForecast, start=time(data)[obsInSample] + dataDeltat, frequency=dataFreq); + pForecast <- ts(pForecast, start=time(y)[obsInSample] + dataDeltat, frequency=dataFreq); output <- list(model=model, fitted=pFitted, forecast=pForecast, states=states, variance=pForecast*(1-pForecast), logLik=logLik, nParam=nParam, - residuals=errors, actuals=otAll, persistence=persistence, initial=initial, + residuals=errors, y=otAll, persistence=persistence, initial=initial, initialSeason=initialSeason, intermittent=intermittent, issModel=issModel, probability=probability); diff --git a/R/vmethods.R b/R/vmethods.R index f495c25..6fcf806 100644 --- a/R/vmethods.R +++ b/R/vmethods.R @@ -16,9 +16,9 @@ logLik.viss <- function(object,...){ AICc.vsmooth <- function(object, ...){ llikelihood <- logLik(object); llikelihood <- llikelihood[1:length(llikelihood)]; - nSeries <- ncol(object$actuals); + nSeries <- ncol(actuals(object)); # Remove covariances in the number of parameters - nParamAll <- nparam(object) / nSeries - switch(object$cfType, + nParamAll <- nparam(object) / nSeries - switch(object$loss, "likelihood" = nSeries*(nSeries+1)/2, "trace" = , "diagonal" = 1); @@
-34,9 +34,9 @@ AICc.vsmooth <- function(object, ...){ BICc.vsmooth <- function(object, ...){ llikelihood <- logLik(object); llikelihood <- llikelihood[1:length(llikelihood)]; - nSeries <- ncol(object$actuals); + nSeries <- ncol(actuals(object)); # Remove covariances in the number of parameters - nParamAll <- nparam(object) / nSeries - switch(object$cfType, + nParamAll <- nparam(object) / nSeries - switch(object$loss, "likelihood" = nSeries*(nSeries+1)/2, "trace" = , "diagonal" = 1); @@ -106,13 +106,13 @@ plot.viss <- function(x, ...){ intermittent <- "None"; } - actuals <- x$actuals; + y <- actuals(x); yForecast <- x$forecast; yFitted <- x$fitted; - dataDeltat <- deltat(actuals); + dataDeltat <- deltat(y); forecastStart <- start(yForecast); h <- nrow(yForecast); - nSeries <- ncol(actuals); + nSeries <- ncol(y); modelname <- paste0("iVES(",x$model,")") pages <- ceiling(nSeries / 5); @@ -120,10 +120,10 @@ plot.viss <- function(x, ...){ for(j in 1:pages){ par(mfcol=c(min(5,floor(nSeries/j)),1)); for(i in 1:nSeries){ - plotRange <- range(min(actuals[,i],yForecast[,i],yFitted[,i]), - max(actuals[,i],yForecast[,i],yFitted[,i])); - plot(actuals[,i],main=paste0(modelname,", series ", i),ylab="Y", - ylim=plotRange, xlim=range(time(actuals[,i])[1],time(yForecast)[max(h,1)]), + plotRange <- range(min(y[,i],yForecast[,i],yFitted[,i]), + max(y[,i],yForecast[,i],yFitted[,i])); + plot(y[,i],main=paste0(modelname,", series ", i),ylab="Y", + ylim=plotRange, xlim=range(time(y[,i])[1],time(yForecast)[max(h,1)]), type="l"); lines(yFitted[,i],col="purple",lwd=2,lty=2); if(h>1){ @@ -211,16 +211,16 @@ print.viss <- function(x, ...){ #' @export print.vsmooth <- function(x, ...){ holdout <- any(!is.na(x$holdout)); - intervals <- any(!is.na(x$PI)); + interval <- any(!is.na(x$PI)); - # if(all(holdout,intervals)){ - # insideintervals <- sum((x$holdout <= x$upper) & (x$holdout >= x$lower)) / length(x$forecast) * 100; + # if(all(holdout,interval)){ + # insideinterval <- sum((x$holdout <= x$upper) & (x$holdout >= x$lower)) / length(x$forecast) * 100; # } # else{ - # insideintervals <- NULL; + # insideinterval <- NULL; # } - intervalsType <- x$intervals; + intervalType <- x$interval; cat(paste0("Time elapsed: ",round(as.numeric(x$timeElapsed,units="secs"),2)," seconds\n")); cat(paste0("Model estimated: ",x$model,"\n")); @@ -249,10 +249,10 @@ print.vsmooth <- function(x, ...){ } if(!is.null(x$nParam)){ if(x$nParam[1,4]==1){ - cat(paste0(x$nParam[1,4]," parameter was estimated for ", ncol(x$actuals) ," time series in the process\n")); + cat(paste0(x$nParam[1,4]," parameter was estimated for ", ncol(actuals(x)) ," time series in the process\n")); } else{ - cat(paste0(x$nParam[1,4]," parameters were estimated for ", ncol(x$actuals) ," time series in the process\n")); + cat(paste0(x$nParam[1,4]," parameters were estimated for ", ncol(actuals(x)) ," time series in the process\n")); } if(x$nParam[2,4]>1){ @@ -263,9 +263,9 @@ print.vsmooth <- function(x, ...){ } } - cat(paste0("Cost function type: ",x$cfType)) - if(!is.null(x$cf)){ - cat(paste0("; Cost function value: ",round(x$cf,3),"\n")); + cat(paste0("Loss function type: ",x$loss)) + if(!is.null(x$lossValue)){ + cat(paste0("; Loss function value: ",round(x$lossValue,3),"\n")); } else{ cat("\n"); @@ -274,17 +274,17 @@ print.vsmooth <- function(x, ...){ cat("\nInformation criteria:\n"); print(x$ICs); - if(intervals){ - if(x$intervals=="c"){ - intervalsType <- "conditional"; + if(interval){ + if(x$interval=="c"){ + intervalType <- "conditional"; } - else if(x$intervals=="u"){ - 
intervalsType <- "unconditional"; + else if(x$interval=="u"){ + intervalType <- "unconditional"; } - else if(x$intervals=="i"){ - intervalsType <- "independent"; + else if(x$interval=="i"){ + intervalType <- "independent"; } - cat(paste0(x$level*100,"% ",intervalsType," prediction intervals were constructed\n")); + cat(paste0(x$level*100,"% ",intervalType," prediction interval was constructed\n")); } } @@ -303,7 +303,7 @@ simulate.vsmooth <- function(object, nsim=1, seed=NULL, obs=NULL, ...){ # Start a list of arguments args <- vector("list",0); - args$nSeries <- ncol(object$actuals); + args$nSeries <- ncol(actuals(object)); if(!is.null(ellipsis$randomizer)){ randomizer <- ellipsis$randomizer; @@ -374,7 +374,7 @@ simulate.vsmooth <- function(object, nsim=1, seed=NULL, obs=NULL, ...){ } args$randomizer <- randomizer; - args$frequency <- frequency(object$actuals); + args$frequency <- frequency(actuals(object)); args$obs <- obs; args$nsim <- nsim; args$initial <- object$initial; diff --git a/R/vssFunctions.R b/R/vssFunctions.R index 6b23cb9..ddfd6d4 100644 --- a/R/vssFunctions.R +++ b/R/vssFunctions.R @@ -60,61 +60,62 @@ vssInput <- function(smoothType=c("ves"),...){ } #### Check data #### - if(any(is.vsmooth.sim(data))){ - data <- data$data; - if(length(dim(data))==3){ + if(any(is.vsmooth.sim(y))){ + y <- y$data; + if(length(dim(y))==3){ warning("Simulated data contains several samples. Selecting a random one.",call.=FALSE); - data <- ts(data[,,runif(1,1,dim(data)[3])]); + y <- ts(y[,,runif(1,1,dim(y)[3])]); } } - if(!is.data.frame(data)){ - if(!is.numeric(data)){ + if(!is.data.frame(y)){ + if(!is.numeric(y)){ stop("The provided data is not a numeric matrix! Can't construct any model!", call.=FALSE); } } - if(is.null(dim(data))){ + if(is.null(dim(y))){ stop("The provided data is not a matrix or a data.frame! If it is a vector, please use es() function instead.", call.=FALSE); } - if(is.data.frame(data)){ - data <- as.matrix(data); + if(is.data.frame(y)){ + y <- as.matrix(y); } # Number of series in the matrix - nSeries <- ncol(data); + nSeries <- ncol(y); - if(is.null(ncol(data))){ + if(is.null(ncol(y))){ stop("The provided data is not a matrix! Use es() function instead!", call.=FALSE); } - if(ncol(data)==1){ + if(ncol(y)==1){ stop("The provided data contains only one column. Use es() function instead!", call.=FALSE); } # Check the data for NAs - if(any(is.na(data))){ + if(any(is.na(y))){ if(!silentText){ warning("Data contains NAs. These observations will be substituted by zeroes.", call.=FALSE); } - data[is.na(data)] <- 0; + y[is.na(y)] <- 0; } # Define obs, the number of observations of in-sample - obsInSample <- nrow(data) - holdout*h; + obsInSample <- nrow(y) - holdout*h; # Define obsAll, the overall number of observations (in-sample + holdout) - obsAll <- nrow(data) + (1 - holdout)*h; + obsAll <- nrow(y) + (1 - holdout)*h; # If obsInSample is negative, this means that we can't do anything... if(obsInSample<=0){ stop("Not enough observations in sample.", call.=FALSE); } # Define the actual values. Transpose the matrix!
- y <- matrix(data[1:obsInSample,],nSeries,obsInSample,byrow=TRUE); - dataFreq <- frequency(data); - dataDeltat <- deltat(data); - dataStart <- start(data); - dataNames <- colnames(data); + yInSample <- matrix(y[1:obsInSample,],nSeries,obsInSample,byrow=TRUE); + dataFreq <- frequency(y); + dataDeltat <- deltat(y); + dataStart <- start(y); + yForecastStart <- time(y)[obsInSample]+deltat(y); + dataNames <- colnames(y); if(!is.null(dataNames)){ dataNames <- gsub(" ", "_", dataNames, fixed = TRUE); dataNames <- gsub(":", "_", dataNames, fixed = TRUE); @@ -182,7 +183,7 @@ vssInput <- function(smoothType=c("ves"),...){ #### Check seasonality type #### # Check if the data is ts-object - if(!is.ts(data) & Stype!="N"){ + if(!is.ts(y) & Stype!="N"){ warning("The provided data is not ts object. Only non-seasonal models are available."); Stype <- "N"; substr(model,nchar(model),nchar(model)) <- "N"; @@ -241,8 +242,8 @@ vssInput <- function(smoothType=c("ves"),...){ ##### intermittent ##### intermittent <- substring(intermittent[1],1,1); if(intermittent!="n"){ - ot <- (y!=0)*1; - # Matrix of non-zero observations for the cost function + ot <- (yInSample!=0)*1; + # Matrix of non-zero observations for the loss function otObs <- diag(rowSums(ot)); for(i in 1:nSeries){ for(j in 1:nSeries){ @@ -254,7 +255,7 @@ vssInput <- function(smoothType=c("ves"),...){ } } else{ - ot <- matrix(1,nrow=nrow(y),ncol=ncol(y)); + ot <- matrix(1,nrow=nrow(yInSample),ncol=ncol(yInSample)); otObs <- matrix(obsInSample,nSeries,nSeries); } @@ -268,11 +269,11 @@ vssInput <- function(smoothType=c("ves"),...){ # Check if multiplicative model can be applied if(any(c(Etype,Ttype,Stype)=="M")){ - if(all(y>0)){ + if(all(yInSample>0)){ if(any(c(Etype,Ttype,Stype)=="A")){ warning("Mixed models are not available. Switching to pure multiplicative.",call.=FALSE); } - y <- log(y); + yInSample <- log(yInSample); Etype <- "M"; Ttype <- ifelse(Ttype=="A","M",Ttype); Stype <- ifelse(Stype=="A","M",Stype); @@ -287,7 +288,7 @@ vssInput <- function(smoothType=c("ves"),...){ modelIsMultiplicative <- FALSE; } else{ - y[ot==1] <- log(y[ot==1]); + yInSample[ot==1] <- log(yInSample[ot==1]); Etype <- "M"; Ttype <- ifelse(Ttype=="A","M",Ttype); Stype <- ifelse(Stype=="A","M",Stype); @@ -703,15 +704,15 @@ vssInput <- function(smoothType=c("ves"),...){ } } - ##### Cost function type ##### - cfType <- cfType[1]; - if(!any(cfType==c("likelihood","diagonal","trace","l","d","t"))){ - warning(paste0("Strange cost function specified: ",cfType,". Switching to 'likelihood'."),call.=FALSE); - cfType <- "likelihood"; + ##### Loss function type ##### + loss <- loss[1]; + if(!any(loss==c("likelihood","diagonal","trace","l","d","t"))){ + warning(paste0("Strange loss function specified: ",loss,". 
Switching to 'likelihood'."),call.=FALSE); + loss <- "likelihood"; } - cfType <- substr(cfType,1,1); + loss <- substr(loss,1,1); - normalizer <- sum(colMeans(abs(diff(t(y))),na.rm=TRUE)); + normalizer <- sum(colMeans(abs(diff(t(yInSample))),na.rm=TRUE)); ##### Information Criteria ##### ic <- ic[1]; @@ -720,42 +721,42 @@ vssInput <- function(smoothType=c("ves"),...){ ic <- "AICc"; } - ##### intervals, intervalsType, level ##### - intervalsType <- intervals[1]; + ##### interval, intervalType, level ##### + intervalType <- interval[1]; # Check the provided type of interval - if(is.logical(intervalsType)){ - if(intervalsType){ - intervalsType <- "c"; + if(is.logical(intervalType)){ + if(intervalType){ + intervalType <- "c"; } else{ - intervalsType <- "none"; + intervalType <- "none"; } } - if(all(intervalsType!=c("c","u","i","n","none","conditional","unconditional","independent"))){ - warning(paste0("Wrong type of interval: '",intervalsType, "'. Switching to 'conditional'."),call.=FALSE); - intervalsType <- "c"; + if(all(intervalType!=c("c","u","i","n","none","conditional","unconditional","independent"))){ + warning(paste0("Wrong type of interval: '",intervalType, "'. Switching to 'conditional'."),call.=FALSE); + intervalType <- "c"; } - if(intervalsType=="none"){ - intervalsType <- "n"; - intervals <- FALSE; + if(intervalType=="none"){ + intervalType <- "n"; + interval <- FALSE; } - else if(intervalsType=="conditional"){ - intervalsType <- "c"; - intervals <- TRUE; + else if(intervalType=="conditional"){ + intervalType <- "c"; + interval <- TRUE; } - else if(intervalsType=="unconditional"){ - intervalsType <- "u"; - intervals <- TRUE; + else if(intervalType=="unconditional"){ + intervalType <- "u"; + interval <- TRUE; } - else if(intervalsType=="independent"){ - intervalsType <- "i"; - intervals <- TRUE; + else if(intervalType=="independent"){ + intervalType <- "i"; + interval <- TRUE; } else{ - intervals <- TRUE; + interval <- TRUE; } if(level>1){ @@ -800,11 +801,12 @@ vssInput <- function(smoothType=c("ves"),...){ assign("obsStates",obsStates,ParentEnvironment); assign("nSeries",nSeries,ParentEnvironment); assign("nParamMax",nParamMax,ParentEnvironment); - assign("data",data,ParentEnvironment); assign("y",y,ParentEnvironment); + assign("yInSample",yInSample,ParentEnvironment); assign("dataFreq",dataFreq,ParentEnvironment); assign("dataDeltat",dataDeltat,ParentEnvironment); assign("dataStart",dataStart,ParentEnvironment); + assign("yForecastStart",yForecastStart,ParentEnvironment); assign("dataNames",dataNames,ParentEnvironment); assign("parametersNumber",parametersNumber,ParentEnvironment); @@ -840,13 +842,13 @@ vssInput <- function(smoothType=c("ves"),...){ assign("initialSeasonType",initialSeasonType,ParentEnvironment); assign("initialSeasonEstimate",initialSeasonEstimate,ParentEnvironment); - assign("cfType",cfType,ParentEnvironment); + assign("loss",loss,ParentEnvironment); assign("normalizer",normalizer,ParentEnvironment); assign("ic",ic,ParentEnvironment); - assign("intervalsType",intervalsType,ParentEnvironment); - assign("intervals",intervals,ParentEnvironment); + assign("intervalType",intervalType,ParentEnvironment); + assign("interval",interval,ParentEnvironment); assign("intermittent",intermittent,ParentEnvironment); assign("ot",ot,ParentEnvironment); @@ -873,7 +875,7 @@ vLikelihoodFunction <- function(A){ return(- obsInSample/2 * (nSeries*log(2*pi*exp(1)) + CF(A))); } else if(Etype=="M"){ - return(- obsInSample/2 * (nSeries*log(2*pi*exp(1)) + CF(A)) - sum(y)); + return(- 
obsInSample/2 * (nSeries*log(2*pi*exp(1)) + CF(A)) - sum(yInSample)); } else{ #### This is not derived yet #### @@ -913,7 +915,7 @@ vssFitter <- function(...){ ellipsis <- list(...); ParentEnvironment <- ellipsis[['ParentEnvironment']]; - fitting <- vFitterWrap(y, matvt, matF, matW, matG, + fitting <- vFitterWrap(yInSample, matvt, matF, matW, matG, modelLags, Etype, Ttype, Stype, ot); statesNames <- rownames(matvt); matvt <- fitting$matvt; @@ -933,10 +935,10 @@ vssFitter <- function(...){ assign("errors",errors,ParentEnvironment); } -##### *State space intervals* ##### +##### *State space interval* ##### # This is not implemented yet #' @importFrom stats qchisq -vssIntervals <- function(level=0.95, intervalsType=c("c","u","i"), Sigma=NULL, +vssIntervals <- function(level=0.95, intervalType=c("c","u","i"), Sigma=NULL, measurement=NULL, transition=NULL, persistence=NULL, modelLags=NULL, cumulative=FALSE, df=0, nComponents=1, nSeries=1, h=1){ @@ -946,12 +948,12 @@ vssIntervals <- function(level=0.95, intervalsType=c("c","u","i"), Sigma=NULL, nElements <- length(modelLags); # This is a temporary solution, needed while we work on other types. - if(intervalsType!="i"){ - intervalsType <- "i"; + if(intervalType!="i"){ + intervalType <- "i"; } # In case of independent we use either t distribution or Chebyshev inequality - if(intervalsType=="i"){ + if(intervalType=="i"){ if(df>0){ quantUpper <- qt((1+level)/2,df=df); quantLower <- qt((1-level)/2,df=df); @@ -967,7 +969,7 @@ vssIntervals <- function(level=0.95, intervalsType=c("c","u","i"), Sigma=NULL, } nPoints <- 100; - if(intervalsType=="c"){ + if(intervalType=="c"){ # Number of points in the ellipse PI <- array(NA, c(h,2*nPoints^(nSeries-1),nSeries), dimnames=list(paste0("h",c(1:h)), NULL, @@ -1050,7 +1052,7 @@ vssIntervals <- function(level=0.95, intervalsType=c("c","u","i"), Sigma=NULL, } # Produce PI matrix - if(any(intervalsType==c("c","u"))){ + if(any(intervalType==c("c","u"))){ # eigensList contains eigenvalues and eigenvectors of the covariance matrix eigensList <- apply(varVec,1,eigen); # eigenLimits specify the lowest and highest ellipse points in all dimensions @@ -1068,7 +1070,7 @@ vssIntervals <- function(level=0.95, intervalsType=c("c","u","i"), Sigma=NULL, } } } - else if(intervalsType=="i"){ + else if(intervalType=="i"){ variances <- apply(varVec,1,diag); for(i in 1:nSeries){ PI[,2*i-1] <- quantLower * sqrt(variances[i,]); @@ -1102,7 +1104,7 @@ vssForecaster <- function(...){ # df <- 0; # } # else{ - # Take the minimum df for the purposes of intervals construction + # Take the minimum df for the purposes of interval construction df <- min(df); # } @@ -1116,13 +1118,13 @@ vssForecaster <- function(...){ yForecast <- rowSums(yForecast); } - if(intervals){ - PI <- vssIntervals(level=level, intervalsType=intervalsType, Sigma=Sigma, + if(interval){ + PI <- vssIntervals(level=level, intervalType=intervalType, Sigma=Sigma, measurement=matW, transition=matF, persistence=matG, modelLags=modelLags, cumulative=cumulative, df=df, nComponents=nComponentsAll, nSeries=nSeries, h=h); - if(any(intervalsType==c("i","u"))){ + if(any(intervalType==c("i","u"))){ for(i in 1:nSeries){ PI[,i*2-1] <- PI[,i*2-1] + yForecast[i,]; PI[,i*2] <- PI[,i*2] + yForecast[i,]; diff --git a/build/partial.rdb b/build/partial.rdb index 164053e..4625e6e 100644 Binary files a/build/partial.rdb and b/build/partial.rdb differ diff --git a/inst/doc/ces.R b/inst/doc/ces.R index 05bf9c5..7ec104c 100644 --- a/inst/doc/ces.R +++ b/inst/doc/ces.R @@ -10,17 +10,17 @@ require(Mcomp)
ces(M3$N2457$x, h=18, holdout=TRUE, silent=FALSE) ## ----auto_ces_N2457------------------------------------------------------ -auto.ces(M3$N2457$x, h=18, holdout=TRUE, intervals="p", silent=FALSE) +auto.ces(M3$N2457$x, h=18, holdout=TRUE, interval="p", silent=FALSE) ## ----auto_ces_N2457_optimal---------------------------------------------- -auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", intervals="sp") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", interval="sp") ## ----es_N2457_xreg_create------------------------------------------------ x <- cbind(rnorm(length(M3$N2457$x),50,3),rnorm(length(M3$N2457$x),100,7)) ## ----auto_ces_N2457_xreg_simple------------------------------------------ -auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", intervals="p") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", interval="p") ## ----auto_ces_N2457_xreg_update------------------------------------------ -auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, intervals="p") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, interval="p") diff --git a/inst/doc/ces.Rmd b/inst/doc/ces.Rmd index a8c6da9..d449711 100644 --- a/inst/doc/ces.Rmd +++ b/inst/doc/ces.Rmd @@ -34,14 +34,14 @@ This output is very similar to ones printed out by `es()` function. The only dif If we want automatic model selection, then we use `auto.ces()` function: ```{r auto_ces_N2457} -auto.ces(M3$N2457$x, h=18, holdout=TRUE, intervals="p", silent=FALSE) +auto.ces(M3$N2457$x, h=18, holdout=TRUE, interval="p", silent=FALSE) ``` -Note that prediction intervals are too narrow and do not include 95% of values. This is because CES is pure additive model and it cannot take into account possible heteroscedasticity. +Note that the prediction interval is too narrow and does not include 95% of values. This is because CES is a pure additive model and it cannot take into account possible heteroscedasticity. If for some reason we want to optimise initial values then we call: ```{r auto_ces_N2457_optimal} -auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", intervals="sp") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", interval="sp") ``` Now let's introduce some artificial exogenous variables: @@ -49,12 +49,12 @@ Now let's introduce some artificial exogenous variables: x <- cbind(rnorm(length(M3$N2457$x),50,3),rnorm(length(M3$N2457$x),100,7)) ``` -`ces()` allows using exogenous variables and different types of prediction intervals in exactly the same manner as `es()`: +`ces()` allows using exogenous variables and different types of prediction interval in exactly the same manner as `es()`: ```{r auto_ces_N2457_xreg_simple} -auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", intervals="p") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", interval="p") ``` The same model but with updated parameters of exogenous variables is called: ```{r auto_ces_N2457_xreg_update} -auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, intervals="p") +auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, interval="p") ``` diff --git a/inst/doc/ces.html b/inst/doc/ces.html index 69bb12b..c375892 100644 --- a/inst/doc/ces.html +++ b/inst/doc/ces.html

diff --git a/inst/doc/ces.html b/inst/doc/ces.html
index 69bb12b..c375892 100644
--- a/inst/doc/ces.html
+++ b/inst/doc/ces.html
[Rendered vignette ces.html: the HTML markup in these hunks is not reconstructed; only the substantive changes to the displayed text are listed, in diff form. The header hunks update the vignette date.]
 ces() - Complex Exponential Smoothing
 Ivan Svetunkov
-2019-04-25
+2019-06-13
@@ -314,13 +314,13 @@
 For the same series from M3 dataset ces() can be constructed using:
-## Time elapsed: 0.27 seconds
+## Time elapsed: 0.67 seconds
 ## Model estimated: CES(n)
-## Cost function type: MSE; Cost function value: 1940181.381
+## Loss function type: MSE; Loss function value: 1940181.381
@@ -331,84 +331,85 @@
 If we want automatic model selection, then we use auto.ces() function:
 ## Estimating CES with seasonality: "n" "s" "f"
 ## The best model is with seasonality = "n"
-## Time elapsed: 0.91 seconds
+## Time elapsed: 2.31 seconds
-## Cost function type: MSE; Cost function value: 1940181.381
+## Loss function type: MSE; Loss function value: 1940181.381
-## 95% parametric prediction intervals were constructed
+## 95% parametric prediction interval were constructed
 ## 50% of values are in the prediction interval
-Note that prediction intervals are too narrow and do not include 95% of values. This is because CES is pure additive model and it cannot take into account possible heteroscedasticity.
+Note that the prediction interval is too narrow and does not include 95% of values. This is because CES is a pure additive model and it cannot take into account possible heteroscedasticity.
 If for some reason we want to optimise initial values then we call:
-## Time elapsed: 0.92 seconds
+## Time elapsed: 2.35 seconds
-## Cost function type: MSE; Cost function value: 1940181.381
+## Loss function type: MSE; Loss function value: 1940181.381
-## 95% semiparametric prediction intervals were constructed
+## 95% semiparametric prediction interval were constructed
 ## 39% of values are in the prediction interval
-ces() allows using exogenous variables and different types of prediction intervals in exactly the same manner as es():
+ces() allows using exogenous variables and different types of prediction interval in exactly the same manner as es():
-## Time elapsed: 2.09 seconds
-## Model estimated: CES(n)
-## a0 + ia1: 1.09841+1.01221i
+## Time elapsed: 5.7 seconds
+## Model estimated: CESX(n)
+## a0 + ia1: 1.1369+1.00445i
 ## Initial values were optimised.
-## 5 parameters were estimated in the process
-## Residuals standard deviation: 1392.904
-## Cost function type: MSE; Cost function value: 1940181.381
+## 7 parameters were estimated in the process
+## Residuals standard deviation: 1339.267
+## Xreg coefficients were estimated in a normal style
+## Loss function type: MSE; Loss function value: 1793636.798
 ##      AIC     AICc      BIC     BICc 
-## 1689.668 1690.328 1702.542 1704.050 
-## 95% parametric prediction intervals were constructed
-## 50% of values are in the prediction interval
+## 1686.050 1687.309 1704.073 1706.952 
+## 95% parametric prediction interval were constructed
+## 61% of values are in the prediction interval
 ## Forecast errors:
-## MPE: 15.2%; sCE: -1502.3%; Bias: 75.7%; MAPE: 37.3%
-## MASE: 2.622; sMAE: 107%; sMSE: 202.9%; RelMAE: 1.12; RelRMSE: 1.25
+## MPE: 11.8%; sCE: -1349.6%; Bias: 72.4%; MAPE: 36.6%
+## MASE: 2.494; sMAE: 101.8%; sMSE: 183.2%; RelMAE: 1.066; RelRMSE: 1.187
 The same model but with updated parameters of exogenous variables is called:
-## Time elapsed: 3.17 seconds
+## Time elapsed: 7.82 seconds
 ## Model estimated: CESX(n)
-## a0 + ia1: 0.99999+1.00383i
+## a0 + ia1: 0.99983+1.01309i
 ## Initial values were optimised.
 ## 13 parameters were estimated in the process
-## Residuals standard deviation: 1355.937
+## Residuals standard deviation: 1347.338
 ## Xreg coefficients were estimated in a crazy style
-## Cost function type: MSE; Cost function value: 1838566.256
+## Loss function type: MSE; Loss function value: 1815319.116
 ##      AIC     AICc      BIC     BICc 
-## 1700.450 1704.836 1733.921 1743.953 
-## 95% parametric prediction intervals were constructed
+## 1699.216 1703.601 1732.687 1742.718 
+## 95% parametric prediction interval were constructed
 ## 44% of values are in the prediction interval
 ## Forecast errors:
-## MPE: 28.7%; sCE: -2016.6%; Bias: 88.6%; MAPE: 41.3%
-## MASE: 3.056; sMAE: 124.7%; sMSE: 255.5%; RelMAE: 1.305; RelRMSE: 1.402
+## MPE: 19.8%; sCE: -1683.3%; Bias: 81.6%; MAPE: 38.5%
+## MASE: 2.765; sMAE: 112.8%; sMSE: 220.6%; RelMAE: 1.181; RelRMSE: 1.303
diff --git a/inst/doc/es.R b/inst/doc/es.R
index fa1efcb..847e5e7 100644
--- a/inst/doc/es.R
+++ b/inst/doc/es.R
@@ -10,18 +10,21 @@ require(Mcomp)
 ## ----es_N2457------------------------------------------------------------
 es(M3$N2457$x, h=18, holdout=TRUE, silent=FALSE)
 
-## ----es_N2457_with_intervals---------------------------------------------
-es(M3$N2457$x, h=18, holdout=TRUE, intervals=TRUE, silent=FALSE)
+## ----es_N2457_with_interval----------------------------------------------
+es(M3$N2457$x, h=18, holdout=TRUE, interval=TRUE, silent=FALSE)
 
 ## ----es_N2457_save_model--------------------------------------------------
 ourModel <- es(M3$N2457$x, h=18, holdout=TRUE, silent="all")
 
 ## ----es_N2457_reuse_model-------------------------------------------------
-es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, intervals="np", level=0.93)
+es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, interval="np", level=0.93)
 
 ## ----es_N2457_modelType---------------------------------------------------
 modelType(ourModel)
 
+## ----es_N2457_actuals-----------------------------------------------------
+actuals(ourModel)
+
 ## ----es_N2457_reuse_model_parts-------------------------------------------
 es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=ourModel$initial, silent="graph")
 es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, persistence=ourModel$persistence, silent="graph")
@@ -30,7 +33,7 @@ es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, persistence=ourMo
 es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=1500, silent="graph")
 
 ## ----es_N2457_aMSTFE------------------------------------------------------
-es(M3$N2457$x, h=18, holdout=TRUE, cfType="aTMSE", bounds="a", ic="BIC", intervals=TRUE)
+es(M3$N2457$x, h=18, holdout=TRUE, loss="aTMSE", bounds="a", ic="BIC", interval=TRUE)
 
 ## ----es_N2457_combine-----------------------------------------------------
 es(M3$N2457$x, model="CCN", h=18, holdout=TRUE, silent="graph")
@@ -66,5 +69,5 @@ forecast(etsModel,h=18,level=0.95)
 forecast(esModel,h=18,level=0.95)
 
 ## ----es_N2457_M3----------------------------------------------------------
-es(M3$N2457, intervals=TRUE, silent=FALSE)
+es(M3$N2457, interval=TRUE, silent=FALSE)
diff --git a/inst/doc/es.Rmd b/inst/doc/es.Rmd
index 0afc7f8..ae28e31 100644
--- a/inst/doc/es.Rmd
+++ b/inst/doc/es.Rmd
@@ -45,14 +45,14 @@ In this case function uses branch and bound algorithm to form a pool of models t
 8. Information criteria for this model;
 9. Forecast errors (because we have set `holdout=TRUE`).
 
-The function has also produced a graph with actuals, fitted values and point forecasts.
+The function has also produced a graph with actual values, fitted values and point forecasts.
 
-If we need prediction intervals, then we run:
-```{r es_N2457_with_intervals}
-es(M3$N2457$x, h=18, holdout=TRUE, intervals=TRUE, silent=FALSE)
+If we need a prediction interval, then we run:
+```{r es_N2457_with_interval}
+es(M3$N2457$x, h=18, holdout=TRUE, interval=TRUE, silent=FALSE)
 ```
 
-Due to multiplicative nature of error term in the model, the intervals are asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of prediction intervals and its actual coverage.
+Due to the multiplicative nature of the error term in the model, the interval is asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of the prediction interval and its actual coverage.
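[Editor's note: a quick numerical illustration of the asymmetry mentioned in the paragraph above; with a multiplicative error the forecast distribution is log-normal-like, and log-normal quantiles are not symmetric around the mean. Editorial sketch, not part of the vignette.]

```r
# Quantiles of a log-normal distribution around its mean: the upper bound is
# further from the centre than the lower one, hence asymmetric intervals.
sigma <- 0.4
centre <- exp(sigma^2/2)                  # mean of logN(0, sigma^2)
bounds <- qlnorm(c(0.025, 0.975), 0, sigma)
centre - bounds[1]   # ~0.63 below the mean
bounds[2] - centre   # ~1.11 above the mean
```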
 If we save the model (and let's say we want it to work silently):
 ```{r es_N2457_save_model}
@@ -61,7 +61,7 @@ ourModel <- es(M3$N2457$x, h=18, holdout=TRUE, silent="all")
 we can then reuse it for different purposes:
 ```{r es_N2457_reuse_model}
-es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, intervals="np", level=0.93)
+es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, interval="np", level=0.93)
 ```
 
 We can also extract the type of model in order to reuse it later:
@@ -71,6 +71,11 @@ modelType(ourModel)
 
 This handy function, by the way, also works with ets() from forecast package.
 
+If we need the actual values from the model, we can use the `actuals()` method from the `greybox` package:
+```{r es_N2457_actuals}
+actuals(ourModel)
+```
+
 We can then use persistence or initials only from the model to construct the other one:
 ```{r es_N2457_reuse_model_parts}
 es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=ourModel$initial, silent="graph")
 es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, persistence=ourModel$persistence, silent="graph")
@@ -83,7 +88,7 @@ es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=1500, sil
 Using some other parameters may lead to completely different model and forecasts:
 ```{r es_N2457_aMSTFE}
-es(M3$N2457$x, h=18, holdout=TRUE, cfType="aTMSE", bounds="a", ic="BIC", intervals=TRUE)
+es(M3$N2457$x, h=18, holdout=TRUE, loss="aTMSE", bounds="a", ic="BIC", interval=TRUE)
 ```
 
 You can play around with all the available parameters to see what their effect is on the final model.
@@ -135,7 +140,7 @@ etsModel <- forecast::ets(M3$N2457$x)
 esModel <- es(M3$N2457$x, model=etsModel, h=18)
 ```
 
-The point forecasts in the majority of cases should the same, but the prediction intervals may be different (especially if error term is multiplicative):
+The point forecasts in the majority of cases should be the same, but the prediction interval may be different (especially if the error term is multiplicative):
 ```{r ets_es_forecast, message=FALSE, warning=FALSE}
 forecast(etsModel,h=18,level=0.95)
 forecast(esModel,h=18,level=0.95)
@@ -143,7 +148,7 @@ forecast(esModel,h=18,level=0.95)
 Finally, if you work with M or M3 data, and need to test a function on a specific time series, you can use the following simplified call:
 ```{r es_N2457_M3}
-es(M3$N2457, intervals=TRUE, silent=FALSE)
+es(M3$N2457, interval=TRUE, silent=FALSE)
 ```
 
 This command has taken the data, split it into in-sample and holdout and produced the forecast of appropriate length to the holdout.
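[Editor's note: an aside on the reuse pattern documented in this vignette. Because `modelType()` and `actuals()` expose the selected structure and the data, a fitted object can be rebuilt from its own components. A hedged sketch, using only the functions shown above:]

```r
library(smooth)
library(Mcomp)

ourModel <- es(M3$N2457$x, h=18, holdout=TRUE, silent="all")

# Rebuild an equivalent model from the pieces the object exposes:
refit <- es(actuals(ourModel), model=modelType(ourModel),
            persistence=ourModel$persistence, h=18)
```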

diff --git a/inst/doc/es.html b/inst/doc/es.html
index 74098bf..e474632 100644
--- a/inst/doc/es.html
+++ b/inst/doc/es.html
[Rendered vignette es.html: the HTML markup in these hunks is not reconstructed; only the substantive changes to the displayed text are listed, in diff form, with short block labels for orientation.]
 es() - Exponential Smoothing
 Ivan Svetunkov
-2019-04-25
+2019-06-13
@@ -317,7 +317,7 @@ (the simplest call)
 ## Forming the pool of models based on... ANN, ANA, AAN, Estimation progress: 100%... Done!
-## Time elapsed: 0.6 seconds
+## Time elapsed: 1.46 seconds
 ## Model estimated: ETS(MNN)
@@ -325,7 +325,7 @@
-## Cost function type: MSE; Cost function value: 0.165
+## Loss function type: MSE; Loss function value: 0.165
@@ -346,11 +346,11 @@
-The function has also produced a graph with actuals, fitted values and point forecasts.
+The function has also produced a graph with actual values, fitted values and point forecasts.
-If we need prediction intervals, then we run:
+If we need a prediction interval, then we run:
-## Time elapsed: 0.4 seconds
+## Time elapsed: 1.02 seconds
@@ -358,23 +358,23 @@
-## Cost function type: MSE; Cost function value: 0.165
+## Loss function type: MSE; Loss function value: 0.165
 ## 1645.978 1646.236 1653.702 1654.292 
-## 95% parametric prediction intervals were constructed
+## 95% parametric prediction interval were constructed
 ## 72% of values are in the prediction interval
-Due to multiplicative nature of error term in the model, the intervals are asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of prediction intervals and its actual coverage.
+Due to the multiplicative nature of the error term in the model, the interval is asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of the prediction interval and its actual coverage.
 (reusing the saved model)
-## Time elapsed: 0.05 seconds
+## Time elapsed: 0.13 seconds
@@ -383,19 +383,41 @@
-## Cost function type: MSE; Cost function value: 0.184
+## Loss function type: MSE; Loss function value: 0.184
 ## 1994.861 1994.897 1997.606 1997.690 
-## 93% nonparametric prediction intervals were constructed
+## 93% nonparametric prediction interval were constructed
 ## [1] "MNN"
 (new section rendered from the added actuals() chunk)
+If we need actual values from the model, we can use actuals() method from greybox package:
+##         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
+## 1983 2158.1 1086.4 1154.7 1125.6  920.0 2188.6  829.2 1353.1  947.2 1816.8
+## 1984 1783.3 1713.1 3479.7 2429.4 3074.3 3427.4 2783.7 1968.7 2045.6 1471.3
+## 1985 1821.0 2409.8 3485.8 3289.2 3048.3 2914.1 2173.9 3018.4 2200.1 6844.3
+## 1986 3238.9 3252.2 3278.8 1766.8 3572.8 3467.6 7464.7 2748.4 5126.7 2870.8
+## 1987 3220.7 3586.0 3249.5 3222.5 2488.5 3332.4 2036.1 1968.2 2967.2 3151.6
+## 1988 3894.1 4625.5 3291.7 3065.6 2316.5 2453.4 4582.8 2291.2 3555.5 1785.0
+## 1989 2102.9 2307.7 6242.1 6170.5 1863.5 6318.9 3992.8 3435.1 1585.8 2106.8
+## 1990 6168.0 7247.4 3579.7 6365.2 4658.9 6911.8 2143.7 5973.9 4017.2 4473.0
+## 1991 8749.1                                                               
+##         Nov    Dec
+## 1983 1624.5  868.5
+## 1984 2763.7 2328.4
+## 1985 4160.4 1548.8
+## 1986 2170.2 4326.8
+## 1987 1610.5 3985.0
+## 1988 2020.0 2026.8
+## 1989 1892.1 4310.6
+## 1990 3591.9 4676.5
+## 1991
 (reusing persistence / initials, three blocks)
-## Time elapsed: 0.02 seconds
+## Time elapsed: 0.04 seconds
@@ -404,13 +426,13 @@
-## Cost function type: MSE; Cost function value: 0.184
+## Loss function type: MSE; Loss function value: 0.184
@@ -419,14 +441,14 @@
-## Time elapsed: 0.02 seconds
+## Time elapsed: 0.04 seconds
-## Cost function type: MSE; Cost function value: 0.184
+## Loss function type: MSE; Loss function value: 0.184
@@ -435,14 +457,14 @@ (arbitrary initial value)
-## Time elapsed: 0.02 seconds
+## Time elapsed: 0.04 seconds
-## Cost function type: MSE; Cost function value: 0.184
+## Loss function type: MSE; Loss function value: 0.184
@@ -450,25 +472,25 @@ (aTMSE loss)
-## Time elapsed: 0.43 seconds
+## Time elapsed: 1.1 seconds
-## Cost function type: aTMSE; Cost function value: 39565651.9
+## Loss function type: aTMSE; Loss function value: 39565651.9
-## 95% parametric prediction intervals were constructed
+## 95% parametric prediction interval were constructed
 (combining forecasts, ETS(CCN))
 ## Estimation progress: 10% ... 100%... Done!
-## Time elapsed: 0.69 seconds
+## Time elapsed: 1.58 seconds
-## Cost function type: MSE
+## Loss function type: MSE
@@ -478,9 +500,9 @@ (selection from a pool, ETS(ANN))
-## Time elapsed: 0.76 seconds
+## Time elapsed: 2.01 seconds
@@ -488,7 +510,7 @@
-## Cost function type: MSE; Cost function value: 2007704.532
+## Loss function type: MSE; Loss function value: 2007704.532
@@ -496,13 +518,13 @@ (combination from a pool, ETS(CCC))
-## Time elapsed: 0.75 seconds
+## Time elapsed: 1.89 seconds
-## Cost function type: MSE
+## Loss function type: MSE
@@ -512,29 +534,29 @@ (ETSX with all exogenous variables)
-## Time elapsed: 0.73 seconds
+## Time elapsed: 1.48 seconds
 ## alpha 
-## 0.148 
+## 0.144 
-## Residuals standard deviation: 0.403
+## Residuals standard deviation: 0.405
-## Cost function type: MSE; Cost function value: 0.163
+## Loss function type: MSE; Loss function value: 0.164
-## 1648.352 1649.012 1661.226 1662.734 
+## 1649.471 1650.130 1662.345 1663.853 
-## MPE: 23.8%; sCE: -1807.1%; Bias: 84.9%; MAPE: 38%
-## MASE: 2.821; sMAE: 115.1%; sMSE: 230.1%; RelMAE: 1.205; RelRMSE: 1.331
+## MPE: 26.3%; sCE: -1912.8%; Bias: 87.9%; MAPE: 39.6%
+## MASE: 2.931; sMAE: 119.6%; sMSE: 240.6%; RelMAE: 1.252; RelRMSE: 1.361
 (xreg selected by IC)
-## Time elapsed: 0.41 seconds
+## Time elapsed: 0.91 seconds
@@ -542,7 +564,7 @@
-## Cost function type: MSE; Cost function value: 0.165
+## Loss function type: MSE; Loss function value: 0.165
@@ -551,34 +573,33 @@ (lagged xreg via xregExpander)
-## Time elapsed: 1.38 seconds
-## Model estimated: ETSX(MNN)
+## Time elapsed: 2.18 seconds
+## Model estimated: ETS(MNN)
 ## alpha 
-## 0.147 
+## 0.145 
-## 4 parameters were estimated in the process
-## Residuals standard deviation: 0.403
-## Xreg coefficients were estimated in a normal style
-## Cost function type: MSE; Cost function value: 0.163
+## 3 parameters were estimated in the process
+## Residuals standard deviation: 0.407
+## Loss function type: MSE; Loss function value: 0.165
-## 1646.458 1646.893 1656.757 1657.752 
+## 1645.978 1646.236 1653.702 1654.292 
-## MPE: 27.5%; sCE: -1991.7%; Bias: 88%; MAPE: 40.9%
-## MASE: 3.041; sMAE: 124.1%; sMSE: 259.2%; RelMAE: 1.299; RelRMSE: 1.412
+## MPE: 26.3%; sCE: -1919.1%; Bias: 86.9%; MAPE: 39.8%
+## MASE: 2.944; sMAE: 120.1%; sMSE: 242.7%; RelMAE: 1.258; RelRMSE: 1.367
 ## [1] "y[t] = l[t-1] * exp(a1[t-1] * x1[t] + a2[t-1] * x2[t]) * e[t]"
 (ets() vs es() forecasts)
-The point forecasts in the majority of cases should the same, but the prediction intervals may be different (especially if error term is multiplicative):
+The point forecasts in the majority of cases should be the same, but the prediction interval may be different (especially if the error term is multiplicative):
@@ -598,30 +619,30 @@ (the es() forecast table changes; the ets() table above it is unchanged)
 ##          Point forecast Lower bound (2.5%) Upper bound (97.5%)
-## Aug 1992       9352.900           3661.607            19667.82
-## Sep 1992       9534.040           3664.498            20407.03
-## Oct 1992       9765.247           3662.563            21211.15
-## Nov 1992       9973.668           3721.224            21972.67
-## Dec 1992      10192.885           3751.618            22487.85
-## Jan 1993      10398.648           3752.026            23342.72
-## Feb 1993      10625.812           3806.078            24186.38
-## Mar 1993      10829.256           3811.182            24978.04
-## Apr 1993      11061.514           3818.255            25816.54
-## May 1993      11290.470           3844.685            26350.27
-## Jun 1993      11524.842           3866.250            27321.64
-## Jul 1993      11779.648           3913.325            28129.69
-## Aug 1993      11989.252           3901.180            28933.17
-## Sep 1993      12288.172           3959.410            29923.20
-## Oct 1993      12530.298           3972.137            30594.77
-## Nov 1993      12774.302           4017.697            31655.47
-## Dec 1993      13038.866           4053.363            32484.63
-## Jan 1994      13313.902           4086.671            33466.95
+## Aug 1992       9347.098           3666.149            19856.70
+## Sep 1992       9546.144           3693.162            20440.49
+## Oct 1992       9735.156           3703.239            21170.36
+## Nov 1992       9961.805           3723.052            21877.28
+## Dec 1992      10192.260           3758.424            22645.36
+## Jan 1993      10389.613           3769.291            23292.52
+## Feb 1993      10651.171           3799.634            24290.73
+## Mar 1993      10837.798           3800.740            24979.30
+## Apr 1993      11082.481           3840.874            25717.10
+## May 1993      11296.297           3890.034            26444.53
+## Jun 1993      11555.200           3866.731            27341.39
+## Jul 1993      11780.012           3916.427            28146.32
+## Aug 1993      11988.098           3914.172            29023.99
+## Sep 1993      12242.071           3952.081            29956.68
+## Oct 1993      12556.283           3979.754            30691.82
+## Nov 1993      12774.233           3996.921            31826.38
+## Dec 1993      13018.245           4014.508            32608.30
+## Jan 1994      13295.652           4032.588            33623.21
@@ -629,12 +650,12 @@ (the simplified M3 call)
 ## Forming the pool of models based on... ANN, ANA, AAN, Estimation progress: 100%... Done!
-## Time elapsed: 0.41 seconds
+## Time elapsed: 1.01 seconds
 ## Model estimated: ETS(MNN)
 ## Initial values were optimised.
 ## 3 parameters were estimated in the process
 ## Residuals standard deviation: 0.429
-## Cost function type: MSE; Cost function value: 0.184
+## Loss function type: MSE; Loss function value: 0.184
 ##      AIC     AICc      BIC     BICc 
 ## 1998.844 1999.061 2007.079 2007.592 
-## 95% parametric prediction intervals were constructed
+## 95% parametric prediction interval were constructed
 ## 50% of values are in the prediction interval
 ## Forecast errors:
 ## MPE: -127.6%; sCE: 1618.3%; Bias: -92.4%; MAPE: 129.2%
diff --git a/inst/doc/gum.R b/inst/doc/gum.R
index c9aeca3..34864af 100644
--- a/inst/doc/gum.R
+++ b/inst/doc/gum.R
@@ -13,7 +13,7 @@ gum(M3$N2457$x, h=18, holdout=TRUE)
 gum(M3$N2457$x, h=18, holdout=TRUE, orders=c(2,1), lags=c(1,12))
 
 ## ----Autogum_N2457_1[1]--------------------------------------------------
-auto.gum(M3[[2457]], intervals=TRUE, silent=FALSE)
+auto.gum(M3[[2457]], interval=TRUE, silent=FALSE)
 
 ## ----gum_N2457_predefined------------------------------------------------
 transition <- matrix(c(1,0,0,1,1,0,0,0,1),3,3)
diff --git a/inst/doc/gum.Rmd b/inst/doc/gum.Rmd
index b73e56b..5f41b64 100644
--- a/inst/doc/gum.Rmd
+++ b/inst/doc/gum.Rmd
@@ -41,7 +41,7 @@ gum(M3$N2457$x, h=18, holdout=TRUE, orders=c(2,1), lags=c(1,12))
 Function `auto.gum()` is now implemented in `smooth`, but it works slowly as it needs to check a large number of models:
 ```{r Autogum_N2457_1[1]}
-auto.gum(M3[[2457]], intervals=TRUE, silent=FALSE)
+auto.gum(M3[[2457]], interval=TRUE, silent=FALSE)
 ```
 
 In addition to standard values that other functions accept, GUM accepts predefined values for the transition matrix, measurement and persistence vectors. For example, something more common can be passed to the function (see the editorial sketch below):
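[Editor's note: an illustration of the `gum_N2457_predefined` chunk above. The `transition` matrix is taken verbatim from the gum.R diff; the rest of the call is an assumption, since the vignette's actual call is truncated in this diff.]

```r
library(smooth)
library(Mcomp)

# Predefined transition matrix for GUM(2[1],1[12]): three components in total,
# so the matrix is 3x3. Other matrices/vectors can presumably be passed the
# same way (measurement, persistence).
transition <- matrix(c(1,0,0, 1,1,0, 0,0,1), 3, 3)
gum(M3$N2457$x, h=18, holdout=TRUE, orders=c(2,1), lags=c(1,12),
    transition=transition)
```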

diff --git a/inst/doc/gum.html b/inst/doc/gum.html
index cee1134..a5180a5 100644
--- a/inst/doc/gum.html
+++ b/inst/doc/gum.html
[Rendered vignette gum.html: the HTML markup in these hunks is not reconstructed; only the substantive changes to the displayed text are listed, in diff form. Progress-spinner residue has been trimmed.]
 gum() - Generalised Univariate Model
 Ivan Svetunkov
-2019-04-25
+2019-06-13
@@ -316,7 +316,7 @@ (the default call, GUM(1[1],1[12]))
 A simple call by default constructs GUM(1^1,1^m), where m is frequency of the data. So for our example with monthly data N2457, we will have GUM(1^1,1^12):
-## Time elapsed: 0.45 seconds
+## Time elapsed: 1.13 seconds
 ## Model estimated: GUM(1[1],1[12])
@@ -329,7 +329,7 @@
-## Cost function type: MSE; Cost function value: 1738726.863
+## Loss function type: MSE; Loss function value: 1738726.863
@@ -339,7 +339,7 @@ (different orders and lags)
 But some different orders and lags can be specified. For example:
-## Time elapsed: 0.36 seconds
+## Time elapsed: 0.89 seconds
 ## Model estimated: GUM(2[1],1[12])
@@ -353,7 +353,7 @@
-## Cost function type: MSE; Cost function value: 1612418.429
+## Loss function type: MSE; Loss function value: 1612418.429
@@ -362,12 +362,12 @@ (auto.gum())
 ## Starting preliminary loop: 1 out of 12 ... 12 out of 12. Done.
 ## Searching for appropriate lags: We found them!
 ## Searching for appropriate orders: Orders found.
 ## Reestimating the model. Done!
-## Time elapsed: 27.51 seconds
+## Time elapsed: 69.38 seconds
 ## Model estimated: GUM(1[1],1[4])
@@ -380,12 +380,12 @@
-## Cost function type: MSE; Cost function value: 3334888.499
+## Loss function type: MSE; Loss function value: 3334888.499
-## 95% parametric prediction intervals were constructed
+## 95% parametric prediction interval were constructed
 ## 44% of values are in the prediction interval
@@ -395,7 +395,7 @@ (predefined matrices)
-## Time elapsed: 0.27 seconds
+## Time elapsed: 0.63 seconds
 ## Model estimated: GUM(2[1],1[12])
@@ -410,7 +410,7 @@
 ## 18 parameters were estimated in the process
 ## 12 parameters were provided
 ## Residuals standard deviation: 1339.166
-## Cost function type: MSE; Cost function value: 1793364.893
+## Loss function type: MSE; Loss function value: 1793364.893
 ##      AIC     AICc      BIC     BICc 
diff --git a/inst/doc/oes.R b/inst/doc/oes.R
index 7f97a9f..540444a 100644
--- a/inst/doc/oes.R
+++ b/inst/doc/oes.R
@@ -63,5 +63,5 @@ oETSAModel
 plot(oETSAModel)
 
 ## ----iETSGRoundedExample-------------------------------------------------
-es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, intervals=TRUE, rounded=TRUE)
+es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, interval=TRUE, rounded=TRUE)
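[Editor's note: for orientation before the oes.Rmd changes below — the occurrence part on its own is fitted with `oes()`. A hedged sketch of minimal calls; the data and the occurrence codes ("f", "o") are assumptions based on the es() chunk above and the oETS[F]/oETS[O] outputs quoted later in this diff.]

```r
library(smooth)

set.seed(41)
y <- rpois(100, 0.3)   # artificial intermittent data, as in the vignette

# Fixed-probability occurrence model, oETS[F]:
oes(y, model="MNN", occurrence="f", h=10)

# Odds-ratio occurrence model, oETS[O]:
oes(y, model="MMN", occurrence="o", h=10)
```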
diff --git a/inst/doc/oes.Rmd b/inst/doc/oes.Rmd
index 3c9b0a4..36e808c 100644
--- a/inst/doc/oes.Rmd
+++ b/inst/doc/oes.Rmd
@@ -1,5 +1,5 @@
 ---
-title: "Occurrence part of iETS model"
+title: "oes() - occurrence part of iETS model"
 author: "Ivan Svetunkov"
 date: "`r Sys.Date()`"
 output: rmarkdown::html_vignette
@@ -24,7 +24,8 @@ The canonical general iETS model (called iETS$_G$) can be summarised as:
 \begin{equation} \label{eq:iETS} \tag{1}
 \begin{matrix}
 y_t = o_t z_t \\
- o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\
+ o_t \sim \text{Bernoulli} \left(p_t \right) \\
+ p_t = f(a_t, b_t) \\
 a_t = w_a(v_{a,t-L}) + r_a(v_{a,t-L}) \epsilon_{a,t} \\
 v_{a,t} = f_a(v_{a,t-L}) + g_a(v_{a,t-L}) \epsilon_{a,t} \\
 (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
@@ -33,7 +34,7 @@ The canonical general iETS model (called iETS$_G$) can be summarised as:
 (1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
 \end{matrix},
 \end{equation}
-where $y_t$ is the observed values, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are the mutually independent error terms, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$) which is distributed according to Bernoulli with probability $p_t$ that has a Beta distribution ($o_t \sim \text{Bernoulli} \left(p_t \right)$, $p_t \sim \text{Beta} \left(a_t, b_t \right)$). Any ETS model can be used for $a_t$ and $b_t$, and the transformation of them into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
+where $y_t$ is the observed value, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are the mutually independent error terms, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$) which is distributed according to Bernoulli with probability $p_t$ that has a logit-normal distribution ($o_t \sim \text{Bernoulli} \left(p_t \right)$, $p_t \sim \text{logit} \mathcal{N}$). Any ETS model can be used for $a_t$ and $b_t$, and the transformation of them into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
 \begin{equation} \label{eq:oETS(MZZ)}
     p_t = \frac{a_t}{a_t+b_t} ,
 \end{equation}
@@ -51,7 +52,8 @@ An example of an iETS model is the basic local-level model iETS(M,N,N)$_G$(M,N,N
 l_{z,t} = l_{z,t-1}( 1 + \alpha_{z} \epsilon_{z,t}) \\
 (1 + \epsilon_{t}) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \\
 \\
- o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\
+ o_t \sim \text{Bernoulli} \left(p_t \right) \\
+ p_t = \frac{a_t}{a_t+b_t} \\
 a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
 l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
 (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
@@ -95,33 +97,24 @@ In case of the fixed $a_t$ and $b_t$, the iETS$_G$ model reduces to:
 \begin{equation} \label{eq:ISSETS(MNN)Fixed} \tag{3}
 \begin{matrix}
 y_t = o_t z_t \\
- o_t \sim \text{Beta-Bernoulli}(a, b)
+ o_t \sim \text{Bernoulli}(p)
 \end{matrix} .
 \end{equation}
-The conditional h-steps ahead median of the demand occurrence probability is calculated as:
+The conditional h-steps ahead mean of the demand occurrence probability is calculated as:
 \begin{equation} \label{eq:pt_fixed_expectation}
-    \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \frac{a}{a+b} .
-\end{equation}
-
-The likelihood function used in the first step of the estimation of iETS can be simplified to:
-\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihood} \tag{4}
-    \ell \left(a,b | o_t \right) = {\sum_{t=1}^T} \log \left( \frac{ \text{B} (o_t + a, 1 - o_t + b) }{ \text{B}(a,b) } \right) ,
+    \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \hat{p} .
 \end{equation}
-where $B$ is the beta function.
-Note, however that there can be combinations of $a$ and $b$ that will lead to the same fixed probability of occurrence $p$, so there is no point in estimating the model (3) \ref{eq:ISSETS(MNN)Fixed} based on (4) \ref{eq:ISSETS(MNN)FixedLikelihood}. Instead, the simpler version of the iETS$_F$ is fitted in the `oes()` function of the `smooth` package:
-\begin{equation} \label{eq:ISSETS(MNN)FixedSmooth}
-    \begin{matrix}
-    y_t = o_t z_t \\
-    o_t \sim \text{Bernoulli}(p)
-    \end{matrix} ,
-\end{equation}
-so that the estimate of the probability $p$ is calculated based on the maximisation of the following concentrated log-likelihood function:
+The estimate of the probability $p$ is calculated based on the maximisation of the following concentrated log-likelihood function:
 \begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmooth}
-    \ell \left(\hat{p} | o_t \right) = T_1 \log \hat{p} + T_0 \log (1-\hat{p}) ,
+    \ell \left({p} | o_t \right) = T_1 \log {p} + T_0 \log (1-{p}) ,
+\end{equation}
+where $T_0$ is the number of zero observations and $T_1$ is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to $k_z+1$, where $k_z$ is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability $p$.
+Maximising this likelihood yields the analytical solution for $p$:
+\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmoothProbability}
+    \hat{p} = \frac{1}{T} \sum_{t=1}^T o_t ,
 \end{equation}
-where $T_0$ is the number of zero observations and $T_1$ is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to $k_z+1$, where $k_z$ is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability $p$.
+where $T$ is the number of all the available observations.
 
 The occurrence part of the model oETS$_F$ is constructed using `oes()` function:
 ```{r iETSFExample1}
@@ -141,16 +134,13 @@ The odds-ratio iETS uses only one model for the occurrence part, for the $a_t$ v
 \begin{equation} \label{eq:iETSO} \tag{5}
 \begin{matrix}
 y_t = o_t z_t \\
- o_t \sim \text{Beta-Bernoulli} \left(a_t, 1 \right) \\
+ o_t \sim \text{Bernoulli} \left(p_t \right) \\
+ p_t = \frac{a_t}{a_t+1} \\
 a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
 l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
 (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2)
 \end{matrix}.
 \end{equation}
-The probability of occurrence in this model is equal to:
-\begin{equation} \label{eq:oETS_O(MNN)}
-    p_t = \frac{a_t}{a_t+1} .
-\end{equation}
 
 In the estimation of the model, the initial level is set to the transformed mean probability of occurrence $l_{a,0}=\frac{\bar{p}}{1-\bar{p}}$ for multiplicative error model and $l_{a,0} = \log l_{a,0}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$, the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. In cases of seasonal models, the regression with dummy variables is fitted, and its parameters are then used for the initials of the seasonal indices after the transformations similar to the level ones.
@@ -191,16 +181,13 @@ Similarly to the odds-ratio iETS, inverse-odds-ratio model uses only one model f
 \begin{equation} \label{eq:iETSI} \tag{6}
 \begin{matrix}
 y_t = o_t z_t \\
- o_t \sim \text{Beta-Bernoulli} \left(1, b_t \right) \\
+ o_t \sim \text{Bernoulli} \left(p_t \right) \\
+ p_t = \frac{1}{1+b_t} \\
 b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\
 l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\
 (1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
 \end{matrix}.
 \end{equation}
-The probability of occurrence in this model is equal to:
-\begin{equation} \label{eq:oETS_I(MNN)}
-    p_t = \frac{1}{1+b_t} .
-\end{equation}
 
 In the estimation of the model, the initial level is set to the transformed mean probability of occurrence $l_{b,0}=\frac{1-\bar{p}}{\bar{p}}$ for multiplicative error model and $l_{b,0} = \log l_{b,0}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$, the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. The seasonality is treated similar to the iETS$_O$ model, but using the inverse-odds transformation.
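[Editor's note: a small sketch tying together the estimates described in the hunks above — the fixed-probability MLE and the transformed initial levels of the odds-ratio and inverse-odds-ratio variants. Illustrative code, not the package's internals.]

```r
# Fixed-probability iETS_F: the MLE of p is the share of non-zero periods
o <- c(1,0,0,1,1,0,1,0,0,1)      # a toy occurrence series
pHat <- mean(o)                  # \hat{p} = sum(o_t)/T

# Concentrated log-likelihood at the optimum, T1*log(p) + T0*log(1-p):
T1 <- sum(o); T0 <- sum(1-o)
logLikFixed <- T1*log(pHat) + T0*log(1-pHat)

# Initial levels for the occurrence state, as described above:
lA0 <- pHat/(1-pHat)             # odds-ratio model, l_{a,0}
lB0 <- (1-pHat)/pHat             # inverse-odds-ratio model, l_{b,0}
```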
@@ -347,10 +334,10 @@ The main restriction of the iETS models at the moment (`smooth` v.2.5.0) is that
 ## The integer-valued iETS
 
 By default, the models assume that the data is continuous, which sounds counterintuitive for typical intermittent demand forecasting tasks. However, [@Svetunkov2017a] showed that these models perform quite well in terms of forecasting accuracy in many cases. Still, there is also an option for rounded-up values, which is implemented in the `es()` function. This is not described in the manual and can be triggered via the `rounded=TRUE` parameter provided in ellipsis. Here's an example:
 ```{r iETSGRoundedExample}
-es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, intervals=TRUE, rounded=TRUE)
+es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, interval=TRUE, rounded=TRUE)
 ```
 
-Keep in mind that the model with the rounded up values is estimated differently than it continuous counterpart and produces more adequate results for the highly intermittent data with low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce integer-valued prediction intervals, then you can produce the intervals from a continuous model and then round them up (see discussion in [@Svetunkov2017a] for details).
+Keep in mind that the model with the rounded-up values is estimated differently than its continuous counterpart and produces more adequate results for highly intermittent data with a low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce an integer-valued prediction interval, then you can produce the interval from a continuous model and then round it up (see discussion in [@Svetunkov2017a] for details).
 
 ## References
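[Editor's note: a sketch of the rounding advice in the paragraph above. The text says "round them up"; rounding the bounds outwards (floor the lower, ceiling the upper) is one conservative reading, since it does not shrink the coverage. Hypothetical bound values.]

```r
# Hypothetical interval bounds taken from a continuous iETS model:
lower <- c(0.3, 0.8, 1.2)
upper <- c(2.4, 3.1, 3.9)

# Integer-valued bounds, rounded outwards:
cbind(lower=floor(lower), upper=ceiling(upper))
```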

    Occurrence part of iETS model

    +

    oes() - occurrence part of iETS model

    Ivan Svetunkov

    -

    2019-04-25

    +

    2019-06-13

    @@ -313,7 +313,8 @@

    The basics

    The canonical general iETS model (called iETS\(_G\)) can be summarised as: \[\begin{equation} \label{eq:iETS} \tag{1} \begin{matrix} y_t = o_t z_t \\ - o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\ + o_t \sim \text{Bernoulli} \left(p_t \right) \\ + p_t = f{a_t, b_t} \\ a_t = w_a(v_{a,t-L}) + r_a(v_{a,t-L}) \epsilon_{a,t} \\ v_{a,t} = f_a(v_{a,t-L}) + g_a(v_{a,t-L}) \epsilon_{a,t} \\ (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\ @@ -321,7 +322,7 @@

    The basics

    v_{b,t} = f_a(v_{b,t-L}) + g_a(v_{b,t-L}) \epsilon_{b,t} \\ (1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2) \end{matrix}, -\end{equation}\] where \(y_t\) is the observed values, \(z_t\) is the demand size, which is a pure multiplicative ETS model on its own, \(w(\cdot)\) is the measurement function, \(r(\cdot)\) is the error function, \(f(\cdot)\) is the transition function and \(g(\cdot)\) is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector \(v_{t}\) interact with each other. Furthermore, \(\epsilon_{a,t}\) and \(\epsilon_{b,t}\) are the mutually independent error terms, \(o_t\) is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period \(t\)) which is distributed according to Bernoulli with probability \(p_t\) that has a Beta distribution (\(o_t \sim \text{Bernoulli} \left(p_t \right)\), \(p_t \sim \text{Beta} \left(a_t, b_t \right)\)). Any ETS model can be used for \(a_t\) and \(b_t\), and the transformation of them into the probability \(p_t\) depends on the type of the error. The general formula for the multiplicative error is: \[\begin{equation} \label{eq:oETS(MZZ)} +\end{equation}\] where \(y_t\) is the observed values, \(z_t\) is the demand size, which is a pure multiplicative ETS model on its own, \(w(\cdot)\) is the measurement function, \(r(\cdot)\) is the error function, \(f(\cdot)\) is the transition function and \(g(\cdot)\) is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector \(v_{t}\) interact with each other. Furthermore, \(\epsilon_{a,t}\) and \(\epsilon_{b,t}\) are the mutually independent error terms, \(o_t\) is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period \(t\)) which is distributed according to Bernoulli with probability \(p_t\) that has a logit-normal distribution (\(o_t \sim \text{Bernoulli} \left(p_t \right)\), \(p_t \sim \text{logit} \mathcal{N}\)). Any ETS model can be used for \(a_t\) and \(b_t\), and the transformation of them into the probability \(p_t\) depends on the type of the error. The general formula for the multiplicative error is: \[\begin{equation} \label{eq:oETS(MZZ)} p_t = \frac{a_t}{a_t+b_t} , \end{equation}\] while for the additive error it is: \[\begin{equation} \label{eq:oETS(AZZ)} p_t = \frac{\exp(a_t)}{\exp(a_t)+\exp(b_t)} . @@ -333,7 +334,8 @@

    The basics

    l_{z,t} = l_{z,t-1}( 1 + \alpha_{z} \epsilon_{z,t}) \\ (1 + \epsilon_{t}) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \\ \\ - o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\ + o_t \sim \text{Bernoulli} \left(p_t \right) \\ + p_t = \frac{a_t}{a_t+b_t} \\ a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\ @@ -365,42 +367,36 @@

    iETS\(_F\)

    In case of the fixed \(a_t\) and \(b_t\), the iETS\(_G\) model reduces to: \[\begin{equation} \label{eq:ISSETS(MNN)Fixed} \tag{3} \begin{matrix} y_t = o_t z_t \\ - o_t \sim \text{Beta-Bernoulli}(a, b) + o_t \sim \text{Bernoulli}(p) \end{matrix} . \end{equation}\]

    -

    The conditional h-steps ahead median of the demand occurrence probability is calculated as: \[\begin{equation} \label{eq:pt_fixed_expectation} - \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \frac{a}{a+b} . +

    The conditional h-steps ahead mean of the demand occurrence probability is calculated as: \[\begin{equation} \label{eq:pt_fixed_expectation} + \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \hat{p} . \end{equation}\]

    -

    The likelihood function used in the first step of the estimation of iETS can be simplified to: \[\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihood} \tag{4} - \ell \left(a,b | o_t \right) = {\sum_{t=1}^T} \log \left( \frac{ \text{B} (o_t + a, 1 - o_t + b) }{ \text{B}(a,b) } \right) , -\end{equation}\] where \(B\) is the beta function.

    -

    Note, however that there can be combinations of \(a\) and \(b\) that will lead to the same fixed probability of occurrence \(p\), so there is no point in estimating the model (3) based on (4) . Instead, the simpler version of the iETS\(_F\) is fitted in the oes() function of the smooth package: \[\begin{equation} \label{eq:ISSETS(MNN)FixedSmooth} - \begin{matrix} - y_t = o_t z_t \\ - o_t \sim \text{Bernoulli}(p) - \end{matrix} , -\end{equation}\] so that the estimate of the probability \(p\) is calculated based on the maximisation of the following concentrated log-likelihood function: \[\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmooth} - \ell \left(\hat{p} | o_t \right) = T_1 \log \hat{p} + T_0 \log (1-\hat{p}) , -\end{equation}\] where \(T_0\) is the number of zero observations and \(T_1\) is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to \(k_z+1\), where \(k_z\) is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability \(p\).

    +

    The estimate of the probability \(p\) is calculated based on the maximisation of the following concentrated log-likelihood function: \[\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmooth} + \ell \left({p} | o_t \right) = T_1 \log {p} + T_0 \log (1-{p}) , +\end{equation}\] where \(T_0\) is the number of zero observations and \(T_1\) is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to \(k_z+1\), where \(k_z\) is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability \(p\). Maximising this likelihood deems the analytical solution for the \(p\): \[\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmoothProbability} + \hat{p} = \frac{1}{T} \sum_{t=1}^T o_t , +\end{equation}\] where \(T\) is the number of all the available observations.

    The occurrence part of the model oETS\(_F\) is constructed using oes() function:

    ## Occurrence state space model estimated: Fixed probability
    -## Underlying ETS model: oETS[F]
    +## Underlying ETS model: oETS[F](MNN)
     ## Smoothing parameters:
     ## level 
     ##     0 
     ## Vector of initials:
     ## level 
    -## 0.555 
    +## 0.645 
     ## Information criteria: 
     ##      AIC     AICc      BIC     BICc 
    -## 153.1807 153.2177 155.8812 155.9682
    +## 145.0473 145.0844 147.7478 147.8349
    -

    +

    All the smooth forecasting functions support the occurrence part of the model. For example, here’s how the iETS(M,M,N)\(_F\) can be constructed:

    -
    ## Time elapsed: 0.11 seconds
    +
    ## Time elapsed: 0.25 seconds
     ## Model estimated: iETS(MMN)
     ## Occurrence model type: Fixed probability
     ## Persistence vector g:
    @@ -408,28 +404,27 @@ 

    iETS\(_F\)

    ## 0 0 ## Initial values were optimised. ## 6 parameters were estimated in the process -## Residuals standard deviation: 0.416 -## Cost function type: MSE; Cost function value: 0.173 +## Residuals standard deviation: 0.363 +## Loss function type: MSE; Loss function value: 0.132 ## ## Information criteria: ## AIC AICc BIC BICc -## 417.8305 418.4074 434.0333 430.6888 +## 400.2405 400.8174 416.4434 413.0988 ## Forecast errors: -## Bias: 91.6%; sMSE: 193.7%; RelRMSE: 1.652; sPIS: -5425%; sCE: -1115.8%
    -

    +## Bias: 90.9%; sMSE: 204.4%; RelRMSE: 1.374; sPIS: -6333.7%; sCE: -1010.8%
    +

    iETS\(_O\)

    The odds-ratio iETS uses only one model for the occurrence part, for the \(a_t\) variable (setting \(b_t=1\)), which simplifies the iETS\(_G\) model. For example, for the iETS\(_O\)(M,N,N): \[\begin{equation} \label{eq:iETSO} \tag{5} \begin{matrix} y_t = o_t z_t \\ - o_t \sim \text{Beta-Bernoulli} \left(a_t, 1 \right) \\ + o_t \sim \text{Bernoulli} \left(p_t \right) \\ + p_t = \frac{a_t}{a_t+1} \\ a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \end{matrix}. -\end{equation}\] The probability of occurrence in this model is equal to: \[\begin{equation} \label{eq:oETS_O(MNN)} - p_t = \frac{a_t}{a_t+1} . \end{equation}\]

    In the estimation of the model, the initial level is set to the transformed mean probability of occurrence \(l_{a,0}=\frac{\bar{p}}{1-\bar{p}}\) for multiplicative error model and \(l_{a,0} = \log l_{a,0}\) for the additive one, where \(\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t\), the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. In cases of seasonal models, the regression with dummy variables is fitted, and its parameters are then used for the initials of the seasonal indices after the transformations similar to the level ones.

    The construction of the model is done via the following set of equations (example with oETS\(_O\)(M,N,N)): \[\begin{equation} \label{eq:iETSOEstimation} @@ -449,18 +444,18 @@

    iETS\(_O\)

    ## Underlying ETS model: oETS[O](MMN) ## Smoothing parameters: ## level trend -## 0.010 0.001 +## 0.026 0.000 ## Vector of initials: ## level trend -## 0.254 0.998 +## 0.099 1.041 ## Information criteria: ## AIC AICc BIC BICc -## 127.8927 128.2737 138.6947 139.5900
    +## 106.5971 106.9781 117.3991 118.2944
    -

    +

    And here’s the full iETS(M,M,N)\(_O\) model:

    -
    ## Time elapsed: 0.2 seconds
    +
    ## Time elapsed: 0.45 seconds
     ## Model estimated: iETS(MMN)
     ## Occurrence model type: Odds ratio
     ## Persistence vector g:
    @@ -468,15 +463,15 @@ 

    iETS\(_O\)

    ## 0 0 ## Initial values were optimised. ## 9 parameters were estimated in the process -## Residuals standard deviation: 0.416 -## Cost function type: MSE; Cost function value: 0.173 +## Residuals standard deviation: 0.363 +## Loss function type: MSE; Loss function value: 0.132 ## ## Information criteria: ## AIC AICc BIC BICc -## 392.5425 393.1194 416.8468 399.4008 +## 361.7903 362.3672 386.0946 368.6486 ## Forecast errors: -## Bias: 62.4%; sMSE: 97.5%; RelRMSE: 1.172; sPIS: -2366.9%; sCE: -544.4%
    -

    +## Bias: 37.3%; sMSE: 123.1%; RelRMSE: 1.066; sPIS: -3263.4%; sCE: -436.2%
    +

    This should give the same results as running, meaning that we ask explicitly for the es() function to use the earlier estimated model:

This gives additional flexibility, because the construction can be done in two steps, with a more refined model for the occurrence part (e.g. including explanatory variables).
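A minimal sketch of the two-step construction (assuming the same hypothetical series `y`):

```r
# Step 1: estimate the occurrence part separately
oesModel <- oes(y, model="MNN", occurrence="odds-ratio", h=10, holdout=TRUE)
# Step 2: pass it to es() via the occurrence parameter
es(y, "MNN", occurrence=oesModel, h=10, holdout=TRUE)
```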

    @@ -486,13 +481,12 @@

    iETS\(_I\)

Similarly to the odds-ratio iETS, the inverse-odds-ratio model uses only one model for the occurrence part, but for the \(b_t\) variable instead of \(a_t\) (now \(a_t=1\)). Here is an example of iETS\(_I\)(M,N,N):
\[\begin{equation} \label{eq:iETSI} \tag{6}
\begin{matrix}
y_t = o_t z_t \\
-o_t \sim \text{Beta-Bernoulli} \left(1, b_t \right) \\
+o_t \sim \text{Bernoulli} \left(p_t \right) \\
+p_t = \frac{1}{1+b_t} \\
b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\
l_{b,t} = l_{b,t-1}(1 + \alpha_{b} \epsilon_{b,t}) \\
(1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
\end{matrix}.
\end{equation}\]
-The probability of occurrence in this model is equal to: \[\begin{equation} \label{eq:oETS_I(MNN)} p_t = \frac{1}{1+b_t} . \end{equation}\]

In the estimation of the model, the initial level is set to the transformed mean probability of occurrence: \(l_{b,0}=\frac{1-\bar{p}}{\bar{p}}\) for the multiplicative error model and \(l_{b,0} = \log \frac{1-\bar{p}}{\bar{p}}\) for the additive one, where \(\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t\). The initial trend is equal to 0 for the additive and to 1 for the multiplicative trend types. The seasonality is treated similarly to the iETS\(_O\) model, but using the inverse-odds transformation.

The construction of the model is done via a set of equations similar to the ones for the iETS\(_O\) model:
@@ -512,18 +506,18 @@

    iETS\(_I\)
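A sketch of the call behind the output below (hypothetical `y`):

```r
# Occurrence part only: inverse-odds-ratio oETS(M,M,N)
oes(y, model="MMN", occurrence="inverse-odds-ratio", h=10, holdout=TRUE)
```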

## Underlying ETS model: oETS[I](MMN)
## Smoothing parameters:
## level trend 
-## 0.119 0.000 
+## 0.107 0.000 
## Vector of initials:
##  level  trend 
-## 17.109  0.877 
+## 21.409  0.878 
## Information criteria: 
##      AIC     AICc      BIC     BICc 
-## 135.4940 135.8749 146.2959 147.1912
+## 110.5098 110.8908 121.3118 122.2071

And here’s the full iETS(M,M,N)\(_I\) model:

-## Time elapsed: 0.18 seconds
+## Time elapsed: 0.42 seconds
     ## Model estimated: iETS(MMN)
     ## Occurrence model type: Inverse odds ratio
     ## Persistence vector g:
    @@ -531,15 +525,15 @@ 

    iETS\(_I\)

## 0 0 
## Initial values were optimised.
## 9 parameters were estimated in the process
-## Residuals standard deviation: 0.416
-## Cost function type: MSE; Cost function value: 0.173
+## Residuals standard deviation: 0.363
+## Loss function type: MSE; Loss function value: 0.132
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
-## 400.1438 400.7207 424.4481 407.0021
+## 365.7030 366.2799 390.0073 372.5613
## Forecast errors:
-## Bias: 61.5%; sMSE: 95.3%; RelRMSE: 1.159; sPIS: -2289.8%; sCE: -526.2%
+## Bias: 33.8%; sMSE: 120.5%; RelRMSE: 1.055; sPIS: -3095.2%; sCE: -405.3%

    Once again, an earlier estimated model can be used in the univariate forecasting functions:

    @@ -580,22 +574,22 @@

    iETS\(_D\)

    Here’s an example of the application of the model to the same artificial data:
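A sketch of the call that would produce the output below (hypothetical `y`):

```r
# Direct probability occurrence model
oes(y, model="MMN", occurrence="direct", h=10, holdout=TRUE)
```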

-## Occurrence state space model estimated: None
+## Occurrence state space model estimated: Direct probability
     ## Underlying ETS model: oETS[D](MMN)
     ## Smoothing parameters:
     ## level trend 
    -## 0.005 0.000 
    +##     0     0 
     ## Vector of initials:
     ## level trend 
    -## 0.241 1.008 
    +## 0.342 1.011 
     ## Information criteria: 
     ##      AIC     AICc      BIC     BICc 
    -## 127.9212 128.3021 138.7231 139.6184
    +## 109.7768 110.1578 120.5788 121.4741

The usage of the model in the univariate forecasting functions is the same as for the other occurrence models discussed above:

-## Time elapsed: 0.11 seconds
+## Time elapsed: 0.24 seconds
     ## Model estimated: iETS(MMN)
     ## Occurrence model type: Direct
     ## Persistence vector g:
    @@ -604,15 +598,15 @@ 

    iETS\(_D\)

## Initial values were optimised.
## 5 parameters were estimated in the process
## 4 parameters were provided
-## Residuals standard deviation: 0.416
-## Cost function type: MSE; Cost function value: 0.173
+## Residuals standard deviation: 0.363
+## Loss function type: MSE; Loss function value: 0.132
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
-## 384.5709 385.1479 398.0733 399.4292
+## 356.9700 357.5469 370.4724 371.8283
## Forecast errors:
-## Bias: 58.5%; sMSE: 91.4%; RelRMSE: 1.135; sPIS: -2011.4%; sCE: -482.8%
+## Bias: 32.5%; sMSE: 119.4%; RelRMSE: 1.05; sPIS: -3023.2%; sCE: -394.1%

    iETS\(_G\)

    @@ -637,9 +631,9 @@

    iETS\(_G\)
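The output below corresponds to a call along these lines (a sketch; modelA and modelB set the ETS types for the \(a_t\) and \(b_t\) parts respectively):

```r
# General occurrence model with different ETS types for the two parts
oesg(y, modelA="MNN", modelB="AAN", h=10, holdout=TRUE)
```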

## Underlying ETS model: oETS[G](MNN)(AAN)
## Information criteria: 
##      AIC     AICc      BIC     BICc 
-## 131.0764 131.8919 147.2793 149.1960
+## 111.1289 111.9444 127.3317 129.2484

    The oes() function accepts occurrence="g" and in this case calls for oesg() with the same types of ETS models for both parts:
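For example (a sketch with the hypothetical `y`):

```r
# occurrence="general" makes oes() call oesg() with identical ETS types
oes(y, model="MNN", occurrence="general", h=10, holdout=TRUE)
```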

    @@ -647,12 +641,12 @@

    iETS\(_G\)

## Underlying ETS model: oETS[G](MNN)(MNN)
## Information criteria: 
##      AIC     AICc      BIC     BICc 
-## 132.1816 132.5626 142.9835 143.8789
+## 111.9023 112.2833 122.7043 123.5996

Finally, a more flexible way to construct the iETS model is to do it in two steps: either using oesg() or oes(), and then providing the estimated model to es() via the occurrence variable. But a simpler option is available as well:
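The simpler one-step option looks like this (a sketch):

```r
# es() estimates the general occurrence model internally
es(y, "MMN", occurrence="general", h=10, holdout=TRUE)
```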

-## Time elapsed: 0.25 seconds
+## Time elapsed: 0.59 seconds
     ## Model estimated: iETS(MMN)
     ## Occurrence model type: General
     ## Persistence vector g:
    @@ -660,60 +654,60 @@ 

    iETS\(_G\)

## 0 0 
## Initial values were optimised.
## 13 parameters were estimated in the process
-## Residuals standard deviation: 0.416
-## Cost function type: MSE; Cost function value: 0.173
+## Residuals standard deviation: 0.363
+## Loss function type: MSE; Loss function value: 0.132
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
-## 410.0518 410.6287 445.1580 408.9101
+## 369.6940 370.2709 404.8002 368.5523
## Forecast errors:
-## Bias: 61.4%; sMSE: 94.8%; RelRMSE: 1.156; sPIS: -2288.2%; sCE: -523.6%
+## Bias: 36.5%; sMSE: 122.4%; RelRMSE: 1.063; sPIS: -3221.9%; sCE: -428.8%

    iETS\(_A\)

Finally, there is an occurrence type selection mechanism. It tries out all the iETS subtypes of models discussed above and selects the one with the lowest information criterion (e.g. AIC). This subtype is called iETS\(_A\) (automatic), although it does not represent any specific model. Here’s an example:
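A sketch of such a call (hypothetical `y`):

```r
# Try all occurrence types and select the one with the lowest IC
oes(y, model="MNN", occurrence="auto", h=10, holdout=TRUE)
```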

-## Occurrence state space model estimated: None
-## Underlying ETS model: oETS[D](MNN)
+## Occurrence state space model estimated: Odds ratio
+## Underlying ETS model: oETS[O](MNN)
     ## Smoothing parameters:
     ## level 
    -## 0.014 
    +## 0.104 
     ## Vector of initials:
     ## level 
    -## 0.246 
    +##   0.1 
     ## Information criteria: 
     ##      AIC     AICc      BIC     BICc 
    -## 124.9098 125.0219 130.3107 130.5743
    +## 107.9023 108.0145 113.3033 113.5669

    The main restriction of the iETS models at the moment (smooth v.2.5.0) is that there is no model selection between the ETS models for the occurrence part. This needs to be done manually. Hopefully, this feature will appear in the next release of the package.

    The integer-valued iETS

By default, the models assume that the data is continuous, which sounds counterintuitive for typical intermittent demand forecasting tasks. However, (Svetunkov and Boylan 2017) showed that these models perform quite well in terms of forecasting accuracy in many cases. Still, there is also an option for rounded-up values, which is implemented in the es() function. This is not described in the manual and can be triggered via the rounded=TRUE parameter provided through the ellipsis. Here’s an example:
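A sketch of such a call (hypothetical `y`; `rounded` is passed through the ellipsis):

```r
# Rounded-up iETS with the general occurrence model
es(y, "MNN", occurrence="general", h=10, holdout=TRUE,
   interval=TRUE, rounded=TRUE)
```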

-## Time elapsed: 0.16 seconds
+## Time elapsed: 0.32 seconds
     ## Model estimated: iETS(MNN)
     ## Occurrence model type: General
     ## Persistence vector g:
     ## alpha 
    -##     0 
    +## 0.001 
     ## Initial values were optimised.
     ## 7 parameters were estimated in the process
    -## Residuals standard deviation: 0.083
    -## Cost function type: Rounded; Cost function value: 10.813
    +## Residuals standard deviation: 0.086
    +## Loss function type: Rounded; Loss function value: 0
     ## 
     ## Information criteria:
    -##     AIC    AICc     BIC    BICc 
    -## -1.2374 -0.9583 16.2613 -1.1101 
    -## 95% parametric prediction intervals were constructed
    -## 90% of values are in the prediction interval
    +##      AIC     AICc      BIC     BICc 
    +## 102.9357 103.2148 120.4344 103.0630 
    +## 95% parametric prediction interval were constructed
    +## 100% of values are in the prediction interval
     ## Forecast errors:
    -## Bias: -9%; sMSE: 33.9%; RelRMSE: 0.877; sPIS: 162%; sCE: -126.4%
    -

    -

    Keep in mind that the model with the rounded up values is estimated differently than it continuous counterpart and produces more adequate results for the highly intermittent data with low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce integer-valued prediction intervals, then you can produce the intervals from a continuous model and then round them up (see discussion in (Svetunkov and Boylan 2017) for details).

    +## Bias: 61.4%; sMSE: 29.8%; RelRMSE: 0.96; sPIS: -1773.5%; sCE: -322.5%
    +

    +

    Keep in mind that the model with the rounded up values is estimated differently than it continuous counterpart and produces more adequate results for the highly intermittent data with low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce integer-valued prediction interval, then you can produce the interval from a continuous model and then round them up (see discussion in (Svetunkov and Boylan 2017) for details).

    References

    diff --git a/inst/doc/simulate.html b/inst/doc/simulate.html index 90f8bbc..db6cffe 100644 --- a/inst/doc/simulate.html +++ b/inst/doc/simulate.html @@ -12,7 +12,7 @@ - + Simulate functions of the package @@ -303,7 +303,7 @@

    Simulate functions of the package

    Ivan Svetunkov

-2019-04-25
+2019-06-13

    @@ -317,23 +317,23 @@

    Exponential Smoothing
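The description below refers to an object created by a call of this kind (a sketch; the exact chunk is not rendered in this diff):

```r
library(smooth)
# Generate 120 monthly observations from ETS(A,N,N)
ourSimulation <- sim.es("ANN", frequency=12, obs=120)
```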

The resulting ourSimulation object contains:
• ourSimulation$model – name of the ETS model used in the simulation;
• ourSimulation$data – vector of simulated data;
• ourSimulation$states – matrix of states, where columns contain the different states and rows correspond to time;
• ourSimulation$persistence – vector of smoothing parameters used in the simulation (in our case generated randomly);
• ourSimulation$residuals – vector of errors generated in the simulation;
• ourSimulation$occurrence – vector of demand occurrences (zeroes and ones, in our case only ones);
• ourSimulation$logLik – true likelihood function for the used generating model.

    We can plot produced data, states or residuals in order to see what was generated. This is done using:
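For instance, the returned components can be plotted directly (a sketch):

```r
plot(ourSimulation$data)         # the generated series
plot(ts(ourSimulation$states))   # the states over time
plot(ourSimulation$residuals)    # the generated errors
```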


    If only one time series has been generated, we can use a simpler command in order to plot it:


Now let’s use a more complicated model and be more specific, providing the persistence vector:
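For instance (a sketch; the persistence values here are purely illustrative):

```r
# ETS(M,A,M) with a fixed persistence vector (level, trend, seasonality)
ourSimulation <- sim.es("MAM", frequency=12, obs=120,
                        persistence=c(0.3, 0.1, 0.05))
```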


High values of smoothing parameters are not advised for models with multiplicative components, because they may lead to explosive data. As for the randomizer, the default values seem to work fine in the majority of cases, but if we want, we can intervene and ask for something specific (for example, some values taken from an estimated model):


    It is advised to use lower values for sdlog and sd for models with multiplicative components. Once again, using higher values may lead to data with explosive behaviour.

If we need intermittent data, we can define the probability of occurrence. It also makes sense to use pure multiplicative models and to specify the initials in this case:
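A sketch of such a call:

```r
# Intermittent data: 30% probability of occurrence, fixed initial level
ourSimulation <- sim.es("MNN", obs=120, iprob=0.3, initial=10)
```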


    If we want to have several time series generated using the same parameters then we can use nsim parameter:
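For example (a sketch):

```r
# Ten series from the same generating model
ourData <- sim.es("ANN", frequency=12, obs=120, nsim=10)
dim(ourData$data)   # should be 120 x 10
```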

    We will have the same set of returned values, but with one more dimension. So, for example, we will end up with matrix for ourSimulation$data and array for ourSimulation$states.

    @@ -346,7 +346,7 @@

    Exponential Smoothing

plot(x)
plot(ourData$data[,1])
par(mfcol=c(1,1))

    As we see the level is the same and variance is similar for both series. Achievement unlocked!

    @@ -356,11 +356,11 @@

    SARIMA

The resulting ourSimulation object contains:
• ourSimulation$model – name of the ARIMA model used in the simulation;
• ourSimulation$AR – matrix with generated or provided AR parameters;
• ourSimulation$MA – matrix with MA parameters;
• ourSimulation$constant – vector of constant values (one for each time series);
• ourSimulation$initial – matrix of initial values;
• ourSimulation$data – matrix of simulated data (if we had nsim=1, then that would be a vector);
• ourSimulation$states – array of states, where columns contain the different states, rows correspond to time and the last dimension is for each time series;
• ourSimulation$residuals – vector of errors generated in the simulation;
• ourSimulation$occurrence – vector of demand occurrences (zeroes and ones, in our case only ones);
• ourSimulation$logLik – true likelihood function for the used generating model.

    Similarly to sim.es(), we can plot produced data, states or residuals for each time series in order to see what was generated. Here’s an example:


Now let’s use a more complicated model. For example, data from SARIMA(0,1,1)(1,0,2)_12 with drift can be generated using:
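A sketch of such a call (orders are given per lag, from the lowest lag to the highest):

```r
# SARIMA(0,1,1)(1,0,2)_12 with drift
ourSimulation <- sim.ssarima(orders=list(ar=c(0,1), i=c(1,0), ma=c(1,2)),
                             lags=c(1,12), obs=120, constant=TRUE)
```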


If we want to provide some specific parameters, then we should follow the structure: from the lower lag to the higher lag, and within each lag from the lower order to the higher order. For example, the same model with predefined MA terms will be:

    @@ -368,28 +368,28 @@

    SARIMA

## Number of generated series: 1
## AR parameters: 
##       Lag 12
-## AR(1)  0.554
+## AR(1)   0.72
## MA parameters: 
##       Lag 1 Lag 12
## MA(1)   0.5    0.2
## MA(2)   0.0    0.3
-## Constant value: -181.278
-## True likelihood: -735.076
+## Constant value: -59.308
+## True likelihood: -674.043

We can create time series with several frequencies. For example, some sort of daily series from SARIMA(1,0,2)(0,1,1)_7(1,0,1)_30 can be generated with a command:
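A sketch of such a call:

```r
# SARIMA(1,0,2)(0,1,1)_7(1,0,1)_30 -- three lags at once
ourSimulation <- sim.ssarima(orders=list(ar=c(1,0,1), i=c(0,1,0), ma=c(2,1,1)),
                             lags=c(1,7,30), obs=360)
```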

    ## Data generated from: SARIMA(1,0,2)[1](0,1,1)[7](1,0,1)[30]
     ## Number of generated series: 1
     ## AR parameters: 
    -##       Lag 1 Lag 30
    -## AR(1) 0.125  0.542
    +##        Lag 1 Lag 30
    +## AR(1) -0.744  0.496
     ## MA parameters: 
     ##        Lag 1  Lag 7 Lag 30
    -## MA(1) -0.194 -0.147 -0.079
    -## MA(2)  0.290  0.000  0.000
    -## True likelihood: -1804.511
+## MA(1) -0.634 -0.241  0.931
+## MA(2)  0.431  0.000  0.000
+## True likelihood: -1577.575

sim.ssarima() also supports intermittent data, which is defined via the iprob parameter, similarly to sim.es():

    @@ -397,14 +397,14 @@

    SARIMA

## Number of generated series: 1
## AR parameters: 
##        Lag 1
-## AR(1) -0.443
+## AR(1) -0.643
## MA parameters: 
-##        Lag 1  Lag 7
-## MA(1) -0.822  0.501
-## MA(2)  0.565  0.000
-## True likelihood: -593.91
+##       Lag 1 Lag 7
+## MA(1) 0.938 0.618
+## MA(2) 0.233 0.000
+## True likelihood: -613.337

Finally, we can use the simulate() function in a similar manner as with sim.es(). For example:
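A sketch under stated assumptions (`x` is some hypothetical series, and it is assumed that `simulate()` accepts the number of observations for smooth models):

```r
ourModel <- ssarima(x, orders=list(ar=1, i=1, ma=1), lags=1)
# Simulate ten new series from the estimated model
ourData <- simulate(ourModel, nsim=10, obs=length(x))
```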


As we see, the series demonstrate similarities in dynamics and have similar variances.

    @@ -426,31 +426,31 @@

    Complex Exponential Smoothing

The resulting ourSimulation object contains:
• ourSimulation$model – name of the CES model used in the simulation;
• ourSimulation$A – vector of complex smoothing parameters A;
• ourSimulation$B – vector of complex smoothing parameters B (if the “partial” or “full” seasonal model was used in the simulation);
• ourSimulation$initial – array of initial values;
• ourSimulation$data – matrix of simulated data (if we had nsim=1, then that would be a vector);
• ourSimulation$states – array of states, where columns contain the different states, rows correspond to time and the last dimension is for each time series;
• ourSimulation$residuals – vector of errors generated in the simulation;
• ourSimulation$occurrence – vector of demand occurrences (zeroes and ones, in our case only ones);
• ourSimulation$logLik – true likelihood function for the used generating model.

    Similarly to other simulate functions in “smooth”, we can plot produced data, states or residuals for each time series in order to see what was generated. Here’s an example:


    We can also see a brief summary of our simulated data:

    ## Data generated from: CES(n)
     ## Number of generated series: 1
    -## Smoothing parameter A: 1.655+1.074i
    -## True likelihood: -554.932
+## Smoothing parameter A: 1.998+1.087i
+## True likelihood: -568.548

We can produce one out of three possible seasonal CES models. For example, let’s generate data from the “Simple CES”, which does not have a level:
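A sketch of such a call:

```r
# "Simple" seasonal CES: no level component
ourSimulation <- sim.ces("simple", frequency=12, obs=120)
```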


    Now let’s be more creative and mess around with the generated initial values of the previous model. We will make some of them equal to zero and regenerate the data:


    The resulting generated series has properties close to the ones that solar irradiation data has: changing amplitude of seasonality without changes in level. We have also chosen a random number generator, based on Student distribution rather than normal. This is done just in order to show what can be done using simulate functions in “smooth”.

    We can also produce CES with so called “partial” seasonality, which corresponds to CES(n) with additive seasonal components. Let’s produce 10 of such time series:


    Finally, the most complicated CES model is the one with “full” seasonality, implying that there are two complex exponential smoothing models inside: one for seasonal and the other for non-seasonal part:


    The generated smoothing parameters may sometimes lead to explosive behaviour and produce meaningless time series. That is why it is advised to use parameters of a model fitted to time series of interest. For example, here we generate something crazy and then simulate the data:


    Generalised Exponential Smoothing

    @@ -473,7 +473,7 @@

    Generalised Exponential Smoothing

plot(x)
plot(ourData)
par(mfcol=c(1,1))

Note that GUM is still ongoing research and its properties are currently understudied.

    @@ -481,13 +481,13 @@

    Simple Moving Average

Now that there is a model underlying simple moving averages, we can simulate data from it. Here is how it can be done:
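A sketch of such a call:

```r
# Simulate 120 observations from SMA(12)
ourSimulation <- sim.sma(order=12, obs=120)
```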


As usual, you can use the simulate() function as well:


    Vector Exponential Smoothing

    @@ -498,7 +498,7 @@

    Vector Exponential Smoothing


When using simulate() with ves, you can specify the randomizer and additional parameters for distributions. For example, you can use mvrnorm() from the MASS package, and if you don’t provide mu and Sigma, then they will be extracted from the model.
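A sketch under these assumptions (`ourModel` stands for a hypothetical estimated ves() model):

```r
library(MASS)
# mu and Sigma are extracted from the model if not provided
simulate(ourModel, nsim=5, randomizer="mvrnorm")
```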

    diff --git a/inst/doc/sma.R b/inst/doc/sma.R index c0c16ec..3adee93 100644 --- a/inst/doc/sma.R +++ b/inst/doc/sma.R @@ -10,5 +10,5 @@ require(Mcomp) sma(M3$N2457$x, h=18, silent=FALSE) ## ----sma_N2568----------------------------------------------------------- -sma(M3$N2568$x, h=18) +sma(M3$N2568$x, h=18, interval=TRUE) diff --git a/inst/doc/sma.Rmd b/inst/doc/sma.Rmd index 5395af5..2a569fb 100644 --- a/inst/doc/sma.Rmd +++ b/inst/doc/sma.Rmd @@ -36,7 +36,7 @@ It appears that SMA(13) is the optimal model for this time series, which is not If we try selecting SMA order for data without substantial trend, then we will end up with some other order. For example, let's consider a seasonal time series N2568: ```{r sma_N2568} -sma(M3$N2568$x, h=18) +sma(M3$N2568$x, h=18, interval=TRUE) ``` Here we end up with SMA(12). Note that the order of moving average corresponds to seasonal frequency, which is usually a first step in classical time series decomposition. We however do not have centred moving average, we deal with simple one, so decomposition should not be done based on this model. diff --git a/inst/doc/sma.html b/inst/doc/sma.html index 120557f..eb936d7 100644 --- a/inst/doc/sma.html +++ b/inst/doc/sma.html @@ -12,7 +12,7 @@ - + sma() - Simple Moving Average @@ -303,7 +303,7 @@

    sma() - Simple Moving Average

    Ivan Svetunkov

-2019-04-25
+2019-06-13

    @@ -315,12 +315,12 @@

    2019-04-25

You may note that Mcomp depends on the forecast package, and if you load both forecast and smooth, then you will get a message that the forecast() function is masked from the environment. There is nothing to be worried about: smooth uses this function for consistency purposes and has exactly the same forecast() as in the forecast package. The inclusion of this function in smooth was done only in order not to include forecast in the dependencies of the package.

    By default SMA does order selection based on AICc and returns the model with the lowest value:

-## Time elapsed: 0.19 seconds
+## Time elapsed: 0.33 seconds
     ## Model estimated: SMA(13)
     ## Initial values were produced using backcasting.
     ## 2 parameters were estimated in the process
     ## Residuals standard deviation: 2095.988
    -## Cost function type: MSE; Cost function value: 4393167.585
    +## Loss function type: MSE; Loss function value: 4393167.585
     ## 
     ## Information criteria:
     ##      AIC     AICc      BIC     BICc 
    @@ -328,17 +328,18 @@ 

    2019-04-25

    It appears that SMA(13) is the optimal model for this time series, which is not obvious. Note also that the forecast trajectory of SMA(13) is not just a straight line. This is because the actual values are used in construction of point forecasts up to h=13.

    If we try selecting SMA order for data without substantial trend, then we will end up with some other order. For example, let’s consider a seasonal time series N2568:

-## Time elapsed: 0.22 seconds
+## Time elapsed: 0.31 seconds
     ## Model estimated: SMA(12)
     ## Initial values were produced using backcasting.
     ## 2 parameters were estimated in the process
     ## Residuals standard deviation: 1847.527
    -## Cost function type: MSE; Cost function value: 3413356.678
    +## Loss function type: MSE; Loss function value: 3413356.678
     ## 
     ## Information criteria:
     ##      AIC     AICc      BIC     BICc 
    -## 2078.206 2078.312 2083.713 2083.965
+## 2078.206 2078.312 2083.713 2083.965 
+## 95% parametric prediction interval were constructed

Here we end up with SMA(12). Note that the order of the moving average corresponds to the seasonal frequency, which is usually the first step in classical time series decomposition. However, we do not deal with a centred moving average here but with a simple one, so decomposition should not be done based on this model.

    diff --git a/inst/doc/smooth.Rmd b/inst/doc/smooth.Rmd index 1ea0542..430a257 100644 --- a/inst/doc/smooth.Rmd +++ b/inst/doc/smooth.Rmd @@ -26,10 +26,8 @@ The package includes the following functions: 6. [sma() - Simple Moving Average in state-space form](sma.html); 7. [Simulate functions of the package](simulate.html). 8. `smoothCombine()` - function that combines forecasts of the main univariate functions of smooth package. -9. `oes()` -- function that estimates probability of occurrence of variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by `oes()` can then be used in any forecasting function as input variable for `occurrence` parameter. This is the new function introduced in smooth v2.5.0, substituting the old `iss()` function. +9. [oes() - Occurrence part of iETS model](oes.html) -- function that estimates probability of occurrence of variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by `oes()` can then be used in any forecasting function as input variable for `occurrence` parameter. This is the new function introduced in smooth v2.5.0, substituting the old `iss()` function. There is also vector counterpart of this function called `viss()` which implements multivariate fixed and logistic probabilities. -11. `xregExpander()` -- function that creates lags and leads of the provided exogenous variables (either vector or matrix) and forecasts the missing values. This thing returns the matrix. -12. `stepwise()` -- the function that implements stepwise based on information criteria and partial correlations. Easier to use and works faster than `step()` from `stats` package. The functions (1) - (4) and (6) return object of class `smooth`, (5) returns the object of class `vsmooth`, (7) returns `smooth.sim` class and finally (8) returns `oes` or `viss` (depending on the function used). There are several methods for these classes in the package. @@ -45,7 +43,7 @@ There are several functions that can be used together with the forecasting funct 6. `plot(ourModel)` -- plots states of constructed model. If number of states is higher than 10, then several graphs are produced. 7. `simulate(ourModel)` -- produces data simulated from provided model; 8. `summary(forecast(ourModel))` -- prints point and interval forecasts; -9. `plot(forecast(ourModel))` -- produces graph with actuals, forecast, fitted and intervals using `graphmaker()` function. +9. `plot(forecast(ourModel))` -- produces graph with actuals, forecast, fitted and prediction interval using `graphmaker()` function from `greybox` package. 10. `logLik(ourModel)` -- returns log-likelihood of the model; 11. `nobs(ourModel)` -- returns number of observations in-sample we had; 12. `nParam(ourModel)` -- number of estimated parameters (originally from `greybox` package); diff --git a/inst/doc/smooth.html b/inst/doc/smooth.html index 4aabff9..b03541d 100644 --- a/inst/doc/smooth.html +++ b/inst/doc/smooth.html @@ -12,7 +12,7 @@ - + smooth: forecasting using state-space models @@ -217,7 +217,7 @@

    smooth: forecasting using state-space models

    Ivan Svetunkov

-2019-04-25
+2019-06-13

    @@ -232,9 +232,7 @@

    2019-04-25

  • sma() - Simple Moving Average in state-space form;
  • Simulate functions of the package.
  • smoothCombine() - function that combines forecasts of the main univariate functions of smooth package.
-• oes() – function that estimates probability of occurrence of variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by oes() can then be used in any forecasting function as input variable for occurrence parameter. This is the new function introduced in smooth v2.5.0, substituting the old iss() function. There is also vector counterpart of this function called viss() which implements multivariate fixed and logistic probabilities.
-• xregExpander() – function that creates lags and leads of the provided exogenous variables (either vector or matrix) and forecasts the missing values. This thing returns the matrix.
-• stepwise() – the function that implements stepwise based on information criteria and partial correlations. Easier to use and works faster than step() from stats package.
+• oes() - Occurrence part of iETS model – function that estimates probability of occurrence of variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by oes() can then be used in any forecasting function as input variable for occurrence parameter. This is the new function introduced in smooth v2.5.0, substituting the old iss() function. There is also vector counterpart of this function called viss() which implements multivariate fixed and logistic probabilities.
  • The functions (1) - (4) and (6) return object of class smooth, (5) returns the object of class vsmooth, (7) returns smooth.sim class and finally (8) returns oes or viss (depending on the function used). There are several methods for these classes in the package.

    @@ -249,7 +247,7 @@

    Methods for the class smooth

  • plot(ourModel) – plots states of constructed model. If number of states is higher than 10, then several graphs are produced.
  • simulate(ourModel) – produces data simulated from provided model;
  • summary(forecast(ourModel)) – prints point and interval forecasts;
-• plot(forecast(ourModel)) – produces graph with actuals, forecast, fitted and intervals using graphmaker() function.
+• plot(forecast(ourModel)) – produces graph with actuals, forecast, fitted and prediction interval using graphmaker() function from greybox package.
  • logLik(ourModel) – returns log-likelihood of the model;
  • nobs(ourModel) – returns number of observations in-sample we had;
  • nParam(ourModel) – number of estimated parameters (originally from greybox package);
  • diff --git a/inst/doc/ssarima.R b/inst/doc/ssarima.R index a6c3758..19df917 100644 --- a/inst/doc/ssarima.R +++ b/inst/doc/ssarima.R @@ -29,8 +29,8 @@ x <- cbind(rnorm(length(M3$N2457$x),50,3),rnorm(length(M3$N2457$x),100,7)) ourModel <- auto.ssarima(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE) ## ----auto_ssarima_N2457_xreg_update-------------------------------------- -ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, intervals=TRUE) +ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, interval=TRUE) ## ----auto_ssarima_N2457_combination-------------------------------------- -ssarima(M3$N2457$x, h=18, holdout=FALSE, intervals=TRUE, combine=TRUE) +ssarima(M3$N2457$x, h=18, holdout=FALSE, interval=TRUE, combine=TRUE) diff --git a/inst/doc/ssarima.Rmd b/inst/doc/ssarima.Rmd index e0f06f8..b24f73a 100644 --- a/inst/doc/ssarima.Rmd +++ b/inst/doc/ssarima.Rmd @@ -70,12 +70,12 @@ ourModel <- auto.ssarima(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE) we can then reuse it: ```{r auto_ssarima_N2457_xreg_update} -ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, intervals=TRUE) +ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, interval=TRUE) ``` Finally, we can combine several SARIMA models: ```{r auto_ssarima_N2457_combination} -ssarima(M3$N2457$x, h=18, holdout=FALSE, intervals=TRUE, combine=TRUE) +ssarima(M3$N2457$x, h=18, holdout=FALSE, interval=TRUE, combine=TRUE) ``` diff --git a/inst/doc/ssarima.html b/inst/doc/ssarima.html index cfdc666..99d3d89 100644 --- a/inst/doc/ssarima.html +++ b/inst/doc/ssarima.html @@ -12,7 +12,7 @@ - + ssarima() - State-Space ARIMA @@ -303,7 +303,7 @@

    ssarima() - State-Space ARIMA

    Ivan Svetunkov

-2019-04-25
+2019-06-13

    @@ -314,7 +314,7 @@

    2019-04-25

    require(Mcomp)

    The default call constructs ARIMA(0,1,1):
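That is (a sketch of the call behind the output below):

```r
ssarima(M3$N2457$x, h=18, silent=FALSE)
```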

-## Time elapsed: 0.02 seconds
+## Time elapsed: 0.03 seconds
     ## Model estimated: ARIMA(0,1,1)
     ## Matrix of MA terms:
     ##        Lag 1
    @@ -322,7 +322,7 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 2 parameters were estimated in the process
## Residuals standard deviation: 2097.877
-## Cost function type: MSE; Cost function value: 4401089.934
+## Loss function type: MSE; Loss function value: 4401089.934
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
@@ -330,7 +330,7 @@

    2019-04-25

    Some more complicated model can be defined using parameter orders the following way:
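For example, the SARIMA(0,1,1)(1,0,1)_12 reported below can be requested as (a sketch):

```r
ssarima(M3$N2457$x, orders=list(ar=c(0,1), i=c(1,0), ma=c(1,1)),
        lags=c(1,12), h=18)
```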

-## Time elapsed: 0.11 seconds
+## Time elapsed: 0.15 seconds
     ## Model estimated: SARIMA(0,1,1)[1](1,0,1)[12]
     ## Matrix of AR terms:
     ##       Lag 12
    @@ -341,7 +341,7 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 4 parameters were estimated in the process
## Residuals standard deviation: 1902.156
-## Cost function type: MSE; Cost function value: 3618199.058
+## Loss function type: MSE; Loss function value: 3618199.058
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
@@ -349,7 +349,7 @@

    2019-04-25

This constructs the seasonal ARIMA(0,1,1)(1,0,1)\(_{12}\).

    We could try selecting orders manually, but this can also be done automatically via auto.ssarima() function:
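A sketch of such a call:

```r
auto.ssarima(M3$N2457$x, h=18)
```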

-## Time elapsed: 1.98 seconds
+## Time elapsed: 4.01 seconds
     ## Model estimated: SARIMA(0,1,2)[1](0,0,3)[12]
     ## Matrix of MA terms:
     ##        Lag 1 Lag 12
    @@ -359,14 +359,14 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 6 parameters were estimated in the process
## Residuals standard deviation: 1784.234
-## Cost function type: MSE; Cost function value: 3183490.065
+## Loss function type: MSE; Loss function value: 3183490.065
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 2060.307 2061.085 2076.777 2078.622

Automatic order selection in SSARIMA with optimised initials does not work well and in general is not recommended. This is partially because of the possibly high number of parameters in some models and partially because of potential overfitting of the first observations when a non-zero AR order is selected. This problem can be seen on the example of another time series (which has complicated seasonality):

-## Time elapsed: 3.3 seconds
+## Time elapsed: 8.33 seconds
     ## Model estimated: SARIMA(0,0,3)[1](3,1,3)[12]
     ## Matrix of AR terms:
     ##       Lag 12
    @@ -381,13 +381,13 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 10 parameters were estimated in the process
## Residuals standard deviation: 281.532
-## Cost function type: MSE; Cost function value: 79260.469
+## Loss function type: MSE; Loss function value: 79260.469
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 1544.784 1547.052 1571.605 1576.915
-## Time elapsed: 4.72 seconds
+## Time elapsed: 11.83 seconds
     ## Model estimated: ARIMA(0,0,3) with constant
     ## Matrix of MA terms:
     ##       Lag 1
    @@ -398,7 +398,7 @@ 

    2019-04-25

## Initial values were optimised.
## 8 parameters were estimated in the process
## Residuals standard deviation: 411.517
-## Cost function type: MSE; Cost function value: 169345.889
+## Loss function type: MSE; Loss function value: 169345.889
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
@@ -406,7 +406,7 @@

    2019-04-25

As can be seen from the second graph, ssarima with optimised initials does not select a seasonal model and reverts to ARIMA(0,0,3) with constant. In theory this could be due to the implemented order selection algorithm; however, if we estimate all the models in the pool separately, we will see that this model is optimal for this time series when this type of initials is used.

A strength of the ssarima() function is that it can estimate SARIMA models with multiple seasonalities. For example, a SARIMA(0,1,1)(0,0,1)_6(1,0,1)_12 model can be estimated the following way:
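A sketch of such a call (orders listed per lag, from lag 1 to lag 12):

```r
ssarima(M3$N2457$x, orders=list(ar=c(0,0,1), i=c(1,0,0), ma=c(1,1,1)),
        lags=c(1,6,12), h=18)
```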

-## Time elapsed: 0.16 seconds
+## Time elapsed: 0.4 seconds
     ## Model estimated: SARIMA(0,1,1)[1](0,0,1)[6](1,0,1)[12]
     ## Matrix of AR terms:
     ##       Lag 12
    @@ -417,7 +417,7 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 5 parameters were estimated in the process
## Residuals standard deviation: 1857.086
-## Cost function type: MSE; Cost function value: 3448766.732
+## Loss function type: MSE; Loss function value: 3448766.732
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
@@ -428,29 +428,29 @@

    2019-04-25

If we save the model:

    we can then reuse it:

-## Time elapsed: 0.34 seconds
-## Model estimated: SARIMAX(0,0,1)[1](0,1,3)[12] with drift
+## Time elapsed: 0.8 seconds
    +## Model estimated: SARIMAX(0,0,2)[1](0,0,3)[12] with constant
     ## Matrix of MA terms:
     ##       Lag 1 Lag 12
    -## MA(1) 0.112  0.082
    -## MA(2) 0.000  0.115
    -## MA(3) 0.000  0.088
    -## Constant value is: 65.664
    +## MA(1) 0.237 -0.143
    +## MA(2) 0.135  0.275
    +## MA(3) 0.000  0.221
    +## Constant value is: -719.789
     ## Initial values were provided by user.
     ## 1 parameter was estimated in the process
    -## 50 parameters were provided
    -## Residuals standard deviation: 2052.711
    +## 52 parameters were provided
    +## Residuals standard deviation: 2605.3
     ## Xreg coefficients were estimated in a crazy style
    -## Cost function type: MSE; Cost function value: 4213621.558
    +## Loss function type: MSE; Loss function value: 6787587.416
     ## 
     ## Information criteria:
     ##      AIC     AICc      BIC     BICc 
    -## 2082.547 2082.582 2085.292 2085.376 
    -## 95% parametric prediction intervals were constructed
+## 2137.376 2137.411 2140.120 2140.204 
+## 95% parametric prediction interval were constructed

    Finally, we can combine several SARIMA models:

-## Time elapsed: 0.01 seconds
+## Time elapsed: 0.03 seconds
     ## Model estimated: ARIMA(0,1,1)
     ## Matrix of MA terms:
     ##        Lag 1
    @@ -458,12 +458,12 @@ 

    2019-04-25

## Initial values were produced using backcasting.
## 2 parameters were estimated in the process
## Residuals standard deviation: 2097.877
-## Cost function type: MSE; Cost function value: 4401089.934
+## Loss function type: MSE; Loss function value: 4401089.934
## 
## Information criteria:
##      AIC     AICc      BIC     BICc 
## 2089.553 2089.660 2095.042 2095.297 
-## 95% parametric prediction intervals were constructed
    +## 95% parametric prediction interval were constructed

    References

    diff --git a/inst/doc/ves.Rmd b/inst/doc/ves.Rmd index 0650ce5..0949238 100644 --- a/inst/doc/ves.Rmd +++ b/inst/doc/ves.Rmd @@ -57,6 +57,6 @@ Number of estimated parameters in the model can be extracted via `nParam()` meth AICc and BICc for the vector models are calculated as proposed in [@Bedrick1994] and [@Tremblay2004]. -Currently we don't do model selection, don't have exogenous variables and don't produce conditional prediction intervals. But at least it works and allows you to play around with it :). +Currently we don't do model selection, don't have exogenous variables and don't produce conditional prediction interval. But at least it works and allows you to play around with it :). ### References diff --git a/inst/doc/ves.html b/inst/doc/ves.html index 4d3b8ff..cc7b49a 100644 --- a/inst/doc/ves.html +++ b/inst/doc/ves.html @@ -12,7 +12,7 @@ - + ves() - Vector Exponential Smoothing @@ -303,7 +303,7 @@

    ves() - Vector Exponential Smoothing

    Ivan Svetunkov

-2019-04-25
+2019-06-13

    @@ -315,10 +315,10 @@

    2019-04-25

    ves() function allows constructing Vector Exponential Smoothing (aka “VISTS” discussed by Silva, Hyndman, and Snyder 2010) in either pure additive or pure multiplicative form. The function has several elements that can either be individual or grouped. The former means that all the time series use the same value. For example, persistence="g" means that the smoothing parameters for all the series are the same. A simple call for ves() results in estimation of VES(A,N,N) with grouped smoothing parameters, transition matrix and individual initials:
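A sketch of such a call (`Y` is a hypothetical two-column ts matrix):

```r
# VES(A,N,N) with grouped smoothing parameters by default
ves(Y, model="ANN", h=10, holdout=TRUE)
```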

-## Time elapsed: 0.04 seconds
+## Time elapsed: 0.07 seconds
     ## Model estimated: VES(ANN)
     ## 6 parameters were estimated for 2 time series in the process
    -## Cost function type: likelihood; Cost function value: 13.717
    +## Loss function type: likelihood; Loss function value: 13.717
     ## 
     ## Information criteria:
     ##      AIC     AICc      BIC     BICc 
    @@ -340,7 +340,7 @@ 

    2019-04-25

    Number of estimated parameters in the model can be extracted via nParam() method. However, when it comes to the calculation of the number of degrees of freedom in the model, this value is divided by the number of series (Lütkepohl 2005). So both ourModel$Sigma and all the information criteria rely on the \(df = T - k_m\), where \(T\) is the number of observations and \(k_m = \frac{k}{m}\) is the number of parameters \(k\) per series (\(m\) is the number of series).

    AICc and BICc for the vector models are calculated as proposed in (Bedrick and Tsai 1994) and (Tremblay and Wallach 2004).

-Currently we don’t do model selection, don’t have exogenous variables and don’t produce conditional prediction intervals. But at least it works and allows you to play around with it :).
+Currently we don’t do model selection, don’t have exogenous variables and don’t produce conditional prediction interval. But at least it works and allows you to play around with it :).

    References

    diff --git a/man/auto.ces.Rd b/man/auto.ces.Rd index f357a2c..e57a9ab 100644 --- a/man/auto.ces.Rd +++ b/man/auto.ces.Rd @@ -4,11 +4,11 @@ \alias{auto.ces} \title{Complex Exponential Smoothing Auto} \usage{ -auto.ces(data, models = c("none", "simple", "full"), +auto.ces(y, models = c("none", "simple", "full"), initial = c("optimal", "backcasting"), ic = c("AICc", "AIC", "BIC", - "BICc"), cfType = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE", + "BICc"), loss = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE, cumulative = FALSE, - intervals = c("none", "parametric", "semiparametric", "nonparametric"), + interval = c("none", "parametric", "semiparametric", "nonparametric"), level = 0.95, occurrence = c("none", "auto", "fixed", "general", "odds-ratio", "inverse-odds-ratio", "direct"), oesmodel = "MNN", bounds = c("admissible", "none"), silent = c("all", "graph", @@ -17,7 +17,7 @@ auto.ces(data, models = c("none", "simple", "full"), transitionX = NULL, ...) } \arguments{ -\item{data}{Vector or ts object, containing data needed to be forecasted.} +\item{y}{Vector or ts object, containing data needed to be forecasted.} \item{models}{The vector containing several types of seasonality that should be used in CES selection. See \link[smooth]{ces} for more details about the @@ -30,12 +30,12 @@ produced using backcasting procedure.} \item{ic}{The information criterion used in the model selection procedure.} -\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can +\item{loss}{The type of Loss Function used in optimization. \code{loss} can be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error), \code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error, \code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error. -If \code{cfType!="MSE"}, then likelihood and model selection is done based +If \code{loss!="MSE"}, then likelihood and model selection is done based on equivalent \code{MSE}. Model selection in this cases becomes not optimal. There are also available analytical approximations for multistep functions: @@ -52,28 +52,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE}, the end of the data.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} -\item{intervals}{Type of intervals to construct. This can be: +\item{interval}{Type of interval to construct. This can be: \itemize{ \item \code{none}, aka \code{n} - do not produce prediction -intervals. +interval. \item \code{parametric}, \code{p} - use state-space structure of ETS. In case of mixed models this is done using simulations, which may take longer time than for the pure additive and pure multiplicative models. -\item \code{semiparametric}, \code{sp} - intervals based on covariance +\item \code{semiparametric}, \code{sp} - interval based on covariance matrix of 1 to h steps ahead errors and assumption of normal / log-normal distribution (depending on error type). -\item \code{nonparametric}, \code{np} - intervals based on values from a +\item \code{nonparametric}, \code{np} - interval based on values from a quantile regression on error matrix (see Taylor and Bunn, 1999). The model used in this process is e[j] = a j^b, where j=1,..,h. 
} The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -parametric intervals are constructed, while the latter is equivalent to +parametric interval are constructed, while the latter is equivalent to \code{none}. -If the forecasts of the models were combined, then the intervals are combined +If the forecasts of the models were combined, then the interval are combined quantile-wise (Lichtendahl et al., 2013).} \item{level}{Confidence level. Defines width of prediction interval.} @@ -155,6 +155,9 @@ state space 2 described in Svetunkov, Kourentzes (2015) with the information potential equal to the approximation error using different types of seasonality and chooses the one with the lowest value of information criterion. + +For some more information about the model and its implementation, see the +vignette: \code{vignette("ces","smooth")} } \examples{ @@ -166,10 +169,9 @@ auto.ces(y,h=20,holdout=FALSE) library("Mcomp") \dontrun{y <- ts(c(M3$N0740$x,M3$N0740$xx),start=start(M3$N0740$x),frequency=frequency(M3$N0740$x)) # Selection between "none" and "full" seasonalities -auto.ces(y,h=8,holdout=TRUE,models=c("n","f"),intervals="p",level=0.8,ic="AIC")} +auto.ces(y,h=8,holdout=TRUE,models=c("n","f"),interval="p",level=0.8,ic="AIC")} -y <- ts(c(M3$N1683$x,M3$N1683$xx),start=start(M3$N1683$x),frequency=frequency(M3$N1683$x)) -ourModel <- auto.ces(y,h=18,holdout=TRUE,intervals="sp") +ourModel <- auto.ces(M3[[1683]],interval="sp") summary(ourModel) forecast(ourModel) diff --git a/man/auto.gum.Rd b/man/auto.gum.Rd index 0c17f00..13861fb 100644 --- a/man/auto.gum.Rd +++ b/man/auto.gum.Rd @@ -4,31 +4,32 @@ \alias{auto.gum} \title{Automatic GUM} \usage{ -auto.gum(data, orderMax = 3, lagMax = frequency(data), type = c("A", - "M", "Z"), initial = c("backcasting", "optimal"), ic = c("AICc", - "AIC", "BIC", "BICc"), cfType = c("MSE", "MAE", "HAM", "MSEh", "TMSE", - "GTMSE", "MSCE"), h = 10, holdout = FALSE, cumulative = FALSE, - intervals = c("none", "parametric", "semiparametric", "nonparametric"), - level = 0.95, occurrence = c("none", "auto", "fixed", "general", - "odds-ratio", "inverse-odds-ratio", "direct"), oesmodel = "MNN", +auto.gum(y, orders = 3, lags = frequency(y), type = c("additive", + "multiplicative", "select"), initial = c("backcasting", "optimal"), + ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM", + "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE, + cumulative = FALSE, interval = c("none", "parametric", + "semiparametric", "nonparametric"), level = 0.95, + occurrence = c("none", "auto", "fixed", "general", "odds-ratio", + "inverse-odds-ratio", "direct"), oesmodel = "MNN", bounds = c("restricted", "admissible", "none"), silent = c("all", "graph", "legend", "output", "none"), xreg = NULL, xregDo = c("use", "select"), initialX = NULL, updateX = FALSE, persistenceX = NULL, transitionX = NULL, ...) } \arguments{ -\item{data}{Vector or ts object, containing data needed to be forecasted.} +\item{y}{Vector or ts object, containing data needed to be forecasted.} -\item{orderMax}{The value of the max order to check. This is the upper bound +\item{orders}{The value of the max order to check. This is the upper bound of orders, but the real orders could be lower than this because of the increasing number of parameters in the models with higher orders.} -\item{lagMax}{The value of the maximum lag to check. This should usually be +\item{lags}{The value of the maximum lag to check. 
This should usually be a maximum frequency of the data.} -\item{type}{Type of model. Can either be \code{"Additive"} or -\code{"Multiplicative"}. The latter means that the GUM is fitted on -log-transformed data. If \code{"Z"}, then this is selected automatically, +\item{type}{Type of model. Can either be \code{"additive"} or +\code{"multiplicative"}. The latter means that the GUM is fitted on +log-transformed data. If \code{"select"}, then this is selected automatically, which may slow down things twice.} \item{initial}{Can be either character or a vector of initial states. If it @@ -38,12 +39,12 @@ produced using backcasting procedure.} \item{ic}{The information criterion used in the model selection procedure.} -\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can +\item{loss}{The type of Loss Function used in optimization. \code{loss} can be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error), \code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error, \code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error. -If \code{cfType!="MSE"}, then likelihood and model selection is done based +If \code{loss!="MSE"}, then likelihood and model selection is done based on equivalent \code{MSE}. Model selection in this cases becomes not optimal. There are also available analytical approximations for multistep functions: @@ -60,28 +61,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE}, the end of the data.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} -\item{intervals}{Type of intervals to construct. This can be: +\item{interval}{Type of interval to construct. This can be: \itemize{ \item \code{none}, aka \code{n} - do not produce prediction -intervals. +interval. \item \code{parametric}, \code{p} - use state-space structure of ETS. In case of mixed models this is done using simulations, which may take longer time than for the pure additive and pure multiplicative models. -\item \code{semiparametric}, \code{sp} - intervals based on covariance +\item \code{semiparametric}, \code{sp} - interval based on covariance matrix of 1 to h steps ahead errors and assumption of normal / log-normal distribution (depending on error type). -\item \code{nonparametric}, \code{np} - intervals based on values from a +\item \code{nonparametric}, \code{np} - interval based on values from a quantile regression on error matrix (see Taylor and Bunn, 1999). The model used in this process is e[j] = a j^b, where j=1,..,h. } The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -parametric intervals are constructed, while the latter is equivalent to +parametric interval are constructed, while the latter is equivalent to \code{none}. -If the forecasts of the models were combined, then the intervals are combined +If the forecasts of the models were combined, then the interval are combined quantile-wise (Lichtendahl et al., 2013).} \item{level}{Confidence level. Defines width of prediction interval.} @@ -164,13 +165,16 @@ The resulting model can be complicated and not straightforward, because GUM allows capturing hidden orders that no ARIMA model can. 
It is advised to use \code{initial="b"}, because optimising GUM of arbitrary order is not a simple task. + +For some more information about the model and its implementation, see the +vignette: \code{vignette("gum","smooth")} } \examples{ x <- rnorm(50,100,3) # The best GUM model for the data -ourModel <- auto.gum(x,orderMax=2,lagMax=4,h=18,holdout=TRUE,intervals="np") +ourModel <- auto.gum(x,orders=2,lags=4,h=18,holdout=TRUE,interval="np") summary(ourModel) forecast(ourModel) diff --git a/man/auto.msarima.Rd b/man/auto.msarima.Rd index 8f48ca6..58cb2c9 100644 --- a/man/auto.msarima.Rd +++ b/man/auto.msarima.Rd @@ -4,13 +4,13 @@ \alias{auto.msarima} \title{Automatic Multiple Seasonal ARIMA} \usage{ -auto.msarima(data, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3, - 3)), lags = c(1, frequency(data)), combine = FALSE, - workFast = TRUE, constant = NULL, initial = c("backcasting", - "optimal"), ic = c("AICc", "AIC", "BIC", "BICc"), cfType = c("MSE", - "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, - holdout = FALSE, cumulative = FALSE, intervals = c("none", - "parametric", "semiparametric", "nonparametric"), level = 0.95, +auto.msarima(y, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3, 3)), + lags = c(1, frequency(y)), combine = FALSE, fast = TRUE, + constant = NULL, initial = c("backcasting", "optimal"), + ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM", + "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE, + cumulative = FALSE, interval = c("none", "parametric", + "semiparametric", "nonparametric"), level = 0.95, occurrence = c("none", "auto", "fixed", "general", "odds-ratio", "inverse-odds-ratio", "direct"), oesmodel = "MNN", bounds = c("admissible", "none"), silent = c("all", "graph", @@ -19,7 +19,7 @@ auto.msarima(data, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3, transitionX = NULL, ...) } \arguments{ -\item{data}{Vector or ts object, containing data needed to be forecasted.} +\item{y}{Vector or ts object, containing data needed to be forecasted.} \item{orders}{List of maximum orders to check, containing vector variables \code{ar}, \code{i} and \code{ma}. If a variable is not provided in the @@ -33,7 +33,7 @@ is no restrictions on the length of \code{lags} vector.} \item{combine}{If \code{TRUE}, then resulting ARIMA is combined using AIC weights.} -\item{workFast}{If \code{TRUE}, then some of the orders of ARIMA are +\item{fast}{If \code{TRUE}, then some of the orders of ARIMA are skipped. This is not advised for models with \code{lags} greater than 12.} \item{constant}{If \code{NULL}, then the function will check if constant is @@ -47,12 +47,12 @@ produced using backcasting procedure.} \item{ic}{The information criterion used in the model selection procedure.} -\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can +\item{loss}{The type of Loss Function used in optimization. \code{loss} can be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error), \code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error, \code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error. -If \code{cfType!="MSE"}, then likelihood and model selection is done based +If \code{loss!="MSE"}, then likelihood and model selection is done based on equivalent \code{MSE}. Model selection in this cases becomes not optimal. 
There are also available analytical approximations for multistep functions: @@ -69,28 +69,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE}, the end of the data.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} -\item{intervals}{Type of intervals to construct. This can be: +\item{interval}{Type of interval to construct. This can be: \itemize{ \item \code{none}, aka \code{n} - do not produce prediction -intervals. +interval. \item \code{parametric}, \code{p} - use state-space structure of ETS. In case of mixed models this is done using simulations, which may take longer time than for the pure additive and pure multiplicative models. -\item \code{semiparametric}, \code{sp} - intervals based on covariance +\item \code{semiparametric}, \code{sp} - interval based on covariance matrix of 1 to h steps ahead errors and assumption of normal / log-normal distribution (depending on error type). -\item \code{nonparametric}, \code{np} - intervals based on values from a +\item \code{nonparametric}, \code{np} - interval based on values from a quantile regression on error matrix (see Taylor and Bunn, 1999). The model used in this process is e[j] = a j^b, where j=1,..,h. } The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -parametric intervals are constructed, while the latter is equivalent to +parametric interval are constructed, while the latter is equivalent to \code{none}. -If the forecasts of the models were combined, then the intervals are combined +If the forecasts of the models were combined, then the interval are combined quantile-wise (Lichtendahl et al., 2013).} \item{level}{Confidence level. Defines width of prediction interval.} @@ -182,6 +182,9 @@ Due to the flexibility of the model, multiple seasonalities can be used. For example, something crazy like this can be constructed: SARIMA(1,1,1)(0,1,1)[24](2,0,1)[24*7](0,0,1)[24*30], but the estimation may take some time... 
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ssarima","smooth")}
}
\examples{
@@ -189,7 +192,7 @@ x <- rnorm(118,100,3)
# The best ARIMA for the data
ourModel <- auto.msarima(x,orders=list(ar=c(2,1),i=c(1,1),ma=c(2,1)),lags=c(1,12),
-                   h=18,holdout=TRUE,intervals="np")
+                   h=18,holdout=TRUE,interval="np")
# The other one using optimised states
\dontrun{auto.msarima(x,orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2)),lags=c(1,12),
diff --git a/man/auto.ssarima.Rd b/man/auto.ssarima.Rd
index ff637bc..8f103ec 100644
--- a/man/auto.ssarima.Rd
+++ b/man/auto.ssarima.Rd
@@ -4,13 +4,13 @@
\alias{auto.ssarima}
\title{State Space ARIMA}
\usage{
-auto.ssarima(data, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3,
-  3)), lags = c(1, frequency(data)), combine = FALSE,
-  workFast = TRUE, constant = NULL, initial = c("backcasting",
-  "optimal"), ic = c("AICc", "AIC", "BIC", "BICc"), cfType = c("MSE",
-  "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10,
-  holdout = FALSE, cumulative = FALSE, intervals = c("none",
-  "parametric", "semiparametric", "nonparametric"), level = 0.95,
+auto.ssarima(y, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3, 3)),
+  lags = c(1, frequency(y)), combine = FALSE, fast = TRUE,
+  constant = NULL, initial = c("backcasting", "optimal"),
+  ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM",
+  "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
+  cumulative = FALSE, interval = c("none", "parametric",
+  "semiparametric", "nonparametric"), level = 0.95,
  occurrence = c("none", "auto", "fixed", "general", "odds-ratio",
  "inverse-odds-ratio", "direct"), oesmodel = "MNN",
  bounds = c("admissible", "none"), silent = c("all", "graph",
@@ -19,7 +19,7 @@ auto.ssarima(data, orders = list(ar = c(3, 3), i = c(2, 1), ma = c(3,
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing data needed to be forecasted.}
\item{orders}{List of maximum orders to check, containing vector variables
\code{ar}, \code{i} and \code{ma}. If a variable is not provided in the
@@ -33,7 +33,7 @@ is no restrictions on the length of \code{lags} vector.}
\item{combine}{If \code{TRUE}, then resulting ARIMA is combined using AIC
weights.}
-\item{workFast}{If \code{TRUE}, then some of the orders of ARIMA are
+\item{fast}{If \code{TRUE}, then some of the orders of ARIMA are
skipped. This is not advised for models with \code{lags} greater than 12.}
\item{constant}{If \code{NULL}, then the function will check if constant is
@@ -47,12 +47,12 @@ produced using backcasting procedure.}
\item{ic}{The information criterion used in the model selection procedure.}
-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based
+If \code{loss!="MSE"}, then likelihood and model selection is done based
on equivalent \code{MSE}. Model selection in this cases becomes not optimal.
There are also available analytical approximations for multistep functions:
@@ -69,28 +69,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE},
the end of the data.}
\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}
-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).
-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+a parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the intervals are combined
quantile-wise (Lichtendahl et al., 2013).}
\item{level}{Confidence level. Defines width of prediction interval.}
@@ -182,6 +182,9 @@ example, something crazy like this can be constructed:
SARIMA(1,1,1)(0,1,1)[24](2,0,1)[24*7](0,0,1)[24*30], but the estimation may
take a lot of time... It is recommended to use \link[smooth]{auto.msarima} in
cases with more than one seasonality and high frequencies.
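The combination option documented in the arguments above can be sketched as follows; the data is simulated, so the models entering the combination are arbitrary, and only combine=TRUE with AIC weights and the quantile-wise merging of interval bounds are the features being illustrated:

x <- ts(rnorm(118,100,3), frequency=12)
# Combine the candidate ARIMA models using AIC weights instead of
# picking a single one; the "sp" bounds of the combination are then
# merged quantile-wise (Lichtendahl et al., 2013)
ourModel <- auto.ssarima(x, combine=TRUE, h=18, holdout=TRUE, interval="sp")
summary(ourModel)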
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ssarima","smooth")}
}
\examples{
@@ -189,7 +192,7 @@ x <- rnorm(118,100,3)
# The best ARIMA for the data
ourModel <- auto.ssarima(x,orders=list(ar=c(2,1),i=c(1,1),ma=c(2,1)),lags=c(1,12),
-                   h=18,holdout=TRUE,intervals="np")
+                   h=18,holdout=TRUE,interval="np")
# The other one using optimised states
\dontrun{auto.ssarima(x,orders=list(ar=c(3,2),i=c(2,1),ma=c(3,2)),lags=c(1,12),
diff --git a/man/ces.Rd b/man/ces.Rd
index eb9e447..cefe83e 100644
--- a/man/ces.Rd
+++ b/man/ces.Rd
@@ -4,11 +4,11 @@
\alias{ces}
\title{Complex Exponential Smoothing}
\usage{
-ces(data, seasonality = c("none", "simple", "partial", "full"),
+ces(y, seasonality = c("none", "simple", "partial", "full"),
  initial = c("optimal", "backcasting"), A = NULL, B = NULL,
-  ic = c("AICc", "AIC", "BIC", "BICc"), cfType = c("MSE", "MAE", "HAM",
+  ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM",
  "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
-  cumulative = FALSE, intervals = c("none", "parametric",
+  cumulative = FALSE, interval = c("none", "parametric",
  "semiparametric", "nonparametric"), level = 0.95,
  occurrence = c("none", "auto", "fixed", "general", "odds-ratio",
  "inverse-odds-ratio", "direct"), oesmodel = "MNN",
@@ -18,7 +18,7 @@ ces(data, seasonality = c("none", "simple", "partial", "full"),
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing data needed to be forecasted.}
\item{seasonality}{The type of seasonality used in CES. Can be: \code{none}
- No seasonality; \code{simple} - Simple seasonality, using lagged CES
@@ -46,12 +46,12 @@ complex number.}
\item{ic}{The information criterion used in the model selection procedure.}
-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based
+If \code{loss!="MSE"}, then likelihood and model selection is done based
on equivalent \code{MSE}. Model selection in this cases becomes not optimal.
There are also available analytical approximations for multistep functions:
@@ -68,28 +68,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE},
the end of the data.}
\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}
-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).
-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+a parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the intervals are combined
quantile-wise (Lichtendahl et al., 2013).}
\item{level}{Confidence level. Defines width of prediction interval.}
@@ -183,17 +183,17 @@ provided parameters will take this into account.
\item \code{fitted} - the fitted values of CES.
\item \code{forecast} - the point forecast of CES.
\item \code{lower} - the lower bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{upper} - the upper bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{residuals} - the residuals of the estimated model.
\item \code{errors} - The matrix of 1 to h steps ahead errors.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account).
-\item \code{intervals} - type of intervals asked by user.
-\item \code{level} - confidence level for intervals.
+\item \code{interval} - type of interval asked by user.
+\item \code{level} - confidence level for interval.
\item \code{cumulative} - whether the produced forecast was cumulative or not.
-\item \code{actuals} - The data provided in the call of the function.
+\item \code{y} - The data provided in the call of the function.
\item \code{holdout} - the holdout part of the original data.
\item \code{occurrence} - model of the class "oes" if the occurrence model was
estimated. If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -208,8 +208,8 @@ exogenous variables were estimated as well.
\item \code{ICs} - values of information criteria of the model. Includes AIC,
AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the function.
-\item \code{cf} - Cost function value.
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Cost function value.
+\item \code{loss} - Type of loss function used in the estimation.
\item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or when
\code{FI} is not provided at all.
\item \code{accuracy} - vector of accuracy measures for the holdout sample. In
@@ -229,6 +229,9 @@ The function estimates Complex Exponential Smoothing in the state space 2
described in Svetunkov, Kourentzes (2017) with the information potential
equal to the approximation error. The estimation of initial states of xt is
done using backcast.
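Since this release renames several elements of the returned object, a short sketch of how the new names are accessed; the data is random and only the element names, taken from the \value list above, matter here:

ourModel <- ces(rnorm(100,120,5), h=10, interval="p")
ourModel$loss       # the type of loss function used ("MSE" by default)
ourModel$lossValue  # the value of that loss at the optimum
ourModel$interval   # the type of interval requested in the call
str(ourModel$y)     # the original data, stored as $actuals in earlier versions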
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ces","smooth")}
}
\examples{
@@ -237,20 +240,20 @@ ces(y,h=20,holdout=TRUE)
ces(y,h=20,holdout=FALSE)
y <- 500 - c(1:100)*0.5 + rnorm(100,10,3)
-ces(y,h=20,holdout=TRUE,intervals="p",bounds="a")
+ces(y,h=20,holdout=TRUE,interval="p",bounds="a")
library("Mcomp")
y <- ts(c(M3$N0740$x,M3$N0740$xx),start=start(M3$N0740$x),frequency=frequency(M3$N0740$x))
-ces(y,h=8,holdout=TRUE,seasonality="s",intervals="sp",level=0.8)
+ces(y,h=8,holdout=TRUE,seasonality="s",interval="sp",level=0.8)
\dontrun{y <- ts(c(M3$N1683$x,M3$N1683$xx),start=start(M3$N1683$x),frequency=frequency(M3$N1683$x))
-ces(y,h=18,holdout=TRUE,seasonality="s",intervals="sp")
-ces(y,h=18,holdout=TRUE,seasonality="p",intervals="np")
-ces(y,h=18,holdout=TRUE,seasonality="f",intervals="p")}
+ces(y,h=18,holdout=TRUE,seasonality="s",interval="sp")
+ces(y,h=18,holdout=TRUE,seasonality="p",interval="np")
+ces(y,h=18,holdout=TRUE,seasonality="f",interval="p")}
\dontrun{x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58)))
ces(ts(c(M3$N1457$x,M3$N1457$xx),frequency=12),h=18,holdout=TRUE,
-    intervals="np",xreg=x,cfType="TMSE")}
+    interval="np",xreg=x,loss="TMSE")}
# Exogenous variables in CES
\dontrun{x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58)))
diff --git a/man/cma.Rd b/man/cma.Rd
index 5f7ff78..95f899e 100644
--- a/man/cma.Rd
+++ b/man/cma.Rd
@@ -4,10 +4,10 @@
\alias{cma}
\title{Centered Moving Average}
\usage{
-cma(data, order = NULL, silent = TRUE)
+cma(y, order = NULL, silent = TRUE, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be smoothed.}
+\item{y}{Vector or ts object, containing data needed to be smoothed.}
\item{order}{Order of centered moving average. If \code{NULL}, then the
function will try to select order of SMA based on information criteria.
@@ -15,6 +15,8 @@ See \link[smooth]{sma} for details.}
\item{silent}{If \code{TRUE}, then plot is not produced. Otherwise, there
is a plot...}
+
+\item{...}{Nothing. Needed only for the transition to the new names of the variables.}
}
\value{
Object of class "smooth" is returned. It contains the list of the
@@ -32,12 +34,12 @@ provided parameters will take this into account.
\item \code{residuals} - the residuals of the SMA / AR model.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account) of the SMA / AR model.
-\item \code{actuals} - the original data.
+\item \code{y} - the original data.
\item \code{ICs} - values of information criteria from the respective SMA or
AR model. Includes AIC, AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the SMA / AR model.
-\item \code{cf} - Cost function value (for the SMA / AR model).
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Cost function value (for the SMA / AR model).
+\item \code{loss} - Type of loss function used in the estimation.
}
}
\description{
@@ -61,19 +63,19 @@ function!
}
\examples{
-# SMA of specific order
-ourModel <- sma(rnorm(118,100,3),order=12,h=18,holdout=TRUE,intervals="p")
+# CMA of specific order
+ourModel <- cma(rnorm(118,100,3),order=12)
-# SMA of arbitrary order
-ourModel <- sma(rnorm(118,100,3),h=18,holdout=TRUE,intervals="sp")
+# CMA of arbitrary order
+ourModel <- cma(rnorm(118,100,3))
summary(ourModel)
-forecast(ourModel)
-plot(forecast(ourModel))
}
\references{
\itemize{
+\item Svetunkov I.
(2015 - Inf) "smooth" package for R - series of posts about the underlying
+models and how to use them: \url{https://forecasting.svetunkov.ru/en/tag/smooth/}.
\item Svetunkov I. (2017). Statistical models underlying functions of 'smooth'
package for R. Working Paper of Department of Management Science, Lancaster
University 2017:1, 1-52.
@@ -86,8 +88,6 @@ University 2017:1, 1-52.
\author{
Ivan Svetunkov, \email{ivan@svetunkov.ru}
}
-\keyword{ARIMA}
-\keyword{SARIMA}
\keyword{models}
\keyword{nonlinear}
\keyword{regression}
diff --git a/man/es.Rd b/man/es.Rd
index c33f482..fd5b037 100644
--- a/man/es.Rd
+++ b/man/es.Rd
@@ -4,11 +4,11 @@
\alias{es}
\title{Exponential Smoothing in SSOE state space model}
\usage{
-es(data, model = "ZZZ", persistence = NULL, phi = NULL,
+es(y, model = "ZZZ", persistence = NULL, phi = NULL,
  initial = c("optimal", "backcasting"), initialSeason = NULL,
-  ic = c("AICc", "AIC", "BIC", "BICc"), cfType = c("MSE", "MAE", "HAM",
+  ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM",
  "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
-  cumulative = FALSE, intervals = c("none", "parametric",
+  cumulative = FALSE, interval = c("none", "parametric",
  "semiparametric", "nonparametric"), level = 0.95,
  occurrence = c("none", "auto", "fixed", "general", "odds-ratio",
  "inverse-odds-ratio", "direct"), oesmodel = "MNN",
@@ -18,13 +18,16 @@ es(data, model = "ZZZ", persistence = NULL, phi = NULL,
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
-
-\item{model}{The type of ETS model. Can consist of 3 or 4 chars: \code{ANN},
-\code{AAN}, \code{AAdN}, \code{AAA}, \code{AAdA}, \code{MAdM} etc.
-\code{ZZZ} means that the model will be selected based on the chosen
-information criteria type. Models pool can be restricted with additive only
-components. This is done via \code{model="XXX"}. For example, making
+\item{y}{Vector or ts object, containing data needed to be forecasted.}
+
+\item{model}{The type of ETS model. The first letter stands for the type of
+the error term ("A" or "M"), the second (and sometimes the third as well) is for
+the trend ("N", "A", "Ad", "M" or "Md"), and the last one is for the type of
+seasonality ("N", "A" or "M"). So, the function accepts words with 3 or 4
+characters: \code{ANN}, \code{AAN}, \code{AAdN}, \code{AAA}, \code{AAdA},
+\code{MAdM} etc. \code{ZZZ} means that the model will be selected based on the
+chosen information criteria type. The pool of models can be restricted to
+additive components only. This is done via \code{model="XXX"}. For example, making
selection between models with none / additive / damped additive trend component
only (i.e. excluding multiplicative trend) can be done with
\code{model="ZXZ"}. Furthermore, selection between multiplicative models
-If \code{cfType!="MSE"}, then likelihood and model selection is done based +If \code{loss!="MSE"}, then likelihood and model selection is done based on equivalent \code{MSE}. Model selection in this cases becomes not optimal. There are also available analytical approximations for multistep functions: @@ -88,28 +91,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE}, the end of the data.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} -\item{intervals}{Type of intervals to construct. This can be: +\item{interval}{Type of interval to construct. This can be: \itemize{ \item \code{none}, aka \code{n} - do not produce prediction -intervals. +interval. \item \code{parametric}, \code{p} - use state-space structure of ETS. In case of mixed models this is done using simulations, which may take longer time than for the pure additive and pure multiplicative models. -\item \code{semiparametric}, \code{sp} - intervals based on covariance +\item \code{semiparametric}, \code{sp} - interval based on covariance matrix of 1 to h steps ahead errors and assumption of normal / log-normal distribution (depending on error type). -\item \code{nonparametric}, \code{np} - intervals based on values from a +\item \code{nonparametric}, \code{np} - interval based on values from a quantile regression on error matrix (see Taylor and Bunn, 1999). The model used in this process is e[j] = a j^b, where j=1,..,h. } The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -parametric intervals are constructed, while the latter is equivalent to +parametric interval are constructed, while the latter is equivalent to \code{none}. -If the forecasts of the models were combined, then the intervals are combined +If the forecasts of the models were combined, then the interval are combined quantile-wise (Lichtendahl et al., 2013).} \item{level}{Confidence level. Defines width of prediction interval.} @@ -212,9 +215,9 @@ provided parameters will take this into account. \item \code{fitted} - fitted values of ETS. In case of the intermittent model, the fitted are multiplied by the probability of occurrence. \item \code{forecast} - point forecast of ETS. -\item \code{lower} - lower bound of prediction interval. When \code{intervals="none"} +\item \code{lower} - lower bound of prediction interval. When \code{interval="none"} then NA is returned. -\item \code{upper} - higher bound of prediction interval. When \code{intervals="none"} +\item \code{upper} - higher bound of prediction interval. When \code{interval="none"} then NA is returned. \item \code{residuals} - residuals of the estimated model. \item \code{errors} - trace forecast in-sample errors, returned as a matrix. In the @@ -222,10 +225,10 @@ case of trace forecasts this is the matrix used in optimisation. In non-trace es it is returned just for the information. \item \code{s2} - variance of the residuals (taking degrees of freedom into account). This is an unbiased estimate of variance. -\item \code{intervals} - type of intervals asked by user. -\item \code{level} - confidence level for intervals. +\item \code{interval} - type of interval asked by user. +\item \code{level} - confidence level for interval. \item \code{cumulative} - whether the produced forecast was cumulative or not. -\item \code{actuals} - original data. 
+\item \code{y} - original data.
\item \code{holdout} - holdout part of the original data.
\item \code{occurrence} - model of the class "oes" if the occurrence model was
estimated. If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -238,8 +241,8 @@ estimated as well.
\item \code{transitionX} - transition matrix F for exogenous variables.
\item \code{ICs} - values of information criteria of the model. Includes AIC, AICc, BIC and BICc.
\item \code{logLik} - concentrated log-likelihood of the function.
-\item \code{cf} - cost function value.
-\item \code{cfType} - type of cost function used in the estimation.
+\item \code{lossValue} - loss function value.
+\item \code{loss} - type of loss function used in the estimation.
\item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or when
\code{FI} is not provided at all.
\item \code{accuracy} - vector of accuracy measures for the holdout sample. In
@@ -263,15 +266,15 @@ shorter list of values is returned:
\item \code{upper},
\item \code{residuals},
\item \code{s2} - variance of additive error of combined one-step-ahead forecasts,
-\item \code{intervals},
+\item \code{interval},
\item \code{level},
\item \code{cumulative},
-\item \code{actuals},
+\item \code{y},
\item \code{holdout},
\item \code{occurrence},
\item \code{ICs} - combined ic,
\item \code{ICw} - ic weights used in the combination,
-\item \code{cfType},
+\item \code{loss},
\item \code{xreg},
\item \code{accuracy}.
}
@@ -298,14 +301,22 @@ function. \eqn{a_t} is the vector of parameters for exogenous variables,
\code{persistenceX} matrix. Finally, \eqn{\epsilon_{t}} is the error term. For
the details see Hyndman et al.(2008).
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("es","smooth")}.
+
+Also, there are posts about the functions of the package smooth on the
+website of Ivan Svetunkov:
+\url{https://forecasting.svetunkov.ru/en/tag/smooth/} - they explain the
+underlying models and how to use the functions.
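The combined-forecast output described in the shorter \value list above can be inspected along these lines; treat the model code as an assumption, since model="CCC" (the all-models combination of es()) does not itself appear in this patch:

ourModel <- es(rnorm(100,100,5), model="CCC", h=10, interval="p")
ourModel$ICw  # the information criteria weights used in the combination
ourModel$ICs  # the combined information criterion
# the bounds below are combined quantile-wise (Lichtendahl et al., 2013)
cbind(ourModel$lower, ourModel$upper)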
}
\examples{
library(Mcomp)
# See how holdout and trace parameters influence the forecast
-es(M3$N1245$x,model="AAdN",h=8,holdout=FALSE,cfType="MSE")
-\dontrun{es(M3$N2568$x,model="MAM",h=18,holdout=TRUE,cfType="TMSE")}
+es(M3$N1245$x,model="AAdN",h=8,holdout=FALSE,loss="MSE")
+\dontrun{es(M3$N2568$x,model="MAM",h=18,holdout=TRUE,loss="TMSE")}
# Model selection example
es(M3$N1245$x,model="ZZN",ic="AIC",h=8,holdout=FALSE,bounds="a")
@@ -323,16 +334,16 @@ es(M3$N1683$x,"MAdM",h=10,holdout=TRUE)}
# Model selection using a specified pool of models
ourModel <- es(M3$N1587$x,model=c("ANN","AAM","AMdA"),h=18)
-# Redo previous model and produce prediction intervals
-es(M3$N1587$x,model=ourModel,h=18,intervals="p")
+# Redo previous model and produce a prediction interval
+es(M3$N1587$x,model=ourModel,h=18,interval="p")
-# Semiparametric intervals example
-\dontrun{es(M3$N1587$x,h=18,holdout=TRUE,intervals="sp")}
+# Semiparametric interval example
+\dontrun{es(M3$N1587$x,h=18,holdout=TRUE,interval="sp")}
# Exogenous variables in ETS example
\dontrun{x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58)))
y <- ts(c(M3$N1457$x,M3$N1457$xx),frequency=12)
-es(y,h=18,holdout=TRUE,xreg=x,cfType="aTMSE",intervals="np")
+es(y,h=18,holdout=TRUE,xreg=x,loss="aTMSE",interval="np")
ourModel <- es(ts(c(M3$N1457$x,M3$N1457$xx),frequency=12),h=18,holdout=TRUE,xreg=x,updateX=TRUE)}
# This will be the same model as in previous line but estimated on new portion of data
diff --git a/man/forecast.smooth.Rd b/man/forecast.smooth.Rd
index a0fab89..97f076e 100644
--- a/man/forecast.smooth.Rd
+++ b/man/forecast.smooth.Rd
@@ -5,7 +5,7 @@
\alias{forecast}
\title{Forecasting time series using smooth functions}
\usage{
-\method{forecast}{smooth}(object, h = 10, intervals = c("parametric",
+\method{forecast}{smooth}(object, h = 10, interval = c("parametric",
  "semiparametric", "nonparametric", "none"), level = 0.95, ...)
}
\arguments{
@@ -13,7 +13,7 @@
\item{h}{Forecast horizon}
-\item{intervals}{Type of intervals to construct. See \link[smooth]{es} for
+\item{interval}{Type of interval to construct. See \link[smooth]{es} for
details.}
\item{level}{Confidence level. Defines width of prediction interval.}
@@ -28,13 +28,13 @@ Returns object of class "smooth.forecast", which contains:
\item \code{model} - the estimated model (ES / CES / GUM / SSARIMA).
\item \code{method} - the name of the estimated model (ES / CES / GUM / SSARIMA).
\item \code{fitted} - fitted values of the model.
-\item \code{actuals} - actuals provided in the call of the model.
+\item \code{y} - actual values provided in the call of the model.
\item \code{forecast} aka \code{mean} - point forecasts of the model
(conditional mean).
-\item \code{lower} - lower bound of prediction intervals.
-\item \code{upper} - upper bound of prediction intervals.
+\item \code{lower} - lower bound of prediction interval.
+\item \code{upper} - upper bound of prediction interval.
\item \code{level} - confidence level.
-\item \code{intervals} - binary variable (whether intervals were produced or not).
+\item \code{interval} - binary variable (whether an interval was produced or not).
\item \code{residuals} - the residuals of the original model.
}
}
@@ -53,8 +53,8 @@ function, then go ahead!
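Because forecast.smooth() is the method behind the forecast() generic imported from the forecast package, the examples below can equally be written through the generic; a brief sketch using the renamed fields from the \value list above:

ourModel <- ces(rnorm(100,0,1), h=10)
ourForecast <- forecast(ourModel, h=10, interval="semiparametric", level=0.95)
ourForecast$mean   # the point forecasts, also available as $forecast
ourForecast$lower  # the lower bound of the prediction interval
ourForecast$upper  # the upper bound of the prediction interval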
ourModel <- ces(rnorm(100,0,1),h=10)
forecast.smooth(ourModel,h=10)
-forecast.smooth(ourModel,h=10,intervals=TRUE)
-plot(forecast.smooth(ourModel,h=10,intervals=TRUE))
+forecast.smooth(ourModel,h=10,interval=TRUE)
+plot(forecast.smooth(ourModel,h=10,interval=TRUE))
}
\references{
diff --git a/man/gsi.Rd b/man/gsi.Rd
index 968c5ae..3be647c 100644
--- a/man/gsi.Rd
+++ b/man/gsi.Rd
@@ -4,15 +4,15 @@
\alias{gsi}
\title{Vector exponential smoothing model with Group Seasonal Indices}
\usage{
-gsi(data, model = "MNM", weights = 1/ncol(data), type = c(3, 2, 1),
-  cfType = c("likelihood", "diagonal", "trace"), ic = c("AICc", "AIC",
-  "BIC", "BICc"), h = 10, holdout = FALSE, intervals = c("none",
+gsi(y, model = "MNM", weights = 1/ncol(y), type = c(3, 2, 1),
+  loss = c("likelihood", "diagonal", "trace"), ic = c("AICc", "AIC",
+  "BIC", "BICc"), h = 10, holdout = FALSE, interval = c("none",
  "conditional", "unconditional", "independent"), level = 0.95,
  bounds = c("admissible", "usual", "none"), silent = c("all", "graph",
  "output", "none"), ...)
}
\arguments{
-\item{data}{The matrix with data, where series are in columns and
+\item{y}{The matrix with data, where series are in columns and
observations are in rows.}
\item{model}{The type of seasonal ETS model. Currently only "MMM" is available.}
@@ -22,7 +22,7 @@ the number of time series in the model.}
\item{type}{Type of the GSI model. Can be "Model 1", "Model 2" or "Model 3".}
-\item{cfType}{Type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{Type of Loss Function used in optimization. \code{loss} can
be:
\itemize{
\item \code{likelihood} - which assumes the minimisation of the determinant
@@ -42,25 +42,25 @@ is minimised in this case.
\item{holdout}{If \code{TRUE}, holdout sample of size \code{h} is taken from
the end of the data.}
-\item{intervals}{Type of intervals to construct. NOT AVAILABLE YET!
+\item{interval}{Type of interval to construct. NOT AVAILABLE YET!
This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{conditional}, \code{c} - produces multidimensional elliptic
-intervals for each step ahead forecast.
+intervals for each step ahead forecast.
\item \code{unconditional}, \code{u} - produces separate bounds for each series
based on ellipses for each step ahead. These bounds correspond to min and max
values of the ellipse assuming that all the other series but one take values in
the centre of the ellipse. This leads to less accurate estimates of bounds
-(wider intervals than needed), but these could still be useful.
-\item \code{independent}, \code{i} - produces intervals based on variances of
+(wider intervals than needed), but these could still be useful.
+\item \code{independent}, \code{i} - produces intervals based on variances of
each separate series. This does not take vector structure into account.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-conditional intervals are constructed, while the latter is equivalent to
+a conditional interval is constructed, while the latter is equivalent to
\code{none}.}
\item{level}{Confidence level.
Defines width of prediction interval.}
@@ -97,7 +97,7 @@ values:
\item \code{initial} - The initial values of the non-seasonal components;
\item \code{initialSeason} - The initial values of the seasonal components;
\item \code{nParam} - The number of estimated parameters;
-\item \code{actuals} - The matrix with the original data;
+\item \code{y} - The matrix with the original data;
\item \code{fitted} - The matrix of the fitted values;
\item \code{holdout} - The matrix with the holdout values (if \code{holdout=TRUE} in
the estimation);
@@ -105,13 +105,13 @@ the estimation);
\item \code{Sigma} - The covariance matrix of the errors (estimated with the
correction for the number of degrees of freedom);
\item \code{forecast} - The matrix of point forecasts;
-\item \code{PI} - The bounds of the prediction intervals;
-\item \code{intervals} - The type of the constructed prediction intervals;
-\item \code{level} - The level of the confidence for the prediction intervals;
+\item \code{PI} - The bounds of the prediction interval;
+\item \code{interval} - The type of the constructed prediction interval;
+\item \code{level} - The confidence level for the prediction interval;
\item \code{ICs} - The values of the information criteria;
\item \code{logLik} - The log-likelihood function;
-\item \code{cf} - The value of the cost function;
-\item \code{cfType} - The type of the used cost function;
+\item \code{lossValue} - The value of the loss function;
+\item \code{loss} - The type of the loss function used;
\item \code{accuracy} - the values of the error measures. Currently not available.
\item \code{FI} - Fisher information if user asked for it using \code{FI=TRUE}.
}
@@ -125,6 +125,9 @@ model, restricting the seasonal indices. The model is based on \link[smooth]{ves
In case of multiplicative model, instead of the vector y_t we use its logarithms.
As a result the multiplicative model is much easier to work with.
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ves","smooth")}
}
\examples{
diff --git a/man/gum.Rd b/man/gum.Rd
index 4dab6bc..4757ed7 100644
--- a/man/gum.Rd
+++ b/man/gum.Rd
@@ -5,13 +5,13 @@
\alias{ges}
\title{Generalised Univariate Model}
\usage{
-gum(data, orders = c(1, 1), lags = c(1, frequency(data)),
-  type = c("A", "M"), persistence = NULL, transition = NULL,
-  measurement = NULL, initial = c("optimal", "backcasting"),
-  ic = c("AICc", "AIC", "BIC", "BICc"), cfType = c("MSE", "MAE", "HAM",
-  "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
-  cumulative = FALSE, intervals = c("none", "parametric",
-  "semiparametric", "nonparametric"), level = 0.95,
+gum(y, orders = c(1, 1), lags = c(1, frequency(y)),
+  type = c("additive", "multiplicative"), persistence = NULL,
+  transition = NULL, measurement = NULL, initial = c("optimal",
+  "backcasting"), ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE",
+  "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10,
+  holdout = FALSE, cumulative = FALSE, interval = c("none",
+  "parametric", "semiparametric", "nonparametric"), level = 0.95,
  occurrence = c("none", "auto", "fixed", "general", "odds-ratio",
  "inverse-odds-ratio", "direct"), oesmodel = "MNN",
  bounds = c("restricted", "admissible", "none"), silent = c("all",
@@ -22,7 +22,7 @@ gum(data, orders = c(1, 1), lags = c(1, frequency(data)),
ges(...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing data needed to be forecasted.}
\item{orders}{Order of the model. Specified as vector of number of states
with different lags. For example, \code{orders=c(1,1)} means that there are
@@ -56,12 +56,12 @@ produced using backcasting procedure.}
\item{ic}{The information criterion used in the model selection procedure.}
-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based
+If \code{loss!="MSE"}, then likelihood and model selection is done based
on equivalent \code{MSE}. Model selection in this cases becomes not optimal.
There are also available analytical approximations for multistep functions:
@@ -78,28 +78,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE},
the end of the data.}
\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}
-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).
-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+a parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the intervals are combined
quantile-wise (Lichtendahl et al., 2013).}
\item{level}{Confidence level. Defines width of prediction interval.}
@@ -194,17 +194,17 @@ smoothing parameters live.
\item \code{fitted} - fitted values.
\item \code{forecast} - point forecast.
\item \code{lower} - lower bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{upper} - higher bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{residuals} - the residuals of the estimated model.
\item \code{errors} - matrix of 1 to h steps ahead errors.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account).
-\item \code{intervals} - type of intervals asked by user.
-\item \code{level} - confidence level for intervals.
+\item \code{interval} - type of interval asked by user.
+\item \code{level} - confidence level for interval.
\item \code{cumulative} - whether the produced forecast was cumulative or not.
-\item \code{actuals} - original data.
+\item \code{y} - original data.
\item \code{holdout} - holdout part of the original data.
\item \code{occurrence} - model of the class "oes" if the occurrence model was
estimated. If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -218,8 +218,8 @@ were estimated as well.
\item \code{ICs} - values of information criteria of the model. Includes AIC,
AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the function.
-\item \code{cf} - Cost function value.
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Cost function value.
+\item \code{loss} - Type of loss function used in the estimation.
\item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or when
\code{FI} variable is not provided at all.
\item \code{accuracy} - vector of accuracy measures for the holdout sample.
@@ -252,24 +252,27 @@ vector of exogenous parameters. \eqn{w} is the \code{measurement} vector,
vector, \eqn{a_t} is the vector of parameters for exogenous variables,
\eqn{F_{X}} is the \code{transitionX} matrix and \eqn{g_{X}} is the
\code{persistenceX} matrix. Finally, \eqn{\epsilon_{t}} is the error term.
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("gum","smooth")}
}
\examples{
# Something simple:
-gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="a",intervals="p")
+gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="a",interval="p")
# A more complicated model with seasonality
\dontrun{ourModel <- gum(rnorm(118,100,3),orders=c(2,1),lags=c(1,4),h=18,holdout=TRUE)}
-# Redo previous model on a new data and produce prediction intervals
-\dontrun{gum(rnorm(118,100,3),model=ourModel,h=18,intervals="sp")}
+# Redo previous model on new data and produce a prediction interval
+\dontrun{gum(rnorm(118,100,3),model=ourModel,h=18,interval="sp")}
# Produce something crazy with optimal initials (not recommended)
\dontrun{gum(rnorm(118,100,3),orders=c(1,1,1),lags=c(1,3,5),h=18,holdout=TRUE,initial="o")}
-# Simpler model estiamted using trace forecast error cost function and its analytical analogue
-\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",cfType="TMSE")
-gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",cfType="aTMSE")}
+# Simpler model estimated using trace forecast error loss function and its analytical analogue
+\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",loss="TMSE")
+gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",loss="aTMSE")}
# Introduce exogenous variables
\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118))}
@@ -278,8 +281,8 @@ gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,bounds="n",cfType="
\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118),updateX=TRUE)}
# Do the same but now let's shrink parameters...
-\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,xreg=c(1:118),updateX=TRUE,cfType="TMSE")
-ourModel <- gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,cfType="aTMSE")}
+\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,xreg=c(1:118),updateX=TRUE,loss="TMSE")
+ourModel <- gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,loss="aTMSE")}
# Or select the most appropriate one
\dontrun{gum(rnorm(118,100,3),orders=c(1),lags=c(1),h=18,holdout=TRUE,xreg=c(1:118),xregDo="s")
@@ -291,6 +294,8 @@ plot(forecast(ourModel))}
}
\references{
\itemize{
+\item Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying
+models and how to use them: \url{https://forecasting.svetunkov.ru/en/tag/smooth/}.
\item Svetunkov I. (2017). Statistical models underlying functions of 'smooth'
package for R. Working Paper of Department of Management Science, Lancaster
University 2017:1, 1-52.
diff --git a/man/iss.Rd b/man/iss.Rd
index 235b9d2..a135db8 100644
--- a/man/iss.Rd
+++ b/man/iss.Rd
@@ -52,7 +52,7 @@ values:
\item \code{logLik} - likelihood value for the model
\item \code{nParam} - number of parameters used in the model;
\item \code{residuals} - residuals of the model;
-\item \code{actuals} - actual values of probabilities (zeros and ones).
+\item \code{y} - actual values of probabilities (zeros and ones).
\item \code{persistence} - the vector of smoothing parameters;
\item \code{initial} - initial values of the state vector;
\item \code{initialSeason} - the matrix of initials seasonal states;
diff --git a/man/msarima.Rd b/man/msarima.Rd
index 8640773..54ea072 100644
--- a/man/msarima.Rd
+++ b/man/msarima.Rd
@@ -4,12 +4,12 @@
\alias{msarima}
\title{Multiple Seasonal ARIMA}
\usage{
-msarima(data, orders = list(ar = c(0), i = c(1), ma = c(1)),
-  lags = c(1), constant = FALSE, AR = NULL, MA = NULL,
+msarima(y, orders = list(ar = c(0), i = c(1), ma = c(1)), lags = c(1),
+  constant = FALSE, AR = NULL, MA = NULL,
  initial = c("backcasting", "optimal"), ic = c("AICc", "AIC", "BIC",
-  "BICc"), cfType = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
+  "BICc"), loss = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
  "MSCE"), h = 10, holdout = FALSE, cumulative = FALSE,
-  intervals = c("none", "parametric", "semiparametric", "nonparametric"),
+  interval = c("none", "parametric", "semiparametric", "nonparametric"),
  level = 0.95, occurrence = c("none", "auto", "fixed", "general",
  "odds-ratio", "inverse-odds-ratio", "direct"), oesmodel = "MNN",
  bounds = c("admissible", "none"), silent = c("all", "graph",
@@ -18,7 +18,7 @@ msarima(data, orders = list(ar = c(0), i = c(1), ma = c(1)),
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing data needed to be forecasted.}
\item{orders}{List of orders, containing vector variables \code{ar},
\code{i} and \code{ma}. Example:
@@ -56,12 +56,12 @@ produced using backcasting procedure.}
\item{ic}{The information criterion used in the model selection procedure.}
-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based +If \code{loss!="MSE"}, then likelihood and model selection is done based on equivalent \code{MSE}. Model selection in this cases becomes not optimal. There are also available analytical approximations for multistep functions: @@ -78,28 +78,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE}, the end of the data.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} -\item{intervals}{Type of intervals to construct. This can be: +\item{interval}{Type of interval to construct. This can be: \itemize{ \item \code{none}, aka \code{n} - do not produce prediction -intervals. +interval. \item \code{parametric}, \code{p} - use state-space structure of ETS. In case of mixed models this is done using simulations, which may take longer time than for the pure additive and pure multiplicative models. -\item \code{semiparametric}, \code{sp} - intervals based on covariance +\item \code{semiparametric}, \code{sp} - interval based on covariance matrix of 1 to h steps ahead errors and assumption of normal / log-normal distribution (depending on error type). -\item \code{nonparametric}, \code{np} - intervals based on values from a +\item \code{nonparametric}, \code{np} - interval based on values from a quantile regression on error matrix (see Taylor and Bunn, 1999). The model used in this process is e[j] = a j^b, where j=1,..,h. } The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -parametric intervals are constructed, while the latter is equivalent to +parametric interval are constructed, while the latter is equivalent to \code{none}. -If the forecasts of the models were combined, then the intervals are combined +If the forecasts of the models were combined, then the interval are combined quantile-wise (Lichtendahl et al., 2013).} \item{level}{Confidence level. Defines width of prediction interval.} @@ -196,17 +196,17 @@ provided parameters will take this into account. \item \code{fitted} - the fitted values. \item \code{forecast} - the point forecast. \item \code{lower} - the lower bound of prediction interval. When -\code{intervals="none"} then NA is returned. +\code{interval="none"} then NA is returned. \item \code{upper} - the higher bound of prediction interval. When -\code{intervals="none"} then NA is returned. +\code{interval="none"} then NA is returned. \item \code{residuals} - the residuals of the estimated model. \item \code{errors} - The matrix of 1 to h steps ahead errors. \item \code{s2} - variance of the residuals (taking degrees of freedom into account). -\item \code{intervals} - type of intervals asked by user. -\item \code{level} - confidence level for intervals. +\item \code{interval} - type of interval asked by user. +\item \code{level} - confidence level for interval. \item \code{cumulative} - whether the produced forecast was cumulative or not. -\item \code{actuals} - the original data. +\item \code{y} - the original data. \item \code{holdout} - the holdout part of the original data. \item \code{occurrence} - model of the class "oes" if the occurrence model was estimated. If the model is non-intermittent, then occurrence is \code{NULL}. @@ -222,8 +222,8 @@ variables. \item \code{ICs} - values of information criteria of the model. 
Includes AIC, AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the function.
-\item \code{cf} - Cost function value.
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Cost function value.
+\item \code{loss} - Type of loss function used in the estimation.
\item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or when
\code{FI} is not provided at all.
\item \code{accuracy} - vector of accuracy measures for the holdout sample.
@@ -280,11 +280,13 @@ example, something crazy like this can be constructed:
SARIMA(1,1,1)(0,1,1)[24](2,0,1)[24*7](0,0,1)[24*30], but the estimation may
take some time... Still this should be estimated in finite time (not like
with \code{ssarima}).
+
+For some additional details see the vignette: \code{vignette("ssarima","smooth")}
}
\examples{
# The previous one is equivalent to:
-ourModel <- msarima(rnorm(118,100,3),orders=c(1,1,1),lags=1,h=18,holdout=TRUE,intervals="p")
+ourModel <- msarima(rnorm(118,100,3),orders=c(1,1,1),lags=1,h=18,holdout=TRUE,interval="p")
# Example of SARIMA(2,0,0)(1,0,0)[4]
msarima(rnorm(118,100,3),orders=list(ar=c(2,1)),lags=c(1,4),h=18,holdout=TRUE)
@@ -293,7 +295,7 @@ msarima(rnorm(118,100,3),orders=list(ar=c(2,1)),lags=c(1,4),h=18,holdout=TRUE)
ourModel <- msarima(AirPassengers,orders=list(ar=c(1,0,3),i=c(1,0,1),ma=c(0,1,2)),
lags=c(1,6,12),h=10,holdout=TRUE,FI=TRUE)
-# Construct the 95\% confidence intervals for the parameters of the model
+# Construct the 95\% confidence intervals for the parameters of the model
ourCoefs <- coef(ourModel)
ourCoefsSD <- sqrt(abs(diag(solve(ourModel$FI))))
# Sort values accordingly
@@ -304,9 +306,9 @@ colnames(ourConfInt) <- c("2.5\%","97.5\%")
ourConfInt
# ARIMA(1,1,1) with Mean Squared Trace Forecast Error
-msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="TMSE")
+msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="TMSE")
-msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="aTMSE")
+msarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="aTMSE")
# SARIMA(0,1,1) with exogenous variables with crazy estimation of xreg
ourModel <- msarima(rnorm(118,100,3),orders=list(i=1,ma=1),h=18,holdout=TRUE,
diff --git a/man/oes.Rd b/man/oes.Rd
index 8c31242..6b1c363 100644
--- a/man/oes.Rd
+++ b/man/oes.Rd
@@ -4,21 +4,25 @@
\alias{oes}
\title{Occurrence ETS model}
\usage{
-oes(data, model = "MNN", persistence = NULL, initial = "o",
+oes(y, model = "MNN", persistence = NULL, initial = "o",
  initialSeason = NULL, phi = NULL, occurrence = c("fixed", "general",
  "odds-ratio", "inverse-odds-ratio", "direct", "auto", "none"),
  ic = c("AICc", "AIC", "BIC", "BICc"), h = 10, holdout = FALSE,
-  intervals = c("none", "parametric", "semiparametric", "nonparametric"),
+  interval = c("none", "parametric", "semiparametric", "nonparametric"),
  level = 0.95, bounds = c("usual", "admissible", "none"),
  silent = c("all", "graph", "legend", "output", "none"), xreg = NULL,
  xregDo = c("use", "select"), initialX = NULL, updateX = FALSE,
  transitionX = NULL, persistenceX = NULL, ...)
}
\arguments{
-\item{data}{Either numeric vector or time series vector.}
+\item{y}{Either numeric vector or time series vector.}
\item{model}{The type of ETS model used for the estimation. Normally this should
-be \code{"MNN"} or any other pure multiplicative model.}
+be \code{"MNN"} or any other pure multiplicative or additive model.
The model
+selection is available here (although it's not fast), so you can use, for example,
+\code{"YYN"} and \code{"XXN"} for selecting between the pure multiplicative and
+pure additive models respectively. Using mixed models is possible, but not
+recommended.}
\item{persistence}{Persistence vector \eqn{g}, containing smoothing
parameters. If \code{NULL}, then estimated.}
@@ -52,25 +56,25 @@ parameters (initials and smoothing).}
\item{holdout}{If \code{TRUE}, holdout sample of size \code{h} is taken from
the end of the data.}
-\item{intervals}{The type of intervals to construct. This can be:
+\item{interval}{The type of interval to construct. This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).
-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+a parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the intervals are combined
quantile-wise (Lichtendahl et al., 2013).}
\item{level}{The confidence level. Defines width of prediction interval.}
@@ -126,17 +130,18 @@ values:
\itemize{
\item \code{model} - the type of the estimated ETS model;
+\item \code{timeElapsed} - the time elapsed for the construction of the model;
\item \code{fitted} - the fitted values for the probability;
-\item \code{fittedBeta} - the fitted values of the underlying ETS model, where applicable
+\item \code{fittedModel} - the fitted values of the underlying ETS model, where applicable
(only for occurrence=c("o","i","d"));
\item \code{forecast} - the forecast of the probability for \code{h} observations ahead;
-\item \code{forecastBeta} - the forecast of the underlying ETS model, where applicable
+\item \code{forecastModel} - the forecast of the underlying ETS model, where applicable
(only for occurrence=c("o","i","d"));
\item \code{states} - the values of the state vector;
\item \code{logLik} - the log-likelihood value of the model;
\item \code{nParam} - the number of parameters in the model (the matrix is returned);
\item \code{residuals} - the residuals of the model;
-\item \code{actuals} - actual values of occurrence (zeros and ones).
+\item \code{y} - actual values of occurrence (zeros and ones).
\item \code{persistence} - the vector of smoothing parameters;
\item \code{phi} - the value of the damped trend parameter;
\item \code{initial} - initial values of the state vector;
@@ -156,8 +161,10 @@ probability update and model types.
}
\details{
The function estimates probability of demand occurrence, using the selected
-ETS state space models.
Although the function accepts all types of ETS models,
-only the pure multiplicative models make sense.
+ETS state space models.
+
+For the details about the model and its implementation, see the respective
+vignette: \code{vignette("oes","smooth")}
}
\examples{
diff --git a/man/oesg.Rd b/man/oesg.Rd
index 6b5824e..d6f6e6a 100644
--- a/man/oesg.Rd
+++ b/man/oesg.Rd
@@ -4,11 +4,11 @@
\alias{oesg}
\title{Occurrence ETS, general model}
\usage{
-oesg(data, modelA = "MNN", modelB = "MNN", persistenceA = NULL,
+oesg(y, modelA = "MNN", modelB = "MNN", persistenceA = NULL,
  persistenceB = NULL, phiA = NULL, phiB = NULL, initialA = "o",
  initialB = "o", initialSeasonA = NULL, initialSeasonB = NULL,
  ic = c("AICc", "AIC", "BIC", "BICc"), h = 10, holdout = FALSE,
-  intervals = c("none", "parametric", "semiparametric", "nonparametric"),
+  interval = c("none", "parametric", "semiparametric", "nonparametric"),
  level = 0.95, bounds = c("usual", "admissible", "none"),
  silent = c("all", "graph", "legend", "output", "none"), xregA = NULL,
  xregB = NULL, initialXA = NULL, initialXB = NULL,
@@ -18,7 +18,7 @@ oesg(data, modelA = "MNN", modelB = "MNN", persistenceA = NULL,
  ...)
}
\arguments{
-\item{data}{Either numeric vector or time series vector.}
+\item{y}{Either numeric vector or time series vector.}
\item{modelA}{The type of the ETS for the model A.}
@@ -55,23 +55,23 @@ model B. If \code{NULL}, then it is estimated.}
\item{holdout}{If \code{TRUE}, holdout sample of size \code{h} is taken from
the end of the data.}
-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:
\itemize{
\item \code{none}, aka \code{n} - do not produce prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).
-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+a parametric interval is constructed, while the latter is equivalent to
\code{none}.}
\item{level}{Confidence level. Defines width of prediction interval.}
@@ -144,9 +144,9 @@ The object of class "occurrence" is returned. It contains following list of
values:
\itemize{
-\item \code{modelA} - the model A of the class oesg, that contains the output similar
+\item \code{modelA} - the model A of the class oes, that contains the output similar
to the one from the \code{oes()} function;
-\item \code{modelB} - the model B of the class oesg, that contains the output similar
+\item \code{modelB} - the model B of the class oes, that contains the output similar
to the one from the \code{oes()} function.
}
}
@@ -157,8 +157,10 @@ Function returns the general occurrence model of the of iETS model.
The function estimates probability of demand occurrence, based on the iETS_G
state-space model.
It involves the estimation and
modelling of the two simultaneous state space equations. Thus there are two parts for the model type,
-persistence, initials etc. Although the function accepts all types
-of ETS models, only the pure multiplicative models make sense.
+persistence, initials etc.
+
+For the details about the model and its implementation, see the respective
+vignette: \code{vignette("oes","smooth")}

The model is based on:
diff --git a/man/pls.Rd b/man/pls.Rd
index 05534b4..0e5114a 100644
--- a/man/pls.Rd
+++ b/man/pls.Rd
@@ -35,7 +35,7 @@ based on the distribution of 1 to h steps ahead forecast errors is used in the p
# Generate data, apply es() with the holdout parameter and calculate PLS
x <- rnorm(100,0,1)
-ourModel <- es(x, h=10, holdout=TRUE, intervals=TRUE)
+ourModel <- es(x, h=10, holdout=TRUE, interval=TRUE)
pls(ourModel, type="a")
pls(ourModel, type="e")
pls(ourModel, type="s", obs=100, nsim=100)
diff --git a/man/sim.ces.Rd b/man/sim.ces.Rd
index 246edfc..626b888 100644
--- a/man/sim.ces.Rd
+++ b/man/sim.ces.Rd
@@ -82,6 +82,10 @@ a vector or a matrix...
Function generates data using CES with Single Source of Error as a data
generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 120 observations from CES(n). Generate 100 time series of this kind.
diff --git a/man/sim.es.Rd b/man/sim.es.Rd
index f632624..a47f365 100644
--- a/man/sim.es.Rd
+++ b/man/sim.es.Rd
@@ -89,6 +89,10 @@ a vector or a matrix...
Function generates data using ETS with Single Source of Error as a data
generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 40 observations of quarterly data using AAA model with errors from normal distribution
diff --git a/man/sim.gum.Rd b/man/sim.gum.Rd
index 42bd9ab..0e6d890 100644
--- a/man/sim.gum.Rd
+++ b/man/sim.gum.Rd
@@ -83,6 +83,10 @@ a vector or a matrix...
Function generates data using GUM with Single Source of Error as a data
generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 120 observations from GUM(1[1]). Generate 100 time series of this kind.
@@ -98,6 +102,8 @@ simulate(ourModel,nsim=10)
}
\references{
\itemize{
+\item Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying
+models and how to use them: \url{https://forecasting.svetunkov.ru/en/tag/smooth/}.
\item Svetunkov I. (2017). Statistical models underlying functions of 'smooth'
package for R. Working Paper of Department of Management Science, Lancaster
University 2017:1, 1-52.
diff --git a/man/sim.sma.Rd b/man/sim.sma.Rd
index 51ee6f5..c4cfbe9 100644
--- a/man/sim.sma.Rd
+++ b/man/sim.sma.Rd
@@ -59,6 +59,10 @@ a vector or a matrix...
Function generates data using SMA in a Single Source of Error state space
model as a data generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 40 observations of quarterly data using AAA model with errors from normal distribution
diff --git a/man/sim.ssarima.Rd b/man/sim.ssarima.Rd
index fe768c6..4abcbc2 100644
--- a/man/sim.ssarima.Rd
+++ b/man/sim.ssarima.Rd
@@ -94,6 +94,10 @@ either a vector or a matrix...
Function generates data using SSARIMA with Single Source of Error as a data
generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 120 observations from ARIMA(1,1,1) with drift. Generate 100 time series of this kind.
diff --git a/man/sim.ves.Rd b/man/sim.ves.Rd
index 53a25d9..d8f4fc9 100644
--- a/man/sim.ves.Rd
+++ b/man/sim.ves.Rd
@@ -89,6 +89,10 @@ or array, depending on \code{nsim}.
\description{
Function generates data using VES model as a data generating process.
}
+\details{
+For the information about the function, see the vignette:
+\code{vignette("simulate","smooth")}
+}
\examples{
# Create 40 observations of quarterly data using AAA model with errors
diff --git a/man/sma.Rd b/man/sma.Rd
index 3a8d10e..8950099 100644
--- a/man/sma.Rd
+++ b/man/sma.Rd
@@ -4,13 +4,13 @@
\alias{sma}
\title{Simple Moving Average}
\usage{
-sma(data, order = NULL, ic = c("AICc", "AIC", "BIC", "BICc"), h = 10,
-  holdout = FALSE, cumulative = FALSE, intervals = c("none",
+sma(y, order = NULL, ic = c("AICc", "AIC", "BIC", "BICc"), h = 10,
+  holdout = FALSE, cumulative = FALSE, interval = c("none",
  "parametric", "semiparametric", "nonparametric"), level = 0.95,
  silent = c("all", "graph", "legend", "output", "none"), ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing the data to be forecasted.}

\item{order}{Order of simple moving average. If \code{NULL}, then it is
selected automatically using information criteria.}
@@ -23,28 +23,28 @@ selected automatically using information criteria.}
the end of the data.}

\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}

-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:

\itemize{
\item \code{none}, aka \code{n} - do not produce a prediction
-intervals.
+interval.

\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.

-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).

-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+the parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the interval is combined
quantile-wise (Lichtendahl et al., 2013).}

\item{level}{Confidence level. Defines width of prediction interval.}
@@ -86,23 +86,23 @@ provided parameters will take this into account.
\item \code{fitted} - the fitted values.
\item \code{forecast} - the point forecast.
\item \code{lower} - the lower bound of prediction interval.
When
-\code{intervals=FALSE} then NA is returned.
+\code{interval=FALSE} then NA is returned.
\item \code{upper} - the higher bound of prediction interval. When
-\code{intervals=FALSE} then NA is returned.
+\code{interval=FALSE} then NA is returned.
\item \code{residuals} - the residuals of the estimated model.
\item \code{errors} - The matrix of 1 to h steps ahead errors.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account).
-\item \code{intervals} - type of intervals asked by user.
-\item \code{level} - confidence level for intervals.
+\item \code{interval} - the type of interval requested by the user.
+\item \code{level} - the confidence level for the interval.
\item \code{cumulative} - whether the produced forecast was cumulative or not.
-\item \code{actuals} - the original data.
+\item \code{y} - the original data.
\item \code{holdout} - the holdout part of the original data.
\item \code{ICs} - values of information criteria of the model. Includes AIC,
AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the function.
-\item \code{cf} - Cost function value.
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Loss function value.
+\item \code{loss} - Type of loss function used in the estimation.
\item \code{accuracy} - vector of accuracy measures for the holdout sample.
Includes: MPE, MAPE, SMAPE, MASE, sMAE, RelMAE, sMSE and Bias coefficient
(based on complex numbers). This is available only when
@@ -125,14 +125,17 @@ which is an AR(n) process, that can be modelled using:

\eqn{v_{t} = F v_{t-1} + g \epsilon_{t}}

Where \eqn{v_{t}} is a state vector.
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("sma","smooth")}
}
\examples{

# SMA of specific order
-ourModel <- sma(rnorm(118,100,3),order=12,h=18,holdout=TRUE,intervals="p")
+ourModel <- sma(rnorm(118,100,3),order=12,h=18,holdout=TRUE,interval="p")

# SMA of arbitrary order
-ourModel <- sma(rnorm(118,100,3),h=18,holdout=TRUE,intervals="sp")
+ourModel <- sma(rnorm(118,100,3),h=18,holdout=TRUE,interval="sp")

summary(ourModel)
forecast(ourModel)
@@ -141,6 +144,8 @@ plot(forecast(ourModel))
}
\references{
\itemize{
+\item Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying
+models and how to use them: \url{https://forecasting.svetunkov.ru/en/tag/smooth/}.
\item Svetunkov I. (2017). Statistical models underlying functions of 'smooth'
package for R. Working Paper of Department of Management Science, Lancaster
University 2017:1, 1-52.
diff --git a/man/smooth.Rd b/man/smooth.Rd
index 3d7ce7c..943036a 100644
--- a/man/smooth.Rd
+++ b/man/smooth.Rd
@@ -97,6 +97,8 @@ for Time Series Forecasting. Not yet published.
}

\itemize{
+\item Svetunkov I. (2015 - Inf) "smooth" package for R - series of posts about the underlying
+models and how to use them: \url{https://forecasting.svetunkov.ru/en/tag/smooth/}.
\item Svetunkov I. (2017). Statistical models underlying functions of 'smooth'
package for R. Working Paper of Department of Management Science, Lancaster
University 2017:1, 1-52.
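A quick sketch of `sma()` under the renamed interface, mirroring the examples above (the data are simulated, and the element names follow the return values just listed):

```r
library(smooth)

set.seed(42)
y <- rnorm(118, 100, 3)

# SMA of a fixed order with a parametric interval
ourModel <- sma(y, order=12, h=18, holdout=TRUE, interval="p")
# SMA with the order selected via information criteria
autoModel <- sma(y, h=18, holdout=TRUE)

ourModel$interval    # the type of interval used
ourModel$lossValue   # value of the loss function (former 'cf')
ourModel$y           # the original data (former 'actuals')
```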
diff --git a/man/smoothCombine.Rd b/man/smoothCombine.Rd
index f64384f..8444b13 100644
--- a/man/smoothCombine.Rd
+++ b/man/smoothCombine.Rd
@@ -4,21 +4,21 @@
\alias{smoothCombine}
\title{Combination of forecasts of state space models}
\usage{
-smoothCombine(data, models = NULL, initial = c("optimal",
-  "backcasting"), ic = c("AICc", "AIC", "BIC", "BICc"),
-  cfType = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE", "MSCE"),
-  h = 10, holdout = FALSE, cumulative = FALSE,
-  intervals = c("none", "parametric", "semiparametric", "nonparametric"),
-  level = 0.95, bins = 200, intervalsCombine = c("quantile",
-  "probability"), occurrence = c("none", "auto", "fixed", "general",
-  "odds-ratio", "inverse-odds-ratio", "probability"), oesmodel = "MNN",
+smoothCombine(y, models = NULL, initial = c("optimal", "backcasting"),
+  ic = c("AICc", "AIC", "BIC", "BICc"), loss = c("MSE", "MAE", "HAM",
+  "MSEh", "TMSE", "GTMSE", "MSCE"), h = 10, holdout = FALSE,
+  cumulative = FALSE, interval = c("none", "parametric",
+  "semiparametric", "nonparametric"), level = 0.95, bins = 200,
+  intervalCombine = c("quantile", "probability"),
+  occurrence = c("none", "auto", "fixed", "general", "odds-ratio",
+  "inverse-odds-ratio", "probability"), oesmodel = "MNN",
  bounds = c("admissible", "none"), silent = c("all", "graph",
  "legend", "output", "none"), xreg = NULL, xregDo = c("use",
  "select"), initialX = NULL, updateX = FALSE, persistenceX = NULL,
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing the data to be forecasted.}

\item{models}{List of the estimated smooth models to use in the
combination. If \code{NULL}, then all the models are estimated
@@ -30,12 +30,12 @@ initials are produced using backcasting procedure.}

\item{ic}{The information criterion used in the model selection procedure.}

-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based
+If \code{loss!="MSE"}, then likelihood and model selection is done based
on equivalent \code{MSE}. Model selection in this case becomes non-optimal.

There are also analytical approximations available for multistep functions:
@@ -52,38 +52,38 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE},
the end of the data.}

\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}

-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:

\itemize{
\item \code{none}, aka \code{n} - do not produce a prediction
-intervals.
+interval.

\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.
-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).

-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+the parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the interval is combined
quantile-wise (Lichtendahl et al., 2013).}

\item{level}{Confidence level. Defines width of prediction interval.}

-\item{bins}{The number of bins for the prediction intervals.
+\item{bins}{The number of bins for the prediction interval.
A lower value means faster work of the function, but less precise
estimates of the quantiles. This needs to be an even number.}

-\item{intervalsCombine}{How to average the prediction intervals:
+\item{intervalCombine}{How to average the prediction intervals:
quantile-wise (\code{"quantile"}) or probability-wise
(\code{"probability"}).}
@@ -151,19 +151,19 @@ state vector. If \code{NULL}, then estimated. Prerequisite - non-NULL
\item \code{timeElapsed} - time elapsed for the construction of the model.
\item \code{initialType} - type of the initial values used.
\item \code{fitted} - fitted values of ETS.
-\item \code{quantiles} - the 3D array of produced quantiles if \code{intervals!="none"}
+\item \code{quantiles} - the 3D array of produced quantiles if \code{interval!="none"}
with the dimensions: (number of models) x (bins) x (h).
\item \code{forecast} - point forecast of ETS.
-\item \code{lower} - lower bound of prediction interval. When \code{intervals="none"}
+\item \code{lower} - lower bound of prediction interval. When \code{interval="none"}
then NA is returned.
-\item \code{upper} - higher bound of prediction interval. When \code{intervals="none"}
+\item \code{upper} - higher bound of prediction interval. When \code{interval="none"}
then NA is returned.
\item \code{residuals} - residuals of the estimated model.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account).
-\item \code{intervals} - type of intervals asked by user.
-\item \code{level} - confidence level for intervals.
+\item \code{interval} - the type of interval requested by the user.
+\item \code{level} - the confidence level for the interval.
\item \code{cumulative} - whether the produced forecast was cumulative or not.
-\item \code{actuals} - original data.
+\item \code{y} - original data.
\item \code{holdout} - holdout part of the original data.
\item \code{occurrence} - model of the class "oes" if the occurrence model was estimated.
If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -189,17 +189,17 @@ possible because they are all formulated in Single Source of Error framework.
Due to the complexity of some of the models, the estimation process may take
some time. So be patient.
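A hedged sketch of `smoothCombine()` with the renamed arguments (the data are simulated; as the description warns, estimating the whole pool of models is slow):

```r
library(smooth)

set.seed(43)
y <- ts(rnorm(120, 100, 5), frequency=12)

combModel <- smoothCombine(y, h=18, holdout=TRUE, interval="p",
                           intervalCombine="quantile", bins=200, silent="all")

# 3D array of produced quantiles: (number of models) x (bins) x (h)
dim(combModel$quantiles)
```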
-The prediction intervals are combined either probability-wise or
+The prediction interval is combined either probability-wise or
quantile-wise (Lichtendahl et al., 2013), which may take extra time, because
we need to produce all the distributions for all the models. This can be
sped up with a smaller value for the bins parameter, but
-the resulting intervals may be imprecise.
+the resulting interval may be imprecise.
}
\examples{
library(Mcomp)

-ourModel <- smoothCombine(M3[[578]],intervals="p")
+ourModel <- smoothCombine(M3[[578]],interval="p")
plot(ourModel)

# models parameter accepts either previously estimated smoothCombine
diff --git a/man/ssarima.Rd b/man/ssarima.Rd
index f767311..434e5f6 100644
--- a/man/ssarima.Rd
+++ b/man/ssarima.Rd
@@ -4,12 +4,12 @@
\alias{ssarima}
\title{State Space ARIMA}
\usage{
-ssarima(data, orders = list(ar = c(0), i = c(1), ma = c(1)),
-  lags = c(1), constant = FALSE, AR = NULL, MA = NULL,
+ssarima(y, orders = list(ar = c(0), i = c(1), ma = c(1)), lags = c(1),
+  constant = FALSE, AR = NULL, MA = NULL,
  initial = c("backcasting", "optimal"), ic = c("AICc", "AIC", "BIC",
-  "BICc"), cfType = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
+  "BICc"), loss = c("MSE", "MAE", "HAM", "MSEh", "TMSE", "GTMSE",
  "MSCE"), h = 10, holdout = FALSE, cumulative = FALSE,
-  intervals = c("none", "parametric", "semiparametric", "nonparametric"),
+  interval = c("none", "parametric", "semiparametric", "nonparametric"),
  level = 0.95, occurrence = c("none", "auto", "fixed", "general",
  "odds-ratio", "inverse-odds-ratio", "direct"), oesmodel = "MNN",
  bounds = c("admissible", "none"), silent = c("all", "graph",
@@ -18,7 +18,7 @@ ssarima(data, orders = list(ar = c(0), i = c(1), ma = c(1)),
  transitionX = NULL, ...)
}
\arguments{
-\item{data}{Vector or ts object, containing data needed to be forecasted.}
+\item{y}{Vector or ts object, containing the data to be forecasted.}

\item{orders}{List of orders, containing vector variables \code{ar},
\code{i} and \code{ma}. Example:
@@ -56,12 +56,12 @@ produced using backcasting procedure.}

\item{ic}{The information criterion used in the model selection procedure.}

-\item{cfType}{The type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{The type of Loss Function used in optimization. \code{loss} can
be: \code{MSE} (Mean Squared Error), \code{MAE} (Mean Absolute Error),
\code{HAM} (Half Absolute Moment), \code{TMSE} - Trace Mean Squared Error,
\code{GTMSE} - Geometric Trace Mean Squared Error, \code{MSEh} - optimisation
using only h-steps ahead error, \code{MSCE} - Mean Squared Cumulative Error.
-If \code{cfType!="MSE"}, then likelihood and model selection is done based
+If \code{loss!="MSE"}, then likelihood and model selection is done based
on equivalent \code{MSE}. Model selection in this case becomes non-optimal.

There are also analytical approximations available for multistep functions:
@@ -78,28 +78,28 @@ are available: \code{MAEh}, \code{TMAE}, \code{GTMAE}, \code{MACE}, \code{TMAE},
the end of the data.}

\item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction
-intervals are produced instead of the normal ones. This is useful for
+interval are produced instead of the normal ones. This is useful for
inventory control systems.}

-\item{intervals}{Type of intervals to construct. This can be:
+\item{interval}{Type of interval to construct. This can be:

\itemize{
\item \code{none}, aka \code{n} - do not produce a prediction
-intervals.
+interval.
\item \code{parametric}, \code{p} - use state-space structure of ETS. In
case of mixed models this is done using simulations, which may take longer
time than for the pure additive and pure multiplicative models.

-\item \code{semiparametric}, \code{sp} - intervals based on covariance
+\item \code{semiparametric}, \code{sp} - interval based on covariance
matrix of 1 to h steps ahead errors and assumption of normal / log-normal
distribution (depending on error type).

-\item \code{nonparametric}, \code{np} - intervals based on values from a
+\item \code{nonparametric}, \code{np} - interval based on values from a
quantile regression on error matrix (see Taylor and Bunn, 1999). The model
used in this process is e[j] = a j^b, where j=1,..,h.
}
The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that
-parametric intervals are constructed, while the latter is equivalent to
+the parametric interval is constructed, while the latter is equivalent to
\code{none}.
-If the forecasts of the models were combined, then the intervals are combined
+If the forecasts of the models were combined, then the interval is combined
quantile-wise (Lichtendahl et al., 2013).}

\item{level}{Confidence level. Defines width of prediction interval.}
@@ -196,17 +196,17 @@ provided parameters will take this into account.
\item \code{fitted} - the fitted values.
\item \code{forecast} - the point forecast.
\item \code{lower} - the lower bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{upper} - the higher bound of prediction interval. When
-\code{intervals="none"} then NA is returned.
+\code{interval="none"} then NA is returned.
\item \code{residuals} - the residuals of the estimated model.
\item \code{errors} - The matrix of 1 to h steps ahead errors.
\item \code{s2} - variance of the residuals (taking degrees of freedom into
account).
-\item \code{intervals} - type of intervals asked by user.
-\item \code{level} - confidence level for intervals.
+\item \code{interval} - the type of interval requested by the user.
+\item \code{level} - the confidence level for the interval.
\item \code{cumulative} - whether the produced forecast was cumulative or not.
-\item \code{actuals} - the original data.
+\item \code{y} - the original data.
\item \code{holdout} - the holdout part of the original data.
\item \code{occurrence} - model of the class "oes" if the occurrence model was estimated.
If the model is non-intermittent, then occurrence is \code{NULL}.
@@ -222,8 +222,8 @@ variables.
\item \code{ICs} - values of information criteria of the model. Includes AIC,
AICc, BIC and BICc.
\item \code{logLik} - log-likelihood of the function.
-\item \code{cf} - Cost function value.
-\item \code{cfType} - Type of cost function used in the estimation.
+\item \code{lossValue} - Loss function value.
+\item \code{loss} - Type of loss function used in the estimation.
\item \code{FI} - Fisher Information. Equal to NULL if \code{FI=FALSE} or
when \code{FI} is not provided at all.
\item \code{accuracy} - vector of accuracy measures for the holdout sample.
@@ -276,16 +276,19 @@ take some finite time... If you plan to estimate a model with more than one
seasonality, it is recommended to consider doing it using \link[smooth]{msarima}.
The model selection for SSARIMA is done by the \link[smooth]{auto.ssarima} function.
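The note above points to `msarima()` for models with several seasonalities; a small sketch with simulated monthly data (the orders are picked only for illustration):

```r
library(smooth)

set.seed(44)
y <- ts(rnorm(240, 100, 5), frequency=12)

# SARIMA(0,1,1)(0,1,1)[12] in the multiple seasonal state space form
msarimaModel <- msarima(y, orders=list(ar=c(0,0), i=c(1,1), ma=c(1,1)),
                        lags=c(1,12), h=18, holdout=TRUE)
```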
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ssarima","smooth")}
}
\examples{
# ARIMA(1,1,1) fitted to some data
ourModel <- ssarima(rnorm(118,100,3),orders=list(ar=c(1),i=c(1),ma=c(1)),lags=c(1),h=18,
-                    holdout=TRUE,intervals="p")
+                    holdout=TRUE,interval="p")

# The previous one is equivalent to:
\dontrun{ourModel <- ssarima(rnorm(118,100,3),ar.orders=c(1),i.orders=c(1),ma.orders=c(1),lags=c(1),h=18,
-                    holdout=TRUE,intervals="p")}
+                    holdout=TRUE,interval="p")}

# Model with the same lags and orders, applied to different data
ssarima(rnorm(118,100,3),orders=orders(ourModel),lags=lags(ourModel),h=18,holdout=TRUE)
@@ -307,8 +310,8 @@ ssarima(rnorm(118,100,3),orders=list(ar=c(1),i=c(1),ma=c(1,1)),
        h=10,holdout=TRUE)}

# ARIMA(1,1,1) with Mean Squared Trace Forecast Error
-\dontrun{ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="TMSE")
-ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,cfType="aTMSE")}
+\dontrun{ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="TMSE")
+ssarima(rnorm(118,100,3),orders=list(ar=1,i=1,ma=1),lags=1,h=18,holdout=TRUE,loss="aTMSE")}

# SARIMA(0,1,1) with exogenous variables
ssarima(rnorm(118,100,3),orders=list(i=1,ma=1),h=18,holdout=TRUE,xreg=c(1:118))
diff --git a/man/ves.Rd b/man/ves.Rd
index 6d08139..8b8028b 100644
--- a/man/ves.Rd
+++ b/man/ves.Rd
@@ -4,20 +4,20 @@
\alias{ves}
\title{Vector Exponential Smoothing in SSOE state space model}
\usage{
-ves(data, model = "ANN", persistence = c("group", "independent",
+ves(y, model = "ANN", persistence = c("group", "independent",
  "dependent", "seasonal"), transition = c("group", "independent",
  "dependent"), phi = c("group", "individual"),
  initial = c("individual", "group"), initialSeason = c("group",
-  "individual"), cfType = c("likelihood", "diagonal", "trace"),
+  "individual"), loss = c("likelihood", "diagonal", "trace"),
  ic = c("AICc", "AIC", "BIC", "BICc"), h = 10, holdout = FALSE,
-  intervals = c("none", "conditional", "unconditional", "independent"),
+  interval = c("none", "conditional", "unconditional", "independent"),
  level = 0.95, cumulative = FALSE, intermittent = c("none",
  "fixed", "logistic"), imodel = "ANN", iprobability = c("dependent",
  "independent"), bounds = c("admissible", "usual", "none"),
  silent = c("all", "graph", "output", "none"), ...)
}
\arguments{
-\item{data}{The matrix with data, where series are in columns and
+\item{y}{The matrix with the data, where series are in columns and
observations are in rows.}

\item{model}{The type of ETS model. Can consist of 3 or 4 chars: \code{ANN},
@@ -83,7 +83,7 @@ a matrix, assuming that these values are provided for the whole group.}
states. Treated the same way as \code{initial}. This means that different
time series may share the same initial seasonal component.}

-\item{cfType}{Type of Cost Function used in optimization. \code{cfType} can
+\item{loss}{Type of Loss Function used in optimization. \code{loss} can
be:
\itemize{
\item \code{likelihood} - which assumes the minimisation of the determinant
@@ -103,32 +103,32 @@ is minimised in this case.
\item{holdout}{If \code{TRUE}, holdout sample of size \code{h} is taken from
the end of the data.}

-\item{intervals}{Type of intervals to construct.
+\item{interval}{Type of interval to construct.
This can be:

\itemize{
\item \code{"none"}, aka \code{n} - do not produce a prediction
-intervals.
+interval.
\item \code{"conditional"}, \code{c} - produces multidimensional elliptic -intervals for each step ahead forecast. NOT AVAILABLE YET! +interval for each step ahead forecast. NOT AVAILABLE YET! \item \code{"unconditional"}, \code{u} - produces separate bounds for each series based on ellipses for each step ahead. These bounds correspond to min and max values of the ellipse assuming that all the other series but one take values in the centre of the ellipse. This leads to less accurate estimates of bounds -(wider intervals than needed), but these could still be useful. NOT AVAILABLE YET! -\item \code{"independent"}, \code{i} - produces intervals based on variances of +(wider interval than needed), but these could still be useful. NOT AVAILABLE YET! +\item \code{"independent"}, \code{i} - produces interval based on variances of each separate series. This does not take vector structure into account. } The parameter also accepts \code{TRUE} and \code{FALSE}. The former means that -the independent intervals are constructed, while the latter is equivalent to +the independent interval are constructed, while the latter is equivalent to \code{none}. You can also use the first letter instead of writing the full word.} \item{level}{Confidence level. Defines width of prediction interval.} \item{cumulative}{If \code{TRUE}, then the cumulative forecast and prediction -intervals are produced instead of the normal ones. This is useful for +interval are produced instead of the normal ones. This is useful for inventory control systems.} \item{intermittent}{Defines type of intermittent model used. Can be: @@ -186,7 +186,7 @@ values: \item \code{initialSeason} - The initial values of the seasonal components; \item \code{nParam} - The number of estimated parameters; \item \code{imodel} - The intermittent model estimated with VES; -\item \code{actuals} - The matrix with the original data; +\item \code{y} - The matrix with the original data; \item \code{fitted} - The matrix of the fitted values; \item \code{holdout} - The matrix with the holdout values (if \code{holdout=TRUE} in the estimation); @@ -194,13 +194,13 @@ the estimation); \item \code{Sigma} - The covariance matrix of the errors (estimated with the correction for the number of degrees of freedom); \item \code{forecast} - The matrix of point forecasts; -\item \code{PI} - The bounds of the prediction intervals; -\item \code{intervals} - The type of the constructed prediction intervals; -\item \code{level} - The level of the confidence for the prediction intervals; +\item \code{PI} - The bounds of the prediction interval; +\item \code{interval} - The type of the constructed prediction interval; +\item \code{level} - The level of the confidence for the prediction interval; \item \code{ICs} - The values of the information criteria; \item \code{logLik} - The log-likelihood function; -\item \code{cf} - The value of the cost function; -\item \code{cfType} - The type of the used cost function; +\item \code{lossValue} - The value of the loss function; +\item \code{loss} - The type of the used loss function; \item \code{accuracy} - the values of the error measures. Currently not available. \item \code{FI} - Fisher information if user asked for it using \code{FI=TRUE}. } @@ -258,6 +258,9 @@ chapter 17. In case of multiplicative model, instead of the vector y_t we use its logarithms. As a result the multiplicative model is much easier to work with. 
+
+For some more information about the model and its implementation, see the
+vignette: \code{vignette("ves","smooth")}
}
\examples{
diff --git a/man/viss.Rd b/man/viss.Rd
index 8a3311f..fcdfbaa 100644
--- a/man/viss.Rd
+++ b/man/viss.Rd
@@ -4,14 +4,14 @@
\alias{viss}
\title{Vector Intermittent State Space}
\usage{
-viss(data, intermittent = c("logistic", "none", "fixed"),
-  ic = c("AICc", "AIC", "BIC", "BICc"), h = 10, holdout = FALSE,
+viss(y, intermittent = c("logistic", "none", "fixed"), ic = c("AICc",
+  "AIC", "BIC", "BICc"), h = 10, holdout = FALSE,
  probability = c("dependent", "independent"), model = "ANN",
  persistence = NULL, transition = NULL, phi = NULL,
-  initial = NULL, initialSeason = NULL, xreg = NULL)
+  initial = NULL, initialSeason = NULL, xreg = NULL, ...)
}
\arguments{
-\item{data}{The matrix with data, where series are in columns and
+\item{y}{The matrix with the data, where series are in columns and
observations are in rows.}

\item{intermittent}{Type of method used in probability estimation. Can be
@@ -53,6 +53,8 @@ If \code{NULL}, then it is estimated. See \link[smooth]{ves} for the details.}

\item{xreg}{Vector or matrix of exogenous variables, explaining some parts
of occurrence variable (probability).}
+
+\item{...}{Other parameters. These are not needed for now.}
}
\value{
The object of class "iss" is returned. It contains following list of
values:
@@ -67,7 +69,7 @@ values:
\item \code{logLik} - likelihood value for the model
\item \code{nParam} - number of parameters used in the model;
\item \code{residuals} - residuals of the model;
-\item \code{actuals} - actual values of probabilities (zeros and ones).
+\item \code{y} - actual values of probabilities (zeros and ones);
\item \code{persistence} - the vector of smoothing parameters;
\item \code{initial} - initial values of the state vector;
\item \code{initialSeason} - the matrix of initial seasonal states;
diff --git a/src/ssGeneral.cpp b/src/ssGeneral.cpp
index d4690ba..79d6c19 100644
--- a/src/ssGeneral.cpp
+++ b/src/ssGeneral.cpp
@@ -426,9 +426,10 @@ RcppExport SEXP etsmatrices(SEXP matvt, SEXP vecg, SEXP phi, SEXP Cvalues, SEXP
# polysos - function that transforms AR and MA parameters into polynomials
# and then in matF and other things.
# Cvalues includes AR, MA, initials, constant, matrixAt, transitionX and persistenceX.
+# C and constValue can be NULL, so a pointer is not suitable here.
*/ List polysos(arma::uvec const &arOrders, arma::uvec const &maOrders, arma::uvec const &iOrders, arma::uvec const &lags, unsigned int const &nComponents, - arma::vec const &arValues, arma::vec const &maValues, double const &constValue, arma::vec const &C, + arma::vec const &arValues, arma::vec const &maValues, double const constValue, arma::vec const C, arma::mat &matrixVt, arma::vec &vecG, arma::mat &matrixF, char const &fitterType, int const &nexo, arma::mat &matrixAt, arma::mat &matrixFX, arma::vec &vecGX, bool const &arEstimate, bool const &maEstimate, bool const &constRequired, bool const &constEstimate, @@ -1843,7 +1844,7 @@ RcppExport SEXP costfuncARIMA(SEXP ARorders, SEXP MAorders, SEXP Iorders, SEXP A } arma::vec maValues(MA_n.begin(), MA_n.size(), false); - double constValue; + double constValue = 0.0; if(!Rf_isNull(constant)){ constValue = as(constant); } diff --git a/src/ssOccurrence.cpp b/src/ssOccurrence.cpp index c6c3c2c..3ecdf7d 100644 --- a/src/ssOccurrence.cpp +++ b/src/ssOccurrence.cpp @@ -30,7 +30,7 @@ std::vector occurrenceError(double const &yAct, double aFit, double bFit // Calculate the error switch(EA){ case 'M': - output[0] = (yAct * (1 - 2 * kappa) + kappa) / pfit; + output[0] = (yAct * (1 - 2 * kappa) + kappa - pfit) / pfit; break; case 'A': output[0] = yAct - pfit; diff --git a/tests/testthat/test_ces.R b/tests/testthat/test_ces.R index 2b34bf6..bbb8372 100644 --- a/tests/testthat/test_ces.R +++ b/tests/testthat/test_ces.R @@ -12,7 +12,7 @@ test_that("Test on N1234$x, predefined CES", { }) # Test trace cost function for CES -testModel <- ces(Mcomp::M3$N2568$x, seasonality="f", h=18, holdout=TRUE, silent=TRUE, intervals=TRUE) +testModel <- ces(Mcomp::M3$N2568$x, seasonality="f", h=18, holdout=TRUE, silent=TRUE, interval=TRUE) test_that("Test AICc of CES based on MSTFE on N2568$x", { expect_equal(as.numeric(round(AICc(testModel),2)), as.numeric(round(testModel$ICs["AICc"],2))); }) @@ -27,7 +27,7 @@ test_that("Test initials, A and B of CES on N2568$x", { # Test exogenous (normal + updateX) with CES x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58))); y <- ts(c(Mcomp::M3$N1457$x,Mcomp::M3$N1457$xx),frequency=12); -testModel <- ces(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, cfType="aTMSE", intervals="sp") +testModel <- ces(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, loss="aTMSE", interval="sp") test_that("Check exogenous variables for CESX on N1457", { expect_equal(suppressWarnings(ces(y, h=18, holdout=TRUE, xreg=x, silent=TRUE)$model), testModel$model); expect_equal(suppressWarnings(forecast(testModel, h=18, holdout=FALSE)$method), testModel$model); diff --git a/tests/testthat/test_es.R b/tests/testthat/test_es.R index 78aeb28..528e292 100644 --- a/tests/testthat/test_es.R +++ b/tests/testthat/test_es.R @@ -28,7 +28,7 @@ test_that("Test ETS(MXM) with AIC on N2568$x", { }) # Test trace cost function for ETS -testModel <- es(Mcomp::M3$N2568$x, model="MAdM", h=18, holdout=TRUE, silent=TRUE, intervals=TRUE) +testModel <- es(Mcomp::M3$N2568$x, model="MAdM", h=18, holdout=TRUE, silent=TRUE, interval=TRUE) test_that("Test AIC of ETS on N2568$x", { expect_equal(as.numeric(round(AIC(testModel),2)), as.numeric(round(testModel$ICs[1,"AIC"],2))); }) @@ -44,9 +44,9 @@ test_that("Test initials, initialSeason and persistence of ETS on N2568$x", { # Test exogenous (normal + updateX) with ETS x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58))); y <- ts(c(Mcomp::M3$N1457$x,Mcomp::M3$N1457$xx),frequency=12); -testModel <- es(y, h=18, 
holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, intervals="np") +testModel <- es(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, interval="np") test_that("Check exogenous variables for ETS on N1457", { - expect_equal(suppressWarnings(es(y, "MNN", h=18, holdout=TRUE, xreg=x, cfType="aTMSE", silent=TRUE)$model), testModel$model); + expect_equal(suppressWarnings(es(y, "MNN", h=18, holdout=TRUE, xreg=x, loss="aTMSE", silent=TRUE)$model), testModel$model); expect_equal(suppressWarnings(forecast(testModel, h=18, holdout=FALSE)$method), testModel$model); }) diff --git a/tests/testthat/test_ges.R b/tests/testthat/test_ges.R index 1d76aca..551a2a5 100644 --- a/tests/testthat/test_ges.R +++ b/tests/testthat/test_ges.R @@ -12,7 +12,7 @@ test_that("Reuse previous GUM on N1234$x", { }) # Test some crazy order of GUM -testModel <- gum(Mcomp::M3$N1234$x, orders=c(1,1,1), lags=c(1,3,5), h=18, holdout=TRUE, initial="o", silent=TRUE, intervals=TRUE) +testModel <- gum(Mcomp::M3$N1234$x, orders=c(1,1,1), lags=c(1,3,5), h=18, holdout=TRUE, initial="o", silent=TRUE, interval=TRUE) test_that("Test if crazy order GUM was estimated on N1234$x", { expect_equal(testModel$model, "GUM(1[1],1[3],1[5])"); }) @@ -28,7 +28,7 @@ test_that("Test initials, measurement, transition and persistence of GUM on N256 # Test exogenous (normal + updateX) with GUM x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58))); y <- ts(c(Mcomp::M3$N1457$x,Mcomp::M3$N1457$xx),frequency=12); -testModel <- gum(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, cfType="aTMSE", intervals="np") +testModel <- gum(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, loss="aTMSE", interval="np") test_that("Check exogenous variables for GUMX on N1457", { expect_equal(suppressWarnings(gum(y, h=18, holdout=TRUE, xreg=x, silent=TRUE)$model), testModel$model); expect_equal(suppressWarnings(forecast(testModel, h=18, holdout=FALSE)$method), testModel$model); diff --git a/tests/testthat/test_ssarima.R b/tests/testthat/test_ssarima.R index 04b1b8a..882f5f9 100644 --- a/tests/testthat/test_ssarima.R +++ b/tests/testthat/test_ssarima.R @@ -12,7 +12,7 @@ test_that("Reuse previous SSARIMA on N1234$x", { }) # Test some crazy order of SSARIMA -testModel <- ssarima(Mcomp::M3$N2568$x, orders=NULL, ar.orders=c(1,1,0), i.orders=c(1,0,1), ma.orders=c(0,1,1), lags=c(1,6,12), h=18, holdout=TRUE, initial="o", silent=TRUE, intervals=TRUE) +testModel <- ssarima(Mcomp::M3$N2568$x, orders=NULL, ar.orders=c(1,1,0), i.orders=c(1,0,1), ma.orders=c(0,1,1), lags=c(1,6,12), h=18, holdout=TRUE, initial="o", silent=TRUE, interval=TRUE) test_that("Test if crazy order SSARIMA was estimated on N1234$x", { expect_equal(testModel$model, "SARIMA(1,1,0)[1](1,0,1)[6](0,1,1)[12]"); }) @@ -50,7 +50,7 @@ test_that("Test if combined ARIMA works", { # Test exogenous (normal + updateX) with SSARIMA x <- cbind(c(rep(0,25),1,rep(0,43)),c(rep(0,10),1,rep(0,58))); y <- ts(c(Mcomp::M3$N1457$x,Mcomp::M3$N1457$xx),frequency=12); -testModel <- ssarima(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, cfType="aTMSE", intervals=TRUE) +testModel <- ssarima(y, h=18, holdout=TRUE, xreg=x, updateX=TRUE, silent=TRUE, loss="aTMSE", interval=TRUE) test_that("Check exogenous variables for SSARIMAX on N1457", { expect_equal(suppressWarnings(ssarima(y, h=18, holdout=TRUE, xreg=x, silent=TRUE)$model), testModel$model); expect_equal(suppressWarnings(forecast(testModel, h=18, holdout=FALSE)$method), testModel$model); diff --git a/tests/testthat/test_ves.R b/tests/testthat/test_ves.R 
index 99bcd76..84dd3c1 100644
--- a/tests/testthat/test_ves.R
+++ b/tests/testthat/test_ves.R
@@ -26,14 +26,14 @@ test_that("Test VES with grouped initials and dependent persistence", {
})

# Test VES with a trace cost function
-testModel <- ves(Y,"AAN", cfType="t", silent=TRUE);
+testModel <- ves(Y,"AAN", loss="t", silent=TRUE);
test_that("Test VES with a trace cost function", {
-    expect_match(testModel$cfType, "trace");
+    expect_match(testModel$loss, "trace");
})

-# Test VES with a dependent transition and independent intervals
-testModel <- ves(Y,"AAN", transition="d", intervals="i", silent=TRUE);
-test_that("Test VES with a dependent transition and independent intervals", {
+# Test VES with a dependent transition and independent interval
+testModel <- ves(Y,"AAN", transition="d", interval="i", silent=TRUE);
+test_that("Test VES with a dependent transition and independent interval", {
    expect_false(isTRUE(all.equal(testModel$transition[1,4], 0)));
    expect_equal(dim(testModel$PI),c(10,4));
})
diff --git a/vignettes/ces.Rmd b/vignettes/ces.Rmd
index a8c6da9..d449711 100644
--- a/vignettes/ces.Rmd
+++ b/vignettes/ces.Rmd
@@ -34,14 +34,14 @@ This output is very similar to ones printed out by `es()` function. The only dif
If we want automatic model selection, then we use `auto.ces()` function:
```{r auto_ces_N2457}
-auto.ces(M3$N2457$x, h=18, holdout=TRUE, intervals="p", silent=FALSE)
+auto.ces(M3$N2457$x, h=18, holdout=TRUE, interval="p", silent=FALSE)
```
-Note that prediction intervals are too narrow and do not include 95% of values. This is because CES is pure additive model and it cannot take into account possible heteroscedasticity.
+Note that the prediction interval is too narrow and does not include 95% of the values. This is because CES is a pure additive model and cannot take into account possible heteroscedasticity.

If for some reason we want to optimise initial values then we call:
```{r auto_ces_N2457_optimal}
-auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", intervals="sp")
+auto.ces(M3$N2457$x, h=18, holdout=TRUE, initial="o", interval="sp")
```

Now let's introduce some artificial exogenous variables:
@@ -49,12 +49,12 @@
x <- cbind(rnorm(length(M3$N2457$x),50,3),rnorm(length(M3$N2457$x),100,7))
```

-`ces()` allows using exogenous variables and different types of prediction intervals in exactly the same manner as `es()`:
+`ces()` allows using exogenous variables and different types of prediction interval in exactly the same manner as `es()`:
```{r auto_ces_N2457_xreg_simple}
-auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", intervals="p")
+auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, xregDo="select", interval="p")
```

The same model but with updated parameters of exogenous variables is called:
```{r auto_ces_N2457_xreg_update}
-auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, intervals="p")
+auto.ces(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE, interval="p")
```
diff --git a/vignettes/es.Rmd b/vignettes/es.Rmd
index 0afc7f8..ae28e31 100644
--- a/vignettes/es.Rmd
+++ b/vignettes/es.Rmd
@@ -45,14 +45,14 @@ In this case function uses branch and bound algorithm to form a pool of models t
8. Information criteria for this model;
9. Forecast errors (because we have set `holdout=TRUE`).

-The function has also produced a graph with actuals, fitted values and point forecasts.
+The function has also produced a graph with actual values, fitted values and point forecasts.
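Since the fitted object stores these elements directly, here is a minimal sketch of inspecting them (assuming the `Mcomp` package, as in the vignette, and that `es()` exposes the same elements as listed for the other functions in this patch):

```r
library(smooth)
library(Mcomp)

ourModel <- es(M3$N2457$x, h=18, holdout=TRUE, silent="all")

ourModel$model     # the selected ETS model
ourModel$forecast  # the point forecasts
ourModel$accuracy  # holdout error measures, since holdout=TRUE
```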
-If we need prediction intervals, then we run:
-```{r es_N2457_with_intervals}
-es(M3$N2457$x, h=18, holdout=TRUE, intervals=TRUE, silent=FALSE)
+If we need a prediction interval, then we run:
+```{r es_N2457_with_interval}
+es(M3$N2457$x, h=18, holdout=TRUE, interval=TRUE, silent=FALSE)
```
-Due to multiplicative nature of error term in the model, the intervals are asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of prediction intervals and its actual coverage.
+Due to the multiplicative nature of the error term in the model, the interval is asymmetric. This is the expected behaviour. The other thing to note is that the output now also provides the theoretical width of the prediction interval and its actual coverage.

If we save the model (and let's say we want it to work silently):
```{r es_N2457_save_model}
ourModel <- es(M3$N2457$x, h=18, holdout=TRUE, silent="all")

we can then reuse it for different purposes:
```{r es_N2457_reuse_model}
-es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, intervals="np", level=0.93)
+es(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, interval="np", level=0.93)
```

We can also extract the type of model in order to reuse it later:
@@ -71,6 +71,11 @@ modelType(ourModel)

This handy function, by the way, also works with ets() from forecast package.

+If we need actual values from the model, we can use the `actuals()` method from the `greybox` package:
+```{r es_N2457_actuals}
+actuals(ourModel)
+```
+
We can then use persistence or initials only from the model to construct the other one:
```{r es_N2457_reuse_model_parts}
es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=ourModel$initial, silent="graph")
@@ -83,7 +88,7 @@ es(M3$N2457$x, model=modelType(ourModel), h=18, holdout=FALSE, initial=1500, sil

Using some other parameters may lead to completely different model and forecasts:
```{r es_N2457_aMSTFE}
-es(M3$N2457$x, h=18, holdout=TRUE, cfType="aTMSE", bounds="a", ic="BIC", intervals=TRUE)
+es(M3$N2457$x, h=18, holdout=TRUE, loss="aTMSE", bounds="a", ic="BIC", interval=TRUE)
```

You can play around with all the available parameters to see their effect on the final model.
@@ -135,7 +140,7 @@ etsModel <- forecast::ets(M3$N2457$x)
esModel <- es(M3$N2457$x, model=etsModel, h=18)
```

-The point forecasts in the majority of cases should the same, but the prediction intervals may be different (especially if error term is multiplicative):
+The point forecasts in the majority of cases should be the same, but the prediction interval may be different (especially if the error term is multiplicative):
```{r ets_es_forecast, message=FALSE, warning=FALSE}
forecast(etsModel,h=18,level=0.95)
forecast(esModel,h=18,level=0.95)
@@ -143,7 +148,7 @@ forecast(esModel,h=18,level=0.95)

Finally, if you work with M or M3 data, and need to test a function on a specific time series, you can use the following simplified call:
```{r es_N2457_M3}
-es(M3$N2457, intervals=TRUE, silent=FALSE)
+es(M3$N2457, interval=TRUE, silent=FALSE)
```

This command has taken the data, split it into in-sample and holdout and produced the forecast of appropriate length to the holdout.
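The "actual coverage" mentioned above can also be reproduced by hand from the holdout sample and the interval bounds; a sketch with the same series:

```r
library(smooth)
library(Mcomp)

esModel <- es(M3$N2457$x, h=18, holdout=TRUE, interval=TRUE, silent="all")

# Share of holdout values falling inside the constructed interval
mean(esModel$holdout >= esModel$lower & esModel$holdout <= esModel$upper)
```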
diff --git a/vignettes/gum.Rmd b/vignettes/gum.Rmd
index b73e56b..5f41b64 100644
--- a/vignettes/gum.Rmd
+++ b/vignettes/gum.Rmd
@@ -41,7 +41,7 @@ gum(M3$N2457$x, h=18, holdout=TRUE, orders=c(2,1), lags=c(1,12))
Function `auto.gum()` is now implemented in `smooth`, but it works slowly as it needs to check a large number of models:
```{r Autogum_N2457_1[1]}
-auto.gum(M3[[2457]], intervals=TRUE, silent=FALSE)
+auto.gum(M3[[2457]], interval=TRUE, silent=FALSE)
```

In addition to standard values that other functions accept, GUM accepts predefined values for transition matrix, measurement and persistence vectors. For example, something more common can be passed to the function:
diff --git a/vignettes/oes.Rmd b/vignettes/oes.Rmd
index 3c9b0a4..36e808c 100644
--- a/vignettes/oes.Rmd
+++ b/vignettes/oes.Rmd
@@ -1,5 +1,5 @@
---
-title: "Occurrence part of iETS model"
+title: "oes() - occurrence part of iETS model"
author: "Ivan Svetunkov"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
@@ -24,7 +24,8 @@ The canonical general iETS model (called iETS$_G$) can be summarised as:
\begin{equation} \label{eq:iETS} \tag{1}
\begin{matrix}
    y_t = o_t z_t \\
-    o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\
+    o_t \sim \text{Bernoulli} \left(p_t \right) \\
+    p_t = f(a_t, b_t) \\
    a_t = w_a(v_{a,t-L}) + r_a(v_{a,t-L}) \epsilon_{a,t} \\
    v_{a,t} = f_a(v_{a,t-L}) + g_a(v_{a,t-L}) \epsilon_{a,t} \\
    (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
@@ -33,7 +34,7 @@ The canonical general iETS model (called iETS$_G$) can be summarised as:
    (1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2)
\end{matrix},
\end{equation}
-where $y_t$ is the observed values, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are the mutually independent error terms, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$) which is distributed according to Bernoulli with probability $p_t$ that has a Beta distribution ($o_t \sim \text{Bernoulli} \left(p_t \right)$, $p_t \sim \text{Beta} \left(a_t, b_t \right)$). Any ETS model can be used for $a_t$ and $b_t$, and the transformation of them into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
+where $y_t$ is the observed value, $z_t$ is the demand size, which is a pure multiplicative ETS model on its own, $w(\cdot)$ is the measurement function, $r(\cdot)$ is the error function, $f(\cdot)$ is the transition function and $g(\cdot)$ is the persistence function (the subscripts allow separating the functions for different parts of the model). These four functions define how the elements of the vector $v_{t}$ interact with each other. Furthermore, $\epsilon_{a,t}$ and $\epsilon_{b,t}$ are the mutually independent error terms, $o_t$ is the binary occurrence variable (1 - demand is non-zero, 0 - no demand in the period $t$) which is distributed according to Bernoulli with probability $p_t$ that has a logit-normal distribution ($o_t \sim \text{Bernoulli} \left(p_t \right)$, $p_t \sim \text{logit} \mathcal{N}$).
Any ETS model can be used for $a_t$ and $b_t$, and the transformation of them into the probability $p_t$ depends on the type of the error. The general formula for the multiplicative error is:
\begin{equation} \label{eq:oETS(MZZ)}
    p_t = \frac{a_t}{a_t+b_t} ,
\end{equation}
@@ -51,7 +52,8 @@ An example of an iETS model is the basic local-level model iETS(M,N,N)$_G$(M,N,N
    l_{z,t} = l_{z,t-1}( 1 + \alpha_{z} \epsilon_{z,t}) \\
    (1 + \epsilon_{t}) \sim \text{log}\mathcal{N}(0, \sigma_\epsilon^2) \\
    \\
-    o_t \sim \text{Beta-Bernoulli} \left(a_t, b_t \right) \\
+    o_t \sim \text{Bernoulli} \left(p_t \right) \\
+    p_t = \frac{a_t}{a_t+b_t} \\
    a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\
    l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\
    (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \\
@@ -95,33 +97,24 @@ In case of the fixed $a_t$ and $b_t$, the iETS$_G$ model reduces to:
\begin{equation} \label{eq:ISSETS(MNN)Fixed} \tag{3}
\begin{matrix}
    y_t = o_t z_t \\
-    o_t \sim \text{Beta-Bernoulli}(a, b)
+    o_t \sim \text{Bernoulli}(p)
\end{matrix} .
\end{equation}
-The conditional h-steps ahead median of the demand occurrence probability is calculated as:
+The conditional h-steps ahead mean of the demand occurrence probability is calculated as:
\begin{equation} \label{eq:pt_fixed_expectation}
-    \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \frac{a}{a+b} .
-\end{equation}
-
-The likelihood function used in the first step of the estimation of iETS can be simplified to:
-\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihood} \tag{4}
-    \ell \left(a,b | o_t \right) = {\sum_{t=1}^T} \log \left( \frac{ \text{B} (o_t + a, 1 - o_t + b) }{ \text{B}(a,b) } \right) ,
+    \mu_{o,t+h|t} = \tilde{p}_{t+h|t} = \hat{p} .
\end{equation}
-where $B$ is the beta function.

-Note, however that there can be combinations of $a$ and $b$ that will lead to the same fixed probability of occurrence $p$, so there is no point in estimating the model (3) \ref{eq:ISSETS(MNN)Fixed} based on (4) \ref{eq:ISSETS(MNN)FixedLikelihood}. Instead, the simpler version of the iETS$_F$ is fitted in the `oes()` function of the `smooth` package:
-\begin{equation} \label{eq:ISSETS(MNN)FixedSmooth}
-  \begin{matrix}
-    y_t = o_t z_t \\
-    o_t \sim \text{Bernoulli}(p)
-  \end{matrix} ,
-\end{equation}
-so that the estimate of the probability $p$ is calculated based on the maximisation of the following concentrated log-likelihood function:
+The estimate of the probability $p$ is calculated based on the maximisation of the following concentrated log-likelihood function:
\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmooth}
-    \ell \left(\hat{p} | o_t \right) = T_1 \log \hat{p} + T_0 \log (1-\hat{p}) ,
+    \ell \left(p | o_t \right) = T_1 \log p + T_0 \log (1-p) ,
+\end{equation}
+where $T_0$ is the number of zero observations and $T_1$ is the number of non-zero observations in the data. The number of estimated parameters in this case is equal to $k_z+1$, where $k_z$ is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability $p$. Maximising this likelihood yields the analytical solution for $p$:
+\begin{equation} \label{eq:ISSETS(MNN)FixedLikelihoodSmoothProbability}
+    \hat{p} = \frac{1}{T} \sum_{t=1}^T o_t ,
\end{equation}
-where $T_0$ is the number of zero observations and $T_1$ is the number of non-zero observations in the data.
The number of estimated parameters in this case is equal to $k_z+1$, where $k_z$ is the number of parameters for the demand sizes part, and 1 is for the estimation of the probability $p$. +where $T$ is the number of all the available observations. The occurrence part of the model oETS$_F$ is constructed using `oes()` function: ```{r iETSFExample1} @@ -141,16 +134,13 @@ The odds-ratio iETS uses only one model for the occurrence part, for the $a_t$ v \begin{equation} \label{eq:iETSO} \tag{5} \begin{matrix} y_t = o_t z_t \\ - o_t \sim \text{Beta-Bernoulli} \left(a_t, 1 \right) \\ + o_t \sim \text{Bernoulli} \left(p_t \right) \\ + p_t = \frac{a_t}{a_t+1} \\ a_t = l_{a,t-1} \left(1 + \epsilon_{a,t} \right) \\ l_{a,t} = l_{a,t-1}( 1 + \alpha_{a} \epsilon_{a,t}) \\ (1 + \epsilon_{a,t}) \sim \text{log}\mathcal{N}(0, \sigma_{a}^2) \end{matrix}. \end{equation} -The probability of occurrence in this model is equal to: -\begin{equation} \label{eq:oETS_O(MNN)} - p_t = \frac{a_t}{a_t+1} . -\end{equation} In the estimation of the model, the initial level is set to the transformed mean probability of occurrence $l_{a,0}=\frac{\bar{p}}{1-\bar{p}}$ for multiplicative error model and $l_{a,0} = \log l_{a,0}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$, the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. In cases of seasonal models, the regression with dummy variables is fitted, and its parameters are then used for the initials of the seasonal indices after the transformations similar to the level ones. @@ -191,16 +181,13 @@ Similarly to the odds-ratio iETS, inverse-odds-ratio model uses only one model f \begin{equation} \label{eq:iETSI} \tag{6} \begin{matrix} y_t = o_t z_t \\ - o_t \sim \text{Beta-Bernoulli} \left(1, b_t \right) \\ + o_t \sim \text{Bernoulli} \left(p_t \right) \\ + p_t = \frac{1}{1+b_t} \\ b_t = l_{b,t-1} \left(1 + \epsilon_{b,t} \right) \\ l_{b,t} = l_{b,t-1}( 1 + \alpha_{b} \epsilon_{b,t}) \\ (1 + \epsilon_{b,t}) \sim \text{log}\mathcal{N}(0, \sigma_{b}^2) \end{matrix}. \end{equation} -The probability of occurrence in this model is equal to: -\begin{equation} \label{eq:oETS_I(MNN)} - p_t = \frac{1}{1+b_t} . -\end{equation} In the estimation of the model, the initial level is set to the transformed mean probability of occurrence $l_{b,0}=\frac{1-\bar{p}}{\bar{p}}$ for multiplicative error model and $l_{b,0} = \log l_{b,0}$ for the additive one, where $\bar{p}=\frac{1}{T} \sum_{t=1}^T o_t$, the initial trend is equal to 0 in case of the additive and 1 in case of the multiplicative types. The seasonality is treated similar to the iETS$_O$ model, but using the inverse-odds transformation. @@ -347,10 +334,10 @@ The main restriction of the iETS models at the moment (`smooth` v.2.5.0) is that ## The integer-valued iETS By default, the models assume that the data is continuous, which sounds counter intuitive for the typical intermittent demand forecasting tasks. However, [@Svetunkov2017a] showed that these models perform quite well in terms of forecasting accuracy for many cases. Still, there is also an option for the rounded up values, which is implemented in the `es()` function. This is not described in the manual and can be triggered via the `rounded=TRUE` parameter provided in ellipsis. 
@@ -347,10 +334,10 @@ The main restriction of the iETS models at the moment (`smooth` v.2.5.0) is that
 ## The integer-valued iETS
 
 By default, the models assume that the data is continuous, which sounds counter-intuitive for typical intermittent demand forecasting tasks. However, [@Svetunkov2017a] showed that these models perform quite well in terms of forecasting accuracy in many cases. Still, there is also an option for rounded-up values, which is implemented in the `es()` function. This is not described in the manual and can be triggered via the `rounded=TRUE` parameter passed in the ellipsis. Here's an example:
 
 ```{r iETSGRoundedExample}
-es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, intervals=TRUE, rounded=TRUE)
+es(rpois(100,0.3), "MNN", occurrence="g", oesmodel="MNN", h=10, holdout=TRUE, silent=FALSE, interval=TRUE, rounded=TRUE)
 ```
-Keep in mind that the model with the rounded up values is estimated differently than it continuous counterpart and produces more adequate results for the highly intermittent data with low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce integer-valued prediction intervals, then you can produce the intervals from a continuous model and then round them up (see discussion in [@Svetunkov2017a] for details).
+Keep in mind that the model with the rounded-up values is estimated differently than its continuous counterpart and produces more adequate results for highly intermittent data with a low level of demand sizes. In all the other cases, the continuous iETS models are recommended. In fact, if you need to produce an integer-valued prediction interval, then you can produce the interval from a continuous model and then round it up (see discussion in [@Svetunkov2017a] for details).
 
 ## References
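Before moving on to the other vignettes, an editorial aside on the rounding advice in the hunk above (a hedged sketch, not part of the diff): it assumes the fitted `smooth` object exposes `lower` and `upper` elements when an interval is produced, and that `occurrence="g"` and `interval=TRUE` are accepted as shown in the example chunk.

```r
# Sketch: integer-valued bounds from a continuous iETS model, obtained by
# rounding the continuous prediction interval as the text suggests.
library(smooth)
set.seed(43)
ourModel <- es(rpois(100, 0.3), "MNN", occurrence="g", h=10, holdout=TRUE,
               interval=TRUE, silent=TRUE)
cbind(lower=floor(ourModel$lower), upper=ceiling(ourModel$upper))
```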
diff --git a/vignettes/sma.Rmd b/vignettes/sma.Rmd
index 5395af5..2a569fb 100644
--- a/vignettes/sma.Rmd
+++ b/vignettes/sma.Rmd
@@ -36,7 +36,7 @@ It appears that SMA(13) is the optimal model for this time series, which is not
 If we try selecting the SMA order for data without a substantial trend, then we will end up with some other order. For example, let's consider the seasonal time series N2568:
 
 ```{r sma_N2568}
-sma(M3$N2568$x, h=18)
+sma(M3$N2568$x, h=18, interval=TRUE)
 ```
 
 Here we end up with SMA(12). Note that the order of the moving average corresponds to the seasonal frequency, which is usually the first step in classical time series decomposition. However, this is a simple moving average rather than a centred one, so the decomposition should not be done based on this model.
diff --git a/vignettes/smooth.Rmd b/vignettes/smooth.Rmd
index 1ea0542..430a257 100644
--- a/vignettes/smooth.Rmd
+++ b/vignettes/smooth.Rmd
@@ -26,10 +26,8 @@ The package includes the following functions:
 6. [sma() - Simple Moving Average in state-space form](sma.html);
 7. [Simulate functions of the package](simulate.html).
 8. `smoothCombine()` -- function that combines forecasts of the main univariate functions of smooth package.
-9. `oes()` -- function that estimates probability of occurrence of variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by `oes()` can then be used in any forecasting function as input variable for `occurrence` parameter. This is the new function introduced in smooth v2.5.0, substituting the old `iss()` function.
+9. [oes() - Occurrence part of iETS model](oes.html) -- function that estimates the probability of occurrence of a variable using one of the following model types: 1. Fixed probability; 2. Odds ratio probability; 3. Inverse odds ratio probability; 4. Direct probability; 5. General. It can also select the most appropriate model among these five. The model produced by `oes()` can then be used in any forecasting function as the input for the `occurrence` parameter. This is a new function introduced in smooth v2.5.0, replacing the old `iss()` function.
 There is also a vector counterpart of this function, called `viss()`, which implements multivariate fixed and logistic probabilities.
-11. `xregExpander()` -- function that creates lags and leads of the provided exogenous variables (either vector or matrix) and forecasts the missing values. This thing returns the matrix.
-12. `stepwise()` -- the function that implements stepwise based on information criteria and partial correlations. Easier to use and works faster than `step()` from `stats` package.
 
 The functions (1) - (4) and (6) return an object of class `smooth`, (5) returns an object of class `vsmooth`, (7) returns the `smooth.sim` class and, finally, (8) returns `oes` or `viss` (depending on the function used).
 There are several methods for these classes in the package.
@@ -45,7 +43,7 @@ There are several functions that can be used together with the forecasting funct
 6. `plot(ourModel)` -- plots the states of the constructed model. If the number of states is higher than 10, then several graphs are produced.
 7. `simulate(ourModel)` -- produces data simulated from the provided model;
 8. `summary(forecast(ourModel))` -- prints point and interval forecasts;
-9. `plot(forecast(ourModel))` -- produces graph with actuals, forecast, fitted and intervals using `graphmaker()` function.
+9. `plot(forecast(ourModel))` -- produces a graph with actuals, forecast, fitted values and prediction interval using the `graphmaker()` function from the `greybox` package.
 10. `logLik(ourModel)` -- returns the log-likelihood of the model;
 11. `nobs(ourModel)` -- returns the number of in-sample observations;
 12. `nParam(ourModel)` -- number of estimated parameters (originally from the `greybox` package);
diff --git a/vignettes/ssarima.Rmd b/vignettes/ssarima.Rmd
index e0f06f8..b24f73a 100644
--- a/vignettes/ssarima.Rmd
+++ b/vignettes/ssarima.Rmd
@@ -70,12 +70,12 @@ ourModel <- auto.ssarima(M3$N2457$x, h=18, holdout=TRUE, xreg=x, updateX=TRUE)
 we can then reuse it:
 
 ```{r auto_ssarima_N2457_xreg_update}
-ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, intervals=TRUE)
+ssarima(M3$N2457$x, model=ourModel, h=18, holdout=FALSE, xreg=x, updateX=TRUE, interval=TRUE)
 ```
 
 Finally, we can combine several SARIMA models:
 
 ```{r auto_ssarima_N2457_combination}
-ssarima(M3$N2457$x, h=18, holdout=FALSE, intervals=TRUE, combine=TRUE)
+ssarima(M3$N2457$x, h=18, holdout=FALSE, interval=TRUE, combine=TRUE)
 ```
diff --git a/vignettes/ves.Rmd b/vignettes/ves.Rmd
index 0650ce5..0949238 100644
--- a/vignettes/ves.Rmd
+++ b/vignettes/ves.Rmd
@@ -57,6 +57,6 @@ Number of estimated parameters in the model can be extracted via `nParam()` meth
 AICc and BICc for the vector models are calculated as proposed in [@Bedrick1994] and [@Tremblay2004].
 
-Currently we don't do model selection, don't have exogenous variables and don't produce conditional prediction intervals. But at least it works and allows you to play around with it :).
+Currently we don't do model selection, don't have exogenous variables and don't produce a conditional prediction interval. But at least it works and allows you to play around with it :).
 
 ### References