step_fourier returns all NaN #40
Comments
@mpelath Do you have a reproducible example?
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#>
#> fixed
#> The following object is masked from 'package:stats':
#>
#> step
library(timetk)
dates <- c(ymd(20200531), ymd(20200630))
train_data <- tibble(id = c(1, 1, 2, 2, 3, 3), date = rep(dates, 3))
good_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
good_recipe_prepped <- prep(good_recipe, train_data)
baked <- bake(good_recipe_prepped, train_data)
train_data <- train_data %>% arrange(date)
bad_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
bad_recipe_prepped <- prep(bad_recipe, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
baked <- bake(bad_recipe_prepped, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
Created on 2020-06-26 by the reprex package (v0.3.0)
I took a look at this. This one is complicated...
About scaling
In order for the Fourier terms to come out correctly, there needs to be scaling applied to ensure that the sine and cosine are generated with unit difference between subsequent terms. So in this case, it's actually bad to rearrange by date because you actually have groups of dates. The time difference between observations becomes zero when it should be the difference between the first and second date in each group of date sequences.
Solution
I've added an error that now happens. Hopefully, this will point users in the right direction.
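The ordering requirement described above can be checked before building the recipe. The following is a minimal base-R sketch, not the check that timetk itself performs: `check_index_spacing` is a hypothetical helper, and the median of consecutive differences is used only as a stand-in for the inferred scale.

```r
# Hypothetical guard (not part of timetk): stop early when the median gap
# between consecutive rows of a date index is zero.
check_index_spacing <- function(idx) {
  gap <- median(as.numeric(diff(idx)))  # gap in days for Date input
  if (gap == 0) {
    stop("Median difference between consecutive dates is zero; ",
         "order the rows by group, then by date.")
  }
  invisible(gap)
}

# Same shape as the reprex: two dates repeated for three ids.
dates <- as.Date(c("2020-05-31", "2020-06-30"))
panel <- data.frame(id = rep(1:3, each = 2), date = rep(dates, 3))

by_date  <- panel[order(panel$date), ]                   # ordering from the failing case
by_group <- by_date[order(by_date$id, by_date$date), ]   # restore id, then date

spacing_ok  <- check_index_spacing(by_group$date)        # passes, returns the gap
spacing_bad <- tryCatch(check_index_spacing(by_date$date),
                        error = function(e) conditionMessage(e))
```

With the rows ordered by `id` and then `date`, the guard passes; with the rows ordered by `date` alone, it raises the error, and `spacing_bad` holds the message.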
Closing this issue. The fix will be included in version 2.1.0.
Although it worked a month or two ago, step_fourier is now giving me NaNs for everything.
After pulling the source code and debugging, I think the issue arises when the scale is inferred:
date_to_seq_scale_factor <- function(idx) {
tk_get_timeseries_summary(idx) %>% dplyr::pull(diff.median)
}
since tk_get_timeseries_summary returns a diff.median of zero. This is because I'm using panel data, not time series data. My guess is that the sort order of the data is now being changed by some upstream process (possibly but not necessarily something in timetk). When the data is sorted by the time index alone, rather than by unit and then by time index, most of the diffs between consecutive rows are zero, so the median diff, and with it the scale factor, is zero.
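To make the arithmetic concrete, here is a small base-R sketch. It does not call timetk: the median of consecutive differences stands in for `diff.median`, and dividing the index by the scale factor before taking the sine is an assumption about how the scale is used, not something confirmed from the source.

```r
# Two dates repeated for three panel units, as in typical panel data.
dates <- as.Date(c("2020-05-31", "2020-06-30"))
idx_by_unit <- rep(dates, 3)       # ordered by unit, then date
idx_by_date <- sort(idx_by_unit)   # ordered by date only

# Stand-in for diff.median: median gap (in days) between consecutive rows.
scale_by_unit <- median(as.numeric(diff(idx_by_unit)))  # gaps: 30, -30, 30, -30, 30
scale_by_date <- median(as.numeric(diff(idx_by_date)))  # gaps: 0, 0, 30, 0, 0

# Assumption: the numeric index is divided by the scale factor before sin()/cos().
x <- as.numeric(idx_by_date) / scale_by_date            # division by zero gives Inf

# sin(Inf) is NaN and normally warns "NaNs produced"; the warning is suppressed here.
fourier_sin <- suppressWarnings(sin(2 * pi * 1 * x))
```

Under that assumption, a zero scale factor turns every index value into `Inf`, and every sine and cosine term becomes `NaN`, which matches the warnings in the reprex.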
I don't know whether there is anything you can do about it. Perhaps allow the user to define the scale, or just document this pitfall when using non-univariate time series data.