step_fourier returns all NaN #40

Closed
mpelath opened this issue Jun 24, 2020 · 4 comments

@mpelath mpelath commented Jun 24, 2020

Although it worked a month or two ago, step_fourier is now giving me NaNs for everything.

After pulling the source code and debugging, I think the issue arises when the scale is inferred:

date_to_seq_scale_factor <- function(idx) {
  tk_get_timeseries_summary(idx) %>% dplyr::pull(diff.median)
}

since tk_get_timeseries_summary returns a diff.median of zero. This happens because I'm using panel data, not a single time series. My guess is that the sort order of the data is now being changed by some upstream process (possibly, but not necessarily, something in timetk). When the data is sorted by the time index alone, rather than by unit and then by time index, most consecutive differences are zero, so the median difference, and with it the scale factor, is zero.
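
To illustrate the effect of sort order, here is a minimal base-R sketch. It does not call timetk; `median_gap` is a hypothetical stand-in for the diff.median value, and it measures the gap in days, which may differ from the units timetk uses internally.

```r
# Two month-end dates repeated for three units (a tiny panel)
dates <- as.Date(c("2020-05-31", "2020-06-30"))
panel <- data.frame(id = rep(1:3, each = 2), date = rep(dates, times = 3))

# Hypothetical stand-in for diff.median: median gap between consecutive rows, in days
median_gap <- function(x) median(as.numeric(diff(x)))

# Rows ordered by unit, then date: gaps alternate between +30 and -30
gap_by_unit <- median_gap(panel$date)

# Rows ordered by date only: four of the five gaps are zero
gap_by_date <- median_gap(sort(panel$date))

print(c(by_unit = gap_by_unit, by_date = gap_by_date))
```

With the date-only ordering the median gap collapses to zero, which matches the zero scale factor described above.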

I don't know whether there is anything you can do about it. Perhaps allow the user to define the scale, or just document this pitfall when using non-univariate time series data.

Contributor

@mdancho84 mdancho84 commented Jun 26, 2020

@mpelath Do you have a reproducible example?

Author

@mpelath mpelath commented Jun 26, 2020

library(tidyverse)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(recipes)
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stringr':
#> 
#>     fixed
#> The following object is masked from 'package:stats':
#> 
#>     step
library(timetk)

dates <- c(ymd(20200531), ymd(20200630))
train_data <- tibble(id = c(1, 1, 2, 2, 3, 3), date = rep(dates, 3))

good_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
good_recipe_prepped <- prep(good_recipe, train_data)
baked <- bake(good_recipe_prepped, train_data)

train_data <- train_data %>% arrange(date)
bad_recipe <- recipe(train_data) %>% step_fourier(date, period = 12, K = 6)
bad_recipe_prepped <- prep(bad_recipe, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
baked <- bake(bad_recipe_prepped, train_data)
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced
#> Warning in sin(2 * pi * term * x): NaNs produced
#> Warning in cos(2 * pi * term * x): NaNs produced

Created on 2020-06-26 by the reprex package (v0.3.0)

mdancho84 added a commit that referenced this issue Jul 2, 2020

@mdancho84 mdancho84 commented Jul 2, 2020

I took a look at this. This one is complicated...

About scaling

For the Fourier terms to come out correctly, the time index has to be scaled so that consecutive observations are one unit apart before the sine and cosine are computed.

So in this case, rearranging by date is the wrong thing to do, because the data consists of several groups that each have their own date sequence. After sorting by date alone, the time difference between most consecutive observations is zero, when it should be the difference between the first and second date within each group.
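
As a rough sketch of why a zero scale factor yields NaN: the function below is hypothetical and is not timetk's implementation. It only mirrors the `sin(2 * pi * term * x)` form shown in the warnings, and assumes the index is divided by the scale factor before the sine is taken.

```r
# Hypothetical helper, not part of timetk: one sine term of a Fourier series
fourier_sin_sketch <- function(idx, period, K, scale_factor) {
  # Rescale the numeric index so neighbouring observations are one unit apart
  x <- as.numeric(idx) / scale_factor
  term <- K / period
  sin(2 * pi * term * x)
}

dates <- as.Date(c("2020-05-31", "2020-06-30"))

# A positive scale factor (30 days here) gives finite values
ok <- fourier_sin_sketch(dates, period = 12, K = 1, scale_factor = 30)

# A zero scale factor turns the index into Inf, and sin(Inf) is NaN
bad <- suppressWarnings(
  fourier_sin_sketch(dates, period = 12, K = 1, scale_factor = 0)
)

print(ok)
print(bad)
```

Without `suppressWarnings()`, the second call emits the same kind of "NaNs produced" warning seen in the reprex.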

Solution

I've added an error that is now raised when the time difference between observations is zero. Hopefully, this will point users in the right direction.

> bad_recipe_prepped <- prep(bad_recipe, train_data)
 Error: Problem with `mutate()` input `date_sin12_K1`.
x Time difference between observations is zero. Try arranging data to have a positive time difference between observations. If working with time series groups, arrange by groups first, then date.
ℹ Input `date_sin12_K1` is `timetk::fourier_vec(x = date, period = 12, K = 1L, type = "sin")`.
Run `rlang::last_error()` to see where the error occurred. 
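
Following the advice in the error message, a minimal sketch of restoring the group-then-date ordering before calling prep() might look like this. It uses base R `order()`; `dplyr::arrange(id, date)` should be equivalent. The gap check is an illustrative assumption, not the check timetk performs.

```r
# Panel data that has been sorted by date only (the problematic ordering)
dates <- as.Date(c("2020-05-31", "2020-06-30"))
by_date <- data.frame(id = rep(1:3, times = 2), date = rep(dates, each = 3))

# Re-sort by group first, then by date within each group
by_group <- by_date[order(by_date$id, by_date$date), ]

# Illustrative sanity check: median gap between consecutive rows, in days
gap_before <- median(as.numeric(diff(by_date$date)))
gap_after  <- median(as.numeric(diff(by_group$date)))

print(c(before = gap_before, after = gap_after))
```

Once the median gap is positive again, the reordered data can be passed to recipe() and prep() as in the working example from the reprex.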

@mdancho84 mdancho84 commented Jul 2, 2020

Closing this issue. The fix will be included in version 2.1.0.

@mdancho84 mdancho84 closed this Jul 2, 2020