Custom feature doesn't work #35

Open
GGA-PERSO opened this issue Jun 18, 2023 · 1 comment

GGA-PERSO commented Jun 18, 2023

What happened + What you expected to happen

As your docs mention, it should be possible to add a custom feature (I copy-pasted the function from the README),
but nothing happens, even after several long minutes.

Could you please check?

Versions / Dependencies

0.4.2 (the latest release)

Reproduction script

import pandas as pd
import numpy as np
from tsfeatures import tsfeatures

periods = 24
ind = pd.date_range(start='2021-01-01', periods=periods, freq='MS')
vals = np.random.rand(periods)
df = pd.DataFrame({'ds':ind, 'y':vals, 'unique_id':1})

def number_zeros(x, freq):
    number = (x == 0).sum()
    return {'number_zeros': number}

features_df = tsfeatures(df, freq=12, features=[number_zeros])
features_df

Issue Severity

None

GGA-PERSO added the bug label Jun 18, 2023

truonghm commented Aug 28, 2023

I'm having a similar issue. If I understand correctly, the number_zeros function is supposed to count the number of zeros in the series for each unique_id.

# Assumes `import tsfeatures as tsf` and a long-format DataFrame `data`
def number_zeros(x, freq):
    number = (x == 0).sum()
    return {'number_zeros': number}

features = tsf.tsfeatures(data, features=[tsf.stl_features, number_zeros], dict_freqs={'MS': 12})

The result is wrong: number_zeros should not be 0 for every series, because some unique_ids in the data do contain zeros.

   unique_id  number_zeros
0     282998             0
1     347809             0
2     489552             0
3     594474             0
4     594861             0
5     595209             0
6     595956             0
7     600426             0
8     600429             0
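
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: if each series is standardized (zero mean, unit variance) before it reaches the custom function, exact zeros in the raw data get shifted and no longer compare equal to 0. The sketch below uses only NumPy and a hypothetical `standardize` helper to show the effect; it does not call tsfeatures.

```python
import numpy as np

def number_zeros(x, freq):
    # Same feature as in the thread: count exact zeros in the series.
    number = (x == 0).sum()
    return {'number_zeros': number}

def standardize(x):
    # Hypothetical helper: z-score scaling, standing in for whatever
    # preprocessing might run before the feature function is called.
    return (x - x.mean()) / x.std(ddof=1)

# Made-up series containing three exact zeros.
raw = np.array([0.0, 3.0, 0.0, 5.0, 2.0, 0.0])

raw_count = number_zeros(raw, freq=12)['number_zeros']
scaled_count = number_zeros(standardize(raw), freq=12)['number_zeros']
print(raw_count, scaled_count)  # 3 0
```

If this turns out to be the cause, passing `scale=False` to `tsfeatures` may be worth trying, assuming the installed version exposes that parameter.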

Currently I'm having to do this instead:

features = pd.merge(
    data[["unique_id", "y"]].query("y>0").groupby("unique_id").count().reset_index(),
    features,
    how="left",
    on="unique_id",
)

features.rename(columns={"y": "series_length"}, inplace=True)
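
Note that the workaround above counts the observations with y > 0 per unique_id and stores them as series_length. If the goal is the zero count itself, it can be computed directly with a pandas groupby. This is a minimal sketch on made-up data with the same column names as in the thread (`unique_id`, `y`); it bypasses tsfeatures entirely.

```python
import pandas as pd

# Hypothetical long-format data with the same columns as in the thread.
data = pd.DataFrame({
    'unique_id': [1, 1, 1, 2, 2, 2],
    'y': [0.0, 4.0, 0.0, 1.0, 2.0, 3.0],
})

# Flag exact zeros, then sum the flags within each series.
zero_counts = (
    data['y'].eq(0)
    .groupby(data['unique_id'])
    .sum()
    .rename('number_zeros')
    .reset_index()
)
print(zero_counts)
```

The resulting frame has one row per unique_id and can be merged into the features table on `unique_id`, the same way the workaround does.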
