convert tsfeatures lib to nbdev framework #39

Open
wants to merge 53 commits into
base: main
Changes from 8 commits
53 commits
8f0c1cc
convert lib to nbdev
jope35 Jan 4, 2024
61112fc
remove pre-commit and .ruff
jope35 Jan 10, 2024
a8d3b1e
update lincense
jope35 Jan 10, 2024
527f3d9
restore .github folder
jope35 Jan 10, 2024
3dd8d5f
remove ruff and pre-commit from dev reqs and add ryp2
jope35 Jan 10, 2024
5e03313
restore setup.py
jope35 Jan 10, 2024
2101882
restore .gitignore
jope35 Jan 10, 2024
528b541
create nbdev specific .gitconfig and .gitattributes
jope35 Jan 10, 2024
846508e
remove *.ipynb from gitignore
jope35 Jan 17, 2024
bf80383
align settings and setup
jope35 Jan 17, 2024
ed0266f
refactor: Restructure notebooks and utils
jope35 Jan 17, 2024
ddabac9
update settings with nixtla user
jope35 Jan 17, 2024
5d420e0
merged core and featrues into one notebook
jope35 Feb 4, 2024
5fe9589
descriptive stats on cluster results
jope35 Feb 9, 2024
cb60c1b
add github action for testing
jope35 Feb 9, 2024
c0eb867
modified setup
jope35 Feb 9, 2024
0135e1b
include a pip reqs file
jope35 Feb 9, 2024
2dd6a65
exclude rpy2
jope35 Feb 9, 2024
e9c1b94
run nbdev_export
jope35 Feb 9, 2024
29c2a98
incorporate pr feedback
jope35 Feb 19, 2024
022858c
adding shields
jope35 Feb 21, 2024
be7efc9
only run on ubuntu
jope35 Feb 21, 2024
74bf82f
explicit ci steps to install nbdev, move test features
jope35 Feb 21, 2024
b7b671f
ci mod
jope35 Feb 21, 2024
424331c
no pip cache
jope35 Feb 21, 2024
cbdb352
add import stmt
jope35 Feb 21, 2024
3a57f9c
add star import
jope35 Feb 21, 2024
2968544
small cleanup
jope35 Feb 24, 2024
5245275
cleanup notebooks
jope35 Feb 29, 2024
00de734
add os to runs-on
jope35 Feb 29, 2024
28f400d
change order of runs-on
jope35 Feb 29, 2024
4cd2e2c
matrix python version and os
jope35 Feb 29, 2024
cf77218
optimize ci file
jope35 Feb 29, 2024
824d7ce
no cache
jope35 Feb 29, 2024
88ae6b7
include tests and R CI
jope35 Apr 19, 2024
8704297
update readme
jope35 Apr 19, 2024
22a1767
alter rpy2 version
jope35 Apr 19, 2024
02f1a6b
debug R install
jope35 Apr 19, 2024
936f0ec
more debug
jope35 Apr 19, 2024
d679122
more debug2
jope35 Apr 19, 2024
80f5d96
one string
jope35 Apr 19, 2024
bff351c
asdf
jope35 Apr 19, 2024
8b9a01f
alter rpy2 version
jope35 Apr 19, 2024
9409776
update R version
jope35 Apr 19, 2024
7e92696
exclude the R dependency
jope35 Apr 19, 2024
00a9b9d
other R settings
jope35 Apr 23, 2024
b437028
explicit PATH
jope35 Apr 23, 2024
903876d
echo rhome
jope35 Apr 23, 2024
523ccf5
export R functions
jope35 Apr 23, 2024
d3c4324
include ubuntu
jope35 Apr 23, 2024
233ea75
include fastcore explicitly as a dev dep
jope35 Apr 23, 2024
180136a
no parallel
jope35 Apr 23, 2024
dbd5047
Merge branch 'main' into main
jope35 Apr 23, 2024
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.ipynb merge=nbdev-merge
11 changes: 11 additions & 0 deletions .gitconfig
@@ -0,0 +1,11 @@
# Generated by nbdev_install_hooks
#
# If you need to disable this instrumentation do:
# git config --local --unset include.path
#
# To restore:
# git config --local include.path ../.gitconfig
#
[merge "nbdev-merge"]
name = resolve conflicts with nbdev_fix
driver = nbdev_merge %O %A %B %P
5 changes: 5 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,5 @@
include settings.ini
include LICENSE
include CONTRIBUTING.md
include README.md
recursive-exclude * __pycache__
244 changes: 176 additions & 68 deletions README.md
@@ -1,75 +1,85 @@
[![Build](https://github.com/FedericoGarza/tsfeatures/workflows/Python%20package/badge.svg)](https://github.com/FedericoGarza/tsfeatures/tree/master)
[![PyPI version fury.io](https://badge.fury.io/py/tsfeatures.svg)](https://pypi.python.org/pypi/tsfeatures/)
[![Downloads](https://pepy.tech/badge/tsfeatures)](https://pepy.tech/project/tsfeatures)
[![Python 3.6+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-370+/)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/FedericoGarza/tsfeatures/blob/master/LICENSE)
jmoralez marked this conversation as resolved.
# tsfeatures

<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

# tsfeatures

Calculates various features from time series data. Python implementation of the R package _[tsfeatures](https://github.com/robjhyndman/tsfeatures)_.
Calculates various features from time series data. Python implementation
of the R package
*[tsfeatures](https://github.com/robjhyndman/tsfeatures)*.

# Installation

You can install the *released* version of `tsfeatures` from the [Python package index](pypi.org) with:
You can install the *released* version of `tsfeatures` from the [Python
package index](pypi.org) with:

``` python
pip install tsfeatures
```

# Usage

The `tsfeatures` main function calculates by default the features used by Montero-Manso, Talagala, Hyndman and Athanasopoulos in [their implementation of the FFORMA model](https://htmlpreview.github.io/?https://github.com/robjhyndman/M4metalearning/blob/master/docs/M4_methodology.html#features).
The `tsfeatures` main function calculates by default the features used
by Montero-Manso, Talagala, Hyndman and Athanasopoulos in [their
implementation of the FFORMA
model](https://htmlpreview.github.io/?https://github.com/robjhyndman/M4metalearning/blob/master/docs/M4_methodology.html#features).

```python
``` python
from tsfeatures import tsfeatures
```

This function receives a panel pandas df with columns `unique_id`, `ds`, `y` and optionally the frequency of the data.
This function receives a panel pandas df with columns `unique_id`, `ds`,
`y` and optionally the frequency of the data.

<img src=https://raw.githubusercontent.com/FedericoGarza/tsfeatures/master/.github/images/y_train.png width="152">

```python
``` python
tsfeatures(panel, freq=7)
```

By default (`freq=None`) the function will try to infer the frequency of each time series (using `infer_freq` from `pandas` on the `ds` column) and assign a seasonal period according to the built-in dictionary `FREQS`:
By default (`freq=None`) the function will try to infer the frequency of
each time series (using `infer_freq` from `pandas` on the `ds` column)
and assign a seasonal period according to the built-in dictionary
`FREQS`:

```python
``` python
FREQS = {'H': 24, 'D': 1,
'M': 12, 'Q': 4,
'W':1, 'Y': 1}
```

You can use your own dictionary using the `dict_freqs` argument:

```python
``` python
tsfeatures(panel, dict_freqs={'D': 7, 'W': 52})
```
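The lookup described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `resolve_seasonal_period` and its `default` parameter are hypothetical names, and stripping an anchor suffix such as `W-SUN` is an assumption about how inferred aliases might be matched against the dictionary keys. Only `pandas.infer_freq` and the `FREQS` values come from the README.

```python
import pandas as pd

# Copy of the built-in mapping shown in the README.
FREQS = {'H': 24, 'D': 1, 'M': 12, 'Q': 4, 'W': 1, 'Y': 1}


def resolve_seasonal_period(ds, dict_freqs=FREQS, default=1):
    """Hypothetical helper: infer the frequency alias of `ds` and map it
    to a seasonal period using `dict_freqs`."""
    alias = pd.infer_freq(pd.DatetimeIndex(ds))
    if alias is None:
        # The frequency could not be inferred (e.g. irregular timestamps).
        return default
    # Assumption: drop an anchor suffix such as 'W-SUN' before the lookup.
    key = alias.split('-')[0]
    return dict_freqs.get(key, default)


daily = pd.date_range('2024-01-01', periods=10, freq='D')
builtin_period = resolve_seasonal_period(daily)
custom_period = resolve_seasonal_period(daily, dict_freqs={'D': 7, 'W': 52})
```

With the built-in mapping a daily series gets a seasonal period of 1, while the custom dictionary assigns it 7. Note that the alias strings returned by `infer_freq` can differ between pandas versions, so the dictionary keys may need to match the installed version.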

## List of available features

| Features |||
|:--------|:------|:-------------|
|acf_features|heterogeneity|series_length|
|arch_stat|holt_parameters|sparsity|
|count_entropy|hurst|stability|
|crossing_points|hw_parameters|stl_features|
|entropy|intervals|unitroot_kpss|
|flat_spots|lumpiness|unitroot_pp|
|frequency|nonlinearity||
|guerrero|pacf_features||
| Features | | |
|:----------------|:----------------|:--------------|
| acf_features | heterogeneity | series_length |
| arch_stat | holt_parameters | sparsity |
| count_entropy | hurst | stability |
| crossing_points | hw_parameters | stl_features |
| entropy | intervals | unitroot_kpss |
| flat_spots | lumpiness | unitroot_pp |
| frequency | nonlinearity | |
| guerrero | pacf_features | |

See the docs for a description of the features. To use a particular feature included in the package you need to import it:
See the docs for a description of the features. To use a particular
feature included in the package you need to import it:

```python
``` python
from tsfeatures import acf_features

tsfeatures(panel, freq=7, features=[acf_features])
```

You can also define your own function and use it together with the included features:
You can also define your own function and use it together with the
included features:

```python
``` python
def number_zeros(x, freq):

number = (x == 0).sum()
@@ -78,36 +88,41 @@ def number_zeros(x, freq):
tsfeatures(panel, freq=7, features=[acf_features, number_zeros])
```

`tsfeatures` can handle functions that receives a numpy array `x` and a frequency `freq` (this parameter is needed even if you don't use it) and returns a dictionary with the feature name as a key and its value.
`tsfeatures` can handle functions that receives a numpy array `x` and a
frequency `freq` (this parameter is needed even if you don’t use it) and
returns a dictionary with the feature name as a key and its value.
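To make this contract concrete, here is a small sketch that calls feature functions directly on a numpy array, without going through `tsfeatures`. `number_zeros` mirrors the README example; `zero_ratio` and the merging loop are hypothetical additions for illustration only and do not claim to reflect how the library combines results internally.

```python
import numpy as np


def number_zeros(x, freq):
    """Count zero-valued observations; `freq` is unused but required."""
    return {'number_zeros': int((x == 0).sum())}


def zero_ratio(x, freq):
    """Hypothetical feature: share of observations equal to zero."""
    return {'zero_ratio': float((x == 0).mean())}


series = np.array([0.0, 1.5, 0.0, 3.0])

# Each function returns a dict; merging them gives one row of features.
features = {}
for func in (number_zeros, zero_ratio):
    features.update(func(series, freq=7))
```

Because every function accepts the same `(x, freq)` pair and returns a dictionary, custom and built-in features can be mixed freely in the `features` list.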

## R implementation

You can use this package to call `tsfeatures` from R inside python (you need to have installed R, the packages `forecast` and `tsfeatures`; also the python package `rpy2`):
You can use this package to call `tsfeatures` from R inside python (you
need to have installed R, the packages `forecast` and `tsfeatures`; also
the python package `rpy2`):

```python
``` python
from tsfeatures.tsfeatures_r import tsfeatures_r

tsfeatures_r(panel, freq=7, features=["acf_features"])
```

Observe that this function receives a list of strings instead of a list of functions.
Observe that this function receives a list of strings instead of a list
of functions.

## Comparison with the R implementation (sum of absolute differences)

### Non-seasonal data (100 Daily M4 time series)

| feature | diff | feature | diff | feature | diff | feature | diff |
|:----------------|-------:|:----------------|-------:|:----------------|-------:|:----------------|-------:|
| e_acf10 | 0 | e_acf1 | 0 | diff2_acf1 | 0 | alpha | 3.2 |
| seasonal_period | 0 | spike | 0 | diff1_acf10 | 0 | arch_acf | 3.3 |
| nperiods | 0 | curvature | 0 | x_acf1 | 0 | beta | 4.04 |
| linearity | 0 | crossing_points | 0 | nonlinearity | 0 | garch_r2 | 4.74 |
| hw_gamma | 0 | lumpiness | 0 | diff2x_pacf5 | 0 | hurst | 5.45 |
| hw_beta | 0 | diff1x_pacf5 | 0 | unitroot_kpss | 0 | garch_acf | 5.53 |
| hw_alpha | 0 | diff1_acf10 | 0 | x_pacf5 | 0 | entropy | 11.65 |
| trend | 0 | arch_lm | 0 | x_acf10 | 0 |
| flat_spots | 0 | diff1_acf1 | 0 | unitroot_pp | 0 |
| series_length | 0 | stability | 0 | arch_r2 | 1.37 |
| feature | diff | feature | diff | feature | diff | feature | diff |
|:----------------|-----:|:----------------|-----:|:--------------|-----:|:----------|------:|
| e_acf10 | 0 | e_acf1 | 0 | diff2_acf1 | 0 | alpha | 3.2 |
| seasonal_period | 0 | spike | 0 | diff1_acf10 | 0 | arch_acf | 3.3 |
| nperiods | 0 | curvature | 0 | x_acf1 | 0 | beta | 4.04 |
| linearity | 0 | crossing_points | 0 | nonlinearity | 0 | garch_r2 | 4.74 |
| hw_gamma | 0 | lumpiness | 0 | diff2x_pacf5 | 0 | hurst | 5.45 |
| hw_beta | 0 | diff1x_pacf5 | 0 | unitroot_kpss | 0 | garch_acf | 5.53 |
| hw_alpha | 0 | diff1_acf10 | 0 | x_pacf5 | 0 | entropy | 11.65 |
| trend | 0 | arch_lm | 0 | x_acf10 | 0 | | |
| flat_spots | 0 | diff1_acf1 | 0 | unitroot_pp | 0 | | |
| series_length | 0 | stability | 0 | arch_r2 | 1.37 | | |

To replicate this results use:

@@ -118,33 +133,126 @@ python -m tsfeatures.compare_with_r --results_directory /some/path

### Sesonal data (100 Hourly M4 time series)

| feature | diff | feature | diff | feature | diff | feature | diff |
|:------------------|-------:|:-------------|-----:|:----------|--------:|:-----------|--------:|
| series_length | 0 |seas_acf1 | 0 | trend | 2.28 | hurst | 26.02 |
| flat_spots | 0 |x_acf1|0| arch_r2 | 2.29 | hw_beta | 32.39 |
| nperiods | 0 |unitroot_kpss|0| alpha | 2.52 | trough | 35 |
| crossing_points | 0 |nonlinearity|0| beta | 3.67 | peak | 69 |
| seasonal_period | 0 |diff1_acf10|0| linearity | 3.97 |
| lumpiness | 0 |x_acf10|0| curvature | 4.8 |
| stability | 0 |seas_pacf|0| e_acf10 | 7.05 |
| arch_lm | 0 |unitroot_pp|0| garch_r2 | 7.32 |
| diff2_acf1 | 0 |spike|0| hw_gamma | 7.32 |
| diff2_acf10 | 0 |seasonal_strength|0.79| hw_alpha | 7.47 |
| diff1_acf1 | 0 |e_acf1|1.67| garch_acf | 7.53 |
| diff2x_pacf5 | 0 |arch_acf|2.18| entropy | 9.45 |
| feature | diff | feature | diff | feature | diff | feature | diff |
|:----------------|-----:|:------------------|-----:|:----------|-----:|:--------|------:|
| series_length | 0 | seas_acf1 | 0 | trend | 2.28 | hurst | 26.02 |
| flat_spots | 0 | x_acf1 | 0 | arch_r2 | 2.29 | hw_beta | 32.39 |
| nperiods | 0 | unitroot_kpss | 0 | alpha | 2.52 | trough | 35 |
| crossing_points | 0 | nonlinearity | 0 | beta | 3.67 | peak | 69 |
| seasonal_period | 0 | diff1_acf10 | 0 | linearity | 3.97 | | |
| lumpiness | 0 | x_acf10 | 0 | curvature | 4.8 | | |
| stability | 0 | seas_pacf | 0 | e_acf10 | 7.05 | | |
| arch_lm | 0 | unitroot_pp | 0 | garch_r2 | 7.32 | | |
| diff2_acf1 | 0 | spike | 0 | hw_gamma | 7.32 | | |
| diff2_acf10 | 0 | seasonal_strength | 0.79 | hw_alpha | 7.47 | | |
| diff1_acf1 | 0 | e_acf1 | 1.67 | garch_acf | 7.53 | | |
| diff2x_pacf5 | 0 | arch_acf | 2.18 | entropy | 9.45 | | |

To replicate this results use:
[![Build](https://github.com/FedericoGarza/tsfeatures/workflows/Python%20package/badge.svg)](https://github.com/FedericoGarza/tsfeatures/tree/master)
Member:
Seems like from this point onwards this is duplicated

Author:
de-duplicated

Member:
This is still duplicated

[![PyPI version
fury.io](https://badge.fury.io/py/tsfeatures.svg)](https://pypi.python.org/pypi/tsfeatures/)
[![Downloads](https://pepy.tech/badge/tsfeatures.png)](https://pepy.tech/project/tsfeatures)
[![Python
3.6+](https://img.shields.io/badge/python-3.7+-blue.svg)](https://www.python.org/downloads/release/python-370+/)
[![License:
MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://github.com/FedericoGarza/tsfeatures/blob/master/LICENSE)

``` console
python -m tsfeatures.compare_with_r --results_directory /some/path \
--dataset_name Hourly --num_obs 100
# tsfeatures

Calculates various features from time series data. Python implementation
of the R package
*[tsfeatures](https://github.com/robjhyndman/tsfeatures)*.

# Installation

You can install the *released* version of `tsfeatures` from the [Python
package index](pypi.org) with:

``` python
pip install tsfeatures
```

# Usage

The `tsfeatures` main function calculates by default the features used
by Montero-Manso, Talagala, Hyndman and Athanasopoulos in [their
implementation of the FFORMA
model](https://htmlpreview.github.io/?https://github.com/robjhyndman/M4metalearning/blob/master/docs/M4_methodology.html#features).

``` python
from tsfeatures import tsfeatures
```

This function receives a panel pandas df with columns `unique_id`, `ds`,
`y` and optionally the frequency of the data.

<img src=https://raw.githubusercontent.com/FedericoGarza/tsfeatures/master/.github/images/y_train.png width="152">

``` python
tsfeatures(panel, freq=7)
```

By default (`freq=None`) the function will try to infer the frequency of
each time series (using `infer_freq` from `pandas` on the `ds` column)
and assign a seasonal period according to the built-in dictionary
`FREQS`:

``` python
FREQS = {'H': 24, 'D': 1,
'M': 12, 'Q': 4,
'W':1, 'Y': 1}
```

You can use your own dictionary using the `dict_freqs` argument:

``` python
tsfeatures(panel, dict_freqs={'D': 7, 'W': 52})
```

## List of available features

| Features | | |
|:----------------|:----------------|:--------------|
| acf_features | heterogeneity | series_length |
| arch_stat | holt_parameters | sparsity |
| count_entropy | hurst | stability |
| crossing_points | hw_parameters | stl_features |
| entropy | intervals | unitroot_kpss |
| flat_spots | lumpiness | unitroot_pp |
| frequency | nonlinearity | |
| guerrero | pacf_features | |

See the docs for a description of the features. To use a particular
feature included in the package you need to import it:

``` python
from tsfeatures import acf_features

tsfeatures(panel, freq=7, features=[acf_features])
```

You can also define your own function and use it together with the
included features:

``` python
def number_zeros(x, freq):

number = (x == 0).sum()
return {'number_zeros': number}

tsfeatures(panel, freq=7, features=[acf_features, number_zeros])
```

`tsfeatures` can handle functions that receives a numpy array `x` and a
frequency `freq` (this parameter is needed even if you don’t use it) and
returns a dictionary with the feature name as a key and its value.

# Authors

* **Federico Garza** - [FedericoGarza](https://github.com/FedericoGarza)
* **Kin Gutierrez** - [kdgutier](https://github.com/kdgutier)
* **Cristian Challu** - [cristianchallu](https://github.com/cristianchallu)
* **Jose Moralez** - [jose-moralez](https://github.com/jose-moralez)
* **Ricardo Olivares** - [rolivaresar](https://github.com/rolivaresar)
* **Max Mergenthaler** - [mergenthaler](https://github.com/mergenthaler)
- **Federico Garza** - [FedericoGarza](https://github.com/FedericoGarza)
- **Kin Gutierrez** - [kdgutier](https://github.com/kdgutier)
- **Cristian Challu** -
[cristianchallu](https://github.com/cristianchallu)
- **Jose Moralez** - [jose-moralez](https://github.com/jose-moralez)
- **Ricardo Olivares** - [rolivaresar](https://github.com/rolivaresar)
- **Max Mergenthaler** - [mergenthaler](https://github.com/mergenthaler)
1 change: 1 addition & 0 deletions nbs/.gitignore
@@ -0,0 +1 @@
/.quarto/