Add RateOfChange primitive #2359

Merged
merged 12 commits into from
Nov 4, 2022

Conversation

thehomebrewnerd
Contributor

Add RateOfChange primitive

Closes #2176

Adds a new RateOfChange primitive that calculates the rate of change of a value in units per second, using the time index and a list of numeric values.

@thehomebrewnerd thehomebrewnerd marked this pull request as draft November 3, 2022 20:50
def rate_of_change(values, time):
    time_delta = time.diff().dt.total_seconds()
    value_delta = values.diff()
    return value_delta / time_delta
Contributor

Curious, what is the difference between your approach and this pandas function:

Contributor Author

Based on a quick read of the documentation, that function appears to just be calculating a percent change between consecutive values. This primitive calculates the rate of change between points (think velocity).

Take this dataframe as an example:

values = pd.Series([0, 30, 180, -90, 0, 240])
time = pd.Series(pd.date_range(start='2019-01-01', freq='1min', periods=6))
df = pd.DataFrame({"values": values, "time": time})

If you use the pandas pct_change method you get this:

0    NaN
1    inf
2    5.0
3   -1.5
4   -1.0
5    inf

If you compute the rate of change using the elapsed time between points from the time index you get this:

0    NaN
1    0.5
2    2.5
3   -4.5
4    1.5
5    4.0

Contributor Author

@thehomebrewnerd thehomebrewnerd Nov 3, 2022

If the values were a distance measurement recorded in meters, the pandas function just tells you how much a value increased or decreased as a percentage of the previous one (%). This primitive will give you the rate of change of distance, which in this case is a velocity measured in meters per second.

Said another way, this primitive is basically giving you the first derivative with respect to time of a set of values that are ordered by time.
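
To make the derivative interpretation concrete, here is a small sketch with unevenly spaced timestamps, where the elapsed time between rows matters. The distance readings and timestamps are made-up illustration data, and the calculation uses plain pandas rather than the primitive.

```python
import pandas as pd

# Hypothetical distance readings in meters at uneven intervals (0 s, 10 s, 40 s).
distance_m = pd.Series([0.0, 50.0, 110.0])
timestamps = pd.Series(
    pd.to_datetime(
        ["2019-01-01 00:00:00", "2019-01-01 00:00:10", "2019-01-01 00:00:40"]
    )
)

# Finite-difference approximation of d(distance)/dt in meters per second.
elapsed_s = timestamps.diff().dt.total_seconds()
velocity = distance_m.diff() / elapsed_s

print(velocity.tolist())
```

The second interval covers a larger change in distance (60 m versus 50 m) but yields a lower velocity, because it spans 30 seconds instead of 10.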

@codecov

codecov bot commented Nov 3, 2022

Codecov Report

Merging #2359 (a3e8aee) into main (25e70c9) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main    #2359   +/-   ##
=======================================
  Coverage   99.48%   99.49%           
=======================================
  Files         309      310    +1     
  Lines       19781    19818   +37     
=======================================
+ Hits        19680    19717   +37     
  Misses        101      101           

Impacted Files                                          Coverage Δ
.../primitives/standard/transform/numeric/__init__.py   100.00% <100.00%> (ø)
...tives/standard/transform/numeric/rate_of_change.py   100.00% <100.00%> (ø)
.../tests/primitive_tests/test_transform_primitive.py   100.00% <100.00%> (ø)

@thehomebrewnerd thehomebrewnerd marked this pull request as ready for review November 4, 2022 13:55
…nge.py

Co-authored-by: Shripad Badithe <60528327+sbadithe@users.noreply.github.com>


class RateOfChange(TransformPrimitive):
    """Computes the rate of change of a value per second.
Contributor

I think it may be interesting in the future to have the unit of time be configurable.

Contributor Author

I also thought about that, but I don't think having a configurable unit would add predictive value since changing the unit effectively just scales the output by a fixed value. Might be useful in other circumstances though. My preference is to keep this simple until we identify a specific need for the additional complexity.

Contributor

I agree. I think it would be more of a convenience for the user if they wanted to examine their feature matrix, for example, if the user had data sampled daily.
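
The scaling argument in this thread can be illustrated with a short sketch. The helper `rate_of_change_per_unit` and its `unit_seconds` parameter are hypothetical and are not part of this PR; the merged primitive reports per-second rates only. The data is made up for illustration.

```python
import pandas as pd


def rate_of_change_per_unit(values, time, unit_seconds=1.0):
    """Hypothetical helper: rate of change per `unit_seconds` of elapsed time."""
    elapsed_units = time.diff().dt.total_seconds() / unit_seconds
    return values.diff() / elapsed_units


# Made-up data sampled once per day.
values = pd.Series([100.0, 103.0, 109.0])
time = pd.Series(pd.date_range(start="2019-01-01", freq="D", periods=3))

per_second = rate_of_change_per_unit(values, time)
per_day = rate_of_change_per_unit(values, time, unit_seconds=86_400.0)

# The per-day feature is the per-second feature times a constant (86,400).
print(per_day.tolist())
print((per_day / per_second).dropna().round(6).tolist())
```

For daily-sampled data, per-day rates are easier to read in a feature matrix than the equivalent per-second fractions, which is the convenience point raised above. The constant ratio between the two columns is why a configurable unit would not be expected to add predictive information.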

        ColumnSchema(semantic_tags={"numeric"}),
        ColumnSchema(semantic_tags={"time_index"}),
    ]
    return_type = ColumnSchema(logical_type=Double, semantic_tags={"numeric"})
Contributor

@sbadithe sbadithe Nov 4, 2022

Is there a rate semantic tag? Might be something to look at in the future.

Contributor

@sbadithe sbadithe left a comment

lgtm! Just left a few comments about potential functionality in the future.

@thehomebrewnerd thehomebrewnerd merged commit 2678df3 into main Nov 4, 2022
@thehomebrewnerd thehomebrewnerd deleted the issue-2176-rate-of-change-primitive branch November 4, 2022 18:07
@gsheni gsheni mentioned this pull request Nov 15, 2022