Add RateOfChange primitive #2359
def rate_of_change(values, time):
    time_delta = time.diff().dt.total_seconds()
    value_delta = values.diff()
    return value_delta / time_delta
Curious, what is the difference between your approach and this pandas function:
Based on a quick read of the documentation, that function appears to just be calculating a percent change between consecutive values. This primitive calculates the rate of change between points (think velocity).
Take this dataframe as an example:
values = pd.Series([0, 30, 180, -90, 0, 240])
time = pd.Series(pd.date_range(start='2019-01-01', freq='1min', periods=6))
df = pd.DataFrame({"values": values, "time": time})
If you use the pandas pct_change method you get this:
0 NaN
1 inf
2 5.0
3 -1.5
4 -1.0
5 inf
If you compute the rate of change using the elapsed time between points from the time index you get this:
0 NaN
1 0.5
2 2.5
3 -4.5
4 1.5
5 4.0
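For anyone who wants to reproduce the two outputs above, here is a minimal, self-contained sketch. It reuses the `rate_of_change` function from the diff and the example series from this comment; the variable names `pct` and `rate` are just for illustration.

```python
import pandas as pd

# Example data from the comment above: six values sampled one minute apart.
values = pd.Series([0, 30, 180, -90, 0, 240])
time = pd.Series(pd.date_range(start="2019-01-01", freq="1min", periods=6))


def rate_of_change(values, time):
    # Elapsed seconds between consecutive timestamps (first entry is NaN).
    time_delta = time.diff().dt.total_seconds()
    # Difference between consecutive values (first entry is NaN).
    value_delta = values.diff()
    return value_delta / time_delta


# Relative change versus the previous value; ignores the timestamps entirely.
pct = values.pct_change()
# Change in value per elapsed second.
rate = rate_of_change(values, time)

print(pd.DataFrame({"pct_change": pct, "rate_of_change": rate}))
```

Note that `pct_change` returns `inf` wherever the previous value is 0, while the rate of change stays finite as long as the timestamps are distinct.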
If the values were a distance measurement recorded in meters, the pandas function just tells you how much a value increased or decreased as a percentage of the previous one (%). This primitive will give you the rate of change of distance, which is a velocity in this case measured in units of meters per second.
Said another way, this primitive is basically giving you the first derivative with respect to time of a set of values that are ordered by time.
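Written out as a backward finite difference, that would look roughly like the following. The symbols $v_i$ and $t_i$ (the i-th value and its timestamp in seconds) and $r_i$ (the output) are my own notation here, not names from the code.

```latex
r_i = \frac{v_i - v_{i-1}}{t_i - t_{i-1}}, \qquad i \ge 1
```

The first entry $r_0$ has no predecessor, so it comes out as NaN. For the example data, $r_1 = (30 - 0) / 60\,\mathrm{s} = 0.5$ units per second.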
Codecov Report
@@           Coverage Diff           @@
##             main    #2359   +/-   ##
=======================================
  Coverage   99.48%   99.49%
=======================================
  Files         309      310    +1
  Lines       19781    19818   +37
=======================================
+ Hits        19680    19717   +37
  Misses        101      101
featuretools/primitives/standard/transform/numeric/rate_of_change.py
…nge.py Co-authored-by: Shripad Badithe <60528327+sbadithe@users.noreply.github.com>
class RateOfChange(TransformPrimitive):
    """Computes the rate of change of a value per second.
I think it may be interesting in the future to make the unit of time configurable.
I also thought about that, but I don't think having a configurable unit would add predictive value since changing the unit effectively just scales the output by a fixed value. Might be useful in other circumstances though. My preference is to keep this simple until we identify a specific need for the additional complexity.
I agree. I think it would be more of a convenience for users who want to examine their feature matrix, for example when their data is sampled daily.
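To illustrate the scaling point, here is a rough sketch with daily data. This is not part of the PR: the constant `SECONDS_PER_DAY` and the variable names are hypothetical, and the conversion is done by hand outside the primitive.

```python
import pandas as pd

# Hypothetical daily samples; same values as the earlier example.
values = pd.Series([0, 30, 180, -90, 0, 240])
time = pd.Series(pd.date_range(start="2019-01-01", freq="D", periods=6))

# Per-second rate, computed the same way as in the diff.
per_second = values.diff() / time.diff().dt.total_seconds()

# Converting to a per-day rate is a multiplication by a fixed constant.
SECONDS_PER_DAY = 24 * 60 * 60
per_day = per_second * SECONDS_PER_DAY

# Same result computed directly with elapsed time expressed in days.
per_day_direct = values.diff() / (time.diff() / pd.Timedelta(days=1))

print(pd.DataFrame({"per_second": per_second, "per_day": per_day}))
```

Since the two columns differ only by a constant factor, the choice of unit mostly affects how readable the feature matrix is.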
    ColumnSchema(semantic_tags={"numeric"}),
    ColumnSchema(semantic_tags={"time_index"}),
]
return_type = ColumnSchema(logical_type=Double, semantic_tags={"numeric"})
Is there a rate semantic tag? Might be something to look at in the future.
lgtm! Just left a few comments about potential functionality in the future.
Add RateOfChange primitive
Closes #2176
Adds a new RateOfChange primitive that calculates the rate of change of a value in units per second, using the time index and a list of numeric values.