Improve handling time documentation page #512

Merged
merged 14 commits into from
May 3, 2019

Conversation

CharlesBradshaw (Contributor) commented Apr 25, 2019

Improved the handling time documentation page


codecov bot commented Apr 25, 2019

Codecov Report

Merging #512 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #512      +/-   ##
=========================================
+ Coverage    96.1%   96.1%   +<.01%     
=========================================
  Files         108     108              
  Lines        8898    8900       +2     
=========================================
+ Hits         8551    8553       +2     
  Misses        347     347
Impacted Files Coverage Δ
...turetools/computational_backends/pandas_backend.py 98.07% <ø> (ø) ⬆️
featuretools/entityset/entity.py 96.1% <ø> (ø) ⬆️
...computational_backends/calculate_feature_matrix.py 97.08% <ø> (ø) ⬆️
featuretools/synthesis/dfs.py 100% <ø> (ø) ⬆️
featuretools/tests/demo_tests/test_demo_data.py 100% <100%> (ø) ⬆️
featuretools/demo/flight.py 95.06% <100%> (+0.12%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 269307e...e1b6e7e.

kmax12 changed the title from "Cutoff time docs" to "Improved handling time documentation page" on Apr 30, 2019
kmax12 changed the title from "Improved handling time documentation page" to "Improve handling time documentation page" on Apr 30, 2019

codecov bot commented May 2, 2019

Codecov Report

Merging #512 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master    #512      +/-   ##
=========================================
+ Coverage    96.1%   96.1%   +<.01%     
=========================================
  Files         108     108              
  Lines        8913    8915       +2     
=========================================
+ Hits         8566    8568       +2     
  Misses        347     347
Impacted Files Coverage Δ
...turetools/computational_backends/pandas_backend.py 98.07% <ø> (ø) ⬆️
featuretools/entityset/entity.py 96.1% <ø> (ø) ⬆️
...computational_backends/calculate_feature_matrix.py 97.09% <ø> (ø) ⬆️
featuretools/synthesis/dfs.py 100% <ø> (ø) ⬆️
featuretools/tests/demo_tests/test_demo_data.py 100% <100%> (ø) ⬆️
featuretools/demo/flight.py 95.06% <100%> (+0.12%) ⬆️

Last update 26dd292...9151dcd.

@@ -163,13 +164,34 @@ def _clean_data(data):
clean_data.loc[:, 'flight_id'] = clean_data['carrier'] + '-' + \
clean_data['flight_num'].apply(lambda x: str(x)) + ':' + clean_data['origin'] + '->' + clean_data['dest']

column_order = [

Updated the column order to improve the printout in the docs.

@@ -148,7 +149,7 @@ def _clean_data(data):
clean_data = _reconstruct_times(clean_data)

# Create a time index 6 months before scheduled_dep
-clean_data.loc[:, 'time_index'] = clean_data['scheduled_dep_time'] - \
+clean_data.loc[:, 'date_scheduled'] = clean_data['scheduled_dep_time'].dt.date - \

Renamed to something more meaningful.
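For readers following the rename, here is a minimal stdlib-only sketch of what a `date_scheduled` value represents: the calendar date a fixed offset before the scheduled departure. The offset is cut off in the diff above, so the helper takes it as a parameter, and the 180-day value in the usage lines is only an assumption standing in for the "6 months" mentioned in the code comment. The function name mirrors the column name but is otherwise hypothetical.

```python
from datetime import datetime, timedelta


def date_scheduled(scheduled_dep_time, offset):
    """Return the calendar date that lies `offset` before the scheduled departure."""
    # .date() drops the time of day, mirroring the `.dt.date` call in the diff.
    return scheduled_dep_time.date() - offset


# Hypothetical departure time and an ASSUMED offset (the real value is truncated in the diff).
dep = datetime(2017, 1, 31, 14, 5)
assumed_offset = timedelta(days=180)
result = date_scheduled(dep, assumed_offset)
```

Passing the offset explicitly keeps the sketch honest about the value it does not know.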


Let's make features at some varying times in the flight example. Trip ``14`` is a flight from CLT to PHX on January 31, 2017, and trip ``92`` is a flight from PIT to DFW on January 1. We can set any cutoff time before the flight is scheduled to depart, emulating how we would make the prediction at that point in time.
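
To make the idea of a cutoff time concrete, here is a minimal stdlib-only sketch of the principle, not featuretools' own implementation: only records known at or before the cutoff may contribute to a feature. The record layout, the `count_before_cutoff` helper, the inclusive comparison, and all sample timestamps are hypothetical.

```python
from datetime import datetime

# Hypothetical event log: (trip_id, time the record became known).
records = [
    (14, datetime(2016, 12, 1)),
    (14, datetime(2017, 1, 15)),
    (14, datetime(2017, 2, 2)),   # after the cutoff below, so it must not leak in
    (92, datetime(2016, 11, 20)),
]


def count_before_cutoff(records, trip_id, cutoff):
    """Count records for `trip_id` that were known at or before `cutoff`."""
    return sum(1 for tid, ts in records if tid == trip_id and ts <= cutoff)


# Any cutoff before the scheduled departure is valid; this particular one is an assumption.
cutoff = datetime(2017, 1, 20)
n_known = count_before_cutoff(records, 14, cutoff)
```

Moving `cutoff` earlier or later changes which records are visible, which is how a prediction made at that point in time is emulated.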
In this computation, features that can be approximated will be calculated at 1-day intervals, while features that cannot be approximated (e.g., "what is the destination of this flight?") will be calculated at the exact cutoff time.

The flight destination example was a little disorienting, since the rest of this section is talking about a fraud detection problem.
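
For context on the "1-day intervals" wording under review, here is a rough stdlib-only sketch of the general idea behind approximation: cutoff times are rounded down to the start of a fixed window, so cutoffs in the same window can share one computation of an approximable feature. This is an illustration under stated assumptions, not featuretools' implementation. The `window_start` helper, the 1970-01-01 origin, and the sample cutoff times are hypothetical, and the library may anchor its windows differently.

```python
from datetime import datetime, timedelta


def window_start(cutoff, window=timedelta(days=1)):
    """Round `cutoff` down to the start of its window, counted from a fixed origin."""
    origin = datetime(1970, 1, 1)  # assumed anchor for the windows
    n_windows = (cutoff - origin) // window  # whole windows elapsed since the origin
    return origin + n_windows * window


# Hypothetical cutoff times for three predictions.
cutoffs = [
    datetime(2017, 1, 1, 9, 30),
    datetime(2017, 1, 1, 18, 45),
    datetime(2017, 1, 2, 7, 0),
]

# Cutoffs that share a window can reuse a single approximate calculation.
groups = {}
for c in cutoffs:
    groups.setdefault(window_start(c), []).append(c)
```

Here the two January 1 cutoffs land in the same group, while a feature that cannot be approximated would still be computed separately at each of the three exact cutoff times.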

kmax12 merged commit 7e1e47a into master on May 3, 2019
gsheni deleted the cutoff_time_docs branch on May 3, 2019 19:08
rwedge mentioned this pull request on May 17, 2019
3 participants