Update flight #175
featuretools/demo/flight.py
Outdated
if demo:
    filename = 'flight_dataset_sample.csv.zip'
    key = 'bots_flight_data_2017/data_2017_jan_feb.csv.zip'
    rows = 860457.0
does this have to be a float?
We do `rows / 99` earlier. Not sure if the cleanest solution is to do this, `99.`, or `float(rows)` when we divide.
not sure it matters because you take the `ceil` of it, but move any casting as close as possible to where it is required
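A minimal sketch of the suggestion to cast at the point of use, assuming the `99` is a step count (the name `steps` and the variable `rows_per_step` are hypothetical, not taken from the PR). One caveat worth noting: if this code runs under Python 2, dividing two ints truncates before `ceil` is applied, so the cast does change the result there.

```python
import math

rows = 860457  # kept as an int; the value comes from the diff hunk above
steps = 99     # assumed meaning of the 99 in `rows / 99`

# Cast at the division itself, so it is clear why a float is needed.
# Under Python 2, `rows / steps` with two ints would truncate before ceil runs.
rows_per_step = int(math.ceil(rows / float(steps)))
```

With this shape the constant stays an integer everywhere else it is used, and the float conversion lives on the one line that needs it.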
featuretools/demo/flight.py
Outdated
if month_filter is not None:
    tmp = False
    for month in month_filter:
        tmp = tmp | (clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month)
can this be `clean_data['scheduled_dep_time'].dt.month == month`? if that works, it's probably much faster than `apply`
also, you should try to use an `isin` here rather than iterate over each month
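The two suggestions above can be combined into a single vectorized mask. The sketch below uses a small made-up frame as a stand-in for the PR's `clean_data` (the sample timestamps are illustrative only) and compares the original loop against `.dt.month.isin(...)`.

```python
import pandas as pd

# Hypothetical stand-in for the PR's clean_data frame.
clean_data = pd.DataFrame({
    'scheduled_dep_time': pd.to_datetime([
        '2017-01-15 08:00', '2017-03-02 12:30',
        '2017-05-20 17:45', '2017-07-04 09:10',
    ]),
})
month_filter = [3, 5]

# Approach from the diff: one Python-level apply per month, OR-ed together.
loop_mask = pd.Series(False, index=clean_data.index)
for month in month_filter:
    loop_mask = loop_mask | (
        clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month
    )

# Reviewer's suggestion: vectorized month accessor plus a single isin call.
isin_mask = clean_data['scheduled_dep_time'].dt.month.isin(month_filter)

filtered = clean_data[isin_mask]
```

Both masks select the same rows; the `isin` version avoids the per-row lambda and the explicit loop over months.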
Codecov Report
@@ Coverage Diff @@
## master #175 +/- ##
==========================================
+ Coverage 92.86% 93.15% +0.28%
==========================================
Files 69 70 +1
Lines 7623 7551 -72
==========================================
- Hits 7079 7034 -45
+ Misses 544 517 -27
4989609 to 2301f68
Looks good to me
Rework the `load_flight` data to use newly uploaded CSVs from 2017. New functionality:
- `month_filter` to restrict which months are used (e.g. `[3, 5]` to use March and May)
- `categorical_filter` to restrict which categorical values are loaded. As an example, `{'dest_city': ['Boston, MA'], 'origin_city': ['Washington, DC']}` will take all flights into Boston and all flights out of Washington DC.
- `verbose` argument which gives a progress bar while loading the data.
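To illustrate how the `categorical_filter` example reads, here is a rough sketch that assumes the per-column conditions are OR-ed together, as the wording "all flights into Boston and all flights out of Washington DC" suggests. The helper `apply_categorical_filter` and the sample frame are hypothetical and are not the PR's implementation.

```python
import pandas as pd


def apply_categorical_filter(df, categorical_filter):
    """Keep rows that match the allowed values of any listed column.

    Hypothetical helper: assumes OR semantics across columns.
    """
    mask = pd.Series(False, index=df.index)
    for column, values in categorical_filter.items():
        mask = mask | df[column].isin(values)
    return df[mask]


# Made-up sample rows, for illustration only.
flights = pd.DataFrame({
    'origin_city': ['Washington, DC', 'Chicago, IL', 'Washington, DC', 'Denver, CO'],
    'dest_city': ['Boston, MA', 'Boston, MA', 'Denver, CO', 'Chicago, IL'],
})
categorical_filter = {
    'dest_city': ['Boston, MA'],
    'origin_city': ['Washington, DC'],
}

kept = apply_categorical_filter(flights, categorical_filter)
```

Under this reading, the Denver to Chicago row is the only one dropped, since it neither arrives in Boston nor departs from Washington DC.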