Update flight #175

Merged
merged 11 commits into from
Jun 26, 2018

Seth-Rothschild
Contributor

Rework load_flight to use the newly uploaded 2017 CSVs. New functionality:

  1. Accepts a month_filter to restrict which months are loaded (e.g. [3, 5] loads only March and May).
  2. Accepts a categorical_filter to restrict which categorical values are loaded. For example, {'dest_city': ['Boston, MA'], 'origin_city': ['Washington, DC']} loads all flights into Boston and all flights out of Washington, DC.
  3. Adds an optional verbose argument that shows a progress bar while the data loads.
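
To make the categorical_filter semantics concrete, here is a minimal pandas sketch. It assumes, as the Boston/Washington example implies, that a row is kept when it matches any of the listed columns (a union, not an intersection). The helper apply_categorical_filter and the toy rows are hypothetical and not taken from the PR's code.

```python
import pandas as pd


def apply_categorical_filter(df, categorical_filter):
    """Hypothetical helper: keep rows whose value in ANY listed column is allowed.

    Assumes union (OR) semantics across columns, as the PR's example suggests.
    """
    keep = pd.Series(False, index=df.index)
    for column, allowed in categorical_filter.items():
        # isin builds one boolean mask per column; | combines them as a union
        keep = keep | df[column].isin(allowed)
    return df[keep]


# Toy data, made up for illustration (not from the flight dataset)
flights = pd.DataFrame({
    'origin_city': ['Washington, DC', 'Chicago, IL', 'Denver, CO', 'Washington, DC'],
    'dest_city': ['Atlanta, GA', 'Boston, MA', 'Seattle, WA', 'Boston, MA'],
})

subset = apply_categorical_filter(
    flights,
    {'dest_city': ['Boston, MA'], 'origin_city': ['Washington, DC']},
)
print(subset.index.tolist())  # [0, 1, 3]
```

Row 2 is dropped because it neither leaves Washington nor arrives in Boston; row 3 matches both conditions and appears once.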

if demo:
    filename = 'flight_dataset_sample.csv.zip'
    key = 'bots_flight_data_2017/data_2017_jan_feb.csv.zip'
    rows = 860457.0
Contributor

Does this have to be a float?

Contributor Author

We compute rows / 99 earlier. I'm not sure whether the cleanest fix is this, writing 99. instead, or using float(rows) when we divide.

Contributor

Not sure it matters, since you take the ceil of it, but move any casting as close as possible to where it is required.
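
A likely reason for the float literal: on Python 2, which featuretools presumably still supported at the time, dividing two ints floors the result, so ceil(rows / 99) would round down before the ceil ever ran. A minimal sketch of the reviewer's advice keeps rows an int and converts only at the division; the helper names ceil_divide and ceil_divide_int are hypothetical.

```python
import math


def ceil_divide(rows, divisor=99):
    """Hypothetical helper: ceil(rows / divisor) returned as an int.

    rows stays an int everywhere else; the float conversion happens only here,
    so the division is true division on both Python 2 and Python 3.
    """
    return int(math.ceil(rows / float(divisor)))


def ceil_divide_int(rows, divisor=99):
    """Same result with integer arithmetic only (negated floor division)."""
    return -(-rows // divisor)


rows = 860457  # an int, as the reviewer suggests
print(ceil_divide(rows), ceil_divide_int(rows))  # 8692 8692
```

The integer-only form avoids floats entirely and gives the same answer on either Python version.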

if month_filter is not None:
    tmp = False
    for month in month_filter:
        tmp = tmp | (clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month)
Contributor

Can this be clean_data['scheduled_dep_time'].dt.month == month? If that works, it's probably much faster than apply.

Contributor

Also, try using isin here rather than iterating over each month.
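
Both review suggestions can be sketched together: the .dt.month accessor replaces the per-row apply, and a single isin replaces the loop over months. The toy DataFrame below is made up for illustration; the block checks that the vectorized mask matches the original loop rather than reproducing the merged code.

```python
import pandas as pd

# Toy departure times, made up for illustration
clean_data = pd.DataFrame({
    'scheduled_dep_time': pd.to_datetime([
        '2017-01-15 08:00', '2017-03-02 13:30', '2017-05-20 19:45', '2017-06-01 07:10',
    ]),
})
month_filter = [3, 5]  # March and May, as in the PR description

# Original approach: a Python-level apply for every month, OR-ed together
loop_mask = False
for month in month_filter:
    loop_mask = loop_mask | (clean_data['scheduled_dep_time'].apply(lambda x: x.month) == month)

# Suggested approach: vectorized .dt accessor plus a single isin
isin_mask = clean_data['scheduled_dep_time'].dt.month.isin(month_filter)

assert loop_mask.tolist() == isin_mask.tolist()
print(clean_data[isin_mask].index.tolist())  # [1, 2]
```

The vectorized version also extracts the month once instead of once per entry in month_filter.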

codecov-io commented Jun 25, 2018

Codecov Report

Merging #175 into master will increase coverage by 0.28%.
The diff coverage is 89.01%.

@@            Coverage Diff             @@
##           master     #175      +/-   ##
==========================================
+ Coverage   92.86%   93.15%   +0.28%     
==========================================
  Files          69       70       +1     
  Lines        7623     7551      -72     
==========================================
- Hits         7079     7034      -45     
+ Misses        544      517      -27
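
The diff box is consistent with coverage = hits / lines, where every tracked line is either a hit or a miss. A quick check, using only the numbers reported above (coverage_pct is a hypothetical helper), reproduces both percentages:

```python
def coverage_pct(hits, lines):
    """Hypothetical helper: percentage of tracked lines executed by the tests."""
    return 100.0 * hits / lines


# Figures copied from the Codecov diff (master vs. #175)
for label, lines, hits, misses in (('master', 7623, 7079, 544), ('#175', 7551, 7034, 517)):
    assert hits + misses == lines  # every tracked line is a hit or a miss here
    print(label, round(coverage_pct(hits, lines), 2))
# master 92.86
# #175 93.15
```

The unrounded change is about 0.29 points; the +0.28% shown presumably reflects how Codecov rounds, which this sketch does not model.
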
Impacted Files Coverage Δ
featuretools/tests/demo_tests/test_demo_data.py 100% <100%> (ø) ⬆️
featuretools/demo/flight.py 88.5% <87.65%> (+65.92%) ⬆️
featuretools/entityset/entity.py 85.81% <0%> (-1.44%) ⬇️
featuretools/utils/wrangle.py 66.98% <0%> (-0.91%) ⬇️
featuretools/variable_types/variable.py 91.66% <0%> (-0.65%) ⬇️
featuretools/tests/dfs_tests/test_dfs_method.py 98.07% <0%> (-0.34%) ⬇️
featuretools/entityset/entityset.py 93.55% <0%> (-0.22%) ⬇️
featuretools/tests/entityset_tests/test_entity.py 100% <0%> (ø) ⬆️
...ols/tests/feature_function_tests/test_agg_feats.py 98.52% <0%> (ø) ⬆️
featuretools/synthesis/dfs.py 100% <0%> (ø) ⬆️
... and 11 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b2aead5...2301f68.

Contributor

kmax12 commented Jun 25, 2018

Looks good to me

kmax12 merged commit f224e22 into master on Jun 26, 2018
rwedge mentioned this pull request on Jul 2, 2018
Seth-Rothschild deleted the update-flight branch on August 10, 2018 20:00

3 participants