Fix Progress Bar in Notebook #932

Merged
merged 9 commits into from
Apr 29, 2020

@jeff-hernandez jeff-hernandez commented Apr 28, 2020

This closes #791 by preventing an interrupted progress bar from double printing when running in a Jupyter notebook.

codecov bot commented Apr 28, 2020

Codecov Report

Merging #932 into master will increase coverage by 0.04%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #932      +/-   ##
==========================================
+ Coverage   98.17%   98.22%   +0.04%     
==========================================
  Files         119      119              
  Lines       10867    10859       -8     
==========================================
- Hits        10669    10666       -3     
+ Misses        198      193       -5     
Impacted Files                                          Coverage Δ
...computational_backends/calculate_feature_matrix.py   98.58% <100.00%> (+<0.01%) ⬆️
featuretools/utils/gen_utils.py                         100.00% <100.00%> (+8.92%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@rwedge rwedge left a comment

Looks good

@jeff-hernandez jeff-hernandez merged commit 8cf9b7e into master Apr 29, 2020
@jeff-hernandez jeff-hernandez deleted the tqdm_notebook branch April 29, 2020 22:30

@casperdcl casperdcl left a comment

A couple of lines should be removed and replaced with tqdm>=4.45.0 in requirements.txt.

@@ -247,6 +247,7 @@ def calculate_feature_matrix(features, entityset=None, cutoff_time=None, instanc
tqdm_options.update({'file': open(os.devnull, 'w'), 'disable': False})

progress_bar = make_tqdm_iterator(**tqdm_options)
progress_bar._instances.clear()

this line should be removed

@@ -299,7 +300,6 @@ def calculate_feature_matrix(features, entityset=None, cutoff_time=None, instanc

progress_bar.refresh()

this line should be removed
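
The review above suggests pinning tqdm>=4.45.0 in requirements.txt. As a rough illustration only, a runtime check against that floor could look like the sketch below. The helper names (`numeric_version`, `meets_minimum`, `installed_tqdm_ok`) are hypothetical and are not part of featuretools or tqdm, and the comparison is deliberately simplistic: it looks only at the leading numeric components, so a pre-release such as 4.45.0rc1 is treated the same as 4.45.0.

```python
import re
from importlib import metadata  # standard library, Python 3.8+


def numeric_version(text):
    """Return the leading numeric components of a version string as ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "post1"
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum="4.45.0"):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = numeric_version(installed), numeric_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_tqdm_ok(minimum="4.45.0"):
    """Return True if the installed tqdm is at least `minimum`, False otherwise."""
    try:
        return meets_minimum(metadata.version("tqdm"), minimum)
    except metadata.PackageNotFoundError:
        return False  # tqdm is not installed at all
```

Note that the comparison is numeric rather than lexical, so 4.9.0 is correctly treated as older than 4.45.0. A requirements pin remains the more conventional place to enforce this.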

rwedge commented Apr 30, 2020

@casperdcl, I implemented those changes on a different branch, tqdm-suggestions, but they did not resolve the issue.

Running this code in a notebook:

import featuretools as ft
import tqdm

print(tqdm.__version__)

es = ft.demo.load_mock_customer(return_entityset=True)

cutoff_time = es['transactions'].df[["transaction_id", "transaction_time"]]

ft.dfs(entityset=es, target_entity='transactions', cutoff_time=cutoff_time, verbose=True)

Interrupting the kernel during the dfs call and then rerunning dfs caused the progress bar to print on a new line on each update.

[Screenshot: Screen Shot 2020-04-30 at 12 42 56 PM]

@frances-h frances-h mentioned this pull request Apr 30, 2020
@casperdcl

Thanks @rwedge - I suspect this may be a pandas wrapper issue. Can you try interrupting and re-running:

from tqdm import trange
for i in trange(int(1e9)):
    pass

rwedge commented Apr 30, 2020

@casperdcl - running the code you provided, interrupting the loop, then re-running the loop does not result in unexpected progress bar behavior. It looks normal.

@casperdcl

Great so this is a wrapper issue; will open a new issue upstream

rwedge commented Apr 30, 2020

Thanks for helping look into this!

@casperdcl

@kmax12 I assume this is the reason you became my first ever sponsor? ❤️ 🎊

kmax12 commented May 1, 2020

@casperdcl the reason is that I've been a longtime user of tqdm and wanted to support you. The help here is above and beyond. We really appreciate it. Thanks!

@casperdcl

:D I'd like to claim that I have free time I'd like to donate to the world as an act of kindness but the truth is more like this:

xkcd#356

I mean, I wrote the tqdm.pandas wrapper years ago, have supported it continuously since then, and to date still have never used it myself.

casperdcl commented May 1, 2020

With conda create -n ft 'featuretools<0.14.0' tqdm ipykernel:

#from tqdm.auto import tqdm
from tqdm import tqdm
import pandas as pd
from random import random
import warnings

with warnings.catch_warnings():
    warnings.filterwarnings('ignore', category=FutureWarning)
    tqdm.pandas()

df = pd.DataFrame((random() for _ in range(10)) for _ in range(int(1e6)))

try:
    df.progress_apply(lambda x: x**x ** (1/x) ** x ** (1/x) ** x ** (1/x))
except KeyboardInterrupt:
    print("finished with %d instances left" % len(tqdm._instances))

60%|██████    | 6/10 [00:02<00:01,  2.77it/s]
finished with 0 instances left

#from tqdm.auto import tqdm
from tqdm import tqdm
import featuretools as ft

es = ft.demo.load_mock_customer(return_entityset=True)
cutoff_time = es['transactions'].df[["transaction_id", "transaction_time"]]

try:
    ft.dfs(entityset=es, target_entity='transactions', cutoff_time=cutoff_time, verbose=True)
except KeyboardInterrupt:
    print("finished with %d instances left" % len(tqdm._instances))

Built 33 features
Elapsed: 00:02 | Progress:   6%|▌         finished with 1 instances left
Elapsed: 00:02 | Progress:   6%|▌         

So tqdm.pandas works as expected but featuretools doesn't.

Looks like featuretools may have some error handling. My guess is that the custom function you're handing to pandas catches the error itself and doesn't let it propagate to tqdm. If you're doing your own error handling, then it's your responsibility to clean up everything, including but not limited to tqdm. You'd really need to close() the tqdm instance rather than hackily clearing the whole internal _instances set:

https://github.com/tqdm/tqdm/blob/5e8978909c639bc20f58bf877e66cc6a0b9fb5bf/tqdm/std.py#L766-L767
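
To make the close() suggestion concrete, here is a minimal sketch of the pattern, not the featuretools implementation. The function name `run_with_progress` and its parameters are hypothetical; `make_bar` stands for any factory that accepts `total=` and returns an object with `update()` and `close()`, which `tqdm.tqdm` satisfies. The inner `except` does the caller's own per-item error handling, while the outer `finally` guarantees the bar is closed even when a KeyboardInterrupt passes through.

```python
def run_with_progress(items, handle, make_bar):
    """Apply `handle` to each item while advancing a progress bar.

    Ordinary exceptions from `handle` are collected rather than raised
    (this is the "own error handling" case). KeyboardInterrupt is not an
    Exception subclass, so it still propagates, but only after the bar
    has been closed in the `finally` block.
    """
    bar = make_bar(total=len(items))
    failures = []
    try:
        for item in items:
            try:
                handle(item)
            except Exception as exc:
                failures.append((item, exc))  # record the failure and keep going
            bar.update(1)
    finally:
        # Always release the bar, so no stale instance is left behind
        # to interfere with the next bar that gets created.
        bar.close()
    return failures
```

With tqdm installed, a call might look like `run_with_progress(list(range(100)), some_function, tqdm)` after `from tqdm import tqdm`, where `some_function` is a placeholder. tqdm bars can also be used as context managers (`with tqdm(total=n) as bar:`), which closes them on exit in the same way.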

rwedge commented May 4, 2020

@casperdcl thanks for looking into this, I've opened a new issue

Successfully merging this pull request may close these issues.

Double printing progress bar
4 participants