feat: add ETA to progress bar and fix not showing the progress bar if irrelevant #253

Merged
davidberenstein1957 merged 8 commits into main from feat/eta_progress_bars on Jan 16, 2024

Conversation

ignacioct
Contributor

@ignacioct ignacioct commented Jan 15, 2024

Closes #243

Script for trying the progress bar

from distilabel.llm import InferenceEndpointsLLM
from distilabel.pipeline import Pipeline
from distilabel.tasks import TextGenerationTask

from datasets import Dataset

token = ""
model = "aws-notus-7b-v1-3184"
namespace = "argilla"

llm = InferenceEndpointsLLM(
    endpoint_name_or_model_id=model,  # type: ignore
    endpoint_namespace=namespace,  # type: ignore
    token=token,
    task=TextGenerationTask(),
)

pipeline = Pipeline(generator=llm)

inputs = ["Generate whatever"] * 50
dataset = Dataset.from_dict({"input": inputs})

generated_instructions = pipeline.generate(
    dataset=dataset, num_generations=1, display_progress_bar=True
)

print(generated_instructions[0])

With just this change, running the test script above produces the following output:


22:37:49 INFO     [PID: 48942] Processing batch 50 of 50...                                                                                 pipeline.py:567
         INFO     [PID: 48942] Calling generator for batch 50...                                                                            pipeline.py:571
Texts Generated ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 50/50 0:03:42
Rows labelled   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0%  0/50 -:--:--
Flattening the indices: 100%|████████████████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 11395.09 examples/s]
Saving the dataset (1/1 shards): 100%|███████████████████████████████████████████████████████████████████████████| 50/50 [00:00<00:00, 10768.98 examples/s]
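
In the output above, the time column on the right shows `-:--:--` for the bar that has not processed any rows. As a rough illustration of how a remaining-time estimate can be derived, here is a minimal sketch based on simple linear extrapolation. The helpers `estimate_remaining_seconds` and `format_eta` are hypothetical names used for illustration only; they are not code from this PR, and the progress bar library may compute its estimate differently.

```python
from datetime import timedelta
from typing import Optional


def estimate_remaining_seconds(
    completed: int, total: int, elapsed: float
) -> Optional[float]:
    """Assume the remaining items take the average time per item seen so far."""
    if completed <= 0 or elapsed <= 0:
        return None  # nothing processed yet, so there is no basis for an estimate
    remaining_items = max(total - completed, 0)
    return elapsed * remaining_items / completed


def format_eta(seconds: Optional[float]) -> str:
    """Render an estimate as H:MM:SS, or a placeholder when it is unknown."""
    if seconds is None:
        return "-:--:--"
    return str(timedelta(seconds=int(seconds)))


# Halfway through 50 rows after 111 seconds: about 111 seconds remain.
print(format_eta(estimate_remaining_seconds(25, 50, 111.0)))
# No rows processed yet: no estimate can be shown.
print(format_eta(estimate_remaining_seconds(0, 50, 5.0)))
```

A real progress bar would typically smooth the speed over a recent window instead of using the global average, so that the estimate reacts to slow or fast batches.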

Thanks @plaguss for all the docs, I owe you a beer! He was concerned that handling OpenAI's futures might break the progress bar; I still have to test that specifically.
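
One common way to keep a progress bar in sync with work submitted as futures is to advance it each time a future completes. The following is a minimal sketch using only the standard library. It assumes a hypothetical `fake_generate` function in place of a real OpenAI call and a hypothetical `on_advance` callback in place of the actual progress bar update; it does not reflect how distilabel handles futures internally.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, List


def fake_generate(prompt: str) -> str:
    # Hypothetical stand-in for a call to an LLM API.
    return prompt.upper()


def generate_with_progress(
    prompts: List[str], on_advance: Callable[[int], None]
) -> List[str]:
    """Run all prompts concurrently and report progress as each one finishes."""
    results: List[str] = [""] * len(prompts)
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Remember each future's input position so the output order is stable.
        future_to_index = {
            executor.submit(fake_generate, prompt): index
            for index, prompt in enumerate(prompts)
        }
        for future in as_completed(future_to_index):
            results[future_to_index[future]] = future.result()
            on_advance(1)  # advance the bar once per completed future
    return results


advanced: List[int] = []
outputs = generate_with_progress(["a", "b", "c"], advanced.append)
print(outputs, sum(advanced))
```

Because the callback fires once per completed future, the bar reaches the total regardless of the order in which the futures finish.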

@ignacioct ignacioct self-assigned this Jan 15, 2024
@ignacioct ignacioct marked this pull request as draft January 15, 2024 21:22
@ignacioct ignacioct marked this pull request as ready for review January 15, 2024 21:44
Member

@davidberenstein1957 davidberenstein1957 left a comment

That is a quick and easy one, haha. @ignacioct, if you want to tackle something else, feel free to pick something nice. Also, there is a small thing you might have missed in the original issue: #243 (comment)

In the example above, the "Rows labelled" bar is still displayed, even though the pipeline only has a generator.

@ignacioct
Contributor Author

@davidberenstein1957 oops, I didn't see it! I fixed it in the last push, so everything should be working now.
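
A minimal sketch of the idea behind that fix: only create a bar for the stages the pipeline actually has. The function `progress_bar_descriptions` is hypothetical and is not the code in this PR; the two labels are taken from the output shown earlier.

```python
from typing import List


def progress_bar_descriptions(has_generator: bool, has_labeller: bool) -> List[str]:
    """Return the bars worth showing for a given pipeline configuration."""
    descriptions: List[str] = []
    if has_generator:
        descriptions.append("Texts Generated")
    if has_labeller:
        descriptions.append("Rows labelled")
    return descriptions


# Generator-only pipeline, as in the test script: only one bar is relevant.
print(progress_bar_descriptions(has_generator=True, has_labeller=False))
# prints ['Texts Generated']
```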

@plaguss
Contributor

plaguss commented Jan 16, 2024

🍻 @ignacioct. Could you check a simple example with OpenAI to see if it renders properly? After that we should be ready to go

@ignacioct
Contributor Author

@plaguss do I need to set up an inference endpoint, or is there a way to avoid that? I tried the model that the HF people used to test the serverless implementation, but it was giving me limit errors.

@davidberenstein1957
Member

@ignacioct, I believe we can also use only a generator. Correct?

@davidberenstein1957 davidberenstein1957 changed the title from "Adding ETA to progress bar" to "feat: add ETA to progress bar and fix not showing the progress bar if irrelevant" Jan 16, 2024
@plaguss
Contributor

plaguss commented Jan 16, 2024

We can use only the generator, but it would be nice to see whether there is a problem with a complete pipeline, or at least with a labelling pipeline using OpenAI. Does he have access to OpenAI?

@davidberenstein1957
Member

@ignacioct there are some linting issues

@davidberenstein1957 davidberenstein1957 added this to the 0.4.0 milestone Jan 16, 2024
@davidberenstein1957 davidberenstein1957 merged commit d15b671 into main Jan 16, 2024
4 checks passed
@davidberenstein1957 davidberenstein1957 deleted the feat/eta_progress_bars branch January 16, 2024 15:52
@ignacioct ignacioct restored the feat/eta_progress_bars branch January 16, 2024 15:55
@ignacioct ignacioct deleted the feat/eta_progress_bars branch January 16, 2024 15:56
Development

Successfully merging this pull request may close these issues.

[FEATURE] ETA for time until completion for progress bars
3 participants