Update serialization method of _BatchManager to write each step on its own file #496

Merged
plaguss merged 1 commit into core-refactor from cache-batch-manager-step
Apr 2, 2024

Conversation

plaguss
Contributor

@plaguss plaguss commented Apr 2, 2024

Description

Updates the caching mechanism of _BatchManager to write each _BatchManagerStep to its own file and load it back.
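
For readers unfamiliar with the pattern, here is a minimal sketch of "one file per step" caching. It is not the code from this PR: `StepState`, `Manager`, the JSON format, and the `<step_name>.json` file layout are all hypothetical stand-ins chosen for illustration, and the sketch assumes step names are safe to use as file names.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Dict, List


@dataclass
class StepState:
    """Hypothetical stand-in for the per-step state a batch manager keeps."""

    step_name: str
    accumulate: bool = False
    data: Dict[str, List[Any]] = field(default_factory=dict)


@dataclass
class Manager:
    """Hypothetical stand-in for a manager that owns one state object per step."""

    steps: Dict[str, StepState] = field(default_factory=dict)

    def save(self, directory: Path) -> None:
        # Write each step to its own file instead of one monolithic dump,
        # so a single step can be rewritten without touching the others.
        directory.mkdir(parents=True, exist_ok=True)
        for name, step in self.steps.items():
            (directory / f"{name}.json").write_text(json.dumps(asdict(step)))

    @classmethod
    def load(cls, directory: Path) -> "Manager":
        # Rebuild the manager by reading every per-step file back.
        steps = {}
        for path in sorted(directory.glob("*.json")):
            state = StepState(**json.loads(path.read_text()))
            steps[state.step_name] = state
        return cls(steps=steps)


# Round trip: save two steps, then load them back from disk.
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "batch_manager_steps"
    original = Manager(
        steps={
            "generate": StepState("generate", data={"load_data": [1, 2, 3]}),
            "score": StepState("score", accumulate=True),
        }
    )
    original.save(cache_dir)
    written = sorted(p.name for p in cache_dir.glob("*.json"))
    restored = Manager.load(cache_dir)
```

Splitting the cache this way keeps each file small and lets one step's state be updated independently, which is the motivation the PR title suggests; the actual file format and naming used by the project may differ.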

Closes #487

@plaguss plaguss requested a review from gabrielmbmb April 2, 2024 10:14
@plaguss plaguss self-assigned this Apr 2, 2024
@plaguss plaguss added the enhancement New feature or request label Apr 2, 2024
@plaguss plaguss added this to the 1.1.0 milestone Apr 2, 2024
@plaguss plaguss merged commit 251e34c into core-refactor Apr 2, 2024
4 checks passed
@plaguss plaguss deleted the cache-batch-manager-step branch April 2, 2024 18:39
plaguss added a commit that referenced this pull request Apr 3, 2024
* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Update replacing string

* Add guide to the CLI

* Added CLI to api reference a small reference to that from the tutorial
plaguss added a commit that referenced this pull request Apr 3, 2024
* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Add docstrings to lost argument in Distiset

* Add section for caching in advanced tutorial
gabrielmbmb added a commit that referenced this pull request Apr 9, 2024
* Draft refactor docs

* Include layout for the api

* Layout for the docs

* Redirect imports of LLMs

* Draft overview and getting started

* Update docstrings

* Fix docstrings

* Fix argilla reference

* Remove extra line-break

* Refactor and rename `llm` -> `llms`

* Refactor and rename `task` -> `tasks`

* Remove extra line-breaks

* Add missing `type: ignore`

* Update `tasks` and `llms` imports

* Fix imports in `tests/`

* Fix `QualityScorer.format_input` signature

* Update `extra.md`

* Fix `mkdocs.yml` API reference for LLMs

* Add `docs/papers` (WIP)

* Update `docs/papers` (WIP)

* Fix imports after rename to `tasks`

* Remove not used files

* Update main page

* Move argilla docs

* Move papers to sections

* Remove old tutorials

* Update nav

* Remove navigation

* Advances on docs, learn section (#497)

* Add section for distiset

* Update distiset

* Add sample images for screenshots of pipeline runs

* Remove unused files

* Draft including tutorial and advances steps, work in progress

* Fix minor bugs and add `docs/sections/papers/*.md` (#499)

* Fix `distilabel.steps.tasks` imports in `__init__`

* Fix formatting in `__init__.py`

* Remove `commit_message` from `push_to_hub`

* Add missing `super().load()` to load `logging`

* Fix `outputs` in `UltraFeedback`

* Add `model_post_init` in `Argilla` to supress `warnings`

* Add `docs/sections/papers/ultrafeedback.md`

* Add `docs/sections/papers/instruction_backtranslation.md`

* Fix `tests/unit`

* Add `httpx` under `TYPE_CHECKING`

* Fix `argilla` optional dependency handling

* Revert `AnthropicLLM.http_client` typing and add `httpx` dependency instead

* Apply suggestions from code review

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

---------

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

* Docs cli (#502)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Update replacing string

* Add guide to the CLI

* Added CLI to api reference a small reference to that from the tutorial

* Docs caching (#500)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Add docstrings to lost argument in Distiset

* Add section for caching in advanced tutorial

* Add `AzureOpenAILLM` (#505)

* Add `AzureOpenAILLM`

* Update `distilabel.llms` imports

* Fix `base_url` env var and add `api_version` env var

* Add `AzureOpenAILLM` to `test_imports`

* Add `TestAzureOpenAILLM`

* Fix `base_url` docstring

* Remove `together` extra and place `tests` extra properly

* Fix extras alphabetic order in `pyproject.toml`

* Update `docs/index.md` and `README.md`

* Add `docs/api/llms/azure.md`

* Docs steps (#503)

* Update layout of steps

* Add step guide and draft of special types of steps

* Add reference for the step decorator

* Include step decorator in the tutorial

* Add intro to the different types of steps

* Add generator steps

* Update general and global steps

* Fix typos

* Missing argilla steps examples in general steps

* Create initial layout for tasks

* Add special tasks

* Add `StepInput` missing import

* Deita tutorial for docs  (#504)

* docs: add deita notebook from community meetup

* Add `asyncio.get_running_loop` for Colab

* feat: refactor into individual steps

* fix: patch async active loops

* chore: tidy print incremental steps

* fix: remove nested asyncio

* convert tutorial to markdown and move

* add assets to repo

* reference tutorial in mkdocs menu bar

* formatting and prose in deita tutorial

* Add mathjax to render math properly

* Update sections to render properly and add some stylistic choices for variable names

* update imports to shortcuts in Deita tutorial

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* docs: respond to prose feedback

---------

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: plaguss <agustin@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs tasks (#506)

* Add feedback tasks

* Add text generation and self instruct

* Add fix for runtime parameter of extra arguments

* Update docs/sections/learn/tasks/feedback_tasks.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add text generation specific tasks

* Add example of custom task

* Add runtime parameters

* Modify place of runtime parameters

---------

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add `docs/sections/learn/argilla.md` (#509)

* Fix wrong formatting around `#`

* Add `{TextGeneration,Preference}ToArgilla` in docs

* Add `argilla.md` and move Argilla docs there

* Add detailed examples in `argilla.md`

* Add `assets` for `argilla.md`

* Add deployment tips in `argilla.md`

* Add `docs/sections/learn/llms/index.md` (#514)

* Add `docs/sections/learn/llms/index.md`

* Update docs/sections/learn/llms/index.md

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs pipeline (#512)

* Draft of pipeline section

* Finish pipeline docs section

* Add CLI `run` example

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>
Co-authored-by: burtenshaw <ben@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
gabrielmbmb added a commit that referenced this pull request Apr 9, 2024
* Draft refactor docs

* Include layout for the api

* Layout for the docs

* Redirect imports of LLMs

* Draft overview and getting started

* Update docstrings

* Fix docstrings

* Fix argilla reference

* Remove extra line-break

* Refactor and rename `llm` -> `llms`

* Refactor and rename `task` -> `tasks`

* Remove extra line-breaks

* Add missing `type: ignore`

* Update `tasks` and `llms` imports

* Fix imports in `tests/`

* Fix `QualityScorer.format_input` signature

* Update `extra.md`

* Fix `mkdocs.yml` API reference for LLMs

* Add `docs/papers` (WIP)

* Update `docs/papers` (WIP)

* Fix imports after rename to `tasks`

* Remove not used files

* Update main page

* Move argilla docs

* Move papers to sections

* Remove old tutorials

* Update nav

* Remove navigation

* Advances on docs, learn section (#497)

* Add section for distiset

* Update distiset

* Add sample images for screenshots of pipeline runs

* Remove unused files

* Draft including tutorial and advances steps, work in progress

* Fix minor bugs and add `docs/sections/papers/*.md` (#499)

* Fix `distilabel.steps.tasks` imports in `__init__`

* Fix formatting in `__init__.py`

* Remove `commit_message` from `push_to_hub`

* Add missing `super().load()` to load `logging`

* Fix `outputs` in `UltraFeedback`

* Add `model_post_init` in `Argilla` to supress `warnings`

* Add `docs/sections/papers/ultrafeedback.md`

* Add `docs/sections/papers/instruction_backtranslation.md`

* Fix `tests/unit`

* Add `httpx` under `TYPE_CHECKING`

* Fix `argilla` optional dependency handling

* Revert `AnthropicLLM.http_client` typing and add `httpx` dependency instead

* Apply suggestions from code review

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

---------

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

* Docs cli (#502)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Update replacing string

* Add guide to the CLI

* Added CLI to api reference a small reference to that from the tutorial

* Docs caching (#500)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Add docstrings to lost argument in Distiset

* Add section for caching in advanced tutorial

* Add `AzureOpenAILLM` (#505)

* Add `AzureOpenAILLM`

* Update `distilabel.llms` imports

* Fix `base_url` env var and add `api_version` env var

* Add `AzureOpenAILLM` to `test_imports`

* Add `TestAzureOpenAILLM`

* Fix `base_url` docstring

* Remove `together` extra and place `tests` extra properly

* Fix extras alphabetic order in `pyproject.toml`

* Update `docs/index.md` and `README.md`

* Add `docs/api/llms/azure.md`

* Docs steps (#503)

* Update layout of steps

* Add step guide and draft of special types of steps

* Add reference for the step decorator

* Include step decorator in the tutorial

* Add intro to the different types of steps

* Add generator steps

* Update general and global steps

* Fix typos

* Missing argilla steps examples in general steps

* Create initial layout for tasks

* Add special tasks

* Add `StepInput` missing import

* Add `CohereLLM`

* Add missing docstring

* Add unit tests for `CohereLLM`

* Add `cohere` extra

* Reference `CohereLLM`

---------

Co-authored-by: plaguss <agustin@argilla.io>
Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>
Labels
enhancement New feature or request

2 participants