Update docs for distilabel v1.0 with mkdocs-material (#476)
* Draft refactor docs

* Include layout for the api

* Layout for the docs

* Redirect imports of LLMs

* Draft overview and getting started

* Update docstrings

* Fix docstrings

* Fix argilla reference

* Remove extra line-break

* Refactor and rename `llm` -> `llms`

* Refactor and rename `task` -> `tasks`

* Remove extra line-breaks

* Add missing `type: ignore`

* Update `tasks` and `llms` imports

* Fix imports in `tests/`

* Fix `QualityScorer.format_input` signature

* Update `extra.md`

* Fix `mkdocs.yml` API reference for LLMs

* Add `docs/papers` (WIP)

* Update `docs/papers` (WIP)

* Fix imports after rename to `tasks`

* Remove not used files

* Update main page

* Move argilla docs

* Move papers to sections

* Remove old tutorials

* Update nav

* Remove navigation

* Advances on docs, learn section (#497)

* Add section for distiset

* Update distiset

* Add sample images for screenshots of pipeline runs

* Remove unused files

* Draft including tutorial and advanced steps, work in progress

* Fix minor bugs and add `docs/sections/papers/*.md` (#499)

* Fix `distilabel.steps.tasks` imports in `__init__`

* Fix formatting in `__init__.py`

* Remove `commit_message` from `push_to_hub`

* Add missing `super().load()` to load `logging`

* Fix `outputs` in `UltraFeedback`

* Add `model_post_init` in `Argilla` to suppress `warnings`

* Add `docs/sections/papers/ultrafeedback.md`

* Add `docs/sections/papers/instruction_backtranslation.md`

* Fix `tests/unit`

* Add `httpx` under `TYPE_CHECKING`

* Fix `argilla` optional dependency handling

* Revert `AnthropicLLM.http_client` typing and add `httpx` dependency instead

* Apply suggestions from code review

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

---------

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

* Docs cli (#502)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Update replacing string

* Add guide to the CLI

* Added CLI to the API reference and a small reference to it from the tutorial

* Docs caching (#500)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Add docstrings to lost argument in Distiset

* Add section for caching in advanced tutorial

* Add `AzureOpenAILLM` (#505)

* Add `AzureOpenAILLM`

* Update `distilabel.llms` imports

* Fix `base_url` env var and add `api_version` env var

* Add `AzureOpenAILLM` to `test_imports`

* Add `TestAzureOpenAILLM`

* Fix `base_url` docstring

* Remove `together` extra and place `tests` extra properly

* Fix extras alphabetic order in `pyproject.toml`

* Update `docs/index.md` and `README.md`

* Add `docs/api/llms/azure.md`

* Docs steps (#503)

* Update layout of steps

* Add step guide and draft of special types of steps

* Add reference for the step decorator

* Include step decorator in the tutorial

* Add intro to the different types of steps

* Add generator steps

* Update general and global steps

* Fix typos

* Missing argilla steps examples in general steps

* Create initial layout for tasks

* Add special tasks

* Add `StepInput` missing import

* Deita tutorial for docs  (#504)

* docs: add deita notebook from community meetup

* Add `asyncio.get_running_loop` for Colab

* feat: refactor into individual steps

* fix: patch async active loops

* chore: tidy print incremental steps

* fix: remove nested asyncio

* convert tutorial to markdown and move

* add assets to repo

* reference tutorial in mkdocs menu bar

* formatting and prose in deita tutorial

* Add mathjax to render math properly

* Update sections to render properly and add some stylistic choices for variable names

* update imports to shortcuts in Deita tutorial

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* docs: respond to prose feedback

---------

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: plaguss <agustin@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs tasks (#506)

* Add feedback tasks

* Add text generation and self instruct

* Add fix for runtime parameter of extra arguments

* Update docs/sections/learn/tasks/feedback_tasks.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add text generation specific tasks

* Add example of custom task

* Add runtime parameters

* Modify place of runtime parameters

---------

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add `docs/sections/learn/argilla.md` (#509)

* Fix wrong formatting around `#`

* Add `{TextGeneration,Preference}ToArgilla` in docs

* Add `argilla.md` and move Argilla docs there

* Add detailed examples in `argilla.md`

* Add `assets` for `argilla.md`

* Add deployment tips in `argilla.md`

* Add `docs/sections/learn/llms/index.md` (#514)

* Add `docs/sections/learn/llms/index.md`

* Update docs/sections/learn/llms/index.md

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs pipeline (#512)

* Draft of pipeline section

* Finish pipeline docs section

* Add CLI `run` example

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>
Co-authored-by: burtenshaw <ben@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
6 people committed Apr 9, 2024
1 parent 6c4d9ae commit fbcaf6f
Showing 247 changed files with 3,834 additions and 21,460 deletions.
19 changes: 9 additions & 10 deletions README.md
@@ -73,18 +73,17 @@ Requires Python 3.8+

In addition, the following extras are available:

- `anthropic`: for using models available in [Anthropic API](https://www.anthropic.com/api) via the `AnthropicLLM` integration.
- `argilla`: for exporting the generated datasets to [Argilla](https://argilla.io/).
- `hf-inference-endpoints`: for using the [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) via the `InferenceEndpointsLLM` integration.
- `hf-transformers`: for using models available in the [transformers](https://github.com/huggingface/transformers) package via the `TransformersLLM` integration.
- `litellm`: for using [`LiteLLM`](https://github.com/BerriAI/litellm) to call any LLM using the OpenAI format via the `LiteLLM` integration.
- `llama-cpp`: for using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) Python bindings for `llama.cpp` via the `LlamaCppLLM` integration.
- `mistralai`: for using models available in the [Mistral AI API](https://mistral.ai/news/la-plateforme/) via the `MistralAILLM` integration.
- `ollama`: for using [Ollama](https://ollama.com/) and their available models via the `OllamaLLM` integration.
- `openai`: for using [OpenAI API](https://openai.com/blog/openai-api) models via the `OpenAILLM` integration, as well as the other integrations that rely on the OpenAI client, such as `AnyscaleLLM`, `AzureOpenAILLM`, and `TogetherLLM`.
- `vertexai`: for using [Google Vertex AI](https://cloud.google.com/vertex-ai) proprietary models via the `VertexAILLM` integration.
- `vllm`: for using the [vllm](https://github.com/vllm-project/vllm) serving engine via the `vLLM` integration.

### Example

48 changes: 48 additions & 0 deletions docs/api/cli.md
@@ -0,0 +1,48 @@
# Command Line Interface

This section contains the API reference for the command line interface.

## CLI commands

This section shows the CLI commands:

### distilabel pipeline info

```bash
$ distilabel pipeline info --help

Usage: distilabel pipeline info [OPTIONS]

Get information about a Distilabel pipeline.

╭─ Options ───────────────────────────────────────────────────────────────────────────╮
│ * --config TEXT Path or URL to the Distilabel pipeline configuration file. │
│ [default: None] │
│ [required] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

### distilabel pipeline

```bash
$ distilabel pipeline --help

Usage: distilabel pipeline [OPTIONS] COMMAND [ARGS]...

Commands to run and inspect Distilabel pipelines.

╭─ Options ───────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────╮
│ info Get information about a Distilabel pipeline. │
│ run Run a Distilabel pipeline. │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
```

## Utility functions for the pipeline commands

Here are some utility functions that help when working with pipelines from the console.

::: distilabel.cli.pipeline.utils
3 changes: 3 additions & 0 deletions docs/api/llms/anthropic.md
@@ -0,0 +1,3 @@
## AnthropicLLM

::: distilabel.llms.anthropic
3 changes: 3 additions & 0 deletions docs/api/llms/anyscale.md
@@ -0,0 +1,3 @@
## AnyscaleLLM

::: distilabel.llms.anyscale
4 changes: 4 additions & 0 deletions docs/api/llms/azure.md
@@ -0,0 +1,4 @@
## AzureOpenAILLM

::: distilabel.llms.azure

11 changes: 11 additions & 0 deletions docs/api/llms/huggingface.md
@@ -0,0 +1,11 @@
# Hugging Face

This section contains the reference for Hugging Face integrations:

## Inference Endpoints

::: distilabel.llms.huggingface.inference_endpoints

## Transformers

::: distilabel.llms.huggingface.transformers
3 changes: 3 additions & 0 deletions docs/api/llms/litellm.md
@@ -0,0 +1,3 @@
## LiteLLM

::: distilabel.llms.litellm
3 changes: 3 additions & 0 deletions docs/api/llms/llamacpp.md
@@ -0,0 +1,3 @@
## LlamaCppLLM

::: distilabel.llms.llamacpp
3 changes: 3 additions & 0 deletions docs/api/llms/mistral.md
@@ -0,0 +1,3 @@
## MistralLLM

::: distilabel.llms.mistral
3 changes: 3 additions & 0 deletions docs/api/llms/ollama.md
@@ -0,0 +1,3 @@
## OllamaLLM

::: distilabel.llms.ollama
3 changes: 3 additions & 0 deletions docs/api/llms/openai.md
@@ -0,0 +1,3 @@
## OpenAILLM

::: distilabel.llms.openai
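
As a quick orientation alongside the generated reference, here is a minimal usage sketch. The constructor arguments, the chat-style input format, and the `generate` signature are assumptions to double-check against the rendered API docs above; the model name is only an example.

```python
import os

from distilabel.llms import OpenAILLM

# Minimal sketch: argument names and the chat-style input format are assumptions
# to verify against the API reference for `distilabel.llms.openai`.
llm = OpenAILLM(
    model="gpt-3.5-turbo",                 # example model name
    api_key=os.environ["OPENAI_API_KEY"],  # typically read from the environment
)
llm.load()  # initialize the underlying client before generating

outputs = llm.generate(
    inputs=[[{"role": "user", "content": "What is synthetic data?"}]],
    num_generations=1,
)
print(outputs)
```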
3 changes: 3 additions & 0 deletions docs/api/llms/together.md
@@ -0,0 +1,3 @@
## TogetherLLM

::: distilabel.llms.together
3 changes: 3 additions & 0 deletions docs/api/llms/vertexai.md
@@ -0,0 +1,3 @@
## VertexAILLM

::: distilabel.llms.vertexai
3 changes: 3 additions & 0 deletions docs/api/llms/vllm.md
@@ -0,0 +1,3 @@
# vLLM

::: distilabel.llms.vllm
13 changes: 13 additions & 0 deletions docs/api/pipeline/pipeline.md
@@ -0,0 +1,13 @@
# Pipeline

## Base Pipeline

::: distilabel.pipeline.base

## Local Pipeline

::: distilabel.pipeline.local

## Extra

::: distilabel.pipeline.utils
5 changes: 5 additions & 0 deletions docs/api/steps/argilla.md
@@ -0,0 +1,5 @@
# Argilla

::: distilabel.steps.argilla.base
::: distilabel.steps.argilla.preference
::: distilabel.steps.argilla.text_generation
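
For orientation, the sketch below shows how one of these steps might be configured; the import path and the parameter names (`dataset_name`, `dataset_workspace`, `api_url`, `api_key`) are assumptions to verify against the reference above.

```python
import os

from distilabel.steps import TextGenerationToArgilla

# Hypothetical configuration sketch: the import path and parameter names are
# assumptions to check against the Argilla steps reference.
to_argilla = TextGenerationToArgilla(
    name="to_argilla",
    dataset_name="text-generations",
    dataset_workspace="admin",
    api_url="https://my-argilla-instance.example.com",  # example URL
    api_key=os.environ.get("ARGILLA_API_KEY"),
)
```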
5 changes: 5 additions & 0 deletions docs/api/steps/decorator.md
@@ -0,0 +1,5 @@
# step decorator

This section contains the reference for the `@step` decorator.

::: distilabel.steps.decorator
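
To give a feel for what the decorator provides, here is a minimal sketch of a custom step defined with `@step`; the `inputs`/`outputs` arguments and the generator-style body are assumptions to verify against the reference above.

```python
from distilabel.steps import StepInput, step


# Sketch of a decorated step: it declares the columns it consumes and produces,
# and yields the processed batch. The decorator arguments are assumptions.
@step(inputs=["instruction"], outputs=["instruction_uppercase"])
def UppercaseInstruction(inputs: StepInput):
    for item in inputs:
        item["instruction_uppercase"] = item["instruction"].upper()
    yield inputs
```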
10 changes: 10 additions & 0 deletions docs/api/steps/extra.md
@@ -0,0 +1,10 @@
# Extra

::: distilabel.steps.combine
::: distilabel.steps.conversation
::: distilabel.steps.decorator
::: distilabel.steps.deita
::: distilabel.steps.expand
::: distilabel.steps.keep
::: distilabel.steps.typing
::: distilabel.steps.tasks.typing
4 changes: 4 additions & 0 deletions docs/api/steps/generator_steps/generator_steps.md
@@ -0,0 +1,4 @@
# Generator Steps

::: distilabel.steps.generators.data
::: distilabel.steps.generators.huggingface
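
As an illustration, a generator step can seed a pipeline with in-memory data; the class name `LoadDataFromDicts` is an assumption based on the `distilabel.steps.generators.data` module referenced above.

```python
from distilabel.steps import LoadDataFromDicts

# Sketch of an in-memory generator step; the class name and arguments are
# assumptions to verify against the reference above.
load_data = LoadDataFromDicts(
    name="load_data",
    data=[
        {"instruction": "Explain what a DAG is."},
        {"instruction": "Write a haiku about synthetic data."},
    ],
)
```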
3 changes: 3 additions & 0 deletions docs/api/steps/global_steps/global_steps.md
@@ -0,0 +1,3 @@
# Global Steps

::: distilabel.steps.globals.huggingface
3 changes: 3 additions & 0 deletions docs/api/steps/steps.md
@@ -0,0 +1,3 @@
# Steps

::: distilabel.steps.base
3 changes: 3 additions & 0 deletions docs/api/steps/tasks/embeddings.md
@@ -0,0 +1,3 @@
# Embeddings

::: distilabel.steps.tasks.generate_embeddings
3 changes: 3 additions & 0 deletions docs/api/steps/tasks/preference_tasks.md
@@ -0,0 +1,3 @@
# Preference Tasks

::: distilabel.steps.tasks.ultrafeedback
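
As a rough sketch of how the task is typically wired to a judge LLM (the `aspect` argument name and its value are assumptions to confirm against the reference above):

```python
from distilabel.llms import OpenAILLM
from distilabel.steps.tasks import UltraFeedback

# Sketch of configuring UltraFeedback as a labeller for instruction-following
# assessment; the `aspect` argument name and value are assumptions.
ultrafeedback = UltraFeedback(
    name="ultrafeedback",
    llm=OpenAILLM(model="gpt-4"),       # example judge model
    aspect="instruction-following",     # assumed aspect identifier
)
```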
29 changes: 29 additions & 0 deletions docs/api/steps/tasks/text_generation.md
@@ -0,0 +1,29 @@
# Tasks

::: distilabel.steps.tasks.base

## General Text Generation

::: distilabel.steps.tasks.text_generation

## Evol Instruct

::: distilabel.steps.tasks.evol_instruct.base
::: distilabel.steps.tasks.evol_instruct.generator
::: distilabel.steps.tasks.evol_instruct.utils

### Evol Complexity

::: distilabel.steps.tasks.evol_instruct.evol_complexity.base
::: distilabel.steps.tasks.evol_instruct.evol_complexity.generator
::: distilabel.steps.tasks.evol_instruct.evol_complexity.utils

## Evol Quality

::: distilabel.steps.tasks.evol_quality.base
::: distilabel.steps.tasks.evol_quality.utils

## DEITA Scorers

::: distilabel.steps.tasks.complexity_scorer
::: distilabel.steps.tasks.quality_scorer
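
For orientation, these task classes are typically instantiated with a `name` and an `llm`; any further arguments shown below (such as `num_evolutions`) are assumptions to verify against the reference above.

```python
from distilabel.llms import OpenAILLM
from distilabel.steps.tasks import EvolInstruct, TextGeneration

# Sketch only: `num_evolutions` is an assumed argument name for EvolInstruct.
text_generation = TextGeneration(
    name="text_generation",
    llm=OpenAILLM(model="gpt-3.5-turbo"),
)

evol_instruct = EvolInstruct(
    name="evol_instruct",
    llm=OpenAILLM(model="gpt-4"),
    num_evolutions=1,
)
```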
Binary file added docs/assets/images/sections/cli/cli_pipe_1.png
Binary file added docs/assets/images/sections/cli/cli_pipe_2.png
Binary file added docs/assets/tutorials-assets/deita/datasets.png
Binary file added docs/assets/tutorials-assets/deita/diversity.png
Binary file added docs/assets/tutorials-assets/deita/overview.png
Binary file added docs/assets/tutorials-assets/deita/results.png
72 changes: 0 additions & 72 deletions docs/concepts.md

This file was deleted.

51 changes: 11 additions & 40 deletions docs/index.md
@@ -1,6 +1,7 @@
---
description: Distilabel is an AI Feedback (AIF) framework for building datasets with and for LLMs.
---

# distilabel

AI Feedback (AIF) framework to build datasets with and for LLMs:
@@ -18,48 +19,18 @@ Requires Python 3.8+

In addition, the following extras are available:

- `anthropic`: for using models available in [Anthropic API](https://www.anthropic.com/api) via the `AnthropicLLM` integration.
- `argilla`: for exporting the generated datasets to [Argilla](https://argilla.io/).
- `hf-inference-endpoints`: for using the [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) via the `InferenceEndpointsLLM` integration.
- `hf-transformers`: for using models available in the [transformers](https://github.com/huggingface/transformers) package via the `TransformersLLM` integration.
- `litellm`: for using [`LiteLLM`](https://github.com/BerriAI/litellm) to call any LLM using the OpenAI format via the `LiteLLM` integration.
- `llama-cpp`: for using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) Python bindings for `llama.cpp` via the `LlamaCppLLM` integration.
- `mistralai`: for using models available in the [Mistral AI API](https://mistral.ai/news/la-plateforme/) via the `MistralAILLM` integration.
- `ollama`: for using [Ollama](https://ollama.com/) and their available models via the `OllamaLLM` integration.
- `openai`: for using [OpenAI API](https://openai.com/blog/openai-api) models via the `OpenAILLM` integration, as well as the other integrations that rely on the OpenAI client, such as `AnyscaleLLM`, `AzureOpenAILLM`, and `TogetherLLM`.
- `vertexai`: for using [Google Vertex AI](https://cloud.google.com/vertex-ai) proprietary models via the `VertexAILLM` integration.
- `vllm`: for using the [vllm](https://github.com/vllm-project/vllm) serving engine via the `vLLM` integration.

## Quick example

```python
--8<-- "docs/snippets/quick-example.py"
```

1. Create a `Task` for generating text given an instruction.
2. Create a `LLM` for generating text using the `Task` created in the first step. As the `LLM` will generate text, it will be a `generator`.
3. Create a pre-defined `Pipeline` using the `pipeline` function and the `generator` created in step 2. The `pipeline` function
will create a `labeller` LLM using `OpenAILLM` with the `UltraFeedback` task for instruction following assessment.

!!! note
To run the script successfully, ensure you have assigned your OpenAI API key to the `OPENAI_API_KEY` environment variable.

For a more complete example, check out our awesome tutorials in the docs or the example below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/argilla-io/distilabel/blob/main/docs/tutorials/pipeline-notus-instructions-preferences-legal.ipynb) [![Open Source in Github](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/argilla-io/distilabel/blob/main/docs/tutorials/pipeline-notus-instructions-preferences-legal.ipynb)

## Navigation

<div class="grid cards" markdown>

- <p align="center"> [**Concept Guides**](./technical-reference/llms.md)</p>

---

Understand the components and their interactions.

- <p align="center"> [**API Reference**](./reference/distilabel/index.md)</p>

---

Technical description of the classes and functions.

</div>
ADD SHOWCASE EXAMPLE
21 changes: 21 additions & 0 deletions docs/overview.md
@@ -0,0 +1,21 @@
---
description: Get familiar with distilabel's pipelines.
---

# Overview of Distilabel

distilabel is an AI Feedback (AIF) framework to build datasets with and for LLMs.

## Pipeline

Define your pipeline like you would a Directed Acyclic Graph (DAG)...
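
A minimal sketch of what such a DAG-style definition could look like follows; the class names (`LoadDataFromDicts`, `TextGeneration`, `OpenAILLM`) and the `connect()`/`run()` calls are assumptions to check against the API reference.

```python
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

# Sketch of a two-node DAG: a generator step feeding a text generation task.
# Class names and the connect()/run() calls are assumptions to verify against
# the API reference.
with Pipeline(name="overview-example") as pipeline:
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[{"instruction": "Explain AI Feedback in one sentence."}],
    )
    text_generation = TextGeneration(
        name="text_generation",
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )
    load_data.connect(text_generation)

if __name__ == "__main__":
    distiset = pipeline.run()  # returns a Distiset with the generated data
```

The resulting `Distiset` can then be pushed to the Hugging Face Hub or exported to Argilla, as covered in the sections referenced in this commit.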

## Steps

...

## Command Line Interface

Distilabel comes with a CLI to easily reproduce datasets from a `pipeline.yaml`.
...
