Update docs for distilabel v1.0 with mkdocs-material (#476)
* Draft refactor docs

* Include layout for the api

* Layout for the docs

* Redirect imports of LLMs

* Draft overview and getting started

* Update docstrings

* Fix docstrings

* Fix argilla reference

* Remove extra line-break

* Refactor and rename `llm` -> `llms`

* Refactor and rename `task` -> `tasks`

* Remove extra line-breaks

* Add missing `type: ignore`

* Update `tasks` and `llms` imports

* Fix imports in `tests/`

* Fix `QualityScorer.format_input` signature

* Update `extra.md`

* Fix `mkdocs.yml` API reference for LLMs

* Add `docs/papers` (WIP)

* Update `docs/papers` (WIP)

* Fix imports after rename to `tasks`

* Remove not used files

* Update main page

* Move argilla docs

* Move papers to sections

* Remove old tutorials

* Update nav

* Remove navigation

* Advances on docs, learn section (#497)

* Add section for distiset

* Update distiset

* Add sample images for screenshots of pipeline runs

* Remove unused files

* Draft including tutorial and advanced steps, work in progress

* Fix minor bugs and add `docs/sections/papers/*.md` (#499)

* Fix `distilabel.steps.tasks` imports in `__init__`

* Fix formatting in `__init__.py`

* Remove `commit_message` from `push_to_hub`

* Add missing `super().load()` to load `logging`

* Fix `outputs` in `UltraFeedback`

* Add `model_post_init` in `Argilla` to suppress `warnings`

* Add `docs/sections/papers/ultrafeedback.md`

* Add `docs/sections/papers/instruction_backtranslation.md`

* Fix `tests/unit`

* Add `httpx` under `TYPE_CHECKING`

* Fix `argilla` optional dependency handling

* Revert `AnthropicLLM.http_client` typing and add `httpx` dependency instead

* Apply suggestions from code review

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

---------

Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>

* Docs cli (#502)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Update replacing string

* Add guide to the CLI

* Added CLI to the API reference and a small reference to it from the tutorial

* Docs caching (#500)

* Update serialization method of _BatchManager to write each of the inner steps to a file and load them back (#496)

* Add docstrings to lost argument in Distiset

* Add section for caching in advanced tutorial

* Add `AzureOpenAILLM` (#505)

* Add `AzureOpenAILLM`

* Update `distilabel.llms` imports

* Fix `base_url` env var and add `api_version` env var

* Add `AzureOpenAILLM` to `test_imports`

* Add `TestAzureOpenAILLM`

* Fix `base_url` docstring

* Remove `together` extra and place `tests` extra properly

* Fix extras alphabetic order in `pyproject.toml`

* Update `docs/index.md` and `README.md`

* Add `docs/api/llms/azure.md`

* Docs steps (#503)

* Update layout of steps

* Add step guide and draft of special types of steps

* Add reference for the step decorator

* Include step decorator in the tutorial

* Add intro to the different types of steps

* Add generator steps

* Update general and global steps

* Fix typos

* Missing argilla steps examples in general steps

* Create initial layout for tasks

* Add special tasks

* Add `StepInput` missing import

* Deita tutorial for docs  (#504)

* docs: add deita notebook from community meetup

* Add `asyncio.get_running_loop` for Colab

* feat: refactor into individual steps

* fix: patch async active loops

* chore: tidy print incremental steps

* fix: remove nested asyncio

* convert tutorial to markdown and move

* add assets to repo

* reference tutorial in mkdocs menu bar

* formatting and prose in deita tutorial

* Add mathjax to render math properly

* Update sections to render properly and add some stylistic choices for variable names

* update imports to shortcuts in Deita tutorial

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/papers/deita.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* docs: respond to prose feedback

---------

Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: plaguss <agustin@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Update docs/sections/learn/steps/index.md

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs tasks (#506)

* Add feedback tasks

* Add text generation and self instruct

* Add fix for runtime parameter of extra arguments

* Update docs/sections/learn/tasks/feedback_tasks.md

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add text generation specific tasks

* Add example of custom task

* Add runtime parameters

* Modify place of runtime parameters

---------

Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>

* Add `docs/sections/learn/argilla.md` (#509)

* Fix wrong formatting around `#`

* Add `{TextGeneration,Preference}ToArgilla` in docs

* Add `argilla.md` and move Argilla docs there

* Add detailed examples in `argilla.md`

* Add `assets` for `argilla.md`

* Add deployment tips in `argilla.md`

* Add `docs/sections/learn/llms/index.md` (#514)

* Add `docs/sections/learn/llms/index.md`

* Update docs/sections/learn/llms/index.md

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

* Docs pipeline (#512)

* Draft of pipeline section

* Finish pipeline docs section

* Add CLI `run` example

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>

---------

Co-authored-by: Gabriel Martín Blázquez <gmartinbdev@gmail.com>
Co-authored-by: Alvaro Bartolome <alvaro@argilla.io>
Co-authored-by: David Berenstein <davidberenstein1957@users.noreply.github.com>
Co-authored-by: burtenshaw <ben@argilla.io>
Co-authored-by: David Berenstein <david.m.berenstein@gmail.com>
6 people committed Apr 9, 2024
1 parent 6c4d9ae commit fbcaf6f
Showing 247 changed files with 3,834 additions and 21,460 deletions.
19 changes: 9 additions & 10 deletions README.md
@@ -73,18 +73,17 @@ Requires Python 3.8+

In addition, the following extras are available:

- `anthropic`: for using models available in [Anthropic API](https://www.anthropic.com/api) via the `AnthropicLLM` integration.
- `argilla`: for exporting the generated datasets to [Argilla](https://argilla.io/).
- `hf-inference-endpoints`: for using the [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) via the `InferenceEndpointsLLM` integration.
- `hf-transformers`: for using models available in the [transformers](https://github.com/huggingface/transformers) package via the `TransformersLLM` integration.
- `litellm`: for using [`LiteLLM`](https://github.com/BerriAI/litellm) to call any LLM using the OpenAI format via the `LiteLLM` integration.
- `llama-cpp`: for using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) Python bindings for `llama.cpp` via the `LlamaCppLLM` integration.
- `mistralai`: for using models available in the [Mistral AI API](https://mistral.ai/news/la-plateforme/) via the `MistralAILLM` integration.
- `ollama`: for using [Ollama](https://ollama.com/) and their available models via the `OllamaLLM` integration.
- `openai`: for using [OpenAI API](https://openai.com/blog/openai-api) models via the `OpenAILLM` integration, as well as the other integrations that rely on the OpenAI client, such as `AnyscaleLLM`, `AzureOpenAILLM`, and `TogetherLLM`.
- `vertexai`: for using [Google Vertex AI](https://cloud.google.com/vertex-ai) proprietary models via the `VertexAILLM` integration.
- `vllm`: for using the [vllm](https://github.com/vllm-project/vllm) serving engine via the `vLLM` integration.

### Example

48 changes: 48 additions & 0 deletions docs/api/cli.md
@@ -0,0 +1,48 @@
# Command Line Interface

This section contains the API reference for the command line interface.

## CLI commands

This section shows the CLI commands:

### distilabel pipeline info

```bash
$ distilabel pipeline info --help

Usage: distilabel pipeline info [OPTIONS]

Get information about a Distilabel pipeline.

╭─ Options ───────────────────────────────────────────────────────────────────────────╮
│ * --config TEXT Path or URL to the Distilabel pipeline configuration file. │
│ [default: None] │
│ [required] │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────╯
```

### distilabel pipeline

```bash
$ distilabel pipeline --help

Usage: distilabel pipeline [OPTIONS] COMMAND [ARGS]...

Commands to run and inspect Distilabel pipelines.

╭─ Options ───────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────╮
│ info Get information about a Distilabel pipeline. │
│ run Run a Distilabel pipeline. │
╰─────────────────────────────────────────────────────────────────────────────────────────╯
```

## Utility functions for the pipeline commands

Here are some utility functions that help when working with pipelines from the console.

::: distilabel.cli.pipeline.utils
3 changes: 3 additions & 0 deletions docs/api/llms/anthropic.md
@@ -0,0 +1,3 @@
## AnthropicLLM

::: distilabel.llms.anthropic
3 changes: 3 additions & 0 deletions docs/api/llms/anyscale.md
@@ -0,0 +1,3 @@
## AnyscaleLLM

::: distilabel.llms.anyscale
4 changes: 4 additions & 0 deletions docs/api/llms/azure.md
@@ -0,0 +1,4 @@
## AzureOpenAILLM

::: distilabel.llms.azure

11 changes: 11 additions & 0 deletions docs/api/llms/huggingface.md
@@ -0,0 +1,11 @@
# Hugging Face

This section contains the reference for Hugging Face integrations:

## Inference Endpoints

::: distilabel.llms.huggingface.inference_endpoints

## Transformers

::: distilabel.llms.huggingface.transformers
3 changes: 3 additions & 0 deletions docs/api/llms/litellm.md
@@ -0,0 +1,3 @@
## LiteLLM

::: distilabel.llms.litellm
3 changes: 3 additions & 0 deletions docs/api/llms/llamacpp.md
@@ -0,0 +1,3 @@
## LlamaCppLLM

::: distilabel.llms.llamacpp
3 changes: 3 additions & 0 deletions docs/api/llms/mistral.md
@@ -0,0 +1,3 @@
## MistralLLM

::: distilabel.llms.mistral
3 changes: 3 additions & 0 deletions docs/api/llms/ollama.md
@@ -0,0 +1,3 @@
## OllamaLLM

::: distilabel.llms.ollama
3 changes: 3 additions & 0 deletions docs/api/llms/openai.md
@@ -0,0 +1,3 @@
## OpenAILLM

::: distilabel.llms.openai
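
As a quick orientation alongside the generated reference, here is a minimal usage sketch. The constructor arguments, the chat-style input format, and the `generate` signature are assumptions to double-check against the rendered API docs above; the model name is only an example.

```python
import os

from distilabel.llms import OpenAILLM

# Minimal sketch: argument names and the chat-style input format are assumptions
# to verify against the API reference for `distilabel.llms.openai`.
llm = OpenAILLM(
    model="gpt-3.5-turbo",                 # example model name
    api_key=os.environ["OPENAI_API_KEY"],  # typically read from the environment
)
llm.load()  # initialize the underlying client before generating

outputs = llm.generate(
    inputs=[[{"role": "user", "content": "What is synthetic data?"}]],
    num_generations=1,
)
print(outputs)
```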
3 changes: 3 additions & 0 deletions docs/api/llms/together.md
@@ -0,0 +1,3 @@
## TogetherLLM

::: distilabel.llms.together
3 changes: 3 additions & 0 deletions docs/api/llms/vertexai.md
@@ -0,0 +1,3 @@
## VertexAILLM

::: distilabel.llms.vertexai
3 changes: 3 additions & 0 deletions docs/api/llms/vllm.md
@@ -0,0 +1,3 @@
# vLLM

::: distilabel.llms.vllm
13 changes: 13 additions & 0 deletions docs/api/pipeline/pipeline.md
@@ -0,0 +1,13 @@
# Pipeline

## Base Pipeline

::: distilabel.pipeline.base

## Local Pipeline

::: distilabel.pipeline.local

## Extra

::: distilabel.pipeline.utils
5 changes: 5 additions & 0 deletions docs/api/steps/argilla.md
@@ -0,0 +1,5 @@
# Argilla

::: distilabel.steps.argilla.base
::: distilabel.steps.argilla.preference
::: distilabel.steps.argilla.text_generation
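
For orientation, the sketch below shows how one of these steps might be configured; the import path and the parameter names (`dataset_name`, `dataset_workspace`, `api_url`, `api_key`) are assumptions to verify against the reference above.

```python
import os

from distilabel.steps import TextGenerationToArgilla

# Hypothetical configuration sketch: the import path and parameter names are
# assumptions to check against the Argilla steps reference.
to_argilla = TextGenerationToArgilla(
    name="to_argilla",
    dataset_name="text-generations",
    dataset_workspace="admin",
    api_url="https://my-argilla-instance.example.com",  # example URL
    api_key=os.environ.get("ARGILLA_API_KEY"),
)
```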
5 changes: 5 additions & 0 deletions docs/api/steps/decorator.md
@@ -0,0 +1,5 @@
# step decorator

This section contains the reference for the `@step` decorator.

::: distilabel.steps.decorator
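
To give a feel for what the decorator provides, here is a minimal sketch of a custom step defined with `@step`; the `inputs`/`outputs` arguments and the generator-style body are assumptions to verify against the reference above.

```python
from distilabel.steps import StepInput, step


# Sketch of a decorated step: it declares the columns it consumes and produces,
# and yields the processed batch. The decorator arguments are assumptions.
@step(inputs=["instruction"], outputs=["instruction_uppercase"])
def UppercaseInstruction(inputs: StepInput):
    for item in inputs:
        item["instruction_uppercase"] = item["instruction"].upper()
    yield inputs
```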
10 changes: 10 additions & 0 deletions docs/api/steps/extra.md
@@ -0,0 +1,10 @@
# Extra

::: distilabel.steps.combine
::: distilabel.steps.conversation
::: distilabel.steps.decorator
::: distilabel.steps.deita
::: distilabel.steps.expand
::: distilabel.steps.keep
::: distilabel.steps.typing
::: distilabel.steps.tasks.typing
4 changes: 4 additions & 0 deletions docs/api/steps/generator_steps/generator_steps.md
@@ -0,0 +1,4 @@
# Generator Steps

::: distilabel.steps.generators.data
::: distilabel.steps.generators.huggingface
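
As an illustration, a generator step can seed a pipeline with in-memory data; the class name `LoadDataFromDicts` is an assumption based on the `distilabel.steps.generators.data` module referenced above.

```python
from distilabel.steps import LoadDataFromDicts

# Sketch of an in-memory generator step; the class name and arguments are
# assumptions to verify against the reference above.
load_data = LoadDataFromDicts(
    name="load_data",
    data=[
        {"instruction": "Explain what a DAG is."},
        {"instruction": "Write a haiku about synthetic data."},
    ],
)
```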
3 changes: 3 additions & 0 deletions docs/api/steps/global_steps/global_steps.md
@@ -0,0 +1,3 @@
# Global Steps

::: distilabel.steps.globals.huggingface
3 changes: 3 additions & 0 deletions docs/api/steps/steps.md
@@ -0,0 +1,3 @@
# Steps

::: distilabel.steps.base
3 changes: 3 additions & 0 deletions docs/api/steps/tasks/embeddings.md
@@ -0,0 +1,3 @@
# Embeddings

::: distilabel.steps.tasks.generate_embeddings
3 changes: 3 additions & 0 deletions docs/api/steps/tasks/preference_tasks.md
@@ -0,0 +1,3 @@
# Preference Tasks

::: distilabel.steps.tasks.ultrafeedback
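
As a rough sketch of how the task is typically wired to a judge LLM (the `aspect` argument name and its value are assumptions to confirm against the reference above):

```python
from distilabel.llms import OpenAILLM
from distilabel.steps.tasks import UltraFeedback

# Sketch of configuring UltraFeedback as a labeller for instruction-following
# assessment; the `aspect` argument name and value are assumptions.
ultrafeedback = UltraFeedback(
    name="ultrafeedback",
    llm=OpenAILLM(model="gpt-4"),       # example judge model
    aspect="instruction-following",     # assumed aspect identifier
)
```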
29 changes: 29 additions & 0 deletions docs/api/steps/tasks/text_generation.md
@@ -0,0 +1,29 @@
# Tasks

::: distilabel.steps.tasks.base

## General Text Generation

::: distilabel.steps.tasks.text_generation

## Evol Instruct

::: distilabel.steps.tasks.evol_instruct.base
::: distilabel.steps.tasks.evol_instruct.generator
::: distilabel.steps.tasks.evol_instruct.utils

### Evol Complexity

::: distilabel.steps.tasks.evol_instruct.evol_complexity.base
::: distilabel.steps.tasks.evol_instruct.evol_complexity.generator
::: distilabel.steps.tasks.evol_instruct.evol_complexity.utils

## Evol Quality

::: distilabel.steps.tasks.evol_quality.base
::: distilabel.steps.tasks.evol_quality.utils

## DEITA Scorers

::: distilabel.steps.tasks.complexity_scorer
::: distilabel.steps.tasks.quality_scorer
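
For orientation, these task classes are typically instantiated with a `name` and an `llm`; any further arguments shown below (such as `num_evolutions`) are assumptions to verify against the reference above.

```python
from distilabel.llms import OpenAILLM
from distilabel.steps.tasks import EvolInstruct, TextGeneration

# Sketch only: `num_evolutions` is an assumed argument name for EvolInstruct.
text_generation = TextGeneration(
    name="text_generation",
    llm=OpenAILLM(model="gpt-3.5-turbo"),
)

evol_instruct = EvolInstruct(
    name="evol_instruct",
    llm=OpenAILLM(model="gpt-4"),
    num_evolutions=1,
)
```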
Binary file added docs/assets/images/sections/cli/cli_pipe_1.png
Binary file added docs/assets/images/sections/cli/cli_pipe_2.png
Binary file added docs/assets/tutorials-assets/deita/datasets.png
Binary file added docs/assets/tutorials-assets/deita/diversity.png
Binary file added docs/assets/tutorials-assets/deita/overview.png
Binary file added docs/assets/tutorials-assets/deita/results.png
72 changes: 0 additions & 72 deletions docs/concepts.md

This file was deleted.

51 changes: 11 additions & 40 deletions docs/index.md
@@ -1,6 +1,7 @@
---
description: Distilabel is an AI Feedback (AIF) framework for building datasets with and for LLMs.
---

# distilabel

AI Feedback (AIF) framework to build datasets with and for LLMs:
@@ -18,48 +19,18 @@ Requires Python 3.8+

In addition, the following extras are available:

- `anthropic`: for using models available in [Anthropic API](https://www.anthropic.com/api) via the `AnthropicLLM` integration.
- `argilla`: for exporting the generated datasets to [Argilla](https://argilla.io/).
- `hf-inference-endpoints`: for using the [Hugging Face Inference Endpoints](https://huggingface.co/inference-endpoints) via the `InferenceEndpointsLLM` integration.
- `hf-transformers`: for using models available in the [transformers](https://github.com/huggingface/transformers) package via the `TransformersLLM` integration.
- `litellm`: for using [`LiteLLM`](https://github.com/BerriAI/litellm) to call any LLM using the OpenAI format via the `LiteLLM` integration.
- `llama-cpp`: for using the [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) Python bindings for `llama.cpp` via the `LlamaCppLLM` integration.
- `mistralai`: for using models available in the [Mistral AI API](https://mistral.ai/news/la-plateforme/) via the `MistralAILLM` integration.
- `ollama`: for using [Ollama](https://ollama.com/) and their available models via the `OllamaLLM` integration.
- `openai`: for using [OpenAI API](https://openai.com/blog/openai-api) models via the `OpenAILLM` integration, as well as the other integrations that rely on the OpenAI client, such as `AnyscaleLLM`, `AzureOpenAILLM`, and `TogetherLLM`.
- `vertexai`: for using [Google Vertex AI](https://cloud.google.com/vertex-ai) proprietary models via the `VertexAILLM` integration.
- `vllm`: for using the [vllm](https://github.com/vllm-project/vllm) serving engine via the `vLLM` integration.

## Quick example

```python
--8<-- "docs/snippets/quick-example.py"
```

1. Create a `Task` for generating text given an instruction.
2. Create a `LLM` for generating text using the `Task` created in the first step. As the `LLM` will generate text, it will be a `generator`.
3. Create a pre-defined `Pipeline` using the `pipeline` function and the `generator` created in step 2. The `pipeline` function
will create a `labeller` LLM using `OpenAILLM` with the `UltraFeedback` task for instruction following assessment.

!!! note
To run the script successfully, ensure you have assigned your OpenAI API key to the `OPENAI_API_KEY` environment variable.

For a more complete example, check out our awesome tutorials in the docs or the example below:

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/argilla-io/distilabel/blob/main/docs/tutorials/pipeline-notus-instructions-preferences-legal.ipynb) [![Open Source in Github](https://img.shields.io/badge/github-view%20source-black.svg)](https://github.com/argilla-io/distilabel/blob/main/docs/tutorials/pipeline-notus-instructions-preferences-legal.ipynb)

## Navigation

<div class="grid cards" markdown>

- <p align="center"> [**Concept Guides**](./technical-reference/llms.md)</p>

---

Understand the components and their interactions.

- <p align="center"> [**API Reference**](./reference/distilabel/index.md)</p>

---

Technical description of the classes and functions.

</div>
ADD SHOWCASE EXAMPLE
21 changes: 21 additions & 0 deletions docs/overview.md
@@ -0,0 +1,21 @@
---
description: Get familiar with distilabel's pipelines.
---

# Overview of Distilabel

distilabel is an AI Feedback (AIF) framework to build datasets with and for LLMs.

## Pipeline

Define your pipeline like you would a Directed Acyclic Graph (DAG)...
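
A minimal sketch of what such a DAG-style definition could look like follows; the class names (`LoadDataFromDicts`, `TextGeneration`, `OpenAILLM`) and the `connect()`/`run()` calls are assumptions to check against the API reference.

```python
from distilabel.llms import OpenAILLM
from distilabel.pipeline import Pipeline
from distilabel.steps import LoadDataFromDicts
from distilabel.steps.tasks import TextGeneration

# Sketch of a two-node DAG: a generator step feeding a text generation task.
# Class names and the connect()/run() calls are assumptions to verify against
# the API reference.
with Pipeline(name="overview-example") as pipeline:
    load_data = LoadDataFromDicts(
        name="load_data",
        data=[{"instruction": "Explain AI Feedback in one sentence."}],
    )
    text_generation = TextGeneration(
        name="text_generation",
        llm=OpenAILLM(model="gpt-3.5-turbo"),
    )
    load_data.connect(text_generation)

if __name__ == "__main__":
    distiset = pipeline.run()  # returns a Distiset with the generated data
```

The resulting `Distiset` can then be pushed to the Hugging Face Hub or exported to Argilla, as covered in the sections referenced in this commit.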

## Steps

...

## Command Line Interface

Distilabel comes with a CLI to easily reproduce datasets from a `pipeline.yaml`.
...
