diff --git a/argilla-frontend/README.md b/argilla-frontend/README.md index cf8039f6b6..d8b4608aea 100644 --- a/argilla-frontend/README.md +++ b/argilla-frontend/README.md @@ -53,7 +53,9 @@ https://github.com/argilla-io/argilla/assets/1107111/49e28d64-9799-4cac-be49-19d ## 🚀 Quickstart -Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring. +Argilla is an open-source data curation platform for LLMs. Using Argilla, everyone can build robust language models +through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, +from data labeling to model monitoring. There are different options to get started: @@ -65,6 +67,12 @@ There are different options to get started: ## 🖥️ FRONTEND +- Before running Argilla frontend server, you need to install Node version 18: + +```bash +brew install node@18 +``` +

💣 Install dependencies

```bash @@ -89,17 +97,33 @@ npm run generate ## 📏 Principles -- **Open**: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can **use and combine your preferred libraries** without implementing any specific interface. +- **Open**: Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, + spaCy, Stanford Stanza, Flair, etc.). In fact, you can **use and combine your preferred libraries** without + implementing any specific interface. -- **End-to-end**: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to **iterate as much as you need**. +- **End-to-end**: Most annotation tools treat data collection as a one-off activity at the beginning of each project. In + real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model + goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over + time. Argilla is designed to close this gap, enabling you to **iterate as much as you need**. -- **User and Developer Experience**: The key to sustainable NLP solutions are to make it easier for everyone to contribute to projects. _Domain experts_ should feel comfortable interpreting and annotating data. _Data scientists_ should feel free to experiment and iterate. _Engineers_ should feel in control of data pipelines. Argilla optimizes the experience for these core users to **make your teams more productive**. 
+- **User and Developer Experience**: The key to sustainable NLP solutions is to make it easier for everyone to + contribute to projects. _Domain experts_ should feel comfortable interpreting and annotating data. _Data scientists_ + should feel free to experiment and iterate. _Engineers_ should feel in control of data pipelines. Argilla optimizes + the experience for these core users to **make your teams more productive**. -- **Beyond hand-labeling**: Classical hand-labeling workflows are costly and inefficient, but having humans in the loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision in **novel** data annotation workflows\*\*. +- **Beyond hand-labeling**: Classical hand-labeling workflows are costly and inefficient, but having humans in the loop + is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision + in **novel data annotation workflows**. ## 🫱🏾‍🫲🏼 Contribute -We love contributors and have launched a [collaboration with JustDiggit](https://argilla.io/blog/introducing-argilla-community-growers) to hand out our very own bunds and help the re-greening of sub-Saharan Africa. To help our community with the creation of contributions, we have created our [developer](https://docs.argilla.io/en/latest/community/developer_docs.html) and [contributor](https://docs.argilla.io/en/latest/community/contributing.html) docs. Additionally, you can always [schedule a meeting](https://calendly.com/argilla-office-hours/30min) with our Developer Advocacy team so they can get you up to speed. +We love contributors and have launched +a [collaboration with JustDiggit](https://argilla.io/blog/introducing-argilla-community-growers) to hand out our very +own bunds and help the re-greening of sub-Saharan Africa. 
To help our community with the creation of contributions, we +have created our [developer](https://docs.argilla.io/en/latest/community/developer_docs.html) +and [contributor](https://docs.argilla.io/en/latest/community/contributing.html) docs. Additionally, you can +always [schedule a meeting](https://calendly.com/argilla-office-hours/30min) with our Developer Advocacy team so they +can get you up to speed. ## 🥇 Contributors @@ -111,4 +135,5 @@ We love contributors and have launched a [collaboration with JustDiggit](https:/ ## 🗺️ Roadmap -We continuously work on updating [our plans and our roadmap](https://github.com/orgs/argilla-io/projects/10/views/1) and we love to discuss those with our community. Feel encouraged to participate. +We continuously work on updating [our plans and our roadmap](https://github.com/orgs/argilla-io/projects/10/views/1) and +we love to discuss those with our community. Feel encouraged to participate. diff --git a/argilla-server/README.md b/argilla-server/README.md index 57d145f4b7..8701f1146d 100644 --- a/argilla-server/README.md +++ b/argilla-server/README.md @@ -32,47 +32,21 @@

-Argilla is a **collaboration platform for AI engineers and domain experts** that require **high-quality outputs, full data ownership, and overall efficiency**. +Argilla is a **collaboration platform for AI engineers and domain experts** that require **high-quality outputs, full +data ownership, and overall efficiency**. -This repository only contains developer info about the backend server. If you want to get started, we recommend taking a look at our [main repository](https://github.com/argilla-io/argilla) or our [documentation](https://docs.argilla.io/). +This repository only contains developer info about the backend server. If you want to get started, we recommend taking a +look at our [main repository](https://github.com/argilla-io/argilla) or our [documentation](https://docs.argilla.io/). -Are you a contributor or do you want to understand what is going on under the hood, please keep reading the documentation below. - -## Clone repository - -`argilla-server` is using `argilla` repository as submodule to build frontend statics so when cloning use the following command: - -```sh -git clone --recurse-submodules git@github.com:argilla-io/argilla-server.git -``` - -If you already cloned the repository without using `--recurse-submodules` you can init and update the submodules with: - -```sh -git submodule update --remote --recursive --init -``` - -> [!IMPORTANT] -> By default `argilla` submodule is using `develop` branch so the previous command will get the latest commit from that branch. - -### Specify a tag for argilla submodule - -When doing a release we should change `argilla` submodule to use an specific tag. In the following example we are setting tag `v1.22.0`: - -```sh -cd argilla -git fetch --tags -git checkout v1.22.0 -``` - -> [!NOTE] -> You should see some changes on the `argilla-server` root folder where the subproject commit is now changed to the one from the tag version. Feel free to commit these changes. 
+If you are a contributor or want to understand what is going on under the hood, please keep reading the +documentation below. ## Development environment -By default all commands executed with `pdm run` will get environment variables from `.env.dev` except command `pdm test` that will overwrite some of them using values coming from `.env.test` file. +By default, all commands executed with `pdm run` get their environment variables from `.env.dev`, except `pdm test`, +which overwrites some of them with values from the `.env.test` file. -These environment variables can be overrided if necessary so feel free to defined your own ones locally. +These environment variables can be overridden if necessary, so feel free to define your own locally. ### Run cli @@ -82,7 +56,8 @@ pdm cli ### Run database migrations -By default a SQLite located at `~/.argilla/argilla.db` will be used. You can create the database and run migrations with the following custom PDM command: +By default, a SQLite database located at `~/.argilla/argilla.db` will be used. You can create the database and run migrations with +the following custom PDM command: ```sh pdm migrate @@ -90,7 +65,8 @@ pdm migrate ### Run tests -A SQLite database located at `~/.argilla/argilla-test.db` will be automatically created to run tests. You can run the entire test suite using the following custom PDM command: +A SQLite database located at `~/.argilla/argilla-test.db` will be automatically created to run tests. You can run the +entire test suite using the following custom PDM command: ```sh pdm test @@ -98,21 +74,8 @@ pdm test ## Run development server -### Build frontend static files - -Before running Argilla development server we need to build the frontend static files. 
Node version 18 is required for this action: - -```sh -brew install node@18 -``` - -After that you can build the frontend static files: - -```sh -./scripts/build_frontend.sh -``` - -After running the previous script you should have a folder at `src/argilla_server/static` with all the frontend static files successfully generated. +Note: If you need to run the frontend server you can follow the instructions at +the [argilla-frontend](/argilla-frontend/README.md) project ### Run uvicorn development server diff --git a/argilla-server/scripts/build_distribution.sh b/argilla-server/scripts/build_distribution.sh deleted file mode 100755 index 810ded14ca..0000000000 --- a/argilla-server/scripts/build_distribution.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/usr/bin/env bash -set -e - -BASEDIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )" - -$BASEDIR/build_frontend.sh - -rm -rf dist && pdm build diff --git a/argilla-server/scripts/build_frontend.sh b/argilla-server/scripts/build_frontend.sh deleted file mode 100755 index 4c517672c9..0000000000 --- a/argilla-server/scripts/build_frontend.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/usr/bin/env bash - -cd argilla/frontend \ -&& npm install \ -&& npm run-script lint \ -&& npm run-script test \ -&& BASE_URL=@@baseUrl@@ DIST_FOLDER=../../src/argilla_server/static npm run-script build \ diff --git a/docs/_source/community/developer_docs.md b/docs/_source/community/developer_docs.md index f8f6c8a2a4..888249564e 100644 --- a/docs/_source/community/developer_docs.md +++ b/docs/_source/community/developer_docs.md @@ -1,18 +1,40 @@ # Developer Documentation -Being a developer in Argilla means that you are a part of the Argilla community and you are contributing to the development of Argilla. This page will guide you through the steps that you need to take to set up your development environment and start contributing to Argilla. 
Argilla is built upon different core components: +Being a developer in Argilla means that you are a part of the Argilla community, and you are contributing to the +development of Argilla. This page will guide you through the steps that you need to take to set up your development +environment and start contributing to Argilla. Argilla is built upon different core components: -- **Documentation**: The documentation for Argilla serves as an invaluable resource, providing a comprehensive and in-depth guide for users seeking to explore, understand, and effectively harness the core components of the Argilla ecosystem. +- **Documentation**: The documentation for Argilla serves as an invaluable resource, providing a comprehensive and +in-depth guide for users seeking to explore, understand, and effectively harness the core components of the Argilla +ecosystem. -- **Python SDK**: A Python SDK which is installable with `pip install argilla`, to interact with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows. +- **Python SDK**: A Python SDK which is installable with `pip install argilla`, to interact with the Argilla Server and +the Argilla UI. It provides an API to manage the data, configuration, and annotation workflows. -- **FastAPI Server**: The core of Argilla is a Python `FastAPI server` that manages the data, by pre-processing it and storing it in the vector database. Also, it stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI. It also provides a web interface to visualize the data. +- **FastAPI Server**: The core of Argilla is a Python `FastAPI server` that manages the data, by pre-processing it and +storing it in the vector database. Also, it stores application information in the relational database. It provides a +REST API to interact with the data from the Python SDK and the Argilla UI. 
It also provides a web interface to visualize +the data. -- **Relational Database**: A relational database to store the metadata of the records and the annotations. `SQLite` is used as the default built-in option and is deployed separately with the Argilla Server but a separate `PostgreSQL` can be used too. +- **Relational Database**: A relational database to store the metadata of the records and the annotations. `SQLite` is +used as the default built-in option and is deployed separately with the Argilla Server but a separate `PostgreSQL` +can be used too. -- **Vector Database**: A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support `ElasticSearch` and `AWS OpenSearch` and they can be deployed as separate Docker images. +- **Vector Database**: A vector database to store the records data and perform scalable vector similarity searches and +basic document searches. We currently support `ElasticSearch` and `AWS OpenSearch` and they can be deployed as separate +Docker images. -- **Vue.js UI**: A web application to visualize and annotate your data, users, and teams. It is built with `Vue.js` and is directly deployed alongside the Argilla Server within our Argilla Docker image. +- **Vue.js UI**: A web application to visualize and annotate your data, users, and teams. It is built with `Vue.js` and +is directly deployed alongside the Argilla Server within our Argilla Docker image. + +The Argilla repository has a monorepo structure, which means that all the components live in the same repository: +`argilla-io/argilla`. 
This repo is divided into the following folders: + +- [`argilla`](/argilla): The python SDK project +- [`argilla-server`](/argilla-server): The FastAPI server project +- [`argilla-frontend`](/argilla-frontend): The Vue.js UI project +- [`docs`](/docs): The documentation project +- [`examples`](/examples): Example resources for deployments, scripts and notebooks For a proper installation, you will need to: @@ -26,11 +48,14 @@ And, you can start to [make your contribution](#make-your-contribution)! ## Set up the Documentation Environment -To kickstart your journey in contributing to Argilla, immersing yourself in the documentation is highly recommended. To do so, we recommend you create a virtual environment and follow the steps below. To build the documentation, a reduced set of dependencies is needed. +To kickstart your journey in contributing to Argilla, immersing yourself in the documentation is highly recommended. To +do so, we recommend you create a virtual environment and follow the steps below. To build the documentation, a reduced +set of dependencies is needed. ### Clone the Argilla Repository -First of all, you have to fork our repository and clone the fork to your computer. For more information, you can check our [guide](/community/contributing.md#work-with-a-fork). +First of all, you have to fork our repository and clone the fork to your computer. For more information, you can check +our [guide](/community/contributing.md#work-with-a-fork). ```sh git clone https://github.com/[your-github-username]/argilla.git @@ -53,19 +78,26 @@ To build the documentation, make sure you set up your system by installing the r pip install -r docs/_source/requirements.txt ``` -During the installation, you may encounter the following error: Microsoft Visual C++ 14.0 or greater is required. To solve it easily, check this [link](https://learn.microsoft.com/en-us/answers/questions/136595/error-microsoft-visual-c-14-0-or-greater-is-requir). 
+During the installation, you may encounter the following error: Microsoft Visual C++ 14.0 or greater is required. To +solve it easily, check this [link](https://learn.microsoft.com/en-us/answers/questions/136595/error-microsoft-visual-c-14-0-or-greater-is-requir). ### Build the documentation -To build the documentation, it is used [`sphinx`](https://www.sphinx-doc.org/en/master/),an open-source documentation generator, that is, it uses reStructuredText for writing documentation. Using Sphinx's command-line tool, it takes a collection of source files in plain text and generate them in HTML format. It also automatically creates a table of contents, index pages, and search features, enhancing navigation. To do so, the following files are required: +To build the documentation, we use [`sphinx`](https://www.sphinx-doc.org/en/master/), an open-source documentation generator that uses +reStructuredText for writing documentation. Using Sphinx's command-line tool, it takes a collection of source files +in plain text and generates HTML pages from them. It also automatically creates a table of contents, index pages, and +search features, enhancing navigation. To do so, the following files are required: -- **index.rst**: This serves as the main entry point for our documentation, accessible at the root URL. It typically includes a table of contents (using the toc trees), connecting users to other documentation sections. +- **index.rst**: This serves as the main entry point for our documentation, accessible at the root URL. It typically +includes a table of contents (using the toc trees), connecting users to other documentation sections. - **conf.py**: This file enables customization of the documentation's output. - **Makefile**: A crucial component provided by Sphinx, serving as the primary tool for local development. - **Other .rst files**: These are intended for specific subsections of the documentation. - **Markdown files**: The source files with plain text. 
-In our case, we rely on [`MyST-Parser`](https://myst-parser.readthedocs.io/en/latest/) to facilitate our work with Markdown. So, it's essential that when writing the documentation, we utilize [proper cross-references](https://docs.readthedocs.io/en/stable/guides/cross-referencing-with-sphinx.html) to connect various sections and documents. Below, you can find a typical illustration of commonly used cross-references: +In our case, we rely on [`MyST-Parser`](https://myst-parser.readthedocs.io/en/latest/) to facilitate our work with Markdown. So, it's essential that when writing +the documentation, we utilize [proper cross-references](https://docs.readthedocs.io/en/stable/guides/cross-referencing-with-sphinx.html) to connect various sections and documents. Below, you can +find a typical illustration of commonly used cross-references: ```md # To reference a previous section @@ -89,13 +121,17 @@ Reference [](my_target). - {doc}`Custom title ` ``` -So, once the documentation is written or fixed, if the installation was smooth, then use `sphinx-autobuild` to continuously deploy the webpage using the following command: +So, once the documentation is written or fixed, if the installation was smooth, then use `sphinx-autobuild` to +continuously deploy the webpage using the following command: ```sh sphinx-autobuild docs/_source docs/_build/html ``` -This will create a _build/html folder that is served at [http://127.0.0.1:8000](http://127.0.0.1:8000). Also, it starts watching for changes in the docs/source directory. When a change is detected in docs/source, the documentation is rebuilt and any open browser windows are reloaded automatically. Make sure that all files are indexed correctly. KeyboardInterrupt (ctrl+c) will stop the server. Below is an example of the server output running and stopping: +This will create a _build/html folder that is served at [http://127.0.0.1:8000](http://127.0.0.1:8000). Also, it starts watching for +changes in the docs/_source directory. 
When a change is detected in docs/_source, the documentation is rebuilt and any +open browser windows are reloaded automatically. Make sure that all files are indexed correctly. KeyboardInterrupt (ctrl+c) +will stop the server. Below is an example of the server output running and stopping: ```sh The HTML pages are in docs\_build\html. @@ -113,13 +149,16 @@ The HTML pages are in docs\_build\html. ## Set up the Development Environment -To work and develop for the core product of Argilla, you need to have all of Argilla's subsystem correctly running. In this section, we'll show how to install the Argilla package, the databases and the server. The frontend is optional and only required for running the UI, but you can also find how to run it here. +To work and develop for the core product of Argilla, you need to have all of Argilla's subsystems correctly running. In +this section, we'll show how to install the Argilla package, the databases and the server. The frontend is optional +and only required for running the UI, but you can also find how to run it here. ### Creating the Python Environment #### Clone the Argilla Repository -To set up your system for Argilla development, you, first of all, have to [fork](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) our repository and [clone](https://github.com/argilla-io/argilla) the fork to your computer. +To set up your system for Argilla development, you first have to [fork](https://docs.github.com/en/get-started/quickstart/contributing-to-projects) our repository and [clone](https://github.com/argilla-io/argilla) +the fork to your computer. ```sh git clone https://github.com/[your-github-username]/argilla.git @@ -134,15 +173,20 @@ git remote add upstream https://github.com/argilla-io/argilla.git #### Install Dependencies -You will need to install `argilla` and the extra dependencies that you prefer to be able to use Argilla in your Python client or Command Line Interface (CLI). 
There are two ways to install it and you can opt for one of them depending on your use case: +You will need to install `argilla` and the extra dependencies that you prefer to be able to use Argilla in your Python +client or Command Line Interface (CLI). There are two ways to install it and you can opt for one of them depending on +your use case: -- Install `argilla` with `pip`: Recommended for non-extensive, one-time contributions as it will only install the required packages. +- Install `argilla` with `pip`: Recommended for non-extensive, one-time contributions as it will only install the +required packages. -- Install `argilla` with `conda`: Recommended for comprehensive, continuous contributions as it will create an all-inclusive environment for development. +- Install `argilla` with `conda`: Recommended for comprehensive, continuous contributions as it will create an +all-inclusive environment for development. ##### Install with `pip` -If you choose to install Argilla via `pip`, you can do it easily on your terminal. Firstly, direct to the `argilla` folder in your terminal by: +If you choose to install Argilla via `pip`, you can do it easily on your terminal. Firstly, direct to the `argilla` +folder in your terminal by: ```sh cd argilla @@ -155,7 +199,9 @@ python -m venv .env source .env/bin/activate ``` -Then, you just need to install Argilla with the command below. Note that we will install it in editable mode using the -e/--editable flag in the `pip` command to avoid having to re-install it on every code modification, but if you’re not planning to modify the code, you can just omit the -e/--editable flag. +Then, you just need to install Argilla with the command below. Note that we will install it in editable mode using the +-e/--editable flag in the `pip` command to avoid having to re-install it on every code modification, but if you’re not +planning to modify the code, you can just omit the -e/--editable flag. ```sh pip install -e . 
@@ -167,7 +213,9 @@ Or installing just the `server` extra: pip install -e ".[server]" ``` -Or you can install all the extras, which are also required to run the tests via pytest to make sure that the implemented features or the bug fixes work as expected, and that the unit/integration tests are passing. If you encounter any package or dependency problems, please consider upgrading or downgrading the related packages to solve the problem. +Or you can install all the extras, which are also required to run the tests via pytest to make sure that the implemented +features or the bug fixes work as expected, and that the unit/integration tests are passing. If you encounter any package +or dependency problems, please consider upgrading or downgrading the related packages to solve the problem. ```sh pip install -e ".[server,listeners,postgresql,integrations,tests]" @@ -175,28 +223,30 @@ pip install -e ".[server,listeners,postgresql,integrations,tests]" ##### Install with `conda` -If you want to go with `conda` to install Argilla, firstly make sure that you have the latest version of conda on your system. You can go to the [anaconda page](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation) and follow the tutorial there to make a clean install of `conda` on your system. - -Make sure that you are in the argilla folder. - -```sh -cd argilla -``` +If you want to go with `conda` to install Argilla, firstly make sure that you have the latest version of conda on your +system. You can go to the [anaconda page](https://conda.io/projects/conda/en/latest/user-guide/install/index.html#regular-installation) and follow the tutorial there to make a clean install of `conda` on +your system. -Then, you can go ahead and create a new conda development environment, and then, activate it: +Make sure that you are in the argilla folder. 
Then, you can go ahead and create a new conda development environment, and +then, activate it: ```sh conda env create -f environment_dev.yml conda activate argilla ``` -In the new Conda environment, Argilla will already be installed in editable mode with all the server dependencies. But if you’re willing to install any other dependency you can do so via `pip` to install your own, or just see the available extras besides the `server` extras, which are: `listeners`, `postgresql`, and `tests`; all those installable as `pip install -e ".[]"`. +In the new Conda environment, Argilla will already be installed in editable mode with all the server dependencies. But +if you’re willing to install any other dependency you can do so via `pip` to install your own, or just see the available +extras besides the `server` extras, which are: `listeners`, `postgresql`, and `tests`; all those installable as +`pip install -e ".[]"`. -Now, the Argilla package is set up on your system and you need to make further installations for a thorough development setup. +Now, the Argilla package is set up on your system and you need to make further installations for a thorough development +setup. #### Install Code Formatting Tools -To keep a consistent code format, we use [pre-commit](https://pre-commit.com/) hooks. So, you first need to install `pre-commit` if not installed already, via pip as follows: +To keep a consistent code format, we use [pre-commit](https://pre-commit.com/) hooks. So, you first need to install `pre-commit` if not +installed already, via pip as follows: ```sh pip install pre-commit @@ -210,13 +260,19 @@ pre-commit install ### Set up the Databases -Argilla is built upon two databases: vector database and relational database. The vector database stores all the record data and is the component that performs scalable vector similarity searches as well as basic vector searches. 
On the other hand, the relational database stores the metadata of the records and annotations besides user and workspace information. +Argilla is built upon two databases: vector database and relational database. The vector database stores all the record +data and is the component that performs scalable vector similarity searches as well as basic vector searches. On the +other hand, the relational database stores the metadata of the records and annotations besides user and workspace +information. #### Vector Database -Argilla supports ElasticSearch and OpenSearch as its main search engine for the vector database. One of the two is required to correctly run Argilla in your development environment. +Argilla supports ElasticSearch and OpenSearch as its main search engine for the vector database. One of the two is +required to correctly run Argilla in your development environment. -To install Elasticsearch or Opensearch, and to work with Argilla on your server later, you first need to install Docker on your system. You can find the Docker installation guides for [Windows](https://docs.docker.com/desktop/install/windows-install/), [macOS](https://docs.docker.com/desktop/install/mac-install/) and [Linux](https://docs.docker.com/desktop/install/linux-install/) on Docker website. +To install Elasticsearch or Opensearch, and to work with Argilla on your server later, you first need to install Docker +on your system. You can find the Docker installation guides for [Windows](https://docs.docker.com/desktop/install/windows-install/), [macOS](https://docs.docker.com/desktop/install/mac-install/) and [Linux](https://docs.docker.com/desktop/install/linux-install/) on +Docker website. To install ElasticSearch or OpenSearch, you can refer to the [Setup and Installation](/getting_started/installation/deployments/docker.md) guide. @@ -225,7 +281,8 @@ Argilla supports ElasticSearch versions >=8.5, and OpenSearch versions >=2.4. 
 :::
 
 :::{note}
-For vector search in OpenSearch, the filtering applied is using a `post_filter` step, since there is a bug that makes queries fail using filtering + knn from Argilla.
+For vector search in OpenSearch, the filtering applied is using a `post_filter` step, since there is a bug that makes
+queries fail using filtering + knn from Argilla.
 See https://github.com/opensearch-project/k-NN/issues/1286
 
 This may result in unexpected results when combining filtering with vector search with this engine.
@@ -233,13 +290,17 @@ This may result in unexpected results when combining filtering with vector searc
 
 #### Relational Database and Migration
 
-Argilla will use SQLite as the default built-in option to store information about users, workspaces, etc. for the relational database. No additional configuration is required to start using SQLite.
+Argilla will use SQLite as the default built-in option to store information about users, workspaces, etc. for the
+relational database. No additional configuration is required to start using SQLite.
 
-By default, the database file will be created at `~/.argilla/argilla.db`, this can be configured by setting different values for `ARGILLA_DATABASE_URL` and `ARGILLA_HOME_PATH` environment variables.
+By default, the database file will be created at `~/.argilla/argilla.db`, this can be configured by setting different
+values for `ARGILLA_DATABASE_URL` and `ARGILLA_HOME_PATH` environment variables.
 
 ##### Run Database Migration
 
-Starting from Argilla 1.16.0, the data of the FeedbackDataset along with the user and workspace information are stored in an SQL database (SQLite or PostgreSQL). With each Argilla release, you may need to update the database schema to the newer version. Here, you can find how to do this database migration.
+Starting from Argilla 1.16.0, the data of the FeedbackDataset along with the user and workspace information are stored
+in an SQL database (SQLite or PostgreSQL). With each Argilla release, you may need to update the database schema to
+the newer version. Here, you can find how to do this database migration.
 
 You can run database migrations by executing the following command:
 
@@ -247,11 +308,14 @@ You can run database migrations by executing the following command:
 argilla server database migrate
 ```
 
-The default SQLite database will be created at `~/.argilla/argilla.db`. This can be changed by setting different values for `ARGILLA_DATABASE_URL` and `ARGILLA_HOME_PATH` environment variables.
+The default SQLite database will be created at `~/.argilla/argilla.db`. This can be changed by setting different values
+for `ARGILLA_DATABASE_URL` and `ARGILLA_HOME_PATH` environment variables.
 
 ##### Create the Default User
 
-To run the Argilla database and server on your system, you should at least create the default user. Alternatively, you may skip a default user and directly create user(s) whose credentials you will set up. You can refer to the [user management](../getting_started/installation/configurations/user_management.md#create-a-user) page for detailed information.
+To run the Argilla database and server on your system, you should at least create the default user. Alternatively, you
+may skip a default user and directly create user(s) whose credentials you will set up. You can refer to the
+[user management](../getting_started/installation/configurations/user_management.md#create-a-user) page for detailed information.
 
 To create a default user, you can run the following command:
 
@@ -261,7 +325,9 @@ argilla server database users create_default
 
 ##### Recreate the Database
 
-Occasionally, it may be necessary to recreate the database from scratch to ensure a clean state in your development environment. For instance, to run the Argilla test suite or troubleshoot issues that could be related to database inconsistencies.
+Occasionally, it may be necessary to recreate the database from scratch to ensure a clean state in your development
+environment. For instance, to run the Argilla test suite or troubleshoot issues that could be related to database
+inconsistencies.
 
 First, you need to delete the Argilla database with the following command:
 
@@ -269,110 +335,65 @@ First, you need to delete the Argilla database with the following command:
 rm ~/.argilla/argilla.db
 ```
 
-After deleting the database, you will need to run the [database migration](#run-database-migration) task. By following these steps, you’ll have a fresh and clean database to work with.
-
-### Set up the Frontend
+After deleting the database, you will need to run the [database migration](#run-database-migration) task. By following these steps, you’ll
+have a fresh and clean database to work with.
 
-If you want to work on the frontend of Argilla, you can do so by following the steps below.
+### Set up Argilla Server
 
-#### Clone the Argilla Repository
-
-Firstly, you have to [fork our repository and clone the fork](<(/community/contributing.md#work-with-a-fork)>) to your computer.
-
-```sh
-git clone https://github.com/[your-github-username]/argilla.git
-cd argilla
-```
+If you want to work on the server of Argilla, please visit the `argilla-server` [README.md](/argilla-server/README.md)
+file to see how to set up the server and run it on your local machine.
 
-To keep your fork’s develop branch up to date with our repo you should add it as an [upstream remote branch](https://dev.to/louhayes3/git-add-an-upstream-to-a-forked-repo-1mik):
-
-```sh
-git remote add upstream https://github.com/argilla-io/argilla.git
-```
-
-#### Build Frontend Static Files
-
-Build the static UI files in case you want to work on the UI:
-
-```sh
-bash scripts/build_frontend.sh
-```
-
-#### Run Frontend Files
-
-Run the Argilla backend using Docker with the following command:
-
-```sh
-docker run -d --name quickstart -p 6900:6900 argilla/argilla-quickstart:latest
-```
+### Set up Argilla Frontend
 
-Navigate to the `frontend` folder from your project's root directory.
+If you want to work on the frontend of Argilla, please visit the `argilla-frontend` [README.md](/argilla-frontend/README.md)
+file to see how to set up the frontend and run it on your local machine.
 
-Then, execute the command:
-
-```sh
-npm run dev
-```
-
-To log in, use the username `admin` and the password `12345678`. If you need more information, please check [here](/getting_started/quickstart_installation.ipynb).
+## Make Your Contribution
 
-### Set up the Server
+Now that everything is up and running, you can start to develop and contribute to Argilla! You can refer to
+our [contributor guide](/community/contributing.md) to have an understanding of how you can structure your contribution and upload it
+to the repository.
 
-Before running the Argilla server, it is recommended to [build the frontend files](#build-frontend-static-files) to be able to access the UI on your local host.
+### Run Tests
 
-Then, to run Argilla backend, you will need an ElasticSearch instance up and running for the time being. You can get one running using Docker with the following command:
+#### Running Tests for the Argilla Python SDK
+Running tests at the end of every development cycle is indispensable to make sure that there are no breaking changes. In
+your Argilla environment, you can run all the tests as follows (under the argilla project folder):
 
 ```sh
-docker run -d --name elasticsearch-for-argilla -p 9200:9200 -p 9300:9300 -e "ES_JAVA_OPTS=-Xms512m -Xmx512m" -e "discovery.type=single-node" -e "xpack.security.enabled=false" docker.elastic.co/elasticsearch/elasticsearch:8.5.3
+pytest tests
 ```
 
-You will also need the vector database set up, as we show in the [Vector Database](#vector-database ) section.
-
-
-#### Launch Argilla Server
-
-Now that your system has the Argilla backend server, you are ready to start your server and access Argilla. You can either use the CLI command, which uses the port 6900 and the host 0.0.0.0 as default.
+You can also run only the unit tests by providing the proper path:
 
 ```sh
-argilla server start ARGILLA_ENABLE_TELEMETRY=0
+pytest tests/unit
 ```
 
-Or you can start the server through uvicorn, with the following command:
+To run the heavier integration tests, run pytest on the `tests/integration` folder:
 
 ```sh
-ARGILLA_ENABLE_TELEMETRY=0 uvicorn argilla.server.app:app --port 6900 --host 0.0.0.0 --reload
+pytest tests/integration
 ```
 
-With this command, you will activate reloading the backend files after every change. This way, whenever you make a change and save it, it will automatically be reflected in your server.
+#### Running tests for the Argilla Server
 
-Note that we start the server with `ARGILLA_ENABLE_TELEMETRY=0` to stop anonymous reporting for our development environment. You can read more about telemetry settings on the [telemetry page](/reference/telemetry.md).
-
-## Make Your Contribution
-
-Now that everything is up and running, you can start to develop and contribute to Argilla! You can refer to our [contributer guide](/community/contributing.md) to have an understanding of how you can structure your contribution and upload it to the repository.
-
-### Run Tests
-
-Running tests at the end of every development cycle is indispensable to make sure that there are no breaking changes. In your Argilla environment, you can run all the tests as follows:
+To run the tests for the Argilla Server, you can use the following command (under the argilla project folder):
 
 ```sh
-pytest tests
+pdm test tests/unit
 ```
 
-You can also run only the unit tests by providing the proper path:
+You can also set up a PostgreSQL database instead of the default SQLite backend:
 
 ```sh
-pytest tests/unit
+ARGILLA_DATABASE_URL=postgresql://postgres:postgres@localhost:5432 pdm test tests/unit
 ```
 
-For the unit tests, you can also set up a PostgreSQL database instead of the default sqlite backend:
+#### Running tests for the Argilla Frontend
 
-```sh
-ARGILLA_DATABASE_URL=postgresql://postgres:postgres@localhost:5432 pytest tests/unit
-```
-
-For running more heavy integration tests you can just run pytest with the `tests/integration` folder:
+To run the tests for the Argilla Frontend, you can use the following command (under the argilla project folder):
 
 ```sh
-pytest tests/integration
+npm run test
 ```
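
The PostgreSQL variant of the server tests in the diff above passes the connection string inline through `ARGILLA_DATABASE_URL`. As a minimal sketch, the same value could be assembled from its parts and exported once per shell session. The `pg_url` helper is hypothetical (it is not part of Argilla), and the credentials are the placeholder values from the example, not recommended settings.

```sh
#!/bin/sh
# Sketch only: pg_url is a hypothetical helper, not part of Argilla.
set -eu

# Build a URL shaped like the one in the example:
# postgresql://USER:PASSWORD@HOST:PORT
pg_url() {
  printf 'postgresql://%s:%s@%s:%s\n' "$1" "$2" "$3" "$4"
}

# Placeholder credentials copied from the example; replace them with your own.
ARGILLA_DATABASE_URL=$(pg_url postgres postgres localhost 5432)
export ARGILLA_DATABASE_URL

echo "ARGILLA_DATABASE_URL=$ARGILLA_DATABASE_URL"
```

If the variable is exported in the shell that later runs the server tests, the inline prefix on the test command should no longer be needed, assuming the test runner inherits the environment of that shell.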