docs: update some refernces
davidberenstein1957 committed Feb 20, 2024
1 parent caade42 commit 1710fbf
Showing 1 changed file with 17 additions and 197 deletions.
214 changes: 17 additions & 197 deletions README.md
@@ -35,8 +35,9 @@
<p align="center">
<a href="https://demo.argilla.io/sign-in?auth=ZGVtbzoxMjM0NTY3OA%3D%3D">🛝 Demo</a> <span> | </span>
<a href="https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html#%F0%9F%91%A9%F0%9F%8F%BD%E2%80%8D%F0%9F%9A%80-Argilla-on-Hugging-Face-Spaces">🚀 Deploy</a> <span> | </span>
<a href="#-contribute">👨‍💻 Features</a>
<a href="#-contribute">🤝 Contribute</a>
<a href="#-contribute">👨‍💻 Getting started</a> <span> | </span>
<a href="#-contribute">📺 Demo video</a> <span> | </span>
<a href="#-contribute">🏘️ Community meetup</a>
</p>
</h3>

@@ -48,11 +49,11 @@ If you just want to get started, great!

2. 🚀 Deploy Argilla for free using [three clicks](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html#%F0%9F%91%A9%F0%9F%8F%BD%E2%80%8D%F0%9F%9A%80-Argilla-on-Hugging-Face-Spaces).

3. 👨‍💻 Explore our [unique features](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html#%F0%9F%91%A9%F0%9F%8F%BD%E2%80%8D%F0%9F%9A%80-Argilla-on-Hugging-Face-Spaces).
3. 👨‍💻 Get started with our [unique features](https://docs.argilla.io/en/latest/getting_started/quickstart_installation.html#%F0%9F%91%A9%F0%9F%8F%BD%E2%80%8D%F0%9F%9A%80-Argilla-on-Hugging-Face-Spaces).

4. 📺 Watch our [demo video](https://www.youtube.com/watch?v=FlJ6hrBB2bU).

5. 🏘️ Attend our [online community meetup](https://lu.ma/embed-checkout/evt-IQtRiSuXZCIW6FB)
5. 🏘️ Attend our [online community meetup](https://lu.ma/embed-checkout/evt-IQtRiSuXZCIW6FB).

Want to know more? Read our [documentation](https://docs.argilla.io/).

@@ -62,15 +63,15 @@ We designed Argilla to help you create the **highest quality AI through the leas

<details>
<summary>Improve your AI output quality through data quality.</summary>
<p>
Compute is expensive and output quality is important. By focusing on data you can tackle the root cause of both of these problems.
</p>
Compute is expensive and output quality is important. By focusing on data, you can tackle the root cause of both problems at once. Argilla helps you achieve and keep high-quality standards for your data. This means you can improve the quality of your AI output and reduce the time and cost of your AI projects.
</details>
<details>
<summary>Reduce the time-to-value for AI projects with engaging data interaction.</summary>
Gathering data is a time-consuming process. Argilla helps you reduce the time-to-value for AI projects by providing a platform that lets you interact with your data in a more engaging way. You can quickly and easily label your data with filters, AI feedback suggestions and semantic search, so you can focus on training your models and monitoring their performance.
</details>
<details>
<summary>Take control by owning your data and models.</summary>
Most AI platforms are black boxes. Argilla is different. We believe that you should be the owner of your data and your models. That's why we provide you with all the tools your team needs to manage your data and models in a way that suits you best.
</details>

## What can you build with Argilla?
@@ -83,10 +84,10 @@ Argilla is a tool that can be used for high-quality data with a focus on NLP and
Additionally, AI experts and domain experts from companies like [the Red Cross](https://510.global/), [Loris.ai](https://loris.ai/) and [Prolific](https://www.prolific.com/) use Argilla to improve the quality and efficiency of their AI projects. They shared their experiences with our community in our [online community meetup](https://lu.ma/embed-checkout/evt-IQtRiSuXZCIW6FB).

- AI for good: [the Red Cross presentation](https://youtu.be/ZsCqrAhzkFU?feature=shared) showcases how their team collaborates by classifying and redirecting requests from refugees of the Ukrainian crisis to streamline the support processes of the Red Cross.
- Customer support: [Loris showed](https://youtu.be/jWrtgf2w4VU?feature=shared) how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a large number of multi-label classifiers.
- Research studies: [Prolific](https://youtu.be/ePDlhIxnuAs?feature=shared) is actively distributing data collection projects among its annotating workforce. They do this through an integration with our platform.
- Customer support: during [the Loris meetup](https://youtu.be/jWrtgf2w4VU?feature=shared), they showed how their AI team uses unsupervised and few-shot contrastive learning to help them quickly validate and gain labelled samples for a large number of multi-label classifiers.
- Research studies: [the showcase from Prolific](https://youtu.be/ePDlhIxnuAs?feature=shared) announced their integration with our platform. They use it to actively distribute data collection projects among their annotating workforce. This allows them to quickly and efficiently collect high-quality data for their research studies.

## 🚀 Quickstart
## 👨‍💻 Getting started

<details>
<summary><a href="https://docs.argilla.io/en/latest/getting_started/installation/deployments/docker.html">pip install argilla</a></summary>
@@ -307,207 +308,26 @@ records = trainer.predict(["my-text"], as_argilla_records=True)
</p>
</details>

## 🛠️ Project Architecture

Argilla is built on 5 core components:

<details>
<summary><strong>Python SDK</strong></summary>
<p>

A Python SDK, installable with `pip install argilla`, for interacting with the Argilla Server and the Argilla UI. It provides an API to manage the data, configuration and annotation workflows.

</p>
</details>

<details>
<summary><strong>FastAPI Server</strong></summary>
<p>

The core of Argilla is a <strong>Python FastAPI</strong> server that manages the data by pre-processing it and storing it in the vector database. It also stores application information in the relational database. It provides a REST API to interact with the data from the Python SDK and the Argilla UI, as well as a web interface to visualize the data.

</p>
</details>

<details>
<summary><strong>Relational Database</strong></summary>
<p>

A relational database to store the metadata of the records and the annotations. <strong>SQLite</strong> is used as the default built-in option and is deployed separately with the Argilla Server but a separate <strong>PostgreSQL</strong> can be used too.

</p>
</details>
<details>
<summary><strong>Vector Database</strong></summary>
<p>

A vector database to store the records data and perform scalable vector similarity searches and basic document searches. We currently support <strong>ElasticSearch</strong> and <strong>AWS OpenSearch</strong> and they can be deployed as separate Docker images.

</p>
</details>

<details>
<summary><strong>Vue.js UI</strong></summary>
<p>

A web application to visualize and annotate your data, users and teams. It is built with <strong>Vue.js</strong> and is directly deployed alongside the Argilla Server within our Argilla Docker image.

</p>
</details>
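
The split between the relational database and the vector database described above can be illustrated with a toy sketch. It uses only the Python standard library: an in-memory SQLite table stands in for the relational side, and a plain dictionary with cosine similarity stands in for the vector side. The table name, column names, record ids and helper functions are hypothetical and are not Argilla's actual schema or API.

```python
import math
import sqlite3

# Relational side: an in-memory SQLite table holding annotation metadata.
# Table and column names are hypothetical, not Argilla's real schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE annotations (record_id TEXT PRIMARY KEY, label TEXT, annotator TEXT)"
)
conn.executemany(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    [("r1", "positive", "ana"), ("r2", "negative", "ben")],
)

# Vector side: a toy stand-in for the search backend (record id -> embedding).
vectors = {"r1": [1.0, 0.0], "r2": [0.0, 1.0], "r3": [0.9, 0.1]}


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def most_similar(query, k=2):
    """Return the ids of the k records closest to the query vector."""
    ranked = sorted(vectors, key=lambda rid: cosine(query, vectors[rid]), reverse=True)
    return ranked[:k]


def labels_for(record_ids):
    """Look up stored labels; records without an annotation map to None."""
    out = {}
    for rid in record_ids:
        row = conn.execute(
            "SELECT label FROM annotations WHERE record_id = ?", (rid,)
        ).fetchone()
        out[rid] = row[0] if row else None
    return out


hits = most_similar([1.0, 0.0])
result = labels_for(hits)
print(result)
```

The search step finds similar records, and the lookup step shows which of them already carry an annotation, which is the kind of query a similarity-assisted labelling workflow relies on.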


## 📏 Principles

Argilla is a tool that is in continuous development, with the aim of always offering better workflows and methods for various NLP tasks. To achieve this, it is based on several principles that define its functionality and scope.

<details>
<summary><strong>Open</strong></summary>
<p>

Argilla is free, open-source, and 100% compatible with major NLP libraries (Hugging Face transformers, spaCy, Stanford Stanza, Flair, etc.). In fact, you can <strong>use and combine your preferred libraries</strong> without implementing any specific interface.

</p>
</details>

<details>
<summary><strong>End-to-end</strong></summary>
<p>

Most annotation tools treat data collection as a one-off activity at the beginning of each project. In real-world projects, data collection is a key activity of the iterative process of ML model development. Once a model goes into production, you want to monitor and analyze its predictions and collect more data to improve your model over time. Argilla is designed to close this gap, enabling you to <strong>iterate as much as you need</strong>.

</p>
</details>

<details>
<summary><strong>User and Developer Experience</strong></summary>
<p>

The key to sustainable NLP solutions is to make it easier for everyone to contribute to projects. <em>Domain experts</em> should feel comfortable interpreting and annotating data. <em>Data scientists</em> should feel free to experiment and iterate. <em>Engineers</em> should feel in control of data pipelines. Argilla optimizes the experience for these core users to <strong>make your teams more productive</strong>.

</p>
</details>

<details>
<summary><strong>Beyond hand-labeling</strong></summary>
<p>

Classical hand-labeling workflows are costly and inefficient, but having humans in the loop is essential. Easily combine hand-labeling with active learning, bulk-labeling, zero-shot models, and weak supervision in <strong>novel data annotation workflows</strong>.

</p>
</details>
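
As a rough illustration of the weak supervision idea mentioned above, the following sketch combines a few keyword-based labeling functions by majority vote. It is plain Python and does not use Argilla's API; the labeling functions, the label names and the tie-breaking rule (abstain on ties) are assumptions made for this example.

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion


# Hypothetical labeling functions: each encodes one cheap heuristic.
def lf_contains_refund(text):
    return "billing" if "refund" in text.lower() else ABSTAIN


def lf_contains_invoice(text):
    return "billing" if "invoice" in text.lower() else ABSTAIN


def lf_contains_password(text):
    return "account" if "password" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_refund, lf_contains_invoice, lf_contains_password]


def weak_label(text):
    """Majority vote over non-abstaining labeling functions; abstain on ties."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    (top, top_count), *rest = Counter(votes).most_common()
    if rest and rest[0][1] == top_count:
        return ABSTAIN  # conflicting heuristics: leave the record for a human
    return top


for sample in ["Please refund my invoice", "I forgot my password", "Hello there"]:
    print(sample, "->", weak_label(sample))
```

Records where the heuristics abstain or disagree are the ones that would still go to a human annotator, which is how such rules complement hand-labeling instead of replacing it.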


## ❔ Frequently Asked Questions

Below, you can find answers to some of the most common questions about Argilla. For more information, refer to our [documentation](https://docs.argilla.io/en/develop/index.html).

<details>
<summary><strong>What is Argilla?</strong></summary>
<p>

Argilla is an open-source data curation platform, designed to enhance the development of both small and large language models (LLMs). Using Argilla, everyone can build robust language models through faster data curation using both human and machine feedback. We provide support for each step in the MLOps cycle, from data labeling to model monitoring. In fact, the inspiration behind the name "Argilla" comes from the word for "clay", in Latin, Italian and even in Catalan. And just as clay has been a fundamental medium for human creativity and tool-making throughout history, we view data as the essential material for sculpting and refining models.

</p>
</details>

<details>
<summary><strong>Does Argilla train models?</strong></summary>
<p>

Argilla does not train models but offers tools and integrations to help you do so. With Argilla, you can easily load data and train models using a feature we call the `ArgillaTrainer`. The `ArgillaTrainer` acts as a bridge to various popular NLP libraries. It simplifies the training process by offering an easy-to-understand interface for many NLP tasks with preset defaults, without the need to convert data from Argilla's format. You can find more information about training models with Argilla <a href="https://docs.argilla.io/en/latest/practical_guides/fine_tune.html">here</a>.

</p>
</details>

<details>
<summary><strong>What is the difference between old datasets and the FeedbackDataset?</strong></summary>
<p>

The FeedbackDataset stands out for its versatility and adaptability, designed to support a wider range of NLP tasks including those centered on large language models. In contrast, older datasets, while more feature-rich in specific areas, are tailored to singular NLP tasks. However, in Argilla 2.0, the intention is to phase out the older datasets in favor of the FeedbackDataset. For a more detailed explanation, please refer to <a href="https://docs.argilla.io/en/latest/practical_guides/choose_dataset.html">this guide</a>.

</p>
</details>

<details>
<summary><strong>Can Argilla only be used for LLMs?</strong></summary>
<p>

No, Argilla is a versatile tool suitable for a wide range of NLP tasks. However, we emphasize the integration with small and large language models (LLMs), reflecting our confidence in the significant role that they will play in the future of NLP. On this page, you can find a list of <a href="https://docs.argilla.io/en/latest/practical_guides/choose_dataset.html">supported tasks</a>.

</p>
</details>

<details>
<summary><strong>Does Argilla provide annotation workforces?</strong></summary>
<p>

Currently, we already have partnerships with annotation providers that ensure ethical practices and secure work environments. Feel free to schedule a meeting <a href="https://calendly.com/argilla-office-hours/30min">here</a> or contact us via <a href="mailto:david@argilla.io">email</a>.

</p>
</details>

<details>
<summary><strong>Does Argilla cost money?</strong></summary>
<p>

No, Argilla is an open-source platform. And we plan to keep Argilla free forever. However, we do offer a commercial version of Argilla called Argilla Cloud.

</p>
</details>

<details>
<summary><strong>What is the difference between Argilla open source and Argilla Cloud?</strong></summary>
<p>

Argilla Cloud is the counterpart to our open-source platform, offering a Software as a Service (SaaS) model, and doesn't add extra features beyond what is available in the open-source version. The main difference is its cloud-hosting, which caters especially to large teams requiring features that aren't typically necessary for individual practitioners or small businesses. So, Argilla Cloud is a SaaS plus virtual private cloud deployment, with added features specifically related to the cloud. For those interested in the different plans available under Argilla Cloud, you can find detailed information on our <a href="https://argilla.io/pricing">website</a>.

</p>
</details>

<details>
<summary><strong>How does Argilla differ from competitors like Snorkel, Prodigy and Scale?</strong></summary>
<p>

Argilla distinguishes itself by its focus on specific use cases and human-in-the-loop approaches. While it does offer programmatic features, Argilla's core value lies in actively involving human experts in the tool-building process, setting it apart from other competitors.

Furthermore, Argilla places particular emphasis on smooth integration with other tools in the community, particularly within the realms of MLOps and NLP. Its compatibility with popular frameworks like spaCy and Hugging Face makes it exceptionally user-friendly and accessible.

Finally, platforms like Snorkel, Prodigy or Scale, while more comprehensive, often require a significant commitment. Argilla, on the other hand, works more as a component within the MLOps ecosystem, allowing users to begin with specific use cases and then scale up as needed. This flexibility is particularly beneficial for users and customers who prefer to start small and expand their applications over time, as opposed to committing to an all-encompassing platform from the outset.
## 🏘️ Community

</p>
</details>
🏘️ Attend our [online bi-weekly community meetup](https://lu.ma/embed-checkout/evt-IQtRiSuXZCIW6FB).

<details>
<summary><strong>What is Argilla currently working on?</strong></summary>
<p>
🙋‍♀️ Join the Argilla community on [Slack](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) and get direct support from the community.

We are continuously working on improving Argilla's features and usability, now concentrating on a three-pronged vision: the development of Argilla Core (open-source), Distilabel, and Argilla JS/TS. You can find a list of our current projects <a href="https://github.com/orgs/argilla-io/projects/10/views/1">here</a>.
⭐ Argilla [Github repo](https://github.com/argilla-io/argilla) to stay updated about new releases and tutorials.

</p>
</details>
🎁 We've just printed stickers! Would you like some? [Order stickers for free](https://tally.so/r/nr5gg2).

## 🤝 Contribute
## 🥇 Contributors

We love contributors and have launched a [collaboration with JustDiggit](https://argilla.io/blog/introducing-argilla-community-growers) to hand out our very own bunds and help the re-greening of sub-Saharan Africa. To help our community with the creation of contributions, we have created our [developer](https://docs.argilla.io/en/latest/community/developer_docs.html) and [contributor](https://docs.argilla.io/en/latest/community/contributing.html) docs. Additionally, you can always [schedule a meeting](https://calendly.com/argilla-office-hours/30min) with our Developer Advocacy team so they can get you up to speed.

## 🥇 Contributors

<a href="https://github.com/argilla-io/argilla/graphs/contributors">

<img src="https://contrib.rocks/image?repo=argilla-io/argilla" />

</a>

## 🏘️ Community

🏘️ Attend our [online bi-weekly community meetup](https://lu.ma/embed-checkout/evt-IQtRiSuXZCIW6FB).

🙋‍♀️ Join the Argilla community on [Slack](https://join.slack.com/t/rubrixworkspace/shared_invite/zt-whigkyjn-a3IUJLD7gDbTZ0rKlvcJ5g) and get direct support from the community.

⭐ Argilla [Github repo](https://github.com/argilla-io/argilla) to stay updated about new releases and tutorials.

🎁 We've just printed stickers! Would you like some? [Order stickers for free](https://tally.so/r/nr5gg2).

## 🗺️ Roadmap

We continuously work on updating [our plans and our roadmap](https://github.com/orgs/argilla-io/projects/10/views/1) and we love to discuss those with our community. Feel encouraged to participate.