docs: use FeedbackDataset in HF example (#4805)

# Description


![screencapture-127-0-0-1-8000-getting-started-installation-deployments-huggingface-spaces-html-2024-05-08-14_03_04](https://github.com/argilla-io/argilla/assets/127759186/e118e0b2-5689-45e6-bc08-a744fd32dde3)

Closes #4740

**Type of change**

(Remember to title the PR according to the type of change)

- [ ] Documentation update

**How Has This Been Tested**

(Please describe the tests that you ran to verify your changes.)

- [ ] `sphinx-autobuild` (read [Developer
Documentation](https://docs.argilla.io/en/latest/community/developer_docs.html#building-the-documentation)
for more details)

**Checklist**

- [ ] I added relevant documentation
- [ ] I followed the style guidelines of this project
- [ ] I did a self-review of my code
- [ ] I made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] I filled out [the contributor form](https://tally.so/r/n9XrxK)
(see text above)
- [ ] I have added relevant notes to the `CHANGELOG.md` file (See
https://keepachangelog.com/)
sdiazlor committed May 13, 2024
1 parent 535526d commit a422e27
Showing 1 changed file with 43 additions and 59 deletions.
Once Argilla is running, you can use the UI with the Direct URL. This URL gives

### Create your first dataset

If everything goes well, you are ready to use the Argilla Python client from an IDE such as Colab, Jupyter, or VS Code.

If you want a quick step-by-step example, keep reading. If you want an end-to-end tutorial, go to this [tutorial and use Colab or Jupyter](https://docs.argilla.io/en/latest/tutorials/notebooks/training-textclassification-setfit-fewshot.html).

To create your first dataset, you need to pip install `argilla` on Colab or your local machine:

```bash
pip install argilla
```

Then, connect to your Argilla HF Space. Get the `api_url` as mentioned above and copy the `api_key` from "My settings" in the UI:

```python
import argilla as rg

import os  # needed to read the HF_TOKEN environment variable for private Spaces

# If you connect to your public HF Space
rg.init(
    api_url="[your_space_url]",
    api_key="admin.apikey",  # this is the default API key; don't change it if you didn't set one up during the Space creation
)

# If you connect to your private HF Space
rg.init(
    api_url="[your_space_url]",
    api_key="admin.apikey",  # this is the default API key; don't change it if you didn't set one up during the Space creation
    extra_headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
)
```
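The `api_url` is the Space Direct URL mentioned earlier. As a rough sketch (the pattern below is an assumption, so always copy the actual URL from your Space rather than constructing it), the Direct URL is typically derived from the owner and Space name:

```python
# Assumption: HF Spaces usually expose a Direct URL of the form
# https://<owner>-<space-name>.hf.space. The owner and Space name below
# are hypothetical; verify the real URL from your Space on the Hub.
owner, space_name = "my-username", "my-argilla-space"
api_url = f"https://{owner}-{space_name}.hf.space"
print(api_url)
```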

Now, create a dataset for text classification. We'll use a task template; check the [docs](../../../practical_guides/create_update_dataset/create_dataset.md) to create a custom dataset instead. Indicate the workspace where the dataset will be created; you can check the available workspaces in "My settings" (UI).

```python
dataset = rg.FeedbackDataset.for_text_classification(
    labels=["sadness", "joy"],
    multi_label=False,
    use_markdown=True,
    guidelines=None,
    metadata_properties=None,
    vectors_settings=None,
)

# Create the dataset to be visualized in the UI (uses default workspace)
dataset.push_to_argilla(name="my-first-dataset", workspace="admin")
```
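For reference, the task template above roughly maps to one text field plus one label question. The dict below is only an illustrative sketch of that shape (the field and question names are assumptions), not the actual Argilla object; inspect `dataset.fields` and `dataset.questions` on the real object to see the exact schema:

```python
# Illustrative only: approximate shape produced by the text-classification template.
# Field and question names here are assumptions, not the library's guaranteed schema.
template_sketch = {
    "fields": [{"name": "text", "use_markdown": True}],
    "questions": [
        {"name": "label", "type": "label_selection", "labels": ["sadness", "joy"]}
    ],
}
print(template_sketch["questions"][0]["labels"])
```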

To add records, create a list with the records you want to add, matching the fields with the ones specified before. You can also use pandas or `load_dataset` to read an existing dataset and create records from it.

```python
records = [
    rg.FeedbackRecord(
        fields={
            "text": "I am so happy today",
        },
    ),
    rg.FeedbackRecord(
        fields={
            "text": "I feel sad today",
        },
    ),
]
dataset.add_records(records)
```
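If your texts already live in a list (or a pandas column), the per-record `fields` payloads can be built programmatically before wrapping them in `rg.FeedbackRecord`. A minimal stdlib sketch, using hypothetical example texts:

```python
texts = ["I am so happy today", "I feel sad today"]  # hypothetical examples

# Build one fields payload per text; the keys must match the dataset's
# field names (a single "text" field for the template used above)
payloads = [{"text": t} for t in texts]

# With argilla installed, wrap and add:
# records = [rg.FeedbackRecord(fields=p) for p in payloads]
# dataset.add_records(records)
print(payloads)
```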

Congrats! You now have a dataset available from the Argilla UI to start browsing and labeling. Once annotated, you can also easily push it back to the Hub.

```python
dataset = rg.FeedbackDataset.from_argilla("my-first-dataset", workspace="admin")
dataset.push_to_huggingface("my-repo/my-first-dataset")
```

As a next step, you can check the [Argilla Tutorials](https://docs.argilla.io/en/latest/tutorials/tutorials.html) section. All the tutorials can be run using Colab or local Jupyter Notebooks, so you can start building datasets with Argilla and Spaces!