Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

Added

  • Added metadata_properties to the __repr__ method of the FeedbackDataset and RemoteFeedbackDataset (#4192).

Fixed

  • Fixed error in ArgillaTrainer, with numerical labels use RatingQuestion instead of RankingQuestion (#4171)
  • Fixed error in ArgillaTrainer, now we can train for extractive_question_answering using a validation sample (#4204)
  • Fixed error in ArgillaTrainer, when training for sentence-similarity it didn't work with a list of values per record (#4211)

Changed

  • Updated push_to_argilla to print repr of the pushed RemoteFeedbackDataset after push and changed show_progress to True by default. (#4223)

Added

  • Added POST /api/v1/datasets/:dataset_id/records/search endpoint to search for records without user context, including responses by all users. (#4143)
  • Added POST /api/v1/datasets/:dataset_id/vectors-settings endpoint for creating vector settings for a dataset. (#3776)
  • Added GET /api/v1/datasets/:dataset_id/vectors-settings endpoint for listing the vectors settings for a dataset. (#3776)
  • Added DELETE /api/v1/vectors-settings/:vector_settings_id endpoint for deleting a vector settings. (#3776)
  • Added PATCH /api/v1/vectors-settings/:vector_settings_id endpoint for updating a vector settings. (#4092)
  • Added GET /api/v1/records/:record_id endpoint to get a specific record. (#4039)
  • Added support to include vectors for GET /api/v1/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for GET /api/v1/me/datasets/:dataset_id/records endpoint response using include query param. (#4063)
  • Added support to include vectors for POST /api/v1/me/datasets/:dataset_id/records/search endpoint response using include query param. (#4063)
  • Added show_progress argument to from_huggingface() method to make the record-parsing progress bar optional (#4132).
  • Added a progress bar for the record-parsing process to from_huggingface() method with trange in tqdm (#4132).
  • Added support for sorting by inserted_at or updated_at for datasets with no metadata. (#4147)
  • Added max_records argument to pull() method for RemoteFeedbackDataset (#4074).
  • Added functionality to push your models to the Hugging Face hub with ArgillaTrainer.push_to_huggingface (#3976). Contributed by @Racso-3141.
  • Added filter_by argument to ArgillaTrainer to filter by response_status (#4120).
  • Added sort_by argument to ArgillaTrainer to sort by metadata (#4120).
  • Added max_records argument to ArgillaTrainer to limit the records used for training (#4120).
  • Added add_vector_settings method to local and remote FeedbackDataset. (#4055)
  • Added update_vectors_settings method to local and remote FeedbackDataset. (#4122)
  • Added delete_vectors_settings method to local and remote FeedbackDataset. (#4130)
  • Added vector_settings_by_name method to local and remote FeedbackDataset. (#4055)
  • Added find_similar_records method to local and remote FeedbackDataset. (#4023)
  • Added ARGILLA_SEARCH_ENGINE environment variable to configure the search engine to use. (#4019)

Changed

  • [breaking] Remove support for Elasticsearch < 8.5 and OpenSearch < 2.4. (#4173)
  • [breaking] Users working with OpenSearch engines must use version >=2.4 and set ARGILLA_SEARCH_ENGINE=opensearch. (#4019 and #4111)
  • [breaking] Changed FeedbackDataset.*_by_name() methods to return None when no match is found (#4101).
  • [breaking] limit query parameter for GET /api/v1/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • [breaking] limit query parameter for GET /api/v1/me/datasets/:dataset_id/records endpoint now only accepts values greater than or equal to 1 and less than or equal to 1000. (#4143)
  • Update GET /api/v1/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Update GET /api/v1/me/datasets/:dataset_id/records endpoint to fetch records using the search engine. (#4142)
  • Update POST /api/v1/datasets/:dataset_id/records endpoint to allow creating records with vectors. (#4022)
  • Update PATCH /api/v1/datasets/:dataset_id endpoint to allow updating allow_extra_metadata attribute. (#4112)
  • Update PATCH /api/v1/datasets/:dataset_id/records endpoint to allow updating records with vectors. (#4062)
  • Update PATCH /api/v1/records/:record_id endpoint to allow updating a record with vectors. (#4062)
  • Update POST /api/v1/me/datasets/:dataset_id/records/search endpoint to allow searching records with vectors. (#4019)
  • Update BaseElasticAndOpenSearchEngine.index_records method to also index record vectors. (#4062)
  • Update FeedbackDataset.__init__ to allow passing a list of vector settings. (#4055)
  • Update FeedbackDataset.push_to_argilla to also push vector settings. (#4055)
  • Update FeedbackDatasetRecord to support the creation of records with vectors. (#4043)
  • Using cosine similarity to compute similarity between vectors. (#4124)
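
For readers unfamiliar with the metric named in the last entry, cosine similarity is the dot product of two vectors divided by the product of their norms. The following is a minimal plain-Python sketch of that formula only; the function name is hypothetical and this is not the search engine's implementation.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The ratio is undefined when either vector has zero length.
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

The result lies in [-1, 1]: vectors pointing the same way score 1, orthogonal vectors score 0, and opposite vectors score -1.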

Fixed

  • Fixed SVG images overflowing the screen when they are too large (#4047)
  • Fixed creating records with responses from multiple users. Closes #3746 and #3808 (#4142)
  • Fixed deleting or updating responses as an owner for annotators. (Commit 403a66d)
  • Fixed passing user_id when getting records by id. (Commit 98c7927)
  • Fixed non-basic tags serialized when pushing a dataset to the Hugging Face Hub. Closes #4089 (#4200)

Added

  • New GET /api/v1/datasets/:dataset_id/metadata-properties endpoint for listing dataset metadata properties. (#3813)
  • New POST /api/v1/datasets/:dataset_id/metadata-properties endpoint for creating dataset metadata properties. (#3813)
  • New PATCH /api/v1/metadata-properties/:metadata_property_id endpoint allowing the update of a specific metadata property. (#3952)
  • New DELETE /api/v1/metadata-properties/:metadata_property_id endpoint for deletion of a specific metadata property. (#3911)
  • New GET /api/v1/metadata-properties/:metadata_property_id/metrics endpoint to compute metrics for a specific metadata property. (#3856)
  • New PATCH /api/v1/records/:record_id endpoint to update a record. (#3920)
  • New PATCH /api/v1/dataset/:dataset_id/records endpoint to bulk update the records of a dataset. (#3934)
  • Added missing validations to PATCH /api/v1/questions/:question_id. Now title and description use the same validations used to create questions. (#3967)
  • Added TermsMetadataProperty, IntegerMetadataProperty and FloatMetadataProperty classes allowing to define metadata properties for a FeedbackDataset. (#3818)
  • Added metadata_filters to filter_by method in RemoteFeedbackDataset to filter based on metadata i.e. TermsMetadataFilter, IntegerMetadataFilter, and FloatMetadataFilter. (#3834)
  • Added a validation layer for both metadata_properties and metadata_filters in their schemas and as part of the add_records and filter_by methods, respectively. (#3860)
  • Added sort_by query parameter to listing records endpoints that allows to sort the records by inserted_at, updated_at or metadata property. (#3843)
  • Added add_metadata_property method to both FeedbackDataset and RemoteFeedbackDataset (i.e. FeedbackDataset in Argilla). (#3900)
  • Added fields inserted_at and updated_at in RemoteResponseSchema. (#3822)
  • Added support for sort_by for RemoteFeedbackDataset i.e. a FeedbackDataset uploaded to Argilla. (#3925)
  • Added metadata_properties support for both push_to_huggingface and from_huggingface. (#3947)
  • Add support for update records (metadata) from Python SDK. (#3946)
  • Added delete_metadata_properties method to delete metadata properties. (#3932)
  • Added update_metadata_properties method to update metadata_properties. (#3961)
  • Added automatic model card generation through ArgillaTrainer.save (#3857)
  • Added FeedbackDataset TaskTemplateMixin for pre-defined task templates. (#3969)
  • A maximum limit of 50 on the number of options a ranking question can accept. (#3975)
  • New last_activity_at field to FeedbackDataset exposing when the last activity for the associated dataset occurs. (#3992)

Changed

  • Updated GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records and POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to return the total number of records. (#3848, #3903)
  • Implemented __len__ method for filtered datasets to return the number of records matching the provided filters. (#3916)
  • Increase the default max result window for Elasticsearch created for Feedback datasets. (#3929)
  • Force elastic index refresh after records creation. (#3929)
  • Validate metadata fields for filtering and sorting in the Python SDK. (#3993)
  • Using metadata property name instead of id for indexing data in search engine index. (#3994)

Fixed

  • Fixed response schemas to allow values to be None i.e. when a record is discarded the response.values are set to None. (#3926)

Added

  • Added fields inserted_at and updated_at in RemoteResponseSchema (#3822).
  • Added automatic model card generation through ArgillaTrainer.save (#3857).
  • Added task templates to the FeedbackDataset (#3973).

Changed

  • Updated Dockerfile to use multi stage build (#3221 and #3793).
  • Updated active learning for text classification notebooks to use the most recent small-text version (#3831).
  • Changed argilla dataset name in the active learning for text classification notebooks to be consistent with the default names in the huggingface spaces (#3831).
  • FeedbackDataset API methods have been aligned to be accessible through the several implementations (#3937).
  • unify_responses now supports remote datasets (#3937).

Fixed

  • Fix field not shown in the order defined in the dataset settings. Closes #3959 (#3984)
  • Updated active learning for text classification notebooks to pass ids of type int to TextClassificationRecord (#3831).
  • Fixed record fields validation that was preventing records with optional fields (i.e. required=False) from being logged when the field value was None (#3846).
  • Always set pretrained_model_name_or_path attribute as string in ArgillaTrainer (#3914).
  • The inserted_at and updated_at attributes are created using the utcnow factory to avoid unexpected race conditions on timestamp creation (#3945).
  • Fixed configure_dataset_settings when providing the workspace via the arg workspace (#3887).
  • Fixed saving of models trained with ArgillaTrainer with a peft_config parameter (#3795).
  • Fixed backwards compatibility on from_huggingface when loading a FeedbackDataset from the Hugging Face Hub that was previously dumped using another version of Argilla, starting at 1.8.0, when it was first introduced (#3829).
  • Fixed wrong __repr__ problem for TrainingTask. (#3969)
  • Fixed wrong key return error prepare_for_training_with_* for TrainingTask. (#3969)

Deprecated

  • Function rg.configure_dataset is deprecated in favour of rg.configure_dataset_settings. The former will be removed in version 1.19.0.
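
A deprecation like this is commonly implemented as a thin wrapper that emits a DeprecationWarning and delegates to the replacement. The sketch below shows that pattern with stand-in functions; the signatures are simplified assumptions and do not match the real rg.configure_dataset or rg.configure_dataset_settings.

```python
import warnings


def configure_dataset_settings(name: str) -> str:
    """Stand-in for the replacement function (simplified, assumed signature)."""
    return f"configured {name}"


def configure_dataset(name: str) -> str:
    """Deprecated alias: warn the caller, then delegate to the replacement."""
    warnings.warn(
        "configure_dataset is deprecated; use configure_dataset_settings instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return configure_dataset_settings(name)
```

Callers keep working during the deprecation window and can surface the warning by running Python with `-W default`.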

Added

  • Added ArgillaTrainer integration with sentence-transformers, allowing fine tuning for sentence similarity (#3739)
  • Added ArgillaTrainer integration with TrainingTask.for_question_answering (#3740)
  • Added Auto save record to automatically save the record that you are currently working on (#3541)
  • Added ArgillaTrainer integration with OpenAI, allowing fine tuning for chat completion (#3615)
  • Added workspaces list command to list Argilla workspaces (#3594).
  • Added datasets list command to list Argilla datasets (#3658).
  • Added users create command to create users (#3667).
  • Added whoami command to get current user (#3673).
  • Added users delete command to delete users (#3671).
  • Added users list command to list users (#3688).
  • Added workspaces delete-user command to remove a user from a workspace (#3699).
  • Added workspaces create command to create an Argilla workspace (#3676).
  • Added datasets push-to-hub command to push a FeedbackDataset from Argilla into the HuggingFace Hub (#3685).
  • Added info command to get info about the used Argilla client and server (#3707).
  • Added datasets delete command to delete a FeedbackDataset from Argilla (#3703).
  • Added created_at and updated_at properties to RemoteFeedbackDataset and FilteredRemoteFeedbackDataset (#3709).
  • Added handling of PermissionError when executing a command as a logged-in user without enough permissions (#3717).
  • Added workspaces add-user command to add a user to workspace (#3712).
  • Added workspace_id param to GET /api/v1/me/datasets endpoint (#3727).
  • Added workspace_id arg to list_datasets in the Python SDK (#3727).
  • Added argilla script that allows executing the Argilla CLI using the argilla command (#3730).
  • Added support for passing already initialized model and tokenizer instances to the ArgillaTrainer (#3751)
  • Added server_info function to check the Argilla server information (also accessible via rg.server_info) (#3772).

Changed

  • Move database commands under server group of commands (#3710)
  • server commands only included in the CLI app when server extra requirements are installed (#3710).
  • Updated PUT /api/v1/responses/{response_id} to replace values stored with received values in request (#3711).
  • Display a UserWarning when the user_id in Workspace.add_user and Workspace.delete_user is the ID of a user with the owner role, as they don't require explicit permissions (#3716).
  • Rename tasks sub-package to cli (#3723).
  • Changed argilla database command in the CLI to now be accessed via argilla server database, to be deprecated in the upcoming release (#3754).
  • Changed visible_options (of label and multi label selection questions) validation in the backend to check that the provided value is greater than or equal to 3 and less than or equal to the number of provided options (#3773).
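
The visible_options rule in the last entry can be read as a simple bounds check. The following sketch shows one way to express it, assuming that None means the value was not provided; the function name is hypothetical and the backend's actual validation code may differ.

```python
from typing import Optional, Sequence


def validate_visible_options(
    visible_options: Optional[int], options: Sequence[str]
) -> None:
    """Raise ValueError unless 3 <= visible_options <= len(options)."""
    if visible_options is None:
        # Assumption: an unset value skips the check.
        return
    if visible_options < 3 or visible_options > len(options):
        raise ValueError(
            f"visible_options must be between 3 and {len(options)}, "
            f"got {visible_options}"
        )
```

For example, with four options the values 3 and 4 pass, while 2 and 5 are rejected.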

Fixed

  • Fixed remove user modification in text component on clear answers (#3775)
  • Fixed Highlight raw text field in dataset feedback task (#3731)
  • Fixed Field title too long (#3734)
  • Fixed error messages when deleting a DatasetForTextClassification (#3652)
  • Fixed Pending queue pagination problems during data annotation (#3677)
  • Fixed visible_labels default value to be 20 just when visible_labels not provided and len(labels) > 20, otherwise it will either be the provided visible_labels value or None, for LabelQuestion and MultiLabelQuestion (#3702).
  • Fixed DatasetCard generation when RemoteFeedbackDataset contains suggestions (#3718).
  • Add missing draft status in ResponseSchema as now there can be responses with draft status when annotating via the UI (#3749).
  • Fixed searches when queried words are distributed along the record fields (#3759).
  • Fixed Python 3.11 compatibility issue with /api/datasets endpoints due to the TaskType enum replacement in the endpoint URL (#3769).
  • Fixed RankingValueSchema and FeedbackRankingValueModel schemas to allow rank=None when status=draft (#3781).
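
The visible_labels default described in #3702 above amounts to a small piece of branching logic. A minimal sketch follows, assuming that "not provided" is represented by None; the helper name is hypothetical and is not part of LabelQuestion or MultiLabelQuestion.

```python
from typing import Optional, Sequence


def resolve_visible_labels(
    labels: Sequence[str], visible_labels: Optional[int] = None
) -> Optional[int]:
    """Default to 20 only when nothing was provided and there are more than 20 labels."""
    if visible_labels is None and len(labels) > 20:
        return 20
    # Otherwise keep whatever the caller passed, which may be None.
    return visible_labels
```

With 25 labels and no explicit value the result is 20; with 5 labels it stays None; an explicit value is always returned unchanged.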

Fixed

  • Fixed Text component text content sanitization behavior just for markdown to prevent the text from disappearing (#3738)
  • Fixed Text component now you need to press Escape to exit the text area (#3733)
  • Fixed SearchEngine was creating the same number of primary shards and replica shards for each FeedbackDataset (#3736).

Added

  • Added Enable to update guidelines and dataset settings for Feedback Datasets directly in the UI (#3489)
  • Added ArgillaTrainer integration with TRL, allowing for easy supervised finetuning, reward modeling, direct preference optimization and proximal policy optimization (#3467)
  • Added formatting_func to ArgillaTrainer for FeedbackDataset datasets to add custom formatting for the data (#3599).
  • Added login function in argilla.client.login to login into an Argilla server and store the credentials locally (#3582).
  • Added login command to login into an Argilla server (#3600).
  • Added logout command to logout from an Argilla server (#3605).
  • Added DELETE /api/v1/suggestions/{suggestion_id} endpoint to delete a suggestion given its ID (#3617).
  • Added DELETE /api/v1/records/{record_id}/suggestions endpoint to delete several suggestions linked to the same record given their IDs (#3617).
  • Added response_status param to GET /api/v1/datasets/{dataset_id}/records to be able to filter by response_status as previously included for GET /api/v1/me/datasets/{dataset_id}/records (#3613).
  • Added list classmethod to ArgillaMixin to be used as FeedbackDataset.list(), also including the workspace to list from as arg (#3619).
  • Added filter_by method in RemoteFeedbackDataset to filter based on response_status (#3610).
  • Added list_workspaces function (to be used as rg.list_workspaces, but Workspace.list is preferred) to list all the workspaces from a user in Argilla (#3641).
  • Added list_datasets function (to be used as rg.list_datasets) to list the TextClassification, TokenClassification, and Text2Text datasets in Argilla (#3638).
  • Added RemoteSuggestionSchema to manage suggestions in Argilla, including the delete method to delete suggestions from Argilla via DELETE /api/v1/suggestions/{suggestion_id} (#3651).
  • Added delete_suggestions to RemoteFeedbackRecord to remove suggestions from Argilla via DELETE /api/v1/records/{record_id}/suggestions (#3651).

Changed

  • Changed Optional label for * mark for required question (#3608)
  • Updated RemoteFeedbackDataset.delete_records to use batch delete records endpoint (#3580).
  • Included allowed_for_roles for some RemoteFeedbackDataset, RemoteFeedbackRecords, and RemoteFeedbackRecord methods that are only allowed for users with roles owner and admin (#3601).
  • Renamed ArgillaToFromMixin to ArgillaMixin (#3619).
  • Move users CLI app under database CLI app (#3593).
  • Move server Enum classes to argilla.server.enums module (#3620).

Fixed

  • Fixed Filter by workspace in breadcrumbs (#3577)
  • Fixed Filter by workspace in datasets table (#3604)
  • Fixed Query search highlight for Text2Text and TextClassification (#3621)
  • Fixed RatingQuestion.values validation to raise a ValidationError when values are out of range i.e. [1, 10] (#3626).
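
The range rule from the RatingQuestion.values entry can be sketched as below. This is an illustration of the check only: the real validation raises a pydantic ValidationError, whereas this stand-in raises a plain ValueError, and the function name is hypothetical.

```python
from typing import List, Sequence


def validate_rating_values(values: Sequence[int]) -> List[int]:
    """Return the values unchanged if every one lies in the closed range [1, 10]."""
    out_of_range = [v for v in values if not 1 <= v <= 10]
    if out_of_range:
        raise ValueError(f"rating values must be in [1, 10], got {out_of_range}")
    return list(values)
```

A call such as `validate_rating_values([1, 2, 3, 4, 5])` passes, while a list containing 0 or 11 is rejected.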

Removed

  • Removed multi_task_text_token_classification from TaskType as not used (#3640).
  • Removed argilla_id in favor of id from RemoteFeedbackDataset (#3663).
  • Removed fetch_records from RemoteFeedbackDataset as now the records are lazily fetched from Argilla (#3663).
  • Removed push_to_argilla from RemoteFeedbackDataset, as it just works when calling it through a FeedbackDataset locally, as now the updates of the remote datasets are automatically pushed to Argilla (#3663).
  • Removed set_suggestions in favor of update(suggestions=...) for both FeedbackRecord and RemoteFeedbackRecord, as all the updates of any "updateable" attribute of a record will go through update instead (#3663).
  • Remove unused owner attribute for client Dataset data model (#3665)

Fixed

  • Fixed PostgreSQL database not being updated after begin_nested because of missing commit (#3567).

Fixed

  • Fixed settings could not be provided when updating a rating or ranking question (#3552).

Added

  • Added PATCH /api/v1/fields/{field_id} endpoint to update the field title and markdown settings (#3421).
  • Added PATCH /api/v1/datasets/{dataset_id} endpoint to update dataset name and guidelines (#3402).
  • Added PATCH /api/v1/questions/{question_id} endpoint to update question title, description and some settings (depending on the type of question) (#3477).
  • Added DELETE /api/v1/records/{record_id} endpoint to remove a record given its ID (#3337).
  • Added pull method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) to pull all the records from it and return it as a local copy as a FeedbackDataset (#3465).
  • Added delete method in RemoteFeedbackDataset (a FeedbackDataset pushed to Argilla) (#3512).
  • Added delete_records method in RemoteFeedbackDataset, and delete method in RemoteFeedbackRecord to delete records from Argilla (#3526).

Changed

  • Improved efficiency of weak labeling when dataset contains vectors (#3444).
  • Added ArgillaDatasetMixin to detach the Argilla-related functionality from the FeedbackDataset (#3427)
  • Moved FeedbackDataset-related pydantic.BaseModel schemas to argilla.client.feedback.schemas instead, to be better structured and more scalable and maintainable (#3427)
  • Update CLI to use database async connection (#3450).
  • Limit rating questions values to the positive range [1, 10] (#3451).
  • Updated POST /api/users endpoint to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated Python client User.create method to be able to provide a list of workspace names to which the user should be linked (#3462).
  • Updated GET /api/v1/me/datasets/{dataset_id}/records endpoint to allow getting records matching one of the response statuses provided via query param (#3359).
  • Updated POST /api/v1/me/datasets/{dataset_id}/records endpoint to allow searching records matching one of the response statuses provided via query param (#3359).
  • Updated SearchEngine.search method to allow searching records matching one of the response statuses provided (#3359).
  • After calling FeedbackDataset.push_to_argilla, the methods FeedbackDataset.add_records and FeedbackRecord.set_suggestions will automatically call Argilla with no need of calling push_to_argilla explicitly (#3465).
  • Now calling FeedbackDataset.push_to_huggingface dumps the responses as a List[Dict[str, Any]] instead of Sequence to make it more readable via 🤗datasets (#3539).

Fixed

  • Fixed issue with bool values and default from Jinja2 while generating the HuggingFace DatasetCard from argilla_template.md (#3499).
  • Fixed DatasetConfig.from_yaml which was failing when calling FeedbackDataset.from_huggingface as the UUIDs cannot be deserialized automatically by PyYAML, so UUIDs are neither dumped nor loaded anymore (#3502).
  • Fixed an issue that didn't allow the Argilla server to work behind a proxy (#3543).
  • TextClassificationSettings and TokenClassificationSettings labels are properly parsed to strings both in the Python client and in the backend endpoint (#3495).
  • Fixed PUT /api/v1/datasets/{dataset_id}/publish to check whether at least one field and question has required=True (#3511).
  • Fixed FeedbackDataset.from_huggingface as suggestions were being lost when there were no responses (#3539).
  • Fixed QuestionSchema and FieldSchema not validating name attribute (#3550).

Deprecated

  • After calling FeedbackDataset.push_to_argilla, calling push_to_argilla again won't do anything since the dataset is already pushed to Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, calling fetch_records won't do anything since the records are lazily fetched from Argilla (#3465).
  • After calling FeedbackDataset.push_to_argilla, the Argilla ID is no longer stored in the attribute/property argilla_id but in id instead (#3465).

Fixed

  • Fixed ModuleNotFoundError caused because the argilla.utils.telemetry module used in the ArgillaTrainer was importing an optional dependency not installed by default (#3471).
  • Fixed ImportError caused because the argilla.client.feedback.config module was importing pyyaml optional dependency not installed by default (#3471).

Fixed

  • The suggestion_type_enum ENUM data type created in PostgreSQL didn't have any value (#3445).

Fixed

  • Fix database migration for PostgreSQL (See #3438)

Added

  • Added GET /api/v1/users/{user_id}/workspaces endpoint to list the workspaces to which a user belongs (#3308 and #3343).
  • Added HuggingFaceDatasetMixin for internal usage, to detach the FeedbackDataset integrations from the class itself, and use Mixins instead (#3326).
  • Added GET /api/v1/records/{record_id}/suggestions API endpoint to get the list of suggestions for the responses associated to a record (#3304).
  • Added POST /api/v1/records/{record_id}/suggestions API endpoint to create a suggestion for a response associated to a record (#3304).
  • Added support for RankingQuestionStrategy, RankingQuestionUnification and the .for_text_classification method for the TrainingTaskMapping (#3364)
  • Added PUT /api/v1/records/{record_id}/suggestions API endpoint to create or update a suggestion for a response associated to a record (#3304 & #3391).
  • Added suggestions attribute to FeedbackRecord, and allow adding and retrieving suggestions from the Python client (#3370)
  • Added allowed_for_roles Python decorator to check whether the current user has the required role to access the decorated function/method for User and Workspace (#3383)
  • Added API and Python Client support for workspace deletion (Closes #3260)
  • Added GET /api/v1/me/workspaces endpoint to list the workspaces of the current active user (#3390)
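
The allowed_for_roles decorator mentioned above (#3383) guards a function by role. The sketch below shows the general decorator pattern under a simplifying assumption: the user is passed explicitly as the first argument, whereas the real decorator checks the current active user. The `User` class here is a hypothetical stand-in, not the Argilla class.

```python
import functools
from typing import Any, Callable, Iterable


class User:
    """Hypothetical stand-in holding only a role name."""

    def __init__(self, role: str) -> None:
        self.role = role


def allowed_for_roles(roles: Iterable[str]) -> Callable:
    """Build a decorator that rejects callers whose role is not in `roles`."""
    allowed = frozenset(roles)

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(user: User, *args: Any, **kwargs: Any) -> Any:
            if user.role not in allowed:
                raise PermissionError(
                    f"role '{user.role}' may not call {func.__name__}"
                )
            return func(user, *args, **kwargs)

        return wrapper

    return decorator


@allowed_for_roles(["owner", "admin"])
def delete_workspace(user: User, name: str) -> str:
    """Illustrative guarded operation."""
    return f"deleted {name}"
```

Calling `delete_workspace(User("owner"), "demo")` succeeds, while the same call with `User("annotator")` raises PermissionError.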

Changed

  • Updated output payload for GET /api/v1/datasets/{dataset_id}/records, GET /api/v1/me/datasets/{dataset_id}/records, POST /api/v1/me/datasets/{dataset_id}/records/search endpoints to include the suggestions of the records based on the value of the include query parameter (#3304).
  • Updated POST /api/v1/datasets/{dataset_id}/records input payload to add suggestions (#3304).
  • The POST /api/datasets/:dataset-id/:task/bulk endpoints don't create the dataset if it does not exist (Closes #3244)
  • Added Telemetry support for ArgillaTrainer (closes #3325)
  • User.workspaces is no longer an attribute but a property, and is calling list_user_workspaces to list all the workspace names for a given user ID (#3334)
  • Renamed FeedbackDatasetConfig to DatasetConfig and export/import from YAML as default instead of JSON (just used internally on push_to_huggingface and from_huggingface methods of FeedbackDataset) (#3326).
  • The protected metadata fields support other than textual info - existing datasets must be reindexed. See docs for more detail (Closes #3332).
  • Updated Dockerfile parent image from python:3.9.16-slim to python:3.10.12-slim (#3425).
  • Updated quickstart.Dockerfile parent image from elasticsearch:8.5.3 to argilla/argilla-server:${ARGILLA_VERSION} (#3425).

Removed

  • Removed support for non-prefixed environment variables. All valid env vars start with ARGILLA_ (See #3392).
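
In practice this means a setting is only picked up when its variable name carries the ARGILLA_ prefix. The sketch below illustrates prefix-based filtering over an environment mapping; the `read_settings` helper is hypothetical and does not reflect how the server actually loads its settings.

```python
from typing import Dict, Mapping

PREFIX = "ARGILLA_"


def read_settings(environ: Mapping[str, str]) -> Dict[str, str]:
    """Keep only ARGILLA_-prefixed variables, keyed by the lowercased remainder."""
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }


settings = read_settings(
    {
        "ARGILLA_SEARCH_ENGINE": "opensearch",  # picked up
        "SEARCH_ENGINE": "elasticsearch",  # ignored: no prefix
    }
)
```

Passing `os.environ` instead of the literal dictionary would apply the same filter to the real process environment.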

Fixed

  • Fixed GET /api/v1/me/datasets/{dataset_id}/records endpoint always returning the responses for the records even if responses was not provided via the include query parameter (#3304).
  • Values for protected metadata fields are not truncated (Closes #3331).
  • Big number ids are properly rendered in UI (Closes #3265)
  • Fixed ArgillaDatasetCard to include the values/labels for all the existing questions (#3366)

Deprecated

  • Integer support for record id in text classification, token classification and text2text datasets.

Fixed

  • Using rg.init with default argilla user skips setting the default workspace if not available. (Closes #3340)
  • Resolved wrong import structure for ArgillaTrainer and TrainingTaskMapping (Closes #3345)
  • Pin pydantic dependency to version < 2 (Closes #3348)

Added

  • Added RankingQuestionSettings class allowing to create ranking questions in the API using POST /api/v1/datasets/{dataset_id}/questions endpoint (#3232)
  • Added RankingQuestion in the Python client to create ranking questions (#3275).
  • Added Ranking component in feedback task question form (#3177 & #3246).
  • Added FeedbackDataset.prepare_for_training method for generating a framework-specific dataset with the responses provided for RatingQuestion, LabelQuestion and MultiLabelQuestion (#3151).
  • Added ArgillaSpaCyTransformersTrainer class for supporting the training with spacy-transformers (#3256).

Docs

  • Added instructions for how to run the Argilla frontend in the developer docs (#3314).

Changed

  • All docker related files have been moved into the docker folder (#3053).
  • release.Dockerfile has been renamed to Dockerfile (#3133).
  • Updated rg.load function to raise a ValueError with an explanatory message for the cases in which the user tries to use the function to load a FeedbackDataset (#3289).
  • Updated ArgillaSpaCyTrainer to allow re-using tok2vec (#3256).

Fixed

  • Check available workspaces on Argilla on rg.set_workspace (Closes #3262)

Fixed

  • Replaced np.float alias by float to avoid AttributeError when using find_label_errors function with numpy>=1.24.0 (#3214).
  • Fixed format_as("datasets") when there are no responses or optional responses in FeedbackRecord, to set their value to what 🤗 Datasets expects instead of just None (#3224).
  • Fixed push_to_huggingface() when generate_card=True (default behaviour), as we were passing a sample record to the ArgillaDatasetCard class, and UUIDs introduced in 1.10.0 (#3192), are not JSON-serializable (#3231).
  • Fixed from_argilla and push_to_argilla to ensure consistency on both field and question re-construction, and to ensure UUIDs are properly serialized as str, respectively (#3234).
  • Refactored usage of import argilla as rg to clarify package navigation (#3279).
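
The serialization issue behind #3231 and #3234 can be reproduced with the standard library alone: json.dumps rejects uuid.UUID objects unless they are converted to strings. The sketch below demonstrates the failure and one common workaround; the record dictionary is made up for illustration and this is not the fix applied in those PRs.

```python
import json
import uuid

record = {"id": uuid.uuid4(), "text": "example"}

# Serializing a UUID directly fails because json has no encoder for it.
try:
    json.dumps(record)
    raised = False
except TypeError:
    raised = True

# Passing default=str converts any non-serializable object via str().
serialized = json.dumps(record, default=str)
restored = json.loads(serialized)
```

After the round trip, `restored["id"]` is the string form of the original UUID and can be turned back into a UUID with `uuid.UUID(restored["id"])`.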

Docs

  • Fixed URLs in Weak Supervision with Sentence Transformers tutorial (#3243).
  • Fixed library buttons' formatting on Tutorials page (#3255).
  • Modified styling of error code outputs in notebooks (#3270).
  • Added ElasticSearch and OpenSearch versions (#3280).
  • Removed template notebook from table of contents (#3271).
  • Fixed tutorials with pip install argilla to not use older versions of the package (#3282).

Added

  • Added metadata attribute to the Record of the FeedbackDataset (#3194)
  • New users update command to update the role for an existing user (#3188)
  • New Workspace class to allow users to manage their Argilla workspaces and the users assigned to those workspaces via the Python client (#3180)
  • Added User class to let users manage their Argilla users via the Python client (#3169).
  • Added an option to display tqdm progress bar to FeedbackDataset.push_to_argilla when looping over the records to upload (#3233).

Changed

  • The role system now supports three different roles: owner, admin and annotator (#3104)
  • admin role is scoped to workspace-level operations (#3115)
  • The owner user is created among the default pool of users in the quickstart, and the default user in the server has now owner role (#3248), reverting (#3188).

Deprecated

Added

  • Added search component for feedback datasets (#3138)
  • Added markdown support for feedback dataset guidelines (#3153)
  • Added Train button for feedback datasets (#3170)

Changed

  • Updated SearchEngine and POST /api/v1/me/datasets/{dataset_id}/records/search to return the total number of records matching the search query (#3166)

Fixed

  • Replaced Enum for string value in URLs for client API calls (Closes #3149)
  • Resolve breaking issue with ArgillaSpanMarkerTrainer for Named Entity Recognition with span_marker v1.1.x onwards.
  • Move ArgillaDatasetCard import under @requires_version decorator, so that the ImportError on huggingface_hub is handled properly (#3174)
  • Allow flow FeedbackDataset.from_argilla -> FeedbackDataset.push_to_argilla under different dataset names and/or workspaces (#3192)

Docs

  • Resolved typos in the docs (#3240).
  • Fixed mention of master branch (#3254).

Added

  • Added boolean use_markdown property to TextFieldSettings model.
  • Added boolean use_markdown property to TextQuestionSettings model.
  • Added new status draft for the Response model.
  • Added LabelSelectionQuestionSettings class allowing to create label selection (single-choice) questions in the API (#3005)
  • Added MultiLabelSelectionQuestionSettings class allowing to create multi-label selection (multi-choice) questions in the API (#3010).
  • Added POST /api/v1/me/datasets/{dataset_id}/records/search endpoint (#3068).
  • Added new components in feedback task Question form: MultiLabel (#3064) and SingleLabel (#3016).
  • Added docstrings to the pydantic.BaseModels defined at argilla/client/feedback/schemas.py (#3137)
  • Added information about executing tests to the developer documentation (#3143).

Changed

  • Updated GET /api/v1/me/datasets/:dataset_id/metrics output payload to include the count of responses with draft status.
  • Database setup for unit tests. Now the unit tests use a different database than the one used by the local Argilla server (Closes #2987).
  • Updated alembic setup to be able to autogenerate revision/migration scripts using SQLAlchemy metadata from Argilla server models (#3044)
  • Improved DatasetCard generation on FeedbackDataset.push_to_huggingface when generate_card=True, following the official HuggingFace Hub template, but suited to FeedbackDatasets from Argilla (#3110)

Fixed

  • Disallow fields and questions in FeedbackDataset with the same name (#3126).
  • Fixed broken links in the documentation and updated the development branch name from development to develop (#3145).

Added

  • /api/v1/datasets new endpoint to list and create datasets (#2615).
  • /api/v1/datasets/{dataset_id} new endpoint to get and delete datasets (#2615).
  • /api/v1/datasets/{dataset_id}/publish new endpoint to publish a dataset (#2615).
  • /api/v1/datasets/{dataset_id}/questions new endpoint to list and create dataset questions (#2615)
  • /api/v1/datasets/{dataset_id}/fields new endpoint to list and create dataset fields (#2615)
  • /api/v1/datasets/{dataset_id}/questions/{question_id} new endpoint to delete a dataset question (#2615)
  • /api/v1/datasets/{dataset_id}/fields/{field_id} new endpoint to delete a dataset field (#2615)
  • /api/v1/workspaces/{workspace_id} new endpoint to get workspaces by id (#2615)
  • /api/v1/responses/{response_id} new endpoint to update and delete a response (#2615)
  • /api/v1/datasets/{dataset_id}/records new endpoint to create and list dataset records (#2615)
  • /api/v1/me/datasets new endpoint to list user visible datasets (#2615)
  • /api/v1/me/datasets/{dataset_id}/records new endpoint to list dataset records with user responses (#2615)
  • /api/v1/me/datasets/{dataset_id}/metrics new endpoint to get the dataset user metrics (#2615)
  • /api/v1/me/records/{record_id}/responses new endpoint to create record user responses (#2615)
  • Show new feedback task datasets in the datasets list (#2719)
  • New page for feedback task (#2680)
  • Show feedback task metrics (#2822)
  • Users can delete a dataset in the dataset settings page (#2792)
  • Support for FeedbackDataset in Python client (parent PR #2615, and nested PRs: #2949, #2827, #2943, #2945, #2962, and #3003)
  • Integration with the HuggingFace Hub (#2949)
  • Added ArgillaPeftTrainer for text and token classification (#2854)
  • Added predict_proba() method to ArgillaSetFitTrainer
  • Added ArgillaAutoTrainTrainer for Text Classification (#2664)
  • New database revisions command showing database revisions info

Fixes

  • Avoid rendering HTML for invalid HTML strings in Text2text (#2911)

Changed

  • The database migrate command accepts a --revision param to provide a specific revision id
  • tokens_length metrics function returns empty data (#3045)
  • token_length metrics function returns empty data (#3045)
  • mention_length metrics function returns empty data (#3045)
  • entity_density metrics function returns empty data (#3045)

Deprecated

  • Using Argilla with Python 3.7 runtime is deprecated and support will be removed from version 1.11.0 (#2902)
  • tokens_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • token_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • mention_length metrics function has been deprecated and will be removed in 1.10.0 (#3045)
  • entity_density metrics function has been deprecated and will be removed in 1.10.0 (#3045)

Removed

  • Removed mention density, tokens_length and chars_length metrics from token classification metrics storage (#3045)
  • Removed token char_start, char_end, tag, and score metrics from token classification metrics storage (#3045)
  • Removed tags-related metrics from token classification metrics storage (#3045)

Added

  • Added max_retries and num_threads parameters to rg.log to run data logging requests concurrently with a backoff retry policy. See #2458 and #2533
  • rg.load accepts include_vectors and include_metrics when loading data. Closes #2398
  • Added settings param to prepare_for_training (#2689)
  • Added prepare_for_training for openai (#2658)
  • Added ArgillaOpenAITrainer (#2659)
  • Added ArgillaSpanMarkerTrainer for Named Entity Recognition (#2693)
  • Added ArgillaTrainer CLI support. Closes (#2809)

Fixes

  • Fixed image alignment on token classification

Changed

  • Argilla quickstart image dependencies are externalized into quickstart.requirements.txt. See #2666
  • Bulk endpoints will upsert data when record id is present. Closes #2535
  • Moved from click to typer for CLI support. Closes (#2815)
  • Argilla server docker image is built with PostgreSQL support. Closes #2686
  • The rg.log function computes all batches and raises an error for all failed batches.
  • The default batch size for rg.log is now 100.

Fixed

  • argilla.training bugfixes and unification (#2665)
  • Resolved several small bugs in the ArgillaTrainer.

Deprecated

  • The rg.log_async function is deprecated and will be removed in the next minor release.

Added

  • ARGILLA_HOME_PATH new environment variable (#2564).
  • ARGILLA_DATABASE_URL new environment variable (#2564).
  • Basic support for user roles with admin and annotator (#2564).
  • id, first_name, last_name, role, inserted_at and updated_at new user fields (#2564).
  • /api/users new endpoint to list and create users (#2564).
  • /api/users/{user_id} new endpoint to delete users (#2564).
  • /api/workspaces new endpoint to list and create workspaces (#2564).
  • /api/workspaces/{workspace_id}/users new endpoint to list workspace users (#2564).
  • /api/workspaces/{workspace_id}/users/{user_id} new endpoint to create and delete workspace users (#2564).
  • argilla.tasks.users.migrate new task to migrate users from old YAML file to database (#2564).
  • argilla.tasks.users.create new task to create a user (#2564).
  • argilla.tasks.users.create_default new task to create a user with default credentials (#2564).
  • argilla.tasks.database.migrate new task to execute database migrations (#2564).
  • release.Dockerfile and quickstart.Dockerfile now create a default argilladata volume to persist data (#2564).
  • Add user settings page. Closes #2496
  • Added argilla.training module with support for spacy, setfit, and transformers. Closes #2504

Fixes

  • The prepare_for_training method now works when multi_label=True. Closes #2606

Changed

  • ARGILLA_USERS_DB_FILE environment variable is now only used to migrate users from the YAML file to the database (#2564).
  • full_name user field is now deprecated and first_name and last_name should be used instead (#2564).
  • password user field now requires a minimum of 8 and a maximum of 100 characters in size (#2564).
  • quickstart.Dockerfile image default users changed from team and argilla to admin and annotator, including new passwords and API keys (#2564).
  • Datasets can now be managed only by users with the admin role (#2564).
  • The list of rules is now accessible while metrics are computed. Closes #2117
  • Style updates for weak labeling and added a feedback toast when deleting rules. See #2626 and #2648

Removed

  • email user field (#2564).
  • disabled user field (#2564).
  • Support for private workspaces (#2564).
  • ARGILLA_LOCAL_AUTH_DEFAULT_APIKEY and ARGILLA_LOCAL_AUTH_DEFAULT_PASSWORD environment variables. Use python -m argilla.tasks.users.create_default instead (#2564).
  • The old headers for API Key and workspace from python client
  • The default value for old API Key constant. Closes #2251

1.5.1 - 2023-03-30

Fixes

  • Copying datasets between workspaces with proper owner/workspace info. Closes #2562
  • Copy dataset with empty workspace to the default user workspace 905d4de
  • Using elasticsearch config to request backend version. Closes #2311
  • Remove sorting by score in labels. Closes #2622

Changed

  • Update field name in metadata for image url. See #2609
  • Improvements in tutorial doc cards. Closes #2216

1.5.0 - 2023-03-21

Added

  • Add the fields to retrieve when loading the data from argilla. rg.load takes too long because of the vector field, even when users don't need it. Closes #2398
  • Add new page and components for dataset settings. Closes #2442
  • Add ability to show an image in records (for TokenClassification and TextClassification) if a URL is passed in metadata with the key _image_url
  • Non-searchable fields support in metadata. #2570
  • Add record ID references to the prepare for training methods. Closes #2483
  • Add tutorial on Image Classification. #2420
  • Add Train button, visible for "admin" role, with code snippets from a selection of libraries. Closes #2591

Changed

  • Labels are now centralized in a specific vuex ORM called GlobalLabel Model, see #2210. This model is the same for TokenClassification and TextClassification (so both tasks have labels with color_id and shortcuts parameters in the vuex ORM)
  • The shortcuts improvement for labels #2339 has been moved to the vuex ORM in dataset settings feature #2444
  • Update "Define a labeling schema" section in docs.
  • The record inputs are sorted alphabetically in UI by default. #2581
  • The record inputs are fully visible when the pagination size is one, and the collapsed area is taller on laptop screens. #2587

Fixes

  • Allow URL to be clickable in Jupyter notebook again. Closes #2527

Removed

  • Removing some data scan deprecated endpoints used by old clients. This change will break compatibility with client <v1.3.0
  • Stop using old scan deprecated endpoints in python client. This logic will break client compatibility with server version <1.3.0
  • Remove the previous way to add labels through the dataset page. Now labels can be added only through dataset settings page.