diff --git a/examples/basics/data_row_metadata.ipynb b/examples/basics/data_row_metadata.ipynb index aa9cb7d36..8e3dd5051 100644 --- a/examples/basics/data_row_metadata.ipynb +++ b/examples/basics/data_row_metadata.ipynb @@ -36,6 +36,32 @@ ], "cell_type": "markdown" }, + { + "metadata": {}, + "source": [ + "## Metadata ontology\n", + "\n", + "We manage metadata with a system similar to the one we use for feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within each kind of metadata, reserved or custom. A DataRow can have a maximum of 5 metadata fields at a time.\n", + "\n", + "### Metadata kinds\n", + "\n", + "* **Enum**: A classification with options; only one option can be selected at a time\n", + "* **DateTime**: A UTC ISO 8601 datetime\n", + "* **String**: A string of less than 500 characters\n", + "\n", + "### Reserved fields\n", + "\n", + "* **tag**: a free text field\n", + "* **split**: enum of train-valid-test\n", + "* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC\n", + "\n", + "### Custom fields\n", + "\n", + "* **Embedding**: A vector of 128 float32 values used for similarity. To upload custom embeddings, follow this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb)\n", + "* Any metadata kind can be customized" + ], + "cell_type": "markdown" + }, { "metadata": {}, "source": [ @@ -58,6 +84,7 @@ "import labelbox as lb\n", "from datetime import datetime\n", "from pprint import pprint\n", + "from labelbox.schema.data_row_metadata import DataRowMetadataKind\n", "from uuid import uuid4" ], "cell_type": "code", @@ -78,26 +105,7 @@ { "metadata": {}, "source": [ - "## Metadata ontology\n", - "\n", - "We use a similar system for managing metadata as we do feature schemas. 
Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within the kind of metadata, reserved or custom. A DataRow can have a maximum of 5 metadata fields at a time.\n", - "\n", - "### Metadata kinds\n", - "\n", - "* **Enum**: A classification with options, only one option can be selected at a time\n", - "* **DateTime**: A utc ISO datetime \n", - "* **Embedding**: 128 float 32 vector used for similarity\n", - "* **String**: A string of less than 500 characters\n", - "\n", - "### Reserved fields\n", - "\n", - "* **tag**: a free text field\n", - "* **split**: enum of train-valid-test\n", - "* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC\n", - "\n", - "### Custom fields\n", - "\n", - "You can create your own fields from within the app by navigating to the [metadata schema page](https://app.labelbox.com/schema/metadata)" + "### Get the current metadata ontology " ], "cell_type": "markdown" }, @@ -124,9 +132,15 @@ { "metadata": {}, "source": [ - "# access by name\n", + "### Access metadata by name" + ], + "cell_type": "markdown" + }, + { + "metadata": {}, + "source": [ "split_field = mdo.reserved_by_name[\"split\"]\n", - "train_field = mdo.reserved_by_name[\"split\"][\"train\"]" + "split_field" ], "cell_type": "code", "outputs": [], @@ -135,7 +149,8 @@ { "metadata": {}, "source": [ - "tag_field = mdo.reserved_by_name[\"tag\"]" + "tag_field = mdo.reserved_by_name[\"tag\"]\n", + "tag_field" ], "cell_type": "code", "outputs": [], @@ -144,7 +159,8 @@ { "metadata": {}, "source": [ - "tag_field" + "train_field = mdo.reserved_by_name[\"split\"][\"train\"]\n", + "train_field" ], "cell_type": "code", "outputs": [], @@ -153,9 +169,9 @@ { "metadata": {}, "source": [ - "## Construct metadata fields\n", + "## Construct metadata fields for existing metadata schemas\n", "\n", - "To construct a metadata field you must provide the Schema Id for the field 
and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the Schema Id and value in a dictionary format.\n", + "To construct a metadata field you must provide the name for the metadata field and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the name and value in a dictionary format.\n", "\n", "\n", "\n" @@ -174,21 +190,21 @@ "source": [ "# Construct a metadata field of string kind\n", "tag_metadata_field = lb.DataRowMetadataField(\n", - " name=\"tag\", # specify the schema name\n", - " value=\"tag_string\", # typed inputs\n", + " name=\"tag\", \n", + " value=\"tag_string\", \n", ")\n", "\n", "# Construct an metadata field of datetime kind\n", "capture_datetime_field = lb.DataRowMetadataField(\n", - " name=\"captureDateTime\", # specify the schema id\n", - " value=datetime.utcnow(), # typed inputs\n", + " name=\"captureDateTime\", \n", + " value=datetime.utcnow(), \n", ")\n", "\n", "# Construct a metadata field of Enums options\n", - "split_metadta_field = lb.DataRowMetadataField(\n", - " name=\"split\", # specify the schema id\n", - " value=\"train\", # typed inputs\n", - ")" + "split_metadata_field = lb.DataRowMetadataField(\n", + " name=\"split\", \n", + " value=\"train\",\n", + ")\n" ], "cell_type": "code", "outputs": [], @@ -197,7 +213,7 @@ { "metadata": {}, "source": [ - "Option 2: Alternatively, you can specify the metadata fields with dictionary format without declaring the `DataRowMetadataField` objects.\n" + "Option 2: You can also specify the metadata fields with dictionary format without declaring the `DataRowMetadataField` objects.\n" ], "cell_type": "markdown" }, @@ -220,7 +236,8 @@ "split_metadata_field_dict = {\n", " \"name\": \"split\",\n", " \"value\": \"train\",\n", - "}" + "}\n", + "\n" ], "cell_type": "code", "outputs": [], @@ -229,25 +246,156 @@ { "metadata": {}, "source": [ - "## Upload data rows together with metadata\n", + "## Create a custom 
metadata schema with its corresponding fields\n" + ], + "cell_type": "markdown" + }, + { + "metadata": {}, + "source": [ + "# List of custom metadata fields to attach to a data row\n", + "custom_metadata_fields = []\n", + "\n", + "# Create a custom metadata schema of number kind\n", + "number_schema = mdo.create_schema(\n", + " name=\"numberMetadataCustom\", \n", + " kind=DataRowMetadataKind.number\n", + ")\n", "\n", - "See [Limits](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in one API operation." + "# Construct a metadata field for the new schema\n", + "data_row_metadata_fields_number = lb.DataRowMetadataField(\n", + " name=number_schema.name,\n", + " value=5.0\n", + ")\n", + "\n", + "custom_metadata_fields.append(data_row_metadata_fields_number)\n" + ], + "cell_type": "code", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "source": [ + "# Create a custom metadata schema of enum kind\n", + "custom_metadata_fields = []\n", + "\n", + "enum_schema = mdo.create_schema(\n", + " name=\"enumMetadata\", \n", + " kind=DataRowMetadataKind.enum,\n", + " options=[\"option1\", \"option2\"]\n", + ")\n", + "\n", + "# Construct metadata fields for the new schema\n", + "data_row_metadata_fields_enum_1 = lb.DataRowMetadataField(\n", + " name=enum_schema.name,\n", + " value=\"option1\"\n", + ")\n", + "custom_metadata_fields.append(data_row_metadata_fields_enum_1)\n", + "\n", + "\n", + "data_row_metadata_fields_enum_2 = lb.DataRowMetadataField(\n", + " name=enum_schema.name,\n", + " value=\"option2\"\n", + ")\n", + "custom_metadata_fields.append(data_row_metadata_fields_enum_2)\n", + "\n" + ], + "cell_type": "code", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "source": [ + "# Inspect the newly created metadata schemas\n", + "metadata_ontologies = mdo.fields_by_id\n", + "pprint(metadata_ontologies, indent=2)" + ], + "cell_type": "code", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "source": [ + "## Create data rows with metadata\n", + "\n", + 
"See our [documentation](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in a single API operation." ], "cell_type": "markdown" }, { "metadata": {}, "source": [ - "# A simple example of uploading Data Rows with metadata\n", + "# A simple example of uploading data rows with metadata\n", "dataset = client.create_dataset(name=\"Simple Data Rows import with metadata example\")\n", + "global_key = \"s_basic.jpg\"\n", + "data_row = {\"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg\", \"global_key\": global_key}\n", + "# This line works with dictionaries as well as schemas and fields created with DataRowMetadataField\n", + "data_row['metadata_fields'] = custom_metadata_fields + [ split_metadata_field , capture_datetime_field_dict, tag_metadata_field ]\n", "\n", - "data_row = {\"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg\", \"global_key\": str(uuid4())}\n", - "data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadata_field_dict] \n", - "# Also works with a list of dictionary as specified in Option 2. Uncomment the line below to try. 
\n", - "# data_row['metadata_fields'] = [tag_metadata_field_dict, capture_datetime_field_dict, split_metadata_field_dict]\n", "\n", "task = dataset.create_data_rows([data_row])\n", - "task.wait_till_done()" + "task.wait_till_done()\n", + "result_task = task.result\n", + "print(result_task)" ], "cell_type": "code", "outputs": [], "execution_count": null + }, + { + "metadata": {}, + "source": [ + "## Update data row metadata" + ], + "cell_type": "markdown" + }, + { + "metadata": {}, + "source": [ + "# Look up the custom number schema created earlier by name\n", + "num_schema = mdo.get_by_name(\"numberMetadataCustom\")\n", + "\n", + "# Construct a metadata field with the updated value\n", + "updated_metadata = lb.DataRowMetadataField(\n", + " schema_id=num_schema.uid, \n", + " value=10.2\n", + ")\n", + "\n", + "# Create the payload for the data row uploaded in the previous cell\n", + "data_row_payload = lb.DataRowMetadata(\n", + " global_key=global_key, \n", + " fields=[updated_metadata]\n", + ")\n", + "\n", + "# Upsert the updated number metadata field on the data row\n", + "mdo.bulk_upsert([data_row_payload])" + ], + "cell_type": "code", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "source": [ + "## Update metadata schema" + ], + "cell_type": "markdown" + }, + { + "metadata": {}, + "source": [ + "# Rename a metadata schema\n", + "number_schema = mdo.update_schema(name=\"numberMetadataCustom\", new_name=\"numberMetadataCustomNew\")\n", + "\n", + "# Rename an option of an Enum metadata schema. This applies only to Enum metadata schemas.\n", + "enum_schema = mdo.update_enum_option(\n", + " name=\"enumMetadata\", \n", + " option=\"option1\",\n", + " new_option=\"option3\"\n", + ")" ], "cell_type": "code", "outputs": [], @@ -276,7 +424,7 @@ { "metadata": {}, "source": [ - "You can bulk export metadata given data row IDs" + "You can bulk export metadata using data row IDs." 
], "cell_type": "markdown" }, @@ -293,9 +441,26 @@ { "metadata": {}, "source": [ - "## Upload/delete/update custom metadata for existing data rows\n", + "## Delete a custom metadata schema\n", + "You can delete a custom metadata schema by name. To delete one, uncomment the line below and insert the schema name." + ], + "cell_type": "markdown" + }, + { + "metadata": {}, + "source": [ + "#status = mdo.delete_schema(name=\"\")" + ], + "cell_type": "code", + "outputs": [], + "execution_count": null + }, + { + "metadata": {}, + "source": [ + "## Upload/delete/update custom embedding metadata for existing data rows\n", "\n", - "For a complete tutorial on how to update, upload and delete custom metadata please follow the steps in this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).\n", + "For a complete walkthrough of uploading, updating, and deleting custom embeddings, follow the steps in this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).\n", "\n" ], "cell_type": "markdown"