259 changes: 212 additions & 47 deletions examples/basics/data_row_metadata.ipynb
@@ -36,6 +36,32 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Metadata ontology\n",
"\n",
"Metadata is managed with a system similar to the one used for feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within each kind of metadata (reserved or custom). A DataRow can have a maximum of 5 metadata fields at a time.\n",
"\n",
"### Metadata kinds\n",
"\n",
"* **Enum**: A classification with options; only one option can be selected at a time\n",
"* **DateTime**: A UTC ISO 8601 datetime\n",
"* **String**: A string of less than 500 characters\n",
"\n",
"### Reserved fields\n",
"\n",
"* **tag**: a free text field\n",
"* **split**: enum of train-valid-test\n",
"* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC\n",
"\n",
"### Custom fields\n",
"\n",
"* **Embedding**: A vector of 128 float32 values used for similarity. To upload custom embeddings, follow this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb)\n",
"* Any metadata kind can be customized"
],
"cell_type": "markdown"
},
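The limits listed in this cell can be checked client-side before an upload. The following is a minimal sketch in plain Python, assuming fields are expressed as `name`/`value` dictionaries as elsewhere in this notebook. `validate_fields`, `MAX_FIELDS`, and `MAX_STRING_LENGTH` are hypothetical names used for illustration; they are not part of the Labelbox SDK, and the server remains the authority on what is accepted.

```python
MAX_FIELDS = 5           # documented maximum number of metadata fields per data row
MAX_STRING_LENGTH = 500  # documented: strings must be shorter than 500 characters


def validate_fields(fields):
    """Return a list of human-readable problems (hypothetical helper, not SDK code)."""
    problems = []
    if len(fields) > MAX_FIELDS:
        problems.append(
            f"{len(fields)} fields exceeds the limit of {MAX_FIELDS}")
    for field in fields:
        name, value = field["name"], field["value"]
        # Only string values are length-checked; other kinds pass through
        if isinstance(value, str) and len(value) >= MAX_STRING_LENGTH:
            problems.append(
                f"{name}: string must be under {MAX_STRING_LENGTH} characters")
    return problems


# Example: a short tag passes, an over-long tag is reported
print(validate_fields([{"name": "tag", "value": "ok"}]))
print(validate_fields([{"name": "tag", "value": "x" * 500}]))
```

Running a check like this before `create_data_rows` surfaces limit violations early, without waiting for the upload task to fail.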
{
"metadata": {},
"source": [
@@ -58,6 +84,7 @@
"import labelbox as lb\n",
"from datetime import datetime\n",
"from pprint import pprint\n",
"from labelbox.schema.data_row_metadata import DataRowMetadataKind\n",
"from uuid import uuid4"
],
"cell_type": "code",
@@ -78,26 +105,7 @@
{
"metadata": {},
"source": [
"## Metadata ontology\n",
"\n",
"We use a similar system for managing metadata as we do feature schemas. Metadata schemas are strongly typed to ensure we can provide the best experience in the App. Each metadata field can be uniquely accessed by id. Names are unique within the kind of metadata, reserved or custom. A DataRow can have a maximum of 5 metadata fields at a time.\n",
"\n",
"### Metadata kinds\n",
"\n",
"* **Enum**: A classification with options, only one option can be selected at a time\n",
"* **DateTime**: A utc ISO datetime \n",
"* **Embedding**: 128 float 32 vector used for similarity\n",
"* **String**: A string of less than 500 characters\n",
"\n",
"### Reserved fields\n",
"\n",
"* **tag**: a free text field\n",
"* **split**: enum of train-valid-test\n",
"* **captureDateTime**: ISO 8601 datetime field. All times must be in UTC\n",
"\n",
"### Custom fields\n",
"\n",
"You can create your own fields from within the app by navigating to the [metadata schema page](https://app.labelbox.com/schema/metadata)"
"### Get the current metadata ontology "
],
"cell_type": "markdown"
},
@@ -124,9 +132,15 @@
{
"metadata": {},
"source": [
"# access by name\n",
"### Access metadata by name"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"split_field = mdo.reserved_by_name[\"split\"]\n",
"train_field = mdo.reserved_by_name[\"split\"][\"train\"]"
"split_field"
],
"cell_type": "code",
"outputs": [],
@@ -135,7 +149,8 @@
{
"metadata": {},
"source": [
"tag_field = mdo.reserved_by_name[\"tag\"]"
"tag_field = mdo.reserved_by_name[\"tag\"]\n",
"tag_field"
],
"cell_type": "code",
"outputs": [],
@@ -144,7 +159,8 @@
{
"metadata": {},
"source": [
"tag_field"
"train_field = mdo.reserved_by_name[\"split\"][\"train\"]\n",
"train_field"
],
"cell_type": "code",
"outputs": [],
@@ -153,9 +169,9 @@
{
"metadata": {},
"source": [
"## Construct metadata fields\n",
"## Construct metadata fields for existing metadata schemas\n",
"\n",
"To construct a metadata field you must provide the Schema Id for the field and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the Schema Id and value in a dictionary format.\n",
"To construct a metadata field you must provide the name for the metadata field and the value that will be uploaded. You can either construct a DataRowMetadataField object or specify the name and value in a dictionary format.\n",
"\n",
"\n",
"\n"
@@ -174,21 +190,21 @@
"source": [
"# Construct a metadata field of string kind\n",
"tag_metadata_field = lb.DataRowMetadataField(\n",
" name=\"tag\", # specify the schema name\n",
" value=\"tag_string\", # typed inputs\n",
" name=\"tag\", \n",
" value=\"tag_string\", \n",
")\n",
"\n",
"# Construct a metadata field of datetime kind\n",
"capture_datetime_field = lb.DataRowMetadataField(\n",
" name=\"captureDateTime\", # specify the schema id\n",
" value=datetime.utcnow(), # typed inputs\n",
" name=\"captureDateTime\", \n",
" value=datetime.utcnow(), \n",
")\n",
"\n",
"# Construct a metadata field of enum kind\n",
"split_metadta_field = lb.DataRowMetadataField(\n",
" name=\"split\", # specify the schema id\n",
" value=\"train\", # typed inputs\n",
")"
"split_metadata_field = lb.DataRowMetadataField(\n",
" name=\"split\", \n",
" value=\"train\",\n",
")\n"
],
"cell_type": "code",
"outputs": [],
@@ -197,7 +213,7 @@
{
"metadata": {},
"source": [
"Option 2: Alternatively, you can specify the metadata fields with dictionary format without declaring the `DataRowMetadataField` objects.\n"
"Option 2: You can also specify the metadata fields in dictionary format without declaring the `DataRowMetadataField` objects.\n"
],
"cell_type": "markdown"
},
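As a sketch of the dictionary format, each field is a plain dict with `name` and `value` keys. The example below also builds the capture time with `datetime.now(timezone.utc)`, because `datetime.utcnow()` is deprecated as of Python 3.12 and returns a naive datetime. Whether the SDK treats timezone-aware values exactly like naive UTC ones is an assumption worth verifying against your SDK version. The `example_` variable names are hypothetical and chosen so they do not clash with the notebook's own variables.

```python
from datetime import datetime, timezone

# Timezone-aware UTC timestamp (alternative to the deprecated datetime.utcnow())
capture_time = datetime.now(timezone.utc)

# Each metadata field is a dict with the schema name and a typed value
example_tag_dict = {"name": "tag", "value": "tag_string"}
example_capture_dict = {"name": "captureDateTime", "value": capture_time}
example_split_dict = {"name": "split", "value": "train"}

# Hypothetical list, in the shape assigned to data_row["metadata_fields"]
example_fields = [example_tag_dict, example_capture_dict, example_split_dict]

# ISO 8601 rendering of the timestamp, with an explicit UTC offset
print(capture_time.isoformat())
```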
@@ -220,7 +236,8 @@
"split_metadata_field_dict = {\n",
" \"name\": \"split\",\n",
" \"value\": \"train\",\n",
"}"
"}\n",
"\n"
],
"cell_type": "code",
"outputs": [],
@@ -229,25 +246,156 @@
{
"metadata": {},
"source": [
"## Upload data rows together with metadata\n",
"## Create custom metadata schemas with their corresponding fields\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# Collect the custom metadata fields to attach to a data row\n",
"custom_metadata_fields = []\n",
"\n",
"# Create the schema for the metadata \n",
"number_schema = mdo.create_schema(\n",
" name=\"numberMetadataCustom\", \n",
" kind=DataRowMetadataKind.number\n",
")\n",
"\n",
"See [Limits](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in one API operation."
"# Add fields to the metadata schema\n",
"data_row_metadata_fields_number = lb.DataRowMetadataField(\n",
" name=number_schema.name,\n",
" value=5.0\n",
")\n",
"\n",
"custom_metadata_fields.append(data_row_metadata_fields_number)\n"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"# Create the schema for an enum metadata \n",
"custom_metadata_fields = []\n",
"\n",
"enum_schema = mdo.create_schema(\n",
" name=\"enumMetadata\", \n",
" kind=DataRowMetadataKind.enum,\n",
" options=[\"option1\", \"option2\"]\n",
")\n",
"\n",
"# Add fields to the metadata schema \n",
"data_row_metadata_fields_enum_1 = lb.DataRowMetadataField(\n",
" name=enum_schema.name,\n",
" value=\"option1\"\n",
")\n",
"custom_metadata_fields.append(data_row_metadata_fields_enum_1)\n",
"\n",
"\n",
"data_row_metadata_fields_enum_2 = lb.DataRowMetadataField(\n",
" name=enum_schema.name,\n",
" value=\"option2\"\n",
")\n",
"custom_metadata_fields.append(data_row_metadata_fields_enum_2)\n",
"\n"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"# Inspect the newly created metadata schemas\n",
"metadata_ontologies = mdo.fields_by_id\n",
"pprint(metadata_ontologies, indent=2)"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Create data rows with metadata\n",
"\n",
"See our [documentation](https://docs.labelbox.com/docs/limits) for information on limits for uploading data rows in a single API operation."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# A simple example of uploading Data Rows with metadata\n",
"# A simple example of uploading data rows with metadata\n",
"dataset = client.create_dataset(name=\"Simple Data Rows import with metadata example\")\n",
"global_key = \"s_basic.jpg\"\n",
"data_row = {\"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg\", \"global_key\": global_key}\n",
"# This line works with dictionaries as well as schemas and fields created with DataRowMetadataField\n",
"data_row['metadata_fields'] = custom_metadata_fields + [ split_metadata_field , capture_datetime_field_dict, tag_metadata_field ]\n",
"\n",
"data_row = {\"row_data\": \"https://storage.googleapis.com/labelbox-sample-datasets/Docs/basic.jpg\", \"global_key\": str(uuid4())}\n",
"data_row['metadata_fields'] = [tag_metadata_field, capture_datetime_field, split_metadata_field_dict] \n",
"# Also works with a list of dictionary as specified in Option 2. Uncomment the line below to try. \n",
"# data_row['metadata_fields'] = [tag_metadata_field_dict, capture_datetime_field_dict, split_metadata_field_dict]\n",
"\n",
"task = dataset.create_data_rows([data_row])\n",
"task.wait_till_done()"
"task.wait_till_done()\n",
"result_task = task.result\n",
"print(result_task)"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Update data row metadata"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# Look up the custom number metadata schema created earlier\n",
"num_schema = mdo.get_by_name(\"numberMetadataCustom\")\n",
"\n",
"# Update the metadata\n",
"updated_metadata = lb.DataRowMetadataField(\n",
" schema_id=num_schema.uid, \n",
" value=10.2\n",
")\n",
"\n",
"# Create data row payload\n",
"data_row_payload = lb.DataRowMetadata(\n",
" global_key=global_key, \n",
" fields=[updated_metadata]\n",
")\n",
"\n",
"# Upsert the updated number metadata field for the data row\n",
"mdo.bulk_upsert([data_row_payload])"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Update metadata schema"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"# Update a metadata schema's name\n",
"number_schema = mdo.update_schema(name=\"numberMetadataCustom\", new_name=\"numberMetadataCustomNew\")\n",
"\n",
"# Update the name of an enum option; this applies only to enum metadata schemas\n",
"enum_schema = mdo.update_enum_option(\n",
" name=\"enumMetadata\", \n",
" option=\"option1\",\n",
" new_option=\"option3\"\n",
")"
],
"cell_type": "code",
"outputs": [],
@@ -276,7 +424,7 @@
{
"metadata": {},
"source": [
"You can bulk export metadata given data row IDs"
"You can bulk export metadata using data row IDs."
],
"cell_type": "markdown"
},
@@ -293,9 +441,26 @@
{
"metadata": {},
"source": [
"## Upload/delete/update custom metadata for existing data rows\n",
"## Delete custom metadata schema \n",
"You can delete a custom metadata schema by name. To delete one, uncomment the line below and insert the schema's name."
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"#status = mdo.delete_schema(name=\"<metadata schema name>\")"
],
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
"## Upload/delete/update custom embedding metadata for existing data rows\n",
"\n",
"For a complete tutorial on how to update, upload and delete custom metadata please follow the steps in this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).\n",
"For complete instructions on uploading, updating, and deleting custom embeddings, follow this [tutorial](https://colab.research.google.com/github/Labelbox/labelbox-python/blob/master/examples/basics/custom_embeddings.ipynb).\n",
"\n"
],
"cell_type": "markdown"