# Build a Weaviate Transformation Agent

In this recipe, we will build a simple Weaviate [`TransformationAgent`](https://weaviate.io/developers/agents/transformation). The agent will have access to a collection of research papers, with their titles and abstracts. We will then use the agent to create additional properties for each object in the collection.

The `TransformationAgent` can access a Weaviate collection of your choosing and perform operations on the objects within it. Each operation is defined in natural language, and the agent then uses an LLM to carry out the instructions in the operation.


> 📚 To learn more about the new `TransformationAgent`, read our accompanying ["Introducing the Weaviate Transformation Agent"]() blog post.

To get started, we've prepared an open dataset, available on Hugging Face. The first step is to walk through how to populate your Weaviate Cloud collection with it.

- [**ArxivPapers:**](https://huggingface.co/datasets/weaviate/agents/viewer/query-agent-ecommerce) A dataset that lists titles and abstracts of research papers.


If you'd like to try building more agents with different datasets, check out the list of demo datasets available in the [Hugging Face Weaviate agents dataset](https://huggingface.co/datasets/weaviate/agents).


## Setting Up Weaviate & Importing Data

To use the Weaviate Transformation Agent, first create a [Weaviate Cloud](https://weaviate.io/deployment/serverless) account👇
1. [Create a Serverless Weaviate Cloud account](https://weaviate.io/deployment/serverless) and set up a free [Sandbox](https://weaviate.io/developers/wcs/manage-clusters/create#sandbox-clusters)
2. Go to 'Embedding' and enable it. By default, this uses `Snowflake/snowflake-arctic-embed-l-v2.0` as the embedding model
3. Take note of the `WEAVIATE_URL` and `WEAVIATE_API_KEY` to connect to your cluster below

> Info: We recommend using [Weaviate Embeddings](https://weaviate.io/developers/weaviate/model-providers/weaviate) so you do not have to provide any extra keys for external embedding providers.

In [4]:
!pip install weaviate-client[agents] datasets



In [5]:
import os
from getpass import getpass

# Prompt for credentials only if they are not already set in the environment
if "WEAVIATE_API_KEY" not in os.environ:
    os.environ["WEAVIATE_API_KEY"] = getpass("Weaviate API Key")
if "WEAVIATE_URL" not in os.environ:
    os.environ["WEAVIATE_URL"] = getpass("Weaviate URL")

In [6]:
import weaviate
from weaviate.auth import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=os.environ.get("WEAVIATE_URL"),
    auth_credentials=Auth.api_key(os.environ.get("WEAVIATE_API_KEY")),
)

### Prepare the Collections


In the following code blocks, we pull our demo "papers" dataset from Hugging Face and write it to a new collection in our Weaviate Serverless cluster.

Note: We recommend enabling 'Embeddings' in the Weaviate Cloud console. This way, you can use the `text2vec_weaviate` vectorizer, which will create vectors for each object using `Snowflake/snowflake-arctic-embed-l-v2.0` by default.

In [25]:
from weaviate.classes.config import Configure, Property, DataType

# To re-run cell you may have to delete collections
# client.collections.delete("ArxivPapers")
client.collections.create(
    "ArxivPapers",
    description="A dataset that lists research paper titles and abstracts",
    vectorizer_config=Configure.Vectorizer.text2vec_weaviate()
)


<weaviate.collections.collection.sync.Collection at 0x79c2b9070090>

In [31]:
from datasets import load_dataset

dataset = load_dataset("weaviate/agents", "transformation-agent-papers", split="train", streaming=True)

papers_collection = client.collections.get("ArxivPapers")

with papers_collection.batch.dynamic() as batch:
    for i, item in enumerate(dataset):
        # Import only the first 200 papers, then stop reading the stream
        if i >= 200:
            break
        batch.add_object(properties=item["properties"])

### Inspect the Collection in Explorer

The `TransformationAgent` will modify the collection as we go along, so this is a good time to take a look at the contents of your "ArxivPapers" collection. If all goes well, you should see two properties listed for each object:
- `title`: the title of the paper.
- `abstract`: the abstract of the paper.

You should also see the `vectors` for each object.
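If you would rather check from code than from the Explorer, a minimal sketch like the one below lists the property names found on a few objects. `preview_properties` is a hypothetical helper name introduced here for illustration. It relies only on the client's `query.fetch_objects` call and on the `.objects` and `.properties` attributes of the response.

```python
def preview_properties(collection, limit=3):
    """Return the sorted property names of the first `limit` objects."""
    response = collection.query.fetch_objects(limit=limit)
    return [sorted(obj.properties.keys()) for obj in response.objects]

# Usage (assumes `papers_collection` from the import cell above):
# preview_properties(papers_collection)
```

Running the same helper again after the agent has finished is a quick way to see which new properties were added.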

## Define Transformation Operations

The stars of the show for the `TransformationAgent` are the operations.

We can now define transformation operations which we want to perform on our collection. An operation can be:

- Appending a new property
- Updating an existing property

### Append New Properties

To append a new property, we define an operation with:
- **`instruction`**: This is where you describe, in natural language, what you want this new property to be.
- **`property_name`**: The name you want the property to have.
- **`data_type`**: The specific datatype the property should be, e.g. `DataType.TEXT`, `DataType.TEXT_ARRAY`, `DataType.BOOL`, `DataType.INT`, etc.
- **`view_properties`**: Sometimes you may want to create properties based on information in other properties. This is where you list which properties the instruction should view.

#### Create a List of Topics

First, let's append a new property called "topics", which should be a `TEXT_ARRAY`. Based on the "abstract" and "title", let's ask the LLM to extract a list of topic tags. We can be specific here: let's ask for no more than 5.

In [32]:
from weaviate.agents.classes import Operations
from weaviate.collections.classes.config import DataType

add_topics = Operations.append_property(
    property_name="topics",
    data_type=DataType.TEXT_ARRAY,
    view_properties=["abstract", "title"],
    instruction="""Create a list of topic tags based on the title and abstract.
    Topics should be distinct from each other. Provide a maximum of 5 topics.
    Group similar topics under one topic tag.""",
)


#### Add a French Translation

Next, let's add a new "french_abstract" property, which is simply a translation of the "abstract".

In [15]:
add_french_abstract = Operations.append_property(
    property_name="french_abstract",
    data_type=DataType.TEXT,
    view_properties=["abstract"],
    instruction="Translate the abstract to French",
)

#### Add NLP Relevance Score

This time, we can add a property which is an `INT`. Here, we ask the LLM to give a score from 0 to 10, based on how relevant the paper is to Natural Language Processing.

In [33]:
add_nlp_relevance = Operations.append_property(
    property_name="nlp_relevance",
    data_type=DataType.INT,
    view_properties=["abstract"],
    instruction="""Give a score from 0-10 based on how relevant the abstract is to Natural Language Processing.
    The scale is from 0 (not relevant at all) to 10 (very relevant)""",
)

#### Determine If It's a Survey Paper

Finally, let's ask for a `BOOL` property which indicates whether the paper is a survey or not. That is, we'll ask the LLM to determine whether the paper presents novel techniques or surveys existing ones.

In [34]:
is_survey_paper = Operations.append_property(
    property_name="is_survey_paper",
    data_type=DataType.BOOL,
    view_properties=["abstract"],
    instruction="""Determine if the paper is a "survey".
    A paper is considered a survey if it surveys existing techniques, and not if it presents novel techniques""",
)
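The operations above constrain their output only through natural language (at most 5 topics, a score from 0 to 10, a true/false answer), so it can be worth spot-checking the generated values once the agent has run. The following is a minimal sketch under that assumption. `check_properties` is a hypothetical helper, not part of the Weaviate client, and it takes one object's `properties` dict.

```python
def check_properties(props):
    """Return a list of problems found in one object's generated properties."""
    problems = []

    # The "topics" instruction asks for a maximum of 5 tags.
    topics = props.get("topics")
    if topics is not None and len(topics) > 5:
        problems.append("topics has more than 5 entries")

    # The "nlp_relevance" instruction asks for a score from 0 to 10.
    score = props.get("nlp_relevance")
    if score is not None and not 0 <= score <= 10:
        problems.append("nlp_relevance is outside 0-10")

    # "is_survey_paper" was declared as a BOOL property.
    is_survey = props.get("is_survey_paper")
    if is_survey is not None and not isinstance(is_survey, bool):
        problems.append("is_survey_paper is not a boolean")

    return problems
```

Properties that have not been generated yet are skipped, so the helper can also be called on objects the agent has not reached.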

## Create & Run the Transformation Agent

Once we have all of our operations defined, we can initialize a `TransformationAgent`.

When initializing the agent, we have to decide which `collection` it has access to. In this case, we want it to have access to the "ArxivPapers" collection we previously created.

Next, we need to provide a list of `operations` which the agent should run. Here, we provide all the operations we defined above.

In [37]:
from weaviate.agents.transformation import TransformationAgent

agent = TransformationAgent(
    client=client,
    collection="ArxivPapers",
    operations=[
        add_topics,
        add_french_abstract,
        add_nlp_relevance,
        is_survey_paper,
    ],
)

### Running the Transformations

By calling `update_all()`, we get the agent to spin up an individual workflow for each operation. Each operation will then run on each object in our collection.

In [38]:
workflow_ids = agent.update_all()

### Inspect the Operation Workflows

To inspect the status of our operations, we can take a look at the `workflow_ids` and get the status of each one with `agent.get_status(workflow_id)`.

In [39]:
workflow_ids

[TransformationResponse(operation_name='topics', workflow_id='TransformationWorkflow-598aafbd4688c36768b2f2c307c576fd'),
 TransformationResponse(operation_name='french_abstract', workflow_id='TransformationWorkflow-ca3e9020ab4eb6569946953901c6202e'),
 TransformationResponse(operation_name='nlp_relevance', workflow_id='TransformationWorkflow-4bb5763a3d97a2fbf791eaa50a700754'),
 TransformationResponse(operation_name='is_survey_paper', workflow_id='TransformationWorkflow-bd1d1302255768cb2d0d16156b428674')]

In [40]:
agent.get_status(workflow_id=workflow_ids[0].workflow_id)

{'workflow_id': 'TransformationWorkflow-598aafbd4688c36768b2f2c307c576fd',
 'status': {'batch_count': 1,
  'end_time': None,
  'start_time': '2025-03-06 14:20:31',
  'state': 'running',
  'total_duration': None,
  'total_items': 200}}
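Since the workflows run asynchronously, you may want to wait until they are no longer running before querying the new properties. The sketch below polls `get_status` for every workflow. It assumes that `get_status` returns a dict shaped like the output above and that a workflow still in progress reports the state `'running'`. Other state names are not assumed, and `wait_for_workflows` and its parameters are hypothetical names introduced here.

```python
import time

def wait_for_workflows(agent, workflow_ids, poll_seconds=10, timeout_seconds=1800):
    """Poll each workflow until it is no longer 'running' or the timeout passes.

    Returns a dict mapping operation name to the last observed state.
    """
    deadline = time.monotonic() + timeout_seconds
    pending = {w.workflow_id: w.operation_name for w in workflow_ids}
    final_states = {}

    while pending:
        for workflow_id in list(pending):
            state = agent.get_status(workflow_id=workflow_id)["status"]["state"]
            if state != "running":
                final_states[pending.pop(workflow_id)] = state
        if pending:
            if time.monotonic() >= deadline:
                break
            time.sleep(poll_seconds)

    # Workflows still pending at the timeout are reported as "running".
    for name in pending.values():
        final_states[name] = "running"
    return final_states

# Usage (assumes `agent` and `workflow_ids` from the cells above):
# wait_for_workflows(agent, workflow_ids)
```

The timeout keeps the loop from blocking forever if a workflow never leaves the running state.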