# Weaviate Transformation Agent - Workshop

<a target="_blank" href="https://colab.research.google.com/github/weaviate-tutorials/intro-to-weaviate-agents/blob/main/transformation-agent-workshop.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

### Prerequisites

1. Log in to your [Weaviate Cloud](https://console.weaviate.cloud) account (sign up if you don't have one yet)
1. Create a Weaviate Cloud [Sandbox](https://weaviate.io/developers/wcs/manage-clusters/create#sandbox-clusters) instance
1. Go to the 'Embedding' tab (on the left column) and enable `Weaviate Embeddings`
1. Take note of the `REST Endpoint` and an `Admin` `API Key`.
1. Set `WEAVIATE_CLOUD_URL` to the `REST Endpoint` and `WEAVIATE_CLOUD_API_KEY` to the `Admin` `API Key`.
    - (Option 1): **If using Google Colab**
        - Set the values in the "Secrets" tab in the left column.
        
        <img src="../img/colab_secrets.png" alt="Set the secrets in the Colab Secrets tab" width="400"/>
    - (Option 2): **Using an environment with a copy of the repository**
        - Update the values in the `.env` file in the root directory of this repository.
        - Make sure to restart the Jupyter kernel after updating and saving the `.env` file.
        

Load our secrets (Weaviate URL & API key)

In [1]:
def is_colab():
    """Check if the current notebook is running in Google Colab."""
    try:
        import google.colab
        return True
    except ImportError:
        return False


if is_colab():
    from google.colab import userdata
    weaviate_url = userdata.get('WEAVIATE_CLOUD_URL')
    weaviate_api_key = userdata.get('WEAVIATE_CLOUD_API_KEY')
    print("Running in Colab, secrets retrieved. URL:", weaviate_url)

else:
    import os
    import dotenv

    dotenv.load_dotenv()

    # Remember to Update the .env file & RESTART the kernel (if running a local environment)
    weaviate_url = os.getenv("WEAVIATE_CLOUD_URL")
    weaviate_api_key = os.getenv("WEAVIATE_CLOUD_API_KEY")
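If either secret is missing, `os.getenv` silently returns `None`, and the error only surfaces later when connecting. Below is a minimal sketch of an early check for the local (`.env`) path; the helper name `require_secrets` is hypothetical, and it only reads the two environment variable names used above.

```python
import os

REQUIRED_SECRETS = ("WEAVIATE_CLOUD_URL", "WEAVIATE_CLOUD_API_KEY")


def require_secrets(names=REQUIRED_SECRETS):
    """Return {name: value} for each environment variable, or raise if any is missing or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        # Fail early with an actionable message instead of a later connection error
        raise RuntimeError(
            "Missing secrets: " + ", ".join(missing)
            + ". Update your .env file and restart the kernel."
        )
    return values
```

In Colab, the values come from `userdata.get` rather than the environment, so there you would check `weaviate_url` and `weaviate_api_key` directly instead.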

In [2]:
# If in Colab, install required packages like so:
# !pip install -Uqq weaviate-client[agents] datasets

# Otherwise, run the following command in your terminal:
# "pip install -r requirements.txt"

## Introduction

### Agenda

Let's talk about:
- What the Transformation Agent is
- What you can do with the Transformation Agent
- Some tips & tricks
- How to get started

### About the Transformation Agent

The *Weaviate Transformation Agent* is 

- A cloud-based service 
- for transforming your data in a Weaviate instance
- available for Weaviate Cloud users

It is also in technical preview (do **not** use it in production).

<center><img src="../img/agents_tech_preview.png" width="60%"></center>

> ⚠️ The Weaviate Transformation Agent modifies data objects in Weaviate. **While the Agent is in technical preview, do not use it in a production environment.** 
> 
> The Agent may not work as expected, and the data in your Weaviate instance may be affected in unexpected ways.

**What the Transformation Agent is**

<center><img src="../img/ta_obj.png" width="60%"></center>

The `TransformationAgent` can modify objects in a Weaviate collection to add new properties or update existing properties.

**What you can do with the Transformation Agent**

<center><img src="../img/ta_overview.png" width="60%"></center>

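You provide the `TransformationAgent` with instructions in natural language, along with a few required parameters (such as the name and data type of the property to write, and which existing properties it should read).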
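To make the idea concrete, here is a toy, purely local model of a single operation. It is not the Weaviate API and not how the agent is implemented; `ToyOperation`, `apply_operation`, and the fake "LLM" are all hypothetical. It only shows the shape of the contract: read some existing properties, follow an instruction, and write the result into one target property.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyOperation:
    """Hypothetical stand-in for a transformation operation (not the Weaviate API)."""
    property_name: str          # property to write
    view_properties: List[str]  # existing properties the "LLM" is allowed to read
    instruction: str            # natural-language instruction


def apply_operation(obj: Dict[str, object], op: ToyOperation,
                    llm: Callable[[str, Dict[str, object]], object]) -> Dict[str, object]:
    """Return a copy of obj with op.property_name set to the LLM's output.

    If the property is new, this behaves like an append; if it already exists,
    it behaves like an update that overwrites the old value.
    """
    context = {name: obj[name] for name in op.view_properties}  # only the viewed properties
    updated = dict(obj)  # shallow copy, so the caller's object is left unchanged
    updated[op.property_name] = llm(op.instruction, context)
    return updated


# Usage with a fake "LLM" that simply splits the abstract into words
paper = {"title": "A Paper", "abstract": "graphs kernels"}
tag_op = ToyOperation("topics", ["abstract"], "Create a list of topic tags")
tagged = apply_operation(paper, tag_op, lambda instruction, ctx: ctx["abstract"].split())
print(tagged["topics"])  # ['graphs', 'kernels']
```

The real agent replaces the fake `llm` with a hosted model and applies the operation to every object in the collection.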

## Preparation

Here, we use the `transformation-agent-papers` subset of the [**weaviate/agents**](https://huggingface.co/datasets/weaviate/agents) dataset.

It includes the titles and abstracts of 2,000 research papers.

First, we load the dataset & add it to Weaviate.

### Load dataset

In [3]:
from datasets import load_dataset

papers_dataset = load_dataset("weaviate/agents", "transformation-agent-papers", split="train")

In [4]:
print(papers_dataset.shape)
print(papers_dataset[0]["properties"].keys())

(2000, 2)
dict_keys(['abstract', 'title'])


In [5]:
for k, v in papers_dataset[0]["properties"].items():
    if len(v) > 100:
        v = v[:100] + "..."
    print(f"{k}: {v}")

abstract:   Astronomy is increasingly encountering two fundamental truths: (1) The field
is faced with the tas...
title: Discussion on "Techniques for Massive-Data Machine Learning in
  Astronomy" by A. Gray


Iterate through the data

In [6]:
columns = papers_dataset[0]["properties"].keys()

for i, item in enumerate(papers_dataset):
    if i < 2:
        properties = {
            col: item["properties"][col] for col in columns
        }
        print(properties)

{'abstract': "  Astronomy is increasingly encountering two fundamental truths: (1) The field\nis faced with the task of extracting useful information from extremely large,\ncomplex, and high dimensional datasets; (2) The techniques of astroinformatics\nand astrostatistics are the only way to make this tractable, and bring the\nrequired level of sophistication to the analysis. Thus, an approach which\nprovides these tools in a way that scales to these datasets is not just\ndesirable, it is vital. The expertise required spans not just astronomy, but\nalso computer science, statistics, and informatics. As a computer scientist and\nexpert in machine learning, Alex's contribution of expertise and a large number\nof fast algorithms designed to scale to large datasets, is extremely welcome.\nWe focus in this discussion on the questions raised by the practical\napplication of these algorithms to real astronomical datasets. That is, what is\nneeded to maximally leverage their potential to impro
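The loop above walks all 2,000 rows even though it only prints two. A small sketch using `itertools.islice` stops as soon as it has enough rows; the helper name `first_n_properties` is hypothetical. (Hugging Face `Dataset` objects also offer `.select(range(n))` for the same purpose.)

```python
from itertools import islice


def first_n_properties(rows, n, columns):
    """Build property dicts for the first n rows only, without scanning the rest."""
    return [{col: row["properties"][col] for col in columns} for row in islice(rows, n)]
```

For example, `for properties in first_n_properties(papers_dataset, 2, columns): print(properties)` prints the same two rows as the cell above.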

### Ingest data into Weaviate

#### Connect to Weaviate

In [7]:
weaviate_url

'https://1ree7zierqqrwwnif6b6ug.c0.europe-west3.gcp.weaviate.cloud'

In [8]:
import weaviate
from weaviate.classes.init import Auth

client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url, auth_credentials=Auth.api_key(weaviate_api_key)
)

assert client.is_ready()



#### Set up a collection

**Important:** Make sure to enable `Weaviate Embeddings` in the Weaviate Cloud console.

[See above](#prerequisites)

In [None]:
# Hint: https://weaviate.io/developers/weaviate/manage-data/collections#define-named-vectors

from weaviate.classes.config import Configure, Property, DataType

collection_name = "ArxivPapersDemo"

# Can delete the collection if you would like to (re)start fresh
client.collections.delete(collection_name)

if client.collections.exists(collection_name):
    # For re-running this tutorial, do nothing
    pass
else:
    client.collections.create(
        collection_name,
        description="A dataset that lists research paper titles and abstracts",
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="abstract", data_type=DataType.TEXT),
        ],
        vectorizer_config=[
            Configure.NamedVectors.text2vec_weaviate(
                name="default",
                source_properties=["title", "abstract"],
            )
        ]
    )

#### Add data to Weaviate

We loop through the data and add it to Weaviate. 

For the demo/workshop, we add only the first 50 rows for speed and simplicity.

In [None]:
papers_collection = client.collections.get(collection_name)
columns = papers_dataset[0]["properties"].keys()


# Hint: https://weaviate.io/developers/weaviate/manage-data/import#basic-import
with papers_collection.batch.fixed_size(100) as batch:
    for i, item in enumerate(papers_dataset):
        if i < 50:
            properties = {col: item["properties"][col] for col in columns}
            batch.add_object(properties=properties)


if papers_collection.batch.failed_objects:
    for fo in papers_collection.batch.failed_objects[:3]:
        print(fo.message)
        print(fo.object_)
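Because the import loop does not set object IDs, Weaviate assigns random ones, so re-running the import without first deleting the collection would add duplicates. One common remedy is a deterministic ID derived from the object's contents. Below is a minimal sketch using the standard library; `deterministic_uuid` is a hypothetical helper, and we believe the Weaviate client's `weaviate.util.generate_uuid5` serves the same purpose.

```python
import json
import uuid


def deterministic_uuid(properties: dict) -> str:
    """Derive a stable version-5 UUID from an object's properties."""
    # sort_keys makes the JSON (and therefore the UUID) independent of key order
    payload = json.dumps(properties, sort_keys=True, ensure_ascii=False)
    return str(uuid.uuid5(uuid.NAMESPACE_URL, payload))
```

Inside the batch loop you would then call `batch.add_object(properties=properties, uuid=deterministic_uuid(properties))`. In our understanding, a batch import with an ID that already exists replaces that object rather than adding a new one, but check this against the import documentation before relying on it.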

In [11]:
len(papers_collection)

50

#### Inspect the collection 



In [None]:
# Hint: https://weaviate.io/developers/weaviate/search/basics#limit-returned-objects
response = papers_collection.query.fetch_objects(
    limit=3,
    include_vector=True
)

for o in response.objects:
    for k, v in o.properties.items():
        print(f"{k}: {v[:50]}")
    print()
    print(o.vector["default"][:10])  # No need to print the entire vector

abstract:   We consider finite horizon Markov decision proce
title: Mean-Variance Optimization in Markov Decision Proc

[-0.0181732177734375, -0.03717041015625, 0.0040740966796875, 0.006504058837890625, -0.0855712890625, 0.049957275390625, 0.00616455078125, -0.003971099853515625, -0.00991058349609375, -0.0213775634765625]
abstract:   We propose a new approach to value function appr
title: Predictive State Temporal Difference Learning

[-0.0650634765625, -0.040313720703125, 0.0125732421875, 0.0020351409912109375, -0.06591796875, 0.037384033203125, 0.06390380859375, 0.056365966796875, 0.0447998046875, -0.01021575927734375]
abstract:   We study the problem of dynamic spectrum sensing
title: Algorithms for Dynamic Spectrum Access with Learni

[-0.0162200927734375, -0.0203399658203125, 0.031494140625, 0.0002772808074951172, -0.05853271484375, -0.030029296875, 0.05706787109375, -0.07281494140625, 0.0119476318359375, 0.0028362274169921875]


**Alternative: Use the `Explorer` cloud tool**

In the Weaviate Cloud Console, click the `Explorer` tab in the left column.

When you click on an object, you should see two properties:
- `title`
- `abstract`

You should also see its `default` vector.

## Using the original dataset


### Can you find what you need?

Can you find papers about a specific topic (e.g. machine learning)?

In [None]:
# https://weaviate.io/developers/weaviate/search/similarity#search-with-text
response = papers_collection.query.near_text(
    # STUDENT TODO: Can you think of a semantic search query to find papers about a topic?
    # (Note: It may be very difficult / impossible!)
    query="machine learning",
    limit=5
)

for o in response.objects:
    print(o.properties["title"])

Probabilistic Approach to Neural Networks Computation Based on Quantum
  Probability Model Probabilistic Principal Subspace Analysis Example
Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based
  Search
Discussion on "Techniques for Massive-Data Machine Learning in
  Astronomy" by A. Gray
Transfer Learning Using Feature Selection
Bayesian Active Learning for Classification and Preference Learning


Can you filter for only the papers whose main topic is something specific (e.g. classification)?

In [None]:
# Hint: https://weaviate.io/developers/weaviate/search/filters
from weaviate.classes.query import Filter

response = papers_collection.query.fetch_objects(
    limit=3,
    ## STUDENT TODO: Can you think of a filter that will only return papers about classification?
    # (Note: It may be very difficult / impossible!)
    filters=(
        Filter.by_property("abstract").like("*classification*") |
        Filter.by_property("title").like("*classification*")
    )
)

for o in response.objects:
    print(o.properties["title"])
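A rough local analogy shows why a wildcard filter is a poor proxy for "main topic". The sketch uses Python's `fnmatch`, which is *not* how Weaviate evaluates `like` (Weaviate matches against tokenized text, so details such as case handling may differ), and the sample abstracts are invented. A pattern only finds the literal string: related wording such as "classifier" slips through, and a paper that merely mentions classification still matches.

```python
from fnmatch import fnmatchcase


def wildcard_matches(texts, pattern):
    """Return the texts matching a `*`/`?` wildcard pattern (case-sensitive)."""
    return [t for t in texts if fnmatchcase(t, pattern)]


abstracts = [
    "We study classification of radar returns",
    "A new classifier for spectra",
    "Classification with kernels",
    "Value function approximation",
]
print(wildcard_matches(abstracts, "*classification*"))
```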

### Does your data meet your needs?

What if: 
- The data is in the wrong language?
- Each abstract is too long?

Would you want to run a RAG query every time you need this information?




## Try the Weaviate Transformation Agent 

### Task 1: Create a `topics` property

Define the operation(s) that you want to perform on the data.

In [15]:
prompt_create_topics = """
Create a list of topic tags based on the abstract.
Topics should be distinct from each other. Provide a maximum of 5 topics.
Group similar topics under one topic tag.
"""

In [None]:
# Hint: https://weaviate.io/developers/agents/transformation/usage#define-transformation-operations
from weaviate.agents.classes import Operations

add_topics = Operations.append_property(
    property_name="topics",             # Property to create
    data_type=DataType.TEXT_ARRAY,      # Data type of the property
    view_properties=["abstract"],       # Existing properties to view for the operation
    instruction=prompt_create_topics,   # Instruction to the Transformation Agent
)

Instantiate the agent & start the operations

In [None]:
# Hint: https://weaviate.io/developers/agents/transformation/usage#start-the-transformation-operations
from weaviate.agents.transformation import TransformationAgent

ta = TransformationAgent(
    client=client,              # Weaviate client object
    collection=collection_name, # Collection name
    operations=[add_topics]     # List of transform operations
)

ta_response = ta.update_all()

What does the response look like?

In [18]:
ta_response

TransformationResponse(workflow_id='TransformationWorkflow-a40cb09222d53c5448420d65183456f4')

The response contains the unique `workflow_id` of the transformation workflow.

This does not mean that the operations are finished!

**The Transformation Agent is asynchronous**. You can check the status of the operation using the `workflow_id`.

In [19]:
ta.get_status(workflow_id=ta_response.workflow_id)

{'workflow_id': 'TransformationWorkflow-a40cb09222d53c5448420d65183456f4',
 'status': {'batch_count': 0,
  'end_time': None,
  'start_time': '2025-03-25 15:47:49',
  'state': 'running',
  'total_duration': None,
  'total_items': 0}}

We can check periodically whether the operation is done:

In [20]:
def get_ta_status(agent_instance, workflow_id):
    # Rough code to check the status of the TA workflow
    import time
    from datetime import datetime

    while True:
        status = agent_instance.get_status(workflow_id=workflow_id)

        if status["status"]["state"] != "running":
            break

        # Calculate elapsed time from start_time
        start = datetime.strptime(status["status"]["start_time"], "%Y-%m-%d %H:%M:%S")
        elapsed = (datetime.now() - start).total_seconds()

        print(f"Waiting... Elapsed time: {elapsed:.2f} seconds")
        time.sleep(10)

    # Calculate total time
    if status["status"]["total_duration"]:
        total = status["status"]["total_duration"]
    else:
        start = datetime.strptime(status["status"]["start_time"], "%Y-%m-%d %H:%M:%S")
        end = datetime.now() if not status["status"]["end_time"] else datetime.strptime(status["status"]["end_time"], "%Y-%m-%d %H:%M:%S")
        total = (end - start).total_seconds()

    print(f"Total time: {total:.2f} seconds")
    print(status)

In [21]:
get_ta_status(agent_instance=ta, workflow_id=ta_response.workflow_id)

Waiting... Elapsed time: 2.04 seconds
Waiting... Elapsed time: 12.60 seconds
Waiting... Elapsed time: 23.09 seconds
Total time: 28.92 seconds
{'workflow_id': 'TransformationWorkflow-a40cb09222d53c5448420d65183456f4', 'status': {'batch_count': 1, 'end_time': '2025-03-25 15:48:18', 'start_time': '2025-03-25 15:47:49', 'state': 'completed', 'total_duration': 28.917139, 'total_items': 50}}
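`get_ta_status` loops forever if a workflow never leaves the `running` state, and its elapsed time compares the server's `start_time` with your local clock, which may be skewed if the two use different timezones. Below is a minimal sketch of a variant with a timeout that measures time locally; `wait_for_workflow` is a hypothetical helper that accepts any zero-argument callable returning a status dict shaped like the output above.

```python
import time
from typing import Callable, Dict


def wait_for_workflow(get_status: Callable[[], Dict], interval: float = 10.0,
                      timeout: float = 600.0) -> Dict:
    """Poll get_status() until the state is no longer 'running', or raise TimeoutError."""
    deadline = time.monotonic() + timeout  # monotonic clock: unaffected by timezones or clock changes
    while True:
        status = get_status()
        if status["status"]["state"] != "running":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Workflow still running after {timeout} seconds")
        time.sleep(interval)
```

With the agent from this notebook, a call would look like `wait_for_workflow(lambda: ta.get_status(workflow_id=ta_response.workflow_id))`.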


**How the Transformation Agent works**

<center><img src="../img/ta_schematic.png" width="60%"></center>

The `TransformationAgent` connects to your Weaviate Cloud instance and uses LLMs to carry out the instructions you provide.

Once the operation is complete, let's see what we can do with the data:

In [22]:
from weaviate.classes.query import Metrics

response = papers_collection.aggregate.over_all(
    return_metrics=Metrics("topics").text(
        top_occurrences_count=True,
        top_occurrences_value=True,
        min_occurrences=10  # returns (at most) the 10 most frequent values
    )
)

for t in response.properties["topics"].top_occurrences:
    print(t)

TopOccurrence(count=36, value='Machine Learning')
TopOccurrence(count=8, value='Computer Science')
TopOccurrence(count=7, value='Data Analysis')
TopOccurrence(count=7, value='Statistics')
TopOccurrence(count=6, value='Optimization')
TopOccurrence(count=5, value='Algorithms')
TopOccurrence(count=5, value='Artificial Intelligence')
TopOccurrence(count=5, value='Mathematics')
TopOccurrence(count=4, value='Classification')
TopOccurrence(count=4, value='Reinforcement Learning')
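For intuition, the aggregation tallies how often each distinct value appears across the objects' `topics` arrays. Below is a local equivalent using `collections.Counter`, assuming each array element counts once; `top_values` is a hypothetical helper, handy for spot-checking small collections.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_values(objects: Iterable[Dict], prop: str, limit: int = 10) -> List[Tuple[str, int]]:
    """Count how often each value of a text-array property occurs across objects."""
    counts = Counter()
    for props in objects:
        counts.update(props.get(prop) or [])  # skip objects where the property is missing
    return counts.most_common(limit)
```

For example: `top_values((o.properties for o in papers_collection.query.fetch_objects(limit=50).objects), "topics")`.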


Try to filter for papers with particular topics:

In [None]:
# Hint: https://weaviate.io/developers/weaviate/search/filters
from weaviate.classes.query import Filter

response = papers_collection.query.fetch_objects(
    limit=3,
    filters=Filter.by_property("topics").like("*machine*")
)

for o in response.objects:
    print(o.properties["title"])

Bayesian and L1 Approaches to Sparse Unsupervised Learning
Adapting to Non-stationarity with Growing Expert Ensembles
Probabilistic Approach to Neural Networks Computation Based on Quantum
  Probability Model Probabilistic Principal Subspace Analysis Example


Inspect a few objects again:

In [24]:
response = papers_collection.query.fetch_objects(
    limit=3,
)

for o in response.objects:
    for k, v in o.properties.items():
        print(f"{k}: {v[:50]}")
    print()

abstract:   We consider finite horizon Markov decision proce
title: Mean-Variance Optimization in Markov Decision Proc
topics: ['Markov Decision Processes', 'Performance Measures', 'Pseudopolynomial Algorithms', 'Optimization', 'Computational Complexity']

abstract:   We propose a new approach to value function appr
topics: ['Reinforcement Learning', 'Subspace Identification', 'Machine Learning', 'Value Function Approximation', 'Predictive State Temporal Difference']
title: Predictive State Temporal Difference Learning

abstract:   We study the problem of dynamic spectrum sensing
title: Algorithms for Dynamic Spectrum Access with Learni
topics: ['Cognitive Radio', 'Markov Decision Process', 'Wireless Communication', 'Machine Learning', 'Statistics']



### Task 2: Perform multiple operations

- Add a `paper_type` property (e.g. `survey`, `model`, `resource`)
- Add a boolean property `about_classification` (True/False)
- Update the `title` property to append its French translation

In [None]:
prompt_paper_type = """
Determine the primary type of paper based on the abstract. Assign exactly one of the following categories that best represents the paper's main contribution:

'survey':   Comprehensive review or meta-analysis of existing work in a field
'model':    Introduction of a new predictive model, statistical method, or algorithmic approach
'system':   Description of a new data pipeline, workflow, framework, or system architecture
'analysis': Focused on insights derived from analyzing data
'resource': Introduction of a new dataset, benchmark, or tool for data science
'other':    None of the above
"""

add_paper_type = Operations.append_property(
    ## STUDENT TODO: Can you complete defining the following operation code?
    property_name="paper_type",
    data_type=DataType.TEXT,
    view_properties=["abstract"],
    instruction=prompt_paper_type,
)

In [None]:
prompt_about_classification = """
Based on the abstract, determine whether the paper is
primarily about classification, in the machine learning sense.

Do not include papers that are obliquely, or vaguely about classification.
"""

add_about_classification_bool = Operations.append_property(
    ## STUDENT TODO: Can you complete defining the following operation code?
    property_name="about_classification",
    data_type=DataType.BOOL,
    view_properties=["abstract"],
    instruction=prompt_about_classification,
)

In [None]:
prompt_add_french_title_suffix = """
Update the title to ensure that it contains the French translation of itself in parentheses, after the original title.
"""

update_title = Operations.update_property(
    ## STUDENT TODO: Can you complete defining the following operation code?
    property_name="title",
    view_properties=["title"],
    instruction=prompt_add_french_title_suffix,
)

In [None]:
from weaviate.agents.transformation import TransformationAgent

ta = TransformationAgent(
    client=client,
    ## STUDENT TODO: Can you complete defining the following agent definition
    collection=collection_name,
    operations=[
        update_title,
        add_paper_type,
        add_about_classification_bool
    ],
)

ta_response = ta.update_all()

Note that this still returns a single response, with one workflow ID, even though we are performing multiple operations.

In [None]:
## STUDENT TODO: Do you remember how to fetch the status of the TA workflow?
ta.get_status(workflow_id=ta_response.workflow_id)

{'workflow_id': 'TransformationWorkflow-b9ff4ff1e9ece12fb74d8c5edb113777',
 'status': {'batch_count': 0,
  'end_time': None,
  'start_time': '2025-03-25 15:48:24',
  'state': 'running',
  'total_duration': None,
  'total_items': 0}}

Let's monitor the operation as before:

In [30]:
get_ta_status(agent_instance=ta, workflow_id=ta_response.workflow_id)

Waiting... Elapsed time: 1.64 seconds
Waiting... Elapsed time: 12.13 seconds
Total time: 16.51 seconds
{'workflow_id': 'TransformationWorkflow-b9ff4ff1e9ece12fb74d8c5edb113777', 'status': {'batch_count': 1, 'end_time': '2025-03-25 15:48:40', 'start_time': '2025-03-25 15:48:24', 'state': 'completed', 'total_duration': 16.509024, 'total_items': 50}}


And again, inspect a few transformed objects:

In [31]:
response = papers_collection.query.fetch_objects(
    limit=3,
)

for o in response.objects:
    for k, v in o.properties.items():
        if isinstance(v, str) and len(v) > 50:
            v = v[:50] + "..."
        print(f"{k}: {v}")
    print()

abstract:   We consider finite horizon Markov decision proce...
title: Mean-Variance Optimization in Markov Decision Proc...
topics: ['Markov Decision Processes', 'Performance Measures', 'Pseudopolynomial Algorithms', 'Optimization', 'Computational Complexity']
paper_type: analysis
about_classification: False

abstract:   We propose a new approach to value function appr...
topics: ['Reinforcement Learning', 'Subspace Identification', 'Machine Learning', 'Value Function Approximation', 'Predictive State Temporal Difference']
paper_type: model
title: Predictive State Temporal Difference Learning (App...
about_classification: False

abstract:   We study the problem of dynamic spectrum sensing...
title: Algorithms for Dynamic Spectrum Access with Learni...
topics: ['Cognitive Radio', 'Markov Decision Process', 'Wireless Communication', 'Machine Learning', 'Statistics']
paper_type: model
about_classification: False



We see it did, in fact, perform all the specified transformation operations.

We can now use these improved properties to perform new queries. 

- e.g. what paper types do we have?

In [None]:
# https://weaviate.io/developers/weaviate/search/aggregate#aggregate-text-properties
from weaviate.classes.query import Metrics

response = papers_collection.aggregate.over_all(
    return_metrics=Metrics("paper_type").text(
        top_occurrences_count=True,
        top_occurrences_value=True,
        min_occurrences=10
    )
)

for t in response.properties["paper_type"].top_occurrences:
    print(t)

TopOccurrence(count=32, value='model')
TopOccurrence(count=16, value='analysis')
TopOccurrence(count=1, value='other')
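The counts above add up to 49, while the collection holds 50 objects. That hints (but does not prove) that one object may not have a `paper_type` value. Since LLM output is not guaranteed to respect the categories listed in `prompt_paper_type`, a quick local check is worthwhile. This is a minimal sketch; `invalid_paper_types` is hypothetical and `ALLOWED_PAPER_TYPES` is copied from the prompt.

```python
from typing import Dict, Iterable, List

# Categories copied from prompt_paper_type
ALLOWED_PAPER_TYPES = {"survey", "model", "system", "analysis", "resource", "other"}


def invalid_paper_types(objects: Iterable[Dict]) -> List[Dict]:
    """Return objects whose paper_type is missing or not one of the allowed categories."""
    return [props for props in objects if props.get("paper_type") not in ALLOWED_PAPER_TYPES]
```

For example: `invalid_paper_types(o.properties for o in papers_collection.query.fetch_objects(limit=50).objects)`.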


How many papers are about classification?

In [None]:
# Hint: https://weaviate.io/developers/weaviate/search/aggregate#filter-results
from weaviate.classes.query import Filter

response = papers_collection.aggregate.over_all(
    filters=Filter.by_property("about_classification").equal(True),
)

response.total_count

11

In [34]:
from weaviate.classes.query import Filter

response = papers_collection.query.fetch_objects(
    filters=Filter.by_property("about_classification").equal(True),
    limit=10
)

for o in response.objects:
    print(o.properties["title"])

Transfer Learning Using Feature Selection (Apprentissage transfert à l'aide de la sélection de caractéristiques)
Using a Kernel Adatron for Object Classification with RCS Data (Utiliser un Adatron pour la classification d'objets avec des données RCS)
Bayesian Active Learning for Classification and Preference Learning (Apprentissage Actif Bayésien pour la Classification et l'Apprentissage des Préférences)
Fast Inference in Sparse Coding Algorithms with Applications to Object (Inférence rapide dans les algorithmes de codage parcime avec applications à la reconnaissance d'objets)
An Explicit Nonlinear Mapping for Manifold Learning (Une Carte Non Linéaire Explicite pour l'Apprentissage de la Variété)
Bayesian Active Distance Metric Learning
Mutual information for the selection of relevant variables in spectrometric nonlinear modelling (Information mutuelle pour la sélection de variables pertinentes dans le modélisation non linéaire spectrométrique)
Optimizing F-measure: A Tale of Two Appro

What about intersections of multiple properties?

In [35]:
from weaviate.classes.query import Filter

response = papers_collection.aggregate.over_all(
    filters=(
        Filter.by_property("paper_type").equal("model") &
        Filter.by_property("about_classification").equal(True)
    )
)

response.total_count

7

Let's take a look at a few:

In [36]:
from weaviate.classes.query import Filter

response = papers_collection.query.near_text(
    query="vector",
    filters=(
        Filter.by_property("paper_type").equal("model") &
        Filter.by_property("about_classification").equal(True)
    )
)

for o in response.objects:
    print(o.properties["title"])

Using a Kernel Adatron for Object Classification with RCS Data (Utiliser un Adatron pour la classification d'objets avec des données RCS)
Fast Inference in Sparse Coding Algorithms with Applications to Object (Inférence rapide dans les algorithmes de codage parcime avec applications à la reconnaissance d'objets)
A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets (Une méthode de gradient stochastique avec un taux de convergence exponentiel pour les ensembles d entraînement finis)
Distribution-Specific Agnostic Boosting
Bayesian Active Distance Metric Learning
An Explicit Nonlinear Mapping for Manifold Learning (Une Carte Non Linéaire Explicite pour l'Apprentissage de la Variété)
Bayesian Active Learning for Classification and Preference Learning (Apprentissage Actif Bayésien pour la Classification et l'Apprentissage des Préférences)


## Bonus: Use the Query Agent

The Weaviate [Query Agent](https://weaviate.io/developers/agents/query) is another agentic service on Weaviate Cloud. The Query Agent allows you to query your Weaviate instance using natural language.

In [None]:
# Hint: https://weaviate.io/developers/agents/query/usage#1-instantiate-the-query-agent
from weaviate.agents.query import QueryAgent

qa = QueryAgent(
    client=client, collections=[collection_name]
)

Now, we can just tell the Query Agent to do the hard & boring stuff (syntax lookup!) for us.

In [None]:
# Hint: https://weaviate.io/developers/agents/query/usage#2-perform-queries
response = qa.run(
    """
    Find papers that are about classification. Tell me about some of them.
    Hint: There is a property called 'about_classification' that you can use.
    """,
)

# Print the response
response.display()





In [39]:
# Perform a query
response = qa.run(
    """
    How many papers are primarily about models?

    Hint: There is a property called 'paper_type' where the available values are: 'survey', 'model', 'system', 'analysis', 'resource', 'other'.
    """
)

# Print the response
response.display()





We can even ask it follow-up queries:

In [40]:
followup_response = qa.run(
    query="Can you select one or two of these papers and explain them in simple terms? I am not a data scientist.", context=response
)

followup_response.display()





Read more about the [Query Agent](https://weaviate.io/blog/query-agent) on our blog.

## Bonus: Current limitations

Remember that the Transformation Agent is being asked to update data objects for us. So, be very careful with the instructions you provide.

And currently, it is in technical preview. Do not use it in a production environment (*yet* 😉).

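- Do not run multiple agents at the same time: this can cause conflicts (race conditions).
- There is a limit of 10,000 operations per day per Weaviate Cloud organization.

The cells below demonstrate the first point: we recreate the collection with only 5 objects, then start three agents in quick succession so that they run at the same time.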

In [41]:
from weaviate.classes.config import Configure, Property, DataType

collection_name = "ArxivPapersDemo"

# Can delete the collection if you would like to (re)start fresh
client.collections.delete(collection_name)

client.collections.create(
    collection_name,
    description="A dataset that lists research paper titles and abstracts",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="abstract", data_type=DataType.TEXT),
    ],
    vectorizer_config=[
        Configure.NamedVectors.text2vec_weaviate(
            name="default",
            source_properties=["title", "abstract"],
        )
    ]
)

papers_collection = client.collections.get(collection_name)
columns = papers_dataset[0]["properties"].keys()

with papers_collection.batch.fixed_size(100) as batch:
    for i, item in enumerate(papers_dataset):
        if i < 5:
            properties = {col: item["properties"][col] for col in columns}
            batch.add_object(properties=properties)


if papers_collection.batch.failed_objects:
    for fo in papers_collection.batch.failed_objects[:3]:
        print(fo.message)
        print(fo.object_)

len(papers_collection)

5

In [42]:
from weaviate.agents.transformation import TransformationAgent

responses = []
new_languages = ["spanish", "german", "italian"]

for lang in new_languages:

    prompt_task = f"""
    Create a {lang} version of the abstract
    """

    task = Operations.append_property(
        property_name=f"test_{lang}_abstract",
        data_type=DataType.TEXT,
        view_properties=["abstract"],
        instruction=prompt_task,
    )

    ta = TransformationAgent(
        client=client,
        collection=collection_name,
        operations=[task],
    )

    ta_response = ta.update_all()
    responses.append(ta_response)

print(responses)

[TransformationResponse(workflow_id='TransformationWorkflow-4564ffc201eefcca5145f209ce744ef8'), TransformationResponse(workflow_id='TransformationWorkflow-0608e9ce0bd8881674b282cc4ccf8813'), TransformationResponse(workflow_id='TransformationWorkflow-9719ea0dcbab0ae71700d257fe1df73a')]


In [43]:
for r in responses:
    get_ta_status(agent_instance=ta, workflow_id=r.workflow_id)

Waiting... Elapsed time: 4.12 seconds
Total time: 12.08 seconds
{'workflow_id': 'TransformationWorkflow-4564ffc201eefcca5145f209ce744ef8', 'status': {'batch_count': 1, 'end_time': '2025-03-25 15:49:28', 'start_time': '2025-03-25 15:49:16', 'state': 'completed', 'total_duration': 12.080793, 'total_items': 5}}
Total time: 14.09 seconds
{'workflow_id': 'TransformationWorkflow-0608e9ce0bd8881674b282cc4ccf8813', 'status': {'batch_count': 1, 'end_time': '2025-03-25 15:49:32', 'start_time': '2025-03-25 15:49:18', 'state': 'completed', 'total_duration': 14.088394, 'total_items': 5}}
Total time: 14.15 seconds
{'workflow_id': 'TransformationWorkflow-9719ea0dcbab0ae71700d257fe1df73a', 'status': {'batch_count': 1, 'end_time': '2025-03-25 15:49:33', 'start_time': '2025-03-25 15:49:19', 'state': 'completed', 'total_duration': 14.148659, 'total_items': 5}}


If these operations worked perfectly, all objects should have all new properties (`test_spanish_abstract`, `test_german_abstract`, `test_italian_abstract`). 

In [48]:
response = papers_collection.query.fetch_objects(
    limit=50
)

properties = []
for o in response.objects:
    for p in o.properties:
        if p not in properties:
            properties.append(p)
            print(f"Found property: {p} in object UUID: {o.uuid}")

print("\nNow checking for empty properties...")
for o in response.objects:
    for p in properties:
        if o.properties[p] is None or o.properties[p] == "":
            print(f"Property {p} is empty in object UUID: {o.uuid}")

Found property: abstract in object UUID: 3a80e1d0-98a1-4bda-81ae-84e5877460ba
Found property: title in object UUID: 3a80e1d0-98a1-4bda-81ae-84e5877460ba
Found property: test_german_abstract in object UUID: 3a80e1d0-98a1-4bda-81ae-84e5877460ba
Found property: test_spanish_abstract in object UUID: 3a80e1d0-98a1-4bda-81ae-84e5877460ba
Found property: test_italian_abstract in object UUID: 3a80e1d0-98a1-4bda-81ae-84e5877460ba

Now checking for empty properties...
Property test_german_abstract is empty in object UUID: 8df29147-d56c-4fd9-8ed2-73442e10d368
Property test_spanish_abstract is empty in object UUID: 8df29147-d56c-4fd9-8ed2-73442e10d368
Property test_german_abstract is empty in object UUID: 9e8a9082-5931-4523-b021-2cd9141343f2


Because we have very few objects, the three agents worked on the same objects at the same time, and some of the new properties ended up empty.

This shouldn't happen much in a real-world scenario, but it's something to keep in mind.
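We don't know how the agent writes its results internally, so the following is only a generic illustration of a *lost update*, one common way concurrent read-modify-write cycles lose data: two workers read the same object, each adds its own property, and the second full write replaces the first. The dict-based store and the `read_object`/`write_object` helpers are hypothetical.

```python
import copy
from typing import Dict


def read_object(store: Dict[str, Dict], uid: str) -> Dict:
    """Return a private copy of an object, like fetching it over the network."""
    return copy.deepcopy(store[uid])


def write_object(store: Dict[str, Dict], uid: str, obj: Dict) -> None:
    """Replace the whole stored object (a full overwrite, not a merge)."""
    store[uid] = obj


store = {"paper-1": {"abstract": "..."}}

# Two "agents" read the same object before either one writes back
german_view = read_object(store, "paper-1")
spanish_view = read_object(store, "paper-1")

german_view["test_german_abstract"] = "(German text)"
spanish_view["test_spanish_abstract"] = "(Spanish text)"

write_object(store, "paper-1", german_view)
write_object(store, "paper-1", spanish_view)  # overwrites the German result

print(sorted(store["paper-1"]))  # ['abstract', 'test_spanish_abstract']
```

The workshop's own advice avoids this entirely: pass all operations to one agent in a single `operations=[...]` list (as in Task 2), or wait for each workflow to finish before starting the next.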

## Further resources

- Blog: ["Introducing the Weaviate Transformation Agent"](https://weaviate.io/blog/transformation-agent)
- Documentation: [Weaviate Transformation Agent](https://weaviate.io/developers/agents/transformation)