<picture>
  <source media="(prefers-color-scheme: dark)" srcset="https://vespa.ai/assets/vespa-ai-logo-heather.svg">
  <source media="(prefers-color-scheme: light)" srcset="https://vespa.ai/assets/vespa-ai-logo-rock.svg">
  <img alt="#Vespa" width="200" src="https://vespa.ai/assets/vespa-ai-logo-rock.svg" style="margin-bottom: 25px;">
</picture>

# Feeding to Vespa Cloud

In our [previous notebook](https://pyvespa.readthedocs.io/en/latest/examples/feed_performance.html), we demonstrated one way of benchmarking feed performance to a local Vespa instance running in Docker.
In this notebook, we will look at the same methods and see how feeding to [Vespa Cloud](https://cloud.vespa.ai) affects their performance.

The key difference between feeding to a local Vespa instance and a Vespa Cloud instance is the network latency.
Additionally, each document carries a precomputed 1024-dimensional embedding that Vespa indexes at feed time, which is a realistic scenario for many use cases.

We will look at these 3 different methods:

1. Using `feed_iterable()` - which uses threading to parallelize the feed operation. Best for CPU-bound operations.
2. Using `feed_async_iterable()` - which uses asyncio to parallelize the feed operation. Also uses `httpx` with HTTP/2-support. Performs best for IO-bound operations.
3. Using [Vespa CLI](https://docs.vespa.ai/en/vespa-cli).
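
Since network latency is the key difference here, a rough back-of-envelope model can show why the number of requests in flight matters so much. The sketch below is an idealization and not a measurement: it ignores server-side processing, bandwidth, and client overhead, and both the function name `estimated_feed_seconds` and the 150 ms round-trip time are assumptions chosen for illustration.

```python
import math


def estimated_feed_seconds(num_docs: int, latency_s: float, in_flight: int) -> float:
    """Idealized feed time when each request takes `latency_s` seconds
    and at most `in_flight` requests are outstanding at once."""
    if num_docs < 0 or latency_s < 0 or in_flight < 1:
        raise ValueError("num_docs and latency_s must be >= 0, in_flight >= 1")
    rounds = math.ceil(num_docs / in_flight)  # batches of concurrent requests
    return rounds * latency_s


# Hypothetical numbers: 1000 documents, 150 ms round trip.
sequential = estimated_feed_seconds(1000, 0.150, in_flight=1)
concurrent = estimated_feed_seconds(1000, 0.150, in_flight=64)
print(f"sequential: {sequential:.1f}s, 64 in flight: {concurrent:.1f}s")
```

Under these assumptions, keeping 64 requests in flight cuts the idealized feed time from about 150 seconds to about 2.4 seconds. Real throughput will be lower, but the model explains why concurrency settings dominate when the endpoint is remote.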


<div class="alert alert-info">
    Refer to <a href="https://pyvespa.readthedocs.io/en/latest/troubleshooting.html">troubleshooting</a>
    for any problem when running this guide.
</div>


Install [Vespa CLI](https://docs.vespa.ai/en/vespa-cli.html).
The `vespacli` python package is just a thin wrapper, allowing for installation through pypi.

> Do NOT install if you already have the Vespa CLI installed.


[Install pyvespa](https://pyvespa.readthedocs.io/), and other dependencies.


In [1]:
!pip3 install vespacli pyvespa datasets "plotly>=5.20"


## Create an application package

The [application package](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.package.ApplicationPackage)
has all the Vespa configuration files.

For this demo, we will use a simple application package with a single schema.


In [2]:
from vespa.package import (
    ApplicationPackage,
    Field,
    Schema,
    Document,
    FieldSet,
    HNSW,
)

# Define the application name (can NOT contain `_` or `-`)

application = "feedperformancecloud"


package = ApplicationPackage(
    name=application,
    schema=[
        Schema(
            name="doc",
            document=Document(
                fields=[
                    Field(name="id", type="string", indexing=["summary"]),
                    Field(name="text", type="string", indexing=["index", "summary"]),
                    Field(
                        name="embedding",
                        type="tensor<float>(x[1024])",
                        # Note that we are NOT embedding with a vespa model here, but that is also possible.
                        indexing=["summary", "attribute", "index"],
                        ann=HNSW(distance_metric="angular"),
                    ),
                ]
            ),
            fieldsets=[FieldSet(name="default", fields=["text"])],
        )
    ],
)

Note that the `ApplicationPackage` name cannot have `-` or `_`.
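
If the application name is derived from something else, such as a project or branch name, a small helper can enforce this constraint before deployment. The helper names below are hypothetical, and the sketch checks only the rule stated above; Vespa may enforce additional naming rules that are not covered here.

```python
def sanitize_application_name(name: str) -> str:
    """Drop the characters the text above says are not allowed (`-` and `_`)."""
    return name.replace("-", "").replace("_", "")


def is_allowed_application_name(name: str) -> bool:
    """Check only the constraint stated above; Vespa may enforce more rules."""
    return "-" not in name and "_" not in name


print(sanitize_application_name("feed_performance-cloud"))
```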


## Deploy the Vespa application

Deploy `package` to Vespa Cloud,
without leaving the notebook, by creating an instance of
[VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud). `VespaCloud` authenticates against the Vespa Cloud control plane,
either with an API key or through an interactive login.

If this step fails, please check
that the tenant name is correct and that your control-plane key has been added to the tenant.


Follow the instructions in the cell output and add the control-plane key in the console at `https://console.vespa-cloud.com/tenant/TENANT_NAME/account/keys`
(replace TENANT_NAME with your tenant name).


In [3]:
from vespa.deployment import VespaCloud
from vespa.application import Vespa
import os


def read_secret():
    """Read the API key from the environment variable. This is
    only used for CI/CD purposes."""
    t = os.getenv("VESPA_TEAM_API_KEY")
    if t:
        return t.replace(r"\n", "\n")
    else:
        return t


vespa_cloud = VespaCloud(
    tenant="vespa-team",
    application=application,
    key_content=read_secret()
    if read_secret()
    else None,  # Can be removed for interactive control-plane login
    application_package=package,
)

Setting application...
Running: vespa config set application vespa-team.feedperformancecloud
Setting target cloud...
Running: vespa config set target cloud

Api-key found for control plane access. Using api-key.


`vespa_cloud` now holds a reference to a [VespaCloud](https://pyvespa.readthedocs.io/en/latest/reference-api.html#vespa.deployment.VespaCloud) instance. Calling `deploy()` on it returns a `Vespa` application instance, which we store in `app`.


In [4]:
app: Vespa = vespa_cloud.deploy()  # deploy

Only region: aws-us-east-1c available in dev environment.
Found mtls endpoint for feedperformancecloud_container
URL: https://bf40f77a.bc737822.z.vespa-app.cloud/
Connecting to https://bf40f77a.bc737822.z.vespa-app.cloud/


In [28]:
vcapp = vespa_cloud.get_application()

Only region: aws-us-east-1c available in dev environment.
Found mtls endpoint for feedperformancecloud_container
URL: https://bf40f77a.bc737822.z.vespa-app.cloud/
Connecting to https://bf40f77a.bc737822.z.vespa-app.cloud/


Note that if you already have a Vespa Cloud instance running, the recommended way to initialize a `Vespa` instance is directly, by passing the `url` and `tenant` parameters to the `Vespa` constructor, along with either:

1. Key/cert for data-plane authentication. These are generated as part of deployment and copied into the application package at `/security/clients.pem`, as well as to `~/.vespa/mytenant.myapplication/data-plane-public-cert.pem` and `~/.vespa/mytenant.myapplication/data-plane-private-key.pem`.

```python
from vespa.application import Vespa

app: Vespa = Vespa(
    url="https://my-endpoint.z.vespa-app.cloud",
    tenant="my-tenant",
    key_file="path/to/private-key.pem",
    cert_file="path/to/certificate.pem",
)
```

2. Using a token (must be generated in [Vespa Cloud Console](https://console.vespa.cloud)) and defined in the application package, see https://cloud.vespa.ai/en/security/guide.

```python
from vespa.application import Vespa
import os

app: Vespa = Vespa(
    url="https://my-endpoint.z.vespa-app.cloud",
    tenant="my-tenant",
    vespa_cloud_secret_token=os.getenv("VESPA_CLOUD_SECRET_TOKEN"),
)
```


In [6]:
vcapp.get_application_status()

Using mtls_key_cert Authentication against endpoint https://bf40f77a.bc737822.z.vespa-app.cloud//ApplicationStatus


<Response [200]>

## Preparing the data

In this example we use the [HF Datasets](https://huggingface.co/docs/datasets/index) library to stream the
["Cohere/wikipedia-2023-11-embed-multilingual-v3"](https://huggingface.co/datasets/Cohere/wikipedia-2023-11-embed-multilingual-v3) dataset and index it in our newly deployed Vespa instance.

The dataset contains Wikipedia pages and their corresponding embeddings.

> For this exploration we will use the `id`, `text` and `embedding` fields.

The following uses the [stream](https://huggingface.co/docs/datasets/stream) option of datasets to stream the data without
downloading all the contents locally.

The `map` functionality allows us to convert the
dataset fields into the expected feed format for `pyvespa` which expects a dict with the keys `id` and `fields`:

`{ "id": "vespa-document-id", "fields": {"vespa_field": "vespa-field-value"}}`
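
As a minimal sketch of that conversion on a single record, the function below maps the dataset keys used in this notebook (`_id`, `text`, `emb`) to the feed format. The sample record is made up for illustration, and its `title` key is an assumed extra field that is simply dropped.

```python
from typing import Any, Dict


def to_feed_format(record: Dict[str, Any]) -> Dict[str, Any]:
    """Convert one raw dataset record into the dict shape pyvespa feeds."""
    return {
        "id": record["_id"],
        "fields": {"text": record["text"], "embedding": record["emb"]},
    }


# A made-up record with the same keys as the dataset (`_id`, `text`, `emb`).
raw = {"_id": "doc-1", "title": "April", "text": "April is a month.", "emb": [0.1, 0.2]}
print(to_feed_format(raw))
```

Only the keys listed under `fields` reach Vespa, so they need to match the field names defined in the schema.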


In [7]:
from datasets import load_dataset



## Utility function to create dataset with different number of documents


In [8]:
def get_dataset(n_docs: int = 1000):
    dataset = load_dataset(
        "Cohere/wikipedia-2023-11-embed-multilingual-v3",
        "simple",
        split=f"train[:{n_docs}]",
    )
    dataset = dataset.map(
        lambda x: {
            "id": x["_id"] + "-iter",
            "fields": {"text": x["text"], "embedding": x["emb"]},
        }
    ).select_columns(["id", "fields"])
    return dataset

### A dataclass to store the parameters and results of the different feeding methods


In [9]:
from dataclasses import dataclass
from typing import Callable, Optional, Iterable, Dict


@dataclass
class FeedParams:
    name: str
    num_docs: int
    max_connections: int
    function_name: str
    max_workers: Optional[int] = None
    max_queue_size: Optional[int] = None


@dataclass
class FeedResult(FeedParams):
    feed_time: Optional[float] = None
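
Because `FeedResult` inherits every field from `FeedParams`, a result can be built by copying the parameters and adding the measured time. The sketch below repeats the two dataclasses so that it runs on its own and uses `dataclasses.asdict` for the copy; the values are made up.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class FeedParams:
    name: str
    num_docs: int
    max_connections: int
    function_name: str
    max_workers: Optional[int] = None
    max_queue_size: Optional[int] = None


@dataclass
class FeedResult(FeedParams):
    feed_time: Optional[float] = None


params = FeedParams(
    name="demo", num_docs=1000, max_connections=64, function_name="feed_iterable"
)
# asdict() copies every parameter field; feed_time is added on top.
result = FeedResult(**asdict(params), feed_time=6.5)
print(result.num_docs / result.feed_time)
```

Keeping parameters and results in one flat record makes it straightforward to turn a list of results into a DataFrame later.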

### A common callback function to notify if something goes wrong


In [10]:
from vespa.io import VespaResponse


def callback(response: VespaResponse, id: str):
    if not response.is_successful():
        print(
            f"Failed to feed document {id} with status code {response.status_code}: Reason {response.get_json()}"
        )
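
Printing works for a notebook, but for larger runs it can be handy to collect failures and inspect them afterwards. The following is a sketch of one possible variant, not part of pyvespa: `make_collecting_callback` and `FakeResponse` are hypothetical names, and the fake response implements only the two members the callback above relies on (`is_successful()` and `status_code`).

```python
from typing import Callable, List, Tuple


def make_collecting_callback(
    failures: List[Tuple[str, int]],
) -> Callable[[object, str], None]:
    """Return a callback that records (document id, status code) for failures."""

    def _callback(response, id: str) -> None:
        # Relies only on the two members used by the callback above.
        if not response.is_successful():
            failures.append((id, response.status_code))

    return _callback


class FakeResponse:
    """Stand-in for a response object, for demonstration only."""

    def __init__(self, status_code: int):
        self.status_code = status_code

    def is_successful(self) -> bool:
        return self.status_code == 200


failures: List[Tuple[str, int]] = []
cb = make_collecting_callback(failures)
cb(FakeResponse(200), "doc-1")
cb(FakeResponse(507), "doc-2")
print(failures)
```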

### Defining our feeding functions


In [11]:
import time
import asyncio
from vespa.application import Vespa

In [12]:
def feed_iterable(app: Vespa, params: FeedParams, data: Iterable[Dict]) -> FeedResult:
    start = time.time()
    app.feed_iterable(
        data,
        schema="doc",
        namespace="pyvespa-feed",
        operation_type="feed",
        max_queue_size=params.max_queue_size,
        max_workers=params.max_workers,
        max_connections=params.max_connections,
        callback=callback,
    )
    end = time.time()
    sync_feed_time = end - start
    return FeedResult(
        **params.__dict__,
        feed_time=sync_feed_time,
    )


def feed_async_iterable(
    app: Vespa, params: FeedParams, data: Iterable[Dict]
) -> FeedResult:
    start = time.time()
    app.feed_async_iterable(
        data,
        schema="doc",
        namespace="pyvespa-feed",
        operation_type="feed",
        max_queue_size=params.max_queue_size,
        max_workers=params.max_workers,
        max_connections=params.max_connections,
        callback=callback,
    )
    end = time.time()
    sync_feed_time = end - start
    return FeedResult(
        **params.__dict__,
        feed_time=sync_feed_time,
    )

## Defining our hyperparameters


In [13]:
from itertools import product

# We will only run for up to 10 000 documents here as notebook is run as part of CI.

num_docs = [
    1000,
    5_000,
    10_000,
]
params_by_function = {
    "feed_async_iterable": {
        "num_docs": num_docs,
        "max_connections": [64],
        "max_workers": [64],
        "max_queue_size": [2500],
    },
    "feed_iterable": {
        "num_docs": num_docs,
        "max_connections": [64],
        "max_workers": [64],
        "max_queue_size": [2500],
    },
}

feed_params = []
# Create one FeedParams instance of each permutation
for func, parameters in params_by_function.items():
    print(f"Function: {func}")
    keys, values = zip(*parameters.items())
    for combination in product(*values):
        settings = dict(zip(keys, combination))
        print(settings)
        feed_params.append(
            FeedParams(
                name=f"{settings['num_docs']}_{settings['max_connections']}_{settings.get('max_workers', 0)}_{func}",
                function_name=func,
                **settings,
            )
        )
    print("\n")  # Just to add space between different functions

Function: feed_async_iterable
{'num_docs': 1000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}
{'num_docs': 5000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}
{'num_docs': 10000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}


Function: feed_iterable
{'num_docs': 1000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}
{'num_docs': 5000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}
{'num_docs': 10000, 'max_connections': 64, 'max_workers': 64, 'max_queue_size': 2500}




In [14]:
print(f"Total number of feed_params: {len(feed_params)}")

Total number of feed_params: 6
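
The count is the product of the list lengths for each function, summed over the functions. To see how the grid grows once more values are added, here is a small standalone sketch of the same `itertools.product` expansion, using made-up parameter values:

```python
from itertools import product

grid = {
    "num_docs": [1000, 5000],
    "max_connections": [16, 64],
    "max_workers": [64],
}

keys, values = zip(*grid.items())
combinations = [dict(zip(keys, combo)) for combo in product(*values)]

for settings in combinations:
    print(settings)
print(len(combinations))
```

With two values each for `num_docs` and `max_connections` and one for `max_workers`, this yields 2 x 2 x 1 = 4 combinations, so it is worth keeping the lists short when every combination triggers a full feed.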


Now, we will need a way to retrieve the callable function from the function name.


In [15]:
# Get reference to function from string name
def get_func_from_str(func_name: str) -> Callable:
    return globals()[func_name]
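
Looking names up in `globals()` is convenient in a notebook, but a typo in a name only surfaces as a bare `KeyError`. An explicit registry is one possible alternative. The sketch below uses placeholder functions (`feed_a`, `feed_b`) in place of the real feed functions, and `get_registered_func` is a hypothetical name.

```python
from typing import Callable, Dict


def feed_a(**kwargs) -> str:  # placeholder standing in for a feed function
    return "a"


def feed_b(**kwargs) -> str:  # placeholder standing in for a feed function
    return "b"


FEED_FUNCTIONS: Dict[str, Callable] = {"feed_a": feed_a, "feed_b": feed_b}


def get_registered_func(func_name: str) -> Callable:
    """Look up a feed function by name, failing with a clear message."""
    try:
        return FEED_FUNCTIONS[func_name]
    except KeyError:
        raise ValueError(
            f"Unknown feed function {func_name!r}; expected one of {sorted(FEED_FUNCTIONS)}"
        ) from None


print(get_registered_func("feed_a")())
```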

### Function to clean up after each feed

For a fair comparison, we will delete the data before feeding it again.


In [16]:
from typing import Iterable, Dict
from vespa.application import Vespa


def delete_data(app: Vespa, data: Iterable[Dict]):
    app.feed_iterable(
        iter=data,
        schema="doc",
        namespace="pyvespa-feed",
        operation_type="delete",
        callback=callback,
        max_workers=16,
        max_connections=16,
    )

## Main experiment loop


The cell below applies `nest_asyncio`, which lets asyncio code run inside Jupyter, where an event loop is already running.


In [17]:
import nest_asyncio

nest_asyncio.apply()

In [18]:
results = []
for params in feed_params:
    print("-" * 50)
    print("Starting feed with params:")
    print(params)
    data = get_dataset(params.num_docs)
    # Both feed functions are synchronous wrappers, so they can be called directly.
    feed_result = get_func_from_str(params.function_name)(
        app=app, params=params, data=data
    )
    print(feed_result.feed_time)
    results.append(feed_result)
    print("Deleting data")
    time.sleep(3)
    delete_data(app, data)

--------------------------------------------------
Starting feed with params:
FeedParams(name='1000_64_64_feed_async_iterable', num_docs=1000, max_connections=64, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)


Using mtls_key_cert Authentication against endpoint https://bf40f77a.bc737822.z.vespa-app.cloud//ApplicationStatus
6.565164089202881
Deleting data
--------------------------------------------------
Starting feed with params:
FeedParams(name='5000_64_64_feed_async_iterable', num_docs=5000, max_connections=64, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)
19.900628805160522
Deleting data
--------------------------------------------------
Starting feed with params:
FeedParams(name='10000_64_64_feed_async_iterable', num_docs=10000, max_connections=64, function_name='feed_async_iterable', max_workers=64, max_queue_size=2500)
43.1460280418396
Deleting data
--------------------------------------------------
Starting feed with params:
FeedParams(name='1000_64_64_feed_iterable', num_docs=1000, max_connections=64, function_name='feed_iterable', max_workers=64, max_queue_size=2500)
15.991410255432129
Deleting data
--------------------------------------------------
Star

In [19]:
# Create a pandas DataFrame with the results
import pandas as pd

df = pd.DataFrame([result.__dict__ for result in results])
df["requests_per_second"] = df["num_docs"] / df["feed_time"]
df

| | name | num_docs | max_connections | function_name | max_workers | max_queue_size | feed_time | requests_per_second |
|---|---|---|---|---|---|---|---|---|
| 0 | 1000_64_64_feed_async_iterable | 1000 | 64 | feed_async_iterable | 64 | 2500 | 6.565164 | 152.319117 |
| 1 | 5000_64_64_feed_async_iterable | 5000 | 64 | feed_async_iterable | 64 | 2500 | 19.900629 | 251.248342 |
| 2 | 10000_64_64_feed_async_iterable | 10000 | 64 | feed_async_iterable | 64 | 2500 | 43.146028 | 231.771045 |
| 3 | 1000_64_64_feed_iterable | 1000 | 64 | feed_iterable | 64 | 2500 | 15.99141 | 62.533572 |
| 4 | 5000_64_64_feed_iterable | 5000 | 64 | feed_iterable | 64 | 2500 | 77.860155 | 64.217699 |
| 5 | 10000_64_64_feed_iterable | 10000 | 64 | feed_iterable | 64 | 2500 | 167.938744 | 59.545521 |


## Plotting the results

Let's plot the results to see how the different methods compare.


In [20]:
import plotly.express as px


def plot_performance(df: pd.DataFrame):
    # Create a scatter plot with logarithmic scale for both axes using Plotly Express
    fig = px.scatter(
        df,
        x="num_docs",
        y="requests_per_second",
        color="function_name",  # Defines color based on different functions
        log_x=True,  # Set x-axis to logarithmic scale
        log_y=False,  # If you also want the y-axis in logarithmic scale, set this to True
        title="Performance: Requests per Second vs. Number of Documents",
        labels={  # Customizing axis labels
            "num_docs": "Number of Documents",
            "requests_per_second": "Requests per Second",
            "max_workers": "max_workers",
            "max_queue_size": "max_queue_size",
        },
        template="plotly_white",  # This sets the style to a white background, adhering to Tufte's minimalist principles
        hover_data=[
            "max_workers",
            "max_queue_size",
            "max_connections",
        ],  # Additional information to show on hover
    )

    # Update layout for better readability, similar to 'talk' context in Seaborn
    fig.update_layout(
        font=dict(
            size=16,  # Adjusting font size for better visibility, similar to 'talk' context
        ),
        legend_title_text="Function Details",  # Custom legend title
        legend=dict(
            title_font_size=16,
            x=800,  # Adjusting legend position similar to bbox_to_anchor in Matplotlib
            xanchor="auto",
            y=1,
            yanchor="auto",
        ),
        width=800,  # Adjusting width of the plot
    )
    fig.update_xaxes(
        tickvals=[1000, 5000, 10000],  # Set specific tick values
        ticktext=["1k", "5k", "10k"],  # Set corresponding tick labels
    )

    fig.update_traces(
        marker=dict(size=12, opacity=0.7)
    )  # Adjust marker size and opacity
    # Show plot
    fig.show()
    # Save plot as HTML file
    fig.write_html("performance.html")


plot_performance(df)

Let's summarize the insights we got from this experiment:

- In this run, the `feed_async_iterable` method was roughly 2.4x to 3.9x faster than the `feed_iterable` method (152 vs. 63 requests per second at 1k documents, 232 vs. 60 at 10k documents).
- Note that this will vary depending on the network latency between the client and the Vespa instance.
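
The speedup can be computed directly from the results table rather than read off the plot. The numbers below are the `requests_per_second` values reported above for this particular run; other runs will give different values.

```python
# Requests per second copied from the results table above.
async_rps = {1000: 152.319117, 5000: 251.248342, 10000: 231.771045}
sync_rps = {1000: 62.533572, 5000: 64.217699, 10000: 59.545521}

speedups = {n: async_rps[n] / sync_rps[n] for n in async_rps}
for n, ratio in speedups.items():
    print(f"{n:>6} docs: {ratio:.2f}x")
```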


## Feeding with Vespa CLI

[Vespa CLI](https://docs.vespa.ai/en/vespa-cli) is a command-line interface for interacting with Vespa.

Among its many useful features is the `vespa feed` command, which is the recommended way of feeding large datasets into Vespa.
It is optimized for high feeding performance, and it will be interesting to get a feel for how performant feeding to Vespa Cloud is using the CLI.

Note that the comparison with the CLI is not entirely fair: the CLI relies on prepared data files, while the pyvespa methods stream the data before feeding it. In addition, the files prepared below contain only the `text` field and not the embedding, so each CLI request carries a much smaller payload.


## Prepare the data for Vespa CLI

Vespa CLI can feed data from either many .json files or a single .jsonl file with many documents.
Each document operation needs to be in the following JSON format:

```json
{
  "put": "id:namespace:document-type::document-id",
  "fields": {
    "field1": "value1",
    "field2": "value2"
  }
}
```

Here, `put` is the document operation. The other allowed operations are `update` and `remove`.

For reference, see https://docs.vespa.ai/en/vespa-cli#cheat-sheet
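
To make the format concrete, here is a minimal sketch that writes `put` operations as JSON Lines using only the standard library. The helper name `write_feed_jsonl` is hypothetical, the two documents are made up, and an in-memory buffer stands in for a real file.

```python
import io
import json
from typing import Dict, Iterable


def write_feed_jsonl(docs: Iterable[Dict], namespace: str, doc_type: str, out) -> int:
    """Write one `put` operation per line; returns the number of lines written."""
    count = 0
    for doc in docs:
        operation = {
            "put": f"id:{namespace}:{doc_type}::{doc['id']}",
            "fields": doc["fields"],
        }
        out.write(json.dumps(operation) + "\n")
        count += 1
    return count


buffer = io.StringIO()  # in-memory stand-in for a .jsonl file
docs = [{"id": "a", "fields": {"text": "first"}}, {"id": "b", "fields": {"text": "second"}}]
write_feed_jsonl(docs, namespace="pyvespa-feed", doc_type="doc", out=buffer)
print(buffer.getvalue())
```

Each line is a complete JSON object, so the file can be processed one operation at a time without loading it fully into memory.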

### Getting the datasets as .jsonl files

Now, let's save the dataset to 3 different jsonl files of 1k, 5k, and 10k documents.


In [21]:
for n in num_docs:
    print(f"Getting dataset with {n} docs...")
    # First, let's load the dataset in non-streaming mode this time, as we want to save it to a jsonl file
    dataset_cli = load_dataset(
        "Cohere/wikipedia-2023-11-embed-multilingual-v3",
        "simple",
        split=f"train[:{n}]",  # Notice the slicing here, see https://huggingface.co/docs/datasets/loading#slice-splits
        streaming=False,
    )
    # Map to the format expected by the CLI.
    # Note that this differs a little bit from the format expected by the Python API.
    dataset_cli = dataset_cli.map(
        lambda x: {
            "put": f"id:pyvespa-feed:doc::{x['_id']}-json",
            "fields": {"text": x["text"]},
        }
    ).select_columns(["put", "fields"])
    # Save to a jsonl file
    assert len(dataset_cli) == n
    dataset_cli.to_json(f"vespa_feed-{n}.json", orient="records", lines=True)

Getting dataset with 1000 docs...


Creating json from Arrow format: 100%|██████████| 1/1 [00:00<00:00, 88.72ba/s]


Getting dataset with 5000 docs...


Map: 100%|██████████| 5000/5000 [00:00<00:00, 24261.98 examples/s]
Creating json from Arrow format: 100%|██████████| 5/5 [00:00<00:00, 430.19ba/s]


Getting dataset with 10000 docs...


Creating json from Arrow format: 100%|██████████| 10/10 [00:00<00:00, 281.61ba/s]


Let's look at the first line of one of the saved files to verify the format.


In [22]:
from pprint import pprint
import json

with open("vespa_feed-1000.json", "r") as f:
    sample = f.readline()
    pprint(json.loads(sample))

{'fields': {'text': 'April (Apr.) is the fourth month of the year in the '
                    'Julian and Gregorian calendars, and comes between March '
                    'and May. It is one of the four months to have 30 days.'},
 'put': 'id:pyvespa-feed:doc::20231101.simple_1_0-json'}


Ok, now we are ready to feed the data using Vespa CLI.
We also want to capture the output of feed statistics for each file.


In [23]:
cli_results = {}
for n in num_docs:
    print(f"Feeding {n} docs...")
    output_list = !vespa feed vespa_feed-{n}.json
    # Use a separate name so the `results` list from the pyvespa runs is not overwritten
    feed_stats = json.loads("".join(output_list))
    pprint(feed_stats)
    cli_results[n] = feed_stats

Feeding 1000 docs...
{'feeder.error.count': 0,
 'feeder.inflight.count': 0,
 'feeder.ok.count': 1000,
 'feeder.ok.rate': 132.811,
 'feeder.operation.count': 1000,
 'feeder.seconds': 7.529,
 'http.exception.count': 0,
 'http.request.MBps': 0.039,
 'http.request.bytes': 293011,
 'http.request.count': 1000,
 'http.response.MBps': 0.017,
 'http.response.bytes': 129388,
 'http.response.code.counts': {'200': 1000},
 'http.response.count': 1000,
 'http.response.error.count': 0,
 'http.response.latency.millis.avg': 150,
 'http.response.latency.millis.max': 953,
 'http.response.latency.millis.min': 118}
Feeding 5000 docs...
{'feeder.error.count': 0,
 'feeder.inflight.count': 0,
 'feeder.ok.count': 5000,
 'feeder.ok.rate': 400.086,
 'feeder.operation.count': 5000,
 'feeder.seconds': 12.497,
 'http.exception.count': 0,
 'http.request.MBps': 0.116,
 'http.request.bytes': 1450480,
 'http.request.count': 5000,
 'http.response.MBps': 0.052,
 'http.response.bytes': 652778,
 'http.response.code.counts'

In [24]:
cli_results

{1000: {'feeder.operation.count': 1000,
  'feeder.seconds': 7.529,
  'feeder.ok.count': 1000,
  'feeder.ok.rate': 132.811,
  'feeder.error.count': 0,
  'feeder.inflight.count': 0,
  'http.request.count': 1000,
  'http.request.bytes': 293011,
  'http.request.MBps': 0.039,
  'http.exception.count': 0,
  'http.response.count': 1000,
  'http.response.bytes': 129388,
  'http.response.MBps': 0.017,
  'http.response.error.count': 0,
  'http.response.latency.millis.min': 118,
  'http.response.latency.millis.avg': 150,
  'http.response.latency.millis.max': 953,
  'http.response.code.counts': {'200': 1000}},
 5000: {'feeder.operation.count': 5000,
  'feeder.seconds': 12.497,
  'feeder.ok.count': 5000,
  'feeder.ok.rate': 400.086,
  'feeder.error.count': 0,
  'feeder.inflight.count': 0,
  'http.request.count': 5000,
  'http.request.bytes': 1450480,
  'http.request.MBps': 0.116,
  'http.exception.count': 0,
  'http.response.count': 5000,
  'http.response.bytes': 652778,
  'http.response.MBps': 0.0

In [25]:
# Let's add the CLI results to the DataFrame
df_cli = pd.DataFrame(
    [
        {
            "name": f"{n}_cli",
            "num_docs": n,
            "max_connections": "unknown",
            "function_name": "cli",
            "max_workers": "unknown",
            "max_queue_size": "n/a",
            "feed_time": result["feeder.seconds"],
        }
        for n, result in cli_results.items()
    ]
)
df_cli["requests_per_second"] = df_cli["num_docs"] / df_cli["feed_time"]
df_cli

| | name | num_docs | max_connections | function_name | max_workers | max_queue_size | feed_time | requests_per_second |
|---|---|---|---|---|---|---|---|---|
| 0 | 1000_cli | 1000 | unknown | cli | unknown | n/a | 7.529 | 132.819764 |
| 1 | 5000_cli | 5000 | unknown | cli | unknown | n/a | 12.497 | 400.096023 |
| 2 | 10000_cli | 10000 | unknown | cli | unknown | n/a | 30.696 | 325.775345 |


In [26]:
df_total = pd.concat([df, df_cli])

plot_performance(df_total)

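In this run, the CLI was about 1.6x faster than the `feed_async_iterable` method at 5k documents and about 1.4x faster at 10k documents, but slightly slower at 1k documents (133 vs. 152 requests per second). Keep in mind that the CLI files contained only the `text` field, so the CLI requests were also smaller.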

We might improve the performance of the `feed_async_iterable` method by introducing parallelism (threading) for that method as well.
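
As a rough illustration of what combining threads with asyncio could look like, the sketch below splits the documents across threads and lets each thread run its own event loop with a cap on in-flight requests. This is not how pyvespa is implemented and has not been benchmarked: `send`, `feed_chunk`, and `feed_in_threads` are hypothetical names, and `send` only sleeps instead of making a network call.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


async def send(doc: Dict) -> str:
    """Placeholder for one network request; sleeps instead of calling Vespa."""
    await asyncio.sleep(0.001)
    return doc["id"]


async def feed_chunk(chunk: List[Dict], max_in_flight: int) -> List[str]:
    """Feed one chunk concurrently, capped at `max_in_flight` requests."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(doc: Dict) -> str:
        async with semaphore:
            return await send(doc)

    return await asyncio.gather(*(guarded(doc) for doc in chunk))


def feed_in_threads(docs: List[Dict], num_threads: int, max_in_flight: int) -> List[str]:
    """Split docs into chunks; each thread runs its own event loop."""
    chunks = [docs[i::num_threads] for i in range(num_threads)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = pool.map(lambda c: asyncio.run(feed_chunk(c, max_in_flight)), chunks)
    return [doc_id for chunk_ids in results for doc_id in chunk_ids]


docs = [{"id": f"doc-{i}"} for i in range(100)]
fed = feed_in_threads(docs, num_threads=4, max_in_flight=16)
print(len(fed))
```

Whether this helps in practice depends on where the bottleneck is. If the client is limited by serialization or callback work in a single event loop, extra threads may help; if the network or the server is the limit, they will not.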


## Conclusion


- Prefer to use the CLI if you care about performance. 🚀
- If you want to use pyvespa, prefer the `feed_async_iterable` method if you are I/O-bound.


## Cleanup


In [29]:
vespa_cloud.delete()

Deactivated vespa-team.feedperformancecloud in dev.aws-us-east-1c
Deleted instance vespa-team.feedperformancecloud.default


## Next steps

Check out some of the other
[examples](https://pyvespa.readthedocs.io/en/latest/examples.html) in the documentation.
