53 changes: 11 additions & 42 deletions java/cds-data.md
@@ -328,58 +328,27 @@ On the database, this data is serialized to [JSON](https://www.json.org/)<sup>(1

Map data can be nested and may contain nested maps and lists, which are serialized to JSON objects and arrays, respectively.

## Vector Embeddings <Beta /> { #vector-embeddings }
## Vector Embeddings { #vector-embeddings }

In CDS [vector embeddings](../guides/databases/hana#vector-embeddings) are stored in elements of type `cds.Vector`:
In CDS, [vector embeddings](../guides/databases/hana#vector-embeddings) are stored in elements of type [`Vector`](/@external/cds/types).

```cds
entity Books : cuid { // [!code focus]
  title       : String(111);
  description : LargeString; // [!code focus]
  embedding   : Vector(1536); // vector space w/ 1536 dimensions // [!code focus]
} // [!code focus]
```
CAP Java supports the vector type on SAP HANA, as well as on H2 and SQLite for local testing. On Postgres (beta), support for vectors requires the [pgvector](https://github.com/pgvector/pgvector) extension.

In CAP Java, vector embeddings are represented by the `CdsVector` type, which allows a unified handling of different vector representations such as `float[]` and `String`:
In CAP Java, vectors are represented by the `CdsVector` type, which allows a unified handling of different vector representations such as `float[]` and `String`:

```Java
// Vector embedding of text, for example, from SAP GenAI Hub or via LangChain4j
float[] embedding = embeddingModel.embed(bookDescription).content().vector();
// Vector embedding of text via SAP Cloud SDK for AI
float[] embedding = embeddingModel.embedding(
    new OpenAiEmbeddingRequest(List.of(text))).getEmbeddingVectors().get(0);

CdsVector v1 = CdsVector.of(embedding); // float[] format
CdsVector v2 = CdsVector.of("[0.42, 0.73, 0.28, ...]"); // String format
```
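
The `float[]` and `String` forms carry the same numbers. As a rough illustration in plain Java, the bracketed text form can be converted to and from `float[]` as sketched below. This is not the `CdsVector` implementation, and `VectorTextSketch`, `parse`, and `format` are hypothetical names used only for this example.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of CAP Java: converts between the
// bracketed text form "[0.1, 0.2]" and a float[].
final class VectorTextSketch {

    /** Parses a bracketed, comma-separated list of numbers into a float[]. */
    static float[] parse(String text) {
        String trimmed = text.trim();
        if (trimmed.length() < 2 || !trimmed.startsWith("[") || !trimmed.endsWith("]")) {
            throw new IllegalArgumentException("Expected a bracketed list such as \"[0.1, 0.2]\"");
        }
        String body = trimmed.substring(1, trimmed.length() - 1).trim();
        if (body.isEmpty()) {
            return new float[0];
        }
        String[] parts = body.split(",");
        float[] vector = new float[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Float.parseFloat rejects non-numeric components with NumberFormatException
            vector[i] = Float.parseFloat(parts[i].trim());
        }
        return vector;
    }

    /** Formats a float[] as a bracketed, comma-separated list. */
    static String format(float[] vector) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (float component : vector) {
            joiner.add(Float.toString(component));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        float[] parsed = parse("[0.42, 0.73, 0.28]");
        System.out.println(parsed.length);
        System.out.println(format(new float[] {1f, 2f}));
    }
}
```

Round-tripping a vector through `format` and `parse` yields the same components, which is the property a unified vector type relies on.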

You can use the functions, `CQL.cosineSimilarity` or `CQL.l2Distance` (Euclidean distance) in queries to compute the similarity or distance of embeddings in the vector space. To use vector embeddings in functions, wrap them using `CQL.vector`:

```Java
CqnVector v = CQL.vector(embedding);

CdsResult<Books> similarBooks = service.run(Select.from(BOOKS).where(b ->
CQL.cosineSimilarity(b.embedding(), v).gt(0.9))
);
```

You can also use parameters for vectors in queries:

```Java
var similarity = CQL.cosineSimilarity(CQL.get(Books.EMBEDDING), CQL.param(0).type(VECTOR));

CqnSelect query = Select.from(BOOKS)
.columns(b -> b.title(), b -> similarity.as("similarity"))
.where(b -> b.ID().ne(bookId).and(similarity.gt(0.9)))
.orderBy(b -> b.get("similarity").desc());

Result similarBooks = db.run(query, CdsVector.of(embedding));
```

In CDS QL queries, elements of type `cds.Vector` are not included in select _all_ queries. They must be explicitly added to the select list:

```Java
CdsVector embedding = service.run(Select.from(BOOKS).byId(101)
.columns(b -> b.embedding())).single(Books.class).getEmbedding();
```
::: info
In CDS QL queries, elements of type `Vector` are excluded from the select list by default.
:::

CAP Java supports multiple [vector functions](./working-with-cql/query-api.md#vector-functions) that allow you to compute vector embeddings, similarity, and distance directly in the database.

## Data in CDS Query Language (CQL)

66 changes: 66 additions & 0 deletions java/working-with-cql/query-api.md
@@ -1640,6 +1640,72 @@ Scalar functions are values that are calculated from other values. This calculat

See [`Concat`](#string-expressions) String Expression


#### Vector Functions

Vector functions allow you to compute the similarity and distance of [vectors](../cds-data.md#vector-embeddings), as well as [vector embeddings](../../guides/databases/hana.md#vector-embeddings) of text data, directly in the database.

vector embeddings link changes when #2507 is merged


##### Computing Vector Embeddings in SAP HANA <Beta />

CAP Java supports the [VECTOR_EMBEDDING](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-sql-reference-guide/vector-embedding-function-vector) function via `CQL.vectorEmbedding` to generate vector embeddings from text data directly in SAP HANA.

To automatically generate vector embeddings on write in the database, you can define a calculated element [on-write](../../cds/cdl#on-write) using the `vector_embedding` function:

```cds
extend Incidents with {
  @cds.api.ignore
  embedding : cds.Vector(768) = vector_embedding(
    'title: ' || title || ', summary: ' || summary,
    'DOCUMENT', 'SAP_GXY.20250407') stored;
}
```

In Java queries, use the `CQL.vectorEmbedding` function to compute vector embeddings:

```java
var userQuery = CQL.val("""
Have we seen incidents with solar inverters this month,
and how were they resolved?
""");
var v = CQL.vectorEmbedding(userQuery, TextType.QUERY, "SAP_GXY.20250407");
```

On H2 and SQLite, the `vectorEmbedding` function is emulated. You can also use local [ONNX](https://onnx.ai) embedding models, which can be added for local testing via [LangChain4j embeddings](https://github.com/langchain4j/langchain4j/tree/main/embeddings):

```xml
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-embeddings-all-minilm-l6-v2-q</artifactId>
<scope>runtime</scope>
</dependency>
```

##### Computing Vector Similarity and Distance

You can use the functions `CQL.cosineSimilarity` and `CQL.l2Distance` (Euclidean distance) in queries to compute the similarity and distance of vectors. These functions support use cases such as finding similar items based on [vector embeddings](../../guides/databases/hana.md#vector-embeddings), for example, to improve the response of an LLM to a user query. To use vector embeddings in functions, wrap them using `CQL.vector`:

```Java
CqnVector vec = CQL.vector(embedding);

var similarIncidents = db.run(Select.from(INCIDENTS).where(i ->
    CQL.cosineSimilarity(i.embedding(), vec).gt(0.75))
);
```
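
To make concrete what the two measures compute, here is a minimal plain-Java sketch of cosine similarity and Euclidean (L2) distance over `float[]` vectors. It runs in memory and is not how CAP Java or the database evaluates `CQL.cosineSimilarity` and `CQL.l2Distance`. The class name `VectorMathSketch` is hypothetical, and rejecting zero vectors is an assumption of this sketch, not documented database behavior.

```java
// Hypothetical illustration of the math behind the two measures.
final class VectorMathSketch {

    /** Cosine similarity: dot(a, b) / (|a| * |b|), a value in [-1, 1]. */
    static double cosineSimilarity(float[] a, float[] b) {
        requireSameLength(a, b);
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            // Assumption of this sketch: the angle to a zero vector is undefined
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Euclidean (L2) distance: square root of the summed squared differences. */
    static double l2Distance(float[] a, float[] b) {
        requireSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = (double) a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    private static void requireSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
    }

    public static void main(String[] args) {
        // Same direction, different length: similarity close to 1
        System.out.println(cosineSimilarity(new float[] {1f, 2f, 3f}, new float[] {2f, 4f, 6f}));
        // Classic 3-4-5 triangle
        System.out.println(l2Distance(new float[] {0f, 0f}, new float[] {3f, 4f})); // prints 5.0
    }
}
```

Cosine similarity ignores vector length and compares direction only, so higher values mean more similar. L2 distance is sensitive to length, and lower values mean more similar.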

You can also use parameters for vectors in queries:

```Java
var similarity = CQL.cosineSimilarity(
    CQL.get(Incidents.EMBEDDING), CQL.param(0).type(VECTOR));

var query = Select.from(INCIDENTS)
    .columns(i -> i.title(), i -> similarity.times(100).as("similarity"))
    .where(i -> similarity.gt(0.75))
    .orderBy(i -> i.get("similarity").desc());

Result similarIncidents = db.run(query, CdsVector.of(embedding));
```
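
The parameterized query filters by a similarity threshold, scales the score by 100, and sorts the most similar entries first. The following plain-Java sketch mirrors those three steps in memory, which can help when reasoning about expected results in a unit test. It does not use the CAP Java API. `SimilarityRankingSketch` and `rank` are hypothetical names, and the input map of titles to embeddings stands in for the `Incidents` entity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory analogue of "filter by similarity, scale, order descending".
final class SimilarityRankingSketch {

    /** Cosine similarity of two non-zero vectors with equal length. */
    static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns (title, similarity * 100) pairs above the threshold, most similar first. */
    static List<Map.Entry<String, Double>> rank(
            Map<String, float[]> embeddingsByTitle, float[] query, double threshold) {
        List<Map.Entry<String, Double>> hits = new ArrayList<>();
        for (Map.Entry<String, float[]> entry : embeddingsByTitle.entrySet()) {
            double similarity = cosineSimilarity(entry.getValue(), query);
            if (similarity > threshold) {                        // like where(similarity.gt(...))
                hits.add(Map.entry(entry.getKey(), similarity * 100)); // like times(100)
            }
        }
        // like orderBy(similarity desc)
        hits.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return hits;
    }

    public static void main(String[] args) {
        Map<String, float[]> incidents = Map.of(
            "Inverter fault", new float[] {1f, 0f},
            "Billing question", new float[] {0f, 1f},
            "Panel and inverter check", new float[] {1f, 1f});
        for (Map.Entry<String, Double> hit : rank(incidents, new float[] {1f, 0f}, 0.5)) {
            System.out.println(hit.getKey() + ": " + hit.getValue());
        }
    }
}
```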

#### Case-When-Then Expressions

Use a case expression to compute a value based on the evaluation of conditions. The following query converts the stock of Books into a textual representation as 'stockLevel':