dbeaver/dbeaver#23390 Support REAL_VECTOR type in HANA plugin #23391
base: devel
The new vector data type REAL_VECTOR was introduced with HANA Cloud Database QRC 1/2024. Details about the new type are available in the SAP HANA Database Vector Engine Guide. HANA's JDBC driver natively supports that type starting with version 2.21.5. This change introduces a new value handler so that vectors are displayed like arrays. Furthermore, the column type modifiers are adapted to display vector dimension constraints.
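The sketch below shows how a column type with an optional dimension constraint could be rendered, for example `REAL_VECTOR(3)` versus an unconstrained `REAL_VECTOR`. The class and method names are hypothetical and are not DBeaver's API. It assumes the dimension has already been read from the column metadata and is `null` when the column accepts vectors of any dimension.

```java
// Hypothetical helper (not DBeaver's actual API): renders a REAL_VECTOR
// column type and appends the dimension constraint when one is declared.
class VectorTypeNameDemo {

    static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";

    /**
     * @param dimension declared vector dimension, or null if the column is
     *                  assumed to accept vectors of any dimension
     */
    static String formatVectorType(Integer dimension) {
        if (dimension == null) {
            // No constraint: show the bare type name
            return DATA_TYPE_NAME_REAL_VECTOR;
        }
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        // Constrained column: show the dimension as a type modifier
        return DATA_TYPE_NAME_REAL_VECTOR + "(" + dimension + ")";
    }

    public static void main(String[] args) {
        System.out.println(formatVectorType(3));    // REAL_VECTOR(3)
        System.out.println(formatVectorType(null)); // REAL_VECTOR
    }
}
```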
Hello @stefanuhrig, thanks for your contribution.
Please use a meaningful branch name next time.
Please provide more info for our QA team.
@@ -72,6 +73,14 @@ protected DBPDataSourceInfo createDataSourceInfo(DBRProgressMonitor monitor, @No
        return info;
    }

    @Override
    public DBPDataKind resolveDataKind(String typeName, int valueType) {
        if ("REAL_VECTOR".equalsIgnoreCase(typeName)) {
Please create a constant in the HANAConstants class for "REAL_VECTOR".
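Here is a minimal sketch of the requested change, assuming a vector should resolve to an array-like data kind. `DataKind` and the `fallback` parameter are simplified, hypothetical stand-ins for DBeaver's `DBPDataKind` and the default resolution logic, not the real signatures.

```java
// Sketch only: DataKind and this resolveDataKind signature are simplified
// stand-ins, not the real DBeaver types.
class HanaTypeResolutionSketch {

    // Constant as requested in the review (name follows the later rename)
    static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";

    enum DataKind { ARRAY, BINARY, STRING, UNKNOWN }

    static DataKind resolveDataKind(String typeName, DataKind fallback) {
        // Calling equalsIgnoreCase on the constant is null-safe:
        // it simply returns false when typeName is null.
        if (DATA_TYPE_NAME_REAL_VECTOR.equalsIgnoreCase(typeName)) {
            return DataKind.ARRAY;
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(resolveDataKind("real_vector", DataKind.UNKNOWN)); // ARRAY
        System.out.println(resolveDataKind("VARBINARY", DataKind.BINARY));    // BINARY
    }
}
```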
Which additional information do you need? The full guide to the new HANA feature is available at https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/sap-hana-cloud-sap-hana-database-vector-engine-guide. To summarize briefly: a new data type has been introduced in SAP HANA Cloud. It is designed for vector embeddings, i.e. high-dimensional vectors of floating-point numbers, which play an important role in generative AI. Without this feature, DBeaver displays vectors in their binary format, which looks like garbage data. With this feature, they are displayed like arrays, and the vector elements can be edited as well. If you have any further questions or need more information, please reach out.
Here is a screenshot of the new feature: [screenshot]
This is how it looks without this new feature: [screenshot]
@@ -33,6 +33,8 @@ public class HANAValueHandlerProvider implements DBDValueHandlerProvider {
    public DBDValueHandler getValueHandler(DBPDataSource dataSource, DBDFormatSettings preferences,
            DBSTypedObject typedObject) {
        switch (typedObject.getTypeName()) {
            case "REAL_VECTOR":
You didn't add a constant here.
And in other places.
@LonwoLonwo
Thanks for your feedback.
I did not use a constant here because there are hard-coded data type names below.
I considered the following options:
1. Use the constant and make the code look inconsistent.
2. Introduce constants for ST_Geometry and ST_Point, which would add refactorings unrelated to this feature.
3. Use the hard-coded data type name.
I opted for option 3 but can change it. What do you propose?
Please also add constants for ST_Geometry and ST_Point.
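Adding all three constants resolves the consistency concern, because in Java a `static final String` field initialized with a string literal is a compile-time constant and can be used directly as a `case` label. The sketch below shows only that pattern. The returned handler labels are hypothetical placeholders, not DBeaver's value handlers.

```java
// Sketch: the returned strings are hypothetical placeholders for real
// value handlers; only the switch-on-constants pattern matters here.
class HanaValueHandlerSwitchSketch {

    static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";
    static final String DATA_TYPE_NAME_ST_GEOMETRY = "ST_GEOMETRY";
    static final String DATA_TYPE_NAME_ST_POINT = "ST_POINT";

    static String handlerFor(String typeName) {
        if (typeName == null) {
            // A switch on a null String would throw NullPointerException
            return "default";
        }
        // String switches compare with equals(), so matching is case-sensitive
        switch (typeName) {
            case DATA_TYPE_NAME_REAL_VECTOR:
                return "vector";
            case DATA_TYPE_NAME_ST_GEOMETRY:
            case DATA_TYPE_NAME_ST_POINT:
                return "geometry";
            default:
                return "default";
        }
    }

    public static void main(String[] args) {
        System.out.println(handlerFor("REAL_VECTOR")); // vector
        System.out.println(handlerFor("ST_POINT"));    // geometry
        System.out.println(handlerFor("NVARCHAR"));    // default
    }
}
```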
Thanks for the explanation of the feature.
@@ -24,4 +24,9 @@ public class HANAConstants {

    // pseudo schema for PUBLIC SYNONYMs
    public static final String SCHEMA_PUBLIC = "PUBLIC";

    // Data type names
    public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
Suggested change:
- public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
+ public static final String DATA_TYPE_NAME_REAL_VECTOR = "REAL_VECTOR";

    // Data type names
    public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
    public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
Suggested change:
- public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
+ public static final String DATA_TYPE_NAME_ST_GEOMETRY = "ST_GEOMETRY";

    // Data type names
    public static final String DATATYPENAME_REAL_VECTOR = "REAL_VECTOR";
    public static final String DATATYPENAME_ST_GEOMETRY = "ST_GEOMETRY";
    public static final String DATATYPENAME_ST_POINT = "ST_POINT";
Suggested change:
- public static final String DATATYPENAME_ST_POINT = "ST_POINT";
+ public static final String DATA_TYPE_NAME_ST_POINT = "ST_POINT";
Change data type constant prefix from DATATYPENAME to DATA_TYPE_NAME.
Is there anything else I need to take care of so that this change can be merged, or is it just a matter of time?
@stefanuhrig Is there a way to get an instance of HANA Cloud Database for testing? We only have an on-premise HANA for our tests, and as I understand it, it doesn't support this feature. Also, how can I get the 2.21 JDBC driver? I only see version 2.20 on Maven.
We have HANA Cloud instances for OSS testing and could grant you access. We have one in the US West region and one in Germany. Which location is closer to you? The 2.21 JDBC driver has not been released yet. Unfortunately, I am not allowed to share a preliminary version with you, but I can ask the responsible colleagues for an estimated release date.
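Since native support depends on driver version 2.21.5 or later, a client could gate the vector handling on the reported driver version. The sketch below is hypothetical: it assumes the version string starts with `major.minor[.patch]`, which is not confirmed here. In practice the string would come from `java.sql.DatabaseMetaData.getDriverVersion()`, and the database must also support the new protocol, so this check alone is not sufficient.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical check: is the driver at least 2.21.5? The version string
// format ("major.minor[.patch]..." prefix) is an assumption.
class DriverVersionCheckSketch {

    private static final Pattern VERSION = Pattern.compile("(\\d+)\\.(\\d+)(?:\\.(\\d+))?");
    private static final int[] MIN_REAL_VECTOR_DRIVER = {2, 21, 5};

    static boolean supportsRealVector(String driverVersion) {
        Matcher m = VERSION.matcher(driverVersion.trim());
        if (!m.lookingAt()) {
            return false; // unknown format: assume no native vector support
        }
        int[] actual = {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            m.group(3) == null ? 0 : Integer.parseInt(m.group(3)) // missing patch counts as 0
        };
        // Compare component by component; the first difference decides
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != MIN_REAL_VECTOR_DRIVER[i]) {
                return actual[i] > MIN_REAL_VECTOR_DRIVER[i];
            }
        }
        return true; // exactly the minimum version
    }

    public static void main(String[] args) {
        System.out.println(supportsRealVector("2.20.17")); // false
        System.out.println(supportsRealVector("2.21.5"));  // true
        System.out.println(supportsRealVector("2.22.0"));  // true
    }
}
```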
You can send me the credentials at matvei.baranov@dbeaver.com. Germany would be better. But is there any point in testing or merging this without the newest driver? How would the current driver handle this data type?
Both the driver and the HANA Cloud version supporting the new client/server protocol (HANA Cloud QRC 2/24) will probably be released at the end of June. If either the driver or the database does not support the new client/server protocol, the database reports vectors as VARBINARY data. So, if a vector column were selected directly, the result would look like the second screenshot above. I will send you the credentials for the HANA Cloud instance by email.
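When vectors arrive as VARBINARY, the displayed bytes are the vector's raw encoding. The sketch below decodes and re-encodes such bytes. It assumes an fvecs-style layout: a little-endian 32-bit dimension followed by that many little-endian IEEE 754 single-precision floats. This layout is an assumption and should be checked against the SAP HANA Vector Engine Guide before relying on it. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical codec. ASSUMED layout: int32 dimension (little-endian),
// then `dimension` float32 values (little-endian).
class RealVectorBinarySketch {

    static float[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        if (buf.remaining() < Integer.BYTES) {
            throw new IllegalArgumentException("too short for a dimension header");
        }
        int dim = buf.getInt();
        if (dim < 0 || buf.remaining() != (long) dim * Float.BYTES) {
            throw new IllegalArgumentException("length does not match dimension " + dim);
        }
        float[] values = new float[dim];
        // The float view inherits the little-endian order and starts after the header
        buf.asFloatBuffer().get(values);
        return values;
    }

    static byte[] encode(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + values.length * Float.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(values.length);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        byte[] raw = encode(new float[] {1.0f, 2.5f, -3.0f});
        System.out.println(raw.length);                   // 16
        System.out.println(Arrays.toString(decode(raw))); // [1.0, 2.5, -3.0]
    }
}
```

Decoding gives the array-like view described in the PR. Encoding is what an editor would need when writing modified elements back, under the same layout assumption.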