As of February 2025, the `tensorflow` package does not support Python 3.13 (the default in BAS since this month), so conda is used to provide a compatible Python version.

## Preview the machine-generated images of pets

You will be working with a set of machine-generated images of the most popular (again, according to the GenAI) breeds of cats and dogs. The images are stored in the folder [./pets/](./pets/).

In [None]:
from PIL import Image as PILImage
from IPython.display import display

In [None]:
dir_images='./pets/'

img = PILImage.open(dir_images+'21_Ragdoll.webp')
display(img.resize((400, 400)))

## Get image [embeddings](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/vectors-vector-embeddings-and-metrics)

You will use the `ResNet50V2` model from https://keras.io/api/applications/#available-models to get embeddings of the images.

Ignore any information (`I`) and warning (`W`) messages from the first `tensorflow` import below.

In [None]:
from tensorflow.keras.applications.resnet_v2 import ResNet50V2, preprocess_input
from tensorflow.keras.preprocessing import image as tf_image

import pandas as pd
import numpy as np
# from tqdm.auto import tqdm

In [None]:
mymodel = ResNet50V2(include_top=False, weights='imagenet', pooling='avg')

The model ResNet50V2 will be downloaded during the first instantiation to the folder `~/.keras/models/`.

In [None]:
!ls -lh ~/.keras/models/

To speed up processing a bit, you will reduce the size of the images by half, to `1024//2`. Note the use of `//` (floor division), which gives an integer result.
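
The difference between the two division operators can be checked with a short sketch. The variable names below are illustrative only and are not used elsewhere in this notebook.

```python
# Compare true division with floor division for the image size.
full_size = 1024

half_float = full_size / 2   # true division always returns a float
half_int = full_size // 2    # floor division of two ints returns an int

print(type(half_float).__name__, half_float)
print(type(half_int).__name__, half_int)

# An image size is a pair of whole pixel counts, like the one passed to target_size.
target_size = (half_int, half_int)
print(target_size)
```

A size given in pixels has to be a whole number, which is why the integer result of `//` is the one to use here.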

In [None]:
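# Function to get embeddings

def get_image_embedding(model, img_path):
    # Load the image resized to 512x512 pixels
    img = tf_image.load_img(img_path, target_size=(1024//2, 1024//2))
    x = tf_image.img_to_array(img)
    # Add a batch dimension: the model expects a batch of images
    x = np.expand_dims(x, axis=0)
    # Scale pixel values the way ResNet50V2 expects
    x = preprocess_input(x)
    embeddings = model.predict(x)
    # Return the embedding as a one-row DataFrame
    result = pd.DataFrame(embeddings[0]).T
    return result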

Get embeddings for all images in the source directory and store them in the `embedding_df` Pandas DataFrame for now.

In [None]:
import os

dir_images = './pets/'

path_images = os.listdir(dir_images)
# Collect one single-row DataFrame per image, then concatenate them once
embedding_rows = []
for current_img in path_images:
    curr_df = get_image_embedding(model=mymodel, img_path=dir_images + current_img)
    curr_df['image'] = current_img
    embedding_rows.append(curr_df)
embedding_df = pd.concat(embedding_rows, ignore_index=True)
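
`os.listdir()` returns every entry in the folder, so a stray non-image file (for example a hidden system file) would probably make the image loading fail. A minimal sketch of one way to keep only image files follows. The helper `list_image_files` and the demo file names other than `21_Ragdoll.webp` are hypothetical and not part of this exercise.

```python
import tempfile
from pathlib import Path

def list_image_files(folder, suffixes=(".webp",)):
    """Return sorted names of files in `folder` whose extension is in `suffixes`."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    )

# Demonstration on a throwaway folder with made-up, empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("21_Ragdoll.webp", "02_Beagle.webp", ".DS_Store", "notes.txt"):
        (Path(demo_dir) / name).touch()
    demo_files = list_image_files(demo_dir)

print(demo_files)
```

Sorting the names is a design choice rather than a requirement: it makes the order of rows in the resulting DataFrame reproducible between runs.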

In [None]:
# Check one of the generated embeddings
display(embedding_df.iloc[0])

# Note the 2048 fields with real numbers for dimensions 0 to 2047, plus the file name in the last field.

In [None]:
import pickle

# Open a file in binary write mode
with open('image_embeddings.pkl', 'wb') as file:
    # Serialize the DataFrame and write it to the file
    pickle.dump(embedding_df, file)

> Switch to virtual env now to load to HANA db.

In [None]:
import pickle

# Open the file in binary read mode
with open('image_embeddings.pkl', 'rb') as file:
    # Deserialize the DataFrame from the file
    embedding_df = pickle.load(file)

print(len(embedding_df))

## Load the embeddings into SAP HANA's Vector Engine

In [None]:
%run "../01-check_setup.ipynb"

## Upload into your SAP HANA database

...similarly to how you uploaded word vectors during the Week 2 exercise.

In [None]:
source_table="IMAGES"
source_schema="VECTORS"

In [None]:
myconn.connection.setautocommit(True)
mycursor = myconn.connection.cursor()

try:
    mycursor.execute(f'DROP TABLE "{source_schema}"."{source_table}"')
    myconn.connection.commit()

except Exception as e:
    # Handle any exceptions and possibly rollback the transaction
    myconn.connection.rollback()
    print("An error occurred:", e)

The table `IMAGES` will store:
- a file name in `"IMAGE_NAME"`
- a breed name in `"NAME"`
- an **i**mage embedding (or **v**ector) in `"IV"`
- a Base64-encoded image of a pet in `"IMAGE"`

In [None]:
myconn.create_table(
    source_table, schema=source_schema,
    table_structure={
        "IMAGE_NAME": "NVARCHAR(50)", 
        "NAME": "NVARCHAR(50)", 
        "IV": "REAL_VECTOR(2048)",
        "IMAGE": "NCLOB"
        }
    )

## Get image Base64 encodings to be stored in the database table 

In [None]:
from io import BytesIO
import base64

In [None]:
# Function to open and encode an image to Base64
def get_image_encoding(image_path, size=(400, 400)):
    img_resized = PILImage.open(image_path).resize(size)
    buffer = BytesIO()
    img_resized.save(buffer, format="WEBP")
    encoded_img = base64.b64encode(buffer.getvalue()).decode('utf-8')
    return encoded_img
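
Base64 turns arbitrary bytes into plain text, which is what allows the image to be stored in a text column. The round trip can be sketched with the standard library alone. The bytes below are a hypothetical stand-in, not a real image.

```python
import base64

# Stand-in for the binary content of an image file (made-up bytes, not a real image).
raw_bytes = bytes([0, 255, 16, 128, 7, 42, 99])

# Encode to a Base64 text string, as get_image_encoding does with the buffer content.
encoded = base64.b64encode(raw_bytes).decode("utf-8")

# Decode the text back to the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw_bytes)
print(len(raw_bytes), len(encoded))
```

Base64 emits 4 characters for every 3 input bytes, so the encoded text is roughly a third larger than the binary image.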


In the next cell, prepare the list of records `myrecords_to_insert` to be inserted into the database.

Each record has 4 fields with:
- a file name: `myrow[-1:][0]`
- a breed name derived from a file name: `myrow[-1:][0].split('.')[0].split('_')[1]`
- an image encoding: `get_image_encoding(dir_images+myrow[-1:][0])` for an image read from the file name
- a string representation of a vector embedding: `str(myrow[:-1])`
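
The expressions above can be tried on a single row first. This is a minimal sketch with a hypothetical 3-dimensional row; real rows in this exercise hold 2048 numbers before the file name.

```python
# Hypothetical row: a short embedding followed by the file name.
myrow = [0.12, 0.0, 3.5, "21_Ragdoll.webp"]

file_name = myrow[-1:][0]                       # last element, same as myrow[-1]
breed = file_name.split('.')[0].split('_')[1]   # text between the first '_' and the '.'
vector_text = str(myrow[:-1])                   # everything except the file name

print(file_name)    # 21_Ragdoll.webp
print(breed)        # Ragdoll
print(vector_text)  # [0.12, 0.0, 3.5]
```

Note that `split('_')[1]` keeps only the second underscore-separated part, so this approach assumes breed names in the file names contain no underscores.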

In [None]:
from PIL import Image as PILImage
from IPython.display import display

In [None]:
%%time
dir_images = './pets/'
my_embeddings=embedding_df.values.tolist()

myrecords_to_insert=[
    [myrow[-1:][0], 
    myrow[-1:][0].split('.')[0].split('_')[1], 
    get_image_encoding(dir_images+myrow[-1:][0]), 
    str(myrow[:-1])] 
    for myrow in my_embeddings]

In [None]:
# Display one of the records to see all 4 of its fields
display(myrecords_to_insert[0])

In [None]:
%%time
myconn.connection.setautocommit(False)
cursor = myconn.connection.cursor()

try:
    cursor.execute(f'TRUNCATE TABLE "{source_schema}"."{source_table}"')
    # Use the executemany method to insert the data
    cursor.executemany(
        f'''INSERT INTO "{source_schema}"."{source_table}" ("IMAGE_NAME", "NAME", "IMAGE", "IV") VALUES (?, ?, ?, TO_REAL_VECTOR(?))''', 
        myrecords_to_insert
    )

except Exception as e:
    # Handle any exceptions and possibly rollback the transaction
    myconn.connection.rollback()
    print("An error occurred:", e)

In [None]:
%%time
try:
    # Commit the transaction to save the changes
    myconn.connection.commit()

finally:
    # Close the cursor when done
    cursor.close()
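
The insert-then-commit pattern used in the cells above can be tried without a database server. The sketch below uses the standard-library `sqlite3` module purely as a stand-in; it is not the SAP HANA client, the table has only two of the four columns, and the second file name is made up.

```python
import sqlite3

# sqlite3 is only a stand-in here; it is not the SAP HANA client.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute('CREATE TABLE "IMAGES" ("IMAGE_NAME" TEXT, "NAME" TEXT)')

records = [
    ("21_Ragdoll.webp", "Ragdoll"),
    ("02_Beagle.webp", "Beagle"),   # hypothetical file name
]

try:
    # One statement with `?` placeholders, executed once per record.
    cur.executemany('INSERT INTO "IMAGES" ("IMAGE_NAME", "NAME") VALUES (?, ?)', records)
    conn.commit()       # make the inserts permanent
except sqlite3.Error as e:
    conn.rollback()     # discard partial work
    print("An error occurred:", e)
finally:
    cur.close()

row_count = conn.execute('SELECT COUNT(*) FROM "IMAGES"').fetchone()[0]
print(row_count)
conn.close()
```

In this sketch the commit sits inside the `try` block, so it only runs when every insert succeeded. That is one possible design choice and differs from the notebook, where the commit is a separate cell.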

## Check data in the database table

In [None]:
## Check the size of the table in the database
print(f"Number of records in the table {source_table}: {myconn.table(table=source_table, schema=source_schema).count()}")

In [None]:
## Display a record for one of the entries
word='MaineCoon'

sql = f'''
SELECT "A".* FROM "{source_schema}"."{source_table}" AS "A"
WHERE "A"."NAME"='{word}'
'''

hdf = myconn.sql(sql)
print(hdf.select_statement)
hdf.head(3).collect()

Note that the `"IV"` column is binary and can be represented in different formats in different database client tools, as mentioned by Dirk O. in his comment: https://community.sap.com/t5/application-development-discussions/questions-re-quot-multi-model-with-sap-hana-cloud-quot-developer-challenge/m-p/13732043/highlight/true#M2028526

It is only when transformed to a string with [`TO_NVARCHAR()`](https://help.sap.com/docs/hana-cloud-database/sap-hana-cloud-sap-hana-database-vector-engine-guide/to-nvarchar-function-data-type-conversion?version=2024_1_QRC&locale=en-US) that you can see its vector representation.

In [None]:
## Display the vector of one of the entries as a string
word='MaineCoon'

sql = f'''
SELECT TO_NVARCHAR("IV") FROM "{source_schema}"."{source_table}" AS "A"
WHERE "A"."NAME"='{word}'
'''

hdf = myconn.sql(sql)
print(hdf.select_statement)

__import__("pandas").set_option('display.max_colwidth', 180)
display(hdf.head(3).collect())

__import__("pandas").reset_option('display.max_colwidth')
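
If you need the numbers back in Python, the string form can be parsed again. The sketch below assumes that the text returned by `TO_NVARCHAR()` is a bracketed, comma-separated list of numbers; check this against the output displayed above. The sample value is hypothetical.

```python
import json

# Hypothetical value, assuming the text form is a bracketed, comma-separated list.
vector_text = "[0.12,0,3.5]"

# A bracketed list of numbers is valid JSON, so json.loads can parse it.
vector = json.loads(vector_text)

print(vector)
print(len(vector))
```

For a vector stored in this exercise, the parsed list should have 2048 elements, matching the `REAL_VECTOR(2048)` column definition.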