# Image recognition with Python, OpenCV, OpenAI CLIP model and PostgreSQL `pgvector`

This repository contains the working code for the example in the [blog post](https://aiven.io/developer/find-faces-with-pgvector).

The overall flow is shown below:

![Overall flow](https://github.com/Aiven-Labs/pgvector-image-recognition/blob/main/entire_flow.jpg?raw=1)

## Step 0: Install requirements

In [None]:
# %pip install opencv-python imgbeddings psycopg2-binary

Note: you may need to restart the kernel to use updated packages.


## Step 1: Face recognition

Detect the faces in the [test-image](test-image.png) picture and store each one as a cropped image in the `stored-faces` folder.

In [None]:
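import os

import cv2

# loading the Haar cascade algorithm file into the alg variable
alg = "haarcascade_frontalface_default.xml"
# passing the algorithm to OpenCV
haar_cascade = cv2.CascadeClassifier(alg)
# path of the picture to scan; point this at your own file (for example test-image.png)
file_name = "sample.jpeg"
# reading the image in color (OpenCV loads it in BGR channel order)
img = cv2.imread(file_name)
# creating a grayscale version of the image for the detector
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# detecting the faces; each result is a rectangle (x, y, width, height)
faces = haar_cascade.detectMultiScale(
    gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(100, 100)
)

# cv2.imwrite does not create missing folders, so make sure the target exists
os.makedirs("stored-faces", exist_ok=True)

# cropping each detected face and saving it as stored-faces/<index>.jpg
for i, (x, y, w, h) in enumerate(faces):
    cropped_image = img[y : y + h, x : x + w]
    target_file_name = "stored-faces/" + str(i) + ".jpg"
    cv2.imwrite(target_file_name, cropped_image)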

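The crop in the loop above indexes rows (`y`) before columns (`x`), which is easy to get backwards. The sketch below shows the same slicing on a plain nested list instead of an OpenCV image, so it runs without any dependencies. `crop` is a hypothetical helper written for this illustration, not part of OpenCV.

```python
def crop(image_rows, x, y, w, h):
    """Return the w-by-h region whose top-left corner sits at column x, row y."""
    return [row[x : x + w] for row in image_rows[y : y + h]]


# A tiny 4-row by 5-column "image" whose pixel values encode (row, column).
image = [[10 * r + c for c in range(5)] for r in range(4)]

# A detection of the form (x, y, w, h): 2 pixels wide, 3 pixels tall,
# starting at column 1 and row 1.
face = crop(image, x=1, y=1, w=2, h=3)
print(face)  # [[11, 12], [21, 22], [31, 32]]
```

With a NumPy array, as returned by `cv2.imread`, the equivalent expression is `img[y : y + h, x : x + w]`.
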
## Step 2: Embeddings Calculation

Calculate the embeddings of the stored faces and push them to PostgreSQL. Replace the `<SERVICE_URI>` placeholder with your PostgreSQL Service URI.

In [None]:
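import os

import psycopg2
from imgbeddings import imgbeddings
from PIL import Image

# connecting to the database; replace <SERVICE_URI> with your PostgreSQL Service URI
conn = psycopg2.connect("<SERVICE_URI>")

# loading `imgbeddings` once instead of once per picture
ibed = imgbeddings()

cur = conn.cursor()
for filename in os.listdir("stored-faces"):
    # opening the cropped face
    img = Image.open(os.path.join("stored-faces", filename))
    # calculating the embeddings
    embedding = ibed.to_embeddings(img)
    # storing the file name together with its embedding
    cur.execute("INSERT INTO pictures values (%s,%s)", (filename, embedding[0].tolist()))
    print(filename)
conn.commit()
cur.close()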

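The `INSERT` in Step 2 assumes that a `pictures` table with a pgvector column already exists. The following is a minimal sketch of one way to create it, not necessarily the schema used in the blog post: the column name `picture` and the default size of 768 dimensions are assumptions (the `embedding` column name comes from the query used later). Check the real size with `len(embedding[0])` before creating the table. `setup_statements` and `create_pictures_table` are hypothetical helpers.

```python
def setup_statements(dimensions=768):
    """Build the SQL that prepares the database for the INSERT in Step 2.

    `picture` is an assumed column name and 768 an assumed embedding size.
    """
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE TABLE IF NOT EXISTS pictures "
        f"(picture text PRIMARY KEY, embedding vector({int(dimensions)}));",
    ]


def create_pictures_table(conn, dimensions=768):
    """Run the setup statements on an open psycopg2 connection."""
    with conn.cursor() as cur:
        for statement in setup_statements(dimensions):
            cur.execute(statement)
    conn.commit()


# Printing the statements needs no database connection.
for statement in setup_statements():
    print(statement)
```

Calling `create_pictures_table(conn)` once, before the loop in Step 2, should be enough. The column order (file name first, embedding second) matches the positional `INSERT`.
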
## Step 3: Calculate embeddings on a new picture

Calculate the embeddings of the face in the picture `solo-image.png`, which is used as the input for the search.

In [None]:
# loading the face image path into file_name variable
file_name = "solo-image.png" 
# opening the image
img = Image.open(file_name)
# loading the `imgbeddings`
ibed = imgbeddings()
# calculating the embeddings
embedding = ibed.to_embeddings(img)

## Step 4: Find similar images by querying the PostgreSQL database using pgvector

In [None]:
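# aliasing IPython's Image so it does not shadow PIL's Image imported in Step 2
from IPython.display import Image as DisplayImage, display

cur = conn.cursor()
# pgvector accepts a vector as a text literal such as '[0.1,0.2,...]'
string_representation = "[" + ",".join(str(x) for x in embedding[0].tolist()) + "]"
# `<->` is pgvector's Euclidean (L2) distance operator, so the closest picture comes first
cur.execute("SELECT * FROM pictures ORDER BY embedding <-> %s LIMIT 1;", (string_representation,))
rows = cur.fetchall()
for row in rows:
    display(DisplayImage(filename="stored-faces/" + row[0]))
cur.close()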
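
To make the query above more concrete, here is a small pure-Python sketch of what `ORDER BY embedding <-> ... LIMIT 1` computes: the stored vector with the smallest Euclidean (L2) distance to the query vector. It illustrates the idea only and does not show how pgvector implements it. The file names and 3-dimensional vectors are made up, and `to_vector_literal` and `nearest` are hypothetical helpers.

```python
import math


def to_vector_literal(values):
    """Format numbers as the text form used in the query above: [v1,v2,...]."""
    return "[" + ",".join(str(v) for v in values) + "]"


def nearest(query, stored):
    """Return the name of the stored vector with the smallest L2 distance to query."""
    return min(stored, key=lambda name: math.dist(query, stored[name]))


# Made-up 3-dimensional embeddings; real ones have many more dimensions.
stored = {
    "0.jpg": [0.0, 0.0, 0.0],
    "1.jpg": [1.0, 1.0, 1.0],
    "2.jpg": [5.0, 5.0, 5.0],
}
query = [0.9, 1.2, 1.0]

print(to_vector_literal(query))  # [0.9,1.2,1.0]
print(nearest(query, stored))    # 1.jpg
```

The database does the same comparison against every row of `pictures` (or against an index, if one is defined) and returns the closest match.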