# Image recognition with Python, OpenCV, OpenAI CLIP model and PostgreSQL `pgvector` 

This repository contains the working code for the example in the [blog post](https://aiven.io/developer/find-faces-with-pgvector).

The overall flow is shown below:

![Overall flow](entire_flow.jpg)

## Step 0: Install requirements

In [38]:
!pip install -r requirements.txt
!pip install ipyplot

Collecting ipyplot
  Downloading ipyplot-1.1.1-py3-none-any.whl (13 kB)
Collecting shortuuid
  Downloading shortuuid-1.0.11-py3-none-any.whl (10 kB)
Installing collected packages: shortuuid, ipyplot
Successfully installed ipyplot-1.1.1 shortuuid-1.0.11

## Step 1: Face recognition

Detect the faces in every image under the `train/muffin` and `train/chihuahua` folders and store each cropped face in the `stored-faces` folder.
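
OpenCV's `detectMultiScale` reports each detected face as an `(x, y, w, h)` rectangle, and the cell below crops rows `y` to `y + h` and columns `x` to `x + w`. As a minimal, dependency-free sketch of that indexing (the `crop_box` helper and the nested-list image are illustrative assumptions, not part of the repository):

```python
def crop_box(pixels, x, y, w, h):
    """Return the w-by-h region whose top-left corner is column x, row y.

    `pixels` is a list of rows; with a NumPy image the same crop is
    written as pixels[y : y + h, x : x + w].
    """
    return [row[x : x + w] for row in pixels[y : y + h]]


# A tiny 4x5 "image" where each value encodes its (row, column) position.
image = [[10 * r + c for c in range(5)] for r in range(4)]

# Crop a box 2 columns wide and 3 rows tall, starting at column 1, row 0.
face = crop_box(image, x=1, y=0, w=2, h=3)
print(face)
```

Note that the row index (`y`) comes first and the column index (`x`) second, which is the opposite of the order in which the rectangle is reported.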

In [4]:
# importing the cv2 library
import cv2
import os

# loading the Haar cascade definition file name into the alg variable
alg = "haarcascade_frontalface_default.xml"
# passing the cascade file to OpenCV
haar_cascade = cv2.CascadeClassifier(alg)

# making sure the output folder exists, since cv2.imwrite does not create it
os.makedirs("stored-faces", exist_ok=True)

def detect_faces(folder):
    # scanning every image under train/<folder>
    for file_name in os.listdir("train/" + folder):
        # reading the image directly in grayscale
        gray_img = cv2.imread("train/" + folder + "/" + file_name, cv2.IMREAD_GRAYSCALE)
        # skipping files that OpenCV cannot decode
        if gray_img is None:
            continue
        # detecting the faces
        faces = haar_cascade.detectMultiScale(
            gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(100, 100)
        )

        # for each face detected
        for i, (x, y, w, h) in enumerate(faces):
            # crop the image to select only the face
            cropped_image = gray_img[y : y + h, x : x + w]
            # building the target path as stored-faces/<folder><file_name><i>.jpg
            target_file_name = "stored-faces/" + folder + file_name + str(i) + ".jpg"
            cv2.imwrite(target_file_name, cropped_image)

detect_faces("muffin")
detect_faces("chihuahua")



## Step 2: Embeddings Calculation

Calculate the embeddings for each stored face and insert them into PostgreSQL. You'll need to replace the `<PG_URI>` placeholder with your PostgreSQL service URI.
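
The `INSERT` in this step expects a `pictures` table to exist already. A minimal sketch of how it could be created with `psycopg2`, assuming the `vector` extension is available on the server, and assuming the column names `picture` and `embedding` and a dimension of 768 (the size `imgbeddings` documents for its output); none of these are confirmed by this repository:

```python
# Assumed schema: the repository does not show how `pictures` was created.
EMBEDDING_DIM = 768  # assumption: dimension of the vectors returned by imgbeddings


def schema_statements(dim=EMBEDDING_DIM):
    """Build the SQL needed before the first INSERT (hypothetical helper)."""
    return [
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE TABLE IF NOT EXISTS pictures "
        f"(picture text PRIMARY KEY, embedding vector({dim}));",
    ]


def create_schema(conn, dim=EMBEDDING_DIM):
    """Run the statements on an open psycopg2 connection and commit."""
    with conn.cursor() as cur:
        for statement in schema_statements(dim):
            cur.execute(statement)
    conn.commit()
```

With the notebook's connection this would be called once as `create_schema(conn)` before the insert loop runs.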

In [6]:
# importing the required libraries
from imgbeddings import imgbeddings
from PIL import Image
import psycopg2
import os

# connecting to the database - replace <PG_URI> with your PostgreSQL service URI
conn = psycopg2.connect("<PG_URI>")

# loading `imgbeddings` once, so the model is not reloaded for every image
ibed = imgbeddings()
cur = conn.cursor()

for filename in os.listdir("stored-faces"):
    # opening the image
    img = Image.open("stored-faces/" + filename)
    # calculating the embeddings
    embedding = ibed.to_embeddings(img)
    # storing the file name and its embedding in the pictures table
    cur.execute("INSERT INTO pictures values (%s,%s)", (filename, embedding[0].tolist()))
    print(filename)
cur.close()
conn.commit()

chihuahuaimg_1_190.jpg0.jpg
muffinimg_1_410.jpg1.jpg
chihuahuaimg_4_381.jpg11.jpg
chihuahuaimg_4_936.jpg19.jpg
chihuahuaimg_2_864.jpg0.jpg
muffinimg_0_830.jpg2.jpg
chihuahuaimg_4_660.jpg0.jpg
muffinimg_3_1081.jpg0.jpg
chihuahuaimg_0_626.jpg1.jpg
muffinimg_3_96.jpg4.jpg
muffinimg_2_190.jpg0.jpg
muffinimg_4_893.jpg2.jpg
chihuahuaimg_0_759.jpg0.jpg
chihuahuaimg_2_69.jpg3.jpg
muffinimg_0_495.jpg0.jpg
chihuahuaimg_0_1126.jpg0.jpg
chihuahuaimg_1_1148.jpg0.jpg
muffinimg_1_268.jpg2.jpg
muffinimg_4_156.jpg2.jpg
chihuahuaimg_3_1032.jpg0.jpg
muffinimg_3_397.jpg0.jpg
muffinimg_2_854.jpg0.jpg
chihuahuaimg_0_372.jpg1.jpg
muffinimg_0_68.jpg2.jpg
muffinimg_4_650.jpg0.jpg
chihuahuaimg_0_397.jpg0.jpg
muffinimg_4_436.jpg0.jpg
chihuahuaimg_0_239.jpg2.jpg
chihuahuaimg_3_804.jpg0.jpg
chihuahuaimg_3_494.jpg0.jpg
muffinimg_3_115.jpg1.jpg
muffinimg_3_491.jpg2.jpg
muffinimg_4_651.jpg0.jpg
chihuahuaimg_3_805.jpg0.jpg
chihuahuaimg_4_655.jpg2.jpg
muffinimg_0_183.jpg3.jpg
muffinimg_0_407.jpg0.jpg
muffinimg_4_469.jp

## Step 3: Calculate embeddings on a new picture

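Find the largest face in the picture used for the search (`mix.png` in the code below) and calculate its embeddings. If no face is detected, the embeddings are calculated on the whole picture.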
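
When several faces are detected in the picture, only the one with the largest area is kept. That selection can be sketched on its own, assuming the detector's `(x, y, w, h)` rectangles as input; `largest_face` is a hypothetical helper name, not something the repository defines:

```python
def largest_face(faces):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    best = None
    for box in faces:
        x, y, w, h = box
        # Compare areas (width * height) and keep the biggest box seen so far.
        if best is None or w * h > best[2] * best[3]:
            best = (x, y, w, h)
    return best


# Made-up rectangles standing in for the detector's output.
boxes = [(10, 20, 200, 200), (300, 40, 250, 260), (5, 5, 210, 205)]
print(largest_face(boxes))
print(largest_face([]))
```

Returning `None` for an empty result makes the "no face found" case explicit, so the caller can fall back to the full picture.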

In [52]:
import cv2
from PIL import Image
# loading the query image path into file_name variable - replace mix.png with the path to your image
file_name = "mix.png"
# opening the image directly in grayscale
gray_img = cv2.imread(file_name, cv2.IMREAD_GRAYSCALE)
# detecting the faces
faces = haar_cascade.detectMultiScale(
    gray_img, scaleFactor=1.05, minNeighbors=2, minSize=(200, 200)
)

# finding the largest face by area (width * height)
max_area = 0
# falling back to the full picture when no face is detected
filename = file_name
for x, y, w, h in faces:
    if h * w > max_area:
        max_area = h * w
        # crop the image to select only the face and save it
        cv2.imwrite("grey.jpg", gray_img[y : y + h, x : x + w])
        filename = "grey.jpg"

img = Image.open(filename)
# loading the `imgbeddings`
ibed = imgbeddings()

# calculating the embeddings
embedding = ibed.to_embeddings(img)

## Step 4: Find similar images by querying the PostgreSQL database using pgvector
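
The query in this step orders rows with pgvector's `<->` operator, which computes Euclidean (L2) distance, and passes the query embedding as a text literal such as `[1.0,2.0]`. A pure-Python sketch of both ideas, using made-up three-dimensional vectors and hypothetical helper names (`to_vector_literal`, `l2_distance`, `nearest`); the real ranking happens inside PostgreSQL:

```python
import math


def to_vector_literal(values):
    """Format numbers as the text form pgvector accepts, e.g. [1.0,2.5]."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def l2_distance(a, b):
    """Euclidean distance, the metric behind pgvector's <-> operator."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest(query, stored, limit):
    """Mimic ORDER BY embedding <-> query LIMIT n on an in-memory dict."""
    ranked = sorted(stored, key=lambda name: l2_distance(stored[name], query))
    return ranked[:limit]


# Made-up embeddings; real ones come from imgbeddings.
stored = {
    "a.jpg": [0.0, 0.0, 0.0],
    "b.jpg": [1.0, 1.0, 1.0],
    "c.jpg": [0.9, 1.0, 1.2],
}
query = [1.0, 1.0, 1.0]

print(to_vector_literal(query))
print(nearest(query, stored, limit=2))
```

A smaller distance means a closer match, so the first row returned is the stored face most similar to the query picture.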

In [53]:
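# importing only display, so that PIL's Image imported earlier is not shadowed
from IPython.display import display
import ipyplot

cur = conn.cursor()
# serializing the embedding as a pgvector text literal, e.g. [0.1,0.2,...]
string_representation = "["+ ",".join(str(x) for x in embedding[0].tolist()) +"]"
# fetching the 20 stored faces closest to the query embedding; <-> is pgvector's Euclidean (L2) distance operator
cur.execute("SELECT * FROM pictures ORDER BY embedding <-> %s LIMIT 20;", (string_representation,))
rows = cur.fetchall()
cur.close()
# showing the query picture
display(gray_img)
# collecting the paths of the matching faces; the first column holds the file name
images = ["stored-faces/" + row[0] for row in rows]
ipyplot.plot_images(images, max_images=20, img_width=100)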
