# How to create a vector database with PostgreSQL and pgvector

## 1. Create the PostgreSQL + pgvector database in a container

### docker-compose.yml

First, we need to create a `docker-compose.yml` file that defines the required services.

In this file, we define a service named `db` based on the Docker image `pgvector/pgvector:pg16`. The service exposes port `5432` so that clients can reach the database, and sets environment variables for the database name, user, password and authentication method. We also mount an `init.sql` file into the `/docker-entrypoint-initdb.d` directory inside the container, where the image runs it when the database is first initialized.
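
As a rough sketch, the file described above might look like the following. Python is used only to write the file to disk, so the example stays in the same language as the rest of this notebook. The database name, user and password mirror the credentials used in the connection cell later in this document; the `trust` value for `POSTGRES_HOST_AUTH_METHOD`, the `./init.sql` path and the helper name `write_compose_file` are assumptions, not taken from the original file.

```python
from pathlib import Path

# Assumed content for docker-compose.yml. The credentials mirror those used in
# the connection cell later on; the auth method and volume path are guesses.
COMPOSE_YAML = """\
services:
  db:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: vectordb
      POSTGRES_USER: testuser
      POSTGRES_PASSWORD: testpwd
      POSTGRES_HOST_AUTH_METHOD: trust
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
"""


def write_compose_file(directory="."):
    """Write the sketch above to <directory>/docker-compose.yml and return its path."""
    path = Path(directory) / "docker-compose.yml"
    path.write_text(COMPOSE_YAML, encoding="utf-8")
    return path
```

Calling `write_compose_file()` in the project directory produces the file; adjust the values to match your own setup before starting the container.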

### The init.sql file

In this `init.sql` script, we enable the pgvector extension if it is not already enabled (the extension is registered under the name `vector`). We then create a table named `embedding` with the columns `id`, `embedding`, `text` and `created_at`.
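
A minimal sketch of such a script, again written out from Python. The column names come from the description above; the column types, the 768 dimension and the helper name `write_init_sql` are illustrative assumptions.

```python
from pathlib import Path

# Assumed content for init.sql. Only the column names are taken from the text;
# the types and the vector dimension are guesses.
INIT_SQL = """\
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS embedding (
    id bigserial PRIMARY KEY,
    embedding vector(768),
    text text,
    created_at timestamptz DEFAULT now()
);
"""


def write_init_sql(directory="."):
    """Write the sketch above to <directory>/init.sql and return its path."""
    path = Path(directory) / "init.sql"
    path.write_text(INIT_SQL, encoding="utf-8")
    return path
```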

### Start the container

```
docker-compose up -d
```

This command creates a Docker container running the PostgreSQL server with the `pgvector` extension already installed and configured, according to the specification in `docker-compose.yml`.


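#### Deprecated

Create a PostgreSQL database with `docker run`, using one of the two images below (both commands use the same container name, so run only one of them):

```
# Plain PostgreSQL image, without pgvector
docker run -d --name postgresCont -p 5432:5432 -e POSTGRES_PASSWORD=pass123 postgres
# Image with pgvector preinstalled
docker run -d --name postgresCont -p 5432:5432 -e POSTGRES_PASSWORD=pass123 pgvector/pgvector:pg16
docker exec -it postgresCont bash
psql -h localhost -U postgres
```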

## 2. Connect with psql

### Open a shell in the container

`docker exec -it <container id> bash`

### Connect to the database with `psql`

`psql -h localhost -U postgres -d vectordb`

Replace `postgres` with the user configured in `docker-compose.yml` if it differs.

## 3. Import the sentences and vectors

In [3]:
import os
import pickle
import time

embedding_cache_path = "quora-embeddings-quora-distilbert-multilingual-size-100000.pkl"

embedding_size = 768  # Size of embeddings
top_k_hits = 10  # Output k hits

with open(embedding_cache_path, "rb") as fIn:
    cache_data = pickle.load(fIn)
    corpus_sentences = cache_data["sentences"]
    corpus_embeddings = cache_data["embeddings"].tolist()
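
Before loading anything into the database, it can help to check that the cache is consistent: one embedding per sentence, each with the expected dimension. The helper `check_corpus` below is a hypothetical sketch, not part of the original notebook.

```python
def check_corpus(sentences, embeddings, expected_dim=768):
    """Return the number of rows, or raise ValueError if the data is inconsistent."""
    if len(sentences) != len(embeddings):
        raise ValueError(
            f"{len(sentences)} sentences but {len(embeddings)} embeddings"
        )
    for i, vec in enumerate(embeddings):
        if len(vec) != expected_dim:
            raise ValueError(
                f"embedding {i} has {len(vec)} dimensions, expected {expected_dim}"
            )
    return len(sentences)
```

In this notebook it would be called as `check_corpus(corpus_sentences, corpus_embeddings, embedding_size)`.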

In [98]:
for sentence in corpus_sentences[:10]:
    print(sentence)

When is surge pricing on Uber generally in effect in Oakland and how high does it go?
It's only 1 month left for my 12th (PCM) CBSE board exams 2017 and I didn't study at all. How can I get 80%+ ? Any tips guys?
Who is the richest disabled person in the world?
I didn't file my taxes last year. What are the forms that I will have to fill out? When is the last day to do so?
If the Bible is written by many authors, who actually assembled the anthology?
How long time charge a new mobile before first use?
Why do African-Americans seem to have lighter skin tones than Africans?
What is the difference between a graphic novel and a comic?
What would be the best online educational resources to learn for an affiliate marketing beginner?
What is the royal society?


## 4. Create the vector representations with Sentence Transformers
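
A minimal sketch of this step, assuming the `sentence-transformers` package is installed and using the `quora-distilbert-multilingual` model named in the cache file above. The helper names `load_model` and `build_cache` are hypothetical; the output layout (a dict with `sentences` and `embeddings` keys) follows the cache file loaded in section 3.

```python
import pickle


def load_model(name="quora-distilbert-multilingual"):
    """Load the model lazily, so the heavy import only happens when needed."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(name)


def build_cache(sentences, model, path):
    """Encode the sentences and pickle them next to their embeddings."""
    # encode() returns one vector per sentence, in the same order as the input.
    embeddings = model.encode(
        sentences, show_progress_bar=True, convert_to_numpy=True
    )
    with open(path, "wb") as f_out:
        pickle.dump({"sentences": sentences, "embeddings": embeddings}, f_out)
    return embeddings
```

A call such as `build_cache(my_sentences, load_model(), "my-cache.pkl")` would produce a file that the loading cell in section 3 can read; both argument values here are placeholders.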

## 5. Create a 'quora' table in the database

### Connecting to the database with Psycopg

- https://www.psycopg.org/psycopg3/docs/basic/usage.html

In [None]:
# ! pip install "psycopg[binary]"

In [5]:
import psycopg

# DEFINE THE DATABASE CREDENTIALS
user = 'testuser'
password = 'testpwd'
# host = '127.0.0.1'
host = 'localhost'
port = 5432
database = 'vectordb'

db_url = f"postgresql://{user}:{password}@{host}:{port}/{database}"
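
The f-string above works for simple credentials. If the user name or password contains characters such as `@` or `/`, they need to be percent-encoded before being placed in the URL. A minimal sketch, with a hypothetical helper `build_db_url`:

```python
from urllib.parse import quote


def build_db_url(user, password, host, port, database):
    """Build a PostgreSQL connection URL, percent-encoding the credentials."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )
```

With the credentials from the cell above, `build_db_url(user, password, host, port, database)` yields the same string as `db_url`.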

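### Creating the 'quora' table

- For the `embedding` column: `vector(768)` or plain `vector`? pgvector accepts both; with `vector(768)`, inserting a vector of any other length raises an error.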

In [19]:
# Connect to an existing database
with psycopg.connect(conninfo=db_url) as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:
        # Execute a command: this creates a new table
        cur.execute("""
            CREATE TABLE IF NOT EXISTS quora (
                id serial PRIMARY KEY,
                sentence text NOT NULL,
                embedding vector(768) NOT NULL
                );
            """)
        # Make the changes to the database persistent
        conn.commit()

## 6. Save the data and vectors to the database

In [21]:
with psycopg.connect(conninfo=db_url) as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:
        # Insert one row per (sentence, embedding) pair
        for sentence, embedding in zip(corpus_sentences, corpus_embeddings):
            cur.execute("""INSERT INTO quora (sentence, embedding) VALUES (%s, %s);""", (sentence, embedding))
        # Make the changes to the database persistent
        conn.commit()
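
The loop above sends one `INSERT` per row. As an alternative sketch, the rows can be grouped and passed to psycopg's `cursor.executemany`, committing after each batch so that a failure does not discard everything. The helpers `chunks` and `insert_in_batches` and the default batch size are assumptions made for illustration.

```python
def chunks(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def insert_in_batches(conn, sentences, embeddings, batch_size=1000):
    """Insert (sentence, embedding) pairs batch by batch; return the row count."""
    sql = "INSERT INTO quora (sentence, embedding) VALUES (%s, %s)"
    total = 0
    with conn.cursor() as cur:
        for batch in chunks(zip(sentences, embeddings), batch_size):
            cur.executemany(sql, batch)
            conn.commit()  # persist each batch before moving on
            total += len(batch)
    return total
```

It would be used as `insert_in_batches(conn, corpus_sentences, corpus_embeddings)` inside a `with psycopg.connect(conninfo=db_url) as conn:` block.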

## 7. Use vector search

### Encode the query with Sentence Transformers

- https://huggingface.co/sentence-transformers/quora-distilbert-multilingual

This step misbehaves when run locally...

### Loading the query vectors

In [78]:
# load queries vectors
with open("queries.pkl", "rb") as fIn:
    cache_data = pickle.load(fIn)
    queries_sentences = cache_data["sentences"]
    queries_embeddings = cache_data["embeddings"].tolist()

In [79]:
for sentence in queries_sentences:
    print(sentence)

What is the sense of the universe?
Why is it so hard to learn AI?


### Nearest-neighbor search for a vector

In [96]:
# Connect to an existing database
with psycopg.connect(conninfo=db_url) as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:
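        # Fetch the 10 rows closest to the second query, using L2 distance (<->).
        # str() turns the list into '[x, y, ...]', a text form pgvector accepts.
        res = cur.execute("""
            SELECT id, sentence FROM quora ORDER BY embedding <-> %s LIMIT 10;
        """, (str(queries_embeddings[1]), )).fetchall()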
        for row in res:
            print(f"id : {row[0]} | sentence : {row[1]}")

id : 40586 | sentence : What is the best way to learn about AI if you aren't an engineer?
id : 7861 | sentence : What is the biggest unresolved problem for AI?
id : 21625 | sentence : Do AI and machine learning involve a lot of coding?
id : 53062 | sentence : How do I become an expert in artificial intelligence?
id : 20271 | sentence : Will be better able to predict how AI might behave if we always know that it will behave rationally?
id : 59109 | sentence : Is it important for electronics engineering student to learn python and why?
id : 79515 | sentence : What's AI really?
id : 43546 | sentence : How do I become an Artificial Intelligence expert?
id : 73069 | sentence : Many famous IT people are worried about AI and its usage in future. Is there a case when someone used AI for a bad thing nowadays?
id : 33058 | sentence : How do I start learning about artificial intelligence?
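
The query above orders rows with `<->`, which pgvector defines as Euclidean (L2) distance; the query in the next section uses `<=>`, which is cosine distance. The pure-Python sketch below shows what each operator computes for two vectors. It is meant only as an illustration, not as a replacement for the database operators.

```python
import math


def l2_distance(a, b):
    """Euclidean distance, the quantity behind pgvector's <-> operator."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity behind pgvector's <=> operator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)
```

Cosine distance ignores vector length, so two vectors pointing in the same direction have distance 0 even when their L2 distance is large. The two operators can therefore rank the same rows differently unless the embeddings are normalized.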


### Nearest neighbors of a row (random id)

In [89]:
import random
# serial ids start at 1, so 0 would match no row
rand_id = random.randint(1, 100000)

print(f"Random id : {rand_id}")
print()

# Connect to an existing database
with psycopg.connect(conninfo=db_url) as conn:
    # Open a cursor to perform database operations
    with conn.cursor() as cur:

        records = cur.execute("""
            SELECT id, sentence
            FROM quora
            ORDER BY embedding <=> (SELECT embedding FROM quora WHERE id = %s)
            LIMIT 20;
        """, (rand_id, )).fetchall()

        for record in records:
            print(f"id : {record[0]} | sentence : {record[1]}")

Random id : 52964

id : 52964 | sentence : How do I potty train my two-month-old Labrador pup?
id : 25349 | sentence : How do you potty train a 4 month old puppy?
id : 24514 | sentence : How do I potty train a puppy?
id : 97210 | sentence : How do you potty train large puppies?
id : 97652 | sentence : How do you potty train Mini Westie puppies?
id : 11830 | sentence : How can I potty train a Pug puppy?
id : 83019 | sentence : How do you potty train White Pitbull puppies?
id : 37333 | sentence : How do you potty train a English Bulldog/Pitbull mix puppy?
id : 41151 | sentence : How do you potty train a 4 months Pitbull?
id : 76224 | sentence : How do you potty train a 4 year old, nonverbal autistic child?
id : 72484 | sentence : How do you potty train a 6-month old Pit Bull?
id : 20560 | sentence : How do you train Dachshund/Lab mix puppies?
id : 32865 | sentence : How should you train a Dachshund/Lab mix puppy?
id : 66478 | sentence : How do I train my Beagle/German Shepherd mix puppy 
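
The first hit above is the seed row itself, since its distance to itself is zero. A hypothetical helper that filters it out might look like the sketch below; it assumes an open psycopg connection and the `quora` table created earlier, and the name `neighbors_of_row` is not from the original notebook.

```python
NEIGHBORS_SQL = """
    SELECT id, sentence
    FROM quora
    WHERE id <> %(row_id)s
    ORDER BY embedding <=> (SELECT embedding FROM quora WHERE id = %(row_id)s)
    LIMIT %(k)s;
"""


def neighbors_of_row(conn, row_id, k=20):
    """Return the k rows closest (cosine distance) to row `row_id`, excluding it."""
    with conn.cursor() as cur:
        cur.execute(NEIGHBORS_SQL, {"row_id": row_id, "k": k})
        return cur.fetchall()
```

It would be called as `neighbors_of_row(conn, rand_id)` inside a `with psycopg.connect(conninfo=db_url) as conn:` block.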