# Technical Test: AI-Powered Q&A System

Installing the libraries

In [19]:
%pip install PyPDF2 sentence-transformers faiss-cpu transformers torch numpy

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Note: you may need to restart the kernel to use updated packages.
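The tokenizers warning above recommends setting `TOKENIZERS_PARALLELISM` explicitly. Here is a minimal sketch that silences it by disabling tokenizer parallelism. It assumes the variable is set before `transformers` or `sentence_transformers` loads a tokenizer.

```python
import os

# Disable Hugging Face tokenizers parallelism before any tokenizer is loaded,
# as the warning suggests ("true" is the other accepted value).
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```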


In [20]:
import os

Loading the text or PDF file

In [21]:
import PyPDF2

def load_text_from_file(file_path):
    """Return the text content of a .txt or .pdf file."""
    lower_path = file_path.lower()
    if lower_path.endswith('.txt'):
        with open(file_path, 'r', encoding='utf-8') as file:
            return file.read()
    elif lower_path.endswith('.pdf'):
        text = ""
        with open(file_path, "rb") as file:
            reader = PyPDF2.PdfReader(file)
            for page in reader.pages:
                # Fall back to an empty string for pages with no extractable text
                text += (page.extract_text() or "") + "\n"
        return text
    else:
        raise ValueError("Unsupported format. Use .txt or .pdf")

file_path = "Description.txt"
document_text = load_text_from_file(file_path)
# Test
print(document_text)


NEOV is an AI-powered company focused on automation and efficiency in the finance sector.
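This notebook embeds the whole file as a single vector. That is fine for a one-line document. For longer files, a common approach is to split the text into overlapping chunks before embedding it. Here is a minimal sketch using word-based chunks. The names `chunk_text`, `chunk_size`, and `overlap` and their default values are hypothetical and do not come from the original notebook.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into overlapping chunks of at most chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # how far each new chunk advances
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the last word has been included
        if start + chunk_size >= len(words):
            break
    return chunks

sample = " ".join(f"w{i}" for i in range(120))
chunks = chunk_text(sample, chunk_size=50, overlap=10)
print(len(chunks))  # 3
```

You could then encode the chunk list in one call with `embedding_model.encode(chunks)` and index it with `db.add_texts(chunks, ...)`, so retrieval returns the relevant passage rather than the whole file.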



Converting the text into numerical vectors

In [22]:
from sentence_transformers import SentenceTransformer

embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def generate_embeddings(text):
    return embedding_model.encode(text, convert_to_tensor=False) 

embedding = generate_embeddings(document_text)
print("Generated vector:", embedding.shape)


Generated vector: (384,)
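`all-MiniLM-L6-v2` maps each text to a 384-dimensional vector, and retrieval compares these vectors by distance. The pure-Python sketch below shows the two usual comparisons: squared L2 distance (what `faiss.IndexFlatL2` ranks by) and cosine similarity. It uses toy 3-dimensional vectors in place of real embeddings.

```python
import math

def squared_l2(a, b):
    """Squared Euclidean distance, the quantity faiss.IndexFlatL2 reports."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

doc = [1.0, 0.0, 1.0]
query = [1.0, 0.0, 0.0]
print(squared_l2(doc, query))                   # 1.0
print(round(cosine_similarity(doc, query), 4))  # 0.7071
```

For unit-length vectors, squared L2 distance equals 2 - 2·cos, so both measures produce the same ranking. In practice, passing `normalize_embeddings=True` to `SentenceTransformer.encode` and using `faiss.IndexFlatIP` ranks results by cosine similarity directly.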


Storing and searching with FAISS

In [23]:
import faiss
import numpy as np

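class VectorDatabase:
    def __init__(self, embedding_dim):
        # Exact (brute-force) index using L2 distance
        self.index = faiss.IndexFlatL2(embedding_dim)
        self.texts = []  # texts[i] corresponds to vector i in the index

    def add_texts(self, texts, embeddings):
        self.texts.extend(texts)
        # FAISS expects a 2-D float32 array of shape (n, embedding_dim)
        self.index.add(np.asarray(embeddings, dtype="float32"))

    def search(self, query_embedding, top_k=3):
        query = np.asarray([query_embedding], dtype="float32")
        distances, indices = self.index.search(query, top_k)
        # FAISS returns -1 for empty slots when top_k exceeds the number of stored vectors
        return [self.texts[i] for i in indices[0] if i != -1]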

db = VectorDatabase(embedding_dim=384)
db.add_texts([document_text], [embedding])
print("FAISS index initialized and document added")


FAISS index initialized and document added
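`IndexFlatL2` performs exact (brute-force) search. It computes the distance from the query to every stored vector and returns the `top_k` smallest. When `top_k` exceeds the number of stored vectors, FAISS fills the missing slots with index `-1`, so search results should be filtered for `-1`. The pure-Python sketch below mimics that behavior for illustration only. `flat_l2_search` is a hypothetical helper, not a FAISS function. The `inf` placeholder distance is this sketch's own choice, not necessarily FAISS's sentinel value.

```python
def flat_l2_search(vectors, query, top_k):
    """Return (distances, indices) of the top_k nearest vectors by squared L2,
    padding indices with -1 when top_k exceeds the number of vectors."""
    scored = sorted(
        (sum((v - q) ** 2 for v, q in zip(vec, query)), i)
        for i, vec in enumerate(vectors)
    )[:top_k]
    distances = [d for d, _ in scored]
    indices = [i for _, i in scored]
    # Pad the missing slots; -1 mirrors FAISS, inf is a placeholder distance
    padding = top_k - len(scored)
    distances += [float("inf")] * padding
    indices += [-1] * padding
    return distances, indices

stored = [[0.0, 0.0], [1.0, 1.0], [3.0, 0.0]]
distances, indices = flat_l2_search(stored, [1.0, 0.0], top_k=4)
print(indices)  # [0, 1, 2, -1]
kept = [i for i in indices if i != -1]  # drop padding before looking up texts
```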


Generating an answer with a small, fast model on CPU (because I work on Ubuntu with an Intel graphics card, and GPU execution requires a compatible graphics card)

In [24]:
from transformers import pipeline


qa_pipeline = pipeline(
    "text2text-generation",
    model="google/flan-t5-small"
)

def generate_response(question, retrieved_texts):
    context = " ".join(retrieved_texts)
    prompt = f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"
    # do_sample=True makes the output vary between runs; use do_sample=False for reproducible answers
    response = qa_pipeline(prompt, max_length=100, do_sample=True)
    return response[0]["generated_text"]

# Test 
question = "What does NEOV do?"
query_embedding = generate_embeddings(question) 
retrieved_texts = db.search(query_embedding, top_k=2)  

answer = generate_response(question, retrieved_texts)
print("Generated answer:", answer)


Device set to use cpu


Generated answer: IT is an AI-powered company focused on automation and efficiency in the finance sector
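Answer quality depends heavily on the prompt passed to `flan-t5-small`. Below is a minimal sketch of a prompt builder that drops duplicate retrieved passages (for example, when the same passage comes back twice) and caps the context length. `build_prompt` and `max_context_chars` are hypothetical. The character cap is only a rough stand-in for the model's token limit.

```python
def build_prompt(question, retrieved_texts, max_context_chars=1000):
    """Assemble a Context/Question/Answer prompt from unique, non-empty passages."""
    seen = set()
    unique_texts = []
    for text in retrieved_texts:
        cleaned = text.strip()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            unique_texts.append(cleaned)
    # Truncate the joined context so the prompt stays short
    context = " ".join(unique_texts)[:max_context_chars]
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"

passage = "NEOV is an AI-powered company focused on automation and efficiency in the finance sector.\n"
prompt = build_prompt("What does NEOV do?", [passage, passage])
print(prompt)
```

The same format string as `generate_response` is kept, so the resulting prompt can be passed to `qa_pipeline` unchanged.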
