<h2 style="color:red;text-align:center;font-weight:bold;">AeroStream Analytics</h2>

AeroStream Analytics is an intelligent system that automatically classifies customer reviews of airlines. It analyzes user sentiment in real time to measure satisfaction and provide key performance indicators.

**Objectives:**

Develop a system that automatically classifies customer reviews in real time. The system should:

- Collect and preprocess customer reviews,

- Automatically analyze sentiment and satisfaction,

- Generate performance indicators per airline,

- Visualize the results in an interactive dashboard.

![Python](https://img.shields.io/badge/Python-3.9%2B-blue)
![Airflow](https://img.shields.io/badge/Apache%20Airflow-Orchestration-green)
![Streamlit](https://img.shields.io/badge/Streamlit-Dashboard-red)
![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector%20Store-orange)

<br>

<h3 style="color:green;font-weight:bold;">Storage in ChromaDB:</h3>

<h4 style="color:orange;font-weight:bold;">1. Load the Metadata:</h4>

In [32]:
import pandas as pd

def import_metadata(pkl_path: str) -> pd.DataFrame:
    # Load a pickled DataFrame holding ids, texts, embeddings and sentiment labels.
    metadata = pd.read_pickle(pkl_path)
    print("Metadata loaded successfully!")
    return metadata

train_metadata = import_metadata("../metadata/train.pkl")
train_metadata.head()

Metadata loaded successfully!

|  | id | text | embedding | sentiment |
|---|---|---|---|---|
| 0 | traintweet_0 | is the real mvp for holding up the flight conn... | [0.013936681672930717, -0.0005504910368472338,... | negative |
| 1 | traintweet_1 | i would love for you to fly my best friend hom... | [-0.07729382067918777, 0.03653344139456749, -0... | positive |
| 2 | traintweet_2 | well then that is a horrible flaw in your syst... | [-0.05217428132891655, -0.08685114979743958, -... | negative |
| 3 | traintweet_3 | no response to dm or email yet customer service | [-0.0025853964034467936, 0.0060002151876688, -... | negative |
| 4 | traintweet_4 | flight 837 landed at 10 30 pm and we are still... | [0.012473358772695065, -0.024746768176555634, ... | negative |

In [33]:
test_metadata = import_metadata("../metadata/test.pkl")

test_metadata.head()

Metadata loaded successfully!

|  | id | text | embedding | sentiment |
|---|---|---|---|---|
| 0 | testtweet_0 | im just praying you get me home alive | [-0.046768564730882645, -0.008228247985243797,... | negative |
| 1 | testtweet_1 | honestly i m glad you didn t cancelled flight ... | [-0.05021535977721214, -0.01710277609527111, -... | negative |
| 2 | testtweet_2 | is really stepping up their service | [0.019056325778365135, 0.05563133582472801, -0... | positive |
| 3 | testtweet_3 | it still says that i can t check into my fligh... | [0.021298479288816452, -0.03081609308719635, -... | negative |
| 4 | testtweet_4 | airways corporation jblu loses 0 06 on new... | [-0.026118002831935883, 0.09736420959234238, -... | neutral |
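
Before inserting, it can help to check that the loaded records have the shape the later steps rely on: unique ids and embeddings of a single dimension. The sketch below is a hypothetical helper (`validate_records` is not part of the original notebook) that works on plain dictionaries such as those produced by `DataFrame.to_dict("records")`. The toy records are made up for illustration.

```python
from collections import Counter

def validate_records(records):
    """Return a list of human-readable problems found in the records (hypothetical helper)."""
    problems = []

    # Ids must be unique across the whole set.
    id_counts = Counter(r["id"] for r in records)
    duplicates = sorted(i for i, n in id_counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")

    # All embeddings should share one dimension.
    dims = sorted({len(r["embedding"]) for r in records})
    if len(dims) > 1:
        problems.append(f"inconsistent embedding dimensions: {dims}")

    return problems

# Made-up records, only for illustration.
toy_records = [
    {"id": "toy_0", "embedding": [0.1, 0.2, 0.3], "sentiment": "negative"},
    {"id": "toy_1", "embedding": [0.0, 0.5, 0.1], "sentiment": "positive"},
]
print(validate_records(toy_records))  # prints []
```

With the real data, one could call it as `validate_records(train_metadata.to_dict("records"))`, assuming the columns shown above.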


<h4 style="color:orange;font-weight:bold;">2. ChromaDB:</h4>

<h5 style="font-weight:bold;">2.1. Verify the ChromaDB Installation:</h5>

In [34]:
import chromadb

print("Version :", chromadb.__version__)

Version : 1.3.6


<h5 style="font-weight:bold;">2.2. Create the Collections:</h5>

In [35]:
client = chromadb.PersistentClient(path="../chroma_db")

# Use cosine distance for the HNSW index; the metadata key must be spelled "hnsw:space".
train_collection = client.get_or_create_collection("train", metadata={"hnsw:space": "cosine"})

test_collection = client.get_or_create_collection("test", metadata={"hnsw:space": "cosine"})
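
To make the `cosine` setting concrete: cosine distance is 1 minus the cosine similarity of two vectors, so it depends on their direction but not on their length. Here is a minimal standard-library sketch, with made-up vectors and a hypothetical `cosine_distance` helper. It illustrates the formula only and is not ChromaDB's internal implementation.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)

# Made-up vectors: same direction, orthogonal, and opposite.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])
print(same_direction, orthogonal, opposite)
```

Vectors pointing the same way give a distance close to 0, orthogonal vectors give 1, and opposite vectors give 2, regardless of magnitude.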

<h5 style="font-weight:bold;">2.3. Insert Embeddings, IDs and Labels:</h5>

In [36]:
from tqdm import tqdm

def insert_in_chromadb(metadata, collection, batch_size=5000):
    # Insert ids, embeddings and (sentiment, text) metadata in batches.
    total_items = len(metadata)

    for i in tqdm(range(0, total_items, batch_size), desc="Inserting into ChromaDB"):
        batch = metadata.iloc[i:i + batch_size]

        batch_ids = batch["id"].astype(str).tolist()
        batch_embeddings = batch["embedding"].tolist()
        batch_metadata = batch[["sentiment", "text"]].to_dict("records")

        collection.add(
            ids=batch_ids,
            embeddings=batch_embeddings,
            metadatas=batch_metadata
        )

    # Report the size of the collection that was actually filled.
    print(f"\n- Total number of items in collection '{collection.name}': {collection.count()}")

    # Inspect the first stored item as a sanity check.
    item = collection.peek(limit=1)

    print("\nItem:")
    print(f"- ID: {item['ids'][0]}")
    print(f"- Label: {item['metadatas'][0]['sentiment']}")
    print(f"- Text: {item['metadatas'][0]['text'][:50]}...")
    print(f"- Embedding: {item['embeddings'][0][:10]}...")
    print(f"- Embedding length: {len(item['embeddings'][0])}")


In [37]:
insert_in_chromadb(train_metadata, train_collection)

Inserting into ChromaDB: 100%|██████████| 3/3 [00:02<00:00,  1.15it/s]

- Total number of items in collection 'train': 11376

Item:
- ID: traintweet_0
- Label: negative
- Text: is the real mvp for holding up the flight connecti...
- Embedding: [ 0.01393668 -0.00055049 -0.0040329   0.02747037 -0.03414666 -0.02162108
 -0.00461301 -0.03315642 -0.11545385  0.00193469]...
- Embedding length: 768

In [38]:
insert_in_chromadb(test_metadata, test_collection)

Inserting into ChromaDB: 100%|██████████| 1/1 [00:00<00:00,  1.49it/s]

- Total number of items in collection 'test': 2844

Item:
- ID: testtweet_0
- Label: negative
- Text: im just praying you get me home alive...
- Embedding: [-0.04676856 -0.00822825 -0.00377828  0.003065    0.03402406  0.03391403
 -0.02974964 -0.00652371  0.00348479 -0.04536733]...
- Embedding length: 768
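
The number of progress-bar steps above follows from the batch size: `range(0, total_items, batch_size)` yields one start offset per batch, so 11376 rows with `batch_size=5000` give 3 batches and 2844 rows give 1. Here is a small sketch of that arithmetic, using a hypothetical `batch_bounds` helper that is not in the notebook:

```python
def batch_bounds(total_items, batch_size):
    """Return (start, stop) index pairs covering range(total_items) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, total_items))
        for start in range(0, total_items, batch_size)
    ]

# Item counts taken from the outputs above.
train_batches = batch_bounds(11376, 5000)
test_batches = batch_bounds(2844, 5000)
print(train_batches)  # [(0, 5000), (5000, 10000), (10000, 11376)]
print(test_batches)   # [(0, 2844)]
```

The last batch is simply shorter than the others, which `DataFrame.iloc` slicing handles without extra bounds checks.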



