<h2 style="color:red;text-align:center;font-weight:bold;">AeroStream Analytics</h2>

AeroStream Analytics is a system that automatically classifies airline customer reviews. It analyzes user sentiment in real time to measure customer satisfaction and produce key performance indicators.

**Objectives:**

Build a system that automatically classifies customer reviews in real time. The system should:

- Collect and preprocess customer reviews,

- Automatically analyze sentiment and satisfaction,

- Generate performance indicators for each airline,

- Visualize the results in an interactive dashboard.
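
The per-airline indicators listed above could, for instance, be computed from labelled tweets. A minimal sketch, assuming (hypothetically) that the airline is the @handle that opens the raw `text` column and that the share of each sentiment is a useful indicator; `airline_kpis` and this KPI definition are illustrative, not the project's actual implementation.

```python
import pandas as pd

def airline_kpis(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: tweet volume and sentiment shares per airline."""
    # Assumption: the airline is the @handle that opens the raw tweet
    airline = df["text"].str.extract(r"^@(\w+)", expand=False).rename("airline")
    mask = airline.notna()
    counts = pd.crosstab(airline[mask], df.loc[mask, "sentiment"])
    # Divide each row by its total to get the share of each sentiment
    shares = counts.div(counts.sum(axis=1), axis=0)
    shares.insert(0, "n_tweets", counts.sum(axis=1))
    return shares.sort_values("n_tweets", ascending=False)

toy = pd.DataFrame({
    "text": ["@JetBlue great crew", "@JetBlue late again", "@USAirways lost my bag"],
    "sentiment": ["positive", "negative", "negative"],
})
print(airline_kpis(toy))
```

Tweets that do not start with an @handle are simply ignored here; a real pipeline would probably need a more robust way to attribute a tweet to an airline.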

![Python](https://img.shields.io/badge/Python-3.9%2B-blue)
![Airflow](https://img.shields.io/badge/Apache%20Airflow-Orchestration-green)
![Streamlit](https://img.shields.io/badge/Streamlit-Dashboard-red)
![ChromaDB](https://img.shields.io/badge/ChromaDB-Vector%20Store-orange)

<br>

<h3 style="color:green;font-weight:bold;">Embeddings & Metadata:</h3>

<h4 style="color:orange;font-weight:bold;">1. Load the Dataset:</h4>

```python
import pandas as pd

def import_data(csv_path: str) -> pd.DataFrame:
    """Load a preprocessed split (columns: text, sentiment, clean_text)."""
    # "Unnamed: 0" is presumably the index written by an earlier to_csv call
    df = pd.read_csv(csv_path)
    print("Data Chargées avec Succès !")
    return df

df_train = import_data("../data/processed/train.csv")

df_train.head()
```

```
Data Chargées avec Succès !
```

| Unnamed: 0 | text | sentiment | clean_text |
|---|---|---|---|
| 0 | @USAirways is the real MVP for holding up the ... | negative | is the real mvp for holding up the flight conn... |
| 1 | @JetBlue I would love for you to fly my best f... | positive | i would love for you to fly my best friend hom... |
| 2 | @USAirways well then that is a horrible flaw i... | negative | well then that is a horrible flaw in your syst... |
| 3 | @AmericanAir no response to DM or email yet. ... | negative | no response to dm or email yet customer service |
| 4 | @USAirways @PHLAirport flight 837 landed at 10... | negative | flight 837 landed at 10 30 pm and we are still... |


```python
df_test = import_data("../data/processed/test.csv")

df_test.head()
```

```
Data Chargées avec Succès !
```

| Unnamed: 0 | text | sentiment | clean_text |
|---|---|---|---|
| 0 | @SouthwestAir Im just praying you get me home ... | negative | im just praying you get me home alive |
| 1 | @JetBlue honestly I’m glad you didn’t Cancelle... | negative | honestly i m glad you didn t cancelled flight ... |
| 2 | @SouthwestAir is really stepping up their "ser... | positive | is really stepping up their service |
| 3 | @USAirways it still says that I can't check in... | negative | it still says that i can t check into my fligh... |
| 4 | @JetBlue Airways Corporation (JBLU) loses -0.0... | neutral | airways corporation jblu loses 0 06 on new... |
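
Before encoding, it can be worth checking that every `clean_text` value is a non-empty string: `pd.read_csv` reads empty fields as `NaN`, and a tweet that cleaning reduced to nothing would otherwise reach the encoder as a float. The helper below is a hypothetical sketch; dropping such rows (rather than imputing them) and printing the label distribution to compare the two splits are assumptions, not steps taken in the original notebook.

```python
import pandas as pd

def check_clean_text(df: pd.DataFrame, name: str) -> pd.DataFrame:
    """Hypothetical helper: drop rows whose clean_text is missing or blank."""
    blank = df["clean_text"].isna() | (df["clean_text"].astype(str).str.strip() == "")
    print(f"{name}: {int(blank.sum())} empty clean_text row(s) dropped")
    # Share of each sentiment label, handy to compare train and test splits
    print(df.loc[~blank, "sentiment"].value_counts(normalize=True).round(3))
    # Reset the index so later positional ids stay contiguous
    return df.loc[~blank].reset_index(drop=True)
```

Usage would be, for example, `df_train = check_clean_text(df_train, "train")` before computing the embeddings, so that rows and embedding vectors stay aligned.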


<h4 style="color:orange;font-weight:bold;">2. Encode the Texts:</h4>

<h5 style="font-weight:bold;">2.1. Check the sentence_transformers Installation:</h5>

```python
import sentence_transformers

print("Version : ", sentence_transformers.__version__)
```

```
Version :  5.1.2
```
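
Importing the package is enough to confirm it is installed. An alternative sketch that checks the installed version without importing the package, using the standard-library `importlib.metadata`; the helper name `installed_version` is hypothetical, and the names passed to it are distribution names as published on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None

for dist in ("sentence-transformers", "pandas"):
    print(dist, "->", installed_version(dist) or "not installed")
```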


<h5 style="font-weight:bold;">2.2. Load the paraphrase-multilingual-mpnet-base-v2 Model:</h5>

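```python
from sentence_transformers import SentenceTransformer

# Multilingual mpnet-base model producing 768-dimensional sentence embeddings;
# downloaded from the Hugging Face Hub on first use, then cached locally
model = SentenceTransformer("sentence-transformers/paraphrase-multilingual-mpnet-base-v2")
```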

<h5 style="font-weight:bold;">2.3. Compute the Embeddings:</h5>

```python
def encode_text(model: SentenceTransformer, df: pd.DataFrame):
    """Encode the clean_text column into L2-normalized sentence embeddings."""
    tweets = df["clean_text"].tolist()
    # Returns a NumPy array of shape (n_tweets, embedding_dim)
    embeddings = model.encode(tweets, normalize_embeddings=True, show_progress_bar=True)
    print(f"\n- Dimension des Embeddings : {embeddings.shape[1]}")
    print(f"\n- Exemple d'un Embedding : \n {embeddings[0][:5]}")
    return embeddings
```
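
`encode_text` passes `normalize_embeddings=True`, so every embedding is scaled to unit L2 norm and the cosine similarity between two tweets reduces to a plain dot product. A small NumPy check of that identity on random vectors (not on the real embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=768), rng.normal(size=768)

# Cosine similarity computed on the raw vectors
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# L2-normalize first (what normalize_embeddings=True does), then take the dot product
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_unit @ b_unit

print(np.isclose(cosine, dot))  # True
```

This is why normalized embeddings can be compared with a fast matrix product instead of an explicit cosine computation.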

```python
train_embeddings = encode_text(model, df_train)
```

```
Batches: 100%|██████████| 356/356 [03:40<00:00,  1.62it/s]

- Dimension des Embeddings : 768

- Exemple d'un Embedding : 
 [ 0.01393668 -0.00055049 -0.0040329   0.02747037 -0.03414666]
```

```python
test_embeddings = encode_text(model, df_test)
```

```
Batches: 100%|██████████| 89/89 [00:55<00:00,  1.60it/s]

- Dimension des Embeddings : 768

- Exemple d'un Embedding : 
 [-0.04676856 -0.00822825 -0.00377828  0.003065    0.03402406]
```
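
Because both `train_embeddings` and `test_embeddings` hold unit vectors, the most similar training tweets for each test tweet can be found with a single matrix product. This is roughly what a vector store such as ChromaDB does at scale, there with approximate (HNSW) indexes. A brute-force sketch; `top_k_similar` is a hypothetical helper and `k=5` an arbitrary default.

```python
import numpy as np

def top_k_similar(queries: np.ndarray, corpus: np.ndarray, k: int = 5):
    """Return indices and cosine scores of the k nearest corpus rows per query.

    Assumes both arrays are L2-normalized, so the dot product is the cosine.
    """
    scores = queries @ corpus.T                      # (n_queries, n_corpus)
    k = min(k, corpus.shape[0])
    # argpartition selects the k best per row without fully sorting each row
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    top = np.take_along_axis(scores, idx, axis=1)
    # Sort only those k candidates, best first
    order = np.argsort(-top, axis=1)
    return np.take_along_axis(idx, order, axis=1), np.take_along_axis(top, order, axis=1)
```

With the arrays above, `idx, scores = top_k_similar(test_embeddings, train_embeddings)` followed by `df_train["sentiment"].to_numpy()[idx]` would give the labels of each test tweet's nearest training neighbours. The full score matrix is held in memory, so larger corpora would call for processing queries in chunks.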


<h4 style="color:orange;font-weight:bold;">3. Save the Metadata:</h4>

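```python
from pathlib import Path

def save_metadata(df: pd.DataFrame, df_type: str, embeddings) -> None:
    """Store id, cleaned text, embedding and label of one split as a pickle."""
    df = df.drop(columns=["text"])                     # drop the raw tweet
    df = df.rename(columns={"clean_text": "text"})     # keep the cleaned text as "text"
    df["embedding"] = embeddings.tolist()              # one list of floats per row
    df["id"] = df_type + "tweet_" + df.index.astype(str)  # e.g. "traintweet_0"
    df = df[["id", "text", "embedding", "sentiment"]]   # also drops "Unnamed: 0"
    out_path = Path("../metadata") / f"{df_type}.pkl"
    out_path.parent.mkdir(parents=True, exist_ok=True)  # to_pickle does not create folders
    df.to_pickle(out_path)
    print(f"{df_type} Metadonnées Sauvegardées avec Succès !")

save_metadata(df_train, "train", train_embeddings)
```

```
train Metadonnées Sauvegardées avec Succès !
```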
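
To check a saved file, or to reuse it later, the pickle can be read back and the `embedding` column stacked into a 2-D array. A minimal round-trip sketch on a toy frame; `load_metadata` is a hypothetical helper and the temporary path stands in for the project's `../metadata/` folder. Pickle files should only be loaded from trusted sources.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

def load_metadata(path):
    """Hypothetical helper: read a metadata pickle and rebuild the embedding matrix."""
    df = pd.read_pickle(path)
    # Each cell holds a Python list (from embeddings.tolist()); stack into (n, dim)
    matrix = np.asarray(df["embedding"].tolist(), dtype=np.float32)
    return df, matrix

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy.pkl"
    toy = pd.DataFrame({
        "id": ["traintweet_0", "traintweet_1"],
        "text": ["great crew", "lost my bag"],
        "embedding": [[0.6, 0.8], [1.0, 0.0]],
        "sentiment": ["positive", "negative"],
    })
    toy.to_pickle(path)
    loaded, matrix = load_metadata(path)

print(matrix.shape)  # (2, 2)
```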


```python
save_metadata(df_test, "test", test_embeddings)
```

```
test Metadonnées Sauvegardées avec Succès !
```
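
The saved frames already carry the fields a ChromaDB collection takes (ids, embeddings, documents, metadata), matching the vector-store role shown in the badges. The sketch below only prepares the payloads in chunks, so it runs without ChromaDB; the `to_chroma_batches` name and the chunk size of 5000 are assumptions. The commented lines show how the batches might be passed to `collection.add`, assuming the `chromadb` client API (`PersistentClient`, `get_or_create_collection`); the storage path and collection name are hypothetical, and the cosine space is chosen because the embeddings are normalized.

```python
from typing import Dict, Iterator, List

import pandas as pd

def to_chroma_batches(df: pd.DataFrame, batch_size: int = 5000) -> Iterator[Dict[str, List]]:
    """Hypothetical helper: yield keyword arguments for collection.add() in chunks."""
    for start in range(0, len(df), batch_size):
        chunk = df.iloc[start:start + batch_size]
        yield {
            "ids": chunk["id"].tolist(),
            "embeddings": chunk["embedding"].tolist(),
            "documents": chunk["text"].tolist(),
            "metadatas": [{"sentiment": s} for s in chunk["sentiment"]],
        }

# Possible use with ChromaDB (not executed here; names are hypothetical):
# import chromadb
# client = chromadb.PersistentClient(path="../chroma")
# collection = client.get_or_create_collection("tweets_train", metadata={"hnsw:space": "cosine"})
# for batch in to_chroma_batches(pd.read_pickle("../metadata/train.pkl")):
#     collection.add(**batch)
```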
