# LLaMA 2 implementation script

There are two ways to use LLaMA 2:
- through the Hugging Face API
- locally

Local workflow:
1- Clean and prepare the text data
2- Load the pre-trained model and tokenizer
3- Define the Hugging Face pipeline
4- Define the classification prompt
5- Run inference on the text data with the pre-trained LLaMA 2 model
6- Extract the predicted categories
7- Evaluate performance

API workflow:
→ Build the prompt
→ Send the API request (POST)
→ Receive the text response
→ Extract the category
→ Evaluate performance


In [5]:
# ----- Import the required libraries -----

import pandas as pd
import torch
import re
import transformers
from transformers import LlamaForCausalLM, LlamaTokenizer
from sklearn.metrics import classification_report, accuracy_score
import requests
import os


## 1- Text data cleaning and preparation

In [6]:
# ---- Load the cleaned dataset ----
data = pd.read_csv('../Data/flipkart_cleaned.csv')
print(f"Description text of the first product:\n {data['description'].iloc[0]}",
      f"data type: {type(data['description'].iloc[0])}")

Description text of the first product:
 Key Features of Elegance Polyester Multicolor Abstract Eyelet Door Curtain Floral Curtain,Elegance Polyester Multicolor Abstract Eyelet Door Curtain (213 cm in Height, Pack of 2) Price: Rs. 899 This curtain enhances the look of the interiors.This curtain is made from 100% high quality polyester fabric.It features an eyelet style stitch with Metal Ring.It makes the room environment romantic and loving.This curtain is ant- wrinkle and anti shrinkage and have elegant apparance.Give your home a bright and modernistic appeal with these designs. The surreal attention is sure to steal hearts. These contemporary eyelet and valance curtains slide smoothly so when you draw them apart first thing in the morning to welcome the bright sun rays you want to wish good morning to the whole world and when you draw them close in the evening, you create the most special moments of joyous beauty given by the soothing prints. Bring home the elegant curta

In [7]:
# ---- Clean the description text ----
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Remove HTML tags
    text = re.sub(r'<.*?>', '', text)
    # Remove punctuation and special characters.
    # Note: replacing with '' fuses words separated only by punctuation
    # (e.g. "interiors.This" becomes "interiorsthis").
    text = re.sub(r'[^a-zA-Z0-9\s]', '', text)
    # Collapse extra whitespace
    text = re.sub(r'\s+', ' ', text).strip()
    return text


In [8]:
data['description'] = data['description'].apply(clean_text)
print(f"Description text of the first product after cleaning:\n {data['description'].iloc[0]}")

Description text of the first product after cleaning:
 key features of elegance polyester multicolor abstract eyelet door curtain floral curtainelegance polyester multicolor abstract eyelet door curtain 213 cm in height pack of 2 price rs 899 this curtain enhances the look of the interiorsthis curtain is made from 100 high quality polyester fabricit features an eyelet style stitch with metal ringit makes the room environment romantic and lovingthis curtain is ant wrinkle and anti shrinkage and have elegant apparancegive your home a bright and modernistic appeal with these designs the surreal attention is sure to steal hearts these contemporary eyelet and valance curtains slide smoothly so when you draw them apart first thing in the morning to welcome the bright sun rays you want to wish good morning to the whole world and when you draw them close in the evening you create the most special moments of joyous beauty given by the soothing prints bring home the elegant curtai
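The cleaned text contains fused words such as `curtainelegance` and `interiorsthis`, because punctuation is deleted rather than replaced. Here is a minimal sketch of a variant that substitutes a space instead, assuming fused tokens are unwanted for classification. `clean_text_spaced` is a hypothetical name and is not part of the original notebook.

```python
import re


def clean_text_spaced(text):
    """Hypothetical variant of clean_text that preserves word boundaries."""
    # Assumption: missing descriptions (NaN, None) are treated as empty strings
    if not isinstance(text, str):
        return ""
    text = text.lower()
    # Replace HTML tags with a space instead of deleting them
    text = re.sub(r"<.*?>", " ", text)
    # Replace punctuation and special characters with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the runs of whitespace created by the substitutions
    return re.sub(r"\s+", " ", text).strip()


print(clean_text_spaced("the interiors.This curtain is <b>100%</b> polyester"))
```

In the notebook it would be applied the same way as `clean_text`, through `data['description'].apply(clean_text_spaced)`.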

In [9]:
# Save the cleaned and prepared dataset
data.to_csv('../Data/flipkart_prepared.csv', index=False)

In [10]:
data.head()

| Unnamed: 0 | uniq_id | product_name | description | product_category |
|---|---|---|---|---|
| 0 | 55b85ea15a1536d46b7190ad6fff8ce7 | Elegance Polyester Multicolor Abstract Eyelet ... | key features of elegance polyester multicolor ... | Home Furnishing |
| 1 | 7b72c92c2f6c40268628ec5f14c6d590 | Sathiyas Cotton Bath Towel | specifications of sathiyas cotton bath towel 3... | Baby Care |
| 2 | 64d5d4a258243731dc7bbb1eef49ad74 | Eurospa Cotton Terry Face Towel Set | key features of eurospa cotton terry face towe... | Baby Care |
| 3 | d4684dcdc759dd9cdf41504698d737d8 | SANTOSH ROYAL FASHION Cotton Printed King size... | key features of santosh royal fashion cotton p... | Home Furnishing |
| 4 | 6325b6870c54cd47be6ebfbffa620ec7 | Jaipur Print Cotton Floral King sized Double B... | key features of jaipur print cotton floral kin... | Home Furnishing |


In [11]:
categories = data['product_category'].unique()
print(f"Unique categories in the dataset: {categories}")

Unique categories in the dataset: ['Home Furnishing' 'Baby Care' 'Watches' 'Home Decor & Festive Needs'
 'Kitchen & Dining' 'Beauty and Personal Care' 'Computers']


In [19]:
# ------- Build the prompts for the LLaMA 2 model -------
def construct_prompt(description, categories):
    # No literal <s> here: the tokenizer normally prepends the BOS token itself,
    # so writing it in the text would duplicate it.
    prompt = f"""
[INST]
You are a product classifier.

Choose ONLY ONE category among:
{", ".join(categories)}

Product description:
{description}

Answer only with the exact category name.
[/INST]
"""
    return prompt.strip()

## 1- API workflow

In [69]:
API_URL = "https://router.huggingface.co/models/meta-llama/Llama-2-7b-chat-hf"

In [None]:
# ----------- Hugging Face authentication token -----------
# token = <your Hugging Face token>
# export HF_TOKEN=<your Hugging Face token>
# (`export` is Unix shell syntax; in the Windows command prompt the equivalent is `set HF_TOKEN=...`)

'export' is not recognized as an internal or external command,
operable program or batch file.


In [12]:
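from getpass import getpass
# Prompt for the token instead of hard-coding it in the notebook, and never print it.
os.environ["HF_TOKEN"] = os.getenv("HF_TOKEN") or getpass("Hugging Face token: ")
print("HF_TOKEN set:", bool(os.environ["HF_TOKEN"]))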


In [3]:
from transformers import pipeline
pipe = pipeline("text-generation", model="meta-llama/Llama-2-7b-chat-hf")

  from .autonotebook import tqdm as notebook_tqdm
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00,  2.71it/s]
Device set to use cpu


In [15]:
description_example = data['description'].iloc[0]
prompt_example = construct_prompt(description_example, categories)

In [16]:
prompt_example

"Classifie le produit suivant dans UNE SEULE des catégories ci-dessous : Home Furnishing, Baby Care, Watches, Home Decor & Festive Needs, Kitchen & Dining, Beauty and Personal Care, Computers.\nDescription du produit : 'key features of elegance polyester multicolor abstract eyelet door curtain floral curtainelegance polyester multicolor abstract eyelet door curtain 213 cm in height pack of 2 price rs 899 this curtain enhances the look of the interiorsthis curtain is made from 100 high quality polyester fabricit features an eyelet style stitch with metal ringit makes the room environment romantic and lovingthis curtain is ant wrinkle and anti shrinkage and have elegant apparancegive your home a bright and modernistic appeal with these designs the surreal attention is sure to steal hearts these contemporary eyelet and valance curtains slide smoothly so when you draw them apart first thing in the morning to welcome the bright sun rays you want to wish good morning to the whole world and w

In [20]:
result = pipe(
    prompt_example,
    max_new_tokens=3,
    do_sample=False,  # greedy decoding; temperature is ignored in this mode, so it is not set
    return_full_text=False
)
print(f"LLaMA 2 model response: {result}")

LLaMA 2 model response: [{'generated_text': '\nExem'}]


In [18]:
print(f"LLaMA 2 model response: {pipe(prompt_example, max_new_tokens=5, do_sample=False)}")

LLaMA 2 model response: [{'generated_text': "Classifie le produit suivant dans UNE SEULE des catégories ci-dessous : Home Furnishing, Baby Care, Watches, Home Decor & Festive Needs, Kitchen & Dining, Beauty and Personal Care, Computers.\nDescription du produit : 'key features of elegance polyester multicolor abstract eyelet door curtain floral curtainelegance polyester multicolor abstract eyelet door curtain 213 cm in height pack of 2 price rs 899 this curtain enhances the look of the interiorsthis curtain is made from 100 high quality polyester fabricit features an eyelet style stitch with metal ringit makes the room environment romantic and lovingthis curtain is ant wrinkle and anti shrinkage and have elegant apparancegive your home a bright and modernistic appeal with these designs the surreal attention is sure to steal hearts these contemporary eyelet and valance curtains slide smoothly so when you draw them apart first thing in the morning to welcome the bright sun rays you wa

In [70]:

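HF_TOKEN = os.getenv("HF_TOKEN")

def query_hf_api(prompt):
    headers = {
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json"
    }
    payload = {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 5,
            # Greedy decoding rather than temperature=0.0, which some backends reject
            "do_sample": False,
            "return_full_text": False
        }
    }
    response = requests.post(API_URL, headers=headers, json=payload, timeout=60)
    if response.status_code != 200:
        raise RuntimeError(f"HF API error: {response.status_code} - {response.text}")

    # Named `body` so it is not confused with the `data` DataFrame
    body = response.json()
    # Text-generation responses usually arrive as a list: [{"generated_text": "..."}]
    if isinstance(body, list) and body:
        body = body[0]
    if isinstance(body, dict) and "generated_text" in body:
        return body["generated_text"]
    raise RuntimeError(f"Unexpected format: {body}")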

In [1]:
result = query_hf_api(prompt_example)
print(f"LLaMA 2 model response: {result}")

NameError: name 'query_hf_api' is not defined

In [71]:
# ----- Run the inferences -------

def classify_product(description, categories):
    prompt = construct_prompt(description, categories)
    generated_text = query_hf_api(prompt)
    # Simple normalization, then map back to the exact label spelling so that
    # predictions can be compared with data['product_category']
    answer = generated_text.strip().lower()
    for category in categories:
        if category.lower() in answer:
            return category
    return answer
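The normalization step in `classify_product` could be made more tolerant of wordy or truncated answers, which are likely when `max_new_tokens` is small. The sketch below is one possible approach, not the notebook's method. `extract_category` and the `cutoff` value are assumptions introduced for illustration, and the category list is copied from the dataset output shown earlier.

```python
import difflib

CATEGORIES = [
    "Home Furnishing", "Baby Care", "Watches", "Home Decor & Festive Needs",
    "Kitchen & Dining", "Beauty and Personal Care", "Computers",
]


def extract_category(generated_text, categories, cutoff=0.6):
    """Map raw model output to one exact category label, or None if nothing fits."""
    answer = generated_text.strip().lower()
    if not answer:
        return None
    lowered = {category.lower(): category for category in categories}
    # 1) The answer contains a full label, possibly surrounded by extra words
    for low, original in lowered.items():
        if low in answer:
            return original
    # 2) The answer is an unambiguous truncated prefix of exactly one label
    prefix_hits = [original for low, original in lowered.items() if low.startswith(answer)]
    if len(prefix_hits) == 1:
        return prefix_hits[0]
    # 3) Fuzzy match on the whole answer (cutoff is an arbitrary illustrative value)
    close = difflib.get_close_matches(answer, list(lowered), n=1, cutoff=cutoff)
    return lowered[close[0]] if close else None


print(extract_category("The category is: watches.", CATEGORIES))
print(extract_category("Home Decor &", CATEGORIES))
print(extract_category("\nExem", CATEGORIES))
```

Returning `None` for unusable answers keeps them distinguishable from real labels, so they can be counted separately during evaluation.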

In [72]:
# ------- Apply the model to the product descriptions ------
y_predictions = []

for desc in data['description'].tolist():
    pred_category = classify_product(desc, categories)
    y_predictions.append(pred_category)
    print(f"Predicted Category: {pred_category}")

data['predicted_category'] = y_predictions

RuntimeError: HF API error: 404 - Not Found
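The loop stops at the first API error, so a single failed request discards every prediction. A hedged sketch of a loop that records failures and continues is given below. `classify_all` and `fake_classifier` are hypothetical names, and the stand-in classifier exists only to exercise the loop without network access.

```python
def classify_all(descriptions, classify_fn):
    """Apply classify_fn to each description, storing None when a call fails."""
    predictions, failures = [], []
    for index, description in enumerate(descriptions):
        try:
            predictions.append(classify_fn(description))
        except RuntimeError as error:
            # Keep the position aligned with the input and remember what went wrong
            predictions.append(None)
            failures.append((index, str(error)))
    return predictions, failures


# Stand-in classifier (hypothetical behaviour, used only for this demonstration)
def fake_classifier(description):
    if "error" in description:
        raise RuntimeError("simulated API failure")
    return "Watches"


predictions, failures = classify_all(
    ["analog watch", "error case", "digital watch"], fake_classifier
)
print(predictions)
print(failures)
```

With the notebook's own function, the call would look like `classify_all(data['description'], lambda d: classify_product(d, categories))`.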

In [None]:
# ----- Evaluate the model's performance -----
accuracy = accuracy_score(data['product_category'], data['predicted_category'])
report = classification_report(data['product_category'], data['predicted_category'])
print(f"Accuracy: {accuracy}")
print(f"Classification Report:\n{report}")
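Alongside `accuracy_score` and `classification_report`, it can help to see how many predictions could not be mapped to a label and which confusions are most frequent. The following pure-Python sketch assumes unmatched predictions are stored as `None`. `summarize_predictions` is a hypothetical helper, and the toy labels are made up for illustration, not results from the dataset.

```python
from collections import Counter


def summarize_predictions(y_true, y_pred, top_n=3):
    """Return accuracy, the number of unmatched predictions, and frequent confusions."""
    total = len(y_true)
    correct = sum(true == pred for true, pred in zip(y_true, y_pred))
    unmatched = sum(pred is None for pred in y_pred)
    # Count each (true label, predicted label) pair that disagrees
    confusions = Counter(
        (true, pred) for true, pred in zip(y_true, y_pred) if true != pred
    )
    return {
        "accuracy": correct / total if total else 0.0,
        "unmatched": unmatched,
        "top_confusions": confusions.most_common(top_n),
    }


# Toy data, invented for the example
toy_true = ["Watches", "Computers", "Baby Care", "Watches"]
toy_pred = ["Watches", "Watches", None, "Watches"]
print(summarize_predictions(toy_true, toy_pred))
```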

## 2- Local workflow

In [21]:
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

KeyboardInterrupt: 

In [None]:
# ----- Text-generation pipeline -----
from transformers import pipeline
pipe = pipeline('text-generation', 
                model=model, 
                tokenizer=tokenizer, 
                device=0 if torch.cuda.is_available() else -1)

In [None]:
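# apply_chat_template expects a list of chat messages rather than a raw string;
# the template adds the [INST] ... [/INST] markers itself.
question = (
    f"Choose ONLY ONE category among: {', '.join(categories)}\n"
    f"Product description: {description_example}\n"
    "Answer only with the exact category name."
)
messages = [{"role": "user", "content": question}]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))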