
1. Define the Task & Pipeline Overview

Input (keyword or image) → Generation Model → Quality-Check Module → (Optional) Image Generator → Ethical Filter → Final Output
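
The flow above can be read as a chain of stages in which any stage may reject a candidate text. The block below is a minimal sketch of that composition, not the notebook's implementation: the `Stage` alias, `build_pipeline`, `pipeline_fn`, and the three toy stage functions are hypothetical names, and the toy stages only stand in for the real models. The optional image generator is left out.

```python
from typing import Callable, List, Optional

# Hypothetical type: a stage maps text to text, or returns None to reject it.
Stage = Callable[[str], Optional[str]]


def build_pipeline(stages: List[Stage]) -> Stage:
    """Chain stages left to right and stop as soon as one stage rejects."""
    def run(text: str) -> Optional[str]:
        current: Optional[str] = text
        for stage in stages:
            current = stage(current)
            if current is None:  # e.g. the quality check or ethical filter said no
                return None
        return current
    return run


# Toy stand-ins for the real components (assumptions for illustration only).
def generate(keyword: str) -> str:
    return f"A timeless {keyword} with a tailored fit."


def quality_check(text: str) -> Optional[str]:
    # Reject very short texts.
    return text if len(text.split()) >= 5 else None


def ethical_gate(text: str) -> Optional[str]:
    blocked = {"hate", "violence"}
    words = set(text.lower().replace(".", " ").split())
    return None if words & blocked else text


pipeline_fn = build_pipeline([generate, quality_check, ethical_gate])
print(pipeline_fn("denim jacket"))
```

Modelling rejection as `None` keeps each stage independent: a new check can be added to the list without touching the others.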

In [None]:
!pip install transformers torch sentencepiece
!pip install schedule
!pip install --upgrade datasets


In [None]:
# ============================
# Library installation (run once if needed)
# ============================
!pip install transformers torch sentencepiece
!pip install schedule

# ============================
# Imports
# ============================
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer, BartForConditionalGeneration, BartTokenizer
from transformers import pipeline, set_seed
import difflib
import re
import random

# ============================
# 1. General configuration
# ============================
device = torch.device("cpu")
print("Device set to use", device)

# Text generator (device=-1 runs the pipeline on CPU)
generator = pipeline('text-generation', model='distilgpt2', device=-1)
set_seed(42)

# Summarization model (quality check)
bart_model_name = "facebook/bart-base"
bart_tokenizer = BartTokenizer.from_pretrained(bart_model_name)
bart_model = BartForConditionalGeneration.from_pretrained(bart_model_name).to(device)

# Fashion keywords used for scoring
fashion_keywords = [
    "elegant", "stylish", "refined", "modern", "vintage", "casual",
    "minimalist", "chic", "versatile", "comfort", "premium", "crafted",
    "tailored", "cut", "fit", "fabric", "soft", "bold", "timeless"
]

# ============================
# Product description generation
# ============================
def generate_descriptions(keyword, num_variants=5):
    prompt = f"""
 *ROLE:* You are a professional e-commerce copywriter specializing in luxury and contemporary fashion. Your writing is sophisticated, evocative, and persuasive.

 *TASK:* Write a compelling product description for the item provided.

 *GUIDELINES:*
 1.  *Hook:* Start with a captivating opening sentence.
 2.  *Details:* Weave in key details about material (e.g., "supple leather," "crisp cotton"), fit ("tailored silhouette," "relaxed cut"), and unique features ("polished hardware," "artisanal stitching").
 3.  *Versatility:* Suggest how or where the item can be worn to help the customer visualize it in their life.
 4.  *Tone:* Maintain an elegant, confident, and aspirational tone.
 5.  *Length:* Keep the description concise and impactful, around 3-4 sentences.

 ---
 *EXAMPLES:*

 *Item:* Tailored Wool Coat
 *Description:* Experience enduring elegance with our signature Tailored Wool Coat. Meticulously crafted from the finest Italian wool, its sharp, clean lines and structured shoulders create a powerful silhouette. A timeless investment piece that transitions seamlessly from boardroom polish to evening grace.

 *Item:* Minimalist Leather Handbag
 *Description:* Discover your new essential companion in our Minimalist Leather Handbag. Defined by its clean architecture and buttery-soft calfskin leather, this piece merges artisanal craftsmanship with modern functionality. Its spacious interior and optional crossbody strap make it the perfect accessory for a life in motion.

 ---
 *ITEM TO DESCRIBE:*

 *Item:* {keyword}
 *Description:*
 """
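    # max_new_tokens bounds only the continuation (the prompt alone exceeds 120 tokens);
    # return_full_text=False keeps the echoed prompt out of the scored text.
    outputs = generator(prompt, max_new_tokens=120, num_return_sequences=num_variants,
                        do_sample=True, temperature=0.8, top_p=0.9, return_full_text=False)
    results = []
    for output in outputs:
        gen_text = output["generated_text"].strip()
        score = score_description(gen_text, prompt)
        results.append((gen_text, score))
    results = clean_descriptions(results)
    return results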

# ============================
# Quality summary (via BART)
# ============================
def summarize_text(text):
    inputs = bart_tokenizer(text, return_tensors="pt", truncation=True, max_length=512).to(device)
    summary_ids = bart_model.generate(inputs["input_ids"], num_beams=4, max_length=30, early_stopping=True)
    summary = bart_tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary

# ============================
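# Ethical filter
# ============================
def ethical_filter(text):
    """Return True when the text contains none of the blacklisted words."""
    blacklist = ["hate", "violence", "racism", "sexism", "terrorism"]
    # Match whole words only, so that e.g. "whatever" is not flagged for containing "hate".
    pattern = r"\b(?:" + "|".join(map(re.escape, blacklist)) + r")\b"
    return re.search(pattern, text.lower()) is None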

# ============================
# Utilities: filtering & scoring
# ============================
def has_repetitions(text, max_repeat=3):
    pattern = r'\b(\w+)( \1){' + str(max_repeat) + ',}\b'
    return re.search(pattern, text.lower()) is not None

def clean_descriptions(descriptions):
    filtered = []
    for desc, score in descriptions:
        if len(desc.split()) < 8:
            continue
        if has_repetitions(desc):
            continue
        filtered.append((desc, score))
    return filtered

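def score_description(desc, prompt):
    """Heuristic score: length (0-1) + 0.5 per fashion keyword + dissimilarity to the prompt (0-1)."""
    # Tokenize on word characters so trailing punctuation ("elegant,") does not hide a keyword.
    words = re.findall(r"\w+", desc.lower())
    keyword_bonus = sum(word in words for word in fashion_keywords)
    length_score = min(len(words), 50) / 50
    similarity = difflib.SequenceMatcher(None, desc.lower(), prompt.lower()).ratio()
    # Despite its name, this term rewards originality: it grows as the text diverges from the prompt.
    penalty = max(0, 1 - similarity)
    return length_score + 0.5 * keyword_bonus + penalty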

# ============================
# Main pipeline
# ============================
def run_pipeline(keyword, num_variants=5):
    print(f"\n--- Generating for: {keyword} ---")
    descriptions = generate_descriptions(keyword, num_variants)

    final_results = []
    for desc, score in descriptions:
        # Run the cheap ethical filter first so rejected texts are never summarized.
        if not ethical_filter(desc):
            print("❌ Rejected (ethical filter):", desc)
            continue
        summary = summarize_text(desc)
        final_results.append((desc, summary, score))

    for i, (desc, summary, score) in enumerate(final_results, 1):
        print(f"\n✅ Description {i} [Score: {score:.2f}]:\n{desc}")
        print(f"📝 Quality summary:\n{summary}")

    generate_image_placeholder()
    return final_results

# ============================
# Image placeholder
# ============================
def generate_image_placeholder():
    print("🖼️ Image generation step (placeholder)")

# ============================
# Documentation
# ============================
def document_pipeline():
    print("""
AI fashion pipeline summary (CPU-friendly):
- Generation: distilgpt2
- Quality summary: BART-base
- Scoring: keywords + length + originality
- Simple ethical filter
- Image: placeholder
- Usage: run_pipeline("fashion keyword")
""")

# ============================
# Example usage
# ============================
if __name__ == "__main__":
    keyword = "denim jacket"
    run_pipeline(keyword, num_variants=5)
    document_pipeline()


Device set to use cpu


Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=120) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



--- Generating for: denim jacket ---

✅ Description 1 [Score: 4.13]:

 *ROLE:* You are a professional e-commerce copywriter specializing in luxury and contemporary fashion. Your writing is sophisticated, evocative, and persuasive.
 
 *TASK:* Write a compelling product description for the item provided.
 
 *GUIDELINES:*
 1.  *Hook:* Start with a captivating opening sentence.
 2.  *Details:* Weave in key details about material (e.g., "supple leather," "crisp cotton"), fit ("tailored silhouette," "relaxed cut"), and unique features ("polished hardware," "artisanal stitching").
 3.  *Versatility:* Suggest how or where the item can be worn to help the customer visualize it in their life.
 4.  *Tone:* Maintain an elegant, confident, and aspirational tone.
 5.  *Length:* Keep the description concise and impactful, around 3-4 sentences.
 
 ---
 *EXAMPLES:*
 
 *Item:* Tailored Wool Coat
 *Description:* Experience enduring elegance with our signature Tailored Wool Coat. Meticulously crafted fr