Human vs AI summarization game for Retail exhibition (Nov 24-25, 2025, Paris) - Bilingual interactive demo showcasing Adservio's sovereign AI

Résumé Express – Humain vs IA

A bilingual "Human vs AI" summarization game for retail exhibitions

Adservio builds sovereign, aligned, responsible, and frugal AI.


Overview

This Streamlit application creates an engaging "Human vs AI" game for exhibition stands. Visitors:

  1. See a short Wikipedia text for a few seconds
  2. Write their own summary from memory
  3. Compare their summary with an AI-generated summary
  4. Reveal the original text to see who did better

The app is fully bilingual (French/English) and designed for live demonstrations at retail expos such as those held at Paris Expo Porte de Versailles.

Key Features

Core Gameplay

  • Bilingual interface: French and English with instant language switching
  • Two-pass AI summarization: Produces consistent, high-quality summaries even with small models
  • Local LLM execution: Uses Ollama to run models locally (no external API calls)
  • Online & Offline modes: Fetch live Wikipedia content or use pre-built corpus
  • Multiple model support: Granite (default), Llama, DeepSeek, Mistral
  • Dark theme: Professional exhibition-ready interface
  • Always-visible branding: Adservio logo and motto displayed at all times
  • Countdown timer: Visual circular countdown showing remaining time

Semantic Analysis (v0.5.0+)

  • Semantic scoreboard: Multi-dimensional evaluation with 6 metrics
  • Concept highlighting: Visual highlighting of semantically matching content
  • Matrix visualization: Interactive heatmap showing phrase-level correspondences
  • Cross-language support: Works even when comparing French ↔ English summaries
  • Pedagogical tooltips: Mouseover explanations with mathematical formulas
  • "Cheat" mode: Iterative learning - revise and recalculate your scores

Game Rules

How to Play

  1. 📖 Read Phase (configurable time, default 10s)

    • A Wikipedia text excerpt appears on screen (60-120 words)
    • A countdown circle shows remaining time
    • Read and memorize as much as you can!
  2. ✍️ Write Phase (unlimited time)

    • Text disappears
    • Write your summary from memory
    • Try to capture the key information in 2-3 sentences
    • Click "Valider mon résumé / Submit my summary"
  3. 🤖 AI Challenge

    • The AI generates its own summary using two-pass summarization
    • First pass: Extract key facts (hidden)
    • Second pass: Generate polished 2-sentence summary (visible)
  4. 📊 Comparison & Results

    • See both summaries side-by-side
    • View semantic scoreboard with detailed metrics
    • Optional: Highlight matching concepts
    • Optional: View correspondence analysis matrix
    • Reveal original text to see who captured it better
  5. 🎯 "Cheat" to Learn (optional)

    • Edit your summary after seeing results
    • Click "Tricher / Cheat" to recalculate scores
    • Experiment with different phrasings
    • Learn what makes a good summary!

Winning Conditions

The semantic scoreboard determines the winner using a composite score (0-100):

  • Score difference > 2 points: Clear winner
  • Score difference ≤ 2 points: Tie (both did well!)

The AI is strong, but humans can win by:

  • Capturing key concepts precisely
  • Staying focused (no off-topic content)
  • Writing concise, well-structured summaries
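As a sketch, the tie rule above reduces to a few lines (the function name and signature are illustrative, not taken from the app's code):

```python
def decide_winner(human_score: float, ai_score: float, tie_margin: float = 2.0) -> str:
    """Return 'human', 'ai', or 'tie' from two composite scores (0-100).

    A difference of tie_margin points or less counts as a tie.
    """
    diff = human_score - ai_score
    if abs(diff) <= tie_margin:
        return "tie"
    return "human" if diff > 0 else "ai"
```

For example, `decide_winner(78.0, 75.5)` declares the human the winner (difference of 2.5 points), while `decide_winner(70.0, 71.5)` is a tie.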

Semantic Metrics Explained

The app uses multilingual sentence embeddings (384-dimensional vectors) to evaluate summaries semantically, not just word-by-word.

🏆 Semantic Scoreboard (6 Metrics)

1. Final Score (/100)

  • What it measures: Overall summary quality
  • Formula: S = 100 × [α·sim + β·cov + γ·focus] × penalty
    • α (global similarity weight) = 0.4
    • β (coverage weight) = 0.3
    • γ (focus weight) = 0.3
  • Interpretation: Higher is better. Combines all metrics with length penalty.

2. Global Similarity (%)

  • What it measures: Overall semantic closeness to original text
  • Method: Cosine similarity between mean embeddings (384-D vectors)
  • Range: 0-100%
  • Interpretation: Measures if summary captures the "meaning" globally

3. Concept Coverage (0–1)

  • What it measures: Fraction of original text concepts captured
  • Formula: coverage = (matched phrases) / (total reference phrases)
  • Threshold: Phrase similarity ≥ 65%
  • Interpretation: Did you cover all the important points?

4. Semantic Focus (0–1)

  • What it measures: Fraction of summary content that's relevant
  • Formula: focus = (aligned phrases) / (total summary phrases)
  • Threshold: Phrase similarity ≥ 65%
  • Interpretation: Did you stay on-topic? (Penalizes hallucinations/off-topic content)

5. Length Penalty (0–1)

  • What it measures: Penalty for summaries too short or too long
  • Formula: p = exp(-((n-50)/25)²) where n = word count
  • Optimum: 50 words
  • Interpretation: Gaussian penalty centered at target length

6. Word Count

  • What it measures: Total words in summary
  • Method: Alphanumeric tokens only
  • Interpretation: Context for understanding length penalty
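Taken together, the formulas above can be reproduced in a short sketch. The metric values `sim`, `cov`, and `focus` are taken in [0, 1] here, and the weights and Gaussian penalty follow the defaults stated above:

```python
import math

def length_penalty(n_words: int, target: int = 50, width: int = 25) -> float:
    """Gaussian penalty p = exp(-((n - target) / width)^2), maximal at the target length."""
    return math.exp(-(((n_words - target) / width) ** 2))

def final_score(sim: float, cov: float, focus: float, n_words: int,
                alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3) -> float:
    """Composite score S = 100 * (alpha*sim + beta*cov + gamma*focus) * penalty."""
    return 100.0 * (alpha * sim + beta * cov + gamma * focus) * length_penalty(n_words)
```

A 50-word summary with global similarity 0.8, coverage 0.7, and focus 0.9 scores 100 × (0.32 + 0.21 + 0.27) × 1 = 80.0.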

📊 Correspondence Analysis

Matrix Heatmap Visualization showing ALL phrase-level relationships:

  • Rows (A, B, C...): Phrases from your summary
  • Columns (1, 2, 3...): Phrases from original text
  • Cell colors:
    • 🟢 Green (>85%): Strong semantic match
    • 🟠 Orange (70-85%): Medium match
    • Gray (<70%): Weak match
  • Cell values: Cosine similarity percentage (0-100%)

What it reveals:

  • Which parts of your summary match which parts of the original
  • One-to-many relationships (one summary phrase capturing multiple original concepts)
  • Gaps in coverage (original concepts you missed)
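A minimal sketch of how such a matrix can be built and color-classified, assuming phrase embeddings are already available as row vectors (the app's actual plotting code may differ):

```python
import numpy as np

def similarity_matrix(summary_vecs: np.ndarray, reference_vecs: np.ndarray) -> np.ndarray:
    """Cosine similarity of each summary phrase (rows) vs. each reference phrase (columns)."""
    a = summary_vecs / np.linalg.norm(summary_vecs, axis=1, keepdims=True)
    b = reference_vecs / np.linalg.norm(reference_vecs, axis=1, keepdims=True)
    return a @ b.T

def classify(sim: float) -> str:
    """Map a similarity value to the heatmap color categories described above."""
    if sim > 0.85:
        return "green"   # strong semantic match
    if sim >= 0.70:
        return "orange"  # medium match
    return "gray"        # weak match
```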

🔍 Concept Highlighting

Toggle buttons to visually highlight matching content:

  • Orange background on semantically similar phrases (≥65% similarity)
  • Content words only (nouns, verbs, proper names)
  • Stopwords excluded (70+ common FR/EN function words)
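The content-word filter can be approximated as follows (a sketch; the stopword set shown is a tiny illustrative subset of the 70+ FR/EN function words the app ships):

```python
import re

# Illustrative subset only; the real list is much larger.
STOPWORDS = {"the", "a", "an", "is", "of", "and", "le", "la", "les", "de", "et", "un", "une"}

def content_words(text: str) -> list[str]:
    """Return lowercase alphanumeric tokens with function words removed."""
    tokens = re.findall(r"\w+", text.lower(), flags=re.UNICODE)
    return [t for t in tokens if t not in STOPWORDS]
```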

🎯 Technical Notes

  • Model: paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions)
  • Languages: 50+ supported (FR, EN, and more)
  • Comparison level: Phrase-level (3-12 words), not word-by-word
  • Threshold: 0.65 for coverage/focus (configurable)
  • Cross-language: Works for FR ↔ EN comparisons!

Quick Start

Prerequisites

  • Conda or Python 3.11+
  • Ollama installed and running (see ollama.com)

Installation

  1. Clone or download this repository

  2. Create and activate the conda environment:

conda env create -f environment.yaml
conda activate retail-summarizer
  3. Install dependencies (if needed):
pip install -r requirements.txt
  4. Install semantic embeddings model (for semantic analysis):
python scripts/preinstall_semantics_model.py

This downloads and caches the multilingual embedding model (~100MB) for offline use.

  5. Pull required Ollama models:
# Minimum (default model)
ollama pull granite3.1-moe:3b

# Optional alternatives
ollama pull llama3:latest
ollama pull deepseek-r1:14b
ollama pull mistral:latest
  6. Generate offline corpus (recommended for exhibition use):
# French corpus (300 excerpts)
python scripts/build_offline_corpus.py build --lang fr --n 300

# English corpus (300 excerpts)
python scripts/build_offline_corpus.py build --lang simple --n 300
  7. Start the application:
streamlit run app.py

The app will open in your browser at http://localhost:8501.


Configuration

Models (config/models.json)

Define available LLM models for Ollama. Each model needs:

  • id: Unique identifier
  • label: Display name in UI
  • ollama_name: Exact Ollama model name (e.g., granite3.1-moe:3b)
  • description: User-friendly description

The default_model_id specifies which model loads on startup.
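Loading and validating this file might look like the following sketch. The exact JSON layout assumed here, a top-level `models` list plus `default_model_id`, is an assumption based on the fields described above:

```python
import json
from pathlib import Path

REQUIRED_FIELDS = ("id", "label", "ollama_name", "description")

def parse_models(cfg: dict) -> tuple[list[dict], str]:
    """Validate model entries and return (models, default_model_id)."""
    models = cfg["models"]
    for entry in models:
        for field in REQUIRED_FIELDS:
            if field not in entry:
                raise ValueError(f"model entry missing '{field}': {entry}")
    return models, cfg["default_model_id"]

def load_models(path: str = "config/models.json") -> tuple[list[dict], str]:
    """Read and validate the model definitions file."""
    return parse_models(json.loads(Path(path).read_text(encoding="utf-8")))
```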

App Settings (config/app_config.yaml)

Key configuration options:

General Settings

  • default_language: Starting language ("fr" or "en")
  • default_display_time: Seconds to show original text (default: 10)
  • min_words / max_words: Wikipedia excerpt size range (60-120)
  • ollama.host / ollama.port: Ollama server connection

Summarization Settings

  • summarization.target_sentences: Target summary length (default: 2)
  • summarization.max_attempts: Retry attempts for sentence validation (default: 2)

Semantic Analysis Settings (v0.5.0+)

  • semantics.reference_source: Comparison reference ("raw" = original text, "internal" = AI's first pass)
  • semantics.similarity_threshold: Threshold for phrase matching (default: 0.65, range: 0.60-0.75)
  • semantics.target_word_count: Optimal summary length for penalty (default: 50 words)
  • semantics.weights.alpha: Weight for global similarity (default: 0.4)
  • semantics.weights.beta: Weight for concept coverage (default: 0.3)
  • semantics.weights.gamma: Weight for semantic focus (default: 0.3)

Tuning tips:

  • Lower similarity_threshold (0.60-0.63) for more lenient matching
  • Higher similarity_threshold (0.70-0.75) for stricter matching
  • Adjust target_word_count based on your typical text lengths
  • Weights must sum to 1.0 for proper scaling
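Since the weights must sum to 1.0, a cheap startup check avoids silent mis-scaling (a sketch; the app itself may or may not perform this validation):

```python
def check_weights(alpha: float, beta: float, gamma: float, tol: float = 1e-6) -> None:
    """Raise if the score weights do not sum to 1.0 (required for proper scaling)."""
    total = alpha + beta + gamma
    if abs(total - 1.0) > tol:
        raise ValueError(f"semantics weights must sum to 1.0, got {total:.4f}")
```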

Usage

Main Workflow

  1. Select language (FR/EN) in sidebar
  2. Choose LLM model from available options
  3. Select mode: Online (live Wikipedia) or Offline (local corpus)
  4. Adjust display time (5-60 seconds)
  5. Click "Nouveau texte / New text" to load a random excerpt
  6. Click "Démarrer / Start" to begin the round
  7. Read the text before it disappears
  8. Write your summary from memory
  9. Submit and see the AI's summary
  10. Compare and reveal the original text

Two-Pass Summarization

The app uses a clever "two-pass" strategy to ensure high-quality AI summaries:

  1. First pass (hidden): Extract key information from the full text
  2. Second pass (visible): Generate a polished N-sentence summary from the internal summary

This approach:

  • Improves consistency with small models
  • Reduces hallucinations
  • Produces cleaner, more focused summaries
  • Validates sentence count and retries if needed
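The two passes can be sketched with an injected `generate` callable standing in for any LLM backend, such as a call to the local Ollama server. The prompt wording below is illustrative, not the app's actual prompts:

```python
import re
from typing import Callable

def count_sentences(text: str) -> int:
    """Crude sentence count: split on terminal punctuation."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])

def two_pass_summary(text: str, generate: Callable[[str], str],
                     n_sentences: int = 2, max_attempts: int = 2) -> str:
    """Pass 1 extracts key facts (kept hidden); pass 2 rewrites them as a
    polished summary, retrying if the sentence count is off."""
    facts = generate(f"Extract the key facts from this text:\n\n{text}")
    summary = ""
    for _ in range(max_attempts):
        summary = generate(
            f"Write a summary of exactly {n_sentences} sentences from these facts:\n\n{facts}"
        )
        if count_sentences(summary) == n_sentences:
            break
    return summary
```

Injecting `generate` keeps the pipeline testable without a running model server.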

Offline Corpus

For reliable exhibition use without internet dependency, pre-generate a local corpus:

Build Corpus

# French Wikipedia
python scripts/build_offline_corpus.py build --lang fr --n 300

# Simple English Wikipedia
python scripts/build_offline_corpus.py build --lang simple --n 300

# Append more entries
python scripts/build_offline_corpus.py build --lang fr --n 100 --append

View Corpus Statistics

python scripts/build_offline_corpus.py stats

The corpus is saved in data/wiki_corpus.jsonl (JSONL format, one entry per line).
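Reading the corpus back is one JSON parse per line, since JSONL stores one standalone object per line (a sketch; the field names `title` and `text` are assumptions about the corpus schema):

```python
import json
import random
from pathlib import Path

def parse_corpus(raw: str) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]

def load_corpus(path: str = "data/wiki_corpus.jsonl") -> list[dict]:
    """Load every entry of the offline corpus."""
    return parse_corpus(Path(path).read_text(encoding="utf-8"))

def random_excerpt(entries: list[dict]) -> dict:
    """Draw a random excerpt, as offline mode does for each round."""
    return random.choice(entries)
```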


Technical Details

Architecture

  • Frontend: Streamlit with custom CSS for dark theme
  • LLM Backend: Ollama (local inference)
  • Data Source: Wikipedia API (online) or local JSONL corpus (offline)
  • State Management: Streamlit session state

Models

Default: Granite 3.1 MoE 3B

  • Multilingual (French/English)
  • Fast on CPU/GPU
  • Good balance of size vs. quality

Alternatives:

  • Llama 3: Strong general-purpose performance
  • DeepSeek R1 14B: Larger reasoning model (slower)
  • Mistral: Compact and efficient

File Structure

.
├── app.py                            # Main Streamlit application (1700+ lines)
├── semantics_utils.py                # Semantic analysis module (v0.5.0+)
├── CLAUDE.md                         # Project specification
├── README.md                         # This file
├── CHANGELOG.md                      # Version history
├── VERSION.txt                       # Current version
├── SEMANTICS_IMPLEMENTATION.md       # Semantic analysis documentation
├── environment.yaml                  # Conda environment
├── requirements.txt                  # Python dependencies
├── config/
│   ├── models.json                  # LLM model definitions
│   └── app_config.yaml              # App settings (incl. semantic config)
├── data/
│   └── wiki_corpus.jsonl            # Offline corpus (generated, git-ignored)
├── models/
│   └── embeddings/                  # Cached embedding model (git-ignored)
├── scripts/
│   ├── build_offline_corpus.py     # Corpus builder CLI
│   └── preinstall_semantics_model.py # Embedding model downloader
├── assets/
│   └── adservio-logo.svg           # Adservio branding
└── .streamlit/
    └── config.toml                 # Streamlit theme (dark mode)

Troubleshooting

"Cannot connect to Ollama"

Solution: Ensure Ollama is running:

# Check if running
ollama list

# If not running, start it (varies by OS)
# On Linux:
ollama serve

"Offline corpus not available"

Solution: Generate the corpus first:

python scripts/build_offline_corpus.py build --lang fr --n 300

Models not found

Solution: Pull the required models:

ollama pull granite3.1-moe:3b

Slow performance

Solutions:

  • Use a smaller model (Granite 3B or Mistral)
  • Reduce max_words in config (fewer words to process)
  • Use offline mode (eliminates Wikipedia fetch time)
  • Ensure Ollama is using GPU if available

Exhibition Tips

Before the Event

  1. Pre-install everything offline:

    # Download embedding model
    python scripts/preinstall_semantics_model.py
    
    # Pull Ollama models
    ollama pull granite3.1-moe:3b
    
    # Build offline corpus (600+ entries)
    python scripts/build_offline_corpus.py build --lang fr --n 300
    python scripts/build_offline_corpus.py build --lang simple --n 300
  2. Test on exhibition hardware:

    • Run end-to-end gameplay with semantic analysis
    • Verify embedding model loads quickly (<2s)
    • Test both French and English modes
    • Check matrix visualization renders properly
  3. Configure for optimal experience:

    • Set default_display_time: 8 (challenging but fair)
    • Use reference_source: "raw" for fair scoring
    • Keep similarity_threshold: 0.65 for balanced matching

During the Event

  1. Use offline mode - No internet dependency
  2. Keep Granite 3B model - Best speed/quality balance
  3. Show tooltips - Hover over metrics to explain to technical visitors
  4. Encourage "cheating" - Let visitors iterate and learn!
  5. Highlight cross-language - Show FR summary vs EN original (impressive!)

Engagement Strategies

  1. Challenge visitors: "Can you beat the AI in 10 seconds?"
  2. Show the matrix: "See how your words map to the original"
  3. Explain the math: Hover tooltips show formulas (for engineers/academics)
  4. Iterate mode: "Try again with the Cheat button - learn what works!"
  5. Multilingual demo: "Write in English, compare to French text - it works!"

Troubleshooting On-Site

  • Slow semantic analysis: Check if embedding model is cached (models/embeddings/)
  • Coverage/focus = 0.00: Lower threshold to 0.60 in config
  • Matrix too large: Reduce max_words in config to get shorter texts
  • AI summaries too long: Check Ollama model is properly loaded

Credits

Author: Olivier Vitrac, PhD, HDR
Email: olivier.vitrac@adservio.fr
Organization: Adservio – Innovation Lab

Technologies:

  • Streamlit - UI framework with custom dark theme
  • Ollama - Local LLM serving (no external API calls)
  • Wikipedia API - Content source (online mode)
  • Sentence Transformers - Multilingual embeddings (semantic analysis)
  • PyTorch - Deep learning backend
  • scikit-learn - Cosine similarity computations
  • Language Models:
    • Granite 3.1 MoE 3B (default)
    • Llama 3 Latest
    • DeepSeek R1 14B
    • Mistral Latest
  • Embedding Model: paraphrase-multilingual-MiniLM-L12-v2 (384-D, 50+ languages)

License

© 2025 Adservio. All rights reserved.

Adservio builds sovereign, aligned, responsible, and frugal AI.


Quick Reference Card

For Operators

Starting the app:

conda activate retail-summarizer
streamlit run app.py

Key buttons:

  • Nouveau texte - Load new Wikipedia excerpt
  • Démarrer - Start countdown timer
  • Valider mon résumé - Submit human summary (triggers AI)
  • Tricher - Recalculate after editing summary
  • Analyse des correspondances - Show matrix visualization

Metrics to explain:

  1. Score final - Overall quality (0-100)
  2. Similarité globale - Semantic closeness (%)
  3. Couverture - Concepts captured (0-1)
  4. Focus - Content relevance (0-1)
  5. Pénalité - Length penalty (0-1)

Troubleshooting:

  • Slow? Check offline mode is enabled
  • Coverage = 0? Normal for very different summaries
  • AI not responding? Check ollama list in terminal

For Technical Visitors

Hover over ℹ️ icons for:

  • Mathematical formulas
  • Technical explanations
  • Vector space dimensions

Show off features:

  • Cross-language comparison (FR ↔ EN)
  • Matrix heatmap (phrase-level matching)
  • Semantic embeddings (384 dimensions)
  • Local execution (no API calls)

Key technical points:

  • Multilingual transformer embeddings
  • Cosine similarity in 384-D space
  • Phrase-level granularity (not word-by-word)
  • Threshold: 65% for phrase matching
  • Two-pass summarization for stability

Version

Current Version: 0.5.0 (2025-11-22)

Major Features:

  • v0.1: Basic gameplay
  • v0.2: Bilingual support
  • v0.3: Rules display, countdown timer
  • v0.4: Semantic analysis, scoreboard, concept highlighting
  • v0.5: Matrix visualization, tooltips, cheat mode, phrase-level metrics

See CHANGELOG.md for detailed version history.
