A bilingual "Human vs AI" summarization game for retail exhibitions
Adservio builds sovereign, aligned, responsible and frugal AI.
This Streamlit application creates an engaging "Human vs AI" game for exhibition stands. Visitors:
- See a short Wikipedia text for a few seconds
- Write their own summary from memory
- Compare their summary with an AI-generated summary
- Reveal the original text to see who did better
The app is fully bilingual (French/English) and designed for live demonstrations at retail expos like Porte de Versailles.
- Bilingual interface: French and English with instant language switching
- Two-pass AI summarization: Guarantees high-quality, consistent summaries even with small models
- Local LLM execution: Uses Ollama to run models locally (no external API calls)
- Online & Offline modes: Fetch live Wikipedia content or use pre-built corpus
- Multiple model support: Granite (default), Llama, DeepSeek, Mistral
- Dark theme: Professional exhibition-ready interface
- Always-visible branding: Adservio logo and motto displayed at all times
- Countdown timer: Visual circular countdown showing remaining time
- Semantic scoreboard: Multi-dimensional evaluation with 6 metrics
- Concept highlighting: Visual highlighting of semantically matching content
- Matrix visualization: Interactive heatmap showing phrase-level correspondences
- Cross-language support: Works even when comparing French ↔ English summaries
- Pedagogical tooltips: Mouseover explanations with mathematical formulas
- "Cheat" mode: Iterative learning - revise and recalculate your scores
- 📖 Read Phase (configurable time, default 10s)
- A Wikipedia text excerpt appears on screen (60-120 words)
- A countdown circle shows remaining time
- Read and memorize as much as you can!
- ✍️ Write Phase (unlimited time)
- Text disappears
- Write your summary from memory
- Try to capture the key information in 2-3 sentences
- Click "Valider mon résumé / Submit my summary"
- 🤖 AI Challenge
- The AI generates its own summary using two-pass summarization
- First pass: Extract key facts (hidden)
- Second pass: Generate polished 2-sentence summary (visible)
- 📊 Comparison & Results
- See both summaries side-by-side
- View semantic scoreboard with detailed metrics
- Optional: Highlight matching concepts
- Optional: View correspondence analysis matrix
- Reveal original text to see who captured it better
- 🎯 "Cheat" to Learn (optional)
- Edit your summary after seeing results
- Click "Tricher / Cheat" to recalculate scores
- Experiment with different phrasings
- Learn what makes a good summary!
The semantic scoreboard determines the winner using a composite score (0-100):
- Score difference > 2 points: Clear winner
- Score difference ≤ 2 points: Tie (both did well!)
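The 2-point margin rule can be sketched in a few lines; the function name and return values are illustrative, not the app's actual API:

```python
def decide_winner(human_score: float, ai_score: float, margin: float = 2.0) -> str:
    """Apply the tie rule: within `margin` points, both did well."""
    diff = human_score - ai_score
    if abs(diff) <= margin:
        return "tie"
    return "human" if diff > 0 else "ai"
```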
The AI is strong, but humans can win by:
- Capturing key concepts precisely
- Staying focused (no off-topic content)
- Writing concise, well-structured summaries
The app uses multilingual sentence embeddings (384-dimensional vectors) to evaluate summaries semantically, not just word-by-word.
- What it measures: Overall summary quality
- Formula:
S = 100 × [α·sim + β·cov + γ·focus] × penalty
- α (global similarity weight) = 0.4
- β (coverage weight) = 0.3
- γ (focus weight) = 0.3
- Interpretation: Higher is better. Combines all metrics with length penalty.
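As a sanity check, the composite score can be reproduced in plain Python. The function name and signature are assumptions, but the arithmetic follows the formula and weights above, including the Gaussian length penalty:

```python
import math

def composite_score(sim: float, cov: float, focus: float, n_words: int,
                    alpha: float = 0.4, beta: float = 0.3, gamma: float = 0.3,
                    target: int = 50, width: int = 25) -> float:
    """S = 100 × [α·sim + β·cov + γ·focus] × penalty, inputs in [0, 1]."""
    penalty = math.exp(-((n_words - target) / width) ** 2)  # Gaussian, peaks at `target`
    return 100 * (alpha * sim + beta * cov + gamma * focus) * penalty
```

A perfect summary at exactly 50 words scores 100; the same summary at 75 words is penalized by exp(-1) ≈ 0.37.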
- What it measures: Overall semantic closeness to original text
- Method: Cosine similarity between mean embeddings (384-D vectors)
- Range: 0-100%
- Interpretation: Measures if summary captures the "meaning" globally
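The computation reduces to a cosine between mean-pooled vectors; a pure-Python sketch on toy vectors (in the app the inputs are 384-D sentence embeddings, and these helper names are illustrative):

```python
import math

def mean_vector(vectors):
    """Mean-pool a list of equal-length embedding vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```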
- What it measures: Fraction of original text concepts captured
- Formula:
coverage = (matched phrases) / (total reference phrases)
- Threshold: Phrase similarity ≥ 65%
- Interpretation: Did you cover all the important points?
- What it measures: Fraction of summary content that's relevant
- Formula:
focus = (aligned phrases) / (total summary phrases)
- Threshold: Phrase similarity ≥ 65%
- Interpretation: Did you stay on-topic? (Penalizes hallucinations/off-topic content)
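Given a precomputed phrase-similarity matrix, coverage and focus reduce to counting matches above the threshold; a minimal sketch with hypothetical names (`sim[i][j]` = similarity of summary phrase i vs reference phrase j):

```python
def coverage_and_focus(sim, threshold=0.65):
    """Return (coverage, focus) from a summary-by-reference similarity matrix."""
    n_summary = len(sim)
    n_reference = len(sim[0]) if sim else 0
    # A reference phrase is covered if at least one summary phrase matches it
    covered = sum(1 for j in range(n_reference)
                  if any(sim[i][j] >= threshold for i in range(n_summary)))
    # A summary phrase is aligned if it matches at least one reference phrase
    aligned = sum(1 for i in range(n_summary)
                  if any(sim[i][j] >= threshold for j in range(n_reference)))
    coverage = covered / n_reference if n_reference else 0.0
    focus = aligned / n_summary if n_summary else 0.0
    return coverage, focus
```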
- What it measures: Penalty for summaries too short or too long
- Formula:
p = exp(-((n-50)/25)²), where n = word count
- Optimum: 50 words
- Interpretation: Gaussian penalty centered at target length
- What it measures: Total words in summary
- Method: Alphanumeric tokens only
- Interpretation: Context for understanding length penalty
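A minimal approximation of the token count (`\w+` also matches accented French letters; the exact tokenizer in the app may differ):

```python
import re

def word_count(text: str) -> int:
    # Count alphanumeric tokens only; punctuation is ignored
    return len(re.findall(r"\w+", text))
```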
Matrix Heatmap Visualization showing ALL phrase-level relationships:
- Rows (A, B, C...): Phrases from your summary
- Columns (1, 2, 3...): Phrases from original text
- Cell colors:
- 🟢 Green (>85%): Strong semantic match
- 🟠 Orange (70-85%): Medium match
- ⚪ Gray (<70%): Weak match
- Cell values: Cosine similarity percentage (0-100%)
What it reveals:
- Which parts of your summary match which parts of the original
- One-to-many relationships (one summary phrase capturing multiple original concepts)
- Gaps in coverage (original concepts you missed)
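The cell coloring maps directly onto the similarity thresholds above; a sketch (bucket names illustrative):

```python
def cell_color(similarity: float) -> str:
    """Map a cosine-similarity percentage (0-100) to a heatmap bucket."""
    if similarity > 85:
        return "green"   # strong semantic match
    if similarity >= 70:
        return "orange"  # medium match
    return "gray"        # weak match
```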
Toggle buttons to visually highlight matching content:
- Orange background on semantically similar phrases (≥65% similarity)
- Content words only (nouns, verbs, proper names)
- Stopwords excluded (70+ common FR/EN function words)
- Model: `paraphrase-multilingual-MiniLM-L12-v2` (384 dimensions)
- Languages: 50+ supported (FR, EN, and more)
- Comparison level: Phrase-level (3-12 words), not word-by-word
- Threshold: 0.65 for coverage/focus (configurable)
- Cross-language: Works for FR ↔ EN comparisons!
- Conda or Python 3.11+
- Ollama installed and running (see ollama.com)
- Clone or download this repository
- Create and activate the conda environment:
  ```bash
  conda env create -f environment.yaml
  conda activate retail-summarizer
  ```
- Install dependencies (if needed):
  ```bash
  pip install -r requirements.txt
  ```
- Install the semantic embeddings model (for semantic analysis):
  ```bash
  python scripts/preinstall_semantics_model.py
  ```
  This downloads and caches the multilingual embedding model (~100MB) for offline use.
- Pull required Ollama models:
  ```bash
  # Minimum (default model)
  ollama pull granite3.1-moe:3b

  # Optional alternatives
  ollama pull llama3:latest
  ollama pull deepseek-r1:14b
  ollama pull mistral:latest
  ```
- Generate the offline corpus (recommended for exhibition use):
  ```bash
  # French corpus (300 excerpts)
  python scripts/build_offline_corpus.py build --lang fr --n 300

  # English corpus (300 excerpts)
  python scripts/build_offline_corpus.py build --lang simple --n 300
  ```
- Start the application:
  ```bash
  streamlit run app.py
  ```
  The app will open in your browser at http://localhost:8501.
Define available LLM models for Ollama. Each model needs:
- `id`: Unique identifier
- `label`: Display name in the UI
- `ollama_name`: Exact Ollama model name (e.g., `granite3.1-moe:3b`)
- `description`: User-friendly description
The `default_model_id` setting specifies which model loads on startup.
Key configuration options:
- `default_language`: Starting language (`"fr"` or `"en"`)
- `default_display_time`: Seconds to show the original text (default: 10)
- `min_words` / `max_words`: Wikipedia excerpt size range (60-120)
- `ollama.host` / `ollama.port`: Ollama server connection
- `summarization.target_sentences`: Target summary length (default: 2)
- `summarization.max_attempts`: Retry attempts for sentence validation (default: 2)
- `semantics.reference_source`: Comparison reference (`"raw"` = original text, `"internal"` = AI's first pass)
- `semantics.similarity_threshold`: Threshold for phrase matching (default: 0.65, range: 0.60-0.75)
- `semantics.target_word_count`: Optimal summary length for the penalty (default: 50 words)
- `semantics.weights.alpha`: Weight for global similarity (default: 0.4)
- `semantics.weights.beta`: Weight for concept coverage (default: 0.3)
- `semantics.weights.gamma`: Weight for semantic focus (default: 0.3)
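For orientation, a hypothetical `semantics` fragment of `app_config.yaml` with the defaults listed above (the real file may organize its keys differently):

```yaml
semantics:
  reference_source: "raw"       # compare against the original text
  similarity_threshold: 0.65    # phrase-matching threshold
  target_word_count: 50         # optimum for the length penalty
  weights:
    alpha: 0.4                  # global similarity
    beta: 0.3                   # concept coverage
    gamma: 0.3                  # semantic focus
```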
Tuning tips:
- Lower `similarity_threshold` (0.60-0.63) for more lenient matching
- Raise `similarity_threshold` (0.70-0.75) for stricter matching
- Adjust `target_word_count` based on your typical text lengths
- Weights must sum to 1.0 for proper scaling
- Select language (FR/EN) in sidebar
- Choose LLM model from available options
- Select mode: Online (live Wikipedia) or Offline (local corpus)
- Adjust display time (5-60 seconds)
- Click "Nouveau texte / New text" to load a random excerpt
- Click "Démarrer / Start" to begin the round
- Read the text before it disappears
- Write your summary from memory
- Submit and see the AI's summary
- Compare and reveal the original text
The app uses a clever "two-pass" strategy to ensure high-quality AI summaries:
- First pass (hidden): Extract key information from the full text
- Second pass (visible): Generate a polished N-sentence summary from the internal summary
This approach:
- Improves consistency with small models
- Reduces hallucinations
- Produces cleaner, more focused summaries
- Validates sentence count and retries if needed
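The control flow can be sketched with an injected `generate` callable standing in for the Ollama call; the prompts and names are illustrative, not the app's actual code:

```python
import re

def two_pass_summarize(text, generate, n_sentences=2, max_attempts=2):
    """Two-pass summarization: extract facts, then polish with retry."""
    # First pass (hidden): extract key facts from the full text
    facts = generate(f"Extract the key facts from this text:\n{text}")
    # Second pass (visible): polish into exactly n_sentences, retrying if needed
    for _ in range(max_attempts):
        summary = generate(
            f"Write a {n_sentences}-sentence summary of these facts:\n{facts}")
        if len(re.findall(r"[.!?]+", summary)) == n_sentences:
            return summary
    return summary  # fall back to the last attempt
```

Injecting the LLM call makes the validation logic testable with a stub generator.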
For reliable exhibition use without internet dependency, pre-generate a local corpus:
```bash
# French Wikipedia
python scripts/build_offline_corpus.py build --lang fr --n 300

# Simple English Wikipedia
python scripts/build_offline_corpus.py build --lang simple --n 300

# Append more entries
python scripts/build_offline_corpus.py build --lang fr --n 100 --append

# Show corpus statistics
python scripts/build_offline_corpus.py stats
```
The corpus is saved in `data/wiki_corpus.jsonl` (JSONL format, one entry per line).
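Reading the corpus back is straightforward since each line is an independent JSON object; a sketch (the field names in the comment are assumptions, inspect the file for the real schema):

```python
import json

def load_corpus(path):
    """Read a JSONL corpus: one JSON object (e.g. a Wikipedia excerpt) per line."""
    entries = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                entries.append(json.loads(line))
    return entries
```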
- Frontend: Streamlit with custom CSS for dark theme
- LLM Backend: Ollama (local inference)
- Data Source: Wikipedia API (online) or local JSONL corpus (offline)
- State Management: Streamlit session state
Default: Granite 3.1 MoE 3B
- Multilingual (French/English)
- Fast on CPU/GPU
- Good balance of size vs. quality
Alternatives:
- Llama 3: Strong general-purpose performance
- DeepSeek R1 14B: Larger reasoning model (slower)
- Mistral: Compact and efficient
```
.
├── app.py                      # Main Streamlit application (1700+ lines)
├── semantics_utils.py          # Semantic analysis module (v0.5.0+)
├── CLAUDE.md                   # Project specification
├── README.md                   # This file
├── CHANGELOG.md                # Version history
├── VERSION.txt                 # Current version
├── SEMANTICS_IMPLEMENTATION.md # Semantic analysis documentation
├── environment.yaml            # Conda environment
├── requirements.txt            # Python dependencies
├── config/
│   ├── models.json             # LLM model definitions
│   └── app_config.yaml         # App settings (incl. semantic config)
├── data/
│   └── wiki_corpus.jsonl       # Offline corpus (generated, git-ignored)
├── models/
│   └── embeddings/             # Cached embedding model (git-ignored)
├── scripts/
│   ├── build_offline_corpus.py # Corpus builder CLI
│   └── preinstall_semantics_model.py # Embedding model downloader
├── assets/
│   └── adservio-logo.svg       # Adservio branding
└── .streamlit/
    └── config.toml             # Streamlit theme (dark mode)
```
Solution: Ensure Ollama is running:
```bash
# Check if running
ollama list

# If not running, start it (varies by OS)
# On Linux:
ollama serve
```
Solution: Generate the corpus first:
```bash
python scripts/build_offline_corpus.py build --lang fr --n 300
```
Solution: Pull the required models:
```bash
ollama pull granite3.1-moe:3b
```
Solutions:
- Use a smaller model (Granite 3B or Mistral)
- Reduce `max_words` in config (fewer words to process)
- Use offline mode (eliminates the Wikipedia fetch time)
- Ensure Ollama is using the GPU if available
- Pre-install everything offline:
  ```bash
  # Download embedding model
  python scripts/preinstall_semantics_model.py

  # Pull Ollama models
  ollama pull granite3.1-moe:3b

  # Build offline corpus (600+ entries)
  python scripts/build_offline_corpus.py build --lang fr --n 300
  python scripts/build_offline_corpus.py build --lang simple --n 300
  ```
- Test on exhibition hardware:
  - Run end-to-end gameplay with semantic analysis
  - Verify the embedding model loads quickly (<2s)
  - Test both French and English modes
  - Check that the matrix visualization renders properly
- Configure for the optimal experience:
  - Set `default_display_time: 8` (challenging but fair)
  - Use `reference_source: "raw"` for fair scoring
  - Keep `similarity_threshold: 0.65` for balanced matching
- Use offline mode - No internet dependency
- Keep Granite 3B model - Best speed/quality balance
- Show tooltips - Hover over metrics to explain to technical visitors
- Encourage "cheating" - Let visitors iterate and learn!
- Highlight cross-language - Show FR summary vs EN original (impressive!)
- Challenge visitors: "Can you beat the AI in 10 seconds?"
- Show the matrix: "See how your words map to the original"
- Explain the math: Hover tooltips show formulas (for engineers/academics)
- Iterate mode: "Try again with the Cheat button - learn what works!"
- Multilingual demo: "Write in English, compare to French text - it works!"
- Slow semantic analysis: Check that the embedding model is cached (`models/embeddings/`)
- Coverage/focus = 0.00: Lower the threshold to 0.60 in config
- Matrix too large: Reduce `max_words` in config to get shorter texts
- AI summaries too long: Check that the Ollama model is properly loaded
Author: Olivier Vitrac, PhD, HDR
Email: olivier.vitrac@adservio.fr
Organization: Adservio – Innovation Lab
Technologies:
- Streamlit - UI framework with custom dark theme
- Ollama - Local LLM serving (no external API calls)
- Wikipedia API - Content source (online mode)
- Sentence Transformers - Multilingual embeddings (semantic analysis)
- PyTorch - Deep learning backend
- scikit-learn - Cosine similarity computations
- Language Models:
- Granite 3.1 MoE 3B (default)
- Llama 3 Latest
- DeepSeek R1 14B
- Mistral Latest
- Embedding Model: `paraphrase-multilingual-MiniLM-L12-v2` (384-D, 50+ languages)
© 2025 Adservio. All rights reserved.
Adservio builds sovereign, aligned, responsible and frugal AI.
Starting the app:
```bash
conda activate retail-summarizer
streamlit run app.py
```
Key buttons:
- Nouveau texte - Load new Wikipedia excerpt
- Démarrer - Start countdown timer
- Valider mon résumé - Submit human summary (triggers AI)
- Tricher - Recalculate after editing summary
- Analyse des correspondances - Show matrix visualization
Metrics to explain:
- Score final - Overall quality (0-100)
- Similarité globale - Semantic closeness (%)
- Couverture - Concepts captured (0-1)
- Focus - Content relevance (0-1)
- Pénalité - Length penalty (0-1)
Troubleshooting:
- Slow? Check offline mode is enabled
- Coverage = 0? Normal for very different summaries
- AI not responding? Check `ollama list` in a terminal
Hover over ℹ️ icons for:
- Mathematical formulas
- Technical explanations
- Vector space dimensions
Show off features:
- Cross-language comparison (FR ↔ EN)
- Matrix heatmap (phrase-level matching)
- Semantic embeddings (384 dimensions)
- Local execution (no API calls)
Key technical points:
- Multilingual transformer embeddings
- Cosine similarity in 384-D space
- Phrase-level granularity (not word-by-word)
- Threshold: 65% for phrase matching
- Two-pass summarization for stability
Current Version: 0.5.0 (2025-11-22)
Major Features:
- v0.1: Basic gameplay
- v0.2: Bilingual support
- v0.3: Rules display, countdown timer
- v0.4: Semantic analysis, scoreboard, concept highlighting
- v0.5: Matrix visualization, tooltips, cheat mode, phrase-level metrics
See CHANGELOG.md for detailed version history.