# Extract main content from contributions

This methodology relies on the ability of LLMs to reformulate textual content.
The proposed approach uses an LLM to rewrite each contribution as a list of main ideas.
More specifically, the LLM reformulates each contribution as a set of opinions and propositions.

**WARNING**:
The code below requires a CUDA-compatible GPU.

In [1]:
import gc
import torch

# Free GPU memory in case a model is already loaded
gc.collect()
torch.cuda.empty_cache()

## Load the LLM model and its configuration

In [2]:
import tomllib
from pathlib import Path

CONFIG_REPO = Path("../config").resolve()
CONFIG_PATH = CONFIG_REPO / "llama-3.1-8B-FR.toml"
PROMPT_PATH = CONFIG_REPO / "prompt.toml"

with open(CONFIG_PATH, "rb") as file:
    configs = tomllib.load(file)

model_id = configs["model"]["name"]
top_p = configs["model"]["top_p"]
temperature = configs["model"]["temperature"]
sampling_params = dict(top_p=top_p, temperature=temperature, max_tokens=2048)

In [3]:
import json
from vllm import LLM, SamplingParams

size = 4096 - 1024
llm = LLM(
    model=model_id,
    task="generate",
    max_num_seqs=1,
    max_model_len=size,
    max_num_batched_tokens=size,
    quantization="awq_marlin",
    gpu_memory_utilization=0.95,
    # enforce_eager=True,
)

sampling_params = SamplingParams(**sampling_params)

INFO 05-11 20:34:30 [__init__.py:239] Automatically detected platform cuda.


INFO 05-11 20:34:45 [awq_marlin.py:113] The model is convertible to awq_marlin during runtime. Using awq_marlin kernel.
INFO 05-11 20:34:45 [config.py:2003] Chunked prefill is enabled with max_num_batched_tokens=3072.
INFO 05-11 20:34:46 [core.py:58] Initializing a V1 LLM engine (v0.8.5.post1) with config: model='hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', speculative_config=None, tokenizer='hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=3072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=awq_marlin, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_me

Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:11<00:11, 11.95s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:45<00:00, 24.49s/it]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:45<00:00, 22.61s/it]



INFO 05-11 20:35:35 [loader.py:458] Loading weights took 45.32 seconds
INFO 05-11 20:35:37 [gpu_model_runner.py:1347] Model loading took 5.3744 GiB and 47.451011 seconds
INFO 05-11 20:35:51 [backends.py:420] Using cache directory: /home/machine_learning/.cache/vllm/torch_compile_cache/9893167dc5/rank_0_0 for vLLM's torch.compile
INFO 05-11 20:35:51 [backends.py:430] Dynamo bytecode transform time: 13.86 s
INFO 05-11 20:35:57 [backends.py:118] Directly load the compiled graph(s) for shape None from the cache, took 4.761 s
INFO 05-11 20:35:58 [monitor.py:33] torch.compile takes 13.86 s in total
INFO 05-11 20:36:02 [kv_cache_utils.py:634] GPU KV cache size: 5,824 tokens
INFO 05-11 20:36:02 [kv_cache_utils.py:637] Maximum concurrency for 3,072 tokens per request: 1.90x
INFO 05-11 20:36:35 [gpu_model_runner.py:1686] Graph capturing finished in 34 secs, took 0.62 GiB
INFO 05-11 20:36:35 [core.py:159] init engine (profile, create kv cache, warmup model) took 58.89 seconds
INFO 05-11 20:36:35 
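
The numbers in the log can be cross-checked against the settings used to build the engine. The sketch below only redoes the arithmetic; it assumes that a request's prompt and generated tokens share the same `max_model_len` window, so a full 2048-token generation leaves that much less room for the prompt.

```python
max_model_len = 4096 - 1024   # `size` in the cell above
max_tokens = 2048             # generation limit set in sampling_params
kv_cache_tokens = 5824        # "GPU KV cache size" reported in the log

# Room left for the prompt if the model generates the full max_tokens.
prompt_budget = max_model_len - max_tokens

# Number of full-length requests that fit in the KV cache at once.
concurrency = kv_cache_tokens / max_model_len

print(prompt_budget)
print(f"{concurrency:.2f}x")
```

The second value matches the "Maximum concurrency" line in the log, which is consistent with running one sequence at a time (`max_num_seqs=1`).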

## Extract the main ideas

In [4]:
from datasets import Dataset

data_path = Path("../data") / "raw" / "contributions"
dataset = Dataset.load_from_disk(str(data_path.resolve()))

[2025-05-11 20:36:37] INFO config.py:54: PyTorch version 2.6.0 available.


In [5]:
contribution = dataset[10]['contribution']
print(contribution)

Rétablir l'ISF sans délai avec incorporation dans son assiette de tous les éléments de fortune (objets d'art inclus + yachts etc) sans aucune autre dérogation ni exception qu'une fraction (25%) de la résidence principale ET les sommes investies durablement (5 ans) dans une entreprise française pour une affectation en FRANCE


In [140]:
with open(PROMPT_PATH, "rb") as file:
    prompt_configs = tomllib.load(file)

system_message = prompt_configs["prompt"]["system"]
user_message = prompt_configs["prompt"]["user"]

In [141]:
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": user_message.format(input=contribution)},
    {"role": "assistant", "content": "description,type,syntax,semantic"},
]

output = llm.chat(messages, sampling_params=sampling_params)

Processed prompts: 100%|██████████| 1/1 [00:32<00:00, 32.39s/it, est. speed input: 12.66 toks/s, output: 3.71 toks/s]


In [142]:
import pandas as pd
import io

csv_data = output[0].outputs[0].text
df = pd.read_csv(io.StringIO(csv_data), header=None)

In [143]:
csv_data

'"Rétablir l\'ISF sans délai",statement,positive,neutral\n"Incorporer dans son assiette de tous les éléments de fortune",statement,positive,negative\n"Objets d\'art inclus",statement,positive,negative\n"Yachts etc",statement,positive,negative\n"Une fraction (25%) de la résidence principale",statement,negative,negative\n"Les sommes investies durablement (5 ans) dans une entreprise française",statement,positive,negative\n"Affectation en FRANCE",statement,positive,negative'

In [144]:
df

|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | Rétablir l'ISF sans délai | statement | positive | neutral |
| 1 | Incorporer dans son assiette de tous les éléme... | statement | positive | negative |
| 2 | Objets d'art inclus | statement | positive | negative |
| 3 | Yachts etc | statement | positive | negative |
| 4 | Une fraction (25%) de la résidence principale | statement | negative | negative |
| 5 | Les sommes investies durablement (5 ans) dans ... | statement | positive | negative |
| 6 | Affectation en FRANCE | statement | positive | negative |
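
The columns above are unnamed because the generated text carries no header row. As an alternative sketch (not the notebook's method), the standard `csv` module can attach the column names used in the assistant prefill and set aside lines that do not have four fields. The helper `parse_rows` and the malformed third line are hypothetical.

```python
import csv
import io

# Column names taken from the assistant prefill in the chat messages.
COLUMNS = ["description", "type", "syntax", "semantic"]

# Two lines from the real output plus one invented malformed line.
raw = (
    '"Rétablir l\'ISF sans délai",statement,positive,neutral\n'
    '"Yachts etc",statement,positive,negative\n'
    'malformed line without enough fields\n'
)

def parse_rows(text):
    """Return (rows, rejected): dicts for well-formed lines, raw field lists otherwise."""
    rows, rejected = [], []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue  # skip blank lines
        if len(fields) == len(COLUMNS):
            rows.append(dict(zip(COLUMNS, (f.strip() for f in fields))))
        else:
            rejected.append(fields)
    return rows, rejected

rows, rejected = parse_rows(raw)
print(len(rows), len(rejected))
```

Keeping the rejected lines instead of dropping them makes it easier to see how often the model drifts from the expected format.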


**Observations:**
- Small LLMs sometimes struggle to extract the main ideas properly from the original content.
- The JSON format seems more stable, but it costs more tokens and is therefore slower to generate.
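
To give a rough feel for the cost difference between the two formats, the sketch below serializes the same two rows as CSV and as JSON and compares their lengths. Character counts are only a proxy: the real token counts depend on the model's tokenizer, which is not used here.

```python
import csv
import io
import json

COLUMNS = ["description", "type", "syntax", "semantic"]
records = [
    ["Rétablir l'ISF sans délai", "statement", "positive", "neutral"],
    ["Yachts etc", "statement", "positive", "negative"],
]

# CSV: one line per idea, no repeated field names.
buffer = io.StringIO()
csv.writer(buffer).writerows(records)
as_csv = buffer.getvalue()

# JSON: every object repeats the four keys.
as_json = json.dumps([dict(zip(COLUMNS, r)) for r in records], ensure_ascii=False)

print(len(as_csv), len(as_json))
```

The JSON version is longer because each object repeats the field names, which is one plausible reason for the slower generation noted above.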

Above, we ran the extraction for a single contribution. The same extraction can be run over thousands of contributions; keep track of the extractions for each contribution to collect its main statements and propositions.
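
A minimal sketch of such a loop is given below. It keeps each contribution next to its extracted ideas, keyed by its position in the dataset. The names `extract_all` and `fake_generate` are hypothetical; `fake_generate` stands in for the LLM call so that the example runs without a GPU.

```python
import csv
import io

COLUMNS = ["description", "type", "syntax", "semantic"]

def fake_generate(contribution):
    """Stand-in for the LLM call: emits one CSV line per ';'-separated chunk."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for part in contribution.split(";"):
        if part.strip():
            writer.writerow([part.strip(), "statement", "positive", "neutral"])
    return buffer.getvalue()

def extract_all(contributions, generate):
    """Map each contribution index to its text and its parsed ideas."""
    results = {}
    for idx, text in enumerate(contributions):
        raw = generate(text)
        ideas = [
            dict(zip(COLUMNS, fields))
            for fields in csv.reader(io.StringIO(raw))
            if len(fields) == len(COLUMNS)  # ignore malformed lines
        ]
        results[idx] = {"contribution": text, "ideas": ideas}
    return results

results = extract_all(["Rétablir l'ISF; taxer les yachts", "Baisser la TVA"], fake_generate)
print({idx: len(entry["ideas"]) for idx, entry in results.items()})
```

In the notebook, `generate` would wrap the `llm.chat` call shown earlier and return `output[0].outputs[0].text`.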