<a href="https://colab.research.google.com/github/dimitrisdais/generative-ai-lab/blob/summarization_task/notebooks/summarization_task.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📝 Text Summarization with AI
👋 Hi, I'm **Dimitris Dais**, an engineer passionate about AI creativity tools.

This notebook demonstrates how to convert long documents into concise, informative summaries:  

Step 1: Load and preprocess the text data  
Step 2: BART as a Summarization Baseline  
Step 3: Finetuned Models for Longer Contexts and Structured Abstraction  
Step 4: Scaling Further: General-Purpose Open-Source LLMs (Qwen and Mistral)  

For more explanation, refer to the [corresponding blog](https://dimitrisdais.github.io/dimitris-dais.github.io/nlp/llm/summarization_task/).

Enjoyed it? Reuse or expand it — and feel free to connect.  

🔗 **Website**: [dimitrisdais.github.io](https://dimitrisdais.github.io/dimitris-dais.github.io/)  
📬 **Contact**: dimitris.dais.phd@gmail.com  
🐙 **GitHub**: [@dimitrisdais](https://github.com/dimitrisdais)  
🔗 **LinkedIn**: [linkedin.com/in/dimitris-dais](https://www.linkedin.com/in/dimitris-dais/)  
▶️ **YouTube**: [youtube.com/@dimitrisdais](https://www.youtube.com/channel/UCuSdAarhISVQzV2GhxaErsg)

![AI mastering the art of summarization.](https://raw.githubusercontent.com/dimitrisdais/generative-ai-lab/main/assets/images/robot_learning_to_summarize.png)

### 🔧 Install Required Packages

In [None]:
!pip install trafilatura
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git



In [None]:
import torch
import platform
import transformers
import bitsandbytes

print("🧠 PyTorch version:", torch.__version__)
print("⚙️  CUDA available:", torch.cuda.is_available())
print("🖥️  CUDA device:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "No GPU")
print("🧱 Transformers version:", transformers.__version__)
print("💻 Python version:", platform.python_version())
print("📦 BitsAndBytes version:", bitsandbytes.__version__)

🧠 PyTorch version: 2.6.0+cu124
⚙️  CUDA available: True
🖥️  CUDA device: Tesla T4
🧱 Transformers version: 4.53.0.dev0
💻 Python version: 3.11.12
📦 BitsAndBytes version: 0.46.0


### ⚙️ Check for GPU Availability

This step checks whether a GPU is available in the current Colab environment and assigns the appropriate device (`"cuda"` for GPU or `"cpu"` otherwise).  
Using a GPU can significantly speed up inference for the summarization models used in this notebook.


In [None]:
device = "cuda" if torch.cuda.is_available() else "cpu"
print("device", device)

device cuda


## Step 1: 📁 Load and preprocess the text data

In [None]:
import trafilatura

url = "https://dimitrisdais.github.io/dimitris-dais.github.io/"
downloaded = trafilatura.fetch_url(url)
text = trafilatura.extract(downloaded)

print("Preview of extracted text (first 1000 chars):\n")
print(text[:1000])

Preview of extracted text (first 1000 chars):

About Me
📌 Open to new opportunities
Hi, I am Dimitris Dais — a Senior Machine Learning Engineer with a PhD in Artificial Intelligence and Civil Engineering
🔹 Experienced ML engineer delivering end-to-end AI solutions for complex, real-world challenges
🔹 Skilled in defining problem scope, data strategy, and selecting optimal AI stacks
🔹 Proven track record in AI innovation in academia (PhD, 10+ papers, 450+ citations) and industry
🔹 Strong hands-on experience with multimodal & generative AI: VLMs/LLMs 🤗, transformers, prompt engineering, zero/few-shot learning
🔹 Built and deployed cloud-based AI pipelines for real-time visual understanding and automated incident verification
💼 Professional Experience
Freelance — Remote
Senior Machine Learning Engineer (Sep 2024 – Present)
Athens, Greece / London, UK
Working on real-time detection models and decision-support systems for defense and civil protection applications
- Enhanced detection performa

**📃 Text Cleaning and Preprocessing**  
Before summarizing, it's important to clean and normalize the extracted text. This helps the language model better understand the structure and content, especially when dealing with long and semi-structured inputs like CVs. We remove unnecessary characters, collapse whitespace, and ensure paragraphs are clearly separated.

In [None]:
import re

def preprocess_text(text):
    """
    Clean and normalize text extracted from HTML before passing it to a model.

    - Strips non-ASCII characters (emojis, but also typographic dashes and quotes)
    - Collapses runs of spaces and tabs into a single space
    - Collapses consecutive line breaks, then separates every line with a blank line

    Returns: Cleaned text string
    """
    # Remove emojis and non-ASCII characters
    text = re.sub(r'[^\x00-\x7F]+', '', text)

    # Normalize whitespace
    text = re.sub(r'[ \t]+', ' ', text)          # Multiple spaces/tabs to single space
    text = re.sub(r'\n{2,}', '\n', text)         # Collapse multiple line breaks
    text = re.sub(r'\n', '\n\n', text)           # Add double line breaks for structure

    return text.strip()

# Apply cleaning
text = preprocess_text(text)

# Preview
print("🔍 Preview of cleaned text (first 1000 chars):\n")
print(text[:1000])

🔍 Preview of cleaned text (first 1000 chars):

About Me

 Open to new opportunities

Hi, I am Dimitris Dais a Senior Machine Learning Engineer with a PhD in Artificial Intelligence and Civil Engineering

 Experienced ML engineer delivering end-to-end AI solutions for complex, real-world challenges

 Skilled in defining problem scope, data strategy, and selecting optimal AI stacks

 Proven track record in AI innovation in academia (PhD, 10+ papers, 450+ citations) and industry

 Strong hands-on experience with multimodal & generative AI: VLMs/LLMs , transformers, prompt engineering, zero/few-shot learning

 Built and deployed cloud-based AI pipelines for real-time visual understanding and automated incident verification

 Professional Experience

Freelance Remote

Senior Machine Learning Engineer (Sep 2024 Present)

Athens, Greece / London, UK

Working on real-time detection models and decision-support systems for defense and civil protection applications

- Enhanced detection performan
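
The preview above shows a side effect of the ASCII filter: typographic dashes disappear along with the emojis ("Dimitris Dais a Senior", "Sep 2024 Present"). A minimal sketch of a variant that maps a few common characters to ASCII stand-ins first is shown below. The function name `preprocess_text_keep_dashes` and the `ASCII_EQUIVALENTS` table are illustrative assumptions, not part of the original notebook, and the summaries later in this notebook were produced with the original `preprocess_text`.

```python
import re

# Assumed mapping of common typographic characters to ASCII stand-ins.
ASCII_EQUIVALENTS = {
    "\u2014": " - ",  # em dash
    "\u2013": "-",    # en dash
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
}

def preprocess_text_keep_dashes(text):
    """Variant of preprocess_text that keeps dashes and quotes as ASCII."""
    # Replace typographic punctuation before the ASCII filter removes it
    for char, replacement in ASCII_EQUIVALENTS.items():
        text = text.replace(char, replacement)

    # Drop any remaining non-ASCII characters (e.g., emojis)
    text = re.sub(r"[^\x00-\x7F]+", "", text)

    # Collapse runs of spaces/tabs into a single space
    text = re.sub(r"[ \t]+", " ", text)

    # Trim each line, drop empty lines, and separate lines with a blank line
    lines = [line.strip() for line in text.split("\n")]
    return "\n\n".join(line for line in lines if line)
```

Stripping each line also removes the leading space that is otherwise left behind where an emoji used to be.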

## Step 2: 📉 BART as a Summarization Baseline  
BART is a sequence-to-sequence transformer pre-trained as a denoising autoencoder. The `facebook/bart-large-cnn` checkpoint used here is additionally fine-tuned for summarization on the CNN/DailyMail dataset. When you use it with the `"summarization"` pipeline in Hugging Face Transformers:

✅ It automatically formats the input and output as a summarization task.

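✅ It tokenizes the input and feeds it to the model under the hood. By default, over-long inputs are not truncated unless you pass `truncation=True`, which is one reason long texts are chunked manually below.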

⚠️ It does not add an instruction prompt to the input. Prompt-style inputs are something you write yourself, and they mainly matter for causal LMs.

In [None]:
# Define the Model & Pipeline
from transformers import pipeline, AutoTokenizer

bart_model_name = "facebook/bart-large-cnn"

tokenizer_bart = AutoTokenizer.from_pretrained(bart_model_name)

summarizer_bart = pipeline(
    "summarization",
    model=bart_model_name,
    device=0
)

Device set to use cuda:0


**Token-aware Chunking Function**  
Since BART has a maximum input length of 1024 tokens, we chunk the text by token count rather than character count. The default of 900 tokens per chunk leaves headroom below that limit.

In [None]:
def chunk_text(text, tokenizer, max_tokens=900):
    """Split text into chunks of at most max_tokens tokens (below BART's 1024-token limit)."""
    print("🔧 Splitting text into chunks based on token count...")
    # encode() also adds the tokenizer's special tokens, so the count includes them
    tokens = tokenizer.encode(text, truncation=False)
    total_tokens = len(tokens)
    print(f"🔢 Total tokens in text: {total_tokens}")

    chunks = []
    for i in range(0, total_tokens, max_tokens):
        chunk_ids = tokens[i:i + max_tokens]
        # Named `chunk` so it does not shadow the enclosing function
        chunk = tokenizer.decode(chunk_ids, skip_special_tokens=True)
        chunks.append(chunk)
        print(f"✅ Created chunk {len(chunks)} with {len(chunk_ids)} tokens")

    print(f"📦 Total chunks: {len(chunks)}")
    return chunks
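
Fixed-size chunks can cut a sentence in half at the boundary. One common mitigation is to let consecutive chunks overlap by a few tokens. The sketch below works on a plain list of token ids so it can be tried without a tokenizer; the function name `chunk_token_ids` and the default overlap of 50 tokens are assumptions for illustration, not values used in this notebook.

```python
def chunk_token_ids(token_ids, max_tokens=900, overlap=50):
    """Split a list of token ids into windows of at most max_tokens,
    where each window repeats the last `overlap` ids of the previous one."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")

    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        # Stop once a window has reached the end of the sequence
        if start + max_tokens >= len(token_ids):
            break
    return chunks


# Example with dummy ids matching the 1126-token text in this notebook
ids = list(range(1126))
print([len(c) for c in chunk_token_ids(ids, max_tokens=900, overlap=50)])
```

With a real tokenizer, the ids could come from `tokenizer.encode(text, add_special_tokens=False)` and each window could be turned back into text with `tokenizer.decode(...)`. With `overlap=0` the function produces the same split sizes as `chunk_text`.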

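**Summarize Each Chunk**  
Each chunk is summarized independently, with the output constrained to 80-130 tokens (`min_length=80`, `max_length=130`).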

In [None]:
def summarize_chunks(chunks):
    summaries = []

    for idx, chunk in enumerate(chunks):
        print(f"\n🔍 Summarizing chunk {idx + 1}/{len(chunks)}...")

        summary = summarizer_bart(
            chunk,
            max_length=130,
            min_length=80,
            do_sample=False
        )[0]["summary_text"]
        summaries.append(summary)

    return summaries
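
`summarize_chunks` returns one summary per chunk and leaves them unmerged. If a single overall summary is wanted, one option is a second pass over the concatenated partial summaries (often called map-reduce summarization). The sketch below is an assumption about how that could look, not something this notebook runs: `summarize_map_reduce` is a hypothetical helper, `summarize` is any callable that maps a string to a summary string, and `first_sentence` is only a stand-in so the example runs without a model.

```python
def summarize_map_reduce(chunks, summarize):
    """Summarize each chunk, then summarize the joined partial summaries."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # With zero or one chunk there is nothing to merge
    if len(partial_summaries) <= 1:
        return " ".join(partial_summaries)
    return summarize(" ".join(partial_summaries))


def first_sentence(text):
    """Stand-in summarizer for illustration: keeps only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


print(summarize_map_reduce(["A one. A two.", "B one. B two."], first_sentence))
```

With the BART pipeline from this notebook, `summarize` could be a small wrapper around `summarizer_bart(...)[0]["summary_text"]`. The length limits may need adjusting for the second pass, since the merged input is much shorter than a full chunk.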


**Put It All Together**

In [None]:
# 1. Chunk the long text
chunks = chunk_text(text, tokenizer=tokenizer_bart)

# 2. Summarize each chunk
summaries = summarize_chunks(chunks)

🔧 Splitting text into chunks based on token count...
🔢 Total tokens in text: 1126
✅ Created chunk 1 with 900 tokens
✅ Created chunk 2 with 226 tokens
📦 Total chunks: 2

🔍 Summarizing chunk 1/2...

🔍 Summarizing chunk 2/2...


⚠️ Note: Some sentences are off-topic due to mixed input. For better summaries, split the text into clear sections (e.g., profile, skills, experience) before chunking.

In [None]:
# 3. Print results: One sentence per line
print("📄 Full Summary:\n")
for summary in summaries:
    sentences = summary.split('. ')
    for s in sentences:
        s = s.strip()
        if s:
            print(f"- {s.rstrip('.')}.")

📄 Full Summary:

- Dimitris Dais is a Senior Machine Learning Engineer with a PhD in Artificial Intelligence and Civil Engineering.
- He has built and deployed cloud-based AI pipelines for real-time visual understanding and automated incident verification using Vision-Language Models (VLMs) Dimitris has led the R&D of cutting-edge solutions for automatic industrial inspections and seismic damage assessment.
- He also led a project applying AI to detect and monitor cracks on buildings under earthquake loads.
- Programming & ML Frameworks: Python, PyTorch, TensorFlow, Keras, scikit-learn, ultralytics, Hugging Face, OpenAI.
- Vision-Language Models (VLMs), transformers, LLM APIs, prompt engineering, zero/few-shot learning.
- Retrieval-Augmented Generation (RAG) & Q&A: FAISS, SentenceTransformers, LlamaIndex, LangChain.
- 3D Reconstruction: OpenCV, COLMAP, Open3D, Metashape.


## Step 3: 🔍 Finetuned Models for Longer Contexts and Structured Abstraction  
While BART provides a strong starting point for summarization, its limited input size and occasionally shallow abstraction make it less suitable for complex or lengthy documents. In contrast, recent models like **PEGASUS-X** extend the transformer architecture to accommodate significantly longer contexts—allowing them to better preserve structure and meaning across larger spans of text.

Two such models evaluated here are:

- [`pszemraj/pegasus-x-large-book_synthsumm-bf16`](https://huggingface.co/pszemraj/pegasus-x-large-book_synthsumm-bf16)
- [`BEE-spoke-data/pegasus-x-base-synthsumm_open-16k`](https://huggingface.co/BEE-spoke-data/pegasus-x-base-synthsumm_open-16k)  

### `pszemraj/pegasus-x-large-book_synthsumm-bf16`

In [None]:
model_name = "pszemraj/pegasus-x-large-book_synthsumm-bf16"

summarizer = pipeline(
    "summarization",
    model=model_name,
    device=0
)

Device set to use cuda:0


In [None]:
summary = summarizer(
    "Summarize the following professional profile in 2–3 sentences:\n\n" + text,
    max_length=300,
    min_length=80,
    do_sample=False
)[0]["summary_text"]

# 📄 Print each sentence on a new line
print("📄 Summary:\n")
for sentence in summary.split('. '):
    s = sentence.strip()
    if s:
        print(f"- {s.rstrip('.')}.")

📄 Summary:

- Dimitris is a senior machine learning engineer with a PhD and extensive experience in AI and civil engineering.
- He has developed and deployed AI solutions for defense, civil protection, industrial inspections, earthquake engineering, and 3D reconstruction.
- He is actively seeking new opportunities and has experience in multimodal and generative AI, VLMs, and LLMs.
- His professional profile includes experience in remote and freelance remote machine learning engineers, leading AI projects, and skills in programming and ML frameworks.


### `BEE-spoke-data/pegasus-x-base-synthsumm_open-16k`

In [None]:
from transformers import pipeline

model_name = "BEE-spoke-data/pegasus-x-base-synthsumm_open-16k"
summarizer = pipeline(
    "summarization",
    model=model_name,
    device=0
)

Device set to use cuda:0


In [None]:
summary = summarizer(
    "Summarize the following professional profile in 2–3 sentences:\n\n" + text,
    max_length=300,
    min_length=80,
    do_sample=False
)[0]["summary_text"]

# 📄 Print each sentence on a new line
print("📄 Summary:\n")
for sentence in summary.split('. '):
    s = sentence.strip()
    if s:
        print(f"- {s.rstrip('.')}.")

Both `max_new_tokens` (=256) and `max_length`(=300) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)


📄 Summary:

- Dimitris Dais, a Senior Machine Learning Engineer with a PhD in Artificial Intelligence and Civil Engineering, has worked on real-time detection models and decision-support systems for defense and civil protection applications in Athens, Greece, London, UK, Zurich, Switzerland, and Rotterdam, The Netherlands.
- He has a strong background in multimodal & generative AI, including VLMs/LLMs, transformers, prompt engineering, zero/few-shot learning, and cloud-based AI pipelines for real-time visual understanding and automated incident verification.


## Step 4: 🧠 Scaling Further: General-Purpose Open-Source LLMs (Qwen and Mistral)  
Beyond summarization-specific models, general-purpose large language models (LLMs) like **Qwen** and **Mistral** offer powerful summarization capabilities as a byproduct of their broader instruction-tuned design. These models are open-source, self-hostable, and often optimized for versatility across a wide range of tasks—including summarization, classification, reasoning, and dialogue.

Models Evaluated:

- [`Qwen/Qwen3-8B`](https://huggingface.co/Qwen/Qwen3-8B): an 8B-parameter instruction-tuned model from Alibaba that supports multi-turn tasks and performs well on summarization.
- [`mistralai/Mistral-7B-Instruct-v0.3`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3): a compact, high-performance 7B model optimized for following instructions and multi-task use.
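
Both models are loaded in 4-bit precision in the cells that follow. A rough back-of-the-envelope estimate of weight memory shows why that matters on a 16 GB Tesla T4. The parameter counts below are the nominal 8B and 7B figures, which is an assumption; the estimate ignores activations, the KV cache, and quantization overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter counts (assumed, rounded)
models = [("Qwen3-8B", 8e9), ("Mistral-7B-Instruct-v0.3", 7e9)]

for name, params in models:
    fp16 = weight_memory_gb(params, 16)
    four_bit = weight_memory_gb(params, 4)
    print(f"{name}: ~{fp16:.1f} GB in fp16, ~{four_bit:.1f} GB in 4-bit")
```

By this estimate the fp16 weights alone would take most or all of the T4's memory, while 4-bit weights leave room for inference.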

In [None]:
from huggingface_hub import login

login()


In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline, BitsAndBytesConfig

### `Qwen/Qwen3-8B`

In [None]:
model_id = "Qwen/Qwen3-8B"

# 4-bit NF4 quantization config (bitsandbytes) to reduce GPU memory use
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Load model with the quantization config
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
    trust_remote_code=True
)


In [None]:
def build_prompt(text):
    return (
        "Please read the following professional profile and produce a concise 3-sentence summary. "
        "The output should be formal, non-repetitive, and avoid chatbot commentary. "
        "Do not include conversational elements or offer suggestions. Only write the summary:\n\n"
        f"{text}\n\nSummary:"
    )

In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300
)

prompt = build_prompt(text)
response = pipe(prompt)[0]["generated_text"]

# Extract the summary part only
summary = response.split("Summary:")[-1].strip()
print("📄 Qwen3-8B Summary:\n")
for s in summary.split('. '):
    s = s.strip()
    if s:
        print(f"- {s.rstrip('.')}.")


Device set to use cuda:0


📄 Qwen3-8B Summary:

- Dimitris Dais is a Senior Machine Learning Engineer with expertise in developing end-to-end AI solutions for complex, real-world applications, particularly in defense, civil protection, and industrial inspection domains.
- He has a strong academic background in Artificial Intelligence and Civil Engineering, with a PhD and a proven track record in AI innovation, including over 10 papers and 450+ citations.
- His skills span multimodal and generative AI, cloud deployment, and cross-functional project leadership, supported by hands-on experience in both academic research and industry settings.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.03.13.
- 07:45:48.000 UTC+0.
- 2024.
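
The Qwen output above contains three usable sentences followed by repeated timestamp fragments. One plausible explanation (an assumption, not verified here) is that the raw prompt gives the model no clear stopping point, so it keeps generating until `max_new_tokens` is reached. A simple post-processing guard is to strip the echoed prompt and keep only the first few sentences. The helper `extract_summary` below is a hypothetical sketch of that guard and uses a naive regex sentence splitter.

```python
import re

def extract_summary(generated_text, prompt, max_sentences=3):
    """Return up to max_sentences sentences from the model's completion."""
    # The text-generation pipeline echoes the prompt by default; remove it
    if generated_text.startswith(prompt):
        completion = generated_text[len(prompt):]
    else:
        completion = generated_text

    # Naive split: whitespace that follows '.', '!' or '?'
    sentences = re.split(r"(?<=[.!?])\s+", completion.strip())
    return [s for s in sentences if s][:max_sentences]


prompt = "Profile text.\n\nSummary:"
generated = (
    prompt
    + " He is an engineer. He has a PhD. He builds AI systems."
    + " 2024.03.13. 07:45:48.000 UTC+0. 2024.03.13."
)
for sentence in extract_summary(generated, prompt):
    print(f"- {sentence}")
```

The regex will also split after abbreviations such as "e.g.", so it is only a rough guard. The text-generation pipeline additionally accepts `return_full_text=False`, which returns only the completion and avoids relying on the `"Summary:"` marker.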


### `mistralai/Mistral-7B-Instruct-v0.3`

In [None]:
model_id = "mistralai/Mistral-7B-Instruct-v0.3"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config
)


In [None]:
def build_prompt(text):
    return (
        "You are a professional summarization assistant. Your task is to read the following CV and generate a formal, concise summary in exactly three sentences. "
        "The summary should synthesize key qualifications, areas of expertise, and notable achievements without repeating phrases or including any conversational elements. "
        "Avoid lists, bullet points, or self-referential phrases. Focus on clarity, flow, and relevance for a technical audience:\n\n"
        f"{text}\n\nSummary:"
    )

In [None]:
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300
)

prompt = build_prompt(text)
response = pipe(prompt)[0]["generated_text"]

# Extract the summary part only
summary = response.split("Summary:")[-1].strip()
print(f"📄 {model_id} Summary:\n")
for s in summary.split('. '):
    s = s.strip()
    if s:
        print(f"- {s.rstrip('.')}.")

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


📄 mistralai/Mistral-7B-Instruct-v0.3 Summary:

- Dimitris Dais is a Senior Machine Learning Engineer with a PhD in Artificial Intelligence and Civil Engineering.
- He has extensive experience in ML, delivering end-to-end AI solutions for complex, real-world challenges.
- He is skilled in defining problem scope, data strategy, and selecting optimal AI stacks.
- Dimitris has a proven track record in AI innovation, with a strong hands-on experience in multimodal & generative AI, including Vision-Language Models, transformers, prompt engineering, zero/few-shot learning, and Retrieval-Augmented Generation (RAG) & Q&A.
- He has a diverse background in academia and industry, with a focus on automating industrial inspections, building inspections, and earthquake engineering.
