## MonReader - part 3

----

### Multimodal OCR (No Pre-processing): VLM vs. Tesseract

**Objective.**  
Evaluate a **Vision–Language Model (VLM)** that performs OCR **directly from raw page images** (no deskew, no binarization, no line/word segmentation). We’ll later compare its verbatim transcription quality against **Tesseract** on the same pages.

We use two sources:
- *The Chamber* — John Grisham *(English)*
- *A onda que se ergueu no mar* — Ruy Castro *(Portuguese)*

**Why this experiment.**  
VLMs can read document text straight from RGB photos by leveraging learned visual invariances (rotation, lighting, curvature). The aim is to measure how far a “no-preprocessing” VLM can go versus a classical pipeline, and to identify the situations where simple conditioning (e.g., deskew) still helps.

**Minimal Pipeline Overview (this part).**  

1. **F – VLM (raw)**: Feed the original page photo to the model with a *verbatim transcription* prompt; capture JSON output `{language, lines}` and a `.txt` view.  
2. **G – Compare**: Compute CER/WER against gold text (and Tesseract), plus latency and error tags.
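
Step G relies on CER and WER, which are both edit distances normalized by the length of the gold text. The sketch below is one minimal way to compute them with the standard library only; the function names `levenshtein`, `cer`, and `wer` are illustrative choices, not part of this notebook, and no text normalization (case, whitespace, Unicode form) is applied, which a real evaluation would probably need.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edits divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return levenshtein(reference, hypothesis) / len(reference)


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word edits divided by reference word count."""
    ref_words, hyp_words = reference.split(), hypothesis.split()
    if not ref_words:
        return 0.0 if not hyp_words else 1.0
    return levenshtein(ref_words, hyp_words) / len(ref_words)
```

Both rates can exceed 1.0 when the hypothesis is much longer than the reference, which is a useful signal for runaway generation.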

> In this first section we only set up the dataset, verify image quality, and prepare folders for the VLM run, with **no pre-processing**.


----


#### Imports and Environment

In [1]:
from pathlib import Path
import shutil
import numpy as np
import pandas as pd
from PIL import Image
import cv2
import matplotlib.pyplot as plt


In [2]:
BASE = Path.cwd()
DATA_DIR = BASE / "data"
BOOK_DIR = DATA_DIR / "books"
WORK_DIR = BASE / "work"

ENG_BOOK_DIR = BOOK_DIR / "The_Chamber-John_Grisham"
POR_BOOK_DIR = BOOK_DIR / "A_onda_que_se_ergueu_no_mar-Ruy_Castro"

ENG_IMG_DIR = ENG_BOOK_DIR / "images"
POR_IMG_DIR = POR_BOOK_DIR / "images"

for p in [BOOK_DIR, WORK_DIR, ENG_BOOK_DIR, POR_BOOK_DIR, ENG_IMG_DIR, POR_IMG_DIR]:
    p.mkdir(parents=True, exist_ok=True)


----

### Step F — VLM OCR (GGUF local, no pre-processing)

**Goal.**  
Use a **quantized GGUF** build of *Llama 3.2-Vision Instruct* to transcribe book-page photos directly (no deskew, no binarization).  
We start with a **single-image smoke test**, then we’ll scale to the full dataset.

**Why GGUF?**  
GGUF files are pre-quantized, self-contained weights that run efficiently on local GPUs through the `llama.cpp` engine (used by LM Studio and Ollama).  

They trade some accuracy for large VRAM savings, which suits an 11 GB card such as the GTX 1080 Ti.
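
To make the VRAM trade-off concrete, the following back-of-the-envelope sketch estimates the size of the weights alone at different precisions. The 11-billion parameter count is an illustrative assumption, and the helper `weight_memory_gb` is hypothetical. Real GGUF quantizations mix bit widths, and the KV cache and activations need extra memory on top, so treat the numbers as rough lower bounds.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed model size for illustration only: 11 billion parameters.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_memory_gb(11e9, bits):.1f} GB")
# 16-bit: 22.0 GB
#  8-bit: 11.0 GB
#  4-bit: 5.5 GB
```

Under these assumptions only the 4-bit variant leaves headroom on an 11 GB card.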



In [3]:
import requests, base64, json, time, re
from pathlib import Path

### Setting up Ollama for Local Multimodal Inference

Before running the OCR prompting steps, we first set up **Ollama**, a lightweight local engine for running quantized large language and vision models (GGUF format) efficiently on consumer GPUs.

**Installation**
1. Go to [https://ollama.com/download](https://ollama.com/download)
2. Download and install the correct version depending on your OS.
3. Open a Terminal and verify the installation:
   ```bash
   ollama --version
   ollama list
   ```
4. Pull the multimodal model:
   ```bash
   ollama pull llama3.2-vision
   ```


#### Prompt design

We’ll use a **verbatim OCR prompt**. The model must output text *exactly* as it appears, with preserved line breaks and punctuation.  
We ask for JSON to keep parsing simple.
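
Asking for JSON only helps if the reply is checked against the expected `{language, lines}` shape. The sketch below is one possible validator; the name `validate_ocr_payload` and the exact error messages are illustrative assumptions, and the allowed language codes mirror the ones used in the prompt.

```python
import json

ALLOWED_LANGUAGES = {"eng", "por", "guess"}


def validate_ocr_payload(raw: str) -> list:
    """Return a list of problems in the model reply; an empty list means it conforms."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    if obj.get("language") not in ALLOWED_LANGUAGES:
        problems.append("language must be one of eng, por, guess")
    lines = obj.get("lines")
    if not isinstance(lines, list) or not all(isinstance(x, str) for x in lines):
        problems.append("lines must be an array of strings")
    return problems
```

Returning a list of problems rather than raising makes it easy to log every failure reason per page when scaling to the full dataset.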


In [4]:
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "llama3.2-vision"
IMG = Path(r"E:\Devs\pyEnv-1\Apziva\MonReader\data\books\A_onda_que_se_ergueu_no_mar-Ruy_Castro\images\pag12.JPEG")
assert IMG.exists(), f"Image not found {IMG}"


In [5]:
def b64_image(p: Path) -> str:
    # read_bytes() opens and closes the file, so no handle is left open
    return base64.b64encode(p.read_bytes()).decode("ascii")

In [6]:
# Version 0.2
# SYSTEM_PROMPT = (
#     "You are an OCR transcriber. Output exactly the text you see. "
#     "Preserve line breaks and punctuation. "
#     "Return ONLY valid JSON with keys {\"language\":\"eng|por|guess\",\"lines\":[\"...\"]}. "
#     "Transcribe this image verbatim."
# )

# Version 0.1
# SYSTEM_PROMPT = (
#     "You are an OCR transcriber. Return ONLY valid JSON:\n"
#     '{"language":"eng|por|guess","lines":["..."]}\n'
#     "Transcribe the image verbatim. Preserve line breaks and punctuation."
# )

# Current version (active)
SYSTEM_PROMPT = """
You are an OCR transcriber.

Return ONLY one valid JSON object with keys:
- "language": one of ["eng","por","guess"]
- "lines": an array of strings, one per line in reading order

Rules:
- Do NOT repeat the JSON object.
- Do NOT include any text outside the single JSON object.
- Preserve line breaks and punctuation exactly as seen.
- If unsure about a character, copy it as best you can (do not explain).

Transcribe the image verbatim.
""".strip()




In [7]:
num_predict_values = [1536, 2048, 3072]


In [None]:
resps = []

for num_predict in num_predict_values:
    t0 = time.time()
    
    payload = {
        "model": MODEL,
        "prompt": SYSTEM_PROMPT,
        "images": [b64_image(IMG)],
        "format": "json",
        "stream": True,
        "options": {
            "temperature": 0,
            "top_p": 1,
            "repeat_penalty": 1.2,
            "num_predict": num_predict,
            "stop": ["\n}\n", "\n}\r\n", "\n}"]
        }
    }
    
    chunks = []
    status_code = None
    
    try:
        
        with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=(10, 3600)) as r:
            status_code = r.status_code
            r.raise_for_status()
            
            for line in r.iter_lines(decode_unicode=True):
                if not line:
                    continue
                
                try:
                    obj = json.loads(line)
                except json.JSONDecodeError:
                    # If Ollama ever emits a non-JSON line, skip or log it
                    continue
                
                if "error" in obj and obj["error"]:
                    raise RuntimeError(f"Ollama error: {obj['error']}")
                
                chunks.append(obj.get("response", ""))
                
                if obj.get("done"):
                    break
                
        text = "".join(chunks)
        lat = time.time() - t0
        resps.append({"num_predict": num_predict, "latency_s": lat, "text": text})
        print("HTTP", status_code, f"{num_predict=} {lat:.1f}s")
        
    except Exception as e:
        lat = time.time() - t0
        print(f"FAILED {num_predict=} after {lat:.1f}s: {e}")
        resps.append({"num_predict": num_predict, "latency_s": lat, "text": None, "error": str(e)})




HTTP 200 num_predict=1536 659.7s
HTTP 200 num_predict=2048 825.2s
HTTP 200 num_predict=3072 517.4s
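
The wall-clock latencies above are not monotonic in `num_predict` (the 3072 run finished faster than the 1536 run), so latency alone does not say how many tokens were generated. Ollama's final streamed object (the one with `done` set) normally carries timing fields such as `eval_count` and `eval_duration`, reported in nanoseconds. The sketch below converts them to seconds and tokens per second. The helper `summarize_timing` is hypothetical, the sample values are made up for illustration, and the field names should be checked against the installed Ollama version.

```python
def summarize_timing(final_chunk: dict) -> dict:
    """Convert Ollama's nanosecond timing fields into seconds and tokens/s."""
    ns = 1e9
    eval_count = final_chunk.get("eval_count", 0)
    eval_s = final_chunk.get("eval_duration", 0) / ns
    return {
        "load_s": final_chunk.get("load_duration", 0) / ns,
        "prompt_eval_s": final_chunk.get("prompt_eval_duration", 0) / ns,
        "eval_s": eval_s,
        "generated_tokens": eval_count,
        "tokens_per_s": eval_count / eval_s if eval_s else None,
    }


# Made-up example values, not measurements from this notebook.
example_final_chunk = {
    "done": True,
    "load_duration": 5_000_000_000,
    "prompt_eval_duration": 100_000_000_000,
    "eval_count": 500,
    "eval_duration": 250_000_000_000,
}
print(summarize_timing(example_final_chunk))
```

Keeping the last streamed object in the request loop would be enough to feed this helper and to separate image and prompt processing time from generation time.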


#### *Note*: effect of 2× Image Downscaling (other experiment, not treated in this notebook)

A brief experiment was conducted using page images downscaled by a factor of 2 to evaluate whether reduced resolution would improve efficiency or stability in multimodal OCR.

In practice, aggressive downscaling proved **counterproductive** for this task. While smaller images reduce pixel count, they also degrade fine visual cues critical for text recognition (thin strokes, diacritics, punctuation, hyphenation). Once these cues are weakened, the vision–language model becomes less stable and more prone to decoding artifacts and repetition.

These were the recorded latencies for different `num_predict` values (the cap on the number of generated tokens):
- HTTP 200 num_predict=4096 1036.6s
- HTTP 200 num_predict=2048 1261.1s
- HTTP 200 num_predict=1024 649.5s
- HTTP 200 num_predict=512 321.7s

This observation highlights an important property of VLM-based OCR: **image resolution acts as a form of implicit conditioning**. Excessive downscaling can remove information needed for reliable alignment between visual tokens and text generation, even if the model nominally supports invariance to scale.

Based on this experience, subsequent experiments use **full-resolution page images**, prioritizing transcription stability and fidelity over marginal efficiency gains.


In [9]:

def extract_first_valid_json(text: str):
    """
    Try to find and parse the first valid JSON object embedded in text.
    Returns (obj, n_candidates) where:
      - obj is a dict if found, else None
      - n_candidates is how many {...} blocks we saw (rough proxy for repetition)
    """
    # Roughly find JSON object candidates. Non-greedy to avoid swallowing everything.
    candidates = re.findall(r"\{.*?\}", text, flags=re.DOTALL)
    for c in candidates:
        try:
            return json.loads(c), len(candidates)
        except json.JSONDecodeError:
            continue
    return None, len(candidates)

def coerce_lines(js):
    """Normalize the 'lines' field into a list[str]."""
    lines = js.get("lines", [])
    if isinstance(lines, str):
        return lines.splitlines()
    if isinstance(lines, list):
        return [str(x) for x in lines]
    return [str(lines)]
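
The non-greedy regex in `extract_first_valid_json` stops at the first `}` it meets, so it can miss an object whose strings contain a closing brace. As a possible alternative, the sketch below uses `json.JSONDecoder.raw_decode`, which parses from a given offset and ignores trailing text. The function name `first_json_object` is an illustrative assumption, not part of the notebook.

```python
import json


def first_json_object(text: str):
    """Return the first decodable JSON object (dict) found in text, or None."""
    decoder = json.JSONDecoder()
    pos = text.find("{")
    while pos != -1:
        try:
            obj, _end = decoder.raw_decode(text, pos)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not decodable from this brace: try the next opening brace.
        pos = text.find("{", pos + 1)
    return None


sample = 'noise {"language": "por", "lines": ["a } b", "c"]} trailing'
print(first_json_object(sample))
```

Unlike the regex version, this does not report how many candidate objects were seen, so it complements the repetition proxy rather than replacing it.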


In [10]:
# print the responses for each 'num_predict' value

for r in resps:
    num_predict = r.get("num_predict")
    text = r.get("text")

    if not text:
        print(f"num_predict={num_predict} | EMPTY or ERROR")
        print(80 * "=")
        continue

    # Attempt 1: parse a single JSON object by trimming junk around it
    start = text.find("{")
    end = text.rfind("}")
    candidate = (
        text[start:end+1]
        if (start != -1 and end != -1 and end > start)
        else text
    )

    parsed_ok = False
    lang = "guess"
    lines = []
    json_objects_found = 0

    try:
        js = json.loads(candidate)
        lang = js.get("language", "guess")
        lines = coerce_lines(js)
        parsed_ok = True
        json_objects_found = 1  # we parsed one (assume single-object case)
    except Exception:
        # Attempt 2: handle repeated JSON objects / messy streams
        js, json_objects_found = extract_first_valid_json(text)
        if js is not None:
            lang = js.get("language", "guess")
            lines = coerce_lines(js)
            parsed_ok = True
        else:
            # Final fallback: plain text split
            lines = text.splitlines()

    print(
        f"num_predict={num_predict} | "
        f"parsed_json={parsed_ok} | "
        f"json_objs≈{json_objects_found} | "
        f"language={lang} | "
        f"lines={len(lines)} | "
        f"chars={len(text)}"
    )
    print("\n".join(lines[:60]))
    print(80 * "=")


num_predict=1536 | parsed_json=False | json_objs≈0 | language=guess | lines=32 | chars=1886
{
    "language": "por",
    "lines": [
        "A trilha sonora de um pais ideal",
        "O",
        "lha que coisa mais linda: as garotas de Ipanema-1961",
        "tomavam cuba-libre, dirigiam Kharman-Glias e voavam",
        "pela Panair. Usavam frasqueira, vestido-tubinho, cilio",
        "postico, perua, laque. Diziam-se existencialistas, adoravam",
        "arte abstrata e nao perdiam um filme da Nouvelle Vague.",
        "Seus pontos eram o Beco das Garrafas, a Cinemateca, o Arpoador.",
        "Iam a praia com a camisa social do irmao e, sob esta, um biquini que",
        "de tao insolente, fazia o sangue dos rapazes ferver da maneira",
        "mais incoveniente.",
        "Tudo isso passou. A querida Panair nunca mais voou, a",
        "Nouvelle Vague e um filme em preto e branco e ninguem mais",
        "toma cuba-libre - quem pensaria hoje em misturar rum com",
        "Coca-Cola

### Commentary: Current baseline run (full-resolution, tuned decoding)

With the current prompt + decoding controls (streaming, `temperature=0`, increased `repeat_penalty`, and explicit stop tokens), the **first run (`num_predict=1536`) produced the most usable transcription** among the three caps tested. It yielded the most coherent line-by-line OCR with fewer obvious decoding artifacts.

In contrast, the later runs at higher `num_predict` values showed degraded stability (character corruption, repetition, and token “runaway” patterns). This reinforces an important practical point for VLM-based OCR: **larger generation budgets do not guarantee better transcription quality**, and can amplify failure modes when the model drifts into repetition.

Given these observations, `num_predict=1536` is treated as the **current baseline** for subsequent comparisons, and further tuning is deferred while we move on to cross-model evaluation.
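
The repetition and runaway patterns mentioned above can be screened for automatically. The sketch below is one simple heuristic: it measures the share of word n-grams that duplicate an earlier n-gram. The function name `repetition_ratio`, the default `n=4`, and any cut-off applied to the result are assumptions that would need tuning on real outputs.

```python
from collections import Counter


def repetition_ratio(text: str, n: int = 4) -> float:
    """Share of word n-grams that duplicate an earlier n-gram (0.0 = no repeats)."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Every occurrence beyond the first counts as a repeat.
    repeated = sum(c - 1 for c in counts.values())
    return repeated / len(ngrams)


print(repetition_ratio("a b c d e f g h"))          # 0.0
print(repetition_ratio("la la la la la la la la"))  # 0.8
```

A page of ordinary prose should score close to zero, so a high value is a cheap flag for inspecting a run by hand.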


---

### Step F (continuation) — Compare results from three popular multimodal models

This section summarizes the core characteristics of the three Vision–Language Models (VLMs) evaluated in this notebook:

- **LLaMA 3.2 Vision Instruct**
- **LLaVA**
- **Qwen 2.5-VL**

All models are executed locally via **Ollama** using quantized GGUF weights and evaluated under the same OCR-oriented prompting and decoding constraints. The goal is not to benchmark general vision reasoning, but to assess **verbatim transcription stability, long-form generation behavior, and robustness to document layout**.

---

### Vision–Language Model Comparison (Spec Sheet)

| Feature | **LLaMA 3.2 Vision** | **LLaVA** | **Qwen 2.5-VL** |
|------|---------------------|----------|----------------|
| **Origin** | Meta AI | UW–Madison + Microsoft | Alibaba Cloud |
| **Initial Release** | 2024 | 2023 | 2025 |
| **Primary Design Goal** | Instruction-following multimodal assistant | General-purpose VLM | Strong multimodal + multilingual understanding |
| **Language Backbone** | Llama 3.1 text model with an added vision adapter | LLaMA-family (7B) | Qwen 2.5 |
| **Approx. Params (LLM)** | ~8B (about 11B including the vision adapter) | ~7B | ~7B |
| **Vision Encoder** | Proprietary ViT (Meta) | CLIP ViT | Qwen ViT-based encoder |
| **Fusion Strategy** | Cross-attention layers that attend to image features | Visual tokens projected into LLM space | Unified multimodal token space |
| **Context Window (text, nominal)** | 128k tokens | ~4k tokens | 32k tokens or more |
| **Multilingual Support** | Moderate | Limited | **Strong (incl. Portuguese)** |
| **OCR Orientation** | Medium | Low–Medium | **High** |
| **Verbatim Fidelity** | Good (early tokens) | Moderate | **Strong** |
| **Long-Form Stability** | Degrades with length | Prone to repetition | **Most stable** |
| **JSON / Structured Output** | Sometimes brittle | Brittle at long lengths | **Most consistent** |
| **Typical Failure Modes** | Late repetition, truncation | Semantic drift, format repetition | Over-generation if unconstrained |
| **Disk Size (Ollama)** | ~7.8 GB | ~4.7 GB | ~6.0 GB |
| **Role in This Study** | Primary baseline | Legacy VLM baseline | OCR-oriented challenger |

---

### Practical Interpretation for OCR Experiments

- **LLaMA 3.2 Vision** performs well in early decoding but tends to degrade as generation length increases.
- **LLaVA** is optimized for general visual dialogue and reasoning rather than strict verbatim transcription.
- **Qwen 2.5-VL** offers the best balance of multilingual support, decoding stability, and OCR-aligned behavior, making it particularly suitable for document-level transcription tasks.

Subsequent sections evaluate each model individually using identical prompts, decoding parameters, and page images to enable direct comparison.


In [12]:
import pandas as pd

In [16]:
# Set the complete list of Models that we will use
MODELS = [
    {"name": "llama3.2-vision:latest", "label": "Llama-3.2-Vision"},
    {"name": "llava:latest",           "label": "LLaVA"},
    {"name": "qwen2.5vl:latest",          "label": "Qwen2.5-VL (7B)"},
]



In [None]:
# Notes:
# - We keep the same SYSTEM_PROMPT.
# - We keep num_predict at 1536, the best-performing cap in the quick benchmark above.


# Baseline decoding options
# NOTE: Ollama does not return the matched stop sequence, so a run that halts on "}"
# comes back without its closing brace.
BASE_OPTIONS = {
    "temperature": 0,
    "top_p": 1,
    "repeat_penalty": 1.25,
    "num_predict": 1536,
    "stop": ["}\n", "}\r\n", "}"],
}

# Define a basic model runner that can switch between models
def run_ollama_ocr(model_name: str, img_path: Path, prompt: str, options: dict):
    payload = {
        "model": model_name,
        "prompt": prompt,
        "images": [b64_image(img_path)],
        "stream": True,
        "format": "json",
        "options": options,
    }

    t0 = time.time()
    chunks = []
    status_code = None

    with requests.post(OLLAMA_URL, json=payload, stream=True, timeout=(10, 3600)) as r:
        status_code = r.status_code
        r.raise_for_status()

        for line in r.iter_lines(decode_unicode=True):
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue

            if obj.get("error"):
                raise RuntimeError(f"Ollama error: {obj['error']}")

            chunks.append(obj.get("response", ""))
            if obj.get("done"):
                break

    text = "".join(chunks)
    latency_s = time.time() - t0
    return {"model": model_name, "status": status_code, "latency_s": latency_s, "text": text}



In [14]:
## helpers
def extract_first_valid_json(text: str):
    candidates = re.findall(r"\{.*?\}", text, flags=re.DOTALL)
    for c in candidates:
        try:
            return json.loads(c), len(candidates)
        except json.JSONDecodeError:
            continue
    return None, len(candidates)

def coerce_lines(js):
    lines = js.get("lines", [])
    if isinstance(lines, str):
        return lines.splitlines()
    if isinstance(lines, list):
        return [str(x) for x in lines]
    return [str(lines)]

def parse_ocr_text(text: str):
    if not text:
        return {"parsed_json": False, "language": "guess", "lines": [], "json_objs": 0, "parse_error": "empty"}

    start = text.find("{")
    end = text.rfind("}")
    candidate = text[start:end+1] if (start != -1 and end != -1 and end > start) else text

    try:
        js = json.loads(candidate)
        return {
            "parsed_json": True,
            "language": js.get("language", "guess"),
            "lines": coerce_lines(js),
            "json_objs": 1,
            "parse_error": None
        }
    except Exception as e:
        js, n = extract_first_valid_json(text)
        if js is not None:
            return {
                "parsed_json": True,
                "language": js.get("language", "guess"),
                "lines": coerce_lines(js),
                "json_objs": n,
                "parse_error": None
            }
        return {
            "parsed_json": False,
            "language": "guess",
            "lines": text.splitlines(),
            "json_objs": n,
            "parse_error": repr(e)
        }

def looks_degenerate(text: str) -> bool:
    # quick heuristic flags: unicode junk / obvious repetition artifacts
    if not text:
        return True
    if "\\ud" in text or "\\ua1" in text:
        return True
    if "a1a1a1" in text:
        return True
    return False
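
The baseline run earlier in this notebook reports `parsed_json=False` even though its text is an almost complete object that ends at the closing `]`. That is consistent with generation halting on a stop sequence containing `}`, which is not included in the returned text. The sketch below is a minimal repair, assuming the only damage is one missing closing brace. The name `parse_with_brace_repair` is hypothetical, and it is not a general JSON fixer.

```python
import json


def parse_with_brace_repair(text: str):
    """Try json.loads as-is, then retry with one closing brace appended."""
    stripped = text.strip()
    for candidate in (stripped, stripped + "}"):
        try:
            obj = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    return None


truncated = '{\n  "language": "por",\n  "lines": [\n    "A trilha",\n    "sonora"\n  ]\n'
print(parse_with_brace_repair(truncated))
```

An alternative design would be to drop the brace-based stop sequences and rely on `"format": "json"` alone; whether that keeps repetition under control would need to be re-tested.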



In [44]:
# Run comparison between 3 VLMs

runs = []
for m in MODELS:
    try:
        out = run_ollama_ocr(m["name"], IMG, SYSTEM_PROMPT, BASE_OPTIONS)
        parsed = parse_ocr_text(out["text"])
        runs.append({
            "label": m["label"],
            "model": m["name"],
            "status": out["status"],
            "latency_s": out["latency_s"],
            "parsed_json": parsed["parsed_json"],
            "language": parsed["language"],
            "n_lines": len(parsed["lines"]),
            "n_chars": len(out["text"] or ""),
            "json_objs": parsed["json_objs"],
            "degenerate": looks_degenerate(out["text"]),
            "lines": parsed["lines"],
            "text": out["text"],
            "preview": "\n".join(parsed["lines"][:100]),
        })
        print(f"OK  - {m['label']}  ({out['latency_s']:.1f}s)")
    except Exception as e:
        runs.append({
            "label": m["label"],
            "model": m["name"],
            "status": None,
            "latency_s": None,
            "parsed_json": False,
            "language": "guess",
            "n_lines": 0,
            "n_chars": 0,
            "json_objs": 0,
            "degenerate": True,
            "lines": [],
            "text": None,
            "preview": "",
            "error": str(e),
        })
        print(f"ERR - {m['label']}: {e}")

df = pd.DataFrame(runs).sort_values(by=["degenerate", "parsed_json", "latency_s"], ascending=[True, False, True])
df[["label","model","status","latency_s","parsed_json","language","n_lines","n_chars","json_objs","degenerate"]]


OK  - Llama-3.2-Vision  (630.7s)
OK  - LLaVA  (89.6s)
OK  - Qwen2.5-VL (7B)  (910.7s)


| | label | model | status | latency_s | parsed_json | language | n_lines | n_chars | json_objs | degenerate |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | LLaVA | llava:latest | 200 | 89.551258 | False | guess | 1 | 246 | 0 | False |
| 0 | Llama-3.2-Vision | llama3.2-vision:latest | 200 | 630.70809 | False | guess | 32 | 1887 | 0 | False |
| 2 | Qwen2.5-VL (7B) | qwen2.5vl:latest | 200 | 910.669698 | False | guess | 34 | 1785 | 0 | False |


In [45]:
from IPython.display import Markdown, display


In [46]:
N_SHOW = 200  # show more lines here

for r in df.to_dict(orient="records"):
    label = r["label"]
    full_row = next(x for x in runs if x["label"] == label)

    lines = full_row.get("lines", [])
    content = "\n".join(lines[:N_SHOW]) if lines else "<empty>"

    latency = full_row.get("latency_s")
    latency_str = f"{latency:.1f}s" if isinstance(latency, (int, float)) else "—"

    md = f"""### OCR Output — {label} \n- **Model:** {full_row.get("model")}\n- **Latency:** {latency_str}\n- **Parsed JSON:** {full_row.get("parsed_json")}  |  **Degenerate:** {full_row.get("degenerate")}  |  **Language:** {full_row.get("language")}\n- **Showing:** {min(N_SHOW, len(lines))}/{len(lines)} lines

    ```text
    {content}
    ```
    """
    
    display(Markdown(md))
   


### OCR Output — LLaVA 
- **Model:** llava:latest
- **Latency:** 89.6s
- **Parsed JSON:** False  |  **Degenerate:** False  |  **Language:** guess
- **Showing:** 1/1 lines

    ```text
    {"language": "eng", "lines": ["A traitor is someone who has been given something in return for their betrayal.", "Their loyalty to another person or group of people, which they have promised to protect and serve, is now compromised by this act."]
    ```
    

### OCR Output — Llama-3.2-Vision 
- **Model:** llama3.2-vision:latest
- **Latency:** 630.7s
- **Parsed JSON:** False  |  **Degenerate:** False  |  **Language:** guess
- **Showing:** 32/32 lines

    ```text
    {
    "language": "por",
    "lines": [
        "A trilha sonora de um pais ideal",
        "O",
        "lha que coisa mais linda: as garotas de Ipanema-1961",
        "tomavam cuba-libre, dirigiam Kharman-Glias e voavam",
        "pela Panair. Usavam frasqueira, vestido-tubinho, cilio",
        "postico, perua, laque. Diziam-se existencialistas, adoravam",
        "arte abstrata e nao perdiam um filme da Nouvelle Vague.",
        "Seus pontos eram o Beco das Garrafas, a Cinemateca, o Arpoador.",
        "Iam a praia com a camisa social do irmao e, sob esta, um biquini que",
        "de tao insolente, fazia o sangue dos rapazes ferver da maneira",
        "mais incoveniente.",
        "Tudo isso passou. A querida Panair nunca mais voou, a",
        "Nouvelle Vague e um filme em preto e branco e ninguem mais",
        "toma cuba-libre - quem pensaria hoje em misturar rum com",
        "Coca-Cola? Quanto aquele biquini, era mesmo insolente, em-",
        "bora, por padroes subsequentes, sua calinha contivesse pano",
        "para fabricar dois ou tres para-ques. Dito assim, e como se, em",
        "1961, o ceu do Brasil ainda fosse povoado por pterodactilos.",
        "Mas ha uma excessao. A musica que aquelas garotas escutavam",
        "na epoca continua a ser ouvida - um milenio depois - como se",
        "brotasse das esferas: a Bossa Nova.",
        "Acreditou ou nao, em numeros absolutos ouve-se mais",
        "Bossa Nova hoje do que em 1961. Eela nao brota das esferas, mas",
        "e produzida ao vivo, pelos gogos, dedos e pulmoes de artistas de",
        "todas as idades, em lugares fechados ou ao ar livre, em quatro ou",
        "cinco continentes. Ouve-se Bossa Nova em salas de concerto,",
        "teatros, boates, bares, clubes, escolas, estadios, pracas, praiaas e",
        "quiosques e, ultimamente, como uma epidemia, nas ruas notur-"
    ]
    ```
    

### OCR Output — Qwen2.5-VL (7B) 
- **Model:** qwen2.5vl:latest
- **Latency:** 910.7s
- **Parsed JSON:** False  |  **Degenerate:** False  |  **Language:** guess
- **Showing:** 34/34 lines

    ```text
    {
  "language": "por",
  "lines": [
    "A trilha",
    "sonora de um",
    "país ideal",
    "",
    "Olha que coisa mais linda: as garotas de Ipanema-1961",
    "tomavam cuba-libre, dirigiam Kharman-Ghias e voavam",
    "pela Panair. Usavam frasqueira, vestido-tubinho, cílio",
    "postiço, peruca, laquê. Diziam-se existencialistas, adoravam",
    "arte abstrata e não perdiam um filme da Nouvelle Vague. Seus",
    "points eram o Beco das Garrafas, a Cinemateca, o Arpoador. Iam",
    "à praia com a camisa social do irmão e, sob esta, um biquíni que",
    "de tão insolente, fazia o sangue dos rapazes ferver da maneira",
    "mais inconveniente.",
    "Tudo isso passou. A querida Panair nunca mais voou, a",
    "Nouvelle Vague é um filme em preto e branco e ninguém mais",
    "toma cuba-libre — quem pensaria hoje em misturar rum com",
    "Coca-Cola? Quanto àquele biquíni, era mesmo insolente, embora",
    "por padrões subsequentes, sua calcinha contivesse pano",
    "para fabricar dois ou três pára-quedas. Dito assim, é como se, em",
    "1961, o céu do Brasil ainda fosse povoado por pterodáctilos.",
    "Mas há uma exceção. A música que aquelas garotas escutavam",
    "na época continua a ser ouvida — um milênio depois —",
    "como se brotasse das esferas: a Bossa Nova.",
    "Acredite ou não, em números absolutos ouve-se mais Bossa",
    "Nova hoje do que em 1961. E ela não brota das esferas, mas",
    "é produzida ao vivo, pelos gogós, dedos e pulmões de artistas",
    "de todas as idades, em lugares fechados ou ao ar livre, em quatro",
    "ou cinco continentes. Ouviu-se Bossa Nova em salas de concerto",
    "teatros, boates, bares, clubes, escolas, estádios, praças, praias",
    "e quiosques e, ultimamente, como uma epidemia, nas ruas noturnas"
  ]
    ```
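
The two Portuguese transcriptions above differ visibly in accent handling (for example `pais` versus `país`). One way to quantify that before gold text is available is the share of letters that carry a diacritic. The sketch below is an illustrative metric only: the name `accented_share` is hypothetical and a low value does not by itself prove an OCR error.

```python
import unicodedata


def accented_share(lines) -> float:
    """Fraction of alphabetic characters that carry a combining mark."""
    text = unicodedata.normalize("NFC", "".join(lines))
    letters = accented = 0
    for ch in text:
        if not ch.isalpha():
            continue
        letters += 1
        # Decompose the letter; any combining mark means it was accented.
        decomposed = unicodedata.normalize("NFD", ch)
        if any(unicodedata.combining(c) for c in decomposed):
            accented += 1
    return accented / letters if letters else 0.0


print(accented_share(["nao"]))       # 0.0
print(accented_share(["n\u00e3o"]))  # one accented letter out of three
```

Applied to each model's `lines`, a value near zero on Portuguese text suggests that diacritics were dropped during transcription.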
    