Using `llama.cpp` instead of Hugging Face `transformers` is much faster: the `transformers` run took 55 seconds, while the code here takes only 7 seconds.

The only problem is that this method produces `Grounded Text` rather than `DocTags`, even though the prompt explicitly asks for DocTags.
I'm not sure why that is.
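One way to check what kind of markup actually came back is to count the tags in the response. The following is a minimal sketch: `tag_inventory` is a hypothetical helper, and the sample string is made up for illustration, not real model output.

```python
import re
from collections import Counter


def tag_inventory(text: str) -> Counter:
    """Count opening tags by name; all <loc_123>-style tags are folded into 'loc'."""
    counts = Counter()
    # \w+ does not match "/", so closing tags such as </text> are skipped
    for name in re.findall(r"<(\w+)>", text):
        counts["loc" if re.fullmatch(r"loc_\d+", name) else name] += 1
    return counts


# Made-up sample for illustration only
sample = "<text><loc_10><loc_20><loc_200><loc_40>Hello world</text>"
print(tag_inventory(sample))
```

Running the same helper on the server response shows at a glance whether any structure tags are present besides the location tags.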


In [6]:
import base64
import json
import mimetypes
import re

import requests

from IPython.display import Markdown, display

In [7]:
LLAMA_SERVER_URL = "http://localhost:36912"

In [8]:

def encode_image_to_data_uri(image_path: str) -> str:
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        mime_type = "application/octet-stream"

    with open(image_path, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode("utf-8")

    return f"data:{mime_type};base64,{encoded_string}"


path_to_image = "data/images/image.png"
image_data_uri = encode_image_to_data_uri(path_to_image)
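A quick sanity check for the encoding step is to decode the data URI again and compare the bytes. This is a sketch only: `decode_data_uri` is a hypothetical helper, and the byte string stands in for a real image file.

```python
import base64


def decode_data_uri(data_uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, _, payload = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Placeholder bytes standing in for the contents of an image file
raw = b"\x89PNG\r\n\x1a\n"
uri = "data:image/png;base64," + base64.b64encode(raw).decode("utf-8")
mime, decoded = decode_data_uri(uri)
```

If `decoded` matches the original file contents and `mime` is the expected image type, the payload sent to the server is intact.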

In [9]:
def extract_text_from_images(model_name: str, prompt: str, image_data_uri: str) -> str:
    url = f"{LLAMA_SERVER_URL}/v1/chat/completions"
    headers = {"Content-Type": "application/json"}

    messages = [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_data_uri}},
            ],
        }
    ]

    data = {
        "messages": messages,
        "model": model_name,
        "max_tokens": 8192,
    }

    try:
        # A timeout (300 s here, an arbitrary choice) keeps the call from hanging forever
        response = requests.post(url, headers=headers, data=json.dumps(data), timeout=300)
        response.raise_for_status()
        response_json = response.json()

        assistant_message = response_json["choices"][0]["message"]["content"]
        return assistant_message.strip()

    except Exception as e:
        print(f"An error occurred: {e}")
        return ""
    
model_to_use = "granite-docling-258M"
text_prompt = "Convert this page to docling using DocTags."

# The extracted text comes back as "Grounded Text" rather than DocTags (semantic structure); the reason is unclear
doc_tags_llama_cpp = extract_text_from_images(model_to_use, text_prompt, image_data_uri)
len(doc_tags_llama_cpp)

5753
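The LaTeX in the model output tends to have its letters separated by spaces, for example `T _ { e f f }`. The sketch below shows one way such runs might be collapsed. `collapse_spaced_latex` is a hypothetical helper, not part of docling or llama.cpp, and it is meant for lines already identified as math, since it would also join single-letter words in ordinary prose. LaTeX math mode ignores these spaces, so the change mainly makes the source easier to read.

```python
import re


def collapse_spaced_latex(line: str) -> str:
    """Collapse OCR-style spacing in a LaTeX math line."""
    # Join single letters separated by one space ("e f f" -> "eff").
    # The negative lookbehind avoids gluing a letter onto a one-letter command like "\c".
    line = re.sub(r"(?<=\b[A-Za-z])(?<!\\[A-Za-z]) (?=[A-Za-z]\b)", "", line)
    # Drop padding just inside braces
    line = re.sub(r"\{\s+", "{", line)
    line = re.sub(r"\s+\}", "}", line)
    # Drop padding around "_" or "^" when a brace group follows
    line = re.sub(r"\s*([_^])\s*(?=\{)", r"\1", line)
    return line


print(collapse_spaced_latex("T _ { e f f }"))  # T_{eff}
```

Multi-letter commands such as `\lambda` are left alone because only isolated single letters are joined.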

In [10]:
def clean_docling_output(text):
    lines = text.split('\n')
    cleaned_lines = []
    
    for line in lines:
        if not line.strip():
            continue
            
        # 1. Remove location tags <loc_XXX>
        clean = re.sub(r'<loc_\d+>', '', line)
        
        # 2. Remove structure tags (like <line_chart>)
        clean = re.sub(r'<[^>]+>', '', clean)
        
        # 3. Detect and Format Headers (Simple heuristic: starts with numbering like 2.5.)
        if re.match(r'^\d+\.\d+\.', clean.strip()):
            clean = "## " + clean
            
        # 4. Detect Block Math
        # If a line contains typical latex assignment "=" and slash commands "\", wrap in $$
        if "=" in clean and "\\" in clean and len(clean) < 200:
            clean = f"$$\n{clean.strip()}\n$$"
             
        # 5. Clean up extra spaces around LaTeX often left by OCR
        # (e.g., "T _ { e f f }" -> "T_{ e f f }"; spaces inside the braces are kept)
        clean = re.sub(r' _ \{', '_{', clean)
        
        cleaned_lines.append(clean)

    return "\n\n".join(cleaned_lines)

extracted_text_markdown = clean_docling_output(doc_tags_llama_cpp)
display(Markdown(extracted_text_markdown))


Energy Budget of WASP-121 b from JWST/NIRISS Phase Curve

while the kernel weights are structured as ( N$_{slice}$, N$_{time}$ ). This precomputation significantly accelerates our calculations, which is essential since the longitudinal slices are at least partially degenerate with one another. Consequently, the fits require more steps and walkers to ensure proper convergence.

To address this, we follow a similar approach to our sinusoidal fits using emcee , but we increase the total number of steps to 100,000 and use 100 walkers. Na¨ıvely, the fit would include 2 N$_{slice}$ + 1 parameters: N$_{slice}$ for the albedo values, N$_{slice}$ for the emission parameters, and one additional scatter parameter, σ . However, since night-side slices do not contribute to the reflected light component, we exclude these albedo values from the fit. In any case, our choice of 100 walkers ensures a sufficient number of walkers per free parameter. Following Coulombe et al. ( 2025 ) we set an upper prior limit of 3/2 on all albedo slices as a fully Lambertian sphere ( A$_{i}$ = 1 ) corresponds to a geometric albedo of A$_{g}$ = 2 / 3. For thermal emission we impose a uniform prior between 0 and 500 ppm for each slice.

We choose to fit our detrended lightcurves considering 4, 6 and 8 longitudinal slices ( N$_{slice}$ = 4, 6, 8). However, we show the results of the simplest 4 slice model. As in our previous fits, we conduct an initial run with 25,000 steps (25% of the total run) and use the maximumprobability parameters from this preliminary fit as the starting positions for the final 75,000-step run. We then discard the first 60% of the final run as burn-in.

## 2.5. Planetary Effective Temperature

Phase curves are the only way to probe thermal emission from the day and nightside of an exoplanet and hence determine its global energy budget ( Partner & Crossfield 2018 ). The wavelength range of NIRISS/SOSS covers a large portion of the emitted flux of WASP-121 b ( ∼ 50-83%; see Figure 2 ), enabling a precise and robust constraint of the planet's energy budget.

We convert the fitted F$_{p}$ / F$_{∗}$ emission spectra to brightness temperature by wavelength,

$$
T_{ b r i g h t } = \frac { h c } { k \lambda } \cdot \left [ \ln \left ( \frac { 2 h c ^ { 2 } } { \lambda ^ { 5 } B_{ \lambda , p l a n e t } } + 1 \right ) \right ] ^ { - 1 } ,
$$

where the planet's thermal emission is

$$
B_{ \lambda , p l a n e t } = \frac { F_{ p } / F_{ * } } { ( R_{ p } / R_{ * } ) ^ { 2 } } \cdot B_{ \lambda , s t a r } .
$$

There are many ways of converting brightness temperatures to effective temperature, including the ErrorWeighted Mean ( EWM), Power-Weighted mean ( PWM) and with a Gaussian Process ( Schwartz & Cowan 2015;

9

Figure 2. Estimated captured flux of the planet assuming the planet radiates as a blackbody. The captured flux is calculated as the ratio of the integrated blackbody emission within the instrument's band pass to the total emission over all wavelengths, i.e., γ = ∫ λ $_{max}$λmin B ( λ, T ) dλ/ ∫ ∞ 0 B ( λ, T ) dλ . The captured flux fraction is shown for NIRISS SOSS [0.6-2.85 μ m] (red line); Hubble WFC3 [1.12-1.64 μ m] (dashed green line); NIRSpec G395H [2.7-5.15 μ m] (dash dotted blue line). The red-shaded region shows the temperature range on WASP-121 b based on our T$_{eff}$ estimates. Red dashed lines indicate the boundaries of the planet's temperature range within the NIRISS SOSS captured flux fraction. From this we estimate that these observations capture between 55% and 82% of the planet's bolometric flux, depending on orbital phase. Using the minimum temperature from the NAMELESS fit, this estimate decreases to 50%. In either case, the wavelength coverage of NIRISS exceeds that of any other instrument.

Pass et al. 2019 ). In this work, we elect to compute our effective temperature estimates with a novel method that is essentially a combination of the PWM and EWM. We create the effective temperature by using a simple Monte Carlo process. First, we perturb our F$_{p}$/ F$_{s}$ emission spectra at each point in the orbit by a Gaussian based on the measurement uncertainty. Our new emission spectrum is then used to create an estimate of the brightness temperature spectrum. This process is repeated at each orbital phase. We then estimate the effective temperature, T$_{eff}$ for a given orbital phase as

$$
T_{ \text {eff} } = \frac { \sum_{ i = 1 } ^ { N } w_{ i } T_{ \text {bright,} } i } { \sum_{ i = 1 } ^ { N } w_{ i } } ,
$$

where w$_{i}$ is the weight for the i -th wavelength given by the fraction of the planet's bolometric flux that falls within that wavelength bin scaled by the inverse variance of the measurement,

w_{ i } = \frac { \int_{ \lambda_{ i } } ^ { \lambda_{ i + 1 } } B ( \lambda_{ i } , T_{ \text {est} } ) \, d \lambda } { \int_{ 0 } ^ { \infty } B ( \lambda_{ i } , T_{ \text {est} } ) \, d \lambda } \cdot \frac { 1 } { \sigma_{ i } ^ { 2 } } ,

with T$_{est}$ representing an estimated effective temperature at the orbital phase of interest. When computing