<div style="
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  background: linear-gradient(135deg, #2F4B43, #4F6F65);
  padding: 40px 20px;
  border-radius: 15px;
  box-shadow: 0 2px 10px rgba(0,0,0,0.1);
  text-align: center;
  font-family: 'Segoe UI', sans-serif;
">
  <div>
    <h1 style="
      color: #EF4635;  /* Command A Vision red */
      font-weight: 800;
      font-size: 34px;
      margin: 0 0 12px 0;
    ">
      📑 Testing Cohere <span style="color: #33DDE0;">Command A Vision</span> on Table Structure Recognition & Extraction
    </h1>
    <p style="
      color: #f0f0f0;
      font-size: 18px;
      margin: 0;
      max-width: 700px;
    ">
      Exploring its capability to detect, understand, and convert tables into structured formats across diverse document layouts.
    </p>
  </div>
</div>

## **Introduction** <a id="introduction"></a>

[Cohere Command A Vision](https://huggingface.co/CohereLabs/command-a-vision-07-2025?ref=cohere-ai.ghost.io) is a state-of-the-art open-weight vision-language model released by Cohere in July 2025. Built on Cohere's Command A language model, it is designed for enterprise image-understanding tasks, including document parsing, OCR, and structured data extraction.

According to [Cohere’s official announcement](https://cohere.com/blog/command-a-vision), Command A Vision achieves strong results on visual reasoning and document benchmarks such as DocVQA, OCRBench, and AI2D. This notebook focuses specifically on the model's effectiveness in **table understanding and extraction**. Using eleven table images taken from annual reports, earnings releases, and insurance-market reports, we assess how well it detects table structure, reads cell-level values, and converts each table into a markdown table.

By concentrating on this use case, we aim to understand the model's practical value in automating workflows for document intelligence, data mining, and enterprise reporting from unstructured or semi-structured visual sources.

# 📑 Table of Contents

1. [Introduction](#introduction)
2. [Install Libraries](#1-install-libraries)
3. [Import Libraries](#2-import-libraries)
4. [Inference](#3-inference)
   - [Example n°1](#example-n1)
   - [Example n°2](#example-n2)
   - [Example n°3](#example-n3)
   - [Example n°4](#example-n4)
   - [Example n°5](#example-n5)
   - [Example n°6](#example-n6)
   - [Example n°7](#example-n7)
   - [Example n°8](#example-n8)
   - [Example n°9](#example-n9)
   - [Example n°10](#example-n10)
   - [Example n°11](#example-n11)
5. [Conclusion](#conclusion)

# **1. Install Libraries** <a id="1-install-libraries"></a>

```python
%%capture
!pip install cohere
```

# **2. Import Libraries** <a id="2-import-libraries"></a>

```python
import base64
import os
import re

import cohere
import pandas as pd
from IPython.display import HTML, display
```

# **3. Inference** <a id="3-inference"></a>

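```python
def markdown_table_to_html(markdown_text):
    """Convert the markdown table in a model response to an HTML table."""
    # re.DOTALL lets `.*` cross newlines, so the row group runs to the last '|' in the
    # response: two tables in one answer come back merged (see Example n°10).
    match = re.search(r"\|.*?\|\n(\|[-| ]+\|\n)?((?:\|.*\|\n?)+)", markdown_text, re.DOTALL)
    if not match:
        raise ValueError("No markdown table found.")

    lines = match.group(0).strip().splitlines()
    header = [col.strip() for col in lines[0].split('|')[1:-1]]
    # Skip the separator row (e.g. |---|:--:|) only when it is actually present
    has_separator = len(lines) > 1 and "-" in lines[1] and set(lines[1].strip()) <= set("|:- ")
    rows = []
    for row in lines[2:] if has_separator else lines[1:]:
        cells = [col.strip() for col in row.split('|')[1:-1]]
        # Pad or truncate so every row has exactly one cell per header column
        rows.append((cells + [""] * len(header))[:len(header)])
    df = pd.DataFrame(rows, columns=header)
    return df.to_html(index=False, escape=False)
```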
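
Because the pattern above is compiled with `re.DOTALL`, everything up to the last `|` in the reply is treated as one table, which is how the second table in Example n°10 ends up appended to the first. A minimal line-based alternative, sketched here under the assumption that every table row starts with `|`, ends a table at the first line that does not and returns each table separately. `split_markdown_tables` is a hypothetical helper, not part of any library.

```python
import re

# Separator rows such as |---|:---:|---:|
SEPARATOR = re.compile(r"^\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?$")


def split_markdown_tables(text):
    """Return every markdown table in `text` as a list of rows (lists of cell strings).

    Hypothetical helper: a table is a run of consecutive lines starting with '|';
    any other line (blank or prose) closes the current table.
    """
    tables, current = [], []
    header_only = False  # True right after a table's first row has been read
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            if current:
                tables.append(current)
                current = []
            header_only = False
            continue
        if header_only and SEPARATOR.match(stripped):
            header_only = False  # drop the separator row, keep the header
            continue
        # Remove exactly one leading and one trailing pipe, then split the cells
        inner = stripped[1:-1] if len(stripped) > 1 and stripped.endswith("|") else stripped[1:]
        current.append([cell.strip() for cell in inner.split("|")])
        header_only = len(current) == 1
    if current:
        tables.append(current)
    return tables
```

Once each table's rows have been checked against the header width, a table can be handed to pandas as `pd.DataFrame(table[1:], columns=table[0])`.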

```python
def display_results(image_path, result):
    """Show the input page and the extracted HTML table side by side."""
    with open(image_path, "rb") as f:
        img_data = f.read()
    img_base64 = base64.b64encode(img_data).decode()

    # Build HTML layout
    html_content = f"""
    <style>
        .flex-container {{
            display: flex;
            gap: 20px;
            align-items: flex-start;
        }}
        .image-box {{
            flex: 1.1;
        }}
        .image-box img {{
            width: 100%;
            max-width: 650px;
            border-radius: 6px;
        }}
        .table-box {{
            flex: 1.2;
            padding: 0px;
            border-radius: 0px;
            overflow-x: auto;
        }}
        h3 {{
            margin-bottom: 10px;
        }}
    </style>

    <div class="flex-container">
        <div class="image-box">
            <h3>📄 Input Document Page</h3>
            <img src="data:image/png;base64,{img_base64}"/>
        </div>
        <div class="table-box">
            <h3>📊 Extracted Table</h3>
            {result}
        </div>
    </div>
    """

    display(HTML(html_content))
```
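
`display_results` labels every image as `image/png`, while `generate_text` below hard-codes `image/jpeg`. Browsers often render a mislabelled JPEG anyway, but a single helper that derives the MIME type from the file extension keeps both places consistent. The sketch below uses the standard-library `mimetypes` module; `image_to_data_url` is a hypothetical name, and falling back to JPEG for unknown extensions is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type matches the file extension.

    Hypothetical helper; unknown or non-image extensions fall back to image/jpeg.
    """
    mime, _ = mimetypes.guess_type(str(image_path))
    if mime is None or not mime.startswith("image/"):
        mime = "image/jpeg"  # assumption: treat unrecognised files as JPEG
    encoded = base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The same string could then feed both `<img src="...">` in `display_results` and `{"url": ...}` in the chat request.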

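```python
def generate_text(image_path, message):
    """Send one image and a prompt to Command A Vision and display the extracted table."""
    model = "command-a-vision-07-2025"

    # Assumption: the API key is read from the COHERE_API_KEY environment variable
    # instead of being hard-coded in the notebook
    co = cohere.ClientV2(api_key=os.environ["COHERE_API_KEY"])

    # Encode the image as a base64 data URL (every input in this notebook is a JPEG)
    with open(image_path, "rb") as img_file:
        base64_image_url = f"data:image/jpeg;base64,{base64.b64encode(img_file.read()).decode('utf-8')}"

    response = co.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": message},
                    {
                        "type": "image_url",
                        "image_url": {"url": base64_image_url},
                    },
                ],
            }
        ],
        temperature=0.3,
    )

    # The reply is markdown: convert its table to HTML and show it next to the image
    html_table = markdown_table_to_html(response.message.content[0].text)
    display_results(image_path, html_table)
```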
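
`generate_text` builds the request, calls the API and renders the table in one step, so the raw markdown reply is never shown; that reply is where problems such as the extra table in Example n°10 would be visible first. Factoring the request payload into a pure function, as in this sketch, makes the message structure easy to inspect and to reuse with a `co.chat` call whose raw text you keep. `build_vision_messages` is a hypothetical helper; its output has the same shape as the `messages` argument above.

```python
def build_vision_messages(prompt, image_data_url):
    """Build a single-turn `messages` list with one text part and one image part.

    Hypothetical helper mirroring the structure passed to `co.chat` in `generate_text`.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_data_url}},
            ],
        }
    ]


# Usage (requires a configured cohere.ClientV2 named `co`):
# response = co.chat(model="command-a-vision-07-2025",
#                    messages=build_vision_messages(prompt, data_url),
#                    temperature=0.3)
# raw_markdown = response.message.content[0].text
```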

```python
prompt = (
    "Extract the table from the given image and convert it into a structured markdown table. "
    "Ensure that the table headers, rows, columns, and all cell values are accurately captured and preserved. "
    "Maintain the original layout and semantic structure of the table as seen in the image."
)
```
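
The prompt asks the model to capture every row, column and cell, but nothing downstream checks that it did. Before parsing, a cheap structural check can flag rows whose cell count differs from the header's; merged or split cells often show up this way. A sketch with a hypothetical `ragged_rows` helper, assuming every row has both a leading and a trailing pipe:

```python
def _cell_count(row):
    """Number of cells in a pipe-delimited row with leading and trailing '|'."""
    return len(row.strip()[1:-1].split("|"))


def ragged_rows(markdown_text):
    """Return (line_number, cell_count) for table rows whose width differs from the header.

    Hypothetical helper: the first line starting with '|' is taken as the header;
    line numbers are 1-based positions in `markdown_text`.
    """
    rows = [
        (number, line)
        for number, line in enumerate(markdown_text.splitlines(), start=1)
        if line.strip().startswith("|")
    ]
    if not rows:
        return []
    expected = _cell_count(rows[0][1])
    return [(number, _cell_count(line)) for number, line in rows[1:] if _cell_count(line) != expected]
```

An empty result means every row matches the header width; anything else is a cue to inspect the raw reply or re-prompt.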

## **Example n°1** <a id="example-n1"></a>

```python
generate_text("/kaggle/input/tables-different-cases-cropped/aramco_table_1.jpg",
              prompt)
```

| Cost | Land and land improvements | Buildings | Oil and gas properties | Plant, machinery, and equipment | Storage tanks and pipelines | Fixtures, office equipment | Construction-in-progress | Total |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **January 1, 2023** | 50738 | 91617 | 641029 | 937307 | 95610 | 20758 | 262903 | 2099959 |
| **Additions** | 660 | 1000 | 2292 | 21507 | 375 | 245 | 164142 | 188224 |
| **Acquisition (Note 35(a))** | 482 | 806 | – | – | 779 | 35 | 44 | 2285 |
| **Construction completed** | 1358 | 2815 | 55216 | 47290 | 14232 | 802 | (121,713) | 92 |
| **Currency translation differences** | (59) | 171 | – | 813 | (106) | 8 | (85) | 912 |
| **Transfers and adjustments**¹ | (125) | (77) | (3,024) | 398 | 316 | 84 | (670) | (3,098) |
| **Transfer of exploration and evaluation assets** | – | – | – | – | – | – | – | – |
| **Transfers to assets held for sale** | (312) | (4,087) | – | (21,758) | – | (415) | (741) | (27,313) |
| **Retirements and sales** | (563) | (807) | (424) | (6,982) | (956) | (591) | (279) | (10,632) |
| **December 31, 2023** | 52179 | 91438 | 693089 | 979354 | 109506 | 20935 | 305724 | 2252225 |
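
Financial tables like this one carry a Total column, which allows a cheap consistency check on the extraction: the component cells of each row should sum to the total. The sketch below assumes integer cells written in accounting style (parentheses for negatives, commas as thousands separators, a dash for nil); `to_number` and `crossfoot_ok` are hypothetical helpers.

```python
def to_number(cell):
    """Parse an accounting-style integer cell: '(5,433)' -> -5433, '-' or '–' -> 0.

    Hypothetical helper: strips markdown bold markers and thousands commas first.
    """
    text = cell.replace("**", "").replace(",", "").strip()
    if text in {"", "-", "–"}:
        return 0
    if text.startswith("(") and text.endswith(")"):
        return -int(text[1:-1])
    return int(text)


def crossfoot_ok(components, total, tolerance=0):
    """True when the component cells sum to the total cell within `tolerance`."""
    return abs(sum(to_number(c) for c in components) - to_number(total)) <= tolerance
```

Applied to the rows above, the check passes for "Transfers and adjustments" and "Transfers to assets held for sale" but fails for "Additions" (components sum to 190,221 against a Total of 188,224) and "Acquisition" (2,146 against 2,285), while every numeric row of Example n°2 balances. A mismatch does not say which cell is wrong, or whether a column was missed; it only flags rows to check against the source image.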


## **Example n°2** <a id="example-n2"></a>

```python
generate_text("/kaggle/input/tables-different-cases-cropped/aramco_table_3.jpg",
              prompt)
```

| **Cost** | **Goodwill** | **Exploration and evaluation** | **Brands and trademarks** | **Franchise/customer relationships** | **Computer software** | **Other** | **Total** |
| --- | --- | --- | --- | --- | --- | --- | --- |
| **January 1, 2024** | 101010 | 20013 | 24982 | 21701 | 4233 | 3876 | 175815 |
| Additions | - | 8649 | - | - | 291 | 640 | 9580 |
| Acquisition (Note 35(a)) | 255 | - | - | 58 | 4 | 24 | 341 |
| Currency translation differences | (20) | - | (251) | (134) | (30) | 48 | (387) |
| Transfers and adjustments | (20) | - | - | - | 2 | (73) | (91) |
| Transfer of exploration and evaluation assets | - | (5,433) | - | - | - | - | (5,433) |
| Retirements and write offs | - | (2,325) | - | - | (919) | (107) | (3,351) |
| **December 31, 2024** | 101225 | 20904 | 24731 | 21625 | 3581 | 4408 | 176474 |
| **Accumulated amortization** | | | | | | | |
| **January 1, 2024** | - | - | (2,795) | (4,465) | (2,681) | (1,320) | (11,261) |


## **Example n°3** <a id="example-n3"></a>

```python
generate_text("/kaggle/input/different-tables-images-testing/other_table.jpg",
              prompt)
```

| PAYS | Assurance Vie Valeur | Assurance Vie Part % | Assurance Non-Vie Valeur | Assurance Non-Vie Part % | Total Valeur | Part du marché mondial (%) | Densité d'assurance (D) | Taux de Pénétration (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Le Monde | 8 720 399 | 415 | 12 304 529 | 585 | 21 024 929 | 1000 | 2 644 | 68 |
| États-Unis et Canada | 2 305 303 | 238 | 7 400 112 | 762 | 9 705 415 | 4620 | 26 087 | 113 |
| États - Unis | 2 083 219 | 227 | 7 092 186 | 773 | 9 175 405 | 4364 | 27 544 | 116 |
| Canada | 222 081 | 419 | 307 932 | 581 | 530 013 | 252 | 13 615 | 80 |
| Amérique Latine et Caraïbes | 229 164 | 435 | 297 681 | 565 | 526 845 | 251 | 800 | 30 |
| Brésil | 123 098 | 523 | 112 118 | 477 | 235 216 | 112 | 1 091 | 40 |
| Mexique | 46 953 | 417 | 58 100 | 553 | 105 053 | 50 | 822 | 24 |
| Chili | 18 368 | 514 | 17 354 | 486 | 35 721 | 17 | 1 804 | 38 |
| Argentine | 4 077 | 103 | 35 638 | 897 | 39 714 | 19 | 871 | 20 |
| EMEA Avancée (\*) | 2 754 266 | 569 | 2 087 069 | 431 | 4 841 335 | 2303 | 10 255 | 74 |
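
This table writes numbers with spaces as thousands separators ("8 720 399"), as do the Swedish figures in Example n°4, and Example n°5 uses a decimal comma in its percentages ("4,9%"). A parser that assumes English formatting would drop those commas as thousands separators and read "41,5" as 415. The sketch below is one locale-aware alternative; `parse_fr_number` is a hypothetical name, and treating every space as a thousands separator and every comma as a decimal separator is an assumption.

```python
import re


def parse_fr_number(cell):
    """Parse French-style numbers: '8 720 399' -> 8720399, '4,9%' -> 4.9.

    Hypothetical helper: spaces (including non-breaking ones) are thousands
    separators, a comma is the decimal separator, and a trailing '%' is dropped.
    """
    text = cell.replace("\u00a0", " ").replace("\u202f", " ").strip().rstrip("%")
    text = text.replace(" ", "").replace(",", ".")
    if not re.fullmatch(r"-?\d+(\.\d+)?", text):
        raise ValueError(f"Not a French-formatted number: {cell!r}")
    return float(text) if "." in text else int(text)
```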


## **Example n°4** <a id="example-n4"></a>

```python
generate_text("/kaggle/input/tables-different-cases-cropped/the-full-report-pdf_page-0026_cropped.jpg",
              prompt)
```

| Apr-Jun million SEK | Sweden 2024 | Sweden 2023 | Norway 2024 | Norway 2023 | Finland 2024 | Finland 2023 | Other Europe 2024 | Other Europe 2023 | Central functions\* 2024 | Central functions\* 2023 | Group 2024 | Group 2023 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Room revenue | 1 319 | 1 301 | 1 100 | 1 019 | 862 | 866 | 857 | 794 | - | - | 4 138 | 3 980 |
| Restaurant and conference revenue | 411 | 425 | 470 | 450 | 349 | 364 | 336 | 320 | - | - | 1 567 | 1 559 |
| Franchise and management fees | 2 | 2 | 4 | 3 | - | - | - | - | - | - | 7 | 5 |
| Other hotel-related revenue | 23 | 22 | 62 | 75 | 36 | 35 | 40 | 16 | - | - | 160 | 148 |
| **Net sales** | **1 755** | **1 751** | **1 636** | **1 548** | **1 246** | **1 264** | **1 234** | **1 130** | **-** | **-** | **5 871** | **5 693** |
| Internal transactions | - | - | - | - | - | - | - | - | 20 | 17 | 20 | 17 |
| Group adjustments | - | - | - | - | - | - | - | - | -20 | -17 | -20 | -17 |
| **TOTAL OPERATING INCOME** | **1 755** | **1 751** | **1 636** | **1 548** | **1 246** | **1 264** | **1 234** | **1 130** | **-** | **-** | **5 871** | **5 693** |
| Raw materials and consumables | -105 | -111 | -142 | -137 | -96 | -106 | -72 | -72 | -0 | -0 | -415 | -426 |
| Other external expenses | -402 | -400 | -337 | -299 | -320 | -317 | -283 | -269 | 207 | 164 | -1 136 | -1 120 |


## **Example n°5** <a id="example-n5"></a>

```python
generate_text("/kaggle/input/cga-images/RAP_CGA_FR_ANG_2022-images-79 (1) (1).jpg",
              prompt)
```

| LOCAL INSURANCE COMPANIES | LEGAL FORM | SPECIALITY | NET PREMIUMS 2021 | NET PREMIUMS 2022 | ANNUAL CHANGE 2022/2021 |
| --- | --- | --- | --- | --- | --- |
| **DIRECT INSURANCE COMPANIES** | | | | | |
| STAR | LIMITED COMPANY | COMPOSITE | 3682 | 3863 | 4,9% |
| COMAR | LIMITED COMPANY | COMPOSITE | 2333 | 2528 | 8,4% |
| ASTREE | LIMITED COMPANY | COMPOSITE | 1872 | 2360 | 26,1% |
| GAT | LIMITED COMPANY | COMPOSITE | 2184 | 2350 | 7,6% |
| MAGHREBIA | LIMITED COMPANY | COMPOSITE | 2025 | 2261 | 11,7% |
| ASSURANCES BIAT | LIMITED COMPANY | COMPOSITE | 1714 | 2063 | 20,4% |
| AMI | LIMITED COMPANY | COMPOSITE | 1429 | 1898 | 32,8% |
| BH ASSURANCE | LIMITED COMPANY | COMPOSITE | 1475 | 1616 | 9,6% |
| LLOYD TUNISIEN | LIMITED COMPANY | COMPOSITE | 1444 | 1593 | 10,3% |


## **Example n°6** <a id="example-n6"></a>

```python
generate_text("/kaggle/input/pdf-files-pages/Blackstone4Q24EarningsPressRelease_page-0020.jpg",
              prompt)
```

| **Three Months Ended December 31, 2024** | **Real Estate** | **Private Equity** | **Credit & Insurance** | **Multi-Asset Investing** | **Total** | **Twelve Months Ended December 31, 2024** | **Real Estate** | **Private Equity** | **Credit & Insurance** | **Multi-Asset Investing** | **Total** |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Beginning Balance** | \$ 325,076 | \$ 344,710 | \$ 354,742 | \$ 83,101 | \$ 1,107,628 | **Beginning Balance** | \$ 336,940 | \$ 314,391 | \$ 312,674 | \$ 76,187 | \$ 1,040,192 |
| **Inflows** | 8094 | 11617 | 34181 | 3607 | 57500 | **Inflows** | 27941 | 41285 | 91200 | 11032 | 171459 |
| **Outflows** | (3,047) | (2,735) | (3,907) | (3,856) | (13,545) | **Outflows** | (24,543) | (7,226) | (6,348) | (9,688) | (47,805) |
| **Net Flows** | 5047 | 8882 | 30274 | (248) | 43955 | **Net Flows** | 3398 | 34059 | 84853 | 1344 | 123654 |
| **Realizations** | (5,457) | (10,566) | (8,698) | (1,179) | (25,900) | **Realizations** | (22,164) | (28,931) | (33,319) | (2,729) | (87,142) |
| **Market Activity** | (9,312) | 9112 | (810) | 2477 | 1497 | **Market Activity** | (2,820) | 32648 | 11300 | 9348 | 50476 |
| **Ending Balance** | \$ 315,353 | \$ 352,169 | \$ 375,508 | \$ 84,150 | \$ 1,127,180 | **Ending Balance** | \$ 315,353 | \$ 352,169 | \$ 375,508 | \$ 84,150 | \$ 1,127,180 |
| **% Change** | (3)% | 2% | 6% | 1% | 2% | **% Change** | (6)% | 12% | 20% | 10% | 8% |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
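
Here the model kept the three-month and twelve-month blocks side by side, so each data row repeats its own label, for example "Beginning Balance" in the first and seventh columns. When the two periods are needed as separate tables, that repeated label marks where to cut. A sketch, assuming both blocks start with a label column; `split_side_by_side` is a hypothetical helper.

```python
def split_side_by_side(rows):
    """Split rows holding two blocks side by side into two lists of rows.

    Hypothetical helper: the cut is made at the first column (after column 0)
    where a data row repeats its own label, e.g. 'Beginning Balance'.
    Returns (left, right); right is empty when no repeated label is found.
    """
    for row in rows[1:]:  # skip the header row, whose two period labels differ
        label = row[0] if row else ""
        if label and label in row[1:]:
            cut = row.index(label, 1)
            return [r[:cut] for r in rows], [r[cut:] for r in rows]
    return rows, []
```

For this output the cut falls at column index 6, giving two six-column tables whose first header cells are the two period labels.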


## **Example n°7** <a id="example-n7"></a>

```python
generate_text("/kaggle/input/pdf-files-pages/CLAS-FY2023-AR_page-0066.jpg",
              prompt)
```

| **FINANCIAL REVIEW** | | | | |
| --- | --- | --- | --- | --- |
| **FY 2023** | | **FY 2022** | | |
| **Revenue** | **Gross Profit** | **Revenue** | **Gross Profit** | |
| (S\$million) | (S\$million) | (S\$million) | (S\$million) | |
| **Master Leases** | | | | |
| Australia | 10.6 | 9.8 | 10.6 | 10.0 |
| France | 33.1 | 29.8 | 27.2 | 25.0 |
| Germany | 16.6 | 14.8 | 13.4 | 12.2 |
| Japan | 22.2 | 19.7 | 22.0 | 19.5 |
| South Korea | 8.5 | 7.9 | 5.5 | 5.0 |
| **Subtotal** | **91.0** | **82.0** | **78.7** | **71.7** |
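
This output and Example n°9 come from two-level headers: here "FY 2023" and "FY 2022" each span a "Revenue" and a "Gross Profit" column. A markdown table has a single header row, so the model emitted the upper levels as ordinary rows, and they no longer line up with the data columns. One option is to collapse the levels into single labels before building the table, forward-filling each spanning label across the empty cells to its right. A sketch; `flatten_header_rows` is a hypothetical helper and assumes the two header rows are already aligned with the data columns.

```python
def flatten_header_rows(top, bottom, sep=" "):
    """Merge a two-level header into single column labels.

    Hypothetical helper: column 0 (the row-label column) keeps whichever level
    is filled; elsewhere an empty `top` cell inherits the spanning label to its left.
    """
    labels = [top[0] or bottom[0]]
    current = ""
    for upper, lower in zip(top[1:], bottom[1:]):
        if upper:
            current = upper  # a new spanning label starts at this column
        labels.append(sep.join(part for part in (current, lower) if part))
    return labels


# flatten_header_rows(["", "FY 2023", "", "FY 2022", ""],
#                     ["", "Revenue", "Gross Profit", "Revenue", "Gross Profit"])
# returns ["", "FY 2023 Revenue", "FY 2023 Gross Profit", "FY 2022 Revenue", "FY 2022 Gross Profit"]
```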


## **Example n°8** <a id="example-n8"></a>

```python
generate_text("/kaggle/input/pdf-files-pages/CLAS-FY2023-AR_page-0076.jpg",
              prompt)
```

| Property Name | Address | Number of Units | Tenure (Years) | Tenure Expiry Date (Year) | Agreed Property Value at Acquisition (\$\$million) |
| --- | --- | --- | --- | --- | --- |
| **United Kingdom** | | | | | |
| Citadines Barbican London | 7-21 Goswell Road, London EC1M 7AH, United Kingdom | 129 | Freehold | - | 75.0 |
| Citadines Holborn-Covent Garden London | 94-99 High Holborn, London WC1V 6LF, United Kingdom | 192 | Freehold | - | 127.5 |
| Citadines South Kensington London | 35A Gloucester Road, London SW7 4PL, United Kingdom | 92 | Freehold | - | 71.1 |
| Citadines Trafalgar Square London | 18/21 Northumberland Avenue, London WC2N 5EA, United Kingdom | 187 | Freehold | - | 130.9 |
| The Cavendish London | 81 Jermyn St, St. James's, London SW1Y 6JF, United Kingdom | 230 | 150 | 2158 | 372.3 |
| **United States of America (USA)** | | | | | |
| Element New York Times Square West | 311 West 39th Street, New York, New York 10018, The United States of America | 411 | 99 | 2112 | 220.7 |
| Sheraton Tribeca New York Hotel | 370 Canal Street, New York, New York 10013, The United States of America | 369 | 99 | 2112 | 218.0 |
| voco Times Square South | 343 West 36th Street, New York, New York 10018, The United States of America | 224 | Freehold | - | 148.4 |
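
This table uses country rows ("**United Kingdom**", "**United States of America (USA)**") as section headers with every other cell empty; the same pattern appears in Example n°5 ("**DIRECT INSURANCE COMPANIES**") and Example n°2 ("**Accumulated amortization**"). For analysis it is usually easier to move that label into its own column. A sketch with a hypothetical `groups_to_column` helper; note that a data row whose values are simply missing would be misread as a section header.

```python
def groups_to_column(header, rows, group_name="Group"):
    """Turn section-header rows (label only, other cells empty) into a leading column.

    Hypothetical helper: returns (new_header, new_rows); the section-header rows
    themselves are dropped, and rows before the first section get an empty group.
    """
    new_rows, group = [], ""
    for row in rows:
        if row and row[0] and not any(cell.strip() for cell in row[1:]):
            group = row[0].replace("**", "").strip()  # section label without markdown bold
            continue
        new_rows.append([group] + list(row))
    return [group_name] + list(header), new_rows
```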


## **Example n°9** <a id="example-n9"></a>

```python
generate_text("/kaggle/input/pdf-files-pages/Marriott International Reports Third Quarter 2024 Results_page-0011.jpg",
              prompt)
```

| **Brand** | **2024** | **vs. 2023** | **2024** | **vs. 2023** | **Average Daily Rate 2024** | **vs. 2023** |
| --- | --- | --- | --- | --- | --- | --- |
| **REVPAR** | | **% Change** | **% Change** | **\$ Change** | **% Change** | |
| JW Marriott | \$233.04 | 2.8% | 70.6% | 0.0% pts. | \$330.13 | 2.8% |
| The Ritz-Carlton | \$339.10 | 2.5% | 66.4% | 1.1% pts. | \$510.94 | 0.9% |
| W Hotels | \$214.16 | 0.4% | 67.2% | 0.8% pts. | \$318.76 | -0.7% |
| **Composite US & Canada Luxury¹** | \$289.18 | 1.5% | 68.8% | 0.6% pts. | \$420.36 | 0.6% |
| Marriott Hotels | \$172.23 | 4.4% | 71.3% | 0.7% pts. | \$241.48 | 3.4% |
| Sheraton | \$161.49 | 7.7% | 69.3% | 2.4% pts. | \$233.20 | 4.0% |
| Westin | \$175.46 | 4.3% | 70.7% | 0.8% pts. | \$248.14 | 3.0% |
| **Composite US & Canada Premium²** | \$168.20 | 4.4% | 70.3% | 0.6% pts. | \$239.14 | 3.6% |
| **US & Canada Full-Service³** | \$194.24 | 3.4% | 70.0% | 0.6% pts. | \$277.47 | 2.6% |


## **Example n°10** <a id="example-n10"></a>

```python
generate_text("/kaggle/input/pdf-files-pages/q4-2023-earnings-release_page-0007.jpg",
              prompt)
```

| **Region** | **Occupancy** | **ADR** | **RevPAR** |
| --- | --- | --- | --- |
| **System-wide** | 69.0% | \$ 156.07 | \$ 107.69 |
| **U.S.** | 68.2% | \$ 162.19 | \$ 110.64 |
| **Americas (excluding U.S.)** | 67.4% | \$ 148.57 | \$ 100.19 |
| **Europe** | 72.7% | \$ 160.27 | \$ 116.50 |
| **Middle East & Africa** | 76.3% | \$ 187.21 | \$ 142.78 |
| **Asia Pacific** | 70.2% | \$ 113.45 | \$ 79.60 |
| **Brand** | **Occupancy** | **ADR** | **RevPAR** |
| --- | --- | --- | --- |
| **Waldorf Astoria Hotels & Resorts** | 64.9% | \$ 515.05 | \$ 334.05 |
| **LXR Hotels & Resorts** | 49.3% | \$ 539.47 | \$ 266.21 |


## **Example n°11** <a id="example-n11"></a>

```python
generate_text("/kaggle/input/tables-different-cases-cropped/aramco_table_5.jpg",
              prompt)
```

| **Non-current** | **2024** | **2023** |
| --- | --- | --- |
| Home loans (Note 9(a)) | 13199 | 12427 |
| Loans and advances | 7285 | 9066 |
| Loans to joint ventures and associates (Note 29(b)) | 6839 | 9866 |
| Advance payment related to long-term sales agreement (Note 35(c)(iii)) | 5596 | 5833 |
| Derivative assets (Note 3) | 4259 | 4299 |
| Receivable from Government, semi-Government and other entities with | | |
| Government ownership or control (Note 29(b)) | 2554 | 1151 |
| Home ownership construction | 1224 | 692 |
| Lease receivable from associates (Note 29(b)) | 364 | 389 |
| Other | 5524 | 4542 |
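
Here the label "Receivable from Government, semi-Government and other entities with Government ownership or control (Note 29(b))" appears to wrap onto two lines in the source, and the model emitted it as two rows, the first with no values. A heuristic repair merges a value-less row into the next one when its label ends in a connecting word. This is only a heuristic: `merge_wrapped_labels` and its word list are assumptions, and the connector test is what keeps a genuine section header such as "**Accumulated amortization**" in Example n°2 from being merged.

```python
# Assumed list of words that rarely end a complete row label
CONNECTORS = {"with", "and", "of", "for", "to", "in", "the", "or", "from", "by"}


def merge_wrapped_labels(rows):
    """Join a label-only row to the following row when its label looks cut off.

    Hypothetical heuristic: a row whose value cells are all empty and whose label
    ends with a connecting word is prepended to the next row's label.
    """
    merged, pending = [], ""
    for row in rows:
        label, values = row[0], row[1:]
        if pending:
            label = f"{pending} {label}"
            pending = ""
        words = label.replace("**", "").split()
        if words and words[-1].lower() in CONNECTORS and not any(v.strip() for v in values):
            pending = label
            continue
        merged.append([label] + list(values))
    if pending:  # a dangling label at the end is kept as its own row
        merged.append([pending] + [""] * (len(rows[-1]) - 1))
    return merged
```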


# **Conclusion** <a id="conclusion"></a>

<div style="
  background-color: #EF4635;  /* vivid red from Command A Vision palette */
  padding: 25px 30px;
  border-radius: 12px;
  color: white;
  font-family: 'Segoe UI', sans-serif;
  font-size: 16px;
  line-height: 1.6;
  box-shadow: 0 4px 10px rgba(0,0,0,0.15);
  margin: 30px auto;
">
  <p>
    After exploring a range of table structures and document layouts, it is clear that
    <strong>Cohere Command A Vision</strong> handles table understanding and recognition well.
    In most examples, the model located the table boundaries, captured cell-level content, and preserved the relationships between row labels and columns.
  </p>
  <p>
    Complex or irregular formats, such as multi-level headers, merged cells, or nested tables, remained harder and occasionally led to misaligned output or partial recognition.
    For example, the two-level headers in Examples n°7 and n°9 came out as ordinary rows that do not line up with the data columns, and a row label that wraps onto two lines in Example n°11 was split into two rows.
  </p>
  <p>
    Overall, <strong>Command A Vision</strong> is a capable model for table-based document analysis,
    with strong potential for automated data extraction pipelines, document intelligence workflows, and enterprise reporting solutions.
  </p>
</div>