<div style="
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  background: linear-gradient(135deg, #2F4B43, #4F6F65);
  padding: 40px 20px;
  border-radius: 15px;
  box-shadow: 0 2px 10px rgba(0,0,0,0.1);
  text-align: center;
  font-family: 'Segoe UI', sans-serif;
">
  <div>
    <h1 style="
      color: #EF4635;  /* Command A Vision red */
      font-weight: 800;
      font-size: 34px;
      margin: 0 0 12px 0;
    ">
      🧠 Testing Cohere <span style="color: #33DDE0;">Command A Vision</span> on Chart Understanding
    </h1>
    <p style="
      color: #f0f0f0;
      font-size: 18px;
      margin: 0;
      max-width: 700px;
    ">
      Exploring its performance in enterprise-level chart understanding across multiple domains.
    </p>
  </div>
</div>

## **Introduction** <a id="introduction"></a>

[Cohere Command A Vision](https://huggingface.co/CohereLabs/command-a-vision-07-2025?ref=cohere-ai.ghost.io) is an open-weight vision-language model released by Cohere in July 2025. Built on Cohere's Command A model, it is optimized for enterprise image understanding tasks such as chart interpretation, document parsing, OCR, infographics, and diagram reasoning. According to [Cohere’s official announcement](https://cohere.com/blog/command-a-vision), Command A Vision performs strongly on benchmarks including ChartQA, AI2D, DocVQA, and OCRBench, matching or outperforming models such as GPT-4.1 and Mistral Medium 3.

This notebook explores the capabilities of Command A Vision on **chart understanding**. Each example passes a chart image to the model and asks it to convert the chart into a markdown table, which makes it easy to see how well the model extracts, interprets, and reasons over graphical data, including axes, legends, labels, and varied layouts across chart types. The aim is to gauge the model’s real-world potential for business intelligence, reporting automation, and visual data extraction.

# 📑 Table of Contents

1. [Introduction](#introduction)
2. [Install Libraries](#1-install-libraries)
3. [Import Libraries](#2-import-libraries)
4. [Inference](#3-inference)
   - [Example n°1](#example-n1)
   - [Example n°2](#example-n2)
   - [Example n°3](#example-n3)
   - [Example n°4](#example-n4)
   - [Example n°5](#example-n5)
   - [Example n°6](#example-n6)
   - [Example n°7](#example-n7)
   - [Example n°8](#example-n8)
   - [Example n°9](#example-n9)
   - [Example n°10](#example-n10)
   - [Example n°11](#example-n11)
   - [Example n°12](#example-n12)
   - [Example n°13](#example-n13)
   - [Example n°14](#example-n14)
   - [Example n°15](#example-n15)
5. [Conclusion](#conclusion)

# **1. Install Libraries** <a id="1-install-libraries"></a>

In [20]:
%%capture
! pip install cohere
! pip install markdown2

# **2. Import Libraries** <a id="2-import-libraries"></a>

In [21]:
# Standard library
import base64
import re

# Third-party
import cohere
import pandas as pd
from IPython.display import HTML, display

# **3. Inference** <a id="3-inference"></a>

In [22]:
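def markdown_table_to_html(markdown_text):
    """Convert the first markdown table found in `markdown_text` to an HTML table."""
    # Collect the first run of consecutive lines that look like table rows.
    table_lines = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            table_lines.append(stripped)
        elif table_lines:
            break  # the table has ended
    if len(table_lines) < 2:
        raise ValueError("No markdown table found.")

    def split_row(row):
        return [cell.strip() for cell in row.split("|")[1:-1]]

    header = split_row(table_lines[0])
    body = table_lines[1:]
    # Skip the separator row (e.g. |---|:---:|) if it follows the header.
    if body and re.fullmatch(r"\|[\s:|-]+\|", body[0]):
        body = body[1:]
    df = pd.DataFrame([split_row(row) for row in body], columns=header)
    return df.to_html(index=False, escape=False)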
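
To make the expected input concrete, here is a minimal, standard-library-only sketch of the same idea: pulling the first markdown table out of a reply that also contains surrounding prose. The function name `parse_markdown_table` and the sample `reply` string are made up for illustration; the reply is not real model output.

```python
def parse_markdown_table(text):
    """Return (header, rows) for the first markdown table found in `text`."""
    table_lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            table_lines.append(stripped)
        elif table_lines:
            break  # the table has ended
    if len(table_lines) < 2:
        raise ValueError("No markdown table found.")

    def split_row(row):
        return [cell.strip() for cell in row.split("|")[1:-1]]

    header = split_row(table_lines[0])
    body = table_lines[1:]
    # Skip the |---|---| separator row if it follows the header.
    if body and set(body[0]) <= set("|-: "):
        body = body[1:]
    return header, [split_row(row) for row in body]


# Hypothetical reply, invented for this example (not real model output).
reply = """Here is the table:

| Model | Score |
|-------|-------|
| A     | 1.5   |
| B     | 2.0   |

Let me know if you need anything else."""

header, rows = parse_markdown_table(reply)
print(header)
print(rows)
```

Returning plain lists keeps the sketch dependency-free; in the notebook the same rows are handed to pandas to produce HTML.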

In [23]:
def display_results(image_path, result):
    with open(image_path, "rb") as f:
        img_data = f.read()
    img_base64 = base64.b64encode(img_data).decode()

    # Build HTML layout
    html_content = f"""
    <style>
        .flex-container {{
            display: flex;
            gap: 20px;
            align-items: flex-start;
        }}
        .image-box {{
            flex: 1.1;
        }}
        .image-box img {{
            width: 100%;
            max-width: 650px;
            border-radius: 6px;
        }}
        .table-box {{
            flex: 1.2;
            padding: 0px;
            border-radius: 0px;
            overflow-x: auto;
        }}
        h3 {{
            margin-bottom: 10px;
        }}
    </style>

    <div class="flex-container">
        <div class="image-box">
            <h3>📄 Input Chart</h3>
            <img src="data:image/{'png' if str(image_path).lower().endswith('.png') else 'jpeg'};base64,{img_base64}"/>
        </div>
        <div class="table-box">
            <h3>📊 Extracted Table</h3>
            {result}
        </div>
    </div>
    """

    display(HTML(html_content))

In [24]:
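def generate_text(image_path, message):
    """Send the chart image and prompt to Command A Vision, then display the returned table."""
    model = "command-a-vision-07-2025"

    # Replace the placeholder with your own Cohere API key.
    co = cohere.ClientV2("*********************************")

    # Label the data URL with the MIME type that matches the file extension.
    mime_type = "image/png" if str(image_path).lower().endswith(".png") else "image/jpeg"
    with open(image_path, "rb") as img_file:
        encoded_image = base64.b64encode(img_file.read()).decode("utf-8")
    base64_image_url = f"data:{mime_type};base64,{encoded_image}"

    response = co.chat(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": message},
                    {
                        "type": "image_url",
                        "image_url": {"url": base64_image_url},
                    },
                ],
            }
        ],
        temperature=0.3,
    )

    # The reply is expected to contain a markdown table; render it next to the input image.
    html_table = markdown_table_to_html(response.message.content[0].text)
    display_results(image_path, html_table)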
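
The request above pairs a text part with an image passed as a base64 data URL. The sketch below isolates that payload construction so it can be inspected without calling the API. The helper name `build_image_message` is hypothetical (it is not part of the Cohere SDK), and picking the MIME type with `mimetypes` plus the PNG fallback are assumptions made for illustration.

```python
import base64
import mimetypes


def build_image_message(image_bytes, filename, prompt):
    """Build one user message holding a text part and a base64-encoded image part."""
    # Guess the MIME type from the file name; fall back to PNG if unknown (assumption).
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        mime_type = "image/png"
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }


# Placeholder bytes stand in for the contents of a real image file.
message = build_image_message(b"fake-image-bytes", "chart.jpg", "Describe the chart.")
print(message["content"][1]["image_url"]["url"][:23])
```

Building the message as a plain dictionary makes it easy to log or unit-test the payload before it is sent.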

In [25]:
prompt = "Convert the chart into a structured markdown table. Ensure all axes, labels, legends, and data points are accurately represented in tabular format."

## **Example n°1** <a id="example-n1"></a>

In [26]:
generate_text("/kaggle/input/charts-samples/66d400d78fa4a872b554eaf7_65ae0a296447926f5f649b4a_Screenshot20at20PM.png",
             prompt)

| **Category** | **GPT-4** | **Prometheus** |
| --- | --- | --- |
| **not consistent with score** | 2.00% | 2.86% |
| **too general and abstract** | 44.00% | 14.29% |
| **overly optimistic** | 18.00% | 34.29% |
| **not relevant to the response** | 14.00% | 11.43% |
| **overly critical** | 14.00% | 31.43% |
| **unrelated to the score rubric** | 8.00% | 5.71% |


## **Example n°2** <a id="example-n2"></a>

In [27]:
generate_text("/kaggle/input/tables-and-plots-for-testing/line_plot_cga.jpg",
             prompt)

| Year | Densité d'assurance (D) | Primes Nettes (M.D) |
| --- | --- | --- |
| 2018 | 194 | 22518 |
| 2019 | 2062 | 24144 |
| 2020 | 2178 | 25709 |
| 2021 | 2358 | 28332 |
| 2022 | 2688 | 3185 |
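
Two values in this output break the otherwise smooth trend (194 for 2018 and 3185 for 2022), which suggests digits may have been dropped, although that cannot be confirmed without comparing against the chart itself. A simple heuristic can surface such candidates automatically. The sketch below is only an illustration: the function name `flag_abrupt_changes` and the 3x threshold are assumptions, and it expects strictly positive values.

```python
def flag_abrupt_changes(labels, values, max_ratio=3.0):
    """Return labels where the value differs from the previous one by more than `max_ratio`x.

    Assumes strictly positive values. A flagged label marks a suspicious
    transition; either the previous or the current value may be the wrong one.
    """
    flagged = []
    for prev, curr, label in zip(values, values[1:], labels[1:]):
        ratio = max(prev, curr) / min(prev, curr)
        if ratio > max_ratio:
            flagged.append(label)
    return flagged


# Values copied from the extracted table above.
years = [2018, 2019, 2020, 2021, 2022]
density = [194, 2062, 2178, 2358, 2688]
premiums = [22518, 24144, 25709, 28332, 3185]

print(flag_abrupt_changes(years, density))   # → [2019]
print(flag_abrupt_changes(years, premiums))  # → [2022]
```

A flagged transition is a prompt for manual review, not proof of an extraction error, since real series can contain genuine jumps.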


## **Example n°3** <a id="example-n3"></a>

In [28]:
generate_text("/kaggle/input/charts-samples/66ae849b44c7b688d2e0d92c_66ae848de76f47280744c4f3_Eval.png",
             prompt)

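| Eval Benchmark | GPT-4o-mini | Apple DCLM 7b | Llama 3 8b | Gemini Flash | Claude Haiku | GPT-4o |
| --- | --- | --- | --- | --- | --- | --- |
| MMLU | 82.0 | 63.7 | 68.4 | 77.9 | 73.8 | 88.7 |
| GPQA | 40.2 | 24.7 | 34.2 | 38.6 | 35.7 | 53.6 |
| DROP | 79.7 | 62.2 | 62.2 | 78.4 | 78.4 | 83.4 |
| HumanEval | 87.2 | 62.2 | 62.2 | 71.5 | 75.9 | 90.2 |
| MathVista | 56.7 | 30.98 | 30.0 | 58.4 | 46.4 | 63.8 |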


## **Example n°4** <a id="example-n4"></a>

In [29]:
generate_text("/kaggle/input/charts-samples/SWE-Bench-dataset-compared-to-SWE-Bench.png",
             prompt)

| **Project** | **SWE-bench+** | **SWE-bench** |
| --- | --- | --- |
| **sympy** | 77 | 386 |
| **pytest** | 76 | 119 |
| **requests** | 46 | 2 |
| **seaborn** | 27 | 5 |
| **sphinx** | 50 | 187 |
| **xarray** | 66 | 110 |
| **pylint** | 53 | 57 |
| **astropy** | 81 | 95 |
| **flask** | 13 | 2 |
| **scikit-learn** | 78 | 229 |


## **Example n°5** <a id="example-n5"></a>

In [30]:
generate_text("/kaggle/input/charts-samples/accuracy-bar-chart.png",
             prompt)

| Model | SWE-Bench Lite Score |
| --- | --- |
| Claude 3.5 Sonnet | 28.3 |
| GPT-4o | 22.1 |
| o1-mini | 19.4 |
| Llama 3.1 405B | 16.3 |
| deepseek-v2.5 | 15.9 |
| Gemini 1.5 Pro | 15.6 |
| Llama 3.2 90B | 14.3 |
| Llama 3.1 70B | 11.6 |
| GPT-4o Mini | 7.9 |
| Qwen 2.5 72B | 7.8 |


## **Example n°6** <a id="example-n6"></a>

In [31]:
generate_text("/kaggle/input/charts-samples/context_window.png",
             prompt)

| Model | Context Window |
| --- | --- |
| **Gemini 1.5 Pro** | 1.00M |
| Claude 3 Opus | 200k |
| Claude 3 Haiku | 200k |
| GPT-4 Turbo | 128k |
| Command-R+ | 128k |
| Command-R | 128k |
| Mixtral 8x22B | 65.4k |
| Mistral Large | 32.8k |
| Mistral 8x7B | 32.8k |
| Mistral 7B | 32.8k |


## **Example n°7** <a id="example-n7"></a>

In [32]:
generate_text("/kaggle/input/charts-samples/performance.png",
             prompt)

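| Domain | Bard | GPT-4-Turbo | ReAct | Llama2-Chat-13B | New Bing | Vicuna-13B |
| --- | --- | --- | --- | --- | --- | --- |
| **History** | 0.3 | 0.4 | 0.2 | 0.3 | 0.5 | 0.3 |
| **Technology** | 0.2 | 0.5 | 0.3 | 0.2 | 0.4 | 0.2 |
| **Sports** | 0.1 | 0.4 | 0.2 | 0.1 | 0.3 | 0.1 |
| **Game** | 0.3 | 0.4 | 0.2 | 0.3 | 0.5 | 0.3 |
| **Music** | 0.2 | 0.3 | 0.1 | 0.2 | 0.4 | 0.2 |
| **Film** | 0.3 | 0.4 | 0.2 | 0.3 | 0.5 | 0.3 |
| **Economics** | 0.2 | 0.5 | 0.3 | 0.2 | 0.4 | 0.2 |
| **Others** | 0.1 | 0.3 | 0.2 | 0.1 | 0.2 | 0.1 |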


## **Example n°8** <a id="example-n8"></a>

In [33]:
generate_text("/kaggle/input/charts-samples/survey.png",
             prompt)

| Category | Surveys & Definitions | Foundations & Standards | Frameworks & Programming Models | Design & Planning | Resource Management & Provisioning | Operation |
| --- | --- | --- | --- | --- | --- | --- |
| **Infrastructure Design** | 24 | 23 | 68 | 20 | 33 | 51 |
| **Resource Analysis and Estimation** | 20 | 15 | 20 | 13 | 29 | 16 |
| **Service Provisioning (Orchestration & Migration)** | 23 | 20 | 20 | 15 | 23 | 51 |
| **Placement (VM/Service Placement)** | 25 | 20 | 20 | 15 | 23 | 35 |
| **Control and Monitoring** | 23 | 20 | 20 | 15 | 23 | 23 |
| **Software & Tools** | 20 | 20 | 20 | 15 | 20 | 20 |
| **Testbeds & Experiments** | 25 | 20 | 20 | 15 | 20 | 25 |
| **Security & Privacy** | 23 | 20 | 20 | 15 | 20 | 23 |
| **Hardware & Protocol Stack** | 23 | 20 | 20 | 15 | 20 | 23 |


## **Example n°9** <a id="example-n9"></a>

In [34]:
generate_text("/kaggle/input/charts-samples/throughput_by_model.png",
             prompt)

| Model | Output Tokens per Second |
| --- | --- |
| Llama 3 (8B) | 215 |
| Command-R | 110 |
| Mistral 8x7B | 98 |
| Mistral 7B | 93 |
| Claude 3 Haiku | 89 |
| Gemini 1.0 Pro | 81 |
| DBRX | 80 |
| Mistral 8x22B | 59 |
| GPT-3.5 Turbo | 56 |
| Llama 3 (70B) | 56 |


## **Example n°10** <a id="example-n10"></a>

In [35]:
generate_text("/kaggle/input/charts-samples/Income-Statement-Financial-Graphs-with-Income-and-Revenue-.png",
             prompt)

| Category | Increase | Decrease | Total |
| --- | --- | --- | --- |
| Gross Revenue | $245,412 | | |
| Revenue Adjustments | | $2,412 | |
| Net Revenue | | | $2,43219 |
| Inventory | | $14,899 | |
| Merchandising | | $18,731 | |
| Other sales cost | | | $103,345 |
| Gross Income | | $6,244 | |
| Staff | | $(26,745) | |
| Marketing | | $(11,279) | |
| Facilities & Insurance | | $(36,000) | |
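
The Net Revenue total in this output, `$2,43219`, is not a well-formed currency amount, which makes it a good candidate for an automated format check. The sketch below is one possible check under stated assumptions: the name `find_malformed_amounts` and the accepted format (a dollar sign, optional parentheses for negatives, and comma-separated groups of three digits) are chosen for illustration and would need adjusting for other locales or decimals.

```python
import re

# Accepts values like "$999", "$1,234", and "$(1,234)" (parentheses mark negatives).
# Every thousands group after the first must contain exactly three digits.
CURRENCY_PATTERN = re.compile(r"^\$\(?\d{1,3}(,\d{3})*\)?$")


def find_malformed_amounts(rows):
    """Return (label, value) pairs whose non-empty value is not well-formed currency."""
    malformed = []
    for label, *values in rows:
        for value in values:
            if value and not CURRENCY_PATTERN.match(value):
                malformed.append((label, value))
    return malformed


# A subset of rows copied from the table above (Increase, Decrease, Total columns).
rows = [
    ("Gross Revenue", "$245,412", "", ""),
    ("Revenue Adjustments", "", "$2,412", ""),
    ("Net Revenue", "", "", "$2,43219"),
    ("Inventory", "", "$14,899", ""),
    ("Staff", "", "$(26,745)", ""),
]

print(find_malformed_amounts(rows))  # → [('Net Revenue', '$2,43219')]
```

A format check like this only catches values that are syntactically broken; a well-formed but wrong number still has to be verified against the chart.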


## **Example n°11** <a id="example-n11"></a>

In [36]:
generate_text("/kaggle/input/charts-samples/Monthly-financial-graph-with-operating-profit-and-cost-of-goods-sold-.png",
             prompt)

| Month | Cost of Sales | Professional Fees | Marketing | Other Operating | Operating Profit |
| --- | --- | --- | --- | --- | --- |
| Jan | $15,000 | $7,000 | $5,000 | $1,000 | $28,000 |
| Feb | $30,000 | $7,000 | $5,000 | $1,000 | $40,000 |
| Mar | $20,000 | $7,000 | $5,000 | $1,000 | $30,000 |
| Apr | $15,000 | $4,000 | $3,000 | $1,000 | $22,000 |
| May | $18,000 | $4,000 | $3,000 | $1,000 | $25,000 |
| Jun | $30,000 | $10,000 | $8,000 | $2,000 | $55,000 |


## **Example n°12** <a id="example-n12"></a>

In [37]:
generate_text("/kaggle/input/charts-samples/column-chart.png",
             prompt)

| **Category** | **Value (in Millions)** | **Year/Year Change** |
| --- | --- | --- |
| **Total 2012 Revenue** | 2015.0 | - |
| **Baseline growth** | 26.5 | - |
| **Disconnects** | 36.8 | - |
| **Product termination** | 28.1 | - |
| **Volume discounts** | -12.2 | - |
| **Dispute settlements** | 167.5 | - |
| **New contract** | -17.0 | - |
| **Volume loss** | 340.0 | - |
| **New product launches** | 24.7 | - |
| **Terminate product B** | -4.8 | - |


## **Example n°13** <a id="example-n13"></a>

In [38]:
generate_text("/kaggle/input/charts-samples/progress-bar-chart-for-financial-analysis.jpg",
             prompt)

| Product | Previous Sales | Current Sales |
| --- | --- | --- |
| Ketchup | $19K | $67K |
| Soda | $11K | $52K |
| Pasta | $12K | $51K |
| Ice Cream | $14K | $43K |
| Bread | $12K | $35K |
| Cheese | $15K | $12K |
| Butter | $16K | $16K |
| Jelly | $18K | $20K |


## **Example n°14** <a id="example-n14"></a>

In [39]:
generate_text("/kaggle/input/charts-samples/scatter-plot-for-financial-graphs.jpg",
             prompt)

| Product | Category | Profit ($) | Cost ($) |
| --- | --- | --- | --- |
| TVs | Electronics | 1100 | 800 |
| Formal Trousers | Garments | 880 | 600 |
| Sweater | Garments | 880 | 600 |
| Printers | Electronics | 860 | 600 |
| Air Conditioners | Electronics | 860 | 600 |
| Bronzer | Cosmetics | 860 | 600 |
| Highlighter | Cosmetics | 860 | 600 |
| Mobile Phones | Electronics | 430 | 300 |
| Concealer | Cosmetics | 430 | 300 |
| Polo Shirts | Garments | 430 | 300 |


## **Example n°15** <a id="example-n15"></a>

In [40]:
generate_text("/kaggle/input/charts-samples/what-are-financial-graphs.jpg",
             prompt)

| Month | Revenue (USD) | Profit Margin (%) |
| --- | --- | --- |
| Jan | $50,000 | 22.0% |
| Feb | $20,000 | 16.5% |
| Mar | $15,000 | 11.0% |
| Apr | $30,000 | 16.5% |
| May | $45,000 | 11.0% |
| Jun | $48,000 | 5.5% |
| Jul | $13,000 | 5.5% |
| Aug | $18,000 | 11.0% |
| Sep | $10,000 | 5.5% |
| Oct | $50,000 | 22.0% |


# **Conclusion** <a id="conclusion"></a>

<div style="
  background-color: #EF4635;  /* vivid red from Command A Vision palette */
  padding: 25px 30px;
  border-radius: 12px;
  color: white;
  font-family: 'Segoe UI', sans-serif;
  font-size: 16px;
  line-height: 1.6;
  box-shadow: 0 4px 10px rgba(0,0,0,0.15);
  margin: 30px auto;
">
  <p>
    Across the chart types and visual scenarios tested here, 
    <strong>Cohere Command A Vision</strong> shows strong capabilities in interpreting charts and extracting their data. 
    In many cases the model successfully identifies key elements such as titles, legends, axis labels, and data points across different layouts and levels of visual complexity.
  </p>
  <p>
    However, as the examples above show, the model also has limitations. It sometimes misreads the structure of more complex or unconventional charts, 
    producing incomplete or inaccurate outputs, such as the malformed total <code>$2,43219</code> in Example 10. These failures often stem from difficulties with layout parsing, multi-axis reasoning, or cluttered visual styles.
  </p>
  <p>
    Overall, <strong>Command A Vision</strong> offers a promising foundation for chart-based visual reasoning tasks, particularly in structured and semi-structured environments. 
    Future improvements may further enhance its robustness in handling a broader variety of real-world chart formats.
  </p>
</div>