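# Working with Charts, Graphs, and Slide Decks
Claude is highly capable of working with charts, graphs, and slide decks more broadly. Depending on your use case, there are a number of tips and tricks you can take advantage of. This tutorial shows common patterns for using Claude with these materials.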

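## Charts and Graphs
In most cases, using Claude with charts and graphs is straightforward. Let's walk through how to ingest them and pass them to Claude, along with some common tips for improving results.

### Ingestion and Calling the Claude API
The best way to pass charts and graphs to Claude is to use its vision capabilities together with PDF support. That is, give Claude a PDF document containing the chart or graph, along with a text question about it.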

Currently, only `claude-sonnet-4-5` supports PDF functionality. Because the feature is still in beta, you need to supply the `pdfs-2024-09-25` beta header.

In [None]:
# Install and create the Anthropic client.
%pip install anthropic

In [2]:
import base64
from anthropic import Anthropic

# While PDF support is in beta, you must pass in the correct beta header
client = Anthropic(default_headers={"anthropic-beta": "pdfs-2024-09-25"})
# For now, only claude-sonnet-4-5 supports PDFs
MODEL_NAME = "claude-sonnet-4-5"

In [37]:
# Make a useful helper function.
def get_completion(messages):
    response = client.messages.create(
        model=MODEL_NAME, max_tokens=8192, temperature=0, messages=messages
    )
    return response.content[0].text

In [12]:
# To start, we'll need a PDF. We will be using the .pdf document located at cvna_2021_annual_report.pdf.
# Start by reading in the PDF and encoding it as base64.
with open("./documents/cvna_2021_annual_report.pdf", "rb") as pdf_file:
    binary_data = pdf_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")

Let's see how to pass this document to the model along with a simple question.

In [13]:
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": base64_string,
                },
            },
            {"type": "text", "text": "What's in this document? Answer in a single sentence."},
        ],
    }
]

print(get_completion(messages))

This is a page from Carvana's 2021 Annual Report showing four key metrics: retail units sold, total revenue, total markets at year end, and car vending machines, all displaying significant growth from 2014 to 2021.


That's great! Now let's ask some more useful questions.

In [15]:
questions = [
    "What was CVNA revenue in 2020?",
    "How many additional markets has Carvana added since 2014?",
    "What was 2016 revenue per retail unit sold?",
]

for index, question in enumerate(questions):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": base64_string,
                    },
                },
                {"type": "text", "text": question},
            ],
        }
    ]

    print(f"\n----------Question {index + 1}----------")
    print(get_completion(messages))


----------Question 1----------
According to the graph showing Total Revenue ($M), Carvana's revenue in 2020 was $5,587 million (or approximately $5.59 billion).

----------Question 2----------
According to the "TOTAL MARKETS AT YEAR END" graph, Carvana started with 4 markets in 2014 and grew to 311 markets by 2021. Therefore, Carvana added 307 markets since 2014 (311 - 4 = 307 additional markets).

----------Question 3----------
Let me calculate this for you:

2016 Revenue: $365 million
2016 Retail Units Sold: 18,761 units

$365 million ÷ 18,761 units = $19,455 per unit (rounded to nearest dollar)

So in 2016, Carvana's revenue per retail unit sold was approximately $19,455.


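As you can see, Claude can answer fairly detailed questions about charts and graphs. However, a few tips and tricks will help you get the best results.
- Claude's arithmetic can sometimes get in the way. If you sample the third question above repeatedly, you will notice that it occasionally outputs a wrong result because it botches the arithmetic. Consider giving Claude a calculator tool to guard against this kind of mistake.
- For very complex charts and graphs, you can ask Claude to "First describe every data point you see in the document" as a way to elicit improvements similar to those seen with traditional chain-of-thought.
- Claude occasionally struggles with charts that rely heavily on color to convey information, such as grouped bar charts with many groups. Asking Claude to first identify the colors in the chart using HEX codes can improve its accuracy.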
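
The tips above can be sketched in code. What follows is a minimal illustration and not part of the original notebook: `safe_calculate`, `CALCULATOR_TOOL`, and `build_chart_question` are hypothetical names, and the prompt wording is an assumption you should tune for your own charts. The tool definition follows the `name` / `description` / `input_schema` shape used by the Messages API `tools` parameter; wiring it into a full tool-use loop is left out.

```python
import ast
import operator

# Hypothetical calculator backend: evaluates plain arithmetic without eval().
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression):
    """Evaluate an expression made only of numbers and + - * / ** operators."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, etc. are rejected.
        raise ValueError(f"Unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


# Hypothetical tool definition that could be passed via the `tools` parameter.
CALCULATOR_TOOL = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression and return the result.",
    "input_schema": {
        "type": "object",
        "properties": {
            "expression": {
                "type": "string",
                "description": "For example: 365000000 / 18761",
            }
        },
        "required": ["expression"],
    },
}


def build_chart_question(question, describe_first=False, identify_colors=False):
    """Prefix a question with the optional 'describe first' and 'colors first' hints."""
    parts = []
    if describe_first:
        parts.append("First describe every data point you see in the document.")
    if identify_colors:
        parts.append("First identify the colors used in the chart by their HEX codes.")
    parts.append(question)
    return "\n\n".join(parts)
```

For instance, `round(safe_calculate("365000000 / 18761"))` gives `19455`, matching the per-unit figure in the third answer above, and `build_chart_question(question, describe_first=True)` produces text that can be dropped into the `"text"` block of a message.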

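## Slide Decks
Now that we know Claude is adept with charts and graphs, it is only logical to extend this to where charts and graphs truly live: slide decks!

Slide decks are an important source of information in many domains, including financial services. While you *can* use packages like PyPDF to extract text from slide decks, their chart- and graph-heavy nature often makes this a poor choice, since the model will struggle to access the information it actually needs.

PDF support can therefore be a great alternative. It uses both extracted text and vision when processing PDF documents. In this section, we cover how to use PDF documents with Claude to review slide decks, and how to handle common pitfalls of this approach.

The best way to get a typical slide deck into Claude is to download it as a PDF and provide it to Claude directly.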

In [17]:
# Open the multi-page PDF document the same way we did earlier.
with open("./documents/twilio_q4_2023.pdf", "rb") as pdf_file:
    binary_data = pdf_file.read()
    base_64_encoded_data = base64.b64encode(binary_data)
    base64_string = base_64_encoded_data.decode("utf-8")

In [18]:
# Now let's pass the document directly to Claude. Note that Claude will process both the text and visual elements of the document.
question = "What was Twilio y/y revenue growth for fiscal year 2023?"
content = [
    {
        "type": "document",
        "source": {"type": "base64", "media_type": "application/pdf", "data": base64_string},
    },
    {"type": "text", "text": question},
]

messages = [{"role": "user", "content": content}]

print(get_completion(messages))

According to the financial results shown in the presentation, Twilio's year-over-year revenue growth for fiscal year 2023 was 9%. This can be found in the "Total Company Results Highlights" section, which shows FY 2023 revenue growth of 9%.


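This approach is a great way to get started and, for some use cases, gives the best performance. It does have some limitations.
- You can include at most 100 pages in total across all documents provided in a request (we intend to raise this limit over time).
- If you use slide content as part of RAG, bringing multimodal PDFs into your embeddings can cause problems.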
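
One way to work within the page limit is to split a long deck into batches of pages and send each batch in its own request. The sketch below only computes the batch boundaries. `page_batches` is a hypothetical helper, its default of 100 comes from the limit stated above, and obtaining the page count and writing out the smaller PDFs (for example with a PDF library) is assumed rather than shown.

```python
def page_batches(total_pages, limit=100):
    """Return zero-based page index ranges, each covering at most `limit` pages."""
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        range(start, min(start + limit, total_pages))
        for start in range(0, total_pages, limit)
    ]


# Example: a 250-page deck becomes three batches of 100, 100, and 50 pages.
for batch in page_batches(250):
    print(f"pages {batch.start + 1}-{batch.stop}")
```

Keep in mind that each batch is processed without the context of the others, so questions that span batches need a second pass over the per-batch answers.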

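Fortunately, we can use Claude's vision capabilities to obtain a **text-based** representation of the slide deck that is of much higher quality than what ordinary PDF text extraction allows.

We have found the best approach is to have Claude narrate the deck sequentially from start to finish, passing in the current slide along with the narration so far. With PDF support, the example below does this in a single request by providing the whole deck and asking for a page-by-page narration. Let's see how.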

In [41]:
# Define a prompt for narrating our slide deck. We would adjust this prompt based on the nature of the deck, but keep the structure largely the same.
prompt = """
You are the Twilio CFO, narrating your Q4 2023 earnings presentation.

The entire earnings presentation document is provided to you.
Please narrate this presentation from Twilio's Q4 2023 Earnings as if you were the presenter. Do not talk about any things, especially acronyms, if you are not exactly sure you know what they mean.

Do not leave any details un-narrated as some of your viewers are vision-impaired, so if you don't narrate every number they won't know the number.

Structure your response like this:
<narration>
    <page_narration id=1>
    [Your narration for page 1]
    </page_narration>

    <page_narration id=2>
    [Your narration for page 2]
    </page_narration>

    ... and so on for each page
</narration>

Use excruciating detail for each page, ensuring you describe every visual element and number present. Show the full response in a single message.
"""
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "base64",
                    "media_type": "application/pdf",
                    "data": base64_string,
                },
            },
            {"type": "text", "text": prompt},
        ],
    }
]

# Now we use our prompt to narrate the entire deck. Note that this may take a few minutes to run (often up to 10).
completion = get_completion(messages)
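
The slide-by-slide variant described earlier, in which each slide is sent together with the narration produced so far, might be sketched as follows. This is an illustration under stated assumptions rather than the notebook's method: `build_slide_messages` and `narrate_sequentially` are hypothetical names, the deck is assumed to be already split into one base64-encoded single-page PDF per slide, and `complete` stands for any function with the same interface as `get_completion`.

```python
def build_slide_messages(page_b64, prior_narration, page_number):
    """Build a messages payload for one slide plus the narration so far."""
    instructions = (
        f"Narrate page {page_number} of the presentation in full detail. "
        "Describe every visual element and number on the page."
    )
    if prior_narration:
        # Give the model the earlier narration so it can stay consistent.
        instructions = (
            "Here is the narration of the previous pages:\n"
            f"<prior_narration>\n{prior_narration}\n</prior_narration>\n\n"
            + instructions
        )
    return [
        {
            "role": "user",
            "content": [
                {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": "application/pdf",
                        "data": page_b64,
                    },
                },
                {"type": "text", "text": instructions},
            ],
        }
    ]


def narrate_sequentially(pages_b64, complete):
    """Narrate slides in order, feeding each call the narration accumulated so far."""
    narrations = []
    for page_number, page_b64 in enumerate(pages_b64, start=1):
        prior = "\n\n".join(narrations)
        messages = build_slide_messages(page_b64, prior, page_number)
        narrations.append(complete(messages))
    return narrations
```

Compared with the single-request version, this trades more API calls for shorter individual responses, which may help when a full narration would otherwise exceed the output token limit.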

In [42]:
import re

# Next we'll parse the response from Claude using regex
pattern = r"<narration>(.*?)</narration>"
match = re.search(pattern, completion.strip(), re.DOTALL)
if match:
    narration = match.group(1)
else:
    raise ValueError("No narration available. Likely due to the model response being truncated.")
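
A missing `</narration>` tag usually means the response hit the output token limit. The Messages API reports this through the response's `stop_reason` field, which is `"max_tokens"` when generation was cut off. The helper below is a hypothetical sketch (`is_truncated` is not part of the notebook) that could be applied to the raw response object inside `get_completion` before its text is returned.

```python
def is_truncated(response):
    """Return True when the model stopped because it ran out of output tokens."""
    return getattr(response, "stop_reason", None) == "max_tokens"
```

Checking this flag gives a more direct signal than inferring truncation from a failed regex match.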

Now that we have a text-based narration (far from perfect, but it works well), we can use this slide deck in text-only workflows, including vector search!
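
For vector search, it can help to split the narration into one chunk per page before embedding. The following sketch assumes Claude followed the `<page_narration id=N>` structure requested in the prompt. `split_page_narrations` is a hypothetical helper, and the embedding step itself is left out.

```python
import re

# Matches <page_narration id=3> ... </page_narration>, with or without quotes around the id.
PAGE_PATTERN = re.compile(
    r"<page_narration\s+id=[\"']?(\d+)[\"']?\s*>(.*?)</page_narration>",
    re.DOTALL,
)


def split_page_narrations(narration):
    """Map each page number to its narration text, stripped of surrounding whitespace."""
    return {
        int(page_id): text.strip()
        for page_id, text in PAGE_PATTERN.findall(narration)
    }
```

Calling `split_page_narrations(narration)` on the parsed narration yields a dictionary keyed by page number, so each chunk can be embedded and later traced back to its slide.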

As a final sanity check, let's ask a few questions using our narration-only setup!

In [43]:
questions = [
    "What percentage of q4 total revenue was the Segment business line?",
    "Has the rate of growth of quarterly revenue been increasing or decreasing? Give just an answer.",
    "What was acquisition revenue for the year ended december 31, 2023 (including negative revenues)?",
]

for index, question in enumerate(questions):
    prompt = f"""You are an expert financial analyst analyzing a transcript of Twilio's earnings call.
Here is the transcript:
<transcript>
{narration}
</transcript>

Please answer the following question:
<question>
{question}
</question>"""
    messages = [{"role": "user", "content": [{"type": "text", "text": prompt}]}]

    print(f"\n----------Question {index + 1}----------")
    print(get_completion(messages))


----------Question 1----------
Let me calculate this:

Segment revenue in Q4 2023: $75 million
Total revenue in Q4 2023: $1,076 million

$75M ÷ $1,076M = 0.0697 or approximately 7%

Therefore, the Segment business line represented approximately 7% of Twilio's total Q4 2023 revenue.

----------Question 2----------
Decreasing. The transcript shows Q4 2023 revenue growth was 5% year-over-year, while for the full year 2023 revenue growth was 9% year-over-year, indicating a slowing growth rate. Additionally, the Q1 2024 guidance projects even lower growth of 2-3% year-over-year, confirming the declining trend.

----------Question 3----------
Let me help calculate the acquisition revenue for 2023.

From the transcript, we can see:
- Total revenue for 2023: $4,154 million
- Organic revenue for 2023: $4,146 million

Therefore, acquisition revenue would be:
Total Revenue - Organic Revenue = $4,154M - $4,146M = $8 million

So the acquisition revenue for the year ended December 31, 2023 was $8 m

Looks good! With these techniques, you are ready to start applying the model to chart- and graph-heavy content such as slide decks.