# Extracting Structured JSON using Claude and Tool Use

In this cookbook, we'll walk through examples of using Claude's tool use feature, called through the Amazon Bedrock Converse API, to extract structured JSON data from different types of input. We'll define custom tools that prompt Claude to generate well-structured JSON output for tasks such as summarization, entity extraction, sentiment analysis, and text classification.

If you want to get structured JSON data without using tools, take a look at our "[How to enable JSON mode](https://github.com/anthropics/anthropic-cookbook/blob/main/misc/how_to_enable_json_mode.ipynb)" cookbook.

## Set up the environment

First, let's install the required libraries and set up the Amazon Bedrock runtime client.

In [None]:
%pip install boto3 requests beautifulsoup4

In [1]:
import os
import boto3
import json
import requests
from bs4 import BeautifulSoup


# Available Claude 3 model IDs on Bedrock:
#   "anthropic.claude-3-haiku-20240307-v1:0"
#   "anthropic.claude-3-sonnet-20240229-v1:0"
#   "anthropic.claude-3-opus-20240229-v1:0"
MODEL_ID = "anthropic.claude-3-opus-20240229-v1:0"

region_name = os.getenv("AWS_REGION", default="us-west-2")
bedrock_client = boto3.client(service_name='bedrock-runtime', region_name=region_name)

## Example 1: Article Summarization

In this example, we'll use Claude to generate a JSON summary of an article, including fields for the author, topics, summary, coherence score, and persuasion score.

In [2]:
tools = [
    {
        "toolSpec": {
            "name": "print_summary",
            "description": "Prints a summary of the article.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "author": {"type": "string", "description": "Name of the article author"},
                        "topics": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": 'Array of topics, e.g. ["tech", "politics"]. Should be as specific as possible, and can overlap.'
                        },
                        "summary": {"type": "string", "description": "Summary of the article. One or two paragraphs max."},
                        "coherence": {"type": "integer", "description": "Coherence of the article's key points, 0-100 (inclusive)"},
                        "persuasion": {"type": "number", "description": "Article's persuasion score, 0.0-1.0 (inclusive)"}
                    },
                    "required": ['author', 'topics', 'summary', 'coherence', 'persuasion']
                }
            }
        }
    }
]

url = "https://www.anthropic.com/news/third-party-testing"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
article = " ".join([p.text for p in soup.find_all("p")])

query = f"""
<article>
{article}
</article>

Use the `print_summary` tool.
"""

converse_response = bedrock_client.converse(
        modelId=MODEL_ID,
        messages = [{
            "role": "user",
            "content": [{"text": query}]
        }],
        inferenceConfig = {"maxTokens": 4096},
        toolConfig = {
            'tools': tools
        }
    )

json_summary = None
for content in converse_response['output']['message']['content']:
    if content.get('toolUse') and content['toolUse']['name'] == "print_summary":
        json_summary = content['toolUse']['input']
        break

if json_summary:
    print("JSON Summary:")
    print(json.dumps(json_summary, indent=2))
else:
    print("No JSON summary found in the response.")

JSON Summary:
{
  "author": "Who is the author of this article? The author name is required to print the summary.",
  "topics": [
    "AI",
    "AI safety",
    "AI policy",
    "AI regulation"
  ],
  "summary": "This article argues for the development of a robust third-party testing regime for frontier AI systems to validate their safety and prevent deliberate misuse or accidents. The key points are:\n\n- Today's powerful AI systems pose risks if misused or poorly implemented\n- Self-governance by AI companies is insufficient \n- An ecosystem of third-party testers (government, academia, private sector) is needed to test AI systems\n- Anthropic outlines the testing approaches and policy ideas they advocate for, such as greater government funding for AI testing, evaluating AI systems through public sector research infrastructure, and developing tests for specific high-risk capabilities\n\nThe article makes a coherent and fairly persuasive case for increased oversight and testing of AI 
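
The request above relies on the prompt ("Use the `print_summary` tool.") to trigger the tool call. As a hedged sketch, the Converse API also accepts a `toolChoice` entry inside `toolConfig` that can require a specific tool; whether a given model honors it is worth checking in the Bedrock documentation. The helper name `build_tool_config` and the placeholder tool below are hypothetical and only illustrate the shape of the dictionary.

```python
def build_tool_config(tools, forced_tool_name=None):
    """Build a Converse `toolConfig` dict (hypothetical helper).

    If `forced_tool_name` is given, add a `toolChoice` entry that asks the
    model to call that specific tool instead of deciding on its own.
    """
    tool_config = {"tools": tools}
    if forced_tool_name is not None:
        known_names = {tool["toolSpec"]["name"] for tool in tools}
        if forced_tool_name not in known_names:
            raise ValueError(f"Unknown tool: {forced_tool_name}")
        tool_config["toolChoice"] = {"tool": {"name": forced_tool_name}}
    return tool_config


# Placeholder tool definition, reduced to the fields needed for the sketch.
example_tools = [
    {
        "toolSpec": {
            "name": "print_summary",
            "description": "Prints a summary of the article.",
            "inputSchema": {"json": {"type": "object", "properties": {}}},
        }
    }
]

tool_config = build_tool_config(example_tools, forced_tool_name="print_summary")
print(tool_config["toolChoice"])
```

The resulting dictionary could be passed as `toolConfig=tool_config` in the `bedrock_client.converse(...)` call in place of `{'tools': tools}`.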

## Example 2: Named Entity Recognition
In this example, we'll use Claude to perform named entity recognition on a given text and return the entities in a structured JSON format.

In [3]:
tools = [
    {
        "toolSpec": {
            "name": "print_entities",
            "description": "Prints extracted named entities.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {
                                        "type": "string",
                                        "description": "The extracted entity name."
                                    },
                                    "type": {
                                        "type": "string",
                                        "description": "The entity type (e.g., PERSON, ORGANIZATION, LOCATION)."
                                    },
                                    "context": {
                                        "type": "string",
                                        "description": "The context in which the entity appears in the text."
                                    }
                                },
                                "required": ["name", "type", "context"]
                            }
                        }
                    },
                    "required": [
                        "entities"
                    ]
                }
            }
        }
    }
]

text = "John works at Google in New York. He met with Sarah, the CEO of Acme Inc., last week in San Francisco."

query = f"""
<document>
{text}
</document>

Use the print_entities tool.
"""

converse_response = bedrock_client.converse(
        modelId=MODEL_ID,
        messages = [{
            "role": "user",
            "content": [{"text": query}]
        }],
        inferenceConfig = {"maxTokens": 4096},
        toolConfig = {
            'tools': tools
        }
    )


json_entities = None
for content in converse_response['output']['message']['content']:
    if content.get('toolUse') and content['toolUse']['name'] == "print_entities":
        json_entities = content['toolUse']['input']
        break

if json_entities:
    print("Extracted Entities (JSON):")
    print(json.dumps(json_entities, indent=2))
else:
    print("No entities found in the response.")

Extracted Entities (JSON):
{
  "entities": [
    {
      "name": "John",
      "type": "PERSON",
      "context": "John works at Google in New York."
    },
    {
      "name": "Google",
      "type": "ORGANIZATION",
      "context": "John works at Google in New York."
    },
    {
      "name": "New York",
      "type": "LOCATION",
      "context": "John works at Google in New York."
    },
    {
      "name": "Sarah",
      "type": "PERSON",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    },
    {
      "name": "Acme Inc.",
      "type": "ORGANIZATION",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    },
    {
      "name": "San Francisco",
      "type": "LOCATION",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    }
  ]
}
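
Because the tool input arrives as a plain Python dictionary, it can be post-processed directly. The following is a minimal sketch, not part of the original notebook, that groups the extracted entity names by type; `group_entities_by_type` is a hypothetical helper, and the sample data mirrors the output above with the `context` field shortened.

```python
from collections import defaultdict


def group_entities_by_type(tool_input):
    """Group entity names by their `type` field (hypothetical helper)."""
    grouped = defaultdict(list)
    for entity in tool_input.get("entities", []):
        grouped[entity["type"]].append(entity["name"])
    return dict(grouped)


# Sample shaped like the `print_entities` tool input shown above.
sample_entities = {
    "entities": [
        {"name": "John", "type": "PERSON", "context": "..."},
        {"name": "Google", "type": "ORGANIZATION", "context": "..."},
        {"name": "New York", "type": "LOCATION", "context": "..."},
        {"name": "Sarah", "type": "PERSON", "context": "..."},
        {"name": "Acme Inc.", "type": "ORGANIZATION", "context": "..."},
        {"name": "San Francisco", "type": "LOCATION", "context": "..."},
    ]
}

entities_by_type = group_entities_by_type(sample_entities)
print(entities_by_type["PERSON"])  # ['John', 'Sarah']
```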


## Example 3: Sentiment Analysis
In this example, we'll use Claude to perform sentiment analysis on a given text and return the sentiment scores in a structured JSON format.

In [4]:
tools = [
    {
        "toolSpec": {
            "name": "print_sentiment_scores",
            "description": "Prints the sentiment scores of a given text.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "positive_score": {
                            "type": "number",
                            "description": "The positive sentiment score, ranging from 0.0 to 1.0."
                        },
                        "negative_score": {
                            "type": "number",
                            "description": "The negative sentiment score, ranging from 0.0 to 1.0."
                        },
                        "neutral_score": {
                            "type": "number",
                            "description": "The neutral sentiment score, ranging from 0.0 to 1.0."
                        }
                    },
                    "required": [
                        "positive_score",
                        "negative_score",
                        "neutral_score"
                    ]
                }
            }
        }
    }
]

text = "The product was okay, but the customer service was terrible. I probably won't buy from them again."

query = f"""
<text>
{text}
</text>

Use the print_sentiment_scores tool.
"""

converse_response = bedrock_client.converse(
        modelId=MODEL_ID,
        messages = [{
            "role": "user",
            "content": [{"text": query}]
        }],
        inferenceConfig = {"maxTokens": 4096},
        toolConfig = {
            'tools': tools
        }
    )


json_sentiment = None
for content in converse_response['output']['message']['content']:
    if content.get('toolUse') and content['toolUse']['name'] == "print_sentiment_scores":
        json_sentiment = content['toolUse']['input']
        break

if json_sentiment:
    print("Sentiment Analysis (JSON):")
    print(json.dumps(json_sentiment, indent=2))
else:
    print("No sentiment analysis found in the response.")

Sentiment Analysis (JSON):
{
  "positive_score": 0.1,
  "negative_score": 0.7,
  "neutral_score": 0.2
}
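
The 0.0 to 1.0 range in this schema is stated only in the field descriptions, so it is guidance for the model rather than an enforced constraint. A minimal client-side check, assuming you want to catch out-of-range or missing values before using them, might look like the sketch below; `validate_sentiment` is a hypothetical helper name.

```python
SCORE_KEYS = ("positive_score", "negative_score", "neutral_score")


def validate_sentiment(scores):
    """Return a list of problems found in a sentiment tool input (hypothetical helper).

    An empty list means every expected score is present, numeric,
    and within the 0.0 to 1.0 range described in the tool schema.
    """
    problems = []
    for key in SCORE_KEYS:
        value = scores.get(key)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{key} is missing or not a number")
        elif not 0.0 <= value <= 1.0:
            problems.append(f"{key}={value} is outside 0.0-1.0")
    return problems


sample_scores = {"positive_score": 0.1, "negative_score": 0.7, "neutral_score": 0.2}
print(validate_sentiment(sample_scores))  # []
```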


## Example 4: Text Classification
In this example, we'll use Claude to classify a given text into predefined categories and return the classification results in a structured JSON format.

In [7]:
tools = [
    {
        "toolSpec": {
            "name": "print_classification",
            "description": "Prints the classification results.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "categories": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {
                                        "type": "string",
                                        "description": "The category name."
                                    },
                                    "score": {
                                        "type": "number",
                                        "description": "The classification score for the category, ranging from 0.0 to 1.0."
                                    }
                                },
                                "required": [
                                    "name",
                                    "score"
                                ]
                            }
                        }
                    },
                    "required": [
                        "categories"
                    ]
                }
            }
        }
    }
]

text = "The new quantum computing breakthrough could revolutionize the tech industry."

query = f"""
<document>
{text}
</document>

Use the print_classification tool. The categories can be Politics, Sports, Technology, Entertainment, Business.
"""

converse_response = bedrock_client.converse(
        modelId=MODEL_ID,
        messages = [{
            "role": "user",
            "content": [{"text": query}]
        }],
        inferenceConfig = {"maxTokens": 4096},
        toolConfig = {
            'tools': tools
        }
    )

json_classification = None
for content in converse_response['output']['message']['content']:
    if content.get('toolUse') and content['toolUse']['name'] == "print_classification":
        json_classification = content['toolUse']['input']
        break

if json_classification:
    print("Text Classification (JSON):")
    print(json.dumps(json_classification, indent=2))
else:
    print("No text classification found in the response.")


Text Classification (JSON):
{
  "categories": [
    {
      "name": "Politics",
      "score": 0.0
    },
    {
      "name": "Sports",
      "score": 0.0
    },
    {
      "name": "Technology",
      "score": 1.0
    },
    {
      "name": "Entertainment",
      "score": 0.0
    },
    {
      "name": "Business",
      "score": 0.2
    }
  ]
}
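
A common next step is to turn the per-category scores into a label. The sketch below, which is an illustration rather than part of the original notebook, ranks the categories by score and keeps those at or above a cutoff; the helper name `top_categories` and the default `threshold` of 0.5 are assumptions chosen for the example.

```python
def top_categories(tool_input, threshold=0.5):
    """Return category names with score >= threshold, highest first (hypothetical helper)."""
    ranked = sorted(
        tool_input.get("categories", []),
        key=lambda category: category["score"],
        reverse=True,
    )
    return [category["name"] for category in ranked if category["score"] >= threshold]


# Sample shaped like the `print_classification` tool input shown above.
sample_classification = {
    "categories": [
        {"name": "Politics", "score": 0.0},
        {"name": "Sports", "score": 0.0},
        {"name": "Technology", "score": 1.0},
        {"name": "Entertainment", "score": 0.0},
        {"name": "Business", "score": 0.2},
    ]
}

print(top_categories(sample_classification))  # ['Technology']
```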


These examples demonstrate how you can use Claude and the tool use feature to extract structured JSON data for various natural language processing tasks. By defining custom tools with specific input schemas, you can guide Claude to generate well-structured JSON output that can be easily parsed and utilized in your applications.
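
Each example repeats the same loop to find the tool call in the response. As a minimal sketch, that loop could be factored into one function; `extract_tool_input` is a hypothetical helper, and `mock_response` is a hand-written stand-in that assumes the same response shape the examples read from (`output` → `message` → `content` → `toolUse`).

```python
def extract_tool_input(converse_response, tool_name):
    """Return the input of the first matching tool call, or None (hypothetical helper)."""
    content_blocks = (
        converse_response.get("output", {}).get("message", {}).get("content", [])
    )
    for block in content_blocks:
        tool_use = block.get("toolUse")
        if tool_use and tool_use.get("name") == tool_name:
            return tool_use.get("input")
    return None


# Hand-written stand-in for a Converse response; no API call is made here.
mock_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "Here are the sentiment scores."},
                {
                    "toolUse": {
                        "toolUseId": "example-id",
                        "name": "print_sentiment_scores",
                        "input": {
                            "positive_score": 0.1,
                            "negative_score": 0.7,
                            "neutral_score": 0.2,
                        },
                    }
                },
            ],
        }
    }
}

print(extract_tool_input(mock_response, "print_sentiment_scores"))
```

Returning `None` when no matching tool call is present keeps the same "not found" branch the examples already handle with their `else` clause.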