# Extracting Structured JSON using Claude and Tool Use

**DISCLAIMER: This notebook is an AWS adaptation of the original Anthropic notebook available at [https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb](https://github.com/anthropics/anthropic-cookbook/blob/main/tool_use/extracting_structured_json.ipynb), modified to work with Amazon Bedrock and the Converse API.**

In this cookbook, we'll explore various examples of using Claude and the tool use feature to extract structured JSON data from different types of input. We'll define custom tools that prompt Claude to generate well-structured JSON output for tasks such as summarization, entity extraction, sentiment analysis, and more.

If you want structured JSON output without using tools, see Anthropic's "[How to enable JSON mode](https://github.com/anthropics/anthropic-cookbook/blob/main/misc/how_to_enable_json_mode.ipynb)" cookbook.

## Set up the environment

First, let's install the required libraries and create an Amazon Bedrock Runtime client with `boto3`.

In [None]:
%pip install -qU boto3 requests beautifulsoup4

In [20]:
import json

import boto3
import requests
from bs4 import BeautifulSoup

# Bedrock Runtime client; the model must be available in this region and enabled for your account.
client = boto3.client("bedrock-runtime", region_name="us-west-2")
MODEL_NAME = "anthropic.claude-3-haiku-20240307-v1:0"


## Example 1: Article Summarization

In this example, we'll use Claude to generate a JSON summary of an article, including fields for the author, topics, summary, coherence score, persuasion score, and a counterpoint.
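The examples in this notebook steer Claude toward a tool with a prompt line such as `Use the print_summary tool.`. The Converse API's `toolConfig` also accepts a `toolChoice` field; according to the Amazon Bedrock documentation, `{"tool": {"name": ...}}` asks the model to call that specific tool, and this option is supported for Anthropic Claude 3 models. The sketch below only assembles the request arguments, so it runs without AWS credentials. The helper `build_converse_request` and its `force_tool` flag are hypothetical names introduced for illustration, not part of boto3.

```python
import json


def build_converse_request(model_id, tool_spec, prompt, force_tool=True, max_tokens=4000):
    """Assemble keyword arguments for client.converse() without calling AWS."""
    tool_config = {"tools": [{"toolSpec": tool_spec}]}
    if force_tool:
        # Request a call to this specific tool instead of leaving the choice to the model.
        tool_config["toolChoice"] = {"tool": {"name": tool_spec["name"]}}
    return {
        "modelId": model_id,
        "inferenceConfig": {"temperature": 0, "maxTokens": max_tokens},
        "toolConfig": tool_config,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
    }


# A trimmed-down version of the print_summary spec used in this example.
summary_spec = {
    "name": "print_summary",
    "description": "Prints a summary of the article.",
    "inputSchema": {
        "json": {
            "type": "object",
            "properties": {"summary": {"type": "string"}},
            "required": ["summary"],
        }
    },
}

request = build_converse_request(
    "anthropic.claude-3-haiku-20240307-v1:0",
    summary_spec,
    "<article>\n...\n</article>\n\nUse the `print_summary` tool.",
)
print(json.dumps(request["toolConfig"]["toolChoice"]))  # {"tool": {"name": "print_summary"}}
# With a real Bedrock client: response = client.converse(**request)
```

Forcing the tool this way removes the dependence on the prompt wording, at the cost of never letting the model answer in plain text.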

In [10]:
tools = [
    {
        "toolSpec": {
            "name": "print_summary",
            "description": "Prints a summary of the article.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "author": {"type": "string", "description": "Name of the article author"},
                        "topics": {
                            "type": "array",
                            "items": {"type": "string"},
                            "description": 'Array of topics, e.g. ["tech", "politics"]. Should be as specific as possible, and can overlap.'
                        },
                        "summary": {"type": "string", "description": "Summary of the article. One or two paragraphs max."},
                        "coherence": {"type": "integer", "description": "Coherence of the article's key points, 0-100 (inclusive)"},
                        "persuasion": {"type": "number", "description": "Article's persuasion score, 0.0-1.0 (inclusive)"},
                        "counterpoint": {"type": "string", "description": "Counterpoint to the article's main argument"}
                    },
                    "required": ['author', 'topics', 'summary', 'coherence', 'persuasion', 'counterpoint']
                }
            }
        }
    }
]

url = "https://www.anthropic.com/news/third-party-testing"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
article = " ".join([p.text for p in soup.find_all("p")])

query = f"""
<article>
{article}
</article>

Use the `print_summary` tool.
"""

response = client.converse(
    modelId=MODEL_NAME,
    inferenceConfig={
        "temperature": 0,
        "maxTokens": 4000
    },
    toolConfig={"tools": tools},
    messages=[{"role": "user", "content": [{"text": query}]}]
)
json_summary = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "print_summary":
        json_summary = content['toolUse']['input']
        break

if json_summary:
    print("JSON Summary:")
    print(json.dumps(json_summary, indent=2))
else:
    print("No JSON summary found in the response.")

JSON Summary:
{
  "author": "Anthropic",
  "topics": [
    "AI policy",
    "AI safety",
    "AI testing",
    "AI regulation"
  ],
  "summary": "The article argues that the AI sector needs effective third-party testing for frontier AI systems to avoid societal harm, whether deliberate or accidental. It discusses what third-party testing looks like, why it's needed, and some of the research Anthropic has done to arrive at this policy position. The article outlines the key ingredients of an effective third-party testing regime, including identifying national security risks, and discusses how this regime could be implemented through a diverse ecosystem of testing organizations. Anthropic plans to advocate for greater funding and support for AI testing and evaluation in government, as well as developing tests for specific national security-relevant capabilities. The article also touches on issues like openly accessible models and regulatory capture in the context of AI policy.",
  "cohere
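Each example repeats the same loop over `response['output']['message']['content']`. A small helper can factor that loop out and also check `stopReason`, which the Converse API sets to `"tool_use"` when the model stops to call a tool. This is a minimal sketch: `extract_tool_input` is a hypothetical name, and `sample_response` is a hand-written dictionary shaped like a Converse reply with placeholder values, not real model output.

```python
def extract_tool_input(response, tool_name):
    """Return the input of the first toolUse block named tool_name, or None."""
    content = response.get("output", {}).get("message", {}).get("content", [])
    for block in content:
        tool_use = block.get("toolUse")
        if tool_use and tool_use.get("name") == tool_name:
            return tool_use.get("input")
    return None


# Hand-written stand-in for a Converse response; the values are placeholders.
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [
                {"text": "I'll use the print_summary tool."},
                {
                    "toolUse": {
                        "toolUseId": "tooluse_placeholder",
                        "name": "print_summary",
                        "input": {"author": "Jane Doe", "coherence": 80},
                    }
                },
            ],
        }
    },
    "stopReason": "tool_use",
}

summary = extract_tool_input(sample_response, "print_summary")
# Any other stopReason (e.g. "max_tokens") could mean the tool input was cut off or never produced.
if sample_response.get("stopReason") == "tool_use" and summary is not None:
    print(summary["author"])  # Jane Doe
else:
    print("Claude did not call print_summary.")
```

In the notebook itself, `extract_tool_input(response, "print_summary")` would replace the inline `for` loop.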

## Example 2: Named Entity Recognition
In this example, we'll use Claude to perform named entity recognition on a given text and return the entities in a structured JSON format.

In [21]:
tools = [
    {
        "toolSpec": {
            "name": "print_entities",
            "description": "Prints extracted named entities.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "entities": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string", "description": "The extracted entity name."},
                                    "type": {"type": "string", "description": "The entity type (e.g., PERSON, ORGANIZATION, LOCATION)."},
                                    "context": {"type": "string", "description": "The context in which the entity appears in the text."}
                                },
                                "required": ["name", "type", "context"]
                            }
                        }
                    },
                    "required": ["entities"]
                }
            }
        }
    }
]

text = "John works at Google in New York. He met with Sarah, the CEO of Acme Inc., last week in San Francisco."

query = f"""
<document>
{text}
</document>

Use the print_entities tool.
"""

response = client.converse(
    modelId=MODEL_NAME,
    inferenceConfig={
        "temperature": 0,
        "maxTokens": 4000
    },
    toolConfig={"tools": tools},
    messages=[{"role": "user", "content": [{"text": query}]}]
)

json_entities = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "print_entities":
        json_entities = content['toolUse']['input']
        break

if json_entities:
    print("Extracted Entities (JSON):")
    print(json.dumps(json_entities, indent=2))
else:
    print("No entities found in the response.")

Extracted Entities (JSON):
{
  "entities": [
    {
      "name": "John",
      "type": "PERSON",
      "context": "John works at Google in New York."
    },
    {
      "name": "Google",
      "type": "ORGANIZATION",
      "context": "John works at Google in New York."
    },
    {
      "name": "New York",
      "type": "LOCATION",
      "context": "John works at Google in New York."
    },
    {
      "name": "Sarah",
      "type": "PERSON",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    },
    {
      "name": "Acme Inc.",
      "type": "ORGANIZATION",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    },
    {
      "name": "San Francisco",
      "type": "LOCATION",
      "context": "He met with Sarah, the CEO of Acme Inc., last week in San Francisco."
    }
  ]
}
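The `entities` array is flat, so downstream code often regroups it by type. As a sketch, the helper below (a hypothetical name, not part of any library) collects unique names per entity type in order of first appearance. `sample_entities` copies the names and types from the output above and leaves out the `context` strings, which the helper does not use.

```python
from collections import defaultdict


def group_entities(entities):
    """Map each entity type to its unique entity names, in order of first appearance."""
    grouped = defaultdict(list)
    for entity in entities:
        names = grouped[entity["type"]]
        if entity["name"] not in names:
            names.append(entity["name"])
    return dict(grouped)


# Names and types copied from the output above; "context" omitted because it is unused here.
sample_entities = [
    {"name": "John", "type": "PERSON"},
    {"name": "Google", "type": "ORGANIZATION"},
    {"name": "New York", "type": "LOCATION"},
    {"name": "Sarah", "type": "PERSON"},
    {"name": "Acme Inc.", "type": "ORGANIZATION"},
    {"name": "San Francisco", "type": "LOCATION"},
]
print(group_entities(sample_entities))
# {'PERSON': ['John', 'Sarah'], 'ORGANIZATION': ['Google', 'Acme Inc.'], 'LOCATION': ['New York', 'San Francisco']}
```

With a live response, the call would be `group_entities(json_entities["entities"])`.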


## Example 3: Sentiment Analysis
In this example, we'll use Claude to perform sentiment analysis on a given text and return the sentiment scores in a structured JSON format.

In [23]:
tools = [
    {
        "toolSpec": {
            "name": "print_sentiment_scores",
            "description": "Prints the sentiment scores of a given text.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "positive_score": {"type": "number", "description": "The positive sentiment score, ranging from 0.0 to 1.0."},
                        "negative_score": {"type": "number", "description": "The negative sentiment score, ranging from 0.0 to 1.0."},
                        "neutral_score": {"type": "number", "description": "The neutral sentiment score, ranging from 0.0 to 1.0."}
                    },
                    "required": ["positive_score", "negative_score", "neutral_score"]
                }
            }
        }
    }
]

text = "The product was okay, but the customer service was terrible. I probably won't buy from them again."

query = f"""
<text>
{text}
</text>

Use the print_sentiment_scores tool.
"""

response = client.converse(
    modelId=MODEL_NAME,
    inferenceConfig={
        "temperature": 0,
        "maxTokens": 4000
    },
    toolConfig={"tools": tools},
    messages=[{"role": "user", "content": [{"text": query}]}]
)

json_sentiment = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "print_sentiment_scores":
        json_sentiment = content['toolUse']['input']
        break

if json_sentiment:
    print("Sentiment Analysis (JSON):")
    print(json.dumps(json_sentiment, indent=2))
else:
    print("No sentiment analysis found in the response.")

Sentiment Analysis (JSON):
{
  "negative_score": 0.7,
  "neutral_score": 0.2,
  "positive_score": 0.1
}
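The schema asks for three separate numbers and does not require them to sum to 1.0, so a caller that needs a single label has to derive one. A minimal sketch, assuming you want the highest-scoring label plus a loose sanity check on the total; `dominant_sentiment` and the `0.05` tolerance are illustrative choices, not part of the original notebook.

```python
import math


def dominant_sentiment(scores, tolerance=0.05):
    """Return (label with the highest score, whether the three scores sum to roughly 1.0)."""
    by_label = {
        "positive": scores["positive_score"],
        "negative": scores["negative_score"],
        "neutral": scores["neutral_score"],
    }
    label = max(by_label, key=by_label.get)  # ties resolve to the first label listed
    sums_to_one = math.isclose(sum(by_label.values()), 1.0, abs_tol=tolerance)
    return label, sums_to_one


# Scores copied from the output above.
sample_sentiment = {"negative_score": 0.7, "neutral_score": 0.2, "positive_score": 0.1}
print(dominant_sentiment(sample_sentiment))  # ('negative', True)
```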


## Example 4: Text Classification
In this example, we'll use Claude to classify a given text into predefined categories and return the classification results in a structured JSON format.

In [24]:
tools = [
    {
        "toolSpec": {
            "name": "print_classification",
            "description": "Prints the classification results.",
            "inputSchema": {
                "json": {
                    "type": "object",
                    "properties": {
                        "categories": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "name": {"type": "string", "description": "The category name."},
                                    "score": {"type": "number", "description": "The classification score for the category, ranging from 0.0 to 1.0."}
                                },
                                "required": ["name", "score"]
                            }
                        }
                    },
                    "required": ["categories"]
                }
            }
        }
    }
]

text = "The new quantum computing breakthrough could revolutionize the tech industry."

query = f"""
<document>
{text}
</document>

Use the print_classification tool. The categories can be Politics, Sports, Technology, Entertainment, Business.
"""

response = client.converse(
    modelId=MODEL_NAME,
    inferenceConfig={
        "temperature": 0,
        "maxTokens": 4000
    },
    toolConfig={"tools": tools},
    messages=[{"role": "user", "content": [{"text": query}]}]
)

json_classification = None
for content in response['output']['message']['content']:
    if "toolUse" in content and content['toolUse']['name'] == "print_classification":
        json_classification = content['toolUse']['input']
        break

if json_classification:
    print("Text Classification (JSON):")
    print(json.dumps(json_classification, indent=2))
else:
    print("No text classification found in the response.")

Text Classification (JSON):
{
  "categories": [
    {
      "name": "Politics",
      "score": 0.1
    },
    {
      "name": "Sports",
      "score": 0.1
    },
    {
      "name": "Technology",
      "score": 0.8
    },
    {
      "name": "Entertainment",
      "score": 0.1
    },
    {
      "name": "Business",
      "score": 0.5
    }
  ]
}
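The scores above add up to 1.6, so they behave like independent per-category scores rather than a probability distribution. One way to turn them into labels, sketched below under the assumption that a 0.5 cutoff suits your use case, is to keep every category at or above a threshold and take the first entry when a single label is needed. `select_categories` is a hypothetical helper.

```python
def select_categories(categories, threshold=0.5):
    """Return names of categories scoring at or above threshold, highest score first."""
    ranked = sorted(categories, key=lambda c: c["score"], reverse=True)
    return [c["name"] for c in ranked if c["score"] >= threshold]


# Scores copied from the output above.
sample_categories = [
    {"name": "Politics", "score": 0.1},
    {"name": "Sports", "score": 0.1},
    {"name": "Technology", "score": 0.8},
    {"name": "Entertainment", "score": 0.1},
    {"name": "Business", "score": 0.5},
]
labels = select_categories(sample_categories)
print(labels)      # ['Technology', 'Business']
print(labels[:1])  # ['Technology']
```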


These examples demonstrate how you can use Claude and the tool use feature to extract structured JSON data for various natural language processing tasks. By defining custom tools with specific input schemas, you can guide Claude to generate well-structured JSON output that can be easily parsed and utilized in your applications.
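The input schema guides Claude's output, but it is still worth checking the returned dictionary before relying on it. The sketch below is a deliberately small validator that checks only required keys and top-level JSON types; `check_against_schema` is a hypothetical helper, and `sentiment_schema` is the Example 3 schema with its descriptions removed. For full JSON Schema validation, the third-party `jsonschema` package provides `jsonschema.validate(instance, schema)`.

```python
# Python types accepted for each JSON Schema "type" keyword (subset only).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def check_against_schema(data, schema):
    """List missing required keys and top-level type mismatches; an empty list means OK."""
    problems = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in data
    ]
    for key, prop in schema.get("properties", {}).items():
        expected = JSON_TYPES.get(prop.get("type"))
        if key not in data or expected is None:
            continue
        value = data[key]
        # bool is a subclass of int in Python, so reject it for non-boolean fields.
        wrong_bool = isinstance(value, bool) and prop["type"] != "boolean"
        if wrong_bool or not isinstance(value, expected):
            problems.append(f"{key}: expected {prop['type']}, got {type(value).__name__}")
    return problems


sentiment_schema = {
    "type": "object",
    "properties": {
        "positive_score": {"type": "number"},
        "negative_score": {"type": "number"},
        "neutral_score": {"type": "number"},
    },
    "required": ["positive_score", "negative_score", "neutral_score"],
}
print(check_against_schema({"positive_score": 0.1, "negative_score": 0.7, "neutral_score": 0.2}, sentiment_schema))  # []
print(check_against_schema({"positive_score": "low", "negative_score": 0.7}, sentiment_schema))
# ['missing required field: neutral_score', 'positive_score: expected number, got str']
```

Nested objects, such as the items of the `entities` and `categories` arrays, are not inspected by this sketch.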