# NLP Synonym Generator: NLTK with WordNet vs. Claude-3.5-Sonnet
* Notebook by Adam Lang
* Date: 2/19/2025

# Overview
* This is a head-to-head experiment comparing a static synonym generator built on WordNet and NLTK with Claude-3.5-Sonnet on AWS Bedrock.

# Demo 1 - WordNet API
* WordNet is a large lexical database of English.
  * Link: https://wordnet.princeton.edu/
* You can use the NLTK (Natural Language Toolkit) library in Python to freely access WordNet and retrieve synonyms.

## Advantages
* WordNet is used worldwide, and NLTK can also load wordnets for other languages through the Open Multilingual Wordnet.
* Easy to access via the `nltk` library.

## Disadvantages
* "Static" -- only based on what is in the current corpus (similar to a dictionary lookup).
* Lack of semantic and contextual understanding.

In [1]:
%%capture
!pip install nltk

In [2]:
from IPython.display import Markdown, display
import nltk
from nltk.corpus import wordnet

# Download WordNet data
nltk.download('wordnet', quiet=True)

True

Note:
* Below you can change the default `n=10` to any number you want.
* I printed the results in Markdown, but you can also print them as a plain list or array.

In [52]:
## keyword synonyms
def get_synonyms(word, n=10):
    synonyms = []
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            if lemma.name() != word:
                synonyms.append(lemma.name())
    return list(set(synonyms))[:n]


# Example usage with `innovation`
word = "innovation"
synonyms = get_synonyms(word)
## print list of synonyms
#print(f"Synonyms for {word}: {synonyms}")



# Print synonyms in Markdown format
display(Markdown(f"## Synonyms for {word}\nHere are some synonyms for the word **{word}**:\n\n- " + "\n- ".join(synonyms)))

## Synonyms for innovation
Here are some synonyms for the word **innovation**:

- founding
- conception
- creation
- design
- excogitation
- introduction
- foundation
- origination
- instauration
- invention

In [53]:
# Example usage with a multi-word phrase: `innovation and creativity`
word = "innovation and creativity"
synonyms = get_synonyms(word)
## print list of synonyms
#print(f"Synonyms for {word}: {synonyms}")



# Print synonyms in Markdown format
display(Markdown(f"## Synonyms for {word}\nHere are some synonyms for the word **{word}**:\n\n- " + "\n- ".join(synonyms)))

## Synonyms for innovation and creativity
Here are some synonyms for the word **innovation and creativity**:

- 

### Summary
* Note above that I passed a three-word phrase (a trigram) and WordNet returned nothing, because the phrase is not an entry in the database.
* That is fine if you can assume people will only enter one word, but what if they enter more? You will see below that Claude handles this with ease.
* We could patch the NLTK implementation by adding support for n-grams (n=2, n=3, etc.), but remember that the lookup is still static.
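
One possible shape for such an n-gram patch is sketched below. It is not part of the original notebook: `phrase_synonyms`, `toy_lookup`, and `TOY_LEXICON` are hypothetical names, and the toy lexicon only stands in for `get_synonyms` so the sketch runs without NLTK data. The idea is to try the whole phrase as an underscore-joined WordNet entry first (the form seen in lemmas such as `artificial_intelligence`), then fall back to looking up each content word separately.

```python
from typing import Callable, Dict, List


def phrase_synonyms(phrase: str, lookup: Callable[[str], List[str]]) -> Dict[str, List[str]]:
    """Try the whole phrase as one entry, then fall back to single tokens."""
    # WordNet writes multi-word entries with underscores instead of spaces.
    joined = "_".join(phrase.split())
    whole = lookup(joined)
    if whole:
        return {phrase: whole}

    # Fallback: look up each token on its own, skipping a few function words.
    # This short stopword set is an assumption for illustration only.
    stopwords = {"and", "or", "the", "of", "a", "an"}
    results: Dict[str, List[str]] = {}
    for token in phrase.split():
        if token.lower() in stopwords:
            continue
        results[token] = lookup(token)
    return results


# Toy stand-in for get_synonyms (hypothetical data, not WordNet output).
TOY_LEXICON = {
    "innovation": ["invention", "creation"],
    "creativity": ["creativeness"],
}


def toy_lookup(word: str) -> List[str]:
    return TOY_LEXICON.get(word, [])


print(phrase_synonyms("innovation and creativity", toy_lookup))
```

In the notebook you would pass `get_synonyms` as the `lookup` argument instead of `toy_lookup`. The result is still a set of per-word lookups, so it does not capture the meaning of the phrase as a whole.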

In [4]:
# 2. Example using `Respect`
word = "respect"
synonyms = get_synonyms(word)
## print list of synonyms
#print(f"Synonyms for {word}: {synonyms}")

# Print synonyms in Markdown format
display(Markdown(f"## Synonyms for {word}\nHere are some synonyms for the word **{word}**:\n\n- " + "\n- ".join(synonyms)))

## Synonyms for respect
Here are some synonyms for the word **respect**:

- deference
- respectfulness
- obedience
- honor
- regard
- prise
- honour
- observe
- prize
- esteem

In [5]:
# 3. Example using `AI`
word = "AI"
synonyms = get_synonyms(word)
## print list of synonyms
#print(f"Synonyms for {word}: {synonyms}")

# Print synonyms in Markdown format
display(Markdown(f"## Synonyms for {word}\nHere are some synonyms for the word **{word}**:\n\n- " + "\n- ".join(synonyms)))

## Synonyms for AI
Here are some synonyms for the word **AI**:

- artificial_intelligence
- Bradypus_tridactylus
- ai
- Army_Intelligence
- artificial_insemination
- three-toed_sloth

## Summary for WordNet Approach
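* This is the main weakness of a static lookup for acronyms and abbreviations like "AI": WordNet returns lemmas from every sense of the string (artificial intelligence, artificial insemination, Army Intelligence, and the three-toed sloth, which is also called an "ai") because it has no context to choose between them.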
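
The "AI" output also shows two smaller cosmetic issues: `ai` appears as a "synonym" because the comparison in `get_synonyms` is case-sensitive, and lemmas come back with underscores. In addition, `list(set(...))` does not guarantee a stable order between runs. A minimal post-processing sketch follows; `clean_lemmas` is a hypothetical helper, not part of NLTK, and it does nothing about the sense ambiguity itself.

```python
from typing import Iterable, List


def clean_lemmas(word: str, lemma_names: Iterable[str], n: int = 10) -> List[str]:
    """Drop the query word (any casing), dedupe, keep first-seen order."""
    seen = {word.lower()}
    cleaned: List[str] = []
    for name in lemma_names:
        readable = name.replace("_", " ")  # "Army_Intelligence" -> "Army Intelligence"
        key = readable.lower()
        if key in seen:
            continue  # the query word itself or a case-insensitive duplicate
        seen.add(key)
        cleaned.append(readable)
    return cleaned[:n]


# Lemma names copied from the "AI" output above.
raw = [
    "artificial_intelligence", "Bradypus_tridactylus", "ai",
    "Army_Intelligence", "artificial_insemination", "three-toed_sloth",
]
print(clean_lemmas("AI", raw))
```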

# Demo 2 - Claude on AWS Bedrock

## Advantages of using Claude

1. **Context-Aware**
    * Claude can understand context and provide synonyms that are more relevant to the intended meaning of the word.
    * Bedrock foundation models like Claude can also take multi-word inputs (n-grams with n > 1) and still return single-word synonyms. We could modify the prompt as needed.

2. **Flexibility**
    * It can handle a wide range of words, including less common terms, technical jargon, and even phrases.

3. **Up-to-date**
    * Newer Claude model versions are trained on more recent data, so switching to a newer model can provide synonyms for newer terms or evolving language usage.

4. **Customizable**
    * You can adjust the prompt to get more specific results, e.g., asking for formal synonyms, colloquial synonyms, or synonyms in a particular field.

5. **Scalable**
    * Amazon Bedrock is designed to handle large-scale applications, so this solution can easily scale with your needs.
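
As a rough illustration of the "Customizable" point, the prompt can be assembled from optional constraints. This is a sketch only: `build_synonym_prompt` and its `register` and `domain` parameters are hypothetical names, and the wording simply extends the prompt used in the class further down.

```python
from typing import Optional


def build_synonym_prompt(word: str, n: int = 5,
                         register: Optional[str] = None,
                         domain: Optional[str] = None) -> str:
    """Assemble a synonym prompt with optional style and domain constraints."""
    qualifier = f"{register} " if register else ""  # e.g. "formal ", "colloquial "
    parts = [f'Generate {n} {qualifier}synonyms for the word or phrase "{word}"']
    if domain:
        parts.append(f"as it is used in the field of {domain}")
    instruction = " ".join(parts) + "."
    return (
        instruction
        + "\nProvide only the synonyms as a comma-separated list, "
        + "without any additional text or explanation."
    )


print(build_synonym_prompt("AI", n=5, register="formal", domain="computer science"))
```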


## Disadvantages of using Claude
* This method may incur costs based on AWS Bedrock usage. However, if you calculate the input and output tokens per request and the cost per token, it's probably minimal for short synonym prompts.
* The quality of synonyms can vary and may sometimes include related words rather than strict synonyms. You might want to add some post-processing to filter results if needed.
* Response time might be slightly longer compared to dictionary-based methods, but it offers more flexibility and potentially better contextual understanding.
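
The cost arithmetic mentioned above can be sketched as follows. The prices here are placeholders, not real Bedrock rates, so substitute the current figures from the AWS pricing page; the 40 input tokens are also an assumption, while 150 is the `max_tokens` cap used later in this notebook. The Claude response body on Bedrock should include a `usage` object with `input_tokens` and `output_tokens` that you can use in place of estimates.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Cost of one request given per-1,000-token prices for input and output."""
    return (input_tokens / 1000) * price_in_per_1k + (output_tokens / 1000) * price_out_per_1k


# PLACEHOLDER prices in USD per 1,000 tokens. These are NOT real Bedrock rates.
PRICE_IN_PER_1K = 0.001
PRICE_OUT_PER_1K = 0.002

# Assumed sizes: a short prompt (~40 tokens) and the worst case of 150 output tokens.
per_call = estimate_cost_usd(40, 150, PRICE_IN_PER_1K, PRICE_OUT_PER_1K)
print(f"Estimated cost per call: {per_call:.6f} USD")
print(f"Estimated cost per 10,000 calls: {per_call * 10_000:.2f} USD")
```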

In [6]:
%%capture
!pip install boto3

In [28]:
import boto3
import json
import asyncio
import random
from botocore.exceptions import ClientError
import nest_asyncio
from IPython.display import display, Markdown

## allow nested event loops so async Bedrock calls can run inside the notebook
nest_asyncio.apply()

In [8]:
# Set up Bedrock client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='<your AWS region here>'  # Replace with your region
)

In [None]:
## init boto session
session = boto3.Session()
credentials = session.get_credentials()
print(f"Access Key: ...{credentials.access_key[-4:]}")  # show only the last 4 characters
print(f"Secret Key: {'*' * len(credentials.secret_key)}")
print(f"Region: {session.region_name}")

## Class to run API call to Claude on AWS Bedrock
* Notes:
1. You can change `n`; right now I have it set to 5. To change it, edit this method in the class:
   * `async def get_synonyms_claude(self, word, n=5):`
2. I set the model `temperature` to 0.5 to introduce some randomness. Move it closer to 0 for more deterministic output, or closer to 1 for more diverse output.
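
To experiment with `temperature` without editing the class each time, the request body can be built by a small function. This is a sketch under assumptions: `build_request_body` is a hypothetical helper, the field names mirror the body used in the class below, and `top_p` is left out to keep the example small.

```python
import json


def build_request_body(prompt: str, temperature: float = 0.5, max_tokens: int = 150) -> str:
    """Return the JSON body for a Claude messages request on Bedrock."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,  # near 0: more deterministic, near 1: more diverse
    })


# Compare a near-deterministic body with a more diverse one.
for t in (0.0, 0.9):
    body = json.loads(build_request_body('Generate 5 synonyms for the word "respect".', temperature=t))
    print(t, body["temperature"], body["max_tokens"])
```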

In [44]:
class SynonymsWithClaude:
    def __init__(self, region_name='<your AWS region here>'):
        self.bedrock = boto3.client('bedrock-runtime', region_name=region_name)
        self.model_id = 'anthropic.claude-3-5-sonnet-20240620-v1:0'

    async def invoke_with_retry(self, body, max_retries=5, initial_delay=1):
        for attempt in range(max_retries):
            try:
                response = await asyncio.to_thread(
                    self.bedrock.invoke_model,
                    modelId=self.model_id,
                    body=body
                )
                return response
            except ClientError as e:
                if e.response['Error']['Code'] == 'ThrottlingException':
                    # Exponential backoff with jitter, scaled by initial_delay
                    delay = initial_delay * (2 ** attempt) + random.uniform(0, 1)
                    print(f"Request throttled. Retrying in {delay:.2f} seconds...")
                    await asyncio.sleep(delay)
                else:
                    raise
        raise Exception("Max retries reached")

    async def get_synonyms_claude(self, word, n=5):
        prompt = f"""Generate {n} synonyms for the word "{word}".
        Provide only the synonyms as a comma-separated list, without any additional text or explanation."""

        body = json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 150,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.5,
            "top_p": 1,
        })

        try:
            response = await self.invoke_with_retry(body)
            response_body = json.loads(response['body'].read())
            text = response_body['content'][0]['text']
            # Split on commas and strip whitespace so "a,b" and "a, b" both parse; normalize capitalization
            synonyms = [syn.strip().lower() for syn in text.split(',') if syn.strip()]
            return synonyms[:n]  # Ensure we return at most n synonyms
        except Exception as e:
            print(f"Error in API call: {str(e)}")
            return []

    async def display_synonyms(self, word):
        synonyms = await self.get_synonyms_claude(word)
        markdown_output = f"## Synonyms for \"{word}\"\n\n" + "\n".join(f"- {synonym}" for synonym in synonyms)
        display(Markdown(markdown_output))
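
Because the class methods are coroutines, several words can be looked up concurrently rather than one cell at a time. The sketch below is an assumption about how that might look, not something the notebook does: `synonyms_for_many` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for `get_synonyms_claude` so the example runs without AWS credentials. The semaphore caps the number of in-flight requests, which may help avoid throttling.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def synonyms_for_many(words: List[str],
                            fetch: Callable[[str], Awaitable[List[str]]],
                            max_concurrency: int = 3) -> Dict[str, List[str]]:
    """Fetch synonyms for many words concurrently, with a cap on parallel calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(word: str) -> List[str]:
        async with semaphore:  # at most max_concurrency requests at once
            return await fetch(word)

    # gather returns results in the same order as the input words
    results = await asyncio.gather(*(fetch_one(w) for w in words))
    return dict(zip(words, results))


async def fake_fetch(word: str) -> List[str]:
    """Stand-in for SynonymsWithClaude.get_synonyms_claude (no network call)."""
    await asyncio.sleep(0.01)
    return [f"{word}-synonym"]


print(asyncio.run(synonyms_for_many(["innovation", "respect", "AI"], fake_fetch)))
```

In the notebook, with an instance created as in the next cell, this would be `await synonyms_for_many(words, synonym_generator.get_synonyms_claude)` rather than `asyncio.run(...)`.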



In [46]:
# Create an instance of the class
synonym_generator = SynonymsWithClaude()

## Example 1 - "innovation"
word = "innovation"
await synonym_generator.display_synonyms(word)

## Synonyms for "innovation"

- advancement
- breakthrough
- invention
- modernization
- novelty

In [47]:
## Example #2 -- "Respect"
word = "respect"
await synonym_generator.display_synonyms(word)

## Synonyms for "respect"

- admiration
- esteem
- regard
- reverence
- deference

In [49]:
## Example #3 -- "AI"
word = "AI"
await synonym_generator.display_synonyms(word)

## Synonyms for "AI"

- artificial intelligence
- machine intelligence
- cognitive computing
- smart technology
- intelligent systems

In [51]:
## Example #4 -- "innovation and creativity"
word = "innovation and creativity"
await synonym_generator.display_synonyms(word)

## Synonyms for "innovation and creativity"

- ingenuity
- originality
- inventiveness
- imagination
- resourcefulness

## Summary for Claude 
* We can see that "AI" was handled better here: an LLM interprets the term from the usage it learned in training and picks the most likely sense, whereas a static lexical database like WordNet returns every sense it has stored.