# __Demo: Chain of Thought Prompting with LangChain and OpenAI__


## __Steps to Perform:__
Step 1: Set up the OpenAI API Key

Step 2: Define a Function to Get Completion

Step 3: Define Your Prompts

Step 4: Analyze the Output

### __Step 1: Set up the OpenAI API Key__
- Import the required libraries and set up the OpenAI API key.


In [13]:
import os
import openai
# Optional: load the key from a local .env file (requires python-dotenv).
# from dotenv import load_dotenv, find_dotenv
# _ = load_dotenv(find_dotenv())  # read local .env file
# openai.api_key = os.getenv('OPENAI_API_KEY')


### __Step 2: Define a Function to Get Completion__
- Construct a message with the user's prompt.
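- Call the __openai.ChatCompletion.create__ method (the interface in `openai` versions before 1.0) to get a response from the model.
- The temperature parameter is set to __0__ so that responses are as repeatable as possible; the output can still vary slightly between calls.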

In [14]:
# def get_completion(prompt, model="gpt-3.5-turbo"):
#     messages = [{"role": "user", "content": prompt}]
#     response = openai.ChatCompletion.create(
#         model=model,
#         messages=messages,
#         temperature=0,
#     )
#     return response.choices[0].message["content"]
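
The commented-out function above targets the `openai.ChatCompletion` interface, which was removed in version 1.0 of the `openai` package. A minimal sketch of the same function for the 1.x client follows. It assumes the client object is passed in; the `client` parameter is an addition for illustration and is not part of the original notebook.

```python
def get_completion(prompt, client, model="gpt-3.5-turbo"):
    """Send one user prompt and return the text of the first choice.

    `client` is assumed to be an `openai.OpenAI()` instance (openai >= 1.0).
    """
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # favor repeatable output
    )
    return response.choices[0].message.content


# Usage (needs the openai package and OPENAI_API_KEY in the environment):
# from openai import OpenAI
# print(get_completion("Say hello.", OpenAI()))
```

Passing the client in, rather than creating it inside the function, keeps the function easy to exercise with a stand-in object when no API key is available.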

### __Step 3: Define Your Prompts__
- Provide a series of prompts that guide the model through a chain of thought.
- Call __get_completion__ once per prompt to get a response from the model. Each call sends only that prompt, so the model does not see earlier prompts or responses.
- Print both the prompt and the AI-generated response.

In [15]:
# prompts = [
#     "Imagine you are a detective trying to solve a mystery.",
#     "You arrive at the crime scene and start looking for clues.",
#     "You find a strange object at the crime scene. What is it?",
#     "How does this object relate to the crime?",
#     "Who do you think is the suspect and why?"
# ]

# for prompt in prompts:
#     response = get_completion(prompt)
#     print(f"Prompt: {prompt}")
#     print(f"Response: {response}")
#     print()
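
Because each call in the loop above sends a single message, the model answers every prompt without seeing the earlier ones. One way to let the steps build on each other is to keep a running message list. The sketch below assumes a hypothetical `complete` callable that takes the message list and returns the assistant's reply as a string; `run_conversation` and `echo_length` are likewise illustrative names, not part of the original notebook.

```python
def run_conversation(prompts, complete):
    """Send prompts in order, carrying earlier turns along with each request.

    `complete` is a hypothetical callable: it receives the full message list
    and returns the assistant's reply as a string.
    """
    messages = []
    responses = []
    for prompt in prompts:
        messages.append({"role": "user", "content": prompt})
        # Pass a copy so the callable cannot mutate the running history.
        reply = complete(list(messages))
        messages.append({"role": "assistant", "content": reply})
        responses.append(reply)
    return responses


# Stand-in for a real model call: reports how many messages it received.
def echo_length(messages):
    return f"saw {len(messages)} message(s)"


print(run_conversation(["First clue?", "Second clue?"], echo_length))
```

With the stand-in, the second call receives three messages (the first prompt, the first reply, and the second prompt), which is the history a real model would need to continue the detective scenario.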

In [8]:
from langchain.llms import OpenAI
llm = OpenAI()

In [10]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Example of a long document
document = """
Introduction: This is a comprehensive guide to splitting and chunking text data for large language models.
We will explore various techniques and strategies that help in managing long documents.
 
Section 1: Understanding Tokenization
Tokenization is the process of breaking down text into smaller pieces called tokens. 
These tokens could be words, characters, or subwords.

Section 2: The Need for Splitting
LLMs have limitations on the number of tokens they can process at once. 
Splitting ensures that the input text adheres to these constraints while maintaining context.

Conclusion: Effective chunking and splitting are essential for optimizing the performance of LLMs.
"""

# Split recursively, preferring paragraph, line, and word boundaries;
# chunk_size and chunk_overlap are counted in characters by default
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)

# Split the document
chunks = splitter.split_text(document)

# Display the chunks
for i, chunk in enumerate(chunks):
    print(f"Chunk {i+1}: {chunk}")


Chunk 1: Introduction: This is a comprehensive guide to splitting and chunking text data for large language
Chunk 2: for large language models.
Chunk 3: We will explore various techniques and strategies that help in managing long documents.
Chunk 4: Section 1: Understanding Tokenization
Chunk 5: Tokenization is the process of breaking down text into smaller pieces called tokens.
Chunk 6: These tokens could be words, characters, or subwords.
Chunk 7: Section 2: The Need for Splitting
Chunk 8: LLMs have limitations on the number of tokens they can process at once.
Chunk 9: Splitting ensures that the input text adheres to these constraints while maintaining context.
Chunk 10: Conclusion: Effective chunking and splitting are essential for optimizing the performance of LLMs.
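
In the output above, Chunk 2 repeats "for large language" from the end of Chunk 1, which is the overlap at work. To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` prefers to break on separators such as newlines and spaces, whereas this hypothetical `sliding_chunks` helper cuts at fixed character offsets. It only illustrates how consecutive chunks can share characters.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


for piece in sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=2):
    print(piece)
```

Each window starts `chunk_size - chunk_overlap` characters after the previous one, so the last two characters of one piece reappear at the start of the next.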


In [12]:
from langchain.embeddings import OpenAIEmbeddings

# Initialize the OpenAI embeddings model
embedding_model = OpenAIEmbeddings()

# Generate one embedding vector per chunk
chunk_embeddings = embedding_model.embed_documents(chunks)

# Display each chunk with the first 10 dimensions of its embedding
for i, (chunk, embedding) in enumerate(zip(chunks, chunk_embeddings)):
    print(f"Chunk {i+1}: {chunk}")
    print(f"Embedding {i+1} (first 10 dimensions): {embedding[:10]}")
    print()  # Print a new line for better readability


Chunk 1: Introduction: This is a comprehensive guide to splitting and chunking text data for large language
Embedding 1 (first 10 dimensions): [0.002186535500513491, 0.026265843209877716, 0.018259969433441076, -0.02005580884968156, 0.009342469675332582, 0.02226290725298192, -0.019027656725419263, -0.008821540010764371, -0.021851646403277, -0.0331201869796696]

Chunk 2: for large language models.
Embedding 2 (first 10 dimensions): [-0.018505490438297193, 0.007317708458003559, 0.0036486338955528685, -0.013524814798472376, 0.015262259377594335, 0.011460321411630333, -0.012911599448150375, 0.013701966313597635, 0.00200146823007106, -0.03510319647533471]

Chunk 3: We will explore various techniques and strategies that help in managing long documents.
Embedding 3 (first 10 dimensions): [-0.007153482990749162, 0.017582218806624198, 0.024982373753098856, -0.021145257753182455, 0.003960453374762989, 0.020898585797456955, -0.024763110620296243, 0.014772902001348991, -0.01981596988829255, -9.2073