# GPT Models and LangChain Integration Documentation

## Theory and Library Overview

### OpenAI GPT Models
The notebook demonstrates the use of OpenAI's GPT models through their Python API. GPT (Generative Pre-trained Transformer) models are large language models trained on vast amounts of text data. Key concepts:

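- **Completion API** (`openai.Completion.create`): Legacy endpoint for free-form text continuation, used here with `davinci-002`
- **Chat Completion API** (`openai.ChatCompletion.create`): Takes a list of role-tagged messages (system, user, assistant); used here with `gpt-3.5-turbo`
- **Temperature**: Controls sampling randomness (0 to 2); lower values give more focused, repeatable output, higher values more varied output
- **Max Tokens**: Upper limit on the number of tokens generated in the response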
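A common way to explain temperature is that the model's next-token scores (logits) are divided by the temperature before the softmax turns them into probabilities. The sketch below illustrates that idea with made-up logits; `softmax_with_temperature` is a hypothetical helper, not OpenAI's sampling code.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Values below 1 sharpen the distribution (the top token dominates);
    values above 1 flatten it (unlikely tokens get picked more often).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive; 0 is usually treated as greedy decoding")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three candidate next tokens
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass lands on the first token; at 2.0 the three options become much closer.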

### LangChain Framework
LangChain is a framework for developing applications powered by language models. It provides:

- **Document Loaders**: Tools for importing various data sources
- **Text Splitters**: Break documents into manageable chunks
- **Embeddings**: Convert text to numerical vectors for semantic search
- **Vector Stores**: Database for storing and retrieving embeddings
- **Memory**: Maintain conversation context
- **Chains**: Combine multiple components for complex workflows

## Code Documentation

Each section of the notebook is summarized below:



In [None]:
# Setting up the API
"""
This section configures the OpenAI API connection.
Requirements:
- OpenAI API key stored in myconfig.py
- openai Python package installed
"""

# Generating Text
"""
Basic text generation using GPT models.
Parameters:
- prompt: Input text to generate from
- max_tokens: Maximum number of tokens to generate
- temperature: Randomness factor (0-2; lower is more deterministic)
"""

# Text Summarization
"""
Keyword extraction using gpt-3.5-turbo via the Chat Completion API.
Features:
- System prompt defines extraction task
- Example conversations guide model behavior
- Consistent formatting of extracted keywords
"""

# Poetic Chatbot
"""
Creative text generation with personality.
Implementation:
- Uses chat completion API
- Two fixed example exchanges guide the poetic style; no conversation history is kept between calls
- temperature=1 for more varied, creative responses
"""

# Langchain Integration
"""
Advanced document QA system using Langchain.
Components:
1. Document Loading: WebBaseLoader for web scraping
2. Text Processing: RecursiveCharacterTextSplitter for chunking
3. Embedding: OpenAI embeddings for semantic search
4. Storage: FAISS vector store for efficient retrieval
5. Memory: ConversationBufferMemory for context
6. QA Chain: Combines components for interactive queries
"""



## Usage Example

Here's how to use the main functions:



In [None]:
# Text Generation
text = generate_text("Your prompt here", max_tokens=50, temperature=0.7)

# Keyword Extraction
keywords = text_summarizer("Your text here")

# Poetic Response
poem = poetic_chatbot("Your question here")

# Document QA (the chain returns a dict; the reply is under "answer")
result = qa({"question": "Your question about the documents"})
answer = result["answer"]



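## Requirements
- Python 3.8+
- openai < 1.0 (e.g. 0.28.x); the `openai.Completion` and `openai.ChatCompletion` interfaces used here were removed in openai 1.0
- langchain and langchain-community (the notebook's logs show the `langchain.*` imports resolving to `langchain_community`)
- faiss-cpu
- beautifulsoup4 (required by `WebBaseLoader`)
- tiktoken (used by `OpenAIEmbeddings` to count tokens)
- transformers
- torch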

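## Best Practices
1. Secure API key management: keep keys out of notebooks and version control (here the key is read from `myconfig.py`; the `OPENAI_API_KEY` environment variable also works)
2. Error handling for API calls: every API cell in this notebook failed with `RateLimitError`, so wrap calls in `try`/`except` and report failures clearly
3. Rate limits and quota: retry transient rate-limit errors with exponential backoff; an "exceeded your current quota" error, as seen here, will not clear on retry and requires checking plan and billing
4. Memory management for large documents: split documents into chunks before embedding, and note that `ConversationBufferMemory` passes the entire chat history to the chain on every turn, so prompts grow with the conversation
5. Temperature tuning for different use cases: 0 for the deterministic QA chain, 0.5 for keyword extraction, 1 for the poetic chatbot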
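Retrying with exponential backoff can be sketched as a small wrapper. `call_with_backoff` is a hypothetical helper, not part of the `openai` package; the commented usage assumes openai < 1.0, where the exception class is `openai.error.RateLimitError`. It only helps with transient limits, not with the exhausted-quota error this notebook hit.

```python
import random
import time


def call_with_backoff(func, retry_on, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on the given exception type(s) with exponential backoff.

    Waits base_delay * 2**attempt seconds (plus up to 10% random jitter)
    between attempts and re-raises the error after max_retries retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))


# Usage (assumes openai < 1.0):
# text = call_with_backoff(lambda: generate_text("Once upon a time", 50, 0),
#                          retry_on=openai.error.RateLimitError)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting.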

In [None]:
## Setting up the API
# Import required libraries and configure API
import openai
import myconfig
# Load API key from config file for security
api_key = myconfig.api_key
# Set the API key for OpenAI
openai.api_key = api_key

## Generating Text
# Define function to generate text using OpenAI's API
def generate_text(prompt):
    # Create completion using davinci-002 engine
    response = openai.Completion.create(
        engine="davinci-002",
        prompt=prompt,
        max_tokens=3,        # Limit response to 3 tokens
        temperature=0.3)     # Lower temperature for more focused output
    return response.choices[0].text.strip()

# Test basic text generation
prompt = "Once upon a time"
generated_text = generate_text(prompt)
print(prompt, generated_text)

## Customizing the Output
# Enhanced text generation with customizable parameters
def generate_text(prompt, max_tokens, temperature):
    # Create completion with adjustable parameters
    response = openai.Completion.create(
        engine="davinci-002",
        prompt=prompt,
        max_tokens=max_tokens,     # Adjustable response length
        temperature=temperature)    # Adjustable creativity
    return response.choices[0].text.strip()

## Summarising Text
# Function to extract keywords from text using GPT-3.5
def text_summarizer(prompt):
    # Chat Completion API: the system message sets the task and two example exchanges fix the output format
    response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[
        # System message defines the task
        {"role": "system", "content": "You will be provided with a block of text, and your task is to extract a list of keywords from it."},
        {
          "role": "user",
          "content": "A flying saucer seen by a guest house, a 7ft alien-like figure coming out of a hedge and a \"cigar-shaped\" UFO near a school yard.\n\nThese are just some of the 450 reported extraterrestrial encounters from one of the UK's largest mass sightings in a remote Welsh village.\n\nThe village of Broad Haven has since been described as the \"Bermuda Triangle\" of mysterious craft sightings and sightings of strange beings.\n\nResidents who reported these encounters across a single year in the late seventies have now told their story to the new Netflix documentary series 'Encounters', made by Steven Spielberg's production company.\n\nIt all happened back in 1977, when the Cold War was at its height and Star Wars and Close Encounters of the Third Kind - Spielberg's first science fiction blockbuster - dominated the box office."
        },
        {
          "role": "assistant",
          "content": "flying saucer, guest house, 7ft alien-like figure, hedge, cigar-shaped UFO, school yard, extraterrestrial encounters, UK, mass sightings, remote Welsh village, Broad Haven, Bermuda Triangle, mysterious craft sightings, strange beings, residents, single year, late seventies, Netflix documentary series, Steven Spielberg, production company, 1977, Cold War, Star Wars, Close Encounters of the Third Kind, science fiction blockbuster, box office."
        },
        {
          "role": "user",
          "content": "Each April, in the village of Maeliya in northwest Sri Lanka, Pinchal Weldurelage Siriwardene gathers his community under the shade of a large banyan tree. The tree overlooks a human-made body of water called a wewa – meaning reservoir or \"tank\" in Sinhala. The wewa stretches out besides the village's rice paddies for 175-acres (708,200 sq m) and is filled with the rainwater of preceding months.    \n\nSiriwardene, the 76-year-old secretary of the village's agrarian committee, has a tightly-guarded ritual to perform. By boiling coconut milk on an open hearth beside the tank, he will seek blessings for a prosperous harvest from the deities residing in the tree. \"It's only after that we open the sluice gate to water the rice fields,\" he told me when I visited on a scorching mid-April afternoon.\n\nBy releasing water into irrigation canals below, the tank supports the rice crop during the dry months before the rains arrive. For nearly two millennia, lake-like water bodies such as this have helped generations of farmers cultivate their fields. An old Sinhala phrase, \"wewai dagabai gamai pansalai\", even reflects the technology's centrality to village life; meaning \"tank, pagoda, village and temple\"."
        },
        {
          "role": "assistant",
          "content": "April, Maeliya, northwest Sri Lanka, Pinchal Weldurelage Siriwardene, banyan tree, wewa, reservoir, tank, Sinhala, rice paddies, 175-acres, 708,200 sq m, rainwater, agrarian committee, coconut milk, open hearth, blessings, prosperous harvest, deities, sluice gate, rice fields, irrigation canals, dry months, rains, lake-like water bodies, farmers, cultivate, Sinhala phrase, technology, village life, pagoda, temple."
        }, 
        {"role": "user", "content": prompt}
      ],
      temperature=0.5,
      max_tokens=256
    )
    return response.choices[0].message.content.strip()

## Poetic Chatbot
# Create a chatbot that responds in poetic form
def poetic_chatbot(prompt):
    # Use ChatCompletion with creative temperature setting
    response = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        messages = [
            # Define chatbot personality
            {"role": "system", "content": "You are a poetic chatbot."},
            {
                "role": "user",
                "content": "When was Google founded?"
            },
            {
                "role": "assistant",
                "content": "In the late '90s, a spark did ignite, Google emerged, a radiant light. By Larry and Sergey, in '98, it was born, a search engine new, on the web it was sworn."
            },
            {
                "role": "user",
                "content": "Which country has the youngest president?"
            },
            {
                "role": "assistant",
                "content": "Ah, the pursuit of youth in politics, a theme we explore. In Austria, Sebastian Kurz did implore, at the age of 31, his journey did begin, leading with vigor, in a world filled with din."
            },
            {"role": "user", "content": prompt}
        ],
        temperature = 1,        # High temperature for creative responses
        max_tokens=256         # Allow longer poetic responses
    )
    return response.choices[0].message.content.strip()

## LangChain Integration
# Set the User-Agent before importing the loaders: langchain_community checks
# the USER_AGENT environment variable when its web loader module is imported
import os
os.environ["USER_AGENT"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"

# Import required LangChain components
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

# Load the web page to query
url = "https://365datascience.com/upcoming-courses"
loader = WebBaseLoader(url)
raw_documents = loader.load()

# Split documents into chunks (defaults: chunk_size=4000 characters, chunk_overlap=200)
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(raw_documents)

# Create embeddings and vector store
embeddings = OpenAIEmbeddings(openai_api_key = api_key)
vectorstore = FAISS.from_documents(documents, embeddings)

# Set up conversation memory and QA chain
memory = ConversationBufferMemory(memory_key = "chat_history", return_messages=True)
qa = ConversationalRetrievalChain.from_llm(
    ChatOpenAI(openai_api_key=api_key, model="gpt-3.5-turbo", temperature=0), 
    vectorstore.as_retriever(), 
    memory=memory
)

## Setting up the API

In [1]:
import openai
import myconfig
#print(myconfig.api_key)

In [2]:
api_key = myconfig.api_key

In [3]:
openai.api_key = api_key

## Generating Text

In [4]:
def generate_text(prompt):
    response = openai.Completion.create(
        engine="davinci-002",
        prompt=prompt,
        max_tokens=3,
        temperature=0.3)
    
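    # Alternative for openai>=1.0 using the Responses API (not run here; it needs
    # `from openai import OpenAI` and `client = OpenAI(api_key=api_key)` first):
    # response = client.responses.create(
    #     model="gpt-5-nano",
    #     input=prompt,
    #     max_output_tokens=3,
    #     temperature=0.4)
    # return response.output_text.strip()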
    return response.choices[0].text.strip()

In [5]:
prompt = "Once upon a time"

In [6]:
generated_text = generate_text(prompt)
print(prompt, generated_text)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

## Customizing the Output

In [7]:
def generate_text(prompt, max_tokens, temperature):
    response = openai.Completion.create(
        engine="davinci-002",
        prompt=prompt,
        max_tokens=max_tokens,
        temperature=temperature)
    return response.choices[0].text.strip()

In [8]:
generated_text = generate_text(prompt, 50, 0)
print(prompt, generated_text)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

In [9]:
generated_text = generate_text(prompt, 50, 1)
print(prompt, generated_text)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

## Summarising Text

In [9]:
def text_summarizer(prompt):
    response = openai.ChatCompletion.create(
      model="gpt-3.5-turbo",
      messages=[
        {
          "role": "system",
          "content": "You will be provided with a block of text, and your task is to extract a list of keywords from it."
        },
        {
          "role": "user",
          "content": "A flying saucer seen by a guest house, a 7ft alien-like figure coming out of a hedge and a \"cigar-shaped\" UFO near a school yard.\n\nThese are just some of the 450 reported extraterrestrial encounters from one of the UK's largest mass sightings in a remote Welsh village.\n\nThe village of Broad Haven has since been described as the \"Bermuda Triangle\" of mysterious craft sightings and sightings of strange beings.\n\nResidents who reported these encounters across a single year in the late seventies have now told their story to the new Netflix documentary series 'Encounters', made by Steven Spielberg's production company.\n\nIt all happened back in 1977, when the Cold War was at its height and Star Wars and Close Encounters of the Third Kind - Spielberg's first science fiction blockbuster - dominated the box office."
        },
        {
          "role": "assistant",
          "content": "flying saucer, guest house, 7ft alien-like figure, hedge, cigar-shaped UFO, school yard, extraterrestrial encounters, UK, mass sightings, remote Welsh village, Broad Haven, Bermuda Triangle, mysterious craft sightings, strange beings, residents, single year, late seventies, Netflix documentary series, Steven Spielberg, production company, 1977, Cold War, Star Wars, Close Encounters of the Third Kind, science fiction blockbuster, box office."
        },
        {
          "role": "user",
          "content": "Each April, in the village of Maeliya in northwest Sri Lanka, Pinchal Weldurelage Siriwardene gathers his community under the shade of a large banyan tree. The tree overlooks a human-made body of water called a wewa – meaning reservoir or \"tank\" in Sinhala. The wewa stretches out besides the village's rice paddies for 175-acres (708,200 sq m) and is filled with the rainwater of preceding months.    \n\nSiriwardene, the 76-year-old secretary of the village's agrarian committee, has a tightly-guarded ritual to perform. By boiling coconut milk on an open hearth beside the tank, he will seek blessings for a prosperous harvest from the deities residing in the tree. \"It's only after that we open the sluice gate to water the rice fields,\" he told me when I visited on a scorching mid-April afternoon.\n\nBy releasing water into irrigation canals below, the tank supports the rice crop during the dry months before the rains arrive. For nearly two millennia, lake-like water bodies such as this have helped generations of farmers cultivate their fields. An old Sinhala phrase, \"wewai dagabai gamai pansalai\", even reflects the technology's centrality to village life; meaning \"tank, pagoda, village and temple\"."
        },
        {
          "role": "assistant",
          "content": "April, Maeliya, northwest Sri Lanka, Pinchal Weldurelage Siriwardene, banyan tree, wewa, reservoir, tank, Sinhala, rice paddies, 175-acres, 708,200 sq m, rainwater, agrarian committee, coconut milk, open hearth, blessings, prosperous harvest, deities, sluice gate, rice fields, irrigation canals, dry months, rains, lake-like water bodies, farmers, cultivate, Sinhala phrase, technology, village life, pagoda, temple."
        }, 
        {
          "role": "user",
          "content": prompt
        }
      ],
      temperature=0.5,
      max_tokens=256
    )
    return response.choices[0].message.content.strip()
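The few-shot examples teach the model to answer with one comma-separated line ending in a period. If the keywords are needed as a Python list, a small parser could post-process the reply. `parse_keywords` is a hypothetical helper and assumes the model follows the example format.

```python
def parse_keywords(reply):
    """Turn a reply like 'flying saucer, guest house, hedge.' into a list.

    Assumes the comma-separated, period-terminated format of the few-shot
    examples. Limitation: a keyword that itself contains a comma (such as
    '708,200 sq m' in the second example) would be split in two.
    """
    text = reply.strip()
    if text.endswith("."):
        text = text[:-1]  # drop the single trailing period
    keywords = []
    for item in text.split(","):
        item = item.strip()
        if item and item not in keywords:  # skip blanks and repeats
            keywords.append(item)
    return keywords


print(parse_keywords("April, Maeliya, northwest Sri Lanka, banyan tree."))
```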

In [10]:
prompt = "Master Reef Guide Kirsty Whitman didn't need to tell me twice. Peering down through my snorkel mask in the direction of her pointed finger, I spotted a huge male manta ray trailing a female in perfect sync – an effort to impress a potential mate, exactly as Whitman had described during her animated presentation the previous evening. Having some knowledge of what was unfolding before my eyes on our snorkelling safari made the encounter even more magical as I kicked against the current to admire this intimate undersea ballet for a few precious seconds more."
print(prompt)

Master Reef Guide Kirsty Whitman didn't need to tell me twice. Peering down through my snorkel mask in the direction of her pointed finger, I spotted a huge male manta ray trailing a female in perfect sync – an effort to impress a potential mate, exactly as Whitman had described during her animated presentation the previous evening. Having some knowledge of what was unfolding before my eyes on our snorkelling safari made the encounter even more magical as I kicked against the current to admire this intimate undersea ballet for a few precious seconds more.


In [11]:
text_summarizer(prompt)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

## Poetic Chatbot

In [12]:
def poetic_chatbot(prompt):
    response = openai.ChatCompletion.create(
        model = "gpt-3.5-turbo",
        messages = [
            {
                "role": "system",
                "content": "You are a poetic chatbot."
            },
            {
                "role": "user",
                "content": "When was Google founded?"
            },
            {
                "role": "assistant",
                "content": "In the late '90s, a spark did ignite, Google emerged, a radiant light. By Larry and Sergey, in '98, it was born, a search engine new, on the web it was sworn."
            },
            {
                "role": "user",
                "content": "Which country has the youngest president?"
            },
            {
                "role": "assistant",
                "content": "Ah, the pursuit of youth in politics, a theme we explore. In Austria, Sebastian Kurz did implore, at the age of 31, his journey did begin, leading with vigor, in a world filled with din."
            },
            {
                "role": "user",
                "content": prompt
            }
        ],
        temperature = 1,
        max_tokens=256
    )
    return response.choices[0].message.content.strip()
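`poetic_chatbot` and `text_summarizer` share one pattern: a system message, a few example user/assistant exchanges, then the real prompt. Neither keeps memory between calls; each request sends only the fixed examples plus the new prompt. A hypothetical helper such as `build_messages` could assemble that list once. This is a refactoring suggestion, not code from the original notebook.

```python
def build_messages(system_prompt, examples, prompt):
    """Assemble a Chat Completion message list for few-shot prompting.

    examples: list of (user_text, assistant_text) pairs shown to the model
    before the real prompt.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in examples:
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": prompt})
    return messages


# Shortened few-shot pair; the full pairs are in poetic_chatbot above
msgs = build_messages(
    "You are a poetic chatbot.",
    [("When was Google founded?", "In the late '90s, a spark did ignite...")],
    "When was cheese first made?",
)
# msgs could then be passed as messages=msgs to openai.ChatCompletion.create
print([m["role"] for m in msgs])
```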

In [13]:
prompt = "When was cheese first made?"
poetic_chatbot(prompt)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

In [14]:
prompt = "What is the next course to be uploaded to 365DataScience?"
poetic_chatbot(prompt)

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

## LangChain

In [15]:
from langchain.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.memory import ConversationBufferMemory
from langchain.llms import OpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

USER_AGENT environment variable not set, consider setting it to identify your requests.


In [16]:
url = "https://365datascience.com/upcoming-courses"

In [17]:
import os
os.environ["USER_AGENT"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
loader = WebBaseLoader(url)

In [18]:
raw_documents = loader.load()

In [19]:
text_splitter = RecursiveCharacterTextSplitter()
documents = text_splitter.split_documents(raw_documents)
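`RecursiveCharacterTextSplitter()` is used here with its defaults (in LangChain, `chunk_size=4000` characters and `chunk_overlap=200`). The idea is to split on the coarsest separator first (paragraphs, then lines, then words) and fall back to finer separators only for pieces that are still too long. The pure-Python sketch below illustrates that recursion in simplified form; it is not LangChain's implementation (it ignores overlap), and `recursive_split` is a hypothetical name.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first; any piece still longer than
    chunk_size is split again with the next, finer separator. Small
    neighbouring pieces are merged back up to chunk_size.
    Simplified: no chunk overlap, and separators at chunk edges are dropped.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            # Flush the chunk being built, then recurse on the oversized piece
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph here.\n\nSecond paragraph is a bit longer than the first one.\n\nThird."
for chunk in recursive_split(sample, chunk_size=30):
    print(repr(chunk))
```

In the notebook itself, smaller chunks could be requested with arguments such as `RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)`; those values are illustrative.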

In [20]:
embeddings = OpenAIEmbeddings(openai_api_key = api_key)



In [23]:
vectorstore = FAISS.from_documents(documents, embeddings)

Retrying langchain_community.embeddings.openai.embed_with_retry.<locals>._embed_with_retry in 4.0 seconds as it raised RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors..


RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.
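Because embedding failed with the quota error above, `vectorstore` was never assigned, which is why the later cells raise `NameError` for `vectorstore`, `qa`, and `result`. When it works, `FAISS.from_documents` embeds every chunk and indexes the vectors, and `vectorstore.as_retriever()` embeds the question and returns the nearest chunks (by default LangChain's FAISS wrapper uses a flat Euclidean-distance index and returns the top 4). The sketch below performs that exact nearest-neighbour search over tiny hand-written vectors standing in for real OpenAI embeddings; it does not call FAISS, and `top_k_nearest` is hypothetical.

```python
import math


def top_k_nearest(query_vec, doc_vecs, k=4):
    """Return indices of the k document vectors closest to query_vec.

    Uses exact (brute-force) Euclidean distance, the kind of search a flat
    FAISS index performs, without any indexing speed-ups.
    """
    distances = [(math.dist(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    distances.sort()  # smallest distance first; ties broken by index
    return [i for _, i in distances[:k]]


# Toy 3-dimensional "embeddings" (real OpenAI embeddings have far more dimensions)
chunks = ["placeholder chunk 1", "placeholder chunk 2", "placeholder chunk 3"]
vectors = [[0.9, 0.1, 0.0], [0.7, 0.6, 0.1], [0.0, 0.1, 0.9]]
question_vec = [0.8, 0.5, 0.0]
best = top_k_nearest(question_vec, vectors, k=2)
print([chunks[i] for i in best])
```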

In [24]:
memory = ConversationBufferMemory(memory_key = "chat_history", return_messages=True)
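`ConversationBufferMemory` keeps every past message under the key `chat_history` and, with `return_messages=True`, returns them as message objects rather than one string, so the context handed to the chain grows with each turn. The toy class below mimics that buffering with plain dictionaries; `SimpleBufferMemory` and its method names are hypothetical and do not mirror LangChain's API.

```python
class SimpleBufferMemory:
    """Toy conversation buffer: stores every turn and returns all of them."""

    def __init__(self, memory_key="chat_history"):
        self.memory_key = memory_key
        self.messages = []

    def save_turn(self, question, answer):
        # Append both sides of one exchange, oldest first
        self.messages.append({"role": "user", "content": question})
        self.messages.append({"role": "assistant", "content": answer})

    def load(self):
        # Return the whole history under the configured key (no truncation)
        return {self.memory_key: list(self.messages)}


memory_demo = SimpleBufferMemory()
memory_demo.save_turn("What is the next course?", "(answer from the chain)")
print(len(memory_demo.load()["chat_history"]))
```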

In [25]:
qa = ConversationalRetrievalChain.from_llm(ChatOpenAI(openai_api_key=api_key, 
                                                  model="gpt-3.5-turbo", 
                                                  temperature=0), 
                                           vectorstore.as_retriever(), 
                                           memory=memory)

NameError: name 'vectorstore' is not defined
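`ConversationalRetrievalChain.from_llm` wires three steps together: when there is chat history, the LLM first rewrites the follow-up question into a standalone question; the retriever fetches relevant chunks for it; and the LLM answers from those chunks. The sketch below shows that control flow with stand-in callables instead of real models. `conversational_qa`, `condense`, `retrieve`, and `answer` are hypothetical names, and the real chain's prompts and return values differ.

```python
def conversational_qa(question, history, condense, retrieve, answer):
    """One turn of a conversational retrieval QA loop.

    condense(history, question) -> standalone question (only used if history exists)
    retrieve(query) -> list of relevant text chunks
    answer(query, chunks) -> answer string
    """
    standalone = condense(history, question) if history else question
    chunks = retrieve(standalone)
    reply = answer(standalone, chunks)
    history.append((question, reply))  # remember the turn, like a buffer memory
    return reply


# Stand-ins so the flow runs without any API calls
docs = ["placeholder chunk 1", "placeholder chunk 2"]
history = []
reply = conversational_qa(
    "What is the next course?",
    history,
    condense=lambda h, q: q + " (given previous turns)",
    retrieve=lambda q: docs[:1],
    answer=lambda q, chunks: f"Based on: {chunks[0]}",
)
print(reply, len(history))
```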

In [26]:
query = "What is the next course to be uploaded on the 365DataScience platform?"

In [27]:
result = qa({"question": query})

NameError: name 'qa' is not defined

In [28]:
result["answer"]

NameError: name 'result' is not defined