In [1]:
!pip install \
    "pinecone[grpc]" \
    "langchain-pinecone" \
    "langchain-openai" \
    "langchain-text-splitters" \
    "langchain"
!pip install cohere


In [2]:
markdown_document = '''The Technical Landscape of OpenAI: A Deep Dive into GPT-3 and Beyond
OpenAI, founded in December 2015, has rapidly transformed the landscape of artificial intelligence (AI) and natural language processing (NLP). With its flagship model, Generative Pre-trained Transformer 3 (GPT-3), OpenAI has pushed the boundaries of what is possible in machine learning, enabling a myriad of applications ranging from conversational agents to creative writing tools. This essay delves into the technical intricacies of GPT-3, exploring its architecture, training methodologies, capabilities, and implications for the future of AI.

Architecture of GPT-3
At the core of GPT-3 is the transformer architecture, which was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. This architecture revolutionized NLP by employing a mechanism called self-attention, which allows the model to weigh the significance of different words in a sentence relative to each other. GPT-3 contains 175 billion parameters, making it one of the largest language models to date. This sheer size enables the model to capture a vast range of language patterns and contextual relationships.

The transformer consists of an encoder and a decoder; however, GPT-3 utilizes only the decoder part. The architecture processes text in chunks, converting each token (a word or a part of a word) into vectors through embedding layers. The self-attention mechanism then computes the relationships between these tokens, allowing the model to generate coherent and contextually relevant text based on the input it receives.

Training Methodology
GPT-3’s training involved a two-step process: pre-training and fine-tuning. During the pre-training phase, the model was exposed to diverse text data from the internet, including books, articles, and websites. This unsupervised learning process involved predicting the next word in a sentence given the preceding context, a task referred to as language modeling. Through this method, GPT-3 learned grammar, facts about the world, and even some reasoning abilities.

After pre-training, GPT-3 underwent a fine-tuning phase where it was trained on specific tasks with labeled data. This phase allowed the model to adjust its parameters to better align with the desired outputs for various applications. OpenAI adopted a unique approach to fine-tuning, leveraging few-shot, one-shot, and zero-shot learning. This means that GPT-3 can perform tasks with minimal examples, making it highly adaptable to new situations without extensive retraining.

Capabilities and Applications
The capabilities of GPT-3 are vast and varied. One of its most notable features is its ability to generate human-like text that is contextually appropriate. This has led to its application in numerous fields, including:

Conversational Agents: GPT-3 powers chatbots that can engage users in meaningful conversations, providing assistance, answering questions, and even simulating personalities.

Content Creation: The model can generate creative content, including articles, poetry, and even programming code, reducing the time and effort required for writers and developers.

Educational Tools: GPT-3 can serve as a tutor, providing explanations and insights on complex subjects, thereby enhancing the learning experience.

Data Analysis: The model can summarize and analyze large volumes of text data, making it a valuable tool for researchers and analysts.

Despite its impressive capabilities, GPT-3 is not without limitations. It can produce incorrect or biased information and may struggle with tasks requiring deep understanding or reasoning. OpenAI has acknowledged these issues and is actively researching ways to mitigate them.

Ethical Considerations and Future Implications
As with any powerful technology, the deployment of GPT-3 raises ethical concerns. The potential for misuse—such as generating misleading information, deep fakes, or even automated propaganda—highlights the importance of responsible AI use. OpenAI has implemented guidelines for developers and users to ensure ethical applications of its models.

Looking ahead, the future of OpenAI and its technologies is both exciting and challenging. OpenAI is continuously working on improving the robustness and reliability of its models, with an emphasis on safety and alignment with human values. The organization envisions a future where AI systems can be collaborative partners, assisting humans in complex decision-making and creative processes.

In conclusion, OpenAI’s GPT-3 represents a significant advancement in the field of natural language processing. Its transformer architecture, extensive training methodologies, and wide-ranging capabilities illustrate the potential of AI to augment human creativity and productivity. However, as we embrace these advancements, it is crucial to remain vigilant about the ethical implications and ensure that AI technologies are developed and used responsibly. The journey of AI is just beginning, and the possibilities are boundless as we continue to explore and innovate in this dynamic field.
Certainly! Let’s dive deeper into various aspects of OpenAI and its technologies, particularly focusing on GPT-3, its architecture, training methods, applications, challenges, and future developments.

### 1. **Architecture of GPT-3**

#### Transformer Model
The **transformer architecture** is the foundation of GPT-3. Here’s a more detailed breakdown:

- **Self-Attention Mechanism**: This is the key innovation that allows the model to weigh the importance of different words when processing a sentence. Each word in the input interacts with every other word, enabling the model to build a contextual understanding. This is done through three main components:
  - **Queries**: Represent the current token.
  - **Keys**: Represent all tokens in the sequence.
  - **Values**: Also represent all tokens in the sequence but are weighted by the attention scores derived from queries and keys.

- **Positional Encoding**: Since transformers process tokens in parallel rather than sequentially, positional encoding is added to token embeddings to give the model information about the position of each word in the sequence. This helps retain the order of words.

- **Layer Normalization**: Each layer of the transformer includes normalization processes that help stabilize and speed up the training.

#### Scale of GPT-3
GPT-3’s 175 billion parameters dwarf its predecessor, GPT-2, which had 1.5 billion parameters. This significant increase in scale allows GPT-3 to capture more nuanced language patterns and a wider variety of contexts, leading to its superior performance on various tasks.

### 2. **Training Methodology**

#### Pre-training
During pre-training, GPT-3 is exposed to a diverse dataset that includes text from books, articles, and websites. The model learns through the process of:

- **Next-Word Prediction**: Given a sequence of words, the model predicts the next word. This is a self-supervised learning approach, where the model is trained without explicit labels, making it highly scalable.

#### Fine-tuning
While GPT-3 is often used in a zero-shot, one-shot, or few-shot learning setting, the fine-tuning phase is crucial for tailoring the model to specific applications. In this phase:

- **Supervised Learning**: The model is trained on smaller, labeled datasets to adjust its parameters for particular tasks.
- **Task Adaptation**: Fine-tuning allows the model to become proficient in specific tasks, enhancing its performance and relevance in various contexts.

### 3. **Capabilities and Applications**

#### Versatile Use Cases
The flexibility of GPT-3 has led to its application in multiple domains:

- **Conversational AI**: Beyond chatbots, GPT-3 powers virtual assistants, customer support systems, and more. Its ability to maintain context over multiple exchanges is critical for effective communication.

- **Content Generation**: Writers, marketers, and content creators use GPT-3 for generating articles, advertisements, social media posts, and scripts. Its ability to mimic different writing styles and tones is particularly beneficial.

- **Programming Assistance**: Developers use GPT-3 to generate code snippets, debug existing code, and even automate routine tasks in software development.

- **Language Translation and Summarization**: The model can translate languages and summarize lengthy documents, making it a useful tool for global communication and information management.

- **Education and Training**: Educational tools powered by GPT-3 can offer personalized learning experiences, quizzes, and explanations tailored to individual student needs.

### 4. **Challenges and Limitations**

While GPT-3 has shown remarkable capabilities, it faces several challenges:

- **Bias**: The model can inadvertently learn and replicate biases present in the training data. OpenAI acknowledges this risk and is actively researching methods to minimize it.

- **Inaccuracy**: GPT-3 can produce plausible-sounding but factually incorrect information. It lacks true understanding and can generate text that is coherent yet misleading.

- **Context Limitations**: While GPT-3 can handle contextual information well, it can struggle with maintaining coherence over very long texts or complex logical reasoning.

### 5. **Ethical Considerations**

The power of GPT-3 raises ethical concerns regarding its deployment:

- **Misinformation**: The ease of generating convincing text can lead to the creation of misleading articles, deepfakes, and other forms of misinformation.

- **Security Risks**: Malicious actors could exploit GPT-3 to create phishing emails, scams, or propaganda, raising concerns about the security implications of such technologies.

- **Accountability**: As AI systems take on more responsibilities, questions arise regarding accountability and transparency in their decisions and outputs.

### 6. **Future Developments**

OpenAI is committed to improving the safety and reliability of its models. Some areas of focus for future developments include:

- **Alignment with Human Values**: Research is ongoing to align AI systems with human intentions and ethics, ensuring they serve beneficial purposes without unintended consequences.

- **Robustness**: OpenAI aims to enhance the robustness of its models against adversarial inputs and reduce their susceptibility to generating biased or harmful content.

- **Interactivity and Integration**: Future iterations of models may incorporate more interactive capabilities, allowing them to engage in more sophisticated dialogues and understand user intentions better.

- **Exploration of Multimodal AI**: OpenAI is exploring multimodal AI systems that integrate text, images, and possibly audio, enabling richer interactions and more comprehensive understanding.

In conclusion, OpenAI's advancements with GPT-3 signify a landmark achievement in the field of AI and NLP. As the technology evolves, the focus will increasingly shift toward responsible deployment, ethical considerations, and enhancing human-AI collaboration. The future of AI holds tremendous promise, and OpenAI is at the forefront of this transformative journey.'''



In [3]:
from google.colab import userdata
api_key_pinecone = userdata.get('PINECONE_API_KEY')
api_key_openai = userdata.get('GRAPHRAG_API_KEY')

In [4]:
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec

pc = Pinecone(api_key=api_key_pinecone)
index_name = "docs-rag-chatbot"

# Create the serverless index once; the dimension must match the
# 768-dimensional vectors produced by jina-embeddings-v2-base-en.
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=768,
        metric="cosine",
        spec=ServerlessSpec(
            cloud="aws",
            region="us-east-1"
        )
    )


In [5]:
!pip install -U langchain-community
!pip install sentence-transformers


In [6]:
# Load the Jina AI embedding model and its tokenizer; they are used below to embed the GPT-3 corpus
from transformers import AutoModel
from transformers import AutoTokenizer

# load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)
model = AutoModel.from_pretrained('jinaai/jina-embeddings-v2-base-en', trust_remote_code=True)




In [7]:
# Late chunking: mean-pool the token embeddings inside each span to get one vector per chunk.
def late_chunking(
    model_output: 'BatchEncoding', span_annotation: list, max_length=None
):
    token_embeddings = model_output[0]
    outputs = []
    for embeddings, annotations in zip(token_embeddings, span_annotation):
        if (
            max_length is not None
        ):  # remove annotations which go beyond the max-length of the model
            annotations = [
                (start, min(end, max_length - 1))
                for (start, end) in annotations
                if start < (max_length - 1)
            ]
        pooled_embeddings = [
            embeddings[start:end].sum(dim=0) / (end - start)
            for start, end in annotations
            if (end - start) >= 1
        ]
        pooled_embeddings = [
            embedding.detach().cpu().numpy() for embedding in pooled_embeddings
        ]
        outputs.append(pooled_embeddings)

    return outputs
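
To make the pooling step concrete, here is a minimal sketch in plain Python (no torch). It assumes token embeddings are given as a list of equal-length float lists and that spans are half-open `(start, end)` token index pairs. The helper name `pool_spans` is hypothetical and only mirrors the averaging idea in `late_chunking`; it is not part of the notebook's pipeline.

```python
def pool_spans(token_embeddings, spans):
    """Average the token vectors inside each half-open (start, end) span."""
    pooled = []
    for start, end in spans:
        window = token_embeddings[start:end]
        if not window:  # span lies outside the sequence: nothing to pool
            continue
        dim = len(window[0])
        # Mean over the tokens that are actually present in the window
        pooled.append([sum(vec[d] for vec in window) / len(window) for d in range(dim)])
    return pooled


# Four toy "tokens" with 2-dimensional embeddings
tokens = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
chunk_vectors = pool_spans(tokens, [(0, 2), (2, 4), (10, 12)])
print(chunk_vectors)  # [[2.0, 1.0], [6.0, 5.0]]
```

One difference from `late_chunking` is worth noting: the sketch skips a span that falls entirely outside the sequence, whereas summing an empty tensor slice yields an all-zero vector. That is consistent with the all-zero check applied before upserting later in the notebook.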

In [8]:
# Split the corpus into chunks with the Jina tokenizer API; the returned spans are used later for late chunking
import requests

def chunk_by_tokenizer_api(input_text: str, tokenizer: callable):
    # Define the API endpoint and payload
    url = 'https://tokenize.jina.ai/'
    payload = {
        "content": input_text,
        "return_chunks": "true",
        "max_chunk_length": "700"
    }

    # Make the API request and fail early on HTTP errors
    response = requests.post(url, json=payload)
    response.raise_for_status()
    response_data = response.json()

    # Extract chunks and positions from the response
    chunks = response_data.get("chunks", [])
    chunk_positions = response_data.get("chunk_positions", [])

    # Adjust chunk positions to match the input format
    span_annotations = [(start, end) for start, end in chunk_positions]

    return chunks, span_annotations

In [9]:
# Determine the chunks and wrap each one in a LangChain Document
from langchain_core.documents import Document
chunks, span_annotations = chunk_by_tokenizer_api(markdown_document, tokenizer)
print('Chunks:\n- "' + '"\n- "'.join(chunks) + '"')
documents = [Document(page_content=chunk, metadata={"chunk_index": i}) for i, chunk in enumerate(chunks)]


Chunks:
- "The Technical Landscape of OpenAI: A Deep Dive into GPT-3 and Beyond
"
- "OpenAI, founded in December 2015, has rapidly transformed the landscape of artificial intelligence (AI) and natural language processing (NLP). With its flagship model, Generative Pre-trained Transformer 3 (GPT-3), OpenAI has pushed the boundaries of what is possible in machine learning, enabling a myriad of applications ranging from conversational agents to creative writing tools. This essay delves into the technical intricacies of GPT-3, exploring its architecture, training methodologies, capabilities, and implications for the future of AI.

"
- "Architecture of GPT-3
"
- "At the core of GPT-3 is the transformer architecture, which was introduced in the paper "Attention is All You Need" by Vaswani et al. in 2017. This architecture revolutionized NLP by employing a mechanism called self-attention, which allows the model to weigh the significance of different words in a sentence relative to each other. 

In [10]:
len(documents)

64

In [11]:
# chunk afterwards (context-sensitive chunked pooling)
import torch

inputs = tokenizer(markdown_document, return_tensors='pt')
with torch.no_grad():  # inference only, no gradients needed
    model_output = model(**inputs)
embeddings = late_chunking(model_output, [span_annotations])[0]
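
The spans recorded in this notebook's outputs, such as `(0, 69)` for the 69-character title line, look like character offsets, while `late_chunking` slices token embeddings by index. If that reading is right, the spans need converting to token indices first. The following is a minimal sketch of such a conversion, assuming per-token character offsets are available (for example from a fast Hugging Face tokenizer called with `return_offsets_mapping=True`). The helper name `char_spans_to_token_spans` is hypothetical.

```python
def char_spans_to_token_spans(offsets, char_spans):
    """Map character (start, end) spans to half-open token index spans.

    offsets: one (char_start, char_end) pair per token; special tokens
             are assumed to carry the empty pair (0, 0).
    Spans that overlap no token are dropped, so callers that need to keep
    chunks and spans aligned should check the lengths afterwards.
    """
    token_spans = []
    for c_start, c_end in char_spans:
        # Indices of real tokens that overlap the character span at all
        hits = [
            i for i, (t_start, t_end) in enumerate(offsets)
            if t_end > t_start and t_start < c_end and t_end > c_start
        ]
        if hits:
            token_spans.append((hits[0], hits[-1] + 1))
    return token_spans


# Toy offsets for "Hello world. Bye." with special-token placeholders at both ends
offsets = [(0, 0), (0, 5), (6, 11), (11, 12), (13, 16), (16, 17), (0, 0)]
token_spans = char_spans_to_token_spans(offsets, [(0, 13), (13, 17)])
print(token_spans)  # [(1, 4), (4, 6)]
```

With a real tokenizer, the `offset_mapping` entry would have to be removed from the tokenizer output before the remaining inputs are passed to the model.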


In [12]:
len(embeddings)

64

In [13]:
embeddings[0].shape

(768,)

In [14]:
# `!export` runs in a throwaway subshell, so it cannot set the key for this kernel.
# The next cell sets PINECONE_API_KEY through os.environ instead.


In [15]:
import os
os.environ['PINECONE_API_KEY'] = userdata.get('PINECONE_API_KEY')

In [16]:
# Upsert the late-chunking embeddings and their chunk text into the Pinecone index.
import numpy as np

upsert_data = []
index = pc.Index(index_name)

namespace = "new-space"
for i, (embedding, chunk) in enumerate(zip(embeddings, chunks)):
    # Skip all-zero vectors, e.g. from spans that fell outside the token sequence
    if not np.all(embedding == 0):
        metadata = {"chunk": chunk, "span": str(span_annotations[i])}
        upsert_data.append((str(i), embedding.tolist(), metadata))

# Upsert data to Pinecone index
index.upsert(vectors=upsert_data, namespace=namespace)


upserted_count: 8
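
The upsert reports 8 vectors although 64 chunks were produced, which suggests that most pooled vectors were all zeros and were filtered out. A minimal sketch for listing which chunk indices were kept and which were skipped, so the gap can be inspected; the helper name `split_zero_vectors` is hypothetical, and plain lists stand in for the NumPy arrays used in the notebook.

```python
def split_zero_vectors(vectors):
    """Return (kept, skipped) index lists; a vector is skipped if every entry is 0."""
    kept, skipped = [], []
    for i, vec in enumerate(vectors):
        if all(value == 0 for value in vec):
            skipped.append(i)
        else:
            kept.append(i)
    return kept, skipped


# Toy data: vectors 1 and 3 are all zeros
toy_vectors = [[0.2, -0.1], [0.0, 0.0], [0.0, 0.5], [0.0, 0.0]]
kept, skipped = split_zero_vectors(toy_vectors)
print(f"kept={kept} skipped={skipped}")  # kept=[0, 2] skipped=[1, 3]
```

Calling `split_zero_vectors(embeddings)` on the notebook's own embeddings would show which chunks never reached the index.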

In [17]:

for ids in index.list(namespace=namespace):
    query = index.query(
        id=ids[0],
        namespace=namespace,
        top_k=1,
        include_values=True,
        include_metadata=True
    )
    print(query)


{'matches': [{'id': '0',
              'metadata': {'chunk': 'The Technical Landscape of OpenAI: A Deep '
                                    'Dive into GPT-3 and Beyond\n',
                           'span': '(0, 69)'},
              'score': 1.0005926,
              'sparse_values': {'indices': [], 'values': []},
              'values': [-0.5250236,
                         -0.38851437,
                         0.485978,
                         -0.12980294,
                         0.30259323,
                         0.10768815,
                         0.0031092167,
                         -0.51727855,
                         -0.24620946,
                         0.35061273,
                         -0.33262733,
                         -0.110005304,
                         -0.48629153,
                         -0.30401772,
                         -0.11459384,
                         1.3045638,
                         -0.67465097,
                         -0.35869732,
      

In [18]:
import torch

In [19]:
def compute_query_embedding(query):
    inputs = tokenizer(query, return_tensors='pt', truncation=True, padding=True)
    with torch.no_grad():
        model_output = model(**inputs)
    embedding = model_output.last_hidden_state.mean(dim=1).squeeze().cpu().numpy()
    return embedding

# Example query
query="Give me a comprehensive overview of the architecture that GPT-3 uses?"
query_vector = compute_query_embedding(query)


In [20]:
query_results = index.query(
    vector=query_vector,  # Use the query embedding
    namespace=namespace,
    top_k=3,  # Get the top 3 matches
    include_values=True,
    include_metadata=True
)
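
Because the index was created with `metric="cosine"`, the `score` of each match is a cosine similarity between the query vector and the stored chunk vector. A minimal, dependency-free sketch of that measure, for intuition only (Pinecone computes it server-side, and the helper name `cosine_similarity` is hypothetical):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treated as no similarity here
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])  # parallel vectors
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 3.0])      # perpendicular vectors
print(same_direction, orthogonal)
```

Parallel vectors score close to 1 and perpendicular vectors score 0, independent of vector length, which is why a chunk queried against itself comes back with a score of about 1.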


In [21]:
matches = query_results.get('matches', [])
context = []
# Number the paragraphs by retrieval rank
for rank, match in enumerate(matches, start=1):
    metadata = match['metadata']
    print(metadata)
    chunk_text = metadata['chunk'].strip()
    context.append(f"Paragraph {rank}: {chunk_text}")
context = " ".join(context)


{'span': '(1619, 2085)', 'chunk': 'GPT-3’s training involved a two-step process: pre-training and fine-tuning. During the pre-training phase, the model was exposed to diverse text data from the internet, including books, articles, and websites. This unsupervised learning process involved predicting the next word in a sentence given the preceding context, a task referred to as language modeling. Through this method, GPT-3 learned grammar, facts about the world, and even some reasoning abilities.\n\n'}
{'span': '(69, 618)', 'chunk': 'OpenAI, founded in December 2015, has rapidly transformed the landscape of artificial intelligence (AI) and natural language processing (NLP). With its flagship model, Generative Pre-trained Transformer 3 (GPT-3), OpenAI has pushed the boundaries of what is possible in machine learning, enabling a myriad of applications ranging from conversational agents to creative writing tools. This essay delves into the technical intricacies of GPT-3, exploring its archi

In [22]:
from openai import OpenAI
client = OpenAI(api_key=api_key_openai)
context

'Paragraph 1: GPT-3’s training involved a two-step process: pre-training and fine-tuning. During the pre-training phase, the model was exposed to diverse text data from the internet, including books, articles, and websites. This unsupervised learning process involved predicting the next word in a sentence given the preceding context, a task referred to as language modeling. Through this method, GPT-3 learned grammar, facts about the world, and even some reasoning abilities. Paragraph 2: OpenAI, founded in December 2015, has rapidly transformed the landscape of artificial intelligence (AI) and natural language processing (NLP). With its flagship model, Generative Pre-trained Transformer 3 (GPT-3), OpenAI has pushed the boundaries of what is possible in machine learning, enabling a myriad of applications ranging from conversational agents to creative writing tools. This essay delves into the technical intricacies of GPT-3, exploring its architecture, training methodologies, capabilitie

In [23]:
from google.colab import userdata
cohere_api_key = userdata.get('COHERE_API_KEY')

In [24]:
# Rerank the retrieved chunks with the Cohere Rerank API
import cohere

# Extract the chunk text from the matches (a separate name avoids overwriting the Document list above)
rerank_inputs = [match['metadata']['chunk'] for match in matches]

co = cohere.ClientV2(api_key=cohere_api_key)
results = co.rerank(model="rerank-english-v3.0", query=query, documents=rerank_inputs, top_n=3, return_documents=True)




In [25]:
reranked_documents = [
    {
        "text": result.document.text,
        "index": result.index,
        "relevance_score": result.relevance_score
    }
    for result in results.results
]
reranked_context = "\n".join(
    [f"Paragraph {doc['index'] + 1} (Score: {doc['relevance_score']}): {doc['text']}" for doc in reranked_documents]
)
reranked_context

'Paragraph 3 (Score: 0.99890983): The transformer consists of an encoder and a decoder; however, GPT-3 utilizes only the decoder part. The architecture processes text in chunks, converting each token (a word or a part of a word) into vectors through embedding layers. The self-attention mechanism then computes the relationships between these tokens, allowing the model to generate coherent and contextually relevant text based on the input it receives.\n\n\nParagraph 2 (Score: 0.98054343): OpenAI, founded in December 2015, has rapidly transformed the landscape of artificial intelligence (AI) and natural language processing (NLP). With its flagship model, Generative Pre-trained Transformer 3 (GPT-3), OpenAI has pushed the boundaries of what is possible in machine learning, enabling a myriad of applications ranging from conversational agents to creative writing tools. This essay delves into the technical intricacies of GPT-3, exploring its architecture, training methodologies, capabilities,
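
The labels above use `index + 1`, which is each chunk's position in the list sent to the reranker, so the best-scoring chunk appears as "Paragraph 3". The system prompt used later in this notebook asks the model to cite the first, second, or third retrieved paragraph, so numbering by rerank order may be easier to interpret. A minimal sketch of that alternative, assuming dictionaries shaped like the entries of `reranked_documents`; the helper name `build_ranked_context` and the sample data are hypothetical.

```python
def build_ranked_context(reranked):
    """Label paragraphs 1..n in descending order of relevance score."""
    ordered = sorted(reranked, key=lambda doc: doc["relevance_score"], reverse=True)
    lines = [
        f"Paragraph {rank} (Score: {doc['relevance_score']:.4f}): {doc['text'].strip()}"
        for rank, doc in enumerate(ordered, start=1)
    ]
    return "\n".join(lines)


# Hypothetical sample entries, not real API output
sample = [
    {"text": "Intro text.\n\n", "index": 1, "relevance_score": 0.98},
    {"text": "Decoder-only details.\n", "index": 2, "relevance_score": 0.99},
]
ranked_context = build_ranked_context(sample)
print(ranked_context)
# Paragraph 1 (Score: 0.9900): Decoder-only details.
# Paragraph 2 (Score: 0.9800): Intro text.
```

Stripping each chunk also removes the trailing blank lines that otherwise end up inside the prompt.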

In [26]:
system_prompt = (
    "You are a RAG chatbot. Provide the best answer to the user's query using only the relevant context from "
    "the documents. Prioritize information based on relevance scores, favoring the highest scores. "
    "If the answer is not supported by the context, clearly state 'OUT OF CONTEXT'. "
    "Additionally, include the source of the information (e.g., first, second, or third retrieved paragraph)."
)

In [27]:


# Generate a response using the language model, grounded in the reranked context
response = client.chat.completions.create(
    model="gpt-4",  # Use an appropriate model
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": f"{reranked_context} \n\n Question: {query} \n Answer:"}
    ]
)


In [28]:
print(response.choices[0].message.content)

GPT-3, developed by OpenAI, utilizes a part of the transformer architecture, specifically the decoder. The architecture processes text in chunks, converting each token - a word or a part of a word - into vectors through embedding layers. A mechanism known as self-attention then computes the relationships between these tokens. This method allows the model to generate coherent and contextually relevant text based on the input it receives. This information mainly comes from the third retrieved paragraph, which had the highest relevance score.
