1. Environment Setup


*   Library Installation:




In [9]:
!pip install -qU \
    langchain==0.0.354 \
    openai==1.6.1 \
    datasets==2.10.1 \
    pinecone-client==3.1.0 \
    tiktoken==0.5.2
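
Because the later cells depend on these exact pins, it can help to confirm what is actually installed before continuing. The following is a minimal sketch using only the standard library; the helper names `installed_versions` and `mismatches` are illustrative and not part of the original notebook.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install cell above.
PINNED = {
    "langchain": "0.0.354",
    "openai": "1.6.1",
    "datasets": "2.10.1",
    "pinecone-client": "3.1.0",
    "tiktoken": "0.5.2",
}


def installed_versions(pinned):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in pinned:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def mismatches(pinned, found):
    """Return {name: (expected, installed)} for every package that differs from its pin."""
    return {
        name: (want, found.get(name))
        for name, want in pinned.items()
        if found.get(name) != want
    }


# Print one line per package whose installed version differs from the pin.
for name, (want, got) in mismatches(PINNED, installed_versions(PINNED)).items():
    print(f"{name}: expected {want}, found {got}")
```

If nothing is printed, the environment matches the pins.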

2. Initialization and Configuration


*   Initialize OpenAI Chat Model and Testing:






In [10]:
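import os
from langchain.chat_models import ChatOpenAI

# the OpenAI API key is read from the PROJECT environment variable;
# the "PROJECT" fallback is only a placeholder, not a working key
os.environ["PROJECT"] = os.getenv("PROJECT") or "PROJECT"

chat = ChatOpenAI(
    openai_api_key=os.environ["PROJECT"],
    model='gpt-3.5-turbo'
)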

from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

res = chat(messages)
res

print(res.content)

String theory is a theoretical framework that attempts to reconcile quantum mechanics and general relativity. It posits that the fundamental building blocks of the universe are not point-like particles, but rather tiny, vibrating strings. These strings can have different vibrational modes, which correspond to different particles. 

One of the key ideas in string theory is that these strings exist in a higher-dimensional space than the familiar four dimensions of space and time. The theory also predicts the existence of additional dimensions beyond the three spatial dimensions and one time dimension that we experience.

String theory has the potential to unify all of the fundamental forces of nature (gravity, electromagnetism, the weak nuclear force, and the strong nuclear force) into a single framework. However, it is a highly complex and mathematically challenging theory, and there is currently no experimental evidence to support it.

Researchers continue to explore and develop string



*   Testing a Follow-up Question:




In [11]:
# adding latest AI response to messages
messages.append(res)

# creating a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# adding to messages
messages.append(prompt)

# sending to chat-gpt
res = chat(messages)

print(res.content)

Physicists are intrigued by the potential of string theory to produce a unified theory because it has the ability to incorporate all known fundamental forces of nature within a single framework. In traditional particle physics, the fundamental forces are described by different theories: gravity is described by general relativity, while the other three forces (electromagnetism, weak nuclear force, and strong nuclear force) are described by the Standard Model of particle physics.

One of the main motivations for pursuing a unified theory is the desire to explain the fundamental forces in a more elegant and coherent manner. By unifying these forces, physicists hope to simplify the underlying principles governing the universe and resolve some of the inconsistencies and limitations of the current theories.

String theory offers a promising approach to unification because it provides a consistent framework that can potentially describe all fundamental particles and forces as different vibrat

Questions the model cannot answer without external knowledge

In [12]:

messages.append(res)


prompt = HumanMessage(
    content="What is so special about Llama 2?"
)

messages.append(prompt)


res = chat(messages)

print(res.content)

I'm not sure what you are referring to with "Llama 2." Could you please provide more context or clarify your question so I can better assist you?


3. Load Dataset



*   Loading a Text Dataset:




In [13]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [14]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs



* Setting Up Pinecone:








In [15]:
from pinecone import Pinecone

# initializing connection to pinecone; the API key is read from the PINE
# environment variable (the fallback is a placeholder, not a real key)
api_key = os.getenv("PINE") or "YOUR_PINECONE_API_KEY"

# configure client
pc = Pinecone(api_key=api_key)



*   Pinecone Serverless Specification:




In [16]:
from pinecone import ServerlessSpec

spec = ServerlessSpec(
    cloud="aws", region="us-east-1"
)

4. Create or Connect to Pinecone Index


*   Index Configuration and Connection:




In [17]:
import time

index_name = 'llama-2-rag'
existing_indexes = [
    index_info["name"] for index_info in pc.list_indexes()
]

# checking if index already exists
if index_name not in existing_indexes:
    # if does not exist, creating index
    pc.create_index(
        index_name,
        dimension=1536,  # dimensionality of ada 002
        metric='dotproduct',
        spec=spec
    )
    # waiting for index to be initialized
    while not pc.describe_index(index_name).status['ready']:
        time.sleep(1)

# connecting to index
index = pc.Index(index_name)
time.sleep(1)
# viewing index stats
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}
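
The index above is created with `metric='dotproduct'`. OpenAI documents its embeddings as normalized to unit length, and for unit-length vectors the dot product and cosine similarity give the same score, so either metric should rank results identically here. The sketch below checks that equivalence on small hand-made vectors; the helper names are illustrative and no API is called.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity, valid for vectors of any non-zero length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Toy 3-dimensional vectors standing in for 1536-dimensional embeddings.
a = normalize([3.0, 4.0, 0.0])
b = normalize([1.0, 2.0, 2.0])

# After normalization the two scores agree up to floating-point rounding.
print(dot(a, b), cosine(a, b))
```

For vectors that are not normalized, the two metrics can rank results differently, so the choice matters more with other embedding models.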

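In [18]:
import os
# read the OpenAI API key from the environment instead of hard-coding it;
# the fallback is a placeholder, not a real key
os.environ["PROJECT"] = os.getenv("PROJECT") or "YOUR_OPENAI_API_KEY"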


5. Generate and Store Embeddings

*   Initialize Embedding Model:




In [19]:
from langchain.embeddings.openai import OpenAIEmbeddings

# the API key comes from the PROJECT environment variable set earlier
embed_model = OpenAIEmbeddings(
    model="text-embedding-ada-002",
    openai_api_key=os.environ["PROJECT"]
)




In [20]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)




*   Batch Processing and Storing Embeddings:





In [21]:
from tqdm.auto import tqdm

data = dataset.to_pandas()  # converting to a DataFrame for easier batching

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # getting batch of data
    batch = data.iloc[i:i_end]
    # generating unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # getting text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embedding text
    embeds = embed_model.embed_documents(texts)
    # getting metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))


In [22]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}
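
The batching and ID scheme used in the upsert loop can be checked offline. This is a minimal sketch with stand-in rows instead of the real dataset and no calls to OpenAI or Pinecone; `make_id` and `iter_batches` are illustrative names, not functions from the notebook.

```python
def make_id(doi, chunk_id):
    """Build the same '<doi>-<chunk-id>' identifier the upsert loop uses."""
    return f"{doi}-{chunk_id}"


def iter_batches(rows, batch_size=100):
    """Yield consecutive slices of at most batch_size rows."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


# Stand-in rows with the same keys as the dataset; the real data has 4838 rows.
rows = [
    {"doi": "1102.0183", "chunk-id": str(n), "chunk": f"text {n}"}
    for n in range(4838)
]

batches = list(iter_batches(rows))
first_ids = [make_id(r["doi"], r["chunk-id"]) for r in batches[0]]

print(len(batches), len(batches[-1]), first_ids[0])
# prints: 49 38 1102.0183-0
```

With 4838 rows and a batch size of 100 there are 49 batches, the last holding 38 rows. Since each ID combines the DOI with the chunk number, re-running the upsert should overwrite existing vectors rather than add duplicates, which is consistent with the count staying at 4838.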

6. Set Up Vector Store for Search



*   Vector Store Initialization:




In [23]:
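# imported under an alias so it does not shadow the `Pinecone` client class
# imported from the `pinecone` package earlier
from langchain.vectorstores import Pinecone as PineconeVectorStore

text_field = "text"  # the metadata field that contains our text

# initializing the vector store object
vectorstore = PineconeVectorStore(
    index, embed_model.embed_query, text_field
)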



7. Perform Semantic Search


*   Search Query and Result Retrieval:




In [24]:
query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun
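
Conceptually, `similarity_search(query, k=3)` embeds the query and returns the `k` stored chunks whose vectors score highest against it. The sketch below shows that idea as a brute-force scan over a few toy vectors. It is only an illustration of the ranking step: a managed index such as Pinecone is not expected to scan every vector like this, and `top_k` is an illustrative name.

```python
import heapq


def dot(a, b):
    """Dot-product score, matching the metric chosen for the index."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, items, k=3):
    """Return the k highest-scoring (score, id, metadata) tuples.

    items is a list of (id, vector, metadata) tuples, mirroring what was upserted.
    """
    scored = ((dot(query_vec, vec), item_id, meta) for item_id, vec, meta in items)
    return heapq.nlargest(k, scored, key=lambda entry: entry[0])


# Toy 2-dimensional "embeddings" with the same metadata layout as the notebook.
items = [
    ("a", [1.0, 0.0], {"text": "chunk A"}),
    ("b", [0.0, 1.0], {"text": "chunk B"}),
    ("c", [0.7, 0.7], {"text": "chunk C"}),
    ("d", [-1.0, 0.0], {"text": "chunk D"}),
]

for score, item_id, meta in top_k([1.0, 0.2], items, k=2):
    print(item_id, round(score, 2), meta["text"])
```

The `text` metadata field is what the vector store hands back as `page_content`, which is why it was stored alongside each vector during the upsert.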

In [25]:
def augment_prompt(query: str):
    # getting top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # getting the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feeding into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

In [26]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwith
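
The printed prompt shows the `Contexts:` and `Query:` lines indented by four spaces, because the f-string in `augment_prompt` keeps the indentation of the function body. The model handles this fine, but if a cleaner prompt is preferred, the template can be dedented before the values are filled in. This is a sketch that takes the retrieved texts as a plain list, so it runs without a vector store; `build_prompt` is an illustrative name.

```python
from textwrap import dedent


def build_prompt(query, contexts):
    """Assemble a RAG prompt from a query and a list of retrieved text chunks."""
    source_knowledge = "\n".join(contexts)
    # Dedent the template first, then substitute, so that multi-line
    # contexts cannot interfere with the indentation that gets stripped.
    template = dedent("""\
        Using the contexts below, answer the query.

        Contexts:
        {contexts}

        Query: {query}""")
    return template.format(contexts=source_knowledge, query=query)


print(build_prompt(
    "What is so special about Llama 2?",
    ["first retrieved chunk", "second retrieved chunk"],
))
```

In the notebook, the `contexts` argument would be `[x.page_content for x in results]` from the similarity search.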

In [27]:
# creating a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# adding to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed by researchers. These models range in scale from 7 billion to 70 billion parameters. The fine-tuned LLMs in Llama 2, such as L/l.sc/a.sc/m.sc/a.sc/two.taboldstyle-C/h.sc/a.sc/t.sc, are specifically optimized for dialogue use cases. 

One of the key features that makes Llama 2 special is that these models outperform open-source chat models on most benchmarks that were tested. Additionally, based on human evaluations for helpfulness and safety, Llama 2 models may be considered as suitable substitutes for closed-source models. This suggests that Llama 2 models have achieved a high level of performance and effectiveness in dialogue applications.

Furthermore, the researchers provide a detailed description of their approach to fine-tuning and safety in Llama 2, similar to other closed-product LLMs like ChatGPT, BARD, and Claude. These closed-product LLMs are heavily fine-tuned to align with human pre

In [20]:
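# without RAG: the question is sent as-is, with no retrieved contexts added
# (note that `messages` may still hold the augmented prompt appended earlier)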
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, a family of pretrained and fine-tuned large language models (LLMs), safety measures were implemented to ensure the models were optimized for dialogue use cases while maintaining helpfulness and safety. Some of the safety measures used in the development of Llama 2 include:

1. Fine-Tuning for Safety: The Llama 2 models were fine-tuned to align with human preferences and enhance their usability and safety. This fine-tuning process aimed to make the models more suitable for dialogue use cases, ensuring they provide helpful and safe responses.

2. Human Evaluations: Human evaluations were conducted to assess the helpfulness and safety of the Llama 2 models. These evaluations provided valuable feedback on the performance of the models and helped ensure they met the desired standards for safety in dialogue interactions.

3. Benchmark Testing: The Llama 2 models were tested on a series of benchmarks related to safety and helpfulness. This testing allowed the de

In [28]:
# with RAG: the question is first wrapped with the retrieved contexts
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, safety measures were implemented to increase the safety of the models. These safety measures included:

1. Safety-specific data annotation and tuning: The models were annotated and tuned specifically for safety considerations to ensure that they adhere to safety standards.

2. Red-teaming: Red-teaming involves creating a team that challenges the assumptions and decisions made during the development process to identify potential vulnerabilities or weaknesses in the models.

3. Iterative evaluations: The models underwent iterative evaluations to continuously assess and improve their safety performance over multiple stages of development.

These safety measures were put in place to enhance the safety of the fine-tuned Llama 2 models and contribute to the responsible development of large language models (LLMs).



**Additional**

**Protecting Against Harmful Searches**


*   Manual Keyword Blacklist:




In [32]:
import os
import re

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# the API key is read from the environment rather than hard-coded;
# the fallback is a placeholder, not a real key
openai_api_key = os.getenv("PROJECT") or "YOUR_OPENAI_API_KEY"

llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo", openai_api_key=openai_api_key)

# List of blacklisted topics (maintained manually)
BLACKLIST = [
    "bomb", "explosive", "weapon", "drugs", "terrorism",
    "illegal activities", "violence", "self-harm"
]

# compiled once; re.escape keeps any special characters in a term literal
BLACKLIST_PATTERN = re.compile(
    "|".join(re.escape(term) for term in BLACKLIST), re.IGNORECASE
)

def is_query_safe(text):
    """Return True if the text contains none of the blacklisted topics."""
    return BLACKLIST_PATTERN.search(text) is None

def filter_response(response):
    """Return the response text, or a refusal if it mentions a blacklisted topic."""
    # the model returns an AIMessage, so the check runs on its text content
    text = response.content
    if not is_query_safe(text):
        return "I'm sorry, but I can't assist with that topic."
    return text

# The two helpers below are named so that they do not overwrite the
# `augment_prompt` function and the `chat` model defined in earlier cells.
def add_safety_instruction(query):
    """Prefix the query with an instruction to follow safety guidelines."""
    return f"Please make sure the response adheres to safety guidelines. Query: {query}"

def safe_chat(messages):
    """Send the messages to the model and filter the reply."""
    response = llm(messages)
    return filter_response(response)

# Example usage
query = "How to do drugs?"
if is_query_safe(query):
    prompt = HumanMessage(content=add_safety_instruction(query))
    res = safe_chat([prompt])
    print(res)
else:
    print("I'm sorry, but I can't assist with that topic.")

# Placeholder for user reporting mechanism
def report_content(content):
    print("This content has been reported and will be reviewed.")

# Example report
report_content("Harmful content example.")


I'm sorry, but I can't assist with that topic.
This content has been reported and will be reviewed.
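
A plain substring match like the one above also flags harmless text that merely contains a blacklisted term, for example "drugstore" or "bombastic". One possible refinement, sketched here as an assumption rather than as part of the original notebook, is to require word boundaries around each term. The name `WORD_PATTERN` is illustrative.

```python
import re

BLACKLIST = [
    "bomb", "explosive", "weapon", "drugs", "terrorism",
    "illegal activities", "violence", "self-harm",
]

# \b anchors each term at word boundaries, so "drugs" no longer matches
# inside "drugstore"; re.escape keeps the hyphen and space literal.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in BLACKLIST) + r")\b",
    re.IGNORECASE,
)


def is_query_safe(query):
    """Return True if no blacklisted term appears as a whole word or phrase."""
    return WORD_PATTERN.search(query) is None


for sample in [
    "How to do drugs?",
    "Where is the nearest drugstore?",
    "Tips for preventing SELF-HARM",
]:
    print(sample, "->", is_query_safe(sample))
```

The trade-off is that whole-word matching misses simple variants such as "weapons" or "bombs" unless they are added to the list, so a keyword filter of either kind is best treated as a coarse first layer rather than a complete safeguard.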
