<a href="https://colab.research.google.com/github/Nid989/Langchain-Overview/blob/main/langchain_RAG.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -qU \
  python-dotenv \
  transformers==4.31.0 \
  sentence-transformers==2.2.2 \
  pinecone-client==2.2.2 \
  datasets==2.14.0 \
  accelerate==0.21.0 \
  einops==0.6.1 \
  langchain==0.0.240 \
  xformers==0.0.20 \
  bitsandbytes==0.41.0

In [2]:
import os
import time
import dotenv
from torch import cuda, bfloat16
import transformers
import pinecone
from datasets import load_dataset
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from langchain.llms import HuggingFacePipeline
from langchain.chains import RetrievalQA

dotenv.load_dotenv("./.env.txt")

True

`Initializing the Hugging Face Embedding Pipeline`

In [3]:
%%capture
embed_model_id = "sentence-transformers/all-MiniLM-L6-v2"

device = f"cuda:{cuda.current_device()}" if cuda.is_available() else "cpu"

embed_model = HuggingFaceEmbeddings(
    model_name=embed_model_id,
    model_kwargs={"device": device},
    encode_kwargs={"device": device, "batch_size": 32}
)

In [4]:
docs = [
    "this is one document",
    "this is another document"
]

embeddings = embed_model.embed_documents(docs)

print(f"We have {len(embeddings)} doc embeddings, each with a dimensionality of {len(embeddings[0])}.")

We have 2 doc embeddings, each with a dimensionality of 384.


`Building the Vector Index`

In [5]:
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'PINECONE_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'PINECONE_ENV'
)

In [6]:
index_name = "llama-2-reg"

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=len(embeddings[0]),
        metric="cosine"
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status["ready"]:
        time.sleep(1)
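
The index above is created with `metric="cosine"`, so stored vectors are ranked by the cosine of the angle they make with the query vector. A minimal sketch of that score in plain Python follows. It uses made-up 3-dimensional vectors instead of real 384-dimensional embeddings, and `cosine_similarity` is a hypothetical helper written for illustration, not part of the Pinecone client.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up toy vectors (assumption: real embeddings here have 384 dimensions)
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]   # same direction, different length
orthogonal = [0.0, 3.0, 0.0]       # shares no component with the query

print(cosine_similarity(query, same_direction))  # close to 1: length is ignored
print(cosine_similarity(query, orthogonal))      # 0: unrelated directions
```

Because the score depends only on direction, two chunks of very different length can still match a query equally well.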

In [7]:
index = pinecone.Index(index_name)
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.04838,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

In [None]:
data = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)
data

In [9]:
data = data.to_pandas()

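# The upsert loop is left commented out: the index stats above already report 4838 vectors.
# batch_size = 32

# for i in range(0, len(data), batch_size):
#     i_end = min(len(data), i+batch_size)
#     batch = data.iloc[i: i_end]
#     # build one unique ID per chunk from the paper DOI and the chunk number
#     ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
#     texts = [x['chunk'] for _, x in batch.iterrows()]
#     embeds = embed_model.embed_documents(texts)
#     # metadata stored next to each vector; the "text" field is read back at query time
#     metadata = [
#         {
#             "text": x['chunk'],
#             "source": x['source'],
#             "title": x['title']
#         } for _, x in batch.iterrows()
#     ]
#     index.upsert(vectors=zip(ids, embeds, metadata))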
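
The commented-out loop walks the dataframe in fixed-size slices and builds one ID per chunk. The slicing and ID logic can be sketched on its own, without Pinecone or the dataset. `batch_ranges` is a hypothetical helper, and the row values are made up for illustration.

```python
def batch_ranges(n_rows, batch_size):
    """Yield (start, end) index pairs that cover range(n_rows) in order."""
    for start in range(0, n_rows, batch_size):
        # the last batch may be shorter than batch_size
        yield start, min(n_rows, start + batch_size)

ranges = list(batch_ranges(70, 32))
print(ranges)  # [(0, 32), (32, 64), (64, 70)]

# Made-up row, shaped like the dataset columns used in the loop
row = {"doi": "0000.00000", "chunk-id": "3"}
vector_id = f"{row['doi']}-{row['chunk-id']}"
print(vector_id)  # 0000.00000-3
```

Combining the DOI with the chunk number keeps IDs unique across papers, so re-running the upsert would overwrite existing vectors rather than duplicate them.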

In [10]:
index.describe_index_stats()

{'dimension': 384,
 'index_fullness': 0.04838,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

`Initializing the Hugging Face Pipeline`

In [11]:
%%capture
model_id = 'meta-llama/Llama-2-7b-chat-hf'

device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)

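# Read the Hugging Face access token from the environment instead of hardcoding it.
# The variable name HF_AUTH_TOKEN is an assumption; use whatever name your .env file defines.
hf_auth = os.environ.get('HF_AUTH_TOKEN')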
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    quantization_config=bnb_config,
    device_map='auto',
    use_auth_token=hf_auth
)
model.eval()
print(f"Model loaded on {device}")
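
The cell above loads the model with `load_in_4bit=True`. A back-of-the-envelope estimate shows why this matters on a single GPU. This is a rough sketch: it takes the nominal 7 billion parameter count as an assumption and counts weight storage only, ignoring quantization constants, any layers left unquantized, activations, and the generation cache. `weight_memory_gb` is a hypothetical helper.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

n_params = 7e9  # assumption: nominal "7b" parameter count

print(weight_memory_gb(n_params, 16))  # 16-bit weights: 14.0
print(weight_memory_gb(n_params, 4))   # 4-bit weights: 3.5
```

Actual memory use will be higher than the 4-bit figure, but the ratio explains why the quantized model can fit on GPUs where the 16-bit weights would not.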

In [12]:
%%capture
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    use_auth_token=hf_auth
)

In [13]:
generate_text = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True, # langchain expects the full text
    task="text-generation",
    temperature=0.0,
    max_new_tokens=512,
    repetition_penalty=1.1
)

In [14]:
res = generate_text("Explain to me the difference between nuclear fission and fusion.")
print(res[0]["generated_text"])

Explain to me the difference between nuclear fission and fusion. Unterscheidung between nuclear fission and fusion: Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing energy in the process. Nuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus.
Nuclear fission is a process where an atomic nucleus splits into two or more smaller nuclei, releasing energy in the process. This process typically occurs when an atom's nucleus is bombarded with a high-energy particle, such as a neutron. When this happens, the nucleus can become unstable and break apart into lighter elements, releasing a large amount of energy in the process.
Nuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus. This process also releases energy, but it does so in a much more controlled and sustained manner than nuclear fission.

In [15]:
# Langchain HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=generate_text)

In [16]:
llm(prompt="Explain to me the difference between nuclear fission and fusion.")

" Unterscheidung between nuclear fission and fusion: Nuclear fission is a process in which an atomic nucleus splits into two or more smaller nuclei, releasing energy in the process. Nuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus.\nNuclear fission is a process where an atomic nucleus splits into two or more smaller nuclei, releasing energy in the process. This process typically occurs when an atom's nucleus is bombarded with a high-energy particle, such as a neutron. When this happens, the nucleus can become unstable and break apart into lighter elements, releasing a large amount of energy in the process.\nNuclear fusion, on the other hand, is the process by which two or more atomic nuclei combine to form a single, heavier nucleus. This process also releases energy, but it does so in a much more controlled and sustained manner than nuclear fission. In nuclear fusion, the nuclei of two atoms are brought toget

`Initializing a RetrievalQA Chain`

In [17]:
text_field = "text" # field in metadata that contains text content

# Langchain Pinecone module
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)

In [18]:
query = 'what makes llama 2 special?'

vectorstore.similarity_search(
    query,  # the search query
    k=3  # returns top 3 most relevant chunks of text
)

[Document(page_content='Ricardo Lopez-Barquilla, Marc Shedroﬀ, Kelly Michelena, Allie Feinstein, Amit Sangani, Geeta\nChauhan,ChesterHu,CharltonGholson,AnjaKomlenovic,EissaJamil,BrandonSpence,Azadeh\nYazdan, Elisa Garcia Anzano, and Natascha Parks.\n•ChrisMarra,ChayaNayak,JacquelinePan,GeorgeOrlin,EdwardDowling,EstebanArcaute,Philomena Lobo, Eleonora Presani, and Logan Kerr, who provided helpful product and technical organization support.\n46\n•Armand Joulin, Edouard Grave, Guillaume Lample, and Timothee Lacroix, members of the original\nLlama team who helped get this work started.\n•Drew Hamlin, Chantal Mora, and Aran Mun, who gave us some design input on the ﬁgures in the\npaper.\n•Vijai Mohan for the discussions about RLHF that inspired our Figure 20, and his contribution to the\ninternal demo.\n•Earlyreviewersofthispaper,whohelpedusimproveitsquality,includingMikeLewis,JoellePineau,\nLaurens van der Maaten, Jason Weston, and Omer Levy.', metadata={'source': 'http://arxiv.org/pdf/230

In [19]:
# LangChain RetrievalQA chain: retrieve chunks from the vector store, then answer with the LLM
rag_pipeline = RetrievalQA.from_chain_type(
    llm=llm, chain_type='stuff',
    retriever=vectorstore.as_retriever()
)
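
With `chain_type='stuff'`, the retrieved chunks are placed together into a single prompt ahead of the question. The idea can be sketched in plain Python. `stuff_prompt` and its template wording are hypothetical and only illustrate the shape of the prompt; LangChain's actual prompt text differs.

```python
def stuff_prompt(question, chunks):
    """Join all retrieved chunks into one context block ahead of the question."""
    context = "\n\n".join(chunks)
    # Hypothetical template, for illustration only
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Made-up stand-ins for the chunks a retriever would return
chunks = ["First retrieved chunk.", "Second retrieved chunk."]
prompt = stuff_prompt("what is so special about llama 2?", chunks)
print(prompt)
```

Since every chunk goes into one prompt, the number and size of retrieved chunks must stay within the model's context window.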

In [20]:
llm("what is so special about llama 2?")

'\n nobody knows.\n\nBut the llama 2 has a secret: it\'s actually a highly advanced AI language model, capable of generating human-like text based on the input it receives. It can be trained on large datasets of text data and can learn to mimic the style and tone of any author or genre of writing.\nThe llama 2 is a tool for writers, artists, and anyone who wants to create content that is both creative and accurate. With its advanced language generation capabilities, the llama 2 can help you generate ideas, write articles, create poetry, and even translate text from one language to another.\nBut don\'t just take our word for it! Here are some examples of what the llama 2 can do:\n* Generate a poem in the style of William Shakespeare:\nOde to a Llama\nIn days of old, when tales were told\nOf knights and dragons, brave and bold\nThere lived a creature, oh so fair\nA llama, with a woolly mane so rare\n\n* Write an article on the history of llamas:\nLlamas have been around for thousands of 

In [21]:
rag_pipeline("what is so special about llama 2?")

{'query': 'what is so special about llama 2?',
 'result': ' Llama 2 is a collection of large language models (LLMs) that have been pretrained and fine-tuned for dialogue use cases. The models in Llama 2 outperform open-source chat models on most benchmarks and have been evaluated for helpfulness and safety. Unlike other publicly released pretrained LLMs, Llama 2 has been designed to be a suitable substitute for closed "product" LLMs like ChatGPT, BARD, and Claude.'}

In [22]:
llm('what safety measures were used in the development of llama 2?')

'\n nobody knows.  The developers of llama 2 have chosen to keep this information secret, and it is not known whether they used any safety measures at all.\n\n'

In [23]:
rag_pipeline('what red teaming procedures were followed for llama 2?')

{'query': 'what red teaming procedures were followed for llama 2?',
 'result': ' The red teaming procedures for llama 2 have been described in detail in section 4 of the paper.'}

In [24]:
rag_pipeline('how does the performance of llama 2 compare to other local LLMs?')

{'query': 'how does the performance of llama 2 compare to other local LLMs?',
 'result': " The performance of Llama 2 is compared to other local LLMs in the paper by conducting various benchmarks such as ROUGE-2, BLEU, and METEOR. According to the results, Llama 2 performs competitively or even better than some of the other local LLMs in terms of perplexity, sample quality, and diversity. However, it's important to note that the comparison is not done directly as the other LLMs are not publicly available, and the authors use different evaluation metrics and methods. Therefore, a direct comparison may not fully reflect the actual performance difference between Llama 2 and other local LLMs."}