#**Llama 2 + Pinecone + LangChain**

##**Step 1: Install All the Required Libraries**

In [None]:
!pip install langchain
!pip install pypdf
!pip install unstructured
!pip install sentence_transformers
!pip install pinecone-client
!pip install llama-cpp-python
!pip install huggingface_hub

Collecting pypdf
  Downloading pypdf-3.15.2-py3-none-any.whl (271 kB)
Installing collected packages: pypdf
Successfully installed pypdf-3.15.2
Collecting unstructured
  Using cached unstructured-0.10.4-py3-none-any.whl (1.5 MB)
Collecting filetype (from unstructured)
  Using cached filetype-1.2.0-py2.py3-none-any.whl (19 kB)
Collecting python-magic (from unstructured)
  Using cached python_magic-0.4.27-py2.py3-none-any.whl (13 kB)
Collecting emoji (from unstructured)
  Using cached emoji-2.8.0-py2.py3-none-any.whl (358 kB)
Installing collected packages: filetype, python-magic, emoji, unstructured
Successfully installed emoji-2.8.0 filetype-1.2.0 python-magic-0.4.27 unstructured-0.10.4


##**Step 2: Import All the Required Libraries**

In [None]:
from langchain.document_loaders import PyPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import Pinecone
from sentence_transformers import SentenceTransformer
from langchain.chains.question_answering import load_qa_chain
import pinecone
import os

##**Step 3: Load the Data (Download the PDF You Want to Work With and Place It in the Working Directory)**

In [None]:
from google.colab import files
uploaded = files.upload()
loader = PyPDFLoader("WHAT REALLY MAKES YOU ILL.pdf")

Saving WHAT REALLY MAKES YOU ILL.pdf to WHAT REALLY MAKES YOU ILL.pdf


In [None]:
data = loader.load()

In [None]:
data[1]

Document(page_content='What Really Makes You Ill?\nWhy Everything You Thought You Knew About\nDisease is Wrong\nDawn Lester & David Parker', metadata={'source': 'WHAT REALLY MAKES YOU ILL.pdf', 'page': 1})

##**Step 4: Split the Text into Chunks**

In [None]:
text_splitter=RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)

In [None]:
docs=text_splitter.split_documents(data)

In [None]:
len(docs)

3272

In [None]:
docs[10]

Document(page_content='flawed nature of these ideas  and theories means that the words of\nVoltaire remain applicable to the 21st century medical system known\nas ‘modern medicine’; a system that continues to operate from the\nbasis of a poor level of knowledge about medicines, diseases and\nthe human body .', metadata={'source': 'WHAT REALLY MAKES YOU ILL.pdf', 'page': 6})
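
To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of a fixed-window character splitter. It is a simplification for illustration, not how `RecursiveCharacterTextSplitter` works internally (that class also tries to break on separators such as paragraphs and newlines), and `split_fixed` is a hypothetical helper name.

```python
def split_fixed(text, chunk_size=1000, chunk_overlap=0):
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Hypothetical helper for illustration only.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 3  # 30 characters
no_overlap = split_fixed(sample, chunk_size=10, chunk_overlap=0)
with_overlap = split_fixed(sample, chunk_size=10, chunk_overlap=5)
print(len(no_overlap), len(with_overlap))
```

With `chunk_overlap=0`, as in this notebook, a sentence that straddles a boundary is cut in two. A small overlap is a common way to reduce that, at the cost of more chunks.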

##**Step 5: Set Up the Environment**

In [None]:
os.environ["HUGGINGFACEHUB_API_TOKEN"] = 'enter your Hugging Face API token here'
os.environ["PINECONE_API_KEY"] = 'enter your Pinecone API key here'
os.environ["PINECONE_API_ENV"] = 'enter your Pinecone environment here'

In [None]:
# Reuse the values set above instead of typing the keys a second time
PINECONE_API_KEY = os.environ["PINECONE_API_KEY"]
PINECONE_API_ENV = os.environ["PINECONE_API_ENV"]
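
Keys typed directly into a notebook are easy to leak when the notebook is shared. One possible alternative, sketched below, reads each value from the environment and fails with a clear message when it is missing. `require_env` is a hypothetical helper, not part of LangChain or Pinecone, and the demo values are placeholders.

```python
import os


def require_env(name, environ=None):
    """Return the value of an environment variable or raise a clear error."""
    environ = os.environ if environ is None else environ
    value = environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Stand-in mapping so the sketch runs without real credentials.
fake_env = {"PINECONE_API_KEY": "dummy-key", "PINECONE_API_ENV": "dummy-env"}
demo_key = require_env("PINECONE_API_KEY", fake_env)
demo_env = require_env("PINECONE_API_ENV", fake_env)
```

In the notebook, calling `require_env("PINECONE_API_KEY")` without the second argument would read from the real environment.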

##**Step 6: Download the Embeddings**

In [None]:
embeddings=HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')
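
`all-MiniLM-L6-v2` maps each text to a 384-dimensional vector, so the Pinecone index used later needs to have dimension 384. As a rough illustration of how one sentence vector can be derived from per-token vectors, the sketch below applies mean pooling followed by L2 normalization to toy data. The numbers are made up and the functions are hypothetical helpers; this is not the library's implementation.

```python
import math


def mean_pool(token_vectors):
    """Average a list of equal-length token vectors into one vector."""
    n = len(token_vectors)
    dim = len(token_vectors[0])
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return vector if norm == 0 else [x / norm for x in vector]


# Toy example: three tokens with 4-dimensional vectors (real ones have 384).
tokens = [[1.0, 0.0, 2.0, 0.0], [3.0, 0.0, 2.0, 0.0], [2.0, 0.0, 2.0, 0.0]]
sentence_vector = l2_normalize(mean_pool(tokens))
```

Unit-length vectors are convenient because cosine similarity between them reduces to a dot product.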

##**Step 7: Initialize Pinecone**

In [None]:
# initialize pinecone
pinecone.init(
    api_key=PINECONE_API_KEY,  # find at app.pinecone.io
    environment=PINECONE_API_ENV  # next to api key in console
)
index_name = 'llama2-llm' # put in the name of your pinecone index here
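
The index named in `index_name` has to exist before vectors can be written to it. A minimal sketch of a guard is shown below. It assumes a client object exposing `list_indexes()` and `create_index(name, dimension=..., metric=...)`, as the `pinecone` module in the 2.x `pinecone-client` does. `ensure_index` and the `FakePinecone` stand-in are hypothetical and only there so the sketch runs without an account. The dimension 384 matches `all-MiniLM-L6-v2`.

```python
def ensure_index(client, name, dimension=384, metric="cosine"):
    """Create the index if it is missing; return True when it was created."""
    if name in client.list_indexes():
        return False
    client.create_index(name, dimension=dimension, metric=metric)
    return True


class FakePinecone:
    """In-memory stand-in that mimics the two calls used above."""

    def __init__(self):
        self.indexes = {}

    def list_indexes(self):
        return list(self.indexes)

    def create_index(self, name, dimension, metric="cosine"):
        self.indexes[name] = {"dimension": dimension, "metric": metric}


client = FakePinecone()
created_first = ensure_index(client, "llama2-llm")
created_second = ensure_index(client, "llama2-llm")
```

In the notebook, the equivalent call would be `ensure_index(pinecone, index_name)` after `pinecone.init(...)`.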

In [None]:
docs[50]

Document(page_content='organisation was, according to the BMA web page entitled The\nHistory of the BMA , to provide,\n“…a ‘friendly and scientific’ forum where doctors could\nadvance and exchange medical knowledge.”\nThe BMA web pages that detail its history refer to their campaign\nagainst ‘quackery’ in the early 19th century . The term ‘quackery’\nwas, and still is, used to discredit all forms of ‘healing’ other than\nthose of modern medicine. Yet it was that very same 19th century\nmedical system, which claimed to oppose quackery , that employed\n‘medicines’ known to be harmful and often led to a patient’ s\ninvalidism or death.\nThe practice of medicine has clearly not changed a great deal\nsince the days of Hippocrate s, after whom the Hippocratic Oath that\nurges doctors to ‘do no harm ’ is named. This Oath is still sworn by\nnewly qualified doctors and it is a laudable principle on which to base\nany work in the field of ‘healthcare’. But the use of harmful', metadata={'source'

##**Step 8: Create Embeddings for Each Text Chunk**

In [None]:
docsearch=Pinecone.from_texts([t.page_content for t in docs], embeddings, index_name=index_name)

# If you already have an index, you can load it like this


In [None]:
#docsearch = Pinecone.from_existing_index(index_name, embeddings)
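
Step 8 passes only `page_content` to `Pinecone.from_texts`, so the `source` and `page` metadata shown in Step 3 is not stored with the vectors. `from_texts` also accepts a `metadatas` list, one dictionary per text. The sketch below builds both lists from document-like objects. `SimpleDoc` is a hypothetical stand-in for LangChain's `Document` so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Stand-in with the two attributes used from LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def texts_and_metadatas(documents):
    """Return parallel lists of texts and metadata dictionaries."""
    texts = [doc.page_content for doc in documents]
    metadatas = [dict(doc.metadata) for doc in documents]
    return texts, metadatas


sample_docs = [
    SimpleDoc("first chunk", {"source": "book.pdf", "page": 1}),
    SimpleDoc("second chunk", {"source": "book.pdf", "page": 2}),
]
texts, metadatas = texts_and_metadatas(sample_docs)
```

These lists could then be passed as `Pinecone.from_texts(texts, embeddings, metadatas=metadatas, index_name=index_name)`. `Pinecone.from_documents(docs, embeddings, index_name=index_name)` is a shorter route to the same result.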

##**Step 9: Similarity Search**

In [None]:
query="What proves that viruses don't exist?"

In [None]:
docs=docsearch.similarity_search(query)

In [None]:
docs[0]

Document(page_content='What is the name of the primary  publication that provides proof\nthat a particular virus is the sole cause of a particular disease?\nIt is vitally important that any documents referred to by the\norganisation, should they reply , must be primary papers; textbooks\nor other reference materials that are not primary docume nts are not\nacceptable; they must provide primary evidence.\nIt should be noted that investigations of this nature, including those\nundertaken by virologists such as Dr Lanka, have failed to unearth\nany original papers that conclusively prove that any virus is the\ncause of any disease. In addition, as this discussion has\ndemonstrated, the functions attributed to viruses in the causation of\ndisease are based on assumptions and extrapolations from\nlaboratory experiments that have not only failed to prove, but are\nincapable of proving, that viruses cause disease. The inert, non-\nliving particles known as viruses do not possess the ability t

In [None]:
query="Do viruses exist?"

In [None]:
docs=docsearch.similarity_search(query)
docs

[Document(page_content='as discussed on the EoL web page that states,\n“Although viruses may cause disruption of normal\nhomeostasis resulting in disease, in some cases viruses may\nsimply reside inside an organism without significant harm.”', metadata={}),
 Document(page_content='An August 2008 Scientific American  article entitled Are Viruses\nAlive  provides an interesting insight into the changing perception of\nviruses,\n“First seen as poisons, then as life-forms, then as biological\nchemicals, viruses today are thought of as being in a gray area\nbetween living and non-living...”\nAlthough categorising viruses as being in a ‘gray area’, the article\nnevertheless asserts that they are pathogenic,\n“In the late 19th century researchers realized that certain\ndiseases, including rabies and foot-and-mouth, were caus ed by\nparticles that seemed to behave like bacteria but were much\nsmaller .”\nThis assertion tends to support the idea that viruses must  be alive\nbecause they are cla
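
Conceptually, `similarity_search` embeds the query and returns the stored chunks whose vectors are closest to it (LangChain's default is the top 4). The sketch below shows that idea with cosine similarity on toy 3-dimensional vectors. It is a brute-force illustration on made-up data, not how Pinecone executes the search at scale. `cosine`, `top_k` and the sample store are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def top_k(query_vec, store, k=4):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


store = {
    "chunk about viruses": [0.9, 0.1, 0.0],
    "chunk about medicine": [0.5, 0.5, 0.0],
    "chunk about history": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], store, k=2)
```

The search ranks chunks by closeness to the query, not by correctness, so the returned passages reflect whatever the indexed book says on the topic.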

##**Step 10: Query the Docs to Get the Answer Back (Llama 2 Model)**

In [None]:
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose

Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Collecting llama-cpp-python
  Downloading llama_cpp_python-0.1.78.tar.gz (1.7 MB)
  Running command pip subprocess to install build dependencies
  Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
  Collecting setuptools>=42
    Using cached setuptools-68.1.2-py3-none-any.whl (805 kB)
  Collecting scikit-build>=0.13
    Using cached scikit_build-0.17.6-py3-none-any.whl (84 kB)
  Collecting cmake>=3.18
    Using cached cmake-3.27.2-py2.py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (26.1 MB)
  Collecting ninja
    Using cached ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)
  Collecting distro (from scikit-build>=0.13)
    Using cached distro-1.8.0-py3-none-any.whl (20 kB)
  Collecting packaging (from scikit-build

###**Import All the Required Libraries**

In [None]:
from langchain.llms import LlamaCpp
from llama_cpp import Llama
from langchain.callbacks.manager import CallbackManager
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
from huggingface_hub import hf_hub_download
from langchain.chains.question_answering import load_qa_chain

In [None]:
# Callbacks support token-wise streaming
callback_manager = CallbackManager([StreamingStdOutCallbackHandler()])
# Verbose is required to pass to the callback manager

###**Quantized Models from the Hugging Face Community**

The Hugging Face community provides quantized models, which make it practical to run the model on a T4 GPU. It is important to consult reliable sources before using any model.

There are several variations available, but the ones that interest us are based on the GGML library.

The available Llama 2 GGML variants are listed [here](https://huggingface.co/models?search=llama%202%20ggml).



In this case, we will use the model called [Llama-2-13B-chat-GGML](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML).

Quantization reduces the computational and memory costs of running inference by representing the weights and activations with low-precision data types, such as 8-bit integers (`int8`), instead of the usual 32-bit floating point (`float32`). It trades some numerical precision for lower resource usage.
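
As an illustration of the idea, the sketch below applies symmetric absmax quantization to a few float weights, mapping them to `int8` and back, and then estimates weight storage for a 13-billion-parameter model at two bit widths. This is a simplified scheme chosen for clarity, not the block-wise format GGML uses for its model files, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric absmax quantization: floats -> (int8 values, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]


def weight_gigabytes(n_params, bits_per_weight):
    """Storage needed for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_gb = weight_gigabytes(13e9, 32)
int8_gb = weight_gigabytes(13e9, 8)
```

Each restored value differs from the original by at most half a quantization step, which is the precision given up in exchange for a 4x smaller representation in this example.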

In [None]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q5_1.bin" # the model is in bin format

In [None]:
model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)

Downloading (…)chat.ggmlv3.q5_1.bin:   0%|          | 0.00/9.76G [00:00<?, ?B/s]

In [None]:
model_path

'/root/.cache/huggingface/hub/models--TheBloke--Llama-2-13B-chat-GGML/snapshots/47d28ef5de4f3de523c421f325a2e4e039035bab/llama-2-13b-chat.ggmlv3.q5_1.bin'

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
n_gpu_layers = 32  # Change this value based on your model and your GPU VRAM pool.
n_batch = 512  # Should be between 1 and n_ctx, consider the amount of VRAM in your GPU.

# Load the model through LangChain's LlamaCpp wrapper: load_qa_chain expects a
# LangChain LLM, not a raw llama_cpp.Llama object.
llm = LlamaCpp(
    model_path=model_path,
    n_threads=2,  # CPU cores
    n_batch=n_batch,
    n_gpu_layers=n_gpu_layers,
    callback_manager=callback_manager,  # streams tokens to stdout
    verbose=True,  # required for the callback manager to receive output
)
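
Choosing `n_gpu_layers` comes down to how many layers fit in GPU memory. The sketch below is a rough back-of-the-envelope estimate, assuming the model file size is spread evenly over the layers and that some memory is held back for the context and scratch buffers. The 40-layer count applies to Llama 2 13B. The 15 GB of usable memory and the 3 GB reserve are assumptions for illustration, and `layers_that_fit` is a hypothetical helper.

```python
def layers_that_fit(model_gb, n_layers, vram_gb, reserve_gb=3.0):
    """Estimate how many layers can be offloaded to the GPU."""
    per_layer_gb = model_gb / n_layers
    available_gb = max(0.0, vram_gb - reserve_gb)
    return min(n_layers, int(available_gb // per_layer_gb))


# 9.76 GB model file, 40 layers, assumed 15 GB of usable GPU memory
estimate_large_gpu = layers_that_fit(model_gb=9.76, n_layers=40, vram_gb=15.0)
# Same model on an assumed 8 GB card
estimate_small_gpu = layers_that_fit(model_gb=9.76, n_layers=40, vram_gb=8.0)
```

Treat the result as an upper bound and lower `n_gpu_layers` if loading fails with an out-of-memory error.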

In [None]:
chain=load_qa_chain(llm, chain_type="stuff")
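
With `chain_type="stuff"`, all retrieved documents are placed ("stuffed") into a single prompt together with the question, so the combined text has to fit in the model's context window. The sketch below illustrates the idea. The template wording is an approximation written for this example, not LangChain's exact prompt, and `build_stuff_prompt` is a hypothetical helper.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def build_stuff_prompt(chunks, question, separator="\n\n"):
    """Join all chunks into one context block and fill the template."""
    context = separator.join(chunks)
    return PROMPT_TEMPLATE.format(context=context, question=question)


prompt = build_stuff_prompt(
    ["Chunk one text.", "Chunk two text."],
    "What is this book about?",
)
```

Several chunks of up to 1000 characters each can exceed a small context window, so the `n_ctx` setting used when loading the model may need to be raised accordingly.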

In [None]:
query="summarize"
docs=docsearch.similarity_search(query)

In [None]:
docs

In [None]:
chain.run(input_documents=docs, question=query)

In [None]:
query="what are the technologies used in this book"
docs=docsearch.similarity_search(query)

In [None]:
docs

In [None]:
chain.run(input_documents=docs, question=query)

<font size="6">Query</font>

In [None]:
query="What is winning by minority vote? explain in detail by examples"
docs=docsearch.similarity_search(query)

<font size="6">Matching Document</font>

In [None]:
docs[0]

<font size="6">Chat output</font>

In [None]:
chain.run(input_documents=docs, question=query)

In [None]:
query="Tell me about EVMs."
docs=docsearch.similarity_search(query)
docs

In [None]:
chain.run(input_documents=docs, question=query)
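
The query cells above repeat the same two calls. A small wrapper, sketched below, keeps the retrieved chunks and the answer together. It assumes objects with the same methods used in this notebook (`similarity_search(query, k=...)` and `run(input_documents=..., question=...)`). `answer_question` and the two fake classes are hypothetical and exist only so the sketch runs without a model or an index.

```python
def answer_question(docsearch, chain, query, k=4):
    """Retrieve the k most similar chunks and ask the chain to answer."""
    matches = docsearch.similarity_search(query, k=k)
    answer = chain.run(input_documents=matches, question=query)
    return answer, matches


class FakeSearch:
    """Stand-in for the vector store: returns k placeholder chunks."""

    def similarity_search(self, query, k=4):
        return [f"chunk {i} for {query!r}" for i in range(k)]


class FakeChain:
    """Stand-in for the QA chain: reports what it was given."""

    def run(self, input_documents, question):
        return f"{len(input_documents)} chunks used to answer: {question}"


answer, matches = answer_question(
    FakeSearch(), FakeChain(), "Do viruses exist?", k=2
)
```

In the notebook this would be called as `answer_question(docsearch, chain, query)`, which also avoids overwriting the `docs` list of chunks from Step 4 with search results.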