# Meta Llama 3.1 8B Instruct Experiments 

We need to start by importing all necessary modules.

In [1]:
from langchain.prompts import PromptTemplate
from langchain.chains.retrieval_qa.base import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import PyPDFLoader, PyPDFDirectoryLoader
from langchain_core.documents.base import Document
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_pinecone import PineconeVectorStore
from langchain_ollama import OllamaLLM
import pinecone

from typing import List
import os



## Simple Langchain Experiment

First, we need to initialize a simple test prompt.

In [2]:
template = '''Please give me a trivia fact about the {model_name} deep learning model.'''
prompt = PromptTemplate.from_template(template=template)
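
To make the placeholder mechanics concrete, here is a minimal sketch in plain Python. It assumes that, for a simple template like this one, filling `{model_name}` behaves much like `str.format`. The `fill_template` helper is hypothetical and not part of LangChain.

```python
# Hypothetical stand-in for a prompt template: fill named placeholders with str.format.
def fill_template(template: str, **variables: str) -> str:
    return template.format(**variables)


example_template = 'Please give me a trivia fact about the {model_name} deep learning model.'
filled = fill_template(example_template, model_name='Multilayer Perceptron (MLP)')
print(filled)
```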

Next, we should initialize an instance of Meta's Llama 3.1 model.

In [3]:
model = OllamaLLM(model='llama3.1:8b')

Now, we should create a chain.

In [4]:
chain = prompt | model
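
The `|` operator feeds the output of the prompt into the model. The toy sketch below illustrates that idea with a hypothetical `Step` class and a fake model. It is only an illustration of pipe-style composition, not LangChain's actual implementation.

```python
from typing import Any, Callable


class Step:
    '''Hypothetical wrapper that lets plain functions be chained with `|`.'''

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func

    def __or__(self, other: 'Step') -> 'Step':
        # The combined step passes this step's output to the next step.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value: Any) -> Any:
        return self.func(value)


# Stand-ins for the prompt and the LLM (assumptions for illustration only).
format_prompt = Step(lambda inputs: 'Tell me about {model_name}.'.format(**inputs))
fake_model = Step(lambda text: f'[model received] {text}')

toy_chain = format_prompt | fake_model
toy_output = toy_chain.invoke({'model_name': 'MLP'})
print(toy_output)
```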

Lastly, we should invoke the chain using some input argument.

In [5]:
model_name = 'Multilayer Perceptron (MLP)' # change to whatever your preferred DL model is
# --------------------------------------------------------------------------------------

answer = chain.invoke({'model_name': model_name})
print(answer)

Here's a trivia fact:

The Multilayer Perceptron (MLP), also known as a feedforward neural network, was first introduced in a 1943 paper by Warren McCulloch and Walter Pitts, two neuroscientists who proposed the basic architecture of artificial neurons and their connections. However, the modern MLP model that we know today, with its backpropagation training algorithm, was popularized by David Rumelhart, Geoffrey Hinton, and Yann LeCun in their 1986 paper "Backpropagation: Theory, Architectures, and Applications".


## RAG Experiment

Next, we define a function that extracts data from the PDFs in a directory and then apply it to the Medical Encyclopedia, which serves as the knowledge base.

In [6]:
def extract_data(path: str) -> List[Document]:
    '''
    Loads every PDF in the directory at the passed path.

    Inputs:
        path: a path to the directory containing the PDFs to be extracted from.

    Returns:
        docs: a list of Document objects containing the extracted data.
    '''
    
    loader = PyPDFDirectoryLoader(path=path)
    docs = loader.load()

    return docs

In [7]:
data = extract_data('../data/')

Now, we define a function that splits the extracted documents into text chunks of up to 500 characters with a 20-character overlap, and apply it to the data.

In [8]:
def split_text(data: List[Document]) -> List[Document]:
    '''
    Splits the input data into text chunks.

    Inputs:
        data: a list of Document objects containing data extracted from a PDF.

    Returns:
        chunks: a list of Document objects containing text chunks.
    '''

    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
    chunks = splitter.split_documents(data)

    return chunks

In [9]:
chunks = split_text(data=data)
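
To show what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that slices a string with a fixed-size sliding window. This is an assumption-laden simplification: `RecursiveCharacterTextSplitter` prefers to break on separators such as paragraph and line breaks, which this hypothetical `sliding_window_chunks` helper ignores.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 20) -> List[str]:
    '''Hypothetical helper: fixed-size character windows that share `chunk_overlap` characters.'''
    if chunk_overlap >= chunk_size:
        raise ValueError('chunk_overlap must be smaller than chunk_size')

    step = chunk_size - chunk_overlap  # how far each new window advances
    pieces = []
    for start in range(0, len(text), step):
        pieces.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return pieces


sample_text = ''.join(str(i % 10) for i in range(1200))
sample_chunks = sliding_window_chunks(sample_text)
print([len(chunk) for chunk in sample_chunks])
```

Each chunk after the first starts with the last 20 characters of the previous one, so a sentence cut at a boundary still appears partly in both chunks.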

At this point, we need to download an embedding model from Hugging Face.

In [10]:
embedding_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')



In [11]:
embedding_model

HuggingFaceEmbeddings(client=SentenceTransformer(
  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
), model_name='sentence-transformers/all-MiniLM-L6-v2', cache_folder=None, model_kwargs={}, encode_kwargs={}, multi_process=False, show_progress=False)
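
The printed pipeline shows mean pooling over token vectors followed by normalization. The sketch below reproduces those two steps on toy three-dimensional vectors, assuming plain averaging; the real model produces 384-dimensional vectors and also accounts for padding tokens, which is omitted here. The helper names are hypothetical.

```python
import math
from typing import List


def mean_pool(token_vectors: List[List[float]]) -> List[float]:
    '''Average the token vectors component by component.'''
    count = len(token_vectors)
    return [sum(column) / count for column in zip(*token_vectors)]


def l2_normalize(vector: List[float]) -> List[float]:
    '''Scale the vector to unit length (left unchanged if it is all zeros).'''
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


toy_tokens = [[1.0, 2.0, 2.0], [5.0, 6.0, -2.0]]  # two made-up token vectors
pooled = mean_pool(toy_tokens)
toy_embedding = l2_normalize(pooled)
print(pooled, toy_embedding)
```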

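Now, we need to populate our Pinecone index with the embedded text chunks. This assumes that the `medical-chatbot` index already exists with dimension 384 (to match the embedding model) and that the `PINECONE_API_KEY` environment variable is set.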

In [12]:
index_name = 'medical-chatbot'
pc_vector_store = PineconeVectorStore.from_documents(chunks, index_name=index_name, embedding=embedding_model)
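
Conceptually, the vector store later answers a query by returning the stored chunks whose embeddings are closest to the query embedding. Here is a minimal in-memory sketch of that lookup, assuming cosine similarity as the metric (the metric of the actual index is not shown in this notebook). The toy vectors and helper names are made up for illustration.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: Sequence[float], store: List[Tuple[str, Sequence[float]]], k: int = 2) -> List[str]:
    '''Return the texts of the k stored vectors most similar to the query.'''
    ranked = sorted(store, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


toy_store = [
    ('chunk about the heart', [0.9, 0.1, 0.0]),
    ('chunk about the liver', [0.1, 0.9, 0.0]),
    ('chunk about arteries', [0.8, 0.2, 0.1]),
]
nearest = top_k([1.0, 0.0, 0.0], toy_store, k=2)
print(nearest)
```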

Next, we should create our prompt template.

In [13]:
template = '''
Please use the following information to answer the user's question.
If you don't know the answer, do NOT try to make one up; just say you don't know.

If the user thanks you, give a typical response.

Context: {context}
Question: {question}

Only return a helpful answer and nothing else below:
'''


In [14]:
prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
chain_kwargs = {'prompt': prompt}

At this point, we will set up our LLM and our chain.

In [15]:
llm = OllamaLLM(model='llama3.1:8b', temperature=0.8, num_gpu=1)

In [16]:
qa = RetrievalQA.from_chain_type(llm=llm, retriever=pc_vector_store.as_retriever(), return_source_documents=True, chain_type_kwargs=chain_kwargs)
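
The chain above retrieves relevant chunks, places them into the `{context}` slot of the prompt, and sends the filled prompt to the LLM. The sketch below walks through that flow with stand-in functions. `answer_with_context`, `fake_retrieve`, and `fake_generate` are hypothetical, and the shortened template is only for illustration.

```python
from typing import Callable, Dict, List

RAG_TEMPLATE = 'Context: {context}\nQuestion: {question}\nAnswer:'


def answer_with_context(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> Dict[str, object]:
    '''Retrieve chunks, join them into one context string, and query the model.'''
    sources = retrieve(question)
    context = '\n\n'.join(sources)
    filled_prompt = RAG_TEMPLATE.format(context=context, question=question)
    return {'query': question, 'result': generate(filled_prompt), 'source_documents': sources}


def fake_retrieve(question: str) -> List[str]:
    # Stand-in for the vector store retriever.
    return ['first retrieved chunk', 'second retrieved chunk']


def fake_generate(filled_prompt: str) -> str:
    # Stand-in for the LLM: echo the prompt so the flow is visible.
    return 'ECHO: ' + filled_prompt


toy_result = answer_with_context('What is a TIA?', fake_retrieve, fake_generate)
print(toy_result['result'])
```

Returning the retrieved chunks alongside the answer mirrors the `return_source_documents=True` setting, which makes it possible to check what the answer was based on.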

Finally, it's time to test!

In [19]:
question = ''
while question.lower() != 'thank you':
    question = input('Enter your medical question or "Thank you" to exit: ')
    print(f'Input: {question}')

    result = qa.invoke({'query': question})
    print(f'Response: {result["result"]}\n')

Input: What is the cause of heart attacks?
Response: Coronary artery disease, caused by an accumulation of fatty materials on the inner linings of arteries (atherosclerosis), leading to blocked or restricted blood flow to the heart, resulting in a heart attack.

Input: What is a TIA?

Input: Is there a cure for AIDS?
Response: There is no cure for AIDS, but with antiretroviral therapy (ART) and other treatments, people living with HIV/AIDS can manage the virus, reduce their viral load to undetectable levels, and live long, healthy lives. However, if left untreated or without proper management, the disease will progress, leading to the symptoms and complications associated with AIDS.

Input: What are the symptoms of bubonic plague?
Response: Unfortunately, this information does not mention bubonic plague. The provided text discusses various parasitic infections (fascioliasis, opisthorchiasis, clonorchiasis, and filariasis) but does not cover bubonic plague or its symptoms.

Input: I hav