### Load Documents

In [1]:
from langchain.document_loaders import PyPDFLoader

In [2]:
# Example document: the Wikipedia article on elephants, saved as a PDF
loader = PyPDFLoader('./data/Elephant-Wikipedia.pdf')
pages = loader.load()

### Split Documents


In [3]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [4]:
# define the text splitter
r_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=200, 
    separators=["\n\n", "\n", " ", ""]
)
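The splitter tries its separators in order. It splits on the first one that occurs in the text (paragraph breaks, then line breaks, then spaces, then single characters), recurses into pieces that are still longer than `chunk_size`, and merges the small pieces back into chunks of at most `chunk_size` characters, repeating up to `chunk_overlap` characters between neighbouring chunks. Below is a simplified pure-Python sketch of that idea. It is not LangChain's implementation. The function names are hypothetical, length is counted in characters (LangChain's default `length_function` is `len`), and, unlike LangChain, separators are dropped when splitting and re-inserted when merging.

```python
def merge_pieces(pieces, sep, chunk_size, chunk_overlap):
    """Greedily join pieces with `sep` into chunks of at most chunk_size characters,
    carrying a tail of at most chunk_overlap characters into the next chunk."""
    chunks, current, length = [], [], 0  # `length` == len(sep.join(current))
    for piece in pieces:
        extra = len(piece) + (len(sep) if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(sep.join(current))
            # Drop pieces from the front until the kept tail fits the overlap budget
            # and leaves room for the new piece.
            while current and (length > chunk_overlap or length + extra > chunk_size):
                length -= len(current[0]) + (len(sep) if len(current) > 1 else 0)
                current.pop(0)
                extra = len(piece) + (len(sep) if current else 0)
        current.append(piece)
        length += extra
    if current:
        chunks.append(sep.join(current))
    return chunks


def recursive_split(text, chunk_size, chunk_overlap, separators=("\n\n", "\n", " ", "")):
    """Split on the first separator found in `text`, recursing into oversized pieces."""
    # "" (split into single characters) always matches, so it acts as the last resort.
    sep, remaining = separators[-1], ()
    for i, candidate in enumerate(separators):
        if candidate == "" or candidate in text:
            sep, remaining = candidate, separators[i + 1:]
            break
    pieces = [p for p in (text.split(sep) if sep else list(text)) if p]

    chunks, small = [], []
    for piece in pieces:
        if len(piece) <= chunk_size:
            small.append(piece)
            continue
        # Flush the small pieces collected so far, then split the oversized piece further.
        if small:
            chunks.extend(merge_pieces(small, sep, chunk_size, chunk_overlap))
            small = []
        if remaining:
            chunks.extend(recursive_split(piece, chunk_size, chunk_overlap, remaining))
        else:
            chunks.append(piece)  # no finer separator left
    if small:
        chunks.extend(merge_pieces(small, sep, chunk_size, chunk_overlap))
    return chunks


print(recursive_split("aaa bbb ccc ddd", chunk_size=7, chunk_overlap=3))
# ['aaa bbb', 'bbb ccc', 'ccc ddd']
```

With the notebook's settings, the same logic targets chunks of at most 1,500 characters that share at most 200 characters with their neighbour.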

In [5]:
# Create our splits from the PDF
docs = r_splitter.split_documents(pages)
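`split_documents` splits each page's `page_content` and copies that page's `metadata` onto every chunk it produces. For `PyPDFLoader`, that metadata includes the `source` path and the `page` number, so a retrieved chunk can still be traced back to its page. The sketch below mimics this using a stand-in `Doc` dataclass and a hypothetical fixed-window splitter, in which consecutive windows start `chunk_size - chunk_overlap` characters apart. The recursive splitter configured in the previous cell is deliberately not used here, so the example stays short.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for LangChain's Document: text plus a metadata dict."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def window_split(text, chunk_size, chunk_overlap):
    """Hypothetical fixed-window splitter: windows start chunk_size - chunk_overlap characters apart."""
    step = chunk_size - chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Stop once a window would add nothing beyond the previous window's overlap.
    starts = range(0, max(len(text) - chunk_overlap, 1), step)
    return [text[i:i + chunk_size] for i in starts]


def split_documents(pages, chunk_size, chunk_overlap):
    """Split every page and give each chunk its own copy of that page's metadata."""
    chunks = []
    for page in pages:
        for piece in window_split(page.page_content, chunk_size, chunk_overlap):
            chunks.append(Doc(piece, dict(page.metadata)))
    return chunks


demo_pages = [
    Doc("abcdefghij", {"source": "Elephant-Wikipedia.pdf", "page": 0}),
    Doc("xyz", {"source": "Elephant-Wikipedia.pdf", "page": 1}),
]
for chunk in split_documents(demo_pages, chunk_size=4, chunk_overlap=2):
    print(chunk.metadata["page"], chunk.page_content)
# 0 abcd
# 0 cdef
# 0 efgh
# 0 ghij
# 1 xyz
```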

In [2]:
import os

def get_contexts() -> tuple:
    """
    Returns the contexts available in the system.

    :return: A tuple containing the contexts.
    """
    # start with a default "free format" context, then add one context per folder in ./contexts
    contexts = ["free format"]
    # collect the names of all subdirectories of ./contexts (plain files are skipped), sorted for a stable order
    context_folders = sorted(
        name for name in os.listdir("./contexts")
        if os.path.isdir(os.path.join("./contexts", name))
    )
    contexts.extend(context_folders)

    return tuple(contexts)

get_contexts()


('free format', 'toeslagen_affaire', 'wiki_elephants')
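To check this logic without a real `./contexts` folder, the sketch below runs a hypothetical variant against a temporary directory. The variant, `get_contexts_from`, takes the base directory as a parameter; that parameter does not exist in the original. Subdirectories become contexts, and plain files are ignored.

```python
import os
import tempfile


def get_contexts_from(base_dir: str) -> tuple:
    """Hypothetical variant of get_contexts() with a configurable base directory."""
    folders = sorted(
        name for name in os.listdir(base_dir)
        if os.path.isdir(os.path.join(base_dir, name))
    )
    return ("free format", *folders)


with tempfile.TemporaryDirectory() as base:
    for name in ("wiki_elephants", "toeslagen_affaire"):
        os.mkdir(os.path.join(base, name))
    # A stray file should not show up as a context.
    open(os.path.join(base, "notes.txt"), "w").close()
    print(get_contexts_from(base))
# ('free format', 'toeslagen_affaire', 'wiki_elephants')
```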

### Create Embeddings & Vectorstore

In [9]:
#from langchain.embeddings.openai import OpenAIEmbeddings
#embeddings = OpenAIEmbeddings()

from langchain.vectorstores import FAISS 

from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(base_url="http://pcloud:11434", model="nomic-embed-text")

# Create a FAISS index
faiss_vector_db = FAISS.from_documents(documents=docs, embedding=embeddings)
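`FAISS.from_documents` embeds every chunk with the embedding model and stores the vectors in an index. By default, LangChain's FAISS wrapper uses an exact (flat) index with Euclidean (L2) distance. `similarity_search` then returns the `k` chunks closest to the embedded query, with `k=4` by default. The brute-force sketch below shows the same idea in pure Python. The 3-dimensional vectors are invented for illustration; real `nomic-embed-text` embeddings have far more dimensions.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(query_vec, index, k=4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, l2_distance(query_vec, vec)) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy index of (chunk text, embedding) pairs; the 3-d vectors are invented.
toy_index = [
    ("African bush elephant", (0.9, 0.1, 0.0)),
    ("African forest elephant", (0.8, 0.2, 0.1)),
    ("Asian elephant", (0.7, 0.0, 0.3)),
    ("Elephant references", (0.0, 0.9, 0.4)),
]
hits = brute_force_search((0.85, 0.1, 0.05), toy_index, k=2)
print([text for text, _ in hits])
# ['African bush elephant', 'African forest elephant']
```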


In [42]:
from langchain_community.vectorstores import Qdrant  # requires the qdrant-client package

qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    location=":memory:",  # Local mode with in-memory storage only
    collection_name="my_documents",
)

In [12]:
# We can test different types of searches (similarity search, MMR, etc.)
question = "What are the main differences between types of elephants?"
found_docs = faiss_vector_db.similarity_search(question)
# found_docs = qdrant.max_marginal_relevance_search(question, k=2, fetch_k=10)
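The commented-out alternative uses maximal marginal relevance (MMR). It first fetches `fetch_k` candidates by similarity, then picks `k` of them one at a time. Each candidate is scored as `lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already picked)`, so near-duplicate chunks get skipped; LangChain's default `lambda_mult` is 0.5. The sketch below implements that selection rule with cosine similarity on invented 2-d vectors. It illustrates the technique and is not LangChain's code. Here `candidates` stands for the `fetch_k` documents that plain similarity search has already returned.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, candidates, k=2, lambda_mult=0.5):
    """Return indices of k candidates chosen by maximal marginal relevance."""
    selected, remaining = [], list(range(len(candidates)))

    def score(i):
        relevance = cosine(query_vec, candidates[i])
        # Similarity to the closest already-selected candidate (0 before the first pick).
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = (1.0, 0.3)
candidates = [(1.0, 0.1), (1.0, 0.12), (0.6, 0.8)]  # candidates 0 and 1 are near-duplicates
by_similarity = sorted(range(len(candidates)), key=lambda i: cosine(query_vec, candidates[i]), reverse=True)
print(by_similarity[:2])                # [1, 0]
print(mmr(query_vec, candidates, k=2))  # [1, 2]
```

Plain similarity returns the two near-duplicates. MMR keeps the best match and then prefers the more distinct third vector.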

In [13]:
found_docs

[Document(page_content='Elephants\nTemporal range:\nA female African bush elephant inMikumi National Park, T anzaniaScientiﬁc classiﬁcation\nDomain:Eukaryota\nKingdom:Animalia\nPhylum:Chordata\nClass:Mammalia\nOrder:Proboscidea\nSuperfamily:Elephantoidea\nFamily:Elephantidae\nGroups included\n▪ Loxodonta Anonymous, 1827\n▪ ElephasLinnaeus, 1758\n▪ †Palaeoloxodon Matsumoto,\n1925ElephantElephants are  the  largest  living  land  animals.  Three  livingspecies  are  currently  recognised:  the  African  bush  elephant\n(Loxodonta africana), the African forest elephant (L. cyclotis),\nand the Asian elephant (Elephas maximus). They are the only\nsurviving members of the familyElephantidae and the orderProboscidea;  extinct  relatives  include  mammoths  andmastodons.  Distinctive  features  of  elephants  include  a  longproboscis called a trunk, tusks, large ear ﬂaps, pillar-like legs,\nand  tough  but  sensitive  grey  skin.  The  trunk  is  prehensile,\nbringing food and water to the mo

### Set up the LLM 

In [22]:
##### Use this code to run a local model through Ollama (llama3 here; llama2 or mistral work the same way)
from langchain.chat_models import ChatOllama
llm = ChatOllama(base_url="http://pcloud:11434", model="llama3", temperature=0)

##### Use this code to connect with OpenAI API
#from langchain.chat_models import ChatOpenAI
#llm = ChatOpenAI(model_name="gpt-3.5-turbo-1106", temperature=0)

**1: RetrievalQA Chain**

**Prompting**

In [23]:
from langchain.prompts import PromptTemplate

# Build prompt
template = """Use the following pieces of context to answer the question at the end. 
If you don't know the answer, just say that you don't know, don't try to make up an answer. 
Use three sentences maximum. Keep the answer as concise as possible. 
Always say "thanks for asking!" at the end of the answer. 
{context}
Question: {question}
Helpful Answer:"""
QA_CHAIN_PROMPT = PromptTemplate.from_template(template)
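With the default "stuff" chain type, the retrieved chunks are concatenated and substituted for `{context}`, and the user's question fills `{question}`. LangChain's default separator between documents is a blank line, `"\n\n"`. The sketch below performs that substitution with plain `str.format` on a shortened copy of the template and two short chunks, to show what the LLM actually receives. The names `short_template`, `stuff_prompt`, and `demo_chunks` are illustrative only.

```python
short_template = """Use the following pieces of context to answer the question at the end.
{context}
Question: {question}
Helpful Answer:"""


def stuff_prompt(template, chunk_texts, question, separator="\n\n"):
    """Concatenate all retrieved chunks into {context} and fill in {question}."""
    return template.format(context=separator.join(chunk_texts), question=question)


demo_chunks = [
    "Three living species are currently recognised.",
    "The Asian elephant is Elephas maximus.",
]
print(stuff_prompt(short_template, demo_chunks, "How many elephant species are there?"))
```

Because every chunk goes into a single prompt, "stuff" needs only one LLM call. The combined text must still fit in the model's context window.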


In [24]:
from langchain.chains import RetrievalQA

# Build the chain; the default chain_type "stuff" puts all retrieved chunks into {context}
qa_chain = RetrievalQA.from_chain_type(
    llm,
    retriever=faiss_vector_db.as_retriever(),
    return_source_documents=True,
    chain_type_kwargs={"prompt": QA_CHAIN_PROMPT}
)

In [32]:
question = "What different sub species of elephants are there?"

In [28]:
result = qa_chain({"query": question})

In [29]:
result["result"]

"I think you meant to ask about elephants, not elphenats!\n\nAccording to the provided context, it seems that there is a mention of Asian elephants and African savannah elephants in reference 100. This suggests that these two subspecies of elephants are being compared.\n\nAdditionally, references 20 and 21 seem to be discussing the paleoneurology of proboscideans (which includes elephants), but they do not specifically mention different subspecies of elephants.\n\nIf you're looking for information on elephant subspecies, I can suggest some general information. There are three main species of elephants: African savannah elephant (Loxodonta africana), African forest elephant (Loxodonta cyclotis), and Asian elephant (Elephas maximus). Within these species, there may be various subspecies or regional variations.\n\nPlease let me know if you'd like more specific information on elephant subspecies!"

In [30]:
result["source_documents"][0]

Document(page_content='35. Shoshani, J.; Eisenberg, J. F . (1982). "Elephas maximus " (https://web.archive.org/web/20150924121940/http://www.science.smith.edu/msi/pdf/i0076-3519-182-01-0001.pdf ) (PDF). Mammalian Species (182): 1–8. doi:10.2307/3504045  (https://doi.org/10.2307%2F3504045 ). JSTOR\xa03504045  (https://www.jstor.org/stable/3504045). Archived from the original  (http://www.science.smith.edu/msi/pdf/i0076-3519-182-01-0001.pdf ) (PDF) on 24 September 2015. Retrieved 27 October 2012.\n36. "The Largest Land Animals In The World"  (https://factanimal.com/animal-facts/largest-land-animals-in-the-world/ ). Fact Animal. Retrieved 29 September 2023.\n37. Shoshani, pp. 68–70.\n38. Somgrid, C. "Elephant Anatomy and Biology: Skeletal system"  (https://web.archive.org/web/20120613191055/http://www.asianelephantresearch.com/about-elephant-anatomy-and-biology-p1.php#skeleton ). Elephant Research and Education\nCenter, Department of Companion Animal and Wildlife Clinics, Faculty of\nVete

**Optional: Alternative Chain Types - Map Reduce**

In [33]:
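# map_reduce counts prompt tokens with a GPT-2 tokenizer from the Hugging Face `transformers` package, so install it first;
# the warnings printed below come from loading and using that tokenizer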

qa_chain_mr = RetrievalQA.from_chain_type(
    llm,
    retriever=faiss_vector_db.as_retriever(),
    chain_type="map_reduce"
)
result = qa_chain_mr({"query": question})
result["result"]
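With `chain_type="map_reduce"`, the LLM is called once per retrieved chunk, with each call seeing only that chunk (the "map" step). One more call then combines the per-chunk outputs into the final answer (the "reduce" step). This variant therefore makes more LLM calls than "stuff". A minimal sketch follows, using a stand-in `llm` callable. The prompt wording and the `fake_llm` function are hypothetical, not LangChain's prompts.

```python
def map_reduce_answer(chunk_texts, question, llm):
    """Map: query each chunk on its own. Reduce: combine the per-chunk outputs."""
    # Map step: one LLM call per chunk, each seeing only that chunk.
    partial = [
        llm(f"Return any text relevant to the question.\n{text}\nQuestion: {question}")
        for text in chunk_texts
    ]
    # Reduce step: one more call over all partial outputs.
    combined = "\n".join(partial)
    return llm(f"Answer the question using these extracts.\n{combined}\nQuestion: {question}")


llm_calls = []


def fake_llm(prompt):
    """Hypothetical stand-in for the chat model: records the prompt, returns a numbered reply."""
    llm_calls.append(prompt)
    return f"answer #{len(llm_calls)}"


print(map_reduce_answer(["chunk A", "chunk B", "chunk C"], "Which species exist?", fake_llm))
# answer #4
print(len(llm_calls))  # 4 = three map calls + one reduce call
```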

  from .autonotebook import tqdm as notebook_tqdm
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Token indices sequence length is longer than the specified maximum sequence length for this model (1051 > 1024). Running this sequence through the model will result in indexing errors


'According to the provided texts, there are several recognized subspecies of elephants:\n\n**African Savanna Elephant (Loxodonta africana)**\n\n* **Loxodonta africana kudu**: Found in southern Africa.\n* **Loxodonta africana oxyodontis**: Found in eastern Africa.\n\n**African Forest Elephant (Loxodonta cyclotis)**\n\n* **Loxodonta cyclotis cyclotis**: Found in central Africa.\n* **Loxodonta cyclotis wellsi**: Found in western Africa.\n\n**Asian Elephant (Elephas maximus)**\n\n* **Elephas maximus indicus**: Found in India and Sri Lanka.\n* **Elephas maximus maximus**: Found in Southeast Asia.\n\nPlease note that taxonomy is constantly evolving, and some sources may group these subspecies differently.'

**Optional: Alternative Chain Types - Refine**

In [34]:
qa_chain_r = RetrievalQA.from_chain_type(
    llm,
    retriever=faiss_vector_db.as_retriever(),
    chain_type="refine"
)
result = qa_chain_r({"query": question})
result["result"]
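`chain_type="refine"` works sequentially instead. The first chunk produces an initial answer, and each further chunk is sent to the LLM together with the current answer and a request to refine it. This explains the phrasing of the output above ("With this additional context, I can refine my previous answer"): the model is responding to refine prompts. A minimal sketch follows, using a stand-in `llm` callable. The prompt wording and `fake_llm` are hypothetical.

```python
def refine_answer(chunk_texts, question, llm):
    """Answer from the first chunk, then ask the LLM to refine that answer with each later chunk."""
    if not chunk_texts:
        raise ValueError("need at least one chunk")
    answer = llm(f"Context:\n{chunk_texts[0]}\nQuestion: {question}")
    for text in chunk_texts[1:]:
        answer = llm(
            f"Existing answer: {answer}\n"
            f"Refine the existing answer (only if needed) using this new context:\n{text}\n"
            f"Question: {question}"
        )
    return answer


refine_prompts = []


def fake_llm(prompt):
    """Hypothetical stand-in for the chat model."""
    refine_prompts.append(prompt)
    return f"draft {len(refine_prompts)}"


print(refine_answer(["chunk A", "chunk B", "chunk C"], "Which species exist?", fake_llm))
# draft 3
```

Unlike map_reduce, the calls cannot run in parallel, because each one depends on the previous answer.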

"With this additional context, I can refine my previous answer to provide more comprehensive information about elephant sub- species.\n\nThe original answer mentioned three living species of elephants:\n\n1. **African Savanna Elephant** (Loxodonta africana)\n2. **African Forest Elephant** (Loxodonta cyclotis) - previously considered a separate species, but now recognized as a subspecies of the African Savanna Elephant\n3. **Asian Elephant** (Elephas maximus)\n\nHowever, with this new context, I'd like to highlight some additional information:\n\n* The International Elephant Foundation provides insights into elephant conservation and research.\n* The Wikipedia article on Elephants provides information on the taxonomy, evolution, and conservation status of elephants.\n\nRegarding elephant sub-species, I'd like to note that:\n\n* There are three recognized species of elephants: African Savanna Elephant, African Forest Elephant, and Asian Elephant.\n* Within these species, there may be sub

### Memory

In [35]:
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True
)
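`ConversationBufferMemory` keeps every turn of the conversation and returns the whole history under `memory_key` (here `chat_history`). With `return_messages=True` it returns a list of message objects rather than one formatted string. The class below is a hypothetical pure-Python stand-in that mimics this behaviour using `(role, text)` tuples. Its method names echo LangChain's, but the signatures are simplified and LangChain's message classes are not used.

```python
class BufferMemory:
    """Hypothetical stand-in for ConversationBufferMemory that keeps every turn verbatim."""

    def __init__(self, memory_key="chat_history", return_messages=True):
        self.memory_key = memory_key
        self.return_messages = return_messages
        self.messages = []  # (role, text) tuples, oldest first

    def save_context(self, user_text, ai_text):
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def load_memory_variables(self):
        if self.return_messages:
            return {self.memory_key: list(self.messages)}
        # Without return_messages, flatten the history into one transcript string.
        lines = [f"{'Human' if role == 'human' else 'AI'}: {text}" for role, text in self.messages]
        return {self.memory_key: "\n".join(lines)}


demo_memory = BufferMemory()
demo_memory.save_context("What subspecies are there?", "Three living species are recognised.")
print(demo_memory.load_memory_variables())
```

A buffer memory never forgets, so the history passed to the model keeps growing with every turn.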

### Conversational Retrieval Chain

In [37]:
from langchain.chains import ConversationalRetrievalChain
retriever = faiss_vector_db.as_retriever()
qa = ConversationalRetrievalChain.from_llm(
    llm,
    retriever=retriever,
    memory=memory
)
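`ConversationalRetrievalChain` adds a step before retrieval. When there is earlier chat history, it first asks the LLM to rewrite the follow-up into a standalone question, then retrieves chunks for that rewritten question and answers it. On the first turn the question is used as-is. This lets a follow-up that depends on earlier turns still retrieve relevant chunks. The sketch below shows that control flow using hypothetical stand-ins for the model and the retriever. The condense prompt is paraphrased, not LangChain's exact wording.

```python
def conversational_turn(question, history, llm, retrieve):
    """One chat turn: condense the follow-up (if there is history), retrieve, then answer."""
    if history:
        transcript = "\n".join(f"{role}: {text}" for role, text in history)
        standalone = llm(
            "Rephrase the follow-up question as a standalone question.\n"
            f"Chat history:\n{transcript}\nFollow-up: {question}\nStandalone question:"
        )
    else:
        standalone = question  # first turn: nothing to condense
    context = "\n\n".join(retrieve(standalone))
    answer = llm(f"{context}\nQuestion: {standalone}\nHelpful Answer:")
    # In the notebook, the memory object stores the turn; here we append to a plain list.
    history.extend([("Human", question), ("AI", answer)])
    return answer


chat_calls, retrieved_for = [], []


def fake_llm(prompt):
    """Hypothetical model stand-in: returns a rewrite for condense prompts, else a canned answer."""
    chat_calls.append(prompt)
    if prompt.endswith("Standalone question:"):
        return "Which animals are close relatives of elephants?"
    return "An answer based on the retrieved chunks."


def fake_retrieve(query):
    """Hypothetical retriever stand-in that records what it was asked for."""
    retrieved_for.append(query)
    return ["(retrieved chunk)"]


demo_history = []
conversational_turn("What subspecies of elephants are there?", demo_history, fake_llm, fake_retrieve)
conversational_turn("And their close relatives?", demo_history, fake_llm, fake_retrieve)
print(retrieved_for)
# ['What subspecies of elephants are there?', 'Which animals are close relatives of elephants?']
```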

In [38]:
question = "What different sub species of elephants are there?"
result = qa({"question": question})

In [40]:
print(result['answer'])

There are three living species of elephant, and several subspecies:

1. **African Savanna Elephant** (Loxodonta africana):
	* African Forest Elephant (L. a. forestalis): found in the Congo Basin.
	* African Bush Elephant (L. a. kudu): found in southern Africa.
2. **Asian Elephant** (Elephas maximus):
	* Sri Lankan Elephant (E. m. maximus): found only in Sri Lanka.
	* Indian Elephant (E. m. indicus): found in India, Nepal, and parts of Southeast Asia.
	* Sumatran Elephant (E. m. sumatrensis): found only in Sumatra, Indonesia.
3. **Desert Elephant** (Loxodonta africana arabica):
	* Also known as the North African Elephant or Arabian Elephant, this subspecies is found in the deserts of North Africa.

There are also several extinct subspecies of elephants, including:

* **Woolly Mammoth** (Mammuthus primigenius): an Ice Age-era elephant that lived in northern hemisphere.
* **Dwarf Elephant** (Elephas minus): a small-bodied elephant that lived on the island of Crete during the Pleistocene e

In [42]:
# Ask a follow-up question
question = "Can you tell me more about the taxonomy of elephants?"
result = qa({"question": question})

In [43]:
print(result['answer'])

The current consensus among taxonomists regarding the classification of elephant species and subspecies is as follows:

* There are two main species of elephants:
	+ African Elephant (Loxodonta africana)
	+ Asian Elephant (Elephas maximus)
* The African Elephant has three recognized subspecies:
	+ Loxodonta africana africana (Savanna Elephant)
	+ Loxodonta africana kudu (Forest Elephant)
	+ Loxodonta africana oxyodontia (Desert Elephant)
* The Asian Elephant has two recognized subspecies:
	+ Elephas maximus maximus (Indian Elephant)
	+ Elephas maximus indicus (Sri Lankan Elephant)

It's worth noting that some taxonomists recognize a third species of elephant, the Forest Elephant (Loxodonta cyclotis), which is considered to be a distinct species by some authors. However, this classification is not universally accepted and is still a matter of debate among taxonomists.

Additionally, there are several recognized hybrids between African and Asian elephants, such as the Motty (a captive hy

In [44]:
# Ask a follow-up question
question = "What are the species that are close relatives to elephants?"
result = qa({"question": question})

In [45]:
print(result['answer'])

According to the provided context, some of the closest relatives to elephants include:

1. Mammoths: These were extinct relatives of elephants.
2. Mastodons: Another extinct relative of elephants.

These species are part of the order Proboscidea and family Elephantidae, which includes modern elephant species such as African bush elephants (Loxodonta africana), African forest elephants (L. cyclotis), and Asian elephants (Elephas maximus).

Additionally, some other mammals that are not directly related to elephants but share similarities with them include:

1. Manatees: These aquatic mammals belong to the order Sirenia and are also part of the clade Afrotheria.
2. Dugongs: Another aquatic mammal, dugongs are also part of the order Sirenia and clade Afrotheria.

These species share some physical characteristics with elephants, such as their large size, herbivorous diet, and aquatic adaptations.


In [47]:
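# KeyError: the chain was built without return_source_documents=True (the memory would then also need output_key="answer")
result["source_documents"][0]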

KeyError: 'source_documents'