In [2]:
!pip -qqq install -U pip --progress-bar off
!pip -qqq install langchain-groq==0.1.3 --progress-bar off
!pip -qqq install langchain==0.1.17 --progress-bar off
!pip -qqq install llama-parse==0.1.3 --progress-bar off
!pip -qqq install qdrant-client==1.9.1 --progress-bar off
!pip -qqq install "unstructured[md]"==0.13.6 --progress-bar off
!pip -qqq install fastembed==0.2.7 --progress-bar off
!pip -qqq install flashrank==0.2.4 --progress-bar off

In [4]:

import os
import textwrap
from pathlib import Path

from google.colab import userdata
from IPython.display import Markdown
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import FlashrankRerank
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import UnstructuredMarkdownLoader
from langchain_community.embeddings.fastembed import FastEmbedEmbeddings
from langchain_community.vectorstores import Qdrant
from langchain_groq import ChatGroq
from llama_parse import LlamaParse

os.environ["GROQ_API_KEY"] = userdata.get("GROQ_API_KEY")


def print_response(response):
    response_txt = response["result"]
    for chunk in response_txt.split("\n"):
        if not chunk:
            print()
            continue
        print("\n".join(textwrap.wrap(chunk, 100, break_long_words=False)))



In [6]:
!mkdir -p data
!gdown 1HIkSzCqGB82ggcibMgHN-xOx689o558h -O "data/school-data.pdf"

Downloading...
From: https://drive.google.com/uc?id=1HIkSzCqGB82ggcibMgHN-xOx689o558h
To: /content/data/school-data.pdf
100% 722k/722k [00:00<00:00, 118MB/s]


**Document Parsing**

In [8]:

instruction = """The provided document contains information about Madonna University, a university in Nigeria.
It provides detailed information about the school and the relevant information a student will need to know about it.
It is structured data; it contains some links and some improperly formatted text, but the information is correct.
Try to be precise while answering the questions."""

parser = LlamaParse(
    api_key=userdata.get("LLAMA_PARSE"),
    result_type="markdown",
    parsing_instruction=instruction,
    max_timeout=5000,
)

llama_parse_documents = await parser.aload_data("./data/school-data.pdf")

Started parsing the file under job_id d6191416-eee1-4799-bee1-e4d6822a9e98


In [9]:

parsed_doc = llama_parse_documents[0]

In [10]:
Markdown(parsed_doc.text[:4096])

# Admission

Apply To Madonna University, Nigeria

# Admission Requirements

Applicants are expected to have five (5) O’ Level Credit Passes, including English Language and Mathematics, in not more than two sittings, and three (3) other subjects relevant to the course of study.

Madonna University enjoys both NUC and Relevant Boards/Council Accreditation in all her academic programmes.

Madonna University operates under a stable academic curriculum with no disruption of academic exercise through strikes.

Madonna University provides 100% excellent boarding facilities for all students and unhindered campus-wide internet access.

Madonna University provides Free Body Building/Weight Training/Fitness Programmes for students and unhindered campus-wide internet access.

# GUIDELINES FOR ONLINE ADMISSION PROCEDURES INTO MADONNA UNIVERSITY, NIGERIA

1. Scan your credentials (passport, birth certificate, O-level and JAMB results) and save the scanned credentials on a device.
2. Open a web browser, e.g. Chrome, Opera Mini, Mozilla Firefox, Internet Explorer, type www.madonnauniversity.edu.ng into the search bar and click on “apply now.”
3. Complete the application form (fill it in correctly using upper case all through, except for your email). Please do not fill in the space for a maiden name if you are male or currently single.
4. After filling in and successfully submitting your form, upload your scanned credentials as indicated on the page by clicking on “browse.” Once you’re done, click submit.
5. After successfully submitting your form, copy or print your application number (for example, 2021/2022/app/00012) and access code. The application number is what you will use to make any payment at the bank until you are fully admitted into the school.
6. Your login access code is your ‘surname,’ and you have to type it the way you typed it during your portal registration each time you want to log in.
---
Applicants are to pay the sum of ₦6,000 (six thousand Naira only) at Mayfresh Mortgage Bank for medical test, and then proceed with their original credentials for oral interview at the ADMISSIONS OFFICE of any of the Madonna University campuses: (Elele, Okija or Akpugo).

If interview is successful, then proceed to pay your school fees at any branch of Mayfresh Mortgage Bank nearest to you.

# Things To Know First

Five (5) O’ Level Credit Level Passes at not more than two sittings to include English Language and Mathematics and three (3) other subjects relevant to course of study.

- All prospective undergraduates who scored up to 140 in the most recent JAMB UTME are qualified to apply including those who did not choose Madonna as first choice.
- The above cut-off mark applies to all other courses except Law (200), Medicine (200), Pharmacy (200), Nursing (200) and Medical Laboratory Science (160).
- Candidates awaiting SSCE result or its equivalent are also eligible to apply for admission provided the result will be available on resumption. Form is available online for Applicants. Open this link, fill and submit the form online.
- Screening and Admission forms are FREE
- All intending Candidates must undergo medical examination in Madonna University Medical Centres on payment of necessary medical fees.

# Request information

MODE OF PAYMENT

NOTE: For the payment of school fees, the depositor’s name MUST be the student’s name/Registration number. Please do not transfer if your ward’s name/registration number cannot be included in the narration to avoid challenges in confirmation and collection of receipt.

MODE OF PAYMENT
1. Mayfresh Mortgage Bank LTD
2. OHHA Microfinance Bank
3. Commercial Bank

i. ELELE CAMPUS:

|ACCOUNT NAME:|MAYFRESH MORTGAGE BANK|
|---|---|
|ACCOUNT NUMBER:|1019208827|
|BANK:|UBA|
---
ii. OKIJA CAMPUS:

|ACCOUNT NAME:|MAYFRESH MORTGAGE BANK|
|---|---|
|ACCOUNT NUMBER:|2013702605|
|BANK:|FIRST BANK|

iii. AKPUGO CAMPUS:

|ACCOUNT NAME:|MAYFRESH SAVINGS & LOAN|
|---|---|
|ACCOUNT NUMBER:|1015688731|
|BANK:|UBA|

PAYMENT THROUGH THE COMMERCIAL BANK:

- After payment at the Commercial bank, click on Mayfresh Mortgage Bank

In [12]:

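document_path = Path("data/parsed_document.md")
# Overwrite ("w") instead of appending ("a") so that re-running this cell
# does not duplicate the parsed text (and therefore the chunks) in the file.
with document_path.open("w", encoding="utf-8") as f:
    f.write(parsed_doc.text)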

**Vector Embeddings**


In [13]:

loader = UnstructuredMarkdownLoader(document_path)
loaded_documents = loader.load()

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


In [14]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=2048, chunk_overlap=128)
docs = text_splitter.split_documents(loaded_documents)
len(docs)

47
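`RecursiveCharacterTextSplitter` first tries to break on paragraph, line and word boundaries before falling back to single characters, so its chunks vary in length. The arithmetic behind `chunk_size` and `chunk_overlap` is easier to see with a simplified fixed-window chunker. The helper below is hypothetical and for illustration only, not LangChain's algorithm: each window holds at most `chunk_size` characters and starts `chunk_size - chunk_overlap` characters after the previous one, so neighbouring chunks share `chunk_overlap` characters (with the settings above, each window would advance 1,920 characters).

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows overlap by chunk_overlap characters. Simplified
    illustration only: it ignores paragraph, line and word boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window already reached the end
            break
    return chunks


print(fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))  # ['abcd', 'defg', 'ghij']
```

The overlap means a sentence cut at a window boundary still appears intact in at least one chunk when it is shorter than the overlap, at the cost of storing some text twice.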

In [15]:

print(docs[0].page_content)

Admission

Apply To Madonna University, Nigeria

Admission Requirements

Applicants are expected to have five (5) O’ Level Credit Passes, including English Language and Mathematics, in not more than two sittings, and three (3) other subjects relevant to the course of study.

Madonna University enjoys both NUC and Relevant Boards/Council Accreditation in all her academic programmes.

Madonna University operates under a stable academic curriculum with no disruption of academic exercise through strikes.

Madonna University provides 100% excellent boarding facilities for all students and unhindered campus-wide internet access.

Madonna University provides Free Body Building/Weight Training/Fitness Programmes for students and unhindered campus-wide internet access.

GUIDELINES FOR ONLINE ADMISSION PROCEDURES INTO MADONNA UNIVERSITY, NIGERIA

Scan your credentials (passport, birth certificate, O-level and JAMB results) and save the scanned credentials on a device.

Open a web browser, e.g. Chrom

In [16]:
embeddings = FastEmbedEmbeddings(model_name="BAAI/bge-base-en-v1.5")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
qdrant = Qdrant.from_documents(
    docs,
    embeddings,
    # location=":memory:",
    path="./db",
    collection_name="document_embeddings",
)

In [21]:

%%time
query = "When was madonna university founded?"
similar_docs = qdrant.similarity_search_with_score(query)

CPU times: user 362 ms, sys: 1.91 ms, total: 363 ms
Wall time: 363 ms
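`similarity_search_with_score` returns `(document, score)` pairs. Assuming the LangChain Qdrant integration's default cosine distance, the score is the cosine similarity between the query embedding and each chunk embedding, so higher means more similar. A minimal pure-Python sketch of that ranking, using made-up three-dimensional vectors in place of real 768-dimensional `bge-base-en-v1.5` embeddings (the document IDs are hypothetical). Note that Qdrant searches an approximate index rather than scoring every vector as this sketch does.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], corpus: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Hypothetical toy embeddings, not real model output.
corpus = {
    "history": [0.9, 0.1, 0.0],
    "programmes": [0.6, 0.4, 0.2],
    "payments": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
for doc_id, score in top_k(query, corpus, k=2):
    print(f"{doc_id}: {score:.3f}")
```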


In [22]:

for doc, score in similar_docs:
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {score}")
    print("-" * 80)
    print()

text: Discover Madonna University, Nigeria - About Us

Madonna's History

Our story dates back to the late 90s, when Very Rev. Fr. Prof. E.M.P. Edeh CSSp, OFR, founded the prestigious Madonna University. By 1999, she had commenced active operation with the appro

score: 0.7595845693627044
--------------------------------------------------------------------------------

text: Medicine & Surgery

Pharmacy

Biochemistry

Computer Science

Industrial Chemistry

Microbiology

Medical Laboratory Science

Optometry

Public Health

Nursing Science

Madonna University, Okija Campus, Anambra State

Founded in 1999, as the first campus o

score: 0.7423774143751911
--------------------------------------------------------------------------------

text: Medical Laboratory Science Department - Madonna University

Medical Laboratory Science Department

History

The Department of Medical Laboratory Science of the Faculty of Health Sciences College of Health Sciences Madonna University was established i

In [23]:

%%time
retriever = qdrant.as_retriever(search_kwargs={"k": 5})
retrieved_docs = retriever.invoke(query)

CPU times: user 368 ms, sys: 1.53 ms, total: 369 ms
Wall time: 377 ms


In [24]:
for doc in retrieved_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print("-" * 80)
    print()

id: 3a6489e40f774d598da39b3c085646ec

text: Discover Madonna University, Nigeria - About Us

Madonna's History

Our story dates back to the late 90s, when Very Rev. Fr. Prof. E.M.P. Edeh CSSp, OFR, founded the prestigious Madonna University. By 1999, she had commenced active operation with the appro

--------------------------------------------------------------------------------

id: 1be110fb5f824870a21032688edd8c4c

text: Medicine & Surgery

Pharmacy

Biochemistry

Computer Science

Industrial Chemistry

Microbiology

Medical Laboratory Science

Optometry

Public Health

Nursing Science

Madonna University, Okija Campus, Anambra State

Founded in 1999, as the first campus o

--------------------------------------------------------------------------------

id: 1ced22dcc6d44bad8171ec6afea723e4

text: Medical Laboratory Science Department - Madonna University

Medical Laboratory Science Department

History

The Department of Medical Laboratory Science of the Faculty of Health Sciences C

**Reranking**

In [25]:

compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

Downloading ms-marco-MiniLM-L-12-v2...


ms-marco-MiniLM-L-12-v2.zip: 100%|██████████| 21.6M/21.6M [00:00<00:00, 133MiB/s]


In [26]:

%%time
reranked_docs = compression_retriever.invoke(query)
len(reranked_docs)

Running pairwise ranking..
CPU times: user 3.27 s, sys: 174 ms, total: 3.45 s
Wall time: 3.53 s


3
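The compression retriever is a two-stage pipeline: the base retriever fetches `k=5` candidates by embedding similarity, then the reranker rescores each (query, passage) pair and keeps only the best few. The three results above are consistent with `FlashrankRerank`'s `top_n` default, which is 3 in the LangChain version pinned here (worth checking for your installed version). The sketch below shows only that control flow; the word-overlap score is a hypothetical stand-in for the `ms-marco-MiniLM-L-12-v2` cross-encoder, which actually reads the query and passage together and is far more accurate.

```python
def overlap_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder: share of query words found in the passage."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & passage_words) / len(query_words)


def rerank(query: str, candidates: list[str], top_n: int = 3) -> list[tuple[float, str]]:
    """Second stage: rescore every first-stage candidate and keep the top_n."""
    scored = [(overlap_score(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort keeps ties in input order
    return scored[:top_n]


# Hypothetical first-stage results (e.g. the k=5 hits from the vector store).
candidates = [
    "pharmacy and nursing programmes",
    "history of the department",
    "the university was founded in 1999",
    "admission requirements",
    "okija campus founded as the first campus",
]
for score, passage in rerank("when was the university founded", candidates):
    print(f"{score:.2f}  {passage}")
```

Reranking is slower per document than vector search (compare the 3.5 s wall time above with about 0.37 s for retrieval alone), which is why it is applied only to the handful of candidates the first stage returns.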

In [27]:

for doc in reranked_docs:
    print(f"id: {doc.metadata['_id']}\n")
    print(f"text: {doc.page_content[:256]}\n")
    print(f"score: {doc.metadata['relevance_score']}")
    print("-" * 80)
    print()

id: 1be110fb5f824870a21032688edd8c4c

text: Medicine & Surgery

Pharmacy

Biochemistry

Computer Science

Industrial Chemistry

Microbiology

Medical Laboratory Science

Optometry

Public Health

Nursing Science

Madonna University, Okija Campus, Anambra State

Founded in 1999, as the first campus o

score: 0.9993513226509094
--------------------------------------------------------------------------------

id: 3a6489e40f774d598da39b3c085646ec

text: Discover Madonna University, Nigeria - About Us

Madonna's History

Our story dates back to the late 90s, when Very Rev. Fr. Prof. E.M.P. Edeh CSSp, OFR, founded the prestigious Madonna University. By 1999, she had commenced active operation with the appro

score: 0.9990987181663513
--------------------------------------------------------------------------------

id: 1ced22dcc6d44bad8171ec6afea723e4

text: Medical Laboratory Science Department - Madonna University

Medical Laboratory Science Department

History

The Department of Medical La

**Q&A Over Document**

In [28]:

llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

In [29]:
prompt_template = """
You are a support chatbot for university students. Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

Context: {context}
Question: {question}

Answer the question and provide additional helpful information
based on the pieces of information, if applicable. Be succinct.

Responses should be properly formatted to be easily read.
"""

prompt = PromptTemplate(
    template=prompt_template, input_variables=["context", "question"]
)

In [30]:

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": True},
)

In [32]:

%%time
response = qa.invoke("What is the school vision")

Running pairwise ranking..


> Entering new StuffDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:

You are a support chatbot for university students. Use the following pieces of information to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.

Context: Programme Structure

The Department of Optometry, Madonna University runs a six-year unclassified Doctor of Optometry degree Programme. The Programme is geared towards training students to become competent professional optometrists. Courses to be taken include Basic Sciences (Physics, Chemistry, Biology, Mathematics); Social Sciences (Philosophy, Theology, Information Sciences, People and Culture, Psychology, Peace and Conflict resolution, Bioethics etc.); Languages (English, French and German); Basic medical sciences (Anatomy, Physiology, Biochemistry, Microbiology, Biostatistics etc.) and optometry course

In [33]:

print_response(response)

**School Vision**

The school's vision is not explicitly stated in the provided information. However, the philosophy of
the Optometry programme is "teaching, research and service to mankind with dignity."

**Additional Information**

The Department of Optometry at Madonna University aims to instill in students the knowledge and
skills needed to practice optometry effectively and efficiently, and to be competent in all its
applications for the benefit of humanity worldwide.


In [34]:
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)
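With `chain_type="stuff"`, every document returned by the compression retriever is "stuffed" into the single `{context}` slot of the prompt, so one LLM call sees all of them. A rough pure-Python sketch of that formatting step follows. The two-newline separator between documents is an assumption about LangChain's default, and the helper name and shortened template are hypothetical; the two passages paraphrase text retrieved earlier in this notebook.

```python
def stuff_prompt(template: str, passages: list[str], question: str, separator: str = "\n\n") -> str:
    """Join all retrieved passages into one context string and fill the template."""
    context = separator.join(passages)
    return template.format(context=context, question=question)


# Shortened, hypothetical version of the notebook's prompt template.
template = "Context: {context}\nQuestion: {question}\nBe succinct."
passages = [
    "The Department of Optometry runs a six-year Doctor of Optometry programme.",
    "Madonna University commenced active operation in 1999.",
]
print(stuff_prompt(template, passages, "What is the university vision?"))
```

Because everything goes into one call, "stuff" is bounded by the model's context window (`llama3-70b-8192`, going by its name, allows 8,192 tokens). Reranking down to three chunks of at most 2,048 characters each keeps the prompt well within that limit. It also explains the answers above: if no retained chunk mentions the vision, the model can only say it does not know.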

In [37]:

%%time
response = qa.invoke("what is the university vision")

Running pairwise ranking..
CPU times: user 2.38 s, sys: 105 ms, total: 2.48 s
Wall time: 3.74 s


In [38]:
print_response(response)

**University Vision**

Unfortunately, the provided information does not explicitly state the university's vision. However,
I can provide some context about the Department of Optometry's objectives and programme structure.

The Department of Optometry at Madonna University aims to train students to become competent
professional optometrists. The six-year Doctor of Optometry degree programme is designed to equip
students with a broad range of skills and knowledge in basic sciences, social sciences, languages,
and optometry courses.

If you have any further questions or would like more information about the programme, feel free to
ask!
