#Installing Ollama and pulling the phi4 model

In [1]:
!curl -fsSL https://ollama.com/install.sh | sh
!nohup ollama serve > output.log 2>&1 &
!ollama pull phi4

>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
######################################################################## 100.0%
>>> Creating ollama user...
>>> Adding ollama user to video group...
>>> Adding current user to ollama group...
>>> Creating ollama systemd service...
>>> The Ollama API is now available at 127.0.0.1:11434.
>>> Install complete. Run "ollama" from the command line.
pulling manifest
pulling fd7b6731c33c...   0% ▕▏    0 B/9.1 GB
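
Because `ollama serve` is started in the background, the following `ollama pull` can race the server's startup. A minimal sketch of a readiness check, assuming the API answers a plain HTTP GET at the address printed by the installer (127.0.0.1:11434). The helper names `http_probe` and `wait_for_server` are hypothetical, and the timeout values are arbitrary:

```python
import time
import urllib.error
import urllib.request


def http_probe(url: str) -> bool:
    """Return True if an HTTP GET to `url` gets a 2xx response."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


def wait_for_server(url: str, timeout: float = 30.0, interval: float = 0.5,
                    probe=http_probe) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In the notebook, calling `wait_for_server("http://127.0.0.1:11434")` between the `ollama serve` and `ollama pull` lines would be one place to use it.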

#Installing LangChain and the FAISS vector database


In [2]:
!pip install langchain faiss-cpu PyMuPDF requests langchain-community

Collecting faiss-cpu
  Downloading faiss_cpu-1.10.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.4 kB)
Collecting PyMuPDF
  Downloading pymupdf-1.25.5-cp39-abi3-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.4 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.22-py3-none-any.whl.metadata (2.4 kB)
Collecting langchain-core<1.0.0,>=0.3.51 (from langchain)
  Downloading langchain_core-0.3.55-py3-none-any.whl.metadata (5.9 kB)
Collecting langchain
  Downloading langchain-0.3.24-py3-none-any.whl.metadata (7.8 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (from langchain-community)
  Downloading pydantic_settings-2.9.1-py3-none-any.whl.metadata (3.8 kB)
Collecting httpx-sse<1.0.0,>=0.4.0 (from langchain-community)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting marshmallow<4.0.0,>=3.18.0 
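
Since the pip log is long, a quick check of what actually got installed can be handy. A minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the install command above:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    names = ["langchain", "langchain-community", "faiss-cpu", "PyMuPDF", "requests"]
    for name, ver in installed_versions(names).items():
        print(f"{name}: {ver or 'not installed'}")
```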

##Importing the essential packages and loading the context PDF to feed the model

In [16]:
import os

import requests
from langchain_community.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.chat_models import ChatOllama
from langchain.chains import RetrievalQA

# Download the PDF from the URL if it does not already exist in local storage
pdf_url = "https://pdfs.semanticscholar.org/0886/0a3b15ff16e081899a437a36832349c3aa65.pdf"
pdf_path = "research.pdf"
if not os.path.exists(pdf_path):
    response = requests.get(pdf_url, timeout=60)
    response.raise_for_status()  # fail early instead of saving an error page as a PDF
    with open(pdf_path, "wb") as f:
        f.write(response.content)
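
A failed or redirected download can leave an HTML error page saved under a `.pdf` name. A small sketch of a sanity check, assuming it is enough to look for the `%PDF-` signature at the start of the file; `looks_like_pdf` is a hypothetical helper, not part of `requests` or PyMuPDF:

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file at `path` starts with the PDF signature."""
    with open(path, "rb") as f:
        return f.read(5) == b"%PDF-"
```

For example, `looks_like_pdf(pdf_path)` could be checked right after the download, before handing the file to the loader.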

##After downloading the PDF, load it and split it into smaller chunks
###Here the chunk size is 1500 characters and the overlap is 250 characters

In [17]:
loader = PyMuPDFLoader(pdf_path)
docs = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=250)
chunks = splitter.split_documents(docs)
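
To make the two parameters concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs, lines, and spaces, so real chunks are often shorter than the limit); `sliding_chunks` is a hypothetical helper that only illustrates how size and overlap interact:

```python
def sliding_chunks(text, chunk_size=1500, overlap=250):
    """Cut `text` into windows of `chunk_size` characters sharing `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # distance between the starts of consecutive chunks
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks
```

With the settings used in this notebook, each new window starts 1250 characters (1500 - 250) after the previous one, so the last 250 characters of one chunk reappear at the start of the next.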

In [18]:
!file research.pdf

research.pdf: PDF document, version 1.5, 4 pages


##Creating the embeddings and the vector store, and defining the phi4 model

In [19]:
embeddings = OllamaEmbeddings(model="phi4")
vectorstore = FAISS.from_documents(chunks, embeddings)

# Retrieval-based QA
llm = ChatOllama(model="phi4")
qa = RetrievalQA.from_chain_type(llm=llm, retriever=vectorstore.as_retriever(), chain_type="stuff")
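
Conceptually, the retriever finds the chunks whose embeddings are closest to the question's embedding, and the `stuff` chain type places all of them into a single prompt alongside the question. The sketch below imitates that flow with toy vectors, assuming Euclidean distance as the metric. The function names, the prompt wording, the default `k=4`, and the hand-written vectors are illustrative assumptions; LangChain's actual prompt template and FAISS's index internals differ.

```python
import math


def top_k_chunks(query_vec, chunk_vecs, chunks, k=4):
    """Return the k chunks whose vectors are closest to query_vec (Euclidean)."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: math.dist(query_vec, pair[1]),
    )
    return [chunk for chunk, _ in scored[:k]]


def stuff_prompt(question, retrieved_chunks):
    """Concatenate the retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(retrieved_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Because everything is stuffed into one prompt, the number of retrieved chunks times the chunk size has to fit within the model's context window.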




##Our model is ready. Now let's try a question about the given context

In [20]:
question = "Summarize the paper in simple terms."
answer = qa.invoke({"query": question})["result"]  # invoke() replaces the deprecated run()
print("Q:", question)
print("A:", answer)



Q: Summarize the paper in simple terms.
A: The paper presents a machine learning project aimed at automating the evaluation of handwritten answer scripts using advanced techniques like Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This system is designed to extract meaningful features from handwriting, making it possible to grade and provide feedback consistently across different subjects. By doing so, it addresses common challenges in manual grading, such as recognizing diverse handwriting styles and ensuring objective evaluations.

The project involves several stages: collecting a dataset of handwritten samples, preprocessing these samples, dividing them into training and testing sets, and then implementing the RNNs for recognition tasks. The system achieves about 90% accuracy in recognizing characters and digits, using tools like Python and TensorFlow for development.

Overall, this automated evaluation system aims to revolutionize educational assessment

###Trying some deeper questions

In [21]:
question = "What machine learning models are used, and what is the accuracy percentage of each model?"
answer = qa.invoke({"query": question})["result"]  # invoke() replaces the deprecated run()
print("Q:", question)
print("A:", answer)

Q: What machine learning models are used, and what is the accuracy percentage of each model?
A: The provided context mentions several machine learning approaches that have been explored in developing an automated grading system, particularly focusing on handwritten text recognition and essay evaluation. Here's a summary based on the information given:

1. **Latent Semantic Analysis (LSA):** This technique has been used for understanding the content of essays by analyzing relationships between a set of documents and the terms they contain.

2. **N-Gram:** An approach that involves looking at sequences of 'n' items in text to predict the next item or analyze patterns.

3. **TF-IDF (Term Frequency-Inverse Document Frequency):** This is used for weighing the importance of words in a document relative to a collection of documents, often utilized in information retrieval and text mining.

4. **Bayesian Classifier:** A probabilistic model that applies Bayes' theorem with strong independence a

In [22]:
question = "Which datasets are used to train the models?"
answer = qa.invoke({"query": question})["result"]  # invoke() replaces the deprecated run()
print("Q:", question)
print("A:", answer)

Q: Which datasets are used to train the models?
A: The provided context does not specify any particular datasets used for training the models in the automated evaluation system for handwritten text recognition and essay grading. The discussion focuses on techniques like Latent Semantic Analysis (LSA), N-Gram, TF-IDF, Bayesian classifier, K-nearest neighbor approaches, as well as Deep Learning (DL) and Natural Language Processing (NLP). However, it does not mention specific datasets used during training.

If you have access to additional documents or resources related to this project, they might provide more detailed information on the datasets utilized. Otherwise, it would be best to consult those sources directly for such details.
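
When an answer says the context does not contain something, it can help to look at which chunks were actually retrieved. A minimal sketch of a formatter for that, assuming each retrieved document exposes `page_content` and a `metadata` dict (as LangChain `Document` objects do) and that the loader stored a `page` key in the metadata; `format_sources` is a hypothetical helper:

```python
def format_sources(docs, width=80):
    """Return one preview line per retrieved document."""
    lines = []
    for i, doc in enumerate(docs, start=1):
        page = doc.metadata.get("page", "?")
        # Collapse whitespace so each preview fits on a single line.
        text = " ".join(doc.page_content.split())
        lines.append(f"[{i}] page {page}: {text[:width]}")
    return lines
```

One way to obtain `docs` would be `vectorstore.as_retriever().invoke(question)`. Another is to pass `return_source_documents=True` to `RetrievalQA.from_chain_type` and read the `"source_documents"` entry of the dict returned by `qa.invoke({"query": question})`.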
