<a href="https://colab.research.google.com/github/Novadotgg/Mini-llm/blob/main/minillmSayan.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [23]:
!pip install -q langchain langchain-community pypdf sentence-transformers faiss-cpu gpt4all
!pip install -q transformers torch accelerate bitsandbytes

In [25]:
# !pip install -q langchain_community transformers torch sentence-transformers faiss-cpu pypdf langchain

from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain_community.llms import HuggingFacePipeline
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate
from google.colab import files
#-------------------------------------------------------------------------------------------------------------------------------------------
# 1. Upload and process PDF
uploaded = files.upload()
pdf_name = list(uploaded.keys())[0]

loader = PyPDFLoader(pdf_name)
documents = loader.load()

text_splitter = CharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separator="\n"
)
texts = text_splitter.split_documents(documents)
#-------------------------------------------------------------------------------------------------------------------------------------------

# 2. Create embeddings and vector store
embedding_model = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={'device': 'cpu'}
)

db = FAISS.from_documents(texts, embedding_model)
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 1}
)
#-------------------------------------------------------------------------------------------------------------------------------------------

# 3. Setup LLM pipeline
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

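pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=100,
    do_sample=False,  # greedy decoding; temperature only applies when sampling, so it is omitted
    return_full_text=False,  # return only the continuation instead of echoing the prompt
    device="cpu",
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token, so reuse EOS
)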

llm = HuggingFacePipeline(pipeline=pipe)
#-------------------------------------------------------------------------------------------------------------------------------------------

# 4. Create QA chain with custom prompt
prompt_template = """Use the following context to answer the question.
If you don't know the answer, say you don't know. Keep the answer concise.

Context: {context}
Question: {question}
Answer:"""

PROMPT = PromptTemplate(
    template=prompt_template,
    input_variables=["context", "question"]
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    chain_type_kwargs={"prompt": PROMPT},
    return_source_documents=True
)
#-------------------------------------------------------------------------------------------------------------------------------------------

# 5. Query function
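def ask_question(question):
    # Calling the chain directly is deprecated in recent LangChain releases; use invoke()
    result = qa_chain.invoke({"query": question})
    top_doc = result["source_documents"][0]
    print("\nQuestion:", question)
    print("Answer:", result["result"])
    # PyPDFLoader stores zero-based page indices, so add 1 for display
    print("\nSource Page:", top_doc.metadata["page"] + 1)
    print("Relevant Text:", top_doc.page_content[:200] + "...")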

Saving 1277.pdf to 1277.pdf


Device set to use cpu
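
For intuition about `chunk_size=1000` and `chunk_overlap=200` in the splitter above, here is a minimal sketch of fixed-width chunking with overlap. It is a simplified stand-in and not `CharacterTextSplitter`'s actual algorithm, which first splits on the separator and then merges the pieces. The helper name `chunk_with_overlap` and the sample string are assumptions introduced for illustration.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-width character windows that overlap.

    Hypothetical helper for illustration only; LangChain's splitter
    additionally respects separators such as newlines.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghij" * 3  # 30 characters
chunks = chunk_with_overlap(sample, chunk_size=12, chunk_overlap=4)
for c in chunks:
    print(len(c), c)
```

Each window shares its last `chunk_overlap` characters with the start of the next one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when it is shorter than the overlap.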


In [24]:
# def ask_question(question):
#     result = qa_chain({"query": question})
#     print("\nAnswer:", result["result"])
#     print("\nRelevant sections:")
#     for i, doc in enumerate(result['source_documents'], 1):
#         print(f"\nSection {i} (Page {doc.metadata['page']+1}):")
#         print(doc.page_content[:500] + "...")

# # Example usage:
# ask_question("What is the main contribution of this paper?")
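
The retriever configured earlier uses `search_type="mmr"` with `k=1`. Maximal marginal relevance picks the most similar chunk first and only then trades relevance against redundancy, so with `k=1` the diversity term never comes into play. The sketch below is a plain-Python illustration of the idea on toy 2-D vectors, not LangChain's or FAISS's implementation; `cosine`, `mmr_select`, and the vectors are hypothetical names and data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Return indices of up to k documents chosen by maximal marginal relevance."""
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 0.0]
docs = [
    [1.0, 0.1],    # most similar to the query
    [1.0, 0.12],   # near-duplicate of the first document
    [1.0, -1.0],   # less similar, but adds different content
]
print(mmr_select(query, docs, k=1))  # only the single best match
print(mmr_select(query, docs, k=2))  # best match plus a non-redundant second pick
```

With `k=2` the near-duplicate is skipped in favour of the more distinct vector, which is the behaviour a larger `k` would unlock in the retriever.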

In [None]:
print("QNA with this research paper! Type 'exit' to quit.\n")
while True:
    question = input("Your question: ")
    if question.lower() in ['exit', 'quit']:
        break
    ask_question(question)

QNA with this research paper! Type 'exit' to quit.

Your question: Whats the title of the research paper?





Question: Whats the title of the research paper?
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: Sayan Das  et al. / Procedia Computer Science 258 (2025) 2040–2049 2049
 Sayan Dasa, Dr. M. Ambika * / Procedia Computer Science 00 (2025) 000–000  9 
Fig.3. Prediction of crops to be grown in soil with different amount of elements. 
4.2. NLP based chatbot  
The chatbot takes the pickle file of the best model saved to predict the crops. After when the chatbot runs, it first 
asks the farmer what they want to know, if it’s recommendation then the chatbot asks the farmer abou t the area they 
live then the chatbot updates itself with the temperature, humidity, rainfall, of that particular area and asks the farmers 
for the N, P, K, pH value of the soil and provides the recommendation of the available crop with the given parameters. 
Fig. 4. (a) shows the intent recognitio




Question: explain me the architecture of the crop recommendation chatbot
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: that users feel comfortable and confident using it. 
3.2.1.1. User input 
Chatbot asks questions based on the crop  suggestions about the N itrogen, Phosphorus, Potassium, pH values and 
the weather conditions of the particular area are taken using the API (Application Programming Interface) to predict 
the crops that the farmer can grow.  
3.2.1.2. Intent recognition 
Once the user inputs any word like recommend, the NLP based chatbot processes the text to understand the intent 
and it follows a predefined set of questions related to that intent, using a structured conversation with the user and 
asks questions to recommend an answer. 
3.3. Chatbot integrations 
Using Natural Language Processing (NLP) techniques, the chatbot is created it’s fed with the pickle files from




Question: how accurate is the chatbot?
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: • Pass the instance x down the tree and obtain a predicted class label. 
• Aggregate the predictions from all trees by majority voting: the class with the most 
votes is the predicted label for x. 
• Return the majority-vote class label as the final prediction. 
 
Further a hyper parameter tuning was done on the Random Forest classifier for better performance. Firstly 500 
decision trees were considered to build the Random Forest which is much better for the chosen dataset, to reduce the 
risk of overfitting maximum depth is considered as 10 and the minimum split and the minimum samples in the leaf 
were chosen wisely to avoid overfitting of the data and the quality of split or the criterion is considered as gini impurity. 
3.2. Chatbot  
Chatbot is a sophisticated software program that mimics human speec




Question: What can the chatbot predict?
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: • Pass the instance x down the tree and obtain a predicted class label. 
• Aggregate the predictions from all trees by majority voting: the class with the most 
votes is the predicted label for x. 
• Return the majority-vote class label as the final prediction. 
 
Further a hyper parameter tuning was done on the Random Forest classifier for better performance. Firstly 500 
decision trees were considered to build the Random Forest which is much better for the chosen dataset, to reduce the 
risk of overfitting maximum depth is considered as 10 and the minimum split and the minimum samples in the leaf 
were chosen wisely to avoid overfitting of the data and the quality of split or the criterion is considered as gini impurity. 
3.2. Chatbot  
Chatbot is a sophisticated software program that mimics human spee




Question: How crop recommendation system helps farmers
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: this data.The recommendations are accurate and customised to the unique circumstances of each farmer's field thanks 
to the Random Forest model's use. Through the provision of practical insights that can result in increased crop yields 
[Figure: Comparative Analysis of Accuracy; axis labels: Accuracy, Classification Models. Random Forest with better hyperparameters = 0.9986, Random Forest = 0.9986, Support Vector Machine = 0.9956, K-nearest neighbour = 0.9982, Decision Tree = 0.9980, Multilayer Perceptron = 0.9936, Naïve Bayes = 0.9982]
Question: How crop recommendation system helps farmers
Answer: The model is based on the following assumptions:

The model is based on the following assumptions:

The model is based on the following assumptions:

The model is based on the following assumptions:




Question: can this be an evolutionary model?
Answer: Use the following context to answer the question. 
If you don't know the answer, say you don't know. Keep the answer concise.

Context: better decisions, waste fewer resources, and become more resilient to environmental changes.  
Furthermore, the system created in this study may be modified and extended to handle additional farm management 
facets including insect control, irrigation scheduling, and market forecasting. The chatbot and machine learning
Question: can this be an evolutionary model?
Answer: yes. 

Context: the system is designed to be adaptable to changing conditions.  

The system is designed to be adaptable to changing conditions.                                                                 

Source Page: 2
Relevant Text: better decisions, waste fewer resources, and become more resilient to environmental changes.  
Furthermore, the system created in this study may be modified and extended to handle additional farm
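
In the transcript above, `result["result"]` begins with the full prompt (instructions, context, question) before any generated text. One way to show only the model's continuation is to cut everything up to the first `Answer:` marker from the prompt template. The sketch below assumes the marker does not also occur inside the retrieved context; `extract_answer` is a hypothetical helper and `raw` is an abbreviated, made-up sample rather than real model output.

```python
def extract_answer(generated_text, marker="Answer:"):
    """Return the text after the first occurrence of marker.

    If the marker is absent, the input is returned unchanged apart from
    surrounding whitespace.
    """
    _head, found, tail = generated_text.partition(marker)
    return tail.strip() if found else generated_text.strip()


raw = (
    "Use the following context to answer the question.\n"
    "Context: The system is designed to be adaptable.\n"
    "Question: can this be an evolutionary model?\n"
    "Answer: yes."
)
print(extract_answer(raw))
```

Inside `ask_question`, this could be applied as `extract_answer(result["result"])` before printing.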