<a href="https://colab.research.google.com/github/Ashish-AVS/Llama_Chatbot_CPU/blob/main/Final_Chatbot(1).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -Uqqq pip --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off
!pip install -qqq transformers==4.31.0 --progress-bar off
!pip install -qqq langchain==0.0.266 --progress-bar off
!pip install -qqq chromadb==0.4.5 --progress-bar off
!pip install -qqq pypdf==3.15.0 --progress-bar off
!pip install -qqq xformers==0.0.20 --progress-bar off
!pip install -qqq sentence_transformers==2.2.2 --progress-bar off
!pip install -qqq InstructorEmbedding==1.0.1 --progress-bar off
!pip install -qqq pdf2image==1.16.3 --progress-bar off

In [None]:
!wget -q https://github.com/PanQiWei/AutoGPTQ/releases/download/v0.4.1/auto_gptq-0.4.1+cu118-cp310-cp310-linux_x86_64.whl

In [None]:
!pip install -qqq auto_gptq-0.4.1+cu118-cp310-cp310-linux_x86_64.whl --progress-bar off

In [None]:
!sudo apt-get install -y poppler-utils

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
poppler-utils is already the newest version (22.02.0-2ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.


In [None]:
import torch
from auto_gptq import AutoGPTQForCausalLM
from langchain import HuggingFacePipeline, PromptTemplate
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFDirectoryLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from pdf2image import convert_from_path
from transformers import AutoTokenizer, TextStreamer, pipeline

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

## Data

In [None]:
!mkdir -p pdfs


In [None]:
!rm -rf "db"

In [None]:
loader = PyPDFDirectoryLoader("pdfs")
docs = loader.load()
len(docs)

738

In [None]:
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large", model_kwargs={"device": DEVICE}
)

load INSTRUCTOR_Transformer
max_seq_length  512


In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=64)
texts = text_splitter.split_documents(docs)
len(texts)

3401
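
To make the `chunk_size=1024, chunk_overlap=64` settings concrete, here is a minimal sketch of overlapping character windows. It is a simplified stand-in, not the algorithm used by `RecursiveCharacterTextSplitter` (which prefers to split on separators such as paragraph and line breaks). The helper name `sliding_chunks` and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1024, chunk_overlap: int = 64) -> list[str]:
    """Cut text into fixed-width windows; neighbours share `chunk_overlap` characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# Hypothetical sample: 2500 characters of repeating digits.
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print(len(chunks), [len(c) for c in chunks])
```

The overlap means the tail of one chunk reappears at the head of the next, so a sentence that straddles a boundary is still seen whole in at least one chunk.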

In [None]:
%%time
db = Chroma.from_documents(texts, embeddings, persist_directory="db")

CPU times: user 4min 8s, sys: 5.25 s, total: 4min 13s
Wall time: 4min 52s


In [None]:
model_name_or_path = "TheBloke/Llama-2-13B-chat-GPTQ"
model_basename = "model"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-128g-actorder_True",
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=True,
    inject_fused_attention=False,
    device=DEVICE,
    quantize_config=None,
)



In [None]:
DEFAULT_SYSTEM_PROMPT = """
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
""".strip()


def generate_prompt(prompt: str, system_prompt: str = DEFAULT_SYSTEM_PROMPT) -> str:
    return f"""
[INST] <<SYS>>
{system_prompt}
<</SYS>>

{prompt} [/INST]
""".strip()

In [None]:
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

In [None]:
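text_pipeline = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1024,  # upper bound on tokens generated per answer
    temperature=0,
    top_p=0.95,
    repetition_penalty=1.15,  # values above 1 discourage repeated tokens
    streamer=streamer,  # print tokens to stdout as they are generated
)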

The model 'LlamaGPTQForCausalLM' is not supported for text-generation. Supported models are ['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM', 'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM', 'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'FalconForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM', 'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MusicgenForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel', 'ReformerModelWithLMHead', 'RemBertForCausal

In [None]:
llm = HuggingFacePipeline(pipeline=text_pipeline, model_kwargs={"temperature": 0})

In [None]:
SYSTEM_PROMPT = "Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer."

template = generate_prompt(
    """
{context}

Question: {question}
""",
    system_prompt=SYSTEM_PROMPT,
)

In [None]:
prompt = PromptTemplate(template=template, input_variables=["context", "question"])
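
As a rough illustration of what the chain sends to the model, the sketch below fills the `{context}` and `{question}` placeholders with plain `str.format`. This assumes the template is rendered with format-style substitution. The shortened system prompt and the question are hypothetical, and the context sentence is taken from a page of the indexed PDFs.

```python
def generate_prompt(prompt: str, system_prompt: str) -> str:
    # Same layout as the notebook's generate_prompt, repeated so this cell runs on its own.
    return f"[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n{prompt} [/INST]"


template = generate_prompt(
    "\n{context}\n\nQuestion: {question}\n",
    system_prompt="Answer using only the context.",  # shortened, hypothetical system prompt
)

# The placeholders survive generate_prompt untouched and are filled only here.
filled = template.format(
    context=(
        "BITS Pilani, Pilani Campus is located in the Vidya Vihar campus "
        "adjacent to Pilani town in Jhunjhunu district, in Rajasthan."
    ),
    question="Where is BITS Pilani located?",  # hypothetical question
)
print(filled)
```

Printing the filled prompt this way is a cheap check that the `[INST]` and `<<SYS>>` markers wrap the retrieved text as intended before any model call is made.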

In [None]:
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 2}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt},
)
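
`search_kwargs={"k": 2}` asks the retriever for the two stored chunks closest to the question's embedding. The sketch below illustrates that nearest-neighbour idea in plain Python. The three-dimensional vectors, the chunk labels, and the choice of cosine similarity are all assumptions for illustration: the real vectors come from `hkunlp/instructor-large`, and the distance metric configured in Chroma may differ.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(indexed, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk label, made-up embedding).
indexed = [
    ("chunk about campuses", [0.9, 0.1, 0.0]),
    ("chunk about electives", [0.1, 0.9, 0.0]),
    ("chunk about Practice School", [0.0, 0.2, 0.9]),
]
hits = top_k([0.8, 0.3, 0.0], indexed, k=2)
print(hits)
```

With `chain_type="stuff"`, the retrieved chunks are concatenated into the `{context}` slot of the prompt, so a small `k` keeps the prompt short at the cost of possibly missing relevant passages.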

## Chat with Multiple PDFs

In [None]:
result = qa_chain("Who is the Director of BITS Pilani")

 I don't know the answer to your question. Based on the information provided, there is no mention of a "Director" of BITS Pilani. The position of Chief Finance Officer (On Deputation) is mentioned, but there is no mention of a Director.


In [None]:
result = qa_chain("What is SAC?")

 Based on the provided context, I cannot determine what SAC stands for. The given text does not mention any specific term or acronym called "SAC". Therefore, without more information or context, it is impossible for me to provide an answer to this question.


In [None]:
result = qa_chain("What is an Elective?")

 Based on the provided context, an elective is a course that a student can choose to take in addition to their required coursework. Electives are not mandatory and do not count towards the minimum number of courses or units required for graduation. There are two types of electives: discipline electives and other electives. Discipline electives are courses that are part of the student's primary area of study, while other electives are courses that are outside of the student's primary area of study and are offered to provide additional flexibility in the curriculum.


In [None]:
result = qa_chain("What is a Dual Degree?")

 Based on the provided context, I can answer the question.

A dual degree refers to a program where students pursue two separate undergraduate degrees simultaneously, typically taking courses from two different disciplines. This allows students to gain knowledge and expertise in multiple fields, which can broaden their career opportunities and enhance their employability. The program is designed to allow students to earn two degrees within a shorter time frame than if they were to pursue each degree separately.


In [None]:
result = qa_chain("Who is the chairperson of BITS Pilani?")

 I don't know who the chairperson of BITS Pilani is based on the provided information. The information only lists members of various committees and their contact details, but there is no mention of a chairperson for BITS Pilani.


In [None]:
result = qa_chain(
    "List some Professors of BITS Pilani?"
)

 Based on the information provided, here is a list of professors of BITS Pilani:

1. Udayan Chanda
2. Krishnamurthy Bindumadhavan
3. Sandeep Kayastha
4. Annapoorna Gopal
5. Leela Rani
6. Jayashree Mahesh
7. Satyendra Kr Sharma
8. Rajiv Gupta
9. Ajit Pratap Singh
10. S B Singh
11. Anshuman
12. Manoj Kumar
13. Anupam Singhal
14. Ravi Kant Mittal


In [None]:
result = qa_chain(
    "Should I join BITSoM?"
)

 I cannot make a decision for you whether you should join BITSoM or not, as it depends on your personal preferences, career goals, and financial situation. However, based on the information provided, BITSoM appears to be a well-regarded business school with a strong leadership team, experienced faculty, and a focus on innovation and human values.

If you are considering pursuing a master's degree in management, BITSoM may be a good option to consider. The school's emphasis on technology integration, entrepreneurship, and global perspectives could provide valuable skills and knowledge in today's rapidly changing business landscape. Additionally, the school's industry advisory council and governing council, comprised of respected leaders from top companies and educational institutions, may offer valuable insights and networking opportunities.

Ultimately, the decision to join BITSoM or any other business school should be based on your individual circumstances and priorities. It may be he

In [None]:
result = qa_chain(
    "What is Practise School?"
)

 Based on the provided context, I can answer the question about Practice School as follows:

Practice School (PS) is a method of education used by a university to link the academic curriculum with the professional world. Under this method, students apply the knowledge they have learned in the classroom to solve real-life problems in a professional setting. The Practice School is a centralized activity for the university, and it involves students working on projects and being mentored by both faculty members and industry professionals. The PS method is designed to meet the rapidly changing needs and challenges of the environment, help students develop their intelligence and capabilities, and bear economic relevance to society.


In [None]:
result = qa_chain(
    "Can you do thesis and PS in the same semester in BITS?"
)

 Based on the given information, it appears that it is possible to do both thesis and Practice School (PS) in the same semester in BITS, but there are certain restrictions and limitations.

According to point (iv) of the information provided, students who opt for a 16-unit thesis must not be simultaneously registered in any other course. This suggests that if a student chooses to do a 16-unit thesis, they will need to dedicate their entire semester to that thesis and will not be able to take any other courses simultaneously.

On the other hand, point (iii) states that students who opt for a 9-unit thesis may be permitted to register in other courses simultaneously with the thesis. However, it does not explicitly state whether this includes Practice School (PS).

Therefore, without further clarification or information, it is unclear whether doing thesis and PS in the same semester is allowed or not. It is possible that the institution may have specific guidelines or restrictions on this

In [None]:
result = qa_chain(
    "Does Wipro hire from BITS?"
)

 Based on the information provided, it appears that Wipro does hire from BITS, as there are several collaborations between BITS and Wipro listed in the text. These collaborations include M.Tech. programs in Software Engineering, Computing Systems & Infrastructure, and Microelectronics, as well as a B.Tech. program in Information Systems. Additionally, there are several Organization Professionals from Wipro listed, including Mr. Ajay Narayanan, Ms. Neha, Ms. Anchal Tripathi, Ms. Ashif Banu Abdul Razak, Ms. P. Monica Prisulla, Ms. Pinky Paresh Bhatt, and Ms. Puhpanjali Patnaik. However, without more specific information about current hiring practices or the status of these collaborations, I cannot confirm whether Wipro currently hires from BITS.


In [None]:
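# Convenience wrapper: run a query through the RetrievalQA chain and return
# the full result dict (query, result, source_documents).
def final_result(query):
    return qa_chain(query)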
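
The dict returned by the chain is verbose, so a small formatter can help when checking which passages backed an answer. The following is a sketch only: `summarize_result` and `FakeDocument` are hypothetical names, `FakeDocument` is a stand-in for the document objects in `source_documents`, and the `source` and `page` metadata keys are an assumption about what the PDF loader records.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes the helper reads; not LangChain's class."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def summarize_result(result: dict, preview_chars: int = 80) -> str:
    """Render the query, the answer, and a one-line preview of each source chunk."""
    lines = [f"Q: {result['query']}", f"A: {result['result'].strip()}"]
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")  # assumed metadata key
        page = doc.metadata.get("page", "?")  # assumed metadata key
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"[{i}] {source} (page {page}): {preview}")
    return "\n".join(lines)


example = {
    "query": "BITS Pilani",
    "result": "  BITS Pilani is located in Pilani, Jhunjhunu district, Rajasthan, India.",
    "source_documents": [
        FakeDocument(
            page_content="I-4 CAMPUSES OF BITS  \nBITS Pilani - Pilani Campus  \n",
            metadata={"source": "pdfs/example.pdf", "page": 3},  # hypothetical values
        )
    ],
}
summary = summarize_result(example)
print(summary)
```

In the notebook, the same helper could be applied to the output of `final_result(...)`, provided the returned documents expose `page_content` and `metadata` as assumed here.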

In [None]:
final_result("BITS Pilani")



 Sure! Here's my answer based on the provided context:

What is the location of BITS Pilani?

Answer: BITS Pilani is located in Pilani, Jhunjhunu district, Rajasthan, India.


{'query': 'BITS Pilani',
 'result': "  Sure! Here's my answer based on the provided context:\n\nWhat is the location of BITS Pilani?\n\nAnswer: BITS Pilani is located in Pilani, Jhunjhunu district, Rajasthan, India.",
 'source_documents': [Document(page_content='I-4 CAMPUSES OF BITS  \nBITS Pilani - Pilani Campus  \nBITS Pilani, Pilani Campus is located in the \nVidya Vihar campus adjacent to Pilani town in \nJhunjhunu district, in Rajasthan. Pilani is the \nhome town of t he Birla family and has a \npopulation of about 50,000. It is about 200 km \nwest of Delhi and about 220 km north of Jaipur. \nThe temperatures during the year go to \nextremes like 45°C in summer and 0°C in \nwinter. The climate is generally dry and healthy. \nAnnual  rainfall is about 30 cms.  \nThe Institute buildings, hostels and residential \nquarters for staff with neatly laid out roads, \nlawns and gardens constitute the BITS Campus \nof about 240 acres.  \nPilani can be reached either by rail or by road. \nTh