<a href="https://colab.research.google.com/github/PratulG/Llama-Banker/blob/main/LLama%20Banker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install necessary packages
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117 --upgrade
!pip install langchain einops accelerate transformers bitsandbytes scipy
!pip install xformers sentencepiece
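# Pinned because the cells below use the pre-0.10 llama-index API (ServiceContext, llama_index.llms)
!pip install "llama-index<0.10" llama_hub --upgrade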
!pip install sentence-transformers
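!pip install pypdf2
# PyMuPDFReader, used below to load the report, needs the PyMuPDF package (imported as fitz)
!pip install pymupdf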
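The import cell below relies on the older llama-index layout (`from llama_index import ServiceContext`, `llama_index.llms.HuggingFaceLLM`). Release 0.10 reorganized the package into `llama_index.core` plus separate integration packages, so those imports fail on newer versions. Here is a small standard-library sketch that confirms what pip actually installed before the rest of the notebook runs. The helpers `installed_versions` and `is_legacy_llama_index` are hypothetical, and the version parsing assumes plain `major.minor...` strings.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def is_legacy_llama_index(ver):
    """True for llama-index releases older than 0.10 (assumes 'major.minor...' strings)."""
    major, minor = (int(part) for part in ver.split(".")[:2])
    return (major, minor) < (0, 10)


versions = installed_versions(["llama-index", "transformers", "pymupdf"])
print(versions)
if versions["llama-index"] is not None and not is_legacy_llama_index(versions["llama-index"]):
    print("llama-index >= 0.10 detected: the imports in the next cell will need updating")
```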

In [1]:
import PyPDF2
import torch
from pathlib import Path
from transformers import AutoTokenizer, AutoModelForCausalLM
from llama_index import VectorStoreIndex, download_loader, set_global_service_context, ServiceContext
from llama_index.llms import HuggingFaceLLM
from llama_index.embeddings import LangchainEmbedding
from langchain.embeddings.huggingface import HuggingFaceEmbeddings

In [2]:
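# Extract text from PDF (an alternative to the PyMuPDFReader loader used further down)
def extract_text_from_pdf(pdf_path):
    """Return the text of every page in the PDF, one page per line block."""
    with open(pdf_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        # extract_text() replaces extractText(), which raises an error in PyPDF2 3.x
        return "\n".join(page.extract_text() or "" for page in pdf_reader.pages)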
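`extract_text_from_pdf` returns the whole report as a single string, which is far too long to embed in one piece or to fit in Llama 2's 4096-token window. Below is a minimal sketch of fixed-size chunking with overlap. It assumes whitespace-separated words as the unit; llama-index's own splitter measures `chunk_size` in tokenizer tokens, so the hypothetical `chunk_words` helper only approximates what happens later in the index.

```python
def chunk_words(text, chunk_size=200, overlap=20):
    """Split text into windows of chunk_size words, each sharing `overlap` words with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Usage with a toy string instead of the real report text
print(chunk_words("a b c d e f g", chunk_size=3, overlap=1))
# ['a b c', 'c d e', 'e f g']
```

On the report itself this would be called as `chunk_words(extract_text_from_pdf(pdf_path))`. The overlap reduces the chance that a figure and its label land in different chunks with no shared context.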

In [3]:
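# Llama setup
name = "meta-llama/Llama-2-7b-chat-hf"
auth_token = "Your Auth Token"  # Hugging Face token for an account granted access to the gated Llama 2 weights
tokenizer = AutoTokenizer.from_pretrained(name, cache_dir='./model/', use_auth_token=auth_token)
model = AutoModelForCausalLM.from_pretrained(
    name,
    cache_dir='./model/',
    use_auth_token=auth_token,
    torch_dtype=torch.float16,                      # half precision for modules not quantized to 8 bits
    rope_scaling={"type": "dynamic", "factor": 2},  # dynamic NTK RoPE scaling for inputs beyond 4096 tokens
    load_in_8bit=True,                              # 8-bit weights via bitsandbytes to cut GPU memory
)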

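`rope_scaling={"type": "dynamic", "factor": 2}` selects dynamic NTK-aware RoPE scaling in transformers. Up to the pretrained length nothing changes. Beyond it, the rotary base is enlarged so that positions rotate more slowly. The pure-Python sketch below shows that base computation, assuming Llama-2-7B's defaults (head dimension 128, base 10000, 4096 pretrained positions). It follows the formula in transformers' `LlamaDynamicNTKScalingRotaryEmbedding` but is an illustration, not the library code.

```python
def dynamic_ntk_base(seq_len, base=10000.0, dim=128, max_positions=4096, factor=2.0):
    """Rotary base used for a sequence of seq_len tokens under dynamic NTK scaling."""
    if seq_len <= max_positions:
        return base  # inside the pretrained window the base is left untouched
    scale = factor * seq_len / max_positions - (factor - 1)
    return base * scale ** (dim / (dim - 2))


def rope_inv_freq(base, dim=128):
    """One inverse frequency per dimension pair: 1 / base**(k / dim) for k = 0, 2, ..., dim - 2."""
    return [1.0 / base ** (k / dim) for k in range(0, dim, 2)]


for seq_len in (2048, 4096, 6144, 8192):
    base = dynamic_ntk_base(seq_len)
    slowest = rope_inv_freq(base)[-1]
    print(f"{seq_len:>5} tokens: base {base:>9,.0f}, slowest frequency {slowest:.3e}")
```

The `HuggingFaceLLM` in the next cell declares `context_window=4096`, so llama-index packs prompts to fit the pretrained window. As configured, the scaling should rarely take effect, if ever.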
In [4]:
system_prompt = """<s>[INST] <<SYS>>
You are a helpful, respectful, and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
Your goal is to provide answers relating to the financial statements of the company.
<</SYS>>
"""

# Llama 2 chat format: HuggingFaceLLM prepends system_prompt to the wrapped query,
# and the user turn must be closed with [/INST] before the model's answer begins
query_wrapper_prompt = "{query_str} [/INST]"

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    model=model,
    tokenizer=tokenizer
)

embeddings = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
)

service_context = ServiceContext.from_defaults(
    chunk_size=2048,
    llm=llm,
    embed_model=embeddings
)

set_global_service_context(service_context)

# Load the document and add to index
pdf_path = "/content/Integrated-Annual-Report-2022-23.pdf"
PyMuPDFReader = download_loader("PyMuPDFReader")
loader = PyMuPDFReader()
documents = loader.load(file_path=Path(pdf_path), metadata=True)

# Create an index using the loaded document
index = VectorStoreIndex.from_documents(documents)

# Create a query engine using the index
query_engine = index.as_query_engine()
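`index.as_query_engine()` embeds the question, retrieves the most similar chunks (llama-index defaults to the top 2), and asks the LLM to answer from them. The self-contained sketch below walks through that flow. It assumes toy 3-dimensional vectors in place of all-MiniLM-L6-v2 embeddings and uses a simplified context template, not llama-index's actual one. `cosine`, `retrieve`, and `build_llama2_prompt` are hypothetical helpers. The prompt layout follows the Llama 2 chat format (`[INST]`, `<<SYS>>`, `[/INST]`).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, chunk_vecs, top_k=2):
    """Indices of the top_k chunks most similar to the query, best first."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:top_k]


def build_llama2_prompt(system, context_chunks, question):
    """Single-turn Llama 2 chat prompt with the retrieved chunks placed before the question."""
    context = "\n\n".join(context_chunks)
    user = f"Context:\n{context}\n\nAnswer using only the context above.\nQuestion: {question}"
    return f"<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


# Toy data: placeholder chunk texts, and vectors standing in for real sentence embeddings
chunks = [
    "(chunk A: balance sheet, equity)",
    "(chunk B: notes on share capital)",
    "(chunk C: employee headcount)",
]
chunk_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.2, 0.9]]
query_vec = [1.0, 0.2, 0.0]

best = retrieve(query_vec, chunk_vecs)
prompt = build_llama2_prompt(
    "Answer questions about the financial statements.",
    [chunks[i] for i in best],
    "What is the equity of the company?",
)
print(best)
print(prompt)
```

With `chunk_size=2048` and two retrieved chunks, the context alone can approach the 4096-token window. That is why the `context_window` and `max_new_tokens` settings matter here.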

In [6]:
# Querying the document
response = query_engine.query("What is the equity of the company?")
print(response.response)


According to the information provided in the Integrated Annual Report 2022-23 of L&T Mindtree Limited, the total equity attributable to the equity shareholders of the Group as at March 31, 2023, was ₹165,992 million (previous year ₹142,929 million), which represents 92% of the total capital (equity, borrowings, and lease liabilities) of ₹181,404 million (previous year ₹156,840 million).
Therefore, the equity of the company as of March 31, 2023, was ₹165,992 million.
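Numbers in an LLM answer should be checked against the source document. At minimum, the figures quoted above can be checked for internal consistency. The sketch below recomputes the equity share of total capital from the four figures the model reported. These are the model's numbers, not independently verified values from the report.

```python
# Figures as quoted in the model's answer, in ₹ million
equity = {"2023-03-31": 165_992, "2022-03-31": 142_929}
total_capital = {"2023-03-31": 181_404, "2022-03-31": 156_840}

for as_at in equity:
    ratio = equity[as_at] / total_capital[as_at]
    # Per the answer, total capital = equity + borrowings + lease liabilities
    other = total_capital[as_at] - equity[as_at]
    print(f"{as_at}: equity share {ratio:.1%}, borrowings and lease liabilities {other:,}")
```

The 2023 ratio is about 91.5%, which rounds to the 92% stated in the answer.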
