### **T5-Summarization**

In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Load pre-trained model and tokenizer
model_name = "t5-small"  # You can choose a different T5 model size
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# Define input text
input_text = "Moral stories help kids understand what’s right and what’s wrong. They impart a belief system which will help the child cope very well with whatever life has to offer. "
# Tokenize input text
input_ids = tokenizer.encode("summarize: " + input_text, return_tensors="pt", max_length=1024, truncation=True)

# Generate summary
summary_ids = model.generate(input_ids, num_beams=4, min_length=0, max_length=100, early_stopping=True)

# Decode and print the summary
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
print("Summary:", summary)

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Summary: moral stories help kids understand what's right and what's wrong. they impart a belief system which will help the child cope very well.


In [None]:
from transformers import T5ForConditionalGeneration, T5Tokenizer

def generate_text(input_text, model_name="t5-small"):
    # Load pre-trained T5 model and tokenizer
    model = T5ForConditionalGeneration.from_pretrained(model_name)
    tokenizer = T5Tokenizer.from_pretrained(model_name)

    # Format input text for text generation task
    input_text = "summarize: " + input_text

    # Tokenize input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt", max_length=50, truncation=True)

    # Generate text
    output = model.generate(input_ids, max_length=100, num_beams=5, early_stopping=True)

    # Decode the generated output
    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)

    return generated_text

# Example usage
input_text = "Moral stories help kids understand what’s right and what’s wrong. They impart a belief system which will help the child cope very well with whatever life has to offer."
generated_text = generate_text(input_text)
print("Generated Text:")
print(generated_text)


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


Generated Text:
moral stories help kids understand what's right and what's wrong. they impart a belief system which will help the child cope very well with whatever life has to offer.


In [None]:
from transformers import BartForConditionalGeneration, BartTokenizer

def summarize_text(text):
    # Load pre-trained BART model and tokenizer
    model_name = "facebook/bart-large-cnn"
    model = BartForConditionalGeneration.from_pretrained(model_name)
    tokenizer = BartTokenizer.from_pretrained(model_name)

    # Tokenize input text
    inputs = tokenizer([text], max_length=1024, return_tensors="pt", truncation=True)

    # Generate summary
    summary_ids = model.generate(inputs['input_ids'], num_beams=2, min_length=10, max_length=50, early_stopping=True)

    # Decode the summary
    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

    return summary

# Example usage
input_text = "I am of course overjoyed to be here today in the role of ceremonial object. There is more than the usual amount of satisfaction in receiving an honorary degree from the university that helped to form one’s erstwhile callow and ignorant mind into the thing of dubious splendor that it is today; whose professors put up with so many overdue term papers, and struggled to read one’s handwriting, of which ‘interesting’ is the best that has been said; at which one failed to learn Anglo-Saxon and somehow missed Bibliography entirely, a severe error which I trust no one present here today has committed; and at which one underwent excruciating agonies not only of soul but of body, later traced to having drunk too much coffee in the bowels of Wymilwood."
summary = summarize_text(input_text)
print("Summary:")
print(summary)


Summary:
I am of course overjoyed to be here today in the role of ceremonial object. There is more than the usual amount of satisfaction in receiving an honorary degree from the university.


### **T5-Summarization Ends here.**

# **Use of an Agent**

### **Agent using SERP API**

In [None]:
!pip install langchain
!pip install langchain_community
!pip install google-search-results
!pip -q install --upgrade together


In [None]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent

from langchain import PromptTemplate

from langchain.chains import RetrievalQA

In [None]:
import os
from getpass import getpass

import together

# Read the API keys at runtime instead of hard-coding them in the notebook
os.environ["TOGETHER_API_KEY"] = getpass("Together API key: ")
together.api_key = os.environ["TOGETHER_API_KEY"]

# Set the SERPAPI_API_KEY environment variable
os.environ["SERPAPI_API_KEY"] = getpass("SerpAPI key: ")

In [None]:
from langchain.llms import Together

mixtral_llm = Together(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    temperature=0.5,
    max_tokens=5000,
    top_k=1,
    # together_api_key="..."
)



In [None]:
template = """
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}
Answer:
"""
prompt = PromptTemplate(
    input_variables=["history", "context", "question"],
    template=template,
)
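
The template above has three placeholders (`context`, `history`, `question`) that are filled in before the text is sent to the model. As a rough illustration only, the sketch below fills the same template with plain `str.format` as a stand-in for the prompt object; the sample values are made up for the example.

```python
template = """
Use the following context (delimited by <ctx></ctx>) and the chat history (delimited by <hs></hs>) to answer the question:
------
<ctx>
{context}
</ctx>
------
<hs>
{history}
</hs>
------
{question}
Answer:
"""

# Illustrative values only; in the notebook these would come from the retriever and the chat loop
filled = template.format(
    context="T5 is an encoder-decoder model.",
    history="user: hi\nassistant: hello",
    question="What kind of model is T5?",
)
print(filled)
```

Printing the filled prompt before calling the model is a cheap way to check that the delimiters and the question end up where the instruction says they are.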

In [None]:
import textwrap

def wrap_text_preserve_newlines(text, width=110):
    # Split the input text into lines based on newline characters
    lines = text.split('\n')

    # Wrap each line individually
    wrapped_lines = [textwrap.fill(line, width=width) for line in lines]

    # Join the wrapped lines back together using newline characters
    wrapped_text = '\n'.join(wrapped_lines)
    wrapped_text_string = str(wrapped_text)

    return wrapped_text_string

def process_llm_response(llm_response):
    response = wrap_text_preserve_newlines(llm_response['result'])
    print(response)
    print('\n\nSources:')
    for source in llm_response["source_documents"]:
        print(source.metadata['source'])

    return response
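
To see why the helper wraps each line separately, the sketch below compares it with calling `textwrap.fill` on the whole text, which treats newlines as ordinary whitespace and so loses the blank line between paragraphs. The sample string and the width of 20 are arbitrary choices for the demonstration.

```python
import textwrap

def wrap_text_preserve_newlines(text, width=110):
    # Wrap each line on its own so the existing line breaks survive
    return "\n".join(textwrap.fill(line, width=width) for line in text.split("\n"))

sample = "first paragraph " * 3 + "\n\nsecond paragraph"

per_line = wrap_text_preserve_newlines(sample, width=20)
whole_text = textwrap.fill(sample, width=20)

print(per_line)
print("-----")
print(whole_text)
```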

In [None]:
tool_names=["serpapi"]
tools=load_tools(tool_names)
agent = initialize_agent(tools, mixtral_llm, agent="chat-conversational-react-description")

chat_history = [
    {"role": "user", "content": "Hello, what is the weather today?"},
    {"role": "assistant", "content": "The weather is currently sunny with a temperature of 25 degrees Celsius."},
]


# input_dict = {"input": "what is the full form of CID ?", "chat_history": chat_history}
# agent.run(input_dict)

# agent_response = agent.run(input_dict)

# # Extract the response as a string
# response_str = agent_response[10]

# print(f"Assistant Response: {response_str}")
keep_going = True
while keep_going:
  user_input = input("Please feel free to ask a question: ")
  input_dict = {"input": user_input, "chat_history": chat_history}
  agent_response = agent.run(input_dict)
  print(agent_response)

  # Save the current exchange as part of the chat history
  chat_history = chat_history + [
      {"role": "user", "content": user_input},
      {"role": "assistant", "content": agent_response},
  ]

  # Print the updated chat history
  print("Chat History:")
  for chat in chat_history:
    for key, value in chat.items():
      print(key, " : ", value)

  # Any non-zero number continues; 0 stops the loop
  keep_going = int(input("Ask another question? Enter 1 to continue or 0 to stop: ")) != 0





KeyboardInterrupt: Interrupted by user
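
The loop above appends every exchange to `chat_history`, so the history sent to the agent grows with each question. The sketch below shows the same bookkeeping with a cap on how many exchanges are kept. `fake_agent` is a hypothetical stand-in for `agent.run`, and `MAX_TURNS` is an assumed limit that is not part of the original notebook.

```python
MAX_TURNS = 3  # hypothetical cap on remembered exchanges

def fake_agent(payload):
    # Stand-in for agent.run(...): echoes the question and reports the history size
    return f"echo: {payload['input']} (history: {len(payload['chat_history'])} messages)"

def ask(user_input, chat_history):
    response = fake_agent({"input": user_input, "chat_history": chat_history})
    updated = chat_history + [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": response},
    ]
    # Keep only the most recent MAX_TURNS exchanges (two messages each)
    return response, updated[-2 * MAX_TURNS:]

history = []
for question in ["q1", "q2", "q3", "q4"]:
    answer, history = ask(question, history)
    print(answer)
```

Trimming by message count is the simplest policy; trimming by token count would track the model's context limit more closely but needs a tokenizer.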

# **Together AI Agent**

In [None]:
!pip install langchain
!pip install langchain_community
!pip install google-search-results
!pip -q install --upgrade together
!pip install -qU langchain-openai
!pip install tavily-python


In [None]:
import getpass
import os
from langchain_openai import ChatOpenAI
from langchain.utilities.tavily_search import TavilySearchAPIWrapper
# Prompt for the API keys at runtime instead of hard-coding them in the notebook
os.environ["TOGETHER_API_KEY"] = getpass.getpass("Together API key: ")
os.environ["TAVILY_API_KEY"] = getpass.getpass("Tavily API key: ")
llm = ChatOpenAI(
    base_url="https://api.together.xyz/v1",
    api_key=os.environ["TOGETHER_API_KEY"],
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",)

In [None]:
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_community.tools.tavily_search import TavilySearchResults
from langchain_core.prompts import ChatPromptTemplate

tools = [TavilySearchResults(max_results=1)]

In [None]:
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
          #  "You have an advanced semantic analysis model specifically designed to parse code snippets effectively. When given a code snippet as input, the model showcases its proficiency by intelligently segmenting the code based on semantic analysis. It excels in identifying and grouping methods with similar functionality, separating of different funtionality, clustering header files together, and isolating main methods for individual examination."
             #,
            #"You are a code analist expert. Your goal is to provide a proper documentation of a given code to make it easily understandable. Documentaion may include code analisis, complexcity analisis, remarks."
            "You are an advanced AI assistant specialized in creating highly effective semantic segmentation. You are capable of segmenting a given code snippet based on the following conditions: 1. Identify and group methods that have similar functionality into clusters. This should be done by analyzing the code and identifying common functionality, such as input parameters, output parameters, and method calls. 2. Separate different functionality into separate clusters. This can be done by identifying code blocks that are not related to any of the existing clusters and grouping them together. 3. Cluster header files together based on their dependencies. This can be done by analyzing the header files and identifying which files are required by other files. 4. Isolate the main method for individual examination. This can be done by identifying the main method and separating it from the rest of the code. The tool should be capable of handling code snippets of varying sizes and complexity, and should provide accurate and reliable results. It should also be able to handle different programming languages and platforms. The ultimate goal of this tool is to provide a more efficient and effective way of analyzing code, which can help developers identify potential issues and improve the overall quality of their code."
        ),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
        ("placeholder", "{agent_scratchpad}"),
    ]
)

# Construct the Tools agent
agent = create_tool_calling_agent(llm, tools, prompt)

In [None]:
from langchain_core.messages import AIMessage, HumanMessage
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
user_input = input("ASK ANYTHING: ")
x = agent_executor.invoke(
    {
        "input": user_input,
        "chat_history": [
            HumanMessage(content="hi! my name is bob"),
            AIMessage(content="Hello Bob! How can I assist you today?"),
        ],
    }
)
print(x["output"])
print(type(x["output"]))

ASK ANYTHING: You are an advanced AI assistant specialized in creating highly effective semantic segmentation. You are capable of segmenting a given code snippet based on the following conditions:  1. Identify and group methods that have similar functionality into clusters. This should be done by analyzing the code and identifying common functionality, such as input parameters, output parameters, and method calls. 2. Separate different functionality into separate clusters. This can be done by identifying code blocks that are not related to any of the existing clusters and grouping them together. 3. Cluster header files together based on their dependencies. This can be done by analyzing the header files and identifying which files are required by other files. 4. Isolate the main method for individual examination. This can be done by identifying the main method and separating it from the rest of the code. The tool should be capable of handling code snippets of varying sizes and complexit

In [None]:
!pip install fpdf



In [None]:
import fpdf

# Assuming x["output"] contains the agent's response as a string
response_text = str(x["output"])
#print(response_text)
# Create a new PDF object
pdf = fpdf.FPDF()

# Add a page
pdf.add_page()

# Set the font and font size
pdf.set_font("Arial", size=12)

# Write the whole response; multi_cell wraps long lines and honors "\n" line breaks
pdf.multi_cell(0, 10, txt=response_text)

# Save the PDF file
pdf.output("agent_response.pdf")

''
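
The built-in fonts in `fpdf` 1.7.2 (such as Arial above) are limited to Latin-1, so a response containing curly quotes or other non-Latin-1 characters can fail with a `UnicodeEncodeError` when the PDF is written. The sketch below is one possible way to sanitize the text first. `to_latin1` and its replacement table are hypothetical helpers, not part of the original notebook.

```python
def to_latin1(text):
    # Map common typographic characters to plain ASCII equivalents
    replacements = {
        "\u2018": "'", "\u2019": "'",   # curly single quotes
        "\u201c": '"', "\u201d": '"',   # curly double quotes
        "\u2014": "-",                  # long dash
    }
    for src, dst in replacements.items():
        text = text.replace(src, dst)
    # Anything still outside Latin-1 becomes "?"
    return text.encode("latin-1", errors="replace").decode("latin-1")

clean = to_latin1("It\u2019s a \u201ctest\u201d \u2192 done")
print(clean)
```

With this in place, the call would become `pdf.multi_cell(0, 10, txt=to_latin1(response_text))`.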

In [None]:
!pip -q install huggingface_hub tiktoken
!pip -q install chromadb
!pip -q install InstructorEmbedding sentence_transformers
!pip -q install --upgrade together
!pip install -U langchain-community faiss-cpu langchain-openai tiktoken

In [None]:
!pip install pypdf



In [None]:
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import CharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

from langchain.memory import ConversationBufferMemory
from langchain import PromptTemplate

from langchain.chains import RetrievalQA

In [None]:
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("/content/agent_response.pdf")
dataAct_pages = loader.load_and_split()

In [None]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
data_act_texts = text_splitter.split_documents(dataAct_pages)
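
The two parameters above control how large each chunk is and how much text neighboring chunks share. The sketch below is a deliberately simplified sliding-window version of that idea. It is not how `RecursiveCharacterTextSplitter` works internally (the real splitter also tries to break on separators such as paragraphs and sentences), and `chunk_with_overlap` is a hypothetical helper.

```python
def chunk_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

text = "".join(str(i % 10) for i in range(2500))
chunks = chunk_with_overlap(text)
print([len(c) for c in chunks])
```

The overlap means the end of one chunk reappears at the start of the next, which helps a retriever when the relevant passage straddles a chunk boundary.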

In [None]:
from langchain_community.embeddings import HuggingFaceBgeEmbeddings

model_name = "BAAI/bge-base-en"
encode_kwargs = {'normalize_embeddings': True} # set True to compute cosine similarity

model_norm = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs={'device': 'cuda'},
    encode_kwargs=encode_kwargs
)

  from tqdm.autonotebook import tqdm, trange
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
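
The `normalize_embeddings` comment in the cell above relies on a simple identity: once two vectors are scaled to unit length, their dot product equals their cosine similarity. The sketch below checks this on two small hand-picked vectors using only the standard library.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(vec):
    # Scale the vector to unit length
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a, b = [3.0, 4.0], [4.0, 3.0]
print(cosine(a, b), dot(normalize(a), normalize(b)))
```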



In [None]:
embedding = model_norm
db = FAISS.from_documents(data_act_texts, embedding)
db

<langchain_community.vectorstores.faiss.FAISS at 0x7e10b6d71ea0>

In [None]:
retriever = db.as_retriever(search_kwargs={"k": 5})
retriever

VectorStoreRetriever(tags=['FAISS', 'HuggingFaceBgeEmbeddings'], vectorstore=<langchain_community.vectorstores.faiss.FAISS object at 0x7e10b6d71ea0>, search_kwargs={'k': 5})

## **Parent Document Retriever**

In [None]:
!pip install langchain-chroma



In [None]:
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_community.document_loaders import TextLoader
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma

In [None]:
loaders = [
    PyPDFLoader("/content/agent_response.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())

In [None]:
import os
from getpass import getpass
# Set your OpenAI API key as an environment variable (prompted at runtime, not hard-coded)
os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

In [None]:
# This text splitter is used to create the child documents
# It should create documents smaller than the parent
child_splitter = RecursiveCharacterTextSplitter(chunk_size=2000)
# The vectorstore to use to index the child chunks
vectorstore = Chroma(
    collection_name="full_documents", embedding_function=OpenAIEmbeddings()
)
# The storage layer for the parent documents
store = InMemoryStore()

In [None]:
retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
)

In [None]:
retriever.add_documents(docs)

RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
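
The idea behind a parent document retriever is to search over small child chunks but return the larger parent document each chunk came from. The sketch below illustrates that lookup with the standard library only. The word-overlap `score` is a toy stand-in for embedding similarity, and every name in it (`split_children`, `child_index`, `retrieve_parent`, the sample documents) is hypothetical.

```python
def split_children(text, size):
    # Cut a parent document into fixed-size child chunks
    return [text[i:i + size] for i in range(0, len(text), size)]

parents = {
    "doc-add": "public int add(int a, int b) { return a + b; } Adds two integers.",
    "doc-mul": "public int multiply(int a, int b) { return a * b; } Multiplies two integers.",
}

# Index: each child chunk remembers the id of the parent it came from
child_index = [
    (chunk, parent_id)
    for parent_id, text in parents.items()
    for chunk in split_children(text, size=30)
]

def score(query, chunk):
    # Toy relevance score: number of query words found in the chunk
    return sum(word in chunk.lower() for word in query.lower().split())

def retrieve_parent(query):
    # Match against the small chunks, then hand back the full parent text
    best_chunk, parent_id = max(child_index, key=lambda item: score(query, item[0]))
    return parents[parent_id]

result = retrieve_parent("multiply")
print(result)
```

Small chunks tend to match a query more precisely, while returning the parent gives the language model enough surrounding context to answer.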

In [None]:
sub_docs = db.similarity_search("multiply")
print(sub_docs[0].page_content)

return a + b + c;
}
```
Description: This method is a public add method that takes three integer parameters, adds them,
and returns the result.
Segment 5:
```java
public int multiply(int a, int b) {
 return a * b;
}
```
Description: This method is a public multiply method that takes two integer parameters, multiplies
them, and returns the result.
Segment 6:
```java
public static void main(String[] args) {
```
Description: This is the main method of the MathOperations class.
Subsegment 1:
```java
Scanner scanner = new Scanner(System.in);
```
Description: This declares a new Scanner object and initializes it with the standard input stream.


In [None]:
retrieved_docs = retriever.invoke("add method")
print(retrieved_docs[0].page_content)
len(retrieved_docs[0].page_content)
retrieved_docs

return a + b + c;
}
```
Description: This method is a public add method that takes three integer parameters, adds them,
and returns the result.
Segment 5:
```java
public int multiply(int a, int b) {
 return a * b;
}
```
Description: This method is a public multiply method that takes two integer parameters, multiplies
them, and returns the result.
Segment 6:
```java
public static void main(String[] args) {
```
Description: This is the main method of the MathOperations class.
Subsegment 1:
```java
Scanner scanner = new Scanner(System.in);
```
Description: This declares a new Scanner object and initializes it with the standard input stream.


[Document(page_content='return a + b + c;\n}\n```\nDescription: This method is a public add method that takes three integer parameters, adds them,\nand returns the result.\nSegment 5:\n```java\npublic int multiply(int a, int b) {\n return a * b;\n}\n```\nDescription: This method is a public multiply method that takes two integer parameters, multiplies\nthem, and returns the result.\nSegment 6:\n```java\npublic static void main(String[] args) {\n```\nDescription: This is the main method of the MathOperations class.\nSubsegment 1:\n```java\nScanner scanner = new Scanner(System.in);\n```\nDescription: This declares a new Scanner object and initializes it with the standard input stream.', metadata={'source': '/content/agent_response.pdf', 'page': 1}),
 Document(page_content='Subsegment 5:\n```java\nint sum2 = math.add(num1, num2, num3);\nSystem.out.println("Sum of " + num1 + ", " + num2 + ", and " + num3 + ": " + sum2);\n```\nDescription: This calls the add method with all three user-enter

## **Report Generation**

In [None]:
!pip install pypdf
!pip install -q transformers einops accelerate langchain bitsandbytes
## Embedding
!pip install sentence_transformers
!pip install llama_index
!pip install llama-index-llms-huggingface


In [None]:
!pip install llama-index-embeddings-langchain



In [None]:
from llama_index.core import VectorStoreIndex,SimpleDirectoryReader,ServiceContext
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.prompts.prompts import SimpleInputPrompt




In [None]:
documents=SimpleDirectoryReader("/content/NLP").load_data()
documents

[Document(id_='9160e7a0-5f91-491d-b41d-c675ccdce40a', embedding=None, metadata={'page_label': '1', 'file_name': 'agent_response-6.pdf', 'file_path': '/content/NLP/agent_response-6.pdf', 'file_type': 'application/pdf', 'file_size': 3371, 'creation_date': '2024-06-23', 'last_modified_date': '2024-06-23'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={}, text=' Segment 1:\n```java\nimport java.util.Scanner;\npublic class MathOperations {\n}\n```\nThis segment defines the MathOperations class but does not have any methods defined.\nSegment 2:\n```java\npublic class MathOperations {\n        \n    public int add(int a, int b) {\n        return a + b;\n    }\n    \n    public int add(int a, int b, int c) {\n        return a + b + c;\n    }\n    \n    public int multi

In [None]:
system_prompt="""
You are an expert code analyst. Your goal is to produce clear documentation for the given code so that it is easy to understand.
The documentation may include code analysis, complexity analysis, and remarks.
"""
## Wrapper applied around each query (note: Llama 2 chat models are normally prompted with [INST] ... [/INST] tags)
query_wrapper_prompt=SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

In [None]:
!huggingface-cli login



    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) Y
Token is valid (permission: write)

In [None]:
import torch

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="meta-llama/Llama-2-7b-chat-hf",
    model_name="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
    # Load the weights in 8-bit to reduce memory usage (needs a CUDA GPU and bitsandbytes)
    model_kwargs={"torch_dtype": torch.float16 , "load_in_8bit":True}
)

The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.


ValueError: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `load_in_8bit_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details. 

In [None]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.core import ServiceContext
from llama_index.embeddings.langchain import LangchainEmbedding

embed_model=LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2"))


In [None]:
service_context=ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model
)



In [None]:
index=VectorStoreIndex.from_documents(documents,service_context=service_context)

print(index)

<llama_index.core.indices.vector_store.base.VectorStoreIndex object at 0x7e108a93bd60>


In [None]:
query_engine=index.as_query_engine()

response=query_engine.query("Generate documentation for the given source code.")
print(response)






The documentation for the MathOperations class is as follows:

Methods:

* add(int, int): Returns the sum of two integers.
* add(int, int, int): Returns the sum of three integers.
* multiply(int, int): Returns the product of two integers.

Example Usage:
```java
// Example usage of the MathOperations class
int num1 = 5;
int num2 = 10;
int num3 = 15;
int sum1 = math.add(num1, num2);
int sum2 = math.add(num1, num2, num3);
int product = math.multiply(num1, num2);
System.out.println("Sum of " + num1 + ", " + num2 + ": " + sum1);
System.out.println("Sum of " + num1 + ", " + num2 + ", and " + num3 + ": " + sum2);
System.out.println("Product of " + num1 + " and " + num2 + ": " + product);
```
Note: The documentation is based on the information
