<a href="https://colab.research.google.com/github/AbhiramAnanthu/genai-workshop-prep/blob/develop/ktu_tools.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Tools**

In [None]:
!pip install requests beautifulsoup4 google-genai

In [1]:
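import requests
from IPython.display import Markdown
import json
from bs4 import BeautifulSoup

def parse(url):
  """Returns the button links found on a subject page as a list of {"name", "url"} dictionaries."""
  try:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "html.parser")
    a_tags = soup.find_all('a',class_='elementor-button')

    drive_links = []
    for tag in a_tags:
      label = tag.find('span', class_='elementor-button-text')
      if label is None:
        # Skip buttons without a text label instead of raising AttributeError.
        continue
      drive_links.append({"name": label.text.lower().strip(), "url": tag.get('href')})
    return drive_links
  except Exception as e:
    print(e)
    # Return an empty list so callers always receive a list.
    return []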

In [2]:
def crawl():
  """
  Gets the drive links for all semester 4 Computer Science and Engineering notes (APJKTU, 2019 scheme) from a website (ktunotes.in).
  args: None
  returns: list of dictionaries, each mapping a subject name to its list of {"name", "url"} links
  """
  try:
    start_url = "https://www.ktunotes.in/ktu-s4-cse-notes-2019-scheme/"
    # A timeout keeps the tool call from hanging if the site does not respond.
    response = requests.get(start_url, timeout=30)
    if response.ok:
      soup = BeautifulSoup(response.content, "html.parser")
      a_tags = soup.find_all('a',class_='elementor-button')
      subject_links = [
          {
              "subject": tag.find('span', class_='elementor-button-text').text.lower().strip(),
              "url": tag.get('href'),
          }
          for tag in a_tags
      ]
      drive_links = [
          {
              tag['subject']: parse(tag['url'])
          }
          for tag in subject_links
      ]
      return drive_links
    else:
      return "website error"
  except Exception as e:
    print(e)
    return None
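
The value returned by `crawl()` is a list of single-key dictionaries, one per subject button. A minimal sketch of how that structure could be merged into one lookup table and searched by keyword; `index_by_subject`, `find_links`, and the sample data are hypothetical and not part of the notebook:

```python
def index_by_subject(crawl_result):
    """Merge a list of {subject: links} dicts into one {subject: links} dict."""
    index = {}
    for entry in crawl_result:
        for subject, links in entry.items():
            # Repeated subject labels are merged; a missing link list counts as empty.
            index.setdefault(subject, []).extend(links or [])
    return index


def find_links(index, keyword):
    """Return the subjects (and their links) whose name contains the keyword."""
    keyword = keyword.lower()
    return {subject: links for subject, links in index.items() if keyword in subject}


# Made-up sample data in the same shape that crawl() returns.
sample_result = [
    {"database management systems": [{"name": "module 1", "url": "https://example.com/dbms-m1"}]},
    {"operating systems": [{"name": "module 1", "url": "https://example.com/os-m1"}]},
    {"operating systems": None},
]

subject_index = index_by_subject(sample_result)
matches = find_links(subject_index, "Database")
print(matches)
```

In the notebook itself the model does this filtering from the raw tool output, so a helper like this would only matter if the list were used outside the chat.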

In [3]:
from google.genai import Client
from google.genai import types
from google.colab import userdata

client = Client(api_key=userdata.get("GOOGLE_API_KEY"))

In [4]:
import re

def parse_pdf(url: str, prompt: str):
  """
  This function downloads the pdf file from the given url.
  Passes the pdf binary to the llm and gets the response for the user's question.
  args:
  url(drive url in view form), prompt(user's question regarding the pdf)
  returns:
  str(llm response in string)
  """
  download_format = "https://drive.usercontent.google.com/u/0/uc?id={file_id}&export=download"
  # Take the file id that follows "/d/"; this also works when the url has no trailing "/view".
  match = re.search(r"/d/([^/?#]+)", url)
  if match is None:
    return "could not find a file id in the given drive url"
  download_url = download_format.format(file_id=match.group(1))
  response = requests.get(download_url, timeout=60)
  if not response.ok:
    return "could not download the pdf"

  if "Content-Disposition" in response.headers:
    content_disposition = response.headers['Content-Disposition']
    # Keep only the quoted file name and drop any parameters that follow it.
    file_name = content_disposition.split("filename=")[-1].split(";")[0].strip().strip('"')
    # Append the extension only when the reported name lacks it.
    if not file_name.lower().endswith(".pdf"):
      file_name = f"{file_name}.pdf"

    with open(file_name, "wb") as file:
      file.write(response.content)

  system_prompt = (
      "\nYou are a pdf question answering system\n"
      "\nHere is the user's question\n"
      "\n{prompt}\n"
  )

  llm_response = client.models.generate_content(
      model="gemini-2.0-flash",
      contents=[
          types.Part.from_bytes(
              data=response.content,
              mime_type="application/pdf",
          ),
          system_prompt.format(prompt=prompt)
      ]
  )

  return llm_response.text
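
The conversion from a Drive "view" link to a direct download link can be tried on its own, without any network access. A small sketch, assuming the file id is always the path segment after `/d/`; `to_download_url` and `FILE_ID_PATTERN` are illustrative names, not part of the notebook:

```python
import re

DOWNLOAD_FORMAT = "https://drive.usercontent.google.com/u/0/uc?id={file_id}&export=download"
# Assumption: the file id is the path segment right after "/d/".
FILE_ID_PATTERN = re.compile(r"/d/([^/?#]+)")


def to_download_url(view_url):
    """Build a direct download url from a Drive url in view form."""
    match = FILE_ID_PATTERN.search(view_url)
    if match is None:
        raise ValueError(f"no Drive file id found in {view_url!r}")
    return DOWNLOAD_FORMAT.format(file_id=match.group(1))


example_url = to_download_url(
    "https://drive.google.com/file/d/158962-F2t_bI0rigpzGe8hGppEjo18VX/view?usp=share_link"
)
print(example_url)
```

A regular expression is used here instead of chained `split` calls so that a link without a trailing `/view` still yields an id rather than raising an unpacking error.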

In [5]:
# Adjacent string literals without commas are joined into one string; with commas this would be a tuple.
system_prompt = (
    "\nYou are a helper for a college student studying semester 4 computer science and engineering under APJKTU.\n"
    "\nYour capabilities are:\n"
    "\n1. Fetching notes from a website (ktunotes.in)\n"
    "\n2. Answering the user's questions about a PDF (a PDF QA system)\n"
    "\nWhen the user asks anything related to S4 KTU notes for computer science and engineering, use the tools.\n"
    "\nWhen the user asks a question based on a PDF, find which PDF they are referring to and use the necessary tool to parse it.\n"
)

config = types.GenerateContentConfig(
    tools=[crawl,parse_pdf],
    system_instruction=system_prompt
)
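
Passing plain Python functions in `tools` relies on each function's name, docstring, and type hints to describe the tool to the model, which is why the docstrings above matter. A rough sketch of that metadata using only the standard `inspect` module; `describe_tool` and `parse_pdf_stub` are hypothetical, and this does not reproduce how the SDK builds its actual declarations:

```python
import inspect


def describe_tool(func):
    """Collect the name, docstring, and parameter types of a function."""
    signature = inspect.signature(func)
    parameters = {}
    for name, param in signature.parameters.items():
        annotation = param.annotation
        if annotation is inspect.Parameter.empty:
            parameters[name] = "unannotated"
        else:
            parameters[name] = getattr(annotation, "__name__", str(annotation))
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
    }


# Stand-in with the same signature as parse_pdf, so the sketch runs without network access.
def parse_pdf_stub(url: str, prompt: str):
    """Answer a question about the PDF at a Drive URL."""
    return ""


tool_info = describe_tool(parse_pdf_stub)
print(tool_info)
```

Running a check like this on `crawl` and `parse_pdf` is a quick way to see which description text and parameter names are available for the model to choose a tool from.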

In [6]:
from pydantic import TypeAdapter

history_adapter = TypeAdapter(list[types.Content])

In [7]:
from IPython.display import Markdown, display
def converse():
  # Create the chat once; the chat object keeps the conversation history between turns.
  chat = client.chats.create(
    model="gemini-2.0-flash",
    config=config
  )
  while True:
    message = input("query: ").strip()
    if message.lower() in ['q','exit','quit']:
      break
    response = chat.send_message(message=message)
    # response.text can be None when the reply has no text part.
    display(Markdown(data=response.text or ""))
converse()

query: hi what can you do


I can help you with your studies in semester 4 computer science and engineering under APJKTU. I can fetch notes from ktunotes.com and answer your questions regarding a PDF. Just let me know what you need!


query: ok can you get me dbms s4 notes


Here are the available DBMS notes:

*   check syllabus: [https://drive.google.com/file/d/1nzyc3Rt5tEDNzjnoRPtbO9pvJx\_j8QKd/view](https://drive.google.com/file/d/1nzyc3Rt5tEDNzjnoRPtbO9pvJx_j8QKd/view)
*   module 1: [https://drive.google.com/file/d/158962-F2t\_bI0rigpzGe8hGppEjo18VX/view?usp=share\_link](https://drive.google.com/file/d/158962-F2t_bI0rigpzGe8hGppEjo18VX/view?usp=share_link)
*   module 2: [https://drive.google.com/file/d/1irJiSjyvZS01Op6ZkRgjCxSf-tzMBFnm/view?usp=share\_link](https://drive.google.com/file/d/1irJiSjyvZS01Op6ZkRgjCxSf-tzMBFnm/view?usp=share_link)
*   module 3: [https://drive.google.com/file/d/1tGgN4aYcbt2iBVz4wb\_tDf1zmTt9Cf2C/view?usp=share\_link](https://drive.google.com/file/d/1tGgN4aYcbt2iBVz4wb_tDf1zmTt9Cf2C/view?usp=share_link)
*   module 4: [https://drive.google.com/file/d/1WkYoh-Ln1EQi-XDJPCHUYZdNbQyHMi7a/view?usp=share\_link](https://drive.google.com/file/d/1WkYoh-Ln1EQi-XDJPCHUYZdNbQyHMi7a/view?usp=share_link)
*   module 5: [https://drive.google.com/file/d/1lwBBAXaI50BYEK\_MqIi1XNBpK4kGetWO/view?usp=share\_link](https://drive.google.com/file/d/1lwBBAXaI50BYEK_MqIi1XNBpK4kGetWO/view?usp=share_link)
*   module 1: [https://drive.google.com/file/d/10AkgddXQiPEuG90ktyneeGI91rA-0FTA/view?usp=share\_link](https://drive.google.com/file/d/10AkgddXQiPEuG90ktyneeGI91rA-0FTA/view?usp=share_link)
*   module 2: [https://drive.google.com/file/d/1uNjsmdOawtF1tFz5Uko2zETlIlrG1O0G/view?usp=share\_link](https://drive.google.com/file/d/1uNjsmdOawtF1tFz5Uko2zETlIlrG1O0G/view?usp=share_link)
*   module 3: [https://drive.google.com/file/d/1Drs95-kjaLUD-sxjxlKZBhFjl0iasGYi/view?usp=share\_link](https://drive.google.com/file/d/1Drs95-kjaLUD-sxjxlKZBhFjl0iasGYi/view?usp=share_link)



query: ok now from the first module is a pdf can you get me important defenitions from that drive link


Please specify which module 1 link you are referring to, as there are two listed.


query: the first one


Ok, I will use this link: https://drive.google.com/file/d/158962-F2t_bI0rigpzGe8hGppEjo18VX/view?usp=share_link. What definitions are you looking for from this module?


query: yes


Here are some important definitions from the document:

*   **Data:** Known facts that can be recorded and have implicit meaning.
*   **Database:** A collection of data.
*   **Database-management system (DBMS):** A collection of interrelated data and a set of programs to access those data.
*   **Universe of discourse (UoD) or Miniworld:** The aspects of the real world that the database represents.
*   **Data Model:** A collection of concepts that can be used to describe the structure of a database.
*   **Database schema:** The description of a database.
*   **Database state:** The data in database at a particular instant or moment of time.
*   **Entities:** An object in the real world with its attributes.
*   **Attributes:** A property of an object.
*   **Relationships:** How different entities are linked.


query: exit
