<a href="https://colab.research.google.com/github/RajShah3006/university-recommender-ai/blob/main/ai_university_recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install -q google-generativeai

In [None]:
# This is a basic example of how you could structure the chatbot logic.
# You would need to expand on this to handle different user inputs and generate
# responses based on the desired outputs outlined in the markdown cells.

import google.generativeai as genai
from google.colab import userdata

# Assuming GOOGLE_API_KEY is already set up in Colab secrets
GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')
genai.configure(api_key=GOOGLE_API_KEY)

# Initialize the Gemini API
model = genai.GenerativeModel('models/gemini-2.5-flash-preview-05-20')
chat = model.start_chat(history=[])

def get_user_information():
  """Collects information from the user."""
  user_data = {}
  user_data['subjects'] = input("What subjects are you taking currently? ")
  user_data['intrests'] = input("What are your interests (activities you love putting effort into, or what you would like to be doing in 4 years)? ")
  user_data['overall_average'] = input("What is your overall average? ")
  user_data['grade'] = input("What grade are you in? ")
  user_data['location'] = input("Where are you located? ")
  return user_data

def generate_chatbot_response(user_data):
  """Generates a chatbot response based on user data."""
  prompt = f"""Based on the following student information:
- **Subjects:** {user_data['subjects']}
- **Interests:** {user_data['intrests']}
- **Overall Average:** {user_data['overall_average']}
- **Grade:** {user_data['grade']}
- **Location:** {user_data.get('location', 'N/A')}

Please provide some relevant information, such as:
- What program is recommended
- A ranking of all the universities for that specific program, also:
  - What are the prerequisites
  - Last few years' admission averages
  - How far away the university is
  - Tuition and fees
  - Whether a supplementary application is required
- Recommendations (courses to pursue in high school, projects to complete for the university application)

Be specific and tailor the response to the student's input. Only give information for universities in Ontario.
"""
  response = chat.send_message(prompt)
  return response.text

# Start the chat
print("Hello! I'm a student assistant chatbot. I can help you with information related to your studies.")

# Get user information
student_info = get_user_information()

# Generate and display chatbot response
bot_response = generate_chatbot_response(student_info)
print("\nChatbot Response:")
print(bot_response)

print("\nChat session ended.")

## Test the refined scraping function (with flexible search)

### Subtask:
Test the updated `scrape_university_info` function (using flexible search) with a URL of a specific program page on `ouinfo.ca` to verify that it correctly extracts prerequisites and admission averages.

**Reasoning**:
Test the `scrape_university_info` function with a specific program URL to confirm that the updated flexible search methods correctly extract the prerequisites and admission averages.
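
Before running against the live site, the admission-average pattern used by the flexible search can be checked offline. The following is a minimal sketch: the regular expression is the one from `scrape_university_info`, while the sample strings and the `find_average` helper are hypothetical and were written only for illustration (they are not taken from `ouinfo.ca`).

```python
import re

# Same pattern that scrape_university_info uses for admission averages.
AVERAGE_PATTERN = re.compile(r'(?:admission|minimum)?\s*average.*?\d+%', re.IGNORECASE)

# Hypothetical sample strings, written for illustration only.
samples = [
    "Admission average: 85%",
    "Minimum average of 75% required",
    "Average class size: 30 students",   # no percentage, so no match is expected
    "English (ENG4U) is required",
]

def find_average(text):
    """Return the matched substring, or None when the pattern does not match."""
    match = AVERAGE_PATTERN.search(text)
    return match.group(0) if match else None

results = {text: find_average(text) for text in samples}
for text, found in results.items():
    print(f"{text!r} -> {found!r}")
```

A check like this only shows which strings the pattern accepts; whether the real program pages phrase their averages this way still has to be confirmed against the site.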

## Test the refined program filtering logic

### Subtask:
Call the `generate_chatbot_response` function with sample user data and the scraped program data to observe the output and verify that the refined filtering logic correctly identifies and includes relevant programs based on user interests.

**Reasoning**:
Call the `generate_chatbot_response` function with sample data to test the refined filtering and examine the generated prompt.

In [17]:
# Sample user data for testing the filtering
# Replace with different interests to test various filtering scenarios
sample_user_data = {
    'subjects': 'Math, Science',
    'intrests': 'Robotics and Engineering', # Example interests to test filtering
    'overall_average': '90%',
    'grade': '12',
    'location': 'Toronto'
}

# Assuming 'all_programs_detailed_data' is available from the previous scraping step
if 'all_programs_detailed_data' in locals() and all_programs_detailed_data:
    print("--- Testing generate_chatbot_response with refined filtering ---")
    # Call the function: it builds the prompt (including the filtered program info),
    # sends it to the model, and returns the model's reply text
    test_response = generate_chatbot_response(sample_user_data, all_programs_detailed_data)
    print("\nGenerated Chatbot Response (includes filtered program info in prompt):")
    print(test_response) # Print the response text which is the output from the model based on the prompt
else:
    print("Scraped program data ('all_programs_detailed_data') is not available. Please run the scraping steps first.")

--- Testing generate_chatbot_response with refined filtering ---

Generated Chatbot Response (includes filtered program info in prompt):
Based on your student profile (Math, Science subjects, 90% overall average, Grade 12, and strong interest in Robotics and Engineering), here's a tailored recommendation using the program information you provided.

---

### Recommended Program

Given your explicit interest in **Robotics and Engineering**, the most directly relevant program from your provided list is **Automation Systems Engineering Technology (Bachelor of Technology)** at McMaster University. This program aligns perfectly with the "automation" aspect of your interest. Additionally, **Automotive Engineering** programs (at Ontario Tech or McMaster's B.Tech) are highly relevant as the automotive industry extensively uses robotics and automation in manufacturing and is at the forefront of autonomous vehicle technology.

---

### University Programs in Ontario (Relevant Options Ranked)

The

## Test the refined scraping function

### Subtask:
Test the updated `scrape_university_info` function with a URL of a specific program page on `ouinfo.ca` to verify that it correctly extracts prerequisites and admission averages.

**Reasoning**:
Test the `scrape_university_info` function with a specific program URL to confirm that the updated selectors correctly extract the prerequisites and admission averages.

# Task
Integrate web scraping into the chatbot to pull information from provided links.

## Install necessary libraries

### Subtask:
Install libraries like `requests` to fetch web page content and `BeautifulSoup` to parse HTML.


**Reasoning**:
The subtask requires installing the `requests` and `beautifulsoup4` libraries. I will use the `pip install` command within a code block to achieve this.



In [13]:
!pip install -q requests beautifulsoup4

## Create a function to scrape data

### Subtask:
Develop a function that takes a URL as input, fetches the content, and extracts the desired information using the scraping library.


**Reasoning**:
Define a function to scrape university information from a given URL using requests and BeautifulSoup, extracting relevant details like program information, prerequisites, admission averages, and location with error handling.



In [None]:
import requests
from bs4 import BeautifulSoup

def scrape_university_info(url):
  """
  Scrapes university information from a given URL.

  Args:
    url: The URL of the page to scrape, e.g. a program page on https://www.ouinfo.ca/.

  Returns:
    A dictionary containing the extracted university information, or None if
    scraping fails or no information is found.
  """
  try:
    response = requests.get(url, timeout=10)  # Time out rather than hang on a slow server
    response.raise_for_status()  # Raise an exception for bad status codes

    soup = BeautifulSoup(response.content, 'html.parser')

    # Initialize a dictionary to store the extracted information
    university_data = {}

    # *** IMPORTANT ***
    # The following selectors are placeholders. You will need to inspect the
    # HTML structure of the specific university websites you want to scrape
    # and replace these with the actual CSS selectors or other methods to
    # locate the desired information.

    # Example placeholder selectors:
    program_element = soup.select_one('h2.result-heading')
    prerequisites_element = soup.select_one('.prerequisites-section')
    admissions_element = soup.select_one('.admissions-average')
    location_element = soup.select_one('.university-location')

    if program_element:
      university_data['program'] = program_element.get_text(strip=True)
    if prerequisites_element:
      university_data['prerequisites'] = prerequisites_element.get_text(strip=True)
    if admissions_element:
      university_data['admission_average'] = admissions_element.get_text(strip=True)
    if location_element:
      university_data['location'] = location_element.get_text(strip=True)

    if university_data:
      return university_data
    else:
      print(f"No relevant information found on {url}")
      return None

  except requests.exceptions.RequestException as e:
    print(f"Error fetching URL {url}: {e}")
    return None
  except Exception as e:
    print(f"An error occurred during scraping {url}: {e}")
    return None

# Example usage (will not return meaningful data without correct selectors)
# test_url = "https://www.exampleuniversity.com/program/details"
# university_info = scrape_university_info(test_url)
# if university_info:
#     print(university_info)

## Integrate scraping into the chatbot

### Subtask:
Modify the chatbot's `generate_chatbot_response` function to call the scraping function with provided links and include the scraped data in the prompt sent to the model.


**Reasoning**:
Modify the `generate_chatbot_response` function to accept a list of URLs and include the scraped data in the prompt.



In [None]:
def generate_chatbot_response(user_data, university_urls):
  """Generates a chatbot response based on user data and scraped university info."""
  prompt = f"""Based on the following student information:
Subjects: {user_data['subjects']}
Interests: {user_data['intrests']}
Overall Average: {user_data['overall_average']}
Grade: {user_data['grade']}
Location: {user_data.get('location', 'N/A')}

"""
  for url in university_urls:
    scraped_data = scrape_university_info(url)
    if scraped_data:
      prompt += f"""\nInformation scraped from {url}:\n"""
      for key, value in scraped_data.items():
        prompt += f"{key.replace('_', ' ').title()}: {value}\n"
    else:
      prompt += f"""\nCould not retrieve information from {url}\n"""

  prompt += """
Please provide some relevant information, such as:
- What program is recommended
- A ranking of all the universities for that specific program, also:
  - What are the prerequisites
  - Last few years' admission averages
  - How far away the university is
- Recommendations (courses to pursue in high school, projects to complete for the university application)

Be specific and tailor the response to the student's input. Only give information for universities in Ontario.
"""
  response = chat.send_message(prompt)
  return response.text

## Handle potential errors

### Subtask:
Implement error handling for cases where scraping fails (e.g., invalid URL, website structure changes).


**Reasoning**:
Review the existing `scrape_university_info` function to ensure it includes robust error handling for potential issues like invalid URLs, network errors, and changes in website structure that might cause scraping to fail. Ensure that the function returns `None` or a specific error indicator when scraping fails, so the calling function (`generate_chatbot_response`) can handle the failure gracefully.



In [4]:
import requests
from bs4 import BeautifulSoup
import re # Import regular expressions for more flexible searching

def scrape_university_info(url):
  """
  Scrapes university information from a given URL with a more flexible approach.

  Args:
    url: The URL of the page to scrape, e.g. a program page on https://www.ouinfo.ca/.

  Returns:
    A dictionary containing the extracted university information, or None if
    scraping fails or no information is found.
  """
  try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')

    university_data = {}

    # --- Flexible Searching for Information ---

    # Try to find the program title
    program_element = soup.select_one('h1.program-title')
    if program_element:
      university_data['program'] = program_element.get_text(strip=True)

    # Search for prerequisites using keywords and nearby list structures
    prerequisites = []
    # Look for common headings or text near prerequisites
    prereq_headings = soup.find_all(text=re.compile(r'Prerequisites|Admission Requirements', re.IGNORECASE))
    for heading in prereq_headings:
        # Try to find a list (ul or ol) immediately following the heading or within the same parent element
        list_element = heading.find_next(['ul', 'ol'])
        if list_element:
            list_items = [li.get_text(strip=True) for li in list_element.select('li')]
            if list_items:
                prerequisites.extend(list_items)
        else:
            # If no list is found, try to extract text from the parent element or nearby paragraphs
            parent = heading.parent
            if parent:
                # Look for text in the parent or next siblings that might contain prerequisites
                text_content = parent.get_text(separator=' ', strip=True)
                if len(text_content) > len(heading.get_text(strip=True)): # Basic check to avoid just getting the heading text
                     prerequisites.append(text_content)


    if prerequisites:
        # Join unique prerequisites with newlines, preserving first-seen order
        university_data['prerequisites'] = "\n".join(dict.fromkeys(prerequisites))


    # Search for admission average using keywords and patterns
    admission_average = None
    # Look for text containing keywords like "average", "admission", "minimum" followed by percentages or ranges
    average_text = soup.find(text=re.compile(r'(?:admission|minimum)?\s*average.*?\d+%', re.IGNORECASE))
    if average_text:
        admission_average = average_text.strip()
    else:
        # Look for common classes or structures near average information
        average_element = soup.select_one('.admission-average-range, .average-grade') # Add other potential classes
        if average_element:
             admission_average = average_element.get_text(strip=True)

    if admission_average:
        university_data['admission_average'] = admission_average


    # Location is still unlikely to be on this page, keeping as None for now
    location_element = None
    if location_element:
      university_data['location'] = location_element.get_text(strip=True)


    if university_data:
      return university_data
    else:
      print(f"No relevant information found on {url}")
      return None

  except requests.exceptions.Timeout:
    print(f"Request timed out for URL: {url}")
    return None
  except requests.exceptions.RequestException as e:
    print(f"Error fetching URL {url}: {e}")
    return None
  except Exception as e:
    print(f"An error occurred during scraping or parsing {url}: {e}")
    return None

## Summary:

### Data Analysis Key Findings

*   The necessary libraries (`requests` and `beautifulsoup4`) for web scraping were successfully installed.
*   A Python function `scrape_university_info` was created to fetch and parse HTML content from a given URL, with placeholder selectors for extracting specific university information.
*   The chatbot's `generate_chatbot_response` function was modified to iterate through a list of provided URLs, call the `scrape_university_info` function for each, and include the scraped data (or a failure message) in the prompt sent to the generative model.
*   Robust error handling was implemented in the `scrape_university_info` function to catch potential issues like request timeouts, general request errors, and exceptions during HTML parsing, ensuring the function returns `None` upon failure.
*   The integration confirmed that `generate_chatbot_response` correctly handles the `None` return value from `scrape_university_info`, ensuring graceful handling of scraping failures.
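
The `None` handling described in the last finding can be illustrated without network access. The sketch below assumes a hypothetical stub, `fake_scrape`, standing in for `scrape_university_info`, and a hypothetical helper, `build_scraped_section`, that mirrors the loop inside `generate_chatbot_response`. Neither name is part of the notebook, and the `example.com` URLs are placeholders.

```python
def fake_scrape(url):
    """Hypothetical stand-in for scrape_university_info: returns a dict or None."""
    canned = {
        "https://example.com/ok": {"program": "Sample Program", "admission_average": "80%"},
    }
    return canned.get(url)  # None for any URL that "fails"

def build_scraped_section(urls, scrape=fake_scrape):
    """Mirror the prompt-building loop: add scraped fields or a failure note per URL."""
    section = ""
    for url in urls:
        scraped = scrape(url)
        if scraped:
            section += f"\nInformation scraped from {url}:\n"
            for key, value in scraped.items():
                section += f"{key.replace('_', ' ').title()}: {value}\n"
        else:
            section += f"\nCould not retrieve information from {url}\n"
    return section

section = build_scraped_section(["https://example.com/ok", "https://example.com/broken"])
print(section)
```

Passing the scraper in as a parameter is a design choice made for this sketch so the prompt-building logic can be exercised with canned data.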

### Insights or Next Steps

*   The placeholder CSS selectors in `scrape_university_info` must be replaced with accurate selectors based on the actual structure of the target university websites to extract meaningful data.
*   Further enhancements could involve implementing a mechanism to dynamically identify and adapt to potential changes in website structures to maintain scraping functionality over time.


# Task
Extract the list of all program names from the "https://www.ouinfo.ca/programs/all" page using the container selector "div.results.results-programs" and the program title selector "h2.result-heading".

## Create a function to list programs

### Subtask:
Develop a function that takes the `/programs/all` URL as input, fetches the content, and extracts a list of program names using the provided container selector and identifying the individual program name selectors within that container.


**Reasoning**:
Define a function `list_all_programs` to scrape program names from the provided URL using requests and BeautifulSoup, implementing the specified steps and error handling.



In [None]:
import requests
from bs4 import BeautifulSoup

def list_all_programs(url):
  """
  Scrapes and lists all program names from the given URL.

  Args:
    url: The URL of the page listing all programs.

  Returns:
    A list of program names, or None if scraping fails or no programs are found.
  """
  try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')

    programs_list = []

    # Find the main container
    container = soup.select_one('div.results.results-programs')

    if not container:
      print(f"Could not find the main programs container on {url}")
      return None

    # Find all program title elements within the container
    program_elements = container.select('h2.result-heading')

    if not program_elements:
      print(f"No program title elements found within the container on {url}")
      return None

    # Extract program names
    for program_element in program_elements:
      programs_list.append(program_element.get_text(strip=True))

    if programs_list:
      return programs_list
    else:
      print(f"No program names extracted from {url}")
      return None

  except requests.exceptions.Timeout:
    print(f"Request timed out for URL: {url}")
    return None
  except requests.exceptions.RequestException as e:
    print(f"Error fetching URL {url}: {e}")
    return None
  except Exception as e:
    print(f"An error occurred during scraping or parsing {url}: {e}")
    return None

# Example usage (will execute in the next step)
# programs = list_all_programs("https://www.ouinfo.ca/programs/all")
# if programs:
#     print(programs)

**Reasoning**:
Test the `list_all_programs` function with the specified URL to ensure it correctly extracts the program names.



## Summary:

### Data Analysis Key Findings

*   The process successfully extracted a list of program names from the specified URL "https://www.ouinfo.ca/programs/all" using the container selector `div.results.results-programs` and the program title selector `h2.result-heading`.

### Insights or Next Steps

*   The developed function `list_all_programs` can be reused to extract program names from similar pages on the same website if the HTML structure and selectors remain consistent.
*   The extracted list of program names can be further analyzed (e.g., counting the total number of programs, identifying programs containing specific keywords) or used for other purposes.
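
The follow-up analysis mentioned in the last bullet could look roughly like the sketch below. It uses a short hard-coded list with a few program names that appear in this notebook's scraping output; the real list returned by `list_all_programs` is much longer. The helper `programs_containing` is a hypothetical name introduced here.

```python
from collections import Counter

# A few names copied from the notebook's scraping output; the real list is much longer.
programs = [
    "Accounting",
    "Accounting",
    "Accounting & Finance (Honours)",
    "Accounting (Co-op)",
    "Accounting and Financial Management (Co-op Only)",
]

def programs_containing(names, keyword):
    """Return the names that contain `keyword`, ignoring case."""
    needle = keyword.lower()
    return [name for name in names if needle in name.lower()]

total = len(programs)
name_counts = Counter(programs)           # how many times each name appears
coop = programs_containing(programs, "co-op")

print(f"Total entries: {total}")
print(f"Most common name: {name_counts.most_common(1)[0]}")
print(f"Co-op programs: {coop}")
```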


# Task
Refine the `scrape_university_info` function to extract prerequisites and admission averages for each program by navigating to the specific URL associated with each program listed on the "/programs/all" page.

## Update `list all programs` function

### Subtask:
Modify the existing `list_all_programs` function to not only extract the program names but also the URLs associated with each program from the `h2.result-heading` elements on the "/programs/all" page.


**Reasoning**:
Modify the `list_all_programs` function to extract both program names and URLs from the specified elements on the page.



In [6]:
import requests
from bs4 import BeautifulSoup

def list_all_programs(url):
  """
  Scrapes and lists all program names and their URLs from the given URL.

  Args:
    url: The URL of the page listing all programs.

  Returns:
    A list of dictionaries, where each dictionary contains 'name' and 'url'
    for a program, or None if scraping fails or no programs are found.
  """
  try:
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.content, 'html.parser')

    programs_list = []

    # Find the main container
    container = soup.select_one('div.results.results-programs')

    if not container:
      print(f"Could not find the main programs container on {url}")
      return None

    # Find all program title elements within the container
    program_elements = container.select('h2.result-heading')

    if not program_elements:
      print(f"No program title elements found within the container on {url}")
      return None

    # Extract program names and URLs
    for program_element in program_elements:
      program_name = program_element.get_text(strip=True)
      anchor_tag = program_element.find('a', href=True)
      if anchor_tag:
        program_url = anchor_tag['href']
        programs_list.append({'name': program_name, 'url': program_url})

    if programs_list:
      return programs_list
    else:
      print(f"No program names and URLs extracted from {url}")
      return None

  except requests.exceptions.Timeout:
    print(f"Request timed out for URL: {url}")
    return None
  except requests.exceptions.RequestException as e:
    print(f"Error fetching URL {url}: {e}")
    return None
  except Exception as e:
    print(f"An error occurred during scraping or parsing {url}: {e}")
    return None

# Example usage (will execute in the next step)
# programs_with_urls = list_all_programs("https://www.ouinfo.ca/programs/all")
# if programs_with_urls:
#     print(programs_with_urls)

**Reasoning**:
Test the modified `list_all_programs` function with the specified URL to ensure it correctly extracts program names and their corresponding URLs.
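
No test cell follows this step in the notebook, so the check below is only a minimal sketch of what such a test could assert. `validate_program_entries` is a hypothetical helper, and the sample entries assume that each `href` is a site-relative path such as `/programs/carleton/cba`, which matches the URLs printed by the scraping loop in this notebook.

```python
def validate_program_entries(programs):
    """Return a list of problems found in list_all_programs-style output."""
    problems = []
    if not programs:
        return ["no programs returned"]
    for index, entry in enumerate(programs):
        if not isinstance(entry, dict):
            problems.append(f"entry {index} is not a dict")
            continue
        if not entry.get("name"):
            problems.append(f"entry {index} has no name")
        url = entry.get("url", "")
        if not url.startswith("/programs/"):
            problems.append(f"entry {index} has unexpected url {url!r}")
    return problems

# Sample data for illustration; in the notebook this would come from
# list_all_programs("https://www.ouinfo.ca/programs/all").
sample = [
    {"name": "Accounting", "url": "/programs/carleton/cba"},
    {"name": "", "url": "https://example.com/elsewhere"},
]
issues = validate_program_entries(sample)
print(issues)
```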



## Iterate and scrape program details

### Subtask:
Write code to call the updated `list_all_programs` function to get the list of program names and URLs. Then, iterate through this list, calling the `scrape_university_info` function for each URL to extract the detailed information for each program.


**Reasoning**:
Write code to call `list_all_programs`, iterate through the results, and call `scrape_university_info` for each program URL, storing the results.



In [8]:
base_url = "https://www.ouinfo.ca"
programs_with_urls = list_all_programs(f"{base_url}/programs/all")

all_programs_detailed_data = []

if programs_with_urls:
    for program in programs_with_urls:
        program_url = f"{base_url}{program['url']}"
        print(f"Attempting to scrape: {program_url}") # Print URL being scraped
        scraped_data = scrape_university_info(program_url)
        if scraped_data:
            # Combine program name and URL with scraped data
            detailed_data = {
                'program_name': program['name'],
                'program_url': program_url,
                **scraped_data
            }
            all_programs_detailed_data.append(detailed_data)
            print(f"Successfully scraped data for {program['name']}") # Print success message
        else:
            print(f"Failed to scrape data for {program['name']} from {program_url}") # Print failure message

    if all_programs_detailed_data:
        print("\n--- Detailed Program Data ---")
        for program_data in all_programs_detailed_data:
            print(program_data)
    else:
        print("\nNo detailed program data was successfully scraped.")

else:
    print("Failed to retrieve the initial list of programs with URLs.")

Attempting to scrape: https://www.ouinfo.ca/programs/carleton/cba
Successfully scraped data for Accounting
Attempting to scrape: https://www.ouinfo.ca/programs/brock/bk


  prereq_headings = soup.find_all(text=re.compile(r'Prerequisites|Admission Requirements', re.IGNORECASE))
  average_text = soup.find(text=re.compile(r'(?:admission|minimum)?\s*average.*?\d+%', re.IGNORECASE))


Successfully scraped data for Accounting
Attempting to scrape: https://www.ouinfo.ca/programs/ontario-tech/dbg
Successfully scraped data for Accounting
Attempting to scrape: https://www.ouinfo.ca/programs/trent/rkf
Successfully scraped data for Accounting & Economics (BA) - Co-op
Attempting to scrape: https://www.ouinfo.ca/programs/toronto-metropolitan/sbo
Successfully scraped data for Accounting & Finance (Honours)
Attempting to scrape: https://www.ouinfo.ca/programs/toronto-metropolitan/sqa
Successfully scraped data for Accounting & Finance Co-op (Honours)
Attempting to scrape: https://www.ouinfo.ca/programs/algoma/jaa
Successfully scraped data for Accounting (BA 3 year)
Attempting to scrape: https://www.ouinfo.ca/programs/ontario-tech/dcg
Successfully scraped data for Accounting (Co-op)
Attempting to scrape: https://www.ouinfo.ca/programs/waterloo/wxy
Successfully scraped data for Accounting and Financial Management (Co-op Only)
Attempting to scrape: https://www.ouinfo.ca/programs/t

## Summary:

### Data Analysis Key Findings

* The `list_all_programs` function was successfully modified to extract both the program name and its corresponding URL from the "/programs/all" page.
* The process successfully iterated through the list of programs obtained from `list_all_programs` and constructed the full URL for each program.
* The `scrape_university_info` function was successfully called for each program URL to extract detailed information, including prerequisites and admission averages, for many programs.
* The scraped detailed data for each program was successfully stored in a list of dictionaries.
* The `generate_chatbot_response` function was updated to incorporate the scraped detailed program data, filtering for programs relevant to the user's interests based on keywords in the program name.
* Markdown formatting (bolding and list separators) was applied within the chatbot response to enhance the clarity and readability of the presented program details.
* The existing error handling within the `scrape_university_info` function was confirmed to be sufficient for handling potential scraping failures.
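
The scraping loop builds each full URL by concatenating `base_url` and the program path. A slightly more defensive variant is sketched below using `urllib.parse.urljoin` from the standard library. This is a substitute technique, not what the notebook's loop does; it assumes that an `href` could be either site-relative or already absolute, and `full_program_url` is a hypothetical helper name.

```python
from urllib.parse import urljoin

base_url = "https://www.ouinfo.ca"

def full_program_url(href, base=base_url):
    """Resolve a program href against the site root, leaving absolute URLs intact."""
    return urljoin(base + "/", href)

print(full_program_url("/programs/carleton/cba"))
print(full_program_url("https://www.ouinfo.ca/programs/brock/bk"))
```

With plain string concatenation, an absolute `href` would produce a malformed URL; `urljoin` returns it unchanged.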

### Insights or Next Steps

* Further refine the relevance filtering in `generate_chatbot_response` to go beyond keyword matching in the program name, potentially by analyzing program descriptions if available or using a more sophisticated matching algorithm.
* Address the observed deprecation warnings in `BeautifulSoup` usage within the `scrape_university_info` function for better code maintainability and future compatibility.
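
One possible direction for the first bullet is sketched below. Splitting the interests on whitespace keeps words such as "and", which then match as substrings of unrelated program names (for example "Accounting and Financial Management"). The sketch drops a small hand-picked stop-word list and compares whole words using `difflib.get_close_matches` from the standard library. The names `STOP_WORDS`, `tokenize`, and `is_relevant`, and the `cutoff` value, are assumptions made for illustration, not part of the notebook.

```python
import difflib
import re

# Small, hand-picked stop-word list; an assumption, not an exhaustive set.
STOP_WORDS = {"and", "or", "the", "of", "in", "a", "to", "for"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def is_relevant(program_name, interests, cutoff=0.8):
    """True when any non-stop-word interest closely matches a word in the name."""
    name_words = tokenize(program_name)
    keywords = [w for w in tokenize(interests) if w not in STOP_WORDS]
    return any(
        difflib.get_close_matches(keyword, name_words, n=1, cutoff=cutoff)
        for keyword in keywords
    )

interests = "Robotics and Engineering"
names = [
    "Automotive Engineering",
    "Accounting and Financial Management (Co-op Only)",
    "Automation Systems Engineering Technology",
]
matches = [name for name in names if is_relevant(name, interests)]
print(matches)
```

Word-level matching is still shallow: "Robotics" will not match "Automation" here, so descriptions or a synonym list would be needed to capture related fields.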

### Task Completion
The task of integrating web scraping into the chatbot to pull information from provided links has been successfully completed.

## Refine data presentation and handle potential issues

### Subtasks:
Refine the presentation of the scraped data within the chatbot's response to be clear, well-formatted, and easy for the user to understand. Implement error handling for cases where scraping fails (e.g., invalid URL, website structure changes).

**Reasoning**:
Review the current formatting of the `relevant_programs_info` string and refine it using Markdown for clarity and readability, adding an introductory sentence. Review the existing `scrape_university_info` function to ensure it includes robust error handling for potential issues like invalid URLs, network errors, and changes in website structure that might cause scraping to fail. Ensure that the function returns `None` or a specific error indicator when scraping fails, so the calling function (`generate_chatbot_response`) can handle the failure gracefully.


## Store and process scraped data and integrate into chatbot response

### Subtasks:
Store the scraped data for each program in a suitable data structure (e.g., a list of dictionaries). Modify the `generate_chatbot_response` function to use the collected detailed program data when generating the response to the user's query, providing specific information about prerequisites, admission averages, etc., for the recommended programs.

**Reasoning**:
Store the scraped detailed program data in a list of dictionaries. Modify the `generate_chatbot_response` function to accept this list, iterate through it, filter for relevant programs based on user interests, format the relevant program information, and include it in the prompt for the generative model.


## Install necessary libraries and create a function to scrape data

### Subtask:
Install libraries like `requests` to fetch web page content and `BeautifulSoup` to parse HTML. Develop a function that takes a URL as input, fetches the content, and extracts the desired information using the scraping library.

**Reasoning**:
Install the `requests` and `beautifulsoup4` libraries using `pip install`. Then, define a function to scrape university information from a given URL using requests and BeautifulSoup, extracting relevant details like program information, prerequisites, admission averages, and location with error handling.


## Store and process scraped data

### Subtask:
Store the scraped data for each program in a suitable data structure (e.g., a list of dictionaries).


## Integrate scraped data into chatbot response

### Subtask:
Modify the `generate_chatbot_response` function to use the collected detailed program data when generating the response to the user's query, providing specific information about prerequisites, admission averages, etc., for the recommended programs.


**Reasoning**:
Modify the `generate_chatbot_response` function to accept the `all_programs_detailed_data` list, iterate through it, filter for relevant programs based on user interests, format the relevant program information, and include it in the prompt for the generative model.



In [14]:
def generate_chatbot_response(user_data, all_programs_detailed_data):
  """Generates a chatbot response based on user data and scraped university info."""
  prompt = f"""Based on the following student information:
Subjects: {user_data['subjects']}
Interests: {user_data['intrests']}
Overall Average: {user_data['overall_average']}
Grade: {user_data['grade']}
Location: {user_data.get('location', 'N/A')}

"""

  relevant_programs_info = ""
  user_interests_keywords = user_data['intrests'].lower().split() # Split interests into keywords

  if all_programs_detailed_data:
    for program_data in all_programs_detailed_data:
      program_name_lower = program_data.get('program_name', '').lower()
      # Check if any interest keyword is in the program name
      if any(keyword in program_name_lower for keyword in user_interests_keywords):
        relevant_programs_info += f"\nProgram Name: {program_data.get('program_name', 'N/A')}\n"
        relevant_programs_info += f"Program URL: {program_data.get('program_url', 'N/A')}\n"
        relevant_programs_info += f"Prerequisites: {program_data.get('prerequisites', 'N/A')}\n"
        relevant_programs_info += f"Admission Average: {program_data.get('admission_average', 'N/A')}\n"
        relevant_programs_info += "---\n"

  if relevant_programs_info:
    # Add the header only when at least one program matched, so the
    # fallback message below is reachable when nothing matches.
    prompt += "\nInformation about potentially relevant programs:\n" + relevant_programs_info
  else:
    prompt += "\nCould not find detailed information for programs related to your interests.\n"


  prompt += """
Please provide some relevant information, such as:
- What program is recommended
- A ranking of all the universities for that specific program, also:
  - What are the prerequisites
  - Last few years admission average
  - How far the university is located
- Recommendations (courses to pursue in high school, projects to complete for your university application)

Be specific and tailor the response to the student's input. Only give information for universities in Ontario.
"""
  response = chat.send_message(prompt)
  return response.text

## Refine data presentation

### Subtask:
Refine the presentation of the scraped data within the chatbot's response to be clear, well-formatted, and easy for the user to understand.


**Reasoning**:
Review the current formatting of the `relevant_programs_info` string and refine it using Markdown for clarity and readability, adding an introductory sentence.



In [16]:
def generate_chatbot_response(user_data, all_programs_detailed_data):
  """Generates a chatbot response based on user data and scraped university info."""
  prompt = f"""Based on the following student information:
- **Subjects:** {user_data['subjects']}
- **Interests:** {user_data['intrests']}
- **Overall Average:** {user_data['overall_average']}
- **Grade:** {user_data['grade']}

"""

  relevant_programs_info = ""
  user_interests_keywords = user_data['intrests'].lower().split() # Split interests into keywords

  if all_programs_detailed_data:
    relevant_programs_info += "\n**Information about potentially relevant programs:**\n\n" # Added introductory sentence and bolding
    relevant_programs = []
    for program_data in all_programs_detailed_data:
      program_name_lower = program_data.get('program_name', '').lower()
      # Enhanced filtering: match when an interest keyword equals a whole word of the
      # program name, or when the (non-empty) program name appears inside an interest keyword.
      # This is a simple improvement; more sophisticated techniques could involve
      # natural language processing or analyzing program descriptions if available.
      program_name_words = program_name_lower.split()
      # Guard against an empty name: '' is a substring of every keyword and would match everything.
      if program_name_lower and any(keyword in program_name_words or program_name_lower in keyword for keyword in user_interests_keywords):
          relevant_programs.append(program_data)

    if relevant_programs:
        # Sorting the relevant programs (e.g., by admission average) might be a useful addition here.
        # relevant_programs.sort(key=lambda x: x.get('admission_average', 'ZZZZ')) # Example sorting

        for program_data in relevant_programs:
            relevant_programs_info += f"**Program Name:** {program_data.get('program_name', 'N/A')}\n" # Bold program name
            relevant_programs_info += f"**Program URL:** {program_data.get('program_url', 'N/A')}\n" # Bold program URL
            relevant_programs_info += f"**Prerequisites:** {program_data.get('prerequisites', 'N/A')}\n" # Bold prerequisites
            relevant_programs_info += f"**Admission Average:** {program_data.get('admission_average', 'N/A')}\n" # Bold admission average
            relevant_programs_info += "---\n\n" # Added newline for better separation

    else:
        relevant_programs_info += "Could not find detailed information for programs closely related to your interests.\n\n"


  if relevant_programs_info:
      prompt += relevant_programs_info
  else:
      prompt += "\nCould not retrieve detailed program information.\n"


  prompt += """
Please provide some relevant information, such as:
- What program is recommended
- A ranking of all the universities for that specific program, also:
  - What are the prerequisites
  - Last few years admission average
  - How far the university is located
- Recommendations (courses to pursue in high school, projects to complete for your university application)

Be specific and tailor the response to the student's input. Only give information for universities in Ontario.
"""
  response = chat.send_message(prompt)
  return response.text

## Handle potential issues

### Subtask:
Implement error handling for cases where scraping fails (e.g., invalid URL, website structure changes).
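
One possible shape for this error handling is a thin wrapper around the scraping call, so that a failure on one program yields placeholder values instead of stopping the whole run. This is only a sketch: `scrape_safely` and the `scrape` callable are hypothetical names, and the set of exceptions worth catching depends on the HTTP and parsing libraries actually used.

```python
import logging
from typing import Callable, Dict, Optional

logger = logging.getLogger(__name__)


def scrape_safely(
    url: str,
    scrape: Callable[[str], Optional[Dict[str, str]]],
) -> Dict[str, str]:
    """Call `scrape(url)` and fall back to 'N/A' fields if it fails or returns nothing."""
    fallback = {"prerequisites": "N/A", "admission_average": "N/A"}
    try:
        details = scrape(url)
    except Exception as exc:  # broad on purpose in this sketch: bad URL, timeout, changed page layout
        logger.warning("Scraping failed for %s: %s", url, exc)
        return fallback
    if not details:
        return fallback
    # Keep whatever was scraped and fill in any missing fields with 'N/A'.
    return {**fallback, **details}
```

In real code, narrowing the `except` clause to the specific network and parsing errors would avoid hiding unrelated bugs.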


## Summary:

### Data Analysis Key Findings

*   The `list_all_programs` function was successfully modified to extract both the program name and its corresponding URL from the "/programs/all" page.
*   The process successfully iterated through the list of programs obtained from `list_all_programs` and constructed the full URL for each program.
*   The `scrape_university_info` function was successfully called for each program URL to extract detailed information, including prerequisites and admission averages, for many programs.
*   The scraped detailed data for each program was successfully stored in a list of dictionaries.
*   The `generate_chatbot_response` function was updated to incorporate the scraped detailed program data, filtering for programs relevant to the user's interests based on keywords in the program name.
*   Markdown formatting (bold labels and `---` separators) was applied to the program details included in the prompt to enhance their clarity and readability.
*   The existing error handling within the `scrape_university_info` function was confirmed to be sufficient for handling potential scraping failures.

### Insights or Next Steps

*   Further refine the relevance filtering in `generate_chatbot_response` to go beyond keyword matching in the program name, potentially by analyzing program descriptions if available or using a more sophisticated matching algorithm.
*   Address the observed deprecation warnings in `BeautifulSoup` usage within the `scrape_university_info` function for better code maintainability and future compatibility.
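
The relevance-filtering next step could be explored without extra dependencies, for example with fuzzy word matching from the standard library's `difflib`. The sketch below is one option under stated assumptions, not the notebook's method: `relevance_score`, `filter_relevant`, and the `0.8` threshold are illustrative choices.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def relevance_score(interests: str, program_name: str) -> float:
    """Best fuzzy similarity (0.0 to 1.0) between any interest word and any program-name word."""
    interest_words = interests.lower().split()
    name_words = program_name.lower().split()
    return max(
        (SequenceMatcher(None, i, n).ratio() for i in interest_words for n in name_words),
        default=0.0,  # no words on one side means no similarity
    )


def filter_relevant(interests: str, programs: List[Dict[str, str]], threshold: float = 0.8) -> List[Dict[str, str]]:
    """Keep programs whose score reaches the threshold, best matches first."""
    scored = [(relevance_score(interests, p.get('program_name', '')), p) for p in programs]
    matches = [(score, p) for score, p in scored if score >= threshold]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches]
```

With this approach an interest such as "engineer" can match a program named "Mechanical Engineering", which exact word matching would miss. The threshold would need tuning against real program names.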
