# Adding libraries

In [None]:
!pip install groq

Collecting groq
  Downloading groq-0.18.0-py3-none-any.whl.metadata (14 kB)
Downloading groq-0.18.0-py3-none-any.whl (121 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.9/121.9 kB 2.0 MB/s eta 0:00:00
Installing collected packages: groq
Successfully installed groq-0.18.0


In [None]:
import os
import json
import requests
from typing import List
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display  # for rendering markdown output
from openai import OpenAI  # optional: only needed if you want to use OpenAI
from groq import Groq  # GroqCloud client, for free models such as Llama

## Setting up environment

In [None]:
from google.colab import userdata
api_key = userdata.get('GROQ_API_KEY')
# userdata.get('OPEN_AI_API_KEY')

# Check the key
if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
else:
    print("API key found and looks good so far!")

API key found and looks good so far!


In [None]:
# Model constants; with Groq I am using the Llama 3.3 70B model
MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama-3.3-70b-versatile'

In [None]:
client = Groq(api_key=api_key)
# client = OpenAI(api_key=api_key)

# Check that the client returns a response
response = client.chat.completions.create(
    model=MODEL_LLAMA,
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won 2019 US election?"},
    ],
)

print(response.choices[0].message.content)

There was no US presidential election in 2019. The most recent presidential elections were:

- 2016: Donald Trump (Republican) won the presidential election, defeating Hillary Clinton (Democrat).
- 2020: Joe Biden (Democrat) won the presidential election, defeating incumbent President Donald Trump (Republican).

The 2018 United States elections were held on November 6, 2018, and they were midterm elections, where members of the US House of Representatives and one-third of the US Senate were elected. The Democrats gained control of the House, while the Republicans maintained control of the Senate.

If you have any other questions, I'll be happy to help!


## Creating a simple web scraper class using BeautifulSoup and requests

This scraper fetches a web page and extracts its text content, links, images and tables. Play with it to get a deeper understanding. It works well on static websites but may miss content on dynamic websites that render their pages with JavaScript.
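The scraper stores every table as a list of rows, and every row as a list of cell strings. As a minimal sketch of putting that structure to use, the hypothetical helper `table_to_markdown` below (not part of the original notebook) renders one such table as a markdown table, assuming the first row is the header row. The sample data is made up for illustration.

```python
from typing import List


def table_to_markdown(rows: List[List[str]]) -> str:
    """Render rows (first row treated as the header) as a markdown table."""
    if not rows:
        return ""
    width = max(len(row) for row in rows)
    # Pad short rows so every line has the same number of columns.
    padded = [row + [""] * (width - len(row)) for row in rows]
    header, *body = padded
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join(["---"] * width) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Made-up sample in the same shape as one entry of the scraper's `tables` list.
sample = [["Name", "Value"], ["a", "1"], ["b"]]
print(table_to_markdown(sample))
```

With the class defined in the next cell, something like `table_to_markdown(ed.tables[0])` could be tried on any page that actually contains a table.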

In [None]:
# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class WebsiteScraper:
    """
    A utility class representing a scraped web page: its title, text, links, images and tables
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')

        self.title = soup.title.string if soup.title else "No title found"

        if soup.body:
            for irrelevant in soup.body(["script", "style", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        self.links = [link.get('href') for link in soup.find_all('a') if link.get('href')]
        self.images = [image.get('src') for image in soup.find_all('img') if image.get('src')]

        # Collect each table as a list of rows; each row is a list of cell strings
        self.tables = []
        for table in soup.find_all('table'):
            tableData = []
            for row in table.find_all('tr'):
                rowData = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
                if rowData:  # Only append non-empty rows
                    tableData.append(rowData)
            self.tables.append(tableData)

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"


In [None]:
ed = WebsiteScraper("https://www.ewubd.edu/")
tabledata = ed.tables

Check out the output. This is what the scraper collected from the homepage of the given website.

In [None]:
ed.get_contents()

'Webpage Title:\nEast West University\nWebpage Contents:\nAdmission Test Results: Spring 2025\nCRTEWU 107th Research Seminar\nCall for Applications: Graduate Diploma in Leather, Leather Goods and Footwear Management (GDLFM) program of batch 3 & 4 (Deadline: 18 Dec. 2024)\nExtension of Time: Call for Proposals-Round 17\n2nd Call for Papers: East West Journal of Business and Social Studies: Vol.12, 2024\nHome\nFaculties\nFaculty of Sciences and Engineering\nFaculty of Liberal Arts and Social Sciences\nFaculty of Business and Economics\nDepartments\nCSE\nEEE\nGEB\nPharmacy\nCivil Engineering\nMathematical & Physical Sciences\nEnglish\nLaw\nSocial Relations\nInformation Studies\nSociology\nBusiness Administration\nMBA & EMBA Programs\nEconomics\nAlumni\nStudents\nDept. of Student\'s Welfare\nSexual Harassment\nGrades, Rules & Regulations\nRules & Regulation\nStudent Portal (Grade Report)\nSearch Course\nAcademic Calendar\nProctor Schedule\nPayment Procedure\nScholarships & Financial Assist

In [None]:
ed.images

['https://www.ewubd.edu/themes/east-west-university/assets/default/images/logo.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/students.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/login-icon.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/logo.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/graduation.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/library.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/scholar.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/alumni-search.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/icon/icon-opac-200x150.png',
 'https://www.ewubd.edu/themes/east-west-university/assets/default/images/Dr.-Fasaruddin_Photo1.jpg',
 'https://www.ewubd.edu/storage/app/uploads/public/679/21b/

In [None]:
ed.links

['https://result.ewubd.edu/',
 'https://www.ewubd.edu/notice-details/crtewu-107th-research-seminar',
 'https://www.ewubd.edu/storage/app/media/EDC-SICIP/Admission%20Notice_East%20West%20University.pdf',
 'https://www.ewubd.edu/notice-details/extension-time-call-proposals-round-17-2',
 'https://www.ewubd.edu/storage/app/media/crt/Call%20For%20Paper/2nd%20Call%20Vol%2012.pdf',
 'https://www.ewubd.edu',
 '#kingster-mobile-menu',
 'https://www.ewubd.edu',
 '#',
 'https://fse.ewubd.edu',
 'https://flass.ewubd.edu',
 'https://fbe.ewubd.edu',
 '#',
 'https://fse.ewubd.edu/computer-science-engineering',
 'https://fse.ewubd.edu/electrical-electronic-engineering',
 'https://fse.ewubd.edu/genetic-engineering-biotechnology',
 'https://fse.ewubd.edu/pharmacy-department',
 'https://fse.ewubd.edu/civil-engineering',
 'https://fse.ewubd.edu/mathematical-physical-science',
 'https://flass.ewubd.edu/english-department',
 'https://flass.ewubd.edu/law-department',
 'https://flass.ewubd.edu/social-relation

## Let's use the data to make a brochure maker

As we can see, the scraper collected a lot of links from the website, and some of them are not usable (for example the bare `#` anchors). For this example my target is to create a brochure, so I need information about the website. So let's use an LLM to find the relevant links and return an object that I can use to gather that information. Let's define the prompts (system prompt and user prompt) for the LLM.
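Some of that cleanup can also be done before the model sees the list. The sketch below is an optional pre-filter, not something the original notebook does: the hypothetical helper `normalize_links` resolves relative links with `urllib.parse.urljoin`, drops in-page anchors and non-web schemes, and removes duplicates. The sample list mixes links from the output above with made-up entries (`/career`, the `mailto:` address) for illustration.

```python
from typing import List
from urllib.parse import urljoin, urlparse


def normalize_links(base_url: str, links: List[str]) -> List[str]:
    """Resolve relative links, drop non-web links, and remove duplicates."""
    seen = set()
    cleaned = []
    for link in links:
        link = link.strip()
        # Skip empty links and in-page anchors such as "#" or "#menu".
        if not link or link.startswith("#"):
            continue
        absolute = urljoin(base_url, link)
        # Keep only http(s) URLs; this drops mailto:, tel: and javascript: links.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


sample_links = [
    "https://www.ewubd.edu",
    "#kingster-mobile-menu",
    "#",
    "/career",                  # made-up relative link
    "mailto:info@example.com",  # made-up email link
    "https://www.ewubd.edu",
]
print(normalize_links("https://www.ewubd.edu/", sample_links))
```

Filtering first keeps the user prompt shorter, at the cost of one more step to maintain; leaving the decision entirely to the model, as this notebook does, also works.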

In [None]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

# Here I am asking the model to go through the links collected by the scraper, find the relevant ones,
# and output them in JSON format.

In [None]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt
# Here I am telling the model which kinds of links count as relevant for the brochure.

In [None]:
# So the user prompt will be:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://www.ewubd.edu/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://result.ewubd.edu/
https://www.ewubd.edu/notice-details/crtewu-107th-research-seminar
https://www.ewubd.edu/storage/app/media/EDC-SICIP/Admission%20Notice_East%20West%20University.pdf
https://www.ewubd.edu/notice-details/extension-time-call-proposals-round-17-2
https://www.ewubd.edu/storage/app/media/crt/Call%20For%20Paper/2nd%20Call%20Vol%2012.pdf
https://www.ewubd.edu
#kingster-mobile-menu
https://www.ewubd.edu
#
https://fse.ewubd.edu
https://flass.ewubd.edu
https://fbe.ewubd.edu
#
https://fse.ewubd.edu/computer-science-engineering
https://fse.ewubd.edu/electrical-electronic-engineering
https://fse.ewubd.edu/genetic-engineering-biotechnology
https://fse.ewubd.edu/pharmacy-department
htt

In [None]:
# Now we send the prompts to the LLM
def get_links(url):
    website = WebsiteScraper(url)
    response = client.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)

In [None]:
get_links("https://www.ewubd.edu/")

{'links': [{'type': 'about page', 'url': 'https://www.ewubd.edu/'},
  {'type': 'admission page', 'url': 'http://admission.ewubd.edu'},
  {'type': 'career page', 'url': 'https://www.ewubd.edu/career'},
  {'type': 'alumni page', 'url': 'https://alumni.ewubd.edu/'},
  {'type': 'scholarships page',
   'url': 'https://www.ewubd.edu/scholarships-financial-aid'},
  {'type': 'academic calendar',
   'url': 'https://www.ewubd.edu/academic-calendar'},
  {'type': 'campus life', 'url': 'https://www.ewubd.edu/campus-life'}]}

Now we have the relevant links, which we can go through to collect information. (If you use OpenAI it may give a better result. I am using a smaller model, so it gave me a short list of relevant links.)
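The JSON response format asks the model for valid JSON, but the reply may still not follow the `links` shape requested in the system prompt. As a hedged sketch, the hypothetical helper `parse_link_response` below (not part of the original notebook) keeps only entries that carry both a `type` and a `url` string and returns an empty list when the reply cannot be parsed.

```python
import json
from typing import Dict, List


def parse_link_response(raw: str) -> List[Dict[str, str]]:
    """Return the well-formed link entries from a model reply, or [] on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    entries = data.get("links", [])
    if not isinstance(entries, list):
        return []
    valid = []
    for entry in entries:
        # Keep only entries that carry both a string "type" and a string "url".
        if (
            isinstance(entry, dict)
            and isinstance(entry.get("type"), str)
            and isinstance(entry.get("url"), str)
        ):
            valid.append({"type": entry["type"], "url": entry["url"]})
    return valid


reply = '{"links": [{"type": "about page", "url": "https://www.ewubd.edu/"}, {"type": "no url here"}]}'
print(parse_link_response(reply))
print(parse_link_response("not json"))
```

A check like this could replace the bare `json.loads(result)` in `get_links` if malformed replies turn out to be a problem in practice.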

In [None]:
# Here we go through each link and get the page content for all of them

def get_all_details(url):
    result = "Landing page:\n"
    result += WebsiteScraper(url).get_contents()
    links = get_links(url)
    # print("Found links:", links)
    for link in links["links"]:
        print("Processing link:", link)
        result += f"\n\n{link['type']}\n"
        result += WebsiteScraper(link["url"]).get_contents()
    return result
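
Two things can go wrong in a loop like this: the model may return the landing page again (it appears as the "about page" in the output above), and a single unreachable page raises an exception that aborts the whole run. The sketch below is one possible variant, not the notebook's method. The helper name `gather_pages` and the injected `fetch` callable are assumptions made so the example runs offline.

```python
from typing import Callable, Dict, List


def gather_pages(url: str, links: List[Dict[str, str]], fetch: Callable[[str], str]) -> str:
    """Collect page text, skipping repeated URLs and pages that fail to load."""
    seen = {url.rstrip("/")}
    result = "Landing page:\n" + fetch(url)
    for link in links:
        key = link["url"].rstrip("/")
        if key in seen:
            continue  # e.g. the landing page returned again as "about page"
        seen.add(key)
        try:
            result += f"\n\n{link['type']}\n" + fetch(link["url"])
        except Exception as error:  # deliberately broad in this sketch
            print(f"Skipping {link['url']}: {error}")
    return result


# Offline stand-in for the real fetch; URLs containing "down" simulate a failure.
def fake_fetch(url: str) -> str:
    if "down" in url:
        raise ConnectionError("page unreachable")
    return f"<text of {url}>\n"


sample = [
    {"type": "about page", "url": "https://example.com/"},
    {"type": "career page", "url": "https://example.com/career"},
    {"type": "library", "url": "https://example.com/down"},
]
print(gather_pages("https://example.com/", sample, fake_fetch))
```

In this notebook, the real `fetch` would be something like `lambda u: WebsiteScraper(u).get_contents()`.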

In [None]:
get_all_details("https://www.ewubd.edu/")

Processing link: {'type': 'about page', 'url': 'https://www.ewubd.edu/'}
Processing link: {'type': 'admission page', 'url': 'http://admission.ewubd.edu'}
Processing link: {'type': 'career page', 'url': 'https://www.ewubd.edu/career'}
Processing link: {'type': 'academic calendar', 'url': 'https://www.ewubd.edu/academic-calendar'}
Processing link: {'type': 'campus life', 'url': 'https://www.ewubd.edu/campus-life'}
Processing link: {'type': 'scholarships and financial aid', 'url': 'https://www.ewubd.edu/scholarships-financial-aid'}
Processing link: {'type': 'degree programs', 'url': 'https://www.ewubd.edu/degree-programs'}
Processing link: {'type': 'alumni', 'url': 'https://alumni.ewubd.edu/'}
Processing link: {'type': 'library', 'url': 'http://lib.ewubd.edu/'}
Processing link: {'type': 'result', 'url': 'https://result.ewubd.edu/'}


'Landing page:\nWebpage Title:\nEast West University\nWebpage Contents:\nAdmission Test Results: Spring 2025\nCRTEWU 107th Research Seminar\nCall for Applications: Graduate Diploma in Leather, Leather Goods and Footwear Management (GDLFM) program of batch 3 & 4 (Deadline: 18 Dec. 2024)\nExtension of Time: Call for Proposals-Round 17\n2nd Call for Papers: East West Journal of Business and Social Studies: Vol.12, 2024\nHome\nFaculties\nFaculty of Sciences and Engineering\nFaculty of Liberal Arts and Social Sciences\nFaculty of Business and Economics\nDepartments\nCSE\nEEE\nGEB\nPharmacy\nCivil Engineering\nMathematical & Physical Sciences\nEnglish\nLaw\nSocial Relations\nInformation Studies\nSociology\nBusiness Administration\nMBA & EMBA Programs\nEconomics\nAlumni\nStudents\nDept. of Student\'s Welfare\nSexual Harassment\nGrades, Rules & Regulations\nRules & Regulation\nStudent Portal (Grade Report)\nSearch Course\nAcademic Calendar\nProctor Schedule\nPayment Procedure\nScholarships & F

## Brochure maker from collected page contents

We now have the contents of the website's relevant pages. We can feed that content to a second LLM call that creates the brochure. This is a simple agentic AI example: one LLM call selects the relevant links and another writes the brochure. Let's see what it does...
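The data flow between the two stages can be sketched without any API key. In the example below every function is a hypothetical stand-in: `make_brochure` chains the stages, and the `fake_*` functions take the place of the link-selecting model call, the scraper, and the brochure-writing model call.

```python
from typing import Callable, Dict, List

Link = Dict[str, str]


def make_brochure(
    url: str,
    select_links: Callable[[str], List[Link]],
    fetch_text: Callable[[str], str],
    write_brochure: Callable[[str], str],
) -> str:
    """Chain the two stages: select links, gather their text, write the brochure."""
    sections = ["Landing page:\n" + fetch_text(url)]
    for link in select_links(url):  # stage 1: a model picks the relevant links
        sections.append(link["type"] + "\n" + fetch_text(link["url"]))
    return write_brochure("\n\n".join(sections))  # stage 2: a model writes the brochure


# Offline stand-ins so the flow can be run without network or API access.
def fake_select_links(url: str) -> List[Link]:
    return [{"type": "about page", "url": url + "about"}]


def fake_fetch_text(url: str) -> str:
    return f"<text of {url}>"


def fake_write_brochure(context: str) -> str:
    return "# Brochure\n" + context


print(make_brochure("https://example.com/", fake_select_links, fake_fetch_text, fake_write_brochure))
```

Passing the stages in as callables is only a convenience for the sketch; the notebook itself wires the same flow together with plain functions.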

In [None]:
# Create the system prompt for the LLM
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a website \
and creates a short brochure about the website for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs, and any other information you have, with links."

In [None]:
# Build the user prompt from the relevant links and the scraped page contents
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"The website links to the company's landing page: {get_links(url)}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:10_000] # Truncate if more than 10,000 characters
    return user_prompt
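
The slice `user_prompt[:10_000]` can cut the prompt in the middle of a line or word. As a small sketch of an alternative, the hypothetical helper `truncate_at_line` below (not part of the original notebook) cuts at the last newline before the limit and falls back to a hard cut when there is none.

```python
def truncate_at_line(text: str, limit: int = 10_000) -> str:
    """Cut text to at most `limit` characters, preferring a newline boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # Fall back to a hard cut if there is no newline to break on.
    return cut[:last_newline] if last_newline > 0 else cut


print(repr(truncate_at_line("aaa\nbbb\nccc", limit=9)))
```

Character counts are only a rough proxy for the model's token limit, so the 10,000 figure may need adjusting for other models.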

In [None]:
get_brochure_user_prompt("EWU","https://www.ewubd.edu/")

Processing link: {'type': 'about page', 'url': 'https://www.ewubd.edu/'}
Processing link: {'type': 'admission page', 'url': 'http://admission.ewubd.edu'}
Processing link: {'type': 'careers page', 'url': 'https://www.ewubd.edu/career'}
Processing link: {'type': 'alumni page', 'url': 'https://alumni.ewubd.edu/'}
Processing link: {'type': 'academics page', 'url': 'https://www.ewubd.edu/degree-programs'}
Processing link: {'type': 'campus life page', 'url': 'https://www.ewubd.edu/campus-life'}
Processing link: {'type': 'scholarships page', 'url': 'https://www.ewubd.edu/scholarships-financial-aid'}


"You are looking at a company called: EWU\nThe website links to the company's landing page: {'links': [{'type': 'about page', 'url': 'https://www.ewubd.edu/'}, {'type': 'admissions page', 'url': 'http://admission.ewubd.edu'}, {'type': 'careers page', 'url': 'https://www.ewubd.edu/career'}, {'type': 'alumni page', 'url': 'https://alumni.ewubd.edu/'}, {'type': 'academic programs page', 'url': 'https://www.ewubd.edu/degree-programs'}, {'type': 'campus life page', 'url': 'https://www.ewubd.edu/campus-life'}, {'type': 'scholarships and financial aid page', 'url': 'https://www.ewubd.edu/scholarships-financial-aid'}]}\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nEast West University\nWebpage Contents:\nAdmission Test Results: Spring 2025\nCRTEWU 107th Research Seminar\nCall for Applications: Graduate Diploma in Leather, Leather Goods and Footwear Management (GDLFM

In [None]:
def create_brochure(company_name, url):
    response = client.chat.completions.create(
        model=MODEL_LLAMA,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [None]:
create_brochure("Next Venture","https://nextventures.io/")

Processing link: {'type': 'about page', 'url': 'https://nextventures.io/'}
Processing link: {'type': 'careers page', 'url': 'https://nextventures.io/career'}
Processing link: {'type': 'company culture', 'url': 'https://nextventures.io/life-at-next'}
Processing link: {'type': 'products', 'url': 'https://nextventures.io/products/fundednext'}
Processing link: {'type': 'contact page', 'url': 'https://nextventures.io/contact-us'}


# Introduction to Next Venture
Next Venture is a leading fintech group that specializes in international CFD brokerage and prop trading. Our mission is to revolutionize global trading with cutting-edge technology and seamless user experiences.

## Company Culture
At Next Venture, we pride ourselves on our diverse and inclusive culture. Our team of 450 people from different backgrounds operates from 5 countries, including UAE, Malaysia, Bangladesh, Sri Lanka, and Cyprus. We believe in creating an environment where talented individuals can grow, feel valued, and choose to stay. Our culture is built on the principles of innovation, progress, and excellence.

## Customers
We support a vast trading community through our flagship products, FundedNext and FNmarkets. Our platforms serve over 220,000+ daily active traders from 170+ countries. We are committed to delivering superior operational excellence while adhering to all local and international compliance.

## Careers and Jobs
We offer a range of career opportunities for talented individuals who are passionate about fintech and trading. Our careers page can be found at [https://nextventures.io/career](https://nextventures.io/career). We provide a supportive environment for skill and confidence building, and opportunities to grow as a leader and team player.

## Products and Services
Our flagship products include:
* FundedNext: a leading prop trading firm committed to empowering promising traders worldwide to achieve maximum trading success.
* FNmarkets: an international brokerage that allows traders to trade on international financial markets (stocks, futures, commodities, currencies).

## Global Operations
We have operations in 5 countries, including:
* UAE: Meydan Grandstand, 6th floor, Meydan Road, Nad Al Sheba, Dubai, U.A.E.
* Malaysia: Level 2, Room 25, Jalan SS 21/39, Damansara Utama, 47400 Petaling Jaya, Selangor, Malaysia
* Bangladesh: 6th & 8th Floor, Cha-90, The Pearl Trade Center, 3 Pragati Sarani, Dhaka 1212
* Sri Lanka: World Trade Center, Echelon Square, Colombo 1, Sri Lanka
* Cyprus: 26 Pittalou str. Agia Fyla, Limassol, 3118, Cyprus

## Join Us
If you are interested in joining our team or learning more about our products and services, please visit our website at [https://nextventures.io/](https://nextventures.io/). You can also contact us at [https://nextventures.io/contact-us](https://nextventures.io/contact-us).

## Our Vision
Our vision is to build a worldwide community of traders and to revolutionize global trading with cutting-edge technology and seamless user experiences. We believe in creating a complete ecosystem for traders worldwide and are committed to delivering superior operational excellence.

[Visit our website](https://nextventures.io/) to learn more about Next Venture and our mission to revolutionize global trading.