# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a brochure for a company, to be used by prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [6]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.schema import SystemMessage, HumanMessage



In [7]:
# Initialize and constants

load_dotenv(override=True)
api_key = os.getenv('GOOGLE_API_KEY')

if api_key:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")

MODEL = 'gemini-1.5-pro'
chatModel = ChatGoogleGenerativeAI(model=MODEL, google_api_key=api_key, temperature=0.4)


API key looks good so far


In [8]:
for chunk in chatModel.stream("Give me a short story at B1 proficiency."):
    print(chunk.content, end='', flush=True)

The aroma of woodsmoke clung to Amelia's thick wool coat as she trudged through the freshly fallen snow.  The forest path, usually a vibrant green tunnel, was now a hushed corridor of white.  Amelia shivered, not entirely from the cold.  She’d been avoiding this walk for weeks, ever since the argument.  The argument that had left a silence thicker than the snow, a silence that stretched between her and her grandfather, Elias.

Elias, a renowned woodcarver, lived a solitary life in a small cabin nestled deep within the woods.  He’d taught Amelia everything she knew about the forest, about respecting its rhythms and its secrets.  But their shared passion had fractured.  He’d disapproved of her decision to leave for the city, to pursue her art in a bustling, impersonal world.  His words, sharp and unexpected, had stung.  "You're abandoning the very thing that nourishes you," he'd said.

Now, with the holidays approaching, the guilt gnawed at her.  She clutched the small, intricately carve

In [None]:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links,
    using Selenium to fetch fully rendered content.
    """

    def __init__(self, url, driver_path=r"C:\webdriver\chromedriver-win64\chromedriver.exe", headless=True):
        self.url = url

        options = Options()
        if headless:
            options.add_argument("--headless")
            options.add_argument("--no-sandbox")
            options.add_argument("--disable-dev-shm-usage")

        service = Service(driver_path)
        self.driver = webdriver.Chrome(service=service, options=options)

        try:
            # Open the page
            self.driver.get(url)

            # Wait for the page to load (for example, until the body tag is present)
            WebDriverWait(self.driver, 10).until(
                EC.presence_of_element_located((By.TAG_NAME, "body"))
            )

            # Grab the rendered page source
            html = self.driver.page_source
        finally:
            # Always close the browser, even if loading failed or timed out
            self.driver.quit()

        # Parse it
        soup = BeautifulSoup(html, 'html.parser')

        self.title = soup.title.string if soup.title else "No title found"

        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""

        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"


In [10]:
ed = Website("https://dekatechs.com")
print(ed.get_contents())

NoSuchDriverException: Message: Unable to obtain driver for chrome; For documentation on this error, please visit: https://www.selenium.dev/documentation/webdriver/troubleshooting/errors/driver_location


In [34]:
ed = Website("https://dekatechs.com")
ed.links

['https://dekatechs.com',
 'https://dekatechs.com/solutions/',
 'https://dekatechs.com/about-us/',
 'https://dekatechs.com/case-studies/',
 'https://dekatechs.com/blog/',
 '#',
 'https://dekatechs.com/contact/',
 'https://dekatechs.com/faq/',
 'https://dekatechs.com/solutions/',
 'https://dekatechs.com/about-us/',
 'https://dekatechs.com/case-studies/',
 'https://dekatechs.com/blog/',
 '#',
 'https://dekatechs.com/contact/',
 'https://dekatechs.com/faq/',
 'https://dekatechs.com/client-support/',
 'https://dekatechs.com/contact/',
 'https://dekatechs.com',
 'https://dekatechs.com/solutions/',
 'https://dekatechs.com/about-us/',
 'https://dekatechs.com/case-studies/',
 'https://dekatechs.com/blog/',
 '#',
 'https://dekatechs.com/contact/',
 'https://dekatechs.com/faq/',
 'https://dekatechs.com/solutions/',
 'https://dekatechs.com/about-us/',
 'https://dekatechs.com/case-studies/',
 'https://dekatechs.com/blog/',
 '#',
 'https://dekatechs.com/contact/',
 'https://dekatechs.com/faq/',
 'h

## First step: Have the LLM figure out which links are relevant

### Use a call to Gemini (`gemini-1.5-pro`) to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
We will use "one-shot prompting", in which we provide an example of how it should respond in the prompt.

This is an excellent use case for an LLM, because it requires nuanced understanding. Imagine trying to code this without LLMs by parsing and analyzing the webpage - it would be very hard!

Sidenote: there is a more advanced technique called "Structured Outputs" in which we require the model to respond according to a spec. We cover this technique in Week 8 during our autonomous Agentic AI project.
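
Deciding which links are relevant needs the LLM, but the mechanical part (resolving relative links and dropping repeats) can also be done deterministically before the call, which shortens the prompt. The following is a minimal sketch using only the standard library; `normalize_links` is a hypothetical helper that is not part of the original notebook, and the rule for which prefixes to skip is an assumption.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(base_url, links):
    """Resolve relative links against base_url and drop duplicates, keeping order."""
    seen = set()
    result = []
    for link in links:
        # Skip empty values, in-page anchors, mail links and script pseudo-links
        if not link or link.startswith(("#", "mailto:", "javascript:")):
            continue
        # Make the link absolute, then remove any "#fragment" suffix
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


sample = ["/", "/models", "/pricing#endpoints", "/pricing", "#", "/models"]
print(normalize_links("https://huggingface.co", sample))
```

With a helper like this, `"\n".join(website.links)` in the user prompt could become `"\n".join(normalize_links(website.url, website.links))`, so the model sees each page once.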

In [35]:
link_system_prompt = """
You are a helpful assistant that filters relevant links from a webpage.
Your goal is to identify and return links that would be useful in a company brochure,
such as links to: social media links, About, Careers, Company Info, Blog, or Contact pages.

You MUST respond with a valid JSON object in the format:

{
  "links": [
    {
      "type": "about page",
      "url": "https://example.com/about"
    },
    {
      "type": "careers page",
      "url": "https://example.com/careers"
    }
  ]
}

Only include all links that are relevant to a brochure.
Do NOT include links to Terms of Service, Privacy Policy, or Email links.
Only return the JSON. Do not add explanations or extra text.
"""


In [36]:
print(link_system_prompt)


You are a helpful assistant that filters relevant links from a webpage.
Your goal is to identify and return links that would be useful in a company brochure,
such as links to: social media links, About, Careers, Company Info, Blog, or Contact pages.

You MUST respond with a valid JSON object in the format:

{
  "links": [
    {
      "type": "about page",
      "url": "https://example.com/about"
    },
    {
      "type": "careers page",
      "url": "https://example.com/careers"
    }
  ]
}

Only include all links that are relevant to a brochure.
Do NOT include links to Terms of Service, Privacy Policy, or Email links.
Only return the JSON. Do not add explanations or extra text.



In [37]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [38]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://dekatechs.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://dekatechs.com
https://dekatechs.com/solutions/
https://dekatechs.com/about-us/
https://dekatechs.com/case-studies/
https://dekatechs.com/blog/
#
https://dekatechs.com/contact/
https://dekatechs.com/faq/
https://dekatechs.com/solutions/
https://dekatechs.com/about-us/
https://dekatechs.com/case-studies/
https://dekatechs.com/blog/
#
https://dekatechs.com/contact/
https://dekatechs.com/faq/
https://dekatechs.com/client-support/
https://dekatechs.com/contact/
https://dekatechs.com
https://dekatechs.com/solutions/
https://dekatechs.com/about-us/
https://dekatechs.com/case-studies/
https://dekatechs.com/blog/
#
https://dekatechs.com/contact/
https://dekatechs.com/faq/
https://dekatechs.com/sol

In [39]:
import re


def get_links(url):
    website = Website(url)
    
    messages = [
        SystemMessage(content=link_system_prompt),
        HumanMessage(content=get_links_user_prompt(website))
    ]
    
    response = chatModel.invoke(messages)
    # Gemini may wrap its JSON in a Markdown code fence; strip the fence before parsing
    if response.content.strip().startswith("```json"):
        response.content = re.sub(r"```json\s*", "", response.content)
        response.content = re.sub(r"```$", "", response.content.strip())

    # Try to parse Gemini's reply as JSON
    try:
        return json.loads(response.content)
    except Exception:
        # If the reply is not valid JSON, raise an error
        raise ValueError(f"LLM response was not valid JSON: {response.content}")
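
The fence-stripping step in `get_links` can be factored into a small helper that is testable without calling the model. This is a sketch under assumptions: `parse_json_response` and `FENCE` are hypothetical names, and it also accepts a bare fence with no `json` tag, which the cell above does not.

```python
import json
import re

# Markdown code-fence marker (three backticks), built programmatically
FENCE = "`" * 3


def parse_json_response(text):
    """Parse JSON from an LLM reply that may be wrapped in a Markdown code fence."""
    cleaned = text.strip()
    # Remove an opening fence (with or without the "json" tag) and a closing fence
    cleaned = re.sub(r"^" + FENCE + r"(?:json)?\s*", "", cleaned)
    cleaned = re.sub(r"\s*" + FENCE + r"$", "", cleaned)
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError as err:
        raise ValueError(f"LLM response was not valid JSON: {text!r}") from err


fenced_reply = FENCE + 'json\n{"links": [{"type": "about page", "url": "https://example.com/about"}]}\n' + FENCE
print(parse_json_response(fenced_reply))
```

Catching `json.JSONDecodeError` rather than every exception keeps unrelated bugs from being reported as a bad model reply.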

In [40]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'brand page', 'url': 'https://huggingface.co/brand'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'blog', 'url': 'https://huggingface.co/blog'},
  {'type': 'community forum', 'url': 'https://discuss.huggingface.co'},
  {'type': 'github', 'url': 'https://github.com/huggingface'},
  {'type': 'twitter', 'url': 'https://twitter.com/huggingface'},
  {'type': 'linkedin', 'url': 'https://www.linkedin.com/company/huggingface/'},
  {'type': 'discord', 'url': 'https://huggingface.co/join/discord'}]}
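
Because the reply is only prompted to follow the JSON format, not guaranteed to, it may be worth checking its shape before using it. Below is a minimal sketch of such a check; `validate_links` is a hypothetical helper, and requiring an `https://` prefix is an assumption based on the prompt asking for full https URLs.

```python
def validate_links(payload):
    """Keep only well-formed entries from a {'links': [...]} payload."""
    if not isinstance(payload, dict) or not isinstance(payload.get("links"), list):
        raise ValueError("expected a JSON object with a 'links' list")
    valid = []
    for item in payload["links"]:
        # Each entry needs a string "type" and a string "url" starting with https://
        if (
            isinstance(item, dict)
            and isinstance(item.get("type"), str)
            and isinstance(item.get("url"), str)
            and item["url"].startswith("https://")
        ):
            valid.append({"type": item["type"], "url": item["url"]})
    return {"links": valid}


reply = {
    "links": [
        {"type": "blog", "url": "https://huggingface.co/blog"},
        {"type": "broken", "url": "/relative-only"},
        "not a dict",
    ]
}
print(validate_links(reply))
```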

In [41]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace.

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 '/spaces',
 '/models',
 '/nvidia/parakeet-tdt-0.6b-v2',
 '/Wan-AI/Wan2.1-VACE-14B',
 '/multimodalart/isometric-skeumorphic-3d-bnb',
 '/nari-labs/Dia-1.6B',
 '/lodestones/Chroma',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/Lightricks/ltx-video-distilled',
 '/spaces/smolagents/computer-agent',
 '/spaces/ByteDance/DreamO',
 '/spaces/NihalGazi/FLUX-Pro-Unlimited',
 '/spaces',
 '/datasets/openbmb/Ultra-FineWeb',
 '/datasets/PrimeIntellect/INTELLECT-2-RL-Dataset',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/nvidia/OpenMathReasoning',
 '/datasets/disco-eth/EuroSpeech',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/google',
 '/Intel',
 '/microsoft',
 '/grammarly',
 '/Writer',
 '/docs/transf

## Second step: make the brochure!

Assemble all the details into another prompt to Gemini.

In [42]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result

In [43]:
print(get_all_details("https://huggingface.co")[3956:])

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'brand page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'github', 'url': 'https://github.com/huggingface'}, {'type': 'twitter', 'url': 'https://twitter.com/huggingface'}, {'type': 'linkedin', 'url': 'https://www.linkedin.com/company/huggingface/'}, {'type': 'discord', 'url': 'https://huggingface.co/join/discord'}]}


SSLError: HTTPSConnectionPool(host='discord.gg', port=443): Max retries exceeded with url: /JfAtkvEtRb (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:992)')))
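
The error above shows that a single unreachable link aborts the whole run. One way to make the loop more tolerant is to skip pages that fail and record them, as in the sketch below. `collect_pages` and `fake_fetch` are hypothetical names used for illustration; in the notebook, the `fetch` argument would be something like `lambda url: Website(url).get_contents()`.

```python
def collect_pages(links, fetch):
    """Fetch each link with `fetch`, skipping pages that raise instead of aborting."""
    sections = []
    failures = []
    for link in links:
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as err:  # broad on purpose: SSL, timeout and driver errors
            failures.append((link["url"], type(err).__name__))
    return "".join(sections), failures


def fake_fetch(url):
    """Stand-in for a real page fetch so the example runs offline."""
    if "discord" in url:
        raise ConnectionError("certificate verify failed")
    return f"contents of {url}\n"


demo_links = [
    {"type": "blog", "url": "https://example.com/blog"},
    {"type": "discord", "url": "https://example.com/discord"},
]
text, failures = collect_pages(demo_links, fake_fetch)
print(text)
print("Skipped:", failures)
```

Returning the failures alongside the text keeps the skipped pages visible instead of silently dropping them.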

In [44]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


In [45]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
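
Slicing at a fixed character count can cut the prompt in the middle of a word or line. A possible refinement is to cut at the last line break before the limit. The sketch below illustrates the idea; `truncate_at_line` is a hypothetical helper, and the 5,000-character default simply mirrors the limit used above.

```python
def truncate_at_line(text, limit=5_000):
    """Truncate text to at most `limit` characters, preferring a line boundary."""
    if len(text) <= limit:
        return text
    # Look for the last newline that falls inside the allowed length
    cut = text.rfind("\n", 0, limit)
    # Fall back to a hard cut when there is no usable newline
    return text[:cut] if cut > 0 else text[:limit]


print(repr(truncate_at_line("aaa\nbbb\nccc", limit=9)))
```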

In [46]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co/'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https://huggingface.co/spaces'}, {'type': 'docs page', 'url': 'https://huggingface.co/docs'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'join page', 'url': 'https://huggingface.co/join'}, {'type': 'endpoints page', 'url': 'https://endpoints.huggingface.co'}, {'type': 'brand page', 'url': 'https://huggingface.co/brand'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'learn page', 'url': 'https://huggingface.co/learn'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community forum', 'url': 'https://discuss.huggingface.co'}, {'type': 'github', 'url': 'https://github.com/huggingf

SSLError: HTTPSConnectionPool(host='discord.gg', port=443): Max retries exceeded with url: /JfAtkvEtRb (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate (_ssl.c:992)')))

In [47]:
def create_brochure(company_name, url):
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=get_brochure_user_prompt(company_name, url))
    ]
    response = chatModel.invoke(messages)
    result = response.content
    display(Markdown(result))


In [48]:
create_brochure("Deka Technology", "https://dekatechs.com")

Found links: {'links': [{'type': 'homepage', 'url': 'https://dekatechs.com'}, {'type': 'about page', 'url': 'https://dekatechs.com/about-us/'}, {'type': 'careers page', 'url': 'https://dekatechs.com/careers/'}, {'type': 'blog', 'url': 'https://dekatechs.com/blog/'}, {'type': 'contact', 'url': 'https://dekatechs.com/contact/'}, {'type': 'social', 'url': 'https://www.linkedin.com/company/dekatechs/mycompany/verification/'}, {'type': 'social', 'url': 'https://twitter.com/dekatechnology'}, {'type': 'social', 'url': 'https://www.facebook.com/dekatechs/'}]}


## Deka Technology: Empowering Your Business Through Innovative IT Solutions

**Deka Technology** is your trusted partner for comprehensive IT services and support, helping you manage your technology so you can focus on growing your business.  With a proven track record of 7 years, 99% customer satisfaction, and over 300 completed projects, we deliver cost-effective, innovative solutions tailored to your specific industry needs.

**Our Services:**

Deka Technology offers a wide range of IT solutions, including:

* **Managed Services:**  Free up your internal resources by letting us handle your day-to-day IT support, management, and monitoring.
* **IT Consulting & Advisory:**  Expert guidance to optimize your IT infrastructure and strategy.
* **Infrastructure Management:** Comprehensive management of your IT infrastructure, from planning and design to implementation and maintenance.
* **Web Development:**  Building and maintaining robust and user-friendly web applications.
* **Mobile Development:** Creating cutting-edge mobile solutions for your business needs.
* **Cloud Management:**  Leveraging the power of the cloud for improved performance, scalability, and cost-effectiveness.
* **Software Development:**  Developing custom software solutions to streamline your operations and drive innovation.
* **Database Consulting:** Expert advice and support for optimizing your database performance and security.
* **Data Analytics and Business Intelligence:**  Unlocking the potential of your data to make informed decisions and gain a competitive edge.
* **Digital Transformation:** Guiding your business through digital transformation initiatives to enhance efficiency and agility.
* **Robotic Process Automation (RPA):** Automating repetitive tasks to improve productivity and reduce costs.

**Industry Expertise:**

We specialize in serving a variety of industries, including:

* Consumer Goods
* Transportation & Logistics
* Healthcare
* Banking & Insurance
* Consulting Providers
* Automotive

**Our Approach:**

* **Cost-Effectiveness:** We offer affordable solutions to maximize your ROI.
* **Innovative Technology:** We stay ahead of the curve with the latest technology trends.
* **Industry Expertise:** Tailored solutions to meet your specific industry requirements.
* **Scalability:** Solutions that grow with your business.

**Technology Partners:**

We collaborate with leading technology vendors to bring you the best solutions:

* **Datacenter & Hosting:**  ANSI/TIA-942 rated 4 certified data center.
* **Cloud Platforms:** AWS, Microsoft Azure, Rackspace, OVH, DigitalOcean, Bluehost.
* **Collaboration Tools:** Microsoft Exchange Online, SharePoint Online, Defender for 365, Autopilot, Office Apps, Windows 365.

**Careers:**

[Information about careers at Deka Technology was not provided on the website.  This section would be populated with details about job opportunities, company culture, and employee benefits if available.]


**Contact Us:**

Schedule a free consultation today to discuss your IT needs!  [Contact information would be added here, if available on the website.]