## Brochure Making using Gemini-2.0-flash

Creates a brochure for a company, given its name and primary website.

In [73]:
# Loading all the required libraries

In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import SystemMessage, HumanMessage


In [2]:
# Loading the environment variables

In [3]:
load_dotenv()

True

In [4]:
# Defining the model

In [5]:
model = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

In [6]:
# This class scrapes the given URL and collects all the links found on the page

In [7]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
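
The scraped `links` list mixes relative paths (such as `/about/`) with absolute URLs and contains duplicates. Below is a minimal sketch of how the list could be normalised before it is sent to Gemini, using the standard-library `urllib.parse.urljoin`. The helper name `normalize_links` and the sample input are assumptions for illustration and are not part of the notebook.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, links):
    """Hypothetical helper: resolve relative links and drop duplicates, keeping order."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        # Keep only web links; this skips mailto:, tel:, javascript: and similar schemes.
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Illustrative input only; the mailto address is a placeholder.
sample = ["/about/", "/about/", "https://careers.roche.com", "mailto:info@example.com"]
print(normalize_links("https://www.roche.com", sample))
```

Resolving the links locally would mean the model only has to judge relevance rather than also rebuild full URLs, though the notebook itself leaves that step to Gemini.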

In [13]:
ed = Website("https://www.roche.com")
ed.links

['/about/',
 '/about/',
 '/about/strategy/',
 '/about/business/',
 '/about/sustainability/',
 '/about/leadership/',
 '/about/governance/',
 '/about/history',
 '/solutions/',
 '/solutions/',
 '/solutions/focus-areas/',
 '/solutions/pharma/',
 '/solutions/diagnostics/',
 '/solutions/pipeline/',
 '/innovation/',
 '/innovation/',
 '/innovation/structure/',
 '/innovation/process/',
 '/innovation/ethical-standards/',
 '/innovation/partnering/',
 '/investors/',
 '/investors/',
 '/investors/updates/',
 '/investors/events/',
 '/investors/reports/',
 '/investors/rofis',
 '/investors/bonds',
 '/investors/downloads',
 '/media/',
 '/media/',
 '/media/releases/',
 '/media/events/',
 '/media/statements',
 '/media/library-images',
 '/stories/',
 'https://careers.roche.com',
 '/worldwide',
 '/search',
 'https://www.roche.com/investors/annualreport24',
 'https://www.roche.com/investors/updates/inv-update-2025-01-30',
 '/stories/ai-revolutionising-drug-discovery-and-transforming-patient-care',
 '/stories

## Step 1: Using Gemini to sort relevant links

I pass the links fetched by the `Website` class to Gemini and ask it to decide which of them are relevant.
I use one-shot prompting: the system prompt includes a single example that shows Gemini the expected format of the result.
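
One way to keep the example in a one-shot prompt well-formed is to build it from a Python dictionary and serialise it with `json.dumps`, so the model never sees invalid JSON. The sketch below is an illustration under that assumption; the variable names `example` and `one_shot_prompt` are hypothetical and the URLs are placeholders.

```python
import json

# Hypothetical example payload; the URLs are placeholders, not real pages.
example = {
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"},
    ]
}

# json.dumps guarantees that the example shown to the model is valid JSON.
one_shot_prompt = (
    "You should respond in JSON as in this example:\n"
    + json.dumps(example, indent=4)
)

print(one_shot_prompt)
```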

In [80]:
# Defining the system prompt

In [8]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [10]:
# This is the Human prompt, where all the links found earlier are sent to Gemini
# for it to decide if the links are relevant. 
# Notice a lot of links are irrelevant and should be removed.

In [11]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [14]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://www.roche.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
/about/
/about/
/about/strategy/
/about/business/
/about/sustainability/
/about/leadership/
/about/governance/
/about/history
/solutions/
/solutions/
/solutions/focus-areas/
/solutions/pharma/
/solutions/diagnostics/
/solutions/pipeline/
/innovation/
/innovation/
/innovation/structure/
/innovation/process/
/innovation/ethical-standards/
/innovation/partnering/
/investors/
/investors/
/investors/updates/
/investors/events/
/investors/reports/
/investors/rofis
/investors/bonds
/investors/downloads
/media/
/media/
/media/releases/
/media/events/
/media/statements
/media/library-images
/stories/
https://careers.roche.com
/worldwide
/search
https://www.roche.com/investors/annualreport24
https://www.ro

In [15]:
# Gemini may wrap its JSON reply in a Markdown code fence, so the output needs a little cleaning

In [17]:
import re

def strip_json_markdown(text):
    match = re.search(r"```json\s*(.*?)\s*```", text, re.DOTALL)
    return match.group(1) if match else text
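
The model does not always label its fence with `json`, and it sometimes returns no fence at all. The following is a sketch of a slightly more tolerant variant, assuming those three reply shapes are the ones worth handling; `extract_json_block` is a hypothetical name and the sample replies are made up for illustration.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed


def extract_json_block(text):
    """Hypothetical variant: accept a json-labelled fence, a bare fence, or no fence."""
    pattern = FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else text.strip()


# Made-up replies covering the three shapes.
fenced = FENCE + 'json\n{"links": []}\n' + FENCE
bare = FENCE + '\n{"links": []}\n' + FENCE
plain = '  {"links": []}  '

for reply in (fenced, bare, plain):
    print(json.loads(extract_json_block(reply)))
```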


The `get_links` function takes a website URL as input and scrapes all of its links using the `Website` class. It then builds the system and human prompts for link filtering as LangChain message tuples and sends them to the model (Gemini), which returns the relevant links as JSON.

In [20]:
def get_links(url):
    website = Website(url)
    messages = [
        ("system", link_system_prompt),
        ("human", get_links_user_prompt(website))
    ]
    response = model.invoke(messages)
    return json.loads(strip_json_markdown(response.content))
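
Because the link list comes from a language model, its shape is not guaranteed. Below is a minimal sketch of a check that could run on the parsed result before any page is fetched. The helper `validate_links` and the fallback type `"page"` are assumptions for illustration, not something the notebook defines.

```python
def validate_links(parsed):
    """Hypothetical check: keep only entries that have an absolute http(s) URL."""
    if not isinstance(parsed, dict):
        return {"links": []}
    links = parsed.get("links")
    if not isinstance(links, list):
        return {"links": []}
    valid = []
    for entry in links:
        if not isinstance(entry, dict):
            continue
        url = entry.get("url")
        if isinstance(url, str) and url.startswith(("http://", "https://")):
            # "page" is an assumed default label for entries without a type.
            valid.append({"type": str(entry.get("type", "page")), "url": url})
    return {"links": valid}


# Made-up model reply containing two malformed entries.
reply = {
    "links": [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "relative", "url": "/careers"},
        "not a dict",
    ]
}
print(validate_links(reply))
```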
    

In [21]:
get_links("https://huggingface.co")

{'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'},
  {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'},
  {'type': 'enterprise', 'url': 'https://huggingface.co/enterprise'},
  {'type': 'pricing', 'url': 'https://huggingface.co/pricing'},
  {'type': 'brand', 'url': 'https://huggingface.co/brand'}]}

In [22]:
get_links("https://roche.com")

{'links': [{'type': 'about page', 'url': 'https://www.roche.com/about/'},
  {'type': 'about strategy', 'url': 'https://www.roche.com/about/strategy/'},
  {'type': 'about business', 'url': 'https://www.roche.com/about/business/'},
  {'type': 'about sustainability',
   'url': 'https://www.roche.com/about/sustainability/'},
  {'type': 'about leadership',
   'url': 'https://www.roche.com/about/leadership/'},
  {'type': 'about governance',
   'url': 'https://www.roche.com/about/governance/'},
  {'type': 'about history', 'url': 'https://www.roche.com/about/history'},
  {'type': 'solutions', 'url': 'https://www.roche.com/solutions/'},
  {'type': 'solutions focus areas',
   'url': 'https://www.roche.com/solutions/focus-areas/'},
  {'type': 'solutions pharma',
   'url': 'https://www.roche.com/solutions/pharma/'},
  {'type': 'solutions diagnostics',
   'url': 'https://www.roche.com/solutions/diagnostics/'},
  {'type': 'solutions pipeline',
   'url': 'https://www.roche.com/solutions/pipeline/'},


## Making the brochure

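Next, I gather the full text of the company's site: the landing page plus every page that Gemini marked as relevant.
The `get_all_details` function takes the company URL, calls `get_links` (a request to Gemini) to select the relevant links, and scrapes each of them.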

```mermaid
flowchart TD
    A[Given a company URL] --> B[Enter into `get_all_details` function]
    B --> C[Scrapes all the links found on the company landing page]
    C --> D[Prompts Gemini to find the relevant links]
    D --> E[Scrape content from the links filtered by Gemini]
    E --> F[Create a prompt using the scraped information]
    F --> G[Prompt Gemini to create the company brochure]
```


In [24]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    #print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
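
A single unreachable page would stop the whole loop above. The sketch below shows one way to skip pages that fail, under the assumption that the fetcher raises an exception on failure. `collect_pages` and `fake_fetch` are hypothetical names; `fake_fetch` stands in for `Website(url).get_contents()` so that the example runs offline.

```python
def collect_pages(links, fetch):
    """Hypothetical helper: call fetch(url) for each link and skip pages that fail."""
    sections = []
    for link in links:
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as error:  # for example a timeout or connection error
            print(f"Skipping {link['url']}: {error}")
    return "".join(sections)


def fake_fetch(url):
    # Stand-in for Website(url).get_contents(); simulates one failing page.
    if "broken" in url:
        raise ConnectionError("simulated network failure")
    return f"Contents of {url}\n"


# Placeholder links for illustration.
links = [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/broken"},
]
print(collect_pages(links, fake_fetch))
```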

In [97]:
print(get_all_details("https://huggingface.co"))

Landing page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Welcome Cohere on the Hub 🔥
Welcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥
Welcome Fireworks.ai on the Hub 🎆
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
microsoft/bitnet-b1.58-2B-4T
Updated
2 days ago
•
17.4k
•
649
HiDream-ai/HiDream-I1-Full
Updated
about 10 hours ago
•
26.8k
•
680
nari-labs/Dia-1.6B
Updated
about 2 hours ago
•
5.67k
•
248
microsoft/MAI-DS-R1
Updated
6 days ago
•
427
•
197
Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0
Updated
about 4 hours ago
•
11.9k
•
184
Browse 1M+ models
Spaces
Running
5k
5k
DeepSite
🐳
Generate any application with DeepSeek
Running
on
Zero
505
505
UNO FLUX
⚡
Generate customized images using

In [25]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."


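**⚠️ Important:** In the `get_brochure_user_prompt` function, the truncation line is commented out, so the full scraped text is sent to Gemini. Large sites can produce very long prompts, so be aware of the cost and run time. Uncomment the line to cap the prompt (at 5,000 characters as written).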


In [26]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    #user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
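
If the prompt is capped, it can help to tell the model that text was cut. The following is a minimal sketch of such a cap. The helper `truncate_prompt`, its default limit, and the marker text are assumptions for illustration; note that the marker is appended after the cut, so the result is slightly longer than `max_chars`.

```python
def truncate_prompt(prompt, max_chars=5_000):
    """Hypothetical helper: cap the prompt length and flag when text was cut."""
    if len(prompt) <= max_chars:
        return prompt
    # The marker is added after the cut so the model knows the input is partial.
    return prompt[:max_chars] + "\n[content truncated]"


short = truncate_prompt("Landing page text", max_chars=100)
long = truncate_prompt("x" * 120, max_chars=100)
print(len(short), len(long))
```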

In [106]:
# Understanding the prompt

In [107]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

"You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome Cohere on the Hub 🔥\nWelcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥\nWelcome Fireworks.ai on the Hub 🎆\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmicrosoft/bitnet-b1.58-2B-4T\nUpdated\n2 days ago\n•\n17.4k\n•\n649\nHiDream-ai/HiDream-I1-Full\nUpdated\nabout 10 hours ago\n•\n26.8k\n•\n680\nnari-labs/Dia-1.6B\nUpdated\nabout 2 hours ago\n•\n5.67k\n•\n248\nmicrosoft/MAI-DS-R1\nUpdated\n6 days ago\n•\n427\n•\n1

In [108]:
# Function to create the brochure

In [27]:
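def create_brochure(company_name, url):
    # Build the system and human messages, then ask Gemini for the brochure
    message = [
        ("system", system_prompt),
        ("human", get_brochure_user_prompt(company_name, url))
    ]
    response = model.invoke(message)
    display(Markdown(response.content))
    # Return the Markdown text as well so that callers such as the Gradio interface can render it
    return response.content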


In [111]:
# Creating Hugging Face Brochure

# Hugging Face: The AI Community Building the Future

## Democratizing Good Machine Learning

Hugging Face is the leading collaboration platform for the machine learning community, fostering an open and ethical AI future. We empower the next generation of ML engineers, scientists, and end-users to learn, collaborate, and share their work.

## What We Offer

*   **Models (1M+):** Explore, discover, and share state-of-the-art pre-trained models for various tasks.
*   **Datasets (250k+):** Access and collaborate on datasets for any ML task.
*   **Spaces (400k+ applications):** Build and host ML applications and demos with ease.
*   **Inference Endpoints:** Deploy ML models on fully managed infrastructure.
*   **HuggingChat:** Interact with an open-source conversational AI assistant.
*   **Open Source Libraries:**
    *   **Transformers:** State-of-the-art ML for PyTorch, TensorFlow, and JAX.
    *   **Diffusers:** State-of-the-art Diffusion models in PyTorch.
    *   **Safetensors:** Safe way to store/distribute neural network weights.
    *   **Datasets:** Access & share datasets for any ML tasks.
    *   **Accelerate:** Train PyTorch models with multi-GPU, TPU, mixed precision

## Key Features

*   **Collaboration Platform:** Host and collaborate on unlimited public models, datasets, and applications.
*   **Open Source Focus:** Built on the foundation of open-source ML tooling.
*   **Versatile Modalities:** Supports text, image, video, audio, and 3D data.
*   **Portfolio Building:** Share your work and build your ML profile.

## Accelerate Your ML

We provide paid Compute and Enterprise solutions to scale your AI initiatives.

*   **Compute:** Deploy on optimized Inference Endpoints or upgrade your Spaces applications to a GPU in a few clicks.
    *   **Spaces Hardware:** Flexible compute options starting at $0/hour.
    *   **Inference Endpoints:** Starting at $0.032/hour.
*   **Enterprise Hub:** Enterprise-ready AI platform with enhanced security, access controls, and dedicated support. Starting at $20/user/month.
    *   **Single Sign-On:** Securely connect to your identity provider.
    *   **Regions:** Manage the location of your repository data.
    *   **Audit Logs:** Comprehensive logs to track actions taken.
    *   **Resource Groups:** Granular access control for repositories.
    *   **Priority Support:** Maximize platform usage with priority support.

## Who Uses Hugging Face?

More than 50,000 organizations, including:

*   AI2
*   AI at Meta
*   Amazon
*   Google
*   Intel
*   Microsoft
*   Grammarly
*   Writer
*   NVIDIA
*   Shopify
*   Snowflake
*   Meta Llama
*   Stability AI

## Pricing

*   **HF Hub:** Free for unlimited public models and datasets.
*   **PRO:** $9/month for advanced features like ZeroGPU, Dev Mode for Spaces, and more.
*   **Enterprise Hub:** Starting at $20/user/month for enterprise-grade features.

## Join the Community

We are on a mission to democratize good machine learning, one commit at a time.

*   **Website:** [https://huggingface.co](https://huggingface.co)
*   **GitHub:** [https://github.com/huggingface](https://github.com/huggingface)
*   **Twitter:** [https://twitter.com/huggingface](https://twitter.com/huggingface)
*   **LinkedIn:** [https://linkedin.com/company/huggingface](https://linkedin.com/company/huggingface)
*   **Discord:** [https://discord.com/invite/huggingface](https://discord.com/invite/huggingface)

## Contact

For press enquiries, you can contact our team here.

In [113]:
# Creating Roche brochure

In [114]:
create_brochure("Roche", "https://www.roche.com")

# Roche: Doing Now What Patients Need Next

## About Roche

For over 125 years, Roche has been a pioneering force in healthcare, evolving into one of the world's largest biotech companies and a leading provider of in-vitro diagnostics. Our commitment extends to developing innovative solutions across major disease areas, always putting patients first.

## Our Mission

We are driven by a singular purpose: to create a healthier future. We focus on preventing, stopping, or curing diseases with the highest societal burden, improving health outcomes, and reducing costs for patients and healthcare systems worldwide.

## What We Do

Roche operates with combined strength in pharmaceuticals and diagnostics, addressing patient needs across the entire healthcare journey:

*   **Pharma Solutions:** Developing innovative treatments across major disease areas, including oncology, virology, and immunology.
*   **Diagnostic Solutions:** Transforming healthcare through diagnostic tests, instruments, and digital solutions for disease prevention, diagnosis, and monitoring.
*   **Digital Health Solutions:** Using evidence-based approach, our insights-driven digital health solutions aim to support patient care.

## Innovation at Roche

Innovation is at the heart of everything we do. We invest heavily in Research and Development (CHF 13 billion in 2024) and foster a culture of autonomy and collaboration across our global network of innovation centers in 17 countries. Our unique structure, combining Pharma and Diagnostics, allows us to accelerate research and development and enhance our understanding of disease mechanisms.

*   **Synergy between Diagnostics and Pharma:** Having Diagnostics and Pharma in one company is a powerful advantage that sets Roche apart.
*   **The Roche Group innovation engines:** Our innovation engines and further information
*   **Global geographical scale and reach:** The Roche Group has a truly global spread of research, diagnostics and pharma development, data analytics and genomic insight teams at locations across 17 countries, spanning four continents, all working to translate science and research into ground-breaking therapies and diagnostics.
*   **The value of partnerships:** We believe in the power of our strong partnering network driving pioneering scientific and technological breakthroughs across healthcare through diversity of cultures and scientific thinking.

We are committed to the highest ethical standards in research and development.

## Sustainability

Sustainability is deeply ingrained in our history and culture. We are dedicated to:

*   **Access to Innovation and Health Impact for All:** Maximizing access to our medicines and diagnostics worldwide.
*   **People:** Fostering a positive and inclusive work environment where employees can thrive.
*   **Environment:** Making a positive contribution to the health of nature, water systems, and climate.
*   **Climate Change:** We are committed to achieving net zero carbon emissions by 2045 and have set near- and long-term targets validated by SBTi.

## Careers at Roche

Join us in redefining healthcare! We offer a dynamic environment where you can drive innovation and make vital breakthroughs.

*   **Areas of Expertise** We work on diseases with the biggest social burden. That's why we foster a environment for everyone to drive innovations, share the skills and resources needed to make the vital breakthroughs that benefit us all.
*   **Benefits:** Our comprehensive benefits package includes competitive compensation, performance-based rewards, peer-to-peer recognition, flexible working arrangements, robust insurance options, parental benefits, and extensive well-being and mental health resources.
*   **Inclusion:** We empower everyone to show up as their best selves, embracing the unique qualities each person brings to our community because we know that innovation thrives on fresh perspectives and ideas.

## Connect With Us

*   [LinkedIn](https://www.linkedin.com/)
*   [Facebook](https://www.facebook.com/)
*   [Twitter](https://twitter.com/)
*   [Instagram](https://www.instagram.com/)
*   [YouTube](https://www.youtube.com/)

## Contact

For any questions or inquiries, please visit our [Contact Page](https://www.roche.com/contact).

## Locations

Roche is present in over 95 countries and regions. Find your contact person. See our [Locations](https://www.roche.com/contact).

© F. Hoffmann-La Roche Ltd

In [118]:
create_brochure("HuggingFace", "https://huggingface.co")

```markdown
# Hugging Face: The AI Community Building the Future

## Democratizing Good Machine Learning

Hugging Face is the leading collaboration platform for the machine learning community. We're on a mission to democratize good machine learning, one commit at a time.

### What We Offer

*   **Models:** Explore over 1 million pre-trained models for various tasks.
*   **Datasets:** Access and share over 250k datasets for any ML task.
*   **Spaces:** Build and host ML applications and demos.
*   **Enterprise Hub:** Enterprise-ready version of the world’s leading AI platform
*   **Inference Endpoints:** Deploy models on fully managed infrastructure.
*   **Compute:** Deploy on optimized Inference Endpoints or update your Spaces applications to a GPU in a few clicks.

### Open Source at Our Core

We build the foundation of ML tooling with the community, including:

*   **Transformers:** State-of-the-art ML for PyTorch, TensorFlow, JAX.
*   **Diffusers:** State-of-the-art Diffusion models in PyTorch.
*   **Safetensors:** Safe way to store/distribute neural network weights.
*   **Datasets:** Access & share datasets for any ML tasks.
*   **PEFT:** Parameter-efficient finetuning for large language models.

### Why Choose Hugging Face?

*   **Collaboration:** Host and collaborate on unlimited public models, datasets, and applications.
*   **Speed:** Move faster with the HF Open Source stack.
*   **Versatility:** Explore all modalities - text, image, video, audio, and even 3D.
*   **Portfolio Building:** Share your work with the world and build your ML profile.

### Enterprise Solutions

Give your team the most advanced platform to build AI with enterprise-grade security, access controls, and dedicated support.

**Features Include:**

*   Single Sign-On
*   Regions for data location management
*   Audit Logs
*   Resource Groups
*   Token Management
*   Analytics
*   Private Datasets Viewer
*   Priority Support

**Pricing:** Starting at $20/user/month

### Join the Community

More than 50,000 organizations are using Hugging Face, including:

*   AI2
*   AI at Meta
*   Amazon
*   Google
*   Intel
*   Microsoft
*   Grammarly
*   Writer
*   NVIDIA
*   Snowflake
*   Stability AI

### Careers

We are always looking for talented individuals to join our team. If you're passionate about democratizing good machine learning, [join us](https://huggingface.co/jobs)!

### Get Started

*   **Sign Up:** [https://huggingface.co/](https://huggingface.co/)
*   **Models:** [https://huggingface.co/models](https://huggingface.co/models)
*   **Datasets:** [https://huggingface.co/datasets](https://huggingface.co/datasets)
*   **Spaces:** [https://huggingface.co/spaces](https://huggingface.co/spaces)
*   **Enterprise:** [https://huggingface.co/enterprise](https://huggingface.co/enterprise)

### Contact

For press inquiries, contact our team [here](mailto:press@huggingface.co).

**Follow us:**

*   **GitHub:** [https://github.com/huggingface](https://github.com/huggingface)
*   **Twitter:** [https://twitter.com/huggingface](https://twitter.com/huggingface)
*   **LinkedIn:** [https://www.linkedin.com/company/hugging-face/](https://www.linkedin.com/company/hugging-face/)
*   **Discord:** [https://discord.com/](https://discord.com/)

**Hugging Face: Building the future of AI, together.**
```

In [None]:
create_brochure("Tesla", "https://www.tesla.com")

In [123]:
# Gradio App
import gradio as gr

In [124]:
view = gr.Interface(
    fn=create_brochure,
    inputs=[
        gr.Textbox(label="Company name"),
        gr.Textbox(label="Company's Landing Page URL")
    ],
    outputs=[gr.Markdown(label="Brochure:")],
    flagging_mode="never"
)
view.launch()

* Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.


