# 🌐 Automatic Brochure Creation

**Overview**  
This project is an end-to-end pipeline that scrapes a company website, cleans the page text, and passes it to a locally hosted large language model (LLM) to generate a structured Markdown brochure.

---

## 🔍 Models Used

### 1. **LLaMA 3.2**  
- A powerful and efficient large language model capable of understanding complex web content and producing high-quality natural language summaries.

### 2. **DeepSeek R1:1.5B**  
- A lightweight yet effective open-source LLM, optimized for faster inference and resource-constrained environments, while still providing meaningful summarization.

---

## ✨ Features
- Extracts visible text from web pages.
- Cleans and preprocesses content for LLM input.
- Generates a brochure from the scraped content using an LLM.
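
The cleaning step could look roughly like the sketch below. It uses only the standard library, and the helper name `prepare_for_llm`, the whitespace collapsing, and the duplicate-line removal are assumptions made for illustration; the notebook itself only truncates the final prompt to 5,000 characters.

```python
def prepare_for_llm(text: str, max_chars: int = 5_000) -> str:
    """Hypothetical helper: tidy scraped text before sending it to an LLM."""
    cleaned = []
    for line in text.splitlines():
        line = " ".join(line.split())  # collapse runs of whitespace
        if not line:
            continue  # drop blank lines
        if cleaned and cleaned[-1] == line:
            continue  # drop consecutive duplicates (e.g. repeated nav labels)
        cleaned.append(line)

    # Keep whole lines until the character budget is used up.
    kept, used = [], 0
    for line in cleaned:
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return "\n".join(kept)


sample = "Home\nHome\n\n  About   us  \nWe build AI tools.\n"
print(prepare_for_llm(sample))
```

Cutting on line boundaries rather than at a raw character offset avoids handing the model a sentence that stops mid-word, at the cost of using slightly less of the budget.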


## ✨ Limitations
- Works best on static HTML pages that do not rely heavily on JavaScript.
- The models run locally and are relatively small, so the link-selection step sometimes returns no usable links; in that case `ai_brochure` still builds the brochure from the home page alone.
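
One way to soften the second limitation is to accept more than one reply shape from the model. The sketch below is an illustration under assumptions, not part of the notebook: `normalize_link_reply` is a hypothetical helper that accepts both the requested `{"links": [...]}` schema and a flat object such as `{"The Company": "<url>"}`, which is the shape one recorded DeepSeek run in this notebook returned.

```python
import json
from urllib.parse import urlparse


def normalize_link_reply(raw: str) -> list[dict]:
    """Hypothetical helper: coerce a model's JSON reply into [{"type", "url"}, ...]."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable reply: fall back to the landing page only
    if not isinstance(data, dict):
        return []

    items = data.get("links")
    if isinstance(items, list):
        candidates = [(i.get("type"), i.get("url")) for i in items if isinstance(i, dict)]
    else:
        # Flat shape such as {"The Company": "http://..."}: treat keys as types.
        candidates = list(data.items())

    links = []
    for link_type, url in candidates:
        if not (isinstance(link_type, str) and isinstance(url, str)):
            continue
        if urlparse(url).scheme in ("http", "https"):
            links.append({"type": link_type, "url": url})
    return links


print(normalize_link_reply('{"The Company": "http://indiaai.gov.in/raise"}'))
```

Returning an empty list on any malformed reply keeps the existing fallback behavior: the brochure is still built from the landing page.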

In [1]:
import os
import json
from typing import List
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display, clear_output
import ipywidgets as widgets
import ollama
from openai import OpenAI

# Ollama serves an OpenAI-compatible API on localhost; the api_key is a placeholder the client requires.
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

In [2]:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"}


class Website:
    """
    Fetches a web page and extracts its title, visible text, and links,
    which are then used to build a brochure from the scraped website.
    """
    def __init__(self,url,model_name):
        self.url=url
        self.model = model_name  # 'llama3.2' or 'deepseek-r1:1.5b'
        response=requests.get(url,headers=headers,timeout=10)  # timeout avoids hanging indefinitely on unresponsive hosts
        self.body=response.content
        soup=BeautifulSoup(self.body, 'html.parser')
        self.title=soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text=""
        raw_links = [link.get('href') for link in soup.find_all('a')]
        # Resolve relative hrefs against the page URL; ignore mailto:, javascript:, and similar links.
        self.links = [
            urljoin(self.url, link)
            for link in raw_links
            if link and link.startswith(("http://", "https://", "/"))
        ]


    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
    
    def get_links(self):
        link_system_prompt = """You are provided with a list of links found on a webpage.
        Your job is to select which of these links are most relevant to include in a brochure about the company. 
        Relevant links include: 
        - About page
        - Company overview
        - Careers or jobs page
        - Investor relations
        - Mission or vision
        - Leadership/management
        - Contact page
        
        Avoid including:
        - Terms of Service, Privacy Policy
        - Social media links (Facebook, Twitter, etc.)
        - Email or JavaScript links
        - Blog or Press pages unless they describe the company
        
        You should respond in JSON in this exact format:
        
        Example 1:
        {
            "links": [
                {"type": "about page", "url": "https://example.com/about"},
                {"type": "careers page", "url": "https://example.com/careers"},
                {"type": "contact page", "url": "https://example.com/contact"}
            ]
        }
        
        Example 2:
        {
            "links": [
                {"type": "company overview", "url": "https://abc-corp.com/who-we-are"},
                {"type": "leadership", "url": "https://abc-corp.com/management"},
                {"type": "investor relations", "url": "https://abc-corp.com/investor"}
            ]
        }
        
        Only include full URLs. Avoid relative links unless they are converted into absolute form. Always return the result in JSON.
        """

        # Name the site by its URL; interpolating self.links here would dump the whole list into the sentence.
        user_prompt = f"Here is the list of links on the website of {self.url} - "
        user_prompt += (
            "please decide which of these are relevant web links for a brochure about the company, "
            "respond with the full https URL in JSON format. "
            "Do not include Terms of Service, Privacy, email links.\n"
        )
        user_prompt += "Links (already resolved to absolute URLs):\n"
        user_prompt += "\n".join(self.links)
        response = ollama_via_openai.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": link_system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            response_format={"type": "json_object"},
        )
        result = response.choices[0].message.content
        return json.loads(result)

class make_brochure:
    def __init__(self,url,model_name,company_name):
        self.url=url
        self.model = model_name  # 'llama3.2' or 'deepseek-r1:1.5b'
        self.company_name=company_name
        self.result = "Landing page:\n"
        scraped_site = Website(self.url, self.model)
        self.result += scraped_site.get_contents()
        links = scraped_site.get_links()
        # Safely get the list of link dicts from the "links" key
        link_items = links.get("links", [])
        print("Found links:", links)
        for link in link_items:
            if isinstance(link, dict) and isinstance(link.get("url"), str) and isinstance(link.get("type"), str):
                self.result += f"\n\n{link['type']}\n"
                self.result += Website(link["url"],self.model).get_contents()
        


    def ai_brochure(self):
        system_prompt = (
            "You are an assistant that analyzes the contents of several relevant pages from a company website "
            "and creates a short brochure about the company for prospective customers, investors and recruits. "
            "Respond in markdown. Include details of company culture, customers and careers/jobs if you have "
            "the information, and display the full link wherever applicable."
        )

        user_prompt = f"You are looking at a company called: {self.company_name}\n"
        user_prompt += (
            "Here are the contents of its landing page and other relevant pages; "
            "use this information to build a short brochure of the company in markdown.\n"
        )
        user_prompt += self.result
        user_prompt = user_prompt[:5_000]  # Truncate if more than 5,000 characters

        stream = ollama_via_openai.chat.completions.create(
            model=self.model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt},
            ],
            stream=True,
        )
        response = ""
        display_handle = display(Markdown(""), display_id=True)
        for chunk in stream:
            response += chunk.choices[0].delta.content or ''
            # Strip only code-fence markers; a bare replace("markdown", "") would also
            # delete the word "markdown" from the brochure text itself.
            response = response.replace("```markdown", "").replace("```", "")
            update_display(Markdown(response), display_id=display_handle.display_id)

In [44]:
"""
To get the output, create an instance of the make_brochure class and then call its ai_brochure method.
"""
LLM_1=make_brochure("https://indiaai.gov.in/",'llama3.2',"india ai")

Found links: {'links': [{'type': 'company overview', 'url': 'https://indiaai.gov.in/'}, {'type': 'careers page', 'url': 'https://indiaai.gov.in/raise'}, {'type': 'about us', 'url': 'https://indiaai.gov.in/about-us'}, {'type': 'mission or vision', 'url': 'https://indiaai.gov.in/about-us'}, {'type': 'leadership/management', 'url': 'https://indiaai.gov.in/about-us'}, {'type': 'events', 'url': 'https://indiaai.gov.in/events'}]}


In [46]:
LLM_1.ai_brochure()

**IndiaAI Brochure**
=====================

**Company Overview**
-------------------

IndiaAI is a leading platform for Artificial Intelligence (AI) research, innovation, and development. The organization aims to foster collaboration among researchers, industries, and governments to advance AI adoption in India.

### Pillars of IndiaAI

*   **Research and Development**: Publishing leading-edge research reports, articles, and videos on various AI topics.
*   **Resources and Ecosystem**: Providing a platform for knowledge sharing, research collaborations, and standardization of data elements for AI research and innovation.
*   **Sectors and Initiatives**: Focusing on specific sectors such as healthcare, education, and finance to drive AI adoption.

**Mission**
----------

To position IndiaAI as the pre-eminent AI hub in the Asia-Pacific region, promoting a culture of collaboration, innovation, and standardization.

### Our Values

*   **Innovation**: Embracing cutting-edge technologies and ideas.
*   **Collaboration**: Fostering partnerships among researchers, industries, and governments.
*   **Standardization**: Promoting consistency and quality in AI research and development.

**Our Customers**
-----------------

*   **Researchers and Academia**: Publishing leading research on various AI topics.
*   **Industries and Businesses**: Providing resources and support for AI adoption.
*   **Governments and Policy Makers**: Offering expertise on AI policy developments and standardization.

**Careers**
---------

Join our team of talented professionals working together to advance AI research and development in India. Check out available jobs on our [careers page](https://www.indiaai.org/careers).

**Get Involved**
-----------------

Stay updated with the latest news, articles, and events from IndiaAI. Subscribe to our newsletter and follow us on social media:

*   Newsletter: <https://www.indiaai.org/newsletter>
*   Social Media: Follow IndiaAI on Twitter, LinkedIn, and Facebook.

[Explore IndiaAI Portal](https://www.indiaai.org)

Join the AI revolution in India with IndiaAI.

In [51]:
LLM_2=make_brochure("https://indiaai.gov.in/",'deepseek-r1:1.5b',"india ai")

Found links: {'The Company': 'http://indiaai.gov.in/raise'}


In [52]:
LLM_2.ai_brochure()

<think>
Alright, I'm trying to figure out how to create a short brochure for the company called "IndiaAI." Let me start by breaking down what information is provided.

First, there's a landing page that lists several pillars like the IndiaAI Portal, Resources, Ecosystem, and more. These mention things like articles, videos, news, case studies, AI standards, learning resources, datasets, metadata, organizations related to research in AI, events, countries, top companies, initiatives by GOI, investment funds, people, government initiatives, and contact details.

Since the user provided a text from their landing page, my first step is to identify the key points that should be included in the brochure. The most critical elements might be about the company's mission, services, what makes them unique, industries they cater to, team, maybe their history or legacy.

I'll structure it starting with an introduction that highlights the company's focus on AI and its relevance. Then touch on their services— portal, resources, ecosystem—which adds value. Emphasize how India plays a key role in AI development. Mention their diverse industries like finance, healthcare, education. Highlight the strong team based on the profiles mentioned.

Since they don't provide detailed information, I need to make educated guesses but ensure accuracy. Maybe add something about innovation and cutting-edge technologies. Also, highlight growth and success through partnerships and competitions.

I should avoid jargon since it's intended for a general audience and keep it engaging. Perhaps use bullet points with bold headings if possible within the  structure.

Wait, the user provided specific contact details like email, phone, website URL, and a social media handle or link. I'll include those as part of the footer unless that section isn't included in the brochure.

Finally, wrap up with some keywords or SEO-friendly text to attract potential customers, investors, and recruits.
</think>

**IndiaAI: The Leader in Artificial Intelligence**

**Introduction**

At IndiaAI, we are at the forefront of artificial intelligence innovations, driving impactful solutions across various sectors. Our mission is to harness the power of technology to transform challenges into opportunities for individuals, businesses, and societies worldwide.

**What Are We Doing?**

Our offerings span a range of AI-supported services:

- **The IndiaAI Portal:** Access essential resources from an intuitive interface.
- **Resources & Ecosystem:** Discover innovative tools, articles, videos, and case studies to elevate your work.
- **Curated News & Insights:** Stay updated with the latest trends in AI.

**India's Role**

IndoAI is deeply engaged by India's growing impact. Our team prides itself on innovation, contributing to both academic and corporate advancements that benefit the nation as a whole.

**Unique Opportunities**

Our services cater to diverse industries:
- **Fintech & Finance:** Empowering financial decisions through technology.
- **Healthcare & Research:** Streamlining experiments in life sciences.
- **Education:** Enhancing learning experiences with AI tools.

**Our Team**

We have a group of skilled and enthusiastic professionals who are the cornerstone of our success, dedicated to pushing technological boundaries.

**About Us**

Founders are committed to leveraging technology for the greater good. Our team prides itself on our growth, contribution, and partnerships, fostering an environment of empowerment and innovation.

**Growth & Success**

Through strategic partnerships and international competitions, we ensure steady progress while keeping an eye on India's evolving contributions to AI.

**Contact Us**

**Email:** [Your Email Address]  
**Phone Number:** [Your Phone Number]  
**Website:** [Your Website URL]  

For more information or inquiries, reach out to us at [Your Contact Information via a link in the document]. Thank you for choosing IndiaAI as your partner for tomorrow's innovation and today's technology.
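
The DeepSeek R1 output above begins with a `<think>` reasoning block that ends up in the rendered brochure. A minimal sketch for hiding it is shown below; `strip_think` is a hypothetical helper, not part of the notebook, and it assumes the model wraps its reasoning in literal `<think>` and `</think>` tags as in the recorded run.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_think(text: str) -> str:
    """Hypothetical helper: remove <think>...</think> reasoning from a model reply."""
    without_closed = THINK_BLOCK.sub("", text)
    # While streaming, the closing tag may not have arrived yet; hide the open block too.
    return re.sub(r"<think>.*\Z", "", without_closed, flags=re.DOTALL).lstrip()


print(strip_think("<think>\nplan the brochure\n</think>\n\n**IndiaAI**"))
```

Applied to the accumulated `response` inside the streaming loop, this would keep the display empty until the reasoning block closes and the brochure text starts to arrive.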