**Importing Libraries**

In [1]:
import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI
load_dotenv(override=True)

True

**Declaration of API KEY & Model**

In [2]:
openai = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.getenv("Groq_API"))
MODEL = "llama-3.3-70b-versatile"

**Fetch Links From the Given Website and Select Important Links for the Brochure**

In [3]:
links = fetch_website_links("https://www.volkschem.com/")
links

['company-profile.html',
 'about-us.html',
 'management-team.html',
 'achievement.html',
 'our-products.html',
 'https://www.volkschem.com/our-brand-products.html',
 'https://www.volkschem.com/b2b-product.html',
 'https://www.volkschem.com/institutional.html',
 'https://www.volkschem.com/export-products.html',
 'infastructure.html',
 'manufacturing-unit.html',
 'quality-department.html',
 'images/brochure.pdf',
 'quality-assurance.html',
 'certification.html',
 'ehs-policy.html',
 'product-evolution.html',
 'career.html',
 'contact-us.html',
 'company-overview.html',
 'company-overview.html',
 'vision-and-values.html',
 'ehs-policy.html',
 'management-team.html',
 'certification.html',
 'r-and-d.html',
 'manufacturing.html',
 'our-products.html',
 'https://www.volkschem.com/our-brand-products.html',
 'https://www.volkschem.com/b2b-product.html',
 'https://www.volkschem.com/institutional.html',
 'https://www.volkschem.com/export-products.html',
 'sitemap.html',
 'contact-us.html',
 '#ca
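The raw list above mixes relative paths, absolute URLs, repeated entries, and at least one in-page anchor. A minimal sketch of cleaning it up before it reaches the model, using the standard library's `urljoin`, is shown below. The helper name `normalize_links` and the sample values `#top` and the `mailto:` entry are assumptions for illustration; they are not part of the `scraper` module.

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve links against base_url, drop non-page links, and remove duplicates."""
    seen = set()
    result = []
    for link in links:
        # Skip in-page anchors; they do not point to a separate page
        if link.startswith("#"):
            continue
        absolute = urljoin(base_url, link)
        # Keep only web pages (this drops mailto:, tel:, javascript:, and so on)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

# Illustrative input: a few entries shaped like the output above, plus hypothetical ones
sample = [
    "about-us.html",
    "https://www.volkschem.com/b2b-product.html",
    "about-us.html",
    "#top",
    "mailto:info@volkschem.com",
]
cleaned = normalize_links("https://www.volkschem.com/", sample)
print(cleaned)
```

Doing this in code rather than in the prompt means the model only has to classify links, not rewrite them, which should reduce the chance of a wrong domain in the result.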

In [4]:
# Prompt for Removing Irrelevant Links

System_promt_1 = """
You are a link-filtering engine for company brochure generation. You receive a list of URLs or paths from a company website and must return only brochure-relevant landing pages as fully expanded absolute URLs belonging strictly to the base domain
provided by the user. Always convert every relative path into a full absolute URL using the user's domain and never invent, replace, modify, or substitute the domain and never output placeholder domains like example.com. Keep only these page
types if found: home, landing, about, company profile, who we are, team, products, product categories, services, solutions, catalog, contact, support, mission, vision, certifications, awards, achievements, clients, partners, testimonials,
industries served, leadership, management. Always ignore and remove operational, backend, or technical pages including manufacturing, html, process, facility, production, lab, testing, qc, sops, and also remove mailto, tel, whatsapp,
javascript, anchors, files, downloads, non-http protocols, external domains, social links, login, admin, and anything not belonging to the user's base domain. Output only a JSON list named 'links' where each item contains a 'type' and
the correct absolute 'url'. Do not add explanations or commentary unless asked. Only return clean filtered links in the required JSON format.

You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}

Syntax:
{
  "links": [
    {
      "type": "string",
      "url": "string"
    }
  ]
}

"""

def get_links_user_prompt(url):
    user_prompt = f"""
I extracted links from a company website.
Filter the links and return only brochure-relevant pages such as About, Products, Services, Catalog, Contact, Mission, Vision, Company Profile, Certifications, Clients, etc.
Remove all irrelevant, technical, operational, policy, media, or login/signup pages.
Also remove anything like manufacturing.html, process, facility, or deep backend pages.

Most Important:
Every link you return must be a full absolute URL, NOT a relative path.

Links (some might be relative links):

"""
    links = fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt

In [5]:
# Filter Out Irrelevant Links

def select_relevant_links(url):
    print(f"Selecting relevant links for {url} by calling {MODEL}")
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": System_promt_1},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    links = json.loads(result)
    print(f"Found {len(links['links'])} relevant links")
    return links
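`select_relevant_links` trusts the model's reply as-is: `json.loads` will raise on malformed JSON, and nothing checks that the returned URLs actually belong to the site. The following is a minimal sketch of a post-processing step, assuming the reply follows the `{"links": [{"type": ..., "url": ...}]}` shape requested in the system prompt. The helper name `parse_and_validate_links` and the sample reply are hypothetical.

```python
import json
from urllib.parse import urlparse

def parse_and_validate_links(raw_json, base_url):
    """Parse the model's JSON reply and keep only well-formed links on the base host."""
    try:
        data = json.loads(raw_json)
    except json.JSONDecodeError:
        return {"links": []}
    items = data.get("links") if isinstance(data, dict) else None
    if not isinstance(items, list):
        return {"links": []}

    base_host = urlparse(base_url).netloc.lower()
    valid = []
    for item in items:
        if not isinstance(item, dict) or not isinstance(item.get("url"), str):
            continue
        parsed = urlparse(item["url"])
        # Accept only http(s) URLs whose host matches the site being summarized
        if parsed.scheme in ("http", "https") and parsed.netloc.lower() == base_host:
            valid.append({"type": item.get("type", "unknown"), "url": item["url"]})
    return {"links": valid}

# Hypothetical reply containing one good link, one foreign domain, one relative path
reply = json.dumps({"links": [
    {"type": "About Page", "url": "https://www.volkschem.com/about-us.html"},
    {"type": "Placeholder", "url": "https://example.com/about"},
    {"type": "Relative", "url": "contact-us.html"},
]})
checked = parse_and_validate_links(reply, "https://www.volkschem.com/")
print(checked)
```

Inside `select_relevant_links`, this could stand in for the bare `json.loads(result)` call, so that a bad reply yields an empty list instead of an exception.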

In [6]:
select_relevant_links("https://www.volkschem.com/")

Selecting relevant links for https://www.volkschem.com/ by calling llama-3.3-70b-versatile
Found 15 relevant links


{'links': [{'type': 'About Page',
   'url': 'https://www.volkschem.com/about-us.html'},
  {'type': 'Company Profile',
   'url': 'https://www.volkschem.com/company-profile.html'},
  {'type': 'Management Team',
   'url': 'https://www.volkschem.com/management-team.html'},
  {'type': 'Achievements',
   'url': 'https://www.volkschem.com/achievement.html'},
  {'type': 'Products', 'url': 'https://www.volkschem.com/our-products.html'},
  {'type': 'Products',
   'url': 'https://www.volkschem.com/our-brand-products.html'},
  {'type': 'Products', 'url': 'https://www.volkschem.com/b2b-product.html'},
  {'type': 'Products', 'url': 'https://www.volkschem.com/institutional.html'},
  {'type': 'Products',
   'url': 'https://www.volkschem.com/export-products.html'},
  {'type': 'Bio Pesticide',
   'url': 'http://www.volkschem.com/bio-pesticide.html'},
  {'type': 'Bio Fertilizer',
   'url': 'http://www.volkschem.com/bio-fertilizer.html'},
  {'type': 'Certification',
   'url': 'https://www.volkschem.com/ce

**Creating Brochure**

In [7]:
# Fetching Page Contents for Landing Page and Relevant Links

def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)
    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"
    for link in relevant_links['links']:
        result += f"\n\n### Link: {link['type']}\n"
        result += fetch_website_contents(link["url"])
    return result
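As written, a single page that fails to load would stop the whole loop in `fetch_page_and_all_relevant_links`. Below is a hedged sketch of a more tolerant loop. It assumes that `fetch_website_contents` raises an exception on failure, which the notebook does not confirm; `collect_pages`, `fake_fetch`, and `broken.html` are hypothetical stand-ins so the example runs without network access.

```python
def collect_pages(links, fetch):
    """Fetch each link's content, recording failures instead of aborting the run."""
    sections, failed = [], []
    for link in links:
        try:
            sections.append((link["type"], fetch(link["url"])))
        except Exception as exc:  # broad on purpose: one bad page should not stop the rest
            failed.append((link["url"], str(exc)))
    return sections, failed

def fake_fetch(url):
    """Stand-in for fetch_website_contents; fails for URLs containing 'broken'."""
    if "broken" in url:
        raise ConnectionError("simulated failure")
    return f"content of {url}"

links = [
    {"type": "About Page", "url": "https://www.volkschem.com/about-us.html"},
    {"type": "Hypothetical", "url": "https://www.volkschem.com/broken.html"},
]
sections, failed = collect_pages(links, fake_fetch)
print(sections)
print(failed)
```

In the notebook, `fetch_website_contents` would be passed as the `fetch` argument, and the `failed` list could be printed as a warning.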

In [8]:
# Prompt for Brochure Generation

brochure_system_prompt = """
You create attractive, professional brochures in clean markdown (no code blocks).
Rewrite website content clearly, concisely, and in a well-structured manner.
You may use emojis in the brochure if they enhance clarity or visual appeal, but keep them tasteful.
Include typical brochure sections such as overview, mission, services, benefits, customers, culture, and contact.
"""

In [9]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"""
Create a polished and attractive brochure for {company_name} using the website content below.
Rewrite everything clearly in structured markdown (no code blocks).
You may include emojis where appropriate to make the brochure visually appealing.

Suggested sections:
- Company Overview
- Mission & Values
- Products / Services
- Benefits / Differentiators
- Customers / Industries
- Careers / Culture
- Contact Info

Website Content:
"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
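Note that slicing the whole prompt to 5,000 characters keeps the instructions and the landing page but may cut off most of the linked pages. One possible alternative, sketched below under the assumption that the content is available as `(title, text)` pairs, is to give every page an equal share of the budget. The helper name `build_capped_content` is hypothetical.

```python
def build_capped_content(sections, total_limit=5_000):
    """Join (title, text) sections, giving each an equal share of total_limit characters."""
    if not sections:
        return ""
    per_section = total_limit // len(sections)
    parts = []
    for title, text in sections:
        header = f"### {title}\n"
        # Reserve room for the header and the two-character separator
        budget = max(per_section - len(header) - 2, 0)
        parts.append(header + text[:budget])
    # Final slice is a safety net in case the headers alone exceed the limit
    return "\n\n".join(parts)[:total_limit]

# Illustrative input: two oversized pages sharing a 200-character budget
pages = [("Landing Page", "a" * 10_000), ("About", "b" * 10_000)]
capped = build_capped_content(pages, total_limit=200)
print(len(capped))
```

An equal split is the simplest policy; weighting the landing page more heavily would be an equally reasonable choice.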

In [10]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))
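The first cell imports `update_display`, which is not used by `create_brochure`. A streaming variant would be one way to use it. The sketch below covers only the accumulation step and assumes the chunk shape produced by the OpenAI client when `stream=True` is passed (`chunk.choices[0].delta.content`, which may be `None`). The names `accumulate_stream` and `fake_chunk` are hypothetical, and the fake chunks stand in for a real response so the block runs offline.

```python
from types import SimpleNamespace

def accumulate_stream(chunks):
    """Yield the cumulative text after each streamed chunk, skipping empty deltas."""
    text = ""
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            text += delta
            yield text

def fake_chunk(content):
    """Build an object shaped like a streamed chat-completion chunk (for illustration)."""
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=content))]
    )

fake_stream = [fake_chunk("# Volk"), fake_chunk(None), fake_chunk("schem")]
snapshots = list(accumulate_stream(fake_stream))
print(snapshots[-1])
```

In the notebook, the stream would come from `openai.chat.completions.create(..., stream=True)`. A handle created with `display(Markdown(""), display_id=True)` could then be refreshed for each snapshot via `update_display(Markdown(snapshot), display_id=handle.display_id)`.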

In [11]:
create_brochure("Volkschem", "https://www.volkschem.com/")

Selecting relevant links for https://www.volkschem.com/ by calling llama-3.3-70b-versatile
Found 16 relevant links


# Volkschem Crop Science (P) Limited
## Company Overview
🌱 Volkschem Crop Science (P) Limited is a leading manufacturer of bio pesticides, bio larvicides, and bio fertilizers, dedicated to providing effective agricultural products to resolve farming-related issues. Established in 2011, we have become one of the prominent players in the industry, committed to customer satisfaction and delivering high-quality products.

## Mission & Values
💡 Our mission is to ensure responsible and limitless growth by providing innovative agricultural solutions that benefit farmers and end-users. We value quality, customer satisfaction, and environmental sustainability, guiding our actions and decisions.

## Products / Services
🌿 We offer a wide range of products, including:
* Plant Growth Promoters
* Larvicides
* Pesticides
* Bio Fertilizers
* Bio Products

Our advanced infrastructure and cutting-edge technology enable us to manufacture high-quality products that meet industry standards.

## Benefits / Differentiators
🌟 As a leading bio pesticide manufacturer, we offer:
* High-quality products that ensure better crop health and productivity
* Innovative approaches to agricultural solutions
* Advanced manufacturing facilities and technology
* Timely delivery of bulk orders without delays
* Competitive pricing and value for money

## Customers / Industries
🌾 Our products cater to various industries, including:
* Agriculture
* Horticulture
* Forestry
* Gardening
We serve clients across India and internationally, providing effective solutions to farming-related issues.

## Careers / Culture
🌱 At Volkschem, we foster a culture of innovation, teamwork, and continuous learning. We are committed to attracting and retaining top talent, providing a supportive work environment, and encouraging professional growth and development.

## Contact Info
📞 Get in touch with us:
* Phone: +91 9574009098
* Email: [info@volkschem.com](mailto:info@volkschem.com)
* Corporate Office: C/806-807, Signature 2, Sanand Cross Road, Sarkhej, Ahmedabad-382210, Gujarat (India)
* Factory & Registered Office: Plot No.1 Survey No. 264 part, Bhayla Dhanwada Road, At. Bhayla, Ta. Bavla, Dist. Ahmedabad-382220

Join us in our mission to revolutionize the agricultural industry with innovative and effective solutions. 🌟