**Importing Libraries**

In [1]:
import os
import json
from dotenv import load_dotenv
from IPython.display import Markdown, display, update_display
from scraper import fetch_website_links, fetch_website_contents
from openai import OpenAI
load_dotenv(override=True)

True

**Declaration of API KEY & Model**

In [2]:
openai = OpenAI(base_url="https://api.groq.com/openai/v1", api_key=os.getenv("Groq_API"))
MODEL = "openai/gpt-oss-120B"  # Alternative: "llama-3.3-70b-versatile"
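
If `Groq_API` is missing from the environment, `os.getenv` returns `None`, and the error that eventually surfaces may not name the variable. A minimal sketch of an early check, assuming a hypothetical helper called `require_env` that is not part of the original notebook:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; check your .env file"
        )
    return value

# Usage (assumption: the key is stored under "Groq_API", as in the cell above):
# api_key = require_env("Groq_API")
```

Failing at this point keeps the problem close to its cause instead of inside a later API call.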

**Fetch Links From the Given Website and Select Important Links for the Brochure**

In [3]:
links = fetch_website_links("https://www.volkschem.com/")
links

['company-profile.html',
 'about-us.html',
 'management-team.html',
 'achievement.html',
 'our-products.html',
 'https://www.volkschem.com/our-brand-products.html',
 'https://www.volkschem.com/b2b-product.html',
 'https://www.volkschem.com/institutional.html',
 'https://www.volkschem.com/export-products.html',
 'infastructure.html',
 'manufacturing-unit.html',
 'quality-department.html',
 'images/brochure.pdf',
 'quality-assurance.html',
 'certification.html',
 'ehs-policy.html',
 'product-evolution.html',
 'career.html',
 'contact-us.html',
 'company-overview.html',
 'company-overview.html',
 'vision-and-values.html',
 'ehs-policy.html',
 'management-team.html',
 'certification.html',
 'r-and-d.html',
 'manufacturing.html',
 'our-products.html',
 'https://www.volkschem.com/our-brand-products.html',
 'https://www.volkschem.com/b2b-product.html',
 'https://www.volkschem.com/institutional.html',
 'https://www.volkschem.com/export-products.html',
 'sitemap.html',
 'contact-us.html',
 '#ca
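
The list above mixes relative paths, absolute URLs, duplicates, and in-page anchors. The notebook leaves the clean-up to the model, but much of it could be done deterministically first. A minimal sketch, assuming a hypothetical helper `normalize_links` (not part of the `scraper` module) and a made-up `#top` anchor in the sample input:

```python
from urllib.parse import urljoin, urlparse

def normalize_links(base_url, links):
    """Resolve relative paths against base_url, drop non-page links, and de-duplicate."""
    seen = set()
    result = []
    for link in links:
        # Skip in-page anchors and non-HTTP schemes such as mailto: or tel:
        if link.startswith(("#", "mailto:", "tel:", "javascript:")):
            continue
        absolute = urljoin(base_url, link)
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        # Keep the first occurrence only, preserving the original order
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

sample = [
    "about-us.html",
    "https://www.volkschem.com/b2b-product.html",
    "about-us.html",
    "#top",  # hypothetical anchor, for illustration
    "mailto:info@volkschem.com",
]
print(normalize_links("https://www.volkschem.com/", sample))
# ['https://www.volkschem.com/about-us.html', 'https://www.volkschem.com/b2b-product.html']
```

Pre-normalizing this way would shorten the prompt and remove one source of model error, at the cost of a little extra code.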

In [4]:
# Prompt for Removing Irrelevant Links

System_promt_1 = """
You are a link-filtering engine for company brochure generation. You receive a list of URLs or paths from a company website and must return only brochure-relevant landing pages as fully expanded absolute URLs belonging strictly to the base domain provided by the user.
Always convert every relative path into a full absolute URL using the user's domain and never invent, replace, modify, or substitute the domain and never output placeholder domains like example.com.
Keep only these page types if found: home, landing, about, company profile, who we are, team, products, product categories, services, solutions, catalog, contact, support, mission, vision, certifications, awards, achievements, clients, partners, testimonials, industries served, leadership, management.
Always ignore and remove operational, backend, or technical pages including manufacturing, html, process, facility, production, lab, testing, qc, sops, and also remove mailto, tel, whatsapp, javascript, anchors, files, downloads, non-http protocols, external domains, social links, login, admin, and anything not belonging to the user's base domain.
Output only a JSON list named 'links' where each item contains a 'type' and the correct absolute 'url'. Do not add explanations or commentary unless asked. Only return clean filtered links in the required JSON format.

You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}

Syntax:
{
  "links": [
    {
      "type": "string",
      "url": "string"
    }
  ]
}

"""

def get_links_user_prompt(url):
    user_prompt = f"""
I extracted links from a company website.
Filter the links and return only brochure-relevant pages such as About, Products, Services, Catalog, Contact, Mission, Vision, Company Profile, Certifications, Clients, etc.
Remove all irrelevant, technical, operational, policy, media, or login/signup pages.
Also remove anything like manufacturing.html, process, facility, or deep backend pages.

Most Important:
Every link you return must be a full absolute URL, NOT a relative path.

Links (some might be relative links):

"""
    links = fetch_website_links(url)
    user_prompt += "\n".join(links)
    return user_prompt

In [5]:
# Filter Out Irrelevant Links

def select_relevant_links(url):
    print(f"Selecting relevant links for {url} by calling {MODEL}")
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": System_promt_1},
            {"role": "user", "content": get_links_user_prompt(url)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    links = json.loads(result)
    print(f"Found {len(links['links'])} relevant links")
    return links
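
`select_relevant_links` trusts the model's reply: `json.loads` raises on malformed output, and nothing enforces the "same domain" rule from the prompt. A minimal sketch of a defensive parser, assuming a hypothetical helper `parse_links_response` that is not in the original notebook:

```python
import json
from urllib.parse import urlparse

def parse_links_response(raw, base_url):
    """Parse the model's JSON reply and keep only well-formed links on the base domain."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        return {"links": []}

    base_host = urlparse(base_url).netloc.lower()
    kept = []
    for item in links:
        if not isinstance(item, dict):
            continue
        url = item.get("url", "")
        # Compare hosts only, so http:// and https:// variants both pass
        if isinstance(url, str) and urlparse(url).netloc.lower() == base_host:
            kept.append({"type": item.get("type", "unknown"), "url": url})
    return {"links": kept}
```

Returning an empty list on bad input is one design choice; raising an exception would be equally reasonable if a silent fallback is undesirable.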

In [6]:
select_relevant_links("https://www.volkschem.com/")

Selecting relevant links for https://www.volkschem.com/ by calling openai/gpt-oss-120B
Found 16 relevant links


{'links': [{'type': 'home page', 'url': 'https://www.volkschem.com/'},
  {'type': 'about page', 'url': 'https://www.volkschem.com/about-us.html'},
  {'type': 'company profile page',
   'url': 'https://www.volkschem.com/company-profile.html'},
  {'type': 'company profile page',
   'url': 'https://www.volkschem.com/company-overview.html'},
  {'type': 'team page',
   'url': 'https://www.volkschem.com/management-team.html'},
  {'type': 'achievements page',
   'url': 'https://www.volkschem.com/achievement.html'},
  {'type': 'products page',
   'url': 'https://www.volkschem.com/our-products.html'},
  {'type': 'products page',
   'url': 'https://www.volkschem.com/our-brand-products.html'},
  {'type': 'products page',
   'url': 'https://www.volkschem.com/b2b-product.html'},
  {'type': 'products page',
   'url': 'https://www.volkschem.com/export-products.html'},
  {'type': 'products page',
   'url': 'http://www.volkschem.com/bio-pesticide.html'},
  {'type': 'products page',
   'url': 'http://ww

**Creating Brochure**

In [7]:
# Fetching Page Contents for Landing Page and Relevant Links

def fetch_page_and_all_relevant_links(url):
    contents = fetch_website_contents(url)
    relevant_links = select_relevant_links(url)
    result = f"## Landing Page:\n\n{contents}\n## Relevant Links:\n"
    for link in relevant_links['links']:
        result += f"\n\n### Link: {link['type']}\n"
        result += fetch_website_contents(link["url"])
    return result
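
The loop above calls `fetch_website_contents` once per link. How that function behaves on a failed request is not shown in this notebook. Assuming it may raise, a single bad page would abort the whole brochure. A minimal sketch that skips failing pages and caps the page count, with the fetcher passed in as a parameter (`collect_pages` and `max_pages` are hypothetical names):

```python
def collect_pages(links, fetch, max_pages=10):
    """Fetch each link with `fetch`, skipping pages that raise an exception."""
    sections = []
    for link in links[:max_pages]:
        try:
            body = fetch(link["url"])
        except Exception as exc:  # broad on purpose: the scraper's error types are unknown here
            print(f"Skipping {link['url']}: {exc}")
            continue
        sections.append(f"### Link: {link['type']}\n{body}")
    return "\n\n".join(sections)

# Usage in the notebook (assumption: fetch_website_contents takes a URL and returns text):
# collect_pages(relevant_links["links"], fetch_website_contents)
```

Passing the fetcher as an argument also makes the function easy to try out with a stand-in that returns canned text.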

In [8]:
# Prompt for Brochure Generation

brochure_system_prompt = """
You create attractive, professional brochures in clean markdown (no code blocks).
Rewrite website content clearly, concisely, and in a well-structured manner.
You may use emojis in the brochure if they enhance clarity or visual appeal, but keep them tasteful.
Include typical brochure sections such as overview, mission, services, benefits, customers, culture, and contact.
"""

In [9]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"""
Create a polished and attractive brochure for {company_name} using the website content below.
Rewrite everything clearly in structured markdown (no code blocks).
You may include emojis where appropriate to make the brochure visually appealing.

Suggested sections:
- Company Overview
- Mission & Values
- Products / Services
- Benefits / Differentiators
- Customers / Industries
- Careers / Culture
- Contact Info

Website Content:
"""
    user_prompt += fetch_page_and_all_relevant_links(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
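
Slicing the combined prompt at 5,000 characters means the landing page and the first few links can use up the whole budget, so later pages never reach the model. One alternative is to give each page an equal share. A minimal sketch, assuming the page texts are available as a list of strings (`truncate_sections` is a hypothetical helper, not part of the notebook):

```python
def truncate_sections(sections, total_chars=5_000, separator="\n\n"):
    """Give each section an equal share of the character budget, then join them."""
    if not sections:
        return ""
    # Reserve room for the separators so the result never exceeds total_chars
    budget = total_chars - len(separator) * (len(sections) - 1)
    per_section = max(budget // len(sections), 0)
    return separator.join(text[:per_section] for text in sections)

pages = ["A" * 10_000, "B" * 10_000, "C" * 50]
combined = truncate_sections(pages, total_chars=5_000)
print(len(combined) <= 5_000)  # True
```

Short pages do not donate their unused share in this version, so the result can be well under the budget. Whether that matters depends on how uneven the pages are.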

In [10]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": brochure_system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))
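
`update_display` is imported in the first cell but never used. It is typically paired with a streamed response so the brochure renders as it is generated. A minimal sketch of the accumulation step, tested here against stand-in chunk objects rather than a live API call (`accumulate_stream` is a hypothetical helper):

```python
from types import SimpleNamespace

def accumulate_stream(chunks):
    """Yield the text received so far after each streamed chunk that carries content."""
    text = ""
    for chunk in chunks:
        if not chunk.choices:
            continue
        piece = chunk.choices[0].delta.content
        if piece:  # some chunks carry no text (None or empty)
            text += piece
            yield text

# Stand-in chunks shaped like streaming response objects (for illustration only)
fake_chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=c))])
    for c in ["# Vol", None, "kschem"]
]
for partial in accumulate_stream(fake_chunks):
    print(partial)
```

In the notebook, the chunks would come from `openai.chat.completions.create(..., stream=True)`. A handle created with `display(Markdown(""), display_id=True)` can then be refreshed inside the loop with `update_display(Markdown(partial), display_id=handle.display_id)`.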

In [11]:
create_brochure("Volkschem", "https://www.volkschem.com/")

Selecting relevant links for https://www.volkschem.com/ by calling openai/gpt-oss-120B
Found 16 relevant links


# 🌱 Volkschem Crop Science (P) Ltd. – Brochure  

---

## 📖 Company Overview  
Volkschem Crop Science (P) Ltd. is a premier **manufacturer and exporter** of bio-pesticides, bio-larvicides, bio-fertilizers, plant growth regulators and promoters. Founded in **2011** in Ahmedabad, Gujarat, we have grown into one of India's most trusted names for sustainable agricultural solutions.  

- **Headquarters:** Sanand Cross Road, Sarkhej, Ahmedabad – 382210, Gujarat, India  
- **Manufacturing & Registered Office:** Plot 1, Survey 264, Bhayla Dhanwada Road, Bavla – 382220, Gujarat, India  
- **Phone:** +91 95740 09098  
- **Email:** info@volkschem.com  

Our state-of-the-art facilities cover processing, packaging, storage, warehousing and distribution, ensuring that every product reaches the farmer in optimal condition.

---

## 🎯 Mission & Values  

| **Mission** | To deliver innovative, environmentally-friendly crop solutions that boost yield, protect crops and support the long-term health of farming ecosystems. |
|-------------|------------------------------------------------------------|
| **Core Values** | **Quality** – rigorous, ISO-aligned testing  <br> **Integrity** – transparent dealings with every stakeholder  <br> **Sustainability** – products that respect nature  <br> **Customer-Centricity** – value-for-money solutions  <br> **Innovation** – continual R&D for next-generation bio-products |

---

## 🛠️ Products & Services  

### 🌿 Bio-Products (Core Portfolio)  
| Category | Key Offerings | Primary Benefit |
|----------|--------------|-----------------|
| **Plant Growth Regulators & Promoters** | Hormone-based stimulants, seed-treatments | Faster, uniform growth; higher biomass |
| **Bio-Pesticides** | Neem-based, Bacillus-based, fungal antagonists | Target-specific pest control, reduced chemical residue |
| **Bio-Larvicides** | Mosquito-control formulations | Safe water-body protection, disease-vector reduction |
| **Bio-Fertilizers** | Rhizobium, Mycorrhizae, phosphate-solubilizing microbes | Improved nutrient uptake, soil health |

### 📦 Additional Services  
- **Third-Party Manufacturing (Contract Manufacturing)** – full-scale production under client specifications.  
- **Export Solutions** – bulk shipments with compliance to international phytosanitary standards.  
- **R&D Collaboration** – joint development projects for custom bio-solutions.  

---

## ✨ Benefits / Differentiators  

- **Certified Quality:** ISO-9001, ISO-14001, and EHS compliance.  
- **Advanced Infrastructure:** Modern processing lines, climate-controlled storage, and automated packaging.  
- **Rapid Turn-Around:** Scalable production capacity to meet bulk orders without delays.  
- **Eco-Friendly Portfolio:** All products are biodegradable, non-toxic to mammals and beneficial insects.  
- **Global Reach:** Export experience to over 20 countries, backed by reliable logistics partners.  

---

## 👥 Customers & Industries  

| Segment | Typical Users |
|---------|---------------|
| **Commercial Agriculture** | Large-scale farms, agribusinesses, contract growers |
| **Horticulture & Floriculture** | Nurseries, greenhouse operators |
| **Public Health & Municipalities** | Water-body management agencies (larvicide programs) |
| **Food Processing & Exporters** | Companies requiring residue-free produce |
| **Research Institutions** | Universities and private labs testing bio-agents |

---

## 💼 Careers & Culture  

Volkschem believes people are the engine of innovation.  

- **Collaborative Environment** – cross-functional teams work together on product development and field trials.  
- **Learning & Growth** – regular training, workshops and exposure to the latest biotechnologies.  
- **Safety First** – robust EHS policies protect our employees and the environment.  
- **Diversity & Inclusion** – open to talent from all backgrounds, fostering fresh perspectives.  

*Join us to shape the future of sustainable agriculture!*  

[Explore current openings →](#)  

---

## 📞 Contact Information  

| 📍 **Corporate Office** | C/806-807, Signature 2, Sanand Cross Road, Sarkhej, Ahmedabad - 382210, Gujarat, India |
|--------------------------|----------------------------------------------------------------------------------------|
| 🏭 **Factory & Registered Office** | Plot 1, Survey 264, Bhayla Dhanwada Road, At. Bhayla, Ta. Bavla, Dist. Ahmedabad - 382220 |
| 📞 **Phone** | +91 95740 09098 |
| 📧 **Email** | info@volkschem.com |
| 🌐 **Website** | [www.volkschem.com](http://www.volkschem.com) |
| ⏰ **Working Hours** | Monday-Saturday : 10:00 AM – 6:30 PM |

---

**Volkschem Crop Science – Growing responsibly, delivering limitless growth.** 🌾