2 changes: 2 additions & 0 deletions .agent/README.md
@@ -212,6 +212,7 @@ Both SDKs support the following endpoints:
| SmartScraper | βœ… | βœ… | AI-powered data extraction |
| SearchScraper | βœ… | βœ… | Multi-website search extraction |
| Markdownify | βœ… | βœ… | HTML to Markdown conversion |
| Sitemap | ❌ | βœ… | Sitemap URL extraction |
| SmartCrawler | βœ… | βœ… | Sitemap generation & crawling |
| AgenticScraper | βœ… | βœ… | Browser automation |
| Scrape | βœ… | βœ… | Basic HTML extraction |
@@ -259,6 +260,7 @@ Both SDKs support the following endpoints:
- `searchScraper.js`
- `crawl.js`
- `markdownify.js`
- `sitemap.js`
- `agenticScraper.js`
- `scrape.js`
- `scheduledJobs.js`
36 changes: 36 additions & 0 deletions scrapegraph-js/README.md
@@ -451,6 +451,27 @@ const url = 'https://scrapegraphai.com/';
})();
```

### Sitemap

Extract all URLs from a website's sitemap. The sitemap is discovered automatically from robots.txt or common sitemap locations.

```javascript
import { sitemap } from 'scrapegraph-js';

const apiKey = 'your-api-key';
const websiteUrl = 'https://example.com';

(async () => {
try {
const response = await sitemap(apiKey, websiteUrl);
console.log('Total URLs found:', response.urls.length);
console.log('URLs:', response.urls);
} catch (error) {
console.error('Error:', error);
}
})();
```

### Checking API Credits

```javascript
@@ -688,6 +709,21 @@ Starts a crawl job to extract structured data from a website and its linked page

Converts a webpage into clean, well-structured markdown format.

### Sitemap

#### `sitemap(apiKey, websiteUrl, options)`

Extracts all URLs from a website's sitemap. The sitemap is discovered automatically from robots.txt or common sitemap locations.

**Parameters:**
- `apiKey` (string): Your ScrapeGraph AI API key
- `websiteUrl` (string): The URL of the website to extract the sitemap from
- `options` (object, optional): Additional options
- `mock` (boolean): Override mock mode for this request

**Returns:** Promise resolving to an object containing:
- `urls` (array): List of URLs extracted from the sitemap
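
A minimal usage sketch based on the signature above; the API key and URL are placeholders, and `mock` is the only option documented here:

```javascript
import { sitemap } from 'scrapegraph-js';

const apiKey = 'your-api-key';

// Pass options as the third argument; `mock: true` overrides mock mode
// for this single request (per the options list above)
const response = await sitemap(apiKey, 'https://example.com', { mock: true });

// The documented return value is an object with a `urls` array
console.log(`Found ${response.urls.length} URLs`);
```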

### Agentic Scraper

#### `agenticScraper(apiKey, url, steps, useSession, userPrompt, outputSchema, aiExtraction)`
128 changes: 128 additions & 0 deletions scrapegraph-js/examples/sitemap/README.md
@@ -0,0 +1,128 @@
# Sitemap Examples

This directory contains examples demonstrating how to use the `sitemap` endpoint to extract URLs from website sitemaps.

## πŸ“ Examples

### 1. Basic Sitemap Extraction (`sitemap_example.js`)

Demonstrates the basic usage of the sitemap endpoint:
- Extract all URLs from a website's sitemap
- Display the URLs
- Save URLs to a text file
- Save complete response as JSON

**Usage:**
```bash
node sitemap_example.js
```

**What it does:**
1. Calls the sitemap API with a target website URL
2. Retrieves all URLs from the sitemap
3. Displays the first 10 URLs in the console
4. Saves all URLs to `sitemap_urls.txt`
5. Saves the full response to `sitemap_urls.json` (shape sketched below)
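
Based on the documented return shape, the saved JSON looks roughly like this (the URLs are illustrative, and fields beyond `urls` are not guaranteed):

```
{
  "urls": [
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/products"
  ]
}
```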

### 2. Advanced: Sitemap + SmartScraper (`sitemap_with_smartscraper.js`)

Shows how to combine sitemap extraction with smartScraper for batch processing:
- Extract sitemap URLs
- Filter URLs based on patterns (e.g., blog posts)
- Scrape selected URLs with smartScraper
- Display results and summary

**Usage:**
```bash
node sitemap_with_smartscraper.js
```

**What it does:**
1. Extracts all URLs from a website's sitemap
2. Filters URLs (example: only blog posts or specific sections)
3. Scrapes each filtered URL using smartScraper
4. Extracts structured data from each page
5. Displays a summary of successful and failed scrapes

**Use Cases:**
- Bulk content extraction from blogs
- E-commerce product catalog scraping
- News article aggregation
- Content migration and archival

## πŸ”‘ Setup

Before running the examples, make sure you have:

1. **API Key**: Set your ScrapeGraph AI API key as an environment variable:
```bash
export SGAI_APIKEY="your-api-key-here"
```

Or create a `.env` file in the project root:
```
SGAI_APIKEY=your-api-key-here
```

2. **Dependencies**: Install required packages:
```bash
npm install
```

## πŸ“Š Expected Output

### Basic Sitemap Example Output:
```
πŸ—ΊοΈ Extracting sitemap from: https://example.com/
⏳ Please wait...

βœ… Sitemap extracted successfully!
πŸ“Š Total URLs found: 150

πŸ“„ First 10 URLs:
1. https://example.com/
2. https://example.com/about
3. https://example.com/products
...

πŸ’Ύ URLs saved to: sitemap_urls.txt
πŸ’Ύ JSON saved to: sitemap_urls.json
```

### Advanced Example Output:
```
πŸ—ΊοΈ Step 1: Extracting sitemap from: https://example.com/
⏳ Please wait...

βœ… Sitemap extracted successfully!
πŸ“Š Total URLs found: 150

🎯 Selected 3 URLs to scrape:
1. https://example.com/blog/post-1
2. https://example.com/blog/post-2
3. https://example.com/blog/post-3

πŸ€– Step 2: Scraping selected URLs...

πŸ“„ Scraping (1/3): https://example.com/blog/post-1
βœ… Success
...

πŸ“ˆ Summary:
βœ… Successful: 3
❌ Failed: 0
πŸ“Š Total: 3
```

## πŸ’‘ Tips

1. **Rate Limiting**: When scraping multiple URLs, add delays between requests to avoid rate limiting
2. **Error Handling**: Always use try/catch blocks to handle API errors gracefully
3. **Filtering**: Use URL patterns to select specific sections (e.g., `/blog/`, `/products/`)
4. **Batch Size**: Start with a small batch to test before processing hundreds of URLs (all four tips are combined in the sketch below)
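
A short sketch combining the tips above; the `/blog/` pattern, five-URL batch, and one-second delay are illustrative values:

```javascript
import { sitemap, smartScraper } from 'scrapegraph-js';
import 'dotenv/config';

const apiKey = process.env.SGAI_APIKEY;

// Tip 3: filter the sitemap down to one section of the site
const { urls } = await sitemap(apiKey, 'https://example.com');
const blogUrls = urls.filter((u) => u.includes('/blog/'));

// Tip 4: start with a small test batch
for (const url of blogUrls.slice(0, 5)) {
  // Tip 2: handle per-URL failures without aborting the whole run
  try {
    const response = await smartScraper(apiKey, url, 'Extract the page title');
    console.log(url, response.result);
  } catch (err) {
    console.error(url, err.message);
  }
  // Tip 1: pause between requests to avoid rate limiting
  await new Promise((resolve) => setTimeout(resolve, 1000));
}
```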

## πŸ”— Related Documentation

- [Sitemap API Documentation](../../README.md#sitemap)
- [SmartScraper Documentation](../../README.md#smart-scraper)
- [ScrapeGraph AI API Docs](https://docs.scrapegraphai.com)
72 changes: 72 additions & 0 deletions scrapegraph-js/examples/sitemap/sitemap_example.js
@@ -0,0 +1,72 @@
import { sitemap } from 'scrapegraph-js';
import fs from 'fs';
import 'dotenv/config';

/**
* Example: Extract sitemap URLs from a website
*
* This example demonstrates how to use the sitemap endpoint to extract
* all URLs from a website's sitemap.xml file.
*/

// Get API key from environment variable
const apiKey = process.env.SGAI_APIKEY;

// Target website URL
const url = 'https://scrapegraphai.com/';

console.log('πŸ—ΊοΈ Extracting sitemap from:', url);
console.log('⏳ Please wait...\n');

try {
// Call the sitemap endpoint
const response = await sitemap(apiKey, url);

console.log('βœ… Sitemap extracted successfully!');
console.log(`πŸ“Š Total URLs found: ${response.urls.length}\n`);

// Display first 10 URLs
console.log('πŸ“„ First 10 URLs:');
response.urls.slice(0, 10).forEach((url, index) => {
console.log(` ${index + 1}. ${url}`);
});

if (response.urls.length > 10) {
console.log(` ... and ${response.urls.length - 10} more URLs`);
}

// Save the complete list to a file
saveUrlsToFile(response.urls, 'sitemap_urls.txt');

// Save as JSON for programmatic use
saveUrlsToJson(response, 'sitemap_urls.json');

} catch (error) {
console.error('❌ Error:', error.message);
process.exit(1);
}

/**
* Helper function to save URLs to a text file
*/
function saveUrlsToFile(urls, filename) {
try {
const content = urls.join('\n');
fs.writeFileSync(filename, content);
console.log(`\nπŸ’Ύ URLs saved to: ${filename}`);
} catch (err) {
console.error('❌ Error saving file:', err.message);
}
}

/**
* Helper function to save complete response as JSON
*/
function saveUrlsToJson(response, filename) {
try {
fs.writeFileSync(filename, JSON.stringify(response, null, 2));
console.log(`πŸ’Ύ JSON saved to: ${filename}`);
} catch (err) {
console.error('❌ Error saving JSON:', err.message);
}
}
106 changes: 106 additions & 0 deletions scrapegraph-js/examples/sitemap/sitemap_with_smartscraper.js
@@ -0,0 +1,106 @@
import { sitemap, smartScraper } from 'scrapegraph-js';
import 'dotenv/config';

/**
* Advanced Example: Extract sitemap and scrape selected URLs
*
* This example demonstrates how to combine the sitemap endpoint
* with smartScraper to extract structured data from multiple pages.
*/

const apiKey = process.env.SGAI_APIKEY;

// Configuration
const websiteUrl = 'https://scrapegraphai.com/';
const maxPagesToScrape = 3; // Limit number of pages to scrape
const userPrompt = 'Extract the page title and main heading';

console.log('πŸ—ΊοΈ Step 1: Extracting sitemap from:', websiteUrl);
console.log('⏳ Please wait...\n');

try {
// Step 1: Get all URLs from sitemap
const sitemapResponse = await sitemap(apiKey, websiteUrl);

console.log('βœ… Sitemap extracted successfully!');
console.log(`πŸ“Š Total URLs found: ${sitemapResponse.urls.length}\n`);

// Step 2: Filter URLs (example: only blog posts)
const filteredUrls = sitemapResponse.urls
.filter(url => url.includes('/blog/') || url.includes('/post/'))
.slice(0, maxPagesToScrape);

if (filteredUrls.length === 0) {
console.log('ℹ️ No blog URLs found, using first 3 URLs instead');
filteredUrls.push(...sitemapResponse.urls.slice(0, maxPagesToScrape));
}

console.log(`🎯 Selected ${filteredUrls.length} URLs to scrape:`);
filteredUrls.forEach((url, index) => {
console.log(` ${index + 1}. ${url}`);
});

// Step 3: Scrape each selected URL
console.log('\nπŸ€– Step 2: Scraping selected URLs...\n');

const results = [];

for (let i = 0; i < filteredUrls.length; i++) {
const url = filteredUrls[i];
console.log(`πŸ“„ Scraping (${i + 1}/${filteredUrls.length}): ${url}`);

try {
const scrapeResponse = await smartScraper(
apiKey,
url,
userPrompt
);

results.push({
url: url,
data: scrapeResponse.result,
status: 'success'
});

console.log(' βœ… Success');

// Add a small delay between requests to avoid rate limiting
if (i < filteredUrls.length - 1) {
await new Promise(resolve => setTimeout(resolve, 1000));
}

} catch (error) {
console.log(` ❌ Failed: ${error.message}`);
results.push({
url: url,
error: error.message,
status: 'failed'
});
}
}

// Step 4: Display results
console.log('\nπŸ“Š Scraping Results:\n');
results.forEach((result, index) => {
console.log(`${index + 1}. ${result.url}`);
if (result.status === 'success') {
console.log(' Status: βœ… Success');
console.log(' Data:', JSON.stringify(result.data, null, 2));
} else {
console.log(' Status: ❌ Failed');
console.log(' Error:', result.error);
}
console.log('');
});

// Summary
const successCount = results.filter(r => r.status === 'success').length;
console.log('πŸ“ˆ Summary:');
console.log(` βœ… Successful: ${successCount}`);
console.log(` ❌ Failed: ${results.length - successCount}`);
console.log(` πŸ“Š Total: ${results.length}`);

} catch (error) {
console.error('❌ Error:', error.message);
process.exit(1);
}