Sitemap 2 Doc

This module downloads every web page listed in a sitemap.xml file and compiles them into a single document.

Designed for AI Embedding Generation
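
To make the idea above concrete, here is a minimal sketch of the two steps the description names: reading page URLs from a sitemap and merging page text into one document. It is an illustration only and not how sitemap2doc is implemented; `extractSitemapUrls` and `mergePages` are hypothetical names, the page text is a stand-in rather than a real download, and XML entities such as `&amp;` are not decoded.

```javascript
// Hypothetical sketch, not sitemap2doc's implementation:
// pull every <loc> URL out of a sitemap XML string.
function extractSitemapUrls( xml ) {
    const urls = []
    const pattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g
    for( const match of xml.matchAll( pattern ) ) {
        urls.push( match[ 1 ] )
    }
    return urls
}

// Join the page texts into one document, one titled section per URL.
function mergePages( pages ) {
    return pages
        .map( ( { url, text } ) => `# ${url}\n\n${text.trim()}` )
        .join( '\n\n' )
}

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs</loc></url>
</urlset>`

const urls = extractSitemapUrls( xml )

// Stand-in page text; a real run would download each URL instead.
const doc = mergePages( urls.map( ( url ) => ( { url, text: `Content of ${url}` } ) ) )
console.log( doc )
```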

Quickstart

Terminal

npm init -y && npm i sitemap2doc

Node index.mjs

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal

node index.mjs

Methods

getDocument()

| Key | Type | Description | Required | Default |
|---|---|---|---|---|
| projectName | String | Set project name | true | |
| sitemapUrl | String | Set sitemap source | true | |
| silent | Boolean | Control terminal output | false | false |
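
If you build the options object dynamically, it can help to check it against the table above before calling getDocument(). The following is a minimal sketch under that assumption; `normalizeOptions` is a hypothetical helper and not part of sitemap2doc, and the module may validate its input differently.

```javascript
// Hypothetical helper, not part of sitemap2doc: checks the documented
// keys and fills in the documented default for `silent`.
function normalizeOptions( { projectName, sitemapUrl, silent = false } = {} ) {
    if( typeof projectName !== 'string' || projectName === '' ) {
        throw new TypeError( 'projectName is required and must be a string' )
    }
    if( typeof sitemapUrl !== 'string' || sitemapUrl === '' ) {
        throw new TypeError( 'sitemapUrl is required and must be a string' )
    }
    if( typeof silent !== 'boolean' ) {
        throw new TypeError( 'silent must be a boolean' )
    }
    return { projectName, sitemapUrl, silent }
}

const options = normalizeOptions( {
    'projectName': 'test',
    'sitemapUrl': 'https://example.com/sitemap.xml'
} )
console.log( options.silent ) // prints false, the documented default
```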

Example

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
await s2d.getDocument( {
    'projectName': 'test',
    'sitemapUrl': 'https://...'
} )

Terminal

  Get Sitemap     https://...
  Get Pages       0 1 2 3 4 5 6 7 8 9
  Merge           0

getConfig()

Returns the current config. The default config is in ./src/data/config.mjs.

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
let config = s2d.getConfig()
config['download']['chunkSize'] = 4

await s2d
   .setConfig( { config } )
   .getDocument( { ... } )
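
The README does not define `chunkSize`. The sketch below assumes it is the number of pages handled together in one batch, which is a guess based on the name alone. `mapInChunks` and `fakeDownload` are hypothetical and only illustrate that batching pattern; they are not sitemap2doc's code.

```javascript
// Hypothetical sketch: process `items` in batches of `chunkSize`,
// running each batch concurrently and the batches one after another.
async function mapInChunks( items, chunkSize, worker ) {
    if( !Number.isInteger( chunkSize ) || chunkSize < 1 ) {
        throw new RangeError( 'chunkSize must be a positive integer' )
    }
    const results = []
    for( let i = 0; i < items.length; i += chunkSize ) {
        const chunk = items.slice( i, i + chunkSize )
        const chunkResults = await Promise.all( chunk.map( ( item ) => worker( item ) ) )
        results.push( ...chunkResults )
    }
    return results
}

// Stand-in for a page download; a real worker would fetch the URL.
const fakeDownload = async ( url ) => `content of ${url}`

mapInChunks( [ 'a', 'b', 'c', 'd', 'e' ], 4, fakeDownload )
    .then( ( pages ) => console.log( pages.length ) ) // prints 5
```

With a chunk size of 4, the five items run as one batch of four followed by one batch of one, and the results keep their input order.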

setConfig()

All module settings are stored in a config file; see ./src/data/config.mjs. These defaults can be completely overridden by passing your own config object to setConfig().

import { Sitemap2Doc } from 'sitemap2doc'

const s2d = new Sitemap2Doc()
let config = s2d.getConfig()
config['download']['chunkSize'] = 4

await s2d
   .setConfig( { config } )
   .getDocument( { ... } )
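
The example above edits the object returned by getConfig() in place. If you prefer to keep your overrides separate and build the full config from them, a small merge helper is one option. This is a sketch only: `mergeConfig` is hypothetical, and the `defaults` object uses placeholder values, since the real defaults live in ./src/data/config.mjs.

```javascript
// Hypothetical helper: returns a new object with `overrides` merged
// recursively over `defaults`. Neither input is modified; branches that
// are not overridden are shared by reference with `defaults`.
function mergeConfig( defaults, overrides ) {
    const isPlainObject = ( value ) =>
        value !== null && typeof value === 'object' && !Array.isArray( value )

    const result = { ...defaults }
    for( const [ key, value ] of Object.entries( overrides ) ) {
        result[ key ] = isPlainObject( value ) && isPlainObject( defaults[ key ] )
            ? mergeConfig( defaults[ key ], value )
            : value
    }
    return result
}

// Placeholder values for illustration, not the module's real defaults.
const defaults = { 'download': { 'chunkSize': 1, 'placeholderOption': true } }
const config = mergeConfig( defaults, { 'download': { 'chunkSize': 4 } } )

console.log( config[ 'download' ][ 'chunkSize' ] )   // prints 4
console.log( defaults[ 'download' ][ 'chunkSize' ] ) // prints 1
```

The resulting `config` could then be passed to setConfig( { config } ) as in the example above.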

License

The module is available as open source under the terms of the Apache 2.0 License.
