Scrapes all article links from the Swedish newspaper Aftonbladet, from their sitemap!

Kladdkaka/aftonbladet-links

aftonbladet-links

A simple scraper and NPM module to get all article links from Aftonbladet, currently 900k+ articles and counting!


IMPORTANT

No support is given for Node versions under v8.0.0!


Installation

npm i --save aftonbladet-links


Usage

This should be pretty straightforward, but I included an example that dumps all article links to a JSON file. :)
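As a rough sketch of what dumping the links to a JSON file might look like: the helper below is hypothetical and not part of the module. It assumes `client` is whatever `require('aftonbladet-links')` returns and that it exposes `getAll()` as documented in the API section. The name `dumpLinks` and the output file name are made up for illustration.

```javascript
const fs = require('fs');

// Hypothetical helper (not part of the module): fetch every article link
// through the documented getAll() and write the resulting array to a JSON file.
// Assumption: `client` is the object returned by require('aftonbladet-links').
async function dumpLinks(client, outFile, limit = 5) {
  const links = await client.getAll(limit); // resolves to an array of article URLs
  fs.writeFileSync(outFile, JSON.stringify(links, null, 2));
  return links.length; // number of links written
}

// Usage (needs network access and the installed package):
// dumpLinks(require('aftonbladet-links'), 'links.json')
//   .then((count) => console.log(`Wrote ${count} links`))
//   .catch(console.error);
```

Passing the module in as a parameter keeps the helper easy to try out with a stub before hitting the real site.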


Test usage

npm i && DEBUG=* npm test


API Documentation

Table of Contents

Methods

getSitemap()

Gets the URLs of all child sitemaps (one per month) from the parent (main) sitemap.

Parameters:

None :)

Returns: Promise<Array>

The array contains all child sitemap URLs!
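A minimal usage sketch, assuming the module's export exposes `getSitemap()` directly (the export shape is an assumption, so the module is passed in as `client`). The helper name `countChildSitemaps` is hypothetical.

```javascript
// Hypothetical helper: `client` is assumed to be the object returned by
// require('aftonbladet-links'), exposing getSitemap() as documented above.
async function countChildSitemaps(client) {
  const sitemapUrls = await client.getSitemap(); // array of child sitemap URLs
  console.log(`Found ${sitemapUrls.length} child sitemaps`);
  return sitemapUrls.length;
}

// Usage (needs network access and the installed package):
// countChildSitemaps(require('aftonbladet-links')).catch(console.error);
```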

getLinksFromUrl(url)

Gets all article URLs (referred to as "links" in the code) from a child sitemap.

Parameters:

url string The child sitemap URL you want to get article links from :)

Returns: Promise<Array>

The array contains all article URLs!
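To show how the two methods fit together, here is a small sketch that takes the first child sitemap returned by `getSitemap()` and fetches its article links. It assumes both methods live on the module's export (passed in as `client`). The helper name is hypothetical, and the order in which child sitemaps are returned is not documented, so "first" may not mean oldest or newest.

```javascript
// Hypothetical helper: fetch the article links of a single child sitemap.
// Assumption: `client` is the object returned by require('aftonbladet-links').
async function linksFromFirstSitemap(client) {
  const sitemapUrls = await client.getSitemap();
  if (sitemapUrls.length === 0) {
    return []; // nothing to fetch
  }
  // getLinksFromUrl() takes one child sitemap URL and resolves to its article URLs.
  return client.getLinksFromUrl(sitemapUrls[0]);
}

// Usage (needs network access and the installed package):
// linksFromFirstSitemap(require('aftonbladet-links'))
//   .then((links) => console.log(links.length))
//   .catch(console.error);
```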

getAll(limit = 5)

Gets the URLs (referred to as "links" in the code) of all articles on the site!

Parameters:

limit number How many concurrent requests to use! (optional, default 5)

Returns: Promise<Array>

The array contains all article URLs!
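To illustrate what a limit on concurrent requests means, here is one way such a limit could be built on top of `getSitemap()` and `getLinksFromUrl()`. This is only a sketch of a simple worker-pool pattern, not the module's actual implementation, and `fetchAllWithLimit` is a hypothetical name. In practice you would just call `getAll(limit)`.

```javascript
// Illustration only: NOT the module's actual implementation.
// Assumption: `client` is the object returned by require('aftonbladet-links').
async function fetchAllWithLimit(client, limit = 5) {
  const sitemapUrls = await client.getSitemap();
  const results = new Array(sitemapUrls.length);
  let next = 0; // index of the next sitemap nobody has claimed yet

  // Each worker keeps claiming the next unprocessed sitemap until none remain,
  // so at most `limit` requests are in flight at any time.
  async function worker() {
    while (next < sitemapUrls.length) {
      const index = next++;
      results[index] = await client.getLinksFromUrl(sitemapUrls[index]);
    }
  }

  const workerCount = Math.min(limit, sitemapUrls.length);
  await Promise.all(Array.from({ length: workerCount }, worker));

  // Flatten the per-sitemap arrays into one list of article URLs.
  return [].concat(...results);
}
```

A higher limit finishes faster but puts more load on the site, so the default of 5 is a reasonable place to start.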
