Given a list of URLs get their content and download images (or other data) from each page.
Downloader

Badly named npm package.

Given a list of URLs, this module collects the images found on each page and stores them in separate PDF files.

The individual steps are:

  • asynchronously get HTML content for each URL
  • extract image URLs using the given locator function (using after-load)
  • asynchronously collect all images and merge them into a PDF
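
The three steps above can be pictured roughly as follows. This is only a sketch of the control flow, not the package's actual source: `processUrls`, `fetchHtml`, `fetchImage` and `buildPdf` are hypothetical names, and the network and PDF work is injected so the example stays self-contained. For brevity the locator here receives only the HTML string, whereas the real one also receives the `$` cheerio object.

```javascript
// Hypothetical sketch of the pipeline; all helper names are assumptions.
// fetchHtml(url)        -> Promise resolving to the page HTML
// fetchImage(href)      -> Promise resolving to the image data
// buildPdf(url, images) -> whatever represents the finished PDF
async function processUrls(urls, getImagesHref, { fetchHtml, fetchImage, buildPdf }) {
  // All URLs are processed concurrently; each one yields its own PDF.
  return Promise.all(
    urls.map(async (url) => {
      // Step 1: get the HTML content for this URL
      const html = await fetchHtml(url);

      // Step 2: run the locator and drop empty or null entries
      const hrefs = (getImagesHref(html) || []).filter(Boolean);

      // Step 3: collect all images concurrently, then merge them
      const images = await Promise.all(hrefs.map((href) => fetchImage(href)));
      return buildPdf(url, images);
    })
  );
}
```

Because `Promise.all` rejects as soon as any promise rejects, a single failing page or image aborts the whole run in this sketch. `Promise.allSettled` would be one way to keep going past individual failures.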

Use case

I wanted to collect images of houses from several real estate websites, as inspiration.

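Libraries used

  • after-load (provides the $ cheerio object passed to the locator function)
  • pdfmake (PDF generation)
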
Install

npm install --save multiple-urls-images-downloader

How to use

NOTE: You always need to provide the Roboto fonts for the PDF generation (required by pdfmake). You can also provide additional custom fonts if you prefer.

const muid = require('multiple-urls-images-downloader');

const config = {
  // Mandatory list of URLs to inspect
  urls: ['url1', 'url2'],

  // Destination dir where to store the PDF files
  // Defaults to './documents'
  dir: './my_dir',

  // Optional function that returns the title for a given URL
  // Defaults to the URL without "/", ":" or "."
  getTitle: url => url,

  // List of fonts
  fonts: {
    // Mandatory
    Roboto: {
      normal: './fonts/Roboto-Regular.ttf',
      bold: './fonts/Roboto-Medium.ttf',
      italics: './fonts/Roboto-Italic.ttf',
      bolditalics: './fonts/Roboto-MediumItalic.ttf',
    },
    // Optional
    customFont: {
      normal: 'path_to_font.ttf',
      bold: 'path_to_font.ttf',
      italics: 'path_to_font.ttf',
      bolditalics: 'path_to_font.ttf',
    },
  },

  // Mandatory
  // Locator function. muid will pass the html string and the $ cheerio object
  // ($ is provided by after-load)
  getImagesHref: (html, $) => {
    const images = [];
    $('img[src^="img/photos"]').each(function() {
      images.push($(this).attr('src'));
    });
    return images;
  },
};

muid(config);
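
As an illustration of the getTitle option, here is a small sketch of two title functions. `defaultTitle` is only an assumed equivalent of the documented default (the URL without "/", ":" or "."), not code taken from the package, and `readableTitle` is a hypothetical alternative you could pass instead.

```javascript
// Assumed equivalent of the documented default: strip "/", ":" and "."
const defaultTitle = (url) => url.replace(/[\/:.]/g, '');

// Hypothetical alternative: host name plus path, with every run of
// non-alphanumeric characters collapsed into a single dash.
const readableTitle = (url) => {
  const { hostname, pathname } = new URL(url);
  return (hostname + pathname)
    .replace(/[^a-zA-Z0-9]+/g, '-') // separators become dashes
    .replace(/^-+|-+$/g, '');       // trim leading and trailing dashes
};
```

Such a function would be passed as `getTitle: readableTitle` in the config object. Note that `new URL` throws on strings that are not absolute URLs, so the placeholder values 'url1' and 'url2' above would need to be real URLs.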