VincentKen/social-media-scanner

social-media-scanner

Scan websites for links to social media

Support


Social Media Scanner supports the following links:

  • Facebook
  • Google+
  • Twitter
  • LinkedIn
  • Instagram
  • Pinterest
  • Reddit
  • Tumblr
  • VK

Installation


You can install Social Media Scanner through npm:

npm install --save social-media-scanner

Usage


Setup

Start by requiring the Node module:

var socialMediaScanner = require("social-media-scanner");

Then create a scanner for a website with .scan(url). This returns a new scanner object for that website; the scan itself does not begin until you call .start().

var site1 = socialMediaScanner.scan("http://example.com");

Events

Social Media Scanner has four events: pageStart, pageDone, done and error.
You can set callbacks for these events with:

site1.on("error", function (err) {});

pageStart

This event fires when the scan of a page starts.
The callback receives two arguments: a page object and a skip function.
The page object has the following properties:

  • url: The URL of the page that just started scanning
  • key: A unique key for this page
  • found: An object with two lists: links (string[]), all links found on the page, and media (string[]), all media found on the page. Both are empty arrays at the start of the scan

Calling skip() prevents the page from being scanned:

site1.on("pageStart", function (page, skip) {
  if (page.url === "http://example.com/example") {
    skip();
    return;
  }
  console.log("Started scanning: " + page.url);
});
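
When several pages need to be skipped, the check in the callback above can be factored into a small helper. The following is only a sketch: skipMatching and its patterns argument are hypothetical names, not part of Social Media Scanner, and the helper relies solely on the pageStart callback signature described above.

```javascript
// Hypothetical helper, not part of social-media-scanner.
// Registers a pageStart handler that skips every page whose URL
// matches at least one of the given regular expressions.
function skipMatching(scanner, patterns) {
  scanner.on("pageStart", function (page, skip) {
    var shouldSkip = patterns.some(function (pattern) {
      return pattern.test(page.url);
    });
    if (shouldSkip) {
      skip();
    }
  });
}
```

It could be used as skipMatching(site1, [/\/login/, /\.pdf$/]); before the scan is started. Avoid the g flag on the patterns, because a global regular expression keeps state between test() calls.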

pageDone

This event fires when the scan of a page is done. The callback receives the same page object as pageStart, except that page.found.media and page.found.links now hold the values found on that page.

site1.on("pageDone", function (page) {
  console.log("Done scanning: " + page.url);
  console.log("Found media: " + page.found.media);
  console.log("Found links: " + page.found.links);
});

done

This event fires at the end of the scan. The callback receives a list of all media found and a list of all scanned pages.

site1.on("done", function (media, pages) {
  console.log("Found:", media); // A string array with all found media
  // Pass pages as a separate argument: concatenating it to a string would print "[object Object]"
  console.log("Scanned pages:", pages); // Each page has the same structure as the pages in the previous events
});
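
The two arguments of the done callback can be combined to see which pages each medium was found on. The sketch below assumes that pages is an array of page objects shaped as described for pageStart and pageDone; pagesByMedium is a hypothetical name and not part of Social Media Scanner.

```javascript
// Hypothetical helper, not part of social-media-scanner.
// Builds an object that maps each found medium to the URLs of the pages
// it was found on. Assumes every page looks like { url, found: { media } }.
function pagesByMedium(pages) {
  var result = {};
  pages.forEach(function (page) {
    page.found.media.forEach(function (medium) {
      if (!result[medium]) {
        result[medium] = [];
      }
      result[medium].push(page.url);
    });
  });
  return result;
}
```

Inside the done callback this could be called as console.log(pagesByMedium(pages));.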

error

This event fires every time the scanner encounters an error.

site1.on("error", function (err) {
  console.log(err);
});

Properties

A scanner has three properties:

  • max: The max amount of pages to search through (default: 100).
  • interval: The amount of milliseconds between each page scan (default: 250).
  • skipExternalResources: Whether to skip loading external JavaScript files. By default Social Media Scanner loads them, because some websites get their content from scripts (for example, React websites).
    Default: false

// skipExternalResources example
site1.on("pageStart", function (page) {
  if (page.url === "http://example.com/a-specific/path") {
    site1.skipExternalResources = false;
  } else {
    site1.skipExternalResources = true;
  }
});
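
The three properties can also be set in one place before the scan starts. The following is a minimal sketch, assuming they are plain writable properties as the example above suggests; configureScanner and its options argument are hypothetical names, not part of Social Media Scanner.

```javascript
// Hypothetical helper, not part of social-media-scanner.
// Copies the three documented properties from `options` onto the scanner,
// validating the numeric ones and coercing the flag to a boolean.
// Properties missing from `options` keep their current values.
function configureScanner(scanner, options) {
  if (options.max !== undefined) {
    if (!Number.isInteger(options.max) || options.max < 1) {
      throw new TypeError("max must be a positive integer");
    }
    scanner.max = options.max;
  }
  if (options.interval !== undefined) {
    if (typeof options.interval !== "number" || !(options.interval >= 0)) {
      throw new TypeError("interval must be a non-negative number of milliseconds");
    }
    scanner.interval = options.interval;
  }
  if (options.skipExternalResources !== undefined) {
    scanner.skipExternalResources = Boolean(options.skipExternalResources);
  }
  return scanner;
}
```

For example, configureScanner(site1, { max: 20, interval: 500 }); would limit the scan to 20 pages with half a second between page scans.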

Methods

  • getMedia(): Returns the list of media the scanner checks URLs against
  • addMedium(med: string[]|string): Adds one medium or a list of media to that list
  • removeMedium(med: string[]|string): Removes one medium or a list of media from that list
  • blockURL(url: string): Excludes the given URL from the scan
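
As an illustration of how getMedia() and removeMedium() might be combined, the sketch below narrows a scanner down to a chosen set of media. It assumes that getMedia() returns an array of strings and that removeMedium() accepts those same strings back; the exact format of the strings is not documented here. keepOnlyMedia is a hypothetical name, not part of Social Media Scanner.

```javascript
// Hypothetical helper, not part of social-media-scanner.
// Removes every medium except the ones listed in `keep` and returns
// the list of media that were removed.
// Assumes getMedia() returns an array of strings that removeMedium() accepts.
function keepOnlyMedia(scanner, keep) {
  var unwanted = scanner.getMedia().filter(function (medium) {
    return keep.indexOf(medium) === -1;
  });
  if (unwanted.length > 0) {
    scanner.removeMedium(unwanted);
  }
  return unwanted;
}
```

Since the entries must match exactly, it is worth logging site1.getMedia() first to see which strings the scanner actually uses.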

Starting the scan

After everything is set up, you can start the scan with:

site1.start();
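
For code that uses promises, the done and error events can be wrapped around start(). This is a sketch under the assumption that a scan continues after an error and still ends with done, so errors are collected instead of rejecting the promise. runScan is a hypothetical name, not part of Social Media Scanner.

```javascript
// Hypothetical helper, not part of social-media-scanner.
// Starts the scan and returns a promise that resolves when "done" fires.
// Errors are collected in an array, because the "error" event can fire
// more than once during a single scan.
function runScan(scanner) {
  return new Promise(function (resolve) {
    var errors = [];
    scanner.on("error", function (err) {
      errors.push(err);
    });
    scanner.on("done", function (media, pages) {
      resolve({ media: media, pages: pages, errors: errors });
    });
    scanner.start();
  });
}
```

It could be used as runScan(site1).then(function (result) { console.log(result.media); });. If done never fires, the promise never settles, so a caller may want to add a timeout of its own.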
