agaricide/mugshots-client

mugshots-client

About

Unofficial Node.js client for mugshots.com. Exposes both a Readable Stream and an Async Iterator API for streaming Mugshot objects. 🚔👮

Usage

Install

npm i mugshots-client --save

Import

TypeScript

import { MugshotStream, Mugshot } from 'mugshots-client';

JavaScript (CommonJS)

const { MugshotStream } = require('mugshots-client');

API

Readable Stream

import { MugshotStream, Mugshot } from 'mugshots-client';

(async () => {
  // Each 'data' event delivers an array (chunk) of Mugshot objects.
  const mugshotStream = await MugshotStream({ maxChunkSize: 10 });
  console.log('Stream created.');

  mugshotStream.on('error', (error) => {
    console.error(error);
  });

  mugshotStream.on('close', () => {
    console.log('Stream closed.');
  });

  mugshotStream.on('data', (mugshots: Mugshot[]) => {
    console.log('data', mugshots);
  });
})();

Async Iterator

import * as puppeteer from 'puppeteer';
import {
  CountyIterable,
  MugshotUrlChunkIterable,
  scrapeMugshots,
  PagePool,
  Mugshot
} from 'mugshots-client';

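(async () => {
  const browser = await puppeteer.launch();
  try {
    // Pages are borrowed from a shared pool rather than opened ad hoc.
    const pagePool = PagePool(browser, { max: 10 });
    const page = await pagePool.acquire();

    // Walk every county, then every chunk of mugshot URLs within it.
    const counties = await CountyIterable(page);
    for await (const county of counties) {
      const mugshotUrls = await MugshotUrlChunkIterable(page, county);
      for await (const chunk of mugshotUrls) {
        const mugshots = await scrapeMugshots(pagePool, chunk, { maxChunkSize: 20 });
        console.log(mugshots);
      }
    }
  } finally {
    // Close the browser so the process can exit even if scraping fails.
    await browser.close();
  }
})();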
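
Node.js Readable streams are themselves async iterable, so a stream like the one in the first example can also be consumed with `for await` instead of event handlers. The sketch below is not the library's code: `fakeMugshotStream` and `MugshotLike` are hypothetical stand-ins, and it assumes `MugshotStream` resolves to a standard object-mode Readable that emits arrays of records, as the `'data'` handler above suggests.

```typescript
import { Readable } from 'stream';

// Hypothetical stand-in for the library's Mugshot type; the real fields may differ.
interface MugshotLike {
  name: string;
}

// Hypothetical stand-in for MugshotStream: an object-mode Readable
// that emits one array (chunk) of records per read.
function fakeMugshotStream(): Readable {
  const chunks: MugshotLike[][] = [
    [{ name: 'a' }, { name: 'b' }],
    [{ name: 'c' }],
  ];
  // Readable.from iterates the outer array, so each inner array is one chunk.
  return Readable.from(chunks);
}

// Collect every record by iterating the stream chunk by chunk.
// A stream 'error' surfaces as a rejected promise, so try/catch (or .catch)
// takes the place of a separate 'error' listener.
async function collectAll(stream: Readable): Promise<MugshotLike[]> {
  const all: MugshotLike[] = [];
  for await (const chunk of stream) {
    all.push(...(chunk as MugshotLike[]));
  }
  return all;
}

collectAll(fakeMugshotStream())
  .then((all) => console.log(`received ${all.length} records`))
  .catch((error) => console.error(error));
```

With `for await`, the next chunk is not pulled until the loop body finishes, which can be a simpler way to avoid buffering more records than the consumer can process.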

Docs

FAQ

Why'd you make this? Isn't www.mugshots.com immoral?

My goals are to:

  1. subvert mugshots.com by making the watermarked records they re-publish from the public domain freely available for anyone to use
  2. bring attention to the moral implications of open records on the internet
  3. use this library for inequality and social justice research

Why'd you use Puppeteer? Isn't cheerio faster, and doesn't it use fewer resources?

I chose Puppeteer because it offers a path toward making scraping harder to detect, which helps future-proof this software against censorship or TOS changes.

Here is an article on making headless Chrome undetectable. My goal is to provide an API for building an undetectable scraper: if we manipulate the Chrome browser's behavior and properties to mimic a human user's browser, scraping should become very difficult to detect.