
RobertFainarea/cheerio-mapper


cheerio-mapper

A Cheerio plugin that extracts data from an HTML string according to a mapping configuration.

Installation

npm install cheerio-mapper

Cheerio is a peer dependency, so it must also be installed in your project.

API

target properties

property    type
attr        string
data        string
innerHTML   boolean
outerHTML   boolean
text        (default)
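
To make the table above concrete, here is a small sketch of map entries that each set a different target. It is an illustration only: `targetOf` is a hypothetical helper that is not part of cheerio-mapper, and the meaning given to `data` (reading a `data-*` attribute) is an assumption. The README only demonstrates `attr`.

```javascript
// Target properties as listed in the table above; "text" is the default.
const TARGETS = ['attr', 'data', 'innerHTML', 'outerHTML'];

// Hypothetical helper (not part of cheerio-mapper): names the target an
// entry selects, falling back to "text" when none is set.
function targetOf(entry) {
  const chosen = TARGETS.find((name) => name in entry && entry[name] !== false);
  return chosen || 'text';
}

const entries = [
  { key: 'link', path: 'a', attr: 'href' },            // value of the href attribute
  { key: 'id', path: '.actor', data: 'id' },           // assumed: value of data-id
  { key: 'markup', path: '.actor', innerHTML: true },  // markup inside the match
  { key: 'name', path: 'h1' },                         // no target set: text
];

console.log(entries.map(targetOf)); // [ 'attr', 'data', 'innerHTML', 'text' ]
```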

mapping properties

property    type
key         string
path        string
multiple    boolean
nodes       array

Examples

usage

const Mapper = require('cheerio-mapper');
const result = new Mapper(html, map).extract();

basic example

const html = `
  <section>
    <div class="actor">
      <h1>Will Smith</h1>
      <p>September 25, 1968</p>
    </div>

    <div class="movie">
      <h1>Men in black</h1>
      <p>July 2, 1997</p>
    </div>
  </section>
`;
const map = [
  {
    key: 'actor',
    path: '.actor',
    nodes: [{ key: 'name', path: 'h1' }, { key: 'born', path: 'p' }],
  },
  {
    key: 'movie',
    path: '.movie',
    nodes: [{ key: 'title', path: 'h1' }, { key: 'year', path: 'p' }],
  },
];

new Mapper(html, map).extract();
/* => {
  actor: {
    name: "Will Smith",
    born: "September 25, 1968"
  },
  movie: {
    title: "Men in black",
    year: "July 2, 1997"
  }
} */

advanced example

const html = `
  <div>
    <div class="actor">
      <h1>Will Smith</h1>
      <p>Born September 25, 1968...</p>
      <ul>
        <li>I am legend</li>
        <li>Men in black</li>
        <li>Independence day</li>
      </ul>
    </div>
    <div class="actor">
      <h1>Christian Bale</h1>
      <p>Born January 30, 1974...</p>
      <ul>
        <li>The dark knight</li>
        <li>American hustle</li>
      </ul>
    </div>

    <div class="links">
      <a href="www.foobar.com?actor=smith">Will Smith</a>
      <a href="www.foobar.com?actor=bale">Christian Bale</a>
    </div>
  </div>
`;
const map = [
  {
    key: 'actors',
    path: '.actor',
    multiple: true,
    nodes: [
      { key: 'name', path: 'h1' },
      { key: 'bio', path: 'p' },
      { key: 'movies', path: 'ul li', multiple: true },
    ],
  },
  {
    key: 'linkQueries',
    path: '.links a',
    attr: 'href',
    multiple: true,
    parse: l => l.split('?')[1],
  },
];

new Mapper(html, map).extract();
/* => {
  actors: [{
    name: "Will Smith",
    bio: "Born September 25, 1968...",
    movies: ["I am legend", "Men in black", "Independence day"]
  }, {
    name: "Christian Bale",
    bio: "Born January 30, 1974...",
    movies: ["The dark knight", "American hustle"]
  }],
  linkQueries: ["actor=smith", "actor=bale"]
} */
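
The `parse` callback in the advanced example is an ordinary function applied to each extracted string, so it can be written and tested without Cheerio. The sketch below lifts the callback out and adds a hypothetical variant, `queryObject`, that returns an object instead of the raw query string. The names `queryString` and `queryObject` are illustrative and do not come from the library.

```javascript
// The callback from the advanced example, lifted out so it runs on its own.
const queryString = (href) => href.split('?')[1];

// Hypothetical variant: turn the query string into a plain object.
// URLSearchParams is a Node.js and browser global; no import is needed.
const queryObject = (href) => {
  const query = href.split('?')[1] || '';
  return Object.fromEntries(new URLSearchParams(query));
};

// The href values from the advanced example's HTML.
const hrefs = ['www.foobar.com?actor=smith', 'www.foobar.com?actor=bale'];

console.log(hrefs.map(queryString)); // [ 'actor=smith', 'actor=bale' ]
console.log(hrefs.map(queryObject)); // [ { actor: 'smith' }, { actor: 'bale' } ]
```

Either function could be passed as `parse` in a map entry in place of the inline arrow function.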

License

ISC
