Article Metadata Extractor

The Article Metadata Extractor is a JavaScript library that extracts essential metadata from web pages containing articles. It is designed for Node.js 18+.

How It Works

The library fetches the HTML content of a given URL, then parses and traverses it with Cheerio to extract the following information:

Title: The title of the article.
Image: The illustrative image associated with the article.
Author: The name of the article's author.
Tags: A list of keywords or tags associated with the article.
Publication Date: The date when the article was published.
Read Time: An estimated reading time for the article in minutes.
Description: A short excerpt or description of the article.
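The Read Time field is described as an estimate based on an average reading speed. A minimal sketch of that kind of calculation follows. The estimateReadTime helper and the 200 words-per-minute default are assumptions made for illustration, not the library's documented implementation or value.

```javascript
// Hypothetical helper: estimates reading time in minutes from plain text.
// The 200 words-per-minute default is an assumption, not a value taken from the library.
function estimateReadTime(text, wordsPerMinute = 200) {
  // Split on runs of whitespace and drop empty strings to count words.
  const words = text.trim().split(/\s+/).filter(Boolean);
  // Round up so that any non-empty text reports at least one minute.
  return Math.ceil(words.length / wordsPerMinute);
}

const sample = 'word '.repeat(450);
console.log(estimateReadTime(sample)); // 450 words at 200 wpm -> 3
```

Rounding up is one possible design choice; rounding to the nearest minute would be equally reasonable, and the library may do either.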

With NPX

npx article-metadata-extractor <ARTICLE-URL>

Using in Project

Install the dependency:

npm install article-metadata-extractor

Import the getArticleMetaData function into your project and use it:

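// Assumed import form; check the package's index.js for the actual export shape.
const { getArticleMetaData } = require('article-metadata-extractor');

const url = 'https://example.com/article';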

getArticleMetaData(url)
  .then(metadata => {
    console.log(metadata);
    // Use the extracted metadata as needed
  })
  .catch(error => {
    console.error('Error:', error.message);
  });
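When several URLs need processing, one failing page should not abort the rest. The sketch below shows one possible way to do that. The extractAll wrapper is hypothetical and takes the extractor function as a parameter, so it makes no assumption about how the package exports getArticleMetaData. The fakeExtract function is a stand-in used only so the example runs without network access.

```javascript
// Hypothetical wrapper: `extract` is any function that takes a URL and
// returns a promise of metadata (for example, getArticleMetaData).
async function extractAll(extract, urls) {
  // Promise.allSettled waits for every call and never rejects as a whole.
  const results = await Promise.allSettled(urls.map((url) => extract(url)));
  return results.map((result, i) => ({
    url: urls[i],
    metadata: result.status === 'fulfilled' ? result.value : null,
    error:
      result.status === 'rejected'
        ? String(result.reason?.message ?? result.reason)
        : null,
  }));
}

// Stand-in extractor, used only to make this sketch runnable offline.
const fakeExtract = async (url) => {
  if (url.includes('missing')) throw new Error('Not found');
  return { title: `Title for ${url}` };
};

extractAll(fakeExtract, [
  'https://example.com/a',
  'https://example.com/missing',
]).then((rows) => console.log(rows));
```

Each row carries either the metadata or an error message, so callers can report failures per URL.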

Output

/**
 * Represents the extracted metadata from an article's webpage.
 */
interface ArticleMetadata {
  /**
   * The title of the article.
   */
  title: string;

  /**
   * The URL of the illustrative image associated with the article.
   */
  image: string;

  /**
   * The name of the article's author.
   */
  author: string;

  /**
   * An array of keywords or tags associated with the article.
   */
  tags: string[];

  /**
   * The date when the article was published in ISO 8601 format.
   */
  publicationDate: string | null;

  /**
   * An estimated reading time for the article in minutes based on an average reading speed.
   */
  readTime: number;

  /**
   * A short excerpt or description of the article.
   */
  description: string;
}
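Because publicationDate is typed as string | null, code that consumes the result should check it before parsing. The following sketch illustrates this with a sample object shaped like the interface above. All values in it are invented, and summarize is a hypothetical helper, not part of the library.

```javascript
// Sample object shaped like the ArticleMetadata interface; all values are invented.
const metadata = {
  title: 'Example Article',
  image: 'https://example.com/cover.png',
  author: 'Jane Doe',
  tags: ['javascript', 'metadata'],
  publicationDate: null,
  readTime: 4,
  description: 'A short example description.',
};

// Hypothetical helper: builds a one-line summary and tolerates a null date.
function summarize(meta) {
  // publicationDate is an ISO 8601 string or null, so check before parsing.
  const published = meta.publicationDate
    ? new Date(meta.publicationDate).toISOString().slice(0, 10)
    : 'unknown date';
  return `${meta.title} by ${meta.author} (${published}, ${meta.readTime} min read)`;
}

console.log(summarize(metadata));
// -> Example Article by Jane Doe (unknown date, 4 min read)
```

Note that toISOString throws a RangeError for a string that is not a valid date, so input from untrusted pages may need an additional validity check.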

Licence

This code is released under the MIT License.
