A Next.js project that demonstrates using Puppeteer within API routes for server-side web scraping, automation, and rendering content to a web page. This setup integrates headless-browser functionality directly into a Next.js application for dynamic content generation; a minimal sketch of the pattern follows the feature list.
- Advanced Web Scraping: Extract specific content from webpages using CSS selectors and text search
- Screenshot API: Capture screenshots of any webpage
- PDF Generation: Convert HTML to PDF documents
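For example, the Screenshot API could be implemented as a Next.js API route along these lines. This is a minimal sketch assuming the Pages Router and a `pages/api/screenshot.ts` file; the project's actual implementation may differ.

```typescript
// pages/api/screenshot.ts: a minimal sketch, not necessarily this
// project's exact implementation.
import type { NextApiRequest, NextApiResponse } from "next";
import puppeteer from "puppeteer";

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse
) {
  // Read the target URL from the query string, with a fallback.
  const url =
    typeof req.query.url === "string" ? req.query.url : "https://example.com";

  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Wait until network activity settles so the page is fully rendered.
    await page.goto(url, { waitUntil: "networkidle0" });
    const png = await page.screenshot({ type: "png" });
    res.setHeader("Content-Type", "image/png");
    res.send(Buffer.from(png));
  } finally {
    // Always release the browser, even if rendering fails.
    await browser.close();
  }
}
```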
First, install dependencies:

```bash
npm install
```

Next, run the development server:

```bash
npm run dev
```
The advanced web scraping functionality (`/scraper-ext`) provides the following (see the implementation sketch after this list):
- CSS Selector Targeting: Extract specific elements from webpages
- Text Search: Filter elements containing specific text
- Sibling Analysis: Extract content from elements that follow your target elements
- Interactive UI: User-friendly interface for configuring scraping parameters
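One way the selector and text-search extraction could be implemented is with Puppeteer's `page.$$eval`, as sketched below. The helper name `scrapeElements` and its parameters are illustrative assumptions, not this project's actual code.

```typescript
import puppeteer from "puppeteer";

// Illustrative helper (not the project's actual code): return the HTML of
// every element matching `selector`, optionally filtered to elements whose
// text contains `searchText`, plus the sibling that follows each match.
async function scrapeElements(url: string, selector: string, searchText = "") {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    return await page.$$eval(
      selector,
      (elements, needle) =>
        elements
          .filter((el) => !needle || (el.textContent ?? "").includes(needle))
          .map((el) => ({
            html: el.outerHTML,
            // "Sibling analysis": capture the element that follows the match.
            nextSibling: el.nextElementSibling?.outerHTML ?? null,
          })),
      searchText
    );
  } finally {
    await browser.close();
  }
}
```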
To use the scraping UI:
- Navigate to http://localhost:3000/scraper-ext
- Enter the URL you want to scrape
- Specify a CSS selector (e.g., `h1`, `.article-content`, `#main`)
- Optionally, enter text to search for within those elements
- View the extracted content with HTML formatting
Access the following endpoints in your browser:
- Web Scraping UI: http://localhost:3000/scraper-ext (interactive UI for extracting content from websites)
- Screenshots: http://localhost:3000/api/screenshot?url=https://www.bbc.co.uk
- PDF Generation: http://localhost:3000/api/pdf
If you have the command-line utility curl installed, you can call the endpoints directly:

```bash
curl "http://localhost:3000/api/scrape-ext?url=https://example.com&selector=h1&text=Example"
curl -O --output-dir <path_to_local_directory> "http://localhost:3000/api/screenshot?url=https://www.bbc.co.uk"
curl -O --output-dir <path_to_local_directory> "http://localhost:3000/api/pdf"
```
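The same endpoints can also be called from a short Node script (Node 18+ ships a global fetch). This sketch and its file name are illustrative; run it as an ES module (e.g., with tsx) while the dev server is up.

```typescript
// save-screenshot.ts: fetch the screenshot endpoint and write the PNG to disk.
import { writeFile } from "node:fs/promises";

const res = await fetch(
  "http://localhost:3000/api/screenshot?url=https://www.bbc.co.uk"
);
if (!res.ok) throw new Error(`Request failed: ${res.status}`);
await writeFile("screenshot.png", Buffer.from(await res.arrayBuffer()));
```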
Notes:
- Replace `<path_to_local_directory>` with a local directory such as `~/Downloads`
- It may take a few seconds to generate screenshots or PDFs
- The PDF example pulls HTML from the Next.js default page at http://localhost:3000
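For reference, the PDF endpoint could be implemented along these lines. This is a minimal sketch assuming the Pages Router and a `pages/api/pdf.ts` file, not necessarily the project's exact code.

```typescript
// pages/api/pdf.ts: a minimal sketch that renders the app's own default
// page and returns it as an A4 PDF.
import type { NextApiRequest, NextApiResponse } from "next";
import puppeteer from "puppeteer";

export default async function handler(
  _req: NextApiRequest,
  res: NextApiResponse
) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // As noted above, the example pulls HTML from the Next.js default page.
    await page.goto("http://localhost:3000", { waitUntil: "networkidle0" });
    const pdf = await page.pdf({ format: "a4", printBackground: true });
    res.setHeader("Content-Type", "application/pdf");
    res.send(Buffer.from(pdf));
  } finally {
    await browser.close();
  }
}
```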