web-scraping

Here are 5,275 public repositories matching this topic...

crawlee

Crawlee: a web scraping and browser automation library for Node.js, used to build reliable crawlers in JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs, and download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP, in both headful and headless mode, with proxy rotation.

  • Updated Jun 4, 2024
  • TypeScript

This project automates the scraping of news articles from the United Daily News (UDN) website, filters and processes them using specified keywords and OpenAI's GPT for Named Entity Recognition (NER), and exports the categorized data into a CSV file.

  • Updated Jun 4, 2024
  • Jupyter Notebook
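The UDN project's own notebook is not reproduced here. As a rough illustration of its keyword-filtering and CSV-export steps, the sketch below assumes articles have already been scraped into simple records. The `Article` type, its fields, and the helper names `filter_by_keywords` and `to_csv` are hypothetical, and the GPT-based NER step is left out entirely.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical record for one scraped article; field names are illustrative only.
    title: str
    url: str
    body: str


def filter_by_keywords(articles, keywords):
    """Keep articles whose title or body contains at least one keyword.

    Matching is a case-insensitive substring check. Returns a list of
    (article, matched_keywords) pairs, with keywords in lowercase.
    """
    lowered = [k.lower() for k in keywords]
    kept = []
    for article in articles:
        text = f"{article.title} {article.body}".lower()
        matched = [k for k in lowered if k in text]
        if matched:
            kept.append((article, matched))
    return kept


def to_csv(rows):
    """Serialize (article, matched_keywords) pairs to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["title", "url", "matched_keywords"])
    for article, matched in rows:
        # Multiple matched keywords are joined into one semicolon-separated cell.
        writer.writerow([article.title, article.url, ";".join(matched)])
    return buffer.getvalue()


if __name__ == "__main__":
    articles = [
        Article("Typhoon update", "https://example.com/a", "Heavy rain expected in Taipei."),
        Article("Market close", "https://example.com/b", "Stocks rose slightly."),
    ]
    print(to_csv(filter_by_keywords(articles, ["Taipei", "typhoon"])))
```

A plain substring check keeps the sketch dependency-free; a real pipeline over Chinese-language news would likely need proper tokenization, and the category column described above would come from the NER output rather than from keyword matches.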
