GitHub - dorianfong98/web-scraper: A web scraper built to extract data from any web page quickly and accurately. This specifically extracts headlines and article links from UK news source The Guardian, but you may modify and adapt the code accordingly for any other Website's HTML elements.

Table of Contents

About The Project
Getting Started
Contributing
Contact

About The Project

This is a Web Scraper application built to extract data from any web page quickly and accurately.

Built with JavaScript and Node.js, using the Express.js, Axios, and Cheerio packages from npm.

The application as I have coded it extracts headlines and article links from the UK news source The Guardian, but you can modify the code to target the HTML elements of any other website.

Express.js is a Node.js back-end web application framework that provides a broad set of features for building web and mobile applications. It can be used to build single-page, multi-page, and hybrid web applications.
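As a rough sketch of how Express could serve scraped results, the snippet below registers a single JSON route on an Express-style app. The names `registerArticleRoute` and `getArticles` and the `/articles` path are hypothetical and are not taken from this repository's index.js; only port 3000 comes from this README.

```javascript
// Hypothetical sketch: expose scraped articles as JSON through an Express app.
// `app` is an Express application (or any object with a compatible `get` method);
// `getArticles` is an assumed async function that returns an array of articles.
function registerArticleRoute(app, getArticles) {
  app.get('/articles', async (req, res) => {
    try {
      const articles = await getArticles();
      res.json(articles);
    } catch (err) {
      // Report upstream failures (network errors, changed markup) to the client.
      res.status(502).json({ error: err.message });
    }
  });
}

// Typical wiring (requires `npm install express`):
//   const express = require('express');
//   const app = express();
//   registerArticleRoute(app, async () => [{ title: 'Example', url: 'https://example.com' }]);
//   app.listen(3000, () => console.log('Listening on port 3000'));
```

Keeping the route registration separate from `app.listen` makes it possible to try the handler with a stand-in `app` object before any server is started.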

Cheerio is a package to pick out HTML elements on a web page. It works by parsing markup and provides an API for traversing and manipulating the resulting data structure. Cheerio's selector implementation is nearly identical to that of jQuery.

Axios, a popular and widely used package, is a promise-based HTTP client for the browser and Node.js. It makes it easy to send HTTP requests to REST endpoints and perform CRUD operations: it can be used to GET, POST, PUT, and DELETE data.
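A scraper only needs the GET part of this. The sketch below shows that step under a few assumptions: the `fetchHtml` helper and the 10-second timeout are illustrative, and the client is passed in as a parameter so the function can be tried without network access. In practice you would pass the `axios` module itself.

```javascript
// Hypothetical helper: download a page's HTML using an Axios-compatible client.
// `client` is expected to expose `get(url, config)` returning a promise that
// resolves to a response object whose `data` property holds the body.
async function fetchHtml(url, client) {
  const response = await client.get(url, {
    timeout: 10000, // give up after 10 seconds (assumed value)
  });
  return response.data;
}

// Typical usage (requires `npm install axios`):
//   const axios = require('axios');
//   fetchHtml('https://www.theguardian.com/uk', axios)
//     .then((html) => console.log(html.length))
//     .catch((err) => console.error('Request failed:', err.message));
```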

Getting Started

To get started using the application, simply download the .zip file and open the web-scraper.exe executable.

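Note: The application listens on port 3000, so make sure that port is not already in use. To find and kill a process running on the port:

On Linux/macOS, run the following in the terminal (prefix with sudo if required), replacing <PID> with the process ID reported by lsof:

$ lsof -i tcp:3000
$ kill -9 <PID>

On Windows, replace <PID> with the process ID shown in the last column of the netstat output:

netstat -ano | findstr :3000
tskill <PID>

If tskill is not available on your edition of Windows, taskkill /PID <PID> /F does the same job.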
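The port check can also be done from Node.js itself. The sketch below uses only the built-in `net` module; `isPortFree` is a hypothetical helper name, and it tests availability by briefly binding to the port on the default address, so a process bound to a different interface may not be detected.

```javascript
const net = require('net');

// Hypothetical helper: resolves to true if `port` can be bound, false if it
// is already in use (EADDRINUSE). Other errors are passed to the caller.
function isPortFree(port) {
  return new Promise((resolve, reject) => {
    const probe = net.createServer();
    probe.once('error', (err) => {
      if (err.code === 'EADDRINUSE') {
        resolve(false);
      } else {
        reject(err);
      }
    });
    probe.once('listening', () => {
      // The port was free; release it again before reporting.
      probe.close(() => resolve(true));
    });
    probe.listen(port);
  });
}

isPortFree(3000)
  .then((free) => {
    console.log(free ? 'Port 3000 is free' : 'Port 3000 is already in use');
  })
  .catch((err) => console.error('Could not check port 3000:', err.message));
```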


Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

1. Fork the Project
2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
3. Commit your Changes (git commit -m 'Add some AmazingFeature')
4. Push to the Branch (git push origin feature/AmazingFeature)
5. Open a Pull Request


Contact

Dorian Fong - My Website | My Email | My LinkedIn


