This repository contains a simple web scraper written in Go. The scraper connects to a specified website over HTTPS, retrieves the HTML content, and extracts specific data based on HTML class names, IDs, and tags.
- Connects to websites using HTTPS.
- Reads and processes HTTP response headers.
- Handles both fixed `Content-Length` and `Transfer-Encoding: chunked` responses.
- Extracts content from HTML based on class names, IDs, and HTML tags.
- `main.go`: Contains the main function that drives the web scraping process.
- `scraper.go`: Includes the `FindStringInTag`, `FindContentByID`, and `FindContentByClass` functions, which are used for parsing HTML and extracting content.
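To give a feel for class-based extraction, the following is a rough sketch using plain string searching. The function `findByClass` is hypothetical and is not the repository's `FindContentByClass`, whose signature is not shown here. It assumes the attribute is written exactly as `class="name"` and that the element contains only text.

```go
package main

import (
	"fmt"
	"strings"
)

// findByClass is a hypothetical helper (not the repository's API).
// It returns the text that follows each opening tag carrying exactly
// class="<class>", up to the next '<'. It does not handle multiple
// classes, single quotes, or nested elements.
func findByClass(html, class string) []string {
	var out []string
	marker := `class="` + class + `"`
	rest := html
	for {
		i := strings.Index(rest, marker)
		if i < 0 {
			return out
		}
		rest = rest[i+len(marker):]
		// Skip to the end of the opening tag.
		gt := strings.IndexByte(rest, '>')
		if gt < 0 {
			return out
		}
		rest = rest[gt+1:]
		// The element text runs until the next tag begins.
		lt := strings.IndexByte(rest, '<')
		if lt < 0 {
			return out
		}
		out = append(out, strings.TrimSpace(rest[:lt]))
		rest = rest[lt:]
	}
}

func main() {
	page := `<div class="title">Go Scraper</div><p class="note">first</p><p class="note"> second </p>`
	for _, text := range findByClass(page, "note") {
		fmt.Println(text)
	}
}
```

String searching keeps the sketch dependency-free, but a real HTML parser is more robust against attribute ordering and nesting.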
To use this scraper, you need to have Go installed on your machine. Download and install Go if you haven't already.
Clone the repository to your local machine:
git clone https://github.com/araujo88/GoScavenger.git
cd GoScavenger
- Open `main.go`.
- Modify the `server` variable to specify the website you want to scrape.
- Optionally, adjust the request headers according to your requirements.
- Run the scraper:
go run .
The output will be printed to the console.
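For orientation, the sketch below shows roughly what changing the target server and request headers could look like. It is an assumption-laden illustration, not the contents of `main.go`: the helpers `buildRequest` and `fetch`, the header values, and the `example.com` default are all hypothetical. The network request is only made when a host name is passed as a command-line argument.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"os"
	"strings"
)

// buildRequest is a hypothetical helper that assembles a minimal
// HTTP/1.1 GET request with optional extra header lines.
func buildRequest(host, path string, extraHeaders []string) string {
	var b strings.Builder
	b.WriteString("GET " + path + " HTTP/1.1\r\n")
	b.WriteString("Host: " + host + "\r\n")
	for _, h := range extraHeaders {
		b.WriteString(h + "\r\n")
	}
	// Ask the server to close the connection so the read below terminates.
	b.WriteString("Connection: close\r\n\r\n")
	return b.String()
}

// fetch is a hypothetical helper that opens a TLS connection to
// host:443, sends the request, and returns the raw response.
func fetch(host, request string) (string, error) {
	conn, err := tls.Dial("tcp", host+":443", nil)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := io.WriteString(conn, request); err != nil {
		return "", err
	}
	raw, err := io.ReadAll(conn)
	// Some servers close without a TLS close notification; keep what was read.
	if err != nil && len(raw) == 0 {
		return "", err
	}
	return string(raw), nil
}

func main() {
	server := "example.com" // placeholder; pass a host as the first argument to fetch it
	if len(os.Args) > 1 {
		server = os.Args[1]
	}
	req := buildRequest(server, "/", []string{"User-Agent: demo-scraper", "Accept: text/html"})
	fmt.Print(req)
	if len(os.Args) > 1 {
		resp, err := fetch(server, req)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(len(resp), "bytes received")
	}
}
```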
Contributions to improve this simple web scraper are welcome. Feel free to fork the repository and submit pull requests.
This project is licensed under the GPL. See the LICENSE file for details.