A simple Golang tool for scraping and storing blog posts. This tool extracts titles, authors, publication dates, and content, saving them to a local SQLite3 database.
- Scrapes blog post titles, authors, publication dates, and content
- Stores blog data in SQLite3 for quick access
- Golang: Core programming language
- goose: For DB Migrations
- SQLite3: For lightweight data storage
- html-to-markdown: For converting HTML blog content to Markdown
- Clone the repository:
    git clone https://github.com/your-username/blog-scraper.git
- Install dependencies:
    go mod tidy
- Run the scraper:
    go run cmd/main/main.go
- Create a migration file:
    goose -dir ./db/migrations create <migration_name> sql
- Run the migration:
    goose -dir ./db/migrations sqlite3 ./db/blogs.db up
- Roll back the last migration:
    goose -dir ./db/migrations sqlite3 ./db/blogs.db down
- Check migration status:
    goose -dir ./db/migrations sqlite3 ./db/blogs.db status
- Build:
    docker build --tag IMAGE_NAME .
- Run:
    docker run -e SECRET_KEY="SECRET_KEY" -p 8000:8000 IMAGE_NAME
  - SECRET_KEY: A secret key for authorization.