ahmadnouh97/blog-scraper-qa

Blog Scraper

A simple Golang tool for scraping and storing blog posts. This tool extracts titles, authors, publication dates, and content, saving them to a local SQLite3 database.

Features

  • Scrapes titles, authors, publication dates, and content from blog posts
  • Stores the scraped posts in a local SQLite3 database for quick access

Tech Stack

  • Golang: Core programming language
  • goose: For database migrations
  • SQLite3: For lightweight data storage
  • html-to-markdown: For converting HTML blog content to Markdown

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/blog-scraper.git
  2. Install dependencies:

    go mod tidy
  3. Run the scraper:

    go run cmd/main/main.go

Migrations

  1. Create a migration file:

    goose -dir ./db/migrations create <migration_name> sql
  2. Run the migrations:

    goose -dir ./db/migrations sqlite3 ./db/blogs.db up
  3. Roll back the last migration:

    goose -dir ./db/migrations sqlite3 ./db/blogs.db down
  4. Check migration status:

    goose -dir ./db/migrations sqlite3 ./db/blogs.db status

Build & Run with Docker

  • Build:

    docker build --tag IMAGE_NAME .
  • Run:

    docker run -e SECRET_KEY="SECRET_KEY" -p 8000:8000 IMAGE_NAME

    SECRET_KEY: The secret key used for authorization, passed to the container as an environment variable.
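
For illustration only, the sketch below shows one way a Go service could read `SECRET_KEY` from the environment and require it on incoming requests on port 8000. The README does not say how the key is checked, so the `Authorization: Bearer <key>` scheme, the `/health` route, and the helper names `authorized` and `requireKey` are all assumptions.

```go
package main

import (
	"crypto/subtle"
	"log"
	"net/http"
	"os"
	"strings"
)

// authorized reports whether an Authorization header carries the expected key.
// The "Bearer <key>" scheme is an assumption, not this project's documented API.
func authorized(header, secret string) bool {
	const prefix = "Bearer "
	if secret == "" || !strings.HasPrefix(header, prefix) {
		return false
	}
	token := strings.TrimPrefix(header, prefix)
	// Constant-time comparison avoids leaking how many bytes matched.
	return subtle.ConstantTimeCompare([]byte(token), []byte(secret)) == 1
}

// requireKey wraps a handler and answers 401 when the key is missing or wrong.
func requireKey(secret string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorized(r.Header.Get("Authorization"), secret) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	// SECRET_KEY is supplied with `docker run -e SECRET_KEY=...`.
	secret := os.Getenv("SECRET_KEY")
	if secret == "" {
		log.Fatal("SECRET_KEY is not set")
	}

	mux := http.NewServeMux()
	// Hypothetical route, used only to demonstrate the wrapper.
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	// Port 8000 matches the -p 8000:8000 mapping in the run command.
	log.Fatal(http.ListenAndServe(":8000", requireKey(secret, mux)))
}
```

Under these assumptions, a client would send the key as `Authorization: Bearer <key>` on every request.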
