This repository has been archived by the owner on Mar 28, 2024. It is now read-only.

Node.js PC part data and pricing scraper for the PCPartsTool project



PScoriae/PCPartsTool-Scraper


PCPartsTool Scraper

Table of Contents
  1. About
  2. Installation
  3. Deployment

About

This project is the companion scraping tool to the PCPartsTool WebApp. It gathers data on the most popular PC hardware parts from Lazada.com.my and pushes that data to the PCPartsTool database.
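Before scraped listings can be pushed to a database, raw page text such as prices usually needs cleaning. The following is a minimal, hypothetical sketch of that normalization step, not this repository's code. The input field names (`name`, `priceText`, `url`), the record shape, and the `RM1,299.00` price format are all assumptions for illustration.

```javascript
// Hypothetical sketch: normalize one scraped listing into a database-ready record.
// Field names and the "RM1,299.00" price format are assumptions, not the scraper's real schema.

// Parse a ringgit price string such as "RM1,299.00" into a number.
// Returns null when no number starting with a digit is found.
function parsePrice(priceText) {
  const match = /\d[\d,]*(?:\.\d+)?/.exec(String(priceText));
  if (!match) return null;
  const value = Number(match[0].replace(/,/g, "")); // drop thousands separators
  return Number.isFinite(value) ? value : null;
}

// Build a record from a hypothetical scraped item and tag it with its part category.
function toRecord(item, category) {
  return {
    name: item.name.trim(),
    category,
    price: parsePrice(item.priceText),
    url: item.url,
    scrapedAt: new Date().toISOString(),
  };
}

console.log(parsePrice("RM1,299.00")); // 1299
```

Keeping the price as a plain number (rather than the display string) is what makes sorting and price comparison possible on the web app side; an unparseable price becomes `null` instead of a misleading `0`.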

Note: This is just one of multiple repositories that contribute to the PCPartsTool project. Here are all the related repositories:

| Repository | Built With | Description |
|---|---|---|
| PCPartsTool | SvelteKit, TypeScript, Tailwind CSS, MongoDB, Jenkins, Docker, Playwright | The SvelteKit MongoDB WebApp |
| PCPartsTool-Scraper | JavaScript, Jenkins, Docker | Scraping Script to Gather E-commerce Item Data |
| terraform-infra | Terraform, Cloudflare, AWS | Terraform IaC for PCPartsTool Cloud Infrastructure |
| ansible-ec2 | Ansible, Prometheus, Grafana, Nginx, AWS | Ansible CaC for AWS EC2 Bootstrapping, Observability and Maintenance |

Installation

This section explains how to set up the scraper for use within the PCPartsTool project.

  1. Fork the repository.

  2. In your desired project folder, clone the project with the following command:

    git clone https://github.com/yourUsername/PCPartsTool-Scraper
  3. Add a .env file to the root directory of your project, using .env.example as a template. The .env file tells the scraper which database to connect to.

  4. Ensure pnpm is installed globally on your dev system. If not, run the following command in your terminal:

    npm i -g pnpm
  5. Finally, install all dependencies:

    pnpm i
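The `.env` file from step 3 is plain `KEY=value` text. A minimal sketch of how a Node.js script could read and validate such a file without extra dependencies is shown below; many projects use a library such as dotenv for the parsing step instead, and this sketch does not claim to reflect how this scraper loads its settings. The key `MONGODB_URI` is a hypothetical placeholder, so check `.env.example` for the names the scraper actually expects.

```javascript
// Minimal sketch: parse .env-style text and fail fast if required keys are missing.
// MONGODB_URI is a hypothetical key name; see .env.example for the real ones.

function parseEnv(text) {
  const env = {};
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const eq = line.indexOf("=");
    if (eq === -1) continue; // ignore malformed lines
    const key = line.slice(0, eq).trim();
    let value = line.slice(eq + 1).trim(); // everything after the first "=" (URIs may contain "=")
    const first = value[0];
    const last = value[value.length - 1];
    // Strip one pair of matching surrounding quotes, if present.
    if (value.length >= 2 && (first === '"' || first === "'") && last === first) {
      value = value.slice(1, -1);
    }
    env[key] = value;
  }
  return env;
}

function requireKeys(env, keys) {
  const missing = keys.filter((k) => !env[k]); // empty values count as missing
  if (missing.length > 0) {
    throw new Error(`Missing required .env keys: ${missing.join(", ")}`);
  }
  return env;
}

const config = requireKeys(
  parseEnv('MONGODB_URI="mongodb://localhost:27017/pcparts"'),
  ["MONGODB_URI"],
);
console.log(config.MONGODB_URI); // mongodb://localhost:27017/pcparts
```

Failing at startup with a clear list of missing keys is usually easier to debug than a connection error surfacing halfway through a scrape.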

Deployment

Locally

Use these steps if you want to host the scraper locally with the docker-compose.local.yaml file in PCPartsTool.

  1. Dockerize the project using the following command:
    docker build -t pcpartstool-scraper:latest .
  2. Continue with the local deployment instructions in PCPartsTool.

Cloud

This assumes that you have followed the setup flow of the other repositories in the PCPartsTool project. On each push to main, Jenkins builds a Docker image of this project and stores it in the local registry. The main PCPartsTool Jenkins pipeline then pulls that image.

  1. Set up a GitHub webhook on your forked repository that points to your Jenkins instance.
  2. Add a new Jenkins Pipeline job, point it to your forked repository, and enable the following:
    • Do not allow concurrent builds
    • GitHub hook trigger for GITScm polling
    • Pipeline script from SCM
    • Repository URL: your forked repository's URL
    • Branches to build: */main
    • Additional Behaviours:
      • Polling ignores commits in certain paths: add README.md under Excluded regions
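The Excluded regions setting is what keeps README-only pushes from triggering a rebuild. The rule can be modeled roughly as follows: a push triggers a build only if at least one changed file falls outside every excluded pattern. This is a simplified illustration, not Jenkins's implementation; it assumes each Excluded regions line is a regular expression matched against the whole file path, and JavaScript regex syntax differs slightly from the Java patterns Jenkins uses.

```javascript
// Simplified model of Jenkins "Polling ignores commits in certain paths".
// Assumption: each excluded region is a regex that must match the entire path.

function isExcluded(path, excludedRegions) {
  return excludedRegions.some((pattern) => new RegExp(`^(?:${pattern})$`).test(path));
}

function shouldTriggerBuild(changedPaths, excludedRegions) {
  // Build only if some changed file is not covered by any exclusion.
  return changedPaths.some((path) => !isExcluded(path, excludedRegions));
}

const excluded = ["README.md"];
console.log(shouldTriggerBuild(["README.md"], excluded)); // false
console.log(shouldTriggerBuild(["README.md", "index.js"], excluded)); // true
```

In other words, a commit that edits both the README and a source file still produces a new image; only documentation-only pushes are skipped.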
