
Data Processing Pipeline

Ding Ma edited this page Nov 5, 2020 · 3 revisions

Diagram of Data Processing Pipeline


Specs of our pipeline

As students, we want to keep costs as low as possible. Through testing, we found that an E2 micro instance is sufficient for our scraper.

  • VMs:
    • e2-micro instance (2 vCPUs, 1 GB RAM)
    • OS: Container-Optimized OS
    • Storage: 10 GB persistent disk
  • Cloud Function: a serverless function configured with 216 MB of RAM. It is triggered whenever a file is uploaded to the bucket.
  • Cloud Storage: 5 GB, located in us-east1 (South Carolina).
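The page does not show the function's code. As a rough illustration of how an upload-triggered function fits into the pipeline, here is a minimal sketch that assumes a Python runtime and the 1st-gen background-function signature `(event, context)`, deployed with the `google.storage.object.finalize` trigger on the bucket. The handler name `on_upload` and the placeholder processing step are hypothetical.

```python
def on_upload(event, context):
    """Hypothetical 1st-gen background Cloud Function for Cloud Storage uploads.

    Assumes it is deployed with the google.storage.object.finalize trigger, so it
    runs once for every object written to the bucket. `event` is a dict describing
    the uploaded object; `context` carries event metadata (ID, type, timestamp).
    """
    bucket = event["bucket"]  # name of the bucket that received the upload
    name = event["name"]      # object path inside the bucket
    uri = f"gs://{bucket}/{name}"

    # Placeholder: the real pipeline's processing step is not described on this page.
    print(f"Processing {uri}")
    return uri
```

Because the platform invokes the function once per uploaded object, the scrapers only need to write their output to the bucket. No polling or scheduling is needed on the processing side.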

Cost analysis

  • VMs: $0.009/h per instance. We run four, so the total is about $0.036/h, or roughly $25 to $26 per month.
  • Function: the free tier is very generous, and we don't expect to exceed it.
  • Storage: the first 5 GB are free, and we don't expect to exceed that either.

Total cost: around $25/month if the scrapers run 24/7.
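The estimate above is simple arithmetic. The sketch below reproduces it under two assumptions: the rounded $0.009/h per-instance rate from this page, and an average month of 730 hours (8760 h / 12). The Cloud Function and Cloud Storage are counted as $0 because we expect to stay within the free tier. The helper name `monthly_vm_cost` is hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: average hours per month (8760 h / 12)


def monthly_vm_cost(hourly_rate, instances, hours=HOURS_PER_MONTH):
    """Cost in dollars of running `instances` VMs around the clock for `hours`."""
    return hourly_rate * instances * hours


hourly_total = 0.009 * 4                 # four instances at $0.009/h each
vm_month = monthly_vm_cost(0.009, 4)     # all four VMs running 24/7
function_month = 0.0                     # assumed to stay within the free tier
storage_month = 0.0                      # 5 GB assumed to stay within the free tier

print(f"VMs: ${hourly_total:.3f}/h")
print(f"Total: ${vm_month + function_month + storage_month:.2f}/month")
```

The result, about $26 per month, matches the "around $25/month" figure. The gap comes from rounding the hourly rate and from the number of hours assumed per month.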
