
5003 Project

Getting Started

Env Info

  • Python version: 3.9.7
  • Code formatter: autopep8
  • Spark: 3.1.2
    • Hadoop: 3.2.0
    • Scala: 2.12
    • py4j: 0.10.9
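For reference, the pinned versions above can be captured in a small helper for sanity-checking a local install. This is only a sketch; the repo itself may not ship such a check, and the dict/function names are assumptions:

```python
# Pinned versions from the Env Info list above.
PINNED = {
    "python": "3.9.7",
    "spark": "3.1.2",
    "hadoop": "3.2.0",
    "scala": "2.12",
    "py4j": "0.10.9",
}

def matches_pin(installed: str, pinned: str) -> bool:
    """True if `installed` starts with all of `pinned`'s version components,
    so a pin like "2.12" accepts any 2.12.x release."""
    parts = pinned.split(".")
    return installed.split(".")[: len(parts)] == parts
```

For example, `matches_pin("2.12.15", PINNED["scala"])` accepts any Scala 2.12.x build, while a full three-part pin like `3.9.7` requires an exact match.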

Prerequisite

  • Docker Desktop: link
    • You may need to go to Preferences > Resources to increase Docker's
      memory limit; at least 6 GB is recommended.
  • Conda: link

Steps

  1. Clone the repository and cd into 5003-project.
  2. Duplicate .env.example, rename the copy to .env, and update the credentials inside if needed.
    (Tip: if you can't find the file, it may be hidden; try opening the folder with an IDE)
  3. Update KAFKA_CONNECTION_STRING, KAFKA_TOPIC_NAME and ENV in .env accordingly.
    1. If ENV is not set or is dev, the ingestor will send messages to the local dockerized Kafka broker.
    2. [Optional] To send messages to a cloud endpoint (Azure Event Hubs) instead, update KAFKA_CONNECTION_STRING and set ENV to prod.
  4. Run docker compose pull to fetch the service images.
  5. Run docker compose up to start services.
  6. Run docker compose down to stop the services.
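The ENV switch in step 3 can be sketched as follows. The function and default names here are assumptions for illustration; see the ingestor source for the actual logic:

```python
import os

def resolve_bootstrap_servers() -> str:
    """Pick the Kafka endpoint based on the ENV variable from .env."""
    env = os.getenv("ENV", "dev")
    if env == "prod":
        # prod: use the Azure Event Hubs endpoint from the connection string
        return os.environ["KAFKA_CONNECTION_STRING"]
    # dev (or unset): local dockerized Kafka broker started by docker compose
    return "localhost:9092"
```

With ENV unset or set to dev, messages go to the local broker; setting ENV to prod routes them to the cloud endpoint configured in KAFKA_CONNECTION_STRING.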

Managing Conda Environment

  • Create conda environment with packages: conda env create -f environment.yml
  • Activate conda environment: conda activate 5003-project
  • Export conda package list: conda env export --no-builds --from-history > environment.yml
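The commands above assume an environment.yml at the project root. A hypothetical minimal version is shown below; the real file in the repo lists the project's actual dependencies:

```yaml
# Hypothetical minimal environment.yml; the repo's actual file
# pins the full dependency list exported via --from-history.
name: 5003-project
channels:
  - defaults
dependencies:
  - python=3.9.7
```

The --no-builds and --from-history flags keep the exported file portable by omitting platform-specific build strings and listing only packages that were explicitly requested.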

Start Dev Servers

  1. cd to project root
  2. API
    1. Call uvicorn src.backend_api.app.main:app --reload --env-file=".env" --app-dir="src/backend_api/app"
    2. Access docs at http://127.0.0.1:8000/latest/docs
  3. Notebook
    1. Call jupyter-lab --config=/jupyter_lab_config.py
    2. Access at http://127.0.0.1:9999/

Docker Compose

Example Files

Example notebooks can be found in the notebook directory.

Additional Setup

Pytest

  • Config at setup.cfg
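For orientation, pytest configuration in setup.cfg is usually a [tool:pytest] section like the hypothetical fragment below; the repo's actual options live in its setup.cfg:

```ini
# Hypothetical example only; see setup.cfg in the repo for the real options.
[tool:pytest]
testpaths = tests
```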

Credentials

Notebook

  • Token: 5003-project

TimescaleDB

  • DB Name: 5003-project-dev
  • Username: postgres
  • Password: 5003-project
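The credentials above can be assembled into a standard Postgres connection URL. A stdlib-only sketch follows; the localhost host and 5432 port are assumptions based on typical docker compose defaults:

```python
from urllib.parse import quote_plus

def build_dsn(user: str, password: str, db: str,
              host: str = "localhost", port: int = 5432) -> str:
    """Build a postgresql:// URL, percent-encoding the credentials."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"

print(build_dsn("postgres", "5003-project", "5003-project-dev"))
# → postgresql://postgres:5003-project@localhost:5432/5003-project-dev
```

Any Postgres client (psql, psycopg2, SQLAlchemy) accepts a URL in this form, since TimescaleDB is a Postgres extension.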

Additional Docs

Troubleshoot

  • Question: TimescaleDB keeps logging WARNING: could not open statistics file "pg_stat_tmp/global.stat": Operation not permitted.
    • Answer: This is a known issue documented on the official Postgres Docker Hub page. In short, it does not affect operation and can be safely ignored.