Home
Welcome to the GDD App wiki! This collection of documents provides a comprehensive overview of the system architecture, development practices, and operational details.
- System Architecture - An overview of system architecture and its components.
- Data Architecture - Details on how data moves through the system and its structure.
- API Documentation - Information on the available API endpoints and usage.
- Yr API Compliance - Details regarding compliance with the yr.no API terms of service.
- AWS Services - Overview of AWS services utilized by the project.
- Containerisation - Information on Docker and container orchestration.
- Reverse Proxy - Details on Nginx configuration and routing.
- Security Architecture - Overview of the security measures and design.
- Testing Plan - Outline of the testing strategy, types of tests, and processes.
Please refer to the respective documents for detailed information on each topic.
The README focuses on setting up and running the application using Docker Compose for development purposes.
Detailed documentation of the application's components is maintained on the repository's wiki.
Ensure you have the following installed:
- Docker Desktop
- Clone the repository:

git clone https://github.com/datatribe-collective/gdd-app.git
cd gdd-app
- Create a `.env` file:
Copy the example environment file and fill in the required values. This file contains credentials and configuration for the database, Airflow, and MinIO.

cp example.env .env
# Edit .env and fill in your details

Make sure to define at least the following variables in your `.env` file (a sample sketch follows the list):
- `POSTGRES_USER_E`, `POSTGRES_PASSWORD_E`, `POSTGRES_DB_E` (for the PostgreSQL database used for Airflow metadata)
- `AIRFLOW_USER_E`, `AIRFLOW_PASSWORD_E` (for the Airflow admin user)
- `MINIO_ACCESS_KEY`, `MINIO_SECRET_KEY` (for MinIO credentials)
- `MINIO_DATA_BUCKET_NAME` (the default MinIO bucket; create it manually if needed prior to running DAGs; a creation sketch appears under the MinIO Console entry below)
- `MINIO_API_PORT_E`, `MINIO_CONSOLE_PORT_E` (optional; specify ports if needed, defaults are 9000 and 9001)
- `NGINX_ALLOWED_IP_1E`, `NGINX_ALLOWED_IP_2E` (for Nginx access control)
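For illustration, a minimal `.env` might look like the sketch below. The variable names follow the list above, but the values are placeholders rather than real credentials, and the bucket name and IPs are assumptions to adjust for your setup.

```bash
# PostgreSQL (Airflow metadata database)
POSTGRES_USER_E=airflow
POSTGRES_PASSWORD_E=change-me
POSTGRES_DB_E=airflow

# Airflow admin user
AIRFLOW_USER_E=admin
AIRFLOW_PASSWORD_E=change-me-too

# MinIO credentials and default data bucket
MINIO_ACCESS_KEY=local-access-key
MINIO_SECRET_KEY=local-secret-key
MINIO_DATA_BUCKET_NAME=gdd-data

# Optional MinIO ports (defaults shown)
MINIO_API_PORT_E=9000
MINIO_CONSOLE_PORT_E=9001

# Nginx access control
NGINX_ALLOWED_IP_1E=203.0.113.10
NGINX_ALLOWED_IP_2E=203.0.113.11
```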
- Navigate to the root directory of the project, where `docker-compose.yaml` is located.
- Build and start the services:
These commands build the necessary Docker images and start all services defined in `docker-compose.yaml` in detached mode (`-d`).

docker compose build && docker compose up -d
The first run will take some time as it downloads base images, builds custom images, initializes Airflow, and sets up MinIO.
- Initialize Airflow (First Run Only):
The `airflow-init` service handles database migrations and user creation. Wait for this service to complete successfully before proceeding. You can check its status:

docker compose logs airflow-init
Look for messages indicating successful completion.
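If you prefer an explicit check over scanning the logs, the init container's exit status also tells you whether it finished. This is a quick sanity check using standard Docker Compose commands.

```bash
# The init container runs once and exits; a status of "Exited (0)" means
# the database migrations and user creation completed successfully.
docker compose ps -a airflow-init
```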
Once the services are up and running:
- Airflow UI: http://localhost:8080 (login with the user/password from your `.env` file)
- FastAPI Backend: http://localhost:8000 (if accessing directly)
- Streamlit Frontend: http://localhost:8501 (if accessing directly)
- MinIO Console: http://localhost:9001 (login with the access key/secret key from your `.env` file)
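If you need to create the data bucket manually before running any DAGs (as noted in the `.env` section), one option besides the MinIO Console is the MinIO client `mc`. This is a sketch assuming `mc` is installed on the host, the default API port 9000 is in use, and the credentials and bucket name come from your `.env` file.

```bash
# Export (or paste in) the values from .env first, then point mc at the local MinIO instance
mc alias set localminio http://localhost:9000 "$MINIO_ACCESS_KEY" "$MINIO_SECRET_KEY"

# Create the data bucket if it does not already exist
mc mb --ignore-existing localminio/"$MINIO_DATA_BUCKET_NAME"

# List buckets to verify
mc ls localminio
```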
- Access the Airflow UI at http://localhost:8080.
- (Optional) Unpause the `bronze_data_fetcher_dag` (and any other DAGs) in the UI.
- You can trigger the DAG manually via the UI for testing, or wait for the scheduled run; a CLI alternative is sketched below.
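If you would rather work from the terminal, the Airflow CLI inside the running containers can perform the same unpause and trigger steps. This is a sketch only: `airflow-webserver` is an assumed Compose service name, so substitute whichever Airflow service `docker compose ps` shows.

```bash
# Unpause the DAG (equivalent to toggling it on in the UI)
docker compose exec airflow-webserver airflow dags unpause bronze_data_fetcher_dag

# Trigger a manual run for testing
docker compose exec airflow-webserver airflow dags trigger bronze_data_fetcher_dag

# Inspect recent runs of the DAG
docker compose exec airflow-webserver airflow dags list-runs -d bronze_data_fetcher_dag
```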
To stop all running services and remove the containers (but preserve volumes such as the database and MinIO data):
docker compose down
To stop services and remove volumes (useful for a clean start, but data will be lost):
docker compose down -v
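To see which data would survive a plain `docker compose down`, you can list the named volumes; the exact names depend on the Compose project name, so this is only a quick check.

```bash
# Named volumes (e.g. the Postgres and MinIO data) persist until removed with -v
docker volume ls
```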
- Changes to DAG files (`./dags`) are automatically picked up by the Airflow scheduler and webserver due to the volume mount.
- Changes to application code (in `data_fetcher`, `universal`, `gdd_counter`, `api_service`, `streamlit_app`, etc.) require rebuilding the respective Docker images and restarting the service (`docker compose up --build -d <service_name>`), as in the example below.
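For example, after changing the FastAPI code you might rebuild only that service; `api_service` is used here as an assumed Compose service name, so substitute the one defined in `docker-compose.yaml`.

```bash
# Rebuild the image for one service and recreate just its container
docker compose up --build -d api_service

# Follow its logs to confirm the new container started cleanly
docker compose logs -f api_service
```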
- Gold Layer DAG
- Evaluation and further development of data quality, data fetching, and validation logic.
- Security evaluation (using the OWASP Application Security Verification Standard).
- Development of Nginx reverse proxying with Let's Encrypt.
- Finalise Terraform setup.
- Integration testing.
- Deployment to AWS (EC2, S3).
- End-to-end testing.