Add Docker Images Workflow for main branch to deploy services.
Cdaprod committed Mar 26, 2024
1 parent 742dae9 commit 63c92bb
Showing 10 changed files with 309 additions and 14 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,35 @@
# cdaprod/cda.deploy-to-swarm
## .github/workflows/.
### Build and Push Docker Images Workflow (build-latest.yml)

This workflow is triggered on pushes or pull requests to the main branch that modify files in the minio, weaviate, or nginx directories. It can also be manually initiated. The workflow’s primary function is to build and push Docker images for specified services to Docker Hub.

#### Triggers:
- Pushes or pull requests affecting minio/**, weaviate/**, nginx/**.
- Manual (workflow_dispatch).
#### Jobs:
- build-and-push: Executes the following steps for each service:
- Checkout code: Fetches the latest version of the code from the repository.
- Set up Docker Buildx: Prepares the environment for building multi-platform Docker images.
- Login to Docker Hub: Authenticates to Docker Hub using credentials stored in GitHub secrets.
- Build and push images: Constructs the Docker image for each service (MinIO, Weaviate, NGINX) and uploads them to Docker Hub, tagging them as latest and targeting linux/amd64 and linux/arm64 platforms.
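For example, a MinIO build step would follow the same pattern as the other steps in `build-latest.yml` (a sketch modeled on the workflow's existing Weaviate/NGINX steps; the `cda-minio` tag is an assumption):

```yaml
- name: Build and push custom MinIO image
  uses: docker/build-push-action@v3
  with:
    context: ./minio
    file: ./minio/Dockerfile
    push: true
    tags: cdaprod/cda-minio:latest
    platforms: linux/amd64,linux/arm64
```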

### Deploy Services Workflow (deploy-latest.yml)

This workflow is set to run upon the successful completion of the "Build and Push Docker Images" workflow on the main branch, facilitating the deployment of services to Docker Swarm. It can also be initiated manually.

#### Triggers:
- Completion of the "Build and Push Docker Images" workflow (workflow_run).
- Manual (workflow_dispatch).
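The trigger block for this chaining can be sketched as follows (assuming the upstream workflow's display name is exactly "Build and Push Docker Images"):

```yaml
on:
  workflow_run:
    workflows: ["Build and Push Docker Images"]
    types: [completed]
    branches: [main]
  workflow_dispatch:
```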
#### Jobs:
- deploy: Executes the deployment process, encompassing the following actions:
- Checkout Repository: Obtains the most current codebase from the repository.
- Log in to Docker Hub: Authenticates to Docker Hub to ensure access to Docker images.
- Deploy Stacks: Utilizes docker stack deploy with specific docker-compose files for deploying each service stack (MinIO, Weaviate, NGINX) to Docker Swarm. For MinIO, it sets environment variables like MINIO_ROOT_USER and MINIO_ROOT_PASSWORD from GitHub secrets for secure deployment.
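The MinIO deployment step described above can be sketched as follows (the compose file path mirrors the NGINX step shown later in this diff; passing secrets through `env` is one common pattern):

```yaml
- name: Deploy MinIO Stack
  env:
    MINIO_ROOT_USER: ${{ secrets.MINIO_ROOT_USER }}
    MINIO_ROOT_PASSWORD: ${{ secrets.MINIO_ROOT_PASSWORD }}
  run: |
    docker stack deploy -c ./minio/docker-compose.minio.yaml minio_stack
```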

### Best Practices and Considerations

- Pin Docker image tags to specific versions rather than relying only on `latest`, so deployments are predictable and reproducible.
- Keep sensitive data like passwords and API keys secure using GitHub secrets.
- Consider deployment strategies and rollback plans for maintaining service availability.
- Documentation within README.md should include clear descriptions of each workflow and step, ensuring maintainability and clarity for team members.
15 changes: 13 additions & 2 deletions .github/workflows/build-latest.yml
@@ -7,12 +7,14 @@ on:
- 'minio/**'
- 'weaviate/**'
- 'nginx/**'
- 'jupyter/**'
pull_request:
branches: [main]
paths:
- 'minio/**'
- 'weaviate/**'
- 'nginx/**'
- 'jupyter/**'
workflow_dispatch:

jobs:
@@ -49,12 +51,21 @@ jobs:
push: true
tags: cdaprod/cda-weaviate:latest
platforms: linux/amd64,linux/arm64

- name: Build and push custom NGINX image
uses: docker/build-push-action@v3
with:
context: ./nginx
file: ./nginx/Dockerfile
push: true
tags: cdaprod/cda-nginx:latest
platforms: linux/amd64,linux/arm64

- name: Build and push custom JupyterLab image
uses: docker/build-push-action@v3
with:
context: ./jupyter
file: ./jupyter/Dockerfile
push: true
tags: cdaprod/custom-jupyterlab:latest
platforms: linux/amd64,linux/arm64
6 changes: 5 additions & 1 deletion .github/workflows/deploy-latest.yml
@@ -37,4 +37,8 @@ jobs:
- name: Deploy NGINX Stack
run: |
docker stack deploy -c ./nginx/docker-compose.nginx.yaml nginx_stack
- name: Deploy Jupyter Stack
run: |
docker stack deploy -c ./jupyter/docker-compose.jupyter.yaml jupyter_stack
46 changes: 45 additions & 1 deletion README.md
@@ -25,4 +25,48 @@
3 directories, 11 files
```
<!-- DIRECTORY_TREE_END -->

## Required Docker Swarm Secrets

```bash
echo "<your-openai-api-key>" | docker secret create OPENAI_API_KEY -
echo "<your-minio-root-user>" | docker secret create MINIO_ROOT_USER -
echo "<your-minio-root-password>" | docker secret create MINIO_ROOT_PASSWORD -
echo "<your-langchain-tracing-v2-value>" | docker secret create LANGCHAIN_TRACING_V2 -
echo "<your-langchain-api-key>" | docker secret create LANGCHAIN_API_KEY -
echo "<your-langchain-project>" | docker secret create LANGCHAIN_PROJECT -
echo "<your-weaviate-environment>" | docker secret create WEAVIATE_ENVIRONMENT -
echo "<your-weaviate-api-key>" | docker secret create WEAVIATE_API_KEY -
```
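Services can then consume these Swarm secrets from a compose file. A minimal sketch (the service and image names are illustrative; Swarm mounts each secret at `/run/secrets/<name>`, and MinIO can read credentials from files via its `*_FILE` environment variables):

```yaml
services:
  minio:
    image: cdaprod/cda-minio:latest   # illustrative image name
    secrets:
      - MINIO_ROOT_USER
      - MINIO_ROOT_PASSWORD
    environment:
      MINIO_ROOT_USER_FILE: /run/secrets/MINIO_ROOT_USER
      MINIO_ROOT_PASSWORD_FILE: /run/secrets/MINIO_ROOT_PASSWORD

secrets:
  MINIO_ROOT_USER:
    external: true
  MINIO_ROOT_PASSWORD:
    external: true
```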


## Example of extending additional services

In a Docker Swarm deployment, especially when using separate repositories for different components of your system, it's not necessary to maintain a physical directory for the MinIO system control app within the `cdaprod/cda.deploy-to-swarm.git` repository. Instead, you can directly reference the MinIO system control Docker image within your Docker Compose file used for the deployment. This approach simplifies the deployment process and keeps your repositories focused on their specific purposes.

### Including the MinIO System Control App in `docker-compose.yml`

In your Docker Compose file within the `cdaprod/cda.deploy-to-swarm.git` repository, you would include a service definition for the MinIO system control app that references the Docker image built and pushed from the `cdaprod/cda.minio-system-control.git` repository. Here's an example of how you might define this service:

```yaml
services:
minio-system-control:
image: cdaprod/cda-minio-system-control:latest
ports:
- "8000:8000"
environment:
MINIO_ACCESS_KEY: "minio-access-key"
MINIO_SECRET_KEY: "minio-secret-key"
# Add other configurations as necessary
```

This service definition assumes that you have already built and pushed the Docker image `cdaprod/cda-minio-system-control:latest` to a Docker registry accessible by your Docker Swarm cluster.
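Publishing that image from the `cdaprod/cda.minio-system-control.git` repository could look like this (a sketch; the Dockerfile location at the repository root is an assumption):

```sh
# From the root of cda.minio-system-control.git
docker build -t cdaprod/cda-minio-system-control:latest .
docker push cdaprod/cda-minio-system-control:latest
```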

### Benefits of This Approach

- **Separation of Concerns**: Keeping your application code in its dedicated repository (`cdaprod/cda.minio-system-control.git`) and your deployment configurations in another (`cdaprod/cda.deploy-to-swarm.git`) helps maintain clarity and separation of concerns.
- **Modularity**: This method allows for more modular deployments. You can update, scale, or modify the MinIO system control application independently of other services defined in your Docker Compose file.
- **Simplicity in Updates**: When updates are made to the MinIO system control application, you only need to rebuild and push the Docker image. The deployment can automatically use the latest image without needing to adjust the repository containing your Docker Compose files, assuming you use tags appropriately.

Remember to update the Docker Compose file with the correct version of the Docker image if you're not using the `latest` tag, ensuring that your Swarm deployment always uses the intended version of each service.
21 changes: 21 additions & 0 deletions docker-compose.yaml
@@ -66,13 +66,34 @@ services:
networks:
- app_network

jupyterlab:
build:
context: ./jupyter
image: custom-jupyterlab:latest
ports:
- "8888:8888"
volumes:
- ../usb/001/002:/dev/bus/usb/001/002
- jupyter_data:/home/jovyan/work
networks:
- app_network
environment:
- JUPYTER_ENABLE_LAB=yes
privileged: true

networks:
app_network:
driver: overlay

volumes:
minio_data:
weaviate_data:
jupyter_data:
driver: local
driver_opts:
type: none
device: /opt/jupyter_data
o: bind

secrets:
minio_root_user:
13 changes: 13 additions & 0 deletions jupyter/Dockerfile
@@ -0,0 +1,13 @@
FROM jupyter/datascience-notebook

# Switch to root to install system packages (the base image runs as the jovyan user)
USER root

# Install the Edge TPU runtime and Python API library
RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" > /etc/apt/sources.list.d/coral-edgetpu.list && \
    curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y --no-install-recommends libedgetpu1-std python3-edgetpu && \
    rm -rf /var/lib/apt/lists/*

# Drop back to the default notebook user
USER ${NB_UID}

# Install Clients
RUN pip install minio weaviate-client transformers openai langchain

# Install requirements for OpenAI and LangChain integration
RUN pip install pydantic bs4 poetry fastapi uvicorn docker unstructured
24 changes: 24 additions & 0 deletions jupyter/README.md
@@ -0,0 +1,24 @@
## Dockerfile

To deploy the JupyterLab container, run the following commands on the Docker Swarm leader (the rpi-swarm runner), which is connected to the Google Coral TPU:

`docker build -t cda-jupyter .`

`docker run -it --privileged -v /dev/bus/usb/001/002:/dev/bus/usb -p 8888:8888 cda-jupyter`

## Docker Compose

- USB Access Volume: The first volume binding ../usb/001/002:/dev/bus/usb/001/002 assumes you have a specific USB device directory structure (../usb/001/002) on the host system that you want to map directly to the container. This path might need adjustment based on your host’s actual USB device path. It’s crucial for giving the Docker container access to the Coral TPU.
- Persistent Data Volume for Notebooks: The jupyter_data volume is defined to persist your Jupyter notebooks and any other important data. It is mapped to /home/jovyan/work inside the container, which is the default working directory for Jupyter notebooks in the Docker image. The driver_opts section under the volumes definition sets up a bind mount from a local directory (${PWD}/jupyter_data) to the volume, ensuring data persistence across container restarts or rebuilds.

### Running the Docker Compose File

To deploy your service with these volumes, navigate to the directory containing your docker-compose.jupyter.yaml and run:

`docker-compose -f docker-compose.jupyter.yaml up --build`

This command builds the image if not present and starts the JupyterLab service, making it accessible at http://localhost:8888. Notebooks and other data saved in /home/jovyan/work inside the container will persist in the jupyter_data volume on your host machine.

### Important Note

Remember, Docker and Docker Compose paths and volume bindings must accurately reflect your system’s directory structure and device file paths. The given example paths may need to be adjusted to match your environment.
29 changes: 29 additions & 0 deletions jupyter/docker-compose.jupyter.yaml
@@ -0,0 +1,29 @@
version: '3.8'

services:
jupyterlab:
build:
context: ./jupyter
image: custom-jupyterlab:latest
ports:
- "8888:8888"
volumes:
- ../usb/001/002:/dev/bus/usb/001/002 # Bind mount for Coral TPU USB access
- jupyter_data:/home/jovyan/work # Persistent volume for notebooks and data
networks:
- app_network
environment:
- JUPYTER_ENABLE_LAB=yes
privileged: true

networks:
app_network:
external: true

volumes:
jupyter_data:
driver: local
driver_opts:
type: none
device: ${PWD}/jupyter_data
o: bind
104 changes: 104 additions & 0 deletions minio/README.md
@@ -0,0 +1,104 @@
# My Default Buckets

- weaviate-backups
- cda-datasets
- raw-objects
- clean-objects
- my-prompt-bucket
- feature-store-bucket

## How to Programmatically Create Default Bucket Data

Building and populating data in MinIO buckets programmatically can be achieved using various methods, depending on your specific needs and the nature of the data you're dealing with. You can use the MinIO Client (`mc`), MinIO's SDKs for different programming languages, or direct REST API calls. Below, I'll outline methods using the MinIO Client and Python SDK, as these are among the most common and versatile approaches.

### Using MinIO Client (`mc`)

The MinIO Client (`mc`) can be used for a wide range of bucket and object management tasks, including file uploads, setting policies, and mirroring data. To programmatically upload data to your buckets, you could write shell scripts that use `mc cp` or `mc mirror` for uploading files.

#### Uploading a Single File

```sh
mc cp /path/to/your/file.txt myminio/your-bucket-name
```

#### Uploading Multiple Files or Directories

```sh
mc cp --recursive /path/to/your/directory myminio/your-bucket-name
```

#### Example Script for Uploading Data

```sh
#!/bin/bash
# Associative arrays (declare -A) require bash 4+, so use bash rather than sh

# Define your bucket names and data sources
declare -A buckets_and_data=(
["weaviate-backups"]="/path/to/backup/data"
["cda-datasets"]="/path/to/datasets"
# Add more as needed
)

# Loop through the associative array
for bucket in "${!buckets_and_data[@]}"; do
data_source="${buckets_and_data[$bucket]}"
echo "Uploading data from $data_source to $bucket..."
mc cp --recursive "$data_source" myminio/"$bucket"
done
```

### Using Python and MinIO Python SDK

The MinIO Python SDK is a powerful tool for interacting with MinIO in a programmatic way, allowing for more complex operations and integration into your Python applications.

First, ensure you have the MinIO Python SDK installed:

```sh
pip install minio
```

Then, you can write a Python script to upload files:

#### Python Script Example

```python
from minio import Minio
from minio.error import S3Error
import os

def upload_directory_to_bucket(minio_client, bucket_name, directory_path):
for root, _, files in os.walk(directory_path):
for file in files:
file_path = os.path.join(root, file)
# Define the object name in the bucket; here, it keeps the directory structure
object_name = os.path.relpath(file_path, start=directory_path)
try:
minio_client.fput_object(bucket_name, object_name, file_path)
print(f"Uploaded {file_path} as {object_name} in bucket {bucket_name}")
except S3Error as exc:
print(f"Failed to upload {file_path} to {bucket_name}: {exc}")

if __name__ == "__main__":
# Create a MinIO client
minio_client = Minio(
"minio:9000",
access_key="your-access-key",
secret_key="your-secret-key",
secure=False # Set to True for https
)

# Define your buckets and corresponding data directories
buckets_and_data = {
"weaviate-backups": "/path/to/backup/data",
"cda-datasets": "/path/to/datasets",
# Add more as needed
}

# Upload data for each bucket
for bucket, data_dir in buckets_and_data.items():
upload_directory_to_bucket(minio_client, bucket, data_dir)
```

This Python script demonstrates how to upload an entire directory's worth of files to specific MinIO buckets, maintaining the directory structure within the bucket. It iterates over a dictionary of bucket names and their corresponding local directories, uploading each file found within those directories to the correct bucket.

By using these approaches, you can programmatically build and populate your MinIO buckets with the necessary data, either through shell scripts utilizing the `mc` tool or via Python scripts using MinIO's Python SDK.
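Bucket creation itself can also be handled from Python, mirroring the create-if-missing logic in `minio/entrypoint.sh`. A minimal sketch (the endpoint and credentials are placeholders; `list_buckets`, `bucket_exists`-style checks, and `make_bucket` are standard SDK calls):

```python
# MinIO SDK is only needed for the actual server calls; the helper below is pure logic
try:
    from minio import Minio
    from minio.error import S3Error
except ImportError:  # allows the helper to be used without the SDK installed
    Minio = S3Error = None

# Buckets the stack expects to exist (the repo's defaults)
DEFAULT_BUCKETS = [
    "weaviate-backups",
    "cda-datasets",
    "raw-objects",
    "clean-objects",
    "my-prompt-bucket",
    "feature-store-bucket",
]

def missing_buckets(existing, desired):
    """Return the desired buckets not yet present, preserving order."""
    existing_set = set(existing)
    return [name for name in desired if name not in existing_set]

def ensure_buckets(client, desired=DEFAULT_BUCKETS):
    """Create any of the desired buckets that do not already exist."""
    existing = [b.name for b in client.list_buckets()]
    for name in missing_buckets(existing, desired):
        try:
            client.make_bucket(name)
            print(f"Created bucket: {name}")
        except S3Error as exc:
            print(f"Failed to create {name}: {exc}")

if __name__ == "__main__":
    client = Minio(
        "minio:9000",              # placeholder endpoint
        access_key="your-access-key",
        secret_key="your-secret-key",
        secure=False,              # set to True for https
    )
    ensure_buckets(client)
```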
30 changes: 20 additions & 10 deletions minio/entrypoint.sh
@@ -8,17 +8,27 @@ minio server /data --console-address ":9001" &
# Wait for MinIO to start
sleep 5

# Set up alias and create bucket
mc alias set myminio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
# Set up alias for MinIO
mc alias set myminio http://minio:9000 "${MINIO_ROOT_USER}" "${MINIO_ROOT_PASSWORD}"

# Before creating buckets, check if they already exist
if ! mc ls myminio/weaviate-backups; then
mc mb myminio/weaviate-backups
fi
# Function to create a bucket if it doesn't exist
create_bucket_if_not_exists() {
bucket_name=$1
if ! mc ls myminio/"${bucket_name}" > /dev/null 2>&1; then
echo "Creating bucket: ${bucket_name}"
mc mb myminio/"${bucket_name}"
else
echo "Bucket ${bucket_name} already exists."
fi
}

if ! mc ls myminio/cda-datasets; then
mc mb myminio/cda-datasets
fi
# Space-separated list of buckets to check and create if they don't exist
buckets="weaviate-backups cda-datasets raw-objects clean-objects prompt-bucket feature-store-bucket"

# Iterate over the list and create each bucket if it doesn't exist
for bucket in $buckets; do
create_bucket_if_not_exists "$bucket"
done

# Keep the script running to prevent the container from exiting
tail -f /dev/null
