Add Docker Images Workflow for main branch to deploy services.
Cdaprod committed Mar 26, 2024
1 parent 742dae9 commit 63c92bb
Showing 10 changed files with 309 additions and 14 deletions.
35 changes: 35 additions & 0 deletions .github/workflows/README.md
@@ -0,0 +1,35 @@
# cdaprod/cda.deploy-to-swarm
## .github/workflows/.
### Build and Push Docker Images Workflow (build-latest.yml)

This workflow is triggered on pushes or pull requests to the main branch that modify files in the minio, weaviate, or nginx directories. It can also be manually initiated. The workflow’s primary function is to build and push Docker images for specified services to Docker Hub.

#### Triggers:
- Pushes or pull requests affecting minio/**, weaviate/**, nginx/**.
- Manual (workflow_dispatch).
#### Jobs:
- build-and-push: Executes the following steps for each service:
- Checkout code: Fetches the latest version of the code from the repository.
- Set up Docker Buildx: Prepares the environment for building multi-platform Docker images.
- Login to Docker Hub: Authenticates to Docker Hub using credentials stored in GitHub secrets.
- Build and push images: Constructs the Docker image for each service (MinIO, Weaviate, NGINX) and uploads them to Docker Hub, tagging them as latest and targeting linux/amd64 and linux/arm64 platforms.
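For example, a MinIO build step would follow the same pattern as the other steps in `build-latest.yml` (a sketch modeled on the workflow's existing Weaviate/NGINX steps; the `cda-minio` tag is an assumption):

```yaml
- name: Build and push custom MinIO image
  uses: docker/build-push-action@v3
  with:
    context: ./minio
    file: ./minio/Dockerfile
    push: true
    tags: cdaprod/cda-minio:latest
    platforms: linux/amd64,linux/arm64
```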

### Deploy Services Workflow (deploy-latest.yml)

This workflow is set to run upon the successful completion of the "Build and Push Docker Images" workflow on the main branch, facilitating the deployment of services to Docker Swarm. It can also be initiated manually.

#### Triggers:
- Completion of the "Build and Push Docker Images" workflow (workflow_run).
- Manual (workflow_dispatch).
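The trigger block for this chaining can be sketched as follows (assuming the upstream workflow's display name is exactly "Build and Push Docker Images"):

```yaml
on:
  workflow_run:
    workflows: ["Build and Push Docker Images"]
    types: [completed]
    branches: [main]
  workflow_dispatch:
```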
#### Jobs:
- deploy: Executes the deployment process, encompassing the following actions:
- Checkout Repository: Obtains the most current codebase from the repository.
- Log in to Docker Hub: Authenticates to Docker Hub to ensure access to Docker images.
- Deploy Stacks: Utilizes docker stack deploy with specific docker-compose files for deploying each service stack (MinIO, Weaviate, NGINX) to Docker Swarm. For MinIO, it sets environment variables like MINIO_ROOT_USER and MINIO_ROOT_PASSWORD from GitHub secrets for secure deployment.
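The MinIO deployment step described above can be sketched as follows (the compose file path mirrors the NGINX step shown later in this diff; passing secrets through `env` is one common pattern):

```yaml
- name: Deploy MinIO Stack
  env:
    MINIO_ROOT_USER: ${{ secrets.MINIO_ROOT_USER }}
    MINIO_ROOT_PASSWORD: ${{ secrets.MINIO_ROOT_PASSWORD }}
  run: |
    docker stack deploy -c ./minio/docker-compose.minio.yaml minio_stack
```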

### Best Practices and Considerations

- Pin Docker image tags to specific versions rather than relying only on `latest`, so deployments are predictable and reproducible.
- Keep sensitive data like passwords and API keys secure using GitHub secrets.
- Consider deployment strategies and rollback plans for maintaining service availability.
- Documentation within README.md should include clear descriptions of each workflow and step, ensuring maintainability and clarity for team members.
15 changes: 13 additions & 2 deletions .github/workflows/build-latest.yml
@@ -7,12 +7,14 @@ on:
- 'minio/**'
- 'weaviate/**'
- 'nginx/**'
- 'jupyter/**'
pull_request:
branches: [main]
paths:
- 'minio/**'
- 'weaviate/**'
- 'nginx/**'
- 'jupyter/**'
workflow_dispatch:

jobs:
@@ -49,12 +51,21 @@ jobs:
push: true
tags: cdaprod/cda-weaviate:latest
platforms: linux/amd64,linux/arm64

- name: Build and push custom NGINX image
uses: docker/build-push-action@v3
with:
context: ./nginx
file: ./nginx/Dockerfile
push: true
tags: cdaprod/cda-nginx:latest
platforms: linux/amd64,linux/arm64

- name: Build and push custom JupyterLab image
uses: docker/build-push-action@v3
with:
context: ./jupyter
file: ./jupyter/Dockerfile
push: true
tags: cdaprod/custom-jupyterlab:latest
platforms: linux/amd64,linux/arm64
6 changes: 5 additions & 1 deletion .github/workflows/deploy-latest.yml
@@ -37,4 +37,8 @@ jobs:
- name: Deploy NGINX Stack
run: |
docker stack deploy -c ./nginx/docker-compose.nginx.yaml nginx_stack
- name: Deploy Jupyter Stack
run: |
docker stack deploy -c ./jupyter/docker-compose.jupyter.yaml jupyter_stack
46 changes: 45 additions & 1 deletion README.md
@@ -25,4 +25,48 @@
3 directories, 11 files
```
<!-- DIRECTORY_TREE_END -->

## Required Docker Swarm Secrets

```bash
echo "<your-openai-api-key>" | docker secret create OPENAI_API_KEY -
echo "<your-minio-root-user>" | docker secret create MINIO_ROOT_USER -
echo "<your-minio-root-password>" | docker secret create MINIO_ROOT_PASSWORD -
echo "<your-langchain-tracing-v2-value>" | docker secret create LANGCHAIN_TRACING_V2 -
echo "<your-langchain-api-key>" | docker secret create LANGCHAIN_API_KEY -
echo "<your-langchain-project>" | docker secret create LANGCHAIN_PROJECT -
echo "<your-weaviate-environment>" | docker secret create WEAVIATE_ENVIRONMENT -
echo "<your-weaviate-api-key>" | docker secret create WEAVIATE_API_KEY -
```
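Services can then consume these Swarm secrets from a compose file. A minimal sketch (the service and image names are illustrative; Swarm mounts each secret at `/run/secrets/<name>`, and MinIO can read credentials from files via its `*_FILE` environment variables):

```yaml
services:
  minio:
    image: cdaprod/cda-minio:latest   # illustrative image name
    secrets:
      - MINIO_ROOT_USER
      - MINIO_ROOT_PASSWORD
    environment:
      MINIO_ROOT_USER_FILE: /run/secrets/MINIO_ROOT_USER
      MINIO_ROOT_PASSWORD_FILE: /run/secrets/MINIO_ROOT_PASSWORD

secrets:
  MINIO_ROOT_USER:
    external: true
  MINIO_ROOT_PASSWORD:
    external: true
```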


## Example of extending additional services

In a Docker Swarm deployment, especially when using separate repositories for different components of your system, it's not necessary to maintain a physical directory for the MinIO system control app within the `cdaprod/cda.deploy-to-swarm.git` repository. Instead, you can directly reference the MinIO system control Docker image within your Docker Compose file used for the deployment. This approach simplifies the deployment process and keeps your repositories focused on their specific purposes.

### Including the MinIO System Control App in `docker-compose.yml`

In your Docker Compose file within the `cdaprod/cda.deploy-to-swarm.git` repository, you would include a service definition for the MinIO system control app that references the Docker image built and pushed from the `cdaprod/cda.minio-system-control.git` repository. Here's an example of how you might define this service:

```yaml
services:
minio-system-control:
image: cdaprod/cda-minio-system-control:latest
ports:
- "8000:8000"
environment:
MINIO_ACCESS_KEY: "minio-access-key"
MINIO_SECRET_KEY: "minio-secret-key"
# Add other configurations as necessary
```

This service definition assumes that you have already built and pushed the Docker image `cdaprod/cda-minio-system-control:latest` to a Docker registry accessible by your Docker Swarm cluster.
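Publishing that image from the `cdaprod/cda.minio-system-control.git` repository could look like this (a sketch; the Dockerfile location at the repository root is an assumption):

```sh
# From the root of cda.minio-system-control.git
docker build -t cdaprod/cda-minio-system-control:latest .
docker push cdaprod/cda-minio-system-control:latest
```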

### Benefits of This Approach

- **Separation of Concerns**: Keeping your application code in its dedicated repository (`cdaprod/cda.minio-system-control.git`) and your deployment configurations in another (`cdaprod/cda.deploy-to-swarm.git`) helps maintain clarity and separation of concerns.
- **Modularity**: This method allows for more modular deployments. You can update, scale, or modify the MinIO system control application independently of other services defined in your Docker Compose file.
- **Simplicity in Updates**: When updates are made to the MinIO system control application, you only need to rebuild and push the Docker image. The deployment can automatically use the latest image without needing to adjust the repository containing your Docker Compose files, assuming you use tags appropriately.

Remember to update the Docker Compose file with the correct version of the Docker image if you're not using the `latest` tag, ensuring that your Swarm deployment always uses the intended version of each service.
21 changes: 21 additions & 0 deletions docker-compose.yaml
@@ -66,13 +66,34 @@ services:
networks:
- app_network

jupyterlab:
build:
context: ./jupyter
image: custom-jupyterlab:latest
ports:
- "8888:8888"
volumes:
- ../usb/001/002:/dev/bus/usb/001/002
- jupyter_data:/home/jovyan/work
networks:
- app_network
environment:
- JUPYTER_ENABLE_LAB=yes
privileged: true

networks:
app_network:
driver: overlay

volumes:
minio_data:
weaviate_data:
jupyter_data:
driver: local
driver_opts:
type: none
device: /opt/jupyter_data
o: bind

secrets:
minio_root_user:
13 changes: 13 additions & 0 deletions jupyter/Dockerfile
@@ -0,0 +1,13 @@
FROM jupyter/datascience-notebook

# Switch to root to install system packages (the base image runs as the jovyan user)
USER root

# Install the Edge TPU runtime and Python API library
RUN echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" > /etc/apt/sources.list.d/coral-edgetpu.list && \
    curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add - && \
    apt-get update && \
    apt-get install -y --no-install-recommends libedgetpu1-std python3-edgetpu && \
    rm -rf /var/lib/apt/lists/*

# Drop back to the default notebook user
USER ${NB_UID}

# Install Clients
RUN pip install minio weaviate-client transformers openai langchain

# Install requirements for OpenAI and LangChain integration
RUN pip install pydantic bs4 poetry fastapi uvicorn docker unstructured
24 changes: 24 additions & 0 deletions jupyter/README.md
@@ -0,0 +1,24 @@
## Dockerfile

To deploy the JupyterLab container, run the following commands on the Docker Swarm leader (the rpi-swarm runner), which is connected to the Google Coral TPU:

`docker build -t cda-jupyter .`

`docker run -it --privileged -v /dev/bus/usb/001/002:/dev/bus/usb -p 8888:8888 cda-jupyter`

## Docker Compose

- USB Access Volume: The first volume binding ../usb/001/002:/dev/bus/usb/001/002 assumes you have a specific USB device directory structure (../usb/001/002) on the host system that you want to map directly to the container. This path might need adjustment based on your host’s actual USB device path. It’s crucial for giving the Docker container access to the Coral TPU.
- Persistent Data Volume for Notebooks: The jupyter_data volume is defined to persist your Jupyter notebooks and any other important data. It is mapped to /home/jovyan/work inside the container, which is the default working directory for Jupyter notebooks in the Docker image. The driver_opts section under the volumes definition sets up a bind mount from a local directory (${PWD}/jupyter_data) to the volume, ensuring data persistence across container restarts or rebuilds.

### Running the Docker Compose File

To deploy your service with these volumes, navigate to the directory containing your docker-compose.jupyter.yaml and run:

`docker-compose -f docker-compose.jupyter.yaml up --build`

This command builds the image if not present and starts the JupyterLab service, making it accessible at http://localhost:8888. Notebooks and other data saved in /home/jovyan/work inside the container will persist in the jupyter_data volume on your host machine.

### Important Note

Remember, Docker and Docker Compose paths and volume bindings must accurately reflect your system’s directory structure and device file paths. The given example paths may need to be adjusted to match your environment.
29 changes: 29 additions & 0 deletions jupyter/docker-compose.jupyter.yaml
@@ -0,0 +1,29 @@
version: '3.8'

services:
jupyterlab:
build:
context: ./jupyter
image: custom-jupyterlab:latest
ports:
- "8888:8888"
volumes:
- ../usb/001/002:/dev/bus/usb/001/002 # Bind mount for Coral TPU USB access
- jupyter_data:/home/jovyan/work # Persistent volume for notebooks and data
networks:
- app_network
environment:
- JUPYTER_ENABLE_LAB=yes
privileged: true

networks:
app_network:
external: true

volumes:
jupyter_data:
driver: local
driver_opts:
type: none
device: ${PWD}/jupyter_data
o: bind
104 changes: 104 additions & 0 deletions minio/README.md
@@ -0,0 +1,104 @@
# My Default Buckets

- weaviate-backups
- cda-datasets
- raw-objects
- clean-objects
- my-prompt-bucket
- feature-store-bucket

## How to Programmatically Create Default Bucket Data

Building and populating data in MinIO buckets programmatically can be achieved using various methods, depending on your specific needs and the nature of the data you're dealing with. You can use the MinIO Client (`mc`), MinIO's SDKs for different programming languages, or direct REST API calls. Below, I'll outline methods using the MinIO Client and Python SDK, as these are among the most common and versatile approaches.

### Using MinIO Client (`mc`)

The MinIO Client (`mc`) can be used for a wide range of bucket and object management tasks, including file uploads, setting policies, and mirroring data. To programmatically upload data to your buckets, you could write shell scripts that use `mc cp` or `mc mirror` for uploading files.

#### Uploading a Single File

```sh
mc cp /path/to/your/file.txt myminio/your-bucket-name
```

#### Uploading Multiple Files or Directories

```sh
mc cp --recursive /path/to/your/directory myminio/your-bucket-name
```

#### Example Script for Uploading Data

```sh
#!/bin/bash
# Associative arrays (declare -A) require bash 4+, so use bash rather than sh

# Define your bucket names and data sources
declare -A buckets_and_data=(
["weaviate-backups"]="/path/to/backup/data"
["cda-datasets"]="/path/to/datasets"
# Add more as needed
)

# Loop through the associative array
for bucket in "${!buckets_and_data[@]}"; do
data_source="${buckets_and_data[$bucket]}"
echo "Uploading data from $data_source to $bucket..."
mc cp --recursive "$data_source" myminio/"$bucket"
done
```

### Using Python and MinIO Python SDK

The MinIO Python SDK is a powerful tool for interacting with MinIO in a programmatic way, allowing for more complex operations and integration into your Python applications.

First, ensure you have the MinIO Python SDK installed:

```sh
pip install minio
```

Then, you can write a Python script to upload files:

#### Python Script Example

```python
from minio import Minio
from minio.error import S3Error
import os

def upload_directory_to_bucket(minio_client, bucket_name, directory_path):
for root, _, files in os.walk(directory_path):
for file in files:
file_path = os.path.join(root, file)
# Define the object name in the bucket; here, it keeps the directory structure
object_name = os.path.relpath(file_path, start=directory_path)
try:
minio_client.fput_object(bucket_name, object_name, file_path)
print(f"Uploaded {file_path} as {object_name} in bucket {bucket_name}")
except S3Error as exc:
print(f"Failed to upload {file_path} to {bucket_name}: {exc}")

if __name__ == "__main__":
# Create a MinIO client
minio_client = Minio(
"minio:9000",
access_key="your-access-key",
secret_key="your-secret-key",
secure=False # Set to True for https
)

# Define your buckets and corresponding data directories
buckets_and_data = {
"weaviate-backups": "/path/to/backup/data",
"cda-datasets": "/path/to/datasets",
# Add more as needed
}

# Upload data for each bucket
for bucket, data_dir in buckets_and_data.items():
upload_directory_to_bucket(minio_client, bucket, data_dir)
```

This Python script demonstrates how to upload an entire directory's worth of files to specific MinIO buckets, maintaining the directory structure within the bucket. It iterates over a dictionary of bucket names and their corresponding local directories, uploading each file found within those directories to the correct bucket.

By using these approaches, you can programmatically build and populate your MinIO buckets with the necessary data, either through shell scripts utilizing the `mc` tool or via Python scripts using MinIO's Python SDK.
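Bucket creation itself can also be handled from Python, mirroring the create-if-missing logic in `minio/entrypoint.sh`. A minimal sketch (the endpoint and credentials are placeholders; `list_buckets`, `bucket_exists`-style checks, and `make_bucket` are standard SDK calls):

```python
# MinIO SDK is only needed for the actual server calls; the helper below is pure logic
try:
    from minio import Minio
    from minio.error import S3Error
except ImportError:  # allows the helper to be used without the SDK installed
    Minio = S3Error = None

# Buckets the stack expects to exist (the repo's defaults)
DEFAULT_BUCKETS = [
    "weaviate-backups",
    "cda-datasets",
    "raw-objects",
    "clean-objects",
    "my-prompt-bucket",
    "feature-store-bucket",
]

def missing_buckets(existing, desired):
    """Return the desired buckets not yet present, preserving order."""
    existing_set = set(existing)
    return [name for name in desired if name not in existing_set]

def ensure_buckets(client, desired=DEFAULT_BUCKETS):
    """Create any of the desired buckets that do not already exist."""
    existing = [b.name for b in client.list_buckets()]
    for name in missing_buckets(existing, desired):
        try:
            client.make_bucket(name)
            print(f"Created bucket: {name}")
        except S3Error as exc:
            print(f"Failed to create {name}: {exc}")

if __name__ == "__main__":
    client = Minio(
        "minio:9000",              # placeholder endpoint
        access_key="your-access-key",
        secret_key="your-secret-key",
        secure=False,              # set to True for https
    )
    ensure_buckets(client)
```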
30 changes: 20 additions & 10 deletions minio/entrypoint.sh
@@ -8,17 +8,27 @@ minio server /data --console-address ":9001" &
# Wait for MinIO to start
sleep 5

# Set up alias and create bucket
mc alias set myminio http://minio:9000 ${MINIO_ROOT_USER} ${MINIO_ROOT_PASSWORD}
# Set up alias for MinIO
mc alias set myminio http://minio:9000 "${MINIO_ROOT_USER}" "${MINIO_ROOT_PASSWORD}"

# Before creating buckets, check if they already exist
if ! mc ls myminio/weaviate-backups; then
mc mb myminio/weaviate-backups
fi
# Function to create a bucket if it doesn't exist
create_bucket_if_not_exists() {
bucket_name=$1
if ! mc ls myminio/"${bucket_name}" > /dev/null 2>&1; then
echo "Creating bucket: ${bucket_name}"
mc mb myminio/"${bucket_name}"
else
echo "Bucket ${bucket_name} already exists."
fi
}

if ! mc ls myminio/cda-datasets; then
mc mb myminio/cda-datasets
fi
# Space-separated list of buckets to check and create if they don't exist
buckets="weaviate-backups cda-datasets raw-objects clean-objects prompt-bucket feature-store-bucket"

# Iterate over the list and create each bucket if it doesn't exist
for bucket in $buckets; do
create_bucket_if_not_exists "$bucket"
done

# Keep the script running to prevent the container from exiting
tail -f /dev/null
