From e23df9c7cffe4c2fdb4c3472671c634b922890f4 Mon Sep 17 00:00:00 2001 From: Craig Osterhout Date: Tue, 16 Jan 2024 10:42:15 -0800 Subject: [PATCH] add genai vid transcription guide Signed-off-by: Craig Osterhout --- .../genai-video-transcription/_index.md | 18 ++ .../genai-video-transcription/containerize.md | 256 ++++++++++++++++++ .../use-case/genai-video-transcription/run.md | 157 +++++++++++ data/toc.yaml | 12 + 4 files changed, 443 insertions(+) create mode 100644 content/guides/use-case/genai-video-transcription/_index.md create mode 100644 content/guides/use-case/genai-video-transcription/containerize.md create mode 100644 content/guides/use-case/genai-video-transcription/run.md diff --git a/content/guides/use-case/genai-video-transcription/_index.md b/content/guides/use-case/genai-video-transcription/_index.md new file mode 100644 index 000000000000..2099b2a9899b --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/_index.md @@ -0,0 +1,18 @@ +--- +description: Learn how to containerize a generative AI (GenAI) app that does video transcription. +keywords: python, generative ai, genai, llm, pinecone, openai, whisper, langchain +title: Generative AI - video transcription +toc_min: 1 +toc_max: 2 +--- + +In this guide you'll explore and run a containerized generative AI (GenAI) application that uses remote third-party services, like OpenAI and Pinecone, to transcribe videos and answer questions about them. You'll then learn how to containerize GenAI applications. + +> **Acknowledgment** +> +> Docker would like to thank [David Cardozo](https://www.davidcardozo.com/) for +> his contribution to this guide. + +Start by exploring and running an existing containerized GenAI application. 
+ +{{< button text="Explore and run a GenAI app" url="run.md" >}} \ No newline at end of file diff --git a/content/guides/use-case/genai-video-transcription/containerize.md b/content/guides/use-case/genai-video-transcription/containerize.md new file mode 100644 index 000000000000..0819a1806e4e --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/containerize.md @@ -0,0 +1,256 @@ +--- +title: Containerize a GenAI video transcription app +keywords: python, generative ai, genai, llm, pinecone, openai, whisper, langchain +description: Learn how to containerize generative AI video transcription applications. +--- +## Overview

In this section you're going to walk through the process of containerizing a video transcription app that uses generative AI.

## Prerequisites

Work through the steps of the [Explore and run a GenAI video transcription app](run.md) section to get and configure the example application used in this section.

## Step 1: Prepare the example application

Before you walk through the process of creating Docker assets, delete the
current assets in the example application. This section only explores the
yt-whisper application, so delete the existing `Dockerfile` inside the
`docker-genai/yt-whisper/` directory.

You should now have the following contents inside the `docker-genai/yt-whisper/` directory.

```text
├── docker-genai/yt-whisper/
│ ├── scripts/
│ ├── tests/
│ ├── yt_whisper/
│ ├── README.md
│ ├── poetry.lock
│ └── pyproject.toml
```

## Step 2: Initialize Docker assets

Use `docker init` to create the necessary Docker assets to containerize your application. Inside the `docker-genai/yt-whisper/` directory, run the `docker init` command in a terminal. `docker init` provides some default configuration, but you'll need to answer a few questions about your application. For example, this application uses Streamlit to run. Refer to the following `docker init` example and use the same answers for your prompts. 
+ +```console
docker init
Welcome to the Docker Init CLI!

This utility will walk you through creating the following files with sensible defaults for your project:
 - .dockerignore
 - Dockerfile
 - compose.yaml
 - README.Docker.md

Let's get started!

? What application platform does your project use? Python
? What version of Python do you want to use? 3.11
? What port do you want your app to listen on? 8503
? What is the command to run your app? streamlit run yt_whisper/app.py --server.port=8503 --server.address=0.0.0.0
```

You should now have the following contents in your `docker-genai/yt-whisper/` directory.

```text
├── docker-genai/yt-whisper/
│ ├── scripts/
│ ├── tests/
│ ├── yt_whisper/
│ ├── README.md
│ ├── poetry.lock
│ ├── pyproject.toml
│ ├── README.Docker.md
│ ├── .dockerignore
│ ├── compose.yaml
│ └── Dockerfile
```

## Step 3: Explore and update the Docker assets

`docker init` creates Docker assets to help you get started. Depending on your application, you may need to modify the assets. In the following sections, you'll explore and update the Docker assets.

### Explore and update the Dockerfile

1. Open the `docker-genai/yt-whisper/Dockerfile` in a code or text editor.
2. Inspect the contents of the `Dockerfile`. First, notice that the `Dockerfile`
   is more extensive than the Dockerfile from
   [Explore and run a GenAI video transcription app](run.md). It's more
   extensive because `docker init` implements several [best practices](../../../develop/develop-images/dockerfile_best-practices.md)
   for creating production images. Next, notice the instructions specify a
   `requirements.txt` file for the packages. This particular application uses
   Poetry, so you'll need to update the relevant instructions.
3. Update the `Dockerfile` for Poetry. The following is the updated
   `Dockerfile`. 
   ```dockerfile{hl_lines=["36-40"]}
   # syntax=docker/dockerfile:1

   # Comments are provided throughout this file to help you get started.
   # If you need more help, visit the Dockerfile reference guide at
   # https://docs.docker.com/go/dockerfile-reference/

   ARG PYTHON_VERSION=3.11
   FROM python:${PYTHON_VERSION}-slim as base

   # Prevents Python from writing pyc files.
   ENV PYTHONDONTWRITEBYTECODE=1

   # Keeps Python from buffering stdout and stderr to avoid situations where
   # the application crashes without emitting any logs due to buffering.
   ENV PYTHONUNBUFFERED=1

   WORKDIR /app

   # Create a non-privileged user that the app will run under.
   # See https://docs.docker.com/go/dockerfile-user-best-practices/
   ARG UID=10001
   RUN adduser \
       --disabled-password \
       --gecos "" \
       --home "/nonexistent" \
       --shell "/sbin/nologin" \
       --no-create-home \
       --uid "${UID}" \
       appuser

   # Download dependencies as a separate step to take advantage of Docker's caching.
   # Leverage a cache mount to /root/.cache/pip to speed up subsequent builds.
   # Leverage bind mounts to pyproject.toml, poetry.lock, and the source files
   # to avoid having to copy them into this layer.
   RUN --mount=type=cache,target=/root/.cache/pip \
       --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
       --mount=type=bind,source=poetry.lock,target=poetry.lock \
       --mount=type=bind,source=yt_whisper,target=yt_whisper \
       --mount=type=bind,source=README.md,target=README.md \
       python -m pip install .

   # Switch to the non-privileged user to run the application.
   USER appuser

   # Copy the source code into the container.
   COPY . .

   # Expose the port that the application listens on.
   EXPOSE 8503

   # Run the application.
   CMD streamlit run yt_whisper/app.py --server.port=8503 --server.address=0.0.0.0
   ```
4. Save and close the file.

### Explore and update the .dockerignore file

One new file that you haven't explored yet is the `.dockerignore` file. 
The `.dockerignore` file excludes files and directories from the build context, so they aren't sent to the builder or copied into the image. For more details, see [.dockerignore files](../../../build/building/context.md#dockerignore-files).

1. Open the `docker-genai/yt-whisper/.dockerignore` file in a code or text
   editor.
2. Inspect the contents of the `.dockerignore` file. Notice that it has
   `README.md`, but the `Dockerfile` you just updated mounts this file during
   the build.
3. Update the `.dockerignore` file and remove `README.md`. The following is the
   updated `.dockerignore` file.
   ```text
   # Include any files or directories that you don't want to be copied to your
   # container here (e.g., local build artifacts, temporary files, etc.).
   #
   # For more help, visit the .dockerignore file reference guide at
   # https://docs.docker.com/go/build-context-dockerignore/

   **/.DS_Store
   **/__pycache__
   **/.venv
   **/.classpath
   **/.env
   **/.git
   **/.gitignore
   **/.project
   **/.settings
   **/.toolstarget
   **/.vs
   **/.vscode
   **/*.*proj.user
   **/*.dbmdl
   **/*.jfm
   **/bin
   **/charts
   **/docker-compose*
   **/compose*
   **/Dockerfile*
   **/node_modules
   **/npm-debug.log
   **/obj
   **/secrets.dev.yaml
   **/values.dev.yaml
   LICENSE
   ```
4. Save and close the file.

### Explore and update the Compose file

1. Open the `docker-genai/yt-whisper/compose.yaml` file in a code or text
   editor.
2. Inspect the contents of the `compose.yaml` file. Notice that the environment
   variables file isn't specified. Also, notice that only one service named `server` is specified.
3. Update the `compose.yaml` file and add the environment variables file. The
   following is the updated `compose.yaml` file.
   ```yaml{hl_lines=["16-17"]}
   # Comments are provided throughout this file to help you get started.
   # If you need more help, visit the Docker Compose reference guide at
   # https://docs.docker.com/go/compose-spec-reference/

   # Here the instructions define your application as a service called "server". 
   # This service is built from the Dockerfile in the current directory.
   # You can add other services your application may depend on here, such as a
   # database or a cache. For examples, see the Awesome Compose repository:
   # https://github.com/docker/awesome-compose
   services:
     server:
       build:
         context: .
       ports:
         - 8503:8503
       env_file:
         - ../.env
   # ...
   ```

## Step 4: Run the containerized generative AI application

To run the application, in a terminal, change directory to the `docker-genai/yt-whisper/` directory and run the following command.

```console
docker compose up --build
```

Docker builds the image and runs it as a container. Once the container starts, you can access the yt-whisper application by opening a web browser and navigating to [localhost:8503](http://localhost:8503).

Only the service for the yt-whisper app is started. You should be unable to access the bot at [localhost:8504](http://localhost:8504).

Stop the container by pressing `CTRL`+`C` in the terminal.

To start both the bot and yt-whisper services, you can use the existing Compose file in the root of the `docker-genai/` directory. Change directory to `docker-genai/` and run the following command to bring up both services.

```console
docker compose up --build
```

Stop the containers by pressing `CTRL`+`C` in the terminal.

## Summary

In this section you learned how to containerize and run a generative AI application.

Related information:
* [docker init CLI reference](../../../engine/reference/commandline/init.md)
* [Dockerfile reference](/engine/reference/builder/)
* [Compose overview](../../../compose/_index.md)
* [.dockerignore files](../../../build/building/context.md#dockerignore-files)

## Next steps

* Try to containerize the docker-bot in the example application using `docker
  init`. 
+* See the [Python language-specific guide](../../../language/python/_index.md) to learn how to configure CI/CD and deploy Python apps locally to Kubernetes. +* See the [GenAI Stack](https://github.com/docker/genai-stack) demo applications for more examples of containerized generative AI applications. \ No newline at end of file diff --git a/content/guides/use-case/genai-video-transcription/run.md b/content/guides/use-case/genai-video-transcription/run.md new file mode 100644 index 000000000000..3783f16c656e --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/run.md @@ -0,0 +1,157 @@ +--- +title: Explore and run a GenAI video transcription app +keywords: containers, images, python, genai, pinecone, llm, openai, whisper, dockerfiles, build +description: Learn how to build and run containerized generative AI applications that transcribe videos +--- +

## Overview

In this section you're going to look at a containerized generative AI application that does video transcription. You'll learn how to build and run the app using Docker.

## Prerequisites

* You have installed the latest version of [Docker Desktop](../../../get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
* You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line-based Git client, but you can use any client.
* You have a [Pinecone](https://www.pinecone.io/) and an [OpenAI](https://openai.com/) account.
   > **Important**
   >
   > OpenAI and Pinecone are third-party hosted services and charges may apply
   > when using the services.
   { .important }

## Step 1: Clone and meet the example app

Run the following command in a terminal to clone the application repository to your local machine. 
+ +```console
$ git clone https://github.com/Davidnet/docker-genai.git
```

This sample repository contains the following two applications:

* docker-bot: A question-answering service that leverages both a vector database and an AI model to provide responses.
* yt-whisper: A YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone vector database.

The applications use the following third-party services, tools, and
packages:
* [OpenAI](https://openai.com/) for chat completion and speech recognition using
  Whisper.
* [Langchain](https://www.langchain.com/) for managing and orchestrating the
  models.
* [Streamlit](https://streamlit.io/) for the UI.
* [Pinecone](https://www.pinecone.io/) for the embeddings in a vector database.
* [pytube](https://github.com/pytube/pytube) for downloading the YouTube videos.
* [Poetry](https://python-poetry.org/) for dependency management.

This guide explores the yt-whisper application, but you can also apply the instructions to the docker-bot application.

## Step 2: Explore the generative AI application

The example applications in the repository have already been containerized. Before you run the applications, inspect the source code and Docker assets.

### Explore the app.py file

In the `docker-genai/yt-whisper/yt_whisper` directory is the `app.py` file. This is the entrypoint, or main script, for the application. Open the `app.py` file in a code or text editor.

The main thing to notice about this app is that it requires the following
environment variables to run: `OPENAI_TOKEN`, `PINECONE_TOKEN`, and
`PINECONE_ENVIRONMENT`. In one of the following steps, you'll learn more about
these environment variables and how you can specify them
when running the container.

### Explore the Dockerfile

In the `docker-genai/yt-whisper/` directory is the `Dockerfile`. 
The `Dockerfile` is a text document that contains all the instructions used to create an image. Open the `Dockerfile` in a code or text editor. + +The following describes the instructions used in this `Dockerfile`: +* `FROM` specifies the base image to use. The base image is the initial image that your Dockerfile is based on. In this case, it's the [Docker Official Image for Python](https://hub.docker.com/_/python). You can explore [Docker Hub](https://hub.docker.com/) to find more pre-made images. +* `WORKDIR` sets the working directory for any instructions that follow it. +* `COPY` copies files into the image from your host machine. +* `RUN` runs commands when building the image, such as installing packages. +* `EXPOSE` informs Docker that the container listens on the specified network + ports at runtime. +* `HEALTHCHECK` tells Docker how to test a container to check that it's still + working. +* `ENTRYPOINT` specifies which process to run when the container is started. + +For more details about the Dockerfile instructions, see the [Dockerfile reference](/engine/reference/builder/). + +### Explore the Compose file + +In the `docker-genai/` directory is the `docker-compose.yaml` file. The Compose file is a YAML file that you can use to configure your application's services. Open the `docker-compose.yaml` file in a code or text editor. + +This Compose file specifies two services, `bot` and `yt-whisper`. Under those services it defines where the Dockerfiles are located, which ports to expose, and the environment file that contains environment variables that the application needs. + +One thing to notice is that there is no `.env` file contained in the repository. +Create the `.env` file now. + +## Step 3: Create the .env file + +You can use a `.env` file to [set environment variables with Compose](../../../compose/environment-variables/set-environment-variables.md). +In the `docker-genai/` directory, create a new file named `.env`. 
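If you'd rather work in the terminal, a shell sketch like the following could create the file with the three variables this guide uses. The values shown are placeholders to replace with your own keys, as described in the rest of this step:

```shell
# Create a .env file (run this inside the docker-genai/ directory).
# Replace the your-api-key placeholder values with your personal API keys.
cat > .env <<'EOF'
OPENAI_TOKEN=your-api-key
PINECONE_TOKEN=your-api-key
PINECONE_ENVIRONMENT=us-west1-gcp-free
EOF
```
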
Open the +`.env` file in a code or text editor and specify the following environment +variables. + +```text +#---------------------------------------------------------------------------- +# OpenAI +#---------------------------------------------------------------------------- +OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key + +#---------------------------------------------------------------------------- +# Pinecone +#---------------------------------------------------------------------------- +PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key +PINECONE_ENVIRONMENT=us-west1-gcp-free +``` + +To learn more about the values of the environment variables, see the following: +* `OPENAI_TOKEN` is your [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key). +* `PINECONE_TOKEN` is your [Pinecone API key](https://docs.pinecone.io/docs/authentication). +* `PINECONE_ENVIRONMENT` is the [Pinecone cloud environment](https://docs.pinecone.io/docs/projects#project-environment). + +## Step 4: Run the generative AI application + +To build and run the application, in a terminal, change directory to the `docker-genai/` directory and run the following command. + +```console +docker compose up --build +``` + +Docker Compose builds the images and runs them as containers. Depending on your network connection, it may take several minutes to download the dependencies. + +You should see output similar to the following in the terminal after Docker starts the containers. + +```console +bot-1 | You can now view your Streamlit app in your browser. +bot-1 | +bot-1 | URL: http://0.0.0.0:8504 +bot-1 | +yt-whisper-1 | +yt-whisper-1 | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False. +yt-whisper-1 | +yt-whisper-1 | +yt-whisper-1 | You can now view your Streamlit app in your browser. 
+yt-whisper-1 | +yt-whisper-1 | URL: http://0.0.0.0:8503 +yt-whisper-1 | +```

Once the containers start, you can access the yt-whisper application by opening a web browser and navigating to [localhost:8503](http://localhost:8503). Specify the URL to a short YouTube video, for example the Docker in 100 seconds video at [https://www.youtube.com/watch?v=IXifQ8mX8DE](https://www.youtube.com/watch?v=IXifQ8mX8DE), and then select **Submit**.

Once the video has been processed, open the docker-bot application at [localhost:8504](http://localhost:8504). Ask a question about your video, and the bot answers it.

Stop the containers by pressing `CTRL`+`C` in the terminal.

## Summary

At this point, you have explored the Docker assets required to build and run a containerized application. You can create the assets from scratch, as the author of this application did, or use the `docker init` command to help get the process started.

Related information:
* [Dockerfile reference](/engine/reference/builder/)
* [Compose overview](../../../compose/_index.md)

## Next steps

Continue to the next section to learn how you can containerize generative AI applications using Docker.

{{< button text="Containerize a GenAI app" url="containerize.md" >}} \ No newline at end of file diff --git a/data/toc.yaml b/data/toc.yaml index cb62c21c6b02..201ff84f6511 100644 --- a/data/toc.yaml +++ b/data/toc.yaml @@ -152,6 +152,18 @@ Guides: path: /language/php/configure-ci-cd/ - title: "Test your deployment" path: /language/php/deploy/ + +- sectiontitle: Use-case guides + section: + - sectiontitle: Generative AI - video transcription + section: + - path: /guides/use-case/genai-video-transcription/ + title: Overview + - path: /guides/use-case/genai-video-transcription/run/ + title: Explore and run the app + - path: /guides/use-case/genai-video-transcription/containerize/ + title: Containerize the app + - sectiontitle: Develop with Docker section: - path: /develop/