From e23df9c7cffe4c2fdb4c3472671c634b922890f4 Mon Sep 17 00:00:00 2001 From: Craig Osterhout Date: Tue, 16 Jan 2024 10:42:15 -0800 Subject: [PATCH] add genai vid transcription guide Signed-off-by: Craig Osterhout --- .../genai-video-transcription/_index.md | 18 ++ .../genai-video-transcription/containerize.md | 256 ++++++++++++++++++ .../use-case/genai-video-transcription/run.md | 157 +++++++++++ data/toc.yaml | 12 + 4 files changed, 443 insertions(+) create mode 100644 content/guides/use-case/genai-video-transcription/_index.md create mode 100644 content/guides/use-case/genai-video-transcription/containerize.md create mode 100644 content/guides/use-case/genai-video-transcription/run.md diff --git a/content/guides/use-case/genai-video-transcription/_index.md b/content/guides/use-case/genai-video-transcription/_index.md new file mode 100644 index 000000000000..2099b2a9899b --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/_index.md @@ -0,0 +1,18 @@ +--- +description: Learn how to containerize a generative AI (GenAI) app that does video transcription. +keywords: python, generative ai, genai, llm, pinecone, openai, whisper, langchain +title: Generative AI - video transcription +toc_min: 1 +toc_max: 2 +--- + +In this guide you'll explore and run a containerized generative AI (GenAI) application that uses remote third-party services, like OpenAI and Pinecone, to transcribe videos and answer questions about them. You'll then learn how to containerize GenAI applications. + +> **Acknowledgment** +> +> Docker would like to thank [David Cardozo](https://www.davidcardozo.com/) for +> his contribution to this guide. + +Start by exploring and running an existing containerized GenAI application. 
+ +{{< button text="Explore and run a GenAI app" url="run.md" >}} \ No newline at end of file diff --git a/content/guides/use-case/genai-video-transcription/containerize.md b/content/guides/use-case/genai-video-transcription/containerize.md new file mode 100644 index 000000000000..0819a1806e4e --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/containerize.md @@ -0,0 +1,256 @@ +--- +title: Containerize a GenAI video transcription app +keywords: python, generative ai, genai, llm, pinecone, openai, whisper, langchain +description: Learn how to containerize generative AI video transcription applications. +--- +## Overview

In this section you're going to walk through the process of containerizing a video transcription app that uses generative AI.

## Prerequisites

Work through the steps of the [Explore and run a GenAI video transcription app](run.md) section to get and configure the example application used in this section.

## Step 1: Prepare the example application

Before you walk through the process of creating Docker assets, delete the
current assets in the example application. This section only explores the
yt-whisper application, so delete the existing `Dockerfile` inside the
`docker-genai/yt-whisper/` directory.

You should now have the following contents inside the `docker-genai/yt-whisper/` directory.

```text
├── docker-genai/yt-whisper/
│ ├── scripts/
│ ├── tests/
│ ├── yt_whisper/
│ ├── README.md
│ ├── poetry.lock
│ └── pyproject.toml
```

## Step 2: Initialize Docker assets

Use `docker init` to create the necessary Docker assets to containerize your application. Inside the `docker-genai/yt-whisper/` directory, run the `docker init` command in a terminal. `docker init` provides some default configuration, but you'll need to answer a few questions about your application. For example, this application uses Streamlit to run. Refer to the following `docker init` example and use the same answers for your prompts. 
+ +```console
docker init
Welcome to the Docker Init CLI!

This utility will walk you through creating the following files with sensible defaults for your project:
 - .dockerignore
 - Dockerfile
 - compose.yaml
 - README.Docker.md

Let's get started!

? What application platform does your project use? Python
? What version of Python do you want to use? 3.11
? What port do you want your app to listen on? 8503
? What is the command to run your app? streamlit run yt_whisper/app.py --server.port=8503 --server.address=0.0.0.0
```

You should now have the following contents in your `docker-genai/yt-whisper/` directory.

```text
├── docker-genai/yt-whisper/
│ ├── scripts/
│ ├── tests/
│ ├── yt_whisper/
│ ├── README.md
│ ├── poetry.lock
│ ├── pyproject.toml
│ ├── README.Docker.md
│ ├── .dockerignore
│ ├── compose.yaml
│ └── Dockerfile
```

## Step 3: Explore and update the Docker assets

`docker init` creates Docker assets to help you get started. Depending on your application, you may need to modify the assets. In the following sections, you'll explore and update the Docker assets.

### Explore and update the Dockerfile

1. Open the `docker-genai/yt-whisper/Dockerfile` in a code or text editor.
2. Inspect the contents of the `Dockerfile`. First, notice that the `Dockerfile`
   is more extensive than the Dockerfile from
   [Explore and run a GenAI video transcription app](run.md). It's more
   extensive because `docker init` implements several [best practices](../../../develop/develop-images/dockerfile_best-practices.md)
   for creating production images. Next, notice the instructions specify a
   `requirements.txt` file for the packages. This particular application uses
   Poetry, so you'll need to update the relevant instructions.
3. Update the `Dockerfile` for Poetry. The following is the updated
   `Dockerfile`. 
   ```dockerfile{hl_lines=["36-40"]}
   # syntax=docker/dockerfile:1

   # Comments are provided throughout this file to help you get started.
   # If you need more help, visit the Dockerfile reference guide at
   # https://docs.docker.com/go/dockerfile-reference/

   ARG PYTHON_VERSION=3.11
   FROM python:${PYTHON_VERSION}-slim as base

   # Prevents Python from writing pyc files.
   ENV PYTHONDONTWRITEBYTECODE=1

   # Keeps Python from buffering stdout and stderr to avoid situations where
   # the application crashes without emitting any logs due to buffering.
   ENV PYTHONUNBUFFERED=1

   WORKDIR /app

   # Create a non-privileged user that the app will run under.
   # See https://docs.docker.com/go/dockerfile-user-best-practices/
   ARG UID=10001
   RUN adduser \
       --disabled-password \
       --gecos "" \
       --home "/nonexistent" \
       --shell "/sbin/nologin" \
       --no-create-home \
       --uid "${UID}" \
       appuser

   # Download dependencies as a separate step to take advantage of Docker's caching.
   # Leverage a cache mount to /root/.cache/pip to speed up subsequent builds.
   # Leverage bind mounts to pyproject.toml, poetry.lock, and the source files
   # to avoid having to copy them into this layer.
   RUN --mount=type=cache,target=/root/.cache/pip \
       --mount=type=bind,source=pyproject.toml,target=pyproject.toml \
       --mount=type=bind,source=poetry.lock,target=poetry.lock \
       --mount=type=bind,source=yt_whisper,target=yt_whisper \
       --mount=type=bind,source=README.md,target=README.md \
       python -m pip install .

   # Switch to the non-privileged user to run the application.
   USER appuser

   # Copy the source code into the container.
   COPY . .

   # Expose the port that the application listens on.
   EXPOSE 8503

   # Run the application.
   CMD streamlit run yt_whisper/app.py --server.port=8503 --server.address=0.0.0.0
   ```
4. Save and close the file.

### Explore and update the .dockerignore file

One new file that you haven't explored yet is the `.dockerignore` file. 
The `.dockerignore` file excludes files and directories from the build context, so they aren't sent to the builder or copied into the image. For more details, see [.dockerignore files](../../../build/building/context.md#dockerignore-files).

1. Open the `docker-genai/yt-whisper/.dockerignore` file in a code or text
   editor.
2. Inspect the contents of the `.dockerignore` file. Notice that it has
   `README.md`, but the `Dockerfile` you just updated mounts this file during
   the build.
3. Update the `.dockerignore` file and remove `README.md`. The following is the
   updated `.dockerignore` file.
   ```text
   # Include any files or directories that you don't want to be copied to your
   # container here (e.g., local build artifacts, temporary files, etc.).
   #
   # For more help, visit the .dockerignore file reference guide at
   # https://docs.docker.com/go/build-context-dockerignore/

   **/.DS_Store
   **/__pycache__
   **/.venv
   **/.classpath
   **/.env
   **/.git
   **/.gitignore
   **/.project
   **/.settings
   **/.toolstarget
   **/.vs
   **/.vscode
   **/*.*proj.user
   **/*.dbmdl
   **/*.jfm
   **/bin
   **/charts
   **/docker-compose*
   **/compose*
   **/Dockerfile*
   **/node_modules
   **/npm-debug.log
   **/obj
   **/secrets.dev.yaml
   **/values.dev.yaml
   LICENSE
   ```
4. Save and close the file.

### Explore and update the Compose file

1. Open the `docker-genai/yt-whisper/compose.yaml` file in a code or text
   editor.
2. Inspect the contents of the `compose.yaml` file. Notice that the environment
   variables file isn't specified. Also, notice that only one service named `server` is specified.
3. Update the `compose.yaml` file and add the environment variables file. The
   following is the updated `compose.yaml` file.
   ```yaml{hl_lines=["16-17"]}
   # Comments are provided throughout this file to help you get started.
   # If you need more help, visit the Docker Compose reference guide at
   # https://docs.docker.com/go/compose-spec-reference/

   # Here the instructions define your application as a service called "server". 
   # This service is built from the Dockerfile in the current directory.
   # You can add other services your application may depend on here, such as a
   # database or a cache. For examples, see the Awesome Compose repository:
   # https://github.com/docker/awesome-compose
   services:
     server:
       build:
         context: .
       ports:
         - 8503:8503
       env_file:
         - ../.env
   # ...
   ```

## Step 4: Run the containerized generative AI application

To run the application, in a terminal, change directory to the `docker-genai/yt-whisper/` directory and run the following command.

```console
docker compose up --build
```

Docker builds the image and runs it as a container. Once the container starts, you can access the yt-whisper application by opening a web browser and navigating to [localhost:8503](http://localhost:8503).

Only the service for the yt-whisper app is started. You should be unable to access the bot at [localhost:8504](http://localhost:8504).

Stop the container by pressing `CTRL`+`C` in the terminal.

To start both the bot and yt-whisper services, you can use the existing Compose file in the root of the `docker-genai/` directory. Change directory to `docker-genai/` and run the following command to bring up both services.

```console
docker compose up --build
```

Stop the containers by pressing `CTRL`+`C` in the terminal.

## Summary

In this section you learned how to containerize and run a generative AI application.

Related information:
* [docker init CLI reference](../../../engine/reference/commandline/init.md)
* [Dockerfile reference](/engine/reference/builder/)
* [Compose overview](../../../compose/_index.md)
* [.dockerignore files](../../../build/building/context.md#dockerignore-files)

## Next steps

* Try to containerize the docker-bot in the example application using `docker
  init`. 
+* See the [Python language-specific guide](../../../language/python/_index.md) to learn how to configure CI/CD and deploy Python apps locally to Kubernetes. +* See the [GenAI Stack](https://github.com/docker/genai-stack) demo applications for more examples of containerized generative AI applications. \ No newline at end of file diff --git a/content/guides/use-case/genai-video-transcription/run.md b/content/guides/use-case/genai-video-transcription/run.md new file mode 100644 index 000000000000..3783f16c656e --- /dev/null +++ b/content/guides/use-case/genai-video-transcription/run.md @@ -0,0 +1,157 @@ +--- +title: Explore and run a GenAI video transcription app +keywords: containers, images, python, genai, pinecone, llm, openai, whisper, dockerfiles, build +description: Learn how to build and run containerized generative AI applications that transcribe videos +--- +

## Overview

In this section you're going to look at a containerized generative AI application that does video transcription. You'll learn how to build and run the app using Docker.

## Prerequisites

* You have installed the latest version of [Docker Desktop](../../../get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
* You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line-based Git client, but you can use any client.
* You have a [Pinecone](https://www.pinecone.io/) and an [OpenAI](https://openai.com/) account.
   > **Important**
   >
   > OpenAI and Pinecone are third-party hosted services and charges may apply
   > when using the services.
   { .important }

## Step 1: Clone and meet the example app

Run the following command in a terminal to clone the application repository to your local machine. 
+ +```console
$ git clone https://github.com/Davidnet/docker-genai.git
```

This sample repository contains the following two applications:

* docker-bot: A question-answering service that leverages both a vector database and an AI model to provide responses.
* yt-whisper: A YouTube video processing service that uses the OpenAI Whisper model to generate transcriptions of videos and stores them in a Pinecone vector database.

The applications use the following third-party services, tools, and
packages:
* [OpenAI](https://openai.com/) for chat completion and speech recognition using
  Whisper.
* [Langchain](https://www.langchain.com/) for managing and orchestrating the
  models.
* [Streamlit](https://streamlit.io/) for the UI.
* [Pinecone](https://www.pinecone.io/) for the embeddings in a vector database.
* [pytube](https://github.com/pytube/pytube) for downloading the YouTube videos.
* [Poetry](https://python-poetry.org/) for dependency management.

This guide explores the yt-whisper application, but you can also apply the instructions to the docker-bot application.

## Step 2: Explore the generative AI application

The example applications in the repository have already been containerized. Before you run the applications, inspect the source code and Docker assets.

### Explore the app.py file

In the `docker-genai/yt-whisper/yt_whisper` directory is the `app.py` file. This is the entrypoint, or main script, for the application. Open the `app.py` file in a code or text editor.

The main thing to notice about this app is that it requires the following
environment variables to run: `OPENAI_TOKEN`, `PINECONE_TOKEN`, and
`PINECONE_ENVIRONMENT`. In one of the following steps, you'll learn more about
these environment variables and how you can specify them
when running the container.

### Explore the Dockerfile

In the `docker-genai/yt-whisper/` directory is the `Dockerfile`. 
The `Dockerfile` is a text document that contains all the instructions used to create an image. Open the `Dockerfile` in a code or text editor. + +The following describes the instructions used in this `Dockerfile`: +* `FROM` specifies the base image to use. The base image is the initial image that your Dockerfile is based on. In this case, it's the [Docker Official Image for Python](https://hub.docker.com/_/python). You can explore [Docker Hub](https://hub.docker.com/) to find more pre-made images. +* `WORKDIR` sets the working directory for any instructions that follow it. +* `COPY` copies files into the image from your host machine. +* `RUN` runs commands when building the image, such as installing packages. +* `EXPOSE` informs Docker that the container listens on the specified network + ports at runtime. +* `HEALTHCHECK` tells Docker how to test a container to check that it's still + working. +* `ENTRYPOINT` specifies which process to run when the container is started. + +For more details about the Dockerfile instructions, see the [Dockerfile reference](/engine/reference/builder/). + +### Explore the Compose file + +In the `docker-genai/` directory is the `docker-compose.yaml` file. The Compose file is a YAML file that you can use to configure your application's services. Open the `docker-compose.yaml` file in a code or text editor. + +This Compose file specifies two services, `bot` and `yt-whisper`. Under those services it defines where the Dockerfiles are located, which ports to expose, and the environment file that contains environment variables that the application needs. + +One thing to notice is that there is no `.env` file contained in the repository. +Create the `.env` file now. + +## Step 3: Create the .env file + +You can use a `.env` file to [set environment variables with Compose](../../../compose/environment-variables/set-environment-variables.md). +In the `docker-genai/` directory, create a new file named `.env`. 
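If you'd rather work in the terminal, a shell sketch like the following could create the file with the three variables this guide uses. The values shown are placeholders to replace with your own keys, as described in the rest of this step:

```shell
# Create a .env file (run this inside the docker-genai/ directory).
# Replace the your-api-key placeholder values with your personal API keys.
cat > .env <<'EOF'
OPENAI_TOKEN=your-api-key
PINECONE_TOKEN=your-api-key
PINECONE_ENVIRONMENT=us-west1-gcp-free
EOF
```
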
Open the +`.env` file in a code or text editor and specify the following environment +variables. + +```text +#---------------------------------------------------------------------------- +# OpenAI +#---------------------------------------------------------------------------- +OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key + +#---------------------------------------------------------------------------- +# Pinecone +#---------------------------------------------------------------------------- +PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key +PINECONE_ENVIRONMENT=us-west1-gcp-free +``` + +To learn more about the values of the environment variables, see the following: +* `OPENAI_TOKEN` is your [OpenAI API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-api-key). +* `PINECONE_TOKEN` is your [Pinecone API key](https://docs.pinecone.io/docs/authentication). +* `PINECONE_ENVIRONMENT` is the [Pinecone cloud environment](https://docs.pinecone.io/docs/projects#project-environment). + +## Step 4: Run the generative AI application + +To build and run the application, in a terminal, change directory to the `docker-genai/` directory and run the following command. + +```console +docker compose up --build +``` + +Docker Compose builds the images and runs them as containers. Depending on your network connection, it may take several minutes to download the dependencies. + +You should see output similar to the following in the terminal after Docker starts the containers. + +```console +bot-1 | You can now view your Streamlit app in your browser. +bot-1 | +bot-1 | URL: http://0.0.0.0:8504 +bot-1 | +yt-whisper-1 | +yt-whisper-1 | Collecting usage statistics. To deactivate, set browser.gatherUsageStats to False. +yt-whisper-1 | +yt-whisper-1 | +yt-whisper-1 | You can now view your Streamlit app in your browser. 
+yt-whisper-1 | +yt-whisper-1 | URL: http://0.0.0.0:8503 +yt-whisper-1 | +```

Once the containers start, you can access the yt-whisper application by opening a web browser and navigating to [localhost:8503](http://localhost:8503). Specify the URL to a short YouTube video, for example the Docker in 100 seconds video at [https://www.youtube.com/watch?v=IXifQ8mX8DE](https://www.youtube.com/watch?v=IXifQ8mX8DE), and then select **Submit**.

Once the video has been processed, open the docker-bot application at [localhost:8504](http://localhost:8504). Ask a question about your video, and the bot answers it.

Stop the containers by pressing `CTRL`+`C` in the terminal.

## Summary

At this point, you have explored the Docker assets required to build and run a containerized application. You can create the assets from scratch, as the author of this application did, or use the `docker init` command to help get the process started.

Related information:
* [Dockerfile reference](/engine/reference/builder/)
* [Compose overview](../../../compose/_index.md)

## Next steps

Continue to the next section to learn how you can containerize generative AI applications using Docker.

{{< button text="Containerize a GenAI app" url="containerize.md" >}} \ No newline at end of file diff --git a/data/toc.yaml b/data/toc.yaml index cb62c21c6b02..201ff84f6511 100644 --- a/data/toc.yaml +++ b/data/toc.yaml @@ -152,6 +152,18 @@ Guides: path: /language/php/configure-ci-cd/ - title: "Test your deployment" path: /language/php/deploy/ + +- sectiontitle: Use-case guides + section: + - sectiontitle: Generative AI - video transcription + section: + - path: /guides/use-case/genai-video-transcription/ + title: Overview + - path: /guides/use-case/genai-video-transcription/run/ + title: Explore and run the app + - path: /guides/use-case/genai-video-transcription/containerize/ + title: Containerize the app + - sectiontitle: Develop with Docker section: - path: /develop/