# Dockerfile Breakdown and Documentation

Generated on 2024-10-29

This notebook provides a detailed explanation of each section of the Dockerfile, including commands, conditions, and chained logic to help understand each line of code.

## **Section 1: Base Image & Environment Setup**

This section sets up the base image, environment variables, and working directory.

These settings keep the image small, avoid interactive prompts during the build, and make Python output appear in the container logs as soon as it is written.

* `FROM python:3.11-slim`: Uses the lightweight Python 3.11 slim base image.

* `ENV DEBIAN_FRONTEND=noninteractive`: Prevents interactive prompts during package installations.

* `ENV PYTHONUNBUFFERED=1`: Disables Python's output buffering so logs appear immediately.

* `ENV PYTHONDONTWRITEBYTECODE=1`: Saves space by not generating `.pyc` files.

* `WORKDIR /app`: Sets working directory for all project files.

In [None]:

# Use Python 3.11 slim as the base image to keep the container lightweight.
# Slim variants remove unnecessary tools, reducing image size.
FROM python:3.11-slim

# Set the environment variable to avoid interactive prompts during installations.
ENV DEBIAN_FRONTEND=noninteractive

# Disable Python’s output buffering to ensure that logs are displayed instantly.
ENV PYTHONUNBUFFERED=1

# Prevent Python from generating .pyc files to save space.
ENV PYTHONDONTWRITEBYTECODE=1

# Set the working directory to /app.
WORKDIR /app


## **Section 2: System Packages Installation**

In this section, necessary system packages are installed using `apt-get`. These packages support essential operations within the container.

Commands are chained with `&&` so that each one runs only if the previous one succeeded; a failure anywhere in the chain fails the build. The apt package lists are removed in the same `RUN` instruction, so they never end up in the image layer.

* `apt-get update && apt-get install -y`: Updates package list and installs necessary packages.

* `wget`, `bzip2`, `ca-certificates`, `build-essential`, `cmake`: Tools for downloading, compression, HTTPS, and building software.

* `rm -rf /var/lib/apt/lists/*`: Removes the downloaded apt package lists to reduce image size.

In [None]:

# Update the package list and install essential packages for building and running software.
RUN apt-get update && apt-get install -y \
    wget \
    bzip2 \
    ca-certificates \
    build-essential \
    cmake \
    && rm -rf /var/lib/apt/lists/*
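
The short-circuit behavior of `&&` can be checked in any POSIX shell, outside Docker. The sketch below is only an illustration: `step` is a hypothetical helper (not part of the Dockerfile) that prints its name and returns a chosen exit status. The second step is made to fail, so the third should never run.

```shell
# Hypothetical helper: print the step name, then return the given exit status.
step() {
    echo "ran: $1"
    return "$2"
}

# The "install" step fails (status 1), so "cleanup" is skipped, in the same
# way a failed `apt-get install` would stop the rest of the RUN instruction.
chain_output=$(step update 0 && step install 1 && step cleanup 0) \
    && chain_status=0 || chain_status=$?

echo "$chain_output"
echo "exit status: $chain_status"
```

The same rule explains why the cleanup in the `RUN` instruction only happens after a successful install: a non-zero status stops the chain and Docker aborts the build.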



## **Section 3: Miniforge Installation (Miniconda Alternative)**

This block downloads and runs the Miniforge installer that matches the system architecture (x86_64 or aarch64, i.e. 64-bit ARM). A conditional `if`/`elif`/`else` structure fetches the appropriate installer, aborts the build on any other architecture, and removes the installer script after installation.

* `arch=$(uname -m)`: Detects system architecture.

* `if [ "${arch}" = "x86_64" ] ... elif [ "${arch}" = "aarch64" ] ... else`: Downloads the matching Miniforge installer, or exits with status 1 on any other architecture.

* `bash miniforge.sh -b -p /opt/miniforge`: Installs Miniforge in batch mode (`-b`) into `/opt/miniforge` (`-p`).

* `rm miniforge.sh`: Deletes the installer to free up space.

In [None]:

# Detect system architecture and install appropriate Miniforge version.
RUN arch=$(uname -m) && \
    if [ "${arch}" = "x86_64" ]; then \
        wget -q "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh" -O miniforge.sh; \
    elif [ "${arch}" = "aarch64" ]; then \
        wget -q "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh" -O miniforge.sh; \
    else \
        echo "Unsupported architecture: ${arch}"; \
        exit 1; \
    fi && \
    bash miniforge.sh -b -p /opt/miniforge && \
    rm miniforge.sh
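
The architecture check can be tried on its own, without downloading anything. The following is a minimal sketch, assuming only the two installer file names that appear in the URLs above; the function name `miniforge_installer_for` is hypothetical, and a `case` statement stands in for the `if`/`elif` chain.

```shell
# Map a machine architecture (as reported by `uname -m`) to the Miniforge
# installer file name used in the Dockerfile. Returns non-zero otherwise.
miniforge_installer_for() {
    case "$1" in
        x86_64)  echo "Miniforge3-Linux-x86_64.sh" ;;
        aarch64) echo "Miniforge3-Linux-aarch64.sh" ;;
        *)       echo "Unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Resolve the installer for the current machine, if it is supported.
if installer=$(miniforge_installer_for "$(uname -m)"); then
    echo "Would download: $installer"
else
    echo "No installer selected"
fi
```

Passing the architecture as an argument, rather than calling `uname -m` inside the function, makes it easy to check the mapping for a platform other than the one at hand.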


## **Section 4: Environment Setup with Mamba**

The `ENV PATH` instruction puts Miniforge's `bin` directory, and with it `mamba`, on the PATH. An environment named 'team3_env' is then created with Python 3.11, and the default shell is changed so that subsequent shell-form instructions run inside this environment.

* `ENV PATH=/opt/miniforge/bin:$PATH`: Adds Miniforge to PATH.

* `mamba create -n team3_env python=3.11 -y`: Creates a new conda environment with Python 3.11.

* `SHELL ["mamba", "run", "-n", "team3_env", "/bin/bash", "-c"]`: Runs subsequent shell-form instructions (such as `RUN`) inside the new environment.

In [None]:

# Add Miniforge to PATH and create a new environment.
ENV PATH=/opt/miniforge/bin:$PATH
RUN mamba create -n team3_env python=3.11 -y
SHELL ["mamba", "run", "-n", "team3_env", "/bin/bash", "-c"]


## **Section 5: Install Project Dependencies**

The dependencies listed in `requirements.txt` are installed with `mamba`, and two additional libraries that are not in the requirements file are installed with `pip`. Because of the `SHELL` instruction from Section 4, both `RUN` commands execute inside 'team3_env'. The mamba package cache is cleaned in the same `RUN` instruction as the install to save space.

* `COPY requirements.txt /app/requirements.txt`: Copies the requirements file into the container.

* `mamba install --yes --file requirements.txt`: Installs the packages listed in `requirements.txt`.

* `mamba clean --all -f -y`: Cleans caches and temporary files.

* `pip install rank_bm25 streamlit-pdf-viewer`: Installs additional libraries not in requirements.txt.

In [None]:

# Copy requirements.txt and install dependencies.
COPY requirements.txt /app/requirements.txt
RUN mamba install --yes --file requirements.txt && mamba clean --all -f -y
RUN pip install rank_bm25 streamlit-pdf-viewer



## **Section 6: Copy Application Files and Expose Ports**

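The build context (the project directory) is copied to `/app` inside the container, and the ports used by Streamlit and Jupyter Notebook are declared. `EXPOSE` only documents these ports; they still have to be published when the container is started before they can be reached from outside.

* `COPY . /app`: Copies the files in the build context to the container.

* `EXPOSE 5003`: Declares the port used by Streamlit.

* `EXPOSE 6003`: Declares the port used by Jupyter Notebook.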

In [None]:

# Copy files and expose necessary ports.
COPY . /app
EXPOSE 5003
EXPOSE 6003
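
To reach the declared ports from the host, they are normally published with the `-p host:container` option of `docker run`. The sketch below only assembles and prints such a command. The helper `publish_flags` and the image tag `team3-app` are hypothetical names chosen for illustration, and mapping each port to the same number on the host is an assumption.

```shell
# Hypothetical helper: turn a list of ports into `-p host:container` flags,
# mapping each container port to the same port number on the host.
publish_flags() {
    flags=""
    for port in "$@"; do
        flags="$flags -p $port:$port"
    done
    # Strip the leading space before printing.
    echo "${flags# }"
}

# Print the docker run command that would publish both declared ports.
# "team3-app" is an assumed image tag, not one named in the Dockerfile.
echo "docker run $(publish_flags 5003 6003) team3-app"
```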



## **Section 7: Update PATH for Conda Environment**

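Prepending the environment's `bin` directory to `PATH` makes executables from 'team3_env', such as `python`, take precedence over those of the base installation. This also covers exec-form instructions such as `ENTRYPOINT ["python"]`, which do not go through the `SHELL` wrapper set in Section 4.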

* `ENV PATH=/opt/miniforge/envs/team3_env/bin:$PATH`: Updates PATH to include the conda environment.

In [None]:

# Add team3_env environment’s bin directory to PATH.
ENV PATH=/opt/miniforge/envs/team3_env/bin:$PATH
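
The effect of prepending a directory to `PATH` can be reproduced with two throwaway directories. This is a sketch under stated assumptions: the directory layout and the command name `demo-tool` are hypothetical stand-ins for the base and 'team3_env' `bin` directories, and the temporary directory must allow executing scripts.

```shell
# Build two throwaway bin directories that each provide a command
# with the same (hypothetical) name, "demo-tool".
demo_root=$(mktemp -d)
mkdir -p "$demo_root/base/bin" "$demo_root/env/bin"
printf '#!/bin/sh\necho "from base"\n' > "$demo_root/base/bin/demo-tool"
printf '#!/bin/sh\necho "from env"\n'  > "$demo_root/env/bin/demo-tool"
chmod +x "$demo_root/base/bin/demo-tool" "$demo_root/env/bin/demo-tool"

# Put env/bin ahead of base/bin, mirroring the ENV PATH line, and resolve
# the command in a subshell so the caller's PATH is left untouched.
resolved=$(
    PATH="$demo_root/env/bin:$demo_root/base/bin:$PATH"
    export PATH
    demo-tool
)
echo "$resolved"

# Remove the temporary directories again.
rm -rf "$demo_root"
```

The shell searches `PATH` from left to right, so the copy in `env/bin` is the one that should run.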



## **Section 8: Define Entry Point and Default Command**

`ENTRYPOINT` makes `python` the executable that runs when the container starts, and `CMD` supplies its default argument, so the container runs `python app.py` when no arguments are given. Arguments passed to `docker run` after the image name replace `CMD` and are handed to `python` instead.

* `ENTRYPOINT ["python"]`: Sets Python as the container's entry point.

* `CMD ["app.py"]`: Supplies `app.py` as the default argument to the entry point.

In [None]:

# Set default entry point and command.
ENTRYPOINT ["python"]
CMD ["app.py"]
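
How the two instructions combine can be modeled with a small shell function. This is a simplified illustration, not Docker's implementation: `effective_command` is a hypothetical helper, and `other.py` is a made-up script name.

```shell
# Simplified model of how Docker combines exec-form ENTRYPOINT and CMD:
# arguments given to `docker run` replace CMD; otherwise CMD is used.
effective_command() {
    entrypoint=$1
    default_cmd=$2
    shift 2
    if [ "$#" -gt 0 ]; then
        echo "$entrypoint $*"
    else
        echo "$entrypoint $default_cmd"
    fi
}

effective_command python app.py            # no runtime arguments
effective_command python app.py other.py   # hypothetical script name
effective_command python app.py --version  # flags are passed to python too
```

In this model the first call yields `python app.py`, while the other two replace `app.py` with the runtime arguments. To replace `python` itself, `docker run` provides the `--entrypoint` option.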

