AlejandroPqLz/nvidia-gpu-tensorflow





            _     _ _                                     _                             __ _               
 _ ____   _(_) __| (_) __ _        __ _ _ __  _   _      | |_ ___ _ __  ___  ___  _ __ / _| | _____      __
| '_ \ \ / / |/ _` | |/ _` |_____ / _` | '_ \| | | |_____| __/ _ \ '_ \/ __|/ _ \| '__| |_| |/ _ \ \ /\ / /
| | | \ V /| | (_| | | (_| |_____| (_| | |_) | |_| |_____| ||  __/ | | \__ \ (_) | |  |  _| | (_) \ V  V / 
|_| |_|\_/ |_|\__,_|_|\__,_|      \__, | .__/ \__,_|      \__\___|_| |_|___/\___/|_|  |_| |_|\___/ \_/\_/  
                                  |___/|_|                                                                 

📜 Introduction

This repository is a step-by-step guide to using NVIDIA GPUs on Windows (WSL2) and Linux (Ubuntu), and Apple GPUs on macOS, with TensorFlow.

It covers installing the necessary drivers and the CUDA and cuDNN libraries, and configuring the environment so TensorFlow can use the GPU. It also includes a .devcontainer configuration folder to use as a template for developing your own GPU-enabled TensorFlow project in Docker, and a setup.py file for installing the project's external dependencies.

Although the .devcontainer folder and the setup.py file are templates, they are ready to use for a TensorFlow project with GPU support by following the steps in this guide. Before using them, review both files to make sure they are compatible with your project, and add your own information to setup.py (if you use one) and to the .devcontainer configuration.
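As a rough illustration, the kind of metadata such a setup.py template carries might look like the sketch below. Every value here is a placeholder, so check the actual file in the repository and fill in your own details:

```python
# Hypothetical sketch of the metadata a setup.py template carries;
# every value below is a placeholder, check the real file in the repo.
METADATA = {
    "name": "my-gpu-project",       # placeholder: your project name
    "version": "0.1.0",
    "author": "Your Name",          # placeholder: your info
    "install_requires": ["tensorflow==2.16.1"],  # TF version used in this guide
}

# A real setup.py passes this to setuptools, roughly:
#   from setuptools import setup, find_packages
#   setup(packages=find_packages(), **METADATA)
```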

📑 Guide Structure

The structure of the repository is as follows:

📦nvidia-gpu-tensorflow
 ┣ 📂.devcontainer -----------------------> # Devcontainer configuration folder template (ready to use)
 ┃ ┣ 📜devcontainer.json
 ┃ ┗ 📜Dockerfile
 ┣ 📂figures
 ┣ 📜LICENSE
 ┣ 📜README.md
 ┗ 📜setup.py -----------------------------> # External dependencies installation template file (ready to use)

💻 Setup

A Linux (Ubuntu) distribution is recommended for this project: it is the most common OS for data science and artificial intelligence work, and for that reason NVIDIA GPU setup is easiest there.

It is also the simplest way to configure and maintain the project code over time, since we will be using a Docker container. This avoids compatibility issues with the OS, and if an update or upgrade causes a problem, it can be resolved by simply rebuilding the container.

However, you can also use Windows with WSL2 or macOS. The requirements for each OS are as follows:

Windows / Linux (Ubuntu, recommended):
  • Follow the configuration steps below.

macOS:
  • macOS 12.0 or later (get the latest beta)
  • Mac computer with Apple silicon or AMD GPUs
  • Python 3.10 or later
  • Xcode command-line tools: xcode-select --install

🔧 OS Configuration

1. NVIDIA GPU Configuration (Windows and Linux)


In order to use the GPU with TensorFlow, you need to install the NVIDIA drivers, CUDA and cuDNN.

Even though this guide targets a specific TensorFlow version, and therefore only certain CUDA and cuDNN versions are compatible with it, the NVIDIA drivers themselves should be the most recent ones available for your GPU.

1.1 Install NVIDIA drivers:

Windows:
  • Download the latest NVIDIA drivers for your GPU from the NVIDIA website
  • Run the .exe installer and follow the instructions
  • Check the driver installation:
    nvidia-smi

Linux (Ubuntu):
  • Update and upgrade the system:
    sudo apt update && sudo apt upgrade
  • Remove previous NVIDIA installations:
    sudo apt autoremove nvidia* --purge
  • List the available drivers for your GPU:
    ubuntu-drivers devices
  • Install the recommended NVIDIA driver (its version is tagged with recommended):
    sudo apt-get install nvidia-driver-<driver_number>
  • Reboot the system:
    reboot
  • Check the driver installation:
    nvidia-smi

After these steps, executing the nvidia-smi command should produce output similar to the following:

user@user:~$ nvidia-smi
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3060 ...    Off |   00000000:01:00.0  On |                  N/A |
| N/A   41C    P8             15W /   70W |      73MiB /   6144MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

1.2 Install CUDA toolkit:

Download and install the CUDA toolkit following the instructions for your OS. If you run into any issues, see the CUDA installation guide:

- Windows: Install CUDA toolkit on Windows
- WSL2: Install CUDA toolkit on WSL2
- Ubuntu: Install CUDA toolkit on Ubuntu

After that, open a terminal and run the following command to check the CUDA installation:

  • For WSL2 and Ubuntu:

    sudo apt install nvidia-cuda-toolkit # to avoid any issues with the CUDA installation
    nvcc --version # to check the CUDA version
  • For Windows:

    nvcc --version # to check the CUDA version

1.3 Install cuDNN:

Install cuDNN following the instructions for your OS. If you run into any issues, see the cuDNN installation guide:

- Windows (WSL2): Install cuDNN on Windows
- Ubuntu: Install cuDNN on Ubuntu

2. Windows Subsystem for Linux (WSL2) Configuration


After installing the NVIDIA drivers, CUDA and cuDNN, if you are going to develop the project on Windows, you need to set up WSL2 to be able to use the GPU. To do this, follow the steps below:

2.1 Conda Environment

We will use conda to manage the Python environment. You can install it following the Miniconda installation guide. After installing Miniconda, create a new environment with the following commands:

    # Create the environment
    conda create -n use_gpu python=3.12 -y
    # Activate the environment
    conda activate use_gpu

2.2 CUDA and cuDNN compatible versions

Since we are setting up an environment for GPU use with TensorFlow, you need to install the CUDA and cuDNN versions that are compatible with your TensorFlow version. This guide uses TensorFlow 2.16.1, which requires CUDA 12.3 and cuDNN 8.9; if you are using a different TensorFlow version, check the TensorFlow versions compatibility table. To install them, execute the following commands:

    # Install CUDA 12.3
    conda install nvidia/label/cuda-12.3.2::cuda-toolkit
    # Install cuDNN 8.9
    conda install -c conda-forge cudnn=8.9

Finally, set the environment variables so that the CUDA and cuDNN libraries are found every time the environment is activated:

    mkdir -p $CONDA_PREFIX/etc/conda/activate.d
    echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
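To make the compatibility constraint concrete, here is a small illustrative lookup based on the TensorFlow tested-build-configurations table. Only a few versions are shown, and you should always verify against the official table before installing:

```python
# Illustrative subset of the TensorFlow tested-build-configurations table;
# always check the official compatibility page before installing.
TF_CUDA_CUDNN = {
    "2.16.1": ("12.3", "8.9"),
    "2.15.0": ("12.2", "8.9"),
    "2.14.0": ("11.8", "8.7"),
}

def required_versions(tf_version: str) -> tuple[str, str]:
    """Return the (CUDA, cuDNN) pair tested with the given TensorFlow version."""
    try:
        return TF_CUDA_CUDNN[tf_version]
    except KeyError:
        raise ValueError(f"Unknown TensorFlow version: {tf_version}") from None

cuda, cudnn = required_versions("2.16.1")
print(f"TensorFlow 2.16.1 -> CUDA {cuda}, cuDNN {cudnn}")
```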

2.3 External Dependencies

Once the environment is activated, you can install the external dependencies by running the following command:

pip install -e . # if setup.py is present

And you are ready to go!
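Once everything is installed, a quick way to confirm that TensorFlow actually sees the GPU is a snippet like this (it assumes TensorFlow is installed in the active environment, and degrades gracefully if not):

```python
# Quick sanity check: does TensorFlow see the GPU?
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
    status = f"TensorFlow {tf.__version__} detects {len(gpus)} GPU(s): {gpus}"
except ImportError:
    status = "TensorFlow is not installed in this environment"
print(status)
```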

3. Linux (Ubuntu) Configuration


After installing the NVIDIA drivers, CUDA, and cuDNN, if you are going to develop the project on Ubuntu, you can follow the same steps as in the Windows Subsystem for Linux (WSL2) Configuration section. However, since you are working on a Linux distribution, it is recommended to use Docker to create a container with all the dependencies installed, avoiding compatibility and version issues.

WARNING: The Docker setup approach is not recommended for WSL2 or Windows, since there are many known issues with excessive CPU usage that make it unworkable (more info).

3.1 Install the NVIDIA Container Toolkit

Follow the NVIDIA Container Toolkit Guide

After installing the NVIDIA Container Toolkit, you can check the installation by running the following command:

sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

If you get an error when checking the installation, follow these steps:

# Restart the Docker service
sudo systemctl restart docker

# Open the nvidia-container-runtime configuration file
sudo nano /etc/nvidia-container-runtime/config.toml

# In the editor, set the following option:
#   no-cgroups = true

# Save and close the file and check the installation again
sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

3.2 Pull the tensorflow-gpu-jupyter image (Optional)

This image contains all the correct dependencies for TensorFlow with CUDA and cuDNN installed, plus a Jupyter notebook server for developing your project. (If you skip this step, the image will be pulled automatically in the next one.) You can pull the image with the following command:

docker pull tensorflow/tensorflow:latest-gpu-jupyter

3.3 Build the container

First of all, you need to change the username in the devcontainer configuration to your system username, since the container will be created with the same user and group ID as your host user.

{
    "name": "TensorFlow GPU Dev Container",
    "build": {
        "dockerfile": "Dockerfile",
        "args": {
            "USER_NAME": "your_username", // Change this to your computer username
            "USER_UID": "1000"
        }
    },
    // Rest of the file
}
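For reference, a minimal Dockerfile consistent with the build arguments above might look like the sketch below. The actual Dockerfile in the repository may differ, so treat this only as an outline:

```dockerfile
# Sketch only: the repository's real Dockerfile may differ.
FROM tensorflow/tensorflow:latest-gpu-jupyter

# Build args referenced from devcontainer.json
ARG USER_NAME=devuser
ARG USER_UID=1000

# Create a non-root user matching the host user so bind-mounted
# project files keep the right ownership
RUN useradd --create-home --uid ${USER_UID} ${USER_NAME}
USER ${USER_NAME}
WORKDIR /home/${USER_NAME}
```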

After that, since the project has a Dev Container configuration file in the .devcontainer folder, you just need to open the project folder in VSCode and click the Reopen in Container button that appears in the bottom-right corner of the window. You can also do this at any time by opening the command palette with Ctrl+Shift+P and typing Reopen in Container.


[Figure: Pop-up VSCode message]

[Figure: Command palette]


This will pull the tensorflow-gpu-jupyter image (if it was not pulled before) and build a container from the project's custom Dockerfile with all the required dependencies.

WARNING: If the project includes a setup.py file, then to avoid possible issues with the container not detecting some library versions, run the following command in the container terminal to install the external dependencies declared in setup.py:

pip install -e . # if setup.py is present

Finally, when running any Jupyter notebook, choose the Python version that matches the one the image was built with. To check the Python version, run the following command in the container terminal:

python --version

As of this writing, the image is built with Python 3.11.0rc1, so you need to select the Python 3.11.0 kernel in the Jupyter notebook.

And voilà! You have a container with all the dependencies installed, ready to go!

If any issue arises later, just rebuild the container by opening the command palette and selecting the Rebuild Container option.

4. macOS Configuration


Finally, if you are going to develop the project on macOS, follow the steps below, which are based on TensorFlow Metal and adapted to the project dependencies:

4.1 Conda Environment

We will follow the same first steps as in the Windows Subsystem for Linux (WSL2) Configuration section, since we are going to use a conda environment to manage the dependencies. Install Miniconda following the Miniconda installation guide, then create a new environment with the following commands:

    # Create the environment
    conda create -n use_gpu python=3.12 -y
    
    # Activate the environment
    conda activate use_gpu

    # Install external dependencies, if any
    pip install -e .

4.2 TensorFlow for MacOS

TensorFlow does not support GPU acceleration on macOS through CUDA and cuDNN, so you need to install Apple's Metal plugin instead. To do so, run the following commands:

    pip install tensorflow
    pip install tensorflow-metal
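After installing, you can confirm that the GPU is visible to TensorFlow with a short check like this (assuming the packages above installed correctly; the snippet degrades gracefully if TensorFlow is missing):

```python
# Check that TensorFlow works; with tensorflow-metal installed,
# ops like this can run on the Apple GPU
try:
    import tensorflow as tf
    x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    total = float(tf.reduce_sum(x))
    report = f"TensorFlow {tf.__version__} computed sum={total}"
except ImportError:
    report = "TensorFlow is not installed in this environment"
print(report)
```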

Now you are ready to go!

🌱 Contributing

If you wish to contribute to this guide, please open an issue or submit a pull request with your proposed changes.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request
  6. Wait for the PR to be reviewed
  7. If it's approved, the changes will be merged
  8. Done!

🗞️ License

This project is licensed under the MIT License - see the LICENSE file for details.

👥 Contact

Should you have any inquiries or require assistance, please do not hesitate to open an issue.
