# Development environment setup
This notebook contains several manually and automatically executed setup steps required to run the notebooks and Python scripts in this repo.

The environment this was developed on consists of the following:
* Ubuntu 22.04 LTS (Jammy) running on WSL2 on Windows
* VSCode with Python and Jupyter extensions
* Nvidia GeForce RTX 3060 Laptop GPU
* Python 3.10 virtual environment set up using `pip`
* DL framework based mainly on PyTorch, [fastai](https://github.com/fastai/fastai) and [miniai](https://github.com/fastai/course22p2/tree/master/miniai) as developed in Part 2 of the fastai [Practical Deep Learning for Coders](https://course.fast.ai/Lessons/part2.html) course
  - Due to some dependency issues, **miniai** is simply cloned and referenced via `sys.path.append` rather than installed into the virtual environment

Additional checks were performed on Google Colab and Kaggle. See [explore_colab.ipynb](./explore_colab.ipynb) and [explore_kaggle.ipynb](./explore_kaggle.ipynb) for more details on these.

For working samples on these environments (working as of Dec 2024), see [colab_example.ipynb](./colab_example.ipynb) and [kaggle_example.ipynb](./kaggle_example.ipynb).

## Prerequisites
This notebook does not describe the installation of the following components, which is covered elsewhere:
* [WSL2 and Ubuntu 22.04 LTS](https://learn.microsoft.com/en-us/windows/wsl/install)
* [VSCode on Windows and WSL sides](https://code.visualstudio.com/docs/setup/windows)
* [Python and Jupyter VSCode extensions](https://code.visualstudio.com/docs/python/python-quick-start)

## Virtual environment creation
* In VSCode, select a ".py" or ".ipynb" file
* From the command palette, select `Python: Create environment...` 
* When prompted between `venv` and `conda`, choose `venv`
* Select the Python interpreter of your choice

Creating the environment sometimes fails in VSCode. If it does, manually select the correct interpreter first and then create the virtual environment.
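
Whether a notebook kernel or script is actually running inside the new virtual environment can be checked from Python. The sketch below relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside a `venv`; the helper name `in_virtualenv` is only illustrative and not part of this repo.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Running inside a virtual environment: {in_virtualenv()}")
```

If this prints `False` in a notebook, the selected kernel is probably not the one that belongs to the new environment.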

# Enable CUDA with PyTorch
* Select the newly created venv using `Select Kernel` before running the cells below
* If prompted, also install the `ipykernel` package
* See [PyTorch local setup guide](https://pytorch.org/get-started/locally/)

In [None]:
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Test whether CUDA is available (a kernel restart may be required before this returns `True`)

In [2]:
import torch
torch.cuda.is_available()

True
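
If the check returns `False`, a little more detail can help tell a CPU-only PyTorch build apart from a driver or WSL problem. The following is a minimal sketch; the helper name `cuda_summary` is hypothetical, and it deliberately tolerates `torch` not being installed yet.

```python
import importlib.util

def cuda_summary() -> dict:
    """Collect basic CUDA facts, or report that torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch
    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        # Name of the first visible GPU
        info["device_name"] = torch.cuda.get_device_name(0)
    return info

print(cuda_summary())
```

A `built_for_cuda` value of `None` suggests that a CPU-only wheel was installed, in which case the `--index-url` in the install command is worth rechecking.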

# Install additional dependencies
These cells can be executed directly from this notebook, provided it runs on a kernel with the correct virtual environment activated.

idlmav dependencies
* Model tracing is based on `torch.fx`
  - `tabulate` is required by `torch.fx.print_tabular()`
* `torchprofile` is used for FLOPS calculations
* `munkres` is required for the layout calculation (Hungarian method implementation)
* `plotly` is used for rendering
  - `nbformat` is required for important `plotly` functionality to work
  - `pandas` is required for `plotly.express` to work
  - `ipympl` is required for the `%matplotlib widget` magic and sometimes also required by plotly

In [None]:
%pip install matplotlib
%pip install numpy
%pip install tabulate
%pip install torchprofile
%pip install munkres
%pip install plotly
%pip install pandas
%pip install nbformat
%pip install ipympl
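
After installation, it can be useful to confirm which of these packages are visible to the active kernel. The sketch below uses the standard-library `importlib.metadata`; the `REQUIRED` list simply repeats the dependency names listed above, and the helper name `installed_versions` is an illustrative assumption.

```python
from importlib import metadata

# Distribution names taken from the dependency list above
REQUIRED = ["matplotlib", "numpy", "tabulate", "torchprofile", "munkres",
            "plotly", "nbformat", "pandas", "ipympl"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(REQUIRED).items():
    print(f"{name:<14} {version or 'NOT INSTALLED'}")
```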

Testing and verification
* `timm` and `fastcore` are required for some models with which this library is tested
* `torchview` and `torchinfo` are used to verify the correctness of the model tracing algorithm
  - `graphviz` is required by `torchview`
* `colorspacious` was used to convert RGB colors to CIELAB to select the colors to use for nodes

In [None]:
%pip install timm
%pip install fastcore
%pip install torchview
%pip install torchinfo
%pip install graphviz
%pip install colorspacious

Debugging
* `beautifulsoup4` is required to print minified HTML strings in a human-readable format
* `jsbeautifier` is required to print minified JavaScript strings in a human-readable format

In [None]:
%pip install beautifulsoup4
%pip install jsbeautifier

More models
* `ultralytics` contains the YOLO models
* `transformers` contains the CLIP models
* `sentencepiece` is required for the T5 Tokenizer

In [None]:
%pip install ultralytics

In [None]:
%pip install transformers

In [None]:
%pip install sentencepiece

## Install miniai
* Some tests were performed on models developed during part 2 of the 2022/3 fast.ai course [Practical Deep Learning for Coders](https://course.fast.ai/)
  - During the course, a high-level framework called `miniai` is developed
  - The source is available [here](https://github.com/fastai/course22p2/tree/master/miniai)
* In the course ([Lesson 15](https://course.fast.ai/Lessons/lesson15.html), 03:15), the following command is recommended to install `miniai`: `pip install -e .`
  - When I started developing `idlmav` (November 2024), the above command would trigger an installation of torch 2.0.1
  - I wanted to test `idlmav` with the latest version of PyTorch (2.5.0 at the time)
  - For my setup, I therefore avoided running `pip install -e .`
* Instead, I used the following workaround to access models that use the `miniai` framework:
  - Clone the `miniai` source using the cell below
  - Install the dependencies of `miniai` (e.g. `fastprogress`) using the next cell 
  - Add the code in the cell below that to the top of all notebooks that import from `miniai`

In [1]:
# Archive the miniai repo to the `tmp` directory and extract the archive to the `miniai` directory
from pathlib import Path
Path('../tmp').mkdir(exist_ok=True)     # Location to archive the repo to
Path('../miniai').mkdir(exist_ok=True)  # Location to extract the archive to
!git -C ../tmp init -b master
!git -C ../tmp remote add origin https://github.com/fastai/course22p2.git
!git -C ../tmp fetch --depth=1 origin master
!git -C ../tmp archive --format=tar origin/master:miniai | tar -x -C ../miniai
!rm -rf ../tmp


Initialized empty Git repository in /home/dev/ai/idlmav/tmp/.git/
remote: Enumerating objects: 145, done.
remote: Counting objects: 100% (145/145), done.
remote: Compressing objects: 100% (139/139), done.
remote: Total 145 (delta 10), reused 82 (delta 5), pack-reused 0 (from 0)
Receiving objects: 100% (145/145), 28.36 MiB | 5.19 MiB/s, done.
Resolving deltas: 100% (10/10), done.
From https://github.com/fastai/course22p2
 * branch            master     -> FETCH_HEAD
 * [new branch]      master     -> origin/master


In [None]:
# Install miniai dependencies
%pip install fastprogress
%pip install torcheval
%pip install datasets

In [None]:
# Copy the code inside the `if` block to the top of all notebooks that import from miniai
if False:
    import sys, importlib
    from pathlib import Path
    sys.path.append(str(Path.cwd().parent))
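
Because notebook cells are often re-run, the same directory can end up on `sys.path` several times. A minimal sketch of an idempotent variant follows; the helper name `add_to_sys_path` is hypothetical, and the final call assumes, like the cell above, that the notebook lives one level below the directory containing `miniai`.

```python
import sys
from pathlib import Path

def add_to_sys_path(directory) -> bool:
    """Append a directory to sys.path once; return True if it was added."""
    resolved = str(Path(directory).resolve())
    if resolved in sys.path:
        return False  # Already present, nothing to do
    sys.path.append(resolved)
    return True

# Assumption: the parent of the current working directory contains `miniai`
add_to_sys_path(Path.cwd().parent)
```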

# Viewing plotly figures from directly executed scripts in VSCode+WSL environment
> NOTE: This section does **not** apply to any of the following:
> * Figures created in a notebook: these are displayed inline and do not require X11 forwarding
> * Matplotlib figures: these are displayed via a different mechanism (see the next section)
> * Any environment other than WSL (e.g. Colab, Kaggle, native or dual-boot Linux installation)

> NOTE: If some browser functionality inside Plotly seems to cause an indefinite wait, check for the possibility that the `DISPLAY` environment variable may be configured for X11 forwarding without an X-Server running on the Windows side. See [test_issue_default_browser.py](./explore/test_issue_default_browser.py).

When running a ".py" script directly and calling `go.Figure.show()`, plotly serves the figure on a local HTML server and opens the client in a browser. In most setups, this works out of the box. However, on some WSL setups, `xdg-open` may have trouble finding the browser installed on the Windows OS.

Some sources recommend just setting the `BROWSER` environment variable to the path on the Windows OS, e.g.
```bash
export BROWSER='/mnt/c/Program\ Files/Google/Chrome/Application/chrome.exe'
export BROWSER='/mnt/c/Program\ Files/Mozilla\ Firefox/firefox.exe'
export BROWSER='/mnt/c/Program\ Files\ \(x86\)/Microsoft/Edge/Application/msedge.exe'
```

The success of this step can be tested as follows:
```bash
xdg-open https://google.com
```

In my case, `xdg-open` kept stumbling on the spaces in the Windows path no matter how I escaped them. The workaround was to create a wrapper script without spaces in its path, launch the browser from the wrapper script and specify the wrapper script in the `BROWSER` environment variable. Here are the steps, using Firefox as an example:
* Create the script in a location on the OS path: `sudo nano /usr/local/bin/firefox-wrapper`
* Write and save the script:
  ```bash
  #!/bin/bash
  "/mnt/c/Program Files/Mozilla Firefox/firefox.exe" "$@"
  ```
* Make the script executable: `sudo chmod +x /usr/local/bin/firefox-wrapper`
* Set the `BROWSER` environment variable: `export BROWSER=firefox-wrapper`

After these steps, `xdg-open` worked on my setup.
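
To check from Python whether the value in `BROWSER` points at something executable, a small diagnostic along these lines may help. It is only a sketch: the helper name `resolve_browser_command` is hypothetical, it treats `BROWSER` as a list of candidates separated by `os.pathsep`, and it does not handle candidates that carry extra arguments such as `%s`.

```python
import os
import shutil

def resolve_browser_command(env=None):
    """Return the executable that the BROWSER variable resolves to, or None."""
    env = os.environ if env is None else env
    value = env.get("BROWSER", "")
    if not value:
        return None
    # Check each candidate in order and return the first one that is an
    # executable on PATH or an executable file path.
    for candidate in value.split(os.pathsep):
        found = shutil.which(candidate, path=env.get("PATH"))
        if found:
            return found
    return None

print(f"BROWSER resolves to: {resolve_browser_command()}")
```

A result of `None` with `BROWSER` set usually means that the wrapper script is not on the `PATH` or is not executable.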

I also added the `BROWSER` environment variable to [launch.json](.vscode/launch.json). Modify it there to use a different browser.

If required, the default renderer of plotly can also be configured:
* plotly provides targeted renderers for different environments, e.g. colab, kaggle, azure, databricks
* Here is the code to list available renderers and select one:
  ```python
  import plotly.io as pio
  available_renderers = list(pio.renderers)
  print(f'Available renderers: {", ".join(available_renderers)}')
  pio.renderers.default = 'browser'  # Replace with renderer of your choice
  ```

As a final test, run [test_plotly_browser_setup.py](./explore/test_plotly_browser_setup.py)

# Viewing matplotlib figures from directly executed scripts in VSCode+WSL environment
> NOTE: This section does **not** apply to any of the following:
> * Figures created in a notebook: these are displayed inline and do not require X11 forwarding
> * Plotly figures: these are displayed via a different mechanism (see the previous section)
> * Any environment other than WSL (e.g. Colab, Kaggle, native or dual-boot Linux installation)

> NOTE: This library relies much more heavily on plotly than on matplotlib, so this section will rarely be required.

> WARNING: If the `DISPLAY` environment variable is set for X11 forwarding on the Linux side, the X-server MUST run on the Windows side, whether its functionality is used or not. If it is not running, some browser functionality used by Plotly may wait indefinitely. When in doubt, just skip this part until needed. See [test_issue_default_browser.py](./explore/test_issue_default_browser.py).

To view matplotlib figures from directly executed scripts in a VSCode+WSL environment, X11 forwarding must be configured. Here are the steps that worked on my setup:
* On Windows, download and install [VcXsrv](https://sourceforge.net/projects/vcxsrv/)
* Start `XLaunch` on Windows
  - Select `Multiple windows`
  - Display number: 0
  - Select `Start no client`
  - Deselect `Native Opengl` (wgl)
  - Select `Disable access control`
* To run inside the VSCode debugger, just ensure that the `Tasks Shell Input` VSCode extension from `Augusto` is installed (see the subsection below)
* To run without the VSCode debugger, set the `DISPLAY` environment variable manually
  ```bash
  export DISPLAY=$(ip route | grep default | awk '{print $3}'):0.0
  echo $DISPLAY
  ```
* To test, run [test_matplotlib_x11_forwarding.py](./explore/test_matplotlib_x11_forwarding.py)
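
Since a `DISPLAY` value that points at a stopped X server can cause indefinite waits, it may be worth probing the server before plotting. The sketch below assumes a TCP-style `DISPLAY` such as `172.20.0.1:0.0` and uses the X11 convention that display `N` listens on TCP port `6000 + N`; the helper names `parse_display` and `x_server_reachable` are hypothetical.

```python
import os
import socket

def parse_display(display: str):
    """Split a TCP DISPLAY value such as '172.20.0.1:0.0' into (host, port)."""
    host, _, rest = display.rpartition(":")
    display_number = int(rest.split(".")[0])
    # X servers listen on TCP port 6000 + display number
    return host, 6000 + display_number

def x_server_reachable(display: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the X server succeeds within timeout."""
    try:
        host, port = parse_display(display)
        if not host:
            return False  # Local displays (':0') use Unix sockets, not handled here
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except (OSError, ValueError):
        return False

display = os.environ.get("DISPLAY")
if display:
    print(f"X server reachable at {display}: {x_server_reachable(display)}")
else:
    print("DISPLAY is not set")
```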

## `DISPLAY` environment variable setup for Python debugging tasks launched from VSCode

> WARNING: If the `DISPLAY` environment variable is set for X11 forwarding on the Linux side, the X-server MUST run on the Windows side, whether its functionality is used or not. If it is not running, some browser functionality used by Plotly may wait indefinitely. See [test_issue_default_browser.py](./explore/test_issue_default_browser.py).

This has already been configured inside [launch.json](.vscode/launch.json), but the following step is required as a dependency:
* Install the `Tasks Shell Input` VSCode extension from `Augusto`

Here are the steps that were performed in case they need to be reversed or updated for some reason:
* Add an `env` key to [launch.json](.vscode/launch.json) to set the `DISPLAY` environment variable when launching a Python script
  ```json
  "env": {
      "DISPLAY":"${input:ipAddr}:0.0"
  }
  ```
* Configure the `ipAddr` input:
  ```json
  "inputs": [
      {
          "id": "ipAddr",
          "type": "command",
          "command": "shellCommand.execute",
          "args": {
              "command": "ip route | grep default | awk '{print $3}'",
              "fieldSeparator": "|",
              "description": "Select the IP address",
              "useSingleResult": "true"
          }
      }
  ]
  ```
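
The shell pipeline used for `ipAddr` can be mirrored in Python when a script needs to build the `DISPLAY` value itself. This is a sketch under the assumption that `ip route` prints a line of the form `default via <gateway> ...`; the function names are hypothetical. Unlike `grep default`, it only accepts lines whose first field is `default`.

```python
import subprocess

def default_gateway(route_output: str):
    """Return the gateway of the first 'default' route, like awk '{print $3}'."""
    for line in route_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "default":
            return fields[2]
    return None

def display_from_ip_route():
    """Build a DISPLAY value from live `ip route` output, or None if unavailable."""
    try:
        result = subprocess.run(["ip", "route"], capture_output=True,
                                text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    gateway = default_gateway(result.stdout)
    return f"{gateway}:0.0" if gateway else None

# Illustrative sample output, not captured from a real machine
sample = ("default via 172.20.0.1 dev eth0 proto kernel\n"
          "172.20.0.0/20 dev eth0 proto kernel scope link")
print(default_gateway(sample))  # 172.20.0.1
```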

# Unused steps retained for their value as examples

In [None]:
# Test directory changes in notebook environment
import os
from pathlib import Path
Path('../miniai').mkdir(exist_ok=True)
workspace_dir = os.getcwd()  # In case we need to get it back later
os.chdir('../miniai')
!pwd
os.chdir('..')
!pwd

/home/dev/ai/idlmav/miniai
/home/dev/ai/idlmav


In [2]:
# Get back the working directory if a cell failed in which it was changed
os.chdir(workspace_dir)
!pwd

/home/dev/ai/idlmav
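
The manual recovery above can be avoided by restoring the working directory automatically. The following sketch wraps `os.chdir` in a context manager so that the original directory is restored even if the code inside the block fails; the name `working_directory` is hypothetical. Python 3.11 and later provide `contextlib.chdir` for the same purpose, but the environment described here uses Python 3.10.

```python
import os
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it even on error."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        # Runs on normal exit and when an exception propagates
        os.chdir(previous)

with working_directory(".."):
    print(f"Inside the block: {Path.cwd()}")
print(f"After the block:  {Path.cwd()}")
```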


In [None]:
# Clone into miniai/miniai
os.chdir('../miniai')
if Path('__init__.py').exists():
    !git pull --depth=1 origin master
else:
    !git init -b master
    !git remote add -f origin https://github.com/fastai/course22p2.git
    !git config core.sparseCheckout true
    !echo "miniai" >> .git/info/sparse-checkout
    !git pull --depth=1 origin master
os.chdir('..')