### Programming for Biomedical Informatics (INFR11260)
#### Semester 1 - Working with Notebooks & Git

##### Ian Simpson (ian.simpson@ed.ac.uk) 2024

This notebook is a working space for week 1. We will use it to work with Visual Studio Code (VSC) in a notebook, look at extensions, connecting to remote servers, ssh config, environments, and kernels, and take a tour of the notebook work setting.

NB - this will be familiar to some I'm sure, but this is to ensure that everyone on the course is set up and good to go for next week!

Content:
- environment management (conda, virtualenv)
- VSC tour, customisation
- interacting with Git in VSC
- keeping records - commenting, markdown, and (side-topic) Obsidian
- a few simple programming examples demonstrating some of the above.

#### Conda and virtualenv

These are two related but different methods for setting up an isolated and clean coding environment for your work. They are both very well documented:

- [Conda environment management documentation](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html)
- [virtualenv (a Python package)](https://virtualenv.pypa.io/en/latest/)

We pretty much always use a new environment for each coding project. Why?
- package dependencies for projects can vary wildly; creating clean environments keeps them smaller and more efficient and helps with reproducibility
- some packages you will want to use depend on specific combinations of versions of other packages. This can make it impossible to have one environment for all your projects.
- environment specifications can be copied (to create a new version based on an existing one) and (importantly) shared with others.
- specifying the package environment explicitly makes it very easy to set up containerised projects (Docker, Singularity etc.). These are in effect full image clones of your setup that you can release (for example) with a paper or when working with other groups around the world.
- conda environments in particular can handle both ```conda``` and ```pip``` installation, which is important because conda only contains a subset of packages, so you often need to blend the two.
- conda and pip can add in other "repositories" as sources, for example [bioconda](https://bioconda.github.io)
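
To make the point about shareable specifications a little more concrete, here is a minimal sketch that records what is installed in the environment a notebook is running in, using only the standard library. The function name `pinned_requirements` is made up for this illustration, and the `name==version` format is just the common `pip` pinning style; it is not a substitute for a proper conda or pip export.

```python
from importlib import metadata


def pinned_requirements():
    """Return sorted 'name==version' lines for packages visible to this interpreter."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip any distribution whose metadata has no name
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


# Show the first few pins; the full list could be written to a text file and shared.
for line in pinned_requirements()[:10]:
    print(line)
```

Running this in two different environments and comparing the output is a quick way to see how far apart they have drifted.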

We're going to set up a ```conda``` environment.

#### Key Conda Commands

Let's create a new environment for the course:

- ```conda create --name <my-env>```

You can specify the version of Python you want to use and any packages you'd like to install straight away.

- ```conda create --name test python=3.10 scipy pandas numpy ....```

As you've already installed the basic environment we can add new packages easily. First we need to enter the environment:

- ```conda activate <my-env>```

Now you can install some packages:

- ```conda install pandas scipy numpy```

You can also add additional repositories (conda calls these "channels"):

- ```conda config --add channels bioconda```
- ```conda config --add channels conda-forge```

To exit your environment:

- ```conda deactivate```

To list your existing environments:

- ```conda env list```

To destroy an environment (careful!):

- ```conda remove -n <my-env> --all```
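
If you want to look at your environments from inside a notebook, `conda env list` can also print JSON (`--json`). The sketch below assumes that output contains an `"envs"` list of environment paths; the helper names `parse_env_list` and `list_conda_envs` are invented for this example. Note that the name is taken from the last path component, so the base environment will show up under its install directory name rather than as `base`.

```python
import json
import shutil
import subprocess
from pathlib import Path


def parse_env_list(json_text):
    """Map environment name -> path from the JSON printed by `conda env list --json`."""
    paths = json.loads(json_text).get("envs", [])
    # The last path component is used as the name (assumption for this sketch).
    return {Path(p).name: p for p in paths}


def list_conda_envs():
    """Return {} if conda is not on PATH, otherwise the environments conda reports."""
    conda = shutil.which("conda")
    if conda is None:
        return {}
    result = subprocess.run(
        [conda, "env", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return parse_env_list(result.stdout)
```

Calling `list_conda_envs()` in a notebook cell should then give a dictionary you can loop over or print.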

#### Key virtualenv commands

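```venv``` is now a standard part of Python (since version 3.3), so you don't need to install it. ```virtualenv``` is a separate package with a very similar interface that you install with ```pip```.

To create an environment:

- ```python -m venv <my-env>``` (or ```virtualenv <my-env>``` if you have installed virtualenv)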

To use it (a bit more painful than conda, as you need to know where it lives):

- ```source <my-env>/bin/activate``` (on Windows: ```<my-env>\Scripts\activate```)

You can then use ```pip``` as you usually would to install packages.

To deactivate:

- ```deactivate```

To delete an environment, you simply remove the directory it lives in.
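
The same create and delete cycle can be sketched from Python using the standard-library `venv` module. This is only an illustration: the directory name `demo-env` and the helpers `create_env` and `demo` are made up, the environment is built in a temporary directory, and `with_pip=False` keeps it quick (set it to `True` for an environment you actually want to install into).

```python
import os
import shutil
import tempfile
import venv
from pathlib import Path


def create_env(env_dir):
    """Create a virtual environment at env_dir and return the path of its config file."""
    # Symlink the interpreter except on Windows, mirroring the command-line default.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))
    return Path(env_dir) / "pyvenv.cfg"


def demo():
    """Create an environment in a temporary directory, then delete it again."""
    parent = tempfile.mkdtemp()
    env_dir = Path(parent) / "demo-env"
    created = create_env(env_dir).exists()
    shutil.rmtree(parent)  # deleting an environment is just removing its directory
    return created, env_dir.exists()


print(demo())
```

The first value reports whether the environment's `pyvenv.cfg` was written, the second whether the directory still exists after removal.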

### Which should I use?

#### Pros of Conda:
1. **Cross-Language Support**: 
   - Conda supports not just Python, but other languages like R, Ruby, Lua, and more, making it versatile for multi-language projects.
   
2. **Binary Package Management**:
   - Conda installs pre-compiled binaries, which simplifies the installation of complex libraries (e.g., NumPy, SciPy, TensorFlow) that may otherwise need compiling from source with `pip` in a `venv` when no pre-built wheel exists for your platform.

3. **Dependency Management**:
   - It is better at resolving complex dependencies, preventing issues like conflicting library versions. This is particularly useful in scientific computing and data science projects.

4. **Broader Package Availability**:
   - Conda's package repositories (the Anaconda default channel plus community channels such as conda-forge and bioconda) include many data science and machine learning packages that may not be available or easily installable through PyPI alone (which `venv` relies on).

5. **Environment Isolation**:
   - Conda creates fully isolated environments that include their own Python interpreter, so each one can use a different Python version, unlike `venv`, where an environment reuses the Python installation it was created from.

6. **Platform Independence**:
   - Conda environments work well across different platforms (Linux, macOS, and Windows), and the package binaries are often optimized for the platform, reducing compatibility issues.

#### Cons of Conda:
1. **Larger Disk Space**:
   - Conda environments tend to be larger in size compared to `venv` because they bundle the entire Python interpreter and all dependencies.

2. **Package Availability and Speed**:
   - Some packages may take longer to appear in the Conda repository, or may not be available at all, whereas most Python packages are immediately available via PyPI for `venv`.

3. **Slower Installation**:
   - Installing packages and creating environments can sometimes be slower in Conda, particularly because it attempts to resolve dependencies more rigorously.

4. **Learning Curve**:
   - Conda introduces new commands that may take time to learn if you're used to Python's built-in tooling like `pip` and `venv`.

#### Pros of venv:
1. **Lightweight**:
   - Since it uses Python’s built-in virtual environment system, `venv` environments are generally smaller, as they do not bundle the Python interpreter.

2. **Direct Integration with Python**:
   - `venv` is part of the Python standard library, so there’s no need to install additional tools like Conda. It integrates seamlessly with `pip` and PyPI.

3. **Faster Setup**:
   - Creating virtual environments and installing packages with `venv` is often quicker, especially when you’re dealing with simple projects or pure Python packages.

4. **Standard for Python Projects**:
   - Many Python projects and environments default to `venv`, which can make it easier to work within standard Python practices.

#### Cons of venv:
1. **Python-only**:
   - `venv` only works for Python environments and does not support other languages. If you’re working on multi-language projects, this could be limiting.

2. **More Manual Dependency Management**:
   - You need to manually handle dependencies and package version conflicts, which could become tedious with complex environments.

3. **Installing Non-Python Dependencies**:
   - If a Python package has non-Python dependencies (like C libraries), managing those dependencies can be harder with `venv` compared to Conda.

### Now some practice using notebooks with environments and doing some coding
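
A useful first cell in any notebook is a quick check of which interpreter and environment the kernel is actually using. The sketch below is one way to do that with the standard library; the function name `describe_environment` is invented here, and `CONDA_DEFAULT_ENV` is the variable conda sets on activation, so it may be `None` if the kernel was started some other way.

```python
import os
import sys


def describe_environment():
    """Collect a few facts about the interpreter running this kernel."""
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "prefix": sys.prefix,
        # True inside a venv/virtualenv, where prefix differs from the base install.
        "in_venv": sys.prefix != sys.base_prefix,
        # Set by `conda activate`; None if no conda environment is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

If the executable path does not point inside the environment you expected, the notebook is probably attached to the wrong kernel.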

#### Quick lookup for markdown syntax and examples

'''
# Headings
# H1
## H2
### H3

# Emphasis
*italic* or _italic_
**bold** or __bold__
***bold and italic*** or ___bold and italic___

# Lists
1. First ordered list item
2. Another item
  * Unordered sub-list.
1. Actual numbers don't matter, just that it's a number
    1. Ordered sub-list

* Unordered list can use asterisks
- Or minuses
+ Or pluses

# Links
[I'm an inline-style link](https://www.google.com)

[I'm an inline-style link with title](https://www.google.com "Google's Homepage")

# Code
`code`
```python
print("Hello World")
```
''';