# Setting up your Python Workstation



## Python
Make sure you have Python installed. Ideally, use the latest stable version available; Python 3.9 and 3.10 are good choices.

https://www.python.org



### Check your Python version

In [3]:
!python --version

Python 3.9.10
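
The same check can be done from inside Python, which is handy at the top of a notebook or script. The sketch below is only an illustration: `MIN_VERSION` and `meets_minimum` are hypothetical names, and the 3.9 threshold is an assumption taken from the recommendation above.

```python
import sys

# Hypothetical minimum version for this guide; adjust to your project's needs.
MIN_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the given interpreter version is at least `minimum`."""
    return tuple(version_info[:2]) >= tuple(minimum)


# Report the version of the interpreter that is running this code
print(f"Running Python {sys.version_info.major}.{sys.version_info.minor}.{sys.version_info.micro}")
print("Meets minimum:", meets_minimum())
```

Note that this reports the interpreter running the notebook kernel, which may differ from the `python` found on your shell's path.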


## Managing Python versions

In some cases you might need to work with different Python versions for different projects. Tools like `pyenv` let you set a different Python version per directory or project.

https://github.com/pyenv/pyenv



In [8]:
!pyenv local

pyenv: no local version configured for this directory


In [9]:
!pyenv global

3.8.10


Note: The `!` used in the Jupyter cells above tells Jupyter to run the command in the shell (rather than in the Python interpreter).
When running commands directly in bash, you don't need to include the `!`.
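
Outside of Jupyter, a rough equivalent of the `!` shortcut is the standard library's `subprocess` module. The following is a minimal sketch; `run_command` is a hypothetical helper name, and it uses `sys.executable` (the interpreter running the code) rather than whatever `python` resolves to in the shell.

```python
import subprocess
import sys


def run_command(args):
    """Run a command and return its standard output as stripped text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Roughly mirrors `!python --version`, but for the current interpreter
print(run_command([sys.executable, "--version"]))
```

Passing the command as a list avoids shell quoting issues, and `check=True` raises an error if the command fails instead of silently continuing.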

## Package managers

Next, you'll need to manage the packages installed for Python. Package managers help you install packages, pin versions, and maintain separate environments (called "virtual environments") for each project.

### Pip
pip is the default package installer for Python. It is not part of the standard library, but it is bundled with most Python installations. pip is great because it is simple, but it lacks features that are very handy when working in different contexts.

To use pip:

In [None]:
!pip install PackageA

In [None]:
!pip install PackageA --upgrade
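
After installing or upgrading, it can help to confirm which version ended up in the environment. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, and `PackageA` is just the placeholder used above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "PackageA" is the placeholder name from the cells above, so it will likely be missing
for name in ("pip", "PackageA"):
    print(name, "->", installed_version(name))
```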

### Conda
Conda is a highly recommended open-source package and environment management system for scientific computing. It was originally developed by Continuum Analytics (now Anaconda, Inc.).

It comes in several flavors:
- Anaconda: ships with many commonly used packages preinstalled
- Miniconda: a minimal installer with only conda, Python, and their dependencies
- Miniforge: a minimal installer that defaults to the community-maintained conda-forge package repository

Conda is very useful for data science and scientific contexts. Once you have all the essential packages installed, you can focus on the project at hand without fussing with packages and virtual environments.

To install packages on conda:

In [None]:
!conda install PackageA

In [None]:
!conda update PackageA

Conda can handle multiple environments (envs) for different projects. To list the available envs:

In [19]:
!conda env list

# conda environments:
#
base                  *  /home/botros/mambaforge



To create a new conda env (the `-d` flag requests a dry run, so nothing is actually created):

In [17]:
!conda create  -d -n test1

Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/botros/mambaforge/envs/test1




DryRunExit: Dry run. Exiting.



Learn more using the conda docs: https://docs.conda.io/en/latest/

### venv
Python also ships with a standard virtual environment manager called `venv`: https://docs.python.org/3/library/venv.html
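
On the command line, the usual way to create an environment is `python -m venv .venv`. The same module can also be driven from Python, as in the sketch below. The helper names `create_env` and `in_virtualenv` are hypothetical, the environment is created in a temporary directory purely for demonstration, and `with_pip=False` is chosen here only to keep the demo fast.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=False):
    """Create a virtual environment at env_dir and return its path."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style environment."""
    return sys.prefix != sys.base_prefix


# Demo in a throwaway directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    env_path = create_env(Path(tmp) / "demo-env")
    print("Created:", (env_path / "pyvenv.cfg").exists())

print("Inside a virtual environment:", in_virtualenv())
```

The `sys.prefix` comparison detects environments created by `venv`; conda environments are organized differently and may not be detected this way.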

### Poetry

Poetry is another package manager that has gained a lot of popularity. It is most commonly used in Python software development, and it is especially useful in more mature projects that are shared with other team members.

Poetry handles installing and upgrading Python packages as well as creating virtual environments. It stores the project's dependencies in a file called `pyproject.toml`, which contains the dependency specifications for both production and development use cases.

Poetry also creates a `poetry.lock` file, which records the exact versions of all installed packages and their dependencies. This helps ensure that the environment can be reproduced exactly, for example on a teammate's machine. It's not always necessary to make use of the lock file.

https://python-poetry.org/docs/basic-usage/




## Popular packages

Here are some very useful packages that are handy in data science and scientific contexts.

- NumPy
- SciPy
- scikit-learn
- Matplotlib
- pandas
- JupyterLab

Nice-to-have packages:
- seaborn (nicer visualizations)
- statsmodels
- mlxtend
- bentoml (package models)
- mlflow

For deep learning:
- TensorFlow
- Keras
- PyTorch
- PyTorch Lightning

For testing:
- pytest
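
A quick way to see which of these packages are already available in the current environment is to look them up without importing them. The sketch below covers a few of the packages listed above; the `PACKAGES` mapping and `is_available` helper are hypothetical names, and the mapping is needed because the import name sometimes differs from the install name (for example, scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Display name -> import name; the import name sometimes differs from the install name
PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "pandas": "pandas",
    "JupyterLab": "jupyterlab",
    "pytest": "pytest",
}


def is_available(import_name):
    """Return True if a top-level module with this name can be located."""
    return find_spec(import_name) is not None


for display_name, import_name in PACKAGES.items():
    status = "installed" if is_available(import_name) else "missing"
    print(f"{display_name:<14}{status}")
```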

## Practical Tips

1. Create virtual environments for each project.
    It's good practice to have an isolated environment for each project. This allows you to experiment with different packages and versions without impacting other projects.

2. If your package manager doesn't output a dependency file, create one and keep it updated.
    You can always capture all installed requirements using pip: `pip freeze > requirements.txt`
    
3. Set up version control for all projects, and use it!
    Version control is very useful even when working solo. You can easily revert code and keep track of your progress on a project.
    
4. Star popular projects on GitHub. This makes them easy to find again; use the Watch option if you also want notifications about updates and new feature releases.
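
For tip 2, a rough approximation of `pip freeze` can also be produced from inside Python with `importlib.metadata`. This is only a sketch: `frozen_requirements` is a hypothetical helper, and unlike `pip freeze` it makes no attempt to handle editable or version-control installs specially.

```python
from importlib.metadata import distributions


def frozen_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


# Print the pinned requirements; redirect or write to a file to save them
for line in frozen_requirements():
    print(line)
```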
    