# Python Environment 

## 1.  Virtual Environment

### 1.1 Pros: 

- ~~shared modules~~
  - only pure Python modules can be shared across different Python versions
    - source code is open and can be read directly
    - file extension: contains only .py files
  - compiled Python modules are built for a specific version
    - file extension: .dll, .so, .pyd, etc.
    - e.g.1 NumPy - core written in C
    - e.g.2 pandas - depends on NumPy and Cython
    - e.g.3 SciPy - built on top of NumPy with C/C++ and Fortran code
    - e.g.4 scikit-learn - uses Cython and compiled C/C++ for performance
    - e.g.5 TensorFlow - core engine is in C++ with Python bindings
    - e.g.6 PyTorch - backend is C++ (ATen) with a Python interface
- copy environments
- different projects can use different versions of modules or Python
- keep the settings independent
  - does not affect others when using shared resources
    - e.g. server
  - does not destroy the global environment if an installation breaks
    - e.g. client
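
The pure-Python versus compiled distinction in the list above can be checked from the interpreter itself. This is a minimal sketch using only the standard library; `module_kind` is a hypothetical helper name chosen for illustration, and the labels it returns are not official terminology.

```python
import importlib.util
from importlib.machinery import EXTENSION_SUFFIXES, SOURCE_SUFFIXES


def module_kind(name):
    """Classify an importable module by the file it is loaded from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin
    # Built-in and frozen modules live inside the interpreter itself;
    # namespace packages have no single origin file.
    if origin is None or origin in ("built-in", "frozen"):
        return "built-in, frozen, or namespace"
    if origin.endswith(tuple(EXTENSION_SUFFIXES)):
        return "compiled extension"  # e.g. .so or .pyd, tied to one Python version
    if origin.endswith(tuple(SOURCE_SUFFIXES)):
        return "pure Python source"  # plain .py file
    return "other"


if __name__ == "__main__":
    for name in ("json", "sys"):
        print(name, "->", module_kind(name))
```

Running it against a package such as NumPy (if installed) only inspects the top-level module file, so a package that mixes .py files with compiled submodules would need a per-submodule check.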

### 1.2 How to build it

#### 1.2.1 Built-in tool

- create virtual environment
  - terminal:
    - python3 -m venv venv
    - ~~python -m venv venv~~
      - fails where `python` points to Python 2, which has no built-in venv module

- enter virtual environment
  - linux or macOS: source venv/bin/activate   
  - windows: venv\Scripts\activate  
- copy environment
  - using requirements.txt
    - pip freeze > requirements.txt
    - pip install -r requirements.txt
    - limitations:
      - pip freeze includes all installed packages, not only the ones the project needs
      - dependency issues
        - no dependency tree and no separation of environment-specific packages
        - conflicts caused by install order
          - e.g. packageA is installed first and packageB needs a different version of the same library
          - may break the environment
      - no hash verification by default
        - hash verification
          - reproducibility: pinning by hash is fully reproducible; pinning by version alone is only partially reproducible (a file can be re-uploaded under the same version)
          - a hash is unique to one specific binary file
            - proves the file has not been tampered with
            - e.g. the same version may map to different files
          - security: a mismatch makes pip refuse to install
          - a hash only transfers between identical platforms
            - e.g. incompatibility due to different GPU architectures for GPU-based modules, such as pytorch-lightning
      - no environment support
        - pip freeze records neither the Python version nor environment markers
          - e.g. platform-specific dependencies have to be added by hand
- exit environment
  - deactivate
- delete environment
  - linux or macOS: rm -rf env_name 
  - windows: rmdir /s /q env_name
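
To make the hash-verification point above concrete, the sketch below computes the SHA-256 of a file and formats it the way a hash-pinned requirements line is written (`name==version --hash=sha256:...`). The helper names `sha256_of` and `pinned_line` and the `demo` package are hypothetical, and the file used here is a stand-in, not a real wheel.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pinned_line(name, version, artifact):
    """Format one hash-pinned requirement line for a downloaded artifact."""
    return f"{name}=={version} --hash=sha256:{sha256_of(artifact)}"


if __name__ == "__main__":
    # Stand-in file; a real artifact would be a downloaded .whl or .tar.gz.
    with tempfile.TemporaryDirectory() as tmp:
        fake_wheel = Path(tmp) / "demo-1.0-py3-none-any.whl"
        fake_wheel.write_bytes(b"hello")
        print(pinned_line("demo", "1.0", fake_wheel))
```

A file made of such lines can be installed with `pip install --require-hashes -r requirements.txt`, in which case pip rejects any downloaded file whose digest does not match.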

#### 1.2.2 External tools 

##### Option1: from .yml

- environment.yml

  - conda support
    - manage both python and non-python dependencies
    - how to use it:
      - conda create --name env_name python=version
      - conda activate env_name
      - conda env create -f environment.yml
        - create env from .yml file  
      - conda env export > environment.yml
        - exports env to .yml file  
      - conda deactivate
      - conda remove --name env_name --all
      - conda env list or conda info --envs
        - view all environments   
    - limitations:
      - partial reproducibility
        - unless fully pinned
        - a hand-written file, or one from `conda env export --from-history`, lists only top-level packages, not the full dependency tree
          - dependencies are resolved dynamically at install time
          - sub-dependencies may resolve differently on different devices
            - `conda-lock lock` pins the full tree per platform
      - no built-in support to isolate optional or dev dependencies
      - limited packages in conda channels
        - some packages exist only on PyPI
          - install them using pip
          - without conda's dependency management
          - pip may overwrite conda-installed packages

  - features:
    - readable:
      - easy to read, edit, and understand
      - minimal use of punctuation
        - no braces, quotes, or unnecessary commas
        - indentation-sensitive
          - but inconsistent spacing (2 vs 4 spaces) or incorrect indentation causes errors
        - but error messages are often unclear
      - only declarative
        - no support for variables, loops or conditionals by default
    - structured and hierarchical:
      - nested data support
      - list and dictionary support
        - list, e.g. `items:` followed by `- apple` and `- banana` on separate lines
        - dictionary, e.g. `config:` followed by an indented `color: white`
    - comments supported using "#"
      - but comments cannot carry metadata, and there is no block-comment syntax to disable a whole block
    - unquoted text is auto-converted to a data type
      - but the inferred type is sometimes wrong
        - e.g. silent errors caused by strings without quotes:
          - 010 read as octal (8) by YAML 1.1 parsers
          - yes read as true
      - lack of built-in validation
        - unless using tools like yamllint
    - good for conda envs, GitHub Actions workflows
      - but different tools apply different rules
      - only for small configs
        - hard to manage at scale without supporting tools
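
The silent type conversion of unquoted text noted above can be imitated in a few lines. This is a deliberately simplified sketch of YAML 1.1 style implicit typing, not a YAML parser: `guess_scalar` is a hypothetical name, and real parsers recognise more spellings and more types than the subset assumed here.

```python
import re

# Simplified subset of the YAML 1.1 implicit-typing rules (assumption:
# real parsers accept more spellings, plus dates, nulls, and so on).
_TRUE = {"yes", "true", "on"}
_FALSE = {"no", "false", "off"}
_OCTAL = re.compile(r"[-+]?0[0-7]+")
_DECIMAL = re.compile(r"[-+]?(0|[1-9][0-9]*)")
_FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]*")


def guess_scalar(text):
    """Mimic how an unquoted scalar may be typed by a YAML 1.1 parser."""
    lowered = text.lower()
    if lowered in _TRUE:
        return True
    if lowered in _FALSE:
        return False
    if _OCTAL.fullmatch(text):
        return int(text, 8)   # leading zero: read as base 8
    if _DECIMAL.fullmatch(text):
        return int(text)
    if _FLOAT.fullmatch(text):
        return float(text)    # trailing zeros are lost
    return text               # everything else stays a string


if __name__ == "__main__":
    for raw in ("010", "yes", "NO", "3.10", "080", "white"):
        print(repr(raw), "->", repr(guess_scalar(raw)))
```

The `3.10` case is the one that tends to bite in environment files: written without quotes, a Python version can be read as the number 3.1, so quoting such values is the safer habit.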

##### Option2: from .toml and .lock

- pyproject.toml and poetry.lock

  - pyproject.toml
    - dependencies listed under [tool.poetry.dependencies]
  - poetry.lock
    - exact version and all its sub-dependencies
  - poetry support
    - dependency management and packaging
      - auto solve and lock dependencies
      - consistent environments
    - handles virtual environments automatically
      - single environment per project
        - not shared across projects
      - limited control of virtual environment location
        - default location: Poetry's cache directory, or .venv inside the project when `virtualenvs.in-project` is enabled
      - no built-in command to deactivate (no `poetry deactivate`)
        - rely on shell behavior (exit, deactivate)
    - how to use it:
      - input:
        - poetry install
          - reads pyproject.toml
          - installs all listed dependencies
          - with poetry.lock
            - uses the exact locked versions
          - otherwise
            - resolves and installs compatible versions based on pyproject.toml
      - output:
        - rebuild pyproject.toml and poetry.lock from scratch
          - poetry init
            - create a new pyproject.toml
          - poetry add ...
            - add dependencies
          - poetry lock
            - generate poetry.lock
        - pyproject.toml and poetry.lock are updated whenever packages are added
        - export full env to requirements.txt
          - poetry export -f requirements.txt -o requirements.txt
            - -f: short for --format, the output format of the dependencies
            - -o: short for --output, the file to write the result to
  - limitations:
    - some traditional projects have no pyproject.toml
    - limited support for non-Python dependencies
    - slower, due to dependency resolution and local file generation

### 1.3 What happens behind the scenes

#### Looking into the installation

##### python

- terminal: `python3 -m venv venv`
  - calls the Python script:
  - `Lib/venv/__init__.py`
    - `EnvBuilder`
      - `create` function
        - `ensure_directories` function
          - make directory structure
        - `create_configuration` function
          - write `pyvenv.cfg`
        - `setup_python` function
        - `_setup_pip` function
          - if no `--without-pip` flag

  - created files:
    - structure
      - venv
        - `pyvenv.cfg`
          - key file for Python to recognize the environment
          - the `sys` module (implemented in `Python/sysmodule.c`) exposes the resulting variables
          - at startup, the path configuration in `Modules/getpath.py` (CPython 3.11+) looks for this file
        - `bin` (`Scripts` on Windows)
          - some executable files
        - `lib`
          - `site-packages`: stores the package files
        - `include`
    - e.g. below
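
The same `EnvBuilder` class can be driven directly from Python, which makes the created files easy to inspect. This is a minimal sketch: it builds a throwaway environment in a temporary directory with `with_pip=False` (so the `_setup_pip` step is skipped) and reads `pyvenv.cfg` back. The helper name `create_and_read_cfg` is hypothetical.

```python
import tempfile
import venv
from pathlib import Path


def create_and_read_cfg(env_dir):
    """Create a venv without pip and return pyvenv.cfg as a dict."""
    # with_pip=False skips the pip bootstrap, so no ensurepip or network is needed.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    cfg = {}
    cfg_text = (Path(env_dir) / "pyvenv.cfg").read_text(encoding="utf-8")
    for line in cfg_text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = Path(tmp) / "venv"
        cfg = create_and_read_cfg(env_dir)
        # Top-level entries of the new environment (names differ by platform).
        print(sorted(p.name for p in env_dir.iterdir()))
        for key, value in cfg.items():
            print(f"{key} = {value}")
```

The `home` key in the printed configuration points at the directory of the interpreter that created the environment, which is how the new environment finds its base installation.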

##### conda

- `conda create --name env_name`
  - created files:
    - `conda-meta`
      - `.json`
        - generated for each package installed with conda install
        - not generated for packages installed with pip install
        - lists more dependencies than pip (non-Python ones included)
      - e.g. below
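
The JSON records in `conda-meta` can be read with the standard library. The sketch below assumes each record carries `name` and `version` keys, and it runs against a fake prefix built in a temporary directory so that it works without conda installed; `conda_packages` and the sample record are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def conda_packages(prefix):
    """Map package name -> version from the JSON records in <prefix>/conda-meta."""
    packages = {}
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        data = json.loads(record.read_text(encoding="utf-8"))
        # Assumption: every record has "name" and "version" keys.
        packages[data["name"]] = data["version"]
    return packages


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp, "conda-meta")
        meta.mkdir()
        # Fake record standing in for what conda install would write.
        sample = {"name": "numpy", "version": "1.26.4", "depends": ["python >=3.12"]}
        (meta / "numpy-1.26.4-demo_0.json").write_text(json.dumps(sample), encoding="utf-8")
        print(conda_packages(tmp))
```

Pointing the function at `sys.prefix` inside a real conda environment would list only the conda-installed packages, since pip installs leave no record in that directory.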

###### differences

- Prefix and Base prefix
  - test `sys` in the terminal for a Python venv
    - e.g. below
  - test `sys` for a conda environment
    - e.g. below

```python
import sys

print("Python version:", sys.version)
print("Executable:    ", sys.executable)
print("Prefix:        ", sys.prefix)
print("Base prefix:   ", sys.base_prefix)
print("In venv?       ", sys.prefix != sys.base_prefix)
print("Path (imports):")
for p in sys.path:
    print("  ", p)
print("Platform:      ", sys.platform)
print("Implementation:", sys.implementation)
```

```text
Python version: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 21:00:12) [Clang 16.0.6 ]
Executable:     /opt/anaconda3/bin/python
Prefix:         /opt/anaconda3
Base prefix:    /opt/anaconda3
In venv?        False
Path (imports):
   /Users/langyanxia/machine-learning-project/python
   /opt/anaconda3/lib/python312.zip
   /opt/anaconda3/lib/python3.12
   /opt/anaconda3/lib/python3.12/lib-dynload

   /opt/anaconda3/lib/python3.12/site-packages
   /opt/anaconda3/lib/python3.12/site-packages/aeosa
Platform:       darwin
Implementation: namespace(name='cpython', cache_tag='cpython-312', version=sys.version_info(major=3, minor=12, micro=2, releaselevel='final', serial=0), hexversion=51118832, _multiarch='darwin')
```
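
In the output above, Prefix and Base prefix are identical even though the interpreter belongs to conda, so the `sys.prefix != sys.base_prefix` test alone cannot detect a conda environment. A rough sketch of a combined check follows; it assumes a conda-managed prefix always contains a `conda-meta` directory, and `environment_kind` is a hypothetical helper name.

```python
import sys
from pathlib import Path


def environment_kind():
    """Guess whether the running interpreter is in a venv, a conda env, or neither."""
    # A venv redirects sys.prefix while sys.base_prefix keeps the original install.
    if sys.prefix != sys.base_prefix:
        return "venv"
    # Assumption: a conda-managed prefix contains a conda-meta directory.
    if (Path(sys.prefix) / "conda-meta").is_dir():
        return "conda"
    return "system"


if __name__ == "__main__":
    print("Environment kind:", environment_kind())
```

This is a heuristic: a venv created on top of a conda interpreter reports "venv", because the prefix test is evaluated first.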


## 2. Modules or Packages

### 2.1 Installation

#### Official version

- terminal:
  - pip install package
    - for latest version
  - pip install package==version
    - compatibility:
      - packages in a project may need a specific version because of specific dependencies
      - e.g. one library calls functions of another
      - otherwise, an error is thrown

- workflow:
  - connect to PyPI (or another target repository)
    - PyPI: the Python Package Index
      - official online repository where Python packages are published and shared
      - browse packages: https://pypi.org
  - find the target package
  - download it and its dependencies
    - .whl file: 
      - Binary package
      - ZIP archive including:
        - Python code (.py)
        - Compiled extensions (.pyd, .so)
        - Metadata (METADATA, RECORD, etc.)
      - e.g. numpy-1.25.0-cp311-cp311-win_amd64.whl
        - numpy version 1.25.0
        - built for CPython 3.11
        - Windows 64-bit
      - advantages:
        - faster installation:
          - no need to compile from source
        - platform-specific:
          - tailored for Windows, macOS, Linux, and specific Python versions
        - offline installation:
          - download once, then install without internet access
  - Install it
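
The wheel filename in the workflow above follows a fixed pattern, `name-version(-build)-python-abi-platform.whl`, so its parts can be split apart mechanically. This is a minimal sketch rather than a full implementation of the naming rules; `WheelName` and `parse_wheel_filename` are hypothetical names.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:  # optional build tag sits after the version
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelName(name, version, build, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    print(parse_wheel_filename("numpy-1.25.0-cp311-cp311-win_amd64.whl"))
```

pip compares the Python, ABI, and platform tags against the running interpreter to decide whether a given wheel can be installed or whether it must fall back to a source distribution.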

#### Exclusive version from source

- build:
  - build from source code instead of the pre-built version from PyPI
    - e.g. GitHub or a .tar.gz or .zip file

- advantages:
  - latest or unreleased features
    - the version on PyPI is outdated
    - some bugs or compatibility issues are fixed but not yet released
  - customization
    - fix the source code before installing
    - debug or add features
  - no pre-built wheels
    - no pre-compiled binaries (wheels) for a specific platform
    - e.g. a torch version supported only by certain CUDA versions
  - offline installation
    - no internet access

- terminal:
  - no local copy of the source code:
    - pip install git+https://github.com/user/repo.git
    - workflow:
      - fetch the repo
      - create a temporary directory
      - install the package
      - clean up: the repo is not kept locally
    - scenario:
      - automated installation
        - e.g. in a requirements.txt
      - no need to edit or inspect the repo
  - source code downloaded locally
    - workflow:
      - git clone link
        - clones the repo locally
      - cd target_folder
        - enter the local folder
      - pip install .
        - install the package from the target_folder
      - pip install -e .
        - a link (or .pth file) is placed in the "site-packages" directory (the install location) instead of copied packages
        - uses the live code in the dev directory
        - no reinstall needed after changing the code
        - requirements:
          - setup.py or pyproject.toml
      - pip install -e .[dev]
        - adds the extra dependencies of the "dev" group
        - setup.py
          - traditional project
          - Python script
        - pyproject.toml
          - modern way, per PEP 517/518
          - TOML config file
          - supported by setuptools (>=61), poetry, flit, etc.
          - pyproject.toml is the primary config if a project includes both setup.py and pyproject.toml
  - scenario:
    - inspect, modify, or debug the code
    - contribute or submit a pull request
    - a dev or research env


```python
# example of setup.py
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1.0',
    packages=find_packages(),  # Automatically find sub-packages
    install_requires=[
        'requests',
        'numpy>=1.20',
    ],
    extras_require={
        'dev': ['pytest', 'black'],
    },
    entry_points={
        'console_scripts': [
            'mycli=mypackage.cli:main',
        ],
    },
    author='Your Name',
    description='A sample Python package',
)
```

#### Alternatives to pip

##### option1: conda

- conda
  - [details](python_tool_conda.ipynb)
  - conda install package
    - workflow:
      - find all dependencies of the package
      - check for conflicts between current env and new package
      - find best compatible version of all dependencies
      - install or update package
    - strategies for dependencies
      - keeps the environment intact
        - considers all installed packages
      - cross-language support
        - resolves dependencies for C/C++, R, Fortran and Python packages
      - precompiled binaries
        - installs compatible, prebuilt binaries to avoid compile-time conflicts
      - pinning support
        - pin exact versions

##### option2: poetry

- poetry
  - poetry add package
    - workflow:
      - fetch the latest compatible version of the package
      - resolve all dependencies and sub-dependencies for the package
      - add to pyproject.toml under [tool.poetry.dependencies]
      - update poetry.lock with the exact versions and hashes of everything
      - install it into poetry-managed virtual environment

##### option3: pipenv

- pipenv
  - pip + virtualenv for dependency and environment management
  - pipenv install package