## Building packages
### BIOINF 575

### Python notebooks

Pros:  
* interactive
* contain code and presentation
* facilitate collaboration
* easy to write and test code
* provide quick results
* easy to display graphs

Cons:
* not very scalable to large or complex projects
    * in a complex project you probably don't want all of the code in one single notebook

### Scripts and Modules

If you want to write a somewhat longer program, you are better off <b>using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script.</b> 
    
As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.

A module is a file containing Python definitions and statements. <b>The file name is the module name with the suffix .py appended</b>. Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`.
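The definitions above can be tried out in a few lines. This is a minimal sketch, assuming a throwaway temporary folder and a hypothetical module `demo_module.py` with a `greet` function (neither comes from the course files). It writes the module file, makes its folder importable through `sys.path`, imports it, and prints `__name__`.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Source of a hypothetical module: one function plus a __name__ check.
MODULE_SOURCE = '''
def greet(name):
    return f"Hello, {name}!"

if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    print(greet("script"))
'''

def import_demo_module():
    """Write demo_module.py to a temporary folder and import it."""
    folder = Path(tempfile.mkdtemp())
    (folder / "demo_module.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, str(folder))  # make the folder importable
    importlib.invalidate_caches()    # let the import system see the new file
    try:
        return importlib.import_module("demo_module")
    finally:
        sys.path.remove(str(folder))

demo = import_demo_module()
print(demo.__name__)        # demo_module
print(demo.greet("BRCA1"))  # Hello, BRCA1!
```

When the same file is run as a script (`python3 demo_module.py`), `__name__` is `"__main__"` instead, so the guarded `print` runs; on import it does not.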

In [None]:
#Let's create a module for our classes
!touch gene_module.py
# add Gene class to the file

In [None]:
import gene_module as gm

In [None]:
gm.Gene()

In [None]:
!touch enhanced_gene_module.py
# add EnhancedGene class to the file

In [None]:
import enhanced_gene_module as egm

In [None]:
eg1 = egm.EnhancedGene()

In [None]:
#dir(eg1)

### Packages

Modules are great, but a complex project might need a lot of modules.
To make them easier to find and use, modules should be organized into categories based on the functionality they implement or the domain they are used for.

https://docs.python.org/3/tutorial/modules.html#packages

<b>Packages are a way of structuring</b> Python’s module namespace by using “dotted module names”. <b>For example, the module name A.B designates a submodule named B in a package named A</b>. Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy from having to worry about each other’s module names.
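A minimal sketch of the `A.B` example, assuming a temporary folder in place of a real project: the code creates a package `A` (a folder containing an `__init__.py` file) with a submodule `B.py`, then imports it with a dotted module name. The names `A`, `B`, and `VALUE` are placeholders for illustration only.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def build_package(root):
    """Create root/A/__init__.py and root/A/B.py (hypothetical names)."""
    pkg = root / "A"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")    # marks A as a regular package
    (pkg / "B.py").write_text("VALUE = 42\n")

root = Path(tempfile.mkdtemp())
build_package(root)
sys.path.insert(0, str(root))  # make the folder that contains A importable
importlib.invalidate_caches()

import A.B  # dotted module name: submodule B of package A

print(A.B.__name__)  # A.B
print(A.B.VALUE)     # 42
```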

In [None]:
!mkdir demo_pkg

In [None]:
!cp test.py demo_pkg
!cp gene_module.py demo_pkg
!cp enhanced_gene_module.py demo_pkg

In [None]:
!touch demo_pkg/__init__.py

In [None]:
from demo_pkg import test as tt

In [None]:
tt.test_function(4)

In [None]:
from demo_pkg import gene_module as gmp

In [None]:
gmp.Gene()

In [None]:
dir(gmp)

In [None]:
# restart kernel

from demo_pkg import enhanced_gene_module as egmp

In [None]:
# dir()

In [None]:
# dir(egmp)

In [None]:
# restart kernel
# `from demo_pkg import *` only imports the submodules listed in __all__,
# so first add __all__ = ["test", "gene_module", "enhanced_gene_module"] to demo_pkg/__init__.py

from demo_pkg import *

In [None]:
dir()

In [None]:
gene_module

In [None]:
gene_module.Gene()
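Without `__all__`, `from package import *` imports the public names defined in the package's `__init__.py` (plus any submodules that were already imported), not every submodule on disk; with `__all__`, it imports exactly the names listed there. This is also why the kernel restarts above matter: earlier imports of `demo_pkg.gene_module` would otherwise already be loaded. A sketch with two throwaway packages (the names `no_all_pkg`, `with_all_pkg`, `alpha`, and `beta` are hypothetical), identical except for `__all__`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

def make_package(root, name, init_source):
    """Create a package with two submodules, alpha and beta."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "alpha.py").write_text("X = 1\n")
    (pkg / "beta.py").write_text("Y = 2\n")

root = Path(tempfile.mkdtemp())
make_package(root, "no_all_pkg", "")
make_package(root, "with_all_pkg", '__all__ = ["alpha", "beta"]\n')
sys.path.insert(0, str(root))
importlib.invalidate_caches()

def star_import(package):
    """Run 'from <package> import *' in a fresh namespace; return the new names."""
    namespace = {}
    exec(f"from {package} import *", namespace)
    return sorted(name for name in namespace if not name.startswith("__"))

print(star_import("no_all_pkg"))    # []
print(star_import("with_all_pkg"))  # ['alpha', 'beta']
```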

### Tool example

https://docs.python-guide.org/scenarios/cli/

In [None]:
!mkdir Project_Gene

Copy/move the folder demo_pkg into Project_Gene. <br>
Create a file \_\_main\_\_.py in demo_pkg.

```python
import sys

def main(args=None):
    """The main routine."""
    if args is None:
        args = sys.argv[1:]

    print("This is the main routine.")
    print(f"It should do something interesting with the arguments: {args}.")

    # Do argument parsing here (e.g. with argparse) and anything else
    # you want your project to do.

if __name__ == "__main__":
    main()
```
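The comment in `main` suggests parsing the arguments with `argparse`. Here is a sketch of what that could look like, assuming a hypothetical interface (zero or more gene symbols plus a `--verbose` flag) that is not part of the course code. All arguments are optional, so running the file with no arguments still works.

```python
import argparse
import sys

def parse_args(args):
    """Parse command-line arguments for a hypothetical gene tool."""
    parser = argparse.ArgumentParser(prog="demo_pkg",
                                     description="Example gene tool.")
    parser.add_argument("genes", nargs="*",
                        help="zero or more gene symbols, e.g. BRCA1 TP53")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print extra detail")
    return parser.parse_args(args)

def main(args=None):
    """The main routine."""
    if args is None:
        args = sys.argv[1:]
    options = parse_args(args)
    print(f"Received {len(options.genes)} gene(s): {options.genes}")
    if options.verbose:
        print("Verbose mode is on.")

if __name__ == "__main__":
    main()
```

Keeping the parsing in its own `parse_args` function makes it easy to test without a terminal, and passing a list to `main` mirrors the `args=None` pattern of the original. From the terminal, the equivalent call would be `python3 -m demo_pkg BRCA1 TP53 --verbose`.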

In [None]:
!pwd

The Python interpreter's `-m` option runs a module as a script.
When the name given to `-m` is a package, Python runs that package's `__main__.py` module.

In the terminal run:

```
cd Project_Gene
python3 -m demo_pkg
```


then run:

```
python3 -m demo_pkg arg1 arg2 arg3
```
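To see what `-m` does without setting up `Project_Gene`, here is a sketch that builds a throwaway package (the hypothetical `mini_pkg`) with a tiny `__main__.py` in a temporary folder and runs `python -m mini_pkg arg1 arg2 arg3` through `subprocess`, using the current interpreter (`sys.executable`):

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MAIN_SOURCE = '''
import sys
print("args:", sys.argv[1:])
'''

def run_package_as_script(args):
    """Create mini_pkg/__main__.py in a temp folder and run it with -m."""
    root = Path(tempfile.mkdtemp())
    pkg = root / "mini_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "__main__.py").write_text(MAIN_SOURCE)
    # cwd=root plays the role of "cd Project_Gene": with -m, Python
    # looks for the package in the current working directory.
    result = subprocess.run([sys.executable, "-m", "mini_pkg", *args],
                            cwd=root, capture_output=True, text=True,
                            check=True)
    return result.stdout.strip()

print(run_package_as_script(["arg1", "arg2", "arg3"]))
# args: ['arg1', 'arg2', 'arg3']
```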

Example from:<br>
https://chriswarrick.com/blog/2014/09/15/python-apps-the-right-way-entry_points-and-scripts/


Create another module, `another_module.py`, in demo_pkg:

```python
#!/usr/bin/python3

from demo_pkg import enhanced_gene_module as egm_p
import sys

def another_method(egene):
    egene.symbol = "updated gene symbol"
    egene.update_symbol("test BRCA1")
    egene.update_snps(["rs1","rs2","rs3"])

def main():
    print("this is another script")
    print(sys.argv)
    gene1 = egm_p.EnhancedGene()
    another_method(gene1)
    print(gene1)
    
    
if __name__ == "__main__":
    main()
```

Create setup.py in Project_Gene.

`setup.py` is the build script for setuptools. 
It provides setuptools with parameters which contain information about the package (e.g. name and version).

Entry points allow building command-line tools that run functions from the package modules.

```python
from setuptools import setup

setup(name='demo_pkg',
      version='0.1.0',
      packages=['demo_pkg'],
      entry_points={
          'console_scripts': [
              'test_run = demo_pkg.test:main', 
              'another_module_run = demo_pkg.another_module:main'              
          ]
      },
      )
```
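Each `console_scripts` entry has the form `command = package.module:function`. When the package is installed, pip generates a small `command` executable that imports the module and calls the function. The resolver below is a rough, hypothetical sketch of that lookup (it is not pip's or setuptools' actual code), demonstrated on standard-library functions so it runs without `demo_pkg` installed:

```python
import importlib

def resolve_entry_point(spec):
    """Split 'command = module.path:function' into (command, callable)."""
    command, target = (part.strip() for part in spec.split("=", 1))
    module_path, _, attr_path = target.partition(":")
    obj = importlib.import_module(module_path)  # e.g. imports demo_pkg.test
    # The part after ':' may be dotted, e.g. 'module:Class.method'.
    for attr in attr_path.split("."):
        obj = getattr(obj, attr)
    return command, obj

command, func = resolve_entry_point("to_json = json:dumps")
print(command)       # to_json
print(func([1, 2]))  # [1, 2]
```

Read this way, `test_run = demo_pkg.test:main` means the `test_run` command calls `main()` from `demo_pkg/test.py`, which is why each module listed under `console_scripts` needs a `main` function.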

```
# You can install a package with python setup.py install. It is not recommended, but you may still run into it.

python setup.py install --record files.txt

# This writes the paths of all the installed files to files.txt.
# To uninstall, delete those files with the following command (be careful with the 'sudo').
# tr/xargs -0 keeps paths that contain spaces intact.

tr '\n' '\0' < files.txt | xargs -0 sudo rm -rf
```

#### A good way to install your package!    

In the terminal run:
    
```
pip install .
```

Then to test your console commands run:

```
test_run
another_module_run
```

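If `enhanced_gene_module.py` imports `gene_module` by its bare name, `another_module_run` fails with an import error: once the package is installed, `gene_module` can only be found through the package.
Fix the import by adding the package name (package_name.module_name): import `demo_pkg.gene_module` in the `enhanced_gene_module.py` file.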
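A sketch of why the package-qualified import works, using a throwaway package `geno_pkg` (a hypothetical name, as are its modules and classes) in which `enhanced.py` builds on a class from `base.py`. The absolute form `from geno_pkg.base import Gene` resolves from any working directory once the package is importable. The explicit relative form `from . import base` is an alternative that also works inside a package; it is shown only for comparison and is not what the steps above use.

```python
import importlib
import sys
import tempfile
from pathlib import Path

BASE_SOURCE = '''
class Gene:
    def __init__(self, symbol="unknown"):
        self.symbol = symbol
'''

ENHANCED_SOURCE = '''
from geno_pkg.base import Gene   # absolute import: package_name.module_name
from . import base               # explicit relative import of the same module

class EnhancedGene(Gene):
    def __init__(self, symbol="unknown", snps=None):
        super().__init__(symbol)
        self.snps = list(snps or [])
'''

root = Path(tempfile.mkdtemp())
pkg = root / "geno_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "base.py").write_text(BASE_SOURCE)
(pkg / "enhanced.py").write_text(ENHANCED_SOURCE)

sys.path.insert(0, str(root))  # stands in for "pip install ."
importlib.invalidate_caches()

enhanced = importlib.import_module("geno_pkg.enhanced")
gene = enhanced.EnhancedGene("BRCA1", ["rs1", "rs2"])
print(gene.symbol, gene.snps)                         # BRCA1 ['rs1', 'rs2']
print(enhanced.base is sys.modules["geno_pkg.base"])  # True
```

A bare `from base import Gene` would only work if the `geno_pkg` folder itself were on `sys.path`, which is the situation in the notebook's working directory but not after installation.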

To uninstall the package, run the following command in the terminal:

```
pip uninstall demo_pkg
```

Then reinstall the package:
    
```
pip install .
```

Then run the command again; no error should occur:
```
another_module_run
```


Example `setup.py` from the PyVCF package:
    
https://github.com/jamescasbon/PyVCF/blob/master/setup.py
    


More resources:

https://python-packaging-tutorial.readthedocs.io/en/latest/setup_py.html <br>
https://python-packaging.readthedocs.io/en/latest/command-line-scripts.html <br>
https://www.geeksforgeeks.org/command-line-scripts-python-packaging/ <br>
https://www.w3schools.com/python/python_modules.asp <br>
https://click.palletsprojects.com/en/7.x/ <br>
https://packaging.python.org/tutorials/packaging-projects/ <br>
https://www.git-tower.com/blog/command-line-cheat-sheet/ <br>
http://www.yolinux.com/TUTORIALS/unix_for_dos_users.html