## Building modules
### BIOINF 575 - Fall 2023


### Python notebooks

Pros:  
* interactive
* contain code and presentation
* facilitate collaboration
* easy to write and test code
* provide quick results
* easy to display graphs

Cons:
* not really scalable to large/complex projects
    * if you have a very complex project you may not want to have all that code in one single notebook

#### Start coding

In [None]:
"""
    Gene - A class for demonstration purposes.
    The class has 2 attributes:
    - symbol - text (str) - the gene symbol
    - snp_no - numeric (int) - the number of SNPs known for the gene
    
    The class allows for the update of the numeric attribute.
    - update_snp_no updates snp_no by a given additional number of SNPs
"""
class Gene:
    def __init__(self, gene_symbol = "Gene Symbol", snp_number = 0):
        self.symbol = gene_symbol
        self.snp_no = snp_number
        
    def __str__(self):
        return f"Gene object: Gene symbol = '{self.symbol}', Number of SNPs = {self.snp_no}"
        
    def __repr__(self):
        return f"Gene('{self.symbol}',{self.snp_no})"
    
    def update_snp_no(self, additional_snps = 0):
        """
        Add parameter value to snp_no.

        Keyword arguments:
        int: additional_snps - the number to add (0)
        
        Returns:
        int: updated snp_no
        """         
        old_value = self.snp_no
        try:
            self.snp_no = self.snp_no + additional_snps
        except TypeError: 
            self.snp_no = self.snp_no + 1
            print(f"'{additional_snps}' is not a numeric value, we added 1 because at least one new SNP was found.")
        finally:
            print(f"Old value was {old_value}, new value is {self.snp_no}")
        return self.snp_no

In [None]:
# Explore Gene


#### Do more coding

In [None]:
"""
    EnhancedGene - A class for demonstration purposes.
    The class extends the class Gene with the methods:
    - update_symbol - updates symbol
    - update_snps - updates the snp number given a list of new SNPs
    
"""
class EnhancedGene(Gene):
    
    def update_symbol(self, new_symbol = ""):
        """
        Change symbol to new_symbol

        Keyword arguments:
        str: new_symbol - the string to replace the gene symbol, should contain test ("")
        
        Returns:
        str: updated gene symbol
        """        
        old_value = self.symbol
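        try:
            if not isinstance(new_symbol, str):
                raise TypeError("new_symbol must be a string")
            self.symbol = new_symbol
            # str.index raises ValueError when "test" is not found
            self.symbol.index("test")
        except TypeError: 
            # keep the old symbol and append the converted value
            self.symbol = old_value + " " + str(new_symbol)
            print(f"'{new_symbol}' is not a string, we made the conversion and added it")
        except ValueError: 
            print(f"'{self.symbol}' does not contain 'test', we added 'test' to it")
            self.symbol = self.symbol + " test"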

        finally:
            print(f"Old value was '{old_value}', new value is '{self.symbol}'")
        return self.symbol
    

    def update_snps(self, snp_list = []):
        """
        Add parameter snp_list length to snp_no.
        
        Keyword arguments:
            list: snp_list -  the list of snps to add 
        
        Returns:
            int: updated gene snp_no
        """
        old_snp_no = self.snp_no
        try:
            self.snp_no += len(snp_list)
        except TypeError:
            print("We did not change the SNP no, no collection of SNPs was provided!")
        else:
            print("SNP no updated!")        
        finally:
            print(f"Old value for the SNP no was {old_snp_no}, new value is {self.snp_no}.")
        return self.snp_no

In [None]:
# Explore EnhancedGene



### From exploration work to production

https://docs.python.org/3/tutorial/modules.html

### Python Modules and Scripts

In [None]:
# All commands with ! can be run without ! in the terminal (mobaxterm or git bash)
# if you use the terminal navigate to the same folder as the notebook
# Create a module/script file test.py
# A module/script is a .py file with python code
# A script can start with a comment line (the shebang) that tells the shell which program to use to run it
#   #!/usr/bin/python
# On Windows, echo also writes the quotes into the file; remove them from test.py

# !echo 'print("This is a python script")' > test.py



Create a file `test.py` in the same folder as the notebook and add the print statement to it:

```python
print("This is a python script")
```

In [None]:
# Check python version

#!which python
!type python

In [None]:
!python --version

In [None]:
#!which python3
!type python3

In [None]:
!python3 --version

In [None]:
# run/execute script

!python test.py


In [None]:
!python3 test.py

In [None]:
!ls -la test.py

We add the header line in the .py file to tell the shell interpreter what program to use to run the script.   
The path is the result of `which python3` (or `type python3`), not of `python --version`, which only reports the version number.

```
#!/usr/bin/python3
```
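
The path for the header line can also be looked up from inside Python. A minimal sketch using only the standard library (`sys.executable` and `shutil.which`); the printed paths depend on your installation:

```python
import shutil
import sys

# Full path of the interpreter that is running this code
interpreter_path = sys.executable
print("Running interpreter:", interpreter_path)

# Path the shell would find for the command `python3` (None if it is not on PATH)
python3_on_path = shutil.which("python3")
print("python3 on PATH:", python3_on_path)
```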

In [None]:
# change permissions to add execute (x) permission for the user (u)
# in mobaxterm and gitbash this may already be done automatically
# in mobaxterm this will work with no problems

!chmod u+x test.py

In [None]:
!./test.py

In [None]:
# check the environment

%whos

In [None]:
#import a module

import test

#### Adding more code - adding a function

In [None]:
def test_function(no = 0):
    print("This is a function in a python script")
    return no + 1

In [None]:
test_function()

In [None]:
test_function(5)

In [None]:
# add the test_function to the test.py file
# restart the kernel (the button with the round arrow icon)
# then import (run this cell)

import test as t

In [None]:
dir(t)

In [None]:
# test the function


In [None]:
# add a test_variable and set a value, restart kernel, import
import test as t

# the variable is now available to use
dir(t)

In [None]:
# check the environment

%whos

In [None]:
# check the variable 

t.test_variable

In [None]:
# add another variable


In [None]:
# add a function called say_hi to the test.py file
# the function prints "Hi name" using a parameter name
# restart kernel, import
# test the function



#### We have created a module - has functionality ready to import
#### We could also use it as a script - to actually compute results


### `__main__` - Top-level script environment

`'__main__'` is the name of the scope in which top-level code executes. A module's `__name__` is set equal to `'__main__'` when read from standard input, a script, or from an interactive prompt.

A module can discover whether or not it is running in the main scope by checking its own `__name__`, which allows a common idiom for conditionally executing code in a module when it is run as a script or with `python -m` but not when it is imported.

```python
if __name__ == "__main__":
    # execute only if run as a script
    main()  # function that contains the code to execute
```

https://docs.python.org/3/library/__main__.html
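
Putting the idiom together, here is a minimal sketch of what `test.py` could look like at this point. The names `test_function` and `test_variable` follow the cells in this notebook; the exact content of your own file may differ:

```python
# test.py (sketch): usable both as a module and as a script

test_variable = 10


def test_function(no=0):
    """Return the given number plus one."""
    return no + 1


def main():
    # code that should run only when the file is executed as a script
    print(f"The test variable value is {test_variable}")
    print(f"test_function(5) returns {test_function(5)}")


if __name__ == "__main__":
    # True for `python test.py`, False for `import test`
    main()
```

With this layout, `import test as t` gives access to `t.test_function` and `t.test_variable` without running `main()`.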

In [None]:
list.__name__

In [None]:
t.__name__

In [None]:
def main():
    test_variable = 10
    print(f'The test variable value is {test_variable}')

In [None]:
main()

In [None]:
# add main, the if statement, restart kernel, import
import test as t

In [None]:
dir(t)

In [None]:
!python test.py

In [None]:
# we can still use the script as a module 
#restart kernel, import
import test as t

In [None]:
t.test_variable


#### `sys.argv`

`sys.argv` is the list of command-line arguments passed to a Python script. `argv[0]` is the script name (it is operating system dependent whether this is a full pathname or not). <br>
If the command was executed using the `-c` command line option to the interpreter, `argv[0]` is set to the string `'-c'`. <br>
If no script name was passed to the Python interpreter, `argv[0]` is the empty string.

The `sys` module provides access to the command-line arguments through `sys.argv`.<br>
`len(sys.argv)` is the number of command-line arguments, including the script name.

Add to the script

```python
import sys

print('Number of arguments:', len(sys.argv))
print('Argument List:', str(sys.argv))
```
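
A small sketch of how the list can be split into the script name and the actual arguments. The helper name `split_arguments` is hypothetical, used here only for illustration:

```python
import sys


def split_arguments(argv):
    """Split an argv-style list into the script name and the remaining arguments."""
    script_name = argv[0]
    arguments = argv[1:]  # empty list when no arguments were given
    return script_name, arguments


script_name, arguments = split_arguments(sys.argv)
print("Script:", script_name)
print("Arguments:", arguments)
```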

In [None]:
!python test.py

#### Give some arguments

In [None]:
# all arguments will be retrieved as strings
!python test.py [1,2,4] message 1

In [None]:
# arguments are separated by spaces, so a single argument cannot contain a space unless it is quoted


```import numpy as np```

In [None]:
# Making an array out of a string containing a list

import numpy as np
np.array("[1, 2 , 3]".strip('][').split(','), dtype = int)
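
The same conversion can be done without NumPy. A minimal sketch with a hypothetical helper `parse_int_list` that turns a command-line string such as `[1,2,4]` into a list of integers:

```python
def parse_int_list(text):
    """Convert a string like "[1, 2 , 3]" into a list of integers."""
    inner = text.strip().strip("][")
    if not inner.strip():
        return []  # "[]" or an empty string gives an empty list
    # int() ignores the whitespace around each number
    return [int(item) for item in inner.split(",")]


print(parse_int_list("[1, 2 , 3]"))
print(parse_int_list("[1,2,4]"))
```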

#### Argument parsing

`import getopt`
    
`opts, args = getopt.getopt(argv, 'a:b:', ['foperand=', 'soperand='])`

The signature of the getopt() method looks like:

`getopt.getopt(args, shortopts, longopts=[])`

* `args` is the list of arguments taken from the command-line.
* `shortopts` is where you specify the option letters. If you supply a:, then it means that your script should be supplied with the option a followed by a value as its argument. Technically, you can use any number of options here. When you pass these options from the command-line, they must be prepended with '-'.
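* `longopts` is where you can specify the extended versions of the shortopts. A long option that takes a value needs a trailing `=` (for example `'foperand='`). On the command line they must be prepended with '--'.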
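
A quick way to see what `getopt.getopt` returns is to call it with an explicit list instead of `sys.argv[1:]`. In this sketch the option names `-a`, `-b`, `--foperand` and `--soperand` follow the call above, the trailing `=` on the long names marks them as taking a value, and the sample values are made up:

```python
import getopt

# Simulated command line: -a 1 --soperand 2 extra
sample_args = ["-a", "1", "--soperand", "2", "extra"]

opts, args = getopt.getopt(sample_args, "a:b:", ["foperand=", "soperand="])

print(opts)  # list of (option, value) pairs
print(args)  # whatever is left after the options
```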

https://www.datacamp.com/community/tutorials/argument-parsing-in-python
https://docs.python.org/2/library/getopt.html
https://www.tutorialspoint.com/python/python_command_line_arguments.htm

In [None]:
# create the test_getopt.py file and copy the following content to the file
!touch test_getopt.py

Change the file test_getopt.py


```python
import getopt
import sys

import numpy as np

USAGE = "usage: test_getopt.py -l <list_operand> -s <string_operand> -n <number_operand>"

try:
    # Define the getopt parameters; a trailing ":" or "=" means the option takes a value
    opts, args = getopt.getopt(sys.argv[1:], "l:s:n:", ["list=", "string=", "number="])
    print("no of arguments:", len(opts))
    if len(opts) != 3:
        print("Provide 3 arguments.")
        print(USAGE)
    else:
        print("options:", opts)
        # The options are read by position: give them in the order -l, -s, -n
        test_array = np.array(opts[0][1].strip('][').split(','), dtype = int)
        string_text = opts[1][1]
        number_text = int(opts[2][1])
        test_array = test_array * number_text
        print(f'\nInfo "{string_text}", the updated list is: {test_array}\n')
except getopt.GetoptError:
    print(USAGE)
```
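
The script above reads the options by position, so it only works when they are given in the order `-l`, `-s`, `-n`. A sketch of one way around that, assuming a hypothetical helper `options_to_dict` that maps each short or long name to a fixed key:

```python
import getopt

# Map every accepted spelling of an option to one key (names are illustrative)
OPTION_KEYS = {
    "-l": "list", "--list": "list",
    "-s": "string", "--string": "string",
    "-n": "number", "--number": "number",
}


def options_to_dict(argv):
    """Parse argv-style arguments into a dictionary keyed by option name."""
    opts, _ = getopt.getopt(argv, "l:s:n:", ["list=", "string=", "number="])
    return {OPTION_KEYS[option]: value for option, value in opts}


# The order on the command line no longer matters
options = options_to_dict(["-n", "3", "--list", "[1,2,4]", "-s", "message"])
print(options)
```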

In [None]:
!python test_getopt.py

In [None]:
!python test_getopt.py -l [1,2,4] -s message

In [None]:
!python test_getopt.py -l [1,2,4] -s message -n 3

#### `argparse` - increased readability
`import argparse`

`class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True)`<br>
https://docs.python.org/3/library/argparse.html#argumentparser-objects

Argument definition<br>
`ArgumentParser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest])`<br>
https://docs.python.org/3/library/argparse.html#the-add-argument-method

`ap.add_argument("-i", "--ioperand", required=True, help="important operand")`

* -i - letter version of the argument
* --ioperand - extended version of the argument
* required - whether the argument is required or not
* help - meaningful description
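
To try an argument definition without leaving the notebook, `parse_args` accepts an explicit list of strings in place of the real command line. A minimal sketch using the `-i/--ioperand` definition above; the value `42` is just sample input:

```python
import argparse

ap = argparse.ArgumentParser(description="argparse demonstration")
ap.add_argument("-i", "--ioperand", required=True, help="important operand")

# Short and long forms fill the same attribute, named after the long form
short_form = ap.parse_args(["-i", "42"])
long_form = ap.parse_args(["--ioperand", "42"])

print(short_form.ioperand)  # values arrive as strings unless type= is given
print(long_form.ioperand)
```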

https://www.datacamp.com/community/tutorials/argument-parsing-in-python
https://docs.python.org/3/library/argparse.html
https://realpython.com/command-line-interfaces-python-argparse/

In [None]:
# create the test_argparse.py file and copy the following content to the file
!touch test_argparse.py

Change the file test_argparse.py



```python
import argparse
import numpy as np


ap = argparse.ArgumentParser()

# Add the arguments to the parser
ap.add_argument("-l", "--list_operand", required=True, help="list operand")
ap.add_argument("-s", "--string_operand", required=True, help="string operand")
ap.add_argument("-n", "--number_operand", required=True, help="number operand")

args = vars(ap.parse_args())
print("arguments:", args)
test_array = np.array(args["list_operand"].strip("][").split(","), dtype = int)
string_text = args["string_operand"]
number_text = int(args["number_operand"])
test_array = test_array * number_text

print(f"\nResult with argparse.\nInfo '{string_text}', for updated list {test_array}\n")
```
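
The script above converts the values by hand after parsing. As an alternative sketch, `argparse` can do the conversion itself through the `type` parameter, which accepts any callable that takes a string. The converter `int_list` is a hypothetical helper, and plain lists are used instead of NumPy to keep the example free of dependencies:

```python
import argparse


def int_list(text):
    """Convert a string like "[1,2,4]" into a list of integers."""
    return [int(item) for item in text.strip("][").split(",")]


ap = argparse.ArgumentParser()
ap.add_argument("-l", "--list_operand", required=True, type=int_list, help="list operand")
ap.add_argument("-s", "--string_operand", required=True, help="string operand")
ap.add_argument("-n", "--number_operand", required=True, type=int, help="number operand")

# An explicit list stands in for the command line
args = ap.parse_args(["-l", "[1,2,4]", "--string_operand", "message", "-n", "5"])

scaled = [value * args.number_operand for value in args.list_operand]
print(f"Info '{args.string_operand}', updated list {scaled}")
```

A side effect of this design is that a value that cannot be converted is reported by `argparse` as a usage error instead of surfacing later as an exception in the script.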


In [None]:
!python test_argparse.py -h

In [None]:
!python test_argparse.py -l [1,2,4] --string_operand message -n 5

##### `action` parameter - count example
https://docs.python.org/3/library/argparse.html#action

'count' - This counts the number of times a keyword argument occurs. For example, this is useful for increasing verbosity levels:

`ap.add_argument("-v", "--verbose", action='count', default=0)`
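
A minimal sketch of the counting behaviour, passing explicit lists to `parse_args` so that it can be run in a notebook cell:

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-v", "--verbose", action="count", default=0)

print(ap.parse_args([]).verbose)                   # flag absent: the default is used
print(ap.parse_args(["-v"]).verbose)               # flag given once
print(ap.parse_args(["-vv"]).verbose)              # flag repeated in one token
print(ap.parse_args(["-v", "--verbose"]).verbose)  # short and long forms mixed
```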


In [None]:
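# add the -v/--verbose argument to test_argparse.py first, otherwise argparse rejects -vv as unrecognized
# then check the long and short form of the operands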

!python test_argparse.py -l [1,2,4] --string_operand message -n 5 -vv

### More Modules

https://docs.python.org/3/tutorial/modules.html
https://www.python.org/dev/peps/pep-0008/#package-and-module-names

If you want to write a somewhat longer program, you are better off <b>using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a script.</b> 
    
As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program.

A module is a file containing Python definitions and statements. <b>The file name is the module name with the suffix .py appended</b>. Within a module, the module’s name (as a string) is available as the value of the global variable `__name__`.
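
The link between the file name and `__name__` can be checked directly. A sketch that writes a throwaway module to a temporary folder and imports it; the file name `demo_module.py` is made up for the illustration:

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    # The module stores its own __name__ in a variable when it is imported
    Path(folder, "demo_module.py").write_text("module_name = __name__\n")
    sys.path.insert(0, folder)  # make the folder importable
    try:
        importlib.invalidate_caches()
        demo_module = importlib.import_module("demo_module")
    finally:
        sys.path.remove(folder)

print(demo_module.module_name)  # the file name without the .py suffix
```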

In [None]:
#Let's create a module for our classes
!touch gene_module.py
# add Gene class to the file

In [None]:
import gene_module as gm

In [None]:
gm.Gene()

In [None]:
!touch enhanced_gene_module.py
# add EnhancedGene class to the file
# the file also needs this line at the top: from gene_module import Gene

In [None]:
import enhanced_gene_module as egm

In [None]:
eg1 = egm.EnhancedGene()

In [None]:
#dir(eg1)
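
A module that defines a subclass has to import the parent class from the module where it lives. A self-contained sketch of that pattern, using two tiny hypothetical modules (`base_demo.py` and `child_demo.py`) written to a temporary folder; they stand in for `gene_module.py` and `enhanced_gene_module.py`:

```python
import importlib
import sys
import tempfile
from pathlib import Path

BASE_SOURCE = '''
class Base:
    def describe(self):
        return "base"
'''

CHILD_SOURCE = '''
from base_demo import Base  # the parent class lives in another module


class Child(Base):
    def describe(self):
        return "child of " + super().describe()
'''

with tempfile.TemporaryDirectory() as folder:
    Path(folder, "base_demo.py").write_text(BASE_SOURCE)
    Path(folder, "child_demo.py").write_text(CHILD_SOURCE)
    sys.path.insert(0, folder)  # make the folder importable
    try:
        importlib.invalidate_caches()
        child_demo = importlib.import_module("child_demo")
    finally:
        sys.path.remove(folder)

description = child_demo.Child().describe()
print(description)
```

Without the `from base_demo import Base` line, importing `child_demo` would fail with a `NameError`, because `Base` is not defined inside that file.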