# PharmSci 275
## Drug discovery computing techniques


### Nathan Lim - Limn1@uci.edu
#### Mobley Lab - 3134B Nat. Sci. I



I’m going to assume: 
- Some familiarity with Linux/UNIX as a computing environment.
    - i.e. Basic file movement operations (`cd`, `ls`)
- You already know at least one programming/scripting language
    - If not let’s set up time to work through some introductory stuff one-on-one

# Starting off in Python
- A good way to start off in Python is by playing around in the *interactive* interpreter. Open up a terminal and execute: 
  >`python`
- **Preferred**: Get Jupyter notebooks from 
> **continuum.io/downloads**
-  *If you have Jupyter Notebook installed* execute:
>`jupyter notebook`



Below, let's write a simple program that displays text to the terminal

In [None]:
# Below is a sample program
x = 'Hello World'
print(x)

- **`#`** : Denotes a *comment* in the code and is not executed.
- **`Hello World`** : Python *string*, a data structure that can store a sequence of any kind of characters (including special characters, such as newline, ‘\n’)
- **`x`** : a Python *variable (or object)*, which we assign by storing a string to it
- **`print()`** : a Python function that prints the contents of x

# Python is dynamically typed

- Integers: defined by leaving out the decimal
- Floating point numbers *have* the decimal
- Use **`type()`** to determine what the type of the object is.

In [None]:
x = 12.0
print(x, 'is', type(x))

y = 12
print(y, 'is', type(y) )

- Conversion between types is done using the type name but is not needed very often
  - Typically done when converting between *numerical* and **non**-numerical data, but less often between different kinds of numerical data.

In [5]:
z = 12.0
print( z, 'is', type(z))

z = int(z) 
print(z, 'is now', type(z) )

12.0 is <class 'float'>
12 is now <class 'int'>
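
Conversion between numerical and non-numerical data usually means turning text into numbers or back again. The sketch below is illustrative only; the variable names are made up for this example.

```python
# Convert text (e.g. something read from a file) to numbers, and back again
text_value = '42'
as_int = int(text_value)        # string -> integer
as_float = float('3.5')         # string -> floating point number
as_text = str(as_int + 1)       # number -> string

print(as_int, type(as_int))
print(as_float, type(as_float))
print(as_text, type(as_text))
```

Note that `int()` applied to a float (as in the cell above) drops the fractional part rather than rounding.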


# Data Structures: `lists`
* Lists in Python are a generic data structure that can contain pretty much anything:

In [None]:
a = []
a.append(1)
a.append('test')
a.append(1.43)
print(a )

- Common operations are defined for lists, for example: 
  - `a+b` combines a and b if they are lists

In [None]:
b = [0,2,'another list', 'final entry']
print(b)

c = a + b
print(c)

# Everything in Python is an *object*
- An object is something with properties and functions that can be accessed with dot notation. 
- For example, the list c we just created:

In [None]:
print(c)

In [None]:
# Get the index of an element in the list
c.index('test')

###### Notice indexing starts with **0** and NOT 1

In [None]:
# Remove an element from the list and reverse the list
c.remove('another list')
c.reverse()
print(c)

Python provides help information on these methods with **`help()`**

In [None]:
help(c.count)

Python also provides information on properties/methods with **`dir()`**

In [None]:
dir(c)

# Making decisions in Python
- **If/elif/else** statements: Executes code if a condition is met
- **Note**: Indentation is used to designate code blocks.
 - Blocks end when indentation goes back to normal (and, in interactive, when there is a carriage return)

In [None]:
a = 1
if a==4:
    print(a)
elif a==2:
    print(a*a)
else:
    print(a*a+1)

- **While** statement: Executes code as long as the condition is True (i.e. until it becomes False)

In [None]:
ct = 5
while (ct > 1):
    print(ct)
    ct = ct -1

# Loops in Python
- Basic loop: `for` loop
- **`for`** loops are typically used to do almost everything that needs to be done multiple times
- **If** statements are better for one-time decisions
- **While** statements are not used as frequently

In [None]:
elements = ['how', 'do', 'loop']
for elem in elements:
    print(elem)

# Mathematics in Python
- Normal mathematical operations are available
- Other operations (such as square root) are available from numerical libraries like **`numpy`** or **`math`**
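
As a minimal sketch of the points above, the following uses the built-in arithmetic operators and then the standard `math` module for a square root. The numbers are arbitrary.

```python
import math

# Built-in arithmetic operators
print(7 + 2, 7 - 2, 7 * 2)   # addition, subtraction, multiplication
print(7 / 2)                 # true division gives a float in Python 3
print(7 // 2, 7 % 2)         # floor division and remainder
print(2 ** 10)               # exponentiation

# Functions such as square root come from a library
root = math.sqrt(16.0)
print(root)
```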

# Slicing for lists and strings
- Often we will want to work with a group of elements of a list or a string

In [None]:
# Define a string
line = "This is the line we're working on right now."

In [None]:
# Prints element 1 from string.
print( line[1] )

In [None]:
# Prints elements at indices 16 through 20 (the end index, 21, is excluded)
print( line[16:21] ) 

- Slicing can get a lot more sophisticated

In [None]:
# Print elements BEFORE index 20
print( line[:20] )

In [None]:
# Print elements from index 20 onward
print( line[20:] )

In [None]:
# Use negative numbers to count backwards from the end (this drops the last 5 characters)
print( line[:-5] )

In [None]:
# Overwrites what was stored previously in `line`
line = line[7:20]
print( line )

# Using Python noninteractively
- Using Python **non**-interactively (call from command-line) is more common for substantial/real tasks
- Allows us to avoid retyping stuff
- Allows reuse of code
- Easier troubleshooting

- Let's say I have the following in a file named **`test.py`** (by convention, the filename ends with `.py`)
```python
x = [0,1,2,3,4,5]
y=[]
for i in x:
    y.append( i*i-1 )
z = x+y
print(z)
```

- To run the Python script, execute:
  > python test.py

# Documentation in python
- Functions can easily be documented by adding ‘docstrings’ at the beginning of them
  - These make it possible for the built-in `help()` to work for functions you have written as well

In [2]:
# Define some function
def a(x, y): 
    """Adds two variables x and y, of any type.  Returns single value.""" 
    return x + y 

In [3]:
help(a)

Help on function a in module __main__:

a(x, y)
    Adds two variables x and y, of any type.  Returns single value.



- Commenting is also important for code readability

In [None]:
#Compose a new list consisting of 
#the squares of elements in the original list
some_list = [0,1,2,3]
b = [ elem*elem for elem in some_list]
print(b)

# Python modules

- A Python ‘module’ is a package of code that can be used from within another package
- Many built-in modules already available
- Others may be written by you
- Any file you have already written may be used as a module
- Can be written in other languages, including Fortran and C++ for speed
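
A short sketch of the common ways to import a built-in module follows. The file path is a made-up example, and the alias `osp` is just an illustrative choice.

```python
# Import a whole built-in module and access its contents with dot notation
import math
print(math.pi)

# Import a module under a shorter name
import os.path as osp
print(osp.basename('/home/user/test.py'))   # hypothetical path

# Import only specific names from a module
from math import sqrt, floor
print(floor(sqrt(10)))
```

A file you have written yourself, say `test.py`, can be imported the same way with `import test` when it sits in the directory you are running from.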

# File input and output
- File I/O is straightforward
```python
file = open('README.md', 'r')
for line in file:
    print(line)
file.close()
```

- `'r'` opens in read mode, `'w'` in write mode, `'a'` in append mode
  - Read/write lines as list of strings
     > file.readlines()
     > file.writelines(lines)
  - Read a single line (file objects have no `writeline()`; write one line with `write()`)
     > file.readline()
     > file.write(line)
  - Read/write a string from/to file
     > file.read()
     > file.write(str) 
  - Close the file when done
     > file.close()
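
The modes and methods listed above can be tried out with a small round trip. This is a sketch under a few assumptions: the file name `example.txt` and the temporary directory are invented for the example, and it uses the `with` statement, which closes the file automatically instead of calling `close()` by hand.

```python
import os
import tempfile

# Hypothetical file inside a fresh temporary directory
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, 'example.txt')

# 'w' mode: write() writes one string, writelines() a list of strings
with open(path, 'w') as outfile:
    outfile.write('first line\n')
    outfile.writelines(['second line\n', 'third line\n'])

# 'a' mode: add to the end of the existing file
with open(path, 'a') as outfile:
    outfile.write('fourth line\n')

# 'r' mode: read all lines back as a list of strings
with open(path, 'r') as infile:
    lines = infile.readlines()

# Each line keeps its trailing newline; strip() removes it
for line in lines:
    print(line.strip())
```

Neither `write()` nor `writelines()` adds newlines for you, which is why each string above ends with `\n`.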
  

# Common mistakes
- Syntax errors: try some of them out to get a sense of how Python reports these errors.
- Most common is the print function:
  - **Python3** uses: 
      > `print('string')`
  - **Python2** uses:
      > `print 'string' `

In [None]:
for i in range(5):
    print i
    #print(i)

In [None]:
b = 2
a = sqrt(float(b+2.)

In [None]:
# "scope" errors: a is reset to an empty list on every pass,
# so only the last value survives the loop
for i in range(20):
   a = []
   a.append(i)
print(a)

In [None]:
# type errors: adding a string and an integer raises a TypeError
a = 'test string'
b = 2
c = a+b

# Tips for *debugging*
- Use print statements
  - Print the contents of variables throughout the code so you can see what’s going on
- Step through the code if necessary
  - You can use a debugger to do this, or just insert `input()` statements (`raw_input()` in Python 2) in your code, which pause it until you provide any input (such as hitting ‘enter’)
- Test the components of your code 
  - If your program involves multiple conceptual steps, make sure each one works independently
- If it is too confusing, simplify by breaking into smaller steps 


# Moving on to SciPy/NumPy 

## Python is not particularly fast for numerics; numerical libraries help
- One disadvantage of Python not being a compiled language is that it tends to be slow for large-scale numerics
- NumPy (Numeric Python) provides basic routines for manipulating arrays and numeric data
- SciPy extends it by providing minimization, Fourier transformation, statistical, and other algorithms
- These enhance speed because they are precompiled and written in other languages, though the user doesn’t need to know about this part

# Reminder: Import modules to allow access to their functionality

In [None]:
#Access using numpy.X
import numpy
help(numpy.sum)

In [None]:
#Give it a shortcut
import numpy as np
help(np.sum)

# Arrays are the key element of numpy
- An array is like a Python list, but it only contains numerical data (which must all be of the same type, like float or int)
- Arrays are much more efficient than lists
- Many common tasks can be written to be fast, using array operations rather than working element by element
- Unlike lists, array sizes are assigned in advance
- It is easy to create arrays:

In [None]:
a = [ 1, 4, 5, 8]
b = np.array( a, float ) 
print(b)

In [None]:
c = np.zeros( (10), int )
print(c)
print( type(c) )
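
To make the point about array operations versus element-by-element work concrete, here is a small sketch that computes the same squares both ways. The variable names are illustrative only.

```python
import numpy as np

values = np.arange(5, dtype=float)   # 0.0, 1.0, 2.0, 3.0, 4.0

# Element by element, using a preallocated array and a loop
squares_loop = np.zeros(len(values), float)
for i, v in enumerate(values):
    squares_loop[i] = v * v

# The same calculation as a single array operation
squares_array = values * values

print(squares_loop)
print(squares_array)
```

Both give the same result; for large arrays the single array operation is typically much faster because the loop runs inside NumPy's compiled code.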

# Multidimensional arrays, and array preallocation

- Array preallocation is useful when amount of data is known

In [None]:
#Open a file and read lines
file = open('tmp.txt', 'r')
text = file.readlines()
file.close()

In [None]:
#Allocate storage for data from file
data = np.zeros( (len(text)), float )

# Loop over the lines in text together with their index,
# count the characters in each line,
# and store the count in the data array
for (linenum, line) in enumerate(text):
    print(linenum , len(line), line)
    data[linenum] = len(line)

print(data)

- Arrays can be multidimensional and different “axes” are accessed using commas within bracket notation

In [None]:
a = np.array( [ [1,2,3], [4, 5, 6] ], float )
print(a)

a[0,1]

# Array slicing is rather like list slicing

In [None]:
a = np.array( [ [1,2,3], [4, 5, 6] ], float )
print( a[0,:] )

In [None]:
print( a[:,2] )

In [None]:
print( a[-1:, -2:] )

- Note that use of : in a particular dimension means everything in that dimension
- 2D arrays can be thought of as having indices of “rows” and then “columns”, 
   - First index is row number, second is column number

# Common array inquiries
- Shape can be obtained from shape property

In [None]:
a.shape

- `len()` gives the length of the first axis

In [None]:
len(a)

- The `in` operator can be used to test for presence of values

In [None]:
a = np.array( [ [1,2,3], [4, 5, 6] ], float )
print( 2 in a )
print( 0 in a )

- Arrays can be converted to lists

In [None]:
a.tolist()

# Combining arrays and simple array math
- Arrays are combined not with `+`, but with `c = np.concatenate((a, b))` (the arrays are passed together as a tuple)
- For array math, arrays should be the same size
  - `+`, `-`, `/`, `*`, `%`, and `**` all do element-by-element operations as you might expect
  - For multidimensional arrays, * performs element-by-element multiplication, NOT matrix multiplication (there are other functions for that)

In [None]:
a = np.array([1,2,3], float) 
b = np.array([5,2,6], float) 
a * b 
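
The sketch below contrasts joining arrays with doing math on them. It uses `np.dot` as one of the functions that performs matrix multiplication; the small matrices are made up for the example.

```python
import numpy as np

a = np.array([1, 2, 3], float)
b = np.array([5, 2, 6], float)

# Join two arrays end to end; the arrays are passed as one tuple
c = np.concatenate((a, b))
print(c)

# Element-by-element product versus the dot (inner) product
print(a * b)
print(np.dot(a, b))

# For 2D arrays, np.dot performs matrix multiplication
m = np.array([[1, 2], [3, 4]], float)
identity = np.array([[1, 0], [0, 1]], float)
print(np.dot(m, identity))
```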

- Arrays (unlike lists) can handle math with scalars

In [None]:
a = np.array([1,2,3], float)
a+=2
print(a)

# Mathematical functions and array operations
- Numpy provides many mathematical functions and constants
  - functions: abs, sign, sqrt, log, log10, exp, sin, cos, tan, arcsin, arccos, arctan, sinh, cosh, tanh, arcsinh, ...
  - constants: pi, e
 
- There are also many functions for whole array operations
    - Applied like `a.sum()`, which sums elements in array
       > a.prod(), a.mean(), a.var(), a.std(), a.min(), a.max()
       
    - `argmin` and `argmax` give indices of minimum and maximum values

- Can limit application to a particular axis (direction) using optional axis argument

In [None]:
a = np.array([[0,2], [3, -1], [3,5]], float)
a.mean(axis=0)
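
A few more whole-array operations on the same array, as a rough sketch of how the optional `axis` argument changes the result:

```python
import numpy as np

a = np.array([[0, 2], [3, -1], [3, 5]], float)

print(a.sum())          # sum over every element
print(a.sum(axis=0))    # sum down each column
print(a.sum(axis=1))    # sum across each row

# argmin/argmax index into the flattened array unless an axis is given
print(a.argmax())
print(a.argmin(axis=0))
```

With `axis=0` the result has one entry per column, and with `axis=1` one entry per row.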

# Comparisons to arrays
- Standard comparisons can be performed on arrays
    - Return arrays of Boolean (True/False) values

In [None]:
a = np.array([1,3,0] , float)
b = np.array([0,3,2], float)
a > b

- `np.where` can be used to test where a particular condition is met, returning indices where it is met

In [None]:
coordinates = np.array( [0.0, 0.45, 0.75, 0.13, 0.89, 1.47, 0.14], float )
interesting_residues = [ True, False, False, True, True, False, True]
indices = np.where( interesting_residues )
interesting_coordinates = coordinates[indices] #See p.15 NumPy Writeup
interesting_coordinates
np.where( coordinates < 0.3 )

- Logical operations can be used in constructing conditions
    - `logical_and`, `logical_or`, e.g. `np.logical_and(a > 0, a < 3)` indicates places where `a` meets both conditions
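
A minimal sketch of combining conditions follows; the values in `a` are arbitrary.

```python
import numpy as np

a = np.array([-1.0, 0.5, 2.0, 3.0, 4.5], float)

# Boolean array marking elements that satisfy both conditions
mask = np.logical_and(a > 0, a < 3)
print(mask)

# Indices where the combined condition holds, and the matching values
indices = np.where(mask)
print(indices)
print(a[indices])

# logical_or is True where either condition holds
either = np.logical_or(a < 0, a > 4)
print(either)
```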

# A couple of matplotlib/pylab plotting tips
## Plots using pylab are very easy

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

xvals = np.arange(10)
yvals = [0, 5, 7, 3,13, 7, -5, 8 , 3, 7]
plt.plot(xvals, yvals)
plt.show()

- Some other useful things to know:
  - figures can be saved to a file with `savefig`; various file formats are available, such as png, pdf, jpg, svg, etc.
  - axis labels can be added (xlabel, ylabel)
  - limits adjusted with xlim, ylim etc.
  - Symbols and line styles can be specified with optional argument to plot: 
  `plot(xvals, yvals, 'mo')` would plot with magenta circles; see help(plot)
  - To create a new figure (rather than adding to existing one) use `figure()`
- Documentation is available online, including examples; use Google and `help()`
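
The tips above can be combined into one short script. This is only a sketch: the axis labels, limits, and output file name are invented for the example, and it selects the non-interactive `Agg` backend so that it also runs outside a notebook (inside Jupyter that line can be dropped).

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: no display available, so draw off-screen
import matplotlib.pyplot as plt
import numpy as np

xvals = np.arange(10)
yvals = [0, 5, 7, 3, 13, 7, -5, 8, 3, 7]

plt.figure()                    # start a new figure
plt.plot(xvals, yvals, 'mo')    # magenta circles
plt.xlabel('x value')           # hypothetical axis labels
plt.ylabel('y value')
plt.xlim(-1, 10)
plt.ylim(-10, 15)

# The file extension selects the output format (png here)
outfile = os.path.join(tempfile.mkdtemp(), 'example_plot.png')
plt.savefig(outfile)
plt.close()
```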

# Lists, assignment, and name binding: A word to the wise
- Python handles assignment often by pointing
  - i.e. `a = 2` creates 2 in memory, then points the variable `a` to it
  - This means that copying is not necessarily as expected:
    - `a = [1, 2, 3]` and `b=a` makes b point to the same place as a, 
    - After setting `b[1] = 0`, `print(a)` will give `[1, 0, 3]`
  - If copies are needed, the copy module can be used: `import copy`, then `b = copy.copy(a)`
  

- BUT this is not true for some types of data -- numbers are ‘immutable’, 
   - so setting `a = 2`, `b = a`, and `a=3`, leaves a with the value 3 and b with the value 2. 
   - Setting a = 3 makes a point to a new object in memory, leaving b pointing to the old one
- This can be confusing -- the point is, you should just be careful not to assume b = a gives you a COPY of a, though this is always OK for numbers
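
The behavior described above can be checked directly. The sketch uses the `is` operator to confirm whether two names refer to the same object.

```python
import copy

# Assigning a list binds a second name to the SAME list
a = [1, 2, 3]
b = a
b[1] = 0
print(a)          # a changed too
print(a is b)     # both names refer to one object

# copy.copy makes a new list, so changing it leaves the original alone
a = [1, 2, 3]
b = copy.copy(a)
b[1] = 0
print(a, b)

# Numbers are immutable: rebinding one name does not affect the other
x = 2
y = x
x = 3
print(x, y)
```

For lists that contain other lists, `copy.copy` only copies the outer list; `copy.deepcopy` copies the nested ones as well.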

# “Random” numbers in Python
- In computational science, we often want to do something with a certain probability. This means we need a “random” number to determine whether or not an event happens.
  - i.e. assigning random initial velocities
- Computers can’t actually generate random numbers -- they are deterministic
  - So we generate pseudorandom numbers -- deterministically generated from a seed number, but distributed in a way that is very similar to random.

- NumPy has a random number generator in its ‘random’ module

In [None]:
np.random.seed(293423)
np.random.rand(5) 

In [None]:
np.random.random()

In [None]:
np.random.randint(5, 10) 
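
Two points from this section can be sketched together: the same seed reproduces the same sequence, and an event can be made to happen with a chosen probability by comparing a uniform random number to it. The probability `p` and the number of draws are arbitrary choices for illustration.

```python
import numpy as np

# Seeding makes the "random" sequence reproducible
np.random.seed(293423)
first = np.random.rand(5)
np.random.seed(293423)
second = np.random.rand(5)
print(np.array_equal(first, second))

# Accept an event with probability p by comparing to uniform numbers in [0, 1)
p = 0.3                        # hypothetical probability
draws = np.random.rand(10000)
accepted = draws < p
print(accepted.mean())         # fraction accepted, close to p
```

Note that `np.random.randint(5, 10)` returns an integer from 5 up to but not including 10.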