# Programming with Python

Notes on the [Programming with Python course](https://swcarpentry.github.io/python-novice-inflammation/) by Software Carpentry.

## Python Fundamentals

* Basic data types in Python include integers, strings, and floating-point numbers.
* Use `variable = value` to assign a value to a variable in order to record it in memory.
* Variables are created on demand whenever a value is assigned to them.
* Use `print(something)` to display the value of `something`.
* Built-in functions are always available to use.
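
The points above can be illustrated with a minimal sketch. The variable names and values here are illustrative assumptions, not taken from the inflammation data set.

```python
# Assign values of the three basic types to variables.
# The variables are created when the assignment runs.
weight_kg = 60                      # integer
label = 'weight in kilograms:'      # string
lb_per_kg = 2.2                     # floating-point number

# A variable can be used in a calculation once it has a value.
weight_lb = lb_per_kg * weight_kg

# print() and type() are built-in functions, so no import is needed.
print(label, weight_kg)
print('weight in pounds:', weight_lb)
print(type(weight_kg), type(label), type(lb_per_kg))
```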

## Analysing Patient Data

Load `NumPy` and call `np.loadtxt()` to read `inflammation-01.csv`, which contains inflammation data for arthritis patients. Each row holds the data for one patient, and each column holds one day's inflammation measurement.

In [1]:
import numpy as np
data = np.loadtxt(fname='../data/swc/inflammation-01.csv', delimiter=',')
print(data)

[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]


Use `type()` to find out the type of the object that `data` refers to.

In [2]:
print(type(data))

<class 'numpy.ndarray'>


The `type()` function will only tell you that a variable is a `NumPy` array but not the type of values stored in the array. A NumPy array contains elements of the same type; use `dtype` to find out the type of data stored inside an array.

In [3]:
print(data.dtype)

float64
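
To make the difference between `type()` and `dtype` concrete, here is a small sketch using made-up arrays (the names `ints`, `floats`, and `mixed` are assumptions for illustration). All three are `numpy.ndarray` objects, but their element types differ.

```python
import numpy as np

ints = np.array([1, 2, 3])            # all integers
floats = np.array([1.0, 2.0, 3.0])    # all floating-point numbers
mixed = np.array([1, 2.5, 3])         # integers are converted to floats

# type() reports the same container type for every array.
print(type(ints), type(floats), type(mixed))

# dtype reports the type of the elements stored inside each array.
print(ints.dtype)     # an integer type; the exact width depends on the platform
print(floats.dtype)
print(mixed.dtype)
```

Because an array holds elements of a single type, NumPy converts the integers in `mixed` to floating-point numbers rather than storing two different types.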


`data.shape` is an attribute of `data` that describes the dimensions of the array: 60 rows (patients) by 40 columns (days).

In [4]:
print(data.shape)

(60, 40)


Get the first value (the cell in the first row and first column) using an index in square brackets.

In [5]:
print(data[0, 0])

0.0


Slice the first five rows and the first five columns. The slice `0:5` covers indices 0 through 4; the end index is not included.

In [6]:
print(data[0:5, 0:5])

[[0. 0. 1. 3. 1.]
 [0. 1. 2. 1. 2.]
 [0. 1. 1. 3. 3.]
 [0. 0. 2. 0. 4.]
 [0. 1. 1. 3. 3.]]


We do not need to include the lower or upper bound when slicing. If the lower bound is omitted, the slice starts at the beginning of the axis; if the upper bound is omitted, it runs to the end.

In [9]:
print(data[:5, 35:])

[[4. 2. 3. 0. 0.]
 [5. 1. 1. 0. 1.]
 [3. 2. 2. 1. 1.]
 [4. 2. 3. 2. 1.]
 [4. 2. 0. 1. 1.]]
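
The slicing rules are easier to check on an array small enough to read in full. The following is a sketch on a made-up 3 x 4 array (`small` is a hypothetical name, not part of the lesson data).

```python
import numpy as np

# A 3 x 4 array holding the values 0..11, row by row.
small = np.arange(12).reshape(3, 4)
print(small)

# Rows 0-1 and columns 0-1: the end index of each slice is excluded.
top_left = small[0:2, 0:2]
print(top_left)

# Omitted bounds: every row, and columns from index 2 to the end.
last_cols = small[:, 2:]
print(last_cols)

# A single index combined with a full slice selects one whole row.
first_row = small[0, :]
print(first_row)
```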


Calculate the mean of the entire array.

In [10]:
print(np.mean(data))

6.14875


Descriptive statistics can be assigned to several variables in one statement (multiple assignment). To get more information about a function in IPython or Jupyter, add `?` to the end of its name, e.g. `np.std?`; this shows much the same documentation as `help(np.std)`.

In [11]:
maxval, minval, stdval = np.max(data), np.min(data), np.std(data)
print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)

maximum inflammation: 20.0
minimum inflammation: 0.0
standard deviation: 4.613833197118566


Calculate the maximum inflammation for the first patient only (row index 0).

In [15]:
print(np.max(data[0, :]))

18.0


Calculate the maximum for every patient by using `axis = 1`. Axis 1 runs along the columns, so the calculation goes _across_ the columns and returns one value per row (patient).

In [26]:
print(np.max(data, axis = 1).shape)
print(np.max(data, axis = 1))

(60,)
[18. 18. 19. 17. 17. 18. 17. 20. 17. 18. 18. 18. 17. 16. 17. 18. 19. 19.
 17. 19. 19. 16. 17. 15. 17. 17. 18. 17. 20. 17. 16. 19. 15. 15. 19. 17.
 16. 17. 19. 16. 18. 19. 16. 19. 18. 16. 19. 15. 16. 18. 14. 20. 17. 15.
 17. 16. 17. 19. 18. 18.]


Calculate the maximum for every day by using `axis = 0`. Axis 0 runs along the rows, so the calculation goes _across_ the rows and returns one value per column (day).

In [27]:
print(np.max(data, axis = 0).shape)
print(np.max(data, axis = 0))

(40,)
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9. 10. 11. 12. 13. 14. 15. 16. 17.
 18. 19. 20. 19. 18. 17. 16. 15. 14. 13. 12. 11. 10.  9.  8.  7.  6.  5.
  4.  3.  2.  1.]
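
The effect of the `axis` argument can be verified by hand on a tiny array. This sketch uses a made-up 2 x 3 array (`toy` is a hypothetical name) in which rows play the role of patients and columns the role of days.

```python
import numpy as np

# Two "patients" (rows) measured on three "days" (columns).
toy = np.array([[1, 5, 3],
                [4, 2, 6]])

# axis=1 collapses the columns: one maximum per row (patient).
per_patient = np.max(toy, axis=1)
print(per_patient.shape, per_patient)

# axis=0 collapses the rows: one maximum per column (day).
per_day = np.max(toy, axis=0)
print(per_day.shape, per_day)

# Without axis, the whole array is reduced to a single value.
overall = np.max(toy)
print(overall)
```

One way to remember this: the axis that is named is the one that disappears from the shape of the result.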


Strings can be sliced in the same way as arrays.

In [28]:
element = 'oxygen'
print(element[0:3])

oxy


Negative indices count from the end of the string: `-1` is the last character and `-2` is the second to last.

In [36]:
print(element[-2])

e
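
A few more cases of negative indexing and slicing on the same string, as a short sketch (the variable names on the left are illustrative).

```python
element = 'oxygen'

last_char = element[-1]        # last character
second_last = element[-2]      # second-to-last character
last_three = element[-3:]      # from third-to-last up to the end
all_but_last = element[:-1]    # everything except the last character

print(last_char, second_last, last_three, all_but_last)
```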


Stacking arrays using `hstack` and `vstack`, which are like `cbind` and `rbind` in R.

In [46]:
A = np.array(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
)
print('A = ', "\n", A, "\n")

B = np.hstack([A, A])
print('B = ', "\n", B, "\n")

C = np.vstack([A, A])
print('C = ', "\n", C, "\n")

A =  
 [[1 2 3]
 [4 5 6]
 [7 8 9]] 

B =  
 [[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]] 

C =  
 [[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]] 



The `np.diff()` function takes an array and returns the differences between successive values (as in R), so the result has one element fewer than the input.

In [48]:
patient3_week1 = data[3, :7]
print(patient3_week1)
print(np.diff(patient3_week1))

[0. 0. 2. 0. 4. 2. 2.]
[ 0.  2. -2.  4. -2.  0.]


Calculate `np.diff` for all patients at once by taking the differences along axis 1 (from one day to the next).

In [49]:
np.diff(data, axis = 1)

array([[ 0.,  1.,  2., ...,  1., -3.,  0.],
       [ 1.,  1., -1., ...,  0., -1.,  1.],
       [ 1.,  0.,  2., ...,  0., -1.,  0.],
       ...,
       [ 1.,  0.,  0., ..., -1.,  0.,  0.],
       [ 0.,  0.,  1., ..., -2.,  2., -2.],
       [ 0.,  1., -1., ..., -2.,  0., -1.]])

Largest day-to-day increase in inflammation for each patient (the maximum of the differences along axis 1).

In [53]:
np.max(np.diff(data, axis = 1), axis = 1)

array([ 7., 12., 11., 10., 11., 13., 10.,  8., 10., 10.,  7.,  7., 13.,
        7., 10., 10.,  8., 10.,  9., 10., 13.,  7., 12.,  9., 12., 11.,
       10., 10.,  7., 10., 11., 10.,  8., 11., 12., 10.,  9., 10., 13.,
       10.,  7.,  7., 10., 13., 12.,  8.,  8., 10., 10.,  9.,  8., 13.,
       10.,  7., 10.,  8., 12., 10.,  7., 12.])
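
Note that `np.max` of the differences picks out the largest increase and ignores decreases, because those are negative. If the size of the change matters regardless of direction, one option is to take absolute values first with `np.absolute`. The sketch below uses a made-up array (`toy` is a hypothetical name) rather than the patient data.

```python
import numpy as np

# Two "patients" over three "days"; the first has a large drop.
toy = np.array([[5., 1., 2.],
                [2., 2., 7.]])

changes = np.diff(toy, axis=1)      # day-to-day differences per row
print(changes)

# Largest increase per row: negative changes can never win.
largest_increase = np.max(changes, axis=1)
print(largest_increase)

# Largest change in either direction per row.
largest_change = np.max(np.absolute(changes), axis=1)
print(largest_change)
```

For the first row the two results differ: the largest increase is 1, while the largest change in magnitude is the drop of 4.
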

* Import a library into a program using `import libraryname`.
* Use the `numpy` library to work with arrays in Python.
* The expression `array.shape` gives the shape of an array.
* Use `array[x, y]` to select a single element from a 2D array.
* Array indices start at 0, not 1.
* Use `low:high` to specify a slice that includes the indices from low to high-1.
* Use `#` to add comments to programs.
* Use `numpy.mean(array)`, `numpy.max(array)`, and `numpy.min(array)` to calculate simple statistics.
* Use `numpy.mean(array, axis=0)` or `numpy.mean(array, axis=1)` to calculate statistics across the specified axis.