# Analyzing Patient Data

Objectives:
- Explain what a library is and what libraries are used for
- Import a Python library and use the functions it contains
- Read tabular data from a file into a program
- Select individual values and subsections from data
- Perform operations on arrays of data

## Loading data into Python

Libraries in Python:
- Libraries are like lab equipment - they provide additional functionality to basic Python
- Import only the libraries a program needs; unnecessary imports add clutter and start-up time
- NumPy (Numerical Python) is used for working with arrays and matrices



Importing NumPy:
- Use `import numpy` to access the library
- Dot notation: `numpy.function()` - the function belongs to the numpy library

In [1]:
import numpy

Loading Data:
- Use `numpy.loadtxt()` to read data from files
- Parameters: `fname` (filename) and `delimiter` (separator between values)
- Assign data to a variable to store it in memory: `data = numpy.loadtxt(...)`

In [2]:
data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=',')
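To try `numpy.loadtxt()` without the inflammation file on hand, the sketch below reads the same comma-separated layout from an in-memory string. The three-row sample and the names `sample_csv` and `sample` are made up for illustration; they are not taken from `inflammation-01.csv`.

```python
import io
import numpy

# Hypothetical stand-in for a small CSV file: 3 rows of 4 comma-separated values.
sample_csv = io.StringIO("0,0,1,3\n0,1,2,1\n0,1,1,3\n")

# loadtxt accepts a filename or a file-like object;
# delimiter says which character separates the values on each line.
sample = numpy.loadtxt(fname=sample_csv, delimiter=",")

print(sample)
print(sample.shape)
```

Values are read as floating-point numbers by default, which is why the whole numbers in the file are displayed with a trailing dot.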

In [3]:
data

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]], shape=(60, 40))

In [4]:
# Type of data
type(data)

numpy.ndarray

In [5]:
# Shape of data (rows, columns)
print(data.shape)

(60, 40)


Understanding Arrays:
- NumPy creates N-dimensional arrays (ndarray)
- Arrays contain elements of the same type
- Use `type()` to check the object type
- Use `.dtype` to check the data type inside the array
- Use `.shape` to see array dimensions (rows, columns)
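
The difference between `type()`, `.dtype`, and `.shape` is easiest to see side by side. A minimal sketch on a small hand-written array (the name `small` and its values are illustrative only):

```python
import numpy

# Hypothetical array: 2 rows (patients) by 3 columns (days).
small = numpy.array([[0.0, 1.0, 2.0],
                     [3.0, 4.0, 5.0]])

print(type(small))   # the kind of object that holds the data (a NumPy ndarray)
print(small.dtype)   # the type of each element stored inside the array
print(small.shape)   # (number of rows, number of columns)
```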

![](https://swcarpentry.github.io/python-novice-inflammation/fig/python-zero-index.svg)

Indexing:
- Python uses zero-based indexing (starts counting from 0)
- Access elements using `array[row, column]`
- Example: `data[0, 0]` is the first element
- Index `[0, 0]` is the upper-left corner, as in matrix notation (not the lower-left, as in Cartesian coordinates)
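
To make the row-then-column order concrete, here is a small sketch on a made-up array in which every value encodes its own position as `row * 10 + column` (the name `grid` is hypothetical):

```python
import numpy

# Hypothetical 3x4 array; each value is row*10 + column.
grid = numpy.array([[0, 1, 2, 3],
                    [10, 11, 12, 13],
                    [20, 21, 22, 23]])

top_left = grid[0, 0]       # first row, first column
second_row = grid[1, 3]     # second row, fourth column
bottom_right = grid[2, 3]   # last row, last column

print(top_left, second_row, bottom_right)
```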

In [8]:
# Access the first patient's value on the first day
print(data[0,0])

0.0


In [10]:
# Access the "middle" of the data
print("Middle value is: ", data[29,19])

Middle value is:  16.0


## Slicing data

**Slicing Arrays:**
- Select sections of arrays using slice notation: `start:end`
- The slice goes up to, but **not including**, the end index
- Example: `data[0:4, 0:10]` selects rows 0-3 and columns 0-9



In [11]:
# Access data for the first 4 patients (rows 0-3) over the first 10 days (columns 0-9)
print(data[0:4, 0:10])

[[0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]


In [15]:
data.shape

(60, 40)

In [None]:
# Access the last 5 patients' data for the last 10 days
print(data[55:, 30:])

[[ 3.  5.  3.  5.  4.  5.  3.  3.  0.  1.]
 [ 7.  7.  5.  6.  3.  4.  2.  2.  1.  1.]
 [ 8.  6.  6.  4.  3.  5.  2.  1.  1.  1.]
 [10.  8.  8.  6.  5.  5.  2.  0.  2.  0.]
 [ 8.  5.  3.  5.  4.  1.  3.  1.  1.  0.]]


In [22]:
# Access the last 5 patients' data for the last 10 days, computing the bounds from the shape
print(data[data.shape[0]-5:, data.shape[1]-10:])

[[ 3.  5.  3.  5.  4.  5.  3.  3.  0.  1.]
 [ 7.  7.  5.  6.  3.  4.  2.  2.  1.  1.]
 [ 8.  6.  6.  4.  3.  5.  2.  1.  1.  1.]
 [10.  8.  8.  6.  5.  5.  2.  0.  2.  0.]
 [ 8.  5.  3.  5.  4.  1.  3.  1.  1.  0.]]


**Slice Boundaries:**
- Don't have to start at 0: `data[5:10, 0:10]`
- Can omit lower bound (defaults to 0): `data[:3, 36:]`
- Can omit upper bound (goes to the end): `data[5:, :]`
- Using `:` alone selects everything along that axis
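
The boundary rules above can be checked on a small array where the result is easy to verify by eye. This is only a sketch; `grid` is a made-up 4x5 array, not the inflammation data:

```python
import numpy

# Hypothetical 4x5 array holding the values 0..19, filled row by row.
grid = numpy.arange(20).reshape(4, 5)

middle = grid[1:3, 1:4]      # rows 1-2, columns 1-3 (end index excluded)
first_rows = grid[:2, 3:]    # lower bound omitted: rows 0-1; upper bound omitted: columns 3 to the end
lower_half = grid[2:, :]     # rows 2 to the end, every column

print(middle)
print(first_rows.shape)      # (2, 2)
print(lower_half.shape)      # (2, 5)
```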



**Rules to Remember:**
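- The number of values in a slice equals the upper bound minus the lower bound: `data[0:4]` has 4 - 0 = 4 rows
- Slicing works on strings too: if `element = 'oxygen'`, then `element[0:3]` is `'oxy'`, the first 3 characters
- Slices can use negative indices to count from the end: `data[-5:, :]` selects the last 5 rows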
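
A short sketch of these three rules, using a sample string and a made-up one-dimensional array (`element`, `row`, and `chunk` are illustrative names):

```python
import numpy

element = "oxygen"        # sample string chosen for illustration
first_three = element[0:3]
last_two = element[-2:]   # negative index counts from the end
print(first_three)        # oxy
print(last_two)           # en

row = numpy.arange(10)    # hypothetical 1-D array: 0, 1, ..., 9
chunk = row[2:7]
print(len(chunk))         # 5, i.e. 7 - 2
print(row[-3:])           # [7 8 9]
```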

## Analyzing data

**Statistical Functions:**
- NumPy provides functions for statistical analysis
- `numpy.mean(array)` - calculate average
- `numpy.amax(array)` - find maximum value (equivalent to `numpy.max`)
- `numpy.amin(array)` - find minimum value (equivalent to `numpy.min`)
- `numpy.std(array)` - calculate standard deviation (population standard deviation by default)
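
As a quick check of what each function returns, the sketch below applies them to a tiny made-up array whose statistics can be worked out by hand (`readings` is not taken from the inflammation file):

```python
import numpy

# Hypothetical readings: the values 1 through 6 in a 2x3 array.
readings = numpy.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

average = numpy.mean(readings)   # (1+2+3+4+5+6) / 6
highest = numpy.amax(readings)
lowest = numpy.amin(readings)
spread = numpy.std(readings)     # population standard deviation (ddof=0 by default)

print(average)   # 3.5
print(highest)   # 6.0
print(lowest)    # 1.0
print(spread)
```

Without an `axis` argument, each function treats the whole array as one collection of values and returns a single number.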


**Operating Along Axes:**
- Can apply functions across specific axes in multi-dimensional arrays
- `axis=0` - operates down the rows, giving one result per column (column-wise)
- `axis=1` - operates across the columns, giving one result per row (row-wise)
- Example: `numpy.mean(data, axis=0)` gives average for each day across all patients
- Example: `numpy.amax(data, axis=1)` gives maximum for each patient across all days
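
The shape of the result is a handy way to confirm which axis was collapsed. A minimal sketch, assuming a made-up array with 2 patients (rows) and 3 days (columns):

```python
import numpy

# Hypothetical data: 2 patients (rows) by 3 days (columns).
readings = numpy.array([[1.0, 2.0, 3.0],
                        [4.0, 8.0, 6.0]])

daily_mean = numpy.mean(readings, axis=0)    # collapses the rows: one average per day
patient_max = numpy.amax(readings, axis=1)   # collapses the columns: one maximum per patient

print(daily_mean, daily_mean.shape)     # values 2.5, 5.0, 4.5 with shape (3,)
print(patient_max, patient_max.shape)   # values 3.0, 8.0 with shape (2,)
```

Applied to the 60x40 inflammation array, the same calls would return 40 daily averages and 60 per-patient maxima respectively.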

**Tips:**
- Use tab completion in Jupyter/IPython to explore available functions
- Add `?` after a function name to see documentation: `numpy.mean?`
- Multiple assignment lets you assign several values at once
- Array operations can work on selected rows/columns: `numpy.amax(data[0, :])`
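
The last two tips can be combined in a few lines. This is a sketch on made-up values; `readings` and the variable names on the left-hand sides are illustrative:

```python
import numpy

# Hypothetical data: 2 patients (rows) by 3 days (columns).
readings = numpy.array([[1.0, 2.0, 3.0],
                        [4.0, 8.0, 6.0]])

# Multiple assignment: unpack the shape tuple, then assign two statistics at once.
n_patients, n_days = readings.shape
low, high = numpy.amin(readings), numpy.amax(readings)

# Statistics on a selection: the maximum for the first patient only.
first_patient_max = numpy.amax(readings[0, :])

print(n_patients, n_days, low, high, first_patient_max)
```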