# Analyzing Patient Data

Objectives:
- Explain what a library is and what libraries are used for
- Import a Python library and use the functions it contains
- Read tabular data from a file into a program
- Select individual values and subsections from data
- Perform operations on arrays of data

## Loading data into Python

Libraries in Python:
- Libraries are like lab equipment - they provide additional functionality to basic Python
- Import only the libraries a program needs; unnecessary imports can complicate and slow it down
- NumPy (Numerical Python) is used for working with arrays and matrices



Importing NumPy:
- Use `import numpy` to access the library
- Dot notation: `numpy.function()` - the function belongs to the numpy library

In [3]:
import numpy

Loading Data:
- Use `numpy.loadtxt()` to read data from files
- Parameters: `fname` (filename) and `delimiter` (separator between values)
- Assign data to a variable to store it in memory: `data = numpy.loadtxt(...)`
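
To try the call without the lesson's CSV file, here is a minimal sketch that reads the same kind of comma-separated text from memory. The name `csv_text` and its values are made up for illustration; `numpy.loadtxt()` accepts a file-like object as well as a filename.

```python
import io
import numpy

# Hypothetical stand-in for a CSV file: 3 patients (rows) x 4 days (columns).
csv_text = "0,1,2,1\n0,2,3,2\n0,1,1,0\n"

# Wrap the text in a file-like object and parse it using the comma delimiter.
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=",")

print(sample.shape)  # (3, 4)
print(sample)
```

Values are read as floating-point numbers by default, which is why the real data below prints as `0.`, `1.`, and so on.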

In [4]:
data = numpy.loadtxt(fname="inflammation-01.csv", delimiter=',')

In [5]:
data

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])

In [6]:
# Type of data
type(data)

numpy.ndarray

In [18]:
# Dimensions of data (rows, columns)
print(data.shape)

(60, 40)


Understanding Arrays:
- `numpy.loadtxt()` returns a NumPy N-dimensional array (`ndarray`)
- Arrays contain elements of the same type
- Use `type()` to check the object type
- Use `.dtype` to check the data type inside the array
- Use `.shape` to see array dimensions (rows, columns)
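
The three checks above can be sketched on a small array. The array `small` is a hypothetical stand-in for the inflammation data, not part of the lesson files.

```python
import numpy

# Hypothetical 2 x 3 array standing in for the inflammation data.
small = numpy.array([[0.0, 1.0, 2.0],
                     [3.0, 4.0, 5.0]])

print(type(small))   # the kind of object (a numpy.ndarray)
print(small.dtype)   # the type of the elements: float64
print(small.shape)   # (rows, columns): (2, 3)
```

Note that `type()` is a built-in function, while `.dtype` and `.shape` are attributes of the array itself, so they take no parentheses.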

![](https://swcarpentry.github.io/python-novice-inflammation/fig/python-zero-index.svg)

Indexing:
- Python uses zero-based indexing (starts counting from 0)
- Access elements using `array[row, column]`
- Example: `data[0, 0]` is the first element
- Index `[0, 0]` is in the upper-left corner, as in matrix notation (not the lower left, as in Cartesian coordinates)
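
A minimal sketch of zero-based indexing, using a made-up 3 x 3 array (`grid`) whose values encode their own position so the result is easy to check:

```python
import numpy

# Hypothetical array: the tens digit marks the row, the units digit the column.
grid = numpy.array([[10, 11, 12],
                    [20, 21, 22],
                    [30, 31, 32]])

first = grid[0, 0]    # row 0, column 0: the upper-left corner
middle = grid[1, 1]   # second row, second column
last = grid[2, 2]     # third row, third column: the lower-right corner

print(first, middle, last)  # 10 21 32
```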

## Slicing data

**Slicing Arrays:**
- Select sections of arrays using slice notation: `start:end`
- The slice goes up to, but **not including**, the end index
- Example: `data[0:4, 0:10]` selects rows 0-3 and columns 0-9
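
The same slice notation on a smaller, made-up array (`table`, assumed here purely for illustration) shows that the end index is excluded:

```python
import numpy

# Hypothetical 6 x 10 array holding the values 0..59, row by row.
table = numpy.arange(60).reshape(6, 10)

# Rows 0-3 and columns 0-2; row 4 and column 3 are not included.
block = table[0:4, 0:3]

print(block.shape)  # (4, 3)
print(block)
```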



**Slice Boundaries:**
- A slice does not have to start at 0: `data[5:10, 0:10]`
- Omit the lower bound to start at 0, as in the row slice of `data[:3, 36:]`
- Omit the upper bound to run to the end of the axis: `data[5:, :]`
- Using `:` alone selects everything along that axis
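
A short sketch of omitted bounds, again on a hypothetical 6 x 10 array rather than the lesson's data:

```python
import numpy

# Hypothetical 6 x 10 array holding the values 0..59, row by row.
table = numpy.arange(60).reshape(6, 10)

top_right = table[:3, 8:]  # rows 0-2 (lower bound omitted), columns 8-9 (upper bound omitted)
bottom = table[4:, :]      # rows 4-5, every column
column = table[:, 0]       # every row, column 0 only

print(top_right.shape)  # (3, 2)
print(bottom.shape)     # (2, 10)
print(column)           # [ 0 10 20 30 40 50]
```

Indexing one axis with a single number, as in `table[:, 0]`, drops that axis, so `column` is one-dimensional.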



**Rules to Remember:**
- For bounds that lie within the array, the number of values in a slice is the upper bound minus the lower bound: `0:4` selects 4 values
- Slicing works on strings too: `element[0:3]` for first 3 characters
- Negative indices count from the end: `element[-1]` is the last character
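
These rules can be checked on a string. The value `"oxygen"` for `element` is an assumption made for this sketch.

```python
# Assumed example value; any string works the same way.
element = "oxygen"

first_three = element[0:3]   # characters 0, 1 and 2
last_char = element[-1]      # -1 counts back from the end
last_three = element[-3:]    # from the third-to-last character to the end
inner = element[1:4]         # upper - lower = 4 - 1 = 3 characters

print(first_three)  # oxy
print(last_char)    # n
print(last_three)   # gen
print(len(inner))   # 3
```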

## Analyzing data

**Statistical Functions:**
- NumPy provides functions for statistical analysis
- `numpy.mean(array)` - calculate average
- `numpy.amax(array)` - find maximum value
- `numpy.amin(array)` - find minimum value
- `numpy.std(array)` - calculate standard deviation
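
A minimal sketch of the four functions on a made-up array (`readings`); without an `axis` argument, each one summarises the whole array as a single number.

```python
import numpy

# Hypothetical readings: 2 patients (rows) x 3 days (columns).
readings = numpy.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

mean_value = numpy.mean(readings)  # average of all six values
max_value = numpy.amax(readings)   # largest value
min_value = numpy.amin(readings)   # smallest value
std_value = numpy.std(readings)    # standard deviation of all six values

print(mean_value)  # 3.5
print(max_value)   # 6.0
print(min_value)   # 1.0
print(std_value)
```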


**Operating Along Axes:**
- Functions can be applied along a specific axis of a multi-dimensional array
- `axis=0` - collapses the rows, giving one result per column
- `axis=1` - collapses the columns, giving one result per row
- Example: `numpy.mean(data, axis=0)` gives the average for each day across all patients
- Example: `numpy.amax(data, axis=1)` gives the maximum for each patient across all days
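
The effect of `axis` is easiest to see from the shape of the result. This sketch uses a hypothetical array (`scores`) laid out like the lesson's data, with patients as rows and days as columns.

```python
import numpy

# Hypothetical data: 2 patients (rows) x 3 days (columns).
scores = numpy.array([[1.0, 2.0, 3.0],
                      [3.0, 6.0, 9.0]])

# axis=0 collapses the rows: one average per day (column).
per_day = numpy.mean(scores, axis=0)

# axis=1 collapses the columns: one maximum per patient (row).
per_patient = numpy.amax(scores, axis=1)

print(per_day)            # [2. 4. 6.]
print(per_patient)        # [3. 9.]
print(per_day.shape)      # (3,)
print(per_patient.shape)  # (2,)
```

A quick sanity check is that the result along `axis=0` has as many entries as there are columns, and the result along `axis=1` has as many entries as there are rows.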

**Tips:**
- Use tab completion in Jupyter/IPython to explore available functions
- Add `?` after a function name to see documentation: `numpy.mean?`
- Multiple assignment sets several variables in one statement: `maxval, minval = numpy.amax(data), numpy.amin(data)`
- Functions also work on selected rows or columns: `numpy.amax(data[0, :])` is the maximum for the first patient
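
The last two tips can be sketched together. The array `scores` and the variable names are illustrative assumptions, not part of the lesson's data.

```python
import numpy

# Hypothetical data: 2 patients (rows) x 3 days (columns).
scores = numpy.array([[1.0, 5.0, 2.0],
                      [4.0, 0.0, 3.0]])

# Multiple assignment: the values on the right are matched to the names on the left, in order.
maxval, minval = numpy.amax(scores), numpy.amin(scores)

# Select a row or a column first, then apply the function to that selection.
first_patient_max = numpy.amax(scores[0, :])  # row 0, all columns
second_day_min = numpy.amin(scores[:, 1])     # all rows, column 1

print(maxval, minval)     # 5.0 0.0
print(first_patient_max)  # 5.0
print(second_day_min)     # 0.0
```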