# SWD1a - Introduction to Python

Material developed on December 6th, 2023.

## Topic 2: Analysing Patient Data

### Download and Unzip the Data

- Data is available at: https://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip
- We are going to download and unzip the data files using the command line.
- The data will be stored in a folder called `/content/swc-python` (this is a Colab folder location).
  - If you use the commands below, the folder will be created automatically.
  - If not, please create the folder and move the data into it.
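
If `wget` and `unzip` are not available (for example, outside Colab), the same two steps can be sketched with the Python standard library alone. The helper name `download_and_extract` is hypothetical and not part of the lesson; it only assumes that the URL ends with the archive's file name.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir):
    """Download a zip archive from `url` into `dest_dir` and extract it there.

    Hypothetical helper for illustration; returns the path of the saved archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)   # create the folder if it is missing
    zip_path = dest / url.rsplit('/', 1)[-1]  # keep the archive's own file name
    urllib.request.urlretrieve(url, str(zip_path))
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)              # files land in dest/data/
    return zip_path


# Example call (commented out so that nothing is downloaded by accident):
# download_and_extract(
#     'https://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip',
#     '/content/swc-python')
```

On a local machine, replace `/content/swc-python` with any folder you can write to, and adjust the file paths used later accordingly.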


In [13]:
# Download the data file and store in the swc-python folder
!wget -P swc-python https://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip

--2023-12-15 12:03:36--  https://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip
Resolving swcarpentry.github.io (swcarpentry.github.io)... 185.199.110.153, 185.199.108.153, 185.199.109.153, ...
Connecting to swcarpentry.github.io (swcarpentry.github.io)|185.199.110.153|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22554 (22K) [application/zip]
Saving to: ‘swc-python/python-novice-inflammation-data.zip’


2023-12-15 12:03:36 (10.4 MB/s) - ‘swc-python/python-novice-inflammation-data.zip’ saved [22554/22554]



In [17]:
# Extract .zip files inside the folder swc-python/
!unzip /content/swc-python/python-novice-inflammation-data.zip -d /content/swc-python/

Archive:  content/swc-python/python-novice-inflammation-data.zip
   creating: content/swc-python/data/
  inflating: content/swc-python/data/inflammation-01.csv  
  inflating: content/swc-python/data/inflammation-02.csv  
  inflating: content/swc-python/data/inflammation-03.csv  
  inflating: content/swc-python/data/inflammation-04.csv  
  inflating: content/swc-python/data/inflammation-05.csv  
  inflating: content/swc-python/data/inflammation-06.csv  
  inflating: content/swc-python/data/inflammation-07.csv  
  inflating: content/swc-python/data/inflammation-08.csv  
  inflating: content/swc-python/data/inflammation-09.csv  
  inflating: content/swc-python/data/inflammation-10.csv  
  inflating: content/swc-python/data/inflammation-11.csv  
  inflating: content/swc-python/data/inflammation-12.csv  
 extracting: content/swc-python/data/small-01.csv  
 extracting: content/swc-python/data/small-02.csv  
 extracting: content/swc-python/data/small-03.csv  


### Loading data into Python

To begin processing the clinical trial inflammation data, we need to load it into Python. We can do that using a library called NumPy, which stands for Numerical Python.
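
To see what `numpy.loadtxt` does without needing the downloaded files, here is a minimal sketch that reads comma-separated text from memory. The three-row `csv_text` sample is made up for illustration and is not part of the inflammation dataset.

```python
import io
import numpy

# Made-up sample: 3 rows of 4 comma-separated values
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# loadtxt accepts a file-like object as well as a file name
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')

print(sample.shape)  # (3, 4): three rows, four columns
print(sample.dtype)  # float64: values are read as floating-point numbers by default
```

The `delimiter=','` argument tells NumPy that a comma separates the values on each line.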

In [18]:
# Load the numpy library
import numpy

In [20]:
# Ask the library to read our data file for us
numpy.loadtxt(fname='/content/swc-python/data/inflammation-01.csv', delimiter=',')

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])

In [21]:
# Our call to numpy.loadtxt read our file but didn’t save the data in memory.
# To do that, we need to assign the array to a variable. 
data = numpy.loadtxt(fname='/content/swc-python/data/inflammation-01.csv', delimiter=',')

In [22]:
print(data)

[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]


In [23]:
print(type(data))

<class 'numpy.ndarray'>


In [24]:
# Get an overview of the data

print(data.dtype) # see the data type
print(data.shape) # see how many rows/columns your data has
print('first value in data: ', data[0,0]) # Check the first number in your dataset
print('middle value in data: ', data[29,19]) # Check a number in the middle of your dataset

float64
(60, 40)
first value in data:  0.0
middle value in data:  16.0


### Slicing data

In [25]:
print(data[0:4, 0:10])

[[0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]


In [26]:
# Row indices run from 0 to 59, so asking for row 60 raises an error
print(data[60,0])

IndexError: index 60 is out of bounds for axis 0 with size 60
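
The error appears because Python counts from 0: an axis of size 60 has valid indices 0 to 59. A minimal sketch on a small, made-up array (the name `grid` is used only for illustration) shows two ways to reach the last row safely:

```python
import numpy

# Made-up 3x4 array holding the numbers 0..11
grid = numpy.arange(12).reshape(3, 4)
n_rows = grid.shape[0]

print(grid[n_rows - 1, 0])  # last row via its explicit index -> 8
print(grid[-1, 0])          # negative indices count from the end -> 8

# Using the size itself as an index is one step too far
try:
    grid[n_rows, 0]
except IndexError as err:
    print('IndexError:', err)
```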

In [27]:
print(data[5:10, 0:10])

[[0. 0. 1. 2. 2. 4. 2. 1. 6. 4.]
 [0. 0. 2. 2. 4. 2. 2. 5. 5. 8.]
 [0. 0. 1. 2. 3. 1. 2. 3. 5. 3.]
 [0. 0. 0. 3. 1. 5. 6. 5. 5. 8.]
 [0. 1. 1. 2. 1. 3. 5. 3. 5. 8.]]


In [28]:
small = data[:3, 36:]
print('small is:', small)

small is: [[2. 3. 0. 0.]
 [1. 1. 0. 1.]
 [2. 2. 1. 1.]]
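
In `data[:3, 36:]` the slice bounds are partly omitted: a missing start means "from the beginning" and a missing end means "to the end". The same rule can be checked on a small, made-up array (`grid` is a hypothetical name):

```python
import numpy

# Made-up 4x5 array holding the numbers 0..19
grid = numpy.arange(20).reshape(4, 5)

corner = grid[:2, 3:]    # rows 0-1, columns 3 to the end
print(corner)            # [[3 4]
                         #  [8 9]]

middle_rows = grid[1:3, :]  # rows 1-2, every column
print(middle_rows.shape)    # (2, 5)
```

Note that the end of a slice is exclusive: `1:3` selects rows 1 and 2, not row 3.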


### Analysing data

In [29]:
# Compute data’s mean value using NumPy
print(numpy.mean(data))

6.14875


In [30]:
# other NumPy functions to get some descriptive values about the dataset 
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)

print('Max: ', maxval)
print('Min: ', minval)
print('Std Dev: ', stdval)

Max:  20.0
Min:  0.0
Std Dev:  4.613833197118566


In [31]:
# Looking at one patient
patient_0 = data[0, :]
print('Max Inflammation for patient 0: ', numpy.amax(patient_0))

Max Inflammation for patient 0:  18.0


In [32]:
# Looking at another patient
print('Max Inflammation for patient 2: ', numpy.amax(data[2, :]))

Max Inflammation for patient 2:  19.0


In [33]:
# average inflammation per day for all patients
print(numpy.mean(data,axis=0))

[ 0.          0.45        1.11666667  1.75        2.43333333  3.15
  3.8         3.88333333  5.23333333  5.51666667  5.95        5.9
  8.35        7.73333333  8.36666667  9.5         9.58333333 10.63333333
 11.56666667 12.35       13.25       11.96666667 11.03333333 10.16666667
 10.          8.66666667  9.15        7.25        7.33333333  6.58333333
  6.06666667  5.95        5.11666667  3.6         3.3         3.56666667
  2.48333333  1.5         1.13333333  0.56666667]


In [34]:
# Check shape to help understand the axis operation - 40 is the number of days
print(numpy.mean(data,axis=0).shape)

(40,)


In [35]:
# max inflammation per patient across all days
print(numpy.max(data, axis=1))

[18. 18. 19. 17. 17. 18. 17. 20. 17. 18. 18. 18. 17. 16. 17. 18. 19. 19.
 17. 19. 19. 16. 17. 15. 17. 17. 18. 17. 20. 17. 16. 19. 15. 15. 19. 17.
 16. 17. 19. 16. 18. 19. 16. 19. 18. 16. 19. 15. 16. 18. 14. 20. 17. 15.
 17. 16. 17. 19. 18. 18.]


In [39]:
# Check shape to help understand the axis operation - 60 is the number of patients
print(numpy.max(data, axis=1).shape)

(60,)
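
The `axis` argument is easier to follow on a tiny example. The sketch below uses a made-up array with 2 patients (rows) and 3 days (columns); `toy` and its values are purely illustrative.

```python
import numpy

# Made-up data: 2 patients (rows) x 3 days (columns)
toy = numpy.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 8.0]])

# axis=0 collapses the rows: one value per column (per day)
per_day = numpy.mean(toy, axis=0)
print(per_day)        # [2.  3.  5.5]

# axis=1 collapses the columns: one value per row (per patient)
per_patient = numpy.max(toy, axis=1)
print(per_patient)    # [3. 8.]

print(per_day.shape, per_patient.shape)  # (3,) (2,)
```

The axis you name is the one that disappears from the result's shape, which is why `axis=0` on the (60, 40) inflammation data gives 40 values and `axis=1` gives 60.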


### Slicing strings

In [None]:
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])
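
Strings follow the same slicing rules as arrays, so omitted bounds and negative indices work too. A short sketch reusing the `element` string from the cell above:

```python
element = 'oxygen'

print(element[:3])     # omitted start: first three characters -> oxy
print(element[-3:])    # negative start: last three characters -> gen
print(element[1:-1])   # drop the first and last character -> xyge

# Unlike single-item indexing, a slice that runs past the end does not raise an error
print(element[3:100])  # gen
```

Writing `element[-3:]` avoids having to know the length of the string in advance.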