# HEP Software Training: Learn Programming with Python
## Chapter 2: Analyzing Patient Data

Created by: [Hisyam Athaya](https://athayahisyam.github.io/)  
Learning portfolio based on [SWCarpentry Programming with Python: Python Fundamentals](https://swcarpentry.github.io/python-novice-inflammation/)  
Visit [HEP Software Training](https://hepsoftwarefoundation.org/training/curriculum.html) for more information.

# Importing a Useful Library

In [2]:
import numpy
numpy.__version__

'1.20.3'

Loading data from a CSV file. In this case the returned array is only displayed; it is not assigned to a variable, so it cannot be reused in later cells.

In [5]:
numpy.loadtxt(fname='python-novice-inflammation-data/data/inflammation-01.csv', delimiter=',')

array([[0., 0., 1., ..., 3., 0., 0.],
       [0., 1., 2., ..., 1., 0., 1.],
       [0., 1., 1., ..., 2., 1., 1.],
       ...,
       [0., 1., 1., ..., 1., 1., 1.],
       [0., 0., 0., ..., 0., 2., 0.],
       [0., 0., 1., ..., 1., 1., 0.]])
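
The same call can be tried without the data file by reading a few lines of in-memory text. The sketch below is an illustration only: the CSV values are made up, and `io.StringIO` stands in for the file on disk (`numpy.loadtxt` accepts a file-like object as well as a file name).

```python
import io
import numpy

# Made-up CSV text standing in for a data file (not the real inflammation data)
csv_text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# delimiter=',' tells loadtxt that the values on each line are separated by commas
sample = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')

print(sample)
```

Each line of text becomes one row of the array, and each comma-separated value becomes one column.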

# Numpy Array Operations

Loading data from the CSV file and assigning it to the variable `data`, so that it can be reused in later cells.

In [6]:
data = numpy.loadtxt(fname='python-novice-inflammation-data/data/inflammation-01.csv', delimiter=',')

In [7]:
print(data)

[[0. 0. 1. ... 3. 0. 0.]
 [0. 1. 2. ... 1. 0. 1.]
 [0. 1. 1. ... 2. 1. 1.]
 ...
 [0. 1. 1. ... 1. 1. 1.]
 [0. 0. 0. ... 0. 2. 0.]
 [0. 0. 1. ... 1. 1. 0.]]


In [8]:
print(type(data))

<class 'numpy.ndarray'>


In [9]:
# dtype shows the type of the elements stored in the ndarray

print(data.dtype)

float64


In [11]:
# shape describes the dimensions of the array as (rows, columns)

print(data.shape)

(60, 40)


The data contains `60 rows` and `40 columns`.
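
Because `shape` is an ordinary tuple, it can be unpacked into one variable per axis. A minimal sketch, assuming an array of zeros with the same shape as a stand-in for `data`; the names `stand_in`, `n_rows`, and `n_columns` are chosen here for illustration only.

```python
import numpy

# Stand-in array with the same shape as the lesson's data (all values are zero)
stand_in = numpy.zeros((60, 40))

# shape is a tuple, so it can be unpacked into one variable per axis
n_rows, n_columns = stand_in.shape

print('rows:', n_rows)
print('columns:', n_columns)
print('total number of values:', stand_in.size)
```

The `size` attribute is the product of the entries in `shape`, here 60 × 40.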

In [13]:
# to access a single value in the array, provide its index in square brackets

print('first value in the data:', data[0,0])

first value in the data: 0.0


In [14]:
print('middle value in the data:', data[30, 20])

middle value in the data: 13.0


Remember: indices into a two-dimensional NumPy array are given as `[row, column]`, and counting starts at `0`.
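
The index order is easier to see on a small array whose values encode their position. The array `grid` below is made up for illustration and is not part of the lesson data.

```python
import numpy

# Small made-up array: 2 rows, 3 columns
grid = numpy.array([[10, 11, 12],
                    [20, 21, 22]])

print(grid[0, 0])    # first row, first column
print(grid[1, 2])    # second row, third column
print(grid[-1, -1])  # negative indices count back from the end of each axis
```

The last two lines print the same value, because row `-1` is the last row and column `-1` is the last column.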

# Slicing Data (... : ...)

Select the first `ten columns of values` for the first `four rows of values`.  
In this dataset, columns represent days of observation and rows represent patients, so this selects the first ten days of values for the first four patients.

In [15]:
# remember! [row, column]

print(data[0:4, 0:10])

[[0. 0. 1. 3. 1. 2. 4. 7. 8. 3.]
 [0. 1. 2. 1. 2. 1. 3. 2. 2. 6.]
 [0. 1. 1. 3. 3. 2. 6. 2. 5. 9.]
 [0. 0. 2. 0. 4. 2. 2. 1. 6. 7.]]


The slice `[0:4]` means `start at index 0 and go up-to-but-not-including 4`. Likewise, the slice `[0:10]` means `start at index 0 and go up-to-but-not-including 10`. A slice does not have to start at 0; it can begin at any index we need.

In [16]:
print(data[5:10, 0:10])

[[0. 0. 1. 2. 2. 4. 2. 1. 6. 4.]
 [0. 0. 2. 2. 4. 2. 2. 5. 5. 8.]
 [0. 0. 1. 2. 3. 1. 2. 3. 5. 3.]
 [0. 0. 0. 3. 1. 5. 6. 5. 5. 8.]
 [0. 1. 1. 2. 1. 3. 5. 3. 5. 8.]]
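
One consequence of the up-to-but-not-including rule is that a slice `[start:stop]` contains `stop - start` elements. A minimal sketch on a made-up one-dimensional array (the names `values` and `part` are illustrative):

```python
import numpy

# Made-up one-dimensional array holding the values 0 through 9
values = numpy.arange(10)

part = values[2:5]   # indices 2, 3 and 4; index 5 is excluded

print(part)
print(len(part))     # number of elements is stop - start = 5 - 2
```

This is why `data[5:10, 0:10]` above returns five rows and ten columns.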


A slice does not have to include both the `lower bound` and the `upper bound` (`[lower bound : upper bound]`). If we omit the `lower bound`, Python uses `0` by default; if we omit the `upper bound`, the slice runs to the end of the axis (row or column). If we omit both, i.e. `[:]`, the slice includes everything along that axis.

In [22]:
# remember! [row, column]
# remember! [lower bound : upper bound]

small = data[:3, 36:]
# rows from index 0 up to but not including 3; columns from index 36 to the end of the axis

print('small is: ')
print(small)

small is: 
[[2. 3. 0. 0.]
 [1. 1. 0. 1.]
 [2. 2. 1. 1.]]
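
The three ways of omitting bounds can be compared side by side on a small array. The sketch below uses a made-up 3 × 4 array named `table`; it is an illustration and not part of the lesson data.

```python
import numpy

# Made-up array with 3 rows and 4 columns, holding the values 0 through 11
table = numpy.arange(12).reshape(3, 4)

first_two_rows = table[:2, :]   # no lower bound on rows: start at row 0
last_two_cols = table[:, 2:]    # no upper bound on columns: run to the last column
everything = table[:, :]        # no bounds at all: every row and every column

print(first_two_rows)
print(last_two_cols)
print(everything.shape)
```

A bare `:` on one axis keeps that whole axis, which is a common way to select entire rows or entire columns.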
