# Learning Python Data Analysis

## Python Analyzing Data

Setup: https://swcarpentry.github.io/python-novice-inflammation/instructor/index.html#setup

Instruction: https://swcarpentry.github.io/python-novice-inflammation/instructor/02-numpy.html

Objectives:
* Explain what a library is and what libraries are used for.
* Import a Python library and use the functions it contains.
* Read tabular data from a file into a program.
* Select individual values and subsections from data.
* Perform operations on arrays of data.


In [None]:
# Begin by loading in a library called NumPy. 
# This library allows you to do fancy things with lots of numbers, especially if you have matrices or arrays.
# To start using NumPy, we need to import it:
import numpy

Importing a library is like getting a piece of lab equipment out of a storage locker and setting it up on the bench. 

Libraries provide additional functionality to the basic Python package, much like a new piece of equipment adds functionality to a lab space. 

Just like in the lab, importing too many libraries can sometimes complicate and slow down your programs - so we only import what we need for each program.

In [None]:
# Once we’ve imported the library, we can ask the library to read in our data file for us:
numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

Note: 'loadtxt' is a function that belongs to the NumPy library, so we access it using dot notation (e.g. numpy.loadtxt). A 'method' is similar, but it belongs to an object rather than to a library.

numpy.loadtxt is passed two parameters:
* the name of the file we want to read
* the delimiter that separates values on a line

These both need to be character strings (or 'strings' for short), so we put them in quotes.

We are using keyword ('named') arguments to pass the values (e.g. fname=..., delimiter=...), which work regardless of the order they are given in.
The argument names do not need quotes because they are parameter names defined by the function, not strings.
There are many more parameters you can pass, see: https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html
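
As a minimal sketch of keyword arguments, the example below reads the same comma-separated text twice with the arguments given in a different order. It uses an in-memory `io.StringIO` object holding made-up values in place of the lesson's data file; that stand-in is an assumption for illustration only.

```python
import io
import numpy

# Made-up values standing in for a small CSV file (not the lesson's data)
csv_text = "0,1,2\n3,4,5\n"

# The same keyword arguments, given in two different orders
first = numpy.loadtxt(fname=io.StringIO(csv_text), delimiter=',')
second = numpy.loadtxt(delimiter=',', fname=io.StringIO(csv_text))

# Both calls should produce identical arrays
print(numpy.array_equal(first, second))
```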

In [None]:
# To do anything with the data we need to assign it to a variable
data = numpy.loadtxt(fname='data/inflammation-01.csv', delimiter=',')

In [None]:
# Start by looking at the data type
print(type(data))

In [None]:
# To see the type of data in the array use the '.dtype' property
print(data.dtype)

In [None]:
# To see the shape of the data use the '.shape' property
print(data.shape)

# Note: the output is in (rows, columns) format

In [None]:
# To access a single number from the array use square brackets after the variable name
print('first value in data:', data[0][0])

# You can also use separate indexes with a comma
print('first value in data:', data[0, 0])

print('middle value in data:', data[29, 19])

# Note: indexes start at 0, so [0, 0] is the value in the upper left
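
To make the shape and indexing rules concrete, here is a small sketch on a made-up 3x4 array (the name `demo` and its values are hypothetical, not the inflammation data).

```python
import numpy

# A made-up array with 3 rows and 4 columns
demo = numpy.array([[1, 2, 3, 4],
                    [5, 6, 7, 8],
                    [9, 10, 11, 12]])

print(demo.shape)   # (rows, columns)
print(demo[0, 0])   # upper-left value
print(demo[2, 3])   # last row, last column

# Both index styles select the same value
print(demo[1][2] == demo[1, 2])
```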

## Slicing data

In [None]:
# Select the first four patients (rows) and first ten days (columns) of values like this:
print(data[0:4, 0:10])

# slice 0:4 means, “Start at index 0 and go up to, but not including, index 4”.

In [None]:
# You don’t have to include both bounds in a slice:
# omit the lower bound to start at the beginning, or the upper bound to go to the end
small = data[:3, 36:]
print('small is:')
print(small)
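
The same omitted-bound slicing can be tried on a small made-up array, where the result is easy to check by eye. The array `demo` below is hypothetical and only serves as an illustration.

```python
import numpy

# Made-up 4x5 array holding the values 0..19, filled row by row
demo = numpy.arange(20).reshape(4, 5)
print(demo)

# No lower bound on the rows: start at row 0 and stop before row 2
# No upper bound on the columns: start at column 3 and go to the end
top_right = demo[:2, 3:]
print(top_right)
print(top_right.shape)
```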

## Analyzing data

In [None]:
# We can ask NumPy to compute values using its many functions

# E.g. compute data’s mean value
print(numpy.mean(data))

In [None]:
# Three other NumPy functions to get some descriptive values about the dataset
maxval, minval, stdval = numpy.amax(data), numpy.amin(data), numpy.std(data)

print('maximum inflammation:', maxval)
print('minimum inflammation:', minval)
print('standard deviation:', stdval)

In [None]:
# A new temporary array to find a patient's maximum inflammation
patient_0 = data[0, :] # 0 on the first axis (rows), everything on the second (columns)
print('maximum inflammation for patient 0:', numpy.amax(patient_0))

In [None]:
#  A patient's maximum inflammation without storing a new variable
print('maximum inflammation for patient 2:', numpy.amax(data[2, :]))

In [None]:
# The maximum inflammation for each patient
print(numpy.amax(data, axis=1))

# Note: We need to specify the axis. Axis 0 runs down the rows and axis 1 runs across the columns,
# so axis=1 gives one maximum per row (patient)

In [None]:
# We can also check the shape of the result
print(numpy.amax(data, axis=1).shape)
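
A small sketch may help show how the axis argument changes the result. The array below is made up: 2 rows standing in for patients and 3 columns standing in for days.

```python
import numpy

# Made-up array: 2 rows ("patients") and 3 columns ("days")
demo = numpy.array([[1, 5, 2],
                    [7, 0, 3]])

per_row = numpy.amax(demo, axis=1)     # one maximum per row
per_column = numpy.amax(demo, axis=0)  # one maximum per column

print(per_row, per_row.shape)
print(per_column, per_column.shape)
```

The result along axis=1 has one entry per row, and the result along axis=0 has one entry per column.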

In [None]:
# Exercise Slicing Strings 
element = 'oxygen'
print('first three characters:', element[0:3])
print('last three characters:', element[3:6])

# Without executing, what's the value of:
# element[:4]? 
# element[4:]? 
# and element[:]?

# What about:
# element[-1]? 
# and element[-2]?

# what about:
# element[1:-1]?

# How would you get the last 3 characters of the string?

In [None]:
# numpy.diff() takes an array and returns the differences between each pair of successive values
patient3_week1 = data[3, :7]
print(patient3_week1)
numpy.diff(patient3_week1)
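
As a quick check of what numpy.diff returns, the sketch below uses seven made-up readings (not taken from the data file). Each output value is the later reading minus the earlier one, so the result has one fewer value than the input.

```python
import numpy

# Made-up readings for one week (7 values)
week = numpy.array([0, 2, 5, 4, 4, 1, 0])

changes = numpy.diff(week)
print(changes)

# The result is one element shorter than the input
print(len(week), len(changes))
```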

In [None]:
# Exercise - How would you find the largest change in inflammation for each patient?
# Does it matter if the change in inflammation is an increase or a decrease?


# Bonus - Use the numpy.absolute function to ignore the sign of each change
