![alt text](./pageheader_rose2_babies.jpg)

# Data Science in Medicine using Python

### Author: Dr Gusztav Belteki

## 1. Precedence of arithmetical operators

In [None]:
# This is a comment. Even when it is in a code cell, it will not be executed

# Multiplication and division have equal priority, so they are evaluated from left to right; here the grouping does not change the result.
# Note the decimal point. Division always returns a floating point number, even if the fractional part is zero.

10 * 10 / 5, 10 * (10 / 5)

In [None]:
type(20), type(20.0)

In [None]:
# Division takes priority over addition - as in maths

10 + 10 / 10, (10 + 10) / 10

In [None]:
# This is integer (floor) division: the result is rounded down to a whole number

15 // 2

In [None]:
# Modulo operator, returning the remainder after the division

15 % 4
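Floor division and modulo belong together: the quotient and the remainder always recombine to the original number. A minimal sketch, using a made-up duration in minutes (the value and the variable names are illustrative only):

```python
total_minutes = 135  # made-up value for illustration

hours = total_minutes // 60      # whole hours
minutes = total_minutes % 60     # remaining minutes
print(hours, minutes)            # 2 15

# divmod() returns both results at once
print(divmod(total_minutes, 60))  # (2, 15)

# quotient * divisor + remainder gives back the original number
print(hours * 60 + minutes == total_minutes)  # True
```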

- `int` returns the integer part: it truncates toward zero, so positive numbers are always rounded down
- for `round` one can provide the number of digits as an argument



In [None]:
int(16.864295), round(16.864295), round(16.864295, 3)
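The difference between truncating and rounding down only shows up with negative numbers, and `round` has its own rule for exact halves. A short sketch of both (the numbers are arbitrary examples):

```python
import math

# int() cuts off the fractional part (truncates toward zero)
print(int(16.86), int(-16.86))                # 16 -16

# math.floor() always rounds down, which differs for negative numbers
print(math.floor(16.86), math.floor(-16.86))  # 16 -17

# round() goes to the nearest value; exact halves go to the nearest even integer
print(round(0.5), round(1.5), round(2.5))     # 0 2 2
```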

## 2. What is `True` and what is `False`?

#### Rules

1. True is `True` and False is `False`

In [None]:
True

In [None]:
False

2. Everything is implicitly either `True` or `False`

In [None]:
bool(42), bool(-42), bool(0)

In [None]:
# Empty text evaluates as False, any other text (even a single space) as True
# You can use single or double quotation marks, but the opening and closing marks must match

bool('Hello'), bool("Hello"), bool(' '), bool('')
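The same rule extends beyond numbers and text: empty containers and `None` evaluate as `False`, and an `if` statement applies the rule without an explicit call to `bool`. A minimal sketch (the list `patient_ids` is a hypothetical name used only for illustration):

```python
# Values that evaluate as False
print(bool(0), bool(0.0), bool(''), bool([]), bool(None))   # False False False False False

# Values that evaluate as True
print(bool(0.1), bool('0'), bool([0]))                      # True True True

# An if statement applies the same rule without calling bool() explicitly
patient_ids = []   # hypothetical, empty list
if patient_ids:
    message = 'There are patients'
else:
    message = 'The list is empty'
print(message)     # The list is empty
```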

3. `A and B` is True if both A and B are True 

In [None]:
True and True,  True and False, False and True, False and False

Please note that the second argument (the one after `and`) is returned if the first argument is True, otherwise the first argument is returned

This can be generalised to everything

In [None]:
# First is True and therefore the second is returned

42 and -42

In [None]:
# First is False and therefore it is returned

'' and 'Hello'

4. `A or B` is True if either A or B is True 

In [None]:
True or True,  True or False, False or True, False or False

Please note that the first argument (the one before `or`) is returned if it is True, otherwise the second argument is returned

This can be generalised to everything

In [None]:
# First is True and therefore it is returned

42 or -42

In [None]:
# First is False and therefore the second is returned

'' or 'Hello'
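Because `and` and `or` return one of their arguments, `or` is often used to supply a fallback value. Evaluation also stops as soon as the result is known, so the remaining argument is never executed. A small sketch (the variable names are hypothetical):

```python
entered_name = ''                      # hypothetical empty input
name = entered_name or 'Unknown'       # falls back to the second value
print(name)                            # Unknown

# Evaluation stops as soon as the result is known (short-circuiting),
# so the division by zero on the right is never executed
print(0 and 1 / 0)                     # 0
print(42 or 1 / 0)                     # 42
```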

`and` takes priority over `or`. 

In [None]:
42 or 0 and 'Hello'

In [None]:
(42 or 0) and 'Hello'

In [None]:
not 42 or 0 and 'Hello'

`not` takes priority over both.

In [None]:
not (42 or 0 and 'Hello')

## 3. String methods

In [None]:
B = '''Midway upon the journey of our life I found myself within a forest dark, 
For the straightforward pathway had been lost.'''

print(B)

In [None]:
B

In [None]:
# Case sensitive

B.count('m')

In [None]:
B.lower()

In [None]:
C = B.lower()
C.count('m')

In [None]:
# You can chain methods

B.lower().count('m')

In [None]:
# Splits the text at every space and returns a list of the pieces (lists are introduced here)

B.split(' ')

In [None]:
len(B)

In [None]:
D = B.split(' ')
len(D)

In [None]:
# You can nest calls: the result of the method call is passed directly to the len() function

len(B.split(' '))
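Splitting on a single space leaves line breaks attached to the neighbouring word, which can change the word count. Calling `split` without an argument separates on any whitespace instead. A minimal sketch on a short made-up sample rather than the full text:

```python
text = 'forest dark, \nFor the'   # short sample with a space followed by a line break

# Splitting on a single space keeps the line break attached to the next word
by_space = text.split(' ')
print(by_space)        # ['forest', 'dark,', '\nFor', 'the']

# Without an argument, split() separates on any run of whitespace
by_whitespace = text.split()
print(by_whitespace)   # ['forest', 'dark,', 'For', 'the']
```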

`dir` lists all attributes and methods of an object

In [None]:
dir(B)
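The output of `dir` starts with many names wrapped in double underscores, which are used internally by Python. One possible way to show only the everyday methods is to filter those out; this is a sketch, and `public_names` is just an illustrative variable name:

```python
# Keep only the names that do not start with an underscore
public_names = [name for name in dir('') if not name.startswith('_')]

print(len(public_names))   # number of public string methods
print(public_names)
```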

In [None]:
B.title()

In [None]:
print(B.title())

## 4. How to find files on your computer

![alt text](./file_structure.jpg)

Your current working directory

In [None]:
import os
os.getcwd()

On Mac: `/.../data_science_course`

On Windows: `C:\...\data_science_course`

##### Relative path from current directory

- For Mac

`data/CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`

OR

`./data/CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`


- For Windows

`.\data\CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`


##### Absolute path from the root directory

- For me (Mac)

`/Users/guszti/data_science_course/data/CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`


- For you (Mac)

`/.../data_science_course/data/CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`

- For you (Windows)

`C:\...\data_science_course\data\CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip`

In [None]:
# Platform independent file path

# relative path
rel_path = os.path.join('data', 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
rel_path

In [None]:
# absolute path for me
abs_path = os.path.join(os.sep, 'Users', 'guszti', 'data_science_course', 'data',
                        'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
abs_path
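As an alternative to `os.path.join`, the standard library module `pathlib` also builds platform-independent paths. This is not the approach used in the rest of this notebook, only a sketch of an equivalent; the file is assumed to sit in a `data` folder below the current working directory:

```python
from pathlib import Path

file_name = 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip'

rel_path = Path('data') / file_name   # the / operator joins path components
print(rel_path)
print(rel_path.name)                  # file name only
print(rel_path.parent)                # data
print(rel_path.is_absolute())         # False

abs_path = Path.cwd() / rel_path      # absolute path based on the current working directory
print(abs_path.exists())              # True only if the file is really there
```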

## 5. How to read in tabular data 

In [None]:
# Modules need to be imported first
import os
import pandas as pd

![alt text](./importing.pdf)

##### Works on all systems

In [None]:
# Imported but not bound to a variable. We cannot use it later.

path = os.path.join('data', 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
pd.read_csv(path)

Analogy

In [None]:
42

In [None]:
a = 42

In [None]:
a

In [None]:
# Now it is there for later use

path = os.path.join('data', 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
data = pd.read_csv(path)

In [None]:
data

In [None]:
type(data)

- DataFrames have rows (= `records`) and columns (= `fields`)
- The first column is an index column (record identifiers)
- The first row contains the column names (field names)

In [None]:
data.index

In [None]:
data.columns

`pd.read_csv` has only one required positional argument (the file path); all other arguments are keyword arguments with default values.

Notice that lines can be broken inside parentheses.

In [None]:
# Keyword arguments with default values

pd.read_csv?

In [None]:
# Only import the first n rows

path = os.path.join('data', 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
data = pd.read_csv(path, nrows = 15)
data

1. Import only the columns you really need

2. Drop rows (data points) only when you really have to

In [None]:
path = os.path.join('data', 'CsvLogBase_2020-11-02_134238.904_slow_Measurement.csv.zip')
columns_to_keep = ['Date', 'Time', 'Rel.Time [s]', '5001|MVe [L/min]', '5001|VTmand [mL]',
                   '5001|PIP [mbar]', '5001|RRspon [1/min]']
new_index = ['Rel.Time [s]']

data = pd.read_csv(path, nrows = 100000, usecols = columns_to_keep, index_col = new_index)
data
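The effect of `nrows`, `usecols` and `index_col` is easier to see on a table small enough to read at a glance. The sketch below reads a made-up miniature CSV from memory instead of the course file; the column names and values are invented for illustration only:

```python
import io
import pandas as pd

# Made-up miniature CSV standing in for a real recording (values are illustrative only)
csv_text = """Time [s],PIP [mbar],RR [1/min],Comment
0,18.2,45,a
1,18.5,44,b
2,19.1,46,c
3,18.9,47,d
"""

small = pd.read_csv(
    io.StringIO(csv_text),                 # a text buffer works in place of a file path
    nrows=3,                               # read only the first 3 data rows
    usecols=['Time [s]', 'PIP [mbar]'],    # keep only these columns
    index_col='Time [s]',                  # use this column as the index
)
print(small)
print(small.shape)                         # (3, 1)
```

The index column no longer counts as a data column, which is why the shape reports a single column.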

In [None]:
data.info()

In [None]:
data.describe()

## 6. Homework

#### Slicing and dicing in Python

In [None]:
# List of strings

lst = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October',
       'November', 'December']

lst

In [None]:
# Indexing is zero based

lst[0:4]
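A slice has the general form `[start:stop:step]`: `start` is included, `stop` is excluded, and any of the three parts can be left out. The sketch below shows the main variants on a different list (`days` is a made-up example, so the exercises that follow are left for you):

```python
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print(days[1:3])    # ['Tue', 'Wed']          start included, stop excluded
print(days[:2])     # ['Mon', 'Tue']          from the beginning
print(days[-2:])    # ['Sat', 'Sun']          negative indices count from the end
print(days[::3])    # ['Mon', 'Thu', 'Sun']   every third item
print(days[::-1])   # the whole list in reverse order
```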

##### So - write the input to generate the output

`['March', 'April', 'May']`

In [None]:
lst[]

`['May', 'June', 'July', 'August']`

In [None]:
lst[]

`['July', 'August', 'September', 'October', 'November', 'December']`

In [None]:
lst[]

`['January', 'March', 'May', 'July', 'September']`

In [None]:
lst[]

`['March', 'April', 'May']`

`['January',
 'February',
 'March',
 'April',
 'May',
 'June',
 'July',
 'August',
 'September',
 'October',
 'November',
 'December']`

In [None]:
lst[]

`['April', 'June', 'August', 'October']`

In [None]:
lst[]

`['December', 'November', 'October', 'September', 'August']`

In [None]:
lst[]

`['November', 'September', 'July', 'May']`

In [None]:
lst[]

`['December',
 'November',
 'October',
 'September',
 'August',
 'July',
 'June',
 'May',
 'April',
 'March',
 'February',
 'January']`

In [None]:
lst[]