# First ECG AI Tutorial - Python Intro

This tutorial was designed for Jupyter Notebook. If you don't have it installed, please download Anaconda: https://www.anaconda.com/download/

In this tutorial we'll cover some basic Python commands and how to structure your code and data.
We'll then process an .xml file and learn how to work in a directory with multiple files, save tables and start your data analysis.

# 1 - NumPy
NumPy is the fundamental package for scientific computing with Python. It contains many functions for matrix operations.
We'll use numpy for most of our python applications.


In [0]:
# Import the library. If it is already installed you just need to import it; we call it np to make it easier to code
import numpy as np
# Creating an array with the numpy library
only_zeros = np.zeros((5,5))
print(only_zeros)

Array indexing in Python always starts at 0. This means that if you have an array of size 5, you can only access positions from 0 to 4. The first index is the row, the second the column, and so forth.
Below we'll change the value in the first row, second column of the array.
As in MATLAB, you can also change a whole row/column or the whole array at once.

In [0]:
#changing value
only_zeros[0,1] = 1
print(only_zeros)
#printing an empty row, space
print("\n")
#changing whole row
only_zeros[1,:] = 2
print(only_zeros)
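
The same slicing syntax also works for columns, rectangular blocks and the whole array. The sketch below is only an illustration: it builds its own small array (`grid`, a name made up for this example) so it can be run on its own.

```python
import numpy as np

# Hypothetical array used only for this illustration
grid = np.zeros((3, 4))

# Change a whole column: every row, column index 2
grid[:, 2] = 7

# Change a rectangular block: rows 0-1, columns 0-1
grid[0:2, 0:2] = 1
print(grid)

# Change the whole array at once
grid[:, :] = 5
print(grid)
```

Note that the end of a slice such as `0:2` is exclusive, so it covers positions 0 and 1 only.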

Python is a dynamically typed language. This means that you usually do not have to declare a variable's type before using it (as you would in C, for instance). We can check the type of our NumPy array by accessing its dtype property, or use the Python type function for other variables. We can also change the type depending on the level of precision we need. Since we only have small whole numbers, we'll change it to an integer type. The type is **important** for reducing memory use: a float64 takes 8 bytes per element, while an int16 takes only 2.

In [0]:
#accessing the dtype property
print(only_zeros.dtype)
z=only_zeros[0,0]
print(type(z))
#changing the type by creating a new array; the dtype argument specifies the type of the new array
not_only_zeros=np.array(only_zeros, dtype="int16")
print(not_only_zeros.dtype)
print(not_only_zeros)
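
To make the memory argument concrete, the following sketch compares two arrays of the same shape but different types. The array names and the 1000 x 1000 size are arbitrary choices for this illustration.

```python
import numpy as np

# Same shape, two different dtypes
as_float64 = np.zeros((1000, 1000), dtype="float64")
as_int16 = np.zeros((1000, 1000), dtype="int16")

# nbytes reports the size of the array data in bytes
print("float64:", as_float64.nbytes, "bytes")  # 8 bytes per element
print("int16:  ", as_int16.nbytes, "bytes")    # 2 bytes per element

# astype returns a converted copy; the original array keeps its type
converted = as_float64.astype("int16")
print(converted.dtype, as_float64.dtype)
```

Keep in mind that int16 can only hold whole numbers between -32768 and 32767, so a smaller type is only safe when the data fits in it.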



# Indentation, For and If
Unlike many programming languages, Python does not use braces {} to delimit the start and end of a block such as a function, a loop or an if statement.
Instead it relies on indentation, as shown in the example below, where we search the whole array for the number 2 and print a message every time we find it.

In [0]:
#the shape attribute gives the size of the array (rows, columns)
print(only_zeros.shape)

#for loop over the rows
for i in range(0, only_zeros.shape[0]):
    #for loop over the columns; this loop is inside the one above
    for j in range(0, only_zeros.shape[1]):
        #checking if the position is equal to 2; indented again, so it is inside both for loops
        if only_zeros[i,j]==2:
            print("Found a 2!")
        else:
            print("Found something else.")
    #indented only once, so this runs at the end of each row
    print(i)
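
The sketch below shows how the indentation level decides which block a statement belongs to. It uses a small array of its own (`arr`, made up for this example), and at the end compares the loop result with `np.count_nonzero`, which is one possible NumPy shortcut rather than the only way to do it.

```python
import numpy as np

# Hypothetical 3x3 array with one row of 2s, built just for this sketch
arr = np.zeros((3, 3))
arr[1, :] = 2

count_loop = 0
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        if arr[i, j] == 2:
            # Indented under the if: runs only when the value is 2
            count_loop += 1
    # Indented under the outer for only: runs once per row
    print("finished row", i)
# Not indented: runs once, after both loops are done
print("2s found with loops:", count_loop)

# The same count without explicit loops
count_numpy = np.count_nonzero(arr == 2)
print("2s found with NumPy:", count_numpy)
```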
    

# Exercise 1
Solve the exercises below. First you should try without any NumPy functions. The purpose here is to practice for loops, if statements and indentation. If you're already comfortable with these concepts you can skip this exercise.

### 1- Find the minimum and maximum values and print their value and position.
### 2- Sort the values from the array in ascending order.


In [0]:
values = np.random.random((30))
print(values)
print(np.min(values))


#Implement your solutions here

In [0]:

#np.random.random returns values in [0, 1), so 2 is a safe starting point above every element
min_val=2
for i in range(0,values.shape[0]):
    if values[i]<min_val:
        min_val=values[i]

print(min_val)
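
One possible way to approach the rest of the exercise is sketched below. It is not the only solution: it tracks positions alongside the minimum and maximum, and sorts with a selection sort, a simple algorithm chosen here because it needs nothing beyond for and if. The array is regenerated with 10 values so the block runs on its own.

```python
import numpy as np

values = np.random.random(10)

# Minimum and maximum with their positions, using only loops and if
min_val, min_pos = values[0], 0
max_val, max_pos = values[0], 0
for i in range(values.shape[0]):
    if values[i] < min_val:
        min_val, min_pos = values[i], i
    if values[i] > max_val:
        max_val, max_pos = values[i], i
print("min:", min_val, "at position", min_pos)
print("max:", max_val, "at position", max_pos)

# Selection sort on a copy, so the original array stays untouched
sorted_vals = values.copy()
n = sorted_vals.shape[0]
for i in range(n - 1):
    smallest = i
    for j in range(i + 1, n):
        if sorted_vals[j] < sorted_vals[smallest]:
            smallest = j
    # Swap the smallest remaining value into position i
    sorted_vals[i], sorted_vals[smallest] = sorted_vals[smallest], sorted_vals[i]
print(sorted_vals)
```

Starting from `values[0]` instead of a fixed number such as 2 means the code also works for data outside the 0 to 1 range.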

# 2 - Reading some XML

Below we'll read an XML file and start parsing it.
First we'll load the file, extract all the fields into a **dictionary** and then organize them in a **Pandas DataFrame** table.
Finally we'll plot the ECG signal.

Dictionaries in Python are composed of keys and values. Each key maps to exactly one value, but that value can itself be a collection such as a list. Different keys can hold different kinds of data (as you can see in the example below), which makes dictionaries very useful when parsing XML files.

Functions can be easily created using the **def** keyword. Below we'll create a function to print and return the age.

In [0]:
#creating a dictionary with different data types, values come after :

d = {'Name': 'Zara', 'Age': 15, 'Friends': ['Renan','Lucas','Ricardo']}

print("Name = ", d['Name'])
print('Age =', d['Age'])
print('Friends =',d['Friends'])
print('Best Friend =',d['Friends'][1])

#Creating a function that returns the age times 2:
def Real_Age(d_temp):    
  print("Real age is:",d_temp['Age']*2)
  return(d_temp['Age']*2)


In [0]:
#We created the function in the cell above; once that cell has been run, the function is available to the rest of the notebook.
age=Real_Age(d)
print("The real age is =",age)
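
A few more dictionary operations tend to come up when parsing files. The sketch below uses a made-up record (`person`) with the same shape as the dictionary above; the added `City` key and its value are purely illustrative.

```python
# Hypothetical record, similar in shape to the dictionary above
person = {'Name': 'Zara', 'Age': 15, 'Friends': ['Renan', 'Lucas', 'Ricardo']}

# Add a new key, or overwrite an existing one
person['City'] = 'Unknown'
person['Age'] = 16

# get() returns a default instead of raising KeyError for a missing key
print(person.get('Height', 'not recorded'))

# Check whether a key exists
print('Name' in person)

# Loop over all key/value pairs
for key, value in person.items():
    print(key, '->', value)
```

Using `get` with a default is handy for XML files where some fields may be missing for some patients.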

### Now that we know how dictionaries and functions work let's see how we can load our xml file.

#### Downloading data
Here we will download the data from Google Drive. **You should skip this step if you are running the code on your own machine and have already downloaded "ECG test.xml".**


In [0]:
!pip install pydrive

In [0]:
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

file = drive.CreateFile({'id': '1FTZDmyBQ4b-2Rl8XfxXIKheIdNn-7gvK'})
file.GetContentFile('ECG test.xml')

#### Reading XML

In [0]:
#import the library to process xml

import xml.etree.ElementTree as ElementTree

class XmlListConfig(list):
    def __init__(self, aList):
        for element in aList:
            # len() counts child elements; truth-testing an Element directly is deprecated
            if len(element):
                # treat like dict
                if len(element) == 1 or element[0].tag != element[1].tag:
                    self.append(XmlDictConfig(element))
                # treat like list
                elif element[0].tag == element[1].tag:
                    self.append(XmlListConfig(element))
            elif element.text:
                text = element.text.strip()
                if text:
                    self.append(text)


class XmlDictConfig(dict):
    '''
    Example usage:

    >>> tree = ElementTree.parse('your_file.xml')
    >>> root = tree.getroot()
    >>> xmldict = XmlDictConfig(root)

    Or, if you want to use an XML string:

    >>> root = ElementTree.XML(xml_string)
    >>> xmldict = XmlDictConfig(root)

    And then use xmldict for what it is... a dict.
    '''
    def __init__(self, parent_element):
        if parent_element.items():
            self.update(dict(parent_element.items()))
        for element in parent_element:
            # len() counts child elements; truth-testing an Element directly is deprecated
            if len(element):
                # treat like dict - we assume that if the first two tags
                # in a series are different, then they are all different.
                if len(element) == 1 or element[0].tag != element[1].tag:
                    aDict = XmlDictConfig(element)
                # treat like list - we assume that if the first two tags
                # in a series are the same, then the rest are the same.
                else:
                    # here, we put the list in dictionary; the key is the
                    # tag name the list elements all share in common, and
                    # the value is the list itself 
                    aDict = {element[0].tag: XmlListConfig(element)}
                # if the tag has attributes, add those to the dict
                if element.items():
                    aDict.update(dict(element.items()))
                self.update({element.tag: aDict})
            # this assumes that if you've got an attribute in a tag,
            # you won't be having any text. This may or may not be a 
            # good idea -- time will tell. It works for the way we are
            # currently doing XML configuration files...
            elif element.items():
                self.update({element.tag: dict(element.items())})
            # finally, if there are no child tags and no attributes, extract
            # the text
            else:
                self.update({element.tag: element.text})
                



In [0]:
tree = ElementTree.parse('ECG test.xml')
root = tree.getroot()
xmldict = XmlDictConfig(root)
xmldict

Now we have the XML stored as a dictionary. We can access the data just like we did before:

In [0]:
print("Patient demographics: ", xmldict['PatientDemographics'])
print("Age: ", xmldict['PatientDemographics']['PatientAge'])
print("Gender: ", xmldict['PatientDemographics']['Gender'])
#BMI = weight (kg) / height (m)^2, so the height in cm is divided by 100 first
print("BMI: ", int(xmldict['PatientDemographics']['WeightKG'])/(int(xmldict['PatientDemographics']['HeightCM'])/100)**2)
#xmldict
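
To see what the conversion classes are working with, it can help to parse a tiny XML string by hand. The XML below is invented for this illustration (its values and the `Record` root tag are assumptions, and it is not the content of "ECG test.xml"); only the field names are borrowed from the demographics block used above.

```python
import xml.etree.ElementTree as ET

# Made-up XML, loosely modeled on the demographics block; not the real file
xml_string = """
<Record>
  <PatientDemographics>
    <PatientAge>45</PatientAge>
    <Gender>FEMALE</Gender>
    <HeightCM>170</HeightCM>
    <WeightKG>65</WeightKG>
  </PatientDemographics>
</Record>
"""

root = ET.fromstring(xml_string.strip())
demo = root.find('PatientDemographics')

# Each child element has a tag (its name) and text (its content)
demographics = {child.tag: child.text for child in demo}
print(demographics)

# Values come out as strings, so convert before doing arithmetic
height_m = int(demographics['HeightCM']) / 100
bmi = int(demographics['WeightKG']) / height_m ** 2
print("BMI:", round(bmi, 2))
```

Every value read from XML is text, which is why the `int(...)` conversions are needed before computing anything.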

# 3 - Pandas

Dictionaries are a good data structure, but they are not really suitable for processing data. We would need to write `for` loops and iterate over them, which is really slow if we have a huge database. Databases are usually stored as tables. A good library for working with tables is [Pandas](http://pandas.pydata.org/pandas-docs/stable/10min.html).


```
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
```


Let's see what we can do with it!

In [0]:
# Import the pandas library
import pandas as pd

# XML is a deeply nested structure. The DataFrame (the main pandas structure) is a powerful data structure, but it only handles data laid out like a spreadsheet.
# Let's use the dictionary from the beginning of this notebook to create a DataFrame.
print("Dict: ", d)

print("\n")
df = pd.DataFrame(d)
print("Dataframe: ", df)
print("Columns: ", df.columns)

As you can see, each element of `Friends` turned into a row. Let's try now with the `PatientDemographics` we used before (from the XML). 

In [0]:
new_df = pd.DataFrame(xmldict['PatientDemographics'], index=[0])
new_df
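
The `index=[0]` argument is needed because every value in that dictionary is a single scalar, so pandas cannot tell how many rows to create. The sketch below shows the error and two common ways around it, using a made-up record (`record`) rather than the real XML data.

```python
import pandas as pd

# Hypothetical flat record with one scalar value per key
record = {'PatientAge': '45', 'Gender': 'FEMALE'}

# Without an index pandas cannot tell how many rows to create
try:
    pd.DataFrame(record)
except ValueError as err:
    print("ValueError:", err)

# Option 1: give the single row an explicit index label
df_a = pd.DataFrame(record, index=[0])

# Option 2: wrap the dict in a list, one dict per row
df_b = pd.DataFrame([record])

print(df_a)
print(df_b)
```

The second form extends naturally to several patients: a list with one dictionary per patient gives one row per patient.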

Let's add more rows into this DataFrame:

In [0]:
new_pat1 = {'PatientAge': '50', 'AgeUnits': 'YEARS', 'Gender': 'MALE', 'HeightCM': '163', 'WeightKG': '63'}
new_pat2 = {'PatientAge': '80', 'AgeUnits': 'YEARS', 'Gender': 'FEMALE', 'HeightCM': '167', 'WeightKG': '66'}

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
new_df = pd.concat(
    [new_df, pd.DataFrame([new_pat1, new_pat2])], ignore_index=True)
new_df

Now we can do a lot of things using our brand new DataFrame. Let's explore!

In [0]:
new_df.dtypes

In [0]:
# Our numeric data is stored as text (strings), not numbers, because that is how it came out of the XML. Let's convert it to numbers
new_df[['HeightCM', 'PatientAge', 'WeightKG']] = new_df[['HeightCM', 'PatientAge', 'WeightKG']].astype(int)
new_df.dtypes
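
`astype(int)` fails if any entry cannot be parsed as a number. When real files contain missing or malformed fields, `pd.to_numeric` with `errors='coerce'` is one alternative worth knowing. The table below (`raw_df`) and its "unknown" entry are invented for this sketch.

```python
import pandas as pd

# Hypothetical table with numbers stored as strings and one bad entry
raw_df = pd.DataFrame({'HeightCM': ['163', '167', 'unknown'],
                       'WeightKG': ['63', '66', '70']})

# errors='coerce' turns anything that cannot be parsed into NaN
clean_df = raw_df.apply(pd.to_numeric, errors='coerce')
print(clean_df)
print(clean_df.dtypes)
```

The column containing the bad entry becomes a float column, since NaN is a floating point value.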

In [0]:
# Now, let's check some statistics
new_df.describe()

In [0]:
# We can also create new columns based on other columns:
new_df['Weight_plus_10'] = new_df['WeightKG'] + 10
new_df
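
Column arithmetic like this replaces the explicit loops we would need with plain dictionaries. The sketch below computes the same quantity both ways on a small made-up table (`patients`, with invented values) so the two styles can be compared side by side.

```python
import pandas as pd

# Hypothetical patients, values invented for this illustration
patients = pd.DataFrame({'HeightCM': [163, 167, 180],
                         'WeightKG': [63, 66, 81]})

# Loop version: visit each row and build a list of results
ratio_loop = []
for _, row in patients.iterrows():
    ratio_loop.append(row['WeightKG'] / row['HeightCM'])

# Vectorized version: one expression applied to the whole columns
patients['weight_per_cm'] = patients['WeightKG'] / patients['HeightCM']

print(ratio_loop)
print(patients)
```

Both give the same numbers, but the vectorized form is shorter and is generally much faster on large tables.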

# Exercise 2
Solve the exercises below. The purpose here is to practice working with DataFrames.

### 1- Create a new column with the patient's BMI

In [0]:
new_df['bmi'] = round(((new_df['WeightKG'] / (new_df['HeightCM']*new_df['HeightCM'])) * 10000), 2)
new_df

# 4 - Working with more data
Now that you have an idea of how to work with Python, NumPy and Pandas, let's try something with more data.
The breast cancer dataset is a classic and very easy binary (malignant or benign) classification dataset. 

In [0]:
# Let's import the data from a traditional ML library: Scikit-learn
from sklearn.datasets import load_breast_cancer
cancer = load_breast_cancer()

# And assign important data into variables
features = cancer.data
feat_name = cancer.feature_names
target = cancer.target
targ_name = cancer.target_names

In [0]:
# Now we can create a DataFrame with all data
df_feat = pd.DataFrame(features, columns=feat_name)
# And the target to be predicted
df_feat['target'] = target
df_feat

In [0]:
df_feat.describe()

In [0]:
# Correlation map
df_feat.corr()

Wow... We have a lot of data! It's not so easy now to understand what is happening, what is important and how to make decisions. Let's try something else: data visualization and plots! Matplotlib is the most important plotting library in Python, but it's a little bit limited. For this reason, we are going to try another library that is built on top of it, Seaborn.

In [0]:
# Let's import both libraries
import seaborn as sns
import matplotlib.pyplot as plt

# To start, how about checking how many samples we have for each class?
sns.countplot(x='target', data=df_feat)
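
If the exact numbers behind the bars are needed, `value_counts` gives them directly. The sketch below uses a tiny made-up `target` column (`labels`) instead of the real dataset so it can run on its own.

```python
import pandas as pd

# Hypothetical labels standing in for the 'target' column
labels = pd.DataFrame({'target': [0, 1, 1, 0, 1, 1, 1]})

# Absolute counts per class
counts = labels['target'].value_counts()
print(counts)

# Fractions per class, useful for spotting class imbalance
fractions = labels['target'].value_counts(normalize=True)
print(fractions)
```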

In [0]:
# And how about making the correlation map a little bit more readable?
# just making it bigger
plt.figure(figsize=(16,16))
# and plot!
sns.heatmap(df_feat.corr(), vmax=1, square=True, annot=True)
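
With this many features the heatmap can still be hard to read. One complementary option is to rank the features by how strongly they correlate with the target. The sketch below does this on synthetic data (`demo`, with invented column names) rather than on `df_feat`, so it is only an illustration of the idea.

```python
import numpy as np
import pandas as pd

# Synthetic data standing in for df_feat
rng = np.random.default_rng(0)
target = rng.integers(0, 2, size=200)
demo = pd.DataFrame({
    'strong_feature': target * 2.0 + rng.normal(0, 0.5, size=200),
    'noise_feature': rng.normal(0, 1, size=200),
    'target': target,
})

# Correlation of every feature with the target, strongest first
corr_with_target = demo.corr()['target'].drop('target')
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked)
```

Taking the absolute value keeps strongly negative correlations near the top of the ranking as well.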

In [0]:
# We also have other plots that might be interesting, like the swarmplot
sns.swarmplot(x='target', y='mean radius', data= df_feat)

In [0]:
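# Or histograms (distplot is deprecated since seaborn 0.11; histplot with kde=True is the replacement)
sns.histplot(df_feat[df_feat['target']==0]['mean radius'], kde=True, stat="density")
sns.histplot(df_feat[df_feat['target']==1]['mean radius'], kde=True, stat="density")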
plt.show()