<img src="https://docs.actable.ai/_images/logo.png" style="object-fit: cover; max-width:100%; height:300px;" />

# AAICorrelationTask

This notebook shows how to run a correlation analysis with
[Actable AI](https://actable.ai).

In this example we look for correlations between a patient's cholesterol level
and their other features, such as sex and age.

The dataset used is the
[Heart Failure Prediction Dataset](https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction).

### Imports

This cell imports the Python modules used in the notebook.
The last line imports the correlation task, `AAICorrelationTask`, from `actableai`.

In [4]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

from actableai.tasks.correlation import AAICorrelationTask

### Importing the data

This cell loads the dataset into a pandas DataFrame from the Actable AI public datasets repository on GitHub and displays it.

In [8]:
df = pd.read_csv("https://raw.githubusercontent.com/Actable-AI/public-datasets/master/heart.csv")
df

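| | Age | Sex | ChestPainType | RestingBP | Cholesterol | FastingBS | RestingECG | MaxHR | ExerciseAngina | Oldpeak | ST_Slope | HeartDisease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 40 | M | ATA | 140 | 289 | 0 | Normal | 172 | N | 0.0 | Up | 0 |
| 1 | 49 | F | NAP | 160 | 180 | 0 | Normal | 156 | N | 1.0 | Flat | 1 |
| 2 | 37 | M | ATA | 130 | 283 | 0 | ST | 98 | N | 0.0 | Up | 0 |
| 3 | 48 | F | ASY | 138 | 214 | 0 | Normal | 108 | Y | 1.5 | Flat | 1 |
| 4 | 54 | M | NAP | 150 | 195 | 0 | Normal | 122 | N | 0.0 | Up | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 913 | 45 | M | TA | 110 | 264 | 0 | Normal | 132 | N | 1.2 | Flat | 1 |
| 914 | 68 | M | ASY | 144 | 193 | 1 | Normal | 141 | N | 3.4 | Flat | 1 |
| 915 | 57 | M | ASY | 130 | 131 | 0 | Normal | 115 | Y | 1.2 | Flat | 1 |
| 916 | 57 | F | ATA | 130 | 236 | 0 | LVH | 174 | N | 0.0 | Flat | 1 |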
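In the results of this notebook, text (categorical) columns such as `Sex` show up as `[column, value]` pairs, while numeric columns, including 0/1 flags such as `FastingBS` and `HeartDisease`, appear under their own names. A quick way to see which columns fall in each group is sketched below on the first five rows of the table, copied by hand so it runs offline; in the notebook you would inspect `df` instead of the hypothetical `sample` frame.

```python
import pandas as pd

# First five rows of the table above, copied by hand so the example runs offline
sample = pd.DataFrame(
    {
        "Age": [40, 49, 37, 48, 54],
        "Sex": ["M", "F", "M", "F", "M"],
        "ChestPainType": ["ATA", "NAP", "ATA", "ASY", "NAP"],
        "RestingBP": [140, 160, 130, 138, 150],
        "Cholesterol": [289, 180, 283, 214, 195],
        "FastingBS": [0, 0, 0, 0, 0],
        "RestingECG": ["Normal", "Normal", "ST", "Normal", "Normal"],
        "MaxHR": [172, 156, 98, 108, 122],
        "ExerciseAngina": ["N", "N", "N", "Y", "N"],
        "Oldpeak": [0.0, 1.0, 0.0, 1.5, 0.0],
        "ST_Slope": ["Up", "Flat", "Up", "Flat", "Up"],
        "HeartDisease": [0, 1, 0, 1, 0],
    }
)

# Split the columns by dtype: anything non-numeric is treated as categorical
categorical = [c for c in sample.columns if not pd.api.types.is_numeric_dtype(sample[c])]
numeric = [c for c in sample.columns if pd.api.types.is_numeric_dtype(sample[c])]

print("Categorical:", categorical)
print("Numeric:", numeric)
```

The check uses `is_numeric_dtype` rather than matching on the `object` dtype, so it keeps working whether pandas stores the text columns as `object` or as its string dtype.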


### Calling Actable AI task

This cell runs the Actable AI correlation analysis with `Cholesterol` as the target.
To learn more about the available parameters, see the [API Documentation](https://lib.actable.ai/actableai.tasks.html#actableai.tasks.correlation.AAICorrelationTask.run).

In [9]:
# df is the DataFrame containing our data
# target_column is "Cholesterol" because we want to find what correlates with cholesterol
# All other columns of df are candidates for correlation with the target
result = AAICorrelationTask().run(
    df=df,
    target_column="Cholesterol"
)



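### Correlation results

This cell displays the correlations found between `Cholesterol` and the other columns, listed from strongest to weakest absolute correlation.
Each entry gives the column name (or a `[column, value]` pair for one category of a categorical column), the correlation coefficient `corr`, and its p-value `pval`.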
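Entries such as `['Sex', 'F']` refer to a single category of a categorical column. A minimal sketch of the idea, assuming the task one-hot encodes categorical columns and computes a Pearson correlation for each indicator column (an assumption about its internals; the API documentation is the authority), uses pandas on the nine sample rows shown in the data table. The numbers will not match the full-dataset results, but the structure does: because the `M` indicator is always `1 - F`, the two get correlations of the same size and opposite sign, just like `['Sex', 'F']` and `['Sex', 'M']` in the output.

```python
import pandas as pd

# Sex and Cholesterol for the nine sample rows shown in the data table
sample = pd.DataFrame(
    {
        "Sex": ["M", "F", "M", "F", "M", "M", "M", "M", "F"],
        "Cholesterol": [289, 180, 283, 214, 195, 264, 193, 131, 236],
    }
)

# One-hot encode Sex into 0/1 indicator columns named "F" and "M"
dummies = pd.get_dummies(sample["Sex"], dtype=int)

# Pearson correlation (pandas' default method) of each indicator with Cholesterol
corr_f = dummies["F"].corr(sample["Cholesterol"])
corr_m = dummies["M"].corr(sample["Cholesterol"])

print(f"Sex = F: {corr_f:+.3f}")
print(f"Sex = M: {corr_m:+.3f}")
```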

In [11]:
result['data']['corr']

[{'col': 'FastingBS',
  'corr': -0.19287095999573994,
  'pval': 3.835414526708026e-09},
 {'col': 'MaxHR', 'corr': 0.18389977675903896, 'pval': 1.997361651456194e-08},
 {'col': ['Sex', 'F'],
  'corr': 0.18137011225847538,
  'pval': 3.135192926258979e-08},
 {'col': ['Sex', 'M'],
  'corr': -0.18137011225847538,
  'pval': 3.135192926258979e-08},
 {'col': ['RestingECG', 'LVH'],
  'corr': 0.18013616106297495,
  'pval': 3.8974701092669564e-08},
 {'col': 'HeartDisease',
  'corr': -0.13987308609485483,
  'pval': 2.1083437988968683e-05},
 {'col': ['ChestPainType', 'ATA'],
  'corr': 0.12347742507022169,
  'pval': 0.0001765202792362173},
 {'col': ['RestingECG', 'ST'],
  'corr': -0.11259103412897559,
  'pval': 0.0006318392623570431},
 {'col': 'RestingBP',
  'corr': 0.10948098341278427,
  'pval': 0.0008918789807780659},
 {'col': ['ChestPainType', 'ASY'],
  'corr': -0.06557034829756311,
  'pval': 0.047020558132196605}]
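`result['data']['corr']` is a plain list of dictionaries, so it loads directly into a pandas DataFrame for filtering and sorting. The sketch below copies five entries from the output so it runs on its own; in the notebook, `pd.DataFrame(result['data']['corr'])` would take their place. The 0.001 cutoff is an arbitrary illustrative choice, not a library default.

```python
import pandas as pd

# Five entries copied from result['data']['corr'] above
corr_entries = [
    {"col": "FastingBS", "corr": -0.19287095999573994, "pval": 3.835414526708026e-09},
    {"col": "MaxHR", "corr": 0.18389977675903896, "pval": 1.997361651456194e-08},
    {"col": ["Sex", "F"], "corr": 0.18137011225847538, "pval": 3.135192926258979e-08},
    {"col": "RestingBP", "corr": 0.10948098341278427, "pval": 0.0008918789807780659},
    {"col": ["ChestPainType", "ASY"], "corr": -0.06557034829756311, "pval": 0.047020558132196605},
]

corr_df = pd.DataFrame(corr_entries)

# Flatten [column, value] pairs into a single "column = value" label
corr_df["col"] = corr_df["col"].apply(lambda c: " = ".join(c) if isinstance(c, list) else c)

# Keep entries below the illustrative 0.001 cutoff, strongest correlation first
significant = (
    corr_df[corr_df["pval"] < 0.001]
    .sort_values("corr", key=lambda s: s.abs(), ascending=False)
    .reset_index(drop=True)
)
print(significant)
```

Sorting on the absolute value keeps strong negative correlations, such as the one for `FastingBS`, at the top next to strong positive ones.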