In [None]:
# Setting up the Colab environment. DO NOT EDIT!
import os

try:
    import otter, pingouin

except ImportError:
    ! pip install -q otter-grader==4.0.0 pingouin
    import otter

if not os.path.exists('lab-tests'):
    zip_files = [f for f in os.listdir() if f.endswith('.zip')]
    assert len(zip_files)>0, 'Could not find any zip files!'
    assert len(zip_files)==1, 'Found multiple zip files!'
    ! unzip {zip_files[0]}

grader = otter.Notebook(colab=True,
                        tests_dir = 'lab-tests')

# Lab

Remember, all assignments are due at 11:59 PM (Philadelphia time) on the Sunday of each instructional week.

## Learning Objectives
At the end of this learning activity you will be able to: 
 - Employ `pg.chi2_independence` to test for an association between two categorical variables.
 - Practice testing variables for normality.
 - Employ `pg.ttest`, `pg.anova`, and `pg.kruskal` to look for differences in a dependent variable across the levels of a categorical variable.

## Introduction

In this lab you will explore the effects of antiretroviral medications on neurological impairment.
In this cohort, we have two major drug regimens, d4T (Stavudine) and the newer Emtricitabine/tenofovir (Truvada).
The older Stavudine is suspected to have neurotoxic effects that are not found in the newer Truvada.

In order to evaluate this effect, the participants in this cohort have completed an extensive neuropsychological exam that measures each of 6 domains of neurocognition:
* Processing Speed
* Executive Function
* Language
* Visuospatial processing
* Learning and Memory
* Motor Function

Each of these domains is measured by a number of tests.
The results of these tests are then compared with those of demographically matched individuals (matched on age, race, gender, and education) in order to scale the values appropriately.

These values are on a _Z-scale_.
A z-scale is a transformation such that the _mean_ is 0 and the _standard deviation_ is 1.
Therefore, a person with `motor_domain_z = 0` is performing at the _average_ of matched individuals.
A person with `motor_domain_z = -1` is performing 1 standard deviation below the average of matched individuals.

This leads to a scale of:
  * Z < -2 : Significant impairment
  * -2 < Z < -1 : Mild impairment
  * Z > -1  : No evidence of impairment
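As a worked illustration of the Z-scale and the cut-offs above, the sketch below converts made-up raw scores to Z-scores using hypothetical matched-group norms (mean 14, standard deviation 3; not values from this cohort) and then labels them. The helper `classify_impairment` is hypothetical, and because the scale above uses strict inequalities, placing Z = -2 in "mild" and Z = -1 in "no evidence" is an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical normative values for a demographically matched group (illustrative only)
norm_mean, norm_sd = 14.0, 3.0
raw_scores = pd.Series([14.0, 11.0, 8.0, 17.0, 6.5])

# Z = (raw score - matched mean) / matched standard deviation
z_scores = (raw_scores - norm_mean) / norm_sd

def classify_impairment(z):
    """Label Z-scores with the cut-offs above.

    Bins are left-closed (right=False), so Z = -2 counts as mild
    impairment and Z = -1 as no evidence of impairment (an assumption).
    """
    return pd.cut(
        z,
        bins=[-np.inf, -2, -1, np.inf],
        labels=['Significant impairment', 'Mild impairment', 'No evidence of impairment'],
        right=False,
    )

print(pd.DataFrame({'z': z_scores, 'category': classify_impairment(z_scores)}))
```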

In [None]:
import numpy as np
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

import pingouin as pg

%matplotlib inline

In [None]:
data = pd.read_csv('hiv_neuro_data.csv')
data['education'] = data['education'].astype(float)
data.head()

### Q1: How many participants are suffering from impairment?

Using the thresholds above, create a bar chart that shows, for each domain, the number of individuals with mild or significant impairment (Z < -1).

Checked variables:
 * `q1_plot` - A bar plot of the number of individuals with impairment in each domain.
 
<details><summary>Hint</summary>
Try creating a mask that puts a `True` in each position that is below the threshold. Then sum the mask to count the individuals with impairment, much as we did in Module 5 Walkthrough Q1.</details>
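A minimal sketch of the masking approach from the hint, run on a small made-up DataFrame rather than the lab data. It assumes the domain columns share the `_domain_z` suffix (as `motor_domain_z` and `visuospatial_domain_z` do); check `data.columns` before relying on that pattern.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up Z-scores for four individuals (not the lab data)
toy = pd.DataFrame({
    'motor_domain_z': [-2.5, -0.3, -1.4, 0.8],
    'visuospatial_domain_z': [-1.2, -1.1, 0.2, -2.1],
})

# Assumption: every domain column ends in '_domain_z'
domain_cols = [c for c in toy.columns if c.endswith('_domain_z')]

# True wherever an individual is below the impairment threshold
impaired_mask = toy[domain_cols] < -1

# Summing booleans counts the True values in each column
impaired_counts = impaired_mask.sum()

ax = impaired_counts.plot(kind='bar')
ax.set_ylabel('Number of impaired individuals')

# Column label with the largest count
most_impaired = impaired_counts.idxmax()
print(most_impaired)
```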

|               |    |
| --------------|----|
| Points        | 2  |
| Public Checks | 2  |
| Hidden Tests  | 1  |

_Points:_ 2

In [None]:

# Generate a figure
q1_plot = ...



In [None]:
# Which column has the most impaired individuals in this cohort?

q1_most_impaired = ...

In [None]:
grader.check("q1_impaired_bar")

### Q2: Is Visuospatial impairment linked with ART therapy?

Using the thresholds above, binarize individuals into impaired and non-impaired based on their `visuospatial_domain_z`.
Then, use a chi-squared test of independence to test for an association between impairment status and the individual's ART regimen.

Checked variables:
 * `q2_plot` - A countplot showing the number of individuals with and without impairment and taking each ART therapy.
 * `q2_linkage` - Is there a linkage between Visuospatial impairment and ART regimen?
 * `q2_therapy` - Which therapy is leading to more impairment?

<details><summary>Hint</summary>
In `data`, create a new column, `visuospatial_impaired`, by applying the `< -1` threshold to `visuospatial_domain_z`. Then follow the protocol under Hypothesis Testing in the Module 8 Walkthrough.</details>


|               |    |
| --------------|----|
| Points        | 5  |
| Public Checks | 3  |
| Hidden Tests  | 2  |

_Points:_ 5

In [None]:
# Create a countplot which visualizes this comparison


# Generate a figure showing this comparison
q2_plot = ...


In [None]:
# Perform a chi2 test for the linkage between visuospatial impairment and ART



In [None]:
# Is there a linkage between Visuospatial impairment and ART regimen? 'yes' or 'no'
q2_linkage = ...

# Which therapy is leading to more impairment? 'Stavudine' or 'Truvada'
q2_therapy = ...

In [None]:
grader.check("q2_impaired_v_art")

### Q3: Is Visuospatial **score** linked with ART therapy?

Evaluate the normality of the `visuospatial_domain_z` and then choose the appropriate test using the flowcharts linked below.

Refer to the pingouin guidelines for choosing the appropriate statistical test: https://pingouin-stats.org/build/html/guidelines.html

Checked variables:
 * `q3_is_normal` - A yes/no assessment of the normality of `visuospatial_domain_z`, supported by a Q-Q plot and normality testing.
 * `q3_plot` - A plot showing any difference in `visuospatial_domain_z` between ART therapies.
 * `q3_sig_diff` - A yes/no assessment of whether `visuospatial_domain_z` differs significantly between ART regimens, supported by statistical tests and plots.

<details><summary>Hint</summary>
Create the Q-Q plot and use `pg.normality` to test the normality of `visuospatial_domain_z`, as described under Continuous comparisons in the Module 8 Walkthrough.</details>

|               |    |
| --------------|----|
| Points        | 5  |
| Public Checks | 3  |
| Hidden Tests  | 2  |


_Points:_ 5

In [None]:
# Assess the normality of the visuospatial_domain_z scale


# Answer yes or no.
q3_is_normal = ...

In [None]:

# Generate a figure showing this comparison
q3_plot = ...


In [None]:
# Using the appropriate test
# Determine whether visuospatial_domain_z differs between individuals
# on different ART regimens

In [None]:
# Is visuospatial_domain_z significantly different between ART regimens? 'yes' or 'no'
q3_sig_diff = ...

In [None]:
grader.check("q3_visuo_v_art")

### Q4: Evaluate a potential covariate

ART use is likely not the only thing that impacts neurocognitive impairment.
Use similar methods to evaluate the impact of any of:
* sex
* race
* education
* age
* YearsSeropositive (for example, binned into `data['YS_binned']`)

on `visuospatial_domain_z`.
You can use any comparison method we have discussed so far.
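If you pick a continuous covariate and want to compare groups, one option is to bin it first. The sketch below bins made-up values; the column name `YearsSeropositive` and the bin edges are assumptions for illustration, so check the real column name and choose edges that make sense for the cohort.

```python
import pandas as pd

# Made-up years since seroconversion (not the lab data)
toy = pd.DataFrame({'YearsSeropositive': [1, 4, 6, 9, 12, 15, 22]})

# Hypothetical bin edges; pd.cut bins are right-closed by default: (0, 5], (5, 10], ...
toy['YS_binned'] = pd.cut(
    toy['YearsSeropositive'],
    bins=[0, 5, 10, 20, float('inf')],
    labels=['0-5', '5-10', '10-20', '20+'],
)
print(toy)
```

If you would rather have bins with roughly equal numbers of people, `pd.qcut` splits on quantiles instead of fixed edges.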

Checked variables:
 * `q4_plot` - A plot showing any difference in `visuospatial_domain_z` across your variable of interest.
 * `q4_is_sig` - A yes/no assessment of the significance of the effect, supported by statistical tests and plots.

<details><summary>Hint</summary>
Adapt any of the techniques shown in the Module 8 Walkthrough to a different covariate.</details>

|               |    |
| --------------|----|
| Points        | 5  |
| Public Checks | 2  |

_Points:_ 5

In [None]:

# Generate a figure of your comparison
q4_plot = ...



In [None]:
# Choose the appropriate test for your comparison
...

In [None]:
# Is there a linkage between Visuospatial domain and your covariate? 'yes' or 'no'
q4_is_sig = ...

In [None]:
grader.check("q4_covariates")

In this lab you explored the linkage between ART regimens and the visuospatial domain.
We used tools like chi2 tests and various tests for group differences to determine whether categorical variables were associated with impairment.
Next week we will use simple and multiple regression with continuous variables to gain more statistical power.

--------------------------------------------

## Submission

Check:
 - That all tables and graphs are rendered properly.
 - Code completes without errors by using `Restart & Run All`.
 - All checks **pass**.
 
Then save the notebook and use `File` -> `Download` -> `Download .ipynb`. Upload this file to BBLearn.