# Assignment #2: Efficacy Analysis of a Hypothetical Arthritis Drug

**Objective**: In this assignment, you will use Python to evaluate the effectiveness of a fictional medication designed to reduce inflammation caused by arthritis flare-ups.

**Background**: Imagine a clinical trial in which 60 patients were given a new drug for arthritis. Data from this trial have been recorded in a series of CSV files.

**Data Structure**:
- Each CSV file corresponds to a specific check-in session with the patients.
- There are 12 such CSV files, reflecting 12 different sessions where patients reported their experiences.
- Inside each file:
  - Rows: Each of the 60 rows represents a unique patient.
  - Columns: Each of the 40 columns corresponds to a day, detailing the number of inflammation flare-ups the patient experienced on that day.

**Your Role**: Analyze this data to determine how effective the new drug has been in managing arthritis inflammation across the trial period.

**The files are located under `../data/assignment_2_data/`.**

## Part 1: Importing the Relevant Data Files

**Objective**: In this section of the assignment, you are tasked with filtering and importing specific data files essential for our analysis.

**Data File Types**:
- **Irrelevant Files**: Files named `small-xx.csv`. These files are not required for our analysis.
- **Relevant Files**: Files named `inflammation-xx.csv`. These are the files that contain the data we need to analyze.

**Your Task**:
- Write a Python script to identify and create a list containing the full paths of all the `inflammation-xx.csv` files. These are the files we will be analyzing.
- Ensure that files named `small-xx.csv` are excluded from this list.

**Guidance**:
- If you need assistance with file operations in Python, please refer back to the `03_in_out_modules_files_oop` slides for helpful tips and examples.

By completing this task, you will have a prepared list of all the relevant data files, ready for further analysis in subsequent parts of this assignment.
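
As a rough illustration of prefix-based filtering, the sketch below builds a temporary folder with made-up file names and keeps only the names that start with `inflammation`. The names `demo_dir`, `demo_paths`, `demo_names`, and the three file names are assumptions for this demo, not part of the assignment data.

```python
import os
import tempfile

# Build a throwaway folder with hypothetical file names so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["inflammation-01.csv", "inflammation-02.csv", "small-01.csv"]:
        open(os.path.join(demo_dir, name), "w").close()

    # Keep only names with the wanted prefix. sorted() makes the order
    # predictable, because os.listdir() returns entries in arbitrary order.
    demo_paths = [
        os.path.join(demo_dir, name)
        for name in sorted(os.listdir(demo_dir))
        if name.startswith("inflammation")
    ]
    demo_names = [os.path.basename(p) for p in demo_paths]

print(demo_names)
```

Because `os.listdir()` does not guarantee any particular order, sorting the names is one way to make sure the first entry of the list is the file you expect.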

In [None]:
import os
import csv

path = '' #YOUR CODE HERE: put in the path to the folder that contains the data files
all_paths = []

#this for loop will iterate through all the contents of the folder
for i in os.listdir(path):
  if :   #YOUR CODE HERE: check if the filename begins with 'inflammation'
    #YOUR CODE HERE: if True, append the full path to the array all_paths


print(all_paths)

## Task: Reading and Displaying Data from the First File

**Objective**: Now that you have a list of the relevant `inflammation-xx.csv` file paths, your next step is to read and display the contents of the first file in this list.

**Instructions**:
1. **Read the First File**:
   - Use Python to open the first file in your list of `inflammation-xx.csv` files.
   - Ensure you're reading the file in a way that allows you to access its contents.

2. **Print Each Row**:
   - After opening the file, iterate through each row of data.
   - Print each row so you can visually inspect the data it contains.

**Expected Outcome**:
- By completing this task, you will display the detailed data of each patient for the first check-in session of the clinical trial. This step is crucial for understanding the structure and nature of the data you will be analyzing.

**Hint**: Remember to use appropriate Python file handling and data reading methods. If you need guidance on how to handle CSV files in Python, refer to the relevant sections in your Python learning resources.
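
For reference, here is a minimal sketch of how `csv.reader` behaves, using an in-memory sample instead of a real file. The `sample` text (two patients, three days) is made up for illustration.

```python
import csv
import io

# Hypothetical two-patient, three-day sample standing in for a real CSV file.
sample = io.StringIO("0,1,2\n3,0,1\n")

# csv.reader yields one list of strings per line of input.
reader = csv.reader(sample)
rows = [row for row in reader]

for row in rows:
    print(row)
```

Note that each value comes back as a string (for example `'0'`, not `0`), so it would need converting with `int()` or `float()` before any arithmetic.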

In [None]:
with open(all_paths[0], 'r') as f:
    # YOUR CODE HERE: Use the csv.reader to read the .csv file into 'contents'
    
    # YOUR CODE HERE: Iterate through 'contents' using a for loop and print each row for inspection

## Part 2: Data Summarization Function: `patient_summary`

**Objective**: Create a function named `patient_summary` that will compute summary statistics for each patient's data over a 40-day period.

**Function Specifications**:
- **Function Name**: `patient_summary`
- **Parameters**:
  1. `file_path`: A string representing the path to the CSV file containing the patient data.
  2. `operation`: A string specifying the type of summary operation to perform. Acceptable values are "mean", "max", or "min". This will determine whether the function calculates the average, maximum, or minimum number of flare-ups for each patient over the 40 days.

**Expected Behavior**:
- Your function should read the data from the file at `file_path`.
- Perform the specified `operation` (mean, max, or min) to summarize the flare-ups data for each of the 60 patients.
- Return a list or array with 60 elements, each element being the result of the summary operation for a corresponding patient.

**Output**:
- The output should be an array or list with a length of 60, aligning with the number of patients in the study.

**Hints for Implementation**:
1. **Utilizing NumPy**: For efficient data manipulation and computation, consider using NumPy, as discussed in the `04a_data_numpy` slides.
2. **Output Shape**: Ensure that the shape of your output data matches the number of patients, which is 60.
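
To make the row-wise idea concrete, the sketch below applies NumPy reductions along `axis=1` to a small made-up array (3 patients by 4 days, not the trial data). The variable names are chosen for this demo only.

```python
import numpy as np

# Hypothetical data: 3 patients (rows) by 4 days (columns).
demo = np.array([
    [0, 1, 2, 1],
    [0, 0, 0, 0],
    [3, 2, 1, 2],
])

# axis=1 collapses the columns, leaving one value per row (patient).
row_means = np.mean(demo, axis=1)
row_max = np.max(demo, axis=1)
row_min = np.min(demo, axis=1)

print(row_means.shape)  # one entry per patient
```

With `axis=0` the same calls would instead return one value per day (column), so checking the output shape is a quick way to confirm the right axis was used.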

In [None]:
import numpy as np

def patient_summary(file_path, operation):
    # load the data from the file
    data = np.loadtxt(fname=file_path, delimiter=',')
    ax = 1  # this specifies that the operation should be done for each row (patient)

    # implement the specific operation based on the 'operation' argument
    if operation == 'mean':
        # YOUR CODE HERE: calculate the mean (average) number of flare-ups for each patient (use axis=ax) and store it in summary_values

    elif operation == 'max':
        # YOUR CODE HERE: calculate the maximum number of flare-ups experienced by each patient (use axis=ax) and store it in summary_values

    elif operation == 'min':
        # YOUR CODE HERE: calculate the minimum number of flare-ups experienced by each patient (use axis=ax) and store it in summary_values
        
    else:
        # if the operation is not one of the expected values, raise an error
        raise ValueError("Invalid operation. Please choose 'mean', 'max', or 'min'.")

    return summary_values

In [None]:
# test it out on the data file we read in and make sure the size is what we expect i.e., 60
data_min = patient_summary(all_paths[0], 'min')
print(len(data_min))

## Part 3: Error Detection in Patient Data

**Objective**: Develop a function named `detect_problems` that identifies any irregularities in the patient data, specifically focusing on detecting patients with a mean inflammation score of 0.

**Function Specifications**:
- **Function Name**: `detect_problems`
- **Parameter**:
  - `file_path`: A string that specifies the path to the CSV file containing patient data.

**Expected Behavior**:
- The function should read the patient data from the file at `file_path`.
- Utilize the previously defined `patient_summary()` function to calculate the mean inflammation for each patient.
- Employ an additional helper function `check_zeros(x)` (already created) to determine if there are any zero values in the array of mean inflammations.
- The `detect_problems()` function should return `True` if there is at least one patient with a mean inflammation score of 0, and `False` otherwise.

**Implementation Guidance**:
1. Call `patient_summary(file_path, 'mean')` to get the mean inflammation scores for all patients.
2. Use `check_zeros()` to evaluate the mean scores. This helper function takes an array as input and returns `True` if it finds zero values in the array.
3. Based on the output from `check_zeros()`, the `detect_problems()` function should return `True` (indicating an issue) if any mean inflammation scores of 0 are found, or `False` if none are found.

**Note**: This function is crucial for identifying potential data entry errors, such as healthy individuals being mistakenly included in the dataset or other data-related issues.

**Understanding the `check_zeros(x)` Helper Function**

The `check_zeros(x)` function is provided as a tool to assist with your data analysis. While you do not need to modify or fully understand the internal workings of this function, it's important to grasp its input, output, and what the output signifies:

1. **Input**:
   - **Parameter `x`**: This function takes an array of numbers as its input. In the context of your assignment, this array will typically represent a set of data points from your patient data, such as mean inflammation scores.

2. **Output**:
   - The function returns a boolean value: either `True` or `False`.

3. **Interpreting the Output**:
   - **Output is `True`**: This indicates that the array `x` contains at least one zero value. In the context of your analysis, this means that at least one patient has a mean inflammation score of 0, signaling a potential issue or anomaly in the data.
   - **Output is `False`**: This signifies that there are no zero values in the array `x`. For your patient data, it means no patient has a mean inflammation score of 0, and thus no apparent anomalies of this type were detected.

**Usage in Your Analysis**:
When using `check_zeros(x)` in conjunction with your `patient_summary()` function in the `detect_problems()` function, you'll be checking whether any patient in your dataset has an average (mean) inflammation score of 0.
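
The True/False behaviour described above can be tried on small arrays. The sketch below uses a hypothetical stand-in, `has_zero`, written with `np.any`; it is meant to behave like `check_zeros` for this purpose but is not the provided helper, and the sample arrays are made up.

```python
import numpy as np

def has_zero(x):
    """Hypothetical stand-in for check_zeros: True if any element of x equals 0."""
    return bool(np.any(np.asarray(x) == 0))

# Made-up mean scores: the first array contains a zero, the second does not.
means_with_zero = np.array([1.5, 0.0, 2.25])
means_without_zero = np.array([1.5, 0.5, 2.25])

print(has_zero(means_with_zero))
print(has_zero(means_without_zero))
```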

In [None]:
# Run this cell so you can use this helper function

def check_zeros(x):
    '''
    Given an array, x, check whether any values in x equal 0.
    Return True if any values found, else returns False.
    '''
    # np.where() checks every value in x against the condition (x == 0) and returns a tuple of indices where it was True (i.e. x was 0)
    flag = np.where(x == 0)[0]

    # Checks if there are any objects in flag (i.e. not empty)
    # If not empty, it found at least one zero so flag is True, and vice-versa.
    flag = len(flag) > 0

    return flag

In [None]:
# Define your function `detect_problems` here

def detect_problems(file_path):
  #YOUR CODE HERE: use patient_summary() to get the means and check_zeros() to check for zeros in the means

  return  #YOUR CODE HERE: return True if any mean is 0, otherwise False

| Criteria                     | Pass Criteria                                                                                                                                                                 | Fail Criteria                                                                                                         |
|------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------|
| **General Criteria**         |                                                                                                                                                                               |                                                                                                                       |
| Code Execution               | All code cells execute without errors.                                                                                                                                        | Any code cell produces an error upon execution.                                                                      |
| Code Quality                 | Code is well-organized, concise, and includes necessary comments for clarity.                                                                                                 | Code is unorganized, verbose, or lacks necessary comments.                                                            |
| Data Handling                | Data files are correctly handled and processed.                                                                                                                               | Data files are not handled or processed correctly.                                                                    |
| Adherence to Instructions    | Follows all instructions and requirements as per the assignment.                                                                                                              | Misses or incorrectly implements one or more of the assignment requirements.                                         |
| **Specific Criteria**        |                                                                                                                                                                               |                                                                                                                       |
| Setup              | Successfully downloads and extracts the data from the provided `.zip` file.                                                                                                  | Fails to download or incorrectly extracts the data files.                                                             |
| Part 1: Reading in our files | Correctly filters and lists file paths for `inflammation-xx.csv` files and reads in the first file, displaying its content.                                                   | Fails to filter out `small-xx.csv` files, or errors in reading/displaying file contents.                              |
| Part 2: Summarizing our data | Correctly defines `patient_summary()` function. Function processes data as per `operation` and outputs correctly shaped data (60 entries).                                   | Incomplete or incorrect definition of `patient_summary()`. Incorrect implementation of operation or wrong output shape.|
| Part 3: Checking for Errors  | Correctly defines `detect_problems()` function. Function uses `patient_summary()` and `check_zeros()` to identify mean inflammation of 0 accurately.                        | Incorrect definition or implementation of `detect_problems()` function. Fails to accurately identify mean inflammation of 0.|
| **Overall Assessment**       | Meets all the general and specific criteria, indicating a strong understanding of the assignment objectives.                                                                  | Fails to meet one or more of the general or specific criteria, indicating a need for further learning or clarification.|


## References

### Data Sources
- Software Carpentry. _Python Novice Inflammation Data_. http://swcarpentry.github.io/python-novice-inflammation/data/python-novice-inflammation-data.zip
