# Submodule 3: OOP Project
---------------------------------------------------------------------------
## Overview
In this project, you'll build a Python class to represent liver tissue samples collected for biomedical research. You'll practice object-oriented programming by validating real-world data, converting units, and analyzing sample quality. This task models how clean, reusable code supports meaningful scientific discovery.

## Learning/practice objectives
In this project, you will strengthen your abilities to:
- Use object-oriented programming to model and manage biological data.
- Apply data validation techniques to ensure scientific accuracy.
- Write methods that perform calculations and support data analysis.
- Build scripts that load, filter, and summarize real-world datasets.

## Prerequisites
- Submodule 1
- Submodule 2 (especially Pandas, so you can import the dataset)
- Submodule 3 Tutorials

## Getting Started

Below is a task prompt that asks you to define a new class to handle a dataset. You can attempt the task on your own or follow the guided prompts, each of which is followed by a "fill-in-the-blank" model. You can copy each of those sections, edit them, and use them to build the class and then write a script. If you get stuck, the entire solution is in the next tutorial.

-----------------------------------------

## Project: Building a LiverSample Class for Biomedical Data Processing

You've been hired as a data analyst for a biomedical research lab working on metabolic regulation in liver tissue. The lab collects and freezes liver samples from patients to study glycogen metabolism, enzyme activity, and glucose homeostasis.

Currently, the data comes in raw spreadsheets and contains errors, like incorrect pH values or missing identifiers. Your job is to streamline this data handling by designing a LiverSample class to:

- Import and validate liver sample data.
- Filter out samples with invalid pH.
- Convert blood glucose levels from mg/dL to mM.
- Store the collection date and enable future time-based analyses.

**What You Know**
Each row in the dataset represents one liver sample and includes:

- Sample ID (Format L###)
- Mass (mg)
- Freezer Location
- LDH (U/L) *that is, Units/L*
- Blood Glucose (mg/dL)
- Glycogen (mg/g)
- ALT (U/L) 
- pH
- Time Since Collection (min) *a measure of quality*
- Collection Date (YYYY-MM-DD)

## 📌 Your Task

You will create a class called **LiverSample** that:

- Takes one row of data as input during initialization.
- Validates the data:
   - Ensures pH is between 5.5 and 8.5. If not, skip or flag the sample.
   - Validates that mass is positive, and collection date is in proper YYYY-MM-DD format.
- Contains a method called convert_glucose_to_mM() to convert blood glucose:
     Formula: mM = (mg/dL) ÷ 18.0182
- Stores validated samples in a list of objects so you can filter, analyze, and summarize them later.
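
As a rough sketch of what "filter, analyze, and summarize" can look like once objects are stored in a list, the example below uses a hypothetical stand-in class named `Reading` with made-up values. It is not the LiverSample class and not the project solution; it only shows the list-of-objects pattern.

```python
class Reading:
    """Hypothetical stand-in for a validated sample object."""

    def __init__(self, label, value, valid):
        self.label = label
        self.value = value
        self.valid = valid


# Made-up readings for illustration only.
readings = [
    Reading("A", 4.0, True),
    Reading("B", 6.0, True),
    Reading("C", 99.0, False),
]

# Filter: keep objects whose valid flag is True, and record the rest by label.
kept = [r for r in readings if r.valid]
removed_labels = [r.label for r in readings if not r.valid]

# Summarize: average the kept values, guarding against an empty list.
mean_value = sum(r.value for r in kept) / len(kept) if kept else None

print(len(kept), mean_value, removed_labels)  # 2 5.0 ['C']
```

Keeping whole objects in the list (rather than only their values) means any attribute or method is still available when you summarize later.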

✅ What to create:

1. Your LiverSample class with methods and validation.
2. A script that loads all 30 rows, filters out invalid entries, and prints:
   - Number of valid samples
   - Average glucose (converted to mM)
   - List of sample IDs that were removed due to bad pH

The data file is accessed as: "./Datasets/liver_samples.csv"

Below is a guided set of tasks. Each task is followed by an optional "fill-in-the-blank" solution (the code stays hidden unless you open it).

A full solution is in the [final tutorial](Submodule_3_Tutorial_4_ProjectSolution.ipynb).

----------------------------------------------------------------

## ✅ Step 1: Define the Class and Basic Constructor
🧠 Task:
Create a class named LiverSample with an __init__ method that accepts values for:
- Sample ID
- Mass
- Freezer location
- LDH level
- Blood glucose
- Glycogen
- ALT level
- pH
- Time since collection
- Collection date

💡 Tips:

Use self.attribute_name = value inside __init__ to assign each input.

At this stage, don’t worry about validation—just store the values.

*Optional Suggestion: Use data_row (e.g., a dictionary or DataFrame row) as a single input to keep your constructor clean.*
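
The optional suggestion might look something like the sketch below. `DemoSample` is a hypothetical two-field class used only to show the single-argument pattern, and the dictionary keys are assumed to match the column names listed under "What You Know".

```python
# Hypothetical two-field class showing the single-argument pattern only.
class DemoSample:
    def __init__(self, data_row):
        # data_row may be a dict or a pandas Series; both support ["key"] lookup.
        self.sample_id = data_row["Sample ID"]
        self.pH = data_row["pH"]


# Made-up values for illustration, not taken from the dataset.
row = {"Sample ID": "L001", "pH": 7.4}
demo = DemoSample(row)
print(demo.sample_id, demo.pH)  # L001 7.4
```

The trade-off is that the constructor now depends on the exact column names, whereas the ten-argument version keeps the class independent of how the file is labeled.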

In [None]:
#create your script here

In [None]:
#fill-in-the-blank solution
class LiverSample:
    def __init__(self, sample_id, mass, freezer_location, ldh, blood_glucose, glycogen, alt, pH, time_since_collection, collection_date):
        self.sample_id = __________
        self.mass = __________
        self.freezer_location = __________
        self.ldh = __________
        self.blood_glucose = __________
        self.glycogen = __________
        self.alt = __________
        self.pH = __________
        self.time_since_collection = __________
        self.collection_date = __________


## Step 2: Add Data Validation (pH, Mass, Date)
Task: **Add a private method called _validate() that ensures:**
- pH is between 5.5 and 8.5
- Mass is a positive number
- Collection date is a valid string in the YYYY-MM-DD format

Set an attribute self.valid to True only if **all** checks pass.

💡 Tips:

1. Use datetime.strptime() to validate date format.
2. Use try/except to catch errors and return False if validation fails.
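
To illustrate the two tips together without giving away the blanks, here is a minimal sketch that applies the same pattern to a different kind of value: a 24-hour `HH:MM` time string. The helper name `is_valid_time` is hypothetical, and the module is imported as `import datetime` here, so the call is spelled `datetime.datetime.strptime`.

```python
import datetime


def is_valid_time(text):
    """Return True if text parses as a 24-hour HH:MM time (hypothetical helper)."""
    try:
        datetime.datetime.strptime(text, "%H:%M")
        return True
    except (TypeError, ValueError):
        # ValueError: text does not match the format or is out of range.
        # TypeError: text is not a string at all (for example None).
        return False


print(is_valid_time("14:30"))  # True
print(is_valid_time("25:00"))  # False
print(is_valid_time(None))     # False
```

The same structure should carry over to the collection date once you choose the format string that matches YYYY-MM-DD.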

*Expand on your class in the above code box*

In [None]:
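# Validation method fill-in, to add to the class above.
# Be sure to import datetime at the top of the cell!
from _______ import datetime

        # Add this as the last line of __init__, after all attributes are assigned:
        self.valid = self._validate()  # Check if the sample is valid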

    def _validate(self):
        try:
            if not (__________ <= self.pH <= __________):
                return False
            if __________ <= 0:
                return False
            datetime.strptime(self.collection_date, "__________")  # Check date format
            return True
        except (TypeError, ValueError):  # Wrong type or badly formatted value
            return False


## Step 3: Add a Method to Convert Glucose
Task: Create a method named convert_glucose_to_mM() that converts blood glucose from mg/dL to mM using the formula:
**mM = mg/dL ÷ 18.0182**

Tip: Use round(..., 2) to return a value with 2 decimal places.
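
As a quick check of the arithmetic, the sketch below applies the formula to a single made-up reading of 90 mg/dL. The constant name `MG_DL_PER_MM` is just an illustrative choice; the factor itself is the one given in the task.

```python
MG_DL_PER_MM = 18.0182  # conversion factor given in the task description

glucose_mg_dl = 90.0  # made-up reading for illustration
glucose_mm = round(glucose_mg_dl / MG_DL_PER_MM, 2)
print(glucose_mm)  # 4.99
```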

In [None]:
#Glucose conversion method, to add to the class definition
    def convert_glucose_to_mM(self):
        """Converts blood glucose from mg/dL to mmol/L (mM)."""
        return round(self.blood_glucose / __________, 2)


## Step 4: Build a Script to Read a Dataset and Use Your Class
Task: **Write a script that:**
1. Loads liver sample data from a CSV file.
2. Creates a LiverSample object for each row.
3. Filters out invalid samples.
4. Prints:
- Total samples
- Valid samples
- Invalid sample IDs
- Average blood glucose (in mM) of valid samples

💡 Tips:
- Use a *list* to store valid samples, starting with an empty list []
- Loop through rows with for index, row in df.iterrows() if using pandas.
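
The two tips can be tried out on a tiny in-memory table before touching the real file. In this sketch, `df_demo` and `in_range_ids` are hypothetical names and the two rows are made up; only the pH range comes from the task description.

```python
import pandas as pd

# Made-up two-row table; the column names are assumed to match the dataset.
df_demo = pd.DataFrame({"Sample ID": ["L001", "L002"], "pH": [7.4, 9.1]})

in_range_ids = []  # start with an empty list, then append inside the loop
for index, row in df_demo.iterrows():
    # row is a pandas Series, so each column is read by name.
    if 5.5 <= row["pH"] <= 8.5:
        in_range_ids.append(row["Sample ID"])

print(in_range_ids)  # ['L001']
```

In your own script, the body of the loop would build a LiverSample object from `row` instead of checking pH directly.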

In [None]:
# Suggested script: load the DataFrame df, then build a LiverSample object from each row
import pandas as pd

df = pd.read_csv("__________________")  # Load your data
valid_samples = []
invalid_samples = []

for index, row in df.iterrows():
    sample = LiverSample(
        sample_id = row["Sample ID"],
        mass = row["Mass (mg)"],
        freezer_location = row["Freezer Location"],
        ldh = row["LDH (U/L)"],
        blood_glucose = row["Blood Glucose (mg/dL)"],
        glycogen = row["Glycogen (mg/g)"],
        alt = row["ALT (U/L)"],
        pH = row["pH"],
        time_since_collection = row["Time Since Collection (min)"],
        collection_date = row["Collection Date"]
    )

    if sample.valid:
        __________.append(sample)
    else:
        __________.append(sample.sample_id)

# ✅ Calculate the average glucose in mM, without assuming there are any valid samples
if valid_samples:
    avg_glucose = sum(s.convert_glucose_to_mM() for s in valid_samples) / len(__________________)
    avg_glucose = round(avg_glucose, 2)
else:
    avg_glucose = None

# Print a summary
print("Total samples:", df.shape[0])  # number of rows loaded from the CSV
print("Valid samples:", ___(valid_samples))  # number of objects in the valid_samples list
print("Invalid samples:", ___(invalid_samples))  # number of IDs in the invalid_samples list
print(f"Average blood glucose (mM): {avg_glucose}")
print(f"Invalid Sample IDs: {________}")


## Conclusion

🎉 Congratulations! If you completed this project, you've successfully used object-oriented programming to model real-world biomedical data. You created a reusable Python class, handled real-world data validation, and performed basic analysis using custom methods. That’s a huge step toward writing clean, modular, and scalable scientific code!

You’re well on your way to becoming a powerful scientific programmer! 🚀

You should consider working through other Bioinformatics modules at the [NIGMS Sandbox](https://github.com/NIGMS/NIGMS-Sandbox).


## Clean Up

To avoid unnecessary charges, stop your compute instance. Save a copy of the tutorial first, then delete the cloud files to avoid storage charges.