# 🧪 LAB: Penalized Regression for Survival Analysis

We have already seen how regularization works in regression and classification tasks by adding a penalty to the cost function. But real-world problems are not restricted to these kinds of tasks. Survival analysis is one example.

In this lab, you will apply machine learning with penalization to survival analysis. In particular, you will learn how to use **penalization techniques** in the **Cox proportional hazards model** and apply them to survival data.
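
For reference, the idea of adding a penalty to the Cox model can be written down as follows. The notation is an assumption introduced here rather than taken from the lab materials: $x_i$ is the feature vector of patient $i$, $t_i$ the observed time, $\delta_i$ the event indicator (1 if death was observed, 0 if censored), and $R(t_i)$ the set of patients still at risk at time $t_i$. Ignoring tied event times, the Cox partial log-likelihood is

$$
\ell(\beta) = \sum_{i:\,\delta_i = 1} \left[ x_i^\top \beta - \log \sum_{j \in R(t_i)} \exp\left(x_j^\top \beta\right) \right]
$$

and an L1 (Lasso) penalized estimate minimizes the negative partial log-likelihood plus a penalty on the absolute size of the coefficients:

$$
\hat{\beta} = \arg\min_{\beta} \left\{ -\ell(\beta) + \lambda \sum_{k=1}^{p} |\beta_k| \right\}
$$

Larger values of $\lambda$ push more coefficients exactly to zero. Software implementations may rescale the likelihood term (for example by the number of samples), so the numerical value of the penalty strength is not directly comparable across libraries.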

---

**Dataset**: In this lab, you will work with a real-world dataset focused on lung squamous cell carcinoma (LUSC), a histological subtype of lung cancer. Lung cancer remains one of the leading causes of cancer-related deaths globally, and LUSC, in particular, is associated with poor prognosis and substantial biological and clinical heterogeneity.

Your goal with this dataset is to identify prognostic biomarkers that could help predict patient outcomes. You will analyze gene expression and clinical data from patients diagnosed with LUSC.

- The dataset includes a total of 473 patients with complete records.
- For each patient, you are given:
  - 378 gene expression features
  - 4 clinical covariates:  
    - Age at diagnosis  
    - Gender  
    - Smoking history  
    - Tumor stage
- The target outcome is overall survival.  
  - For patients who did not experience the event (death), survival times are right-censored at the date of last follow-up.
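
To make right-censoring concrete, the sketch below stores one (event indicator, time) pair per patient. The values are hypothetical toy data, not taken from the LUSC dataset, and the field names `event` and `time` are illustrative choices.

```python
import numpy as np

# Hypothetical toy outcomes for four patients (not from the LUSC dataset).
# event=True: death was observed; event=False: right-censored at last follow-up.
event = np.array([True, False, True, False])
time_months = np.array([12.0, 30.5, 7.2, 48.0])

# One record per patient: a boolean event flag and a follow-up time.
y = np.empty(len(event), dtype=[("event", bool), ("time", float)])
y["event"] = event
y["time"] = time_months

# For censored patients, the true survival time is only known to exceed y["time"].
n_censored = int((~y["event"]).sum())
print(y)
print("censored patients:", n_censored)
```

For the two censored patients, the recorded time is a lower bound on survival, which is why it cannot be treated as an ordinary regression target.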

This dataset provides a rich opportunity to apply penalized regression techniques to model survival outcomes, identify predictive features, and assess their clinical relevance.

---

**Software**: In this lab, you will become familiar with `scikit-survival`, a library for survival analysis built on top of `scikit-learn`.

---


**Collaboration Note**: This assignment is designed to support collaborative work. We encourage you to divide tasks among group members so that everyone can contribute meaningfully. Many components of the assignment can be approached in parallel or split logically across team members. Good coordination and thoughtful integration of your work will lead to a stronger final result.

--- 

In total, this lab assignment will be worth **100 points**.


## 1. Background Reading and Conceptual Understanding (30 points)

Familiarize yourself with survival analysis using `scikit-survival`. Visit the [User Guide](https://scikit-survival.readthedocs.io) and read **Sections 1, 2, 3, and 5**.

Then, answer the following questions in your own words:

- What makes survival analysis different from standard regression or classification tasks?  
- How should the outcome variable be formatted for survival analysis?  
- Which class in `scikit-survival` is used to fit a **Cox model with L1 (Lasso) regularization**?  
- What class can be used to **tune the regularization parameter**?  
- What metrics are available to evaluate Cox model performance in `scikit-survival`, and what functions implement them?

Please elaborate on your answers. Use as many cells as needed.

YOUR ANSWERS HERE

## 2. Data Preparation (10 points)

- Load the dataset using `pandas`.  
- Prepare the input features (`X`) and build the outcome from the columns `OS_STATUS` (event status) and `OS_MONTHS` (survival time in months).  
- Split the data into training (80%) and testing (20%) sets using `scikit-learn`.
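
The steps above might look roughly like the following sketch on a toy frame. Everything except the column names `OS_STATUS` and `OS_MONTHS` is an assumption: the gene columns, the values, and the 0/1 coding of `OS_STATUS` are hypothetical, and the real file may encode status differently. Stratifying on the event indicator is an optional choice, not a requirement of the lab.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the lab data (in practice, load it with pd.read_csv).
df = pd.DataFrame({
    "GENE_A": [0.1, 1.2, -0.3, 0.8, 0.5, -1.1, 0.9, 0.0, 1.5, -0.7],
    "GENE_B": [2.0, 1.1, 0.4, -0.2, 0.3, 1.8, -0.9, 0.6, 0.2, 1.0],
    "OS_STATUS": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # assumed coding: 1 = death, 0 = censored
    "OS_MONTHS": [12.0, 30.5, 7.2, 48.0, 15.3, 60.1, 9.8, 22.4, 5.5, 41.0],
})

# Features are all columns except the two outcome columns.
X = df.drop(columns=["OS_STATUS", "OS_MONTHS"])
event = df["OS_STATUS"].astype(bool)
time = df["OS_MONTHS"].astype(float)

# 80/20 split; stratifying keeps the share of observed events similar in both sets.
X_train, X_test, event_train, event_test, time_train, time_test = train_test_split(
    X, event, time, test_size=0.2, random_state=0, stratify=event
)
print(X_train.shape, X_test.shape)
```

Splitting `X`, the event flags, and the times in a single call keeps the rows aligned across all three.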

# YOUR CODE HERE

## 3. Model Fitting and Evaluation (30 points)

- Fit and evaluate a Cox model **without** penalization.  
- Tune, fit, and evaluate a Cox model with an **L1 (Lasso)** penalty. Identify which features were selected by the penalized model.  
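
When evaluating either model, it helps to know what a ranking-based survival metric measures. The sketch below is a simplified, plain-Python version of Harrell's concordance index, a metric commonly used for Cox models. It is an illustration under stated assumptions, not a library implementation: it ignores tied event times, and the function name and toy values are hypothetical.

```python
def concordance_index(event, time, risk):
    """Share of comparable pairs in which the patient who died earlier
    was assigned the higher predicted risk (ties in risk count as 0.5)."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if time[i] < time[j]:  # patient i died before j's last known time
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: the second patient is censored at 8 months.
event = [True, False, True, True]
time = [5.0, 8.0, 10.0, 12.0]
risk = [0.9, 0.4, 0.1, 0.2]  # higher value = predicted to die sooner

c_index = concordance_index(event, time, risk)
print(c_index)
```

A value of 1.0 means every comparable pair is ranked correctly, while 0.5 corresponds to random ordering. Library implementations additionally handle tied times and offer censoring-adjusted variants.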

# YOUR CODE HERE

## 4. Analysis, Discussion, and Conclusion (25 points)

- Discuss the performance of both models.  
- Interpret the results of variable selection from the penalized model.  
- Reflect on what the L1 penalty accomplished in terms of sparsity or interpretability.  
- Summarize key takeaways from your analysis.  
- Mention any limitations or improvements you would consider if you had more time or data.

Please elaborate on your answers. Use as many cells as needed.

YOUR ANSWERS HERE

## 5. Collaboration Reflection (5 points)

As a group, briefly reflect on the following (max 1–2 short paragraphs):

- How did the group dynamics work throughout the assignment?
- Were there any major disagreements or diverging approaches?
- How did you resolve conflicts or make final modeling decisions?
- What did you learn from each other during this project?

YOUR ANSWERS HERE