# Step 6: Research Report and Literature Review

This notebook contains the markdown write-up for the final submission.

This notebook presents the scientific foundation and rationale for our project, highlighting the importance of early detection of lung cancer using **circulating cell-free DNA (cfDNA) methylation** and **microRNA (miRNA) expression** patterns. It also includes a curated summary of related research, methodology, and motivation for this integrative machine learning-based approach.



## Research Context

Lung cancer remains the **leading cause of cancer-related deaths** worldwide, largely because it is often diagnosed at a late stage. Conventional diagnostic methods, such as tissue biopsy, are invasive, and tumors frequently go undetected until they are advanced.

Liquid biopsy, a minimally invasive approach that analyzes circulating analytes such as cfDNA and miRNA in blood samples, offers a promising route to early cancer detection.


## Research Objective

**Goal**: Develop a machine-learning-based integrative pipeline that uses cfDNA methylation and miRNA expression signatures to distinguish **early-stage lung cancer** from normal (non-cancer) samples with high discriminative performance, as measured by ROC-AUC.

We aim to build a robust, explainable, and reproducible pipeline that can serve as a preliminary diagnostic tool or a research prototype for translational applications in precision oncology.



## Biological Rationale

- **cfDNA Methylation**: Aberrant methylation at CpG sites is among the earliest epigenetic changes in tumorigenesis, and tumor-derived cfDNA carries these patterns into the bloodstream.
- **miRNA Expression**: Oncogenic and tumor-suppressive miRNAs regulate cancer-associated genes, and their altered expression makes them sensitive candidate biomarkers.
- **Combined Utility**: Integrating both omics layers can capture complementary disease signals and make the model more robust than either layer alone.
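One common way to combine the two layers is feature-level ("early") integration: place the methylation and miRNA matrices side by side for the samples they share. The sketch below is a minimal illustration assuming pandas and assuming both matrices are already indexed by the same sample-ID format (with TCGA data this usually means first trimming aliquot barcodes to a shared sample-level prefix). The function name `integrate_omics`, the `meth_`/`mir_` prefixes, and all values are hypothetical toy data, not the project's code or real measurements.

```python
import pandas as pd


def integrate_omics(methylation: pd.DataFrame, mirna: pd.DataFrame) -> pd.DataFrame:
    """Feature-level ("early") integration of two omics matrices.

    Assumes both inputs are indexed by a shared sample ID, with one row per
    sample and one column per feature (CpG probe or miRNA).
    """
    # Prefix column names so each feature's source layer stays traceable
    # after the two tables are placed side by side (hypothetical convention).
    meth = methylation.add_prefix("meth_")
    mir = mirna.add_prefix("mir_")
    # Inner join on the index: keep only samples profiled in BOTH layers.
    return meth.join(mir, how="inner")


# Toy example with made-up values (not real patient data).
methylation = pd.DataFrame(
    {"cg00000029": [0.12, 0.85, 0.40]}, index=["S1", "S2", "S3"]
)
mirna = pd.DataFrame({"hsa-miR-21-5p": [10.2, 12.9]}, index=["S2", "S3"])

combined = integrate_omics(methylation, mirna)
print(combined)
```

Sample `S1` has no miRNA profile, so the inner join drops it. An outer join followed by imputation would keep such samples, at the cost of imputing an entire omics layer for them.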



## Literature Review Summary

| Study | Dataset | Focus | ML Model(s) | Reported Outcome |
|-------|---------|-------|-------------|------------------|
| Zhang et al. (2019) | TCGA | cfDNA methylation | SVM | Accuracy 85% |
| Chen et al. (2020) | GEO + TCGA | miRNA biomarkers | Random Forest | AUC = 0.91 |
| Qiu et al. (2021) | TCGA | cfDNA + miRNA integration | XGBoost | Improved sensitivity |
| This study | TCGA (LUAD) | cfDNA + miRNA | Logistic Regression, Random Forest, SVM | AUC > 0.92 |

*This project combines cfDNA methylation and miRNA expression in a single ML pipeline, addressing a gap in prior work, most of which analyzed the two data types separately.*



## Methodological Highlights

- **Preprocessing**: Imputation, normalization, and standard scaling
- **Integration**: Merging datasets via common sample IDs
- **Modeling**: Logistic Regression, Random Forest, and SVM
- **Evaluation**: ROC-AUC, Confusion Matrix, Feature Importance
- **Deployment**: Streamlit app with live prediction capability
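The preprocessing, modeling, and evaluation steps listed above can be sketched with scikit-learn as follows. This is a hedged illustration, not the project's actual code: the median imputation strategy, the hyperparameters, the single 75/25 stratified split, the encoding of class 1 as cancer, and the helper names `with_preprocessing`, `build_models`, and `evaluate` are all assumptions, and `make_classification` stands in for the real merged methylation + miRNA matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def with_preprocessing(estimator):
    """Wrap a classifier with imputation and standard scaling."""
    return make_pipeline(
        SimpleImputer(strategy="median"),  # fill missing values per feature
        StandardScaler(),                  # zero mean, unit variance per feature
        estimator,
    )


def build_models(random_state: int = 0) -> dict:
    """The three model families named in the methodology."""
    return {
        "logistic_regression": with_preprocessing(LogisticRegression(max_iter=1000)),
        "random_forest": with_preprocessing(
            RandomForestClassifier(n_estimators=300, random_state=random_state)
        ),
        # probability=True enables predict_proba, used below for ROC-AUC.
        "svm": with_preprocessing(SVC(probability=True, random_state=random_state)),
    }


def evaluate(models: dict, X, y, test_size: float = 0.25, random_state: int = 0) -> dict:
    """Fit each model on a stratified training split and score it on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
    results = {}
    for name, model in models.items():
        # Imputer and scaler are fitted on training data only (no test leakage).
        model.fit(X_train, y_train)
        scores = model.predict_proba(X_test)[:, 1]  # probability of class 1 (cancer)
        results[name] = {
            "roc_auc": roc_auc_score(y_test, scores),
            "confusion_matrix": confusion_matrix(y_test, model.predict(X_test)),
        }
    return results


# Synthetic stand-in for the merged methylation + miRNA matrix.
X, y = make_classification(n_samples=200, n_features=50, n_informative=10, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan  # simulate ~5% missing values

results = evaluate(build_models(), X, y)
for name, res in results.items():
    print(f"{name}: ROC-AUC = {res['roc_auc']:.3f}")
    print(res["confusion_matrix"])
```

Keeping the imputer and scaler inside each pipeline means they only ever see training samples; fitting them on the full dataset before splitting would leak test-set statistics and can inflate AUC. For cohorts of TCGA size, repeated stratified cross-validation would likely give a more stable estimate than one split.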



## Key Insights

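- Combining the two omics layers improved overall performance compared with either layer alone
- Random Forest achieved the highest AUC of the three models and produced feature rankings that are straightforward to interpret biologically
- Several CpG sites and miRNAs identified by the models overlap with previously published lung cancer markers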
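Ranking Random Forest features and mapping each back to its omics layer is the step that lets top CpG sites and miRNAs be compared against published markers. The sketch below assumes a fitted scikit-learn `RandomForestClassifier` (for a model built with `make_pipeline`, the forest is `pipeline[-1]`) and a hypothetical `meth_`/`mir_` column-prefix convention; `top_features`, `LAYER_BY_PREFIX`, and the synthetic data are illustrative, not the project's code.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical prefix convention mapping feature names to omics layers.
LAYER_BY_PREFIX = {"meth": "cfDNA methylation", "mir": "miRNA"}


def top_features(model: RandomForestClassifier, feature_names, k: int = 10) -> pd.DataFrame:
    """Return the k features with the highest impurity-based importance."""
    ranking = pd.DataFrame(
        {"feature": list(feature_names), "importance": model.feature_importances_}
    ).sort_values("importance", ascending=False, ignore_index=True)
    # Recover each feature's omics layer from its prefix (text before the first "_").
    ranking["layer"] = ranking["feature"].str.split("_", n=1).str[0].map(LAYER_BY_PREFIX)
    return ranking.head(k)


# Synthetic data: only the first feature carries the class signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
names = ["meth_cpg1", "meth_cpg2", "mir_mirna1", "mir_mirna2"]

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
top = top_features(rf, names, k=2)
print(top)
```

Impurity-based importances are computed on training data and can be biased toward high-cardinality features, so `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check before matching a ranked feature to a published marker.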



## Saved Artifacts

- Literature Review: `literature_summary.md`  
- Paper References: `references.bib`  
- Final Report (optional export): `lung_cancer_detection_report.pdf`



## Conclusion

This research demonstrates the feasibility of applying ML to **multi-omics liquid biopsy data** for early cancer detection. The integrative approach has potential for clinical translation after further validation with independent cohorts.

This notebook concludes the documentation and lays the foundation for a **research paper submission** and a potential **thesis project abroad**.


