In [None]:
from IPython.display import display, HTML
from datetime import datetime

# Define the notebook title
Notebook_title = "scRNA Lethal COVID19 Analysis"

# Get the current date
current_date = datetime.now().strftime("%B %d, %Y")

# Create the HTML string with title, date, and author
html_content = f"""
<h1 style="text-align:center;">{Notebook_title}</h1>
<br/>
<h3 style="text-align:left;">MikiasHWT</h3>
<h3 style="text-align:left;">{current_date}</h3>
"""

# Display the HTML content in the output
display(HTML(html_content))


# Background
As of November 2024, the [World Health Organization](https://data.who.int/dashboards/covid19/cases) had reported 777 million confirmed cases of coronavirus disease 2019 (COVID-19) worldwide (103 million in the US), with over 7 million deaths (1.2 million in the US). COVID-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has had a profound impact on global health, economies, and societies. In this project, I aim to replicate the analyses from the paper ["A molecular single-cell lung atlas of lethal COVID-19"](https://www.nature.com/articles/s41586-021-03569-1#data-availability).

The original paper provides an in-depth examination of the cellular and molecular alterations in the lungs of individuals who died of COVID-19. The authors used single-nucleus RNA sequencing to analyze lung tissue from 19 such patients (12M, 7F, median age 72) alongside biopsy or resection samples from 7 pre-pandemic controls (4M, 3F, median age 70).

## Objectives
By replicating the various analyses performed by the original authors, I intend to recapitulate the original findings and further explore the pathophysiology of lethal COVID-19. 

Specifically, I will:
- Process single-cell RNA sequencing data using sensible quality control metrics.
- Cluster and integrate immune cell populations across healthy and COVID-19 samples.
- Identify and label immune cells using gene expression profiles and activation states.
- Characterize differences in cell infiltration, proportions, and activation states between healthy and COVID-19 samples.

The ultimate goal of this project is to enhance my understanding of the cellular and molecular mechanisms underlying severe COVID-19.

## Data Source
The data were made publicly available in the Gene Expression Omnibus under accession [GSE171524](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE171524).
I downloaded the TAR file to a local folder and extracted the CSV files it contains using 7-Zip.
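
The extraction step can also be run from inside the notebook, which keeps the workflow reproducible. The following is a minimal sketch that uses Python's standard `tarfile` module instead of 7-Zip; the helper name `extract_tar` is hypothetical and not part of the original workflow.

```python
import os
import tarfile


def extract_tar(tar_path, dest_dir):
    """Extract every member of a TAR archive into dest_dir.

    Hypothetical helper (not part of the original workflow).
    Returns the sorted names of the archive members.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Newer Python versions provide extraction filters that reject unsafe
    # members (absolute paths, links pointing outside dest_dir); use the
    # "data" filter when this interpreter supports it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tar_path) as archive:
        names = sorted(archive.getnames())
        archive.extractall(dest_dir, **kwargs)
    return names
```

For example, `extract_tar(os.path.join("data", "GSE171524_RAW.tar"), os.path.join("data", "GSE171524_RAW"))` would unpack the archive; these paths are assumed from the `data` directory listing shown later in the notebook.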

## Tissue Collection
Tissue samples from lethal COVID-19 patients were collected, with consent, from New York Presbyterian Hospital or Columbia University Medical Center. Samples were tested for COVID-19 using reverse transcription quantitative polymerase chain reaction (RT-qPCR), and regions of interest were selected based on pathological review of adjacent Haematoxylin and Eosin (H&E) stained, Formalin-Fixed Paraffin-Embedded (FFPE) slides. Samples of 1 cm³ were snap frozen in liquid nitrogen, embedded in Optimal Cutting Temperature (OCT) compound, and stored in -80 °C freezers until processing.

## Tissue Processing
In preparation for single-nucleus RNA sequencing, tissues were processed as follows:
- Rinsed of OCT in ice-cold Phosphate Buffered Saline (PBS).
- Mechanically dissociated with fine scissors and pipettes in a buffer containing Tween surfactant (and RNase inhibitor) to aid dissociation of cells and extraction of nuclei.
- Washed in a Tris salt-containing buffer, filtered through 70 µm cell strainers, pelleted at 500 g, and resuspended in an appropriate volume of Tris-buffered solution.
- Nuclei were counted by a second investigator not involved in tissue processing; 15,000-20,000 nuclei were then loaded per channel on a Chromium controller using Chromium Next GEM Single Cell 3ʹ v3.1 reagents.


## Library Preparation & Sequencing 

Chromium Next GEM Single Cell 3ʹ v3.1 reagents were used to prepare single-nucleus RNA-seq libraries, largely following the manufacturer's recommendations. One additional cDNA amplification cycle was included to account for the lower RNA yield from nuclei compared with whole-cell RNA extractions. RNA libraries and cDNA were quantified using D1000 TapeStation and the Qubit HS DNA quantification kit. Finally, the libraries were pooled in an equimolar mixture and sequenced on a NovaSeq 6000 with an S4 flow cell, using paired-end, single-index sequencing.

## Data Preprocessing

- Raw 3' scRNA-seq data were demultiplexed using Cell Ranger (v5.0).
- Transcripts were aligned to a human reference genome (GRCh38) with the SARS-CoV-2 genome appended.
- Ambient RNA was removed using CellBender (v0.2.0).
- Expression matrices were processed using Seurat (v3.2.3).
- Filters were applied to keep nuclei with:
    - 200-7500 genes.
    - 400-40,000 Unique Molecular Identifiers (UMIs).
    - <10% mitochondrial reads.
- Scrublet was applied with an expected doublet rate of 4-9.6% to remove nuclei doublets.
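
The three nucleus-level filters above can be sketched in plain Python. This only illustrates the thresholds and is not the original authors' Seurat code; the `NucleusQC` type, its field names, the `passes_qc` helper, and the toy values are all assumptions, as is treating the gene and UMI ranges as inclusive.

```python
from typing import NamedTuple


class NucleusQC(NamedTuple):
    """Per-nucleus QC metrics (field names are illustrative)."""
    n_genes: int     # number of genes detected
    n_umis: int      # total UMI count
    pct_mito: float  # percentage of reads from mitochondrial genes


def passes_qc(nucleus):
    """Return True if a nucleus meets all three thresholds listed above."""
    return (
        200 <= nucleus.n_genes <= 7500       # assumed inclusive bounds
        and 400 <= nucleus.n_umis <= 40_000  # assumed inclusive bounds
        and nucleus.pct_mito < 10
    )


# Toy values for illustration only, not real measurements
nuclei = [
    NucleusQC(n_genes=1500, n_umis=3200, pct_mito=2.5),    # passes
    NucleusQC(n_genes=150, n_umis=500, pct_mito=1.0),      # too few genes
    NucleusQC(n_genes=3000, n_umis=52_000, pct_mito=3.0),  # too many UMIs
    NucleusQC(n_genes=900, n_umis=1800, pct_mito=14.0),    # high mitochondrial %
]

kept = [n for n in nuclei if passes_qc(n)]
print(f"Kept {len(kept)} of {len(nuclei)} nuclei")  # Kept 1 of 4 nuclei
```

The same comparisons can be applied column-wise to a pandas DataFrame of per-nucleus metrics to build a boolean mask.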

# Prep Workplace

## Import Libraries

In [2]:
import numpy as np
import pandas as pd
import os 

## Directories

In [3]:
# Define directories 
datDir = os.path.abspath("data")
outDir = os.path.abspath("output")

# List their contents. 
for path in [datDir, outDir]:
    # os.makedirs(path, exist_ok=True)   # Optional: create directories if they don't exist
    print(f"Contents of {path}:")
    print("\n".join(os.listdir(path)) or "Directory is empty", "\n")

Contents of c:\Users\Owner\Documents\GitHub\scRNA_Lethal_Covid19_Analysis\data:
GSE171524_RAW
GSE171524_RAW.tar
Supplementary_Table1_Clinical_Information.xlsx 

Contents of c:\Users\Owner\Documents\GitHub\scRNA_Lethal_Covid19_Analysis\output:
Directory is empty 



## Data Source

# Import Data

# Wrangle Data

explain

# Explore Data

explain

# Analyze Data

explain

# Conclusions

## Discoveries

explain

## Future Directions

explain

# End

## Show Session Information

In [4]:
import session_info
session_info.show()

## Save Session Requirements

In [5]:
# Replace spaces in notebook title with underscores
filename = Notebook_title.replace(" ", "_") + "_requirements.txt"

# Run the pip freeze command and save the output txt file
!pip freeze > $filename