### What is single-cell sequencing?
Single-cell sequencing examines the nucleic acid sequence information from individual cells with optimized next-generation sequencing technologies, providing a higher resolution of cellular differences and a better understanding of the function of an individual cell in the context of its microenvironment.

### Benefits of using single-cell sequencing
Single-cell sequencing has several advantages, including:

1. **High Resolution:** Enables the study of individual cells, providing a detailed understanding of cellular heterogeneity within a population.

2. **Precise Insights:** Unravels rare cell types or subpopulations that might be overlooked in traditional bulk sequencing methods.

3. **Accurate Profiling:** Allows for accurate characterization of cellular functions, gene expression, and genetic variations at the single-cell level.

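4. **Dynamic Processes:** Helps reconstruct dynamic changes within a cell population, for example by sampling several time points or by ordering cells along an inferred trajectory, providing insights into cellular transitions and responses over time.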

5. **Clinical Relevance:** Has potential applications in personalized medicine, diagnostics, and understanding disease mechanisms at the individual cell level.

6. **Reduced Averaging Effects:** Eliminates the averaging effects seen in bulk sequencing, providing a clearer picture of the diverse molecular landscape within a sample.

7. **Cellular Heterogeneity:** Facilitates the identification of subtle differences between seemingly similar cells, enhancing our understanding of complex biological systems.

8. **Discovery of Novel Biomarkers:** Unveils new biomarkers and therapeutic targets by uncovering variations in gene expression and genomic profiles within individual cells.


### Difference between bulk and single-cell sequencing

**Table: Bulk Sequencing vs. Single-Cell Sequencing**

| Aspect                            | Bulk Sequencing                                       | Single-Cell Sequencing                                |
|-----------------------------------|-------------------------------------------------------|-------------------------------------------------------|
| **Resolution**                    | Provides an average from a population of cells         | Captures information at the individual cell level      |
| **Heterogeneity**                  | Masks cellular heterogeneity                           | Reveals cellular diversity and rare cell types         |
| **Insights into Rare Cells**      | May miss rare cells or variations                      | Enables detection and analysis of rare cells           |
| **Dynamic Processes**             | Provides a static snapshot of the entire population    | Helps reconstruct dynamic changes from per-cell snapshots |
| **Clinical Applications**         | General profiling in large populations                | Applicable to personalized medicine and diagnostics    |

**Chart: Bulk vs. Single-Cell Sequencing**

```plaintext
 100% 
  90% 
  80%                         Bulk Sequencing
  70% 
  60% 
  50% 
  40%                                   Single-Cell Sequencing
  30% 
  20% 
  10% 
   0% 
```

The chart is schematic rather than quantitative: it carries no measured values and only contrasts the population-averaged readout of bulk sequencing with the per-cell resolution of single-cell sequencing.
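
The averaging effect can be illustrated with a small simulation. The sketch below uses made-up numbers (a single hypothetical marker gene, 990 common cells, and 10 rare cells) and is only meant to show why a population average can hide a rare subpopulation that per-cell values still reveal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression of one marker gene in 1,000 cells (made-up numbers):
# 990 common cells express it at a low level, 10 rare cells at a high level.
common = rng.normal(loc=1.0, scale=0.2, size=990)
rare = rng.normal(loc=20.0, scale=1.0, size=10)
cells = np.concatenate([common, rare])

# A bulk measurement reports roughly one value per gene: the population average.
bulk_signal = cells.mean()

# Single-cell data keeps one value per cell, so the rare group stays visible.
n_high = int((cells > 10.0).sum())

print(f"bulk average: {bulk_signal:.2f}")
print(f"cells above threshold: {n_high} of {cells.size}")
```

The bulk average stays close to the level of the common cells, while the per-cell values make the small high-expressing group easy to count.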

### The Process of Single-cell Sequencing
Single-cell sequencing is a powerful technique used in genomics to study individual cells, providing insights into the genetic information of each cell in a complex biological sample. Here's a simplified explanation of the process:

1. **Cell Isolation:**
   - Start by collecting a sample of cells from the organism or tissue you're studying. This could be blood, tissue, or any other type of sample.

2. **Single-Cell Capture:**
   - To study individual cells, you need to isolate them from the rest of the sample. This can be done using various techniques, such as microfluidics or droplet-based methods, to capture and separate each cell.

3. **Cell Lysis:**
   - Once you have isolated individual cells, you break open (lyse) each cell to release its genetic material (DNA or RNA). This step is essential to access the genetic information inside the cell.

4. **Amplification:**
   - The amount of genetic material (DNA or RNA) in a single cell is very small, so it must be amplified (copied) to generate enough material for analysis. RNA is first reverse-transcribed into complementary DNA (cDNA); polymerase chain reaction (PCR) is then often used for amplification.

5. **Library Preparation:**
   - The amplified genetic material is then prepared as a library. This involves tagging the genetic material with unique identifiers, allowing you to trace it back to the individual cell it came from during analysis.

6. **Sequencing:**
   - The prepared libraries are then subjected to high-throughput sequencing machines. These machines read the genetic code of each DNA or RNA fragment, providing a massive amount of data.

7. **Data Analysis:**
   - The sequencing data is then analyzed computationally. Bioinformatic tools help interpret the data, identifying genes that are active or inactive in each individual cell. Researchers can gain insights into the diversity of cell types, gene expression patterns, and potential differences between cells.

8. **Biological Insights:**
   - Finally, the results of single-cell sequencing provide a detailed understanding of the heterogeneity within a population of cells. Researchers can discover rare cell types, identify changes in gene expression under different conditions, and gain insights into various biological processes.

Single-cell sequencing has revolutionized our ability to study cellular diversity, understand complex biological systems, and uncover new insights into health and disease. It allows scientists to go beyond average measurements and examine the specific characteristics of individual cells within a population.
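
The role of the unique identifiers from the Library Preparation step can be sketched in a few lines. The example below is a simplified illustration, not any specific pipeline: the reads, barcodes, and gene assignments are hypothetical, and the per-molecule tag (often called a UMI) is an assumption added here to show how amplification duplicates might be collapsed.

```python
from collections import Counter, defaultdict

# Hypothetical aligned reads: (cell barcode, molecule tag, gene).
reads = [
    ("AAAC", "U1", "CD3E"),
    ("AAAC", "U1", "CD3E"),  # amplification duplicate of the read above
    ("AAAC", "U2", "CD3E"),
    ("AAAC", "U3", "LYZ"),
    ("TTTG", "U1", "LYZ"),
    ("TTTG", "U4", "LYZ"),
]


def count_molecules(reads):
    """Return {barcode: Counter({gene: number of unique molecules})}."""
    seen = set()
    counts = defaultdict(Counter)
    for barcode, tag, gene in reads:
        key = (barcode, tag, gene)
        if key in seen:  # same molecule copied more than once
            continue
        seen.add(key)
        counts[barcode][gene] += 1
    return counts


counts = count_molecules(reads)
for barcode in sorted(counts):
    print(barcode, dict(counts[barcode]))
```

Grouping by barcode is what turns a pool of sequenced fragments back into one expression profile per cell; the resulting per-cell gene counts are the kind of matrix that the analysis below starts from.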

### Preparation

In [7]:
# import the libraries
import numpy as np # linear algebra, matrix manipulation
import scanpy as sc # single-cell analysis toolkit
import pandas as pd # tabular data analysis
import os # file system paths and directories
import anndata # annotated data matrices; reads and writes .h5ad files


In [8]:
# Set logging verbosity: errors (0), warnings (1), info (2), hints (3); this does not suppress warnings
sc.settings.verbosity = 3
sc.logging.print_header()
sc.settings.set_figure_params(dpi = 1200, facecolor='white')

scanpy==1.9.6 anndata==0.10.3 umap==0.5.5 numpy==1.26.3 scipy==1.11.4 pandas==2.1.4 scikit-learn==1.3.2 statsmodels==0.14.1 pynndescent==0.5.11


In [9]:
results_file = 'write/pbmc3k.h5ad'  # the file that will store the analysis results
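
Because `results_file` points into a `write/` subdirectory, it can help to make sure that directory exists before anything is saved. The helper below is a hypothetical convenience function (not part of scanpy or anndata) that uses only the standard library; whether the writing routine creates missing directories on its own may depend on the library version, so creating it beforehand is a harmless precaution.

```python
import os


def ensure_parent_dir(path):
    """Create the directory that will contain `path`, if it is missing."""
    parent = os.path.dirname(path)
    if parent:  # a bare file name has no parent directory to create
        os.makedirs(parent, exist_ok=True)
    return parent


results_file = 'write/pbmc3k.h5ad'
ensure_parent_dir(results_file)
```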

In [10]:
# read the dataset from an .h5ad file into an AnnData object
adata = anndata.read_h5ad('C:/Users/Farhan/Documents/NextGenSequencing/data/pbmc3k_processed.h5ad')
# convert the expression matrix to a pandas DataFrame (rows: cells, columns: genes)
adata_df = adata.to_df()
adata_df



| index | TNFRSF4 | CPSF3L | ATAD3C | C1orf86 | RER1 | TNFRSF25 | TNFRSF9 | CTNNBIP1 | SRM | UBIAD1 | ... | DSCR3 | BRWD1 | BACE2 | SIK1 | C21orf33 | ICOSLG | SUMO3 | SLC19A1 | S100B | PRMT2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AAACATACAACCAC-1 | -0.171470 | -0.280812 | -0.046677 | -0.475169 | -0.544024 | 4.928495 | -0.038028 | -0.280573 | -0.341788 | -0.195361 | ... | -0.226570 | -0.236269 | -0.102943 | -0.222116 | -0.312401 | -0.121678 | -0.521229 | -0.098269 | -0.209095 | -0.531203 |
| AAACATTGAGCTAC-1 | -0.214582 | -0.372653 | -0.054804 | -0.683391 | 0.633951 | -0.334837 | -0.045589 | -0.498264 | -0.541914 | -0.209017 | ... | -0.317530 | 2.568866 | 0.007155 | -0.445372 | 1.629285 | -0.058662 | -0.857164 | -0.266844 | -0.313146 | -0.596654 |
| AAACATTGATCAGC-1 | -0.376887 | -0.295084 | -0.057528 | -0.520972 | 1.332647 | -0.309362 | -0.103108 | -0.272526 | -0.500798 | -0.220228 | ... | -0.302938 | -0.239801 | -0.071774 | -0.297857 | -0.410920 | -0.070431 | -0.590721 | -0.158656 | -0.170876 | 1.379000 |
| AAACCGTGCTTCCG-1 | -0.285241 | -0.281735 | -0.052227 | -0.484929 | 1.572679 | -0.271825 | -0.074552 | -0.258876 | -0.416752 | -0.208471 | ... | -0.262978 | -0.231807 | -0.093818 | -0.247770 | 2.552078 | -0.097402 | 1.631685 | -0.119462 | -0.179120 | -0.505670 |
| AAACCGTGTATGCG-1 | -0.256483 | -0.220394 | -0.046800 | -0.345859 | -0.333409 | -0.208122 | -0.069514 | 5.806442 | -0.283112 | -0.199355 | ... | -0.202237 | -0.176765 | -0.167350 | -0.098665 | -0.275836 | -0.139482 | -0.310096 | -0.006877 | -0.109614 | -0.461946 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| TTTCGAACTCTCAT-1 | -0.290368 | 2.638305 | -0.054510 | -0.554385 | -0.666646 | -0.301333 | -0.074079 | -0.334131 | -0.478076 | -0.211993 | ... | 2.865903 | -0.259984 | -0.057421 | -0.320982 | -0.396917 | -0.078147 | 1.232168 | -0.174593 | -0.216714 | -0.529870 |
| TTTCTACTGAGGCA-1 | -0.386343 | 2.652696 | -0.058686 | -0.545443 | 1.201865 | -0.321670 | -0.105418 | -0.296857 | 1.803535 | -0.222323 | ... | -0.314890 | -0.249159 | -0.058679 | -0.324692 | -0.427120 | -0.062188 | -0.630103 | -0.178990 | -0.181736 | -0.502022 |
| TTTCTACTTCCTCG-1 | -0.207089 | -0.250464 | -0.046397 | -0.409737 | 2.193953 | -0.221747 | -0.051566 | -0.198130 | -0.307756 | -0.196557 | ... | -0.212130 | -0.206711 | 10.000000 | -0.158643 | 3.308512 | -0.132098 | 2.264174 | -0.051144 | -0.161064 | 2.041497 |
| TTTGCATGAGAGGC-1 | -0.190328 | -0.226334 | -0.043999 | -0.354661 | -0.350005 | -0.195177 | -0.047832 | -0.142079 | -0.251677 | -0.192347 | ... | -0.186529 | -0.185312 | -0.165108 | -0.098862 | -0.256393 | -0.149789 | -0.325824 | -0.005918 | -0.135213 | -0.482111 |
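
The values in this table are mostly small negative numbers with occasional large positive ones, and one entry is exactly 10.000000. That pattern is what one would expect if each gene had been standardized (z-scored) and capped at 10 during preprocessing; this is an assumption about how the file was produced, not something the notebook states. The sketch below shows the idea with NumPy on a made-up matrix; it is an illustration of the technique, not scanpy's implementation.

```python
import numpy as np


def scale_and_clip(X, max_value=10.0):
    """Z-score each column (gene) and cap values at `max_value`."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    Z = (X - mean) / std
    return np.minimum(Z, max_value)


# Toy matrix: 200 cells x 2 genes (made-up values).
X = np.zeros((200, 2))
X[0, 0] = 50.0  # one cell with very high expression of gene 0
X[:, 1] = np.linspace(0.0, 1.0, 200)

Z = scale_and_clip(X)
print("largest scaled value:", Z.max())
print("scaled value of the outlier cell:", Z[0, 0])
```

Capping limits the influence of a few extreme cells on downstream steps such as PCA, at the cost of flattening the largest values.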


In [11]:
adata.var_names_make_unique()  # this is unnecessary if using `var_names='gene_ids'` in `sc.read_10x_mtx`

In [12]:
adata  # display a summary of the loaded AnnData object

AnnData object with n_obs × n_vars = 2638 × 1838
    obs: 'n_genes', 'percent_mito', 'n_counts', 'louvain'
    var: 'n_cells'
    uns: 'draw_graph', 'louvain', 'louvain_colors', 'neighbors', 'pca', 'rank_genes_groups'
    obsm: 'X_pca', 'X_tsne', 'X_umap', 'X_draw_graph_fr'
    varm: 'PCs'
    obsp: 'distances', 'connectivities'
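
In this summary, `obs` holds per-cell annotations and `var` holds per-gene annotations. Per-cell columns such as `louvain` can be summarized with ordinary pandas operations. The sketch below uses a small stand-in DataFrame with the same column names but made-up values and hypothetical cluster labels, so it runs without the dataset; with the real object, the same calls would presumably be applied to `adata.obs`.

```python
import pandas as pd

# Stand-in for `adata.obs`: one row per cell, same column names as above,
# but with made-up values and hypothetical cluster labels.
obs = pd.DataFrame(
    {
        "n_genes": [781, 1352, 1131, 960, 522, 845],
        "percent_mito": [0.030, 0.038, 0.009, 0.017, 0.012, 0.021],
        "n_counts": [2419.0, 4903.0, 3147.0, 2639.0, 980.0, 2163.0],
        "louvain": ["cluster_a", "cluster_b", "cluster_a",
                    "cluster_c", "cluster_b", "cluster_a"],
    },
    index=[f"cell_{i}" for i in range(6)],
)

# Number of cells per cluster.
cluster_sizes = obs["louvain"].value_counts()

# Average quality-control metrics per cluster.
mean_qc = obs.groupby("louvain")[["n_genes", "percent_mito"]].mean()

print(cluster_sizes)
print(mean_qc)
```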