# Module 1: Vectors & Gene Expression

This notebook is part of the *Linear Algebra for Omics Data Science* series. It introduces vectors in the context of **gene expression**, with practical examples and biological motivation for data scientists exploring omics.


## 🧬 What is Gene Expression?

Gene expression is the process by which information from a gene is used to produce a functional product, such as a protein. Although nearly every cell in the human body carries the same DNA, each cell expresses a different subset of genes depending on its function and environment.

In omics, gene expression is quantified as the number of RNA transcripts produced from each gene, often measured across many cells. These expression levels can be stored as vectors and matrices, allowing data scientists to analyze biological patterns using linear algebra.


## What is a Vector?
A vector is an ordered list of numbers. In the context of gene expression analysis, a vector often represents the expression levels of a single gene across several cells, with one entry per cell.

In [None]:

import numpy as np

# Gene expression values for GeneA and GeneB across 5 cells
geneA_expr = np.array([2.1, 3.5, 1.8, 4.0, 2.9])
geneB_expr = np.array([1.2, 3.3, 2.4, 3.8, 3.0])
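
As a minimal sketch of how expression vectors become a matrix, the two gene vectors can be stacked as rows. The name `expr_matrix` and the genes-by-cells orientation are assumptions made for illustration; some tools store the transpose (cells by genes).

```python
import numpy as np

# Same illustrative values as in the cell above
geneA_expr = np.array([2.1, 3.5, 1.8, 4.0, 2.9])
geneB_expr = np.array([1.2, 3.3, 2.4, 3.8, 3.0])

# Stack the gene vectors as rows: the shape is (genes, cells)
expr_matrix = np.vstack([geneA_expr, geneB_expr])
print("Shape (genes, cells):", expr_matrix.shape)

# A row is one gene's profile across cells;
# a column is one cell's profile across genes
print("GeneA across cells:", expr_matrix[0])
print("Cell 0 across genes:", expr_matrix[:, 0])
```

Each row of the matrix is a gene vector and each column is a cell vector, so the same numbers can be read either way depending on the question being asked.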


## Visualizing Gene Expression

In [None]:

import matplotlib.pyplot as plt

plt.plot(geneA_expr, marker='o', label='GeneA')
plt.plot(geneB_expr, marker='s', label='GeneB')
plt.title("GeneA and GeneB Expression Across Cells")
plt.xlabel("Cell Index")
plt.ylabel("Expression Level")
plt.legend()
plt.grid(True)
plt.show()



## Why Add Vectors?

Adding two gene expression vectors element by element gives, for each cell, the combined expression of the two genes. This sum can serve as a simple score for the overall activity of a pathway or co-regulated group, which is useful when studying genes that function together biologically.


In [None]:

sum_expr = geneA_expr + geneB_expr
print("Combined Expression (GeneA + GeneB):", sum_expr)
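
To make the element-wise nature of vector addition explicit, the sketch below compares NumPy's `+` with a manual per-cell sum. The names `manual_sum` and `mean_expr` are illustrative, and averaging is only one possible way to keep a combined score on the scale of a single gene.

```python
import numpy as np

geneA_expr = np.array([2.1, 3.5, 1.8, 4.0, 2.9])
geneB_expr = np.array([1.2, 3.3, 2.4, 3.8, 3.0])

# NumPy adds arrays of the same length entry by entry
sum_expr = geneA_expr + geneB_expr

# The same result computed one cell at a time
manual_sum = np.array([a + b for a, b in zip(geneA_expr, geneB_expr)])
print("Element-wise match:", np.allclose(sum_expr, manual_sum))

# Dividing by the number of genes gives a per-cell average
mean_expr = sum_expr / 2
print("Mean Expression (GeneA, GeneB):", mean_expr)
```

Both vectors must have one entry per cell, in the same cell order, for the sum to be meaningful.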



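## Dot Product

The dot product multiplies two vectors element by element and sums the results, so it reflects both **magnitude** and **alignment**. In gene expression, a large dot product suggests that two genes are highly expressed in the same cells. Because the value grows with magnitude, a large dot product can also come from high expression levels alone, even when the patterns differ.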


In [None]:

dot_product = np.dot(geneA_expr, geneB_expr)
print("Dot Product:", dot_product)
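
The sketch below unpacks what `np.dot` computes for two 1-D arrays and shows its sensitivity to magnitude. The names `manual_dot` and `scaled_dot` are illustrative, and the factor of 2 is an arbitrary choice.

```python
import numpy as np

geneA_expr = np.array([2.1, 3.5, 1.8, 4.0, 2.9])
geneB_expr = np.array([1.2, 3.3, 2.4, 3.8, 3.0])

# Multiply per cell, then sum over cells
elementwise = geneA_expr * geneB_expr
manual_dot = elementwise.sum()

dot_product = np.dot(geneA_expr, geneB_expr)
print("Manual vs np.dot match:", np.isclose(manual_dot, dot_product))

# Doubling GeneA's expression doubles the dot product,
# even though GeneA's pattern across cells is unchanged
scaled_dot = np.dot(2 * geneA_expr, geneB_expr)
print("Original:", dot_product, "After scaling GeneA:", scaled_dot)
```

This scale dependence is why the dot product alone cannot separate "expressed in the same pattern" from "expressed at high levels".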



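## Cosine Similarity

Cosine similarity is the dot product divided by the product of the two vectors' lengths (norms). It measures **pattern similarity** while ignoring scale (magnitude), and answers: _"Do these genes behave similarly across cells, even if their expression levels differ?"_ For non-negative expression values it ranges from 0 (the genes are never expressed in the same cells) to 1 (identical patterns up to a scaling factor).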


In [None]:

cos_sim = np.dot(geneA_expr, geneB_expr) / (np.linalg.norm(geneA_expr) * np.linalg.norm(geneB_expr))
print("Cosine Similarity:", cos_sim)
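
As a rough check that cosine similarity ignores scale, the sketch below wraps the formula in a helper and rescales one gene. The function name `cosine_similarity` is a hypothetical helper defined here, not a NumPy function, and the factor of 10 is arbitrary.

```python
import numpy as np

geneA_expr = np.array([2.1, 3.5, 1.8, 4.0, 2.9])
geneB_expr = np.array([1.2, 3.3, 2.4, 3.8, 3.0])

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

cos_sim = cosine_similarity(geneA_expr, geneB_expr)

# Scaling GeneA by 10 changes the dot product but not the cosine similarity
cos_sim_scaled = cosine_similarity(10 * geneA_expr, geneB_expr)

print("Cosine Similarity:", cos_sim)
print("After scaling GeneA by 10:", cos_sim_scaled)
```

One caveat: a gene with zero expression in every cell has a norm of zero, so the division is undefined and such genes need to be filtered or handled separately.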



## Summary

| Metric            | What it Measures          | Magnitude-Sensitive? | Use Case                         |
|-------------------|----------------------------|-----------------------|----------------------------------|
| Dot Product       | Magnitude + Pattern        | ✅ Yes                | Detect strong co-expression      |
| Cosine Similarity | Direction/Pattern Only     | ❌ No                 | Detect similar expression trends |



## 🧪 Exercises

1. Add a third gene vector and compare it with GeneA and GeneB.
2. Visualize all three gene expression profiles in a single plot.
3. Compute cosine similarities between all pairs.
