## **Vectors in Data Science**

### **What is a Vector?**  
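A vector is an ordered collection of numbers (scalars) that can represent a point in space or a quantity with both magnitude and direction. Mathematically, an n-dimensional vector is an element of $\mathbb{R}^n$: an ordered list of $n$ real numbers, usually stored as a **one-dimensional array of length n**.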

Example:  
$$
v = \begin{bmatrix} 3 \\ 4 \\ 5 \end{bmatrix}
$$
This is a **3D vector**, representing a point in three-dimensional space.

---

## **Applications of Vectors in Data Science**

### **1. Feature Representation in Machine Learning**
- In supervised learning, data points are often represented as vectors.
- Each feature in the dataset corresponds to one component (dimension) of the vector.
- Example: A house price prediction model may use a feature vector:  
  $$
  v = \begin{bmatrix} \text{size} \\ \text{bedrooms} \\ \text{age} \end{bmatrix} = \begin{bmatrix} 2000 \\ 3 \\ 15 \end{bmatrix}
  $$
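
A minimal sketch of turning a raw record into the feature vector above, assuming records arrive as Python dicts. The `FEATURES` list and the `to_feature_vector` helper are hypothetical names. The key point is that every record must be mapped to the same component order, otherwise the same dimension would mean different things for different houses.

```python
# Hypothetical fixed feature order: component i of every vector means the same feature.
FEATURES = ["size", "bedrooms", "age"]

def to_feature_vector(record: dict) -> list[float]:
    """Map a house record to a feature vector using the fixed FEATURES order."""
    return [float(record[name]) for name in FEATURES]

# Dict key order does not matter; FEATURES decides the component order.
house = {"age": 15, "size": 2000, "bedrooms": 3}
v = to_feature_vector(house)
print(v)  # [2000.0, 3.0, 15.0]
```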

### **2. Word Embeddings in NLP**
- Words or phrases can be represented as vectors in high-dimensional space.
- Models such as Word2Vec and GloVe learn a fixed vector for each word, while BERT produces contextual vectors that depend on the surrounding text.
- Example: The word "king" might be represented as a 300-dimensional vector.
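
A static embedding (the Word2Vec/GloVe style) can be pictured as a lookup table: a vocabulary maps each word to a row of an embedding matrix. The sketch below uses a made-up 3-word vocabulary and made-up 4-dimensional vectors purely for illustration; real learned vectors come from training on large text corpora and often have a few hundred dimensions. `vocab`, `embedding_matrix`, and `embed` are hypothetical names.

```python
# Hypothetical vocabulary: word -> row index in the embedding matrix.
vocab = {"king": 0, "queen": 1, "apple": 2}

# Toy 4-dimensional embedding matrix, one row per word.
# Values are invented for illustration, not taken from any trained model.
embedding_matrix = [
    [0.8, 0.6, 0.1, 0.0],  # king
    [0.7, 0.7, 0.1, 0.1],  # queen
    [0.0, 0.1, 0.9, 0.4],  # apple
]

def embed(word: str) -> list[float]:
    """Return the word's vector; raises KeyError for out-of-vocabulary words."""
    return embedding_matrix[vocab[word]]

def embed_tokens(tokens: list[str]) -> list[list[float]]:
    """Map a tokenized sentence to a list of word vectors."""
    return [embed(t) for t in tokens]

sentence_vectors = embed_tokens(["king", "queen"])
```

Note that this table gives "king" the same vector in every sentence; a contextual model such as BERT would not.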

### **3. Image Processing & Computer Vision**
- An image can be represented as a vector by flattening its grid of pixel values, for example row by row.
- Example: A grayscale **28×28** image becomes a **784-dimensional vector** (28 × 28 = 784), one component per pixel.
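
A minimal sketch of row-major flattening, assuming the image is stored as a list of rows of pixel intensities. The synthetic image and the helper names are hypothetical. Pixel `(r, c)` ends up at index `r * WIDTH + c`, and the operation is reversible.

```python
HEIGHT, WIDTH = 28, 28

# Synthetic 28x28 grayscale image (intensities 0-255), just for illustration.
image = [[(r * WIDTH + c) % 256 for c in range(WIDTH)] for r in range(HEIGHT)]

def flatten(img: list[list[int]]) -> list[int]:
    """Row-major flattening: all of row 0, then row 1, and so on."""
    return [pixel for row in img for pixel in row]

def unflatten(vec: list[int], height: int, width: int) -> list[list[int]]:
    """Inverse of flatten: cut the vector back into `height` rows of `width` pixels."""
    return [vec[r * width:(r + 1) * width] for r in range(height)]

x = flatten(image)
print(len(x))  # 784
```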

### **4. Recommendation Systems**
- User preferences and item features are represented as vectors.
- Cosine similarity is often used to measure the similarity between users/items.
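
A minimal sketch of user-based similarity, assuming each user is a vector of ratings over the same list of items. The ratings are invented, and treating 0 as "not rated" is a simplification; production systems handle missing ratings more carefully. `ratings` and `most_similar_user` are hypothetical names.

```python
import math

# Hypothetical ratings over the same 4 items (0 = not rated).
ratings = {
    "alice": [5, 4, 0, 1],
    "bob":   [4, 5, 0, 2],
    "carol": [0, 1, 5, 4],
}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between a and b (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_user(target: str, ratings: dict) -> str:
    """Return the other user whose rating vector is most similar to the target's."""
    others = [u for u in ratings if u != target]
    return max(others, key=lambda u: cosine_similarity(ratings[target], ratings[u]))

print(most_similar_user("alice", ratings))  # bob
```

Items that the most similar user rated highly, but the target has not rated, are natural candidates to recommend.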

### **5. Dimensionality Reduction (PCA, t-SNE)**
- High-dimensional data is projected into lower dimensions for visualization and efficiency.
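- Principal Component Analysis (PCA) projects vectors onto a smaller set of orthogonal directions (principal components) that capture the most variance; the resulting coordinates are uncorrelated.
- t-SNE is a nonlinear technique used mainly to visualize high-dimensional vectors in two or three dimensions.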
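
A minimal sketch of the first PCA step: mean-center the data, build the covariance matrix, and find its top eigenvector (the first principal component) by power iteration, then project each vector onto it. Power iteration is a swapped-in technique chosen for brevity; library implementations typically use the singular value decomposition instead. This sketch only recovers one component, assumes the largest eigenvalue is strictly dominant, and uses hypothetical helper names and data.

```python
import math

def mean_center(data):
    """Subtract the per-feature mean from every row."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]

def covariance_matrix(centered):
    """Sample covariance (divides by n - 1) of mean-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(d)] for i in range(d)]

def top_principal_component(cov, iterations=100):
    """Power iteration: multiply by cov and renormalize until the vector
    settles on the eigenvector with the largest eigenvalue."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary, non-symmetric start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("cov @ v is zero (e.g. constant data); cannot continue")
        v = [x / norm for x in w]
    return v

def project_onto_first_component(data):
    """Reduce each vector to one coordinate along the first principal component."""
    centered = mean_center(data)
    pc = top_principal_component(covariance_matrix(centered))
    scores = [sum(x * p for x, p in zip(row, pc)) for row in centered]
    return scores, pc

# Hypothetical 2-D points lying exactly on the line y = 2x.
points = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
scores, pc = project_onto_first_component(points)
```

For these points the component found is the direction $(1, 2)/\sqrt{5}$, so each 2-D point is reduced to a single number with no loss of information.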

### **6. Clustering (K-Means, DBSCAN)**
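- Clustering algorithms group data points whose vectors are close under a distance measure: K-Means assigns each point to the nearest of k centroids (using Euclidean distance), while DBSCAN groups points that lie in dense regions.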
- Example: Customer segmentation using K-Means clustering.
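
A minimal sketch of K-Means (Lloyd's algorithm) on 2-D customer vectors. The customer data (hypothetical "spend" and "visits" features) is invented, and taking the first k points as initial centroids is a simplification; k-means++ initialization or multiple random restarts are common in practice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=100):
    """Alternate assignment and update steps until assignments stop changing."""
    centroids = [list(p) for p in points[:k]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Hypothetical customers: [spend, visits], two visibly separate groups.
customers = [[1.0, 2.0], [1.5, 1.8], [1.0, 1.5], [8.0, 8.0], [9.0, 9.5], [8.5, 9.0]]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```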

### **7. Deep Learning and Neural Networks**
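- The input layer of a neural network receives a feature vector (or, for data such as images, a multi-dimensional array that may be flattened into one).
- Hidden layers transform raw inputs into learned vector representations (features or embeddings) that later layers use for prediction.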
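
A minimal sketch of how one fully connected layer turns an input vector into a new vector: it computes $\text{ReLU}(W\mathbf{x} + \mathbf{b})$. The weights, bias, and input below are hypothetical values chosen for illustration; in a real network they are learned, and a library such as NumPy or a deep learning framework would do the matrix arithmetic.

```python
def relu(z):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in z]

def dense_forward(W, b, x):
    """Compute W x + b; W has one row per output unit, each of length len(x)."""
    if any(len(row) != len(x) for row in W):
        raise ValueError("each row of W must have len(x) entries")
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

# Hypothetical layer mapping a 3-dimensional input to 2 hidden units.
W = [[0.1, -0.2, 0.05],
     [0.0, 0.3, -0.1]]
b = [0.5, -1.0]
x = [2.0, 3.0, 1.0]

h = relu(dense_forward(W, b, x))  # the layer's output vector
```

Stacking several such layers is what lets a network build increasingly abstract vector representations.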

---

## **Mathematical Operations on Vectors**
Vectors support operations that help in understanding relationships between data points.

### **1. Addition & Subtraction**  
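- Defined component-wise; both vectors must have the same dimension $n$.
$$
\mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2, \dots, a_n + b_n)
$$
$$
\mathbf{a} - \mathbf{b} = (a_1 - b_1, a_2 - b_2, \dots, a_n - b_n)
$$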
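
A minimal plain-Python sketch of component-wise addition and subtraction, checking that the dimensions match. In practice NumPy arrays support `+` and `-` directly; the helper names here are hypothetical.

```python
def add(a, b):
    """Component-wise a + b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x + y for x, y in zip(a, b)]

def subtract(a, b):
    """Component-wise a - b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return [x - y for x, y in zip(a, b)]

a = [3, 4, 5]
b = [1, 0, -2]
print(add(a, b))       # [4, 4, 3]
print(subtract(a, b))  # [2, 4, 7]
```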

### **2. Dot Product (Similarity Measure)**
- Used in **cosine similarity** and **linear regression**, where a prediction is the dot product of a weight vector and a feature vector, plus a bias term.
$$
\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i
$$
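
A minimal sketch of the dot product, and of its use in a linear regression prediction $\hat{y} = \mathbf{w} \cdot \mathbf{x} + w_0$. The weights, features, and bias are hypothetical numbers, not a trained model.

```python
def dot(a, b):
    """Sum of element-wise products of two same-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

w = [0.5, -1.0, 2.0]  # hypothetical learned weights
x = [4.0, 3.0, 1.0]   # hypothetical feature vector
bias = 1.0            # hypothetical intercept w_0

y_hat = dot(w, x) + bias
print(y_hat)  # 2.0
```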

### **3. Magnitude (Norm)**
- The Euclidean (L2) norm measures the length of a vector.
$$
\|\mathbf{v}\| = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2}
$$
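
A minimal sketch of the Euclidean norm, plus normalization to a unit vector (dividing each component by the norm). The helper names are hypothetical; Python 3.8+ also accepts multiple coordinates in `math.hypot`, which computes the same quantity.

```python
import math

def l2_norm(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    """Scale v to length 1; the zero vector has no direction, so reject it."""
    n = l2_norm(v)
    if n == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / n for x in v]

print(l2_norm([3, 4]))  # 5.0
unit = normalize([3, 4, 5])
```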

### **4. Cosine Similarity**
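- Measures how closely two vectors point in the same direction, regardless of their magnitudes. Values range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).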
$$
\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}
$$
- Used in **document similarity, recommendation systems,** etc.
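
A minimal sketch of document similarity: each document becomes a bag-of-words vector of term counts over a shared vocabulary, and the vectors are compared with cosine similarity. Raw counts and whitespace tokenization are simplifications (TF-IDF weighting and proper tokenization are common refinements), and the example sentences are made up.

```python
import math
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary word occurs in the (lowercased) text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; undefined if either is the zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

doc1 = "the cat sat on the mat"
doc2 = "the cat lay on the rug"
doc3 = "stock prices rose sharply"

vocabulary = sorted(set(" ".join([doc1, doc2, doc3]).split()))
v1, v2, v3 = (bag_of_words(d, vocabulary) for d in (doc1, doc2, doc3))

print(round(cosine_similarity(v1, v2), 2))  # 0.75
print(cosine_similarity(v1, v3))            # 0.0
```

Because count vectors have no negative components, their cosine similarity always falls between 0 and 1.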

---