## Vector

A vector is an ordered list of numbers that represents something. Just as a pair of coordinates (latitude and longitude) describes a point on a map, a vector describes an object with numbers.

**Example:**
If you have a picture of a cat, a vector could be used to represent this cat by listing numbers that describe various features of the picture.
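
To make the map analogy concrete, here is a minimal sketch of two-dimensional vectors as plain Python lists. The coordinate values and the `euclidean_distance` helper are illustrative assumptions, not part of any particular library.

```python
import math

# Two points on a map, each written as a vector [latitude, longitude].
# The numbers are made-up values for illustration only.
point_a = [27.7, 85.3]
point_b = [28.2, 84.0]


def euclidean_distance(a, b):
    """Straight-line distance between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Distance in degrees, treating the coordinates as plain numbers
# (this ignores the curvature of the Earth).
distance = euclidean_distance(point_a, point_b)
print(f"distance between the two points: {distance:.3f}")
```

The same idea extends to any number of dimensions: a picture or a word is described by a longer list, and distances are computed the same way.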


## Embeddings

Embeddings are a special kind of vector that helps computers understand complex things, like words or images, in a way that makes them easier to work with. Think of embeddings as a way to turn something complex into a list of numbers that a computer can easily understand.

**Example:**
If you have the word "apple," an embedding would turn it into a vector of numbers. This vector helps the computer understand the meaning of "apple" in relation to other words.

<div align="center">

![Embeddings](https://miro.medium.com/v2/1*UCKRYEj85S3eH1uv1vFfCw.gif)
<img src="https://cdn.sanity.io/images/vr8gru94/production/e016bbd4d7d57ff27e261adf1e254d2d3c609aac-2447x849.png" width="700px" height="300px">
</div>

<div align="center">

| Embeddings without DL | Embeddings with DL | Others |
| -------------- | ----------- | -- |
| TF-IDF | Word2Vec | GloVe (matrix-factorization based) |
| N-gram | FastText | |
| Document-term matrix (BoW) | ELMo | |
| Integer embeddings | BERT | |

</div>
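
As a rough illustration of the first column, the sketch below builds a document-term matrix (bag of words) with only the standard library. The two sample sentences and the `bag_of_words` helper are assumptions made for this example.

```python
from collections import Counter

# Two tiny sample documents (illustrative only).
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Vocabulary: every distinct word, sorted so the column order is stable.
vocabulary = sorted({word for doc in documents for word in doc.split()})


def bag_of_words(doc, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]


# One row per document, one column per vocabulary word.
matrix = [bag_of_words(doc, vocabulary) for doc in documents]

print(vocabulary)
for row in matrix:
    print(row)
```

Each row is a vector for one document. Word order and context are lost, which is the main limitation the deep-learning approaches try to address.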

### Rise of Embeddings with Deep Learning

| **Aspect**               | **Old Embeddings (Without Deep Learning)**           | **New Embeddings (With Deep Learning)**             |
|--------------------------|--------------------------------------------------------|------------------------------------------------------|
| **Context**              | Assign one fixed vector to each word, no matter where it appears. | Contextual models (e.g. ELMo, BERT) adjust a word's vector based on the surrounding words; Word2Vec and FastText still produce one vector per word. |
| **Meaning**              | Can struggle to show subtle differences in meaning.    | Better at capturing and showing subtle differences in meaning. |
| **Multiple Meanings**    | Use the same representation for a word with several meanings (e.g. a river "bank" and a financial "bank"). | Contextual models give each occurrence its own representation, so different senses of the same word get different vectors. |
| **Adaptability**         | Fixed and can’t easily adjust to new information or topics. | Can be updated and improved with new data or for specific topics. |
| **Feature Learning**     | Needs a lot of manual work to figure out how to represent words. | Automatically learns and discovers useful features from large amounts of data. |



## Vector Embeddings

A **vector embedding** is a numerical representation of data (text, images, audio, or even users) as a **dense vector** (a list of floating-point numbers) that captures the meaning or features of the data in a machine-readable way.

> Think of it as: *"Turning raw data into points in a high-dimensional space where similar items are close together."*

**Example (Text Embedding)**: Let’s say you use a model like OpenAI’s `text-embedding-ada-002` to embed the words below (the values shown are illustrative, not real model output):

* "cat" → `[0.15, 0.8, -0.1, ...]` 
* "dog" → `[0.17, 0.79, -0.12, ...]`
* "airplane" → `[0.92, -0.33, 0.88, ...]`

Here, "cat" and "dog" are close together in vector space because they are semantically related, while "airplane" is far from both.
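
One common way to measure "closeness" is cosine similarity. The sketch below applies it to the first three illustrative components from the example; the numbers are toy values rather than real model output, and `cosine_similarity` is a helper written for this example.

```python
import math

# Toy 3-dimensional vectors taken from the illustrative example.
# Real embeddings have many more dimensions.
cat = [0.15, 0.8, -0.1]
dog = [0.17, 0.79, -0.12]
airplane = [0.92, -0.33, 0.88]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; use 0 here
    return dot / (norm_a * norm_b)


cat_dog = cosine_similarity(cat, dog)
cat_airplane = cosine_similarity(cat, airplane)

print(f"cat vs dog:      {cat_dog:.3f}")
print(f"cat vs airplane: {cat_airplane:.3f}")
```

With these toy values, the "cat" and "dog" vectors point in almost the same direction, while "cat" and "airplane" point away from each other, so the second similarity is negative.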


## Vector Database

A **vector database** is a special type of database designed to **store, index, and search vector embeddings efficiently**, especially in high-dimensional spaces.

Unlike traditional databases that search with exact matches or filters, vector DBs perform **similarity search**. They power **semantic search**, recommendation systems, and generative AI memory.

### Key Functions:

* Store millions (or billions) of embeddings
* Perform **Approximate Nearest Neighbor (ANN)** search (e.g. find the 10 closest vectors to a given query vector)
* Support metadata filters (e.g. only search among items tagged "Nepali articles")
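
The functions above can be sketched with a small in-memory index. Note that this is an exact, brute-force search that scores every stored vector; a real vector database would typically use an approximate index to avoid that full scan. The `index` layout, the `search` function, and the sample vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Each stored item has an id, an embedding, and metadata tags.
index = [
    {"id": "doc-1", "vector": [0.9, 0.1], "tags": {"Nepali articles"}},
    {"id": "doc-2", "vector": [0.1, 0.9], "tags": {"Nepali articles"}},
    {"id": "doc-3", "vector": [0.95, 0.05], "tags": {"English articles"}},
]


def search(index, query, k=10, tag=None):
    """Return the k items most similar to query, optionally filtered by tag."""
    # Metadata filter: keep only items carrying the requested tag.
    candidates = [item for item in index if tag is None or tag in item["tags"]]
    # Brute-force ranking: score every candidate, highest similarity first.
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(query, item["vector"]),
        reverse=True,
    )
    return ranked[:k]


query = [1.0, 0.0]
filtered = [item["id"] for item in search(index, query, k=2, tag="Nepali articles")]
unfiltered = [item["id"] for item in search(index, query, k=1)]
print(filtered)
print(unfiltered)
```

Applying the filter before ranking keeps the example simple; production systems differ in whether they filter before, during, or after the nearest-neighbor search.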

**Examples**: Pinecone, FAISS (a similarity-search library rather than a full database), ChromaDB, Weaviate

### Use Cases:

* **Semantic search**: Search for "Who built the Taj Mahal?" and retrieve documents with related content, even if they don’t contain the exact words of the query.
* **Similarity search**: Find images, text, or audio similar to a given item.
* **Chatbot memory**: Store previous conversations as vectors and retrieve the relevant ones when the user asks something related.
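
The chatbot-memory use case can be sketched as follows. `toy_embed` is a hypothetical stand-in for a real embedding model: it only counts words, so it matches shared words rather than true meaning. The `ConversationMemory` class and the sample messages are likewise assumptions for illustration.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ConversationMemory:
    """Stores past messages with their vectors and recalls the closest ones."""

    def __init__(self):
        self._entries = []  # list of (text, vector) pairs

    def add(self, text):
        self._entries.append((text, toy_embed(text)))

    def recall(self, query, k=1):
        query_vector = toy_embed(query)
        ranked = sorted(
            self._entries,
            key=lambda entry: cosine(query_vector, entry[1]),
            reverse=True,
        )
        return [text for text, _ in ranked[:k]]


memory = ConversationMemory()
memory.add("my favourite food is momo")
memory.add("i work as a nurse in pokhara")

recalled = memory.recall("what is my favourite food", k=1)
print(recalled)
```

Swapping `toy_embed` for a real embedding model would let the memory match related messages that share no words with the query, which is the point of semantic retrieval.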
