## 🧠 **What Is Thompson Sampling?**

Thompson Sampling is a **smart guessing algorithm** used when you have to **choose the best option** from many, but **you don’t know which one is best yet**.

It’s mainly used to solve the **Multi-Armed Bandit Problem**:

> Imagine you have several slot machines, each with an unknown payout rate (in practice these could be ads, products, or webpage designs), and you want to find out **which one gives the best reward (clicks/sales)** while losing as little as possible on the others.

---

## 🎯 **Real-Life Example: Online Ads**

Suppose you have **5 different ads** and **thousands of users** visiting your site. You want to **show the best ad** to maximize clicks, but you don’t know which one is best yet.

You **can’t try all ads equally forever**, because every impression spent on a weak ad is a click you could have earned with a better one. So, Thompson Sampling helps you:

* **Explore** new or uncertain ads,
* **Exploit** (focus on) ads that are doing well.

---

## ⚙️ **How Does It Work? Step-by-Step**

### 1️⃣ Start with **no knowledge**

All ads look the same at first — you have no idea which one works better.
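
One common way to write down "no knowledge" is to give every ad the same flat starting belief. The sketch below assumes a uniform Beta(1, 1) starting point (the Beta distribution is introduced in the next step); the variable names `alpha` and `beta` are illustrative choices, not something fixed by the algorithm.

```python
import numpy as np

n_ads = 5  # the five ads from the example above

# Assumption: a uniform Beta(1, 1) prior for every ad, meaning every
# click rate between 0 and 1 is considered equally plausible at the start.
alpha = np.ones(n_ads)  # 1 + number of clicks seen so far
beta = np.ones(n_ads)   # 1 + number of non-clicks seen so far

# With identical beliefs, every ad has the same expected click rate.
prior_mean = alpha / (alpha + beta)
print(prior_mean)  # all entries equal 0.5
```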

---

### 2️⃣ For each ad, maintain a **belief distribution**

Think of it like a **guess** or **opinion** on how good each ad is — and it changes as you gather data.

You use a **Beta distribution** to model your belief about each ad’s click rate. It fits because every reward is a 1 or a 0 (click or no click), and a Beta belief stays a Beta belief after each such observation.

![image.png](attachment:image.png)
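
To get a feel for how a Beta belief behaves, here is a minimal sketch that computes its mean and spread from click counts. It assumes a uniform Beta(1, 1) starting point, and `beta_summary` is a hypothetical helper name used only for illustration.

```python
import math

def beta_summary(clicks, no_clicks):
    """Mean and standard deviation of a Beta(1 + clicks, 1 + no_clicks) belief."""
    a = 1 + clicks
    b = 1 + no_clicks
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    return mean, std

# Same click ratio (30%), but more and more data behind it.
for clicks, no_clicks in [(0, 0), (3, 7), (30, 70), (300, 700)]:
    mean, std = beta_summary(clicks, no_clicks)
    print(f"{clicks:>3} clicks / {no_clicks:>3} no-clicks -> mean {mean:.3f}, std {std:.3f}")
```

The standard deviation shrinks as the counts grow, which is the "belief changes as you gather data" idea in numbers.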

---

### 3️⃣ In each round:

* You **sample** a plausible click rate from each ad’s belief.
* You **choose the ad** with the **highest sampled value**.
* You **show** that ad to the user.
* If the user clicks: that’s a reward of **1**, else **0**.
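
The selection part of a round can be sketched as follows. The belief parameters are made-up numbers, and `run_round` is a hypothetical helper; the click itself (reward 1 or 0) would come from the real user, so it is not simulated here.

```python
import numpy as np

def run_round(alpha, beta, rng):
    """Sample one plausible click rate per ad and return the winning ad."""
    samples = rng.beta(alpha, beta)    # one draw from each ad's belief
    chosen = int(np.argmax(samples))   # ad with the highest sampled value
    return chosen, samples

rng = np.random.default_rng(seed=42)
alpha = np.array([1.0, 4.0, 2.0, 1.0, 9.0])    # hypothetical: 1 + clicks
beta = np.array([1.0, 8.0, 10.0, 3.0, 13.0])   # hypothetical: 1 + non-clicks

chosen, samples = run_round(alpha, beta, rng)
print("sampled click rates:", np.round(samples, 3))
print("ad to show:", chosen)
```

Because the samples are random, an ad with little data can occasionally draw a high value and get its chance, which is where the exploration comes from.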

---

### 4️⃣ Update your belief:

Based on the click (or not), you **update your belief** for that ad. This is done using **Bayesian updating**, which for a Beta belief amounts to simple bookkeeping of clicks and non-clicks.

The more data you collect about an ad, the narrower its belief becomes, so its samples stay closer to its real click rate.
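
For Beta beliefs with click/no-click rewards, the update is a counting rule: a click adds 1 to the ad's first parameter, a non-click adds 1 to its second. A minimal sketch, assuming uniform starting beliefs and a hypothetical helper name `update_belief`:

```python
def update_belief(alpha, beta, ad, reward):
    """Beta-Bernoulli update: count the click or the non-click for one ad."""
    if reward == 1:
        alpha[ad] += 1   # one more click
    else:
        beta[ad] += 1    # one more non-click

alpha = [1, 1, 1]  # three ads, uniform starting beliefs
beta = [1, 1, 1]

update_belief(alpha, beta, ad=0, reward=1)  # ad 0 was shown and clicked
update_belief(alpha, beta, ad=2, reward=0)  # ad 2 was shown and ignored

print(alpha)  # [2, 1, 1]
print(beta)   # [1, 1, 2]
```

Only the ad that was actually shown gets updated; the beliefs about the other ads stay as they were.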

---

### 🔁 Repeat this process:

* Each round, you keep learning.
* Over time, the **weak ads get shown less and less**, and the **best ad gets picked more and more often**.
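
Putting the steps together, the whole loop can be simulated in a few lines. This is a sketch under stated assumptions: the `true_rates` are invented numbers that stand in for real users, every ad starts from a uniform Beta(1, 1) belief, and 5000 rounds is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical true click rates, unknown to the algorithm; they are used
# only to simulate whether a user clicks.
true_rates = np.array([0.05, 0.08, 0.10, 0.12, 0.30])
n_ads = len(true_rates)

alpha = np.ones(n_ads)              # 1 + clicks per ad
beta = np.ones(n_ads)               # 1 + non-clicks per ad
shown = np.zeros(n_ads, dtype=int)  # how often each ad was displayed

for _ in range(5000):
    samples = rng.beta(alpha, beta)              # sample from each belief
    ad = int(np.argmax(samples))                 # pick the highest sample
    reward = int(rng.random() < true_rates[ad])  # simulated click (1) or not (0)
    alpha[ad] += reward                          # update the belief of the shown ad
    beta[ad] += 1 - reward
    shown[ad] += 1

print("times each ad was shown:", shown)
print("estimated click rates:  ", np.round(alpha / (alpha + beta), 3))
```

In a run like this, most impressions should end up on the last ad, while the estimates for the rarely shown ads stay rough because the algorithm stopped spending impressions on them.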

---

## 📈 Why Is It Great?

* It **naturally balances** trying new things (exploration) and focusing on what works (exploitation).
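* Works well even when **data is limited**: an ad with few impressions has a wide belief, so it still gets sampled high now and then instead of being written off after a few unlucky rounds.
* It’s **probabilistic**: because the choice is randomized, it keeps exploring even when feedback arrives late or in batches, where a fixed-rule algorithm would repeat the same choice until new data comes in.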

---

## 🧪 Compared to UCB (Upper Confidence Bound)

| Feature              | UCB                                                                   | Thompson Sampling                            |
| -------------------- | --------------------------------------------------------------------- | -------------------------------------------- |
| Type                 | Deterministic (uses bounds)                                           | Probabilistic (uses sampling)                |
| Exploration strategy | Picks the highest upper confidence bound; bounds shrink as data grows | Random sampling based on belief              |
| Ease of use          | Simple closed-form formula                                            | Needs a probability distribution per option  |
| Performance          | Very good                                                             | Often **better** in practice, but not always |
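
The "deterministic vs probabilistic" row can be seen directly in code. The sketch below assumes the UCB1 variant for the UCB column (average reward plus a bonus of sqrt(2 ln t / n)) and Beta(1 + clicks, 1 + non-clicks) beliefs for Thompson Sampling; the counts and function names are hypothetical.

```python
import math
import numpy as np

def ucb1_choice(clicks, shows, t):
    """UCB1: pick the highest (average reward + exploration bonus)."""
    scores = [c / n + math.sqrt(2 * math.log(t) / n) for c, n in zip(clicks, shows)]
    return scores.index(max(scores))

def thompson_choice(clicks, shows, rng):
    """Thompson Sampling with Beta(1 + clicks, 1 + non-clicks) beliefs."""
    clicks = np.asarray(clicks)
    shows = np.asarray(shows)
    samples = rng.beta(1 + clicks, 1 + shows - clicks)
    return int(np.argmax(samples))

clicks = [2, 4, 3]     # hypothetical click counts
shows = [10, 10, 10]   # hypothetical impressions; each ad shown at least once
t = sum(shows)         # total rounds played so far

rng = np.random.default_rng(seed=1)
print("UCB1 picks:    ", [ucb1_choice(clicks, shows, t) for _ in range(5)])
print("Thompson picks:", [thompson_choice(clicks, shows, rng) for _ in range(5)])
```

With the same data, UCB1 returns the same ad every time (here ad 1, since all ads have equal impressions and it has the most clicks), while Thompson Sampling can return different ads from call to call.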

---

## 💡 Final Thought

> **Thompson Sampling = "Smart Guessing + Learning from Feedback"**

It’s a powerful way to make decisions in **uncertain environments** — whether it’s picking ads, products, or strategies — and it gets better over time!
