# DBSCAN Clustering Notebook Explanation

This notebook walks through DBSCAN clustering step by step. Each cell's code is shown below with a short explanation. You can paste this markdown directly into a Jupyter Notebook.

---
## Cell 1: Libraries Import

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```
- `numpy` is imported for numerical operations (arrays, matrices).
- `pandas` is imported for working with dataframes.
- `matplotlib.pyplot` is imported for drawing plots.

---
## Cell 2: Data Loading & Inspection

```python
df = pd.read_csv("Mall_customers.csv")
df.head()
type(df)
```
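- `pd.read_csv(...)`: loads the CSV file into a pandas DataFrame.
- `df.head()`: returns the first 5 rows for a quick look at the data.
- `type(df)`: checks that the data structure is a pandas DataFrame.
- Note: a Jupyter cell automatically displays only the value of its last expression, so here only the `type(df)` result appears. To see both, run `df.head()` in its own cell or wrap the calls in `print(...)`.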

---
## Cell 3: View Last Rows

```python
df.tail()
```
- `df.tail()`: shows the last 5 rows, to inspect the end of the data.

---
## Cell 4: Dataset Shape

```python
df.shape
```
- `df.shape`: returns a `(rows, columns)` tuple that gives the size of the data.

---
## Cell 5: Feature Selection & Numpy Conversion

```python
df = df[["Annual Income (k$)", "Spending Score (1-100)"]].values
type(df)
```
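- `df[[...]]`: selects only two columns from the dataframe: annual income and spending score.
- `.values`: converts the pandas DataFrame into a NumPy array.
- `type(df)`: confirms that `df` is now a NumPy array. From this cell on, the name `df` refers to an array, not a DataFrame, so DataFrame methods such as `df.head()` no longer work on it.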
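
The selection step can also be written so that the original DataFrame stays available. The sketch below is only an illustration: the three rows are made up, and the names `customers`, `feature_cols`, and `X` are hypothetical, not from the notebook. It uses `DataFrame.to_numpy()`, which the pandas documentation recommends over `.values`; for this kind of all-numeric selection the two give the same array.

```python
import pandas as pd

# Tiny stand-in for the CSV; these rows are invented for illustration only.
customers = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Annual Income (k$)": [15, 16, 17],
    "Spending Score (1-100)": [39, 81, 6],
})

feature_cols = ["Annual Income (k$)", "Spending Score (1-100)"]

# Keep the DataFrame under its own name and store the array separately.
X = customers[feature_cols].to_numpy()

print(type(X).__name__)  # ndarray
print(X.shape)           # (3, 2)
```

Keeping a separate name for the array is a design choice, not a requirement; the rest of this notebook keeps using `df` for the array.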

---
## Cell 6: Initial Scatter Plot

```python
plt.scatter(df[:, 0], df[:, 1], s=10, c="green")
```
- `df[:, 0]` (first column, Annual Income) goes on the x-axis and `df[:, 1]` (second column, Spending Score) goes on the y-axis.
- `s=10`: sets the marker size.
- `c="green"`: draws the points in green.

---
## Cell 7: Elbow Method with KMeans (for comparison)

```python
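from sklearn.cluster import KMeans

sse = []          # one SSE value per k
K = range(1, 10)  # k = 1 ... 9
for k in K:
    # n_init and random_state are set explicitly (arbitrary values)
    # so that rerunning the cell gives the same curve
    kmean = KMeans(n_clusters=k, n_init=10, random_state=42)
    kmean.fit(df)
    sse.append(kmean.inertia_)  # SSE for this k

plt.plot(K, sse)
plt.title("Elbow Method")
plt.xlabel("Number of Clusters")
plt.ylabel("SSE")
plt.show()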
```
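- Standard K-Means elbow method: the SSE (Sum of Squared Errors) is collected for k = 1 to 9.
- `inertia_`: the fitted KMeans attribute that holds the SSE, i.e. the sum of squared distances from each sample to its closest cluster center.
- `plt.plot(...)`: draws the elbow curve. The k at which the curve stops dropping sharply is a common choice for the number of clusters.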
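
To make the SSE quantity concrete, here is a minimal NumPy sketch that computes it by hand for a toy dataset. The points, the centers, and the helper name `sse_for` are assumptions chosen for illustration; the centers are fixed by hand rather than found by KMeans.

```python
import numpy as np

def sse_for(points, centers, assignment):
    """Sum of squared distances from each point to its assigned center."""
    diffs = points - centers[assignment]
    return float(np.sum(diffs ** 2))

# Four made-up points forming two obvious groups.
points = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# k = 1: a single center at the overall mean, [5, 1].
one_center = points.mean(axis=0, keepdims=True)
sse_k1 = sse_for(points, one_center, np.zeros(4, dtype=int))

# k = 2: one hand-picked center per group.
two_centers = np.array([[0.0, 1.0], [10.0, 1.0]])
sse_k2 = sse_for(points, two_centers, np.array([0, 0, 1, 1]))

print(sse_k1, sse_k2)  # 104.0 4.0
```

The large drop from k = 1 to k = 2 is the kind of bend the elbow plot is meant to reveal.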

---
## Cell 8: DBSCAN Clustering & Visualization

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN

# Step 1: Fit DBSCAN
dbscan = DBSCAN(eps=5, min_samples=5)
label = dbscan.fit_predict(df)
```
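- `DBSCAN(eps=5, min_samples=5)`: creates a density-based clustering model that groups points lying in dense regions.
- `eps`: the neighborhood radius, i.e. the maximum distance at which two points count as neighbors. The data is not scaled here, so `eps=5` is measured in the raw units of the two columns.
- `min_samples`: the minimum number of points (including the point itself) that must lie within `eps` for a point to count as a core point.
- `fit_predict(...)`: fits the model on the data and returns one label per point. Points that belong to no cluster are labeled `-1` (noise).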
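
The role of `eps` and `min_samples` can be illustrated with a small NumPy sketch of the core-point test alone. This is not the full DBSCAN algorithm (it does not expand clusters or assign labels), and the helper name `core_point_mask` and the five points are assumptions made up for the example.

```python
import numpy as np

def core_point_mask(X, eps, min_samples):
    """Return True for every point with at least min_samples points within eps."""
    # Pairwise Euclidean distances, shape (n, n).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Each point counts itself, since its distance to itself is 0.
    neighbor_counts = (dists <= eps).sum(axis=1)
    return neighbor_counts >= min_samples

# Four points in a tight square plus one far-away point.
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [50, 50]], dtype=float)

mask = core_point_mask(X, eps=2.0, min_samples=4)
print(mask.tolist())  # [True, True, True, True, False]
```

The isolated point has only itself inside its `eps` radius, so it fails the test; in a full DBSCAN run such a point ends up as noise unless it lies within `eps` of some core point.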

```python
# Step 2: Unique labels
unique_labels = np.unique(label)
```
- `np.unique(...)`: returns the sorted distinct values in the labels array, i.e. every cluster id plus `-1` if any noise points exist.
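
As a small illustration of what the unique labels can tell you, the sketch below counts the points per label and separates clusters from noise. The label array is made up, and the names `counts`, `n_clusters`, and `n_noise` are hypothetical.

```python
import numpy as np

# Made-up labels in the form DBSCAN returns them: -1 marks noise.
label = np.array([0, 0, 1, -1, 1, 0, -1])

unique_labels, counts = np.unique(label, return_counts=True)
n_clusters = int(np.sum(unique_labels != -1))  # noise is not a cluster
n_noise = int(np.sum(label == -1))

print(unique_labels.tolist(), counts.tolist())  # [-1, 0, 1] [2, 3, 2]
print(n_clusters, n_noise)                      # 2 2
```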

```python
# Step 3: Color map setup
colors = plt.cm.tab10(np.linspace(0, 1, len(unique_labels)))
```
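- `plt.cm.tab10`: a qualitative colormap with 10 distinct colors.
- `np.linspace(0, 1, ...)`: evenly spaced values between 0 and 1, one per unique label; the colormap turns each value into an RGBA color. With more than 10 unique labels some colors would repeat.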

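```python
# Step 4: Plot each cluster
for lbl, color in zip(unique_labels, colors):
    cluster = df[label == lbl]  # rows whose label equals lbl
    print(cluster)
    print(label)
    # DBSCAN uses the label -1 for noise, so name it accordingly in the legend
    name = "Noise" if lbl == -1 else f"Cluster {lbl}"
    plt.scatter(cluster[:, 0], cluster[:, 1], s=10, c=[color], label=name)
```
- The loop pulls out the points of each label and draws them as one scatter series.
- `label == lbl`: a boolean mask used to filter the points.
- `print(cluster)` and `print(label)`: print the cluster's points and the labels for inspection. `print(label)` prints the full label array on every iteration, so it can be moved outside the loop.
- Points labeled `-1` are noise, so the legend shows them as "Noise" rather than as a cluster.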
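
The boolean-mask filtering can be seen in isolation in the following sketch. The four points and their labels are made up for illustration.

```python
import numpy as np

# Made-up points (income, score) and matching labels.
points = np.array([[15, 39], [16, 81], [17, 6], [18, 77]])
label = np.array([0, 1, 0, -1])

mask = label == 0        # one True/False entry per row
cluster0 = points[mask]  # keeps only the rows where mask is True

print(mask.tolist())      # [True, False, True, False]
print(cluster0.tolist())  # [[15, 39], [17, 6]]
```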

```python
# Step 5: Finalize plot
plt.legend()
plt.title("DBSCAN Clustering")
plt.xlabel("Annual Income (k$)")
plt.ylabel("Spending Score (1-100)")
plt.show()
```
- `plt.legend()`: shows which color belongs to which label.
- The title and axis labels are set, and `plt.show()` displays the final plot.


