In [1]:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .appName("JupyterStandalone") \
    .master("spark://8fa087ac675c:7077") \
    .config("spark.executor.instances", "1") \
    .config("spark.executor.cores", "2") \
    .config("spark.executor.memory", "1g") \
    .getOrCreate()

Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
25/06/23 09:31:44 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


In [2]:
sc = spark.sparkContext

In [3]:
rdd1 = sc.parallelize(list(range(10)))

In [4]:
rdd1.getNumPartitions()

12

In [5]:
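print(rdd1.glom())  # glom() is lazy: this prints the RDD description, not the data; call .collect() to see the contents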

PythonRDD[1] at RDD at PythonRDD.scala:56


Here is a simple explanation of **`glom()`** in Spark:

---

### What does `glom()` do?

* **`glom()`** transforms each partition of an RDD into a **list (or array) of all elements in that partition**.
* It returns an RDD where **each element is a list containing all the items of one partition**.

---

### Why use `glom()`?

* Useful to **inspect or debug data partitioning**.
* Lets you see how data is grouped inside partitions.
* Helps when you want to perform operations on entire partitions as a unit.

---

### Example:

```python
rdd = sc.parallelize([1, 2, 3, 4, 5, 6], 3)  # 3 partitions

print(rdd.glom().collect())
```

**Output:**

```
[[1, 2], [3, 4], [5, 6]]
```

Explanation:

* The RDD is split into 3 partitions.
* `glom()` collects all elements of each partition into a list.
* So you get a list of lists, one per partition.
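To make the difference between a plain `collect()` and `glom().collect()` concrete, here is a minimal pure-Python sketch that needs no cluster. The `partitions` list is written out by hand to match the example above; it is a stand-in for an RDD, not Spark code.

```python
# Hand-written stand-in for an RDD with 3 partitions (assumption: same layout as the example above).
partitions = [[1, 2], [3, 4], [5, 6]]

# Model of rdd.collect(): all partitions are flattened into one list of elements.
flat = [x for part in partitions for x in part]

# Model of rdd.glom().collect(): every partition contributes exactly one element, a list.
glommed = [list(part) for part in partitions]

print(flat)                      # [1, 2, 3, 4, 5, 6]
print(glommed)                   # [[1, 2], [3, 4], [5, 6]]
print(len(flat), len(glommed))   # 6 3
```

The element count drops from 6 to 3 because, after `glom()`, the RDD has one element per partition.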

---

### Summary:

| Input RDD partitions | Output of `glom()` |
| -------------------- | ------------------ |
| Partition 1: 1, 2    | `[1, 2]`           |
| Partition 2: 3, 4    | `[3, 4]`           |
| Partition 3: 5, 6    | `[5, 6]`           |

`glom()` output: `[[1, 2], [3, 4], [5, 6]]`

---



# What glom does
It is also a transformation.

It also returns an RDD.

What it does: when we partition the data,

each partition holds some number of elements.


But if we want only one element in a partition, then it is useful.


In other words: from the n elements in a partition, it creates a partition that has only one element, a list, and that list contains all the elements of the original partition.

Yes, your understanding is almost correct. Here it is summarised as simply as possible:

---

## ✅ What does `glom()` do?

* `glom()` is a **Spark transformation**.
* It **puts each partition into one list**; that list holds all the elements inside that partition.
* What it returns is **still an RDD**, but **each element is a list**.

---

## 🔍 An easy example:

```python
rdd = sc.parallelize([1, 2, 3, 4, 5, 6], 3)
rdd.glom().collect()
```

Output:

```
[[1, 2], [3, 4], [5, 6]]
```

Explanation:

* 3 partitions are created.
* `glom()` puts the elements of each partition into a single list.
* Every element of the resulting RDD is now a list.

---

## 🔁 Your point: “one partition, one list”:

Yes, `glom()`:

* turns each partition → into a `list`,
* and that list holds all the elements inside that partition.

> It converts the elements of a partition into a single chunk/list, like so:
> "From *n* elements in a partition → make a **list of those n elements** as a single item in the new RDD".

---

## 🔧 When is it useful?

* **Debugging**: to see how much data sits in which partition.
* **Group-wise processing**: when you need to run an operation on a whole partition at once.
* For example: `rdd.glom().map(lambda x: sum(x)).collect()` → the sum of each partition.
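The per-partition operations above can be sketched in plain Python. The `partitions` list is a hand-written stand-in for a glommed RDD, including one empty partition as an assumption, to show how aggregations behave when a partition holds no data.

```python
# Hand-written stand-in for rdd.glom().collect(); the empty list models an empty partition.
partitions = [[1, 2], [], [3, 4], [5, 6]]

# Model of rdd.glom().map(sum).collect(): sum() of an empty list is 0.
sums = [sum(part) for part in partitions]

# Model of rdd.glom().map(len).collect(): the size of each partition.
sizes = [len(part) for part in partitions]

# max() raises ValueError on an empty list, so give it a default for empty partitions.
maxima = [max(part, default=None) for part in partitions]

print(sums)    # [3, 0, 7, 11]
print(sizes)   # [2, 0, 2, 2]
print(maxima)  # [2, None, 4, 6]
```

On a real RDD, `rdd.mapPartitions(lambda it: [sum(it)])` is an alternative for the sum that works on the partition's iterator directly, without first building a list per partition.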

---

## ✅ Summary:

* `glom()` is a Spark transformation.
* It puts the data of each partition into one list and builds a new RDD from those lists.
* Each element of the new RDD is a list (i.e., the list of elements in one partition).
* It is useful when you want to run operations on whole partitions.

---



In [None]:
# From now on, any operation we run on this RDD is an operation on an array (one list per partition).
# Think of it this way: each partition now holds exactly one element.

In [13]:
rdd1 = sc.parallelize(list(range(10)))

In [14]:
rdd2=rdd1.glom()

In [15]:
rdd3 = rdd2.map(lambda lst: [x + 1 for x in lst])

In [16]:
rdd3.collect()

[[], [1], [2], [3], [4], [5], [], [6], [7], [8], [9], [10]]
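The two empty lists appear because 10 elements are spread over 12 partitions, so some partitions receive nothing. Here is a minimal pure-Python sketch of one slicing rule that reproduces this output. The helper `slice_positions` is hypothetical, and the integer-division rule is an assumption about how a local collection is split; it is a model, not Spark's implementation.

```python
def slice_positions(length, num_slices):
    """Return (start, end) index pairs for each slice (assumed rule: integer division)."""
    return [
        ((i * length) // num_slices, ((i + 1) * length) // num_slices)
        for i in range(num_slices)
    ]


data = list(range(10))

# Model of sc.parallelize(data) with 12 partitions, followed by glom().
partitions = [data[start:end] for start, end in slice_positions(len(data), 12)]

# Model of rdd2.map(lambda lst: [x + 1 for x in lst]).collect().
result = [[x + 1 for x in part] for part in partitions]

print(result)
# [[], [1], [2], [3], [4], [5], [], [6], [7], [8], [9], [10]]
```

Under this rule, slices 0 and 6 get the same start and end index, which is why those two partitions come out empty.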