# 📝 Problem: Compute Final Price After Discount

## **Problem Statement**
You are given a dataset containing the following columns:
- **product_id** (String)
- **product_name** (String)
- **original_price** (Double)
- **discount_percentage** (Double)

Your task is to compute the **final price** for each product by applying the discount and return the following columns:
- **product_id**
- **product_name**
- **final_price**

### Formula
\[
final\_price = original\_price \times \left(1 - \frac{discount\_percentage}{100}\right)
\]
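
As a quick sanity check of the formula, here is a minimal plain-Python sketch (no Spark needed). The helper name `final_price` and the hard-coded rows are illustrative assumptions. The values are copied from the example input table in this document, and the rounding to two decimals is only for display.

```python
def final_price(original_price: float, discount_percentage: float) -> float:
    """Apply a percentage discount to a price (hypothetical helper)."""
    return original_price * (1 - discount_percentage / 100)


# (product_id, original_price, discount_percentage) from the example input table
rows = [
    ("P001", 1000.00, 10),
    ("P002", 800.00, 5),
    ("P003", 600.00, 15),
    ("P004", 300.00, 20),
    ("P005", 100.00, 25),
]

for product_id, price, discount in rows:
    # Round for display only; floating-point arithmetic can leave tiny residues.
    print(product_id, round(final_price(price, discount), 2))
```

The printed values should agree with the example output table.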

---

## **Input**
- **File Path**: `/datasets/products.csv`

### Input Schema
| Column              | Type   |
|---------------------|--------|
| product_id          | String |
| product_name        | String |
| original_price      | Double |
| discount_percentage | Double |

### Example Input Table
| product_id | product_name | original_price | discount_percentage |
|------------|--------------|----------------|---------------------|
| P001       | Laptop       | 1000.00        | 10                  |
| P002       | Phone        | 800.00         | 5                   |
| P003       | Tablet       | 600.00         | 15                  |
| P004       | Monitor      | 300.00         | 20                  |
| P005       | Keyboard     | 100.00         | 25                  |

---

## **Output**
### Output Schema
| Column       | Type   |
|--------------|--------|
| product_id   | String |
| product_name | String |
| final_price  | Double |

### Example Output Table
| product_id | product_name | final_price |
|------------|--------------|-------------|
| P001       | Laptop       | 900.00      |
| P002       | Phone        | 760.00      |
| P003       | Tablet       | 510.00      |
| P004       | Monitor      | 240.00      |
| P005       | Keyboard     | 75.00       |

---

## **Explanation**
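Each product's final price is its original price reduced by the discount percentage, using the same formula as above:

\[
final\_price = original\_price \times \left(1 - \frac{discount\_percentage}{100}\right)
\]

For example, the Laptop (P001) costs 1000.00 × (1 - 10/100) = 900.00.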

The resulting DataFrame, `df_result`, should contain only the `product_id`, `product_name`, and `final_price` columns.

---

In [5]:
from pyspark import SparkContext, SparkConf
import random

In [2]:
conf = SparkConf().setAppName("sparkPractice").setMaster("local[*]")

In [3]:
sc = SparkContext(conf=conf)

In [4]:
sc.defaultParallelism

8

In [6]:
random_list = random.sample(
    range(1, 41), 10
)

random_list

[8, 19, 12, 38, 29, 7, 18, 30, 22, 39]

In [13]:
# RDD of integers
rdd1 = sc.parallelize(random_list)

# RDD of strings
rdd2 = sc.parallelize(["apple", "banana", "mango"])

# RDD of tuples (key-value RDD, very common for joins, reduceByKey, etc.)
rdd3 = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

# RDD of lists
rdd4 = sc.parallelize([[1, 2], [3, 4], [5, 6]])

# RDD of dictionaries
rdd5 = sc.parallelize([{"id": 1, "name": "Darshan"}, {"id": 2, "name": "Pandey"}])


In [15]:
rdd5.collect()

[{'id': 1, 'name': 'Darshan'}, {'id': 2, 'name': 'Pandey'}]