
# Spark Day 2 — Reading CSV File (Write-up)

## Overview
In Spark (Databricks / PySpark), reading a CSV file is one of the most common Day-2 activities because most raw data in real projects arrives in CSV format. Spark provides a distributed way to read CSV files into a DataFrame, which can then be transformed using Spark SQL or PySpark transformations.

---

## Why Reading CSV is Important
CSV is widely used because it is:
- Simple and human readable
- Easy to export from Excel, databases, APIs
- Common in data engineering pipelines as a landing (raw/bronze) format

However, CSV files often come with issues like:
- Header mismatch
- Wrong schema inference
- Null/blank values
- Corrupt records
- Extra delimiter issues
- Date parsing issues
- Mixed data types inside same column

Spark's CSV reader exposes options to deal with most of these issues at read time.

---

## Basic Concept
When Spark reads a CSV file, it creates a **DataFrame**.  
A DataFrame is a distributed table-like structure with:
- Rows and columns
- Schema (column names + data types)
- Lazy execution (transformations are not executed until an action happens)

---

## Key Options While Reading CSV
While reading a CSV file, we usually control the following:

### 1) Header
- If `header=true`, Spark treats the first row as column names.
- If `header=false`, Spark auto-generates column names like `_c0`, `_c1`.

### 2) Schema Handling
Spark can read schema in two ways:

**a) Infer Schema**
- Spark makes an extra pass over the data and guesses data types.
- Works for small/clean files but can be risky for production: it is slower on large files and the guessed types can change between loads.

**b) Provide Schema**
- Best practice in real projects.
- Ensures consistent data types across all loads and guards against schema drift.

### 3) Separator / Delimiter
The default delimiter is a comma `,`, but the `sep` option (alias `delimiter`) can set it to:
- `|` pipe-separated
- `\t` tab-separated
- `;` semi-colon

### 4) Handling Bad Records
CSV may contain corrupted rows due to:
- Missing columns
- Extra columns
- Broken delimiter

Spark controls this through the `mode` option:
- `PERMISSIVE` (default): keeps the row and sets the fields it cannot parse to null
- `DROPMALFORMED`: drops invalid rows
- `FAILFAST`: stops the job as soon as a corrupt row is encountered

### 5) Null Value Handling
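The `nullValue` option tells Spark which string to read as null, for example:
- "NULL"
- ""
- "NA"

Each read takes a single `nullValue` string, so any other markers have to be cleaned after loading. Setting it helps clean data properly at the ingestion stage.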

### 6) Quoted Values and Escape Characters
CSV often contains commas inside values, such as:
- `"Darbhanga, Bihar"`
Spark supports quote and escape handling to correctly parse such values.

---

## Databricks File Locations
CSV files can be loaded from:
- DBFS (`dbfs:/FileStore/...`)
- Mounted ADLS/S3 locations (`/mnt/...`)
- Unity Catalog Volumes (`/Volumes/...`)
- Local paths in cluster

This matters in Databricks because the right path prefix depends on how storage is set up in the workspace (DBFS, mounts, or Unity Catalog).

---

## Output of CSV Reading
After reading CSV, the DataFrame can be:
- Displayed and explored
- Cleaned (null/blank handling)
- Validated (schema checks)
- Stored in Bronze layer (Delta format)
- Used for transformation into Silver/Gold layers

---

## Best Practices (Industry Standard)
- Read CSV with `header=true` whenever the file has a header row
- Avoid `inferSchema` in production pipelines
- Explicitly define schema using StructType
- Handle malformed/corrupt records proactively
- Trim/clean string columns after ingestion
- Store raw ingested CSV into Bronze Delta table for audit and replay

---

## Summary
Reading CSV in Spark is not only about loading data — it is the first step of a robust data pipeline. Using correct reader options ensures:
- Data consistency
- Data quality
- Better performance
- Fewer transformation errors later in the pipeline

Spark’s CSV reader is flexible and production-ready when combined with schema enforcement and bad record handling.


In [0]:
df = (
    spark.read.format("csv")
    .option("header", "true")       # first row holds the column names
    .option("inferSchema", "true")  # let Spark guess the data types
    .load("/Volumes/spartk14day/file_upload/datafiles/sales_transactions.csv")
)
display(df)

transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status
T0001,C001,Anupam Jha,anupam.jha@gmail.com,9876543210.0,Male,29.0,Darbhanga,Bihar,India,P101,Notebook,Stationery,2,50,5,18,2025-12-01,2025-12-03,UPI,Delivered
T0002,C002,Rahul Kumar,rahul.kumar@gmail.com,9123456789.0,Male,31.0,Patna,Bihar,India,P102,Pendrive,Electronics,1,599,10,18,2025-12-02,2025-12-04,Card,Delivered
T0003,C003,Priya Singh,priya.singh@gmail.com,9988776655.0,Female,26.0,Delhi,Delhi,India,P103,Water Bottle,Home,3,199,0,12,2025-12-02,2025-12-05,COD,Delivered
T0004,C004,Amit Sharma,amit.sharma@@gmail.com,9090909090.0,Male,35.0,Mumbai,Maharashtra,India,P104,Mouse,Electronics,1,799,5,18,2025-12-03,2025-12-02,UPI,Returned
T0005,C005,Neha Verma,neha.verma@gmail.com,8888777766.0,Female,27.0,Bengaluru,Karnataka,India,P105,Coffee Mug,Home,2,299,15,12,2025-12-03,2025-12-07,Card,Delivered
T0006,C006,Sunil Yadav,sunil.yadav@gmail.com,7777666655.0,Male,40.0,Pune,Maharashtra,India,P106,T-Shirt,Fashion,5,499,20,5,2025-12-04,2025-12-08,COD,Delivered
T0007,C007,Rina Jha,rina.jha@gmail.com,6666555544.0,Female,22.0,Kolkata,West Bengal,India,P107,Headphones,Electronics,1,1299,5,18,2025-12-04,2025-12-06,UPI,Delivered
T0008,C008,Manish Gupta,manish.gupta@gmail.com,5555444433.0,Male,33.0,Hyderabad,Telangana,India,P108,Shoes,Fashion,2,1999,25,5,2025-12-05,2025-12-09,Card,Delivered
T0009,C009,Kavita Rai,kavita.rai@gmail.com,4444333322.0,Female,,Lucknow,Uttar Pradesh,India,P109,Kitchen Set,Home,1,3499,10,12,2025-12-05,2025-12-08,Card,Delivered
T0010,C010,Deepak Mishra,deepak.mishra@gmail.com,3333222211.0,Male,28.0,Jaipur,Rajasthan,India,P110,Smart Watch,Electronics,1,2999,0,18,2025-12-06,2025-12-10,UPI,Delivered



## PySpark Transformations: `filter()` and `col()` (Write-up)

### 1) `col()` Transformation
`col()` is used to reference a column in a PySpark DataFrame while building expressions.

**Why it is used:**
- Helps Spark understand that we are working with a **column expression**, not a Python variable.
- Makes transformations readable and consistent.
- Used in many functions like `filter()`, `withColumn()`, `when()`, `agg()` (some of them also accept a column name as a plain string).

**Key points:**
- `col("column_name")` represents a DataFrame column.
- It is commonly used with arithmetic, comparison, and logical operators.

---

### 2) `filter()` Transformation
`filter()` is used to **select rows** from a DataFrame based on a condition.

**What it does:**
- Keeps only the rows where the condition is true.
- Returns a **new DataFrame** (original DataFrame remains unchanged).

**Equivalent to SQL:**
- `filter()` in PySpark is the same as `WHERE` in SQL.

**Common usage scenarios:**
- Filtering records by state/city
- Removing bad or corrupt rows
- Selecting rows with missing values (NULL checks)
- Filtering based on date ranges, sales thresholds, quantity limits, etc.

---

### Important Notes for Conditions in PySpark
When writing conditions inside `filter()`, PySpark uses **bitwise operators** instead of Python logical operators.

**Use:**
- `&` for AND  
- `|` for OR  
- `~` for NOT  

**Avoid:**
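- `and`, `or`, `not` (Python tries to convert the Column to a single boolean, which raises `ValueError: Cannot convert column into bool`)

**Best practice:**
- Always wrap each condition in brackets for correctness and readability: `&` and `|` bind more tightly than comparison operators such as `==` and `>`.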


In [0]:

from pyspark.sql.functions import *
# Keep rows whose customer_name is null or blank (empty or whitespace only)
null_customer_df = df.filter(col("customer_name").isNull() | (trim(col("customer_name")) == ""))
display(null_customer_df)

transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status
T0015,C014,,invalid_email,9876501234,Female,21.0,Noida,Uttar Pradesh,India,P114,Notebook,Stationery,0,50,0,18,2025-12-08,2025-12-12,COD,Cancelled
T0025,C024,,empty.name@gmail.com,9800000000,Male,19.0,Delhi,Delhi,India,P124,Charger,Electronics,1,499,0,18,2025-12-13,2025-12-20,UPI,Delivered


In [0]:
df.select("customer_name","email","phone","gender","age","city","state","country").filter(col("state")=="Bihar").show()

+-------------+--------------------+----------+------+----+-----------+-----+-------+
|customer_name|               email|     phone|gender| age|       city|state|country|
+-------------+--------------------+----------+------+----+-----------+-----+-------+
|   Anupam Jha|anupam.jha@gmail.com|9876543210|  Male|29.0|  Darbhanga|Bihar|  India|
|  Rahul Kumar|rahul.kumar@gmail...|9123456789|  Male|31.0|      Patna|Bihar|  India|
|  Rahul Kumar|rahul.kumar@gmail...|9123456789|  Male|31.0|      Patna|Bihar|  India|
| Meena Kumari|meena.kumari@gmai...|9988123456|Female|38.0|      Patna|Bihar|  India|
|     Alok Jha|  alok.jha@gmail.com|9922334455|  Male|32.0|  Darbhanga|Bihar|  India|
|Sakshi Kumari|sakshi.kumari@gma...|9876540000|Female|23.0|  Darbhanga|Bihar|  India|
|    Rohit Jha| rohit.jha@gmail.com|9876509876|  Male|34.0|      Patna|Bihar|  India|
|   Nisha Devi|nisha.devi@gmail.com|9871112223|Female|50.0|Muzaffarpur|Bihar|  India|
+-------------+--------------------+----------+------+----+-----------+-----+-------+

## PySpark Transformation: `groupBy()`

### Overview
`groupBy()` is used in PySpark to **group rows based on one or more columns** so that aggregate calculations can be performed on each group.

It is one of the most important transformations for analytics and reporting, similar to SQL `GROUP BY`.

---

### Why we use `groupBy()`
`groupBy()` is used when we want to compute metrics like:
- total sales
- average age
- maximum/minimum values
- counts of records
- sum of quantity per category/state/city

---

### Key Concept
`groupBy()` by itself only creates groups (it returns a `GroupedData` object, not a DataFrame).  
To get results, it must be followed by an aggregation, typically through:
- `agg()`

---

### Common Aggregations after `groupBy()`
- `sum()` → total
- `avg()` → average
- `max()` → maximum
- `min()` → minimum
- `count()` → number of rows

---

### Summary
- `groupBy()` groups data by a column (or multiple columns).
- It is always used together with aggregation to produce summary outputs.
- It is equivalent to SQL `GROUP BY`.


In [0]:
from pyspark.sql.functions import *
# alias() must be applied to the aggregate column inside agg(), not to the DataFrame
df.groupBy("state")\
  .agg(avg("age").alias("avg_age"))\
  .show()

+-------------+------------------+
|        state|           avg_age|
+-------------+------------------+
|    Rajasthan|              28.0|
|        Delhi|24.666666666666668|
|  Maharashtra|              34.0|
|  West Bengal|              23.0|
|    Telangana|              33.0|
|        Bihar|              33.5|
|        Assam|              25.0|
|      Gujarat|              45.0|
|    Karnataka|              27.0|
|Uttar Pradesh|              25.0|
|   Tamil Nadu|              30.0|
+-------------+------------------+



In [0]:
# alias() goes on the aggregate column so the output column is named max_age
df.groupBy("state")\
  .agg(max("age").alias("max_age"))\
  .show()

+-------------+-------+
|        state|max_age|
+-------------+-------+
|    Rajasthan|   28.0|
|        Delhi|   29.0|
|  Maharashtra|   40.0|
|  West Bengal|   24.0|
|    Telangana|   33.0|
|        Bihar|   50.0|
|        Assam|   25.0|
|      Gujarat|   45.0|
|    Karnataka|   27.0|
|Uttar Pradesh|   29.0|
|   Tamil Nadu|   30.0|
+-------------+-------+



In [0]:

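# when() / otherwise() works like if / elif / else in PySpark
age_group = df.withColumn(
    "age_category",
    when(col("age") < 18, "minor")
    .when((col("age") >= 18) & (col("age") < 60), "midage")
    .when(col("age") >= 60, "senior")
    .otherwise("unknown"),  # null ages match no condition and fall through here
)
display(age_group)  # display() returns None, so keep it separate from the assignment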



transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status,age_category
T0001,C001,Anupam Jha,anupam.jha@gmail.com,9876543210.0,Male,29.0,Darbhanga,Bihar,India,P101,Notebook,Stationery,2,50,5,18,2025-12-01,2025-12-03,UPI,Delivered,midage
T0002,C002,Rahul Kumar,rahul.kumar@gmail.com,9123456789.0,Male,31.0,Patna,Bihar,India,P102,Pendrive,Electronics,1,599,10,18,2025-12-02,2025-12-04,Card,Delivered,midage
T0003,C003,Priya Singh,priya.singh@gmail.com,9988776655.0,Female,26.0,Delhi,Delhi,India,P103,Water Bottle,Home,3,199,0,12,2025-12-02,2025-12-05,COD,Delivered,midage
T0004,C004,Amit Sharma,amit.sharma@@gmail.com,9090909090.0,Male,35.0,Mumbai,Maharashtra,India,P104,Mouse,Electronics,1,799,5,18,2025-12-03,2025-12-02,UPI,Returned,midage
T0005,C005,Neha Verma,neha.verma@gmail.com,8888777766.0,Female,27.0,Bengaluru,Karnataka,India,P105,Coffee Mug,Home,2,299,15,12,2025-12-03,2025-12-07,Card,Delivered,midage
T0006,C006,Sunil Yadav,sunil.yadav@gmail.com,7777666655.0,Male,40.0,Pune,Maharashtra,India,P106,T-Shirt,Fashion,5,499,20,5,2025-12-04,2025-12-08,COD,Delivered,midage
T0007,C007,Rina Jha,rina.jha@gmail.com,6666555544.0,Female,22.0,Kolkata,West Bengal,India,P107,Headphones,Electronics,1,1299,5,18,2025-12-04,2025-12-06,UPI,Delivered,midage
T0008,C008,Manish Gupta,manish.gupta@gmail.com,5555444433.0,Male,33.0,Hyderabad,Telangana,India,P108,Shoes,Fashion,2,1999,25,5,2025-12-05,2025-12-09,Card,Delivered,midage
T0009,C009,Kavita Rai,kavita.rai@gmail.com,4444333322.0,Female,,Lucknow,Uttar Pradesh,India,P109,Kitchen Set,Home,1,3499,10,12,2025-12-05,2025-12-08,Card,Delivered,unknown
T0010,C010,Deepak Mishra,deepak.mishra@gmail.com,3333222211.0,Male,28.0,Jaipur,Rajasthan,India,P110,Smart Watch,Electronics,1,2999,0,18,2025-12-06,2025-12-10,UPI,Delivered,midage


In [0]:


df.withColumn("total_sales", col("quantity") * col("unit_price")) \
  .groupBy("state") \
  .agg(sum("total_sales").alias("total_sales")) \
  .orderBy(col("total_sales").desc()) \
  .show()



+-------------+-----------+
|        state|total_sales|
+-------------+-----------+
|        Bihar|       6995|
|Uttar Pradesh|       4798|
|    Telangana|       3998|
|  Maharashtra|       3892|
|    Rajasthan|       2999|
|        Delhi|       2595|
|  West Bengal|       2196|
|        Assam|       1798|
|      Gujarat|        999|
|   Tamil Nadu|        996|
|    Karnataka|        598|
+-------------+-----------+



In [0]:
df.withColumn("total_sales", col("quantity") * col("unit_price")) \
  .groupBy("state") \
  .agg(sum("total_sales").alias("total_sales")) \
  .orderBy(col("total_sales").desc()) \
  .limit(1) \
  .show()

+-----+-----------+
|state|total_sales|
+-----+-----------+
|Bihar|       6995|
+-----+-----------+



In [0]:
from pyspark.sql.functions import *

df.withColumn("gross_amount", col("quantity") * col("unit_price")) \
  .withColumn("discount_amount", col("gross_amount") * col("discount_pct") / 100) \
  .withColumn("tax_amount", (col("gross_amount") - col("discount_amount")) * col("tax_pct") / 100) \
  .withColumn("total_sale", col("gross_amount") - col("discount_amount") + col("tax_amount")) \
  .groupBy("state") \
  .agg(sum("total_sale").alias("total_sale")) \
  .orderBy(col("total_sale").desc()) \
  .display()


state,total_sale
Bihar,7061.6140000000005
Uttar Pradesh,4754.547
Maharashtra,3627.751
Rajasthan,3538.82
Telangana,3148.425
Delhi,2849.398
West Bengal,2514.639
Assam,2015.558
Tamil Nadu,1059.744
Gujarat,891.6075


In [0]:
df.printSchema()

root
 |-- transaction_id: string (nullable = true)
 |-- customer_id: string (nullable = true)
 |-- customer_name: string (nullable = true)
 |-- email: string (nullable = true)
 |-- phone: long (nullable = true)
 |-- gender: string (nullable = true)
 |-- age: double (nullable = true)
 |-- city: string (nullable = true)
 |-- state: string (nullable = true)
 |-- country: string (nullable = true)
 |-- product_id: string (nullable = true)
 |-- product_name: string (nullable = true)
 |-- category: string (nullable = true)
 |-- quantity: integer (nullable = true)
 |-- unit_price: integer (nullable = true)
 |-- discount_pct: integer (nullable = true)
 |-- tax_pct: integer (nullable = true)
 |-- order_date: date (nullable = true)
 |-- ship_date: date (nullable = true)
 |-- payment_mode: string (nullable = true)
 |-- order_status: string (nullable = true)



In [0]:
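# withColumnRenamed(existing, new) is a no-op when `existing` is not in the schema.
# This file already uses product_name, so the output below is unchanged.
df_new_schema = df.withColumnRenamed("Product Name", "product_name")
display(df_new_schema)  # display() returns None, so keep it separate from the assignment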

transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status
T0001,C001,Anupam Jha,anupam.jha@gmail.com,9876543210.0,Male,29.0,Darbhanga,Bihar,India,P101,Notebook,Stationery,2,50,5,18,2025-12-01,2025-12-03,UPI,Delivered
T0002,C002,Rahul Kumar,rahul.kumar@gmail.com,9123456789.0,Male,31.0,Patna,Bihar,India,P102,Pendrive,Electronics,1,599,10,18,2025-12-02,2025-12-04,Card,Delivered
T0003,C003,Priya Singh,priya.singh@gmail.com,9988776655.0,Female,26.0,Delhi,Delhi,India,P103,Water Bottle,Home,3,199,0,12,2025-12-02,2025-12-05,COD,Delivered
T0004,C004,Amit Sharma,amit.sharma@@gmail.com,9090909090.0,Male,35.0,Mumbai,Maharashtra,India,P104,Mouse,Electronics,1,799,5,18,2025-12-03,2025-12-02,UPI,Returned
T0005,C005,Neha Verma,neha.verma@gmail.com,8888777766.0,Female,27.0,Bengaluru,Karnataka,India,P105,Coffee Mug,Home,2,299,15,12,2025-12-03,2025-12-07,Card,Delivered
T0006,C006,Sunil Yadav,sunil.yadav@gmail.com,7777666655.0,Male,40.0,Pune,Maharashtra,India,P106,T-Shirt,Fashion,5,499,20,5,2025-12-04,2025-12-08,COD,Delivered
T0007,C007,Rina Jha,rina.jha@gmail.com,6666555544.0,Female,22.0,Kolkata,West Bengal,India,P107,Headphones,Electronics,1,1299,5,18,2025-12-04,2025-12-06,UPI,Delivered
T0008,C008,Manish Gupta,manish.gupta@gmail.com,5555444433.0,Male,33.0,Hyderabad,Telangana,India,P108,Shoes,Fashion,2,1999,25,5,2025-12-05,2025-12-09,Card,Delivered
T0009,C009,Kavita Rai,kavita.rai@gmail.com,4444333322.0,Female,,Lucknow,Uttar Pradesh,India,P109,Kitchen Set,Home,1,3499,10,12,2025-12-05,2025-12-08,Card,Delivered
T0010,C010,Deepak Mishra,deepak.mishra@gmail.com,3333222211.0,Male,28.0,Jaipur,Rajasthan,India,P110,Smart Watch,Electronics,1,2999,0,18,2025-12-06,2025-12-10,UPI,Delivered


In [0]:
# Replace null or blank customer names with a placeholder
df.withColumn(
    "customer_name",
    when(col("customer_name").isNull() | (trim(col("customer_name")) == ""), lit("XXXX"))
    .otherwise(col("customer_name")),
).display()

transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status
T0001,C001,Anupam Jha,anupam.jha@gmail.com,9876543210.0,Male,29.0,Darbhanga,Bihar,India,P101,Notebook,Stationery,2,50,5,18,2025-12-01,2025-12-03,UPI,Delivered
T0002,C002,Rahul Kumar,rahul.kumar@gmail.com,9123456789.0,Male,31.0,Patna,Bihar,India,P102,Pendrive,Electronics,1,599,10,18,2025-12-02,2025-12-04,Card,Delivered
T0003,C003,Priya Singh,priya.singh@gmail.com,9988776655.0,Female,26.0,Delhi,Delhi,India,P103,Water Bottle,Home,3,199,0,12,2025-12-02,2025-12-05,COD,Delivered
T0004,C004,Amit Sharma,amit.sharma@@gmail.com,9090909090.0,Male,35.0,Mumbai,Maharashtra,India,P104,Mouse,Electronics,1,799,5,18,2025-12-03,2025-12-02,UPI,Returned
T0005,C005,Neha Verma,neha.verma@gmail.com,8888777766.0,Female,27.0,Bengaluru,Karnataka,India,P105,Coffee Mug,Home,2,299,15,12,2025-12-03,2025-12-07,Card,Delivered
T0006,C006,Sunil Yadav,sunil.yadav@gmail.com,7777666655.0,Male,40.0,Pune,Maharashtra,India,P106,T-Shirt,Fashion,5,499,20,5,2025-12-04,2025-12-08,COD,Delivered
T0007,C007,Rina Jha,rina.jha@gmail.com,6666555544.0,Female,22.0,Kolkata,West Bengal,India,P107,Headphones,Electronics,1,1299,5,18,2025-12-04,2025-12-06,UPI,Delivered
T0008,C008,Manish Gupta,manish.gupta@gmail.com,5555444433.0,Male,33.0,Hyderabad,Telangana,India,P108,Shoes,Fashion,2,1999,25,5,2025-12-05,2025-12-09,Card,Delivered
T0009,C009,Kavita Rai,kavita.rai@gmail.com,4444333322.0,Female,,Lucknow,Uttar Pradesh,India,P109,Kitchen Set,Home,1,3499,10,12,2025-12-05,2025-12-08,Card,Delivered
T0010,C010,Deepak Mishra,deepak.mishra@gmail.com,3333222211.0,Male,28.0,Jaipur,Rajasthan,India,P110,Smart Watch,Electronics,1,2999,0,18,2025-12-06,2025-12-10,UPI,Delivered



## PySpark Function: `expr()`

### Overview
`expr()` is a PySpark function used to write **SQL-like expressions directly inside PySpark code**.  
It allows us to perform calculations, transformations, and conditional logic using Spark SQL syntax.

It is very useful when:
- the logic is easier to write in SQL
- you want compact formulas
- you want to avoid chaining too many PySpark functions

---

### Why we use `expr()`
`expr()` helps in:
- creating new columns using arithmetic formulas
- applying SQL functions like `upper()`, `lower()`, `trim()`
- doing conditional logic with `CASE WHEN`
- writing complex expressions in a single line

---



### ✅ Example: Total Sale using `expr()`

In [0]:

from pyspark.sql.functions import expr

df2 = df.withColumn(
    "total_sale",
    expr("""
      (quantity * unit_price)
      - ((quantity * unit_price) * discount_pct / 100)
      + (((quantity * unit_price) - ((quantity * unit_price) * discount_pct / 100)) * tax_pct / 100)
    """)
)

display(df2)


transaction_id,customer_id,customer_name,email,phone,gender,age,city,state,country,product_id,product_name,category,quantity,unit_price,discount_pct,tax_pct,order_date,ship_date,payment_mode,order_status,total_sale
T0001,C001,Anupam Jha,anupam.jha@gmail.com,9876543210.0,Male,29.0,Darbhanga,Bihar,India,P101,Notebook,Stationery,2,50,5,18,2025-12-01,2025-12-03,UPI,Delivered,112.1
T0002,C002,Rahul Kumar,rahul.kumar@gmail.com,9123456789.0,Male,31.0,Patna,Bihar,India,P102,Pendrive,Electronics,1,599,10,18,2025-12-02,2025-12-04,Card,Delivered,636.138
T0003,C003,Priya Singh,priya.singh@gmail.com,9988776655.0,Female,26.0,Delhi,Delhi,India,P103,Water Bottle,Home,3,199,0,12,2025-12-02,2025-12-05,COD,Delivered,668.64
T0004,C004,Amit Sharma,amit.sharma@@gmail.com,9090909090.0,Male,35.0,Mumbai,Maharashtra,India,P104,Mouse,Electronics,1,799,5,18,2025-12-03,2025-12-02,UPI,Returned,895.679
T0005,C005,Neha Verma,neha.verma@gmail.com,8888777766.0,Female,27.0,Bengaluru,Karnataka,India,P105,Coffee Mug,Home,2,299,15,12,2025-12-03,2025-12-07,Card,Delivered,569.296
T0006,C006,Sunil Yadav,sunil.yadav@gmail.com,7777666655.0,Male,40.0,Pune,Maharashtra,India,P106,T-Shirt,Fashion,5,499,20,5,2025-12-04,2025-12-08,COD,Delivered,2095.8
T0007,C007,Rina Jha,rina.jha@gmail.com,6666555544.0,Female,22.0,Kolkata,West Bengal,India,P107,Headphones,Electronics,1,1299,5,18,2025-12-04,2025-12-06,UPI,Delivered,1456.179
T0008,C008,Manish Gupta,manish.gupta@gmail.com,5555444433.0,Male,33.0,Hyderabad,Telangana,India,P108,Shoes,Fashion,2,1999,25,5,2025-12-05,2025-12-09,Card,Delivered,3148.425
T0009,C009,Kavita Rai,kavita.rai@gmail.com,4444333322.0,Female,,Lucknow,Uttar Pradesh,India,P109,Kitchen Set,Home,1,3499,10,12,2025-12-05,2025-12-08,Card,Delivered,3526.992
T0010,C010,Deepak Mishra,deepak.mishra@gmail.com,3333222211.0,Male,28.0,Jaipur,Rajasthan,India,P110,Smart Watch,Electronics,1,2999,0,18,2025-12-06,2025-12-10,UPI,Delivered,3538.82
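
The `total_sale` values can be checked by hand with plain Python. The helper below is a sketch that mirrors the formula used in the notebook (gross amount, minus discount, plus tax on the discounted amount); the function name `total_sale` is an assumption, and the inputs are the quantity, unit price, discount, and tax of rows T0001 and T0003.

```python
def total_sale(quantity, unit_price, discount_pct, tax_pct):
    """Gross amount minus discount, plus tax charged on the discounted amount."""
    gross = quantity * unit_price
    discount = gross * discount_pct / 100
    tax = (gross - discount) * tax_pct / 100
    return gross - discount + tax


# T0001: 2 x 50 = 100, discount 5% = 5, tax 18% of 95 = 17.1
t0001 = total_sale(2, 50, 5, 18)
# T0003: 3 x 199 = 597, no discount, tax 12% of 597 = 71.64
t0003 = total_sale(3, 199, 0, 12)

print(round(t0001, 3))  # 112.1
print(round(t0003, 3))  # 668.64
```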
