
## Bronze Layer Ingestion: TelecomSparkTransformations

This notebook ingests raw telecom JSON files and creates
Bronze Delta tables with minimal transformation.

Principles:
- Append-only
- No business logic
- Schema-on-read
- Delta Lake format


##### Define Base Data Path

In [0]:
# No trailing slash: the read cells append "/<file>.json" themselves.
base_path = "/Volumes/telecomsparktransformations_catalog/landing/operational_data"
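
Building file paths with an f-string works only as long as the base path and the file name agree on who supplies the slash. A minimal sketch of a helper that joins them with exactly one separator follows; the name `landing_file` is hypothetical and not part of the original notebook.

```python
import posixpath

# Landing volume used by this notebook; a trailing slash, if present, is stripped below.
base_path = "/Volumes/telecomsparktransformations_catalog/landing/operational_data"


def landing_file(file_name: str, base: str = base_path) -> str:
    """Hypothetical helper: join the base path and a file name with a single slash."""
    return posixpath.join(base.rstrip("/"), file_name.lstrip("/"))


print(landing_file("subscribers.json"))
```

With this in place, a read cell could call `.json(landing_file("subscribers.json"))` regardless of how `base_path` was written.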

##### Ingest Subscribers (Bronze)

- Raw JSON
- No transformation
- Append mode

In [0]:
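# Read the raw file as-is; multiline=true lets Spark parse JSON records
# that span several lines instead of one object per line.
subscribers_df = (
    spark.read
         .option("multiline", "true")
         .json(f"{base_path}/subscribers.json")
)

# Append to the Bronze Delta table without changing the data.
subscribers_df.write.format("delta") \
    .mode("append") \
    .saveAsTable("telecomsparktransformations_catalog.bronze.subscribers")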



##### Ingest Call Records (CDR)

In [0]:
call_records_df = (
    spark.read
         .option("multiline", "true")
         .json(f"{base_path}/call_records.json")
)

call_records_df.write.format("delta") \
    .mode("append") \
    .saveAsTable("telecomsparktransformations_catalog.bronze.call_records")



##### Ingest Data Usage Events

In [0]:
data_usage_df = (
    spark.read
         .option("multiline", "true")
         .json(f"{base_path}/data_usage.json")
)

data_usage_df.write.format("delta") \
    .mode("append") \
    .saveAsTable("telecomsparktransformations_catalog.bronze.data_usage")


##### Ingest Recharge Transactions

In [0]:
recharge_df = (
    spark.read
         .option("multiline", "true")
         .json(f"{base_path}/recharge.json")
)

recharge_df.write.format("delta") \
    .mode("append") \
    .saveAsTable("telecomsparktransformations_catalog.bronze.recharge")
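
The four ingestion cells above follow the same read-then-append pattern and differ only in the source name. One possible way to express that pattern once is sketched below. The helper names `bronze_table` and `ingest_to_bronze` and the `SOURCES` list are assumptions for illustration, and the sketch assumes each file name matches its table name, as it does in this notebook.

```python
CATALOG = "telecomsparktransformations_catalog"
BASE_PATH = "/Volumes/telecomsparktransformations_catalog/landing/operational_data"

# Source file stems; each maps to a Bronze table of the same name.
SOURCES = ["subscribers", "call_records", "data_usage", "recharge"]


def bronze_table(name: str) -> str:
    """Hypothetical helper: fully qualified Bronze table name."""
    return f"{CATALOG}.bronze.{name}"


def ingest_to_bronze(spark, name: str) -> str:
    """Hypothetical helper: read one raw JSON file and append it to its Bronze table."""
    df = (
        spark.read
             .option("multiline", "true")
             .json(f"{BASE_PATH}/{name}.json")
    )
    df.write.format("delta").mode("append").saveAsTable(bronze_table(name))
    return bronze_table(name)


# In a notebook where `spark` is already defined:
# for name in SOURCES:
#     ingest_to_bronze(spark, name)
```

Because the write mode is `append`, running the loop (or the individual cells) twice on the same files adds the same rows twice.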
  


##### Validate Bronze Tables

In [0]:
spark.sql("SHOW TABLES IN telecomsparktransformations_catalog.bronze").show()

spark.table("telecomsparktransformations_catalog.bronze.subscribers").show()
spark.table("telecomsparktransformations_catalog.bronze.call_records").show()


##### Row Count Validation (Optional but Good Practice)

In [0]:
tables = ["subscribers", "call_records", "data_usage", "recharge"]

for t in tables:
    count = spark.table(f"telecomsparktransformations_catalog.bronze.{t}").count()
    print(f"bronze.{t} -> {count} records")
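
Printing counts relies on someone reading the output. As a hedged sketch, the same check can fail loudly when a table is empty. The helpers `collect_counts` and `find_empty_tables` are hypothetical, and the sample counts at the end are made-up values used only to show the behavior.

```python
from typing import Dict, Iterable, List

CATALOG = "telecomsparktransformations_catalog"


def collect_counts(spark, tables: Iterable[str]) -> Dict[str, int]:
    """Hypothetical helper: row count per Bronze table, keyed by table name."""
    return {t: spark.table(f"{CATALOG}.bronze.{t}").count() for t in tables}


def find_empty_tables(counts: Dict[str, int]) -> List[str]:
    """Hypothetical helper: names of tables whose row count is zero, sorted."""
    return sorted(name for name, n in counts.items() if n == 0)


# Illustrative input only; these are not real counts from the notebook.
sample_counts = {"subscribers": 120, "recharge": 0}
print(find_empty_tables(sample_counts))

# In a notebook where `spark` is already defined:
# empty = find_empty_tables(collect_counts(spark, tables))
# if empty:
#     raise ValueError(f"Empty Bronze tables: {empty}")
```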
