#Telecom Domain Read & Write Ops Assignment - Building Datalake & Lakehouse
This notebook contains assignments to practice Spark read options and Databricks volumes. <br>
Sections: Sample data creation, Catalog & Volume creation, Copying data into Volumes, Path glob/recursive reads, toDF() column renaming variants, inferSchema/header/separator experiments, and exercises.<br>

![](https://fplogoimages.withfloats.com/actual/68009c3a43430aff8a30419d.png)
![](https://theciotimes.com/wp-content/uploads/2021/03/TELECOM1.jpg)

##First, import all required libraries & create the Spark session object

##1. Write SQL statements to create:
1. A catalog named telecom_catalog_assign
2. A schema landing_zone
3. A volume landing_vol
4. Using dbutils.fs.mkdirs, create folders:<br>
/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/
/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/
/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/
5. Explain the following (research why the volume concept is preferred for production-ready systems):<br>
a. Volume vs DBFS/FileStore<br>
b. Why production teams prefer Volumes for regulated data<br>

###1. SQL Statement to create catalog named 'telecom_catalog_assign'

In [0]:
%sql
CREATE CATALOG IF NOT EXISTS telecom_catalog_assign;

###2. SQL Statement to create schema named 'landing_zone'

In [0]:
%sql
create schema if not exists telecom_catalog_assign.landing_zone

###3. SQL Statement to create volume named 'landing_vol'

In [0]:
%sql
create volume if not exists telecom_catalog_assign.landing_zone.landing_vol

#### 4. Using dbutils.fs.mkdirs, create folders:
#####/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/ 
#####/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/ 
#####/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/

In [0]:
dbutils.fs.mkdirs('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/')
dbutils.fs.mkdirs('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/')
dbutils.fs.mkdirs('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/')
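Every volume path in this notebook follows the three-level Unity Catalog namespace, `/Volumes/<catalog>/<schema>/<volume>/<relative path>`. The small helper below, `volume_path` (a hypothetical name, not a Databricks API), is one way to build those paths instead of repeating long literals; it only formats strings and never touches storage.

```python
from posixpath import join


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build /Volumes/<catalog>/<schema>/<volume>/<parts...> (hypothetical helper, string-only)."""
    for name in (catalog, schema, volume):
        # Each namespace level must be a single, non-empty path segment.
        if not name or "/" in name:
            raise ValueError(f"invalid namespace component: {name!r}")
    for part in parts:
        # A leading "/" would make posixpath.join discard everything before it.
        if part.startswith("/"):
            raise ValueError(f"relative part must not start with '/': {part!r}")
    return join("/Volumes", catalog, schema, volume, *parts)


# The three landing folders, with the trailing slash used in the mkdirs calls above.
landing_folders = [
    volume_path("telecom_catalog_assign", "landing_zone", "landing_vol", folder, "")
    for folder in ("customer", "usage", "tower")
]
print(landing_folders)
```

On Databricks the resulting list could drive the folder creation, e.g. `for p in landing_folders: dbutils.fs.mkdirs(p)`.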

####5.Explain the following (research why the volume concept is preferred for production-ready systems):
#####a. Volume vs DBFS/FileStore
#####b. Why production teams prefer Volumes for regulated data

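#####a. Volumes are Unity Catalog-governed objects for storing non-tabular files (CSV, JSON, images, and so on) under paths of the form `/Volumes/<catalog>/<schema>/<volume>/`. Access is controlled with Unity Catalog privileges, and volumes sit in the same catalog/schema hierarchy as tables, which keeps access control, lineage, and organization in one place.
##### DBFS (Databricks File System) is a workspace-level file system layer over cloud object storage. The DBFS root, including /FileStore, is not governed by Unity Catalog, and data written there is accessible to all users in the workspace, which is why Databricks recommends against storing production data in it.

#####b. Production teams prefer volumes for regulated data because governance is centralized in Unity Catalog: privileges such as READ VOLUME and WRITE VOLUME are granted on specific volumes (or inherited from the catalog and schema), access is recorded in audit logs, and users still get the file-path interface that many data science and machine learning tools expect.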

##Data files to use in this use case:
customer_csv = '''
101,Arun,31,Chennai,PREPAID
102,Meera,45,Bangalore,POSTPAID
103,Irfan,29,Hyderabad,PREPAID
104,Raj,52,Mumbai,POSTPAID
105,,27,Delhi,PREPAID
106,Sneha,abc,Pune,PREPAID
'''

usage_tsv = '''customer_id\tvoice_mins\tdata_mb\tsms_count
101\t320\t1500\t20
102\t120\t4000\t5
103\t540\t600\t52
104\t45\t200\t2
105\t0\t0\t0
'''

tower_logs_region1 = '''event_id|customer_id|tower_id|signal_strength|timestamp
5001|101|TWR01|-80|2025-01-10 10:21:54
5004|104|TWR05|-75|2025-01-10 11:01:12
'''

In [0]:
customer_data = '''101,Arun,31,Chennai,PREPAID
102,Meera,45,Bangalore,POSTPAID
103,Irfan,29,Hyderabad,PREPAID
104,Raj,52,Mumbai,POSTPAID
105,,27,Delhi,PREPAID
106,Sneha,abc,Pune,PREPAID
'''
dbutils.fs.put('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer.csv', customer_data, True)

# Tab-separated content, saved with a .csv extension (read it later with sep="\t")
usage_data = '''customer_id\tvoice_mins\tdata_mb\tsms_count
101\t320\t1500\t20
102\t120\t4000\t5
103\t540\t600\t52
104\t45\t200\t2
105\t0\t0\t0
'''
dbutils.fs.put('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage.csv', usage_data, True)

tower_logs_region1_data = '''event_id|customer_id|tower_id|signal_strength|timestamp
5001|101|TWR01|-80|2025-01-10 10:21:54
5004|104|TWR05|-75|2025-01-10 11:01:12
'''
dbutils.fs.put('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower_logs_region1.csv', tower_logs_region1_data, True)

##2. Filesystem operations
1. Write dbutils.fs code to copy the above datasets into your created Volume folders:
Customer → /Volumes/.../customer/
Usage → /Volumes/.../usage/
Tower (region-based) → /Volumes/.../tower/region1/ and /Volumes/.../tower/region2/

2. Write a command to validate whether files were successfully copied

#####1.Write dbutils.fs code to copy the above datasets into your created Volume folders: Customer → /Volumes/.../customer/ Usage → /Volumes/.../usage/ Tower (region-based) → /Volumes/.../tower/region1/ and /Volumes/.../tower/region2/

In [0]:
dbutils.fs.cp('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer.csv', '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer.csv')
dbutils.fs.cp('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage.csv', '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage.csv')
dbutils.fs.cp('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower_logs_region1.csv', '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_logs_region1.csv')

#####2.Write a command to validate whether files were successfully copied

In [0]:
%sh
ls -l '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer.csv'
ls -l '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage.csv'
ls -l '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_logs_region1.csv'
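`ls -l` shows that the copies exist; comparing checksums additionally shows that their bytes match the originals. A minimal sketch, assuming the volume is readable through the local `/Volumes` path (which the `%sh` cell above already relies on); `sha256_of` and `files_match` are hypothetical helpers, not Databricks APIs.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_match(src: str, dst: str) -> bool:
    """True if dst exists with the same size and SHA-256 checksum as src."""
    if not os.path.isfile(dst):
        return False
    if os.path.getsize(src) != os.path.getsize(dst):
        return False  # cheap size check before hashing
    return sha256_of(src) == sha256_of(dst)


base = "/Volumes/telecom_catalog_assign/landing_zone/landing_vol"
copies = {
    f"{base}/customer.csv": f"{base}/customer/customer.csv",
    f"{base}/usage.csv": f"{base}/usage/usage.csv",
    f"{base}/tower_logs_region1.csv": f"{base}/tower/tower_logs_region1.csv",
}
# On Databricks:
# for src, dst in copies.items():
#     print(dst, "OK" if files_match(src, dst) else "MISSING OR DIFFERENT")
```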

##3. Spark Directory Read Use Cases
1. Read all tower logs using:
Path glob filter (example: *.csv)
Multiple paths input
Recursive lookup

2. Demonstrate these 3 reads separately:
Using pathGlobFilter
Using list of paths in spark.read.csv([path1, path2])
Using .option("recursiveFileLookup","true")

3. Compare the outputs and understand when each should be used.

#####1.Read all tower logs using: Path glob filter (example: *.csv) Multiple paths input Recursive lookup

In [0]:
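# Glob filter + recursive lookup: every *.csv file at any depth under tower/ (header row, pipe-separated)
df1 = spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/', pathGlobFilter='*.csv', recursiveFileLookup=True, header=True, sep='|')
df1.show(2)  # show() prints rows and returns None, so it is not wrapped in display()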

#####2.Demonstrate these 3 reads separately: Using pathGlobFilter Using list of paths in spark.read.csv([path1, path2]) Using .option("recursiveFileLookup","true")

In [0]:
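# pathGlobFilter: keep only files whose names match *.csv
df1 = spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/', pathGlobFilter='*.csv', header=True, sep='|')
display(df1)

# List of paths: read exactly the files listed (add the region2 file to the list once it exists)
df2 = spark.read.csv(
    ['/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_logs_region1.csv'],
    header=True, sep='|'
)
display(df2)

# recursiveFileLookup: walk every subfolder of tower/ (partition inference is disabled).
# Without a glob filter this also picks up the JSON/Parquet/ORC/Delta outputs that later cells write into tower/.
df3 = spark.read.csv(
    '/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/', recursiveFileLookup=True, header=True, sep='|')
display(df3)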

#####3.Compare the outputs and understand when each should be used.
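When to use which, roughly: `pathGlobFilter` filters by file *name* (Spark documents that it does not change partition discovery), so it suits folders that mix formats; an explicit list of paths reads exactly what is listed, which suits a known set of files such as one file per region; `recursiveFileLookup` descends into every subfolder and disables partition inference, which suits nested drops without `key=value` folder names. The plain-Python sketch below approximates which files a recursive read with an optional glob filter would select. It is a model, not Spark's code: `fnmatch` stands in for Hadoop's glob matching, and skipping names that start with `_` or `.` approximates how Spark ignores metadata files such as `_SUCCESS`, `.crc` files and `_delta_log`. `select_files_recursive` is a hypothetical helper.

```python
import fnmatch
import os


def select_files_recursive(root, pattern=None):
    """Approximate the files a recursive read of `root` would pick up.

    Models option("recursiveFileLookup", "true"), optionally combined with
    option("pathGlobFilter", pattern); the pattern is matched against the file name only.
    """
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden/metadata folders such as _delta_log (pruned in place so os.walk skips them).
        dirnames[:] = sorted(d for d in dirnames if not d.startswith(("_", ".")))
        for name in sorted(filenames):
            if name.startswith(("_", ".")):
                continue  # e.g. _SUCCESS markers and .crc checksum files
            if pattern is None or fnmatch.fnmatchcase(name, pattern):
                selected.append(os.path.relpath(os.path.join(dirpath, name), root))
    return selected


# On Databricks (volume mounted at /Volumes):
# print(select_files_recursive("/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower", "*.csv"))
```

Against the layout in this notebook, the `*.csv` filter is what keeps the JSON, Parquet, ORC and Delta outputs that later cells write into `tower/` out of a recursive tower-log read.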

##4. Schema Inference, Header, and Separator
1. Read the Customer and Usage files with option()/options() using both read.csv and the format() function:<br>
header=false, inferSchema=false<br>
or<br>
header=true, inferSchema=true<br>
2. Write a note on what changes when header or inferSchema is set to true/false.<br>
3. How did schema inference handle “abc” in age?<br>

#####1.Try the Customer, Usage files with the option and options using read.csv and format function:
#####header=false, inferSchema=false
#####or
#####header=true, inferSchema=true

In [0]:
df1_cust_no_header=spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer.csv',header=False,inferSchema=False)
display(df1_cust_no_header)
df1_cust_no_header.printSchema()

df2_cust=spark.read.format('csv').option('header','True').option('inferSchema','True').load('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer.csv')
display(df2_cust)
df2_cust.printSchema()

df3_usage_no_header=spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage.csv',header=False,inferSchema=False,sep="\t")
display(df3_usage_no_header)
df3_usage_no_header.printSchema()

df4_usage=spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage.csv',header=True,inferSchema=True,sep="\t")
display(df4_usage)
df4_usage.printSchema()

#####2.Write a note on what changes when header or inferSchema is set to true/false.

######header=True: the first line supplies the column names and is not returned as data. header=False: the first line is read as an ordinary row and the columns get default names (`_c0`, `_c1`, ...). Because customer.csv has no header line, header=True turns the first record (101, Arun, 31, ...) into column names and drops it from the data.
######inferSchema=True: Spark makes an extra pass over the data and assigns each column a type such as integer or double. inferSchema=False: every column is read as string.

#####3.How did schema inference handle “abc” in age?

######With inferSchema=True, Spark infers a single type per column from all of its values. In the age column every other value is an integer, but one non-numeric value such as "abc" widens the whole column to string, so the numeric ages are typed as strings too. With inferSchema=False every column is already a string, so "abc" changes nothing.
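The widening described above can be modeled in a few lines of plain Python. This is a simplified sketch, not Spark's implementation: Spark's CSV inference also considers long, decimal and timestamp types, while this model only distinguishes int, double, boolean and string. It treats empty cells as null, as Spark does by default, so they do not affect a column's type. The function names are hypothetical.

```python
import csv
import io


def infer_value_type(value: str):
    """Narrowest type for one CSV cell; empty cells are null and carry no type."""
    if value == "":
        return None
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "double"
    except ValueError:
        pass
    if value.lower() in ("true", "false"):
        return "boolean"
    return "string"


def merge_types(a, b):
    """Widen two per-cell types into one column type."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    if {a, b} == {"int", "double"}:
        return "double"
    return "string"  # any other mix, e.g. int + string, falls back to string


def infer_column_types(rows):
    types = [None] * len(rows[0])
    for row in rows:
        for i, cell in enumerate(row):
            types[i] = merge_types(types[i], infer_value_type(cell))
    return [t or "string" for t in types]  # an all-empty column defaults to string


customer_data = """101,Arun,31,Chennai,PREPAID
102,Meera,45,Bangalore,POSTPAID
103,Irfan,29,Hyderabad,PREPAID
104,Raj,52,Mumbai,POSTPAID
105,,27,Delhi,PREPAID
106,Sneha,abc,Pune,PREPAID
"""
rows = list(csv.reader(io.StringIO(customer_data)))
print(infer_column_types(rows))       # → ['int', 'string', 'string', 'string', 'string']
print(infer_column_types(rows[:-1]))  # → ['int', 'string', 'int', 'string', 'string'] (without row 106)
```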

##5. Column Renaming Usecases
1. Apply column names as strings using the toDF function for customer data
2. Apply column names and datatype using the schema function for usage data
3. Apply column names and datatype using the StructType with IntegerType, StringType, TimestampType and other classes for towers data 

######1.Apply column names as strings using the toDF function for customer data

In [0]:
df1_cust_col=spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer.csv',header=False,inferSchema=False).toDF('id','name','age','loc','plan')
display(df1_cust_col)
df1_cust_col.printSchema()

######2.Apply column names and datatype using the schema function for usage data

In [0]:
# The schema is given as a DDL string here, so no pyspark.sql.types imports are needed
schema = "customer_id INT,voice_mins INT,data_mb INT,sms_count INT"

df1_usage =spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage.csv',header=True,schema=schema,sep="\t")
display(df1_usage)
df1_usage.printSchema()

######3.Apply column names and datatype using the StructType with IntegerType, StringType, TimestampType and other classes for towers data

In [0]:
from pyspark.sql.types import StructType,StructField,IntegerType,DoubleType,StringType,TimestampType

tower_schema = StructType([
    StructField("event_id",IntegerType(),True),
    StructField("customer_id",IntegerType(),True),
    StructField("tower_id",StringType(),True),
    StructField("signal_strength",IntegerType(),True),
    StructField("timestamp",TimestampType(),True)
    ])

df1_tower = spark.read.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_logs_region1.csv',header=True,schema=tower_schema,sep="|")
display(df1_tower)
df1_tower.printSchema()
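An explicit schema also decides what happens to values that do not fit the declared type. In the default `PERMISSIVE` mode (Spark 3.0 and later), a CSV field that cannot be converted is set to null while the rest of the row is still parsed; `mode='DROPMALFORMED'` drops such rows and `mode='FAILFAST'` raises an error instead. The plain-Python sketch below mimics the permissive per-field behaviour; `cast_row` and the two schema lists are illustrative assumptions, and `CUSTOMER_SCHEMA` (declaring age as an integer) is hypothetical, since the customer data above was read without types.

```python
from datetime import datetime


def to_timestamp(text: str) -> datetime:
    # Matches the sample layout, e.g. "2025-01-10 10:21:54"
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


# (column name, converter) pairs mirroring tower_schema above
TOWER_SCHEMA = [
    ("event_id", int),
    ("customer_id", int),
    ("tower_id", str),
    ("signal_strength", int),
    ("timestamp", to_timestamp),
]
# Hypothetical customer schema with age declared as an integer
CUSTOMER_SCHEMA = [("id", int), ("name", str), ("age", int), ("loc", str), ("plan", str)]


def cast_row(line: str, schema, sep: str) -> dict:
    """Split one delimited line and convert each field; failures become None (PERMISSIVE-style)."""
    values = line.split(sep)
    values += [""] * (len(schema) - len(values))  # missing trailing fields -> empty
    row = {}
    for (name, convert), raw in zip(schema, values):
        if raw == "":
            row[name] = None  # empty string is read as null by default
            continue
        try:
            row[name] = convert(raw)
        except ValueError:
            row[name] = None  # malformed field -> null; the other fields are kept
    return row


print(cast_row("5001|101|TWR01|-80|2025-01-10 10:21:54", TOWER_SCHEMA, "|"))
print(cast_row("106,Sneha,abc,Pune,PREPAID", CUSTOMER_SCHEMA, ","))
```

The tower line converts cleanly, whereas the customer row with age "abc" keeps its other fields and gets a null age.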

## Spark Write Operations using 
- csv, json, orc, parquet, delta, saveAsTable, insertInto, xml with different write mode, header and sep options

##6. Write Operations (Data Conversion/Schema migration) – CSV Format Usecases
1. Write customer data into CSV format using overwrite mode
2. Write usage data into CSV format using append mode
3. Write tower data into CSV format with header enabled and custom separator (|)
4. Read the tower data in a dataframe and show only 5 rows.
5. Download the file from the catalog volume location to your local machine and open any of the above files in Notepad++ to view the data.

######1.Write customer data into CSV format using overwrite mode

In [0]:
df1_cust_col.write.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write.csv',header=True,mode='overwrite')

######2.Write usage data into CSV format using append mode

In [0]:
df1_usage.write.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_csv',header=True,mode='append')

######3.Write tower data into CSV format with header enabled and custom separator (|)

In [0]:
df1_tower.write.csv('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_csv',header=True,mode='append',sep="|")

######4.Read the tower data in a dataframe and show only 5 rows.

In [0]:
df1_tower.show(5)

######5.Download the file from the catalog volume location to your local machine and open any of the above files in Notepad++ to view the data.

######The CSV output opens in Notepad++ as readable plain text.

##7. Write Operations (Data Conversion/Schema migration)– JSON Format Usecases
1. Write customer data into JSON format using overwrite mode
2. Write usage data into JSON format using append mode and snappy compression format
3. Write tower data into JSON format using ignore mode and observe the behavior of this mode
4. Read the tower data in a dataframe and show only 5 rows.
5. Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######1.Write customer data into JSON format using overwrite mode

In [0]:
df1_cust_col.write.json('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write',mode='overwrite')

######2.Write usage data into JSON format using append mode and snappy compression format

In [0]:
df1_usage.write.json('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_snappy_json',mode='append',compression='snappy')

######3.Write tower data into JSON format using ignore mode and observe the behavior of this mode

In [0]:
df1_tower.write.json('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_json',mode='ignore')
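Spark supports four save modes: `errorifexists` (alias `error`, the default) fails when the target already exists, `overwrite` replaces it, `append` adds to it, and `ignore` writes only if the target does not exist and otherwise silently does nothing. The sketch below models those rules on a plain dict standing in for storage; `save` is a hypothetical function and says nothing about how Spark actually performs the writes.

```python
def save(storage: dict, path: str, rows: list, mode: str = "errorifexists") -> None:
    """Model of Spark save-mode semantics on an in-memory dict (hypothetical)."""
    exists = path in storage
    if mode in ("error", "errorifexists"):
        if exists:
            raise FileExistsError(f"path {path} already exists")
        storage[path] = list(rows)
    elif mode == "overwrite":
        storage[path] = list(rows)  # old contents are replaced
    elif mode == "append":
        storage.setdefault(path, []).extend(rows)  # new rows are added
    elif mode == "ignore":
        if not exists:
            storage[path] = list(rows)  # write only when nothing is there yet
    else:
        raise ValueError(f"unknown save mode: {mode}")


storage = {}
save(storage, "tower_write_json", ["r1"], mode="ignore")  # first run: writes r1
save(storage, "tower_write_json", ["r2"], mode="ignore")  # re-run: silently skipped
print(storage)
```

So re-running the `mode='ignore'` cell leaves the first JSON output unchanged and raises no error, which can hide that the new data was never written.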

######4.Read the tower data in a dataframe and show only 5 rows.

In [0]:
df1_tower_json =spark.read.json('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_json')
df1_tower_json.show(5)  # show() prints rows and returns None, so it is not wrapped in display()

######5.Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######The JSON part file opens in Notepad++ as readable text.
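Notepad++ shows JSON Lines: Spark's JSON writer puts one JSON object on each line rather than wrapping the rows in an array, and by default it omits fields whose value is null (the `ignoreNullFields` option). The plain-Python sketch below reproduces that layout for two customer rows; `to_json_lines` is a hypothetical helper, and its compact separators are an assumption about formatting rather than a guarantee of byte-identical output. Row 105 has an empty name, which Spark's CSV reader loads as null by default, so that field disappears from its JSON line.

```python
import json


def to_json_lines(rows):
    """Serialize rows as JSON Lines, dropping null fields (mimics ignoreNullFields=true)."""
    lines = []
    for row in rows:
        kept = {key: value for key, value in row.items() if value is not None}
        lines.append(json.dumps(kept, separators=(",", ":")))  # compact, no spaces
    return "\n".join(lines) + "\n"


# Customer columns were read with inferSchema=False, so every value is a string.
customer_rows = [
    {"id": "101", "name": "Arun", "age": "31", "loc": "Chennai", "plan": "PREPAID"},
    {"id": "105", "name": None, "age": "27", "loc": "Delhi", "plan": "PREPAID"},
]
print(to_json_lines(customer_rows), end="")
```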

##8. Write Operations (Data Conversion/Schema migration) – Parquet Format Usecases
1. Write customer data into Parquet format using overwrite mode and in a gzip format
2. Write usage data into Parquet format using error mode
3. Write tower data into Parquet format with gzip compression option
4. Read the usage data in a dataframe and show only 5 rows.
5. Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######1.Write customer data into Parquet format using overwrite mode and in a gzip format

In [0]:
df1_cust_col.write.parquet('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_parquet',mode='overwrite',compression='gzip')

######2.Write usage data into Parquet format using error mode

In [0]:
df4_usage.write.parquet('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_parquet',mode='error')

######3.Write tower data into Parquet format with gzip compression option

In [0]:
df1_tower.write.parquet('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_parquet',compression='gzip')

######4.Read the usage data in a dataframe and show only 5 rows.

In [0]:
df4_usage_parquet=spark.read.parquet('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_parquet')
df4_usage_parquet.show(5)

######5.Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######Parquet is a binary columnar format, so the part file does not display as readable text in Notepad++, unlike the JSON output.
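Parquet and ORC are binary formats, which is why a text editor shows unreadable bytes. Both carry a magic marker: Parquet files begin and end with the four ASCII bytes `PAR1`, and ORC files begin with `ORC`. The sketch below sniffs those markers; `sniff_format` is a hypothetical helper, and it assumes the part file is reachable through the local `/Volumes` mount. The same check applies to the ORC output in the next section.

```python
import os


def sniff_format(path: str) -> str:
    """Guess 'parquet', 'orc' or 'text/other' from a file's magic bytes."""
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        head = fh.read(4)
        tail = b""
        if size >= 8:
            fh.seek(-4, os.SEEK_END)  # Parquet repeats its magic at the very end
            tail = fh.read(4)
    if head == b"PAR1" and tail == b"PAR1":
        return "parquet"
    if head[:3] == b"ORC":
        return "orc"
    return "text/other"


# On Databricks, e.g.:
# folder = "/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_parquet"
# part = next(f for f in sorted(os.listdir(folder)) if f.endswith(".parquet"))
# print(sniff_format(os.path.join(folder, part)))
```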

##9. Write Operations (Data Conversion/Schema migration) – Orc Format Usecases
1. Write customer data into ORC format using overwrite mode
2. Write usage data into ORC format using append mode
3. Write tower data into ORC format and see the output file structure
4. Read the usage data in a dataframe and show only 5 rows.
5. Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######1.Write customer data into ORC format using overwrite mode

In [0]:
df1_cust_col.write.orc('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_orc',mode='overwrite')

######2.Write usage data into ORC format using append mode

In [0]:
df4_usage.write.orc('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_orc',mode='append')

######3.Write tower data into ORC format and see the output file structure

In [0]:
df1_tower.write.orc('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_orc')
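display(dbutils.fs.ls('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_orc'))  # list the ORC part files Spark wrote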

######4.Read the usage data in a dataframe and show only 5 rows.

In [0]:
df4_usage_orc = spark.read.orc('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_orc')
display(df4_usage_orc.limit(5))

######5.Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

######ORC is also a binary columnar format, so the file does not display as readable text in Notepad++.

##10. Write Operations (Data Conversion/Schema migration) – Delta Format Usecases
1. Write customer data into Delta format using overwrite mode
2. Write usage data into Delta format using append mode
3. Write tower data into Delta format and see the output file structure
4. Read the usage data in a dataframe and show only 5 rows.
5. Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.
6. Compare the parquet location and delta location and try to understand what is the differentiating factor, as both are parquet files only.

######1.Write customer data into Delta format using overwrite mode

In [0]:
df1_cust_col.write.format('delta').save('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_delta',mode='overwrite')
df1_cust_col.printSchema()
display(df1_cust_col.limit(5))

######2.Write usage data into Delta format using append mode

In [0]:
df1_usage.write.format('delta').save('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_delta',mode='append')
df1_usage.printSchema()
display(df1_usage.limit(5))

######3.Write tower data into Delta format and see the output file structure

In [0]:
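df1_tower.write.format('delta').save('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_delta')
display(dbutils.fs.ls('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/tower/tower_write_delta'))  # Parquet part files plus the _delta_log folder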

######4.Read the usage data in a dataframe and show only 5 rows.

In [0]:
df1_usage_delta=spark.read.format('delta').load('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_delta')
df1_usage_delta.printSchema()
display(df1_usage_delta.limit(5))

######5.Download the file from the catalog volume location to your local hard disk and open any of the above files in Notepad++ to view the data.

In [0]:
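# Delta data files are Parquet (binary), so they are not readable in Notepad++; only the JSON commit files under _delta_log open as plain text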

######6.Compare the parquet location and delta location and try to understand what is the differentiating factor, as both are parquet files only.

In [0]:
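# Both folders hold Parquet data files. The difference is the _delta_log folder: a transaction log of JSON commit files
# (plus periodic Parquet checkpoints) recording which data files make up each table version. That log is what gives
# Delta ACID transactions, time travel and schema enforcement, which a plain Parquet folder lacks.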
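Each commit in `_delta_log` is a JSON file named by a zero-padded version number (`00000000000000000000.json`, ...), with one action per line; `add` and `remove` actions record which Parquet files enter or leave the table. Replaying them in order gives the set of files that make up the current version, which is what separates a Delta folder from a plain Parquet folder. The sketch below, `live_data_files` (a hypothetical helper), does that replay under two simplifying assumptions: it ignores checkpoint files, so it needs every JSON commit to still be present, and it returns paths exactly as logged (relative and possibly URL-encoded).

```python
import json
import os


def live_data_files(table_path: str) -> set:
    """Replay the JSON commits in <table_path>/_delta_log and return the live data file paths.

    Simplification: checkpoint (.parquet) files are ignored, so every commit must still exist as JSON.
    """
    log_dir = os.path.join(table_path, "_delta_log")
    # Zero-padded version numbers make lexical order equal to commit order.
    commits = sorted(name for name in os.listdir(log_dir) if name.endswith(".json"))
    live = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one action per line: add, remove, metaData, commitInfo, ...
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


# On Databricks, through the /Volumes mount:
# print(live_data_files("/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_delta"))
```

Files dropped by an overwrite usually stay in the folder until `VACUUM` deletes them, so a reader that ignored the log and read every Parquet file would also see stale rows.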

##11. Write Operations (Lakehouse Usecases) – Delta table Usecases
1. Write customer data using saveAsTable() as a managed table
2. Write usage data using saveAsTable() with overwrite mode
3. Drop the managed table and verify data removal
4. Go and check the table overview and realize it is in delta format in the Catalog.
5. Use spark.sql to write some simple queries on the above tables created.


In [0]:
#1.Write customer data using saveAsTable() as a managed table

df1_cust_col.write.saveAsTable('telecom_catalog_assign.default.cust_delta')
df1_cust_col.printSchema()
display(df1_cust_col.limit(5))


#2.Write usage data using saveAsTable() with overwrite mode
df1_usage.write.mode('overwrite').saveAsTable('telecom_catalog_assign.default.usage_delta')
df1_usage.printSchema()
display(df1_usage.limit(5))

#3.Drop the managed table and verify data removal
# Kept commented out because later cells still read cust_delta; after the drop, the select should fail with a table-not-found error
#spark.sql("drop table telecom_catalog_assign.default.cust_delta")
#display(spark.sql("select * from telecom_catalog_assign.default.cust_delta"))

#4.Go and check the table overview and realize it is in delta format in the Catalog
#Checked the table overview in the catalog: the table format is Delta

#5.Use spark.sql to write some simple queries on the above tables created.
display(spark.sql("select * from telecom_catalog_assign.default.cust_delta"))
display(spark.sql("select plan, count(*) as customers from telecom_catalog_assign.default.cust_delta group by plan"))




##12. Write Operations (Lakehouse Usecases) – Delta table Usecases
1. Write customer data using insertInto() in a new table and find the behavior
2. Write usage data using insertInto() with overwrite mode

In [0]:
#1.Write customer data using insertInto() in a new table and find the behavior

# Create a new empty table with the same schema as df1_cust_col
df1_cust_col.limit(0).write.saveAsTable('telecom_catalog_assign.default.cust_delta_insert')

# Insert customer data into the new table using insertInto(); columns are matched by position, not by name
df1_cust_col.write.insertInto('telecom_catalog_assign.default.cust_delta_insert')

# Display the inserted data to observe the behavior
display(spark.sql("select * from telecom_catalog_assign.default.cust_delta_insert"))


#2.Write usage data using insertInto() with overwrite mode
df1_usage.limit(0).write.saveAsTable('telecom_catalog_assign.default.usage_delta_insert')
df1_usage.write.insertInto('telecom_catalog_assign.default.usage_delta_insert', overwrite=True)
display(spark.sql("select * from telecom_catalog_assign.default.usage_delta_insert"))
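The behaviour to observe is that `insertInto()` matches columns by position and ignores the DataFrame's column names, while `saveAsTable()` in append mode matches them by name (the PySpark `DataFrameWriter.insertInto` documentation points this out). The plain-Python sketch below models both rules for a usage row whose DataFrame lists `data_mb` before `voice_mins`; `insert_by_position` and `insert_by_name` are hypothetical helpers, not Spark APIs.

```python
TABLE_COLUMNS = ["customer_id", "voice_mins", "data_mb", "sms_count"]


def insert_by_position(table_columns, df_columns, values):
    """Model of insertInto(): the i-th DataFrame value goes to the i-th table column."""
    if len(df_columns) != len(table_columns):
        raise ValueError("column count mismatch")
    return dict(zip(table_columns, values))  # df_columns names are deliberately ignored


def insert_by_name(table_columns, df_columns, values):
    """Model of saveAsTable(mode='append'): values are matched to table columns by name."""
    by_name = dict(zip(df_columns, values))
    return {column: by_name[column] for column in table_columns}


# Same usage record for customer 101, but the DataFrame lists data_mb before voice_mins.
df_columns = ["customer_id", "data_mb", "voice_mins", "sms_count"]
values = [101, 1500, 320, 20]
print(insert_by_position(TABLE_COLUMNS, df_columns, values))  # voice_mins receives 1500
print(insert_by_name(TABLE_COLUMNS, df_columns, values))      # voice_mins receives 320
```

Because both swapped columns are integers, a positional mix-up like this loads without any error, which is what makes it easy to miss.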

##13. Write Operations (Data Conversion/Schema migration) – XML Format Usecases
1. Write customer data into XML format using rowTag as cust
2. Write usage data into XML format using overwrite mode with the rowTag as usage
3. Download the XML data, open the file in Notepad++ and see what the XML file looks like.

In [0]:

#1.Write customer data into XML format using rowTag as cust
df1_cust_col.write.xml('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/cust_write_xml',rowTag='cust')

#2.Write usage data into XML format using overwrite mode with the rowTag as usage
df1_usage.write.xml('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/usage/usage_write_xml',mode='overwrite',rowTag='usage')

#3.Download the XML data, open the file in Notepad++ and see what the XML file looks like.
#Both the customer and usage XML files open in Notepad++, with row tags cust and usage respectively
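In the XML output each DataFrame row becomes one element named by `rowTag`, with one child element per column, all wrapped in a root element (`rootTag`, which defaults to `ROWS` in Spark's XML writer). The `xml.etree.ElementTree` sketch below builds that shape for one customer row to make the Notepad++ view easier to read; `rows_to_xml` is a hypothetical helper that illustrates the structure only, and Spark's exact indentation, XML declaration and null handling may differ.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, row_tag, root_tag="ROWS"):
    """Wrap each row in <row_tag>, one child element per column, under <root_tag>."""
    root = ET.Element(root_tag)
    for row in rows:
        record = ET.SubElement(root, row_tag)
        for column, value in row.items():
            if value is None:
                continue  # assumption for this sketch: null columns are simply omitted
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


xml_text = rows_to_xml(
    [{"id": "101", "name": "Arun", "age": "31", "loc": "Chennai", "plan": "PREPAID"}],
    row_tag="cust",
)
print(xml_text)
```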

##14. Compare all the downloaded files (csv, json, orc, parquet, delta and xml) 
1. Capture the size occupied by each of these file formats and list the formats below in order of size, from smallest to largest.

In [0]:
#1.delta
#2.parquet
#3.csv
#4.json
#5.xml
#6.orc
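To capture the sizes rather than eyeballing them, the sketch below sums the bytes of every file under each customer output folder, including `_delta_log` and other metadata files, since they are part of what each format stores. It assumes the folders are readable through the local `/Volumes` mount; `folder_size` and `rank_by_size` are hypothetical helpers, and the folder names come from the write cells above. With only six customer rows, fixed per-file overhead (Parquet and ORC footers, the Delta log) can outweigh the data itself, so a ranking measured here may not hold for larger datasets; note also that the Parquet output was written with gzip while the other formats used their defaults.

```python
import os


def folder_size(path: str) -> int:
    """Total bytes of all files under path, including _delta_log and other metadata files."""
    if not os.path.isdir(path):
        raise FileNotFoundError(path)
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.path.getsize(os.path.join(dirpath, name))
    return total


def rank_by_size(folders: dict) -> list:
    """(format, bytes) pairs ordered from smallest to largest."""
    sizes = {fmt: folder_size(path) for fmt, path in folders.items()}
    return sorted(sizes.items(), key=lambda item: item[1])


base = "/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer"
outputs = {
    "csv": f"{base}/customer_write.csv",
    "json": f"{base}/customer_write",
    "parquet": f"{base}/customer_write_parquet",  # written with gzip compression
    "orc": f"{base}/customer_write_orc",
    "delta": f"{base}/customer_write_delta",
    "xml": f"{base}/cust_write_xml",
}
# On Databricks:
# for fmt, size in rank_by_size(outputs):
#     print(f"{fmt:8} {size:>8} bytes")
```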

###15. Try different combinations of Schema Migration & Data Conversion operations, such as:
1. Read any one of the above orc data in a dataframe and write it to dbfs in a parquet format
2. Read any one of the above parquet data in a dataframe and write it to dbfs in a delta format
3. Read any one of the above delta data in a dataframe and write it to dbfs in a xml format
4. Read any one of the above delta table in a dataframe and write it to dbfs in a json format
5. Read any one of the above delta table in a dataframe and write it to another table

In [0]:
#1.Read any one of the above orc data in a dataframe and write it to dbfs in a parquet format

df1_cust_orc=spark.read.orc('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_orc')
df1_cust_orc.write.parquet('dbfs:/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_dbfs_parquet')

#2.Read any one of the above parquet data in a dataframe and write it to dbfs in a delta format

df1_cust_parquet=spark.read.parquet('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_parquet')
df1_cust_parquet.write.format('delta').save('dbfs:/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_dbfs_delta')

#3.Read any one of the above delta data in a dataframe and write it to dbfs in a xml format

df1_cust_delta=spark.read.format('delta').load('/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_delta')
df1_cust_delta.write.xml('dbfs:/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_dbfs_xml',rowTag='cust')

#4.Read any one of the above delta table in a dataframe and write it to dbfs in a json format
df2_cust_delta=spark.read.table('telecom_catalog_assign.default.cust_delta')
df2_cust_delta.write.json('dbfs:/Volumes/telecom_catalog_assign/landing_zone/landing_vol/customer/customer_write_dbfs_json')

#5.Read any one of the above delta table in a dataframe and write it to another table

df2_cust_delta=spark.read.table('telecom_catalog_assign.default.cust_delta')
# mode('overwrite') lets the cell be re-run without a "table already exists" error
df2_cust_delta.write.mode('overwrite').saveAsTable('telecom_catalog_assign.default.cust_write_delta')


##16. Final exercise: write a one- or two-line answer for each of the following:
1. When to use / benefits of csv
2. When to use / benefits of json
3. When to use / benefits of orc
4. When to use / benefits of parquet
5. When to use / benefits of delta
6. When to use / benefits of xml
7. When to use / benefits of delta tables
