## How to read schema from raw data files

Import the required libraries:

```import dlt``` - This imports the **dlt** module, the Python API for Delta Live Tables (DLT) on Databricks. It provides the decorators used to define the tables of a pipeline.

```from pyspark.sql.functions import *``` - This imports all functions from the **pyspark.sql.functions** module, which contains a wide range of functions for data manipulation and transformation in Spark SQL.

```from pyspark.sql.types import *``` - This imports all types from the **pyspark.sql.types** module, which provides the data types used to define the schema of a DataFrame or a column in Spark SQL.

```import datetime``` - This imports the datetime module, which is a standard Python library for working with dates and times.

In [None]:
import dlt
from pyspark.sql.functions import *
from pyspark.sql.types import *
import datetime

**/mnt/data/data1.csv** is the path to the data file **data1.csv**. The code stores it in the variable **data_path1**.

The **spark.read.csv()** method reads the CSV file at **data_path1** into a DataFrame named **df1**. This first read exists only to capture the schema of the file.

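**@dlt.table** is a decorator from the **dlt** module. It tells Delta Live Tables to create a table from the DataFrame returned by the decorated function. By default, the table takes the name of the function (here **ToBeDimension1_raw**).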

**spark.read.schema(df1.schema).option('header', True).csv(data_path1)** reads the CSV file at **data_path1** and returns it as a DataFrame. The **.schema(df1.schema)** part applies the schema (column structure) of the previously read **df1** DataFrame, so the new DataFrame has the same columns and types. The **.option('header', True)** part tells Spark that the first row of the CSV file contains the header (column names) rather than data. Finally, the **.csv(data_path1)** part performs the read.

In [None]:
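#create delta live table
data_path1 = '/mnt/data/data1.csv'
# Read the file once to capture its schema; header=True takes column names from the first row
df1 = spark.read.csv(data_path1, header=True)

@dlt.table
def ToBeDimension1_raw():
    # Re-read the file with the captured schema
    return spark.read.schema(df1.schema).option('header', True).csv(data_path1)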

Similarly, run the following code for the remaining two files:

In [None]:
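#create delta live table
data_path2 = '/mnt/data/data2.csv'
# Read the file once to capture its schema; header=True takes column names from the first row
df2 = spark.read.csv(data_path2, header=True)

@dlt.table
def ToBeDimension2_raw():
    # Re-read the file with the captured schema
    return spark.read.schema(df2.schema).option('header', True).csv(data_path2)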

In [None]:
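#create delta live table
data_path3 = '/mnt/data/data3.csv'
# Read the file once to capture its schema; header=True takes column names from the first row
df3 = spark.read.csv(data_path3, header=True)

@dlt.table
def ToBeFact_raw():
    # Re-read the file with the captured schema
    return spark.read.schema(df3.schema).option('header', True).csv(data_path3)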
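
The cells above capture each schema by reading the file once with Spark before defining the table. As a rough alternative sketch, a schema can also be derived from the header line alone by typing every column as a string. The helper name **ddl_schema_from_header** and the sample header are hypothetical and not part of the original notebook; the sketch assumes column names contain no backticks.

```python
import csv
import io


def ddl_schema_from_header(header_line, delimiter=","):
    """Build a DDL-style schema string that types every header column as STRING."""
    # csv.reader handles quoted column names that contain the delimiter.
    columns = next(csv.reader(io.StringIO(header_line), delimiter=delimiter))
    # Backticks keep names with spaces or punctuation valid in the DDL string.
    return ", ".join(f"`{name.strip()}` STRING" for name in columns)


# Hypothetical header line; in practice this would be the first line of the CSV file.
print(ddl_schema_from_header("id,name,created_at"))
```

Because **spark.read.schema()** accepts a DDL-formatted string as well as a StructType, the returned string could be passed in place of **df1.schema**, which avoids the extra read at the cost of treating every column as a string.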