The tables to create are listed in a CSV file. This file must be uploaded to a Volume or to DBFS.

To read a CSV file with Spark, upload it to DBFS or to a Volume. **Having the file in the same workspace directory as your notebook is not sufficient.**

To see the files uploaded to DBFS:
`display(dbutils.fs.ls("dbfs:/FileStore/"))`

Called with a `dbfs:/` path, `dbutils.fs.ls` lists files in DBFS, not workspace files. To list workspace files, use a shell command such as `%sh ls` instead.

To list files in the workspace folder: 
```
%sh
ls
```
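
As a rough illustration of the distinction above, the sketch below classifies a path by its prefix. The helper name `storage_kind` and the returned labels are hypothetical, and the prefix rules are a simplifying assumption rather than how Databricks resolves paths internally.

```python
def storage_kind(path: str) -> str:
    """Classify a path by its prefix (heuristic, for illustration only)."""
    if path.startswith("/Volumes/"):
        return "volume"              # Unity Catalog Volume path
    if path.startswith("dbfs:/") or path.startswith("/dbfs/"):
        return "dbfs"                # DBFS URI or its local mount point
    if path.startswith("/Workspace/") or path.startswith("file:/"):
        return "workspace-or-local"  # workspace files / driver filesystem
    return "unknown"                 # anything else: check before handing it to Spark


for p in ["/Volumes/data_2025-10-06.csv", "dbfs:/FileStore/", "/Workspace/Users/me/data.csv"]:
    print(p, "->", storage_kind(p))
```

A check like this can be run before calling `spark.read` to catch a CSV that was left next to the notebook instead of being uploaded.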


In [0]:
%sh
ls

Untitled Notebook 2025-10-06 11:54:07.ipynb


Here, the list of tables is in a CSV file that has been uploaded to our Volume.

In [0]:
# read the information using a spark dataframe
df = spark.read.option("header","true").csv('/Volumes/data_2025-10-06.csv')
df.limit(5).display()

In [0]:
# print the schema of the csv
df.printSchema()

root
 |-- nom_schema: string (nullable = true)
 |-- nom_table: string (nullable = true)
 |-- info_table: string (nullable = true)
 |-- type_table: string (nullable = true)
 |-- size: string (nullable = true)
 |-- nombre_colonnes: string (nullable = true)
 |-- nombre_lignes: string (nullable = true)



In [0]:
# create a list of the name of schemas and a list of the name of tables, both from the spark dataframe
schemas = [row['nom_schema'] for row in df.select('nom_schema').collect()]
tables = [row['nom_table'] for row in df.select('nom_table').collect()]
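
The two lists above are only useful if they stay aligned row by row. A minimal sketch of reading both columns together, so each schema stays paired with its table, is shown below using the standard `csv` module on made-up sample data. The column names come from the schema printed above; the function name `schema_table_pairs` and the sample rows are hypothetical. In Spark, the analogous approach would be a single `df.select('nom_schema', 'nom_table').collect()`.

```python
import csv
import io


def schema_table_pairs(csv_text):
    """Return (schema, table) tuples, skipping rows where either value is missing."""
    reader = csv.DictReader(io.StringIO(csv_text))
    pairs = []
    for row in reader:
        schema = (row.get("nom_schema") or "").strip()
        table = (row.get("nom_table") or "").strip()
        if schema and table:
            pairs.append((schema, table))
    return pairs


# Hypothetical sample: the second data row has no table name and is dropped.
sample = (
    "nom_schema,nom_table,info_table\n"
    "public,roads,\n"
    "public,,orphan row\n"
    "geo,parcels,\n"
)
print(schema_table_pairs(sample))  # [('public', 'roads'), ('geo', 'parcels')]
```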


In [0]:
# create the catalog in Databricks
spark.sql("""
    CREATE CATALOG IF NOT EXISTS data
    USING 'jdbc'
    OPTIONS (
        url 'jdbc:postgresql://<hostname>:<port>/<database>',
        user '<username>',
        password '<password>',
        driver 'org.postgresql.Driver'
    )
""")

The following approach registers each table in the Databricks catalog. Queries are pushed down to PostgreSQL, so the data is not ingested into Databricks storage:

(This requires Unity Catalog and the JDBC table feature enabled in your workspace.)

In [0]:
# register the table in the Databricks catalog
for schema, table in zip(schemas, tables):
    table = f"{schema}.{table}"
    spark.sql(f"""
    CREATE TABLE IF NOT EXISTS datageo.{table}
    USING JDBC
    OPTIONS (
        url 'jdbc:postgresql://<hostname>:<port>/<database>',
        dbtable '{table}',
        user '<username>',
        password '<password>',
        driver 'org.postgresql.Driver'
    )
""")
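
Because the schema and table names come straight from a CSV file and are interpolated into SQL, it may be worth validating them first. The sketch below builds the same statement as the loop above but rejects names that are not plain identifiers. The function name `build_jdbc_table_ddl` and the identifier rule (letters, digits, underscore) are assumptions made for illustration; the connection values are the same placeholders as in the cell above.

```python
import re

# Assumed rule: plain identifiers only (letters, digits, underscore).
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_jdbc_table_ddl(catalog, schema, table, url, user, password):
    """Return the CREATE TABLE ... USING JDBC statement for one table."""
    for name in (catalog, schema, table):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS {catalog}.{schema}.{table}\n"
        "USING JDBC\n"
        "OPTIONS (\n"
        f"    url '{url}',\n"
        f"    dbtable '{schema}.{table}',\n"
        f"    user '{user}',\n"
        f"    password '{password}',\n"
        "    driver 'org.postgresql.Driver'\n"
        ")"
    )


ddl = build_jdbc_table_ddl(
    "datageo", "public", "roads",
    "jdbc:postgresql://<hostname>:<port>/<database>",
    "<username>", "<password>",
)
print(ddl)
```

In the notebook, the returned string would be passed to `spark.sql(...)` inside the loop over schema and table names.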

The JDBC table feature in Databricks Unity Catalog allows you to register external tables that reference data in JDBC-accessible databases (like PostgreSQL) without ingesting the data into Databricks storage. This feature must be enabled in your workspace to use SQL statements like `CREATE TABLE ... USING JDBC ...` that register external tables in Unity Catalog and push queries down to the source database.

To use this feature, you need to:

- Be on a supported Databricks Runtime.
- Have Unity Catalog enabled in your workspace.
- Have the JDBC table feature enabled by your Databricks admin.

This is a workspace-level setting that may require admin action. If `CREATE TABLE ... USING JDBC ...` runs without errors, the feature is likely enabled. If you are unsure, ask your Databricks workspace admin to confirm that the JDBC table feature is enabled for Unity Catalog in your workspace.

In [0]:
display(dbutils.fs.ls("dbfs:/FileStore"))