# User-Defined Tables

This notebook demonstrates how to create new User-Defined Tables (UDTs) in the *User Data Lake* using the `maystreet_data` API. A UDT is owned by its creator, who has permission to append data but cannot delete or modify the existing data.

UDTs can be created and deleted from the *Launcher*, as illustrated in the *Workbench User Guide*. The `maystreet_data` library can also insert data into the tables.

First, we import a few libraries:

In [None]:
from maystreet_data import udt
import maystreet_data as md
import pandas as pd

As in any other data lake, tables in the User Data Lake must have a unique name, so let's define one in the code cell below.

In [None]:
table_name = '' # enter an original name
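
If you would rather not invent a name by hand, one option is to generate it. The sketch below is only an illustration: `make_table_name` is a hypothetical helper, not part of `maystreet_data`, and restricting names to lowercase letters, digits and underscores is a conservative assumption rather than a documented rule.

```python
import re
import uuid


def make_table_name(prefix: str) -> str:
    """Return a cleaned-up prefix followed by a short random suffix."""
    # Assumption: only [a-z0-9_] is kept; the real naming rules may be looser.
    cleaned = re.sub(r"[^a-z0-9_]+", "_", prefix.lower()).strip("_") or "udt"
    # Eight hex characters make an accidental collision unlikely.
    suffix = uuid.uuid4().hex[:8]
    return f"{cleaned}_{suffix}"


table_name = make_table_name("My Test Table")
print(table_name)
```

The random suffix changes on every run, so keep the generated value in `table_name` for the rest of the notebook.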

To create a table, we upload some existing data, from which the table's schema is generated. The data can be uploaded as a Pandas DataFrame or as a Parquet file. We are going to create a table with two columns, "a" and "b". The data types will be inferred by Workbench.

In [None]:
# Option 1)

#  Pandas DataFrame
data_df = pd.DataFrame(dict(a=[1, 2], b=["x", "y"]))
# Alternatively, you can upload your CSV file:
# data_df = pd.read_csv("/home/workbench/your_file.csv")

udt.create_table(name=table_name, records=data_df)

In [None]:
# Option 2)

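# Parquet file. For the sake of simplicity, we create a Parquet file from a Pandas DataFrame.
# Run this cell instead of Option 1: both cells create a table with the same name.
data_df = pd.DataFrame(dict(a=[1, 2], b=["x", "y"]))
# to_parquet() returns None when given a path, so there is nothing to assign.
data_df.to_parquet("/home/workbench/data1.parquet")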
udt.create_table(name=table_name, records="/home/workbench/data1.parquet")
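
Since the schema is inferred from the uploaded data, it can help to look at the DataFrame's dtypes before creating the table. The following is a minimal sketch using pandas only; how Workbench maps each pandas dtype to a table column type is not covered here, and the choice of `int64` is purely illustrative.

```python
import pandas as pd

data_df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Pin the integer column explicitly so the frame does not rely on
# platform defaults; "int64" is an illustrative choice.
typed_df = data_df.astype({"a": "int64"})

# Inspect the dtypes that would be handed over for schema inference.
print(typed_df.dtypes)
```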

Let's add more rows to the table. Note that uploading large chunks of rows at once, rather than one row at a time, drastically improves performance.

In [None]:
pd.DataFrame(dict(a=[3, 4], b=["new data", "new data"])).to_parquet("/home/workbench/data2.parquet")
udt.insert_into_table(name=table_name, records="/home/workbench/data2.parquet")
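
The batching advice above can be sketched as follows. This is an illustration under stated assumptions: `batched` and `upload_in_batches` are hypothetical helpers, and `insert_batch` is a stand-in callable for whatever performs the actual upload, so the example runs without touching the data lake.

```python
from typing import Callable, Iterable, Iterator, List


def batched(rows: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Yield lists of at most batch_size rows, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[dict] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def upload_in_batches(
    rows: Iterable[dict],
    batch_size: int,
    insert_batch: Callable[[List[dict]], None],
) -> int:
    """Call insert_batch once per batch and return the number of calls."""
    calls = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)
        calls += 1
    return calls


rows = [{"a": i, "b": "new data"} for i in range(10)]
received: List[List[dict]] = []
# Stand-in uploader: it only records the batches it receives.
calls = upload_in_batches(rows, batch_size=4, insert_batch=received.append)
print(calls, [len(b) for b in received])  # prints: 3 [4, 4, 2]
```

In the notebook, `insert_batch` could build a DataFrame from each batch, write it to a Parquet file, and pass that path to `udt.insert_into_table` as in the cell above, so ten rows cost three uploads instead of ten.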

We can read the table by calling `md.query()`.

In [None]:
query = f"""
SELECT
    *
FROM
    p_user_data_lake."{table_name}"
        """

result = md.query(md.DataSource.DATA_LAKE_USER, query)
print(list(result))
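
Because the table name is interpolated into the SQL text with an f-string, an empty name or one containing a double quote would produce a malformed query. A small guard such as the one below may help; `quote_identifier` is a hypothetical helper, not part of `maystreet_data`, and it relies on the standard SQL convention of doubling embedded double quotes inside a quoted identifier.

```python
def quote_identifier(name: str) -> str:
    """Wrap name in double quotes, doubling any embedded double quotes."""
    if not name:
        # Catches the case where the table_name placeholder was left blank.
        raise ValueError("table name must not be empty")
    return '"' + name.replace('"', '""') + '"'


table_name = "my_table"  # illustrative name
query = f"SELECT * FROM p_user_data_lake.{quote_identifier(table_name)}"
print(query)  # prints: SELECT * FROM p_user_data_lake."my_table"
```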

We can also run queries across different data lakes.

In [None]:
query = f"""
SELECT
    *
FROM
    p_user_data_lake."{table_name}"
UNION ALL
SELECT
    1 AS a,
    product AS b
FROM
    p_production.p_mst_data_lake.mt_trade
WHERE
    dt = '2024-01-03'
    AND f = 'bats_edga'
    AND product = 'AAPL'
LIMIT 10
        """

result = md.query(md.DataSource.DATA_LAKE_USER, query)
print(list(result))

We can also rename the table if needed. If you decide to rename it, please remember to use the table's new name for further operations.

In [None]:
new_table_name = '' # pick a new name for the table
udt.rename_table(oldName=table_name, newName=new_table_name)

Once finished, let's delete the table and all its contents.

In [None]:
# If you renamed the table above, pass new_table_name here instead of table_name.
udt.delete_table(name=table_name, confirm="delete_table_and_all_its_contents")