# Experiment: timestamps and timezones

Timestamps behave differently across databases, so we run a series of experiments:

## 1. DuckDB to DuckDB

DuckDB has several timestamp types. The two that matter here are:

- TIMESTAMP ignores the TimeZone session variable, but a value written with an explicit offset is still converted to UTC.
- TIMESTAMPTZ uses the TimeZone session variable to interpret values that carry no offset and to render stored instants.

When you store a TIMESTAMPTZ (timestamp with time zone) value, it is converted to and stored as an instant (an absolute point in time, like a Unix timestamp). A value without an explicit offset is interpreted in the time zone that is active in the session.
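The conversion can be sketched with the Python standard library alone. This is an illustration of the idea, not DuckDB code: the fixed -07:00 offset is an assumption standing in for America/Los_Angeles in July, and the variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed -07:00 offset stands in for America/Los_Angeles in July (PDT).
PDT = timezone(timedelta(hours=-7), "PDT")

# A literal with no offset is read as wall-clock time in the session zone.
local = datetime(2024, 7, 30, 10, 0, 0, 123000, tzinfo=PDT)

# The stored value is an instant, expressed here as microseconds since the Unix epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
instant_us = (local - epoch) // timedelta(microseconds=1)

# Rendering the same instant in UTC changes the text, not the instant.
as_utc = local.astimezone(timezone.utc)

print(instant_us)
print(local.isoformat())   # 2024-07-30T10:00:00.123000-07:00
print(as_utc.isoformat())  # 2024-07-30T17:00:00.123000+00:00
```

Both renderings compare equal because they describe one instant; only the offset used for display differs.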

## 2. DuckDB to Filesystem (parquet)

In Parquet, the timezone is a flag stored separately from the timestamp value, so it does not affect the stored values.
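A toy model may help picture this. The Parquet format describes timestamps as an integer count since the epoch plus an `isAdjustedToUTC` flag on the column's logical type. The class `ParquetTimestamp` below is hypothetical and is not part of pyarrow, DuckDB, or dlt. It only mimics that split between value and flag, and for simplicity it attaches the flag to each value rather than to the column.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ParquetTimestamp:
    """Hypothetical toy model of a Parquet timestamp, not a real library type."""

    micros: int               # microseconds since 1970-01-01
    is_adjusted_to_utc: bool  # flag kept apart from the integer value

    def decode(self) -> datetime:
        base = datetime(1970, 1, 1) + timedelta(microseconds=self.micros)
        # Flag on: the integer is an instant, so attach UTC.
        # Flag off: the integer is a wall-clock reading with no zone.
        return base.replace(tzinfo=timezone.utc) if self.is_adjusted_to_utc else base


micros = 1_722_358_800_123_000
with_flag = ParquetTimestamp(micros, True)
without_flag = ParquetTimestamp(micros, False)

# The flag changes the interpretation, never the stored integer.
print(with_flag.micros == without_flag.micros)
print(with_flag.decode().isoformat())
print(without_flag.decode().isoformat())
```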


## DuckDB exploration
Before running the experiments, we explore how DuckDB itself behaves.
We start by creating a DuckDB table with a TIMESTAMPTZ column while the session TimeZone is set to America/Los_Angeles. DuckDB uses the session TimeZone to interpret the value that has no offset, and stores each timestamp as an instant (without a timezone).

In [36]:
import duckdb
import pandas as pd

# Connect to the DuckDB database
conn = duckdb.connect('source.duckdb')

# Create a table and insert data
conn.execute('''
SET TimeZone = 'America/Los_Angeles';
CREATE TABLE IF NOT EXISTS events (
    event_id INTEGER,
    event_tstamp TIMESTAMPTZ
);
DELETE FROM events;
INSERT INTO events (event_id, event_tstamp) VALUES
  (1, '2024-07-30 10:00:00.123'),
  (2, '2024-07-30 10:00:00.123456+00:00');
''')

# Fetch the results and load into a Pandas DataFrame
results = conn.execute('SELECT * FROM events;').fetchdf()
print(results)

conn.close()

   event_id                     event_tstamp
0         1 2024-07-30 10:00:00.123000-07:00
1         2 2024-07-30 03:00:00.123456-07:00


Now we change the TimeZone value to UTC and insert more rows to see what happens:

In [37]:
conn = duckdb.connect('source.duckdb')
conn.execute('''
SET TimeZone = 'UTC';
INSERT INTO events (event_id, event_tstamp) VALUES
  (3, '2024-08-01 10:00:00.123'),
  (4, '2024-08-02 10:00:00.123456+04:00');
''')

results = conn.execute('SELECT * FROM events;').fetchdf()
print(results)
conn.close()

   event_id                     event_tstamp
0         1 2024-07-30 17:00:00.123000+00:00
1         2 2024-07-30 10:00:00.123456+00:00
2         3 2024-08-01 10:00:00.123000+00:00
3         4 2024-08-02 06:00:00.123456+00:00
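The four rows above follow one rule: an explicit offset in the literal wins, and the session zone is used only when the literal has none. The sketch below reproduces the UTC values with the standard library. The helper `to_utc` is hypothetical and only mimics the behaviour observed here. The fixed -07:00 offset is an assumption standing in for America/Los_Angeles on these dates.

```python
from datetime import datetime, timedelta, timezone


def to_utc(literal: str, session_tz: timezone) -> datetime:
    """Hypothetical helper mimicking the parsing rule observed above."""
    parsed = datetime.fromisoformat(literal)
    if parsed.tzinfo is None:
        # No offset in the literal: interpret it in the session zone.
        parsed = parsed.replace(tzinfo=session_tz)
    # An explicit offset is kept as given; either way, normalise to UTC.
    return parsed.astimezone(timezone.utc)


# Assumption: fixed -07:00 stands in for America/Los_Angeles (PDT).
LOS_ANGELES = timezone(timedelta(hours=-7))
UTC = timezone.utc

rows = [
    (1, "2024-07-30 10:00:00.123", LOS_ANGELES),
    (2, "2024-07-30 10:00:00.123456+00:00", LOS_ANGELES),
    (3, "2024-08-01 10:00:00.123", UTC),
    (4, "2024-08-02 10:00:00.123456+04:00", UTC),
]

stored = {event_id: to_utc(literal, tz) for event_id, literal, tz in rows}
for event_id, instant in stored.items():
    print(event_id, instant.isoformat(sep=" "))
```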


## Experiment 1 - DuckDB to DuckDB

Now we use dlt to load the data from source.duckdb into a new DuckDB instance. We do that three times:

- Timezone flag set to None (unset)
- Timezone flag set to True (on)
- Timezone flag set to False (off)

In [43]:
import dlt
import duckdb

# Fetch data
conn = duckdb.connect('source.duckdb')
source_df = conn.execute('SELECT * FROM events;').fetchdf()
conn.close()

pipelines = {
    "duckunset": None,
    "duckon": True,
    "duckoff": False
}

for p in pipelines.keys():
    
  # run pipeline
  pipeline = dlt.pipeline(
    pipeline_name=p,
    destination='duckdb',
  )

  pipeline.run(
    source_df.to_dict(orient="records"),
    write_disposition="replace",
    table_name='events',
    # "timezone" is the flag under test: None, True, or False
    columns=[{"name": "event_tstamp", "data_type": "timestamp", "timezone": pipelines[p]}],
  )

  # fetch results
  conn = duckdb.connect(f'{p}.duckdb')
  
  result = conn.execute(f'''
    SET TimeZone = 'America/Los_Angeles';
    SELECT event_id,event_tstamp FROM {p}_dataset.events;
  ''').fetchdf()

  describe = conn.execute(f'DESCRIBE {p}_dataset.events').fetchdf()

  conn.close()

  print(f"""
    Results for - {p}
    {result}
    DESCRIBE destination table:
    {describe}
  """)

  


    Results for - duckunset
       event_id               event_tstamp
0         1 2024-07-30 17:00:00.123000
1         2 2024-07-30 10:00:00.123456
2         3 2024-08-01 10:00:00.123000
3         4 2024-08-02 06:00:00.123456
    DESCRIBE destination table:
        column_name column_type null   key default extra
0  event_tstamp   TIMESTAMP  YES  None    None  None
1      event_id      BIGINT  YES  None    None  None
2  _dlt_load_id     VARCHAR   NO  None    None  None
3       _dlt_id     VARCHAR   NO  None    None  None
  

    Results for - duckon
       event_id                     event_tstamp
0         1 2024-07-30 10:00:00.123000-07:00
1         2 2024-07-30 03:00:00.123456-07:00
2         3 2024-08-01 03:00:00.123000-07:00
3         4 2024-08-01 23:00:00.123456-07:00
    DESCRIBE destination table:
        column_name               column_type null   key default extra
0  event_tstamp  TIMESTAMP WITH TIME ZONE  YES  None    None  None
1      event_id                    BIGINT  

We can see that the timezone flag determines whether the destination column type is TIMESTAMP or TIMESTAMP WITH TIME ZONE, and that with the latter the TimeZone session variable is used to convert the stored instants when they are read.
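The difference between the two readings can be imitated in plain Python with event 4 from the results above. This is a sketch of the behaviour, not dlt or DuckDB code. The fixed -07:00 offset is an assumption standing in for the America/Los_Angeles session zone in August.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed -07:00 stands in for America/Los_Angeles during August (PDT).
SESSION_TZ = timezone(timedelta(hours=-7))

# Event 4 as stored: 2024-08-02 06:00:00.123456 UTC.
stored_utc = datetime(2024, 8, 2, 6, 0, 0, 123456, tzinfo=timezone.utc)

# TIMESTAMP-like reading: no zone attached, so the session setting plays no role.
naive_view = stored_utc.replace(tzinfo=None)

# TIMESTAMP WITH TIME ZONE-like reading: the same instant shifted into the session zone.
session_view = stored_utc.astimezone(SESSION_TZ)

print(naive_view.isoformat(sep=" "))    # 2024-08-02 06:00:00.123456
print(session_view.isoformat(sep=" "))  # 2024-08-01 23:00:00.123456-07:00
```

Note that the session-zone reading lands on the previous calendar day, which matters for any query that groups or filters by date.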

In [None]:
import dlt

# source

# destination
postgres = dlt.destinations.postgres("postgresql://loader:loader@localhost/dlt_data")

# pipeline
pipeline = dlt.pipeline(
  pipeline_name='chess',
  destination=postgres
)