# Loading Trajectory Data

Mobility data comes in many formats. Timestamps may be Unix integers or ISO 8601 strings (possibly with timezone offsets), coordinates may be latitude/longitude or projected, and files may be a single CSV or a partitioned directory.
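For example, the same instant can be stored as a Unix integer or as an ISO 8601 string with a UTC offset. The minimal standard-library sketch below is independent of nomad and uses made-up values. It shows that both encodings normalize to the same timezone-aware UTC datetime once each is parsed correctly.

```python
from datetime import datetime, timezone

# The same moment, 2024-01-01 00:00:00 UTC, in two common encodings
unix_seconds = 1704067200
iso_string = "2024-01-01T02:00:00+02:00"  # local wall-clock time with a +02:00 offset

# Interpret the integer as seconds since the Unix epoch, in UTC
from_unix = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)

# Parse the ISO string (offset included), then convert it to UTC
from_iso = datetime.fromisoformat(iso_string).astimezone(timezone.utc)

print(from_unix.isoformat())  # 2024-01-01T00:00:00+00:00
print(from_iso == from_unix)  # True
```

Dropping the offset, or reading the integer as local time, would silently shift every record by hours. This is why a loader needs to know which encoding each column uses.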

`from_file` (in `nomad.io.base`, imported below as `loader`) handles these cases with a single function call.

In [None]:
import glob
from pathlib import Path

import pandas as pd

import nomad.io.base as loader
import nomad.data as data_folder

# The sample datasets ship inside the nomad.data package
data_dir = Path(data_folder.__file__).parent

## Pandas vs nomad.io for partitioned data

With plain pandas, a partitioned directory (e.g., `date=2024-01-01/`, `date=2024-01-02/`, ...) has to be read file by file and concatenated:

In [None]:
# Collect every CSV one level below the partition directories (sorted for a stable order)
csv_files = sorted(glob.glob(str(data_dir / "partitioned_csv" / "*" / "*.csv")))
df_list = []
for f in csv_files:
    df_list.append(pd.read_csv(f))
df_pandas = pd.concat(df_list, ignore_index=True)

print(f"Pandas: {len(df_pandas)} rows")
print(df_pandas.dtypes)  # datetime strings are left as object dtype
print("\nFirst few rows:")
print(df_pandas.head(3))
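The loop above only sees the contents of each file. In Hive-style layouts such as `date=2024-01-01/`, the partition value lives in the directory name, so a hand-rolled reader has to recover it from the path. Below is a minimal standard-library sketch of that idea. It builds a throwaway directory with made-up rows and uses the hypothetical helpers `partition_values` and `read_partitioned_csv`, which are not part of nomad.

```python
import csv
import tempfile
from pathlib import Path


def partition_values(path: Path, root: Path) -> dict:
    """Hypothetical helper: parse 'key=value' directory names between root and the file."""
    values = {}
    for part in path.relative_to(root).parent.parts:
        if "=" in part:
            key, _, value = part.partition("=")
            values[key] = value
    return values


def read_partitioned_csv(root: Path) -> list:
    """Hypothetical helper: read every CSV under root, tagging rows with partition keys."""
    rows = []
    for csv_path in sorted(root.rglob("*.csv")):  # sorted for a stable order
        keys = partition_values(csv_path, root)
        with csv_path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                rows.append({**row, **keys})
    return rows


# Build a tiny two-partition directory with made-up rows
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for day, uid in [("2024-01-01", "a"), ("2024-01-02", "b")]:
        part_dir = root / f"date={day}"
        part_dir.mkdir()
        (part_dir / "part-0.csv").write_text(f"user_id,dev_lat,dev_lon\n{uid},40.7,-74.0\n")
    rows = read_partitioned_csv(root)

print(len(rows))        # 2
print(rows[0]["date"])  # 2024-01-01
```

Every value here is still a string, including the coordinates and the recovered date. Type casting remains a separate step.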

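`from_file` reads the same partitioned directory in one call and also handles type casting and column mapping. `traj_cols` tells it which of the file's columns play each canonical role, and `parse_dates=True` parses the datetime column: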

In [None]:
# Keys are nomad's canonical names; values are this dataset's column names
traj_cols = {"user_id": "user_id",
             "latitude": "dev_lat",
             "longitude": "dev_lon",
             "datetime": "local_datetime"}

df = loader.from_file(data_dir / "partitioned_csv", format="csv", traj_cols=traj_cols, parse_dates=True)
print(f"nomad.io: {len(df)} rows")
print(df.dtypes)
print("\nFirst few rows:")
print(df.head(3))
print("\nNote: 'local_datetime' is now datetime64[ns], not object!")
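As a rough illustration of how a `traj_cols` mapping can drive type casting, here is a standard-library sketch. It uses made-up rows and a hypothetical helper `cast_columns`. This is not nomad's implementation. It assumes, as the `local_datetime` note above suggests, that columns keep their original names and only their types change.

```python
from datetime import datetime

# Canonical name -> column name in this dataset (same shape as traj_cols above)
traj_cols = {"user_id": "user_id",
             "latitude": "dev_lat",
             "longitude": "dev_lon",
             "datetime": "local_datetime"}

# Made-up rows as they might come out of a CSV: every value is a string
rows = [
    {"user_id": "a", "dev_lat": "40.71", "dev_lon": "-74.01",
     "local_datetime": "2024-01-01T08:30:00-05:00"},
    {"user_id": "b", "dev_lat": "40.73", "dev_lon": "-73.99",
     "local_datetime": "2024-01-01T09:15:00-05:00"},
]


def cast_columns(rows, traj_cols):
    """Hypothetical helper: cast the columns that traj_cols assigns to canonical roles."""
    dt_col = traj_cols["datetime"]  # look up the dataset's own column name
    lat_col, lon_col = traj_cols["latitude"], traj_cols["longitude"]
    for row in rows:
        row[dt_col] = datetime.fromisoformat(row[dt_col])  # keeps the UTC offset
        row[lat_col] = float(row[lat_col])
        row[lon_col] = float(row[lon_col])
    return rows


cast_columns(rows, traj_cols)
print(type(rows[0]["local_datetime"]).__name__)  # datetime
```

Because the caster only looks up names through `traj_cols`, the same code works for any dataset whose mapping is supplied, and no column has to be renamed.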

The same pattern works for Parquet files. As before, type casting and downstream processing rely on `traj_cols`, which tells the loader which columns in your data correspond to nomad's default (canonical) spatio-temporal column names.

In [None]:
# Here only the user ID column differs from its canonical name
traj_cols = {"user_id": "uid", "timestamp": "timestamp",
             "latitude": "latitude", "longitude": "longitude", "date": "date"}

df = loader.from_file(data_dir / "partitioned_parquet", format="parquet", traj_cols=traj_cols, parse_dates=True)
print(f"Loaded {len(df)} rows")
print(df.dtypes)
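Every lookup goes through `traj_cols`, so a typo in a mapped column name is easy to make and easy to miss. The standard-library sketch below shows one way to catch it before loading. It uses a hypothetical helper `check_traj_cols` and an assumed column list for the Parquet data, taken from the mapping itself. It does not call nomad.

```python
def check_traj_cols(traj_cols, columns):
    """Hypothetical helper: return {canonical_key: column} for mapped columns not found."""
    available = set(columns)
    return {key: col for key, col in traj_cols.items() if col not in available}


# Mapping from the Parquet example above
traj_cols = {"user_id": "uid", "timestamp": "timestamp",
             "latitude": "latitude", "longitude": "longitude", "date": "date"}

# Assumed column names of the Parquet files (illustrative only)
columns = ["uid", "timestamp", "latitude", "longitude", "date"]

print(check_traj_cols(traj_cols, columns))  # {}

# A typo in the mapping is reported up front instead of failing later
bad = {**traj_cols, "user_id": "user"}
print(check_traj_cols(bad, columns))  # {'user_id': 'user'}
```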

In [None]:
# These are the default canonical column names
from nomad.constants import DEFAULT_SCHEMA
print(DEFAULT_SCHEMA.keys())