# Tabular Read-Write

This notebook is an introduction to the PAM tabular read-write methods. It has two parts:

1. [Read](#read-tabular-format)
2. [Write](#write-tabular-data)

Please note that the included example data is a small sample. Larger input data sets, such as full samples of the UK National Travel Survey, will take longer to process.

In [1]:
import os

import pandas as pd
from pam import read

## Read Tabular Format

PAM can read from either tabular or MATSim formats. Tabular formats use the `pam.read.load_travel_diary` function, which will try to automatically infer trips and activities from commonly formatted travel diary data.

Tabular data should include a trips table and, optionally, attributes tables for persons and/or households. Tabular data is expected as pandas DataFrames with column names as described in the docs and/or as in the following example.

The following demonstration data is available in the [`data/example_data`](https://github.com/arup-group/pam/tree/main/examples/data/example_data) directory. All data paths in this example are relative to the [notebook directory](https://github.com/arup-group/pam/tree/main/examples) in the PAM repository.

#### Step 1:

Load your trips (and attributes) data into pandas DataFrames. Reformat and rename the columns as required (please read the docs). The following example already has the required data types and column names:

**trips:**

Each row represents a trip, where:

- **pid**: person id of trip
- **hid**: household id of trip (**optional**)
- **seq**: sequence of trip within day (optional if order is already correct)
- **hzone**: home zone of person (**optional**)
- **ozone**: origin zone of trip
- **dzone**: destination zone of trip
- **purp**: purpose of trip (note that other ways of classifying purpose are supported - read the docs!)
- **mode**: trip mode
- **tst**: (integer) trip start time in minutes from start of day (typically from midnight)
- **tet**: (integer) trip end time as above
- **freq**: sample weighting (**optional**)
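
Raw survey data often records clock times rather than minutes, and may lack some of the columns above. The following is a minimal sketch of the kind of preparation this implies, not part of PAM: the helper names, the `"HH:MM"` input format and the `start`/`end` keys are hypothetical, and the required columns are simply those not marked optional in the list above.

```python
# Columns that the list above does not mark as optional.
REQUIRED_TRIP_COLUMNS = ("pid", "ozone", "dzone", "purp", "mode", "tst", "tet")


def minutes_from_midnight(clock_time: str) -> int:
    """Convert an "HH:MM" string to integer minutes from the start of the day."""
    hours, minutes = clock_time.split(":")
    return int(hours) * 60 + int(minutes)


def missing_trip_columns(columns) -> list:
    """Return the required trip columns that are absent from `columns`."""
    present = set(columns)
    return [name for name in REQUIRED_TRIP_COLUMNS if name not in present]


# Hypothetical raw survey record with clock times instead of minutes.
raw_trip = {
    "pid": "person_0",
    "ozone": "a",
    "dzone": "b",
    "purp": "work",
    "mode": "car",
    "start": "07:30",
    "end": "08:15",
}

# Copy everything except the clock times, then add tst/tet in minutes.
trip = {key: value for key, value in raw_trip.items() if key not in ("start", "end")}
trip["tst"] = minutes_from_midnight(raw_trip["start"])
trip["tet"] = minutes_from_midnight(raw_trip["end"])

print(trip["tst"], trip["tet"])        # 450 495
print(missing_trip_columns(raw_trip))  # ['tst', 'tet']
print(missing_trip_columns(trip))      # []
```

With a DataFrame, the same check could be run as `missing_trip_columns(trips.columns)`.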

**persons:**

Each row represents a person's attributes. These can be arbitrary key-value pairs, with most types supported. The following are examples:

- **pid**: person id, must be consistent with trips data (**required**)
- gender: gender of person (example)
- job: employment status of person (example)
- occ: employment type of person (example)
- inc: income of person (example)
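
Because person ids must be consistent between the two tables, it can be worth checking for trips that refer to an unknown person before loading. This is a small sketch with a hypothetical helper name; it is not a PAM function.

```python
def unmatched_person_ids(trip_pids, person_pids) -> set:
    """Return person ids used in the trips table but absent from the persons table."""
    return set(trip_pids) - set(person_pids)


# Toy ids for illustration only.
trip_pids = ["p1", "p1", "p2", "p3"]
person_pids = ["p1", "p2"]

print(unmatched_person_ids(trip_pids, person_pids))  # {'p3'}
```

For tables loaded as in the next cell, where `pid` is a column of `trips` and the index of `persons`, the call would be `unmatched_person_ids(trips["pid"], persons.index)`.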

In [2]:
trips = pd.read_csv(
    os.path.join("data", "example_data", "example_travel_diaries.csv"), index_col="uid"
)
persons = pd.read_csv(
    os.path.join("data", "example_data", "example_attributes.csv"), index_col="pid"
)
trips.head(10)

In [3]:
persons.head(10)

#### Step 2:

Load the travel diary data:

In [4]:
population = read.load_travel_diary(trips, persons, trip_freq_as_person_freq=True)

#### Step 3:

Check everything is as expected. PAM will try to infer activities from trip data, including for arbitrarily complex sequences of nested tours.

However, trip purpose can be encoded in a variety of ways. PAM will try to make a sensible inference based on the data provided. If something looks wrong, check the docs, then consider raising an issue. The team are keen to support you!
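
To give a feel for what inferring activities from trips means, here is a deliberately simplified sketch. It is not PAM's implementation: it assumes the day starts at home, that each trip's `purp` names the activity at its destination, and that the trips are already in order. The `Activity` class and `infer_activities` function are hypothetical names.

```python
from dataclasses import dataclass

END_OF_DAY = 24 * 60  # minutes from midnight


@dataclass
class Activity:
    act: str    # activity type, e.g. "home" or "work"
    zone: str   # zone where the activity takes place
    start: int  # start time in minutes from midnight
    end: int    # end time in minutes from midnight


def infer_activities(trips, home_zone):
    """Fill the gaps around a person's ordered trips with activities."""
    activities = []
    # Assumption: the person starts the day at home.
    act, zone, start = "home", home_zone, 0
    for trip in trips:
        # The current activity ends when the next trip departs.
        activities.append(Activity(act, zone, start, trip["tst"]))
        # The next activity starts at the destination when the trip arrives.
        act, zone, start = trip["purp"], trip["dzone"], trip["tet"]
    # The final activity runs to the end of the day.
    activities.append(Activity(act, zone, start, END_OF_DAY))
    return activities


example_trips = [
    {"ozone": "a", "dzone": "b", "purp": "work", "tst": 450, "tet": 495},
    {"ozone": "b", "dzone": "a", "purp": "home", "tst": 1020, "tet": 1065},
]

for activity in infer_activities(example_trips, home_zone="a"):
    print(activity)
```

Comparing this toy output with `person.print()` for a real person is one way to spot purposes that have been encoded differently from what you expected.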

In [5]:
household = population.households["census_12"]
person = household.people["census_12"]
person.print()

In [6]:
person.plot()

## Write Tabular Data

PAM can write to its preferred tabular format using `pam.write.to_csv`. This outputs trip legs, household attributes and person attributes tables. Where sufficient geometries are found, PAM will write spatial data as GeoJSON.

In [7]:
from pam import write

write.to_csv(population, dir="tmp")

PAM can also write directly to O-D matrices using `pam.write.write_od_matrices`. These can optionally be segmented (read the docs), but trip weighting (frequency) is not currently supported.
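
For readers new to the term, an O-D (origin-destination) matrix counts trips between each pair of zones. The sketch below illustrates the concept on toy data with unweighted counts; the function names are hypothetical and this is not how PAM builds or formats its output.

```python
from collections import Counter


def od_counts(trips) -> Counter:
    """Count trips for each (origin zone, destination zone) pair."""
    return Counter((trip["ozone"], trip["dzone"]) for trip in trips)


def as_matrix(counts, zones) -> list:
    """Arrange the counts as rows of origins and columns of destinations."""
    return [[counts[(origin, destination)] for destination in zones] for origin in zones]


toy_trips = [
    {"ozone": "a", "dzone": "b"},
    {"ozone": "a", "dzone": "b"},
    {"ozone": "b", "dzone": "a"},
]

counts = od_counts(toy_trips)
print(as_matrix(counts, zones=["a", "b"]))  # [[0, 2], [1, 0]]
```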


In [8]:
write.write_od_matrices(population, "tmp")

## Pickle

This is not a tabular format, but if you've read this far you might like to know that there is a `Population.pickle` method:

In [9]:
population.pickle(os.path.join("tmp", "population.pickle"))
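
If the method writes a standard Python pickle (an assumption, please check the docs), the file can be read back with the standard library. The sketch below shows the round trip on a stand-in dictionary rather than a real `Population`, so it runs without PAM installed.

```python
import os
import pickle
import tempfile

# Stand-in for a population object; a real Population is assumed to pickle similarly.
stand_in_population = {"census_12": {"hzone": "a", "trips": 2}}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "population.pickle")

    # Write the object to disk.
    with open(path, "wb") as file:
        pickle.dump(stand_in_population, file)

    # Read it back into memory.
    with open(path, "rb") as file:
        restored = pickle.load(file)

print(restored == stand_in_population)  # True
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code.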