In [None]:
!pip install Palmto-gen

**Load Data:** The Palmto_gen package includes a sample file of 30k taxi trajectories from the city of Porto, which we use for demonstration.

In [None]:
import pkg_resources
import pandas as pd

sample_data_path = pkg_resources.resource_filename('Palmto_gen', 'data/porto_sample_data.pkl')
df = pd.read_pickle(sample_data_path)
df.head()

Next, we convert the trajectories into *'sentences'* using the following steps:
1. **Create Shapely Points:** For each latitude-longitude coordinate pair, create a Shapely Point object.
2. **Overlay a Grid:** Overlay a grid of a specific cell size over the area covered by the trajectories.
3. **Assign Unique IDs to Grid Cells:** Give each cell in the grid a unique ID.
4. **Merge the Shapely Points with the Cells:** Represent each point by the ID of the cell it falls into.
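
The idea behind these steps can be sketched without Shapely or GeoPandas, using plain arithmetic on planar coordinates. This is only an illustration, not the code inside `ConvertToToken`: the function names, the row-major cell numbering, and the toy bounds are all assumptions made for the example.

```python
import math

def point_to_cell_id(x, y, min_x, min_y, max_x, cell_size):
    """Return the ID of the grid cell containing (x, y).

    Cells are numbered row by row, starting at the lower-left corner
    (an assumed numbering scheme, chosen for illustration).
    """
    n_cols = math.ceil((max_x - min_x) / cell_size)
    col = int((x - min_x) // cell_size)
    row = int((y - min_y) // cell_size)
    return row * n_cols + col

def trajectory_to_sentence(points, min_x, min_y, max_x, cell_size):
    """Replace every point of a trajectory by the ID of its grid cell."""
    return [point_to_cell_id(x, y, min_x, min_y, max_x, cell_size)
            for x, y in points]

# Toy example: a 200 x 200 area split into 50 x 50 cells (4 columns).
toy_points = [(10, 10), (60, 10), (120, 70), (190, 160)]
toy_sentence = trajectory_to_sentence(toy_points, min_x=0, min_y=0,
                                      max_x=200, cell_size=50)
print(toy_sentence)
```

Each resulting list of cell IDs plays the role of a 'sentence', with the cell IDs as its tokens.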

In [None]:
import geopandas as gpd
from Palmto_gen import ConvertToToken

study_area_path = pkg_resources.resource_filename('Palmto_gen', 'data/porto.geojson')
study_area =  gpd.read_file(study_area_path)

TokenCreator = ConvertToToken(df, study_area, cell_size = 50)
grid, sentence_df = TokenCreator.create_tokens()
sentence_df.head()

**Create n-grams:** from the *'sentences'* we formed in the previous step.
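
To make the idea concrete, here is a minimal sketch of how bigram counts could be collected from token sentences. It is an assumption-laden illustration rather than the internals of `NgramGenerator`: the function name `build_bigram_counts` and the dictionary-of-counters layout are hypothetical, and only bigrams (n = 2) are shown.

```python
from collections import Counter, defaultdict

def build_bigram_counts(sentences):
    """Count how often each token is immediately followed by each other token."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # Pair every token with its successor: (t0, t1), (t1, t2), ...
        for current, following in zip(sentence, sentence[1:]):
            counts[current][following] += 1
    return counts

# Toy 'sentences' made of grid-cell IDs.
toy_sentences = [[0, 1, 6], [0, 1, 2], [0, 4]]
toy_counts = build_bigram_counts(toy_sentences)
print(dict(toy_counts))
```

In this toy corpus, token `0` is followed by `1` twice and by `4` once, so a generator sampling from these counts would pick `1` about two thirds of the time.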

In [None]:
from Palmto_gen import NgramGenerator

ngram_model = NgramGenerator(sentence_df)
ngrams, start_end_points = ngram_model.create_ngrams()

**Approach 1: Generating length-constrained trajectories from a given point**

Start by selecting a token from our bigram corpus, then specify the length (number of points) of each trajectory and the number of new trajectories we want. The model then constructs each trajectory by iteratively generating and appending points until the predetermined length is reached.
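
The iterative construction described above can be sketched as a weighted random walk over bigram counts. This is a simplified illustration under stated assumptions, not the implementation of `TrajGenerator`: the function name `generate_sentence`, the toy counts, and the early stop when a token has no known successor are all hypothetical choices.

```python
import random
from collections import Counter

def generate_sentence(bigram_counts, start_token, length, rng=None):
    """Grow a token sequence from start_token until it has `length` tokens."""
    rng = rng or random.Random()
    sentence = [start_token]
    while len(sentence) < length:
        followers = bigram_counts.get(sentence[-1])
        if not followers:
            # Dead end: the last token was never followed by anything.
            break
        tokens = list(followers.keys())
        weights = list(followers.values())
        # Sample the next token in proportion to its observed frequency.
        sentence.append(rng.choices(tokens, weights=weights, k=1)[0])
    return sentence

# Toy bigram counts in which every token has at least one successor.
toy_bigrams = {
    0: Counter({1: 2, 4: 1}),
    1: Counter({0: 1, 4: 1}),
    4: Counter({0: 3}),
}
toy_generated = generate_sentence(toy_bigrams, start_token=0, length=5,
                                  rng=random.Random(0))
print(toy_generated)
```

Passing a seeded `random.Random` makes the walk reproducible, which is convenient when comparing runs.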

In [None]:
from Palmto_gen import TrajGenerator

n = 10000  # number of new trajectories to generate
sentence_length = 30  # number of points in each generated trajectory
traj_generator = TrajGenerator(ngrams, start_end_points, n, grid)
new_trajs_app1, new_trajs_app1_gdf = traj_generator.generate_trajs_using_origin(sentence_length)
new_trajs_app1.head()

**Approach 2: Generating trajectories between two given points**

Select one origin and one destination point from one of our original trajectories and generate trajectories that connect these two points.
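
One naive way to connect two given tokens is rejection sampling: run bigram random walks from the origin and keep the first one that reaches the destination. The sketch below uses that swapped-in technique purely for illustration. It is not claimed to be how `generate_trajs_using_origin_destination` works, and the function name, the `max_length` and `max_attempts` parameters, and the toy counts are hypothetical.

```python
import random
from collections import Counter

def generate_between(bigram_counts, origin, destination,
                     max_length=50, max_attempts=1000, rng=None):
    """Return a token path from origin to destination, or None if none is found."""
    rng = rng or random.Random()
    for _ in range(max_attempts):
        path = [origin]
        while len(path) < max_length and path[-1] != destination:
            followers = bigram_counts.get(path[-1])
            if not followers:
                # Dead end: discard this walk and try again.
                break
            tokens = list(followers.keys())
            weights = list(followers.values())
            path.append(rng.choices(tokens, weights=weights, k=1)[0])
        if path[-1] == destination:
            return path
    return None

# Toy bigram counts: both routes out of token 0 lead to token 3.
toy_od_bigrams = {
    0: Counter({1: 1, 2: 1}),
    1: Counter({3: 1}),
    2: Counter({3: 1}),
}
toy_path = generate_between(toy_od_bigrams, origin=0, destination=3,
                            rng=random.Random(0))
print(toy_path)
```

Rejection sampling is simple but can be slow when the destination is rarely reached, which is why the attempt limit and the `None` return value are included.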

In [None]:
n = 10000
traj_generator = TrajGenerator(ngrams, start_end_points, n, grid)
new_trajs_app2, new_trajs_app2_gdf = traj_generator.generate_trajs_using_origin_destination()
new_trajs_app2.head()

**Save the new dataset.**

In [13]:
new_trajs_app1.to_pickle('generated_trajs_app1.pkl')
new_trajs_app1.to_csv('generated_trajs_app1.csv')

**Plot**: a sample of 1000 trajectories from the original dataset and the dataset generated using approach 1.

In [None]:
from Palmto_gen import DisplayTrajs

original_trajs = sentence_df['geometry'].sample(1000).to_list()
generated_trajs = new_trajs_app1_gdf['geometry'].sample(1000).to_list()

display_trajs = DisplayTrajs(original_trajs, generated_trajs)
display_trajs.display_maps()

**Plot**: a sample of 1000 trajectories from the original dataset and the dataset generated using approach 2.


In [None]:
original_trajs = sentence_df['geometry'].sample(1000).to_list()
generated_trajs = new_trajs_app2_gdf['geometry'].sample(1000).to_list()

display_trajs = DisplayTrajs(original_trajs, generated_trajs)
display_trajs.display_maps()

**Heat Map**: comparing the original trajectories with the trajectories generated using approach 1.

In [None]:
import matplotlib.pyplot as plt

sample_size = 5000
fig, axes = plt.subplots(1, 2, figsize=(20, 8))

display_trajs.plot_heat_map(sentence_df.sample(sample_size), study_area, axes[0], cell_size = 800)
axes[0].set_title('Original Trajectories')

display_trajs.plot_heat_map(new_trajs_app1_gdf.sample(sample_size), study_area, axes[1], cell_size = 800)
axes[1].set_title('Generated Trajectories')

plt.tight_layout()
plt.show()

**Heat Map**: comparing the original trajectories with the trajectories generated using approach 2.

In [None]:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(20, 10))

display_trajs.plot_heat_map(sentence_df.sample(sample_size), study_area, axes[0], cell_size = 800)
axes[0].set_title('Original Trajectories')

display_trajs.plot_heat_map(new_trajs_app2_gdf.sample(sample_size), study_area, axes[1], cell_size = 800)
axes[1].set_title('Generated Trajectories')

plt.tight_layout()
plt.show()