### Introduction

Welcome to this Jupyter Notebook, in which we analyze GPS data provided by Perigon AI, covering Yerevan for the <br>
years 2019-2022. The dataset includes user_id, latitude, longitude, and timestamp information. Our primary objective is to analyze movement <br>
patterns and parking behaviors in Yerevan using data processing and segmentation techniques. We will use several Python <br>
libraries, such as `NumPy`, `Pandas`, `GeoPandas`, `Shapely`, `PyProj`, `OSMnx`, `NetworkX`, and `Matplotlib`, for data manipulation and <br>
analysis, along with `Trackintel` and custom libraries for processing, segmentation, and related tasks.

In the following sections, we will import the necessary libraries and start by cleaning and preparing the dataset for analysis. The <br>
analysis will focus on deriving staypoints from positionfixes, performing trajectory segmentation, and visualizing the results to <br>
gain insights into mobility patterns within the city. The use of `tqdm` will help track the progress of our computations, and <br>
warnings will be suppressed for a cleaner output.

In [2]:
import pandas as pd
import geopandas as gpd
import trackintel as ti

import lib.process as prcs
import lib.segmentation as seg

from tqdm import tqdm
import warnings

warnings.filterwarnings("ignore")

### 1. GPS Data Pre-Processing

In the context of our study on parking pattern analysis in Yerevan, the GPS data preprocessing stage is fundamental <br>
to refining the raw location data sourced from smartphones. This crucial step involves meticulous cleaning and validation <br>
procedures to enhance the accuracy of the dataset. Tasks such as handling missing or inaccurate data points, filtering out <br>
anomalies, and standardizing data formats are integral to ensuring the reliability of the information. The effectiveness of <br>
the parking pattern analysis heavily relies on a well-executed preprocessing stage, setting the groundwork for meaningful <br>
insights into the dynamics of parking behavior within the city.

#### 1.1 Loading and Combining Data
In this section, we will load the position fixes from multiple CSV files and combine them into a single DataFrame. <br> 
The data files are stored in the `./data/sample/` directory and are named sequentially from `sample_0.csv` to `sample_18.csv`. <br>
We will use a loop to read each file, append it to a list, and then concatenate all the DataFrames into one. This will create <br>
a comprehensive dataset containing position fixes for further analysis.

In [2]:
pfs = []

for i in range(19):
    pos = prcs.read_positionfixes(f'./data/sample/sample_{i}.csv')
    pfs.append(pos)

pfs = pd.concat(pfs, ignore_index=True)
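
The loading loop above can also be written without hard-coding the number of files. The sketch below is a minimal illustration under stated assumptions: it uses plain `pandas.read_csv` as a stand-in for `prcs.read_positionfixes` (whose behaviour is not shown in this notebook), the helper name `load_sample_files` is hypothetical, and the two demo files are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sample_files(directory, pattern="sample_*.csv"):
    """Read every CSV matching `pattern` in `directory` and stack them into one DataFrame."""
    # Sort by the trailing number so that sample_2 comes before sample_10
    paths = sorted(Path(directory).glob(pattern), key=lambda p: int(p.stem.split("_")[-1]))
    frames = [pd.read_csv(path) for path in paths]
    return pd.concat(frames, ignore_index=True)


# Tiny self-contained demo with two made-up files
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        pd.DataFrame(
            {
                "user_id": [f"u{i}"],
                "latitude": [40.18 + i * 0.01],
                "longitude": [44.51],
                "timestamp": ["2021-05-01 08:00:00"],
            }
        ).to_csv(Path(tmp) / f"sample_{i}.csv", index=False)

    demo = load_sample_files(tmp)

print(len(demo), list(demo.columns))
```

Passing `ignore_index=True` to `pd.concat` gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.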

#### 1.2 Cleaning and Filtering Data
After loading and combining the position fixes, we will perform data cleaning to remove any duplicates and filter <br>
the points to retain only those within Yerevan's boundaries. This ensures the accuracy and relevance of our dataset <br>
for further analysis. The cleaned data will be stored in a CSV file named `./data/positionfixes.csv` and read <br>
into a trackintel positionfixes structure for standardized processing.

In [3]:
# Removing duplicates if any (drop_duplicates returns a new DataFrame, so reassign it)
pfs = pfs.drop_duplicates()

# Extracting points that are in Yerevan's polygon
pfs = prcs.filter_yerevan_data(pfs)

# Store the positionfixes in a csv
pfs.to_csv('./data/positionfixes.csv')
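
As a small illustration of the two cleaning steps, the sketch below removes duplicates and then applies a spatial filter. It is only an approximation under stated assumptions: `filter_bbox` is a hypothetical helper, and the bounding box is a rough, illustrative rectangle around Yerevan rather than the city polygon that `prcs.filter_yerevan_data` presumably uses.

```python
import pandas as pd

# Rough, illustrative bounding box around Yerevan (an assumption, not the city polygon)
LAT_MIN, LAT_MAX = 40.06, 40.26
LON_MIN, LON_MAX = 44.36, 44.64


def filter_bbox(df):
    """Keep rows whose coordinates fall inside the illustrative bounding box."""
    inside = df["latitude"].between(LAT_MIN, LAT_MAX) & df["longitude"].between(LON_MIN, LON_MAX)
    return df[inside].reset_index(drop=True)


raw = pd.DataFrame(
    {
        "user_id": ["a", "a", "b"],
        "latitude": [40.18, 40.18, 41.00],
        "longitude": [44.51, 44.51, 44.51],
        "timestamp": ["2021-05-01 08:00:00", "2021-05-01 08:00:00", "2021-05-01 09:00:00"],
    }
)

deduplicated = raw.drop_duplicates()  # returns a new DataFrame; `raw` itself is unchanged
cleaned = filter_bbox(deduplicated)

print(len(raw), len(deduplicated), len(cleaned))
```

Note that `DataFrame.drop_duplicates` does not modify the frame in place by default, so its result has to be assigned to a variable to take effect.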

In [6]:
# Read the positionfixes extracted from the sample data
pfs = []

for i in range(7):
    pfs.append(ti.io.read_positionfixes_csv(f'./data/positionfixes/positionfixes_{i}.csv', sep=",", tz='UTC', index_col=0, crs=prcs.CRS.from_epsg(4326)))

pfs = pd.concat(pfs, ignore_index=True)

### 2. Extracting People
After filtering and cleaning the data, the next step involves grouping the data according to each device. This allows <br>
us to analyze the movement patterns of individual users and extract valuable information. By tracking each device individually, <br>
we can calculate movement modes and construct trips. Additionally, it helps in understanding trends and preferences that may <br>
inform urban planning and transportation policies.

In [None]:
# Separating by each device id
people = prcs.extract_people(pfs)
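
The internals of `prcs.extract_people` are not shown here. As a rough sketch of the underlying idea, assuming the `user_id` and `timestamp` columns named in the introduction, the fixes can be split into one time-ordered DataFrame per device with a pandas `groupby`; the sample rows are made up.

```python
import pandas as pd

fixes = pd.DataFrame(
    {
        "user_id": ["b", "a", "b", "a"],
        "latitude": [40.18, 40.19, 40.20, 40.21],
        "longitude": [44.51, 44.52, 44.53, 44.54],
        "timestamp": pd.to_datetime(
            ["2021-05-01 09:00", "2021-05-01 08:30", "2021-05-01 08:00", "2021-05-01 08:00"]
        ),
    }
)

# One time-ordered DataFrame per device
per_user = {
    user_id: group.sort_values("timestamp").reset_index(drop=True)
    for user_id, group in fixes.groupby("user_id")
}

for user_id, group in per_user.items():
    print(user_id, len(group), group["timestamp"].is_monotonic_increasing)
```

Sorting each group by time matters because the later steps (staypoints, segments) work on consecutive fixes.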

#### 2.1 Generating Staypoints
After extraction, we still need some cleaning to get rid of noisy and uninformative data. We use the <br>
`generate_staypoints()` method of the trackintel library to extract staypoints from the GPS data of each individual in <br>
the `people` list. By iterating over each person and calling this method, we identify locations where the device remained stationary <br>
for a significant duration, indicating potential destinations or points of interest. This process discards all days for which <br>
no staypoints can be generated. This significantly reduces the number of points, but it increases accuracy because we work with <br>
more representative data. The staypoints are then stored for future analysis and reuse, contributing to a deeper understanding of individual <br>
mobility patterns and urban behavior.



In [13]:
# Generate staypoints and store them in a file to reuse
for person in tqdm(people, colour='GREEN', desc='SP Generated: '):
    person.generate_staypoints()

SP Generated: 100%|██████████| 508478/508478 [1:17:39<00:00, 109.12it/s]
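
To make the idea of a staypoint concrete, here is a minimal pure-Python sketch of a sliding-window detector: a stay is reported when consecutive fixes remain within a distance threshold of an anchor fix for at least a minimum duration. This is not trackintel's implementation; the function names and the threshold values (100 m, 5 minutes) are illustrative assumptions.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


def sliding_staypoints(fixes, dist_threshold_m=100.0, time_threshold=timedelta(minutes=5)):
    """fixes: time-ordered list of (timestamp, lat, lon). Returns a list of staypoint dicts."""
    staypoints = []
    i, n = 0, len(fixes)
    while i < n:
        j = i + 1
        # Extend the window while fixes stay within the distance threshold of the anchor fix
        while j < n and haversine_m(fixes[i][1], fixes[i][2], fixes[j][1], fixes[j][2]) <= dist_threshold_m:
            j += 1
        window = fixes[i:j]
        if window[-1][0] - window[0][0] >= time_threshold:
            staypoints.append(
                {
                    "started_at": window[0][0],
                    "finished_at": window[-1][0],
                    "lat": sum(f[1] for f in window) / len(window),
                    "lon": sum(f[2] for f in window) / len(window),
                }
            )
            i = j  # continue after the stay
        else:
            i += 1  # no stay anchored at this fix; move on
    return staypoints


t0 = datetime(2021, 5, 1, 8, 0)
track = [
    (t0, 40.1800, 44.5100),
    (t0 + timedelta(minutes=3), 40.1801, 44.5101),
    (t0 + timedelta(minutes=7), 40.1800, 44.5102),
    (t0 + timedelta(minutes=9), 40.2000, 44.5400),  # device has moved away
    (t0 + timedelta(minutes=10), 40.2100, 44.5500),
]
stays = sliding_staypoints(track)
print(len(stays), stays[0]["started_at"], stays[0]["finished_at"])
```

Days on which no window satisfies both thresholds produce no staypoints, which is why such days drop out of the cleaned data.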


In [5]:
# Save the pfs and sp after generation of staypoints
all_pfs = []
all_sp = []

for person in tqdm(people, colour='GREEN', desc='SP Generated: '):
    all_pfs.append(person.pfs)
    all_sp.append(person.sp)

all_pfs = pd.concat(all_pfs, ignore_index=True)
all_sp = pd.concat(all_sp, ignore_index=True)

all_pfs.to_csv('./data/inf_positionfixes.csv')
all_sp.to_csv('./data/inf_staypoints.csv')

#### 2.2 Reading clean positionfixes and staypoints and updating people
After cleaning, we can proceed with the already clean data. Let us load the positionfixes and staypoints <br>
for further processing and trajectory formation. Then we can extract people and attach the stored staypoints to them.

In [18]:
# Reading positionfixes that have staypoints
pfs_clean = [] 

for i in range(2):
    pfs_clean.append(ti.io.read_positionfixes_csv(f'./data/positionfixes/inf_positionfixes_{i}.csv', sep=",", tz='UTC', index_col=0, crs=prcs.CRS.from_epsg(4326)))

pfs_clean = pd.concat(pfs_clean, ignore_index=True)

# Extracting people out of clean pfs
people = prcs.extract_people(pfs_clean)

People Extracted: 100%|██████████| 10021/10021 [00:12<00:00, 826.74it/s]


In [20]:
# Getting the staypoints that are already stored and putting them in people
sp_clean = ti.io.read_staypoints_csv('./data/positionfixes/inf_staypoints.csv', sep=",", tz='UTC', index_col=0, crs=prcs.CRS.from_epsg(4326))
prcs.update_staypoints(people, sp_clean)

### 3. Micro-Segmentation
Here we carry out the process of segmenting GPS data into meaningful segments representing distinct movement patterns, <br>
such as walking or driving. The `convert_to_segments` function takes a sequence of position fixes (pfs) and calculates <br>
the distance, duration, and average speed between consecutive fixes. It also constructs a path through the street map <br>
based on the fixes' geographical coordinates, providing insights into the route taken by the user.
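
The per-pair quantities described above (distance, duration, and average speed between consecutive fixes) can be sketched in a few lines. This is a minimal illustration, not the body of `convert_to_segments`: the function names are hypothetical, the haversine formula with a mean Earth radius of 6,371 km is an assumed distance metric, and the street-map path construction is left out.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))


def consecutive_metrics(fixes):
    """fixes: time-ordered list of (timestamp, lat, lon). Returns one dict per consecutive pair."""
    rows = []
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(fixes, fixes[1:]):
        distance = haversine_m(lat1, lon1, lat2, lon2)
        duration = (t2 - t1).total_seconds()
        # Guard against identical timestamps to avoid dividing by zero
        speed = distance / duration if duration > 0 else float("nan")
        rows.append({"distance_m": distance, "duration_s": duration, "speed_mps": speed})
    return rows


fixes = [
    (datetime(2021, 5, 1, 8, 0, 0), 40.1800, 44.5100),
    (datetime(2021, 5, 1, 8, 1, 0), 40.1810, 44.5100),
    (datetime(2021, 5, 1, 8, 1, 30), 40.1810, 44.5120),
]
rows = consecutive_metrics(fixes)
for row in rows:
    print(round(row["distance_m"], 1), row["duration_s"], round(row["speed_mps"], 2))
```

A list of n fixes yields n-1 rows, one for each pair of neighbours.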


#### 3.1 Storage of micro-segments
As segments are created for every pair of consecutive points on a daily basis, the `save_segments` function is used <br>
to save the segments generated from the GPS data for each user into CSV files, enabling storage and future analysis. <br>

In [None]:
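def save_segments(segments, index):
    frames = []

    # `segments` holds one list per user, each containing one entry per day
    for item in segments:
        for day_segments in item:
            # Append every day's frame, not only the last one of each user
            frames.append(pd.DataFrame(day_segments))

    s = pd.concat(frames, ignore_index=True)
    s.to_csv(f'./data/segments/sample_{index}.csv')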

#### 3.2 Creation of segments

Next, we generate segments from the cleaned position fixes of each user, one day at a time, and save them in batches. <br>
The saved segments from all users are then read back from the CSV files, concatenated into a single DataFrame, and written <br>
to a new CSV file. This consolidated dataset provides a comprehensive overview of movement patterns across different <br>
users, facilitating further analysis and insights into urban mobility behavior.



In [9]:
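# Create the segments from the clean pfs data for each user
segments = []

for i in tqdm(range(len(people))):
    pfs_days = people[i].group_pfs_by_date()
    segments_day = []
    
    for day in pfs_days:
        segment = seg.convert_to_segments(day)
        segments_day.append(segment)
    
    segments.append(segments_day)

    # Flush to disk every 25 users; i / 25 is a float, so files are named sample_0.0.csv, sample_1.0.csv, ...
    # Note: users processed after the last multiple of 25 stay in `segments` and are not written by this loop
    if (i % 25 == 0):
        save_segments(segments, i / 25)
        segments = []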

100%|██████████| 10021/10021 [18:31:27<00:00,  6.65s/it]   
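
The periodic save in the loop above can also be expressed as fixed-size batches. The sketch below is a minimal illustration with a hypothetical helper `batched` (not part of `lib.segmentation`): integer division keeps the batch indices as integers, and slicing guarantees that the final, shorter batch is emitted as well. The list of integers merely stands in for the `people` list.

```python
def batched(items, batch_size):
    """Yield (batch_index, batch) pairs; the final batch may be shorter than batch_size."""
    for start in range(0, len(items), batch_size):
        yield start // batch_size, items[start:start + batch_size]


users = list(range(10021))  # stand-in for the `people` list
batches = list(batched(users, 25))

print(len(batches), len(batches[-1][1]))
```

With this layout, each batch could be processed and written under its integer index, so no users are left unsaved at the end of the run.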


In [None]:
# Combining the output of segments
dfs = []

# Read each CSV file and append its DataFrame to the list
for i in range(1, 401):
    df = pd.read_csv(f'./data/segments/sample_{i}.0.csv')
    df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
    dfs.append(df)

# Concatenate all DataFrames in the list along rows
combined_df = pd.concat(dfs, ignore_index=True)

# Write the combined DataFrame to a new CSV file
combined_df.to_csv('./data/segments.csv', index=True)
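
The `Unnamed` columns filtered out above come from writing the DataFrame index to CSV without a column name. The short sketch below, using an in-memory buffer and made-up values, shows the effect and how `index=False` avoids it.

```python
import io

import pandas as pd

part = pd.DataFrame({"user_id": ["a", "b"], "speed": [1.2, 8.5]})

# Default to_csv writes the index as an unnamed first column
with_index = io.StringIO()
part.to_csv(with_index)
with_index.seek(0)
reloaded = pd.read_csv(with_index)

# index=False keeps the file limited to the real columns
without_index = io.StringIO()
part.to_csv(without_index, index=False)
without_index.seek(0)
reloaded_clean = pd.read_csv(without_index)

print(list(reloaded.columns))
print(list(reloaded_clean.columns))
```

Writing with `index=False` wherever the index carries no information would make the `str.contains('^Unnamed')` filter unnecessary on re-read.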

#### 3.3 Retrieval of Segments
The creation of the segments was a significant step; however, the processing does not end there. Let us load <br>
the stored segments for further processing.

In [34]:
# Reading the segment information
from shapely.wkt import loads

df = pd.read_csv('./data/segments.csv')
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

df['geom'] = df['geom'].apply(loads)
segments = gpd.GeoDataFrame(df, geometry='geom', crs=prcs.CRS.from_epsg(4326))

In [35]:
# Merging the segments by first grouping segments by user and by date

from tqdm import tqdm
merged_segments = []
adjusted_segments = []

user_segments = seg.split_segments_by_user(segments)

for user in tqdm(user_segments):
    day_segments = seg.split_segments_by_date(user)
    for segment in day_segments:
        merged = seg.merge_segments(segment, v_thresh=0.5)
        adjusted = seg.merge_segments(merged)
        
        merged_segments.append(merged)
        adjusted_segments.append(adjusted)

100%|██████████| 9998/9998 [05:27<00:00, 30.55it/s] 
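
What `seg.merge_segments` does internally is not shown in this notebook. One plausible reading, offered purely as an assumption, is that consecutive segments are merged when they fall on the same side of a speed threshold, so that runs of near-stationary or moving segments collapse into single longer ones. The sketch below illustrates that idea with a hypothetical function `merge_by_speed`; the field names, the threshold unit (m/s), and the sample values are all made up.

```python
def merge_by_speed(segments, v_thresh=0.5):
    """Merge consecutive segments that fall on the same side of v_thresh (assumed m/s).

    segments: time-ordered list of dicts with 'distance_m' and 'duration_s'.
    """
    merged = []
    for s in segments:
        # A segment counts as "moving" when its average speed reaches the threshold
        moving = s["duration_s"] > 0 and s["distance_m"] / s["duration_s"] >= v_thresh
        if merged and merged[-1]["moving"] == moving:
            # Same state as the previous run: accumulate distance and duration
            merged[-1]["distance_m"] += s["distance_m"]
            merged[-1]["duration_s"] += s["duration_s"]
        else:
            merged.append({"distance_m": s["distance_m"], "duration_s": s["duration_s"], "moving": moving})
    return merged


day = [
    {"distance_m": 2, "duration_s": 60},
    {"distance_m": 1, "duration_s": 30},
    {"distance_m": 300, "duration_s": 60},
    {"distance_m": 400, "duration_s": 60},
    {"distance_m": 1, "duration_s": 120},
]
merged_day = merge_by_speed(day)
print([(m["moving"], m["distance_m"], m["duration_s"]) for m in merged_day])
```

In this toy day, the five micro-segments collapse into a stationary run, a moving run, and another stationary run.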
