<a href="https://colab.research.google.com/github/destruc-Harun/Vehicle-Trajectory-Anomaly-detection-with-Deep-Neural-Network/blob/main/TrajectoryAnomalyDetection.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook builds a taxi trajectory dataset of source-destination pairs, trajectories, and target labels (normal vs. anomalous) in several steps: collecting routes with the Google Maps API, labeling them, preprocessing, and finally feeding the data into neural network models (LSTM, Bi-LSTM, CNN, GRU, and two hybrid models, CNN-LSTM and CNN-GRU) for prediction.

### Step 1: Data Collection

#### 1.1. Google Maps API Setup
To use the Google Maps API, you need an API key. Set up a project on the [Google Cloud Platform](https://cloud.google.com/), enable the Directions API for it (the code below calls that service), and create an API key.
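Rather than pasting the key into the notebook, it can be read from the environment. The sketch below is one way to do that; the variable name `GOOGLE_MAPS_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not part of the original notebook.

```python
import os

def load_api_key(env_var="GOOGLE_MAPS_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending requests without a key
        raise RuntimeError(f"Set the {env_var} environment variable before requesting routes.")
    return key
```

With the variable set, `API_KEY = load_api_key()` could replace the hard-coded placeholder string.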

#### 1.2. Generate Trajectories
Use the Google Maps Directions API to get routes between random source and destination pairs within New York City.



In [None]:
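import requests
import random

# Set up your Google Maps API key
API_KEY = 'YOUR_GOOGLE_MAPS_API_KEY'
DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"

def get_route(source, destination):
    """Return the encoded overview polyline of the first route, or None if the API status is not 'OK'."""
    # Pass query parameters via `params` so requests URL-encodes them
    params = {"origin": source, "destination": destination, "key": API_KEY}
    response = requests.get(DIRECTIONS_URL, params=params, timeout=10)
    data = response.json()
    # Any status other than 'OK' (e.g. an invalid key or no route found) yields None
    if data.get('status') == 'OK':
        return data['routes'][0]['overview_polyline']['points']
    return None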

def generate_random_coordinates():
    lat = random.uniform(40.5774, 40.9176)
    lon = random.uniform(-74.15, -73.7004)
    return f"{lat},{lon}"

# Generate random source and destination pairs
source = generate_random_coordinates()
destination = generate_random_coordinates()
route = get_route(source, destination)
print("Generated route:", route)


Generated route: None


### Step 2: Labeling Data
Label some trajectories as normal (1) and some as anomalous (0). Anomalous trajectories can be simulated by introducing deviations in normal routes or using random trajectories that are significantly different from the normal routes.
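One way to "introduce deviations in normal routes" is to add random noise to the interior points of a decoded route while keeping its endpoints. The sketch below assumes the route is already available as a list of `(lat, lon)` pairs; the function name `perturb_route` and the default `noise_std=0.01` (degrees) are illustrative choices, not values from the notebook.

```python
import numpy as np

def perturb_route(points, noise_std=0.01, seed=None):
    """Return a copy of `points` with Gaussian noise added to the interior points.

    `points` is a sequence of (lat, lon) pairs. The first and last points are
    left unchanged so the trip keeps the same source and destination.
    """
    rng = np.random.default_rng(seed)
    route = np.asarray(points, dtype=np.float64).copy()
    if len(route) > 2:
        # Only the interior points are shifted
        route[1:-1] += rng.normal(0.0, noise_std, size=route[1:-1].shape)
    return route

normal = [(40.70, -74.00), (40.71, -73.99), (40.72, -73.98), (40.73, -73.97)]
anomalous = perturb_route(normal, seed=0)
```

A larger `noise_std` produces stronger detours; how large a deviation should count as anomalous is a modeling decision the notebook leaves open.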



In [26]:
from IPython.display import clear_output

def generate_anomalous_route():
    # Placeholder for an anomalous route: this returns a single random
    # "lat,lon" string, not an encoded polyline or a deviated path
    return generate_random_coordinates()

# Example of generating a dataset
dataset = []
for _ in range(1000):  # Generate 1000 trajectories
    source = generate_random_coordinates()
    destination = generate_random_coordinates()
    normal_route = get_route(source, destination)
    if normal_route:
        dataset.append((source, destination, normal_route, 1))  # Normal route

    # Generate an anomalous route
    anomalous_route = generate_anomalous_route()
    dataset.append((source, destination, anomalous_route, 0))  # Anomalous route

print("Dataset:", dataset)

# Clear the output
clear_output(wait=True)


Dataset: [('40.7974301158683,-73.84586403217672', '40.58731424348921,-74.06744637767487', '40.899858810890564,-73.89769116697296', 0), ('40.88125810051369,-73.93112371045703', '40.85545625251363,-73.96378703246432', '40.74187505572564,-74.1090008564768', 0), ('40.86200651715343,-73.8474438752024', '40.90166138193207,-74.07826445913614', '40.77250309014826,-73.93933162579502', 0), ('40.61513192407044,-73.87456225070066', '40.79577731695874,-74.07188401233547', '40.59490596975614,-73.87673421513918', 0), ('40.89419197354356,-73.921194792142', '40.85188068666483,-73.9460196897793', '40.708705544868586,-74.01901368047574', 0), ('40.75374176493428,-73.95589947068135', '40.833471492821374,-73.90279977509559', '40.86440342841539,-73.86957109636836', 0), ('40.91205747097335,-73.80148507335807', '40.79257628303758,-73.94508106093697', '40.69091929374031,-73.86532348242922', 0), ('40.6729155726817,-73.99333266736285', '40.770808128265216,-73.78484626850702', '40.81400369436552,-73.78566925737616

### Step 3: Preprocessing Data

1. **Polyline Decoding**: Decode each polyline string into a list of coordinates.
2. **Error Handling**: Wrap the decoding in a try-except block to catch any errors.
3. **Empty or Invalid Routes**: Skip routes whose decoded coordinate list is empty or invalid.
4. **Normalization**: Scale the coordinates of each route with min-max normalization, as an example.
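To make the decoding step concrete, here is a minimal pure-Python sketch of the encoded polyline format: each value is a signed delta stored in 5-bit chunks, offset by 63, at a precision of five decimal places. It is an independent illustration, not the implementation used by the `polyline` package, and `decode_polyline` is a hypothetical name.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    coords = []
    index = lat = lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        for is_lng in (False, True):
            shift = result = 0
            while True:
                b = ord(encoded[index]) - 63   # undo the ASCII offset
                index += 1
                result |= (b & 0x1F) << shift  # collect the next 5-bit chunk
                shift += 5
                if b < 0x20:                   # continuation bit not set: last chunk
                    break
            # The lowest bit carries the sign; the rest is the magnitude
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lng:
                lng += delta
            else:
                lat += delta
        coords.append((lat / factor, lng / factor))
    return coords

print(decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@"))
# [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

The string used here is the worked example from Google's description of the format, which makes it a convenient check that decoding behaves as expected.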


In [None]:
!pip install polyline

Collecting polyline
  Downloading polyline-2.0.2-py3-none-any.whl (6.0 kB)
Installing collected packages: polyline
Successfully installed polyline-2.0.2


In [None]:
import numpy as np
import polyline

def preprocess_route(polyline_str):
    try:
        # Decode polyline into coordinates
        coordinates = polyline.decode(polyline_str)
        # Check if coordinates are empty or not valid
        if not coordinates:
            return None
        # Normalize each axis to [0, 1] per route (min-max normalization)
        coordinates = np.array(coordinates, dtype=np.float32)
        min_vals = coordinates.min(axis=0)
        value_range = coordinates.max(axis=0) - min_vals
        value_range[value_range == 0] = 1.0  # avoid division by zero on a constant axis
        coordinates = (coordinates - min_vals) / value_range
        return coordinates
    except Exception as e:
        print(f"Error decoding polyline: {e}")
        return None

# Preprocess the entire dataset with error handling
preprocessed_data = []
for source, destination, route, label in dataset:
    preprocessed_route = preprocess_route(route)
    if preprocessed_route is not None:
        preprocessed_data.append((source, destination, preprocessed_route, label))

print("Preprocessed data:", preprocessed_data)


Streaming output truncated to the last 5000 lines.
       [0.6233766 , 0.82608694],
       [0.48051953, 0.67391306],
       [0.5714286 , 0.9347826 ],
       [0.7012987 , 0.76086956],
       [0.8571429 , 0.5217391 ],
       [0.68831176, 0.2608696 ],
       [0.83116883, 0.        ],
       [1.        , 0.23913048],
       [0.84415585, 0.5217391 ],
       [1.        , 0.76086956],
       [0.8701299 , 1.        ]], dtype=float32), 0), ('40.65704581672286,-73.83820789174607', '40.77575011045524,-73.71599570782607', array([[1.        , 0.45121953],
       [0.89189196, 0.5853659 ],
       [0.71621627, 0.46341464],
       [0.5810811 , 0.3292683 ],
       [0.44594598, 0.2195122 ],
       [0.2972973 , 0.35365853],
       [0.16216214, 0.24390244],
       [0.        , 0.08536586],
       [0.17567568, 0.        ],
       [0.2702703 , 0.14634147],
       [0.40540543, 0.0487805 ],
       [0.5810811 , 0.18292683],
       [0.71621627, 0.31707317],
       [0.5405406 , 0.46341464],
       [


### Step 4: Prepare Data for Neural Network
Format the data appropriately, split into training and testing sets, and convert to tensors.


In [None]:
# Convert preprocessed data to tensors
def convert_to_tensor(data):
    X = []
    y = []
    route_shape = None
    for _, _, route, label in data:
        # Check if route shape is consistent
        if route_shape is None:
            route_shape = route.shape
        elif route.shape != route_shape:
            continue  # Skip routes with different shapes
        X.append(route)
        y.append(label)
    X = np.array(X, dtype=np.float32)
    y = np.array(y, dtype=np.int32)
    return X, y

X, y = convert_to_tensor(preprocessed_data)

# Split data into training and testing sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Check shapes of the data
print("X_train shape:", X_train.shape)
print("y_train shape:", y_train.shape)
print("X_test shape:", X_test.shape)
print("y_test shape:", y_test.shape)


X_train shape: (466, 18, 2)
y_train shape: (466,)
X_test shape: (117, 18, 2)
y_test shape: (117,)
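The conversion above keeps only routes whose shape matches the first route it sees, so every route of a different length is dropped. A possible alternative is to pad or truncate all routes to a common length. The sketch below uses only NumPy; `pad_routes` and the default `pad_value=-1.0` (chosen because it lies outside the normalized range [0, 1]) are assumptions for illustration.

```python
import numpy as np

def pad_routes(routes, max_len=None, pad_value=-1.0):
    """Stack variable-length (n_i, 2) routes into one (N, max_len, 2) float32 array."""
    if max_len is None:
        max_len = max(len(r) for r in routes)
    batch = np.full((len(routes), max_len, 2), pad_value, dtype=np.float32)
    for i, route in enumerate(routes):
        route = np.asarray(route, dtype=np.float32)[:max_len]  # truncate long routes
        batch[i, :len(route)] = route                          # shorter routes keep the padding
    return batch

routes = [np.full((3, 2), 0.5), np.ones((5, 2))]
padded = pad_routes(routes)
print(padded.shape)  # (2, 5, 2)
```

If padding is used, a Keras `Masking(mask_value=-1.0)` layer placed before a recurrent layer is one way to have the padded steps skipped.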


### Ensure Correct Input Shape for Neural Networks
Make sure the input shape for your neural network matches the shape of your preprocessed data. Here each route is a sequence of coordinates, so the input shape is (time steps, features), which is (18, 2) for the data above:


In [None]:
input_shape = (X_train.shape[1], X_train.shape[2])


### Step 5: Define Neural Network models and train
Using TensorFlow and Keras, we define and train four models: LSTM, Bi-LSTM, CNN, and GRU.

LSTM Model



In [None]:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Define LSTM model
def create_lstm_model(input_shape):
    model = Sequential()
    model.add(LSTM(50, input_shape=input_shape, return_sequences=False))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

lstm_model = create_lstm_model(input_shape)

# Train the model
lstm_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b02107dd720>

Bi-LSTM Model

In [None]:
from tensorflow.keras.layers import Bidirectional

# Define Bi-LSTM model
def create_bilstm_model(input_shape):
    model = Sequential()
    model.add(Bidirectional(LSTM(50), input_shape=input_shape))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

bilstm_model = create_bilstm_model(input_shape)

# Train the model
bilstm_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b0210fb2770>

CNN Model

In [None]:
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten

# Define CNN model
def create_cnn_model(input_shape):
    model = Sequential()
    model.add(Conv1D(16, kernel_size=3, activation='relu', input_shape=input_shape))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(32, kernel_size=3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

cnn_model = create_cnn_model(input_shape)

# Train the model
cnn_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b0206ac8df0>

GRU Model

In [None]:
from tensorflow.keras.layers import GRU

# Define GRU model
def create_gru_model(input_shape):
    model = Sequential()
    model.add(GRU(50, input_shape=input_shape))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

gru_model = create_gru_model(input_shape)

# Train the model
gru_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b020b559a80>

### Step 6: Define Hybrid CNN-LSTM Model and train
We will define a hybrid CNN-LSTM model. The CNN layers will capture spatial features from the input sequences, and the LSTM layers will capture temporal dependencies.


In [None]:
def create_hybrid_cnn_lstm_model(input_shape):
    model = Sequential()
    # CNN layers
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    # LSTM layer
    model.add(LSTM(50, return_sequences=False))
    # Fully connected layer
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

input_shape = (X_train.shape[1], X_train.shape[2])
hybrid_model = create_hybrid_cnn_lstm_model(input_shape)

# Train the model
hybrid_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b020aa33490>
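It is worth checking how many time steps reach the LSTM after the convolution and pooling layers, since short routes shrink quickly. The sketch below applies the standard output-length formulas for `'valid'` padding (the Keras default for `Conv1D` and `MaxPooling1D`) to the 18-step input used here; the helper names are hypothetical.

```python
def conv1d_out_len(n, kernel_size, stride=1):
    """Output length of a 1D convolution with 'valid' padding."""
    return (n - kernel_size) // stride + 1

def pool1d_out_len(n, pool_size):
    """Output length of 1D max pooling with 'valid' padding and stride == pool_size."""
    return (n - pool_size) // pool_size + 1

def trace_lengths(n, layers):
    """Return the sequence length after each (kind, size) layer in turn."""
    lengths = []
    for kind, size in layers:
        n = conv1d_out_len(n, size) if kind == "conv" else pool1d_out_len(n, size)
        lengths.append(n)
    return lengths

# Conv1D(k=3) -> MaxPooling1D(2) -> Conv1D(k=3) -> MaxPooling1D(2), starting from 18 steps
print(trace_lengths(18, [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 2)]))
# [16, 8, 6, 3]
```

With these settings the recurrent layer sees only 3 time steps of 64 features each, so inputs much shorter than 18 points would leave nothing for it to process.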

Define Hybrid CNN-GRU Model

Similarly, we can define a hybrid CNN-GRU model.


In [None]:
def create_hybrid_cnn_gru_model(input_shape):
    model = Sequential()
    # CNN layers
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu', input_shape=input_shape))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
    model.add(MaxPooling1D(pool_size=2))
    # GRU layer
    model.add(GRU(50, return_sequences=False))
    # Fully connected layer
    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model

hybrid_gru_model = create_hybrid_cnn_gru_model(input_shape)

# Train the model
hybrid_gru_model.fit(X_train, y_train, epochs=10, batch_size=32, validation_data=(X_test, y_test))


Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.src.callbacks.History at 0x7b0207086da0>

### Step 7: Evaluation
Evaluate the four standalone models and the two hybrid models on the test set to compare their performance.


In [None]:
# Evaluate LSTM model
loss, accuracy = lstm_model.evaluate(X_test, y_test)
print(f'LSTM Model Test Accuracy: {accuracy * 100:.2f}%')

# Evaluate Bi-LSTM model
loss, accuracy = bilstm_model.evaluate(X_test, y_test)
print(f'Bi-LSTM Model Test Accuracy: {accuracy * 100:.2f}%')

# Evaluate CNN model
loss, accuracy = cnn_model.evaluate(X_test, y_test)
print(f'CNN Model Test Accuracy: {accuracy * 100:.2f}%')

# Evaluate GRU model
loss, accuracy = gru_model.evaluate(X_test, y_test)
print(f'GRU Model Test Accuracy: {accuracy * 100:.2f}%')

# Evaluate Hybrid CNN-LSTM model
loss, accuracy = hybrid_model.evaluate(X_test, y_test)
print(f'Hybrid CNN-LSTM Model Test Accuracy: {accuracy * 100:.2f}%')

# Evaluate Hybrid CNN-GRU model
loss, accuracy = hybrid_gru_model.evaluate(X_test, y_test)
print(f'Hybrid CNN-GRU Model Test Accuracy: {accuracy * 100:.2f}%')


LSTM Model Test Accuracy: 100.00%
Bi-LSTM Model Test Accuracy: 100.00%
CNN Model Test Accuracy: 100.00%
GRU Model Test Accuracy: 100.00%
Hybrid CNN-LSTM Model Test Accuracy: 100.00%
Hybrid CNN-GRU Model Test Accuracy: 100.00%
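Accuracy alone can hide a degenerate case: if the test labels all belong to one class, a model that always predicts that class scores 100%. In the run shown above, the route request returned `None` and the visible dataset entries all carry label 0, so it is worth confirming that both classes are present in `y_test` before interpreting these scores. The sketch below reports class counts and a confusion matrix with NumPy; `binary_report` is a hypothetical helper and the 0.5 threshold is an assumption.

```python
import numpy as np

def binary_report(y_true, y_prob, threshold=0.5):
    """Summarize binary predictions: class counts, confusion matrix, accuracy."""
    y_true = np.asarray(y_true).astype(int).ravel()
    y_pred = (np.asarray(y_prob).ravel() >= threshold).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "class_counts": {0: int(np.sum(y_true == 0)), 1: int(np.sum(y_true == 1))},
        "confusion": [[tn, fp], [fn, tp]],  # rows: true 0/1, columns: predicted 0/1
        "accuracy": (tp + tn) / len(y_true),
    }

# Toy example: every label is 0, so perfect accuracy says little about the model
report = binary_report([0, 0, 0, 0], [0.1, 0.2, 0.3, 0.4])
print(report)
```

For the models in this notebook, a call such as `binary_report(y_test, lstm_model.predict(X_test))` would give the same summary for the LSTM.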
