
# Wildlife Movement Prediction using Deep Learning 🦅

This project leverages deep learning techniques to analyze GPS tracking data of wildlife, focusing on bird migration patterns. Using geospatial and temporal data, we preprocess and visualize trajectories, compute inter-location distances using the Haversine formula, and apply sequential modeling techniques (e.g., GRU) to predict animal movement.

---

### 📂 Dataset Information
- **Source**: Movement Ecology Dataset (hosted via Google Drive)
- **Size**: ~90,000 entries
- **Fields**: Timestamp, Location (lat/lon), Species, Sensor Type, Vegetation Indexes, etc.

---

### 💻 Project Goals
- Clean and preprocess geospatial data
- Compute movement trajectories
- Build and train GRU-based deep learning model
- Predict next location(s) in movement sequence


In [None]:
%pip install gdown geopy pandas numpy matplotlib seaborn scikit-learn tensorflow --quiet

In [None]:
import numpy as np
import pandas as pd
import math

In [None]:
import gdown
import pandas as pd

# Google Drive file ID of the dataset
file_id = "1o1umq9xOuvhE7rKWpop82tvkKYjtIPyX"

# Direct-download URL for the file
download_url = f"https://drive.google.com/uc?id={file_id}"

# Define output file name
output_file = "migration_original.csv"

# Download the file
gdown.download(download_url, output_file, quiet=False)


In [None]:
# Load the dataset
df = pd.read_csv('migration_original.csv')
print(df.shape)
df.head()

In [None]:
# Inspect the unique values in every column
for column in df.columns:
    print(f'Unique values in "{column}":', df[column].unique(), "\n")

In [None]:
'''
    1. Some columns are entirely null or hold a single value across the whole dataset,
       so they carry no information for the model and are dropped.

    2. "individual-local-identifier" is identical to "tag-local-identifier" apart from an
       "A" suffix, so it is redundant.

    3. "ECMWF Interim Full Daily Invariant Low Vegetation Cover" and "ECMWF Interim Full Daily Invariant High Vegetation Cover"
       are complementary, so keeping both for training is unnecessary; one of them (here, Low) is dropped.
'''

# define Columns to drop
columns_to_drop = ["event-id","visible", "visible.1", "sensor-type", "individual-taxon-canonical-name", "study-name", "manually-marked-outlier",
                   "NCEP NARR SFC Vegetation at Surface", "individual-local-identifier", "ECMWF Interim Full Daily Invariant Low Vegetation Cover"]

# drop unwanted columns
data = df.drop(columns=columns_to_drop)
data.head()
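The first criterion above (all-null or single-valued columns) can also be checked programmatically instead of by eye. A minimal sketch using `DataFrame.nunique(dropna=False)`, shown on a small made-up frame; the helper name `uninformative_columns` and the toy values are illustrative, not taken from the real dataset:

```python
import numpy as np
import pandas as pd

def uninformative_columns(frame: pd.DataFrame) -> list:
    """Return columns that are entirely null or hold a single value.

    nunique(dropna=False) counts NaN as its own value, so an all-NaN
    column and a constant column both report exactly 1 distinct value.
    """
    counts = frame.nunique(dropna=False)
    return counts[counts <= 1].index.tolist()

# Toy frame (hypothetical values)
toy = pd.DataFrame({
    "timestamp": ["2010-01-01", "2010-01-02", "2010-01-03"],
    "visible": [True, True, True],            # single value
    "manually-marked-outlier": [np.nan] * 3,  # all null
    "location-lat": [52.1, 52.3, 52.2],
})
print(uninformative_columns(toy))
```

Redundant pairs such as the two identifier columns (criterion 2) are not caught this way and still need a manual check.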

In [None]:
# Check for data types and Null counts using info() method
data.info()

In [None]:
# -------------------- STEP 1: Parse Timestamps -------------------- #
# Ensure timestamps are in datetime format
data["timestamp"] = pd.to_datetime(data["timestamp"])

In [None]:
# -------------------- STEP 2: Sort Data by Tag and Time -------------------- #
# Sort by "tag-local-identifier", then "timestamp", so each bird's fixes are contiguous and in chronological order
data = data.sort_values(by=["tag-local-identifier", "timestamp"]).reset_index(drop=True)

In [None]:
# -------------------- STEP 3: Extract Date-Time Features -------------------- #
data["year"] = data["timestamp"].dt.year
data["month"] = data["timestamp"].dt.month
data["hour"] = data["timestamp"].dt.hour

In [None]:
# -------------------- STEP 4: Compute Time Difference per Bird -------------------- #
data["time_diff(hrs)"] = (
    data.groupby("tag-local-identifier")["timestamp"]
    .diff().dt.total_seconds() / 3600
)

# Replace NaN with 0 for the first row per bird (safe assignment)
data["time_diff(hrs)"] = data["time_diff(hrs)"].fillna(0)

In [None]:
data.head(10)

### Haversine Formula:

To calculate the distance between two consecutive fixes (the current and the previous latitude/longitude), we use the **Haversine formula**, which gives the great-circle distance between two points on the Earth's surface. It models the Earth as a sphere of radius $R$, a small approximation of its true ellipsoidal shape.

$$
a = \sin^2\left(\frac{\Delta\phi}{2}\right) + \cos(\phi_1) \cdot \cos(\phi_2) \cdot \sin^2\left(\frac{\Delta\lambda}{2}\right)
$$

$$
c = 2 \cdot \text{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right)
$$

$$
d = R \cdot c
$$

Where:
- $\phi_1, \phi_2$ are the latitudes of the two points in radians,
- $\lambda_1, \lambda_2$ are the longitudes of the two points in radians,
- $\Delta\phi = \phi_2 - \phi_1$ and $\Delta\lambda = \lambda_2 - \lambda_1$,
- $R$ is the Earth's radius (mean radius = 6,371 km),
- $d$ is the distance between the points in kilometers.
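As a quick numerical check of the formula, one degree of longitude along the equator should come out to $R \cdot \pi/180 \approx 111.19$ km, and a quarter meridian (equator to pole) to $R \cdot \pi/2 \approx 10{,}007.5$ km. A small NumPy version of the same three equations that also accepts arrays; the name `haversine_np` is our own, not a library function:

```python
import numpy as np

R_EARTH_KM = 6371.0  # mean Earth radius, as above

def haversine_np(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; inputs in degrees, scalars or arrays."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R_EARTH_KM * c

print(haversine_np(0.0, 0.0, 0.0, 1.0))   # one degree of longitude at the equator
print(haversine_np(0.0, 0.0, 90.0, 0.0))  # equator to North Pole
```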


In [None]:
# -------------------- STEP 5: Define Haversine Distance Function -------------------- #
def haversine(lat1, lon1, lat2, lon2):
    """Compute the great-circle distance (Haversine formula) between two GPS coordinates."""
    R = 6371  # Earth radius in kilometers
    phi1, phi2 = map(math.radians, [lat1, lat2])
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = math.sin(delta_phi / 2)**2 + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2)**2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    return R * c  # Distance in km

In [None]:
# -------------------- STEP 6: Compute Distance per Bird -------------------- #
# Compute previous lat/lon per bird before applying Haversine formula
data["prev_lat"] = data.groupby("tag-local-identifier")["location-lat"].shift(1)
data["prev_lon"] = data.groupby("tag-local-identifier")["location-long"].shift(1)

# Apply Haversine function to compute distances
data["distance(km)"] = data.apply(
    lambda row: haversine(row["prev_lat"], row["prev_lon"], row["location-lat"], row["location-long"])
    if pd.notna(row["prev_lat"]) and pd.notna(row["prev_lon"]) else 0, axis=1
)

# Drop temporary columns
data.drop(columns=["prev_lat", "prev_lon"], inplace=True)

In [None]:
# ------------------------ STEP 7: Compute Speed (Handle Zero Time Gaps) ------------------------ #
data["speed(km/hr)"] = data["distance(km)"] / data["time_diff(hrs)"]

# Replace inf, -inf, NaN → First np.nan, then 0
data["speed(km/hr)"] = data["speed(km/hr)"].replace([np.inf, -np.inf], np.nan)
data["speed(km/hr)"] = data["speed(km/hr)"].fillna(0)  # Replace NaN with 0
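The row-wise `apply` in STEP 6 calls Python once per row, which is slow for ~90,000 fixes. A hedged, vectorized sketch of STEPS 4 to 7 using `groupby().diff()`, `groupby().shift()` and NumPy, run on a tiny made-up track; the column names mirror the notebook, while the data and the helper name `add_motion_features` are illustrative:

```python
import numpy as np
import pandas as pd

def add_motion_features(df: pd.DataFrame) -> pd.DataFrame:
    """Per-bird time gap (h), step distance (km) and speed (km/h), vectorized."""
    df = df.sort_values(["tag-local-identifier", "timestamp"]).copy()
    g = df.groupby("tag-local-identifier")
    # Time since the previous fix of the same bird; first fix per bird -> 0
    df["time_diff(hrs)"] = g["timestamp"].diff().dt.total_seconds().div(3600).fillna(0)
    # Previous position of the same bird (NaN on each bird's first row)
    prev_lat = g["location-lat"].shift(1)
    prev_lon = g["location-long"].shift(1)
    phi1, phi2 = np.radians(prev_lat), np.radians(df["location-lat"])
    dphi = phi2 - phi1
    dlam = np.radians(df["location-long"] - prev_lon)
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    df["distance(km)"] = (6371 * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))).fillna(0)
    # 0/0 -> NaN and x/0 -> inf; both are mapped to 0, as in STEP 7
    speed = df["distance(km)"] / df["time_diff(hrs)"]
    df["speed(km/hr)"] = speed.replace([np.inf, -np.inf], np.nan).fillna(0)
    return df

toy = pd.DataFrame({
    "tag-local-identifier": [1, 1, 2, 2],
    "timestamp": pd.to_datetime(["2010-01-01 00:00", "2010-01-01 02:00",
                                 "2010-01-01 00:00", "2010-01-01 01:00"]),
    "location-lat": [0.0, 0.0, 10.0, 10.0],
    "location-long": [0.0, 1.0, 20.0, 20.0],
})
out = add_motion_features(toy)
print(out[["tag-local-identifier", "time_diff(hrs)", "distance(km)", "speed(km/hr)"]])
```

Because the shift is done inside `groupby`, the first fix of each bird never borrows coordinates from the previous bird.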


In [None]:
%pip install folium ipywidgets --quiet

In [None]:
import folium
from folium.plugins import AntPath
import pandas as pd
import numpy as np
import ipywidgets as widgets
from IPython.display import display, clear_output
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Ensure timestamp is in datetime format
data['timestamp'] = pd.to_datetime(data['timestamp'])

# Get unique years, months, and tags
unique_years = sorted(data['year'].unique())
unique_months = sorted(data['month'].unique())
unique_tags = sorted(data['tag-local-identifier'].unique())

# Set up color mapping for each unique tag
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))  # Normalize index
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create dropdowns for year and month selection
year_selector = widgets.SelectMultiple(
    options=unique_years,
    value=[unique_years[0]],
    description='Years',
    layout=widgets.Layout(height='100px', width='150px')
)

month_selector = widgets.SelectMultiple(
    options=unique_months,
    value=[unique_months[0]],
    description='Months',
    layout=widgets.Layout(height='100px', width='150px')
)

# Button to update the map
update_button = widgets.Button(description="Update Map")

# Output widget to display the map
output = widgets.Output()

# Function to plot movement interactively
def plot_movement_interactive(years, months):
    # Filter data
    filtered_data = data[data["year"].isin(years) & data["month"].isin(months)]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected period.")
        return None

    # Initialize the map at the first valid location
    first_point = (
        filtered_data.iloc[0]["location-lat"],
        filtered_data.iloc[0]["location-long"]
    )
    m = folium.Map(location=first_point, zoom_start=8)

    # Plot paths for each unique bird tag
    for tag in filtered_data["tag-local-identifier"].unique():
        bird_data = filtered_data[filtered_data["tag-local-identifier"] == tag]
        bird_color = tag_colors[tag]
        path = list(zip(bird_data["location-lat"], bird_data["location-long"]))

        # Draw movement path
        folium.PolyLine(path, color=bird_color, weight=2.5, opacity=0.8).add_to(m)

        # Add point markers
        for _, row in bird_data.iterrows():
            folium.CircleMarker(
                location=(row["location-lat"], row["location-long"]),
                radius=5,
                color=bird_color,
                fill=True,
                fill_color=bird_color,
                popup=f"Tag: {tag}<br>Time: {row['timestamp']}"
            ).add_to(m)

    return m

# Button click event handler
def on_button_click(b):
    output.clear_output(wait=True)
    selected_years = [int(year) for year in year_selector.value]
    selected_months = [int(month) for month in month_selector.value]
    map_plot = plot_movement_interactive(selected_years, selected_months)

    if map_plot is not None:
        with output:
            display(map_plot)

update_button.on_click(on_button_click)

# Display UI
display(widgets.HBox([year_selector, month_selector]))
display(update_button, output)


In [None]:
import folium
import pandas as pd
import ipywidgets as widgets
from IPython.display import display, clear_output
from folium.plugins import TimestampedGeoJson
import matplotlib.pyplot as plt
from matplotlib import colormaps

# Ensure timestamp is in datetime format
data['timestamp'] = pd.to_datetime(data['timestamp'])

# Get unique tag-local-identifiers
unique_tags = sorted(data['tag-local-identifier'].unique())

# Assign unique colors to each tag using normalized colormap
base_cmap = colormaps.get_cmap('tab10')
color_map = lambda i: base_cmap(i / max(len(unique_tags) - 1, 1))
tag_colors = {
    tag: f"#{int(color_map(i)[0]*255):02x}{int(color_map(i)[1]*255):02x}{int(color_map(i)[2]*255):02x}"
    for i, tag in enumerate(unique_tags)
}

# Create dropdowns for tag, year, and month selection
tag_selector = widgets.Dropdown(
    options=unique_tags,
    value=unique_tags[0],
    description='Tag:',
    layout=widgets.Layout(width='200px')
)

year_selector = widgets.Dropdown(
    options=[],
    description='Year:',
    layout=widgets.Layout(width='200px')
)

month_selector = widgets.Dropdown(
    options=[],
    description='Month:',
    layout=widgets.Layout(width='200px')
)

update_button = widgets.Button(description="Update Map")
output = widgets.Output()

# Update year and month dropdowns dynamically based on tag selection
def update_year_month_dropdowns(tag):
    filtered_data = data[data["tag-local-identifier"] == tag]
    unique_years = sorted(filtered_data['year'].unique())
    unique_months = sorted(filtered_data['month'].unique())

    year_selector.options = unique_years
    month_selector.options = unique_months

    if unique_years:
        year_selector.value = unique_years[0]
    if unique_months:
        month_selector.value = unique_months[0]

# Movement plotting function with animation
def plot_movement_interactive(tag, year, month):
    filtered_data = data[
        (data["tag-local-identifier"] == tag) &
        (data["year"] == year) &
        (data["month"] == month)
    ]

    if filtered_data.empty:
        with output:
            clear_output(wait=True)
            print("No data available for the selected tag, year, and month.")
        return None

    filtered_data = filtered_data.sort_values(by="timestamp")
    bird_color = tag_colors[tag]

    first_point = (filtered_data.iloc[0]["location-lat"], filtered_data.iloc[0]["location-long"])
    m = folium.Map(location=first_point, zoom_start=8)

    features = []
    path_coordinates = []

    for _, row in filtered_data.iterrows():
        point_feature = {
            'type': 'Feature',
            'geometry': {
                'type': 'Point',
                'coordinates': [row["location-long"], row["location-lat"]]
            },
            'properties': {
                'time': row['timestamp'].isoformat(),
                'popup': f"Tag: {tag}<br>Time: {row['timestamp']}",
                'icon': 'circle',
                'iconstyle': {
                    'fillColor': bird_color,
                    'fillOpacity': 0.6,
                    'stroke': 'false',
                    'radius': 5
                }
            }
        }
        features.append(point_feature)
        path_coordinates.append([row["location-long"], row["location-lat"]])

    line_feature = {
        'type': 'Feature',
        'geometry': {
            'type': 'LineString',
            'coordinates': path_coordinates
        },
        'properties': {
            'times': [row['timestamp'].isoformat() for _, row in filtered_data.iterrows()],
            'style': {
                'color': bird_color,
                'weight': 2
            }
        }
    }
    features.append(line_feature)

    TimestampedGeoJson(
        {'type': 'FeatureCollection', 'features': features},
        period='PT1M',
        add_last_point=True,
        auto_play=True,
        loop=False,
        max_speed=30,
        loop_button=True,
        date_options='YYYY/MM/DD HH:mm:ss',
        time_slider_drag_update=True
    ).add_to(m)

    return m

# Button click logic
def on_button_click(b):
    output.clear_output(wait=True)
    selected_tag = tag_selector.value
    selected_year = year_selector.value
    selected_month = month_selector.value
    map_plot = plot_movement_interactive(selected_tag, selected_year, selected_month)
    if map_plot is not None:
        with output:
            display(map_plot)

# Link dropdown updates to tag selection
def on_tag_change(change):
    update_year_month_dropdowns(change['new'])

tag_selector.observe(on_tag_change, names='value')
update_year_month_dropdowns(tag_selector.value)

update_button.on_click(on_button_click)

# Display all UI components
display(widgets.VBox([tag_selector, year_selector, month_selector]))
display(update_button, output)



In [None]:
# -------------------- STEP 8: Compute Bearing (Direction of Movement) -------------------- #

# Function to calculate bearing between two GPS points
def calculate_bearing(lat1, lon1, lat2, lon2):
    """
    Calculate the initial bearing (direction) from point (lat1, lon1) to (lat2, lon2).
    The result is in degrees (0° = North, 90° = East, 180° = South, 270° = West).
    """
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])

    delta_lon = lon2 - lon1
    x = np.sin(delta_lon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(delta_lon)

    initial_bearing = np.arctan2(x, y)
    initial_bearing = np.degrees(initial_bearing)

    return (initial_bearing + 360) % 360  # Normalize to 0-360 degrees

# Initialize a new bearing column
data["bearing"] = np.nan  # Start with NaN for all rows

# Compute bearing for each bird (tag) individually
for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()
    tag_data.sort_values("timestamp", inplace=True)

    # Shifted coordinates to get previous point
    lat1 = tag_data["location-lat"].shift(1)
    lon1 = tag_data["location-long"].shift(1)
    lat2 = tag_data["location-lat"]
    lon2 = tag_data["location-long"]

    # Compute bearing
    bearings = calculate_bearing(lat1, lon1, lat2, lon2)

    # Fill NaN with 0 and assign to main DataFrame
    data.loc[tag_data.index, "bearing"] = bearings.fillna(0)
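A quick sanity check of the bearing formula: one-degree moves from (0°, 0°) due north, east, south and west should give 0°, 90°, 180° and 270°. The function is restated from STEP 8 so the snippet runs on its own:

```python
import numpy as np

def calculate_bearing(lat1, lon1, lat2, lon2):
    """Initial bearing in degrees from point 1 to point 2 (0 = N, 90 = E)."""
    lat1, lon1, lat2, lon2 = map(np.radians, [lat1, lon1, lat2, lon2])
    delta_lon = lon2 - lon1
    x = np.sin(delta_lon) * np.cos(lat2)
    y = np.cos(lat1) * np.sin(lat2) - np.sin(lat1) * np.cos(lat2) * np.cos(delta_lon)
    return (np.degrees(np.arctan2(x, y)) + 360) % 360

# One-degree moves from (0, 0): north, east, south, west
moves = {"N": (1.0, 0.0), "E": (0.0, 1.0), "S": (-1.0, 0.0), "W": (0.0, -1.0)}
for name, (lat2, lon2) in moves.items():
    print(name, round(float(calculate_bearing(0.0, 0.0, lat2, lon2)), 6))
```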


In [None]:
# -------------------- STEP 9: Encode Cyclic Time Features (Preserve Temporal Patterns) -------------------- #
# Convert hour to cyclic feature
data["hour_sin"] = np.sin(2 * np.pi * data["hour"] / 24)
data["hour_cos"] = np.cos(2 * np.pi * data["hour"] / 24)

# Convert month to cyclic feature
data["month_sin"] = np.sin(2 * np.pi * data["month"] / 12)
data["month_cos"] = np.cos(2 * np.pi * data["month"] / 12)

# Drop original columns
data.drop(["hour", "month"], axis=1, inplace=True)
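Why the sine/cosine pair helps: on the raw scale, hour 23 and hour 0 are 23 units apart, but after encoding they are exactly as close as any two adjacent hours. A small check; the helpers `encode_cyclic` and `circle_dist` are illustrative names:

```python
import numpy as np

def encode_cyclic(value, period):
    """Map a periodic value (e.g. hour of day, month) onto the unit circle."""
    angle = 2 * np.pi * np.asarray(value, dtype=float) / period
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)

def circle_dist(a, b, period):
    """Euclidean distance between two encoded values."""
    return float(np.linalg.norm(encode_cyclic(a, period) - encode_cyclic(b, period)))

print(circle_dist(23, 0, 24))  # adjacent hours across midnight
print(circle_dist(0, 1, 24))   # adjacent hours within the day
print(circle_dist(0, 12, 24))  # opposite hours: the largest possible distance
```

The `bearing` column wraps around at 360° in the same way; the notebook keeps it as a raw angle, but the same encoding with `period=360` would apply.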


In [None]:
# -------------------- STEP 10: Reorder Columns -------------------- #
desired_order = [
    "tag-local-identifier", "timestamp", "year", "month_sin", "month_cos", "hour_sin", "hour_cos",
    "time_diff(hrs)", "distance(km)", "speed(km/hr)",
    "ECMWF Interim Full Daily Invariant High Vegetation Cover", "bearing",
    "location-long", "location-lat",
]
data = data[desired_order]

In [None]:
data.head()

In [None]:
data.drop(columns=['year', 'time_diff(hrs)', 'ECMWF Interim Full Daily Invariant High Vegetation Cover'], axis=1, inplace=True)

In [None]:
data.head()

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Calculate speed for each bird using Haversine formula
data['speed'] = 0.0
data['distance'] = 0.0

for tag in data["tag-local-identifier"].unique():
    tag_data = data[data["tag-local-identifier"] == tag].copy()

    lat1 = np.radians(tag_data["location-lat"].shift(1))
    lon1 = np.radians(tag_data["location-long"].shift(1))
    lat2 = np.radians(tag_data["location-lat"])
    lon2 = np.radians(tag_data["location-long"])

    dlat = lat2 - lat1
    dlon = lon2 - lon1

    a = np.sin(dlat / 2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    distance = 6371 * c

    time_diff = tag_data["timestamp"].diff().dt.total_seconds() / 3600
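    speed = distance / time_diff
    # Zero time gaps give NaN (0/0) or inf (x/0); map both to 0, as in STEP 7
    speed = speed.replace([np.inf, -np.inf], np.nan).fillna(0)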

    data.loc[tag_data.index, "speed"] = speed
    data.loc[tag_data.index, "distance"] = distance.fillna(0)

# Select one bird (work on a copy of its rows)
bird_tag = 91732
bird_data = data[data["tag-local-identifier"] == bird_tag].copy()

# Replace NaN bearings with 0
bird_data.loc[:, "bearing"] = bird_data["bearing"].fillna(0)


# Plotting
fig, axes = plt.subplots(2, 3, figsize=(16, 10))
fig.suptitle(f"Visualizations for Bird Tag: {bird_tag}", fontsize=16)

# Top row
axes[0, 0].hist(bird_data["speed"], bins=50, color='skyblue')
axes[0, 0].set_title("Speed Distribution")
axes[0, 0].set_xlabel("Speed (km/hr)")

axes[0, 1].scatter(bird_data["location-long"], bird_data["location-lat"], c='blue', s=10)
axes[0, 1].set_title("Path (Lat vs Long)")
axes[0, 1].set_xlabel("Longitude")
axes[0, 1].set_ylabel("Latitude")

axes[0, 2].boxplot(
    [bird_data["speed"], bird_data["distance"], bird_data["bearing"]],
    tick_labels=["Speed", "Distance", "Bearing"]
)

axes[0, 2].set_title("Outlier Detection (Boxplots)")

# Bottom row: time series and pairwise relationships
# Time Series: Speed
axes[1, 0].plot(bird_data["timestamp"], bird_data["speed"], color='green')
axes[1, 0].set_title("Speed over Time")
axes[1, 0].set_xlabel("Time")
axes[1, 0].tick_params(axis='x', rotation=45)

# Scatter: Distance vs Speed
axes[1, 1].scatter(bird_data["distance"], bird_data["speed"], alpha=0.5, color='purple')
axes[1, 1].set_title("Distance vs Speed")
axes[1, 1].set_xlabel("Distance (km)")
axes[1, 1].set_ylabel("Speed (km/hr)")

# Histogram: Bearings
axes[1, 2].hist(bird_data["bearing"], bins=30, color='orange')
axes[1, 2].set_title("Bearing Distribution")
axes[1, 2].set_xlabel("Bearing (°)")

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()


In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import ipywidgets as widgets
from IPython.display import display

# Function to calculate time difference and plot graphs
def plot_time_gap_analysis(tag_id):
    """
    Analyzes the relationship between time intervals and speed/distance.

    Parameters:
    tag_id (int or str): The unique identifier of the bird.
    """
    data_tag = data[data['tag-local-identifier'] == tag_id].copy()
    if data_tag.empty:
        print(f"No data available for tag {tag_id}")
        return

    data_tag['timestamp'] = pd.to_datetime(data_tag['timestamp'])
    data_tag = data_tag.sort_values(by='timestamp')

    # Compute time difference in hours (NaN for the first fix)
    data_tag['time_diff'] = data_tag['timestamp'].diff().dt.total_seconds() / 3600

    fig, axes = plt.subplots(1, 3, figsize=(18, 5))
    fig.suptitle(f'Time Interval Analysis for Bird {tag_id}', fontsize=14)

    # Histogram of time intervals
    axes[0].hist(data_tag['time_diff'].dropna(), bins=30, edgecolor='black')
    axes[0].set_title('Time Interval Distribution')
    axes[0].set_xlabel('Time Interval (hours)')
    axes[0].set_ylabel('Frequency')

    # Scatter plot of distance vs. time interval
    axes[1].scatter(data_tag['time_diff'], data_tag['distance(km)'], alpha=0.5)
    axes[1].set_title('Distance vs. Time Interval')
    axes[1].set_xlabel('Time Interval (hours)')
    axes[1].set_ylabel('Distance (km)')

    # Scatter plot of speed vs. time interval
    axes[2].scatter(data_tag['time_diff'], data_tag['speed(km/hr)'], alpha=0.5)
    axes[2].set_title('Speed vs. Time Interval')
    axes[2].set_xlabel('Time Interval (hours)')
    axes[2].set_ylabel('Speed (km/hr)')

    plt.show()

# Create dropdown widget to select bird tag
tag_selector = widgets.Dropdown(
    options=data['tag-local-identifier'].unique(),
    description='Select Tag:',
    style={'description_width': 'initial'}
)

# Display dropdown and link it to the function
# display(tag_selector)
widgets.interactive(plot_time_gap_analysis, tag_id=tag_selector)

In [None]:
import numpy as np
import pandas as pd
import folium
from folium.plugins import MarkerCluster
from sklearn.cluster import DBSCAN

# Uses `data` columns: ['timestamp', 'location-lat', 'location-long', 'speed(km/hr)']
resting_threshold = 0.025  # km/hr
resting_points = data[data['speed(km/hr)'] <= resting_threshold].copy()

# Clustering with DBSCAN. With metric='haversine', inputs are [lat, lon] in radians
# and eps is an angle in radians, so the radius is converted from km
eps_km = 0.1  # Neighbourhood radius in km; adjust based on typical stopover site size
epsilon = eps_km / 6371.0  # km -> radians (Earth radius 6,371 km)
min_samples = 10  # Minimum points to form a cluster
db = DBSCAN(eps=epsilon, min_samples=min_samples, metric='haversine').fit(np.radians(resting_points[['location-lat', 'location-long']]))

# Assign cluster labels
resting_points['cluster'] = db.labels_

# Create a folium map centered at the first resting point
center_lat, center_long = resting_points.iloc[0][['location-lat', 'location-long']]
m = folium.Map(location=[center_lat, center_long], zoom_start=8)

# Color mapping for clusters
colors = ['red', 'blue', 'green', 'purple', 'orange', 'darkred', 'lightblue', 'pink', 'black', 'gray']
marker_cluster = MarkerCluster().add_to(m)

# Plot resting points with cluster colors
for _, row in resting_points.iterrows():
    cluster = row['cluster']
    color = colors[cluster % len(colors)] if cluster != -1 else "black"  # Noise in black
    folium.CircleMarker(
        location=[row['location-lat'], row['location-long']],
        radius=4,
        color=color,
        fill=True,
        fill_color=color,
        fill_opacity=0.7,
        popup=f"Cluster: {cluster}"
    ).add_to(marker_cluster)

# Show the map
m
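With `metric='haversine'`, scikit-learn's `DBSCAN` expects `[lat, lon]` in radians and interprets `eps` as an angle in radians, so a radius in kilometres has to be divided by the Earth radius (for example, an `eps` of 0.1 on this scale corresponds to roughly 637 km). A toy check with two tight groups and one isolated fix; the coordinates and the 1 km radius are made up for illustration:

```python
import numpy as np
from sklearn.cluster import DBSCAN

R_EARTH_KM = 6371.0
eps_km = 1.0  # illustrative neighbourhood radius

points_deg = np.array([
    [52.000, 13.000], [52.001, 13.000], [52.000, 13.001], [52.001, 13.001],  # group A
    [48.000, 11.000], [48.001, 11.000], [48.000, 11.001],                    # group B
    [50.000, 12.000],                                                        # isolated fix
])

db = DBSCAN(eps=eps_km / R_EARTH_KM, min_samples=3, metric="haversine")
labels = db.fit(np.radians(points_deg)).labels_
print(labels)  # one cluster id per point; -1 marks noise
```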



In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense, Input
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping



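# Inputs: all remaining columns except the timestamp and the target coordinates
X = data.drop(columns=['timestamp', 'location-long', 'location-lat']).to_numpy()
# Targets: (longitude, latitude) of each fix
y = data[['location-long', 'location-lat']].to_numpy()
# Bird ID per row, used so that windows never span two birds
tags = data['tag-local-identifier'].to_numpy()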


unique_tags = np.unique(tags)


def create_sequences(tag_data, tag_labels, seq_length):
    X_seq, y_seq = [], []
    for i in range(len(tag_data) - seq_length):  
        X_seq.append(tag_data[i:i + seq_length])
        y_seq.append(tag_labels[i + seq_length]) 
    return np.array(X_seq), np.array(y_seq)
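To make the windowing concrete: for one bird with `n` fixes and `seq_length = L`, `create_sequences` yields `n - L` windows, each pairing `L` consecutive feature rows with the target at the step right after the window. A toy check with made-up arrays (the function is restated so the snippet runs standalone):

```python
import numpy as np

def create_sequences(tag_data, tag_labels, seq_length):
    """Sliding windows: X[i] = rows i..i+L-1, y[i] = label at row i+L."""
    X_seq, y_seq = [], []
    for i in range(len(tag_data) - seq_length):
        X_seq.append(tag_data[i:i + seq_length])
        y_seq.append(tag_labels[i + seq_length])
    return np.array(X_seq), np.array(y_seq)

n, n_features, L = 6, 3, 4
features = np.arange(n * n_features, dtype=float).reshape(n, n_features)
targets = np.column_stack([np.arange(n), np.arange(n) * 10.0])  # toy (lon, lat) pairs

X_seq, y_seq = create_sequences(features, targets, L)
print(X_seq.shape, y_seq.shape)  # (2, 4, 3) (2, 2)
print(y_seq[0])                  # target of the first window comes from row L
```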


X_sequences, y_sequences = [], []
sequence_length = 10  


for tag in unique_tags:
    tag_indices = np.where(tags == tag)[0]  # Get indices for the tag
    tag_data = X[tag_indices]
    tag_labels = y[tag_indices]

    if len(tag_data) > sequence_length:
        X_seq, y_seq = create_sequences(tag_data, tag_labels, sequence_length)
        X_sequences.append(X_seq)
        y_sequences.append(y_seq)


X_sequences = np.vstack(X_sequences)
y_sequences = np.vstack(y_sequences)

# Train / validation / test split (80 / 10 / 10)
X_train, X_temp, y_train, y_temp = train_test_split(X_sequences, y_sequences, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)
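One caveat with the shuffled split above: neighbouring windows from the same bird share `sequence_length - 1` time steps, so a random split places nearly identical windows in both train and test, which can make test scores optimistic. A hedged alternative sketch splits each bird's windows in time order (first 80% train, next 10% validation, last 10% test) and then concatenates across birds; the helper `chronological_split` and the toy arrays are hypothetical:

```python
import numpy as np

def chronological_split(X_seq, y_seq, train_frac=0.8, val_frac=0.1):
    """Split one bird's time-ordered windows without shuffling."""
    n = len(X_seq)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = (X_seq[:n_train], y_seq[:n_train])
    val = (X_seq[n_train:n_train + n_val], y_seq[n_train:n_train + n_val])
    test = (X_seq[n_train + n_val:], y_seq[n_train + n_val:])
    return train, val, test

# Toy windows for two birds: 20 and 10 windows, 5 steps, 3 features
birds = {
    "bird_a": (np.zeros((20, 5, 3)), np.zeros((20, 2))),
    "bird_b": (np.ones((10, 5, 3)), np.ones((10, 2))),
}

parts = {"train": ([], []), "val": ([], []), "test": ([], [])}
for X_seq, y_seq in birds.values():
    for name, (Xp, yp) in zip(parts, chronological_split(X_seq, y_seq)):
        parts[name][0].append(Xp)
        parts[name][1].append(yp)

splits = {name: (np.concatenate(xs), np.concatenate(ys)) for name, (xs, ys) in parts.items()}
for name, (Xs, ys) in splits.items():
    print(name, Xs.shape, ys.shape)
```

Windows at each boundary still share up to `sequence_length - 1` raw fixes with the neighbouring split; dropping that many windows at each boundary would remove the overlap entirely.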


scaler_X = StandardScaler()
X_train = scaler_X.fit_transform(X_train.reshape(-1, X_train.shape[2])).reshape(X_train.shape)
X_val = scaler_X.transform(X_val.reshape(-1, X_val.shape[2])).reshape(X_val.shape)
X_test = scaler_X.transform(X_test.reshape(-1, X_test.shape[2])).reshape(X_test.shape)


scaler_y = StandardScaler()
y_train = scaler_y.fit_transform(y_train)
y_val = scaler_y.transform(y_val)
y_test = scaler_y.transform(y_test)


num_features = X_train.shape[2] 


model = Sequential([
    Input(shape=(sequence_length, num_features)),  
    GRU(64, return_sequences=True),
    GRU(32, return_sequences=False),
    Dense(16, activation='relu'),
    Dense(2)
])



model.compile(optimizer=Adam(learning_rate=0.001), loss='mse', metrics=['mae'])


history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    epochs=100,
    batch_size=32,
    callbacks=[EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)]
)


model.save("my_model.keras")



plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Training History')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()


In [None]:
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error
from tensorflow.keras.models import load_model
from tensorflow.keras.losses import MeanSquaredError

# Load the model saved by the training cell above
model_path = 'my_model.keras'
model = load_model(model_path, compile=False)

# Compile the model (useful for evaluation or further training)
model.compile(loss=MeanSquaredError(), optimizer='adam')

# Predict on the scaled test dataset
y_pred_scaled = model.predict(X_test)

# Inverse transform to get actual lat/long values
y_pred = scaler_y.inverse_transform(y_pred_scaled)
y_true = scaler_y.inverse_transform(y_test)

# Compute performance metrics
mae = mean_absolute_error(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))

print(f"Mean Absolute Error (MAE, degrees): {mae}")
print(f"Root Mean Squared Error (RMSE, degrees): {rmse}")

# Create a DataFrame to compare true vs predicted values
results_df = pd.DataFrame({
    'True_Longitude': y_true[:, 0],
    'True_Latitude': y_true[:, 1],
    'Predicted_Longitude': y_pred[:, 0],
    'Predicted_Latitude': y_pred[:, 1]
})

# Per-axis errors and combined Euclidean error (in degrees, not kilometres)
results_df['Longitude_Error'] = results_df['True_Longitude'] - results_df['Predicted_Longitude']
results_df['Latitude_Error'] = results_df['True_Latitude'] - results_df['Predicted_Latitude']
results_df['Absolute_Error'] = np.sqrt(
    results_df['Longitude_Error']**2 + results_df['Latitude_Error']**2
)

# Export the results to a CSV file
results_df.to_csv('test_predictions_with_errors.csv', index=False)

# Preview the first few rows
print("\nTest Predictions with Errors:")
print(results_df.head())
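The MAE/RMSE and `Absolute_Error` above are in degrees, and a degree of longitude shrinks toward the poles, so they are hard to read as distances. A sketch that reuses the Haversine formula to express each prediction error in kilometres; the column names follow `results_df`, while the toy rows and the helper `haversine_km` are illustrative:

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Vectorized great-circle distance in km for degree inputs."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlam = np.radians(np.asarray(lon2) - np.asarray(lon1))
    a = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlam / 2) ** 2
    return radius_km * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

toy_results = pd.DataFrame({
    "True_Longitude": [0.0, 10.0],
    "True_Latitude": [0.0, 60.0],
    "Predicted_Longitude": [1.0, 11.0],
    "Predicted_Latitude": [0.0, 60.0],
})
toy_results["Error_km"] = haversine_km(
    toy_results["True_Latitude"], toy_results["True_Longitude"],
    toy_results["Predicted_Latitude"], toy_results["Predicted_Longitude"],
)
print(toy_results["Error_km"].round(2).tolist())
print("Mean error (km):", toy_results["Error_km"].mean())
```

Both toy rows have the same 1° longitude error, but at 60°N it corresponds to roughly half the distance it covers at the equator.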



In [None]:
import folium
from folium import PolyLine, CircleMarker
import pandas as pd

# Load the predictions written by the evaluation cell above
results_df = pd.read_csv('test_predictions_with_errors.csv')

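# Sample the first N points for clarity in visualization.
# Note: train_test_split shuffles the windows, so consecutive rows are generally
# not consecutive fixes of the same bird; the polylines only link sampled points.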
N = 20
sample_df = results_df.head(N)

# Initialize the map centered at the starting actual location
start_coords = [sample_df['True_Latitude'].iloc[0], sample_df['True_Longitude'].iloc[0]]
m = folium.Map(location=start_coords, zoom_start=5)

# Plot actual path (in blue)
actual_path = list(zip(sample_df['True_Latitude'], sample_df['True_Longitude']))
PolyLine(actual_path, color='blue', weight=4, opacity=0.8, tooltip="Actual Path").add_to(m)

# Plot predicted path (in red)
predicted_path = list(zip(sample_df['Predicted_Latitude'], sample_df['Predicted_Longitude']))
PolyLine(predicted_path, color='red', weight=4, opacity=0.8, tooltip="Predicted Path").add_to(m)

# Draw error lines and point markers
for _, row in sample_df.iterrows():
    actual_point = (row['True_Latitude'], row['True_Longitude'])
    predicted_point = (row['Predicted_Latitude'], row['Predicted_Longitude'])

    # Line connecting actual to predicted
    PolyLine([actual_point, predicted_point], color='gray', weight=1, opacity=0.5).add_to(m)

    # Markers
    CircleMarker(actual_point, radius=3, color='blue', fill=True, fill_color='blue').add_to(m)
    CircleMarker(predicted_point, radius=3, color='red', fill=True, fill_color='red').add_to(m)

# Show the map
m



---

## ✅ Conclusion

This project demonstrated an end-to-end pipeline for processing, visualizing, and predicting wildlife movement from GPS tracking data. We used a real-world movement-ecology dataset and trained a GRU-based recurrent neural network to forecast the next location in each trajectory.

## 🚀 Future Scope

- Integrate real-time animal movement data using APIs (e.g., Movebank)
- Deploy the model as a Streamlit or Flask web app for conservationists
- Expand to multi-species, multi-continent trajectory analysis
- Add reinforcement learning to optimize migratory route prediction
- Collaborate with wildlife reserves for real-world deployment

---

📌 *Author: Aarjun Mahule*  
📅 *Date: May 2025*  
📍 *Nagpur, India*
