# Bird Visualization Project
This Jupyter notebook explores the visualizations we have in mind for the project.


### Research Question 1  
**How do migration routes differ across species?**

**Goal:** Compare the migration paths of different bird species.  
**Visualization:** Interactive map showing trajectories colored by species.  
**Interaction:** User selects species through a multi-select dropdown.  
**Note:** Species is the single varying dimension (no region/season filters for this RQ).


## 1. Library Imports
We initialize the project environment with the following dependencies:

* **Data Handling:** `pandas` and `numpy` for dataframe manipulation.
* **Visualization:** `plotly.express` for creating the (globe) map.
* **Dashboarding:**
    * `Dash`, `html`, `dcc`: Core Dash components.
    * `Input`, `Output`, `State`, `ctx`: For handling callback logic and button triggers.
    * `dash_bootstrap_components`: For responsive grid layouts and pre-styled components.
* **Geospatial Processing:** `global_land_mask` is imported to filter out flight coordinates that fall in the ocean.
* **Custom Data:** We modify `sys.path` to import `df_cleaned_result` from our local project structure.

In [10]:
import sys
import os
import pandas as pd
import numpy as np
from dash import Dash, html, dcc, Input, Output, State, ctx  # ctx is required to detect which button was clicked
import dash_bootstrap_components as dbc
import plotly.express as px
from global_land_mask import globe 
sys.path.append(os.path.abspath(".."))
from data.df_cleaning import df_cleaned_result

## 2. Data Preprocessing & Stratified Sampling
This cell prepares the raw data for the dashboard through three specific operations:

1.  **Geospatial Cleaning (Land Mask):**
    * We identify flight paths that start or end in the ocean.
    * Using the `global_land_mask` library, we filter the dataframe to retain **only** segments where both the Start and End coordinates are strictly on land.

2.  **Stratified Sampling (up to 20 Birds per Species):**
    * To prevent map clutter and improve performance, we limit the dataset size.
    * We select **20 random unique Bird IDs** for each species, or all of them if fewer than 20 remain after cleaning.
    * *Note:* We sample by `Bird_ID`, not by row. This keeps all remaining flight segments of a selected bird together, rather than plotting random, disconnected points.

3.  **Sorting:**
    * The final sample is sorted by `Bird_ID` and `date_time` when that column exists, and by index otherwise. This ensures that the visualization engine draws the flight lines in the correct chronological order (Point A $\to$ Point B $\to$ Point C).

In [2]:
df = df_cleaned_result.copy()

# PRE-PROCESSING: REMOVE OCEAN POINTS
print(f"Total flight segments before cleaning: {len(df)}")

# Create a mask: True only if BOTH Start AND End coordinates are on land
is_start_land = globe.is_land(df['Start_Latitude'], df['Start_Longitude'])
is_end_land   = globe.is_land(df['End_Latitude'], df['End_Longitude'])

# Apply filter: Keep only rows where both are True
df = df[is_start_land & is_end_land].copy()

print(f"Total flight segments on land: {len(df)}")


# SAMPLING LOGIC 
# Now we select 20 random Bird_IDs from the CLEANED dataset

birds_per_species = 20
target_ids = []

for species in df['Species'].unique():
    # Get all unique IDs for this species (from the clean list)
    ids_for_species = df[df['Species'] == species]['Bird_ID'].unique()
    
    # Randomly pick 20 (or all if fewer than 20 exist)
    selected = np.random.choice(
        ids_for_species, 
        size=min(birds_per_species, len(ids_for_species)), 
        replace=False
    )
    target_ids.extend(selected)

# Filter the dataframe to include only these birds
df_sample = df[df['Bird_ID'].isin(target_ids)].copy()

# Sort is still good practice for timeline consistency
if 'date_time' in df_sample.columns:
    df_sample = df_sample.sort_values(by=['Bird_ID', 'date_time'])
else:
    df_sample = df_sample.sort_index()

print(f"Ready to plot: {len(df_sample)} flight segments across {len(target_ids)} birds.")

Total flight segments before cleaning: 9999
Total flight segments on land: 1126
Ready to plot: 140 flight segments across 140 birds.
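
Because `np.random.choice` is called without a seed in the cell above, a different set of birds is drawn on every run. The following is a minimal sketch of a reproducible variant, assuming a fixed seed is acceptable for this exploration. The helper name `sample_ids_per_species`, the seed value, and the toy data are illustrative and not part of the project.

```python
import numpy as np
import pandas as pd

def sample_ids_per_species(df, n_per_species=20, seed=42):
    """Return up to n_per_species Bird_IDs per species, chosen reproducibly."""
    rng = np.random.default_rng(seed)  # local generator, no global state
    chosen = []
    for _, group in df.groupby("Species", sort=True):
        ids = group["Bird_ID"].drop_duplicates().to_numpy()
        k = min(n_per_species, len(ids))  # take all IDs if fewer exist
        chosen.extend(rng.choice(ids, size=k, replace=False).tolist())
    return chosen

# Toy data, invented for illustration only.
toy = pd.DataFrame({
    "Species": ["Stork", "Stork", "Stork", "Crane", "Crane"],
    "Bird_ID": ["S1", "S2", "S3", "C1", "C1"],
})

ids_a = sample_ids_per_species(toy, n_per_species=2, seed=0)
ids_b = sample_ids_per_species(toy, n_per_species=2, seed=0)
print(ids_a == ids_b)  # same seed, same birds
```

The returned list could then be passed to `df['Bird_ID'].isin(...)` exactly as `target_ids` is above.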


## 3. Helper Functions (Data Logic)
This cell defines the core data filtering logic for the dashboard:

* **`filter_data`:**
    * **Context:** This function is called by the Map Callback whenever the user changes the dropdown selection.
    * **Mechanism:** It accepts a list of `bird_ids` and subsets the main dataframe.
    * **Why it's important:** We use `.isin(bird_ids)` to ensure we retrieve the *complete* flight history (every single latitude/longitude point) for the selected birds. This is required to draw continuous lines on the map rather than disconnected dots.

In [11]:
# Filter by specific birds, not by whole species group.
def filter_data(bird_ids, df=df_sample):
    filtered = df.copy()
    if bird_ids:
        # We now check against the 'Bird_ID' column
        filtered = filtered[filtered['Bird_ID'].isin(bird_ids)]
    return filtered
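
To make the behaviour concrete, here is a small sketch on toy data. The frame, the IDs, and the name `filter_by_bird_ids` are made up for illustration; the function mirrors `filter_data` but takes the dataframe explicitly. Note that an empty selection returns every row, so it is the map callback that has to guard against an empty selection.

```python
import pandas as pd

# Toy stand-in for df_sample; values are invented for illustration only.
toy = pd.DataFrame({
    "Bird_ID": ["B1", "B1", "B2", "B3"],
    "Species": ["Stork", "Stork", "Crane", "Hawk"],
})

def filter_by_bird_ids(bird_ids, df):
    """Same logic as filter_data, with the dataframe passed explicitly."""
    if not bird_ids:
        return df.copy()  # empty selection: nothing is filtered out
    return df[df["Bird_ID"].isin(bird_ids)].copy()

picked = filter_by_bird_ids(["B1", "B3"], toy)
everything = filter_by_bird_ids([], toy)

print(picked["Bird_ID"].tolist())  # ['B1', 'B1', 'B3']
print(len(everything))             # 4
```

Passing the dataframe explicitly also avoids a subtle point of the `df=df_sample` default: Python binds default arguments once, when the function is defined, so re-running the sampling cell later would not change what the default refers to.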

## 4. Map Visualization Engine
This function is the core of the dashboard. It transforms the filtered data into a Plotly 3D Globe visualization.

### Key Logic Steps:
1.  **Data Transformation (Vector to Sequence):**
    * **The Problem:** Our dataset stores flights as "vectors" (Start & End coordinates in the *same* row). Plotly expects "sequential" data (Point A in row 1, Point B in row 2) to draw lines.
    * **The Solution:** We loop through the filtered data and "melt" each flight into two separate rows: a **Start Point** and an **End Point**.
    * **Grouping:** We assign a unique `Segment_ID` to each pair so Plotly knows exactly which two points to connect.

2.  **Contextual Hover Info:**
    * The tooltip shows the species, the migration reason, and whether the hovered point is the start (takeoff) or end (landing) of the segment. Raw coordinates and the internal `Segment_ID` are hidden to keep the tooltip readable.

3.  **Visualization (`px.line_geo`):**
    * **Projection:** We use `orthographic` to render a 3D Earth rather than a flat map, so long-distance migration routes are shown on a sphere instead of being stretched across a flat projection.
    * **Styling:** We use `mode='lines+markers'` to add distinct dots at the start and end of every flight, giving the map a polished "travel route" aesthetic.
    * **Theme:** We apply a "Light Mode" palette (pale blue ocean, light grey land) to ensure the bold colored flight paths stand out clearly.

In [None]:
# --- 4. MAP VISUALIZATION ENGINE ---
def create_map(filtered_df):
    if filtered_df.empty:
        fig = px.scatter_geo()
        fig.update_layout(template="plotly_white", paper_bgcolor="rgba(0,0,0,0)")
        fig.add_annotation(text="No data selected", x=0.5, y=0.5, showarrow=False)
        return fig
    
    # Transform data for Line + Marker plotting
    plot_data = []
    for index, row in filtered_df.iterrows():
        segment_id = f"{row['Bird_ID']}_{index}"
        
        # Migration reason shown in the hover tooltip (coordinate strings are not built here).
        # Make sure the column name matches 'Migration_Reason'.
        reason = row['Migration_Reason']
        
        # Start Point
        plot_data.append({
            "Bird_ID": row['Bird_ID'],
            "Species": row['Species'],
            "Latitude": row['Start_Latitude'],
            "Longitude": row['Start_Longitude'],
            "Segment_ID": segment_id,
            "Position": "Start (Takeoff)",
            "Migration Reason": reason, 
        })
        # End Point
        plot_data.append({
            "Bird_ID": row['Bird_ID'],
            "Species": row['Species'],
            "Latitude": row['End_Latitude'],
            "Longitude": row['End_Longitude'],
            "Segment_ID": segment_id,
            "Position": "End (Landing)",
            "Migration Reason": reason, 
        })
    
    df_plot = pd.DataFrame(plot_data)

    # Plot
    fig = px.line_geo(
        df_plot,
        lat="Latitude", lon="Longitude", color="Species",
        line_group="Segment_ID", 
        hover_name="Bird_ID", 
        
        #  Update Hover Data
        hover_data={
            "Species": True, 
            "Migration Reason": True, 
            "Position": True, 
            "Segment_ID": False, 
            "Latitude": False, 
            "Longitude": False
        },
        
        projection="orthographic", 
        title=f"Tracking {filtered_df['Bird_ID'].nunique()} Unique Birds",
        color_discrete_sequence=px.colors.qualitative.Bold 
    )

    # Styling: Lines + Markers
    fig.update_traces(
        mode='lines+markers', 
        line=dict(width=2), 
        marker=dict(size=6, symbol='circle', opacity=1, line=dict(width=0)),
        opacity=0.8
    )
    
    # Map Geos styling
    fig.update_geos(
        visible=True, resolution=50,
        showcountries=True, countrycolor="#bbbbbb",
        showcoastlines=True, coastlinecolor="#bbbbbb",
        showland=True, landcolor="#f0f0f0",      
        showocean=True, oceancolor="#e4edff",   
        projection_rotation=dict(lon=-10, lat=20)
    )
    
    fig.update_layout(
        template="plotly_white",
        margin={"r":0,"t":50,"l":0,"b":0},
        paper_bgcolor="rgba(0,0,0,0)", 
        legend=dict(yanchor="top", y=0.95, xanchor="left", x=0.05, bgcolor="rgba(255,255,255,0.9)")
    )
    return fig
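
The "vector to sequence" step in `create_map` uses `iterrows`, which is fine for 140 segments but slows down on larger samples. As a sketch only, the same reshaping could be done with column operations and `pd.concat`. The helper name `segments_to_points` and the toy data are hypothetical; like the loop above, it assumes the dataframe index is unique so that each `Segment_ID` identifies exactly one segment.

```python
import pandas as pd

def segments_to_points(df, extra_cols=("Bird_ID", "Species")):
    """Return two rows (start, end) per input segment, paired by Segment_ID."""
    base = df[list(extra_cols)].copy()
    base["Segment_ID"] = [f"{bird}_{idx}" for bird, idx in zip(df["Bird_ID"], df.index)]

    start = base.assign(
        Latitude=df["Start_Latitude"],
        Longitude=df["Start_Longitude"],
        Position="Start (Takeoff)",
    )
    end = base.assign(
        Latitude=df["End_Latitude"],
        Longitude=df["End_Longitude"],
        Position="End (Landing)",
    )
    # Start rows are concatenated first; a stable sort keeps start before end
    # within each segment.
    points = pd.concat([start, end], ignore_index=True)
    return points.sort_values("Segment_ID", kind="mergesort").reset_index(drop=True)

# Toy data, invented for illustration only.
toy = pd.DataFrame(
    {
        "Bird_ID": ["A", "B"],
        "Species": ["Stork", "Crane"],
        "Start_Latitude": [10.0, 20.0],
        "Start_Longitude": [1.0, 2.0],
        "End_Latitude": [11.0, 21.0],
        "End_Longitude": [1.5, 2.5],
    },
    index=[10, 11],
)
points = segments_to_points(toy)
print(points[["Segment_ID", "Position", "Latitude"]])
```

Additional columns such as `Migration_Reason` could be carried along by adding them to `extra_cols`.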

## 5. Application Initialization
This cell initializes the Dash application instance:

* **`Dash(__name__)`:** Creates the main application object.
* **`external_stylesheets`:** We load `dbc.themes.BOOTSTRAP`. This is crucial because it pulls in the CSS framework that allows our Rows, Columns, Cards, and Buttons to look styled and responsive immediately.
* **`server = app.server`:** This exposes the underlying Flask server. It is not needed for running locally with `app.run`, but a production WSGI server (for example Gunicorn on Heroku or another cloud host) needs this object as its entry point.

In [13]:
app = Dash(__name__, external_stylesheets=[dbc.themes.BOOTSTRAP])
server = app.server

## 6. Dashboard Layout (User Interface)
This cell defines the visual structure of the dashboard using the **Bootstrap Grid System**.

### Layout Structure:
1.  **Container (`dbc.Container`):** The main wrapper for the entire page. We set `fluid=True` to let the dashboard expand to fill the entire width of the screen.

2.  **Header Row:** Contains the Title and Subtitle.

3.  **Main Content Row (Split Layout):**
    * **Left Column (Controls - 25% width):**
        * Contains a **Card** holding all user inputs.
        * Includes two Dropdowns (`Species` and `Bird ID`) and their respective "Select All" buttons.
        * *Technical Note:* We use `width=12, md=3` so that this column spans the full width on small screens (12 of 12 grid columns) but only 25% (3 of 12) from the `md` breakpoint upward.
    * **Right Column (Map - 75% width):**
        * Contains the **Map Card** which holds the `dcc.Graph` component.
        * The graph height is fixed at `75vh` (75% of the viewport height) to ensure the globe is large and immersive.

In [14]:
app.layout = dbc.Container([
    dbc.Row([
        dbc.Col(html.H2("Global Bird Migration Tracker", className="display-6"), width=12),
        dbc.Col(html.P("Compare species by selecting specific bird IDs.", className="text-muted"), width=12),
    ], className="my-4"),
    
    dbc.Row([
        dbc.Col([
            dbc.Card([
                dbc.CardHeader("Filter Controls", className="fw-bold"),
                dbc.CardBody([
                    #SPECIES SECTION 
                    html.Label("1. Select Species", className="mb-2 fw-bold text-primary"),
                    dcc.Dropdown(
                        id='species-filter',
                        options=[{'label': s, 'value': s} for s in df_sample['Species'].unique()],
                        value=[df_sample['Species'].unique()[0]], 
                        multi=True, 
                        clearable=True
                    ),
                    #Button to select all species
                    dbc.Button("Select All Species", id="btn-all-species", color="light", size="sm", className="mt-1 w-100 border"),
                    
                    html.Hr(),
                    
                    #BIRD ID SECTION
                    html.Label("2. Select Specific Birds", className="mb-2 fw-bold text-primary"),
                    dcc.Dropdown(
                        id='bird-selector', 
                        multi=True, 
                        placeholder="Select Bird IDs..."
                    ),
                    #Button to select all birds
                    dbc.Button("Select All Birds", id="btn-all-birds", color="light", size="sm", className="mt-1 w-100 border"),
                    
                    html.Small("Showing validated land-to-land flights only.", className="text-muted mt-3 d-block")
                ])
            ], className="mb-4 shadow-sm")
        ], width=12, md=3), 
        
        dbc.Col([
            dbc.Card([
                dbc.CardBody([
                    dcc.Graph(id='migration-map', style={'height': '75vh'}) 
                ], style={'padding': '0'})
            ], className="shadow-sm")
        ], width=12, md=9)
    ]),
], fluid=True)

## 7. Interactivity & Logic (Callbacks)
This cell defines the event handlers (Callbacks) that power the dashboard's interactivity.

### 1. "Select All" Logic (`select_all_species`)
* **Trigger:** The "Select All Species" button.
* **Action:** It grabs every available option from the dropdown and sets them all as selected values instantly.

### 2. Chained Logic & State Management (`update_bird_options`)
This is the most complex function in the dashboard. It manages the dependency between **Species** (Dropdown 1) and **Bird IDs** (Dropdown 2).

* **Context Detection (`ctx.triggered_id`):**
    * Since this callback has *two* inputs (Species Dropdown and "Select All Birds" Button), we use `ctx` to detect exactly which one triggered it.
    * **If Button Clicked:** We ignore the current selection and select **ALL** available birds.
    * **If Dropdown Changed (or on initial page load, when `ctx.triggered_id` is `None`):** We use the "Add-One" logic described below.

* **Smart "Add-One" Logic:**
    * **Problem:** A naive chained callback resets the second dropdown whenever the first one changes, discarding the user's previous selection.
    * **Solution:** We use `State` to read the current bird selection. We **keep** the birds the user has already picked (if they belong to the selected species) and only add **one single bird** for any newly selected species. This allows for smooth comparison building without resetting the view.

### 3. Map Update (`update_map`)
* **Trigger:** Any change to the **Bird ID** selection.
* **Action:** Filters the dataset to the specific selected IDs and re-renders the 3D globe.

In [15]:
# Callback for "Select All Species" button
@app.callback(
    Output('species-filter', 'value'),
    Input('btn-all-species', 'n_clicks'),
    State('species-filter', 'options'),
    prevent_initial_call=True
)
def select_all_species(n_clicks, options):
    return [opt['value'] for opt in options]

# Chained callback: updates the Bird ID options and the current selection
@app.callback(
    [Output('bird-selector', 'options'), Output('bird-selector', 'value')],
    [Input('species-filter', 'value'), Input('btn-all-birds', 'n_clicks')],
    State('bird-selector', 'value')
)
def update_bird_options(selected_species, n_clicks_all, current_bird_selection):
    if not selected_species:
        return [], []
    
    dff = df_sample[df_sample['Species'].isin(selected_species)]
    unique_birds = dff[['Species', 'Bird_ID']].drop_duplicates()
    
    # Options list
    options = [
        {'label': f"{row['Species']} | {row['Bird_ID']}", 'value': row['Bird_ID']}
        for index, row in unique_birds.iterrows()
    ]
    options = sorted(options, key=lambda x: x['label'])
    
    # Determine Logic Source (Button click vs Dropdown change)
    trigger_id = ctx.triggered_id

    if trigger_id == 'btn-all-birds':
        all_ids = [opt['value'] for opt in options]
        return options, all_ids
    else:
        # Default Logic: Keep current selection, add 1 bird for new species
        current_ids = current_bird_selection if current_bird_selection else []
        valid_id_set = set(unique_birds['Bird_ID'])
        kept_selection = [bid for bid in current_ids if bid in valid_id_set]
        
        represented_species = dff[dff['Bird_ID'].isin(kept_selection)]['Species'].unique()
        missing_species = [s for s in selected_species if s not in represented_species]
        
        final_selection = list(kept_selection)
        for species in missing_species:
            species_birds = dff[dff['Species'] == species]['Bird_ID'].unique()
            if len(species_birds) > 0:
                final_selection.append(species_birds[0])
            
        return options, final_selection

@app.callback(
    Output('migration-map', 'figure'),
    Input('bird-selector', 'value')
)
def update_map(selected_bird_ids):
    if not selected_bird_ids:
        return create_map(pd.DataFrame()) 
    filtered = filter_data(selected_bird_ids, df=df_sample)
    return create_map(filtered)
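
The "Add-One" rule inside `update_bird_options` can also be expressed as a plain function, which makes it possible to check without starting Dash. This is a sketch under stated assumptions: the name `merge_selection` and the `birds_by_species` mapping are hypothetical, and each `Bird_ID` is assumed to belong to exactly one species.

```python
def merge_selection(selected_species, current_ids, birds_by_species):
    """Keep still-valid birds and add one bird per species not yet represented."""
    # Map every bird of the selected species back to its species.
    species_of = {
        bird: species
        for species in selected_species
        for bird in birds_by_species.get(species, [])
    }
    # Drop birds whose species was deselected.
    kept = [bird for bird in (current_ids or []) if bird in species_of]
    represented = {species_of[bird] for bird in kept}

    # Add the first available bird for each newly selected species.
    for species in selected_species:
        candidates = birds_by_species.get(species, [])
        if species not in represented and candidates:
            kept.append(candidates[0])
    return kept

# Toy mapping, invented for illustration only.
birds = {"Stork": ["S1", "S2"], "Crane": ["C1"], "Hawk": []}

print(merge_selection(["Stork"], None, birds))             # ['S1']
print(merge_selection(["Stork", "Crane"], ["S2"], birds))  # ['S2', 'C1']
print(merge_selection(["Crane"], ["S2"], birds))           # ['C1']
```

The second call shows the intended effect: the user's earlier pick `S2` survives, and only one bird is added for the newly selected species.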

## 8. Application Execution
This final block runs the Dash server.

* **`if __name__ == '__main__':`**: This standard Python guard ensures the server only starts if the script is executed directly (not if it is imported as a module).
* **`app.run(debug=True)`**:
    * Starts the local development server.
    * **`debug=True`**: Enables the Dash dev tools, including "Hot Reloading." When the app runs from a script, saving a code change reloads the dashboard in your browser without a manual restart; inside a notebook you typically need to re-run the relevant cells instead.
    * *Note:* In a production environment, `debug` should be set to `False`.


    

In [16]:
if __name__ == '__main__':
    app.run(debug=True)