# Data Visualization - Mussel Baskets

## Visualization

#### Function Descriptions and Usage

##### `verify_access_code`
- **Purpose**: Verifies the access code provided by the user.
- **Usage**: Hashes the input code with SHA-256 and compares the digest to a pre-defined hashed access code, so that only users with the correct code can use the application.
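
The check can be sketched with the standard library alone. This is a minimal illustration rather than the application's exact code: the access code `"example-code"` and the helper names `hash_code` and `is_valid_code` are placeholders, and `hmac.compare_digest` is swapped in for a plain `==` so that the comparison takes constant time.

```python
import hashlib
import hmac


def hash_code(code: str) -> str:
    """Return the SHA-256 hex digest of an access code."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()


# Hypothetical stored digest; a real deployment would paste in a digest that
# was generated once, offline, from the real access code.
STORED_DIGEST = hash_code("example-code")


def is_valid_code(candidate: str, stored_digest: str = STORED_DIGEST) -> bool:
    """Compare the digest of the candidate code with the stored digest."""
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(hash_code(candidate), stored_digest)


print(is_valid_code("example-code"))  # True
print(is_valid_code("wrong-code"))    # False
```

Storing only the digest keeps the plain-text code out of the source file, although an unsalted SHA-256 hash of a short code can still be brute-forced.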

##### `load_trained_model`
- **Purpose**: Loads a trained machine learning model from a file.
- **Usage**: Uses Joblib to load the model necessary for making growth predictions.

##### `load_data`
- **Purpose**: Loads data from a CSV file into a Pandas DataFrame.
- **Usage**: Reads various datasets needed for the application.

##### `create_sidebar`
- **Purpose**: Creates a sidebar for user settings in the Streamlit app.
- **Usage**: Allows selection of one or more years and locations. The selections are returned as a tuple and used to filter the data displayed on the dashboard.
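
The sidebar offers locations as combined `System - Plot Location` labels, which the display functions later split back apart. A minimal sketch of that round trip is shown here; the helper names `make_label` and `split_label` and the plot names are hypothetical, and only the `OS` and `WAD` system codes come from the application.

```python
def make_label(system: str, plot_location: str) -> str:
    """Combine a system and a plot location into one sidebar label."""
    return f"{system} - {plot_location}"


def split_label(label: str) -> tuple[str, str]:
    """Split a sidebar label back into (system, plot_location)."""
    # maxsplit=1 keeps any later ' - ' inside the plot location name.
    system, plot_location = label.split(" - ", 1)
    return system, plot_location


# Hypothetical (system, plot location) pairs, including one duplicate.
pairs = [("WAD", "Plot B"), ("OS", "Plot A - North"), ("WAD", "Plot B")]

# De-duplicate and sort the labels before offering them as options.
labels = sorted({make_label(system, plot) for system, plot in pairs})
print(labels)
print([split_label(label) for label in labels])
```

Splitting only on the first separator works as long as system codes never contain `' - '` themselves.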

##### `display_main_map`
- **Purpose**: Displays an interactive map with mussel growth data.
- **Usage**: Uses Plotly Express to show bubbles sized by mussel growth (g per day), colored by a selected feature, and animated by monitoring period. Filters data based on user-selected years and locations.

##### `display_graphs`
- **Purpose**: Displays interactive graphs for various metrics over time.
- **Usage**: Uses Plotly Graph Objects to plot data based on user-selected years and locations, providing visual insights into mussel growth trends.
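
When several years are selected, values are averaged per month and monitoring period before plotting. The sketch below shows that aggregation in plain Python to stay dependency-free (the app itself uses a Pandas `groupby`); the records and their values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (year, month, monitoring_period, growth_g_per_day)
records = [
    (2021, "May", 1, 0.10),
    (2022, "May", 1, 0.20),
    (2021, "June", 2, 0.30),
    (2022, "June", 2, 0.50),
    (2020, "May", 1, 0.90),  # excluded by the year filter below
]

selected_years = {2021, 2022}

# Collect values per (month, monitoring period) for the selected years only.
groups = defaultdict(list)
for year, month, period, growth in records:
    if year in selected_years:
        groups[(month, period)].append(growth)

# Average each group, then order by monitoring period so that months plot in
# chronological rather than alphabetical order.
averaged = sorted(
    (
        (month, period, round(mean(values), 2))
        for (month, period), values in groups.items()
    ),
    key=lambda row: row[1],
)
print(averaged)
```

Sorting by monitoring period rather than by month name is what keeps May ahead of June on the x-axis.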

##### `display_feature_vs_target_analysis`
- **Purpose**: Compares a selected feature against Growth (g per day).
- **Usage**: Creates an interactive graph using user-selected years and locations to filter and plot the data for detailed feature analysis.

##### `display_location_rankings`
- **Purpose**: Displays a ranking table of locations based on average metrics over all years.
- **Usage**: Averages growth, precipitation, sea surface temperature, chlorophyll, and turbidity per plot location and system, then sorts the table by Growth (g per day) in descending order.

##### `haversine`
- **Purpose**: Calculates the great-circle distance between two points on the Earth's surface.
- **Usage**: Given the latitude and longitude of two points in decimal degrees, returns the distance between them in kilometers. Used to find the nearest environmental data point.
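
A standalone version of the formula makes it easy to sanity-check against known distances. The function name `haversine_km` is chosen for this sketch; the Earth radius of 6371 km matches the value used in the application.

```python
from math import radians, sin, cos, sqrt, atan2, pi

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points in decimal degrees."""
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    a = (
        sin(d_lat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2
    )
    return EARTH_RADIUS_KM * 2 * atan2(sqrt(a), sqrt(1 - a))


# One degree of latitude along a meridian is R * pi / 180.
one_degree = haversine_km(0.0, 0.0, 1.0, 0.0)
print(round(one_degree, 2))  # 111.19

# A quarter of the equator is R * pi / 2.
quarter = haversine_km(0.0, 0.0, 0.0, 90.0)
print(round(quarter, 1))  # 10007.5
```

The formula treats the Earth as a sphere, so results can differ from ellipsoidal distances by a fraction of a percent, which is negligible for picking the nearest data point.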

##### `get_nearest_environmental_data`
- **Purpose**: Finds the nearest environmental data point for a specific month.
- **Usage**: Filters data by month and calculates distances using the haversine function to return the closest data point, providing accurate environmental data inputs for growth predictions.
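
The lookup amounts to a month filter followed by a minimum over distances. The sketch below uses a list of dictionaries instead of a DataFrame; the rows are invented, the keys mirror the `lat`, `lon`, and `month` columns the application filters on, and returning `None` for a month without data is a choice made for this sketch, not the application's behavior.

```python
from math import radians, sin, cos, sqrt, atan2


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    d_lat = radians(lat2 - lat1)
    d_lon = radians(lon2 - lon1)
    a = (
        sin(d_lat / 2) ** 2
        + cos(radians(lat1)) * cos(radians(lat2)) * sin(d_lon / 2) ** 2
    )
    return 6371.0 * 2 * atan2(sqrt(a), sqrt(1 - a))


# Hypothetical environmental rows; the 'sst' values are made up.
rows = [
    {"month": 5, "lat": 53.0, "lon": 5.0, "sst": 12.1},
    {"month": 5, "lat": 51.5, "lon": 3.9, "sst": 13.4},
    {"month": 6, "lat": 52.4, "lon": 4.9, "sst": 16.0},
]


def nearest_row(lat, lon, rows, month):
    """Return the row for `month` closest to (lat, lon), or None if there is none."""
    candidates = [row for row in rows if row["month"] == month]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda row: haversine_km(lat, lon, row["lat"], row["lon"]),
    )


# The June row is geographically closest, but the month filter excludes it.
print(nearest_row(52.3676, 4.9041, rows, month=5))
```

Filtering before measuring distance matters: the closest point overall may belong to a different month.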

##### `calculate_average_ash_free_dry_weight`
- **Purpose**: Calculates the average Ash Free Dry Weight (g) for a specified monitoring period.
- **Usage**: Filters relevant data and computes the mean to provide necessary input for the growth prediction model.

##### `predict_growth`
- **Purpose**: Predicts mussel growth based on user inputs and additional environmental features.
- **Usage**: Uses the trained model to make predictions for each monitoring period from May to October and returns the results in a DataFrame.
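
The general shape of a per-period prediction loop can be sketched with a stand-in model. Everything here is illustrative: `StubModel`, its formula, the shortened feature names, and the average weights are invented, and pairing each period with the previous period's average weight is an assumption of this sketch rather than a statement about the application's logic. The real app loads a trained model with Joblib and passes many more features.

```python
class StubModel:
    """Stand-in for a trained regressor; the formula is invented for the demo."""

    def predict(self, rows):
        # One prediction per input row, as scikit-learn style estimators return.
        return [
            0.01 * row["period"] + 0.02 * row["weight_lag"] + 0.1 * row["afdw_lag"]
            for row in rows
        ]


# Hypothetical average ash free dry weight (g) per monitoring period.
AVG_AFDW_BY_PERIOD = {0: 0.2, 1: 0.3, 2: 0.4, 3: 0.5, 4: 0.6, 5: 0.7}
MONTHS = ["May", "June", "July", "August", "September", "October"]


def predict_season(model, initial_weight):
    """Predict growth for monitoring periods 1 to 6, one per month."""
    results = []
    for period, month in enumerate(MONTHS, start=1):
        features = {
            "period": period,
            "weight_lag": initial_weight,
            "afdw_lag": AVG_AFDW_BY_PERIOD[period - 1],  # previous period
        }
        growth = model.predict([features])[0]
        results.append({"Month": month, "Growth (g per day)": growth})
    return results


season = predict_season(StubModel(), initial_weight=2.0)
for row in season:
    print(row["Month"], round(row["Growth (g per day)"], 3))
```

Returning one record per month keeps the result easy to turn into a DataFrame with `Month` and `Growth (g per day)` columns for plotting.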

##### `display_prediction_interface`
- **Purpose**: Creates the interface for mussel growth prediction.
- **Usage**: Allows users to input latitude, longitude, and individual weight, then predicts growth based on these inputs. Displays the predictions and a corresponding plot.

##### `display_data`
- **Purpose**: Displays the interactive map, graphs, and prediction interface in the Streamlit app.
- **Usage**: Uses sidebar selections to filter data and calls relevant display functions to show visualizations and predictions.

##### `main`
- **Purpose**: Runs the Streamlit app for the Mussel Growth Trends Dashboard.
- **Usage**: Handles access control, loads necessary data and model files upon successful access verification, and calls the `display_data` function to render the dashboard.

#### Summary

This Streamlit application visualizes mussel growth data through interactive maps and graphs and lets users predict growth from environmental factors. `verify_access_code` secures access to the app, while `load_trained_model` and `load_data` load the required files. `create_sidebar` builds the sidebar for user input. The core display functions `display_main_map`, `display_graphs`, and `display_feature_vs_target_analysis` render the main visual elements based on that input. `get_nearest_environmental_data` and `calculate_average_ash_free_dry_weight` supply the environmental inputs, with distances computed by `haversine`. `predict_growth` produces the growth predictions and `display_prediction_interface` provides the interface for them. `display_data` integrates these components into the main app, which `main` initializes and runs.

In [18]:
%%writefile mussel_visualization.py

# Streamlit for creating the web application interface
import streamlit as st

# Pandas for data manipulation and analysis
import pandas as pd

# Folium and streamlit_folium for interactive maps
import folium
from streamlit_folium import folium_static, st_folium

# NumPy for numerical operations
import numpy as np

# Plotly for creating interactive plots and charts
import plotly.graph_objects as go
import plotly.express as px

# Matplotlib for additional plotting utilities
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# Joblib for model serialization
import joblib

# Hashlib for hashing access codes
import hashlib

# Math functions for distance calculations
from math import radians, sin, cos, sqrt, atan2

# OS for file operations (not used, can be removed if not needed)
import os


access_granted = False

st.set_page_config(layout="wide", page_title="Mussel Growth Trends Dashboard", page_icon="🐚")

def verify_access_code(input_code):
    """
    Verifies the access code provided by the user.

    This function hashes the input code using SHA256 and compares it to a pre-defined 
    hashed access code. It is used to ensure that only users with the correct access 
    code can proceed to use the application.

    Args:
        input_code (str): The access code input by the user.

    Returns:
        bool: True if the input code matches the pre-defined hashed access code, False otherwise.
    """
    # Hash the input code using SHA256
    hashed_input = hashlib.sha256(input_code.encode()).hexdigest()

    # Pre-defined hashed version of the correct access code
    # Note: To generate this hash for a new access code, use the following code snippet in a separate cell:
    # import hashlib
    # correct_code = 'your_desired_access_code'
    # print(hashlib.sha256(correct_code.encode()).hexdigest())
    hashed_access_code = 'd0601b8185961d8e3e0ea3bbbeca893e630abfa2f9617a5122a431f498bcfe6e'

    # Compare the hashed input code with the pre-defined hashed access code
    return hashed_input == hashed_access_code

def load_trained_model(file_path):
    """
    Loads the trained model from a file.

    Args:
        file_path (str): The path to the model file.

    Returns:
        model: The loaded model.
    """
    return joblib.load(file_path)

def load_data(file):
    """
    Loads the data from a CSV file.

    Args:
        file (str): The path to the CSV file.

    Returns:
        pd.DataFrame: The loaded DataFrame.
    """
    return pd.read_csv(file)

def create_sidebar(df):
    """
    Creates a sidebar in the Streamlit app for user settings, allowing selection of years and locations.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.

    Returns:
        tuple: A tuple containing the selected year range and location selection.
    """
    # Set the title and header for the sidebar
    st.sidebar.title("Settings")
    
    # Add spacing
    st.sidebar.markdown("<br>", unsafe_allow_html=True)
    
    # Multiselect widget for selecting year(s)
    year_range = st.sidebar.multiselect(
        '📅 Select Year(s)',
        options=list(range(int(df['Year'].min()), int(df['Year'].max()) + 1)), 
        default=list(range(int(df['Year'].min()), int(df['Year'].max()) + 1))
    )
    
    # Instruction for year selection
    st.sidebar.markdown("""
    <div style="border: 1px solid #ccc; padding: 10px; border-radius: 5px; font-size: 12px; margin-top: -10px;">
    <b>Note</b>: Selecting more than one year will average the values for all selected years.
    </div>
    """, unsafe_allow_html=True)
    
    # Add spacing
    st.sidebar.markdown("<br>", unsafe_allow_html=True)
    
    # Prepare location options by combining 'System' and 'Plot Location'
    location_options = df[['System', 'Plot Location']].drop_duplicates()
    location_options['Location_System'] = location_options['System'] + ' - ' + location_options['Plot Location']
    
    # Multiselect widget for selecting locations
    location_selection = st.sidebar.multiselect(
        '📍 Select Locations',
        options=location_options['Location_System'].sort_values()
    )
    
    # Instruction for location selection
    st.sidebar.markdown("""
    <div style="border: 1px solid #ccc; padding: 10px; border-radius: 5px; font-size: 12px; margin-top: -10px;">
    <b>Note</b>: If no location is selected, the data will be averaged per system.
    Selecting more than one location will display data per location.
    </div>
    """, unsafe_allow_html=True)
    
    # Return the selected year range and location selection
    return year_range, location_selection

def display_main_map(df, year_range, location_selection):
    """
    Displays an interactive map with bubbles representing mussel growth data, colored by a selected feature and animated by monitoring period.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
        year_range (list): List of selected years.
        location_selection (list): List of selected locations.
    """
    # User selection for the feature to color the map bubbles
    selected_color_feature = st.selectbox('Select feature for bubble color', [
        'Precipitation', 'Sea Surface Temperature (C)', 'Chlorophyll (mg per m3)', 'Turbidity (FTU)'
    ], help="The color scale represents the selected feature, indicating its intensity.")
    
    # Add a custom information box
    st.markdown("""
        <div style="background-color: #fff3cd; padding: 10px; border-radius: 5px; color: #856404; font-size: 16px; margin-bottom: 20px;">
            <strong>Note</strong>: Color scale indicates the intensity of the selected feature, and bubble size represents the magnitude of growth (g per day).
        </div>
    """, unsafe_allow_html=True)

    # Filter the dataframe based on the selected years (copy so that adding
    # columns below does not write to a view of the original DataFrame)
    df_filtered = df[df['Year'].isin(year_range)].copy()

    # If specific locations are selected, filter the dataframe accordingly
    if location_selection:
        df_filtered['Location_System'] = df_filtered['System'] + ' - ' + df_filtered['Plot Location']
        df_filtered = df_filtered[df_filtered['Location_System'].isin(location_selection)]

    # Exclude data for Monitoring Period 0
    df_filtered = df_filtered[df_filtered['Monitoring Period'] > 0]

    # Group the filtered data by specific columns and calculate the mean of the
    # numeric columns for each group (text columns cannot be averaged)
    df_grouped = df_filtered.groupby([
        'lat', 'lon', 'Plot Location', 'System', 'Month', 'Monitoring Period'
    ]).mean(numeric_only=True).round(2).reset_index()

    # Rename columns for better readability in the hover information
    df_grouped.rename(columns={'lat': 'Latitude', 'lon': 'Longitude'}, inplace=True)

    # Sort the grouped data by Monitoring Period
    df_grouped.sort_values(by='Monitoring Period', inplace=True)

    # Create the scatter mapbox figure using Plotly Express
    fig = px.scatter_mapbox(
        df_grouped,
        lat="Latitude",
        lon="Longitude",
        color=selected_color_feature,
        size="Growth (g per day)",
        size_max=25,
        hover_name="Plot Location",
        hover_data={
            "System": True,
            "Latitude": False,
            "Longitude": False,
            selected_color_feature: True,
            "Growth (g per day)": True,
            "Month": True
        },
        animation_frame="Monitoring Period",
        mapbox_style="carto-positron",
        color_continuous_scale=px.colors.sequential.Viridis,
        zoom=7,
        center={"lat": df['lat'].mean(), "lon": df['lon'].mean()}
    )

    # Update the layout of the figure
    fig.update_layout(
        margin={"r": 0, "t": 0, "l": 0, "b": 0},
        mapbox=dict(bearing=0, pitch=0),
        coloraxis_colorbar=dict(title=selected_color_feature),
        height=800
    )

    # Display the figure using Streamlit
    st.plotly_chart(fig, use_container_width=True)
    
def display_graphs(df, year_range, location_selection):
    """
    Displays interactive graphs showing various metrics over time.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
        year_range (list): List of selected years.
        location_selection (list): List of selected locations.
    """
    st.header("📈 Graphs")  # Add a header for the graphs section
    
    # Define the metrics to be plotted, ordered by importance
    metrics = ['Growth (g per day)', 'Chlorophyll (mg per m3)', 'Sea Surface Temperature (C)', 
               'Turbidity (FTU)', 'Precipitation', 'Individual Weight (g)', 
               'Depth (m)', 'Average Flow Speed (mps)', 'Maximum Flow Speed (mps)', 'Living Mussel Count']

    # Create two columns for displaying the plots
    col1, col2 = st.columns(2)
    
    # Define colors for different systems
    system_colors = {'OS': 'orange', 'WAD': 'green'}
    
    # Create a colormap for location colors
    cmap = plt.get_cmap('tab10')
    location_colors = [mcolors.to_hex(cmap(i)) for i in np.linspace(0, 1, len(location_selection))]

    # Filter the dataframe based on the selected years
    df_filtered = df[df['Year'].isin(year_range)]

    # Initialize a counter to alternate between columns
    counter = 0

    # Loop through each metric to create a plot
    for metric in metrics:
        fig = go.Figure()  # Create a new figure for each metric
        added_systems = set()  # Track systems that have been added to the plot

        # If specific locations are selected, plot each location's data
        if location_selection:
            # Create a dictionary to map locations to colors
            location_colors_dict = {location: color for location, color in zip(location_selection, location_colors)}
            
            # Loop through each selected location
            for location in location_selection:
                if ' - ' in location:
                    # Split the location string into system and plot location
                    system, plot_location = location.split(' - ', 1)
                    
                    # Filter the data for the specific system and plot location
                    df_location = df_filtered[(df_filtered['System'] == system) & 
                                              (df_filtered['Plot Location'] == plot_location)].dropna()
                    
                    # Group by 'Month' and 'Monitoring Period', and calculate the mean for the metric
                    df_location_avg = df_location.groupby(['Month', 'Monitoring Period'])[metric].mean().reset_index()
                    
                    # Sort the grouped data by 'Monitoring Period'
                    df_location_avg.sort_values(by='Monitoring Period', inplace=True)

                    # Add a trace for the location's data
                    fig.add_trace(go.Scatter(x=df_location_avg['Month'], y=df_location_avg[metric], 
                                             mode='lines+markers', 
                                             name=f'Average at {plot_location}', 
                                             line=dict(shape='spline', color=location_colors_dict[location]),
                                             marker=dict(symbol='circle')))
                    
                    # Add system average line if not already added
                    if system not in added_systems:
                        df_system_avg = df_filtered[df_filtered['System'] == system].groupby(['Month', 'Monitoring Period'])[metric].mean().reset_index()
                        df_system_avg.sort_values(by='Monitoring Period', inplace=True)
                        fig.add_trace(go.Scatter(x=df_system_avg['Month'], y=df_system_avg[metric], 
                                                 mode='lines+markers', 
                                                 line=dict(dash='dash', shape='spline', color=system_colors[system]), 
                                                 name=f'Average in {system} System',
                                                 marker=dict(symbol='circle')))
                        added_systems.add(system)
                else:
                    st.warning(f"Invalid location format: {location}")
        else:
            # If no specific locations are selected, plot each system's average data
            for system in df['System'].unique():
                df_system_avg = df_filtered[df_filtered['System'] == system].groupby(['Month', 'Monitoring Period'])[metric].mean().reset_index()
                df_system_avg.sort_values(by='Monitoring Period', inplace=True)
                fig.add_trace(go.Scatter(x=df_system_avg['Month'], y=df_system_avg[metric], 
                                         mode='lines+markers', 
                                         name=f'Average {system}', 
                                         line=dict(shape='spline', color=system_colors[system]),
                                         marker=dict(symbol='circle')))

        # Update the layout of the figure
        fig.update_layout(template="plotly_dark", title=f"{metric} Trends - {', '.join(map(str, year_range))}")
        
        # Alternate between the two columns for displaying the charts
        with (col1 if counter % 2 == 0 else col2):
            st.plotly_chart(fig, use_container_width=True)
        
        counter += 1  # Increment the counter to alternate columns
            
def display_feature_vs_target_analysis(df, year_range, location_selection):
    """
    Displays an interactive graph comparing a selected feature against Growth (g per day).

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
        year_range (list): List of selected years.
        location_selection (list): List of selected locations.
    """
    st.header("🔍 Growth (g per day) Feature Analysis")  # Header for the analysis section
    
    # Define the available metrics
    metrics = ['Growth (g per day)', 'Chlorophyll (mg per m3)', 'Sea Surface Temperature (C)', 
               'Turbidity (FTU)', 'Precipitation', 'Individual Weight (g)', 
               'Depth (m)', 'Average Flow Speed (mps)', 'Maximum Flow Speed (mps)', 'Living Mussel Count']
    
    # Create a list of feature options excluding 'Growth (g per day)'
    feature_options = [metric for metric in metrics if metric != 'Growth (g per day)']
    
    # Selectbox for the user to choose a feature to plot against 'Growth (g per day)'
    selected_feature = st.selectbox('Select a feature to plot against Growth (g per day)', options=feature_options)

    if selected_feature:
        # Filter the dataframe based on the selected years (copy so that adding
        # columns below does not write to a view of the original DataFrame)
        df_filtered = df[df['Year'].isin(year_range)].copy()
        
        # If specific locations are selected, filter based on those locations
        if location_selection:
            df_filtered['Location_System'] = df_filtered['System'] + ' - ' + df_filtered['Plot Location']
            df_filtered = df_filtered[df_filtered['Location_System'].isin(location_selection)]
        
        # Create a new figure for plotting
        fig = go.Figure()

        if location_selection:
            # Loop through each selected location
            for location in location_selection:
                if ' - ' in location:
                    # Split the location string into system and plot location
                    system, plot_location = location.split(' - ', 1)
                    
                    # Filter the data for the specific system and plot location
                    df_location = df_filtered[(df_filtered['System'] == system) & 
                                              (df_filtered['Plot Location'] == plot_location)].dropna()
                    
                    # Group by 'Month' and 'Monitoring Period', and calculate the mean for the selected feature and target
                    df_grouped = df_location.groupby(['Month', 'Monitoring Period'])[
                        [selected_feature, 'Growth (g per day)']
                    ].mean().reset_index()
                    
                    # Sort the grouped data by 'Monitoring Period'
                    df_grouped.sort_values(by='Monitoring Period', inplace=True)

                    # Add a trace for the selected feature
                    fig.add_trace(go.Scatter(x=df_grouped['Month'], y=df_grouped[selected_feature], mode='lines+markers', 
                                             name=f'{selected_feature} at {plot_location}', yaxis='y1', line_shape='spline'))
                    
                    # Add a trace for 'Growth (g per day)'
                    fig.add_trace(go.Scatter(x=df_grouped['Month'], y=df_grouped['Growth (g per day)'], mode='lines+markers', 
                                             name=f'Growth (g per day) at {plot_location}', line=dict(width=4, dash='dash'), yaxis='y2', line_shape='spline'))
        else:
            # If no specific locations are selected, plot the data for each system
            for system in df['System'].unique():
                df_system = df_filtered[df_filtered['System'] == system].dropna()
                
                # Group by 'Month' and 'Monitoring Period', and calculate the mean for the selected feature and target
                df_grouped = df_system.groupby(['Month', 'Monitoring Period'])[
                    [selected_feature, 'Growth (g per day)']
                ].mean().reset_index()
                
                # Sort the grouped data by 'Monitoring Period'
                df_grouped.sort_values(by='Monitoring Period', inplace=True)

                # Add a trace for the selected feature
                fig.add_trace(go.Scatter(x=df_grouped['Month'], y=df_grouped[selected_feature], mode='lines+markers', 
                                         name=f'{selected_feature} in {system} System', yaxis='y1', line_shape='spline'))
                
                # Add a trace for 'Growth (g per day)'
                fig.add_trace(go.Scatter(x=df_grouped['Month'], y=df_grouped['Growth (g per day)'], mode='lines+markers', 
                                         name=f'Growth (g per day) in {system} System', line=dict(width=4, dash='dash'), yaxis='y2', line_shape='spline'))

        # Update the layout of the figure
        fig.update_layout(
            title=f"{selected_feature} vs Growth (g per day)",
            xaxis_title='Month',
            yaxis=dict(title=f'{selected_feature} Values', side='left'),
            yaxis2=dict(title='Growth (g per day)', overlaying='y', side='right'),
            height=600,
            template="plotly_dark"
        )

        # Display the figure
        st.plotly_chart(fig, use_container_width=True)
        
def display_location_rankings(df):
    """
    Displays a ranking table of locations based on the average of selected metrics over all years.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
    """
    # Select the metrics to rank locations
    metrics = ['Growth (g per day)', 'Precipitation', 'Sea Surface Temperature (C)', 'Chlorophyll (mg per m3)', 'Turbidity (FTU)']
    
    # Calculate the average of each metric for each location and system
    df_rankings = df.groupby(['Plot Location', 'System'])[metrics].mean().round(2).reset_index()
    
    # Sort by 'Growth (g per day)' in descending order
    df_rankings = df_rankings.sort_values(by='Growth (g per day)', ascending=False).reset_index(drop=True)

    # Display the rankings table with a header
    st.header("📊 Location Rankings Based on Average Metrics")
    
    # Add a custom information box
    st.markdown("""
        <div style="background-color: #fff3cd; padding: 10px; border-radius: 5px; color: #856404; font-size: 16px; margin-bottom: 20px;">
            <strong>Note</strong>: The table below shows the ranking of locations based on the average of selected metrics over all years.
        </div>
    """, unsafe_allow_html=True)
    
    # Display the dataframe as a table
    st.dataframe(df_rankings, use_container_width=True)

def haversine(latitude1, longitude1, latitude2, longitude2):
    """
    Calculate the great-circle distance between two points 
    on the Earth's surface given their latitude and longitude.
    
    Parameters:
    - latitude1: Latitude of the first point in decimal degrees.
    - longitude1: Longitude of the first point in decimal degrees.
    - latitude2: Latitude of the second point in decimal degrees.
    - longitude2: Longitude of the second point in decimal degrees.
    
    Returns:
    - distance: The distance between the two points in kilometers.
    """
    # Radius of the Earth in kilometers
    earth_radius_km = 6371.0

    # Convert the latitude and longitude differences from degrees to radians
    delta_latitude = radians(latitude2 - latitude1)
    delta_longitude = radians(longitude2 - longitude1)
    
    # Apply the Haversine formula to calculate the great-circle distance
    a = sin(delta_latitude / 2)**2 + cos(radians(latitude1)) * cos(radians(latitude2)) * sin(delta_longitude / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    
    # Calculate the distance
    distance = earth_radius_km * c

    return distance

def get_nearest_environmental_data(latitude, longitude, dataframe, month):
    """
    Find the nearest environmental data point to a given latitude and longitude for a specific month.

    Parameters:
    - latitude: Latitude of the target location in decimal degrees.
    - longitude: Longitude of the target location in decimal degrees.
    - dataframe: DataFrame containing the environmental data with 'lat' and 'lon' columns.
    - month: The month for which to filter the data.

    Returns:
    - nearest_row: The row from the DataFrame that is closest to the given location for the specified month.
    """
    
    # Filter the dataframe for the specified month
    monthly_data = dataframe[dataframe['month'] == month].copy()

    # Calculate the distance from the target location for each row in the filtered dataframe
    monthly_data['distance'] = monthly_data.apply(
        lambda row: haversine(latitude, longitude, row['lat'], row['lon']), axis=1
    )

    # Find the row with the minimum distance
    nearest_row = monthly_data.loc[monthly_data['distance'].idxmin()]
    
    return nearest_row

def calculate_average_ash_free_dry_weight(monitoring_period):
    """
    Calculate the average Ash Free Dry Weight (g) for the specified monitoring period.

    Parameters:
    - monitoring_period: The monitoring period for which to calculate the average weight.

    Returns:
    - average_weight: The average Ash Free Dry Weight (g) for the specified period.
    """
    # Filter the dataframe for the specified monitoring period
    period_data = df_modeling[df_modeling['Monitoring Period'] == monitoring_period]
    
    # Calculate the mean of 'Ash Free Dry Weight (g)_lag' for the filtered data
    average_weight = period_data['Ash Free Dry Weight (g)_lag'].mean()
    
    return average_weight

def predict_growth(latitude, longitude, initial_weight):
    """
    Predict mussel growth based on user inputs and additional environmental features.

    Parameters:
    - latitude: Latitude of the location.
    - longitude: Longitude of the location.
    - initial_weight: Initial weight of the mussel.

    Returns:
    - predictions: DataFrame with columns 'Month' and 'Growth (g per day)'.
    """
    # Get nearest environmental data for April (lag) and May (current)
    env_data_april = get_nearest_environmental_data(latitude, longitude, df_environment, 4)
    env_data_may = get_nearest_environmental_data(latitude, longitude, df_environment, 5)

    # Create input data dictionary for prediction
    input_data = {
        'Chlorophyll': [env_data_may.get('chlorophyll', 0)],
        'Water Temperature (C)': [env_data_may.get('sst', 0)],
        'Water Temperature (C)_lag': [env_data_april.get('sst', 0)],
        'Ash Free Dry Weight (g)_lag': [calculate_average_ash_free_dry_weight(0)],
        'Number of Days': [30],
        'Precipitation_lag': [env_data_april.get('precipitation', 0)],
        'Turbidity (FTU)': [env_data_may.get('turbidity', 0)],
        'Turbidity (FTU)_lag': [env_data_april.get('turbidity', 0)],
        'Monitoring Period': [0],
        'Individual Weight (g)_lag': [initial_weight],
        'Chlorophyll_lag': [env_data_april.get('chlorophyll', 0)]
    }

    # Convert input data dictionary to DataFrame
    input_df = pd.DataFrame(input_data)
    predictions = []

    # Perform predictions for each monitoring period from May to October
    for period in range(1, 7):
        # Predict growth using the loaded model
        prediction = loaded_model.predict(input_df[[
            'Chlorophyll', 'Water Temperature (C)', 'Water Temperature (C)_lag', 
            'Ash Free Dry Weight (g)_lag', 'Number of Days', 'Precipitation_lag', 
            'Turbidity (FTU)', 'Turbidity (FTU)_lag', 'Monitoring Period', 
            'Individual Weight (g)_lag', 'Chlorophyll_lag']])[0]
        predictions.append(prediction)
        
        # Update input data for the next period
        input_df['Monitoring Period'] = period
        input_df['Growth (g per day)'] = prediction
        input_df['Ash Free Dry Weight (g)_lag'] = calculate_average_ash_free_dry_weight(period)

    # Create a DataFrame for predictions with 'Month' and 'Growth (g per day)'
    months = ['May', 'June', 'July', 'August', 'September', 'October']
    prediction_df = pd.DataFrame({
        'Month': months,
        'Growth (g per day)': predictions
    })

    return prediction_df

def display_prediction_interface():
    """
    Displays the prediction interface for mussel growth. Allows users to input latitude, longitude, 
    and individual weight, then predicts growth based on these inputs.
    """
    st.header("🔮 Predict Mussel Growth")

    # Custom warning box for model disclaimer
    st.markdown("""
        <div style="background-color: #fff3cd; padding: 10px; border-radius: 5px; color: #856404; font-size: 16px; margin-bottom: 20px;">
            <strong>Disclaimer</strong>: The predictions provided here are based on a trained model. 
            While the model has been developed and tested with historical data, 
            the predictions are not guaranteed to be 100% accurate. 
            Please use these predictions as a guideline and not as a definitive forecast.
        </div>
    """, unsafe_allow_html=True)

    # Initial coordinates for Amsterdam
    initial_lat = 52.3676
    initial_lon = 4.9041

    # User inputs for prediction
    lat = st.number_input('Latitude', value=initial_lat, format="%.6f")
    lon = st.number_input('Longitude', value=initial_lon, format="%.6f")
    individual_weight = st.number_input('Individual Weight (g)', value=2.0, min_value=0.1)

    # Run a prediction each time the button is clicked and keep the result in
    # session state so it survives reruns triggered by other widgets
    if st.button("Predict Growth", key="predict"):
        with st.spinner('Predicting...'):
            st.session_state['predictions'] = predict_growth(lat, lon, individual_weight)

    # Display predictions and plot if available
    if 'predictions' in st.session_state:
        predictions_df = st.session_state['predictions']
        st.write(predictions_df)  # Display predictions DataFrame

        # Create a plot of the predictions
        fig = go.Figure()
        fig.add_trace(go.Scatter(
            x=predictions_df['Month'],
            y=predictions_df['Growth (g per day)'],
            mode='lines+markers',
            line_shape='spline'
        ))
        fig.update_layout(
            title="Predicted Growth (g per day) for Each Month",
            xaxis_title="Month",
            yaxis_title="Growth (g per day)"
        )
        st.plotly_chart(fig, use_container_width=True)
        
def display_data(df):
    """
    Main function to display the interactive map, graphs, and prediction interface in the Streamlit app.

    Args:
        df (pd.DataFrame): The DataFrame containing the data.
    """

    # Create the sidebar and get the user's selections
    year_range, location_selection = create_sidebar(df)

    # Apply custom styling using CSS for font family
    st.markdown("""
    <style>
    @import url('https://fonts.googleapis.com/css2?family=Inter&display=swap');
    h1, h2, h3, h4, h5, h6, p {
        font-family: 'Inter', sans-serif;
    }
    </style>
    """, unsafe_allow_html=True)

    # Markdown block to provide an overview and instructions for the app
    st.markdown("""
    > This dashboard visualizes mussel growth data from various locations in the Netherlands. 
    > Use the sidebar to filter data by year and location.
    > The map shows mussel baskets with different colors representing different systems. 
    > Charts below display detailed growth trends. Hover over them for more information.
    """)

    # Display header for the map section
    st.header("🌍 Map")

    # Display the main interactive map
    display_main_map(df, year_range, location_selection)
    
    # Add a divider for better visual separation
    st.divider()

    # Display graphs for various metrics over time
    display_graphs(df, year_range, location_selection)

    # Add another divider
    st.divider()

    # Display analysis comparing selected features against Growth (g per day)
    display_feature_vs_target_analysis(df, year_range, location_selection)
    
    # Add another divider
    st.divider()

    # Display location rankings
    display_location_rankings(df)
    
    # Add a final divider
    st.divider()
    
    # Display the prediction interface for predicting mussel growth
    display_prediction_interface()

def main():
    """
    Main function to run the Streamlit app for Mussel Growth Trends Dashboard.
    """
    # Set the title of the Streamlit app
    st.title('🦪 Mussel Growth Trends Dashboard'.upper())

    # Check if 'access_granted' exists in the session state, if not, initialize it as False
    if "access_granted" not in st.session_state:
        st.session_state.access_granted = False

    # Access control: If access has not been granted
    if not st.session_state.access_granted:
        # Create three columns for layout, with the middle column for the access code input
        cols = st.columns([1, 2, 1])
        with cols[1]:
            # Create a password input field in the middle column for the access code
            access_code = st.text_input("Enter access code", type="password")
            # Create a submit button in the middle column
            if st.button("Submit"):
                # Verify the access code
                if verify_access_code(access_code):
                    # If the access code is correct, set 'access_granted' to True and refresh the page
                    st.session_state.access_granted = True
                    st.rerun()  # Refresh the page to reload with granted access (st.experimental_rerun is deprecated)
                else:
                    # If the access code is incorrect, display an error message
                    st.error("Access denied. Please enter the correct access code.")
            else:
                # If the submit button is not clicked, display a message prompting for the access code
                st.markdown("""
                > This dashboard allows you to explore the growth trends of mussels across various locations and periods. 
                > Please enter the access code to continue.
                """)
    else:
        # If access is granted, proceed to load necessary files
        global loaded_model, df_environment, df_modeling  # Define as global variables

        # Load the trained model
        loaded_model = load_trained_model('./Data/final_ridge_model.pkl')
        # Load the combined environmental data
        df_environment = load_data('./Data/combined_data.csv')
        
        # Load the modeling data (with lag features)
        df_modeling = load_data('./Data/It3 - Mussel + SatML + Weather (Lag Features).csv')

        # Drop 'Plot Location' column from the modeling DataFrame as it is not needed
        df_modeling.drop(columns=['Plot Location'], inplace=True)

        # Load the main dataset and ensure the 'Year' column is of type int
        df = load_data('./Data/Mussel + Satellite + Weather.csv')
        df['Year'] = df['Year'].astype(int)

        # Display the data and visualizations in the app
        display_data(df)

if __name__ == '__main__':
    main()

Overwriting mussel_visualization.py


##### *Get Access Code Hashing*

In [17]:
import hashlib

# Define the access code
access_code = '1'

# Hash the access code using SHA-256
hashed_access_code = hashlib.sha256(access_code.encode()).hexdigest()

# Print the hashed access code
print(hashed_access_code)

6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b
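
A minimal sketch of how a stored hash like the one printed above could be checked against user input. The name `check_access_code` is hypothetical and is not necessarily how `verify_access_code` in the app is implemented; using `hmac.compare_digest` for a constant-time comparison is an assumption added here.

```python
import hashlib
import hmac

# Hash printed above for the access code '1'
STORED_HASH = "6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b"


def check_access_code(code: str, stored_hash: str = STORED_HASH) -> bool:
    """Hypothetical helper: hash the input with SHA-256 and compare it to the stored hash."""
    candidate = hashlib.sha256(code.encode()).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(candidate, stored_hash)


print(check_access_code("1"))
print(check_access_code("wrong"))
```

Storing only the hash keeps the plain access code out of the app source, although a short code such as `'1'` remains easy to guess.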


## Model Testing

#### **Environmental Data**

In [None]:
import pandas as pd
import numpy as np

# Function to load and process feather files for a specific month
def load_and_process_feather(file_path, system, variable_name, month):
    # Load data
    df = pd.read_feather(file_path)
    # Convert date column to datetime
    df['date'] = pd.to_datetime(df['date'])
    # Filter data for the specified month
    df_filtered = df[df['date'].dt.month == month]
    # Calculate median values for each location, handling NaN values
    df_median = df_filtered.groupby(['y', 'x'])['value'].median().reset_index()
    # Rename columns
    df_median.columns = ['lat', 'lon', f'{variable_name}_{system}']
    df_median['month'] = month
    return df_median

# Function to load and process precipitation data for a specific month
def load_and_process_precipitation(file_path, system, month):
    df = pd.read_csv(file_path)
    df['date'] = pd.to_datetime(df['YYYYMMDD'], format='%Y%m%d')
    df_filtered = df[df['date'].dt.month == month]
    # Average the daily median precipitation over the selected month
    df_median = df_filtered.groupby('YYYYMMDD')['RD'].median().mean()
    # Create a DataFrame with system-wide median precipitation
    df_median_df = pd.DataFrame({
        'lat': [np.nan],  # Placeholder, as we do not have specific lat/lon
        'lon': [np.nan],  # Placeholder, as we do not have specific lat/lon
        f'median_precipitation_{system}': [df_median],
        'month': [month]
    })
    return df_median_df

# Load and process data for WAD and OS systems for April (lag) and May (current)
months = [4, 5]
combined_data = []

for month in months:
    chlorophyll_wad = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_Wadden_Sea_chl_prepped.feather', 'WAD', 'chlorophyll', month)
    chlorophyll_os = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_Eastern_Scheldt_chl_prepped.feather', 'OS', 'chlorophyll', month)
    sst_wad = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_WAD_sst_prepped.feather', 'WAD', 'sst', month)
    sst_os = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_OS_sst_prepped.feather', 'OS', 'sst', month)
    turbidity_wad = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_Wadden_Sea_tsm_prepped.feather', 'WAD', 'turbidity', month)
    turbidity_os = load_and_process_feather('../Mussels Notebook/Mussels Data/Parsed TIF Data/tifs_Eastern_Scheldt_tsm_prepped.feather', 'OS', 'turbidity', month)
    precipitation_wad = load_and_process_precipitation('../Mussels Notebook/Mussels Data/weatherdata-WAD.csv', 'WAD', month)
    precipitation_os = load_and_process_precipitation('../Mussels Notebook/Mussels Data/weatherdata-OS.csv', 'OS', month)

    combined_data.append(chlorophyll_wad)
    combined_data.append(chlorophyll_os)
    combined_data.append(sst_wad)
    combined_data.append(sst_os)
    combined_data.append(turbidity_wad)
    combined_data.append(turbidity_os)
    combined_data.append(precipitation_wad)
    combined_data.append(precipitation_os)

# Combine all data into a single DataFrame
combined_df = pd.concat(combined_data, ignore_index=True)

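# Precipitation rows have NaN in both 'lat' and 'lon', so the two fillna calls below leave them
# unchanged; groupby drops NaN keys by default, so those rows are not carried into the result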
combined_df['lat'] = combined_df['lat'].fillna(combined_df['lon'])
combined_df['lon'] = combined_df['lon'].fillna(combined_df['lat'])
combined_df = combined_df.groupby(['lat', 'lon', 'month']).first().reset_index()

# Save the combined dataset to CSV in the specified folder
combined_df.to_csv('./Data/combined_data.csv', index=False)

print("Combined dataset saved to CSV in the './Data/' folder.")
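
The `df.head()` output further down shows the precipitation columns as empty. A small sketch of why that can happen and one possible alternative, using made-up toy values (the numbers and the `grid`/`precip` names are illustrative assumptions, not project data): rows whose group keys are NaN are dropped by `groupby`, whereas a merge on `month` attaches the system-wide value to every grid point.

```python
import numpy as np
import pandas as pd

# Toy grid rows (with coordinates) and system-wide precipitation rows (without coordinates)
grid = pd.DataFrame({
    "lat": [51.41, 51.41],
    "lon": [3.66, 3.66],
    "month": [4, 5],
    "sst_OS": [9.89, 13.24],
})
precip = pd.DataFrame({
    "lat": [np.nan, np.nan],
    "lon": [np.nan, np.nan],
    "month": [4, 5],
    "median_precipitation_OS": [1.2, 0.8],  # hypothetical values
})

# Stacking and grouping on lat/lon/month: NaN keys are dropped by default
stacked = pd.concat([grid, precip], ignore_index=True)
grouped = stacked.groupby(["lat", "lon", "month"]).first().reset_index()
print(grouped)

# Alternative: broadcast the monthly precipitation onto every grid row
merged = grid.merge(precip.drop(columns=["lat", "lon"]), on="month", how="left")
print(merged)
```

Whether broadcasting one precipitation value per system and month is appropriate depends on how the weather data should map onto the grid; the merge only illustrates the mechanics.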

In [2]:
import pandas as pd

df = pd.read_csv('./Data/combined_data.csv')

In [3]:
df.head()

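| | lat | lon | month | chlorophyll_WAD | chlorophyll_OS | sst_WAD | sst_OS | turbidity_WAD | turbidity_OS | median_precipitation_WAD | median_precipitation_OS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 51.409999 | 3.659997 | 4 | NaN | NaN | NaN | 9.887507 | NaN | NaN | NaN | NaN |
| 1 | 51.409999 | 3.659997 | 5 | NaN | NaN | NaN | 13.236994 | NaN | NaN | NaN | NaN |
| 2 | 51.409999 | 3.669997 | 4 | NaN | NaN | NaN | 9.889994 | NaN | NaN | NaN | NaN |
| 3 | 51.409999 | 3.669997 | 5 | NaN | NaN | NaN | 13.241999 | NaN | NaN | NaN | NaN |
| 4 | 51.409999 | 3.679997 | 4 | NaN | NaN | NaN | 9.891504 | NaN | NaN | NaN | NaN |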


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4689409 entries, 0 to 4689408
Data columns (total 11 columns):
 #   Column                    Dtype  
---  ------                    -----  
 0   lat                       float64
 1   lon                       float64
 2   month                     int64  
 3   chlorophyll_WAD           float64
 4   chlorophyll_OS            float64
 5   sst_WAD                   float64
 6   sst_OS                    float64
 7   turbidity_WAD             float64
 8   turbidity_OS              float64
 9   median_precipitation_WAD  float64
 10  median_precipitation_OS   float64
dtypes: float64(10), int64(1)
memory usage: 393.6 MB


In [6]:
df['month'].value_counts()

4    2432617
5    2256792
Name: month, dtype: int64

In [21]:
# Split the DataFrame for month 4 and month 5
df_month_4 = df[df['month'] == 4]
df_month_5 = df[df['month'] == 5]

# Keep a 2% random sample of each month to reduce the DataFrame size
df_month_4_reduced = df_month_4.sample(frac=0.02, random_state=42)
df_month_5_reduced = df_month_5.sample(frac=0.02, random_state=42)

# Concatenate the reduced DataFrames back together
df_reduced = pd.concat([df_month_4_reduced, df_month_5_reduced], ignore_index=True)

print("Original DataFrame size:", df.shape)
print("Reduced DataFrame size:", df_reduced.shape)

Original DataFrame size: (4689409, 11)
Reduced DataFrame size: (93788, 11)


In [22]:
df_reduced.to_csv('./Data/combined_data.csv', index=False)
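
The per-month split, sample, and concatenate steps above can also be written as a single grouped sample. This is a sketch on toy data, assuming pandas 1.1 or later (where `DataFrameGroupBy.sample` is available); it is not guaranteed to pick the same rows as the cell above, only the same fraction per month.

```python
import pandas as pd

# Toy stand-in for the combined data: 100 rows for April, 50 for May
toy = pd.DataFrame({
    "month": [4] * 100 + [5] * 50,
    "value": range(150),
})

# Sample the same fraction within each month in one step
reduced = (
    toy.groupby("month")
       .sample(frac=0.1, random_state=42)
       .reset_index(drop=True)
)

counts = reduced["month"].value_counts().to_dict()
print(counts)
```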

### **Model**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import KFold
from sklearn.metrics import mean_absolute_error
import joblib
from sklearn.pipeline import Pipeline

In [None]:
def evaluate_ridge(df, features, target, alpha=1.0, n_splits=5):
    X = df[features]
    y = df[target]

    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    test_mae_sum = 0

    for train_idx, test_idx in kf.split(X):
        # Fit the scaler on the training fold only, so test-fold statistics do not leak in
        scaler = StandardScaler()
        X_train = scaler.fit_transform(X.iloc[train_idx])
        X_test = scaler.transform(X.iloc[test_idx])
        y_train, y_test = y.iloc[train_idx], y.iloc[test_idx]

        model = Ridge(alpha=alpha)
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        test_mae_sum += mean_absolute_error(y_test, y_pred)

    avg_test_mae = test_mae_sum / n_splits
    return avg_test_mae
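
One way to keep scaling inside each cross-validation fold is to wrap the scaler and the model in a scikit-learn pipeline and let `cross_val_score` handle the splits. The sketch below uses synthetic data, and `evaluate_ridge_pipeline` is a hypothetical name; it illustrates the idea and is not a drop-in replacement verified against the project data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def evaluate_ridge_pipeline(df, features, target, alpha=1.0, n_splits=5):
    """Hypothetical helper: average test MAE with the scaler refit on every training fold."""
    pipeline = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    # scikit-learn reports MAE as a negative score, so flip the sign
    scores = cross_val_score(
        pipeline, df[features], df[target],
        scoring="neg_mean_absolute_error", cv=kf,
    )
    return -scores.mean()


# Synthetic example data (not project data)
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
toy["y"] = 2 * toy["a"] - toy["b"] + rng.normal(scale=0.1, size=50)

mae = evaluate_ridge_pipeline(toy, ["a", "b"], "y")
print(f"Average test MAE: {mae:.4f}")
```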

In [None]:
def feature_selection(df, target):
    features = [col for col in df.columns if col != target and col not in ['lat', 'lon']]
    selected_features = []
    mae_results = []

    while features:
        mae_list = []
        for feature in features:
            current_features = selected_features + [feature]
            avg_test_mae = evaluate_ridge(df, current_features, target)
            mae_list.append((feature, avg_test_mae))
        
        best_feature, best_mae = min(mae_list, key=lambda x: x[1])
        selected_features.append(best_feature)
        features.remove(best_feature)
        mae_results.append((selected_features[:], best_mae))
        print(f'Selected Features: {selected_features}, MAE: {best_mae}')

    best_combination = min(mae_results, key=lambda x: x[1])
    best_features, best_mae = best_combination
    print(f'Final selected features: {best_features}')
    print(f'Final MAE with the selected features: {best_mae}')

    # Plotting the results
    fig, ax = plt.subplots(figsize=(10, 6))
    mae_values = [mae for _, mae in mae_results]
    best_combination_idx = mae_values.index(min(mae_values))

    ax.plot(range(1, len(mae_values) + 1), mae_values, marker='o', label='MAE')
    ax.axvline(x=best_combination_idx + 1, color='green', linestyle='--', label=f'Best combination: {best_combination_idx + 1} features')

    plt.xticks(range(1, len(mae_values) + 1))
    plt.xlabel('Number of Features')
    plt.ylabel('Average Test MAE')
    plt.title('Feature Selection based on Ridge Regression MAE')
    plt.legend()
    plt.tight_layout()
    plt.grid(True)
    plt.show()

    return best_features, best_mae, mae_results

In [None]:
# Function for alpha testing
def alpha_testing(df, features, target):
    scaler = StandardScaler()
    kf = KFold(n_splits=5, shuffle=True, random_state=42)
    alphas = np.logspace(-4, 4, num=20)
    train_mae = []
    test_mae = []

    for alpha in alphas:
        model = Ridge(alpha=alpha)
        test_mae_sum = 0

        for train_idx, test_idx in kf.split(df):
            X_train, X_test = df.iloc[train_idx][features], df.iloc[test_idx][features]
            y_train, y_test = df.iloc[train_idx][target], df.iloc[test_idx][target]

            X_train_scaled = scaler.fit_transform(X_train)
            X_test_scaled = scaler.transform(X_test)

            model.fit(X_train_scaled, y_train)
            y_pred = model.predict(X_test_scaled)

            test_mae_sum += mean_absolute_error(y_test, y_pred)

        avg_test_mae = test_mae_sum / kf.n_splits
        test_mae.append(avg_test_mae)

        model.fit(scaler.fit_transform(df[features]), df[target])
        train_mae.append(mean_absolute_error(df[target], model.predict(scaler.transform(df[features]))))

    # Plotting the results
    fig, ax = plt.subplots(figsize=(10, 5))  # Adjusted figure size
    ax.plot(alphas, train_mae, color='blue', label='Train MAE', marker='o')
    ax.plot(alphas, test_mae, color='red', label='Test MAE', marker='o')
    ax.set_xscale('log')
    ax.set_xlabel('Alpha of the Ridge model')
    ax.set_ylabel('Mean Absolute Error (MAE)')
    ax.set_title('MAE vs. Alpha for Ridge Model')
    ax.legend(loc='best')
    ax.grid(True, which='both', linestyle='--', linewidth=0.5)

    # Highlight the best alpha
    min_mae_alpha = alphas[np.argmin(test_mae)]
    ax.axvline(x=min_mae_alpha, color='green', linestyle='--', label=f'Best alpha: {min_mae_alpha:.4f}')
    ax.text(min_mae_alpha, min(test_mae) + 0.0005, f'Best alpha: {min_mae_alpha:.4f}',
            verticalalignment='bottom', horizontalalignment='right', color='green', fontsize=10)

    plt.subplots_adjust(bottom=0.3, top=0.8)  # Adjust margins
    plt.show()

    return min_mae_alpha, min(test_mae)

In [None]:
# Load the main data
df_main = pd.read_csv('./Data/It3 - Mussel + SatML + Weather (Lag Features).csv')

# Drop the unused 'Plot Location' column
df_main.drop(columns=['Plot Location'], inplace=True)

In [None]:
df_main.head(10)

In [None]:
df_main.columns

In [None]:
# Feature selection
target = 'Growth (g per day)'
best_features, best_mae, mae_results = feature_selection(df_main, target)

In [None]:
print(f'Best feature combination: {best_features}')
print(f'Lowest MAE: {best_mae}')

In [None]:
# Alpha testing with the best features
best_alpha, min_mae = alpha_testing(df_main, best_features, target)

In [None]:
print(f'Best alpha: {best_alpha}')
print(f'Minimum MAE: {min_mae}')

In [None]:
# Read the data
df_main = pd.read_csv('./Data/It3 - Mussel + SatML + Weather (Lag Features).csv')

# Best features and parameters based on the previous results
best_features = ['Chlorophyll', 'Water Temperature (C)', 'Water Temperature (C)_lag', 'Ash Free Dry Weight (g)_lag',
                 'Number of Days', 'Precipitation_lag', 'Turbidity (FTU)', 'Turbidity (FTU)_lag', 'Monitoring Period',
                 'Individual Weight (g)_lag', 'Chlorophyll_lag']
best_alpha = 1.623776739188721
target = 'Growth (g per day)'

def evaluate_ridge(df, features, target, alpha, n_splits=5):
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=42)
    scaler = StandardScaler()
    model = Ridge(alpha=alpha)
    test_mae_sum = 0

    for train_idx, test_idx in kf.split(df):
        X_train, X_test = df.iloc[train_idx][features], df.iloc[test_idx][features]
        y_train, y_test = df.iloc[train_idx][target], df.iloc[test_idx][target]

        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)

        model.fit(X_train_scaled, y_train)
        y_pred = model.predict(X_test_scaled)

        test_mae_sum += mean_absolute_error(y_test, y_pred)

    avg_test_mae = test_mae_sum / kf.n_splits
    return avg_test_mae

# Train and evaluate the model
avg_mae = evaluate_ridge(df_main, best_features, target, best_alpha)
print(f'Average Test MAE: {avg_mae}')

# Build a pipeline with the scaler and the Ridge model
final_model_pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('ridge', Ridge(alpha=best_alpha))
])

# Train the model on the full dataset
final_model_pipeline.fit(df_main[best_features], df_main[target])

# Save the model to a file
model_file = './Data/final_ridge_model.pkl'
joblib.dump(final_model_pipeline, model_file)

print(f'Model saved to {model_file}')
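
The hard-coded `best_alpha` in the cell above is one of the grid points that `alpha_testing` searches with `np.logspace(-4, 4, num=20)`. The short check below shows where it sits on that grid and how coarse the neighbouring steps are; it is an illustration only and does not rerun the search.

```python
import numpy as np

# Same grid as in alpha_testing
alphas = np.logspace(-4, 4, num=20)
best_alpha = 1.623776739188721

# Locate the grid point closest to the stored best alpha
idx = int(np.argmin(np.abs(alphas - best_alpha)))
step_ratio = alphas[idx + 1] / alphas[idx]

print(f"Grid index: {idx}, alpha: {alphas[idx]}")
print(f"Neighbouring alphas: {alphas[idx - 1]:.4f} and {alphas[idx + 1]:.4f}")
print(f"Each step multiplies alpha by about {step_ratio:.2f}")
```

Because adjacent grid values differ by a factor of roughly 2.64, a finer grid around this point could be searched if a more precise alpha matters.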

In [None]:
# Load the trained model from the file
loaded_model = joblib.load('./Data/final_ridge_model.pkl')

In [None]:
# Load combined data
df_combined = pd.read_csv('./Data/combined_data.csv')

In [None]:
df_combined.head()

In [23]:
from math import radians, sin, cos, sqrt, atan2

def haversine(lat1, lon1, lat2, lon2):
    R = 6371.0  # Radius of the Earth in kilometers

    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2)**2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    distance = R * c

    return distance

In [None]:
def get_nearest_environmental_data(lat, lon, df, month):
    # Filter by month
    df_month = df[df['month'] == month].copy()
    # Calculate distances to all points in the dataset
    df_month['distance'] = df_month.apply(lambda row: haversine(lat, lon, row['lat'], row['lon']), axis=1)
    nearest_row = df_month.loc[df_month['distance'].idxmin()]
    return nearest_row
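
Applying `haversine` row by row can be slow on tens of thousands of grid points. A possible vectorized variant using NumPy is sketched below on toy data; `haversine_np` and `nearest_row` are hypothetical names, and the arcsine form used here is mathematically equivalent to the `atan2` form above.

```python
import numpy as np
import pandas as pd


def haversine_np(lat, lon, lats, lons, radius_km=6371.0):
    """Great-circle distance in km from one point to arrays of points."""
    lat, lon = np.radians(lat), np.radians(lon)
    lats, lons = np.radians(lats), np.radians(lons)
    a = (np.sin((lats - lat) / 2) ** 2
         + np.cos(lat) * np.cos(lats) * np.sin((lons - lon) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def nearest_row(lat, lon, df, month):
    """Return the row of the given month closest to (lat, lon); assumes the month has rows."""
    df_month = df[df["month"] == month]
    distances = haversine_np(lat, lon,
                             df_month["lat"].to_numpy(),
                             df_month["lon"].to_numpy())
    return df_month.iloc[int(np.argmin(distances))]


# Toy grid (illustrative values only)
toy = pd.DataFrame({
    "lat": [51.41, 53.00, 52.00],
    "lon": [3.66, 5.00, 4.10],
    "month": [5, 5, 4],
    "sst_OS": [13.2, 12.1, 9.9],
})

nearest = nearest_row(52.0, 4.0, toy, month=5)
print(nearest)
```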

In [None]:
def get_average_ash_free_dry_weight(period):
    # Calculate the average Ash Free Dry Weight (g) for the specified monitoring period
    return df_main[df_main['Monitoring Period'] == period]['Ash Free Dry Weight (g)_lag'].mean()
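
Since the average is requested once per monitoring period, it could also be computed for all periods in one pass and then looked up. A sketch on toy data follows; `average_afdw` and the values are illustrative assumptions, and missing periods return NaN, as the filter-and-mean version above does.

```python
import pandas as pd

# Toy stand-in for df_main (illustrative values only)
toy_main = pd.DataFrame({
    "Monitoring Period": [0, 0, 1, 1, 2],
    "Ash Free Dry Weight (g)_lag": [0.10, 0.20, 0.30, 0.50, 0.40],
})

# One mean per monitoring period, computed once
afdw_by_period = (
    toy_main.groupby("Monitoring Period")["Ash Free Dry Weight (g)_lag"].mean()
)


def average_afdw(period):
    """Hypothetical helper: look up the precomputed mean, NaN if the period is absent."""
    return afdw_by_period.get(period, float("nan"))


print(average_afdw(0))
print(average_afdw(6))
```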

In [None]:
def predict_growth(lat, lon, individual_weight, depth, avg_flow_speed, max_flow_speed):
    """
    Predict mussel growth based on user inputs and additional environmental features.

    Parameters:
    - lat: Latitude of the location.
    - lon: Longitude of the location.
    - individual_weight: Initial weight of the mussel.
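    - depth: Depth of the location (accepted but not used by the current model).
    - avg_flow_speed: Average flow speed of the water (accepted but not used by the current model).
    - max_flow_speed: Maximum flow speed of the water (accepted but not used by the current model).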

    Returns:
    - predictions: DataFrame with columns 'Month' and 'Growth (g per day)'.
    """
    # Get nearest environmental data based on lat and lon for April (lag) and May (current)
    env_data_april = get_nearest_environmental_data(lat, lon, df_combined, 4)
    env_data_may = get_nearest_environmental_data(lat, lon, df_combined, 5)
    
    def env_value(row, name):
        # combined_data.csv stores each variable per system (e.g. 'sst_WAD', 'sst_OS'),
        # so take the first non-missing system value and fall back to 0
        for system in ('WAD', 'OS'):
            value = row.get(f'{name}_{system}')
            if pd.notna(value):
                return value
        return 0

    # Create input data frame for prediction
    input_data = {
        'Chlorophyll': [env_value(env_data_may, 'chlorophyll')],
        'Water Temperature (C)': [env_value(env_data_may, 'sst')],
        'Water Temperature (C)_lag': [env_value(env_data_april, 'sst')],
        'Ash Free Dry Weight (g)_lag': [get_average_ash_free_dry_weight(0)],
        'Number of Days': [30],
        'Precipitation_lag': [env_value(env_data_april, 'median_precipitation')],
        'Turbidity (FTU)': [env_value(env_data_may, 'turbidity')],
        'Turbidity (FTU)_lag': [env_value(env_data_april, 'turbidity')],
        'Monitoring Period': [0],
        'Individual Weight (g)_lag': [individual_weight],
        'Chlorophyll_lag': [env_value(env_data_april, 'chlorophyll')]
    }

    input_df = pd.DataFrame(input_data)
    predictions = []

    # Perform predictions for each monitoring period
    for period in range(1, 7):  # May to October
        prediction = loaded_model.predict(input_df[best_features])[0]
        predictions.append(prediction)
        
        # Update input data for the next period
        input_df['Monitoring Period'] = period
        input_df['Growth (g per day)'] = prediction
        input_df['Ash Free Dry Weight (g)_lag'] = get_average_ash_free_dry_weight(period)  # Use average for the next period

    # Create DataFrame for predictions with 'Month' and 'Growth (g per day)'
    months = ['May', 'June', 'July', 'August', 'September', 'October']
    prediction_df = pd.DataFrame({
        'Month': months,
        'Growth (g per day)': predictions
    })

    # Plotting the predictions
    plt.figure(figsize=(10, 6))
    plt.plot(prediction_df['Month'], prediction_df['Growth (g per day)'], marker='o', linestyle='-', color='b')
    plt.xlabel('Month')
    plt.ylabel('Growth (g per day)')
    plt.title('Predicted Mussel Growth from May to October')
    plt.grid(True)
    plt.show()

    return prediction_df

In [None]:
# Example usage
lat = 52.0
lon = 4.0
individual_weight = 2.0  # e.g. 2 grams
depth = 5.0
avg_flow_speed = 0.2
max_flow_speed = 0.5

predictions = predict_growth(lat, lon, individual_weight, depth, avg_flow_speed, max_flow_speed)
predictions