# IND320 Course Project

## Introduction

This directory contains the first part of the course project in "IND320 - Data to Decision" at the Norwegian University of Life Sciences. The project is divided into four compulsory assignments:
- Dashboard Basics (current)
- Databases and APIs 
- Pre-processing and anomaly detection 
- Machine Learning and visualisation

### AI Usage

AI plays a multifaceted role throughout this project, primarily serving as an assistant and analytical tool. The project leverages AI in several areas:

**Development and Code Generation:** 
- AI assists in writing and optimizing code for the application. 

**Data Analysis and Insights:** 
- AI helps analyze data patterns and identify trends. It assists in generating meaningful statistical summaries and suggesting appropriate visualization techniques for the given data.

**Documentation and Communication:** 
- AI supports the creation of clear documentation, such as code comments and user interface text. It helps structure the project documentation and ensures technical concepts are communicated effectively.

**Problem-Solving and Debugging:** 
- Throughout the development process, AI serves as a coding companion, helping troubleshoot issues, optimize data processing workflows, and suggest best practices.

## Notes

To run and view this project, I suggest either opening the deployed version of the application linked under "Code access and direct links", or downloading the project and running it from your terminal with "streamlit run streamlit_app.py". I've tried to make this Notebook readable and usable, but find it a bit "impractical" at times.

## Project structure & code access

### File structure

As of the first assignment (Project Work, Part 1), the project's file structure is as follows:
- /assets
    - contains datasets
- /pages
    - contains all sub pages for the application
- Notebook.ipynb
    - Holds the documentation of the project
- streamlit_app.py
    - Serves as the entrypoint for running the application. Also represents the "Homepage".
- README.md
    - Holds documentation and other details. Mainly used for the GitHub repo.
- requirements.txt
    - Contains all necessary packages/modules installed to run the project correctly

### Code access and direct links

- The project is deployed here: [ind320-henrikengdal-project](https://ind320-henrikengdal-project.streamlit.app/)
- The code is accessible at the repository: [henrikengdal/ind320-henrikengdal-project](https://github.com/HenrikEngd/IND320-HenrikEngdal-Project.git)

## Project Work, Part 1

#### Imports & Run

In [1]:
import streamlit as st
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from datetime import datetime

# Only needed for displaying the application within the notebook
from IPython.display import IFrame

In [2]:
path_prefix = "/Users/henrikengdal/Documents/GitHub/IND320-HenrikEngdal-Project/"
file_name = "streamlit_app.py"
!streamlit run {path_prefix}{file_name} --server.port 8501 --server.headless true # Flags used for displaying the Streamlit app in an IFrame within Jupyter Notebook

2025-10-02 12:40:49.423 Port 8501 is already in use


#### Homepage
As I understand it, this page has no specific requirements. I therefore chose to use it as a "Welcome to this project" page that explains what the application contains and how to use it. The file streamlit_app.py also includes the code for loading, preparing and caching the data so that it is available to the other pages. Because of this, a user has to visit the homepage before any data can be displayed, but since it's the landing page I think that's fine. I've also added some error handling that politely asks the user to navigate back if the data is not loaded yet.

See the file streamlit_app.py for code details.

In [3]:
# Home
IFrame("http://localhost:8501", width="100%", height=800)

Related code:

In [None]:
# Load and cache data globally
@st.cache_data
def load_weather_data():
    # Load, prepare and cache the data
    try:   
        df = pd.read_csv("assets/open-meteo-subset.csv")
        # Convert time column to datetime
        df['time'] = pd.to_datetime(df['time'])
        return df
    except FileNotFoundError:
        st.error("Could not find the data file")
        return None
    except Exception as e:
        st.error(f"Error loading data: {str(e)}")
        return None

# Load data once and store in session state
if 'weather_data' not in st.session_state:
    st.session_state.weather_data = load_weather_data()

st.title("IND320 Course Project")

# Show data load status
if st.session_state.weather_data is None:
    st.error("Failed to load the data")


st.markdown("""
### Project Overview
This project is part of the course "IND320 - Data to Decision" at the Norwegian University of Life Sciences.
The application demonstrates data loading, processing, and visualization using Streamlit and Plotly.
The dataset used is a subset of weather data from [Open-Meteo](https://open-meteo.com/).

### Content
- Home Page: Overview of the project and data load status.
- Second Page: Table with weather parameters for January 2020, incl. mean, min, max, std.
- Third Page: Dynamic visualization with parameter selection and month range filter.
""")


#### second_page (Static data visualization)

As I understand it, this page should include a table with a "row-wise display" of the first month of the data series (January), with one row for each column of the imported data. That gives 5 rows in total:
- temperature_2m
- precipitation
- wind_speed_10m
- wind_gust_10m
- wind_direction_10m

I've also included the Mean Value, Minimum, Maximum and Standard Deviation of each parameter to give a quick overview of the data.

See the file "pages/second_page.py" for code details
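
The row-per-parameter summary described above can be tried out without Streamlit. The following is a minimal sketch on synthetic data, not the project's own code: the two example columns and their values are made up, and `DataFrame.agg` is used here as a compact alternative to computing each statistic in a loop.

```python
import pandas as pd

# Synthetic hourly data standing in for the real CSV (values are invented):
# 1440 hours cover January and February 2020.
times = pd.date_range("2020-01-01", periods=1440, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({
    "time": times,
    "temperature_2m": [float(i % 24) for i in range(len(times))],
    "wind_speed_10m": [2.0] * len(times),
})

# Keep only the rows from the first month (January 2020).
january = df[df["time"].dt.strftime("%Y-%m") == "2020-01"]

# agg() returns one row per statistic; transposing gives one row per parameter.
summary = january.select_dtypes(include="number").agg(["mean", "min", "max", "std"]).T
print(summary.round(2))
```

Transposing the result is what produces the "one row per imported column" layout. The time column is excluded by `select_dtypes` because it is not numeric.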

In [None]:
# Second page (make sure you click back to the first page and then navigate to the second page to load it properly)
IFrame("http://localhost:8501/second_page", width="100%", height=600)

Related code:

In [None]:
# Page configuration
st.set_page_config(page_title="Weather Data Table", layout="wide")

st.title("Weather Data Overview")
st.markdown("---")

# Get data from session state (loaded in homepage)
df = st.session_state.get('weather_data', None)

if df is None:
    st.error("No weather data available. Please visit the homepage first to load the data.")
    st.stop()

if df is not None:
    # Filter data for the first month
    first_month_df = df[df['time'].dt.strftime('%Y-%m') == '2020-01']
    
    # Get numeric columns only (exclude non-numeric like 'time' or any string columns)
    numeric_columns = list(df.select_dtypes(include='number').columns)
    
    # Create table data for display
    table_data = []
    
    for column in numeric_columns:
        # Get first month data for this column
        first_month_values = first_month_df[column].values
        # Coerce to numeric once, then calculate statistics (numeric-safe)
        first_month_series = pd.to_numeric(first_month_df[column], errors='coerce')
        mean_val = first_month_series.mean()
        min_val = first_month_series.min()
        max_val = first_month_series.max()
        std_val = first_month_series.std()
        
        # Create row data; the statistics stay numeric (rounded to two decimals)
        # so they match the NumberColumn type configured for the table
        row_data = {
            'Parameter': column,
            'Mean': round(mean_val, 2),
            'Min': round(min_val, 2),
            'Max': round(max_val, 2),
            'Std Dev': round(std_val, 2),
            'First Month Trend': first_month_values
        }
        table_data.append(row_data)
    
    # Convert to DataFrame
    table_df = pd.DataFrame(table_data)
    
    st.subheader("Weather Parameters - First Month Analysis")
    st.markdown("Each row shows statistics and trends for one weather parameter during January 2020")
    
    # Display the table
    st.dataframe(
        table_df,
        column_config={
            "Parameter": st.column_config.TextColumn(
                "Weather Parameter",
                help="The weather measurement parameter",
                width="medium"
            ),
            "Mean": st.column_config.NumberColumn(
                "Mean Value",
                help="Average value for January 2020"
            ),
            "Min": st.column_config.NumberColumn(
                "Minimum",
                help="Lowest recorded value in January 2020"
            ),
            "Max": st.column_config.NumberColumn(
                "Maximum", 
                help="Highest recorded value in January 2020"
            ),
            "Std Dev": st.column_config.NumberColumn(
                "Standard Deviation",
                help="Standard deviation showing data variability"
            ),
            # using LineChartColumn for trend visualization
            "First Month Trend": st.column_config.LineChartColumn(
                "Trend",
                help="Hourly trend visualization for the entire first month",
                width="large"
            )
        },
        width='stretch',
        hide_index=True
    )

else:
    st.error("Unable to load data. Please check if the data file exists and is properly formatted.")



#### third_page (Dynamic visualization)

As I understand it, this page should include a plot of the imported data with axis titles and other relevant formatting. It should also offer the option to display either a single column or all columns together, and a slider for selecting a subset of months (defaulting to the first month).
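
The month-range selection can be illustrated independently of the slider widget. The sketch below uses only the standard library and assumes the months are stored as 'YYYY-MM' keys. The helper names `month_label` and `expand_range` are hypothetical and do not appear in the project code.

```python
from datetime import datetime


def month_label(month_key: str) -> str:
    """Turn a 'YYYY-MM' key into a short label such as 'Jan 2020'.

    The abbreviation comes from %b, so it follows the active locale
    (English in Python's default C locale).
    """
    return datetime.strptime(month_key, "%Y-%m").strftime("%b %Y")


def expand_range(labels: list[str], start: str, end: str) -> list[str]:
    """Return every label from start to end inclusive, in slider order."""
    start_idx, end_idx = labels.index(start), labels.index(end)
    return labels[start_idx:end_idx + 1]


month_keys = ["2020-01", "2020-02", "2020-03", "2020-04"]
labels = [month_label(key) for key in month_keys]
label_to_key = dict(zip(labels, month_keys))

# A range slider hands back a (start, end) pair of labels; expand it and
# map the labels back to the keys used for filtering the data.
selected_labels = expand_range(labels, "Feb 2020", "Mar 2020")
selected_keys = [label_to_key[label] for label in selected_labels]
print(selected_keys)
```

Keeping the display labels separate from the 'YYYY-MM' keys means the filter can keep working on sortable strings while the user only sees the readable labels.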

In [None]:
# Third page (Make sure you click back to the first page and then navigate to the third page to load it properly)
IFrame("http://localhost:8501/third_page", width="100%", height=1200)

Related code:

In [None]:
# Page configuration
st.set_page_config(page_title="Weather Data Analysis", layout="wide")

st.title("Weather Data Analysis")
st.markdown("---")

# Get data from session state (loaded in homepage)
df = st.session_state.get('weather_data', None)

if df is None:
    st.error("No weather data available. Please visit the homepage first to load the data.")
    st.stop()

if df is not None:
    # Work on a local copy to avoid mutating cached data in session state
    df_local = df.copy()
    # Extract month for filtering
    df_local['month'] = df_local['time'].dt.strftime('%Y-%m')
    # Get unique months for the slider
    months = sorted(df_local['month'].unique())
    
    # Create month abbreviations mapping
    month_mapping = {}
    month_display_names = []
    
    for month_str in months:
        # Convert '2020-01' to 'Jan 2020'
        year, month_num = month_str.split('-')
        month_abbrev = datetime.strptime(month_num, '%m').strftime('%b')
        display_name = f"{month_abbrev} {year}"
        month_mapping[display_name] = month_str
        month_display_names.append(display_name)
    
    # Create two columns for controls
    col1, col2 = st.columns([1, 1])
    
    with col1:
        st.subheader("Column Selection")
        # Get numeric columns (excluding time)
        numeric_columns = [col for col in df_local.select_dtypes(include='number').columns if col not in ['time']]
        column_options = ['All Columns'] + numeric_columns
        
        selected_column = st.selectbox(
            "Choose a column to visualize:",
            options=column_options,
            index=0,
            help="Select a specific column or 'All Columns' to show all data together"
        )
    
    with col2:
        st.subheader("Month Selection")
        # Month selection slider with abbreviated names
        # Default to the first month only (January): both handles start on the first month if a range is available
        default_month_value = (
            (month_display_names[0], month_display_names[0])
            if len(month_display_names) >= 2
            else month_display_names[0]
        )
        selected_month_display = st.select_slider(
            "Select month range:",
            options=month_display_names,
            value=default_month_value,
            help="Use the slider to select a contiguous range of months"
        )
    
    # Convert display selection back to actual month values for filtering
    if isinstance(selected_month_display, str):
        selected_months = [month_mapping[selected_month_display]]
    elif isinstance(selected_month_display, tuple):
        # Get all months between the selected range
        start_display = selected_month_display[0]
        end_display = selected_month_display[1]
        start_idx = month_display_names.index(start_display)
        end_idx = month_display_names.index(end_display)
        selected_display_range = month_display_names[start_idx:end_idx + 1]
        selected_months = [month_mapping[display] for display in selected_display_range]
    else:
        selected_months = [month_mapping[selected_month_display]]
    
    # Filter data based on selected months
    filtered_df = df_local[df_local['month'].isin(selected_months)]
    
    st.markdown("---")
    
    # Create the plot
    if len(filtered_df) > 0:
        # Get display names for selected months
        selected_display_names = [display for display, month in month_mapping.items() if month in selected_months]
        
        st.subheader("Weather Data Visualization")
        
        if selected_column == 'All Columns':
            # Create subplot for all columns
            fig = go.Figure()
            
            colors = ['#FF6B6B', "#7AFFA4", "#D8ABFF", "#FF71FA", '#FFEAA7']
            
            for i, col in enumerate(numeric_columns):
                fig.add_trace(go.Scatter(
                    x=filtered_df['time'],
                    y=filtered_df[col],
                    mode='lines',
                    name=col,
                    line=dict(color=colors[i % len(colors)], width=2),
                    hovertemplate=f'<b>{col}</b><br>Date: %{{x}}<br>Value: %{{y}}<extra></extra>'
                ))
            
            fig.update_layout(
                title={
                    'text': f"All Weather Parameters Over Time ({', '.join(selected_display_names)})",
                    'x': 0.5,
                    'font': {'size': 20}
                },
                xaxis_title="Date and Time",
                yaxis_title="Values (Various Units)",
                hovermode='x unified',
                height=600,
                showlegend=True,
                legend=dict(
                    yanchor="top",
                    y=0.99,
                    xanchor="left",
                    x=1.01
                ),
                template='plotly_white',
                margin=dict(r=150)
            )
            
            # Update x-axis formatting
            fig.update_xaxes(
                showgrid=True,
                gridwidth=1,
                gridcolor='lightgray',
                tickformat='%Y-%m-%d %H:%M'
            )
            
            fig.update_yaxes(
                showgrid=True,
                gridwidth=1,
                gridcolor='lightgray'
            )
            
        else:
            # Create plot for single column
            fig = px.line(
                filtered_df,
                x='time',
                y=selected_column,
                title=f"{selected_column} Over Time ({', '.join(selected_display_names)})",
                labels={
                    'time': 'Date and Time',
                    selected_column: selected_column
                },
                height=600
            )
            
            fig.update_traces(
                line=dict(color='#3498DB', width=2),
                hovertemplate=f'<b>{selected_column}</b><br>Date: %{{x}}<br>Value: %{{y}}<extra></extra>'
            )
            
            fig.update_layout(
                title={
                    'text': fig.layout.title.text,
                    'x': 0.5,
                    'font': {'size': 20}
                },
                template='plotly_white',
                hovermode='x'
            )
            
            # Update axes
            fig.update_xaxes(
                title_text="Date and Time",
                showgrid=True,
                gridwidth=1,
                gridcolor='lightgray',
                tickformat='%Y-%m-%d %H:%M'
            )
            
            fig.update_yaxes(
                title_text=selected_column,
                showgrid=True,
                gridwidth=1,
                gridcolor='lightgray'
            )
        
        # Display the plot
        st.plotly_chart(fig, width='stretch')
        
    else:
        st.warning("No data available for the selected month(s).")
        
else:
    st.error("Unable to load data. Please check if the data file exists and is properly formatted.")

