# Seasonal onset shifts and trend slopes for the top 30 observed species: reference period (2014-2020) vs. post-2020

### Top 30 Observed Species
- Jackfruit-Artocarpus heterophyllus
- Mango (all varieties)-Mangifera indica
- Teak-Tectona grandis
- Tamarind-Tamarindus indica
- Indian Laburnum-Cassia fistula
- Amla-Phyllanthus emblica
- Jamun-Syzygium cumini
- Coconut palm-Cocos nucifera
- Neem-Azadirachta indica
- Purple Bauhinia-Bauhinia purpurea
- Maulsari-Mimusops elengi
- Gulmohur-Delonix regia
- Rain tree-Albizia saman
- Peepal-Ficus religiosa
- Guava tree-Psidium guajava
- Devil's Tree-Alstonia scholaris
- Chandada-Macaranga peltata
- True Ashoka-Saraca asoca
- Pongam Tree-Pongamia pinnata
- Wood Apple-Aegle marmelos
- Country Fig-Ficus racemosa
- Drumstick tree-Moringa oleifera 
- Red Silk Cotton-Bombax ceiba
- Indian Almond-Terminalia catappa
- Custard apple-Annona squamosa
- Gamar-Gmelina arborea
- Copper-pod-Peltophorum pterocarpum
- Pride of India-Lagerstroemia speciosa
- Chiku Sapodilla-Manilkara zapota
- Banyan-Ficus benghalensis

In [1]:
import pandas as pd
from scipy.stats import linregress
import matplotlib.pyplot as plt
import numpy as np

# Load the data for the top 30 species
file_path = '/Users/cecilywang/Documents/GitHub/ds-seasonwatch-trees/code/Fall 2024 Code/top_30_species_all_data.csv'  # Update the path if necessary
species_data = pd.read_csv(file_path)

# Define periods and seasonal ranges
reference_period = species_data[(species_data['Year'] >= 2014) & (species_data['Year'] <= 2020)]
post_2020_period = species_data[species_data['Year'] > 2020]

season_ranges = {
    "Winter": (1, 8),
    "Summer": (9, 22),
    "Monsoon": (23, 39),
    "Post-Monsoon": (40, 52)
}

# Initialize storage for results
onset_shifts = []
slope_comparisons = []

# Loop over each species to calculate onset shifts and slope differences
for species in species_data['Species_name'].unique():
    for season, (start, end) in season_ranges.items():
        # Filter data for the species and season
        ref_data = reference_period[(reference_period['Species_name'] == species) &
                                    (reference_period['Week'] >= start) & 
                                    (reference_period['Week'] <= end)]
        post_data = post_2020_period[(post_2020_period['Species_name'] == species) &
                                     (post_2020_period['Week'] >= start) & 
                                     (post_2020_period['Week'] <= end)]

        # Onset shift calculation
        ref_onset_week_avg = ref_data[ref_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()
        post_onset_week_avg = post_data[post_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()
        
        onset_shifts.append({
            'Species': species,
            'Season': season,
            'Reference Onset Week (Avg)': ref_onset_week_avg,
            'Post-2020 Onset Week (Avg)': post_onset_week_avg,
            'Onset Shift (Weeks)': post_onset_week_avg - ref_onset_week_avg
        })

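        # Trend slope calculation: regress the yearly mean of Leaves_mature on Year
        ref_season_avg = ref_data.groupby('Year')['Leaves_mature'].mean().reset_index()
        post_season_avg = post_data.groupby('Year')['Leaves_mature'].mean().reset_index()

        # linregress needs at least two distinct years; otherwise record NaN
        ref_slope = (linregress(ref_season_avg['Year'], ref_season_avg['Leaves_mature']).slope
                     if len(ref_season_avg) >= 2 else np.nan)
        post_slope = (linregress(post_season_avg['Year'], post_season_avg['Leaves_mature']).slope
                      if len(post_season_avg) >= 2 else np.nan)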

        slope_comparisons.append({
            'Species': species,
            'Season': season,
            'Reference Period Slope': ref_slope,
            'Post-2020 Period Slope': post_slope,
            'Slope Difference': post_slope - ref_slope
        })

# Convert results to DataFrames
onset_shifts_df = pd.DataFrame(onset_shifts)
slope_comparisons_df = pd.DataFrame(slope_comparisons)

# Display results (or save to CSV if preferred)
print("Onset Shifts by Species and Season:")
print(onset_shifts_df.head())  # Display the first few rows for quick inspection
print("\nSlope Comparisons by Species and Season:")
print(slope_comparisons_df.head())




Onset Shifts by Species and Season:
                            Species        Season  Reference Onset Week (Avg)  \
0  Indian Almond-Terminalia catappa        Winter                        1.75   
1  Indian Almond-Terminalia catappa        Summer                       10.20   
2  Indian Almond-Terminalia catappa       Monsoon                       24.80   
3  Indian Almond-Terminalia catappa  Post-Monsoon                       41.00   
4        Chandada-Macaranga peltata        Winter                        1.00   

   Post-2020 Onset Week (Avg)  Onset Shift (Weeks)  
0                    1.000000            -0.750000  
1                    9.666667            -0.533333  
2                   23.333333            -1.466667  
3                   40.000000            -1.000000  
4                    1.000000             0.000000  

Slope Comparisons by Species and Season:
                            Species        Season  Reference Period Slope  \
0  Indian Almond-Terminalia catappa     

In [None]:
# save the results to csv files
onset_shifts_df.to_csv('onset_shifts_top_30_species.csv', index=False)
slope_comparisons_df.to_csv('slope_comparisons_top_30_species.csv', index=False)

# What do `onset_shifts_top_30_species.csv` and `slope_comparisons_top_30_species.csv` contain?

Each file contains data summarizing the seasonal onset shifts and trend comparisons between the reference period (2014-2020) and post-2020 for each of the top 30 observed species.

### 1. `onset_shifts_top_30_species.csv`

This file captures the shifts in the average onset week for mature leaves across different seasons, comparing the reference period to the post-2020 period.

| Column Name                     | Description                                                                                           |
|---------------------------------|-------------------------------------------------------------------------------------------------------|
| `Species`                       | The species name, identifying each unique plant or tree in the top 30 observed species.               |
| `Season`                        | The season the row covers, defined by week of the year: Winter (weeks 1-8), Summer (9-22), Monsoon (23-39), or Post-Monsoon (40-52). |
| `Reference Onset Week (Avg)`    | The week of the year, within the season's week range, in which mature leaves were first recorded (`Leaves_mature` > 0), averaged over the years 2014-2020. |
| `Post-2020 Onset Week (Avg)`    | The same first-recorded week, averaged over the years after 2020. |
| `Onset Shift (Weeks)`           | `Post-2020 Onset Week (Avg)` minus `Reference Onset Week (Avg)`. Positive values mean mature leaves were first recorded later on average in the post-2020 period; negative values mean earlier. |
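
To make the onset columns concrete, here is a minimal sketch of the same calculation on a few made-up rows for one species and one season. The observation values and the helper name `mean_onset_week` are illustrative assumptions, not part of the dataset or the notebook code.

```python
import pandas as pd

# Hypothetical observations for one species in one season (not real data)
toy = pd.DataFrame({
    "Year": [2018, 2018, 2019, 2019, 2021, 2021, 2022, 2022],
    "Week": [10, 12, 11, 13, 9, 12, 10, 14],
    "Leaves_mature": [1, 2, 0, 1, 1, 1, 2, 1],
})

def mean_onset_week(df):
    """Average, across years, of the earliest week with Leaves_mature > 0."""
    present = df[df["Leaves_mature"] > 0]
    return present.groupby("Year")["Week"].min().mean()

ref_onset = mean_onset_week(toy[toy["Year"] <= 2020])   # onsets 10 and 13
post_onset = mean_onset_week(toy[toy["Year"] > 2020])   # onsets 9 and 10
onset_shift = post_onset - ref_onset

print(ref_onset, post_onset, onset_shift)  # 11.5 9.5 -2.0
```

In this toy case the shift is negative, so mature leaves were first recorded two weeks earlier on average after 2020.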

### 2. `slope_comparisons_top_30_species.csv`

This file details the trend slopes of mature leaf observations for each species and season, comparing the reference period to post-2020. Each slope comes from a linear regression of the yearly mean `Leaves_mature` value (within the season) on the year, so it expresses the change in that mean per year.

| Column Name                     | Description                                                                                              |
|---------------------------------|----------------------------------------------------------------------------------------------------------|
| `Species`                       | The species name, identifying each unique plant or tree in the top 30 observed species.                  |
| `Season`                        | The season the row covers, defined by week of the year: Winter (weeks 1-8), Summer (9-22), Monsoon (23-39), or Post-Monsoon (40-52). |
| `Reference Period Slope`        | The slope of the trend line for the yearly mean of `Leaves_mature` during the reference period (2014-2020). |
| `Post-2020 Period Slope`        | The slope of the trend line for the yearly mean of `Leaves_mature` in the post-2020 period. |
| `Slope Difference`              | `Post-2020 Period Slope` minus `Reference Period Slope`. Positive values mean the post-2020 trend is steeper upward (or less steeply downward) than the reference trend; negative values mean the opposite. |
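
The slope columns can be illustrated the same way. The sketch below fits a line to made-up yearly means for each period with `scipy.stats.linregress`. The numbers and the helper name `period_slope` are assumptions chosen for illustration only.

```python
import pandas as pd
from scipy.stats import linregress

# Hypothetical yearly means of Leaves_mature for one species and season
yearly = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023],
    "Leaves_mature": [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.3, 1.1],
})

def period_slope(df):
    """Slope of Leaves_mature regressed on Year, or NaN with fewer than two years."""
    if df["Year"].nunique() < 2:
        return float("nan")
    return linregress(df["Year"], df["Leaves_mature"]).slope

ref_slope = period_slope(yearly[yearly["Year"] <= 2020])   # rises by about 0.1 per year
post_slope = period_slope(yearly[yearly["Year"] > 2020])   # falls by about 0.2 per year
slope_difference = post_slope - ref_slope

print(round(ref_slope, 3), round(post_slope, 3), round(slope_difference, 3))
```

Here the trend turns from rising to falling, so the slope difference is negative (about -0.3 per year). With only three post-2020 years, a slope like this rests on very few points.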

### Purpose of These Files

- **Onset Shifts Analysis** (`onset_shifts_top_30_species.csv`): This file lets users assess whether the timing of mature leaf appearance shifted between the reference period and post-2020. It reports differences in average onset week only and includes no significance test. Delays or advances in onset week can provide insights into how changing climate conditions might be affecting the growth cycles of these species.

- **Slope Comparisons Analysis** (`slope_comparisons_top_30_species.csv`): This file provides a comparison of the trends in mature leaf observations over time, showing if these trends have changed direction or intensity in the post-2020 period compared to the reference period. Differences in slopes may indicate changes in seasonal growth patterns that could be influenced by environmental or climate factors.
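
As one possible way to use the onset file, the sketch below ranks rows by the size of the shift and labels its direction. It uses the five rows printed in the notebook output earlier rather than the full CSV. The `Direction` column and the `direction` helper are illustrative additions, not columns in the saved files.

```python
import pandas as pd

# The five onset-shift rows shown in the printed notebook output
onset = pd.DataFrame({
    "Species": ["Indian Almond-Terminalia catappa"] * 4 + ["Chandada-Macaranga peltata"],
    "Season": ["Winter", "Summer", "Monsoon", "Post-Monsoon", "Winter"],
    "Onset Shift (Weeks)": [-0.75, -0.533333, -1.466667, -1.0, 0.0],
})

def direction(shift):
    """Label a shift as earlier, later, unchanged, or missing."""
    if pd.isna(shift):
        return "no data"
    if shift > 0:
        return "later"
    if shift < 0:
        return "earlier"
    return "unchanged"

onset["Direction"] = onset["Onset Shift (Weeks)"].apply(direction)

# Order rows from the largest absolute shift to the smallest
order = onset["Onset Shift (Weeks)"].abs().sort_values(ascending=False).index
ranked = onset.loc[order]

print(ranked[["Species", "Season", "Onset Shift (Weeks)", "Direction"]])
```

To run this on the full results, replace the hand-built frame with `pd.read_csv('onset_shifts_top_30_species.csv')`.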
