# Top 30 observed species: seasonal onset shifts and trend slopes, reference period (2014-2020) vs. post-2020

### Top 30 Observed Species: 
- Jackfruit-Artocarpus heterophyllus
- Mango (all varieties)-Mangifera indica
- Teak-Tectona grandis
- Tamarind-Tamarindus indica
- Indian Laburnum-Cassia fistula
- Amla-Phyllanthus emblica
- Jamun-Syzygium cumini
- Coconut palm-Cocos nucifera
- Neem-Azadirachta indica
- Purple Bauhinia-Bauhinia purpurea
- Maulsari-Mimusops elengi
- Gulmohur-Delonix regia
- Rain tree-Albizia saman
- Peepal-Ficus religiosa
- Guava tree-Psidium guajava
- Devil's Tree-Alstonia scholaris
- Chandada-Macaranga peltata
- True Ashoka-Saraca asoca
- Pongam Tree-Pongamia pinnata
- Wood Apple-Aegle marmelos
- Country Fig-Ficus racemosa
- Drumstick tree-Moringa oleifera 
- Red Silk Cotton-Bombax ceiba
- Indian Almond-Terminalia catappa
- Custard apple-Annona squamosa
- Gamar-Gmelina arborea
- Copper-pod-Peltophorum pterocarpum
- Pride of India-Lagerstroemia speciosa
- Chiku Sapodilla-Manilkara zapota
- Banyan-Ficus benghalensis

In [1]:
import pandas as pd
from scipy.stats import linregress
import matplotlib.pyplot as plt
import numpy as np

# Load the data for the top 30 species
file_path = '/.../Fall 2024 Code/top_30_species_all_data.csv'  # Update the path if necessary
species_data = pd.read_csv(file_path)

# Define periods and seasonal ranges
reference_period = species_data[(species_data['Year'] >= 2014) & (species_data['Year'] <= 2020)]
post_2020_period = species_data[species_data['Year'] > 2020]

season_ranges = {
    "Winter": (1, 8),
    "Summer": (9, 22),
    "Monsoon": (23, 39),
    "Post-Monsoon": (40, 52)
}

# Initialize storage for results
onset_shifts = []
slope_comparisons = []

# Loop over each species to calculate onset shifts and slope differences
for species in species_data['Species_name'].unique():
    for season, (start, end) in season_ranges.items():
        # Filter data for the species and season
        ref_data = reference_period[(reference_period['Species_name'] == species) &
                                    (reference_period['Week'] >= start) & 
                                    (reference_period['Week'] <= end)]
        post_data = post_2020_period[(post_2020_period['Species_name'] == species) &
                                     (post_2020_period['Week'] >= start) & 
                                     (post_2020_period['Week'] <= end)]

        # Onset shift calculation
        ref_onset_week_avg = ref_data[ref_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()
        post_onset_week_avg = post_data[post_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()
        
        onset_shifts.append({
            'Species': species,
            'Season': season,
            'Reference Onset Week (Avg)': ref_onset_week_avg,
            'Post-2020 Onset Week (Avg)': post_onset_week_avg,
            'Onset Shift (Weeks)': post_onset_week_avg - ref_onset_week_avg
        })

        # Trend slope calculation
        ref_season_avg = ref_data.groupby('Year')['Leaves_mature'].mean().reset_index()
        post_season_avg = post_data.groupby('Year')['Leaves_mature'].mean().reset_index()

        ref_slope, _, _, _, _ = linregress(ref_season_avg['Year'], ref_season_avg['Leaves_mature'])
        post_slope, _, _, _, _ = linregress(post_season_avg['Year'], post_season_avg['Leaves_mature'])

        slope_comparisons.append({
            'Species': species,
            'Season': season,
            'Reference Period Slope': ref_slope,
            'Post-2020 Period Slope': post_slope,
            'Slope Difference': post_slope - ref_slope
        })

# Convert results to DataFrames
onset_shifts_df = pd.DataFrame(onset_shifts)
slope_comparisons_df = pd.DataFrame(slope_comparisons)

# Display results (or save to CSV if preferred)
print("Onset Shifts by Species and Season:")
print(onset_shifts_df.head())  # Display the first few rows for quick inspection
print("\nSlope Comparisons by Species and Season:")
print(slope_comparisons_df.head())




Onset Shifts by Species and Season:
                            Species        Season  Reference Onset Week (Avg)  \
0  Indian Almond-Terminalia catappa        Winter                        1.75   
1  Indian Almond-Terminalia catappa        Summer                       10.20   
2  Indian Almond-Terminalia catappa       Monsoon                       24.80   
3  Indian Almond-Terminalia catappa  Post-Monsoon                       41.00   
4        Chandada-Macaranga peltata        Winter                        1.00   

   Post-2020 Onset Week (Avg)  Onset Shift (Weeks)  
0                    1.000000            -0.750000  
1                    9.666667            -0.533333  
2                   23.333333            -1.466667  
3                   40.000000            -1.000000  
4                    1.000000             0.000000  

Slope Comparisons by Species and Season:
                            Species        Season  Reference Period Slope  \
0  Indian Almond-Terminalia catappa     

In [None]:
# save the results to csv files
onset_shifts_df.to_csv('onset_shifts_top_30_species.csv', index=False)
slope_comparisons_df.to_csv('slope_comparisons_top_30_species.csv', index=False)

# What do `onset_shifts_top_30_species.csv` and `slope_comparisons_top_30_species.csv` contain?

*Each file contains data summarizing the seasonal onset shifts and trend comparisons between the reference period (2014-2020) and post-2020 for each of the top 30 observed species.*

### 1. `onset_shifts_top_30_species.csv`

This file captures the shifts in the average onset week for mature leaves across different seasons, comparing the reference period to the post-2020 period.

| Column Name                     | Description                                                                                           |
|---------------------------------|-------------------------------------------------------------------------------------------------------|
| `Species`                       | The species name, identifying each unique plant or tree in the top 30 observed species.               |
| `Season`                        | The season in which the observation was made (Winter, Summer, Monsoon, or Post-Monsoon).             |
| `Reference Onset Week (Avg)`    | The week of the year (restricted to the season's week range) in which mature leaves were first observed, averaged over the years 2014-2020. |
| `Post-2020 Onset Week (Avg)`    | The week of the year (restricted to the season's week range) in which mature leaves were first observed, averaged over the years after 2020. |
| `Onset Shift (Weeks)`           | The difference between `Post-2020 Onset Week (Avg)` and `Reference Onset Week (Avg)`, showing if mature leaves appeared earlier or later on average in the post-2020 period. Positive values indicate a shift to later weeks, while negative values indicate an earlier shift. |
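
To make the three onset columns concrete, here is a minimal sketch of the same calculation on a toy table for one species in the Monsoon window. The column names (`Year`, `Week`, `Leaves_mature`) follow the notebook; the values and the helper `mean_onset_week` are invented for illustration only.

```python
import pandas as pd

# Toy observations for one species in the Monsoon window (weeks 23-39).
# Values are invented; only the column names come from the notebook.
obs = pd.DataFrame({
    "Year":          [2018, 2018, 2019, 2019, 2021, 2021, 2022, 2022],
    "Week":          [24,   26,   25,   27,   23,   25,   24,   26],
    "Leaves_mature": [0,    1,    2,    1,    1,    2,    0,    1],
})

def mean_onset_week(df):
    """Mean over years of the first week with Leaves_mature > 0 (hypothetical helper)."""
    first_week_per_year = df[df["Leaves_mature"] > 0].groupby("Year")["Week"].min()
    return first_week_per_year.mean()

ref_onset = mean_onset_week(obs[obs["Year"] <= 2020])   # first weeks: 26 (2018), 25 (2019)
post_onset = mean_onset_week(obs[obs["Year"] > 2020])   # first weeks: 23 (2021), 26 (2022)
onset_shift = post_onset - ref_onset                    # negative = earlier onset

print(ref_onset, post_onset, onset_shift)  # 25.5 24.5 -1.0
```

Note that a year with no positive `Leaves_mature` value inside the window drops out of the `groupby` and therefore out of the average, so the two period means can rest on different numbers of years.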

### 2. `slope_comparisons_top_30_species.csv`

This file details the trend slopes of mature leaf observations for each species and season, comparing the reference period to post-2020. Each slope is the per-year change in the yearly mean of `Leaves_mature` within the season, estimated by linear regression against `Year`.

| Column Name                     | Description                                                                                              |
|---------------------------------|----------------------------------------------------------------------------------------------------------|
| `Species`                       | The species name, identifying each unique plant or tree in the top 30 observed species.                  |
| `Season`                        | The season in which the observation was made (Winter, Summer, Monsoon, or Post-Monsoon).                 |
| `Reference Period Slope`        | The slope of the trend line for mature leaf observations during the reference period (2014-2020).        |
| `Post-2020 Period Slope`        | The slope of the trend line for mature leaf observations in the post-2020 period.                        |
| `Slope Difference`              | The difference between `Post-2020 Period Slope` and `Reference Period Slope`, indicating any change in the rate of leaf observation trends between the two periods. Positive values suggest an increase in the trend post-2020, while negative values suggest a decrease. |
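
The slope columns can be illustrated with a small sketch that mirrors the notebook's approach: average `Leaves_mature` per year, then regress those yearly means on `Year` with `scipy.stats.linregress`. The data values and the helper name `yearly_trend_slope` are assumptions made for this example.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

def yearly_trend_slope(df):
    """Slope of the per-year mean of Leaves_mature regressed on Year.

    Returns NaN when fewer than two years are available, because a line
    cannot be fitted through a single point.
    """
    yearly_mean = df.groupby("Year")["Leaves_mature"].mean().reset_index()
    if len(yearly_mean) < 2:
        return np.nan
    return linregress(yearly_mean["Year"], yearly_mean["Leaves_mature"]).slope

# Invented values for one species and season.
ref = pd.DataFrame({"Year": [2018, 2018, 2019, 2020],
                    "Leaves_mature": [1.0, 2.0, 2.0, 2.5]})
post = pd.DataFrame({"Year": [2021, 2022, 2023],
                     "Leaves_mature": [2.0, 1.5, 1.0]})

ref_slope = yearly_trend_slope(ref)      # yearly means 1.5, 2.0, 2.5
post_slope = yearly_trend_slope(post)    # yearly means 2.0, 1.5, 1.0
slope_difference = post_slope - ref_slope

print(ref_slope, post_slope, slope_difference)
```

The slopes are in units of `Leaves_mature` per year. With only a handful of post-2020 years, each post-2020 slope rests on very few points, so individual values should be read with caution.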

### Purpose 

- **Onset Shifts Analysis** (`onset_shifts_top_30_species.csv`): allows users to assess if there has been a significant shift in the timing of mature leaf appearances between the reference period and post-2020. Delays or advances in onset week can provide insights into how changing climate conditions might be affecting the growth cycles of these species.

- **Slope Comparisons Analysis** (`slope_comparisons_top_30_species.csv`): gives a comparison of the trends in mature leaf observations over time, showing if these trends have changed direction or intensity in the post-2020 period compared to the reference period. Differences in slopes may indicate changes in seasonal growth patterns that could be influenced by environmental or climate factors.
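
The CSV files report differences in means only; they do not say whether a shift is distinguishable from year-to-year variation. One way to check this, sketched below, is a permutation test on the per-year onset weeks of a single species and season. This is not part of the notebook's analysis: the function `permutation_p_value` and the onset values are hypothetical.

```python
import numpy as np

def permutation_p_value(ref_values, post_values, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative helper)."""
    rng = np.random.default_rng(seed)
    ref_values = np.asarray(ref_values, dtype=float)
    post_values = np.asarray(post_values, dtype=float)
    observed = post_values.mean() - ref_values.mean()
    pooled = np.concatenate([ref_values, post_values])
    n_ref = len(ref_values)
    count = 0
    for _ in range(n_permutations):
        # Reassign years to the two periods at random and recompute the shift.
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_ref:].mean() - shuffled[:n_ref].mean()
        if abs(diff) >= abs(observed) - 1e-12:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_permutations + 1)

# Invented per-year onset weeks for one species and season.
ref_onsets = [25, 26, 24, 26, 25, 27, 25]   # one value per year, 2014-2020
post_onsets = [23, 24, 23]                  # one value per post-2020 year

shift, p_value = permutation_p_value(ref_onsets, post_onsets)
print(f"shift = {shift:.2f} weeks, p = {p_value:.3f}")
```

Because the post-2020 period contains only a few years, such a test has limited power, and running it for every species-season pair would also call for a multiple-comparison correction.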


In [2]:
import pandas as pd

# Load the CSV files generated earlier
onset_shifts_df = pd.read_csv('/.../VISUALIZATIONS- fall 2024/onset_shifts_top_30_species.csv')
slope_comparisons_df = pd.read_csv('/.../VISUALIZATIONS- fall 2024/slope_comparisons_top_30_species.csv')

# 1. Identify Species and Seasons with Significant Onset Shifts
# Sort onset shifts by absolute value to find the largest changes
significant_onset_shifts = onset_shifts_df.copy()
significant_onset_shifts['Absolute Onset Shift'] = significant_onset_shifts['Onset Shift (Weeks)'].abs()
top_onset_shifts = significant_onset_shifts.sort_values(by='Absolute Onset Shift', ascending=False).head(10)

print("Top 10 Species-Seasons with the Largest Onset Shifts:")
print(top_onset_shifts[['Species', 'Season', 'Reference Onset Week (Avg)', 
                        'Post-2020 Onset Week (Avg)', 'Onset Shift (Weeks)']])

# 2. Aggregate Onset Shifts by Season
# Calculate the average onset shift for each season across all species
seasonal_onset_shift_avg = onset_shifts_df.groupby('Season')['Onset Shift (Weeks)'].mean().reset_index()
print("\nAverage Onset Shift by Season (Weeks):")
print(seasonal_onset_shift_avg)

# 3. Identify Species with the Most Drastic Trend Changes
# Sort slope comparisons by absolute slope difference to find the largest trend changes
significant_trend_changes = slope_comparisons_df.copy()
significant_trend_changes['Absolute Slope Difference'] = significant_trend_changes['Slope Difference'].abs()
top_trend_changes = significant_trend_changes.sort_values(by='Absolute Slope Difference', ascending=False).head(10)

print("\nTop 10 Species-Seasons with the Largest Trend Changes:")
print(top_trend_changes[['Species', 'Season', 'Reference Period Slope', 
                         'Post-2020 Period Slope', 'Slope Difference']])


Top 10 Species-Seasons with the Largest Onset Shifts:
                                   Species        Season  \
117     Copper-pod-Peltophorum pterocarpum        Summer   
50             Coconut palm-Cocos nucifera       Monsoon   
49             Coconut palm-Cocos nucifera        Summer   
2         Indian Almond-Terminalia catappa       Monsoon   
71   Pride of India-Lagerstroemia speciosa  Post-Monsoon   
119     Copper-pod-Peltophorum pterocarpum  Post-Monsoon   
106             Guava tree-Psidium guajava       Monsoon   
36       Chiku Sapodilla-Manilkara zapota        Winter   
3         Indian Almond-Terminalia catappa  Post-Monsoon   
118     Copper-pod-Peltophorum pterocarpum       Monsoon   

     Reference Onset Week (Avg)  Post-2020 Onset Week (Avg)  \
117                       12.75                    9.666667   
50                        25.50                   23.000000   
49                        11.75                    9.666667   
2                         24.80  

In [3]:
# 2. Aggregate Onset Shifts by Season
# Calculate the average onset shift for each season across all species
seasonal_onset_shift_avg = onset_shifts_df.groupby('Season')['Onset Shift (Weeks)'].mean().reset_index()

# Save average onset shifts by season to CSV
seasonal_onset_shift_avg.to_csv('average_onset_shift_by_season.csv', index=False)
print("Saved average onset shift by season to 'average_onset_shift_by_season.csv'")

# 3. Save Trend Changes for All 30 Species
# No filtering needed; we save the entire dataset for trend comparisons
slope_comparisons_df.to_csv('slope_comparisons_all_30_species.csv', index=False)
print("Saved trend comparisons for all 30 species to 'slope_comparisons_all_30_species.csv'")


Saved average onset shift by season to 'average_onset_shift_by_season.csv'
Saved trend comparisons for all 30 species to 'slope_comparisons_all_30_species.csv'


### 1. **Largest Early Onset Shifts in Summer and Monsoon**
   - **Species with Largest Shifts**: *Copper-pod (Peltophorum pterocarpum)*, *Coconut palm (Cocos nucifera)*, and *Indian Almond (Terminalia catappa)* show the largest shifts in onset timing, particularly in the **Summer** and **Monsoon** seasons, with onset moving earlier by up to about 3 weeks (Copper-pod in Summer: from week 12.75 to week 9.67 on average).
   - **Possible Interpretation**: This trend of earlier onset in Summer and Monsoon could suggest that these species are responding to changing seasonal cues, possibly due to warmer or more favorable conditions occurring earlier in these seasons.

### 2. **Mixed Onset Shifts in Post-Monsoon and Winter**
   - In **Post-Monsoon** and **Winter**, the average onset shifts are generally smaller and less consistent across species, with both earlier and later onset observed.
   - **Key Observation**: For instance, *Pride of India (Lagerstroemia speciosa)* showed a delayed onset of around 1.3 weeks in Post-Monsoon, while *Chiku Sapodilla (Manilkara zapota)* in Winter had an earlier onset by 1 week.
   - **Implication**: The variability in Winter and Post-Monsoon onset shifts may indicate a more complex interaction with climate, where some species experience delays due to potentially cooler or altered post-monsoon conditions.

### 3. **Notable Changes in Growth Trends in Winter and Monsoon**
   - **Top Species with Trend Changes**: *Red Silk Cotton (Bombax ceiba)* in Winter and *Coconut Palm (Cocos nucifera)* in Monsoon experienced the most substantial trend changes, with slopes decreasing by over 0.29. The slope measures the yearly change in mean `Leaves_mature`, so this indicates a declining trend in mature-leaf observations in these seasons post-2020 rather than a directly measured growth rate.
   - **Broader Pattern**: Many species, including *Pongam Tree (Pongamia pinnata)*, *Drumstick Tree (Moringa oleifera)*, and *Peepal (Ficus religiosa)*, showed decreasing trends across Winter and Monsoon.
   - **Potential Climate Impact**: These lower slopes in Winter and Monsoon suggest that these species may be experiencing stress or limitations in their typical leaf cycles, possibly due to less favorable conditions (e.g., reduced rainfall or temperature changes) in these seasons post-2020.

### 4. **Average Seasonal Shift Patterns**
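   - **Overall Seasonal Impact**: On average, **Summer** shows a positive shift of 0.32 weeks, while **Monsoon** and **Winter** show slight negative shifts, with Monsoon having the largest change of -0.35 weeks.
   - **Implications for Seasonal Cycles**: Because `Onset Shift (Weeks)` is the post-2020 value minus the reference value, a positive shift means later onset and a negative shift means earlier onset. Averaged across species in Kerala, mature leaves therefore appeared slightly later in Summer and slightly earlier in Monsoon and Winter, even though the largest individual Summer shifts (e.g. Copper-pod) are towards earlier onset. The Summer and Monsoon averages are both well under one week.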
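
A seasonal mean can hide the direction of individual shifts, so it can help to report how many species moved earlier or later alongside the average. The sketch below does this on invented data laid out like `onset_shifts_df`; the species labels and values are placeholders, and the helper columns `earlier` and `later` are not part of the notebook.

```python
import numpy as np
import pandas as pd

col = "Onset Shift (Weeks)"

# Invented shifts in the same layout as onset_shifts_df.
shifts = pd.DataFrame({
    "Species": ["A", "B", "C", "A", "B", "C"],
    "Season":  ["Summer"] * 3 + ["Monsoon"] * 3,
    col:       [-3.0, 2.0, 2.5, -1.0, -0.5, np.nan],
})

# Sign convention from the notebook: post-2020 minus reference,
# so negative = earlier onset and positive = later onset.
shifts["earlier"] = shifts[col] < 0
shifts["later"] = shifts[col] > 0

summary = shifts.groupby("Season").agg(
    mean_shift=(col, "mean"),      # NaN shifts are skipped by mean()
    n_earlier=("earlier", "sum"),
    n_later=("later", "sum"),
)
print(summary)
```

In this toy Summer, the single largest shift is 3 weeks earlier, yet the seasonal mean is +0.5 weeks because two smaller shifts are later. The same pattern can arise in the real data, which is why the largest individual shifts and the seasonal averages should be read separately.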

### Summary
The data suggests that:
- **Earlier onset** in Summer and Monsoon for the species with the largest shifts (e.g. *Copper-pod* and *Coconut palm*) could be a response to changing seasonal cues, although the Summer average across all species is slightly later (+0.32 weeks).
- **Declining trends** in mature-leaf observations in Winter and Monsoon, especially for species like *Red Silk Cotton* and *Coconut Palm*, may indicate stress or climate-related impacts during these periods.
- These findings are consistent with species shifting their leaf phenology in response to changing environmental conditions in Kerala, but this analysis on its own does not establish the cause.


Key questions we can answer using our current analysis:

### Key Questions and Answers

1. **How are trees changing because of climate change?**
   - By comparing the onset shifts and trend slopes between the reference period (2014-2020) and post-2020, we observe that some species, particularly *Copper-pod* and *Coconut palm*, have shown earlier onset weeks, especially in the Summer and Monsoon seasons. These earlier growth patterns could be an adaptation to changing climate conditions, like warmer temperatures or shifting rainfall patterns.

2. **What is the onset time for flowering and fruiting in tropical species?**
   - This analysis tracks mature leaves (`Leaves_mature`) rather than flowering or fruiting, so it answers the question for leaf maturity only. The onset calculation gives an average onset week for each species and season, showing how early or late that stage occurs; the same calculation could be repeated on flowering and fruiting observations. For mature leaves, the largest Summer shifts post-2020 were towards earlier onset, although the Summer average across species was slightly later.

3. **How fast do trees change in response to changing seasons?**
   - The slope comparisons show how the yearly trend in mature-leaf observations differs between the two periods, with several species showing lower slopes in the post-2020 period. This suggests that mature-leaf observations for these species are declining, or rising more slowly, over the years, possibly due to altered seasonal conditions or increased environmental stress.

4. **How has the probability of flowering and fruiting in a given season changed since 2014?**
   - This question calls for a probabilistic analysis, and the current onset and trend analyses cover mature leaves only, so they do not answer it directly. They do show that mature-leaf onset has shifted in some seasons, which suggests that the likelihood of a phenological event falling in a given season may also be changing. Probability-based modeling of flowering and fruiting observations would be needed to confirm this.

5. **Do certain seasons show more pronounced shifts in timing or growth patterns?**
   - **Summer** and **Monsoon** show the largest average onset shifts in magnitude (+0.32 and -0.35 weeks respectively), and the largest individual shifts also fall in these seasons, mostly towards earlier onset. This pattern suggests that these seasons may be experiencing stronger climate-driven impacts. The reduced trend slopes, by contrast, were reported mainly for Winter and Monsoon.
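
As a starting point for the probability question, one could compare the share of observations in which a phenophase is present, by season and period. The sketch below uses `Leaves_mature` and the season windows from the notebook on invented data. The helper `week_to_season` and the `Period` and `Present` columns are hypothetical, and the same approach would need the flowering and fruiting observations to address question 4 itself.

```python
import pandas as pd

# Season windows copied from the notebook (week of year, inclusive).
season_ranges = {
    "Winter": (1, 8),
    "Summer": (9, 22),
    "Monsoon": (23, 39),
    "Post-Monsoon": (40, 52),
}

def week_to_season(week):
    """Map a week-of-year number to its season name (hypothetical helper)."""
    for season, (start, end) in season_ranges.items():
        if start <= week <= end:
            return season
    return None

# Invented observations; the toy data has no years before 2014.
obs = pd.DataFrame({
    "Year":          [2019, 2019, 2019, 2019, 2022, 2022, 2022, 2022],
    "Week":          [10,   15,   30,   35,   10,   15,   30,   35],
    "Leaves_mature": [1,    0,    0,    0,    1,    1,    2,    0],
})
obs["Season"] = obs["Week"].map(week_to_season)
obs["Period"] = obs["Year"].map(lambda y: "reference" if y <= 2020 else "post-2020")
obs["Present"] = obs["Leaves_mature"] > 0

# Share of observations with mature leaves present, per season and period.
presence = obs.groupby(["Season", "Period"])["Present"].mean().unstack("Period")
print(presence)
```

A change in this share between periods is only a descriptive estimate; differences in how many observers reported in each period would also affect it.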





# 12/5 Client edits: 

In [8]:
import pandas as pd
from scipy.stats import linregress
import matplotlib.pyplot as plt
import numpy as np

# Load the data for the top 30 species
file_path = '.../ds-seasonwatch-trees/data/Fall 2024 data/top_30_species_all_data.csv' 
species_data = pd.read_csv(file_path)

# Define periods and seasonal ranges
reference_period = species_data[(species_data['Year'] >= 2014) & (species_data['Year'] <= 2020)]
post_2020_period = species_data[species_data['Year'] > 2020]

season_ranges = {
    "Winter": (1, 8),
    "Summer": (9, 22),
    "Monsoon": (23, 39),
    "Post-Monsoon": (40, 52)
}

# Initialize storage for results
onset_shifts = []
slope_comparisons = []

# Loop over each species to calculate onset shifts and slope differences
for species in species_data['Species_name'].unique():
    for season, (start, end) in season_ranges.items():
        # Filter data for the species and season
        ref_data = reference_period[(reference_period['Species_name'] == species) &
                                    (reference_period['Week'] >= start) & 
                                    (reference_period['Week'] <= end)]
        post_data = post_2020_period[(post_2020_period['Species_name'] == species) &
                                     (post_2020_period['Week'] >= start) & 
                                     (post_2020_period['Week'] <= end)]

        # Ensure there is data to calculate
        if not ref_data.empty and not post_data.empty:
            # Onset shift calculation
            ref_onset_week_avg = ref_data[ref_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()
            post_onset_week_avg = post_data[post_data['Leaves_mature'] > 0].groupby('Year')['Week'].min().mean()

            onset_shifts.append({
                'Species': species,
                'Season': season,
                'Reference Onset Week (Avg)': ref_onset_week_avg,
                'Post-2020 Onset Week (Avg)': post_onset_week_avg,
                'Onset Shift (Weeks)': post_onset_week_avg - ref_onset_week_avg
            })

            # Trend slope calculation
            ref_season_avg = ref_data.groupby('Year')['Leaves_mature'].mean().reset_index()
            post_season_avg = post_data.groupby('Year')['Leaves_mature'].mean().reset_index()

            # linregress needs at least two yearly means to fit a line
            if len(ref_season_avg) > 1:
                ref_slope = linregress(ref_season_avg['Year'], ref_season_avg['Leaves_mature']).slope
            else:
                ref_slope = np.nan  # Not enough data for regression

            if len(post_season_avg) > 1:
                post_slope = linregress(post_season_avg['Year'], post_season_avg['Leaves_mature']).slope
            else:
                post_slope = np.nan  # Not enough data for regression

            slope_comparisons.append({
                'Species': species,
                'Season': season,
                'Reference Period Slope': ref_slope,
                'Post-2020 Period Slope': post_slope,
                'Slope Difference': post_slope - ref_slope
            })

# Convert results to DataFrames
onset_shifts_df = pd.DataFrame(onset_shifts)
slope_comparisons_df = pd.DataFrame(slope_comparisons)


In [9]:
onset_shifts_df.to_csv('12_5_onset_shift_by_season.csv', index=False)

In [6]:
# 1. Identify Species and Seasons with Significant Onset Shifts
# Sort onset shifts by absolute value to find the largest changes
significant_onset_shifts = onset_shifts_df.copy()
significant_onset_shifts['Absolute Onset Shift'] = significant_onset_shifts['Onset Shift (Weeks)'].abs()
top_onset_shifts = significant_onset_shifts.sort_values(by='Absolute Onset Shift', ascending=False).head(10)

print("Top 10 Species-Seasons with the Largest Onset Shifts:")
print(top_onset_shifts[['Species', 'Season', 'Reference Onset Week (Avg)', 
                        'Post-2020 Onset Week (Avg)', 'Onset Shift (Weeks)']])

# 2. Aggregate Onset Shifts by Season
# Average the onset shift per (Season, Species), then take the unweighted mean of those per-species values within each season
seasonal_onset_shift_avg = (
    onset_shifts_df
    .groupby(['Season', 'Species'])
    .mean(numeric_only=True)  # Ensure numeric columns are averaged
    .reset_index()
    .groupby('Season')['Onset Shift (Weeks)']
    .mean()
    .reset_index()
)

print("\nAverage Onset Shift by Season (Weeks):")
print(seasonal_onset_shift_avg)

# Save average onset shifts by season to CSV
#seasonal_onset_shift_avg.to_csv('/.../VISUALIZATIONS- fall 2024/average_onset_shift_by_season.csv', index=False)
#print("Saved average onset shift by season to 'average_onset_shift_by_season.csv'")

# 3. Identify Species with the Most Drastic Trend Changes
# Sort slope comparisons by absolute slope difference to find the largest trend changes
significant_trend_changes = slope_comparisons_df.copy()
significant_trend_changes['Absolute Slope Difference'] = significant_trend_changes['Slope Difference'].abs()
top_trend_changes = significant_trend_changes.sort_values(by='Absolute Slope Difference', ascending=False).head(10)

print("\nTop 10 Species-Seasons with the Largest Trend Changes:")
print(top_trend_changes[['Species', 'Season', 'Reference Period Slope', 
                         'Post-2020 Period Slope', 'Slope Difference']])



Top 10 Species-Seasons with the Largest Onset Shifts:
                                   Species        Season  \
106            Coconut palm-Cocos nucifera       Monsoon   
105            Coconut palm-Cocos nucifera        Summer   
109             Guava tree-Psidium guajava        Summer   
99            Quickstick-Gliricidia sepium  Post-Monsoon   
110             Guava tree-Psidium guajava       Monsoon   
71   Pride of India-Lagerstroemia speciosa  Post-Monsoon   
51                   Gamar-Gmelina arborea  Post-Monsoon   
93                 Neem-Azadirachta indica        Summer   
59           Indian Coral-Erythrina indica  Post-Monsoon   
36       Chiku Sapodilla-Manilkara zapota        Winter   

     Reference Onset Week (Avg)  Post-2020 Onset Week (Avg)  \
106                   25.500000                   23.000000   
105                   11.750000                    9.666667   
109                   11.750000                    9.666667   
99                    41.666667  

In [10]:
seasonal_onset_shift_avg.to_csv('12_5_seasonal_onset_shift_by_season.csv', index=False)