#### Introduction
To classify long term and short term changes in relative abundance as 
'significant', it is important to check that they are unlikely to have occurred by chance. 
Here, the long and short term relative abundance changes for each species are 
tested for statistical significance using the bootstrapped predictions. Two one-sided 
tests are used, one for an increase and one for a decrease, each with the confidence 
level set at 0.95. The periods being evaluated are:

- 1993 to 2023 (long term relative abundance change).
- 2013 to 2023 (short term relative abundance change).

Any changes in relative abundance determined to be 'not significant' will 
need to be highlighted in the results section. 
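
The decision rule used later in this notebook can be sketched as follows. Let $x_b$ be the change for resample $b$ (earlier year minus 2023) and $B$ the number of resamples per species. The symbols $\hat{p}_{\text{dec}}$ and $\hat{p}_{\text{inc}}$ are notation introduced here for illustration only:

$$
\hat{p}_{\text{dec}} = \frac{1}{B}\sum_{b=1}^{B}\mathbf{1}[x_b > 0], \qquad
\hat{p}_{\text{inc}} = \frac{1}{B}\sum_{b=1}^{B}\mathbf{1}[x_b < 0]
$$

A change is recorded as 'significant' when $\max(\hat{p}_{\text{dec}}, \hat{p}_{\text{inc}}) \geq 0.95$, and as 'not significant' otherwise.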

In [1]:
# Importing the required packages
import numpy as np
import pandas as pd
import os
from pathlib import Path

# Importing localised file directory
project_root = Path(os.environ['butterfly_project'])

# Importing the required datasets
boot_results_2 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_2.csv', index_col=0)
boot_results_4 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_4.csv', index_col=0)
boot_results_8 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_8.csv', index_col=0)
boot_results_54 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_54.csv', index_col=0)
boot_results_75 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_75.csv', index_col=0)
boot_results_76 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_76.csv', index_col=0)
boot_results_84 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_84.csv', index_col=0)
boot_results_88 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_88.csv', index_col=0)
boot_results_93 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_93.csv', index_col=0)
boot_results_98 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_98.csv', index_col=0)
boot_results_99 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_99.csv', index_col=0)
boot_results_100 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_100.csv', index_col=0)
boot_results_104 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_104.csv', index_col=0)
boot_results_106 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_106.csv', index_col=0)
boot_results_121 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_121.csv', index_col=0)
boot_results_122 = pd.read_csv(project_root/'Data'/'UKBMS'/'bootstrap'/'boot_results_122.csv', index_col=0)
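
The sixteen reads above follow one file-name pattern, so they could also be written as a loop. The sketch below is one possible alternative, not the method used in this notebook: `load_boot_results` and `SPECIES_CODES` are hypothetical names, and the result is a dict keyed by species code rather than sixteen separate variables.

```python
from pathlib import Path

import pandas as pd

# Species codes taken from the file names read above.
SPECIES_CODES = [2, 4, 8, 54, 75, 76, 84, 88, 93, 98, 99, 100, 104, 106, 121, 122]


def load_boot_results(bootstrap_dir, species_codes=SPECIES_CODES):
    """Read boot_results_<code>.csv for each code into a dict keyed by code.

    Hypothetical helper: assumes every file sits directly in bootstrap_dir
    and stores its row index in the first column.
    """
    bootstrap_dir = Path(bootstrap_dir)
    return {
        code: pd.read_csv(bootstrap_dir / f'boot_results_{code}.csv', index_col=0)
        for code in species_codes
    }
```

With the directory layout used above, a call might look like `load_boot_results(project_root/'Data'/'UKBMS'/'bootstrap')`, and `list(result.values())` would give a list in the same species order.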

#### Cleaning and Organising the Bootstrapped Data

In [2]:
# Storing bootstrap dataframes in a list
boot_results = [boot_results_2,
               boot_results_4,
               boot_results_8,
               boot_results_54,
               boot_results_75,
               boot_results_76,
               boot_results_84,
               boot_results_88,
               boot_results_93,
               boot_results_98,
               boot_results_99,
               boot_results_100,
               boot_results_104,
               boot_results_106,
               boot_results_121,
               boot_results_122]

In [3]:
# A for loop is created to remove redundant year predictions in each species dataframe
for index, value in enumerate(boot_results): 
    boot_results[index] = (
        value[(value['year']==1993) # long term changes
        |(value['year']==2013) # short term changes
        | (value['year']==2023)]
        .reset_index(drop=True) # new index required following row removal
    )

In [4]:
# Redundant columns are removed from each species dataframe stored in 'boot_results'
for index, value in enumerate(boot_results):
    boot_results[index] = (
        value[['sample_n', 'year', 'log_predict']] 
    )

#### Computing Long Term Relative Abundance Change
For each resample (denoted by the 'sample_n' column) the 2023 relative abundance is 
subtracted from the 1993 value, giving the change x (1993 value minus 2023 value). 
If relative abundance increased, x<0. If it decreased, x>0. 

In [5]:
# Rows within each resample are assumed to be ordered by year (1993, 2013, 2023).
for index, value in enumerate(boot_results):
    boot_results[index]['lt_abundance_change'] = (
        value.groupby('sample_n')['log_predict']
        .transform(lambda x: x.iloc[0] # 1993 predicted relative abundance
                   -x.iloc[2]) # 2023 predicted relative abundance
    )
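
The positional lookups (`iloc[0]`, `iloc[2]`) depend on the rows of each resample being in year order. As a hedged alternative, the sketch below selects the years by label after pivoting, so row order does not matter. `change_by_resample` and the toy data are hypothetical, and the sketch assumes each ('sample_n', 'year') pair occurs exactly once.

```python
import pandas as pd


def change_by_resample(df, start_year, end_year=2023):
    """Return start_year minus end_year 'log_predict' for each resample."""
    # One row per resample, one column per year; raises if a pair is duplicated.
    wide = df.pivot(index='sample_n', columns='year', values='log_predict')
    return wide[start_year] - wide[end_year]


# Toy data (made up): two resamples, rows deliberately out of year order.
toy = pd.DataFrame({
    'sample_n': [1, 1, 1, 2, 2, 2],
    'year': [2023, 1993, 2013, 1993, 2013, 2023],
    'log_predict': [1.5, 2.0, 1.0, 1.0, 1.5, 2.0],
})

lt_change = change_by_resample(toy, 1993)  # long term: 1993 minus 2023
st_change = change_by_resample(toy, 2013)  # short term: 2013 minus 2023
```

In the toy data, resample 1 has a positive long term change (a decrease in relative abundance) and resample 2 a negative one (an increase), matching the sign convention described above.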

#### Computing Short Term Relative Abundance Change
For each resample (denoted by the 'sample_n' column) the 2023 relative abundance is 
subtracted from the 2013 value, giving the change x (2013 value minus 2023 value). 
If relative abundance increased, x<0. If it decreased, x>0.

In [6]:
# As above, rows within each resample are assumed to be ordered by year.
for index, value in enumerate(boot_results):
    boot_results[index]['st_abundance_change'] = (
        value.groupby('sample_n')['log_predict']
        .transform(lambda x: x.iloc[1] # 2013 predicted relative abundance
                   -x.iloc[2])) # 2023 predicted relative abundance

#### Cleaning DataFrames in 'boot_results' List

In [7]:
# Redundant rows and columns are removed from each dataframe. 
abundance_change = boot_results.copy()
for index, value in enumerate(abundance_change):
    abundance_change[index] = (
        value
        # Duplicate rows caused by multiple years in each sample group are removed.
        .drop_duplicates('sample_n')
        # The only columns required now are those that record the long term and short 
        # term changes for each resample.
        .drop(columns=['sample_n', 'year', 'log_predict'])
        .reset_index(drop=True)
    )

In [8]:
# A new dataframe is created to record the results. 
significance_results = (
    pd.DataFrame({'species_code':[2,4,8,54,75,76,84,88,93,98,99,100,104,106,121,122],
                  # 'tbc' (to be confirmed) is added as a placeholder for 'significant' 
                  # or 'not significant'.
                  'long_term_sig':'tbc',
                  'short_term_sig':'tbc'})
)

#### Evaluating Long Term Changes in Relative Abundance

In [9]:
for index, value in enumerate(abundance_change):
    # Is there a significant decrease? The mean of the boolean column is the share
    # of resamples with a positive change, whatever the number of resamples.
    if (value['lt_abundance_change']>0).mean() >= 0.95:
        significance_results.loc[index,'long_term_sig'] = 'significant'
    # Is there a significant increase?
    elif (value['lt_abundance_change']<0).mean() >= 0.95:
        significance_results.loc[index,'long_term_sig'] = 'significant'
    else:
        significance_results.loc[index,'long_term_sig'] = 'not significant'

#### Evaluating Short Term Changes in Relative Abundance

In [10]:
for index, value in enumerate(abundance_change):
    # Is there a significant decrease? The mean of the boolean column is the share
    # of resamples with a positive change, whatever the number of resamples.
    if (value['st_abundance_change']>0).mean() >= 0.95:
        significance_results.loc[index,'short_term_sig'] = 'significant'
    # Is there a significant increase? 
    elif (value['st_abundance_change']<0).mean() >= 0.95:
        significance_results.loc[index,'short_term_sig'] = 'significant'
    else:
        significance_results.loc[index,'short_term_sig'] = 'not significant'
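
The long term and short term loops apply the same rule to different columns, so the rule could be factored into one helper. The following is a minimal sketch under that assumption: `classify_change` is a hypothetical name, and the toy series are made-up values rather than UKBMS results.

```python
import pandas as pd


def classify_change(changes, threshold=0.95):
    """Label a set of resampled changes as 'significant' or 'not significant'.

    A change is significant when at least `threshold` of the resamples
    share the same sign (all positive or all negative).
    """
    changes = pd.Series(changes)
    share_decrease = (changes > 0).mean()  # earlier year above 2023
    share_increase = (changes < 0).mean()  # earlier year below 2023
    if max(share_decrease, share_increase) >= threshold:
        return 'significant'
    return 'not significant'


# Toy data (made up): 500 resampled changes for two hypothetical cases.
mostly_down = pd.Series([0.2] * 480 + [-0.1] * 20)  # 96% positive
mixed = pd.Series([0.2] * 300 + [-0.1] * 200)       # 60% positive

label_mostly_down = classify_change(mostly_down)
label_mixed = classify_change(mixed)
```

Applied to the dataframes in `abundance_change`, the same helper could be called once per column, for example `classify_change(value['st_abundance_change'])`.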

In [11]:
print(significance_results)

    species_code    long_term_sig   short_term_sig
0              2      significant      significant
1              4      significant  not significant
2              8      significant      significant
3             54      significant      significant
4             75      significant      significant
5             76      significant      significant
6             84      significant  not significant
7             88      significant      significant
8             93      significant      significant
9             98  not significant  not significant
10            99      significant      significant
11           100  not significant      significant
12           104      significant  not significant
13           106      significant  not significant
14           121      significant      significant
15           122      significant      significant


- For species codes 4, 84, 98, 100, 104 and 106, at least one of the long or short term changes in relative abundance was found to be not significant: fewer than 95% of resamples showed a change in the same direction (all positive or all negative).
- For each significant change, at least 95% of the resampled long or short term changes had the same sign.