<a href="https://colab.research.google.com/github/Rebex3000/variableextract/blob/main/AIRCentre_ChelsaTemperature.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<!DOCTYPE html>
<html>
<body>
  <h1>Extract climate variables from CHELSA, including download of the raster file for the matching year from the CHELSA server</h1>
  <p>Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N.E., Linder, P., Kessler, M. (2017): Climatologies at high resolution for the Earth land surface areas. Scientific Data. 4 170122. https://doi.org/10.1038/sdata.2017.122  </p>

  <img src="https://chelsa-climate.org/wp-content/uploads/2016/02/logotest3.gif">


  <p>Variable: monthly average temperature (°C)</p>
  <p>Resolution of tiff file: 30 seconds, approx. 1 km</p>

  <p>CHELSA provides two data sets for average monthly climate variables.<br>
  1. Historical climate: CHELSAcruts (1901-2016), link: <a href="https://chelsa-climate.org/chelsacruts/">https://chelsa-climate.org/chelsacruts/</a><br>
  2. Recent climate (1980-2019), link: <a href="https://envicloud.wsl.ch/#/?prefix=chelsa%2Fchelsa_V2%2FGLOBAL%2F">https://envicloud.wsl.ch/#/?prefix=chelsa%2Fchelsa_V2%2FGLOBAL%2F</a></p>

  <p>Variables are extracted for whole countries, based on the country borders in a shapefile.</p>
  <p>Requirements:</p>
  <ul>
    <li>Shapefile with country borders of the whole world with column "AREAID"</li>
    <li>Table with columns "AREAID", "NAME_0", "SPECIESID", and "YEAR"</li>
  </ul>
  <p>Note: This code example is for the maximum temperature. The same approach works for the minimum temperature, but the download link and the variable names have to be changed.</p>
</body>
</html>
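
Before running the extraction, it can help to confirm that both inputs carry the columns listed in the requirements. The following is a minimal sketch and not part of the original notebook; `missing_columns` and the two constant names are hypothetical, and the example column names are taken from the test table printed in the notebook output.

```python
# Hypothetical helper (not part of the original notebook): report which
# required columns are absent from the table or the shapefile attributes.

REQUIRED_TABLE_COLUMNS = ["AREAID", "NAME_0", "SPECIESID", "YEAR"]
REQUIRED_SHAPEFILE_COLUMNS = ["AREAID"]


def missing_columns(available, required):
    """Return the required column names that are not in `available`, in order."""
    available = set(available)
    return [name for name in required if name not in available]


# Column names of the test table shown in the notebook output
table_columns = ["SPECIESID", "SPECIES", "AREAID", "ISO", "NAME_0", "YEAR"]
print(missing_columns(table_columns, REQUIRED_TABLE_COLUMNS))

# A shapefile without the "AREAID" column would be reported like this
print(missing_columns(["ISO", "NAME_0", "geometry"], REQUIRED_SHAPEFILE_COLUMNS))
```

With pandas or geopandas, `df.columns` or `shapefile.columns` can be passed directly as the `available` argument.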


In [None]:
# Mount drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
pip install geopandas

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting geopandas
  Downloading geopandas-0.13.2-py3-none-any.whl (1.1 MB)
Collecting fiona>=1.8.19 (from geopandas)
  Downloading Fiona-1.9.4.post1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (16.4 MB)
Collecting pyproj>=3.0.1 (from geopandas)
  Downloading pyproj-3.6.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.9 MB)
Collecting click-plugins>=1.0 (from fiona>=1.8.19->geopandas)
  Downloading click_plugins-1.1.1-py2.py3-none-any.whl (7.5 kB)
Collecting cligj>=0.5 (from fiona>=1.8.19->geopanda

In [None]:
pip install rasterstats

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting rasterstats
  Downloading rasterstats-0.19.0-py3-none-any.whl (16 kB)
Collecting affine (from rasterstats)
  Downloading affine-2.4.0-py3-none-any.whl (15 kB)
Collecting rasterio>=1.0 (from rasterstats)
  Downloading rasterio-1.3.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (21.3 MB)
Collecting simplejson (from rasterstats)
  Downloading simplejson-3.19.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (137 kB)
Collecting snuggs>=1.4.1 (from rasterio>=1.0->rasterstats)
  Downloading snuggs-1.4.7-py3-none-any.whl (5.4 kB)
Installing collected packages: snuggs, simplejson

In [None]:
# import packages
import multiprocessing
import pandas as pd
import geopandas as gpd
from rasterstats import zonal_stats
import rasterio
import os
import urllib.request
import time
from tqdm import tqdm

## Define the download path for the right file (the year in which the species was introduced into the new region)
Do not change the file paths!

In [None]:
def downloadYearlyData(m, year, output_dir):
    """Download the tmax raster for month m of the given year and return its local path."""
    if year < 1980:
        # historical data set (CHELSAcruts)
        tiff_file_name = os.path.join(output_dir, f'CHELSAcruts_tmax_{m}_{year}_V.1.0.tif')
        tiff_url = 'https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V1/chelsa_cruts/tmax/CHELSAcruts_tmax_{}_{}_V.1.0.tif'.format(m, year)
    elif year < 2020:
        # recent data set (CHELSA V2.1)
        tiff_file_name = os.path.join(output_dir, f'CHELSAV21_tasmax_{m}_{year}_V.2.1.tif')
        tiff_url = 'https://os.zhdk.cloud.switch.ch/envicloud/chelsa/chelsa_V2/GLOBAL/monthly/tasmax/CHELSA_tasmax_{:02d}_{}_V.2.1.tif'.format(m, year)
    else:
        # introductions after 2019 are not covered by either data set
        raise ValueError('No CHELSA data for year {}'.format(year))

    # reuse the file if it was already downloaded
    if os.path.exists(tiff_file_name):
        return tiff_file_name

    # get the file size for the progress bar (None if the server does not report it)
    with urllib.request.urlopen(tiff_url) as response:
        content_length = response.info().get('Content-Length')
    file_size = int(content_length) if content_length else None

    # retry the download up to three times
    max_retry = 3
    for retry_count in range(max_retry):
        try:
            # download the file with tqdm progress bar
            with tqdm(unit='B', unit_scale=True, unit_divisor=1024, miniters=1, desc=tiff_file_name,
                      total=file_size) as progress_bar:
                urllib.request.urlretrieve(tiff_url, tiff_file_name,
                                           reporthook=lambda b, bsize, t: progress_bar.update(bsize))
            return tiff_file_name
        except Exception:
            # remove a partial download so it is not mistaken for a complete file
            if os.path.exists(tiff_file_name):
                os.remove(tiff_file_name)
            # wait for 60 seconds and try again
            time.sleep(60)

    # raise an error if all retries failed
    raise Exception('Failed to download {}'.format(tiff_url))
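
The choice of data set and URL can also be isolated in a small pure function, which makes it possible to check the year boundaries without any network access. This is only a sketch: `chelsa_tmax_url` and `BASE_URL` are hypothetical names, and the URL patterns are copied from `downloadYearlyData`.

```python
# Sketch: pick the CHELSA data set for a year and build the download URL.
# `chelsa_tmax_url` and `BASE_URL` are hypothetical names; the URL patterns
# are the ones used in downloadYearlyData.

BASE_URL = "https://os.zhdk.cloud.switch.ch/envicloud/chelsa"


def chelsa_tmax_url(month, year):
    """Return the download URL of the monthly maximum-temperature raster."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be between 1 and 12, got {month}")
    if year < 1980:
        # CHELSAcruts: the month is written without zero padding
        return f"{BASE_URL}/chelsa_V1/chelsa_cruts/tmax/CHELSAcruts_tmax_{month}_{year}_V.1.0.tif"
    if year < 2020:
        # CHELSA V2.1: the month is zero padded to two digits
        return f"{BASE_URL}/chelsa_V2/GLOBAL/monthly/tasmax/CHELSA_tasmax_{month:02d}_{year}_V.2.1.tif"
    raise ValueError(f"no data set is used in this notebook for year {year}")


print(chelsa_tmax_url(9, 1950))
print(chelsa_tmax_url(2, 2005))
```

The two calls correspond to the France 1950 and Portugal 2005 rows of the test table, so they fall on either side of the 1980 boundary.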

## Define zonal statistics depending on the data source, as the two data sets use different formats.
Use the CHELSAcruts data set only if needed, as the recent data set is more accurate.

In [None]:
def readTiff(tiff_file_name, year):
    """Append the regional mean tmax (°C) of one monthly raster to monthly_tmax.

    Relies on the globals `region` and `monthly_tmax` set in the main loop.
    """
    with rasterio.open(tiff_file_name) as src:
        affine = src.transform
    stats = zonal_stats(region.geometry, tiff_file_name, affine=affine, stats=['mean'])

    if year < 1980:
        # CHELSAcruts: raw values are scaled by 10, result is already in °C
        value = stats[0]['mean'] * 0.1
    elif year < 2020:
        # CHELSA V2.1: raw values are scaled by 10 and in Kelvin, convert to °C
        value = stats[0]['mean'] * 0.1 - 273.15
    else:
        raise ValueError('No CHELSA data for year {}'.format(year))

    monthly_tmax.append(value)
    print(value)
    # os.remove(tiff_file_name)
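
The difference between the two formats comes down to one unit conversion per data set. The sketch below repeats only that arithmetic; `raw_to_celsius` is a hypothetical name, the scale factor and offset are the ones used in `readTiff`, and the raw input values are made up for illustration.

```python
# Sketch of the unit conversion applied in readTiff (hypothetical helper name).

def raw_to_celsius(raw_mean, year):
    """Convert the raw regional mean of a raster to degrees Celsius."""
    if year < 1980:
        # CHELSAcruts: scale by 0.1, no offset
        return raw_mean * 0.1
    if year < 2020:
        # CHELSA V2.1: scale by 0.1, then subtract 273.15 (Kelvin to Celsius)
        return raw_mean * 0.1 - 273.15
    raise ValueError(f"no conversion defined for year {year}")


# Made-up raw values that both correspond to roughly 25 °C
print(round(raw_to_celsius(250.0, 1950), 2))
print(round(raw_to_celsius(2981.5, 2005), 2))
```

Mixing up the two branches would shift a value by 273.15, so a result far outside a plausible temperature range is a quick hint that the wrong conversion was applied.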

## Insert your own file paths here, wherever necessary:
This part calculates the yearly average maximum temperature for the year in which a species was introduced into a new region, as well as the average maximum temperature of the hottest month of that year.
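
The two summary values are simply the mean and the maximum of the twelve monthly values. As a small worked example, the sketch below applies this to the monthly values printed for France in 1950 in the notebook output; `summarise_year` is a hypothetical helper name, and returning the month number is an addition that the notebook itself does not make.

```python
# Sketch: yearly mean and hottest month from twelve monthly tmax values.
# `summarise_year` is a hypothetical helper, not part of the notebook.

def summarise_year(monthly_tmax):
    """Return (yearly mean, value of the hottest month, month number 1-12)."""
    if len(monthly_tmax) != 12:
        raise ValueError(f"expected 12 monthly values, got {len(monthly_tmax)}")
    mean_tmax = sum(monthly_tmax) / len(monthly_tmax)
    hottest = max(monthly_tmax)
    hottest_month_number = monthly_tmax.index(hottest) + 1
    return mean_tmax, hottest, hottest_month_number


# Monthly values for France, 1950, copied from the notebook output
france_1950 = [
    4.693526518429173, 9.208873928206204, 11.39848889092832,
    11.740774160701331, 18.388211688561416, 23.14456105619215,
    24.580926986915596, 22.714639435893403, 17.62720564456215,
    13.827685056506803, 9.675724580678432, 2.6615294858011906,
]

mean_tmax, hottest, month_number = summarise_year(france_1950)
print(f"Mean tmax: {mean_tmax:.2f}, hottest month: {month_number} ({hottest:.2f})")
```

The mean is not area weighted over time: each month counts equally, regardless of its number of days.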




In [None]:

if __name__ == '__main__':
    # 9999 = no data: the URL could not be found or the introduction is after 2019

    # # Load test CSV table
    df = pd.read_csv('/content/drive/MyDrive/AIRCentre/introductions_test.csv')

    # # Load real CSV table
    #df = pd.read_csv('/content/drive/MyDrive/AIRCentre/aedesalbopictus.csv')
    print(df.head())

    # # Load test shapefile of the regions
    shapefile = gpd.read_file('/content/drive/MyDrive/AIRCentre/my_few_worldregions.shp', crs='EPSG:4326')

    # # Load shapefile of the global regions
    # shapefile = gpd.read_file('/content/drive/MyDrive/AIRCentre/my_global_regions_rebecca230303.shp', crs='EPSG:4326')
    print(shapefile.head())

    # # list of the respective annual average maximum temperatures
    tmax = []
    hottest_month = []

    # specify the directory to save the files
    output_dir = '/content/drive/MyDrive/AIRCentre'

    # # loop through each entry in the csv table
    for index, row in df.iterrows():
        # extract relevant data from the row
        species_id = row['SPECIESID']
        area_id = row['AREAID']
        country = row['NAME_0']
        year = row['YEAR']

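        # if the country is null or missing, store the no-data placeholder so that
        # the result lists stay aligned with the rows of the table, then skip the row
        if pd.isnull(country):
            tmax.append(9999)
            hottest_month.append(9999)
            continue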

        # select the relevant region from the shapefile
        region = shapefile.loc[shapefile['AREAID'] == area_id].iloc[0]

        # initialize list of monthly average maximum temperatures
        monthly_tmax = []

        try:
            with multiprocessing.Pool(processes=6) as pool:
                # apply the function to each month in parallel
                results = []
                for m in range(1, 13):
                    results.append(pool.apply_async(downloadYearlyData, args=(m, year, output_dir)))

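                # wait for all downloads to finish and collect the local file names;
                # get() re-raises any error that occurred in a worker process
                tiff_files = [r.get() for r in results]

                # compute the regional mean for each month, in calendar order
                for tiff_file_name in tiff_files:
                    readTiff(tiff_file_name=tiff_file_name, year=year)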

            # calculate the yearly average temperature and append to the list of yearly values
            tmax.append(sum(monthly_tmax) / len(monthly_tmax))
            hottest_month.append(max(monthly_tmax))

            # print the result
            print(f"SpeciesID: {species_id}, Country: {country}, AreaID: {area_id}, Year: {year}, Mean tmax: {sum(monthly_tmax) / len(monthly_tmax)}, Highest value: {max(monthly_tmax)}")


        except Exception as err:
            # the URL was not found or the introduction is later than 2019:
            # store the placeholder and save the results so far, in order to not lose the data
            print(f"No data for SpeciesID {species_id}, AreaID {area_id}, Year {year}: {err}")
            tmax.append(9999)
            hottest_month.append(9999)
            dferror = pd.DataFrame({'MEAN_TMAX_OFYEAR': tmax, 'MEAN_TMAX_HOTTESTMONTH': hottest_month})
            dferror.to_excel('/content/drive/MyDrive/AIRCentre/tmax_except_result.xlsx', index=False)

    # append tmax to initial table
    df['MEAN_TMAX_OFYEAR'] = tmax
    df['MEAN_TMAX_HOTTESTMONTH'] = hottest_month

    # save the DataFrame to an Excel file
    df.to_excel('/content/drive/MyDrive/AIRCentre/tmax.xlsx', index=False)



   SPECIESID            SPECIES AREAID  ISO    NAME_0  YEAR
0          1  Aedes albopictus    A295  FRA    France  1950
1          2   Aedes albopictus   A398  PRT  Portugal  2005
   ISO       NAME_0  NAME_1  MTemp  MTPrec AREAID       Realm  Island  \
0  DEU      Germany     NaN    9.1   726.6   A302  Palearctic       0   
1  GRC       Greece     NaN   14.2   618.9   A305  Palearctic       0   
2  NLD  Netherlands     NaN   10.0   801.2   A374  Palearctic       0   
3  PRT     Portugal     NaN   15.5   820.7   A398  Palearctic       0   
4  SYC   Seychelles     NaN   27.0  1460.7   A418  Afrotropic       1   

         ClimClass       AreaSqKm   AreaSqKmV2  \
0   T 0-10-Not Dry  357056.090045  357552.8398   
1  T 10-20-Not Dry  132747.841923  132561.6183   
2   T 0-10-Not Dry   37602.902495   37665.8118   
3  T 10-20-Not Dry   91995.322515   91878.0332   
4  T 20-30-Not Dry     494.407466     491.2000   

                                            geometry  
0  MULTIPOLYGON (((8.7012

/content/drive/MyDrive/AIRCentre/CHELSAcruts_tmax_10_1950_V.1.0.tif: 96.2MB [00:06, 14.8MB/s]                            
/content/drive/MyDrive/AIRCentre/CHELSAcruts_tmax_9_1950_V.1.0.tif: 97.4MB [00:07, 14.4MB/s]                            


4.693526518429173
9.208873928206204
11.39848889092832
11.740774160701331
18.388211688561416
23.14456105619215
24.580926986915596
22.714639435893403
17.62720564456215
13.827685056506803
9.675724580678432
2.6615294858011906
SpeciesID: 1, Country: France, AreaID: A295, Year: 1950, Mean tmax: 14.13851228611468, Highest value: 24.580926986915596


/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_2_2005_V.2.1.tif: 133MB [00:10, 13.0MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_1_2005_V.2.1.tif: 131MB [00:10, 13.3MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_3_2005_V.2.1.tif: 137MB [00:10, 13.4MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_4_2005_V.2.1.tif: 140MB [00:10, 13.5MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_5_2005_V.2.1.tif: 143MB [00:10, 13.6MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_6_2005_V.2.1.tif: 145MB [00:11, 13.4MB/s]                           
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_9_2005_V.2.1.tif: 144MB [00:18, 8.29MB/s]                           

/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_7_2005_V.2.1.tif: 145MB [00:18, 8.29MB/s]
/content/drive/MyDrive/AIRCentre/CHELSAV21_tasmax_12_2005_V.2.1.tif: 132MB

11.99179746616511
11.909929952004177
16.09698764791517
17.840370562545957
22.176568512993526
28.392113114541473
28.985486660613162
30.49114382900217
26.425162508467793
20.97901298626431
14.242344445885806
12.206131361611995
SpeciesID: 2, Country: Portugal, AreaID: A398, Year: 2005, Mean tmax: 20.14475408733422, Highest value: 30.49114382900217
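
Because rows without data are stored with the placeholder 9999, any later analysis of the saved table should drop these rows before computing statistics. The sketch below shows the idea on a plain list; `valid_values` is a hypothetical helper, and the middle entry is an invented failed row added only for illustration.

```python
# Sketch: exclude the no-data placeholder before computing statistics.
# `valid_values` is a hypothetical helper, not part of the notebook.

NO_DATA = 9999  # placeholder written by the notebook when no raster is available


def valid_values(values, no_data=NO_DATA):
    """Return the values that are not the no-data placeholder."""
    return [v for v in values if v != no_data]


# Yearly means of the two test rows plus one invented failed row (9999)
mean_tmax_of_year = [14.13851228611468, 9999, 20.14475408733422]

valid = valid_values(mean_tmax_of_year)
print(f"{len(valid)} of {len(mean_tmax_of_year)} rows have data")
print(f"Mean over valid rows: {sum(valid) / len(valid):.2f}")
```

Leaving the placeholder in would pull any average far above realistic temperatures, so filtering is best done immediately after loading the table.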
