# 'Surprise!' Map

- get a window of data (1, 3, or 6 months) from each year of data
- calculate the mean and standard deviation of that window across years
- calculate the distance of the current year's window from the mean, as a multiple of the standard deviation
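
The three steps above amount to a z-score over per-year window totals. A minimal sketch, assuming one total per year with the current year last: the function name `surprise_score` and the sample totals are illustrative only, and `statistics.pstdev` stands in for the `np.std` call used later in this notebook (both compute the population standard deviation). As in the notebook, the current year is included in the mean and standard deviation.

```python
from statistics import mean, pstdev

def surprise_score(window_totals):
    """Return (mean, std, surprise) for the last entry of window_totals.

    window_totals holds one alert total per year for the same window,
    oldest first; the final entry is the current year.
    """
    avg = mean(window_totals)
    std = pstdev(window_totals)  # population std, like np.std with ddof=0
    if std == 0:
        # no variation across years, so nothing can be called unusual
        return avg, std, 0.0
    return avg, std, (window_totals[-1] - avg) / std

# illustrative totals, not real data
avg, std, surprise = surprise_score([10, 12, 8, 30])
print(avg, round(std, 3), round(surprise, 3))
```

A positive score means more alerts than usual for that window, a negative score means fewer.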

### Overview

Works the same way for both fires and GLAD alerts (for fires you can also select the dataset): it calculates how unusual each location is, i.e. how many multiples of sigma it lies from the mean.

We can then rank regions by this score (best = below-average fires/GLAD alerts, worst = above-average fires/GLAD alerts).

At the global level the widget will break down each country's GLAD or fire data (if it has any). At country level it will break down admin-1s, and at admin-1 level it will break down the contained admin-2s. (Maybe at admin-2 we show the position of _that_ admin-2 in the ranking?)

Clicking on a location in the ranked list should take you to its dashboard.

### User settings

Users can select:

- forestType / landUse combination
- a time range over which we consider the deviation from the mean (tbd)
- ascending or descending order (i.e. the 'best' or 'worst' locations relative to the expected number of alerts)

In [9]:
# Tables

fire_iso = 'ff289906-aa83-4a89-bba0-562edd8c16c6'
fire_adm1 = '9b9e56fc-270e-486d-8db5-e0a839c9a1a9'
fire_adm2 = '0f24299d-2aaa-4afc-945c-b614028c12d1'

glad_iso = '391ca96d-303f-4aef-be4b-9cdb4856832c'
glad_adm1 = 'c7a1d922-e320-4e92-8e4c-11ea33dd6e35'
glad_adm2 = '428db321-5ebb-4e86-a3df-32c63b6d3c83'

In [1]:
#Import Global Metadata, functions etc

%run '0.Importable_Globals.ipynb'

In [111]:
import datetime
from datetime import date, timedelta

In [90]:
def alerts_summary(iso=None, adm1=None, polyname='admin', type='glad'):
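    """
    Fetch the raw weekly alert rows from the GFW query API.

    iso and adm1 set -> adm2 rows; only iso set -> adm1 rows; neither -> rows per country.
    `type` selects the 'glad' or 'fire' tables.
    (We may be able to reuse old SQL queries from the ranked GLAD widget here.)
    """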
        
    if iso and adm1:
        if type == 'glad':
            table = glad_adm2
        elif type == 'fire':
            table = fire_adm2
            
        location = f"iso = '{iso}' AND adm1 = {adm1}"
        sql = f"""  SELECT iso, adm1, adm2, week, year, alerts as count, area_ha, polyname 
                FROM data
                WHERE {location}
                AND polyname = '{polyname}'
                """
        
    elif iso and not adm1:
        if type == 'glad':
            table =  glad_adm1
        elif type == 'fire':
            table = fire_adm1
            
        location = f"iso = '{iso}'"
        sql = f"""  SELECT iso, adm1, week, year, alerts as count, area_ha, polyname 
                FROM data
                WHERE {location}
                AND polyname = '{polyname}'
                """
        
    elif not iso and not adm1:
        if type == 'glad':
            table = glad_iso
        elif type == 'fire':
            table = fire_iso
            
        location = 'Global'
        sql = f"""  SELECT iso, week, year, alerts as count, area_ha, polyname 
                FROM data
                WHERE polyname = '{polyname}'
                """

    url = f'https://production-api.globalforestwatch.org/query/{table}?sql='

    r = requests.get(url+sql)
    print(r.url)

    return r.json().get('data', None)
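
The three branches above differ only in the selected columns and the `WHERE` clause, and the SQL is appended to the URL unencoded. A minimal sketch of building the same kind of query URL in one place, with explicit percent-encoding: `build_query_url` and `API_ROOT` are hypothetical names, not part of the notebook or the API, and the sketch assumes the endpoint accepts a percent-encoded `sql` parameter (the URL printed by this notebook suggests it does).

```python
from urllib.parse import quote, urlencode

# endpoint root taken from the function above
API_ROOT = 'https://production-api.globalforestwatch.org/query'

def build_query_url(table, columns, polyname, iso=None, adm1=None):
    """Build a query URL for one table (hypothetical helper)."""
    conditions = []
    if iso:
        conditions.append(f"iso = '{iso}'")
        if adm1 is not None:
            conditions.append(f"adm1 = {int(adm1)}")
    conditions.append(f"polyname = '{polyname}'")
    sql = (f"SELECT {', '.join(columns)} FROM data "
           f"WHERE {' AND '.join(conditions)}")
    # quote_via=quote encodes spaces as %20 rather than '+'
    return f"{API_ROOT}/{table}?{urlencode({'sql': sql}, quote_via=quote)}"

url = build_query_url(
    '9b9e56fc-270e-486d-8db5-e0a839c9a1a9',  # fire_adm1 table id from above
    ['iso', 'adm1', 'week', 'year', 'alerts as count', 'polyname'],
    polyname='admin',
    iso='BRA',
)
print(url)
```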

In [163]:
# Example user settings

adm0 = 'BRA'
adm1 = None

polyname = 'admin' # this would be some forestType + landUse combo

period = 4         # in weeks 
type = 'fire'

In [165]:
# Fetch data, print first 3 entries

data = alerts_summary(adm0, adm1, polyname, type)

data[0:3]

https://production-api.globalforestwatch.org/query/9b9e56fc-270e-486d-8db5-e0a839c9a1a9?sql=%20%20SELECT%20iso,%20adm1,%20week,%20year,%20alerts%20as%20count,%20area_ha,%20polyname%20%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20FROM%20data%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20WHERE%20iso%20=%20'BRA'%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20AND%20polyname%20=%20'admin'%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20


[{'_id': 'AWd3QCkYAfC9sjteCgj4',
  'adm1': 1,
  'alerts': 7,
  'count': 7,
  'iso': 'BRA',
  'polyname': 'admin',
  'week': 1,
  'year': 2003},
 {'_id': 'AWd3QCkYAfC9sjteCgkV',
  'adm1': 1,
  'alerts': 1,
  'count': 1,
  'iso': 'BRA',
  'polyname': 'admin',
  'week': 8,
  'year': 2009},
 {'_id': 'AWd3QCkYAfC9sjteCgkb',
  'adm1': 1,
  'alerts': 1,
  'count': 1,
  'iso': 'BRA',
  'polyname': 'admin',
  'week': 9,
  'year': 2014}]

In [166]:
# Create list of locations
if adm1:
    print('getting adm2')
    admin = 'adm2' 
elif adm0:
    print('getting adm1')
    admin = 'adm1'
else:
    print('getting iso')
    admin = 'iso'

locations = []
for d in data:
    if d.get(admin) not in locations:
        locations.append(d.get(admin))
        
locations[0:3]

getting adm1


[1, 2, 3]
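
The next cell re-filters the full `data` list once per location. A sketch of an alternative that groups the rows in a single pass, so each location's rows can be looked up directly: `group_by_admin` is a hypothetical helper, and the sample rows are illustrative (trimmed to the fields used here).

```python
from collections import defaultdict

def group_by_admin(rows, admin):
    """Group API rows by the value of their `admin` key (e.g. 'adm1')."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(admin)].append(row)
    return dict(groups)

# illustrative rows in the shape returned by alerts_summary
sample = [
    {'adm1': 1, 'week': 1, 'year': 2003, 'count': 7},
    {'adm1': 2, 'week': 3, 'year': 2015, 'count': 4},
    {'adm1': 1, 'week': 8, 'year': 2009, 'count': 1},
]
grouped = group_by_admin(sample, 'adm1')
print(list(grouped))  # keys appear in first-seen order, like `locations`
```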

In [167]:
# create stats and assign to admin

adm_list = []

today = date.today().isocalendar() # today's date as a (year, ISO week, ISO weekday) tuple

for adm in locations:
    
    # grab all data entries that match (essentially: groupBy(data, 'iso'))
    adm_filter = list(filter(lambda x: x.get(admin) == adm, data))
    
    # Temporary array to store count stats
    tmp_c = []
    
    # iterate through years (note: not every location's data starts in the same year!)
    for year in range(2015, today[0] + 1):
        
        # temp count counter
        tc = 0
        
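        # anchor the window on today's ISO week and weekday in the given year;
        # '%G-%V-%u' parses ISO year, ISO week and ISO weekday (1-7), matching isocalendar()
        # note: `start` is the most recent edge of the window, `end` is `period` weeks earlier
        start = datetime.datetime.strptime(f'{year}-{today[1]}-{today[2]}', '%G-%V-%u')
        end = start - timedelta(weeks=period)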
        
        # convert into single numbers (year, isoweek, isoday)
        startDate = start.isocalendar()
        endDate = end.isocalendar()
        
        # then filter data within that window
        if startDate[0] == endDate[0]:
            window_filter = list(filter(lambda x: x.get('year') == startDate[0]
                                        and x.get('week') <= startDate[1]
                                        and x.get('week') >= endDate[1],
                                        adm_filter))
            
        else:
            start_filter = list(filter(lambda x: x.get('year') == startDate[0]
                                        and x.get('week') <= startDate[1],
                                        adm_filter))
            
            end_filter = list(filter(lambda x: x.get('year') == endDate[0]
                                        and x.get('week') >= endDate[1],
                                        adm_filter))
            
            window_filter = end_filter + start_filter
        
        
        # sum all counts in that window
        for week in window_filter:
            tc += week.get('count')
        
        tmp_c.append(tc)
        
        # calculate the stats for the window across all years
        if year == today[0]:
            count_mean = np.mean(tmp_c)
            count_std = np.std(tmp_c)
            
            # distance from the mean in units of standard deviation (a z-score);
            # fall back to 0 when there is no variation, to avoid dividing by zero
            count_surprise = (tc - count_mean) / count_std if count_std > 0 else 0
            
            
    # append stats object
    if count_mean > 0:
        adm_list.append({'adm': adm, 
                         'count_surprise': count_surprise,
                         'count_mean': count_mean,
                         'count_std': count_std,
                         'count_total': tc,
                        })
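
The window logic above has to special-case windows that cross a year boundary. A sketch of one way to avoid that, by listing the (ISO year, ISO week) pairs the window covers and summing the matching rows: `window_weeks` and `window_total` are hypothetical helpers, and `date.fromisocalendar` assumes Python 3.8 or later. Like the filters above, the window is inclusive at both ends, so it spans `period + 1` ISO weeks.

```python
from datetime import date, timedelta

def window_weeks(year, iso_week, iso_day, period):
    """List the (ISO year, ISO week) pairs from `period` weeks back up to the anchor."""
    latest = date.fromisocalendar(year, iso_week, iso_day)
    current = latest - timedelta(weeks=period)
    weeks = []
    while current <= latest:
        iso = current.isocalendar()
        weeks.append((iso[0], iso[1]))
        current += timedelta(weeks=1)
    return weeks

def window_total(rows, weeks):
    """Sum 'count' over the rows whose (year, week) falls inside the window."""
    wanted = set(weeks)
    return sum(r['count'] for r in rows if (r['year'], r['week']) in wanted)

weeks = window_weeks(2018, 2, 1, period=4)
# illustrative rows, not real data
rows = [
    {'year': 2017, 'week': 52, 'count': 3},
    {'year': 2018, 'week': 1, 'count': 5},
    {'year': 2018, 'week': 10, 'count': 9},
]
print(weeks, window_total(rows, weeks))
```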

In [168]:
# Example of sorting by ascending order

best = sorted(adm_list, key=lambda k: k['count_surprise'], reverse=False)[0:5]
best

[{'adm': 12,
  'count_mean': 4567.0,
  'count_std': 1555.9911632139817,
  'count_surprise': -1.710163953954317,
  'count_total': 1906},
 {'adm': 5,
  'count_mean': 2760.25,
  'count_std': 539.46194258724131,
  'count_surprise': -1.5316187014737923,
  'count_total': 1934},
 {'adm': 22,
  'count_mean': 1620.25,
  'count_std': 726.30757086787969,
  'count_surprise': -1.5217382336787626,
  'count_total': 515},
 {'adm': 1,
  'count_mean': 155.0,
  'count_std': 93.944132334063312,
  'count_surprise': -1.4263796649214757,
  'count_total': 21},
 {'adm': 9,
  'count_mean': 889.25,
  'count_std': 437.62048340999763,
  'count_surprise': -1.3464863331087356,
  'count_total': 300}]

In [169]:
# Example of sorting by descending order

worst = sorted(adm_list, key=lambda k: k['count_surprise'], reverse=True)[0:5]
worst

[{'adm': 15,
  'count_mean': 1391.75,
  'count_std': 654.84401768665487,
  'count_surprise': 0.74101616093894562,
  'count_total': 1877},
 {'adm': 20,
  'count_mean': 876.25,
  'count_std': 395.05466393905539,
  'count_surprise': 0.38918664689836147,
  'count_total': 1030},
 {'adm': 4,
  'count_mean': 2462.25,
  'count_std': 659.32972593384568,
  'count_surprise': 0.16494023509401351,
  'count_total': 2571},
 {'adm': 18,
  'count_mean': 5077.25,
  'count_std': 1908.4608163386536,
  'count_surprise': 0.11147726910534998,
  'count_total': 5290},
 {'adm': 8,
  'count_mean': 276.25,
  'count_std': 78.001201913816686,
  'count_surprise': 0.048076182263747227,
  'count_total': 280}]

In [173]:
# Get admin 2 and 1 name dictionaries

areaId_to_name = None
if adm1:
    tmp = get_admin2_json(iso=adm0, adm1=adm1)
    areaId_to_name = {row.get('adm2'): row.get('name') for row in tmp}
elif adm0:
    tmp = get_admin1_json(iso=adm0)
    areaId_to_name = {row.get('adm1'): row.get('name') for row in tmp}

In [174]:
# dynamic sentence(s)!

phrase_dict = {
    'admin': 'in',
    'wdpa': 'in protected areas within',
    'aze': 'in Alliance for Zero Extinction sites within',
    'ifl': 'in intact forests within',
    'tcl': 'in tiger conservation landscapes within',
    'kba': 'in key biodiversity areas within'
}
    
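if period == 4:
    timeframe = 'month'
elif period == 12:
    timeframe = '3 months'
elif period == 26:
    timeframe = '6 months'
else:
    # fallback so `timeframe` is always defined for other window lengths
    timeframe = f'{period} weeks'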

print(f"In the last {timeframe} ",end="")
print(f"the most unusual number of {type} events occurred {phrase_dict[polyname]} ", end="")

if adm0 or adm1:
    print(f"{areaId_to_name[best[0].get('adm')]} ", end="")
else:
    print(f"{best[0].get('adm')} ", end="")
    
print(f"where {best[0].get('count_total')} were detected, around ", end="")
print(f"{abs(int(best[0].get('count_total') - int(best[0].get('count_mean'))))} ",end="")
print(f"less than normal for the same time period in previous years.")


In the last month the most unusual number of fire events occurred in Mato Grosso where 1906 were detected, around 2661 less than normal for the same time period in previous years.
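
The sentence above always says "less", which only fits a location below its mean. A sketch of a variant that picks the wording from the sign of the difference, so it could also describe the top entry of `worst`: `describe` is a hypothetical helper, and the sample entry reuses the figures printed for `best[0]` above.

```python
def describe(entry, name, event_type, timeframe, where='in'):
    """Build the dynamic sentence for one ranked entry (hypothetical helper)."""
    diff = entry['count_total'] - int(entry['count_mean'])
    direction = 'fewer' if diff < 0 else 'more'
    return (f"In the last {timeframe} the most unusual number of {event_type} "
            f"events occurred {where} {name}, where {entry['count_total']} "
            f"were detected, around {abs(diff)} {direction} than normal for "
            f"the same time period in previous years.")

# figures copied from the first entry of `best` above
entry = {'adm': 12, 'count_mean': 4567.0, 'count_total': 1906}
sentence = describe(entry, 'Mato Grosso', 'fire', 'month')
print(sentence)
```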
