### Starter template litter surveillance project (lsp) 20-21

### This template is still under development. Expected completion date: 01-31-2020

#### Intended use 

This is a standardized template for the lsp. It defines the following:

1. Directory structure 
2. File names
3. Data access methods 
4. Formatting 
5. Required fields

Contact hammerdirt if you plan on submitting your work for publication. We may be able to direct you to resources or have some insight that could help.

#### How to save your version

Save your files in the following format: lastname_firstname_x, where x is the number of notebooks you have in the repo, written as a word (your first notebook is lastname_firstname_one). See the examples below.

#### How to save charts, data, files 

Use the same prefix for all files, for example lastname_firstname_one_scatter.svg or lastname_firstname_one.csv.
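A minimal sketch of the naming convention, using a hypothetical helper `make_output_name` (it is not part of the template). It strips a leading underscore, so it also accepts the `notebook_number` value set in the imports cell (`"_one"`).

```python
def make_output_name(file_prefix, notebook_number, label=None, extension="csv"):
    """Build a file name that follows the lastname_firstname_x convention.

    file_prefix: e.g. "lastname_firstname"
    notebook_number: the notebook number as a word, e.g. "one" or "_one"
    label: optional description of the chart or data, e.g. "scatter"
    extension: the file extension without the dot
    """
    # accept both "one" and "_one" (the form used in the imports cell)
    parts = [file_prefix, notebook_number.lstrip("_")]
    if label:
        parts.append(label)
    return "{}.{}".format("_".join(parts), extension)


print(make_output_name("lastname_firstname", "one", "scatter", "svg"))
print(make_output_name("lastname_firstname", "_one"))
```

The first call gives `lastname_firstname_one_scatter.svg` and the second `lastname_firstname_one.csv`, matching the examples above.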

#### Keep the subject per notebook limited 

Keep your notebooks focussed and limited in scope. Better to produce a couple of notebooks that build on each other than to create a big doc that is difficult to follow.

#### Use utility files

There is a folder called utility files where you can store your methods. Prefix the file names the same way, for example lastname_firstname_utility_one.py.

#### Don't lose time doing lengthy transformations

If you find yourself doing extensive iterations, let us know how the data could be formatted on the server side to speed things up for you. Chances are that if you need the results in a certain way, others will too.

#### Adding to requirements.txt

Why isn't seaborn in the imports below? Because this template doesn't use it, not because it is missing: seaborn is already in requirements.txt. Check requirements.txt before adding a package to it.
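If you would rather check from code, here is a small sketch with a hypothetical `is_listed` helper. It assumes requirements.txt holds one requirement per line (e.g. `seaborn` or `seaborn==0.9.0`); it does not follow `-r` includes or resolve URLs.

```python
import re


def is_listed(package, requirements_path="requirements.txt"):
    """Return True if `package` already appears in a requirements file.

    Comments (#) and blank lines are ignored. Names are compared
    case-insensitively, treating "-", "_" and "." as equivalent.
    """
    def normalise(name):
        return re.sub(r"[-_.]+", "-", name).lower()

    wanted = normalise(package)
    with open(requirements_path) as requirements:
        for line in requirements:
            # drop comments and surrounding whitespace
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            # the name ends at the first version specifier, extra or marker
            name = re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0]
            if normalise(name) == wanted:
                return True
    return False
```

Usage: `is_listed("seaborn")` run from the repo root.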

#### Using Bokeh

If you are familiar with Bokeh and want to produce content for the app, go for it!

#### Why are we using JSON format for data?

These visualisations are destined for a web application, therefore the data needs to be in a format that D3 or ReactJS can consume easily.
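For reference, a sketch of the array-of-objects ("records") layout that the API returns, built with the standard library. `to_records_json` is a hypothetical helper; with pandas, `df.to_json(orient='records')` produces the same layout.

```python
import json


def to_records_json(columns, rows):
    """Serialise tabular data as a JSON array with one object per row."""
    records = [dict(zip(columns, row)) for row in rows]
    # ensure_ascii=False keeps names such as "Rhöne" readable in the file
    return json.dumps(records, ensure_ascii=False)


columns = ["location", "city", "water_name"]
rows = [["A l ombre", "Beaucaire", "Rhöne en aval"]]
print(to_records_json(columns, rows))
```

In the browser, the resulting string can be parsed with `JSON.parse` and handed to D3 or a React component as an ordinary array.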

#### Contact 

roger@hammerdirt.ch

#### Imports

In [1]:
# you can add to the imports as you like
# see the requirements.txt file for what's in the env

import os
import scipy.stats
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
import statsmodels.api as sm
import pandas as pd
import patsy
# you will need these to get the data
import requests
import json
# set your file prefix
file_prefix = "lastname_firstname"
notebook_number = "_one"
# get the current working directory
current_directory = os.getcwd()
# folders for your saved data and charts, e.g. lastname_firstname_one_jsons
my_json_folder = '{}{}_jsons'.format(file_prefix, notebook_number)
my_svg_folder = '{}{}_svgs'.format(file_prefix, notebook_number)
# True: reuse previously saved data when possible; False: get fresh data from the API
use_local = False

### <font color = blue>State your purpose</font>

Make it easy to understand what you are trying to show and what the practical applications are.

#### <font color = blue>Briefly</font>

Tell us why you are doing this: studies, personal interest or professional work.

### <font color = blue>Get the data</font>

Getting data is simple: the API is here [https://mwshovel.pythonanywhere.com/](https://mwshovel.pythonanywhere.com/). Pick the endpoint(s) you need, store them in a dictionary and pass the dictionary to the provided script.

The script creates a folder named lastname_firstname_x_jsons (the value of my_json_folder) and saves each response there as a JSON file, using the dictionary keys as file names. You can then access the data locally.

#### <font color = blue>Sorting categories</font> 

There are endpoints that provide grouped locations:

1. https://mwshovel.pythonanywhere.com/api/list-of-beaches/by-category/ : returns an array of grouped locations, one group for each city, body of water and postal-code.

2. https://mwshovel.pythonanywhere.com/api/list-of-beaches/categories/ : returns an array of categories (cities, rivers, lakes and postal codes), each with the list of names in that category.

I will be adding code groups soon! See the sorting examples below.

__You can make your own script__ as long as you use the directory name specified by "my_json_folder".

In [2]:
# sample script for getting data
# you provide this data: the keys become the file names
import json
import os
import requests
requested = {
    "beaches":"https://mwshovel.pythonanywhere.com/api/list-of-beaches/",
    "daily_totals":"https://mwshovel.pythonanywhere.com/api/surveys/daily-totals/",
    "code_defs":"https://mwshovel.pythonanywhere.com/api/mlw-codes/list/",
    "sorted_location":"https://mwshovel.pythonanywhere.com/api/list-of-beaches/by-category/"
}
def directory_name(current_directory, my_json_folder):
    return '{}/{}/'.format(current_directory, my_json_folder)
def make_json_dir(directory_name):
    try:
        os.mkdir(directory_name)
        message = '{} created'.format(directory_name)
        success = True
    except OSError:
        message = 'Unable to create {}'.format(directory_name)
        success = False
    print(message)
    return success
def get_file_names_prefix(a_dict):
    return list(a_dict.keys())
def get_data_from_endpoint(an_endpoint):
    # the timeout stops the notebook from hanging if the server does not answer
    return requests.get(an_endpoint, timeout=30)
def make_file_name(name, file_extension):
    return '{}.{}'.format(name, file_extension)
def make_file_path(my_json_folder, file_name):
    return '{}/{}'.format(my_json_folder, file_name)
def success_string(file_name, save_to):
    return '{} saved to {}'.format(file_name, save_to)
def error_string(file_name, save_to):
    return 'Unable to save {} to {}'.format(file_name, save_to)
def save_a_json_file(save_to, some_data, file_name):
    try:
        with open(save_to, 'w') as fresh_data:
            json.dump(some_data.json(), fresh_data)
        return True, success_string(file_name, save_to)
    except (OSError, ValueError):
        # OSError: bad path; ValueError: the response body was not JSON
        return False, error_string(file_name, save_to)
def get_a_json_file(file_name_directory):
    # open the path that was passed in
    with open(file_name_directory) as json_data:
        return json.load(json_data)
def use_local_data(a_list, jsons_directory, file_extension):
    """
    Retrieves json data from a local directory

    accepts an array of file names (without the extension)
    """
    my_local_data = {}
    for name in a_list:
        file_name = make_file_name(name, file_extension)
        # use the directory argument rather than the global my_json_folder
        file_path = make_file_path(jsons_directory, file_name)
        my_local_data[name] = get_a_json_file(file_path)
    return my_local_data

def get_and_store_api_data(requested, my_json_folder, file_extension, current_directory):
    """
    Gets and saves json data from an endpoint.

    Accepts four variables:

    1. requested: a dictionary of endpoints
    2. my_json_folder: the directory to store the files
    3. file_extension: the extension of the file you are saving
    4. current_directory: the value of os.getcwd()

    Returns a dict where keys = the keys of the "requested" dictionary and
    values = data from the endpoint.

    Saves the successful requests to the directory specified by "my_json_folder"

    If any request is unsuccessful, returns an empty dictionary and prints an error message
    """
    data = {}
    directory = directory_name(current_directory, my_json_folder)
    # create the folder on the first run; stop if that fails
    if not os.path.isdir(directory) and not make_json_dir(directory):
        return data
    file_names = get_file_names_prefix(requested)
    success_count = 0
    for name in file_names:
        some_data = get_data_from_endpoint(requested[name])
        if some_data.status_code != 200:
            print('Sorry there was a {} error'.format(some_data.status_code))
            continue
        file_name = make_file_name(name, file_extension)
        save_to = make_file_path(my_json_folder, file_name)
        success, message = save_a_json_file(save_to, some_data, file_name)
        if success:
            success_count += 1
        else:
            print("There was a filename or directory name error")
        print(message)
    if success_count == len(file_names):
        data = use_local_data(file_names, my_json_folder, file_extension)
    else:
        print("The request was unsuccessful")
    return data

In [3]:
my_jsons = get_and_store_api_data(requested, my_json_folder, "json", current_directory)

beaches.json saved to lastname_firstname_one_jsons/beaches.json
daily_totals.json saved to lastname_firstname_one_jsons/daily_totals.json
code_defs.json saved to lastname_firstname_one_jsons/code_defs.json
sorted_location.json saved to lastname_firstname_one_jsons/sorted_location.json
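The `use_local` flag in the imports cell is not read by the script above. One way to wire it up is sketched below; `load_or_fetch` is a hypothetical helper, and `fetch` is any callable that downloads and returns the data, such as a lambda around `get_and_store_api_data`.

```python
import json
import os


def load_or_fetch(names, json_folder, fetch, use_local=True, file_extension="json"):
    """Return {name: data}, reading saved files when use_local is True.

    names: the keys of the "requested" dictionary
    json_folder: the folder the files were saved to (my_json_folder)
    fetch: a callable that downloads the data and returns it as a dict
    """
    paths = {
        name: os.path.join(json_folder, "{}.{}".format(name, file_extension))
        for name in names
    }
    # only use the local copy if every requested file is already there
    if use_local and all(os.path.isfile(path) for path in paths.values()):
        data = {}
        for name, path in paths.items():
            with open(path) as json_data:
                data[name] = json.load(json_data)
        return data
    return fetch()
```

Usage in this notebook could look like `my_jsons = load_or_fetch(list(requested), my_json_folder, lambda: get_and_store_api_data(requested, my_json_folder, "json", current_directory), use_local=use_local)`.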


#### <font color = blue>Read the requested data in</font>

You can use any method you want to read in the data. The script returns a dictionary of the requested data, keyed by the names you chose, or you can read the saved files directly. For example, this is valid:

In [4]:
def read_the_json(file):
    with open(file) as some_data:
        requested = json.load(some_data)
        return requested

my_beaches = read_the_json("lastname_firstname_one_jsons/beaches.json")
my_daily_totals = read_the_json("lastname_firstname_one_jsons/daily_totals.json")
my_code_defs = read_the_json("lastname_firstname_one_jsons/code_defs.json")
my_sorted_locations = read_the_json("lastname_firstname_one_jsons/sorted_location.json")

# it returns an array of dictionaries
# take a look at the first result
print(my_beaches[0])

{'location': 'A l ombre', 'latitude': '43.81119100', 'longitude': '4.64817800', 'city': 'Beaucaire', 'post': '30032', 'country': 'FR', 'water': 'r', 'water_name': 'Rhöne en aval', 'slug': 'a-l-ombre', 'city_slug': 'beaucaire', 'water_name_slug': 'rhone-en-aval'}


In [5]:
# or directly into a data frame
my_beach_df = pd.read_json("lastname_firstname_one_jsons/beaches.json", orient='records')
my_beach_df.iloc[:2]

| | city | city_slug | country | latitude | location | longitude | post | slug | water | water_name | water_name_slug |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Beaucaire | beaucaire | FR | 43.811191 | A l ombre | 4.648178 | 30032 | a-l-ombre | r | Rhöne en aval | rhone-en-aval |
| 1 | Bern | bern | CH | 46.97101 | aarezufluss_bern_scheurerk | 7.45279 | 3004 | aarezufluss_bern_scheurerk | r | Aare | aare |


### <font color = blue>Sorting data </font>

This study is concerned with the quantity, type and dispersion of litter along Swiss lakes and rivers. Therefore location is a primary sorting category and there is an endpoint dedicated to just that.

The locations have four categories:

1. Lake or river
2. Lake or river name 
3. City name 
4. Postal code

<font color=red>There will be two new categories added: source and basin.</font>

For now, let's use the existing categories to identify all the locations on Thunersee.

#### <font color = blue>Sorting locations by categories</font>

In [6]:
# this is the general output from the sorted locations endpoint:
print(my_sorted_locations[1])

{'location': 'Aare', 'beaches': ['aarezufluss_bern_scheurerk', 'aare_bern_caveltin', 'aare_bern_gerberm', 'aare_bern_scheurerk', 'aare_brugg_buchie', 'aare_elfenau_cataldiv', 'aare_kehrsatz_stolten', 'aare_koniz_hoppej', 'aare_rupperswil_badert', 'aare_solothurn_nguyena', 'aare_suhrespitz_badert']}


In [7]:
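def beaches_per_location(a_list, an_arg):
    # collect every beach listed under the location named an_arg, not just the first one
    return [beach for x in a_list if x['location'] == an_arg for beach in x['beaches']]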
thunersee = beaches_per_location(my_sorted_locations, "Thunersee")
thunersee

['thunersee_spiez_meierd_1']
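If you look up many locations, you could index the list once instead of scanning it for every name. A sketch with a hypothetical `index_by_location` helper; the sample below only imitates the endpoint's structure and is not real API output.

```python
def index_by_location(sorted_locations):
    """Map each location name (city, lake, river or postal code) to its beaches.

    sorted_locations: the list from the by-category endpoint, where each
    item looks like {'location': 'Aare', 'beaches': [...]}.
    If a name appears more than once, the beach lists are combined.
    """
    index = {}
    for group in sorted_locations:
        index.setdefault(group["location"], []).extend(group["beaches"])
    return index


sample = [
    {"location": "Thunersee", "beaches": ["thunersee_spiez_meierd_1"]},
    {"location": "Spiez", "beaches": ["thunersee_spiez_meierd_1"]},
]
location_index = index_by_location(sample)
print(location_index["Thunersee"])  # ['thunersee_spiez_meierd_1']
```

On the real data this would be `index_by_location(my_sorted_locations)["Thunersee"]`.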

In [8]:
# in pandas
my_beach_df.loc[my_beach_df.water_name == "Thunersee"]

| | city | city_slug | country | latitude | location | longitude | post | slug | water | water_name | water_name_slug |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 124 | Spiez | spiez | CH | 46.704437 | thunersee_spiez_meierd_1 | 7.657882 | 3646 | thunersee_spiez_meierd_1 | l | Thunersee | thunersee |
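Without pandas, the same kind of filter can be done in plain Python. A sketch with a hypothetical `slugs_by_field` helper that groups beach slugs by any location field; the pandas equivalent would be `my_beach_df.groupby('water_name')['slug'].apply(list)`.

```python
from collections import defaultdict


def slugs_by_field(beaches, field):
    """Group beach slugs by one of the location fields.

    beaches: the list from the list-of-beaches endpoint (one dict per beach)
    field: a key of those dicts, e.g. "water_name", "city" or "post"
    """
    groups = defaultdict(list)
    for beach in beaches:
        groups[beach[field]].append(beach["slug"])
    return dict(groups)
```

For example, `slugs_by_field(my_beaches, "water_name")["Thunersee"]` lists every Thunersee location.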


#### <font color = blue>That's great, but how do I know the lake or city names?</font>

There is an endpoint for that. 

In [9]:
# make the data local first
# use the endpoint to get the list:
requested = {"cities_lakes":"https://mwshovel.pythonanywhere.com/api/list-of-beaches/categories/"}
my_jsons = get_and_store_api_data(requested, my_json_folder, "json", current_directory)

cities_lakes.json saved to lastname_firstname_one_jsons/cities_lakes.json


In [10]:
lakes = [x['results'] for x in my_jsons['cities_lakes'] if x['category'] == 'lakes'][0]
print(lakes)

['Bielersee', 'Bodensee', 'Greifensee', 'Katzensee', 'Lac Léman', 'Neuenburgersee', 'Quatre Cantons', 'Schiffenensee', 'Sempachsee', 'Sihlsee', 'Thunersee', 'Untersee', 'Walensee', 'Zugersee', 'Zurichsee']
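To see every category at once, you could turn the response into a dictionary. `results_by_category` is a hypothetical helper; only the `'lakes'` key is confirmed by the output above, so print the keys to check the names of the other categories.

```python
def results_by_category(categories):
    """Turn the categories endpoint response into {category: list of names}.

    categories: a list of dicts such as {'category': 'lakes', 'results': [...]}.
    """
    return {item["category"]: item["results"] for item in categories}
```

Usage: `by_category = results_by_category(my_jsons['cities_lakes'])`, then `print(list(by_category))` to see the available categories.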


#### <font color = blue>Sorting by object type, material or use</font>

The code definitions hold the properties of each type of object found. The properties are:

1. Code
2. Material
3. Description
4. Source

<font color=red>There will be three new properties added: single-use and micro-plastic (both boolean) and source_two, an alternate or local source code.</font>

Let's start by looking at one code definition:

In [11]:
# get one code definition from the list
# the list index is not fixed and may change with each query,
# so always look up codes by a property value such as 'code'
print(my_code_defs[101])

{'code': 'G21', 'material': 'Plastic', 'description': 'Drink lids', 'source': 'Packaging'}
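Since list positions can change, a dictionary keyed on the `code` property is one way to follow that advice. A sketch with a hypothetical `index_codes` helper, using two definitions shown in this notebook as sample data.

```python
def index_codes(code_defs):
    """Map each code (e.g. 'G21') to its full definition."""
    return {definition["code"]: definition for definition in code_defs}


sample_defs = [
    {"code": "G21", "material": "Plastic", "description": "Drink lids", "source": "Packaging"},
    {"code": "G141", "material": "Cloth", "description": "Carpet ", "source": "Utility items"},
]
code_lookup = index_codes(sample_defs)
print(code_lookup["G21"]["description"])  # Drink lids
```

On the real data, `index_codes(my_code_defs)["G21"]` returns the same definition regardless of its position in the list.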


In [12]:
# get all the codes with a source of "Utility items"
def objects_per_group(a_list, a_group, an_arg):
    return [x for x in a_list if x[a_group] == an_arg]
utility_items = objects_per_group(my_code_defs, "source", "Utility items")
# print the first object
print(utility_items[0])

{'code': 'G141', 'material': 'Cloth', 'description': 'Carpet ', 'source': 'Utility items'}
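To get an overview before filtering, you could count how many codes fall into each group. `count_codes_by` is a hypothetical helper built on `collections.Counter`; the sample data reuses the two definitions shown above.

```python
from collections import Counter


def count_codes_by(code_defs, a_group):
    """Count the code definitions per value of one property.

    a_group: a property name such as "material" or "source".
    """
    return Counter(definition[a_group] for definition in code_defs)


sample_defs = [
    {"code": "G21", "material": "Plastic", "description": "Drink lids", "source": "Packaging"},
    {"code": "G141", "material": "Cloth", "description": "Carpet ", "source": "Utility items"},
]
print(count_codes_by(sample_defs, "material").most_common())
```

On the real data, `count_codes_by(my_code_defs, "source").most_common(5)` would show the five largest source groups.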
