# Example Use of Project

### This notebook shows how to create a DataFrame with the predicted counts of the selected litter types for the darkzones

1. Create a Conda environment from the [environment.yml](environment.yml) file.
2. Create a Jupyter Notebook or Python script and activate the created environment.
3. Import pandas.
4. Import darkzones (located in the `src` folder).

In [1]:
import pandas as pd
import darkzones

5. Declare a pandas.DataFrame containing the litter count readings. It must contain the following columns:
    - `date_utc` : The date on which the litter count reading was performed
    - `edge_id` : The edge number corresponding to the reading
    - `osm_highway` : The type of street according to OpenStreetMap
    - *litter counts* : At least one column of litter counts with a numerical label (e.g. `21`)
6. **IMPORTANT**: The longer the time frame of the data, the better the prediction model will be (12+ months recommended).

In [2]:
df = pd.read_csv('../../../learn/sit_ds/projects/cortexia/Information/output/datav2.csv', dtype = {'place.id': object})
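
Before training, it can help to confirm that the DataFrame has the columns listed in step 5. The following is a minimal sketch and not part of `darkzones`: the helper name `check_required_columns` and the toy data are hypothetical, and it assumes that litter columns are labelled with digit-only names such as `21`.

```python
import pandas as pd

REQUIRED_COLUMNS = ["date_utc", "edge_id", "osm_highway"]

def check_required_columns(df):
    """Hypothetical helper: return the litter columns, or raise if something is missing."""
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    # Assumption: litter count columns have purely numerical labels (e.g. "21")
    litter_columns = [col for col in df.columns if str(col).isdigit()]
    if not litter_columns:
        raise ValueError("No litter count column with a numerical label found")
    return litter_columns

# Toy data standing in for the real readings
toy_df = pd.DataFrame({
    "date_utc": ["2022-01-22", "2022-01-23"],
    "edge_id": ["(1, 2, 0)", "(3, 4, 0)"],
    "osm_highway": ["residential", "footway"],
    "21": [4, 0],
})
found_litter_columns = check_required_columns(toy_df)
print(found_litter_columns)
```

Running a check like this on the real `df` before step 8 would surface a missing column immediately instead of partway through model fitting.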

7. Declare a list with the numerical labels of the litters to be predicted.
8. Declare a variable called `models` that calls the function [darkzones.train_models()](src/darkzones.py). Pass the following arguments:
    - pandas.DataFrame declared in step 5
    - litters list declared in step 7

In [3]:
litters = ["1", "2", 4, "21"]  # Labels can be either strings or integers
models = darkzones.train_models(df, litters)

The fitting took: 1.0 minutes
Litter 1 D2 Score: 0.5953
#################################
The fitting took: 1.0 minutes
Litter 2 D2 Score: 0.672
#################################
The fitting took: 1.1 minutes
Litter 4 D2 Score: 0.6558
#################################
The fitting took: 1.3 minutes
Litter 21 D2 Score: 0.6742
#################################
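
The log above does not say which deviance the D2 score is based on. As a rough illustration only, the sketch below assumes a Poisson deviance, a common choice for count data, and computes D2 as the fraction of deviance explained relative to a model that always predicts the mean. The function names are hypothetical and this is not necessarily how `darkzones.train_models()` computes its score.

```python
import numpy as np

def poisson_deviance(y_true, y_pred):
    """Mean Poisson deviance; predictions must be strictly positive."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # The term y * log(y / mu) is taken as 0 when y == 0
    ratio = np.where(y_true > 0, y_true / y_pred, 1.0)
    return 2.0 * np.mean(y_true * np.log(ratio) - y_true + y_pred)

def d2_score(y_true, y_pred):
    """Fraction of deviance explained relative to a constant mean prediction."""
    y_true = np.asarray(y_true, dtype=float)
    baseline = np.full_like(y_true, y_true.mean())
    return 1.0 - poisson_deviance(y_true, y_pred) / poisson_deviance(y_true, baseline)

counts = [1, 2, 3]
perfect_score = d2_score(counts, [1, 2, 3])    # predictions match exactly
baseline_score = d2_score(counts, [2, 2, 2])   # always predicts the mean
print(perfect_score, baseline_score)
```

Under this definition a score of 1 means perfect predictions and 0 means no improvement over always predicting the mean, so values around 0.6 to 0.67 would indicate that the models explain roughly two thirds of the deviance.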


9. Define a variable that calls the function [darkzones.predict_darkzones()](src/darkzones.py). Pass the following arguments:
    - pandas.DataFrame declared in step 5
    - models dictionary declared in step 8

In [4]:
df_darkzones = darkzones.predict_darkzones(df, models)

### Returns a pandas DataFrame with the darkzones for each individual day, including the predicted litter counts

In [5]:
# For privacy reasons, the edge identifiers are dropped or randomized before the results are displayed

df_darkzones.drop('edge_osmid', axis=1, inplace=True)

import random

def randomize_edge(edge_id):
    """Replace both node IDs in an edge string with random numbers of the same digit length."""
    # edge_id looks like "(612122553, 8087012472, 0)"; keep only the two node IDs
    node_u, node_v = edge_id[1:-1].split(', ')[0:2]
    edge1 = random.randint(10**(len(node_u) - 1), 10**len(node_u) - 1)
    edge2 = random.randint(10**(len(node_v) - 1), 10**len(node_v) - 1)
    return f"({edge1}, {edge2}, 0)"

df_darkzones['edge_id'] = df_darkzones['edge_id'].apply(randomize_edge)
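
The cell above draws new random numbers for every row, so the same edge can receive different pseudonyms on different days. If a consistent mapping is wanted, one option is to randomize each unique edge once and map the result back onto the column. This is a sketch on toy data: `pseudonymize_edges` is a hypothetical helper and not part of the project, and two distinct edges could still collide on the same pseudonym by chance.

```python
import random
import pandas as pd

def randomize_edge(edge_id):
    """Replace both node IDs with random numbers of the same digit length."""
    node_u, node_v = edge_id[1:-1].split(", ")[0:2]
    edge1 = random.randint(10 ** (len(node_u) - 1), 10 ** len(node_u) - 1)
    edge2 = random.randint(10 ** (len(node_v) - 1), 10 ** len(node_v) - 1)
    return f"({edge1}, {edge2}, 0)"

def pseudonymize_edges(edge_ids):
    """Hypothetical helper: give each unique edge exactly one random pseudonym."""
    mapping = {edge: randomize_edge(edge) for edge in edge_ids.unique()}
    return edge_ids.map(mapping)

# Toy edge IDs; the first and third rows refer to the same edge
toy_edges = pd.Series(["(123, 4567, 0)", "(89, 12, 0)", "(123, 4567, 0)"])
pseudonyms = pseudonymize_edges(toy_edges)
print(pseudonyms)
```

With this approach, rows that shared an edge before randomization still share one afterwards, which keeps per-edge comparisons across days possible.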

In [7]:
df_darkzones.sample(n=20)

| | date_utc | edge_id | osm_highway | row_type | 1 | 2 | 4 | 21 |
|---|---|---|---|---|---|---|---|---|
| 1478199 | 2022-06-02 | (612122553, 8087012472, 0) | tertiary | darkzone | 6 | 0 | 2 | 0 |
| 1277808 | 2022-05-11 | (434299738, 195095286, 0) | unclassified | darkzone | 2 | 0 | 1 | 0 |
| 215013 | 2022-01-22 | (104637446, 213910470, 0) | residential | darkzone | 4 | 9 | 19 | 0 |
| 1747566 | 2022-06-27 | (229256301, 297114410, 0) | unclassified | darkzone | 5 | 0 | 5 | 10 |
| 1388459 | 2022-05-23 | (350520403, 757062251, 0) | footway | darkzone | 1 | 0 | 1 | 0 |
| 622352 | 2022-03-06 | (2705343977, 2862166187, 0) | footway | darkzone | 1 | 1 | 1 | 0 |
| 215235 | 2022-01-22 | (705334202, 8055447201, 0) | service | darkzone | 1 | 4 | 4 | 0 |
| 748329 | 2022-03-20 | (5999594242, 3896736151, 0) | primary | darkzone | 2 | 2 | 3 | 0 |
| 439929 | 2022-02-15 | (4904947272, 540186138, 0) | residential | darkzone | 1 | 33 | 3 | 0 |
| 1705955 | 2022-06-23 | (8249106277, 7395406163, 0) | footway | darkzone | 0 | 0 | 1 | 0 |
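
A typical follow-up is to look at the edges with the highest predicted count for one litter type on a given day. The sketch below uses toy data with the same column layout as the sample above. It assumes that `date_utc` is stored as a string and that litter columns are labelled with strings such as `"1"`; `top_edges_for_day` is a hypothetical helper.

```python
import pandas as pd

# Toy data mimicking the layout of df_darkzones
toy_darkzones = pd.DataFrame({
    "date_utc": ["2022-06-02", "2022-06-02", "2022-06-02", "2022-06-03"],
    "edge_id": ["(11, 22, 0)", "(33, 44, 0)", "(55, 66, 0)", "(11, 22, 0)"],
    "osm_highway": ["tertiary", "footway", "residential", "tertiary"],
    "row_type": ["darkzone"] * 4,
    "1": [6, 1, 4, 2],
})

def top_edges_for_day(df, day, litter, n=2):
    """Hypothetical helper: the n edges with the highest predicted count on one day."""
    one_day = df[df["date_utc"] == day]
    return one_day.nlargest(n, litter)[["edge_id", "osm_highway", litter]]

top = top_edges_for_day(toy_darkzones, "2022-06-02", "1")
print(top)
```

If the real column labels turn out to be integers, pass `1` instead of `"1"` as the `litter` argument.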


In [6]:
import helper_scripts.data_processor as data_processor
original_df = data_processor.clean_df(df)
original_df = data_processor.aggregate_df(original_df)
print(f"Length of original dataframe: {len(original_df)}")
print(f"Length of darkzones dataframe: {len(df_darkzones)}")

Length of original dataframe: 598807
Length of darkzones dataframe: 1782723
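
To put the two lengths in perspective, the short calculation below uses the numbers printed above. The interpretation that the additional rows are edge and day combinations without an original reading is an assumption, not something the output confirms.

```python
# Row counts copied from the printed output above
original_rows = 598807
darkzone_rows = 1782723

extra_rows = darkzone_rows - original_rows
ratio = darkzone_rows / original_rows

print(f"Additional rows: {extra_rows}")   # 1183916
print(f"Size ratio: {ratio:.2f}")         # 2.98
```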
