## Analysis exploration

Now that we have explore in the previous notebooks the workflow for the four different indicators, we are going to produce mock data to star exploring graphs and visualizations. The workflow that we are going to use would be covered in the table of contents:

## Table of Contents
- ### [Python libraries](#libraries)
- ### [1. Explore mock data](#importData)
- ### [2. Default crop and pasture data](#crop_data)
- ### [3. Generate risk map](#risk)
    - #### [3.1 Unsustainable water use risk](#waterRisk)
    - #### [3.2 Deforestation risk](#deforestationRisk)
    - #### [3.3 Carbon emissions due to land use change risk](#carbonRisk)
    - #### [3.4 Biodiverstity loss due to land use change risk](#biodiversityRisk)
- ### [4. Get metrics for user data](#metric)
- ### [5. Final notes](#finalNotes)

<a id='libraries'></a>
## Python libraries

In [249]:
# import libraries
# Data
from collections import Counter
from math import pi

import geopandas as gpd
import pandas as pd
import pandas_bokeh
from bokeh.io import show
from bokeh.models import ColumnDataSource

# Create Bokeh-Table with DataFrame:
from bokeh.models.widgets import DataTable, TableColumn
from bokeh.palettes import BuGn, Spectral10
from bokeh.plotting import figure
from bokeh.transform import cumsum, dodge
from geopandas.tools import sjoin

pandas_bokeh.output_notebook()

<a id='importData'></a>
## 1. Explore mock data

In [10]:
!ls ../../datasets/processed/user_data/located_lg_data_point_v2.shp

../../datasets/processed/user_data/located_lg_data_point_v2.shp


In [2]:
mock_data = gpd.read_file(
    "../../datasets/processed/processed_data/located_lg_data_polygon_v2_stats.shp"
)
mock_data.head()

Unnamed: 0,Material,Material d,Volume,Country,Address,Latitude,Longitude,Location t,Accuracy,wr_mean,...,cr_median,cr_std,cr_min,cr_max,bio_mean,bio_median,bio_std,bio_min,bio_max,geometry
0,Rubber,,2400,China,,,,Unknown,Low,1.686943,...,0.0,8.8491e-11,0.0,2.813154e-08,0.0,0.0,0.0,0.0,1.4e-14,"MULTIPOLYGON (((73.49973 39.38174, 73.50468 39..."
1,Rubber,,1300,Malaysia,,,,Unknown,Low,226.718707,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"MULTIPOLYGON (((98.93721 5.68384, 98.93771 5.6..."
2,Rubber,,1000,United States,,,,Unknown,Low,0.088295,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"MULTIPOLYGON (((-180.00000 51.79409, -180.0000..."
3,Rubber,,730,Japan,,,,Unknown,Low,,...,,,,,,,,,,"MULTIPOLYGON (((122.71418 24.44983, 122.71457 ..."
4,Rubber,,490,India,,,,Unknown,Low,45.279566,...,31.787052,11537.2,-243.486969,650395.0,2.728085e-08,4.16822e-10,7.774598e-08,-5.64263e-10,7.001327e-07,"MULTIPOLYGON (((68.11138 23.60145, 68.13528 23..."


In [4]:
## calulate impact


# calculate water risk impact
wr_impact = [row["Volume"] * row["wr_mean"] for i, row in mock_data.iterrows()]
wr_impact_min = [row["Volume"] * row["wr_min"] for i, row in mock_data.iterrows()]
wr_impact_max = [row["Volume"] * row["wr_max"] for i, row in mock_data.iterrows()]

# calculate deforestation impact
df_impact = [row["Volume"] * row["df_mean"] for i, row in mock_data.iterrows()]
df_impact_min = [row["Volume"] * row["df_min"] for i, row in mock_data.iterrows()]
df_impact_max = [row["Volume"] * row["df_max"] for i, row in mock_data.iterrows()]

# calculate carbon impacts
cr_impact = [row["Volume"] * row["cr_mean"] for i, row in mock_data.iterrows()]
cr_impact_min = [row["Volume"] * row["cr_min"] for i, row in mock_data.iterrows()]
cr_impact_max = [row["Volume"] * row["cr_max"] for i, row in mock_data.iterrows()]

# calculate biodiversity impacts
bio_impacts = [row["Volume"] * row["bio_mean"] for i, row in mock_data.iterrows()]
bio_impacts_min = [row["Volume"] * row["bio_min"] for i, row in mock_data.iterrows()]
bio_impacts_max = [row["Volume"] * row["bio_max"] for i, row in mock_data.iterrows()]


##append to dataframe
mock_data["wr_imp"] = wr_impact
mock_data["wr_imp_min"] = wr_impact_min
mock_data["wr_imp_max"] = wr_impact_max
mock_data["df_imp"] = df_impact
mock_data["df_imp_min"] = df_impact_min
mock_data["df_imp_max"] = df_impact_max
mock_data["cr_imp"] = cr_impact
mock_data["cr_imp_min"] = cr_impact_min
mock_data["cr_imp_max"] = cr_impact_max
mock_data["bio_imp"] = bio_impacts
mock_data["bio_imp_min"] = bio_impacts_min
mock_data["bio_imp_max"] = bio_impacts_max


mock_data.head()

Unnamed: 0,Material,Material d,Volume,Country,Address,Latitude,Longitude,Location t,Accuracy,wr_mean,...,wr_imp_max,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max
0,Rubber,,2400,China,,,,Unknown,Low,1.686943,...,140983.6,,,,6.672e-10,0.0,6.75157e-05,0.0,0.0,3.36e-11
1,Rubber,,1300,Malaysia,,,,Unknown,Low,226.718707,...,2533535.0,,,,0.0,0.0,0.0,0.0,0.0,0.0
2,Rubber,,1000,United States,,,,Unknown,Low,0.088295,...,1150.512,,,,0.0,0.0,0.0,0.0,0.0,0.0
3,Rubber,,730,Japan,,,,Unknown,Low,,...,,,,,,,,,,
4,Rubber,,490,India,,,,Unknown,Low,45.279566,...,1437854.0,71.314111,8.5e-05,1854.204938,640589.1,-119308.614807,318693600.0,1.3e-05,-2.764889e-07,0.000343065


In [6]:
# export dataframe

mock_data.to_file(
    "../../datasets/processed/processed_data/located_lg_data_polygon_v2_stats_impacts.shp",
    driver="ESRI Shapefile",
)

  mock_data.to_file('../../datasets/processed/processed_data/located_lg_data_polygon_v2_stats_impacts.shp',driver='ESRI Shapefile')


### General charts:

 - We can provide information about total volume bought by country or continent
 - percentage of material provided - to see your top materials
 - percentage of different types of locations - to see your general kowledge of your current supply chain
 - top countries by material provided
 

In [9]:
continents = gpd.read_file("../../datasets/raw/input_data_test/continents.shp")
continents

Unnamed: 0,OBJECTID,CONTINENT,SQMI,SQKM,Shape_Leng,Shape_Area,geometry
0,1.0,Africa,11583460.0,30001150.0,426.208612,2559.073098,"MULTIPOLYGON (((35.488 -21.685, 35.452 -21.787..."
1,2.0,Asia,17317280.0,44851730.0,2331.623746,5432.085227,"MULTIPOLYGON (((150.894 -10.649, 150.881 -10.6..."
2,3.0,Australia,2973612.0,7701651.0,252.165311,695.539921,"MULTIPOLYGON (((158.882 -54.711, 158.880 -54.7..."
3,4.0,North America,9339528.0,24189360.0,3954.89243,3707.418684,"MULTIPOLYGON (((-81.678 7.389, -81.649 7.384, ..."
4,5.0,Oceania,165678.7,429107.6,221.581942,42.56547,"MULTIPOLYGON (((169.186 -52.577, 169.162 -52.5..."
5,6.0,South America,6856255.0,17757690.0,622.552582,1539.312933,"MULTIPOLYGON (((-67.209 -55.891, -67.247 -55.8..."
6,7.0,Antarctica,4754809.0,12314950.0,1587.227698,6034.461899,"MULTIPOLYGON (((51.803 -46.457, 51.721 -46.453..."
7,8.0,Europe,3821854.0,9898597.0,1596.706533,1444.638513,"MULTIPOLYGON (((23.849 35.523, 23.971 35.515, ..."


In [11]:
## join dataframes for getting stats at continent level
join_df = sjoin(continents, mock_data, how="inner")
join_df.head()

Unnamed: 0,OBJECTID,CONTINENT,SQMI,SQKM,Shape_Leng,Shape_Area,geometry,index_right,Material,Material d,...,wr_imp_max,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max
0,1.0,Africa,11583460.0,30001150.0,426.208612,2559.073098,"MULTIPOLYGON (((35.488 -21.685, 35.452 -21.787...",10,Rubber,,...,2723210.0,-318.649197,-1226.595855,-99.844961,-1013901.0,-11037210.0,-397.438571,-9e-06,-3.5e-05,-3e-06
0,1.0,Africa,11583460.0,30001150.0,426.208612,2559.073098,"MULTIPOLYGON (((35.488 -21.685, 35.452 -21.787...",7,Rubber,,...,30087.42,,,,0.0,0.0,0.0,0.0,0.0,0.0
0,1.0,Africa,11583460.0,30001150.0,426.208612,2559.073098,"MULTIPOLYGON (((35.488 -21.685, 35.452 -21.787...",45,Leather,,...,27667730.0,,,,0.0,0.0,0.0,0.0,0.0,0.0
1,2.0,Asia,17317280.0,44851730.0,2331.623746,5432.085227,"MULTIPOLYGON (((150.894 -10.649, 150.881 -10.6...",27,Cotton,,...,2753818.0,,,,0.0,0.0,0.0,0.0,0.0,0.0
7,8.0,Europe,3821854.0,9898597.0,1596.706533,1444.638513,"MULTIPOLYGON (((23.849 35.523, 23.971 35.515, ...",27,Cotton,,...,2753818.0,,,,0.0,0.0,0.0,0.0,0.0,0.0


In [36]:
# group by country
countries_volume_df = join_df.groupby("Country").sum()
##add geometry
country_geoms = countries_volume_df.merge(
    country_geoms, right_on="Country", left_on="Country", how="inner"
).drop_duplicates()

country_geoms.head()

Unnamed: 0,Country,OBJECTID,SQMI,SQKM,Shape_Leng,Shape_Area,index_right,Volume,wr_mean,wr_median,...,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max,geometry
0,Argentina,6.0,6856255.0,17757690.0,622.552582,1539.312933,41,140,19440.327613,5754.687988,...,-178688.4,-463444.3,71831.791992,-19507750.0,-3062715000.0,863128.5,-0.001845754,-0.03108784,0.001596102,"MULTIPOLYGON (((-73.56054 -49.94247, -73.56052..."
1,Australia,43.0,59343600.0,153699800.0,9541.693325,26495.924796,403,17790,789720.133852,173274.990162,...,-40356470.0,-64446210.0,-71.914848,-7183121000.0,-138406800000.0,-5572.344,-0.7299327,-9.130121,-8.992352e-07,"POLYGON ((149.30875 -29.35461, 149.31280 -29.3..."
2,Australia,43.0,59343600.0,153699800.0,9541.693325,26495.924796,403,17790,789720.133852,173274.990162,...,-40356470.0,-64446210.0,-71.914848,-7183121000.0,-138406800000.0,-5572.344,-0.7299327,-9.130121,-8.992352e-07,"MULTIPOLYGON (((72.24619 -53.02073, 72.24644 -..."
3,Australia,43.0,59343600.0,153699800.0,9541.693325,26495.924796,403,17790,789720.133852,173274.990162,...,-40356470.0,-64446210.0,-71.914848,-7183121000.0,-138406800000.0,-5572.344,-0.7299327,-9.130121,-8.992352e-07,"MULTIPOLYGON (((140.99926 -28.99910, 141.04221..."
4,Bangladesh,2.0,17317280.0,44851730.0,2331.623746,5432.085227,16,1400,8.558963,6.08812,...,-0.5223351,-7.158305,-0.003407,-16306.3,-251792.9,2655120.0,-2.1518e-08,-4.02766e-07,3.621667e-05,"MULTIPOLYGON (((88.00791 24.66782, 88.02667 24..."


In [38]:
country_geoms = country_geoms.drop_duplicates(subset=["Country"])
country_geoms.head()

Unnamed: 0,Country,OBJECTID,SQMI,SQKM,Shape_Leng,Shape_Area,index_right,Volume,wr_mean,wr_median,...,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max,geometry
0,Argentina,6.0,6856255.0,17757690.0,622.552582,1539.312933,41,140,19440.327613,5754.687988,...,-178688.4,-463444.3,71831.79,-19507750.0,-3062715000.0,863128.5,-0.001845754,-0.03108784,0.001596102,"MULTIPOLYGON (((-73.56054 -49.94247, -73.56052..."
1,Australia,43.0,59343600.0,153699800.0,9541.693325,26495.924796,403,17790,789720.133852,173274.990162,...,-40356470.0,-64446210.0,-71.91485,-7183121000.0,-138406800000.0,-5572.344,-0.7299327,-9.130121,-8.992352e-07,"POLYGON ((149.30875 -29.35461, 149.31280 -29.3..."
4,Bangladesh,2.0,17317280.0,44851730.0,2331.623746,5432.085227,16,1400,8.558963,6.08812,...,-0.5223351,-7.158305,-0.003407447,-16306.3,-251792.9,2655120.0,-2.1518e-08,-4.02766e-07,3.621667e-05,"MULTIPOLYGON (((88.00791 24.66782, 88.02667 24..."
5,Brazil,12.0,13712510.0,35515380.0,1245.105164,3078.625867,60,2080,17670.023677,3418.011529,...,-5533917.0,-17405040.0,-3.168394e-07,-1899072000.0,-207159900000.0,-1.05456e-09,-0.09299721,-13.20976,0.0,"MULTIPOLYGON (((-73.98306 -7.53473, -73.98303 ..."
6,Burundi,1.0,11583460.0,30001150.0,426.208612,2559.073098,45,680,40687.835938,40687.835938,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"POLYGON ((29.28880 -3.34510, 29.28942 -3.34421..."


In [22]:
# group by continent
# group by country
continents_volume_df = join_df.groupby("CONTINENT").sum()
# add geometry
continents_geom = continents[["CONTINENT", "geometry"]].merge(
    continents_volume_df, right_on="CONTINENT", left_on="CONTINENT", how="inner"
)

continents_geom.head()

Unnamed: 0,CONTINENT,geometry,OBJECTID,SQMI,SQKM,Shape_Leng,Shape_Area,index_right,Volume,wr_mean,...,wr_imp_max,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max
0,Africa,"MULTIPOLYGON (((35.488 -21.685, 35.452 -21.787...",3.0,34750390.0,90003450.0,1278.625835,7677.219293,62,4080,40892.06122,...,30421030.0,-318.6492,-1226.596,-99.844961,-1013901.0,-11037210.0,-397.4386,-9e-06,-3.5e-05,-2.865552e-06
1,Asia,"MULTIPOLYGON (((150.894 -10.649, 150.881 -10.6...",62.0,536835700.0,1390404000.0,72280.336121,168394.642052,592,54805,214176.870475,...,7442242000.0,-4257004.0,-8669441.0,435598.901801,-380535800.0,-28360140000.0,3486245000.0,-0.046769,-1.805112,0.07272118
2,Australia,"MULTIPOLYGON (((158.882 -54.711, 158.880 -54.7...",15.0,14868060.0,38508260.0,1260.826554,3477.699603,187,8640,501915.788946,...,18881180000.0,-30455220.0,-46564960.0,-71.913369,-5800737000.0,-56671040000.0,-5219.47,-0.593691,-3.807598,-8.992352e-07
3,North America,"MULTIPOLYGON (((-81.678 7.389, -81.649 7.384, ...",20.0,46697640.0,120946800.0,19774.462151,18537.09342,131,8925,57259.365618,...,540988200.0,-9700.556,-98416.48,0.0,-20444400.0,-1272256000.0,0.0,-0.000242,-0.012346,0.0
4,Oceania,"MULTIPOLYGON (((169.186 -52.577, 169.162 -52.5...",25.0,828393.6,2145538.0,1107.909709,212.827352,131,8850,123182.034024,...,2919394000.0,-3300416.0,-5960416.0,-0.000493,-460794900.0,-27245250000.0,-117.6247,-0.045414,-1.774174,0.0


In [37]:
# export for visualizatuon in qgis
country_geoms.to_file(
    "../../datasets/processed/processed_data/test_vis/country_sum.shp", driver="ESRI Shapefile"
)
continents_geom.to_file(
    "../../datasets/processed/processed_data/test_vis/continents_sum.shp", driver="ESRI Shapefile"
)

  country_geoms.to_file('../../datasets/processed/processed_data/test_vis/country_sum.shp', driver='ESRI Shapefile')
  continents_geom.to_file('../../datasets/processed/processed_data/test_vis/continents_sum.shp', driver='ESRI Shapefile')


In [66]:
country_geoms = country_geoms.sort_values("Volume", ascending=True)
country_geoms.head()

Unnamed: 0,Country,OBJECTID,SQMI,SQKM,Shape_Leng,Shape_Area,index_right,Volume,wr_mean,wr_median,...,df_imp,df_imp_min,df_imp_max,cr_imp,cr_imp_min,cr_imp_max,bio_imp,bio_imp_min,bio_imp_max,geometry
7,Canada,4.0,9339528.0,24189360.0,3954.89243,3707.418684,42,125,29975.149161,13345.055664,...,-9700.556284,-98416.481018,0.0,-20444400.0,-1272256000.0,0.0,-0.000242,-0.012346,0.0,"MULTIPOLYGON (((-141.00275 69.70417, -141.0027..."
0,Argentina,6.0,6856255.0,17757690.0,622.552582,1539.312933,41,140,19440.327613,5754.687988,...,-178688.436217,-463444.262695,71831.791992,-19507750.0,-3062715000.0,863128.5,-0.001846,-0.031088,0.001596,"MULTIPOLYGON (((-73.56054 -49.94247, -73.56052..."
21,Korea,2.0,17317280.0,44851730.0,2331.623746,5432.085227,37,160,409.754392,163.833557,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"MULTIPOLYGON (((124.37273 37.90986, 124.37350 ..."
31,Uzbekistan,2.0,17317280.0,44851730.0,2331.623746,5432.085227,23,600,147.928929,28.563921,...,98.808917,0.327866,305.552316,264671.8,0.0,1726761.0,4e-06,0.0,1.4e-05,"MULTIPOLYGON (((55.99890 44.43675, 55.99893 45..."
6,Burundi,1.0,11583460.0,30001150.0,426.208612,2559.073098,45,680,40687.835938,40687.835938,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,"POLYGON ((29.28880 -3.34510, 29.28942 -3.34421..."


In [74]:
country = list(country_geoms["Country"])
volume = list(country_geoms["Volume"])

# represent top countries by volume
data = {"country": country, "volume": volume}

source = ColumnDataSource(data)

p = figure(
    y_range=country,
    x_range=(0, 26800),
    plot_width=250,
    title="Top countries by volume (Tonnes)",
    toolbar_location=None,
    tools="",
)

p.hbar(
    y=dodge("country", 0, range=p.y_range),
    right="volume",
    height=0.2,
    source=source,
    color="#c9d9d3",
)

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None

# represent top countries by percentage
volume_pct = [round((val * 100) / sum(volume), 2) for val in volume]

data_pct = {"country": country, "volume_pct": volume_pct}

source_pct = ColumnDataSource(data_pct)

p_pct = figure(
    y_range=country,
    x_range=(0, 100),
    plot_width=250,
    title="Top countries by volume (%)",
    toolbar_location=None,
    tools="",
)

p_pct.hbar(
    y=dodge("country", 0, range=p.y_range),
    right="volume_pct",
    height=0.2,
    source=source_pct,
    color="#c9d9d3",
)

p_pct.y_range.range_padding = 0.1
p_pct.ygrid.grid_line_color = None


# Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p, p_pct]], plot_width=450)

## by material:

In [119]:
# group by materials
# group by material
risk_material = mock_data.groupby("Material").sum()

## volume dataframe
materials = list(risk_material.index)
volumens = list(risk_material["Volume"])
df = gpd.GeoDataFrame()
df["materials"] = materials
df["volume"] = volumens  # Create Bokeh-Table with DataFrame:
from bokeh.models import ColumnDataSource
from bokeh.models.widgets import DataTable, TableColumn

data_table = DataTable(
    columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
    source=ColumnDataSource(df),
    height=300,
)

p_bar = risk_material[["Volume"]].plot_bokeh(
    kind="bar",
    title="Total volume purchased by Material",
    show_figure=False,
)

# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_bar]], plot_width=300, plot_height=350)

In [126]:
x = Counter({"Cotton": 48745, "Rubber": 19510, "Leather": 8155})

data = (
    pd.DataFrame.from_dict(dict(x), orient="index")
    .reset_index()
    .rename(index=str, columns={0: "value", "index": "Commodity"})
)
data["angle"] = data["value"] / sum(x.values()) * 2 * pi
data["color"] = BuGn[len(x)]

# Plotting code

p = figure(
    plot_height=350,
    title="Purchased volume by material (tonnes)",
    toolbar_location=None,
    tools="hover",
    tooltips=[("Commodity", "@Commodity"), ("Value", "@value")],
)

p.annular_wedge(
    x=0,
    y=1,
    inner_radius=0.2,
    outer_radius=0.4,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    legend="Commodity",
    source=data,
)

p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None

show(p)



In [134]:
# group by material and country
group_m_c = mock_data.groupby(["Country", "Material"]).sum()[["Volume"]].sort_values("Volume")
group_m_c

Unnamed: 0_level_0,Unnamed: 1_level_0,Volume
Country,Material,Unnamed: 2_level_1
Canada,Leather,125
Argentina,Leather,140
Korea,Leather,160
Thailand,Leather,180
Vietnam,Leather,260
Brazil,Leather,480
Uzbekistan,Cotton,600
Burundi,Leather,680
Japan,Rubber,730
Italy,Leather,790


In [171]:
# countries = list(set(sorted_.index.get_level_values(0)))
commodities = ["Cotton", "Rubber", "Leather"]

data = {
    "countries": [
        "Argentina",
        "Australia",
        "Bangladesh",
        "Brazil",
        "Burundi",
        "Canada",
        "China",
        "Cote d'Ivoire",
        "Greece",
        "India",
        "Indonesia",
        "Italy",
        "Japan",
        "Korea",
        "Liberia",
        "Malaysia",
        "Thailand",
        "Turkey",
        "United States",
        "United states",
        "Uzbekistan",
        "Vietnam",
    ],
    "cotton": [
        0,
        5900,
        1400,
        1600,
        0,
        0,
        22600,
        0,
        3300,
        2545,
        0,
        0,
        0,
        0,
        0,
        0,
        1200,
        0,
        7000,
        600,
        2600,
    ],
    "rubber": [
        0,
        0,
        0,
        0,
        0,
        0,
        2400,
        1100,
        0,
        1690,
        2600,
        0,
        730,
        0,
        2300,
        2040,
        4840,
        0,
        1000,
        0,
        810,
    ],
    "leather": [
        140,
        2740,
        0,
        480,
        680,
        125,
        1800,
        0,
        0,
        0,
        0,
        790,
        0,
        160,
        0,
        0,
        180,
        0,
        800,
        0,
        260,
    ],
}

source = ColumnDataSource(data=data)

p = figure(
    y_range=countries,
    x_range=(0, 22600),
    plot_width=400,
    title="Commodities bought by Country",
    toolbar_location=None,
    tools="",
)

p.hbar(
    y=dodge("countries", -0.25, range=p.y_range),
    right="cotton",
    height=0.2,
    source=source,
    color="#c9d9d3",
)

p.hbar(
    y=dodge("countries", 0.0, range=p.y_range),
    right="rubber",
    height=0.2,
    source=source,
    color="#718dbf",
)

p.hbar(
    y=dodge("countries", 0.25, range=p.y_range),
    right="leather",
    height=0.2,
    source=source,
    color="#e84d60",
)

p.y_range.range_padding = 0.1
p.ygrid.grid_line_color = None

show(p)



## by location type:

In [130]:
# group by materials
# group by material
risk_lt = mock_data.groupby("Location t").sum()

## volume dataframe
lt = list(risk_lt.index)
volumens = list(risk_lt["Volume"])
df = gpd.GeoDataFrame()
df["location"] = lt
df["volume"] = volumens  # Create Bokeh-Table with DataFrame:


data_table = DataTable(
    columns=[TableColumn(field=Ci, title=Ci) for Ci in df.columns],
    source=ColumnDataSource(df),
    height=300,
)

p_bar = risk_material[["Volume"]].plot_bokeh(
    kind="bar",
    title="Total volume purchased by Location type",
    show_figure=False,
)

# Combine Table and Scatterplot via grid layout:
pandas_bokeh.plot_grid([[data_table, p_bar]], plot_width=300, plot_height=350)

In [133]:
x = Counter({"Origin country": 19345, "Origin supplier": 39110, "Unknown": 17955})

data = (
    pd.DataFrame.from_dict(dict(x), orient="index")
    .reset_index()
    .rename(index=str, columns={0: "value", "index": "Location"})
)
data["angle"] = data["value"] / sum(x.values()) * 2 * pi
data["color"] = BuGn[len(x)]

# Plotting code

p = figure(
    plot_height=350,
    title="Purchased volume by location type (tonnes)",
    toolbar_location=None,
    tools="hover",
    tooltips=[("Location", "@Location"), ("Value", "@value")],
)

p.annular_wedge(
    x=0,
    y=1,
    inner_radius=0.2,
    outer_radius=0.4,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    legend="Location",
    source=data,
)

p.axis.axis_label = None
p.axis.visible = False
p.grid.grid_line_color = None

show(p)



## water impacts:

In [175]:
water_risk_impact = mock_data[
    [
        "Material",
        "Volume",
        "Country",
        "Location t",
        "Accuracy",
        "wr_mean",
        "wr_median",
        "wr_std",
        "wr_max",
        "wr_min",
        "geometry",
        "wr_imp",
        "wr_imp_min",
        "wr_imp_max",
    ]
]
water_risk_impact.head()

Unnamed: 0,Material,Volume,Country,Location t,Accuracy,wr_mean,wr_median,wr_std,wr_max,wr_min,geometry,wr_imp,wr_imp_min,wr_imp_max
0,Rubber,2400,China,Unknown,Low,1.686943,0.1375327,3.525667,58.743172,1e-15,"MULTIPOLYGON (((73.49973 39.38174, 73.50468 39...",4048.66338,2.4e-12,140983.6
1,Rubber,1300,Malaysia,Unknown,Low,226.718707,72.63023,320.896829,1948.873047,0.1174634,"MULTIPOLYGON (((98.93721 5.68384, 98.93771 5.6...",294734.319486,152.7024,2533535.0
2,Rubber,1000,United States,Unknown,Low,0.088295,1.533124e-09,0.287133,1.150512,1.08753e-10,"MULTIPOLYGON (((-180.00000 51.79409, -180.0000...",88.295126,1.08753e-07,1150.512
3,Rubber,730,Japan,Unknown,Low,,,,,,"MULTIPOLYGON (((122.71418 24.44983, 122.71457 ...",,,
4,Rubber,490,India,Unknown,Low,45.279566,0.4625767,181.590908,2934.39502,1.358522e-09,"MULTIPOLYGON (((68.11138 23.60145, 68.13528 23...",22186.987158,6.656758e-07,1437854.0


In [195]:
# water risk by material
water_risk_impact_material = water_risk_impact.groupby("Material").sum()
water_risk_impact_material

Unnamed: 0_level_0,Volume,wr_mean,wr_median,wr_std,wr_max,wr_min,wr_imp,wr_imp_min,wr_imp_max
Material,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Cotton,48745,1559.834051,704.584181,3873.8,99755.64,201.863288,3655594.0,564247.5,147912100.0
Leather,8155,767231.370025,240862.741302,1607439.0,25585580.0,40688.022647,787721400.0,27667910.0,24663150000.0
Rubber,19510,1783.558232,935.331831,2194.707,16029.77,359.796898,2190081.0,278276.6,20652440.0


In [203]:
water_risk_material = water_risk_impact_material.sort_values("wr_mean", ascending=True)
water_risk_material

Unnamed: 0_level_0,Volume,wr_mean,wr_median,wr_std,wr_max,wr_min,wr_imp,wr_imp_min,wr_imp_max
Material,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Cotton,48745,1559.834051,704.584181,3873.8,99755.64,201.863288,3655594.0,564247.5,147912100.0
Rubber,19510,1783.558232,935.331831,2194.707,16029.77,359.796898,2190081.0,278276.6,20652440.0
Leather,8155,767231.370025,240862.741302,1607439.0,25585580.0,40688.022647,787721400.0,27667910.0,24663150000.0


In [204]:
water_impact_material = water_risk_impact_material.sort_values("wr_imp", ascending=True)
water_impact_material

Unnamed: 0_level_0,Volume,wr_mean,wr_median,wr_std,wr_max,wr_min,wr_imp,wr_imp_min,wr_imp_max
Material,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
Rubber,19510,1783.558232,935.331831,2194.707,16029.77,359.796898,2190081.0,278276.6,20652440.0
Cotton,48745,1559.834051,704.584181,3873.8,99755.64,201.863288,3655594.0,564247.5,147912100.0
Leather,8155,767231.370025,240862.741302,1607439.0,25585580.0,40688.022647,787721400.0,27667910.0,24663150000.0


In [205]:
# water risk as donnut chart
x = Counter({"Cotton": 1559.834051, "Leather": 767231.370025, "Rubber": 1783.558232})

data = (
    pd.DataFrame.from_dict(dict(x), orient="index")
    .reset_index()
    .rename(index=str, columns={0: "value", "index": "Commodity"})
)
data["angle"] = data["value"] / sum(x.values()) * 2 * pi
data["color"] = BuGn[len(x)]

p_d_risk = figure(
    plot_height=350,
    title="Unsustainable water use risk (m3/tonne) by commodity in 2000",
    toolbar_location=None,
    tools="hover",
    tooltips=[("Commodity", "@Commodity"), ("Value", "@value")],
)

p_d_risk.annular_wedge(
    x=0,
    y=1,
    inner_radius=0.2,
    outer_radius=0.4,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    legend="Commodity",
    source=data,
)

p_d_risk.axis.axis_label = None
p_d_risk.axis.visible = False
p_d_risk.grid.grid_line_color = None

# water risk as bar chart

material = list(water_risk_material.index)
risk = list(water_risk_material["wr_mean"])


data = {"material": material, "risk": risk}

source = ColumnDataSource(data)

p_b_risk = figure(
    y_range=material,
    x_range=(0, 2067231.370025),
    plot_width=250,
    title="Top commodities by water risk (m3/Tonnes)",
    toolbar_location=None,
    tools="",
)

p_b_risk.hbar(
    y=dodge("material", 0, range=p.y_range),
    right="risk",
    height=0.2,
    source=source,
    color="#c9d9d3",
)

p_b_risk.y_range.range_padding = 0.1
p_b_risk.ygrid.grid_line_color = None


# water impact as donut chart
x = Counter({"Cotton": 3.655594e06, "Leather": 7.877214e08, "Rubber": 2.190081e06})

data = (
    pd.DataFrame.from_dict(dict(x), orient="index")
    .reset_index()
    .rename(index=str, columns={0: "value", "index": "Commodity"})
)
data["angle"] = data["value"] / sum(x.values()) * 2 * pi
data["color"] = BuGn[len(x)]

p_d_imp = figure(
    plot_height=350,
    title="Unsustainable water use impact (m3) by commodity in 2000",
    toolbar_location=None,
    tools="hover",
    tooltips=[("Commodity", "@Commodity"), ("Value", "@value")],
)

p_d_imp.annular_wedge(
    x=0,
    y=1,
    inner_radius=0.2,
    outer_radius=0.4,
    start_angle=cumsum("angle", include_zero=True),
    end_angle=cumsum("angle"),
    line_color="white",
    fill_color="color",
    legend="Commodity",
    source=data,
)

p_d_imp.axis.axis_label = None
p_d_imp.axis.visible = False
p_d_imp.grid.grid_line_color = None

# water impact as bar chart

material = list(water_impact_material.index)
impact = list(water_impact_material["wr_mean"])


data = {"material": material, "impact": impact}

source = ColumnDataSource(data)

p_b_impact = figure(
    y_range=material,
    x_range=(0, 2067231.370025),
    plot_width=250,
    title="Top commodities by water impact (m3)",
    toolbar_location=None,
    tools="",
)

p_b_impact.hbar(
    y=dodge("material", 0, range=p.y_range),
    right="impact",
    height=0.2,
    source=source,
    color="#c9d9d3",
)

p_b_impact.y_range.range_padding = 0.1
p_b_impact.ygrid.grid_line_color = None


# Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_d_risk, p_d_imp], [p_b_risk, p_b_impact]], plot_width=450)



In [206]:
# risk and impact over time
pct_change_df = pd.read_csv("../../datasets/raw/crop_data/projection_factor_byCountry.csv")
pct_change_df.head()

Unnamed: 0.1,Unnamed: 0,2000,2001,2002,2003,2004,2005,2006,2007,2008,...,2011,2012,2013,2014,2015,2016,2017,2018,2019,country
0,0,,0.0,0.0,-0.4,-0.15,0.176471,0.065,0.095462,0.0,...,0.0,0.0,0.1,-0.035813,0.203543,0.213133,-0.376835,0.240257,0.250025,Afghanistan
1,1,,-0.151515,0.0,0.190476,0.06,0.018868,0.0,-0.12037,-0.221053,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.166216,Albania
2,2,,0.014184,0.020979,0.034247,0.02649,-0.470968,0.292683,0.169811,0.153226,...,0.055,0.184834,0.04,0.0,0.0,0.0,0.0,0.0,0.057692,Algeria
3,3,,-0.8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,-0.018333,Angola
4,4,,0.0,0.083333,0.076923,0.0,0.0,0.0,0.0,0.0,...,0.0,-0.142857,0.0,0.0,0.0,0.0,0.0,0.0,-0.001667,Antigua and Barbuda


In [239]:
mean_pct = pd.DataFrame(pct_change_df.mean())[1:]
mean_pct = mean_pct.transpose()
mean_pct["2000"] = 0
mean_pct["2001"] = 0
mean_pct["2007"] = 0
mean_pct["2008"] = 0
mean_pct["2012"] = 0
mean_pct

Unnamed: 0,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016,2017,2018,2019
0,0,0,-0.0598,-0.063481,0.321695,0.049136,-0.001274,0,0,-0.045373,0.02851,0.282777,0,-0.070649,0.004498,0.021944,0.087911,0.105078,0.80738,0.041863


In [246]:
pct_change_json = {}
for el in mean_pct.columns:
    pct_change_json[el] = mean_pct[el].iloc[0]

# estimate total impact to project
total_risk_impact = water_risk_impact.sum()

##RISK OVER TIME
# project total risk
average_risk = total_risk_impact["wr_mean"]
pr_average_risk = [
    (average_risk + pct_change_json[f"{year}"] * average_risk) for year in range(2000, 2020)
]

# project max risk
max_risk = total_risk_impact["wr_max"]
pr_max_risk = [(max_risk + pct_change_json[f"{year}"] * max_risk) for year in range(2000, 2020)]

# project min risk
min_risk = total_risk_impact["wr_min"]
pr_min_risk = [(min_risk + pct_change_json[f"{year}"] * min_risk) for year in range(2000, 2020)]

# generate dataframe
df_risk = pd.DataFrame()
df_risk["year"] = [year for year in range(2000, 2020)]
df_risk["average_risk"] = pr_average_risk
df_risk["min_risk"] = pr_min_risk
df_risk["max_risk"] = pr_max_risk
df_risk.head()

Unnamed: 0,year,average_risk,min_risk,max_risk
0,2000,770574.8,41249.682833,25701360.0
1,2001,770574.8,41249.682833,25701360.0
2,2002,724494.3,38782.948757,24164420.0
3,2003,721657.9,38631.113273,24069810.0
4,2004,1018465.0,54519.485534,33969350.0


In [256]:
df_risk["year"] = pd.to_datetime(df_risk["year"], format="%Y")

source = ColumnDataSource(df_risk)

p_risk = figure(x_axis_type="datetime")

p_risk.line(x="year", y="average_risk", line_width=2, source=source, legend="Average impact")
p_risk.line(
    x="year", y="min_risk", line_width=2, source=source, color=Spectral10[5], legend="Min impact"
)
p_risk.line(
    x="year", y="max_risk", line_width=2, source=source, color=Spectral10[9], legend="Max impact"
)

p_risk.title.text = "Unsustainable water use risk over time"
p_risk.yaxis.axis_label = "m3 / ha"



In [254]:
##IMPACT OVER TIME
# project total risk
average_imp = total_risk_impact["wr_imp"]
pr_average_imp = [
    (average_risk + pct_change_json[f"{year}"] * average_risk) for year in range(2000, 2020)
]

# project max risk
max_risk = total_risk_impact["wr_imp_max"]
pr_max_imp = [(max_risk + pct_change_json[f"{year}"] * max_risk) for year in range(2000, 2020)]

# project min risk
max_risk = total_risk_impact["wr_imp_max"]
pr_min_imp = [(min_risk + pct_change_json[f"{year}"] * min_risk) for year in range(2000, 2020)]


# generate dataframe
df_imp = pd.DataFrame()
df_imp["year"] = [year for year in range(2000, 2020)]
df_imp["average_imp"] = pr_average_imp
df_imp["min_imp"] = pr_min_imp
df_imp["max_imp"] = pr_max_imp
df_imp.head()

Unnamed: 0,year,average_imp,min_imp,max_imp
0,2000,770574.8,41249.682833,24831710000.0
1,2001,770574.8,41249.682833,24831710000.0
2,2002,724494.3,38782.948757,23346770000.0
3,2003,721657.9,38631.113273,23255370000.0
4,2004,1018465.0,54519.485534,32819940000.0


In [257]:
df_imp["year"] = pd.to_datetime(df_imp["year"], format="%Y")

source = ColumnDataSource(df_imp)

p_imp = figure(x_axis_type="datetime")

p_imp.line(x="year", y="average_imp", line_width=2, source=source, legend="Average impact")
p_imp.line(
    x="year", y="min_imp", line_width=2, source=source, color=Spectral10[5], legend="Min impact"
)
p_imp.line(
    x="year", y="max_imp", line_width=2, source=source, color=Spectral10[9], legend="Max impact"
)

p_imp.title.text = "Unsustainable water use impact over time"
p_imp.yaxis.axis_label = "m3"



In [258]:
# Make Dashboard with Grid Layout:
pandas_bokeh.plot_grid([[p_risk, p_imp]], plot_width=450)

In [265]:
water_risk_impact[["Country", "wr_mean", "wr_imp"]][4:]

Unnamed: 0,Country,wr_mean,wr_imp
4,India,45.279566,22186.99
5,Thailand,73.810405,228812.3
6,Indonesia,52.100767,135462.0
7,Cote d'Ivoire,2.733245,3006.569
8,Vietnam,100.613764,81497.15
9,Malaysia,226.718707,167771.8
10,Liberia,201.492037,463431.7
11,India,296.645033,355974.0
12,Thailand,84.117894,84117.89
13,Thailand,471.552868,348949.1


In [266]:
p = figure(title="")
p.circle(
    "wr_mean",
    "wr_imp",
    source=water_risk_impact[["wr_mean", "wr_imp"]][4:],
    fill_alpha=0.2,
    size=10,
)
p.xaxis.axis_label = "Risk"
p.yaxis.axis_label = "Impact"
show(p)