# EDA

## Motivation

To set up a biorefinery in a region, an **understanding of the region’s current and future biomass production** is required.

This biomass needs to be:
1. Collected and transported to intermediate depots for **de-moisturisation and densification into pellets**.
2. **Transported to the biorefinery** as pellets for conversion to biofuel.

![bioprocess](../../img/Harvesting_Pellets_Bioref.png)

This process incurs high feedstock transportation costs and associated GHG emissions, both of which need to be minimised. Its value lies not only in its contribution to the global energy transition but also in giving farmers a sustainable source of income.

## Problem Goal

Using the provided data, you are required to forecast biomass availability as well as design the optimal supply chain for the years 2018 and 2019, following the objective and constraints given in the subsequent sections.

1. Biomass Forecast (biomass_forecast)
2. Set Optimal Location for:
   - Depots (depot_location): Where the biomass is pelletized (i.e. packed)
   - Refineries (refinery_location): Where the pellets are used for energy generation.
3. The supplies for:
   - Depots (biomass_demand_supply)
   - Refineries (pellet_demand_supply)

## Objective Functions and Constraints

$$
Cost_{transport} = \left(\sum_{i, j}D_{i,j}Biomass_{i,j}\right) + \left(\sum_{j, k}D_{j, k}Pellet_{j, k}\right)
$$

$$
Cost_{forecast} = MAE\left(Biomass_{i}, Biomass_{i}^{forecast}\right)
$$

$$
Cost_{underuse} = \sum_{j}\left(Cap_{depot} - \sum_{i}Biomass_{i, j}\right) + \sum_{k}\left(Cap_{refinery} - \sum_{j}Pellet_{j, k}\right)
$$

$$
Cost = 0.001Cost_{transport} + Cost_{forecast} + Cost_{underuse}
$$
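
A toy evaluation may help make the weighting concrete. The sketch below is not the official scorer: the function name `total_cost`, the array layout (rows are sources, columns are destinations), and all numbers are assumptions made for illustration, and underuse is taken as the unused capacity summed over depots and refineries.

```python
import numpy as np

def total_cost(dist_sd, biomass_flow, dist_dr, pellet_flow,
               actual, forecast, cap_depot=20_000, cap_ref=100_000):
    """Combine transport, forecast and underuse terms for a single year."""
    # Transport: distance-weighted flows, site -> depot and depot -> refinery
    cost_transport = (dist_sd * biomass_flow).sum() + (dist_dr * pellet_flow).sum()
    # Forecast: mean absolute error between actual and forecast biomass per site
    cost_forecast = np.abs(actual - forecast).mean()
    # Underuse: unused capacity at each depot (column sums of biomass_flow)
    # and at each refinery (column sums of pellet_flow)
    cost_underuse = ((cap_depot - biomass_flow.sum(axis=0)).sum()
                     + (cap_ref - pellet_flow.sum(axis=0)).sum())
    return 0.001 * cost_transport + cost_forecast + cost_underuse

# Hypothetical toy instance: 3 sites, 2 depots, 1 refinery
dist_sd = np.array([[10.0, 20.0], [30.0, 5.0], [15.0, 25.0]])
biomass_flow = np.array([[100.0, 0.0], [0.0, 200.0], [50.0, 50.0]])
dist_dr = np.array([[40.0], [60.0]])
pellet_flow = np.array([[150.0], [250.0]])
actual = np.array([100.0, 210.0, 95.0])
forecast = np.array([100.0, 200.0, 100.0])

cost = total_cost(dist_sd, biomass_flow, dist_dr, pellet_flow, actual, forecast)
print(cost)
```

On this toy instance the cost comes out at about 139,230, almost all of it from the underuse term, which suggests that opening facilities that stay far from full is expensive under this objective.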

1. All values (forecasted biomass, biomass demand-supply, pellet demand-supply) must be greater than or equal to zero.

$$
Biomass_{i}^{forecast} \geq 0,\quad biomass\_demand\_supply \geq 0,\quad pellet\_demand\_supply \geq 0
$$

2. The amount of biomass procured for processing from each harvesting site $i$ must be less than or equal to that site’s forecasted biomass.

$$
\sum_{j}Biomass_{i, j} \leq Biomass_{i}^{forecast}
$$

3. Total biomass reaching each preprocessing depot $j$ must be less than or equal to its yearly processing capacity (20,000).

$$
\sum_{i}Biomass_{i, j} \leq Cap_{depot}^j
$$

4. Total pellets reaching each refinery $k$ must be less than or equal to its yearly processing capacity (100,000).

$$
\sum_{j}Pellet_{j, k} \leq Cap_{refinery}^k
$$
5. Number of depots should be less than or equal to 25.

$$
N_{depot} \leq 25
$$

6. Number of refineries should be less than or equal to 5.

$$
N_{refinery} \leq 5
$$
7. At least 80% of the total forecasted biomass must be processed by refineries each year.

$$
\sum_{j, k}Pellet_{j, k, year} \geq 0.8\sum_{i}Biomass^{forecast}_{i, year}
$$

8. Total amount of biomass entering each preprocessing depot is equal to the total amount of pellets exiting that depot (within a tolerance limit of 1e-03).

$$
\sum_{i}Biomass_{i, j} = \sum_{k}Pellet_{j, k}
$$
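
The constraints listed above can be checked mechanically before a solution is submitted. The helper below is a minimal sketch for a single year. `check_constraints`, its argument layout, and the toy arrays are hypothetical. Only the depot balance uses the 1e-03 tolerance, since that is the only tolerance the problem statement mentions.

```python
import numpy as np

def check_constraints(forecast, biomass_flow, pellet_flow,
                      cap_depot=20_000, cap_ref=100_000,
                      max_depots=25, max_ref=5, min_rate=0.8, tol=1e-3):
    """Return a dict mapping each constraint name to True/False for one year.

    forecast:     (n_sites,) forecast biomass per harvesting site
    biomass_flow: (n_sites, n_depots) biomass sent from site i to depot j
    pellet_flow:  (n_depots, n_refineries) pellets sent from depot j to refinery k
    """
    return {
        "non_negative": bool((forecast >= 0).all()
                             and (biomass_flow >= 0).all()
                             and (pellet_flow >= 0).all()),
        "site_supply": bool((biomass_flow.sum(axis=1) <= forecast).all()),
        "depot_capacity": bool((biomass_flow.sum(axis=0) <= cap_depot).all()),
        "refinery_capacity": bool((pellet_flow.sum(axis=0) <= cap_ref).all()),
        "depot_count": biomass_flow.shape[1] <= max_depots,
        "refinery_count": pellet_flow.shape[1] <= max_ref,
        "min_processed": bool(pellet_flow.sum() >= min_rate * forecast.sum()),
        # Biomass entering each depot must equal pellets leaving it (within tol)
        "depot_balance": bool(np.allclose(biomass_flow.sum(axis=0),
                                          pellet_flow.sum(axis=1),
                                          rtol=0, atol=tol)),
    }

# Hypothetical toy instance: 3 sites, 2 depots, 1 refinery
forecast = np.array([100.0, 200.0, 100.0])
biomass_flow = np.array([[100.0, 0.0], [0.0, 200.0], [50.0, 50.0]])
pellet_flow = np.array([[150.0], [250.0]])

report = check_constraints(forecast, biomass_flow, pellet_flow)
print(report)
```

Returning one flag per constraint, rather than a single boolean, makes it easier to see which rule a candidate solution breaks.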


# Data

In [12]:
import pandas as pd
import numpy as np

In [None]:
cap_depot = 20_000
cap_ref = 100_000
n_depots = 25
n_ref = 5
min_proc_bio_rate = 0.8

## Biomass History

This is a time series of biomass availability in the state of Gujarat from 2010 to 2017. We have considered arable land as a map of 2418 equal-sized grid blocks (harvesting sites). For ease of use, we have flattened the map and provided the location index, latitude, longitude, and year-wise biomass availability for each harvesting site.

In [3]:
df_bio = pd.read_csv('../../data/Biomass_History.csv')
df_bio.head()

Unnamed: 0,Index,Latitude,Longitude,2010,2011,2012,2013,2014,2015,2016,2017
0,0,24.66818,71.33144,8.475744,8.868568,9.202181,6.02307,10.788374,6.647325,7.387925,5.180296
1,1,24.66818,71.41106,24.029778,28.551348,25.866415,21.634459,34.419411,27.361908,40.431847,42.126946
2,2,24.66818,71.49069,44.831635,66.111168,56.982258,53.003735,70.917908,42.517117,59.181629,73.203232
3,3,24.66818,71.57031,59.974419,80.821304,78.956543,63.160561,93.513924,70.203171,74.53672,101.067352
4,4,24.66818,71.64994,14.65337,19.327524,21.928144,17.899586,19.534035,19.165791,16.531315,26.086885
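
For forecasting, it is often more convenient to have one row per site and year instead of one column per year. The snippet below is a small sketch of that reshaping with `DataFrame.melt`. It uses two rows and two year columns typed in from the table above so that it runs without the CSV, and the names `df_head`, `df_long`, and `biomass` are illustrative choices.

```python
import pandas as pd

# Two rows and two year columns copied by hand from the table above
df_head = pd.DataFrame({
    "Index": [0, 1],
    "Latitude": [24.66818, 24.66818],
    "Longitude": [71.33144, 71.41106],
    "2016": [7.387925, 40.431847],
    "2017": [5.180296, 42.126946],
})

# Wide -> long: one row per (site, year)
df_long = df_head.melt(
    id_vars=["Index", "Latitude", "Longitude"],
    var_name="year",
    value_name="biomass",
)
df_long["year"] = df_long["year"].astype(int)
print(df_long)
```

The same call should work on the full `df_bio` frame, provided any leftover index column from the CSV is dropped first.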


## Distances Matrix

This is the travel distance from each source grid block to each destination grid block, provided as a 2418 x 2418 matrix. Note that this matrix is not symmetric: U-turns, one-way roads, etc. may result in different distances for the ‘to’ and ‘from’ journeys between a source and a destination.

In [4]:
df_dist = pd.read_csv("../../data/Distance_Matrix.csv")
df_dist.head()

Unnamed: 0.1,Unnamed: 0,0,1,2,3,4,5,6,7,8,...,2408,2409,2410,2411,2412,2413,2414,2415,2416,2417
0,0,0.0,11.3769,20.4557,38.1227,45.381,54.9915,78.6108,118.675,102.6639,...,683.8771,687.631,697.3246,669.3962,667.6788,665.5775,662.0291,665.9655,673.2073,681.4235
1,1,11.3769,0.0,9.0788,28.9141,36.1724,45.7829,69.4022,78.2329,93.4553,...,681.6295,685.3833,695.0769,667.1485,665.4311,663.3298,659.7815,663.7178,670.9596,679.1758
2,2,20.4557,9.0788,0.0,22.3791,29.6374,39.2478,62.8671,71.6979,86.9203,...,682.2323,685.9861,695.6796,667.7513,666.0339,663.9326,660.3843,664.3206,671.5623,679.7786
3,3,38.1227,28.9141,22.3791,0.0,11.8343,23.5413,41.8396,50.6703,65.8927,...,681.4226,685.1765,694.8701,666.9417,665.2243,663.123,659.5746,663.511,670.7528,678.969
4,4,45.381,36.1724,29.6374,11.8343,0.0,11.707,24.3986,33.2293,53.9901,...,663.9816,667.7355,677.4291,649.5007,647.7833,645.682,642.1336,646.07,653.3118,661.528
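
One way to see how asymmetric the matrix is: compare it with its transpose. The sketch below does this on the top-left 5 x 5 block shown above, typed in by hand. `max_asymmetry` is a hypothetical helper name. For the full frame, the leading index column (it appears as `Unnamed: 0` in the output above) would presumably need to be dropped before converting to a NumPy array.

```python
import numpy as np

def max_asymmetry(matrix):
    """Largest absolute difference between d(a, b) and d(b, a)."""
    return float(np.abs(matrix - matrix.T).max())

# Top-left 5 x 5 block of the distance matrix shown above (sites 0-4)
block = np.array([
    [0.0,     11.3769, 20.4557, 38.1227, 45.381],
    [11.3769, 0.0,     9.0788,  28.9141, 36.1724],
    [20.4557, 9.0788,  0.0,     22.3791, 29.6374],
    [38.1227, 28.9141, 22.3791, 0.0,     11.8343],
    [45.381,  36.1724, 29.6374, 11.8343, 0.0],
])

print(max_asymmetry(block))  # 0.0: this small block happens to be symmetric
```

This block is symmetric, so any asymmetry has to come from other site pairs. Running the same check on the whole matrix would show how large the differences get.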


## Sample Submission

In [6]:
df_sample = pd.read_csv("../../data/sample_submission.csv")
df_sample.head()

Unnamed: 0,year,data_type,source_index,destination_index,value
0,20182019,depot_location,1256,,
1,20182019,depot_location,1595,,
2,20182019,depot_location,1271,,
3,20182019,depot_location,2001,,
4,20182019,depot_location,2201,,


In [8]:
df_sample.data_type.value_counts()

biomass_demand_supply    21646
biomass_forecast          4836
pellet_demand_supply       152
depot_location              21
refinery_location            4
Name: data_type, dtype: int64

In [11]:
df_sample[df_sample.data_type == 'depot_location']

Unnamed: 0,year,data_type,source_index,destination_index,value
0,20182019,depot_location,1256,,
1,20182019,depot_location,1595,,
2,20182019,depot_location,1271,,
3,20182019,depot_location,2001,,
4,20182019,depot_location,2201,,
5,20182019,depot_location,1179,,
6,20182019,depot_location,1801,,
7,20182019,depot_location,1432,,
8,20182019,depot_location,1488,,
9,20182019,depot_location,2088,,


In [10]:
df_sample[df_sample.data_type == 'biomass_forecast']

Unnamed: 0,year,data_type,source_index,destination_index,value
25,2018,biomass_forecast,0,,5.180296
26,2018,biomass_forecast,1,,42.126946
27,2018,biomass_forecast,2,,73.203232
28,2018,biomass_forecast,3,,101.067352
29,2018,biomass_forecast,4,,26.086885
30,2018,biomass_forecast,5,,41.749001
31,2018,biomass_forecast,6,,55.003929
32,2018,biomass_forecast,7,,29.411613
33,2018,biomass_forecast,8,,37.497009
34,2018,biomass_forecast,9,,52.549976
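
The 2018 values shown above match the 2017 column of the Biomass History table for sites 0 to 4, which suggests that the sample file uses a last-value (persistence) forecast. The snippet below is a minimal sketch of building such rows in the submission layout. Reusing the 2017 values for 2019 as well is an assumption made here, not something the output above confirms, and `hist_2017` and `df_forecast` are illustrative names.

```python
import pandas as pd

# 2017 biomass for sites 0-4, copied from the Biomass History table
hist_2017 = pd.Series([5.180296, 42.126946, 73.203232, 101.067352, 26.086885])

rows = []
for year in (2018, 2019):  # assumption: same persistence rule for both years
    for site, value in hist_2017.items():
        rows.append({
            "year": year,
            "data_type": "biomass_forecast",
            "source_index": int(site),
            "destination_index": float("nan"),  # forecasts have no destination
            "value": value,
        })

df_forecast = pd.DataFrame(
    rows,
    columns=["year", "data_type", "source_index", "destination_index", "value"],
)
print(df_forecast.head())
```

A persistence forecast is only a baseline. It gives a reference MAE that any fitted model should beat.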


In [9]:
df_sample[df_sample.data_type == 'pellet_demand_supply']

Unnamed: 0,year,data_type,source_index,destination_index,value
13266,2018,pellet_demand_supply,1256,1465.0,4.473232e-01
13267,2018,pellet_demand_supply,1256,1692.0,3.646738e+01
13268,2018,pellet_demand_supply,1256,1535.0,1.184648e+04
13269,2018,pellet_demand_supply,1256,1461.0,5.170925e+03
13270,2018,pellet_demand_supply,1595,1465.0,4.608941e+03
13271,2018,pellet_demand_supply,1595,1692.0,5.033927e-05
13272,2018,pellet_demand_supply,1595,1461.0,1.314331e+04
13273,2018,pellet_demand_supply,1271,1692.0,1.257224e+04
13274,2018,pellet_demand_supply,1271,1535.0,5.722201e+03
13275,2018,pellet_demand_supply,2001,1465.0,6.437653e+03
