E-waste calculations for the European Union
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
images
scripts
.gitignore
HISTORY.md
LICENSE
README.md
VERSION
screenshot.xcf

README.md

Waste over Time

Last updated: November 23, 2017

Statistics Netherlands author: "V.M. van Straalen"

Table of Contents

Purpose

The Waste over Time (or WOT) script is a script developed by Statistics Netherlands. It uses European production and trade statistics with the 'apparent consumption method' to estimate sales in weight of products. The script also comes with data on life times of electronic and electric equipment, and allows to calculate the urban mine and to forecast waste generated.

Results from the calculations can be viewed here:
screenshot

The script here available at the GitHub shows the calculations to produce this output.

The script can be used for:

WEEE-Directive:

  • The methodology in the script follows the so-called 'common methodology' to determine Placed on the Market and WEEE Generated as defined in article 7 of the EU-WEEE Directive. For more information regarding the WEEE-Directive: WEEE-Directive
  • Governmental authorities of the EU Member States, inspection agencies and WEEE-registers can use the script to obtain an alternative to equipment placed on the market, and forecast waste streams.

Circular Economy

  • The script has been used in the H2020 ProSUM Project.
  • The script allows to study Material Flows and Urban Mining by relating sales, use phase, and forecasts of waste streams. It can be easily extended to other materials.

The WOT script contains all relevant source data to perform the calculations for each EU-28 Member State for 54 types of electronical and electrical equipment. The statistical source data comes from Eurostat. The other parameters come from a results of a project of DG Environment in which Statistics Netherlands participated. Art7_publication.

The script has been designed that it can be easily extended to other types of products. The script has been written in R.




Installation

R

No R knowledge is necessary to run this series of scripts. But R needs to be installed on your computer. For more information on how to install R see R Installation and Administration

The scripts work with R version 3.2.3. Only script 00a_Prepare_Prodcom_data.R, which is not part of the main program, needs at least R version 3.3.3.

In case you wish to see and understand all the calculations, knowledge of the R language is of course needed.


Wast over Time (WOT) script

From the GitHub e-waste repository (you are there now when reading this file) click on the menu item 'releases' in the middle of the screen. Click on 'zip' behind the most recent release. Save this file on your computer and unpack. Go to the folder .\ewaste-master\scripts. Copy that location to the clipboard.

Now open de command line (Windows: cmd.exe, aka 'DOS box') by typing cmd into the windows search box and pressing enter. Here type the command SETX EWASTE_SCRIPT_PATH "[FOLDER SCRIPT PATH]" and enter the location of the folder with the scripts between the "". Right-click and choose paste, to paste this location that was copied to the clipboard before. The command would then look for instance like this: SETX EWASTE_SCRIPT_PATH "F:\Vincent\Documents\ewaste-master\scripts".

Set script path

Now the DOS box can be closed. In case R was open, please close and re-open it.


R packages

You need to install the following packages (sets of extra R functions) to be able to run the scripts. To do this open R and copy all of the next R commands - one at a time - into the R Console and press ENTER. If you are asked to select a CRAN mirror, select one of your choice and continue.

install.packages("getopt")
install.packages("eurostat")
install.packages("plyr")
install.packages("dplyr")
install.packages("reshape2")
install.packages("readxl")
install.packages("data.table")
install.packages("forecast")
install.packages("tseries")



Usage

The program can be run from the command line (Windows: cmd.exe, aka 'DOS box').

To run the calculations for all countries, go to the folder of the scrips. Do this by using the CD command. For instance 'cd /d F:\Vincent\Mijn documenten\ewaste-master\scripts'. Then type the following text in the command editor:
Rscript main.R

Alternatively the above two commands could be performed at once with the command 'Rscript %EWASTE_SCRIPT_PATH%\main.R'

To run the calculations just for one or more specific countries use: Rscript main.R --countries=nld;bel. This will not perform calculations to correct outliers based on values from similar countries.

The above commands assume that the location where Rscript.exe lives is on the search path (the PATH environment variable). If this is not the case (you will notice because you get a message saying "'Rscript' is not recognized as an internal or external command, operable program or batch file."), you will need to provide the full path to Rscript.exe, e.g. for R version 3.3.1 under MS Windows this would typically be:

  • 32 bit computer: C:\Program Files\R\R-3.3.1\bin\i386

  • 64 bit computer: C:\Program Files\R\R-3.3.1\bin\x64

For instance type '"C:\Program Files\R\R-3.3.1\bin\x64\Rscript.exe" main.R' (the path itself has to be between double-quotes because of the space in the name of the Program Files folder). Then it looks like this:

Start Rscript


In case you are comfortable with the R language you can of course also run the script from R. Executing the script main.R will give you the the calculations for all countries.


Source file listing




Output

The program creates a few data files that can be found in the '.\ewaste-master\data' folder.

The first two are also included in this set of scripts so they can be viewed directly:

  • tbl_POM.csv - This contains the Put-On-Market results. No aggregates.
  • tbl_WEEE.csv - This contains the WEEE generated. No aggregates.

These two CSV files are USA/UK CSV files (where the decimal separator is a period/full stop and the value separator is a comma).

After running the program you can also find the following data files.

  • tbl_ANALYSIS.RData - This file can only be accessed with R and contains a list of 4 dataframes:
    1. POM data
    2. WEEE generated data
    3. Stock data
    4. Eurostat WEEE directive reference data
  • tbl_POM_all.csv - This contains the Put-On-Market results including aggregates.
  • tbl_WEEE_all.csv - This contains the WEEE generated including aggregates.
  • tbl_Stock_all.csv - This contains the WEEE generated including aggregates.

These last three CSV files are European formatted CSV files (where the decimal separator is a comma and the value separator is a semicolon). They contain the same data as the first 3 dataframes in the RData file and can be easily used to import in other programs.




Description of the program components

This series of scripts uses prodcom and international trade data. It calculates the apparent consumption and then estimates missing values and corrects outliers. When running the script without any alterations, all script files will be executed except for scripts that prepare the input data: 00a_Prepare_Prodcom_data.R, 00b_Prepare_International_Trade_data.R and 00c_Prepare_Solar_Panel_data.R.


Preparing source data sets of prodcom and international trade

The script uses two main input data files for the calculations:

The script starts with those files. To run the script with other data those files have to be altered as long as the structure remains the same. The two files have been generated by the scripts 00a_Prepare_Prodcom_data.R and 00b_Prepare_International_Trade_data.R. Those scripts are not automatically executed, so the scripts that are excecuted when the program is run without alterations will generate results based on the two data files provided here at the Github.

Script 00a_Prepare_Prodcom_data.R downloads the prodcom datafiles from the Eurostat website. Data for 1995-2015 will be automatically downloaded and saved in the right location. When data for later years becomes available, the new files have to be added in the script or they have to be manually downloaded and placed in the '..\ewaste-master\data' folder. The output of this script is tbl_data_pcc_conf.csv.

Script 00b_Prepare_International_Trade_data.R shows how to download the international trade data. It is not performed automatically. This International Trade data is very large and has to be downloaded manually. This can be done in two ways. Using the the Eurostat Bulk Download facility or by using the Eurostat Comext website. The method of using the Bulk download facility is used in the script. The resulting datafile from this script is called tbl_CN.csv and is included with this program. It contains data from 1995-2015. When new data is needed it has to be downloaded manually and script 00b_Prepare_International_Trade_data.R has to be executed to generate a new version of tbl_CN.csv. A detailed instruction to download these source data files is given in Annex 1 both for using the the Bulk Download facility method and for using the Comext website.


Estimation of confidential prodcom data.

Prodcom data contains many hidden values because of confidentiality. These confidential values are being estimated in script 01_Prodcom_confidentials.R. Input data or this script is:

The estimation of the confidential Prodcom record are carried out in the following way:

Step 1 - Ratio between units exported and units produced over years

Calculate ratio between units exported and units produced in case both values are not confidential for every Prodcom code, country and year. Calculate for every Prodcom code and country the median of the calculated ratios (all the years). Use this median ratio to estimate confidential prodcom units: export units / ratio = prodcom units.

Step 2 - Ratio between units exported and units produced over countries

Same, but this time countries are grouped in 3 groups based on purchasing power. Values that could not have been estimated with step 1, are now being estimated based on the median ratio of the aggregated values of similar countries.

Finalize - Fitting the results

Estimated values are checked on reliability using the EU28 aggregate. In case the sum of the estimated values is larger than the difference between the EU28 aggregate and the sum of the non-confidential values for the EU 28 countries, the estimated values are scaled downwards proportionally.

The result of this script is tbl_PCC.csv. It contains a flag variable telling where the prodcom values come from:

  • Flag 0 - "Original Prodcom value"
  • Flag 1 - "Estimated with Export/Prodcom ratio of other years per country"
  • Flag 2 - "Estimated with Export/Prodcom ratio of similar countries per year"
  • Flag 3 - "Estimated with lineair model original values per Country and PCC"

Calculate apparent consumption

The apparent consumption is calculated in script 02_Apparent_Consumption_Method.R using the formula apparent consumption = production + imports - exports. Therefore data on production and international trade are input of this script:

To perform this calculation units have to be converted to weight which is done by conversion tables of average weights. Also the prodcom and CN data has to be aggregated to UNU_Key level. These calculations are also done in this script. They result in total value in kilo's and in units for each UNU_Key and are saved in UNU_countries.csv.


Description of method to calculate POM (Put on market)

The data resulting from the calculations in script 02_Apparent_Consumption_Method.R need correction of outliers and estimation of missing data. That is performed in script 03_POM_calculations.R.

The input is UNU_countries.csv


First adjustments

Additional data on number of inhabitants and a stratum code to group similar countries (based on purchasing power) is added. With those we can calculate the first version of the kilo's per inhabitant and the units per inhabitant. These ratios will be the basis for the checks and corrections.


Manual corrections

A number of manual corrections are being carried out. These result from the analysis of the automatic corrections. When there are a few years in a row with unreliable data, the automatic procedures cannot correct them. Some unreliable data is corrected using knowledge of the market. For instance CRT TVs have not been sold in recent years, so they are set to 0.


Solar panel data

Data on solar panels (UNU_Key 0002) is taken from the Eurostat website. This data is included. Script 00c_Prepare_Solar_Panel_data.R shows exactly how the data is made. This data is only available in megawatts. We calculate the weight and number of units by assuming a number of kg's per megawatt and an average weight per panel. These values are around 100.000 kg per megawatt and 17 kg per panel, but they vary with every year. The used conversion factors can be seen in the file tbl_solar_panel_conversion_factors.csv. Estimations of future solar panel data are taken from reports on IRENA.ORG. The values used can be found in the file tbl_solar_panel_extrapolation_data.csv.

Automatic corrections for all UNU_Keys, countries and years

Remove unreliable data by comparing with other years from the same Country and product category

The outlier detection is done using the Median Absolute Deviation (MAD). For this outlier detection method, the median of the data (all years of a specific country and product category) is calculated. Then, the difference is calculated between each historical value and this median. These differences are expressed as their absolute values, and a new median is calculated of those absolute values. This is the MAD. If a value is 4 times the MAD away from the median of the data points, that value is classified as an outlier.

Example:

Year 2005 2006 2007 2008 2009 2010 2011 median
kg per inhabitant 3.08 3.04 3.5 4.96 2.71 2.6 2.47 3.04
deviation from median 0.04 0.00 0.46 1.92 0.33 0.43 0.56 0.43

factor: 4 lower threshold 1.30 (= 3.04 - 4 x 0.43) upper threshold 4.78 (= 3.04 + 4 x 0.43)

The value of 2008 (4.96) will be classified as an outlier because it lies above the upper threshold of 4 times the median absolute deviation There have to be at least 6 values for calculating the median. With less than 6 values, no outliers are being determined.

Remove unreliable data by comparing data on a specific year and product category with other countries

Also this outlier detection is done using the MAD. The only difference is that the data of a specific year and product group is compared with data from similar countries (given by the stratum).

Calculate years by using the average of other years.

In case one year is missing (in original data or after removal of outliers) it is calculated with the average of the previous and following year. For instance when 2007 was missing it will be calculated in the following way:

Year 2005 2006 2007 2008
kg per inhabitant 1.20 1.20 1.40 1.60

With 2 consecutive missing years a missing year gets 2/3 of the value of the adjacent year and 1/3 of the value of the year next to the other missing year. For instance when 2006 and 2007 were missing they will be calculated in the following way:

Year 2005 2006 2007 2008
kg per inhabitant 1.20 1.33 1.47 1.60

If the missing year is the last available year then it is calculated by adding the difference of the two preceding years to the previous year. This way the trend of the previous 2 years is being continued. For instance when 2012 (the last year for which we have data) is missing they will be calculated in the following way:

Year 2009 2010 2011 2012
kg per inhabitant 1.30 1.40 1.70 2.00

Calculate missing years by using the average of other similar countries

First the average kilo's per inhabitant and average units per inhabitant are calculated for each stratum. Also the average purchasing power of each stratum is calculated (purchasing power per country multiplied with inhabitants of each country, then divided by total inhabitants in stratum).
The purchasing power of the country with the missing data is also known. When the purchasing power is between the average of 2 stratums the missing value is calculated by taking the difference between de kilo's/units per inhabitant of the higher stratum and the lower stratum. This will then be divided by the difference in purchasing power between the higher stratum and the lower stratum. After this it will be multiplied with the difference in purchasing power between the country with the missing data and that of the lower stratum. Finally this outcome will be added to the average kg/units per inhabitant of the lower stratum. The following example with demonstrate this procedure.

Country A B C D E F stratum lower stratum
purchasing power 19500 21928 22028 23193 18023 21344 21307 17480
kg per inhabitant 0.81 1.20 0.70 1.15 0.85 1.10 1.00 0.60

All countries A to F are from the same stratum. Country A had missing data. The average kilo's per inhabitant for the stratum is 1.00 based on the known values from countries B to F. The average purchasing power is a weighted average using the number of inhabitants of each country. The last column shows the averages for countries in the next lower stratum.

The calculation is as follows: 0.6 + (1.00 - 0.6) / (21,307 - 17,480) x (19,500 - 17,480) = 0.81

This way a value will be calculated based on the stratum average, but adjusted for the purchasing power. The country with the lowest purchasing power in this dataset is Rumania. This country is used to adjust for the purchasing power for countries that have a purchasing power that lies between Rumania and the average of the lowest stratum. The country with the highest purchasing power is Luxemburg. This country is used to adjust for the purchasing power of countries that have a purchasing power that lies between the average of the highest stratum and Luxemburg.

Eliminate big changes between years

After the last calculations there can be big changes from year to year. These are corrected in two ways. The first check is to see if a specific outcome considering the kilo's per inhabitant differs more than 50% from the previous year (t-1) while the previous year and the next year (t+1) don't differ more than 20%. This only done if the value that will possibly be corrected is higher than 0.3. Below this value there is only little impact of the changes on the outcome of the collection categories.
The second check is to see if a specific outcome is more than 20 percent below the minimum of the two surrounding years and more than 40 percent below the maximum value of those years. Also if a specific outcome is more than 20 percent higher than the maximum of the two surrounding years and more than 40 percent higher than the minimum of those years. This check is not done if the value of year t+2 is within 20 percent of the year that might be corrected. The reason for this is that in this case the year t+1 might be the one with an unreliable outcome. Just as with the first check, the second check is only done if the value that will possibly be corrected is higher than 0.3. Once a check has had a positive outcome the corresponding value will be deleted.
After this the empty values will be again calculated in the same way as described in the part "Calculate years by using the average of other years" above. So they will be replaced by the average of surrounding years.


Correction of CRT monitors (UNU_Key 0302)

The amount of CRT monitors (UNU_Key 0302) is not reliable with the above method. Therefore we calculate them in another way with the formula: CRT Monitors [kg] = ((Desktop sales[pieces] x 116% ) - Flatpanels [pieces] ) x average weight CRT Monitors. The desktop sales are the outcome of UNU_Key 0302 and the flatpanels are UNU_Key 0309.


Calculate past POM

Put-on-Market data is only calculated based on prodcom and International Trade data starting in 1995. For some countries especially those in stratum 3 data for 1995-2000 is also not available. Data for the years from 1980 until the year for which data is available, is calculated using a lineair line from the year of introduction to the average of the first 3 real datapoints. The year of introduction used can be found in htbl_Key_IntroductionYear.csv. It is possible that the year of introduction is before 1980.


Calculate future POM

Future POM is calculated in various ways. There are 4 mehods that can be used:

  1. KPI_PPP
  2. ConnectionConstant
  3. KPI
  4. LastReliableConstant

The KPI_PPP method is a model based on the relation between kg per inhabitant and purchasing power. The connectionconstant model does the same except it keeps the first calculated value as a constant afterwards. the KPI method is a continuation of the trend of the most recent years. LastReliableConstant is a method that keeps a one of the latest value as a constant. The KPI_PPP method is the default method. The other methods will only be used automatically in case there can not be made a reliable calculation for the KPI_PPP method. You can also manually assign one of the methods to a specic UNU_Key or a UNU_Key/Country combination. These data can be found in tbl_Extrapolation_exceptions.csv.

A detailed description of these 4 mehods can be found in Annex 2.


Result

After all the estimations and corrections the result is saved into tbl_POM.csv.


Calculation of WEEE generated

The WEEE generated is calculated by using the apparent consumption and life-cycle profiles of each of the product groups (UNU_Keys). This is done with the Weibull function. Input of this script is tbl_POM.csv and the output is tbl_WEEE.csv.


Calculated aggregates, stock, graphs and output files

Input of this script are the files:

Now some aggregates of the POM and WEEE data are calculated.

After this the stock is being calculated which is the sum of historic POM minus the sum of all historic WEEE.

Data is saved in the following output files

  • tbl_ANALYSIS.RData - This file can only be accessed with R and contains a list of 4 dataframes:
    1. POM data
    2. WEEE generated data
    3. Stock data
    4. Eurostat WEEE directive reference data
  • tbl_POM_all.csv
  • tbl_WEEE_all.csv
  • tbl_stock_all.csv

The POM data also contains contains a flag variable telling if the resulting value is taken without alterations from UNU_countries.csv or that it has been altered by the program and it what way that was done. The categories indicated are:

  • Flag 0 - "Original value"
  • Flag 11 - "Imputated value via other years"
  • Flag 12 - "Low outlier in stratum, corrected by other years"
  • Flag 13 - "High outlier in stratum, corrected by other years"
  • Flag 14 - "Low outlier over years, corrected by other years"
  • Flag 15 - "High outlier over years, corrected by other years"
  • Flag 21 - "Imputated value via stratum"
  • Flag 22 - "Low outlier in stratum, corrected via stratum"
  • Flag 23 - "High outlier in stratum, corrected via stratum"
  • Flag 24 - "Low outlier over years, corrected via stratum"
  • Flag 25 - "High outlier over years, corrected via stratum"
  • Flag 41 - "partly based on estimated prodcom data that is marked as confidential"
  • Flag 42 - "partly based on estimated prodcom data that is marked as confidential"
  • Flag 43 - "partly based on estimated prodcom data that is marked as confidential"
  • Flag 52 - "CRT tvs corrected to zero or close to zero for recent years"
  • Flag 53 - "Eurostat data copied"
  • Flag 54 - "The solar panel forecast"
  • Flag 60 - "Manual correction"
  • Flag 100 - "Original value and afterwards replaced with avg surrounding years"
  • Flag 111 - "Imputated value via other years and afterwards replaced with avg surrounding years"
  • Flag 121 - "Imputated value via stratum and afterwards replaced with avg surrounding years"
  • Flag 122 - "Low outlier in stratum, corrected via stratum and afterwards replaced with avg surrounding years"
  • Flag 123 - "High outlier in stratum, corrected via stratum and afterwards replaced with avg surrounding years"
  • Flag 124 - "Low outlier over years, corrected via stratum and afterwards replaced with avg surrounding years"
  • Flag 125 - "High outlier over years, corrected via stratum and afterwards replaced with avg surrounding years"
  • Flag 200 - "Historic extrapolations"
  • Flag 215 - "KPI_PPP approach with connection year 2015 (=all years from 2016 and later are extrapolated)"
  • Flag 313 - "ConnectionConstant approach with connection year 2013 (=all years from 2014 and later are extrapolated)"
  • Flag 314 - "ConnectionConstant approach with connection year 2014 (=all years from 2015 and later are extrapolated)"
  • Flag 315 - "ConnectionConstant approach with connection year 2015 (=all years from 2016 and later are extrapolated)"
  • Flag 415 - "KPI approach with connection year 2015 (=all years from 2016 and later are extrapolated)"
  • Flag 500 - "Solar panel data set to zero for historic years and future years for which there are no estimations"
  • Flag 507 - "LastReliableConstant approach with connection year 2007 (=all years from 2008 and further get value from 2007)"
  • Flag 511 - "LastReliableConstant approach with connection year 2011 (=all years from 2012 and further get value from 2011)"
  • Flag 512 - "LastReliableConstant approach with connection year 2012 (=all years from 2013 and further get value from 2012)"
  • Flag 513 - "LastReliableConstant approach with connection year 2013 (=all years from 2014 and further get value from 2013)"
  • Flag 514 - "LastReliableConstant approach with connection year 2014 (=all years from 2015 and further get value from 2014)"
  • Flag 515 - "LastReliableConstant approach with connection year 2015 (=all years from 2016 and further get value from 2015)"



Reference

When referring to this 'Waste over Time' script please use:

Van Straalen, V.M, Roskam, A.J., & Baldé, C.P. (2016). Waste over Time [computer software]. The Hague, The Netherlands: Statistics Netherlands (CBS). Retrieved from: http://github.com/Statistics-Netherlands/ewaste




License

These scripts are provided under MIT License: https://github.com/Statistics-Netherlands/ewaste/blob/master/LICENSE




Annex 1 - Instruction to update International Trade source datafile

the Eurostat Bulk Download facility method

The location of the Annual data changes every year. Go here. Then go into the folder that has 'S2' at the end of the name, for instance '2016S2'. Then go into 'data'.

Unpack all these files in the '.\data\international_trade' folder. It is easy to view if changes have been made in historic data by looking at the date of the files.

The location of the corresponding metadata changes every month. Go here. Then go to the folder with the most recent monthly data. For instance if the most recent data is from August 2017 then go to the folder 201708. After this go to the folders 'text' and 'english'.

Script 00b_Prepare_International_Trade_data.R shows how the downloaded data is then combined and put in the right format for use in the remainder of the scripts. Run the entire script from R by selecting all rows and pressing CTRL+ENTER. It can take a couple of hours depending on your system specifications. After this tbl_CN.csv has been updated.


Comext website method

International trade data is published at the Eurostat website: Eurostat Easy Comext website

This method is not recommended anymore, because the script 00b_Prepare_International_Trade_data.R is using the Bulk Download Facility method to place the data in the right format. The data downloaded with the Comext website method can not be processed with this same script. It is better to use it just for information.

Make sure you register yourself so you can download the big amount of data.
Go to "Available datasets" --> "INTERNATIONAL TRADE" --> "EU Trade Since 1988 By CN8 (DS-016890)". Click on the "New Query" icon.

Fill the following fields in Step 1 "Dimension Selection":

  • Reporter:
    Select all countries except the aggregations starting with EA, EU or EUROZONE. Click the "select" button when you are done.
  • Partner:
    Select only:
    • EU25_EXTRA-EU25-EXTRA
    • EU25_INTRA-EU25-INTRA
  • Product:
    Select only the following CN codes (the advanced selection box is very helpful):
    • 63011000
    • 84*
    • 85*
    • 87119010 + 87119090
    • 90*
    • 91*
    • 92*
    • 94*
    • 95*
  • Flow:
    Select all
  • Period:
    • Select the years for which you would like to download data. You can choose up to 6 years at once which will result in a download of about 20MB.
    • Choose only the complete years. For instance for year 2015 I choose: 201552-Jan.Dec.2015.
  • Indicators:
    Select all

After this press the Compress button to remove codes that do not generate data.

In Step 2 "Layout selection" make the following selection:

  • Rows:

    • Dimension#1: PERIOD
    • Dimension#2: FLOW
    • Dimension#3: REPORTER
    • Dimension#4: PRODUCT
  • Columns:

    • Dimension#1: INDICATORS
    • Dimension#2: PARTNER
  • All formats are codes

In Step 3 "Output selection" do the following:

  • Select: Batch extraction
  • Select: Also generate output
  • Select: Show in output: Codes
  • Select: output format: CSV

Give extraction name and it is useful to select "Notify me whenever this dataset is uploaded" Click the "Finish" button. Once completed you can download the data under the tab "Completed Works".




Annex 2 - Extrapolation methods

In the WOT tool there is data available for the years for which there is also production and international trade data available. To make estimations for future sales (also resulting in future waste) extrapolations have to be done. Depending of the products a different method can be used. Sometimes the latest value(s) are not showing the trend as is to be expected. In those cases an earlier year then the last one can be chosen to connect the extrapolation data to the real data. In those cases all years after the connection year will be replaced with extrapolation data. In those cases were real data was available, these will be removed. There are currently four methods available. In the dataset Flag values will show which method has been used. Those methods are the following:

  1. KPI_PPP
  2. ConnectionConstant
  3. KPI
  4. LastReliableConstant

These methods work with standard parameter values, for instance the year where the first extrapolation is being calculated for. In a special data file exceptions can be entered. In the following these methods are explained in more detail. Whenever exceptions are possible, they will be addressed in those sections. How to enter those exceptions is shown in the last chapter on Extrapolation Exceptions.


KPI_PPP

This is the standard approach. When no method is chosen, this one is used unless it is not possible to calculate. For this method , the relation between KPI (kg EEE per inhabitant) and PPP (Purchasing Power Parity) is used to predict future sales. For PPP estimations of the next 6 years are available. First calculate the linear regression line for all KPI vs PPP for each UNU_Key for all countries except LUX in the last 10 years with exception of the very last available year. Also only use values with flag = 0 or 60, which means no alterations have taken place or only manual alterations have been done. That way we are not extrapolating on extrapolations. Also no empty KPI values should be used. When correlation coefficient of KPI and PPP is smaller than 33% we will not use this approach but the LastReliableConstant instead. The trend of this model will now be connected with the data for all countries. To connect the extrapolation with the real data a connection point has to be chosen. This is done by calculating the KPI vs PPP model for each country specifically and using the outcome as the connecting KPI value for the connection year. This connection year is usually the last available year for which there is real data, however an earlier year can be chosen manually. Using the model outcome for the connection year will ensure that a strange value in the last available year will make the rest of the extrapolations unreliable. In case no trend per country can be calculated, this approach is cancelled and the KPI_PPP method is changed to the KPI method.

Parameter exceptions

The year for which the connection of the extrapolations with the real data is used can be can be put into the “ConnectionYear” variable. Using this value you can for instance say that you don’t want to have the last 2 original data values, but replace them with the extrapolation values.

Flag

Flag value: 200 + last 2 digits year. Meaning 215 is KPI_PPP approach with connection year 2015. This means that all years from 2016 and later are extrapolated.


ConnectionConstant

In the first approach the trend line over all countries was fitted to existing data by using a connection value that was calculated by the trend line for the respective single country. This second approach uses that value as a constant for all future values. In practice this method is not used so much.

Parameter exceptions

The year for which the connection of the extrapolations with the real data is used can be can be put into the “ConnectionYear” variable. Using this value you can for instance say that you don’t want to have the last 2 original data values, but replace them with the extrapolation values.

Flag

Flag value: 300 + last 2 digits year. Meaning 315 is ConnectionConstant approach with connection year 2015. This means that all years from 2016 and later are extrapolated.


KPI

This is an extrapolations method that uses just the KPI trendline for single country over years. All KPI values for the last 10 available years are used, so also those that have flag values other than 0 and 60. That means also extrapolated values are used to calculate the trend line. When this approach cannot be used due to not enough data, then the LastReliableConstant approach will be used.

Parameter exceptions

You can decide not to use the trend of the last 10 data points to estimate the next KPI values, but to uses less than those. For instance you can base the trend just on the last 4 years. This value can be entered in the “YearsForTrend” variable.

Flag

Flag: 400 + last 2 digits year. Meaning 415 is KPI approach with connection year 2015 (=all years from 2016 and later are extrapolated)


LastReliableConstant

This approach just takes the last available value and keeps it as a constant. The approach is used a lot.

Parameter exceptions

The year for which the first extrapolation is calculated can be addressed can be put into the “ConnectionYear” variable. Setting this value for instance to 2015 will mean that the real value of 2014 is copied to 2015 and all later years.

Flag

Flag: 500 + last 2 digits year. Meaning 515 is LastReliableConstant approach with connection year 2015 (=all years from 2016 and further get value from 2015)


Extrapolation Exceptions

You can also manually decide which of these above methods will be used for a particular UNU_Key or for a UNU_Key / Country combination. Those exceptions can be placed in the file tbl_Extrapolation_exceptions.csv which can be found in the data folder. The following fields are available: UNU_Key, Country, ExtrapolationApproach, ConnectionYear and YearsForTrend. When no approach is entered, the standard approach KPI_PPP is used. When no Connection year is entered, the connection year will be the first year for which no data is available in the current dataset. The YearsForTrend variable is only used with the KPI method. When no data is entered, the default value is 10 years.

If you want to enter an approach for all countries for a specific UNU_Key, the Country can be left blank:

UNU_Key Country ExtrapolationApproach ConnectionYear YearsForTrend
0408 LastReliableConstant

An approach for a specific country will look like this:

UNU_Key Country ExtrapolationApproach ConnectionYear YearsForTrend
0408 AUT LastReliableConstant 2014

This means that for Austria in UNU_Key 0408 the extrapolation method is LastReliableConstant. This is the same as used in the previous table, but this time is specifically says to connect the data with year 2014, so the year 2015 will be the first to be extrapolated. Rows that don’t have a Country specified, and are therefore more general, will be executed first. When other rows have a method for a specific country, that method is used for this country. So using the examples of above, in case the last available data in the dataset is from 2015, then for all countries the UNU_Key 0408 will be extrapolated starting in 2016, except for Austria where the extrapolation starts in 2015 and any real data for 2015 will be erased. The order in which the rows are specified in tbl_Extrapolation_exceptions.csv does not matter.