# Examples
This notebook illustrates how to use this repository in different ways.
- To make the code quick to test, a small fake dataset is provided in the folder **BACI_HS12_V202501_test**. It is based on the real BACI dataset but has been reduced in size. This notebook uses that fake dataset for all demonstrations.


## Preparation

Before using the library, a few preparation steps are needed.  
- Reminder: each time you restart the kernel, you must import the library and create the class instances again (that is, rerun steps 2 and 3 before running any other cells).

1. Use `pip` to install this library.  
- Reminder: this command only needs to be run once. After a successful install, the library stays available in your current environment.

In [1]:
# ! pip install rca_batch_calc  # Uncomment this line if you need to install the library.
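
If it is unclear whether the install succeeded, a quick check along these lines can help. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of `rca_batch_calc`.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found in the current environment."""
    return find_spec(module_name) is not None


# Prints True once `pip install rca_batch_calc` has succeeded in this environment.
print(is_installed("rca_batch_calc"))
```

If this prints `False`, uncomment the install line above and run it once.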

2. Import **rca_batch_calc** library along with other required libraries.

In [2]:
from rca_batch_calc.parallel_calc import Parallel_Calculator
from rca_batch_calc.data_extract import DataExtract

from functools import partial
from concurrent.futures import ThreadPoolExecutor, as_completed
import pandas as pd
import os

3. Create instances of the `Parallel_Calculator` and `DataExtract` classes. 
- `Parallel_Calculator`: calculates RCA in parallel.
- `DataExtract`: extracts the required products from the BACI dataset.

In [3]:
parallel = Parallel_Calculator()
extractor = DataExtract()

## Product filter

Because the BACI dataset includes trade data for 5000 products, it is difficult to view in Excel, let alone manipulate. Extracting the targeted products first makes further processing easier.  
  
So, let's do it.

1. Find the codes of the products you want to experiment on, then define them as constants in the code. The comparison table is included in the dataset: **product_codes_HS12_V202501.csv**.  
- Option 1: Define them in this notebook, which is convenient and easy to use.
- Option 2: Define them in `constants.py`, which keeps many constants in one place and makes them easier to manage and modify.

    In this notebook, I define all the constants in [constants.py](./constants.py) (click the name to open the script), located in the same directory as this notebook. The script explains what each constant means and what value to assign to it.  

In [4]:
# Let's import the products and all of other constants.
from constants import *

# Print the constant to check that we have the correct product codes. (A constant is used by writing the variable name defined in constants.py.)
print(PROD)

[121221, 121229]
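
Before fixing `PROD`, it can help to confirm that the codes exist in the comparison table. The sketch below assumes the table has `code` and `description` columns (check the header of **product_codes_HS12_V202501.csv** before relying on this). The inline sample stands in for the real file, its descriptions are placeholders, and `describe_products` is a hypothetical helper, not part of `rca_batch_calc`.

```python
import io

import pandas as pd

# Stand-in for product_codes_HS12_V202501.csv; descriptions are placeholders.
SAMPLE_TABLE = io.StringIO(
    "code,description\n"
    "121221,placeholder description A\n"
    "121229,placeholder description B\n"
    "999999,placeholder description C\n"
)


def describe_products(table: pd.DataFrame, codes: list[int]) -> pd.DataFrame:
    """Return the rows of the comparison table that match the given codes."""
    missing = set(codes) - set(table["code"])
    if missing:
        raise ValueError(f"Codes not found in the table: {sorted(missing)}")
    return table[table["code"].isin(codes)].reset_index(drop=True)


# Codes are read as integers here, so any leading zeros in the file are dropped.
table = pd.read_csv(SAMPLE_TABLE)
print(describe_products(table, [121221, 121229]))
```

Raising an error for unknown codes catches typos early, before the slower extraction steps run.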


2. Iterate over all the `CSV` files and extract the rows for the required products.

In [5]:
product_data = extractor.find_product(FOLDER_PATH, PROD)

Extracting file: BACI_HS12_Y2014_V202501.csv
Extracting finished: BACI_HS12_Y2014_V202501.csv
Extracting file: BACI_HS12_Y2012_V202501.csv
Extracting finished: BACI_HS12_Y2012_V202501.csv
Extracting file: BACI_HS12_Y2013_V202501.csv
Extracting finished: BACI_HS12_Y2013_V202501.csv
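
Conceptually, this step keeps only the rows whose product code is in `PROD` and stacks the yearly results. The following is a rough pandas sketch of that idea, not the actual implementation of `find_product`. It assumes the BACI column `k` holds the product code, and `filter_products` is a hypothetical name.

```python
import pandas as pd


def filter_products(frames: list[pd.DataFrame], codes: list[int]) -> pd.DataFrame:
    """Keep rows whose product code `k` is in `codes`, across all yearly frames."""
    kept = [df[df["k"].isin(codes)] for df in frames]
    return pd.concat(kept, ignore_index=True)


# Two tiny stand-ins for yearly BACI files (values are illustrative).
y2012 = pd.DataFrame(
    {"t": [2012, 2012], "i": [4, 4], "j": [36, 70], "k": [121221, 10121], "v": [1.0, 2.0]}
)
y2013 = pd.DataFrame({"t": [2013], "i": [32], "j": [36], "k": [121229], "v": [3.0]})

print(filter_products([y2012, y2013], [121221, 121229]))
```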


3. Observe the extracted data.

In [6]:
product_data.head()

Unnamed: 0,t,i,j,k,v,q
0,2014,32,36,121221,379.273,31.615
1,2014,32,36,121229,348.991,30.081
2,2014,32,70,121221,0.045,0.001
3,2014,32,156,121229,116.567,70.003
4,2014,32,218,121221,46.287,10.0
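
The short column names follow the BACI convention: `t` is the year, `i` the exporter, `j` the importer, `k` the product code, `v` the trade value (in thousand USD) and `q` the quantity (in metric tons). If longer names are easier to read while inspecting the data, a rename along these lines is one option. The target names are my own choice, not something the library requires.

```python
import pandas as pd

# Readable names for the BACI columns (the target names are illustrative).
BACI_COLUMNS = {
    "t": "year",
    "i": "exporter",
    "j": "importer",
    "k": "product",
    "v": "value_kusd",
    "q": "quantity_tons",
}

# First two rows of the output above, re-typed by hand.
sample = pd.DataFrame(
    {
        "t": [2014, 2014],
        "i": [32, 32],
        "j": [36, 36],
        "k": [121221, 121229],
        "v": [379.273, 348.991],
        "q": [31.615, 30.081],
    }
)

# rename returns a new DataFrame; `sample` keeps its original column names.
readable = sample.rename(columns=BACI_COLUMNS)
print(readable.columns.tolist())
```

Renaming a copy for inspection is safer than renaming `product_data` itself, since later steps may expect the original column names.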


4. Save the extracted data to the BACI dataset folder as **output.csv**.

In [7]:
extractor.save_csv(product_data, FOLDER_PATH)

Extracted data saved.
---------------------


5. Because the numeric country codes are hard to read, use the `transform_countries` function to convert them into country names.
- The function takes two inputs: the file containing the country codes and the comparison table file.

In [8]:
extractor.transform_countries(f"{FOLDER_PATH}/output.csv", f"{FOLDER_PATH}/country_codes_V202501.csv")
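
The conversion amounts to a lookup from numeric code to name. As a rough illustration (not the code inside `transform_countries`), the sketch below maps the exporter and importer columns with a dictionary built from the comparison table. It assumes the table has `country_code` and `country_name` columns, and `add_country_names` is a hypothetical helper.

```python
import pandas as pd


def add_country_names(trade: pd.DataFrame, countries: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `trade` with exporter (`i`) and importer (`j`) codes replaced by names."""
    lookup = dict(zip(countries["country_code"], countries["country_name"]))
    out = trade.copy()
    for col in ("i", "j"):
        # Codes missing from the table are kept as-is instead of becoming NaN.
        out[col] = out[col].map(lookup).fillna(out[col])
    return out


countries = pd.DataFrame(
    {"country_code": [32, 36, 156], "country_name": ["Argentina", "Australia", "China"]}
)
# The importer code 999 is deliberately absent from the table above.
trade = pd.DataFrame({"t": [2014, 2014], "i": [32, 32], "j": [36, 999], "k": [121221, 121229]})

print(add_country_names(trade, countries))
```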

## RCA calculator

1. Define the constants. Every constant in the `constants.py` file must be defined before RCA can be calculated.  
(If you don't know the product code, check the comparison table included in the dataset: **product_codes_HS12_V202501.csv**.)
- Option 1: Define them in this notebook, which is convenient and easy to use.
- Option 2: Define them in `constants.py`, which keeps many constants in one place and makes them easier to manage and modify.

    In this notebook, I define all the constants in [constants.py](./constants.py) (click the name to open the script), located in the same directory as this notebook. The script explains what each constant means and what value to assign to it.  

2. Calculate "xij" values for all files in the `BACI_HS12_V202501_test` folder. (According to the RCA formula, "xij" represents the export value of commodity i from a country to country j.)

In [9]:
xij_process_file = partial(parallel.run_xij, prod=PROD)
parallel.parallel_run(FOLDER_PATH, xij_process_file, "xij")

Processing BACI_HS12_Y2014_V202501.csv in thread: 123145685962752
Processing BACI_HS12_Y2012_V202501.csv in thread: 123145702752256
Processing BACI_HS12_Y2013_V202501.csv in thread: 123145719541760
BACI_HS12_Y2012_V202501.csv is done.
BACI_HS12_Y2013_V202501.csv is done.
BACI_HS12_Y2014_V202501.csv is done.
Total execution time: 0.02 seconds
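
`functools.partial` binds the extra arguments in advance, so that the runner presumably only has to supply the remaining one (such as a file path) for each file. A toy illustration follows, with a made-up worker `process_file` standing in for `run_xij`, whose real signature is not shown in this notebook.

```python
from functools import partial


def process_file(path: str, prod: list[int]) -> str:
    """Hypothetical worker: receives a file path plus pre-bound settings."""
    return f"{path}: filtering {len(prod)} product codes"


# Bind `prod` now; the resulting callable only needs the remaining `path` argument.
worker = partial(process_file, prod=[121221, 121229])

print(worker("BACI_HS12_Y2012_V202501.csv"))
```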


3. Calculate "xin" values for all files in the `BACI_HS12_V202501_test` folder. (According to the RCA formula, "xin" represents the total export value of commodity i from all exporting countries to country j.)

In [10]:
xin_process_file = partial(parallel.run_xin, val=VAL, prod=PROD)
parallel.parallel_run(FOLDER_PATH, xin_process_file, "xin", XIN_NAMES)

Processing BACI_HS12_Y2014_V202501.csv in thread: 123145685962752
Processing BACI_HS12_Y2012_V202501.csv in thread: 123145702752256
Processing BACI_HS12_Y2013_V202501.csv in thread: 123145719541760
BACI_HS12_Y2013_V202501.csv is done.
BACI_HS12_Y2014_V202501.csv is done.
BACI_HS12_Y2012_V202501.csv is done.
Total execution time: 1.16 seconds


4. Calculate "xwj" values for all files in the `BACI_HS12_V202501_test` folder. (According to the RCA formula, "xwj" represents the total export value of all commodities from a country to country j.)

In [11]:
xwj_process_file = partial(parallel.run_xwj, val=VAL)
parallel.parallel_run(FOLDER_PATH, xwj_process_file, "xwj", XWJ_NAMES)

Processing BACI_HS12_Y2014_V202501.csv in thread: 123145685962752
Processing BACI_HS12_Y2012_V202501.csv in thread: 123145702752256
Processing BACI_HS12_Y2013_V202501.csv in thread: 123145719541760
Handling exporter 32.
Handling exporter 4.
Handling exporter 32.
Handling exporter 100.
Handling exporter 84.
Handling exporter 108.
BACI_HS12_Y2012_V202501.csv is done.
BACI_HS12_Y2014_V202501.csv is done.
BACI_HS12_Y2013_V202501.csv is done.
Total execution time: 8.80 seconds


5. Calculate "xwn" values for all files in the `BACI_HS12_V202501_test` folder. (According to the RCA formula, "xwn" represents the total export value of all commodities from all exporting countries to country j.)

In [12]:
xwn_process_file = partial(parallel.run_xwn, val=VAL)
parallel.parallel_run(FOLDER_PATH, xwn_process_file, "xwn", XWN_NAMES)

Processing BACI_HS12_Y2014_V202501.csv in thread: 123145685962752
Processing BACI_HS12_Y2012_V202501.csv in thread: 123145702752256
Processing BACI_HS12_Y2013_V202501.csv in thread: 123145719541760
BACI_HS12_Y2014_V202501.csv is done.
BACI_HS12_Y2012_V202501.csv is done.
BACI_HS12_Y2013_V202501.csv is done.
Total execution time: 0.37 seconds
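
To make the four quantities concrete, here is a small pandas sketch that computes them on toy data, following the definitions given in steps 2 to 5. It is only an illustration, assuming `i` is the exporter, `j` the importer, `k` the product and `v` the value. It is not how `run_xij`, `run_xin`, `run_xwj` or `run_xwn` are implemented.

```python
import pandas as pd

# Toy trade flows for one year: exporter i, importer j, product k, value v.
flows = pd.DataFrame(
    {
        "i": [32, 32, 4, 4],
        "j": [36, 36, 36, 36],
        "k": [121221, 10121, 121221, 10121],
        "v": [10.0, 30.0, 40.0, 20.0],
    }
)
product = 121221
of_product = flows[flows["k"] == product]

# xij: value of the product shipped by each exporter to each importer.
xij = of_product.groupby(["i", "j"])["v"].sum()
# xin: value of the product shipped by all exporters to each importer.
xin = of_product.groupby("j")["v"].sum()
# xwj: value of all products shipped by each exporter to each importer.
xwj = flows.groupby(["i", "j"])["v"].sum()
# xwn: value of all products shipped by all exporters to each importer.
xwn = flows.groupby("j")["v"].sum()

print(xij.loc[(32, 36)], xin.loc[36], xwj.loc[(32, 36)], xwn.loc[36])
```

For exporter 32 and importer 36 this gives 10, 50, 40 and 100 respectively.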


6. Calculate RCA values for all dataset files in the `BACI_HS12_V202501_test` folder, using the formula $RCA^i_j = \left( \frac{X^i_j}{X^i_n} \middle/ \frac{X^w_j}{X^w_n} \right)$.  
The output, **rca.csv**, is written to the same folder as this notebook.
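
As a quick sanity check of the formula, the sketch below evaluates it for one set of made-up numbers. The function name `rca` and the values are illustrative only.

```python
def rca(xij: float, xin: float, xwj: float, xwn: float) -> float:
    """Evaluate RCA = (xij / xin) / (xwj / xwn)."""
    if xin == 0 or xwj == 0 or xwn == 0:
        raise ZeroDivisionError("xin, xwj and xwn must be non-zero")
    return (xij / xin) / (xwj / xwn)


# Product share 10/50 = 0.2; overall share 40/100 = 0.4; RCA = 0.2 / 0.4 = 0.5.
print(rca(xij=10.0, xin=50.0, xwj=40.0, xwn=100.0))
```

A value above 1 is usually read as a revealed comparative advantage, and a value below 1 as the absence of one.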

In [13]:
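# Paths to the four intermediate files, read from the current working directory.
file_path_list = [
    os.path.join(os.getcwd(), "xij.csv"),
    os.path.join(os.getcwd(), "xin.csv"),
    os.path.join(os.getcwd(), "xwj.csv"),
    os.path.join(os.getcwd(), "xwn.csv")
]

# Use twice the CPU count as the thread count, falling back to 4 if it cannot be determined.
max_workers = os.cpu_count() * 2 if os.cpu_count() else 4
with ThreadPoolExecutor(max_workers=max_workers) as executor:
    # Submit one RCA task per entry in VAL.
    futures = {executor.submit(parallel.run_rca, val, file_path_list): val for val in VAL}

    # Start with the first four columns of xij.csv.
    dfs = [pd.read_csv(file_path_list[0], dtype={'Year': int, 'Importer': int, 'Exporter': int}).iloc[:, :4]]
    # Append each result as soon as its task finishes (completion order, not VAL order).
    for future in as_completed(futures):
        df = future.result()
        dfs.append(df)

# Combine everything side by side and save the result.
final_df = pd.concat(dfs, axis=1)
final_df.to_csv("rca.csv", index=False)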

Processing in thread: 123145685962752
Processing in thread: 123145702752256
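
`as_completed` yields futures in the order they finish, so the order in which results arrive can differ between runs. If a stable order is wanted, one option is to collect the results by key and reorder them afterwards. A minimal sketch with a stand-in task follows (`slow_square` is hypothetical and unrelated to `run_rca`).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n: int) -> int:
    """Stand-in task: larger inputs finish sooner, to scramble completion order."""
    time.sleep(0.05 * (3 - n))
    return n * n


keys = [0, 1, 2]
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = {executor.submit(slow_square, k): k for k in keys}
    # Store each result under its key as soon as it completes.
    results = {futures[f]: f.result() for f in as_completed(futures)}

# Rebuild the list in the original key order, regardless of completion order.
ordered = [results[k] for k in keys]
print(ordered)
```

The same pattern would apply to a list of DataFrames: keep them in a dictionary keyed by the submitted value, then concatenate in the order of the original list.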
