# Day 1 - Comparisons: Part-to-whole

Lately I've been playing Tradle, a game similar to Worldle, but instead of a country's silhouette it shows a treemap of the country's exports along with its total annual export value in USD. Tradle uses data from the OEC (Observatory of Economic Complexity).

I'll be recreating that treemap for Guam.

## Data Source
The OEC website has a web-based visualizer, and luckily the OEC treats Guam as a separate country, so it has its own data apart from the United States. An example treemap of Guam can be found [here](https://oec.world/en/visualize/tree_map/hs92/export/gum/all/show/2020/).

However, it seems a premium account is needed to download the data behind the treemap. The data source is listed as 'HS6 REV. 1992 (1995 - 2020)', which leads to BACI, a database maintained by the French economic think tank CEPII. BACI is built on Comtrade data from the UN, with some additional data cleaning described [here](http://www.cepii.fr/PDF_PUB/wp_nts/2010/wp2010-23.pdf).

## Harmonized System
The data is available in several revisions of the Harmonized System (HS). HS was developed by the World Customs Organization to break trade goods down into nested levels of categories in order to standardize international trade statistics. The categories are revised every few years, and the BACI data is available in the HS92, HS96, HS02, HS07, HS12, and HS17 revisions. The categories become more numerous and complex with each revision. Viewing the data in the OEC visualizer shows that Tradle uses HS92, which already provides plenty of detail, so I will move forward with that dataset.

## BACI Data
The full 2022 HS92 BACI database was a 2.1 GB ZIP file of CSV files, which I won't include in this repository. I will include my cleaned CSV file, which only includes Guam's imports and exports, and the simple script I used to create it.

To process the data, Guam's country code and the HS92 product codes must be known. Both can be found on the UN website.

* [Country Codes](https://comtrade.un.org/data/cache/partnerAreas.json)
* [Product Codes](https://comtrade.un.org/data/cache/classificationH0.json)

Both files are included in the day 1 folder.

The CSV file has 6 columns: year, exporter, importer, product, value, and quantity. Each row is the yearly total for a single HS92 category from a single exporter to a single importer. The 2020 HS92 file has 10,031,112 rows.
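
The filtering step can be sketched roughly as follows. This is a minimal illustration, not the script included in this repository: the function name `filter_country` is made up, the column order is assumed to match the description above, and the second sample row is invented purely to show a match.

```python
import csv
import io

GUAM_CODE = "316"  # Guam's id in partnerAreas.json


def filter_country(lines, country_code=GUAM_CODE):
    """Yield rows where the country is either the exporter or the importer."""
    for row in csv.reader(lines):
        # Skip blank lines and a possible header row (assumption: a header
        # would not start with a numeric year).
        if len(row) != 6 or not row[0].strip().isdigit():
            continue
        exporter, importer = row[1].strip(), row[2].strip()
        if country_code in (exporter, importer):
            # Strip the padding that surrounds the value and quantity fields.
            yield [field.strip() for field in row]


sample = io.StringIO(
    "2020,4,32,391000,       0.044,        0.009\n"
    # Hypothetical row for illustration only, not real BACI data.
    "2020,316,392,010111,       1.500,        0.700\n"
)
guam_rows = list(filter_country(sample))
```

Because the rows are read one at a time, the same function should also work on an open file handle for the full CSV without loading all 10 million rows into memory.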

## Data Structure
Here is an excerpt from the start of the 2020 HS92 BACI CSV file:
```
2020,4,32,391000,       0.044,        0.009
2020,4,32,401010,       0.164,           NA
2020,4,32,610120,       0.012,        0.003
2020,4,36,071120,       0.718,        0.308
2020,4,36,071290,       0.665,        0.500
2020,4,36,080211,       2.377,        0.710
2020,4,36,080420,       1.342,        1.331
2020,4,36,080620,     523.628,      326.501
```
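
Two details in these rows are easy to trip over: the value and quantity fields are padded with spaces, and a missing quantity is written as `NA`. Product codes also need to stay as text so that leading zeros (as in `071120`) are not lost. A small parsing sketch, where `TradeFlow` and `parse_row` are hypothetical names chosen for this example:

```python
from typing import NamedTuple, Optional


class TradeFlow(NamedTuple):
    year: int
    exporter: str
    importer: str
    product: str               # kept as text so leading zeros survive
    value: float
    quantity: Optional[float]  # None when the file says NA


def parse_row(line: str) -> TradeFlow:
    """Convert one comma-separated line into a typed record."""
    year, exporter, importer, product, value, quantity = (
        field.strip() for field in line.split(",")
    )
    return TradeFlow(
        int(year),
        exporter,
        importer,
        product,
        float(value),
        None if quantity == "NA" else float(quantity),
    )


flow = parse_row("2020,4,32,401010,       0.164,           NA")
```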

Here is an example country entry from `partnerAreas.json`:
```
        {
            "id": "316",
            "text": "Guam"
        },
```
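
Looking up a country code by name could be done along these lines. This is only a sketch: the helper `find_country_id` is hypothetical, and the top-level `"results"` key wrapping the entries is an assumption about the file layout, since only a single entry is shown above.

```python
import json


def find_country_id(entries, name):
    """Return the id whose text matches name (case-insensitive), or None."""
    for entry in entries:
        if entry["text"].lower() == name.lower():
            return entry["id"]
    return None


# Inline stand-in for the file contents; the "results" wrapper is assumed.
raw = '{"results": [{"id": "316", "text": "Guam"}]}'
entries = json.loads(raw)["results"]
guam_id = find_country_id(entries, "guam")
```

Note that the id is a string in the JSON, so it compares cleanly against the exporter and importer fields read from the CSV as text.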

Here is an example hierarchy of product codes from `classificationH0.json`:
```
	{
		"id": "01",
		"text": "01 - Animals; live",
		"parent": "TOTAL"
	},
	{
		"id": "0101",
		"text": "0101 - Horses, asses, mules and hinnies; live",
		"parent": "01"
	},
	{
		"id": "010111",
		"text": "010111 - Horses; live, pure-bred breeding animals",
		"parent": "0101"
	},
```
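
Since every entry points at its parent, a 6-digit product code can be rolled up to its 4-digit heading and 2-digit chapter by following the `parent` links until `TOTAL` is reached. The sketch below uses only the three entries shown above; `ancestors` and `top_level` are hypothetical helper names, not part of any library.

```python
entries = [
    {"id": "01", "text": "01 - Animals; live", "parent": "TOTAL"},
    {"id": "0101", "text": "0101 - Horses, asses, mules and hinnies; live", "parent": "01"},
    {"id": "010111", "text": "010111 - Horses; live, pure-bred breeding animals", "parent": "0101"},
]

# Map each code to its parent code for quick lookups.
parent_of = {entry["id"]: entry["parent"] for entry in entries}


def ancestors(code):
    """Return [code, parent, grandparent, ...], stopping before "TOTAL"."""
    chain = []
    while code in parent_of:
        chain.append(code)
        code = parent_of[code]
    return chain


def top_level(code):
    """Return the 2-digit chapter for a code, or None if the code is unknown."""
    chain = ancestors(code)
    return chain[-1] if chain else None


chain = ancestors("010111")
chapter = top_level("010111")
```

Grouping product values by `top_level` would give the coarse blocks of a treemap, while the full codes give the individual tiles.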