# Taxonomies - Dow Jones Intelligent Identifiers (DJID)
News content in Dow Jones is classified across multiple taxonomies such as `regions`, `industries`, `subjects` (also known as topics), `companies` and `executives`, among others.

This notebook shows how to interact with these taxonomies to convert codes to human-readable text or vice versa.

In this notebook...
* [Dependencies and Initialisation](#Dependencies-and-Initialisation)
* [The Taxonomy Class](#The-Taxonomy-Class)
* [Loading Taxonomies into Pandas DataFrames](#Loading-Taxonomies-into-Pandas-DataFrames)
* [Combining with Snapshots Data](#Combining-with-Snapshots-Data)

## Dependencies and Initialisation
Import statements and environment initialisation using the package `dotenv`. More details in the [Configuration notebook](0.2_configuration.ipynb).

In [1]:
from factiva.news import Taxonomy
from dotenv import load_dotenv
load_dotenv()
print('Done!')

Done!


## The Taxonomy Class

The `Taxonomy` class interacts with the API and therefore requires authentication through the `user_key` parameter:
- **user_key**: str, UserKey, optional. (default None)
        String containing the 32-character long API key. If not provided, the
        constructor will try to obtain its value from the FACTIVA_USERKEY
        environment variable.
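
The lookup order described above (explicit value first, then the environment variable) can be sketched as follows. This is only an illustration: `resolve_user_key` is a hypothetical helper, not part of the `factiva.news` package, and the length check simply mirrors the 32-character format mentioned in the parameter description.

```python
import os


def resolve_user_key(user_key=None, env_var="FACTIVA_USERKEY"):
    """Hypothetical helper: return the explicit key, else the env variable value."""
    key = user_key if user_key is not None else os.environ.get(env_var)
    if key is None:
        raise ValueError(f"No user key given and {env_var} is not set")
    if len(key) != 32:
        # Assumption: keys are always 32 characters long, as stated above.
        raise ValueError("Expected a 32-character user key")
    return key


# Possible usage (requires the factiva.news package and a valid key):
# t = Taxonomy(user_key=resolve_user_key())
```

Failing early with a clear message may be easier to debug than an authentication error returned by the API.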

To get started, list the available taxonomies with the function `get_categories()`. The executed cell further down reads the same list from the `categories` attribute.

Examples:

```Python
# Create a new Taxonomy instance, reading user_key from the FACTIVA_USERKEY env variable.
t = Taxonomy()
# Alternatively, pass the key value explicitly.
t = Taxonomy(user_key='abcd1234abcd1234abcd1234abcd1234')
# List the available taxonomy categories (an attribute, so no parentheses).
t.categories

['news_subjects',
 'regions',
 'hierarchyRegion',
 'companies',
 'hierarchyIndustry',
 'industries',
 'executives',
 'hierarchySubject']

```

In [2]:
t = Taxonomy()
t.categories

['news_subjects',
 'regions',
 'hierarchyRegion',
 'companies',
 'hierarchyIndustry',
 'industries',
 'executives',
 'hierarchySubject']

## Loading Taxonomies into Pandas DataFrames
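Assign the output of `get_category_codes(category: str)` to a variable. As the output below shows, the result is a Pandas DataFrame indexed by `code` with a single `description` column.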

```Python
hr = t.get_category_codes('hierarchyRegion')
```

>**Note**: The category `executives` returns a very large dataset that may crash the Python kernel. Alternatives are under analysis (Jul 2021).
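
One way to avoid loading a very large category by accident is a small guard around the loader. The sketch below is an assumption-laden illustration: `load_category` and `LARGE_CATEGORIES` are hypothetical names, and the loader is passed in as a callable so the guard itself does not depend on the API.

```python
# Categories flagged as too large to load casually (per the note above).
LARGE_CATEGORIES = {"executives"}


def load_category(loader, category, available, allow_large=False):
    """Hypothetical guard: validate the category name before calling the loader."""
    if category not in available:
        raise ValueError(f"Unknown category: {category!r}")
    if category in LARGE_CATEGORIES and not allow_large:
        raise RuntimeError(
            f"{category!r} is very large; pass allow_large=True to load it anyway"
        )
    return loader(category)


# Possible usage with a Taxonomy instance `t`:
# regions = load_category(t.get_category_codes, "regions", t.categories)
```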

In [3]:
tc = t.get_category_codes('regions')
print(tc.shape)
tc.head()

(831, 1)


| code | description |
|--------|-------------|
| ABIDJA | Abidjan |
| ABKHAZ | Abkhazia |
| ABUDH | Abu Dhabi |
| ABUJA | Abuja |
| ACCRA | Accra |
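
Given a DataFrame with this shape, converting codes to descriptions and back is a matter of index lookups. The sketch below rebuilds the five rows shown above as `tc_sample` (a stand-in for `tc`, so it runs without the API); the code `ZZZZ` is made up to show what happens with an unknown value.

```python
import pandas as pd

# Stand-in for `tc`, built from the five rows displayed above.
tc_sample = pd.DataFrame(
    {"description": ["Abidjan", "Abkhazia", "Abu Dhabi", "Abuja", "Accra"]},
    index=pd.Index(["ABIDJA", "ABKHAZ", "ABUDH", "ABUJA", "ACCRA"], name="code"),
)

# Code -> description: look up the index label.
name = tc_sample.loc["ABUDH", "description"]

# Description -> code: swap the roles of index and column.
by_description = tc_sample.reset_index().set_index("description")["code"]
code = by_description["Accra"]

# Translate several codes at once; unknown codes (here the made-up ZZZZ) become NaN.
translated = tc_sample["description"].reindex(["ABUJA", "ACCRA", "ZZZZ"])
print(name, code)
print(translated)
```

The reverse lookup assumes descriptions are unique within the category; if they are not, `by_description[...]` returns several codes instead of one.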


In [4]:
# Save to CSV
tc.to_csv('regions.csv')
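
A saved file can later be read back without calling the API again. The sketch below uses an in-memory string in place of `regions.csv` so it is self-contained, and assumes the file has the `code,description` header that `to_csv` writes for a DataFrame indexed by `code`. Passing `keep_default_na=False` is a precaution: it keeps strings such as `NA` from being parsed as missing values.

```python
import io

import pandas as pd

# In-memory stand-in for the saved 'regions.csv' file (two rows from the output above).
csv_text = "code,description\nABIDJA,Abidjan\nABUJA,Abuja\n"

regions = pd.read_csv(
    io.StringIO(csv_text),   # replace with 'regions.csv' to read the real file
    index_col="code",        # restore the code index
    dtype=str,               # keep codes and descriptions as plain strings
    keep_default_na=False,   # do not turn strings like "NA" into NaN
)
print(regions.loc["ABUJA", "description"])
```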

## Combining with Snapshots Data
(Depends on Sample data)
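
Until sample data is available, the general idea can be sketched with made-up rows. Everything about the `articles` frame below is an assumption for illustration: the column names `an` and `region_codes`, the document identifiers, and the comma-delimited lowercase code format may not match real Snapshots data. The region rows are taken from the output shown earlier.

```python
import pandas as pd

# Taxonomy rows taken from the regions output shown earlier.
regions = pd.DataFrame(
    {"description": ["Abidjan", "Abuja", "Accra"]},
    index=pd.Index(["ABIDJA", "ABUJA", "ACCRA"], name="code"),
)

# Hypothetical snapshot rows: identifiers and code format are assumptions.
articles = pd.DataFrame(
    {
        "an": ["DOC-1", "DOC-2"],
        "region_codes": [",abuja,accra,", ",abidja,"],
    }
)

# Split the code string into a list, then produce one row per (document, code).
codes = articles["region_codes"].str.strip(",").str.upper().str.split(",")
exploded = articles.assign(code=codes).explode("code")

# Attach the human-readable description for each code.
labelled = exploded.merge(regions, left_on="code", right_index=True, how="left")

# Collect the descriptions back into one list per document.
labels = labelled.groupby("an")["description"].agg(list)
print(labels)
```

A left merge keeps documents whose codes are missing from the taxonomy, leaving their description as NaN rather than dropping the row.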