# Taxonomies - Dow Jones Intelligent Identifiers (DJID)
News content in Dow Jones is classified across multiple taxonomies, such as `regions`, `industries`, `subjects` (also known as topics), `companies` and `executives`, among others.

This notebook shows how to interact with these taxonomies to convert codes to human-readable values, or vice versa.

In this notebook...
* [Dependencies and Initialisation](#dependencies-and-initialisation)
* [The Taxonomy Class](#the-taxonomy-class)
* [Loading Taxonomies into Pandas DataFrames](#loading-taxonomies-into-pandas-dataframes)
* [Export to a file](#export-to-a-file)
* [Next Steps](#next-steps)

## Dependencies and Initialisation
Import statements and environment initialisation using the package `dotenv`. More details are available in the [Configuration notebook](0.2_configuration.ipynb).

In [2]:
from factiva.news import Taxonomy
from dotenv import load_dotenv
load_dotenv()
print('Done!')

Done!
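
Because the `Taxonomy` class reads the user key from the environment variable `FACTIVA_USERKEY` by default, it can help to confirm that the variable is actually set before calling the API. The following is a minimal sketch using only the standard library. The helper `require_env` is hypothetical and not part of the `factiva` package.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not part of the factiva package.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Environment variable {name} is not set. "
            "Check the .env file loaded by load_dotenv()."
        )
    return value


# Report the problem instead of failing later inside an API call.
try:
    require_env("FACTIVA_USERKEY")
    print("User key found.")
except RuntimeError as err:
    print(err)
```

Running this check right after `load_dotenv()` surfaces a missing or empty key early, with a message that points at the configuration rather than at the API call.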


## The Taxonomy Class

The `Taxonomy` class interacts with the API and therefore requires authentication through the `user_key` parameter. By default, its value is taken from the environment variable `FACTIVA_USERKEY`.

To get started, list the available taxonomies using the attribute `categories`.


In [3]:
t = Taxonomy()
t.categories

['news_subjects',
 'regions',
 'hierarchyRegion',
 'companies',
 'hierarchyIndustry',
 'industries',
 'executives',
 'hierarchySubject']

## Loading Taxonomies into Pandas DataFrames
Assign the output of `get_category_codes(category: str)` to a variable to load a taxonomy as a Pandas DataFrame indexed by code.

```Python
hr = t.get_category_codes('industries')
```

>**Note**: The category `executives` returns a dataset large enough that loading it may crash the Python kernel.

In [8]:
taxonomy_name = 'industries'
tc = t.get_category_codes(taxonomy_name)
print(tc.shape)
tc.head()

(960, 1)


| code | description |
|---|---|
| i3dprn | 3D/4D Printing |
| i246 | Abrasive Products |
| i4752105 | Academic/Scientific/Trade Journals |
| i836 | Accounting |
| iacc | Accounting/Consulting |

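With the taxonomy in a DataFrame, converting between codes and human-readable values becomes a lookup. The sketch below assumes the structure seen in the output above (an index named `code` and a single `description` column) and rebuilds those five rows by hand so it runs without API access. The helper `find_codes` is hypothetical and only illustrates one way to search descriptions.

```python
import pandas as pd

# Stand-in for the DataFrame returned by t.get_category_codes('industries').
# Assumption: index named `code`, one column named `description`.
tc = pd.DataFrame(
    {
        "description": [
            "3D/4D Printing",
            "Abrasive Products",
            "Academic/Scientific/Trade Journals",
            "Accounting",
            "Accounting/Consulting",
        ]
    },
    index=pd.Index(["i3dprn", "i246", "i4752105", "i836", "iacc"], name="code"),
)

# Code -> description: a plain dictionary is enough for repeated lookups.
code_to_desc = tc["description"].to_dict()
label = code_to_desc.get("i836")


def find_codes(frame: pd.DataFrame, text: str) -> list:
    """Return the codes whose description contains `text` (case-insensitive)."""
    mask = frame["description"].str.contains(text, case=False, regex=False)
    return frame.index[mask].tolist()


# Description -> codes: several codes may match the same search text.
matches = find_codes(tc, "accounting")

print(label)
print(matches)
```

The reverse lookup returns a list because descriptions are not guaranteed to be unique or to match only one code. An exact-match lookup would use `frame["description"] == text` instead of `str.contains`.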

## Export to a file
For convenience when working with other systems, or when loading a dictionary into a database table, the Pandas `DataFrame.to_*` methods are useful for exporting the loaded data.

In [9]:
# Save to CSV
tc.to_csv(f'./data/{taxonomy_name}.csv')

In [22]:
# Save to JSON
tc.to_json(f'./data/{taxonomy_name}.json', orient='table')
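
The export cells above assume that the target folder already exists. A minimal sketch of creating the folder first and then reading the files back is shown below. It uses a small stand-in DataFrame and a temporary folder instead of `./data` so that it runs on its own. The two-row sample and the temporary path are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Stand-in for `tc`; assumption: index named `code`, column `description`.
tc = pd.DataFrame(
    {"description": ["Accounting", "Accounting/Consulting"]},
    index=pd.Index(["i836", "iacc"], name="code"),
)
taxonomy_name = "industries"

# A temporary folder keeps the sketch self-contained; the notebook uses ./data.
out_dir = Path(tempfile.mkdtemp()) / "data"
# to_csv raises an error if the folder is missing, so create it first.
out_dir.mkdir(parents=True, exist_ok=True)

csv_path = out_dir / f"{taxonomy_name}.csv"
json_path = out_dir / f"{taxonomy_name}.json"
tc.to_csv(csv_path)
tc.to_json(json_path, orient="table")

# Read both files back, restoring `code` as the index.
from_csv = pd.read_csv(csv_path, index_col="code")
from_json = pd.read_json(json_path, orient="table")

print(from_csv.loc["i836", "description"])
print(from_json.loc["iacc", "description"])
```

The `orient='table'` format stores a schema next to the data, so `read_json` can restore the `code` index without extra arguments, while the CSV needs `index_col` to do the same.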

## Next Steps

* Understand how [companies](1.3_company_identifiers.ipynb) are identified and how those can be mapped to a predefined list.
* Run an [explain](1.4_snapshot_explain.ipynb) to get a quick idea about how to query the archive.
* Explore the available fields in the archive and how to apply conditions on them according to the [query reference](2.1_complex_large_queries.ipynb).