# Taxonomy - Industries
This notebook retrieves the full list of industries used by Factiva. The returned codes can be added to the Retrieval API payload.

## Code Initialisation
Dependencies and environment initialisation. Taxonomy requests require authentication.

Ensure there's a `.env` file with your credentials in the same directory as this notebook. Use the `.env.example` file as a template.

In [15]:
import os
import requests as r
import pandas as pd
from IPython.display import Markdown, display
import utils as u
from dotenv import load_dotenv

load_dotenv()

True

## Constants

In [16]:
API_HOST = 'api.dowjones.com'
AUTH_HOST = 'accounts.dowjones.com'
CLIENT_ID = os.getenv('FACTIVA_CLIENTID')
USERNAME = os.getenv('FACTIVA_USERNAME')
PASSWORD = os.getenv('FACTIVA_PASSWORD')
AUTH_URL = f"https://{AUTH_HOST}/oauth2/v1/token"
IND_URL = f"https://{API_HOST}/taxonomy/factiva-industries/list"
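
Because `os.getenv` returns `None` for an unset variable, a missing entry in `.env` only surfaces later as an authentication failure. The following is a minimal sketch of an early check, assuming the three variable names used in the constants above. The helper `missing_env_vars` is hypothetical and is not part of `utils.py`.

```python
import os

# Variable names taken from the constants cell above.
REQUIRED_VARS = ("FACTIVA_CLIENTID", "FACTIVA_USERNAME", "FACTIVA_PASSWORD")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given mapping.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing credentials in .env: {', '.join(missing)}")
```

Running this right after `load_dotenv()` points at the exact variable that is missing before any request is sent.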

## Authentication - Generate Bearer

For details about getting the `bearer_token`, please see the `utils.py` file.

In [17]:
bearer_token = u.get_bearer_token(CLIENT_ID, USERNAME, PASSWORD, AUTH_URL)
# Show only the leading part of the username (before any '@' or '-');
# fall back to an empty string if FACTIVA_USERNAME is not set
user_label = (USERNAME or '').split('@')[0].split('-')[0]
if bearer_token:
    display(Markdown(f"**Authentication Successful**: Bearer token created for user {user_label}"))
else:
    display(Markdown(f"**Authentication Failed**: Cannot obtain the Bearer token for the user {user_label}"))

req_headers = {
    "Authorization": f"Bearer {bearer_token}",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

**Authentication Successful**: Bearer token created for user 9ZZZ159100

## Taxonomy API Request

In [18]:
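# Request the industry list in English
ind_response = r.get(f"{IND_URL}?language=en&parts=All", headers=req_headers)
# Stop early on an HTTP error instead of failing on the JSON lookup below
ind_response.raise_for_status()
ind_dict = ind_response.json()['data']['attributes']['industries']
# Flatten each industry record into code, name, description and parent codes
flat_ind = []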
for item in ind_dict:
    parents = []
    if 'parent' in item and item['parent']:
        # parent can be a dict or a list of dicts
        if isinstance(item['parent'], dict):
            parents.append(item['parent'].get('code'))
        elif isinstance(item['parent'], list):
            parents = [p.get('code') for p in item['parent'] if 'code' in p]
    flat_ind.append({
        'ind_code': item.get('code'),
        'ind_name': item.get('descriptor'),
        'description': item.get('description'),
        'parents': parents
    })
ind_df = pd.DataFrame(flat_ind)
if ind_df.shape[0] > 5:
    display(Markdown("**Industries Retrieved Successfully**"))
    display(Markdown(f"Returned {ind_df.shape[0]} industries"))
else:
    display(Markdown("**Industries Retrieval Failed**"))

**Industries Retrieved Successfully**

Returned 949 industries
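
The flattening loop above handles a `parent` field that may be absent, a single dict, or a list of dicts. The same normalisation can be sketched as a standalone function and tried on hand-written records. The sample records below are made up to mirror the shapes the loop expects and are not actual API output. `parent_codes` is a hypothetical helper name, and unlike the loop it also skips a parent entry that has no `code`.

```python
def parent_codes(item):
    """Return the parent codes of one industry record as a list."""
    parent = item.get('parent')
    if not parent:
        return []
    # Treat a single parent dict as a one-element list
    if isinstance(parent, dict):
        parent = [parent]
    if isinstance(parent, list):
        return [p['code'] for p in parent if isinstance(p, dict) and p.get('code')]
    return []


# Made-up records covering the three shapes: no parent, dict, list of dicts
samples = [
    {'code': 'itech'},
    {'code': 'i3302', 'parent': {'code': 'itech'}},
    {'code': 'iadrive', 'parent': [{'code': 'iaut'}, {'code': 'itech'}]},
]
for item in samples:
    print(item['code'], parent_codes(item))
```

Keeping `parents` as a list in every row, even when empty, is what lets the filters in the next section call `len(x)` and `'itech' in x` without special cases.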

## Displaying and Filtering Industries

### Display & Filter

In [19]:
# Show all
# ind_df
# Top Industries in the hierarchy
# ind_df[ind_df['parents'].apply(lambda x: len(x) == 0)]
# By industry code
# ind_df[ind_df.ind_code == 'i3302022']
# Filter by parent
ind_df[ind_df['parents'].apply(lambda x: 'itech' in x)]

| | ind_code | ind_name | description | parents |
|---|---|---|---|---|
| 151 | i3302 | Computers/Consumer Electronics | The manufacture of computers and consumer elec... | [itech] |
| 160 | i3302022 | Artificial Intelligence Technologies | Artificial Intelligence Technologies | [itech] |
| 171 | i3441 | Telecommunications Equipment | Equipment and components used to enable the pr... | [itech] |
| 216 | i3dprn | 3D/4D Printing Technology | 3D and 4D printing technologies. Includes biop... | [itech] |
| 559 | iadrive | Autonomous Driving Technologies | Technologies that assist the driver in some as... | [iaut, itech] |
| 565 | iagtech | Agriculture Technology | Technologies used in agriculture, aquaculture,... | [itech] |
| 597 | iblock | Blockchain Technology | Blockchain is a decentralized, digital databas... | [itech] |
| 678 | ielear | E-learning/Educational Technology | Software and online platforms that provide onl... | [i983, itech] |
| 701 | ifmsoft | Financial Technology | FinTech is the design and provision of technol... | [ifinal, itech] |
| 742 | iindele | Industrial Electronics | Electronic equipment, parts and components for... | [iindstrls, itech] |
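
Because `parents` holds a list in each row, the filter above has to test membership with `apply`. An alternative is to expand the lists with `DataFrame.explode`, which gives one row per (industry, parent) pair and allows ordinary column comparisons. The following is a minimal sketch on a three-row stand-in for `ind_df`, using codes from the output above. The empty parent list for `itech` is assumed for illustration.

```python
import pandas as pd

# Small stand-in for ind_df (illustrative rows, not the full API result)
sample_df = pd.DataFrame({
    'ind_code': ['i3302', 'iadrive', 'itech'],
    'parents': [['itech'], ['iaut', 'itech'], []],
})

# One row per (industry, parent) pair; an empty list becomes a single NaN row
pairs = sample_df.explode('parents').rename(columns={'parents': 'parent'})

# Children of a given parent, using a plain equality filter
children_of_itech = pairs.loc[pairs['parent'] == 'itech', 'ind_code'].tolist()

# Number of direct children per parent code (rows with a NaN parent are ignored)
child_counts = pairs.groupby('parent')['ind_code'].count()

print(children_of_itech)
print(child_counts)
```

The exploded form repeats an industry once per parent, so it suits parent-level questions, while the original list column keeps one row per industry.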


### Search Keywords

In [22]:
# By industry name
ind_df[ind_df.ind_name.str.contains('artificial intelligence', case=False)]
# By industry description
# ind_df[ind_df.description.str.contains('artificial intelligence', case=False)]

| | ind_code | ind_name | description | parents |
|---|---|---|---|---|
| 160 | i3302022 | Artificial Intelligence Technologies | Artificial Intelligence Technologies | [itech] |
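
One caveat applies to the description search. The flattening step uses `item.get('description')`, so a record without a description would yield `None`, and `str.contains` then returns a missing value for that row, which boolean indexing rejects. Whether such records occur in the actual response is not confirmed here. The following is a minimal sketch on made-up rows (the code `demo_no_desc` is invented for the example) showing `na=False`, which treats missing text as a non-match, and `regex=False`, which matches the keyword literally.

```python
import pandas as pd

# Illustrative rows; the last one simulates a record without a description
sample_df = pd.DataFrame({
    'ind_code': ['i3302022', 'iblock', 'demo_no_desc'],
    'description': [
        'Artificial Intelligence Technologies',
        'Blockchain is a decentralized, digital databas...',
        None,
    ],
})

# na=False maps missing descriptions to False so the mask is purely boolean;
# regex=False avoids surprises if the keyword contains characters like '.' or '('
mask = sample_df['description'].str.contains(
    'artificial intelligence', case=False, regex=False, na=False
)
matches = sample_df[mask]
print(matches['ind_code'].tolist())
```

The same two arguments can be passed to the `ind_name` search if names may be missing as well.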
