# Taxonomy - News Subjects
This notebook retrieves the full list of News Subjects used by Factiva. The returned codes can be added to the Retrieval API payload.

## Code Initialisation
Dependencies and environment initialisation. Taxonomy requests require authentication.

Ensure there's a `.env` file with your credentials in the same directory as this notebook. Use the `.env.example` file as a template.

In [5]:
import os
import requests as r
import pandas as pd
from IPython.display import Markdown, display
import utils as u
from dotenv import load_dotenv

load_dotenv()

True
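Since `load_dotenv()` returns `True` whenever it finds and loads a `.env` file, it does not confirm that every credential is actually present. A minimal sketch of an extra check follows, assuming the three variable names read in the Constants cell; `missing_env_vars` is a hypothetical helper, not part of `utils.py`.

```python
import os

# Variable names taken from the Constants cell of this notebook.
REQUIRED_VARS = ("FACTIVA_CLIENTID", "FACTIVA_USERNAME", "FACTIVA_PASSWORD")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Run this after load_dotenv() so values from .env are already in os.environ.
missing = missing_env_vars()
if missing:
    print(f"Missing credentials in .env: {', '.join(missing)}")
```

Passing a plain dict as `env` makes the helper easy to try without touching the real environment.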

## Constants

In [6]:
API_HOST = 'api.dowjones.com'
AUTH_HOST = 'accounts.dowjones.com'
CLIENT_ID = os.getenv('FACTIVA_CLIENTID')
USERNAME = os.getenv('FACTIVA_USERNAME')
PASSWORD = os.getenv('FACTIVA_PASSWORD')
AUTH_URL = f"https://{AUTH_HOST}/oauth2/v1/token"
NS_URL = f"https://{API_HOST}/taxonomy/factiva-news-subjects/list"

## Authentication - Generate Bearer

For details about getting the `bearer_token`, please see the `utils.py` file.

In [7]:
bearer_token = u.get_bearer_token(CLIENT_ID, USERNAME, PASSWORD, AUTH_URL)
# Show only the account prefix of the username; tolerate a missing FACTIVA_USERNAME
user_label = (USERNAME or '').split('@')[0].split('-')[0]
if bearer_token:
    display(Markdown(f"**Authentication Successful**: Bearer token created for user {user_label}"))
else:
    display(Markdown(f"**Authentication Failed**: Cannot obtain the Bearer token for the user {user_label}"))

req_headers = {
    "Authorization": f"Bearer {bearer_token}",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

**Authentication Successful**: Bearer token created for user 9ZZZ159100

## Taxonomy API Request

In [8]:
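# A list value is sent as a repeated query parameter: parts=...&parts=...
ns_params = {"language": "en", "parts": ["Descriptor", "ParentDescriptor", "Description"]}
ns_response = r.get(NS_URL, headers=req_headers, params=ns_params)
ns_response.raise_for_status()  # stop on HTTP errors instead of failing on a missing key below
ns_dict = ns_response.json()['data']['attributes']['news_subjects']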
flat_ns = []
for item in ns_dict:
    parents = []
    if 'parent' in item and item['parent']:
        # parent can be a dict or a list of dicts
        if isinstance(item['parent'], dict):
            parents.append(item['parent'].get('code'))
        elif isinstance(item['parent'], list):
            parents = [p.get('code') for p in item['parent'] if 'code' in p]
    flat_ns.append({
        'ns_code': item.get('code'),
        'ns_name': item.get('descriptor'),
        'description': item.get('description'),
        'parents': parents
    })
ns_df = pd.DataFrame(flat_ns)
if ns_df.shape[0] > 5:
    display(Markdown("**News Subjects Retrieved Successfully**"))
    display(Markdown(f"Returned {ns_df.shape[0]} news subjects"))
else:
    display(Markdown("**News Subjects Retrieval Failed**"))

**News Subjects Retrieved Successfully**

Returned 913 news subjects
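The flattening loop above treats `parent` as either a single dict or a list of dicts. The same normalisation can be sketched as a small standalone function, which is easier to test in isolation. `parent_codes` is a hypothetical helper, and the sample entries are illustrative only: their shape is inferred from the loop, not from the API documentation. Unlike the loop, this version drops a parent that has no `code` instead of storing `None`.

```python
def parent_codes(item):
    """Return the parent codes of one news-subject entry as a list (hypothetical helper)."""
    parent = item.get("parent")
    if not parent:
        return []
    # Wrap a single dict so both shapes go through the same path.
    candidates = [parent] if isinstance(parent, dict) else parent
    return [p["code"] for p in candidates if isinstance(p, dict) and p.get("code")]


# Illustrative entries only; "XTOP" is a made-up code for a subject with no parent.
sample = [
    {"code": "GABOR", "descriptor": "Abortion", "parent": {"code": "GTREA"}},
    {"code": "C181", "descriptor": "Acquisitions/Mergers/Shareholdings", "parent": [{"code": "C18"}]},
    {"code": "XTOP", "descriptor": "Hypothetical top-level subject"},
]

flat = [
    {"ns_code": i.get("code"), "ns_name": i.get("descriptor"), "parents": parent_codes(i)}
    for i in sample
]
```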

## Displaying and filtering News Subjects

### Display

In [9]:
ns_df.head(5)

| | ns_code | ns_name | description | parents |
|---|---|---|---|---|
| 0 | GABOR | Abortion | The non-spontaneous termination of a pregnancy... | [GTREA] |
| 1 | NABST | Abstracts | Brief non-evaluative summaries of longer docum... | [NCAT] |
| 2 | CACQU | Acquisitions/Mergers | Acquisitions, takeovers and mergers of compani... | [C181] |
| 3 | C181 | Acquisitions/Mergers/Shareholdings | Acquisitions, mergers, or takeovers of compani... | [C18] |
| 4 | GADR | Adverse Drug Reactions | Unintended and noxious reactions caused by the... | [GHEA] |


### Search Keywords

In [10]:
# By ns_code
# ns_df[ns_df.ns_code == 'C151']
# By ns_name (substring match: 'earning' also matches 'Learning' and 'E-learning')
ns_df[ns_df.ns_name.str.contains('earning', case=False)].head(5)
# By description
# ns_df[ns_df.description.str.contains('earning', case=False)].head(5)

| | ns_code | ns_name | description | parents |
|---|---|---|---|---|
| 53 | GAIML | Artificial Intelligence/Machine Learning | An area of computer science that deals with th... | [GCSCI] |
| 276 | C151 | Earnings | Announcements of the earnings of a company or ... | [C15] |
| 277 | C152 | Earnings Projections | Covers projections of earnings or sales figure... | [C15] |
| 278 | C1514 | Earnings Surprises | Reported earnings significantly above or below... | [C151] |
| 290 | GELEAR | E-learning | Education or training delivered partially or e... | [GEDU] |
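The plain substring search also returns "Machine Learning" and "E-learning", because both contain the letters `earning`. One possible refinement, sketched here on a small stand-in DataFrame built from the rows shown above, is a regular expression with a word boundary so that only names where a word starts with "earning" are kept. `sample_df` is an illustrative name; with the real data the same mask can be applied to `ns_df`.

```python
import pandas as pd

# Small stand-in for ns_df, using the names and codes from the output above.
sample_df = pd.DataFrame({
    "ns_code": ["GAIML", "C151", "C152", "C1514", "GELEAR"],
    "ns_name": [
        "Artificial Intelligence/Machine Learning",
        "Earnings",
        "Earnings Projections",
        "Earnings Surprises",
        "E-learning",
    ],
})

# \b requires 'earning' to start a word, so 'Learning' and 'E-learning' do not match.
# na=False treats missing names as non-matches instead of propagating NaN into the mask.
mask = sample_df.ns_name.str.contains(r"\bearning", case=False, regex=True, na=False)
matches = sample_df[mask]
print(matches.ns_code.tolist())
```

The trade-off is that a word-boundary pattern is stricter: it would also skip a name such as "Pre-earnings" only if "earnings" did not begin a word, so it is worth checking the results against the plain substring search when exploring the taxonomy.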
