# Taxonomy - Sources Search
This notebook helps retrieve specific source codes based on a set of search parameters. The returned codes can be added to the Retrieval API payload.

## Code Initialization
Dependencies and environment initialization. Taxonomy requests require authentication.

Make sure there is a `.env` file with your credentials in the same directory as this notebook. Use the `.env.example` file as a template.

In [1]:
import os
import requests as r
import pandas as pd
from IPython.display import Markdown, display
import utils as u
from dotenv import load_dotenv

load_dotenv()

True

## Constants

In [2]:
API_HOST = 'api.dowjones.com'
AUTH_HOST = 'accounts.dowjones.com'
CLIENT_ID = os.getenv('FACTIVA_CLIENTID')
USERNAME = os.getenv('FACTIVA_USERNAME')
PASSWORD = os.getenv('FACTIVA_PASSWORD')
AUTH_URL = f"https://{AUTH_HOST}/oauth2/v1/token"
SRC_URL = f"https://{API_HOST}/taxonomy/factiva-sources/search"
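If any of the three `FACTIVA_*` variables is missing, `os.getenv` returns `None`, and the problem only surfaces later: the token request fails, or `USERNAME.split(...)` raises an `AttributeError`. A minimal sketch of an early check, assuming the same variable names as the cell above; `require_env` is a hypothetical helper, not part of `utils.py`:

```python
import os


def require_env(*names):
    """Return the values of the given environment variables, raising if any is unset or empty."""
    # Hypothetical helper: collect every missing name so the error lists all of them at once
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return tuple(os.getenv(name) for name in names)


# Usage sketch: fail fast before any request is sent
# CLIENT_ID, USERNAME, PASSWORD = require_env('FACTIVA_CLIENTID', 'FACTIVA_USERNAME', 'FACTIVA_PASSWORD')
```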

## Authentication - Generate Bearer

For details about getting the `bearer_token`, please see the `utils.py` file.

In [3]:
bearer_token = u.get_bearer_token(CLIENT_ID, USERNAME, PASSWORD, AUTH_URL)
if bearer_token:
    display(Markdown(f"**Authentication Successful**: Bearer token created for user {USERNAME.split('@')[0].split('-')[0]}"))
else:
    display(Markdown(f"**Authentication Failed**: Cannot obtain the Bearer token for the user {USERNAME.split('@')[0].split('-')[0]}"))
    
req_headers = {
    "Authorization": f"Bearer {bearer_token}",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

**Authentication Successful**: Bearer token created for user 9ZZZ159100
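The authentication cell builds `req_headers` even when authentication fails, in which case the header becomes `Bearer None` and every later request is rejected. A minimal sketch of a guard, assuming `get_bearer_token` returns a falsy value on failure (as the `if bearer_token:` check implies); `build_headers` is a hypothetical helper:

```python
def build_headers(token):
    """Build the JSON request headers used by the Taxonomy calls, refusing an empty token."""
    if not token:
        # Stop here instead of sending 'Bearer None' to the API
        raise ValueError("No bearer token available; check the credentials in .env")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
```

With this helper, the header dictionary in the cell would become `req_headers = build_headers(bearer_token)`, so a failed login stops the notebook at the authentication step.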

## Source Search Keyword

In [4]:
source_keyword = "wsj"

## Taxonomy API Request

In [5]:
# Pass the filters via `params` so requests URL-encodes each value, including the keyword
src_params = {'language': 'en', 'filter.search_string': source_keyword, 'filter.alias_only': 'false',
              'filter.exclude_discontinued': 'true', 'filter.search_in_groups': 'false'}
src_response = r.get(SRC_URL, params=src_params, headers=req_headers)
if src_response.status_code == 200:
    src_json = src_response.json()  # parse the body once
    src_dict = src_json['data']
    src_count = src_json['meta']['total_count']
    display(Markdown(f"Found: {src_count} results for the keyword '{source_keyword}'"))
else:
    display(Markdown(f"**Error**: {src_response.status_code} - {src_response.reason}"))


Found: 14 results for the keyword 'wsj'
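The request sends five query parameters. A short sketch of how they serialize into the query string, using the standard library's `urllib.parse.urlencode` with the same names and values as the request cell (`SRC_URL` is repeated so the block runs on its own; `build_search_url` is a hypothetical helper). This is the same encoding `requests` applies to values passed through its `params` argument, so keywords containing spaces or `&` cannot split the query string:

```python
from urllib.parse import urlencode

SRC_URL = "https://api.dowjones.com/taxonomy/factiva-sources/search"


def build_search_url(keyword):
    """Return the full Taxonomy source-search URL for a keyword, with every value URL-encoded."""
    # Parameter names and values copied from the request cell
    params = {
        "language": "en",
        "filter.search_string": keyword,
        "filter.alias_only": "false",
        "filter.exclude_discontinued": "true",
        "filter.search_in_groups": "false",
    }
    return f"{SRC_URL}?{urlencode(params)}"


print(build_search_url("wall street & co"))
```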

Process the results and flatten them into a pandas DataFrame.

In [6]:
flat_src = []
for item in src_dict:
    attrs = item.get('attributes') or {}
    # 'status' and 'alternative_names' may be missing; fall back to empty dicts
    status = attrs.get('status') or {}
    alt_names = attrs.get('alternative_names') or {}
    flat_src.append({
        'source_code': attrs.get('code'),
        'source_name': attrs.get('descriptor'),
        'status': status.get('active_status', ''),
        'common_name': alt_names.get('common_name', ''),
        'local_name': alt_names.get('local_name', '')
    })
src_df = pd.DataFrame(flat_src)
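The flattening step assumes each item in `data` nests its fields under `attributes`, with `status` and `alternative_names` as sub-objects that may be absent. A self-contained sketch of the same mapping applied to a hand-written record; `flatten_source` is a hypothetical helper, and the record is illustrative, shaped after the keys the cell reads rather than captured from the API:

```python
def flatten_source(item):
    """Map one nested source record to the flat row layout used for the DataFrame."""
    attrs = item.get("attributes") or {}
    status = attrs.get("status") or {}
    alt_names = attrs.get("alternative_names") or {}  # may be missing or null
    return {
        "source_code": attrs.get("code"),
        "source_name": attrs.get("descriptor"),
        "status": status.get("active_status", ""),
        "common_name": alt_names.get("common_name", ""),
        "local_name": alt_names.get("local_name", ""),
    }


# Hypothetical record without 'alternative_names'; both name columns fall back to ''
sample = {"attributes": {"code": "J", "descriptor": "The Wall Street Journal",
                         "status": {"active_status": "Active"}}}
print(flatten_source(sample))
```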

## Display Sources Results

In [7]:
src_df

| | source_code | source_name | status | common_name | local_name |
|---|---|---|---|---|---|
| 0 | WSJCOM | Buy Side from WSJ | Active | Buy Side from WSJ | Buy Side from WSJ |
| 1 | WSJCN | The Wall Street Journal Online (Chinese - Simp... | Active | The Wall Street Journal Online (Chinese Langua... | 华尔街日报中文版 (简体) |
| 2 | WSJOJP | The Wall Street Journal Online | Active | The Wall Street Journal Online (Japanese Langu... | ウォール・ストリート・ジャーナル日本版 |
| 3 | WSJO | The Wall Street Journal Online | Active | The Wall Street Journal Online | The Wall Street Journal Online |
| 4 | J | The Wall Street Journal | Active | The Wall Street Journal | The Wall Street Journal |
| 5 | WSJNLT | WSJ Newsletters | Active | WSJ Newsletters | WSJ Newsletters |
| 6 | WSJPOD | WSJ Podcasts | Active | WSJ Podcasts | WSJ Podcasts |
| 7 | RSTPROBK | WSJ Pro Bankruptcy | Active | WSJ Pro Bankruptcy | WSJ Pro Bankruptcy |
| 8 | RSTPROCB | WSJ Pro Central Banking | Active | WSJ Pro Central Banking | WSJ Pro Central Banking |
| 9 | RSTPROCY | WSJ Pro Cybersecurity | Active | WSJ Pro Cybersecurity | WSJ Pro Cybersecurity |
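To carry these results into a Retrieval API request, the source codes have to be pulled out of the table. A minimal sketch that keeps only active codes from rows shaped like `flat_src`, de-duplicated in first-seen order. How the list is placed in the Retrieval API payload depends on that API's schema, which this notebook does not show, so the `source_codes` key below is a hypothetical field name, and the `OLDSRC`/`Inactive` row is made up for illustration:

```python
def active_source_codes(rows):
    """Return the codes of rows whose status is 'Active', without duplicates, in input order."""
    codes = []
    for row in rows:
        code = row.get("source_code")
        if row.get("status") == "Active" and code and code not in codes:
            codes.append(code)
    return codes


rows = [
    {"source_code": "WSJO", "status": "Active"},
    {"source_code": "J", "status": "Active"},
    {"source_code": "OLDSRC", "status": "Inactive"},  # hypothetical inactive row
    {"source_code": "J", "status": "Active"},         # duplicate is skipped
]
codes = active_source_codes(rows)
payload = {"source_codes": codes}  # hypothetical field name; check the Retrieval API schema
print(payload)
```

In the notebook, the same function can be applied to the DataFrame with `active_source_codes(src_df.to_dict("records"))`.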
