# Taxonomy - Organizations Search
This notebook helps retrieve specific organization codes based on a set of parameters. The returned codes can be added to the Retrieval API payload.

## Code Initialization
Dependencies and environment initialization. Taxonomy requests require authentication.

Ensure there's a `.env` file with your credentials in the same directory as this notebook. Use the `.env.example` file as a template.

In [1]:
import os
import requests as r
import pandas as pd
from IPython.display import Markdown
import utils as u
from dotenv import load_dotenv

load_dotenv()

True

## Constants

In [2]:
API_HOST = 'api.dowjones.com'
AUTH_HOST = 'accounts.dowjones.com'
CLIENT_ID = os.getenv('FACTIVA_CLIENTID')
USERNAME = os.getenv('FACTIVA_USERNAME')
PASSWORD = os.getenv('FACTIVA_PASSWORD')
AUTH_URL = f"https://{AUTH_HOST}/oauth2/v1/token"
CO_URL = f"https://{API_HOST}/taxonomy/factiva-companies/search"
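
If any of the three credentials is missing from `.env`, `os.getenv` returns `None` and the problem only surfaces later, during authentication. A minimal sketch of an early check, assuming a hypothetical helper named `missing_credentials` (it is not part of `utils.py`):

```python
import os

# Environment variable names used in the constants cell above.
REQUIRED_VARS = ("FACTIVA_CLIENTID", "FACTIVA_USERNAME", "FACTIVA_PASSWORD")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty.

    Hypothetical helper for illustration; it is not part of utils.py.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_credentials()
if missing:
    print(f"Missing credentials in .env: {', '.join(missing)}")
```

Running this right after `load_dotenv()` points at the exact variable to fix instead of a generic authentication failure.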

## Authentication - Generate Bearer

For details about getting the `bearer_token`, please see the `utils.py` file.

In [3]:
bearer_token = u.get_bearer_token(CLIENT_ID, USERNAME, PASSWORD, AUTH_URL)
# Show only the leading part of the username; tolerate a missing USERNAME
user_label = (USERNAME or '').split('@')[0].split('-')[0]
if bearer_token:
    display(Markdown(f"**Authentication Successful**: Bearer token created for user {user_label}"))
else:
    display(Markdown(f"**Authentication Failed**: Cannot obtain the Bearer token for the user {user_label}"))

req_headers = {
    "Authorization": f"Bearer {bearer_token}",
    "Content-Type": "application/json",
    "Accept": "application/json"
}

**Authentication Successful**: Bearer token created for user 9ZZZ159100

## Company Search Keyword

In [4]:
company_keyword = "greenpeace"

## Taxonomy API Request

In [5]:
# Passing the query as params lets requests URL-encode the keyword
co_params = {'filter.search_string': company_keyword, 'language': 'en', 'parts': 'All', 'sort_by': 'CompanyName'}
co_response = r.get(CO_URL, params=co_params, headers=req_headers)
co_dict = []  # stays empty if the request fails
if co_response.status_code == 200:
    co_json = co_response.json()
    co_dict = co_json['data']
    co_count = co_json['meta']['total_count']
    display(Markdown(f"Found: {co_count} results for the keyword '{company_keyword}'"))
else:
    display(Markdown(f"**Error**: {co_response.status_code} - {co_response.reason}"))


Found: 6 results for the keyword 'greenpeace'
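
Keywords that contain spaces or reserved characters such as `&` must be percent-encoded in the query string. A minimal sketch using only the standard library to show what the encoded URL looks like; the parameter names are the ones used in this notebook's request, and `build_search_url` is a hypothetical helper:

```python
from urllib.parse import urlencode


def build_search_url(base_url, keyword):
    """Build the search URL with a percent-encoded query string.

    Hypothetical helper for illustration; parameter names are taken
    from the request in this notebook.
    """
    params = {
        "filter.search_string": keyword,
        "language": "en",
        "parts": "All",
        "sort_by": "CompanyName",
    }
    return f"{base_url}?{urlencode(params)}"


url = build_search_url(
    "https://api.dowjones.com/taxonomy/factiva-companies/search",
    "procter & gamble",
)
print(url)
```

`requests` applies the same kind of encoding when the parameters are passed through its `params` argument.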

Process the results and transform them into a pandas DataFrame.

In [6]:
flat_co = []
for item in co_dict:
    # Fall back to empty dicts so that a missing section does not raise an error
    attrs = item.get('attributes') or {}
    address = attrs.get('address') or {}
    status = attrs.get('company_status') or {}
    region = attrs.get('primary_region') or {}
    industry = attrs.get('primary_industry') or {}

    # Entries holding only a 'code' key are read as country codes:
    # 2 characters for ISO2, 3 characters for ISO3
    country_iso2 = None
    country_iso3 = None
    for tpc in attrs.get('third_party_taxonomy_code') or []:
        if tpc.keys() == {'code'}:
            code = tpc.get('code')
            if len(code) == 2:
                country_iso2 = code
            elif len(code) == 3:
                country_iso3 = code

    duns = attrs.get('duns_number')
    flat_co.append({
        'org_code': attrs.get('code'),
        'org_name': attrs.get('name'),
        # Normalize the DUNS number to a plain integer string
        'duns': str(int(float(duns))) if duns else '',
        'status': status.get('active_status', 'N/A'),
        'address1': address.get('address1', ''),
        'address2': address.get('address2', ''),
        'city': address.get('city', ''),
        'postal_code': address.get('postal_code', ''),
        'country_iso2': country_iso2,
        'country_iso3': country_iso3,
        'factiva_region_code': region.get('code'),
        'factiva_region_name': region.get('descriptor'),
        'factiva_industry_code': industry.get('code'),
        'factiva_industry_name': industry.get('descriptor', ''),
        'listed': status.get('listing_status', 'N/A'),
        'is_newscoded': status.get('is_news_coded', 'N/A')
    })
co_df = pd.DataFrame(flat_co)
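
Chained `.get()` calls raise `AttributeError` as soon as an intermediate key is missing or holds `None`. A minimal sketch of a tolerant lookup, assuming a hypothetical helper named `dig`; the sample record is made up and only mirrors the field names used in this notebook:

```python
def dig(mapping, *keys, default=None):
    """Follow keys through nested dicts, returning default on any gap."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict):
            return default
        current = current.get(key)
    return default if current is None else current


# Made-up record that mirrors the field names used in this notebook.
sample = {
    "attributes": {
        "code": "EXAMPL",
        "address": None,
        "company_status": {"active_status": "Active"},
    }
}

print(dig(sample, "attributes", "code"))
print(dig(sample, "attributes", "address", "city", default=""))
print(dig(sample, "attributes", "primary_region", "code", default="N/A"))
```

A helper like this keeps each field extraction on one short line and makes the fallback value explicit.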

## Display Organization Results

In [7]:
co_df

| | org_code | org_name | duns | status | address1 | address2 | city | postal_code | country_iso2 | country_iso3 | factiva_region_code | factiva_region_name | factiva_industry_code | factiva_industry_name | listed | is_newscoded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GREEEE | Green Planet Energy eG | 325955099.0 | Active | Schulterblatt 120 | | Hamburg | 20357 | DE | DEU | GFR | Germany | I163 | Renewable Energy Generation | Unlisted | True |
| 1 | GPCNDA | Greenpeace Canada | 209594217.0 | Active | 33 Cecil Street | | Toronto | M5T 1N1 | CA | CAN | CANA | Canada | IEWM | Environment/Waste Management | Unlisted | True |
| 2 | GRNPCE | Greenpeace e.V. | 317578292.0 | Active | Hongkongstraße 10 | | Hamburg | 20457 | DE | DEU | GFR | Germany | | | Unlisted | True |
| 3 | TXJCHR | Greenpeace India | 677374112.0 | Active | 60, Wellington Street | Richmond Town | Bangalore | 560025 | IN | IND | INDIA | India | | | Unlisted | True |
| 4 | GPEACI | Greenpeace International | | Active | Ottho Heldringstraat 5 | | Amsterdam | 1066 | NL | NLD | NETH | Netherlands | | | Unlisted | True |
| 5 | GPEACE | Greenpeace UK | 227080231.0 | Active | Canonbury Villas | | London | N1 2PN | GB | GBR | UK | United Kingdom | I8395413 | Environmental Consulting Services | Unlisted | True |
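
Because the returned codes are meant to be added to the Retrieval API payload, a common next step is to pick out the relevant `org_code` values. A minimal sketch that filters by country, using a subset of the rows shown above; `codes_for_country` is a hypothetical helper, the choice of filter is an assumption, and the payload structure itself is not covered here:

```python
# Subset of the rows shown above, reduced to the columns used here.
rows = [
    {"org_code": "GREEEE", "org_name": "Green Planet Energy eG", "country_iso2": "DE"},
    {"org_code": "GPCNDA", "org_name": "Greenpeace Canada", "country_iso2": "CA"},
    {"org_code": "GRNPCE", "org_name": "Greenpeace e.V.", "country_iso2": "DE"},
    {"org_code": "GPEACE", "org_name": "Greenpeace UK", "country_iso2": "GB"},
]


def codes_for_country(records, iso2):
    """Return the org_code of every record located in the given country."""
    return [rec["org_code"] for rec in records if rec.get("country_iso2") == iso2]


german_codes = codes_for_country(rows, "DE")
print(german_codes)
```

With the DataFrame, the equivalent selection is `co_df.loc[co_df['country_iso2'] == 'DE', 'org_code'].tolist()`.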
