<h1><center>  Corporate Credit Screen - Nucleus APIs Use Cases</center></h1>


<h1><center>  SumUp Analytics, Proprietary & Confidential</center></h1>
<h1><center>  Disclaimers and Terms of Service available at www.sumup.ai</center></h1>




## Objective: 
-	Develop a ranking of corporate bonds using content published by corporations


## Data:
-	A chosen list of corporations, for instance within the same industry sector, or with similar market capitalization
 - 	Company reports such as SEC filings
 - 	Press releases
 - 	Earnings call transcripts



## Nucleus APIs used:
-	Dataset creation API
 - 	*api_instance.post_upload_file(file, dataset)*
 - 	*nucleus_helper.upload_files(api_instance, dataset, file_iter, processes=1)*

        nucleus_helper.upload_files wraps api_instance.post_upload_file with parallel execution to speed up dataset creation


-	Topic Modeling API
 - 	*api_instance.post_topic_api(payload)*


-	Topic Sentiment API
 - 	*api_instance.post_topic_sentiment_api(payload)*


-	DocInfo API
 - 	*api_instance.post_doc_info(payload)*


-	DatasetInfo API
 - 	*api_instance.post_dataset_info(payload)*


## Approach:

### 1.	Dataset Preparation
-	Create a Nucleus dataset containing all relevant documents over a chosen historical period

    

```python
import csv
import json
import nucleus_api.api.nucleus_api as nucleus_helper
import nucleus_api
from nucleus_api.rest import ApiException

configuration = nucleus_api.Configuration()
configuration.host = 'UPDATE-WITH-API-SERVER-HOSTNAME'
configuration.api_key['x-api-key'] = 'UPDATE-WITH-API-KEY'

# Create API instance
api_instance = nucleus_api.NucleusApi(nucleus_api.ApiClient(configuration))
```

```python
import os

print('--------- Append all files from local folder to dataset in parallel -----------')
folder = 'Corporate_documents'  # local folder containing the corporate documents
dataset = 'Corporate_docs'      # str | Destination dataset where the files will be inserted.

# Build the file iterable by walking the folder recursively.
# Each item in the iterable has the format below:
# {'filename': filename,   # filename to be uploaded. REQUIRED
#  'metadata': {           # metadata for the file. Optional
#      'key1': val1,       # keys can have arbitrary names as long as the names only
#      'key2': val2        # contain alphanumeric characters (0-9|a-z|A-Z) and underscore (_)
#   }
# }
file_iter = []
for root, dirs, files in os.walk(folder):
    for file in files:
        # Optionally filter by extension, e.g. if os.path.splitext(file)[1] == '.pdf':
        # (.txt .doc .docx .rtf .html .csv are also supported)
        # NOTE: the metadata below is a placeholder; set the ticker, company,
        # category and date that actually correspond to each file.
        file_dict = {'filename': os.path.join(root, file),
                     'metadata': {'ticker': 'AAPL',
                                  'company': 'Apple',
                                  'category': 'Press Release',
                                  'date': '2019-01-01'}}
        file_iter.append(file_dict)

file_props = nucleus_helper.upload_files(api_instance, dataset, file_iter, processes=4)
for fp in file_props:
    print(fp.filename, '(', fp.size, 'bytes) has been added to dataset', dataset)
```
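The cell above attaches the same placeholder metadata (AAPL, Apple, Press Release) to every file. If your files are organised in a predictable folder layout, the metadata can instead be derived from each file's path. The sketch below assumes a hypothetical layout `Corporate_documents/<TICKER>/<category>/<YYYY-MM-DD>_<name>.<ext>`; both this layout and the helper name `build_file_iter` are illustrative assumptions, not part of the Nucleus helper.

```python
import datetime
import os


def build_file_iter(folder):
    """Build the file iterable passed to nucleus_helper.upload_files.

    Assumes a hypothetical layout <folder>/<TICKER>/<category>/<YYYY-MM-DD>_<name>.<ext>.
    Directories at other depths, and files whose names do not start with a
    valid YYYY-MM-DD date, are skipped.
    """
    file_iter = []
    for root, dirs, files in os.walk(folder):
        rel_parts = os.path.relpath(root, folder).split(os.sep)
        if len(rel_parts) != 2:
            continue  # only <TICKER>/<category> directories hold documents
        ticker, category = rel_parts
        for file in sorted(files):
            date = file.split('_', 1)[0]  # leading date part of the filename
            try:
                datetime.datetime.strptime(date, '%Y-%m-%d')
            except ValueError:
                continue  # filename does not start with a date
            file_iter.append({'filename': os.path.join(root, file),
                              'metadata': {'ticker': ticker,
                                           'category': category,
                                           'date': date}})
    return file_iter


# Usage (hypothetical folder following the layout above):
# file_iter = build_file_iter('Corporate_documents')
# file_props = nucleus_helper.upload_files(api_instance, dataset, file_iter, processes=4)
```

Deriving metadata from the path keeps the ticker attribute, which the screen code relies on, consistent across thousands of files without manual entry.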

-	Alternatively, create the dataset from SEC filings using the embedded Nucleus data feed

```python
dataset = "Corporate_docs"
period_start = "2010-01-01"
period_end = "2019-06-01"

payload = nucleus_api.EdgarQuery(destination_dataset=dataset,
                                 tickers=["FB", "AMZN", "INTC", "IBM", "NFLX", "GOOG"],
                                 filing_types=["10-K", "10-K/A", "10-Q", "10-Q/A"],
                                 sections=["Quantitative and Qualitative Disclosures about Market Risk",
                                           "Management's Discussion and Analysis of Financial Condition and Results of Operations",
                                           "Risk Factors"],
                                 period_start=period_start,
                                 period_end=period_end)

api_response = api_instance.post_create_dataset_from_sec_filings(payload)
```

**You can subsequently restrict the analysis to specific time periods within your dataset directly through the period_start and period_end parameters of the APIs, as illustrated below**

### 2.	Screen Analysis: Topic Sentiment and Topic Contribution
- Identify and extract the key topics at a given date, using the subset of documents published over the period ending at that date


- Measure the sentiment on each topic to classify all key topics into ‘good’ and ‘bad’ topics


- Determine the exposure of each company to each topic


- Aggregate the exposures of a given company across key topics based on the ‘good’ or ‘bad’ nature of the topics, to derive a ranking of the companies
 - The top company is the one with the most exposure to good topics and/or the least exposure to bad topics
 
 
- Further down, we discuss how to refine this analysis by leveraging the different parameters available to the user
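Once the API outputs are in hand, the aggregation described in the bullets above reduces to a small computation: a company's score on a topic is the topic strength times the topic sentiment times the summed exposures of that company's documents, and the screen averages these scores over topics. The sketch below replays this on plain Python values, mirroring the main code below; the helper name `screen_scores` and the input layout (one dict per topic with `strength`, `sentiment`, and a list of `(ticker, exposure)` pairs) are assumptions for illustration.

```python
def screen_scores(topics, companies):
    """Average over topics of strength * sentiment * company exposure.

    topics: list of dicts with keys 'strength', 'sentiment' and
            'doc_exposures' (a list of (ticker, exposure) pairs for the
            documents contributing to the topic).
    companies: list of tickers to score.
    Returns {ticker: score}; a higher score means more exposure to
    positive-sentiment topics and/or less exposure to negative ones.
    """
    scores = {c: 0.0 for c in companies}
    for topic in topics:
        weight = topic['strength'] * topic['sentiment']
        for ticker, exposure in topic['doc_exposures']:
            if ticker in scores:
                scores[ticker] += weight * exposure
    n_topics = len(topics)
    if n_topics == 0:
        return scores
    return {c: s / n_topics for c, s in scores.items()}


topics = [
    {'strength': 0.5, 'sentiment': 0.4, 'doc_exposures': [('AAA', 0.6), ('BBB', 0.2)]},
    {'strength': 0.5, 'sentiment': -0.8, 'doc_exposures': [('BBB', 0.5)]},
]
print(screen_scores(topics, ['AAA', 'BBB', 'CCC']))
```

Here AAA is only exposed to the positive topic and ranks first, while BBB's exposure to the negative topic pulls it below CCC, which has no exposure at all.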




```python
import json

import numpy as np

# Retrieve the tickers of all companies present in the dataset, from the document metadata
payload = nucleus_api.DocInfo(dataset='Corporate_docs')
api_response = api_instance.post_doc_info(payload)

company_sources = [res.attribute['ticker'] for res in api_response.result]
company_list = np.unique(company_sources)


print('-------- Get topic sentiment and exposure per firm ----------------')

payload = nucleus_api.TopicSentimentModel(dataset='Corporate_docs',
                                          query='',
                                          num_topics=20,
                                          num_keywords=8,
                                          period_start="2018-11-01 00:00:00",
                                          period_end="2019-01-01 00:00:00")
try:
    api_response = api_instance.post_topic_sentiment_api(payload)
    api_ok = True
except ApiException as e:
    api_error = json.loads(e.body)
    print('ERROR:', api_error['message'])
    api_ok = False

if api_ok:
    # One row per company, one column per topic
    company_rankings = np.zeros([len(company_list), len(api_response.result)])
    for i, res in enumerate(api_response.result):
        print('Topic', i, 'sentiment:', res.sentiment)
        print('    Keywords:', res.keywords)

        # Map the documents contributing to this topic to their companies, using the dataset metadata.
        # This list is much shorter than the whole dataset because not all documents contribute to a given topic.
        payload = nucleus_api.DocInfo(dataset='Corporate_docs', doc_ids=res.doc_ids)
        api_response1 = api_instance.post_doc_info(payload)
        company_sources = [res1.attribute['ticker'] for res1 in api_response1.result]

        # Exposure of each contributing document to the topic (same order as company_sources)
        doc_exposures = json.loads(res.doc_topic_exposures[0])

        # Aggregate the document exposures within the topic into company exposures
        company_contributions = np.zeros(len(company_list))
        for j, company in enumerate(company_list):
            for k, source in enumerate(company_sources):
                if source == company:
                    company_contributions[j] += doc_exposures[k]

        # Weight each company's exposure by the topic strength and sentiment
        company_rankings[:, i] = float(res.strength) * float(res.sentiment) * company_contributions

        print('---------------')

    # Average the per-topic scores of each company into the final credit screen
    Corporate_screen = np.mean(company_rankings, axis=1)
```

-	Repeat the above tasks for each date in the historical period to get the complete history of your credit screen

```python
import datetime
import json

import numpy as np

print('------------ Retrieve all companies found in the dataset ----------')

payload = nucleus_api.DocInfo(dataset='Corporate_docs')
api_response = api_instance.post_doc_info(payload)

company_sources = [res.attribute['ticker'] for res in api_response.result]
company_list = np.unique(company_sources)


print('--------------- Retrieve the time range of the dataset -------------')

payload = nucleus_api.DatasetInfo(dataset='Corporate_docs', query='')
api_response = api_instance.post_dataset_info(payload)

first_date = datetime.datetime.fromtimestamp(float(api_response.result.time_range[0]))
last_date = datetime.datetime.fromtimestamp(float(api_response.result.time_range[1]))

# Step through time one day at a time and, at each date, compute the ranking of companies
T = 90  # The look-back period in days

Corporate_screen = []
screen_dates = []
end_date = first_date + datetime.timedelta(days=T)  # first date with a full look-back period
while end_date <= last_date:
    # First and last dates of the look-back period of T days
    start_date = end_date - datetime.timedelta(days=T)
    start_date_str = start_date.strftime("%Y-%m-%d 00:00:00")
    end_date_str = end_date.strftime("%Y-%m-%d 00:00:00")

    payload = nucleus_api.TopicSentimentModel(dataset="Corporate_docs",
                                              query='',
                                              num_topics=20,
                                              num_keywords=8,
                                              period_start=start_date_str,
                                              period_end=end_date_str)
    try:
        api_response = api_instance.post_topic_sentiment_api(payload)
        api_ok = True
    except ApiException as e:
        api_error = json.loads(e.body)
        print('ERROR:', api_error['message'])
        api_ok = False

    if api_ok:
        company_rankings = np.zeros([len(company_list), len(api_response.result)])
        for i, res in enumerate(api_response.result):
            # Map the documents contributing to this topic to their companies, using the dataset metadata
            payload = nucleus_api.DocInfo(dataset='Corporate_docs', doc_ids=res.doc_ids)
            api_response1 = api_instance.post_doc_info(payload)
            company_sources = [res1.attribute['ticker'] for res1 in api_response1.result]

            # Aggregate the document exposures within the topic into company exposures
            doc_exposures = json.loads(res.doc_topic_exposures[0])
            company_contributions = np.zeros(len(company_list))
            for j, company in enumerate(company_list):
                for k, source in enumerate(company_sources):
                    if source == company:
                        company_contributions[j] += doc_exposures[k]

            company_rankings[:, i] = float(res.strength) * float(res.sentiment) * company_contributions

        # Average the per-topic scores of each company into the credit screen for this date
        Corporate_screen.append(np.mean(company_rankings, axis=1))
        screen_dates.append(end_date)

    # Move forward by one day to obtain a daily indicator
    end_date = end_date + datetime.timedelta(days=1)
```
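The date arithmetic behind the rolling look-back windows can be isolated and checked on its own before making hundreds of API calls. The sketch below is a variant of that arithmetic, not the notebook's exact loop: the helper name `lookback_windows` and the `step_days` parameter are hypothetical, and it reuses the `"%Y-%m-%d 00:00:00"` string format of the payloads above.

```python
import datetime


def lookback_windows(first_date, last_date, T, step_days=1):
    """Return (period_start, period_end) strings for each T-day look-back window.

    The first window ends T days after first_date, so every window is fully
    covered by the dataset; later windows move forward by step_days until the
    end date would pass last_date.
    """
    fmt = "%Y-%m-%d 00:00:00"
    windows = []
    end_date = first_date + datetime.timedelta(days=T)
    while end_date <= last_date:
        start_date = end_date - datetime.timedelta(days=T)
        windows.append((start_date.strftime(fmt), end_date.strftime(fmt)))
        end_date += datetime.timedelta(days=step_days)
    return windows


for period_start, period_end in lookback_windows(datetime.datetime(2019, 1, 1),
                                                 datetime.datetime(2019, 1, 5), T=2):
    print(period_start, '->', period_end)
```

Setting `step_days=7` would give a weekly rather than daily screen, which cuts the number of Topic Sentiment calls by roughly a factor of seven.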

### 3.	Results Interpretation
-	Plot the time series of the company rankings against the beta-adjusted corporate bond spreads
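The screen produces a score per company at each date; for a comparison with spreads it can help to turn these scores into ranks (1 = best) at each date. A minimal sketch, assuming the scores at each date are stored as a dict `{ticker: score}`, for instance built by pairing `company_list` with each entry of `Corporate_screen` and keying by the corresponding date of the loop. The helper names are hypothetical and ties are broken alphabetically by ticker.

```python
def scores_to_ranks(scores):
    """Rank companies by descending score: 1 = most favourable credit screen."""
    ordered = sorted(scores, key=lambda ticker: (-scores[ticker], ticker))
    return {ticker: rank for rank, ticker in enumerate(ordered, start=1)}


def rank_history(screen_by_date):
    """Turn {date: {ticker: score}} into {ticker: [(date, rank), ...]} for plotting."""
    history = {}
    for date in sorted(screen_by_date):
        for ticker, rank in scores_to_ranks(screen_by_date[date]).items():
            history.setdefault(ticker, []).append((date, rank))
    return history


print(scores_to_ranks({'A': 0.1, 'B': 0.3, 'C': -0.2}))
```

Each per-ticker list can then be plotted against the same company's beta-adjusted spread series.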

### 4.	Fine Tuning

#### a.	Tailoring the topics
-	Consider tailoring your corporate screen by excluding topics you consider not impactful. This is done by passing the custom_stop_words parameter in the payload of the Topic Sentiment API


-	Identify and extract the key topics on the subset of documents and print their keywords



```python
print('------------- Get list of topics from dataset --------------')

payload = nucleus_api.Topics(dataset='Corporate_docs',
                             query='',
                             num_topics=20,
                             num_keywords=8,
                             period_start="2018-11-01 00:00:00",
                             period_end="2019-01-01 00:00:00")
try:
    api_response = api_instance.post_topic_api(payload)
    api_ok = True
except ApiException as e:
    api_error = json.loads(e.body)
    print('ERROR:', api_error['message'])
    api_ok = False

if api_ok:
    for i, res in enumerate(api_response.result.topics):
        print('Topic', i, ' keywords: ', res.keywords)
        print('---------------')
```

You can then tailor the screen analysis with a custom_stop_words variable. For instance, initialize it as follows and pass it in the payload of the main code of section 2:

```python
custom_stop_words = ["call", "report"]  # list of str | List of stop words. (optional)
```
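One way to find candidates for custom_stop_words is to look for keywords that recur across many of the topics printed above, since generic terms such as "call" or "report" tend to appear everywhere without carrying a theme. This heuristic is an assumption of this sketch, not something the API does for you. The hypothetical helper below takes one keyword list per topic (assuming each topic's keywords are available as a list of strings, for instance `[res.keywords for res in api_response.result.topics]`) and returns the keywords present in at least `min_share` of the topics.

```python
from collections import Counter


def recurring_keywords(topic_keywords, min_share=0.5):
    """Return keywords appearing in at least min_share of the topics, most frequent first.

    topic_keywords: list of keyword lists, one per topic.
    """
    if not topic_keywords:
        return []
    # Count each keyword at most once per topic
    counts = Counter(kw for keywords in topic_keywords for kw in set(keywords))
    threshold = min_share * len(topic_keywords)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [kw for kw, n in ranked if n >= threshold]


print(recurring_keywords([['call', 'report', 'debt'],
                          ['call', 'report', 'cloud'],
                          ['call', 'lawsuit']]))
```

The output is only a shortlist: review it by hand before adding terms to custom_stop_words, since a recurring keyword may still carry a meaningful credit theme.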

#### b.	Focusing the screen analysis on certain subjects
To focus the screen analysis on certain subjects, for instance financial health and corporate actions, substitute the query variable in the main code of section 2 with:

```python
query = '(earnings OR debt OR competition OR lawsuit OR restructuring)'  # str | Fulltext query, using mysql MATCH boolean query format. Example: "(word1 OR word2) AND (word3 OR word4)" (optional)
```
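If the list of subjects changes often, the query string can be assembled from lists of terms instead of typed by hand. A small sketch; `build_boolean_query` is a hypothetical helper that only produces the `(word1 OR word2) AND (word3 OR word4)` pattern shown in the comment above.

```python
def build_boolean_query(*groups):
    """Combine term groups: terms within a group are ORed, groups are ANDed."""
    parts = []
    for terms in groups:
        cleaned = [t.strip() for t in terms if t.strip()]
        if cleaned:
            parts.append('(' + ' OR '.join(cleaned) + ')')
    return ' AND '.join(parts)


query = build_boolean_query(['earnings', 'debt', 'competition', 'lawsuit', 'restructuring'])
print(query)
```

Calling it with no groups returns an empty string, which matches the `query=''` used in the main code when no focus is wanted.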

#### c.	Exploring the impact of the type of documents, the lookback period, the number of topics being extracted
**num_topics**: You can compute the corporate screen with a different breadth of topics by changing num_topics in the payload of the main code of section 2. A larger value gives more breadth in establishing the rankings, while a smaller value gives a narrower measure. If num_topics is too large, some very marginal topics may bring a lot of noise into the corporate rankings.

**T**: You can compute the corporate screen with different speeds of propagation by changing the look-back period T in the code of section 2. A larger value gives a slowly changing ranking, while a smaller value gives a very responsive one. If T is too small, too few documents may be used, which can make the rankings noisy; if T is too large, the rankings will be slow to reflect important new information.
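To check whether a given T leaves enough documents in each look-back window, you can count documents per window from their dates before calling the API. The sketch assumes you have the publication date of each document as a `datetime.date` (for instance from the date metadata attached when building the dataset) and treats each window as `(end - T days, end]`; the helper name `docs_per_window` and this window convention are assumptions, not the API's own period semantics.

```python
import datetime


def docs_per_window(doc_dates, window_ends, T):
    """Count documents whose date falls in (end - T days, end] for each window end date."""
    counts = []
    for end in window_ends:
        start = end - datetime.timedelta(days=T)
        counts.append(sum(1 for d in doc_dates if start < d <= end))
    return counts


doc_dates = [datetime.date(2019, 1, 1), datetime.date(2019, 1, 10), datetime.date(2019, 2, 1)]
window_ends = [datetime.date(2019, 1, 10), datetime.date(2019, 2, 1)]
for T in (7, 30):
    print(T, docs_per_window(doc_dates, window_ends, T))
```

A T whose windows regularly hold only a handful of documents per company is likely to produce the noisy rankings described above.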

**Document types**: You can investigate how the corporate screen changes when it is measured on only one type of document (company reports, press releases or earnings call transcripts) rather than on the whole content, by using the metadata attached to each document during the construction of the dataset. Rerun the main code of section 2 on that subset of the corpus: create a variable metadata_selection and pass it in the payload:

```python
metadata_selection = {"category": "Report"}  # str | json object of {\"metadata_field\":[\"selected_values\"]} (optional)
```
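Before rerunning the screen on a subset, it can be useful to preview how many documents a selection keeps. The sketch below applies a selection of the form `{"field": value or [values]}` to a local list of document metadata dicts (for instance the `metadata` entries of the file iterable built during dataset creation). It is a local approximation with a hypothetical helper name, not the server-side behaviour of metadata_selection.

```python
def select_documents(docs_metadata, selection):
    """Keep documents whose metadata matches every field of the selection.

    Selection values may be a single value or a list of accepted values.
    """
    def matches(metadata):
        for field, accepted in selection.items():
            accepted = accepted if isinstance(accepted, list) else [accepted]
            if metadata.get(field) not in accepted:
                return False
        return True

    return [m for m in docs_metadata if matches(m)]


docs = [{'ticker': 'IBM', 'category': 'Report'},
        {'ticker': 'IBM', 'category': 'Press Release'},
        {'ticker': 'FB', 'category': 'Report'}]
print(len(select_documents(docs, {'category': 'Report'})))
```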

### 5.	Next Steps
-	Possible extension: repeat the above tasks for different industry sectors
 - This gives you a broad market screen with a view within every important industry, which you can use for security selection
 - If you blend all companies across industries into a single dataset, you can then derive industry rankings using the main code of section 2 with an extra metadata field holding the industry sector of each company/document. You can then apply this industry screen to sector allocation
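A simpler alternative to rerunning the main code with sector metadata is to average the company scores within each sector. The sketch below shows that alternative, assuming a hypothetical `sector_of` dict `{ticker: sector}` and a dict of company scores for one date; it uses a simple average, although weighting by issuance or market capitalization would be an equally reasonable choice.

```python
from statistics import mean


def industry_scores(company_scores, sector_of):
    """Average company scores within each sector; companies without a sector are ignored."""
    by_sector = {}
    for ticker, score in company_scores.items():
        sector = sector_of.get(ticker)
        if sector is not None:
            by_sector.setdefault(sector, []).append(score)
    return {sector: mean(scores) for sector, scores in by_sector.items()}


print(industry_scores({'A': 1.0, 'B': 3.0, 'C': -1.0, 'D': 5.0},
                      {'A': 'Tech', 'B': 'Tech', 'C': 'Energy'}))
```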


-	Perform a correlation analysis between the time series of the ranking of companies and their beta-adjusted credit spreads
 - Several time horizons for price impact can be studied: 1 day, 7 days, a few weeks, perhaps an even longer-lasting impact
 - Several time lags for price impact can be studied: the following day, a 2 to 3 day gap before the market starts adjusting, a week, perhaps an even longer gap before markets incorporate the information from the corporate content
 - You could also conduct this correlation analysis by bucket of credit spreads within your universe of bonds. Do companies trading at higher spreads show a different price impact than companies trading at lower spreads?
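The horizon and lag analysis can be sketched on two daily series for one company, aligned by date: the screen value and the beta-adjusted spread. Below, the spread change over `horizon` days is measured starting `lag` days after the screen date, and the Pearson correlation is computed with plain Python. The function names, the alignment by list position, and this definition of the spread change are assumptions for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n < 2:
        raise ValueError('need at least two observations')
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError('correlation undefined for a constant series')
    return cov / sqrt(vx * vy)


def lagged_correlation(screen, spreads, horizon=1, lag=1):
    """Correlate screen[t] with spreads[t + lag + horizon] - spreads[t + lag].

    Both lists are assumed aligned by date (same index = same day).
    """
    x, y = [], []
    for t in range(len(screen)):
        if t + lag + horizon < len(spreads):
            x.append(screen[t])
            y.append(spreads[t + lag + horizon] - spreads[t + lag])
    return pearson(x, y)


screen = [1, 2, 3, 4, 5, 6]
spreads = [0, 0, -1, -3, -6, -10, -15, -21]
print(lagged_correlation(screen, spreads, horizon=1, lag=1))
```

If the screen is informative, one would expect a negative correlation here, since a favourable score should precede tightening spreads; looping over several `(horizon, lag)` pairs gives the grid of horizons and lags listed above.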


-	Possible extension: explore simple transformations of the indicator
 - You could rescale and smooth these rankings using a cross-sectional score, computed at each date across the companies in your universe (i and j index the companies):

$$\text{Score}_i = \frac{\text{Rank}_i - \operatorname{Average}_j(\text{Rank}_j)}{\operatorname{Std}_j(\text{Rank}_j)}$$
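The cross-sectional score above is a z-score computed across companies at a single date. A sketch using the standard library; it uses the population standard deviation (`statistics.pstdev`), which is an assumption since the formula does not specify sample versus population, and the helper name is hypothetical.

```python
from statistics import mean, pstdev


def cross_sectional_scores(ranks):
    """Standardize {company: rank} across companies at one date."""
    values = list(ranks.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {company: 0.0 for company in ranks}  # all companies tied
    return {company: (r - mu) / sigma for company, r in ranks.items()}


print(cross_sectional_scores({'A': 1, 'B': 2, 'C': 3}))
```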

Copyright (c) 2019 SumUp Analytics, Inc. All Rights Reserved.

NOTICE: All information contained herein is, and remains the property of SumUp Analytics Inc. and its suppliers, if any. The intellectual and technical concepts contained herein are proprietary to SumUp Analytics Inc. and its suppliers and may be covered by U.S. and Foreign Patents, patents in process, and are protected by trade secret or copyright law.

Dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from SumUp Analytics Inc.