<h1><center>  Directional Rate Trading - Nucleus APIs Use Cases</center></h1>


<h1><center>  SumUp Analytics, Proprietary & Confidential</center></h1>


<h1><center>  Disclaimers and Terms of Service available at www.sumup.ai</center></h1>

 


## Objective: 
-	Develop an indicator of market rally/sell-off using a measure of macro-economic sentiment in Central Bank publications


## Data:
-	All documents published by the People's Bank of China (PBOC), in Mandarin, excluding formal research
 - 	Speeches
 - 	Press Releases
 - 	Informal publications

    **The Nucleus Datafeed can be leveraged for all content from major Central Banks**


## Nucleus APIs used:
-	Dataset creation API
 - 	*api_instance.post_upload_file(file, dataset)*
 - 	*nucleus_helper.import_files(api_instance, dataset, file_iters, processes=1)*

        nucleus_helper.import_files leverages api_instance.post_upload_file with parallel execution to speed up dataset creation


-	Topic Modeling API
 - 	*api_instance.post_topic_api(payload)*


-	Topic Sentiment API
 - 	*api_instance.post_topic_sentiment_api(payload)*


-	DocInfo API
 - 	*api_instance.post_doc_info(payload)*


-	DatasetInfo API
 - 	*api_instance.post_dataset_info(payload)*
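To illustrate what "parallel execution" means for dataset creation, the sketch below fans file uploads out over a thread pool. It is not the actual nucleus_helper.import_files implementation, which is not reproduced here. `upload_files_in_parallel` and its `upload_one` callback are hypothetical names; in practice `upload_one` could wrap `api_instance.post_upload_file(file, dataset)`.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_files_in_parallel(upload_one, dataset, file_paths, max_workers=4):
    """Upload every file in file_paths to dataset, running up to max_workers uploads at once.

    upload_one(file_path, dataset) is supplied by the caller (hypothetical callback).
    Results are returned in the same order as file_paths.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_one, path, dataset) for path in file_paths]
        # f.result() re-raises any exception raised inside an upload
        return [f.result() for f in futures]
```

Threads rather than processes are used here because uploads are I/O-bound; for example `upload_files_in_parallel(lambda f, d: api_instance.post_upload_file(f, d), dataset, files)`, where `files` is your own list of local paths.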


## Approach:

### 1.	Dataset Preparation
-	Create a Nucleus dataset containing all relevant documents over a chosen historical period

    

In [None]:
print('---- Select the PBOC documents from the embedded Central Bank datafeed ----')
dataset = 'sumup/central_banks_chinese'  # embedded datafeed in Nucleus
# Keep only PBOC speeches, press releases and publications
metadata_selection = {'bank': 'people_bank_of_china', 'document_category': ('speech', 'press release', 'publication')}

-	For a given date in that period, retain only a subset of the documents published during a chosen lookback period

**This can be done directly within the APIs that perform content analysis, through their period_start and period_end parameters, see below**
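For a single date, the lookback filter comes down to choosing `period_start` and `period_end`. A minimal sketch, assuming the `"YYYY-MM-DD 00:00:00"` string format used in this notebook's payloads; `lookback_window` is a hypothetical helper, not part of the Nucleus SDK.

```python
import datetime


def lookback_window(as_of, lookback_days):
    """Return (period_start, period_end) strings covering the lookback_days days up to as_of."""
    start = as_of - datetime.timedelta(days=lookback_days)
    fmt = "%Y-%m-%d 00:00:00"  # %m is the month; %M would be minutes
    return start.strftime(fmt), as_of.strftime(fmt)
```

With `as_of = datetime.datetime(2019, 1, 1)` and a 61-day lookback, this returns `("2018-11-01 00:00:00", "2019-01-01 00:00:00")`, the window used in the Section 2 example.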



### 2.	Sentiment Analysis
-	Identify and extract key topics at a given point in time, on the subset of documents available at that date


-	Measure the sentiment of each topic and aggregate the topic sentiments into a PBOC-level sentiment


-	Further down, we discuss how to refine the sentiment analysis by leveraging the different parameters available to the user
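Written out, the aggregation step above (the `np.dot` of topic sentiments and topic strengths in the code below) is a strength-weighted sum over the $K$ = num_topics topics extracted from the lookback window ending at date $t$:

$$S_{\text{PBOC}}(t) = \sum_{k=1}^{K} w_k(t)\, s_k(t)$$

where $s_k(t)$ is the sentiment of topic $k$ and $w_k(t)$ its strength, as returned by the Topic Sentiment API. If the strengths sum to one, $S_{\text{PBOC}}(t)$ is a weighted average of topic sentiments; whether the API normalizes them this way has not been verified here.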



In [None]:
import numpy as np

print('---------------- Get topic sentiment ------------------------')

payload = nucleus_api.TopicSentimentModel(dataset='sumup/central_banks_chinese',
                                          query='',
                                          num_topics=8,
                                          num_keywords=8,
                                          metadata_selection=metadata_selection,
                                          period_start="2018-11-01 00:00:00",
                                          period_end="2019-01-01 00:00:00")
api_response = api_instance.post_topic_sentiment_api(payload)

for i, res in enumerate(api_response.result):
    print('Topic', i, ' keywords: ', res.keywords)
    print('    Sentiment:', res.sentiment)
    print('---------------')

# Aggregate the topic sentiments into a PBOC-level sentiment, weighting each topic by its strength.
# api_response.result is a list of topics, so collect the per-topic fields first.
sentiments = np.array([float(res.sentiment) for res in api_response.result])
strengths = np.array([float(res.strength) for res in api_response.result])
PBOC_sent = np.dot(sentiments, strengths)

-	Repeat the above tasks for each date in the historical period to get the complete history of your sentiment indicator

In [None]:
import datetime
import numpy as np

print('--------------- Retrieve the time range of the dataset -------------')

payload = nucleus_api.DatasetInfo(dataset='sumup/central_banks_chinese', query='')
api_response = api_instance.post_dataset_info(payload)

first_date = datetime.datetime.fromtimestamp(float(api_response.result.time_range[0]))
last_date = datetime.datetime.fromtimestamp(float(api_response.result.time_range[1]))
delta = last_date - first_date

# Loop through time and, at each date, compute the PBOC sentiment indicator
T = 90  # look-back period in days

PBOC_dates = []
PBOC_sentiments = []
# Start once a full T-day window is available and stop at the end of the dataset
for i in range(delta.days - T + 1):
    # Daily indicator: the window [end_date - T, end_date] slides forward by one day per iteration
    end_date = first_date + datetime.timedelta(days=T + i)
    start_date = end_date - datetime.timedelta(days=T)
    start_date_str = start_date.strftime("%Y-%m-%d 00:00:00")  # %m = month (%M would be minutes)
    end_date_str = end_date.strftime("%Y-%m-%d 00:00:00")

    payload = nucleus_api.TopicSentimentModel(dataset="sumup/central_banks_chinese",
                                              query='',
                                              num_topics=8,
                                              num_keywords=8,
                                              metadata_selection=metadata_selection,
                                              period_start=start_date_str,
                                              period_end=end_date_str)
    api_response = api_instance.post_topic_sentiment_api(payload)

    # Aggregate the topic sentiments, weighting each topic by its strength
    sentiments = np.array([float(res.sentiment) for res in api_response.result])
    strengths = np.array([float(res.strength) for res in api_response.result])
    PBOC_dates.append(end_date)
    PBOC_sentiments.append(np.dot(sentiments, strengths))

## 3.	Results Interpretation
-	Plot the time series of this PBOC sentiment indicator against government yields, credit spread indices and equity indices
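A minimal plotting sketch with matplotlib, assuming you have your own daily market series (for example a 10-year government bond yield) as a list of dates and a list of values; `plot_sentiment_vs_market` and its arguments are hypothetical names. Twin y-axes are used because the sentiment indicator and a yield or spread live on very different scales.

```python
import matplotlib.pyplot as plt


def plot_sentiment_vs_market(dates, sentiment, market_dates, market_values, market_label):
    """Plot the sentiment indicator and one market series on twin y-axes sharing the x-axis."""
    fig, ax_sent = plt.subplots(figsize=(10, 4))
    ax_sent.plot(dates, sentiment, color="tab:blue")
    ax_sent.set_ylabel("PBOC sentiment", color="tab:blue")

    ax_mkt = ax_sent.twinx()  # second y-axis for the market series
    ax_mkt.plot(market_dates, market_values, color="tab:red")
    ax_mkt.set_ylabel(market_label, color="tab:red")

    fig.autofmt_xdate()  # rotate date labels so they do not overlap
    return fig
```

Call it once per market series (yields, credit spread index, equity index) with the dates and values of your sentiment indicator as the first two arguments.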



## 4.	Fine Tuning

### a.	Tailoring the topics
-	Consider tailoring your sentiment measure by excluding topics you judge not to be impactful. This is done with the custom_stop_words parameter of the Topic Sentiment API.


-	Identify and extract key topics on the subset of documents and print their keywords



In [None]:
print('------------- Get list of topics from dataset --------------')

payload = nucleus_api.Topics(dataset='sumup/central_banks_chinese',                         
                            query='',                       
                            num_topics=8, 
                            num_keywords=8,
                            metadata_selection=metadata_selection,
                            period_start="2018-11-01 00:00:00",
                            period_end="2019-01-01 00:00:00")
api_response = api_instance.post_topic_api(payload)        
    
for i, res in enumerate(api_response.result.topics):
    print('Topic', i, ' keywords: ', res.keywords)    
    print('---------------')

You can then tailor the sentiment analysis with a custom_stop_words variable. Initialize it as follows, for instance, and pass it to the TopicSentimentModel payload in the main code of Section 2 (custom_stop_words=custom_stop_words):

In [None]:
custom_stop_words = ["conference","government"] # str | List of stop words. (optional)
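custom_stop_words acts on words sent to the API. A different, client-side technique is to drop whole topics after extraction when their keywords hit an exclusion list, then re-aggregate. The sketch below does that; `aggregate_excluding` is a hypothetical helper, and it assumes `res.keywords` is a list of strings (split it first if the API returns a single string).

```python
import numpy as np


def aggregate_excluding(topics, excluded_words):
    """Strength-weighted sum of topic sentiments, skipping topics whose keywords hit excluded_words.

    topics: iterable of (keywords, sentiment, strength) tuples, e.g.
    [(res.keywords, res.sentiment, res.strength) for res in api_response.result]
    """
    excluded = {w.lower() for w in excluded_words}
    kept = [(float(s), float(w)) for kw, s, w in topics
            if not excluded.intersection(k.lower() for k in kw)]
    if not kept:
        return 0.0  # every topic was excluded
    sentiments, strengths = np.array(kept).T
    return float(np.dot(sentiments, strengths))
```

Comparing `aggregate_excluding(topics, custom_stop_words)` with `aggregate_excluding(topics, [])` shows how much the excluded topics moved the indicator on a given date.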

### b.	Focusing the sentiment analysis on certain subjects
To focus the sentiment analysis on particular subjects, for instance policy and macro-economic ones, replace the empty query in the main code of Section 2 with:

In [None]:
query = '(inflation OR growth OR unemployment OR stability OR regulation)' # str | Fulltext query, using mysql MATCH boolean query format. Example: "(word1 OR word2) AND (word3 OR word4)" (optional)
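When experimenting with several subject lists, it can help to build the boolean query from groups of terms rather than editing the string by hand. A minimal sketch of the format described in the comment above (terms within a group are OR-ed, groups are AND-ed); `build_query` is a hypothetical helper.

```python
def build_query(*term_groups):
    """Build a boolean fulltext query: terms within a group are OR-ed, groups are AND-ed.

    build_query(["word1", "word2"], ["word3", "word4"])
    gives "(word1 OR word2) AND (word3 OR word4)".
    """
    return " AND ".join("(" + " OR ".join(group) + ")" for group in term_groups)
```

Calling `build_query(["inflation", "growth", "unemployment", "stability", "regulation"])` reproduces the query above.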

### c.	Exploring the impact of the document type, the lookback period and the number of topics extracted
**num_topics**: You can compute the sentiment indicator over a broader or narrower set of topics by changing num_topics in the payload of the main code of Section 2. A larger value gives a broader measure of sentiment, while a smaller value gives a narrower one focused on the dominant topics. If num_topics is too large, marginal topics may add a lot of noise to the sentiment measure.

**T**: You can change how quickly the sentiment indicator reacts by changing the lookback T in the main code of Section 2. A larger value gives a slowly changing measure of sentiment, while a smaller value gives a very responsive one. If T is too small, too few documents fall in each window and the measure becomes noisy; if T is too large, the indicator won't reflect important new information quickly enough.
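To build intuition for the T trade-off without calling the API, the toy sketch below replaces the Topic Sentiment API with a plain trailing average of a synthetic daily sentiment series. This is a simplifying assumption: the real indicator re-extracts topics in each window. `trailing_mean` is a hypothetical helper.

```python
import numpy as np


def trailing_mean(daily_values, lookback_days):
    """Mean of the last lookback_days daily values at each date; NaN until the window is full."""
    x = np.asarray(daily_values, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))  # csum[k] = sum of the first k values
    out[lookback_days - 1:] = (csum[lookback_days:] - csum[:-lookback_days]) / lookback_days
    return out


# A sentiment shift from 0 to 1 on day 100: a short window catches up within T days,
# a long window still mostly reflects the old regime.
shift = [0.0] * 100 + [1.0] * 100
fast, slow = trailing_mean(shift, 10), trailing_mean(shift, 90)
```

Ten days after the shift, the 10-day indicator has fully adjusted while the 90-day one has moved only a ninth of the way.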

**Document types**: You can investigate how the sentiment indicator changes when it is measured on a single document type (speeches, press releases or publications) rather than on the whole content, using the metadata selector provided when the dataset was built. Rerun the main code of Section 2 on that subset of the corpus by creating a metadata_selection variable and passing it to the payload:



In [None]:
metadata_selection = {'bank': 'people_bank_of_china', 'document_category': 'speech'}  # keep the PBOC filter, restrict to speeches (optional)
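To compare every document type against the full corpus in one pass, you can generate one selection per type and rerun the Section 2 code for each. A minimal sketch; `document_type_selections` is a hypothetical helper, and the tuple form for "all" mirrors the selection used in Section 1.

```python
def document_type_selections(bank="people_bank_of_china",
                             categories=("speech", "press release", "publication")):
    """Return one metadata_selection per document type, plus "all", always keeping the bank filter."""
    selections = {c: {"bank": bank, "document_category": c} for c in categories}
    selections["all"] = {"bank": bank, "document_category": tuple(categories)}
    return selections
```

For example, `for name, metadata_selection in document_type_selections().items(): ...` followed by the Section 2 loop gives one indicator per document type.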

## 5.	Next Steps
-	Possible extension: repeat the above tasks for each of the other major central banks (Federal Reserve, European Central Bank, Bank of England, Bank of Canada, Bank of Japan, Reserve Bank of Australia, Bundesbank)
 - This gives you an indicator per country, which can lead to independent per-country signals
 - You could also rank each country based on that measure of sentiment and this can lead to a security-selection signal


-	Perform a correlation analysis between the time series of the sentiment indicator and government yields, credit spread indices or equity indices
 - Several time horizons for price impact can be studied: 1 day, 7 days, a few weeks, perhaps an even longer-lasting impact
 - Several time lags for price impact can be studied: the following day, a 2 to 3 day gap before the market starts adjusting, a week, or an even longer gap before markets start incorporating the information from the central bank's sentiment
 - Which underlying assets appear to have the strongest response? Within an asset class, do physical securities or futures respond more?


-	Possible extension: explore simple transformations of the indicator
 - If you have independent per-country indicators, you could rescale and smooth these indicators using a time-series score 

            Score(t) = ( Indicator(t) - Average(Indicator, [t - N ; t]) ) / Std(Indicator, [t - N ; t])

 - If you have a ranking of countries based on each sentiment indicator, you could rescale and smooth these rankings using a cross-sectional score

            Score(Bank i) = ( Indicator(Bank i) - Average(Indicator, [Banks]) ) / Std(Indicator, [Banks])
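The correlation study and the two scores above can be sketched with NumPy as follows. This is a minimal sketch under our own assumptions: indicator and price series are plain daily arrays already aligned on the same dates, standard deviations are population ones (ddof=0), and `lagged_correlation`, `time_series_score` and `cross_sectional_score` are illustrative names, not part of the Nucleus APIs.

```python
import numpy as np


def lagged_correlation(indicator, prices, lag, horizon):
    """Correlation between the indicator change on day t and the price change
    from day t + lag to day t + lag + horizon (lag >= 0, horizon >= 1)."""
    ind = np.asarray(indicator, dtype=float)
    px = np.asarray(prices, dtype=float)
    d_ind = np.diff(ind)                         # d_ind[t - 1] = ind[t] - ind[t - 1]
    t = np.arange(1, len(ind) - lag - horizon)   # days with a complete forward window
    fwd = px[t + lag + horizon] - px[t + lag]    # forward price change
    return float(np.corrcoef(d_ind[t - 1], fwd)[0, 1])


def time_series_score(indicator, n):
    """Score(t) = (I(t) - mean of I over [t - N, t]) / std of I over [t - N, t].
    NaN until N + 1 observations are available; 0 when the window is flat."""
    x = np.asarray(indicator, dtype=float)
    out = np.full(x.shape, np.nan)
    for t in range(n, len(x)):
        window = x[t - n:t + 1]                  # the N + 1 observations in [t - N, t]
        sd = window.std()
        out[t] = (x[t] - window.mean()) / sd if sd > 0 else 0.0
    return out


def cross_sectional_score(indicator_by_bank):
    """Score(bank i) = (I(bank i) - mean over banks) / std over banks, for a single date."""
    banks = list(indicator_by_bank)
    values = np.array([indicator_by_bank[b] for b in banks], dtype=float)
    sd = values.std()
    scores = (values - values.mean()) / sd if sd > 0 else np.zeros_like(values)
    return {bank: float(s) for bank, s in zip(banks, scores)}
```

Scanning `lagged_correlation` over a small grid, for instance lags of 0, 1, 2, 3 and 5 days and horizons of 1, 7 and 21 days, covers the lag and horizon questions listed above; the dictionary passed to `cross_sectional_score` holds one indicator value per central bank for the same date.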




In [None]:
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# x: change in the sentiment indicator; y: change in the rate (or any other tradable asset)
# rates: your own daily rate series aligned with the indicator dates (not provided in this notebook)
sentiment = np.asarray(PBOC_sentiments, dtype=float)
rates = np.asarray(rates, dtype=float)

p = 1  # lag in days between the indicator and the market response; to be tested
d_sent = np.diff(sentiment)
d_rate = np.diff(rates)
x = d_sent[:len(d_sent) - p].reshape(-1, 1)  # change in SentimentIndicator(t - p)
y = d_rate[p:]                               # change in rate (t)

# 1. Predict the direction of rates (1 = up, 0 = flat or down)
clf = LogisticRegression()
clf.fit(x, (y > 0).astype(int))

# 2. Predict both the direction and the size of the move
regr = LinearRegression()
regr.fit(x, y)
fitted_score = regr.score(x, y)  # in-sample R^2

# Forecast the rate change p days ahead from the latest change in the indicator
y_predicted = regr.predict(d_sent[-1:].reshape(1, -1))

Copyright (c) 2019 SumUp Analytics, Inc. All Rights Reserved.

NOTICE: All information contained herein is, and remains the property of SumUp Analytics Inc. and its suppliers, if any. The intellectual and technical concepts contained herein are proprietary to SumUp Analytics Inc. and its suppliers and may be covered by U.S. and Foreign Patents, patents in process, and are protected by trade secret or copyright law.

Dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from SumUp Analytics Inc.