<h1><center>  Stocks Sentiment - Nucleus APIs Use Cases</center></h1>


<h1><center>  SumUp Analytics, Proprietary & Confidential</center></h1>
<h1><center>  Disclaimers and Terms of Service available at www.sumup.ai</center></h1>




## Objective: 
-	Measure the sentiment of specific stocks using sell-side research reports


## Data:
-	A collection of equity research reports from the sell-side



## Nucleus APIs used:
-	Dataset creation API
 - 	*api_instance.post_upload_file(file, dataset)*
 - 	*nucleus_helper.import_files(api_instance, dataset, file_iters, processes=1)*

        nucleus_helper.import_files leverages api_instance.post_upload_file with parallel execution to speed up dataset creation
        

-	Document Sentiment API
 - 	*api_instance.post_doc_sentiment_api(payload)*



-	Topic Modeling API
 - 	*api_instance.post_topic_api(payload)*



-	DocInfo API
 - 	*api_instance.post_doc_info(payload)*



## Approach:

### 1.	Dataset Preparation
-	Create a Nucleus dataset containing all relevant documents

    

In [None]:
# Assumes api_instance and nucleus_helper have already been configured for your Nucleus account
import os

print('--------- Append all files from local folder to dataset in parallel -----------')
folder = 'Sellside_research'
dataset = 'Sellside_research'  # str | Destination dataset where the file will be inserted.

# Build the file iterable from a folder, recursively.
# Each item in the iterable is in the format below:
# {'filename': filename,   # filename to be uploaded. REQUIRED
#  'metadata': {           # metadata for the file. Optional
#      'key1': val1,       # keys can have arbitrary names as long as the names only
#      'key2': val2        # contain alphanumeric (0-9|a-z|A-Z) and underscore (_)
#   } 
# }
file_iter = []
for root, dirs, files in os.walk(folder):
    for file in files:
        # To restrict the upload to one file type, test the extension here,
        # e.g. os.path.splitext(file)[1] == '.pdf'
        # (.txt .doc .docx .rtf .html .csv are also supported)
        file_dict = {'filename': os.path.join(root, file),
                     'metadata': {'ticker': 'AAPL',
                                  'company': 'Apple',
                                  'bank': 'Credit Suisse',
                                  'category': 'sell side research',
                                  'date': '2019-01-01'}}
        file_iter.append(file_dict)

file_props = nucleus_helper.upload_files(api_instance, dataset, file_iter, processes=4)
for fp in file_props:
    print(fp.filename, '(', fp.size, 'bytes) has been added to dataset', dataset)

-	For a given analysis date, retain only the subset of documents published during a chosen lookback period

**This can be done directly in the APIs that perform content analysis; see below.**
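
As an illustration of what a lookback selection means, here is a minimal sketch in plain Python that filters file records by their `date` metadata. It does not call Nucleus; the helper name `within_lookback`, the sample records, and the 90-day window are all assumptions made for the example.

```python
from datetime import date, timedelta

def within_lookback(doc_date, as_of, lookback_days):
    """Return True if doc_date (ISO 'YYYY-MM-DD') falls in [as_of - lookback_days, as_of]."""
    published = date.fromisoformat(doc_date)
    return as_of - timedelta(days=lookback_days) <= published <= as_of

# Hypothetical records, shaped like the file_iter items used for the upload
file_iter = [
    {'filename': 'report_a.pdf', 'metadata': {'ticker': 'AAPL', 'date': '2019-01-01'}},
    {'filename': 'report_b.pdf', 'metadata': {'ticker': 'AAPL', 'date': '2018-06-15'}},
    {'filename': 'report_c.pdf', 'metadata': {'ticker': 'AAPL', 'date': '2019-01-20'}},
]

as_of = date(2019, 1, 31)   # analysis date (assumed)
recent = [f for f in file_iter
          if within_lookback(f['metadata']['date'], as_of, lookback_days=90)]

print([f['filename'] for f in recent])
```

Only the reports dated within 90 days before the analysis date are kept; the June 2018 report is dropped.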



### 2.	Measuring the Sentiment of one Stock
- Select all documents relating to the stock


- Measure the sentiment on each document discussing the company


- Aggregate the reports' sentiment into a company's sentiment
 
 
- Further down, we discuss how to refine this analysis by leveraging the different parameters available to the user




In [None]:
# Extract all the documents that relate to a chosen company
import numpy as np

payload = nucleus_api.DocInfo(dataset='Sellside_research',
                             metadata_selection={'ticker': 'AAPL'})
api_response = api_instance.post_doc_info(payload)

# Collect the unique titles of the matching documents
doc_list = np.unique([res.title for res in api_response.result])

print('-------- Get the sentiment of each document ----------------')
reports_sentiment = []
for doc_title in doc_list:
    payload = nucleus_api.DocumentSentimentModel(dataset='Sellside_research', 
                                                doc_title=doc_title, 
                                                custom_stop_words="", 
                                                num_topics=10, 
                                                num_keywords=10)
    api_response = api_instance.post_doc_sentiment_api(payload)
    
    # float() keeps the average well-defined if the score comes back as a numeric string
    reports_sentiment.append(float(api_response.result.sentiment))

# Average the sentiment of each report into a sentiment for the company
company_sentiment = np.mean(reports_sentiment)

-	Repeat the above tasks for each company you are interested in

In [None]:
import numpy as np

# List of companies you are interested in
company_list = ['AAPL', 'GOOG', 'FB', 'BABA', 'NFLX']

# Go through each of them and get the sentiment
company_sentiment = []
for ticker in company_list:
    # Get all docs discussing a given company
    payload = nucleus_api.DocInfo(dataset='Sellside_research',
                                 metadata_selection={'ticker': ticker})
    api_response = api_instance.post_doc_info(payload)

    # Collect the unique titles of the matching documents
    doc_list = np.unique([res.title for res in api_response.result])

    # Get the sentiment of each document
    reports_sentiment = []
    for doc_title in doc_list:
        payload = nucleus_api.DocumentSentimentModel(dataset='Sellside_research', 
                                                    doc_title=doc_title, 
                                                    custom_stop_words="", 
                                                    num_topics=10, 
                                                    num_keywords=10)
        api_response = api_instance.post_doc_sentiment_api(payload)

        # float() keeps the average well-defined if the score comes back as a numeric string
        reports_sentiment.append(float(api_response.result.sentiment))

    # Average the sentiment of each report into a sentiment for the company
    company_sentiment.append(np.mean(reports_sentiment))

### 3.	Results Interpretation
-	Plot the time series of sentiment for the companies within one industry sector, or plot each company's sentiment against its stock returns
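
Alongside a plot, a simple numeric check is the correlation between a company's sentiment series and its returns over the same dates. The sketch below is one possible way to do this with the standard library only. The helper names `align_by_date` and `pearson` are introduced for this example, and every number in the two dictionaries is a made-up placeholder, not a result.

```python
from math import sqrt

def align_by_date(sentiment_by_date, return_by_date):
    """Keep only the dates present in both series, in chronological order."""
    common = sorted(set(sentiment_by_date) & set(return_by_date))
    return (common,
            [sentiment_by_date[d] for d in common],
            [return_by_date[d] for d in common])

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError('need two sequences of equal length >= 2')
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError('correlation is undefined for a constant series')
    return cov / sqrt(var_x * var_y)

# Placeholder values for illustration only (ISO dates sort chronologically)
sentiment_by_date = {'2019-01-31': 0.10, '2019-02-28': 0.20,
                     '2019-03-29': 0.30, '2019-04-30': 0.25}
return_by_date = {'2019-02-28': 0.010, '2019-03-29': 0.030,
                  '2019-04-30': 0.015, '2019-05-31': -0.010}

dates, sentiments, returns = align_by_date(sentiment_by_date, return_by_date)
print('Dates compared:', dates)
print('Sentiment/return correlation:', pearson(sentiments, returns))
```

A correlation computed on so few points is not meaningful on its own; in practice you would use a much longer history and consider lagging the returns relative to the sentiment.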

### 4.	Fine Tuning

#### a.	Tailoring the topics
-	Consider tailoring your company's sentiment by excluding topics that you consider not impactful. This is done with the custom_stop_words input parameter of the Document Sentiment API


-	Identify and extract key topics from the subset of documents that discuss the company



In [None]:
print('------------- Get list of topics from dataset --------------')

payload = nucleus_api.Topics(dataset='Sellside_research',                       
                            query='',                       
                            num_topics=20, 
                            num_keywords=5,
                            metadata_selection={'ticker': 'AAPL'})
api_response = api_instance.post_topic_api(payload)        
    
for i, res in enumerate(api_response.result.topics):
    print('Topic', i, ' keywords: ', res.keywords)    
    print('---------------')

You can then tailor the company's sentiment by creating a custom_stop_words variable. For instance, initialize the variable as follows and pass it in the payload of the main code of section 2:

In [1]:
custom_stop_words = ["call","report"] # str | List of stop words. (optional)

#### b.	Exploring the impact of the type of documents, the lookback period, the number of topics being extracted
**num_topics**: You can compute the company's sentiment using a different breadth of topics by changing the variable num_topics in the payload in the main code of section 2. A larger value provides more breadth in establishing sentiment, while a smaller value provides a shallower measure. If num_topics is too large, some very marginal topics may add a lot of noise to the company sentiment.

**Document types**: You can investigate how the company's sentiment changes if it is measured using sell-side research vs news vs company publications. Rerun the main code of section 2 on those different datasets. You could also construct a dataset with all the content across providers and then select only certain types of documents using a metadata_selection:

In [None]:
metadata_selection = {"category": "News"}   # str | json object of {\"metadata_field\":[\"selected_values\"]} (optional)

### 5.	Next Steps
-	Possible extension: repeat the above tasks for different industry sectors
 - Aggregate the sentiment of each company in a sector to get a sector sentiment
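
A sector-level aggregation could look like the following minimal sketch. It assumes an equal-weighted average, which is only one possible choice (weighting by market capitalization is another). The function name `sector_sentiment`, the ticker-to-sector mapping, and the sentiment values are hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

def sector_sentiment(company_sentiment, sector_of):
    """Equal-weighted average of company sentiment within each sector.

    company_sentiment: dict mapping ticker -> sentiment score
    sector_of:         dict mapping ticker -> sector name
    Tickers missing from sector_of are grouped under 'Unknown'.
    """
    by_sector = defaultdict(list)
    for ticker, score in company_sentiment.items():
        by_sector[sector_of.get(ticker, 'Unknown')].append(score)
    return {sector: mean(scores) for sector, scores in by_sector.items()}

# Placeholder inputs for illustration only
company_sentiment = {'AAPL': 0.25, 'GOOG': 0.75, 'NFLX': -0.5}
sector_of = {'AAPL': 'Technology', 'GOOG': 'Technology', 'NFLX': 'Media'}

for sector, score in sector_sentiment(company_sentiment, sector_of).items():
    print(sector, score)
```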

Copyright (c) 2019 SumUp Analytics, Inc. All Rights Reserved.

NOTICE: All information contained herein is, and remains the property of SumUp Analytics Inc. and its suppliers, if any. The intellectual and technical concepts contained herein are proprietary to SumUp Analytics Inc. and its suppliers and may be covered by U.S. and Foreign Patents, patents in process, and are protected by trade secret or copyright law.

Dissemination of this information or reproduction of this material is strictly forbidden unless prior written permission is obtained from SumUp Analytics Inc.