# Calculation of an analyst question's thematic distinctness

## Analysts’ Questions When Peers Are Listening

Julia Haag, Christian Hofmann, Alexander Paulus, Nina Schwaiger, Thorsten Sellhorn<br>
*LMU Munich School of Management*

The following example illustrates how we calculate our cosine modification score (*DistinctnessQ*) for selected analysts’ questions in [Microsoft’s Q3 2014 conference call](https://view.officeapps.live.com/op/view.aspx?src=https://c.s-microsoft.com/en-us/CMSFiles/Mi-crosoft_Q3_2014_Transcript.docx?version=8b1f52f4-9cbe-d8c2-4ee6-12083c154b2f) held on April 24, 2014. Because the full management presentation runs to several pages, only an excerpt is reproduced below. As indicated in Section 3, we consolidate all relevant speech portions from the management presentation and remove stop words, non-informative word categories, and word inflections. We then calculate *DistinctnessQ* for each analyst question; each question is consolidated and preprocessed in the same way.
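
In formula terms, the score computed in the code cells further below can be written as follows. The symbols $\mathbf{q}$ and $\mathbf{m}$ are our own shorthand for this illustration (an assumption, not notation taken from the paper) and denote the unigram and bigram count vectors of a preprocessed analyst question and of the preprocessed management presentation, respectively:

```latex
\[
\mathit{DistinctnessQ}
  = 1 - \frac{\mathbf{q} \cdot \mathbf{m}}
             {\lVert \mathbf{q} \rVert \, \lVert \mathbf{m} \rVert}
\]
```

Because count vectors are non-negative, the score lies between 0 (the question uses terms in the same proportions as the presentation) and 1 (the question shares no unigram or bigram with the presentation).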

In [1]:
# Import of mandatory Python modules
import pandas as pd
import numpy as np
import nltk
import spacy
from nltk.tokenize import word_tokenize, sent_tokenize
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Loading of the spaCy English pipeline (named entity recognizer and parser are disabled, as the tokenizer below does not use them)
nlp = spacy.load('en_core_web_sm', disable=['ner', 'parser'])

In the following, we define a tokenizer that applies the preprocessing steps typically used in prior literature. To reduce the dimensionality of the text corpus, we exclude highly frequent words (“stop words”; see [Manela and Moreira [2017]](https://www.sciencedirect.com/science/article/abs/pii/S0304405X16301751)), word categories that carry no information for the underlying research question (e.g., [Lee [2016]](https://meridian.allenpress.com/accounting-review/article-abstract/91/1/229/53572/Can-Investors-Detect-Managers-Lack-of-Spontaneity?redirectedFrom=fulltext)), and word inflections, which we remove by lemmatizing each word to its base form (e.g., [Li et al. [2021]](https://academic.oup.com/rfs/article/34/7/3265/5869446)).

In [2]:
# Definition of the tokenizer used for preprocessing
def tokenizer(text):
    doc = nlp(text)
    # Word categories that are kept: nouns, adjectives, proper nouns, and verbs
    informative_pos = {"NOUN", "ADJ", "PROPN", "VERB"}
    words = [token.lemma_ for token in doc if
                 not token.is_stop
                 and not token.is_punct
                 and token.pos_ in informative_pos
                 # Drop tokens whose text contains a digit or a period
                 and not any(ch in token.text for ch in "0123456789.")]
    return words
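
The last condition in the tokenizer drops every token whose text contains a digit or a period. The following spaCy-free sketch isolates that single check; the helper name `contains_digit_or_period` and the sample tokens are hypothetical and serve only to illustrate which kinds of tokens are affected.

```python
def contains_digit_or_period(text):
    """Return True if the text contains a digit (0-9) or a period."""
    return any(ch in text for ch in "0123456789.")


# Made-up sample tokens, not taken from the preprocessed transcript
samples = ["cloud", "365", "18.6", "e-mail", "U.S.", "Q3"]

# Keep only the tokens that pass the check
kept = [token for token in samples if not contains_digit_or_period(token)]
print(kept)
```

Note that the check removes not only numbers but also any token written with a period, such as abbreviations.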

In [3]:
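# Definition of the list of documents: index 0 will hold the management presentation, indices 1 to 3 the analyst questions (despite its name, df is a plain list, not a DataFrame)
df = [0,1,2,3]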

**Management presentation:**

***Satya Nadella, CEO:*** *“[…] We want to build products that people love to use, and as a result, you'll see us increasingly focus on usage as the leading indicator of long-term success. To that end, we're already making progress. Amy will provide additional detail, but I wanted to say a few words about the quarter itself. Today's results demonstrate the breadth and strength of our overall business. We saw strong momentum in Cloud services, our commercial Cloud business more than doubled year-over-year with Office 365 and Azure are both performing extremely well. Business customers continued to make Windows their overwhelming platform of choice with solid growth both in Windows Pro and Windows volume licensing revenues. We saw continued improvement in search, with our US search share growing to 18.6%, and search revenue increasing by 38%. Bing continues to deliver platform capabilities across our products. One recent example of this is the recently-announced Cortana virtual assistant for Windows Phone. And very importantly, across all our businesses, we continue to have a rigorous focus on execution and cost discipline, resulting in solid revenue and earnings per share. I sum up this quarter in two words, execution and transition. We delivered solid financial results, and we took several steps to reorient Microsoft. In recent weeks, we talked about how we're advancing Office, Windows and our data platform, and how we think holistically about the constituencies we serve, IT, developers and the people at the center in the mobile-first cloud first world. We will continue to invest in our cloud capabilities, including Office 365 and Azure in the fast-growing SaaS and Cloud platform markets. We are committed to ensuring that our cloud services are available across all device platforms that people use. We're delivering a cloud for everyone on every device. […]”*



***Amy Hood, CFO:*** *“[…] We had outstanding momentum and results in our cloud services. As Satya mentioned, commercial cloud revenue more than doubled again this quarter. Office 365 is now on an annual revenue run rate of $2.5 billion, and Azure revenue grew over 150\%, driven by both new customers and increased usage. In our Office 365 Home service, we added nearly 1 million new users this quarter, and now have over 4.4 million subscribers, they continued to enhance its value competition with new features, premium services, and cross-platform functionality. As we cross the one-year anniversary since launch, we are pleased with the renewal rates we are experiencing thus far. With Bing, we made clear progress again this quarter. We grew our US share and improved RPS significantly. Display revenue, related to portal and e-mail, declined, while we saw ad revenue growth in products like Skype and Xbox. Importantly, we are innovating, while expanding our cloud gross margins through both improved scale and continuous engineering efforts to drive efficiency. Businesses are clearly expressing their overwhelming preference for Windows. Windows Pro revenue grew 19\%, driven by growth in business PCs. Mix shift to developed markets, where attach is higher, continued strength in the enterprise, and an increased mix of Pro in small and medium businesses. Windows volume licensing also had a solid 11\% revenue growth. Windows XP end of support contributed in part to this growth we saw this quarter, as did a general hardware refresh. […]”*


Hence, in the management presentation, the CEO discusses the business development from a strategic perspective and describes how the firm’s business operations (mainly cloud products) developed positively over the past quarter. The CFO then provides more detail from a financial perspective, highlighting increases in revenue and subscribers for Software-as-a-Service and cloud-related products.

In [4]:
# Management presentation (CEO and CFO excerpts consolidated into one string)
df[0] = ("We want to build products that people love to use, and as a result, you'll see us increasingly focus on usage as the leading indicator of long-term success. To that end, we're already making progress. Amy will provide additional detail, but I wanted to say a few words about the quarter itself. Today's results demonstrate the breadth and strength of our overall business. We saw strong momentum in Cloud services, our commercial Cloud business more than doubled year-over-year with Office 365 and Azure ar both performing extremely well. Business customers continued to make Windows their overwhelming platform of choice with solid growth both in Windows Pro and Windows volume licensing revenues. We saw continued improvement in search, with our US search share growing to 18.6%, and search revenue increasing by 38%. Bing continues to deliver platform capabilities across our products. One recent example of this is the recently-announced Cortana virtual assistant for Windows Phone. And very importantly, across all our businesses, we continue to have a rigorous focus on execution and cost discipline, resulting in solid revenue and earnings per share. I sum up this quarter in two words, execution and transition. We delivered solid financial results, and we took several steps to reorient Microsoft. In recent weeks, we talked about how we're advancing Office, Windows and our data platform, and how we think holistically about the constituencies we serve, IT, developers and the people at the center in the mobile-first cloud first world. We will continue to invest in our cloud capabilities, including Office 365 and Azure in the fast-growing SaaS and Cloud platform markets. We are committed to ensuring that our cloud services are available across all device platforms that people use. We're delivering a cloud for everyone on every device. We had outstanding momentum and results in our cloud services. "
         "As Satya mentioned, commercial cloud revenue more than doubled again this quarter. Office 365 is now on an annual revenue run rate of $2.5 billion, and Azure revenue grew over 150%, driven by both new customers and increased usage. In our Office 365 Home service, we added nearly 1 million new users this quarter, and now have over 4.4 million subscribers, they continued to enhance its value competition with new features, premium services, and cross-platform functionality. As we cross the one-year anniversary since launch, we are pleased with the renewal rates we are experiencing thus far. With Bing, we made clear progress again this quarter. We grew our US share and improved RPS significantly. Display revenue, related to portal and e-mail, declined, while we saw ad revenue growth in products like Skype and Xbox. Importantly, we are innovating, while expanding our cloud gross margins through both improved scale and continuous engineering efforts to drive efficiency. Businesses are clearly expressing their overwhelming preference for Windows. Windows Pro revenue grew 19%, driven by growth in business PCs. Mix shift to developed markets, where attach is higher, continued strength in the enterprise, and an increased mix of Pro in small and medium businesses. Windows volume licensing also had a solid 11% revenue growth. Windows XP end of support contributed in part to this growth we saw this quarter, as did a general hardware refresh.")

**Analysts’ questions:**

***Rick Sherlund, Nomura Asset Management:*** *“First, Satya, on Platform-as-a-Service, I'm kind of curious how aggressively you plan to maybe change your business model to a subscription model and drive, like adobe did, less upfront revenue, more encouraging subscription and cloud-based, and whether we should begin to anticipate what it might look like to Microsoft as you make this transition to more of a subscription business. I'm just not sure what the margins are on your cloud business and as you transition, most SaaS companies have 70%, 80% gross margins, I'm not real sure where you are on your cloud businesses today. Do we think this is going to be a smooth transition, or might we expect it to be a little more disruptive, as you gain more and more traction on a subscription and cloud basis?”*

According to the distribution parameters of *DistinctnessQ* displayed in Table 2, Rick Sherlund’s question falls into the first quartile and can therefore be classified as less distinct. In particular, the question relates thematically to management’s discussion of the development and profitability (margins) of the firm’s business operations. Given this relatively low thematic distinctness, the resulting cosine modification score is comparatively low (0.7490, *see below*).

In [5]:
df[1] = "First, Satya, on Platform-as-a-Service, I'm kind of curious how aggressively you plan to maybe change your business model to a subscription model and drive, like adobe did, less upfront revenue, more encouraging subscription and cloud-based, and whether we should begin to anticipate what it might look like to Microsoft as you make this transition to more of a subscription business. I'm just not sure what the margins are on your cloud business and as you transition, most SaaS companies have 70%, 80% gross margins, I'm not real sure where you are on your cloud businesses today. Do we think this is going to be a smooth transition, or might we expect it to be a little more disruptive, as you gain more and more traction on a subscription and cloud basis?"

***Heather Bellini, Goldman Sachs:*** *“Thank you, and Satya, I’ll echo everybody’s thoughts that it’s great to have you on the call, and hear your perspective. I was wondering if you could share with us the decision recently to offer Windows for free for sub-9-inch devices, and how you think this impacts your share in that arena? And also, how should we think about Windows pricing, given your comment about how Windows is going to play in different market segmentations, how do we see Windows pricing evolving, if at all, for other types of form factors over time?”*

Heather Bellini’s question is comparatively more distinct, with a cosine modification score (*DistinctnessQ*) of 0.8884 (*see below*). She addresses a topic, namely the pricing of the firm’s product, that was not discussed in the management presentation. Hence, the resulting *DistinctnessQ* score is rather high.

In [6]:
df[2] = "Thank you, and Satya, I'll echo everybody's thoughts that it's great to have you on the call, and hear your perspective. I was wondering if you could share with us the decision recently to offer Windows for free for sub-9-inch devices, and how you think this impacts your share in that arena? And also, how should we think about Windows pricing, given your comment about how Windows is going to play in different market segmentations, how do we see Windows pricing evolving, if at all, for other types of form factors over time?"

***Walter Pritchard, Citigroup:*** *“Amy, just a question for you, I know you're not providing updated guidance for Nokia. Just wondering, maybe a little bit of thought behind that decision. I know the deal closes tomorrow, you probably could have easily moved things around here a bit and reported on early next week, and had that benefit. Is the lack of guidance there, is it just simply you haven't been able to get your hands into that business at this point? Is there something else that's changed in that business, just trying to get a sense. It does open up quite a bit of variability, in terms of how people are going to model things over the next three months.”*

Management talked neither about the firm’s guidance nor about its subsidiaries. Walter Pritchard, however, specifically asks for management’s reasoning for not providing guidance for a specific subsidiary. Our cosine modification score picks this up, and the resulting *DistinctnessQ* score for this question is therefore relatively high (0.9373, *see below*).

In [7]:
df[3] = "Amy, just a question for you, I know you're not providing updated guidance for Nokia. Just wondering, maybe a little bit of thought behind that decision. I know the deal closes tomorrow, you probably could have easily moved things around here a bit and reported on early next week, and had that benefit. Is the lack of guidance there, is it just simply you haven't been able to get your hands into that business at this point? Is there something else that's changed in that business, just trying to get a sense. It does open up quite a bit of variability, in terms of how people are going to model things over the next three months."

In [8]:
# Definition of a count vectorizer that uses the tokenizer defined above and counts unigrams and bigrams
count_tokenizer = CountVectorizer(tokenizer=tokenizer, 
                                    ngram_range=(1,2))
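
With `ngram_range=(1,2)`, each document is represented by counts of single tokens and of adjacent token pairs. The following dependency-free sketch mimics that feature construction on an already tokenized toy input. The helper `unigrams_and_bigrams` and the token list are hypothetical; the sketch only approximates what `CountVectorizer` does internally, where the two tokens of a bigram are joined by a space.

```python
from collections import Counter


def unigrams_and_bigrams(tokens):
    """Return all single tokens followed by all adjacent token pairs."""
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return list(tokens) + bigrams


# Made-up lemmas, standing in for the output of the tokenizer
tokens = ["cloud", "revenue", "double", "cloud", "revenue"]

# Term counts as they would enter one row of the count matrix
counts = Counter(unigrams_and_bigrams(tokens))
print(counts)
```

Including bigrams means that two texts are treated as more similar when they share word sequences, not just individual words.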

In [9]:
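# Calculation of the pairwise cosine similarity between all four documents; DistinctnessQ is one minus that similarity, rounded to four decimals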
count_matrix = count_tokenizer.fit_transform(df)    
count_similarity = cosine_similarity(count_matrix)
DistinctnessQ = np.around(np.subtract(1,count_similarity), decimals=4)
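
The matrix `DistinctnessQ` holds one minus the pairwise cosine similarity, so its column 0 contains each document's distance to the management presentation. As a plain-Python sketch of the same arithmetic, the example below applies it to toy count dictionaries. The function name `cosine_distance` and all counts are illustrative assumptions, not data from the call, and the function assumes that neither input is empty.

```python
import math


def cosine_distance(a, b):
    """One minus the cosine similarity of two sparse term-count dictionaries."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return 1 - dot / (norm_a * norm_b)


# Made-up term counts for a presentation and two questions
presentation = {"cloud": 3, "revenue": 2, "windows": 1}
question_related = {"cloud": 2, "margin": 1}
question_unrelated = {"guidance": 2, "nokia": 1}

print(round(cosine_distance(question_related, presentation), 4))
print(round(cosine_distance(question_unrelated, presentation), 4))
```

The question that shares the term "cloud" with the presentation receives a score well below 1, whereas the question without any shared term receives the maximum score of 1.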

In [10]:
print("DistinctnessQ for Rick Sherlund's questions is: ", DistinctnessQ[1,0])
print("DistinctnessQ for Heather Bellini's questions is: ", DistinctnessQ[2,0])
print("DistinctnessQ for Walter Pritchard's questions is: ", DistinctnessQ[3,0])

DistinctnessQ for Rick Sherlund's questions is:  0.749
DistinctnessQ for Heather Bellini's questions is:  0.8884
DistinctnessQ for Walter Pritchard's questions is:  0.9373
