# A Gentle Introduction To Calculating TF-IDF Values
A step by step mathematical and code-based guide on demystifying TF-IDF values by calculating them on a mystic poem by Rumi.

The Medium article that explains this notebook is at: https://medium.com/@ann.t.sebastian/a-gentle-introduction-to-calculating-tf-idf-values-9e391f8a13e5

In [71]:
import pandas as pd
import numpy as np
from IPython.display import display, HTML

# Table of contents
1. [Intuition behind TF-IDF](#introduction)
2. [Making up a Corpus](#corpus)   
3. [Looking ahead to the final output](#output)
4. [Step 1: Term Frequency](#tf)
5. [Step 2: Inverse Document Frequency](#idf)
6. [Step 3: Aggregate TF and IDF](#aggregate_tf_idf)
    1. [Multiply TF and IDF](#multiply_tf_idf)
    2. [Normalize TF and IDF](#normalize_tf_idf)
7. [Why is TF-IDF still relevant?](#relevance)
8. [Limitations](#limitations)
9. [Assumptions](#assumptions)
10. [References](#references)

# Intuition behind TF-IDF<a name="introduction"></a>

This notebook drills down into TF-IDF (Term Frequency-Inverse Document Frequency), a fairly old mechanism for representing text as numbers.

So, what is TF-IDF? 

Intuitively, to understand what a text is about, we look for words that occur frequently. Term frequency covers that aspect by capturing the number of times each word occurs in the text. However, words that occur in almost every text tell us little about any one of them, so an inverse weighting is introduced to scale those words down. This inverse weighting is referred to as Inverse Document Frequency. Together, TF-IDF captures the relative importance of words in a set of documents or a collection of texts.

There are many great articles on the intuition behind TF-IDF, but not many on how to derive the exact TF-IDF values. The focus of this notebook is to piece together the various calculations involved and show how to derive each step programmatically, so that you can apply them to the texts you are working with. We will look at the maths involved in an intuitive way as we go along.

We will be using a beautiful poem by the mystic poet Rumi as our example corpus. First, we will calculate TF-IDF values for the poem using TfidfVectorizer from the sklearn package. Then, we will pull apart the various components and work through the steps involved in calculating TF-IDF values. Mathematical calculations and Python code will be provided for each step.

Understanding TF-IDF calculations from scratch will help you with developing better intuitions on the results you obtain from applying any algorithm on TF-IDF values.

## Introducing Corpus<a name="corpus"></a>

To illustrate the concept of TF-IDF, we need a corpus. A corpus is a collection of documents. In a typical Natural Language Processing problem, a corpus can be anything from a list of call center logs or transcripts, or a list of social media feedback, to a large collection of research documents.

To illustrate the various steps involved, we need to keep the corpus as small as possible. I chanced upon this quote / beautiful poem from the 13th century Persian Poet and Sufi Mystic Rumi (Jalāl ad-Dīn Muhammad Rūmī) and it fits our use case perfectly. So, we will be using this poem as our list of documents with each sentence considered as a document.

![Rumi poem](RumiQuote.png)

In [72]:
# one sentence of the poem per document
corpus = ["You were born with potential.",
"You were born with goodness and trust.",
"You were born with ideals and dreams.",
"You were born with greatness.",
"You were born with wings.",
"You are not meant for crawling, so don't.",
"You have wings.",
"Learn to use them and fly.",
]

# Looking ahead to the final output<a name="output"></a>
We will be decimating the beautiful poem into mysterious decimals in this step. But, hey, after all, we are trying to demystify these decimals by understanding the calculations involved in TF-IDF. As mentioned before, the values are quite easy to derive with the sklearn package.

In [73]:
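# fit the TF-IDF vectorizer on the corpus and transform the corpus in one step
from sklearn.feature_extraction.text import TfidfVectorizer
tf_idf_vect = TfidfVectorizer()
X_train_tf_idf = tf_idf_vect.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
terms = tf_idf_vect.get_feature_names_out()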


### Helper Functions
As the outputs are sparse matrices, they are converted to dataframes for ease of visualisation. As the matrix is very small (8 rows * 25 columns), memory is not a constraint here.

In [74]:
# create a dataframe from a sparse document-term matrix
def dtm2df(wm, feat_names):

    # label each row with its document index: Doc0, Doc1, ...
    doc_names = ['Doc{:d}'.format(idx) for idx, _ in enumerate(wm)]
    df = pd.DataFrame(data=wm.toarray(), index=doc_names,
                      columns=feat_names)
    return df

# create a single-row dataframe from a (1, n_terms) array of IDF values
def idf2df(wm, feat_names):

    df = pd.DataFrame(data=wm, index=[0],
                      columns=feat_names)
    return df

In [75]:
df_tf_idf = dtm2df(X_train_tf_idf ,terms)
display(HTML(df_tf_idf.to_html()))

Unnamed: 0,and,are,born,crawling,don,dreams,fly,for,goodness,greatness,have,ideals,learn,meant,not,potential,so,them,to,trust,use,were,wings,with,you
Doc0,0.0,0.0,0.383289,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.682895,0.0,0.0,0.0,0.0,0.0,0.383289,0.0,0.383289,0.304834
Doc1,0.37764,0.0,0.293087,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.293087,0.0,0.293087,0.233096
Doc2,0.37764,0.0,0.293087,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.293087,0.0,0.293087,0.233096
Doc3,0.0,0.0,0.383289,0.0,0.0,0.0,0.0,0.0,0.0,0.682895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.383289,0.0,0.383289,0.304834
Doc4,0.0,0.0,0.413022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.413022,0.616716,0.413022,0.328481
Doc5,0.0,0.372697,0.0,0.372697,0.372697,0.0,0.0,0.372697,0.0,0.0,0.0,0.0,0.0,0.372697,0.372697,0.0,0.372697,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166366
Doc6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.725164,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.607744,0.0,0.323703
Doc7,0.307727,0.0,0.0,0.0,0.0,0.0,0.425512,0.0,0.0,0.0,0.0,0.0,0.425512,0.0,0.0,0.0,0.0,0.425512,0.425512,0.0,0.425512,0.0,0.0,0.0,0.0


In the matrix above, each row represents a sentence from the poem and each column represents a unique word in the corpus, in alphabetical order. As you can see, there are a lot of zeros in the matrix, which is why a memory-efficient sparse matrix is used to represent it. I have converted it to a data frame for ease of visualization.
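To put a number on "a lot of zeros", here is a small sketch that counts how many entries the sparse matrix actually stores. The `poem` list is an assumption: it is the eight sentences of the poem as I reconstructed them from the matrix above. The variable names are mine as well.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: the eight poem sentences, reconstructed from the matrix above.
poem = [
    "You were born with potential.",
    "You were born with goodness and trust.",
    "You were born with ideals and dreams.",
    "You were born with greatness.",
    "You were born with wings.",
    "You are not meant for crawling, so don't.",
    "You have wings.",
    "Learn to use them and fly.",
]

X = TfidfVectorizer().fit_transform(poem)

n_rows, n_cols = X.shape                 # documents x unique terms
stored = X.nnz                           # entries the sparse format stores
density = stored / (n_rows * n_cols)     # fraction of cells that are non-zero

print(X.shape, stored, round(density, 2))
```

Only the stored entries take up memory, which matters far more on a real corpus with thousands of documents and terms than it does here.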

Let us interpret the numbers we have received so far. As you may have noticed, the words "you were born" are repeated throughout the corpus. So, we anticipate that these words will not get high TF-IDF scores. If you look at the values for those three words, you can see that they mostly fall between 0.2 and 0.4.

Let us look at Document 0: "You were born with potential". The word "potential" stands out. If you look at the various TF-IDF values in the first row of the matrix, you will see that the word "potential" has the highest TF-IDF value.

Let us look at Document 4 (row 5): "You were born with wings". Again, same as before, the word "wings" has the highest value in that sentence.

Notice that the word "wings" also appears in Document 6. The TF-IDF value for the word "wings" in Document 6 is different from its TF-IDF value in Document 4. In Document 6, the word "wings" is deemed less important than the word "have", as the word "have" appears only once in the entire corpus.

The objective of this article is to look at how the above TF-IDF values can be calculated from scratch. We will be focusing on applying the calculations on the words wings and potential in particular, to derive the values highlighted in red in the matrix displayed above.

We will break apart the various components and then put them back together. We will do this in three steps:
* Step 1: Derive term frequency values 
* Step 2: Derive inverse document frequency values 
* Step 3: Aggregate the above two values using multiplication and normalization

---

# Step 1: Calculate Term Frequency<a name="tf"></a>

The term frequency is pretty straightforward. It is calculated as the number of times a word/term appears in a document.

Let us consider the following 2 documents

* Document 0 - "You were born with potential" - each of the words appears once in this document
* Document 4 - "You were born with wings" - again, each of the words appears once in this document
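Counting by hand for these two documents can be sketched with nothing but the standard library. The helper name `term_counts` is my own; the regular expression mirrors the default token pattern of sklearn's vectorizers (lowercase tokens of two or more word characters).

```python
import re
from collections import Counter

def term_counts(text):
    # Lowercase the text and keep tokens of two or more word characters,
    # mirroring scikit-learn's default token pattern.
    tokens = re.findall(r"(?u)\b\w\w+\b", text.lower())
    return Counter(tokens)

doc0 = "You were born with potential"
doc4 = "You were born with wings"

tf_doc0 = term_counts(doc0)
tf_doc4 = term_counts(doc4)

# A Counter returns 0 for words that do not occur in the document.
print(tf_doc0["potential"], tf_doc4["wings"], tf_doc0["wings"])  # 1 1 0
```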



![Count Vectorizer Example](.\images\count_vect_small.png "count vectorizer example")

Let us now use CountVectorizer to count the words and display the word count matrix. The matrix obtained below is also known as "Bag Of Words" or "Document Term Matrix".

In [76]:
# Count the number of times each word appears in each document (a sentence, in the case of our corpus)
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(corpus)
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() replaces it
terms = count_vect.get_feature_names_out()

df_count = dtm2df(X_train_counts ,terms)
display(HTML(df_count.to_html()))

Unnamed: 0,and,are,born,crawling,don,dreams,fly,for,goodness,greatness,have,ideals,learn,meant,not,potential,so,them,to,trust,use,were,wings,with,you
Doc0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,1
Doc1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,1,1
Doc2,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,1
Doc3,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,1
Doc4,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1
Doc5,0,1,0,1,1,0,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,1
Doc6,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1
Doc7,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,1,0,1,0,0,0,0



![Count Vectorizer All](.\images\count_vect_all.png "count vectorizer")

Based on the above:
* Term Frequency of potential in Doc0 = 1
* Term Frequency of wings in Doc4 = 1
* Term Frequency of wings in Doc6 = 1

---

# Step 2. Calculating Inverse Document Frequency<a name="idf"></a>

Given the relative importance of frequently used words, we need a mechanism to tone down the importance of such words. Enter Inverse Document Frequency. Intuitively, if a word appears in all documents, then it may not play such a big part in differentiating between the documents. 

<i> Similar to Term Frequency </i>
* <i>Document Frequency(term t) = number of documents with the term t/ total number of documents = d(t)/n</i>
* <i>Inverse Document Frequency = total number of documents / number of documents with the term t = n / d(t)</i>
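As a rough illustration of these two ratios, the sketch below counts in how many documents a term occurs. The `poem` list is an assumption (the eight sentences as reconstructed from the count matrix above), and `document_count` is a helper name of my own.

```python
import re

# Assumption: the eight poem sentences, reconstructed from the count matrix above.
poem = [
    "You were born with potential.",
    "You were born with goodness and trust.",
    "You were born with ideals and dreams.",
    "You were born with greatness.",
    "You were born with wings.",
    "You are not meant for crawling, so don't.",
    "You have wings.",
    "Learn to use them and fly.",
]

n = len(poem)
# One set of distinct lowercase tokens per document.
doc_sets = [set(re.findall(r"(?u)\b\w\w+\b", d.lower())) for d in poem]

def document_count(term):
    """d(t): the number of documents that contain the term."""
    return sum(term in tokens for tokens in doc_sets)

for term in ("potential", "wings", "you"):
    d_t = document_count(term)
    # document frequency d(t)/n and its plain inverse n/d(t)
    print(term, d_t, d_t / n, n / d_t)
```

The rarer the word, the larger the plain inverse, which is the behaviour the logarithm in the next step tames.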

A logarithmic scale makes intuitive sense here because log(1) is 0, so a word that appears in every document would get no weight from the logarithm alone. There are also some practical considerations, such as avoiding a division-by-zero error, which is why 1 is added to the denominator.
With the default settings of the TF-IDF vectorizer in sklearn (smooth_idf=True, which adds 1 to both the numerator and the denominator), the inverse document frequency is calculated as below. The final +1 keeps words that appear in every document from being ignored entirely.

$$ idf(t) = \ln \left(\frac{1+n}{1+d(t)}\right)+1 $$
* $n$ is the total number of documents in the document set, and
* $d(t)$ is the number of documents in the document set that contain the term $t$.

Applying the above formula to calculate the IDF value for the word potential
* Number of Documents = 8
* Number of documents in the corpus that contain the word 'potential' = 1

$$ idf(potential) = \ln (\frac{1+8}{1+1})+1
                  = \ln (\frac {9}{2})+1
                  = 2.504077 $$

Applying the above formula to calculate the IDF value for the word wings
* Number of Documents = 8
* Number of documents in the corpus that contain the word 'wings' = 2

$$ idf(wings) = \ln (\frac{1+8}{2+1})+1
                  = \ln (\frac {9}{3})+1
                  = 2.098612 $$
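The two hand calculations above can be checked with a few lines of Python. The function name `smooth_idf` is my own choice for this sketch; the arithmetic is just the formula given earlier.

```python
import math

def smooth_idf(n_docs, docs_with_term):
    """ln((1 + n) / (1 + d(t))) + 1, the smoothed IDF formula used above."""
    return math.log((1 + n_docs) / (1 + docs_with_term)) + 1

idf_potential = smooth_idf(8, 1)   # 'potential' occurs in 1 of 8 documents
idf_wings = smooth_idf(8, 2)       # 'wings' occurs in 2 of 8 documents
idf_you = smooth_idf(8, 7)         # 'you' occurs in 7 of 8 documents

print(round(idf_potential, 6), round(idf_wings, 6), round(idf_you, 6))
# 2.504077 2.098612 1.117783
```

The very common word "you" ends up with the lowest weight of the three, which is exactly the toning down we were after.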

In [77]:
# explore idf
# idf_ attribute can be used to extract IDF values
# transpose the 1D IDF array to convert to a dataframe to make it easy to visualise
df_idf = idf2df(tf_idf_vect.idf_[:,np.newaxis].T ,terms)
display(HTML(df_idf.to_html()))

Unnamed: 0,and,are,born,crawling,don,dreams,fly,for,goodness,greatness,have,ideals,learn,meant,not,potential,so,them,to,trust,use,were,wings,with,you
0,1.81093,2.504077,1.405465,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,2.504077,1.405465,2.098612,1.405465,1.117783


We obtain the values shown above, and we can cross-check them against our calculations.

---

# Step 3: Aggregate TF and IDF<a name="aggregate_tf_idf"></a>

As the name implies, TF-IDF is a combination of Term Frequency (TF) and Inverse Document Frequency (IDF), obtained by multiplying the 2 values together. The sklearn implementation then applies normalization to the product of TF and IDF. Let us look at each of those steps in detail.

### Step 3A:  Multiply TF and IDF<a name="multiply_tf_idf"></a>

To combine the two, we take an element-wise product of the Term Frequency matrix and the Inverse Document Frequency values. Consider the first sentence - "You were born with potential". The product of TF and IDF for this sentence is calculated as below.

![TF.IDF Example](.\images\tf.idf_example.png "TF.IDF example")
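For that one sentence, the same product can be sketched in plain Python. The dictionaries below simply copy the Doc0 counts and the matching IDF values from the tables earlier in this notebook; the variable names are mine.

```python
# Counts for Doc0 and the matching IDF values, copied from the tables above.
tf_doc0 = {"you": 1, "were": 1, "born": 1, "with": 1, "potential": 1}
idf = {"you": 1.117783, "were": 1.405465, "born": 1.405465,
       "with": 1.405465, "potential": 2.504077}

# Element-wise product: each count is scaled by the IDF of the same term.
tf_times_idf = {term: count * idf[term] for term, count in tf_doc0.items()}

for term, value in tf_times_idf.items():
    print(term, value)
```

Because every count in this sentence is 1, the products are just the IDF values themselves.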


In [78]:
# element-wise multiplication: every row of counts is scaled by the single row of IDF values
df_mul = df_count.mul(df_idf.to_numpy())
display(HTML(df_mul.to_html()))

Unnamed: 0,and,are,born,crawling,don,dreams,fly,for,goodness,greatness,have,ideals,learn,meant,not,potential,so,them,to,trust,use,were,wings,with,you
Doc0,0.0,0.0,1.405465,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,1.405465,0.0,1.405465,1.117783
Doc1,1.81093,0.0,1.405465,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,1.405465,0.0,1.405465,1.117783
Doc2,1.81093,0.0,1.405465,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.405465,0.0,1.405465,1.117783
Doc3,0.0,0.0,1.405465,0.0,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.405465,0.0,1.405465,1.117783
Doc4,0.0,0.0,1.405465,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.405465,2.098612,1.405465,1.117783
Doc5,0.0,2.504077,0.0,2.504077,2.504077,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,2.504077,2.504077,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.117783
Doc6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.098612,0.0,1.117783
Doc7,1.81093,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,0.0,2.504077,0.0,0.0,0.0,0.0,2.504077,2.504077,0.0,2.504077,0.0,0.0,0.0,0.0


You may notice that the product of TF and IDF can be above 1. 
Now, the last step is to normalize these values so that TF-IDF values always scale between 0 and 1.

### Step 3 B: Normalize TF-IDF product<a name="normalize_tf_idf"></a>

In day-to-day life, when we want to normalize values so that we can compare them easily, we use percentages or proportions. We could calculate each TF-IDF value as a proportion of the total across the words in a sentence.

Note that both TF and IDF are non-negative: the lowest possible Term Frequency is 0, and with the formula above the Inverse Document Frequency never drops below 1. So, taking proportions is equivalent to what is known as L1 normalization. In L1 normalization, each value in a vector (think of the various TF-IDF values of a sentence) is divided by the sum of the absolute values of all its elements.
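As a quick sketch of what taking proportions would look like, here is L1 normalization applied to the Doc0 products from Step 3A. The values are copied from the table above; this is only an illustration and not the normalization that produced the final matrix.

```python
# Unnormalized TF x IDF values for Doc0, copied from the table in Step 3A.
doc0 = {"born": 1.405465, "potential": 2.504077, "were": 1.405465,
        "with": 1.405465, "you": 1.117783}

# L1 norm: the sum of the absolute values of all elements.
total = sum(abs(v) for v in doc0.values())

# Each value becomes its share of the total, so the shares add up to 1.
l1_normalized = {term: v / total for term, v in doc0.items()}

for term, share in l1_normalized.items():
    print(term, round(share, 4))
```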

There is an option to L1 normalize the values in sklearn, but that is not the default setting.
The default normalization applied is L2 normalization. The easiest way to think about L2 normalization is to think about the length of a line, or the Pythagorean theorem.
![L2 Normalization](.\images\eg_L2_normalisation.png "L2 Normalization example")

In the diagram above, the length of the line is 5. In this case, the line is a 1D vector. When vectors are n-dimensional, the length of the vector is similar to the length of a line but extended to n dimensions. So, if a vector v is composed of n elements, the length of the vector is calculated as
![Length Vector](.\images\length_vector.png "Length of a Vector")
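In case the image does not render, the length can be written out as follows. The notation is my own: $v_1, \dots, v_n$ are the elements of the vector $v$.

$$ \lVert v \rVert_2 = \sqrt{v_1^2 + v_2^2 + \dots + v_n^2} $$

For instance, a vector with elements 3 and 4 has length $\sqrt{3^2 + 4^2} = \sqrt{25} = 5$. The L2-normalized vector is then each element divided by that length:

$$ \hat{v}_i = \frac{v_i}{\lVert v \rVert_2} $$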

In L2 normalization, we are essentially dividing the vector by the length of the vector. For a more mathematical explanation of the L1 and L2 norms, please refer to Wikipedia.
To apply the L2 norm, for each of the sentences we need to calculate the square root of the sum of squares of the products of TF and IDF.
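A minimal sketch of that calculation for the three documents we are following is given below. The rows hold only the non-zero TF x IDF products, copied from the table in Step 3A; the helper name `l2_norm` is my own.

```python
import math

# Non-zero TF x IDF products per document, copied from the table in Step 3A.
rows = {
    "Doc0": [1.405465, 2.504077, 1.405465, 1.405465, 1.117783],
    "Doc4": [1.405465, 1.405465, 2.098612, 1.405465, 1.117783],
    "Doc6": [2.504077, 2.098612, 1.117783],
}

def l2_norm(values):
    # Square root of the sum of squares; zeros would add nothing to the sum.
    return math.sqrt(sum(v * v for v in values))

norms = {doc: l2_norm(values) for doc, values in rows.items()}
for doc, norm in norms.items():
    print(doc, round(norm, 6))
# Doc0 3.666856
# Doc4 3.402882
# Doc6 3.453116
```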


Here are the values obtained
![Euclidean Distance](.\images\euclidean_distance_tfidf.png "Euclidean Distance")


Finally, we are ready to calculate the final scores
* <i>TF-IDF for the word potential in "you were born with potential" (Doc 0): 2.504077 / 3.666856427 = 0.682895</i>
* <i>TF-IDF for the word wings in "you were born with wings" (Doc 4): 2.098612 / 3.402882126 = 0.616716</i>
* <i>TF-IDF for the word wings in Doc 6: 2.098612 / 3.453116387 = 0.607744</i>

This can be achieved programmatically with the following code.

In [79]:
from sklearn.preprocessing import Normalizer
df_mul.iloc[:,:] = Normalizer(norm='l2').fit_transform(df_mul)
display(HTML(df_mul.to_html()))

Unnamed: 0,and,are,born,crawling,don,dreams,fly,for,goodness,greatness,have,ideals,learn,meant,not,potential,so,them,to,trust,use,were,wings,with,you
Doc0,0.0,0.0,0.383289,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.682895,0.0,0.0,0.0,0.0,0.0,0.383289,0.0,0.383289,0.304834
Doc1,0.37764,0.0,0.293087,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.293087,0.0,0.293087,0.233096
Doc2,0.37764,0.0,0.293087,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.522185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.293087,0.0,0.293087,0.233096
Doc3,0.0,0.0,0.383289,0.0,0.0,0.0,0.0,0.0,0.0,0.682895,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.383289,0.0,0.383289,0.304834
Doc4,0.0,0.0,0.413022,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.413022,0.616716,0.413022,0.328481
Doc5,0.0,0.372697,0.0,0.372697,0.372697,0.0,0.0,0.372697,0.0,0.0,0.0,0.0,0.0,0.372697,0.372697,0.0,0.372697,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166366
Doc6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.725164,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.607744,0.0,0.323703
Doc7,0.307727,0.0,0.0,0.0,0.0,0.0,0.425512,0.0,0.0,0.0,0.0,0.0,0.425512,0.0,0.0,0.0,0.0,0.425512,0.425512,0.0,0.425512,0.0,0.0,0.0,0.0


When you run the above code, you get the results shown above, which are the same as the matrix in the final output section.
So far, we have done the following:
![Calculation Overview](.\images\calculation_overview.png "Calculation Overview")
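To tie the steps together, here is a from-scratch sketch that uses only the standard library and never calls sklearn. It assumes the eight poem sentences as reconstructed from the matrices in this notebook, and a tokenizer that mirrors sklearn's default token pattern; the function names are my own. It is meant as a cross-check of the walkthrough, not as a replacement for the library.

```python
import math
import re
from collections import Counter

# Assumption: the eight poem sentences, reconstructed from the matrices above.
poem = [
    "You were born with potential.",
    "You were born with goodness and trust.",
    "You were born with ideals and dreams.",
    "You were born with greatness.",
    "You were born with wings.",
    "You are not meant for crawling, so don't.",
    "You have wings.",
    "Learn to use them and fly.",
]

def tokenize(text):
    # Lowercase tokens of two or more word characters.
    return re.findall(r"(?u)\b\w\w+\b", text.lower())

def tf_idf(corpus):
    # Step 1: term frequency, one Counter per document.
    counts = [Counter(tokenize(doc)) for doc in corpus]
    n = len(corpus)
    vocab = sorted(set().union(*counts))
    # Step 2: smoothed inverse document frequency per term.
    df = {t: sum(t in c for c in counts) for t in vocab}
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for c in counts:
        # Step 3A: multiply TF and IDF for the terms present in the document.
        product = {t: c[t] * idf[t] for t in c}
        # Step 3B: divide by the L2 length of the document's vector.
        length = math.sqrt(sum(v * v for v in product.values()))
        rows.append({t: v / length for t, v in product.items()})
    return rows

scores = tf_idf(poem)
print(round(scores[0]["potential"], 6))  # 0.682895
print(round(scores[4]["wings"], 6))      # 0.616716
print(round(scores[6]["wings"], 6))      # 0.607744
```

The three printed values agree with the entries for "potential" and "wings" in the matrix from the final output section.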


---

# Why is TF-IDF still relevant? <a name="relevance"></a>
In the field of Natural Language Processing, word embeddings in 2013 and language models in 2018 have changed the landscape and led to many state-of-the-art models for problems in NLP. So, it seems a bit strange that here in 2020, I have chosen to talk about TF-IDF, which was first formulated in the 1970s. Here are my 2 reasons why I think it is good to understand the underlying calculations.
1. Understanding TF-IDF makes it one step easier to understand the results of algorithms you apply on top of TF-IDF. 
2. In 2018, Google released a text classification framework based on 450K experiments on a few different text sets. In text classification problems, the algorithm has to predict the topic from a predefined set of topics it has been trained on. Text classification is a common problem to solve for many companies. Based on the 450K experiments, Google found that when the ratio of the number of samples to the number of words per sample is below 1500, TF-IDF was the best way to represent text.

# Limitations <a name="limitations"></a>
The main limitation of TF-IDF is that word order, which is an important part of understanding the meaning of a sentence, is not considered.
Another limitation is that document length can introduce a lot of variance in the TF-IDF values.
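The word-order limitation is easy to see with a toy example of my own (the two sentences below are made up and are not part of the poem). TF-IDF is built on word counts, so two sentences with the same counts end up with the same representation.

```python
from collections import Counter

# Two made-up sentences with opposite meanings but identical word counts.
a = Counter("the dog chased the cat".split())
b = Counter("the cat chased the dog".split())

# Identical counts mean identical term frequencies, and therefore
# identical TF-IDF vectors once both are scored against the same corpus.
print(a == b)  # True
```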



# Assumptions and Conventions <a name="assumptions"></a>
Following are the assumptions and conventions used in this article.
1. Rounding - For the ease of visualization, decimals are rounded to 4 decimal places. I have taken care not to round the numbers that are called out as examples. Any rounding changes you may spot in various cells are because of that. 
2. As mentioned in the sklearn documentation, there is a slight difference between most text-book formula for IDF and the implementation in sklearn.
3. For simplicity, I have used default settings for sklearn TF IDF vectorizer. There are ways to alter this such as 

    * a) Use L1 normalization instead of L2 normalization
    * b) Set smooth_idf=False, in which case the 1 that is added to the numerator and denominator is omitted. The smoothing is there to prevent a divide-by-zero error.
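Both alternatives are ordinary `TfidfVectorizer` parameters (`norm` and `smooth_idf`). The sketch below shows their effect; the `poem` list is again my reconstruction of the eight sentences, and the variable names are assumptions of this example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Assumption: the eight poem sentences, reconstructed from the matrices above.
poem = [
    "You were born with potential.",
    "You were born with goodness and trust.",
    "You were born with ideals and dreams.",
    "You were born with greatness.",
    "You were born with wings.",
    "You are not meant for crawling, so don't.",
    "You have wings.",
    "Learn to use them and fly.",
]

default = TfidfVectorizer().fit(poem)
unsmoothed = TfidfVectorizer(smooth_idf=False).fit(poem)

# Without smoothing, the IDF of 'wings' becomes ln(8 / 2) + 1 instead of ln(9 / 3) + 1.
col = default.vocabulary_["wings"]
print(default.idf_[col], unsmoothed.idf_[col])

# With L1 normalization, the values in each row add up to 1.
X_l1 = TfidfVectorizer(norm="l1").fit_transform(poem)
row_sums = np.asarray(X_l1.sum(axis=1)).ravel()
print(row_sums)
```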
    


# References: <a name="references"></a>
* [1] <a href="https://scikit-learn.org/stable/modules/feature_extraction.html">scikit-learn documentation</a>
* [2] <a href="https://developers.google.com/machine-learning/guides/text-classification/step-2-5">Text Classification Framework</a>, Google, 2018
* [3] Link to the medium post for helper function


# Additional Notes <a name="notes"></a>

TF-IDF can also be calculated by applying TfidfTransformer to the output of CountVectorizer, as shown below.

In [80]:

# transform the count matrix into TF-IDF values
from sklearn.feature_extraction.text import TfidfTransformer
tf_transformer = TfidfTransformer().fit(X_train_counts)
X_train_tf_idf = tf_transformer.transform(X_train_counts)
X_train_tf_idf.shape
print(dtm2df(X_train_tf_idf ,terms))


           and       are      born  crawling       don    dreams       fly  \
Doc0  0.000000  0.000000  0.383289  0.000000  0.000000  0.000000  0.000000   
Doc1  0.377640  0.000000  0.293087  0.000000  0.000000  0.000000  0.000000   
Doc2  0.377640  0.000000  0.293087  0.000000  0.000000  0.522185  0.000000   
Doc3  0.000000  0.000000  0.383289  0.000000  0.000000  0.000000  0.000000   
Doc4  0.000000  0.000000  0.413022  0.000000  0.000000  0.000000  0.000000   
Doc5  0.000000  0.372697  0.000000  0.372697  0.372697  0.000000  0.000000   
Doc6  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000  0.000000   
Doc7  0.307727  0.000000  0.000000  0.000000  0.000000  0.000000  0.425512   

           for  goodness  greatness  ...  potential        so      them  \
Doc0  0.000000  0.000000   0.000000  ...   0.682895  0.000000  0.000000   
Doc1  0.000000  0.522185   0.000000  ...   0.000000  0.000000  0.000000   
Doc2  0.000000  0.000000   0.000000  ...   0.000000  0.000000  0.000000 