# <center> TF-IDF</center>
**Term Frequency (TF)** The number of times a word appears in a document divided by the total number of words in that document. Each word therefore has a separate term frequency in every document.
![title](https://miro.medium.com/proxy/1*HM0Vcdrx2RApOyjp_ZeW_Q.png)
**Inverse Document Frequency (IDF)** The log of the total number of documents divided by the number of documents that contain the word w. IDF gives a high weight to words that are rare across the corpus and a low weight to words that occur in most documents.
![title](https://miro.medium.com/proxy/1*A5YGwFpcTd0YTCdgoiHFUw.png)
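
The two definitions above can be sketched in plain Python. This is only an illustration of the textbook formulas; the toy corpus and the function names `term_frequency`, `inverse_document_frequency`, and `tf_idf` are assumptions made for this example and are not part of the notebook.

```python
import math

# Toy corpus (hypothetical): each document is already split into tokens.
docs = [
    ["born", "october", "rameswaram"],
    ["born", "middle", "class", "family"],
    ["true", "gem", "person"],
]

def term_frequency(term, doc):
    """Occurrences of `term` in `doc` divided by the number of tokens in `doc`."""
    return doc.count(term) / len(doc)

def inverse_document_frequency(term, docs):
    """Log of (number of documents / number of documents containing `term`).

    Assumes `term` occurs in at least one document; otherwise this divides by zero.
    """
    containing = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / containing)

def tf_idf(term, doc, docs):
    """Product of the two quantities defined above."""
    return term_frequency(term, doc) * inverse_document_frequency(term, docs)

print(tf_idf("born", docs[0], docs))     # "born" appears in 2 of 3 documents
print(tf_idf("october", docs[0], docs))  # "october" appears in 1 of 3 documents
```

Both words have the same term frequency in the first document, but "october" gets the larger score because fewer documents contain it.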

In [1]:
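import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import WordNetLemmatizer
from nltk.corpus import stopwords
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# One-time setup: the tokenizer, stop-word list, and lemmatizer need NLTK data,
# e.g. nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet").
# Resource names can differ between NLTK versions.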

In [2]:
text = """Hello! Good morning to every last one of you presents over here. Before going ahead of I would like to extend a warm welcome to every person present over here. I am thankful to everyone present over here for giving me this beautiful opportunity to share my words about the most amazing person that ever lived, Dr. APJ Abdul Kalam. Dr. APJ Abdul Kalam was indeed one of the most humble, intelligent, 
wise, selfless, loving, and lovable leaders ever born. He was born on 15 October 1931 in Rameswaram, Tamil Nadu. He was the 11th President of India and served the country for one term. Not only this, but he also is one of the most famous scientists who have worked with highly famous organizations like DRDO (Defense Research and Development Organization) and ISRO (Indian Space Research Organization) in his career.
He was a true gem and a person with no haters. But let us first know a little more about him. His full name was Avul Pakir Jainulabdeen Abdul Kalam. He was born in a middle-class Muslim family. Since the beginning of his days, he was a very hardworking and diligent person. In his early childhood, he helped his family to earn livelihood along with the studies. Being very intelligent and promising, he started his career and life soon.
He saw a lot of hardships on his way to success. There was a time when his sister sold her jewelry to pay his college fees. After completing his graduation, he joined the defense department to serve the nation. And from there the journey of the famous scientist started to a never-ending tale of success. He was one of the various scientists in India who worked for the development of nuclear power. For his work, he also earned various awards and prizes.
Dr. Kalam was one of the most important figures in the testing of the Pokhran-II in the year 1988. Politics never attracted Dr. Kalam. But in the year 2002, Indian National Democratic Alliance requested him to nominate him for the post of the President. Thinking of the nation and keen eagerness to work for the country, made him say yes. With the support of the Indian National Democratic Alliance, he won the elections and was selected as the President of India.
He was a man of dreams and ideas. He dreamt of making India one of the super-powers in the world. His idea of dreaming was really different. He emphasized that the dreams are not those which you see when you sleep but are those which never let you sleep. Undoubtedly, these are the precious words of wisdom. He always encouraged everyone to work hard and not think about the result. He believed, if you work hard, you will definitely get the result as well.
Some countless efforts and contributions are made by Dr. Kalam for the sake of the nation. He was awarded by Bharat Ratna in the year 1997. But, the biggest grief is that we have no longer this beautiful amongst us. While delivering his speech at the Institute of Management, Shillong he got cardiac arrest and collapsed. Even after great efforts, he left us, making 27 July 2015 one of the saddest days in the history of India.
At last, I would like to say even though he left us, he is still in our hearts as the inspiration and the motivation. His golden words and miraculous deeds will always be remembered. He was a man of high stature and value who taught us the way to transform our nation and we shall always be grateful to him.
Much thank you to all of you. Have a great evening!"""

### Text preprocessing

In [3]:
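sentences = sent_tokenize(text)

In [4]:
wordNet = WordNetLemmatizer()

In [5]:
# Build the stop-word set once instead of rebuilding it for every word
stop_words = set(stopwords.words("english"))

corpus = []
for sentence in sentences:
    # Replace every non-letter character (digits, punctuation) with a space
    review = re.sub('[^a-zA-Z]', " ", sentence)
    review = review.lower()
    review = review.split()
    # Drop stop words, then lemmatize the remaining words
    review = [wordNet.lemmatize(word) for word in review if word not in stop_words]
    corpus.append(" ".join(review))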
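
To see what the cleaning steps do to a single sentence, here is a minimal sketch that needs no NLTK data. It assumes a deliberately tiny, hypothetical stop-word set (`STOP_WORDS`) and skips lemmatization, so it only approximates the loop above; `clean_sentence` is a name introduced for this example.

```python
import re

# Hypothetical, deliberately tiny stop-word set; NLTK's English list is much longer.
STOP_WORDS = {"he", "was", "on", "in", "the", "a", "of"}

def clean_sentence(sentence, stop_words=STOP_WORDS):
    """Keep letters only, lowercase, split on whitespace, and drop stop words."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", sentence)
    tokens = letters_only.lower().split()
    return " ".join(tok for tok in tokens if tok not in stop_words)

print(clean_sentence("He was born on 15 October 1931 in Rameswaram, Tamil Nadu."))
# → born october rameswaram tamil nadu
```

Because digits are replaced with spaces, a token such as "11th" is reduced to "th", which is why a stray "th" can show up in the cleaned corpus.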

In [6]:
corpus

['hello',
 'good morning every last one present',
 'going ahead would like extend warm welcome every person present',
 'thankful everyone present giving beautiful opportunity share word amazing person ever lived dr apj abdul kalam',
 'dr apj abdul kalam indeed one humble intelligent wise selfless loving lovable leader ever born',
 'born october rameswaram tamil nadu',
 'th president india served country one term',
 'also one famous scientist worked highly famous organization like drdo defense research development organization isro indian space research organization career',
 'true gem person hater',
 'let u first know little',
 'full name avul pakir jainulabdeen abdul kalam',
 'born middle class muslim family',
 'since beginning day hardworking diligent person',
 'early childhood helped family earn livelihood along study',
 'intelligent promising started career life soon',
 'saw lot hardship way success',
 'time sister sold jewelry pay college fee',
 'completing graduation joined defen

In [7]:
tfidf = TfidfVectorizer()

In [8]:
x = tfidf.fit_transform(corpus)
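
Note that `TfidfVectorizer` with default settings does not use the plain log ratio from the definition at the top. With `smooth_idf=True` it computes `ln((1 + n) / (1 + df)) + 1` and then scales every row to unit L2 length. The sketch below checks this on three cleaned sentences taken from the corpus; `small_corpus` and the other variable names are assumptions made for this example.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Three cleaned sentences taken from the corpus shown above
small_corpus = [
    "born october rameswaram tamil nadu",
    "born middle class muslim family",
    "true gem person hater",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(small_corpus)

# "born" occurs in 2 of the 3 documents
n_docs = len(small_corpus)
born_col = vectorizer.vocabulary_["born"]
expected_idf = np.log((1 + n_docs) / (1 + 2)) + 1
print(vectorizer.idf_[born_col], expected_idf)

# With the default norm="l2", every row has Euclidean length 1
row_norms = np.linalg.norm(matrix.toarray(), axis=1)
print(row_norms)
```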

In [9]:
x.toarray()

array([[0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.3408959 , 0.        , ..., 0.30710955, 0.        ,
        0.        ],
       ...,
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ],
       [0.        , 0.        , 0.        , ..., 0.        , 0.        ,
        0.        ]])
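
The dense array is mostly zeros and hard to read on its own. One way to inspect it is to map column indices back to terms and list the highest-weighted terms of a row. The sketch below does this on a three-sentence subset of the corpus; the names `small_corpus`, `index_to_term`, and `top_terms` are assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three cleaned sentences taken from the corpus shown above
small_corpus = [
    "born october rameswaram tamil nadu",
    "born middle class muslim family",
    "true gem person hater",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(small_corpus)

# Rows are sentences, columns are vocabulary terms
print(matrix.shape)

# vocabulary_ maps term -> column index; invert it to label the columns
index_to_term = {int(idx): term for term, idx in vectorizer.vocabulary_.items()}

# Three highest-weighted terms of the first sentence
first_row = matrix[0].toarray().ravel()
top_columns = first_row.argsort()[::-1][:3]
top_terms = [index_to_term[int(col)] for col in top_columns]
for col, term in zip(top_columns, top_terms):
    print(term, round(float(first_row[col]), 3))
```

In this subset, "born" appears in two sentences, so it receives a lower weight than the words that are unique to the first sentence.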