## What is NLP (Natural Language Processing)?

NLP is a branch of data science that consists of systematic processes for analyzing, understanding, and deriving information from text data in a smart and efficient manner. By using NLP and its components, one can organize massive amounts of text data, perform numerous automated tasks, and solve a wide range of problems such as automatic summarization, machine translation, named entity recognition, relationship extraction, sentiment analysis, speech recognition, and topic segmentation.

Nowadays, most of us have smartphones with speech recognition, and these phones use NLP to understand what is said. Many people also use laptops whose operating system has built-in speech recognition.

Example:
**Cortana**
![title](https://miro.medium.com/max/700/1*TXj0kr4jVrtLtmvxZFu8Lw.png)

Microsoft Windows has a virtual assistant called Cortana that can recognize natural speech. You can use it to set reminders, open apps, send emails, play games, track flights and packages, check the weather, and so on.

**Siri**
![title](https://miro.medium.com/max/700/1*-AuKCZbXIVOhI-AgX4J8PQ.jpeg)

Siri is the virtual assistant of Apple Inc.’s iOS, watchOS, macOS, HomePod, and tvOS operating systems. Again, you can do many things with voice commands: start a call, text someone, send an email, set a timer, take a picture, open an app, set an alarm, use navigation, and so on.

### Applications of NLP:
**Machine Translation:**
    Machine translation is the process of using computer software to translate text from one natural language (such as English) to another (such as Spanish).


**Speech Recognition:**
    Speech recognition is the process by which a computer (or other type of machine) identifies spoken words. In other words, you talk to your computer and it correctly recognizes what you are saying.
    
**Sentiment Analysis:**
    Sentiment analysis is the process of detecting positive or negative sentiment in text. It’s often used by businesses to detect sentiment in social data, gauge brand reputation, and understand customers.
    
**Question Answering:**
    Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.
    
**Text Summarization:**
    Text summarization is the process of distilling the most important information from a source (or sources) to produce an abridged version for a particular user (or users) and task (or tasks).
  
**Chatbot:**
    A chatbot is a software application used to conduct an online chat conversation via text or text-to-speech, in lieu of providing direct contact with a live human agent.
    
**Text Classification:**
    Text classification is the process of assigning text to one or more categories. Using NLP, a text classifier can automatically analyze text and then assign a set of predefined tags or categories based on its content.
    
**Optical Character Recognition:**
    Optical Character Recognition (OCR) is the electronic conversion of images of typed, handwritten, or printed text into machine-encoded text.
    
**Spell Checking:** 
    Spell checking can be framed as a sequence-to-sequence mapping problem. Given an input sequence that may contain a number of errors, a context-aware spell checker such as ContextSpellChecker (from the Spark NLP library) ranks candidate correction sequences according to three factors, the first being the different correction candidates for each word (the word level).
  
**Spam Detection:**
    Spam detection identifies unsolicited, unwanted, and virus-infested email (called spam) and stops it from getting into email inboxes.

**Named Entity Recognition:**
    Named entity recognition (NER), sometimes referred to as entity chunking, extraction, or identification, is the task of identifying and categorizing key information (entities) in text. An entity can be any word or series of words that consistently refers to the same thing. Every detected entity is classified into a predetermined category. For example, an NER machine learning (ML) model might detect the word “Bdec” in a text and classify it as a “Company”.
    
    
    
### Understanding Natural Language Processing (NLP)

   ![title](https://miro.medium.com/max/581/0*YovzfkM8Ld1LO-87.png)


As humans, we perform natural language processing considerably well, but even then, we are not perfect. We often mistake one thing for another, and we often interpret the same sentences or words differently.

Consider the following sentence:

`I saw a man on a hill with a telescope.`

These are some interpretations of the sentence shown above.
   - There is a man on the hill, and I watched him with my telescope.
   - There is a man on the hill, and he has a telescope.
   - I’m on a hill, and I saw a man using my telescope.
   - I’m on a hill, and I saw a man who has a telescope.
   - There is a man on a hill, and I saw him doing something with my telescope.
   
From the examples above, we can see that language is not “deterministic” (the same sentence does not always have the same interpretation), and an interpretation that suits one person might not suit another.

Therefore, Natural Language Processing (NLP) has to take a non-deterministic approach. In other words, NLP aims to create intelligent systems that can understand how humans understand and interpret language in different situations.

### Components of Natural Language Processing
![title](https://miro.medium.com/max/455/0*9aT_MdjuT9xXGUdU.png)

**Lexical Analysis:**
With lexical analysis, we divide a whole chunk of text into paragraphs, sentences, and words. It involves identifying and analyzing the structure of words.

**Syntactic Analysis:**
Syntactic analysis involves analyzing the words in a sentence for grammar and arranging them in a way that shows the relationships among them. For instance, a sentence such as “The house shop the to goes” is rejected because its word order is not grammatical.

**Semantic Analysis:**
Semantic analysis draws the exact (dictionary) meaning of the words and checks whether the text is meaningful. Phrases such as “hot ice-cream” are rejected.

**Discourse Integration:**
Discourse integration takes into account the context of the text: the meaning of a sentence depends on the sentences that come before it. For example, in “He works at Google.”, the word “he” must refer to someone mentioned in an earlier sentence.

**Pragmatic Analysis:**
Pragmatic analysis deals with overall communication and interpretation of language. It deals with deriving meaningful use of language in various situations.
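The first component, lexical analysis, can be sketched in plain Python. This is a naive illustration under simple assumptions (paragraphs are separated by blank lines, sentences end with `.`, `!` or `?` followed by whitespace, and words are runs of letters, digits, underscores, or apostrophes); the function name `lexical_split` is hypothetical, and real tokenizers such as the NLTK and spaCy ones covered later handle many more cases.

```python
import re

def lexical_split(text):
    """Split raw text into paragraphs -> sentences -> word tokens (naive sketch)."""
    # Paragraphs: separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    result = []
    for para in paragraphs:
        # Sentences: split on whitespace that follows '.', '!' or '?'.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
        # Words: runs of word characters and apostrophes.
        result.append([re.findall(r"[\w']+", s) for s in sentences])
    return result

sample = "I saw a man. He had a telescope!\n\nIt was late."
for i, para in enumerate(lexical_split(sample)):
    print(i, para)
```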


Banks are using natural language processing (NLP) to automate certain document processing, analysis, and customer service activities. Three such applications are:

- **Intelligent document search:** finding relevant information in large volumes of scanned documents.
- **Investment analysis:** automating routine analysis of earnings reports and news so that analysts can focus on alpha generation.
- **Customer service & insights:** deploying chatbots to answer customer queries and understand customer needs.

![title](https://miro.medium.com/max/2560/1*BqX1wu57y5ApVE-5G-EC4w.png)

### Handling Text Files in Python

#### Python provides built-in functions for creating, writing, and reading files.

**How to Open a Text File in Python**
To open a file, use the built-in `open()` function. It returns a file object whose methods and attributes let you perform various operations on the file, such as reading and writing.

**Close the file instance**
To close a file, call the `close()` method of the file object returned by `open()`.

**How to Read a File Line by Line in Python**
If your .txt file is too big to read at once, you can read it line by line. `readline()` returns the next line of the file each time it is called, while `readlines()` reads all remaining lines and returns them as a list.
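A common alternative to calling `close()` yourself is the `with` statement, which closes the file automatically when the block ends, even if an error occurs. The sketch below writes a small hypothetical `sample.txt` (standing in for `skills_set.txt`) to a temporary directory and then shows `readline()`, `read(n)`, and `readlines()`. Note that `readlines()` loads every line into memory, so for very large files iterating over the file object one line at a time is the lighter option.

```python
import os
import tempfile

# Write a small hypothetical sample file (a stand-in for skills_set.txt).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("skill_set\nSAP\nSQL\n")

# The with statement closes the file automatically when the block ends.
with open(path, encoding="utf-8") as f:
    first = f.readline()    # 'skill_set\n' (the newline is kept)
    second = f.readline()   # 'SAP\n'

with open(path, encoding="utf-8") as f:
    first_chars = f.read(5)  # read(n) returns up to n characters, not words

with open(path, encoding="utf-8") as f:
    all_lines = f.readlines()  # all lines at once, as a list of strings

print(repr(first), repr(second), repr(first_chars))
print(all_lines)  # ['skill_set\n', 'SAP\n', 'SQL\n']
print(f.closed)   # True: the file was closed when the with block ended
```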

In [2]:
my_file=open('skills_set.txt','r') #open the file
print(my_file.readline())# reading first line of the document.
print('--------------')
print(my_file.readline())# reading second line of the document.
my_file.close()#close the file.

skill_set

--------------
SAP



In [3]:
my_file=open('skills_set.txt','r') #open the file
print(my_file.read(20))  # read the first 20 characters of the document
my_file.close()

skill_set
SAP
SQL
Ma


In [4]:
# Reading each line of document
my_file=open('skills_set.txt','r')
for line in my_file:
    print(line)
my_file.close()

skill_set

SAP

SQL

Machine Learning

R

SAS

Python

Data Mining

Data Management

STATA

SPSS

Data Analysis

Certified Internal Auditor

Statistical Software

Time Management

Microsoft Office

Excel

Tableau

Data Science

AI

Quantitative Analysis

Analysis Skills

CSS

Image Processing

Cloud Computing

ArcGIS

GIS

AWS

Linux

C/C++

JavaScript

TS/SCI Clearance

TensorFlow

Project Planning

Jira

Statisical Analysis

Scala

Java

"""Drivers License"""

Microsoft SQL Server

Visual Basic

Microsoft Access

LMRT

Natural Language Processing

IIS

Power BI

CAD Software

SharePoint

nan

Git

SVN

React

Adobe Photoshop

HTML5

Scripting

Perl

MATLAB

Splunk

Grant Writing

Hive

Spark

Microsoft Powerpoint

Hadoop

Qualitative Research

.Net

C#

Big Data

Software Development

Signal Processing

Design Experience

Pentaho

Oracle

DB2

Next Generation Sequencing

Bioinformatics

Data Warehouse

Alteryx

Ruby

Marketing

Customer Service

Predictive Analytics

Business Intelli
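
In the output above, each skill is followed by an empty line. That happens because every line read from a file keeps its trailing newline character (`\n`) and `print()` adds another one. A minimal sketch of stripping the newline while iterating, using `io.StringIO` as a stand-in for the real file:

```python
import io

# io.StringIO behaves like an open text file; iterating yields lines with their '\n'.
fake_file = io.StringIO("skill_set\nSAP\nSQL\n")

skills = []
for line in fake_file:
    skills.append(line.rstrip("\n"))  # drop only the trailing newline

print("\n".join(skills))  # one skill per line, no blank lines in between
```

Passing `end=""` to `print()` is another way to avoid the doubled newline when you only want to display the lines.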

### Handling JSON Files in Python

*JSON (JavaScript Object Notation) is a popular data format used for representing structured data. It's common to transmit and receive data between a server and web application in JSON format.*

**How to Read a JSON File in Python**
  - You can use the `json.load()` method to read a file containing a JSON document; it returns the corresponding Python object.
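`json.load()` reads from a file object, while `json.loads()` parses a JSON string; `json.dump()` and `json.dumps()` go the other way. The sketch below uses a small hypothetical string shaped like the `college.json` records, so it runs without the file:

```python
import io
import json

# Hypothetical sample shaped like the records in college.json.
raw = '[{"name": "Marywood University", "alpha_two_code": "US", "state-province": null}]'

records = json.loads(raw)           # loads() parses a JSON string
same = json.load(io.StringIO(raw))  # load() parses a file-like object
print(type(records).__name__)       # list: a top-level JSON array becomes a list
print(records[0]["state-province"]) # None: JSON null becomes Python None

# dumps() serializes Python objects back into a JSON string.
serialized = json.dumps(records, indent=2)
print(json.loads(serialized) == records)  # True: the round trip preserves the data
```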

In [5]:
import json #import json module

In [6]:
with open('college.json', 'r') as file:  # open the JSON file; it is closed automatically
    data = json.load(file)  # parse the JSON into Python objects (here, a list of dicts)
data

[{'domains': ['marywood.edu'],
  'web_pages': ['http://www.marywood.edu'],
  'name': 'Marywood University',
  'alpha_two_code': 'US',
  'state-province': None,
  'country': 'United States'},
 {'domains': ['lindenwood.edu'],
  'web_pages': ['http://www.lindenwood.edu/'],
  'name': 'Lindenwood University',
  'alpha_two_code': 'US',
  'state-province': None,
  'country': 'United States'},
 {'domains': ['sullivan.edu'],
  'web_pages': ['https://sullivan.edu/'],
  'name': 'Sullivan University',
  'alpha_two_code': 'US',
  'state-province': None,
  'country': 'United States'},
 {'domains': ['fscj.edu'],
  'web_pages': ['https://www.fscj.edu/'],
  'name': 'Florida State College at Jacksonville',
  'alpha_two_code': 'US',
  'state-province': None,
  'country': 'United States'},
 {'domains': ['xavier.edu'],
  'web_pages': ['https://www.xavier.edu/'],
  'name': 'Xavier University',
  'alpha_two_code': 'US',
  'state-province': None,
  'country': 'United States'},
 {'domains': ['tusculum.edu'],
 

**How to Read a JSON File with pandas**
   - Another approach is to read the JSON file with the pandas function `pd.read_json()`, which returns a DataFrame.

In [7]:
# Another approach: read the JSON file into a pandas DataFrame.
import pandas as pd
colleges = pd.read_json("college.json")
colleges.head()

|   | domains | web_pages | name | alpha_two_code | state-province | country |
|---|---|---|---|---|---|---|
| 0 | [marywood.edu] | [http://www.marywood.edu] | Marywood University | US |  | United States |
| 1 | [lindenwood.edu] | [http://www.lindenwood.edu/] | Lindenwood University | US |  | United States |
| 2 | [sullivan.edu] | [https://sullivan.edu/] | Sullivan University | US |  | United States |
| 3 | [fscj.edu] | [https://www.fscj.edu/] | Florida State College at Jacksonville | US |  | United States |
| 4 | [xavier.edu] | [https://www.xavier.edu/] | Xavier University | US |  | United States |
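
`pd.read_json()` also accepts a file-like object, which is handy for trying it out without a file. A minimal sketch with a hypothetical two-record sample; the string is wrapped in `io.StringIO` because recent pandas versions deprecate passing literal JSON text directly, and `orient="records"` states explicitly that the input is a list of row objects:

```python
import io

import pandas as pd

# Hypothetical two-record sample in the same shape as college.json.
raw = """[
  {"name": "Marywood University", "country": "United States"},
  {"name": "Xavier University", "country": "United States"}
]"""

# Each JSON object becomes one row; its keys become the columns.
df = pd.read_json(io.StringIO(raw), orient="records")
print(df.shape)             # (2, 2)
print(df["name"].tolist())  # ['Marywood University', 'Xavier University']
```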


### Regular Expressions
- A RegEx, or regular expression, is a sequence of characters that forms a search pattern.
- A RegEx can be used to check whether a string contains the specified search pattern.
- Python's built-in `re` module provides regular expression support; once you have imported it, you can start using regular expressions.
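
The examples in this section use `re.match()`, `re.search()`, and `re.findall()`. Their differences are easy to mix up, so here is a short sketch on a made-up string, together with `re.fullmatch()` and the `^` anchor combined with the `re.MULTILINE` flag (which makes `^` and `$` match at every line):

```python
import re

sample = "I love Python. Python is great."

print(re.match("Python", sample))          # None: match() only checks the start
print(re.search("Python", sample).span())  # (7, 13): first occurrence anywhere
print(re.findall("Python", sample))        # ['Python', 'Python']: every occurrence

# fullmatch() succeeds only if the pattern covers the entire string.
print(re.fullmatch(r"[A-Za-z ]+", "I love Python") is not None)  # True

# '^' anchors to the start; with re.MULTILINE it anchors to the start of each line.
lines = "I am fine\nYou are here\nI am back"
print(re.findall(r"^I am.*$", lines, flags=re.MULTILINE))  # ['I am fine', 'I am back']
```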

In [8]:
s1 = 'Python is an excellent language'
s2 = 'I love the Python language. I also use Python to build applications at work!'

In [9]:
import re
pattern = 'python'
# re.match only returns a match if the pattern is found at the beginning of the string
re.match(pattern, s1)  # returns None: 'python' does not match 'Python' without IGNORECASE

In [10]:
# pattern is in lower case hence ignore case flag helps
# in matching same pattern with different cases
re.match(pattern, s1, flags=re.IGNORECASE)

<re.Match object; span=(0, 6), match='Python'>

In [11]:
# printing matched string and its indices in the original string
m = re.match(pattern, s1, flags=re.IGNORECASE)
print('Found match {} ranging from index {} - {} in the string "{}"'.format(m.group(0), 
                                                                            m.start(), 
                                                                            m.end(), s1))

Found match Python ranging from index 0 - 6 in the string "Python is an excellent language"


In [12]:
# re.search finds a match anywhere within the string, not only at the start.
import re
fhand = open('myfile.txt')
for line in fhand:
    line = line.rstrip()
    if re.search('how', line):  # returns a Match object if found, otherwise None
        print(line)
fhand.close()

hello how are you?
hello there how do you do?


In [13]:
# Display only those lines which are starting with 'I am'
import re
fhand=open('myfile.txt')
for line in fhand:
    line=line.rstrip()
    if re.search('^I am',line): # '^' anchors the pattern to the start of the line
        print(line)
fhand.close()

I am doing good


In [14]:
# Demo of the findall() method
# Extract email addresses from a given string
import re
s='A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM'
ls=re.findall(r'\S+@\S+',s)  # raw string so '\S' reaches the regex engine unchanged
print(ls)

# Extract only the domain names
for email in ls:
    x=email.find('@')
    print(email[x+1:])

['csev@umich.edu', 'cwen@iupui.edu']
umich.edu
iupui.edu
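
The two-step approach above (find the addresses, then slice after `@`) can also be done with a single regular expression. When a pattern contains a capturing group `( )`, `re.findall()` returns only the text matched by the group. A minimal sketch on the same string:

```python
import re

s = "A message from csev@umich.edu to cwen@iupui.edu about meeting @2PM"

# The parentheses capture only the part after '@'; findall returns the captured text.
# "@2PM" is skipped because \S+ requires at least one non-space character before '@'.
domains = re.findall(r"\S+@(\S+)", s)
print(domains)  # ['umich.edu', 'iupui.edu']
```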


In [15]:
# illustrating pattern substitution using sub and subn methods
re.sub(pattern, 'Java', s2, flags=re.IGNORECASE)

'I love the Java language. I also use Java to build applications at work!'

In [16]:
re.subn(pattern, 'Java', s2, flags=re.IGNORECASE)

('I love the Java language. I also use Java to build applications at work!', 2)

### NLTK
NLTK (Natural Language Toolkit) is a suite of libraries and programs for statistical language processing. It is one of the most widely used NLP libraries and contains packages that help machines understand human language and respond to it appropriately.

- If the NLTK library is not installed, use pip to install it:

**!pip install nltk**

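After installation, use `nltk.download()` to download NLTK's additional data packages. For example, `nltk.download('stopwords')` fetches the stop word lists used below, and `nltk.download('punkt')` fetches the model used by `word_tokenize()` (newer NLTK releases use `punkt_tab` instead).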


![Capture.PNG](attachment:Capture.PNG)

In [17]:
import nltk

In [18]:
#nltk.download()

## Text Preprocessing

Since text is the most unstructured form of all the available data, various types of noise are present in it and the data is not readily analyzable without any pre-processing. The entire process of cleaning and standardization of text, making it noise-free and ready for analysis, is known as text preprocessing.


![The text data preprocessing framework.](https://www.kdnuggets.com/wp-content/uploads/text-preprocessing-framework-2.png)
### Basic Text Preprocessing Steps
- Case Conversion
- Punctuation removal
- Stopwords removal
- Spelling correction
- Tokenization
- Stemming
- Lemmatization

### Case Conversion

Machines treat uppercase and lowercase letters as different characters, so words like "Ball" and "ball" are treated as different words. Converting all the text to the same case avoids this problem, and lowercase is the most common choice.
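
The effect is easy to check in plain Python. Python strings also provide `casefold()`, a more aggressive form of `lower()` intended for caseless matching; it differs from `lower()` for some non-English characters such as the German ß:

```python
words = ["Ball", "ball", "BALL"]
print(len(set(words)))                  # 3: differently cased strings are distinct
print(len({w.lower() for w in words}))  # 1: after lowercasing they are the same token

# casefold() also maps characters like 'ß' to their caseless form.
print("Straße".lower())     # straße
print("Straße".casefold())  # strasse
```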

In [19]:
text='Natural language processing (NLP), describes the interaction between human language and computers.'
text

'Natural language processing (NLP), describes the interaction between human language and computers.'

In [20]:
#conversion of text into lower case.
text.lower()

'natural language processing (nlp), describes the interaction between human language and computers.'

In [21]:
#conversion of text into upper letter.
text.upper()

'NATURAL LANGUAGE PROCESSING (NLP), DESCRIBES THE INTERACTION BETWEEN HUMAN LANGUAGE AND COMPUTERS.'

In [22]:
# Load the imdb review dataset 
imdb=pd.read_csv('imdb_sentiment.csv')

In [23]:
# convert each review to lowercase so the same word is not counted as different tokens because of its case
imdb['review']=imdb['review'].apply(lambda x :x.lower())
imdb['review']

0      a very, very, very slow-moving, aimless movie ...
1      not sure who was more lost - the flat characte...
2      attempting artiness with black & white and cle...
3           very little music or anything to speak of.  
4      the best scene in the movie was when gerardo i...
                             ...                        
743    i just got bored watching jessice lange take h...
744    unfortunately, any virtue in this film's produ...
745                     in a word, it is embarrassing.  
746                                 exceptionally bad!  
747    all in all its an insult to one's intelligence...
Name: review, Length: 748, dtype: object

### Punctuation Removal
Another common text preprocessing technique is removing punctuation. Python's `string.punctuation` contains 32 ASCII punctuation characters. We can combine the `string` module with a regular expression to replace any punctuation in the text with an empty string.
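
Besides the regular expression approach shown below, punctuation can be deleted with `str.translate()`. A minimal sketch; note that `string.punctuation` covers only ASCII punctuation, so typographic characters such as ’ or “ are not removed:

```python
import string

print(len(string.punctuation))  # 32
print(string.punctuation)       # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

# str.maketrans with a third argument builds a table that deletes those characters.
table = str.maketrans("", "", string.punctuation)
sample_text = "Natural language processing (NLP), describes the interaction between human language and computers."
print(sample_text.translate(table))
```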

In [24]:
#removal of punctuation using regex.
text

'Natural language processing (NLP), describes the interaction between human language and computers.'

In [25]:
text=re.sub(r'[^\w\s]','',text) #remove everything except words and space
text

'Natural language processing NLP describes the interaction between human language and computers'

In [26]:
imdb['clean']=imdb['review'].apply(lambda x : re.sub(r'[^\w\s]',' ',x))

In [27]:
imdb['clean']

0      a very  very  very slow moving  aimless movie ...
1      not sure who was more lost   the flat characte...
2      attempting artiness with black   white and cle...
3           very little music or anything to speak of   
4      the best scene in the movie was when gerardo i...
                             ...                        
743    i just got bored watching jessice lange take h...
744    unfortunately  any virtue in this film s produ...
745                     in a word  it is embarrassing   
746                                 exceptionally bad   
747    all in all its an insult to one s intelligence...
Name: clean, Length: 748, dtype: object

In [28]:
#remove punctuation using string module
import string
imdb['clean1']=imdb['review'].apply(lambda x: re.sub('[%s]' % re.escape(string.punctuation), '' , x))

In [29]:
imdb['clean1']

0      a very very very slowmoving aimless movie abou...
1      not sure who was more lost  the flat character...
2      attempting artiness with black  white and clev...
3            very little music or anything to speak of  
4      the best scene in the movie was when gerardo i...
                             ...                        
743    i just got bored watching jessice lange take h...
744    unfortunately any virtue in this films product...
745                       in a word it is embarrassing  
746                                  exceptionally bad  
747    all in all its an insult to ones intelligence ...
Name: clean1, Length: 748, dtype: object

### What are stop words?

Stop words are words in a language that do not add much meaning to a sentence. They can often be ignored without sacrificing the meaning of the sentence. For some search engines, they are the most common short function words, such as the, is, at, which, and on. In that case, stop words can cause problems when searching for phrases that include them, particularly in names such as “The Who” or “Take That”.

### When to remove stop words?

If the task is text classification or sentiment analysis, we should remove stop words because they provide little information to the model; removing them keeps unwanted words out of the corpus. If the task is language translation, however, stop words are useful, because they have to be translated along with the other words.

There is no hard and fast rule on when to remove stop words. But I would suggest removing them if the task is Language Classification, Spam Filtering, Caption Generation, Auto-Tag Generation, Sentiment Analysis, or something else related to text classification.

On the other hand, if our task is one of Machine Translation, Question-Answering problems, Text Summarization, Language Modeling, it’s better not to remove the stop words as they are a crucial part of these applications.
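
A minimal sketch of stop word removal with a small hypothetical stop word set (not NLTK's list), using a Python `set` for fast membership tests. It also illustrates why the decision depends on the task: NLTK's English list includes negations such as "not", and dropping them can change the sentiment of a review, so the hypothetical `keep` parameter lets some words survive:

```python
# A small, hypothetical stop word set for illustration (not NLTK's list).
STOP_WORDS = {"the", "a", "an", "is", "was", "this", "very", "not", "and"}

def remove_stop_words(text, stop_words=STOP_WORDS, keep=()):
    """Drop stop words from whitespace-separated text, except those listed in `keep`."""
    active = set(stop_words) - set(keep)
    return " ".join(w for w in text.split() if w.lower() not in active)

review = "this movie was not very good"
print(remove_stop_words(review))                # movie good
print(remove_stop_words(review, keep={"not"}))  # movie not good
```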

### Stopword Removal

In [30]:
from nltk.corpus import stopwords
stop = stopwords.words('english')
print('This are the stopwords',stop)

This are the stopwords ['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're", "you've", "you'll", "you'd", 'your', 'yours', 'yourself', 'yourselves', 'he', 'him', 'his', 'himself', 'she', "she's", 'her', 'hers', 'herself', 'it', "it's", 'its', 'itself', 'they', 'them', 'their', 'theirs', 'themselves', 'what', 'which', 'who', 'whom', 'this', 'that', "that'll", 'these', 'those', 'am', 'is', 'are', 'was', 'were', 'be', 'been', 'being', 'have', 'has', 'had', 'having', 'do', 'does', 'did', 'doing', 'a', 'an', 'the', 'and', 'but', 'if', 'or', 'because', 'as', 'until', 'while', 'of', 'at', 'by', 'for', 'with', 'about', 'against', 'between', 'into', 'through', 'during', 'before', 'after', 'above', 'below', 'to', 'from', 'up', 'down', 'in', 'out', 'on', 'off', 'over', 'under', 'again', 'further', 'then', 'once', 'here', 'there', 'when', 'where', 'why', 'how', 'all', 'any', 'both', 'each', 'few', 'more', 'most', 'other', 'some', 'such', 'no', 'nor', 'not', 'only', 'own', 

In [31]:
text

'Natural language processing NLP describes the interaction between human language and computers'

In [32]:
text=' '.join([x for x in text.split() if x not in stop]) # split the text, remove the stop words from the list,
# and then join the remaining words back into a string.
text

'Natural language processing NLP describes interaction human language computers'

In [33]:
#removing stop words from reviews.
imdb['clean']=imdb['clean'].apply(lambda review: " ".join(word for word in review.split() if word not in stop))
imdb['clean']

0      slow moving aimless movie distressed drifting ...
1      sure lost flat characters audience nearly half...
2      attempting artiness black white clever camera ...
3                            little music anything speak
4      best scene movie gerardo trying find song keep...
                             ...                        
743        got bored watching jessice lange take clothes
744    unfortunately virtue film production work lost...
745                                    word embarrassing
746                                    exceptionally bad
747             insult one intelligence huge waste money
Name: clean, Length: 748, dtype: object

- Now we can see that stop words like `"very"` and `"a"` have been removed from the reviews.

### Spell checks

Spelling mistakes are common and most of us are used to software indicating if a mistake was made or not. From autocorrect on our phones to red underlining in text editors, spell checking is an essential feature for many different products.

#### If TextBlob is not installed, use pip to install it.
**pip install textblob**

In [34]:
# Sometimes words are misspelled by the author, either by mistake or because of a typing error.
# Misspellings inflate the vocabulary of our corpus, so we correct them.
# We will use the TextBlob library.
# If TextBlob is not installed, use pip to install it.
#pip install textblob
from textblob import TextBlob

In [35]:
text_='hostipal is far'  # here the spelling of "hospital" is wrong.

- The `correct()` method is the most straightforward way to correct input text.

In [36]:
text_=TextBlob(text_).correct()  # TextBlob's correct() fixes the spelling.
text_

TextBlob("hospital is far")

In [37]:
# apply the same correction to the first 10 reviews; it is not perfect (e.g. 'artiness' becomes 'artless')
imdb['clean'][:10].apply(lambda x: str(TextBlob(x).correct()))

0    slow moving aimless movie distressed drifting ...
1    sure lost flat characters audience nearly half...
2    attempting artless black white clever camera a...
3                          little music anything speak
4    best scene movie gerard trying find song keeps...
5    rest movie lacks art charm meaning emptiness w...
6                                     wasted two hours
7    saw movie today thought good effort good messa...
8                                      bit predictable
9          loved casting jimmy buffets science teacher
Name: clean, dtype: object
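
TextBlob's corrector is not perfect, as the output above shows ("gerardo" became "gerard"). The sketch below illustrates a different, much simpler idea: replace each word with the closest word from a known vocabulary, measured by Levenshtein (edit) distance. This is not TextBlob's algorithm, the function names are hypothetical, and the vocabulary is a made-up example; a practical corrector would also weigh how frequent each candidate word is.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def correct_word(word, vocabulary):
    """Return the vocabulary word closest to `word` (ties go to the earlier entry)."""
    return min(vocabulary, key=lambda v: levenshtein(word, v))

VOCAB = ["hospital", "hostel", "is", "far", "movie"]  # hypothetical vocabulary
print(" ".join(correct_word(w, VOCAB) for w in "hostipal is far".split()))  # hospital is far
```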

## Tokenization

Tokenization is the process of splitting a large chunk of text (a document, paragraph, or sentence) into smaller units, such as single words or combinations of words. These smaller units are known as tokens.


### Why is Tokenization required in NLP?

Before processing a natural language, we need to identify the words that constitute a string of characters. That’s why tokenization is the most basic step to proceed with NLP (text data). This is important because the meaning of the text could easily be interpreted by analyzing the words present in the text.

### Tokenization using split()

Let’s start with the split() method, as it is the most basic one. It returns a list of strings after breaking the given string at the specified separator. By default, split() breaks a string at runs of whitespace (spaces, tabs, and newlines), but we can change the separator to anything.
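
A short sketch of how `split()` behaves with and without an explicit separator, and with the optional `maxsplit` argument:

```python
sentence = "  NLP   is fun  "

# No argument: split on runs of whitespace and drop leading/trailing whitespace.
print(sentence.split())       # ['NLP', 'is', 'fun']
# Explicit " ": every single space is a separator, so empty strings appear.
print(sentence.split(" "))    # ['', '', 'NLP', '', '', 'is', 'fun', '', '']

print("a,b,,c".split(","))            # ['a', 'b', '', 'c']
print("one two three".split(" ", 1))  # ['one', 'two three']: maxsplit=1
```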

In [38]:
# split() is the most basic tokenizing technique. Here we split on whitespace, which returns a list of all the words.
Text="""Because of problems with her eyesight, rey the African penguin had issues with swimming. That’s unusual for a penguin,
and presented a big challenge for our aviculture team to help Rey overcome her hesitancy.
Slowly and steadily, we trained her to be comfortable feeding in the water like the rest of the penguin colony.
The aviculturists also trained Rey to accept daily eye drops from them as part of her special health care.
Rey already had good relationships with some staff, and was comfortable with them handling her.
Senior Aviculturist Kim Fukuda says the team built on those bonds to get Rey used to receiving the eye drops.
"She knows the routine," Kim says. "I usually give her the eye drops in one area of the exhibit after all the penguins get
their vitamins. When that happens, she runs over there and waits for me." Rosa, our oldest sea otter, has very limited eyesight,
among other health issues. The sea otter team had already trained Rosa so they could examine her eyes,
and built on that trust to include administering the eye drops she needs."""
Text.split()

['Because',
 'of',
 'problems',
 'with',
 'her',
 'eyesight,',
 'rey',
 'the',
 'African',
 'penguin',
 'had',
 'issues',
 'with',
 'swimming.',
 'That’s',
 'unusual',
 'for',
 'a',
 'penguin,',
 'and',
 'presented',
 'a',
 'big',
 'challenge',
 'for',
 'our',
 'aviculture',
 'team',
 'to',
 'help',
 'Rey',
 'overcome',
 'her',
 'hesitancy.',
 'Slowly',
 'and',
 'steadily,',
 'we',
 'trained',
 'her',
 'to',
 'be',
 'comfortable',
 'feeding',
 'in',
 'the',
 'water',
 'like',
 'the',
 'rest',
 'of',
 'the',
 'penguin',
 'colony.',
 'The',
 'aviculturists',
 'also',
 'trained',
 'Rey',
 'to',
 'accept',
 'daily',
 'eye',
 'drops',
 'from',
 'them',
 'as',
 'part',
 'of',
 'her',
 'special',
 'health',
 'care.',
 'Rey',
 'already',
 'had',
 'good',
 'relationships',
 'with',
 'some',
 'staff,',
 'and',
 'was',
 'comfortable',
 'with',
 'them',
 'handling',
 'her.',
 'Senior',
 'Aviculturist',
 'Kim',
 'Fukuda',
 'says',
 'the',
 'team',
 'built',
 'on',
 'those',
 'bonds',
 'to',
 'get',

### Tokenization using regex.

The re.findall() function finds all the substrings that match the pattern passed to it and returns them as a list.
The “\w” represents “any word character”, which usually means alphanumeric characters (letters and numbers) and the underscore (_). ‘+’ means one or more times. So `[\w']+` tells the code to find runs of word characters and apostrophes until any other character is encountered.
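
One detail worth knowing before running the cell: the passage writes “That’s” with a typographic apostrophe (’), which is not in the character class `[\w']`, so the pattern splits it into `'That'` and `'s'`. A minimal sketch on a made-up sentence showing that adding ’ to the class keeps such contractions together:

```python
import re

sample = "That’s unusual, isn't it?"

print(re.findall(r"[\w']+", sample))   # ['That', 's', 'unusual', "isn't", 'it']
print(re.findall(r"[\w'’]+", sample))  # ['That’s', 'unusual', "isn't", 'it']
```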

In [39]:
# we will use re library in Python to work with regular expression.
tokens = re.findall(r"[\w']+", Text)  # raw string so '\w' reaches the regex engine unchanged
tokens

['Because',
 'of',
 'problems',
 'with',
 'her',
 'eyesight',
 'rey',
 'the',
 'African',
 'penguin',
 'had',
 'issues',
 'with',
 'swimming',
 'That',
 's',
 'unusual',
 'for',
 'a',
 'penguin',
 'and',
 'presented',
 'a',
 'big',
 'challenge',
 'for',
 'our',
 'aviculture',
 'team',
 'to',
 'help',
 'Rey',
 'overcome',
 'her',
 'hesitancy',
 'Slowly',
 'and',
 'steadily',
 'we',
 'trained',
 'her',
 'to',
 'be',
 'comfortable',
 'feeding',
 'in',
 'the',
 'water',
 'like',
 'the',
 'rest',
 'of',
 'the',
 'penguin',
 'colony',
 'The',
 'aviculturists',
 'also',
 'trained',
 'Rey',
 'to',
 'accept',
 'daily',
 'eye',
 'drops',
 'from',
 'them',
 'as',
 'part',
 'of',
 'her',
 'special',
 'health',
 'care',
 'Rey',
 'already',
 'had',
 'good',
 'relationships',
 'with',
 'some',
 'staff',
 'and',
 'was',
 'comfortable',
 'with',
 'them',
 'handling',
 'her',
 'Senior',
 'Aviculturist',
 'Kim',
 'Fukuda',
 'says',
 'the',
 'team',
 'built',
 'on',
 'those',
 'bonds',
 'to',
 'get',
 'Re


### Tokenization using NLTK

NLTK contains a module called `tokenize`, which offers two main kinds of tokenization:

- Word tokenize: We use the word_tokenize() method to split a sentence into tokens or words.
- Sentence tokenize: We use the sent_tokenize() method to split a document or paragraph into sentences.

In [40]:
# The tokenize module provides word_tokenize and sent_tokenize, among others.
from nltk.tokenize import word_tokenize
token=word_tokenize(Text)
token

['Because',
 'of',
 'problems',
 'with',
 'her',
 'eyesight',
 ',',
 'rey',
 'the',
 'African',
 'penguin',
 'had',
 'issues',
 'with',
 'swimming',
 '.',
 'That',
 '’',
 's',
 'unusual',
 'for',
 'a',
 'penguin',
 ',',
 'and',
 'presented',
 'a',
 'big',
 'challenge',
 'for',
 'our',
 'aviculture',
 'team',
 'to',
 'help',
 'Rey',
 'overcome',
 'her',
 'hesitancy',
 '.',
 'Slowly',
 'and',
 'steadily',
 ',',
 'we',
 'trained',
 'her',
 'to',
 'be',
 'comfortable',
 'feeding',
 'in',
 'the',
 'water',
 'like',
 'the',
 'rest',
 'of',
 'the',
 'penguin',
 'colony',
 '.',
 'The',
 'aviculturists',
 'also',
 'trained',
 'Rey',
 'to',
 'accept',
 'daily',
 'eye',
 'drops',
 'from',
 'them',
 'as',
 'part',
 'of',
 'her',
 'special',
 'health',
 'care',
 '.',
 'Rey',
 'already',
 'had',
 'good',
 'relationships',
 'with',
 'some',
 'staff',
 ',',
 'and',
 'was',
 'comfortable',
 'with',
 'them',
 'handling',
 'her',
 '.',
 'Senior',
 'Aviculturist',
 'Kim',
 'Fukuda',
 'says',
 'the',
 'tea

- NLTK treats punctuation marks as tokens, so we may want to remove them before further processing.
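
A minimal sketch of dropping punctuation tokens from a token list. The token list is a hand-written sample in the style of the `word_tokenize()` output above, and the extra typographic quote characters added to `string.punctuation` are an assumption about what this text contains:

```python
import string

# Hand-written sample tokens in the style of the word_tokenize output above.
sample_tokens = ["Because", "of", "her", "eyesight", ",", "Rey", "had",
                 "issues", ".", "That", "’", "s"]

# ASCII punctuation plus a few typographic quotes (an assumption for this text).
punct = set(string.punctuation) | {"’", "‘", "“", "”"}

# Keep tokens that are not made up entirely of punctuation characters.
words = [t for t in sample_tokens if not all(ch in punct for ch in t)]
print(words)
```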

### Tokenization using the spaCy library

spaCy is an open-source library for advanced Natural Language Processing (NLP). It supports over 49 languages and provides state-of-the-art computation speed.
- spaCy is faster than many of its competitors.

Installing spaCy

**pip install -U pip setuptools wheel**

**pip install -U spacy**

**python -m spacy download en_core_web_sm**

In [41]:
# if spacy is not installed, use pip method to install it
# import library.
from spacy.lang.en import English
# Create a blank English pipeline; it contains only the tokenizer, so no trained model is needed
nlp = English()

In [42]:
my_doc = nlp(Text)
# Create list of word tokens
token_list = []
for token in my_doc:
    token_list.append(token.text)
token_list

['Because',
 'of',
 'problems',
 'with',
 'her',
 'eyesight',
 ',',
 'rey',
 'the',
 'African',
 'penguin',
 'had',
 'issues',
 'with',
 'swimming',
 '.',
 'That',
 '’s',
 'unusual',
 'for',
 'a',
 'penguin',
 ',',
 '\n',
 'and',
 'presented',
 'a',
 'big',
 'challenge',
 'for',
 'our',
 'aviculture',
 'team',
 'to',
 'help',
 'Rey',
 'overcome',
 'her',
 'hesitancy',
 '.',
 '\n',
 'Slowly',
 'and',
 'steadily',
 ',',
 'we',
 'trained',
 'her',
 'to',
 'be',
 'comfortable',
 'feeding',
 'in',
 'the',
 'water',
 'like',
 'the',
 'rest',
 'of',
 'the',
 'penguin',
 'colony',
 '.',
 '\n',
 'The',
 'aviculturists',
 'also',
 'trained',
 'Rey',
 'to',
 'accept',
 'daily',
 'eye',
 'drops',
 'from',
 'them',
 'as',
 'part',
 'of',
 'her',
 'special',
 'health',
 'care',
 '.',
 '\n',
 'Rey',
 'already',
 'had',
 'good',
 'relationships',
 'with',
 'some',
 'staff',
 ',',
 'and',
 'was',
 'comfortable',
 'with',
 'them',
 'handling',
 'her',
 '.',
 '\n',
 'Senior',
 'Aviculturist',
 'Kim',
 'F

In [43]:
# Tokenize all IMDB reviews, joined into a single string.
review = ' '.join(imdb['review'])
my_doc = nlp(review)
# Create a list of word tokens
token_list = [token.text for token in my_doc]
token_list

['a',
 'very',
 ',',
 'very',
 ',',
 'very',
 'slow',
 '-',
 'moving',
 ',',
 'aimless',
 'movie',
 'about',
 'a',
 'distressed',
 ',',
 'drifting',
 'young',
 'man',
 '.',
 '  ',
 'not',
 'sure',
 'who',
 'was',
 'more',
 'lost',
 '-',
 'the',
 'flat',
 'characters',
 'or',
 'the',
 'audience',
 ',',
 'nearly',
 'half',
 'of',
 'whom',
 'walked',
 'out',
 '.',
 '  ',
 'attempting',
 'artiness',
 'with',
 'black',
 '&',
 'white',
 'and',
 'clever',
 'camera',
 'angles',
 ',',
 'the',
 'movie',
 'disappointed',
 '-',
 'became',
 'even',
 'more',
 'ridiculous',
 '-',
 'as',
 'the',
 'acting',
 'was',
 'poor',
 'and',
 'the',
 'plot',
 'and',
 'lines',
 'almost',
 'non',
 '-',
 'existent',
 '.',
 '  ',
 'very',
 'little',
 'music',
 'or',
 'anything',
 'to',
 'speak',
 'of',
 '.',
 '  ',
 'the',
 'best',
 'scene',
 'in',
 'the',
 'movie',
 'was',
 'when',
 'gerardo',
 'is',
 'trying',
 'to',
 'find',
 'a',
 'song',
 'that',
 'keeps',
 'running',
 'through',
 'his',
 'head',
 '.',
 '  ',
 

### What is Stemming?

Stemming is an elementary rule-based process for removing inflectional forms from a token; the output is the stem of the word.

For example, “laughing”, “laughed”, “laughs” and “laugh” all become “laugh”, their stem, because the inflectional endings are removed.

Stemming is not always a good normalization process, because it can produce words that are not in the dictionary. For example, consider the sentence: “His teams are not winning”

After stemming, the tokens we get are: “hi”, “team”, “are”, “not”, “winn”

Notice that “winn” is not a real word, and “hi” changes the meaning of the entire sentence.
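
To see how purely rule-based suffix stripping produces outputs like these, here is a deliberately crude stemmer of our own (it is *not* Porter's or Snowball's algorithm): it removes a final "ing" or "ed" when at least three characters remain, otherwise a final "s". Applied to the sentence above, it reproduces “hi” and “winn”.

```python
def toy_stem(word):
    """Crude suffix stripping for illustration only (not a real stemmer)."""
    word = word.lower()
    for suffix in ("ing", "ed"):
        # Only strip when at least 3 characters would remain
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    # Strip a plural-looking final "s", but leave "ss" endings such as "class"
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

print([toy_stem(w) for w in "His teams are not winning".split()])
# ['hi', 'team', 'are', 'not', 'winn']
print([toy_stem(w) for w in ["laughing", "laughed", "laughs", "laugh"]])
# ['laugh', 'laugh', 'laugh', 'laugh']
```

The rules never check whether the result is a dictionary word, which is exactly why "his" loses its "s".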

- **Two common types of stemmers:**
  
    1. **Porter’s Stemmer:** 
    It is one of the most popular stemming methods, proposed in 1980. It is based on the idea that suffixes in the English language are made up of combinations of smaller and simpler suffixes. This stemmer is known for its speed and simplicity, and its main applications include data mining and information retrieval. However, it is limited to English words. Also, a group of related words is mapped onto the same stem, and the output stem is not necessarily a meaningful word. The algorithm is fairly lengthy, and it is one of the oldest stemmers.
        
    2. **Snowball Stemmer:**
    Unlike the Porter Stemmer, the Snowball Stemmer can also handle non-English words. Since it supports other languages, the Snowball Stemmer can be called a multilingual stemmer. It is also imported from the nltk package. This stemmer is based on a string-processing language called ‘Snowball’ and is widely used. The English Snowball stemmer is also referred to as the Porter2 stemmer, and it is more aggressive than the original Porter Stemmer. Because of these improvements, the Snowball stemmer also has greater computational speed.
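
Porter's rules are guarded by conditions on the *measure* m of a stem: writing each run of consonants as C and each run of vowels as V, every word has the form [C](VC)^m[V]. For example, "trouble" has m = 1 and "troubles" has m = 2. Below is a minimal sketch of that one ingredient only (not a full stemmer); following Porter's paper, "y" counts as a vowel when it follows a consonant.

```python
def porter_measure(stem):
    """Return m in Porter's [C](VC)^m[V] description of a stem."""
    forms = []
    for i, ch in enumerate(stem.lower()):
        # 'y' is a vowel when preceded by a consonant (Porter, 1980)
        is_vowel = ch in "aeiou" or (ch == "y" and i > 0 and forms[-1] == "C")
        forms.append("V" if is_vowel else "C")
    # Collapse runs such as C C V V into C V, then count the VC pairs
    collapsed = [f for i, f in enumerate(forms) if i == 0 or f != forms[i - 1]]
    return "".join(collapsed).count("VC")

for w in ["tree", "by", "trouble", "oats", "ivy", "troubles", "private", "orrery"]:
    print(w, porter_measure(w))
```

A rule such as "remove -ement when m > 1" uses this value to avoid chopping short words down to meaningless stubs.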


In [44]:
# NLTK library used for stemming.
from nltk.stem import PorterStemmer, SnowballStemmer
# PorterStemmer
port = PorterStemmer()
words = []
# Splitting on single spaces keeps punctuation and newlines attached to the words
for word in Text.split(' '):
    words.append(port.stem(word))
Text_ = ' '.join(words)
Text_

'becaus of problem with her eyesight, rey the african penguin had issu with swimming. that’ unusu for a penguin,\nand present a big challeng for our avicultur team to help rey overcom her hesitancy.\nslowli and steadily, we train her to be comfort feed in the water like the rest of the penguin colony.\nth aviculturist also train rey to accept daili eye drop from them as part of her special health care.\nrey alreadi had good relationship with some staff, and wa comfort with them handl her.\nsenior aviculturist kim fukuda say the team built on those bond to get rey use to receiv the eye drops.\n"sh know the routine," kim says. "I usual give her the eye drop in one area of the exhibit after all the penguin get\ntheir vitamins. when that happens, she run over there and wait for me." rosa, our oldest sea otter, ha veri limit eyesight,\namong other health issues. the sea otter team had alreadi train rosa so they could examin her eyes,\nand built on that trust to includ administ the eye drop 

In [45]:
#Using SnowballStemmer
snow=SnowballStemmer('english')
words=[]
for word in Text.split(' '):
    words.append(snow.stem(word))
Text_=' '.join(words)
Text_

'becaus of problem with her eyesight, rey the african penguin had issu with swimming. that unusu for a penguin,\nand present a big challeng for our avicultur team to help rey overcom her hesitancy.\nslowli and steadily, we train her to be comfort feed in the water like the rest of the penguin colony.\nth aviculturist also train rey to accept daili eye drop from them as part of her special health care.\nrey alreadi had good relationship with some staff, and was comfort with them handl her.\nsenior aviculturist kim fukuda say the team built on those bond to get rey use to receiv the eye drops.\n"sh know the routine," kim says. "i usual give her the eye drop in one area of the exhibit after all the penguin get\ntheir vitamins. when that happens, she run over there and wait for me." rosa, our oldest sea otter, has veri limit eyesight,\namong other health issues. the sea otter team had alreadi train rosa so they could examin her eyes,\nand built on that trust to includ administ the eye drop

In [46]:
# Stem every word of the cleaned IMDB reviews with the Snowball stemmer.
imdb['clean']=imdb['clean'].apply(lambda x: " ".join([snow.stem(word) for word in x.split()]))

In [47]:
imdb['clean']

0        slow move aimless movi distress drift young man
1          sure lost flat charact audienc near half walk
2      attempt arti black white clever camera angl mo...
3                                littl music anyth speak
4      best scene movi gerardo tri find song keep run...
                             ...                        
743                got bore watch jessic lang take cloth
744    unfortun virtu film product work lost regrett ...
745                                       word embarrass
746                                           except bad
747                  insult one intellig huge wast money
Name: clean, Length: 748, dtype: object

### What is Lemmatization?
Lemmatization, on the other hand, is a systematic step-by-step process for removing the inflectional forms of a word. It makes use of vocabulary, word structure, part-of-speech tags, and grammatical relations.

The output of lemmatization is the root word called a lemma. For example,

Am, Are, Is >> Be

Running, Ran, Run >> Run

Also, because lemmatization is a systematic process, one can specify the part-of-speech tag for the desired term, and lemmatization will only be performed if the word matches that part-of-speech tag. For example, if we lemmatize the word “running” as a verb, it is converted to “run”; but if we lemmatize the same word as a noun, it is not converted.

![title](https://cdn.analyticsvidhya.com/wp-content/uploads/2021/02/Screenshot-from-2021-02-23-15-07-22.png)
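
The effect of the part-of-speech argument can be shown with a tiny dictionary-based lemmatizer. The lexicon below is hand-written and hypothetical; a real lemmatizer such as NLTK's WordNetLemmatizer looks words up in WordNet and accepts a `pos` argument (e.g. `lemmatize('running', pos='v')`). Unknown (word, POS) pairs are returned unchanged, mirroring how “running” as a noun is left alone.

```python
# Hypothetical mini-lexicon: (surface form, POS) -> lemma
LEXICON = {
    ("am", "v"): "be", ("are", "v"): "be", ("is", "v"): "be",
    ("running", "v"): "run", ("ran", "v"): "run", ("run", "v"): "run",
}

def toy_lemmatize(word, pos="n"):
    """Return the lemma for (word, pos) if the lexicon knows it, else the word."""
    return LEXICON.get((word.lower(), pos), word)

print(toy_lemmatize("running", pos="v"))  # run
print(toy_lemmatize("running", pos="n"))  # running
print(toy_lemmatize("Is", pos="v"))       # be
```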

In [48]:
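from nltk.stem import WordNetLemmatizer
# Requires the WordNet data: nltk.download('wordnet')
lemma = WordNetLemmatizer()
words = []
# lemmatize() treats every word as a noun unless a pos argument is given,
# which is why 'was' -> 'wa', 'has' -> 'ha' and 'as' -> 'a' in the output below
for word in Text.split(' '):
    words.append(lemma.lemmatize(word))
Text_ = ' '.join(words)
Text_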

'Because of problem with her eyesight, rey the African penguin had issue with swimming. That’s unusual for a penguin,\nand presented a big challenge for our aviculture team to help Rey overcome her hesitancy.\nSlowly and steadily, we trained her to be comfortable feeding in the water like the rest of the penguin colony.\nThe aviculturists also trained Rey to accept daily eye drop from them a part of her special health care.\nRey already had good relationship with some staff, and wa comfortable with them handling her.\nSenior Aviculturist Kim Fukuda say the team built on those bond to get Rey used to receiving the eye drops.\n"She know the routine," Kim says. "I usually give her the eye drop in one area of the exhibit after all the penguin get\ntheir vitamins. When that happens, she run over there and wait for me." Rosa, our oldest sea otter, ha very limited eyesight,\namong other health issues. The sea otter team had already trained Rosa so they could examine her eyes,\nand built on th

# Parts of Speech Tagging

Source: https://towardsdatascience.com/a-practitioners-guide-to-natural-language-processing-part-i-processing-understanding-text-9f4abfd13e72

![title](https://cdn.analyticsvidhya.com/wp-content/uploads/2021/02/Screenshot-from-2021-02-23-15-44-27.png)

For any language, syntax and structure usually go hand in hand: a set of specific rules, conventions, and principles governs the way words are combined into phrases, phrases into clauses, and clauses into sentences.

Knowledge about the structure and syntax of language is helpful in many areas like text processing, annotation, and parsing for further operations such as text classification or summarization.

__Parts of speech (POS)__ are specific lexical categories to which words are assigned, based on their syntactic context and role. Usually, words can fall into one of the following major categories.

+ __N(oun)__: This usually denotes words that depict some object or entity, which may be living or nonliving. Some examples would be fox, dog, book, and so on. The POS tag symbol for nouns is N.

+ __V(erb)__: Verbs are words that are used to describe certain actions, states, or occurrences. There is a wide variety of further subcategories, such as auxiliary, reflexive, and transitive verbs (and many more). Some typical examples of verbs would be running, jumping, read, and write. The POS tag symbol for verbs is V.

+ __Adj(ective)__: Adjectives are words used to describe or qualify other words, typically nouns and noun phrases. The phrase beautiful flower has the noun (N) flower, which is described or qualified using the adjective (ADJ) beautiful. The POS tag symbol for adjectives is ADJ.

+ __Adv(erb)__: Adverbs usually act as modifiers for other words, including verbs, adjectives, and other adverbs. The phrase very beautiful flower has the adverb (ADV) very, which modifies the adjective (ADJ) beautiful, indicating the degree to which the flower is beautiful. The POS tag symbol for adverbs is ADV.

Besides these four major categories of parts of speech, there are other categories that occur frequently in the English language. These include pronouns, prepositions, interjections, conjunctions, determiners, and many others. Furthermore, each POS tag like the noun (N) can be further subdivided into categories like __singular nouns (NN)__, __singular proper nouns (NNP)__, and __plural nouns (NNS)__.

The process of classifying words and labeling them with POS tags is called parts of speech tagging or POS tagging.

## Guide to POS Tags

The most common part of speech (POS) tag schemes are those developed for the Penn Treebank.

| POS Tag | Description | Example |
|---------|---------------------------------------|-----------------------------------------|
| CC | coordinating conjunction | and |
| CD | cardinal number | 1, third |
| DT | determiner | the |
| EX | existential there | there is |
| FW | foreign word | d’hoevre |
| IN | preposition/subordinating conjunction | in, of, like |
| JJ | adjective | big |
| JJR | adjective, comparative | bigger |
| JJS | adjective, superlative | biggest |
| LS | list marker | 1) |
| MD | modal | could, will |
| NN | noun, singular or mass | door |
| NNS | noun, plural | doors |
| NNP | proper noun, singular | John |
| NNPS | proper noun, plural | Vikings |
| PDT | predeterminer | both the boys |
| POS | possessive ending | friend’s |
| PRP | personal pronoun | I, he, it |
| PRP\$ | possessive pronoun | my, his |
| RB | adverb | however, usually, naturally, here, good |
| RBR | adverb, comparative | better |
| RBS | adverb, superlative | best |
| RP | particle | give up |
| TO | to | to go, to him |
| UH | interjection | uhhuhhuhh |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, non-3rd person singular present | take |
| VBZ | verb, 3rd person sing. present | takes |
| WDT | wh-determiner | which |
| WP | wh-pronoun | who, what |
| WP\$ | possessive wh-pronoun | whose |
| WRB | wh-adverb | where, when |

Source: https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
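
Because the fine-grained Penn Treebank tags share prefixes (NN*, VB*, JJ*, RB*), they can be collapsed back into the four major categories described earlier. A minimal sketch; the prefix rules are our own simplification and map everything else (determiners, pronouns, wh-words, and so on) to OTHER:

```python
def coarse_category(penn_tag):
    """Collapse a Penn Treebank tag into N, V, ADJ, ADV or OTHER."""
    prefixes = (("NN", "N"), ("VB", "V"), ("JJ", "ADJ"), ("RB", "ADV"))
    for prefix, category in prefixes:
        if penn_tag.startswith(prefix):
            return category
    return "OTHER"

tagged = [("Trump", "NNP"), ("became", "VBD"), ("political", "JJ"),
          ("friends", "NNS"), ("the", "DT"), ("usually", "RB")]
print([(word, coarse_category(tag)) for word, tag in tagged])
```

Note that WRB (wh-adverb) does not start with "RB", so this simple prefix test files it under OTHER.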

### POS Using NLTK

In [49]:
sentence = 'Mr. Trump became president after winning the political election. Though he lost the support of some republican friends, Trump is friends with President Putin'
sentence

'Mr. Trump became president after winning the political election. Though he lost the support of some republican friends, Trump is friends with President Putin'

In [50]:
#importing nltk library.
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')

[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\Admin\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     C:\Users\Admin\AppData\Roaming\nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


True

In [51]:
# Tokenize the sentence with NLTK's word_tokenize, then find the POS tag of each word.
nltk_pos_tagged = nltk.pos_tag(word_tokenize(sentence))
nltk_pos_tagged

[('Mr.', 'NNP'),
 ('Trump', 'NNP'),
 ('became', 'VBD'),
 ('president', 'NN'),
 ('after', 'IN'),
 ('winning', 'VBG'),
 ('the', 'DT'),
 ('political', 'JJ'),
 ('election', 'NN'),
 ('.', '.'),
 ('Though', 'IN'),
 ('he', 'PRP'),
 ('lost', 'VBD'),
 ('the', 'DT'),
 ('support', 'NN'),
 ('of', 'IN'),
 ('some', 'DT'),
 ('republican', 'JJ'),
 ('friends', 'NNS'),
 (',', ','),
 ('Trump', 'NNP'),
 ('is', 'VBZ'),
 ('friends', 'NNS'),
 ('with', 'IN'),
 ('President', 'NNP'),
 ('Putin', 'NNP')]

In [52]:
import pandas as pd
#creating dataframe for word and its tag.
POS_df=pd.DataFrame(nltk_pos_tagged, 
             columns=['Word', 'POS tag'])
POS_df

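|   | Word | POS tag |
|---|------|---------|
| 0 | Mr. | NNP |
| 1 | Trump | NNP |
| 2 | became | VBD |
| 3 | president | NN |
| 4 | after | IN |
| 5 | winning | VBG |
| 6 | the | DT |
| 7 | political | JJ |
| 8 | election | NN |
| 9 | . | . |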


### POS Using Spacy

In [53]:
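import spacy

# Load the small English pipeline (download it first: python -m spacy download en_core_web_sm)
nlp = spacy.load('en_core_web_sm')

sentence_nlp = nlp(sentence)
# tag_ is the fine-grained (Penn Treebank) tag, pos_ the coarse-grained universal POS tag
spacy_pos_tagged = [(word, word.tag_, word.pos_) for word in sentence_nlp]
spacy_pos_tagged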

[(Mr., 'NNP', 'PROPN'),
 (Trump, 'NNP', 'PROPN'),
 (became, 'VBD', 'VERB'),
 (president, 'NN', 'NOUN'),
 (after, 'IN', 'ADP'),
 (winning, 'VBG', 'VERB'),
 (the, 'DT', 'DET'),
 (political, 'JJ', 'ADJ'),
 (election, 'NN', 'NOUN'),
 (., '.', 'PUNCT'),
 (Though, 'IN', 'SCONJ'),
 (he, 'PRP', 'PRON'),
 (lost, 'VBD', 'VERB'),
 (the, 'DT', 'DET'),
 (support, 'NN', 'NOUN'),
 (of, 'IN', 'ADP'),
 (some, 'DT', 'DET'),
 (republican, 'JJ', 'ADJ'),
 (friends, 'NNS', 'NOUN'),
 (,, ',', 'PUNCT'),
 (Trump, 'NNP', 'PROPN'),
 (is, 'VBZ', 'AUX'),
 (friends, 'NNS', 'NOUN'),
 (with, 'IN', 'ADP'),
 (President, 'NNP', 'PROPN'),
 (Putin, 'NNP', 'PROPN')]

In [54]:
spacy_POS_DF=pd.DataFrame(spacy_pos_tagged, columns=['Word', 'POS Tag', 'Tag Type'])
spacy_POS_DF.head(10)

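|   | Word | POS Tag | Tag Type |
|---|------|---------|----------|
| 0 | Mr. | NNP | PROPN |
| 1 | Trump | NNP | PROPN |
| 2 | became | VBD | VERB |
| 3 | president | NN | NOUN |
| 4 | after | IN | ADP |
| 5 | winning | VBG | VERB |
| 6 | the | DT | DET |
| 7 | political | JJ | ADJ |
| 8 | election | NN | NOUN |
| 9 | . | . | PUNCT |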


# Named Entity Recognition

In any text document, there are particular terms that represent specific entities that are more informative and have a unique context. These entities are known as named entities, which more specifically refer to terms that represent real-world objects like people, places, organizations, and so on, which are often denoted by proper names.

__Named entity recognition (NER)__, also known as entity chunking/extraction, is a popular technique used in information extraction to identify and segment the named entities and classify or categorize them under various predefined classes.

There are out-of-the-box NER taggers available through popular libraries like __`nltk`__ and __`spacy`__. Each library follows a different approach to solve the problem.

### NER Using Spacy

In [55]:
import spacy
nlp = spacy.load('en_core_web_sm')
text_nlp = nlp(sentence)

In [56]:
# print named entities in article
ner_tagged = [(word.text, word.ent_type_) for word in text_nlp]
print(ner_tagged)

[('Mr.', ''), ('Trump', 'PERSON'), ('became', ''), ('president', ''), ('after', ''), ('winning', ''), ('the', ''), ('political', ''), ('election', ''), ('.', ''), ('Though', ''), ('he', ''), ('lost', ''), ('the', ''), ('support', ''), ('of', ''), ('some', ''), ('republican', 'NORP'), ('friends', ''), (',', ''), ('Trump', 'ORG'), ('is', ''), ('friends', ''), ('with', ''), ('President', ''), ('Putin', 'PERSON')]


In [57]:
from spacy import displacy  # module to visualize named entities

# visualize named entities
displacy.render(text_nlp, style='ent', jupyter=True)

In [58]:
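# Group consecutive tokens that share an entity type into (entity, label) pairs.
named_entities = []
temp_entity_name = ''
temp_named_entity = None
for term, tag in ner_tagged:
    # A change of entity type starts a new entity instead of extending the current one
    if tag and temp_named_entity and tag != temp_named_entity[1]:
        named_entities.append(temp_named_entity)
        temp_entity_name = ''
        temp_named_entity = None
    if tag:
        temp_entity_name = ' '.join([temp_entity_name, term]).strip()
        temp_named_entity = (temp_entity_name, tag)
    elif temp_named_entity:
        named_entities.append(temp_named_entity)
        temp_entity_name = ''
        temp_named_entity = None
# Keep an entity that runs to the end of the text (here, 'Putin')
if temp_named_entity:
    named_entities.append(temp_named_entity)

In [59]:
named_entities

[('Trump', 'PERSON'), ('republican', 'NORP'), ('Trump', 'ORG'), ('Putin', 'PERSON')]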
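
The loop above only looks at `ent_type_`, so two different entities of the same type that sit next to each other would still be merged. spaCy also records each token's IOB position in `token.ent_iob_` ("B" = begins an entity, "I" = inside, "O" = outside), and `doc.ents` returns the finished entity spans directly. Below is a stdlib sketch of IOB-based grouping, fed with hand-written (text, iob, label) triples; with spaCy you could build them as `[(t.text, t.ent_iob_, t.ent_type_) for t in text_nlp]`.

```python
def group_iob(tagged):
    """Group (text, iob, label) triples into (entity text, label) pairs."""
    entities = []
    words, label = [], None
    for text, iob, ent_label in tagged:
        if iob == "B" or (iob == "I" and ent_label != label):
            # A new entity starts: flush the previous one first
            if words:
                entities.append((" ".join(words), label))
            words, label = [text], ent_label
        elif iob == "I":
            words.append(text)
        else:
            # "O" (or empty): close any open entity
            if words:
                entities.append((" ".join(words), label))
            words, label = [], None
    if words:  # entity running to the end of the text
        entities.append((" ".join(words), label))
    return entities

# Hand-written example triples, not actual spaCy output
tokens = [("Senior", "O", ""), ("Aviculturist", "O", ""), ("Kim", "B", "PERSON"),
          ("Fukuda", "I", "PERSON"), ("says", "O", ""), ("Rey", "B", "PERSON")]
print(group_iob(tokens))
# [('Kim Fukuda', 'PERSON'), ('Rey', 'PERSON')]
```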

In [60]:
#NER on IMDB reviews.
sent1=' '.join(imdb['review'])
# Run the pipeline over all reviews and record each token's entity type
text_nlp = nlp(sent1)
ner_tagged = [(word.text, word.ent_type_) for word in text_nlp]

In [61]:
# Group consecutive tokens that share an entity type into (entity, label) pairs.
named_entities = []
temp_entity_name = ''
temp_named_entity = None
for term, tag in ner_tagged:
    if tag:
        temp_entity_name = ' '.join([temp_entity_name, term]).strip()
        temp_named_entity = (temp_entity_name, tag)
    else:
        if temp_named_entity:
            named_entities.append(temp_named_entity)
            temp_entity_name = ''
            temp_named_entity = None
# Keep an entity that runs to the end of the text
if temp_named_entity:
    named_entities.append(temp_named_entity)
named_entities

[('nearly half', 'CARDINAL'),
 ('two hours', 'TIME'),
 ('today', 'DATE'),
 ('two', 'CARDINAL'),
 ('dozen', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('0', 'CARDINAL'),
 ('one', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('today', 'DATE'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('one', 'CARDINAL'),
 ('another 2 hours', 'TIME'),
 ('0', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('9', 'CARDINAL'),
 ('10', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('0', 'CARDINAL'),
 ('one', 'CARDINAL'),
 ('2 hours', 'TIME'),
 ('1', 'CARDINAL'),
 ('one', 'CARDINAL'),
 ('first', 'ORDINAL'),
 ('10 to in years', 'DATE'),
 ('10', 'CARDINAL'),
 ('tonight', 'TIME'),
 ('an hour and a half', 'TIME'),
 ('the end of the night', 'DATE'),
 ('10/10', 'CARDINAL'),
 ('1', 'CARDINAL'),
 ('last night', 'TIME'),
 ('an hour and half', 'CARDINAL'),
 ('thousand', 'CARDINAL'),
 ('a few minutes', 'TIME'),
 ('five dollars', 'MONEY'),
 ('7.50',

In [62]:
from spacy import displacy  # module to visualize named entities

# visualize named entities
displacy.render(text_nlp, style='ent', jupyter=True)

spaCy offers a fast NER tagger based on a number of techniques. The exact algorithm has not been described in much detail, but the documentation notes: <font color=blue> "The exact algorithm is a pastiche of well-known methods, and is not currently described in any single publication." </font>

The entities identified by the spaCy NER tagger are shown in the following table (details here: [spacy_documentation](https://spacy.io/api/annotation#named-entities)).

![](https://github.com/dipanjanS/nlp_workshop_odsc19/blob/master/Module03%20-%20Text%20Understanding/Resources/spacy_ner.png?raw=1)

### How is NER used?
NER is suited to any situation in which a high-level overview of a large quantity of text is helpful. With NER, you can, at a glance, understand the subject or theme of a body of text and quickly group texts based on their relevancy or similarity.
Some notable NER use cases include:

**Human resources**
Speed up the hiring process by summarizing applicants’ CVs; improve internal workflows by categorizing employee complaints and questions

**Customer support**
Improve response times by categorizing user requests, complaints and questions and filtering by priority keywords

**Search and recommendation engines**
Improve the speed and relevance of search results and recommendations by summarizing descriptive text, reviews, and discussions
Booking.com is a notable success story here

**Content classification**
Surface content more easily and gain insights into trends by identifying the subjects and themes of blog posts and news articles

**Health care**
Improve patient care standards and reduce workloads by extracting essential information from lab reports
Roche is doing this with pathology and radiology reports

**Academia**
Enable students and researchers to find relevant material faster by summarizing papers and archive material and highlighting key terms, topics, and themes
The EU’s digital platform for cultural heritage, Europeana, is using NER to make historical newspapers searchable

Wherever there are large quantities of text, NER can make life easier.
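
As a sketch of the "group texts by subject" idea above: once each document has been run through an NER tagger, an inverted index from entity to document ids lets you find all texts that mention the same person, place or organization. The document ids and entity lists below are hypothetical placeholders for real NER output.

```python
from collections import defaultdict

def index_by_entity(doc_entities):
    """Map each (entity text, label) pair to the ids of documents mentioning it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, label in entities:
            # Lower-case the entity text so "Putin" and "putin" share one key
            index[(text.lower(), label)].add(doc_id)
    return index

# Hypothetical per-document NER output
docs = {
    "news_1": [("Trump", "PERSON"), ("Putin", "PERSON")],
    "news_2": [("Putin", "PERSON"), ("Moscow", "GPE")],
    "review_7": [("two hours", "TIME")],
}
index = index_by_entity(docs)
print(sorted(index[("putin", "PERSON")]))  # ['news_1', 'news_2']
```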

<h1 align='center'>NLTK vs SpaCy</h1>

##### NLTK

This suite of libraries and applications from the University of Pennsylvania has gained significant traction in Python-based sentiment analysis systems since its conception in 2001. However, its accumulated clutter and educational remit can prove an impediment to enterprise-level development.

The NLTK platform provides accessible interfaces to more than fifty corpora and lexical sources mapped to machine learning algorithms, as well as a robust choice of parsers and utilities.

Besides sentiment analysis, NLTK's algorithms cover named entity recognition, tokenization, part-of-speech (POS) tagging, and topic segmentation. NLTK also boasts a good selection of third-party extensions, as well as the most wide-ranging language support of the libraries compared here.

On the other hand, this versatility can also be overwhelming. The sheer variety of some of its tool categories (for instance, it has nine stemming libraries, whereas SpaCy offers lemmatization but no stemmer) can make the framework look like an unfocused grab-bag of NLP archive material from the last fifteen years. This could add a layer of complexity to project ideation and logistical planning.

The positive side of this is that no competitor to NLTK can boast such a comprehensive and useful base of documentation, as well as secondary literature and online resources. Free ongoing support is provided by a lively Google Group.

Although NLTK offers Unicode support for multiple languages, setting up non-English workflows is sometimes a more involved process than with other comparable Python libraries. NLTK's out-of-the-box non-English support relies on tertiary mechanisms such as translation layers, language-specific datasets, and models that leverage lexicons or morphemes.

NLTK does not provide neural network models or integrated word vectors, and its string-based processing workflow is arguably behind the times and out of synch with Python's OOP model. NLTK's sentence tokenization is also rudimentary compared to newer competitors.

If we're training up or onboarding staff that has existing NLTK experience, this very popular set of Python NLP libraries might be the obvious choice; but it comes with a burden of redundancy and complexity that could prove hard to navigate for a new team.


##### SpaCy

With the claim of 'industrial-strength natural language processing', the SpaCy Python library is appealing for sentiment analysis projects that need to remain performant at scale, or which can benefit from a highly object-oriented programming approach.

SpaCy is a multi-platform environment that runs on Cython, a superset of Python that enables the development of fast-executing C-based frameworks for Python. Consequently, according to research by Jinho D. Choi et al., SpaCy was the fastest-running solution at the time of that comparison.

Unlike NLTK, SpaCy is focused on industrial usage and maintains a minimal effective toolset, with updates superseding previous versions and tools. SpaCy's prebuilt models address essential NLP tasks such as named entity recognition, part-of-speech (POS) tagging and classification.

In contrast to its older rival, SpaCy tokenizes parsed text at both the sentence and word levels on an OOP model. It also offers integrated word vectors, its own named entity recognizer, and syntactic parsing (including noun chunking). Enabling sentiment analysis with SpaCy would involve devising your own framework, though; SpaCy ships no built-in sentiment analysis.

However, capable as SpaCy’s models are, we're stuck with their structure. It’s therefore essential to ensure in advance that your long-term goals won’t go out-of-bounds at a later date and become incompatible with this sparse design philosophy.

While SpaCy has an overall speed advantage over its stablemates, its sentence tokenization can run slower than NLTK under certain configurations, which might be a consideration with large-scale pipelines.

Although it demands Unicode input, SpaCy's multi-language support was still a work in progress when this comparison was written, with pretrained models available for German, Greek, English, Spanish, French, Italian, Dutch and Portuguese.

With its deliberately lean feature set, SpaCy (as the project website admits) is not an environment suitable for testing different neural network architectures, and is not a good starting point to explore bleeding-edge developments in NLP. SpaCy remains more committed to a consistent platform experience that is focused on the core objectives of its users.

SpaCy is resource-intensive: it requires a 64-bit Python stack and more memory per instance (on the order of 2 or 3 gigabytes) than some of its rivals.

If your project fits within the deliberate limitations of the SpaCy framework, this may be the most 'production-ready', scalable and high-performing environment currently available for sentiment analysis development. If you're willing to integrate external sentiment analysis modules into its core services, SpaCy could offer unrivaled speed benefits.


### What is an API?
An API is a set of programming code that enables data transmission between one software product and another. It also contains the terms of this data exchange.

![title](https://content.altexsoft.com/media/2019/06/https-lh6-googleusercontent-com-_nyclktg8po_wx5-.png)

Application programming interfaces consist of two components:

- A technical specification that describes the data-exchange options between solutions, in the form of requests for processing and data-delivery protocols.
- A software interface written to that specification.

### Types of APIs
**APIs by availability aka release policies**

In terms of release policies, APIs can be private, partner, and public.

![title](https://content.altexsoft.com/media/2019/06/https-lh6-googleusercontent-com-gwdr_ml7gwkq7sex.png)

**Private APIs.** These application software interfaces are designed for improving solutions and services within an organization. In-house developers or contractors may use these APIs to integrate a company’s IT systems or applications, build new systems or customer-facing apps leveraging existing systems. Even if apps are publicly available, the interface itself remains available only for those working directly with the API publisher. The private strategy allows a company to fully control the API usage.

**Partner APIs.** Partner APIs are openly promoted but shared only with business partners who have signed an agreement with the publisher. The common use case for partner APIs is software integration between two parties. A company that grants partners access to data or capabilities benefits from extra revenue streams. At the same time, it can monitor how the exposed digital assets are used, check whether third-party solutions using its APIs provide a decent user experience, and maintain corporate identity in their apps.

**Public APIs.** Also known as developer-facing or external, these APIs are available for any third-party developers. A public API program allows for increasing brand awareness and receiving an additional source of income when properly executed.

### What is the Twitter API?
Twitter API is a set of programmatic endpoints that can be used to learn from and engage with the conversation on Twitter.

This API allows you to find and retrieve, engage with, or create a variety of different resources including the following:
- Tweets
- Users
- Direct Messages
- Lists
- Trends
- Media
- Places

Here we will use the Twitter RESTful API to access data about both Twitter users and what they are tweeting about.

### How do you start using the Twitter API?

To get started, you’ll need to do the following things:

- Set up a Twitter account if you don’t have one already.
- Using your Twitter account, you will need to apply for Developer Access and then create an application that will generate the API credentials that you will use to access Twitter from Python.
- Install and import the tweepy package.

To learn how to get API credentials, you can also follow this link: [Link](https://www.youtube.com/watch?v=vlvtqp44xoQ&t=204s)

### What Is Tweepy?
Tweepy is an open source Python package that gives you a very convenient way to access the Twitter API with Python. Tweepy includes a set of classes and methods that represent Twitter’s models and API endpoints, and it transparently handles various implementation details, such as:

- Data encoding and decoding
- HTTP requests
- Results pagination
- OAuth authentication
- Rate limits
- Streams

To install tweepy use:
**pip install tweepy**

### Twitter API in Python
Once everything is set up, we start by importing the required libraries in Python.

In [63]:
import os
import pandas as pd
import tweepy as tw

To access the Twitter API, you will need four things from your Twitter app page. These keys are located in your Twitter app settings, in the Keys and Access Tokens tab.

- consumer key
- consumer secret key
- access token key
- access token secret key

In [64]:
# First, define your keys (replace the placeholders with your own values):
# consumer_key= 'yourworkhere'
# consumer_secret= 'yourworkhere'
# access_token= 'yourworkhere'
# access_token_secret= 'yourworkhere'

In [65]:
# Read the keys from environment variables rather than hard-coding secrets in the notebook.
# The variable names are our own choice; export them in your shell before starting Jupyter.
consumer_key = os.environ['TWITTER_CONSUMER_KEY']
consumer_secret = os.environ['TWITTER_CONSUMER_SECRET']
access_token = os.environ['TWITTER_ACCESS_TOKEN']
access_token_secret = os.environ['TWITTER_ACCESS_TOKEN_SECRET']

In [66]:
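# Authenticate with the app and user keys, then create the API object.
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)  # sleep on rate limits instead of failing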
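Passing `wait_on_rate_limit=True` asks Tweepy to pause when a rate limit is reached instead of raising an error. The arithmetic behind that pause can be sketched as follows. This assumes the v1.1 response headers `x-rate-limit-remaining` and `x-rate-limit-reset` (a Unix timestamp). `seconds_until_reset` is a hypothetical helper, not Tweepy's internal code.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None,
                        buffer: float = 1.0) -> float:
    """How long to sleep before the next request, based on rate-limit headers.

    Returns 0 while requests remain in the current window; otherwise the number
    of seconds until the window resets, plus a small safety buffer.
    """
    now = time.time() if now is None else now
    # Assumption: if the header is absent, treat the window as not exhausted.
    remaining = int(headers.get("x-rate-limit-remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = float(headers["x-rate-limit-reset"])  # Unix epoch seconds
    return max(reset_at - now, 0.0) + buffer
```

The buffer guards against small clock differences between your machine and Twitter's servers. Waiting for the reset time, rather than retrying in a tight loop, avoids burning requests that would be rejected anyway.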

#### Search Twitter for Tweets
Now you are ready to search Twitter for recent tweets! Start by finding recent tweets that use the #FederalBank hashtag. You will use the `tw.Cursor` class to get an object containing tweets with the hashtag #FederalBank.

To create this query, you will define the:

- Search term: in this case **#FederalBank**
- The start date of your search

Remember that the standard Twitter search API only returns tweets from the past seven days, so you cannot dig far into the history.
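The standard search endpoint covers roughly the last seven days, so a `date_since` older than that does not return older tweets. The small sketch below clamps a requested start date to that window so the effective range is explicit. The seven-day constant is an assumption about the standard v1.1 endpoint, and `effective_since` is a hypothetical helper.

```python
from datetime import date, timedelta

SEARCH_WINDOW_DAYS = 7  # assumed look-back window of the standard v1.1 search endpoint


def effective_since(date_since: str, today: date) -> str:
    """Clamp a 'YYYY-MM-DD' start date to the searchable window ending today."""
    requested = date.fromisoformat(date_since)
    earliest = today - timedelta(days=SEARCH_WINDOW_DAYS)
    return max(requested, earliest).isoformat()


# With a hypothetical run date of 5 July 2021, June 1 is outside the window.
print(effective_since("2021-06-01", date(2021, 7, 5)))  # 2021-06-28
```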

In [67]:
# Define the search term and the date_since date as variables
search_words = "#FederalBank"
date_since = "2021-06-01"

We will use `tw.Cursor()` to search Twitter for tweets containing the search term #FederalBank. You can restrict the number of tweets returned by passing a number to the `.items()` method: `.items(5)` returns the 5 most recent tweets.
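`Cursor` hides pagination: it keeps requesting pages of results and yields individual tweets until `.items(n)` has produced `n` of them. The following is a simplified, offline model of that behaviour, not Tweepy's actual implementation. `fetch_page` is a hypothetical stand-in for one API call.

```python
from itertools import islice
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_items(fetch_page: Callable[[int], List[T]], limit: int) -> Iterator[T]:
    """Yield up to `limit` items, requesting pages lazily until one comes back empty."""

    def all_items() -> Iterator[T]:
        page = 1
        while True:
            batch = fetch_page(page)
            if not batch:  # an empty page means there is nothing left
                return
            yield from batch
            page += 1

    return islice(all_items(), limit)


# Fake two-page result set; `calls` records which pages were requested.
pages = {1: ["t1", "t2", "t3"], 2: ["t4", "t5", "t6"]}
calls = []


def fake_fetch(page: int) -> List[str]:
    calls.append(page)
    return pages.get(page, [])


print(list(iter_items(fake_fetch, 5)))  # ['t1', 't2', 't3', 't4', 't5']
print(calls)  # [1, 2]
```

Because the pages are fetched lazily, asking for 5 items only costs two page requests here. This is why a small `.items()` limit also keeps you further from the rate limit.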

In [68]:
# Collect tweets (api.search is the Tweepy 3.x name; Tweepy 4.x renamed it to api.search_tweets)
tweets = tw.Cursor(api.search,
              q=search_words,
              lang="en",
              since=date_since).items(5)
tweets

<tweepy.cursor.ItemIterator at 0x1f7218a0d00>

`tw.Cursor()` returns an object that you can iterate or loop over to access the collected data. Each item in the iterator has various attributes that you can use to get information about each tweet, including:

1. The text of the tweet
2. Who sent the tweet
3. The date the tweet was sent
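These three pieces of information are exposed as `tweet.text`, `tweet.user.screen_name` and `tweet.created_at`. Running against the live API needs credentials, so the sketch below uses `types.SimpleNamespace` objects as hypothetical stand-ins for Tweepy `Status` objects. It shows how the three fields can be pulled into plain records. The timestamp in the stand-in is made up for illustration.

```python
from datetime import datetime
from types import SimpleNamespace


def summarize(tweet) -> dict:
    """Pull the author handle, timestamp and text out of one tweet object."""
    return {
        "user": tweet.user.screen_name,
        "created_at": tweet.created_at.isoformat(),
        "text": tweet.text,
    }


# Hypothetical stand-in for a tweepy Status object (the real one has many more attributes).
fake_tweet = SimpleNamespace(
    text="#FederalBank can give a good move above 88, SL below 85.",
    user=SimpleNamespace(screen_name="Traderknight007"),
    created_at=datetime(2021, 7, 5, 4, 3),  # hypothetical timestamp
)
print(summarize(fake_tweet)["created_at"])  # 2021-07-05T04:03:00
```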

In [69]:
tweets = tw.Cursor(api.search,
              q=search_words,
              lang="en",
              since=date_since).items(5)
# Iterate and print tweets
for tweet in tweets:
    print(tweet.text)

RT @TreasureHunt_TH: #FederalBank
The bank has posted good result in a tougher condition. The operational &amp; financial matrix has improved a…
Bank Nifty tops near 390 pts amid Q1 advances and deposits data; Federal Bank, SBI, ICICI Bank, HDFC Bank, IndusInd… https://t.co/15oZzNBNjE
#Federalbank One only has to take a look at the bank's AR for FY21 to gauge its industry leading transformation awa… https://t.co/PrAPklqIJM
#FederalBank can give a good move above 88, SL below 85. https://t.co/6rxADplVln
#federalbnk as discussed over weekend #hporb #federalbank https://t.co/5FjvPc4RIE


#### To Keep or Remove Retweets

A retweet is when someone shares someone else’s tweet, similar to sharing a post on Facebook. Sometimes you may want to remove retweets, as they contain duplicate content that might skew your analysis if you are only looking at word frequency. Other times, you may want to keep them.

Below, you ignore all retweets by adding `-filter:retweets` to your query. The Twitter API documentation has information on other ways to customize your queries.
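The server-side operator is the simplest option. If you have already collected tweets and want to drop retweets afterwards, one approach relies on the fact that a retweet returned by the v1.1 API carries a `retweeted_status` attribute pointing at the original tweet. The sketch below uses stand-in objects. `build_query` and `drop_retweets` are hypothetical helpers.

```python
from types import SimpleNamespace
from typing import Iterable, List


def build_query(search_words: str, include_retweets: bool = False) -> str:
    """Append Twitter's -filter:retweets operator unless retweets are wanted."""
    return search_words if include_retweets else f"{search_words} -filter:retweets"


def drop_retweets(tweets: Iterable) -> List:
    """Keep only original tweets: retweets carry a `retweeted_status` attribute."""
    return [t for t in tweets if not hasattr(t, "retweeted_status")]


# Stand-in objects, not real tweepy Status instances.
original = SimpleNamespace(text="#FederalBank can give a good move above 88, SL below 85.")
retweet = SimpleNamespace(text="RT @TreasureHunt_TH: #FederalBank", retweeted_status=original)

print(build_query("#FederalBank"))  # #FederalBank -filter:retweets
print(len(drop_retweets([original, retweet])))  # 1
```

Filtering on the server saves quota, because retweets are never downloaded. Filtering on the client lets you keep one collection and decide per analysis whether retweets count.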

In [70]:
new_search = search_words + " -filter:retweets"
new_search

'#FederalBank -filter:retweets'

In [71]:
tweets = tw.Cursor(api.search,
                       q=new_search,
                       lang="en",
                       since=date_since).items(5)

[tweet.text for tweet in tweets]

['Bank Nifty tops near 390 pts amid Q1 advances and deposits data; Federal Bank, SBI, ICICI Bank, HDFC Bank, IndusInd… https://t.co/15oZzNBNjE',
 "#Federalbank One only has to take a look at the bank's AR for FY21 to gauge its industry leading transformation awa… https://t.co/PrAPklqIJM",
 '#FederalBank can give a good move above 88, SL below 85. https://t.co/6rxADplVln',
 '#federalbnk as discussed over weekend #hporb #federalbank https://t.co/5FjvPc4RIE',
 'Federal Bank reports Rs132,770cr of gross advances in Q1, deposits logs single-digit growth; Stock climbs over 1%… https://t.co/luR91FE3iS']

### Who Is Tweeting About #FederalBank?
We can access a wealth of information associated with each tweet. Below is an example of accessing the users who are sending tweets related to #FederalBank and their locations. Note that user locations are entered manually by each user, so you will see a lot of variation in the format of this value.

`tweet.user.screen_name` provides the Twitter handle of the user who sent each tweet.

`tweet.user.location` provides the location the user entered in their profile.
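Because the location field is free text, some light cleaning usually helps before counting. The minimal sketch below assumes that an empty string means "no location" and that letter case and extra whitespace carry no meaning. `normalize_location` is a hypothetical helper. It is applied here to the five `[screen_name, location]` pairs returned by this query.

```python
from collections import Counter
from typing import Optional


def normalize_location(raw: str) -> Optional[str]:
    """Collapse whitespace and case-fold a free-text location; return None when blank."""
    cleaned = " ".join(raw.split()).casefold()
    return cleaned or None


users_locs = [
    ["IIFLMarkets", "Mumbai, India"],
    ["purushgem", ""],
    ["Traderknight007", ""],
    ["usb_3Dot0", ""],
    ["IIFLMarkets", "Mumbai, India"],
]
counts = Counter(normalize_location(loc) for _, loc in users_locs)
print(counts.most_common())  # [(None, 3), ('mumbai, india', 2)]
```

This only merges trivial variants. Values such as "INDIA" and "Mumbai, India" still count as different places, and resolving those would need a geocoding step.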

In [72]:
tweets = tw.Cursor(api.search, 
                           q=new_search,
                           lang="en",
                           since=date_since).items(5)

users_locs = [[tweet.user.screen_name, tweet.user.location] for tweet in tweets]
users_locs

[['IIFLMarkets', 'Mumbai, India'],
 ['purushgem', ''],
 ['Traderknight007', ''],
 ['usb_3Dot0', ''],
 ['IIFLMarkets', 'Mumbai, India']]

### Create a Pandas DataFrame From a List of Tweet Data
Once we have a list of items that we wish to work with, we can create a pandas DataFrame that contains that data.

In [73]:
tweets = tw.Cursor(api.search, 
                           q=new_search,
                           lang="en",
                           since=date_since).items(1000)

# One row per tweet: [handle, profile location, tweet text]
users_locs = [[tweet.user.screen_name, tweet.user.location, tweet.text] for tweet in tweets]
users_locs

[['IIFLMarkets',
  'Mumbai, India',
  'Bank Nifty tops near 390 pts amid Q1 advances and deposits data; Federal Bank, SBI, ICICI Bank, HDFC Bank, IndusInd… https://t.co/15oZzNBNjE'],
 ['purushgem',
  '',
  "#Federalbank One only has to take a look at the bank's AR for FY21 to gauge its industry leading transformation awa… https://t.co/PrAPklqIJM"],
 ['Traderknight007',
  '',
  '#FederalBank can give a good move above 88, SL below 85. https://t.co/6rxADplVln'],
 ['usb_3Dot0',
  '',
  '#federalbnk as discussed over weekend #hporb #federalbank https://t.co/5FjvPc4RIE'],
 ['IIFLMarkets',
  'Mumbai, India',
  'Federal Bank reports Rs132,770cr of gross advances in Q1, deposits logs single-digit growth; Stock climbs over 1%… https://t.co/luR91FE3iS'],
 ['PJs_PJs',
  '',
  'Ex Div:#ASMTechnologies #JSWSteel #Mindtree\nF&amp;O ban:#NALCO #PNB\n-ve res:#NagarjunaFertilisers\n-ve Q1 upd:… https://t.co/JRp53iD3f9'],
 ['maduraitrading',
  'INDIA',
  '#FederalBank | The bank’s total deposits in Q1FY

In [75]:
tweet_text = pd.DataFrame(data=users_locs,
                          columns=['user', 'location', 'Tweet'])
tweet_text

|   | user | location | Tweet |
|---|---|---|---|
| 0 | IIFLMarkets | Mumbai, India | Bank Nifty tops near 390 pts amid Q1 advances ... |
| 1 | purushgem |  | #Federalbank One only has to take a look at th... |
| 2 | Traderknight007 |  | #FederalBank can give a good move above 88, SL... |
| 3 | usb_3Dot0 |  | #federalbnk as discussed over weekend #hporb #... |
| 4 | IIFLMarkets | Mumbai, India | Federal Bank reports Rs132,770cr of gross adva... |
| 5 | PJs_PJs |  | Ex Div:#ASMTechnologies #JSWSteel #Mindtree\nF... |
| 6 | maduraitrading | INDIA | #FederalBank \| The bank’s total deposits in Q1... |
| 7 | AEHarshada | Mumbai, India | #FederalBank #Q1FY22Update \n\nDeposits at Rs1... |
| 8 | PratikSingh_ | Mumbai, India | @FederalBankHelp Konsa prepaid card be? Admin ... |
| 9 | PratikSingh_ | Mumbai, India | @mona_sensible @FederalBankLtd What is this @F... |


In [77]:
# Save the DataFrame to a CSV file in the current working directory.
tweet_text.to_csv('Tweet.csv',index=False)

### GoogleNews
Google News is a news aggregator service developed by Google. It presents a continuous flow of links to articles organized from thousands of publishers and magazines.

GoogleNews 1.5.8 is a Python module for extracting Google News results. To install it, use:

**pip install GoogleNews**



In [78]:
# Initialize the GoogleNews object
from GoogleNews import GoogleNews
googlenews = GoogleNews()

In [79]:
# Choose the language
googlenews = GoogleNews(lang='en')

In [80]:
# Choose a period (a period and a custom date range should not be set together)
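The GoogleNews package accepts either a relative period or a custom `start`/`end` range in `MM/DD/YYYY` format, and its documentation warns against combining them. A small pre-flight check can catch that mistake as well as an inverted range. The sketch below is a hypothetical helper, not part of the package. The rule that `start` and `end` must be given together is our own assumption.

```python
from datetime import datetime
from typing import Optional

DATE_FORMAT = "%m/%d/%Y"  # the MM/DD/YYYY format used for start/end in this notebook


def check_time_filter(period: Optional[str] = None,
                      start: Optional[str] = None,
                      end: Optional[str] = None) -> None:
    """Raise ValueError if the time filter is ambiguous or the range is inverted."""
    if period and (start or end):
        raise ValueError("Set either period or start/end, not both.")
    # Assumption of this helper: a custom range needs both endpoints.
    if (start is None) != (end is None):
        raise ValueError("start and end must be given together.")
    if start and end:
        first = datetime.strptime(start, DATE_FORMAT)
        last = datetime.strptime(end, DATE_FORMAT)
        if first > last:
            raise ValueError(f"start {start} is after end {end}.")
```

For example, `check_time_filter(start='05/01/2021', end='05/31/2020')` raises a `ValueError` because the end date precedes the start date.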

In [81]:
# Fetch Google News results for the search term
googlenews.get_news('Federal Bank')

In [82]:
googlenews.results()

[{'title': 'Federal Bank to raise Rs 916 cr in equity capital from IFC, affiliates',
  'desc': 'amp',
  'date': '17 Jun',
  'datetime': None,
  'link': 'news.google.com/./articles/CAIiEIaCZBTuarhOTtnB7cXckGEqGQgEKhAIACoHCAowsLXdCjCm3dEBMOThpAM?uo=CAUiiwFodHRwczovL3d3dy5idXNpbmVzcy1zdGFuZGFyZC5jb20vYXJ0aWNsZS9maW5hbmNlL2ZlZGVyYWwtYmFuay10by1yYWlzZS1ycy05MTYtY3ItaW4tZXF1aXR5LWNhcGl0YWwtZnJvbS1pZmMtYWZmaWxpYXRlcy0xMjEwNjE3MDAzMDdfMS5odG1s0gEA&hl=en-IN&gl=IN&ceid=IN%3Aen',
  'img': 'https://lh3.googleusercontent.com/g52ixI0JOveh2EQxFZ6lLpzrBvNf7_mSSLX7Mz5QBZTWGYLKG2uPz1j7j_UO_K159UUH4oJHyIFcUPqDqA=-p-df-h100-w100',
  'media': None,
  'site': 'Business Standard'},
 {'title': 'Bank Nifty tops near 390 pts amid Q1 advances and deposits data; Federal Bank, SBI, ICICI Bank, HDFC Bank, IndusInd drive',
  'desc': 'amp',
  'date': '14 hours ago',
  'datetime': datetime.datetime(2021, 7, 5, 4, 3, 15, 919901),
  'link': 'news.google.com/./articles/CBMizQFodHRwczovL3d3dy5pbmRpYWluZm9saW5lLmNvbS9hcnRp

In [83]:
df=pd.DataFrame(googlenews.results())

In [84]:
df

|   | title | desc | date | datetime | link | img | media | site |
|---|---|---|---|---|---|---|---|---|
| 0 | Federal Bank to raise Rs 916 cr in equity capi... | amp | 17 Jun | NaT | news.google.com/./articles/CAIiEIaCZBTuarhOTtn... | https://lh3.googleusercontent.com/g52ixI0JOveh... |  | Business Standard |
| 1 | Bank Nifty tops near 390 pts amid Q1 advances ... | amp | 14 hours ago | 2021-07-05 04:03:15.919901 | news.google.com/./articles/CBMizQFodHRwczovL3d... | https://lh4.googleusercontent.com/proxy/da_dTw... |  | Indiainfoline |
| 2 | From partnering with fintech startups to helpi... | amp | 5 days ago | 2021-06-30 18:03:15.920899 | news.google.com/./articles/CAIiEIhrKucLj-2teuC... | https://lh3.googleusercontent.com/wI9uPZqNsI3_... |  | YourStory |
| 3 | Covid-19 impact: Federal Bank provides about 4... | amp | 26 Jun | NaT | news.google.com/./articles/CAIiEFvHHep-NR1Rfnl... | https://lh6.googleusercontent.com/proxy/Kdmp7Z... |  | BusinessLine |
| 4 | Buy Federal Bank, target price Rs 110: Motilal... | amp | 7 days ago | 2021-06-28 18:03:15.922898 | news.google.com/./articles/CAIiEDC2Ca4kY23JOvZ... | https://lh3.googleusercontent.com/jYBF7q7WVQL0... |  | Economic Times |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 76 | Gold loans power up Federal Bank’s Q3, but str... | amp | 21 Jan | NaT | news.google.com/./articles/CAIiEP4ETbW1_6UTZoi... | https://lh6.googleusercontent.com/proxy/EYQ12h... |  | Mint |
| 77 | Stock alert! Strong Q4 performance makes Morga... | amp | 18 May | NaT | news.google.com/./articles/CAIiELOta_uUlxBUUPf... | https://lh3.googleusercontent.com/wL8Xht459fjP... |  | Zee Business |
| 78 | How City Union Bank, Federal Bank have sustain... | amp | 14-Oct-2020 | NaT | news.google.com/./articles/CAIiENid_yz4SJfKPAq... | https://lh3.googleusercontent.com/Q6JZfkhoSYOg... |  | Business Standard |
| 79 | Awesome Oscillator suggests buying opportunity... | amp | 15-Nov-2020 | NaT | news.google.com/./articles/CBMilQFodHRwczovL3d... | https://lh6.googleusercontent.com/proxy/RLmKeU... |  | Moneycontrol.com |
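The `date` column mixes relative strings such as `'14 hours ago'` with absolute ones such as `'14-Oct-2020'` and `'17 Jun'`. The `datetime` column is `NaT` for the absolute ones. One way to fill part of that gap is sketched below as a hypothetical helper. It handles only the relative form and the `DD-Mon-YYYY` form. Anything else, such as `'17 Jun'`, which has no year, is returned as `None`.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

RELATIVE = re.compile(r"^(\d+)\s+(minute|hour|day)s?\s+ago$")
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def parse_news_date(text: str, now: datetime) -> Optional[datetime]:
    """Turn '5 days ago' or '14-Oct-2020' into a datetime; return None otherwise."""
    text = text.strip()
    match = RELATIVE.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2)
        return now - timedelta(seconds=amount * UNIT_SECONDS[unit])
    try:
        return datetime.strptime(text, "%d-%b-%Y")
    except ValueError:
        return None  # e.g. '17 Jun' has no year, so it is left unresolved
```

Relative dates are resolved against the `now` you pass in, so use the time the search was run rather than the time you analyze the data. With pandas, this could be applied as `df['date'].apply(lambda s: parse_news_date(s, search_time))`, where `search_time` is a hypothetical variable holding that moment.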


In [85]:
# Search news about Federal Bank from 1 May 2020 to 31 May 2021
# (dates are MM/DD/YYYY; the start date must not come after the end date).
googlenews = GoogleNews(start='05/01/2020', end='05/31/2021')
googlenews.search('Federal Bank India')  # search term
for i in range(1, 20):  # loop over result pages 1 to 19
    googlenews.getpage(i)
# result() holds the results collected from all pages fetched so far.
df1 = pd.DataFrame(googlenews.result())
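In this notebook, `result()` returns the results of every page fetched so far, which is why the final DataFrame has roughly 200 rows. When pages are combined like this, the same article can appear more than once, for instance if a page is requested twice. A hedged way to clean the list before building the DataFrame is to keep the first occurrence of each link. `dedupe_by` is a hypothetical helper, and using `link` as the identity key is an assumption: identical titles can legitimately come from different outlets.

```python
from typing import Dict, Iterable, List


def dedupe_by(records: Iterable[Dict], key: str = "link") -> List[Dict]:
    """Keep the first record for each distinct value of `key`, preserving order.

    Records that lack the key are always kept, since they cannot be compared.
    """
    seen: set = set()
    unique: List[Dict] = []
    for record in records:
        value = record.get(key)
        if value is None:
            unique.append(record)
            continue
        if value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique
```

Usage would be `df1 = pd.DataFrame(dedupe_by(googlenews.result()))`. Preserving the original order keeps the search ranking intact.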

In [86]:
df1

|   | title | media | date | datetime | desc | link | img |
|---|---|---|---|---|---|---|---|
| 0 | Federal Bank introduces Pre-Booking Appointmen... | BFSI | 18-Jun-2020 |  | Federal Bank introduces Pre-Booking Appointmen... | https://bfsi.eletsonline.com/federal-bank-intr... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 1 | Federal Bank Q2 PAT down 26.2% at Rs 308 cr on... | Business Standard | 17-Oct-2020 |  | Private sector lender Federal Bank on Friday r... | https://www.business-standard.com/article/fina... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 2 | Buy Federal Bank, target price Rs 80: Motilal ... | The Economic Times | 04-Jan-2021 |  | Federal Bank Ltd., incorporated in the year 19... | https://economictimes.indiatimes.com/markets/s... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 3 | Federal Bank Q2: Total deposits up 12 per cent... | Business Line | 04-Oct-2020 |  | According to provisional numbers reported by F... | https://www.thehindubusinessline.com/money-and... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 4 | Federal Bank to launch credit cards in next fe... | Business Line | 01-Mar-2021 |  | Federal Bank is set to launch credit cards in ... | https://www.thehindubusinessline.com/money-and... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 195 | Home Federal Bank and FHLB Dallas Provide $750... | Business Wire | 04-Feb-2021 |  | Home Federal Bank and FHLB Dallas have awarded... | https://www.businesswire.com/news/home/2021020... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 196 | Federal Bank Regulators Issue Rule Supporting ... |  | 24-Mar-2021 |  | On March 9, federal bank regulatory agencies a... | https://www.jdsupra.com/legalnews/federal-bank... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 197 | India’s banks are racing to lend against a $1.... | Mint | 30-Jul-2020 |  | and Federal Bank Ltd. are expanding the loans ... | https://www.livemint.com/industry/banking/indi... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |
| 198 | HDFC Mutual Fund's Bank ETF NFO explained in 1... | Mint | 10-Aug-2020 |  | 9) The 12 stocks in the index include- HDFC Ba... | https://www.livemint.com/mutual-fund/mf-news/h... | data:image/gif;base64,R0lGODlhAQABAIAAAP//////... |


<h1><center>END</center></h1>