<a href="https://colab.research.google.com/github/anirbansen3027/NLP_Basics/blob/main/NLP_tasks_and_APIs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 1. Keyphrase Extraction - spaCy (textacy)

In [None]:
! pip install textacy -q

In [None]:
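import spacy
import textacy
import textacy.ke  # keyterm extraction; newer textacy releases expose this as textacy.extract.keyterms

# Load the small English spaCy pipeline (it must already be installed).
en = textacy.load_spacy_lang("en_core_web_sm")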

In [None]:
# Read the input text and parse it into a spaCy Doc.
with open("/content/nlphistory.txt", encoding="utf-8") as f:
    my_text = f.read()
doc = textacy.make_spacy_doc(my_text, lang=en)

In [None]:
# textrank yields (term, weight) pairs; keep only the terms and show the top five.
terms = [term for term, weight in textacy.ke.textrank(doc)]
terms[:5]

['successful natural language processing system',
 'statistical machine translation system',
 'natural language system',
 'statistical natural language processing',
 'natural language task']
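
For intuition about what a TextRank-style ranking does, here is a minimal word-level sketch, assuming an adjacent-word co-occurrence graph and plain PageRank updates. It is not textacy's implementation (which works on the parsed `doc` and returns multi-word terms); the function name `textrank_words` and the toy token list are made up for illustration.

```python
from collections import defaultdict


def textrank_words(tokens, damping=0.85, iterations=50):
    """Rank words by PageRank over an adjacent-word co-occurrence graph (illustrative only)."""
    # Undirected graph: two distinct words are linked if they appear next to each other.
    neighbors = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:
            neighbors[left].add(right)
            neighbors[right].add(left)

    words = list(neighbors)
    n = len(words)
    scores = {w: 1.0 / n for w in words}
    for _ in range(iterations):
        # Each word receives an equal share of every neighbor's current score.
        scores = {
            w: (1 - damping) / n
            + damping * sum(scores[v] / len(neighbors[v]) for v in neighbors[w])
            for w in words
        }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy input, loosely echoing the terms printed above.
toy_tokens = (
    "natural language processing system natural language task "
    "statistical natural language processing"
).split()
ranked_words = textrank_words(toy_tokens)
for word, score in ranked_words[:3]:
    print(f"{word}\t{score:.3f}")
```

Words that co-occur with many different neighbors ("natural", "language") end up with the highest scores, which is why phrases built from them dominate the output above.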

## 2. Language Detection & Translation - google_trans_new | textblob

In [None]:
! pip install google_trans_new -q

In [None]:
from textblob import TextBlob
from google_trans_new import LANGUAGES, google_translator

translator = google_translator()
# Number of languages listed by google_trans_new
len(LANGUAGES)

108

In [None]:
# Hindi: detect the language, then translate to English.
# Both calls use the Google Translate API and require an internet connection.
# Note: detect_language() and translate() are deprecated in newer TextBlob releases.
hi_blob = TextBlob(u'तुम्हारा नाम क्या है')
print(hi_blob.detect_language())
print(hi_blob.translate(to='en'))

hi
What is your name


In [None]:
# German: detect the language, then translate to English.
de_blob = TextBlob(u"Maschinelles Lernen ist ein interessantes Thema zum Lernen")
print(de_blob.detect_language())
print(de_blob.translate(to='en'))

de
Machine learning is an interesting topic to learn


In [None]:
# Translate a Hindi and an Arabic sentence with google_trans_new.
print(translator.translate(u'तुम्हारा नाम क्या है'))
print(translator.translate(u'التعلم الآلي هو موضوع مثير للاهتمام للتعلم'))

What is your name 
Machine learning is an interesting topic to learn 


In [None]:
translator.detect("Les livres sont les meilleurs amis de l'homme")

['fr', 'french']
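
Both detectors above call a remote service. As a much cruder offline check, the sketch below guesses the dominant Unicode script of a string using only the standard library. It identifies scripts, not languages (German and French both come out as Latin), and `dominant_script` is a hypothetical helper, not part of textblob or google_trans_new.

```python
import unicodedata
from collections import Counter


def dominant_script(text):
    """Return the most common script prefix among the letters of `text`, or None."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode character names start with the script, e.g. "DEVANAGARI LETTER TA".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else None


print(dominant_script("तुम्हारा नाम क्या है"))
print(dominant_script("التعلم الآلي هو موضوع مثير للاهتمام للتعلم"))
print(dominant_script("Les livres sont les meilleurs amis de l'homme"))
```

A check like this can only route text to a script-specific pipeline; telling apart languages that share a script still needs a proper language identifier.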

## 3. Spelling Correction - symspellpy

In [None]:
! pip install symspellpy



In [None]:
import pkg_resources
from symspellpy import SymSpell, Verbosity

sym_spell = SymSpell(max_dictionary_edit_distance=2, prefix_length=7)
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
# term_index is the column of the term and count_index is the
# column of the term frequency
sym_spell.load_dictionary(dictionary_path, term_index=0, count_index=1)

# lookup suggestions for single-word input strings
input_term = "memeebers"  # misspelling of "members"

# max edit distance per lookup
# (max_edit_distance_lookup <= max_dictionary_edit_distance)
suggestions = sym_spell.lookup(input_term, Verbosity.CLOSEST,
                               max_edit_distance=2)
# display suggestion term, edit distance and term frequency
for suggestion in suggestions:
    print(suggestion)

members, 2, 226656153
remembers, 2, 2102056
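
The second field of each suggestion is the edit distance between the input and the dictionary term. A minimal sketch of the classic Levenshtein distance shows why both suggestions score 2. symspellpy uses its own distance implementation and a precomputed index, so `levenshtein` here is only an illustrative helper, not part of the library.

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions, and substitutions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


for candidate in ("members", "remembers"):
    print(candidate, levenshtein("memeebers", candidate))
```

"members" is reached by deleting two extra "e" characters, and "remembers" by two substitutions, so the frequency count is what ranks "members" first.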


## 4. Named Entity Recognition - spacy

In [None]:
import spacy
nlp = spacy.load("en_core_web_sm")
text_from_fig = "On Tuesday, Apple announced its plans for another major chunk of the money: It will buy back a further $75 billion in stock."
doc = nlp(text_from_fig)

# Print each detected entity with its label.
for ent in doc.ents:
    print(ent.text, "\t", ent.label_)

Tuesday 	 DATE
Apple 	 ORG
$75 billion 	 MONEY


Models trained on the OntoNotes 5 corpus support the following entity types:
* PERSON: People, including fictional.
* NORP: Nationalities or religious or political groups.
* FAC: Buildings, airports, highways, bridges, etc.
* ORG: Companies, agencies, institutions, etc.
* GPE: Countries, cities, states.
* LOC: Non-GPE locations, mountain ranges, bodies of water.
* PRODUCT: Objects, vehicles, foods, etc. (Not services.)
* EVENT: Named hurricanes, battles, wars, sports events, etc.
* WORK_OF_ART: Titles of books, songs, etc.
* LAW: Named documents made into laws.
* LANGUAGE: Any named language.
* DATE: Absolute or relative dates or periods.
* TIME: Times smaller than a day.
* PERCENT: Percentage, including "%".
* MONEY: Monetary values, including unit.
* QUANTITY: Measurements, as of weight or distance.
* ORDINAL: "first", "second", etc.
* CARDINAL: Numerals that do not fall under another type.
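
A common next step is to group the extracted entities by label. The sketch below does this in plain Python over the `(text, label)` pairs from the NER output earlier in this section, copied in as literals so it runs without spaCy. `group_by_label` is a hypothetical helper name.

```python
from collections import defaultdict

# (text, label) pairs copied from the NER output in this section.
entities = [("Tuesday", "DATE"), ("Apple", "ORG"), ("$75 billion", "MONEY")]


def group_by_label(pairs):
    """Map each entity label to the list of entity texts carrying it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


entities_by_label = group_by_label(entities)
print(entities_by_label)
```

With a live `doc`, the same helper could be fed `[(ent.text, ent.label_) for ent in doc.ents]`.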



## 5. Part of Speech Tagging - NLTK

| Tag | Meaning | English examples |
| --- | --- | --- |
| CC | coordinating conjunction | |
| CD | cardinal digit | |
| DT | determiner | |
| EX | existential there | "there is" (think of it as "there exists") |
| FW | foreign word | |
| IN | preposition/subordinating conjunction | |
| JJ | adjective | big |
| JJR | adjective, comparative | bigger |
| JJS | adjective, superlative | biggest |
| LS | list marker | 1) |
| MD | modal | could, will |
| NN | noun, singular | desk |
| NNS | noun, plural | desks |
| NNP | proper noun, singular | Harrison |
| NNPS | proper noun, plural | Americans |
| PDT | predeterminer | "all" in "all the kids" |
| POS | possessive ending | parent's |
| PRP | personal pronoun | I, he, she |
| PRP\$ | possessive pronoun | my, his, her |
| RB | adverb | very, silently |
| RBR | adverb, comparative | better |
| RBS | adverb, superlative | best |
| RP | particle | give up |
| TO | to | go "to" the store |
| UH | interjection | errrrrrrrm |
| VB | verb, base form | take |
| VBD | verb, past tense | took |
| VBG | verb, gerund/present participle | taking |
| VBN | verb, past participle | taken |
| VBP | verb, singular present, non-3rd person | take |
| VBZ | verb, 3rd person singular present | takes |
| WDT | wh-determiner | which |
| WP | wh-pronoun | who, what |
| WP\$ | possessive wh-pronoun | whose |
| WRB | wh-adverb | where, when |

In [None]:
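import nltk
from nltk import pos_tag, word_tokenize

# English and Russian models for the averaged perceptron tagger.
nltk.download('averaged_perceptron_tagger')
nltk.download('averaged_perceptron_tagger_ru')
# word_tokenize also needs the 'punkt' tokenizer data; run nltk.download('punkt') if it is missing.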

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package averaged_perceptron_tagger_ru to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_ru is already up-to-
[nltk_data]       date!


In [None]:
pos_tag(word_tokenize("John's big idea isn't all that bad."))

[('John', 'NNP'),
 ("'s", 'POS'),
 ('big', 'JJ'),
 ('idea', 'NN'),
 ('is', 'VBZ'),
 ("n't", 'RB'),
 ('all', 'PDT'),
 ('that', 'DT'),
 ('bad', 'JJ'),
 ('.', '.')]
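
Tag prefixes make it easy to pull word classes out of the tagger output. As a small sketch, assuming the tagged list printed above (copied in as a literal so no NLTK data is needed), the hypothetical helper `words_with_tag_prefix` selects nouns and adjectives:

```python
# (word, tag) pairs copied from the pos_tag output above.
tagged = [
    ("John", "NNP"), ("'s", "POS"), ("big", "JJ"), ("idea", "NN"),
    ("is", "VBZ"), ("n't", "RB"), ("all", "PDT"), ("that", "DT"),
    ("bad", "JJ"), (".", "."),
]


def words_with_tag_prefix(tagged_tokens, prefix):
    """Return the words whose tag starts with `prefix` (e.g. "NN" covers NN, NNS, NNP, NNPS)."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


nouns = words_with_tag_prefix(tagged, "NN")
adjectives = words_with_tag_prefix(tagged, "JJ")
print(nouns, adjectives)
```

Matching on the prefix rather than the exact tag keeps singular, plural, and proper nouns together without listing every variant.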

In [None]:
# Russian text is tagged with lang='rus', which uses a different tagset (S, V, CONJ, ...).
pos_tag(word_tokenize("Илья оторопел и дважды перечитал бумажку."), lang='rus')

[('Илья', 'S'),
 ('оторопел', 'V'),
 ('и', 'CONJ'),
 ('дважды', 'ADV'),
 ('перечитал', 'V'),
 ('бумажку', 'S'),
 ('.', 'NONLEX')]

## 6. Automatic Speech Recognition

## 7. Temporal Information Extraction - Duckling

In [None]:
# Install the package.
# JPype1 is pinned because duckling is not compatible with recent versions of JPype.
!pip install JPype1==0.7.4
!pip install duckling==1.8.0



In [None]:
from duckling import DucklingWrapper
from pprint import pprint

In [None]:
d = DucklingWrapper()
result = d.parse(u'You owe me twenty bucks, please call me today')

TypeError: ignored
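
The Duckling call above ends in a `TypeError` in this environment. For the simplest relative-date words only, a standard-library sketch is shown below. It is a stand-in technique (keyword lookup plus `datetime`), not Duckling's parser, and the `RELATIVE_DAYS` table and `extract_relative_dates` name are assumptions made for illustration.

```python
import re
from datetime import date, timedelta

# Hypothetical lookup table: relative-date keyword -> offset in days.
RELATIVE_DAYS = {"yesterday": -1, "today": 0, "tomorrow": 1}
_PATTERN = re.compile(r"\b(" + "|".join(RELATIVE_DAYS) + r")\b", re.IGNORECASE)


def extract_relative_dates(text, reference=None):
    """Return (matched word, resolved date) pairs for simple relative-date keywords."""
    if reference is None:
        reference = date.today()
    return [
        (m.group(0), reference + timedelta(days=RELATIVE_DAYS[m.group(0).lower()]))
        for m in _PATTERN.finditer(text)
    ]


found = extract_relative_dates(
    "You owe me twenty bucks, please call me today",
    reference=date(2021, 1, 15),  # arbitrary fixed date so the result is reproducible
)
print(found)
```

This handles only three keywords and ignores amounts such as "twenty bucks"; a full temporal parser covers far more expressions.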