In [1]:
# Run once to download the small English pipeline:
# import spacy.cli
# spacy.cli.download("en_core_web_sm")

In [2]:
import spacy
from spacy import displacy

In [3]:
nlp = spacy.load('en_core_web_sm')

In [4]:
text = "When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the trained pipelines typically include a tagger, a lemmatizer, a parser and an entity recognizer."

In [5]:
doc = nlp(text)
doc

When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the trained pipelines typically include a tagger, a lemmatizer, a parser and an entity recognizer.
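
The text above describes spaCy's processing pipeline: the tokenizer creates the `Doc`, then each component annotates it in order (the real component names are available as `nlp.pipe_names`). The pure-Python sketch below only mimics that control flow. The dict-based "doc", the whitespace tokenizer and the two components are hypothetical stand-ins, not spaCy objects.

```python
from typing import Any, Callable, Dict, List

# A toy "Doc": the original text plus whatever annotations components add (hypothetical).
Doc = Dict[str, Any]
Component = Callable[[Doc], Doc]


def tokenize(text: str) -> Doc:
    """Hypothetical whitespace tokenizer: always runs first and creates the Doc."""
    return {"text": text, "tokens": text.split()}


def add_lowercase(doc: Doc) -> Doc:
    """Hypothetical component: stores the lowercase form of every token."""
    doc["lower"] = [t.lower() for t in doc["tokens"]]
    return doc


def add_length(doc: Doc) -> Doc:
    """Hypothetical component: stores the character length of every token."""
    doc["length"] = [len(t) for t in doc["tokens"]]
    return doc


class ToyPipeline:
    def __init__(self, components: List[Component]) -> None:
        self.components = components

    @property
    def pipe_names(self) -> List[str]:
        return [c.__name__ for c in self.components]

    def __call__(self, text: str) -> Doc:
        doc = tokenize(text)            # step 1: tokenization produces the Doc
        for component in self.components:
            doc = component(doc)        # later steps: each component annotates the same Doc
        return doc


nlp_toy = ToyPipeline([add_lowercase, add_length])
doc_toy = nlp_toy("The Doc is then processed")
print(nlp_toy.pipe_names)
print(doc_toy["lower"], doc_toy["length"])
```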

In [6]:
for token in doc:
    print(token)

When
you
call
nlp
on
a
text
,
spaCy
first
tokenizes
the
text
to
produce
a
Doc
object
.
The
Doc
is
then
processed
in
several
different
steps
–
this
is
also
referred
to
as
the
processing
pipeline
.
The
pipeline
used
by
the
trained
pipelines
typically
include
a
tagger
,
a
lemmatizer
,
a
parser
and
an
entity
recognizer
.
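
Notice that punctuation such as `,` and `.` becomes its own token. spaCy's tokenizer first splits on whitespace and then applies prefix, suffix and infix rules plus special-case exceptions. The regex sketch below is a much cruder approximation, assuming a token is either a run of word characters (optionally joined by hyphens or apostrophes) or a single punctuation character. It has no exception handling: for example, it keeps `hall’s` as one token, whereas spaCy splits off the possessive `’s`.

```python
import re
from typing import List

# Assumed, simplified token pattern: words (allowing inner hyphens/apostrophes)
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+(?:[-'’]\w+)*|[^\w\s]")


def toy_tokenize(text: str) -> List[str]:
    """Split text into word and punctuation tokens; whitespace is discarded."""
    return TOKEN_RE.findall(text)


print(toy_tokenize("When you call nlp on a text, spaCy first tokenizes the text."))
```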


In [7]:
# Add the rule-based sentencizer to the pipeline, placed before the parser
nlp.add_pipe('sentencizer', before='parser')

<spacy.pipeline.sentencizer.Sentencizer at 0x180ff91b500>

In [17]:
doc = nlp(text)
doc

When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object. The Doc is then processed in several different steps – this is also referred to as the processing pipeline. The pipeline used by the trained pipelines typically include a tagger, a lemmatizer, a parser and an entity recognizer.

In [9]:
for sent in doc.sents:
    print(sent)

When you call nlp on a text, spaCy first tokenizes the text to produce a Doc object.
The Doc is then processed in several different steps – this is also referred to as the processing pipeline.
The pipeline used by the trained pipelines typically include a tagger, a lemmatizer, a parser and an entity recognizer.
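
The `sentencizer` added above is rule-based: roughly, it starts a new sentence at the first token that follows sentence-final punctuation, without consulting the dependency parse. A minimal sketch of that rule over an already-tokenized list, assuming only `.`, `!` and `?` count as sentence-final (spaCy's default `punct_chars` list is much longer):

```python
from typing import List

# Assumed (abbreviated) set of sentence-final characters.
PUNCT_CHARS = {".", "!", "?"}


def split_sentences(tokens: List[str]) -> List[List[str]]:
    """Start a new sentence at the first token after a run of sentence-final punctuation."""
    sentences: List[List[str]] = []
    current: List[str] = []
    seen_final = False
    for tok in tokens:
        if seen_final and tok not in PUNCT_CHARS:
            sentences.append(current)   # close the previous sentence
            current = []
            seen_final = False
        current.append(tok)
        if tok in PUNCT_CHARS:
            seen_final = True
    if current:
        sentences.append(current)
    return sentences


tokens = "The Doc is processed . The pipeline includes a parser . Done !".split()
for sent in split_sentences(tokens):
    print(" ".join(sent))
```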


In [10]:
from spacy.lang.en.stop_words import STOP_WORDS

In [11]:
stopwords = list(STOP_WORDS)
stopwords

['everything',
 '‘s',
 'ca',
 'seem',
 'everywhere',
 'used',
 'even',
 'formerly',
 'one',
 'may',
 'here',
 'yours',
 'name',
 'thereafter',
 'seeming',
 'front',
 'toward',
 'noone',
 'give',
 'n‘t',
 'this',
 'can',
 '’re',
 'none',
 'some',
 'you',
 'whereupon',
 'i',
 'nine',
 'your',
 'have',
 'these',
 'becoming',
 'which',
 "'re",
 '‘ll',
 'eleven',
 'go',
 'wherever',
 'until',
 'hereafter',
 'what',
 'bottom',
 'above',
 'enough',
 '’s',
 'than',
 'six',
 'anywhere',
 'beyond',
 'could',
 'are',
 'before',
 'not',
 'those',
 'keep',
 'unless',
 'nowhere',
 'take',
 'beforehand',
 'my',
 'moreover',
 'whereafter',
 'there',
 'done',
 'own',
 '‘ve',
 'also',
 'anything',
 'been',
 'twenty',
 'must',
 'while',
 'ourselves',
 'ever',
 'other',
 'hereby',
 'her',
 'elsewhere',
 'with',
 'into',
 'that',
 'but',
 'whoever',
 'whatever',
 'doing',
 'per',
 'where',
 'three',
 'sometime',
 'few',
 'were',
 'just',
 'meanwhile',
 'had',
 'she',
 'when',
 'five',
 'others',
 'against'

In [12]:
len(stopwords)

326
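
`STOP_WORDS` is a Python set. `list(STOP_WORDS)` gives an arbitrary order and turns every membership check into a linear scan, so keep the set for lookups and use `sorted(STOP_WORDS)` when you want a stable, readable listing. spaCy's `is_stop` check is case-insensitive, which is why a capitalized `When` or `The` counts as a stop word. The sketch below reproduces that behaviour with a small, hypothetical stop-word subset:

```python
# A small, hypothetical subset of stop words; spaCy's English set has 326 entries.
TOY_STOP_WORDS = frozenset({"the", "a", "an", "is", "to", "when", "you", "on", "then"})


def is_stop(word: str, stops: frozenset = TOY_STOP_WORDS) -> bool:
    """Case-insensitive membership test; a set lookup is O(1) on average, a list scan is O(n)."""
    return word.lower() in stops


print(is_stop("The"), is_stop("Doc"))
print(sorted(TOY_STOP_WORDS)[:3])   # sorted() gives a reproducible order
```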

In [13]:
# Keep only the tokens that are not in spaCy's stop-word list
tokenized_text = [token for token in doc if not token.is_stop]
tokenized_text

[nlp,
 text,
 ,,
 spaCy,
 tokenizes,
 text,
 produce,
 Doc,
 object,
 .,
 Doc,
 processed,
 different,
 steps,
 –,
 referred,
 processing,
 pipeline,
 .,
 pipeline,
 trained,
 pipelines,
 typically,
 include,
 tagger,
 ,,
 lemmatizer,
 ,,
 parser,
 entity,
 recognizer,
 .]
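
The filtered list still contains punctuation (`,`, `.`, `–`). spaCy tokens also carry an `is_punct` flag, so `[t for t in doc if not (t.is_stop or t.is_punct)]` drops both. The stand-alone sketch below applies the same filter to plain strings. The stop-word subset is hypothetical, and the punctuation test (every character in a Unicode `P*` category) is an assumption, not spaCy's exact rule.

```python
import unicodedata
from typing import List

# Hypothetical stop-word subset, enough for the example sentence below.
TOY_STOP_WORDS = frozenset({"the", "a", "is", "to", "then", "in", "several", "this", "also", "as"})


def is_punct(token: str) -> bool:
    """True if every character is Unicode punctuation (categories Pc, Pd, Ps, Pe, Pi, Pf, Po)."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def content_tokens(tokens: List[str]) -> List[str]:
    """Drop stop words (case-insensitively) and punctuation-only tokens."""
    return [t for t in tokens if t.lower() not in TOY_STOP_WORDS and not is_punct(t)]


words = ["The", "Doc", "is", "then", "processed", "in", "several", "different", "steps", "–", "this"]
print(content_tokens(words))
```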

## Lemmatization

In [19]:
for token in tokenized_text:
    # Print each token's surface text next to its lemma
    print(token.text, token.lemma_)

nlp nlp
text text
, ,
spaCy spaCy
tokenizes tokenize
text text
produce produce
Doc Doc
object object
. .
Doc Doc
processed process
different different
steps step
– –
referred refer
processing processing
pipeline pipeline
. .
pipeline pipeline
trained train
pipelines pipeline
typically typically
include include
tagger tagger
, ,
lemmatizer lemmatizer
, ,
parser parser
entity entity
recognizer recognizer
. .
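
The lemma depends on the part of speech (`processed` → `process`, `steps` → `step`), and spaCy's English pipelines use a rule-based lemmatizer that relies on the token's POS tag. The sketch below is loosely modeled on that kind of approach: an exceptions table, then a word index, then suffix rules whose output must appear in the index. The tiny tables are hypothetical and this is not spaCy's implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, tiny resources; a real rule-based lemmatizer ships large tables.
INDEX: Dict[str, Set[str]] = {
    "NOUN": {"step", "pipeline", "text", "object"},
    "VERB": {"process", "train", "tokenize", "refer", "produce", "include"},
}
EXCEPTIONS: Dict[str, Dict[str, str]] = {"VERB": {"referred": "refer"}}  # irregular forms
RULES: Dict[str, List[Tuple[str, str]]] = {
    "NOUN": [("s", "")],
    "VERB": [("ed", ""), ("ed", "e"), ("es", ""), ("es", "e"), ("s", "")],
}


def lemmatize(word: str, pos: str) -> str:
    """Exceptions first, then the index, then suffix rules validated against the index."""
    if pos not in RULES:            # e.g. PROPN, PUNCT: keep the form unchanged
        return word
    lower = word.lower()
    if lower in EXCEPTIONS.get(pos, {}):
        return EXCEPTIONS[pos][lower]
    if lower in INDEX[pos]:         # already a base form
        return lower
    for suffix, replacement in RULES[pos]:
        if lower.endswith(suffix):
            candidate = lower[: -len(suffix)] + replacement
            if candidate in INDEX[pos]:
                return candidate
    return lower                    # no rule matched: fall back to the lowercase form


for word, pos in [("tokenizes", "VERB"), ("processed", "VERB"), ("referred", "VERB"),
                  ("steps", "NOUN"), ("Doc", "PROPN")]:
    print(word, lemmatize(word, pos))
```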


## Part-of-Speech Tagging

In [20]:
for token in tokenized_text:
    print(token.text, token.pos_)

nlp NOUN
text NOUN
, PUNCT
spaCy PROPN
tokenizes VERB
text NOUN
produce VERB
Doc PROPN
object NOUN
. PUNCT
Doc PROPN
processed VERB
different ADJ
steps NOUN
– PUNCT
referred VERB
processing NOUN
pipeline NOUN
. PUNCT
pipeline NOUN
trained VERB
pipelines NOUN
typically ADV
include VERB
tagger NOUN
, PUNCT
lemmatizer NOUN
, PUNCT
parser NOUN
entity NOUN
recognizer ADV
. PUNCT
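
POS tags are predictions from a statistical model and can be wrong: `recognizer` is tagged `ADV` above, although it is a noun here. The tags follow the Universal POS tag set, and `spacy.explain("PROPN")` returns a short description of a tag. A small sketch that tallies tags, using the first ten `(text, pos_)` pairs printed above and a hand-written description table covering only the tags that appear in this output:

```python
from collections import Counter

# (text, pos_) pairs copied from the first ten filtered tokens printed above.
tagged = [("nlp", "NOUN"), ("text", "NOUN"), (",", "PUNCT"), ("spaCy", "PROPN"),
          ("tokenizes", "VERB"), ("text", "NOUN"), ("produce", "VERB"),
          ("Doc", "PROPN"), ("object", "NOUN"), (".", "PUNCT")]

# Universal POS tag descriptions (subset covering the tags seen in this notebook).
UPOS = {"NOUN": "noun", "VERB": "verb", "PROPN": "proper noun",
        "ADJ": "adjective", "ADV": "adverb", "PUNCT": "punctuation"}

counts = Counter(pos for _, pos in tagged)
for pos, n in counts.most_common():
    print(f"{pos:<6} {UPOS[pos]:<12} {n}")
```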


In [22]:
displacy.render(doc, style='dep')
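
`style='dep'` draws the dependency parse: every token points to a syntactic head through a labelled arc, and in spaCy the root of each sentence is its own head. (Outside a notebook, `displacy.serve(doc, style='dep')` starts a local web server instead.) The sketch below stores a parse as head indices and walks it. The analysis of "spaCy tokenizes the text" is hand-built for illustration, not parser output.

```python
from typing import List, NamedTuple


class Arc(NamedTuple):
    text: str
    dep: str    # dependency label
    head: int   # index of the head token; the root points to itself


# Hand-built (hypothetical) analysis of "spaCy tokenizes the text", not parser output.
sentence: List[Arc] = [
    Arc("spaCy", "nsubj", 1),
    Arc("tokenizes", "ROOT", 1),
    Arc("the", "det", 3),
    Arc("text", "dobj", 1),
]


def root(arcs: List[Arc]) -> int:
    """Index of the token whose head is itself."""
    return next(i for i, a in enumerate(arcs) if a.head == i)


def children(arcs: List[Arc], i: int) -> List[str]:
    """Texts of the tokens attached directly to token i (excluding i itself)."""
    return [a.text for j, a in enumerate(arcs) if a.head == i and j != i]


r = root(sentence)
print(sentence[r].text, children(sentence, r))
```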

## Entity Detection

In [27]:
text = """The student protest at Shahjalal University of Science and Technology (Sust) in Sylhet demanding Vice Chancellor Prof Farid Uddin Ahmed’s removal apparently had no solution in sight as of yesterday. Sust students have been protesting, initially only demanding the resignation of Zafrin Ahmed, provost of Begum Sirajunnesa Chowdhury Hall that turned into the one-point demand of removing the VC. At a stage they went into a fast unto death movement, which is still going on amid pressure from different quarters. It all started on the night of January 13. Students claimed that the hall’s residents called the provost over the phone to talk about some issues, including the misbehavior of the hall’s security guards. Zafrin Ahmed allegedly misbehaved with them. That night female students started a demonstration in front of the hall at around 10:30pm. The demonstration later moved to the VC’s quarters. The students had two demands—removal of Zafrin Ahmed and other assistant provosts and an apology. The VC talked to them and scheduled a discussion the next day at his office. On January 14, a delegation of the students had a meeting with the VC at his office with demands, including eliminating all mismanagement of the hall immediately to ensure a healthy and normal environment and immediately appointing a student-friendly and responsible presiding committee. The students continued a sit-in in front of the VC’s office. In the afternoon they issued an ultimatum till Saturday evening, calling for the removal of the provost. They halted their movement, expecting the demands to be met. The Sust authorities that afternoon appointed assistant provost Zobaida Kanak Khan as the acting provost but the students were not satisfied. On January 15, they took to the streets again even as the authorities requested them to go back to the halls.
In the evening, while the university proctor went to the students and requested them to leave the streets, activists of the Chhatra League allegedly swooped on the students and activists of some left-leaning student organizations. The next day female students took to the streets again protesting the attack. Some students from other halls joined them. They announced a boycott of all classes and examinations until all demands were met. On the day, leaders of the teachers’ association, heads of departments and proctor talked to the students and announced that they would accept the demands, and sought a week’s time for implementation of the demands. But the students rejected the overture and wanted the demands to be met at once. The VC was confronted by them as he was going to attend a meeting of deans. Teachers and officials cordoned him off and took him to the ICT bhaban. Students locked the building’s gate. Police and students were soon locked in clashes when the latter turned up to rescue the VC. Students said a number of them were injured when the police wielded batons on them. Police action on the campus drew much criticism across the country. That night the university was declared closed indefinitely and students were asked to leave the dormitories by noon the next day. On January 17 new three-point demands were announced, including the VC’s resignation, public apology from the VC for ordering police action against students, and ensuring accountability of all involved in the attack on students. The students also declared the VC persona non grata on the campus. Police filed a case against over 200 unnamed students. The next day the students issued an ultimatum to the VC to resign by noon on Jan 19 and announced a fast unto death program if VC was not removed.
On January 19 twenty-four students went on a hunger strike unto death in front of the VC’s residence. Meanwhile, an audio clip of the VC, in which he was heard making derogatory comments about the female students of Jahangirnagar University, was disseminated. On January 20, five students were hospitalized and nine fell sick, teachers asked students to sit for a discussion, students refused to talk and urged teachers to join their cause. The Jahangirnagar University Teachers' Association today protested against the SUST VC for making offensive remarks about female students of JU, demanding that he publicly apologize and withdraw the indecent remarks. On January 21, the education minister in a video call urged five of the protesters to meet her for talks in Dhaka. The next day Education Minister Dipu Moni met a five-member delegation of Sust teachers. She said the issue would be resolved through discussions with students. The next day education minister Dr Dipu Moni had a virtual meeting with the protesting students, requesting them to submit complaints in written form and breaking the fast. The one-hour long meeting ended without any outcome. Students said they would come back to her after discussion with their fellow students. The next day the meeting was not held as students were adamant on their one-point demand of removing the VC. They took position in front of the VC's residence, allowing only law enforcement and media personnel to enter the house.
The electricity connection of the VC’s official residence was severed by students but was reconnected after 30 hours. The Sust Teachers' Association urged the government to take steps regarding VC Farid’s resignation, and meet all demands of the students. On January 24, the Sust VC apologized to Jahangirnagar University students and teachers for his remarks on female students of JU. On January 25, mobile banking numbers of the current and former SUST students who used to receive financial assistance for the ongoing protest stopped working, alleged students. A medical team from Sylhet MAG Osmani Medical College Hospital discontinued services to the agitating students."""
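
Text scraped from news sites often loses its paragraph breaks, so sentences run together at the period (for example `quarters.The`) and stray runs of spaces remain. This can confuse tokenization and sentence segmentation, so it is usually safer to restore the missing space before calling `nlp`. A minimal sketch, assuming the glued boundaries always look like "lowercase letter, `.`/`!`/`?`, uppercase letter"; the helper name is hypothetical, and the rule would also alter strings such as file names (`report.Final`), so review it against your own data:

```python
import re


def normalize_scraped_text(text: str) -> str:
    """Hypothetical cleanup for scraped prose before running an NLP pipeline."""
    # Re-insert the space lost at paragraph joins: "quarters.The" -> "quarters. The".
    text = re.sub(r"([a-z])([.!?])([A-Z])", r"\1\2 \3", text)
    # Collapse runs of spaces, tabs and newlines into a single space.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(normalize_scraped_text("moved to the VC’s quarters.The students   had two demands"))
```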

In [28]:
doc = nlp(text)

In [29]:
displacy.render(doc, style='ent')
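
`style='ent'` highlights the spans in `doc.ents`, each of which has a `.text` and a `.label_` such as `DATE` or `ORG`. At the token level, spaCy exposes the same information as IOB tags through `token.ent_iob_` and `token.ent_type_`. The sketch below shows how token-level B/I/O tags are grouped into entity spans; the tags for this phrase from the article are hand-assigned for illustration, not model output.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity text, label) spans; 'O' marks non-entities."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    label = ""
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:                      # close the previous entity
                spans.append((" ".join(current), label))
            current, label = [tok], tag[2:]  # start a new entity
        elif tag.startswith("I-"):
            current.append(tok)              # continue the open entity
        else:                                # "O": close any open entity
            if current:
                spans.append((" ".join(current), label))
            current, label = [], ""
    if current:
        spans.append((" ".join(current), label))
    return spans


# Hand-assigned (hypothetical) tags for a phrase from the article, not model output.
tokens = ["On", "January", "24", ",", "the", "Sust", "VC", "apologized"]
tags = ["O", "B-DATE", "I-DATE", "O", "O", "B-ORG", "O", "O"]
print(bio_to_spans(tokens, tags))
```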