# Tutorial: Working with Text

This tutorial covers current methods of working with text data in Python using the following methods: 

> * Step1: Loading & Cleaning data
> * Step2: Exploration
> * Step3: Entity Recognition & Extraction 
> * Step4: Summarization
> * Step5: Topic Modeling


## Guggenheim Museum Art Books

To explore steps above, we will leverage art books made publicly available by Guggenheim Museum. The full reporistory of books is available here: https://archive.org/details/guggenheimmuseum?and%5B%5D=mediatype%3A%22texts%22&sort=titleSorter&page=1. The data can be found in ../data/books/{1, 2, ..., 220}.txt . There are about 207 art books. 


## Step 1: Load the data

First, let's load the book text and ensure proper encoding of the document. 
Please select the book that you want to load:
   * Open the ../data/book_list.csv
   * Select the book you are interested to work with (e.g. "Marc Chagall and the Jewish theater"
   * Find the corresponding book_urn (e.g. "chagallj00chag")
   * Create a url by replacing book urn in the following url https://raw.githubusercontent.com/AnnaNican/wcaiconf_2019/master/data/books/[your book].txt 
   (e.g. https://raw.githubusercontent.com/AnnaNican/wcaiconf_2019/master/data/books/chagallj00chag.txt )
   * Place the url below in the file url

In [1]:
import urllib2

fileurl = 'https://raw.githubusercontent.com/AnnaNican/wcaiconf_2019/master/data/books/chagallj00chag.txt'
booktext = urllib2.urlopen(fileurl).read()

booktext = booktext.replace('\n', '')
booktext = unicode(booktext, 'utf-8')

print(booktext)



# Step 2: Exploring the data


##  Tokenisation

Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens , perhaps at the same time throwing away certain characters, such as punctuation. In short, "token" is a meaningful units of text

* Words
* Phrases
* Punctuation
* Numbers
* Dates
* Currencies
* Hashtags
* ...?


These tokens are often loosely referred to as terms or words, but it is sometimes important to make a type/token distinction. A token is an instance of a sequence of characters in some particular document that are grouped together as a useful semantic unit for processing. 

In [2]:
from nltk.tokenize import word_tokenize

try:  # py3
    all_tokens = [t for t in word_tokenize(booktext)]
except UnicodeDecodeError:  # py27
#     all_tokens = [t for t in word_tokenize(corpus_all_in_one.decode('utf-8'))]
    all_tokens = [t for t in word_tokenize(booktext.decode('utf-8'))]

print("Total number of tokens: {}".format(len(booktext)))
print("Sample of tokens: {}".format(booktext[0:10]))


Total number of tokens: 687938
Sample of tokens: GUGGENHEIM


## Counting Words¶

We start with a simple word count using **collections.Counter**
We are interested in finding: how many times a word occurs across the whole corpus (total number of occurrences)


In [3]:
from collections import Counter

total_term_frequency = Counter(all_tokens)

for word, freq in total_term_frequency.most_common(20):
    print("{}\t{}".format(word, freq))

,	9484
the	7179
.	6488
of	4064
and	3585
in	2923
a	2385
to	2096
Chagall	1197
``	1062
is	1053
's	994
his	973
''	969
)	943
The	919
(	914
was	882
that	850
with	831


## Stop-words

We notice that some of the most common words above are not very interesting.
These words are called stop-words, and they don't provide any particular meaning in isolation (articles, conjunctions, pronouns, etc.) So we will use **nltk.corpus** to remove these words from our tokens.

Notice:
there is no "universal" list of stop-words
removing stop-words can be useful or damaging depending on the application
e.g. if you remove stop-words, what do you do with "The Who", "to be or not to be" and similar phrases?

In [4]:
from nltk.corpus import stopwords
import string

print(stopwords.words('english'))
print(len(stopwords.words('english')))
print(string.punctuation)

[u'i', u'me', u'my', u'myself', u'we', u'our', u'ours', u'ourselves', u'you', u'your', u'yours', u'yourself', u'yourselves', u'he', u'him', u'his', u'himself', u'she', u'her', u'hers', u'herself', u'it', u'its', u'itself', u'they', u'them', u'their', u'theirs', u'themselves', u'what', u'which', u'who', u'whom', u'this', u'that', u'these', u'those', u'am', u'is', u'are', u'was', u'were', u'be', u'been', u'being', u'have', u'has', u'had', u'having', u'do', u'does', u'did', u'doing', u'a', u'an', u'the', u'and', u'but', u'if', u'or', u'because', u'as', u'until', u'while', u'of', u'at', u'by', u'for', u'with', u'about', u'against', u'between', u'into', u'through', u'during', u'before', u'after', u'above', u'below', u'to', u'from', u'up', u'down', u'in', u'out', u'on', u'off', u'over', u'under', u'again', u'further', u'then', u'once', u'here', u'there', u'when', u'where', u'why', u'how', u'all', u'any', u'both', u'each', u'few', u'more', u'most', u'other', u'some', u'such', u'no', u'nor', u

In [5]:
stop_list = stopwords.words('english') + list(string.punctuation)

tokens_no_stop = [token for token in all_tokens
                        if token not in stop_list]

total_term_frequency_no_stop = Counter(tokens_no_stop)

for word, freq in total_term_frequency_no_stop.most_common(20):
    print("{}\t{}".format(word.encode('utf-8'), freq))

Chagall	1197
``	1062
's	994
''	969
The	919
I	793
—	666
Yiddish	592
Jewish	563
theater	455
art	433
Russian	364
Theater	348
In	338
world	280
one	266
And	205
Moscow	201
new	201
n't	200


## Text Normalisation

Notice, that somethimes
Replacing tokens with a canonical form, so we can group together different spelling/variations of the same word.
There are many ways to perform text normalization: 

* lowercasing
* stemming (Stemming is the process of reducing a word to its base/root form, called stem)
    Stemming is the process of reducing the words(generally modified or derived) to their word stem or root form. The objective of stemming is to reduce related words to the same stem even if the stem is not a dictionary word. For example, in the English language-

> * beautiful and beautifully are stemmed to beauti 
> * good, better and best are stemmed to good, better and best respectively


The original paper by Martin Porter: https://tartarus.org/martin/PorterStemmer/def.txt on Porter Algorithm for stemming. 

* American-to-British mapping
* synonym mapping
* Lemmatization (Lemmatisation is the process of reducing a group of words into their lemma or dictionary form. It takes into account things like POS(Parts of Speech), the meaning of the word in the sentence, the meaning of the word in the nearby sentences etc. before reducing the word to its lemma. For example, in the English Language-

> * beautiful and beautifully are lemmatised to beautiful and beautifully respectively.
> * good, better and best are lemmatised to good, good and good respectively.
* ... 


In [6]:
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
all_tokens_lower = [t.lower() for t in all_tokens]

tokens_normalised = [stemmer.stem(t) for t in all_tokens_lower
                                     if t not in stop_list]

total_term_frequency_normalised = Counter(tokens_normalised)

for word, freq in total_term_frequency_normalised.most_common(20):
    print("{}\t{}".format(word.encode('utf-8'), freq))

chagal	1203
``	1062
's	994
''	969
theater	828
—	666
art	593
yiddish	592
jewish	568
paint	465
russian	368
world	337
artist	331
new	327
one	316
work	297
jew	222
life	214
time	213
moscow	201


## n-grams
When we are interested in phrases rather than single terms, we can look into n-grams. An **n-gram** is a sequence of n adjacent terms.Commonly used n-grams include bigrams (n=2) and trigrams (n=3).

In [7]:
from nltk import ngrams

phrases = Counter(ngrams(all_tokens_lower, 2))
for phrase, freq in phrases.most_common(40):
    print("{}\t{}".format(phrase, freq))

(u'of', u'the')	1163
(u'in', u'the')	812
(u',', u'and')	809
(u',', u'the')	574
(u'.', u'the')	504
(u'.', u'.')	446
(u'chagall', u"'s")	399
(u'to', u'the')	363
(u')', u'.')	358
(u'and', u'the')	350
(u'on', u'the')	330
(u'.', u'in')	268
(u'for', u'the')	255
(u',', u'in')	252
(u')', u',')	251
(u'of', u'a')	230
(u'.', u'i')	229
(u',', u'a')	213
(u'the', u'yiddish')	209
(u'.', u"''")	204
(u',', u"''")	201
(u',', u'but')	200
(u'of', u'his')	198
(u',', u'``')	195
(u'at', u'the')	191
(u'the', u'jewish')	187
(u'yiddish', u'theater')	180
(u',', u'which')	177
(u'the', u'theater')	172
(u'from', u'the')	172
(u'.', u'chagall')	165
(u'in', u'a')	164
(u'.', u'he')	160
(u',', u'he')	158
(u',', u'i')	153
(u'.', u'``')	152
(u',', u'chagall')	152
(u'with', u'the')	149
(u'by', u'the')	147
(u'.', u'and')	141


In [8]:
phrases = Counter(ngrams(all_tokens_lower, 3))
for phrase, freq in phrases.most_common(20):
    print("{}\t{}".format(phrase, freq))

(u'.', u'.', u'.')	268
(u',', u'and', u'the')	96
(u'the', u'yiddish', u'theater')	96
(u',', u'pp', u'.')	79
(u'yiddish', u'chamber', u'theater')	70
(u"''", u'(', u'in')	67
(u'of', u'chagall', u"'s")	65
(u'new', u'york', u':')	60
(u'.', u'in', u'the')	60
(u'.', u'chagall', u"'s")	57
(u',', u'in', u'the')	54
(u'the', u'jewish', u'theater')	51
(u'of', u'the', u'yiddish')	50
(u'texts', u'and', u'documents')	48
(u'in', u'chagall', u"'s")	47
(u'.', u'no', u'.')	47
(u'the', u'yiddish', u'chamber')	46
(u'of', u'the', u'theater')	45
(u',', u'no', u'.')	45
(u'marc', u'chagall', u':')	43



### n-grams and stop-words
Stop-word removal will affect n-grams
e.g. phrases like "a pinch of salt" become "pinch salt" after stop-word removal

In [9]:
phrases = Counter(ngrams(tokens_no_stop, 2))

for phrase, freq in phrases.most_common(40):
    print("{}\t{}".format(phrase, freq))

(u'Chagall', u"'s")	399
(u'Marc', u'Chagall')	137
(u'New', u'York')	111
(u'Chamber', u'Theater')	110
(u'Yiddish', u'Theater')	99
(u'Yiddish', u'theater')	81
(u'Yiddish', u'Chamber')	70
(u'Sholem', u'Aleichem')	70
(u'Jewish', u'Theater')	62
(u'``', u'The')	54
(u'Lanternshooter', u'Menakhem-Mendel')	53
(u'The', u'Russian')	48
(u'Texts', u'Documents')	47
(u'St.', u'Petersburg')	46
(u'``', u'Chagall')	46
(u'Chagall', u'The')	46
(u'Granovskii', u"'s")	44
(u"''", u'Russian')	44
(u"''", u'Yiddish')	44
(u"''", u'The')	42
(u'fictional', u'world')	42
(u"''", u'\u2014')	39
(u'I', u"n't")	39
(u'Russian', u'Years')	38
(u'Theater', u"''")	37
(u"''", u'``')	35
(u'The', u'Yiddish')	33
(u'I', u'would')	32
(u"Tret'iakov", u'Gallery')	31
(u'Solomon', u'R.')	31
(u'R.', u'Guggenheim')	31
(u'Chagall', u"''")	30
(u"n't", u'know')	29
(u'The', u'Jewish')	29
(u"''", u'In')	27
(u'Guggenheim', u'Museum')	27
(u'Benjamin', u'Harshav')	27
(u'State', u'Yiddish')	27
(u'Bakingfish', u'Lanternshooter')	26
(u'``', u'Marc

In [10]:
phrases = Counter(ngrams(tokens_no_stop, 3))

for phrase, freq in phrases.most_common(20):
    print("{}\t{}".format(phrase, freq))

(u'Yiddish', u'Chamber', u'Theater')	70
(u'Chagall', u'The', u'Russian')	39
(u'The', u'Russian', u'Years')	38
(u'Solomon', u'R.', u'Guggenheim')	31
(u'Marc', u'Chagall', u'The')	30
(u'Menakhem-Mendel', u'Lanternshooter', u'Menakhem-Mendel')	24
(u'Bakingfish', u'Lanternshooter', u'Menakhem-Mendel')	24
(u'State', u"Tret'iakov", u'Gallery')	24
(u'Lanternshooter', u'Menakhem-Mendel', u'Lanternshooter')	23
(u'``', u'Marc', u'Chagall')	23
(u'Chagall', u"'s", u'art')	21
(u'State', u'Jewish', u'Chamber')	19
(u'Jewish', u'Chamber', u'Theater')	19
(u'State', u'Yiddish', u'Chamber')	19
(u'Texts', u'Documents', u'1')	18
(u'Sholem', u'Aleichem', u'Evening')	17
(u'R.', u'Guggenheim', u'Museum')	17
(u'Vitali', u'Marc', u'Chagall')	17
(u'Sholem', u'Aleichem', u"'s")	17
(u'Chagall', u"'s", u'paintings')	17


### Part of Speech Tagging

*Notes on Instalation
pip install -U textblob
python -m textblob.download_corpora*

Identifying part of speech tags is much more complicated than simply mapping words to their part of speech tags. This is because POS tagging is not something that is generic. It is quite possible for a single word to have a different part of speech tag in different sentences based on different contexts. That is why it is impossible to have a generic mapping for POS tags.

*Futher [reading](https://medium.freecodecamp.org/an-introduction-to-part-of-speech-tagging-and-the-hidden-markov-model-953d45338f24)*

In [11]:
# With TextBlob
from textblob import TextBlob 
text_blob_object = TextBlob(booktext)

document_sentence = text_blob_object.sentences

# print(document_sentence)  
# print(len(document_sentence))  

for word, pos in text_blob_object.tags:  
    print(word + " => " + pos)

GUGGENHEIM => NNP
MUSEUM => NNP
Digitized => NNP
by => IN
the => DT
Internet => NNP
Arciiive => NNP
in => IN
2012 => CD
witii => NN
funding => NN
from => IN
IVIetropolitan => NNP
New => NNP
York => NNP
Library => NNP
Council => NNP
METRO => NN
http => NN
//archive.org/details/chagalljOOchag => JJ
Marc => NNP
Chagall => NNP
and => CC
the => DT
Jevs^ish => NNP
Theater => NNP
Marc => NNP
Chagall => NNP
and => CC
the => DT
JevN^ish => NNP
Theater => NNP
GUGGENHEIM => NNP
MUSEUM => NNP
©The => NNP
Solomon => NNP
R. => NNP
Guggenheim => NNP
Foundation => NNP
New => NNP
York => NNP
1992 => CD
All => NNP
rights => NNS
reserved => VBD
Reproductions => NNS
of => IN
cat => NN
nos => NNS
1-7 => JJ
© => NNP
State => NNP
Tret'iakov => NNP
Gallery => NNP
Moscow => NNP
Marc => NNP
Chagall => NNP
and => CC
the => DT
Je => NNP
& => CC
gt => NN
vish => JJ
Theater => NNP
Solomon => NNP
R. => NNP
Guggenheim => NNP
Museum => NNP
September => NNP
23 => CD
1992-January => JJ
17 => CD
1993 => CD
The => DT
Art 

Vrubel => NNP
's => POS
panel => NN
Priucess => NNP
Gnzci => NNP
The => DT
singular => JJ
history => NN
of => IN
Chagall => NNP
's => POS
murals => NNS
— => VBP
including => VBG
their => PRP$
creation => NN
in => IN
the => DT
extreme => JJ
conditions => NNS
of => IN
postrevolutionary => JJ
Russia => NNP
the => DT
nature => NN
of => IN
their => PRP$
materials => NNS
tempera => NN
and => CC
gouache => NN
on => IN
thin => JJ
linen => NN
their => PRP$
subsequent => JJ
fate => NN
determined => VBN
by => IN
dramatic => JJ
twists => NNS
and => CC
turns => NNS
such => JJ
as => IN
the => DT
theater => NN
's => POS
move => NN
from => IN
Bol'shoi => NNP
Chernyshevskii => NNP
Lane => NNP
to => TO
Malaia => NNP
Bronnaia => NNP
Street => NNP
and => CC
the => DT
far- => JJ
from-ideal => JJ
storage => NN
conditions => NNS
from => IN
1938 => CD
through => IN
the => DT
war => NN
— => NN
was => VBD
reflected => VBN
in => IN
their => PRP$
appearance => NN
The => DT
restorers => NNS
had => VBD
to => TO
dea

signification => NN
of => IN
the => DT
paintings => NNS
because => IN
the => DT
festival => NN
of => IN
Purim => NNP
commemorates => VBZ
the => DT
victory => NN
of => IN
Persian => JJ
Jews => NNPS
over => IN
Haman => NNP
a => DT
repressive => JJ
foe => NN
To => TO
Russian => JJ
Jews => NNP
who => WP
had => VBD
been => VBN
the => DT
victims => NNS
of => IN
suppression => NN
and => CC
persecution => NN
for => IN
centuries => NNS
Christian => JJ
tsars => NNS
and => CC
their => PRP$
governments => NNS
were => VBD
but => CC
latter-day => JJ
Hamans => NNP
While => IN
celebrating => VBG
the => DT
new => JJ
theater => NN
Chagall => NNP
may => MD
have => VB
used => VBN
his => PRP$
Purim-carnival => JJ
to => TO
rejoice => VB
at => IN
the => DT
troupe => NN
's => POS
victory => NN
over => IN
their => PRP$
Haman => NNP
that => WDT
is => VBZ
at => IN
the => DT
fall => NN
of => IN
the => DT
tsar => NN
who => WP
had => VBD
suppressed => VBN
Yiddish => JJ
theater => NN
In => IN
Introduction => NNP
to 

figure => NN
beside => IN
the => DT
house => NN
in => IN
the => DT
upper => JJ
left => JJ
section => NN
of => IN
the => DT
painting => NN
Music => NNP
In => IN
a => DT
1974 => CD
interview => NN
with => IN
Margit => NNP
Rowell => NNP
the => DT
artist => NN
described => VBD
this => DT
figure => NN
as => IN
a => DT
man => NN
qui => JJ
fait => NN
caca => NN
10 => CD
P => NNP
urimspieler => NN
are => VBP
itinerant => JJ
players => NNS
who => WP
perform => VBP
at => IN
Purim => NNP
Kampf => NNP
describes => VBZ
the => DT
Purim => NNP
atmosphere => NN
of => IN
the => DT
paintings => NNS
in => IN
Art => NNP
and => CC
Stage => NNP
Design => NNP
p. => VBZ
129 => CD
and => CC
The => DT
Quest => NNP
for => IN
a => DT
Jewish => JJ
Style => NNP
p. => VBZ
33 => CD
Amishai-Maisels => NNS
in => IN
Chagall => NNP
's => POS
Murals => NNS
for => IN
the => DT
State => NNP
Jewish => NNP
Chamber => NNP
Theatre => NNP
p. => VBZ
116 => CD
connects => VBZ
one => CD
of => IN
the => DT
acrobats => NNS
in => IN
C

though => IN
not => RB
in => IN
style => NN
to => TO
the => DT
nightclubs => NNS
or => CC
cabarets => NNS
decorated => VBN
by => IN
avant-garde => JJ
artists => NNS
that => WDT
were => VBD
such => PDT
a => DT
feature => NN
of => IN
artistic => JJ
life => NN
in => IN
Petrograd => NNP
and => CC
Moscow => NNP
Chagall => NNP
frequented => VBD
the => DT
Stray => NNP
Dog => NNP
and => CC
the => DT
Comedian => NNP
's => POS
Halt => NNP
in => IN
the => DT
capital => NN
when => WRB
he => PRP
lived => VBD
there => RB
from => IN
1915 => CD
to => TO
1917 => CD
he => PRP
must => MD
have => VB
also => RB
known => VBN
the => DT
Cafe => NNP
Pittoresque => NNP
and => CC
the => DT
Poets => NNP
' => POS
Cafe => NNP
which => WDT
opened => VBD
in => IN
Moscow => NNP
shortly => RB
before => IN
he => PRP
began => VBD
work => NN
on => IN
the => DT
murals => NNS
The => DT
Comedian => NNP
's => POS
Halt => NNP
provided => VBD
the => DT
closest => JJS
prototype => NN
its => PRP$
walls => NNS
were => VBD
covered 

reflect => VB
modern => JJ
life => NN
and => CC
its => PRP$
multiple => JJ
means => NNS
of => IN
communication => NN
Apollinaire => NNP
used => VBD
similar => JJ
sources => NNS
and => CC
juxtaposed => VBD
his => PRP$
images => NNS
in => IN
this => DT
seemingly => RB
random => JJ
fashion => NN
in => IN
the => DT
poems => NN
he => PRP
named => VBD
Calligranniies => NNP
where => WRB
the => DT
printed => JJ
words => NNS
not => RB
only => RB
carry => VB
their => PRP$
expected => JJ
meanings => NNS
but => CC
are => VBP
clustered => VBN
together => RB
on => IN
the => DT
page => NN
in => IN
novel => JJ
arrangements => NNS
to => TO
form => VB
literal => JJ
pictures => NNS
He => PRP
defended => VBD
them => PRP
from => IN
the => DT
charge => NN
that => IN
they => PRP
were => VBD
incomprehensible => JJ
as => IN
written => VBN
language => NN
by => IN
saying => VBG
that => IN
the => DT
fragments => NNS
of => IN
language => NN
were => VBD
now => RB
tied => VBN
together => RB
by => IN
an => DT
ideogra

and => CC
conventional => JJ
Cubism => NNP
may => MD
seem => VB
remote => VB
as => IN
Shterenberg => NNP
's => POS
composition => NN
of => IN
a => DT
tabletop => NN
with => IN
a => DT
dish => NN
and => CC
a => DT
bread => NN
roll => NN
is => VBZ
nearly => RB
abstract => JJ
but => CC
the => DT
principal => JJ
feature => NN
of => IN
both => DT
works => NNS
is => VBZ
an => DT
area => NN
of => IN
thick => JJ
white => JJ
paint => NN
worked => VBN
in => IN
places => NNS
with => IN
a => DT
house-painter => NN
's => POS
graining => NN
comb => NN
This => DT
refers => VBZ
to => TO
the => DT
Polemical => JJ
Supplement => NN
that => WDT
Aksenov => NNP
had => VBD
added => VBN
to => TO
his => PRP$
account => NN
of => IN
Picasso => NNP
's => POS
art => NN
in => IN
which => WDT
he => PRP
discussed => VBD
the => DT
artist => NN
's => POS
use => NN
of => IN
texture => NN
mentioning => VBG
the => DT
use => NN
of => IN
such => JJ
a => DT
comb => NN
* => NN
These => DT
three => CD
Russian => JJ
artists => 

section => NN
of => IN
the => DT
mural => NN
Here => RB
— => VBZ
literally => RB
at => IN
a => DT
higher => JJR
level => NN
— => NNP
Shakespeare => NNP
's => POS
Hamlet => NNP
for => IN
such => JJ
is => VBZ
the => DT
figure => NN
behind => IN
the => DT
cow^° => NN
points => VBZ
back => RB
to => TO
the => DT
dancers => NNS
at => IN
a => DT
Hasidic => NNP
wedding => NN
which => WDT
Chagall => NNP
also => RB
describes => VBZ
in => IN
My => NNP
Life => NNP
' => POS
In => IN
the => DT
foreground => NN
the => DT
artist => NN
's => POS
wife => NN
Bella => NNP
herself => PRP
an => DT
aspiring => VBG
actress => NN
and => CC
their => PRP$
little => JJ
daughter => NN
Ida => NNP
applaud => VBZ
this => DT
new => JJ
world => NN
that => WDT
extends => VBZ
to => TO
the => DT
very => RB
edge => NN
of => IN
the => DT
mural => NN
In => IN
the => DT
bottom-right => JJ
corner => NN
it => PRP
ends => VBZ
in => IN
a => DT
witty => JJ
image => NN
of => IN
a => DT
Jewish => JJ
villager => NN
the => DT
artist =

cat => NN
no => DT
3 => CD
pp => NN
155-56 => JJ
Russian => JJ
Wedding => NNP
1909 => CD
cat => NN
no => DT
9 => CD
p. => RB
160 => CD
Birth => NNP
1910 => CD
cat => NN
no => DT
10 => CD
pp => NN
160—61 => CD
The => DT
Dead => NNP
Man => NNP
is => VBZ
now => RB
in => IN
the => DT
collection => NN
of => IN
the => DT
Musee => NNP
national => JJ
d'art => NN
moderne => NN
Centre => NNP
Georges => NNP
Pompidou => NNP
Paris => NNP
The => DT
exhibition => NN
of => IN
works => NNS
by => IN
pupils => NNS
at => IN
the => DT
Zvantseva => NNP
School => NNP
took => VBD
place => NN
in => IN
the => DT
offices => NNS
oi => VBP
Apollon => NNP
April => NNP
20-May => CD
9 => CD
1910 => CD
30 => CD
V. => NNP
Ivanov => NNP
Dve => NNP
stikhii => NN
v => NN
sovremennom => NN
simvolizme => NN
Two => CD
Elements => NNS
of => IN
Contemporary => NNP
Symbolism => NNP
in => IN
Zolotoe => NNP
riinn => NN
April/May => NNP
1908 => CD
31 => CD
Birth => NNP
Kunsthaus => NNP
Zurich => NNP
is => VBZ
discussed => VBN
turt

Tami => NNP
Katz-Frieman => NNP
'Tounding => VBG
the => DT
Tel => NNP
Aviv => NNP
Museum => NNP
1930-36 => CD
The => DT
Tel => NNP
Aviv => NNP
Museum => NNP
Annual => NNP
Review => NNP
vol => NN
i => NN
1982 => CD
A => DT
summary => NN
is => VBZ
given => VBN
in => IN
the => DT
Chronology => NNP
in => IN
Compton => NNP
Chagall => NNP
pp => NN
264-65 => JJ
84 => CD
The => DT
Musee => NNP
national => JJ
message => NN
biblique => NN
Marc => NNP
Chagall => NNP
opened => VBD
in => IN
Nice => NNP
in => IN
1973 => CD
to => TO
house => NN
a => DT
permanent => JJ
collection => NN
of => IN
Chagall => NNP
's => POS
biblical => JJ
paintings => NNS
it => PRP
also => RB
includes => VBZ
a => DT
space => NN
for => IN
temporary => JJ
exhibitions => NNS
and => CC
a => DT
concert/lecture => NN
hall => NN
85 => CD
Alexander => NNP
'Vetrov => NNP
On => IN
Chagall => NNP
from => IN
Ekran => NNP
November => NNP
1921 => CD
Jerry => NNP
Payne => NNP
trans. => NN
in => IN
'Vitali => CD
Marc => NNP
Chagall => NNP

of => IN
Art => NNP
movement => NN
in => IN
the => DT
beginning => NN
of => IN
the => DT
century => NN
In => IN
Chagall => NNP
's => POS
work => NN
the => DT
seemingly => RB
disparate => JJ
components => NNS
fuse => VBP
into => IN
one => CD
functional => JJ
unity => NN
in => IN
each => DT
painting => NN
often => RB
in => IN
an => DT
asymmetrical => JJ
uneasy => JJ
but => CC
ultimately => RB
justified => JJ
balance => NN
Chagall => NNP
's => POS
paintings => NNS
have => VBP
perhaps => RB
unfairly => RB
been => VBN
judged => VBN
from => IN
a => DT
perspective => NN
of => IN
purism => NN
— => IN
the => DT
ascetic => JJ
use => NN
of => IN
a => DT
well- => JJ
delimited => JJ
discourse => NN
of => IN
art => NN
Thus => RB
they => PRP
have => VBP
been => VBN
termed => VBN
eclectic. => JJ
Today => NN
in => IN
an => DT
age => NN
of => IN
postmodernism => NN
we => PRP
can => MD
surely => RB
question => VB
the => DT
validity => NN
of => IN
a => DT
pure => NN
language => NN
as => IN
the => DT
highe

Chagall => NNP
was => VBD
too => RB
eclectic => JJ
and => CC
thematic => JJ
or => CC
objective => JJ
to => TO
use => VB
Malevich => NNP
's => POS
negative => JJ
term => NN
for => IN
their => PRP$
taste => NN
Thus => RB
he => PRP
came => VBD
to => TO
Moscow => NNP
and => CC
accepted => VBD
the => DT
commission => NN
for => IN
the => DT
new => JJ
State => NNP
Yiddish => NNP
Chamber => NNP
Theater => NNP
Fictional => JJ
world => NN
as => IN
a => DT
language => NN
of => IN
art => NN
Like => IN
the => DT
ideal => NN
of => IN
pure => JJ
poetry => NN
pure => JJ
art => NN
to => TO
the => DT
avant-garde => JJ
meant => NN
the => DT
acceptance => NN
of => IN
one => CD
language => NN
that => WDT
dominated => VBD
each => DT
work => NN
In => IN
their => PRP$
period => NN
of => IN
Analytic => JJ
Cubism => NNP
for => IN
example => NN
Picasso => NNP
and => CC
Braque => NNP
made => VBD
each => DT
painting => VBG
according => VBG
to => TO
the => DT
Cubist => NNP
views => NNS
of => IN
perception => NN
and

is => VBZ
based => VBN
on => IN
his => PRP$
life => NN
and => CC
fictional => JJ
world => NN
treatment => NN
of => IN
those => DT
rather => RB
than => IN
geometric => JJ
forms => NNS
became => VBD
the => DT
focus => NN
of => IN
critical => JJ
attention => NN
thus => RB
deformations => NNS
of => IN
figures => NNS
are => VBP
discussed => VBN
not => RB
for => IN
their => PRP$
own => JJ
sake => NN
as => IN
forms => NNS
in => IN
space => NN
and => CC
color => NN
is => VBZ
discussed => VBN
not => RB
for => IN
its => PRP$
autonomous => JJ
expressive => JJ
values => NNS
but => CC
for => IN
what => WP
they => PRP
do => VBP
to => TO
the => DT
fictional => JJ
world => NN
A => DT
scholar => NN
of => IN
Chagall => NNP
's => POS
art => NN
is => VBZ
compelled => VBN
to => TO
study => VB
his => PRP$
biography => NN
and => CC
cultural => JJ
context => NN
as => IN
primary => JJ
sources => NNS
of => IN
his => PRP$
language => NN
iconography => NN
and => CC
meaning => NN
Jewish => JJ
elements => NNS
too =

analogy => NN
Specific => JJ
characteristics => NNS
of => IN
that => DT
Jewish => JJ
discourse => NN
were => VBD
influenced => VBN
by => IN
an => DT
intersection => NN
of => IN
communicational => JJ
habits => NNS
from => IN
both => DT
the => DT
learned => VBN
tradition => NN
and => CC
the => DT
existential => JJ
situation => NN
of => IN
the => DT
Jews => NNP
For => IN
example => NN
the => DT
inferiority-superiority => NN
complex => NN
of => IN
Kafka => NNP
's => POS
protagonists => NNS
and => CC
of => IN
Chagall => NNP
's => POS
self-understanding => NN
came => VBD
from => IN
the => DT
religious => JJ
notion => NN
that => IN
the => DT
Jews => NNPS
are => VBP
the => DT
chosen => JJ
people => NNS
on => IN
the => DT
one => CD
hand => NN
combined => VBN
with => IN
their => PRP$
actual => JJ
existential => JJ
condition => NN
of => IN
a => DT
fallen => JJ
aristocracy => NN
of => IN
the => DT
mind => NN
chosen => NN
for => IN
suffering => VBG
as => RB
well => RB
on => IN
the => DT
other => JJ

Picasso => NNP
the => DT
basic => JJ
catastrophe => NN
— => VBD
the => DT
defiance => NN
of => IN
normal => JJ
human => JJ
anatomy => NN
— => NN
is => VBZ
joined => VBN
by => IN
secondary => JJ
catastrophes => NNS
and => CC
willful => JJ
deformations => NNS
such => JJ
as => IN
the => DT
house => NN
that => IN
defies => VBZ
gravity => NN
and => CC
runs => VBZ
off => RP
as => IN
a => DT
fox => NN
and => CC
the => DT
topsy-turvy => JJ
abstraction => NN
of => IN
the => DT
house => NN
in => IN
the => DT
upper-left => JJ
corner => NN
looking => VBG
down => RP
on => IN
it => PRP
all => DT
in => IN
an => DT
inverted => JJ
perspective => NN
Such => JJ
willful => JJ
catastrophes => NNS
drastic => JJ
breaks => NNS
in => IN
a => DT
normal => JJ
realistic => JJ
flow => NN
appear => VBP
in => IN
many => JJ
forms => NNS
and => CC
were => VBD
Chagall => NNP
's => POS
trademark => NN
from => IN
the => DT
beginning => NN
Indeed => RB
Chagall => NNP
was => VBD
a => DT
lyrical => JJ
joker => NN
an => DT
o

circle => NN
with => IN
inverted => JJ
red => JJ
and => CC
white => JJ
colors => NNS
intersecting => VBG
the => DT
major => JJ
one => CD
but => CC
not => RB
related => VBN
to => TO
any => DT
presented => JJ
figures => NNS
Furthermore => RB
the => DT
geometric => JJ
principle => NN
is => VBZ
repeated => VBN
in => IN
several => JJ
semicircular => JJ
lines => NNS
drawn => VBN
in => IN
a => DT
nonconcentric => JJ
manner => NN
That => DT
includes => VBZ
two => CD
intersecting => NN
ovoids => NNS
through => IN
the => DT
calf => NN
's => POS
head => NN
as => RB
well => RB
as => IN
several => JJ
continuations => NNS
of => IN
one => CD
oval => NN
in => IN
almost-straight => JJ
lines => NNS
Some => DT
of => IN
the => DT
outlines => NNS
of => IN
the => DT
objects => NNS
and => CC
figures => NNS
are => VBP
also => RB
involved => VBN
in => IN
the => DT
geometric => JJ
network => NN
like => IN
verbal => JJ
ambiguities => NNS
in => IN
poetry => NN
they => PRP
participate => VBP
in => IN
both => DT
sy

artist => NN
— => NN
to => TO
paint => VB
the => DT
auditorium => NN
of => IN
the => DT
theater => NN
— => NNP
challenged => VBD
him => PRP
to => TO
a => DT
dialogue => NN
between => IN
the => DT
auditorium => NN
and => CC
the => DT
stage => NN
The => DT
auditorium => NN
presumably => RB
embodying => VBG
life => NN
itself => PRP
life => NN
standing => VBG
against => IN
the => DT
stage => NN
speaks => VBZ
in => IN
Chagall => NNP
's => POS
Introduction => NN
to => TO
the => DT
Yiddish => JJ
Theater => NN
about => IN
the => DT
ever-theatrical => JJ
nature => NN
of => IN
mundane => JJ
life => NN
itself => PRP
about => IN
life => NN
itself => PRP
being => VBG
drunk => VBN
on => IN
the => DT
elixir => NN
of => IN
theatricality => NN
about => IN
Chagall => NNP
's => POS
paintings => NNS
connecting => VBG
the => DT
ties => NNS
between => IN
the => DT
Harlequinade => NNP
of => IN
the => DT
stage => NN
and => CC
the => DT
Harlequinade => NNP
of => IN
mundane => NN
life. => NN
Chagall => NNP
hate

last => JJ
name => NN
in => IN
inverted => JJ
Yiddish => JJ
— => NN
or => CC
even => RB
MRAC => NNP
with => IN
the => DT
M => NNP
and => CC
the => DT
C => NNP
in => IN
Latin => NNP
letters => NNS
and => CC
the => DT
AR => NNP
in => IN
inverted => JJ
Yiddish => NN
The => DT
names => NNS
on => IN
the => DT
mural => JJ
thus => RB
signal => VBP
the => DT
viewer => NN
's => POS
eyes => NNS
to => TO
move => VB
in => IN
the => DT
direction => NN
of => IN
the => DT
stage => NN
and => CC
in => IN
the => DT
direction => NN
of => IN
Russian => JJ
culture.^ => NN
The => DT
circus => NN
of => IN
intersecting => VBG
circles => NNS
stripes => NNS
sections => NNS
and => CC
triangles => NNS
acts => NNS
as => IN
the => DT
ground => NN
under => IN
the => DT
figures => NNS
It => PRP
also => RB
usurps => VBD
the => DT
role => NN
of => IN
perspective => NN
Chagall => NNP
painted => VBD
little => JJ
figures => NNS
that => WDT
emerge => VBP
from => IN
between => IN
the => DT
planes => NNS
thus => RB
creating 

the => DT
theater => NN
's => POS
first => JJ
composer => NN
Lev => NNP
Pulver => NNP
The => DT
musicians => NNS
are => VBP
actors => NNS
This => DT
is => VBZ
a => DT
realization => NN
of => IN
a => DT
metaphor => NN
in => IN
Yiddish => NNP
zey => NNP
shpiln => NN
means => VBZ
both => DT
they => PRP
play => VBP
music => NN
as => RB
well => RB
as => IN
they => PRP
perform => VBP
in => IN
a => DT
play => NN
A => DT
goat => NN
Chagall => NNP
's => POS
representative => NN
it => PRP
was => VBD
also => RB
painted => VBN
on => IN
the => DT
curtains => NNS
opens => VBZ
the => DT
performance => NN
The => DT
bodies => NNS
of => IN
the => DT
actors => NNS
are => VBP
deformed => VBN
in => IN
a => DT
Chagallian => JJ
manner => NN
limbs => NNS
scattered => VBD
in => IN
space => NN
figs => NNS
II-I2 => NNP
Details => NNP
r => NN
/ => CC
Introduction => NN
to => TO
the => DT
Yiddish => JJ
Theater => NNP
cat => NN
no => DT
i => NN
Chagall => NNP
's => POS
Murals => NNP
I => PRP
33 => CD
One => CD
can 

three => CD
semicircles => NNS
as => IN
an => DT
underlying => JJ
structure => NN
There => EX
are => VBP
additional => JJ
devices => NNS
to => TO
unify => VB
the => DT
murals => NNS
In => IN
both => DT
a => DT
dynamic => JJ
pulls => VBZ
the => DT
viewer => NN
with => IN
it => PRP
As => IN
Arnheim => NNP
notes => NNS
Any => DT
compositional => JJ
movement => NN
toward => IN
the => DT
left => NN
runs => NNS
against => IN
the => DT
tide => NN
because => IN
for => IN
psychological => JJ
reasons => NNS
the => DT
observer => NN
's => POS
glance => NN
proceeds => NNS
freely => RB
from => IN
left => VBN
to => TO
right => VB
whereas => NNS
it => PRP
is => VBZ
impeded => VBN
in => IN
the => DT
opposite => JJ
direction => NN
* => NN
In => IN
Picasso => NNP
's => POS
mural => JJ
there => EX
is => VBZ
a => DT
movement => NN
of => IN
the => DT
wave => NN
of => IN
figures => NNS
directed => VBN
toward => IN
the => DT
bull => NN
on => IN
the => DT
left => NN
counteracting => VBG
the => DT
normal => JJ

objects => NNS
In => IN
April => NNP
1920 => CD
Terevsat => NNP
moved => VBD
to => TO
Moscow => NNP
Chagall => NNP
came => VBD
a => DT
month => NN
later => RB
^ => NN
In => IN
June => NNP
1922 => CD
Meierkhol => NNP
'd => POS
became => VBD
its => PRP$
director => NN
and => CC
the => DT
theater => NN
was => VBD
renamed => VBN
Theater => NNP
of => IN
the => DT
Revolution => NNP
at => IN
the => DT
Moscow => NNP
Soviet => NNP
Chagall => NN
designed => VBD
one => CD
more => JJR
production => NN
for => IN
Terevsat => NNP
Comrad => NNP
Khlestakov => NNP
a => DT
Revolutionary => NNP
parody => NN
with => IN
allusions => NNS
to => TO
Gogol => NNP
' => POS
The => DT
Yiddish => JJ
Theater => NNP
moved => VBD
to => TO
Moscow => NNP
in => IN
the => DT
fall => NN
of => IN
1920 => CD
and => CC
in => IN
November => NNP
Chagall => NNP
plunged => VBD
into => IN
the => DT
immense => NN
project => NN
His => PRP$
stage => NN
sets => NNS
were => VBD
truly => RB
minimal => JJ
for => IN
lack => NN
of => IN
mea

Introduction => NNP
to => TO
the => DT
Yiddish => NNP
Theater => NNP
Kultur-Lige => NNP
emerged => VBD
during => IN
the => DT
Civil => NNP
War => NNP
when => WRB
Kiev => NNP
itself => PRP
was => VBD
shifting => VBG
from => IN
one => CD
power => NN
to => TO
another => DT
The => DT
war => NN
and => CC
the => DT
exterminating => VBG
pogroms => NNS
of => IN
1919 => CD
when => WRB
about => IN
a => DT
hundred => CD
thousand => CD
Jews => NNPS
were => VBD
slaughtered => VBN
and => CC
several => JJ
hundred => CD
thousand => NNS
were => VBD
exiled => VBN
from => IN
their => PRP$
towns => NNS
were => VBD
hard => RB
on => IN
the => DT
Kiev => NNP
center => NN
but => CC
its => PRP$
ideas => NNS
were => VBD
shared => VBN
by => IN
other => JJ
centers => NNS
in => IN
Russia => NNP
and => CC
in => IN
the => DT
newly => RB
re-established => JJ
Poland => NNP
With => IN
the => DT
same => JJ
spirit => NN
that => WDT
led => VBD
to => TO
Kultur-Lige => NNP
s => NN
formation => NN
a => DT
Jewish => JJ
Theate

were => VBD
we => PRP
— => VBP
lonely => JJ
dreamers => NNS
with => IN
unclear => JJ
strivings => NNS
what => WP
did => VBD
we => PRP
bring => VB
with => IN
us => PRP
— => VBP
except => IN
for => IN
oppressed => JJ
and => CC
bound => JJ
limbs => NNS
and => CC
internal => JJ
tightness => NN
complete => JJ
ignorance => NN
and => CC
helplessness => NN
in => IN
stage => NN
work => NN
and => CC
stage => NN
technique => NN
— => NNP
nothing => NN
Yet => RB
one => CD
thing => NN
each => DT
of => IN
us => PRP
had => VBD
— => VBN
fiery => NN
will => MD
and => CC
readiness => VB
for => IN
sacrifice => NN
And => CC
our => PRP$
leader => NN
told => VBD
us => PRP
it => PRP
was => VBD
enough => RB
For => IN
Granovskii => NNP
Man => NNP
was => VBD
but => CC
one => CD
of => IN
the => DT
elements => NNS
of => IN
a => DT
stage => NN
production => NN
along => IN
with => IN
the => DT
script => NN
the => DT
music => NN
the => DT
sets => NNS
and => CC
the => DT
lighting => NN
But => CC
as => IN
Mikhoels => N

his => PRP$
art => NN
however => RB
came => VBD
from => IN
Chagall => NN
the => DT
painter => NN
was => VBD
the => DT
source => NN
of => IN
the => DT
tragicomic => JJ
perception => NN
of => IN
the => DT
absurdity => NN
of => IN
Jewish => NNP
and => CC
general => JJ
human => NN
existence => NN
evoked => VBD
through => IN
a => DT
demonstrative => JJ
antirealism => NN
Almost => RB
from => IN
the => DT
beginning => NN
the => DT
aloof => NN
Granovskii => NNP
charged => VBD
Mikhoels => NNP
with => IN
conducting => VBG
the => DT
daily => JJ
work => NN
of => IN
the => DT
troupe => NN
Mikhoels => NNP
was => VBD
stage => JJ
director => NN
and => CC
before => IN
each => DT
production => NN
he => PRP
announced => VBD
a => DT
competition => NN
for => IN
each => DT
role => NN
Mikhoels => NNS
read => VBP
to => TO
them => PRP
from => IN
the => DT
newly => RB
published => VBN
muhivolume => JJ
Jewish => JJ
Encyclopedia => NNP
in => IN
Russian => NNP
Mikhoels => NNP
's => POS
enthusiastic => JJ
conversio

are => VBP
often => RB
mentioned => VBN
together => RB
A => DT
group => NN
of => IN
young => JJ
people => NNS
gathered => VBN
before => IN
World => NNP
War => NNP
I => PRP
in => IN
Bialystok => NNP
to => TO
create => VB
a => DT
new => JJ
kind => NN
oi => NN
bima => NN
the => DT
Hebrew => NNP
word => NN
for => IN
stage => NN
ha => NN
is => VBZ
the => DT
article => NN
in => IN
order => NN
to => TO
perform => VB
in => IN
the => DT
budding => NN
48 => CD
Benjamin => NNP
Harshav => NNP
^■■^^^H => NNP
^^^^^^^^^H => NNP
.jj^riitfil^A.ih^^^^^^^^^^^^^l => NNP
^^^^^^^^^^^^^^^^^^^^^^^^ => NNP
'^ => POS
^-i^^^^^^^^^^^^^l => JJ
Hr^ => NNP
] => NNP
r => NN
j^^^^y => NN
^^^K^^^^^^^K => NNP
* => NNP
.^ => NNP
.^^^^^^^^^^^^^^^^^^^^^^^^^^1 => NNP
fig => NN
22 => CD
Mikhoels => NNS
and => CC
Chagall => NNP
at => IN
thar => NN
last => JJ
meeting => NN
in => IN
New => NNP
York => NNP
City => NNP
in => IN
1944 => CD
The => DT
Yiddish => JJ
Theater => NNP
I => PRP
49 => CD
spoken => JJ
Hebrew => NNP
language

unlike => IN
the => DT
village => NN
the => DT
small => JJ
town => NN
had => VBD
no => DT
land => NN
or => CC
peasants => NNS
In => IN
Yiddish => NNP
the => DT
concept => NN
of => IN
village => NN
connoted => VBD
something => NN
entirely => RB
different => JJ
it => PRP
was => VBD
a => DT
place => NN
inhabited => VBN
by => IN
mostly => RB
illiterate => JJ
Christian => JJ
peasants => NNS
working => VBG
the => DT
land => NN
with => IN
little => JJ
contact => NN
to => TO
any => DT
modern => JJ
technology => NN
or => CC
industry => NN
who => WP
would => MD
come => VB
to => TO
the => DT
town => NN
market => NN
get => VB
drunk => NN
and => CC
exhibit => VB
their => PRP$
physical => JJ
prowess => NN
In => IN
Yiddish => NNP
the => DT
peasants => NNS
were => VBD
caWed => JJ
poyerim => JJ
or => CC
the => DT
synonymous => JJ
goyim => NN
gentiles => NNS
Just => RB
as => IN
the => DT
English => NNP
language => NN
distinguishes => NNS
between => IN
genders => NNS
for => IN
example => NN
waiter => NN


his => PRP$
slow => JJ
speech => NN
in => IN
which => WDT
the => DT
frowning => NN
of => IN
forehead => NN
and => CC
lips => NNS
or => CC
fuzzy => VB
sounds => NNS
and => CC
pauses => NNS
play => VBP
the => DT
role => NN
oj => NN
words => NNS
and => CC
concepts => NNS
— => VBP
you => PRP
imagine => VB
his => PRP$
painting => NN
as => IN
hesitant => NN
and => CC
careful => NN
Such => JJ
are => VBP
the => DT
first => JJ
canvases => NNS
of => IN
Chagall => NNP
's => POS
Cultural => NNP
Context => NNP
foreigners => NNS
in => IN
Paris => NNP
who => WP
try => VBP
to => TO
hold => VB
the => DT
brush => NN
in => IN
the => DT
French => JJ
manner => NN
Chagall => NNP
was => VBD
anything => NN
but => CC
hesitant => NN
but => CC
this => DT
describes => VBZ
his => PRP$
basic => JJ
situation => NN
too => RB
he => PRP
was => VBD
like => IN
Shterenberg => NNP
a => DT
typical => JJ
member => NN
of => IN
his => PRP$
generation => NN
and => CC
could => MD
not => RB
speak => VB
properly => RB
in => IN
any

to => TO
design => VB
its => PRP$
set => NN
Subsequently => RB
the => DT
theater => NN
itself => PRP
became => VBD
famous => JJ
and => CC
Chagall => NNP
drew => VBD
on => IN
its => PRP$
fame => NN
Chagall => NNP
felt => VBD
himself => PRP
to => TO
be => VB
among => IN
the => DT
best => JJS
in => IN
the => DT
art => NN
world => NN
yet => CC
he => PRP
retained => VBD
the => DT
inferiority => NN
complex => NN
of => IN
a => DT
Jewish => JJ
boy => NN
from => IN
the => DT
provinces => NNS
Witness => NNP
how => WRB
in => IN
1952 => CD
after => IN
major => JJ
solo => NN
exhibitions => NNS
in => IN
New => NNP
York => NNP
Paris => NNP
and => CC
London => NNP
he => PRP
wrote => VBD
with => IN
amazement => NN
and => CC
pride => NN
to => TO
Opatoshu => VB
that => IN
he => PRP
was => VBD
going => VBG
to => TO
marry => VB
a => DT
woman => NN
of => IN
the => DT
rich => JJ
Brodskii => NNP
family => NN
from => IN
Kiev => NNP
On => IN
Chagall => NNP
's => POS
evolution => NN
With => IN
time => NN
Chagall

the => DT
first => JJ
two => CD
words => NNS
mean => VBP
for => IN
smokers => NNS
in => IN
Yiddish => NN
in => IN
the => DT
Russian => JJ
train => NN
the => DT
last => JJ
part => NN
is => VBZ
an => DT
inversion => NN
of => IN
the => DT
Russian => NNP
label => NN
III => NNP
CL => NNP
[ => NNP
ASS => NNP
written => VBN
in => IN
the => DT
right => JJ
direction => NN
but => CC
in => IN
the => DT
wrong => JJ
language => NN
42 => CD
Avram => NNP
Kampf => NNP
suggests => VBZ
that => IN
a => DT
perspective => NN
of => IN
sentiment => NN
governs => VBZ
the => DT
picture => NN
that => IN
which => WDT
looms => VBZ
large => JJ
in => IN
one => CD
's => POS
memor => NN
appears => VBZ
huge => JJ
on => IN
the => DT
canvas => NN
Chagall => NNP
in => IN
the => DT
'Yiddish => JJ
Theater => NNP
in => IN
Vitali => NNP
Marc => NNP
Chagall => NNP
The => DT
Russian => JJ
Years => NNP
1906-1922 => CD
p. => VBZ
98 => CD
Perhaps => RB
such => PDT
a => DT
principle => NN
helps => VBZ
the => DT
writer => NN
ration

statistics => NNS
from => IN
the => DT
census => NN
of => IN
1897 => CD
The => DT
figures => NNS
here => RB
are => VBP
based => VBN
on => IN
the => DT
relevant => JJ
entries => NNS
in => IN
the => DT
Russian => JJ
Evreiskaia => NNP
entsiklopediia => NN
Jewish => JJ
encyclopedia => NN
St. => NNP
Petersburg => NNP
Brockhaus => NNP
and => CC
Ephron => NNP
1915 => CD
and => CC
Ch => NNP
Shmeruk => NNP
's => POS
dissertation => NN
1961 => CD
98 => CD
Shml => NNP
was => VBD
derived => VBN
from => IN
the => DT
Polish => JJ
mjasteczko => NN
small => JJ
city => NN
from => IN
rnjasto => NN
city => NN
then => RB
adopted => VBN
in => IN
Russian => JJ
as => IN
mestechko => NN
In => IN
many => JJ
English => NNP
books => NNS
including => VBG
autobiographies => NNS
and => CC
translations => NNS
shtetl => NN
is => VBZ
misleadingly => RB
translated => VBN
as => IN
village => NN
In => IN
some => DT
English => JJ
titles => NNS
given => VBN
to => TO
Chagall => NNP
's => POS
paintings => NNS
we => PRP
find 

attack => NN
Why => WRB
do => VBP
we => PRP
find => VB
no => DT
trace => NN
of => IN
these => DT
interpretations => NNS
in => IN
the => DT
memoirs => NNS
of => IN
Efros => NNP
who => WP
lectured => VBD
about => IN
the => DT
paintings => NNS
in => IN
the => DT
theater => NN
itselP => NN
Let => VB
us => PRP
analyze => VB
several => JJ
specific => JJ
claims => NNS
of => IN
this => DT
interpretation => NN
In => IN
her => PRP$
reading => NN
of => IN
the => DT
Introduction => NNP
through => IN
some => DT
leaps => NNS
of => IN
logic => NN
the => DT
author => NN
identifies => VBZ
the => DT
green => JJ
Chagallian => NNP
animal => NN
at => IN
the => DT
left => NN
as => IN
Malevich => NNP
's => POS
cow => NN
because => IN
in => IN
1913 => CD
Kazimir => NNP
Malevich => NNP
made => VBD
a => DT
painting => NN
with => IN
a => DT
cow => NN
and => CC
a => DT
violin => NN
along => IN
with => IN
some => DT
other => JJ
things => NNS
we => PRP
may => MD
add => VB
But => CC
Malevich => NNP
's => POS
cow => 

if/^ => NN
s => NN
^ => NNP
' => POS
V => NNP
:- => JJ
tl => NN
Study => NNP
for => IN
the => DT
Jev/ish => NNP
Theater => NNP
1920 => CD
Pencil => NNP
and => CC
watercolor => NN
on => IN
paper => NN
verso => NN
11.^ => CD
26.8 => CD
cm => NN
4 => CD
'A => POS
X => NN
10 => CD
'A => JJ
inches => NNS
Aluse'e => NNP
national => JJ
d'art => NN
moderne => NN
Centre => NNP
Georges => NNP
Pompidou => NNP
Paris => NNP
3 => CD
1 => CD
\ => CC
10 => CD
10 => CD
The => DT
Green => JJ
Violinist => NNP
ca => MD
1^20 => CD
Pencil => NNP
and => CC
waternlor => NN
on => IN
paper => NN
32 => CD
X => NNP
22 => CD
an => DT
12 => CD
Vs => NNP
X => NNP
8Vs => CD
inches => NNS
Collection => NNP
Ida => NNP
Chagall => NNP
Paris => NNP
^y'^ => NNP
^ => VBZ
11 => CD
The => DT
Green => JJ
Violinist => NNP
study => NN
for => IN
Musicj => NNP
ip20 => NN
Pencil => NNP
gouache => NN
and => CC
watercotor => NN
on => IN
brown => JJ
paper => NN
24.7 => CD
X => NNP
/J.J => NNP
cm => NN
9Y4 => CD
X => NNP
s'A => NN
inch

two => CD
human => JJ
tigures => NNS
protrude => VBP
from => IN
under => IN
his => PRP$
hooves => NNS
The => DT
head => NN
ot => IN
an => DT
old => JJ
woman => NN
has => VBZ
leaped => VBN
oft => JJ
and => CC
is => VBZ
tlying => VBG
upward => RB
and => CC
the => DT
headless => NN
body => NN
swittly => RB
sinks => VBZ
down => RB
to => TO
a => DT
cow => NN
standing => VBG
on => IN
the => DT
root => NN
ot => VBD
a => DT
house => NN
And => CC
the => DT
girl => NN
with => IN
a => DT
bouquet => NN
— => VBZ
a => DT
boy => NN
glued => NN
to => TO
her => PRP$
lips => NNS
folded => VBD
up => RP
in => IN
the => DT
air => NN
around => IN
her => PRP$
head => NN
like => IN
a => DT
cat => NN
hurled => VBN
upward => RB
an => DT
ox => NN
has => VBZ
a => DT
man => NN
's => POS
jacket => NN
and => CC
human => JJ
hands => NNS
and => CC
sits => VBZ
pensively => RB
leaning => VBG
on => IN
his => PRP$
elbow => NN
between => IN
two => CD
bare => JJ
feet => NNS
dangling => VBG
trom => IN
his => PRP$
shoulders =

candles => NNS
a => DT
giant => JJ
gravedigger => NN
raises => VBZ
his => PRP$
shovel => NN
a => DT
woman => NN
spreads => VBZ
her => PRP$
hands => NNS
high => JJ
and => CC
above => IN
them => PRP
all => DT
astride => IN
the => DT
roof => NN
of => IN
a => DT
house => NN
a => DT
strange => JJ
Jew => NNP
bent => NN
over => IN
his => PRP$
violin => NN
draws => VBZ
a => DT
melody => NN
— => NN
in => IN
harmony => NN
with => IN
the => DT
wind => NN
howling => VBG
under => IN
the => DT
glowering => NN
sky => NN
tearing => VBG
up => RP
the => DT
clouds => NN
and => CC
shaking => VBG
the => DT
eaves => NNS
while => IN
over => IN
the => DT
huts => NNS
a => DT
shoe => NN
and => CC
sock => NN
hang => NN
instead => RB
of => IN
si^ns => NN
1 => CD
36 => CD
Texts => NNP
and => CC
Documents => NNP
It => PRP
is => VBZ
amazing => VBG
that => IN
even => RB
in => IN
those => DT
early => JJ
years => NNS
Chagall => NNP
uses => VBZ
color => NN
and => CC
hue => NN
as => IN
means => NNS
oi => IN
characterizin

after => IN
a => DT
hard => JJ
earthly => NN
road => NN
he => PRP
now => RB
abides => VBZ
in => IN
a => DT
place => NN
of => IN
light => NN
a => DT
place => NN
of => IN
grains => NNS
a => DT
place => NN
of => IN
peace => NN
Is => VBZ
it => PRP
a => DT
final => JJ
reconciliation => NN
with => IN
everydayness => NN
that => IN
the => DT
subdued => VBN
artist => NN
has => VBZ
to => TO
go => VB
through => IN
But => CC
what => WP
will => MD
then => RB
link => VB
his => PRP$
grand => JJ
art => NN
with => IN
the => DT
apology => NN
for => IN
a => DT
dacha => NN
How => WRB
can => MD
we => PRP
know => VB
Except => IN
for => IN
guesses => NNS
what => WP
does => VBZ
Chagall => NNP
leave => VB
us => PRP
We => PRP
must => MD
admit => VB
courageously => RB
that => IN
there => EX
is => VBZ
nothing => NN
more => JJR
hopeless => NN
than => IN
predicting => VBG
his => PRP$
future => NN
for => IN
among => IN
our => PRP$
artists => NNS
there => EX
is => VBZ
no => DT
spirit => NN
more => RBR
free => JJ
and 

Chagall => VB
find => VB
another => DT
language => NN
for => IN
fiery => NN
and => CC
fateful => JJ
visions => NNS
of => IN
his => PRP$
creation => NN
If => IN
that => DT
happens => VBZ
and => CC
if => IN
the => DT
boundaries => NNS
of => IN
his => PRP$
ability => NN
and => CC
powers => NNS
thus => RB
expand => RB
then => RB
Chagall => NNP
will => MD
appear => VB
before => RB
us => PRP
as => IN
one => CD
of => IN
the => DT
most => RBS
accomplished => JJ
talents => NNS
of => IN
our => PRP$
art => NN
4 => CD
A => DT
painter => NN
turning => VBG
to => TO
graphics => NNS
becomes => VBZ
a => DT
philosopher => NN
of => IN
his => PRP$
own => JJ
work => NN
This => DT
will => MD
be => VB
recognized => VBN
when => WRB
we => PRP
recall => VBP
that => IN
graphics => NNS
are => VBP
the => DT
most => RBS
abstract => JJ
and => CC
generalizing => VBG
kind => NN
of => IN
art => NN
They => PRP
are => VBP
more => JJR
calculation => NN
than => IN
impulse => JJ
more => JJR
thought => JJ
than => IN
feeling 

simply => RB
what => WP
he => PRP
is => VBZ
as => IN
he => PRP
is => VBZ
made => VBN
is => VBZ
to => TO
say => VB
nothing => NN
Perhaps => RB
even => RB
the => DT
genius => NN
of => IN
Dostoevski => NNP
i => NN
's => POS
psychologism => NN
is => VBZ
related => VBN
causally => RB
to => TO
that => DT
minute => NN
he => PRP
experienced => VBD
on => IN
the => DT
gallows => NNS
The => DT
attempt => NN
of => IN
a => DT
literary => JJ
explanation => NN
does => VBZ
not => RB
diminish => VB
the => DT
artist => NN
On => IN
the => DT
contrary => JJ
his => PRP$
artistic => JJ
merits => NNS
are => VBP
diminished => VBN
when => WRB
he => PRP
himself => PRP
is => VBZ
so => RB
anecdotal => JJ
that => IN
such => PDT
an => DT
explanation => NN
is => VBZ
superfluous => JJ
One => CD
can => MD
explain => VB
Chagall => NNP
in => IN
a => DT
literary => JJ
manner => NN
but => CC
he => PRP
himself => PRP
is => VBZ
not => RB
a => DT
storyteller => NN
not => RB
an => DT
illustrator => NN
but => CC
first => RB
of

b/s => JJ
street => NN
bis => VBD
home => NN
— => NN
into => IN
the => DT
boundless => JJ
mystical => NN
The => DT
dirty => JJ
girl => NN
Aldonza => NNP
is => VBZ
no => DT
less => RBR
inspiring => VBG
than => IN
the => DT
beautiful => JJ
Dulcinea => NNP
because => IN
she => PRP
leaves => VBZ
room => NN
tor => NN
a => DT
dream => NN
Hoffmann => NNP
who => WP
grew => VBD
up => RP
among => IN
philistines => NNS
apple-sellers => NNS
and => CC
the => DT
Tomcat => NNP
Murr => NNP
was => VBD
a => DT
bright => JJ
storyteller => NN
and => CC
Gogol => NNP
' => POS
found => VBD
fantastic => JJ
curios => NNS
in => IN
the => DT
stupidity => NN
of => IN
Russian => NNP
mundane => NNP
life => NN
Chagall => NN
senses => VBZ
the => DT
supernatural => JJ
the => DT
mysterious => JJ
not => RB
only => RB
in => IN
Jewish => JJ
life => NN
but => CC
in => IN
mundane => JJ
life => NN
in => IN
general => JJ
He => PRP
is => VBZ
a => DT
student => NN
of => IN
Bakst => NNP
and => CC
Dobuzhinskii => NNP
but => CC
in

they => PRP
prepared => VBD
a => DT
repertoire => NN
they => PRP
listed => VBD
the => DT
teachers => NNS
and => CC
actors => NNS
by => IN
name => NN
they => PRP
wrote => VBD
estimates => NNS
and => CC
then => RB
the => DT
Revolution => NNP
blazed => VBD
up => RP
and => CC
the => DT
light => NN
of => IN
our => PRP$
work => NN
disappeared => VBN
in => IN
the => DT
glow => NN
of => IN
Revolutionary => NNP
sun => NN
A => DT
series => NN
of => IN
meetings => NNS
and => CC
assemblies => NNS
began => VBD
political => JJ
arguments => NNS
took => VBD
over => RP
The => DT
issue => NN
of => IN
the => DT
theater => NN
was => VBD
left => VBN
by => IN
the => DT
wayside => NN
In => IN
vain => NN
did => VBD
some => DT
try => NN
to => TO
go => VB
back => RB
to => TO
work => VB
— => RB
no => DT
one => NN
thought => VBD
about => IN
theater => NN
any => DT
longer => JJR
The => DT
convention => NN
of => IN
Yiddish => JJ
actors => NNS
in => IN
Moscow => NNP
did => VBD
no => DT
more => RBR
than => IN
force =

help => VB
Kuni-Lemel => NNP
and => CC
the => DT
Witch => NNP
Yakhne => NNP
screamed => VBD
louder => RBR
the => DT
Jewish => JJ
intelligentsia => NN
did => VBD
n't => RB
respond => VB
the => DT
portly => RB
Jewish => JJ
patron => NN
carefully => RB
hid => VB
his => PRP$
wallet => NN
and => CC
the => DT
Yiddish => JJ
stage => NN
remained => VBD
a => DT
dirty => NN
promiscuous => JJ
maidservant => NN
laughing => VBG
wildly => RB
dancing => VBG
making => VBG
ugly => RB
grimaces => NNS
here => RB
in => IN
Russia => NNP
and => CC
overseas => RB
— => NNP
in => IN
America => NNP
The => DT
same => JJ
lot => NN
tell => NN
to => TO
the => DT
poor => JJ
sickly => JJ
and => CC
talented => VBD
Peretz => NNP
Hirshbeyn => NNP
who => WP
tried => VBD
to => TO
start => VB
a => DT
new => JJ
page => NN
in => IN
the => DT
Book => NNP
of => IN
Lamentations => NNP
of => IN
the => DT
Yiddish => JJ
theater => NN
Neglected => VBN
rejected => VBN
despairing => VBG
tilled => VBN
with => IN
griet => JJ
and => CC


the => DT
actor => NN
's => POS
capability => NN
to => TO
become => VB
a => DT
part => NN
of => IN
the => DT
organic => JJ
whole => NN
of => IN
the => DT
performance => NN
I => PRP
consider => VBP
stage => JJ
art => NN
an => DT
independent => JJ
and => CC
sovereign => JJ
domain => NN
Therefore => RB
all => DT
elements => NNS
constituting => VBG
a => DT
finished => VBN
performance => NN
— => VBZ
the => DT
man => NN
the => DT
script => NN
the => DT
music => NN
the => DT
sets => NNS
and => CC
the => DT
light => JJ
— => NN
must => MD
be => VB
subordinate => JJ
to => TO
a => DT
single => JJ
steadfast => JJ
thought => NN
and => CC
the => DT
completed => VBN
score => NN
of => IN
the => DT
production => NN
I => PRP
do => VBP
n't => RB
mean => VB
that => IN
I => PRP
want => VBP
to => TO
bind => VB
the => DT
actor => NN
and => CC
deprive => JJ
him => PRP
of => IN
the => DT
possibility => NN
of => IN
being => VBG
creative => JJ
— => NN
on => IN
the => DT
contrary => JJ
he => PRP
is => VBZ
given =

from => IN
various => JJ
foods => NNS
bagels => NNS
and => CC
fruit => NN
set => VBN
tables => NNS
all => DT
painted => VBN
on => IN
friezes => NNS
Facing => VBG
them => PRP
— => VBP
the => DT
stage => NN
with => IN
the => DT
actors => NNS
The => DT
work => NN
was => VBD
hard => RB
my => PRP$
contact => NN
with => IN
the => DT
work => NN
was => VBD
settling => VBG
down => RP
Granovskii => NNP
apparently => RB
lived => VBD
slowly => RB
through => IN
a => DT
process => NN
of => IN
transformation => NN
from => IN
Reinhardt => NNP
and => CC
Stanislavskii => NNP
to => TO
something => NN
else => RB
In => IN
my => PRP$
presence => NN
Granovskii => NNP
seemed => VBD
to => TO
hover => VB
in => IN
other => JJ
worlds => NNS
Sometimes => RB
it => PRP
seemed => VBD
to => TO
me => PRP
that => IN
1 => CD
was => VBD
disturbing => VBG
him => PRP
Was => IN
it => PRP
true => JJ
I => PRP
do => VBP
n't => RB
know => VB
why => WRB
he => PRP
did => VBD
not => RB
confide => VB
in => IN
me => PRP
And => CC
1 =

Mikhoels => NNP
— => NNP
strong => JJ
though => IN
short => JJ
thin => JJ
but => CC
sturdy => JJ
practical => JJ
and => CC
dreamy => NN
his => PRP$
logic => NN
merged => VBD
with => IN
feeling => NN
his => PRP$
Yiddish => JJ
language => NN
sounded => VBD
as => IN
if => IN
it => PRP
came => VBD
from => IN
Yiddish => JJ
books => NNS
He => PRP
could => MD
help => VB
he => PRP
would => MD
pull => VB
himself => PRP
out => IN
and => CC
pull => VB
other => JJ
actors => NNS
even => RB
the => DT
director => NN
himself => PRP
along => IN
Right => RB
at => IN
my => PRP$
first => JJ
meeting => NN
with => IN
Mikhoels => NNP
I => PRP
was => VBD
amazed => VBN
by => IN
that => DT
rare => JJ
though => IN
still => RB
vague => JJ
artistic => JJ
striving => NN
and => CC
force => NN
which => WDT
one => CD
day => NN
will => MD
stumble => VB
onto => IN
logic => JJ
and => CC
form => NN
which => WDT
— => VBP
if => IN
you => PRP
fiind => VBP
them => PRP
— => JJ
take => VB
on => IN
various => JJ
sounds => NNS
rh

their => PRP$
appearance => NN
as => IN
the => DT
blush => NN
of => IN
the => DT
crisis => NN
In => IN
the => DT
beginning => NN
Russian => JJ
theater => NN
was => VBD
in => IN
a => DT
fever => NN
of => IN
decorationism => NN
the => DT
role => NN
of => IN
the => DT
artist => NN
made => VBD
disproportionately => RB
large => JJ
I => PRP
am => VBP
not => RB
afraid => JJ
to => TO
assert => VB
as => IN
my => PRP$
professional => JJ
memory => NN
reminds => NNS
me => PRP
that => IN
the => DT
premieres => NNS
of => IN
1912— => CD
17 => CD
were => VBD
impressive => JJ
mostly => RB
for => IN
the => DT
triumphs => NN
of => IN
their => PRP$
design => NN
rather => RB
than => IN
their => PRP$
actors => NNS
Later => RB
after => IN
the => DT
October => NNP
upheaval => NN
came => VBD
the => DT
era => NN
of => IN
Futurism => NNP
This => DT
happened => VBD
not => RB
because => IN
Russian => JJ
Futurism => NNP
was => VBD
belated => VBN
as => IN
the => DT
innovations => NNS
of => IN
Western => JJ
culture =

German => JJ
theatrical => JJ
methods => NNS
deformed => VBD
the => DT
Russian => JJ
stage => NN
tradition => NN
This => DT
however => RB
did => VBD
not => RB
resolve => VB
the => DT
problem => NN
did => VBD
not => RB
yet => RB
create => VB
a => DT
Jewish => JJ
theater => NN
The => DT
essence => NN
was => VBD
not => RB
here => RB
These => DT
were => VBD
only => RB
separate => JJ
levers => NNS
An => DT
Archimedean => JJ
point => NN
was => VBD
needed => VBN
III => NNP
Granovskii => NNP
selected => VBD
his => PRP$
first => JJ
designer => NN
in => IN
a => DT
manner => NN
typical => JJ
of => IN
other => JJ
Russian => JJ
theaters => NNS
at => IN
the => DT
time => NN
not => RB
forseeing => VBG
the => DT
role => NN
the => DT
artist => NN
would => MD
later => RB
play => VB
in => IN
realizing => VBG
his => PRP$
projects => NNS
Furthermore => RB
he => PRP
apparently => RB
did => VBD
not => RB
notice => VB
what => WP
was => VBD
happening => VBG
with => IN
artists => NNS
on => IN
the => DT
Russian 

people => NNS
and => CC
programs => NNS
Granovskii => NNP
had => VBD
to => TO
understand => VB
that => DT
in => IN
the => DT
artistic => JJ
revolution => NN
as => IN
in => IN
the => DT
social => JJ
revolution => NN
you => PRP
always => RB
have => VBP
to => TO
steer => VB
the => DT
most => RBS
extreme => JJ
course => NN
the => DT
resultant => JJ
force => NN
of => IN
intentions => NNS
and => CC
possibilities => NNS
will => MD
sort => VB
itself => PRP
out => RP
The => DT
Yiddish => JJ
stage => NN
needed => VBD
the => DT
most => RBS
Jewy => NNP
the => DT
most => RBS
contemporary => JJ
the => DT
most => RBS
unusual => JJ
the => DT
most => RBS
difficult => JJ
of => IN
all => DT
artists => NNS
And => CC
so => RB
I => PRP
mentioned => VBD
Chagall => NNP
's => POS
name => NN
to => TO
Granovskii => NNP
Granovskii => NNP
's => POS
always-sleepy => JJ
eyes => NNS
opened => VBN
with => IN
a => DT
start => NN
and => CC
rounded => VBD
like => IN
the => DT
eyes => NNS
of => IN
an => DT
owl => NN
at =>

nature => NN
of => IN
this => DT
ensemble => NN
was => VBD
so => RB
untheatrical => JJ
that => IN
one => CD
might => MD
have => VB
asked => VBN
why => WRB
turn => VBP
off => RP
the => DT
light => NN
in => IN
the => DT
auditorium => NN
and => CC
why => WRB
do => VBP
these => DT
Chagallian => JJ
beings => NNS
move => NN
and => CC
speak => NN
on => IN
the => DT
stage => NN
rather => RB
than => IN
stand => VB
unmoving => JJ
and => CC
silent => JJ
as => IN
on => IN
his => PRP$
canvases => NNS
Ultimately => RB
the => DT
Sholem => NNP
Aleichem => NNP
Evening => NNP
was => VBD
conducted => VBN
as => IN
it => PRP
were => VBD
in => IN
the => DT
form => NN
of => IN
Chagall => NNP
paintings => NNS
come => VBP
to => TO
life => NN
The => DT
best => JJS
places => NNS
were => VBD
those => DT
in => IN
which => WDT
Granovskii => NNP
executed => VBD
his => PRP$
system => NN
of => IN
dots => NNS
and => CC
the => DT
actors => NNS
froze => VBP
in => IN
mid-movement => NN
and => CC
gesture => NN
from => IN
o

Pale => NNP
of => IN
Settlement => NNP
The => DT
plants => NNS
and => CC
factories => NNS
were => VBD
opened => VBN
for => IN
Jewish => JJ
workers => NNS
The => DT
proletarian => JJ
supplanted => VBD
the => DT
artisan => NN
And => CC
instead => RB
of => IN
the => DT
right => NN
to => TO
graze => VB
a => DT
goat => NN
in => IN
a => DT
cemetery => NN
Jews => NNP
got => VBD
the => DT
right => NN
to => TO
the => DT
land => NN
Now => RB
in => IN
Byelorussia => NNP
and => CC
at => IN
the => DT
Azov => NNP
Sea => NNP
an => DT
immense => JJ
work => NN
to => TO
grant => VB
land => NN
to => TO
the => DT
Jews => NNP
is => VBZ
proceeding => VBG
Kolkhozes => NNP
emerge => NN
the => DT
soil => NN
is => VBZ
irrigated => VBN
Now => RB
it => PRP
is => VBZ
clear => JJ
that => IN
Zionism => NNP
a => DT
Jewish => JJ
state => NN
in => IN
Palestine => NNP
will => MD
produce => VB
only => RB
a => DT
southern => JJ
resort => NN
for => IN
rich => JJ
Jews => NNP
A => DT
patriotic => JJ
resort => NN
with => IN
o

's => POS
miniatures => NNS
and => CC
then => RB
in => IN
the => DT
other => JJ
works => NNS
especially => RB
in => IN
The => DT
Sorceress => NNP
But => CC
the => DT
chief => JJ
property => NN
of => IN
the => DT
Yiddish => JJ
Chamber => NNP
Theater => NNP
which => WDT
attracts => VBZ
more => JJR
attention => NN
than => IN
any => DT
other => JJ
I => PRP
would => MD
say => VB
is => VBZ
its => PRP$
planned => VBN
creation => NN
or => CC
the => DT
rationalist => JJ
methods => NNS
of => IN
its => PRP$
artistic => JJ
work => NN
The => DT
Yiddish => JJ
Chamber => NNP
Theater => NNP
completely => RB
rejects => VBZ
the => DT
method => NN
of => IN
experiencing => NN
the => DT
cult => NN
of => IN
emotionality => NN
Above => IN
the => DT
kingdom => NN
of => IN
necessity => NN
— => CC
above => IN
the => DT
spontaneous => JJ
force => NN
of => IN
unregulated => JJ
feelings => NNS
— => VBP
it => PRP
puts => VBZ
the => DT
kingdom => NN
of => IN
freedom => NN
— => IN
the => DT
organized => VBN
and => CC

The => DT
major => JJ
traits => NNS
of => IN
Sholem => NNP
Aleichem => NNP
' => POS
s => JJ
writings => NNS
are => VBP
daydreaming => VBG
and => CC
skepticism => NN
their => PRP$
unique => JJ
combination => NN
creates => VBZ
lyrical => JJ
irony => NN
the => DT
lyric-Jewish => JJ
humor => NN
The => DT
daydreaming => NN
endows => VBZ
the => DT
skepticism => NN
with => IN
a => DT
hopeful => JJ
character => NN
it => PRP
leaves => VBZ
the => DT
door => NN
open => JJ
for => IN
the => DT
eternal => JJ
''perhaps => NNS
yes. => VBP
The => DT
skepticism => NN
brings => VBZ
a => DT
fair => NN
in => IN
the => DT
sky => NN
castles => NNS
in => IN
the => DT
air => NN
down => RB
to => TO
earth => NN
and => CC
transforms => NNS
tangible => JJ
life => NN
itself => PRP
into => IN
a => DT
dream => NN
into => IN
a => DT
question- => JJ
niark => NN
This => DT
daydreaming => NN
that => WDT
gives => VBZ
wings => NNS
of => IN
hope => NN
to => TO
doubt => VB
this => DT
doubt => NN
that => WDT
is => VBZ
willing

Talks => NNS
to => TO
himself => PRP
] => VB
On => IN
the => DT
face => NN
of => IN
it => PRP
a => DT
pretty => RB
solid => JJ
citizen => NN
God => NNP
willing => JJ
he => PRP
will => MD
condemnify => VB
himself => PRP
from => IN
death => NN
[ => NN
To => TO
himselj => VB
] => JJ
With => IN
such => PDT
a => DT
jerk => NN
you => PRP
can => MD
talk => VB
Yiddish => JJ
[ => NN
Starts => VBZ
talking => VBG
Yiddish => JJ
] => NN
A => DT
pleasure => NN
to => TO
travel => VB
by => IN
train => NN
Not => RB
like => IN
it => PRP
used => VBD
to => TO
be => VB
When => WRB
you => PRP
traveled => VBN
by => IN
wagon => NN
you => PRP
used => VBD
to => TO
vous => JJ
comprenez => NN
drag => NN
on => IN
and => CC
on => IN
and => CC
on => IN
[ => NN
Repeats => NNS
after => IN
him => PRP
] => VBP
Drag => NNP
on => IN
and => CC
on => IN
and => CC
on => IN
You => PRP
were => VBD
m => JJ
Diaspora => NNP
in => IN
the => DT
hands => NNS
of => IN
the => DT
carter => NN
like => IN
clay => NN
in => IN
the => DT
Ma

On => IN
the => DT
road => NN
with => IN
children => NNS
must => MD
Lm => NNP
sure => NN
be => VB
hard => RB
To => TO
himself => PRP
] => JJ
Maybe => RB
this => DT
character => NN
would => MD
condemnify => VB
himself => PRP
from => IN
death => NN
maybe => RB
Aloud => NNP
] => VBZ
It => PRP
's => VBZ
hard => JJ
I => PRP
'm => VBP
sure => JJ
with => IN
children => NNS
on => IN
the => DT
road => NN
As => RB
hard => JJ
as => IN
death => NN
They => PRP
're => VBP
jittery => JJ
your => PRP$
kids => NNS
are => VBP
n't => RB
they => PRP
You => PRP
spoil => VBP
them => PRP
Are => IN
they => PRP
spoiled => VBD
your => PRP$
kids => NNS
Pokes => VB
the => DT
oldest => JJS
to => TO
move => VB
him => PRP
away => RB
from => IN
the => DT
window => NN
] => NNP
Not => RB
so => RB
jittery => JJ
as => IN
dear => NN
Fine => JJ
children => NNS
you => PRP
have => VBP
No => DT
evil => JJ
eye => NN
turned => VBD
out => RP
good => JJ
vous => JJ
comprenez => NN
Turned => VBN
out => RP
good => JJ
no => DT
evil =>

and => CC
in => IN
the => DT
theater => NN
Furthermore => RB
I => PRP
am => VBP
sure => JJ
that => IN
if => IN
I => PRP
stopped => VBD
shaving => VBG
I => PRP
would => MD
see => VB
his => PRP$
precise => JJ
portrait => NN
By => IN
the => DT
way => NN
my => PRP$
father => NN
Believe => NNP
me => PRP
I => PRP
put => VBP
in => IN
cjuite => NN
a => DT
bit => NN
of => IN
effort => NN
no => DT
less => JJR
love => NN
and => CC
what => WP
love => VB
have => VBP
we => PRP
both => DT
expended => VBD
The => DT
difference => NN
is => VBZ
only => RB
that => IN
he => PRP
took => VBD
orders => NNS
for => IN
signs => NNS
and => CC
I => PRP
studied => VBD
in => IN
Paris => NNP
about => RB
which => WDT
he => PRP
also => RB
heard => VBD
something => NN
And => CC
yet => RB
Both => DT
I => PRP
and => CC
he => PRP
and => CC
others => NNS
there => EX
are => VBP
such => JJ
are => VBP
not => RB
yet => RB
Jewish => JJ
art => NN
as => IN
a => DT
whole => NN
Why => WRB
not => RB
speak => VB
the => DT
truth => NN


of => IN
their => PRP$
time => NN
Can => MD
we => PRP
help => VB
it => PRP
In => IN
our => PRP$
Jewish => JJ
society => NN
we => PRP
do => VBP
n't => RB
have => VB
a => DT
Diaghilev => NNP
a => DT
Morozov => NNP
a => DT
Shchukin => NNP
who => WP
collected => VBD
and => CC
organized => VBD
the => DT
art => JJ
culture => NN
with => IN
such => JJ
ardor => NNS
and => CC
understanding => NN
And => CC
the => DT
fact => NN
that => IN
the => DT
intelligentsia => NN
in => IN
general => JJ
and => CC
Yiddish => JJ
writers => NNS
in => IN
particular => JJ
lack => JJ
interest => NN
in => IN
the => DT
plastic => NN
arts => NNS
indicates => VBZ
that => IN
art => NN
is => VBZ
alien => JJ
and => CC
superfluous => JJ
in => IN
their => PRP$
lives => NNS
and => CC
work => NN
and => CC
the => DT
world => NN
rests => VBZ
on => IN
literature => NN
alone => RB
It => PRP
Yiddish => JJ
poetry => NN
Yiddish => JJ
literature => NN
were => VBD
intertwined => VBN
with => IN
other => JJ
branches => NNS
of => IN
art 

soul => NN
I => PRP
greet => VBP
the => DT
Jewish => JJ
folk => NN
masses => NNS
I => PRP
always => RB
wanted => VBD
to => TO
feel => VB
like => IN
one => CD
of => IN
them => PRP
to => TO
fill => VB
myself => PRP
with => IN
the => DT
people => NNS
's => POS
breath => NN
as => IN
once => RB
upon => IN
a => DT
time => NN
in => IN
my => PRP$
home => NN
It => PRP
is => VBZ
good => JJ
to => TO
come => VB
to => TO
the => DT
people => NNS
as => IN
a => DT
man => NN
who => WP
knocks => VBZ
at => IN
the => DT
door => NN
at => IN
night => NN
Let => VB
us => PRP
just => RB
not => RB
think => VB
that => IN
the => DT
door => NN
is => VBZ
like => IN
a => DT
wall => NN
To => TO
go => VB
to => TO
the => DT
people => NNS
find => NN
in => IN
them => PRP
a => DT
salvation => NN
from => IN
yourself => PRP
a => DT
way => NN
to => TO
a => DT
lost => VBN
world => NN
I => PRP
wish => VBP
you => PRP
and => CC
your => PRP$
chilciren => NN
to => TO
seek => VB
not => RB
only => RB
a => DT
piece => NN
of => IN
bre

Jewishly => RB
stream => NN
into => IN
our => PRP$
people => NNS
as => IN
into => IN
a => DT
river => NN
a => DT
river => NN
that => WDT
flows => VBZ
into => IN
the => DT
sea => NN
of => IN
the => DT
world => NN
1955 => CD
Texts => NNP
and => CC
Documents => NNP
\ => VBP
179 => CD
Summary => JJ
Translation => NN
of => IN
Chagall => NNP
's => POS
Letter => NNP
to => TO
President => NNP
Weitzman => NNP
This => DT
s/iiinuayy => NN
in => IN
English => NNP
was => VBD
discovered => VBN
among => IN
Chagall => NNP
's => POS
correspondence => NN
in => IN
the => DT
YIVO => NNP
archives => NNS
New => NNP
York => NNP
City => NNP
It => PRP
is => VBZ
reprinted => VBN
with => IN
only => RB
minor => JJ
spelling => VBG
corrections => NNS
The => DT
original => JJ
is => VBZ
not => RB
in => IN
the => DT
Weitzman => NNP
archives => VBZ
in => IN
Rehovot => NNP
I => PRP
write => VBP
to => TO
you => PRP
as => IN
our => PRP$
fathers => NNS
in => IN
Russia => NNP
used => VBD
to => TO
write => VB
to => TO
their 

issue => NN
that => WDT
brings => VBZ
this => DT
change => NN
but => CC
the => DT
blood => NN
itself => PRP
a => DT
certain => JJ
chemistry => NN
of => IN
nature => NN
objects => NNS
and => CC
human => JJ
concentration => NN
itself => PRP
You => PRP
can => MD
see => VB
the => DT
conception => NN
of => IN
this => DT
authenticity => NN
in => IN
all => DT
domains => NNS
How => WRB
was => VBD
it => PRP
born => VBZ
how => WRB
is => VBZ
it => PRP
built => VBD
up => RP
this => DT
chemistry => NN
through => IN
which => WDT
art => NN
is => VBZ
created => VBN
the => DT
true => JJ
conception => NN
of => IN
the => DT
world => NN
and => CC
of => IN
life => NN
It => PRP
consists => VBZ
of => IN
elements => NNS
of => IN
love => NN
and => CC
of => IN
a => DT
certain => JJ
natural => JJ
attitude => NN
just => RB
as => IN
nature => NN
itself => PRP
which => WDT
can => MD
not => RB
stand => VB
evil => NN
hatred => VBD
indifference => NN
If => IN
for => IN
example => NN
we => PRP
are => VBP
seized => VBN


of => IN
peace => NN
is => VBZ
still => RB
a => DT
mirage => NN
Art => NN
of => IN
genius => NN
and => CC
its => PRP$
luminaries => NNS
are => VBP
so => RB
rare => JJ
People => NNS
prefer => VBP
to => TO
be => VB
content => JJ
with => IN
evil => JJ
and => CC
injustice => NN
rather => RB
than => IN
to => TO
clutch => VB
onto => IN
love => NN
I => PRP
pit => VBP
our => PRP$
enemies => NNS
who => WP
waste => VBP
their => PRP$
time => NN
and => CC
their => PRP$
lives => NNS
on => IN
b => NN
^ways => NNS
and => CC
tr\- => NN
to => TO
burst => VB
through => IN
closed => JJ
doors => NNS
that => WDT
are => VBP
actually => RB
open => JJ
The => DT
straight => JJ
road => NN
and => CC
the => DT
key => NN
to => TO
the => DT
doors => NNS
is => VBZ
love => NN
which => WDT
is => VBZ
sown => VBN
here => RB
at => IN
ever => RB
step => NN
by => IN
our => PRP$
forefathers => NNS
by => IN
the => DT
people => NNS
who => WP
returned => VBD
here => RB
two => CD
thousand => CD
years => NNS
later => RB
from => 

will => MD
send => VB
you => PRP
my => PRP$
dreaming => VBG
blood => NN
My => PRP$
breath => NN
will => MD
gradually => RB
drip => VB
like => IN
tears => NNS
The => DT
air => NN
will => MD
sway => VB
blue => JJ
And => CC
I => PRP
will => MD
lie => VB
quietly => RB
at => IN
the => DT
fence => NN
Are => NNP
you => PRP
my => PRP$
homeland => NN
angry => JJ
at => IN
me => PRP
I => PRP
am => VBP
open => JJ
to => TO
you => PRP
like => IN
water => NN
in => IN
a => DT
bottle => NN
Long => RB
ago => RB
you => PRP
hurled => VBD
me => PRP
into => IN
the => DT
distance => NN
I => PRP
will => MD
come => VB
to => TO
you => PRP
to => TO
sleep => VB
forever => RB
And => CC
you => PRP
will => MD
cover => VB
my => PRP$
grave => NN
with => IN
ash => NN
IV => NNP
My => NNP
people => NNS
poor => JJ
people => NNS
you => PRP
have => VBP
no => DT
more => JJR
tears => NNS
No => DT
cloud => NN
walks => VBZ
before => IN
us => PRP
no => DT
star => NN
Our => PRP$
Moses => NNS
is => VBZ
dead => JJ
He => PRP
has => 

I => PRP
stand => VBP
up => RB
and => CC
say => VB
farewell => NN
to => TO
you => PRP
I => PRP
take => VBP
the => DT
road => NN
to => TO
the => DT
new => JJ
Temple => NNP
And => CC
light => VBD
a => DT
candle => NN
there => EX
Before => IN
your => PRP$
image => NN
1950 => CD
Texts => NNP
and => CC
Documents => NNP
1 => CD
97 => CD
To => TO
Israel => NNP
Should => NNP
I => PRP
pray => VBP
to => TO
God => NNP
Who => NNP
led => VBD
my => PRP$
people => NNS
to => TO
the => DT
fire => NN
Or => NNP
should => MD
I => PRP
paint => VB
Him => NNP
in => IN
image => NN
of => IN
flame => NN
Should => MD
I => PRP
get => VB
up => RP
from => IN
my => PRP$
place => NN
a => DT
new => JJ
Jew => NNP
And => CC
go => VB
fight => RB
along => RB
with => IN
my => PRP$
race => NN
Should => MD
my => PRP$
eyes => NNS
lament => NN
without => IN
a => DT
halt => NN
So => IN
the => DT
tears => NNS
drown => VBN
in => IN
a => DT
river => NN
I => PRP
wo => MD
n't => RB
let => VB
my => PRP$
grief => NN
approach => NN
Whe

Crosscurrents => NNS
of => IN
Modernism => NNP
Four => CD
Latin => JJ
American => JJ
Pioneers => NNPS
Washington => NNP
D. => NNP
C => NNP
Smithsonian => JJ
Institution => NNP
Press => NNP
1992 => CD
Frost => NNP
Matthew => NNP
Marc => NNP
Chagall => NNP
and => CC
the => DT
Jewish => JJ
State => NNP
Chamber => NNP
Theater => NNP
Russian => JJ
History => NNP
vol => NN
8 => CD
1981 => CD
parts => NNS
1-2 => CD
pp => NN
90-107 => CD
Gay => NNP
Peter => NNP
Sigmund => JJ
Freud => NN
A => DT
German => JJ
and => CC
his => PRP$
Discontents => NNS
Freud => NNP
Jews => NNS
and => CC
Other => JJ
Germans => NNS
Masters => NNS
and => CC
Victims => NNP
in => IN
Modernist => NNP
Culture => NNP
New => NNP
York => NNP
Oxford => NNP
University => NNP
Press => NNP
1978 => CD
Geyser => NNP
M. => NNP
Solomon => NNP
Mikhoels => NNP
in => IN
Russian => NNP
Moscow => NNP
Prometheus => NNP
1990. => CD
Gifts => NNS
of => IN
Fate => NNP
in => IN
Russian => NNP
Unpublished => VBN
Gnessin => NNP
M. => NNP
Darki =

Theater => NNP
in => IN
the => DT
nineteen-twenties => NNS
Lausanne => NN
La => NNP
Cite => NNP
1973 => CD
Rischbieter => NNP
Henning => NNP
ed => NN
Art => NN
and => CC
the => DT
Stage => NN
in => IN
the => DT
Twentieth => NNP
Century => NNP
trans => NNS
from => IN
the => DT
German => JJ
by => IN
Michael => NNP
Bullock => NNP
Greenwich => NNP
Conn. => NNP
New => NNP
York => NNP
Graphic => NNP
Society => NNP
1978 => CD
Romm => NNP
Aleksandr => NNP
Marc => NNP
Chagall => NNP
in => IN
Russian => NNP
Unpublished => VBN
Ronch => NN
Itzhak => NNP
Elchanan => NNP
Di => NNP
velt => VBD
fun => JJ
Marc => NNP
Shagal => NNP
The => DT
world => NN
of => IN
Marc => NNP
Chagall => NNP
New => NNP
York => NNP
YIKUF => NN
1967 => CD
Roose-Evans => NNS
James => NNP
Experi => NNP
nental => JJ
Theatre => NN
From => NNP
Stanislavsky => NNP
to => TO
Today => NNP
New => NNP
York => NNP
Universe => NN
Books => NNP
1970 => CD
Rost => NNP
Nico => NNP
Kunst => NNP
en => VBZ
Kult/ir => NNP
in => IN
Soivjetrusland

me => PRP
and => CC
started => VBD
lamenting => VBG
how => WRB
quickly => RB
the => DT
years => NNS
had => VBD
flown => VBN
by => IN
He => PRP
recalled => VBD
the => DT
time => NN
of => IN
his => PRP$
youth => NN
when => WRB
he => PRP
and => CC
Chagall => NNP
had => VBD
worked => VBN
and => CC
exhibited => VBN
together => RB
in => IN
Berlin => NNP
We => PRP
were => VBD
waiting => VBG
for => IN
the => DT
viewing => NN
to => TO
end => VB
when => WRB
Chagall => NNP
came => VBD
over => RB
a => DT
museum => NN
photographer => NN
captured => VBD
the => DT
touching => VBG
meeting => NN
of => IN
the => DT
two => CD
former => JJ
students => NNS
as => IN
I => PRP
stood => VBD
between => IN
them => PRP
When => WRB
Chagall => NNP
came => VBD
to => TO
the => DT
museum => NN
again => RB
on => IN
June => NNP
8 => CD
to => TO
see => VB
the => DT
GOSEKT => NNP
panels => NNS
I => PRP
handed => VBD
him => PRP
the => DT
photograph => NN
which => WDT
he => PRP
kindly => RB
signed => VBD
with => IN
the => D

In [36]:
# With NLTK:
import nltk
nltk.pos_tag(tokens_no_stop)

[(u'GUGGENHEIM', 'NNP'),
 (u'MUSEUM', 'NNP'),
 (u'Digitized', 'NNP'),
 (u'Internet', 'NNP'),
 (u'Arciiive', 'NNP'),
 (u'2012', 'CD'),
 (u'witii', 'NN'),
 (u'funding', 'VBG'),
 (u'IVIetropolitan', 'NNP'),
 (u'New', 'NNP'),
 (u'York', 'NNP'),
 (u'Library', 'NNP'),
 (u'Council', 'NNP'),
 (u'METRO', 'NNP'),
 (u'http', 'NN'),
 (u'//archive.org/details/chagalljOOchag', 'NNP'),
 (u'Marc', 'NNP'),
 (u'Chagall', 'NNP'),
 (u'Jevs^ish', 'NNP'),
 (u'Theater', 'NNP'),
 (u'Marc', 'NNP'),
 (u'Chagall', 'NNP'),
 (u'JevN^ish', 'NNP'),
 (u'Theater', 'NNP'),
 (u'GUGGENHEIM', 'NNP'),
 (u'MUSEUM', 'NNP'),
 (u'\xa9The', 'NNP'),
 (u'Solomon', 'NNP'),
 (u'R.', 'NNP'),
 (u'Guggenheim', 'NNP'),
 (u'Foundation', 'NNP'),
 (u'New', 'NNP'),
 (u'York', 'NNP'),
 (u'1992', 'CD'),
 (u'All', 'NNP'),
 (u'rights', 'NNS'),
 (u'reserved', 'VBN'),
 (u'Reproductions', 'NNPS'),
 (u'cat', 'NN'),
 (u'nos', 'RB'),
 (u'1-7', 'JJ'),
 (u'\xa9', 'NNP'),
 (u'State', 'NNP'),
 (u"Tret'iakov", 'NNP'),
 (u'Gallery', 'NNP'),
 (u'Moscow', '

### Noun Phrase Extraction
Noun phrase extraction, as the name suggests, refers to extracting phrases that contain nouns. 

In [34]:
for noun_phrase in text_blob_object.noun_phrases:  
    print(noun_phrase)

guggenheim museum digitized
internet arciiive
ivietropolitan
york
library council
metro
marc chagall
jevs^ish
marc chagall
jevn^ish
guggenheim museum
solomon r. guggenheim
york
reproductions
© state
tret'iakov
moscow marc chagall
je
vish theater
solomon r. guggenheim
september
art
chicago january
isbn
guggenheim
york
york
prmted
thorner
front
marc chagall
m/isk
tempera
i03-5
v4
tret'iakov
moscow
marc chagall
loi'e
tempera
7s x
'a inches
tret'iakov
moscow frontispiece
emblem
jewish chamber theater
petrograd
color
cat
tret'iakov
moscow
h. preisig
fondation pierre gianadda
lee ewing
cat
musee
national d'art moderne
centre georges pompidou
paris
philippe migeat
centre g. pompidou
cat
ida chagall
paris
cat
solomon r. guggenheim
myles aronowitz
lee ewing
carmelo guadagno
david heald
lufthansa
lufthansa additional
helena rubinstein
marc chagall
jewish theater
contents preface thomas krens
fore
vord h/rii
k. korolev x sponsor
statement
weber
introduction jeiuujer bleising
chagall
's auditorium

— show
beginning work
rodchenko
varvara stepanova
nineteenth
state exhibition
brief
stepanova
's diary
november
chagall
own verdict
dancers —
work —
chagall
own stylistic inventions
constructivism
chagall
collage
theater murals
twenty-third
state exhibition
june
collage elements
russian collages
rodchenko
stepanova
such flirtation
chagall
different ways
general recognition
june
own initiative
theater management
jews
expediently
introd/zction
jewish theater
dual purpose
new theater
beciiuse
se —
puni
lissitzky
kandinskii
chagall
general public
chagall
's theater murals
early closeness
evreinov
introduction
abstract geometric elements
background double
musical instruments
angel trumpeter —
deliberate incompleteness
collage
artist 's repertoire
modern world
socialist pageantry
russia
formalist
malevich
rodchenko
lissitzky
kandinskii
chagall
efros
chagall
granovskii
everyday world
cubist
h\y life
real musicians
chagall
right-hand section
level —
shakespeare
hamlet
hasidic
chagall
artist 's

authentic image
introspectivists
inner voice
internal panorama — kaleidoscopic
chagall
introspectivists
cultural background
yiddish
common native tongue
yiddish
yiddishist
max weinreich
such disparate components
slavic
tongue
hebrew
aramaic
international vocabulary
various components
yiddish
pluricultural game
yiddish
multilingual perspective
english
yiddish
component languages
yiddish
multicultural perspectives
yiddish
open language
hebrew
part —
yiddish
german vocabulary
slavic
chagall
's mind
heterogeneous components
painting mirrors
yiddish
yiddish
general culture
christian tradition
jewish knowledge
chagall
yiddish
russian
modernist
revolution
total secularization
jews
general european culture
european culture
radical upheaval
eager newcomers
imaginary museum
andre malraux
's term
chagall
louvre
items regardless
historical context
c^hagall
's generation
modernism
european past
modernism
benjamin harsloav
yiddish
york
chagall
/ lot'e
harmonious truth
various poles
art meet
realist 

realistic background
art
chagall
production design
moscow
schil'dknecht
's design
railroad car
double compartment furniture
minimalist detail
asymmetrical arch
realistic illusion
tiny locomotive
yiddish
smok [ ers }
yiddish
iii
cl [ ass }
agents
vovs
mikhoels
} /
agents
/ /
valise
coat
briefcase
matches
letter
paper
white pages
krashinski
agents
/ /
small
briefcase
cigarette
cigarettes
pages
white paper
january
green curtain
right side
wooden window
wooden moon
wooden board
locomotive smokestack
white bench
railroad car
yakenhoz
lanternshooter
big doll
davidka
wooden instrument
somehow
tret'iakov
gallery —
church building
mikhoels
nkvd
concentration camps
yiddish
official soviet
yiddish
chagall
's paintings
yiddish
tret'iakov
liquidation
tret'iakov
moscow
art circles
artistic director
aleksandr tyshler
tret'iakov
tret'iakov
tret'iakov
careful job
long-distance transport
germany
original colors
chagall
's signatures
russia
chagall
french
introduction
russian letters
cyrillic
mazel tor
c

russian
late
russian-speaking
valentina brodskii
london
france
france
marc
intellectual friends
yiddish
russian flavor
virginia haggard
english
paris
york
france
french
russian accent
grammatical errors
virginia
chagall
whole book
russian pig
chagall
marc chagall
vitebsk
july
lucky number
lyozno
art historian
aleksandra shatskikh
chagall
's friend
abram efros
lyozno
israeli
zalman
benjamin marshal
shazar
byelorussia
lyozno
paris
young people
chagall
's parents
lyozno
paternal grandfather
provincial capital
vitebsk
big city
lyezner
lyozno
efros
lyozno
chagall
paris
chagall
small place
lyozno
vitebsk
symbol ol
pale
settlement
russia
source ol
fictional world
lyozno
early images
vitebsk
small-town scene
vitebsk
time show
different city altogether
vitebsk
lyozno
semirealistic paintings
jewish context
lyozno
chabad hasidism
lyozno
town 's
shazar
shneur-zalman rubashov
legendary rebbe
chagall
secular legend
modern jewish culture
chagall
strong religious world
jewishness
chagall
vitebsk
tugen

forgotten sensations
dark room
hairy hands
black walls
old chair
hoffmann
fantastic world
fantastic world
baron miinchhausen
hoffmann
miinchhausen
demands disbelief
respective effects
hoffmann
chagall
special logic
hoffmann
's inventions
naive spectator approaches
chagall
naturalistic criteria
chagall
's absurdities
art states
own special laws
sense implies
external manifestations
work — paintings
statues —
chagall
's art
illegitimate yardstick
realistic-mundane painting
own internal logic
chagall
's art
external boundaries
provincial-petersburg period
chagall
vitebsk province
petersburg
bakst
's school
independent paintings
period —
chagall
paris
chagall
bohemia
ruche
unusual canvases
new art
berlin
amsterdam
such chimerical canvases
paris
window
carter
calf seller
brides
headless bodies
current period —
russia
great war
chagall
vitebsk
barbershop
shtetl lyozno
provinces
outskirts
vitebsk
prayingjew
birthday
guitarist
internal line
creative work
chronological boundaries
chagall
chagal

mikhoels
yiddish
yiddish
new theater style
important meaning
technical material
mikhoels
wide road
theater art
own power
mikhoels
rare happy people
moscow yiddish
zuskin
thick lips
vakhtangov
habima
rowina
dybbuk
studying
mikhoels
theater stage
great time
art 's sake
mikhoels
texts
doc/tti/crits mikhoels
chagall yosef schein
ni \
slijchi
around
moscow yiddish
french
paris
lcs editions polyglottes
mikhoels
's success
menakhem-mendel yakenhoz
marc chagall
mikhoels
day ot
chagall
[ pale
downcast ]
chagall
right eyebrow
wrin- kles
wrinkle lines
menakhem-mendel
's tragic lot
question mark
solomon
solomon
right eye
mikhoels
mikhoels
chagall
menakhem-mendel
's eye
artists
granovskii
's theater
ahram efros exarpttd jroiii
i^ssay ongdhdl
r//ssian
iskusstvo
moscow
russian theater
secondary elements
basic ones
scenic accessories
theatrical art
external formation
stage designers
russian theater
world name
european centers
western criticism
exotic episode
revolution
true significance
highness vulga

bakingfish lanternshooter menakhem-mendel turtledove bakingfish lanternshooter menakhem-mendel bakingfish lanternshooter menakhem-mendel turtledove bakingfish lanternshooter menakhem-mendel turtledove bakingfish lanternshooter menakhem-mendel turtledove bakingfish lanternshooter
stands
stretches
akim isaakovich bakingfish
inspector-organizer
york
really
brother 's keeper
mark moyseyevich lanternshooter
agent acquereur
equitable.
looks
menakhem-mendel
elegantly
menakhem-mendel yakenhoz
sup-agent
yakir
scene
turtledove
looks
family ]
est-ce
que je peux entrer
je
enchante
dans
new character sits
commotion
screams ''dinner
mama
mother cracks nuts
child 's mouth
father honors
— 07ie
back ] {
aloud
un
convenable subject
aloud
lm
aloud
're jittery
pokes
window ]
fine children
evil eye
vous comprenez
turned
evil eye
learn
well
breeding
main thing
abrasha
n/
abrasha
abrasha
father ]
points
oy
'll mix
listens
main thing
little squirrel
big mouth
old man
davidke
davidke
davidke
mother 's arms
fat

naia dekada
dobrushin
cloud
yiddish
der etnes
jan.
german press
performances
moscow
yiddish
yiddish
der ernes
may
mikhoels
der aktyor
mikhoels
moscow
der ernes
doria
charles
samizdat art
essays
john e. bowk
szymon bojko
rimma
valery gerlovin
york
willis
locker
owens
efros
abram
remarks
art
chagall
al'tman
fal'k
russian
noiy
piit\ nos
russian
kuttiira
opening curtain
season
jewish theater
russian
teatr
i muzyka
nov.
iio-ii
rise
jewish chamber theater
russian
novaia rossiia
artists
granovskii
's theater
russian
iskusstvo
reprinted
kovtsheg
almanakh
evreiskoy kuttury [
ark
almanac
jewish culture ]
moscow/jerusalem
tarbut/ khudozhestvennaia
projili
profiles
moscow
federatsiia
efros
a.
tugendkhol
iskusstvo marka shagala
marc chagall
moscow
helikon
die kunst marc chagalls
frieda ichak-rubiner
potsdam
gustav kiepenheuer verlag
erben
walter
marc chagall
york
washington
frederick a. praeger
rev
even-zohar
itamar
polysyste
studies
special issue o {
poetics
eireiskaia
jewish encyclopedia
petersbu

# Step 3: Named Entity Recognition / Entity Extraction

Named entity recognition (NER)is probably the first step towards information extraction that seeks to locate and classify named entities in text into pre-defined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. NER is used in many fields in Natural Language Processing (NLP), and it can help answering many real-world questions, such as:

* Which companies were mentioned in the news article?
* Were specified products mentioned in complaints or reviews?
* Does the tweet contain the name of a person? 
* Which locations exist in the book?

[named entity](https://en.wikipedia.org/wiki/Named_entity) recognizer with NLTK and SpaCy, to identify the names of things, such as persons, organizations, or locations in the raw text. Let’s get started!


##  Entity Extraction with spaCy






In [None]:
# in case requirements.txt would not work
# import sys
# !{sys.executable} -m pip install spacy
# !{sys.executable} -m spacy download en

In [None]:
import spacy
from spacy import displacy
# import en_core_web_sm
# nlp = en_core_web_sm.load()
from spacy.lang.en import English

nlp = spacy.load("en")
doc = nlp(booktext)
print([(X.text, X.label_) for X in doc.ents])

In [None]:
len(doc.ents)

from collections import Counter
labels = [x.label_ for x in doc.ents]
Counter(labels)

In [None]:
items = [x.text for x in doc.ents]
Counter(items).most_common(3)

In [None]:
sentences = [x for x in doc.sents]
print(sentences[120])

##  Entity Extraction with Google
*! Paid API*

There are different opinions about the results you will be getting in you will perform analysis 
Here is one of the discussions on [Quora](https://www.quora.com/How-does-Googles-open-source-natural-language-parser-SyntaxNet-compare-with-spaCy-io-or-Stanfords-CoreNLP) and [Stack Overflow](https://stackoverflow.com/questions/52473653/better-named-entity-recognition-and-similarity-using-spacy) 
However, as you can see below Google API and some other paid APIs include richer meta-data like salience, wikipedia-url, mentions and sentiment. 
Please refer to API [documentation](https://cloud.google.com/natural-language/docs/reference/rest/v1/Entity#Type) for more details. 



In [38]:
import six, os
from google.cloud import language
from google.cloud.language import enums
from google.cloud.language import types


os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.join('.', "GoogleLocalCreds.json")
client = language.LanguageServiceClient()


if isinstance(booktext, six.binary_type):
    text = booktext.decode('utf-8')

# Instantiates a plain text document.
document = types.Document(
    content=booktext,
    type=enums.Document.Type.PLAIN_TEXT)

# Detects entities in the document. You can also analyze HTML with:
#   document.type == enums.Document.Type.HTML
entities = client.analyze_entities(document).entities


GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.

GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.

GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.

GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.

GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.

GraphLab Create requires a license to use. To get a non-commercial  license for academic use only, visit https://turi.com/register.



In [39]:
for entity in entities:
    entity_type = enums.Entity.Type(entity.type)
    print('=' * 20)
    print(u'{:<16}: {}'.format('name', entity.name))
    print(u'{:<16}: {}'.format('type', entity_type.name))
    print(u'{:<16}: {}'.format('salience', entity.salience))
    print(u'{:<16}: {}'.format('wikipedia_url',
          entity.metadata.get('wikipedia_url', '-')))
    print(u'{:<16}: {}'.format('mid', entity.metadata.get('mid', '-')))
    

name            : Marc Chagall
type            : PERSON
salience        : 0.520169138908
wikipedia_url   : -
mid             : /m/0bqr1t6
name            : Moscow
type            : LOCATION
salience        : 0.0177865549922
wikipedia_url   : https://en.wikipedia.org/wiki/Moscow
mid             : /m/04swd
name            : Paris
type            : LOCATION
salience        : 0.00588323595002
wikipedia_url   : https://en.wikipedia.org/wiki/Paris
mid             : /m/05qtj
name            : Jewish Theater Contents
type            : OTHER
salience        : 0.00385385239497
wikipedia_url   : -
mid             : -
name            : Jewish Theater
type            : LOCATION
salience        : 0.00332156592049
wikipedia_url   : -
mid             : -
name            : Persian Jews
type            : PERSON
salience        : 0.00296543980949
wikipedia_url   : https://en.wikipedia.org/wiki/Persian_Jews
mid             : /m/05p09y
name            : beneficiaries
type            : OTHER
salience       

name            : painting
type            : WORK_OF_ART
salience        : 0.000260438333498
wikipedia_url   : -
mid             : -
name            : theater paintings
type            : OTHER
salience        : 0.000260404398432
wikipedia_url   : -
mid             : -
name            : paintings
type            : WORK_OF_ART
salience        : 0.000260224041995
wikipedia_url   : -
mid             : -
name            : reflection
type            : OTHER
salience        : 0.000260048080236
wikipedia_url   : -
mid             : -
name            : paint
type            : OTHER
salience        : 0.000260023312876
wikipedia_url   : -
mid             : -
name            : paintings
type            : WORK_OF_ART
salience        : 0.000259905849816
wikipedia_url   : -
mid             : -
name            : paintings
type            : WORK_OF_ART
salience        : 0.000259339925833
wikipedia_url   : -
mid             : -
name            : paintings
type            : WORK_OF_ART
salience        : 

mid             : -
name            : mural
type            : WORK_OF_ART
salience        : 0.000196478824364
wikipedia_url   : -
mid             : -
name            : life
type            : OTHER
salience        : 0.000196457942366
wikipedia_url   : -
mid             : -
name            : fashion
type            : OTHER
salience        : 0.000196453838726
wikipedia_url   : -
mid             : -
name            : murals
type            : WORK_OF_ART
salience        : 0.000196451030206
wikipedia_url   : -
mid             : -
name            : figures
type            : OTHER
salience        : 0.000196388762561
wikipedia_url   : -
mid             : -
name            : life
type            : OTHER
salience        : 0.000196382883587
wikipedia_url   : -
mid             : -
name            : time
type            : EVENT
salience        : 0.000196373177459
wikipedia_url   : -
mid             : -
name            : life
type            : OTHER
salience        : 0.000196369495825
wikipedia_url  

salience        : 0.000149799903738
wikipedia_url   : -
mid             : -
name            : Introduction
type            : EVENT
salience        : 0.000149606348714
wikipedia_url   : -
mid             : -
name            : communication
type            : OTHER
salience        : 0.000149604631588
wikipedia_url   : -
mid             : -
name            : Introduction
type            : EVENT
salience        : 0.00014948748867
wikipedia_url   : -
mid             : -
name            : Introduction
type            : EVENT
salience        : 0.000149387997226
wikipedia_url   : -
mid             : -
name            : collection
type            : OTHER
salience        : 0.000149328465341
wikipedia_url   : -
mid             : -
name            : Introduction
type            : EVENT
salience        : 0.000149310973939
wikipedia_url   : -
mid             : -
name            : IVIetropolitan New York Library Council
type            : ORGANIZATION
salience        : 0.00014929693134
wikipedia_url   

salience        : 0.000124307291117
wikipedia_url   : -
mid             : -
name            : costumes
type            : OTHER
salience        : 0.00012399078696
wikipedia_url   : -
mid             : -
name            : costume
type            : OTHER
salience        : 0.000123918085592
wikipedia_url   : -
mid             : -
name            : institutions
type            : ORGANIZATION
salience        : 0.000123843885376
wikipedia_url   : -
mid             : -
name            : home
type            : LOCATION
salience        : 0.000123730394989
wikipedia_url   : -
mid             : -
name            : decorations
type            : OTHER
salience        : 0.000123699501273
wikipedia_url   : -
mid             : -
name            : theater
type            : LOCATION
salience        : 0.0001236894459
wikipedia_url   : -
mid             : -
name            : institutions
type            : ORGANIZATION
salience        : 0.000123688747408
wikipedia_url   : -
mid             : -
name         

name            : loan procedures
type            : OTHER
salience        : 0.000102434474684
wikipedia_url   : -
mid             : -
name            : pageantry
type            : OTHER
salience        : 0.000102411293483
wikipedia_url   : -
mid             : -
name            : corner
type            : OTHER
salience        : 0.000102402940684
wikipedia_url   : -
mid             : -
name            : drama
type            : WORK_OF_ART
salience        : 0.000102355857962
wikipedia_url   : -
mid             : -
name            : art
type            : WORK_OF_ART
salience        : 0.000102313446405
wikipedia_url   : -
mid             : -
name            : art
type            : WORK_OF_ART
salience        : 0.000102313446405
wikipedia_url   : -
mid             : -
name            : art
type            : WORK_OF_ART
salience        : 0.000102310761577
wikipedia_url   : -
mid             : -
name            : art
type            : WORK_OF_ART
salience        : 0.000102268590126
wikipedia_u

wikipedia_url   : -
mid             : -
name            : artists
type            : PERSON
salience        : 8.4311759565e-05
wikipedia_url   : -
mid             : -
name            : artists
type            : PERSON
salience        : 8.4309533122e-05
wikipedia_url   : -
mid             : -
name            : issues
type            : OTHER
salience        : 8.425082342e-05
wikipedia_url   : -
mid             : -
name            : artist
type            : PERSON
salience        : 8.42021981953e-05
wikipedia_url   : -
mid             : -
name            : artist
type            : PERSON
salience        : 8.41942382976e-05
wikipedia_url   : -
mid             : -
name            : artist
type            : PERSON
salience        : 8.41922592372e-05
wikipedia_url   : -
mid             : -
name            : artist
type            : PERSON
salience        : 8.41843720991e-05
wikipedia_url   : -
mid             : -
name            : treatment
type            : OTHER
salience        : 8.411566523

name            : harmony
type            : OTHER
salience        : 7.12383989594e-05
wikipedia_url   : -
mid             : -
name            : frontiers
type            : OTHER
salience        : 7.11922184564e-05
wikipedia_url   : -
mid             : -
name            : Vitali
type            : LOCATION
salience        : 7.11851462256e-05
wikipedia_url   : -
mid             : -
name            : loan
type            : OTHER
salience        : 7.10413005436e-05
wikipedia_url   : -
mid             : -
name            : accident
type            : EVENT
salience        : 7.09769956302e-05
wikipedia_url   : -
mid             : -
name            : screen
type            : OTHER
salience        : 7.09709202056e-05
wikipedia_url   : -
mid             : -
name            : effect
type            : OTHER
salience        : 7.09520681994e-05
wikipedia_url   : -
mid             : -
name            : vocabulary
type            : OTHER
salience        : 7.09404193913e-05
wikipedia_url   : -
mid      

salience        : 5.24506722286e-05
wikipedia_url   : -
mid             : -
name            : paper
type            : OTHER
salience        : 5.24506722286e-05
wikipedia_url   : -
mid             : -
name            : collage elements
type            : OTHER
salience        : 5.24339920958e-05
wikipedia_url   : -
mid             : -
name            : elements
type            : OTHER
salience        : 5.24226961716e-05
wikipedia_url   : -
mid             : -
name            : musicians
type            : PERSON
salience        : 5.2420211432e-05
wikipedia_url   : -
mid             : -
name            : musicians
type            : PERSON
salience        : 5.2420211432e-05
wikipedia_url   : -
mid             : -
name            : nos
type            : OTHER
salience        : 5.22036025359e-05
wikipedia_url   : -
mid             : -
name            : style
type            : OTHER
salience        : 5.21813344676e-05
wikipedia_url   : -
mid             : -
name            : area
type         

type            : OTHER
salience        : 3.90856403101e-05
wikipedia_url   : -
mid             : -
name            : South
type            : LOCATION
salience        : 3.90675231756e-05
wikipedia_url   : -
mid             : -
name            : love
type            : OTHER
salience        : 3.89519082091e-05
wikipedia_url   : -
mid             : -
name            : aspect
type            : OTHER
salience        : 3.8928923459e-05
wikipedia_url   : -
mid             : -
name            : poems
type            : WORK_OF_ART
salience        : 3.89267224818e-05
wikipedia_url   : -
mid             : -
name            : poems
type            : WORK_OF_ART
salience        : 3.89267224818e-05
wikipedia_url   : -
mid             : -
name            : table
type            : OTHER
salience        : 3.89201850339e-05
wikipedia_url   : -
mid             : -
name            : foreground
type            : LOCATION
salience        : 3.89126726077e-05
wikipedia_url   : -
mid             : -
name      

salience        : 2.57065221376e-05
wikipedia_url   : -
mid             : -
name            : scene
type            : OTHER
salience        : 2.56950988842e-05
wikipedia_url   : -
mid             : -
name            : Auditorium j 7
type            : ORGANIZATION
salience        : 2.52861227636e-05
wikipedia_url   : -
mid             : -
name            : prototype
type            : OTHER
salience        : 2.52059217019e-05
wikipedia_url   : -
mid             : -
name            : evocations
type            : WORK_OF_ART
salience        : 2.49753466051e-05
wikipedia_url   : -
mid             : -
name            : decision
type            : OTHER
salience        : 2.49739223364e-05
wikipedia_url   : -
mid             : -
name            : pyramid
type            : OTHER
salience        : 2.49710938078e-05
wikipedia_url   : -
mid             : -
name            : reveler
type            : PERSON
salience        : 2.49662061833e-05
wikipedia_url   : -
mid             : -
name            :

ValueError: 12 is not a valid Type

In [64]:
locations = []
people = []
for entity in entities:
    try:
        entity_type = enums.Entity.Type(entity.type)
        if str(entity_type.name) is 'LOCATION':
            locations.append(str(entity.name))
        elif str(entity_type.name) is 'PERSON':
            people.append(str(entity.name))
        else:
            next
    except ValueError:
        next

In [66]:
import collections
counter=collections.Counter(locations)
counter.most_common(20)

[('theater', 42),
 ('world', 17),
 ('museum', 12),
 ('space', 8),
 ('house', 6),
 ('auditorium', 4),
 ('home', 4),
 ('foreground', 4),
 ('Theater', 4),
 ('building', 4),
 ('parts', 3),
 ('country', 3),
 ('floor', 3),
 ('gallery', 3),
 ('sky', 2),
 ('homeland', 2),
 ('countries', 2),
 ('offices', 2),
 ('state', 2),
 ('capital', 2)]

In [67]:
counter=collections.Counter(people)
counter.most_common(20)

[('artist', 37),
 ('artists', 17),
 ('dancers', 8),
 ('people', 7),
 ('many', 6),
 ('director', 5),
 ('friends', 5),
 ('Director', 4),
 ('viewers', 4),
 ('man', 4),
 ('painter', 4),
 ('actors', 4),
 ('painters', 3),
 ('hero', 3),
 ('ensemble', 3),
 ('scholars', 3),
 ('audience', 3),
 ('lovers', 3),
 ('restorers', 3),
 ('avant-garde artists', 3)]

# Step 4: Summarization

There are two types of text summarization algorithms: *extractive* and *abstractive*. 

    * Extractive summarization algorithms attempt to score the phrases or sentences in a document and return only the most highly informative blocks of text.

    * Abstractive text summarization actually creates new text which doesn’t exist in that form in the document. Abstractive summarization is what you might do when explaining a book you read to your friend, and it is much more difficult for a computer to do than extractive summarization.
    
    
### PyTeaser

[PyTeaser](https://github.com/xiaoxu193/PyTeaser) is a Python implementation of the Scala project TextTeaser, which is a heuristic approach for extractive text summarization.TextTeaser associates a score with every sentence. This score is a linear combination of features extracted from that sentence. Features that TextTeaser looks at are:

* titleFeature: The count of words which are common to title of the document and sentence.
* sentenceLength: Authors of TextTeaser defined a constant “ideal” (with value 20), which represents the ideal length of the summary, in terms of number of words. sentenceLength is calculated as a normalized distance from this value.
* sentencePosition: Normalized sentence number (position in the list of sentences).
* keywordFrequency: Term frequency in the bag-of-words model (after removing stop words).



In [5]:
from pyteaser import Summarize
summaries = Summarize("Book about Chagall", booktext)
print summaries

[u'Therefore, Chagall meditating on his visions, Chagall the draftsman, is perceived even more sharply than Chagall the painter.', u'See Grigori Kasovsky, "Chagall and the Jewish Art Programme," in Vitali, Marc Chagall: The Russian Years ipo6-ip22, p. 57. 66.', u'It wants to produce the kernel trom which a normal Yiddish theater, Yiddish theater art in a European sense, will develop.', u'The Vilna artists have lived to see Chagall with their own eyes and to hear him speak in the international language, the Esperanto, called Jewish art. " Chagall delivered the opening address.', u'M. Chagall, "Letter to Pavel Davidovitch Ettinger 1920," in Vitali, Marc Chagall: The Russian Years lpo6~ip22, pp. 73\u201475. 71.']


### Gensim 

[gensim.summarization module](https://radimrehurek.com/gensim/summarization/summariser.html) implements TextRank, an unsupervised algorithm based on weighted-graphs from a paper by Mihalcea et al. TextRank works as follows:

* Pre-process the text: remove stop words and stem the remaining words.
* Create a graph where vertices are sentences.
* Connect every sentence to every other sentence by an edge. The weight of the edge is how similar the two sentences are.
* Run the PageRank algorithm on the graph.
* Pick the vertices(sentences) with the highest PageRank score

In original TextRank the weights of an edge between two sentences is the percentage of words appearing in both of them. 


In [None]:
from gensim.summarization.summarizer import summarize
print(summarize(booktext))

gensim Version: 3.4.0


### LexRank (sumy)

LexRank
LexRank is an unsupervised graph based approach similar to TextRank. LexRank uses IDF-modified Cosine as the similarity measure between two sentences. This similarity is used as weight of the graph edge between two sentences. LexRank also incorporates an intelligent post-processing step which makes sure that top sentences chosen for the summary are not too similar to each other.

More on LexRank Vs. TextRank can be found here.

Note on running time: extremely slow

In [None]:
#Import library essentials
from sumy.parsers.plaintext import PlaintextParser #We're choosing a plaintext parser here, other parsers available for HTML etc.
from sumy.nlp.tokenizers import Tokenizer 
from sumy.summarizers.lex_rank import LexRankSummarizer #We're choosing Lexrank, other algorithms are also built in


# parser = PlaintextParser.from_file(file, Tokenizer("english"))
summarizer = LexRankSummarizer()

# string = unicode(raw_input(), 'utf8')
booktext_for_output = booktext.encode('utf8', 'replace')
summary = summarizer(booktext_for_output, 5) #Summarize the document with 5 sentences

for sentence in summary:
    print sentence




### Luhn (sumy)

It is one of the earliest suggested algorithm by the famous IBM researcher it was named after. It scores sentences based on frequency of the most important words.

Note on running time: super fast


In [8]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.luhn import LuhnSummarizer


parser = PlaintextParser.from_string(booktext,Tokenizer("english"))
summarizer_luhn = LuhnSummarizer()
summary_1 =summarizer_luhn(parser.document,2)
for sentence in summary_1:
	print(sentence)


Yet for the most part those various items are not depictions of individual objects in the world but represent several recognizable domains throughout Chagall's art: old Jews of the recent religious past, as seen from the distance of a secular generation; Christian officials and peasants of the village; his own, invented "Vitebsk " as the symbolic small town of a distant Jewish world; another version of "Vitebsk," with its churches symbolizing provincial Russia; animals in that world, often humanized; his child-bride Bella and loving couples; Jesus Christ as the suffering Jew; Paris with the emblematic Eiffel Tower and the window of his studio; and, later in his career, anonymous Jewish masses, crossing the Red Sea or facing the Holocaust; and the world of the Bible.
A skillful and excellently precise brush; now fondly licking, now scratching; now bathing in the even ripple of the daubs, now scattering marvelous "Chagallian " little dots, drops and patterns, joyful and resounding, scarl

### LSA (sumy)
Based on term frequency techniques with singular value decomposition to summarize texts.

Latent semantic analysis is an unsupervised method of summarization it combines term frequency techniques with singular value decomposition to summarize texts. It is one of the most recent suggested technique for summerization

Note on running time: extremely slow

In [8]:
from sumy.parsers.plaintext import PlaintextParser
from sumy.summarizers.lsa import LsaSummarizer

parser = PlaintextParser.from_string(booktext,Tokenizer("english"))
summarizer_lsa = LsaSummarizer()
summary_2 =summarizer_lsa(parser.document,2)
for sentence in summary_2:
    print(sentence)

NameError: name 'Tokenizer' is not defined


# Step 5: Topic Modeling

It is an unsupervised approach used for finding and observing the bunch of words (called “topics”) in large clusters of texts.

> * Bag of Words (BoW) is an algorithm that counts how many times a word appears in a document.
> * Term Frequency-Inverse Document Frequency (TF-IDF) - s another way to judge the topic of an article by the words it contains. With TF-IDF, words are given weight – TF-IDF measures relevance, not frequency. That is, wordcounts are replaced with TF-IDF scores across the whole dataset.
> * Latent Dirichlet Allocation - LDA assumes documents are produced from a mixture of topics. Those topics then generate words based on their probability distribution. Given a dataset of documents, LDA backtracks and tries to figure out what topics would create those documents in the first place.
> * ...


### LDA with sklearn

Refference: 
* Library [Docs](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html)
* Step-By_Step [Tutorial](https://ourcodingclub.github.io/2018/12/10/topic-modelling-python.html)

> * tf (we chose tf as a variable name to stand for ‘term frequency’ - the frequency of each word/token in each tweet). The shape of tf tells us how many tweets we have and how many words we have that made it through our filtering process.
> * tf_feature_names are the actual names of the tokens

In tf matrix each row is a token and each column is a word. The numbers in each position tell us how many times this word appears in this tweet.
Next we actually create the model object. Lets start by arbitrarily choosing 10 topics. We also define the random state so that this model is reproducible.

In [31]:
from sklearn.feature_extraction.text import CountVectorizer

# we are going to cheat slightly and remove unwanted elements that occurs without stopwords:
tokens_no_stop = [e for e in tokens_no_stop if e not in ('"', "\u2014", '\'t', 'n', 'p', '1', 'yet', '.', ',s', 'a', 
                                                         'in', 'and', 'like', 'in', 'the', 'x', 'it', 'for',
                                                         '-', 'ot', 'i')]

# the vectorizer object will be used to transform text to vector form
vectorizer = CountVectorizer(max_df=0.9, min_df=25, token_pattern='\w+|\$[\d\.]+|\S+')
# apply transformation
tf = vectorizer.fit_transform(tokens_no_stop).toarray()

# tf_feature_names tells us what word each column in the matric represents
tf_feature_names = vectorizer.get_feature_names() 
#! ensure that it does not contain wierd characters, but at the same time keeps the length

In [28]:
from sklearn.decomposition import LatentDirichletAllocation

number_of_topics = 10
model = LatentDirichletAllocation(n_components=number_of_topics, random_state=0)

**model** is our LDA algorithm model object. I expect that if you are here then you should be comfortable with Python’s object orientation. If not then all you need to know is that the model object hold everything we need. It holds parameters like the number of topics that we gave it when we created it; it also holds methods like the fitting method; once we fit it, it will hold fitted parameters which tell us how important different words are in different topics. We will apply this next and feed it our tf matrix

In [29]:
model.fit(tf)

LatentDirichletAllocation(batch_size=128, doc_topic_prior=None,
             evaluate_every=-1, learning_decay=0.7,
             learning_method='batch', learning_offset=10.0,
             max_doc_update_iter=100, max_iter=10, mean_change_tol=0.001,
             n_components=10, n_jobs=None, n_topics=None, perp_tol=0.1,
             random_state=0, topic_word_prior=None,
             total_samples=1000000.0, verbose=0)

Next we will want to inspect our topics that we generated and try to extract meaningful information from them.

Below I have written a function which takes in our model object model, the order of the words in our matrix tf_feature_names and the number of words we would like to show. Use this function, which returns a dataframe, to show you the topics we created. Remember that each topic is a list of words/tokens and weights

In [30]:
def display_topics(model, feature_names, no_top_words):
    topic_dict = {}
    for topic_idx, topic in enumerate(model.components_):
        topic_dict["Topic %d words" % (topic_idx)]= ['{}'.format(feature_names[i])
                        for i in topic.argsort()[:-no_top_words - 1:-1]]
        topic_dict["Topic %d weights" % (topic_idx)]= ['{:.1f}'.format(topic[i])
                        for i in topic.argsort()[:-no_top_words - 1:-1]]
    return pd.DataFrame(topic_dict)

#there are strange unicode characters in some texts, so use this code to remove
tf_feature_names = [w.replace(u'\u2014', '2014') for w in tf_feature_names]

no_top_words = 10
display_topics(model, tf_feature_names, no_top_words)

Unnamed: 0,Topic 0 weights,Topic 0 words,Topic 1 weights,Topic 1 words,Topic 2 weights,Topic 2 words,Topic 3 weights,Topic 3 words,Topic 4 weights,Topic 4 words,Topic 5 weights,Topic 5 words,Topic 6 weights,Topic 6 words,Topic 7 weights,Topic 7 words,Topic 8 weights,Topic 8 words,Topic 9 weights,Topic 9 words
0,969.1,'',835.1,i,805.1,theater,1198.1,chagall,565.1,jewish,1062.1,``,594.1,.,178.1,stage,994.1,'s,678.1,2014
1,598.1,yiddish,304.1,one,163.1,see,372.1,russian,339.1,in,919.1,the,566.1,art,161.1,but,324.1,world,222.1,n
2,219.1,life,202.1,moscow,140.1,paris,194.1,he,163.1,artist,123.1,murals,327.1,new,159.1,jews,212.1,a,200.1,'t
3,153.1,paintings,185.1,granovskii,124.1,culture,170.1,it,141.1,state,96.1,my,195.1,work,147.1,even,205.1,and,174.1,first
4,132.1,may,120.1,ot,99.1,for,159.1,years,124.1,this,84.1,director,188.1,time,146.1,two,176.1,painting,167.1,people
5,90.1,menakhem,106.1,museum,85.1,texts,143.1,us,119.1,chamber,78.1,well,167.1,would,96.1,old,160.1,also,154.1,marc
6,87.1,-mendel,104.1,could,84.1,exhibition,120.1,mikhoels,87.1,three,73.1,x,145.1,p,88.1,pp,125.1,language,115.1,york
7,74.1,name,103.1,artists,77.1,on,110.1,-,80.1,another,71.1,young,88.1,color,84.1,actors,99.1,figures,111.1,vitebsk
8,64.1,often,88.1,russia,73.1,they,80.1,made,79.1,to,69.1,green,79.1,painted,78.1,know,95.1,revolution,104.1,we
9,54.1,book,67.1,works,67.1,artistic,64.1,small,75.1,sholem,68.1,you,68.1,became,77.1,literature,89.1,whole,91.1,many


Now we have some topics, which are just clusters of words, we can try to figure out what they really mean.

Additional topic polishing can be achieved by :
> *  Part of Speech Tag Filter – POS tag filter is more about the context of the features than frequencies of features. Topic Modelling tries to map out the recurring patterns of terms into topics. However, every term might not be equally important contextually. For example, POS tag IN contain terms such as – “within”, “upon”, “except”. “CD” contains – “one”,”two”, “hundred” etc. “MD” contains “may”, “must” etc. These terms are the supporting words of a language and can be removed by studying their post tags.
> * Frequency Filter – Arrange every term according to its frequency. Terms with higher frequencies are more likely to appear in the results as compared ones with low frequency. The low frequency terms are essentially weak features of the corpus, hence it is a good practice to get rid of all those weak features. An exploratory analysis of terms and their frequency can help to decide what frequency value should be considered as the threshold.