### Using NLTK for Named Entity Recognition (Location)


In [31]:
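# load the sample text
# (the NLTK tokenizer, tagger, and NE chunker models must be fetched once
# with nltk.download(); the exact resource names depend on the NLTK version)
import nltk

with open('NE.txt', 'r') as f:
    text = f.read()

print(text)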

Eisenstein was born to a middle-class family in Riga, Latvia (then part of the Russian Empire in the Governorate of Livonia), but his family moved frequently in his early years, as Eisenstein continued to do throughout his life. His father, Mikhail Osipovich Eisenstein, was born to a German Jewish father who had converted to Christianity, Osip Eisenstein, and a mother of Swedish descent. His mother, Julia Ivanovna Konetskaya, was from a Russian Orthodox family. According to other sources, both of his paternal grandparents were of Baltic German descent. His father was an architect and his mother was the daughter of a prosperous merchant. Julia left Riga the same year as the Russian Revolution of 1905, taking Sergei with her to St. Petersburg.[6] Her son would return at times to see his father, who joined them around 1910.[7] Divorce followed and Julia left the family to live in France. Eisenstein was raised as an Orthodox Christian, but became an atheist later on.


### Steps
- First, we split the text into sentences using the sentence segmenter `nltk.sent_tokenize`.
- Each sentence is then subdivided into words using the word tokenizer `nltk.word_tokenize`.
- Next, the words of each sentence are tagged with part-of-speech tags using `nltk.pos_tag`, which is very helpful for the next step, named entity detection.
- The tagged sentences are then chunked using `nltk.ne_chunk`. Chunking groups consecutive tokens of the sequence into phrases, for example noun phrases or verb groups; `nltk.ne_chunk` groups tokens into named-entity chunks and attaches an entity label to each chunk.
- After chunking, a named entity is labeled "GPE" (geo-political entity) if the chunk is a location name.
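To make the last two steps concrete, here is a minimal sketch of how a chunked sentence is represented and how its GPE chunks can be read out. The tree below is built by hand as a stand-in for what `nltk.ne_chunk` returns; the example sentence, tags, and labels are illustrative assumptions, not real model output. It only needs the `nltk` package itself, not the downloaded models.

```python
from nltk.tree import Tree

# Hand-built stand-in for the result of nltk.ne_chunk on one tagged sentence.
# The tags and entity labels are illustrative, not actual model output.
sentence_tree = Tree('S', [
    ('Julia', 'NNP'),
    ('left', 'VBD'),
    Tree('GPE', [('Riga', 'NNP')]),
    ('for', 'IN'),
    Tree('GPE', [('St.', 'NNP'), ('Petersburg', 'NNP')]),
])

locations = []
for chunk in sentence_tree:
    # Untagged-entity tokens are plain (word, tag) tuples with no label();
    # named entities are subtrees whose label is the entity type.
    if hasattr(chunk, 'label') and chunk.label() == 'GPE':
        locations.append(' '.join(word for word, tag in chunk.leaves()))

print(locations)
```

Each subtree keeps its tokens together, so a multi-word name comes out as a single string when the chunker groups it as one entity.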


The following code implements the steps listed above.

In [33]:
# create a list to store location names
loc_names = []
# tokenize sentences
for sent in nltk.sent_tokenize(text):
    # tokenize words, tag words, then chunk the words
    for chunk in nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(sent))):
        # only subtrees have a label; keep those tagged as GPE
        if hasattr(chunk, 'label') and chunk.label() == 'GPE':
            # each leaf is a (word, tag) pair; join the words of the chunk
            loc = ' '.join(leaf[0] for leaf in chunk.leaves())
            loc_names.append(loc)

print(loc_names)

['Riga', 'Latvia', 'Russian', 'Livonia', 'German', 'Christianity', 'Swedish', 'Russian', 'Baltic', 'German', 'Russian', 'St', 'Petersburg', 'France']


### Using geotext package for location extraction
[Geotext](https://pypi.python.org/pypi/geotext) is a Python package that extracts country and city mentions from text.

#### Install geotext
First, install geotext by typing the following command in the terminal:
    
    pip install geotext

Next, we will try it on our text.

In [34]:
from geotext import GeoText
places = GeoText(text)
print(places.countries)
print(places.cities)

['Latvia', 'France']
['Riga', 'Livonia', 'Riga', 'Petersburg']
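
The city list contains one entry per mention, so 'Riga' appears twice. As a small sketch of how such a list might be summarised, the snippet below counts mentions and removes duplicates while keeping first-appearance order. It uses only the standard library, and the `cities` list is copied from the output above rather than computed, so it runs without geotext installed.

```python
from collections import Counter

# City mentions copied from the printed output above (not recomputed here).
cities = ['Riga', 'Livonia', 'Riga', 'Petersburg']

# Count how often each city is mentioned.
city_counts = Counter(cities)

# Drop repeats while preserving the order of first appearance.
unique_cities = list(dict.fromkeys(cities))

print(city_counts.most_common())
print(unique_cities)
```

The same two lines would work on `loc_names` from the NLTK example, since both are plain lists of strings.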
