# Analyzing Text

It may be surprising that not all travel articles include a map. Understanding where restaurants, hotels, landmarks, and other points of interest are in relation to each other is important for itinerary building. Parsing text to look up places is a good application of forward geocoding.

Read this blog post for more background:

https://developer.here.com/blog/turn-text-into-here-maps-with-python-nltk

Examples of travel articles without a map:
- [25 Best Things to Do in Cleveland, OH](https://vacationidea.com/destinations/best-things-to-do-in-cleveland.html)
- [US News Travel Section](https://travel.usnews.com/Cleveland_OH/Things_To_Do/)

In [None]:
import bs4
import nltk
import urllib.request

from nltk.tokenize import RegexpTokenizer
from nltk.corpus import stopwords
from nltk.tag import pos_tag

nltk.download('stopwords')
nltk.download('averaged_perceptron_tagger')

In [None]:
url = 'https://vacationidea.com/destinations/best-things-to-do-in-cleveland.html'
response = urllib.request.urlopen(url)
html = response.read()
soup = bs4.BeautifulSoup(html, 'html.parser')
soup

In [None]:
# Drop script and style elements so only the visible text remains
for section in soup(['script', 'style']):
    section.decompose()

text = soup.get_text()
text

In [None]:
# Ignore punctuation, duplicates
tokenizer = RegexpTokenizer(r'\w+')
tokens = set(tokenizer.tokenize(text))
tokens

In [None]:
# Remove stop words (case-insensitively), get proper nouns
stop_words_set = set(stopwords.words())
tokens = [w for w in tokens if w.lower() not in stop_words_set]
proper = pos_tag(tokens)
tokens = [w for w, pos in proper if pos in ('NNP', 'NNPS')]

tokens

# Geocoder Autocomplete

The Geocoder Autocomplete API returns a list of address suggestions for a search text. It can be used interactively to test for a match as one types, or run over a list of tokens.

In [None]:
import os
import requests

APP_ID_HERE = os.environ['APP_ID_HERE']
APP_CODE_HERE = os.environ['APP_CODE_HERE']

uri = 'https://autocomplete.geocoder.api.here.com/6.2/suggest.json'
params = {
    'app_id': APP_ID_HERE,
    'app_code': APP_CODE_HERE,
    'query': 'Charlottesville',
}

response = requests.get(uri, params=params)
response.json()
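
To run autocomplete over a list of tokens rather than a single query, the request above can be repeated once per token. The following is a minimal sketch under stated assumptions, not a definitive implementation: `suggest_for_tokens` and `here_suggest` are hypothetical helper names, and reading results from a `suggestions` key is an assumption about the shape of the JSON response.

```python
import os


def suggest_for_tokens(tokens, fetch):
    """Map each distinct token to the non-empty result of fetch(token)."""
    matches = {}
    for token in sorted(set(tokens)):
        suggestions = fetch(token)
        if suggestions:
            matches[token] = suggestions
    return matches


def here_suggest(token):
    """Query the autocomplete endpoint from the cell above for one token."""
    import requests  # imported lazily so suggest_for_tokens has no dependencies

    params = {
        'app_id': os.environ['APP_ID_HERE'],
        'app_code': os.environ['APP_CODE_HERE'],
        'query': token,
    }
    response = requests.get(
        'https://autocomplete.geocoder.api.here.com/6.2/suggest.json',
        params=params,
        timeout=10,
    )
    response.raise_for_status()
    # Assumption: matches are listed under a 'suggestions' key.
    return response.json().get('suggestions', [])
```

Calling `suggest_for_tokens(tokens, here_suggest)` would send one request per distinct token. Passing the fetch function as an argument keeps the loop easy to try out with a stand-in function before spending API calls.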

# Try It

Parsing street addresses is tricky, but give it a try: look for combinations of tokens that, when sent to autocomplete, help you identify location matches.
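
One possible starting point is to build candidate phrases from neighboring words before sending them to autocomplete. The sketch below is only an illustration: `candidate_phrases` is a hypothetical helper, it works on the original `text` because the `tokens` set above no longer preserves word order, and treating runs of capitalized words as candidates is an assumption that will miss street numbers and will sometimes join words across sentence boundaries.

```python
import re


def candidate_phrases(text, max_words=3):
    """Return runs of 2..max_words consecutive capitalized words."""
    words = re.findall(r'\w+', text)
    phrases = []
    for start in range(len(words)):
        for size in range(2, max_words + 1):
            window = words[start:start + size]
            # Keep only full-length windows where every word is capitalized.
            if len(window) == size and all(w[0].isupper() for w in window):
                phrases.append(' '.join(window))
    return phrases
```

Each phrase returned can then be used as the `query` value in an autocomplete request to see whether it matches a location.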