# Text Analytics

Natural Language Processing (NLP) is a branch of artificial intelligence (AI) that deals with written and spoken language. You can use NLP to build solutions that extract semantic meaning from text or speech, or that formulate meaningful responses in natural language.

Microsoft Azure *cognitive services* includes the *Text Analytics* service, which provides some out-of-the-box NLP capabilities, including the identification of key phrases in text, and the classification of text based on sentiment.

<p style='text-align:center'><img src='./images/NLP.jpg' alt='A robot reading a notebook'/></p>

For example, suppose the fictional *Margie's Travel* organization encourages customers to submit reviews for hotel stays. You could use the Text Analytics service to summarize the reviews by extracting key phrases, determine which reviews are positive and which are negative, or analyze the review text for mentions of known entities such as locations or people.

## View Review Documents

Let's start by taking a look at some hotel reviews that have been left by customers.

The reviews are in text files. To see them, run the cell below by clicking its green <span style="color:green">&#9655;</span> button (at the top left of the cell).

```python
import os

# Read the reviews in the data/text/reviews folder
reviews_folder = os.path.join('data', 'text', 'reviews')
reviews = []
for file_name in os.listdir(reviews_folder):
    # Use a context manager so each file is closed after it is read
    with open(os.path.join(reviews_folder, file_name)) as review_file:
        review_text = review_file.read()
    # Each document needs an id, a language code, and the text to analyze
    reviews.append({"id": file_name, "language": "en", "text": review_text})

# Print the id and text of each review
for review in reviews:
    print(review['id'], '\n', review['text'], '\n')
```

```
review4.txt 
 Very noisy and rooms are tiny
The Lombard Hotel, San Francisco, USA
9/5/2018
Hotel is located on Lombard street which is a very busy SIX lane street directly off the Golden Gate Bridge. Traffic from early morning until late at night especially on weekends. Noise would not be so bad if rooms were better insulated but they are not. Had to put cotton balls in my ears to be able to sleep--was too tired to enjoy the city the next day. Rooms are TINY. I picked the room because it had two queen size beds--but the room barely had space to fit them. With family of four in the room it was tight. With all that said, rooms are clean and they've made an effort to update them. The hotel is in Marina district with lots of good places to eat, within walking distance to Presidio. May be good hotel for young stay-up-late adults on a budget
 

review1.txt 
 Good Hotel and staff
The Royal Hotel, London, UK
3/2/2018
Clean rooms, good service, great location near Buckingham Palace and Westmins
```
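
Each review becomes a dictionary with `id`, `language`, and `text` keys. If you want to experiment without the files on disk, a minimal sketch like the following builds the same structure from in-memory strings. The `make_documents` helper and the sample data are hypothetical, and skipping blank reviews is an assumption on our part rather than a documented requirement of the service.

```python
def make_documents(texts_by_name, language="en"):
    """Build Text Analytics input documents from a {file_name: text} mapping.

    Each document is a dict with the 'id', 'language', and 'text' keys used
    in the cell above. The default 'en' assumes every review is in English.
    """
    documents = []
    for name, text in texts_by_name.items():
        # Skip blank reviews (an assumption: there is nothing useful to analyze)
        if not text.strip():
            continue
        documents.append({"id": name, "language": language, "text": text.strip()})
    return documents


# Hypothetical sample data; review5.txt stands in for an empty file
sample = {
    "review1.txt": "Good Hotel and staff\nThe Royal Hotel, London, UK\n",
    "review5.txt": "   \n",
}
docs = make_documents(sample)
print(len(docs), docs[0]["id"])  # prints: 1 review1.txt
```

Passing an explicit `language` lets you reuse the helper if some reviews are written in another language.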

## Create a Cognitive Services Resource

To analyze the text in these reviews, you can use the **Text Analytics** cognitive service. To use it, you need to provision either a **Text Analytics** or a **Cognitive Services** resource in your Azure subscription. Use a Text Analytics resource if this is the only service you plan to use or if you want to track its usage separately; otherwise, use a Cognitive Services resource, which combines the Text Analytics service with other cognitive services so that developers can access them all through a single endpoint and key.

If you don't already have one, use the following steps to create a **Cognitive Services** resource in your Azure subscription:

1. In another browser tab, open the Azure portal at https://portal.azure.com, signing in with your Microsoft account.
2. Click the **&#65291;Create a resource** button, search for *Cognitive Services*, and create a **Cognitive Services** resource with the following settings:
    - **Name**: *Enter a unique name*.
    - **Subscription**: *Your Azure subscription*.
    - **Location**: *Any available location*.
    - **Pricing tier**: S0
    - **Resource group**: *Create a resource group with a unique name*.
3. Wait for deployment to complete. Then go to your cognitive services resource, and on the **Quick start** page, note the keys and endpoint. You will need these to connect to your cognitive services resource from client applications.

## Get the Key and Endpoint for your Cognitive Services Resource

To use your cognitive services resource, client applications need its endpoint and authentication key:

1. In the Azure portal, on the **Quick start** page for your Cognitive Services resource, copy the **Key1** for your resource and paste it in the code below, replacing **YOUR_COG_KEY**.
2. Copy the **endpoint** for your resource and paste it in the code below, replacing **YOUR_COG_ENDPOINT**.
3. Run the cell below.

```python
# Replace the placeholders with the key and endpoint of your resource
cog_key = 'YOUR_COG_KEY'
cog_endpoint = 'YOUR_COG_ENDPOINT'

# Don't print the key itself: anyone who sees it can use your resource
print('Ready to use cognitive services at {}'.format(cog_endpoint))
```

```
Ready to use cognitive services at https://westus2.api.cognitive.microsoft.com/
```
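
Pasting a key into a notebook makes it easy to share or commit by accident. A common alternative is to read the key and endpoint from environment variables and to show at most the last few characters when logging. This is a minimal sketch: `COG_KEY` and `COG_ENDPOINT` are hypothetical variable names, and the `load_cog_settings` and `mask` helpers are illustrative, not part of any Azure SDK.

```python
import os


def load_cog_settings(environ=os.environ):
    """Read the key and endpoint from (hypothetically named) environment variables."""
    key = environ.get("COG_KEY")
    endpoint = environ.get("COG_ENDPOINT")
    if not key or not endpoint:
        raise RuntimeError("Set COG_KEY and COG_ENDPOINT before running the notebook")
    return key, endpoint


def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret for safe logging."""
    if len(secret) <= visible:
        # Too short to reveal anything safely
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# Example with a fake key and endpoint (not real credentials)
key, endpoint = load_cog_settings({
    "COG_KEY": "0123456789abcdef",
    "COG_ENDPOINT": "https://example.cognitiveservices.azure.com/",
})
print(endpoint, mask(key))
```

In the notebook you would call `load_cog_settings()` with no argument so that it reads the real process environment.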


## Extract Key Phrases

Let's start by analyzing the text in the customer reviews to identify key phrases that give some indication of the main talking points.

```python
from azure.cognitiveservices.language.textanalytics import TextAnalyticsClient
from msrest.authentication import CognitiveServicesCredentials

# Get a client for your Text Analytics cognitive service resource
text_analytics_client = TextAnalyticsClient(endpoint=cog_endpoint,
                                            credentials=CognitiveServicesCredentials(cog_key))

# Analyze the reviews you read from the data/text/reviews folder earlier
key_phrase_analysis = text_analytics_client.key_phrases(documents=reviews)

# Print the key phrases found in each review, matched by document id
for document in key_phrase_analysis.documents:
    print(document.id)
    print('\nKey Phrases:')
    for key_phrase in document.key_phrases:
        print('\t', key_phrase)
    print('\n')
```

```
review4.txt

Key Phrases:
	 rooms
	 good hotel
	 Lombard Hotel
	 Lombard street
	 late adults
	 good places
	 lane street
	 young stay
	 night
	 early morning
	 Marina district
	 San Francisco
	 USA
	 Golden Gate Bridge
	 walking distance
	 queen size beds
	 ears
	 Traffic
	 cotton balls
	 city
	 Presidio
	 weekends
	 budget
	 day
	 effort
	 Noise
	 space
	 family


review1.txt

Key Phrases:
	 Good Hotel
	 good service
	 Clean rooms
	 Royal Hotel
	 great location
	 Buckingham Palace
	 Westminster Abbey
	 fish
	 West coast
	 lounge
	 bedroom
	 enormous bathroom
	 group
	 kitchen
	 London
	 UK
	 taster menu
	 Michelin Star
	 staff
	 courtyard


review3.txt

Key Phrases:
	 helpful staff
	 Lombard Street
	 Good location
	 Chestnut Street
	 Lombard Hotel
	 Marina district
	 traffic noise
	 San Francisco Museum of Fine Arts
	 good view of Golden Gate bridge
	 trendy area
	 USA
	 city
	 bus route
	 busy road
	 centre
	 restaurants
	 Rooms
	 interesting houses
	 reviews


review2.txt

Key Phra
```

The key phrases can help you understand the most important talking points in each review. For example, a review containing the phrase "helpful staff" or "poor service" gives you an indication of what mattered most to the reviewer.
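
To see which talking points come up across *all* the reviews rather than one review at a time, you could count how many reviews mention each phrase. The following minimal sketch uses Python's `collections.Counter`; `top_key_phrases` is a hypothetical helper, the sample phrases are copied from the output above, and lower-casing phrases so that "Rooms" and "rooms" count as the same phrase is an assumption about what you want to merge.

```python
from collections import Counter


def top_key_phrases(phrases_by_review, n=5):
    """Return the n phrases mentioned by the most reviews (case-insensitive).

    Pass n=None to get every phrase, most frequent first.
    """
    counts = Counter()
    for phrases in phrases_by_review.values():
        # Use a set so each phrase counts at most once per review
        counts.update({phrase.lower() for phrase in phrases})
    return counts.most_common(n)


# A few phrases taken from the key phrase output above
phrases_by_review = {
    "review4.txt": ["rooms", "Lombard Hotel", "Marina district", "Noise"],
    "review3.txt": ["Rooms", "Lombard Hotel", "Marina district", "traffic noise"],
    "review1.txt": ["Clean rooms", "Royal Hotel", "staff"],
}
print(top_key_phrases(phrases_by_review, n=3))
```

In the notebook you could build the input from the service results with `{d.id: d.key_phrases for d in key_phrase_analysis.documents}`. Phrases with equal counts may come back in any order.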

## Determine Sentiment

It might be useful to classify the reviews as *positive* or *negative* based on a *sentiment score* between 0 (negative) and 1 (positive). Again, you can use the Text Analytics service to do this.

```python
# Use the client and reviews from the previous cells to get sentiment scores
sentiment_analysis = text_analytics_client.sentiment(documents=reviews)

# Print the results for each review
for document in sentiment_analysis.documents:
    # Scores range from 0 (negative) to 1 (positive);
    # classify the review as 'positive' if its score is 0.5 or more
    sentiment = 'negative' if document.score < 0.5 else 'positive'

    # Print the file name, sentiment, and score
    print('{} : {} ({})'.format(document.id, sentiment, document.score))
```

```
review4.txt : negative (0.0739937424659729)
review1.txt : positive (0.9926161766052246)
review3.txt : positive (0.9941426515579224)
review2.txt : negative (0.025993913412094116)
```
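
A single 0.5 cut-off forces every review into one of two classes, even when the score is borderline. One alternative, sketched below on the assumption that a three-way split suits your reporting, is to treat scores near the middle as *neutral*. The `classify_sentiment` function and its 0.4 to 0.6 neutral band are hypothetical choices for illustration, not thresholds defined by the Text Analytics service.

```python
def classify_sentiment(score, low=0.4, high=0.6):
    """Map a 0-1 sentiment score to 'negative', 'neutral', or 'positive'.

    Scores below `low` are negative, scores above `high` are positive,
    and scores in the inclusive band [low, high] are neutral.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < low:
        return "negative"
    if score > high:
        return "positive"
    return "neutral"


# Two scores from the output above, plus a made-up borderline score
for score in (0.0739937424659729, 0.9926161766052246, 0.52):
    print(score, classify_sentiment(score))
```

Widening or narrowing the band trades fewer misclassified borderline reviews against more reviews left unclassified.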


## Extract Known Entities

*Entities* are items mentioned in text that refer to a commonly understood type of thing, such as a location, a person, or a date. Suppose you're interested in the dates and places mentioned in the reviews; you can use the following code to find them.

```python
# Use the client and reviews from the previous cells to get named entities
entity_analysis = text_analytics_client.entities(documents=reviews)

# Print the results for each review
for document in entity_analysis.documents:
    print(document.id)
    # Get the named entities in this review
    for entity in document.entities:
        # Only show date/time and location entities
        if entity.type in ['DateTime', 'Location']:
            # Append the Wikipedia URL when the entity is linked to a Wikipedia page
            link = '(' + entity.wikipedia_url + ')' if entity.wikipedia_id is not None else ''
            print(' - {}: {} {}'.format(entity.type, entity.name, link))
```

```
review4.txt
 - Location: San Francisco (https://en.wikipedia.org/wiki/San_Francisco)
 - Location: Lombard 
 - Location: Lombard Street (San Francisco) (https://en.wikipedia.org/wiki/Lombard_Street_(San_Francisco))
 - Location: Golden Gate Bridge (https://en.wikipedia.org/wiki/Golden_Gate_Bridge)
 - DateTime: from early morning 
 - DateTime: night 
 - DateTime: the next day 
 - Location: Marina District, San Francisco (https://en.wikipedia.org/wiki/Marina_District,_San_Francisco)
 - Location: Marina 
 - Location: Presidio of San Francisco (https://en.wikipedia.org/wiki/Presidio_of_San_Francisco)
review1.txt
 - Location: London (https://en.wikipedia.org/wiki/London)
 - DateTime: 3/2/2018 
 - Location: Buckingham Palace (https://en.wikipedia.org/wiki/Buckingham_Palace)
 - Location: Westminster Abbey (https://en.wikipedia.org/wiki/Westminster_Abbey)
review3.txt
 - Location: San Francisco (https://en.wikipedia.org/wiki/San_Francisco)
 - DateTime: 8/16/2018 
 - DateTime: August 
 - Location:
```

Note that some entities are sufficiently well-known to have an associated Wikipedia page, in which case the Text Analytics service returns the URL for that page.
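
If you want a summary per entity type rather than a flat list, you could group the results. The sketch below uses a `namedtuple` as a hypothetical stand-in for the service's entity records, modeling only the `name`, `type`, and `wikipedia_url` attributes used in the cell above, so it runs without calling the service; `group_entities` is likewise a hypothetical helper.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the entity records returned by the service
Entity = namedtuple("Entity", ["name", "type", "wikipedia_url"])


def group_entities(entities, wanted=("DateTime", "Location")):
    """Group (name, wikipedia_url) pairs by entity type, keeping only wanted types."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity.type in wanted:
            grouped[entity.type].append((entity.name, entity.wikipedia_url))
    return dict(grouped)


# Sample records based on the review1.txt output above
sample = [
    Entity("London", "Location", "https://en.wikipedia.org/wiki/London"),
    Entity("3/2/2018", "DateTime", None),
    Entity("Buckingham Palace", "Location", "https://en.wikipedia.org/wiki/Buckingham_Palace"),
]
for entity_type, items in group_entities(sample).items():
    print(entity_type)
    for name, url in items:
        # Show the Wikipedia link only when the entity has one
        print("  -", name, f"({url})" if url else "")
```

Because the real records expose the same attribute names, you should be able to pass `document.entities` from the cell above straight to `group_entities`.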

## Learn More

For more information about the Text Analytics service, see [the Text Analytics service documentation](https://docs.microsoft.com/azure/cognitive-services/text-analytics/).