#### MSDS 430 Module 6 Python Assignment

<font color=green> In this exercise you will work with TripAdvisor customer review data for the <b>BEST WESTERN PLUS Pioneer Square Hotel</b> in Seattle, Washington. The data is stored in a JSON file. JSON is a popular language-independent data format derived from JavaScript. In fact, JSON stands for JavaScript Object Notation. The `load` function in Python's `json` module can be used to parse a JSON file, with the result being a Python dictionary. Then, by using dictionary methods, we can extract the list of reviews for the hotel and use string methods to get information from the comments made by the users.</font>

The hotel data we want to analyze is contained in the (json) file `hotel_reviews.json`. The data includes some information about the hotel, and a number of hotel reviews made by people who (we assume) stayed there. When we read the data into Python we will end up with a "nested" dictionary, i.e. a dictionary some of whose values are also (lists of) dictionaries. Before we examine the structure of this nested dictionary we need to talk a bit about dictionaries in general.

Dictionaries in Python are data structures that store key/value pairs. The keys have to be of an "immutable" type (such as numbers, strings, or tuples of these), but the values can be various kinds of things, including lists, arrays, and other dictionaries. The keys also need to be unique: there can't be duplicates. Let us look at some examples.

In [1]:
# A dictionary with different types of keys: 1, "two" and (1,2). 
# Here (1,2) is an example of a tuple.
mixed_keys_dict = {1:"one", "two":2, (1,2):"ordered pair" }
mixed_keys_dict

{1: 'one', 'two': 2, (1, 2): 'ordered pair'}
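
The two restrictions on keys can be checked directly. The following is a minimal sketch (the names `bad_key_error` and `dup_dict` are made up for illustration): a list is rejected as a key, and a key repeated in a dictionary literal keeps only the last value.

```python
# Hypothetical names, used only for illustration.
bad_key_error = None
try:
    {[1, 2]: "a list as a key"}  # lists are mutable, so they cannot be keys
except TypeError as err:
    bad_key_error = err

print(type(bad_key_error).__name__)  # TypeError

# A repeated key does not create a duplicate; the later value replaces the earlier one.
dup_dict = {"a": 1, "a": 2}
print(dup_dict)  # {'a': 2}
```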

In [2]:
#  Let us define a simple dictionary with String keys: "name", "age" and "sex":
cust_dict = {"name":"John Doe","age": 32, "sex": "M"}
cust_dict

{'name': 'John Doe', 'age': 32, 'sex': 'M'}

The list of keys in a dictionary can be obtained by using the dictionary's `keys` method. Also, we can obtain the value of any key in the dictionary by "bracketing" the key. We could then use assignment to change the value of the key if we wished.

In [3]:
# Get the list of keys--actually a dict_keys object in Python 3.x.
cust_dict.keys()

dict_keys(['name', 'age', 'sex'])

In [4]:
# Get the value associated with the "name" key
cust_dict["name"]

'John Doe'
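
Bracketing only works for keys that exist. As a small sketch (using `sample_cust`, an illustrative copy of the dictionary above, and an `"email"` key that is assumed to be absent), a missing key raises a `KeyError`, while the dictionary's `get` method returns a default instead.

```python
# Illustrative copy of the customer dictionary from the cells above.
sample_cust = {"name": "John Doe", "age": 32, "sex": "M"}

try:
    sample_cust["email"]          # this key does not exist
except KeyError as err:
    print("Missing key:", err)    # Missing key: 'email'

# get() returns None (or a default you supply) instead of raising an error.
missing_email = sample_cust.get("email")
default_email = sample_cust.get("email", "unknown")
print(missing_email, default_email)  # None unknown
```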

In [5]:
# Change the value of the "name" key
cust_dict["name"] = "John Doe Jr."

In [6]:
cust_dict

{'name': 'John Doe Jr.', 'age': 32, 'sex': 'M'}

We can also use assignment to add new key/value pairs to the dictionary. 

In [7]:
cust_dict['height'] = 6.0
cust_dict['weight'] = 200.5
cust_dict

{'name': 'John Doe Jr.', 'age': 32, 'sex': 'M', 'height': 6.0, 'weight': 200.5}

Note that a dictionary must already exist before we can add keys to it, even if it is empty to begin with...

In [8]:
market_dict = {}  # create an empty dictionary
market_dict['market_name'] = 'Foods R Us'
market_dict

{'market_name': 'Foods R Us'}

Let us add a new key/value pair to `cust_dict`, where the key is `"location"` and the value of that key is another dictionary (with keys: `"city"`, `"state"` and `"zip code"`).

In [9]:
# Example of a nested dictionary...
location_dict = {"city":"Miami","state":"FL","zip code":33165}
cust_dict["location"]=location_dict
cust_dict

{'name': 'John Doe Jr.',
 'age': 32,
 'sex': 'M',
 'height': 6.0,
 'weight': 200.5,
 'location': {'city': 'Miami', 'state': 'FL', 'zip code': 33165}}

Note that the value of the `"location"` key is itself a dictionary, so we can access its values by "bracketing" again.

In [10]:
cust_dict['location']

{'city': 'Miami', 'state': 'FL', 'zip code': 33165}

In [11]:
cust_dict['location']['city']

'Miami'

In [12]:
cust_dict['location']['zip code']

33165
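
The same idea extends to looping over the inner dictionary. Here is a brief sketch, assuming a cut-down copy of the customer dictionary called `sample_cust` (a name chosen only for this example):

```python
# Cut-down, illustrative copy of the nested dictionary built above.
sample_cust = {
    "name": "John Doe Jr.",
    "location": {"city": "Miami", "state": "FL", "zip code": 33165},
}

# Each extra pair of brackets goes one level deeper.
city = sample_cust["location"]["city"]
print(city)  # Miami

# items() yields the key/value pairs of the inner dictionary.
location_lines = [f"{key}: {value}" for key, value in sample_cust["location"].items()]
print(location_lines)  # ['city: Miami', 'state: FL', 'zip code: 33165']
```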

 **Problem 1 (2 pts.)**: Use Python code to add a key/value pair to the `market_dict` dictionary defined above. We want the key to be `"fruits"` and its corresponding value to be an "inventory" dictionary. This "inventory" dictionary should consist of fruit names as keys (i.e. `apples`, `oranges` and `pears`). The value of each key should be the number of such fruits being sold at the market. Assume that there are 123 apples, 98 oranges and 53 pears on sale. After adding this key/value pair to `market_dict`, display `market_dict["fruits"]` to verify your work.

In [13]:
fruit_dict = {"apples":123,"oranges":98,"pears":53}
# TO DO: Add a key/value pair to the dictionary where the key is "fruits" and the value is fruit_dict

market_dict["fruits"] = fruit_dict

# The following should display the three keys: 'apples', 'oranges' and 'pears'.
print(market_dict['fruits'].keys())

dict_keys(['apples', 'oranges', 'pears'])


Now it is time to turn our attention to our JSON file. We want to open and read `hotel_reviews.json` and save the data as a Python dictionary to the variable `hotel_data`. This is a two-step process:

 1. Use the built-in `open` function to create a file object.
 2. Pass the file object to the `load` function in the `json` module. This function parses the contents of the file and returns a Python dictionary.
 
 But first we need to import the `json` module.

In [14]:
import json
with open('hotel_reviews.json') as json_data:
    hotel_data = json.load(json_data)
hotel_data

{'Reviews': [{'Ratings': {'Service': '4',
    'Cleanliness': '5',
    'Overall': '5.0',
    'Value': '4',
    'Sleep Quality': '4',
    'Rooms': '5',
    'Location': '5'},
   'AuthorLocation': 'Boston',
   'Title': '“Excellent Hotel & Location”',
   'Author': 'gowharr32',
   'ReviewID': 'UR126946257',
   'Content': 'We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff wa

The structure of `hotel_data` is a bit complicated but it is divided into two parts: a **HotelInfo** "section" (i.e. the value of the `'HotelInfo'` key) and the **Reviews** "section" (the value of the `'Reviews'` key).

In [15]:
hotel_data.keys()

dict_keys(['Reviews', 'HotelInfo'])
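
To try the two-step loading process without the real data file, the sketch below writes a tiny stand-in JSON document to a temporary file and reads it back. The contents of `sample_text` are made up; only the two top-level key names mirror `hotel_reviews.json`.

```python
import json
import os
import tempfile

# A tiny stand-in for hotel_reviews.json (made-up values, same two top-level keys).
sample_text = (
    '{"Reviews": [{"Author": "someone", "Content": "Nice stay."}], '
    '"HotelInfo": {"HotelID": "0"}}'
)

# Write the text to a temporary file so that open() and json.load() can be used.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    tmp.write(sample_text)
    sample_path = tmp.name

with open(sample_path, encoding="utf-8") as json_file:  # step 1: create a file object
    sample_data = json.load(json_file)                   # step 2: parse it into a dict

os.remove(sample_path)  # clean up the temporary file

print(type(sample_data).__name__)  # dict
print(list(sample_data.keys()))    # ['Reviews', 'HotelInfo']
```

The JSON object becomes a `dict`, the JSON array under `"Reviews"` becomes a `list`, and the nested objects become nested dictionaries.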

In [16]:
# The hotel information is stored in a dictionary.
hotel_data['HotelInfo']

{'Name': 'BEST WESTERN PLUS Pioneer Square Hotel',
 'HotelURL': '/ShowUserReviews-g60878-d72572-Reviews-BEST_WESTERN_PLUS_Pioneer_Square_Hotel-Seattle_Washington.html',
 'Price': '$117 - $189*',
 'Address': '<address class="addressReset"> <span rel="v:address"> <span dir="ltr"><span class="street-address" property="v:street-address">77 Yesler Way</span>, <span class="locality"><span property="v:locality">Seattle</span>, <span property="v:region">WA</span> <span property="v:postal-code">98104-2530</span></span> </span> </span> </address>',
 'HotelID': '72572',
 'ImgURL': 'http://media-cdn.tripadvisor.com/media/ProviderThumbnails/dirs/51/f5/51f5d5761c9d693626e59f8178be15442large.jpg'}

In [17]:
# The list of reviews with the data for each review also being stored in a dictionary.
hotel_data['Reviews']

[{'Ratings': {'Service': '4',
   'Cleanliness': '5',
   'Overall': '5.0',
   'Value': '4',
   'Sleep Quality': '4',
   'Rooms': '5',
   'Location': '5'},
  'AuthorLocation': 'Boston',
  'Title': '“Excellent Hotel & Location”',
  'Author': 'gowharr32',
  'ReviewID': 'UR126946257',
  'Content': 'We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledge

The hotel information is stored in a dictionary (with keys such as `'HotelID'` and `'Address'`), while the reviews are stored in a list--a list of dictionaries, with each dictionary containing information about a particular review. Let us get the list of reviews and save them to the `reviews` variable for further analysis.

In [18]:
reviews = hotel_data['Reviews'] # list of reviews
type(reviews) # check that it is a list

list

In [19]:
print("There are",len(reviews),"reviews altogether.")

There are 233 reviews altogether.


In [20]:
# display first review
first_review = reviews[0] 
first_review

{'Ratings': {'Service': '4',
  'Cleanliness': '5',
  'Overall': '5.0',
  'Value': '4',
  'Sleep Quality': '4',
  'Rooms': '5',
  'Location': '5'},
 'AuthorLocation': 'Boston',
 'Title': '“Excellent Hotel & Location”',
 'Author': 'gowharr32',
 'ReviewID': 'UR126946257',
 'Content': 'We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledgeable.',
 'Da

In [21]:
print("The first review's author is", first_review['Author'])

The first review's author is gowharr32


In [22]:
# or 
hotel_data['Reviews'][0]['Author']

'gowharr32'

In [23]:
print(first_review['Author'],"made the following comments:",'\n')
print(first_review['Content'])

gowharr32 made the following comments: 

We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledgeable.


We want to create a list with just the comments (strings). We do this by iterating over the list of reviews...

In [24]:
comment_lst = []  # will contain the review strings
for review in reviews:
    comment_lst.append(review['Content'])
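
The same list can also be built with a list comprehension, which some readers find more compact. This is only an alternative sketch; `sample_reviews` is a made-up two-item stand-in for the real `reviews` list.

```python
# Made-up stand-in for the list of review dictionaries.
sample_reviews = [
    {"Author": "reviewer_a", "Content": "Great location."},
    {"Author": "reviewer_b", "Content": "Clean rooms and friendly staff."},
]

# One expression replaces the empty list, the for loop, and the append call.
sample_comments = [review["Content"] for review in sample_reviews]
print(sample_comments)  # ['Great location.', 'Clean rooms and friendly staff.']
```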

In [25]:
## Appended list

comment_lst

['We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledgeable.',
 'Great visit to Seattle thanks to our stay at the Best Western Pioneer Square! The hotel was reasonably priced and close to everything we wanted to see - ferry ride, Underground Tour, Klondike Museum, short walk to Pike Market and other shopping. The staff was amazingly helpful and a

In [26]:
len(comment_lst) # contains 233 comments--one for each reviewer

233

In [27]:
first_comment=comment_lst[0]
print("The first comment in the comment list is:",'\n')
print(first_comment)

The first comment in the comment list is: 

We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledgeable.


We want to iterate over the list of comments and obtain information about the comments made by the reviewers. Since each of the comments is a string (`str`) object, we are going to need some string methods to extract the information. See, for example, https://www.w3schools.com/python/python_ref_string.asp. Let us illustrate some of the listed methods with the comments from the first reviewer.

In [28]:
# Create a new string with all characters made lowercase.
first_comment.lower()

'we enjoyed the best western pioneer square. my husband and i had a room with a king bed and it was clean, quiet, and attractive. our sons were in a room with twin beds. their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. but later hotels on the trip made them appreciate this one more. we loved the old wood center staircase. breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. location was great. we could walk to shops and restaurants as well as transportation. pike market was a reasonable walk. we enjoyed the nearby gold rush museum. very, very happy with our stay. staff was helpful and knowledgeable.'

In [29]:
# Find how many times the string "we" is mentioned in the comments.
first_comment.count("we")

2

In [30]:
# If we wanted a "case-insensitive" search of instances of "we", we can do this...
first_comment.lower().count("we")  # include "We" as well

7
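
Keep in mind that `count` looks for substrings, not whole words, so "we" is also found inside words such as "were" and "western". The sketch below contrasts the two kinds of count on a made-up sentence; stripping a few punctuation characters with `strip(".,!?")` is a simplification chosen for this example, not a full tokenizer.

```python
# Made-up sentence for illustration.
sample_comment = "We were near the Best Western. Later we walked west."

# Substring count: also matches the "we" inside "were", "western" and "west".
substring_count = sample_comment.lower().count("we")
print(substring_count)  # 5

# Whole-word count: split on whitespace, strip punctuation, then count exact matches.
words = [word.strip(".,!?") for word in sample_comment.lower().split()]
whole_word_count = words.count("we")
print(whole_word_count)  # 2
```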

**Problem 2 (4 pts.)**: Complete the loop below to display the number of reviews for this hotel in which the word "bathroom" appears.

In [31]:
counter = 0
for review in comment_lst:
    # To Do: Insert code in the for loop body to determine if "bathroom" appears in the current review
    # and if it does to increment the counter variable
    if "bathroom" in review.lower():
        counter += 1
        
print("There are", counter, "reviews that contain the word 'bathroom'.")
# https://realpython.com/python-string-formatting/#3-string-interpolation-f-strings-python-36
# print(f"There are {counter} reviews that contain the word 'bathroom'.")

There are 26 reviews that contain the word 'bathroom'.


**Problem 3 (5 pts.)**: Print the number of "wordy" comments. A comment is considered "wordy" if it contains more than 100 words. For example, "We stayed here and we liked it" contains 7 words.

In [32]:
counter = 0
for review in comment_lst:
    # TO DO: Insert code in the for loop body to get the number of "wordy" comments.
    if len(review.split()) > 100:
        counter += 1


# Let us check that we have 128 wordy comments.
print(f'There are {counter} wordy comments.')

There are 128 wordy comments.
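
The word count relies on `split()` with no arguments, which breaks a string on any run of whitespace. A short sketch using the example sentence from the problem statement (the variable names are illustrative):

```python
sample_sentence = "We stayed here and we liked it"
sample_words = sample_sentence.split()
print(sample_words)       # ['We', 'stayed', 'here', 'and', 'we', 'liked', 'it']
print(len(sample_words))  # 7

# Extra spaces and line breaks do not create empty "words".
messy_word_count = len("We   stayed\nhere ".split())
print(messy_word_count)   # 3
```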


We want to iterate over the `reviews` list again, this time saving the name of each reviewer together with their comments (in a dictionary).

**Problem 4 (5 pts.)**: Create a list of dictionaries (`ar_lst`), where each dictionary has two keys: `"Author"` and `"Comments"`. Do this by iterating over the list of reviews and, for each review, constructing a dictionary (`ar_dict`) containing the author's name and comments and then appending it to the list of dictionaries we are creating.

In [33]:
ar_lst = []
for review in reviews:
    # TO DO: (1) Create an empty dictionary, ar_dict.
    #        (2) Add two key/value pairs containing the author and comments, respectively.
    #        (3) Append this newly constructed dictionary, ar_dict, to the ar_lst list.

    ar_dict = {}
    ar_dict["Author"] = review['Author']
    ar_dict["Comments"] = review['Content']
    ar_lst.append(ar_dict)


# Let us check that we have 233 elements in the ar_lst list.
print(f'There are {len(ar_lst)} elements in the list.')


There are 233 elements in the list.
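
As an alternative sketch, each small dictionary can be written as a literal inside a list comprehension rather than being filled in key by key. The `sample_reviews` data below is made up, and the extra `"ReviewID"` key only shows that unused keys are left behind.

```python
# Made-up stand-in for the list of review dictionaries.
sample_reviews = [
    {"Author": "reviewer_a", "Content": "Great location.", "ReviewID": "X1"},
    {"Author": "reviewer_b", "Content": "Clean rooms.", "ReviewID": "X2"},
]

# Build each two-key dictionary with a literal instead of starting from {}.
sample_ar_lst = [
    {"Author": review["Author"], "Comments": review["Content"]}
    for review in sample_reviews
]
print(sample_ar_lst[1]["Author"], "said this:", sample_ar_lst[1]["Comments"])
```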


In [34]:
# Let us display the data from the fourth dictionary in the list, i.e. ar_lst[3].

# Should display something like this:

# TimothyFlorida said this:
# Accommodation in Seattle can be expensive. This hotel is very reasonably priced, 
# located just off Pioneer Square, close to shops and restaurants and public transport (light rail). 
# It appears to have recently been completely renovated in a period theme. 
# Rooms are small but well equipped comfortable and clean - reminds me a little of a European style hotel. 
# Staff are polite and helpful - in a genuine way. 
# It is suitable for all types of stays business to a romantic getaway. Next time your in Seattle stay here!

fourth_review = ar_lst[3]
print(fourth_review['Author'],'said this:')
print(fourth_review['Comments'])

TimothyFlorida said this:
Accommodation in Seattle can be expensive. This hotel is very reasonably priced, located just off Pioneer Square, close to shops and restaurants and public transport (light rail). It appears to have recently been completely renovated in a period theme. Rooms are small but well equipped comfortable and clean - reminds me a little of a European style hotel. Staff are polite and helpful - in a genuine way. It is suitable for all types of stays business to a romantic getaway. Next time your in Seattle stay here!


In the following exercise we want to count the number of *different*, i.e. *unique*, words in each of the comments. We previously learned how to split a string to create a list of words. We can write code from scratch to count the number of different words in the list. Alternatively, we can convert the list to another container data type that makes it easier to obtain this information. The `collections` module defines the `Counter` class. A `Counter` is basically a "special type" of dictionary. Given a list object `my_list` we can turn it into a `Counter` object as follows: `Counter(my_list)`. This assumes we have already imported the class from the `collections` module: `from collections import Counter`. See http://rahmonov.me/posts/python-collections-counter/. 

For example,

```python
from collections import Counter
my_list = ['a', 'b', 'c', 'c', 'a', 'd', 'b', 'e', 'a']
Counter(my_list)
```
creates the Counter object:

```python
Counter({'a': 3, 'b': 2, 'c': 2, 'd': 1, 'e': 1})
```

This tells you that the letter `a` appears `3` times in the list, etc.

We can get the keys and values just like with dictionaries:

```python
list(Counter(my_list).keys())
```

returns

```python
['a', 'b', 'c', 'd', 'e']
```

and

```python
list(Counter(my_list).values())
```

returns

```python
[3, 2, 2, 1, 1]
```
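
A few other `Counter` features are handy here. The sketch below reuses `my_list` from the example; `letter_counts` is just an illustrative name.

```python
from collections import Counter

my_list = ['a', 'b', 'c', 'c', 'a', 'd', 'b', 'e', 'a']
letter_counts = Counter(my_list)

print(letter_counts['a'])            # 3
print(letter_counts['z'])            # 0, a missing key counts as zero
print(len(letter_counts))            # 5 distinct items
print(letter_counts.most_common(1))  # [('a', 3)]
```

Since `len` of a `Counter` is the number of distinct items, it gives the number of unique elements in the original list directly.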

**Problem 5 (4 pts.)**: Iterate over `ar_lst` and print the name of each reviewer (author) and the total number of *different*, i.e. *unique*, words in their review. For example, "We stayed here and we liked it" contains 6 *unique* words (ignoring case) since 'we' is repeated.

In [36]:
# Testing 

from collections import Counter
first_review = ar_lst[0]
print(first_review['Author'])
print(first_review['Comments'])
print(len(Counter(first_review['Comments'].split()).keys()))

gowharr32
We enjoyed the Best Western Pioneer Square. My husband and I had a room with a king bed and it was clean, quiet, and attractive. Our sons were in a room with twin beds. Their room was in the corner on the main street and they said it was a little noisier and the neon light shone in. But later hotels on the trip made them appreciate this one more. We loved the old wood center staircase. Breakfast was included and everyone was happy with waffles, toast, cereal, and an egg meal. Location was great. We could walk to shops and restaurants as well as transportation. Pike Market was a reasonable walk. We enjoyed the nearby Gold Rush Museum. Very, very happy with our stay. Staff was helpful and knowledgeable.
91
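
One caveat: `split()` keeps the original capitalization, so "We" and "we" are counted as two different words unless the text is lowercased first. The following sketch uses the example sentence from Problem 5 to show the difference (the variable names are illustrative):

```python
from collections import Counter

sample_sentence = "We stayed here and we liked it"

# Case-sensitive: 'We' and 'we' are separate keys.
case_sensitive = len(Counter(sample_sentence.split()))
print(case_sensitive)    # 7

# Case-insensitive: lowercase first, so both spellings collapse into 'we'.
case_insensitive = len(Counter(sample_sentence.lower().split()))
print(case_insensitive)  # 6
```

Punctuation attached to a word (for example "walk." versus "walk") has a similar effect, so the counts depend on how much cleanup is applied before counting.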


In [37]:
from collections import Counter
for review in ar_lst:
    # TO DO: (1) Get the number of unique words in the current review variable.
    #        (2) Print the author's name and the number of unique words in the review, e.g.
    #            the first line printed might look something like this: 
    #            gowharr32 used 91 words.
    author = review['Author']
    unique_word_count = len(Counter(review['Comments'].split()))
    print(author, "used", unique_word_count, "words.")

gowharr32 used 91 words.
Nancy W used 90 words.
Janet H used 44 words.
TimothyFlorida used 70 words.
KarenArmstrong_BC used 111 words.
Shane33333 used 51 words.
Bnkruzn used 16 words.
Teacherbear used 55 words.
CandyGnomad used 50 words.
idahosandy used 17 words.
CW2S used 157 words.
jimmy62_11 used 84 words.
BoulderIllini used 136 words.
funlovingdad used 107 words.
suntraveler222 used 222 words.
rosariodurao used 41 words.
Jody R used 91 words.
Tasha M used 25 words.
Roy C used 89 words.
MikeGB2 used 113 words.
Jennie S used 64 words.
mcdonothing used 85 words.
SWanjiru used 109 words.
trish0 used 53 words.
txlnstr used 25 words.
mydogisfat used 49 words.
DaddyHoward used 101 words.
BCisBeautiful used 127 words.
wildorchid416 used 188 words.
JCPCG used 87 words.
quilter1975 used 53 words.
Zyg022 used 62 words.
daglou used 31 words.
seeknfind used 140 words.
photodiver12 used 100 words.
Mazneed used 140 words.
Barbra_M used 23 words.
bleepingfreezing used 45 words.
deborjak used 43 wo