# Reviews Challenge

The goal of this challenge is to use data from Google Maps restaurant reviews to build a Natural Language Processing (NLP) pipeline. For example, you could train a Deep Learning model to generate new reviews, or try to predict the number of stars a reviewer gave a restaurant from the text of their review.

If you prefer, you can also use this dataset for data analysis or visualization. For example, you could show the relationship between a restaurant's location, price and average rating. However, we encourage you to try some NLP/ML tasks. Even if you have no previous experience in Deep Learning or Machine Learning, you can learn the basics over the weekend. At the end of the notebook we share some resources we think might be helpful.
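As a starting point for the analysis idea above, here is a minimal sketch that averages the restaurant-level `rating` for each value of `price`. Both field names come from the record keys shown later in this notebook, but the sample values (such as `"$$"`) and the helper name `average_rating_by_price` are assumptions made for illustration only.

```python
from collections import defaultdict


def average_rating_by_price(records):
    """Group restaurants by their 'price' field and average their 'rating'."""
    totals = defaultdict(lambda: [0.0, 0])  # price -> [sum of ratings, count]
    for record in records:
        price = record.get("price")
        rating = record.get("rating")
        if price is None or rating is None:
            continue  # skip restaurants with a missing field
        totals[price][0] += rating
        totals[price][1] += 1
    return {price: total / count for price, (total, count) in totals.items()}


# Tiny made-up sample, only to show the expected input shape.
sample = [
    {"title": "A", "price": "$$", "rating": 4.5},
    {"title": "B", "price": "$$", "rating": 3.5},
    {"title": "C", "price": "$", "rating": 4.0},
    {"title": "D", "rating": 5.0},  # no price, so it is ignored
]
print(average_rating_by_price(sample))
```

With the real data you would pass the list loaded from *train_reviews.json* instead of `sample`.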

You may even deploy the NLP models you develop! Take, for example, a web app where a user enters a review and receives a predicted rating for it. The sky is the limit, so get your creative juices flowing!

This notebook begins with a brief description of the dataset and continues with an example NLP task, along with some tips on how to tackle this challenge. At the end, we link to resources for those of you who want to learn more about NLP and ML. Whenever you build an ML solution, remember to **train and validate** your models using *train_reviews.json*, and report your accuracy or other metrics on the **test data** in *test_reviews.json*.
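One possible way to follow this advice is to hold out part of the training restaurants as a validation set and leave *test_reviews.json* untouched until the final evaluation. The sketch below is only an illustration: the helper name `split_train_validation`, the 20% validation fraction and the fixed seed are assumptions, not part of the challenge rules.

```python
import random


def split_train_validation(records, validation_fraction=0.2, seed=0):
    """Shuffle the restaurant records and split them into train and validation lists."""
    shuffled = list(records)  # copy, so the caller's list is not modified
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


# Made-up records standing in for the list loaded from train_reviews.json.
records = [{"title": f"restaurant {i}"} for i in range(10)]
train_part, validation_part = split_train_validation(records)
print(len(train_part), len(validation_part))
```

Splitting by restaurant rather than by individual review keeps all reviews of one restaurant on the same side of the split, which is one reasonable design choice among several.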

## Taking a look at our data

First of all let's take a look at the data:

In [1]:
import json 

In [2]:
# Open the file in a context manager so it is closed after loading
with open("train_reviews.json") as f:
    json_data = json.load(f)

In [3]:
print("Number of records (restaurants): ",len(json_data))

Number of records (restaurants):  270


#### Example record

In [4]:
list(json_data[0].keys())

['position',
 'title',
 'place_id',
 'data_id',
 'data_cid',
 'gps_coordinates',
 'rating',
 'reviews',
 'price',
 'type',
 'address',
 'open_state',
 'hours',
 'phone',
 'website',
 'description',
 'thumbnail',
 'reviews_data']

We have 42104 reviews in total, averaging at around 140 reviews per restaurant.
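If you want to recompute figures like these yourself, a minimal sketch follows. It assumes that every record stores its reviews as a list under `reviews_data` (as the example record does); the helper name `review_counts` and the sample records are made up for illustration.

```python
def review_counts(records):
    """Return (total number of reviews, mean number of reviews per restaurant)."""
    counts = [len(record.get("reviews_data", [])) for record in records]
    total = sum(counts)
    mean = total / len(counts) if counts else 0.0
    return total, mean


# Made-up records with 2, 1 and 0 reviews respectively.
sample_records = [
    {"title": "A", "reviews_data": [{"rating": 5.0}, {"rating": 4.0}]},
    {"title": "B", "reviews_data": [{"rating": 1.0}]},
    {"title": "C"},  # no 'reviews_data' key, counted as zero reviews
]
total, mean = review_counts(sample_records)
print("Total reviews:", total, "- mean per restaurant:", mean)
```

With the real data, call `review_counts(json_data)` after loading the file.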

#### Example review

In [5]:
reviews=json_data[0]["reviews_data"]
reviews[0]

{'user': {'name': 'Dr Josh Brower',
  'link': 'https://www.google.com/maps/contrib/112044676941472177853?hl=en-US&sa=X&ved=2ahUKEwjqkYfJiP_zAhVnnuAKHcHZCgkQvvQBegQIARAy',
  'thumbnail': 'https://lh3.googleusercontent.com/a-/AOh14GgHaTF4PU2_MxHxRa9tTqfawOnipqaSX4r4M7d-H9k=s40-c-c0x00000000-cc-rp-mo-ba5-br100',
  'local_guide': True,
  'reviews': 128,
  'photos': 837},
 'rating': 2.0,
 'date': 'a week ago',
 'snippet': "For now I would say stay away from this restaurant. I think they are having staffing issues effecting all levels of service. For a finer quality dining place this was poor. The older man that was the host when we entered was the high point for service. The server was a waitress and not particularly professional. To order took far to long, the cooking of the food took to long, and when it was delivered it wasn't to the right people with sides spread throughout the table instead of to the people who ordered them. The manager is a tall black man who definitely needs work on 

### Example task: Sentiment Analysis

In this example, we show a Deep Learning model you can get from Hugging Face. Some documentation for the library *transformers* can be found [here](https://huggingface.co/transformers/quicktour.html#getting-started-on-a-task-with-a-pipeline). The [model](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) we are loading is called “distilbert-base-uncased-finetuned-sst-2-english”. 

In [6]:
from transformers import pipeline

print("loading model...")
sentiment_analysis = pipeline("sentiment-analysis") # This might take a while
print(sentiment_analysis("I love this!"))

loading model...


All model checkpoint layers were used when initializing TFDistilBertForSequenceClassification.

All the layers of TFDistilBertForSequenceClassification were initialized from the model checkpoint at distilbert-base-uncased-finetuned-sst-2-english.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFDistilBertForSequenceClassification for predictions without further training.


[{'label': 'POSITIVE', 'score': 0.9998764991760254}]


In [9]:
r=reviews[10]
print(r["rating"], " stars review:\n", r["snippet"], sep="")
print("Prediction:", sentiment_analysis(r["snippet"]))

1.0 stars review:
I am a regular at this place. I have been going here before the pandemic and I have supported the place during the pandemic. The staff there is great including Eden, David, Nicole, Jessica, etc.Recently, I tried to go there around 9:30 knowing that the place closes at 10. The door was closed so I started walking away. A new staff member who is short and is wearing a vest came out and asked me if I had an order put in by 9:15. I said no but I told him I was a regular there and if it is possible for me to get desert. This guy that I have never seen before said in a condensing look and tone "We would love to have you when the place is open". I felt that was uncalled for and he could have just said we are closed now and I would have been fine with that. This guy is not the right person for this establishment. Eden ran this place very well and she was the sweetest manager anyone can ask for. This guy with a vest really does not understand customer service; something that I

In [8]:
r=reviews[1]
print(r["rating"], " stars review:\n", r["snippet"], sep="")
print("Prediction:", sentiment_analysis(r["snippet"]))

5.0 stars review:
Thoroughly enjoyed my wife s birthday at Capital Grille Broadway tonight Oct 21 2021🎂 . We were served by Michael who was very professional, personable, and made great suggestions as we were unsure which seafood to select for main dishes. The sushi grade tuna dinner was one of the best I've had and the Seabass was quite delicious.. Michael checked on us frequently to make sure we were content. The grille hostess welcomed my wife with a Happy Birthday when being seated. Michael surprised us with a complementary happy birthday dessert for my wife with the words Happy Birthday in chocolate. It was a great evening thanks to Michael and The GRILLE!!!
Prediction: [{'label': 'POSITIVE', 'score': 0.9998561143875122}]


As you can see, it labels a bad review (1 star) as NEGATIVE and a good review (5 stars) as POSITIVE. Feel free to try other reviews or your own texts.

Given this model, you have several options. For example, you could replace the last layer of the model with a multiclass classification head and fine-tune it to **predict the rating of the review**. Another option is to extract the representations this model produces in its hidden intermediate layers and feed them to other algorithms in an ML pipeline. Some information on how to **fine-tune your models** can be found [here](https://huggingface.co/transformers/training.html). As previously mentioned, this is just one example, and many other tasks and models are possible. Take a look at other models at [Hugging Face](https://huggingface.co/) or build your own models from scratch.
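Whichever of these options you pick, you will first need the reviews as (text, label) pairs. The sketch below shows one way to build them, assuming each review has a `snippet` and a `rating` between 1.0 and 5.0 as in the example review. The helper name `build_rating_dataset` and the choice to map ratings to class labels 0 to 4 are assumptions for illustration.

```python
def build_rating_dataset(records):
    """Flatten restaurant records into parallel lists of review texts and class labels."""
    texts, labels = [], []
    for record in records:
        for review in record.get("reviews_data", []):
            snippet = review.get("snippet")
            rating = review.get("rating")
            if not snippet or rating is None:
                continue  # skip reviews without text or without a rating
            texts.append(snippet)
            labels.append(int(rating) - 1)  # 1..5 stars -> class labels 0..4
    return texts, labels


# Made-up records that mimic the structure of train_reviews.json.
sample_records = [
    {"reviews_data": [
        {"rating": 5.0, "snippet": "Great evening, lovely staff."},
        {"rating": 1.0, "snippet": "Cold food and a long wait."},
        {"rating": 4.0},  # rating without text: skipped
    ]},
    {"reviews_data": []},
]
texts, labels = build_rating_dataset(sample_records)
print(list(zip(labels, texts)))
```

Zero-based integer labels are what most multiclass classifiers expect, which is why the star rating is shifted by one here.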

However, it's not all about Deep Learning. The suggestion above might seem daunting if you've never done any DL before. Luckily for us, there are other ways to generate numerical representations from texts. For example, you could **use TF-IDF to compute a representation for every review**. You could then use this representation to measure the similarity between texts, or to set up an ML pipeline using k-NN or other algorithms of your choice. This approach is usually not as powerful as DL models, but its learning curve is much gentler.

If you are interested in this last approach you can take a look at the following resources:
- Understanding TF-IDF: https://monkeylearn.com/blog/what-is-tf-idf/
- Using TF-IDF for text classification: https://monkeylearn.com/blog/what-is-tf-idf/
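To make the TF-IDF idea more concrete, here is a small pure-Python sketch that turns each text into a sparse TF-IDF vector and compares texts with cosine similarity. It uses the plain `tf * log(N / df)` weighting, which is only one common variant; libraries such as scikit-learn apply a smoothed formula by default, so their numbers will differ. The function names and example sentences are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tfidf_vectors(texts):
    """Return one sparse TF-IDF vector (dict: term -> weight) per input text."""
    tokenized = [tokenize(text) for text in texts]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vector = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # relative term frequency
            idf = math.log(n_docs / doc_freq[term])  # unsmoothed inverse document frequency
            vector[term] = tf * idf
        vectors.append(vector)
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


texts = [
    "The food was great and the service was great",
    "Great food, friendly service",
    "Terrible wait, cold food",
]
vectors = tfidf_vectors(texts)
print("similarity(0, 1):", cosine_similarity(vectors[0], vectors[1]))
print("similarity(0, 2):", cosine_similarity(vectors[0], vectors[2]))
```

Note that a word appearing in every text ("food" here) gets a weight of zero, so it does not contribute to the similarity. A k-NN classifier could then predict a review's rating from the ratings of its most similar training reviews.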

However, feel free to use any technologies you want. **Do research, explore, learn and enjoy!**