## Objectives and Scope

The objective is to analyze a dataset of customer reviews of McDonald's stores across the United States. The primary aim is to describe the overall structure of the data and to identify potential patterns within it.

While the dataset contains textual data suitable for Natural Language Processing (NLP), NLP techniques are outside the current scope. Instead, the goal is to extract insights that can inform strategic decisions by uncovering recurring themes, sentiments, and trends within the reviews.

The main objective of this exploration is to provide insights into customer experiences and perceptions of McDonald's stores nationwide, contributing to a better understanding of the brand's strengths and areas for improvement.

## Dataset Information

This dataset was taken from [Kaggle](https://www.kaggle.com/datasets/nelgiriyewithana/mcdonalds-store-reviews); credit is due to Nidula Elgiriyewithana, the author of the dataset.

According to the Kaggle page, the dataset contains over 33,000 anonymized reviews of McDonald's stores in the United States, scraped from Google reviews.

The features are as follows:

* **reviewer_id**: Unique identifier for each reviewer (anonymized)
* **store_name**: Name of the McDonald's store
* **category**: Category or type of the store
* **store_address**: Address of the store
* **latitude**: Latitude coordinate of the store's location
* **longitude**: Longitude coordinate of the store's location
* **rating_count**: Number of ratings/reviews for the store
* **review_time**: Timestamp of the review
* **review**: Textual content of the review
* **rating**: Rating provided by the reviewer

## Initial Exploration


In [2]:
#Imports and Variables

import pandas as pd 

In [3]:
df = pd.read_csv('../data/McDonald_s_Reviews.csv', encoding='latin1')
df.head()

| | reviewer_id | store_name | category | store_address | latitude | longitude | rating_count | review_time | review | rating |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | McDonald's | Fast food restaurant | 13749 US-183 Hwy, Austin, TX 78750, United States | 30.460718 | -97.792874 | 1240 | 3 months ago | Why does it look like someone spit on my food?... | 1 star |
| 1 | 2 | McDonald's | Fast food restaurant | 13749 US-183 Hwy, Austin, TX 78750, United States | 30.460718 | -97.792874 | 1240 | 5 days ago | It'd McDonalds. It is what it is as far as the... | 4 stars |
| 2 | 3 | McDonald's | Fast food restaurant | 13749 US-183 Hwy, Austin, TX 78750, United States | 30.460718 | -97.792874 | 1240 | 5 days ago | Made a mobile order got to the speaker and che... | 1 star |
| 3 | 4 | McDonald's | Fast food restaurant | 13749 US-183 Hwy, Austin, TX 78750, United States | 30.460718 | -97.792874 | 1240 | a month ago | My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿... | 5 stars |
| 4 | 5 | McDonald's | Fast food restaurant | 13749 US-183 Hwy, Austin, TX 78750, United States | 30.460718 | -97.792874 | 1240 | 2 months ago | I repeat my order 3 times in the drive thru, a... | 1 star |
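
The `rating` column is stored as text such as `1 star` or `4 stars`, so it would need converting before any numeric summary. A minimal sketch of one way to do that, using a small hand-made sample in place of the full CSV; the `rating_value` column name is an assumption introduced here and is not part of the dataset:

```python
import pandas as pd

# Hand-made sample mimicking the `rating` strings shown in the output above.
sample = pd.DataFrame({"rating": ["1 star", "4 stars", "1 star", "5 stars", "1 star"]})

# Pull out the leading digits and convert them to integers.
# `rating_value` is a hypothetical column name, not part of the dataset.
sample["rating_value"] = (
    sample["rating"].str.extract(r"(\d+)", expand=False).astype(int)
)

# Count how many reviews fall on each star level.
rating_counts = sample["rating_value"].value_counts().sort_index()
print(rating_counts)
```

The same two steps could be applied to `df["rating"]` once the real file is loaded.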


In [4]:
df.describe()

| | reviewer_id | latitude | longitude |
|---|---|---|---|
| count | 33396.0 | 32736.0 | 32736.0 |
| mean | 16698.5 | 34.442546 | -90.647033 |
| std | 9640.739131 | 5.344116 | 16.594844 |
| min | 1.0 | 25.790295 | -121.995421 |
| 25% | 8349.75 | 28.65535 | -97.792874 |
| 50% | 16698.5 | 33.931261 | -81.471414 |
| 75% | 25047.25 | 40.727401 | -75.399919 |
| max | 33396.0 | 44.98141 | -73.45982 |


* Note that the counts for latitude and longitude are lower than the count for reviewer_id. This may indicate missing coordinate values.

In [5]:
df.nunique()

reviewer_id      33396
store_name           2
category             1
store_address       40
latitude            39
longitude           39
rating_count        51
review_time         39
review           22285
rating               5
dtype: int64

* Judging by the number of unique values in each column, category has no variation, so it adds no information to the analysis for now.
* store_name showed the same value in the first rows of data, yet it has 2 unique values.
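
The uniqueness check above can be turned into a small filter that lists columns with a single distinct value. This is only a sketch on a hand-made frame, not the notebook's own code; `constant_cols` and `trimmed` are names introduced here for illustration:

```python
import pandas as pd

# Hand-made sample: `category` never varies, the other columns do.
sample = pd.DataFrame({
    "store_name": ["McDonald's", "McDonald's", "ýýýMcDonald's"],
    "category": ["Fast food restaurant"] * 3,
    "rating": ["1 star", "4 stars", "5 stars"],
})

# Columns with exactly one distinct value carry no information.
unique_counts = sample.nunique()
constant_cols = unique_counts[unique_counts == 1].index.tolist()
print(constant_cols)

# Drop them to keep only the columns that vary.
trimmed = sample.drop(columns=constant_cols)
```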

In [6]:
df['store_name'].unique()

array(["McDonald's", "ýýýMcDonald's"], dtype=object)

* The second unique value is the same name preceded by stray characters, possibly an encoding artifact. We can treat it as an error and consider this column ignorable as well.
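
If the column were kept rather than ignored, the two spellings could be merged by stripping the leading noise. A rough sketch, assuming the noise is always a run of non-letter characters at the start of the string (an assumption based only on the two values shown above):

```python
import pandas as pd

# The two distinct values reported by df['store_name'].unique().
names = pd.Series(["McDonald's", "ýýýMcDonald's"], name="store_name")

# Assumption: noise is any leading run of characters that are not ASCII letters.
cleaned = names.str.replace(r"^[^A-Za-z]+", "", regex=True)
print(cleaned.unique())
```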

In [8]:
df.isna().sum()

reviewer_id        0
store_name         0
category           0
store_address      0
latitude         660
longitude        660
rating_count       0
review_time        0
review             0
rating             0
dtype: int64

* There are indeed 660 rows of this dataset with missing values for coordinates. 
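
To put the missing count in proportion, the share of missing values per column can be computed with `isna().mean()`. The sketch below uses a small hand-made frame rather than the real data, so the printed percentages are illustrative only:

```python
import numpy as np
import pandas as pd

# Hand-made sample with one row missing both coordinates.
sample = pd.DataFrame({
    "latitude": [30.460718, np.nan, 40.727401, 28.65535],
    "longitude": [-97.792874, np.nan, -73.45982, -81.471414],
})

# isna().mean() gives the fraction of missing values per column.
missing_share = sample.isna().mean() * 100
print(missing_share.round(2))

# Number of rows where latitude and longitude are missing together.
both_missing = (sample["latitude"].isna() & sample["longitude"].isna()).sum()
print(both_missing)
```

For the full dataset, 660 missing rows out of 33,396 works out to roughly 1.98%.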


## Dashboard

![Dashboard McDonald's](dashboard-1.png)


## Conclusions

* Ratings predominantly cluster around 1 star or 5 stars, with intermediate ratings more common for higher overall ratings.
* Despite stores being from various US regions, data filters reveal consistent characteristics across regions.
* The majority of reviews are older, suggesting a need for a larger volume of recent reviews.
* Enhancing recent review data can facilitate analysis of evolving public perception for each McDonald's store.
* About 2% of the rows (660 of 33,396) are missing coordinates. If this analysis is expanded, this is a point to keep an eye on.
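
Because `review_time` is stored as relative text such as `3 months ago`, judging how old reviews are requires converting it to a number first. The following is a rough sketch under stated assumptions: the list of units and the day-conversion factors in `UNIT_DAYS` are assumptions, and `approx_age_days` is a hypothetical helper, not part of the original notebook.

```python
import re

import pandas as pd

# Approximate length of each unit in days (assumed conversion factors).
UNIT_DAYS = {"hour": 1 / 24, "day": 1, "week": 7, "month": 30, "year": 365}


def approx_age_days(text):
    """Convert a string such as '3 months ago' or 'a month ago' to approximate days."""
    match = re.match(r"(a|an|\d+)\s+(hour|day|week|month|year)s?\s+ago", text.strip())
    if match is None:
        # Unrecognized format: return NaN rather than guessing.
        return float("nan")
    amount, unit = match.groups()
    count = 1 if amount in ("a", "an") else int(amount)
    return count * UNIT_DAYS[unit]


# Values taken from the first rows of the dataset shown earlier.
sample = pd.Series(["3 months ago", "5 days ago", "a month ago", "2 months ago"])
age_days = sample.map(approx_age_days)
print(age_days.tolist())
```

The result is only approximate, since the source text does not carry exact dates.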