### In this section, you will use the Yelp reviews dataset to answer the questions below. You can
### access the dataset via this link (Section 1 data file).
### Please answer the following questions:
 1. How many unique restaurants could be found in this data set? (Hint: Use the
[Business_ID] column for this evaluation.)
2. Which restaurant received the highest number of reviews? What about percentage-
wise?
3. Which cities have got at least one 5-star review in Nevada (NV) state?
4. Which city has the highest number of reviews in the Business Category of “Hotels & Travel”? What about percentage-wise?
5. On which day of the week are people most likely to post their reviews?
6. Showcase if there are any trends regarding restaurant performance as time goes by.
7. Based on analyzed data showcase if there are any steps that the restaurant can take
to improve their public appeal.
8. Bonus Question – Based on this data set which user had the highest cumulative
travel distance? What distance has been covered by him/her?

In [1]:
import pandas as pd 
import plotly.express as px
import plotly.graph_objects as go
from geopy.distance import geodesic 

In [2]:
# Load the dataset into a DataFrame
df=pd.read_csv(r"C:\Users\User\Downloads\Section 1 and 3 data\Section 1 data.csv")

In [3]:
df.columns

Index(['Review_Date', 'Review_Text', 'User_ID', 'Business_ID', 'Business_Name',
       'Business_Category', 'City', 'State', 'Latitude', 'Longitude',
       'Avg_Business_Star_Rating'],
      dtype='object')

In [4]:
df.info() #dataset information

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 227581 entries, 0 to 227580
Data columns (total 11 columns):
 #   Column                    Non-Null Count   Dtype  
---  ------                    --------------   -----  
 0   Review_Date               227581 non-null  object 
 1   Review_Text               227580 non-null  object 
 2   User_ID                   227581 non-null  object 
 3   Business_ID               227581 non-null  object 
 4   Business_Name             227581 non-null  object 
 5   Business_Category         227297 non-null  object 
 6   City                      227581 non-null  object 
 7   State                     227581 non-null  object 
 8   Latitude                  227581 non-null  float64
 9   Longitude                 227581 non-null  float64
 10  Avg_Business_Star_Rating  227581 non-null  float64
dtypes: float64(3), object(8)
memory usage: 19.1+ MB


In [5]:
df.head() # look at the first five rows

Unnamed: 0,Review_Date,Review_Text,User_ID,Business_ID,Business_Name,Business_Category,City,State,Latitude,Longitude,Avg_Business_Star_Rating
0,2014-07-16,okay...so so,6VxJJX7h36bMCCFt7URg6w,9QqLqYIwV-n1BJPjnaYv8A,Beef 'O' Brady's,Bars,Chandler,AZ,33.303847,-111.946838,3.5
1,2014-07-16,Some people complain about the prices but Subw...,viGPiPuMZnV4PR_aiA3-qw,pNQwnY_q4okdlnPiR-3RBA,Empire Bagels,Food,Las Vegas,NV,36.077299,-115.297979,3.5
2,2014-07-16,"Had red curry chicken, the chicken itself was ...",rwehMCinfBjhZ0IbR1zFBw,shCdCHRbnY5FTMJbWl-myQ,Thai Spices,Thai,Mesa,AZ,33.412708,-111.875803,4.0
3,2014-07-16,"Alright, I gave the restaurant manager ample t...",VWqt5IH8fm-k9M0CKFkJzg,HpaYCM_NCauI72LLXxC6SA,Yonaka Modern Japanese,Tapas/Small Plates,Las Vegas,NV,36.114935,-115.209737,4.5
4,2014-07-16,Blehhhh :/ this place shouldn't even be in bus...,Y6-0ToMhjBsm8iYEaT2meg,FC4q3hJyF8oo984xoo3RMg,808 Sushi,Sushi Bars,Las Vegas,NV,36.052181,-115.279227,3.5


Dataset Columns Description:
1. 'Review_Date' – the date when the review was posted by the user
2. 'Review_Text' – text of the review
3. 'User_ID' – Unique Identification Number of users, who made this post
4. 'Business_ID' – Unique ID of business. Please note, that this column distinguishes businesses
with the same names (for instance chain of restaurants)
5. 'Business_Name' – Official business name
6. 'Business_Category' – The category in which this business operates
7. 'City' – City location
8. 'State' – State Location
9. 'Latitude' – X coordinates of business
10. 'Longitude' – Y coordinates of business
11. 'Avg_Business_Star_Rating' – Rating review left by user.
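
The note on `Business_ID` matters when counting businesses: branches of a chain share one name but each has its own ID. A minimal sketch on a made-up frame (the IDs and names below are hypothetical, not taken from the dataset) shows how counting by name would undercount:

```python
import pandas as pd

# Hypothetical toy data: two branches of one chain plus one independent business
toy = pd.DataFrame({
    "Business_ID": ["a1", "a2", "b1", "a1"],
    "Business_Name": ["Chain Cafe", "Chain Cafe", "Solo Diner", "Chain Cafe"],
})

unique_by_id = toy["Business_ID"].nunique()      # each location counted once
unique_by_name = toy["Business_Name"].nunique()  # branches with the same name collapse

print(unique_by_id, unique_by_name)
```

Here the ID count is 3 and the name count is 2, which is why the hint in question 1 points to `Business_ID`.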

In [6]:
df.tail() # look at the last five rows

Unnamed: 0,Review_Date,Review_Text,User_ID,Business_ID,Business_Name,Business_Category,City,State,Latitude,Longitude,Avg_Business_Star_Rating
227576,2005-03-08,"It's not the Four Seasons, but more appropriat...",K4FAia2Iy5MVnmBLfS-mCg,WnY4HPJIYNXOPQH2mFzl2Q,THEhotel at Mandalay Bay,Hotels & Travel,Las Vegas,NV,36.092988,-115.177838,4.0
227577,2005-03-03,There is nothing better than happy hour on the...,G8Q9rASB6YI2ICBkkpwvcw,RgBq9TFI8q6-vCvF6wOMVg,Genna's Lounge,Bars,Madison,WI,43.07272,-89.384389,4.0
227578,2005-03-03,Easily my favorite place to eat in Madison. G...,8ITVDdfK07owxCA1x878Vw,3nwskbfFgsSjVe6T8keTeg,Lao Laan-Xang Restaurant,Thai,Madison,WI,43.083166,-89.364985,4.0
227579,2005-03-01,"Spacious, luxurious rooms that definitely meri...",WPOKvkacSKHx_bIG1alFiA,-7yF42k0CcJhtPw51oaOqQ,Bellagio,Hotels & Travel,Las Vegas,NV,36.112024,-115.174593,4.0
227580,2005-01-24,The buffet in this hotel is excellent! The set...,XR4cWlqS9qC25GMnNz0zlw,bYhpy9u8fKkGhYHtvYXazQ,Paris Las Vegas Hotel & Casino,Hotels & Travel,Las Vegas,NV,36.112629,-115.172653,3.0


In [7]:
# Converts the DataFrame's column names to lowercase and replaces spaces with underscores
df.columns=map(lambda x:x.lower().replace(" ","_"),df.columns)

In [8]:
df.columns

Index(['review_date', 'review_text', 'user_id', 'business_id', 'business_name',
       'business_category', 'city', 'state', 'latitude', 'longitude',
       'avg_business_star_rating'],
      dtype='object')

1. How many unique restaurants could be found in this data set? (Hint: Use the
[Business_ID] column for this evaluation.) -- *Answer*

In [9]:
# Count the number of unique restaurants in the 'business_id' column
# The .nunique() function returns how many distinct values are in a column
unique_restaurants=df.business_id.nunique()

In [10]:
unique_restaurants

30276

In [11]:
# Print the number of unique restaurants
print(f"total unique restaurants: {unique_restaurants}")

total unique restaurants: 30276


**total unique restaurants**: ***30276***


2. Which restaurant received the highest number of reviews? What about percentage-wise? -- *Answer*

In [12]:
group_reviews=(df.groupby(["business_id","business_name"]) # Groups by 'business_id' and 'business name' columns
               .size() # Counts how many observations are in each group
               .reset_index(name="review_count")) # Converts the grouping results to a DataFrame and stores it in the 'review_count' column

In [13]:
# Total number of reviews across all businesses
total_reviews = group_reviews.review_count.sum()
# Each business's share of all reviews, as a percentage
group_reviews['percentage'] = (group_reviews.review_count / total_reviews) * 100

In [14]:
# group_reviews is not sorted yet, so pick the row with the largest review_count explicitly
top = group_reviews.loc[group_reviews.review_count.idxmax()]
print(f"Highest number of reviews by restaurant(business_id): {top['business_id']}")
print(f"Highest number of reviews by restaurant(business_name): {top['business_name']}")
print(f"Highest number of reviews by restaurant(review_count): {top['review_count']}")
print(f"Highest number of reviews by restaurant(percentage): {top['percentage']} %")

Highest number of reviews by restaurant(business_id): 4bEjOyTaDG24SY5TxsaUNQ
Highest number of reviews by restaurant(business_name): Mon Ami Gabi
Highest number of reviews by restaurant(review_count): 856
Highest number of reviews by restaurant(percentage): 0.37612981751552194 %


1. **Highest number of reviews by restaurant(business_id)** : ***4bEjOyTaDG24SY5TxsaUNQ***
2. **Highest number of reviews by restaurant(business_name)**: ***Mon Ami Gabi***
3. **Highest number of reviews by restaurant(review_count)**: ***856***
4. **Highest number of reviews by restaurant(percentage)**: ***0.37612981751552194 %***
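
As a cross-check on the count-then-divide steps, `value_counts` can return both the counts and the shares already sorted in descending order. The sketch below runs on a made-up column of IDs (hypothetical values) rather than on the real data:

```python
import pandas as pd

# Hypothetical toy reviews: one row per review, identified by business_id
toy = pd.DataFrame({"business_id": ["x", "y", "x", "z", "x", "y", "x", "z"]})

counts = toy["business_id"].value_counts()                      # sorted descending by default
shares = toy["business_id"].value_counts(normalize=True) * 100  # same order, as percentages

top_id = counts.index[0]
print(top_id, counts.iloc[0], f"{shares.iloc[0]:.1f}%")
```

Because the result is sorted, the first entry is the most-reviewed business, so no separate sort step is needed before reading off the answer.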


In [15]:
# Sort groups by 'review_count' column in descending order
# This helps us see which groups have the most customer feedback by highlighting the groups with the most reviews.
group_reviews = group_reviews.sort_values(by='review_count', ascending=False)

In [16]:
group_reviews

Unnamed: 0,business_id,business_name,review_count,percentage
2721,4bEjOyTaDG24SY5TxsaUNQ,Mon Ami Gabi,856,0.376130
1784,2e2e7WgqU1BnpxmQL5jbfw,Earl of Sandwich,744,0.326917
30236,zt1TpTuJ6y9n551sw9TaEg,Wicked Spoon,656,0.288249
16654,Xhg93cMdemu5pAMkDoEdtQ,Serendipity 3,543,0.238596
16951,YNQgak-ZLtYJQxlDwN-qIg,The Buffet,531,0.233324
...,...,...,...,...
23766,m3oI5UX66AAQLrIYI12Dpw,Southwestern Eye Center,1,0.000439
8945,HXwr1Zh2zSltr_lR_GQv7w,Murphy's Bar and Restaurant,1,0.000439
8943,HXw8EgmDFKnoJhD_V8N4lA,Ecowater Systems,1,0.000439
23769,m4BfTr967Ym8ASXX38Ipqg,Target Stores,1,0.000439


In [17]:
group_reviews.head(1) #highest number of reviews

Unnamed: 0,business_id,business_name,review_count,percentage
2721,4bEjOyTaDG24SY5TxsaUNQ,Mon Ami Gabi,856,0.37613


In [18]:
top_10 = group_reviews.head(10)

# Create a graph with Plotly
fig = px.bar(
    top_10,
    x='review_count',
    y='business_name',
    orientation='h',  # Horizontal bar chart
    text='percentage',  # Add percentages above the bars
    labels={'review_count': 'Number of Reviews', 'business_name': 'Restaurant Name'},
    title='Top 10 Restaurants with Most Reviews',
    color='review_count',  # Set the color of the bars according to the number of comments
    color_continuous_scale='viridis'  # Color palette
)

# Format percentages
fig.update_traces(texttemplate='%{text:.1f}%', textposition='outside')

# Set axis and layout
fig.update_layout(
    xaxis_title='Number of Reviews',
    yaxis_title='Restaurant Name',
    yaxis=dict(categoryorder='total ascending'),  # Sort the bars in ascending order
    template='plotly_white'
)

# Show graph
fig.show()


3. Which cities have got at least one 5-star review in Nevada (NV) state? -- *Answer*

In [19]:
# Filter the dataset for restaurants in Nevada (NV) with an average business star rating of 5.0
NV_5_stars=df[(df.state=="NV") & (df.avg_business_star_rating==5.0)]

In [20]:
# Get a list of unique cities where restaurants with a 5.0 rating are located in Nevada (NV)
cities=NV_5_stars["city"].unique()

In [21]:
cities.size

4

In [22]:
print(f"The following cities in Nevada have at least one 5-star review: {', '.join(cities)}.")


The following cities in Nevada have at least one 5-star review: Las Vegas, Henderson, Boulder City, Nellis.


**The following cities in Nevada have at least one 5-star review:** ***Las Vegas, Henderson, Boulder City, Nellis.***
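
The same filter-then-deduplicate pattern can be sketched on a small made-up frame (all rows below are hypothetical). Whether `avg_business_star_rating` holds a per-review rating or a business-level average is not settled by the column name and its description, so the sketch only illustrates the filtering step:

```python
import pandas as pd

# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({
    "city": ["Las Vegas", "Henderson", "Reno", "Phoenix", "Las Vegas"],
    "state": ["NV", "NV", "NV", "AZ", "NV"],
    "avg_business_star_rating": [5.0, 5.0, 4.5, 5.0, 3.0],
})

# Combine both conditions in one boolean mask, then keep the distinct cities
mask = (toy["state"] == "NV") & (toy["avg_business_star_rating"] == 5.0)
nv_cities = sorted(toy.loc[mask, "city"].unique())
print(nv_cities)
```

Sorting the result makes the list stable across runs, since `unique()` returns values in order of first appearance.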


4. Which city has the highest number of reviews in the Business Category of “Hotels & Travel”? What about percentage-wise? -- *Answer*

In [23]:
city_review_count = df[df.business_category == "Hotels & Travel"].groupby("city")["review_text"].count()

In [24]:
total_reviews = city_review_count.sum()

In [25]:
top_10_cities = city_review_count.sort_values(ascending=False).head(10)

In [26]:
top_10_cities_percentage = (top_10_cities / total_reviews) * 100

In [27]:
print(f"City with the most reviews: {top_10_cities.index[0]}, review count: {top_10_cities.iloc[0]}, share of category reviews: {top_10_cities_percentage.iloc[0]:.2f}%")

City with the most reviews: Las Vegas, review count: 10245, share of category reviews: 77.64%


In [28]:
# Convert the data to a DataFrame
top_10_cities_df = top_10_cities.reset_index()
top_10_cities_df.columns = ['City', 'Review Count']

# Bar chart with Plotly
fig = px.bar(
    top_10_cities_df,
    x='City',
    y='Review Count',
    title='Top 10 Cities with the Most Reviews in Hotels & Travel Category',
    text='Review Count',
    color='Review Count',
    color_continuous_scale='Turbo'
)

# Chart adjustments
fig.update_traces(textposition='outside')
fig.update_layout(
    xaxis_title='City',
    yaxis_title='Number of Reviews',
    title=dict(x=0.5),  # Center the title
    template='plotly_white'
)

# Show the chart
fig.show()

In [29]:
# Compute the combined percentage of all other cities and store it in a Series
other_cities_percentage = 100 - top_10_cities_percentage.sum()
others_series = pd.Series([other_cities_percentage], index=["Others"])

# Combine the top 10 cities with the "Others" entry
top_10_cities_percentage_with_others = pd.concat([top_10_cities_percentage, others_series])

# Convert the data to a DataFrame
percentage_df = top_10_cities_percentage_with_others.reset_index()
percentage_df.columns = ['City', 'Percentage']

# Create a pie chart with Plotly
fig = px.pie(
    percentage_df,
    names='City',
    values='Percentage',
    title='Percentage of Reviews for Top 10 Cities in Hotels & Travel Category',
    hole=0.3  # For a donut-style look
)
# Chart adjustments
fig.update_traces(
    textinfo='percent+label',  # Percentages and labels
    textposition='inside'
)
fig.update_layout(
    title=dict(x=0.5),  # Center the title
    template='plotly_white'
)
# Show the chart
fig.show()


5. On which day of the week are people most likely to post their reviews? -- *Answer*

In [30]:
df.review_date=pd.to_datetime(df.review_date)

In [31]:
# Use bracket notation: attribute assignment (df.day_of_week = ...) does not create a column
df["day_of_week"]=df.review_date.dt.day_name()

In [32]:
reviews_by_day=df.day_of_week.value_counts()

In [33]:
reviews_by_day

day_of_week
Monday       36446
Tuesday      34262
Wednesday    34170
Sunday       32889
Thursday     30669
Friday       29574
Saturday     29571
Name: count, dtype: int64

In [34]:
most_reviews_day=reviews_by_day.idxmax()
most_reviews_count=reviews_by_day.max()

In [35]:
print(f"People are more likely to post reviews on {most_reviews_day}.")
print(f"Number of reviews posted on {most_reviews_day}: {most_reviews_count}")

People are more likely to post reviews on Monday.
Number of reviews posted on Monday: 36446
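
Raw counts hide how large the gap between days really is. Monday's 36446 reviews are about 16.0% of the 227581 total, against roughly 14.3% for a perfectly even spread and about 13.0% for Saturday. A minimal sketch of computing such shares, on a handful of made-up dates (hypothetical values), could look like this:

```python
import pandas as pd

# Hypothetical review dates; 2014-07-14 was a Monday
dates = pd.to_datetime(pd.Series(["2014-07-14", "2014-07-14", "2014-07-15", "2014-07-20"]))

day_names = dates.dt.day_name()
# normalize=True returns fractions instead of counts
shares = day_names.value_counts(normalize=True) * 100

print(shares.idxmax(), shares.max())
```

Reporting the share alongside the count makes it easier to judge whether the leading day stands out or is only marginally ahead.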


In [36]:
import plotly.express as px

In [37]:
# Order the data by day of the week
reviews_by_day = reviews_by_day.reindex(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])

# Convert the data to a DataFrame
reviews_by_day_df = reviews_by_day.reset_index()
reviews_by_day_df.columns = ['Day', 'Number of Reviews']

# Bar plot with Plotly
fig = px.bar(
    reviews_by_day_df,
    x='Day',
    y='Number of Reviews',
    title='Number of Reviews by Day of the Week',
    text='Number of Reviews',
    color_discrete_sequence=['black']
)

# Chart adjustments
fig.update_layout(
    xaxis_title='Day of the Week',
    yaxis_title='Number of Reviews',
    xaxis=dict(tickangle=45),
    template='plotly_white'
)

# Show the chart
fig.show()

6. Showcase if there are any trends regarding restaurant performance as time goes by. -- *Answer*

In [38]:
df['review_date'] = pd.to_datetime(df['review_date'])
df['year_month'] = df['review_date'].dt.to_period('M')

In [39]:
performance = df.groupby('year_month').agg({
    'avg_business_star_rating': 'mean', 
    'review_text': 'count'
}).reset_index()

In [40]:
performance['year_month'] = performance['year_month'].astype(str)

In [41]:
fig = go.Figure()

In [42]:
fig.add_trace(go.Scatter(
    x=performance['year_month'], 
    y=performance['avg_business_star_rating'], 
    mode='lines',
    name='Average Rating',
    line=dict(color='blue')
))

# Line for the total number of reviews
fig.add_trace(go.Scatter(
    x=performance['year_month'], 
    y=performance['review_text'], 
    mode='lines',
    name='Total Reviews',
    yaxis='y2',
    line=dict(color='green')
))

# Combine both lines in one layout with two y-axes
fig.update_layout(
    title='Trends in Restaurant Performance Over Time',
    xaxis=dict(title='Time (Year-Month)', showgrid=False),
    yaxis=dict(title=dict(text='Average Rating', font=dict(color='blue')), tickfont=dict(color='blue')),
    yaxis2=dict(title=dict(text='Total Reviews', font=dict(color='green')), tickfont=dict(color='green'), anchor='x', overlaying='y', side='right'),
    legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1),
    template='plotly_white'
)

fig.show()

7. Based on analyzed data showcase if there are any steps that the restaurant can take
to improve their public appeal. -- *Answer*

### Analysis and Strategic Suggestions for Average Rating and Total Reviews

#### 1. **Average Rating**

**Analysis:**
- **Upward Trend:** The Average Rating has been continuously increasing since 2010, indicating that the service quality of restaurants has improved over time.
- **Fluctuations:** There were noticeable fluctuations in 2006-2007. This could suggest that the restaurants' service quality was unstable or influenced by seasonal factors during that period, or simply that the monthly averages of those early years rest on very few reviews.
- **Slow Growth Period:** A slower growth rate is observed between 2012-2014. This may indicate that service quality had stabilized, or that customer expectations increased during this time.

**Strategies:**
- **Ensuring Service Quality Continuity:** It is essential to continuously monitor and optimize service quality. The rising average rating is consistent with restaurants focusing on customer satisfaction and service enhancement.
- **Personalized Customer Experience:** As ratings rise, it is crucial to personalize the customer experience even further. Restaurants should focus on enhancing customer satisfaction through special services and individualized approaches.
- **Reducing Fluctuations:** Investigate the root causes of short-term fluctuations (seasonal changes, service issues, etc.) and take measures to prevent these fluctuations from affecting the overall service quality.
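
One way to separate short-term fluctuations from the underlying trend is a rolling mean over the monthly averages. The sketch below uses made-up monthly ratings (hypothetical values) and a 3-month window, which is an assumption chosen only for illustration:

```python
import pandas as pd

# Hypothetical monthly average ratings, indexed by year-month strings
monthly = pd.Series(
    [3.0, 4.0, 3.0, 4.0, 3.5, 4.5],
    index=["2006-01", "2006-02", "2006-03", "2006-04", "2006-05", "2006-06"],
    name="avg_rating",
)

# Each value is the mean of the current month and the two before it;
# the first two months have no full window and stay NaN
smoothed = monthly.rolling(window=3).mean()
print(smoothed)
```

Plotting the smoothed series next to the raw one would show whether a swing is a lasting change or a single noisy month.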

#### 2. **Total Reviews**

**Analysis:**
- **Increasing Trends:** The number of Total Reviews has grown rapidly since 2010, with a significant surge post-2014. This indicates that more customers are sharing their feedback, and the number of reviews is rising.
- **Reasons for the Increase:** The increase in reviews is likely due to a larger customer base that positively evaluates the restaurant’s service or an expansion of the customer base itself. It also suggests the restaurant has been successful in engaging with customers through social media and advertising.

**Strategies:**
- **Customer Incentives for Reviews:** To further increase the number of reviews, restaurants should encourage customers to leave feedback. Offering incentives such as discounts or gifts could motivate customers to write reviews.
- **Analyzing Review Quality:** While increasing the volume of reviews is important, it’s equally crucial to monitor their quality. The restaurant should actively identify and address fake or negative reviews, while also learning from the feedback to improve strengths and resolve weaknesses.
- **New Services and Menus:** Introducing new services and menu items based on customer feedback can steer reviews in a positive direction and improve customer satisfaction.

#### General Analysis and Strategic Recommendations:

**A. To Increase the Average Rating:**
- **Improve Service Quality:** Continuously monitor and enhance service quality to ensure customers are satisfied and provide positive reviews.
- **Personalized Services:** Introduce tailored services, offers, and experiences to further engage customers and improve their dining experience.
- **Focus on Special Occasions:** Pay particular attention to customer experiences during holidays and special events, which can help boost the average rating.

**B. To Increase the Total Reviews:**
- **Review Incentives:** Offer gifts, discounts, or other incentives to encourage customers to leave reviews.
- **Leverage Social Media:** Expand engagement on social media platforms, encouraging customers to share their experiences and reviews online.
- **Improve Customer Support:** Respond to customer complaints and suggestions effectively to ensure more positive reviews and customer retention.

**C. Seasonality and Trend-Specific Strategies:**
- **Seasonal Specials:** During spring and summer, attract customers with outdoor dining offers. During winter, offer warm meals, drinks, and cozy indoor settings.
- **Holiday Strategies:** Develop promotions such as discounts, special menus, and events to attract customers during key holidays and festive seasons.
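
Before acting on seasonal measures, it may be worth checking whether the data shows a seasonal pattern at all. A minimal sketch, assuming a frame with the notebook's `review_date` and `avg_business_star_rating` columns but filled with made-up rows, groups by calendar month across all years:

```python
import pandas as pd

# Hypothetical rows covering two Januaries and two Julys
toy = pd.DataFrame({
    "review_date": pd.to_datetime(
        ["2012-01-10", "2013-01-05", "2012-07-04", "2013-07-20", "2013-07-21"]
    ),
    "avg_business_star_rating": [3.0, 4.0, 4.5, 3.5, 4.0],
})

# Group by month number (1-12) so the same month of different years is pooled
by_month = (
    toy.groupby(toy["review_date"].dt.month)["avg_business_star_rating"]
    .agg(["mean", "count"])
)
print(by_month)
```

Keeping the `count` column next to the `mean` helps to avoid reading a pattern into months that have only a few reviews.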

#### Conclusion:

To improve customer satisfaction and restaurant performance, it is essential to closely monitor the Average Rating and Total Reviews indicators. Strategic actions, such as enhancing service quality, analyzing customer reviews, and considering seasonal factors, will contribute to the long-term success and reputation of the restaurant.

8. Bonus Question – Based on this data set which user had the highest cumulative travel distance? What distance has been covered by him/her? -- *Answer*


In [43]:
# Sort by user and date so that each user's reviews appear in chronological order
df_sorted=df.sort_values(by=["user_id","review_date"])

In [44]:
df_with_prev_coords = df_sorted.copy()
# Within each user, shift() moves the previous review's coordinates onto the current row (NaN for the first review)
df_with_prev_coords['prev_latitude'] = df_with_prev_coords.groupby('user_id')['latitude'].shift()
df_with_prev_coords['prev_longitude'] = df_with_prev_coords.groupby('user_id')['longitude'].shift()

In [45]:
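def compute_distance(row):
    # Geodesic distance in km between the previously reviewed business and the current one
    if pd.notnull(row['prev_latitude']):
        return geodesic((row['prev_latitude'], row['prev_longitude']), 
                        (row['latitude'], row['longitude'])).kilometers
    # A user's first review has no previous location, so it adds no distance
    return 0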

In [46]:
df_with_distance = df_with_prev_coords.copy()
df_with_distance['travel_distance']=df_with_distance.apply(compute_distance,axis=1)

In [47]:
user_travel_distance=df_with_distance.groupby('user_id')['travel_distance'].sum()

In [48]:
max_user=user_travel_distance.idxmax()
max_distance=user_travel_distance.max()

In [49]:
print(f"User with highest travel distance: {max_user}, Distance: {max_distance:.2f} km")

User with highest travel distance: 6uYJ-ixRxPMyf-iEbhoz2g, Distance: 31847.14 km


1. **User with highest travel distance :** *6uYJ-ixRxPMyf-iEbhoz2g*
2. **Distance :** *31847.14 km*
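
Calling `geodesic` row by row over more than 200,000 reviews can be slow. A common approximation is the haversine great-circle formula, which treats the Earth as a sphere and therefore differs slightly from the ellipsoidal geodesic result. The sketch below is not the method used above; the function name `haversine_km` and the radius constant are assumptions made for illustration:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of the spherical model

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Coordinates of a Las Vegas and a Madison business from the sample rows shown earlier
d = haversine_km(36.112024, -115.174593, 43.07272, -89.384389)
print(f"{d:.0f} km")
```

The same formula can be written with NumPy functions and applied to whole columns at once, which avoids the per-row `apply`. Note also that reviews posted by one user on the same date have no defined order, so the cumulative distance depends on how such ties are sorted.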


In [50]:
# Select the top 10 most traveled users.
top_users = user_travel_distance.nlargest(10).reset_index()
top_users.columns = ['User_ID', 'Total_Travel_Distance']

# Create a bar chart.
fig = px.bar(
    top_users,
    x='User_ID',
    y='Total_Travel_Distance',
    text='Total_Travel_Distance',
    title='Top 10 Users by Total Travel Distance',
    labels={'User_ID': 'User ID', 'Total_Travel_Distance': 'Total Travel Distance (km)'},
    color='Total_Travel_Distance',
    color_continuous_scale='Blues'
)
# Customize the graph.
fig.update_traces(
    texttemplate='%{text:.2f} km',
    textposition='outside'
)
fig.update_layout(
    xaxis=dict(title='User ID'),
    yaxis=dict(title='Total Travel Distance (km)'),
    showlegend=False,
    title_x=0.5
)

# Show the graph.
fig.show()
