Top Businesses With Most Reviews

Find the top 5 businesses with the most reviews. Assume that each row has a unique business_id, so the total number of reviews for each business is listed on its row. Output the business name along with the total number of reviews and order your results by the total reviews in descending order.

In [3]:
import pandas as pd
import numpy as np

In [5]:
yelp_business = pd.read_csv("CSV/yelp_business.csv")
yelp_business.head()

Unnamed: 0,business_id,name,neighborhood,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,categories
0,G5ERFWvPfHy7IDAUYlWL2A,All Colors Mobile Bumper Repair,,7137 N 28th Ave,Phoenix,AZ,85051,33.448,-112.074,1.0,4,1,Auto Detailing;Automotive
1,0jDvRJS-z9zdMgOUXgr6rA,Sunfare,,811 W Deer Valley Rd,Phoenix,AZ,85027,33.683,-112.085,5.0,27,1,Personal Chefs;Food;Gluten-Free;Food Delivery ...
2,6HmDqeNNZtHMK0t2glF_gg,Dry Clean Vegas,Southeast,"2550 Windmill Ln, Ste 100",Las Vegas,NV,89123,36.042,-115.118,1.0,4,1,Dry Cleaning & Laundry;Laundry Services;Local ...
3,pbt3SBcEmxCfZPdnmU9tNA,The Cuyahoga Room,,740 Munroe Falls Ave,Cuyahoga Falls,OH,44221,41.14,-81.472,1.0,3,0,Wedding Planning;Caterers;Event Planning & Ser...
4,CX8pfLn7Bk9o2-8yDMp_2w,The UPS Store,,"4815 E Carefree Hwy, Ste 108",Cave Creek,AZ,85331,33.798,-111.977,1.5,5,1,Notaries;Printing Services;Local Services;Ship...


In [6]:
yelp_business['rank'] = yelp_business['review_count'].rank(ascending=False)
yelp_business.head()


Unnamed: 0,business_id,name,neighborhood,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,categories,rank
0,G5ERFWvPfHy7IDAUYlWL2A,All Colors Mobile Bumper Repair,,7137 N 28th Ave,Phoenix,AZ,85051,33.448,-112.074,1.0,4,1,Auto Detailing;Automotive,81.0
1,0jDvRJS-z9zdMgOUXgr6rA,Sunfare,,811 W Deer Valley Rd,Phoenix,AZ,85027,33.683,-112.085,5.0,27,1,Personal Chefs;Food;Gluten-Free;Food Delivery ...,21.5
2,6HmDqeNNZtHMK0t2glF_gg,Dry Clean Vegas,Southeast,"2550 Windmill Ln, Ste 100",Las Vegas,NV,89123,36.042,-115.118,1.0,4,1,Dry Cleaning & Laundry;Laundry Services;Local ...,81.0
3,pbt3SBcEmxCfZPdnmU9tNA,The Cuyahoga Room,,740 Munroe Falls Ave,Cuyahoga Falls,OH,44221,41.14,-81.472,1.0,3,0,Wedding Planning;Caterers;Event Planning & Ser...,92.5
4,CX8pfLn7Bk9o2-8yDMp_2w,The UPS Store,,"4815 E Carefree Hwy, Ste 108",Cave Creek,AZ,85331,33.798,-111.977,1.5,5,1,Notaries;Printing Services;Local Services;Ship...,73.5


In [9]:
yelp_business.to_csv("My/yelp_business.csv")

In [10]:
result = yelp_business[yelp_business['rank']<=5][['name', 'review_count']].sort_values(by='review_count', ascending=False)
result

Unnamed: 0,name,review_count
68,Iron Chef,331
88,Jacs Dining and Tap House,197
63,Grimaldi's Pizzeria,187
84,Signs Restaurant,120
86,Kassab's,101


Solution Walkthrough
In this problem, we are given a dataset called yelp_business which contains information about different businesses. Each row in the dataset represents a unique business and includes the number of reviews for that business. Our task is to find the top 5 businesses with the most reviews and output their names along with the total number of reviews, ordered by the total reviews in descending order.

To solve this problem, we will use the pandas library in Python. We will use the Series.rank() method to rank the businesses by their number of reviews, keep only the businesses ranked in the top 5, and then sort those rows by review count in descending order.

Let's now dive into the solution walkthrough.

Understanding The Data
The yelp_business dataset represents a table with columns such as 'business_id', 'name', 'review_count', and more. Each row in the table represents a unique business and contains information about that business, including its name and the number of reviews it has received.
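Because the problem statement asks us to assume that each business_id appears on only one row, it can be worth confirming that before ranking. Below is a minimal sketch on a small, made-up DataFrame (the rows are illustrative and not taken from the Yelp data), using the pandas `Series.is_unique` property:

```python
import pandas as pd

# Hypothetical toy rows standing in for yelp_business (only the columns used here)
toy = pd.DataFrame(
    {
        "business_id": ["b1", "b2", "b3"],
        "name": ["Cafe A", "Diner B", "Bistro C"],
        "review_count": [12, 40, 7],
    }
)

# The problem assumes one row per business; check it rather than trust it
ids_are_unique = toy["business_id"].is_unique
print(ids_are_unique)  # True
```

If the check returned False, the duplicate rows would need to be resolved before ranking, since one business could otherwise occupy more than one of the top five slots.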

The Problem Statement
We need to find the top 5 businesses with the most reviews from the yelp_business dataset. Our output should include the name of each business along with its total number of reviews, and the businesses should be ordered by the total reviews in descending order.

Breaking Down The Code
Let's break down the code provided into smaller parts to understand what each step does.

yelp_business["rank"] = yelp_business["review_count"].rank(
    ascending=False
)
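Here, we create a new column called 'rank' in the yelp_business DataFrame using the pandas Series.rank() method, which assigns a rank to each value in the 'review_count' column. Passing ascending=False gives rank 1 to the business with the most reviews. By default, rank() uses method='average', so tied review counts share the mean of the ranks they span; this is why the notebook output above contains fractional ranks such as 21.5 and 92.5.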
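To see how the `method` parameter of `Series.rank()` treats ties, here is a small sketch on hypothetical review counts (the values are made up, with a tie for second place):

```python
import pandas as pd

# Hypothetical review counts with a tie for second place
reviews = pd.Series([30, 20, 20, 10], index=["w", "x", "y", "z"])

# Default method="average": tied values share the mean of the ranks they span
avg_rank = reviews.rank(ascending=False)
print(avg_rank.tolist())  # [1.0, 2.5, 2.5, 4.0]

# method="min": every tied value gets the lowest rank in its group
min_rank = reviews.rank(ascending=False, method="min")
print(min_rank.tolist())  # [1.0, 2.0, 2.0, 4.0]

# method="first": ties broken by order of appearance, so every rank is distinct
first_rank = reviews.rank(ascending=False, method="first")
print(first_rank.tolist())
```

With method="min", a later filter such as rank <= 5 keeps every business tied at the cutoff; with method="first" it keeps exactly five rows.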

result = yelp_business[yelp_business["rank"] <= 5][
    ["name", "review_count"]
].sort_values(by="review_count", ascending=False)
In this line, we use boolean indexing to keep the rows whose 'rank' is less than or equal to 5. We then select only the 'name' and 'review_count' columns using double brackets [['name', 'review_count']], and finally sort the filtered DataFrame by 'review_count' in descending order with sort_values(by='review_count', ascending=False). Note that the filter depends on how ties were ranked: if two businesses tie for fifth place, the default average ranking gives each of them 5.5, so both are dropped and only four rows remain.
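The tie effect on the rank filter can be reproduced on a toy DataFrame. This sketch applies the same rank, filter, select, and sort pattern to hypothetical data in which businesses E and F tie for fifth place, then repeats it with method="min" for comparison:

```python
import pandas as pd

# Hypothetical data: E and F tie for fifth place with 10 reviews each
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E", "F", "G"],
        "review_count": [50, 40, 30, 20, 10, 10, 5],
    }
)

# Same pattern as the walkthrough: default (average) ranks, then filter and sort
toy["rank"] = toy["review_count"].rank(ascending=False)
result = toy[toy["rank"] <= 5][["name", "review_count"]].sort_values(
    by="review_count", ascending=False
)
print(result["name"].tolist())  # ['A', 'B', 'C', 'D'] (E and F both get rank 5.5)

# With method="min", both tied businesses receive rank 5 and are kept
toy["rank_min"] = toy["review_count"].rank(ascending=False, method="min")
tied = toy[toy["rank_min"] <= 5][["name", "review_count"]].sort_values(
    by="review_count", ascending=False
)
print(len(tied))  # 6
```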

Bringing It All Together
Now, let's put all the code together and see how it solves the problem.

import pandas as pd

# Load the Yelp business data (path taken from the notebook cell above)
yelp_business = pd.read_csv("CSV/yelp_business.csv")

# Rank businesses by review count; rank 1 = most reviews
yelp_business["rank"] = yelp_business["review_count"].rank(
    ascending=False
)

# Keep the top-5 ranks, select name and review count, sort descending
result = yelp_business[yelp_business["rank"] <= 5][
    ["name", "review_count"]
].sort_values(by="review_count", ascending=False)
First, we import pandas and load the yelp_business data. Then, we use rank() to rank the 'review_count' column in descending order and store the ranks in a new column called 'rank'. Next, we keep only the rows whose 'rank' is less than or equal to 5 and select the 'name' and 'review_count' columns. Finally, we sort these rows by 'review_count' in descending order and store the output in the result variable.
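The combined code returns whatever rows survive the rank filter, so a tie at fifth place changes how many rows come back. One alternative, a swapped-in technique rather than the original solution, is pandas' `DataFrame.nlargest`, whose `keep` parameter makes the tie policy explicit. Here is a minimal sketch wrapped in a hypothetical helper, `top_by_reviews`, with a toy DataFrame standing in for yelp_business:

```python
import pandas as pd


def top_by_reviews(df: pd.DataFrame, n: int = 5, keep: str = "all") -> pd.DataFrame:
    """Hypothetical helper: the n businesses with the most reviews, sorted descending.

    keep="all" also returns every business tied with the n-th largest count,
    so the result can have more than n rows; keep="first" returns exactly n
    rows, preferring the first occurrence among ties.
    """
    top = df.nlargest(n, "review_count", keep=keep)
    return top[["name", "review_count"]]


# Toy stand-in for yelp_business: E and F tie for fifth place
toy = pd.DataFrame(
    {
        "name": ["A", "B", "C", "D", "E", "F", "G"],
        "review_count": [50, 40, 30, 20, 10, 10, 5],
    }
)

print(top_by_reviews(toy, keep="first"))  # exactly five rows, A through E
print(top_by_reviews(toy))                # six rows: both tied businesses kept
```

Because nlargest already returns rows ordered by the column in descending order, no separate sort_values call is needed. On the real data the call would be top_by_reviews(yelp_business).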

Conclusion
By using the rank() function, boolean indexing, and sorting in pandas, we were able to find the top 5 businesses with the most reviews from the yelp_business dataset and output their names along with the total number of reviews.