Top 5 States With 5-Star Businesses

Find the top 5 states with the most 5-star businesses. Output the state name along with the number of 5-star businesses, and order the records by the number of 5-star businesses in descending order. If there are ties in the number of businesses, return every tied state, even if that yields more than five rows. If two states have the same count, sort them in alphabetical order.

In [1]:
import pandas as pd

In [7]:
yelp_business = pd.read_csv("../CSV/yelp_business.csv")
yelp_business

,business_id,name,neighborhood,address,city,state,postal_code,latitude,longitude,stars,review_count,is_open,categories
0,G5ERFWvPfHy7IDAUYlWL2A,All Colors Mobile Bumper Repair,,7137 N 28th Ave,Phoenix,AZ,85051,33.448,-112.074,1.0,4,1,Auto Detailing;Automotive
1,0jDvRJS-z9zdMgOUXgr6rA,Sunfare,,811 W Deer Valley Rd,Phoenix,AZ,85027,33.683,-112.085,5.0,27,1,Personal Chefs;Food;Gluten-Free;Food Delivery ...
2,6HmDqeNNZtHMK0t2glF_gg,Dry Clean Vegas,Southeast,"2550 Windmill Ln, Ste 100",Las Vegas,NV,89123,36.042,-115.118,1.0,4,1,Dry Cleaning & Laundry;Laundry Services;Local ...
3,pbt3SBcEmxCfZPdnmU9tNA,The Cuyahoga Room,,740 Munroe Falls Ave,Cuyahoga Falls,OH,44221,41.140,-81.472,1.0,3,0,Wedding Planning;Caterers;Event Planning & Ser...
4,CX8pfLn7Bk9o2-8yDMp_2w,The UPS Store,,"4815 E Carefree Hwy, Ste 108",Cave Creek,AZ,85331,33.798,-111.977,1.5,5,1,Notaries;Printing Services;Local Services;Ship...
...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,l4xrBZAKLXpSR4iprqTw8A,Mark's,,"1015 Lakeshore Boulevard E, Unit 5",Toronto,ON,M4M 1B3,43.656,-79.332,5.0,3,1,Women's Clothing;Shopping;Fashion;Men's Clothing
96,ICdzSGuv70gpSk7aqpIrHw,Wok-In Bbq,,"3540 Rutherford Road, Unit 67",Vaughan,ON,L4H 3T8,43.829,-79.549,4.5,10,1,Chinese;Barbeque;Restaurants
97,wk3wGDfJb1V-ciZpyhoNAA,Bic's Pub and Grill,,560 State Road 130,Trafford,PA,15085,40.390,-79.727,4.5,7,1,Pubs;American (Traditional);Nightlife;Bars;Piz...
98,NBYN4Nks_EsPHyAlJ_mdNw,Bistro Merlot,,18425 Antoine Faucon,Pierrefonds,QC,H9K 1M7,45.452,-73.887,4.5,8,0,Salad;Pizza;Restaurants;Event Planning & Servi...


In [8]:
result = yelp_business[yelp_business.stars == 5].groupby('state')['business_id'].count().to_frame(
   'n_businesses').reset_index()
result

,state,n_businesses
0,AZ,10
1,BW,1
2,EDH,2
3,IL,3
4,NV,4
5,OH,3
6,ON,5
7,QC,1
8,WI,3


In [9]:
result['rank'] = result['n_businesses'].rank(method='min', ascending=False)
result

,state,n_businesses,rank
0,AZ,10,1.0
1,BW,1,8.0
2,EDH,2,7.0
3,IL,3,4.0
4,NV,4,3.0
5,OH,3,4.0
6,ON,5,2.0
7,QC,1,8.0
8,WI,3,4.0


In [10]:
result = result[result['rank'] <= 5][['state', 'n_businesses']].sort_values(by=['n_businesses', 'state'], ascending=[False, True])
result

,state,n_businesses
0,AZ,10
6,ON,5
4,NV,4
3,IL,3
5,OH,3
8,WI,3


Solution Walkthrough
This problem involves analyzing a dataset of Yelp businesses and finding the top 5 states with the most 5-star businesses. The solution code uses the pandas library to filter the data, group it by state, count the number of 5-star businesses in each state, rank the states based on the number of businesses, and finally sort the results.

Let's walk through the solution step by step and understand the code.

Understanding The Data
The data is a DataFrame named yelp_business that holds information about Yelp businesses. The solution uses three of its columns: 'business_id', 'stars' (the rating, stored as a float such as 1.5 or 5.0), and 'state'.
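To experiment without the original CSV, a small stand-in frame with just the three columns the solution relies on can be built by hand. The rows below are invented for illustration (hypothetical IDs, states, and ratings), not taken from the dataset. The sketch also shows that comparing the float 'stars' column with the integer 5 still matches 5.0.

```python
import pandas as pd

# Hypothetical miniature version of yelp_business: only the columns the
# solution uses. IDs, states and ratings are invented for illustration.
yelp_business = pd.DataFrame(
    {
        "business_id": ["b1", "b2", "b3", "b4", "b5", "b6"],
        "state": ["AZ", "AZ", "NV", "NV", "ON", "OH"],
        "stars": [5.0, 4.5, 5.0, 1.0, 5.0, 5.0],
    }
)

# 'stars' is a float column; == 5 matches the rows rated exactly 5.0.
five_star = yelp_business[yelp_business.stars == 5]
print(five_star)
```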

The Problem Statement
The problem requires finding the top 5 states with the most 5-star businesses. The output should include the state name along with the number of 5-star businesses, ordered by that number in descending order. If several states tie at the cutoff, all of them should be returned, so the output can contain more than five rows. States with the same number of businesses should be sorted alphabetically.

Breaking Down The Code
Let's break down the solution code into smaller parts and understand their functionality.

import pandas as pd

result = (
    yelp_business[yelp_business.stars == 5]
    .groupby("state")["business_id"]
    .count()
    .to_frame("n_businesses")
    .reset_index()
)
The code begins by importing the pandas library and creating a DataFrame named result. The DataFrame is constructed using method chaining:

yelp_business[yelp_business.stars == 5] filters the yelp_business DataFrame to include only rows where the 'stars' column is equal to 5.
groupby('state') groups the filtered data by the 'state' column.
['business_id'].count() counts the non-null 'business_id' values in each group. Assuming no 'business_id' is missing, this equals the number of 5-star businesses in each state.
.to_frame('n_businesses') converts the resulting Series into a DataFrame with a column named 'n_businesses'.
.reset_index() resets the index of the DataFrame, making the 'state' column a regular column.
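Because count() skips missing values, it only equals the number of rows when the counted column is complete. The sketch below, on invented data with one deliberately missing ID, contrasts it with groupby(...).size(), which counts every row regardless of missing values.

```python
import numpy as np
import pandas as pd

# Invented 5-star rows; the last business_id is deliberately missing.
five_star = pd.DataFrame(
    {
        "business_id": ["a", "b", "c", np.nan],
        "state": ["AZ", "AZ", "NV", "NV"],
        "stars": [5.0, 5.0, 5.0, 5.0],
    }
)

# count(): non-null business_id values per state, so the NaN row is skipped.
by_count = five_star.groupby("state")["business_id"].count()

# size(): every row per state, whether or not business_id is missing.
by_size = five_star.groupby("state").size()

print(by_count.to_dict())
print(by_size.to_dict())
```

For an ID column that is never empty, both calls give the same answer; size() has the advantage of not depending on which column is selected.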
Continuing:

result["rank"] = result["n_businesses"].rank(
    method="min", ascending=False
)
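This line adds a 'rank' column to the result DataFrame by ranking 'n_businesses' in descending order, so the state with the most 5-star businesses gets rank 1. The method='min' option handles ties: tied values all receive the lowest rank of their group, and the ranks that follow are skipped. In the output above, IL, OH, and WI (3 businesses each) all get rank 4, and the next state, EDH, gets rank 7. Note that rank() returns floats (1.0, 4.0, ...).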
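The choice of method='min' matters for the "rank <= 5" filter. The sketch below reuses the per-state counts from the notebook output and compares it with method='dense', which does not skip ranks after a tie. With 'dense', EDH would climb to rank 5 and wrongly slip into the result.

```python
import pandas as pd

# Per-state 5-star counts copied from the notebook output above.
counts = pd.Series(
    {"AZ": 10, "BW": 1, "EDH": 2, "IL": 3, "NV": 4,
     "OH": 3, "ON": 5, "QC": 1, "WI": 3},
    name="n_businesses",
)

# 'min': tied states share the lowest rank of their block; later ranks are skipped.
rank_min = counts.rank(method="min", ascending=False)
# 'dense': tied states share a rank, and the next distinct value gets the next integer.
rank_dense = counts.rank(method="dense", ascending=False)

# Which states survive a "rank <= 5" filter under each method?
top_min = sorted(counts[rank_min <= 5].index)
top_dense = sorted(counts[rank_dense <= 5].index)

print(top_min)    # ['AZ', 'IL', 'NV', 'OH', 'ON', 'WI']
print(top_dense)  # ['AZ', 'EDH', 'IL', 'NV', 'OH', 'ON', 'WI']
```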

Finally:

result = result[result["rank"] <= 5][
    ["state", "n_businesses"]
].sort_values(by=["n_businesses", "state"], ascending=[False, True])
This code filters the result DataFrame to keep only rows whose 'rank' is less than or equal to 5. It then selects the 'state' and 'n_businesses' columns and sorts by the number of businesses in descending order, breaking ties by state name in alphabetical order. Because IL, OH, and WI share rank 4, all three pass the filter and the final result has six rows.
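As an alternative to the explicit rank column, pandas' DataFrame.nlargest accepts keep="all", which retains every row tied with the n-th largest value. That selects the same set of states as rank(method="min") <= 5. This is a swapped-in technique, not the walkthrough's own code; the sketch below applies it to the per-state counts from the notebook output.

```python
import pandas as pd

# Per-state counts from the notebook output, shaped like `result`.
result = pd.DataFrame(
    {
        "state": ["AZ", "BW", "EDH", "IL", "NV", "OH", "ON", "QC", "WI"],
        "n_businesses": [10, 1, 2, 3, 4, 3, 5, 1, 3],
    }
)

# keep="all" keeps every row tied with the 5th-largest value, so the
# output can exceed 5 rows; the sort then applies the required ordering.
top = (
    result.nlargest(5, "n_businesses", keep="all")
    .sort_values(by=["n_businesses", "state"], ascending=[False, True])
    .reset_index(drop=True)
)
print(top)
```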

Bringing It All Together
The complete solution code combines the above parts to solve the problem:

import pandas as pd

result = (
    yelp_business[yelp_business.stars == 5]
    .groupby("state")["business_id"]
    .count()
    .to_frame("n_businesses")
    .reset_index()
)
result["rank"] = result["n_businesses"].rank(
    method="min", ascending=False
)
result = result[result["rank"] <= 5][
    ["state", "n_businesses"]
].sort_values(by=["n_businesses", "state"], ascending=[False, True])
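The same steps can be wrapped in a reusable function with the cutoff as a parameter. The name top_states_by_five_star and the sample data are hypothetical, chosen for illustration. In the invented sample, NV and OH tie for second place, so asking for the top 2 returns three states.

```python
import pandas as pd


def top_states_by_five_star(df: pd.DataFrame, n: int = 5) -> pd.DataFrame:
    """Hypothetical helper: the walkthrough's steps with the cutoff as a parameter."""
    counts = (
        df[df["stars"] == 5]
        .groupby("state")["business_id"]
        .count()
        .to_frame("n_businesses")
        .reset_index()
    )
    # 'min' ranking lets every state tied at the cutoff through.
    counts["rank"] = counts["n_businesses"].rank(method="min", ascending=False)
    return (
        counts.loc[counts["rank"] <= n, ["state", "n_businesses"]]
        .sort_values(by=["n_businesses", "state"], ascending=[False, True])
        .reset_index(drop=True)
    )


# Invented sample: AZ has 3 five-star rows, NV and OH have 2 each,
# PA has 1 (its second row is rated 4.0 and is filtered out).
sample = pd.DataFrame(
    {
        "business_id": ["a", "b", "c", "d", "e", "f", "g", "h", "i"],
        "state": ["AZ", "AZ", "AZ", "NV", "NV", "OH", "OH", "PA", "PA"],
        "stars": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.0],
    }
)
out = top_states_by_five_star(sample, n=2)
print(out)
```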
Conclusion
In this walkthrough, we discussed a solution to the problem of finding the top 5 states with the most 5-star businesses in a Yelp dataset. The code uses pandas to filter and group the data, compute rankings, and sort the results.