Number of violations

You're given a dataset of health inspections. Count the number of violations recorded in inspections of 'Roxanne Cafe' for each year. If an inspection resulted in a violation, there will be a value in the 'violation_id' column. Output the number of violations by year in ascending order.

In [1]:
import pandas as pd
import datetime as dt

In [2]:
sf_restaurant_health_violations = pd.read_excel("../CSV/sf_restaurant_health_violations.xlsx", header=1)
sf_restaurant_health_violations.head()

Unnamed: 0,business_id,business_name,business_address,business_city,business_state,business_postal_code,business_latitude,business_longitude,business_location,business_phone_number,inspection_id,inspection_date,inspection_score,inspection_type,violation_id,violation_description,risk_category
0,5800,John Chin Elementary School,350 Broadway St,San Francisco,CA,94133.0,37.798,-122.403,"{'longitude': '-122.403154', 'needs_recoding':...",,5800_20171017,2017-10-17,98.0,Routine - Unscheduled,5800_20171017_103149,Wiping cloths not clean or properly stored or ...,Low Risk
1,64236,Sutter Pub and Restaurant,700 Sutter St,San Francisco,CA,94102.0,37.789,-122.412,"{'longitude': '-122.41188', 'needs_recoding': ...",,64236_20170725,2017-07-25,88.0,Routine - Unscheduled,64236_20170725_103133,Foods not protected from contamination,Moderate Risk
2,1991,SRI THAI CUISINE,4621 LINCOLN Way,San Francisco,CA,94122.0,37.764,-122.508,"{'longitude': '-122.507779', 'needs_recoding':...",,1991_20171129,2017-11-29,86.0,Routine - Unscheduled,1991_20171129_103139,Improper food storage,Low Risk
3,3816,Washington Bakery & Restaurant,733 Washington St,San Francisco,CA,94108.0,37.795,-122.406,"{'longitude': '-122.405845', 'needs_recoding':...",,3816_20160728,2016-07-28,67.0,Routine - Unscheduled,3816_20160728_103108,Contaminated or adulterated food,High Risk
4,39119,Brothers Restaurant,4128 GEARY Blvd,San Francisco,CA,94118.0,37.781,-122.464,"{'longitude': '-122.463762', 'needs_recoding':...",,39119_20160718,2016-07-18,79.0,Routine - Unscheduled,39119_20160718_103133,Foods not protected from contamination,Moderate Risk


In [4]:
df = sf_restaurant_health_violations


In [5]:
result = df[(df['business_name'] == 'Roxanne Cafe') & (~df['violation_id'].isna())
            ].groupby(pd.to_datetime(df['inspection_date']).dt.year)['violation_id'].count().reset_index()
result

Unnamed: 0,inspection_date,violation_id
0,2015,5
1,2016,2
2,2018,3


Solution Walkthrough
In this problem, we are given a dataset of health inspections and need to count the violations recorded for 'Roxanne Cafe' in each year. We use the pandas library to filter and aggregate the data.

Understanding The Data
The dataset is called sf_restaurant_health_violations. It contains health inspection records for businesses in San Francisco. The columns relevant here are business_name (the name of the business), violation_id (an identifier for each violation, empty when the inspection recorded none), and inspection_date (the date of the inspection).

The Problem Statement
The task is to count the violations for 'Roxanne Cafe' in each year. We filter the dataset to rows where business_name is 'Roxanne Cafe' and violation_id is not null (indicating a violation), then group the filtered rows by the year of inspection_date and count the violations in each group.
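The filter-then-group plan can be tried on a small frame before running it on the real data. The sketch below is only an illustration: the business names, dates, and IDs are made up, and only the column names match the dataset.

```python
import pandas as pd

# Hypothetical toy data; names, dates, and IDs are invented for illustration.
toy = pd.DataFrame(
    {
        "business_name": ["Cafe A", "Cafe A", "Cafe A", "Cafe B"],
        "inspection_date": ["2015-03-01", "2015-09-12", "2016-01-20", "2015-05-05"],
        "violation_id": ["a1", None, "a2", "b1"],
    }
)

# Keep only Cafe A rows that actually recorded a violation.
mask = (toy["business_name"] == "Cafe A") & toy["violation_id"].notna()
filtered = toy[mask]

# Count violations per inspection year.
years = pd.to_datetime(filtered["inspection_date"]).dt.year
toy_counts = filtered.groupby(years)["violation_id"].count()
print(toy_counts)
```

The Cafe A inspection without a violation_id and the Cafe B row are both dropped by the mask before any counting happens.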

Breaking Down The Code
Let's break down the given code step by step.

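We import the pandas library under the alias pd. The datetime module from the standard library is also imported as dt, but the solution never uses it; the .dt accessor that appears later belongs to pandas Series and is unrelated to this alias.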
import pandas as pd
import datetime as dt
We then bind the dataframe sf_restaurant_health_violations to the shorter name df. This creates a second reference to the same object, not a copy.
df = sf_restaurant_health_violations
Next, we filter the dataframe df for records where the business_name is 'Roxanne Cafe' and the violation_id is not null. We use boolean indexing to achieve this.
df[
    (df["business_name"] == "Roxanne Cafe")
    & (~df["violation_id"].isna())
]
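After filtering, we group the rows by inspection year. pd.to_datetime() converts the inspection_date column to datetime values, and the .dt.year accessor extracts the year from each one. The key is computed from the full df rather than from the filtered rows; pandas aligns it to the filtered rows by index label, so the grouping is still correct.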
.groupby(pd.to_datetime(df['inspection_date']).dt.year)
We then select the violation_id column and call count(), which returns the number of non-null values in each year's group.
["violation_id"].count()
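Finally, reset_index() moves the year out of the index into a regular inspection_date column, so the result is a two-column dataframe. Because groupby() sorts its group keys by default, the years already appear in ascending order.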
.reset_index()
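To see what the last step changes, it may help to look at a per-year count on its own. The sketch below builds a Series by hand, shaped like the output of the count() step and using the three values from the result shown earlier; the variable names counts and table are assumptions made for this illustration.

```python
import pandas as pd

# Per-year counts shaped like the output of groupby(...)["violation_id"].count().
# The values are copied from the result table shown earlier in this notebook.
counts = pd.Series(
    [5, 2, 3],
    index=pd.Index([2015, 2016, 2018], name="inspection_date"),
    name="violation_id",
)

# reset_index() moves the year labels out of the index into a regular column
# and replaces the index with a default 0..n-1 range.
table = counts.reset_index()
print(list(table.columns))
print(table)
```

Before the call, the years are index labels; afterwards they are ordinary data, which is usually more convenient for exporting or merging.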
Bringing It All Together
Putting all the code together, we have:

import pandas as pd
import datetime as dt

df = sf_restaurant_health_violations
result = (
    df[
        (df["business_name"] == "Roxanne Cafe")
        & (~df["violation_id"].isna())
    ]
    .groupby(pd.to_datetime(df["inspection_date"]).dt.year)["violation_id"]
    .count()
    .reset_index()
)
This code filters the sf_restaurant_health_violations dataframe down to 'Roxanne Cafe' rows with a recorded violation, groups them by inspection year, counts the violations in each year, and returns the counts as a dataframe ordered by year.
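One detail of the combined expression is easy to miss: the grouping key is built from the unfiltered df, which has more rows than the filtered frame being grouped. pandas matches the key to the grouped rows by index label, so rows that were filtered out do not affect the counts. The sketch below compares that approach with a key built from the filtered rows only, on made-up data (the toy frame and all of its values are assumptions for illustration).

```python
import pandas as pd

# Hypothetical toy data; only the column names match the real dataset.
toy = pd.DataFrame(
    {
        "business_name": ["Cafe A", "Cafe B", "Cafe A", "Cafe A"],
        "inspection_date": ["2015-03-01", "2015-05-05", "2016-01-20", "2016-08-09"],
        "violation_id": ["a1", "b1", "a2", None],
    }
)
filtered = toy[(toy["business_name"] == "Cafe A") & toy["violation_id"].notna()]

# Key built from the full frame, as in the walkthrough.
# pandas aligns it to `filtered` by index label during groupby.
full_key = pd.to_datetime(toy["inspection_date"]).dt.year

# Key built from the filtered rows only.
filtered_key = pd.to_datetime(filtered["inspection_date"]).dt.year

by_full = filtered.groupby(full_key)["violation_id"].count()
by_filtered = filtered.groupby(filtered_key)["violation_id"].count()

print(by_full)
print(by_filtered)
```

Both keys give the same per-year counts. Building the key from the filtered rows is arguably easier to read, since it avoids relying on implicit alignment.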

Conclusion
In this problem, we used pandas to count the violations recorded for 'Roxanne Cafe' in each year. Filtering the dataframe, grouping by inspection year, and applying count() produced the desired result.