# Query Filters

In [1]:
import censusdis.data as ced
from censusdis.datasets import CBP
from censusdis.states import NJ

## The County Business Patterns (CBP) Data Set

We are working with the [County Business Patterns](https://www.census.gov/data/developers/data-sets/cbp-zbp/cbp-api.html) data set.
It covers many different industries, each of which is identified by
a 2- to 6-digit [NAICS code](https://www.census.gov/naics/).
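
NAICS codes are hierarchical: each additional digit narrows the industry, so a longer code sits under every shorter code that is a prefix of it. The sketch below lists the levels a code belongs to. The helper `naics_levels` is a hypothetical name introduced here for illustration, and it assumes a plain numeric code (some published sector codes, such as ranges, would need extra handling).

```python
def naics_levels(code: str) -> list[str]:
    """Return the 2- to 6-digit prefixes of a NAICS code, broadest first.

    Hypothetical helper for illustration; assumes a plain numeric code.
    """
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError(f"expected a 2- to 6-digit numeric code, got {code!r}")
    # Each prefix of length 2..len(code) is one level of the hierarchy.
    return [code[:n] for n in range(2, len(code) + 1)]


levels = naics_levels("722511")
print(levels)  # ['72', '722', '7225', '72251', '722511']
```

This is why the same county can appear under several related codes in the data: each level of the hierarchy gets its own row.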

First, we will quickly explore the metadata for this data set to
find the variables we want: number of establishments (`ESTAB`), number of
employees (`EMP`), and annual payroll (`PAYANN`).

In [2]:
ced.variables.search_groups(CBP, 2022)

Unnamed: 0,DATASET,YEAR,GROUP,DESCRIPTION
0,cbp,2022,CB2200CBP,"All Sectors: County Business Patterns, includi..."


In [3]:
ced.variables.search(CBP, 2022, group_name="CB2200CBP")

Unnamed: 0,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2022,cbp,CB2200CBP,EMP,Number of employees,,
1,2022,cbp,CB2200CBP,EMPSZES,Employment size of establishments code,,
2,2022,cbp,CB2200CBP,EMPSZES_LABEL,Meaning of Employment size of establishments code,,
3,2022,cbp,CB2200CBP,EMP_F,Flag for number of employees,,
4,2022,cbp,CB2200CBP,EMP_N,Noise range for number of employees,,
5,2022,cbp,CB2200CBP,EMP_N_F,Flag for Noise range for number of employees,,
6,2022,cbp,CB2200CBP,ESTAB,Number of establishments,,
7,2022,cbp,CB2200CBP,ESTAB_F,Flag for number of establishments,,
8,2022,cbp,CB2200CBP,GEO_ID,Geographic identifier code,,
9,2022,cbp,CB2200CBP,GEO_ID_F,Geo Footnote,,


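## Unfiltered Download

We start by downloading these variables for every county in New Jersey with no
filter applied. We also request `NAICS2017` so that we can see which industry
each row describes.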

In [4]:
df_unfiltered_cbp = ced.download(
    CBP,
    2022,
    ["NAME", "ESTAB", "EMP", "PAYANN", "NAICS2017"],
    state=NJ,
    county="*",
)

In [5]:
df_unfiltered_cbp

Unnamed: 0,STATE,COUNTY,NAME,ESTAB,EMP,PAYANN,NAICS2017
0,34,001,"Atlantic County, New Jersey",6210,105931,5244241,00
1,34,001,"Atlantic County, New Jersey",3,35,2617,11
2,34,001,"Atlantic County, New Jersey",26,920,115211,22
3,34,001,"Atlantic County, New Jersey",26,920,115211,221
4,34,001,"Atlantic County, New Jersey",15,473,64751,2211
...,...,...,...,...,...,...,...
20472,34,035,"Somerset County, New Jersey",3,9,438,81394
20473,34,035,"Somerset County, New Jersey",3,9,438,813940
20474,34,035,"Somerset County, New Jersey",11,459,54177,81399
20475,34,035,"Somerset County, New Jersey",11,459,54177,813990
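
Because every level of the NAICS hierarchy gets its own row, one way to make sense of a result like this is to narrow it locally with pandas. The sketch below rebuilds the first five rows shown above by hand (so that it runs without calling the API) and keeps only the 2-digit codes. The names `df_sample` and `df_sectors` are introduced here for illustration; in CBP data the code `00` is the total across all sectors.

```python
import pandas as pd

# Hand-copied from the first five rows of the output above; not a new download.
df_sample = pd.DataFrame(
    {
        "NAME": ["Atlantic County, New Jersey"] * 5,
        "ESTAB": [6210, 3, 26, 26, 15],
        "EMP": [105931, 35, 920, 920, 473],
        "PAYANN": [5244241, 2617, 115211, 115211, 64751],
        "NAICS2017": ["00", "11", "22", "221", "2211"],
    }
)

# Keep only the broadest level: codes that are exactly two characters long.
df_sectors = df_sample[df_sample["NAICS2017"].str.len() == 2]
print(df_sectors)
```

Filtering after the fact works, but it still requires downloading every row first.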


## Using `query_filter=`

The unfiltered download returns a row for every NAICS code in every county,
more than 20,000 rows in all. In most cases we only want the data for a specific
industry code, so we pass `query_filter=` to restrict the download. In this
example, we are only interested in restaurants.

In [6]:
NAICS_RESTAURANTS = "72251"

In [7]:
df_filtered_cbp = ced.download(
    CBP,
    2022,
    ["NAME", "ESTAB", "EMP", "PAYANN"],
    query_filter={"NAICS2017": NAICS_RESTAURANTS},
    state=NJ,
    county="*",
)

In [8]:
df_filtered_cbp

Unnamed: 0,NAICS2017,STATE,COUNTY,NAME,ESTAB,EMP,PAYANN
0,72251,34,1,"Atlantic County, New Jersey",629,11112,284163
1,72251,34,3,"Bergen County, New Jersey",2333,27017,739784
2,72251,34,5,"Burlington County, New Jersey",719,11278,253101
3,72251,34,7,"Camden County, New Jersey",857,12863,296845
4,72251,34,9,"Cape May County, New Jersey",609,4142,212790
5,72251,34,11,"Cumberland County, New Jersey",199,2963,58016
6,72251,34,13,"Essex County, New Jersey",1432,16059,425949
7,72251,34,15,"Gloucester County, New Jersey",482,8881,240434
8,72251,34,17,"Hudson County, New Jersey",1448,16685,469575
9,72251,34,19,"Hunterdon County, New Jersey",245,2787,64125
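
To give a rough idea of what a filter like this means at the API level, the sketch below assembles a Census Data API URL in which the filter appears as an extra `NAICS2017=72251` query parameter. This is an illustration only: the helper `build_cbp_url` and the constant `BASE_URL` are hypothetical names, the endpoint pattern is an assumption not taken from this notebook, and how `censusdis` builds its requests internally is not shown here.

```python
from urllib.parse import urlencode

# Assumed endpoint pattern for the 2022 CBP data set; not taken from the notebook.
BASE_URL = "https://api.census.gov/data/2022/cbp"


def build_cbp_url(variables, state, query_filter):
    """Hypothetical helper: assemble a county-level CBP query URL."""
    params = {
        "get": ",".join(variables),
        "for": "county:*",
        "in": f"state:{state}",
    }
    # Each filter entry becomes one more key=value predicate on the query.
    params.update(query_filter)
    # Leave ',', ':' and '*' unescaped so the URL stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=',:*')}"


url = build_cbp_url(
    ["NAME", "ESTAB", "EMP", "PAYANN"], "34", {"NAICS2017": "72251"}
)
print(url)
```

The point of the design is that the filtering happens on the server, so only the matching rows are transferred.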
