# Variable Search

Sometimes variable semantics change over time, especially in the ACS. `censusdis.data.variables.search`
(`ced.variables.search` in the code below) gives us a way to track these changes, so we can be sure we are using
consistent data in multi-year studies. It also supports regular expression search to help us find the variables
we are looking for, and it can be narrowed to search only a specific group. The examples below demonstrate each
of these uses.

In [1]:
import censusdis.data as ced
from censusdis.datasets import ACS1

## About `ced.variables.search`

In [2]:
help(ced.variables.search)

Help on method search in module censusdis.impl.varcache:

search(dataset: str, vintage: Union[int, Iterable[int]], *, group_name: Optional[str] = None, name: Union[str, Iterable[str], NoneType] = None, pattern: Union[str, re.Pattern, NoneType] = None, case: bool = False) -> pandas.core.frame.DataFrame method of censusdis.impl.varcache.VariableCache instance
    Retrieve information about the evolution of one of more variables in a data set over one or more vintages.
    
    Parameters
    ----------
    dataset
        The data set.
    vintage
        One or more Vintages to explore.
    group_name
        The group if we should explore only a single group. If `None` all groups
        will be explored.
    name
        The name of one of more variables to explore. If `None` all variables are considered.
        Normally at most one of `name` and `re` will be used.
    pattern
        A regular expression to match against the name and description of a variable. This
        is used t

## Look at the history of a variable over time

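`B08006_017E` is a variable that was repurposed after 2005: its label was
`Estimate!!Total!!Motorcycle` in 2005 and `Estimate!!Total!!Worked at home` from 2006 on.
Also note that it did not exist in the 2020 vintage.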

In [3]:
VARIABLE = "B08006_017E"

In [4]:
df_variable_history = ced.variables.search(ACS1, range(2005, 2023), name=VARIABLE)

In [5]:
df_variable_history

,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2005,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Motorcycle,,
1,2006,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
2,2007,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
3,2008,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
4,2009,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
5,2010,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
6,2011,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
7,2012,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
8,2013,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
9,2014,acs/acs1,B08006,B08006_017E,Estimate!!Total!!Worked at home,,
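
Scanning a long history table by eye is error-prone, so it can help to flag the years where a label changes. The following is a minimal sketch in plain Python, assuming the `(YEAR, LABEL)` pairs have been copied out of the output above; `label_changes` is a hypothetical helper, not part of `censusdis`.

```python
# (YEAR, LABEL) pairs for B08006_017E, copied from the rows shown above.
MOTORCYCLE = "Estimate!!Total!!Motorcycle"
WORKED_AT_HOME = "Estimate!!Total!!Worked at home"
history = [(2005, MOTORCYCLE)] + [(year, WORKED_AT_HOME) for year in range(2006, 2015)]


def label_changes(rows):
    """Return (year, old_label, new_label) for each year whose label
    differs from the label in the preceding row (hypothetical helper)."""
    ordered = sorted(rows)
    changes = []
    for (_, prev_label), (year, label) in zip(ordered, ordered[1:]):
        if label != prev_label:
            changes.append((year, prev_label, label))
    return changes


changes = label_changes(history)
for year, old, new in changes:
    print(f"{year}: {old!r} -> {new!r}")
```

For the rows shown, this reports a single change, in 2006. On the real DataFrame, comparing `df_variable_history["LABEL"]` with its own `.shift()` should give the same information without leaving pandas.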


## Search for variables about grandparents

In [6]:
df_grandparent_variables = ced.variables.search(ACS1, 2022, pattern="grandparent")

In [7]:
df_grandparent_variables

,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2022,acs/acs1,B10002,B10002_002E,Estimate!!Total:!!Grandparent householder resp...,,
1,2022,acs/acs1,B10002,B10002_003E,Estimate!!Total:!!Grandparent householder resp...,,
2,2022,acs/acs1,B10002,B10002_004E,Estimate!!Total:!!Grandparent householder resp...,,
3,2022,acs/acs1,B10002,B10002_005E,Estimate!!Total:!!Grandparent householder not ...,,
4,2022,acs/acs1,B10010,B10010_002E,Estimate!!Median family income in the past 12 ...,,
...,...,...,...,...,...,...,...
142,2022,acs/acs1,B10063,B10063_007E,Estimate!!Total:!!Household without grandparen...,,
143,2022,acs/acs1,B99104,B99104_003E,Estimate!!Total:!!Living with own grandchildre...,,
144,2022,acs/acs1,B99104,B99104_004E,Estimate!!Total:!!Living with own grandchildre...,,
145,2022,acs/acs1,B99104,B99104_005E,Estimate!!Total:!!Living with own grandchildre...,,
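
The help text says `pattern` is a regular expression matched against the name and description of a variable, and the signature shows `case` defaulting to `False`. The sketch below illustrates those semantics with Python's `re` module on a few rows copied from the outputs above (truncated labels are kept exactly as printed). It is an assumption about how such a search could behave, not the library's actual implementation; `search_sketch` is a hypothetical name.

```python
import re

# (VARIABLE, LABEL) pairs copied from the outputs above.
variables = [
    ("B10002_002E", "Estimate!!Total:!!Grandparent householder resp..."),
    ("B10002_005E", "Estimate!!Total:!!Grandparent householder not ..."),
    ("B08006_017E", "Estimate!!Total!!Worked at home"),
]


def search_sketch(rows, pattern, case=False):
    """Return the names of variables whose name or label contains a match.

    Assumption: matching is case-insensitive unless `case` is True,
    mirroring the `case: bool = False` default in the signature.
    """
    flags = 0 if case else re.IGNORECASE
    regex = re.compile(pattern, flags)
    return [name for name, label in rows if regex.search(name) or regex.search(label)]


print(search_sketch(variables, "grandparent"))             # both B10002 variables
print(search_sketch(variables, "grandparent", case=True))  # [] (labels say "Grandparent")
print(search_sketch(variables, r"^B08006_"))               # matches on the variable name
```

The case-sensitive call returns nothing here because the sample labels capitalize "Grandparent", which is one reason a case-insensitive default is convenient for exploratory searches.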


What unique groups do they fall in?

In [8]:
df_grandparent_variables["GROUP"].unique()

array(['B10002', 'B10010', 'B10050', 'B10051A', 'B10051B', 'B10051C',
       'B10051D', 'B10051E', 'B10051F', 'B10051G', 'B10051H', 'B10051I',
       'B10051', 'B10052', 'B10053', 'B10054', 'B10056', 'B10057',
       'B10058', 'B10059', 'B10063', 'B99104'], dtype=object)

We can also do the same search but within a single group.

In [9]:
ced.variables.search(ACS1, 2022, pattern="grandparent", group_name="B10054")

,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
0,2022,acs/acs1,B10054,B10054_003E,Estimate!!Total:!!Speak only English:!!Grandpa...,,
1,2022,acs/acs1,B10054,B10054_004E,Estimate!!Total:!!Speak only English:!!Grandpa...,,
2,2022,acs/acs1,B10054,B10054_005E,Estimate!!Total:!!Speak only English:!!Grandpa...,,
3,2022,acs/acs1,B10054,B10054_006E,Estimate!!Total:!!Speak only English:!!Grandpa...,,
4,2022,acs/acs1,B10054,B10054_009E,Estimate!!Total:!!Speak other language:!!Speak...,,
5,2022,acs/acs1,B10054,B10054_010E,Estimate!!Total:!!Speak other language:!!Speak...,,
6,2022,acs/acs1,B10054,B10054_011E,Estimate!!Total:!!Speak other language:!!Speak...,,
7,2022,acs/acs1,B10054,B10054_012E,Estimate!!Total:!!Speak other language:!!Speak...,,
8,2022,acs/acs1,B10054,B10054_014E,Estimate!!Total:!!Speak other language:!!Speak...,,
9,2022,acs/acs1,B10054,B10054_015E,Estimate!!Total:!!Speak other language:!!Speak...,,


## Has the number of variables in a group changed over time?

In [10]:
GROUP = "B08006"

In [11]:
df_group_vars = ced.variables.search(ACS1, range(2005, 2023), group_name=GROUP)

In [12]:
df_group_vars.groupby("YEAR")["VARIABLE"].count()

YEAR
2005    65
2006    53
2007    53
2008    53
2009    53
2010    53
2011    53
2012    53
2013    53
2014    53
2015    53
2016    53
2017    53
2018    53
2019    53
2021    53
2022    53
Name: VARIABLE, dtype: int64
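
Two things stand out in these counts: the drop from 65 to 53 variables after 2005, and the absence of any row for 2020. Both can be detected programmatically. The sketch below uses plain Python on the counts copied from the output above; the names `counts`, `missing`, and `changed` are illustrative only.

```python
# Variables per year for group B08006, copied from the output above.
counts = {2005: 65}
counts.update({year: 53 for year in range(2006, 2023) if year != 2020})

# Vintages we asked for, matching range(2005, 2023) in the search call.
requested = range(2005, 2023)

# Requested vintages for which the search returned no variables at all.
missing = [year for year in requested if year not in counts]

# Years whose variable count differs from the previous available year.
years = sorted(counts)
changed = [
    (year, counts[prev], counts[year])
    for prev, year in zip(years, years[1:])
    if counts[prev] != counts[year]
]

print("missing vintages:", missing)  # [2020]
print("count changes:", changed)     # [(2006, 65, 53)]
```

A stable count does not guarantee stable semantics, so a check like this complements, rather than replaces, comparing labels year over year.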

## Sadly, the ACS has never asked any questions about tacos

In [13]:
df_taco = ced.variables.search(ACS1, range(2005, 2023), pattern="taco")
df_taco

,YEAR,DATASET,GROUP,VARIABLE,LABEL,SUGGESTED_WEIGHT,VALUES
