# Column Labels

Sometimes we'd rather work with human-readable names instead of the symbolic names
that variables have. We have access to metadata about the variables and can use it
to map the names to our liking. This notebook demonstrates how.

Based on a question raised by [@arilamstein](https://github.com/arilamstein) in
[#325](https://github.com/censusdis/censusdis/issues/325).

In [1]:
import censusdis.data as ced
from censusdis.datasets import ACS5
from censusdis.states import NY

In [2]:
VINTAGE = 2023
GROUP = "B04006"

A pretty typical group download:

In [3]:
df = ced.download(ACS5, VINTAGE, "NAME", group=GROUP, state=NY, place="*")

In [4]:
df.head()

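|  | STATE | PLACE | NAME | B04006_001E | B04006_002E | B04006_003E | B04006_004E | B04006_005E | B04006_006E | B04006_007E | ... | B04006_101E | B04006_102E | B04006_103E | B04006_104E | B04006_105E | B04006_106E | B04006_107E | B04006_108E | B04006_109E | GEO_ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 36 | 155 | Accord CDP, New York | 654 | 0 | 0 | 0 | 7 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 213 | 128 | 1600000US3600155 |
| 1 | 36 | 199 | Adams village, New York | 1730 | 0 | 0 | 0 | 89 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 198 | 319 | 1600000US3600199 |
| 2 | 36 | 232 | Adams Center CDP, New York | 1002 | 0 | 0 | 0 | 94 | 0 | 0 | ... | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 43 | 412 | 1600000US3600232 |
| 3 | 36 | 276 | Addison village, New York | 1775 | 0 | 0 | 0 | 224 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 175 | 573 | 1600000US3600276 |
| 4 | 36 | 342 | Afton village, New York | 1086 | 0 | 0 | 0 | 120 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 281 | 1600000US3600342 |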


Now we want to change the symbolic column names like `B04006_*` to something more
human-readable. So we will write a little function that takes a variable, looks it
up in the metadata, pulls out the label, and parses out the part we want.

It would be nice if there were a flag to `ced.download` to do this. The reason there isn't
is that different data sets use different naming conventions in their labels, and sometimes
the convention even changes from vintage to vintage. It's not unsolvable, but we have not done it yet.

In [5]:
def name_mapper(variable: str) -> str:
    """Map from the variables we got back to their labels."""
    if variable.startswith(GROUP):
        # Look up details of the particular variable.
        # (Named `df_vars` rather than `vars` to avoid shadowing the built-in.)
        df_vars = ced.variables.search(ACS5, VINTAGE, group_name=GROUP, name=variable)
        # Get the label and parse out the part we want:
        # the last "!"-separated component, without any trailing ":".
        label = df_vars.iloc[0]["LABEL"]
        return label.split("!")[-1].split(":")[0]
    else:
        # Not in the group we are interested in, so leave it as is.
        return variable
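
To see what the parsing step does in isolation, here is a minimal sketch that applies the same `split` expression to a few label strings. The sample labels are an assumption about the label format (components separated by `!!`, parent categories ending in `:`), chosen to be consistent with the renamed columns in the output; they are typed in by hand, not fetched from the metadata. `parse_label` and `sample_labels` are hypothetical names used only for this illustration.

```python
def parse_label(label: str) -> str:
    """Keep the last '!'-separated component and drop anything from ':' on."""
    return label.split("!")[-1].split(":")[0]


# Hand-written sample labels in the assumed "Estimate!!Total:!!..." format.
sample_labels = {
    "B04006_001E": "Estimate!!Total:",
    "B04006_002E": "Estimate!!Total:!!Afghan",
    "B04006_006E": "Estimate!!Total:!!Arab:",
    "B04006_007E": "Estimate!!Total:!!Arab:!!Egyptian",
}

for variable, label in sample_labels.items():
    print(f"{variable}: {label!r} -> {parse_label(label)!r}")
```

Splitting on a single `!` rather than `!!` works because the empty strings between the two exclamation marks are never the last element of the list.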

In [6]:
# Rename the columns based on the mapper.
df = df.rename(columns=name_mapper)
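
`DataFrame.rename` accepts either a function or a dict for `columns`. As a possible alternative to the function, the sketch below builds the whole mapping once as a dict. It assumes the variable-to-label pairs are already in hand as a plain dict (hand-written here, in the assumed `!!`-separated format); `build_rename_map` and `labels` are hypothetical names, not part of `censusdis`.

```python
def build_rename_map(labels: dict[str, str], group: str) -> dict[str, str]:
    """Map each variable in `group` to the last component of its label."""
    return {
        variable: label.split("!")[-1].split(":")[0]
        for variable, label in labels.items()
        if variable.startswith(group)
    }


# Hand-written stand-in for variable metadata (assumed label format).
labels = {
    "B04006_001E": "Estimate!!Total:",
    "B04006_002E": "Estimate!!Total:!!Afghan",
    "B04006_003E": "Estimate!!Total:!!Albanian",
}

rename_map = build_rename_map(labels, "B04006")
print(rename_map)
```

A dict like this could then be passed as `df.rename(columns=rename_map)`; columns that are not keys in the dict, such as `NAME` or `GEO_ID`, are left unchanged.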

In [7]:
# Now the names we want should be there.
df.head()

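|  | STATE | PLACE | NAME | Total | Afghan | Albanian | Alsatian | American | Arab | Egyptian | ... | Haitian | Jamaican | Trinidadian and Tobagonian | U.S. Virgin Islander | West Indian | Other West Indian | Yugoslavian | Other groups | Unclassified or not reported | GEO_ID |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 36 | 155 | Accord CDP, New York | 654 | 0 | 0 | 0 | 7 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 213 | 128 | 1600000US3600155 |
| 1 | 36 | 199 | Adams village, New York | 1730 | 0 | 0 | 0 | 89 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 198 | 319 | 1600000US3600199 |
| 2 | 36 | 232 | Adams Center CDP, New York | 1002 | 0 | 0 | 0 | 94 | 0 | 0 | ... | 10 | 0 | 0 | 0 | 0 | 0 | 0 | 43 | 412 | 1600000US3600232 |
| 3 | 36 | 276 | Addison village, New York | 1775 | 0 | 0 | 0 | 224 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 175 | 573 | 1600000US3600276 |
| 4 | 36 | 342 | Afton village, New York | 1086 | 0 | 0 | 0 | 120 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 79 | 281 | 1600000US3600342 |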
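
Keeping only the last component of each label works when those components are unique within the group. In groups where the same leaf appears under several parents, the renamed columns could collide. The sketch below shows one way to check for that and, if needed, fall back to the full label path. The sample labels are illustrative and hand-written in the assumed `!!`-separated format; `find_collisions` and `full_path_label` are hypothetical helpers.

```python
from collections import Counter


def find_collisions(names: list[str]) -> list[str]:
    """Return the names that occur more than once, sorted."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def full_path_label(label: str) -> str:
    """Join all label components after the first, dropping trailing colons."""
    parts = [part.rstrip(":") for part in label.split("!!")]
    # Drop the leading component (e.g. "Estimate") when there is more than one.
    return " / ".join(parts[1:] or parts)


# Illustrative labels where the last component repeats under two parents.
labels = [
    "Estimate!!Total:!!Male:!!Under 5 years",
    "Estimate!!Total:!!Female:!!Under 5 years",
]

short_names = [label.split("!")[-1].split(":")[0] for label in labels]
print(find_collisions(short_names))

long_names = [full_path_label(label) for label in labels]
print(long_names)
```

Running a check like `find_collisions` on the renamed columns before relying on them is cheap, and the longer path-style names trade brevity for uniqueness.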
