# Geographic gazetteer

> Urban or Rural Spatial Units: Standard and Ad Hoc

Census Bureau:
- [FIPS codes](https://www.census.gov/geographies/reference-files/2019/demo/popest/2019-fips.html)
- [Gazetteer reference files](https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.html)
- [Cartographic boundary files](https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html)

In [None]:
#default_exp geo_gazetteer

In [None]:
#export
import pandas as pd

In [None]:
#export
data = {}

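def init():
    """Download the Census reference files and build the lookup tables (runs once)."""
    # Check for the final table, so a failed build is retried on the next call
    if 'states' in data:
        return
    download_data()
    build_state_data()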

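def download_data():
    # Population Estimates geocodes (FIPS codes and area names); skiprows=4 skips
    # the title rows above the column header. Reading .xlsx requires openpyxl.
    df = pd.read_excel('https://www2.census.gov/programs-surveys/popest/geographies/2019/all-geocodes-v2019.xlsx',
                       skiprows=4, dtype='str')
    data['all-geocodes'] = df

    # 2020 Gazetteer counties file: tab-separated inside a zip archive;
    # pandas infers the zip compression from the URL's extension.
    df = pd.read_csv('https://www2.census.gov/geo/docs/maps-data/data/gazetteer/2020_Gazetteer/2020_Gaz_counties_national.zip',
                     sep='\t', dtype='str')
    data['gaz-counties'] = df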

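def build_state_data():
    # State names and FIPS codes from the geocodes file (summary level 040 = state)
    df = data['all-geocodes']
    df = df[df['Summary Level'] == '040']
    df = df.rename(columns={'Area Name (including legal/statistical area description)': 'STATE_NAME',
                            'State Code (FIPS)': 'STATE_CODE'})
    names = df[['STATE_CODE', 'STATE_NAME']]

    # USPS abbreviations from the Gazetteer; the first two GEOID digits are the state FIPS code.
    # assign() returns a new frame, avoiding SettingWithCopyWarning on the filtered copy.
    df = data['gaz-counties']
    df = df.rename(columns={'USPS': 'STATE_ABBR'})
    df = df.drop_duplicates('STATE_ABBR')
    df = df.assign(STATE_CODE=df['GEOID'].str[:2])
    abbrs = df[['STATE_ABBR', 'STATE_CODE']]

    # Outer join, then require every state to be matched in both sources
    df = names.merge(abbrs, how='outer', on='STATE_CODE', indicator=True)
    assert (df['_merge'] == 'both').all(), 'state codes differ between the two sources'
    del df['_merge']

    data['states'] = df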

In [None]:
#export
def get_mapping(key, val):
    """Return a dict from column `key` to column `val`, e.g. STATE_ABBR -> STATE_NAME."""
    init()
    if key.startswith('STATE'):
        df = data['states']
    else:
        raise ValueError(f'Unknown field: {key}')
    return dict(df[[key, val]].itertuples(index=False))
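The crosswalk logic above can be exercised offline. The sketch below is a hedged illustration, not part of the module: it replaces the two Census downloads with a tiny hand-built sample (a few real state and county FIPS codes, only the columns the join needs), and `build_states` is a hypothetical standalone version of `build_state_data` that takes its inputs as arguments.

```python
import pandas as pd

# Hand-built stand-ins for the two Census files (only the columns the join uses)
geocodes = pd.DataFrame({
    'Summary Level': ['010', '040', '040', '040'],
    'State Code (FIPS)': ['00', '01', '02', '06'],
    'Area Name (including legal/statistical area description)':
        ['United States', 'Alabama', 'Alaska', 'California'],
})
gaz_counties = pd.DataFrame({
    'USPS': ['AL', 'AL', 'AK', 'CA'],
    'GEOID': ['01001', '01003', '02013', '06001'],
})


def build_states(geocodes, gaz_counties):
    """Hypothetical helper: join state names to USPS abbreviations on STATE_CODE."""
    names = (geocodes[geocodes['Summary Level'] == '040']
             .rename(columns={'Area Name (including legal/statistical area description)': 'STATE_NAME',
                              'State Code (FIPS)': 'STATE_CODE'})
             [['STATE_CODE', 'STATE_NAME']])
    abbrs = (gaz_counties.rename(columns={'USPS': 'STATE_ABBR'})
             .drop_duplicates('STATE_ABBR')
             .assign(STATE_CODE=lambda d: d['GEOID'].str[:2])  # state FIPS prefix
             [['STATE_ABBR', 'STATE_CODE']])
    states = names.merge(abbrs, how='outer', on='STATE_CODE', indicator=True)
    # A 'left_only' or 'right_only' row would mean a state is missing from one source
    assert (states['_merge'] == 'both').all()
    return states.drop(columns='_merge')


states = build_states(geocodes, gaz_counties)
abbr_to_name = dict(states[['STATE_ABBR', 'STATE_NAME']].itertuples(index=False))
print(abbr_to_name)  # {'AL': 'Alabama', 'AK': 'Alaska', 'CA': 'California'}
```

The outer join with `indicator=True` is what makes the sanity check possible: an inner join would silently drop any state present in only one file.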

1. Non-urban counties (OMB)
2. Inverse of Census urbanity (Census Bureau)
3. Outside urban activity (ERS)
    - Urban Influence Codes
    - Rural-Urban Continuum
    - Rural-Urban Commuting Areas
4. Non-urban census tracts
    - HRSA/FORHP
    - Inverse of spatial overlap with urban areas
5. Zip codes
    - data challenges
    - Frontier and Remote (FAR)

##### Zip Code Tabulation Area (ZCTA)

##### Postal Zip Code

##### Urban Influence Codes

##### Rural-Urban Commuting Area (RUCA)

##### Rural-Urban Continuum (RUC)

##### HRSA/FORHP

HRSA's Federal Office of Rural Health Policy (FORHP) accepts all non-metro counties as rural 
and uses an additional method of determining rural status called the Rural-Urban Commuting 
Area (RUCA) codes. Like the MSAs, these are based on Census data which is used to assign a 
code to each Census Tract. Tracts inside Metropolitan counties with the codes 4-10 are 
considered rural. While use of the RUCA codes has allowed identification of rural census 
tracts in Metropolitan counties, among the more than 60,000 tracts in the U.S. there are 
some that are extremely large and where use of RUCA codes alone fails to account for distance 
to services and sparse population. In response to these concerns, FORHP has designated 
132 large area census tracts with RUCA codes 2 or 3 as rural. These tracts are at least 
400 square miles in area with a population density of no more than 35 people per square 
mile. The FORHP definition includes about 18% of the population and 85% of the area of the 
USA. RUCA codes represent the current version of the Goldsmith Modification.
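The FORHP criteria can be written as a single predicate on a census tract. This is a minimal sketch under stated assumptions: `is_forhp_rural` and its parameters are hypothetical names, it assumes primary (integer) RUCA codes 1-10 and that the caller already knows whether the tract's county is Metropolitan, and it applies the stated thresholds rather than FORHP's published list of 132 designated tracts, so it may not reproduce that list exactly.

```python
def is_forhp_rural(metro_county: bool, ruca: int,
                   area_sq_mi: float, population: int) -> bool:
    """Hypothetical helper applying the FORHP rural criteria to one tract."""
    if not metro_county:
        # Every tract in a non-metro county counts as rural
        return True
    if 4 <= ruca <= 10:
        # Tracts in Metropolitan counties with RUCA codes 4-10 are rural
        return True
    if ruca in (2, 3):
        # Large-area exception: at least 400 sq mi and at most 35 people per sq mi.
        # The area test runs first, so a zero area never reaches the division.
        return area_sq_mi >= 400 and population / area_sq_mi <= 35
    return False


print(is_forhp_rural(metro_county=True, ruca=2, area_sq_mi=500.0, population=10_000))  # True
```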


There are two major definitions which the Federal government uses to identify the rural 
status of an area: the Census Bureau's 'Urban Area' and the OMB's 'Core-Based Statistical 
Area'.

##### Urban Area
The first is from the U.S. Census Bureau, which identifies two types of 
urban areas: Urbanized Areas (UAs) of 50,000 or more people and Urban Clusters (UCs) of at 
least 2,500 and less than 50,000 people. Since the U.S. Census Bureau does not explicitly 
classify areas as rural, rural is defined as "encompassing all population, housing, and 
territory not included within an urban area" (those areas not identified as a UC or UA). 
In the 2010 Census, 19.3% of the population was rural, while over 95% of the land area 
was classified as rural.
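The 2010 Census thresholds reduce to a small lookup. A minimal sketch, assuming the caller already knows the population of the urban area (if any) that contains a location; `census_urban_type` is a hypothetical helper, and `None` stands for "not inside any UA or UC".

```python
from typing import Optional


def census_urban_type(urban_area_population: Optional[int]) -> str:
    """Hypothetical helper: 'UA', 'UC', or 'rural' under the 2010 thresholds."""
    if urban_area_population is None:
        # Outside every urban area: rural by exclusion
        return 'rural'
    if urban_area_population >= 50_000:
        return 'UA'   # Urbanized Area
    if urban_area_population >= 2_500:
        return 'UC'   # Urban Cluster
    # Below 2,500 people the settlement does not qualify as an urban area
    return 'rural'


print(census_urban_type(12_000))  # UC
```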

##### CBSA
The second is from the Office of Management and Budget (OMB), 
which designates counties as Metropolitan, Micropolitan, or Neither. An OMB Metropolitan 
area contains a core urban area of 50,000 or more population, and a Micropolitan area 
contains an urban core of at least 10,000 (but less than 50,000) population. All counties 
that are not part of a Metropolitan Statistical Area (MSA) are considered rural. 
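The county rule can be sketched the same way. This is a simplification under a loud assumption: the real OMB delineation also pulls in outlying counties through commuting ties, which is ignored here; the sketch only looks at the population of the urban core a county is associated with. `omb_county_type` and `is_omb_rural` are hypothetical names.

```python
from typing import Optional


def omb_county_type(core_population: Optional[int]) -> str:
    """Hypothetical helper: classify a county by the size of its urban core."""
    if core_population is None:
        return 'Neither'        # no urban core of 10,000+
    if core_population >= 50_000:
        return 'Metropolitan'
    if core_population >= 10_000:
        return 'Micropolitan'
    return 'Neither'


def is_omb_rural(core_population: Optional[int]) -> bool:
    """Rural means 'not part of a Metropolitan Statistical Area'."""
    return omb_county_type(core_population) != 'Metropolitan'


print(is_omb_rural(25_000))  # True: Micropolitan counties count as rural
```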

There are measurement challenges with both the U.S. Census Bureau and OMB definitions. 
Some policy experts note that the U.S. Census Bureau definition classifies quite a bit of 
suburban area as rural. The OMB definition, by contrast, counts the rural areas inside 
Metropolitan counties as non-rural.
Consequently, one could argue that the Census Bureau standard includes an overcount of rural 
population whereas the OMB standard represents an undercount of the rural population.

##### Census Tract

##### Frontier and Remote

### Addendum 1: ZIP codes

### Addendum 2: Census Tracts

### Addendum 3: GeoDataframes