namestand

namestand is a Python library for easily transforming/standardizing lists of names (and other strings). No magic here, just a collection of useful tools.

namestand was developed with unwieldy database column names in mind, but can be applied to any list of strings. Other uses might include standardizing political donor names, normalizing survey responses, et cetera.

Installation

pip install namestand

Pre-Built Converters

namestand comes with a set* of broadly useful converters.

*Right now, just three of them, two of which are very alpha. Contributions and suggestions welcome.

namestand.downscore(string_or_list_of_strings)

Suggested usage: column names, form-response options, etc.

Steps:

Lowercases the string
Strips any leading and trailing whitespace
Converts any run of characters other than ASCII letters and digits to a single underscore
Removes any leading and trailing underscores
Prefixes the string with "_" if it starts with a digit (which can otherwise cause trouble with pandas and other libraries). E.g., "2013 Happiness" becomes "_2013_happiness".

Example:

namestand.downscore("Case Number") == "case_number"

namestand.downscore([
    "Case Number",
    "Case #",
    "Is Super-Duper?"
]) == [
    "case_number",
    "case",
    "is_super_duper"
]
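The steps listed above can be sketched in plain Python as follows. This is only an illustration of the described behavior, not namestand's actual source; `downscore_sketch` is a hypothetical name, and unlike the real converter it accepts a single string only.

```python
import re

def downscore_sketch(name):
    # Hypothetical re-implementation of the documented steps (single string only).
    s = name.lower().strip()            # lowercase, then trim whitespace
    s = re.sub(r"[^a-z0-9]+", "_", s)   # runs of non-ASCII-alphanumerics -> "_"
    s = s.strip("_")                    # drop leading/trailing underscores
    if s and s[0].isdigit():            # avoid names that start with a digit
        s = "_" + s
    return s

print(downscore_sketch("Is Super-Duper?"))   # is_super_duper
print(downscore_sketch("2013 Happiness"))    # _2013_happiness
```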

namestand.person_basic(string_or_list_of_strings) [very alpha]

Suggested usage: Donor names, etc.; note, though, that this converter does not have any special knowledge of the world, e.g., that "Riccchard" is likely a misspelling of "Richard".

Steps:

Uppercases the string
Strips any leading and trailing whitespace
Flips the "first" and "last" names if a comma is present
Removes any characters that aren't (Unicode) letters, apostrophes ('), hyphens (-), or spaces.

Along the way, it tries to gracefully handle name prefixes (Mr./Mrs./etc.) and suffixes (Jr./Sr./VII/Esq./etc.).

Example:

namestand.person_basic("Antony, Mark") == "MARK ANTONY"
namestand.person_basic([
    u"Diego Velázquez-O'Connor",
    "Antony, Mark"
]) == [
    u"DIEGO VELÁZQUEZ-O'CONNOR",
    "MARK ANTONY"
]
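As a rough illustration of the four listed steps, here is a minimal sketch in plain Python. It is an assumption about how the steps could be written, not the library's code: `person_basic_sketch` is a hypothetical name, it handles one string at a time, and it deliberately leaves out the prefix and suffix handling mentioned above.

```python
def person_basic_sketch(name):
    # Hypothetical sketch of the documented steps; no prefix/suffix handling.
    s = name.upper().strip()
    if "," in s:
        # "LAST, FIRST" -> "FIRST LAST" (split on the first comma only)
        last, first = s.split(",", 1)
        s = first.strip() + " " + last.strip()
    # Keep Unicode letters, apostrophes, hyphens, and spaces; drop the rest.
    return "".join(ch for ch in s if ch.isalpha() or ch in "'- ")

print(person_basic_sketch("Antony, Mark"))   # MARK ANTONY
```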

namestand.company_basic(string_or_list_of_strings) [very alpha]

Tries to remove common cruft from company names.

Steps:

Uppercases the string
Strips any leading and trailing whitespace
Removes any characters that aren't (Unicode) letters, apostrophes ('), hyphens (-), or spaces.
Removes "LLC", "LTD", and "INC"

Example:

namestand.company_basic("American Banana Stand, Inc.") == "AMERICAN BANANA STAND"

Custom Converters

You can easily build your own name-standardizing pipelines using the following tools.

namestand.combine(list_of_transformers)

This function accepts a list of transformers (i.e., functions that accept a string and return a string) and returns a pipeline (i.e., a function that can be used in the same way as the pre-built converters). Converters themselves can be used as parts of pipelines, too. For example, to build a variant of downscore that uses hyphens instead of underscores:

downhyphen = namestand.combine([
    namestand.downscore,
    lambda x: x.replace("_", "-")
])

But namestand already comes with a few helpers for doing things like string replacements. So you could also do:

downhyphen = namestand.combine([
    namestand.downscore,
    namestand.translator("_", "-")
])
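To make the idea of a pipeline concrete, here is a minimal sketch of what a combine-style function could look like. It is not namestand's implementation: `combine_sketch` is a hypothetical name, and the choice to accept either a string or a list simply mirrors how the pre-built converters are described above.

```python
from functools import reduce

def combine_sketch(transformers):
    # Hypothetical stand-in for a pipeline builder: apply each
    # string -> string function in order, left to right.
    def apply_one(value):
        return reduce(lambda acc, fn: fn(acc), transformers, value)

    def pipeline(string_or_list):
        # Accept a single string or a list of strings.
        if isinstance(string_or_list, (list, tuple)):
            return [apply_one(s) for s in string_or_list]
        return apply_one(string_or_list)

    return pipeline

hyphenate = combine_sketch([
    str.strip,
    str.lower,
    lambda x: x.replace(" ", "-"),
])
print(hyphenate("  Case Number "))            # case-number
print(hyphenate(["Case Number", "Is Open"]))  # ['case-number', 'is-open']
```

Because each step is an ordinary function, the order of the list is the order of application, so stripping before replacing spaces matters here.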

Some helpful transformers:

namestand.translator(pattern, replacement): pattern can be a string or a compiled regex. Equivalent to an argument-aware combination of lambda x: x.replace(pattern, replacement) for strings and lambda x: re.sub(pattern, replacement, x) for compiled regexes.
namestand.swapper(pattern, replacement): pattern can be a string or a compiled regex. If a given name matches the pattern (re.match for compiled regexes, x in pattern for string-patterns), the entire name is replaced with the replacement. Otherwise, the given name is retained.
namestand.stripper(chars_to_strip): Equivalent to lambda x: x.strip(chars_to_strip)
namestand.defaulter(test, default_value): test can be either a list of "approved" values, or a function that returns True or False. If x doesn't pass the test (or isn't in the list), it is replaced with default_value.
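The descriptions above suggest that each helper is a small factory that returns a string-to-string function. The sketch below shows that pattern for three of them under that assumption; the `*_sketch` names are hypothetical, the code is not taken from namestand, and swapper is omitted.

```python
import re

def translator_sketch(pattern, replacement):
    # Plain strings use str.replace; compiled regexes use their .sub method.
    if isinstance(pattern, str):
        return lambda x: x.replace(pattern, replacement)
    return lambda x: pattern.sub(replacement, x)

def stripper_sketch(chars_to_strip):
    return lambda x: x.strip(chars_to_strip)

def defaulter_sketch(test, default_value):
    # `test` is either a predicate function or a list of approved values.
    if callable(test):
        return lambda x: x if test(x) else default_value
    approved = list(test)
    return lambda x: x if x in approved else default_value

collapse_spaces = translator_sketch(re.compile(r"\s+"), " ")
yes_no_only = defaulter_sketch(["yes", "no"], "unknown")

print(collapse_spaces("a   b"))   # a b
print(yes_no_only("maybe"))       # unknown
```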

Tests

Additional usage examples can be found in test/. To test, run nosetests or tox from this repo's root directory. Currently tested, and passing, on the following Python versions:

2.7.14
3.5.4
3.6.4
3.7.5
3.8.0

Feedback?

Pull requests, suggestions, etc. welcome.
