surveypandas

Tools for managing (survey) datasets in which each variable has a description and each of its possible values may also have a description (a value label). The data are held in a pandas DataFrame and the labels in a nested dict. Includes routines for reading from and writing to Stata (and perhaps other commercial formats), managing missing-value categories, and more.
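
A minimal sketch of how data and a nested-dict codebook could fit together, and how missing-value categories might be turned into NaN. This is an illustration under stated assumptions, not the package's code: the codebook schema (`description`/`labels` keys), the helper name `codes_to_floats`, and the sample labels are all hypothetical, and columns are plain lists rather than a pandas DataFrame so that it runs with the standard library alone.

```python
import math

# Hypothetical codebook layout (an assumption, not surveypandas' actual schema):
# variable name -> {"description": str, "labels": {code: value label}}
codebook = {
    "A170": {
        "description": "Satisfaction with your life",
        "labels": {-2: "No answer", -1: "Don't know", 1: "Dissatisfied", 10: "Satisfied"},
    },
}

# Data kept as a dict of column lists so the sketch needs only the stdlib.
data = {"A170": [7, -1, 10, -2, 1]}


def codes_to_floats(values, codebook_entry,
                    nan_labels=("Don't know", "No answer", "Not asked", "Refused")):
    """Return a float copy of `values`, replacing missing-value codes with NaN.

    A code counts as missing if it is negative or if its value label appears in
    `nan_labels`. The original integer codes are left untouched.
    """
    labels = codebook_entry.get("labels", {})
    missing = {code for code, label in labels.items() if code < 0 or label in nan_labels}
    return [math.nan if v in missing or v < 0 else float(v) for v in values]


floats = codes_to_floats(data["A170"], codebook["A170"])
print(floats)  # [7.0, nan, 10.0, nan, 1.0]
```

Keeping the integer codes intact and producing a separate float view mirrors the idea of storing a missing-value lookup rather than overwriting the raw survey responses.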

The emphasis is on managing the "codebook" information, for instance renaming variables while keeping the codebook up to date.
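
One way to keep the codebook in sync is to apply the same name mapping to the data and the codebook in a single step. The sketch assumes the same hypothetical dict-based layout as above; `rename_columns` here is an illustrative stand-in, not the package's implementation, and the descriptions and labels are placeholders.

```python
# Hypothetical sketch: rename variables in both the data and the codebook at once.
def rename_columns(data, codebook, mapping):
    """Return new (data, codebook) dicts with keys renamed according to `mapping`.

    Keys not in `mapping` are kept unchanged. Raises ValueError if the rename
    would make two variables share a name.
    """
    new_names = [mapping.get(name, name) for name in data]
    if len(set(new_names)) != len(new_names):
        raise ValueError("rename would create duplicate variable names")
    new_data = {mapping.get(name, name): col for name, col in data.items()}
    new_codebook = {mapping.get(name, name): entry for name, entry in codebook.items()}
    return new_data, new_codebook


# Placeholder data and codebook entries, for illustration only.
data = {"A170": [7, 10], "X025R": [1, 3]}
codebook = {
    "A170": {"description": "Satisfaction with your life", "labels": {}},
    "X025R": {"description": "Education level (recoded)", "labels": {1: "Lower", 3: "Upper"}},
}
data, codebook = rename_columns(data, codebook, {"A170": "SWL", "X025R": "educ3"})
print(list(data), list(codebook))  # ['SWL', 'educ3'] ['SWL', 'educ3']
```

Checking for duplicate names before renaming avoids silently dropping a column when two variables would map to the same new name.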

Example session (to do):

```python
from surveypandas import read_stata, surveyDataFrame, read_pickle

df = read_stata('WV6_Stata_v_2016_01_01.dta.gz')  # Load both the data and the codebook information from a Stata file

df.rename_columns(dict(A170='SWL', X025R='educ3'),
                  inplace=True)                   # Rename columns and the corresponding codebook entries
df.rename_columns_from_descriptions(inplace=True, skip_already_renamed=True)  # Rename the remaining columns to readable names based on their codebook descriptions

df.set_float_values_from_negative_integers()      # Create a missing-value lookup for integer columns based on the codebook
df.set_NaN_strings(["Don't know", "Not asked", "Refused"])  # Do the same, but using value labels

cols = df.grep('satis')                           # Find all columns with this string in their name or documentation (case-insensitive)
df.dgrep('satis')                                 # Report stats and descriptions for those same columns
df[cols].describe()                               # Alternative syntax to the above
df.to_floats()[cols].describe()                   # Show stats on non-missing values for the columns of interest

df.to_pickle('mydata.spandas')                    # Save data in compressed Python format
df2 = read_pickle('mydata.spandas')               # ... and read it back

df['countryName'] = df.as_labels('country')       # Use the codebook to create a string-valued column from numeric codes
```
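
The search and labeling steps in the session can be sketched with the same hypothetical dict-based layout. Here `grep` treats its pattern as a case-insensitive regular expression (an assumption) and `as_labels` falls back to the code as a string when a value has no label; neither is the package's actual implementation, and the variables, descriptions, and labels are placeholders.

```python
import re

# Hypothetical codebook: variable name -> description and value labels.
codebook = {
    "SWL": {"description": "Satisfaction with your life", "labels": {1: "Dissatisfied", 10: "Satisfied"}},
    "C006": {"description": "Satisfaction with financial situation of household", "labels": {}},
    "country": {"description": "Country code", "labels": {36: "Australia", 124: "Canada"}},
}
data = {"SWL": [1, 10], "C006": [4, 6], "country": [124, 36]}


def grep(codebook, pattern):
    """Names of variables whose name or description matches `pattern` (case-insensitive regex)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [name for name, entry in codebook.items()
            if rx.search(name) or rx.search(entry.get("description", ""))]


def as_labels(data, codebook, name):
    """Map a column's numeric codes to their value labels; unlabeled codes become str(code)."""
    labels = codebook[name].get("labels", {})
    return [labels.get(code, str(code)) for code in data[name]]


cols = grep(codebook, "satis")
print(cols)  # ['SWL', 'C006']
data["countryName"] = as_labels(data, codebook, "country")
print(data["countryName"])  # ['Canada', 'Australia']
```

Searching descriptions as well as names is what makes this useful after cryptic survey codes such as `C006` have been kept as-is.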

This repository (cpbl/surveypandas-MOVED-TO-GITLAB) has moved to GitLab.

Folders and files

Latest commit

History

Repository files navigation

surveypandas

About

Resources

License

Stars

Watchers

Forks

Languages