Set-oriented Operations in Pandas

Pandas Sets: Set-oriented Operations in Pandas

If you store standard Python sets in your Series or DataFrame objects, you'll find this useful.

The pandas_sets package adds a .set accessor to every pandas Series object. It works like .dt for datetimes or .str for strings, but for sets.

It exposes all public methods of Python's built-in set type.
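
To see which method names that covers, the public methods of the built-in set can be listed by introspection. This sketch only inspects Python's own set type and assumes nothing about pandas_sets itself; the variable name public_set_methods is made up for illustration.

```python
# List the public (non-underscore) attributes of the built-in set type.
public_set_methods = sorted(
    name for name in dir(set) if not name.startswith("_")
)

for name in public_set_methods:
    print(name)
```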

Installation

pip install pandas-sets

Import the pandas_sets package and it registers a .set accessor on every Series object.

import pandas_sets
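
For readers curious how an import can attach an accessor, here is a minimal sketch built on pandas' public register_series_accessor decorator. It is not the pandas_sets implementation: the accessor name demo_set and the class DemoSetAccessor are hypothetical, chosen so they do not clash with the real .set accessor.

```python
import pandas as pd


# Hypothetical accessor name; the real package registers `.set`.
@pd.api.extensions.register_series_accessor("demo_set")
class DemoSetAccessor:
    def __init__(self, series):
        # pandas passes the Series the accessor was called on.
        self._series = series

    def contains(self, elem):
        # Element-wise membership test; returns a boolean Series.
        return self._series.map(lambda s: elem in s)

    def len(self):
        # Element-wise size of each set.
        return self._series.map(len)


tags = pd.Series([{"python", "pandas"}, {"philosophy"}])
mask = tags.demo_set.contains("pandas")
sizes = tags.demo_set.len()
```

Because the decorator runs when the module is imported, any module that defines such a class makes the accessor available as a side effect of the import.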

Examples

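import pandas_sets  # registers the .set accessor on Series
import pandas as pd

df = pd.DataFrame({
    'post': [1, 2, 3, 4],
    'tags': [{'python', 'pandas'}, {'philosophy', 'strategy'}, {'scikit-learn'}, {'pandas'}],
})

# Select the posts whose tag set contains 'pandas'
pandas_posts = df[df.tags.set.contains('pandas')]

# Add a single tag to each set
pandas_posts.tags.set.add('data')

# Add several tags to each set
pandas_posts.tags.set.update({'data', 'analysis'})

# Number of tags in each set
pandas_posts.tags.set.len()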

Notes

  • The implementation is primitive for now. It is based heavily on pandas' core StringMethods implementation.
  • The public API has been tested for most expected scenarios.
  • The API will need to be extended to handle NA values appropriately.
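
One possible way to tolerate NA values, sketched here with plain pandas rather than with pandas_sets: apply the set operation only to entries that are actually sets and pass everything else through unchanged. The helper map_sets_skip_na is hypothetical and is not part of the package's API.

```python
import pandas as pd


def map_sets_skip_na(series, func):
    """Apply func to each set in series; pass non-set entries through.

    Hypothetical helper for illustration only.
    """
    return series.map(
        lambda x: func(x) if isinstance(x, (set, frozenset)) else x
    )


tags = pd.Series([{"pandas"}, None, {"python", "pandas"}])

# The missing entry stays missing instead of raising a TypeError.
lengths = map_sets_skip_na(tags, len)
```

Checking for set instances rather than calling pd.isna avoids surprises with container values, at the cost of silently skipping any other non-set entry.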