Data Processing

pandas
pandas is a package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
Project Source: https://github.com/pydata/pandas
Project Homepage: http://pandas.pydata.org/
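A minimal sketch of what working with "labeled" data looks like in pandas. The sample table and its column names (`city`, `amount`) are made up for illustration and are not part of the project description above.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
orders = pd.DataFrame(
    {
        "city": ["Oslo", "Lima", "Oslo", "Lima"],
        "amount": [10.0, 5.0, 20.0, 7.5],
    }
)

# Group rows by the "city" label and sum the "amount" column per group.
totals = orders.groupby("city")["amount"].sum()
print(totals)
```

The result is a Series indexed by city name, so individual totals can be looked up by label, for example `totals["Oslo"]`.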
Faker
Faker is a package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence layer to stress-test it, or anonymize data taken from a production service, Faker is for you.
Project Source: https://github.com/joke2k/faker
Project Documentation: http://fake-factory.readthedocs.org/en/latest/
tablib
Tablib is a format-agnostic tabular dataset library, written in Python.
Project Source: https://github.com/kennethreitz/tablib
Project Documentation: http://docs.python-tablib.org/en/latest/
data_hacks
Command-line utilities for data analysis.
Project Source: https://github.com/bitly/data_hacks
fuzzywuzzy
Fuzzy string matching like a boss.
Project Source: https://github.com/seatgeek/fuzzywuzzy
snownlp
Python library for processing Chinese text.
Project Source: https://github.com/isnowfy/snownlp
jieba
Chinese text segmentation.
Project Source: https://github.com/fxsjy/jieba
Online Demo Address: http://jiebademo.ap01.aws.af.cm/
cubes
Lightweight Python OLAP framework for multi-dimensional data analysis.
Project Source: https://github.com/Stiivi/cubes
Project Homepage: http://cubes.databrewery.org/
