Data Processing

  1. pandas
    pandas is a package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python.
    Project Source: https://github.com/pydata/pandas
    Project Homepage: http://pandas.pydata.org/

  2. Faker
    Faker is a package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill in your persistence layer to stress-test it, or anonymize data taken from a production service, Faker is for you.
    Project Source: https://github.com/joke2k/faker
    Project Documentation: http://fake-factory.readthedocs.org/en/latest/

  3. tablib
    Tablib is a format-agnostic tabular dataset library, written in Python.
    Project Source: https://github.com/kennethreitz/tablib
    Project Documentation: http://docs.python-tablib.org/en/latest/

  4. data_hacks
    Command-line utilities for data analysis.
    Project Source: https://github.com/bitly/data_hacks

  5. fuzzywuzzy
    Fuzzy string matching like a boss. It uses Levenshtein distance to calculate the differences between sequences.
    Project Source: https://github.com/seatgeek/fuzzywuzzy

  6. snownlp
    Python library for processing Chinese text.
    Project Source: https://github.com/isnowfy/snownlp

  7. jieba
    Chinese text segmentation.
    Project Source: https://github.com/fxsjy/jieba
    Online Demo Address: http://jiebademo.ap01.aws.af.cm/

  8. cubes
    Lightweight Python OLAP framework for multi-dimensional data analysis.
    Project Source: https://github.com/Stiivi/cubes
    Project Homepage: http://cubes.databrewery.org/
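As a rough illustration of what working with "labeled" data means in pandas, the sketch below builds a small DataFrame, selects rows by a column label, and aggregates values per group. The column names and revenue figures are made-up sample values, not data from any project listed here.

```python
import pandas as pd

# Hypothetical sample data: daily revenue per store (illustrative values only).
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "B"],
        "day": ["Mon", "Tue", "Mon", "Tue", "Wed"],
        "revenue": [120.0, 95.5, 80.0, 110.0, 70.0],
    }
)

# Columns are addressed by label, not by position: keep rows with revenue above 90.
high = sales[sales["revenue"] > 90]

# Group rows by the "store" label and compute total and average revenue per store.
summary = sales.groupby("store")["revenue"].agg(["sum", "mean"])
print(summary)
```

The result is itself a DataFrame indexed by store label, so a single value can be read back with `summary.loc["A", "sum"]`.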
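To give a feel for the kind of score fuzzywuzzy produces, the sketch below approximates its basic `fuzz.ratio` using only the standard library. As far as we know, when the optional python-Levenshtein speedup is not installed, fuzzywuzzy falls back to `difflib.SequenceMatcher` and reports 100 times its ratio, rounded to an integer. This is a stand-in, not fuzzywuzzy itself: `ratio` and `best_match` are hypothetical helper names, not part of fuzzywuzzy's API.

```python
from difflib import SequenceMatcher
from typing import Iterable


def ratio(a: str, b: str) -> int:
    """Return a 0-100 similarity score for two strings.

    Hypothetical helper approximating fuzzywuzzy's fuzz.ratio via difflib.
    """
    # SequenceMatcher.ratio() is 2 * matching_chars / (len(a) + len(b)), in [0, 1].
    return round(100 * SequenceMatcher(None, a, b).ratio())


def best_match(query: str, choices: Iterable[str]) -> str:
    """Return the choice most similar to query, ignoring case (hypothetical helper)."""
    return max(choices, key=lambda choice: ratio(query.lower(), choice.lower()))


print(ratio("this is a test", "this is a test!"))  # 97
print(best_match("appel", ["apple", "banana", "grape"]))  # apple
```

The first call scores 97 because 14 of the 29 total characters pair up (2 * 14 / 29 ≈ 0.966). fuzzywuzzy also offers other scorers beyond this basic ratio, which this sketch does not attempt to reproduce.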