Data exploration library with a pandas-like API

dexplo

A data analysis library comparable to pandas

Installation

Cython must be installed first. Then build the extensions in place by running python setup.py build_ext --use-cython -i

Main Goals

  • A minimal set of features
  • Be as explicit as possible
  • There should be one-- and preferably only one --obvious way to do it.

Data Structures

  • Only DataFrames
  • No Series

Only Scalar Data Types

All data types allow nulls

  • bool - always 8 bits
  • int
  • float
  • str - stored as a categorical
  • datetime
  • timedelta
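
To illustrate what "stored as a categorical" can mean for a str column that also allows nulls, here is a minimal pure-Python sketch. It is not dexplo's implementation: the function names and the -1 sentinel for missing values are assumptions made for this example only.

```python
from typing import List, Optional, Tuple

MISSING = -1  # sentinel code for a null; an assumption of this sketch


def encode_categorical(values: List[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Replace each string with an integer code into a sorted list of categories."""
    categories = sorted({v for v in values if v is not None})
    position = {category: i for i, category in enumerate(categories)}
    codes = [MISSING if v is None else position[v] for v in values]
    return codes, categories


def decode_categorical(codes: List[int], categories: List[str]) -> List[Optional[str]]:
    """Rebuild the original strings (and nulls) from codes and categories."""
    return [None if code == MISSING else categories[code] for code in codes]


codes, categories = encode_categorical(["b", "a", None, "b"])
print(codes)       # [1, 0, -1, 1]
print(categories)  # ['a', 'b']
```

Each distinct string is kept once, so operations that only depend on the distinct values can work on the short categories list instead of every row.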

Column Labels

  • No hierarchical index
  • Column names must be strings
  • Column names must be unique
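
The two column-name rules above can be checked with a few lines of Python. The helper below is a hypothetical sketch written for this README, not a function from dexplo; its name and error messages are assumptions.

```python
from typing import List


def validate_columns(columns: List[str]) -> None:
    """Raise if any column name is not a string or appears more than once."""
    seen = set()
    for name in columns:
        if not isinstance(name, str):
            raise TypeError(f"column name {name!r} is not a string")
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)


validate_columns(["a", "b"])  # passes silently
```

Calling validate_columns(["a", "a"]) raises ValueError, and validate_columns(["a", 1]) raises TypeError.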

Row Labels

  • No row labels for now
  • Row numbers appear only in the displayed output

Subset Selection

  • Only one way to select data - [ ]
  • Subset selection is explicit and requires both rows and columns
  • Rows will be selected only by integer location
  • Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
  • Slice notation is also OK
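
The selection rules above can be sketched with a toy class. MiniFrame is hypothetical and exists only for this example; it is not dexplo's DataFrame, and it handles just one column per selection to stay short.

```python
from typing import Dict, List, Union

ColumnKey = Union[int, str]


class MiniFrame:
    """Toy column store used only to illustrate the selection rules."""

    def __init__(self, data: Dict[str, list]) -> None:
        self._data = data
        self._columns: List[str] = list(data)

    def _column_name(self, key: ColumnKey) -> str:
        # A str is always a label and an int is always a position,
        # so the two kinds of column key can never be confused.
        if isinstance(key, str):
            if key not in self._data:
                raise KeyError(key)
            return key
        return self._columns[key]

    def __getitem__(self, key):
        # Both a row selector and a column selector are required.
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("select with both rows and columns, e.g. frame[0, 'a']")
        rows, column = key
        if not isinstance(rows, (int, slice)):
            raise TypeError("rows are selected by integer location or slice only")
        return self._data[self._column_name(column)][rows]


frame = MiniFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(frame[0, "a"])  # 1
print(frame[1:, 1])   # ['y', 'z']
```

A selection with only one part, such as frame["a"], raises TypeError in this sketch, which mirrors the requirement that both rows and columns are always given.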

Development

  • Must use type hints
  • Must use Python 3.6+ (for f-strings)
  • numpy

Advantages over pandas

  • Easier to write idiomatically
  • String processing will be much faster
  • Nulls allowed in each data type
  • Nearly all operations will be faster

API

Attributes

  • size
  • shape
  • values
  • dtypes

Methods

Stats

  • abs
  • all
  • any
  • argmax
  • argmin
  • clip
  • corr
  • count
  • cov
  • cummax
  • cummin
  • cumprod
  • cumsum
  • describe
  • max
  • min
  • median
  • mean
  • mode
  • nlargest
  • nsmallest
  • prod
  • quantile
  • rank
  • round
  • std
  • streak
  • sum
  • var
  • unique
  • nunique
  • value_counts

Selection

  • drop
  • head
  • isin
  • rename
  • sample
  • select_dtypes
  • tail
  • where

Missing Data

  • isna
  • dropna
  • fillna
  • interpolate

Other

  • append
  • astype
  • factorize
  • groupby
  • iterrows
  • join
  • melt
  • pivot
  • replace
  • rolling
  • sort_values
  • to_csv

Other (after 0.1 release)

  • cut
  • plot
  • profile

Functions

  • read_csv
  • read_sql
  • concat

Group By - methods available after calling the groupby method

  • agg
  • all
  • apply
  • any
  • corr
  • count
  • cov
  • cumcount
  • cummax
  • cummin
  • cumsum
  • cumprod
  • head
  • first
  • fillna
  • filter
  • last
  • max
  • median
  • min
  • ngroups
  • nunique
  • prod
  • quantile
  • rank
  • rolling
  • size
  • sum
  • tail
  • var

str - df.str.<method>

  • capitalize
  • cat
  • center
  • contains
  • count
  • endswith
  • find
  • findall
  • get
  • get_dummies
  • isalnum
  • isalpha
  • isdecimal
  • isdigit
  • islower
  • isnumeric
  • isspace
  • istitle
  • isupper
  • join
  • len
  • ljust
  • lower
  • lstrip
  • partition
  • repeat
  • replace
  • rfind
  • rjust
  • rpartition
  • rsplit
  • rstrip
  • slice
  • slice_replace
  • split
  • startswith
  • strip
  • swapcase
  • title
  • translate
  • upper
  • wrap
  • zfill

dt - df.dt.<method>

  • ceil
  • day
  • day_of_week
  • day_of_year
  • days_in_month
  • floor
  • freq
  • hour
  • is_leap_year
  • is_month_end
  • is_month_start
  • is_quarter_end
  • is_quarter_start
  • is_year_end
  • is_year_start
  • microsecond
  • millisecond
  • minute
  • month
  • nanosecond
  • quarter
  • round
  • second
  • strftime
  • to_pydatetime
  • to_pytime
  • tz
  • tz_convert
  • tz_localize
  • weekday_name
  • week_of_year
  • year

td - df.td.<method>

  • ceil
  • components
  • days
  • floor
  • freq
  • microseconds
  • milliseconds
  • nanoseconds
  • round
  • seconds
  • to_pytimedelta