dexplo


A data analysis library comparable to pandas

Installation

Cython must be installed. Build the extensions in place with: python setup.py build_ext --use-cython -i

Main Goals

  • A minimal set of features
  • Be as explicit as possible
  • There should be one-- and preferably only one --obvious way to do it.

Data Structures

  • Only DataFrames
  • No Series

Only Scalar Data Types

All data types allow nulls

  • bool - always 8 bits
  • int
  • float
  • str - stored as a categorical
  • datetime
  • timedelta
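A minimal sketch of how a nullable str column could be stored as a categorical, assuming integer codes that index a list of unique strings, with -1 marking a null. The names encode_categorical, decode_categorical and NULL_CODE are hypothetical and do not describe dexplo's internals.

```python
from typing import Dict, List, Optional, Tuple

NULL_CODE = -1  # hypothetical sentinel marking a missing string


def encode_categorical(values: List[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Return (codes, categories); each distinct string is stored only once."""
    categories: List[str] = []
    lookup: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value is None:
            codes.append(NULL_CODE)
            continue
        if value not in lookup:
            lookup[value] = len(categories)
            categories.append(value)
        codes.append(lookup[value])
    return codes, categories


def decode_categorical(codes: List[int], categories: List[str]) -> List[Optional[str]]:
    """Rebuild the original strings, turning the sentinel back into None."""
    return [None if code == NULL_CODE else categories[code] for code in codes]


codes, categories = encode_categorical(["a", "b", None, "a"])
print(codes, categories)  # [0, 1, -1, 0] ['a', 'b']
```

Repeated strings cost one integer per row, and the strings themselves are kept once in categories.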

Column Labels

  • No hierarchical index
  • Column names must be strings
  • Column names must be unique
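The three column-label rules can be checked in one pass. This is a sketch; validate_column_names is a hypothetical helper, not part of dexplo. A tuple label, which would imply a hierarchical index, is rejected simply because it is not a string.

```python
from typing import Iterable, List


def validate_column_names(names: Iterable[object]) -> List[str]:
    """Enforce the column-label rules: flat, string-only, unique."""
    checked: List[str] = []
    seen = set()
    for name in names:
        if not isinstance(name, str):
            # Rejects ints, and tuples that would imply a hierarchical index.
            raise TypeError(f"column names must be strings, got {name!r}")
        if name in seen:
            raise ValueError(f"duplicate column name: {name!r}")
        seen.add(name)
        checked.append(name)
    return checked
```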

Row Labels

  • No row labels for now
  • Only the row number is displayed in the output

Subset Selection

  • Only one way to select data - [ ]
  • Subset selection will be explicit and require both rows and columns
  • Rows will be selected only by integer location
  • Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
  • Slice notation is also OK
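A minimal sketch of the selection rules above in plain Python. MiniFrame and its to_dict helper are hypothetical, not dexplo's API; the sketch only shows how a single [ ] can require both axes, accept only integer locations for rows, and accept either labels or integer locations for columns.

```python
from typing import Dict, List, Union

Rows = Union[int, slice, List[int]]
Cols = Union[str, int, slice, List[Union[str, int]]]


class MiniFrame:
    """Hypothetical column store illustrating df[rows, cols] selection."""

    def __init__(self, data: Dict[str, list]) -> None:
        self._names: List[str] = list(data)
        self._data: Dict[str, list] = {k: list(v) for k, v in data.items()}
        self._nrows = len(next(iter(self._data.values()), []))

    def to_dict(self) -> Dict[str, list]:
        return {k: list(v) for k, v in self._data.items()}

    def _row_positions(self, rows: Rows) -> List[int]:
        positions = list(range(self._nrows))
        if isinstance(rows, int):
            return [positions[rows]]  # negative locations count from the end
        if isinstance(rows, slice):
            return positions[rows]
        if isinstance(rows, list) and all(isinstance(r, int) for r in rows):
            return [positions[r] for r in rows]
        raise TypeError("rows are selected only by integer location")

    def _label(self, col: Union[str, int]) -> str:
        # Strings are labels, ints are locations: never ambiguous.
        if isinstance(col, str):
            if col not in self._data:
                raise KeyError(col)
            return col
        if isinstance(col, int):
            return self._names[col]
        raise TypeError("columns are selected by label or integer location")

    def _column_names(self, cols: Cols) -> List[str]:
        if isinstance(cols, slice):
            return self._names[cols]  # integer slices only in this sketch
        if isinstance(cols, list):
            return [self._label(c) for c in cols]
        return [self._label(cols)]

    def __getitem__(self, key: tuple) -> "MiniFrame":
        # df["a"] or df[0] is rejected: both rows and columns must be given.
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("select with df[rows, cols]")
        rows, cols = key
        positions = self._row_positions(rows)
        names = self._column_names(cols)
        return MiniFrame({n: [self._data[n][i] for i in positions] for n in names})


df = MiniFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
print(df[0:2, "b"].to_dict())  # {'b': ['x', 'y']}
```

Because column names are always strings, df[0, 1] can only mean the second column, never a column labelled 1.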

Development

  • Must use type hints
  • Must use Python 3.6+ (for f-strings)
  • Uses numpy

Advantages over pandas

  • Easier to write idiomatically
  • String processing will be much faster
  • Nulls allowed in each data type
  • Nearly all operations will be faster

API

Attributes

  • size
  • shape
  • values
  • dtypes
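A sketch of what the four attributes could return, assuming they follow the pandas meanings: shape is (rows, columns), size is rows times columns, values is the data row by row, and dtypes maps each column name to its type. The Frame class and its dtype strings are hypothetical.

```python
from typing import Dict, List, Tuple


class Frame:
    """Hypothetical column store used only to illustrate the attributes."""

    def __init__(self, data: Dict[str, list], dtypes: Dict[str, str]) -> None:
        self._data = data
        self._dtypes = dtypes

    @property
    def shape(self) -> Tuple[int, int]:
        # (number of rows, number of columns)
        n_rows = len(next(iter(self._data.values()), []))
        return n_rows, len(self._data)

    @property
    def size(self) -> int:
        # Total number of values: rows times columns.
        n_rows, n_cols = self.shape
        return n_rows * n_cols

    @property
    def values(self) -> List[list]:
        # Row-major list of lists built from the columns.
        return [list(row) for row in zip(*self._data.values())]

    @property
    def dtypes(self) -> Dict[str, str]:
        return dict(self._dtypes)
```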

Methods

Stats

  • abs
  • all
  • any
  • argmax
  • argmin
  • clip
  • corr
  • count
  • cov
  • cummax
  • cummin
  • cumprod
  • cumsum
  • describe
  • max
  • min
  • median
  • mean
  • mode
  • nlargest
  • nsmallest
  • prod
  • quantile
  • rank
  • round
  • std
  • streak
  • sum
  • var
  • unique
  • nunique
  • value_counts
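Since every data type allows nulls, each statistic has to decide what to do with them. A sketch assuming nulls are skipped, as pandas does by default; the null_* function names are hypothetical, and None stands in for a null.

```python
from typing import List, Optional


def null_count(values: List[Optional[float]]) -> int:
    """Number of non-null values."""
    return sum(1 for v in values if v is not None)


def null_sum(values: List[Optional[float]]) -> float:
    """Sum that skips nulls; an all-null column sums to 0."""
    return sum(v for v in values if v is not None)


def null_mean(values: List[Optional[float]]) -> Optional[float]:
    """Mean of the non-null values; None if every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def null_cumsum(values: List[Optional[float]]) -> List[Optional[float]]:
    """Running total that keeps nulls in place and leaves them out of the total."""
    total = 0.0
    out: List[Optional[float]] = []
    for v in values:
        if v is None:
            out.append(None)
        else:
            total += v
            out.append(total)
    return out
```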

Selection

  • drop
  • head
  • isin
  • rename
  • sample
  • select_dtypes
  • tail
  • where

Missing Data

  • isna
  • dropna
  • fillna
  • interpolate
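A sketch of isna, dropna and fillna over a plain dict of columns, with None standing in for a null. The function signatures, including the per-column fill dict, are assumptions for illustration rather than dexplo's API.

```python
from typing import Any, Dict, List

Columns = Dict[str, List[Any]]


def isna(data: Columns) -> Dict[str, List[bool]]:
    """True wherever a value is null."""
    return {name: [v is None for v in col] for name, col in data.items()}


def dropna(data: Columns) -> Columns:
    """Keep only the rows in which no column is null."""
    n_rows = len(next(iter(data.values()), []))
    keep = [i for i in range(n_rows) if all(col[i] is not None for col in data.values())]
    return {name: [col[i] for i in keep] for name, col in data.items()}


def fillna(data: Columns, fill: Dict[str, Any]) -> Columns:
    """Replace nulls column by column; columns absent from fill are untouched."""
    return {
        name: [fill[name] if v is None and name in fill else v for v in col]
        for name, col in data.items()
    }
```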

Other

  • append
  • astype
  • factorize
  • groupby
  • iterrows
  • join
  • melt
  • pivot
  • replace
  • rolling
  • sort_values
  • to_csv
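As an example from this list, melt turns wide columns into long rows. A sketch assuming pandas-style semantics (id_vars repeated, output columns named "variable" and "value", one block of rows per melted column); the function and its signature are hypothetical.

```python
from typing import Any, Dict, List, Sequence

Columns = Dict[str, List[Any]]


def melt(data: Columns, id_vars: Sequence[str], value_vars: Sequence[str]) -> Columns:
    """Unpivot value_vars into 'variable'/'value' columns, repeating id_vars."""
    n_rows = len(next(iter(data.values()), []))
    out: Columns = {name: [] for name in id_vars}
    out["variable"] = []
    out["value"] = []
    for var in value_vars:  # one block of rows per melted column
        for i in range(n_rows):
            for name in id_vars:
                out[name].append(data[name][i])
            out["variable"].append(var)
            out["value"].append(data[var][i])
    return out
```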

Other (after 0.1 release)

  • cut
  • plot
  • profile

Functions

  • read_csv
  • read_sql
  • concat

Group By - specifically with groupby method

  • agg
  • all
  • apply
  • any
  • corr
  • count
  • cov
  • cumcount
  • cummax
  • cummin
  • cumsum
  • cumprod
  • head
  • first
  • fillna
  • filter
  • last
  • max
  • median
  • min
  • ngroups
  • nunique
  • prod
  • quantile
  • rank
  • rolling
  • size
  • sum
  • tail
  • var
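The core of groupby is assigning each row to a group and aggregating per group. A sketch of a null-skipping grouped sum; groupby_sum is hypothetical, groups are returned in order of first appearance (a choice made here; pandas sorts group keys by default), and a null key simply forms its own group.

```python
from typing import Any, Dict, List, Optional, Tuple


def groupby_sum(keys: List[Any], values: List[Optional[float]]) -> Tuple[List[Any], List[float]]:
    """Sum values per distinct key, skipping null values."""
    position: Dict[Any, int] = {}  # key -> index of its group
    group_keys: List[Any] = []
    sums: List[float] = []
    for key, value in zip(keys, values):
        if key not in position:
            position[key] = len(group_keys)
            group_keys.append(key)
            sums.append(0)
        if value is not None:
            sums[position[key]] += value
    return group_keys, sums


group_keys, sums = groupby_sum(["a", "b", "a", "c"], [1, 2, 3, None])
ngroups = len(group_keys)
```

The same single pass generalizes to the other aggregations in the list by swapping the update step.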

str - df.str.<method>

  • capitalize
  • cat
  • center
  • contains
  • count
  • endswith
  • find
  • findall
  • get
  • get_dummies
  • isalnum
  • isalpha
  • isdecimal
  • isdigit
  • islower
  • isnumeric
  • isspace
  • istitle
  • isupper
  • join
  • len
  • ljust
  • lower
  • lstrip
  • partition
  • repeat
  • replace
  • rfind
  • rjust
  • rpartition
  • rsplit
  • rstrip
  • slice
  • slice_replace
  • split
  • startswith
  • strip
  • swapcase
  • title
  • translate
  • upper
  • wrap
  • zfill
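If str columns are stored as categoricals, a df.str.&lt;method&gt; call can, in principle, run the string method once per distinct value instead of once per row. This sketch assumes that design; apply_str_method and the -1 null code are hypothetical.

```python
from typing import Callable, List, Optional

NULL_CODE = -1  # hypothetical null sentinel in a categorical column


def apply_str_method(
    codes: List[int],
    categories: List[str],
    method: Callable[[str], object],
) -> List[Optional[object]]:
    """Run method once per distinct category, then broadcast the results by code."""
    per_category = [method(cat) for cat in categories]
    return [None if code == NULL_CODE else per_category[code] for code in codes]


calls: List[str] = []


def upper(s: str) -> str:
    calls.append(s)  # record each call to show how many times it runs
    return s.upper()


result = apply_str_method([0, 1, 0, -1, 0], ["ab", "cd"], upper)
```

Here upper runs twice for five rows; with many repeated strings the saving grows with the number of rows.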

dt - df.dt.<method>

  • ceil
  • day
  • day_of_week
  • day_of_year
  • days_in_month
  • floor
  • freq
  • hour
  • is_leap_year
  • is_month_end
  • is_month_start
  • is_quarter_end
  • is_quarter_start
  • is_year_end
  • is_year_start
  • microsecond
  • millisecond
  • minute
  • month
  • nanosecond
  • quarter
  • round
  • second
  • strftime
  • to_pydatetime
  • to_pytime
  • tz
  • tz_convert
  • tz_localize
  • weekday_name
  • week_of_year
  • year
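A sketch of a few dt accessors applied element-wise with null passthrough, using Python's datetime and calendar modules as a stand-in for however dexplo actually stores datetimes. The dt_* function names are hypothetical.

```python
import calendar
from datetime import datetime
from typing import List, Optional

Dates = List[Optional[datetime]]


def dt_is_leap_year(values: Dates) -> List[Optional[bool]]:
    return [None if v is None else calendar.isleap(v.year) for v in values]


def dt_day_of_year(values: Dates) -> List[Optional[int]]:
    # tm_yday is 1 for January 1
    return [None if v is None else v.timetuple().tm_yday for v in values]


def dt_quarter(values: Dates) -> List[Optional[int]]:
    return [None if v is None else (v.month - 1) // 3 + 1 for v in values]


def dt_is_month_end(values: Dates) -> List[Optional[bool]]:
    # monthrange returns (weekday of day 1, number of days in the month)
    return [
        None if v is None else v.day == calendar.monthrange(v.year, v.month)[1]
        for v in values
    ]
```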

td - df.td.<method>

  • ceil
  • components
  • days
  • floor
  • freq
  • microseconds
  • milliseconds
  • nanoseconds
  • round
  • seconds
  • to_pytimedelta
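A sketch of what components could return for one value, using Python's timedelta as a stand-in. The td_components name and the dict layout are assumptions; nanoseconds are omitted because timedelta has microsecond resolution.

```python
from datetime import timedelta
from typing import Dict, Optional


def td_components(value: Optional[timedelta]) -> Optional[Dict[str, int]]:
    """Split a timedelta into days, hours, minutes, seconds, ms and us."""
    if value is None:
        return None
    # timedelta normalizes to days, seconds (0-86399) and microseconds.
    hours, rem = divmod(value.seconds, 3600)
    minutes, seconds = divmod(rem, 60)
    milliseconds, microseconds = divmod(value.microseconds, 1000)
    return {
        "days": value.days,
        "hours": hours,
        "minutes": minutes,
        "seconds": seconds,
        "milliseconds": milliseconds,
        "microseconds": microseconds,
    }
```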

About

Data exploration library with a pandas-like API
