A data analysis library comparable to pandas
You must have Cython installed. To build, run: python setup.py build_ext --use-cython -i
- A minimal set of features
- Be as explicit as possible
- There should be one-- and preferably only one --obvious way to do it.
- Only DataFrames
- No Series
All data types allow nulls
- bool - always 8 bits
- int
- float
- str - stored as a categorical
- datetime
- timedelta
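To illustrate what "stored as a categorical" and "all data types allow nulls" can mean for strings, here is a minimal pure-Python sketch. It is not this library's implementation: the function names `encode_categorical` and `decode_categorical`, and the use of -1 as the null code, are assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

NULL_CODE = -1  # assumed sentinel for a missing string in this sketch


def encode_categorical(values: List[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Encode strings as integer codes plus a table of unique categories."""
    categories: List[str] = []
    positions: Dict[str, int] = {}  # category -> integer code
    codes: List[int] = []
    for value in values:
        if value is None:
            # A null keeps its position but points at no category
            codes.append(NULL_CODE)
            continue
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        codes.append(positions[value])
    return codes, categories


def decode_categorical(codes: List[int], categories: List[str]) -> List[Optional[str]]:
    """Rebuild the original values from codes and categories."""
    return [None if code == NULL_CODE else categories[code] for code in codes]


codes, categories = encode_categorical(["red", "blue", None, "red"])
print(codes)       # [0, 1, -1, 0]
print(categories)  # ['red', 'blue']
```

Each distinct string is stored once and every row holds only a small integer, which is the usual reason a categorical layout makes repeated-string columns compact and fast to compare.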
- No hierarchical index
- Column names must be strings
- Column names must be unique
- No row labels for now
- Only row numbers are displayed in the output
- Only one way to select data - [ ]
- Subset selection will be explicit and require both rows and columns
- Rows will be selected only by integer location
- Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
- Slice notation is also OK
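The selection rules above can be sketched with a small stand-in class. `TinyFrame` is hypothetical and exists only for this example; it is not the library's DataFrame, and its error messages and its restriction of column slices to integer positions are assumptions.

```python
from typing import Dict, List, Tuple, Union

RowSel = Union[int, slice]
ColSel = Union[int, str, slice]


class TinyFrame:
    """Hypothetical stand-in used only to illustrate the selection rules."""

    def __init__(self, data: Dict[str, list]) -> None:
        for name in data:
            if not isinstance(name, str):
                raise TypeError(f"column name {name!r} is not a string")
        # dict keys are unique, which mirrors the unique-column-name rule
        self.columns: List[str] = list(data)
        self.data = data

    def __getitem__(self, key: Tuple[RowSel, ColSel]) -> "TinyFrame":
        # Both a row selection and a column selection are required
        if not isinstance(key, tuple) or len(key) != 2:
            raise TypeError("select with df[rows, columns]")
        rows, cols = key
        if isinstance(rows, bool) or not isinstance(rows, (int, slice)):
            raise TypeError("rows are selected by integer location only")
        # A string is always a label and an int is always a position,
        # so the two can never be confused
        if isinstance(cols, str):
            names = [cols]
        elif isinstance(cols, int):
            names = [self.columns[cols]]
        elif isinstance(cols, slice):
            names = self.columns[cols]
        else:
            raise TypeError("columns are selected by label or integer location")
        if isinstance(rows, int):
            # Turn a single position into a one-row slice (-1 means the last row)
            stop = rows + 1 if rows != -1 else None
            rows = slice(rows, stop)
        # The result is always another frame, never a one-dimensional object
        return TinyFrame({name: self.data[name][rows] for name in names})


df = TinyFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})
print(df[0:2, "b"].data)  # {'b': [4.0, 5.0]}
print(df[1, 0].data)      # {'a': [2]}
```

In this sketch `df["a"]` raises a TypeError, because a column-only selection does not state the rows explicitly.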
- Must use type hints
- Must use Python 3.6+ (f-strings)
- numpy
- Easier to write idiomatically
- String processing will be much faster
- Nulls allowed in each data type
- Nearly all operations will be faster
- size
- shape
- values
- dtypes
Stats
- abs
- all
- any
- argmax
- argmin
- clip
- corr
- count
- cov
- cummax
- cummin
- cumprod
- cumsum
- describe
- max
- min
- median
- mean
- mode
- nlargest
- nsmallest
- prod
- quantile
- rank
- round
- std
- streak
- sum
- var
- unique
- nunique
- value_counts
Selection
- drop
- head
- isin
- rename
- sample
- select_dtypes
- tail
- where
Missing Data
- isna
- dropna
- fillna
- interpolate
Other
- append
- astype
- factorize
- groupby
- iterrows
- join
- melt
- pivot
- replace
- rolling
- sort_values
- to_csv
Other (after 0.1 release)
- cut
- plot
- profile
Functions
- read_csv
- read_sql
- concat
Group By - specifically with the groupby method
- agg
- all
- apply
- any
- corr
- count
- cov
- cumcount
- cummax
- cummin
- cumsum
- cumprod
- head
- first
- fillna
- filter
- last
- max
- median
- min
- ngroups
- nunique
- prod
- quantile
- rank
- rolling
- size
- sum
- tail
- var
str - df.str.<method>
- capitalize
- cat
- center
- contains
- count
- endswith
- find
- findall
- get
- get_dummies
- isalnum
- isalpha
- isdecimal
- isdigit
- islower
- isnumeric
- isspace
- istitle
- isupper
- join
- len
- ljust
- lower
- lstrip
- partition
- repeat
- replace
- rfind
- rjust
- rpartition
- rsplit
- rstrip
- slice
- slice_replace
- split
- startswith
- strip
- swapcase
- title
- translate
- upper
- wrap
- zfill
dt - df.dt.<method>
- ceil
- day
- day_of_week
- day_of_year
- days_in_month
- floor
- freq
- hour
- is_leap_year
- is_month_end
- is_month_start
- is_quarter_end
- is_quarter_start
- is_year_end
- is_year_start
- microsecond
- millisecond
- minute
- month
- nanosecond
- quarter
- round
- second
- strftime
- to_pydatetime
- to_pytime
- tz
- tz_convert
- tz_localize
- weekday_name
- week_of_year
- year
td - df.td.<method>
- ceil
- components
- days
- floor
- freq
- microseconds
- milliseconds
- nanoseconds
- round
- seconds
- to_pytimedelta