Stable interface v1 #357
Set and get attributes
In v1, for compatibility with the superclass DataFrame, we need to review these methods.
1) Get attr with dot operator
This is supported by DataFrame; no change required.
2) Get row selection with a boolean series indexer
This is supported by DataFrame; no change required.
3) Get row selection via regional retrieval
Unfortunately this must be dropped, as it conflicts with the DataFrame syntax df[colname] to get a column.
We can implement a dedicated indexer for this early, in v.0.0.*, issue a FutureWarning, then maintain regional retrieval only through this indexer in v1. I used .chr as the name, but other proposals are welcome. Perhaps .gloc? .genloc? .reg? .prloc?
4) Get attr by indexing with list of column names
We implemented the first method above.
5) Set attr
Old pyranges supported these two syntaxes for setting new columns.
In pandas, a column is created with the df['newcol']=value syntax.
My proposal is to add support for the p['newcol']=value syntax early, in v.0.0.*, and issue a FutureWarning for any use of p.newcol; from v1, stick to DataFrame behavior.
6) Other syntaxes
While old pyranges did not support any other syntaxes for getattr and setattr (if I haven't forgotten anything!), inheritance from DataFrame in v1 will bring plenty, including support for the .index, a big absentee of old pyranges. These may be combined with .chr to set attributes only in certain regions:
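A rough sketch of what region-restricted assignment can look like with plain pandas, assuming a frame with Chromosome, Start, End columns. The boolean mask stands in for whatever the regional indexer ends up being (.chr itself is not implemented here), and the column name `label` and the coordinates are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1", "chr2"],
        "Start": [100, 5000, 100],
        "End": [200, 6000, 300],
    }
)

# New column created with the DataFrame syntax.
df["label"] = "outside"

# Hypothetical region of interest: chr1, positions 0-1000.
# An interval overlaps the region when it starts before the region ends
# and ends after the region starts.
in_region = (df["Chromosome"] == "chr1") & (df["Start"] < 1000) & (df["End"] > 0)

# Set the attribute only for rows inside the region.
df.loc[in_region, "label"] = "inside"
```

Only the first row falls in the region, so only that row is relabeled; a regional indexer would presumably replace the hand-written mask.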
Roadmap
Here's a Google Doc with the updated roadmap for what to do with each method. It will reflect any accepted suggestions we receive: https://docs.google.com/document/d/1zjv-uUBaiccSHQa-lef9H7_mtT7IuWtdKzRSvR00ZPQ/edit?usp=sharing
Possibly of interest: https://peps.python.org/pep-0702/
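For context, PEP 702 adds a `deprecated` decorator (`warnings.deprecated` in Python 3.13, also available as `typing_extensions.deprecated`) that marks a callable for type checkers and emits a warning at runtime. A minimal sketch of how it might be applied to a method we plan to drop; the class `OldRanges` is a made-up stand-in, and the fallback decorator is only there so the snippet runs where neither import is available.

```python
import functools
import warnings

try:
    from warnings import deprecated  # Python 3.13+
except ImportError:
    try:
        from typing_extensions import deprecated
    except ImportError:
        # Simplified stand-in with the same call shape; it does not
        # inform type checkers the way the real decorator does.
        def deprecated(message, *, category=DeprecationWarning, stacklevel=1):
            def decorator(func):
                @functools.wraps(func)
                def wrapper(*args, **kwargs):
                    warnings.warn(message, category=category, stacklevel=stacklevel + 1)
                    return func(*args, **kwargs)

                return wrapper

            return decorator


class OldRanges:
    """Hypothetical stand-in for the current PyRanges object."""

    @deprecated("keys() will be removed in v1", category=FutureWarning)
    def keys(self):
        return ["chr1", "chr2"]
```

Calling `OldRanges().keys()` still returns the keys, but emits a FutureWarning first.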
Related to #353: let's plan a v1 release with a newly designed method interface. We'll need to break some backward compatibility, so let's open this discussion with the pyranges community to get feedback.
Among other things, we'll make a major transition in data structure. Currently, PyRanges is a collection of DataFrames. We'll change it to be a subclass of DataFrame. This brings many advantages but compels us to change a few methods to avoid conflicts.
In general, all methods of the future PyRanges object, both those native to pyranges and those inherited from pandas DataFrame, will return a PyRanges object whenever they can; that is, whenever the returned value is a DataFrame with the columns Chromosome, Start, End.
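To make the rule concrete, here is a minimal sketch of the check, under the assumption that it boils down to "are the three required columns present". `RangeFrame` and `as_ranges_if_possible` are hypothetical names, not part of pyranges; in a real subclass this logic would presumably sit behind pandas' `_constructor` hook, which is left out here.

```python
import pandas as pd

REQUIRED_COLUMNS = ("Chromosome", "Start", "End")


class RangeFrame(pd.DataFrame):
    """Hypothetical stand-in for the future DataFrame-based PyRanges."""


def as_ranges_if_possible(frame):
    """Return a RangeFrame if the required columns are present, else a plain DataFrame."""
    if set(REQUIRED_COLUMNS).issubset(frame.columns):
        return RangeFrame(frame)
    return pd.DataFrame(frame)


gr = RangeFrame({"Chromosome": ["chr1"], "Start": [1], "End": [10], "Score": [0.5]})

# Keeps all required columns: comes back as a RangeFrame.
still_ranges = as_ranges_if_possible(gr[["Chromosome", "Start", "End"]])

# Loses Start and End: comes back as a plain DataFrame.
plain = as_ranges_if_possible(gr[["Chromosome", "Score"]])
```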
The first item of discussion is what to do with conflicts in the namespace of PyRanges and DataFrame.
Namespace conflicts
The methods/properties below are in both the current PyRanges and DataFrame namespaces. For each, I give my proposal for resolving the conflict.
Identical behavior, no conflict in sight; can safely drop pyranges version
-columns identical
-copy identical
-dtypes identical
-empty identical
-head identical
-tail identical
Deprecated due to transition from old data structure (collection of dataframes) to new (one dataframe):
-apply conflict
pyranges version can be dropped, in favor of gr.groupby('Chromosome').apply or gr.groupby(['Chromosome', 'Strand']).apply
or should we build a new method that wraps this? I vote no
futurewarning on its use
-items conflict
pyranges version can be dropped
futurewarning on its use
-keys conflict
pyranges version can be dropped
futurewarning on its use
-values conflict
pyranges version can be dropped
futurewarning on its use
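As an illustration of the groupby replacement proposed for apply, a small sketch on a plain DataFrame with made-up intervals. The helper `total_length` is hypothetical and only reads Start and End, so it does not depend on whether the grouping columns are passed to the function.

```python
import pandas as pd

gr = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr1", "chr2"],
        "Start": [0, 50, 10],
        "End": [10, 80, 30],
        "Strand": ["+", "-", "+"],
    }
)


def total_length(group):
    """Sum of interval lengths in one group."""
    return (group["End"] - group["Start"]).sum()


# One call per chromosome, as the old per-chromosome apply did.
per_chromosome = gr.groupby("Chromosome").apply(total_length)

# One call per chromosome/strand pair.
per_chromosome_strand = gr.groupby(["Chromosome", "Strand"]).apply(total_length)
```

Depending on the pandas version, groupby apply may warn about how grouping columns are handled; the function above does not use them.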
With an almost identical dataframe version:
-to_csv small conflict
pd offers more args, but not chain=True. Must drop pyranges version
futurewarning on problematic use only
-drop small conflict
pyranges version can be dropped
futurewarning on use of "like" argument
-assign small conflict
pyranges version can be dropped
futurewarning on its use, since syntax is different from pd
-insert small conflict
pyranges version can be dropped
futurewarning on its use, since syntax is different from pd
-sample small conflict
behavior is identical, but pd has more args, and arg order is not respected. Must drop pyranges version
futurewarning on problematic use only
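A minimal sketch of "futurewarning on problematic use only", using drop as the example. `TransitionalFrame` is a hypothetical stand-in for a v0.0.* object, and treating "like" as a regex over column names is an assumption made for illustration; calls that pandas already understands are forwarded without any warning.

```python
import warnings

import pandas as pd


class TransitionalFrame(pd.DataFrame):
    """Hypothetical stand-in for a v0.0.* PyRanges already based on DataFrame."""

    def drop(self, *args, like=None, **kwargs):
        if like is not None:
            # Problematic use: pandas' drop has no "like" argument.
            warnings.warn(
                "drop(like=...) will not be supported in v1; "
                "select the columns explicitly instead",
                FutureWarning,
                stacklevel=2,
            )
            # Assumption: "like" is a regex matched against column names.
            matched = self.filter(regex=like).columns
            return super().drop(columns=matched)
        # Everything else is forwarded to pandas untouched, with no warning.
        return super().drop(*args, **kwargs)


frame = TransitionalFrame(
    {"Chromosome": ["chr1"], "Start": [1], "End": [5], "Score": [0.1], "Source": ["x"]}
)
```

With this in place, `frame.drop(columns=["Score"])` stays silent, while `frame.drop(like="^Sc")` warns and then drops the matching column.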
Methods related to overlap, to be renamed:
-join conflict
rename pyranges join to overlap_join
futurewarning on its use
-merge conflict
rename pyranges merge to overlap_merge
futurewarning on its use
-subtract conflict
rename pyranges subtract to overlap_subtract
futurewarning on its use
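The rename-plus-warning pattern could look roughly like this during the transition. `OldRanges` is a hypothetical stand-in and the method body is a placeholder; only the new name overlap_join comes from the proposal above.

```python
import warnings


class OldRanges:
    """Hypothetical stand-in for a v0.0.* PyRanges object."""

    def overlap_join(self, other):
        # Placeholder body; the real overlap logic is out of scope here.
        return ("overlap_join", other)

    def join(self, other):
        # The old name keeps working for now, but announces the change.
        warnings.warn(
            "join() will follow DataFrame.join in v1; use overlap_join() instead",
            FutureWarning,
            stacklevel=2,
        )
        return self.overlap_join(other)
```

The same shim would apply to merge and subtract, each forwarding to its renamed counterpart.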
Magic methods including get and set attributes
Require extensive discussion; later