Implementation probably needed for `sort-values` #8

ezmiller · 2020-03-06T14:47:58Z

The following piped transformation shows a gap that we might think about filling on this library:

(defn properties-per-host [df]
  (-> df
      (pt/melt {:id-vars :host_id :value-vars [:listing_id]})
      (pt/groupby :host_id)
      (pt/subset-cols :value)
      (pt/n-unique)
      (pt/data-frame)
      (py. sort_values :by :value :ascending false)
      (pt/rename {:columns {:value :num_unique_listings}}))
  ))

In order to sort the n-unique count on the :value column, it was necesssary as things stand to first cast the result as a data frame (it was a series after the group-by and aggregation fn), and then to call the sort_values method on the :pyobject.

It would be nice to set things up such that we don't need to do these extra steps.

The text was updated successfully, but these errors were encountered:

ezmiller · 2020-03-06T14:48:42Z

@alanmarazzi: I'd be game to open a PR for this. Presuming you are okay with that, is there some context and guidance you'd like to provide?

alanmarazzi · 2020-03-07T16:08:45Z

I'm absolutely ok with that! I think that sort_values might go into the generics ns.

After this the true main step is to decide whether you want to use kw-call or simple-kw-call (here).

The difference between the two is whether you think that having an explicit positional first argument makes sense for the implementation. For example, here you can find rename and to-excel which use the 2 different functions.

The reason why I decided to implement them like that is because to-excel always needs a filename/path, no exceptions, so it makes sense to have a Clojure positional argument dedicated to that, while the same is not true with rename since there are no mandatory arguments and bypassing one you might have to avoid passing another one.

After this, a simple docstring with a couple of examples will be ok, no need to add docs for all the arguments, I have to get rid of those for the other functions as well, since it's impossible to list them all. One cool thing would be to link to the original docs https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html, but of course even in this way you can't know whether people want the DataFrame or the Series implementation. (Suggestions are well accepted).

The examples can be used for some tests, nothing too extensive, we don't want to test pandas, but the panthera implementation.

After everything is working and test run, the last thing to do is to add the function into the panthera ns

panthera/src/panthera/panthera.clj

Line 12 in d35232e

(export-symbols

.

Please let me know about any inquiries you might have, I'd like to turn this into a contributing guide, and you'll see that once you get the hang of it it's very simple to add functionality to this!

ezmiller · 2020-03-09T13:07:59Z

@alanmarazzi thanks for this writeup! I'm going to try to turn to this this week, and will certainly come back with any questions I may have. Regarding the docstring, can you give me an example of what you think is a good model for the docstrings going forward? I have to say it has been very convenient to have some of the docstrings that you have written in place. I'm thinking, for example, of the docstrings for group-by or melt. Is your current thinking that those kind of docstrings wouldn't be included?

alanmarazzi · 2020-03-09T13:49:03Z

You're right, ok just to be clear a dumb example of the issue is this: https://pandas.pydata.org/pandas-docs/stable/search.html?q=swaplevel#

Now, the swap-level function will work wether you pass a series, a data-frame or even an index, but we can't know it beforehand what's what.

The problem is that if you look at the docs there are different arguments accepted depending on the type you get.

I'd settle for something like:

Switch levels i and j of a multi index.

Original docs https://pandas.pydata.org/pandas-docs/stable/search.html?q=swaplevel#

Arguments:

  - df-or-srs -> data-frame or series
  - i, j -> int, str: the levels that you want to swap

Examples:

  (swap-level my-df 0 1)
  (swap-level my-srs :a :b)
  (swap-level my-df 0 :b)

So, clarify only args and send people to original docs for attrs (in this case there's none because I decided to not let people easily access copy and inplace).

Anyway I'm really keen to receive suggestions about this, I already know pandas extensively, but anyway I find myself on the docs very often because of its API size and depth

ezmiller mentioned this issue Mar 22, 2020

Add support for sort_values function #10

Merged

alanmarazzi closed this as completed in #10 Mar 23, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implementation probably needed for `sort-values` #8

Implementation probably needed for `sort-values` #8

ezmiller commented Mar 6, 2020

ezmiller commented Mar 6, 2020

alanmarazzi commented Mar 7, 2020

ezmiller commented Mar 9, 2020

alanmarazzi commented Mar 9, 2020

Implementation probably needed for sort-values #8

Implementation probably needed for sort-values #8

Comments

ezmiller commented Mar 6, 2020

ezmiller commented Mar 6, 2020

alanmarazzi commented Mar 7, 2020

ezmiller commented Mar 9, 2020

alanmarazzi commented Mar 9, 2020

Implementation probably needed for `sort-values` #8

Implementation probably needed for `sort-values` #8