Outputs for README example don't match #4

Closed
jseabold opened this issue Dec 19, 2014 · 2 comments

@jseabold

There doesn't seem to be a __version__ in the code, but I installed via pip fairly recently, on Python 3.4. The filtered_output and the pandas-ply output in the README don't match: the pandas-ply results are missing January.

import pandas as pd
from ply import install_ply, X
install_ply(pd)

%load_ext rpy2.ipython.rmagic
from pandas.rpy import common as com
%R library("nycflights13")
flights = com.load_data("flights")

grouped_flights = flights.groupby(['year', 'month', 'day'])
output = pd.DataFrame()
output['arr'] = grouped_flights.arr_delay.mean()
output['dep'] = grouped_flights.arr_delay.mean()
filtered_output = output[(output.arr > 30) & (output.dep > 30)]

print(filtered_output)

(flights
  .groupby(['year', 'month', 'day'])
  .ply_select(
    arr = X.arr_delay.mean(),
    dep = X.dep_delay.mean())
  .ply_where(X.arr > 30, X.dep > 30))

The plain pandas version, print(filtered_output), produces
                      arr        dep
year month day                      
2013 1     16   34.247362  34.247362
           31   32.602854  32.602854
     2     11   36.290094  36.290094
           27   31.252492  31.252492
     3     8    85.862155  85.862155
           18   41.291892  41.291892
     4     10   38.412311  38.412311
           12   36.048140  36.048140
           18   36.028481  36.028481
           19   47.911697  47.911697
           22   37.812166  37.812166
           25   33.681250  33.681250
     5     8    39.609183  39.609183
           23   61.970899  61.970899
     6     13   63.753689  63.753689
           18   37.648026  37.648026
           24   51.176808  51.176808
           25   41.513684  41.513684
           27   44.783296  44.783296
           28   44.976852  44.976852
           30   43.510278  43.510278
     7     1    58.280502  58.280502
           7    40.306378  40.306378
           9    31.334365  31.334365
           10   59.626478  59.626478
           22   62.763403  62.763403
           23   44.959821  44.959821
           28   49.831776  49.831776
     8     1    35.989259  35.989259
           8    55.481163  55.481163
           9    43.313641  43.313641
           28   35.203074  35.203074
     9     2    45.518430  45.518430
           12   58.912418  58.912418
     10    7    39.017260  39.017260
     12    5    51.666255  51.666255
           8    36.911801  36.911801
           9    42.575556  42.575556
           10   44.508796  44.508796
           14   46.397504  46.397504
           17   55.871856  55.871856
           23   32.226042  32.226042

and the pandas-ply version produces

                      dep        arr
year month day                      
2013 2     11   39.073598  36.290094
           27   37.763274  31.252492
     3     8    83.536921  85.862155
           18   30.117960  41.291892
     4     10   33.023675  38.412311
           12   34.838428  36.048140
           18   34.915361  36.028481
           19   46.127828  47.911697
           22   30.642553  37.812166
     5     8    43.217778  39.609183
           23   51.144720  61.970899
     6     13   45.790828  63.753689
           18   35.950766  37.648026
           24   47.157418  51.176808
           25   43.063025  41.513684
           27   40.891232  44.783296
           28   48.827784  44.976852
           30   44.188179  43.510278
     7     1    56.233825  58.280502
           7    36.617450  40.306378
           9    30.711499  31.334365
           10   52.860702  59.626478
           22   46.667047  62.763403
           23   44.741685  44.959821
           28   37.710162  49.831776
     8     1    34.574034  35.989259
           8    43.349947  55.481163
           9    34.691898  43.313641
           28   40.526894  35.203074
     9     2    53.029551  45.518430
           12   49.958750  58.912418
     10    7    39.146710  39.017260
     12    5    52.327990  51.666255
           9    34.800221  42.575556
           17   40.705602  55.871856
           23   32.254149  32.226042
@jhorowitz-coursera
Contributor

@jseabold -- Agh, great catch. There's a typo in the readme. It says:

output['arr'] = grouped_flights.arr_delay.mean()
output['dep'] = grouped_flights.arr_delay.mean()

but it should say

output['arr'] = grouped_flights.arr_delay.mean()
output['dep'] = grouped_flights.dep_delay.mean()

With that fix, the with- and without-ply fragments give the same output (up to column order, which the keyword-argument form of ply_select doesn't control). I'll fix that up. Thanks!
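For anyone who wants to see the effect of the typo without installing nycflights13, here is a minimal sketch in plain pandas. The numbers are made up for illustration (they are not taken from the flights data), and the helper name filtered is hypothetical. It only shows why copying arr_delay into the dep column lets extra days through the filter.

```python
import pandas as pd

# Made-up toy data, NOT the nycflights13 dataset.
flights = pd.DataFrame({
    "year":      [2013, 2013, 2013, 2013],
    "month":     [1, 1, 2, 2],
    "day":       [16, 16, 11, 11],
    "arr_delay": [40.0, 30.0, 35.0, 39.0],
    "dep_delay": [20.0, 10.0, 38.0, 42.0],
})

def filtered(dep_column):
    """Mean delays per day, keeping days where both means exceed 30.

    `dep_column` names the column averaged into 'dep' (hypothetical helper).
    """
    grouped = flights.groupby(["year", "month", "day"])
    output = pd.DataFrame({
        "arr": grouped["arr_delay"].mean(),
        "dep": grouped[dep_column].mean(),
    })
    return output[(output.arr > 30) & (output.dep > 30)]

buggy = filtered("arr_delay")  # README typo: 'dep' is a copy of 'arr'
fixed = filtered("dep_delay")  # corrected: 'dep' really is the departure delay

print(buggy)
print(fixed)
```

In this toy data the January day has a mean arrival delay above 30 but a mean departure delay below it, so it shows up in buggy and drops out of fixed, which is the same pattern as in the outputs reported above. When comparing against the pandas-ply result, sorting the columns first (for example with sort_index(axis=1)) sidesteps the column order difference.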

@jhorowitz-coursera
Contributor

Fixed in readme & docs; I'll push the change to the docs when I release a new version. (I'll also add a __version__, as you mentioned would be useful.)
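Until a release carries __version__, one possible way to find out which version pip installed is to ask the packaging metadata instead of the module. This is only a sketch: it needs Python 3.8 or newer for importlib.metadata, the helper name installed_version is hypothetical, and "pandas-ply" is assumed to be the distribution name that was passed to pip.

```python
from importlib import metadata  # standard library since Python 3.8

def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

# "pandas-ply" is an assumption about the distribution name used with pip.
print(installed_version("pandas-ply"))
```

The lookup goes by distribution name rather than import name, so it works even though the module itself is imported as ply.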
