Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding catto_median.R #10

Merged
merged 10 commits into from
Mar 2, 2018
Merged

Adding catto_median.R #10

merged 10 commits into from
Mar 2, 2018

Conversation

markroepke
Copy link
Contributor

I've added functionality for median encoding. This follows a similar process to mean encoding. I went ahead and updated cattonum.R, cattonum.Rd, README.Rmd and NAMESPACE, but left @bfgray3 the ability to run testthat in the way he sees fit.

Please let me know if you'd like me to do something differently.

@bfgray3
Copy link
Owner

bfgray3 commented Feb 28, 2018

Thanks Mark; this is a great idea. I have a few requests. Let me know if this is too much.

  • I think the only real difference we need to have between catto_mean and catto_median is the .f argument passed to *_labeler. Would you mind putting catto_mean and catto_median in one file mean_median.R like in dummy_onehot.R for example? We could then write a function factory (mean_median?) that takes the summary function (mean/median) as an argument so the two main function definitions can be one line each (plus their own documentation). It's not exported, so the name doesn't matter too much, but what do you think about center_labeler as the new name for the old mean_labeler?

  • The diff for the pull request is relatively big compared to the lines of code with actual changes. Could you look into this and see if there's a difference in line endings or something like that? One other thing is I have my .Rproj set to automatically strip trailing whitespace at the end of lines--not sure if this is causing the extra diff. Sorry for the inconvenience; I like to keep the diffs as compact as possible to make looking back at changes easier in the future.

  • Could you add some tests? You can cp test_mean.R test_median.R and then change the expected_* data frames at the top based on what the expected output should be. The input data frames are in helpers.R. Sorry tests are a mess and not intuitive right now. I have an issue open to refactor them.

  • Can you please knit README.Rmd and then add and commit README.md?

  • Please feel free to add yourself as "ctb" to Authors@R in DESCRIPTION.

Thanks again and sorry for the list of requests. I'll probably have a little time next week to help if you'd like me to.

@markroepke
Copy link
Contributor Author

This is not too much. These are great ideas and I will make these changes. (Also, I believe it was whitespace at the end of the lines.)

@bfgray3
Copy link
Owner

bfgray3 commented Mar 1, 2018

Thanks Mark. If you get a chance can you look into these last two things?

  • Git is still giving some trouble with the whitespace. This link shows the non-whitespace changes and this link shows all the changes including whitespace. This'll be a little bit of a pain, but can you open each of the modified files in RStudio with cattonum.Rproj activated and save them? This should strip the trailing whitespace. Also, can you revert to the previous README.md? Sorry this didn't turn out as I wanted and git is showing some weird changes. I'll just knit it after we merge.

  • Thanks for adding mean_median. Could you give it a slight tweak to mirror dummy_onehot? The signature and beginning of body could look something like

mean_median <- function(.center_fn) {

  function(train, ..., response, test, verbose = TRUE) {

s.t. we can define the two exported functions in one line each

catto_mean <- mean_median(mean)
catto_median <- mean_median(median)

The pull request is looking good!

@markroepke
Copy link
Contributor Author

Sorry, I didn't actually look at dummy_onehot.R and I should have. I like the way you do this and I will certainly make these changes to match your package style.

On the other issue, something is wrong with my RStudio and it's freezing when opening .Rproj files. I've been meaning to fix this because it also affects the work with my website. If I can't fix it, we can talk about alternatives.

@bfgray3
Copy link
Owner

bfgray3 commented Mar 2, 2018

Thanks Mark. Just one more request before merging. From the command line, could you run the following?

git checkout 7e17bac37bccc638f9b65d66e8244b8005ce2fe5 README.md

cd ./R

for f in *.R
do
  sed -i 's/[ \t]*$//' "$f"
done

cd ../tests/
  
for f in *.R
do
  sed -i 's/[ \t]*$//' "$f"
done

# commit

This will checkout a previous version of README.md (I'll just re-knit it to avoid the weird changes git is picking up) and trim the trailing whitespace from *.R files.

@markroepke
Copy link
Contributor Author

I believe this should be good now. Feel free to squash the commits together if everything is how you want it to be.

@bfgray3
Copy link
Owner

bfgray3 commented Mar 2, 2018

Perfect. Sorry for the hassle and thanks for the contribution!

@bfgray3 bfgray3 merged commit 588be6d into bfgray3:master Mar 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants