
Submit to CRAN #12

Closed
StevenMMortimer opened this issue Mar 30, 2019 · 20 comments
Comments

@StevenMMortimer (Collaborator)

Adding a note for work relating to the submission of this package to CRAN.

@jeffboichuk (Collaborator) commented Apr 2, 2019

Might be worth cleaning up campaign_id. There are 27 campaigns in the data, and campaign_id includes c(1:23, 27:30). This visualization shines a light on the issue:

library(tidyverse)
library(completejourney)

campaign_descriptions %>% 
  gather(start_date, end_date, key = campaign_event, value = date) %>% 
  ggplot(aes(x = date, y = campaign_id %>% reorder(date, FUN = 'min'))) + 
  geom_line(aes(group = campaign_id), color = "grey50") + 
  geom_point(aes(color = campaign_type)) +
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  theme(legend.position = "bottom") +
  labs(
    title = "Redemption Periods by Campaign in the Complete Journey Study",
    subtitle = "Campaigns lasted a median length of five weeks",
    x = "Date",
    y = "Campaign ID",
    color = "Campaign Type",
    caption = "Data Source: The completejourney package (https://github.com/bradleyboehmke/completejourney)"
  )

@bradleyboehmke self-assigned this Apr 2, 2019
@bradleyboehmke (Owner)

@jeffboichuk and @StevenMMortimer, I let this slide for a while but got back to it today. I was able to update all the data sets that contain campaign_id and ensured the ID values range from 1-27 rather than 1-23, 27-30. I also went through all necessary checks and was able to submit it to CRAN. We still have a note on the package size so we'll see if they allow it or not. As soon as I hear back I will let you know.

@bradleyboehmke (Owner)

@jeffboichuk and @StevenMMortimer , message from Uwe on our submission.

Thanks, we see:

Size of tarball: 28204328 bytes

Not more than 5 MB for a CRAN package, please host such huge data elsewhere.

Best,
Uwe Ligges

Looks like we need to go with plan B: host the data outside of CRAN but have package functions install the data (like the read_mnist() function in the dslabs package). What we could do is provide just a small subset of the transactions data in the package, small enough that the total size is less than 5 MB, but then have another function (e.g. read_transactions()) that imports the entire transactions data set. Thoughts?
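That plan could be sketched roughly like this; load_rda() and read_remote_rda() are hypothetical names (not the package's actual API), and the URL pattern shown is only a guess:

```r
# Sketch: load an .rda file and return the first object it contains,
# without assigning anything into the user's global environment.
load_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)  # load() returns the object name(s)
  get(object_names[1], envir = env)
}

# Hypothetical downloader built on load_rda(); the URL is an assumption.
read_remote_rda <- function(url) {
  temp_file <- tempfile(fileext = ".rda")
  on.exit(unlink(temp_file))
  utils::download.file(url, temp_file, mode = "wb", quiet = TRUE)
  load_rda(temp_file)
}

# transactions <- read_remote_rda(
#   "https://github.com/bradleyboehmke/completejourney/raw/master/data/transactions.rda"
# )
```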

@StevenMMortimer (Collaborator, Author)

@bradleyboehmke Thanks for submitting! Bummer to hear about the size. How do you feel about automatically loading the data from GitHub using the .onLoad() hook? I'm not sure if that's an accepted practice, but just thinking of alternatives to having a read_data() function. If I was loading a data package, I would expect the data to be there without having to call another function.

Here is a version of the .onLoad call that I mocked up:

load_url <- function (url, ..., sha1 = NULL) {
  # based very closely on code for devtools::source_url
  stopifnot(is.character(url), length(url) == 1)
  temp_file <- tempfile()
  on.exit(unlink(temp_file))
  request <- httr::GET(url)
  httr::stop_for_status(request)
  writeBin(httr::content(request, type = "raw"), temp_file)
  file_sha1 <- digest::digest(file = temp_file, algo = "sha1")
  if (is.null(sha1)) {
    #message("SHA-1 hash of file is ", file_sha1)
  } else {
    if (nchar(sha1) < 6) {
      stop("Supplied SHA-1 hash is too short (must be at least 6 characters)")
    }
    file_sha1 <- substr(file_sha1, 1, nchar(sha1))
    if (!identical(file_sha1, sha1)) {
      stop("SHA-1 hash of downloaded file (", file_sha1, 
           ")\n  does not match expected value (", sha1, 
           ")", call. = FALSE)
    }
  }
  load(temp_file, envir = .GlobalEnv)
}

#load_url from: https://stackoverflow.com/questions/24846120/importing-data-into-r-rdata-from-github

.onLoad <- function(libname, pkgname) {
  message("Loading Package Data from GitHub")
  dataset_names <- c("campaigns", "campaign_descriptions", "coupons", "coupon_redemptions", 
                     "demographics", "products", "promotions", "transactions")
  pb <- txtProgressBar(min=0, max=1, style=3)
  for (i in 1:length(dataset_names)) {
    load_url(sprintf("https://github.com/bradleyboehmke/completejourney/blob/master/data/%s.rda?raw=true", dataset_names[i]))
    setTxtProgressBar(pb, value = i*(1/length(dataset_names)))
  }
  invisible()
}

@bradleyboehmke (Owner)

@StevenMMortimer , I like this idea. I've never seen this approach before, so let me read up on it a bit. Is there a restructuring of the package required so that the .rda files are not uploaded to CRAN, or is it as simple as adding ^data to the .Rbuildignore?

@StevenMMortimer (Collaborator, Author)

@bradleyboehmke Yes, I was thinking of keeping the files in their place and ignoring them on build.
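For the record, that amounts to a one-line change; usethis can write the .Rbuildignore entry for you (shown here as a sketch):

```r
# Adds the regex "^data$" to .Rbuildignore, so the .rda files stay in
# the repo but are excluded from the tarball that gets sent to CRAN.
usethis::use_build_ignore("data")
```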

@bradleyboehmke (Owner)

@StevenMMortimer have you found a couple of packages that implement this method? If so, please share; I'd like to dig into them and see if there are best practices. Or if you've identified best practices regarding this, please do educate me :). Either way, this seems like a great approach!

@StevenMMortimer (Collaborator, Author)

@bradleyboehmke That picture is amazing. I really have no idea what I'm doing. My suggestion to include stuff in the .onLoad() hook was just a gut reaction. I can't think of or find any packages that do that. I could see CRAN still rejecting the package using that technique since it's weird to connect to an external source on load and download something unknown to the user, especially if it's large data that might not fit into the local machine's RAM.

In searching for packages using that method, I ran across the drat package, which can host packages in a way that lets you use install.packages(), update.packages(), etc., similar to CRAN. That said, I haven't used it or thought through what exactly that would look like on GitHub or GitHub Pages.

Package repo: https://github.com/eddelbuettel/drat
Dirk's blog about it: http://dirk.eddelbuettel.com/code/drat.html
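If we went the drat route, my understanding is the user-facing side would look something like the following (the repo and data-package names here are assumptions, not anything we've set up):

```r
# Hypothetical: data hosted in a drat repo served from GitHub Pages.
install.packages("drat")
drat::addRepo("bradleyboehmke")          # registers https://bradleyboehmke.github.io/drat
install.packages("completejourneydata")  # assumed name for the data-only package
```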

Lastly, it might be worth revisiting how important it is to be a CRAN package, or what benefits we gain from getting it to a state that's installable like a CRAN package. The motivation is escaping me at the moment, aside from it being a more consistent way to install and maintain for beginners.

@bradleyboehmke (Owner)

@StevenMMortimer, I made some updates to the package. I decided against the .onLoad() and .onAttach() approach because we would be importing from an external website before the user was informed of us doing so. I monkeyed around with adding an interactive menu() approach where the user is asked if they want to import all the data sets when calling library(completejourney); however, that throws a CRAN warning, as you are not supposed to include interactivity at package load time.

Instead, I added a package install message and incorporated a get_data() function that will install one or more completejourney data sets (defaults to all). Right now it passes all CRAN checks but I want you to test it out and give me feedback before submitting to CRAN.

@jeffboichuk (Collaborator)

@bradleyboehmke nicely done! Looks good to me.

Perhaps "Download them with the get_data() function. Consequently, you will need an
active wifi connection to download them from GitHub."
could be changed to "Run get_data() in your console to down the download the datasets from GitHub. An internet connection is required."? I got it, but new users might need more direction. Just a thought...

@StevenMMortimer (Collaborator, Author)

@bradleyboehmke @jeffboichuk That all sounds great to me. Fingers crossed on the second CRAN submission.

@bradleyboehmke (Owner)

@StevenMMortimer @jeffboichuk Got feedback on the CRAN submission. No major blockers but they wanted the following changed:

  • Add link to https://www.8451.com/area51 in the Description field of DESCRIPTION file
  • Replace \dontrun{} with \donttest{} in @examples
  • Change if(require("dplyr")){} statements to just require("dplyr")

I've made these changes and resubmitted this morning. Should be good to go but will let you know once final acceptance comes in. Might want to start thinking about a hex sticker 🤔!
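For anyone following along, the second bullet just means wrapping the download-dependent examples like this (a sketch of the roxygen block, not the exact documentation):

```r
#' @examples
#' \donttest{
#' # wrapped in \donttest{} so CRAN checks skip the download by default,
#' # while users and --run-donttest checks can still execute it
#' get_data()
#' }
```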

@jeffboichuk (Collaborator) commented May 4, 2019

@bradleyboehmke @StevenMMortimer I have an idea for the hex sticker. It might be crazy. A Pacman-like maze that has a shopping cart at the start and a few food-related emojis along the way. The image would symbolize that the data represent a sample of households' purchases at a grocer from the start of the study period to the end of it. That's the journey, which shouldn't be confused with customer journey mapping. Customer journey mapping stems from problem recognition, to information search, to choosing within a consideration set, to purchase. Just thought I'd throw it out there as a potential way to capture the package's contents in the hex sticker. Thanks for the finishing touches, Brad!

@bradleyboehmke (Owner)

@jeffboichuk @StevenMMortimer Ok, it's been a few weeks since the last submission and I've got an update. On the last submission, which fixed the minor issues the first reviewer raised, we had a different reviewer assess the submission. This time they responded with:

Thanks, we see code lines such as load(temp_file, envir = .GlobalEnv). Please do not modify the .GlobalEnv and just return a list of loaded objects.

This peeved me off since it was not discussed by the other reviewer so I argued against it. Well, I lost so we have a couple options:

  1. Have get_data() return a list of the data frames. Easiest fix, but then the user has a list of tibbles and needs a second step to extract each one as an individual data set.
  2. Try to implement an alternative route with the drat package, which requires:
    • Packaging the data as a normal R package, e.g. completejourneydata, and making it available in a GitHub repo.
    • Adding the data package as a Suggests to completejourney.
    • Adding the data repository as Additional_repositories to completejourney.

This approach is described in https://journal.r-project.org/archive/2017/RJ-2017-026/RJ-2017-026.pdf but I have not tried to implement it yet. Let me know your thoughts and I can make this change (hopefully quickly) and get the package resubmitted.
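For option 1, the extra extraction step can be kept to one line with list2env(); here's a sketch with mocked-up data frames standing in for the real tibbles:

```r
# Option 1 sketch: get_data() returns a named list instead of writing
# into .GlobalEnv. The data frames below are mocked stand-ins.
get_data <- function() {
  list(
    transactions = data.frame(household_id = 1L, quantity = 2L),
    promotions   = data.frame(product_id = 10L, display_location = 3L)
  )
}

cj <- get_data()
transactions <- cj$transactions      # extract one data set by hand, or...
list2env(cj, envir = environment())  # ...unpack the whole list at once
```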

@StevenMMortimer (Collaborator, Author)

@bradleyboehmke Thanks for dealing with all that. Option 1 (list of tibbles) sounds easy and reliable. When I looked at drat before it sounded like we needed to host the files somewhere and GitHub was not an option, but I don't really know what I'm talking about with that. If you want to learn more and go down that rabbit hole, that's fine too.

@bradleyboehmke (Owner)

If you guys are cool with going the list route then I'm fine with that.

@bradleyboehmke (Owner)

@jeffboichuk @StevenMMortimer , I let this get a little stale, but I spent yesterday making revisions per CRAN's feedback on the last package submission. Check out the updates and let me know your thoughts. I went to submit it to CRAN yesterday but they are closed for submissions until Aug 18 (summer vacation).

@jeffboichuk (Collaborator)

@bradleyboehmke I really like the shift you made from get_data() to get_transactions() and get_promotions(). Everything looks great to me!
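With that change, the end-user workflow (after installing the package) is just:

```r
# Requires an internet connection; each get_*() function downloads
# its full data set from GitHub.
library(completejourney)
transactions <- get_transactions()
promotions   <- get_promotions()
```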

@bradleyboehmke (Owner)

completejourney is now on CRAN. Enjoy with install.packages("completejourney"). Thanks!

@StevenMMortimer (Collaborator, Author)

Thanks so much @bradleyboehmke! Awesome stuff.
