New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix: namespace issues with hk_accidents
#16
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this new implementation (and introducing fst format to me)! Only have one concern about the download link.
R/download_data.R
Outdated
if(is.null(dataset)){ | ||
|
||
stop("please provide the name of the dataset to pull.") | ||
|
||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be better to impose a stricter check of the dataset name? If the user typed the dataset name wrong, the cannot open URL 'https://github.com/Hong-Kong-Districts-Info/hkdatasets/raw/master/data-ready/SOME-RANDOM-CHARS.fst'
error message may be hard to understand.
if(is.null(dataset)){ | |
stop("please provide the name of the dataset to pull.") | |
} | |
AVAILABLE_DATASETS = c("hk_accidents", "hk_casualties", "hk_vehicles") | |
if (!(dataset %in% AVAILABLE_DATASETS)) { | |
stop( | |
paste0("Please provide the name of the dataset to pull.\n", | |
"Datasets currently available: ", | |
# Convert column vector to text first | |
paste0(AVAILABLE_DATASETS, collapse = ", ") | |
) | |
) | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a great suggestion. I'll add checks for the dataset
argument input. Did this work for you locally? I like fst
for its fast loading speed, so hopefully this will work as quite a scalable solution in the long term.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep I ran the download and load fst data script in my local machine and it works well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic. I've now added some notes on NEWS.md
and README
, but this should be ready to merge to main and push to CRAN once ready.
One additional question - are data types preserved when converted to fst format? I compared the native dataset and the fst file of hk_accidents and do not see observable differences. Just want to ensure things will not go wrong for other datasets. Native dataset
fst file
|
As far as I'm aware all the main types like string, numeric, factors, and dates are preserved in |
Great. And probably we will not do list-columns as the dataset should be kept as simple as possible. |
Summary
This branch resolves the loading error with
hk_accidents
as mentioned in #15.To resolve this error, a new implementation is introduced, where
download_data()
is a function that downloads and reads in the data into R usingfst::read_fst()
. The datasets are stored in thefst
format on GitHub for fast loading and high compression.Changes
The changes made in this PR are:
hk_accidents
as this causes issues with CRAN R-CMD-checks.fst
files to GitHub (committed directly tomaster
branch, separately). These files are stored indata-ready
, which is ignored by R via.Rbuildignore
.download_data()
.New implementation
You can download and return a dataset with the following code:
The files are downloaded to a temporary directory, and the
download_data()
function returns a data frame directly.Check
Note
This fixes #15.