false cache entry from autoloading data #288

Open
2 of 5 tasks
alsmnn opened this issue Jan 28, 2019 · 3 comments

Comments

@alsmnn

alsmnn commented Jan 28, 2019

Report an Issue / Request a Feature

Autoloading data creates a cache entry named data instead of one named after the dataset.

I'm submitting a (check one with "x"):

  • bug report
  • feature request

Issue Severity Classification (check one with "x"):

  • 1 - Severe
  • 2 - Moderate
  • 3 - Low

Expected Behavior

load.project() creates a cache entry for every file in data/, named after the corresponding file.

Current Behavior

load.project() creates a cache entry named data, ignoring the name of the original file in data/.

Steps to Reproduce Behavior

Run load.project() with a file in data/.

Screenshots

[Screenshot]

Version Information
          Package           Version 
"ProjectTemplate"           "0.8.2" 

R version 3.5.1

Possible Solution

None.

Best regards,
@aljole

@Hugovdberg
Collaborator

What type of file are you trying to load? The .ACC extension is not supported, so such a file should normally not be printed at all. The cached name is determined by detecting which new variables the reader creates, so we need to know which reader is causing this. Could you post the complete name of the file that's causing this issue?
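The detection mechanism described in this comment can be sketched in base R: record the variable names in an environment, run the reader, and treat any names that appeared as the new datasets. This is a minimal sketch, not ProjectTemplate's actual implementation; `detect_new_vars()` and `toy_reader()` are hypothetical names, and a fresh environment stands in for the global environment.

```r
# Hypothetical helper: run a reader and report which variables it created.
# A sketch of the idea described above, not ProjectTemplate's own code.
detect_new_vars <- function(reader, envir) {
  before <- ls(envir, all.names = TRUE)   # names present before reading
  reader(envir)                           # the reader assigns into envir
  after <- ls(envir, all.names = TRUE)    # names present after reading
  setdiff(after, before)                  # only the newly created names
}

# Hypothetical reader that, like a .RData file, chooses its own variable name
toy_reader <- function(envir) {
  assign("data", data.frame(x = 1:3), envir = envir)
}

env <- new.env()                          # stands in for the global environment
new_vars <- detect_new_vars(toy_reader, env)
print(new_vars)                           # [1] "data"
```

Because only names are compared, the reported name is whatever the reader chose, not whatever the filename suggested. A name diff like this also cannot notice a reader that overwrites a variable that already existed.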

@alsmnn
Author

alsmnn commented Jan 28, 2019

Hi @Hugovdberg,
the name of the file is tcga.ACC.RData, and list.data() shows:

> list.data()
               filename  varname is_ignored is_directory is_cached cache_only       reader
              README.md               FALSE        FALSE     FALSE      FALSE             
tcga.ACC tcga.ACC.RData tcga.ACC      FALSE        FALSE     FALSE      FALSE rdata.reader

I already tried renaming the file to tcga_ACC.RData and tcga-ACC.RData, but ProjectTemplate converts the name to tcga.ACC anyway.

Best regards,
@aljole

@Hugovdberg
Collaborator

Ah, now I see what's going on: ProjectTemplate initially derives the variable name from the filename. However, .RData files are simply loaded into the global environment, so the objects they contain keep their own names and this initial variable name is ignored. Apparently your tcga.ACC.RData contains a variable called data, so that is the name used for caching.
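To make the mismatch concrete, the base-R sketch below saves an object named `data` into a file called tcga.ACC.RData and loads it back into a fresh environment. `load()` restores objects under the names they were saved with and returns those names. The `sub()` call is only a simplified stand-in for how ProjectTemplate derives the initial name from the filename (its actual naming rules are more involved), and the file is written to a temporary directory rather than a project's data/ folder.

```r
# Reproduce the reported situation in a temporary directory (a sketch).
path <- file.path(tempdir(), "tcga.ACC.RData")

data <- data.frame(id = 1:2)   # the object inside the file is called `data`
save(data, file = path)        # the file name says tcga.ACC, the object says data
rm(data)

# Simplified stand-in (assumption) for the name derived from the filename
initial_name <- sub("\\.RData$", "", basename(path), ignore.case = TRUE)

# load() restores objects under their saved names and returns those names
target <- new.env()
loaded_names <- load(path, envir = target)

print(initial_name)   # [1] "tcga.ACC"
print(loaded_names)   # [1] "data"
```

Renaming the file only changes `initial_name`; the object inside, and therefore the cached name, stays `data`.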

After the data is loaded, any new variables in the global environment are cached under their actual names. Several readers alter the variable names during loading (e.g., the Excel reader reads every sheet and appends the sheet name to the name derived from the filename). Variables are therefore not necessarily cached under the filename-based name, which unfortunately breaks the link between the file and its cache entry.

There is currently no solution to this problem short of rewriting the data loading system. That would require a file info reader for each file type that reports the exact variable names the corresponding reader would create. I've done some work towards such a framework, but currently don't have the time to continue this major overhaul.
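A rough sketch of what per-file-type "info readers" could look like: each one reports the variable names its reader would create, without touching the global environment, and a dispatcher picks one by file extension. Everything here is hypothetical (`info_readers`, `file_info()`, and the assumption that a CSV reader names its variable after the file); it is not the framework mentioned above.

```r
# Hypothetical "file info readers": one per file type, each reporting the
# variable names its reader would create, without touching the global env.
info_readers <- list(
  rdata = function(path) {
    scratch <- new.env()             # throwaway environment
    load(path, envir = scratch)      # returns the names of restored objects
  },
  csv = function(path) {
    # assumption: a CSV reader names its variable after the file itself
    sub("\\.csv$", "", basename(path), ignore.case = TRUE)
  }
)

# Dispatch on the (lower-cased) file extension
file_info <- function(path) {
  ext <- tolower(tools::file_ext(path))
  reader <- info_readers[[ext]]
  if (is.null(reader)) stop("no info reader for extension: ", ext)
  reader(path)
}

# Usage: an .RData file whose single object is called `data`
path <- file.path(tempdir(), "tcga.ACC.RData")
data <- data.frame(id = 1:2)
save(data, file = path)
rm(data)
print(file_info(path))   # [1] "data"
```

With names known up front, a cache entry could be tied back to the file that produced it. Note that the .RData info reader in this sketch still has to read the whole file into a scratch environment.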
