CTrees without (reserved) child files #14

Open
petermr opened this issue Jul 7, 2016 · 0 comments
petermr commented Jul 7, 2016

[See also https://github.com//issues/10 ]

Until recently, CTrees were generated either locally or through getpapers or quickscrape. Each automatically generated CTree contains at least one reserved file, such as fulltext.pdf, and CMine software used this to determine which directories in a CProject are actually CTrees. This was always recognised to be a heuristic, and recently, with bulk download of metadata from Crossref, we see many potential CTrees without reserved files or even without any files. Here's a simple example:

├── PMC4678086
│   ├── eupmc_result.json
│   ├── fulltext.pdf
│   └── fulltext.xml
├── http_dx.doi.org_10.1001_jama.2016.7992
│   └── results.json
└── http_dx.doi.org_10.1007_s13201-016-0429-9

The first directory is retrieved by quickscrape from EPMC, and the heuristics identify it as a potential CTree. The other two come from getpapers on Crossref followed by quickscrape, which creates only metadata, and they are currently not flagged as CTrees. The empty directory is created (I think) by quickscrape, which then fails to retrieve anything.

The original motivation for the heuristics is that we may introduce new reserved directories into a CProject, and users might also introduce non-CTree directories. There was also the idea of requiring a reserved file (e.g. metadata.json or log.xml) in every CTree directory. At present I favour this, and we should discuss what goes in it.

Currently I have added a switch

        cProject.setTreatAllChildDirectoriesAsCTrees(true);

which allows users to toggle this behaviour. I will also add results.json to the reserved files that flag "CTree-ness".
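
For discussion, here is a minimal sketch of how the heuristic and the switch might interact. It is not the CMine code: the class `CTreeHeuristic`, the methods `looksLikeCTree` and `findCTrees`, and the exact contents of the reserved-file set are all hypothetical names and assumptions made for illustration. Only `fulltext.pdf` and `results.json` are named as reserved in this issue; `fulltext.xml` is assumed.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the CTree heuristic; not the CMine implementation. */
final class CTreeHeuristic {

    // Assumed reserved names: fulltext.pdf and results.json come from this issue,
    // fulltext.xml is a guess.
    static final Set<String> RESERVED_FILES =
            Set.of("fulltext.pdf", "fulltext.xml", "results.json");

    /**
     * Returns true if dir should be treated as a CTree.
     * With treatAll set, every child directory qualifies, even an empty one.
     * Otherwise the directory must contain at least one reserved file.
     */
    static boolean looksLikeCTree(Path dir, boolean treatAll) {
        if (!Files.isDirectory(dir)) {
            return false; // plain files in the CProject are never CTrees
        }
        if (treatAll) {
            return true;
        }
        for (String name : RESERVED_FILES) {
            if (Files.isRegularFile(dir.resolve(name))) {
                return true;
            }
        }
        return false;
    }

    /** Lists the immediate children of cProjectDir that pass the heuristic. */
    static List<Path> findCTrees(Path cProjectDir, boolean treatAll) throws IOException {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> children = Files.newDirectoryStream(cProjectDir)) {
            for (Path child : children) {
                if (looksLikeCTree(child, treatAll)) {
                    result.add(child);
                }
            }
        }
        return result;
    }
}
```

Under these assumptions, the example above would yield PMC4678086 and http_dx.doi.org_10.1001_jama.2016.7992 with the switch off, and all three directories, including the empty one, with the switch on.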
