code reorg: separate data #54

Closed
toloudis opened this issue Feb 3, 2023 · 1 comment · Fixed by #57

@toloudis
Collaborator

toloudis commented Feb 3, 2023

Put all data subdirectories into a data/ directory and all source code into a src/ directory, so that it is obvious which parts of this repo are code and which are data.

@meganrm
Collaborator

meganrm commented Feb 4, 2023

Refactoring suggestions:

Put datasets in data folder

  1. Change the temp folder: const TEMP_FOLDER = "./data/" + id; should become const TEMP_FOLDER = "./tmp/" + id;
  2. In .gitignore, change data/**/cell-feature-analysis.json and data/**/file-info.json to tmp/**/cell-feature-analysis.json and tmp/**/file-info.json
  3. data should contain the 5 directories whose names start with dataset-
  4. On line 77 of check-input-datasets.js, change fsPromises.readdir("./").then(async (files) => { to fsPromises.readdir("./data").then(async (files) => {
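
A minimal sketch of what steps 1 and 4 amount to, assuming a CommonJS Node setup. The helper names tempFolderFor and listDatasetDirs are hypothetical and do not come from the repo; only the "./tmp/" + id path and the fsPromises.readdir("./data") call are taken from the list above.

```javascript
const fsPromises = require("fs").promises;
const path = require("path");

// Hypothetical helper: temp output goes under ./tmp/<id> instead of ./data/<id>,
// so generated files no longer mix with the checked-in datasets.
function tempFolderFor(id) {
    return "./tmp/" + id;
}

// Hypothetical helper: list the dataset-* directories under the data root.
// Plain files and directories with other names are skipped.
async function listDatasetDirs(dataRoot = "./data") {
    const entries = await fsPromises.readdir(dataRoot, { withFileTypes: true });
    return entries
        .filter((entry) => entry.isDirectory() && entry.name.startsWith("dataset-"))
        .map((entry) => path.join(dataRoot, entry.name))
        .sort();
}
```

Filtering on the dataset- prefix is an assumption about how check-input-datasets.js picks its inputs; the real script may select directories differently.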

Verify these changes:

npm run validate-datasets: should print a series of "passed" messages
npm run process-dataset data/dataset-cellsystems-2021 true: should run without creating any new files that aren't git-ignored

Put command line scripts in bin folder

  1. Create a new folder, bin, containing process-dataset, upload-dataset-image and release-dataset
  2. Change package.json to have:
    "process-dataset": "node bin/process-dataset",
    "upload-image": "node bin/upload-dataset-image",
    "release-dataset": "node bin/release-dataset",

Verify these changes:

npm run process-dataset data/dataset-cellsystems-2021 true: should run

Put rest of the code in src

  1. Move the rest of the code (aws, data-validation, firebase, utils) into a new directory, src
  2. The new directory src/process-single-dataset/ should contain the file currently named process-single-dataset.js, renamed to index.js, along with the steps/ directory and constants.js:
/bin/process-dataset/index.js
/src/process-single-dataset/index.js (was process-dataset/process-single-dataset.js)
/src/process-single-dataset/steps/ (moved but not renamed)
/src/process-single-dataset/constants.js (moved but not renamed)
  3. The Makefile in data-validation could also be moved to bin (we can pair-program if the references cause trouble). Verify: cd bin && make docs should create git-ignored html files
  4. Resolve any remaining broken references
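
One way to hunt for broken references after the move, sketched under the assumption that the code uses CommonJS require() with relative paths. findBrokenRequires is a hypothetical helper, not something in the repo, and the regex only catches string-literal requires.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: return the relative require() targets in `file`
// that no longer resolve to a module on disk.
function findBrokenRequires(file) {
    const source = fs.readFileSync(file, "utf8");
    // Matches require("./x") and require('../x'); dynamic requires are not detected.
    const pattern = /require\(\s*["'](\.{1,2}\/[^"']*)["']\s*\)/g;
    const broken = [];
    for (const match of source.matchAll(pattern)) {
        const target = path.resolve(path.dirname(file), match[1]);
        try {
            // Uses Node's own lookup, so ./steps also resolves to ./steps/index.js.
            require.resolve(target);
        } catch (err) {
            broken.push(match[1]);
        }
    }
    return broken;
}
```

Running this over each moved file, for example src/process-single-dataset/index.js, would give a list of paths to fix by hand; it complements the npm run checks above and does not replace them.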
