Added a rule to build the folders of data/ #66

Closed
wants to merge 1 commit into from

PetitLepton

Hi,

As the data folders are not included in the GitHub repository, they are missing when the project is cloned. But the Makefile contains commands that refer to /data (and scripts might too). I added a Makefile rule that creates the missing folders.
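The thread does not quote the rule itself, so what follows is only a minimal sketch of what such a rule's recipe might do, assuming the four default subfolders from the template (`external`, `interim`, `processed`, `raw`, as in the `tree data` output below). The function name `make_data_dirs` is hypothetical and not part of cookiecutter-data-science.

```shell
#!/bin/sh
# Hypothetical helper: recreate the default data/ tree under a project root
# (defaults to the current directory). The subfolder list is an assumption
# taken from the template defaults. mkdir -p is idempotent, so existing
# folders and any files inside them are left untouched.
make_data_dirs() {
    root="${1:-.}"
    for sub in external interim processed raw; do
        mkdir -p "$root/data/$sub" || return 1
    done
}
```

From the root of a fresh clone, `make_data_dirs .` recreates the empty tree; a Makefile target could run the same `mkdir -p` loop. Because the subfolder list is hard-coded, it only fits projects that kept the template defaults, which is one of the objections raised later in the thread.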

@isms
Collaborator

isms commented Mar 12, 2017

@PetitLepton I'm not sure I understand -- are these directories not being created when you use cookiecutter?

Here's what I see:

$ cookiecutter git@github.com:drivendata/cookiecutter-data-science.git
You've cloned /home/isaac/.cookiecutters/cookiecutter-data-science before. Is it okay to delete and re-clone it? [yes]: 
project_name [project_name]: test
repo_name [test]: test
author_name [Your name (or your organization/company/team)]: test
description [A short description of the project.]: test
Select open_source_license:
1 - MIT
2 - BSD
3 - Not open source
Choose from 1, 2, 3 [1]: 1
s3_bucket [[OPTIONAL] your-bucket-for-syncing-data (do not include 's3://')]: 
Select python_interpreter:
1 - python
2 - python3
Choose from 1, 2 [1]: 2

$ cd test/

$ ls
data  LICENSE   models     README.md   reports           src                  tox.ini
docs  Makefile  notebooks  references  requirements.txt  test_environment.py

$ tree data
data
├── external
├── interim
├── processed
└── raw

4 directories, 0 files

@isms
Collaborator

isms commented Mar 12, 2017

On further reflection, it seems like maybe you are cloning the repository rather than using the cookiecutter package?

@PetitLepton
Author

Hi,
Sorry, I did not explain this properly (or I misunderstood how to use the package).

First, I set up my project structure with cookiecutter, following your example. There, the data folders are indeed created. I then add my scripts, my data, etc. When I want to share the project, I push it to GitHub. Because /data/ is deliberately listed in .gitignore, the folder is not pushed to GitHub. When my collaborator clones my project, the /data/ folder is missing and they may not know how to rebuild the folder tree.

@isms
Collaborator

isms commented Mar 12, 2017

Ah, okay. I think there are two things to consider:

  1. Our presets in data are designed to be reasonable defaults, so not all projects will end up having the same subfolders.

  2. If the data is not actually being stored in the git repo, presumably you can mirror it elsewhere or pass it along.

That being said, if you want to make the folders part of the repo, all you have to do is force-add the placeholder files to version control:

$ git status
On branch master
nothing to commit, working directory clean

$ git add --force data/*/.gitkeep

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	new file:   data/external/.gitkeep
	new file:   data/interim/.gitkeep
	new file:   data/processed/.gitkeep
	new file:   data/raw/.gitkeep

Now anybody cloning the repo will get those folders created as well.
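For a self-contained look at the force-add behaviour, here is a throwaway-repository sketch. It assumes `git` is on the PATH and a `.gitignore` containing `/data/`; the function name `demo_gitkeep` is hypothetical. It uses the glob `data/*/.gitkeep`, which the shell expands to one explicit path per subfolder.

```shell
#!/bin/sh
# Hypothetical demo: in a scratch repo whose .gitignore excludes /data/,
# plain `git add` refuses the placeholders, while `git add --force` stages
# them. Prints the staged paths under data/, one per line.
demo_gitkeep() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        printf '/data/\n' > .gitignore
        for sub in external interim processed raw; do
            mkdir -p "data/$sub"
            touch "data/$sub/.gitkeep"
        done
        # Without --force, git reports the paths as ignored and exits non-zero.
        git add data/*/.gitkeep 2>/dev/null && echo "unexpected: added without --force" >&2
        # With --force, the ignore rule is overridden for these files only.
        git add --force data/*/.gitkeep
        # List what is now in the index under data/.
        git ls-files data
    )
    rc=$?
    rm -rf "$repo"
    return "$rc"
}
```

Calling `demo_gitkeep` should print the four `data/*/.gitkeep` paths, the same files listed in the `git status` output above. Only the placeholders become tracked; anything else later written into `data/` stays ignored.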

Happy to discuss further if there's a case for keeping this in the Makefile, but otherwise, is it OK to close this issue?

@PetitLepton
Author

OK, I understand. Thank you for the explanation and for the project in general.

@isms
Collaborator

isms commented Mar 13, 2017

@PetitLepton Thanks for raising the question!
