Do not use submodules for data import #679

Closed
rokroskar opened this issue Sep 18, 2019 · 2 comments

@rokroskar
Member

In the current implementation, git submodules are used to import data from other git repositories (including renku projects). This provides versioning and syncing with upstream changes through git tooling, but submodules are hardly anyone's favorite aspect of git. A simpler solution might be to implement the synchronization on the renku side and simply copy the data from the source repository into the desired location in the renku project.

Some requirements:

  • store sufficient metadata to easily identify the source and repeat the download if necessary
  • if the source is a renku project, the link should be made explicit in the metadata and get passed on to the KG
  • provide the means to easily pull in changes to imported data from the source
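
A rough sketch of how these requirements could fit together, assuming the source repository has already been cloned to a local checkout. Everything here is hypothetical and for illustration only: the `ImportRecord` fields, the `.import.json` sidecar file, and the function names are not part of renku.

```python
import hashlib
import json
import shutil
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class ImportRecord:
    """Hypothetical metadata kept for each imported file."""
    source_url: str         # where the data came from
    source_commit: str      # revision of the source at import time
    source_path: str        # path of the file inside the source repository
    destination: str        # path of the copy inside this project
    sha256: str             # checksum of the content at import time
    is_renku_project: bool  # explicit link that could be passed on to the KG


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def import_file(checkout, source_url, source_commit, source_path,
                destination, is_renku_project=False):
    """Copy one file out of a local checkout and record where it came from."""
    src = Path(checkout) / source_path
    dst = Path(destination)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)

    record = ImportRecord(
        source_url=source_url,
        source_commit=source_commit,
        source_path=source_path,
        destination=str(dst),
        sha256=file_sha256(dst),
        is_renku_project=is_renku_project,
    )
    # Assumption: metadata lives in a JSON sidecar file next to the copy.
    sidecar = dst.with_name(dst.name + ".import.json")
    sidecar.write_text(json.dumps(asdict(record), indent=2))
    return record


def has_upstream_changes(record, checkout):
    """Compare the recorded checksum with the file in an updated checkout."""
    return file_sha256(Path(checkout) / record.source_path) != record.sha256
```

With a record like this, pulling in changes could amount to refreshing the checkout, calling `has_upstream_changes`, and re-running `import_file` with the new commit when it returns `True`.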
@rokroskar rokroskar added the Epic label Sep 18, 2019
@m-alisafaee
Contributor

One feature that we lose when moving away from submodules is that renku log no longer shows lineage from the remote repo. A local renku project cannot have that information unless it has access to the full remote repository. For projects hosted on a server (e.g. renkulab.io), it would be possible to fetch remote lineage information from a service hosted on the server.

@rokroskar
Member Author

We should also be careful not to break backwards compatibility here. Ideally, existing renku projects with data imported from other git repositories should continue to work.
