Suggestion: separate Chado Schema/Perl into two different repositories #79

Closed
spficklin opened this issue Jan 4, 2019 · 7 comments

Comments

@spficklin
Contributor

I'd like to propose for discussion that we split this Chado repository into two different GitHub projects where the Chado schema is in one and everything else (i.e. Perl code) is in another.

There are several reasons I think this would be beneficial:

  1. Perl is not everyone's preferred language; it is losing users to Python in the informatics sphere, and there may be newer ways to get Chado installed if we explored and implemented them.
  2. Some tools that use the Chado schema never need the Perl code.
  3. Right now, Tripal packages its own copy of Chado rather than relying on the official Chado release. It would be preferable to pull directly from the Chado repository rather than package Chado with tools.
  4. Changes in the schema are very slow, while changes in the Perl code happen more regularly. Given the recent discussion of Chado schema governance, these two update schedules seem different enough that separate repositories would help: one group of individuals responsible for the schema can worry about updates to that, while another group can worry about updates to the Perl code.
  5. The last digit of the version number confuses people, who think it implies schema changes.
@spficklin spficklin added the Discussion thread for community discussion. label Jan 4, 2019
@bradfordcondon
Contributor

This would be helpful for accomplishing some of the major PAG goals as well (namely #63 and #72). I'd be very much in favor of this.

@spficklin spficklin changed the title Suggestion: separate Chado Schema from Chado Perl to two different repositories Suggestion: separate Chado Schema/Perl into two different repositories Jan 4, 2019
@scottcain
Member

I'm in favor of it too. It would also make build/install easier: right now it's a behemoth for a variety of reasons, one of which is that it has to support both installing and configuring a database and installing Perl modules and scripts. And of course, with Tripal, fewer (but not zero) people need the Perl parts.

@scottcain
Member

While I think this is a good (very good) idea, doing it during the codefest might be tricky, since the way I envision doing it at least, is to make two new repos, move the different parts to them with histories intact, and then push the new repos to github. I've done this before with other repos, but it usually takes a few tries to get right, and I could tell people not to commit or push while I was actively working on it--so not something we'd really want to do during the codefest.

@bradfordcondon
Contributor

bradfordcondon commented Jan 9, 2019

I think that using git filter-branch --subdirectory-filter would make this somewhat straightforward? I would propose making this repository for the SQL (and accompanying documentation) only, and adding a new remote for the tools. I can see how things get sticky fast, though: the above relies on the split being clearly delineated by subdirectories.
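For reference, here is a minimal sketch of what such a split could look like. It is an illustration only: `split_subdir` and its three arguments are hypothetical names, not something that exists in this repository, and the sketch rewrites only the branch that is checked out in the clone. The Git documentation now recommends `git filter-repo` over `git filter-branch`, so treat this as one possible approach rather than the procedure actually used here.

```shell
#!/bin/sh
# Sketch (assumed helper, not part of Chado): copy one subdirectory of a
# repository into a new repository, keeping the history of that subdirectory.
split_subdir() {
    src_repo=$1   # path or URL of the existing repository
    subdir=$2     # subdirectory that becomes the root of the new repository
    dest=$3       # directory in which to create the new repository

    # Work on a fresh clone so the original repository is never rewritten.
    git clone --quiet "$src_repo" "$dest" || return 1
    (
        cd "$dest" || exit 1
        # Detach from the source so nothing can be pushed back by accident.
        git remote remove origin &&
        # Rewrite the checked-out branch so that $subdir becomes the project
        # root; commits that never touched $subdir are dropped.
        FILTER_BRANCH_SQUELCH_WARNING=1 \
            git filter-branch --subdirectory-filter "$subdir" HEAD
    )
}
```

A call such as `split_subdir /path/to/Chado chado/sql /tmp/chado-schema` (placeholder paths) would leave a repository at `/tmp/chado-schema` whose root is the old subdirectory. As noted above, this only works cleanly when the split follows subdirectory boundaries.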

I marked it high priority because it narrows the scope of the codefest, but I definitely don't want to derail us before we begin.

I'll try it out and we will be free to reject out of hand and deal with it later. I'm moving everything other than chado/sql and chado/modules into https://github.com/GMOD/chado_tools. That migration looks OK to me.

It's possible that removing the tools commit history from this repo requires a force push, in which case, yeah, we should probably wait until after... but if we can do it with a PR then I don't see why not. (Is it important to remove the commit history for those files? Would it be OK to just make a PR deleting them all instead?)

@bradfordcondon
Contributor

bradfordcondon commented Mar 7, 2019

The stag folder looks kind of interesting. It seems to contain stub queries that are helpful?

We should probably keep these until they can be incorporated into the documentation (instead of removing them as I prune this repo to just the schema).

@bradfordcondon
Contributor

I've got a PR ready for the revised files and structure.

I think once it's reviewed, we should look into, say, this repo cleaner to remove those big files from the commit history.

I found some one-liners that can point us to the big files:

git rev-list --objects --all \
| grep "$(git verify-pack -v .git/objects/pack/*.idx \
           | sort -k 3 -n \
           | tail -30 \
           | awk '{print$1}')"

These seem to point the finger at chado/modules/sequence/apollo-bridge/sample_db/, which contains .bz2 files. Is that about right? If it's that simple, then we can just remove that folder from the commit history... but we probably need to do it AFTER we merge 1.4 into master. Otherwise, we'll just have conflicting commit histories when we do so.
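The one-liner above prints paths but not sizes. A small variant that keeps the size next to each object ID might make it easier to judge which files matter. This is a sketch: `largest_blobs` is a hypothetical helper name, and it assumes the usual `git verify-pack -v` column layout (object ID, type, size, ...).

```shell
#!/bin/sh
# Sketch (assumed helper): read `git verify-pack -v` output on stdin and
# print "<size> <object-id>" for the N largest blobs, biggest first.
# N defaults to 30, matching the one-liner above.
largest_blobs() {
    # Column 1 is the object ID, column 2 the type, column 3 the size in
    # bytes. Summary lines such as "non delta: ..." have no "blob" in
    # column 2 and are skipped.
    awk '$2 == "blob" { print $3, $1 }' | sort -k1,1nr | head -n "${1:-30}"
}

# Possible usage inside a repository (not executed here):
#   git verify-pack -v .git/objects/pack/*.idx | largest_blobs 30
# Each object ID can then be mapped to a path with:
#   git rev-list --objects --all | grep <object-id>
```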

@laceysanderson
Contributor

This was finally merged!
