Datalad get can't find URL despite registering via addurls (and I can see the URL with git annex whereis) #7582

Closed
watson-e-and opened this issue Apr 22, 2024 · 21 comments

@watson-e-and

What is the problem?

I’ve been trying to set up a dataset that primarily lives on a web server, but needs to be clone-able by other people. The annex files are visible and downloadable from the server’s website. In particular, the files I’m concerned about here are in a subdataset.

I would like people to be able to clone the dataset from Github, and then (whether or not they have permission to push back to Github) run datalad get to download files from the web server. The web server does not show the hidden files like .git, and so cannot be used as a remote, I believe.

I used datalad addurls to add the URL of each file on the server to each file in the annex. When I run git annex whereis filename, it shows up that it lives on the server in the server’s local copy of the dataset, and that it lives on the web, with a correct URL. In fact, if I click on that URL and open it in a browser, it downloads my file.

The dataset lives on Github, but the annex does not. When I make a clone of the superdataset on my personal computer, I get messages like

[INFO   ] Unable to parse git config from origin                                                                                       
[INFO   ] Remote origin does not have git-annex installed; setting annex-ignore                                                        
|   This could be a problem with the git-annex installation on the remote. Please make sure that git-annex-shell is available in PATH when you ssh into the remote. Once you have fixed the git-annex installation, run: git annex enableremote origin 
install(ok): /home/erin/Documents/DHA/carcas (dataset)

Then when I'm in the dataset carcas-models that has the annex and I run datalad get models/Alpaca\ 3rd\ Carpal\ L.glb, I get this error message:

get(error): models/Alpaca 3rd Carpal L.glb (file) [no known url                                
no known url
no known url]

I suspect my problem is with how I set things up with git annex, because when I try git annex get models/Alpaca\ 3rd\ Carpal\ L.glb, I get the error:

get models/Alpaca 3rd Carpal L.glb (from web...) 
  no known url

  Unable to access these remotes: web

  Maybe add some of these git remotes (git remote add ...):
        095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server

  (Note that these git remotes have annex-ignore set: origin)
failed
get: 1 failed

I'm confused on how to debug this because when I run git annex whereis models/Alpaca\ 3rd\ Carpal\ L.glb, everything looks correct:

whereis models/Alpaca 3rd Carpal L.glb (2 copies) 
        00000000-0000-0000-0000-000000000001 -- web
        095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server

  web: https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb
ok

What's the correct way to set up this use case? I don't think that I want the server to be a special remote, because the hidden files like .gitattributes aren't visible. I want to be able to put more files on the server, add their URLs based on where they are on the server, and push to Github so that other people can get these files if they want.

What steps will reproduce the problem?

I'm not sure how to reproduce without access to another web server.

DataLad information

Datalad 0.19.6
Git annex 10.20230626-g8594d49

Additional context

No response

Have you had any success using DataLad before?

This is my first time using Datalad, but everything else about using it has gone quite successfully.

@mslw
Contributor

mslw commented Apr 22, 2024

Thanks for the detailed report. I think you are doing everything correctly, so this is puzzling...

I can confirm that when I use the URL from your whereis output, I can perform the git annex addurl - git annex drop - git annex get sequence without issues.

The crucial error seems to be "Unable to access these remotes: web", and I must say I never saw it happen before. Searching gives me this thread, which basically amounts to a) no internet connection (at least to that server), or b) file changed on the server, or the server reports size incorrectly (though I suppose git-annex's error message could have improved since).

Here are the things I would try to debug (although I am making guesses in a general direction here) - if you don't mind, please also share your outputs for those commands:

  • In addition to whereis, what does git annex info report for the file?
    git annex info "models/Alpaca 3rd Carpal L.glb"

  • Can you indeed access the URL from your laptop (you said you can though), and what is the content length reported in its header:
    curl -I https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb

  • What is the output when running git-annex with --debug? The debug messages should contain details on the GET request which gets made, and what status it returns. Hopefully the full debug output can tell us where along the way the problem occurs:
    git annex --debug get "models/Alpaca 3rd Carpal L.glb"

  • What is the addurls command which you ran? Did you use any of the optional parameters (e.g. --fast)? What's the (first two lines of) the url file, if you don't mind sharing? (although that shouldn't really matter, as the URL that gets recorded seems correct)

@yarikoptic
Member

From use of / for folder separation I see that it is Windows? Indeed would be interesting to see direct git annex get --debug "models/Alpaca 3rd Carpal L.glb" invocation on why it cannot do it on Windows.

@watson-e-and
Author

Here's the output of your debugging suggestions, thanks for your help @mslw !

$ git annex info models/Alpaca\ 3rd\ Carpal\ L.glb
file: models/Alpaca 3rd Carpal L.glb
size: 4.8 megabytes
key: MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb
present: false

And just to check the precise size, I got it in bytes as well:

git annex info models/Alpaca\ 3rd\ Carpal\ L.glb --bytes
file: models/Alpaca 3rd Carpal L.glb
size: 4796924
key: MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb
present: false

The size exactly matches what came out of curl:

$ curl -I https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb
HTTP/2 200 
last-modified: Tue, 12 Mar 2024 02:08:13 GMT
accept-ranges: bytes
content-length: 4796924
date: Tue, 23 Apr 2024 00:28:36 GMT
server: Apache
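
The size comparison above can also be scripted. The following is a minimal sketch, assuming the key carries a size field (the `-s<bytes>` part, which git-annex keys usually but not always have); `key_size` and `header_length` are hypothetical helper names, not git-annex or curl features.

```shell
# Extract the size field from a git-annex key such as
# MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb.
# Prints nothing if the key has no -s<bytes> field.
key_size() {
  printf '%s\n' "$1" | sed -n 's/^[^-]*-s\([0-9][0-9]*\).*$/\1/p'
}

# Read HTTP response headers on stdin and print the Content-Length value.
# If several header blocks are present (e.g. after redirects), the last wins.
header_length() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-length" { print $2 }' | tail -n 1
}

# Possible usage inside the dataset (needs git-annex and network access):
#   key=$(git annex lookupkey "models/Alpaca 3rd Carpal L.glb")
#   len=$(curl -sI "$url" | header_length)
#   [ "$(key_size "$key")" = "$len" ] && echo "sizes match"
```

A matching size only rules out one failure mode (a changed or mis-reported file); it says nothing about whether git-annex knows a usable URL for the key.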

Could I have possibly turned off the web remote, or screwed it up in some way? I did mess around with remotes a bit at first when I thought that I wanted a special remote, but I don't think I would have tried turning off web.

$ git annex get models/Alpaca\ 3rd\ Carpal\ L.glb --debug
[2024-04-22 19:33:59.968907743] (Utility.Process) process [9367] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","ls-files","--stage","-z","--error-unmatch","--","models/Alpaca 3rd Carpal L.glb"]
[2024-04-22 19:33:59.969844739] (Utility.Process) process [9368] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch-check=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2024-04-22 19:33:59.970314738] (Utility.Process) process [9369] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
[2024-04-22 19:33:59.971085644] (Utility.Process) process [9370] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","git-annex"]
[2024-04-22 19:33:59.97338678] (Utility.Process) process [9370] done ExitSuccess
[2024-04-22 19:33:59.974190904] (Utility.Process) process [9371] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","show-ref","--hash","refs/heads/git-annex"]
[2024-04-22 19:33:59.976747211] (Utility.Process) process [9371] done ExitSuccess
[2024-04-22 19:33:59.977728415] (Utility.Process) process [9372] read: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","log","refs/heads/git-annex..e3c6b42398e7576de87babd34a528bffb7be7874","--pretty=%H","-n1"]
[2024-04-22 19:33:59.981026865] (Utility.Process) process [9372] done ExitSuccess
[2024-04-22 19:33:59.982053512] (Utility.Process) process [9373] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch=%(objectname) %(objecttype) %(objectsize)","--buffer"]
get models/Alpaca 3rd Carpal L.glb [2024-04-22 19:33:59.986940175] (Utility.Process) process [9375] chat: git ["--git-dir=.git","--work-tree=.","--literal-pathspecs","-c","annex.debug=true","cat-file","--batch"]
(from web...) 
  no known url

  Unable to access these remotes: web

  Maybe add some of these git remotes (git remote add ...):
        095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server

  (Note that these git remotes have annex-ignore set: origin)
failed
[2024-04-22 19:34:00.017577838] (Utility.Process) process [9373] done ExitSuccess
[2024-04-22 19:34:00.017754926] (Utility.Process) process [9369] done ExitSuccess
[2024-04-22 19:34:00.017829005] (Utility.Process) process [9368] done ExitSuccess
[2024-04-22 19:34:00.017884599] (Utility.Process) process [9367] done ExitSuccess
[2024-04-22 19:34:00.01875656] (Utility.Process) process [9375] done ExitSuccess
get: 1 failed

I added the URLs using a bash script, since I wanted to set up a process that will be super easy for future contributors to this project. It first checks if each model already has a URL (by seeing if the output of git annex whereis model-name.glb contains the string 'web'), and then for the ones that don't, creates a temporary file with the names and specific part of the URL. That piece of the script is

# Print "model" to the top of the file
echo "model,model_no_spaces" > temp_models_missing_urls.csv
# Add each element of "models_missing_urls" to the file on a new line
for model in "${models_missing_urls[@]}"; do
  echo "$model,${model// /%20}" >> temp_models_missing_urls.csv
done

Then when I finally run the addurls piece, I use this snippet (enable-url-tools is a directory inside my dataset carcas-models where this script lives)

datalad addurls enable-url-tools/temp_models_missing_urls.csv 'https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/{model_no_spaces}' 'models/{model}' --message "Adding URLs for models so that they can be downloaded from the web from the server"

If more of the bash script would help, I can certainly put more in here.
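
The snippet above only replaces spaces. As a hedged sketch of a slightly more general variant, the helper below percent-encodes every character outside the unreserved set; `urlencode` and `write_url_csv` are hypothetical names, and the code has only been exercised with ASCII file names. Names containing commas would still need CSV quoting, which this sketch does not handle.

```shell
# Percent-encode one URL path component (hypothetical helper, not part of
# DataLad or git-annex). Runs in a subshell so its variables stay local.
urlencode() (
  LC_ALL=C
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;                 # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$c") ;;         # everything else: %XX
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Write the two-column CSV used by the addurls call: the first argument is
# the CSV path, the remaining arguments are model file names.
write_url_csv() {
  csv=$1; shift
  printf '%s\n' "model,model_no_spaces" > "$csv"
  for model in "$@"; do
    printf '%s,%s\n' "$model" "$(urlencode "$model")" >> "$csv"
  done
}
```

For example, `write_url_csv temp_models_missing_urls.csv "Alpaca 3rd Carpal L.glb"` would produce the header line followed by one row for that model.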

@watson-e-and
Author

@yarikoptic I'm running Linux, I believe one of the Ubuntu LTS's on the server, and Fedora 39 on my personal laptop. I'd like this to work on Windows, but so far there has not been a Windows machine involved. Isn't the Windows separator \, not /? I believe that I have \ in the path as an escape character for spaces, I've been using my terminal's tab autocomplete for filenames.

In case it's still relevant, it looks like running git annex get --debug "models/Alpaca 3rd Carpal L.glb" gives me the exact same output as with the escaped path.

@mslw
Contributor

mslw commented Apr 23, 2024

Thanks for the additional details. Again, everything for git annex info and curl -I looks right to me: just as you say, the size matches (and the MD5 checksum also matches the one I get when I download the file).

I also have no concerns about the addurls call or format (side note: if you find working with json output easier, you can also use git annex whereis --json).

Which brings us to git annex --debug get. For me, the first 11 lines, until "(from web...)", look basically the same, and then it follows with the GET request, while yours only reports "no known url" and "Unable to access these remotes: web" - as if it gave up before even trying. Which brings us to the starting point.

Regarding messing around with remotes, you can check git annex info and git annex info web for anything suspicious. However, I don't see how the web remote could be turned off in a way that would produce the behavior above (doesn't mean it can't be done, but I tried annex.ignore and untrust and neither produced the same effects).

Here's my minimal attempt at a reproducer (reduced to just git-annex calls, and just the one file):

❱ mkdir foo; cd foo; git init; git annex init
❱ git annex addurl --file alpaca.glb https://3dviewer.sites.carleton.edu/carcas/carcas-models/models
❱ git annex drop alpaca.glb
❱ git annex get alpaca.glb

Maybe you could try that (incl. creating a fresh repo) on your laptop - if it fails, it would point to a more general problem, if it works it would point to a repo-specific problem, but I am out of ideas 😕

@mslw
Contributor

mslw commented Apr 23, 2024

FTR, I noticed that this is crossposted from https://git-annex.branchable.com/forum/How_to_allow_clones_to_get_files_via_URL__63__/ where it got no answers - that's perfectly fine, but good to keep track of :)

@yarikoptic
Member

@watson-e-and you are totally right about / vs \ -- I guess I was tired ;-)

FWIW I have tried to replicate with independent minimal reproducer -- but failed
❯ mkdir d; cd d; git init; git annex init "CARCAS models on the 3dviewers server"; mkdir models; git annex addurl --file 'models/Alpaca 3rd Carpal L.glb' 'https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb' ; git annex drop *; git annex get *
Initialized empty Git repository in /tmp/d/.git/
init CARCAS models on the 3dviewers server ok
(recording state in git...)
addurl https://3dviewer.sites.carleton.edu/carcas/carcas-models/models/Alpaca%203rd%20Carpal%20L.glb 
(to models/Alpaca 3rd Carpal L.glb) ok
(recording state in git...)
drop models/Alpaca 3rd Carpal L.glb ok
(recording state in git...)
get models/Alpaca 3rd Carpal L.glb (from web...) 
ok                                   
(recording state in git...)
❯ echo $?
0

if you could share that resultant git/git-annex repo (could be private) or at least that temp_models_missing_urls.csv file so we could reproduce fully -- hopefully we get more info

Also given that you have git-annex 10.20230626-g8594d49 -- did you try more recent release?

FWIW -- I had a nearby 10.20230626+git13-g029d12815c-1~ndall+1 and with that my minimal reproducer also did not have any trouble.

@watson-e-and
Author

This is exciting, I tried making a minimal example involving both the server and my own laptop, and everything worked exactly as expected. I set up the dataset, added a model, added a url to the model using datalad addurls and pushed to Github. I cloned the dataset on my own laptop, confirmed that I did not get the model, ran git annex whereis to see that everything looked the same as before, and then tried datalad get. It worked like a charm.

It seems like then that I've eliminated software as the source of the error. Since the history of my dataset isn't that important to the project, my instinct is to start from scratch and see if I still have issues with the URLs.

  • If so, back to debugging, maybe making datasets with intermediate levels of complexity somehow to narrow down the issue.
  • If not, I'd be willing to believe that I did something funky along the line. Is my use case (dataset on a server, server provides URLs to download the large files) something common enough that there should be some sort of article in the handbook explaining how to do this correctly? I scoured the handbook, user manual, and the internet looking for how to do this, but it was very much not clear.

About cross posts, yep, this is a cross post both from git annex's forum and from Datalad's help forum at Neurostars. I was reluctant to post an issue here because I suspected the problem was more a user error than a software bug. I plan on putting an update to each of those once my problem is resolved, or at least linking to this issue.

@yarikoptic
Member

The beauty of git-annex is that "secret sauce" for its functioning is really just a bunch of text files within git-annex branch. Do you still have that other "not working" repository? You/we could compare content of git-annex branch there with what you have in a "working" one. It might be that somehow those urls were assigned to some other non web remote or some other difference which you could potentially simply find via something like git diff notworking/git-annex..working/git-annex where notworking and working would be two remotes you add to some new empty git repository pointing to those two instances ;)

@mslw
Contributor

mslw commented Apr 24, 2024

Is my use case (dataset on a server, server provides URLs to download the large files) something common enough that there should be some sort of article in the handbook explaining how to do this correctly? I scoured the handbook, user manual, and the internet looking for how to do this, but it was very much not clear.

It's a much less publicized resource, but our lab has a note on that in the knowledge base that we started a while ago: Create a DataLad dataset from a published collection of files. There is a small difference in starting points -- it looks like you add URLs to files already present in the dataset, while the note uses addurls to build the dataset from scratch -- both equally valid approaches.

@watson-e-and
Author

@yarikoptic I'm not totally sure what you mean about comparing the differences. How can both be remotes? Should I make a third repository, and if so, do I install the working and non-working copies as subdatasets? That doesn't seem right...

I'm also happy to go digging around in the files if there's somewhere that explicitly has the settings for different remotes in git annex in a text file. I haven't found one yet, but it seems like there should be one.

@yarikoptic
Member

How can both be remotes?

Just like that

mkdir repo3; cd repo3; git init
git remote add --fetch remote1 location1
git remote add --fetch remote2 location2
git diff remote1/git-annex remote2/git-annex

;-) so yes, third repository but not as sub datasets but as remotes

@watson-e-and
Author

@mslw Thanks for that link, I wish I had stumbled across it earlier! What I want to do does seem quite simple and straightforward once you know the tools that are needed.

If I have time and/or the support of the rest of the team for this project, I might explore if this is something that could make a good Use Case in the Datalad Handbook. People interested in digital humanities or other collaborative projects with 3D models might want to replicate this workflow of datalad + github + web server for local development of better features with the model viewer, and I believe that the project I'm working on is in part intended as an exemplar. It would certainly be on the simpler side, but I think it could be worth it.

@watson-e-and
Author

@yarikoptic I'm having trouble since I don't have enough disk space on my server to make a full copy there, and I'm trying to see if there's a non-annoying way to get all 59 of the models onto my laptop to build a working copy locally.

This is based on the understanding that I need to use a working copy that has the same files in it to get good results out of git diff ..., is that right? Otherwise I'll just get a bunch of garbage about some files existing in the bad copy but not the other.

As I'm typing this, I'm realizing that it can't hurt to try.

@watson-e-and
Author

Ok, it might have to wait a couple days for me to get a full example working locally so that I can get less cluttered results from git diff. What I did get is not super helpful.

I got a bunch of outputs that look like this, which I assume is from all my 58 models being present as broken symlinks in the not working repo, and not present at all in the working repo. I'm pretty confident, since I can see .glb in there, and that's the filetype of my 3D models.

@@ -1 +0,0 @@
-1712343130.544668403s 1 095e299d-037e-4172-87e0-bbd7183a6613
diff --git a/cf9/5c8/MD5E-s5376128--aa97a9b2216adc76b54fcafd7b436155.glb.log b/cf9/5c8/MD5E-s5376128--aa97a9b2216adc76b54fcafd7b436155.glb.log
deleted file mode 100644
index 314e09b..0000000
--- a/cf9/5c8/MD5E-s5376128--aa97a9b2216adc76b54fcafd7b436155.glb.log
+++ /dev/null
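
One way to cut down this kind of clutter is to ask git for a name-only summary and drop the per-key location logs. The sketch below is an assumption about what counts as "interesting" here: it keeps top-level files (such as uuid.log) and per-key `.log.web` files, and discards everything else. `interesting_paths` is a hypothetical helper name.

```shell
# Filter `git diff --name-status` output (status<TAB>path) read on stdin.
# Keeps paths without a directory part and per-key URL logs (*.log.web).
interesting_paths() {
  awk -F'\t' '$2 !~ /\// || $2 ~ /\.log\.web$/ { print $0 }'
}

# Possible usage in the third repository that has both copies as remotes:
#   git diff --name-status notworking/git-annex working/git-annex | interesting_paths
```

The remaining paths can then be diffed individually by passing them after `--` to `git diff`.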

There's a handful of more interesting ones, and of course I might have missed some useful ones.

This one seems to be about a .glb file as well, but it has a lot more than the others. It also seems to reference the remotes.

@@ -1,2 +1,3 @@
-1712025774.459464303s 1 00000000-0000-0000-0000-000000000001
-1712025774.863101388s 1 095e299d-037e-4172-87e0-bbd7183a6613
+1713909728.409842739s 1 00000000-0000-0000-0000-000000000001
+1713909325.17748794s 1 0059bbca-aaa1-4b98-98b0-dd18c9262c2f
+1714442429.484049046s 1 91a0a22b-b949-464c-b9e5-90c14acfb1aa
diff --git a/955/f4f/MD5E-s17873032--830142c49477a63ed42adb5d31371879.glb.log.met b/955/f4f/MD5E-s17873032--830142c49477a63ed42adb5d31371879.glb.log.met
index 64380a6..ffa2e16 100644
--- a/955/f4f/MD5E-s17873032--830142c49477a63ed42adb5d31371879.glb.log.met
+++ b/955/f4f/MD5E-s17873032--830142c49477a63ed42adb5d31371879.glb.log.met

There's also some that seem to be referencing the datalad run command that I used to attach the URLs in the first place.

@@ -1 +0,0 @@
-1712022791.712911434s model +!QWxwYWNhIDNyZCBDYXJwYWwgTC5nbGI= model_no_spaces +Alpaca%203rd%20Carpal%20L.glb
diff --git a/947/4f9/MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb.log.web b/947/4f9/MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb.log.web
deleted file mode 100644
index 907ac8f..0000000
--- a/947/4f9/MD5E-s4796924--be3ded304e622cd48b5b12bd9b372781.glb.log.web
+++ /dev/null
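
The `model` and `model_no_spaces` fields in the hunk above appear to be per-key git-annex metadata (by default, addurls records the columns of the URL file as metadata) rather than a record of a run command. As far as the git-annex metadata log format goes, a value prefixed with `!` is base64-encoded, typically because it contains whitespace. A minimal sketch for decoding such a value, with `decode_meta_value` as a hypothetical helper name and `base64 -d` assumed to be the GNU coreutils spelling:

```shell
# Decode one value token from a git-annex metadata log line.
# A leading + or - marks the value as set or unset; a following ! means
# the rest is base64-encoded.
decode_meta_value() {
  v=${1#[+-]}
  case $v in
    '!'*) printf '%s' "${v#?}" | base64 -d; echo ;;
    *)    printf '%s\n' "$v" ;;
  esac
}
```

Applied to the first value in the hunk, `decode_meta_value '+!QWxwYWNhIDNyZCBDYXJwYWwgTC5nbGI='` prints the original file name with its spaces.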

In these last two, you can clearly see the remnants of my first bad attempt at adding the URLs by creating a remote called 'serverweb'. I tried to delete it when I realized that I wanted the command addurls, but clearly I wasn't successful. I didn't think that it was impacting things, but maybe that's the source of my problem if I can fully delete it. It's also a little hard to read, but the long path is referencing the local copy of the repo, serverweb is the failed remote, and "CARCAS models on the 3dviewers server" is the copy where I created the original, which shouldn't be accessible here because I cloned from Github.

@@ -1 +0,0 @@
-b54a211a-da06-4249-9097-e886830107e9 name=serverweb type=web urlinclude=https://3dviewer.sites.carleton.edu/carcas/carcas-models/* timestamp=1711812950.871969098s
diff --git a/uuid.log b/uuid.log
index 2dc0116..00a49ae 100644
--- a/uuid.log
+++ b/uuid.log
@@ -1,3 +1,2 @@
-095e299d-037e-4172-87e0-bbd7183a6613 CARCAS models on the 3dviewers server timestamp=1710209225.933348033s
-852c74e6-2a7e-4882-b567-fdfd3ed52c15 erin@fedora:~/Documents/Code-Projects/Datalad-Tutorial/carcas-models timestamp=1714442484.637960641s
-b54a211a-da06-4249-9097-e886830107e9 serverweb timestamp=1711812950.865757662s
+0059bbca-aaa1-4b98-98b0-dd18c9262c2f server test of URLs timestamp=1713908981.296630467s
+91a0a22b-b949-464c-b9e5-90c14acfb1aa erin@fedora:~/Documents/Code-Projects/Datalad-Tutorial/test-urls timestamp=1714442387.246039754s
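
To check a repository for leftover special remotes like the `serverweb` entry above, the special-remote configuration stored on the git-annex branch can be scanned directly. This is a sketch under the assumption that each line has the `uuid key=value ...` shape seen in the diff and that values contain no spaces; `list_web_remotes` is a hypothetical helper name.

```shell
# Read remote.log-style lines on stdin and print "name urlinclude" for
# every entry whose type is web.
list_web_remotes() {
  awk '{
    name = ""; type = ""; inc = ""
    for (i = 2; i <= NF; i++) {
      if ($i ~ /^name=/)       name = substr($i, 6)
      if ($i ~ /^type=/)       type = substr($i, 6)
      if ($i ~ /^urlinclude=/) inc  = substr($i, 12)
    }
    if (type == "web") print name, inc
  }'
}

# Possible usage inside a clone (needs the git-annex branch to be present):
#   git show git-annex:remote.log | list_web_remotes
```

Whether such an entry is what stops the built-in web remote from serving the URL is not established in this thread; the listing only shows that the entry exists and which URLs it is configured to cover.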

I also discovered that git annex will give you a little more information on remotes if you run git annex info $remote.
Working repo:

trusted repositories: 0
semitrusted repositories: 4
        00000000-0000-0000-0000-000000000001 -- web
        00000000-0000-0000-0000-000000000002 -- bittorrent
        0059bbca-aaa1-4b98-98b0-dd18c9262c2f -- server test of URLs
        91a0a22b-b949-464c-b9e5-90c14acfb1aa -- erin@fedora:~/Documents/Code-Projects/Datalad-Tutorial/test-urls [here]
untrusted repositories: 0
transfers in progress: none
available local disk space: 346.05 gigabytes (+100 megabytes reserved)
local annex keys: 1
local annex size: 17.87 megabytes
annexed files in working tree: 1
size of annexed files in working tree: 17.87 megabytes
bloom filter size: 32 mebibytes (0% full)
backend usage: 
        MD5E: 1

Not working repo:

trusted repositories: 0
semitrusted repositories: 5
        00000000-0000-0000-0000-000000000001 -- web
        00000000-0000-0000-0000-000000000002 -- bittorrent
        095e299d-037e-4172-87e0-bbd7183a6613 -- CARCAS models on the 3dviewers server
        852c74e6-2a7e-4882-b567-fdfd3ed52c15 -- erin@fedora:~/Documents/Code-Projects/Datalad-Tutorial/carcas-models [here]
        b54a211a-da06-4249-9097-e886830107e9 -- serverweb
untrusted repositories: 0
transfers in progress: none
available local disk space: 346.06 gigabytes (+100 megabytes reserved)
local annex keys: 0
local annex size: 0 bytes
annexed files in working tree: 58
size of annexed files in working tree: 829.71 megabytes
bloom filter size: 32 mebibytes (0% full)
backend usage: 
        MD5E: 58

The big differences I'm seeing are

  • not working repo has additional remote serverweb, which I should figure out how to delete for real.
  • working repo has 1 local annex key, while the not working repo has none
  • they use a different backend.

Do any of these look like the source of the issue?

@yarikoptic
Member

In the working one you seem to have just 1 key in total, which is odd. Isn't there a key for that "models/Alpaca 3rd Carpal L.glb"? What is it? Check for a diff on that key.

is it private data/urls? if not, let me repeat request:

if you could share that resultant git/git-annex repo (could be private) or at least that temp_models_missing_urls.csv file so we could reproduce fully -- hopefully we get more info

@watson-e-and
Author

@yarikoptic Oops, I lost track of your request. Yes, the repository is public. Here's the link: https://github.com/DigitalCarleton/carcas

@yarikoptic
Member

That one has no models/Alpaca\ 3rd\ Carpal\ L.glb from the original post, and no enable-url-tools/ with the file for addurls... nothing git-annex'ed. I was hoping to get access to that original 'broken' one.

@watson-e-and
Author

My apologies, this is the right repository, it's just that I didn't also link to the subdataset where the problems really are. I forgot that the link doesn't work on Github.

If you recursively clone the repository at the link I gave you, you should be able to see the models folder, and the enable-url-tools folder. Or, here's the direct link to the subdataset with the problems: https://github.com/DigitalCarleton/carcas-models

@watson-e-and
Author

I now have a working version at [https://github.com/DigitalCarleton/carcas]. Creating everything from scratch, it worked perfectly.

Things I did differently

  • I didn't touch special remotes this time. I don't need that functionality, but I had originally thought it was the solution before I discovered datalad addurls.
  • I no longer have a nested subdataset. Everything is in one Datalad dataset, which I think is a better option for my project to reduce complexity for contributors in the future.

If I have time, I'll look into what went wrong by trying to compare this working version with the non-working version.

@yarikoptic
Member

I will then close this issue for now
