Cannot sync every repo's DBs #92

Closed
afonsofrancof opened this issue Sep 11, 2023 · 13 comments · Fixed by #103

Comments

@afonsofrancof

Hello. I have been getting this error for quite some time.
If I point my repos to pacoloco and then point pacoloco to my mirror list, only some of the DBs get downloaded.
On the other DBs I get:
pacoloco.go:168: repo archlinux has no urls

docker-compose.yml

---
version: "3.8"
services:
  pacoloco:
    container_name: pacoloco
    image: ghcr.io/anatol/pacoloco
    ports:
      - "9129:9129"
    volumes:
      - /pacoloco/cache:/var/cache/pacoloco
      - /pacoloco/pacoloco.yaml:/etc/pacoloco.yaml
      - /etc/pacman.d/reflector-mirrorlist:/etc/mirrorlist
    restart: unless-stopped
    environment:
      - TZ=Europe/Lisbon

/pacoloco/pacoloco.yaml (mapped to /etc/pacoloco.yaml inside the container)

port: 9129
cache_dir: /var/cache/pacoloco
purge_files_after: 360000
download_timeout: 3600
repos:
  archlinux:
    mirrorlist: /etc/mirrorlist
user_agent: Pacoloco/1.2
prefetch:
  cron: 0/30 * * * * 
  ttl_unaccessed_in_days: 30 
  ttl_unupdated_in_days: 300 

/etc/pacman.d/reflector-mirrorlist (mapped to /etc/mirrorlist inside the container)

Server = https://mirrors.celianvdb.fr/archlinux/$repo/os/$arch
Server = https://archlinux.mailtunnel.eu/$repo/os/$arch
Server = https://mirror.theo546.fr/archlinux/$repo/os/$arch
Server = https://mirror.ubrco.de/archlinux/$repo/os/$arch
Server = https://mirror.sunred.org/archlinux/$repo/os/$arch
Server = https://mirror.cyberbits.eu/archlinux/$repo/os/$arch
Server = https://packages.oth-regensburg.de/archlinux/$repo/os/$arch
Server = https://mirror.pseudoform.org/$repo/os/$arch
Server = https://de.arch.mirror.kescher.at/$repo/os/$arch
Server = https://arch.jensgutermuth.de/$repo/os/$arch
Server = https://arch.unixpeople.org/$repo/os/$arch
Server = https://mirror.iusearchbtw.nl/$repo/os/$arch
Server = https://mirror.f4st.host/archlinux/$repo/os/$arch
Server = https://mirrors.xtom.de/archlinux/$repo/os/$arch
Server = https://ftp.halifax.rwth-aachen.de/archlinux/$repo/os/$arch
Server = https://mirrors.janbruckner.de/archlinux/$repo/os/$arch
Server = https://mirror.cmt.de/archlinux/$repo/os/$arch
Server = https://mirrors.n-ix.net/archlinux/$repo/os/$arch
Server = https://arch.phinau.de/$repo/os/$arch
Server = https://archlinux.thaller.ws/$repo/os/$arch

/etc/pacman.d/mirrorlist
Server = http://localhost:9129/repo/archlinux/$repo/os/$arch

/etc/pacman.conf
(the important parts)

SigLevel    = Required DatabaseNever
LocalFileSigLevel = Optional

[core]
Include = /etc/pacman.d/mirrorlist

[extra]
Include = /etc/pacman.d/mirrorlist

[multilib]
Include = /etc/pacman.d/mirrorlist

Pacoloco logs (in this case core.db isn't being downloaded)

pacoloco.go:168: repo archlinux has no urls

downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db

downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//multilib/os/x86_64/multilib.db

pacoloco.go:270: serving cached file for archlinux/multilib/os/x86_64/multilib.db

As you can see, it says repo archlinux has no urls

Pacman logs

:: Synchronizing package databases...
 core.db failed to download
 extra is up to date
 multilib is up to date
 chaotic-aur is up to date
error: failed retrieving file 'core.db' from localhost:9129 : The requested URL returned error: 404
error: failed to synchronize all databases (failed to retrieve some files)

It all works if I specify individual repos for each arch repo inside the pacoloco config, like this:

repos:
  core:
    mirrorlist: /etc/mirrorlist
  extra:
    mirrorlist: /etc/mirrorlist
  multilib:
    mirrorlist: /etc/mirrorlist

and then change my /etc/pacman.conf to have this instead

[core]
Server = http://localhost:9129/repo/core/$repo/os/$arch

[extra]
Server = http://localhost:9129/repo/extra/$repo/os/$arch

[multilib]
Server = http://localhost:9129/repo/multilib/$repo/os/$arch

Thanks :)

afonsofrancof changed the title from "Cannot get all DBs" to "Cannot sync every repo's DBs" on Sep 11, 2023
@afonsofrancof
Author

Tried one more sync without changing my setup and now multilib isn't syncing.

downloader.go:336: repo archlinux has no urls
pacoloco.go:168: repo archlinux has no urls
downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db
pacoloco.go:270: serving cached file for archlinux/extra/os/x86_64/extra.db

@afonsofrancof
Author

Also note that it adds an extra slash between the mirror URL and $repo:
downloader.go:102: downloading https://mirrors.celianvdb.fr/archlinux//extra/os/x86_64/extra.db

It reads archlinux// instead of archlinux/

(Not that it affects the end result)
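
For illustration, the double slash is what you get when a base URL that still ends in "/" is concatenated with a path that starts with "/". A minimal sketch of a join that avoids it is below; `mirrorBase` and `joinURL` are hypothetical helper names for this example, not pacoloco's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// mirrorBase strips the pacman placeholders from a mirrorlist entry and
// removes any trailing slash left behind. (Hypothetical helper.)
func mirrorBase(server string) string {
	base := strings.TrimSuffix(server, "$repo/os/$arch")
	return strings.TrimRight(base, "/")
}

// joinURL joins a mirror base and a request path with exactly one slash.
// (Hypothetical helper.)
func joinURL(base, path string) string {
	return strings.TrimRight(base, "/") + "/" + strings.TrimLeft(path, "/")
}

func main() {
	base := mirrorBase("https://mirrors.celianvdb.fr/archlinux/$repo/os/$arch")
	fmt.Println(joinURL(base, "/extra/os/x86_64/extra.db"))
	// prints https://mirrors.celianvdb.fr/archlinux/extra/os/x86_64/extra.db
}
```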

@afonsofrancof
Author

With no config changes I now get this. Maybe it's related?

downloader.go:102: downloading https://mirror.f4st.host/archlinux//multilib/os/x86_64/multilib.db
downloader.go:68: unable to download file archlinux/multilib/os/x86_64/multilib.db: receiving file https://mirror.f4st.host/archlinux//multilib/os/x86_64/multilib.db: Content-Length is 146416 while received body length is 1903408

@anatol
Owner

anatol commented Sep 12, 2023

Okay, it looks like several issues are reported here.

pacoloco.go:168: repo archlinux has no urls

@Focshole had this issue before. @afonsofrancof could it be some race condition between pacoloco and reflector changing the file?

It reads archlinux// instead of archlinux/

It should be fixed in master. Could you please pull the recent changes and build pacoloco from source?

Content-Length is 146416 while received body length is 1903408

This error is weird. Do you always see this issue, or is it just sporadic?

@Focshole
Contributor

I had that issue when pacoloco's mirrorlist was unreadable or empty for some reason. Is that file readable from the container?

@afonsofrancof
Author

afonsofrancof commented Sep 12, 2023 via email

@afonsofrancof
Author

@afonsofrancof could it be some race condition between pacoloco and reflector changing the file?

I don't have reflector running anymore; the file only has that name because I used to run it. A race with reflector also wouldn't explain why everything works when I separate the repos.

This error is weird. Do you always see this issue, or is it just sporadic?

Only sometimes, I have no idea what conditions cause it.

@Orochimarufan
Contributor

I think I've run into the same issue. It seems to happen when the mirrorlist file is bind-mounted directly.

Bind-mounting the parent directory instead (/etc/pacman.d:/etc/pacman.d:ro instead of /etc/pacman.d/mirrorlist:/etc/pacman.d/mirrorlist:ro) appears to have fixed it for me.

No idea why pacoloco chokes on bind-mounted files though...

@chennin

chennin commented Nov 24, 2023

I'm having this problem without using a docker bind mount. Is there a debug log option / environment variable?

Also, how do I tell what version of pacoloco I am running, other than by inspecting the docker sha256 digest, which isn't very friendly?

Repro below.

Version:

# docker inspect --format='{{index .RepoDigests 0}}' e5bcb0215eaa
ghcr.io/anatol/pacoloco@sha256:b93e352f8c4d34494df208158a05d5487da03d847467406ab35132197e9a2e9d

docker-compose.yaml:

  pacoloco:
    image: ghcr.io/anatol/pacoloco
    container_name: prod-pacoloco
    restart: unless-stopped
    user: "122:1999" # pacoloco:pacoloco
    environment:
      TZ: "UTC"
    ports:
      - "127.0.0.1:9129:9129"
    volumes:
      - 'pacoloco:/var/cache/pacoloco'
      - '/root/etc-docker/pacoloco/pacoloco.yaml:/etc/pacoloco.yaml:ro'
      - '/root/etc-docker/pacoloco:/data:ro'
      - '/etc/passwd:/etc/passwd:ro'
      - '/etc/group:/etc/group:ro'
      - '/etc/localtime:/etc/localtime:ro'
      - '/etc/timezone:/etc/timezone:ro'
    logging:
      <<: *logging

I have nginx proxying to pacoloco. That part works when pacoloco does.

pacoloco.yaml:

purge_files_after: 1814400 # 21 days
download_timeout: 3600 # download will timeout after 3600 seconds
repos:
#  archlinux:
#    urls:
#      - http://mirror.lty.me/archlinux
#      - http://mirrors.kernel.org/archlinux
  archlinux:
    mirrorlist: /data/mirrorlist
prefetch: # optional section, add it if you want to enable prefetching
  cron: 41 5 * * *
  ttl_unaccessed_in_days: 30  # defaults to 30, set it to a higher value than the number of consecutive days you don't update your systems
  # It deletes and stop prefetch packages(and db links) when not downloaded after ttl_unaccessed_in_days days that it had been updated.
  ttl_unupdated_in_days: 300 # defaults to 300, it deletes and stop prefetch packages which hadn't been either updated upstream or requested for ttl_unupdated_in_days.
set_timestamp_to_logs: true

pacoloco can read the file:

~# docker-compose exec pacoloco sh -c "id; tail -n1 /data/mirrorlist"
uid=122(pacoloco) gid=1999(pacoloco) groups=1999(pacoloco)
Server = https://mirror.rackspace.com/archlinux/$repo/os/$arch

Logs:

prod-pacoloco          | 2023/11/24 16:26:30 repo archlinux has no urls
prod-pacoloco          | 2023/11/24 16:26:30 repo archlinux has no urls
prod-pacoloco          | 2023/11/24 16:26:30 downloading https://mirrors.mit.edu/archlinux//extra/os/x86_64/extra.db
prod-pacoloco          | 2023/11/24 16:26:30 serving cached file for archlinux/extra/os/x86_64/extra.db
prod-pacoloco          | 2023/11/24 16:26:30 unable to download file archlinux/extra/os/x86_64/extra.db.sig

Client logs:

:: Synchronizing package databases...
 core.db failed to download
 extra is up to date
 multilib.db failed to download
error: failed retrieving file 'core.db' from arch.<server>.net : The requested URL returned error: 404
error: failed retrieving file 'multilib.db' from arch.<server>.net : The requested URL returned error: 404
error: failed to synchronize all databases (unexpected error)

Reproduction

For me this repro reliably prints errors, but which one fails (core or extra) varies.

docker network create paco-test; \
docker stop pacoloco-test; \
docker rm pacoloco-test; \
mkdir -p /tmp/pactest && cd /tmp/pactest && \
{ cat >pacoloco.yaml <<EOF
repos:
  archlinux:
    mirrorlist: /data/pacoloco-mirrorlist
cache_dir: /tmp
EOF
} && \
{ cat >pacoloco-mirrorlist <<EOF
Server = https://mirrors.mit.edu/archlinux/\$repo/os/\$arch
Server = https://arch.mirror.constant.com/\$repo/os/\$arch
EOF
} && \
{ cat >arch-mirrorlist <<EOF
Server = http://pacoloco-test:9129/repo/archlinux/\$repo/os/\$arch
EOF
} && \
docker run -d --network paco-test --name pacoloco-test -v "$PWD:/data" -v "$PWD/pacoloco.yaml:/etc/pacoloco.yaml" ghcr.io/anatol/pacoloco@sha256:b93e352f8c4d34494df208158a05d5487da03d847467406ab35132197e9a2e9d && \
docker run --network paco-test --rm -it --name archlinux-test -v "$PWD/arch-mirrorlist:/etc/pacman.d/mirrorlist" archlinux:latest pacman -Syuv --noconfirm; \
echo -e  "\n^^ ARCH ^^\nvv PACOLOCO vv\n" && \
docker logs pacoloco-test

Repro cleanup:

docker stop archlinux-test pacoloco-test ; docker rm archlinux-test pacoloco-test; 
docker network rm paco-test; 
cd /tmp; rm -rf /tmp/pactest

@chennin

chennin commented Nov 26, 2023

Limiting pacoloco to 1 CPU seems to SOMETIMES solve the problem for me.

In my docker case that's docker run -d --cpuset-cpus=1 ... or

services:
  pacoloco:
    image: ghcr.io/anatol/pacoloco
    cpuset: "1"

Edited to add: It worked for a while, but now I am getting "has no urls" and a 404 on core.db, on a container that has been running with 1 cpu for a few days.

anatol pushed a commit that referenced this issue Mar 14, 2024
Fix #92 by locking access to the Mirrorlist's Urls during refresh.

I had the same problem on my deployed pacoloco and also successfully reproduced the issue using @chennin's script. With this PR, I could no longer reproduce the error in either setup.

I'm new to Go and its concurrency model, but my guess is that the "repo archlinux has no urls" error happens due to a race condition in getMirrorlistURLs():
While the mirrorlist file is being read for the first time (e.g. because of a request for 'core.db'), the URLs array is still empty but LastMirrorlistCheck is already set. If the function is called concurrently at that moment (e.g. because of a parallel download request for 'extra.db'), it returns the still-empty URLs array because LastMirrorlistCheck is recent, instead of waiting for the array to be populated.

I used a mutex, rather than setting LastMirrorlistCheck after URLs is populated, so that multiple goroutines are also blocked from reading and parsing the same mirrorlist simultaneously.
However, this doesn't fix any possible race with reflector or similar tools.

Closes #92
@Focshole
Contributor

Sorry for bumping this old thread, but I just ran into this issue too. The issue is still present, and it is related to this log line:

serving cached file for archlinux/multilib/os/x86_64/multilib.db

Pacoloco serves an old .db file that somehow got left in the cache. It has to be removed and not served anymore. I fixed it in my installation by removing the .db files from the cache. Maybe it is something I forgot to clean up after caching!

@krameler
Contributor

Did you change the mirrorlist between requests of the .db file, or did you see any connection errors, or how is this related to the closed issue?

For .db files, pacoloco always connects to a mirror and does an "If-Modified-Since" check, so the file being in the cache shouldn't be a problem.

@anatol
Owner

anatol commented May 28, 2024

@Focshole is PR #109 related to your issue, by any chance?
