
Use wget/zip instead of Git #95

Closed
LordMike opened this Issue May 20, 2013 · 15 comments

6 participants

@LordMike

I was just wondering: why not use wget to fetch the firmware?
GitHub lets you download a snapshot of the whole repository as a .zip file.

For example, here:
https://codeload.github.com/Hexxeh/rpi-firmware/zip/master

Or here, for a specific revision:
https://codeload.github.com/Hexxeh/rpi-firmware/zip/d5b05be2147bf5dc0137798837af24b0bbbe398d

Then you wouldn't need to clone the repository with its full history; you would just unzip the file.
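
A minimal sketch of this suggestion, assuming wget and unzip are installed. The function names and the local file name rpi-firmware.zip are hypothetical, chosen for illustration; only the codeload URL pattern comes from the links above.

```shell
# Build the codeload.github.com zip URL for a ref (branch name or full commit hash).
codeload_zip_url() {
    printf 'https://codeload.github.com/Hexxeh/rpi-firmware/zip/%s\n' "$1"
}

# Download one revision as a .zip (files only, no git history) and unpack it
# in the current directory. Defaults to the master branch.
# rpi-firmware.zip is a hypothetical local file name.
fetch_firmware_zip() {
    ref="${1:-master}"
    wget -O rpi-firmware.zip "$(codeload_zip_url "$ref")" &&
        unzip -q -o rpi-firmware.zip
}
```

For example, fetch_firmware_zip d5b05be2147bf5dc0137798837af24b0bbbe398d would fetch the specific revision linked above, while fetch_firmware_zip with no argument fetches master.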

@popcornmix
Collaborator

The rpi-update <hash> form already downloads the tar.gz archive. You can use it with the latest hash to do an rpi-update, e.g.:
sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

Might be worth timing the two cases to see if it's significantly faster.

(if it says you are up to date, "sudo rm /boot/.firmware_revision" will force it to update).
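
The one-liner above can be split up so that a failed hash lookup stops with an error instead of quietly falling back to a plain rpi-update (an empty $(...) simply disappears from the command line). This is a sketch: the function names are hypothetical, and -h is left out because the refs/heads/master pattern already restricts the output to that branch.

```shell
# git ls-remote prints "<hash><TAB><ref>"; keep only the hash column.
hash_from_ls_remote() {
    awk '{ print $1 }'
}

# Resolve the current master commit of rpi-firmware and install that
# revision through rpi-update's <hash> (archive download) mode.
update_to_latest_archive() {
    hash=$(git ls-remote https://github.com/Hexxeh/rpi-firmware refs/heads/master |
        hash_from_ls_remote)
    if [ -z "$hash" ]; then
        echo "could not resolve refs/heads/master" >&2
        return 1
    fi
    # If rpi-update reports "up to date", removing /boot/.firmware_revision forces it.
    sudo rpi-update "$hash"
}
```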

@GrmpCerber

Well, from the test I ran, I understand that git clone ... --depth=1 clones the last commit and its parent. With binary files that change heavily between commits, that can mean downloading twice the size needed.
Under that assumption, LordMike's solution would save us quite a few bytes...

Side note: it took me a loooong time to figure out the --depth=1 trick, because:

  • I'm not "git-fluent"
  • it is not highlighted in @churten's comment in #64
  • it's not commented in the code

;)
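
For anyone else who is not git-fluent: a sketch of the shallow clone, plus a way to check how many commits a clone actually contains. The helper names are hypothetical. On current git releases a --depth=1 clone is expected to hold a single commit; whether the 2013-era git described above also fetched the parent is not something this sketch establishes.

```shell
# Clone only the most recent commit of a repository (no older history).
shallow_clone() {
    git clone --depth=1 "$1" "$2"
}

# Count the commits reachable from HEAD in a local clone.
commit_count() {
    git -C "$1" rev-list --count HEAD
}
```

For example: shallow_clone https://github.com/Hexxeh/rpi-firmware rpi-firmware && commit_count rpi-firmware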

@LordMike
LordMike commented Jun 5, 2013

There's a "depth" parameter?
I didn't know that, but still, why use Git at all?

@popcornmix
Collaborator

Why? Because it gives us complete history.

Say you think start.elf had a regression that started about 3 months ago:
https://github.com/Hexxeh/rpi-firmware/commits/master/start.elf

You can download any previous version and see the history of changes.
There would be a lot of storage, bandwidth and infrastructure costs to create our own system that does this.
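
The same per-file history is also available from a full (non-shallow) local clone. A sketch, assuming such a clone exists in the directory given as the first argument; the function names and output file name are hypothetical:

```shell
# List the commits that touched one file, newest first.
file_history() {
    git -C "$1" log --format='%H %ad %s' --date=short -- "$2"
}

# Write the contents of FILE as of REV to OUT (e.g. start.elf before a regression).
file_at_revision() {
    repo="$1"; rev="$2"; file="$3"; out="$4"
    git -C "$repo" show "$rev:$file" > "$out"
}
```

For example, file_history rpi-firmware start.elf lists the candidate revisions, and file_at_revision rpi-firmware <hash> start.elf start.elf.old extracts one of them, where <hash> is a commit picked from that list.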

@GrmpCerber

Well, popcornmix, I must admit I think LordMike is right: most users won't do that
(at least not locally). To check something like that, I would naturally go and look directly on GitHub.

Besides, the script isn't set up to take advantage of a full local history because of #64.

@popcornmix
Collaborator

@GrmpCerber
I think you misunderstood my point. I'm saying that github provides a web interface to browse the history and download files which allows people to find a specific version of a file that caused a regression.
I'm not suggesting that most will use the git command line, or rpi-update for this.

If we didn't use git at all for firmware, we'd have to invent alternate means to achieve this.

@fastcat
fastcat commented Jun 5, 2013

Some observations:

  • Functionally, git clone --depth=1 is equivalent to wget
  • For most users, the limiting factor is going to be internet download bandwidth, not CPU time, even on the Pi
  • Preparing the git clone or the http(s) download takes some time on the server, but one may be faster than the other, e.g. if github caches the .zip for the head revision
  • Having the history available is great, but not needed 99.9% of the time
  • Either the firmware files change dramatically with each build, or git is very poor at binary deltas for these files (Mercurial or SVN might do better at that, but this is github)

Therefore it seems like a few datapoints should be collected, and that can drive whether it would be better to switch to http(s) downloads:

  • Which is faster to start downloading (latency)?
  • Which downloads fewer bytes?
  • Which uses less temp space on the Pi?
  • How many Pi users are installing git-core only to get rpi-update to work?
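
One way to collect the total-time and temp-space datapoints for both methods is a small wrapper like the following. It is a sketch: the function name is hypothetical, timing has one-second resolution (date +%s), and it does not measure network bytes, which git and wget each report on their own.

```shell
# Run a fetch command inside a fresh scratch directory, then print
# "<exit status> <seconds> <KiB left on disk>" separated by tabs.
measure_fetch() {
    mf_dir=$(mktemp -d) || return 1
    mf_start=$(date +%s)
    ( cd "$mf_dir" && "$@" ) >/dev/null 2>&1
    mf_status=$?
    mf_end=$(date +%s)
    mf_kib=$(du -sk "$mf_dir" | cut -f1)
    rm -rf "$mf_dir"
    printf '%s\t%s\t%s\n' "$mf_status" "$((mf_end - mf_start))" "$mf_kib"
}
```

For example, run measure_fetch git clone --depth=1 https://github.com/Hexxeh/rpi-firmware and measure_fetch sh -c 'wget -O fw.zip https://codeload.github.com/Hexxeh/rpi-firmware/zip/master && unzip -q fw.zip', then compare the two output lines.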

@popcornmix
Collaborator

github supports downloading as a single archive. See my first post in this thread.

If someone just times a normal rpi-update, and the command I gave in my first post, we'd know how much speed difference there was and if it's worth switching. (I'm sure the archive will be faster).

git-core is tiny, and is preinstalled on the latest Raspbian image, so it's not a big concern.

@GrmpCerber

With git:

$ time strace -e trace=read,write -o git.log git clone https://github.com/Hexxeh/rpi-firmware --depth=1
Cloning into 'rpi-firmware'...
remote: Counting objects: 1712, done.
remote: Compressing objects: 100% (1381/1381), done.
remote: Total 1712 (delta 303), reused 1281 (delta 213)
Receiving objects: 100% (1712/1712), 29.41 MiB | 795 KiB/s, done.
Resolving deltas: 100% (303/303), done.
Checking out files: 100% (1467/1467), done.

real 1m40.600s
user 0m23.930s
sys 0m13.720s

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

With wget:

$ time strace -e trace=read,write -o wget.log wget https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
--2013-06-05 16:26:43--  https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
Resolving codeload.github.com (codeload.github.com)... 204.232.175.86
Connecting to codeload.github.com (codeload.github.com)|204.232.175.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: 'master'

    [        <=>                                   ] 31,687,757   942K/s   in 43s

2013-06-05 16:27:35 (713 KB/s) - 'master' saved [31687757]


real    0m52.166s
user    0m19.180s
sys     0m25.250s

$ egrep 'read|write' wget.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
28550717

Unzipping the wget result:

$ time strace -e trace=read,write -o unzip.log unzip master
....
real    0m28.660s
user    0m7.780s
sys     0m11.160s

$ egrep 'read|write' unzip.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
84418077

Which means:

  • Git takes an additional 50s and 26MB of I/O over wget
    • but both seem to transfer about the same amount over the network: git reports 29.41 MiB and wget 31,687,757 bytes (about 30.2 MiB)
  • wget by itself is not enough: you must spend an additional 28s and 80MB of I/O to unzip

So, in the end, I think git wins for the sake of simplicity.
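
The arithmetic behind these bullet points can be reproduced from the figures above (1 MiB = 1,048,576 bytes). Adding the wget and unzip times together also gives the total for the zip route in this single run; the helper and variable names are just labels for the numbers quoted above.

```shell
# Bytes -> MiB with one decimal place.
to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

# Byte totals from the strace/awk runs above.
git_io=56230551; wget_io=28550717; unzip_io=84418077
echo "git I/O beyond wget: $(to_mib $((git_io - wget_io))) MiB"
echo "unzip I/O:           $(to_mib "$unzip_io") MiB"

# Wall-clock "real" times from the three runs, in seconds.
awk 'BEGIN {
    git = 100.600; wget = 52.166; unzip = 28.660
    printf "git - wget:           %.1f s\n", git - wget
    printf "git - (wget + unzip): %.1f s\n", git - (wget + unzip)
}'
```

This prints 26.4 MiB, 80.5 MiB, 48.4 s and 19.8 s, i.e. even including the unzip step, the zip route finished about 20 s sooner in this run.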

@fastcat
fastcat commented Jun 5, 2013

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

This is fishy.

$ sudo du -hs /root/.rpi-firmware
132M    /root/.rpi-firmware

I can't see how git could write 132MB of files doing only 56MB of I/O. I think you need to add -f to your strace calls to follow forks and child processes. That's assuming git/wget/unzip don't use mmap for I/O.

Also, for both the unzip and git checkout stages, the reads should be cheap or free, as the just-downloaded data will probably still be in cache.
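
A sketch of the measurement with fastcat's -f suggestion applied, assuming strace's usual call(...) = N line format. The helper names are hypothetical. Unlike the one-liner above, which splits on the first = and can therefore pick up = characters inside the quoted buffer excerpts, the summing step only uses a trailing ) = N, so error returns such as = -1 EAGAIN are skipped too. I/O done through mmap, or through calls other than read and write, is still not counted.

```shell
# Trace read/write in the command and all of its child processes (-f),
# writing one line per system call to the given log file.
trace_io() {
    log="$1"; shift
    strace -f -e trace=read,write -o "$log" "$@"
}

# Sum the byte counts returned by successful read/write calls.
# Only a trailing ") = N" is used, so error returns (= -1 ...) and '='
# characters inside the quoted buffer excerpts are ignored.
sum_io_bytes() {
    awk 'match($0, /\) = [0-9]+$/) {
        sum += substr($0, RSTART + 4)
    }
    END { printf "%.0f\n", sum }' "$1"
}
```

For example: trace_io git.log git clone --depth=1 https://github.com/Hexxeh/rpi-firmware && sum_io_bytes git.log. With -f and a single -o file, strace prefixes each line with the process ID, which the pattern tolerates.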

@notro
notro commented Aug 12, 2013

I have tested the 'git clone' and 'wget' options.

Test environment:
  • Raspberry Pi Model B rev. 2
  • Class 4 SD card
  • Internet speed: 25 Mb/s

I tested each option twice in case github server caching would impact the result.
Each test run was done with a fresh 2013-07-26-wheezy-raspbian image.
After first boot I expanded the filesystem with raspi-config.

First git clone test

$ time sudo rpi-update

real    2m56.322s
user    0m56.330s
sys     0m28.010s


$ sudo -i
# du -s .rpi-firmware/
134308  .rpi-firmware/

Second git clone test

$ time sudo rpi-update

real    2m53.571s
user    0m55.340s
sys     0m27.340s

First wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    2m2.446s
user    0m44.130s
sys     0m12.930s

$ sudo -i
# du -s .rpi-firmware/
60840   .rpi-firmware/

Second wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    1m56.877s
user    0m44.200s
sys     0m12.400s
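
Averaging each pair of runs makes the comparison easier to read. A sketch with hypothetical helper names, converting the bash time format (e.g. 2m56.322s) to seconds; the disk figures are the du -s values above, assumed to be in GNU du's default 1 KiB blocks:

```shell
# Convert bash `time` output such as "2m56.322s" to seconds
# (assumes the XmY.YYYs form, i.e. runs shorter than an hour).
to_seconds() {
    awk -v t="$1" 'BEGIN { sub(/s$/, "", t); split(t, p, "m"); printf "%.3f\n", p[1] * 60 + p[2] }'
}

# Mean of two numbers, one decimal place.
mean2() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a + b) / 2 }'
}

git_real=$(mean2 "$(to_seconds 2m56.322s)" "$(to_seconds 2m53.571s)")
wget_real=$(mean2 "$(to_seconds 2m2.446s)" "$(to_seconds 1m56.877s)")
echo "git:  ${git_real} s average, 134308 KiB in .rpi-firmware"
echo "wget: ${wget_real} s average, 60840 KiB in .rpi-firmware"
awk -v g="$git_real" -v w="$wget_real" \
    'BEGIN { printf "archive download: %.0f%% less wall-clock time, %.0f%% less disk\n", (1 - w / g) * 100, (1 - 60840 / 134308) * 100 }'
```

With the numbers above it reports averages of 174.9 s (git) and 119.7 s (archive), i.e. about 32% less wall-clock time and 55% less disk space for the archive download.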

@lurch
lurch commented Aug 28, 2013

I haven't done any timing or comparisons myself, but note that sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}') isn't strictly necessary. I've discovered that github.com seems to accept the same "symbolic refs" (or whatever the relevant terminology is) as the git command line, so you can simply run sudo rpi-update HEAD :-)

@popcornmix
Collaborator

Useful.

@lurch
lurch commented Aug 28, 2013

...and this also means that if the latest rpi-update'd firmware prevents your Pi from booting, you can use the offline-update mode of rpi-update on another Linux computer, and ask rpi-update to install the HEAD^ revision, and that'll take you back to the previous firmware revision :-)
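
Both tricks, written out as a sketch that only prints the command so it can be reviewed before running as root. The helper name and the mount points are hypothetical; ROOT_PATH and BOOT_PATH are assumed here to be the environment variables rpi-update reads for this offline mode, so check the script's own documentation before relying on them. Quoting HEAD^ avoids surprises in shells such as zsh where ^ can be a glob character.

```shell
# Print (rather than run) an rpi-update command line for a ref.
# With two extra arguments it targets an SD card mounted elsewhere via
# ROOT_PATH/BOOT_PATH (assumed offline-mode variables).
rpi_update_cmd() {
    ref="$1"
    if [ "$#" -ge 3 ]; then
        printf "sudo ROOT_PATH=%s BOOT_PATH=%s rpi-update '%s'\n" "$2" "$3" "$ref"
    else
        printf "sudo rpi-update '%s'\n" "$ref"
    fi
}

rpi_update_cmd HEAD                                  # latest firmware
rpi_update_cmd 'HEAD^' /media/root /media/boot       # roll back one revision, offline
```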

@popcornmix
Collaborator

Latest update removes the requirement for git, and all downloading uses curl.
Please test, and report if okay.
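
A git-free download along these lines can be sketched with curl and tar. This is not the actual rpi-update code: the codeload tar.gz URL (by analogy with the zip URLs at the top of this thread), the function names and the destination directory are assumptions, and --strip-components needs a tar that supports it, such as GNU tar.

```shell
# Archive URL for a ref (assumed codeload tar.gz pattern).
codeload_tgz_url() {
    printf 'https://codeload.github.com/Hexxeh/rpi-firmware/tar.gz/%s\n' "$1"
}

# Unpack a gzipped tarball read from stdin into DEST, dropping the single
# top-level directory that GitHub archives contain.
extract_archive() {
    mkdir -p "$1" && tar -xzf - --strip-components=1 -C "$1"
}

# Stream the download straight into tar, so no .tar.gz is kept on disk.
# -f makes curl fail on HTTP errors instead of passing an error page to tar.
fetch_firmware_curl() {
    curl -fsSL "$(codeload_tgz_url "${1:-master}")" | extract_archive "$2"
}
```

For example, fetch_firmware_curl master /tmp/rpi-firmware streams the archive straight into tar, so no .tar.gz file is kept on disk.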

@popcornmix popcornmix closed this Apr 17, 2014