Use wget/zip instead of Git #95

Closed
LordMike opened this issue May 20, 2013 · 21 comments

@LordMike

I was just wondering, why not use wget to fetch the firmware?
GitHub makes it possible to download the entire repository as a .zip file.

For example, here:
https://codeload.github.com/Hexxeh/rpi-firmware/zip/master

Or here, for a specific revision:
https://codeload.github.com/Hexxeh/rpi-firmware/zip/d5b05be2147bf5dc0137798837af24b0bbbe398d

Then you won't need to clone the repository with history and all, but instead just need to unzip the file.
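
The two URL shapes above can be wrapped in a small helper. This is only a sketch: the function name codeload_zip_url is made up for illustration, and it assumes codeload keeps serving the /<owner>/<repo>/zip/<ref> layout seen in the two links.

```shell
#!/bin/sh
# Hypothetical helper: build the codeload .zip URL for a repository and ref.
# The URL layout is copied from the two links quoted above.
codeload_zip_url() {
    repo="$1"            # e.g. Hexxeh/rpi-firmware
    ref="${2:-master}"   # branch name or full commit hash
    printf 'https://codeload.github.com/%s/zip/%s\n' "$repo" "$ref"
}

codeload_zip_url Hexxeh/rpi-firmware
codeload_zip_url Hexxeh/rpi-firmware d5b05be2147bf5dc0137798837af24b0bbbe398d
```

The result could then be handed to wget and unzip, for instance wget -O firmware.zip "$(codeload_zip_url Hexxeh/rpi-firmware)" followed by unzip firmware.zip.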

@popcornmix
Collaborator

The rpi-update <hash> form does download the tar.gz archive. You can use that with the latest hash to do an rpi-update. E.g.
sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

Might be worth timing the two cases to see if it's significantly faster.

(if it says you are up to date, "sudo rm /boot/.firmware_revision" will force it to update).
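
For readers unfamiliar with the pipeline above, here is a rough sketch of what the awk step does. The sample line is made up to look like git ls-remote -h output (hash, tab, ref name), and the hash is simply the one quoted earlier in this thread, not necessarily the current master.

```shell
#!/bin/sh
# Illustrative input only: one line shaped like `git ls-remote -h` output.
sample_line="$(printf 'd5b05be2147bf5dc0137798837af24b0bbbe398d\trefs/heads/master')"

# awk splits on whitespace by default, so $1 is the commit hash.
latest_hash="$(printf '%s\n' "$sample_line" | awk '{print $1}')"
echo "$latest_hash"
```

That hash is what ends up as the argument to rpi-update in the one-liner.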

@GrmpCerber
Contributor

Well, from the test I ran, I understand that git clone ... --depth=1 clones the last commit and its parent, which, in the case of heavily random binary files, might end up downloading twice the size needed.
If that assumption holds, LordMike's solution would spare us quite a few bytes ...

Sidenote : It took me a loooong time to figure out the --depth=1 trick as :

;)

@LordMike
Author

LordMike commented Jun 5, 2013

There's a "depth" parameter?
I did not know that - but still, why use Git at all?

@popcornmix
Collaborator

Why? Because it gives us complete history.

Say you think start.elf had a regression that started about 3 months ago:
https://github.com/Hexxeh/rpi-firmware/commits/master/start.elf

You can download any previous version and see the history of changes.
There would be a lot of storage, bandwidth and infrastructure costs to create our own system that does this.

@GrmpCerber
Contributor

Well, popcornmix, I must admit I think LordMike is right: most users won't do that, at least not locally.
To check something like that, I would naturally go and look directly on GitHub.

Besides, the script is not fit for taking advantage of a full local history because of #64

@popcornmix
Collaborator

@GrmpCerber
I think you misunderstood my point. I'm saying that github provides a web interface to browse the history and download files which allows people to find a specific version of a file that caused a regression.
I'm not suggesting that most will use the git command line, or rpi-update for this.

If we didn't use git at all for firmware, we'd have to invent alternate means to achieve this.

@fastcat
Contributor

fastcat commented Jun 5, 2013

Some observations:

  • Functionally, git clone --depth=1 is equivalent to wget
  • For most users, the limiting factor is going to be internet download bandwidth, not CPU time, even on the Pi
  • Preparing the git clone or the http(s) download takes some time on the server, but one may be faster, e.g. if GitHub caches the .zip for the head revision
  • Having the history available is great, but not needed 99.9% of the time
  • Either the firmware files change dramatically with each build, or git is very poor at binary deltas for these files (Mercurial or SVN might do better at that, but this is github)

Therefore it seems like a few datapoints should be collected, and that can drive whether it would be better to switch to http(s) downloads:

  • Which is faster to start downloading (latency)?
  • Which downloads fewer bytes?
  • Which uses less temp space on the Pi?
  • How many Pi users are installing git-core only to get rpi-update to work?

@popcornmix
Collaborator

github supports downloading as a single archive. See my first post in this thread.

If someone just times a normal rpi-update, and the command I gave in my first post, we'd know how much speed difference there was and if it's worth switching. (I'm sure the archive will be faster).

git-core is tiny, and is preinstalled on the latest Raspbian image, so it's not a big concern.

@GrmpCerber
Contributor

With git

$ time strace -e trace=read,write -o git.log git clone https://github.com/Hexxeh/rpi-firmware --depth=1
Cloning into 'rpi-firmware'...
remote: Counting objects: 1712, done.
remote: Compressing objects: 100% (1381/1381), done.
remote: Total 1712 (delta 303), reused 1281 (delta 213)
Receiving objects: 100% (1712/1712), 29.41 MiB | 795 KiB/s, done.
Resolving deltas: 100% (303/303), done.
Checking out files: 100% (1467/1467), done.

real 1m40.600s
user 0m23.930s
sys 0m13.720s

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

With wget :

$ time strace -e trace=read,write -o wget.log wget https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
--2013-06-05 16:26:43--  https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
Resolving codeload.github.com (codeload.github.com)... 204.232.175.86
Connecting to codeload.github.com (codeload.github.com)|204.232.175.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: 'master'

    [                             <=>                                                                                    ] 31,687,757   942K/s   in 43s

2013-06-05 16:27:35 (713 KB/s) - 'master' saved [31687757]

real    0m52.166s
user    0m19.180s
sys     0m25.250s

$ egrep 'read|write' wget.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
28550717

Unzipping wget result :

$ time strace -e trace=read,write -o unzip.log unzip master
....

real    0m28.660s
user    0m7.780s
sys     0m11.160s

$ egrep 'read|write' unzip.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
84418077

Which means:

  • Git takes an additional 50s and 26MB of I/O over wget
    • but it seems that both transfer about the same amount over the network: git reports 29.41 MiB and wget 31687757 bytes (about 30.2 MiB)
  • wget by itself is not enough: you must spend an additional 28s and 80MB of I/O to unzip

So, in the end, I think that git wins for the sake of simplicity

@fastcat
Contributor

fastcat commented Jun 5, 2013

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

This is fishy.

$ sudo du -hs /root/.rpi-firmware
132M    /root/.rpi-firmware

I can't see how git could write 132MB of files doing only 56MB of I/O. I think you need to add -f to your strace calls to follow forks and child processes. That's assuming git/wget/unzip don't use mmap for I/O.

Also, for both the unzip and git checkout stages, the reads should be cheap/free, as the just-downloaded data will probably still be in cache.
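
A rough sketch of a sturdier byte count, assuming the log comes from something like strace -f -e trace=read,write -o git.log git clone .... The function name sum_io_bytes and the sample log below are made up for illustration; the sample only imitates the layout strace writes to a file with -f (PID prefix, split "unfinished/resumed" calls).

```shell
#!/bin/sh
# Sum the byte counts of completed, successful calls in a strace log.
sum_io_bytes() {
    awk '
        # Keep only lines ending in " = <non-negative integer>". That skips
        # failed calls (" = -1 E..."), "<unfinished ...>" halves and the
        # "+++ exited +++" markers.
        match($0, / = [0-9]+$/) { sum += substr($0, RSTART + 3) }
        END { printf "%.0f\n", sum }
    ' "$1"
}

# Made-up sample in the style of `strace -f -o file` output.
log="$(mktemp)"
cat > "$log" <<'EOF'
1201  read(3, "PACK"..., 4096) = 4096
1202  write(1, "a=b"..., 3) = 3
1202  read(5, 0x7ffc0, 512) = -1 EAGAIN (Resource temporarily unavailable)
1201  read(3,  <unfinished ...>
1201  <... read resumed> "..."..., 1024) = 1024
1201  +++ exited with 0 +++
EOF

sum_io_bytes "$log"
```

On the sample this prints 5123. Splitting on "=" as in the quoted command would likely miscount lines such as the write above, because the printed buffer itself contains an "=", and it would also add -1 for each failed call.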

@notro
Contributor

notro commented Aug 12, 2013

I have tested the 'git clone' and 'wget' options

Test environment
Raspberry Pi B rev. 2
Class 4 SD Card
Internet speed: 25 Mb/s

I tested each option twice in case github server caching would impact the result.
Each test run was done with a fresh 2013-07-26-wheezy-raspbian image.
After first boot I expanded the filesystem with raspi-config.

First git clone test

$ time sudo rpi-update

real    2m56.322s
user    0m56.330s
sys     0m28.010s


$ sudo -i
# du -s .rpi-firmware/
134308  .rpi-firmware/

Second git clone test

$ time sudo rpi-update

real    2m53.571s
user    0m55.340s
sys     0m27.340s

First wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    2m2.446s
user    0m44.130s
sys     0m12.930s

$ sudo -i
# du -s .rpi-firmware/
60840   .rpi-firmware/

Second wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    1m56.877s
user    0m44.200s
sys     0m12.400s
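
As a rough summary of the runs above, the arithmetic can be sketched as follows. All figures are copied from the four timed runs and the two du results; the function name summarize_timings is hypothetical, and two runs per method is a small sample.

```shell
#!/bin/sh
# Average the "real" times per method and compare; du figures are in KB.
summarize_timings() {
    awk 'BEGIN {
        git_avg  = ((2*60 + 56.322) + (2*60 + 53.571)) / 2  # plain rpi-update
        arch_avg = ((2*60 +  2.446) + (1*60 + 56.877)) / 2  # rpi-update <hash>
        saved    = git_avg - arch_avg
        printf "git clone average: %.1f s\n", git_avg
        printf "archive average:   %.1f s\n", arch_avg
        printf "time saved:        %.1f s (%.1f%%)\n", saved, 100 * saved / git_avg
        printf "disk use ratio:    %.1f%%\n", 100 * 60840 / 134308
    }'
}

summarize_timings
```

This works out to roughly 174.9 s against 119.7 s, i.e. about 55.3 s (31.6%) saved, with the archive route leaving about 45.3% as much data in .rpi-firmware.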

@lurch
Contributor

lurch commented Aug 28, 2013

I haven't done any timing or comparisons myself, but just a small point to note that sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}') isn't strictly necessary - I've discovered that github.com seems to accept the same "symbolic refs" (or whatever the relevant terminology is) as the git command line, so you can simply run sudo rpi-update HEAD :-)

@popcornmix
Collaborator

Useful.

@lurch
Contributor

lurch commented Aug 28, 2013

...and this also means that if the latest rpi-update'd firmware prevents your Pi from booting, you can use the offline-update mode of rpi-update on another Linux computer, and ask rpi-update to install the HEAD^ revision, and that'll take you back to the previous firmware revision :-)

@popcornmix
Collaborator

Latest update removes the requirement for git, and all downloading uses curl.
Please test, and report if okay.

@xyd945

xyd945 commented Apr 27, 2017

When I use wget to download a compressed file (e.g. a tar, tar.gz or zip file), I fail to unzip it. The error is: Can't extract files from the archive, you missed the archive name!

Any idea about that?

@popcornmix
Collaborator

Post the exact commands you entered and errors reported and we may be able to help.

@xyd945

xyd945 commented Apr 29, 2017

I use Raspbian, the OS on my Raspberry Pi. I compressed a folder to a tar.gz file and uploaded it to the master branch of my GitHub repository.

Then I clicked the tar.gz file, copied the link from the address bar, and ran wget http://aaa/aa/aaa/file.tar.gz, which downloaded the file onto my Raspberry Pi.

Then I ran tar -xzvf file.tar.gz to unpack it. It says that it cannot extract because there is no archive. The strange thing is that when I use the download button on GitHub to download the same file, it works well.

@popcornmix
Collaborator

Please post the actual wget url, not one with aaa/aaa.

@lurch
Contributor

lurch commented May 3, 2017

You probably just need to get the right link to the 'raw' URL, rather than the HTML preview that Github offers you by default.

@GrmpCerber
Contributor

@xyd945 could you try running the file command on the file you downloaded?
(e.g. file file.tar.gz)
For a tar.gz it should say something like "gzip compressed data" (or "Zip archive data" for a .zip); if it says HTML or empty, then the link is the problem.
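
If the file command is not at hand, a similar check can be sketched by looking at the first two bytes, since gzip data starts with 1f 8b. The function name looks_like_gzip and the temporary files are made up for illustration; this is an alternative to the file command, not what it does internally.

```shell
#!/bin/sh
# Hypothetical check: does the file start with the gzip magic bytes 1f 8b?
looks_like_gzip() {
    magic="$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')"
    [ "$magic" = "1f8b" ]
}

# Two throwaway files: real gzip data, and an HTML page saved under a
# .tar.gz name (what a non-raw GitHub link would typically give you).
tmpdir="$(mktemp -d)"
printf 'hello\n' | gzip -c > "$tmpdir/real.tar.gz"
printf '<!DOCTYPE html>\n<html></html>\n' > "$tmpdir/page.tar.gz"

for f in "$tmpdir/real.tar.gz" "$tmpdir/page.tar.gz"; do
    if looks_like_gzip "$f"; then
        echo "$f: gzip data"
    else
        echo "$f: not gzip (possibly an HTML page)"
    fi
done
```

A file that fails this check will also fail tar -xzvf, which would match the symptom described above.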
