Use wget/zip instead of Git #95

Closed
LordMike opened this Issue May 20, 2013 · 21 comments

@LordMike

I was just wondering, why not use wget to fetch the firmware?
GitHub makes it possible to download the entire repository as a .zip file.

For example, here:
https://codeload.github.com/Hexxeh/rpi-firmware/zip/master

Or here, for a specific revision
https://codeload.github.com/Hexxeh/rpi-firmware/zip/d5b05be2147bf5dc0137798837af24b0bbbe398d

Then you won't need to clone the repository with history and all, but instead just need to unzip the file.

popcornmix May 20, 2013

Collaborator

The rpi-update <hash> form does download the tar.gz archive. You can use that with the latest hash to do an rpi-update. E.g.
sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

Might be worth timing the two cases to see if it's significantly faster.

(if it says you are up to date, "sudo rm /boot/.firmware_revision" will force it to update).
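
For readers who want the one-liner above spelled out, here is a minimal sketch of its two steps: resolve the hash that master currently points at, then pass it on. The function names latest_hash and zip_url are made up for illustration and are not part of rpi-update; zip_url only builds the codeload URL form quoted in the opening post.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of rpi-update.
REPO="Hexxeh/rpi-firmware"

# Print the commit hash that refs/heads/master currently points at.
# (Needs git and network access.)
latest_hash() {
    git ls-remote -h "https://github.com/${REPO}" refs/heads/master | awk '{print $1}'
}

# Build the zip download URL for a given revision, using the
# codeload URL form from the opening post.
zip_url() {
    printf 'https://codeload.github.com/%s/zip/%s\n' "$REPO" "$1"
}

# Example use (commented out because it touches the network and /boot):
#   sudo rpi-update "$(latest_hash)"
#   wget "$(zip_url "$(latest_hash)")"
```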

GrmpCerber Jun 4, 2013

Contributor

Well, from the test I ran, I understand that git clone ... --depth=1 clones the last commit and its parent, which, with binary files that look essentially random to git, might end up downloading twice the size needed.
If that assumption holds, LordMike's solution would spare us quite a few bytes...

Side note: it took me a loooong time to figure out the --depth=1 trick, as:

  • I'm not "git-fluent"
  • it is not highlighted in @churten's comment in #64
  • it's not commented in the code

;)

LordMike Jun 5, 2013

There's a "depth" parameter?
I did not know that - but still, why use Git at all?

popcornmix Jun 5, 2013

Collaborator

Why? Because it gives us complete history.

Say you think start.elf had a regression that started about 3 months ago. From
https://github.com/Hexxeh/rpi-firmware/commits/master/start.elf

you can download any previous version and see the history of changes.
Creating our own system that does this would incur a lot of storage, bandwidth and infrastructure costs.

GrmpCerber Jun 5, 2013

Contributor

Well, popcornmix, I must admit that, in my opinion, LordMike is right: most users won't do that.
To check something like that, I would naturally go and look directly on GitHub rather than locally.

Besides, the script is not fit for taking advantage of a full local history because of #64.

popcornmix Jun 5, 2013

Collaborator

@GrmpCerber
I think you misunderstood my point. I'm saying that GitHub provides a web interface to browse the history and download files, which allows people to find the specific version of a file that caused a regression.
I'm not suggesting that most will use the git command line, or rpi-update for this.

If we didn't use git at all for firmware, we'd have to invent alternate means to achieve this.

fastcat Jun 5, 2013

Contributor

Some observations:

  • Functionally, git clone --depth=1 is equivalent to wget
  • For most users, the limiting factor is going to be internet download bandwidth, not CPU time, even on the Pi
  • Preparing the git clone or the http(s) download takes some time on the server, but one may be faster, esp. e.g. if github caches the .zip for the head revision
  • Having the history available is great, but not needed 99.9% of the time
  • Either the firmware files change dramatically with each build, or git is very poor at binary deltas for these files (Mercurial or SVN might do better at that, but this is github)

Therefore it seems like a few datapoints should be collected, and that can drive whether it would be better to switch to http(s) downloads:

  • Which is faster to start downloading (latency)?
  • Which downloads fewer bytes?
  • Which uses less temp space on the Pi?
  • How many Pi users are installing git-core only to get rpi-update to work?

popcornmix Jun 5, 2013

Collaborator

GitHub supports downloading as a single archive. See my first post in this thread.

If someone just times a normal rpi-update and the command I gave in my first post, we'd know how much speed difference there is and whether it's worth switching. (I'm sure the archive will be faster.)

git-core is tiny, and is preinstalled on the latest Raspbian image, so it's not a big concern.

GrmpCerber Jun 5, 2013

Contributor

With git

$ time strace -e trace=read,write -o git.log git clone https://github.com/Hexxeh/rpi-firmware --depth=1
Cloning into 'rpi-firmware'...
remote: Counting objects: 1712, done.
remote: Compressing objects: 100% (1381/1381), done.
remote: Total 1712 (delta 303), reused 1281 (delta 213)
Receiving objects: 100% (1712/1712), 29.41 MiB | 795 KiB/s, done.
Resolving deltas: 100% (303/303), done.
Checking out files: 100% (1467/1467), done.

real 1m40.600s
user 0m23.930s
sys 0m13.720s

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

With wget :

$ time strace -e trace=read,write -o wget.log wget https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
--2013-06-05 16:26:43--  https://codeload.github.com/Hexxeh/rpi-firmware/zip/master
Resolving codeload.github.com (codeload.github.com)... 204.232.175.86
Connecting to codeload.github.com (codeload.github.com)|204.232.175.86|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: 'master'
    [                             <=>                                                                                    ] 31 687 757   942K/s   in 43s
2013-06-05 16:27:35 (713 KB/s) - 'master' saved [31687757]
real    0m52.166s
user    0m19.180s
sys     0m25.250s
$ egrep 'read|write' wget.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
28550717

Unzipping wget result :

$ time strace -e trace=read,write -o unzip.log unzip master
....
real    0m28.660s
user    0m7.780s
sys     0m11.160s
$ egrep 'read|write' unzip.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
84418077

Which means:

  • Git takes an additional 50s and 26MB of I/O over wget
    • but it seems that both transfer about the same amount over the network: git reports 29.41 MiB and wget 31 687 757 bytes
  • wget by itself is not enough: you must spend an additional 28s and 80MB of I/O to unzip

So, in the end, I think that git wins for the sake of simplicity.

fastcat Jun 5, 2013

Contributor

$ egrep 'read|write' git.log | awk 'BEGIN {FS="="}{ sum += $2} END {print sum}'
56230551

This is fishy.

$ sudo du -hs /root/.rpi-firmware
132M    /root/.rpi-firmware

I can't see how git could write 132MB of files doing only 56MB of I/O. I think you need to add -f to your strace calls to follow forks and child processes. That's assuming git/wget/unzip don't use mmap for I/O.

Also, for both the unzip and git checkout stages, the reads should be cheap/free, as the just-downloaded data will probably still be in cache.
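
A sketch of what such a re-measurement could look like, assuming strace's usual log format in which each completed call ends in " = <return value>" and, with -f and -o, each line starts with a PID. The helper name sum_io is hypothetical. It reads the trailing return value instead of splitting on "=", so buffers that happen to contain "=" don't skew the sum, and it reports reads and writes separately.

```shell
#!/bin/sh
# Hypothetical helper: sum the bytes returned by successful read()/write()
# calls in an strace log and print "read=<bytes> write=<bytes>".
sum_io() {
    awk '
        {
            line = $0
            sub(/^[0-9]+ +/, "", line)   # drop the PID prefix that -f adds
            sub(/^<\.\.\. /, "", line)   # "<... read resumed>" -> "read resumed>"
        }
        # Count only lines ending in " = <non-negative integer>"; failed calls
        # ("= -1 EAGAIN ...") and "<unfinished ...>" lines are skipped.
        match(line, / = [0-9]+$/) {
            n = substr(line, RSTART + 3) + 0
            if (line ~ /^read[( ]/)       r += n
            else if (line ~ /^write[( ]/) w += n
        }
        END { print "read=" (r + 0) " write=" (w + 0) }
    ' "$1"
}

# Re-run of the measurement, following child processes with -f
# (commented out: needs strace, git and network access):
#   strace -f -e trace=read,write -o git.log \
#       git clone --depth=1 https://github.com/Hexxeh/rpi-firmware
#   sum_io git.log
```

This still only sees read/write system calls, so any I/O done through mmap would remain invisible, as noted above.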

notro Aug 12, 2013

Contributor

I have tested the 'git clone' and 'wget' options

Test environment
Raspberry Pi B rev. 2
Class 4 SD Card
Internet speed: 25 Mb/s

I tested each option twice in case github server caching would impact the result.
Each test run was done with a fresh 2013-07-26-wheezy-raspbian image.
After first boot I expanded the filesystem with raspi-config.

First git clone test

$ time sudo rpi-update

real    2m56.322s
user    0m56.330s
sys     0m28.010s


$ sudo -i
# du -s .rpi-firmware/
134308  .rpi-firmware/

Second git clone test

$ time sudo rpi-update

real    2m53.571s
user    0m55.340s
sys     0m27.340s

First wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    2m2.446s
user    0m44.130s
sys     0m12.930s

$ sudo -i
# du -s .rpi-firmware/
60840   .rpi-firmware/

Second wget test

$ time sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}')

real    1m56.877s
user    0m44.200s
sys     0m12.400s
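
Averaging the two runs of each method makes the comparison easier to read. The sketch below only does the arithmetic on the "real" times listed above; the helper names to_seconds and mean_seconds are made up for illustration.

```shell
#!/bin/sh
# Convert a "real" time such as 2m56.322s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Mean of two "real" times, in seconds, rounded to one decimal place.
mean_seconds() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.1f\n", (a + b) / 2 }'
}

# The four measurements from the test runs above.
git_mean=$(mean_seconds 2m56.322s 2m53.571s)
wget_mean=$(mean_seconds 2m2.446s 1m56.877s)
echo "git clone: ${git_mean}s, archive: ${wget_mean}s"
```

That works out to roughly 175 s for the git clone path against roughly 120 s for the archive path, alongside the 134308 versus 60840 blocks reported by du.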

lurch Aug 28, 2013

Contributor

I haven't done any timing or comparisons myself, but just a small point to note that sudo rpi-update $(git ls-remote -h https://github.com/Hexxeh/rpi-firmware refs/heads/master | awk '{print $1}') isn't strictly necessary - I've discovered that github.com seems to accept the same "symbolic refs" (or whatever the relevant terminology is) as the git command line, so you can simply run sudo rpi-update HEAD :-)

popcornmix Aug 28, 2013

Collaborator

Useful.

lurch Aug 28, 2013

Contributor

...and this also means that if the latest rpi-update'd firmware prevents your Pi from booting, you can use the offline-update mode of rpi-update on another Linux computer, and ask rpi-update to install the HEAD^ revision, and that'll take you back to the previous firmware revision :-)

popcornmix Apr 5, 2014

Collaborator

Latest update removes the requirement for git, and all downloading uses curl.
Please test, and report if okay.

@popcornmix popcornmix closed this Apr 17, 2014

xyd945 Apr 27, 2017

When I use wget to download a compressed file (e.g. a tar, tar.gz or zip file), I fail to unzip it. The error is: Can't extract files from the archive, you missed the archive name!

Any idea about that?

popcornmix Apr 28, 2017

Collaborator

Post the exact commands you entered and errors reported and we may be able to help.

xyd945 Apr 29, 2017

I use Raspbian as the OS on my Raspberry Pi. I compressed a folder into a tar.gz file and uploaded it to the master branch of my GitHub repository.

Then I clicked the tar.gz file, copied the link from the address bar, and ran wget http://aaa/aa/aaa/file.tar.gz, which downloaded the file onto my Raspberry Pi.

Then I ran tar -xzvf file.tar.gz to extract it. It reports that it cannot extract because there is no archive. The strange thing is that when I use the download button on GitHub to download the same file, it works well.

popcornmix Apr 29, 2017

Collaborator

Please post the actual wget url, not one with aaa/aaa.

lurch May 3, 2017

Contributor

You probably just need to get the right link to the 'raw' URL, rather than the HTML preview that GitHub offers you by default.

GrmpCerber May 10, 2017

Contributor

@xyd945 could you try running the file command on the file you downloaded?
(e.g. file file.tar.gz)
It should report an archive type such as "gzip compressed data" (or "Zip archive data" for a .zip); if it reports HTML or empty, then the link is the problem.
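
To make the two suggestions above concrete, here is a minimal sketch. It assumes the copied address was a github.com ".../blob/..." page URL, which serves an HTML page, and rewrites it to the matching ".../raw/..." URL, which serves the file itself. The helper names and the example-user/example-repo path are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: turn a GitHub "blob" page URL into the "raw" file URL.
blob_to_raw() {
    printf '%s\n' "$1" |
        sed 's#^\(https://github\.com/[^/]*/[^/]*\)/blob/#\1/raw/#'
}

# Succeed if the file starts with the gzip magic bytes 1f 8b,
# i.e. it is a real .gz download rather than a saved HTML page.
is_gzip() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Example use (commented out: needs network access):
#   wget -O file.tar.gz "$(blob_to_raw https://github.com/example-user/example-repo/blob/master/file.tar.gz)"
#   is_gzip file.tar.gz && tar -xzvf file.tar.gz
```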
