\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename wget.info
@documentencoding UTF-8
@include version.texi
@settitle GNU Wget @value{VERSION} Manual
@c Disable the monstrous rectangles beside overfull hbox-es.
@finalout
@c Use `odd' to print double-sided.
@setchapternewpage on
@c %**end of header
@iftex
@c Remove this if you don't use A4 paper.
@afourpaper
@end iftex
@c Title for man page. The weird way texi2pod.pl is written requires
@c the preceding @set.
@set Wget Wget
@c man title Wget The non-interactive network downloader.
@dircategory Network applications
@direntry
* Wget: (wget). Non-interactive network downloader.
@end direntry
@copying
This file documents the GNU Wget utility for downloading network
data.
@c man begin COPYRIGHT
Copyright @copyright{} 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2015 Free Software
Foundation, Inc.
@iftex
Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.
@end iftex
@ignore
Permission is granted to process this file through TeX and print the
results, provided the printed document carries a copying permission
notice identical to this one except for the removal of this paragraph
(this paragraph not being relevant to the printed manual).
@end ignore
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with no Front-Cover Texts, and with no Back-Cover
Texts. A copy of the license is included in the section entitled
``GNU Free Documentation License''.
@c man end
@end copying
@titlepage
@title GNU Wget @value{VERSION}
@subtitle The non-interactive download utility
@subtitle Updated for Wget @value{VERSION}, @value{UPDATED}
@author by Hrvoje Nikšić and others
@ignore
@c man begin AUTHOR
Originally written by Hrvoje Nikšić <hniksic@xemacs.org>.
@c man end
@c man begin SEEALSO
This is @strong{not} the complete manual for GNU Wget.
For more complete information, including more detailed explanations of
some of the options, and a number of commands available
for use with @file{.wgetrc} files and the @samp{-e} option, see the GNU
Info entry for @file{wget}.
@c man end
@end ignore
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage
@contents
@ifnottex
@node Top, Overview, (dir), (dir)
@top Wget @value{VERSION}
@insertcopying
@end ifnottex
@menu
* Overview:: Features of Wget.
* Invoking:: Wget command-line arguments.
* Recursive Download:: Downloading interlinked pages.
* Following Links:: The available methods of chasing links.
* Time-Stamping:: Mirroring according to time-stamps.
* Startup File:: Wget's initialization file.
* Examples:: Examples of usage.
* Various:: The stuff that doesn't fit anywhere else.
* Appendices:: Some useful references.
* Copying this manual:: You may give out copies of this manual.
* Concept Index:: Topics covered by this manual.
@end menu
@node Overview, Invoking, Top, Top
@chapter Overview
@cindex overview
@cindex features
@c man begin DESCRIPTION
GNU Wget is a free utility for non-interactive download of files from
the Web. It supports @sc{http}, @sc{https}, and @sc{ftp} protocols, as
well as retrieval through @sc{http} proxies.
@c man end
This chapter is a partial overview of Wget's features.
@itemize @bullet
@item
@c man begin DESCRIPTION
Wget is non-interactive, meaning that it can work in the background,
while the user is not logged on. This allows you to start a retrieval
and disconnect from the system, letting Wget finish the work. By
contrast, most Web browsers require the user's constant presence,
which can be a great hindrance when transferring a lot of data.
@c man end
@item
@ignore
@c man begin DESCRIPTION
@c man end
@end ignore
@c man begin DESCRIPTION
Wget can follow links in @sc{html}, @sc{xhtml}, and @sc{css} pages, to
create local versions of remote web sites, fully recreating the
directory structure of the original site. This is sometimes referred to
as ``recursive downloading.'' While doing that, Wget respects the Robot
Exclusion Standard (@file{/robots.txt}). Wget can be instructed to
convert the links in downloaded files to point at the local files, for
offline viewing.
@c man end
@item
File name wildcard matching and recursive mirroring of directories are
available when retrieving via @sc{ftp}. Wget can read the time-stamp
information given by both @sc{http} and @sc{ftp} servers, and store it
locally. Thus Wget can see if the remote file has changed since the last
retrieval, and automatically retrieve the new version if it has. This
makes Wget suitable for mirroring of @sc{ftp} sites, as well as home
pages.
@item
@ignore
@c man begin DESCRIPTION
@c man end
@end ignore
@c man begin DESCRIPTION
Wget has been designed for robustness over slow or unstable network
connections; if a download fails due to a network problem, it will
keep retrying until the whole file has been retrieved. If the server
supports regetting, it will instruct the server to continue the
download from where it left off.
@c man end
@item
Wget supports proxy servers, which can lighten the network load, speed
up retrieval and provide access behind firewalls. Wget uses passive
@sc{ftp} downloading by default, with active @sc{ftp} available as an option.
@item
Wget supports IP version 6, the next generation of IP. IPv6 is
autodetected at compile-time, and can be disabled at either build or
run time. Binaries built with IPv6 support work well in both
IPv4-only and dual family environments.
@item
Built-in features offer mechanisms to tune which links you wish to follow
(@pxref{Following Links}).
@item
The progress of individual downloads is traced using a progress gauge.
Interactive downloads are tracked using a ``thermometer''-style gauge,
whereas non-interactive ones are traced with dots, each dot
representing a fixed amount of data received (1KB by default). Either
gauge can be customized to your preferences.
@item
Most of the features are fully configurable, either through command line
options, or via the initialization file @file{.wgetrc} (@pxref{Startup
File}). Wget allows you to define @dfn{global} startup files
(@file{/usr/local/etc/wgetrc} by default) for site settings. You can also
specify the location of a startup file with the @samp{--config} option.
@ignore
@c man begin FILES
@table @samp
@item /usr/local/etc/wgetrc
Default location of the @dfn{global} startup file.
@item .wgetrc
User startup file.
@end table
@c man end
@end ignore
@item
Finally, GNU Wget is free software. This means that everyone may use
it, redistribute it and/or modify it under the terms of the GNU General
Public License, as published by the Free Software Foundation (see the
file @file{COPYING} that came with GNU Wget, for details).
@end itemize
@node Invoking, Recursive Download, Overview, Top
@chapter Invoking
@cindex invoking
@cindex command line
@cindex arguments
@cindex nohup
By default, Wget is very simple to invoke. The basic syntax is:
@example
@c man begin SYNOPSIS
wget [@var{option}]@dots{} [@var{URL}]@dots{}
@c man end
@end example
Wget will simply download all the @sc{url}s specified on the command
line. @var{URL} is a @dfn{Uniform Resource Locator}, as defined below.
However, you may wish to change some of the default parameters of
Wget. You can do it in two ways: permanently, by adding the appropriate
command to @file{.wgetrc} (@pxref{Startup File}), or by specifying it on
the command line.
@menu
* URL Format::
* Option Syntax::
* Basic Startup Options::
* Logging and Input File Options::
* Download Options::
* Directory Options::
* HTTP Options::
* HTTPS (SSL/TLS) Options::
* FTP Options::
* Recursive Retrieval Options::
* Recursive Accept/Reject Options::
* Exit Status::
@end menu
@node URL Format, Option Syntax, Invoking, Invoking
@section URL Format
@cindex URL
@cindex URL syntax
@dfn{URL} is an acronym for Uniform Resource Locator. A uniform
resource locator is a compact string representation for a resource
available via the Internet. Wget recognizes the @sc{url} syntax as per
@sc{rfc1738}. This is the most widely used form (square brackets denote
optional parts):
@example
http://host[:port]/directory/file
ftp://host[:port]/directory/file
@end example
You can also encode your username and password within a @sc{url}:
@example
ftp://user:password@@host/path
http://user:password@@host/path
@end example
Either @var{user} or @var{password}, or both, may be left out. If you
leave out either the @sc{http} username or password, no authentication
will be sent. If you leave out the @sc{ftp} username, @samp{anonymous}
will be used. If you leave out the @sc{ftp} password, your email
address will be supplied as a default password.@footnote{If you have a
@file{.netrc} file in your home directory, the password will also be
searched for there.}
@strong{Important Note}: if you specify a password-containing @sc{url}
on the command line, the username and password will be plainly visible
to all users on the system, by way of @code{ps}. On multi-user systems,
this is a big security risk. To work around it, use @code{wget -i -}
and feed the @sc{url}s to Wget's standard input, each on a separate
line, terminated by @kbd{C-d}.
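As a minimal sketch of this workaround, a shell here-document can supply
the @sc{url} on standard input, so that the password does not appear in
Wget's argument list. The host, user name and password are the same
placeholders used above:
@example
wget -i - <<'EOF'
ftp://user:password@@host/path
EOF
@end example
Note that the password is still stored wherever the here-document itself
lives (a script file, for instance), so the permissions of that file
matter.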
You can encode unsafe characters in a @sc{url} as @samp{%xy}, @code{xy}
being the hexadecimal representation of the character's @sc{ascii}
value. Some common unsafe characters include @samp{%} (quoted as
@samp{%25}), @samp{:} (quoted as @samp{%3A}), and @samp{@@} (quoted as
@samp{%40}). Refer to @sc{rfc1738} for a comprehensive list of unsafe
characters.
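The @samp{%xy} form of a character can be derived from its @sc{ascii}
code. The following sketch is not part of Wget; it assumes a
POSIX-compatible shell, whose @code{printf} treats an argument that
begins with a single quote as the numeric code of the character that
follows:
@example
for c in '%' ':' '@@'; do
  printf '%s is quoted as %%%02X\n' "$c" "'$c"
done
@end example
For the three characters listed above, this reports @samp{%25},
@samp{%3A} and @samp{%40}.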
Wget also supports the @code{type} feature for @sc{ftp} @sc{url}s. By
default, @sc{ftp} documents are retrieved in the binary mode (type
@samp{i}), which means that they are downloaded unchanged. Another
useful mode is the @samp{a} (@dfn{ASCII}) mode, which converts the line
delimiters between the different operating systems, and is thus useful
for text files. Here is an example:
@example
ftp://host/directory/file;type=a
@end example
Two alternative variants of @sc{url} specification are also supported,
because of historical (hysterical?) reasons and their widespread use.
@sc{ftp}-only syntax (supported by @code{NcFTP}):
@example
host:/dir/file
@end example
@sc{http}-only syntax (introduced by @code{Netscape}):
@example
host[:port]/dir/file
@end example
These two alternative forms are deprecated, and may cease being
supported in the future.
If you do not understand the difference between these notations, or do
not know which one to use, just use the plain ordinary format you use
with your favorite browser, like @code{Lynx} or @code{Netscape}.
@c man begin OPTIONS
@node Option Syntax, Basic Startup Options, URL Format, Invoking
@section Option Syntax
@cindex option syntax
@cindex syntax of options
Since Wget uses GNU getopt to process command-line arguments, every
option has a long form along with the short one. Long options are
more convenient to remember, but take time to type. You may freely
mix different option styles, or specify options after the command-line
arguments. Thus you may write:
@example
wget -r --tries=10 http://fly.srk.fer.hr/ -o log
@end example
The space between the option accepting an argument and the argument may
be omitted. Instead of @samp{-o log} you can write @samp{-olog}.
You may put several options that do not require arguments together,
like:
@example
wget -drc @var{URL}
@end example
This is completely equivalent to:
@example
wget -d -r -c @var{URL}
@end example
Since the options can be specified after the arguments, you may
terminate them with @samp{--}. So the following will try to download
@sc{url} @samp{-x}, reporting failure to @file{log}:
@example
wget -o log -- -x
@end example
The options that accept comma-separated lists all respect the convention
that specifying an empty list clears its value. This can be useful to
clear the @file{.wgetrc} settings. For instance, if your @file{.wgetrc}
sets @code{exclude_directories} to @file{/cgi-bin}, the following
example will first reset it, and then set it to exclude @file{/~nobody}
and @file{/~somebody}. You can also clear the lists in @file{.wgetrc}
(@pxref{Wgetrc Syntax}).
@example
wget -X '' -X /~nobody,/~somebody
@end example
Most options that do not accept arguments are @dfn{boolean} options,
so named because their state can be captured with a yes-or-no
(``boolean'') variable. For example, @samp{--follow-ftp} tells Wget
to follow FTP links from HTML files and, on the other hand,
@samp{--no-glob} tells it not to perform file globbing on FTP URLs. A
boolean option is either @dfn{affirmative} or @dfn{negative}
(beginning with @samp{--no}). All such options share several
properties.
Unless stated otherwise, it is assumed that the default behavior is
the opposite of what the option accomplishes. For example, the
documented existence of @samp{--follow-ftp} assumes that the default
is to @emph{not} follow FTP links from HTML pages.
Affirmative options can be negated by prepending the @samp{--no-} to
the option name; negative options can be negated by omitting the
@samp{--no-} prefix. This might seem superfluous---if the default for
an affirmative option is to not do something, then why provide a way
to explicitly turn it off? But the startup file may in fact change
the default. For instance, using @code{follow_ftp = on} in
@file{.wgetrc} makes Wget @emph{follow} FTP links by default, and
using @samp{--no-follow-ftp} is the only way to restore the factory
default from the command line.
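For instance, assuming a @file{.wgetrc} that contains
@code{follow_ftp = on} and a placeholder host name, the following
invocation restores the factory default for a single run:
@example
wget -r --no-follow-ftp http://example.com/
@end example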
@node Basic Startup Options, Logging and Input File Options, Option Syntax, Invoking
@section Basic Startup Options
@table @samp
@item -V
@itemx --version
Display the version of Wget.
@item -h
@itemx --help
Print a help message describing all of Wget's command-line options.
@item -b
@itemx --background
Go to background immediately after startup. If no output file is
specified via @samp{-o}, output is redirected to @file{wget-log}.
@cindex execute wgetrc command
@item -e @var{command}
@itemx --execute @var{command}
Execute @var{command} as if it were a part of @file{.wgetrc}
(@pxref{Startup File}). A command thus invoked will be executed
@emph{after} the commands in @file{.wgetrc}, thus taking precedence over
them. If you need to specify more than one wgetrc command, use multiple
instances of @samp{-e}.
@end table
@node Logging and Input File Options, Download Options, Basic Startup Options, Invoking
@section Logging and Input File Options
@table @samp
@cindex output file
@cindex log file
@item -o @var{logfile}
@itemx --output-file=@var{logfile}
Log all messages to @var{logfile}. The messages are normally reported
to standard error.
@cindex append to log
@item -a @var{logfile}
@itemx --append-output=@var{logfile}
Append to @var{logfile}. This is the same as @samp{-o}, only it appends
to @var{logfile} instead of overwriting the old log file. If
@var{logfile} does not exist, a new file is created.
@cindex debug
@item -d
@itemx --debug
Turn on debug output, meaning various information important to the
developers of Wget if it does not work properly. Your system
administrator may have chosen to compile Wget without debug support, in
which case @samp{-d} will not work. Please note that compiling with
debug support is always safe---Wget compiled with the debug support will
@emph{not} print any debug info unless requested with @samp{-d}.
@xref{Reporting Bugs}, for more information on how to use @samp{-d} for
sending bug reports.
@cindex quiet
@item -q
@itemx --quiet
Turn off Wget's output.
@cindex verbose
@item -v
@itemx --verbose
Turn on verbose output, with all the available data. The default output
is verbose.
@item -nv
@itemx --no-verbose
Turn off verbose without being completely quiet (use @samp{-q} for
that), which means that error messages and basic information still get
printed.
@item --report-speed=@var{type}
Output bandwidth as @var{type}. The only accepted value is @samp{bits}.
@cindex input-file
@item -i @var{file}
@itemx --input-file=@var{file}
Read @sc{url}s from a local or external @var{file}. If @samp{-} is
specified as @var{file}, @sc{url}s are read from the standard input.
(Use @samp{./-} to read from a file literally named @samp{-}.)
If this function is used, no @sc{url}s need be present on the command
line. If there are @sc{url}s both on the command line and in an input
file, those on the command line will be the first ones to be
retrieved. If @samp{--force-html} is not specified, then @var{file}
should consist of a series of URLs, one per line.
However, if you specify @samp{--force-html}, the document will be
regarded as @samp{html}. In that case you may have problems with
relative links, which you can solve either by adding @code{<base
href="@var{url}">} to the documents or by specifying
@samp{--base=@var{url}} on the command line.
If the @var{file} is an external one, the document will be automatically
treated as @samp{html} if the Content-Type matches @samp{text/html}.
Furthermore, the @var{file}'s location will be implicitly used as base
href if none was specified.
@cindex input-metalink
@item --input-metalink=@var{file}
Downloads files covered in the local Metalink @var{file}. Metalink versions 3
and 4 are supported.
@cindex metalink-over-http
@item --metalink-over-http
Issues an HTTP HEAD request instead of GET and extracts Metalink metadata
from the response headers. Then it switches to Metalink download.
If no valid Metalink metadata is found, it falls back to ordinary HTTP download.
@cindex preferred-location
@item --preferred-location
Set the preferred location for Metalink resources. This has an effect if multiple
resources with the same priority are available.
@cindex force html
@item -F
@itemx --force-html
When input is read from a file, force it to be treated as an @sc{html}
file. This enables you to retrieve relative links from existing
@sc{html} files on your local disk, by adding @code{<base
href="@var{url}">} to @sc{html}, or using the @samp{--base} command-line
option.
@cindex base for relative links in input file
@item -B @var{URL}
@itemx --base=@var{URL}
Resolves relative links using @var{URL} as the point of reference,
when reading links from an HTML file specified via the
@samp{-i}/@samp{--input-file} option (together with
@samp{--force-html}, or when the input file was fetched remotely from
a server describing it as @sc{html}). This is equivalent to the
presence of a @code{BASE} tag in the @sc{html} input file, with
@var{URL} as the value for the @code{href} attribute.
For instance, if you specify @samp{http://foo/bar/a.html} for
@var{URL}, and Wget reads @samp{../baz/b.html} from the input file, it
would be resolved to @samp{http://foo/baz/b.html}.
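A sketch of the corresponding invocation, assuming a hypothetical local
file @file{links.html} that contains the relative link
@samp{../baz/b.html}:
@example
wget --force-html --input-file=links.html --base=http://foo/bar/a.html
@end example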
@cindex specify config
@item --config=@var{FILE}
Specify the location of a startup file you wish to use.
@item --rejected-log=@var{logfile}
Logs all URL rejections to @var{logfile} as comma separated values. The values
include the reason for rejection, the URL, and the parent URL it was found in.
@end table
@node Download Options, Directory Options, Logging and Input File Options, Invoking
@section Download Options
@table @samp
@cindex bind address
@cindex client IP address
@cindex IP address, client
@item --bind-address=@var{ADDRESS}
When making client TCP/IP connections, bind to @var{ADDRESS} on
the local machine. @var{ADDRESS} may be specified as a hostname or IP
address. This option can be useful if your machine is bound to multiple
IPs.
@cindex retries
@cindex tries
@cindex number of tries
@item -t @var{number}
@itemx --tries=@var{number}
Set number of tries to @var{number}. Specify 0 or @samp{inf} for
infinite retrying. The default is to retry 20 times, with the exception
of fatal errors like ``connection refused'' or ``not found'' (404),
which are not retried.
@item -O @var{file}
@itemx --output-document=@var{file}
The documents will not be written to the appropriate files, but all
will be concatenated together and written to @var{file}. If @samp{-}
is used as @var{file}, documents will be printed to standard output,
disabling link conversion. (Use @samp{./-} to print to a file
literally named @samp{-}.)
Use of @samp{-O} is @emph{not} intended to mean simply ``use the name
@var{file} instead of the one in the URL;'' rather, it is
analogous to shell redirection:
@samp{wget -O file http://foo} is intended to work like
@samp{wget -O - http://foo > file}; @file{file} will be truncated
immediately, and @emph{all} downloaded content will be written there.
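As an illustration with placeholder @sc{url}s, the following command
leaves a single file, @file{all.html}, holding the first document
followed directly by the second:
@example
wget -O all.html http://example.com/a.html http://example.com/b.html
@end example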
For this reason, @samp{-N} (for timestamp-checking) is not supported
in combination with @samp{-O}: since @var{file} is always newly
created, it will always have a very new timestamp. A warning will be
issued if this combination is used.
Similarly, using @samp{-r} or @samp{-p} with @samp{-O} may not work as
you expect: Wget won't just download the first file to @var{file} and
then download the rest to their normal names: @emph{all} downloaded
content will be placed in @var{file}. This was disabled in version
1.11, but has been reinstated (with a warning) in 1.11.2, as there are
some cases where this behavior can actually have some use.
A combination with @samp{-nc} is only accepted if the given output
file does not exist.
Note that a combination with @samp{-k} is only permitted when
downloading a single document, as in that case it will just convert
all relative URIs to external ones; @samp{-k} makes no sense for
multiple URIs when they're all being downloaded to a single file;
@samp{-k} can be used only when the output is a regular file.
@cindex clobbering, file
@cindex downloading multiple times
@cindex no-clobber
@item -nc
@itemx --no-clobber
If a file is downloaded more than once in the same directory, Wget's
behavior depends on a few options, including @samp{-nc}. In certain
cases, the local file will be @dfn{clobbered}, or overwritten, upon
repeated download. In other cases it will be preserved.
When running Wget without @samp{-N}, @samp{-nc}, @samp{-r}, or
@samp{-p}, downloading the same file in the same directory will result
in the original copy of @var{file} being preserved and the second copy
being named @samp{@var{file}.1}. If that file is downloaded yet
again, the third copy will be named @samp{@var{file}.2}, and so on.
(This is also the behavior with @samp{-nd}, even if @samp{-r} or
@samp{-p} are in effect.) When @samp{-nc} is specified, this behavior
is suppressed, and Wget will refuse to download newer copies of
@samp{@var{file}}. Therefore, ``@code{no-clobber}'' is actually a
misnomer in this mode---it's not clobbering that's prevented (as the
numeric suffixes were already preventing clobbering), but rather the
multiple version saving that's prevented.
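The sequence below sketches this behavior for a placeholder @sc{url},
assuming the directory is empty at the start and none of @samp{-N},
@samp{-r} or @samp{-p} is in effect:
@example
wget http://example.com/file       # saved as file
wget http://example.com/file       # saved as file.1
wget -nc http://example.com/file   # file exists, nothing is downloaded
@end example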
When running Wget with @samp{-r} or @samp{-p}, but without @samp{-N},
@samp{-nd}, or @samp{-nc}, re-downloading a file will result in the
new copy simply overwriting the old. Adding @samp{-nc} will prevent
this behavior, instead causing the original version to be preserved
and any newer copies on the server to be ignored.
When running Wget with @samp{-N}, with or without @samp{-r} or
@samp{-p}, the decision as to whether or not to download a newer copy
of a file depends on the local and remote timestamp and size of the
file (@pxref{Time-Stamping}). @samp{-nc} may not be specified at the
same time as @samp{-N}.
A combination with @samp{-O}/@samp{--output-document} is only accepted
if the given output file does not exist.
Note that when @samp{-nc} is specified, files with the suffixes
@samp{.html} or @samp{.htm} will be loaded from the local disk and
parsed as if they had been retrieved from the Web.
@cindex backing up files
@item --backups=@var{backups}
Before (over)writing a file, back up an existing file by adding a
@samp{.1} suffix (@samp{_1} on VMS) to the file name. Such backup
files are rotated to @samp{.2}, @samp{.3}, and so on, up to
@var{backups} (and lost beyond that).
@cindex continue retrieval
@cindex incomplete downloads
@cindex resume download
@item -c
@itemx --continue
Continue getting a partially-downloaded file. This is useful when you
want to finish up a download started by a previous instance of Wget, or
by another program. For instance:
@example
wget -c ftp://sunsite.doc.ic.ac.uk/ls-lR.Z
@end example
If there is a file named @file{ls-lR.Z} in the current directory, Wget
will assume that it is the first portion of the remote file, and will
ask the server to continue the retrieval from an offset equal to the
length of the local file.
Note that you don't need to specify this option if you just want the
current invocation of Wget to retry downloading a file should the
connection be lost midway through. This is the default behavior.
@samp{-c} only affects resumption of downloads started @emph{prior} to
this invocation of Wget, and whose local files are still sitting around.
Without @samp{-c}, the previous example would just download the remote
file to @file{ls-lR.Z.1}, leaving the truncated @file{ls-lR.Z} file
alone.
Beginning with Wget 1.7, if you use @samp{-c} on a non-empty file, and
it turns out that the server does not support continued downloading,
Wget will refuse to start the download from scratch, which would
effectively ruin existing contents. If you really want the download to
start from scratch, remove the file.
Also beginning with Wget 1.7, if you use @samp{-c} on a file that is the
same size as the one on the server, Wget will refuse to download the
file and print an explanatory message. The same happens when the file
is smaller on the server than locally (presumably because it was changed
on the server since your last download attempt)---because ``continuing''
is not meaningful, no download occurs.
On the other side of the coin, while using @samp{-c}, any file that's
bigger on the server than locally will be considered an incomplete
download and only @code{(length(remote) - length(local))} bytes will be
downloaded and tacked onto the end of the local file. This behavior can
be desirable in certain cases---for instance, you can use @samp{wget -c}
to download just the new portion that's been appended to a data
collection or log file.
However, if the file is bigger on the server because it's been
@emph{changed}, as opposed to just @emph{appended} to, you'll end up
with a garbled file. Wget has no way of verifying that the local file
is really a valid prefix of the remote file. You need to be especially
careful of this when using @samp{-c} in conjunction with @samp{-r},
since every file will be considered as an ``incomplete download'' candidate.
Another instance where you'll get a garbled file if you try to use
@samp{-c} is if you have a lame @sc{http} proxy that inserts a
``transfer interrupted'' string into the local file. In the future a
``rollback'' option may be added to deal with this case.
Note that @samp{-c} only works with @sc{ftp} servers and with @sc{http}
servers that support the @code{Range} header.
@cindex offset
@cindex continue retrieval
@cindex incomplete downloads
@cindex resume download
@cindex start position
@item --start-pos=@var{OFFSET}
Start downloading at zero-based position @var{OFFSET}. Offset may be expressed
in bytes, kilobytes with the @samp{k} suffix, or megabytes with the @samp{m} suffix, etc.
@samp{--start-pos} takes precedence over @samp{--continue}. When
@samp{--start-pos} and @samp{--continue} are both specified, Wget will emit a
warning and then proceed as if @samp{--continue} were absent.
Server support for continued download is required, otherwise @samp{--start-pos}
cannot help. See @samp{-c} for details.
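For example, with a placeholder @sc{url}, the following asks the server
for the document starting 10 kilobytes in:
@example
wget --start-pos=10k http://example.com/file
@end example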
@cindex progress indicator
@cindex dot style
@item --progress=@var{type}
Select the type of the progress indicator you wish to use. Legal
indicators are ``dot'' and ``bar''.
The ``bar'' indicator is used by default. It draws an @sc{ascii} progress
bar (also known as a ``thermometer'' display) indicating the status of
retrieval. If the output is not a TTY, the ``dot'' indicator will be used by
default.
Use @samp{--progress=dot} to switch to the ``dot'' display. It traces
the retrieval by printing dots on the screen, each dot representing a
fixed amount of downloaded data.
The progress @var{type} can also take one or more parameters. The parameters
vary based on the @var{type} selected. Parameters to @var{type} are passed by
appending them to the type, separated by colons (:), like this:
@samp{--progress=@var{type}:@var{parameter1}:@var{parameter2}}.
When using the dotted retrieval, you may set the @dfn{style} by
specifying the type as @samp{dot:@var{style}}. Different styles assign
different meaning to one dot. With the @code{default} style each dot
represents 1K, there are ten dots in a cluster and 50 dots in a line.
The @code{binary} style has a more ``computer''-like orientation---8K
dots, 16-dot clusters and 48 dots per line (which makes for 384K
lines). The @code{mega} style is suitable for downloading large
files---each dot represents 64K retrieved, there are eight dots in a
cluster, and 48 dots on each line (so each line contains 3M).
If @code{mega} is not enough then you can use the @code{giga}
style---each dot represents 1M retrieved, there are eight dots in a
cluster, and 32 dots on each line (so each line contains 32M).
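The per-line totals quoted for each style follow from multiplying the dot
size by the number of dots per line. The shell sketch below reproduces
that arithmetic in kilobytes; @code{dot_line_kb} is a hypothetical helper
written for this illustration, not a Wget command.

```shell
# Figures taken from the style descriptions above; sizes are in kilobytes.
# dot_line_kb is a hypothetical helper, not a Wget command.
dot_line_kb() (
  case "$1" in
    default) dot_kb=1;    dots_per_line=50 ;;
    binary)  dot_kb=8;    dots_per_line=48 ;;
    mega)    dot_kb=64;   dots_per_line=48 ;;
    giga)    dot_kb=1024; dots_per_line=32 ;;
    *)       echo "unknown style: $1" >&2; exit 1 ;;
  esac
  echo "$(( dot_kb * dots_per_line ))"
)

for style in default binary mega giga; do
  echo "$style: $(dot_line_kb "$style")K per line"
done
```

The @code{mega} result of 3072K is the 3M per line mentioned above, and
the @code{giga} result of 32768K is 32M.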
With @samp{--progress=bar}, there are currently two possible parameters,
@samp{force} and @samp{noscroll}.
When the output is not a TTY, the progress bar always falls back to ``dot'',
even if @samp{--progress=bar} was passed to Wget during invocation. This
behaviour can be overridden and the ``bar'' output forced by using the ``force''
parameter as @samp{--progress=bar:force}.
By default, the @samp{bar} style progress bar scrolls the name of the file being
downloaded from left to right if the filename exceeds the maximum
length allotted for its display. In certain cases, such as with
@samp{--progress=bar:force}, one may not want the scrolling filename in the
progress bar. By passing the ``noscroll'' parameter, Wget can be forced to
display as much of the filename as possible without scrolling through it.
Note that you can set the default style using the @code{progress}
command in @file{.wgetrc}. That setting may be overridden from the
command line. For example, to force the bar output without scrolling,
use @samp{--progress=bar:force:noscroll}.
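When a script assembles the option, the colon-separated form can be built
by joining the type and its parameters. The following is a small sketch
of one way to do that in a shell script; @code{progress_opt} is a
hypothetical helper, and it only prints the option text without running
Wget.

```shell
# Hypothetical helper: join a progress type and its parameters with colons.
# Inside double quotes, $* joins the arguments with the first character of IFS.
progress_opt() (
  IFS=:
  printf '%s\n' "--progress=$*"
)

progress_opt bar force noscroll
progress_opt dot mega
```

The printed text can then be passed to Wget as a single argument.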
@item --show-progress
Force Wget to display the progress bar at any verbosity level.
By default, Wget only displays the progress bar in verbose mode. One may,
however, want Wget to display the progress bar on screen in conjunction with
any other verbosity mode, such as @samp{--no-verbose} or @samp{--quiet}. This
is often desirable when invoking Wget to download several small or large
files. In such a case, Wget can simply be invoked with this option to get
much cleaner output on the screen.
This option will also force the progress bar to be printed to @file{stderr} when
used alongside the @samp{--output-file} option.
@item -N
@itemx --timestamping
Turn on time-stamping. @xref{Time-Stamping}, for details.
@item --no-if-modified-since
Do not send the @code{If-Modified-Since} header in @samp{-N} mode. Send a preliminary
@code{HEAD} request instead. This option has an effect only in @samp{-N} mode.
@item --no-use-server-timestamps
Don't set the local file's timestamp by the one on the server.
By default, when a file is downloaded, its timestamps are set to
match those from the remote file. This allows the use of
@samp{--timestamping} on subsequent invocations of wget. However, it
is sometimes useful to base the local file's timestamp on when it was
actually downloaded; for that purpose, the
@samp{--no-use-server-timestamps} option has been provided.
@cindex server response, print
@item -S
@itemx --server-response
Print the headers sent by @sc{http} servers and responses sent by
@sc{ftp} servers.
@cindex Wget as spider
@cindex spider
@item --spider
When invoked with this option, Wget will behave as a Web @dfn{spider},
which means that it will not download the pages, just check that they
are there. For example, you can use Wget to check your bookmarks:
@example
wget --spider --force-html -i bookmarks.html
@end example
This feature needs much more work for Wget to get close to the
functionality of real web spiders.
@cindex timeout
@item -T @var{seconds}
@itemx --timeout=@var{seconds}
Set the network timeout to @var{seconds} seconds. This is equivalent
to specifying @samp{--dns-timeout}, @samp{--connect-timeout}, and
@samp{--read-timeout}, all at the same time.
When interacting with the network, Wget can check for timeout and
abort the operation if it takes too long. This prevents anomalies
like hanging reads and infinite connects. The only timeout enabled by
default is a 900-second read timeout. Setting a timeout to 0 disables
it altogether. Unless you know what you are doing, it is best not to
change the default timeout settings.
All timeout-related options accept decimal values, as well as
subsecond values. For example, @samp{0.1} seconds is a legal (though
unwise) choice of timeout. Subsecond timeouts are useful for checking
server response times or for testing network latency.
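The equivalence stated for @samp{--timeout} can be spelled out as the
three specific options it stands for. The sketch below only prints that
expansion; @code{timeout_opts} is a hypothetical helper and Wget itself
is not invoked.

```shell
# Hypothetical helper: expand one timeout value into the three specific
# options that --timeout is documented to set at the same time.
timeout_opts() (
  echo "--dns-timeout=$1 --connect-timeout=$1 --read-timeout=$1"
)

timeout_opts 30
timeout_opts 0.5
```

Writing the three options separately is mainly useful when they should
differ, for example a short connect timeout combined with a longer read
timeout.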
@cindex DNS timeout
@cindex timeout, DNS
@item --dns-timeout=@var{seconds}
Set the DNS lookup timeout to @var{seconds} seconds. DNS lookups that
don't complete within the specified time will fail. By default, there
is no timeout on DNS lookups, other than that implemented by system
libraries.
@cindex connect timeout
@cindex timeout, connect
@item --connect-timeout=@var{seconds}
Set the connect timeout to @var{seconds} seconds. TCP connections that
take longer to establish will be aborted. By default, there is no
connect timeout, other than that implemented by system libraries.
@cindex read timeout
@cindex timeout, read
@item --read-timeout=@var{seconds}
Set the read (and write) timeout to @var{seconds} seconds. The
``time'' of this timeout refers to @dfn{idle time}: if, at any point in
the download, no data is received for more than the specified number
of seconds, reading fails and the download is restarted. This option
does not directly affect the duration of the entire download.
Of course, the remote server may choose to terminate the connection
sooner than this option requires. The default read timeout is 900
seconds.
@cindex bandwidth, limit
@cindex rate, limit
@cindex limit bandwidth
@item --limit-rate=@var{amount}
Limit the download speed to @var{amount} bytes per second. Amount may
be expressed in bytes, kilobytes with the @samp{k} suffix, or megabytes
with the @samp{m} suffix. For example, @samp{--limit-rate=20k} will
limit the retrieval rate to 20KB/s. This is useful when, for whatever
reason, you don't want Wget to consume the entire available bandwidth.
This option allows the use of decimal numbers, usually in conjunction
with power suffixes; for example, @samp{--limit-rate=2.5k} is a legal
value.
Note that Wget implements the limiting by sleeping the appropriate
amount of time after a network read that took less time than specified
by the rate. Eventually this strategy causes the TCP transfer to slow
down to approximately the specified rate. However, it may take some
time for this balance to be achieved, so don't be surprised if limiting
the rate doesn't work well with very small files.
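The sleeping strategy can be pictured with a simplified model: compare
how long a read actually took with how long it should have taken at the
requested rate, and sleep for the difference. The sketch below is an
assumption-laden illustration and not Wget's actual code. The helper
@code{throttle_sleep_ms} is hypothetical, it works in whole milliseconds,
and the sample figure assumes that the @samp{k} suffix denotes 1024
bytes.

```shell
# Simplified model of the strategy described above; not Wget's actual code.
# Arguments: bytes read, milliseconds the read took, limit in bytes/second.
throttle_sleep_ms() (
  expected_ms=$(( $1 * 1000 / $3 ))   # time the read should take at the limit
  sleep_ms=$(( expected_ms - $2 ))
  if [ "$sleep_ms" -lt 0 ]; then sleep_ms=0; fi
  echo "$sleep_ms"
)

# 8192 bytes arrived in 100 ms under a 20480 bytes/s limit (20k, assuming
# k means 1024 bytes): the read should have taken 400 ms.
throttle_sleep_ms 8192 100 20480
```

A read that was already slower than the limit yields a sleep of zero,
which is why very small files may finish before any throttling takes
effect.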
@cindex pause
@cindex wait
@item -w @var{seconds}
@itemx --wait=@var{seconds}
Wait the specified number of seconds between the retrievals. Use of
this option is recommended, as it lightens the server load by making the
requests less frequent. Instead of seconds, the time can be
specified in minutes using the @code{m} suffix, in hours using the @code{h}
suffix, or in days using the @code{d} suffix.
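To make the suffixes concrete, the sketch below converts a few
@samp{--wait} style values to seconds. It is a rough illustration that
handles whole numbers only, and @code{wait_seconds} is a hypothetical
helper rather than anything Wget provides.

```shell
# Hypothetical helper: convert a --wait style value to seconds.
# Handles whole numbers only; a bare number is taken as seconds.
wait_seconds() (
  number=$(printf '%s' "$1" | sed 's/[mhd]$//')
  case "$1" in
    *m) echo "$(( number * 60 ))" ;;
    *h) echo "$(( number * 3600 ))" ;;
    *d) echo "$(( number * 86400 ))" ;;
    *)  echo "$number" ;;
  esac
)

wait_seconds 90
wait_seconds 5m
wait_seconds 2h
wait_seconds 1d
```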
Specifying a large value for this option is useful if the network or the
destination host is down, so that Wget can wait long enough to
reasonably expect the network error to be fixed before the retry. The
waiting interval specified by this option is influenced by
@samp{--random-wait}, which see.
@cindex retries, waiting between
@cindex waiting between retries
@item --waitretry=@var{seconds}
If you don't want Wget to wait between @emph{every} retrieval, but only
between retries of failed downloads, you can use this option. Wget will
use @dfn{linear backoff}, waiting 1 second after the first failure on a
given file, then waiting 2 seconds after the second failure on that
file, up to the maximum number of @var{seconds} you specify.
By default, Wget will assume a value of 10 seconds.
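The linear backoff described above amounts to waiting as many seconds as
there have been failures on the file, capped at the configured maximum.
A minimal sketch of that rule follows; @code{backoff_wait} is a
hypothetical helper and the loop only prints the waits instead of
sleeping.

```shell
# Hypothetical helper: seconds to wait after the Nth consecutive failure
# on a file. Arguments: failure number, --waitretry maximum.
backoff_wait() (
  if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi
)

# Waits after failures 1 to 12 with the default maximum of 10 seconds.
n=1
while [ "$n" -le 12 ]; do
  printf '%s ' "$(backoff_wait "$n" 10)"
  n=$(( n + 1 ))
done
echo
```

The printed sequence climbs by one second per failure and stays at 10
once the maximum is reached.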
@cindex wait, random
@cindex random wait
@item --random-wait
Some web sites may perform log analysis to identify retrieval programs
such as Wget by looking for statistically significant similarities in
the time between requests. This option causes the time between requests
to vary between 0.5 and 1.5 * @var{wait} seconds, where @var{wait} was
specified using the @samp{--wait} option, in order to mask Wget's
presence from such analysis.
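For a given @samp{--wait} value, the bounds of the randomized delay are
simple to work out. The sketch below prints them in milliseconds for a
whole-second wait; @code{random_wait_bounds_ms} is a hypothetical helper
used only for this illustration.

```shell
# Hypothetical helper: lower and upper bound, in milliseconds, of the
# randomized delay (0.5 to 1.5 times the wait) for a whole-second value.
random_wait_bounds_ms() (
  echo "$(( $1 * 500 )) $(( $1 * 1500 ))"
)

# With --wait=2 the delay falls between 1000 ms and 3000 ms.
random_wait_bounds_ms 2
```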
A 2001 article in a publication devoted to development on a popular
consumer platform provided code to perform this analysis on the fly.
Its author suggested blocking at the class C address level to ensure
automated retrieval programs were blocked despite changing DHCP-supplied
addresses.
The @samp{--random-wait} option was inspired by this ill-advised
recommendation to block many unrelated users from a web site due to the
actions of one.
@cindex proxy
@item --no-proxy
Don't use proxies, even if the appropriate @code{*_proxy} environment
variable is defined.
@c man end
@xref{Proxies}, for more information about the use of proxies with
Wget.
@c man begin OPTIONS
@cindex quota
@item -Q @var{quota}
@itemx --quota=@var{quota}
Specify download quota for automatic retrievals. The value can be
specified in bytes (default), kilobytes (with @samp{k} suffix), or
megabytes (with @samp{m} suffix).
Note that quota will never affect downloading a single file. So if you
specify @samp{wget -Q10k ftp://wuarchive.wustl.edu/ls-lR.gz}, all of the
@file{ls-lR.gz} will be downloaded. The same goes even when several
@sc{url}s are specified on the command-line. However, quota is
respected when retrieving either recursively, or from an input file.
Thus you may safely type @samp{wget -Q2m -i sites}---download will be
aborted when the quota is exceeded.
Setting quota to 0 or to @samp{inf} removes the download quota limit.
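The quota behaviour for recursive and input-file retrievals can be
modelled as a check made between files: each file is fetched whole, and
the run stops once the running total exceeds the quota. The sketch below
is a simplified model under that assumption, not Wget's implementation.
The helper @code{files_fetched} is hypothetical, and the sample quota of
2097152 bytes assumes that @samp{2m} means 2 times 1048576 bytes.

```shell
# Simplified model, not Wget's implementation. First argument: quota in
# bytes (0 means unlimited). Remaining arguments: sizes of queued files.
# Prints how many files are retrieved before the quota stops the run.
files_fetched() (
  quota=$1
  shift
  total=0
  count=0
  for size
  do
    total=$(( total + size ))   # a file is always fetched whole
    count=$(( count + 1 ))
    if [ "$quota" -gt 0 ] && [ "$total" -gt "$quota" ]; then
      break                     # quota exceeded: skip the remaining files
    fi
  done
  echo "$count"
)

files_fetched 10240 500000
files_fetched 2097152 1000000 1000000 1000000 1000000
files_fetched 0 1000000 1000000 1000000
```

In the first call the only file is larger than the quota and is still
retrieved in full. In the second, the third file pushes the total past
the quota, so the fourth is never requested.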
@cindex DNS cache
@cindex caching of DNS lookups
@item --no-dns-cache
Turn off caching of DNS lookups. Normally, Wget remembers the IP
addresses it looked up from DNS so it doesn't have to repeatedly
contact the DNS server for the same (typically small) set of hosts it