Some cloud storage services limit file size. For example, Rackspace Cloud Files does not allow files larger than 3GB.
It would be great to have a WARC splitting feature as part of the website crawler.
--warc-max-size NUM is implemented in v0.25, but the WARC files can still be bigger than NUM bytes, especially when large files are downloaded. Using WARC record segmentation (described in section 7) would fix this.
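To illustrate why a size limit like this can be overshot, here is a minimal sketch. It assumes the size check runs between records and that a record is never split. This is an illustration only, not Wpull's actual code; RollingWriter and its methods are hypothetical names.

```python
import io


class RollingWriter:
    """Hypothetical writer that starts a new file once the current one reaches max_size."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.files = [io.BytesIO()]  # in-memory stand-ins for WARC files

    def write_record(self, record: bytes):
        current = self.files[-1]
        # The size is checked only before a record is written, and the record
        # is never split, so one large record can push a file far past max_size.
        if current.tell() >= self.max_size:
            current = io.BytesIO()
            self.files.append(current)
        current.write(record)

    def sizes(self):
        return [f.tell() for f in self.files]


writer = RollingWriter(max_size=100)
for size in (40, 40, 500, 40):
    writer.write_record(b"x" * size)
print(writer.sizes())  # [580, 40]
```

The first file ends up at 580 bytes despite the 100-byte limit, because the 500-byte record was written whole after the check passed at 80 bytes.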
I think WARC record segmentation would not be helpful. Wpull requires using temporary files and if the temporary file is too large, then Wpull cannot construct the next WARC record.
--warc-max-size can still be used, but managing files on the filesystem is out of scope of Wpull. Feel free to reopen this issue if it is still a problem.
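Since file management is left to the user, one option is a small check before uploading. The sketch below is a hypothetical helper, not part of Wpull. It assumes the WARC files sit in one directory and end in .warc or .warc.gz, and it lists the ones that exceed a given storage limit.

```python
from pathlib import Path


def oversized_warcs(directory, limit_bytes):
    """Return (name, size) for each .warc or .warc.gz file larger than limit_bytes."""
    results = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.name.endswith((".warc", ".warc.gz")):
            size = path.stat().st_size
            if size > limit_bytes:
                results.append((path.name, size))
    return results
```

For the 3GB limit mentioned above, this could be called as oversized_warcs("crawl_output", 3 * 1024**3), where "crawl_output" is a placeholder directory name.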