The commoncrawl bucket stores many files containing lists of URLs, which you can use to download a collection of files produced by a web crawler.
For this task, the bucket commoncrawl
and the key crawl-data/CC-MAIN-2022-05/wet.paths.gz
were provided. The objective is to pick a URL from the path file located in the bucket and download a file containing the web crawler's data.
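The steps above can be sketched in Python. This is a minimal example, not the repository's actual code: it assumes anonymous (unsigned) S3 access via boto3, and that entries in wet.paths.gz are relative keys that can also be fetched from the public HTTPS mirror at data.commoncrawl.org (the bucket name and key come from the task description; the mirror URL and output filename are assumptions).

```python
import gzip
import io

# Bucket and key from the task description; the HTTPS mirror is an assumption.
BUCKET = "commoncrawl"
PATHS_KEY = "crawl-data/CC-MAIN-2022-05/wet.paths.gz"
HTTPS_PREFIX = "https://data.commoncrawl.org/"


def full_url(path: str) -> str:
    """Turn a relative key from wet.paths.gz into a downloadable HTTPS URL."""
    return HTTPS_PREFIX + path.strip()


def pick_first_path(paths_gz_bytes: bytes) -> str:
    """Decompress the gzipped path file and return its first entry."""
    with gzip.open(io.BytesIO(paths_gz_bytes), "rt") as fh:
        return fh.readline().strip()


if __name__ == "__main__":
    # Unsigned requests let you read the public bucket without AWS credentials
    # (requires `pip install boto3`).
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    obj = s3.get_object(Bucket=BUCKET, Key=PATHS_KEY)
    first = pick_first_path(obj["Body"].read())
    print("Downloading", full_url(first))
    # Download the crawler-data file itself straight from S3.
    s3.download_file(BUCKET, first, "crawl-data.warc.wet.gz")
```

The network calls are kept under the `__main__` guard so the two helpers can be reused or tested without touching S3.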
Krisalyd/aws-s3-file-downloader
Testing file download from AWS's S3 Bucket with Python.