list_blobs only lists 5000 elements #392

Closed

jmlero opened this issue Jun 19, 2015 · 3 comments

Labels: question · Service Attention · Storage (Queues, Blobs, Files)

Comments

@jmlero commented Jun 19, 2015

Hi all,

I am trying to download a "folder" inside a blob container while keeping the original folder tree structure. This "folder" contains millions of files.
For this reason, I first run list_blobs to obtain the list of blobs, and then download each blob with get_blob_to_path.

blobs = blob_service.list_blobs('blob_container', 'data/projects/folder')
for blob in blobs:
    print(blob.name)

This call only returns the first 5000 blobs inside data/projects/folder, but as I said, I have millions of files.

Any idea why this loop only shows the first 5000 elements?
Any other suggestion to download millions of files inside of a blob container?

Thanks in advance and best regards.

@Segsfault

You'll need to set a marker for containers with > 5k entries. Here is a code snippet I used:

prefix = None      # optionally restrict the listing, e.g. 'data/projects/folder'
delimiter = None   # optionally group results by a path separator, e.g. '/'

marker = None
while True:
    results = blob_service.list_blobs('blob_container', marker=marker,
                                      prefix=prefix, delimiter=delimiter)
    # ...do stuff with results...
    if results.next_marker:
        marker = results.next_marker
    else:
        break

Basically, you set the initial marker to None to start at the beginning, then loop until a result set no longer returns a next marker.
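
For the original question (downloading the whole "folder" while keeping the tree structure), a minimal sketch that combines this pagination with get_blob_to_path could look like the following. It assumes the legacy BlobService client used in the snippets above; the import path, account credentials, and local target directory are placeholders.

import os

from azure.storage import BlobService  # import path differs across legacy SDK versions

# Placeholder credentials and paths -- adjust to your setup.
blob_service = BlobService(account_name='myaccount', account_key='mykey')
container = 'blob_container'
prefix = 'data/projects/folder'   # the "folder" to download
target_root = 'download'          # local destination root

marker = None
while True:
    batch = blob_service.list_blobs(container, prefix=prefix, marker=marker)
    for blob in batch:
        # Blob names carry the full virtual path, so reusing them locally
        # preserves the original tree structure.
        local_path = os.path.join(target_root, blob.name)
        local_dir = os.path.dirname(local_path)
        if local_dir and not os.path.isdir(local_dir):
            os.makedirs(local_dir)
        blob_service.get_blob_to_path(container, blob.name, local_path)
    if batch.next_marker:
        marker = batch.next_marker
    else:
        break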

@jmlero (Author) commented Jun 20, 2015

Hi,

Really, thanks a lot, it is working now. Although I am not using "prefix" and "delimiter", since I am not sure what the purpose of those two parameters is.

Best regards,

@Segsfault

Prefix and delimiter are utility parameters for filtering blob listings. For example, let's say you have a directory structure like this:

.
├── data
│   ├── archive
│   │   ├── old_file1.csv
│   │   └── old_file2.csv
│   ├── file1.csv
│   ├── file2.csv
│   └── file3.csv
└── logs
    └── log.txt

If you set the prefix to 'data', you will get:

data/file1.csv
data/file2.csv
data/file3.csv
data/archive/old_file1.csv
data/archive/old_file2.csv

But if you instead use the prefix 'data/' and also set the delimiter to '/', you will only get:

data/file1.csv
data/file2.csv
data/file3.csv

In this example, I used prefix to target a specific directory, and a delimiter of '/' to specify that I only want the files directly in that directory; the subdirectory contents are rolled up into a virtual directory entry (data/archive/) instead of being listed.

There are many applications of these parameters, but this is what I use them for anyway.
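
As a small sketch of this behaviour, assuming the same legacy blob_service object and the container layout above: note the trailing '/' on the prefix; without it, the blobs under data/ are rolled up into a single virtual directory entry when a delimiter is used.

# Everything under data/, recursively (prefix only):
for blob in blob_service.list_blobs('blob_container', prefix='data/'):
    print(blob.name)   # data/file1.csv ... data/archive/old_file2.csv

# Only the blobs directly under data/ (prefix plus delimiter):
for blob in blob_service.list_blobs('blob_container', prefix='data/', delimiter='/'):
    print(blob.name)   # data/file1.csv, data/file2.csv, data/file3.csv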

@huguesv added the Storage (Queues, Blobs, Files) and question labels on Jul 8, 2015
@huguesv closed this as completed on Jul 8, 2015
@bsiegel added the Service Attention label on Sep 26, 2018
@github-actions bot locked and limited conversation to collaborators on Apr 13, 2023