list_blobs only lists 5000 elements #392

Closed

jmlero opened this issue Jun 19, 2015 · 3 comments

Labels: question · Service Attention · Storage (Queues, Blobs, Files)

Comments

@jmlero commented Jun 19, 2015

Hi all,

I am trying to download a "folder" inside a blob container while keeping the original folder tree structure. This "folder" contains millions of files.
For this reason, I first run list_blobs to obtain the list of blobs, and then download each blob with get_blob_to_path.

blobs = blob_service.list_blobs('blob_container', 'data/projects/folder')
for blob in blobs:
    print(blob.name)

This call only returns the first 5000 blobs inside data/projects/folder, but as I said, I have millions of files.

Any idea why this loop only shows the first 5000 elements?
Any other suggestion to download millions of files inside of a blob container?

Thanks in advance and best regards.

@Segsfault

You'll need to set a marker for containers with > 5k entries. Here is a code snippet I used:

prefix = None      # optionally restrict the listing, e.g. 'data/projects/folder'
delimiter = None   # optionally group results by a path separator, e.g. '/'

marker = None
while True:
    results = blob_service.list_blobs('blob_container', marker=marker,
                                      prefix=prefix, delimiter=delimiter)
    # ...do stuff with results...
    if results.next_marker:
        marker = results.next_marker
    else:
        break

Basically, you set the initial marker to None to start at the beginning, then loop until a result set no longer returns a next marker.
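
For the original question (downloading the whole "folder" while keeping the tree structure), a minimal sketch that combines this pagination with get_blob_to_path could look like the following. It assumes the legacy BlobService client used in the snippets above; the import path, account credentials, and local target directory are placeholders.

import os

from azure.storage import BlobService  # import path differs across legacy SDK versions

# Placeholder credentials and paths -- adjust to your setup.
blob_service = BlobService(account_name='myaccount', account_key='mykey')
container = 'blob_container'
prefix = 'data/projects/folder'   # the "folder" to download
target_root = 'download'          # local destination root

marker = None
while True:
    batch = blob_service.list_blobs(container, prefix=prefix, marker=marker)
    for blob in batch:
        # Blob names carry the full virtual path, so reusing them locally
        # preserves the original tree structure.
        local_path = os.path.join(target_root, blob.name)
        local_dir = os.path.dirname(local_path)
        if local_dir and not os.path.isdir(local_dir):
            os.makedirs(local_dir)
        blob_service.get_blob_to_path(container, blob.name, local_path)
    if batch.next_marker:
        marker = batch.next_marker
    else:
        break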

@jmlero (Author) commented Jun 20, 2015

Hi,

Really, thanks a lot, it is working now. Although I am not using "prefix" and "delimiter", since I am not sure what the purpose of those two parameters is.

Best regards,

@Segsfault

Prefix and delimiter are utility parameters for filtering blob listings. For example, let's say you have a directory structure like this:

.
├── data
│   ├── archive
│   │   ├── old_file1.csv
│   │   └── old_file2.csv
│   ├── file1.csv
│   ├── file2.csv
│   └── file3.csv
└── logs
    └── log.txt

If you set the prefix to 'data', you will get:

data/file1.csv
data/file2.csv
data/file3.csv
data/archive/old_file1.csv
data/archive/old_file2.csv

But if you instead use the prefix 'data/' and also set the delimiter to '/', you will only get:

data/file1.csv
data/file2.csv
data/file3.csv

In this example, I used prefix to target a specific directory, and a delimiter of '/' to specify that I only want the files directly in that directory; the subdirectory contents are rolled up into a virtual directory entry (data/archive/) instead of being listed.

There are many applications of these parameters, but this is what I use them for anyway.
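
As a small sketch of this behaviour, assuming the same legacy blob_service object and the container layout above: note the trailing '/' on the prefix; without it, the blobs under data/ are rolled up into a single virtual directory entry when a delimiter is used.

# Everything under data/, recursively (prefix only):
for blob in blob_service.list_blobs('blob_container', prefix='data/'):
    print(blob.name)   # data/file1.csv ... data/archive/old_file2.csv

# Only the blobs directly under data/ (prefix plus delimiter):
for blob in blob_service.list_blobs('blob_container', prefix='data/', delimiter='/'):
    print(blob.name)   # data/file1.csv, data/file2.csv, data/file3.csv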

@huguesv added the Storage (Queues, Blobs, Files) and question labels on Jul 8, 2015
@huguesv closed this as completed on Jul 8, 2015
@bsiegel added the Service Attention label on Sep 26, 2018
@github-actions bot locked and limited conversation to collaborators on Apr 13, 2023