## Get number of s3 objects

Let us go through how to count the objects in an S3 bucket. Along the way, we will see how **Marker** is used to paginate `list_objects` output in boto3.

* One way to get S3 object metadata for a given bucket is to use `list_objects`.
* However, a single `list_objects` call returns metadata for at most 1000 objects.
* We need to paginate using `Marker` and iterate until we have details about all the objects.

Here are the steps we can follow to get the number of S3 objects within a bucket.
* Create an S3 client with the appropriate profile.
* Invoke `list_objects` repeatedly, passing `Marker`, until you have details about all the objects.
* Add the number of elements in `Contents` to the object count. We can break the loop when the `Contents` list has fewer than 1000 entries (or fewer than `MaxKeys`, if it is set) or when `Contents` does not exist in the response.
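The steps above can be sketched as a small reusable function. This is only a minimal sketch: the name `count_objects` and its parameters are made up for illustration, and it assumes the response carries the `IsTruncated` flag so the loop can stop on that flag instead of comparing the page size.

```python
def count_objects(s3_client, bucket, prefix, page_size=1000):
    """Count objects under a prefix by paging list_objects with Marker.

    Hypothetical helper: s3_client is anything exposing list_objects,
    for example the object returned by boto3.client('s3').
    """
    marker = ''
    object_count = 0
    while True:
        response = s3_client.list_objects(
            Bucket=bucket,
            Prefix=prefix,
            Marker=marker,
            MaxKeys=page_size,
        )
        # 'Contents' is absent when nothing matches, so fall back to [].
        contents = response.get('Contents', [])
        object_count += len(contents)
        # Stop on an empty page or when S3 reports no further pages.
        if not contents or not response.get('IsTruncated'):
            break
        # Resume the next call right after the last key we have seen.
        marker = contents[-1]['Key']
    return object_count
```

Once an S3 client exists, it could be called as `count_objects(s3_client, 'itv-genlogs-mana00', 'logs/year')`.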

In [1]:
import boto3

In [2]:
import os
os.environ.setdefault('AWS_PROFILE', 'itvgenlogs')

'itvgenlogs'

In [3]:
s3_client = boto3.client('s3')

### 1. Test Code

In [4]:
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs-mana00',
    Prefix='logs/year'
)

In [5]:
s3_objects.keys()

dict_keys(['ResponseMetadata', 'IsTruncated', 'Marker', 'Contents', 'Name', 'Prefix', 'MaxKeys', 'EncodingType'])

In [6]:
s3_objects['Marker'] # Echoes the Marker sent with the request; empty here because none was passed

''

In [7]:
s3_objects['MaxKeys']

1000

In [8]:
len(s3_objects['Contents'])

7

In [9]:
s3_objects['Contents'][-1]['Key'] # The last element in the list

'logs/year=2024/month=02/day=16/gen_logs_s3-1-2024-02-16-12-58-26-80ac9ca0-1fe2-446b-b0fc-ad0ba6a991a4'

In [10]:
marker = s3_objects['Contents'][-1]['Key'] # Marker is an input parameter: pass the last key read so the next call resumes after it

In [11]:
s3_objects = s3_client.list_objects(
    Bucket='itv-genlogs-mana00',
    Prefix='logs/year',
    Marker=marker
)

In [12]:
s3_objects['Marker']

'logs/year=2024/month=02/day=16/gen_logs_s3-1-2024-02-16-12-58-26-80ac9ca0-1fe2-446b-b0fc-ad0ba6a991a4'

In [20]:
s3_objects['Contents'] # s3_objects is None here, so subscripting it fails

TypeError: 'NoneType' object is not subscriptable
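A response with no matching keys omits `Contents` altogether, so indexing the response dict directly would raise a `KeyError`. One way to guard against that is a small helper built on `dict.get`; the name `keys_in_page` is hypothetical and the sample dictionaries below are trimmed-down stand-ins for real responses.

```python
def keys_in_page(response):
    """Return the object keys in one list_objects response, or [] if none."""
    # .get with a default avoids a KeyError when 'Contents' is missing.
    return [obj['Key'] for obj in response.get('Contents', [])]


# Stand-in responses (not real S3 output): one with data, one without.
page_with_data = {'Contents': [{'Key': 'logs/a'}, {'Key': 'logs/b'}]}
page_without_data = {'Marker': 'logs/b', 'IsTruncated': False}

last_keys = keys_in_page(page_with_data)
no_keys = keys_in_page(page_without_data)
```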

### 2. Real Code

In [14]:
marker = ''
object_count = 0
while True:
    s3_objects = s3_client.list_objects(
        Bucket='itv-genlogs-mana00',
        Prefix='logs/year',
        Marker=marker,
        MaxKeys=200  # Read 200 items per call (the default is 1000)
    ).get('Contents') # Use .get() rather than ['Contents']: the key is absent when no objects remain
    if not s3_objects:
        break
    object_count += len(s3_objects)
    marker = s3_objects[-1]['Key']
    print(marker)

logs/year=2024/month=02/day=16/gen_logs_s3-1-2024-02-16-12-58-26-80ac9ca0-1fe2-446b-b0fc-ad0ba6a991a4


In [15]:
object_count

7
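As an alternative to tracking `Marker` by hand, boto3 clients expose paginators through `get_paginator`, which issue the follow-up requests for us. The sketch below assumes the same bucket and prefix arguments as the loop above; the function name `count_objects_with_paginator` is made up for illustration.

```python
def count_objects_with_paginator(s3_client, bucket, prefix):
    """Count objects under a prefix using the client's list_objects paginator."""
    paginator = s3_client.get_paginator('list_objects')
    object_count = 0
    # Each page is a regular list_objects response dictionary.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # 'Contents' is missing on an empty page, so default to [].
        object_count += len(page.get('Contents', []))
    return object_count
```

With the client created earlier, this could be called as `count_objects_with_paginator(s3_client, 'itv-genlogs-mana00', 'logs/year')`.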