# Using Versioning in S3 Bucket

This notebook is a detailed, interactive tutorial for learning how versioning works in an Amazon S3 bucket.

Reference:

- [Using Versioning in S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html)

## 01. Prepare Your Playground

First, we need to prepare our development environment for a better learning experience.

- An AWS CLI profile with access to S3. It should have S3 full access and the STS get-caller-identity permission.
- A bucket with versioning turned on (we will create it soon).
- The following Python libraries installed:
    - [boto_session_manager](https://pypi.org/project/boto_session_manager/): boto3 session management made easy
    - [s3pathlib](https://pypi.org/project/s3pathlib/): s3 manipulation made easy
    - [rich](https://pypi.org/project/rich/): for pretty printing

In [1]:
# Enter your AWS Profile here
aws_profile = "awshsh_app_dev_us_east_1"

In [47]:
from rich import print as rprint
from pprint import pprint

def rprint_response(res: dict):
    """
    Pretty print boto3 API response
    """
    if "ResponseMetadata" in res:
        res.pop("ResponseMetadata")
    rprint(res)

In [32]:
from boto_session_manager import BotoSesManager
from s3pathlib import S3Path, context

bsm = BotoSesManager(profile_name=aws_profile)
context.attach_boto_session(bsm.boto_ses)

bucket = f"{bsm.aws_account_id}-{bsm.aws_region}-learn-s3-versioning"

# Create the bucket and turn on versioning
def is_bucket_exists() -> bool:
    try:
        bsm.s3_client.head_bucket(Bucket=bucket)
        return True
    except bsm.s3_client.exceptions.ClientError as e:
        return False

print("Try to create the bucket ...")
if is_bucket_exists() is False:
    kwargs = dict(Bucket=bucket)
    if bsm.aws_region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = dict(LocationConstraint=bsm.aws_region)
    bsm.s3_client.create_bucket(**kwargs)
    print("done bucket is created")
else:
    print("bucket already exists")

print("Try to turn on bucket versioning ...")
response = bsm.s3_client.get_bucket_versioning(
    Bucket=bucket,
)
if response.get("Status") == "Enabled":  # versioning is already enabled
    pass
else:  # versioning was never enabled, or it is suspended
    bsm.s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration=dict(
            Status="Enabled",
        )
    )
print("done")

# verify if bucket versioning is enabled
response = bsm.s3_client.get_bucket_versioning(
    Bucket=bucket,
)
rprint_response(response)
print(f"preview S3 bucket: {S3Path(bucket).console_url}")

Try to create the bucket ...
bucket already exists
Try to turn on bucket versioning ...
done


preview S3 bucket: https://console.aws.amazon.com/s3/buckets/807388292768-us-east-1-learn-s3-versioning?tab=objects
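
The response of ``get_bucket_versioning`` can describe three states: a bucket that never had versioning (no ``Status`` key at all), ``Enabled``, and ``Suspended``. Here is a minimal sketch of how we could classify the response. The helper names ``versioning_state`` and ``needs_enabling`` are made up for this tutorial and are not part of boto3.

```python
def versioning_state(response: dict) -> str:
    """Classify a get_bucket_versioning response into one of three states."""
    # S3 omits the "Status" key when versioning was never configured.
    status = response.get("Status")
    if status is None:
        return "Unversioned"
    return status  # "Enabled" or "Suspended"


def needs_enabling(response: dict) -> bool:
    """True when we still have to call put_bucket_versioning(Status="Enabled")."""
    return versioning_state(response) != "Enabled"


# Hand-written response fragments, not real API output
for fake_response in [{}, {"Status": "Enabled"}, {"Status": "Suspended"}]:
    print(versioning_state(fake_response), needs_enabling(fake_response))
```

With a real client, we would pass in the result of ``bsm.s3_client.get_bucket_versioning(Bucket=bucket)``.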


## 02. Put and Get

With versioning enabled, every time you invoke the [put_object](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/put_object.html) API, a new version of the object is created. Every time you invoke the [get_object](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/get_object.html) API without a version id, you get the latest version of the object.

Let's test it. First, we create a new object in a bucket with versioning turned on. It is also the first version of this object.

Reference:

- [Adding object](https://docs.aws.amazon.com/AmazonS3/latest/userguide/AddingObjectstoVersioningEnabledBuckets.html)
- [Retrieving object versions from a versioning-enabled bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/RetrievingObjectVersions.html)


In [33]:
print("Create a new object, which is also the first version of this object ...")
s3path = S3Path(bucket, "test.txt")
res = bsm.s3_client.put_object(
    Bucket=s3path.bucket,
    Key=s3path.key,
    Body="content v1",
)
rprint_response(res)

v1 = res["VersionId"]
print(f"The version id (v1) = {v1}")

Create a new object, which is also the first version of this object ...


The version id (v1) = qxS7XGuBGR5b8.12S4lrshKOqY.5r6J8


Then we can immediately get the object. By default, the latest version is returned.

In [34]:
print("Get the object ...")
res = bsm.s3_client.get_object(Bucket=s3path.bucket, Key=s3path.key)
rprint_response(res)

content = res["Body"].read().decode("utf-8")
assert content == "content v1"
print("Content = {}".format(content))

v = res["VersionId"]
assert v == v1
print("The version id (v1) = {}".format(v))

Get the object ...


Content = content v1
The version id (v1) = qxS7XGuBGR5b8.12S4lrshKOqY.5r6J8


Then we put new content to this object, which creates a new version of the object. Note that you cannot overwrite an existing version, because the versioning system is designed to keep every version immutable. That's why the ``put_object`` API doesn't have a ``VersionId`` argument.

In [35]:
print("Put a new version of the object ...")
res = bsm.s3_client.put_object(
    Bucket=s3path.bucket,
    Key=s3path.key,
    Body="content v2",
)
v2 = res["VersionId"]
print(f"The version id (v2) = {v2}")

Put a new version of the object ...
The version id (v2) = k59flX7WM58iS.DcZKdVWl46l8wi6_Sw


Now, let's get the object again. By default, it gets the latest version, and we can see that the version id is different from the previous one.

In [36]:
print("Get the object again, now it should be the content of v2 ...")
res = bsm.s3_client.get_object(Bucket=s3path.bucket, Key=s3path.key)

content = res["Body"].read().decode("utf-8")
assert content == "content v2"
print(f"Content = {content}")

v = res["VersionId"]
assert v == v2
print(f"The version id (v2) = {v}")

Get the object again, now it should be the content of v2 ...
Content = content v2
The version id (v2) = k59flX7WM58iS.DcZKdVWl46l8wi6_Sw


We can explicitly get a historical version using its version id.

In [37]:
print("Explicitly get the v1 version ...")
res = bsm.s3_client.get_object(Bucket=s3path.bucket, Key=s3path.key, VersionId=v1)

content = res["Body"].read().decode("utf-8")
assert content == "content v1"
print(f"Content = {content}")

v = res["VersionId"]
assert v == v1
print(f"The version id = {v}")

Explicitly get the v1 version ...
Content = content v1
The version id = qxS7XGuBGR5b8.12S4lrshKOqY.5r6J8


We can also use the [list_object_versions](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/list_object_versions.html) API to list all the historical versions of an object. For each key, the versions are returned in order of last modified time, from the latest to the oldest.

In [39]:
print("List all historical versions")
res = bsm.s3_client.list_object_versions(Bucket=s3path.bucket, Prefix=s3path.key)
rprint_response(res)

n_versions = len(res["Versions"])
print(f"Number of versions = {n_versions}")

List all historical versions


Number of versions = 2
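
Two details are easy to miss in the listing above. ``Prefix`` matches every key that starts with the given string (so ``test.txt`` would also match a key like ``test.txt.bak``), and a single call returns at most 1000 entries by default. Here is a sketch that handles both by using the ``list_object_versions`` paginator and filtering on the exact key. The function name ``iter_versions_of_key`` is our own invention for this tutorial.

```python
from typing import Iterator


def iter_versions_of_key(s3_client, bucket: str, key: str) -> Iterator[dict]:
    """Yield every version entry whose Key equals ``key`` exactly."""
    # The paginator follows the continuation markers for us.
    paginator = s3_client.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # "Versions" is absent from a page that holds no object versions.
        for version in page.get("Versions", []):
            if version["Key"] == key:
                yield version
```

In this notebook it could be used as ``list(iter_versions_of_key(bsm.s3_client, s3path.bucket, s3path.key))``.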


## 03. Delete

There is only one 'delete' API, [delete_object](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/delete_object.html), but there are two ways to use it.

The first way: if you call the ``delete_object`` API without a ``VersionId``, it is a **regular delete**. S3 does not remove any data. Instead, it creates a delete ``Marker`` that becomes the latest version of the object, so the object appears "Deleted". After that, if you call the ``get_object`` API without a ``VersionId``, it returns a 404 error, because S3 looks up the latest version of the object and finds the marker. However, the content and the historical versions are still there, and you can still get them by explicitly giving the ``VersionId``.

The ``Marker`` itself is not an attribute of a version; it is a tiny object with its own ``VersionId``. Note that this ``VersionId`` is the identifier of the marker, NOT the ``VersionId`` of the object version you deleted. You can recover the deleted object by deleting the marker: call ``delete_object`` with the marker's ``VersionId``.

The second way: if you call the ``delete_object`` API with a ``VersionId``, it permanently deletes that specific version. This is also how you delete a ``Marker``.
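
To make the two delete behaviors concrete before we run them against S3, here is a small in-memory toy model. It is only a sketch of the rules described above, not how S3 is implemented: the class ``ToyVersionedBucket`` and its sequential version ids are made up, and it raises ``KeyError`` where S3 would return an HTTP error.

```python
import itertools
from typing import Optional


class ToyVersionedBucket:
    """In-memory toy that mimics how a versioning-enabled bucket stacks versions."""

    def __init__(self):
        # key -> list of (version_id, body), oldest first; body None = delete marker
        self._stacks = {}
        self._counter = itertools.count(1)

    def _push(self, key: str, body: Optional[str]) -> str:
        version_id = f"id-{next(self._counter)}"
        self._stacks.setdefault(key, []).append((version_id, body))
        return version_id

    def put(self, key: str, body: str) -> str:
        # A put never overwrites; it always stacks a new version on top.
        return self._push(key, body)

    def get(self, key: str, version_id: Optional[str] = None) -> str:
        stack = self._stacks.get(key, [])
        if version_id is None:
            # Latest version missing or a delete marker: behaves like a 404.
            if not stack or stack[-1][1] is None:
                raise KeyError(key)
            return stack[-1][1]
        for vid, body in stack:
            if vid == version_id and body is not None:
                return body
        raise KeyError((key, version_id))

    def delete(self, key: str, version_id: Optional[str] = None) -> str:
        if version_id is None:
            # Regular delete: nothing is removed, a marker goes on top.
            return self._push(key, None)
        # Versioned delete: remove exactly that version (or that marker).
        stack = self._stacks.get(key, [])
        self._stacks[key] = [(vid, body) for vid, body in stack if vid != version_id]
        return version_id


toy = ToyVersionedBucket()
v1 = toy.put("test.txt", "content v1")
v2 = toy.put("test.txt", "content v2")
marker = toy.delete("test.txt")      # regular delete, returns the marker id
print(toy.get("test.txt", v1))       # old versions are still readable
toy.delete("test.txt", marker)       # deleting the marker "undeletes" the object
print(toy.get("test.txt"))           # latest is v2 again
```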

Next, we would like to test the deletion behavior. First, we put 3 more versions, so we have more versions to test with.

In [40]:
print("put 3 more versions for testing, we should have 5 versions in total.")
res = bsm.s3_client.put_object(Bucket=s3path.bucket, Key=s3path.key, Body="content v3")
v3 = res["VersionId"]

res = bsm.s3_client.put_object(Bucket=s3path.bucket, Key=s3path.key, Body="content v4")
v4 = res["VersionId"]

res = bsm.s3_client.put_object(Bucket=s3path.bucket, Key=s3path.key, Body="content v5")
v5 = res["VersionId"]

print(f"v3 = {v3}")
print(f"v4 = {v4}")
print(f"v5 = {v5}")

put 3 more versions for testing, we should have 5 versions in total.
v3 = c3VQKiaUgZMqS0y21hHahitBAsT.b9Td
v4 = lnCOajuiT8mQoQzyMurku8b5vTAgO2_c
v5 = 3WXcamR3xEOSmEU6.SyU7eCnWIDe5f0c


Now we try to delete the object. What happens is that the latest object version is marked as deleted, but the content and the historical versions are still there.

In [41]:
print("Delete the object, it marks the latest version as 'Deleted'")
res = bsm.s3_client.delete_object(Bucket=s3path.bucket, Key=s3path.key)
rprint_response(res)

m5 = res["VersionId"]
print(f"Marker Id (m5) = {m5}")

Delete the object, it marks the latest version as 'Deleted'


Marker Id (m5) = VWg5vwE3xD4pQ.gWkDaVngscosGMy.EI


Then, let's try to get the object. S3 will get the latest version of this object, and find out it is marked as deleted, so it will return a 404 error.

In [43]:
print("Get the object, it should return a 404 error")
res = bsm.s3_client.get_object(Bucket=s3path.bucket, Key=s3path.key)

Get the object, it should return a 404 error


NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.
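
In a script we usually don't want this error to stop execution. Here is a sketch of how the ``NoSuchKey`` error could be caught. The helper ``get_text_or_none`` is a made-up name. The ``x-amz-delete-marker`` header check assumes that botocore exposes the HTTP headers under ``ResponseMetadata`` in the error response, so the code falls back to empty dicts if they are absent.

```python
from typing import Optional


def get_text_or_none(s3_client, bucket: str, key: str) -> Optional[str]:
    """Return the latest content, or None when the key is missing or deleted."""
    try:
        res = s3_client.get_object(Bucket=bucket, Key=key)
    except s3_client.exceptions.NoSuchKey as e:
        # Assumption: the error response carries the HTTP headers.
        error_response = getattr(e, "response", {})
        headers = error_response.get("ResponseMetadata", {}).get("HTTPHeaders", {})
        if headers.get("x-amz-delete-marker") == "true":
            print(f"{key!r}: the latest version is a delete marker")
        else:
            print(f"{key!r}: no such key")
        return None
    return res["Body"].read().decode("utf-8")
```

In this notebook it could be called as ``get_text_or_none(bsm.s3_client, s3path.bucket, s3path.key)``.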

In [49]:
print("List all historical versions ...")
print("Now you should see a new field 'DeleteMarkers' and it has a marker object")
res = bsm.s3_client.list_object_versions(Bucket=s3path.bucket, Prefix=s3path.key)
rprint_response(res)
# pprint(res)

List all historical versions ...
Now you should see a new field 'DeleteMarkers' and it has a marker object
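
Because ``Versions`` and ``DeleteMarkers`` come back as two separate lists, it can be hard to see at a glance which entry is the latest one. Here is a sketch that merges both lists into a single timeline, newest first. The function name ``version_timeline`` is our own, and the sample response uses placeholder ids and timestamps, not real output.

```python
from datetime import datetime, timezone


def version_timeline(response: dict) -> list:
    """Merge Versions and DeleteMarkers into (kind, version_id, is_latest) rows."""
    rows = [dict(item, Kind="Version") for item in response.get("Versions", [])]
    rows += [dict(item, Kind="DeleteMarker") for item in response.get("DeleteMarkers", [])]
    # Newest entry first, regardless of which list it came from
    rows.sort(key=lambda row: row["LastModified"], reverse=True)
    return [(row["Kind"], row["VersionId"], row["IsLatest"]) for row in rows]


def _at(minute: int) -> datetime:
    return datetime(2023, 1, 1, 0, minute, tzinfo=timezone.utc)


# Placeholder response fragment, shaped like a list_object_versions result
sample = {
    "Versions": [
        {"Key": "test.txt", "VersionId": "id-v5", "IsLatest": False, "LastModified": _at(5)},
        {"Key": "test.txt", "VersionId": "id-v4", "IsLatest": False, "LastModified": _at(4)},
    ],
    "DeleteMarkers": [
        {"Key": "test.txt", "VersionId": "id-m5", "IsLatest": True, "LastModified": _at(6)},
    ],
}
for row in version_timeline(sample):
    print(row)
```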


In [80]:
# Historical versions stay readable after a regular delete,
# as long as we give the VersionId explicitly
res = bsm.s3_client.get_object(Bucket=s3path.bucket, Key=s3path.key, VersionId=v4)
rprint_response(res)

content = res["Body"].read().decode("utf-8")
v = res["VersionId"]

assert content == "content v4"
assert v == v4

print(f"Content = {content}")
print(f"Version Id = {v}")

Now we can list all versions again.

In [59]:
res1 = bsm.s3_client.list_object_versions(Bucket=s3path.bucket, Prefix=s3path.key)
rprint_response(res1)
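
As described at the start of this section, deleting the marker itself recovers the object. Here is a sketch of that "undelete" step. The helper ``remove_latest_delete_marker`` is a made-up name, and for simplicity it inspects only the first page of the listing.

```python
from typing import Optional


def remove_latest_delete_marker(s3_client, bucket: str, key: str) -> Optional[str]:
    """Delete the marker that currently hides ``key``; return its id, or None."""
    res = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    # "DeleteMarkers" is absent when no marker matches the prefix.
    for marker in res.get("DeleteMarkers", []):
        if marker["Key"] == key and marker["IsLatest"]:
            # Passing the marker's VersionId removes the marker itself,
            # so the previous version becomes the latest again.
            s3_client.delete_object(
                Bucket=bucket,
                Key=key,
                VersionId=marker["VersionId"],
            )
            return marker["VersionId"]
    return None
```

In this notebook, ``remove_latest_delete_marker(bsm.s3_client, s3path.bucket, s3path.key)`` should make a plain ``get_object`` return "content v5" again.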

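Next, we put two versions each of two files under the same folder, so we can see how ``list_object_versions`` behaves with a prefix.

In [24]:
# NOTE: ``prefix`` is not defined earlier in this notebook;
# the value below is an assumed placeholder
prefix = "learn-s3-versioning"
_ = bsm.s3_client.put_object(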
    Bucket=bucket,
    Key=f"{prefix}/folder/file1.txt",
    Body="file1-v1",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file1.txt",
    Body="file1-v2",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file2.txt",
    Body="file2-v1",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file2.txt",
    Body="file2-v2",
)

In [25]:
res = bsm.s3_client.list_object_versions(
    Bucket=bucket,
    Prefix=f"{prefix}/folder/",
)
rprint(res)

In [26]:
res = bsm.s3_client.list_object_versions(
    Bucket=bucket,
    Prefix=f"{prefix}/folder/file1.txt",
)
rprint(res)
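
When the prefix is a folder, the response mixes the versions of every key under it. Here is a sketch that counts the versions per key from such a response. ``count_versions_per_key`` is our own helper name, and the sample keys and ids below are placeholders.

```python
from collections import Counter


def count_versions_per_key(response: dict) -> Counter:
    """Count how many object versions each key has in one response."""
    return Counter(version["Key"] for version in response.get("Versions", []))


# Placeholder response fragment, shaped like a list_object_versions result
sample = {
    "Versions": [
        {"Key": "demo/folder/file1.txt", "VersionId": "id-a2"},
        {"Key": "demo/folder/file1.txt", "VersionId": "id-a1"},
        {"Key": "demo/folder/file2.txt", "VersionId": "id-b2"},
        {"Key": "demo/folder/file2.txt", "VersionId": "id-b1"},
    ]
}
print(count_versions_per_key(sample))
```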