# Using Versioning in S3 Bucket

This notebook is a hands-on, interactive tutorial that shows how S3 object versioning behaves when you put, get, list, and delete objects.

Reference:

- [Using Versioning in S3 Bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Versioning.html)

## Prepare Your Playground

First, we need to prepare the development environment:

- An AWS CLI profile with S3 full access and permission to call STS `get-caller-identity`.
- A bucket with versioning turned on (we will create it soon).
- The following Python libraries installed:
    - [boto_session_manager](https://pypi.org/project/boto_session_manager/): boto3 session management made easy
    - [s3pathlib](https://pypi.org/project/s3pathlib/): S3 manipulation made easy
    - [rich](https://pypi.org/project/rich/): pretty printing

In [1]:
# Enter your AWS Profile here
aws_profile = "awshsh_app_dev_us_east_1"

In [13]:
from rich import print as rprint

def rprint_response(res: dict):
    if "ResponseMetadata" in res:
        res.pop("ResponseMetadata")
    rprint(res)

In [18]:
from boto_session_manager import BotoSesManager
from s3pathlib import S3Path, context

bsm = BotoSesManager(profile_name=aws_profile)
context.attach_boto_session(bsm.boto_ses)

bucket = f"{bsm.aws_account_id}-{bsm.aws_region}-learn-s3-versioning"

# Create the bucket and turn on versioning
def is_bucket_exists() -> bool:
    # head_bucket raises ClientError when the bucket is missing or not accessible
    try:
        bsm.s3_client.head_bucket(Bucket=bucket)
        return True
    except bsm.s3_client.exceptions.ClientError:
        return False

if not is_bucket_exists():
    kwargs = dict(Bucket=bucket)
    if bsm.aws_region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = dict(LocationConstraint=bsm.aws_region)
    bsm.s3_client.create_bucket(**kwargs)

response = bsm.s3_client.get_bucket_versioning(
    Bucket=bucket,
)
# "Status" is missing if versioning was never configured, and "Suspended" if it was turned off
if response.get("Status") != "Enabled":
    bsm.s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration=dict(
            Status="Enabled",
        )
    )

# verify if bucket versioning is enabled
response = bsm.s3_client.get_bucket_versioning(
    Bucket=bucket,
)
rprint_response(response)
print(f"preview S3 bucket: {S3Path(bucket).console_url}")

preview S3 bucket: https://console.aws.amazon.com/s3/buckets/807388292768-us-east-1-learn-s3-versioning?tab=objects
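
The `get_bucket_versioning` response carries a `Status` of `Enabled` or `Suspended`, and omits the key entirely when versioning has never been configured on the bucket. A minimal sketch of a helper that turns the response into a readable state follows; the function name and the `"Unversioned"` label are illustrative choices, not part of boto3.

```python
def versioning_state(response: dict) -> str:
    """Map a get_bucket_versioning response to a readable state.

    Hypothetical helper: "Unversioned" is a label chosen here for the
    case where S3 returns no "Status" key at all.
    """
    return response.get("Status", "Unversioned")


# Hand-written sample responses, shaped like the real ones
for sample in ({}, {"Status": "Enabled"}, {"Status": "Suspended"}):
    print(sample, "->", versioning_state(sample))
```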


## Put and Get

First, we create a new object in the versioning-enabled bucket. This is also the first version of the object.

In [71]:
s3path = S3Path(bucket, "test.txt")
key = s3path.key  # the object key, reused by the cells below
res = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=key,
    Body="v1",
)
rprint(res)
v1 = res["VersionId"]
print(f"The version id = {v1}")

The version id = BLoyX9X9xLmHSx8gLgNzWUKI7kwwkuNz


Then we can immediately get the object. By default, the latest version is returned.

In [72]:
res = bsm.s3_client.get_object(Bucket=bucket, Key=key)
rprint(res)
print("Content = {}".format(res["Body"].read().decode("utf-8")))
print("The version id = {}".format(res["VersionId"]))

Content = v1
The version id = BLoyX9X9xLmHSx8gLgNzWUKI7kwwkuNz


Then we put new content to this object, which creates a new version of it. Note that you cannot overwrite an existing version, because versions are immutable by design.

In [73]:
res = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=key,
    Body="v2",
)
v2 = res["VersionId"]
print(f"The version id = {v2}")

The version id = 36dbatEjGuecZop4MARBZMwfORr4k9Hc


We get the object again. By default, S3 returns the latest version, and we can see that its version id is different from the previous one.

In [74]:
res = bsm.s3_client.get_object(Bucket=bucket, Key=key)

content = res["Body"].read().decode("utf-8")
assert content == "v2"
print(f"Content = {content}")

v = res["VersionId"]
assert v == v2
print(f"The version id = {v}")

Content = v2
The version id = 36dbatEjGuecZop4MARBZMwfORr4k9Hc


We can explicitly get a historical version by passing its version id.

In [75]:
res = bsm.s3_client.get_object(Bucket=bucket, Key=key, VersionId=v1)

content = res["Body"].read().decode("utf-8")
assert content == "v1"
print(f"Content = {content}")

v = res["VersionId"]
assert v == v1
print(f"The version id = {v}")

Content = v1
The version id = BLoyX9X9xLmHSx8gLgNzWUKI7kwwkuNz


We can also list all versions of an object. They are returned in order of last modified time, from the latest to the oldest.

In [76]:
res = bsm.s3_client.list_object_versions(Bucket=bucket, Prefix=key)
rprint(res)

n_versions = len(res["Versions"])
print(f"Number of versions = {n_versions}")

Number of versions = 2
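
One caveat about the listing call: `Prefix` is a prefix match, so `Prefix="test.txt"` would also return versions of a key such as `test.txt.bak`, and a single response may not contain all versions if there are many. A hedged sketch of an exact-key listing that uses the boto3 paginator follows. `list_versions_of_key` is a hypothetical helper, and `FakePaginator` / `FakeS3Client` are stand-ins invented here so that the example runs without AWS access; with a real client you would pass `bsm.s3_client` instead.

```python
def list_versions_of_key(s3_client, bucket: str, key: str) -> list:
    """Return every version entry whose Key equals `key` exactly."""
    paginator = s3_client.get_paginator("list_object_versions")
    matches = []
    for page in paginator.paginate(Bucket=bucket, Prefix=key):
        # "Versions" is absent from a page that holds no object versions
        for entry in page.get("Versions", []):
            if entry["Key"] == key:
                matches.append(entry)
    return matches


class FakePaginator:
    """Offline stand-in for a boto3 paginator (assumption, for demo only)."""

    def __init__(self, pages):
        self._pages = pages

    def paginate(self, **kwargs):
        return iter(self._pages)


class FakeS3Client:
    """Offline stand-in that only supports get_paginator."""

    def __init__(self, pages):
        self._pages = pages

    def get_paginator(self, operation_name):
        return FakePaginator(self._pages)


# Hand-written sample pages, not real S3 output
sample_pages = [
    {"Versions": [{"Key": "test.txt", "VersionId": "id-2", "IsLatest": True}]},
    {
        "Versions": [
            {"Key": "test.txt", "VersionId": "id-1", "IsLatest": False},
            {"Key": "test.txt.bak", "VersionId": "id-9", "IsLatest": True},
        ]
    },
]
found = list_versions_of_key(FakeS3Client(sample_pages), "example-bucket", "test.txt")
print([entry["VersionId"] for entry in found])
```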


Next, we would like to test the deletion behavior. First, we put 3 more versions, so we have more versions to test with.

In [77]:
res = bsm.s3_client.put_object(Bucket=bucket, Key=key, Body="v3")
v3 = res["VersionId"]

res = bsm.s3_client.put_object(Bucket=bucket, Key=key, Body="v4")
v4 = res["VersionId"]

res = bsm.s3_client.put_object(Bucket=bucket, Key=key, Body="v5")
v5 = res["VersionId"]

print(f"v3 = {v3}")
print(f"v4 = {v4}")
print(f"v5 = {v5}")

v3 = FL9Bue.4UR_FC0h5el10gAzSzFlCQKPm
v4 = zufkTCbIA7N7DE0SPXS6AySBZzvRV5Oj
v5 = karfGCJYSeVX_d7q1fPW25syyIxKtXOl


Now we try to delete the object. Because we do not pass a version id, S3 does not remove any data. Instead, it inserts a delete marker, which becomes the current version of the object. The content and the historical versions are still there.

In [78]:
res = bsm.s3_client.delete_object(Bucket=bucket, Key=key)
rprint(res)

# The returned VersionId belongs to the new delete marker, not to v5
assert res["DeleteMarker"] is True
delete_marker_version = res["VersionId"]
assert delete_marker_version != v5
print(f"delete marker version = {delete_marker_version}")
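
To make the delete behavior concrete, here is a toy in-memory model of a single key in a versioning-enabled bucket. It is only a sketch of the observable behavior described above, not how S3 is implemented: the `VersionedKey` class, the `ver-N` id format, and the use of `KeyError` are all assumptions made for illustration.

```python
import itertools


class VersionedKey:
    """Toy model of one object key with versioning enabled."""

    def __init__(self):
        self._counter = itertools.count(1)
        # Oldest entry first. Each entry is (version_id, body);
        # body None stands for a delete marker.
        self._entries = []

    def _new_id(self) -> str:
        return f"ver-{next(self._counter)}"

    def put(self, body: str) -> str:
        """Every put appends a new version; nothing is overwritten."""
        version_id = self._new_id()
        self._entries.append((version_id, body))
        return version_id

    def delete(self) -> str:
        """Delete without a version id: append a delete marker."""
        marker_id = self._new_id()
        self._entries.append((marker_id, None))
        return marker_id

    def get(self, version_id=None) -> str:
        if version_id is None:
            # The current version decides the outcome
            if not self._entries or self._entries[-1][1] is None:
                raise KeyError("NoSuchKey")
            return self._entries[-1][1]
        for vid, body in self._entries:
            if vid == version_id and body is not None:
                return body
        # Simplification: real S3 reports different errors here
        raise KeyError("NoSuchVersion")


obj = VersionedKey()
first = obj.put("v1")
second = obj.put("v2")
marker = obj.delete()

print(obj.get(second))  # historical versions stay readable by id
try:
    obj.get()
except KeyError as error:
    print("get without version id failed:", error)
```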

Then, let's try to get the object. S3 looks up the current version of this object, finds that it is a delete marker, and returns a 404 (`NoSuchKey`) error.

In [79]:
# Raises NoSuchKey, because the current version of the object is a delete marker
res = bsm.s3_client.get_object(Bucket=bucket, Key=key)

NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: The specified key does not exist.

The historical versions are not affected by the delete marker. We can still get any of them by passing its version id explicitly.

In [80]:
res = bsm.s3_client.get_object(Bucket=bucket, Key=key, VersionId=v4)
rprint(res)

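Now we can list all versions again. This time the response also contains a `DeleteMarkers` entry for the delete marker we just created.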

In [59]:
res1 = bsm.s3_client.list_object_versions(Bucket=bucket, Prefix=key)
rprint(res1)
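
The `list_object_versions` response reports object versions under `Versions` and delete markers under `DeleteMarkers`, as two separate lists. A small sketch that merges both into one newest-first timeline for a single key follows. The `version_timeline` helper and the added `Kind` field are hypothetical, and the sample response is hand-written rather than real output.

```python
from datetime import datetime, timezone


def version_timeline(response: dict, key: str) -> list:
    """Merge Versions and DeleteMarkers of one key, newest first."""
    entries = []
    for item in response.get("Versions", []):
        if item["Key"] == key:
            entries.append({**item, "Kind": "version"})
    for item in response.get("DeleteMarkers", []):
        if item["Key"] == key:
            entries.append({**item, "Kind": "delete-marker"})
    # Entries with identical timestamps keep their relative order
    return sorted(entries, key=lambda e: e["LastModified"], reverse=True)


def _ts(second: int) -> datetime:
    """Build a sample timestamp (illustrative values only)."""
    return datetime(2024, 1, 1, 0, 0, second, tzinfo=timezone.utc)


sample_response = {
    "Versions": [
        {"Key": "test.txt", "VersionId": "id-2", "IsLatest": False, "LastModified": _ts(2)},
        {"Key": "test.txt", "VersionId": "id-1", "IsLatest": False, "LastModified": _ts(1)},
    ],
    "DeleteMarkers": [
        {"Key": "test.txt", "VersionId": "id-3", "IsLatest": True, "LastModified": _ts(3)},
    ],
}

for entry in version_timeline(sample_response, "test.txt"):
    print(entry["Kind"], entry["VersionId"], entry["IsLatest"])
```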

In [24]:
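# NOTE: `prefix` is not defined earlier in this notebook; the value below is a placeholder
prefix = "learn-s3-versioning"

# Put two versions each of two files under the same folder prefix
_ = bsm.s3_client.put_object(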
    Bucket=bucket,
    Key=f"{prefix}/folder/file1.txt",
    Body="file1-v1",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file1.txt",
    Body="file1-v2",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file2.txt",
    Body="file2-v1",
)
_ = bsm.s3_client.put_object(
    Bucket=bucket,
    Key=f"{prefix}/folder/file2.txt",
    Body="file2-v2",
)

In [25]:
res = bsm.s3_client.list_object_versions(
    Bucket=bucket,
    Prefix=f"{prefix}/folder/",
)
rprint(res)

In [26]:
res = bsm.s3_client.list_object_versions(
    Bucket=bucket,
    Prefix=f"{prefix}/folder/file1.txt",
)
rprint(res)
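
Listing with a folder prefix returns the versions of every key under that folder in one flat list, while listing with the full key narrows the result to one file. A minimal sketch that groups such a response by key follows. `version_ids_by_key` is a hypothetical helper, and the sample response, including the `demo/` prefix, is made up for illustration.

```python
from collections import defaultdict


def version_ids_by_key(response: dict) -> dict:
    """Group the version ids in a list_object_versions response by key."""
    grouped = defaultdict(list)
    for item in response.get("Versions", []):
        grouped[item["Key"]].append(item["VersionId"])
    return dict(grouped)


# Hand-written sample shaped like a listing of "demo/folder/"
sample_response = {
    "Versions": [
        {"Key": "demo/folder/file1.txt", "VersionId": "f1-v2", "IsLatest": True},
        {"Key": "demo/folder/file1.txt", "VersionId": "f1-v1", "IsLatest": False},
        {"Key": "demo/folder/file2.txt", "VersionId": "f2-v2", "IsLatest": True},
        {"Key": "demo/folder/file2.txt", "VersionId": "f2-v1", "IsLatest": False},
    ]
}

for key_name, version_ids in version_ids_by_key(sample_response).items():
    print(key_name, version_ids)
```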