# Cross-Account S3 Access Test

This notebook creates a sample CSV, uploads it to an S3 bucket with the source AWS profile, previews it through cross-account access with the target profile, and then cleans up the test data. All helper functions come from the `utils` folder.

## 1. Create and Save Sample CSV Locally
We use pandas to create a sample CSV and save it to the `utils/data/` directory.

In [32]:
import os

import pandas as pd

# Create a sample DataFrame
df_sample = pd.DataFrame(
    {
        "id": range(1, 6),
        "name": ["Alice", "Bob", "Charlie", "Diana", "Eve"],
        "amount": [100.5, 200.0, 150.75, 300.2, 50.0],
    },
)

# Save to local file in the utils/data directory
local_dir = os.path.join("utils", "data")
os.makedirs(local_dir, exist_ok=True)
local_csv_path = os.path.join(local_dir, "cross_account_sample.csv")
df_sample.to_csv(local_csv_path, index=False)
print(f"Sample CSV saved to {local_csv_path}")

Sample CSV saved to utils\data\cross_account_sample.csv
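
As a quick sanity check on the file written above, the sketch below rebuilds the same five rows with the standard-library `csv` module and counts the bytes. It assumes that pandas formats these particular values the same way `csv.writer` does and that the file was written on Windows, where the pandas default line terminator is `\r\n`. `csv_size_bytes` is a hypothetical helper for illustration, not part of `utils`.

```python
import csv
import io

# Same header and rows as df_sample above
ROWS = [
    ("id", "name", "amount"),
    (1, "Alice", 100.5),
    (2, "Bob", 200.0),
    (3, "Charlie", 150.75),
    (4, "Diana", 300.2),
    (5, "Eve", 50.0),
]


def csv_size_bytes(rows, lineterminator):
    """Return the UTF-8 size of the rows serialized as CSV (hypothetical helper)."""
    buffer = io.StringIO(newline="")  # newline="" avoids any newline translation
    writer = csv.writer(buffer, lineterminator=lineterminator)
    writer.writerows(rows)
    return len(buffer.getvalue().encode("utf-8"))


print(csv_size_bytes(ROWS, "\r\n"))  # 89 (Windows-style line endings)
print(csv_size_bytes(ROWS, "\n"))    # 83 (POSIX-style line endings)
```

The 89-byte figure agrees with the `size_bytes` value that the preview metadata reports in section 3, which is consistent with the file having been written with `\r\n` line endings.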


## 2. Upload the Sample CSV to the Source S3 Bucket
We use the `upload_to_s3` function from `s3_sample_data_uploader.py` to upload the sample CSV file into the source bucket using the source profile.

In [None]:
import os

from dotenv import load_dotenv

# Load configuration from the .env file one level above the notebook
load_dotenv(dotenv_path="../.env")

# Trailing comments show the values used for this run
SOURCE_BUCKET = os.getenv("SOURCE_BUCKET")  # a360-datalake-raw-bucket-277707121008-us-east-1
SOURCE_PROFILE = os.getenv("AWS_PROFILE")  # DataLake-Dev
TARGET_PROFILE = os.getenv("TARGET_PROFILE")  # GenAI-Platform-Dev
ASSUME_ROLE_ARN = os.getenv("ASSUME_ROLE_ARN")  # arn:aws:iam::590183989543:role/A360-SageMaker-Studio-DomainDefaultRole
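
Because `os.getenv` returns `None` for an unset variable, a missing entry in `.env` would only surface later as a confusing boto3 error. A minimal sketch of an early check follows; `REQUIRED_VARS` and `missing_settings` are hypothetical names introduced here, not part of `utils`.

```python
import os

# Variable names read by this notebook's configuration cell
REQUIRED_VARS = ("SOURCE_BUCKET", "AWS_PROFILE", "TARGET_PROFILE", "ASSUME_ROLE_ARN")


def missing_settings(environ=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


missing = missing_settings()
if missing:
    print(f"Missing configuration: {', '.join(missing)}")
```

Running this right after `load_dotenv` makes it obvious which setting to add before any AWS call is attempted.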

In [None]:
import os
import sys

project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

from utils.s3_sample_data_uploader import upload_to_s3

SAMPLE_PREFIX = "cross_account_test_sample_data"

# SOURCE_BUCKET and SOURCE_PROFILE were already loaded from the environment
# in the configuration cell, so they are not read again here.

# Upload the contents of local_dir to S3 using the upload_to_s3 function
uploaded_files = upload_to_s3(
    local_directory=local_dir,
    bucket_name=SOURCE_BUCKET,
    prefix=SAMPLE_PREFIX,
    profile=SOURCE_PROFILE,
)
print(f"Uploaded files: {uploaded_files}")

a360-datalake-raw-bucket-277707121008-us-east-1
cross_account_test_sample_data
DataLake-Dev


2025-06-26 01:26:13,606 - botocore.tokens - INFO - Loading cached SSO token for a360-sso
2025-06-26 01:26:14,727 - botocore.tokens - INFO - SSO Token refresh succeeded


Uploaded files: ['cross_account_test_sample_data\\cross_account_sample.csv']
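
The uploaded key above contains a backslash, which suggests it was built with a Windows path join (an inference from the output, not something confirmed from the uploader's source). S3 keys are plain strings, and `/` is the conventional delimiter for prefixes, so a backslash ends up as a literal character in the object name. A minimal sketch of normalizing such a key, using a hypothetical `to_s3_key` helper:

```python
from pathlib import PureWindowsPath


def to_s3_key(raw_key: str) -> str:
    """Return the key with forward slashes as separators (hypothetical helper)."""
    # PureWindowsPath accepts both "\\" and "/" as separators on any platform
    return PureWindowsPath(raw_key).as_posix()


print(to_s3_key("cross_account_test_sample_data\\cross_account_sample.csv"))
# cross_account_test_sample_data/cross_account_sample.csv
```

Whether the uploader should apply such a normalization is a design decision for `s3_sample_data_uploader.py`; the rest of this notebook works with the key exactly as it was uploaded.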


## 3. List and Preview the CSV with Target Profile
We use `list_csv_keys` and `preview_csv` from `s3_csv_preview.py` to list and preview the uploaded CSV in the source S3 bucket, authenticating with the target profile and the assumed role.

In [37]:
import os
import sys

project_root = os.path.abspath(os.path.join(os.getcwd(), ".."))
if project_root not in sys.path:
    sys.path.append(project_root)

from utils.s3_csv_preview import list_csv_keys, preview_csv

# List CSV files under the uploaded prefix
csv_files = list_csv_keys(SOURCE_BUCKET, SAMPLE_PREFIX, TARGET_PROFILE, ASSUME_ROLE_ARN)
print(f"CSV files found in S3: {csv_files}")

# Preview the first uploaded CSV using the utility
if csv_files:
    df_preview, metadata = preview_csv(
        SOURCE_BUCKET,
        csv_files[0],
        nrows=5,
        profile=TARGET_PROFILE,
        assume_role_arn=ASSUME_ROLE_ARN,
    )
    print(f"Preview metadata: {metadata}")
    display(df_preview)
else:
    print(f"No CSV files found under prefix {SAMPLE_PREFIX!r}.")

2025-06-26 01:27:00,665 - botocore.tokens - INFO - Loading cached SSO token for a360-sso


arn:aws:iam::590183989543:role/A360-SageMaker-Studio-DomainDefaultRole
GenAI-Platform-Dev


2025-06-26 01:27:03,447 - botocore.tokens - INFO - Loading cached SSO token for a360-sso


CSV files found in S3: ['cross_account_test_sample_data\\cross_account_sample.csv']
Preview metadata: {'size_bytes': 89, 'last_modified': datetime.datetime(2025, 6, 26, 6, 26, 19, tzinfo=tzutc()), 'content_type': 'binary/octet-stream'}


|   | id | name    | amount |
|---|----|---------|--------|
| 0 | 1  | Alice   | 100.5  |
| 1 | 2  | Bob     | 200.0  |
| 2 | 3  | Charlie | 150.75 |
| 3 | 4  | Diana   | 300.2  |
| 4 | 5  | Eve     | 50.0   |
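
The internals of `list_csv_keys` and `preview_csv` are not shown in this notebook. As a rough illustration of what cross-account access with a profile and a role ARN typically involves, the sketch below calls STS `assume_role` and converts the temporary credentials into keyword arguments for a boto3 client. The helper name `assumed_role_client_kwargs` and the session name are assumptions made for this example, and the utilities may work differently.

```python
def assumed_role_client_kwargs(sts_client, role_arn, session_name="cross-account-preview"):
    """Assume role_arn and return credential kwargs for a boto3 client.

    Hypothetical helper: sts_client is expected to be a boto3 STS client
    created from the target profile.
    """
    response = sts_client.assume_role(RoleArn=role_arn, RoleSessionName=session_name)
    credentials = response["Credentials"]
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


# Usage sketch (requires boto3 and valid credentials, so it is left commented out):
#   import boto3
#   session = boto3.Session(profile_name=TARGET_PROFILE)
#   kwargs = assumed_role_client_kwargs(session.client("sts"), ASSUME_ROLE_ARN)
#   s3 = boto3.client("s3", **kwargs)
#   s3.list_objects_v2(Bucket=SOURCE_BUCKET, Prefix=SAMPLE_PREFIX)
```

Passing the STS client in as an argument keeps the helper easy to exercise with a stub, without touching real AWS accounts.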


## 4. Clean Up: Delete the Sample CSV from S3
We delete the uploaded sample CSV from S3 using the source profile.

In [38]:
import boto3

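# Use the source profile to delete every key found in the preview step
session = boto3.Session(profile_name=SOURCE_PROFILE)
s3 = session.client("s3")
for key in csv_files:
    s3.delete_object(Bucket=SOURCE_BUCKET, Key=key)
print("Sample data deleted from source bucket.")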

2025-06-26 01:27:38,071 - botocore.tokens - INFO - Loading cached SSO token for a360-sso


Sample data deleted from source bucket.
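
Deleting one object per request is fine for a single sample file. If a test ever uploads many files, the S3 `DeleteObjects` operation can remove up to 1000 keys per call. The sketch below shows one way to batch the keys; `delete_keys` and `chunked` are hypothetical helpers, and `s3_client` is assumed to be a boto3 S3 client such as the `s3` object created above.

```python
MAX_KEYS_PER_REQUEST = 1000  # S3 DeleteObjects accepts at most 1000 keys per call


def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def delete_keys(s3_client, bucket, keys):
    """Delete keys from bucket in batches and return how many were requested."""
    requested = 0
    for batch in chunked(list(keys), MAX_KEYS_PER_REQUEST):
        response = s3_client.delete_objects(
            Bucket=bucket,
            Delete={"Objects": [{"Key": key} for key in batch], "Quiet": True},
        )
        # In quiet mode the response lists only the keys that failed
        errors = response.get("Errors", [])
        if errors:
            raise RuntimeError(f"Failed to delete {len(errors)} object(s): {errors}")
        requested += len(batch)
    return requested


# Usage sketch: delete_keys(s3, SOURCE_BUCKET, csv_files)
```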
