<span style="color:#1B65F1;font-weight:600;font-size:30px"> 
CloudWatch Log Retrieval - Step Function CPU
</span> <br>

This notebook helps identify the CloudWatch log entries associated with a team's S3 bucket name. <br>
The notebook will: <br>
- Identify the 'received' lines that reference our 'team-one-s3-cosmic' S3 bucket. <br>
- Find the Request ID associated with each 'received' line, which records the World Size and Rank. <br>
- Use the captured Request ID to retrieve the matching REPORT RequestId performance metrics: Duration, Billed Duration, Memory Size, Max Memory Used, and Init Duration.
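
As a rough illustration of the last step, the sketch below pulls the numeric metrics out of a single REPORT line with a regular expression. The helper name `parse_report_line` and the output field names are hypothetical and are not used elsewhere in this notebook; the sketch also assumes the fields are separated by real tab or space characters, as in the sample rows shown further down.

```python
import re

# Hypothetical helper (not part of the original notebook): parse one Lambda
# REPORT line into its Request ID and numeric metrics.
REPORT_PATTERN = re.compile(
    r"REPORT RequestId:\s*(?P<request_id>[0-9a-f-]+)\s+"
    r"Duration:\s*(?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration:\s*(?P<billed_duration_ms>[\d.]+) ms\s+"
    r"Memory Size:\s*(?P<memory_size_mb>\d+) MB\s+"
    r"Max Memory Used:\s*(?P<max_memory_used_mb>\d+) MB"
    # Init Duration is treated as optional here (assumption: not every line has it)
    r"(?:\s+Init Duration:\s*(?P<init_duration_ms>[\d.]+) ms)?"
)

def parse_report_line(message):
    """Return a dict of metrics, or None if the message is not a REPORT line."""
    match = REPORT_PATTERN.search(message)
    if match is None:
        return None
    fields = match.groupdict()
    request_id = fields.pop("request_id")
    metrics = {k: (float(v) if v is not None else None) for k, v in fields.items()}
    return {"request_id": request_id, **metrics}

# Sample REPORT message copied from the log export used in this notebook
sample = (
    "REPORT RequestId: fe5cfefb-6e98-4e56-9662-edc18841b3d1\tDuration: 12606.91 ms\t"
    "Billed Duration: 13836 ms\tMemory Size: 10240 MB\tMax Memory Used: 2865 MB\t"
    "Init Duration: 1228.99 ms\t\n"
)
parsed = parse_report_line(sample)
print(parsed)
```

Applied to a whole column, something like `step_function_logdata['@message'].map(parse_report_line)` would give one dict per REPORT row and `None` for every other row.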

In [2]:
import numpy as np
import pandas as pd
import math
from datetime import datetime
pd.set_option('display.max_colwidth', 300)  # show long @message values without truncation

In [3]:
## Load in Log
step_function_logdata = pd.read_csv('Cosmic_AI_Logs/Trial_Two_logs-insights-results.csv')
# step_function_logdata = pd.read_csv('Cosmic_AI_Logs/cw_log_25mb_10.csv')

In [6]:
## Review data
print(step_function_logdata.shape)
step_function_logdata[0:2]

(370, 4)


Unnamed: 0,@timestamp,@message,@logStream,@log
0,2024-11-26 16:25:14.565,END RequestId: fe5cfefb-6e98-4e56-9662-edc18841b3d1\n,2024/11/26/[$LATEST]b70e567b847d48a384ba45fb45bbcb60,211125778552:/aws/lambda/cosmic-executor
1,2024-11-26 16:25:14.565,REPORT RequestId: fe5cfefb-6e98-4e56-9662-edc18841b3d1\tDuration: 12606.91 ms\tBilled Duration: 13836 ms\tMemory Size: 10240 MB\tMax Memory Used: 2865 MB\tInit Duration: 1228.99 ms\t\n,2024/11/26/[$LATEST]b70e567b847d48a384ba45fb45bbcb60,211125778552:/aws/lambda/cosmic-executor


### Searching for all Received records by World Size - Single Search

In [256]:
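# Keep 'received' rows for World Size 1 that mention the team-one bucket (literal match, not regex)
single_ws_filter = step_function_logdata[
    step_function_logdata['@message'].str.contains("'WORLD_SIZE': '1'", na=False, regex=False) &
    step_function_logdata['@message'].str.contains('team-one', na=False, regex=False)
]
single_ws_filter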

Unnamed: 0,@timestamp,@message,@logStream,@log
2704,2024-11-21 19:00:56.811,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch2048.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_in...",2024/11/21/[$LATEST]67212cd480cc4910810e2ea6d98378a8,211125778552:/aws/lambda/cosmic-executor
2765,2024-11-21 18:52:45.187,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch256.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_inf...",2024/11/21/[$LATEST]e995e2ac936b4cb3b7d958ed3d184e33,211125778552:/aws/lambda/cosmic-executor
2822,2024-11-21 18:35:52.627,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch1024.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_in...",2024/11/21/[$LATEST]758b97adf2f246b4b20ecd184aa64889,211125778552:/aws/lambda/cosmic-executor
2976,2024-11-21 17:32:21.435,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_inference.pt...",2024/11/21/[$LATEST]1e7ea14e6f03408a9a185965570c0c1e,211125778552:/aws/lambda/cosmic-executor


### Search Based on Request ID

In [7]:
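# Keep only the REPORT lines (one per Lambda invocation).
# Note: this reuses the name single_ws_filter from the World Size search.
single_ws_filter = step_function_logdata[
    step_function_logdata['@message'].str.contains("REPORT RequestId", na=False, regex=False)
]
single_ws_filter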

Unnamed: 0,@timestamp,@message,@logStream,@log
1,2024-11-26 16:25:14.565,REPORT RequestId: fe5cfefb-6e98-4e56-9662-edc18841b3d1\tDuration: 12606.91 ms\tBilled Duration: 13836 ms\tMemory Size: 10240 MB\tMax Memory Used: 2865 MB\tInit Duration: 1228.99 ms\t\n,2024/11/26/[$LATEST]b70e567b847d48a384ba45fb45bbcb60,211125778552:/aws/lambda/cosmic-executor
12,2024-11-26 16:25:13.546,REPORT RequestId: 41ff7714-5c72-4d2d-b590-eeb2e59bd155\tDuration: 11988.23 ms\tBilled Duration: 12906 ms\tMemory Size: 10240 MB\tMax Memory Used: 2837 MB\tInit Duration: 917.59 ms\t\n,2024/11/26/[$LATEST]a2b5a53d45b740318c41b607b9cf332d,211125778552:/aws/lambda/cosmic-executor
15,2024-11-26 16:25:13.423,REPORT RequestId: cf9234fb-b0d7-4f2c-959c-1bc359068478\tDuration: 11855.42 ms\tBilled Duration: 12796 ms\tMemory Size: 10240 MB\tMax Memory Used: 2852 MB\tInit Duration: 939.67 ms\t\n,2024/11/26/[$LATEST]8b8c0cb0af6445308679d029d3ffd750,211125778552:/aws/lambda/cosmic-executor
18,2024-11-26 16:25:13.295,REPORT RequestId: 0670e1c6-fa0e-4f7a-a316-59f40e4ab621\tDuration: 11571.60 ms\tBilled Duration: 12602 ms\tMemory Size: 10240 MB\tMax Memory Used: 2820 MB\tInit Duration: 1029.70 ms\t\n,2024/11/26/[$LATEST]36ccde0415e542988804388e07c76956,211125778552:/aws/lambda/cosmic-executor
21,2024-11-26 16:25:13.210,REPORT RequestId: eb680fee-a74a-42cc-a65d-21a592230318\tDuration: 11782.33 ms\tBilled Duration: 12574 ms\tMemory Size: 10240 MB\tMax Memory Used: 2850 MB\tInit Duration: 791.44 ms\t\n,2024/11/26/[$LATEST]8292b50e9e77424fbb1a1c3f4d1a1c4e,211125778552:/aws/lambda/cosmic-executor
24,2024-11-26 16:25:12.986,REPORT RequestId: 489d2dd0-fca5-4531-abc8-66bb29b69972\tDuration: 11651.44 ms\tBilled Duration: 12395 ms\tMemory Size: 10240 MB\tMax Memory Used: 2866 MB\tInit Duration: 743.29 ms\t\n,2024/11/26/[$LATEST]7026f6d7426c42d98404c36422bbf425,211125778552:/aws/lambda/cosmic-executor
27,2024-11-26 16:25:12.816,REPORT RequestId: 27f37f76-9d9c-4986-b4d5-a8787452694f\tDuration: 11292.46 ms\tBilled Duration: 12245 ms\tMemory Size: 10240 MB\tMax Memory Used: 2812 MB\tInit Duration: 952.30 ms\t\n,2024/11/26/[$LATEST]09e8d0cdf7ef4e2bbe16c516fd9dc7ec,211125778552:/aws/lambda/cosmic-executor
38,2024-11-26 16:25:12.518,REPORT RequestId: 07c74476-9ae1-44c1-8dac-ecdcfd85ede7\tDuration: 11198.84 ms\tBilled Duration: 11956 ms\tMemory Size: 10240 MB\tMax Memory Used: 2839 MB\tInit Duration: 756.54 ms\t\n,2024/11/26/[$LATEST]6b81c1b0e7024fd1a495e42573e61f77,211125778552:/aws/lambda/cosmic-executor
41,2024-11-26 16:25:12.496,REPORT RequestId: bc51b301-6c63-49fe-b988-72e32d8ffd80\tDuration: 10250.95 ms\tBilled Duration: 10951 ms\tMemory Size: 10240 MB\tMax Memory Used: 2826 MB\tInit Duration: 699.97 ms\t\n,2024/11/26/[$LATEST]f061137c82494c2a852de8d426ba1a43,211125778552:/aws/lambda/cosmic-executor
76,2024-11-26 16:25:11.925,REPORT RequestId: fdc0b18f-4a45-4241-9dfa-bc637fca384f\tDuration: 10603.46 ms\tBilled Duration: 11291 ms\tMemory Size: 10240 MB\tMax Memory Used: 2843 MB\tInit Duration: 686.71 ms\t\n,2024/11/26/[$LATEST]22a6446a6d05476297be7dfd28eaa743,211125778552:/aws/lambda/cosmic-executor


In [8]:
run_index_vals = single_ws_filter.index
print(run_index_vals)

Index([1, 12, 15, 18, 21, 24, 27, 38, 41, 76], dtype='int64')


<div style="border-top: 8px solid #16145b; margin-top: 10px; margin-bottom: 10px;"></div>

<span style="color:#1B65F1;font-weight:550;font-size:26px"> 
Result Retrieval from Multiple Executions
</span> <br>
Use the following code to collect the REPORT metrics for every execution that matches a given World Size (WS) and bucket name.

In [268]:
# Search parameters
world_sizes = [1]
bucket_name = 'team-one-s3-cosmic'

In [311]:
report_results_dataframes = []
for world_size in world_sizes:
    ws_search_word = f"'WORLD_SIZE': '{world_size}'"
    # 'received' rows for this World Size and bucket (literal match, not regex)
    filtered_data = step_function_logdata[
        step_function_logdata['@message'].str.contains(ws_search_word, na=False, regex=False) &
        step_function_logdata['@message'].str.contains(bucket_name, na=False, regex=False)
    ]
    df_index_vals = filtered_data.index
    print(df_index_vals)

    for ii in df_index_vals:
        print(f' Received Index in run: {ii}')
        received_text = filtered_data.loc[ii, '@message']

        # Look for the START line in the rows around the received line (ii-3 through ii+2).
        # Reset request_id each time so a miss never reuses the previous run's ID.
        request_id = None
        for offset in range(-3, 3):
            new_index = ii + offset
            if new_index not in step_function_logdata.index:
                continue
            row_text = step_function_logdata.loc[new_index, '@message']
            if isinstance(row_text, str) and 'START RequestId:' in row_text:
                request_id = row_text.split('RequestId:')[1].split()[0]
                break
        if request_id is None:
            print(f' No START line found near index {ii}; skipping')
            continue

        # REPORT row that carries the same Request ID
        report_data_temp = step_function_logdata[
            step_function_logdata['@message'].str.contains(request_id, na=False, regex=False) &
            step_function_logdata['@message'].str.contains('REPORT RequestId:', na=False, regex=False)
        ]
        # Keep the first two columns and copy so the new column is not set on a view
        report_data_temp = report_data_temp.iloc[:, [0, 1]].copy()
        report_data_temp['inference_run'] = received_text
        report_results_dataframes.append(report_data_temp)

Index([2704, 2765, 2822, 2976], dtype='int64')
 Received Index in run: 2704
 Received Index in run: 2765
 Received Index in run: 2822
 Received Index in run: 2976


In [312]:
## Create single DF with all results. 
combined_results_df = pd.concat(report_results_dataframes, ignore_index=False)

### Results

In [313]:
print(combined_results_df.shape)
combined_results_df.head()

(4, 3)


Unnamed: 0,@timestamp,@message,inference_run
2645,2024-11-21 19:01:03.245,REPORT RequestId: f623ffe3-78ed-4d0e-a9de-ac00b9e9d71c\tDuration: 6434.80 ms\tBilled Duration: 7381 ms\tMemory Size: 10240 MB\tMax Memory Used: 645 MB\tInit Duration: 945.56 ms\t\n,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch2048.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_in..."
2708,2024-11-21 18:52:57.067,REPORT RequestId: b9aa5c74-5ff2-4631-9e25-1d4e8bf749db\tDuration: 11879.83 ms\tBilled Duration: 12578 ms\tMemory Size: 10240 MB\tMax Memory Used: 1960 MB\tInit Duration: 698.09 ms\t\n,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch256.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_inf..."
2767,2024-11-21 18:36:05.587,REPORT RequestId: 6196cb66-f681-4f5f-8641-71cc5f8f7b04\tDuration: 12960.20 ms\tBilled Duration: 13967 ms\tMemory Size: 10240 MB\tMax Memory Used: 3350 MB\tInit Duration: 1006.32 ms\t\n,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference_batch1024.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_in..."
2924,2024-11-21 17:32:32.226,REPORT RequestId: 394fd31e-ddb1-4746-aa70-7e3ac3c572eb\tDuration: 10792.45 ms\tBilled Duration: 11671 ms\tMemory Size: 10240 MB\tMax Memory Used: 2848 MB\tInit Duration: 877.64 ms\t\n,"received: {'S3_BUCKET': 'team-one-s3-cosmic', 'S3_OBJECT_NAME': 'scripts/Anomaly Detection', 'SCRIPT': '/tmp/scripts/Anomaly Detection/Inference/inference.py', 'S3_OBJECT_TYPE': 'folder', 'WORLD_SIZE': '1', 'RANK': '0', 'data_path': '/tmp/scripts/Anomaly Detection/Inference/resized_inference.pt..."


### Query by Individual Row Index

In [307]:
step_function_logdata.loc[2645]

@timestamp                                                                                                                                                                 2024-11-21 19:01:03.245
@message      REPORT RequestId: f623ffe3-78ed-4d0e-a9de-ac00b9e9d71c\tDuration: 6434.80 ms\tBilled Duration: 7381 ms\tMemory Size: 10240 MB\tMax Memory Used: 645 MB\tInit Duration: 945.56 ms\t\n
@logStream                                                                                                                                    2024/11/21/[$LATEST]67212cd480cc4910810e2ea6d98378a8
@log                                                                                                                                                      211125778552:/aws/lambda/cosmic-executor
Name: 2645, dtype: object

<div style="border-top: 8px solid #16145b; margin-top: 10px; margin-bottom: 10px;"></div>

## Searching by Time

Use the @timestamp values in the results above to isolate the records for a particular execution.
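
One way to do this is to convert `@timestamp` to datetimes and keep the rows that fall inside a window. The snippet below is a minimal sketch on a small stand-in DataFrame: the name `logs`, the window bounds, and the abbreviated 'received' messages are illustrative assumptions, while the timestamps and the REPORT line are copied from the results above.

```python
import pandas as pd

# Stand-in for step_function_logdata: three rows with timestamps from the
# results above ('received' messages are abbreviated for the example).
logs = pd.DataFrame(
    {
        "@timestamp": [
            "2024-11-21 19:00:56.811",
            "2024-11-21 19:01:03.245",
            "2024-11-21 18:52:45.187",
        ],
        "@message": [
            "received: {'S3_BUCKET': 'team-one-s3-cosmic', 'WORLD_SIZE': '1', 'RANK': '0'}",
            "REPORT RequestId: f623ffe3-78ed-4d0e-a9de-ac00b9e9d71c\tDuration: 6434.80 ms\t"
            "Billed Duration: 7381 ms\tMemory Size: 10240 MB\tMax Memory Used: 645 MB\t"
            "Init Duration: 945.56 ms\t\n",
            "received: {'S3_BUCKET': 'team-one-s3-cosmic', 'WORLD_SIZE': '1', 'RANK': '0'}",
        ],
    },
    index=[2704, 2645, 2765],
)

# Parse the timestamp strings so they can be compared as datetimes
logs["@timestamp"] = pd.to_datetime(logs["@timestamp"])

# Hypothetical window around one execution (bounds are inclusive)
start = pd.Timestamp("2024-11-21 19:00:56")
end = pd.Timestamp("2024-11-21 19:01:04")
window = logs[logs["@timestamp"].between(start, end)]
print(window)
```

The same pattern should apply to `step_function_logdata` directly once its `@timestamp` column has been parsed with `pd.to_datetime`.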

Using the RequestId, you can query specific executions by Rank and review their performance metrics.
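
A possible way to line up Rank with the metrics is to extract both from the text columns of the combined results. The sketch below assumes a DataFrame shaped like `combined_results_df` (a REPORT `@message` plus the `inference_run` text); the stand-in name `results`, the new column names, and the abbreviated `inference_run` strings are assumptions made for the example.

```python
import pandas as pd

# Stand-in for combined_results_df, using two REPORT rows from the results
# above and abbreviated 'received' text in inference_run.
results = pd.DataFrame(
    {
        "@message": [
            "REPORT RequestId: f623ffe3-78ed-4d0e-a9de-ac00b9e9d71c\tDuration: 6434.80 ms\t"
            "Billed Duration: 7381 ms\tMemory Size: 10240 MB\tMax Memory Used: 645 MB\t"
            "Init Duration: 945.56 ms\t\n",
            "REPORT RequestId: b9aa5c74-5ff2-4631-9e25-1d4e8bf749db\tDuration: 11879.83 ms\t"
            "Billed Duration: 12578 ms\tMemory Size: 10240 MB\tMax Memory Used: 1960 MB\t"
            "Init Duration: 698.09 ms\t\n",
        ],
        "inference_run": [
            "received: {'S3_BUCKET': 'team-one-s3-cosmic', 'WORLD_SIZE': '1', 'RANK': '0'}",
            "received: {'S3_BUCKET': 'team-one-s3-cosmic', 'WORLD_SIZE': '1', 'RANK': '0'}",
        ],
    },
    index=[2645, 2708],
)

# Rank comes from the 'received' text; the metrics come from the REPORT line
results["rank"] = (
    results["inference_run"].str.extract(r"'RANK': '(\d+)'", expand=False).astype(int)
)
results["request_id"] = results["@message"].str.extract(
    r"RequestId:\s*([0-9a-f-]+)", expand=False
)
# Anchor on the Request ID so 'Billed Duration' and 'Init Duration' are not matched
results["duration_ms"] = (
    results["@message"]
    .str.extract(r"RequestId:\s*\S+\s+Duration:\s*([\d.]+) ms", expand=False)
    .astype(float)
)
results["max_memory_used_mb"] = (
    results["@message"].str.extract(r"Max Memory Used:\s*(\d+) MB", expand=False).astype(int)
)

# Example query: executions that ran as Rank 0
rank_zero = results[results["rank"] == 0]
print(rank_zero[["request_id", "rank", "duration_ms", "max_memory_used_mb"]])
```

With multi-rank runs, grouping on the `rank` column (for example `results.groupby("rank")["duration_ms"].mean()`) would give a per-rank summary.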