Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(logs): log retention fails with OperationAbortedException #16083

Merged
merged 4 commits into from Sep 10, 2021
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
54 changes: 32 additions & 22 deletions packages/@aws-cdk/aws-logs/lib/log-retention-provider/index.ts
Expand Up @@ -18,14 +18,35 @@ interface SdkRetryOptions {
* @param options CloudWatch API SDK options.
*/
async function createLogGroupSafe(logGroupName: string, region?: string, options?: SdkRetryOptions) {
try { // Try to create the log group
const cloudwatchlogs = new AWS.CloudWatchLogs({ apiVersion: '2014-03-28', region, ...options });
await cloudwatchlogs.createLogGroup({ logGroupName }).promise();
} catch (e) {
if (e.code !== 'ResourceAlreadyExistsException') {
throw e;
// If we set the log retention for a lambda, then due to the async nature of
// Lambda logging there could be a race condition when the same log group is
// already being created by the lambda execution. This can sometime result in
// an error "OperationAbortedException: A conflicting operation is currently
// in progress...Please try again."
// To avoid an error, we do as requested and try again.
let retryCount = options?.maxRetries == undefined ? 10 : options.maxRetries;
const delay = options?.retryOptions?.base == undefined ? 10 : options.retryOptions.base;
do {
try {
const cloudwatchlogs = new AWS.CloudWatchLogs({ apiVersion: '2014-03-28', region, ...options });
await cloudwatchlogs.createLogGroup({ logGroupName }).promise();
return;
} catch (error) {
if (error.code == 'ResourceAlreadyExistsException') {
// Yes, the log group is already created by the lambda execution
return;
}
if (error.code != 'OperationAbortedException') {
ddl-denis-parnovskiy marked this conversation as resolved.
Show resolved Hide resolved
throw error;
}
if (retryCount == 0) {
ddl-denis-parnovskiy marked this conversation as resolved.
Show resolved Hide resolved
ddl-denis-parnovskiy marked this conversation as resolved.
Show resolved Hide resolved
console.error(error);
throw error;
}
}
}
await new Promise(resolve => setTimeout(resolve, delay));
} while (retryCount > 0);
ddl-denis-parnovskiy marked this conversation as resolved.
Show resolved Hide resolved
throw new Error('Out of attempts to create logGroup');
}

/**
Expand Down Expand Up @@ -64,21 +85,10 @@ export async function handler(event: AWSLambda.CloudFormationCustomResourceEvent
await setRetentionPolicy(logGroupName, logGroupRegion, retryOptions, parseInt(event.ResourceProperties.RetentionInDays, 10));

if (event.RequestType === 'Create') {
// Set a retention policy of 1 day on the logs of this function. The log
// group for this function should already exist at this stage because we
// already logged the event but due to the async nature of Lambda logging
// there could be a race condition. So we also try to create the log group
// of this function first. If multiple LogRetention constructs are present
// in the stack, they will try to act on this function's log group at the
// same time. This can sometime result in an OperationAbortedException. To
// avoid this and because this operation is not critical we catch all errors.
ddl-denis-parnovskiy marked this conversation as resolved.
Show resolved Hide resolved
try {
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See in the old code, if CreateLogGroupSafe throws an error, setRetentionPolicy is skipped.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by this?
This behaviour is the same in your updated code as well, i.e., if createLogGroupSafe() throws an error setRetentionPolicy() is skipped.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Old code:
CreateLogGroupSafe encounters OperationAbortedException, throws exception, setRetentionPolicy is skipped, the error is eaten by catch all.
Result: retention policy for this function is not set and no error. This is a bug in the old code I'm talking about.

New code:
CreateLogGroupSafe encounters OperationAbortedException, retries, if really fails after several attempts, then the error is floated up the stack.
Result: either log group is created with limited retention, or an exception is thrown.

} catch (e) {
console.log(e);
}
// Set a retention policy of 1 day on the logs of this function.
const region = process.env.AWS_REGION;
await createLogGroupSafe(`/aws/lambda/${context.functionName}`, region, retryOptions);
await setRetentionPolicy(`/aws/lambda/${context.functionName}`, region, retryOptions, 1);
}
}

Expand Down
60 changes: 59 additions & 1 deletion packages/@aws-cdk/aws-logs/test/test.log-retention-provider.ts
Expand Up @@ -27,6 +27,14 @@ function createRequest(type: string) {
.reply(200);
}

class MyError extends Error {
code: string;
constructor(message: string, code: string) {
super(message);
this.code = code;
}
}

export = {
'tearDown'(callback: any) {
AWS.restore();
Expand Down Expand Up @@ -232,9 +240,59 @@ export = {
},

async 'does not fail when operations on provider log group fail'(test: Test) {
let attempt = 2;
const createLogGroupFake = (params: AWSSDK.CloudWatchLogs.CreateLogGroupRequest) => {
if (params.logGroupName === '/aws/lambda/provider') {
return Promise.reject(new Error('OperationAbortedException'));
if (attempt > 0) {
attempt--;
return Promise.reject(new MyError(
'A conflicting operation is currently in progress against this resource. Please try again.',
'OperationAbortedException'));
} else {
return Promise.resolve({});
}
}
return Promise.resolve({});
};

const putRetentionPolicyFake = sinon.fake.resolves({});
const deleteRetentionPolicyFake = sinon.fake.resolves({});

AWS.mock('CloudWatchLogs', 'createLogGroup', createLogGroupFake);
AWS.mock('CloudWatchLogs', 'putRetentionPolicy', putRetentionPolicyFake);
AWS.mock('CloudWatchLogs', 'deleteRetentionPolicy', deleteRetentionPolicyFake);

const event = {
...eventCommon,
RequestType: 'Create',
ResourceProperties: {
ServiceToken: 'token',
RetentionInDays: '30',
LogGroupName: 'group',
},
};

const request = createRequest('SUCCESS');

await provider.handler(event as AWSLambda.CloudFormationCustomResourceCreateEvent, context);

test.equal(request.isDone(), true);

test.done();
},

async 'does not fail when operations on CDK lambda log group fail'(test: Test) {
let attempt = 2;
const createLogGroupFake = (params: AWSSDK.CloudWatchLogs.CreateLogGroupRequest) => {
if (params.logGroupName === 'group') {
if (attempt > 0) {
attempt--;
return Promise.reject(new MyError(
'A conflicting operation is currently in progress against this resource. Please try again.',
'OperationAbortedException'));
} else {
return Promise.resolve({});
}
}
return Promise.resolve({});
};
Expand Down