Skip to content

Conversation

@dangmaul-amazon
Copy link
Contributor

@dangmaul-amazon dangmaul-amazon commented Mar 25, 2021

Issue #, if available:

Description of changes:
As part of Lambda 1-click integration, we want the agent to create a profiling group for the user if it does not exist. To do this, we add a create_profiling_group API call in places where we could receive a ResourceNotFoundException for the PG.

Testing: Pytest unit tests.
Tested on a sample app in my AWS account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

As part of Lambda 1-click integration, we want the agent to create a profiling group for the user if it does not exist. To do this, we add a create_profiling_group API call in places where we could receive a ResourceNotFoundException for the PG.
self.metadata = environment["agent_metadata"]
self.agent_config_merger = environment["agent_config_merger"]
self.is_lambda_one_click_pg_created_during_execution = False
# self.is_create_pg_called_during_submit_profile = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you want to keep only one of this is_create_pg_called_during_submit_profile and remove the other.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides having all these flags for the Lambda usecase, how about creating a PG (if it doesn't exist already) in the Lambda decorator only? https://github.com/aws/amazon-codeguru-profiler-python-agent/blob/main/codeguru_profiler_agent/aws_lambda/profiler_decorator.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose you want to keep only one of this is_create_pg_called_during_submit_profile and remove the other.

Good catch! Forgot to remove this. Fixed in the latest commit

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides having all these flags for the Lambda usecase, how about creating a PG (if it doesn't exist already) in the Lambda decorator only? https://github.com/aws/amazon-codeguru-profiler-python-agent/blob/main/codeguru_profiler_agent/aws_lambda/profiler_decorator.py

We could use that approach but we do not want the initialization of the decorator to depend on the API calls. If let's say we do that, then we will have to check whether a PG has been created already by using DescribeProfilingGroup API and then conditionally call CreateProfilingGroup API. Then, in this case the load on control plane API describePG will be huge if the lambda scales to lets say 10000 instances. Another variation of this would be to call CreateProfilingGroup API unconditionally, in which case the load on CreatePG will be huge. Instead if we depend on refresh_configuration() or report() method calls, then we automatically get the functionality where if PG is not present already then we get ResourceNotFoundException, and we get the functionality where we will get this exception only once and we will call CreatePG only once.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the explanation, good point!

global is_create_pg_called_during_submit_profile
is_create_pg_called_during_submit_profile = True
self.create_pg_when_one_click_integration_is_active()
self._log_request_failed(operation="post_agent_profile", exception=error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say to not log exception when we create the PG, but log something at INFO level that we are creating one PG.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this throwing ClientError? also the operation should be configure_agent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say to not log exception when we create the PG, but log something at INFO level that we are creating one PG.

Changed in the next commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because both ConfigureAgent and PostAgentProfile throw a ResourceNotFoundException. So if we fail to create a PG during the first refresh_configuration() call and the agent defaults to a default configuration, we can still attempt to create a PG during the report() call.

from codeguru_profiler_agent.utils.time import current_milli_time
from test.pytestutils import before
from mock import MagicMock, call
from mock import MagicMock, call, patch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new "patch" used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope removed it! :)

Comment on lines 180 to 184
try:
self.client_stubber.assert_no_pending_responses()
assert False
except AssertionError:
assert True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the usecase when the failure of the assertions should not be considered as a failure? I think this is the equivalent of not having any assertions, so if it's not reliable or useful, we can just delete it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, client_stubber.assert_no_pending_responses() should throw exception. What's the usecase? If this is really difficult to test, try to assert something more, like the message or something so we do not hide any failures that we do not want.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense. I removed the unnecessary check and just used self.client_stubber.assert_no_pending_responses() like we have in our other test files (ex. test_end_to_end_profiling.py)

)
logger.info("Reported profile successfully")
return True
except ClientError as error:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please update the documentation of this functions that this creates a PG in the case of using Lamba layers if it throws ResourceNotFoundException? I would be clearer when reading the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!


logger = logging.getLogger(__name__)

HANDLER_ENV_NAME_FOR_CODEGURU = "HANDLER_ENV_NAME_FOR_CODEGURU"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about ending the variable in KEY, or ENV_KEY to be clearer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

if fleet_info_as_map['computeType'] == 'aws_lambda':
handler_name_ = os.environ.get(HANDLER_ENV_NAME_FOR_CODEGURU)
if not handler_name_:
logger.error("Env Variables for CodeGuru Profiler Lambda Layer are not set. Cannot create PG.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this be printed if the customer is using Lambda, but without the layer integration?

If so, we should not log error then.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no this is only when the customer is using Lambda layer. 1-click support is only for the layer I believe.

"""
# https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
python_runtime_identifier_prefix = "python"
execution_environment_env_variable = os.environ.get(AWS_EXECUTION_ENV_KEY)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would be the lambda's runtime to be non-python, but this code to run?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per above comment! Why do we need to know if the execution environment is python or not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah my bad not needed. removed it

except AssertionError:
assert True
assert self.subject.is_profiling_group_created_during_execution is False
assert self.client_stubber.assert_no_pending_responses() is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this warns about no empty line at end of file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, added it!

lambda_one_click_profiling_group_name = "aws-lambda-testLambdaName"
test_agent_metadata_for_lambda = AgentMetadata(
fleet_info=AWSLambda("arn:aws:lambda:us-east-1:111111111111:function:testLambdaName", "memory", "env", "agentId"))
test_lambda_profiling_group_name = "aws-lambda-testLambdaName"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be renamed to something clearer e.g. autocreated_profiling_group_name_lambda or something like it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, done!

self.reporter.refresh_configuration()

def _report_profile(self, now):
current_last_report_attempted_value = self.last_report_attempted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: replace current with previous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Comment on lines 180 to 184
try:
self.client_stubber.assert_no_pending_responses()
assert False
except AssertionError:
assert True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it, client_stubber.assert_no_pending_responses() should throw exception. What's the usecase? If this is really difficult to test, try to assert something more, like the message or something so we do not hide any failures that we do not want.

def test_when_backends_sends_resource_not_found_it_stops_the_profiling_in_non_lambda_case(self):
self.client_stubber.add_client_error('configure_agent', service_error_code='ResourceNotFoundException',
service_message='Simulated error in configure_agent call')
os.environ.__setitem__(AWS_EXECUTION_ENV_KEY, 'AWS_Lambda_java')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test is about non-lambda, though this is set to lambda java. Either update the test name, either remove this line? (also related to my questions why are we considering there the java lambda usecase; I think it can't happen, and this should be removed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed this!

Comment on lines 84 to 85
logger.info("ResourceNotFoundException for a Lambda PG in ConfigureAgent call. "+
"Attempting to create PG")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change logging message to this :

"Profiling group not found. Will try to create a profiling group with name = %s and compute platform = %s and retry calling configure agent after 5 minutes"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self._log_request_failed(operation="configure_agent", exception=error)
self._log_request_failed(operation="configure_agent", exception=error)

if error.response['Error']['Code'] == 'ResourceNotFoundException':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question here : What configuration will be used in case of ResourceNotFoundException and ValidationException.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default configuration defined here


logger = logging.getLogger(__name__)

HANDLER_ENV_NAME_FOR_CODEGURU_KEY = "HANDLER_ENV_NAME_FOR_CODEGURU"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This env var key is already defined in lambda_handler.py. Can we use that instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, fixed!

Lambda layers are set, it tries to create a Profiling Group whenever a ResourceNotFoundException
is encountered.
"""
global is_create_pg_called_during_submit_profile
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you defining it global here? Can't you do that at line #22?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed global altogether. was using a class variable so global was not needed!

@with_timer("createProfilingGroup", measurement="wall-clock-time")
def create_profiling_group(self):
"""
Create a PG for the Lambda function onboarded with 1-click integration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change this comment. This will also be done if the customer doesn't use the code change option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes done

)
self.is_profiling_group_created_during_execution = True
logger.info("Created Lambda Profiling Group with name " + str(self.profiling_group_name))
except Exception as e:
Copy link
Contributor

@pandpara pandpara Mar 31, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle ConflictException too by logging an appropriate message and setting is_profiling_group_created_during_execution if required.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

self._log_request_failed(operation="create_profiling_group", exception=e)

def should_autocreate_profiling_group(self):
return self.is_runtime_python() and self.is_handler_name_env_variable_set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to know if it is python or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think checking for environment variable is the correct way. Instead we can have class variable computePlatform and the default value should be Default and when the lambda profile builder is called we should create the SdkReported with AWSLambda compute platform.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want to know if it is python or not?

fixed it. not needed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't think checking for environment variable is the correct way. Instead we can have class variable computePlatform and the default value should be Default and when the lambda profile builder is called we should create the SdkReported with AWSLambda compute platform.

Whenever we call build_profiler while creating a Lambda profiler, we set should_autocreate_profiling_group to True so that takes care of it.

profilingGroupName=self.profiling_group_name,
computePlatform='AWSLambda'
)
self.is_profiling_group_created_during_execution = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any usage of is_profiling_group_created_during_execution . Can you explain why are we having this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah was using it before. Removed it!

except Exception as e:
self._log_request_failed(operation="create_profiling_group", exception=e)

def is_compute_platform_lambda(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a wrapper method should_auto_create_profiling_group which checks if the compute platform is lambda for now. Use that wrapper method while deciding to call create_profiling_group in report and refresh_configuration methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@pandpara pandpara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@dangmaul-amazon dangmaul-amazon merged commit fac3bd6 into aws:main Apr 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants