Add create_profiling_group call in refresh_configuration and report() #26

dangmaul-amazon · 2021-03-25T09:40:50Z

Issue #, if available:

Description of changes:
As part of Lambda 1-click integration, we want the agent to create a profiling group for the user if it does not exist. To do this, we add a create_profiling_group API call in places where we could receive a ResourceNotFoundException for the PG.

Testing: Pytest unit tests.
Tested on a sample app in my AWS account.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

As part of Lambda 1-click integration, we want the agent to create a profiling group for the user if it does not exist. To do this, we add a create_profiling_group API call in places where we could receive a ResourceNotFoundException for the PG.

mirelap-amazon · 2021-03-25T11:09:14Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

        self.metadata = environment["agent_metadata"]
        self.agent_config_merger = environment["agent_config_merger"]
+        self.is_lambda_one_click_pg_created_during_execution = False
+        # self.is_create_pg_called_during_submit_profile = False


I suppose you want to keep only one of this is_create_pg_called_during_submit_profile and remove the other.

Besides having all these flags for the Lambda usecase, how about creating a PG (if it doesn't exist already) in the Lambda decorator only? https://github.com/aws/amazon-codeguru-profiler-python-agent/blob/main/codeguru_profiler_agent/aws_lambda/profiler_decorator.py

I suppose you want to keep only one of this is_create_pg_called_during_submit_profile and remove the other.

Good catch! Forgot to remove this. Fixed in the latest commit

Besides having all these flags for the Lambda usecase, how about creating a PG (if it doesn't exist already) in the Lambda decorator only? https://github.com/aws/amazon-codeguru-profiler-python-agent/blob/main/codeguru_profiler_agent/aws_lambda/profiler_decorator.py

We could use that approach but we do not want the initialization of the decorator to depend on the API calls. If let's say we do that, then we will have to check whether a PG has been created already by using DescribeProfilingGroup API and then conditionally call CreateProfilingGroup API. Then, in this case the load on control plane API describePG will be huge if the lambda scales to lets say 10000 instances. Another variation of this would be to call CreateProfilingGroup API unconditionally, in which case the load on CreatePG will be huge. Instead if we depend on refresh_configuration() or report() method calls, then we automatically get the functionality where if PG is not present already then we get ResourceNotFoundException, and we get the functionality where we will get this exception only once and we will call CreatePG only once.

Thank you for the explanation, good point!

mirelap-amazon · 2021-03-25T14:03:11Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+                    global is_create_pg_called_during_submit_profile
+                    is_create_pg_called_during_submit_profile = True
+                    self.create_pg_when_one_click_integration_is_active()
+            self._log_request_failed(operation="post_agent_profile", exception=error)


I would say to not log exception when we create the PG, but log something at INFO level that we are creating one PG.

Why is this throwing ClientError? also the operation should be configure_agent.

I would say to not log exception when we create the PG, but log something at INFO level that we are creating one PG.

Changed in the next commit.

Because both ConfigureAgent and PostAgentProfile throw a ResourceNotFoundException. So if we fail to create a PG during the first refresh_configuration() call and the agent defaults to a default configuration, we can still attempt to create a PG during the report() call.

mirelap-amazon · 2021-03-29T17:19:17Z

test/unit/test_local_aggregator.py

 from codeguru_profiler_agent.utils.time import current_milli_time
 from test.pytestutils import before
-from mock import MagicMock, call
+from mock import MagicMock, call, patch


Is this new "patch" used?

Nope removed it! :)

mirelap-amazon · 2021-03-29T17:23:20Z

test/unit/sdk_reporter/test_sdk_reporter.py

+            try:
+                self.client_stubber.assert_no_pending_responses()
+                assert False
+            except AssertionError:
+                assert True


What is the usecase when the failure of the assertions should not be considered as a failure? I think this is the equivalent of not having any assertions, so if it's not reliable or useful, we can just delete it.

I got it, client_stubber.assert_no_pending_responses() should throw exception. What's the usecase? If this is really difficult to test, try to assert something more, like the message or something so we do not hide any failures that we do not want.

Ah makes sense. I removed the unnecessary check and just used self.client_stubber.assert_no_pending_responses() like we have in our other test files (ex. test_end_to_end_profiling.py)

mirelap-amazon · 2021-03-29T17:30:00Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

            )
            logger.info("Reported profile successfully")
            return True
+        except ClientError as error:


Could you please update the documentation of this functions that this creates a PG in the case of using Lamba layers if it throws ResourceNotFoundException? I would be clearer when reading the code.

mirelap-amazon · 2021-03-29T17:32:07Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py


 logger = logging.getLogger(__name__)
-
+HANDLER_ENV_NAME_FOR_CODEGURU = "HANDLER_ENV_NAME_FOR_CODEGURU"


How about ending the variable in KEY, or ENV_KEY to be clearer?

mirelap-amazon · 2021-03-29T17:42:50Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+            if fleet_info_as_map['computeType'] == 'aws_lambda':
+                handler_name_ = os.environ.get(HANDLER_ENV_NAME_FOR_CODEGURU)
+                if not handler_name_:
+                    logger.error("Env Variables for CodeGuru Profiler Lambda Layer are not set. Cannot create PG.")


Would this be printed if the customer is using Lambda, but without the layer integration?

https://docs.aws.amazon.com/codeguru/latest/profiler-ug/python-lambda-command-line.html

If so, we should not log error then.

no this is only when the customer is using Lambda layer. 1-click support is only for the layer I believe.

mirelap-amazon · 2021-03-31T09:18:24Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+        """
+        # https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html
+        python_runtime_identifier_prefix = "python"
+        execution_environment_env_variable = os.environ.get(AWS_EXECUTION_ENV_KEY)


What would be the lambda's runtime to be non-python, but this code to run?

As per above comment! Why do we need to know if the execution environment is python or not?

yeah my bad not needed. removed it

mirelap-amazon · 2021-03-31T09:20:43Z

test/unit/sdk_reporter/test_sdk_reporter.py

-            except AssertionError:
-                assert True
+            assert self.subject.is_profiling_group_created_during_execution is False
+            assert self.client_stubber.assert_no_pending_responses() is None


Minor: this warns about no empty line at end of file.

sorry, added it!

mirelap-amazon · 2021-03-31T09:22:09Z

test/unit/sdk_reporter/test_sdk_reporter.py

-lambda_one_click_profiling_group_name = "aws-lambda-testLambdaName"
-test_agent_metadata_for_lambda = AgentMetadata(
-    fleet_info=AWSLambda("arn:aws:lambda:us-east-1:111111111111:function:testLambdaName", "memory", "env", "agentId"))
+test_lambda_profiling_group_name = "aws-lambda-testLambdaName"


Could this be renamed to something clearer e.g. autocreated_profiling_group_name_lambda or something like it?

mirelap-amazon · 2021-03-31T16:44:48Z

codeguru_profiler_agent/local_aggregator.py

        self.reporter.refresh_configuration()

    def _report_profile(self, now):
+        current_last_report_attempted_value = self.last_report_attempted


Minor: replace current with previous?

mirelap-amazon · 2021-03-31T17:11:11Z

test/unit/sdk_reporter/test_sdk_reporter.py

+            try:
+                self.client_stubber.assert_no_pending_responses()
+                assert False
+            except AssertionError:
+                assert True


I got it, client_stubber.assert_no_pending_responses() should throw exception. What's the usecase? If this is really difficult to test, try to assert something more, like the message or something so we do not hide any failures that we do not want.

mirelap-amazon · 2021-03-31T17:14:31Z

test/unit/sdk_reporter/test_sdk_reporter.py

+    def test_when_backends_sends_resource_not_found_it_stops_the_profiling_in_non_lambda_case(self):
        self.client_stubber.add_client_error('configure_agent', service_error_code='ResourceNotFoundException',
                                             service_message='Simulated error in configure_agent call')
+        os.environ.__setitem__(AWS_EXECUTION_ENV_KEY, 'AWS_Lambda_java')


The test is about non-lambda, though this is set to lambda java. Either update the test name, either remove this line? (also related to my questions why are we considering there the java lambda usecase; I think it can't happen, and this should be removed).

fixed this!

pandpara · 2021-03-31T18:49:04Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+                    logger.info("ResourceNotFoundException for a Lambda PG in ConfigureAgent call. "+
+                                "Attempting to create PG")


Change logging message to this :

"Profiling group not found. Will try to create a profiling group with name = %s and compute platform = %s and retry calling configure agent after 5 minutes"

pandpara · 2021-03-31T18:49:47Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

-            self._log_request_failed(operation="configure_agent", exception=error)
+                self._log_request_failed(operation="configure_agent", exception=error)
+
+            if error.response['Error']['Code'] == 'ResourceNotFoundException':


Question here : What configuration will be used in case of ResourceNotFoundException and ValidationException.

The default configuration defined here

pandpara · 2021-03-31T18:52:01Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py


 logger = logging.getLogger(__name__)
-
+HANDLER_ENV_NAME_FOR_CODEGURU_KEY = "HANDLER_ENV_NAME_FOR_CODEGURU"


This env var key is already defined in lambda_handler.py. Can we use that instead?

yes, fixed!

pandpara · 2021-03-31T18:53:34Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+        Lambda layers are set, it tries to create a Profiling Group whenever a ResourceNotFoundException
+        is encountered.
        """
+        global is_create_pg_called_during_submit_profile


Why are you defining it global here? Can't you do that at line #22?

removed global altogether. was using a class variable so global was not needed!

pandpara · 2021-03-31T18:54:27Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+    @with_timer("createProfilingGroup", measurement="wall-clock-time")
+    def create_profiling_group(self):
+        """
+        Create a PG for the Lambda function onboarded with 1-click integration


Change this comment. This will also be done if the customer doesn't use the code change option?

pandpara · 2021-03-31T18:55:01Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+            )
+            self.is_profiling_group_created_during_execution = True
+            logger.info("Created Lambda Profiling Group with name " + str(self.profiling_group_name))
+        except Exception as e:


Handle ConflictException too by logging an appropriate message and setting is_profiling_group_created_during_execution if required.

pandpara · 2021-03-31T18:56:35Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+            self._log_request_failed(operation="create_profiling_group", exception=e)
+
+    def should_autocreate_profiling_group(self):
+        return self.is_runtime_python() and self.is_handler_name_env_variable_set()


Why do you want to know if it is python or not?

Also, I don't think checking for environment variable is the correct way. Instead we can have class variable computePlatform and the default value should be Default and when the lambda profile builder is called we should create the SdkReported with AWSLambda compute platform.

Why do you want to know if it is python or not?

fixed it. not needed

Also, I don't think checking for environment variable is the correct way. Instead we can have class variable computePlatform and the default value should be Default and when the lambda profile builder is called we should create the SdkReported with AWSLambda compute platform.

Whenever we call build_profiler while creating a Lambda profiler, we set should_autocreate_profiling_group to True so that takes care of it.

pandpara · 2021-04-01T16:55:18Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+                profilingGroupName=self.profiling_group_name,
+                computePlatform='AWSLambda'
+            )
+            self.is_profiling_group_created_during_execution = True


I don't see any usage of is_profiling_group_created_during_execution . Can you explain why are we having this?

Yeah was using it before. Removed it!

pandpara · 2021-04-01T17:08:18Z

codeguru_profiler_agent/sdk_reporter/sdk_reporter.py

+        except Exception as e:
+            self._log_request_failed(operation="create_profiling_group", exception=e)
+
+    def is_compute_platform_lambda(self):


can you add a wrapper method should_auto_create_profiling_group which checks if the compute platform is lambda for now. Use that wrapper method while deciding to call create_profiling_group in report and refresh_configuration methods.

…uld be autocreated

pandpara

LGTM!

dangmaul-amazon requested a review from mirelap-amazon March 25, 2021 09:40

mirelap-amazon suggested changes Mar 25, 2021

View reviewed changes

mirelap-amazon requested a review from a team March 25, 2021 15:30

mirelap-amazon assigned dangmaul-amazon Mar 25, 2021

mirelap-amazon self-requested a review March 25, 2021 15:35

Update sdk_reporter.py

538f06b

mirelap-amazon suggested changes Mar 29, 2021

View reviewed changes

mirelap-amazon reviewed Mar 29, 2021

View reviewed changes

dangmaul-amazon added 2 commits March 30, 2021 22:27

Fixes and addressing comments

bc2d398

Add a note to the docstring for refresh_configuration() and report()

ee6d377

mirelap-amazon reviewed Mar 31, 2021

View reviewed changes

pandpara reviewed Mar 31, 2021

View reviewed changes

Minor fixes for 1-click integration

4d23afc

mirelap-amazon approved these changes Apr 1, 2021

View reviewed changes

pandpara reviewed Apr 1, 2021

View reviewed changes

Removing unused variable and adding a wrapper to check if profile sho…

b444e4f

…uld be autocreated

pandpara approved these changes Apr 1, 2021

View reviewed changes

dangmaul-amazon merged commit fac3bd6 into aws:main Apr 1, 2021


		logger = logging.getLogger(__name__)

		HANDLER_ENV_NAME_FOR_CODEGURU = "HANDLER_ENV_NAME_FOR_CODEGURU"

		logger.info("ResourceNotFoundException for a Lambda PG in ConfigureAgent call. "+
		"Attempting to create PG")


		logger = logging.getLogger(__name__)

		HANDLER_ENV_NAME_FOR_CODEGURU_KEY = "HANDLER_ENV_NAME_FOR_CODEGURU"

Add create_profiling_group call in refresh_configuration and report() #26

Add create_profiling_group call in refresh_configuration and report() #26

Uh oh!

Conversation

dangmaul-amazon commented Mar 25, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

dangmaul-amazon commented Mar 25, 2021 •

edited

Loading

pandpara Mar 31, 2021 •

edited

Loading