This repository was archived by the owner on Oct 21, 2022. It is now read-only.

Track docker errors in compatibility server #234

Merged
ylil93 merged 26 commits into GoogleCloudPlatform:master from ylil93:opencensus_metrics
Feb 15, 2019

Conversation

Contributor

@ylil93 ylil93 commented Jan 30, 2019

Introduce a custom metric using opencensus and infrastructure for more custom metrics

Contributor

@brianquinlan brianquinlan left a comment

So this will be an aggregate error measure and you'll break out the details in a follow-up PR?

Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
@ylil93 ylil93 force-pushed the opencensus_metrics branch from 8167f87 to 88471c7 on January 30, 2019 19:49
Contributor Author

ylil93 commented Jan 30, 2019

@brianquinlan

> So this will be an aggregate error measure and you'll break out the details in a follow-up PR?

Sorry I'm not sure I understand what you mean by this?
For this PR, I'm creating and tracking a metric for docker errors. I plan on defining and tracking more errors in future PRs.

@brianquinlan
Contributor

> @brianquinlan
>
> > So this will be an aggregate error measure and you'll break out the details in a follow-up PR?
>
> Sorry I'm not sure I understand what you mean by this?
> For this PR, I'm creating and tracking a metric for docker errors. I plan on defining and tracking more errors in future PRs.

The goal here is to provide monitoring that is as actionable as possible. So it would be useful to be able to distinguish between different error types (e.g. container not found [=> maybe bad load balancer configuration], timeout during container start [=> maybe not enough I/O bandwidth]) and what action triggered the error.

I don't know what the best way to do that is... maybe look at tags? (https://opencensus.io/tag/)
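To make the idea of breaking an error count out by tags concrete, here is a minimal plain-Python sketch. It is not the OpenCensus API: `classify_docker_error`, `TaggedCounter`, and the tag values are all hypothetical stand-ins, and the message patterns are assumptions about what Docker errors might look like. With OpenCensus itself, the tag keys would be listed as view columns and the values attached when a measurement is recorded.

```python
import collections


def classify_docker_error(message):
    """Map a raw error message to a coarse, actionable error type.

    The patterns and type names are illustrative assumptions only.
    """
    text = message.lower()
    if "not found" in text or "no such container" in text:
        return "container_not_found"
    if "timed out" in text or "timeout" in text:
        return "container_start_timeout"
    return "other"


class TaggedCounter:
    """Counts events per (error_type, action) tag combination."""

    def __init__(self):
        self._counts = collections.Counter()

    def record(self, error_type, action):
        self._counts[(error_type, action)] += 1

    def count(self, error_type, action):
        return self._counts[(error_type, action)]


# One aggregate metric, broken out by the error type and the triggering action.
docker_errors = TaggedCounter()
docker_errors.record(classify_docker_error("No such container: abc123"), "pip_install")
docker_errors.record(classify_docker_error("Read timed out"), "container_start")
docker_errors.record(classify_docker_error("Read timed out"), "container_start")
```

Keeping a single counter with two tag dimensions means a dashboard can show the aggregate error rate and still be filtered down to, say, timeouts during container start.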

Comment thread compatibility_server/requirements.txt Outdated
@ylil93 ylil93 force-pushed the opencensus_metrics branch from cd6ce03 to 39d6565 on February 5, 2019 02:24
Comment thread compatibility_server/compatibility_checker_server.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/run-in-docker.sh Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/run-in-docker.sh Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py Outdated
Comment thread compatibility_server/pip_checker.py
Comment thread compatibility_server/views.py Outdated
# See the License for the specific language governing permissions and
# limitations under the License.

from opencensus.stats import aggregation as aggregation_module
Contributor

Given this file, I'd say why not put Stats here as well and have everyone use the same global. But this is OK too.

Contributor Author

The goal of this file was to have a place to define new metrics (view objects), so I think Stats doesn't necessarily need to be here.

Contributor

The current separation of responsibilities is kind of odd now.

All of the views are defined here but they are registered in compatibility_checker_server.py.

If you moved Stats here then you could write:

STATS = stats_module.Stats()
...
DOCKER_ERROR_VIEW = view_module.View(...)
STATS.view_manager.register_view(DOCKER_ERROR_VIEW)

Contributor Author

I think Stats needs to stay in compatibility_checker_server.py because it's needed to call _enable_exporter(), which has to stay in that file as well. Calling that function would look very odd if Stats were in views.py.

@ylil93 ylil93 force-pushed the opencensus_metrics branch from 55b2e0d to b1f25f2 on February 15, 2019 20:00
- break up _enable_metrics() into _enable_exporter() and _get_project_id()
- move those functions into compatibility_checker_server to avoid calling more than once
- move view/custom metric definitions into separate file
@ylil93 ylil93 force-pushed the opencensus_metrics branch from b1f25f2 to b47bfb7 on February 15, 2019 23:41
@ylil93 ylil93 merged commit 24ad875 into GoogleCloudPlatform:master Feb 15, 2019
CMD cd compatibility_checker && \
gunicorn -b 0.0.0.0:8888 -w 10 --worker-class gevent --max-requests 20 --max-requests-jitter 10 --timeout 300 \
-  compatibility_checker_server:app
+  compatibility_checker_server:app --export_metrics
Member

This is not working... We cannot pass the application args like this, see https://stackoverflow.com/questions/8495367/using-additional-command-line-arguments-with-gunicorn
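One common way around this, sketched below under assumptions, is to read the setting from the process environment instead of the command line, since gunicorn parses the command line itself. The variable name `EXPORT_METRICS` and the helper `export_metrics_enabled` are hypothetical, and the tiny WSGI `app` only stands in for the real `compatibility_checker_server:app`.

```python
import os

# Hypothetical variable name; the PR itself used an --export_metrics flag.
EXPORT_METRICS_ENV = "EXPORT_METRICS"


def export_metrics_enabled(environ=None):
    """Return True when the environment asks for metrics to be exported."""
    if environ is None:
        environ = os.environ
    value = environ.get(EXPORT_METRICS_ENV, "")
    return value.strip().lower() in ("1", "true", "yes")


def app(environ, start_response):
    """Minimal WSGI app standing in for compatibility_checker_server:app."""
    # The flag comes from the process environment, not the WSGI request environ.
    body = b"metrics on" if export_metrics_enabled() else b"metrics off"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]
```

The variable could then be set with an `ENV` line in the Dockerfile or with gunicorn's `--env` option, leaving the `compatibility_checker_server:app` argument unchanged.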

import wsgiref.simple_server

import pip_checker
import views
Member

Where does this module come from? The docker run fails at this line...

Member

@liyanhui1228 liyanhui1228 Feb 19, 2019

Oh I see, you will need to add another line to the Dockerfile, ADD views.py /compatibility_checker, to add that file to the Docker image.

And could you test the docker build and docker run locally? The commands are in the README.rst file: https://github.com/GoogleCloudPlatform/cloud-opensource-python/tree/master/compatibility_server#commands-for-deployment

Contributor Author

@ylil93 ylil93 Feb 19, 2019

Ok I will submit a new PR to address this.

However, it also looks like there is a gap in our testing as I got all tests passing at the time of merge.

Member

Those tests didn't include the docker build and run steps, which are done manually in each deployment. We could add that in the future, but it would require changing our CircleCI setup to run on a machine instead of inside a Docker container, since a Docker-based job can't easily build an inner Docker image.
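Short of building the image in CI, a cheaper check could catch this particular failure: compare the local modules the server imports against the files the Dockerfile adds. The sketch below is an assumption-laden illustration, not part of the project. The function names are hypothetical, the Dockerfile text is made up, and only simple three-token `ADD`/`COPY` lines are recognized.

```python
import ast


def imported_modules(source):
    """Return the top-level module names imported by Python source code."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def added_files(dockerfile_text):
    """Return the source names from single-file 'ADD src dest' or 'COPY src dest' lines."""
    files = set()
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].upper() in ("ADD", "COPY"):
            files.add(parts[1])
    return files


def missing_from_image(server_source, dockerfile_text, local_modules):
    """List local modules the server imports that the Dockerfile never adds."""
    needed = imported_modules(server_source) & set(local_modules)
    present = added_files(dockerfile_text)
    return sorted(m for m in needed if m + ".py" not in present)


# Imports taken from the diff above; the Dockerfile line is a made-up example.
server_source = "import wsgiref.simple_server\n\nimport pip_checker\nimport views\n"
dockerfile = "ADD pip_checker.py /compatibility_checker\n"
print(missing_from_image(server_source, dockerfile, ["pip_checker", "views"]))
# prints ['views']
```

A unit test asserting that this list is empty would run in the existing Docker-based CI job without needing to build an image.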


4 participants