Track docker errors in compatibility server#234
Track docker errors in compatibility server#234ylil93 merged 26 commits intoGoogleCloudPlatform:masterfrom
Conversation
brianquinlan
left a comment
There was a problem hiding this comment.
So this will be an aggregate error measure and you'll break out the details in a follow-up PR?
8167f87 to
88471c7
Compare
Sorry I'm not sure I understand what you mean by this? |
The goal here is to provide monitoring that is as actionable as possible. So it would be useful to be able to distinguish between different error types (e.g. container not found [=> maybe bad load balancer configuration], timeout during container start [=> maybe not enough I/O bandwidth]) and what action triggered the error. I don't know what the best way to do that with...maybe look at tags? (https://opencensus.io/tag/) |
cd6ce03 to
39d6565
Compare
| # See the License for the specific language governing permissions and | ||
| # limitations under the License. | ||
|
|
||
| from opencensus.stats import aggregation as aggregation_module |
There was a problem hiding this comment.
Give this file, I'd say why not put Stats here as well and have everyone use the same global. But this is OK too.
There was a problem hiding this comment.
The goal of this file was to have a place to define new metrics (view objects), so I think Stats doesn't necessarily need to be here.
There was a problem hiding this comment.
The current separation of responsibilities is kind of odd now.
All of the view are defined here but they are registered in compatibility_checker_server.py.
If you moved Stats here then you could write:
STATS = stats_module.Stats()
...
DOCKER_ERROR_VIEW = view_module.View(...)
STATS.view_manager.register_view(DOCKER_ERROR_VIEW)There was a problem hiding this comment.
I think Stats needs to stay in compatibility_checker_server.py because it's necessary for calling _enable_exporter() which needs to stay in that file as well, and calling that function would look super weird if Stats were in views.py
55b2e0d to
b1f25f2
Compare
…cking for docker errors
- break up _enable_metrics() into _enable_exporter() and _get_project_id() - move those functions into compatibility_checker_server to avoid calling more than once - move view/custom metric definitions into separate file
b1f25f2 to
b47bfb7
Compare
| CMD cd compatibility_checker && \ | ||
| gunicorn -b 0.0.0.0:8888 -w 10 --worker-class gevent --max-requests 20 --max-requests-jitter 10 --timeout 300 \ | ||
| compatibility_checker_server:app | ||
| compatibility_checker_server:app --export_metrics |
There was a problem hiding this comment.
This is not working... We cannot pass the application args like this, see https://stackoverflow.com/questions/8495367/using-additional-command-line-arguments-with-gunicorn
| import wsgiref.simple_server | ||
|
|
||
| import pip_checker | ||
| import views |
There was a problem hiding this comment.
Where does this module come from? The docker run fails at this line...
There was a problem hiding this comment.
Oh I see, you will need to add another line in Dockerfile ADD views.py /compatibility_checker to add that file into the docker image.
And could you test the docker build and docker run locally? The commands are in the README.rst file: https://github.com/GoogleCloudPlatform/cloud-opensource-python/tree/master/compatibility_server#commands-for-deployment
There was a problem hiding this comment.
Ok I will submit a new PR to address this.
However, it also looks like there is a gap in our testing as I got all tests passing at the time of merge.
There was a problem hiding this comment.
That tests didn't include the docker build and run steps, which is done manually in each deployment. We could add that in the future, but this requires changing our CircleCI to run in a machine instead of running based on a docker... which won't be able to build a inner docker image easily.
Introduce a custom metric using opencensus and infrastructure for more custom metrics