Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move node type from network address to service level[Need storage module update] #2051

Merged
merged 8 commits into from
Dec 17, 2018

Conversation

wu-sheng
Copy link
Member

Update in this pull request

  1. Rename src_layer to node_type for NetworkAddressInventory. Break change, need discussion. Is this acceptable? @peng-yongsheng It will benefit for long term.
  2. Add node_type for ServiceInventory.
  3. Add NodeType for backend code use. spanLayer would not be used in backend codes, except the trace analysis. We need to do more refactor to avoid using enum value of protocol level in our storage.

@wu-sheng wu-sheng added the backend OAP backend related. label Dec 15, 2018
@wu-sheng wu-sheng added this to the 6.0.0-GA milestone Dec 15, 2018
@wu-sheng
Copy link
Member Author

wu-sheng commented Dec 15, 2018

This is my proposal mail thread. Consider this PR is step 1, Inventory changes (PROPOSAL ONLY).


I am thinking about SQL and Database analysis solution for a whole. Here are some my thoughts and proposal. Feedback are welcome.
Also, because I am not working on implementing this feature, all these are suggestions.

Inventory changes

All metadata need register in SkyWalking. So Database instances do the same. Right now, we provide a virtual service metadata for each database instance.
So we need following adjustments.

  1. network_address_inventory src_layer should be renamed as type. I know we want to keep upgrade compatible, this is just a suggestion. Because SrcLayer is hard to understand for people.
  2. service_instance_inventory and service_inventory add field type to bring network_address_inventory's type to higher level.

SQL Register

SQL statements are tons in real cluster, we can't and needn't to register and do for all of them.
First of all, we just register slow SQL statement, such as slower than 10ms? (Could provide settings for this).
Also, keep register number has a limit, such as, for each collector instance, would do slow SQL statement register over 10K times in 30mins.
Then, because register is async way, we could have two solution

  1. Use file buffer, waiting for register finished. Like mesh and trace data register. Advantage, one time slow SQL could be catch. Disadvantage, not quickly, especially the application codes are bad, don't use prepare statement.
  2. Just send the register, and don't wait. Ignore this time access. Disadvantage, clearly, some first times slow SQL access missed. Delay the opportunities to address slow SQL.
    We could discuss about these two.

Source

DBAccess source should be easy, include, db service id, latency, timestamp, status.
SQLStatement sources could be in two

  1. ServiceSQLStatement, use service id + sql id as entity id, and include latency, timestamp, status.
  2. EndpointSQLStatement, use endpoint id + sql id as entity id, and include latency, timestamp, status.

Query and UI

  1. Show Database service list
  2. Show metric for database cpm, SLA, avg response time, even P99/95...
  3. Show topN slow SQL from ServiceSQLStatement by aggregation query in database query page.
  4. Show topN slow SQL from EndpointSQLStatement with Endpoint related info attached in database query page.

Here are all in my mind.

@coveralls
Copy link

coveralls commented Dec 15, 2018

Coverage Status

Coverage decreased (-0.01%) to 14.306% when pulling 978b4d8 on nodetype into 2840f8e on master.

@wu-sheng
Copy link
Member Author

@liuhaoyang Please consider the proposal in ml and here(same) is just suggestion. I am doing this pull request just to try to make service and address inventory fixed. That is all.

All things about SQL register, analysis, and visualization are still depending on your leadership.

@wu-sheng
Copy link
Member Author

I am moving to adjust Zipkin, Jaeger and OC format analysis :)

@peng-yongsheng
Copy link
Member

Maybe we should better discuss the SQL and database analysis in a specific issue or pull request.

The source of layer is agent. Where is it from? Is there a specification that defined? Opentracing? TracingContext?

@wu-sheng
Copy link
Member Author

Which name? Layer? That is from tracing field. Adrian gave that name. But only make sense in trace, not inventory. Right?

@peng-yongsheng
Copy link
Member

All right. I'm not good at the name definition. This break change is not difficult for users to upgrade backend to the new edition. A database script could fixed it.

peng-yongsheng
peng-yongsheng previously approved these changes Dec 16, 2018
@wu-sheng wu-sheng mentioned this pull request Dec 16, 2018
3 tasks
@wu-sheng
Copy link
Member Author

@peng-yongsheng @liuhaoyang Tests are done. Look like working now.

@wu-sheng wu-sheng changed the title Move node type from network address to service level Move node type from network address to service level[Need storage module update] Dec 17, 2018
@wu-sheng wu-sheng merged commit c788469 into master Dec 17, 2018
@wu-sheng wu-sheng deleted the nodetype branch December 17, 2018 17:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend OAP backend related.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants