Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add OSM logs collector #48

Merged
merged 1 commit into from Jun 16, 2021
Merged

feat: add OSM logs collector #48

merged 1 commit into from Jun 16, 2021

Conversation

SanyaKochhar
Copy link
Contributor

@SanyaKochhar SanyaKochhar commented Apr 28, 2021

This PR adds collectors for OSM and SMI. SMI collector can be broken out into a separate PR as well.

  • Flags: in aks-periscope.yaml, setting flags to OSM turns on both osm and smi collectors. SMI flag turns on just smi collector
  • osm_collector collects the following for each Open Service Mesh present on the cluster:
    • ground truth - list/configs of all resources and mutating and validating webhook configurations
    • for each osm control plan namespace - pod configs and logs of OSM controller and OSM injector pods
    • for each namespace monitored by osm: metadata, services, endpoints, service accounts, configmaps, etc.
  • [moving this into a separate PR] smi_collector collects the traffic targets, traffic splits, httproutegroups, tcproutes and udproutes for smi policies being applied.
    • this collector can be used by any service mesh, not just osm

Copy link
Collaborator

@davidkydd davidkydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SanyaKochhar - very interesting!

PS I think you accidentally included a built periscope binary in your commit, which you will probably want to revert (mentioning here as github doesn't support direct comments on binary files)

Copy link
Collaborator

@davidkydd davidkydd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should look to introduce a couple of new core Periscope concepts: resourceCollector, and either meshCollector (which defines the fixed hierarchical relationship represented here) or some more general purpose "chainable" or "heirarchical" functionality for defining how the output of one collector can feed into another. The specific object names and selector strings would then be defined in the yaml rather than baking OSM specifics into the code

pkg/collector/osmlogs_collector.go Outdated Show resolved Hide resolved
pkg/collector/osmlogs_collector.go Outdated Show resolved Hide resolved
pkg/collector/osmlogs_collector.go Outdated Show resolved Hide resolved
pkg/collector/osmlogs_collector.go Outdated Show resolved Hide resolved
pkg/collector/osmlogs_collector.go Outdated Show resolved Hide resolved
Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thank you so much for this, I have added initial thoughts for the key things like flag passing, and what all things you could explore:

few things as a checklist to make sure you have covered existing behaviours are:

✅ Make sure you run attributes of CI. (Adding more clarity: locally atleast initial infra e2-e test are passing and your forked branch ci-workflow is passing)
✅ Tip to locally test changes via image reside here: https://github.com/Azure/aks-periscope#programming-guide
* Above will be a very important scenario to test, and feel free to reach out to us for anything.

✅ If you find any new functionality could be easily extracted to UT (unit test) then feel enabled to do so.
✅ Once we have done all the above, we should have enough confidence to open PR for review.

(At least these are my initial thoughts) I will add more in case I forget anything.

Really appreciate it. @arnaud-tincelin and @JunSun17 Feel free to correct me or add anything else.

cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
pkg/collector/collector.go Outdated Show resolved Hide resolved
pkg/collector/collector.go Show resolved Hide resolved
pkg/utils/helper.go Show resolved Hide resolved
Copy link
Contributor Author

@SanyaKochhar SanyaKochhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks folks for your thorough reviews and comments. Based on this, on our end we have been tracking and working on the following tasks, do let me know if anything is missing.

cc: @davidkydd @Tatsinnit

sophsoph321 pushed a commit to sophsoph321/aks-periscope that referenced this pull request May 7, 2021
# This is the 1st commit message:

Revert "introduce arcmode"

This reverts commit 5f4fed4.

# This is the commit message #2:

remove secrets

# This is the commit message Azure#3:

add print statement

# This is the commit message Azure#4:

update print statement

# This is the commit message Azure#5:

committed

# This is the commit message Azure#6:

committed

# This is the commit message Azure#7:

committed

# This is the commit message Azure#8:

committed

# This is the commit message Azure#9:

remove print statements

# This is the commit message Azure#10:

add helm collector

# This is the commit message Azure#11:

change helm command

# This is the commit message Azure#12:

add helm 3 installation

# This is the commit message Azure#13:

add curl command installation

# This is the commit message Azure#14:

change helm command

# This is the commit message Azure#15:

remove helm history

# This is the commit message Azure#16:

debug helm history

# This is the commit message Azure#17:

add repo command

# This is the commit message Azure#18:

change stable repo name

# This is the commit message Azure#19:

add write to file

# This is the commit message Azure#20:

add kured

# This is the commit message Azure#21:

change

# This is the commit message Azure#22:

changes

# This is the commit message Azure#23:

add default namespace

# This is the commit message Azure#24:

change

# This is the commit message Azure#25:

add integration test

# This is the commit message Azure#26:

changes

# This is the commit message Azure#27:

add helm test

# This is the commit message Azure#28:

change print statement to error

# This is the commit message Azure#29:

change

# This is the commit message Azure#30:

more changes

# This is the commit message Azure#31:

add go installation

# This is the commit message Azure#32:

fix unit test

# This is the commit message Azure#33:

iptables to Helm

# This is the commit message Azure#34:

add custom resource collector

# This is the commit message Azure#35:

add new exporter, diagnoser, collector

# This is the commit message Azure#36:

comment unused variables

# This is the commit message Azure#37:

debug exporter

# This is the commit message Azure#38:

filenames

# This is the commit message Azure#39:

test zip function

# This is the commit message Azure#40:

list files

# This is the commit message Azure#41:

fmt to log

# This is the commit message Azure#42:

delete lines

# This is the commit message Azure#43:

changed

# This is the commit message Azure#44:

get current directory

# This is the commit message Azure#45:

remove some print statements

# This is the commit message Azure#46:

test zip

# This is the commit message Azure#47:

changes

# This is the commit message Azure#48:

add windir check

# This is the commit message Azure#49:

minor fix

# This is the commit message Azure#50:

get hostname

# This is the commit message Azure#51:

add expose in dockerfile

# This is the commit message Azure#52:

add exec collector

# This is the commit message Azure#53:

mitigate exit code 126

# This is the commit message Azure#54:

change curl url from example.com to dp endpoint

# This is the commit message Azure#55:

changes

# This is the commit message Azure#56:

uncomment exec

# This is the commit message Azure#57:

add new diagnoser

# This is the commit message Azure#58:

debugging

# This is the commit message Azure#59:

debug

# This is the commit message Azure#60:

debugging

# This is the commit message Azure#61:

remove print statements

# This is the commit message Azure#62:

remove print

# This is the commit message Azure#63:

add back crd print statement

# This is the commit message Azure#64:

change

# This is the commit message Azure#65:

change

# This is the commit message Azure#66:

update dataPoint name

# This is the commit message Azure#67:

modify forloop

# This is the commit message Azure#68:

add filename to datapoint

# This is the commit message Azure#69:

add back log prints

# This is the commit message Azure#70:

test

# This is the commit message Azure#71:

add fields to diagnostic signal

# This is the commit message Azure#72:

add config content to diagnoser

# This is the commit message Azure#73:

change format from yaml to json

# This is the commit message Azure#74:

add parameters for kubeobject config map

# This is the commit message Azure#75:

Revert "introduce arcmode"

This reverts commit 5f4fed4.

# This is the commit message Azure#76:

fix helm collector style

# This is the commit message Azure#77:

revert changes that test arc customizations

# This is the commit message Azure#78:

fix merge conflicts

# This is the commit message Azure#79:

fix merge conflicts

# This is the commit message Azure#80:

Revert "Add v0.3 acr image for Private cluster fix. (Azure#22)"

This reverts commit 49dd302.

# This is the commit message Azure#81:

fix merge conflicts

# This is the commit message Azure#82:

fix merge conflicts

# This is the commit message Azure#83:

add print statement

# This is the commit message Azure#84:

update print statement

# This is the commit message Azure#85:

committed

# This is the commit message Azure#86:

committed

# This is the commit message Azure#87:

committed

# This is the commit message Azure#88:

committed

# This is the commit message Azure#89:

remove print statements

# This is the commit message Azure#90:

fix merge conflicts

# This is the commit message Azure#91:

fix merge conflicts

# This is the commit message Azure#92:

fix merge conflicts

# This is the commit message Azure#93:

add repo command

# This is the commit message Azure#94:

change stable repo name

# This is the commit message Azure#95:

add write to file

# This is the commit message Azure#96:

add kured

# This is the commit message Azure#97:

change

# This is the commit message Azure#98:

changes

# This is the commit message Azure#99:

add default namespace

# This is the commit message Azure#100:

change

# This is the commit message Azure#101:

add integration test

# This is the commit message Azure#102:

changes

# This is the commit message Azure#103:

add helm test

# This is the commit message Azure#104:

change print statement to error

# This is the commit message Azure#105:

change

# This is the commit message Azure#106:

more changes

# This is the commit message Azure#107:

add go installation

# This is the commit message Azure#108:

fix unit test

# This is the commit message Azure#109:

add custom resource collector

# This is the commit message Azure#110:

fix merge conflicts

# This is the commit message Azure#111:

comment unused variables

# This is the commit message Azure#112:

debug exporter

# This is the commit message Azure#113:

filenames

# This is the commit message Azure#114:

test zip function

# This is the commit message Azure#115:

list files

# This is the commit message Azure#116:

fmt to log

# This is the commit message Azure#117:

delete lines

# This is the commit message Azure#118:

changed

# This is the commit message Azure#119:

get current directory

# This is the commit message Azure#120:

remove some print statements

# This is the commit message Azure#121:

test zip

# This is the commit message Azure#122:

changes

# This is the commit message Azure#123:

add windir check

# This is the commit message Azure#124:

minor fix

# This is the commit message Azure#125:

get hostname

# This is the commit message Azure#126:

add expose in dockerfile

# This is the commit message Azure#127:

add exec collector
@SanyaKochhar SanyaKochhar marked this pull request as ready for review May 20, 2021 21:12
Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thank you for this. I have not done a detailed review because there are some higher level questions this PR needs to cater.

  • I think you guys work with ARC team? if yes please liason with them, as their PR will be merged soon, ARC - PR already adds a feature flag which you could use. Some of my comments are specific to how the logic is constructed around the OSM and SMI flags and the built up.

  • Please remove the aks-periscope Binary which is checked in with this PR.

  • Side note: Is there any internal design doc you guys have for this work? (Something as basic of what OSM collector and SMI collector aim to achieve with what commands which are run)

    • it would help in understanding the nested for loops with in collectors as well etc.

Thank you. 🙏

cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
deployment/aks-periscope.yaml Outdated Show resolved Hide resolved
deployment/aks-periscope.yaml Outdated Show resolved Hide resolved
pkg/collector/collector.go Outdated Show resolved Hide resolved
pkg/collector/smi_collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Show resolved Hide resolved
@SanyaKochhar SanyaKochhar mentioned this pull request May 24, 2021
@SanyaKochhar
Copy link
Contributor Author

@Tatsinnit
Here is a short design doc to explain this work. Hopefully it explains all the loops and helps with the review process. Lmk if you have any questions.

Design

OSM work on AKS Periscope aims to add two new collectors to AKS Periscope that can be optionally enabled with flags when users wish to collect OSM and SMI data. These collectors will allow users to create a snapshot of their OSM instance(s) and SMI policies in their cluster to provide to the OSM team for troubleshooting. This is especially useful so that users do not need to provide access to their cluster for troubleshooting.

OSM Collector

The OSM collector can be turned on using the OSM flag. This will automatically also turn on the SMI collector to collect information about SMI policies being used in the cluster.

The OSM collector will look for all the OSM instances present in the cluster. For each mesh, it will collect the following information:

  • Basic ground truth about the OSM instance
    • Mutating webhook configurations
    • Validating webhook configurations
    • List of all resources present with the label app.kubernetes.io/instance=<mesh name>
    • JSON/configs of all resources present with the label app.kubernetes.io/instance=<mesh name>
  • Relevant data for all namespaces monitored by OSM:
    • For each pod's Envoy sidecar: config dump, clusters, listeners, ready and stats
    • Metadata, services, endpoints, ConfigMaps, ingresses and service accounts
    • List of all pods present and config of each pod
  • Relevant data for all OSM controller namespaces:
    • Logs of each pod in the controller namespace (i.e., logs for osm-controller and osm-injector)
    • Metadata, services, endpoints, ConfigMaps, ingresses and service accounts
    • List of all pods present and config of each pod

SMI Collector

The SMI collector can be enabled using the SMI flag to collect information about Service Mesh Interface policies being used in the cluster for any service mesh, not just OSM. It is, however, automatically turned on if the OSM collector is enabled. The SMI collector collects lists and configs of the following resources:

  • HTTP route groups
  • TCP routes
  • Traffic splits
  • Traffic targets
  • Traffic access
  • UDP routes

Helpers

The following helpers are needed to achieve the above work:

  • utils.GetResourceList() expects a kubectl command including a resource type and namespace. Returns a list of all resources of the given type in the specified namespace
  • utils.RunCommandOnContainerWithOutputStreams() a modified version of utils.RunCommandOnContainer that returns both the stdout and stderr output streams from the given command. This is used by GetResourceList to make sure an error is documented if a resource is not found and thus empty collector files are not erroneously created for resources that do not exist
  • utils.RunBackgroundCommand() starts running a command on a container and returns its process ID. It does not wait for the command to return. This is used to run kubectl port-forward to collect Envoy information
  • utils.KillProcess() is used to kill a process by PID (used in conjunction with background commands)
  • collector.CollectKubectlOutputToCollectorFiles() - expects a kubectl command, rootpath and filename. It runs the command and saves the output to a file in that collector's file directory

@SanyaKochhar
Copy link
Contributor Author

  • I think you guys work with ARC team? if yes please liason with them, as their PR will be merged soon, ARC - PR already adds a feature flag which you could use. Some of my comments are specific to how the logic is constructed around the OSM and SMI flags and the built up.

Makes sense to reuse their flag. I have left a comment on their PR about using a generic flag name that would make sense for us to reuse. Of course we can also change it after their PR is merged. The comments above and design doc hopefully explain the logic/interaction behind the two flags

  • Please remove the aks-periscope Binary which is checked in with this PR.

Accidental addition, removed now!

cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
@Tatsinnit Tatsinnit self-requested a review June 4, 2021 00:17
Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Thank you and looks much better, and great to see how everything has come together, few things I have mentioned in my review.

As original logic with regards to the nitty-gritty for the specific commands for ism collector I believe that you should make sure that you guys are satisfied.

  • Please do test your changes with local image and I will try as well.

Thank you 🙏

cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
deployment/aks-periscope.yaml Outdated Show resolved Hide resolved
pkg/utils/helper.go Outdated Show resolved Hide resolved
pkg/collector/collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Show resolved Hide resolved
pkg/collector/osm_collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Outdated Show resolved Hide resolved
pkg/collector/osm_collector.go Show resolved Hide resolved
cmd/aks-periscope/aks-periscope.go Outdated Show resolved Hide resolved
deployment/aks-periscope.yaml Outdated Show resolved Hide resolved
pkg/utils/helper.go Show resolved Hide resolved
pkg/utils/helper.go Outdated Show resolved Hide resolved
pkg/utils/helper.go Outdated Show resolved Hide resolved
Copy link
Member

@Tatsinnit Tatsinnit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Thank you for this, really appreciate it. Approved it and will await till evening for few more eyes. Thank you @SanyaKochhar for testing them locally as well.

@SanyaKochhar
Copy link
Contributor Author

@arnaud-tincelin thanks for your comments, I've just resolved them. Would appreciate if you could re-review when you get a chance

Signed-off-by: Sanya Kochhar <kochhars@microsoft.com>
Co-authored-by: Johnson Shi <Johnson.Shi@microsoft.com>
Co-authored-by: Shalier Xia <shalierxia@microsoft.com>
@Tatsinnit Tatsinnit merged commit 3776d34 into Azure:master Jun 16, 2021
johnsonshi added a commit to SanyaKochhar/aks-periscope that referenced this pull request Jun 22, 2021
Add log collector for Service Mesh Interface
logs, CustomResourceDefinitions, and custom
resources.

Related to PR Azure#48.

Resolves issue Azure#67.

Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
johnsonshi added a commit to SanyaKochhar/aks-periscope that referenced this pull request Jun 22, 2021
Add log collector for Service Mesh Interface
logs, CustomResourceDefinitions, and custom
resources.

Related to PR Azure#48.

Resolves issue Azure#67.

Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
johnsonshi added a commit to SanyaKochhar/aks-periscope that referenced this pull request Jun 22, 2021
Add log collector for Service Mesh Interface
logs, CustomResourceDefinitions, and custom
resources.

Related to PR Azure#48.

Resolves issue Azure#67.

Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
johnsonshi added a commit to SanyaKochhar/aks-periscope that referenced this pull request Jun 28, 2021
Add log collector for Service Mesh Interface
logs, CustomResourceDefinitions, and custom
resources.

Related to PR Azure#48.

Resolves issue Azure#67.

Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
johnsonshi added a commit to SanyaKochhar/aks-periscope that referenced this pull request Jul 7, 2021
Add log collector for Service Mesh Interface
logs, CustomResourceDefinitions, and custom
resources.

Related to PR Azure#48.

Resolves issue Azure#67.

Signed-off-by: Johnson Shi <Johnson.Shi@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants