Skip to content

add noTraceOrigins option #45

Merged
Fine0830 merged 6 commits intoapache:masterfrom
tianyk:feature/origin-allowlist
Mar 16, 2021
Merged

add noTraceOrigins option #45
Fine0830 merged 6 commits intoapache:masterfrom
tianyk:feature/origin-allowlist

Conversation

@tianyk
Copy link
Contributor

@tianyk tianyk commented Mar 5, 2021

Now, the SDK will intercept all XHR requests and add ws8 request headers. In the cross-domain, the server needs to add headers.

Access-Control-Allow-Headers: ws8

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests

In some cases, we may request third party services. If a third party server does not add the top header, there will be cross-domain problems.

@wu-sheng

Add originAllowlist option, only the request origin on this list will be tracked.

import ClientMonitor from 'skywalking-client-js';

ClientMonitor.setPerformance({
  collector: 'http://127.0.0.1:8080',
  service: 'browser-app',
  serviceVersion: '1.0.0',
  pagePath: location.href,
  useFmp: true,
  noTraceOrigins: ['http://example1.com', /\.example2\.com$/]
});

Copy link
Contributor Author

@tianyk tianyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge

@wu-sheng wu-sheng requested review from Fine0830 and kezhenxu94 March 5, 2021 09:20
@wu-sheng wu-sheng added the enhancement New feature or request label Mar 5, 2021
@kezhenxu94
Copy link
Member

I'm personally +1 to this new option, but wondering what's the reasonable default value to give it, now you didn't give any default value so all requests are not intercepted, right? Maybe set window.location.host as the default value? What do others think?

@wu-sheng
Copy link
Member

wu-sheng commented Mar 5, 2021

I am wondering the case, when do we use this? Such as which domain should we exclude. Usually this agent is being globally set by an app, then why some of them(especially domain name, rather than URI) should not be traced?

@tianyk
Copy link
Contributor Author

tianyk commented Mar 5, 2021

I'm personally +1 to this new option, but wondering what's the reasonable default value to give it, now you didn't give any default value so all requests are not intercepted, right? Maybe set window.location.host as the default value? What do others think?

This is an optional configuration, and it is backward compatible. If it's not configured, it's the same as now. All requests will be traced.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 5, 2021

I am wondering the case, when do we use this? Such as which domain should we exclude. Usually this agent is being globally set by an app, then why some of them(especially domain name, rather than URI) should not be traced?

Suppose we want to under the domain name https://example.com, now we are going to visit an interface under https://openapi.example.cn server.

Access-Control-Allow-Headers: ws8

Now you have a cross-domain situation that requires the openapi.example.cn server to add a response header. If, openapi.example.cn is out of our control(third party services). We can't add this response header. If you do not add it, the cross - domain will fail.

@kezhenxu94
Copy link
Member

I'm personally +1 to this new option, but wondering what's the reasonable default value to give it, now you didn't give any default value so all requests are not intercepted, right? Maybe set window.location.host as the default value? What do others think?

This is an optional configuration, and it is backward compatible. If it's not configured, it's the same as now. All requests will be traced.

OK, then it looks good to me.

@wu-sheng does👇 make sense to you

Now, the SDK will intercept all XHR requests and add ws8 request headers. In the cross-domain, the server needs to add headers.

Access-Control-Allow-Headers: sw8

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests

In some cases, we may request third party services. If a third party server does not add the top header, there will be cross-domain problems.

The point is that some requests to 3rd-party servers cannot add Access-Control-Allow-Headers: sw8

@wu-sheng
Copy link
Member

wu-sheng commented Mar 5, 2021

I'm personally +1 to this new option, but wondering what's the reasonable default value to give it, now you didn't give any default value so all requests are not intercepted, right? Maybe set window.location.host as the default value? What do others think?

This is an optional configuration, and it is backward compatible. If it's not configured, it's the same as now. All requests will be traced.

OK, then it looks good to me.

@wu-sheng does👇 make sense to you

Now, the SDK will intercept all XHR requests and add ws8 request headers. In the cross-domain, the server needs to add headers.
Access-Control-Allow-Headers: sw8
https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests
In some cases, we may request third party services. If a third party server does not add the top header, there will be cross-domain problems.

The point is that some requests to 3rd-party servers cannot add Access-Control-Allow-Headers: sw8

Make sense.

@Fine0830 @arugal Please do code review.

@wu-sheng wu-sheng requested a review from rainbend March 5, 2021 11:24
@wu-sheng wu-sheng added this to the 0.4.0 milestone Mar 5, 2021
@Fine0830
Copy link
Member

Fine0830 commented Mar 6, 2021

There are some suggestions for this. 1. For the SDK name, could you change originAllowlist to NoTraceOrigins and update the corresponding logic? 2. It is fine that the type of originAllowlist is (string | RegExp)[]. 3. the default value of originAllowlist is [].
WDYT? @wu-sheng @kezhenxu94 @tianyk

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2021

There are some suggestions for this. 1. For the SDK name, could you change originAllowlist to NoTraceOrigins and update the corresponding logic? 2. It is fine that the type of originAllowlist is (string | RegExp)[]. 3. the default value of originAllowlist is [].
WDYT? @wu-sheng @kezhenxu94 @tianyk

Use blockList if you want. I am +1 to use blocking rather than allowing mechanism. In this case, empty meaning all available makes more sense to me.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 6, 2021 via email

@kezhenxu94
Copy link
Member

kezhenxu94 commented Mar 6, 2021

There are some suggestions for this. 1. For the SDK name, could you change originAllowlist to NoTraceOrigins and update the corresponding logic? 2. It is fine that the type of originAllowlist is (string | RegExp)[]. 3. the default value of originAllowlist is [].

WDYT? @wu-sheng @kezhenxu94 @tianyk

Use blockList if you want. I am +1 to use blocking rather than allowing mechanism. In this case, empty meaning all available makes more sense to me.

Blocking mechanism requires that you must list all uncontrollable domains, some of which may be unknown, allowlist is more preferable in this case since you know which domains you own / control

@tianyk
Copy link
Contributor Author

tianyk commented Mar 6, 2021 via email

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2021

Why you don't? From my understanding, even you don't know now, you should.

@wu-sheng
Copy link
Member

wu-sheng commented Mar 6, 2021

At least, if you insist you can't, use the regex to do exclusive match.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 6, 2021 via email

@tianyk
Copy link
Contributor Author

tianyk commented Mar 6, 2021 via email

@Fine0830 Fine0830 closed this Mar 6, 2021
@Fine0830 Fine0830 reopened this Mar 6, 2021
@Fine0830
Copy link
Member

Fine0830 commented Mar 6, 2021

Let me give you an example. We used an SDK for a local service, and now we can know the origin of the service they provide. However, the SDK may be upgraded and they may change the service address. The situation is out of your control. For example, an SDK in the form of Google Analytics(Just an example, Google Analytics does not use XHR). In addition, adding extra headers can be dangerous when crossing domains. Services (third) that are not processed across domains will not be available. To make the service more manageable, I suggest whitelisting. 吴晟 Wu Sheng notifications@github.com于2021年3月6日 周六15:07写道:

At least, if you insist you can't, use the regex to do exclusive match. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#45 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ3CNWD6LY4JTS5NXQSPGTTCHID3ANCNFSM4YURSEZQ .

skywalking-client-js is dependent on browser. For the frontend web projects, if you use third-party projects in your own projects, probably, you should use iframe or a router with source codes. You said you didn't know the service address, do you use iframe in your web projects? If so, I have tested, the requests of the third party isn't set sw8 in the header.

test

@tianyk
Copy link
Contributor Author

tianyk commented Mar 6, 2021

Let me give you an example. We used an SDK for a local service, and now we can know the origin of the service they provide. However, the SDK may be upgraded and they may change the service address. The situation is out of your control. For example, an SDK in the form of Google Analytics(Just an example, Google Analytics does not use XHR). In addition, adding extra headers can be dangerous when crossing domains. Services (third) that are not processed across domains will not be available. To make the service more manageable, I suggest whitelisting. 吴晟 Wu Sheng notifications@github.com于2021年3月6日 周六15:07写道:

At least, if you insist you can't, use the regex to do exclusive match. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#45 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ3CNWD6LY4JTS5NXQSPGTTCHID3ANCNFSM4YURSEZQ .

skywalking-client-js is dependent on browser. For the frontend web projects, if you use third-party projects in your own projects, probably, you should use iframe or a router with source codes. You said you didn't know the service address, do you use iframe in your web projects? If so, I have tested, the requests of the third party isn't set sw8 in the header.

test

<script src="https://cdn.jsdelivr.net/npm/skywalking-client-js@0.4.0/lib/index.js"></script>
<script>
	// Comment out and try it here
	ClientMonitor.register({
		collector: 'http://127.0.0.1:8080',
		service: 'test-ui',
		pagePath: '/current/page/name',
		serviceVersion: 'v1.0.0'
	});

	fetch('http://kekek.cc/static/bear.bin');
</script>

@tianyk
Copy link
Contributor Author

tianyk commented Mar 9, 2021

Let me give you an example. We used an SDK for a local service, and now we can know the origin of the service they provide. However, the SDK may be upgraded and they may change the service address. The situation is out of your control. For example, an SDK in the form of Google Analytics(Just an example, Google Analytics does not use XHR). In addition, adding extra headers can be dangerous when crossing domains. Services (third) that are not processed across domains will not be available. To make the service more manageable, I suggest whitelisting. 吴晟 Wu Sheng notifications@github.com于2021年3月6日 周六15:07写道:

At least, if you insist you can't, use the regex to do exclusive match. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#45 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAZ3CNWD6LY4JTS5NXQSPGTTCHID3ANCNFSM4YURSEZQ .

skywalking-client-js is dependent on browser. For the frontend web projects, if you use third-party projects in your own projects, probably, you should use iframe or a router with source codes. You said you didn't know the service address, do you use iframe in your web projects? If so, I have tested, the requests of the third party isn't set sw8 in the header.
test

<script src="https://cdn.jsdelivr.net/npm/skywalking-client-js@0.4.0/lib/index.js"></script>
<script>
	// Comment out and try it here
	ClientMonitor.register({
		collector: 'http://127.0.0.1:8080',
		service: 'test-ui',
		pagePath: '/current/page/name',
		serviceVersion: 'v1.0.0'
	});

	fetch('http://kekek.cc/static/bear.bin');
</script>

@kezhenxu94 @wu-sheng @Fine0830

Is there any progress on this patch?

@kezhenxu94
Copy link
Member

@kezhenxu94 @wu-sheng @Fine0830

Is there any progress on this patch?

Sorry that this PR hasn't been merged for a little long because it seems that we haven't yet reached to a consensus whether we should allowlist / blocklist.

To reiterate, I'm +1 to allowlisting, the critical reason is that:

  1. Websites are revolving, new (existing) 3rd-party domains may be added (changed) without advance notice, blocklist will cause those requests failed, while allowlist may only miss some traces when new controllable domains are added.

  2. Adopting blocklist and using exclusive match with regex to allowlist some domains is counter-intuitive and error-prone, which is unnecessary if we know users are highly possible to configure in allowlist way.

@tianyk please give the community a little more time to evaluate and reach to a consensus (hopefully) so that we don't make something unreasonable for the users, :)

@tianyk
Copy link
Contributor Author

tianyk commented Mar 9, 2021

@tianyk Hi, you should know, the community and people like us work globally and in async/flexible. So, in the open-source, 3 days is very common to sync and update.

And, right now, I think I am convinced this allowList is better than blockList.

@Fine0830 Please consider this, if you still have concerns, let's discuss them.

I understand. I made the changes as suggested.

@Fine0830
Copy link
Member

Fine0830 commented Mar 9, 2021

I'm curious if there is no problem for the front end to call the third-party interface directly. What's more, in the case of unclear services, such as cross domain, how can we solve this problem if we don't know the service address?

@wu-sheng
Copy link
Member

wu-sheng commented Mar 9, 2021

how can we solve this problem if we don't know the service address?

Good question, I want to hear about this.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 10, 2021

What's more, in the case of unclear services, such as cross domain, how can we solve this problem if we don't know the service address?

When using JS-SDK, we don't directly have to make our own HTTP requests.

@wu-sheng @Fine0830

<script src="https://cdn.jsdelivr.net/npm/skywalking-client-js@0.4.0/lib/index.js"></script>
<script>
	// Comment out and try it here
	ClientMonitor.register({
		collector: 'http://127.0.0.1:8080',
		service: 'test-ui',
		pagePath: '/current/page/name',
		serviceVersion: 'v1.0.0'
	});
</script>

<script>
       // js-sdk
      (function() {
               fetch('http://kekek.cc/static/bear.bin');
      })();
</script>

@Fine0830
Copy link
Member

Fine0830 commented Mar 10, 2021

What's more, in the case of unclear services, such as cross domain, how can we solve this problem if we don't know the service address?

When using JS-SDK, we don't directly have to make our own HTTP requests.

@wu-sheng @Fine0830

<script src="https://cdn.jsdelivr.net/npm/skywalking-client-js@0.4.0/lib/index.js"></script>
<script>
	// Comment out and try it here
	ClientMonitor.register({
		collector: 'http://127.0.0.1:8080',
		service: 'test-ui',
		pagePath: '/current/page/name',
		serviceVersion: 'v1.0.0'
	});
</script>

<script>
       // js-sdk
      (function() {
               fetch('http://kekek.cc/static/bear.bin');
      })();
</script>

When I use like this, here is an error with Cross domain.
image

@tianyk
Copy link
Contributor Author

tianyk commented Mar 10, 2021

What's more, in the case of unclear services, such as cross domain, how can we solve this problem if we don't know the service address?

When using JS-SDK, we don't directly have to make our own HTTP requests.
@wu-sheng @Fine0830

<script src="https://cdn.jsdelivr.net/npm/skywalking-client-js@0.4.0/lib/index.js"></script>
<script>
	// Comment out and try it here
	ClientMonitor.register({
		collector: 'http://127.0.0.1:8080',
		service: 'test-ui',
		pagePath: '/current/page/name',
		serviceVersion: 'v1.0.0'
	});
</script>

<script>
       // js-sdk
      (function() {
               fetch('http://kekek.cc/static/bear.bin');
      })();
</script>

When I use like this, here is an error with Cross domain.
image

You turn off trace, the error is gone.

Because in this example, the server only add Access-Control-Allow-Origin without handling Access-Control-Request-Headers

@Fine0830
Copy link
Member

BTW, probably the Risk Control Technology Department don't allow us to use external interface, as there are risks.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 10, 2021

BTW, probably the Risk Control Technology Department don't allow us to use external interface, as there are risks.

Sometimes it's a third party service, but that's still the case.

Some examples are Analytics servers, OSS/S3 file upload services.

@Fine0830
Copy link
Member

From my development experience, I usually call such an interface on the back end instead directly calling. That's fine, if you think the example reasonable. @wu-sheng

@wu-sheng
Copy link
Member

Some examples are Analytics servers, OSS/S3 file upload services.

@tianyk I want to get more explanation about, why your codes are allowed to randomly accessing different domains without control? I can only see you are using HTTP, rather than HTTPS, which should not be in product env.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 10, 2021

Some examples are Analytics servers, OSS/S3 file upload services.

@tianyk I want to get more explanation about, why your codes are allowed to randomly accessing different domains without control? I can only see you are using HTTP, rather than HTTPS, which should not be in product env.

I thought about it, if you are very clear about the service to be called, it is more appropriate to use denylist.

In some cases, allowlist may be more appropriate:

  1. You cannot be sure which services have been accessed.
  2. Only track specific services (for example, only some services use skywalking)

I think origin filtering is necessary, use trace or no trace list.

@wu-sheng @Fine0830

@wu-sheng
Copy link
Member

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@tianyk
Copy link
Contributor Author

tianyk commented Mar 11, 2021

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@wu-sheng

This is not the original problem. My initial requirement was to filter out some requests without tracking.

Our situation at the time was that we knew those domain names we needed to track, so we wanted to use a whitelist to solve it.

This brings up the above series of discussions on the whitelist.

My problem is not actually a whitelist or a blacklist, but the need to filter out some requests. Sorry, some of the above examples have caused misunderstandings.

We should go back to filter tracking, I think this is the most fundamental problem.

As for the whitelist or the blacklist, it is just a common filtering method.

@wu-sheng
Copy link
Member

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@wu-sheng

This is not the original problem. My initial requirement was to filter out some requests without tracking.

Our situation at the time was that we knew those domain names we needed to track, so we wanted to use a whitelist to solve it.

This brings up the above series of discussions on the whitelist.

My problem is not actually a whitelist or a blacklist, but the need to filter out some requests. Sorry, some of the above examples have caused misunderstandings.

We should go back to filter tracking, I think this is the most fundamental problem.

As for the whitelist or the blacklist, it is just a common filtering method.

Like I replied, I am supporting this feature. But also agree with @Fine0830, it seems blockList makes more sense. From the product management requirement, I prefer blockList.

@Fine0830
Copy link
Member

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@wu-sheng

This is not the original problem. My initial requirement was to filter out some requests without tracking.

Our situation at the time was that we knew those domain names we needed to track, so we wanted to use a whitelist to solve it.

This brings up the above series of discussions on the whitelist.

My problem is not actually a whitelist or a blacklist, but the need to filter out some requests. Sorry, some of the above examples have caused misunderstandings.

We should go back to filter tracking, I think this is the most fundamental problem.

As for the whitelist or the blacklist, it is just a common filtering method.

Do you agree with blockList? Could you update the logic to blockList?

@tianyk
Copy link
Contributor Author

tianyk commented Mar 12, 2021

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@wu-sheng
This is not the original problem. My initial requirement was to filter out some requests without tracking.
Our situation at the time was that we knew those domain names we needed to track, so we wanted to use a whitelist to solve it.
This brings up the above series of discussions on the whitelist.
My problem is not actually a whitelist or a blacklist, but the need to filter out some requests. Sorry, some of the above examples have caused misunderstandings.
We should go back to filter tracking, I think this is the most fundamental problem.
As for the whitelist or the blacklist, it is just a common filtering method.

Do you agree with blockList? Could you update the logic to blockList?

I agree there is a BlockList, shall we consider providing both BlockList and Whitelist configuration? Whitelist can be used for most situations that need to be excluded.

@Fine0830 @wu-sheng

@Fine0830
Copy link
Member

You cannot be sure which services have been accessed.

This is the key we are arguing right now, as a product env, why this is unknown.

@wu-sheng
This is not the original problem. My initial requirement was to filter out some requests without tracking.
Our situation at the time was that we knew those domain names we needed to track, so we wanted to use a whitelist to solve it.
This brings up the above series of discussions on the whitelist.
My problem is not actually a whitelist or a blacklist, but the need to filter out some requests. Sorry, some of the above examples have caused misunderstandings.
We should go back to filter tracking, I think this is the most fundamental problem.
As for the whitelist or the blacklist, it is just a common filtering method.

Do you agree with blockList? Could you update the logic to blockList?

I agree there is a BlockList, shall we consider providing both BlockList and Whitelist configuration? Whitelist can be used for most situations that need to be excluded.

@Fine0830 @wu-sheng

I think it's fine, we just use one.

@wu-sheng
Copy link
Member

Only block list makes sense to me.

@tianyk tianyk changed the title add originAllowlist option add noTraceOrigins option Mar 15, 2021
}
if (Array.isArray(options.originAllowList)) {
const traced = options.originAllowList.some((rule) => {
if (Array.isArray(options.noTraceOrigins)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add default value([]) for noTraceOrigins to remove this if.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👌

Copy link
Member

@Fine0830 Fine0830 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other codes are good. Thanks @tianyk

Copy link
Member

@Fine0830 Fine0830 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Fine0830 Fine0830 merged commit b20a7fa into apache:master Mar 16, 2021
@tianyk tianyk deleted the feature/origin-allowlist branch March 16, 2021 02:41
@Fine0830 Fine0830 modified the milestones: 0.4.0, 0.5.0 Apr 24, 2021
@Mrtaochen
Copy link

现在,SDK将拦截所有XHR请求并添加ws8请求标头。在跨域时,服务器需要添加标头。

Access-Control-Allow-Headers: ws8

https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS#preflighted_requests

在某些情况下,我们可能会请求第三方服务。如果第三方服务器不添加top header,就会出现跨域问题。

@wu-sheng

添加originAllowlist选项,仅跟踪此列表中的请求来源。

import ClientMonitor from 'skywalking-client-js';

ClientMonitor.setPerformance({
  collector: 'http://127.0.0.1:8080',
  service: 'browser-app',
  serviceVersion: '1.0.0',
  pagePath: location.href,
  useFmp: true,
  noTraceOrigins: ['http://example1.com', /\.example2\.com$/]
});

想请问一个问题,是否允许自定义请求头名呢?如Access-Control-Allow-Headers: xxx-sw8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants