Skip to content

Conversation

@Benouare
Copy link
Contributor

@Benouare Benouare commented Jan 7, 2020

Hi,
Dont really know what are the differences between -set_serial and -CAcreateserial... How they are working or the exepted behavior.
But, regarding my test on my environnement (Mac OS 10.13.6 Python 3.8.0) it solved my issue.

@Benouare Benouare force-pushed the fix/TLS_Interception_issue branch from 94f4b1d to 4549482 Compare January 7, 2020 17:00
@abhinavsingh
Copy link
Owner

Thank you for the PR.

The first time you use your CA to sign a certificate you can use the -CAcreateserial option. This option will create a file (ca.srl) containing a serial number. You are probably going to create more certificate, and the next time you will have to do that use the -CAserial option (and no more -CAcreateserial) followed with the name of the file containing your serial number. This file will be incremented each time you sign a new certificate. This serial number will be readable using a browser (once the certificate is imported to a pkcs12 format). And we can have an idea of the number of certificate created by a CA.

So I think -CAcreateserial or -set_serial shouldn't make a difference. -CAcreateserial will simply result in a ca.crl file creation.

I looked at the code where -set_serial invocation is happening, I am wondering does changing from time.time() to a randomly generated number also helps fix this? In worst case, 2 parallel requests coming to proxy.py can end up getting same serial number by using time.time(). Wdyt? Plmk.

@abhinavsingh
Copy link
Owner

.... and after looking at your investigation here #251 (comment) and #251 (comment) , I have a strong hunch that this is exactly what probably is happening.

E.g. sky.com was loaded in browser but it is trying to make say 200 requests in background. There is a high probability that -set_serial time.time() will end up colliding resulting into the SSL error we ran into.

@abhinavsingh abhinavsingh added the Bug Fix A PR sent for a bug fix label Jan 7, 2020
@abhinavsingh
Copy link
Owner

Thank you for investigation and concluding that -set_serial is the problem. I really appreciate your effort. It's a bug and I have marked all PR / Issues accordingly.

Thinking further about it, larger problem I see is that we can run into race conditions during certificate generations, resulting into random request failures. At the time of adding TLS interception support, I didn't verify / tested this scenario.

So how can it happen?

  1. Previously in Use UUID as request ID instead of a string. Reuse UUID to fix OSError when generating certificates #258 (comment) I highlighted how a high throughput to proxy.py can result in a race condition where 2 certificates end-up with same serial ID. Reason being usage of time.time() for serial ID.

  2. One option can be to use self.uid which is a unique ID assigned to each client request by proxy.py. This self.uid is guaranteed to be unique across all requests OR as expressed earlier, we can also generate a random uid ad-hoc for setting the serial ID.

  3. I'll need to further investigate, but my gut feeling is that openssl is taking a lock on -CAcreateserial file on disk to avoid race conditions. This is exactly what we need and hence -CAcreateserial is a good intermediate solution.

  4. I say intermediate because I still believe we must use -set_serial due to performance and efficiency. Delegating file lock and serial generation to openssl will only slow things down, as now openssl will act like a global lock shared by all proxy.py workers.

Note: In-case, openssl is not taking a lock on crl file then we can still run into same issue.

@Benouare
Copy link
Contributor Author

Benouare commented Jan 7, 2020

Thx for you reply and your explications.
The UID seems to be the good solution. In my opinion, using the request uid is better than to generate a new one.
If you want, i can try to deal with it, and make an other PR or amend my commit.

@abhinavsingh
Copy link
Owner

Thx for you reply and your explications.
The UID seems to be the good solution. In my opinion, using the request uid is better than to generate a new one.
If you want, i can try to deal with it, and make an other PR or amend my commit.

Sure thing, please go ahead. Meanwhile, lemme figure why GitHub workflows are not running for your PR.

@Benouare
Copy link
Contributor Author

Benouare commented Jan 7, 2020

On my dev version it works using the uid (with little changes...).
I have to recheck my changes before to push: I change the uid from str (uuid.uuid4().hex()) to the main class UUID (uuid.uuid4()), so i had to change all the params in some classes.
It looks like we are not able to put a Str as param for the -set_serial, it's because of that, that i need to change the self.uid type.

I will do that maybe tonight, or tomorrow morning.

@abhinavsingh
Copy link
Owner

abhinavsingh commented Jan 7, 2020

On my dev version it works using the uid (with little changes...).
I have to recheck my changes before to push: I change the uid from str (uuid.uuid4().hex()) to the main class UUID (uuid.uuid4()), so i had to change all the params in some classes.
It looks like we are not able to put a Str as param for the -set_serial, it's because of that, that i need to change the self.uid type.

I will do that maybe tonight, or tomorrow morning.

Thanks for quickly checking out on it, I suspected it but wasn't sure. Let's not change uid type because at this stage uid is too much spread out to change.

To fix this bug, we need a unique integer that can act as certificate serial ID. If time.time() is itself running into race condition, can we do something else to mitigate the race condition. We don't even need an auto-incrementing ID, a random unique integer / floating point will work for our use case.

NOTE: Client threads run within context of a particular worker. There can be up-to N worker processes.

So can we append to time.time() to get a unique integer (actually it's a floating point). Some options from top of the head:

a) IIRC uid is just a hex string so we should be able to get numbers out of it.

b) Append time.time() value with a random number. E.g. say by using https://docs.python.org/3/library/random.html. For a perfect solution with random, we should ideally seed them differently per worker process.

c) Or append time.time() with first k integers obtained from uid as discussed in a) above

One or more combination of above should result in a unique identifier. Wdyt?

@Benouare Benouare force-pushed the fix/TLS_Interception_issue branch 2 times, most recently from fd5150b to fd6b125 Compare January 7, 2020 22:23
@abhinavsingh
Copy link
Owner

jFYI, https://github.com/abhinavsingh/proxy.py#setup-pre-commit-hook will be useful to avoid failing workflow due to small things here and there.

@Benouare Benouare force-pushed the fix/TLS_Interception_issue branch 4 times, most recently from 0f8d965 to 1ae6d8f Compare January 7, 2020 23:02
@Benouare
Copy link
Contributor Author

Benouare commented Jan 8, 2020

On my dev version it works using the uid (with little changes...).
I have to recheck my changes before to push: I change the uid from str (uuid.uuid4().hex()) to the main class UUID (uuid.uuid4()), so i had to change all the params in some classes.
It looks like we are not able to put a Str as param for the -set_serial, it's because of that, that i need to change the self.uid type.
I will do that maybe tonight, or tomorrow morning.

Thanks for quickly checking out on it, I suspected it but wasn't sure. Let's not change uid type because at this stage uid is too much spread out to change.

To fix this bug, we need a unique integer that can act as certificate serial ID. If time.time() is itself running into race condition, can we do something else to mitigate the race condition. We don't even need an auto-incrementing ID, a random unique integer / floating point will work for our use case.

NOTE: Client threads run within context of a particular worker. There can be up-to N worker processes.

So can we append to time.time() to get a unique integer (actually it's a floating point). Some options from top of the head:

a) IIRC uid is just a hex string so we should be able to get numbers out of it.

b) Append time.time() value with a random number. E.g. say by using https://docs.python.org/3/library/random.html. For a perfect solution with random, we should ideally seed them differently per worker process.

c) Or append time.time() with first k integers obtained from uid as discussed in a) above

One or more combination of above should result in a unique identifier. Wdyt?

I tried the a) solution -> int of the hex. The advantage is for me that it's already generated so with that solution i think we are not consuming more resources...

@Benouare Benouare closed this Jan 8, 2020
@abhinavsingh
Copy link
Owner

@Benouare That sounds great. Please send a PR to address the same :) Thank you!!!

@abhinavsingh
Copy link
Owner

@Benouare That sounds great. Please send a PR to address the same :) Thank you!!!

Just realized you actually closed the PR, will you be sending a separate one? I gazed over your current changes and it looks good to me. Changing UUID from str to UUID type itself works.

@Benouare Benouare reopened this Jan 8, 2020
@abhinavsingh abhinavsingh changed the title Fix for OSError when generating certificates Use UUID as request ID instead of a string. Reuse UUID to fix OSError when generating certificates Jan 31, 2020
@abhinavsingh
Copy link
Owner

This PR looks good for merge. Waiting for tests to finish. Thanks again and apologies it took time to get back to proxy.py.

@Benouare
Copy link
Contributor Author

Benouare commented Feb 1, 2020

Hi Abhinav,

This PR looks good for merge. Waiting for tests to finish. Thanks again and apologies it took time to get back to proxy.py.

No worries for the time ;).
When you are saying "Waiting for tests to finish", it looks like the tests are working well, there is just

  • codecov/patch
  • codecov/project
  • pyup.io/safety-ci
    that are not working
    Are you waiting something about me?

@abhinavsingh
Copy link
Owner

Just noticed it is codecov and pyup not updating (These checks doesn't seems to work out of box for incoming PRs).

All good, thank you, excellent investigation overall here :)

@abhinavsingh abhinavsingh merged commit 12de12d into abhinavsingh:develop Feb 1, 2020
@Benouare Benouare deleted the fix/TLS_Interception_issue branch February 1, 2020 17:09
anton-lisanin referenced this pull request in neonevm/proxy-model.py Oct 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Fix A PR sent for a bug fix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants