Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace obsolete websocket library -- allows WebAssembly #433

Merged
merged 19 commits into from
Jul 1, 2022

Conversation

Jmgr
Copy link
Contributor

@Jmgr Jmgr commented Nov 2, 2021

This replaces the unmaintained golang.org/x/net/websocket websocket library with nhooyr.io/websocket. This allows compiling go-ably to WebAssembly and using Ably from a browser.

resolves #368

@Jmgr Jmgr requested a review from sacOO7 November 2, 2021 16:43
@jamienewcomb
Copy link
Member

@Jmgr is this PR still relevant?

@Jmgr
Copy link
Contributor Author

Jmgr commented Apr 7, 2022

I believe so, unless we don't care about WebAssembly? That's quite a small change also.

@jamienewcomb jamienewcomb removed the request for review from sacOO7 April 7, 2022 13:38
@jamienewcomb
Copy link
Member

@Jmgr could you solve the conflicts when you get an opportunity then and then i can assign some reviewers :)

@Jmgr Jmgr force-pushed the feature/webassembly-compatibility branch from f0ea2a8 to bcb2c3b Compare April 7, 2022 13:49
@Jmgr
Copy link
Contributor Author

Jmgr commented Apr 7, 2022

@Jmgr could you solve the conflicts when you get an opportunity then and then i can assign some reviewers :)

Done.

@Rosalita
Copy link
Contributor

I just took a look at the test that's failing, its TestRealtime_RSC7_AblyAgent/RSC7d3_:_Should_set_ablyAgent_header_with_correct_identifiers

It's failing because the header was expected to be
"ably-go/1.2.6 go/1.17.10 linux/5.13.0-1023-azure"

But it's actual : ""

This might be because in ably/websocket.go there's this deleted line...

config.Header.Set(ablyAgentHeader, ablyAgentIdentifier)

Hope this helps

@Jmgr
Copy link
Contributor Author

Jmgr commented May 20, 2022

I just took a look at the test that's failing, its TestRealtime_RSC7_AblyAgent/RSC7d3_:_Should_set_ablyAgent_header_with_correct_identifiers

It's failing because the header was expected to be "ably-go/1.2.6 go/1.17.10 linux/5.13.0-1023-azure"

But it's actual : ""

This might be because in ably/websocket.go there's this deleted line...

config.Header.Set(ablyAgentHeader, ablyAgentIdentifier)

Hope this helps

Yep, that was the issue, thanks. TestRealtime_RSC7_AblyAgent/RSC7d3_:_Should_set_ablyAgent_header_with_correct_identifiers and TestRealtimeConn_AuthError seem to fail randomly though, is that a known issue?

@Rosalita
Copy link
Contributor

@Jmgr we have a few known issues currently open for unstable tests they are #526 #519 #512 #508 #502 #472

To assist us with unstable tests we have a test observability server here https://test-observability.herokuapp.com/repos which we are actively using to identify, fix and re-stabilise the tests in CI.

Taking a look now for this specific branch, I can see this test TestRealtimeConn_SendErrorReconnects has only ever failed on this branch since test observability was turned on see https://test-observability.herokuapp.com/repos/ably/ably-go/test_cases/1395fa36-ffac-4034-ba07-3e289a0a4837

So I think there could be something specific in this PR causing that specific intermittent test failure.

@Rosalita
Copy link
Contributor

Looking deeper, the specific error for the testTestRealtimeConn_SendErrorReconnects gets logged as this...

=== RUN   TestRealtimeConn_SendErrorReconnects

[311](https://github.com/ably/ably-go/runs/6523755077?check_suite_focus=true#step:6:312)
    realtime_conn_integration_test.go:290: timed out waiting for channel send

[312](https://github.com/ably/ably-go/runs/6523755077?check_suite_focus=true#step:6:313)    ably_test.go:56: safeclose 0: failed to close ablytest.realtimeIOCloser: [ErrorInfo :failed to get reader: received close frame: status = StatusNormalClosure and reason = "" code=0 (error code not set) statusCode=0]

This test causes a send error and expect message to be enqueued and transport to be closed.
Then the test expects that reconnect should happen instantly as a result of transport closure.
Then the test expects a message to be published after reconnection.

Except after transport is closed, it doesn't seem to be reconnecting instantly as the send it tries to make after close is sometimes timing out.

Do you think switching the websocket library might have caused this?

@Rosalita
Copy link
Contributor

Rosalita commented Jun 6, 2022

Given the current situation on this project with integration test stability, I have raised #541 I think the responsible thing to do here is to add unit tests before switching the websocket package. Having greater confidence in the tests which will allow us to make this change with certainty that no new issues have been introduced.

@Jmgr Jmgr marked this pull request as ready for review June 28, 2022 12:08
@Rosalita
Copy link
Contributor

I've just noticed that lines 144 and 145 of websocket_internal_test.go arent needed.

expectedPayloadType byte
expectedResult      *websocketConn

these two lines can be removed.

Jmgr and others added 3 commits June 28, 2022 15:03
Co-authored-by: Rosie Hamilton <Rosalita@users.noreply.github.com>
Co-authored-by: Rosie Hamilton <Rosalita@users.noreply.github.com>
…bly-go into feature/webassembly-compatibility
@Jmgr Jmgr requested a review from lmars June 28, 2022 14:14
@Rosalita
Copy link
Contributor

Some observations on the test failures that are happening for this branch:

While unit tests pass, it looks integration tests have failed for 3 out of 4 attempts. The integration test which is failing isTestRealtimeConn_SendErrorReconnects. This the same integration test which appeared to become unstable the last time we tried to get this work merged.

Inspecting this test I can see that it uses a custom dial function to make the websocket connection. The unit test for websockets only test the default dial function. This would explain why the unit tests pass but this integration test is failing.

The integration test is failing because it is timing out waiting for send, however it is worth noting that the test is not failing 100% of the time. Indicating that it has become flakey, most likely due to a race or timing problem.

Inspecting the custom dial function in this test, it sets up the websocket connection in such a way that:

  • if an error is sent to a channel named sendErr that error will be returned when the websocket connection tries to send a message.
  • The websocket connection is also mocked so that it sends to a channel called closed immediately before closing the websocket connection.
  • The websocket receive function is not modified in the mock used by the test.

The dial function uses an unbuffered channel called allowDial which means the dial function will block and not try to dial until it receives a value from the allowDial channel.

The test creates a new realtime client using the custom dial function.
It then sends a value to allowDial which looks like it will trigger the dial function to dial.
It then waits until there is a connected event and asserts there is no errors.
The test then sends an error to the sendErr channel, this will cause the mocked websocket connection to return a send error.

Then this test makes a new channel called publishErr, which is a buffered channel which can hold 1 message and starts a new concurrent Go routine to publish a message named "test" to a channel named "test". The result of that publish is sent to the publishErr channel.

Then it looks like the test asserts that the closed channel has received a message, this would be triggered by the mocked connection when the mock closed function is called.

It also asserts that the go routine which published to a channel didn't experience an error when publishing.

Then the test tries to instantly send an empty struct to the allowDial channel again. This is the line that causes the test to fail with message realtime_conn_integration_test.go:290: timed out waiting for channel send.

So it looks like the code that is failing here is code in package ablytest.

I'm not sure yet why the ablytest package is used to send to the allowDial channel as the first time this channel is sent to, the send is done with allowDial <- struct{}{} and the ablytest package is not used.

@Rosalita
Copy link
Contributor

Rosalita commented Jun 29, 2022

I've found two ways to fix this test locally:

  1. in ablytest/timeout.go increase the timeout for "instantly" by making a change on line 10. Changing var Instantly = Before(10 * time.Millisecond) to a 20 ms delay.

  2. Instead of using the ablytest package to send an empty struct to the allowDial channel, replace line 290 with allowDial <- struct{}{}

Given that the websocket package was changed, the new websocket package may sometimes take slightly longer to close a connection than the old package did. I think the most sympathetic way to stop this test from failing some of the time would be to go with option 1 to increase the timeout from 10ms to 20ms.

This is a quick fix for enabling websocket tests to pass using nhooyr.io/websocket.
In the long term we would probably want to remove all timeouts in tests though.
@Rosalita
Copy link
Contributor

Rosalita commented Jul 1, 2022

I tested this morning and ran the TestRealtimeConn_SendErrorReconnects test 15 times on this branch. It timed out waiting for the channel send twice, so I think even after increasing the timeout in ablytest this test continues to be unstable. I think we should change the test to send to the allowDial channel without using the ablytest package (which was option 2). I made that change locally and ran the test 15 times and it passed every time. So I am confident that would not introduce more instability to the test pipeline. By going with option 2 we can revert the timeout in ablytest back to 10ms. I will put these changes up as suggestions to the PR shortly.

ably/websocket.go Outdated Show resolved Hide resolved
ablytest/timeout.go Outdated Show resolved Hide resolved
Jmgr and others added 3 commits July 1, 2022 10:27
…bly-go into feature/webassembly-compatibility
Co-authored-by: Rosie Hamilton <Rosalita@users.noreply.github.com>
ablytest/timeout.go Outdated Show resolved Hide resolved
Copy link
Contributor

@Rosalita Rosalita left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for all the effort one this one. I'm happy to approve 👍

@Jmgr
Copy link
Contributor Author

Jmgr commented Jul 1, 2022

No worries, thanks for the help in getting this done!

@Jmgr Jmgr merged commit 83054f2 into main Jul 1, 2022
@Jmgr Jmgr deleted the feature/webassembly-compatibility branch July 1, 2022 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

Migrate from deprecated websocket package
4 participants