Job failing on GCP but works on AWS #2057

xtreme-peter-iskandar · 2018-02-26T15:07:47Z

Bug Report

Issue

When we run our application's integration tests, they fail frequently (roughly 90% of the time) due to what appears to be a networking issue.

Our application uses RSpec, Capybara, and headless Chrome to run its integration tests. As part of the tests, we call out to a script that seeds the app with data through its API. The architecture from the perspective of the tests is as follows:

The failure mode appears to be that, at some point, the API gateway loses the ability to talk to the microservices behind it. There is no apparent pattern to when this happens: about 10% of the time the tests complete without error, sometimes the very first request to write seed data to the API fails, and other times it fails halfway through seeding.

For example, the API gateway will see a request and map it correctly:

2018-02-23 22:32:22.108 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.web.ZuulHandlerMapping      : Matching patterns for request [/v2/slots] are [/v2/slots/**]
2018-02-23 22:32:22.108 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.web.ZuulHandlerMapping      : URI Template variables for request [/v2/slots] are {}
2018-02-23 22:32:22.108 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.web.ZuulHandlerMapping      : Mapping [/v2/slots] to HandlerExecutionChain with handler [org.springframework.cloud.netflix.zuul.web.ZuulController@18cebaa5] and 1 interceptor
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : Finding route for path: /v2/slots
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : servletPath=/
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : zuulServletPath=/zuul
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : RequestUtils.isDispatcherServletRequest()=true
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : RequestUtils.isZuulServletRequest()=false
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : adjustedPath=/v2/slots
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : Matching pattern:/v2/slots/**
2018-02-23 22:32:22.122 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.zuul.filters.SimpleRouteLocator  : route matched=ZuulProperties.ZuulRoute(id=slots, path=/v2/slots/**, serviceId=null, url=http://127.0.0.1:9086, stripPrefix=false, retryable=null, sensitiveHeaders=[], customSensitiveHeaders=false)
2018-02-23 22:32:22.123 DEBUG 13229 --- [nio-9084-exec-8] o.s.c.n.z.f.r.SimpleHostRoutingFilter    : 127.0.0.1 9086 http
2018-02-23 22:33:22.188  INFO 13229 --- [       Thread-7] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@e056f20: startup date [Fri Feb 23 22:30:02 UTC 2018]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@246ae04d
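
The gap between the gateway's last log line for this request (22:32:22.123) and its shutdown (22:33:22.188) can be checked with a few lines of Ruby. It comes to about 60 s. That happens to match the 60-second default read_timeout of Ruby's Net::HTTP, which is where the Net::ReadTimeout further down comes from. This is only a sketch for reading the logs, not a diagnosis. The `log_time` helper is hypothetical and assumes the default Spring Boot log pattern shown above.

```ruby
require "time"

# Hypothetical helper: parse the leading "YYYY-MM-DD HH:MM:SS.mmm" timestamp of a
# Spring Boot log line. Assumes the default log pattern seen in the logs above,
# where the timestamp occupies the first 23 characters.
def log_time(line)
  Time.strptime(line[0, 23], "%Y-%m-%d %H:%M:%S.%L")
end

# Last gateway line for the /v2/slots request, and the gateway's shutdown line.
last_routing_line = "2018-02-23 22:32:22.123 DEBUG 13229 --- [nio-9084-exec-8] " \
                    "o.s.c.n.z.f.r.SimpleHostRoutingFilter    : 127.0.0.1 9086 http"
shutdown_line     = "2018-02-23 22:33:22.188  INFO 13229 --- [       Thread-7] " \
                    "ConfigServletWebServerApplicationContext : Closing ..."

# Both timestamps are parsed in the same (local) zone, so the difference is exact.
gap = log_time(shutdown_line) - log_time(last_routing_line)
puts format("gap between routing and shutdown: %.3f s", gap)
```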

The corresponding microservice, however, never sees that request before it is terminated when the test suite fails (note that the GET to /health is the only request the microservice receives):

2018-02-23 22:31:16.504  INFO 13556 --- [nio-9086-exec-1] i.p.m.timelines.api.RequestTimeLogger    : GET /health: 709 ms
2018-02-23 22:33:22.230  INFO 13556 --- [       Thread-6] ConfigServletWebServerApplicationContext : Closing org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@a4102b8: startup date [Fri Feb 23 22:31:06 UTC 2018]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@735f7ae5
2018-02-23 22:33:22.235  INFO 13556 --- [       Thread-6] o.s.c.support.DefaultLifecycleProcessor  : Stopping beans in phase 0

The seeding script will then die with a read timeout:

...
Seeding slots: 2018-02-23 22:32:22 +0000
Creating slot for person 3 on project 1: 2018-02-23 22:32:22 +0000
rake aborted!
Faraday::TimeoutError: Net::ReadTimeout
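
The error is Net::ReadTimeout rather than Net::OpenTimeout. That suggests the seeding script's TCP connection to the gateway was accepted and only the response never arrived. The following is a minimal sketch for making that distinction explicit when probing endpoints. It uses Ruby's standard Net::HTTP instead of Faraday. The `probe` helper, its timeout values, and the example URLs are hypothetical.

```ruby
require "net/http"
require "uri"

# Hypothetical diagnostic helper: report *how* an HTTP GET failed.
#   :refused         - nothing is listening on the port
#   :connect_timeout - the TCP connection was not established within open_timeout
#   :read_timeout    - the connection was accepted but no response arrived within
#                      read_timeout (the seeding script's Net::ReadTimeout case)
def probe(url, open_timeout: 2, read_timeout: 10)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port,
                  open_timeout: open_timeout, read_timeout: read_timeout) do |http|
    [:ok, http.get(uri.request_uri).code.to_i]
  end
rescue Errno::ECONNREFUSED
  [:refused, nil]
rescue Net::OpenTimeout
  [:connect_timeout, nil]
rescue Net::ReadTimeout
  [:read_timeout, nil]
end
```

During a failing run, you could compare two calls:

- `probe("http://127.0.0.1:9084/v2/slots")`, which goes through the gateway.
- `probe("http://127.0.0.1:9086/v2/slots")`, which hits the slots service directly and bypasses the gateway.

The comparison could show which hop stalls. The ports and path come from the logs above. Whether a GET on that path is meaningful for this API is an assumption.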

We run all of the processes in the same container, so all networking should go through the loopback interface.
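
Every process shares the container's loopback interface. A TCP-level check can therefore confirm, independently of HTTP, that each listener accepts connections on 127.0.0.1. This is a minimal sketch using Ruby's standard `Socket.tcp`. The `port_open?` helper and its timeout are hypothetical.

```ruby
require "socket"

# Hypothetical check: does anything accept TCP connections on the given loopback port?
# Socket.tcp closes the socket once the block returns; any connect error
# (refused, timed out, ...) is a SystemCallError and is reported as false.
def port_open?(port, host: "127.0.0.1", timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError
  false
end
```

For example, `{ gateway: 9084, slots: 9086 }.transform_values { |port| port_open?(port) }` checks the two ports that appear in the logs above.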

While the microservices are booting, we send a GET request directly to the /health endpoint of each one to confirm that it has come up successfully and that the integration tests can begin. This check seems to work reliably.
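
A readiness check of this kind could look roughly like the sketch below. It polls a service's /health endpoint until it returns a 2xx status or a deadline passes. This uses Ruby's standard Net::HTTP and is not the project's actual script. The `wait_for_health` name, the deadline, and the polling interval are assumptions.

```ruby
require "net/http"
require "uri"

# Hypothetical readiness helper: poll base_url + "/health" until it returns a 2xx
# status or deadline_s seconds have elapsed. Returns true when healthy, else false.
def wait_for_health(base_url, deadline_s: 120, interval_s: 1.0)
  uri = URI.join(base_url, "/health")
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + deadline_s
  loop do
    begin
      Net::HTTP.start(uri.host, uri.port, open_timeout: 2, read_timeout: 5) do |http|
        return true if http.get(uri.request_uri).is_a?(Net::HTTPSuccess)
      end
    rescue SystemCallError, IOError, Timeout::Error
      # Not listening yet (e.g. ECONNREFUSED) or not answering in time: keep polling.
    end
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep interval_s
  end
end
```

For example, `wait_for_health("http://127.0.0.1:9086")` would block until the slots service from the logs above reports healthy, or give up after two minutes.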

What we tried

increased worker VM types (8 CPUs, 30 GB)
updated Ruby versions
increased logging on the rake call (--trace)
ssh'd into the container and curl'd the config server before the databases were populated, to make sure we could reach it from the VM
switched "localhost" to 127.0.0.1
increased the timeout to 2 minutes
investigated the target pool and forwarding port settings on GCP; this didn't change anything
investigated firewall rules on GCP; nothing was blocking the ports
set the privileged flag to true on the container
changed the port used by the failing server

Setup

Concourse version: 3.8.0
Deployment type (BOSH/Docker/binary): BOSH
Infrastructure/IaaS: GCP
Did this used to work? Works on AWS

(#1486) < benchmain: minor bug fixes (#1488) < Update proto generation commands in example doc (#1481) < Remove expiration_interval from grpclb message (#1477) < balancer_test: possible ctx leak, cancel before break (#1479) < Merge pull request #1476 from dfawley/pkg < Fix for 32-bit architectures (#1471) < When sending a non heads-up goaway close the connection if there are no active streams. (#1474) < Remove unnecessary function handleStreamSuspension (#1468) < fix grpclb protos to not cause re-registration of types (#1466) < transport: fix handling of InTapHandle's returned context (#1461) < the cancel function should be called to avoid ctx leak (#1465) < add comment (#1464) < Remove buf copy when the compressor exist (#1427) < transport: Fix deadlock in client keepalive. (#1460) < benchmark: add benchmain/main.go to run benchmark with flag set (#1352) < stats: add methods to allow setting grpc-trace-bin and grpc-tags-bin headers (#1404) < deduplicate dns record in lookup (#1454) < Add -u to installation command (#1451) < addrConn: change address to slice of address (#1376) < go-generate pb.go files and check in Travis to make sure they don't change (#1426) < Fix host string passed to PerRPCCredentials (#1433) < metadata: Remove NewContext and FromContext for gRFC L7 (#1392) < Add status details support to server HTTP handler (#1438) < put *gzip.Writer back to pool (#1441) < Automatic WriteStatus for RecvMsg/SendMsg error on server side (#1409) < Update ServerInHandle comments (#1437) < Server should send 2 goaway messages to gracefully shutdown the connection. 
(#1403) < Add and use connectivity package for states (#1430) < Add 'experimental' note to ServeHTTP godoc (#1429) < Document Server.ServeHTTP (#1406) < Set peer before sending request (#1423) < Fix missing and wrong license (#1422) < Fix a goroutine leak in DialContext (#1424) < Use `NewOutgoingContext ` in the metadata doc (#1425) < Fix typo < Add flags for tls file path (#1419) < Change comment on stats.End.Error (#1418) < Call cancel on contexts in tests (#1412) < Don't use 64-bit integers with atomic. (#1411) < benchmark: don't stop timer until after workers are done (#1407) < Validate send quota again after acquiring writable channel (#1367) < Use log instead of grpclog in routeguide example (#1395) < Revert "Make all "grpc-" metadata field names reserved (#1391)" (#1400) < Enabling client process multiple GoAways (#1393) < Assign testdata path to correct variable (#1397) < Do not call testdata.Path when defining flags (#1394) < Make all "grpc-" metadata field names reserved (#1391) < remove defer funtion in recvBufferReader Read method (#1031) < Add testdata package and unify testdata to only one dir (#1297) < DNS resolver (#1300) < Expose ConnectivityState of a ClientConn. (#1385) < status: Add WithDetails and Details functions (#1358) < benchmark: remove multi-layer for loop (#1339) < transport: fix minor typo in http2_server.go (#1383) < Add doc in default implementation fatal functions on os.Exit() (#1365) < Fix bufconn.Close to not be blocking. (#1377) < Do not create new addrConn when connection error happens (#1369) < Change version to 1.6.x (#1382) < Revert "Use bufconn in end2end tests." (#1381) < Fix logging method (#1375) < Use bufconn in end2end tests. < Create bufconn package for a local, buffered net.Conn and dialer/listener < Fix a typo in examples/gotutorial.md (#1374) < Use log severity and verbosity level (#1340) < fix deadlock of roundrobin balancer (#1353) < Ignore goroutines spanwned by log.init during leakcheck. 
(#1368) < Populate callInfo.peer object for streaming RPCs (#1356) < BDP estimation and window update. (#1310) < Canonicalize https://grpc.io as the preferred URL prefix < Update leckCheck to ignore non-gRPC goroutine introduced in Go1.9 (#1351) < Do not flush NewStream header on client side for unary RPCs and streaming RPCs with requests. (#1343) < adjust import order (#1311) < add license for some proto files (#1322) < latency: sleep in Write when BDP is exceeded to avoid buffer bloat (#1330) < Add documentation to deprecate WithTimeout dial option (#1333) < change objects in recvBuffer queue from interface to concrete type to reduce allocs (#1029) < Catch invalid use of Server.RegisterService after Register.Serve (#828) < benchmark: add latency/MTU/bandwidth into testcases (#1304) < Updated documentation of ClientStream. (#1320) < Add support for grpc.SupportPackageIsVersion3 back (#1331) < Deflake TestServerGoAway (#1321) < dont create new reader in recvMsg (#940) < Make Apache 2.0 LICENSE file a verbatim copy (#1329) < Protect bytesSent and bytesReceived with mutex to avoid datarace (#1318) < Add Severity and VerboseLevel to grpclog. (#922) < update LICENSE (#1312) < fix spell (#1314) < Add goroutine safety doc on stream (#1313) < replace 127.0.0.1 with localhost for ipv6 only environment (#1306) < transport: fix error handling on Stream deletion (#1275) < Behaviour Change: transport errors should be coded Unavailable instead of internal. (#1307) < Support ipv6 addresses in grpclb (#1303) < Return header in Stream.Header() if available (#1281) < add license for some files (#1296) < Make RPCs non-failfast in grpclb_test. 
(#1302) < Specify characters allowed in metadata keys (#1299) < use subtests for the benchmark_test and add it into the Makefile (#1278) < update the path of guide (#950) < Create latency package for realistically simulating network latency (#1286) < Deflake TestFlowContolLogicalRace (#1279) < Merge pull request #1290 from jtattermusch/apache_license < Change version to 1.5.0-dev (#1288) < transport: fix minor typo in 'GoAway' godoc (#1284) < Piggyback window updates for connection with those of a stream. (#1273) < Reopening: Server shouldn't Fatalf in case it fails to encode. (#1276) < Avoid int32 overflow when applying initial window size setting < Revert "Server shouldn't Fatalf in case it fails to encode. (#1251)" (#1274) < Server shouldn't Fatalf in case it fails to encode. (#1251) < Decouple transport flow control from application read. (#1265) < Update references to route_guide.proto to use new directory name (#1270) < add MaxConcurrentStreams to benchmark_test when start the server (#1271) < Merge pull request #1267 from jtattermusch/improve_contributing < re-enable handler_server in end2end test, and fix some failed tests (#1259) < Avoid panic caused by stdlib context package errors (#1258) < Initialize stream properly in handler_server. (#1260) < Expand stream's flow control in case of an active read. 
(#1248) < Suppress server log message when EOF without receiving data for preface (#1052) < Fixed comment spelling (#1254) < Merge pull request #1165 from lyuxuan/service_config_pr < clientconn, server: replace time.After with time.NewTimer (#998) < grpclb balancer.Close() should not panic if called more than once (#1250) < Add doc and example for mocking streaming RPCs (#1230) < Test for EmptyCallOption < Implement `EmptyCallOption` < Reuse Token for serviceAccount credentials (#1238) < Travis: add staticcheck (#1019) < Defined GA and add pointer to benchmarks (#1239) < call listen with "localhost:port" instead of ":port" in tests (#1237) < fix server panic trying to send on stream as client disconnects #1111 (#1115) < Eagerly set a pointer to nil to help GC (#1232) < add logs to grpclb on send and recv (#1235) < Add stats test for client streaming and server streaming RPCs (#1140) < Adding dial options for PerRPCCredentials (#1225) < Pass custom dialer to balancer (#1205) < Http status to grpc status conversion (#1195) < Calling handleRPC with context derived from the original (#1227) < Use pooled gzip.{Writer,Reader} in gzip{Compressor,Decompressor} (#1217) < tentative fix to a flow control over-give-back bug (#1170) < Ensure that RoundRobin.Close() does not panic. (#1139) < Log the actual error when inTapHandle fails in http2Server (#1185) < make ServerOption panic messages more clear. (#1194) < Make window size configurable. (#1210) < Reset proto before unmarshalling (#1222) < Merge pull request #1221 from adelez/doc_fixit < Fix go buildable source file problem (#1213) < don't add defer func if stats handler is nil (#1214) < Change version to 1.4.0-dev (#1212) < Fix nil pointer dereferences from status.FromProto(nil) (#1211) < Split grpclb client load report test to deflake test. 
(#1206) < Use unpadded base64 encoding for binary metadata headers; handle padded or unpadded input (#1209) < Never encode binary metadata within the metadata map (#1188) < Client load report for grpclb. (#1200) < Use proto.Equal for equalities on Go proto messages (#1204) < Update grpclb proto and move grpclb into package grpc (#1186) < Revert "temporary disable 1.6 on travis (#1198)" (#1199) < temporary disable 1.6 on travis (#1198) < Revert "To adhere with protocol the server should send RST_STREAM on observing timeout on a strea, (#1130)" < Make sure all in-flight streams close when ClientConn.Close() is called. (#1136) < To adhere with protocol the server should send RST_STREAM on observing timeout on a strea, (#1130) < Fix broken Markdown headings in examples/gotutorial.md (#1189) < Support proxy with dialer (#1098) < grpclb should connect to the second balancer (#1181)

vito · 2019-01-03T16:09:54Z

I don't think we really have the bandwidth to resolve these kinds of issues as they tend to be outside of our realm of expertise and not super related to Concourse's code itself. Sorry, and thanks for providing a bunch of info anyway!

vito closed this as completed Jan 3, 2019
