Skip to content
This repository has been archived by the owner on Oct 29, 2020. It is now read-only.

Signups endpoint is timing out. #7333

Open
DFurnes opened this issue Mar 27, 2017 · 12 comments
Open

Signups endpoint is timing out. #7333

DFurnes opened this issue Mar 27, 2017 · 12 comments

Comments

@DFurnes
Copy link
Contributor

DFurnes commented Mar 27, 2017

BUG

Current Behavior

The /v1/api/signups endpoint in Phoenix seems to either timing out or returning 500s. It seems like the timeouts started to occur roughly around the same time that we deployed the fix for #7329 (the diff for that deploy is in links below).

Desired Behavior

It should return a successful response in a reasonable amount of time.

Steps to Replicate

curl "https://www.dosomething.org/api/v1/signups?count=10"
HTTP/1.1 500 Internal Server Error
Content-Type: application/json
X-Drupal-Cache: MISS
Expires: Sun, 19 Nov 1978 05:00:00 GMT
Cache-Control: no-cache, must-revalidate
X-Content-Type-Options: nosniff
X-Frame-Options: Allow-From: http://optimizely.com
Via: 1.1 varnish
Set-Cookie: USDRUP2=us-varnish-1; path=/
Access-Control-Allow-Origin: *
Fastly-Restarts: 1
Content-Length: 2
Accept-Ranges: bytes
Date: Mon, 27 Mar 2017 19:46:57 GMT
Via: 1.1 varnish
Connection: close
X-Fastly-Original: /api/v1/signups?count=10
X-Fastly-Lookup: None
X-Served-By: cache-iad2647-IAD
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1490643973.316112,VS0,VE44432
Vary: Accept
X-Fastly-Country-Code-Client: US

Why This Matters

This is causing errors on SMS messaging in Gambit and we want to send a broadcast this afternoon!

💣

Relevant Screenshots + Links

@DFurnes
Copy link
Contributor Author

DFurnes commented Mar 27, 2017

@blisteringherb found the error for the example 500 request that I included above:

PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate
16384 bytes) in Unknown on line 0" while reading response header from upstream, client:
104.156.95.47, server: localhost, request: "GET /api/v1/signups?count=10 HTTP/1.1",
upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "www.dosomething.org"

@blisteringherb
Copy link
Contributor

Possibly something minor, but is there a chance the wrong data/structure is being sent in here?

Mar 27 15:24:24 ip-10-100-35-11 drupal: https://www.dosomething.org|1490642664|php|104.156.95.47|https://www.dosomething.org/api/v1/signups?count=10||2800057||Warning: Invalid argument supplied for foreach() in dosomething_helpers_extract_field_data() (line 179 of /var/www/www.dosomething.org/releases/20170327182454/lib/modules/dosomething/dosomething_helpers/dosomething_helpers.module).

@pnasrat
Copy link

pnasrat commented Mar 27, 2017

23.235.46.30 - - [27/Mar/2017:14:32:41 -0400] "GET /api/v1/signups?count=100&user=8771655 HTTP/1.1" 200 2137 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
104.156.95.47 - - [27/Mar/2017:15:24:49 -0400] "GET /api/v1/signups?count=10 HTTP/1.1" 500 12 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
23.235.46.33 - - [27/Mar/2017:15:25:20 -0400] "GET /api/v1/signups?count=10 HTTP/1.1" 500 12 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
23.235.46.38 - - [27/Mar/2017:15:46:35 -0400] "GET /api/v1/signups?count=10 HTTP/1.1" 500 12 "-" "Paw/3.0.16 (Macintosh; OS X/10.12.3) GCDHTTPRequest"
104.156.95.47 - - [27/Mar/2017:15:46:57 -0400] "GET /api/v1/signups?count=10 HTTP/1.1" 500 12 "-" "Paw/3.0.16 (Macintosh; OS X/10.12.3) GCDHTTPRequest"
23.235.46.25 - - [27/Mar/2017:15:51:26 -0400] "GET /api/v1/signups?count=100&user=1940652 HTTP/1.1" 200 10691 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
23.235.46.25 - - [27/Mar/2017:15:53:14 -0400] "GET /api/v1/signups?count=100&user=1940652 HTTP/1.1" 200 10691 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
23.235.46.25 - - [27/Mar/2017:15:53:44 -0400] "GET /api/v1/signups?count=100&user=1940652 HTTP/1.1" 200 10691 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"
104.156.95.35 - - [27/Mar/2017:15:54:23 -0400] "GET /api/v1/signups?count=100&user=1940652 HTTP/1.1" 200 10691 "-" "GuzzleHttp/6.2.1 curl/7.35.0 PHP/7.0.17-2+deb.sury.org~trusty+1"

@DFurnes
Copy link
Contributor Author

DFurnes commented Mar 27, 2017

Yeah, it seems like it works when you whittle down the query by specifying a user to filter by.

@aaronschachter What's an example request URL from Gambit?

@pnasrat
Copy link

pnasrat commented Mar 27, 2017

nginx/dosomething-error.log:2017/03/27 15:24:49 [error] 1336#0: *50685 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /var/www/www.dosomething.org/releases/20170327182454/html/includes/database/database.inc on line 2227
nginx/dosomething-error.log:PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 16384 bytes) in Unknown on line 0" while reading response header from upstream, client: 104.156.95.47, server: localhost, request: "GET /api/v1/signups?count=10 HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "www.dosomething.org"
nginx/dosomething-error.log:2017/03/27 15:25:20 [error] 1336#0: *50608 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /var/www/www.dosomething.org/releases/20170327182454/html/includes/database/database.inc on line 2227
nginx/dosomething-error.log:PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 16384 bytes) in Unknown on line 0" while reading response header from upstream, client: 23.235.46.33, server: localhost, request: "GET /api/v1/signups?count=10 HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "www.dosomething.org"
nginx/dosomething-error.log:2017/03/27 15:46:35 [error] 1337#0: *52692 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /var/www/www.dosomething.org/releases/20170327182454/html/includes/database/database.inc on line 2227" while reading response header from upstream, client: 23.235.46.38, server: localhost, request: "GET /api/v1/signups?count=10 HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "www.dosomething.org"
nginx/dosomething-error.log:2017/03/27 15:46:57 [error] 1337#0: *52692 FastCGI sent in stderr: "PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 4096 bytes) in /var/www/www.dosomething.org/releases/20170327182454/html/includes/database/database.inc on line 2227
nginx/dosomething-error.log:PHP message: PHP Fatal error:  Allowed memory size of 1073741824 bytes exhausted (tried to allocate 16384 bytes) in Unknown on line 0" while reading response header from upstream, client: 104.156.95.47, server: localhost, request: "GET /api/v1/signups?count=10 HTTP/1.1", upstream: "fastcgi://unix:/run/php/php7.0-fpm.sock:", host: "www.dosomething.org"

@DFurnes
Copy link
Contributor Author

DFurnes commented Mar 27, 2017

Interesting – I'm getting a successful response when querying Phoenix's /api/v1/signups endpoint with my Northstar or Drupal ID, but an empty response when I use Northstar proxy...

# good, returns full response
curl "https://www.dosomething.org/api/v1/signups?count=10&user=5543dfd6469c64ec7d8b46b3" \

# good, returns full response
curl "https://www.dosomething.org/api/v1/signups?count=10&user=1258186" \

# bad, returns { "data": [] }
curl "https://northstar.dosomething.org/v1/signups?user=5543dfd6469c64ec7d8b46b3&count=10" \
     -H "Accept: application/json" \
     -H "Authorization: Bearer ***** Hidden credentials *****" \
     -H "Content-Type: application/json; charset=utf-8"

@DFurnes
Copy link
Contributor Author

DFurnes commented Mar 27, 2017

For the Northstar empty-response, my guess is that something weird is happening with how the drupalIdForNorthstarId method in Northstar interacts with the new ID logic in Phoenix. But not really sure if that's the issue since we're seeing timeouts.

@aaronschachter
Copy link
Contributor

I made and deployed the change in Gambit today to query Phoenix instead of Northstar (https://github.com/DoSomething/gambit/pull/830) per https://dosomething.slack.com/archives/C02BBRN5A/p1490634446294290

If the Northstar proxy stays broken, the mobile apps stay broken.

If we drop support for the mobile apps completely (which.. we kind of have if we don't fix the proxy or update the apps), we may not have a need to join on things like Reportback Items or Campaign information anymore on this Signups index -- which could be helpful for performance...

@rapala61
Copy link

While load testing Gambit thor, it became apparent to me that using an aggressive concurrency setting ended in a high error rate of requests made to Northstar thor. Requests would start to take longer to respond very rapidly. Once it passed the 30 second mark, Heroku router interrupted the request and logged an H12 Timeout error.

@aaronschachter
Copy link
Contributor

@DFurnes to answer #7333 (comment) Gambit makes 2 requests to the Signups endpoint:

  • GET signups?user=:id:&campaigns=:id: for one Campaign ID, used by the chatbot route
  • GET signups/:id for a Signup ID, used by the signups route

@aaronschachter
Copy link
Contributor

aaronschachter commented Mar 29, 2017

Investigating in New Relic, and one idea that comes to mind is introducing a query parameter to exclude Reportback Files. The mobile apps (and possibly other DS apps?) need Reportback File image URL's to render User profiles, but Gambit doesn't ever use them -- it just cares about the Signup's reportback.quantity to determine whether the current User has completed the Campaign (side note: Gambit also currently requires the Signup user.id to exist)

Adding the parameter and conditionally joining on RB files could be a way to improve Gambit performance without breaking the mobile app User profiles.

Here's some examples of 5 minute GET signups requests I've found from poking around New Relic for Gambit external services

gambit-response-times

From Phoenix New Relic:
phoenix-services

Note: User profiles in mobile app are currently broken while the Northstar Signups proxy is broken.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants