Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Where are the Metro logs? #55

Closed
hancush opened this issue Jul 22, 2020 · 10 comments
Closed

Where are the Metro logs? #55

hancush opened this issue Jul 22, 2020 · 10 comments

Comments

@hancush
Copy link
Member

hancush commented Jul 22, 2020

I'm trying to debug some strange behavior in the Metro scraper. I'm noticing that the main log (/tmp/lametro.log) is empty, and the log before (/tmp/lametro.log.1) only has logs from the scrape I ran manually, and nothing else.

ubuntu@ip-10-0-0-80:~$ wc -l /tmp/lametro.log
0 /tmp/lametro.log

I searched the syslog to confirm scrapes were running, and it seems like they are, and that they're configured to append logs to the expected log.

ubuntu@ip-10-0-0-80:~$ grep CRON /var/log/syslog | grep lametro
...
Jul 22 13:00:01 ip-10-0-0-80 CRON[29955]: (datamade) CMD (/usr/bin/flock -n /tmp/metroevents.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-event-scrape.sh" >> /tmp/lametro.log)
Jul 22 13:05:01 ip-10-0-0-80 CRON[32387]: (datamade) CMD (/usr/bin/flock -n /tmp/metrobills.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-bill-scrape.sh" >> /tmp/lametro.log)
Jul 22 13:15:01 ip-10-0-0-80 CRON[8136]: (datamade) CMD (/usr/bin/flock -n /tmp/metroevents.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-event-scrape.sh" >> /tmp/lametro.log)
Jul 22 13:20:01 ip-10-0-0-80 CRON[10888]: (datamade) CMD (/usr/bin/flock -n /tmp/metrobills.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-bill-scrape.sh" >> /tmp/lametro.log)

So why don't we have any logs????

Importantly, both the Chicago and NYC scrapers seem to be updating their respective logs as expected!

ubuntu@ip-10-0-0-80:~$ tail -n 50 /tmp/chicago.log
07/22/2020 08:25:57 INFO pupa: virtual hearing
07/22/2020 08:25:57 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5693/eventitems
07/22/2020 08:25:57 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5693/eventitems
07/22/2020 08:25:58 INFO pupa: save event 2020-07-30 10:00:00-05:00 Committee on Enviromental Protection and Energy as event_e06d6264-cc1e-11ea-9243-122a3d729da3.json
07/22/2020 08:25:58 INFO scrapelib: GET - https://chicago.legistar.com/MeetingDetail.aspx?LEGID=5694&GID=106&G=503964DF-290E-455B-86CC-7B03A231E984
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:25:59 INFO pupa: virtual meeting
07/22/2020 08:25:59 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5694/eventitems
07/22/2020 08:25:59 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5694/eventitems
07/22/2020 08:26:00 INFO pupa: save event 2020-07-29 10:00:00-05:00 Committee on Public Safety as event_e15ce50a-cc1e-11ea-9243-122a3d729da3.json
07/22/2020 08:26:00 INFO scrapelib: GET - https://chicago.legistar.com/MeetingDetail.aspx?LEGID=5698&GID=106&G=503964DF-290E-455B-86CC-7B03A231E984
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:26:00 INFO pupa: virtual meeting
07/22/2020 08:26:00 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5698/eventitems
07/22/2020 08:26:01 INFO scrapelib: GET - http://webapi.legistar.com/v1/chicago/events/5698/eventitems
07/22/2020 08:26:02 INFO pupa: save event 2020-09-02 10:00:00-05:00 Committee on Budget and Government Operations as event_e280c85c-cc1e-11ea-9243-122a3d729da3.json
chicago (scrape, import)
  events: {'window': '0.05'}
Not checking sessions...
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
chicago (scrape, import)
  events: {'window': '0.05'}
events scrape:
  duration:  0:00:37.681782
  objects:
    event: 13
jurisdiction scrape:
  duration:  0:00:00.290168
  objects:
    jurisdiction: 1
    membership: 6
    organization: 3
    person: 6
    post: 52
import:
  event: 0 new 0 updated 13 noop
  jurisdiction: 0 new 0 updated 1 noop
  membership: 0 new 0 updated 6 noop
  organization: 0 new 0 updated 3 noop
  person: 0 new 0 updated 6 noop
  post: 0 new 0 updated 52 noop
ubuntu@ip-10-0-0-80:~$ tail -n 50 /tmp/nyc.log
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:30:37 INFO scrapelib: POST - https://legistar.council.nyc.gov/Calendar.aspx/
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:30:37 INFO scrapelib: GET - https://legistar.council.nyc.gov/View.ashx?M=IC&ID=797385&GUID=4A687CD7-993D-436D-A16B-F35EBD0B128A
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:30:38 INFO scrapelib: GET - https://webapi.legistar.com/v1/nyc/events/18038/eventitems
07/22/2020 08:30:38 INFO scrapelib: GET - https://webapi.legistar.com/v1/nyc/events/18038/eventitems
07/22/2020 08:30:44 INFO pupa: save event 2020-07-28 13:30:00-04:00 City Council as event_87c8bf04-cc1f-11ea-bfc6-122a3d729da3.json
07/22/2020 08:30:44 WARNING legistar: API event could not be found in web interface: https://webapi.legistar.com/v1/nyc/events/18040
07/22/2020 08:30:44 WARNING legistar: API event could not be found in web interface: https://webapi.legistar.com/v1/nyc/events/18041
07/22/2020 08:30:44 INFO scrapelib: GET - https://legistar.council.nyc.gov/View.ashx?M=IC&ID=798357&GUID=3393142C-96BB-4845-AC8E-6657EBC093B0
/home/datamade/.virtualenvs/opencivicdata/lib/python3.5/site-packages/urllib3/connectionpool.py:1004: InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
07/22/2020 08:30:44 INFO scrapelib: GET - https://webapi.legistar.com/v1/nyc/events/18039/eventitems
07/22/2020 08:30:45 INFO scrapelib: GET - https://webapi.legistar.com/v1/nyc/events/18039/eventitems
07/22/2020 08:30:51 INFO pupa: save event 2020-07-23 14:30:00-04:00 Democratic Conference of the Council of the City of New York as event_8bb5bf54-cc1f-11ea-bfc6-122a3d729da3.json
07/22/2020 08:30:51 WARNING legistar: API event could not be found in web interface: https://webapi.legistar.com/v1/nyc/events/18043
07/22/2020 08:30:51 WARNING legistar: API event could not be found in web interface: https://webapi.legistar.com/v1/nyc/events/18044
07/22/2020 08:30:53 ERROR pupa: cannot resolve pseudo id to Bill: ~{"identifier": "T2020-6410"}
nyc (scrape, import)
  events: {'window': '0.05'}
Not checking sessions...
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
nyc (scrape, import)
  events: {'window': '0.05'}
events scrape:
  duration:  0:00:26.450395
  objects:
    event: 2
jurisdiction scrape:
  duration:  0:00:00.210824
  objects:
    jurisdiction: 1
    organization: 4
    post: 51
import:
  event: 0 new 0 updated 2 noop
  jurisdiction: 0 new 0 updated 1 noop
  organization: 0 new 0 updated 4 noop
  post: 0 new 0 updated 51 noop
@hancush
Copy link
Member Author

hancush commented Jul 22, 2020

Potentially related to opencivicdata/scrapers-us-municipal#328.

@hancush
Copy link
Member Author

hancush commented Jul 22, 2020

I unzipped the most recent log archive:

ubuntu@ip-10-0-0-80:~$ gunzip /tmp/lametro.log.2 -k -c > lametro.log.2

The last Metro scrape we have logs for ran on 7/18:

ubuntu@ip-10-0-0-80:~$ tail -n 100 lametro.log.2
07/18/2020 18:51:32 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6858
07/18/2020 18:51:32 INFO pupa: save bill 2020-0416 in 2019 as bill_9b266a2c-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:32 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6863
07/18/2020 18:51:33 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6863
07/18/2020 18:51:33 INFO pupa: save bill 2020-0421 in 2019 as bill_9bc74096-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:33 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6616
07/18/2020 18:51:34 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6616
07/18/2020 18:51:34 INFO pupa: save bill 2020-0174 in 2019 as bill_9c572b84-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:34 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6877
07/18/2020 18:51:35 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6877
07/18/2020 18:51:35 INFO pupa: save bill 2020-0435 in 2019 as bill_9cf548f0-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:35 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6878
07/18/2020 18:51:36 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6878
07/18/2020 18:51:36 INFO pupa: save bill 2020-0436 in 2019 as bill_9d88a744-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:36 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6893
07/18/2020 18:51:37 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6893
07/18/2020 18:51:37 INFO pupa: save bill 2020-0451 in 2019 as bill_9e26f4f8-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:37 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6906
07/18/2020 18:51:38 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6906
07/18/2020 18:51:38 INFO pupa: save bill 2020-0464 in 2020 as bill_9ebe73fa-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:38 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6721
07/18/2020 18:51:39 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6721
07/18/2020 18:51:39 INFO pupa: save bill 2020-0279 in 2019 as bill_9f55e4ec-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:39 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6913
07/18/2020 18:51:40 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6913
07/18/2020 18:51:40 INFO pupa: save bill 2020-0471 in 2020 as bill_9ff07c00-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:40 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6924
07/18/2020 18:51:41 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6924
07/18/2020 18:51:41 INFO pupa: save bill 2020-0482 in 2020 as bill_a089326a-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:41 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6918
07/18/2020 18:51:42 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6918
07/18/2020 18:51:42 INFO pupa: save bill 2020-0476 in 2020 as bill_a129eeb2-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:42 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6925
07/18/2020 18:51:43 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6925
07/18/2020 18:51:43 INFO pupa: save bill 2020-0483 in 2020 as bill_a1bd175a-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:43 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6898
07/18/2020 18:51:44 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6898
07/18/2020 18:51:44 INFO pupa: save bill 2020-0456 in 2019 as bill_a25853f0-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:44 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6927
07/18/2020 18:51:45 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6927
07/18/2020 18:51:45 INFO pupa: save bill 2020-0485 in 2020 as bill_a2f4b24a-c951-11ea-82d5-122a3d729da3.json
07/18/2020 18:51:45 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6928
07/18/2020 18:51:46 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6928
lametro (scrape)
  bills: {'window': '0.05'}
Not checking sessions...
lametro (scrape)
  bills: {'window': '0.05'}
bills scrape:
  duration:  0:01:29.129319
  objects:
    bill: 71
jurisdiction scrape:
  duration:  0:00:01.172966
  objects:
    jurisdiction: 1
    organization: 3
    post: 18
lametro (import)
  bills: {}
  events: {}
  people: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
  bills: {}
  events: {}
  people: {}
import:
  bill: 0 new 0 updated 71 noop
  jurisdiction: 0 new 0 updated 1 noop
  organization: 0 new 0 updated 3 noop
  post: 0 new 0 updated 18 noop
lametro (import)
  people: {}
  bills: {}
  events: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
  people: {}
  bills: {}
  events: {}
import:
  bill: 0 new 0 updated 71 noop
  jurisdiction: 0 new 0 updated 1 noop
  organization: 0 new 0 updated 3 noop
  post: 0 new 0 updated 18 noop

I'm not certain, but when I was spelunking re: opencivicdata/scrapers-us-municipal#328, I checked the max updated date on bills in the Councilmatic database, and it may have been 7/18. Are the scrapes not actually running? We break the Metro scrapes into scrape and import steps, so I would at least expect to see logs for the scrapes, even if the imports fail. Very curious...

@hancush
Copy link
Member Author

hancush commented Jul 22, 2020

We last deployed the scrapers at 9 a.m. 7/17, which is 2 p.m. UTC time. Per the logs, scrapes continued to run through Saturday, 7/18, before "stopping" (or not being logged). So I'm not sure that this was caused by a change to the deployed code.

@hancush
Copy link
Member Author

hancush commented Jul 22, 2020

FWIW, the last scrape coincides with the Saturday windowed scraping schedule:

# UTC: 6:00 am to 11:50 pm Saturday
# CST: 12:00 am to 5:50 pm Saturday
# CDT: 1:00 am to 6:50 pm Saturday
# Windowed scrapes every 15 minutes
0,15,30,45 6-23 * * 6 datamade /usr/bin/flock -n /tmp/metroevents.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-event-scrape.sh" >> /tmp/lametro.log
5,20,35,50 6-23 * * 6 datamade /usr/bin/flock -n /tmp/metrobills.lock -c "WINDOW=0.05 $APPDIR/scripts/lametro/windowed-bill-scrape.sh" >> /tmp/lametro.log

It's as if scheduling for the new week never kicked in?

@hancush
Copy link
Member Author

hancush commented Jul 22, 2020

This would seem to be the scrape where a huge number of bills were updated:

07/18/2020 00:55:23 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6898
07/18/2020 00:55:23 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6898
07/18/2020 00:55:23 INFO pupa: save bill 2020-0456 in 2019 as bill_450cf4da-c8bb-11ea-b551-122a3d729da3.json
07/18/2020 00:55:23 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6928
07/18/2020 00:55:23 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6928
07/18/2020 00:55:23 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6927
07/18/2020 00:55:23 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6927
07/18/2020 00:55:23 INFO pupa: save bill 2020-0485 in 2020 as bill_45139eca-c8bb-11ea-b551-122a3d729da3.json
lametro (scrape)
  bills: {'window': '0'}
Not checking sessions...
lametro (scrape)
  bills: {'window': '0'}
bills scrape:
  duration:  0:50:18.000070
  objects:
    bill: 2577
jurisdiction scrape:
  duration:  0:00:00.134372
  objects:
    jurisdiction: 1
    organization: 3
    post: 18
lametro (import)
  events: {}
  bills: {}
  people: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
  events: {}
  bills: {}
  people: {}
import:
---------->  bill: 0 new 1906 updated 671 noop <----------
  jurisdiction: 0 new 0 updated 1 noop
  organization: 0 new 0 updated 3 noop
  post: 0 new 0 updated 18 noop
lametro (import)
  people: {}
  bills: {}
  events: {}
import jurisdictions...
import organizations...
import people...
import posts...
import memberships...
import bills...
import events...
import vote events...
lametro (import)
  people: {}
  bills: {}
  events: {}
import:
  bill: 0 new 1906 updated 671 noop
  jurisdiction: 0 new 0 updated 1 noop
  organization: 0 new 0 updated 3 noop
  post: 0 new 0 updated 18 noop

@hancush
Copy link
Member Author

hancush commented Jul 23, 2020

Confirming that both of the public bills Shelly provided as examples had inaccessible gateway links during the implicated scrape:

ubuntu@ip-10-0-0-80:~$ grep 07/18/2020 lametro.log.3 | grep 6217
07/18/2020 00:51:52 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6217
07/18/2020 00:51:52 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6217
ubuntu@ip-10-0-0-80:~$ grep 07/18/2020 lametro.log.3 | grep 6573
07/18/2020 00:53:28 INFO scrapelib: HEAD - https://metro.legistar.com/gateway.aspx?m=l&id=6573
07/18/2020 00:53:28 WARNING legistar: Bill could not be found in web interface: https://webapi.legistar.com/v1/metro/matters/6573

Cc opencivicdata/scrapers-us-municipal#328

@hancush
Copy link
Member Author

hancush commented Jul 23, 2020

Hm, it seems like there are a bunch of Metro scrapes just chilling on the OCD server:

ubuntu@ip-10-0-0-80:~$ ps --sort=start_time aux | grep [l]ametro
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
datamade  9472  0.0  0.0   4448   104 ?        Ss   Jul18   0:00 /bin/sh -c /usr/bin/flock -n /tmp/metroevents.lock -c $APPDIR/scripts/lametro/fast-full-event-scrape.sh >> /tmp/lametro.log
datamade  9474  0.0  0.0   5904    84 ?        S    Jul18   0:00 /usr/bin/flock -n /tmp/metroevents.lock -c /home/datamade/scrapers-us-municipal/scripts/lametro/fast-full-event-scrape.sh
datamade  9475  0.0  0.0   4448   104 ?        S    Jul18   0:00 /bin/sh -c /home/datamade/scrapers-us-municipal/scripts/lametro/fast-full-event-scrape.sh
datamade  9476  0.0  0.0   4448   100 ?        S    Jul18   0:00 /bin/sh /home/datamade/scrapers-us-municipal/scripts/lametro/fast-full-event-scrape.sh
datamade  9480  0.0  7.3 235912 151584 ?       S    Jul18   0:44 /home/datamade/.virtualenvs/opencivicdata/bin/python3.5 /home/datamade/.virtualenvs/opencivicdata/bin/pupa update --datadir=/cache/events/_data/ lametro --scrape events --rpm=0
datamade 30949  0.0  0.0   4448   108 ?        Ss   Jul19   0:00 /bin/sh -c /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log
datamade 30951  0.0  0.0   5904    92 ?        S    Jul19   0:00 /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh
datamade 30955  0.0  0.0   5904    68 ?        S    Jul19   0:00 /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh
datamade 29715  0.0  0.0   4448   108 ?        Ss   Jul20   0:00 /bin/sh -c /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log
datamade 29721  0.0  0.0   5904    64 ?        S    Jul20   0:00 /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh
datamade 27298  0.0  0.0   4448   104 ?        Ss   Jul21   0:00 /bin/sh -c /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log
datamade 27301  0.0  0.0   5904    72 ?        S    Jul21   0:00 /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh
datamade 18780  0.0  0.0   4448   268 ?        Ss   Jul22   0:00 /bin/sh -c /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log
datamade 18781  0.0  0.0   5904    68 ?        S    Jul22   0:00 /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh
datamade 24007  0.0  0.0   4448   268 ?        Ss   00:05   0:00 /bin/sh -c /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log
datamade 24012  0.0  0.0   5904    68 ?        S    00:05   0:00 /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock /home/datamade/scrapers-us-municipal/scripts/lametro/full-scrape.sh

This would explain why there are no logs – old file locks are preventing the new scrapes from running. How did this happen? And is it safe to terminate them?

@hancush
Copy link
Member Author

hancush commented Jul 23, 2020

We use flock to manage locking files. Usually, we do this with the -n flag so commands fail, rather than block, if they can't acquire a lock.

However! It looks like the full scrape does not have this option:

# Full scrape
5 0 * * 0-5 datamade /usr/bin/flock /tmp/metrobills.lock /usr/bin/flock /tmp/metroevents.lock $APPDIR/scripts/lametro/full-scrape.sh >> /tmp/lametro.log

So, that might explain the build up of full scrapes. But why are the windowed scrapes hanging out???

More on flock: https://linux.die.net/man/1/flock

@hancush
Copy link
Member Author

hancush commented Jul 23, 2020

Oh, there's just the one fast full event scrape causing the bottleneck... So why is that hung????

@hancush
Copy link
Member Author

hancush commented Jul 23, 2020

Decided that it wasn't worth the time investment to investigate the bottleneck since we are moving away from this pattern for Metro. Instead, I killed each of the blocking processes with sudo kill -9, then (superstitiously) cleared the locks with flock -u /tmp/${LOCKNAME}.lock echo 'dropped (-u is unlock).

Hark! The next regularly scheduled event scrape ran and logged as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant