Add check_pixels.py script #55

Open · wants to merge 9 commits into master

Conversation

will-moore (Member):

This script validates pixel values by loading planes with pixels.getPlanes() and comparing each Image on the current server (the one you are logged in to) with the same Image on IDR, using the same Image IDs.

To be used as the final step in the Fileset replacement workflow, to validate that Images on the test or "idr-next" server have the same pixel values as on IDR itself.

Can be used to validate all Images within a container. Supported Object types are "Screen", "Plate", "Project", "Dataset", "Image".

The --max-planes option lets you check only a subset of an Image's planes; otherwise some images could take a very long time to check. By default we check ALL planes of each image.

Output is written to a log file.

$ omero login
$ python check_pixels.py Plate:4299 check_pix.log --max-planes=1
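
For reference, a minimal sketch of the comparison being performed, assuming two BlitzGateway connections - conn (current server) and idr_conn (IDR). Names here are illustrative, not necessarily the script's actual code:

def check_image(conn, idr_conn, image_id, max_planes=None):
    image = conn.getObject("Image", image_id)
    idr_image = idr_conn.getObject("Image", image_id)
    # every (z, c, t) plane index, optionally truncated by --max-planes
    zct_list = [
        (z, c, t)
        for z in range(image.getSizeZ())
        for c in range(image.getSizeC())
        for t in range(image.getSizeT())
    ]
    if max_planes is not None:
        zct_list = zct_list[:max_planes]
    planes = image.getPrimaryPixels().getPlanes(zct_list)
    idr_planes = idr_image.getPrimaryPixels().getPlanes(zct_list)
    for plane, idr_plane, zct in zip(planes, idr_planes, zct_list):
        # planes are numpy arrays, so element-wise comparison works
        if not (plane == idr_plane).all():
            print("Error: Mismatch for Image:%s at plane (z, c, t): %s"
                  % (image_id, zct))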

will-moore (Member Author) commented Nov 13, 2023

Testing last commit on idr-testing:omeroreadwrite:

(venv3) bash-4.2$ for i in 202 1351 1501 1551 1601 1602 1603 1202 1101 1302 1202 1251 1851 1751 2001 1952 2851; do  echo Plate:$i; python check_pixels.py Plate:$i --max-planes=5; --max-images=100 > /tmp/check_pix_$i.log; done

EDIT: oops - those are Screen IDs! - Need to run against Screens...

will-moore (Member Author) commented Nov 17, 2023

Going again, with typos fixed...
On idr-testing: omeroreadwrite, as the omero-server user, in a screen session (screen -S check_pixels):

$ for i in 202 1351 1501 1551 1601 1602 1603 1202 1101 1302 1202 1251 1851 1751 2001 1952 2851; do  echo Screen:$i; python check_pixels.py Screen:$i --max-planes=5 --max-images=10 > /tmp/check_pix_screen$i.log; done

will-moore (Member Author):

After the weekend, checked logs... Looks like the loop above ran for a couple of days...

-rw-r--r--. 1 omero-server omero-server 1.9M Nov 18 14:43 /tmp/check_pix_screen1101.log
-rw-r--r--. 1 omero-server omero-server 429K Nov 18 15:55 /tmp/check_pix_screen1202.log
-rw-r--r--. 1 omero-server omero-server   84 Nov 18 16:05 /tmp/check_pix_screen1251.log
-rw-r--r--. 1 omero-server omero-server  84K Nov 18 14:59 /tmp/check_pix_screen1302.log
-rw-r--r--. 1 omero-server omero-server 555K Nov 17 23:33 /tmp/check_pix_screen1351.log
-rw-r--r--. 1 omero-server omero-server 131K Nov 18 02:34 /tmp/check_pix_screen1501.log
-rw-r--r--. 1 omero-server omero-server  38K Nov 18 03:51 /tmp/check_pix_screen1551.log
-rw-r--r--. 1 omero-server omero-server 3.7K Nov 18 03:54 /tmp/check_pix_screen1601.log
-rw-r--r--. 1 omero-server omero-server  13K Nov 18 04:16 /tmp/check_pix_screen1602.log
-rw-r--r--. 1 omero-server omero-server  940 Nov 18 04:18 /tmp/check_pix_screen1603.log
-rw-r--r--. 1 omero-server omero-server 365K Nov 18 17:16 /tmp/check_pix_screen1751.log
-rw-r--r--. 1 omero-server omero-server  12K Nov 18 16:08 /tmp/check_pix_screen1851.log
-rw-r--r--. 1 omero-server omero-server 620K Nov 18 18:46 /tmp/check_pix_screen1952.log
-rw-r--r--. 1 omero-server omero-server 144K Nov 18 18:26 /tmp/check_pix_screen2001.log
-rw-r--r--. 1 omero-server omero-server  56K Nov 17 22:32 /tmp/check_pix_screen202.log
-rw-r--r--. 1 omero-server omero-server 170K Nov 18 19:36 /tmp/check_pix_screen2851.log

Returning to the screen session...

Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7525, in getTiles
    rawPixelsStore = self._prepareRawPixelsStore()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7422, in _prepareRawPixelsStore
    ps.setPixelsId(self._obj.id.val, True, self._conn.SERVICE_OPTS)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4859, in __call__
    return self.handle_exception(e, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 529, in setPixelsId
    return _M_omero.api.RawPixelsStore._op_setPixelsId.invoke(self, ((pixelsId, bypassOriginalFile), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero
Traceback (most recent call last):
  File "check_pixels.py", line 42, in check_image
    for plane, idr_plane, idx in zip(planes, idr_planes, zctList):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7562, in getTiles
    raise exc
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7525, in getTiles
    rawPixelsStore = self._prepareRawPixelsStore()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7422, in _prepareRawPixelsStore
    ps.setPixelsId(self._obj.id.val, True, self._conn.SERVICE_OPTS)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4859, in __call__
    return self.handle_exception(e, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 529, in setPixelsId
    return _M_omero.api.RawPixelsStore._op_setPixelsId.invoke(self, ((pixelsId, bypassOriginalFile), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "check_pixels.py", line 149, in <module>
    main(sys.argv[1:])
  File "check_pixels.py", line 144, in main
    check_image(idr_conn, image, max_planes)
  File "check_pixels.py", line 46, in check_image
    log("Error: Image:%s %s" % (image.id, ex.message))
AttributeError: 'ConnectionLostException' object has no attribute 'message'

Looking at individual logs...

less /tmp/check_pix_screen202.log
Error: Different Image IDs: [696320, 696321, 696324, 696325,...

# we checked 101 images...
grep "Check Image" /tmp/check_pix_screen202.log | wc
101
# 10 from each plate as expected...
grep "Check Image" /tmp/check_pix_screen202.log
...
86/460 Check Image:692844 P111 [Well H5, Field 1]
87/460 Check Image:692845 P111 [Well E5, Field 1]
88/460 Check Image:692846 P111 [Well H1, Field 1]
89/460 Check Image:692847 P111 [Well H10, Field 1]
90/460 Check Image:692926 P112 [Well D2, Field 1]
91/460 Check Image:692927 P112 [Well C4, Field 1]
92/460 Check Image:692928 P112 [Well D4, Field 1]
93/460 Check Image:692929 P112 [Well D12, Field 1]
94/460 Check Image:692930 P112 [Well F12, Field 1]
95/460 Check Image:692931 P112 [Well B5, Field 1]
96/460 Check Image:692932 P112 [Well E3, Field 1]
97/460 Check Image:692933 P112 [Well A4, Field 1]
98/460 Check Image:692934 P112 [Well A5, Field 1]
99/460 Check Image:692935 P112 [Well G11, Field 1]
100/460 Check Image:692975 P115 [Well C7, Field 1]

Looks like all images were failing the pixels check.
NB: Accidentally re-ran the check_pixels.py for loop above without being logged in, which immediately wiped all the logs (the > redirection truncates each log file before the script fails)!

Logged in and re-ran:

for i in 202 1351 1501 1551 1601 1602 1603 1202 1101 1302 1202 1251 1851 1751 2001 1952 2851; do  echo Screen:$i; python check_pixels.py Screen:$i --max-planes=1 --max-images=1 > /tmp/check_pix_screen$i.log; done

Screen:202
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 614400 bytes
Traceback (most recent call last):
  File "check_pixels.py", line 42, in check_image
    for plane, idr_plane, idx in zip(planes, idr_planes, zctList):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7562, in getTiles
    raise exc
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 614400 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "check_pixels.py", line 149, in <module>
    main(sys.argv[1:])
  File "check_pixels.py", line 144, in main
    check_image(idr_conn, image, max_planes)
  File "check_pixels.py", line 46, in check_image
    log("Error: Image:%s %s" % (image.id, ex.message))
AttributeError: 'error' object has no attribute 'message'

A failure in getPlane() causes the script to crash - need to fix the error handling.
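
The crash comes from ex.message, which doesn't exist on Python 3 exceptions (including the Ice and struct errors above). A minimal sketch of the fix - compare_planes and log are hypothetical helper names, not the script's actual code:

def check_image(conn, idr_conn, image, max_planes=None):
    try:
        compare_planes(conn, idr_conn, image, max_planes)
    except Exception as ex:
        # previously ex.message -> AttributeError; str(ex) is safe everywhere
        log("Error: Image:%s %s" % (image.id, ex))
        # carry on with the next Image instead of crashing the whole run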

will-moore (Member Author) commented Nov 20, 2023

Started the loop above at 9:59, checking a single plane per Fileset...
EDIT: updated on completion.

$ grep "Error: Image" /tmp/check_pix_screen* | wc
      9      62    1833
(base) [wmoore@test120-omeroreadwrite ~]$ grep "Error: Image" /tmp/check_pix_screen*
/tmp/check_pix_screen1101.log:Error: Image:1556034 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-05/03/23-33-31.705_mkngff/dab29e5a-d36f-430a-a9ff-7a1d6e4ce299.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1101.log:Error: Image:1573072 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-05/07/02-36-52.924_mkngff/8387705b-16bf-4b14-8884-426b0c16dfff.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1101.log:Error: Image:1600788 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-05/08/17-02-05.805_mkngff/df947dfe-ed8f-4dda-a20a-fb9f3a717b47.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1251.log:Error: Image:2131186 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/17/08-11-44.096_mkngff/22803be2-9732-41ab-b9bf-fcd37c3b3b84.zarr/.zattrs
/tmp/check_pix_screen1351.log:Error: Image:3086565 /data/OMERO/Pixels/Dir-003/Dir-086/3086565 (Read-only file system)
/tmp/check_pix_screen1501.log:Error: Image:2852579 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-07/28/16-42-19.788_mkngff/8ba28efa-377f-4602-8c09-41737075ff08.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1501.log:Error: Image:2854739 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-07/28/18-07-38.354_mkngff/bd01548b-da85-4a66-8422-7b16995fabba.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1501.log:Error: Image:2855122 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-07/28/18-22-12.907_mkngff/36e53731-585a-4b73-9500-8e8b087b1022.zarr/OME/METADATA.ome.xml
/tmp/check_pix_screen1501.log:Error: Image:2855549 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-07/28/18-37-49.702_mkngff/0e5e1147-449f-4a89-96c6-3a3c85f8c1a7.zarr/OME/METADATA.ome.xml

Failures found so far:

  • 3 plates from idr0013-ScreenA (screen1101): LT0066_33, LT0080_27 and LT0103_13, which need mkngff re-run.
  • 1 plate from idr0016 (screen1251): 24634
  • 1 plate from idr0010 (screen1351.log) which is invalid in IDR: https://idr.openmicroscopy.org/webclient/?show=plate-5894
  • 4 failing plates from idr0011 (screen1501.log) which still need to be recreated with mkngff.

will-moore (Member Author) commented Nov 22, 2023

All the following Screens ran to completion, with the number of Images representing the number of Plates in each Screen (1 Image per Plate).

$ grep -B 1 "End: " /tmp/check_pix_screen*
/tmp/check_pix_screen1101.log-509/510 Check Image:3051569 LT0067_11 [Well M11, Field 1]
/tmp/check_pix_screen1101.log:End: 2023-11-20 12:48:52.780474
--
/tmp/check_pix_screen1351.log-147/148 Check Image:3103845 96-14 [Scan K25]
/tmp/check_pix_screen1351.log:End: 2023-11-20 10:05:39.093512
--
/tmp/check_pix_screen1501.log-128/129 Check Image:2855926 Plate6-Green-A-(43) [Well F1, Field 1]
/tmp/check_pix_screen1501.log:End: 2023-11-20 10:13:56.385153
--
/tmp/check_pix_screen1551.log-39/40 Check Image:2857751 Plate2-TS-Red-B [Well B2, Field 1]
/tmp/check_pix_screen1551.log:End: 2023-11-20 10:16:14.794497
--
/tmp/check_pix_screen1601.log-3/4 Check Image:2857862 Plate Stinger-Target 1-A-used pics [Well D4, Field 1]
/tmp/check_pix_screen1601.log:End: 2023-11-20 10:16:29.730690
--
/tmp/check_pix_screen1602.log-7/8 Check Image:2959704 Plate1-Blue-B-TS-Stinger [Well E5, Field 1]
/tmp/check_pix_screen1602.log:End: 2023-11-20 10:16:52.772779
--
/tmp/check_pix_screen1603.log-0/1 Check Image:2857882 TS-Stinger-Target-1&2 [Well E6, Field 1]
/tmp/check_pix_screen1603.log:End: 2023-11-20 10:16:57.385534
--
/tmp/check_pix_screen1751.log-11/12 Check Image:3256968 41757_illum_corrected [Well P8, Field 1]
/tmp/check_pix_screen1751.log:End: 2023-11-20 15:04:51.962016
--
/tmp/check_pix_screen1851.log-2/3 Check Image:3261252 10x images plate 2 [Well H6, Field 1]
/tmp/check_pix_screen1851.log:End: 2023-11-20 15:03:04.208990
--
/tmp/check_pix_screen1952.log-19/20 Check Image:2781206 20586 [Well A16, Field 1]
/tmp/check_pix_screen1952.log:End: 2023-11-20 15:10:15.761085
--
/tmp/check_pix_screen2001.log-54/55 Check Image:3427062 Week6_31681 [Well E7, Field 1]
/tmp/check_pix_screen2001.log:End: 2023-11-20 15:07:54.243113

Others that didn't complete:

$ grep -L "End: " /tmp/check_pix_screen*
/tmp/check_pix_screen1202.log
/tmp/check_pix_screen1251.log
/tmp/check_pix_screen1302.log
/tmp/check_pix_screen202.log
/tmp/check_pix_screen2851.log
$ tail -n 2 /tmp/check_pix_screen1202.log
66/68 Check Image:3050227 HT47 [Well E18, Field 1]
67/68 Check Image:3050898 HT10 [Well M23, Field 1]

$ tail -n 2 /tmp/check_pix_screen1251.log
Error: Image:2131186 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/17/08-11-44.096_mkngff/22803be2-9732-41ab-b9bf-fcd37c3b3b84.zarr/.zattrs
40/413 Check Image:2133490 24641 [Well B3, Field 1]

$ tail -n 2 /tmp/check_pix_screen1302.log
5/28 Check Image:2860170 LTValidMitosisSon384Plate06_05 [Well P7, Field 1]
6/28 Check Image:2860514 LTValidMitosisSon384Plate06_03 [Well O20, Field 1]

$ tail -n 2 /tmp/check_pix_screen202.log
9/46 Check Image:692926 P112 [Well D2, Field 1]
10/46 Check Image:692975 P115 [Well C7, Field 1]

$ tail -n 2 /tmp/check_pix_screen2851.log
7/22 Check Image:12546038 190313.screen [Well E2, Field 1]
8/22 Check Image:12546774 190322.screen [Well C2, Field 1]

1202 actually did complete. Let's start the others again...

for i in 1251 1302 202 2851; do  echo Screen:$i; python check_pixels.py Screen:$i --max-planes=1 --max-images=1 > /tmp/check_pix20232211_screen$i.log; done

will-moore (Member Author) commented Nov 22, 2023

Progress...
11:36:

(base) [wmoore@test120-omeroreadwrite ~]$ ls -alh /tmp/check_pix20232211_screen*
-rw-r--r--. 1 omero-server omero-server 8.1M Nov 22 06:28 /tmp/check_pix20232211_screen1251.log
-rw-r--r--. 1 omero-server omero-server  87K Nov 22 06:31 /tmp/check_pix20232211_screen1302.log
-rw-r--r--. 1 omero-server omero-server  30K Nov 22 06:32 /tmp/check_pix20232211_screen202.log
-rw-r--r--. 1 omero-server omero-server 171K Nov 22 07:25 /tmp/check_pix20232211_screen2851.log

Only 1 of the 4 Screens completed:

$ grep -B 2 "End" /tmp/check_pix20232211_screen*
/tmp/check_pix20232211_screen1302.log-26/28 Check Image:2867393 LTValidMitosisSon384Plate01_02 [Well F19, Field 1]
/tmp/check_pix20232211_screen1302.log-27/28 Check Image:2867735 LTValidMitosisSon384Plate01_01 [Well C17, Field 1]
/tmp/check_pix20232211_screen1302.log:End: 2023-11-22 06:31:51.114386
$ tail /tmp/check_pix20232211_screen1251.log
...
37/413 Check Image:2126578 24631 [Well I10, Field 1]
38/413 Check Image:2126840 24633 [Well B7, Field 1]
39/413 Check Image:2131186 24634 [Well F21, Field 1]
Error: Image:2131186 Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/17/08-11-44.096_mkngff/22803be2-9732-41ab-b9bf-fcd37c3b3b84.zarr/.zattrs
40/413 Check Image:2133490 24641 [Well B3, Field 1]

The error comes from an Image in the idr0016 Plate named 24634, which also failed to generate a memo file.

$ tail /tmp/check_pix20232211_screen202.log
...
1/46 Check Image:692225 P102 [Well D3, Field 1]
2/46 Check Image:692276 P105 [Well C2, Field 1]
3/46 Check Image:692362 P106 [Well C1, Field 1]
4/46 Check Image:692451 P107 [Well D6, Field 1]
5/46 Check Image:692549 P108 [Well H9, Field 1]
6/46 Check Image:692645 P109 [Well C9, Field 1]
7/46 Check Image:692743 P110 [Well C2, Field 1]
8/46 Check Image:692838 P111 [Well G5, Field 1]
9/46 Check Image:692926 P112 [Well D2, Field 1]
10/46 Check Image:692975 P115 [Well C7, Field 1]
$ tail /tmp/check_pix20232211_screen2851.log
0/22 Check Image:12539702 190129.screen [Well E2, Field 1]
1/22 Check Image:12541270 190206.screen [Well B5, Field 1]
2/22 Check Image:12542038 190211.screen [Well F11, Field 1]
3/22 Check Image:12543030 190213.screen [Well B10, Field 1]
4/22 Check Image:12543766 190220.screen [Well B8, Field 1]
5/22 Check Image:12544758 190227.screen [Well B2, Field 1]
6/22 Check Image:12545750 190306.screen [Well B5, Field 1]
7/22 Check Image:12546038 190313.screen [Well E2, Field 1]
8/22 Check Image:12546774 190322.screen [Well C2, Field 1]
9/22 Check Image:12547510 190327.screen [Well F2, Field 1]

Returning to the terminal screen where this was running, I see ConnectionLostException

ConnectionLostException
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 614400 bytes

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "check_pixels.py", line 149, in <module>
    main(sys.argv[1:])
  File "check_pixels.py", line 144, in main
    check_image(idr_conn, image, max_planes)
  File "check_pixels.py", line 46, in check_image
    log("Error: Image:%s %s" % (image.id, ex.message))
AttributeError: 'error' object has no attribute 'message'
!! 11/22/23 06:32:42.201 error: communicator not destroyed during global destruction.Screen:2851
WARNING:omero.gateway:ConnectionLostException on <class 'omero.gateway.OmeroGatewaySafeCallWrapper'> to <98bcf454-af15-4643-83e1-0d4f2e813f5domero.api.RawPixelsStore> setPixelsId((12547510, True, <ServiceOptsDict: {'omero.client.uuid': 'a2b43c28-287c-4d1f-8e38-84b52f41411f', 'omero.session.uuid': 'd71a8b56-f065-400c-9c8f-23975b6a7813'}>), {})
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 529, in setPixelsId
    return _M_omero.api.RawPixelsStore._op_setPixelsId.invoke(self, ((pixelsId, bypassOriginalFile), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7525, in getTiles
    rawPixelsStore = self._prepareRawPixelsStore()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7422, in _prepareRawPixelsStore
    ps.setPixelsId(self._obj.id.val, True, self._conn.SERVICE_OPTS)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4859, in __call__
    return self.handle_exception(e, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 529, in setPixelsId
    return _M_omero.api.RawPixelsStore._op_setPixelsId.invoke(self, ((pixelsId, bypassOriginalFile), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero
Traceback (most recent call last):
  File "check_pixels.py", line 42, in check_image
    for plane, idr_plane, idx in zip(planes, idr_planes, zctList):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7562, in getTiles
    raise exc
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7525, in getTiles
    rawPixelsStore = self._prepareRawPixelsStore()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7422, in _prepareRawPixelsStore
    ps.setPixelsId(self._obj.id.val, True, self._conn.SERVICE_OPTS)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4859, in __call__
    return self.handle_exception(e, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 529, in setPixelsId
    return _M_omero.api.RawPixelsStore._op_setPixelsId.invoke(self, ((pixelsId, bypassOriginalFile), _ctx))
Ice.ConnectionLostException: Ice.ConnectionLostException:
recv() returned zero

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "check_pixels.py", line 149, in <module>
    main(sys.argv[1:])
  File "check_pixels.py", line 144, in main
    check_image(idr_conn, image, max_planes)
  File "check_pixels.py", line 46, in check_image
    log("Error: Image:%s %s" % (image.id, ex.message))
AttributeError: 'ConnectionLostException' object has no attribute 'message'

will-moore (Member Author) commented Nov 22, 2023

Test idr0090 Screen:

check_pixels.py Screen:2851 --max-planes=1 --max-images=1
...
0/22 Check Image:12539702 190129.screen [Well E2, Field 1]
1/22 Check Image:12541270 190206.screen [Well B5, Field 1]
2/22 Check Image:12542038 190211.screen [Well F11, Field 1]
3/22 Check Image:12543030 190213.screen [Well B10, Field 1]
4/22 Check Image:12543766 190220.screen [Well B8, Field 1]
5/22 Check Image:12544758 190227.screen [Well B2, Field 1]
6/22 Check Image:12545750 190306.screen [Well B5, Field 1]
7/22 Check Image:12546038 190313.screen [Well E2, Field 1]
8/22 Check Image:12546774 190322.screen [Well C2, Field 1]
9/22 Check Image:12547510 190327.screen [Well F2, Field 1]
10/22 Check Image:12548246 190502.screen [Well B4, Field 1]

Hangs at this point. Checking that image in the webclient - it also fails to load:
http://localhost:1080/webclient/?show=image-12548246 (idr0090)

will-moore (Member Author) commented Nov 22, 2023

Tried checking the rest of the idr0090 plates one at a time, but the first one just hangs...

python check_pixels.py Plate:9312 --max-planes=1 --max-images=1
...
0/1 Check Image:12549142 190510.screen [Well G8, Field 1]

Decided to test the remainder manually in the webclient. Here, "x" is not an Error - the plate just never loads images:

  • Plate name: 190510: x
  • Plate name: 190528: OK
  • Plate name: 190531: x
  • Plate name: 190607: x
  • Plate name: 190614: x
  • Plate name: 190621: x
  • Plate name: 190628: x
  • Plate name: 190705: x
  • Plate name: 190710: x
  • Plate name: 190809: x
  • Plate name: 190904: OK

Summary: 9 Plates from idr0090 are not viewable.

EDIT: these plates eventually did work OK. All passed check_pixels below (Screen:2851).

will-moore (Member Author):

Updated check_pixels on idr-testing:omeroreadwrite to include the 2 commits above from yesterday.

will-moore (Member Author) commented Nov 23, 2023

Checking idr0004...
Now we don't fail when there's an error...

$ python check_pixels.py Screen:202 --max-planes=1 --max-images=1
Start: 2023-11-23 14:14:44.616396
Checking Screen:202
max_planes: 1
max_images: 1
0/46 Check Image:692151 P101 [Well B2, Field 1]
1/46 Check Image:692225 P102 [Well D3, Field 1]
2/46 Check Image:692276 P105 [Well C2, Field 1]
3/46 Check Image:692362 P106 [Well C1, Field 1]
4/46 Check Image:692451 P107 [Well D6, Field 1]
5/46 Check Image:692549 P108 [Well H9, Field 1]
6/46 Check Image:692645 P109 [Well C9, Field 1]
7/46 Check Image:692743 P110 [Well C2, Field 1]
8/46 Check Image:692838 P111 [Well G5, Field 1]
9/46 Check Image:692926 P112 [Well D2, Field 1]
10/46 Check Image:692975 P115 [Well C7, Field 1]
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 614400 bytes
Error: Image:692975 unpack requires a buffer of 614400 bytes
11/46 Check Image:693071 P117 [Well A10, Field 1]
12/46 Check Image:693146 P118 [Well H1, Field 1]
13/46 Check Image:693215 P119 [Well D9, Field 1]
14/46 Check Image:693308 P121 [Well C8, Field 1]
15/46 Check Image:693386 P123 [Well F1, Field 1]
16/46 Check Image:693473 P124 [Well D8, Field 1]
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7542, in getTiles
    convertedPlane = unpack(convertType, rawPlane)
struct.error: unpack requires a buffer of 688128 bytes
Error: Image:693473 unpack requires a buffer of 688128 bytes
17/46 Check Image:693541 P125 [Well F8, Field 1]
18/46 Check Image:693620 P126 [Well B9, Field 1]
...
40/46 Check Image:698674 P150 [Well D11, Field 1]
41/46 Check Image:698738 P170 [Well B1, Field 1]
42/46 Check Image:698833 P171 [Well C4, Field 1]
43/46 Check Image:719278 P120 [Well D1, Field 1]
44/46 Check Image:797269 P128 [Well F6, Field 1]
45/46 Check Image:797731 P132 [Well D5, Field 1]
End: 2023-11-23 14:17:36.696575

Only 2 plates failing on idr0004...
Image:692975 is viewable in the webclient, but appears to have chunks missing (see below).
Image:693473 (plate P124) is viewable OK in the webclient, but the error above still happens - maybe a tiny chunk or less is missing?? Lots of other images on Plate P124 have the same issue, and the difference IS visible in the webclient.

[Screenshot 2023-11-23 at 14:34:45]

will-moore (Member Author) commented Nov 23, 2023

All idr0013-ScreenB Plates are OK!

$ python check_pixels.py Screen:1302 --max-planes=1 --max-images=1
Start: 2023-11-23 14:40:22.413577
Checking Screen:1302
max_planes: 1
max_images: 1
0/28 Check Image:2858452 LTValidMitosisSon384Plate07_06 [Well K6, Field 1]
1/28 Check Image:2858796 LTValidMitosisSon384Plate07_05 [Well A12, Field 1]
2/28 Check Image:2859140 LTValidMitosisSon384Plate07_03 [Well H24, Field 1]
...
26/28 Check Image:2867393 LTValidMitosisSon384Plate01_02 [Well F19, Field 1]
27/28 Check Image:2867735 LTValidMitosisSon384Plate01_01 [Well C17, Field 1]
End: 2023-11-23 14:42:14.479046

Checked idr0090 again (plates were hanging above) and all looks good now:

$ python check_pixels.py Screen:2851 --max-planes=1 --max-images=1
Start: 2023-11-23 14:46:14.518359
Checking Screen:2851
max_planes: 1
max_images: 1
0/22 Check Image:12539702 190129.screen [Well E2, Field 1]
1/22 Check Image:12541270 190206.screen [Well B5, Field 1]
2/22 Check Image:12542038 190211.screen [Well F11, Field 1]
3/22 Check Image:12543030 190213.screen [Well B10, Field 1]
4/22 Check Image:12543766 190220.screen [Well B8, Field 1]
5/22 Check Image:12544758 190227.screen [Well B2, Field 1]
6/22 Check Image:12545750 190306.screen [Well B5, Field 1]
7/22 Check Image:12546038 190313.screen [Well E2, Field 1]
8/22 Check Image:12546774 190322.screen [Well C2, Field 1]
9/22 Check Image:12547510 190327.screen [Well F2, Field 1]
10/22 Check Image:12548246 190502.screen [Well B4, Field 1]
11/22 Check Image:12549142 190510.screen [Well G8, Field 1]
12/22 Check Image:12550006 190528.screen [Well D7, Field 1]
13/22 Check Image:12550678 190531.screen [Well D7, Field 1]
14/22 Check Image:12551318 190607.screen [Well B10, Field 1]
15/22 Check Image:12552054 190614.screen [Well D2, Field 1]
16/22 Check Image:12552790 190621.screen [Well C7, Field 1]
17/22 Check Image:12553270 190628.screen [Well G7, Field 1]
18/22 Check Image:12553750 190705.screen [Well D7, Field 1]
19/22 Check Image:12554230 190710.screen [Well C2, Field 1]
20/22 Check Image:12554710 190809.screen [Well G10, Field 1]
21/22 Check Image:12554998 190904.screen [Well D3, Field 1]
End: 2023-11-23 14:49:09.959476

So, to summarise:

  • 2 Plates from idr0004 are failing (but are viewable in the webclient)
  • Known issues for various plates listed at #55 (comment) above, either broken or TODO: mkngff, except...
  • idr0016 (screen1251) Plate named 24634

will-moore (Member Author) commented Nov 23, 2023

Try testing more than 1 image from each Plate... 5 images per plate from idr0004

python check_pixels.py Screen:202 --max-planes=1 --max-images=5
...
228/230 Check Image:797734 P132 [Well E7, Field 1]
229/230 Check Image:797735 P132 [Well H6, Field 1]
Error: Mismatch for Image: 797735 at plane (z, c, t): (0, 0, 0)
End: 2023-11-23 16:08:27.789341

Wow, last Image failed!

Check all images from that Plate:

  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_RTypes_ice.py", line 352, in __init__
    self._val = _val
KeyboardInterrupt
!! 11/23/23 16:27:37.909 error: communicator not destroyed during global destruction.
(venv3) bash-4.2$ python check_pixels.py Plate:1966 --max-planes=1
Start: 2023-11-23 16:28:23.824980
Checking Plate:1966
max_planes: 1
max_images: 0
0/87 Check Image:797729 P132 [Well A1, Field 1]
1/87 Check Image:797730 P132 [Well A2, Field 1]
2/87 Check Image:797731 P132 [Well D5, Field 1]
3/87 Check Image:797732 P132 [Well D9, Field 1]
4/87 Check Image:797733 P132 [Well D4, Field 1]
5/87 Check Image:797734 P132 [Well E7, Field 1]
6/87 Check Image:797735 P132 [Well H6, Field 1]
...
84/87 Check Image:797814 P132 [Well F12, Field 1]
Error: Mismatch for Image: 797814 at plane (z, c, t): (0, 0, 0)
85/87 Check Image:797815 P132 [Well A7, Field 1]
86/87 Check Image:797816 P132 [Well H10, Field 1]
Error: Mismatch for Image: 797816 at plane (z, c, t): (0, 0, 0)
End: 2023-11-23 16:31:40.685793

Every Well after E8 is failing...
Looking at the Plate, E8 is a duplicate of E7 and every following Well is shifted by 1 place:

[Screenshot 2023-11-23 at 16:54:34]

will-moore (Member Author) commented Nov 23, 2023

Better check more Images from each Plate (50 images) in case there are others from idr0004 with this issue...

$ python check_pixels.py Screen:202 --max-planes=1 --max-images=50 > /tmp/check_pix_screen202_50.log
...
End: 2023-11-23 18:56:52.547520

Looks like all of these are from plate P132 above:

grep "Error: Mismatch" /tmp/check_pix_screen202_50.log
Error: Mismatch for Image: 797735 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797736 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797737 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797739 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797740 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797749 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797752 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797753 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797754 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797755 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797761 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797762 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797764 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797766 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797769 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797771 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797779 at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image: 797780 at plane (z, c, t): (0, 0, 0)

Also see errors like "Error: Image:693514 unpack requires a buffer of 688128 bytes"
These 29 images are all from 2 plates (NB: we only checked 50 images from each plate in this log)

$ grep -B 1 "unpack requires a buffer" /tmp/check_pix_screen202_50.log | grep Check
490/2287 Check Image:692975 P115 [Well C7, Field 1]
492/2287 Check Image:692977 P115 [Well E3, Field 1]
495/2287 Check Image:692980 P115 [Well D5, Field 1]
502/2287 Check Image:692987 P115 [Well D6, Field 1]
524/2287 Check Image:693009 P115 [Well F5, Field 1]
525/2287 Check Image:693010 P115 [Well C2, Field 1]
528/2287 Check Image:693013 P115 [Well B7, Field 1]
533/2287 Check Image:693018 P115 [Well C8, Field 1]
538/2287 Check Image:693023 P115 [Well F2, Field 1]
791/2287 Check Image:693473 P124 [Well D8, Field 1]
792/2287 Check Image:693474 P124 [Well D7, Field 1]
793/2287 Check Image:693475 P124 [Well E8, Field 1]
794/2287 Check Image:693476 P124 [Well B7, Field 1]
795/2287 Check Image:693477 P124 [Well C6, Field 1]
802/2287 Check Image:693484 P124 [Well F7, Field 1]
804/2287 Check Image:693486 P124 [Well E9, Field 1]
805/2287 Check Image:693487 P124 [Well D10, Field 1]
807/2287 Check Image:693489 P124 [Well A7, Field 1]
809/2287 Check Image:693491 P124 [Well B6, Field 1]
811/2287 Check Image:693493 P124 [Well F8, Field 1]
813/2287 Check Image:693495 P124 [Well A10, Field 1]
814/2287 Check Image:693496 P124 [Well D6, Field 1]
818/2287 Check Image:693500 P124 [Well C7, Field 1]
820/2287 Check Image:693502 P124 [Well E6, Field 1]
823/2287 Check Image:693505 P124 [Well F6, Field 1]
825/2287 Check Image:693507 P124 [Well F9, Field 1]
829/2287 Check Image:693511 P124 [Well E7, Field 1]
832/2287 Check Image:693514 P124 [Well C9, Field 1]
835/2287 Check Image:693518 P124 [Well B10, Field 1]

These images look corrupted in the same way as "Error: Image:692975 unpack requires a buffer of 614400 bytes" above: #55 (comment)

will-moore (Member Author) commented Nov 23, 2023

Try a bunch of images from each plate of idr0010...

 python check_pixels.py Screen:1351 --max-planes=1 --max-images=50 > /tmp/check_pix_screen1351_50.log
...
    serverExceptionClass = ome.conditions.ResourceError
    message = /data/OMERO/Pixels/Dir-003/Dir-086/3086614 (Read-only file system)

Ran to completion:

tail -n 5 /tmp/check_pix_screen1351_50.log
7396/7400 Check Image:3103891 96-14 [Scan I25]
7397/7400 Check Image:3103892 96-14 [Scan D24]
7398/7400 Check Image:3103893 96-14 [Scan H18]
7399/7400 Check Image:3103894 96-14 [Scan F19]
End: 2023-11-23 23:51:42.785356

All these Errors come from the same Plate (we checked 50 images from each Plate) which is broken in IDR:

Error: Image:3086613 exception ::omero::ResourceError
Error: Image:3086614 exception ::omero::ResourceError
bash-4.2$ grep "Error: Image" /tmp/check_pix_screen1351_50.log | wc
     50     200    2700

https://idr.openmicroscopy.org/webclient/?show=plate-5894

Summary: no unexpected errors for idr0010.

will-moore (Member Author) commented Nov 24, 2023

Let's run for 20 Images per Fileset for all the others... (5 images per Plate was enough to pick up idr0004 issues above)

for i in 1501 1551 1601 1602 1603 1202 1101 1302 1202 1251 1851 1751 2001 1952 2851; do  echo Screen:$i; python check_pixels.py Screen:$i --max-planes=1 --max-images=20 > /tmp/check_pix_20231124_screen$i.log; done
...

idr0011-screenA has errors for 20 images from each of the 4 known failing plates. None for idr0011-screenB or C/D/E.

$ grep "Error: Image" /tmp/check_pix_20231124_screen1501.log | wc
     80     320    4320
$ grep "Error:" /tmp/check_pix_20231124_screen1551.log | wc
      0       0       0
$ grep "Error:" /tmp/check_pix_20231124_screen160* | wc
      0       0       0

13:57:
idr0012 - No Errors:

$ grep "Error: Image" /tmp/check_pix_20231124_screen1202.log | wc
      0       0       0

idr0013 screenA - 16:00 - 2 of the 3 plates parsed so far...
EDIT: on completion, 3 plates found to contain errors (as expected from earlier checks):

$ grep "Error: Image" /tmp/check_pix_20231124_screen1101.log | wc
     60     240    3240

will-moore (Member Author) commented Nov 27, 2023

Run completed - found 1 plate from idr0036 where all Images failed (20 checked). All other plates and Screens contain no new Errors.

(base) [wmoore@test120-omeroreadwrite ~]$ ls -alh /tmp/check_pix_20231124_screen*
-rw-r--r--. 1 omero-server omero-server 964K Nov 24 19:20 /tmp/check_pix_20231124_screen1101.log
-rw-r--r--. 1 omero-server omero-server  72K Nov 24 20:37 /tmp/check_pix_20231124_screen1202.log
-rw-r--r--. 1 omero-server omero-server 973K Nov 24 21:44 /tmp/check_pix_20231124_screen1251.log
-rw-r--r--. 1 omero-server omero-server  44K Nov 24 19:46 /tmp/check_pix_20231124_screen1302.log
-rw-r--r--. 1 omero-server omero-server 643K Nov 24 08:59 /tmp/check_pix_20231124_screen1501.log
-rw-r--r--. 1 omero-server omero-server  51K Nov 24 09:41 /tmp/check_pix_20231124_screen1551.log
-rw-r--r--. 1 omero-server omero-server 6.3K Nov 24 09:45 /tmp/check_pix_20231124_screen1601.log
-rw-r--r--. 1 omero-server omero-server  12K Nov 24 09:51 /tmp/check_pix_20231124_screen1602.log
-rw-r--r--. 1 omero-server omero-server 1.5K Nov 24 09:52 /tmp/check_pix_20231124_screen1603.log
-rw-r--r--. 1 omero-server omero-server  15K Nov 24 21:59 /tmp/check_pix_20231124_screen1751.log
-rw-r--r--. 1 omero-server omero-server 3.9K Nov 24 21:47 /tmp/check_pix_20231124_screen1851.log
-rw-r--r--. 1 omero-server omero-server  23K Nov 24 22:54 /tmp/check_pix_20231124_screen1952.log
-rw-r--r--. 1 omero-server omero-server  65K Nov 24 22:40 /tmp/check_pix_20231124_screen2001.log
-rw-r--r--. 1 omero-server omero-server  27K Nov 24 23:36 /tmp/check_pix_20231124_screen2851.log

grep Error /tmp/check_pix_20231124_screen1952.log | wc
     20     260    1300

will-moore (Member Author) commented Nov 27, 2023

Failing Plate for idr0036 - rendered images don't match Thumbnails:
[Screenshot 2023-11-27 at 09:45:10]

EDIT:
This turns out to be showing images from the previous Plate because the pixels name and path were set incorrectly.

Ran this psql to update...

UPDATE pixels SET name = '.zattrs', path = 'demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr' where image in (select id from Image where fileset = 6314057);

Tried to view an Image, which should trigger memo file regeneration...

Still not viewable a day later after several attempts to view. Searching logs doesn't reveal much.
Logs are quite verbose, especially with check_pixels running...

(base) [wmoore@test120-omeroreadwrite ~]$ grep d11e620b9516 /opt/omero/server/OMERO.server/var/log/Blitz-0.log
2023-11-29 13:52:00,756 INFO  [      ome.services.OmeroFilePathResolver] (.Server-10) Metadata only file, resulting path: /data/OMERO/ManagedRepository/demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr/.zattrs
2023-11-29 13:52:00,784 INFO  [                loci.formats.ImageReader] (.Server-10) ZarrReader initializing /data/OMERO/ManagedRepository/demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr/.zattrs
2023-11-29 13:57:30,374 INFO  [        ome.services.util.ServiceHandler] (l.Server-9)  Rslt:	([demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr/, .zattrs, unknown], [demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr/, .zgroup, unknown], [demo_2/2016-06/15/02-41-48.277_mkngff/e47a40cc-2810-4717-8263-d11e620b9516.zarr/A/, .zgroup, unknown], ... 21519 more)

will-moore (Member Author):

As discussed in the IDR meeting today, we need to scale up check_pixels to be able to run in parallel on several servers etc.
Also run on ALL channels for each Image (this is a different requirement from the max-planes per image).

We want to check ALL images in each Fileset.
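
In the runs below, --max-planes=sizeC is used so that every channel of an Image gets checked without reading every plane. A minimal sketch of one way that option could be interpreted (an assumption for illustration; the real script may differ):

def planes_to_check(image, max_planes=None):
    """Return the list of (z, c, t) plane indices to compare."""
    if max_planes == "sizeC":
        # one plane per channel: the first z and t section of each channel
        return [(0, c, 0) for c in range(image.getSizeC())]
    zct_list = [
        (z, c, t)
        for z in range(image.getSizeZ())
        for c in range(image.getSizeC())
        for t in range(image.getSizeT())
    ]
    if max_planes:
        return zct_list[:int(max_planes)]
    return zct_list  # default: check ALL planes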

will-moore (Member Author) commented Nov 27, 2023

Start with idr0004... On idr-testing:omeroreadwrite...

python check_pixels.py Screen:202 --max-planes=sizeC > /tmp/check_pix_20231127_screen202.log
...
End: 2023-11-27 14:30:39.106256

EDIT: took 2 hours 15 mins.

$ grep "Mismatch" /tmp/check_pix_20231127_screen202.log | wc
     70     910    4480

These are the 35 Images (2 channels each) from plate P132 (after E8).

There are 38 images from P115 and P124 with the "unpack" issue:

$ grep -B 1 "unpack" /tmp/check_pix_20231127_screen202.log | grep Check
779/3679 Check Image:692975 P115 [Well C7, Field 1]
781/3679 Check Image:692977 P115 [Well E3, Field 1]
784/3679 Check Image:692980 P115 [Well D5, Field 1]
791/3679 Check Image:692987 P115 [Well D6, Field 1]
813/3679 Check Image:693009 P115 [Well F5, Field 1]
814/3679 Check Image:693010 P115 [Well C2, Field 1]
817/3679 Check Image:693013 P115 [Well B7, Field 1]
822/3679 Check Image:693018 P115 [Well C8, Field 1]
827/3679 Check Image:693023 P115 [Well F2, Field 1]
832/3679 Check Image:693028 P115 [Well B6, Field 1]
835/3679 Check Image:693031 P115 [Well E4, Field 1]
853/3679 Check Image:693049 P115 [Well B4, Field 1]
860/3679 Check Image:693057 P115 [Well F8, Field 1]
862/3679 Check Image:693059 P115 [Well D8, Field 1]
863/3679 Check Image:693060 P115 [Well E6, Field 1]
1243/3679 Check Image:693473 P124 [Well D8, Field 1]
1244/3679 Check Image:693474 P124 [Well D7, Field 1]
1245/3679 Check Image:693475 P124 [Well E8, Field 1]
1246/3679 Check Image:693476 P124 [Well B7, Field 1]
1247/3679 Check Image:693477 P124 [Well C6, Field 1]
1254/3679 Check Image:693484 P124 [Well F7, Field 1]
1256/3679 Check Image:693486 P124 [Well E9, Field 1]
1257/3679 Check Image:693487 P124 [Well D10, Field 1]
1259/3679 Check Image:693489 P124 [Well A7, Field 1]
1261/3679 Check Image:693491 P124 [Well B6, Field 1]
1263/3679 Check Image:693493 P124 [Well F8, Field 1]
1265/3679 Check Image:693495 P124 [Well A10, Field 1]
1266/3679 Check Image:693496 P124 [Well D6, Field 1]
1270/3679 Check Image:693500 P124 [Well C7, Field 1]
1272/3679 Check Image:693502 P124 [Well E6, Field 1]
1275/3679 Check Image:693505 P124 [Well F6, Field 1]
1277/3679 Check Image:693507 P124 [Well F9, Field 1]
1281/3679 Check Image:693511 P124 [Well E7, Field 1]
1284/3679 Check Image:693514 P124 [Well C9, Field 1]
1287/3679 Check Image:693518 P124 [Well B10, Field 1]
1292/3679 Check Image:693523 P124 [Well A8, Field 1]
1296/3679 Check Image:693527 P124 [Well A9, Field 1]
1302/3679 Check Image:693533 P124 [Well D9, Field 1]

70 + 38 accounts for all Errors for idr0004:

$ grep Error /tmp/check_pix_20231127_screen202.log | wc
    108    1252    6798

will-moore (Member Author) commented Nov 27, 2023

For most studies, we can run a whole Screen at a time on one of the proxy servers.
But for e.g. idr0013 and idr0016, which have many plates, we want to follow a similar procedure to https://github.com/IDR/deployment/blob/master/docs/operating-procedures.md#bio-formats-cache-regeneration to distribute parallel processing across various servers...

E.g. let's get all the Plates for idr0013 and idr0016...
As wmoore on idr-testing omeroreadwrite...

$ omero hql --limit -1 --ids-only --style csv 'select plate.id from Plate as plate where plate in (select child from ScreenPlateLink where parent=1101)' > idr0013_plates.txt
$ omero hql --limit -1 --ids-only --style csv 'select plate.id from Plate as plate where plate in (select child from ScreenPlateLink where parent=1251)' > idr0016_plates.txt

Edited both those files to remove the first line. Total of 923 Plates...

cut -d ',' -f2 idr0013_plates.txt | sed -e 's/^/Plate:/' > ids.txt
cut -d ',' -f2 idr0016_plates.txt | sed -e 's/^/Plate:/' >> ids.txt
$ cat ids.txt | wc
    923     923   10153

Copy this file from omeroreadwrite onto parent idr-testing server:

(base) [wmoore@test120-omeroreadwrite ~]$ rsync -rvP omeroreadwrite:/home/wmoore/ids.txt ./
receiving incremental file list

sent 20 bytes  received 44 bytes  128.00 bytes/sec
total size is 10,153  speedup is 158.64

Instead of omero render -s localhost -u public -w public, I need to log in with credentials, then run check_pixels...
Like this (without first activating the venv, so all paths include the venv):

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j10 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC > /tmp/check_pix_20231127.log'

EDIT 20:12:
seems to be doing something...

[wmoore@test120-omeroreadonly-2 ~]$ tail -f /tmp/check_pix_20231127.log
276/384 Check Image:1533068 LT0038_01 [Well M19, Field 1]
277/384 Check Image:1533069 LT0038_01 [Well C11, Field 1]
278/384 Check Image:1533070 LT0038_01 [Well K8, Field 1]
279/384 Check Image:1533071 LT0038_01 [Well E18, Field 1]
280/384 Check Image:1533072 LT0038_01 [Well C16, Field 1]
281/384 Check Image:1533073 LT0038_01 [Well H7, Field 1]
282/384 Check Image:1533074 LT0038_01 [Well H21, Field 1]
283/384 Check Image:1533075 LT0038_01 [Well D12, Field 1]
284/384 Check Image:1533076 LT0038_01 [Well G22, Field 1]
285/384 Check Image:1533077 LT0038_01 [Well J7, Field 1]

But file is empty on omeroreadonly-1 and omeroreadonly-4:

[wmoore@test120-omeroreadonly-1 ~]$ ls -alh /tmp/check_pix_20231127*
-rw-rw-r--. 1 wmoore wmoore    0 Nov 27 20:41 /tmp/check_pix_20231127.log

EDIT: 28th, 9:52 - we have a log on omeroreadonly-1 now, but it appears to be a binary file!?

[wmoore@test120-omeroreadonly-1 ~]$ ls -alh /tmp/check_pix_20231127*
-rw-rw-r--. 1 wmoore wmoore  17K Nov 28 09:43 /tmp/check_pix_20231127.log
[wmoore@test120-omeroreadonly-1 ~]$ less /tmp/check_pix_20231127.log
"/tmp/check_pix_20231127.log" may be a binary file.  See it anyway? 

Currently processing the 2nd Plate on -3...

[wmoore@test120-omeroreadonly-3 ~]$ tail -f /tmp/check_pix_20231127.log 
276/384 Check Image:1544183 LT0047_03 [Well A6, Field 1]
277/384 Check Image:1544184 LT0047_03 [Well E20, Field 1]
278/384 Check Image:1544185 LT0047_03 [Well G4, Field 1]
279/384 Check Image:1544186 LT0047_03 [Well K6, Field 1]
280/384 Check Image:1544187 LT0047_03 [Well E2, Field 1]
281/384 Check Image:1544188 LT0047_03 [Well P9, Field 1]
282/384 Check Image:1544189 LT0047_03 [Well E22, Field 1]
283/384 Check Image:1544190 LT0047_03 [Well I4, Field 1]
284/384 Check Image:1544191 LT0047_03 [Well N12, Field 1]
285/384 Check Image:1544192 LT0047_03 [Well F11, Field 1]

[wmoore@test120-omeroreadonly-3 ~]$ grep "A1," /tmp/check_pix_20231127.log 
0/380 Check Image:1548472 LT0060_51 [Well A1, Field 1]
[wmoore@test120-omeroreadonly-3 ~]$ grep "A10," /tmp/check_pix_20231127.log 
69/380 Check Image:1548541 LT0060_51 [Well A10, Field 1]
279/376 Check Image:1545706 LT0048_14 [Well A10, Field 1]

28th: 9:46
omeroreadwrite: Still on first plate... No errors...

(base) [wmoore@test120-omeroreadwrite ~]$ grep "A10," !$
grep "A10," /tmp/check_pix_20231127.log
278/380 Check Image:1649790 LT0142_01 [Well A10, Field 1]
(base) [wmoore@test120-omeroreadwrite ~]$ grep "A1," /tmp/check_pix_20231127.log
0/380 Check Image:1649512 LT0142_01 [Well A1, Field 1]

But then, the log disappeared (file is now empty).
And the log above from omeroreadonly-3 is also now empty:

[wmoore@test120-omeroreadonly-3 ~]$ ls -alh /tmp/check_pix_20231127.log 
-rw-rw-r--. 1 wmoore wmoore 0 Nov 28 10:05 /tmp/check_pix_20231127.log

Probably the issue is that the parallel command above is overwriting the log each time (> truncates the file for every job), rather than appending (>>).

On idr-testing server, I do screen -r and get:

...
!! 11/28/23 09:59:56.439 error: communicator not destroyed during global destrucETA: 75512s Left: 487 AVG: 155.06s  1:9/76 2:9/101 3:9/80 4:9/96 5:9/83Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 11/28/23 10:03:47.490 error: communicator not destroyed during global destrucETA: 75304s Left: 486 AVG: 154.95s  1:9/77 2:9/101 3:9/80 4:9/96 5:9/83Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 11/28/23 10:05:34.525 error: communicator not destroyed during global destrucETA: 75089s Left: 485 AVG: 154.82s  1:9/77 2:9/101 3:9/81 4:9/96 5:9/83Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 11/28/23 10:07:14.794 error: communicator not destroyed during global destrucETA: 74993s Left: 484 AVG: 154.95s  1:9/78 2:9/101 3:9/81 4:9/96 5:9/83
Computer:jobs running/jobs completed/%of started jobs
ETA: 75032s Left: 484 AVG: 155.03s  1:9/78/17%/872.8s  2:9/101/22%/674.1s  3:9/81/18%/840.5s  4:9/96/21%/709.2s  5:9/83/19%/820.3s
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 74799s Left: 483 AVG: 154.87s  omeroreadonly-1:9/79/18%/862.8s  omeroreadonly-2:9/101/22%/674.9s  omeroreadonly-3:9/81/18%/841.5s  omeroreadonly-4:9/96/21%/710.1s  omeroreadwrite:9/83/18%/821.3s

Let's kill that screen and re-run (see below...)

will-moore (Member Author) commented Nov 27, 2023

Running idr0010 on idr-testing: omeroreadonly-1, as wmoore
13:39:

$ screen -S check_pixels
$ python check_pixels.py Screen:1351 --max-planes=sizeC > /tmp/check_pix_20231127_screen1351.log

EDIT: 20:27...

[wmoore@test120-omeroreadonly-1 ~]$ tail -f /tmp/check_pix_20231127_screen1351.log
4118/56832 Check Image:1945115 150-15 [Scan D05]
4119/56832 Check Image:1945116 150-15 [Scan J23]
4120/56832 Check Image:1945117 150-15 [Scan D03]
4121/56832 Check Image:1945118 150-15 [Scan C15]
4122/56832 Check Image:1945119 150-15 [Scan G16]
4123/56832 Check Image:1945120 150-15 [Scan H15]

28th 9:50...

[wmoore@test120-omeroreadonly-1 ~]$ tail -f /tmp/check_pix_20231127_screen1351.log
...
7347/56832 Check Image:3054687 110-35 [Scan L21]
7348/56832 Check Image:3054688 110-35 [Scan D08]
7349/56832 Check Image:3054689 110-35 [Scan D11]

29th 9:25...

$ tail -f /tmp/check_pix_20231127_screen1351.log
...
11183/56832 Check Image:3058523 12-23 [Scan K02]
11184/56832 Check Image:3058524 12-23 [Scan F01]
11185/56832 Check Image:3058525 12-23 [Scan J12]

30th 12:14... About a quarter of the way through idr0010 after 3 days (~ETA 9th December)

[wmoore@test120-omeroreadonly-1 ~]$ tail -f /tmp/check_pix_20231127_screen1351.log
...
14823/56832 Check Image:3062163 128-35 [Scan F19]
14824/56832 Check Image:3062164 128-35 [Scan C24]
14825/56832 Check Image:3062165 128-35 [Scan G29]

4th Dec. Still no Errors

[wmoore@test120-omeroreadonly-1 ~]$ tail /tmp/check_pix_20231127_screen1351.log
29048/56832 Check Image:3076444 24-27 [Scan H02]
29049/56832 Check Image:3076445 24-27 [Scan B07]
29050/56832 Check Image:3076446 24-27 [Scan J13]

6th Dec - Done!

[wmoore@test120-omeroreadonly-1 ~]$ tail /tmp/check_pix_20231127_screen1351.log
56830/56832 Check Image:3104226 96-14 [Scan J20]
56831/56832 Check Image:3104227 96-14 [Scan H13]
End: 2023-12-06 15:15:13.933620

The only Error is from Plate 5-12 - a known failure in IDR: https://idr.openmicroscopy.org/webclient/?show=plate-5894

[wmoore@test120-omeroreadonly-1 ~]$ grep -B 1 "Error: Image" /tmp/check_pix_20231127_screen1351.log | grep Check | grep "5-12" | wc
    384    2304   18432

will-moore (Member Author) commented Nov 30, 2023

Check idr0033 on omeroreadwrite...
30th 12:11...

$ python check_pixels.py Screen:1751 --max-planes=sizeC > /tmp/check_pix_20231130_screen1751.log

1st Dec: 20.5 hours in -> predicted total time of 30 days! (ETA ~30th December)

(base) [wmoore@test120-omeroreadwrite ~]$ tail -f /tmp/check_pix_20231130_screen1751.log
...
1176/41472 Check Image:3192407 41744 [Well L15, Field 6]
1177/41472 Check Image:3192408 41744 [Well L15, Field 7]
1178/41472 Check Image:3192409 41744 [Well L15, Field 8]

4th Dec

(base) [wmoore@test120-omeroreadwrite ~]$ tail /tmp/check_pix_20231130_screen1751.log
...
5673/41472 Check Image:3203718 41749 [Well K18, Field 3]
5674/41472 Check Image:3203719 41749 [Well K18, Field 4]
5675/41472 Check Image:3203720 41749 [Well K18, Field 5]

6th Dec

(base) [wmoore@test120-omeroreadwrite ~]$ tail /tmp/check_pix_20231130_screen1751.log
15592/41472 Check Image:3225303 41756 [Well I22, Field 4]
15593/41472 Check Image:3225304 41756 [Well I22, Field 5]
15594/41472 Check Image:3225305 41756 [Well I22, Field 6]

Looks like the process got faster after the parallel run of idr0013 and idr0016 completed. Done now, with no Errors:

(base) [wmoore@test120-omeroreadwrite ~]$ tail /tmp/check_pix_20231130_screen1751.log
41469/41472 Check Image:3260420 41757_illum_corrected [Well A1, Field 7]
41470/41472 Check Image:3260421 41757_illum_corrected [Well A1, Field 8]
41471/41472 Check Image:3260422 41757_illum_corrected [Well A1, Field 9]
End: 2023-12-09 07:19:39.548530
(base) [wmoore@test120-omeroreadwrite ~]$ grep Error /tmp/check_pix_20231130_screen1751.log

will-moore (Member Author) commented Dec 6, 2023

On omeroreadonly-1, check idr0035...

(venv3) [wmoore@test120-omeroreadonly-1 scripts]$ python check_pixels.py Screen:2001 --max-planes=sizeC > /tmp/check_pix_20231206_screen2001.log

11th Dec - all done, no Errors

[wmoore@test120-omeroreadonly-1 ~]$ tail /tmp/check_pix_20231206_screen2001.log
13191/13200 Check Image:3427292 Week6_31681 [Well G9, Field 3]
13192/13200 Check Image:3427293 Week6_31681 [Well G9, Field 4]
13193/13200 Check Image:3427294 Week6_31681 [Well G5, Field 1]
13194/13200 Check Image:3427295 Week6_31681 [Well G5, Field 2]
13195/13200 Check Image:3427296 Week6_31681 [Well G5, Field 3]
13196/13200 Check Image:3427297 Week6_31681 [Well G5, Field 4]
13197/13200 Check Image:3427298 Week6_31681 [Well B2, Field 2]
13198/13200 Check Image:3427299 Week6_31681 [Well B2, Field 3]
13199/13200 Check Image:3427300 Week6_31681 [Well B2, Field 4]
End: 2023-12-07 12:56:24.361407

[wmoore@test120-omeroreadonly-1 ~]$ grep  Error /tmp/check_pix_20231206_screen2001.log

will-moore (Member Author) commented Dec 12, 2023

Remaining studies...

  • idr0090 Screen
  • idr0091 Projects
  • idr0015 Screen (re-run due to connection issues above)

Plates for idr0015 and idr0090, on idr-testing omeroreadwrite

omero hql --limit -1 --ids-only --style csv 'select plate.id from Plate as plate where plate in (select child from ScreenPlateLink where parent=1201)' > idr0015_plates.txt
omero hql --limit -1 --ids-only --style csv 'select plate.id from Plate as plate where plate in (select child from ScreenPlateLink where parent=2851)' > idr0090_plates.txt

Edited both files to remove the first row. Then... idr0090 first...

cut -d ',' -f2 idr0090_plates.txt | sed -e 's/^/Plate:/' > ids.txt
cut -d ',' -f2 idr0015_plates.txt | sed -e 's/^/Plate:/' >> ids.txt
(venv3) (base) [wmoore@test120-omeroreadwrite ~]$ cat ids.txt | wc
    105     105    1155

Copy the ids.txt onto the parent idr-testing server: rsync -rvP omeroreadwrite:/home/wmoore/ids.txt ./

Run... 10:31...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j10 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC >> /tmp/check_pix_20231212.log'

20th Dec check logs...

Errors...

[wmoore@test120-proxy ~]$ for n in $(cat nodes); do ssh $n "grep 'Error: Image' /tmp/check_pix_20231212.log | wc"; done
    397    3178   56923
    396    3169   56830
    792    6338  113664
      1       9     147
    631    5284   91492

omeroreadwrite: only 2 Images NOT from the known broken plate: http://localhost:1080/webclient/?show=plate-4653

(base) [wmoore@test120-omeroreadwrite ~]$ grep -B 1 "ResourceError" /tmp/check_pix_20231212.log | grep Check | grep -v "TARA_HCS1_H5_G100003406_G100004906--2013_08_24_19_23_14_chamber--U01--V01"
10/396 Check Image:2013055 TARA_HCS1_H5_G100003741_G100003739--2013_09_30_14_59_10_chamber--U01--V01 [Well F2, Field 1]
0/396 Check Image:1972041 TARA_HCS1_H5_G100010241_G100010731--2013_09_29_19_14_59_chamber--U00--V01 [Well A1, Field 1]

TODO: This image http://localhost:1080/webclient/?show=image-2013055 looks OK in webclient (idr-testing) - Need to run check_pixels on just this image. Fails on idr-testing but OK on idr-next...

(base) [wmoore@test120-omeroreadwrite ~]$ grep -A 2 "2013055" /tmp/check_pix_20231212.log
10/396 Check Image:2013055 TARA_HCS1_H5_G100003741_G100003739--2013_09_30_14_59_10_chamber--U01--V01 [Well F2, Field 1]
Error: Image:2013055 TARA_HCS1_H5_G100003741_G100003739--2013_09_30_14_59_10_chamber--U01--V01 [Well F2, Field 1] exception ::omero::ResourceError
{
    serverStackTrace = ome.conditions.ResourceError: /bia-integrator-data/S-BIAD861/7e6e3cd9-5a9a-4e13-b302-d75d7c55e22b/7e6e3cd9-5a9a-4e13-b302-d75d7c55e22b.zarr/F/2/0/0/0/2/0/0/0: Input/output error
(venv3) [wmoore@prod120-omeroreadwrite scripts]$ python check_pixels.py --max-planes=sizeC Image:2013055
Start: 2024-01-04 08:46:17.240774
Checking Image:2013055
max_planes: sizeC
max_images: 0
0/1 Check Image:2013055 TARA_HCS1_H5_G100003741_G100003739--2013_09_30_14_59_10_chamber--U01--V01 [Well F2, Field 1]

The other failure is OK on idr-next. This was a temporary failure on idr-testing; it worked on idr0125-pilot.
http://localhost:1080/webclient/?show=image-1972041

(base) [wmoore@test120-omeroreadwrite ~]$ grep -A 2 "1972041" /tmp/check_pix_20231212.log
0/396 Check Image:1972041 TARA_HCS1_H5_G100010241_G100010731--2013_09_29_19_14_59_chamber--U00--V01 [Well A1, Field 1]
Error: Image:1972041 TARA_HCS1_H5_G100010241_G100010731--2013_09_29_19_14_59_chamber--U00--V01 [Well A1, Field 1] exception ::omero::ResourceError
{
    serverStackTrace = ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/10/13-37-45.953_mkngff/0cc5dbe3-444a-4ea2-a335-b51cf89c1c53.zarr/OME/METADATA.ome.xml

omeroreadonly-1: lots of ConnectionLostErrors, and a bunch of Mismatch errors...

[wmoore@test120-omeroreadonly-1 ~]$ grep Error /tmp/check_pix_20231212.log | grep -v ConnectionLost
Error: Image:1977981 TARA_HCS1_H5_G100011162_G100012868--2013_10_03_13_08_50_chamber--U00--V01 [Well A1, Field 1] exception ::omero::InternalException
Error: Image:12544202 190220.screen [Well C7, Field 21] exception ::omero::ResourceError
    serverStackTrace = ome.conditions.ResourceError: /bia-integrator-data/S-BIAD882/e62717ea-b060-48e5-8cea-7e4b82f009f4/e62717ea-b060-48e5-8cea-7e4b82f009f4.zarr/C/7/20/0/0/0/0/0/0: Input/output error
    serverExceptionClass = ome.conditions.ResourceError
Error: Mismatch for Image:1977645 TARA_HCS1_H5_G100011084_G100010958--2013_10_04_09_31_04_chamber--U01--V01 [Well L2, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1978821 TARA_HCS1_H5_G100011162_G100012868--2013_10_03_13_08_50_chamber--U01--V01 [Well P10, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1968961 TARA_HCS1_H5_G100009725_G100010454--2013_12_05_21_08_22_chamber--U00--V01 [Well R14, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1968982 TARA_HCS1_H5_G100009725_G100010454--2013_12_05_21_08_22_chamber--U00--V01 [Well G18, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:12546509 190313.screen [Well B7, Field 24] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:12549582 190510.screen [Well G9, Field 25] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1977666 TARA_HCS1_H5_G100011084_G100010958--2013_10_04_09_31_04_chamber--U01--V01 [Well A15, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1978842 TARA_HCS1_H5_G100011162_G100012868--2013_10_03_13_08_50_chamber--U01--V01 [Well O11, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1968271 TARA_HCS1_H5_G100008302_G100008304--2013_12_02_21_30_23_chamber--U01--V01 [Well Q13, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:12544229 190220.screen [Well A6, Field 16] at plane (z, c, t): (0, 2, 0)

Image:1977981 (idr0015) looks OK in webclient - no error.
Image:12544202 (idr0090) looks OK in webclient - no error, and matches IDR.
Comparing the first Mismatch, Image:1977645, in webclient: the first plane appears the same on idr-testing and IDR.
But the next one, Image:1978821: channel 3 looks blank in IDR, yet has a few spots on idr-testing!
1968961 looks the same in both, as does 1968982.
12546509 and 12549582 (idr0090) look the same.
1977666 looks the same in both.
1978842 ch0 looks blank in IDR; its histogram differs from idr-testing.
1968271 and 12544229 look the same in both.

Check these again, one Image at a time... No errors found!

for i in 1977981 12544202 1977645 1978821 1968961 1968982 12546509 12549582 1977666 1978842 1968271 12544229; do
  python check_pixels.py Image:$i --max-planes=sizeC >> /tmp/check_pix_20231220.log
done

grep Error /tmp/check_pix_20231220.log

omeroreadonly-2

[wmoore@test120-omeroreadonly-2 ~]$  grep Error /tmp/check_pix_20231212.log | grep -v ConnectionLost
Error: Image:1962455 TARA_HCS1_H5_G100007665_G100007576--2013_10_28_21_05_26_chamber--U00--V01 [Well A1, Field 1] exception ::omero::ResourceError
    serverStackTrace = ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr/OME/METADATA.ome.xml
    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/06/00-58-20.828_mkngff/1a29207c-d50b-48b7-a7c0-54c6252bfd9c.zarr/OME/METADATA.ome.xml
Error: Mismatch for Image:1967540 TARA_HCS1_H5_G100008302_G100008304--2013_12_02_21_30_23_chamber--U00--V01 [Well N16, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1970107 TARA_HCS1_H5_G100010173_G100010177--2013_08_22_17_12_07_chamber--U00--V01 [Well M3, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1969709 TARA_HCS1_H5_G100010173_G100010177--2013_08_22_17_12_07_chamber--U01--V01 [Well I4, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1970498 TARA_HCS1_H5_G100010173_G100010177--2014_06_24_12_25_12_chamber--U00--V01 [Well Q13, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1970891 TARA_HCS1_H5_G100010173_G100010177--2014_06_24_12_25_12_chamber--U01--V01 [Well O8, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1970914 TARA_HCS1_H5_G100010173_G100010177--2014_06_24_12_25_12_chamber--U01--V01 [Well C8, Field 1] at plane (z, c, t): (0, 4, 0)Start: 2023-12-12 22:58:17.684574
Error: Mismatch for Image:1979208 TARA_HCS1_H5_G100012537_G100012477--2013_10_01_12_30_33_chamber--U01--V01 [Well F17, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1979231 TARA_HCS1_H5_G100012537_G100012477--2013_10_01_12_30_33_chamber--U01--V01 [Well L17, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1973302 TARA_HCS1_H5_G100010607_G100010623--2014_06_23_16_05_11_chamber--U00--V01 [Well Q1, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1973325 TARA_HCS1_H5_G100010607_G100010623--2014_06_23_16_05_11_chamber--U00--V01 [Well Q13, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1975278 TARA_HCS1_H5_G100010824_G100010826--2013_12_03_22_14_42_chamber--U00--V01 [Well K3, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1975301 TARA_HCS1_H5_G100010824_G100010826--2013_12_03_22_14_42_chamber--U00--V01 [Well N6, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1967563 TARA_HCS1_H5_G100008302_G100008304--2013_12_02_21_30_23_chamber--U00--V01 [Well R8, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1974882 TARA_HCS1_H5_G100010623_G100010731--2014_06_20_18_48_17_chamber--U00--V01 [Well D10, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1969732 TARA_HCS1_H5_G100010173_G100010177--2013_08_22_17_12_07_chamber--U01--V01 [Well V6, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1970130 TARA_HCS1_H5_G100010173_G100010177--2013_08_22_17_12_07_chamber--U00--V01 [Well K16, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1970521 TARA_HCS1_H5_G100010173_G100010177--2014_06_24_12_25_12_chamber--U00--V01 [Well T9, Field 1] at plane (z, c, t): (0, 1, 0)

Too many to check manually! Just re-run check_pixels...

for i in 1962455 1967540 1970107 1969709 1970498 1970891 1970914 1979208 1979231 1973302 1973325 1975278 1975301 1967563 1974882 1969732 1970130 1970521; do
  python check_pixels.py Image:$i --max-planes=sizeC >> /tmp/check_pix_20231220_2.log
done

omeroreadonly-3 - similar pattern to the above...

(venv3) [wmoore@test120-omeroreadonly-3 scripts]$ grep Error /tmp/check_pix_20231212.log | grep -v ConnectionLost
Error: Image:1964831 TARA_HCS1_H5_G100008060_G100008062--2013_11_03_22_51_32_chamber--U00--V01 [Well A1, Field 1] exception ::omero::ResourceError
    serverStackTrace = ome.conditions.ResourceError: Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr/OME/METADATA.ome.xml
    serverExceptionClass = ome.conditions.ResourceError
    message = Error instantiating pixel buffer: /data/OMERO/ManagedRepository/demo_2/2016-06/06/21-26-25.533_mkngff/d69df538-4684-4b32-8ded-d2f2af43af9f.zarr/OME/METADATA.ome.xml
Error: Image:1998429 TARA_HCS1_H5_G100003584_G100003586--2014_06_26_15_58_43_chamber--U01--V01 [Well A1, Field 1] exception ::omero::InternalException
Error: Mismatch for Image:1976463 TARA_HCS1_H5_G100010824_G100010826--2013_12_03_22_14_42_chamber--U01--V01 [Well E6, Field 1] at plane (z, c, t): (0, 3, 0)Start: 2023-12-12 22:08:19.521952
Error: Mismatch for Image:1976856 TARA_HCS1_H5_G100010891_G100010893--2013_12_04_21_23_54_chamber--U01--V01 [Well C11, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1979994 TARA_HCS1_H5_G100012694_G100012776--2013_10_02_14_27_33_chamber--U01--V01 [Well N14, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1983844 TARA_HCS1_H5_G100002655_G100002656--2013_09_24_15_21_06_chamber--U00--V01 [Well Q4, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1991151 TARA_HCS1_H5_G100003584_G100003586--2014_06_26_15_58_43_chamber--U00--V01 [Well D12, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1991172 TARA_HCS1_H5_G100003584_G100003586--2014_06_26_15_58_43_chamber--U00--V01 [Well A18, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1969375 TARA_HCS1_H5_G100009725_G100010454--2013_12_05_21_08_22_chamber--U01--V01 [Well R9, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:2005752 TARA_HCS1_H5_G100003741_G100003739--2013_09_30_14_59_10_chamber--U00--V01 [Well D18, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1967910 TARA_HCS1_H5_G100008608_G100008610--2013_11_05_20_28_35_chamber--U00--V01 [Well L3, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1975672 TARA_HCS1_H5_G100010623_G100010731--2014_06_20_18_48_17_chamber--U01--V01 [Well B2, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1975693 TARA_HCS1_H5_G100010623_G100010731--2014_06_20_18_48_17_chamber--U01--V01 [Well J16, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1976877 TARA_HCS1_H5_G100010891_G100010893--2013_12_04_21_23_54_chamber--U01--V01 [Well V6, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1976944 TARA_HCS1_H5_G100010891_G100010893--2013_12_04_21_23_54_chamber--U01--V01 [Well I2, Field 1] at plane (z, c, t): (0, 2, 0)

Checked all these failures with check_pixels.py as above - No errors!
Lots of false positives!

omeroreadonly-4

[wmoore@test120-omeroreadonly-4 ~]$ grep Error /tmp/check_pix_20231212.log | grep -v ConnectionLost
Error: Image:1973695 TARA_HCS1_H5_G100010607_G100010623--2014_06_23_16_05_11_chamber--U01--V01 [Well F7, Field 1] exception ::omero::ResourceError
    serverStackTrace = ome.conditions.ResourceError: /bia-integrator-data/S-BIAD861/00fc2a08-e352-4720-beac-13fd06cda6b2/00fc2a08-e352-4720-beac-13fd06cda6b2.zarr/F/7/0/0/0/2/0/0/0: Input/output error
    serverExceptionClass = ome.conditions.ResourceError
Error: Mismatch for Image:1979602 TARA_HCS1_H5_G100012694_G100012776--2013_10_02_14_27_33_chamber--U00--V01 [Well I13, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1979625 TARA_HCS1_H5_G100012694_G100012776--2013_10_02_14_27_33_chamber--U00--V01 [Well O17, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:12552500 190614.screen [Well A6, Field 31] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:12552530 190614.screen [Well E7, Field 29] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:12551154 190531.screen [Well B4, Field 29] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:12542480 190211.screen [Well G7, Field 27] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:12542509 190211.screen [Well C11, Field 24] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1972509 TARA_HCS1_H5_G100010237_G100010241--2014_06_25_13_42_23_chamber--U01--V01 [Well H5, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1972532 TARA_HCS1_H5_G100010237_G100010241--2014_06_25_13_42_23_chamber--U01--V01 [Well M5, Field 1] at plane (z, c, t): (0, 3, 0)
Error: Mismatch for Image:1972904 TARA_HCS1_H5_G100010241_G100010731--2013_09_29_19_14_59_chamber--U01--V01 [Well I16, Field 1] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1972926 TARA_HCS1_H5_G100010241_G100010731--2013_09_29_19_14_59_chamber--U01--V01 [Well R12, Field 1] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1974486 TARA_HCS1_H5_G100010623_G100010731--2013_08_23_17_27_23_chamber--U01--V01 [Well T11, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1974508 TARA_HCS1_H5_G100010623_G100010731--2013_08_23_17_27_23_chamber--U01--V01 [Well I10, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1974089 TARA_HCS1_H5_G100010623_G100010731--2013_08_23_17_27_23_chamber--U00--V01 [Well N6, Field 1] at plane (z, c, t): (0, 4, 0)
Error: Mismatch for Image:1974111 TARA_HCS1_H5_G100010623_G100010731--2013_08_23_17_27_23_chamber--U00--V01 [Well D12, Field 1] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1973718 TARA_HCS1_H5_G100010607_G100010623--2014_06_23_16_05_11_chamber--U01--V01 [Well K10, Field 1] at plane (z, c, t): (0, 4, 0)

All OK with:

for i in 1973718 1974111 1974089 1974508 1974486 1972926 1972904 1972532 1972509 12542480 12551154 12552530 12552500 1979625 1979602 1973695; do python check_pixels.py Image:$i --max-planes=sizeC; done

@will-moore

will-moore commented Dec 12, 2023

Checking idr0091 on idr-testing:omeroreadwrite...

python check_pixels.py Project:1351 --max-planes=sizeC > /tmp/check_pix_20231212_project1351.log

13th December

(base) [wmoore@test120-omeroreadwrite ~]$ tail /tmp/check_pix_20231212_project1351.log
1807/1816 Check Image:10649216 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL17.tif
1808/1816 Check Image:10649217 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL18.tif
1809/1816 Check Image:10649218 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL19.tif
1810/1816 Check Image:10649219 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL20.tif
1811/1816 Check Image:10649220 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL22.tif
1812/1816 Check Image:10649221 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL23.tif
1813/1816 Check Image:10649222 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL24.tif
1814/1816 Check Image:10649223 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL25.tif
1815/1816 Check Image:10649224 20180319_glu_lac_ramp40min_1_MMStack_Pos3_preproc_GL26.tif
End: 2023-12-12 12:19:15.160420

Noticed pixeltype bug - see ome/omero-cli-zarr#157 - re-exported data etc at IDR/idr-metadata#650 (comment)

EDIT 3rd Jan...
Errors in logs (re-ran as above, since the temp logs have gone)...

All of the .pattern files have the same Error...

(base) [wmoore@test120-omeroreadwrite ~]$ grep Error /tmp/check_pix_20240103_project1351.log | wc
    342    3420   31926
(base) [wmoore@test120-omeroreadwrite ~]$ grep Check /tmp/check_pix_20240103_project1351.log | grep pattern | wc
    342    1368   20719
(base) [wmoore@test120-omeroreadwrite ~]$ grep Error /tmp/check_pix_20240103_project1351.log
...
Error: Image:10648765 20161207_Pos1_GL17.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648766 20161207_Pos1_GL18.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648767 20161207_Pos1_GL19.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648768 20161207_Pos1_GL21.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648769 20161207_Pos1_GL23.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648770 20161207_Pos1_GL25.pattern unpack requires a buffer of 169276 bytes
Error: Image:10648771 20161207_Pos1_GL26.pattern unpack requires a buffer of 169276 bytes
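
That "unpack requires a buffer" message is presumably struct.error, raised while the gateway converts a raw plane whose byte count no longer matches the declared pixel type - consistent with the pixeltype bug noted above. A minimal reproduction (illustrative numbers; note 169276 is exactly twice 84638, i.e. 16-bit expected, 8-bit worth of bytes received):

import struct

# Expecting a 16-bit plane of 84638 pixels (169276 bytes), but the raw
# buffer only holds 84638 bytes - e.g. 8-bit data mislabelled as 16-bit:
struct.unpack(">%dH" % 84638, b"\x00" * 84638)
# struct.error: unpack requires a buffer of 169276 bytes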

@will-moore

will-moore commented Jan 6, 2024

omero hql --limit -1 --ids-only --style csv 'select MIN(field.image.id) FROM WellSample AS field GROUP BY field.well.plate' > plates.txt

for r in $(cat plates.txt); do
  iid=$(echo $r | cut -d',' -f2)
  echo "Image:$iid" >> objects.txt
done

Same for datasets.txt >> objects.txt

Copied onto the parent host node, then...

screen -dmS cache parallel --eta --sshloginfile nodes -a objects.txt  -j10 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --max-images=2 >> /tmp/check_pix_20240106.log'

@will-moore

will-moore commented Jan 7, 2024

Checking check_pixels Errors - ignoring ApiUsageException (raised by getPlane() on Big Images).
On idr-next:omeroreadwrite:

grep Error /tmp/check_pix_20240106.log | grep -v ApiUsageException | less

Need to focus on Error: Mismatch...

$ grep Mismatch /tmp/check_pix_20240106.log

Error: Mismatch for Image:1343827 0114-14--2006-05-22 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1343827 0114-14--2006-05-22 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1343827 0114-14--2006-05-22 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 2, 0)

No Mismatch errors on omeroreadonly-1
On omeroreadonly-2:

[wmoore@prod120-omeroreadonly-2 ~]$ grep Mismatch /tmp/check_pix_20240106.log
Error: Mismatch for Image:1481751 0307-10--2007-05-30 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1481751 0307-10--2007-05-30 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1481751 0307-10--2007-05-30 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 2, 0)

These are from idr0009, on Plates where the first Well is blank. More of the same below on omeroreadonly-4; no other mismatches.

[wmoore@prod120-omeroreadonly-4 ~]$ grep Mismatch /tmp/check_pix_20240106.log
Error: Mismatch for Image:1442953 0087-22--2006-02-24 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1442953 0087-22--2006-02-24 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1442953 0087-22--2006-02-24 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 2, 0)
Error: Mismatch for Image:1456009 0095-41--2006-03-05 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 0, 0)
Error: Mismatch for Image:1456009 0095-41--2006-03-05 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 1, 0)
Error: Mismatch for Image:1456009 0095-41--2006-03-05 [Well 1, Field 1 (Spot 1)] at plane (z, c, t): (0, 2, 0)
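
For reference, these "Mismatch ... at plane (z, c, t)" lines come from a per-plane comparison between the two servers. A minimal sketch of that loop (assuming BlitzGateway's getPlanes(); names are illustrative, not the script's actual code):

import omero
from omero.gateway import BlitzGateway

def compare_image(local_conn, idr_conn, image_id, max_planes=None):
    # Fetch the same Image ID from both connections
    img_local = local_conn.getObject("Image", image_id)
    img_idr = idr_conn.getObject("Image", image_id)
    # Plane coordinates in (z, c, t) order, optionally capped by --max-planes
    zct = [(z, c, t)
           for t in range(img_local.getSizeT())
           for c in range(img_local.getSizeC())
           for z in range(img_local.getSizeZ())]
    if max_planes:
        zct = zct[:max_planes]
    planes_local = img_local.getPrimaryPixels().getPlanes(zct)
    planes_idr = img_idr.getPrimaryPixels().getPlanes(zct)
    try:
        for (z, c, t), p1, p2 in zip(zct, planes_local, planes_idr):
            if not (p1 == p2).all():  # planes come back as numpy arrays
                print("Error: Mismatch for Image:%s at plane (z, c, t): %s"
                      % (image_id, (z, c, t)))
    except omero.ApiUsageException:
        pass  # getPlane() refuses Big Images - ignored, as in the greps above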

@will-moore

The last 2 commits are tested at IDR/idr-metadata#685 (comment)

@will-moore

We want to use check_pixels.py to test the server under load...
Update idr-next with this branch... on omeroreadwrite...

On idr-next proxy...

[wmoore@prod120-proxy ~]$ cat nodes 
omeroreadonly-1
omeroreadonly-2
omeroreadonly-3
omeroreadonly-4
omeroreadwrite

[wmoore@prod120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -j50 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py Plate:4501 --host localhost > /tmp/check_pix_20240129.log'

@will-moore

will-moore commented Jan 29, 2024

Log files are not being created as expected:

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log"; done
cat: /tmp/check_pix_20240129.log: No such file or directory
cat: /tmp/check_pix_20240129.log: No such file or directory
cat: /tmp/check_pix_20240129.log: No such file or directory
cat: /tmp/check_pix_20240129.log: No such file or directory
cat: /tmp/check_pix_20240129.log: No such file or directory

We have 2 screens named "cache"...

[wmoore@prod120-proxy ~]$ screen -r
There are several suitable screens on:
	5482.cache	(Detached)
	29489.cache	(Detached)
Type "screen [-d] -r [pid.]tty.host" to resume one of them.
[wmoore@prod120-proxy ~]$ screen -r 5482.cache

parallel: Warning: Input is read from the terminal.
parallel: Warning: Only experts do this on purpose. Press CTRL-D to exit.

Used Ctrl-D, then I see the 2nd screen...

...
WARNING:omero.gateway:UnknownLocalException on <class 'omero.gateway.OmeroGatewaySafeCallWrapper'> to <a511de6b-24ec-4b35-a547-a0c7251e7e4comero.api.RawPixelsStore> getPlane((0, 0, 0), {})
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 1199, in getPlane
    return _M_omero.api.RawPixelsStore._op_getPlane.invoke(self, ((z, c, t), _ctx))
Ice.UnknownLocalException: exception ::Ice::UnknownLocalException
{   
    unknown = ConnectionI.cpp:1573: Ice::MemoryLimitException:
protocol error: memory limit exceeded:
requested 2047868958 bytes, maximum allowed is 256000000 bytes (see Ice.MessageSizeMax)
}
ERROR:omero.gateway:Failed to getPlane() or getTile() from rawPixelsStore
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 7529, in getTiles
    rawPlane = rawPixelsStore.getPlane(z, c, t)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4859, in __call__
    return self.handle_exception(e, *args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_RawPixelsStore_ice.py", line 1199, in getPlane
    return _M_omero.api.RawPixelsStore._op_getPlane.invoke(self, ((z, c, t), _ctx))
Ice.UnknownLocalException: exception ::Ice::UnknownLocalException
{   
    unknown = ConnectionI.cpp:1573: Ice::MemoryLimitException:
protocol error: memory limit exceeded:
requested 2047868958 bytes, maximum allowed is 256000000 bytes (see Ice.MessageSizeMax)
}
!! 01/07/24 16:53:25.602 error: communicator not destroyed during global destrucETA: 24s Left: 8 AVG: 3.04s  1:0/4413 2:4/4368 3:1/4371 4:1/4582 5:2/3950Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:53:26.208 error: communicator not destroyed during global destrucETA: 21s Left: 7 AVG: 3.04s  1:0/4413 2:4/4368 3:0/4372 4:1/4582 5:2/3950Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:54:01.080 error: communicator not destroyed during global destrucETA: 18s Left: 6 AVG: 3.04s  1:0/4413 2:4/4368 3:0/4372 4:1/4582 5:1/3951Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:54:55.607 error: communicator not destroyed during global destrucETA: 15s Left: 5 AVG: 3.05s  1:0/4413 2:3/4369 3:0/4372 4:1/4582 5:1/3951Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:55:36.373 error: communicator not destroyed during global destrucETA: 12s Left: 4 AVG: 3.05s  1:0/4413 2:2/4370 3:0/4372 4:1/4582 5:1/3951Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:56:59.529 error: communicator not destroyed during global destrucETA: 9s Left: 3 AVG: 3.05s  1:0/4413 2:1/4371 3:0/4372 4:1/4582 5:1/3951Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 16:58:50.698 error: communicator not destroyed during global destrucETA: 6s Left: 2 AVG: 3.39s  1:0/4413 2:0/4372 3:0/4372 4:1/4582 5:1/3951Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
!! 01/07/24 19:00:55.318 error: communicator not destroyed during global destrucETA: 90s Left: 1 AVG: 90.11s  1:0/4413 2:0/4372 3:0/4372 4:0/4583 5:1/3951
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 90s Left: 1 AVG: 90.11s  omeroreadonly-1:0/4413/20%/442.9s  omeroreadonly-2:0/4372/20%/447.1s  omeroreadonly-3:0/4372/20%/447.1s  omeroreadonly-4:0/4583/21%/426.5s  omeroreadwrite:1/3951/18%/494.7s
ETA: 90s Left: 1 AVG: 90.11s  omeroreadonly-1:0/4413/20%/442.9s  omeroreadonly-2:0/4372/20%/447.1s  omeroreadonly-3:0/4372/20%/447.1s  omeroreadonly-4:0/4583/21%/426.5s  omeroreadwrite:1/3951/18%/494.7s
ETA: 90s Left: 1 AVG: 90.11s  omeroreadonly-1:0/4413/20%/442.9s  omeroreadonly-2:0/4372/20%/447.1s  omeroreadonly-3:0/4372/20%/447.1s  omeroreadonly-4:0/4583/21%/426.5s  omeroreadwrite:1/3951/18%/494.7s
ETA: 90s Left: 1 AVG: 90.12s  omeroreadonly-1:0/4413/20%/443.0s  omeroreadonly-2:0/4372/20%/447.1s  omeroreadonly-3:0/4372/20%/447.1s  omeroreadonly-4:0/4583/21%/426.6s  omeroreadwrite:1/3951/18%/494.8s

Last line is repeated when I hit Enter, but doesn't update.

Eventually quit with...

[wmoore@prod120-proxy ~]$ screen -X -S 29489.cache quit
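
As an aside, the MemoryLimitException numbers above line up: the 256 MB ceiling corresponds to an Ice.MessageSizeMax of 250000 (the property is in KB; assuming the stock OMERO default), while getPlane() tried to ship one full uncompressed plane of a big image in a single Ice message:

print(250000 * 1024)       # 256000000 - the "maximum allowed" in the log
print(2047868958 / 2**30)  # ~1.9 - GB requested for one raw plane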

@will-moore

Test again with a smaller number of jobs (2) and a smaller Plate (from idr0011):

screen -dmS cache parallel --eta --sshloginfile nodes -j2 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py Plate:5387 --host localhost >> /tmp/check_pix_20240129.log'

Got the same as above:

[wmoore@prod120-proxy ~]$ screen -r

parallel: Warning: Input is read from the terminal.
parallel: Warning: Only experts do this on purpose. Press CTRL-D to exit.

Let's go back to running as above, with an input file of IDs...

With Plate IDs just from idr0013...

[wmoore@prod120-proxy ~]$ cat ids.txt | wc
    510     510    5610

less ids.txt
...
Plate:3510
Plate:3511
Plate:3512
Plate:3513
Plate:3514
Plate:3515

Exactly as above (except for log file)...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j10 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC >> /tmp/check_pix_20240129.log'

Then screen -r:

Computers / CPU cores / Max jobs to run
1:omeroreadonly-1 / 8 / 9
2:omeroreadonly-2 / 8 / 9
3:omeroreadonly-3 / 8 / 9
4:omeroreadonly-4 / 8 / 9
5:omeroreadwrite / 16 / 9

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 510 AVG: 0.00s  omeroreadonly-1:9/0/20%/0.0s  omeroreadonly-2:9/0/20%/0.0s  omeroreadonly-3:9/0/20%/0.0s  omeroreadonly-4:9/0/20%/0.0s  omeroreadwrite:9/0/20%/0.0s Previous session expired for public on localhost:4064
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 622, in attach
    rv = store.attach(server, name, uuid, set_current=set_current)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 349, in attach
    set_current=set_current)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 380, in create
    sf = client.createSession(name, pasw)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/clients.py", line 640, in createSession
    prx = rtr.createSession(username, password, ctx)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/Glacier2_Router_ice.py", line 258, in createSession
    return _M_Glacier2.Router._op_createSession.invoke(self, ((userId, password), _ctx))
Glacier2.PermissionDeniedException: exception ::Glacier2::PermissionDeniedException
{
    reason = Password check failed for 'b3a1f9d4-7b25-4106-bf31-69e1107697a3': []
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/omero/server/venv3/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main
    rv = omero.cli.argv()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv
    cli.invoke(args[1:])
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke
    stop = self.onecmd(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd
    self.execute(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute
    args.func(args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 488, in <lambda>
    login.set_defaults(func=lambda args: sessions.login(args))
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login
    check_group=True)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach
    return self.attach(store, server, name, uuid, props, exists)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach
    store.clear(server, name, uuid)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear
    self.walk(f, host, name, sess)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk
    func(h, n, s)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f
    s.remove()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove
    os.remove(self)
FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/b3a1f9d4-7b25-4106-bf31-69e1107697a3')
ETA: 11556s Left: 509 AVG: 161.00s  omeroreadonly-1:9/0/19%/0.0s  omeroreadonly-2:9/0/19%/0.0s  omeroreadonly-3:9/0/19%/0.0s  omeroreadonly-4:9/1/21%/174.0s  omeroreadwrite:9/0/19%/0.0s 

Don't know if this was output under screen -r before (when it worked above).

This took a while but then started showing output...

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log | wc"; done
      0       0       0
      0       0       0
    147    1156    8236
      0       0       0
      0       0       0

screen -r

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
...
ETA: 335624s Left: 509 AVG: 1131.00s  omeroreadonly-1:9/0/19%/0.0s  omeroreadonly-2:9/0/19%/0.0s  omeroreadonly-3:9/0/19%/0.0s  omeroreadonly-4:9/1/21%/1144.0s  omeroreadwrite:9/0/19%/0.0s

@will-moore

Seb: "At least 1 of these 5 servers is the omeroreadwrite that end-users should never access
and I believe API access might only be distributed between 2 read-only servers"

So, reducing the nodes list to just omeroreadonly-1 and omeroreadonly-2...

Also, want to run above with --host localhost to avoid hitting idr.openmicroscopy.org...
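
(For context - a sketch, not the script's exact code - the "no_check - Don't connect to idr" log line further below suggests the script normally opens a second BlitzGateway connection for the comparison, which --host redirects:)

from omero.gateway import BlitzGateway

# Second connection used for the comparison; the default host would be
# idr.openmicroscopy.org, overridden here to localhost as --host does
idr_conn = BlitzGateway("public", "public", host="localhost", port=4064,
                        secure=True)
idr_conn.connect()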

Killed the Screen running above... deleted logs.

[wmoore@prod120-proxy ~]$ screen -X -S 10976.cache quit
[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "rm /tmp/check_pix_20240129.log"; done

reduced servers in nodes:

$ cat nodes 
omeroreadonly-1
omeroreadonly-2

Ran again...

[wmoore@prod120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j50 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --host=localhost >> /tmp/check_pix_20240129.log'
parallel: Warning: Using only 31 connections to avoid race conditions.
parallel: Warning: ssh to omeroreadonly-2 only allows for 33 simultaneous logins.
parallel: Warning: You may raise this by changing /etc/ssh/sshd_config:MaxStartups and MaxSessions on omeroreadonly-2.
parallel: Warning: Using only 32 connections to avoid race conditions.

Computers / CPU cores / Max jobs to run
1:omeroreadonly-1 / 8 / 31
2:omeroreadonly-2 / 8 / 32

Computer:jobs running/jobs completed/%of started jobs
ETA: 0s Left: 510 AVG: 0.00s  1:31/0/49%/0.0s  2:32/0/50%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 509 AVG: 0.00s  1:31/1/50%/12.0s  2:32/0/50%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 508 AVG: 0.00s  1:31/1/49%/12.0s  2:32/1/50%/12.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 507 AVG: 0.00s  1:31/2/50%/6.0s  2:32/1/50%/12.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 506 AVG: 0.00s  1:31/2/49%/6.0s  2:32/2/50%/6.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 505 AVG: 0.00s  1:31/2/48%/6.0s  2:32/3/51%/4.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 504 AVG: 0.00s  1:31/2/47%/6.0s  2:32/4/52%/3.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 503 AVG: 0.00s  1:31/2/47%/6.0s  2:32/5/52%/2.4s ssh_exchange_identification: read: Connection reset by peer
ETA: 0s Left: 502 AVG: 0.00s  1:31/3/47%/4.0s  2:32/5/52%/2.4s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 501 AVG: 0.00s  1:31/4/48%/3.0s  2:32/5/51%/2.4s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 500 AVG: 0.00s  1:31/4/47%/3.0s  2:32/6/52%/2.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 499 AVG: 0.00s  1:31/5/48%/2.4s  2:32/6/51%/2.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 498 AVG: 0.00s  1:31/5/48%/2.4s  2:32/7/52%/1.7s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 497 AVG: 0.00s  1:31/5/47%/2.4s  2:32/8/52%/1.5s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 496 AVG: 0.00s  1:31/5/46%/2.4s  2:32/9/53%/1.3s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 495 AVG: 0.00s  1:31/6/47%/2.0s  2:32/9/52%/1.3s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 494 AVG: 0.00s  1:31/7/48%/1.7s  2:32/9/51%/1.3s ssh_exchange_identification: Connection closed by remote host
ETA: 43s Left: 493 AVG: 0.24s  1:31/7/47%/2.3s  2:32/10/52%/1.6s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 171, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 143, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login
ETA: 52s Left: 492 AVG: 0.22s  1:31/8/48%/2.0s  2:32/10/51%/1.6s 
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 102s Left: 492 AVG: 0.56s  omeroreadonly-1:31/8/48%/2.8s  omeroreadonly-2:32/10/51%/2.2s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 171, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 143, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login
ETA: 150s Left: 491 AVG: 0.68s  omeroreadonly-1:31/9/48%/2.8s  omeroreadonly-2:32/10/51%/2.5s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 171, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 143, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login
ETA: 280s Left: 490 AVG: 1.15s  omeroreadonly-1:31/9/48%/3.9s  omeroreadonly-2:32/11/51%/15.7s)

Seems like we're not able to connect to --host=localhost. Confirmed:

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log"; done
Start: 2024-01-29 15:24:10.766948
Checking Plate:3457
max_planes: sizeC
max_images: 0
check timing: False
Start: 2024-01-29 15:24:17.575473
Checking Plate:3467
max_planes: sizeC
max_images: 0
check timing: False
Start: 2024-01-29 15:24:21.429353
Checking Plate:3464
max_planes: sizeC
max_images: 0
check timing: False
Previous session expired for public on localhost:4064

Let's try idr-testing...

Killed the screen, deleted logs again.

[wmoore@prod120-proxy ~]$ screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j50 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --host=idr-testing >> /tmp/check_pix_20240129.log'

But screen -r shows lots of:

Ice.DNSException: exception ::Ice::DNSException
{
    error = -2
    host = idr-testing
}

@will-moore

Switch to using #62

screen -dmS cache parallel --eta --sshloginfile nodes -a ids.txt -j50 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --no-check >> /tmp/check_pix_20240129.log'

screen -r

Academic tradition requires you to cite works you base your article on.
When using programs that use GNU Parallel to process data for publication
please cite:

  O. Tange (2011): GNU Parallel - The Command-Line Power Tool,
  ;login: The USENIX Magazine, February 2011:42-47.

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

To silence the citation notice: run 'parallel --bibtex'.

parallel: Warning: ssh to omeroreadonly-1 only allows for 34 simultaneous logins.
parallel: Warning: You may raise this by changing /etc/ssh/sshd_config:MaxStartups and MaxSessions on omeroreadonly-1.
parallel: Warning: Using only 33 connections to avoid race conditions.
parallel: Warning: ssh to omeroreadonly-2 only allows for 36 simultaneous logins.
parallel: Warning: You may raise this by changing /etc/ssh/sshd_config:MaxStartups and MaxSessions on omeroreadonly-2.
parallel: Warning: Using only 35 connections to avoid race conditions.
parallel: Warning: Could not figure out number of cpus on omeroreadonly-1 (). Using 1.

Computers / CPU cores / Max jobs to run
1:omeroreadonly-1 / 1 / 33
2:omeroreadonly-2 / 8 / 35

Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 510 AVG: 0.00s  omeroreadonly-1:33/0/48%/0.0s  omeroreadonly-2:35/0/51%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 509 AVG: 0.00s  omeroreadonly-1:33/1/49%/13.0s  omeroreadonly-2:35/0/50%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 508 AVG: 0.00s  omeroreadonly-1:33/2/50%/6.5s  omeroreadonly-2:35/0/50%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 507 AVG: 0.00s  omeroreadonly-1:33/2/49%/6.5s  omeroreadonly-2:35/1/50%/13.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 506 AVG: 0.00s  omeroreadonly-1:33/2/48%/6.5s  omeroreadonly-2:35/2/51%/6.5s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 505 AVG: 0.00s  omeroreadonly-1:33/3/49%/4.3s  omeroreadonly-2:35/2/50%/6.5s ssh_exchange_identification: read: Connection reset by peer
ETA: 0s Left: 504 AVG: 0.00s  omeroreadonly-1:33/3/48%/4.3s  omeroreadonly-2:35/3/51%/4.3s ssh_exchange_identification: read: Connection reset by peer
ETA: 0s Left: 503 AVG: 0.00s  omeroreadonly-1:33/4/49%/3.2s  omeroreadonly-2:35/3/50%/4.3s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 502 AVG: 0.00s  omeroreadonly-1:33/4/48%/3.2s  omeroreadonly-2:35/4/51%/3.2s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 501 AVG: 0.00s  omeroreadonly-1:33/4/48%/3.2s  omeroreadonly-2:35/5/51%/2.6s ssh_exchange_identification: read: Connection reset by peer
ETA: 0s Left: 500 AVG: 0.00s  omeroreadonly-1:33/5/48%/2.6s  omeroreadonly-2:35/5/51%/2.6s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 499 AVG: 0.00s  omeroreadonly-1:33/5/48%/2.6s  omeroreadonly-2:35/6/51%/2.2s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 498 AVG: 0.00s  omeroreadonly-1:33/6/48%/2.2s  omeroreadonly-2:35/6/51%/2.2s ssh_exchange_identification: read: Connection reset by peer
ETA: 109s Left: 497 AVG: 0.85s  omeroreadonly-1:33/6/48%/4.0s  omeroreadonly-2:35/7/51%/3.4s Previous session expired for public on localhost:4064
Created session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 181, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 147, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login
ETA: 200s Left: 496 AVG: 1.21s  omeroreadonly-1:33/7/48%/4.3s  omeroreadonly-2:35/7/51%/4.3s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 181, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 147, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login
ETA: 1792s Left: 495 AVG: 5.80s  omeroreadonly-1:33/8/49%/12.5s  omeroreadonly-2:35/7/50%/14.3

We see lots of threads starting check_pixels.py...

$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log"; done
...
Start: 2024-01-29 16:10:10.939264
Checking Plate:3464
max_planes: sizeC
max_images: 0
check timing: False
Start: 2024-01-29 16:10:10.899019
Checking Plate:3522
max_planes: sizeC
max_images: 0
check timing: False
Start: 2024-01-29 16:10:10.934005
Checking Plate:3559
max_planes: sizeC
max_images: 0
check timing: False
Start: 2024-01-29 16:10:10.984351
Checking Plate:3555
max_planes: sizeC
max_images: 0
check timing: False

But we've not reached any "Check Image..." lines yet...

@will-moore

will-moore commented Jan 29, 2024

Still running the above... issues with logging in:

screen -r

...
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
        at omero.cmd.CallContext.invoke(CallContext.java:85)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
        at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:213)
        at com.sun.proxy.$Proxy111.getEventContext_async(Unknown Source)
        at omero.api._IAdminTie.getEventContext_async(_IAdminTie.java:222)
        at omero.api._IAdminDisp.___getEventContext(_IAdminDisp.java:2115)
        at omero.api._IAdminDisp.__dispatch(_IAdminDisp.java:2301)
        at IceInternal.Incoming.invoke(Incoming.java:221)
        at Ice.ConnectionI.invokeAll(ConnectionI.java:2536)
        at Ice.ConnectionI.dispatch(ConnectionI.java:1145)
        at Ice.ConnectionI.message(ConnectionI.java:1056)
        at IceInternal.ThreadPool.run(ThreadPool.java:395)
        at IceInternal.ThreadPool.access$300(ThreadPool.java:12)
        at IceInternal.ThreadPool$EventHandlerThread.run(ThreadPool.java:832)
        at java.base/java.lang.Thread.run(Thread.java:829)

    serverExceptionClass = ome.conditions.DatabaseBusyException
    message = cannot create transaction
    backOff = 0
}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/omero/server/venv3/bin/omero", line 8, in <module>
    sys.exit(main())
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/main.py", line 126, in main
    rv = omero.cli.argv()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1787, in argv
    cli.invoke(args[1:])
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1225, in invoke
    stop = self.onecmd(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1302, in onecmd
    self.execute(line, previous_args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1384, in execute
    args.func(args)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 488, in <lambda>
    login.set_defaults(func=lambda args: sessions.login(args))
  File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 517, in login
    check_group=True)
  File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 615, in check_and_attach
    return self.attach(store, server, name, uuid, props, exists)
  File "/opt/omero/server/venv3/lib/python3.6/site-packages/omero/plugins/sessions.py", line 627, in attach
    store.clear(server, name, uuid)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 443, in clear
    self.walk(f, host, name, sess)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 335, in walk
    func(h, n, s)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/util/sessions.py", line 441, in f
    s.remove()
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_ext/path.py", line 1313, in remove
    os.remove(self)
FileNotFoundError: [Errno 2] No such file or directory: path('/home/wmoore/omero/sessions/localhost/public/b0b4faf5-bca6-4117-8759-1b808c18b432')
ETA: 2180s Left: 459 AVG: 4.86s  omeroreadonly-1:33/8/34%/32.6s  omeroreadonly-2:35/43/65%/6.1s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
stdin is not a terminal: cannot request server
Traceback (most recent call last):
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 181, in <module>
    main(sys.argv[1:])
  File "/uod/idr/metadata/idr-utils/scripts/check_pixels.py", line 147, in main
    with cli_login() as cli:
  File "/usr/lib64/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/cli.py", line 1739, in cli_login
    raise Exception("Failed to login")
Exception: Failed to login

Edit - looking OK...

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log | wc"; done
  12887  101852  730917
   9726   76203  546400
[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do ssh $n "cat /tmp/check_pix_20240129.log | grep Error | wc"; done
      0       0       0
      0       0       0

@will-moore

will-moore commented Jan 30, 2024

After running for ~18 hours now...
On the idr-next proxy...

[wmoore@prod120-proxy ~]$ screen -r
...
ETA: 1550s Left: 14 AVG: 110.72s  omeroreadonly-1:12/227/46%/242.0s  omeroreadonly-2:2/269/53%/204.2s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 1436s Left: 13 AVG: 110.52s  omeroreadonly-1:11/228/46%/241.0s  omeroreadonly-2:2/269/53%/204.2s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 1323s Left: 12 AVG: 110.30s  omeroreadonly-1:10/229/46%/239.9s  omeroreadonly-2:2/269/53%/204.3s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 1211s Left: 11 AVG: 110.11s  omeroreadonly-1:9/230/46%/239.0s  omeroreadonly-2:2/269/53%/204.3s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 1098s Left: 10 AVG: 109.89s  omeroreadonly-1:8/231/46%/237.9s  omeroreadonly-2:2/269/53%/204.3s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 987s Left: 9 AVG: 109.70s  omeroreadonly-1:7/232/46%/236.9s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 875s Left: 8 AVG: 109.50s  omeroreadonly-1:6/233/46%/236.0s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 765s Left: 7 AVG: 109.29s  omeroreadonly-1:5/234/46%/235.0s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 654s Left: 6 AVG: 109.08s  omeroreadonly-1:4/235/46%/234.0s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 544s Left: 5 AVG: 108.87s  omeroreadonly-1:3/236/46%/233.0s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 434s Left: 4 AVG: 108.66s  omeroreadonly-1:2/237/46%/232.1s  omeroreadonly-2:2/269/53%/204.4s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 325s Left: 3 AVG: 108.47s  omeroreadonly-1:1/238/46%/231.1s  omeroreadonly-2:2/269/53%/204.5s Using session for public@localhost:4064. Idle timeout: 10 min. Current group: Public
ETA: 269s Left: 2 AVG: 134.78s  omeroreadonly-1:0/239/46%/286.5s  omeroreadonly-2:2/269/53%/254.6s

Seems to be running happily - need to check logs...

On omeroreadonly-1 it looks like 35 threads started at the same time, within the minute 16:06...

[wmoore@prod120-omeroreadonly-1 ~]$ grep "Start: 2024-01-29 16:06" /tmp/check_pix_20240129.log | wc
     35     329    3005

Checking both logs...

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "ls -alh /tmp/check_pix_20240129.log && tail /tmp/check_pix_20240129.log"; done
omeroreadonly-1
-rw-rw-r--. 1 wmoore wmoore 4.9M Jan 30 07:22 /tmp/check_pix_20240129.log
371/380 Check Image:1664751 LT0154_03 [Well K13, Field 1]
372/380 Check Image:1664752 LT0154_03 [Well E14, Field 1]
373/380 Check Image:1664753 LT0154_03 [Well F10, Field 1]
374/380 Check Image:1664754 LT0154_03 [Well H21, Field 1]
375/380 Check Image:1664755 LT0154_03 [Well I18, Field 1]
376/380 Check Image:1664756 LT0154_03 [Well B17, Field 1]
377/380 Check Image:1664757 LT0154_03 [Well N4, Field 1]
378/380 Check Image:1664758 LT0154_03 [Well H9, Field 1]
379/380 Check Image:1664759 LT0154_03 [Well P14, Field 1]
End: 2024-01-30 07:22:52.683000
omeroreadonly-2
-rw-rw-r--. 1 wmoore wmoore 4.7M Jan 30 07:07 /tmp/check_pix_20240129.log
375/384 Check Image:1672142 LT0602_04 [Well A15, Field 1]
376/384 Check Image:1672143 LT0602_04 [Well F21, Field 1]
377/384 Check Image:1672144 LT0602_04 [Well J21, Field 1]
378/384 Check Image:1672145 LT0602_04 [Well F14, Field 1]
379/384 Check Image:1672146 LT0602_04 [Well B15, Field 1]
380/384 Check Image:1672147 LT0602_04 [Well E6, Field 1]
381/384 Check Image:1672148 LT0602_04 [Well F17, Field 1]
382/384 Check Image:1672149 LT0602_04 [Well H14, Field 1]
383/384 Check Image:1672150 LT0602_04 [Well I8, Field 1]
End: 2024-01-30 07:07:44.942422

[wmoore@prod120-proxy ~]$ for n in $(cat nodes); do echo $n && ssh $n "grep 'Checking Plate' /tmp/check_pix_20240129.log" | wc; done
omeroreadonly-1
    233     466    4660
omeroreadonly-2
    255     510    5100

EDIT: 12:25
In total we have 488 "Checking Plate"... logs, out of 510 Plates from idr0013 in ids.txt.
But there has still been no activity since ~7:20 am (5 hours).

Time to cancel this and start a new run (see below)...

screen -r
...
ETA: 287s Left: 2 AVG: 143.97s  omeroreadonly-1:0/239/46%/306.1s  omeroreadonly-2:2/269/53%/271.9s

screen -X -S 19572.cache quit

@will-moore

Updated script at #62 to add a --render option that calls renderImage() instead of getPlane().
Let's try running as above but with idr0016 (multi-channel images). Using idr0090 won't give us very many plates to run in parallel (22 Plates), whereas idr0016 has 413 Plates...
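
Roughly what --render does per image - a sketch, not the script's exact code (the Image ID is taken from the log below):

from omero.gateway import BlitzGateway

conn = BlitzGateway("public", "public", host="localhost", port=4064)
if conn.connect():
    image = conn.getObject("Image", 2285547)
    # renderImage() runs the rendering engine server-side and returns a
    # PIL Image, so no raw plane crosses the wire - sidestepping the
    # Ice.MessageSizeMax limit that getPlane() hit on big images
    rendered = image.renderImage(0, 0)  # (z, t)
    conn.close()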

on idr-next:omeroreadwrite

omero hql --limit -1 --ids-only --style csv 'select plate.id from Plate as plate where plate in (select child from ScreenPlateLink where parent=2851)' > idr0016_plates.txt
cut -d ',' -f2 idr0016_plates.txt | sed -e 's/^/Plate:/' >> ids_idr0016.txt

on the idr-next proxy

rsync -rvP omeroreadwrite:/home/wmoore/ids_idr0016.txt ./
screen -dmS cache parallel --eta --sshloginfile nodes -a ids_idr0016.txt -j50 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --render >> /tmp/render_20240130.log'

@will-moore

screen -r
Initially saw similar login errors to those above (though no DatabaseBusyException), then after a few (~7) mins, lots of these...

WARNING:omero.gateway:UnknownLocalException on <class 'omero.gateway.OmeroGatewaySafeCallWrapper'> to <9163fa7c-d13d-46cf-971a-d83fc5c6e7d1omero.api.IContainer> getImages(('Image', (2299369,), None, <ServiceOptsDict: {'omero.client.uuid': '9163fa7c-d13d-46cf-971a-d83fc5c6e7d1', 'omero.session.uuid': '18305555-3dee-4ba0-be9b-79be7cc602ae', 'omero.group': '3'}>), {})
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_IContainer_ice.py", line 601, in getImages
    return _M_omero.api.IContainer._op_getImages.invoke(self, ((rootType, rootIds, options), _ctx))
Ice.UnknownLocalException: exception ::Ice::UnknownLocalException
{
    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
connection refused: Connection refused
}
WARNING:omero.gateway:UnknownLocalException on <class 'omero.gateway.OmeroGatewaySafeCallWrapper'> to <9163fa7c-d13d-46cf-971a-d83fc5c6e7d1omero.api.IContainer> getImages(('Image', (2299370,), None, <ServiceOptsDict: {'omero.client.uuid': '9163fa7c-d13d-46cf-971a-d83fc5c6e7d1', 'omero.session.uuid': '18305555-3dee-4ba0-be9b-79be7cc602ae', 'omero.group': '3'}>), {})
Traceback (most recent call last):
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero/gateway/__init__.py", line 4856, in __call__
    return self.f(*args, **kwargs)
  File "/opt/omero/server/venv3/lib64/python3.6/site-packages/omero_api_IContainer_ice.py", line 601, in getImages
    return _M_omero.api.IContainer._op_getImages.invoke(self, ((rootType, rootIds, options), _ctx))
Ice.UnknownLocalException: exception ::Ice::UnknownLocalException
{
    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
connection refused: Connection refused
}
ETA: 1634s Left: 334 AVG: 4.95s  omeroreadonly-1:32/32/45%/12.6s  omeroreadonly-2:31/47/54%/8.6s 

@will-moore
Member Author

will-moore commented Jan 30, 2024

Checking logs... We see errors pretty quickly, probably corresponding to the ConnectionRefusedException seen above:

ssh omeroreadonly-1
[wmoore@prod120-omeroreadonly-1 ~]$ less /tmp/render_20240130.log

Start: 2024-01-30 12:29:49.362540
Checking Plate:5045
max_planes: sizeC
max_images: 0
check timing: False
no_check - Don't connect to idr
0/2304 Render Image:2285547 25565 [Well A1, Field 1]
1/2304 Render Image:2285548 25565 [Well I17, Field 1]
2/2304 Render Image:2285549 25565 [Well I17, Field 2]
3/2304 Render Image:2285550 25565 [Well I17, Field 3]
4/2304 Render Image:2285551 25565 [Well I17, Field 4]
5/2304 Render Image:2285552 25565 [Well I17, Field 5]
6/2304 Render Image:2285553 25565 [Well I17, Field 6]
7/2304 Render Image:2285554 25565 [Well F8, Field 1]
8/2304 Render Image:2285555 25565 [Well F8, Field 2]
9/2304 Render Image:2285556 25565 [Well F8, Field 3]
10/2304 Render Image:2285557 25565 [Well F8, Field 4]
11/2304 Render Image:2285558 25565 [Well F8, Field 5]
12/2304 Render Image:2285559 25565 [Well F8, Field 6]
13/2304 Render Image:2285560 25565 [Well O10, Field 1]
14/2304 Render Image:2285561 25565 [Well O10, Field 2]
15/2304 Render Image:2285562 25565 [Well O10, Field 3]
16/2304 Render Image:2285563 25565 [Well O10, Field 4]
17/2304 Render Image:2285564 25565 [Well O10, Field 5]
18/2304 Render Image:2285565 25565 [Well O10, Field 6]
19/2304 Render Image:2285566 25565 [Well I18, Field 1]
20/2304 Render Image:2285567 25565 [Well I18, Field 2]
21/2304 Render Image:2285568 25565 [Well I18, Field 3]
22/2304 Render Image:2285569 25565 [Well I18, Field 4]
23/2304 Render Image:2285570 25565 [Well I18, Field 5]
24/2304 Render Image:2285571 25565 [Well I18, Field 6]
25/2304 Render Image:2285572 25565 [Well D16, Field 1]
26/2304 Render Image:2285573 25565 [Well D16, Field 2]
Error: RenderJpeg Image:2285573 25565 [Well D16, Field 2] exception ::Ice::UnknownLocalException
{
    unknown = Network.cpp:2357: Ice::ConnectionRefusedException:
connection refused: Connection refused
}
27/2304 Render Image:2285574 25565 [Well D16, Field 3]
Error: RenderJpeg Image:2285574 25565 [Well D16, Field 3] catching classes that do not inherit from BaseException is not allowed
28/2304 Render Image:2285575 25565 [Well D16, Field 4]
Error: RenderJpeg Image:2285575 25565 [Well D16, Field 4] catching classes that do not inherit from BaseException is not allowed
29/2304 Render Image:2285576 25565 [Well D16, Field 5]
Error: RenderJpeg Image:2285576 25565 [Well D16, Field 5] catching classes that do not inherit from BaseException is not allowed
30/2304 Render Image:2285577 25565 [Well D16, Field 6]
Error: RenderJpeg Image:2285577 25565 [Well D16, Field 6] catching classes that do not inherit from BaseException is not allowed
...
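
"catching classes that do not inherit from BaseException is not allowed" is the TypeError that CPython raises when an except clause names something that is not an exception class, and it masks whatever error was actually in flight. A minimal reproduction:

class NotAnException:  # deliberately not derived from BaseException
    pass

try:
    try:
        raise ValueError("the original error")
    except NotAnException:  # evaluating this clause raises TypeError
        pass
except TypeError as err:
    # prints: catching classes that do not inherit from BaseException is not allowed
    print(err)

So once the connection died, the script's own error handling appears to be tripping over something that isn't an exception class, hiding the underlying ConnectionRefusedException.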

We see almost as many "Error" lines as "Render Image" lines:

[wmoore@prod120-omeroreadonly-1 ~]$ grep "Render Image" /tmp/render_20240130.log | wc
 208105 1665034 11679808
[wmoore@prod120-omeroreadonly-1 ~]$ grep Error /tmp/render_20240130.log | wc
 205056 3895682 26376416
[wmoore@prod120-proxy ~]$ ssh omeroreadonly-2
Last login: Mon Jan 29 17:42:28 2024 from prod120-proxy
[wmoore@prod120-omeroreadonly-2 ~]$ grep "Render Image" /tmp/render_20240130.log | wc
  70050  560484 3931994
[wmoore@prod120-omeroreadonly-2 ~]$ grep Error /tmp/render_20240130.log | wc
  69130 1313294 8892059

@will-moore
Member Author

At 13:24, updated the script to d6e0b1d while the run was still going...
I expect this to be picked up the next time the check_pixels.py script is launched...

@will-moore
Member Author

will-moore commented Jan 30, 2024

No "Start" logged since the script update above (the fused lines below are presumably parallel jobs appending to the same log without synchronization)...

[wmoore@prod120-omeroreadonly-1 ~]$ grep Start /tmp/render_20240130.log | tail
Error: RenderJpeg Image:2682646 26580 [Well I5, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:13.707600
Error: RenderJpeg Image:2649928 26232 [Well D4, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:13.488344
Error: RenderJpeg Image:2675131 26569 [Well L11, Field 2] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:13.811556
Error: RenderJpeg Image:2647624 26224 [Well E11, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:16.647255
Error: RenderJpeg Image:2675970 26574 [Well N12, Field 1] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:12.511049
Error: RenderJpeg Image:2596936 26092 [Well M13, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:16.824447
67/1362 Render Image:2677626 26577 [Well L4, Field 1]Start: 2024-01-30 12:42:12.281491
Error: RenderJpeg Image:2613065 26128 [Well M8, Field 6] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:42:18.923561
65/1998 Render Image:2680648 26579 [Well I22, Field 5]Start: 2024-01-30 12:42:13.407455
Start: 2024-01-30 13:04:00.767939
[wmoore@prod120-omeroreadonly-1 ~]$ grep Error /tmp/render_20240130.log | wc
 206028 3914150 26501445
[wmoore@prod120-omeroreadonly-1 ~]$ grep "Render Image" /tmp/render_20240130.log | wc
 210114 1681106 11792861
[wmoore@prod120-omeroreadonly-2 ~]$ grep Start /tmp/render_20240130.log | tail
Error: RenderJpeg Image:2190879 24783 [Well A11, Field 4] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:41.464127
Error: RenderJpeg Image:2079575 24584 [Well D19, Field 4] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:39.116288
Error: RenderJpeg Image:2065382 24563 [Well P22, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:51.844281
Error: RenderJpeg Image:2292521 25567 [Well M7, Field 2] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:42.449046
Error: RenderJpeg Image:2070821 24566 [Well P15, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:41.804283
Error: RenderJpeg Image:2158897 24739 [Well B16, Field 4] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:41.837543
Error: RenderJpeg Image:2085170 24588 [Well H12, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:50.748270
Error: RenderJpeg Image:2230320 25392 [Well D20, Field 3] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:41.738660
Error: RenderJpeg Image:2112412 24617 [Well C4, Field 5] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:51.109504
Error: RenderJpeg Image:2239059 25414 [Well N21, Field 3] catching classes that do not inherit from BaseException is not allowedStart: 2024-01-30 12:29:50.772802

Only 93 Plates processed here - with 413 in total, we would expect roughly half of them on this node. Each Plate's log starts with its first image ([Well A1, Field 1]), so counting those lines counts the Plates that started:

[wmoore@prod120-omeroreadonly-1 ~]$ grep "A1, Field 1]" /tmp/render_20240130.log | wc
     93     744    4927

Checking webclient at http://localhost:1080/webclient/ now gives 502 Bad Gateway.

But there are no Errors in the Blitz log:

[wmoore@prod120-omeroreadwrite ~]$ grep Error /opt/omero/server/OMERO.server/var/log/Blitz-0.log | wc
      0       0       0

Let's restart the server and run again - hopefully with better stack traces this time...

@will-moore
Member Author

Using -j30 this time to avoid warnings...

screen -dmS cache parallel --eta --sshloginfile nodes -a ids_idr0016.txt -j30 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --render >> /tmp/render_20240130_b.log'

@will-moore
Member Author

will-moore commented Jan 30, 2024

screen -r


parallel: Warning: ssh to omeroreadonly-1 only allows for 22 simultaneous logins.
parallel: Warning: You may raise this by changing /etc/ssh/sshd_config:MaxStartups and MaxSessions on omeroreadonly-1.
parallel: Warning: Using only 21 connections to avoid race conditions.
parallel: Warning: ssh to omeroreadonly-2 only allows for 25 simultaneous logins.
parallel: Warning: You may raise this by changing /etc/ssh/sshd_config:MaxStartups and MaxSessions on omeroreadonly-2.
parallel: Warning: Using only 24 connections to avoid race conditions.

Computers / CPU cores / Max jobs to run
1:omeroreadonly-1 / 8 / 21
2:omeroreadonly-2 / 8 / 24

Computer:jobs running/jobs completed/%of started jobs
ETA: 0s Left: 413 AVG: 0.00s  1:21/0/46%/0.0s  2:24/0/53%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 412 AVG: 0.00s  1:21/1/47%/15.0s  2:24/0/52%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 411 AVG: 0.00s  1:21/2/48%/7.5s  2:24/0/51%/0.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 410 AVG: 0.00s  1:21/2/47%/7.5s  2:24/1/52%/15.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 409 AVG: 0.00s  1:21/2/46%/7.5s  2:24/2/53%/7.5s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 408 AVG: 0.00s  1:21/3/48%/5.0s  2:24/2/52%/7.5s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 407 AVG: 0.00s  1:21/3/47%/5.0s  2:24/3/52%/5.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 406 AVG: 0.00s  1:21/4/48%/3.8s  2:24/3/51%/5.0s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 405 AVG: 0.00s  1:21/4/47%/3.8s  2:24/4/52%/3.8s ssh_exchange_identification: Connection closed by remote host
ETA: 0s Left: 404 AVG: 0.00s  1:21/4/46%/3.8s  2:24/5/53%/3.0s ssh_exchange_identification: Connection closed by remote host
ETA: 121s Left: 403 AVG: 1.20s  1:21/4/45%/6.8s  2:24/6/54%/4.5s 
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 340s Left: 403 AVG: 2.80s  omeroreadonly-1:21/4/45%/10.8s  omeroreadonly-2:24/6/54%/7.2s 

EDIT - next day
screen -r returns nothing.
Logs not found on omeroreadonly-1!

Something went wrong?!

@will-moore
Member Author

Trying again with -j10.

$ screen -dmS cache parallel --eta --sshloginfile nodes -a ids_idr0016.txt -j10 '/opt/omero/server/OMERO.server/bin/omero login -s localhost -u public -w public && /opt/omero/server/venv3/bin/python /uod/idr/metadata/idr-utils/scripts/check_pixels.py --max-planes=sizeC --render >> /tmp/render_20240131.log'

screen -r
See lots of:

InternalException: Failed to connect: [Errno 28] No space left on device: path('/home/wmoore/omero/sessions/localhost/public/c3545329-d862-4115-bf3d-b6702365f05c')

Seems we've run out of space on omeroreadonly-1...

[wmoore@prod120-proxy ~]$ ssh omeroreadonly-1
Last login: Wed Jan 31 09:40:19 2024 from prod120-proxy
[wmoore@prod120-omeroreadonly-1 ~]$ 
[wmoore@prod120-omeroreadonly-1 ~]$ 
[wmoore@prod120-omeroreadonly-1 ~]$ df ./
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vda1       83874796 83874776        20 100% /

This is due to logs, including 71G of master.err:

[wmoore@prod120-omeroreadonly-1 ~]$ ls -alh /opt/omero/server/OMERO.server/var/log/
total 75G
drwxrwxr-x. 2 omero-server omero-server  270 Jan 30 12:39 .
drwxr-xr-x. 5 omero-server omero-server   47 Dec  5 14:41 ..
-rw-rw-r--. 1 omero-server omero-server  45M Jan 31 10:00 Blitz-0.log
-rw-rw-r--. 1 omero-server omero-server 501M Jan 30 12:39 Blitz-0.log.1
-rw-rw-r--. 1 omero-server omero-server 501M Jan 29 17:48 Blitz-0.log.2
-rw-rw-r--. 1 omero-server omero-server 501M Jan 27 06:15 Blitz-0.log.3
-rw-rw-r--. 1 omero-server omero-server 501M Jan 23 03:05 Blitz-0.log.4
-rw-rw-r--. 1 omero-server omero-server 501M Jan 20 00:47 Blitz-0.log.5
-rw-rw-r--. 1 omero-server omero-server 501M Jan 16 09:46 Blitz-0.log.6
-rw-rw-r--. 1 omero-server omero-server 501M Jan 12 01:19 Blitz-0.log.7
-rw-rw-r--. 1 omero-server omero-server 501M Jan  8 13:04 Blitz-0.log.8
-rw-rw-r--. 1 omero-server omero-server 501M Jan  6 02:37 Blitz-0.log.9
-rw-rw-r--. 1 omero-server omero-server  71G Jan 31 10:01 master.err
-rw-rw-r--. 1 omero-server omero-server 1.2K Jan 19 15:17 master.out
-rw-rw-r--. 1 omero-server omero-server  80K Jan 30 13:04 Tables-0.log

Which contains lots of...

[wmoore@prod120-omeroreadonly-1 ~]$ tail /opt/omero/server/OMERO.server/var/log/master.err 
...
{AsyncHttpConnection@5c29d9cc,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0} to io.prometheus.jmx.shaded.org.eclipse.jetty.server.nio.SelectChannelConnector$ConnectorSelectorManager@e934082
2024-01-30 15:43:08.473:WARN:ipjsoeji.nio:Dispatched Failed! SCEP@7a2888ba{l(/192.168.120.206:58796)<->r(/192.168.120.132:9180),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@40119b09,g=HttpGenerator{s=0,h=-1,b=-1,c=-1},p=HttpParser{s=-14,l=0,c=0},r=0} to io.prometheus.jmx.shaded.org.eclipse.jetty.server.nio.SelectChannelConnector$ConnectorSelectorManager@e934082
2024-01-30 15:43:08.473:WARN:ipjsoeji.nio:Dispatched Failed! SCEP@761d50ce{l(/192.168.120.206:60260)<->r(/192.168.120.132:9180),d=false,open=true,ishut=false,oshut=false,rb=false,wb=false,w=true,i=1r}-{AsyncHttpConnection@63a28a09,g=HttpGenerator{s=0,h=-

Going to delete that, restart the server and run parallel again, with lower settings...

@will-moore
Member Author

will-moore commented Jan 31, 2024

Seems that deleting that file doesn't free up space (the running server still holds the deleted master.err open), even though we only have 4.5G in the logs dir now instead of 75G:

[wmoore@prod120-omeroreadonly-1 ~]$ sudo rm /opt/omero/server/OMERO.server/var/log/master.err 
[wmoore@prod120-omeroreadonly-1 ~]$ df ./
Filesystem     1K-blocks     Used Available Use% Mounted on
/dev/vda1       83874796 83874776        20 100% /
[wmoore@prod120-omeroreadonly-1 ~]$ df -h /opt/omero/server/OMERO.server/var/log/
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1        80G   80G   20K 100% /
[wmoore@prod120-omeroreadonly-1 ~]$ ls -alh /opt/omero/server/OMERO.server/var/log/
total 4.5G
...
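
This is expected on Linux: the kernel only releases a deleted file's blocks when the last open handle on it is closed, and the OMERO master process still has master.err open. A minimal demonstration (paths are illustrative):

import os
import subprocess

f = open("/tmp/big.log", "w")
f.write("x" * 100_000_000)  # ~100 MB
f.flush()

os.remove("/tmp/big.log")       # the directory entry is gone...
subprocess.run(["df", "/tmp"])  # ...but the blocks are still allocated

f.close()                       # last handle closed: space is reclaimed
subprocess.run(["df", "/tmp"])

Restarting the server (or truncating the file in place with > master.err rather than deleting it) is what actually returns the space.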

However, it looks like omeroreadonly-2 still has 70GB free, so I can continue to use that for now...
Instead of using parallel, let's just try running check_pixels --render in a single thread. If that alone is enough to cause the issues, it will be easier to debug and reproduce...
On idr-next:omeroreadonly-2, using a single Plate from idr0016:

cd /uod/idr/metadata/idr-utils/scripts
python check_pixels.py --render Plate:6151 > /tmp/render_20240130_plate6151.log

Edit: 13:13 (2 hours)
Still running with no errors - so this isn't sufficient to expose the issues...

[wmoore@prod120-omeroreadonly-2 ~]$ tail -f /tmp/render_20240130_plate6151.log
1313/2304 Render Image:3277064 24280 [Well O19, Field 5]
1314/2304 Render Image:3277065 24280 [Well O19, Field 6]
1315/2304 Render Image:3277066 24280 [Well C3, Field 1]
1316/2304 Render Image:3277067 24280 [Well C3, Field 2]
1317/2304 Render Image:3277068 24280 [Well C3, Field 3]
1318/2304 Render Image:3277069 24280 [Well C3, Field 4]
1319/2304 Render Image:3277070 24280 [Well C3, Field 5]
1320/2304 Render Image:3277071 24280 [Well C3, Field 6]
1321/2304 Render Image:3277072 24280 [Well J13, Field 1]
1322/2304 Render Image:3277073 24280 [Well J13, Field 2]

@will-moore
Member Author

Moving --render testing to #62 now (where that option is added)...
