submit command results in memory errors and rejects with <EOFError: end of file reached> invalid SIP descriptor errors and defunct processes #777

lydiam · 2015-12-14T19:41:01Z

I submitted the following command on fclnx30:

   sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --path /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803 --batch FTPDL_20151212T090803

The result was that 6-8 of about 144 SIPs submitted successfully and the remainder rejected with the following error:

   sip path: /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/AA00036241_00001; 
   Invalid SIP descriptor. XML validation errors:
   line:unknown: msg:#<EOFError: end of file reached>

The submit log contained the "standard" log messages, with the majority of packages being rejected and some being submitted.:

   2015-12-14 12:57:47 INFO 2015-12-14 12:57:47 -0500 -- AA00036194_00001 -- rejected: E14HY8Q8F_RM36BJ
   2015-12-14 12:57:50 INFO 2015-12-14 12:57:50 -0500 -- AA00036192_00001 -- submitted successfully: EIQJ0H9KI_IKIHUJ
   2015-12-14 12:57:50 INFO 2015-12-14 12:57:50 -0500 -- AA00037821_00001 -- rejected: E30TCY4OV_ONW141

At the same time I go memory errors in STDOUT and logs named hs_err_pid*.log, with the following contents:
[lydiam@fclnx30 ftpdl]$ more hs_err_pid15104.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 32744 bytes for ChunkPool::allocate
# Possible reasons:
# The system is out of physical RAM or swap space
# In 32 bit mode, the process size limit was hit
# Possible solutions:
# Reduce memory load on the system
# Increase physical memory or swap space
# Check if swap backing store is full
# Use 64 bit Java on a 64 bit OS
# Decrease Java heap size (-Xmx/-Xms)
# Decrease number of Java threads
# Decrease Java thread stack sizes (-Xss)
# Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
# Out of Memory Error (allocation.cpp:214), pid=15104, tid=47784611424576
#
# JRE version: OpenJDK Runtime Environment (7.0_85-b01) (build 1.7.0_85-mockbuild_2015_07_13_18_00-b00)
# Java VM: OpenJDK 64-Bit Server VM (24.85-b03 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 2.6.1
# Distribution: Built on Red Hat Enterprise Linux Server release 5.11 (Tikanga) (Mon Jul 13 18:00:16 EDT 2015)
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again

--------------- T H R E A D ---------------

Current thread (0x000000001a667000): JavaThread "C2 CompilerThread0" daemon [_thread_in_native, id=15130, stack(0x00002b75b8ba2000,0x00002b75b8ca3000)]

Stack: [0x00002b75b8ba2000,0x00002b75b8ca3000], sp=0x00002b75b8c9d440, free space=1005k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x9e3ece]
V [libjvm.so+0x9e45fb]
V [libjvm.so+0x4dcf85]
V [libjvm.so+0x2fdea0]
....

These logs can be found in: /var/daitss/ops/incoming/ftpdl

There are also lots of defunct processes:

daitss 27706 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27727 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27745 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27766 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27787 27690 0 12:53 pts/0 00:00:00 [ruby]
daitss 27809 27690 0 12:53 pts/0 00:00:00 [ruby]

And other daitss processes from Dec. 13:

daitss 23705 1 0 Dec13 ? 00:03:44 Rack: /opt/web-services/sites/core/current
daitss 23713 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 23719 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/core/current
daitss 23725 1 0 Dec13 ? 00:00:21 Rack: /opt/web-services/sites/core/current
daitss 23736 1 0 Dec13 ? 00:00:49 Rack: /opt/web-services/sites/actionplan/current
daitss 23743 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 23749 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/actionplan/current
daitss 23755 1 0 Dec13 ? 00:00:17 Rack: /opt/web-services/sites/actionplan/current
daitss 23763 1 0 Dec13 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 23773 1 0 Dec13 ? 00:00:02 Rack: /opt/web-services/sites/storage-master/current
daitss 23787 1 0 Dec13 ? 00:00:00 Rack: /opt/web-services/sites/transform/current

I will kill the defunct processes and attempt to restart DAITSS.

Gerald suggested attempting submission of one package at a time.

The text was updated successfully, but these errors were encountered:

lydiam · 2015-12-14T20:09:38Z

After stopping and restarting DAITSS, ruby is still spawning all sorts of processes:

[lydiam@fclnx30 submit-direct]$ postgres 5364 9871 96 04:22 ? root 6159 24970 0 14:29 pts/3 root 6163 6159 0 14:30 pts/3 daitss 6164 6163 0 14:30 pts/3 daitss 14663 1 0 15:00 ? daitss 14675 1 0 15:00 ? daitss 14681 1 4 15:00 ? daitss 14686 1 0 15:00 ? postgres 14752 9871 0 15:00 ? daitss 14753 14681 7 15:00 ? postgres 14762 9871 2 15:01 ? daitss 14770 14640 0 15:01 ? daitss 14779 1 18 15:01 ? daitss 14786 1 0 15:01 ? daitss 14792 1 0 15:01 ? daitss 14798 1 0 15:01 ? daitss 14895 14640 1 15:01 ? daitss 14908 1 2 15:01 ? daitss 14916 1 0 15:01 ? daitss 14925 1 0 15:01 ? daitss 14931 1 0 15:01 ? daitss 15871 14640 0 15:02 ? daitss 16243 1 0 15:02 ? daitss 16250 1 0 15:02 ? daitss 16258 1 0 15:02 ? daitss 16265 1 0 15:02 ? daitss 16273 1 1 15:02 ? daitss 16441 1 0 15:02 ? daitss 16460 14640 0 15:02 ? daitss 16466 1 0 15:02 ? daitss 16473 1 0 15:02 ? daitss 16485 1 0 15:02 ? daitss 16491 1 0 15:02 ? daitss 16501 1 0 15:02 ? daitss 16520 1 0 15:02 ? daitss 16528 1 0 15:02 ? daitss 16531 14640 0 15:02 ? daitss 16541 1 0 15:02 ? daitss 16549 1 0 15:02 ? daitss 16558 1 0 15:02 ? daitss 16564 1 0 15:02 ? daitss 16592 1 0 15:02 ? daitss 16598 1 1 15:02 ? daitss 16629 1 0 15:02 ? lydiam 18520 21654 0 15:06 pts/0 daitss 32406 10421 0 04:04 ? daitss 32407 32406 0 04:04 ? daitss 32411 32406 0 04:04 ? daitss 32412 32411 0 04:04 ? daitss 32416 32407 0 04:04 ? daitss 32417 32407 0 04:04 ? daitss 32418 32416 4 04:04 ? [lydiam@fclnx30 submit-direct]$ ps -ef | grep daitss
10:23:00 postgres: daitss2 daitss_db 127.0.0.1(60080) SELECT
00:00:00 sudo su - daitss
00:00:00 su - daitss
00:00:00 -bash
00:00:01 thin server (0.0.0.0:7002) [describe]
00:00:01 thin server (0.0.0.0:7003) [describe]
00:00:15 ruby /opt/web-services/sites/core/current/bin/pulse
00:00:03 ruby ./bin/mailer-daemon
00:00:00 postgres: daitss2 daitss_db 127.0.0.1(56442) idle
00:00:27 E87C3IH70_T30OFO.ingest
00:00:08 postgres: daitss2 daitss_db 127.0.0.1(56444) idle
00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/viruscheck/current
00:01:01 Rack: /opt/web-services/sites/viruscheck/current
00:00:00 Rack: /opt/web-services/sites/viruscheck/current
00:00:00 Rack: /opt/web-services/sites/viruscheck/current
00:00:00 Rack: /opt/web-services/sites/viruscheck/current
00:00:04 Passenger ApplicationSpawner: /opt/web-services/sites/core/current
00:00:07 Rack: /opt/web-services/sites/core/current
00:00:00 Rack: /opt/web-services/sites/core/current
00:00:00 Rack: /opt/web-services/sites/core/current
00:00:00 Rack: /opt/web-services/sites/core/current
00:00:01 Passenger ApplicationSpawner: /opt/web-services/sites/actionplan/current
00:00:00 Rack: /opt/web-services/sites/actionplan/current
00:00:00 Rack: /opt/web-services/sites/actionplan/current
00:00:00 Rack: /opt/web-services/sites/actionplan/current
00:00:00 Rack: /opt/web-services/sites/actionplan/current
00:00:02 Rack: /opt/web-services/sites/storage-master/current
00:00:02 Rack: /opt/web-services/sites/storage-master/current
00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/transform/current
00:00:00 Rack: /opt/web-services/sites/transform/current
00:00:02 Rack: /opt/web-services/sites/storage-master/current
00:00:00 Rack: /opt/web-services/sites/transform/current
00:00:02 Rack: /opt/web-services/sites/silos/current
00:00:02 Rack: /opt/web-services/sites/silos/current
00:00:02 Rack: /opt/web-services/sites/storage-master/current
00:00:00 Rack: /opt/web-services/sites/transform/current
00:00:00 Passenger ApplicationSpawner: /opt/web-services/sites/xmlresolution/current
00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
00:00:00 Rack: /opt/web-services/sites/transform/current
00:00:02 Rack: /opt/web-services/sites/silos/current
00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
00:00:02 Rack: /opt/web-services/sites/silos/current
00:00:00 Rack: /opt/web-services/sites/xmlresolution/current
00:00:00 grep daitss
00:00:00 crond
00:00:00 /bin/bash /usr/bin/run-parts /opt/fda/etc/daitss.daily
00:00:00 /usr/sbin/sendmail -FCronDaemon -i -odi -oem -oi -t
00:00:00 /usr/sbin/postdrop -r
00:00:00 /bin/sh /opt/fda/etc/daitss.daily/run-collect-fixities
00:00:00 awk -v progname=/opt/fda/etc/daitss.daily/run-collect-fixities progname {????? print progname ":\n"????? progname="";???? }???? { print; }
00:31:51 collect-fixities

Ingest seems to be processing. Once all submitted packages are ingested I will attempt to submit one package at a time.

lydiam · 2015-12-14T20:53:46Z

Now the AIP descriptor validation failed with the error we were getting on Friday:

descriptor fails daitss aip xml validation (1 errors)
unknown,: #<EOFError: end of file reached>

I reset the package and started manually. That worked on Friday.

Failed again with the same error. I will restart daitss again.

I restarted daitss and stopped pulse.

Restarted another package manually and it appears to be succeeding. Archived E30MZ6Y4F_63VOVM .
Another package archived.

Other packages failed repeatedly at aip descriptor validation.

lydiam · 2015-12-14T20:54:56Z

Gerald suggested that collect-fixities be killed. I did that and issued a stop-cron command: sudo -u daitss touch /opt/fda/etc/stop-cron.

I stopped and started daitss again, and the rack processes still show up.

lydiam · 2015-12-14T20:59:06Z

Packages are failing repeatedly with multiple errors such as:

2015-12-14 15:55:14 -0500
message

descriptor fails daitss aip xml validation (1 errors)
unknown,: #

trace

/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/template/descriptor.rb:49:in validate_aip_descriptor' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:41:inblock in ingest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/journal.rb:16:in step' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/ingest.rb:41:iningest'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:82:in block in spawn' /opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:infork'
/opt/web-services/sites/core/releases/20141117161651/lib/daitss/proc/wip/process.rb:66:in spawn' /opt/web-services/sites/core/releases/20141117161651/app.rb:949:inblock in '
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1602:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1602:inblock in compile!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:in []' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:inblock (3 levels) in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:985:in route_eval' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:966:inblock (2 levels) in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1006:in block in process_route' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:incatch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1004:in process_route' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:964:inblock in route!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:in each' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:963:inroute!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:in block in dispatch!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:inblock in invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in catch' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:ininvoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:in dispatch!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:inblock in call!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in block in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:incatch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:incall!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/commonlogger.rb:33:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:217:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:945:inforward'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1019:in route_missing' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:980:inroute!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:976:in route!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1076:inblock in dispatch!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in block in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:incatch'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in invoke' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1073:indispatch!'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in block in call!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:inblock in invoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:in catch' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1058:ininvoke'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:898:in call!' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:886:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/xss_header.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/path_traversal.rb:16:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/json_csrf.rb:18:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/base.rb:49:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-protection-1.5.3/lib/rack/protection/frame_options.rb:31:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/session/abstract/id.rb:225:in context' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/session/abstract/id.rb:220:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/nulllogger.rb:9:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/head.rb:11:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/rack-1.5.2/lib/rack/methodoverride.rb:21:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:180:incall'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:2014:in call' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:inblock in call'
/opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1788:in synchronize' /opt/web-services/sites/core/shared/bundle/ruby/1.9.1/gems/sinatra-1.4.5/lib/sinatra/base.rb:1478:incall'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/request_handler.rb:97:in process_request' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_request_handler.rb:520:inaccept_and_process_next_request'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_request_handler.rb:274:in main_loop' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:206:instart_request_handler'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:171:in block in handle_spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/utils.rb:470:insafe_fork'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:166:in handle_spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:357:inserver_main_loop'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:206:in start_synchronously' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:180:instart'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/rack/application_spawner.rb:129:in start' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:253:inblock (2 levels) in spawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:132:in lookup_or_add' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:246:inblock in spawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:82:in block in synchronize' :10:insynchronize'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server_collection.rb:79:in synchronize' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:244:inspawn_rack_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:137:in spawn_application' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/spawn_manager.rb:275:inhandle_spawn_application'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:357:in server_main_loop' /opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/lib/phusion_passenger/abstract_server.rb:206:instart_synchronously'
/opt/ruby-1.9.3-p545/lib/ruby/gems/1.9.1/gems/passenger-3.0.21/helper-scripts/passenger-spawn-server:99:in `
'
Reset and restarted another package and it seems to be working past the aip descriptor validation point.

lydiam · 2015-12-14T21:13:53Z

Since packages aren't ingesting I tried to submit SIPs one at a time and got the java runtime error and a reject message:
[lydiam@fclnx30 FTPDL_20151212T090803]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --package /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/AA00025584_00046 --batch FTPDL_20151212T090803
[lydiam@fclnx30 FTPDL_20151212T090803]$ #
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/hs_err_pid6107.log

jonpitts · 2015-12-14T21:14:17Z

It has been a while but I seem to remember we setup daitss to spawn multiple occurrences of specific web services to help with load. Such as more than one description service.

Some of the other processes listed above are spawned by daitss such as a mailer daemon and connection to the database. Also, anytime a user logs into daitss Apache will spawn a core process for that user. Not saying that there isn't something wrong with the number of daitss processes but its too hard for me to tell.

I would focus on the JVM error. One of two things must have occured for that error to generate. The system either ran out of Swap memory having gone through the other forms of memory first. Or the JVM settings need to be tweaked in the daitss config file.

Daitss relies on the JRE(Java Runtime Environment) for ruby gem RJB https://rubygems.org/gems/rjb/versions/1.5.4 to do xml_validation.

We use it here: https://github.com/daitss/core/blob/master/lib/daitss/proc/xmlvalidation.rb#L54

In the daitss config you will need to look at this line:
https://github.com/daitss/core/blob/master/daitss-config.example.yml#L78

Xms2G - the starting heap size is 2GB
Xmx2G - the max heap size is 2GB

Basically, every JVM spawned will take up 2GB of memory regardless of what is needed. You may want to decrease the starting heap size to Xms256m and increase the max heap size to Xmx4G. This could help with memory allocation. It will allow a single JVM process to grow larger than 2GB if needed but keep other JVM processes smaller. Keep in mind this is specific to the use of the JRE in daitss and not other processes on the daitss server using a JVM.

jonpitts · 2015-12-14T22:10:47Z

Something to think about - if there are other memory errors not relating to the JRE that can indicate a host of other potential problems. You will need a baseline. Stop daitss and look at the memory usage. Does it look normal?

Check swap space also:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Swap_Space-Checking_Swap_Space_Size_and_Usage.html

Fire things up and see what memory looks like. It may be a good idea to disable GUI submission or better yet disable the UI completely. Submit packages from the command line and see what's going on with memory.

lydiam · 2015-12-14T22:20:32Z

Current settings in /opt/web-services/conf.d/daitss-config.ymljjj

jvm_options:
- -Xmx4096m
- -Dhttp.proxyHost=localhost
- -Dhttp.proxyPort=3128

describe.fda.fcla.edu:
log_syslog_facility: LOG_LOCAL0
max_pdf_bitstreams: 100
pdfa_validator: /opt/pdfapilot/pdfaPilot -a $INPUT_FILE$ --noprogress --nohits --level=$LEVEL$ --report=XML,ALWAYS,PATH=$REPORT_FILE$
jvm_options:
- -Dhttp.proxyHost=localhost
- -Dhttp.proxyPort=3128
- -Xmx512m

(no Xms setting at all)

lydiam · 2015-12-15T14:33:08Z

NWRDC has just rebooted fclxn30, and daitss processes and pulse started up. Pulse started ingest on

 daitss   10884 10398 33 09:04 ?        00:05:15 E38C3ZIKV_YDQ7FX.ingest

Thats a 7GB+ package, so it will take some time to process. I've stopped pulse. I will not attempt submission on any new packages right now. There are no new packages ftp-ed to eclipsep yesterday.

Actual memory use is 6% and 0% swap, with physical 44.9%

I'm guessing that it's just trying to store the package, because all other steps were completed yesterday:

make aip descriptor 2015-12-14T14:03:49-05:00 112.63
validate aip descriptor 2015-12-14T14:05:41-05:00 007.29
make tarball 2015-12-14T14:05:49-05:00 070.93

I'm seeing some Passenger processes that are unfamiliar to me:

 root     10322 10307  0 09:04 ?        00:00:00 PassengerWatchdog
 root     10325 10322  1 09:04 ?        00:00:20 PassengerHelperAgent
 root     10327 10325  0 09:04 ?        00:00:00 Passenger spawn server

This package errored out when DAITSS was shut down while it was storing.

jonpitts · 2015-12-15T14:46:17Z

Xms setting is optional. Leaving it off will start the JVM with a heap size of 0. This is okay as long as there is memory to be allocated when the JVM needs it. It will only allocate up to the Xmx setting.

I would run some of the packages that failed before and monitor the memory. If they succeed then its not really a problem with the JVM. It propbably means the JVM cannot allocate memory because there is none to be had. I'd be willing to bet fclxn30 needs more memory. If that is not an option I would look at increasing the swap size to compensate. It still may be a good idea to disable the GUI for a time.

lydiam · 2015-12-15T14:51:18Z

There doesn't appear to be any current option to stop only the daitss GUI, so we'd require a developer to provide that option.

jonpitts · 2015-12-15T14:54:53Z

There is an option for the operator to disable GUI submission from the UI. Affiliates may still try to download/ disseminate packages.

A sysadmin should be able to setup a type of redirect for anyone outside of your organization. That would keep people from being able to connect to the GUI.

lydiam · 2015-12-15T15:42:33Z

All packages currently in the workspace were manually restarted and successfully processed.

lydiam · 2015-12-15T15:50:20Z

Retried submission of a single package that had failed yesterday.

Failed again, with the same symptoms:

   [lydiam@fclnx30 UF00028293_00564]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --package /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/AA00025584_00046
   [sudo] password for lydiam: 
   [lydiam@fclnx30 UF00028293_00564]$ #
   # There is insufficient memory for the Java Runtime Environment to continue.
   # Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
   # An error report file with more information is saved as:
   # /var/daitss/ops/incoming/ftpdl/FTPDL_20151212T090803/UF00028293_00564/hs_err_pid6121.log

Xymon reports memory usage at 6% actual.

Attempted to submit one of the FDA_benchmarks packages with the same results:

   [lydiam@fclnx30 ETDs]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source benchmarks --package /var/daitss/ops/FDA_benchmarks/ETDs/FDAE001612

[lydiam@fclnx30 ETDs]$ #
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/FDA_benchmarks/ETDs/hs_err_pid7425.log

lydiam · 2015-12-15T16:01:59Z

GUI submission for the above benchmark package succeeded: E4P2B1PWE_5W0LH1

lydiam · 2015-12-15T16:14:03Z

Downloading SIPs and submitting them via the GUI, where possible, and starting them manually.
Package AA00025584_00046, rejected 3 times via command-line submit, was successfully manually submitted via the GUI and successfully archived after manually starting the package.

There appears to be some problem with the command-line submit and pulse processes. What's the difference between a package started manually and one started by pulse?

lydiam · 2015-12-15T19:35:01Z

Bill K. experimented with submit, submitting as Stephen's user and starting pulse as Stephen's user and also removed the memory limit from my user. I resubmitted the batch of records and they appear to have submitted successfully and are now being successfully processed.

So my biggest problem appears to have been the memory limit on my user. Bill thinks that there are multiple problems, including a possibly memory leak. More details to follow.

lydiam · 2015-12-15T19:37:15Z

Bill's assessment of the problem:

We actually have 3 different issues. One is the memory problems upon submission. I think we have resolved this memory issue. There was a parameter in Lydia’s .bash_profile that was limiting her memory usage. This seems to have resolved this issue, but want to wait a little while longer before declaring victory.

The second issue, the EOF error, has been an intermittent problem for a while now according to Lydia. There are no filesystem, CPU or memory constraints on the server during the time the error occurs. It does not seem to depend on who submitted the package to the app, who started the app, or how long the server has been up.

The third is another memory issue, in which all of the server memory is consumed by the daitss processes. This looks to be a memory leak in the application.

I do not think the second and third problems are hardware or OS related. I have exhausted my limited Ruby skills looking at the problem.

lydiam · 2015-12-15T21:48:33Z

I’ve mostly cleaned up the mess created from multiple submissions/rejects of UF records and submitted a batch of FIU records that were FTP-ed on Friday and am getting the same submit memory error again. Only 2 packages submitted successfully, and 18 got the same error:

sip path: /var/daitss/ops/incoming/nonUF_UFAD/FIU/FI15042500;
Invalid SIP descriptor. XML validation errors:
line:unknown: msg:#<EOFError: end of file reached>

Actual memory is at 21% and physical at 79%

[lydiam@fclnx30 ops]$ sudo -b -u daitss submit --username lydiam --password fclad00d2 --source FTPDL --path /var/daitss/ops/incoming/nonUF_UFAD/FIU --batch FTPDL_20151212T090803_FIU
[sudo] password for lydiam:
[lydiam@fclnx30 ops]$ #
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 16 bytes for AllocateHeap
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25083.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# pthread_getattr_np
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25106.log
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 41 bytes for AllocateHeap
# An error report file with more information is saved as:
# /var/daitss/ops/hs_err_pid25127.log

jonpitts · 2015-12-15T21:58:33Z

Assuming the same packages fail again and again with the same Java error and it doesn't appear the physical memory capped out maybe its time to try increasing the JVM memory setting.

Instead of -Xmx4096m try -Xmx8G. That will allow the JVM to grab more memory if needed. If it has no impact then switch it back.

Also, is there something about these packages that are in common? Stephen seemed to indicate one of the affiliates is sending large/ unusual packages. Are there a lot of .mov files in these packages?

lydiam · 2015-12-16T13:27:29Z

Retried the submit command with a new session. I likely used an old session to do the submission, one that didn't register the memory changes that Bill K. made for my user.

jonpitts · 2015-12-16T15:37:15Z

It may be a worthwhile attempt to find out what user Apache runs as and determine if it has any of those same memory restrictions. The user will typically be "apache" or "nobody" and is usually configured in an httpd.conf file. It could even be configured to run as the "daitss" user.

cchou · 2016-01-07T20:31:26Z

I remember the EOF error is what the xml parser returns whenever the parser has a problem parsing an XML file. We used get this error with those invalid characters in the descriptors. The work done in submit has fixed most of the EOF, however, there could be some additional scenarios which can cause XML parsing to fail.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

submit command results in memory errors and rejects with <EOFError: end of file reached> invalid SIP descriptor errors and defunct processes #777

submit command results in memory errors and rejects with <EOFError: end of file reached> invalid SIP descriptor errors and defunct processes #777

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

jonpitts commented Dec 14, 2015

jonpitts commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 16, 2015

jonpitts commented Dec 16, 2015

cchou commented Jan 7, 2016

submit command results in memory errors and rejects with <EOFError: end of file reached> invalid SIP descriptor errors and defunct processes #777

submit command results in memory errors and rejects with <EOFError: end of file reached> invalid SIP descriptor errors and defunct processes #777

Comments

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 14, 2015

jonpitts commented Dec 14, 2015

jonpitts commented Dec 14, 2015

lydiam commented Dec 14, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

lydiam commented Dec 15, 2015

jonpitts commented Dec 15, 2015

lydiam commented Dec 16, 2015

jonpitts commented Dec 16, 2015

cchou commented Jan 7, 2016