Hello
I have been using Datashare v. 19.8.9 for some time. To override some limits I decided to dig in, and I recompiled some Java files.
I have a huge data collection, around 35M files, with deep nesting and widely varying file sizes.
- IndexTask.java: this holds the default timeout for indexing. The original value was 30 minutes; I raised it to 240 minutes.
- MemoryDocumentCollectionFactory.java: the Queue Capacity was 1M files; I raised this parameter from 1M to 10M.
- SecureContentHandler.java: this holds the parameter for nesting. The original value was 100; I raised it to 2000.
- Response timeout: I found index-R3tvpy4W.js and index-pzTkhhgC.js, where the requestTimeout was 6e4; I raised it to 6e5.
- XMLReaderUtils.java: during indexing I also got tons of warnings about the need to increase an XMLReaderUtils value. The original parameter was 10; I raised it to 100.
I compiled them, took the resulting class files, and put them in the right place inside the Datashare jar file.
Now I am trying Datashare 20.5.2. In the general setup there are settings for Queue Capacity (1M by default) and Max Content Length (20M by default). These options appear in the menu, but they do not seem to have any effect. Whatever I enter for Max Content Length, during indexing I get messages like this:
document id 6b351db21823fae5dc49846d0fef891f42dcc85450e9e68c9de169d3293f6fddf61e3762967e6a29506ecdc241acc5ec extracted text will be truncated to 20000000 bytes
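The message is consistent with a hard cap on the extracted text that ignores the configured value. As a rough illustration of that kind of cap, here is a minimal sketch, assuming the limit is applied to the UTF-8 bytes of the text. `TruncateSketch`, `truncateUtf8` and `maxBytes` are hypothetical names of mine, not Datashare's code.

```java
import java.nio.charset.StandardCharsets;

public class TruncateSketch {
    // Hypothetical helper: keep at most maxBytes of the UTF-8 encoding of text,
    // backing off so that a multi-byte character is never cut in half.
    static String truncateUtf8(String text, int maxBytes) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return text; // already within the limit, nothing to truncate
        }
        int end = maxBytes;
        // UTF-8 continuation bytes look like 10xxxxxx; step back to a character start.
        while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(bytes, 0, end, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int configuredLimit = 10; // stand-in for the Max Content Length setting
        String extracted = "h\u00e9llo w\u00f6rld, long text";
        String kept = truncateUtf8(extracted, configuredLimit);
        System.out.println(kept.getBytes(StandardCharsets.UTF_8).length + " bytes kept");
    }
}
```

If the setting were honoured, I would expect the number in the log message to follow whatever is entered in the menu instead of staying at 20000000.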
I tried giving it a folder with 6M files, but only the first million were indexed, so Queue Capacity doesn't work either.
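To make the symptom concrete, this is how a fixed-capacity queue behaves when it is offered more items than it can hold and nothing drains it. It is a minimal sketch using `java.util.concurrent.ArrayBlockingQueue`; I am only assuming that something similar happens inside Datashare, and `QueueCapacitySketch` and `enqueueAll` are hypothetical names, not taken from its source.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueCapacitySketch {
    // Hypothetical helper: try to enqueue `total` items into a queue limited to
    // `capacity` entries and report how many were actually accepted.
    static int enqueueAll(int capacity, int total) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < total; i++) {
            // offer() returns false instead of blocking when the queue is full
            if (queue.offer(i)) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // Scaled down from 1M capacity / 6M files to keep the example light.
        System.out.println(enqueueAll(1_000, 6_000) + " of 6000 accepted");
    }
}
```

With the capacity stuck at its default, everything past the limit is simply never queued, which matches what I see with the 6M-file folder.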
In Datashare 19.8.9 I had already figured out what to do to change this max content limit from 20M.
Why are these options in the config menu in Datashare 20.5.2 if they still don't work?
Datashare is interesting, but the options menu needs to be fixed.