backup solr: should save all of /data/solr, not just the index #4
Conversation
The `catalog_search.ipynb` notebook (https://pavics.ouranos.ca/jupyter/user/public/lab/tree/tutorial-notebooks/catalog_search.ipynb) was failing with this error:

owslib.wps.WPSException : {'code': 'NoApplicableCode', 'locator': 'None', 'text': 'Process error: method=wps_pavicsearch.py._handler, line=254, msg=Traceback (most recent call last):\n File "/usr/local/lib/python2.7/dist-packages/pavics_datacatalog-0.6.11-py2.7.egg/pavics_datacatalog/wps_processes/wps_pavicsearch.py", line 251, in _handler\n output_format=output_format)\n File "/usr/local/lib/python2.7/dist-packages/pavics/catalog.py", line 973, in pavicsearch\n r.raise_for_status()\n File "/usr/lib/python2.7/dist-packages/requests/models.py", line 840, in raise_for_status\n raise HTTPError(http_error_msg, response=self)\nHTTPError: 400 Client Error: Bad Request for url: http://pavics.ouranos.ca:8983/solr/birdhouse/select?start=0&rows=10&q=*&fq=variable:%22tasmin%22&fq=project:%22CMIP5%22&fq=experiment:%22rcp85%22&fq=frequency:%22day%22&fl=*,score&fq=type:File&sort=id+asc&wt=json&indent=true\n'}

Interestingly, the canarie monitoring of the Catalog service was working fine. It turns out the file `/data/solr/birdhouse/conf/managed-schema` was important. Diff of that `managed-schema` file against a working one from CRIM:

```diff
$ diff /data/solr/solr/birdhouse/conf/managed-schema /tmp/good-file
1c1
< <?xml version="1.0" encoding="UTF-8"?>
---
> <?xml version="1.0" encoding="UTF-8"?>
48a49,51
> <field name="dataset_id" type="string" stored="true"/>
> <field name="datetime_max" type="date" stored="true"/>
> <field name="datetime_min" type="date" stored="true"/>
50a54
> <field name="fileserver_url" type="string" stored="true"/>
55a60
> <field name="latest" type="boolean" stored="true"/>
58a64
> <field name="replica" type="boolean" stored="true"/>
63a70
> <field name="type" type="string" stored="true"/>
```

The good file has a few more fields! Replaced the bad file with the good file and `catalog_search.ipynb` works again. Will launch the crawler again to really refresh the data, but at least now the Catalog service is working.
@davidcaron if you migrate more servers, use this updated backup script to avoid breaking the Catalog service again. |
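For context, the fix this issue tracks is to back up the whole `/data/solr` tree (configs such as `conf/managed-schema` included) instead of only the index. Below is a minimal sketch of that idea; the paths and archive destination are assumptions, and this is not the actual updated backup script referenced above:

```python
# Archive all of /data/solr (configs included), not just the index directory.
import datetime
import tarfile

SOLR_DATA = "/data/solr"                      # assumed host path of the Solr data volume
stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
archive = "/backup/solr-%s.tar.gz" % stamp    # assumed backup destination

with tarfile.open(archive, "w:gz") as tar:
    tar.add(SOLR_DATA, arcname="solr")
print("wrote", archive)
```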
Crawler re-launched:

$ curl --include "http://boreas.ouranos.ca:8086/pywps?service=WPS&request=execute&version=1.0.0&identifier=pavicrawler&storeExecuteResponse=true&status=true&DataInputs="
HTTP/1.1 200 OK
Date: Wed, 22 Jan 2020 21:09:48 GMT
Server: Apache/2.4.18 (Ubuntu)
Content-Length: 1010
Vary: Accept-Encoding
Content-Type: text/xml; charset=utf-8
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd" service="WPS" version="1.0.0" xml:lang="en-US" serviceInstance="http://localhost/wps?request=GetCapabilities&amp;service=WPS" statusLocation="https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>pavicrawler</ows:Identifier>
<ows:Title>PAVICS Crawler</ows:Title>
<ows:Abstract>Crawl thredds server and write metadata to SOLR database.</ows:Abstract>
</wps:Process>
<wps:Status creationTime="2020-01-22T21:09:48Z">
<wps:ProcessAccepted percentCompleted="0">PyWPS Process pavicrawler accepted</wps:ProcessAccepted>
</wps:Status>
</wps:ExecuteResponse>

Status location: https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml

$ curl --include https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml
HTTP/1.1 200 OK
Server: nginx/1.13.6
Date: Wed, 22 Jan 2020 21:12:39 GMT
Content-Type: text/xml
Content-Length: 994
Last-Modified: Wed, 22 Jan 2020 21:09:49 GMT
Connection: keep-alive
ETag: "5e28ba1d-3e2"
Accept-Ranges: bytes
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd" service="WPS" version="1.0.0" xml:lang="en-US" serviceInstance="http://localhost/wps?request=GetCapabilities&amp;service=WPS" statusLocation="https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>pavicrawler</ows:Identifier>
<ows:Title>PAVICS Crawler</ows:Title>
<ows:Abstract>Crawl thredds server and write metadata to SOLR database.</ows:Abstract>
</wps:Process>
<wps:Status creationTime="2020-01-22T21:09:49Z">
<wps:ProcessStarted percentCompleted="10">Calling pavicrawler</wps:ProcessStarted>
</wps:Status>
</wps:ExecuteResponse> |
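For reference, the manual polling of the status document done with curl below can also be scripted; a minimal sketch that re-fetches the statusLocation until the asynchronous WPS job reports success or failure (plain requests and string matching, no owslib required):

```python
# Poll the WPS statusLocation until the job finishes; the XML switches from
# ProcessAccepted/ProcessStarted to ProcessSucceeded or ProcessFailed.
import time
import requests

status_url = ("https://pavics.ouranos.ca/wpsoutputs/catalog/"
              "85e8b1d8-3d5b-11ea-829a-0242ac120012.xml")

while True:
    xml = requests.get(status_url).text
    if "ProcessSucceeded" in xml or "ProcessFailed" in xml:
        print(xml)
        break
    time.sleep(30)  # still accepted/started, check again later
```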
Oh crap, re-crawl failed! @davidcaron any quick hint?

$ curl --include https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml
HTTP/1.1 200 OK
Server: nginx/1.13.6
Date: Wed, 22 Jan 2020 21:23:57 GMT
Content-Type: text/xml
Content-Length: 2912
Last-Modified: Wed, 22 Jan 2020 21:17:19 GMT
Connection: keep-alive
ETag: "5e28bbdf-b60"
Accept-Ranges: bytes
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd" service="WPS" version="1.0.0" xml:lang="en-US" serviceInstance="http://localhost/wps?request=GetCapabilities&amp;service=WPS" statusLocation="https://pavics.ouranos.ca/wpsoutputs/catalog/85e8b1d8-3d5b-11ea-829a-0242ac120012.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>pavicrawler</ows:Identifier>
<ows:Title>PAVICS Crawler</ows:Title>
<ows:Abstract>Crawl thredds server and write metadata to SOLR database.</ows:Abstract>
</wps:Process>
<wps:Status creationTime="2020-01-22T21:17:19Z">
<wps:ProcessFailed>
<wps:ExceptionReport>
<ows:Exception exceptionCode="NoApplicableCode" locator="None">
<ows:ExceptionText>Process error: method=wps_pavicrawler.py._handler, line=146, msg=Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/pavics_datacatalog-0.6.11-py2.7.egg/pavics_datacatalog/wps_processes/wps_pavicrawler.py", line 144, in _handler
headers=headers, verify=self.verify)
File "/usr/local/lib/python2.7/dist-packages/pavics/catalog.py", line 476, in pavicrawler
headers=headers, verify=verify)
File "/usr/local/lib/python2.7/dist-packages/pavics/catalog.py", line 280, in thredds_crawler
verify=verify):
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 33, in crawl
for ds in crawl(ref.url, skip, depth - 1, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 33, in crawl
for ds in crawl(ref.url, skip, depth - 1, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 33, in crawl
for ds in crawl(ref.url, skip, depth - 1, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 33, in crawl
for ds in crawl(ref.url, skip, depth - 1, **kwargs):
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 28, in crawl
cat = read_url(url, skip, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 52, in read_url
return read_xml(req.text, url)
File "/usr/local/lib/python2.7/dist-packages/threddsclient/client.py", line 73, in read_xml
raise ValueError("Does not appear to be a Thredds catalog")
ValueError: Does not appear to be a Thredds catalog
</ows:ExceptionText>
</ows:Exception>
</wps:ExceptionReport>
</wps:ProcessFailed>
</wps:Status> |
Not sure... Check the user has the permissions to access thredds, the information is in So the |
The crawler seems to crawl up to a certain depth... and at some point gets something that it expects to be a thredds document but is not... |
THREDDS is now configured to serve *.txt files as well. Could that be an issue? |
Not impossible... One way to be sure would be to build a custom image of the catalog that logs every request it makes when crawling. |
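A lighter alternative to rebuilding the image is to turn on wire-level logging for the HTTP stack the crawler already uses; a minimal sketch, assuming the crawl is reproduced from a python shell with the same threddsclient package (the catalog URL is illustrative):

```python
# Log every HTTP request the crawler makes: requests/urllib3 emit DEBUG records
# such as "Starting new HTTPS connection" and "GET /thredds/catalog/... 200".
import logging
logging.basicConfig(level=logging.DEBUG)

from threddsclient import crawl

for ds in crawl("https://pavics.ouranos.ca/thredds/catalog.xml", depth=1):
    print(ds.url)
```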
*.txt files on Thredds probably did not cause that. On my test server, I have this kind of dataset:

$ tree
.
├── testdata
│ ├── secure
│ │ ├── tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200601-200612.nc
│ │ ├── tasmax_Amon_MPI-ESM-MR_rcp45_r1i1p1_200701-200712.nc
│ │ ├── tasmax_Amon_MPI-ESM-MR_rcp45_r2i1p1_200601-200612.nc
│ │ └── TEST.txt
│ └── TEST.txt
├── TEST.txt
└── wps_outputs

Crawling worked fine and found the 3 .nc files. There are these errors in the Catalog service, but it looks like they are harmless:

$ docker exec catalog bash -c 'tail -f /var/log/apache2/*'
(...)
syntax error, unexpected WORD_STRING, expecting WORD_WORD
context: Error { code = 500; message = "java.io.EOFException: Reading /pavics-data/testdata/TEST.txt at 5 file length = 5"^;};
syntax error, unexpected WORD_STRING, expecting WORD_WORD
context: Error { code = 500; message = "java.io.EOFException: Reading /pavics-data/testdata/secure/TEST.txt at 5 file length = 5"^;}; Let's hope it's just a glitch, I'll retry the crawling again. |
New crawler status location: https://pavics.ouranos.ca/wpsoutputs/catalog/0b9c06e4-3d8a-11ea-b543-0242ac120012.xml

I enabled debug logging on the Catalog service this time. Hope to get more hints if it fails. |
Same error again :( Will continue investigation tomorrow. |
Absolutely nothing useful in the debug logs. I guess I will have to patch the docker image for more useful logs. |
@davidcaron So on my test server that only has the 3 .nc files above, the resulting |
I hacked up the Catalog container with this change bird-house/threddsclient@master...bird-house:debug-crawl-failure and managed to get this more useful error:

ValueError: u'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/hummingbird/e5c0b950-3277-11ea-b357-0242ac120010/catalog.xml': Does not appear to be a Thredds catalog, xml=u'<?xml version="1.0" encoding="utf-8"?>\n<ExceptionReport version="1.0.0"\n xmlns="http://www.opengis.net/ows/1.1"\n xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">\n <Exception exceptionCode=" NoApplicableCode" locator="NotAcceptable">\n <ExceptionText>Request failed: HTTPConnectionPool(host='pavics.ouranos.ca', port=8083): Max retries exceeded with url: /twitcher/ows/proxy/thredds/catalog/birdhouse/wps_outputs/hummingbird/e5c0b950-3277-11ea-b357-0242ac120010/catalog.xml (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fad01ec5b90>: Failed to establish a new connection: [Errno -3] Try again'))</ExceptionText>\n </Exception>\n</ExceptionReport>'

Looks like the transmission is cut during the xml file body transfer. Also, it's weird we are parsing stuff under wps_outputs in the first place. All the urls are under Twitcher, which could possibly explain the transmission cut (does the amount of data transferred exceed Twitcher's capacity?). Will try to remove Twitcher and have the Catalog directly hit Thredds. |
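The kind of change that surfaces this context is small; below is a minimal sketch (an assumption, not the actual linked bird-house/threddsclient patch) of wrapping the threddsclient read so the parse error carries the offending URL and the body that came back:

```python
# Sketch only: threddsclient.client.read_xml raises
# ValueError("Does not appear to be a Thredds catalog"); this wrapper re-raises
# it with the catalog URL and the raw response body attached for debugging.
import requests
from threddsclient.client import read_xml

def read_url_with_context(url, **kwargs):
    resp = requests.get(url, **kwargs)
    try:
        return read_xml(resp.text, url)
    except ValueError as err:
        raise ValueError(u"{0!r}: {1}, xml={2!r}".format(url, err, resp.text))
```

With that context, the failure above immediately shows that the "catalog" being parsed is actually an OWS ExceptionReport returned in place of the THREDDS XML.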
@tlvu I remind you that wps_outputs is a shared docker volume between all wps providers and thredds. This way thredds can provide file/opendap/wms access facilities. So, being part of the birdhouse catalog, it is indeed parsed by the crawling process. |
I think Blaise had done something about this (splitting user files vs source files). |
But if I'm right, output files should be deleted after some time, so they should not be indexed... Malleefowl has a function to persist output files (https://github.com/Ouranosinc/malleefowl/blob/pavics-dev/malleefowl/processes/wps_persist.py), which was called in some workflows or accessed via a frontend option, and was required before indexing output files. |
So I think that the "full"crawling option has been shortsighted as it assumes a fresh volume of a new deployment. |
When I try to run the crawler from a simple python environment (doing only the following):

from threddsclient import crawl
url = 'https://pavics.ouranos.ca/thredds/catalog/birdhouse/wps_outputs/hummingbird/catalog.xml'
for n, ds in enumerate(crawl(url, depth=1)):
    print(n, ds, ds.url)

It takes 10 minutes to run for all the hummingbird wps_outputs. Also, notice I'm not passing through twitcher. (Edit: That might not be entirely true, because the urls returned by thredds are passing through twitcher.) The crawler sometimes finishes, sometimes stops at a different dataset every time with a connection error... |
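Since the failures are intermittent, one simple mitigation to test from that same python environment is to retry the whole traversal; a minimal sketch, assuming the connection errors are transient (`crawl` is the same threddsclient function used above):

```python
# Retry the whole crawl a few times; this does not resume mid-traversal,
# it simply re-runs it when a transient connection error kills it.
import time
from threddsclient import crawl

def crawl_with_retries(url, depth=1, attempts=3, pause=30):
    for attempt in range(1, attempts + 1):
        try:
            return [(str(ds), ds.url) for ds in crawl(url, depth=depth)]
        except Exception as err:  # connection errors surface as requests/urllib3 exceptions
            print("attempt %d failed: %s" % (attempt, err))
            time.sleep(pause)
    raise RuntimeError("crawl still failing after %d attempts" % attempts)

datasets = crawl_with_retries(
    'https://pavics.ouranos.ca/thredds/catalog/birdhouse/wps_outputs/hummingbird/catalog.xml')
print(len(datasets), "datasets found")
```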
I am stumped. I sort of (not 100% sure) removed Twitcher, then the front Nginx in front of Thredds (basically more or less undoing this PR https://github.com/Ouranosinc/PAVICS/pull/162, which might not be enough to completely remove Twitcher/Nginx in front), see diff update-catalog-config...debug-catalog-crawl-failure, and still have the same error "ValueError Does not appear to be a Thredds catalog, xml".

Note the hostname and port changes: it tries "https://pavics.ouranos.ca/thredds/catalog" but ends up with "Request failed: HTTPConnectionPool(host='boreas.ouranos.ca', port=8083): Max retries exceeded with url: /thredds/catalog".

This error seems to occur only on Thredds with a lot of data. On my test server with 3 .nc files and 3 .txt files the crawl works fine. Can CRIM try the crawl on your side, on a big and a small Thredds server?

The debugging changes above are done directly on our production Boreas since I am not able to reproduce the problem somewhere else. I still made sure Jenkins and the Canarie monitoring are still OK.

Full error:

ValueError: u'https://pavics.ouranos.ca/thredds/catalog/birdhouse/wps_outputs/hummingbird/c61c6948-3c2f-11ea-a46b-0242ac120014/catalog.xml': Does not appear to be a Thredds catalog, xml=u'<?xml version="1.0" encoding="utf-8"?>\n<ExceptionReport version="1.0.0"\n xmlns="http://www.opengis.net/ows/1.1"\n xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n xsi:schemaLocation="http://www.opengis.net/ows/1.1 http://schemas.opengis.net/ows/1.1.0/owsExceptionReport.xsd">\n <Exception exceptionCode="NoApplicableCode" locator="NotAcceptable">\n <ExceptionText>Request failed: HTTPConnectionPool(host='boreas.ouranos.ca', port=8083): Max retries exceeded with url: /thredds/catalog/birdhouse/wps_outputs/hummingbird/c61c6948-3c2f-11ea-a46b-0242ac120014/catalog.xml (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fad01d7b3d0>: Failed to establish a new connection: [Errno -3] Try again'))</ExceptionText>\n </Exception>\n</ExceptionReport>'

Full status location for reference:

curl --include https://pavics.ouranos.ca/wpsoutputs/catalog/65d8b050-3ebe-11ea-89dd-0242ac120012.xml
|
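One detail worth noting in both tracebacks: "[Errno -3] Try again" is `socket.EAI_AGAIN`, a temporary name-resolution failure, not a refused or cut TCP connection. A quick sanity check that could be run from inside the catalog container (hostnames taken from the errors above; the port is illustrative):

```python
# Check that the hostnames appearing in the crawl errors resolve from inside
# the container; EAI_AGAIN ("[Errno -3] Try again") would show up as gaierror here.
import socket

for host in ("pavics.ouranos.ca", "boreas.ouranos.ca"):
    try:
        infos = socket.getaddrinfo(host, 80)
        print(host, "->", sorted({info[4][0] for info in infos}))
    except socket.gaierror as err:
        print(host, "-> resolution failed:", err)
```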
If the problem is related to text files, one option would be for THREDDS to index only
|
catalog: use public hostname in config when using self-signed SSL behind real SSL from pagekite

Fix magpie connection error like:

```
<ows:ExceptionText>Process error: method=wps_pavicrawler.py._handler, line=146, msg=Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/pavics_datacatalog-0.6.11-py2.7.egg/pavics_datacatalog/wps_processes/wps_pavicrawler.py", line 125, in _handler
    verify=self.verify)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 523, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 480, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 588, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 447, in send
    raise SSLError(e, request=request)
SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)
</ows:ExceptionText>
```

The Thredds url needs to use the public hostname too so the path recorded in Solr is the good public one. The wms_alternate_server, not sure what it impacts, but it looks like it might be useful so change it too. Fix needed to investigate the crawling problem in #4 (comment)
Text files in the Thredds catalog are not the root cause. I just removed the Thredds config that exposes text files in the catalog and still get that same error "ValueError: u'https://pavics.ouranos.ca/thredds/catalog/birdhouse/wps_outputs/hummingbird/f5858240-4c08-11e9-a17f-0242ac12000d/catalog.xml': Does not appear to be a Thredds catalog".

$ git diff
diff --git a/birdhouse/config/thredds/catalog.xml.template b/birdhouse/config/thredds/catalog.xml.template
index 7d97b36..7ada4b5 100644
--- a/birdhouse/config/thredds/catalog.xml.template
+++ b/birdhouse/config/thredds/catalog.xml.template
@@ -22,9 +22,6 @@
<filter>
<include wildcard="*.nc" />
<include wildcard="*.ncml" />
- <include wildcard="*.txt" />
- <include wildcard="*.md" />
- <include wildcard="*.rst" />
</filter>
</datasetScan> |
I have the Catalog access Thredds directly via internal docker networking instead of using the external network (PAVICS_FQDN) b165d1a and the crawl has been running for 20 mins uninterrupted; the previous longest run was about 10 mins only. I think I am onto something here: maybe some strict firewall rules or network denial-of-service protection is interfering with the crawl, since the crawl makes a huge number of network connections. This would explain why none of the other servers is able to reproduce the problem, since the protection would be on the public Boreas only. |
Crawler finished without error (way too fast) and did not insert anything into Solr. But at least we got over the network problem crawling Thredds.

$ curl https://pavics.ouranos.ca/wpsoutputs/catalog/05f2d1a6-413a-11ea-9d58-0242ac120008.xml
<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd" service="WPS" version="1.0.0" xml:lang="en-US" serviceInstance="http://localhost/wps?request=GetCapabilities&amp;service=WPS" statusLocation="https://pavics.ouranos.ca/wpsoutputs/catalog/05f2d1a6-413a-11ea-9d58-0242ac120008.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>pavicrawler</ows:Identifier>
<ows:Title>PAVICS Crawler</ows:Title>
<ows:Abstract>Crawl thredds server and write metadata to SOLR database.</ows:Abstract>
</wps:Process>
<wps:Status creationTime="2020-01-27T19:50:29Z">
<wps:ProcessSucceeded>PyWPS Process PAVICS Crawler finished</wps:ProcessSucceeded>
</wps:Status>
<wps:ProcessOutputs>
<wps:Output>
<ows:Identifier>crawler_result</ows:Identifier>
<ows:Title>PAVICS Crawler Result</ows:Title>
<ows:Abstract>Crawler result as a json.</ows:Abstract>
<wps:Reference href="https://pavics.ouranos.ca/wpsoutputs/catalog/05f2d1a6-413a-11ea-9d58-0242ac120008/solr_result_2020-01-27T19:50:28Z_.json" mimeType="application/json" encoding="" schema=""/>
</wps:Output>
</wps:ProcessOutputs>
</wps:ExecuteResponse>
$ curl https://pavics.ouranos.ca/wpsoutputs/catalog/05f2d1a6-413a-11ea-9d58-0242ac120008/solr_result_2020-01-27T19:50:28Z_.json
{"responseHeader": {"status": 0, "QTime": 0, "Nquery": 0}} |
A crawl has been running for 4 hours; this looks promising. Note this is when bypassing all external networks and using only the internal docker network between the Catalog and Thredds 83c8391...641c648

curl https://pavics.ouranos.ca/wpsoutputs/catalog/d35b666e-42be-11ea-827b-0242ac120016.xml

<?xml version="1.0" encoding="UTF-8"?>
<wps:ExecuteResponse xmlns:wps="http://www.opengis.net/wps/1.0.0" xmlns:ows="http://www.opengis.net/ows/1.1" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 ../wpsExecute_response.xsd" service="WPS" version="1.0.0" xml:lang="en-US" serviceInstance="http://localhost/wps?request=GetCapabilities&amp;service=WPS" statusLocation="https://pavics.ouranos.ca/wpsoutputs/catalog/d35b666e-42be-11ea-827b-0242ac120016.xml">
<wps:Process wps:processVersion="0.1">
<ows:Identifier>pavicrawler</ows:Identifier>
<ows:Title>PAVICS Crawler</ows:Title>
<ows:Abstract>Crawl thredds server and write metadata to SOLR database.</ows:Abstract>
</wps:Process>
<wps:Status creationTime="2020-01-29T17:43:15Z">
<wps:ProcessStarted percentCompleted="10">Calling pavicrawler</wps:ProcessStarted>
</wps:Status>
</wps:ExecuteResponse> |
Crawl failed again, this time a connection problem to Solr.

curl https://pavics.ouranos.ca/wpsoutputs/catalog/d35b666e-42be-11ea-827b-0242ac120016.xml
|
Tagged |
Just to close on this crawling issue, @moulab88 and I finally found 2 root causes.

1 - Catalog was choking when crawling Thredds because there was a gigantic 244G folder under wps_outputs that probably timed out the connection between the Catalog and Thredds while Thredds was generating the catalog.xml of that folder. We removed that folder.

2 - Catalog was unable to connect to Solr due to an out-of-date DNS config on the Boreas host. New config was deployed.

So the full crawl finally worked and took 2 days to complete. Mourad started the crawl Friday morning and it finished Sunday at 10:59 AM.

curl --include "https://pavics.ouranos.ca/wpsoutputs/catalog/ea24a6fe-4f6f-11ea-a3e1-0242ac120015.xml"
curl --include https://pavics.ouranos.ca/wpsoutputs/catalog/ea24a6fe-4f6f-11ea-a3e1-0242ac120015/solr_result_2020-02-16T14:59:08Z_.json
|
And also 12702 files/sub-directories under this directory. |
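For future reference, oversized wps_outputs folders like that one can be spotted before crawling; a minimal sketch, assuming the shared volume is mounted at `/data/wps_outputs` (adjust the path to the actual deployment):

```python
# Report total size and file count of each first-level folder under wps_outputs,
# to catch another 244G / 12702-file monster before the crawler chokes on it.
import os

root = "/data/wps_outputs"  # assumed mount point of the shared volume
for entry in sorted(os.listdir(root)):
    path = os.path.join(root, entry)
    if not os.path.isdir(path):
        continue
    total = count = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
                count += 1
    print("%8.1f GB  %6d files  %s" % (total / 1e9, count, entry))
```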
…ins-failure-after-new-crawl catalog_search.ipynb: fix jenkins failure after new crawl Make the query much more precise by adding "institute:CCCma,model:CanESM2". Previous query was returning 200+ results after new crawl triggered in bird-house/birdhouse-deploy#4 (comment). Now we seems to have duplicate result, "cccma" and "CCCMA". @huard did we rename "CCCMA" to "cccma" on Thredds? New working Jenkins run: http://jenkins.ouranos.ca/job/PAVICS-e2e-workflow-tests/job/master/480/console Jenkins error fixed: ``` 00:35:48 _____ pavics-sdi-master/docs/source/notebooks/catalog_search.ipynb::Cell 1 _____ 00:35:48 Notebook cell execution failed 00:35:48 Cell 1: Cell outputs differ 00:35:48 00:35:48 Input: 00:35:48 resp = wps.pavicsearch(constraints="variable:tasmin,project:CMIP5,experiment:rcp85,frequency:day", limit=10, type="File") 00:35:48 [result, files] = resp.get(asobj=True) 00:35:48 files 00:35:48 00:35:48 Traceback: 00:35:48 mismatch 'text/plain' 00:35:48 00:35:48 assert reference_output == test_output failed: 00:35:48 00:35:48 "['https://pa...21001231.nc']" == "['https://pa...20101130.nc']" 00:35:48 Skipping 61 identical leading characters in diff, use -v to show 00:35:48 - birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 - 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r2i1p1/tasmin/tasmin_day_CanESM2_rcp85_r2i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^ ^^^ ^ ^ ^^^ ^ ^ -- ^^ 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-LR/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MPI-ESM-LR_rcp85_r1i1p1_21300101-21391231.nc', 00:35:48 ? ^^^^^^ ^^^^ ^^^^ ^^^ ^ ^^^^ ^^^ ^ ++ ^^ 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-LR/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MPI-ESM-LR_rcp85_r1i1p1_21810101-21891231.nc', 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-ES/rcp85/day/atmos/r4i1p1/tasmin/tasmin_day_HadGEM2-ES_rcp85_r4i1p1_20951201-21001130.nc', 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-CC/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_HadGEM2-CC_rcp85_r1i1p1_20451201-20501130.nc', 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-ES/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_HadGEM2-ES_rcp85_r1i1p1_20401201-20451130.nc', 00:35:48 - 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r3i1p1/tasmin/tasmin_day_CanESM2_rcp85_r3i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^ ^^^ ^ ^^^ ^ - - ^ 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MPI-M/MPI-ESM-LR/rcp85/day/atmos/r3i1p1/tasmin/tasmin_day_MPI-ESM-LR_rcp85_r3i1p1_20400101-20491231.nc', 00:35:48 ? ^^^^^^ ^^^^ ^^^^ ^^^ ^^^^ ^^^ + ^^ 00:35:48 - 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r4i1p1/tasmin/tasmin_day_CanESM2_rcp85_r4i1p1_20060101-21001231.nc', 00:35:48 - 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_CanESM2_rcp85_r1i1p1_20060101-21001231.nc'] 00:35:48 ? 
^^ ^^^^^ -- ^ ^ -- --- - ^ ^ ^ 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-ES/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_HadGEM2-ES_rcp85_r1i1p1_20151201-20201130.nc', 00:35:48 ? +++++++++ ^^^^^^ ^^ ^ ^^^^^^ +++ + ^ ^ ^^ 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-ES/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_HadGEM2-ES_rcp85_r1i1p1_22191201-22291130.nc', 00:35:48 + 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MOHC/HadGEM2-CC/rcp85/day/atmos/r3i1p1/tasmin/tasmin_day_HadGEM2-CC_rcp85_r3i1p1_20051201-20101130.nc'] 00:35:48 00:35:48 00:35:48 _____ pavics-sdi-master/docs/source/notebooks/catalog_search.ipynb::Cell 2 _____ 00:35:48 Notebook cell execution failed 00:35:48 Cell 2: Cell outputs differ 00:35:48 00:35:48 Input: 00:35:48 result['response']['docs'][0] 00:35:48 00:35:48 Traceback: 00:35:48 mismatch 'text/plain' 00:35:48 00:35:48 assert reference_output == test_output failed: 00:35:48 00:35:48 "{'cf_standar...21001231.nc'}" == "{'cf_standar...21001231.nc'}" 00:35:48 Skipping 56 identical leading characters in diff, use -v to show 00:35:48 - birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 'replica': False, 00:35:48 - 'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^^^^^^^^^^^ ^ ^^^^^^^ ^^^^^^^^^ 00:35:48 + 'wms_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/ncWMS2/wms?SERVICE=WMS&REQUEST=GetCapabilities&VERSION=1.3.0&DATASET=outputs/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^^^^ ^ ^^^^^^^^^ ^^^^^^^^^ 00:35:48 'keywords': ['air_temperature', 00:35:48 'day', 00:35:48 'application/netcdf', 00:35:48 'tasmin', 00:35:48 'thredds', 00:35:48 'CMIP5', 00:35:48 'rcp85', 00:35:48 - 'CanESM2', 00:35:48 - 'CCCma'], 00:35:48 + 'MRI-CGCM3', 00:35:48 + 'MRI'], 00:35:48 - 'dataset_id': 'CCCMA.CanESM2.rcp85.day.atmos.r5i1p1.tasmin', 00:35:48 ? ^^^ ^^^^^^^^^ ^ 00:35:48 + 'dataset_id': 'cmip5.MRI.rcp85.day.atmos.r1i1p1.tasmin', 00:35:48 ? ^^^^^^ ^^ ^ 00:35:48 'datetime_max': 'DATE_TIME_TZ', 00:35:48 - 'id': '29186a2db2230376', 00:35:48 + 'id': '0035405c47cd3a2f', 00:35:48 'subject': 'Birdhouse Thredds Catalog', 00:35:48 'category': 'thredds', 00:35:48 - 'opendap_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + 'opendap_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/dodsC/birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 - 'title': 'tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^^ ^ ^ ^ 00:35:48 + 'title': 'tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? 
++++ ^^ ^ ^ ^ 00:35:48 'variable_palette': ['default'], 00:35:48 'variable_min': [0], 00:35:48 'variable_long_name': ['Daily Minimum Near-Surface Air Temperature'], 00:35:48 - 'source': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog.xml', 00:35:48 + 'source': 'https://pavics.ouranos.ca//twitcher/ows/proxy/thredds/catalog.xml', 00:35:48 ? + 00:35:48 'datetime_min': 'DATE_TIME_TZ', 00:35:48 'score': 1.0, 00:35:48 'variable_max': [1], 00:35:48 'units': ['K'], 00:35:48 - 'resourcename': 'birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + 'resourcename': 'birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 'type': 'File', 00:35:48 - 'catalog_url': 'https://pavics.ouranos.ca/thredds/catalog/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/catalog.xml?dataset=birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 + 'catalog_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/catalog/birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/catalog.xml?dataset=birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 'experiment': 'rcp85', 00:35:48 'last_modified': 'DATE_TIME_TZ', 00:35:48 'content_type': 'application/netcdf', 00:35:48 - '_version_': 1599589044577107972, 00:35:48 + '_version_': 1658705770170023939, 00:35:48 'variable': ['tasmin'], 00:35:48 - 'url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc', 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + 'url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc', 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 'project': 'CMIP5', 00:35:48 - 'institute': 'CCCma', 00:35:48 ? ^^^^^ 00:35:48 + 'institute': 'MRI', 00:35:48 ? ^^^ 00:35:48 'frequency': 'day', 00:35:48 - 'model': 'CanESM2', 00:35:48 + 'model': 'MRI-CGCM3', 00:35:48 'latest': True, 00:35:48 - 'fileserver_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/CCCMA/CanESM2/rcp85/day/atmos/r5i1p1/tasmin/tasmin_day_CanESM2_rcp85_r5i1p1_20060101-21001231.nc'} 00:35:48 ? ^^^ ^^^^^^^^^ ^ ^^^^ ^ ^ ^ 00:35:48 + 'fileserver_url': 'https://pavics.ouranos.ca/twitcher/ows/proxy/thredds/fileServer/birdhouse/cmip5/MRI/rcp85/day/atmos/r1i1p1/tasmin/tasmin_day_MRI-CGCM3_rcp85_r1i1p1_20960101-21001231.nc'} 00:35:48 ? ^^^^^^ ^^ ^ ++++ ^^ ^ ^ ^ 00:35:48 ```