Update for more robust scraping from NDBC websites #242

Merged
merged 4 commits into from Jul 15, 2021

shinbunya
Contributor

This makes the following changes for NDBC website scraping:

  1. Handle non-numerical station IDs.
  2. Ignore unnecessary columns in the station list.
  3. Ignore "Station not found" responses.
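
A minimal sketch of what these three changes could look like, not the PR's actual code. The first sample row is taken from the NDBC station list quoted later in this thread; the second row, the `responses` cell array, and all variable names are made up for illustration.

```matlab
% Sketch only: sample data and variable names are illustrative assumptions.
rows = {'41001 3D90 34 46 30 N 72 10 2 W 4431 4653 SCOOP', ...  % row from the NDBC list
        'ABCD1 3DV00 30 0 0 N 80 0 0 W 10 20 SCOOP'};            % made-up non-numeric ID

ids = cell(1, numel(rows));
lat = zeros(1, numel(rows));
lon = zeros(1, numel(rows));
for k = 1:numel(rows)
    tok = strsplit(strtrim(rows{k}));      % split the row on whitespace
    ids{k} = tok{1};                       % keep the ID as text; str2double('ABCD1') is NaN
    dms = str2double(tok(3:5));            % latitude: degrees, minutes, seconds
    lat(k) = (dms(1) + dms(2)/60 + dms(3)/3600) * (1 - 2*strcmp(tok{6}, 'S'));
    dms = str2double(tok(7:9));            % longitude: degrees, minutes, seconds
    lon(k) = (dms(1) + dms(2)/60 + dms(3)/3600) * (1 - 2*strcmp(tok{10}, 'W'));
    % tokens 2 and 11 onward (hull, depth, radius, type) are simply not read
end

% Hypothetical per-station responses; drop any that report a missing station.
responses = {'Station not found', 'some station description'};
keep = cellfun(@(s) isempty(strfind(s, 'Station not found')), responses);
```

Storing the IDs in a cell array of character vectors avoids the `NaN` that a numeric conversion would produce for alphanumeric IDs.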

@krober10nd
Collaborator

krober10nd commented Jul 14, 2021

Thanks! A quick test worked out nicely for the elevation data.

 m = Make_f15( m, '01-Jan-2018 03:00', '01-Feb-2018 22:00', 1.0, 'sta_database',{'CO-OPS',[1]} );

producing something like this after writing a fort.15 file.

144  	 	 ! NSTAE 
-64.695000 32.370000 Bermuda Biological Station ID:2695535 
-64.703300 32.373300 Bermuda, St. Georges Island ID:2695540 
-66.982900 44.904600 Eastport ID:8410140 
-67.204700 44.657000 Cutler Farris Wharf ID:8411060 
-68.204300 44.392200 Bar Harbor ID:8413320 
-70.244200 43.658100 Portland ID:8418150 
-70.563300 43.320000 Wells ID:8419317 
-70.741100 43.079700 Seavey Island ID:8419870 
-71.050300 42.353900 Boston ID:8443970 
-71.164100 41.704300 Fall River ID:8447386 
-69.950800 41.688100 Chatham, Lydia Cove ID:8447435 
-70.671100 41.523600 Woods Hole ID:8447930 
-70.096400 41.285300 Nantucket Island ID:8449130 
-71.326100 41.504300 Newport ID:8452660 
.
.
.

However, if I want to record both elevation and velocity at various stations, it only populates the elevation stations.

 m = Make_f15( merged, '01-Jan-2018 03:00', '01-Feb-2018 22:00', 1.0, 'sta_database',{'CO-OPS',[1,2]} );

and if I want just the velocity recording stations:

 m = Make_f15( merged, '01-Jan-2018 03:00', '01-Feb-2018 22:00', 1.0, 'sta_database',{'CO-OPS',[2]} );

I got an error like this:

22  for sta_database = sta_databases
Error using xmlread (line 112)
Java exception occurred:
java.net.UnknownHostException: opendap.co-ops.nos.noaa.gov
	at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
	at java.net.Socket.connect(Socket.java:589)
	at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
	at sun.security.ssl.BaseSSLSocketImpl.connect(BaseSSLSocketImpl.java:173)
	at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
	at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
	at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
	at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
	at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1564)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
	at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
	at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
	at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
	at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
	at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

Error in tidal_stations_parser (line 46)
        DOMnode = xmlread(xml);

Error in Make_f15 (line 337)
    obj = tidal_stations_parser(obj,string(sta_database(1:2:end-1)),...

My other thought is that perhaps we could get WriteFort15 to write an external elev_stat.151 directly if asked?

Anyway thank you again.

@shinbunya
Contributor Author

I should have mentioned this at the same time. I'm not sure whether it is relevant to the error that @krober10nd mentioned, but the correct way to call the function seems to differ from what the manual suggests.
An example in the manual reads:
 m = Make_f15( m, TS, TE, DT, 'tidal_database', TPXO9, 'const', ...
               {CONST},'sta_database',{'CO-OPS','NDBC',[1]} ) ;
However, the call that worked for me was:
 m = Make_f15( m, TS, TE, DT, 'tidal_database', TPXO9, 'const', ...
               {CONST},'sta_database',{'CO-OPS',1,'NDBC',2} ) ;
If you also want met stations:
 m = Make_f15( m, TS, TE, DT, 'tidal_database', TPXO9, 'const', ...
               {CONST},'sta_database',{'CO-OPS',1,'NDBC',2,'NDBC',3} ) ;
This may be related to what you observed when you invoked Make_f15.
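
To illustrate why the paired form may be the one that works: the error trace above shows Make_f15 passing `string(sta_database(1:2:end-1))` to the parser, which picks every odd entry as a database name. The sketch below applies that indexing to both forms. Treating the even entries as station types (1 elevation, 2 velocity, 3 met) is an assumption based on this thread, not on the source code, and the variable names are illustrative.

```matlab
% Sketch only: mirrors the indexing seen in the error trace, nothing more.
sta_database = {'CO-OPS', 1, 'NDBC', 2, 'NDBC', 3};   % paired form that worked
names = string(sta_database(1:2:end-1));   % odd entries: database names
types = cell2mat(sta_database(2:2:end));   % even entries: assumed station types

manual_form = {'CO-OPS', 'NDBC', [1]};     % form shown in the manual
names_manual = string(manual_form(1:2:end-1));   % only the first entry is picked up
```

With the manual's form, the odd-entry indexing yields only `"CO-OPS"`, so `'NDBC'` would land in a type slot rather than a name slot.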

@shinbunya
Contributor Author

Also, as far as I could tell from the source code, it doesn't look like we can use both CO-OPS and NDBC for elevation stations (or for velocity/met stations). I may be wrong about this.

@krober10nd
Collaborator

krober10nd commented Jul 15, 2021

Ah yes, that makes more sense. We'll have to fix the user guide. By the way, I've already incorporated some of the changes you sent through into the Overleaf, but the appendix is still out of date.

Coming back to this, when I try to add NDBC data I get the following (my mesh covers the entire WNAT, so some stations should show up):

>> m = Make_f15( m, '01-Jan-2018 03:00', '01-Feb-2018 22:00', 1.0, 'sta_database',{'CO-OPS',1,'NDBC',3} );
Error using textscan
First input can not be empty. Expected a non-empty character vector or a valid file-id.

Error in tidal_stations_parser (line 110)
            C = textscan(NDBC,'%s','Delimiter','\t');

Error in Make_f15 (line 337)
    obj = tidal_stations_parser(obj,string(sta_database(1:2:end-1)),...

because on lines 107-108

 NDBC = urlread('http://www.ndbc.noaa.gov/stndesc.shtml');
  ii1 = strfind(NDBC,'<pre>'); ii2 = strfind(NDBC,'</pre>');

these calls can't seem to locate any <pre> tags in what that NDBC link returns.
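
A minimal sketch of a guard around the `<pre>` lookup, so that a page without a `<pre>` block produces a warning instead of the `textscan` error. This is not the code in tidal_stations_parser; the two sample pages, the warning identifier, and the variable names are assumptions for illustration.

```matlab
% Sketch only: sample pages stand in for what urlread might return.
pages = {'<html><head><title>301 Moved Permanently</title></head></html>', ...
         sprintf('<html><pre>\n41001 3D90 34 46 30 N 72 10 2 W 4431 4653 SCOOP\n</pre></html>')};

bodies = cell(size(pages));
for k = 1:numel(pages)
    page = pages{k};
    ii1 = strfind(page, '<pre>');
    ii2 = strfind(page, '</pre>');
    if isempty(ii1) || isempty(ii2)
        % No station table in the response: warn and leave the body empty.
        warning('sketch:noPreBlock', 'No <pre> block found; skipping NDBC stations.');
        bodies{k} = '';
    else
        % Text between the first <pre> (5 characters long) and the first </pre>.
        bodies{k} = strtrim(page(ii1(1)+5 : ii2(1)-1));
    end
end
```

A caller could then skip `textscan` whenever the extracted body is empty.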

@shinbunya
Contributor Author

shinbunya commented Jul 15, 2021

What do you get if you type the following line in the MATLAB console?
NDBC = urlread('http://www.ndbc.noaa.gov/stndesc.shtml');
I get HTML containing...

<pre>
NDBC owned meteorological buoys
WATCH ADRIFT PAYLOAD
BUOY BUOY LATITUDE LONGITUDE DEPTH RADIUS TYPE
STATION HULL DEG MIN SEC N/S DEG MIN SEC E/W (M) (YDS)
41001 3D90 34 46 30 N 72 10 2 W 4431 4653 SCOOP
41002 3DV33 31 45 42 N 74 57 24 W 3740 3811 SCOOP
...
51101 3DV49 24 21 40 N 162 4 30 W 4849 4962 SCOOP
</pre>

@krober10nd
Collaborator

@shinbunya interestingly, I get the response below. Maybe it's because I'm in Brazil?

>> NDBC = urlread('http://www.ndbc.noaa.gov/stndesc.shtml')

NDBC =

    '<html>
     <head><title>301 Moved Permanently</title></head>
     <body bgcolor="white">
     <center><h1>301 Moved Permanently</h1></center>
     <hr><center>CloudFront</center>
     </body>
     </html>
     '

>> 

@shinbunya
Contributor Author

Yeah, that is the only explanation I can think of for this odd behavior. In my web browser, the URL shows the content in the attached image. Should we contact NDBC? If you share an image of what you get in your web browser, I will contact NDBC with it, just to let them know about the situation.
image

@krober10nd
Collaborator

krober10nd commented Jul 15, 2021

Ah, so my browser automatically redirects to https://www.ndbc.noaa.gov/stndesc.shtml, which appears to be the most recent site. Please check whether it still works on your end.

@krober10nd
Collaborator

krober10nd commented Jul 15, 2021

Oh, this is weird, that looks like the same link! So it seems to work only occasionally for me. Oh well.
It has https instead of http.
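
A small sketch of requesting the https address directly, assuming the 301 response comes from an http-to-https redirect that `urlread` does not follow. The variable names are illustrative, and the network call is left commented out so the snippet runs offline.

```matlab
% Sketch only: rewrite the scheme before fetching the station list.
url_old = 'http://www.ndbc.noaa.gov/stndesc.shtml';
url_new = regexprep(url_old, '^http://', 'https://');   % switch http to https

% Network call, commented out here; webread accepts a weboptions object.
% NDBC = webread(url_new, weboptions('Timeout', 15));
```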

@WPringle
Collaborator

Fantastic, thanks for updating this!

@shinbunya
Contributor Author

Thanks guys!

@krober10nd krober10nd merged commit a0661ca into CHLNDDEV:Projection Jul 15, 2021