
tv_grab_zz_sdjson fails with its baked-in value of sd_json_request_max #137

Closed
JanCeuleers opened this issue Jun 14, 2021 · 28 comments

@JanCeuleers

XMLTV Version:

apt-cache policy xmltv

xmltv:
Installed: 0.6.1-1
Candidate: 0.6.1-1
Version table:
*** 0.6.1-1 500
500 http://be.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
500 http://be.archive.ubuntu.com/ubuntu focal/universe i386 Packages
100 /var/lib/dpkg/status

XMLTV Component: tv_grab_zz_sdjson

What happened:

After years of not having touched my configuration the grabber failed as follows:

$ /usr/bin/tv_grab_zz_sdjson --config-file tv_grab_sd_json.conf --output tv_grab_sd_json.out
Initializing...
** POST https://json.schedulesdirect.org/20141201/token ==> 200 OK
** GET https://json.schedulesdirect.org/20141201/status ==> 200 OK (1s)
Updating lineups...
** GET https://json.schedulesdirect.org/20141201/lineups/GBR-1000014-DEFAULT ==> 200 OK
Indexing channels...
Updating schedules...
** POST https://json.schedulesdirect.org/20141201/schedules/md5 ==> 200 OK (1s)
** POST https://json.schedulesdirect.org/20141201/schedules ==> 200 OK
Updating programs...
** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (183s)
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/tv_grab_zz_sdjson line 278.

The resulting .out file had zero length.

By the time I noticed (because guide data for some channels had dried up in MythTV by then), this had been going on for some time.

I reported this to SchedulesDirect, and Robert Kugalowski suggested lowering the number of items fetched in one go. This seems not to be configurable so I modified line 99 to set sd_json_request_max to 250 instead of 5000, and this makes the grabber work. I previously tried a value of 1000 and this did not fix the problem. According to Robert this is a grabber-side issue, not a server-side issue.
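For illustration, here is a hypothetical shell re-creation of that workaround: split the program-ID list into batches of 250 and POST each batch separately. The file name, the TOKEN variable and the exact request shape are assumptions pieced together from the endpoints shown in the log above; they are not part of the grabber itself.

```shell
# Hypothetical sketch of the batching workaround, assuming
# program_ids.txt holds one SchedulesDirect program ID per line and
# TOKEN holds a token obtained from the /token endpoint.
split -l 250 program_ids.txt batch_
for f in batch_*; do
    # Turn each batch into a JSON array: ["EP...","SH...",...]
    ids=$(sed 's/.*/"&"/' "$f" | paste -s -d, -)
    curl -s -H "token: $TOKEN" -d "[$ids]" \
        https://json.schedulesdirect.org/20141201/programs >> programs.json
done
```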

What steps are needed to reproduce this issue?

The above command reliably caused the failure, until I started using a patched version of the grabber as described.

What other software are you using?

Operating System: Ubuntu 20.04

Perl Version: v5.30.0 built for x86_64-linux-gnu-thread-multi

@JanCeuleers
Author

Update: I've had to lower the value further (to 100) for the grabber to work.

@rmeden
Contributor

rmeden commented Jun 15, 2021 via email

@JanCeuleers
Author

JanCeuleers commented Jun 15, 2021 via email

@garybuhrmaster
Contributor

In previous instances where the number of items needed to be substantially reduced, this was due either to a local network issue (especially for people using gateways/proxies, sometimes provided by their ISP, with very small timeouts, such that waiting for a response resulted in timeouts and connection drops), or to changes at SD that temporarily (and it was only a day or so) made their systems slower to respond because of internal processing (recreating all the cached data?), slow enough to trigger other timeouts. But those latter timeouts resulted in specific SD-side error messages regarding their own gateway timeouts.

One would likely have to run the application in debug mode (if that grabber offers one) to get better data to assist with diagnosis, rather than the generic "something did not work" error the grabber returns by default.

@JanCeuleers
Author

Thank you.

Any guidance about running this grabber in debug mode would be welcome.
I have not found one.

As far as I can tell there are no proxies in the path.

@honir
Contributor

honir commented Jun 16, 2021

Any guidance about running this grabber in debug mode would be welcome.
I have not found one.

Add the --debug parameter to your run command

@JanCeuleers
Author

JanCeuleers commented Jun 16, 2021 via email

@JanCeuleers
Author

JanCeuleers commented Jun 16, 2021 via email

@JanCeuleers
Author

The configuration file has the following content (except for the SD account credentials which have been redacted):

cache=/home/mythtv/.xmltv/tv_grab_sd_json.cache
channel-id-format=default
previously-shown-format=date
username=REDACTED
password=REDACTED
mode=channels
channels=GBR-1000014-DEFAULT
channel=30644
channel=17154
channel!17468
channel!17155
channel!17157
channel!20630
channel!60940
channel!84926
channel=20684
channel!44643
channel!48021
channel!24305
channel!56892
channel!47657
channel!21494
channel!21810
channel!45828
channel!31786
channel!31783
channel!82575
channel!52335
channel!24669
channel!79940
channel!48020
channel!21257
channel!90257
channel!53059
channel!25117
channel!33882
channel!52336
channel!97330
channel!81450
channel!65160
channel!29693
channel!98903
channel!62391
channel!102471
channel!73970
channel!28296
channel!44718
channel!17453
channel!27993
channel!77749
channel!25630
channel!21793
channel!75804
channel!46306
channel!30157
channel!59406
channel!97294
channel!97290
channel!56890
channel!31787
channel!90645
channel!61806
channel!99098
channel!57747
channel!44717
channel!101972
channel!102965
channel!97332
channel!31756
channel!45510
channel!97331
channel!97349
channel!42762
channel!57827
channel!97674
channel!97676
channel!29067
channel!102340
channel!44855
channel!103817
channel!99024
channel!68877
channel!65394
channel!102764
channel!47658
channel!103634
channel!97304
channel!73510
channel!32358
channel!105388
channel!87987
channel!69387
channel=50059
channel!50523
channel!50716
channel!50717
channel=83282
channel!83285
channel!97267
channel!89563
channel!89562
channel!87231
channel!97673
channel!98986
channel=29324
channel=29325
channel=83281
channel=83283
channel!33638
channel!34087
channel!56896
channel!103633
channel!56423
channel!19037
channel!20685
channel!16234
channel!49887
channel!105390
channel!105391
channel!104529
channel!24226
channel!67676
channel!46408
channel!81004
channel!69373
channel!105393
channel!105394
channel!24434
channel!31266
channel!24435
channel!24436
channel!24437
channel!24438
channel!29920
channel!30166
channel!31790
channel!24452
channel!24448
channel!33635
channel!25605
channel!25151
channel!30639
channel!31788
channel!45752
channel!24441
channel!30500
channel!24469
channel!24440
channel!24467
channel!45006
channel!24439
channel!33579
channel!25225

@honir
Contributor

honir commented Jun 16, 2021

It's not much help posting the entire debug log, especially as it's cut off at 64k characters, which is before the failure.

Ideally, only post the start-up section plus the section containing the failure.

@JanCeuleers
Author

I emailed this but since it hasn't appeared here I'll post it as well -- apologies if this is a duplicate.

On 16/06/2021 19:18, Geoff wrote:

It's not much help posting the entire debug log: especially as it's
cut-off at 64k characters, which is before the failure.

Ideally, only post the start-up section plus the section containing
the failure.

Trying again, after having removed lots of lines.

Initializing...
Magic number checking on storable file failed at /usr/lib/x86_64-linux-gnu/perl/5.30/Storable.pm line 421, at /usr/bin/tv_grab_zz_sdjson line 693.
** POST https://json.schedulesdirect.org/20141201/token ==> 200 OK (1s)
** GET https://json.schedulesdirect.org/20141201/status ==> 200 OK
Updating lineups...
lineup GBR-1000014-DEFAULT: new
** GET https://json.schedulesdirect.org/20141201/lineups/GBR-1000014-DEFAULT ==> 200 OK (1s)
Indexing channels...
Updating schedules...
** POST https://json.schedulesdirect.org/20141201/schedules/md5 ==> 200 OK (1s)
channel 30644 2021-06-27: new
channel 30644 2021-06-29: new
channel 30644 2021-06-18: new
(...about 100 lines deleted; all new...)
channel 29325 2021-06-30: new
channel 29325 2021-06-29: new
channel 29325 2021-06-18: new
channel 29325 2021-06-27: new
** POST https://json.schedulesdirect.org/20141201/schedules ==> 200 OK (1s)
Updating programs...
program EP015431490185: new
program EP038362730006: new
program SH016308710000: new
(...lots of lines deleted; all new...)
program EP013085690722: new
program SH034728720000: new
program SH034728720000: new
** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (187s)
malformed JSON string, neither tag, array, object, number, string or atom, at character offset 0 (before "(end of string)") at /usr/bin/tv_grab_zz_sdjson line 278.

@JanCeuleers
Author

I guess the next step is to try and capture a network trace using tcpdump?

Doesn't look easy if I want the trace to only include traffic relevant to the grabber's process -- need to learn how to set up network namespaces etc.
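A full network-namespace setup shouldn't be necessary here: since the grabber only talks to a single host, filtering the capture on that host isolates its API traffic. A sketch (the interface choice and filter are an untested suggestion, not a verified recipe):

```shell
# Capture only traffic to/from the SchedulesDirect API endpoint.
# tcpdump resolves the hostname once at startup, which is fine here
# because the grabber talks to a single host.
sudo tcpdump -i any -w sd_trace.pcap 'host json.schedulesdirect.org and tcp port 443'
# Rerun the grabber in another terminal, then inspect inter-packet
# timing in the capture with:
#   tcpdump -r sd_trace.pcap -ttt | less
```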

@garybuhrmaster
Contributor

Well, looking at the log snippets, there are at least two issues. First, the 180+ second timeout is the default in Perl LWP, so that is why the fetch aborts (although I have no idea why one gets a 200 status reported in that case), and second, the app does not properly handle malformed return data (possibly the data is partially returned, which then produces the malformed-string error).

I have tested the specified lineup against both sdjson grabbers, and both work fine. Which, again, suggests something closer to your environment (network, OS packages, OS configuration, etc.) rather than anything wrong with the service itself, or the grabber (well, other than the grabber you are using not handling errors as well as the other grabber does).

@honir
Contributor

honir commented Jun 17, 2021

Because the issue occurs at varied times and with varied content I'm inclined to agree with Gary that the error is a network-related timeout.

Where you get

** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (187s)

I get

** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (1s)

i.e. 1 second vs. your 187 seconds.

Let's try and capture the error your client has detected. I think you are confident with editing the tv_grab_zz_sdjson file?

Replace line 278

return decode_json($response->decoded_content());

with this

use Try::Tiny;
# Probe the decode first: on failure, dump diagnostics and abort.
# (The value of the try block is discarded; the real return happens
# on the last line once the probe has succeeded.)
try {
    decode_json($response->decoded_content());
}
catch {
    print STDERR "HTTP status : ".$response->code."\n";
    print STDERR "Response type : ".$response->header('content-type')."\n";
    print STDERR "Response encoding : ".$response->header('content-encoding')."\n";
    print STDERR "Response content : ".$response->decoded_content()."\n";
    die "fetch aborted", "\n";
};
return decode_json($response->decoded_content());

Then rerun till you get the error. Hopefully it will print some useful info which we can use to determine the reason why the grabber is barfing.

I suspect this will show the response content as being 'blank'.

@JanCeuleers
Author

I can reproduce the issue at will by moving the cache out of the way, such that there is always a lot of data to fetch.

I have patched the grabber as Geoff suggested and here is the output. The content is indeed blank.

mythtv@dracor:~/.xmltv$ ./tv_grab_zz_sdjson.patched --config-file tv_grab_sd_json.conf --output tv_grab_sd_json.out
Initializing...
Magic number checking on storable file failed at /usr/lib/x86_64-linux-gnu/perl/5.30/Storable.pm line 421, at ./tv_grab_zz_sdjson.patched line 704.
** POST https://json.schedulesdirect.org/20141201/token ==> 200 OK
** GET https://json.schedulesdirect.org/20141201/status ==> 200 OK (1s)
Updating lineups...
** GET https://json.schedulesdirect.org/20141201/lineups/GBR-1000014-DEFAULT ==> 200 OK
Indexing channels...
Updating schedules...
** POST https://json.schedulesdirect.org/20141201/schedules/md5 ==> 200 OK (1s)
** POST https://json.schedulesdirect.org/20141201/schedules ==> 200 OK (2s)
Updating programs...
** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (186s)
HTTP status : 200
Response type : application/json;charset=UTF-8
Response encoding : gzip
Use of uninitialized value in concatenation (.) or string at ./tv_grab_zz_sdjson.patched line 287.
Response content :
fetch aborted

I've been looking at potential causes of networking issues:

  • There is no firewall on the backend server at all.
  • I have added logging to all rules on the router (which connects me to my ISP) that drop traffic, and have determined that none of those rules fire in relation to traffic to/from the SD servers.
  • I have enabled/disabled ECN without effect.
  • I have enabled/disabled TCP SACK without effect.
  • I have checked that the PPP connection to my ISP can carry packets that are of the same size as those on my local Ethernet without fragmentation. This is actually a PPPoE connection, and the Ethernet port carrying that connection has had its MTU set to 1508 bytes such that there is enough headroom for a full 1500-byte packet plus the PPP header.
  • Although I'm running Squid on this same server, it is not used for this traffic. I have double-checked that this is the case by examining Squid's access logs.
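The MTU item in the list above can be cross-checked from the backend with ping and the don't-fragment bit: a 1472-byte ICMP payload plus 8 bytes of ICMP header and 20 bytes of IP header makes a full 1500-byte packet on the wire.

```shell
# 1472-byte payload + 8-byte ICMP header + 20-byte IP header = 1500 bytes.
# -M do sets the DF bit, so any hop that would need to fragment reports
# an error instead of silently fragmenting.
ping -c 3 -M do -s 1472 json.schedulesdirect.org
```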

I can't think of anything else to check.

Thanks

@honir
Contributor

honir commented Jun 18, 2021

SD doesn't return a Content-Length header, which is a pity as it would be a useful cross-check.

Unfortunately we don't know whether SD is getting your 'programs' request or not. But...

I'm confused by the 200 code returned in the response. If the server timed out then I would expect code 500 + empty response (not 200 + empty response). AFAIK a 200 has to come from the server (rather than, say, Perl LWP generating it).

You said you had it working with low numbers for sd_json_request_max - what is the response time for the 'programs' call in these cases?

Maybe there's an issue with the response being chunked (are low volume requests chunked?). Add

print STDERR "Response chunking : ".Dumper($response->header('client-transfer-encoding'))."\n";

to the debug code and check for the 'chunked' header between runs with low numbers (which work) and runs with high numbers (which don't work) to see if there's a difference.
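For comparison outside Perl, the same header can be inspected with curl. The token header name and request body below are assumptions reconstructed from the endpoints in the log, so treat this as a sketch rather than a tested recipe:

```shell
# Dump response headers and look for "Transfer-Encoding: chunked".
# $TOKEN is assumed to hold a valid SchedulesDirect token.
curl -s -D - -o /dev/null \
     -H "token: $TOKEN" -H 'Accept-Encoding: gzip' \
     -d '["EP015431490185"]' \
     https://json.schedulesdirect.org/20141201/programs \
  | grep -i '^transfer-encoding'
```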

Let's go back a step: you said it worked for a long time and then broke... any ideas as to what might have changed since it last worked? e.g. new router, updated Perl version, anti-virus program, new broadband package, Netflix sub, etc.

@JanCeuleers
Author

My emailed replies don't seem to make it into the ticket -- submitting this via the web interface.

On 18/06/2021 09:16, Geoff wrote:

You said you had it working with low numbers for sd_json_request_max - what is the response time for the 'programs' call in these cases?

Reasonably quick: 1-2s per iteration.

Maybe there's an issue with the response being chunked (are low volume requests chunked?). Add

print STDERR "Response chunking : ".Dumper($response->header('client-transfer-encoding'))."\n";

to the debug code and check for the 'chunked' header between runs with low numbers (which work) and runs with high numbers (which don't work) to see if there's a difference.

Output below.

Let's go back a step: you said it worked for a long time and then broke... any ideas as to what might have changed since it last worked? e.g. new router, updated Perl version, new broadband package, Netflix sub, etc.

Other than keeping the OS up-to-date (on both the backend server which runs Ubuntu 20.04 and on the router which runs Debian 10), no.

mythtv@dracor:~/.xmltv$ ./tv_grab_zz_sdjson.patched --config-file tv_grab_sd_json.conf --output tv_grab_sd_json.out
Initializing...
Magic number checking on storable file failed at /usr/lib/x86_64-linux-gnu/perl/5.30/Storable.pm line 421, at ./tv_grab_zz_sdjson.patched line 705.
** POST https://json.schedulesdirect.org/20141201/token ==> 200 OK (1s)
** GET https://json.schedulesdirect.org/20141201/status ==> 200 OK
Updating lineups...
** GET https://json.schedulesdirect.org/20141201/lineups/GBR-1000014-DEFAULT ==> 200 OK (1s)
Indexing channels...
Updating schedules...
** POST https://json.schedulesdirect.org/20141201/schedules/md5 ==> 200 OK (1s)
** POST https://json.schedulesdirect.org/20141201/schedules ==> 200 OK (1s)
Updating programs...
** POST https://json.schedulesdirect.org/20141201/programs ==> 200 OK (186s)
HTTP status : 200
Response type : application/json;charset=UTF-8
Response encoding : gzip
Use of uninitialized value in concatenation (.) or string at ./tv_grab_zz_sdjson.patched line 287.
Response content :
Response chunking : $VAR1 = 'chunked';

@rmeden
Contributor

rmeden commented Jun 18, 2021 via email

@honir
Contributor

honir commented Jun 18, 2021

Yep. It wasn't meant as a criticism ;-) (Content-Length values are often wrong and sometimes misleading anyway). I was just trying to identify whether Jan's empty value was because SD sent an empty string, or whether the data sent by SD is getting 'lost' somewhere. I suspect the latter.

@honir
Contributor

honir commented Jun 18, 2021

Response chunking : $VAR1 = 'chunked';

I couldn't say for certain without adding debug code to capture it, but I suspect the grab with low values for sd_json_request_max is returning an un-chunked response.

In which case it suggests something in your setup breaks when the response is received in chunks (hence the timeout after 180 seconds, when it has failed to receive all the chunks).
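One way to probe that hypothesis without touching the grabber: HTTP/1.0 predates chunked transfer coding, so a server answering an HTTP/1.0 request must send the body un-chunked. If a forced HTTP/1.0 fetch of a large response succeeds where the normal one fails, chunk handling somewhere in the path becomes the prime suspect. The endpoint and headers below are assumptions based on the log:

```shell
# Force an HTTP/1.0 request (no chunked responses) as a diagnostic.
# $TOKEN and the request body are assumptions, not a tested recipe.
curl -sv --http1.0 -H "token: $TOKEN" \
     -d '["EP015431490185"]' \
     https://json.schedulesdirect.org/20141201/programs \
     -o /dev/null 2>&1 | grep -i '^< HTTP/'
```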

@JanCeuleers
Author

JanCeuleers commented Jun 18, 2021 via email

@garybuhrmaster
Contributor

A simple test may be to run the grabber on your Debian router and see if it works[1]. And if not, try turning off any firewall rules and trying again. If it works there, this issue would seem to be in your network (or some recent update), and a bisect of updates might point you in the right direction.

[1] If your gateway were an embedded device, I would have suggested hanging your backend, or some specifically built system, directly off your ISP, but since the router runs Debian it is probably easier to test with it; although if it does not work there, you may still have to test with a system directly connected to your ISP (since, if this is a recent Debian update, the router itself may be the issue).

@JanCeuleers
Author

As Gary suggested I installed xmltv on the router, then re-ran the test there. It failed in the same way.

I then cleared all firewall rules (by running shorewall clear and then checking that the iptables-save output was blank) and re-ran the test. It still failed in the same way.

@honir
Contributor

honir commented Jun 19, 2021

Do you have a VPN you can try it over? (or some other way of getting off of Belgacom/Proximus)

@JanCeuleers
Author

JanCeuleers commented Jun 19, 2021 via email

@JanCeuleers
Author

I can no longer reproduce the issue (not on the backend, nor on the router, nor on the laptop).

@JanCeuleers
Author

Closing

@honir
Contributor

honir commented Jun 20, 2021

Don't you just hate it when that happens :-(
Good that it's working again though
