ValueError: IPv6 addresses are 16 bytes long #477

Open
Sanqui opened this issue Apr 20, 2023 · 6 comments

@Sanqui
Member

Sanqui commented Apr 20, 2023

This is surely true, but one weird DNS response shouldn't bring the whole crawler down.

ERROR Fatal exception.
Traceback (most recent call last):
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/application/app.py", line 157, in run
    yield from pipeline.process()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 194, in process
    yield from self._process_one_worker()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 215, in _process_one_worker
    task.result()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 119, in process
    item = yield from self.process_one(_worker_id=worker_id)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/pipeline/pipeline.py", line 103, in process_one
    yield from task.process(item)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/application/tasks/download.py", line 492, in process
    yield from session.app_session.factory['Processor'].process(session)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/processor/delegate.py", line 29, in process
    return (yield from processor.process(item_session))
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/processor/web.py", line 92, in process
    return (yield from session.process())
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/processor/web.py", line 186, in process
    yield from self._process_loop()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/processor/web.py", line 245, in _process_loop
    exit_early, wait_time = yield from self._fetch_one(cast(Request, self._item_session.request))
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/processor/web.py", line 268, in _fetch_one
    response = yield from self._web_client_session.start()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/protocol/http/web.py", line 107, in start
    response = yield from session.start(request)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/protocol/http/client.py", line 87, in start
    yield from self._stream.reconnect()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/protocol/http/stream.py", line 438, in reconnect
    yield from self._connection.connect()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/network/pool.py", line 375, in connect
    result = yield from self._resolver.resolve(self._address[0])
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/network/dns.py", line 206, in resolve
    answer = yield from self._query_dns(host, family)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/wpull/network/dns.py", line 255, in _query_dns
    answer = yield from event_loop.run_in_executor(None, query)
  File "/home/archivebot/.pyenv/versions/3.6.15/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/resolver.py", line 913, in query
    source_port=source_port)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/query.py", line 325, in udp
    q.keyring, q.mac, ignore_trailing)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/query.py", line 271, in receive_udp
    ignore_trailing=ignore_trailing)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/message.py", line 823, in from_wire
    reader.read()
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/message.py", line 749, in read
    self._get_section(self.message.answer, ancount)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/message.py", line 723, in _get_section
    self.message.origin)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/rdata.py", line 424, in from_wire
    return cls.from_wire(rdclass, rdtype, wire, current, rdlen, origin)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/rdtypes/IN/AAAA.py", line 54, in from_wire
    wire[current: current + rdlen])
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/inet.py", line 78, in inet_ntop
    return dns.ipv6.inet_ntoa(address)
  File "/home/archivebot/.pyenv/versions/3.6.15/envs/archivebot-20230414/lib/python3.6/site-packages/dns/ipv6.py", line 39, in inet_ntoa
    raise ValueError("IPv6 addresses are 16 bytes long")
ValueError: IPv6 addresses are 16 bytes long
CRITICAL Sorry, Wpull unexpectedly crashed.
CRITICAL Please report this problem to the authors at Wpull's issue tracker so it may be fixed. If you know how to program, maybe help us fix it? Thank you for helping us help you help us all.
@JustAnotherArchivist
Contributor

ArchiveBot jobs: atsg5kk40bay9xhgjirm454sm, ag2cunwxe333pf3uag3ilysn3, 25gx6tjqcfvral93bsuo124ov

The first two both seemed to be resolving www.snpp.com at the time. The authoritative nameservers for that domain (peel.ddddns.net and tomato.ddddns.net) return bad data:

$ dig +trace www.snpp.com AAAA
; <<>> DiG 9.11.5-P4-5.1+deb10u1-Debian <<>> +trace www.snpp.com AAAA
;; global options: +cmd
.                       275643  IN      NS      g.root-servers.net.
.                       275643  IN      NS      f.root-servers.net.
.                       275643  IN      NS      m.root-servers.net.
.                       275643  IN      NS      j.root-servers.net.
.                       275643  IN      NS      a.root-servers.net.
.                       275643  IN      NS      h.root-servers.net.
.                       275643  IN      NS      l.root-servers.net.
.                       275643  IN      NS      i.root-servers.net.
.                       275643  IN      NS      c.root-servers.net.
.                       275643  IN      NS      k.root-servers.net.
.                       275643  IN      NS      d.root-servers.net.
.                       275643  IN      NS      b.root-servers.net.
.                       275643  IN      NS      e.root-servers.net.
.                       275643  IN      RRSIG   NS 8 0 518400 20230510210000 20230427200000 60955 . oqOMjUihv4gRblvN+Q3wnTBAYrzWHDQZsFygCYWpa8rnkqDSKphxH2C6 Vqj2G2o9A7+h1MhWwbUqb6KK/BqrVidEQcZtJVwrFJdEZsRXLF/eCNGj XUDOOUekDVhyZN51BHhsErsVtgTmFXXWpoXrrlIfwt2jqrlYfPhYRxZR VnKYG7oIP9HltlBMbb6/mbkbZKkxoYMJrOf9rv7eP5YTDdh+w4I0aQwH 260gb4nLG7Y1knwHtumbDbSHTRpICzdsZnZ97Uecq7KH7eKC6sZhvbPf vfW93Rp1TdwH9MLnseA/zo/eHAazGJlPxXip9bmFq+gwRLtQWPa4egR7 77K9PA==
;; Received 1125 bytes from 127.0.0.1#53(127.0.0.1) in 0 ms

com.                    172800  IN      NS      a.gtld-servers.net.
com.                    172800  IN      NS      b.gtld-servers.net.
com.                    172800  IN      NS      c.gtld-servers.net.
com.                    172800  IN      NS      d.gtld-servers.net.
com.                    172800  IN      NS      e.gtld-servers.net.
com.                    172800  IN      NS      f.gtld-servers.net.
com.                    172800  IN      NS      g.gtld-servers.net.
com.                    172800  IN      NS      h.gtld-servers.net.
com.                    172800  IN      NS      i.gtld-servers.net.
com.                    172800  IN      NS      j.gtld-servers.net.
com.                    172800  IN      NS      k.gtld-servers.net.
com.                    172800  IN      NS      l.gtld-servers.net.
com.                    172800  IN      NS      m.gtld-servers.net.
com.                    86400   IN      DS      30909 8 2 E2D3C916F6DEEAC73294E8268FB5885044A833FC5459588F4A9184CF C41A5766
com.                    86400   IN      RRSIG   DS 8 1 86400 20230513170000 20230430160000 60955 . by9bZHfXQ0kJHLCN1YBtLMoSJN0rd6nx23TmPKVr5u5GY+M1iQNjnyh6 n16o7aeUkmbG/poWlfNhZxIfToDXtlDgJ+w5PN/hy9Ri9hPpBI1kBeYy x/NIJ9iSwpVqL5Bef88xzQ+Gu4JqowPN9QKXog6gK3MKgfSUdYIByjRa Mluaf5hwb8E38CAa4fVoPwgOeK++9Rr155qToNwu3PfWFvao3Vfw9HeD 3NsrW/7duNFcPmRUstKjO1jX9qBpeDXY3fBfZMJ8oM/Py6ZWiq3MiW81 GmjZxSS04yXAA7EDbboXHpaPJdQCkYww9XXS2E4N/sMf1oSuI369i0Xh oFzglA==
;; Received 1172 bytes from 2001:500:9f::42#53(l.root-servers.net) in 99 ms

snpp.com.               172800  IN      NS      peel.ddddns.net.
snpp.com.               172800  IN      NS      tomato.ddddns.net.
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN NSEC3 1 1 0 - CK0Q2D6NI4I7EQH8NA30NS61O48UL8G5 NS SOA RRSIG DNSKEY NSEC3PARAM
CK0POJMG874LJREF7EFN8430QVIT8BSM.com. 86400 IN RRSIG NSEC3 8 2 86400 20230506042301 20230429031301 46551 com. GLb1X3yzLuoKpHjnwU4uKeL7Tz1NyVewyC6x0DU4tAq9kE8ynCgLFbRd yGojp7wN3O69yNC8Ljcbo6kshlRjvVfRsBXH2FQZoKgS4LO/hA1sp0As /dtTLW1gQNvhf8Gj9EtlcpaADBhenGnrvuRLKAz/WkuIOPJoCoRXeUpP lPd4rWGEoFRbXdPMyjhbIpdI0eGlbzL2kitOk2Szu92JKg==
NS9VEP418JCV52JFHMCIVAICU5J5J8J4.com. 86400 IN NSEC3 1 1 0 - NS9VKN0RTJKAL4RQRFHQDQIH1FSJDLVU NS DS RRSIG
NS9VEP418JCV52JFHMCIVAICU5J5J8J4.com. 86400 IN RRSIG NSEC3 8 2 86400 20230505042418 20230428031418 46551 com. TugewX4FWe2otSQ81pdXLaAvT4F+f6U8eQFrFNx7hIf6vKRGLJVorlQo kpprpqCR/LMzZUdt9HstnrFwz71I8oRRfFE+L7bTTrQmm+FWmNqBj28x Xe4UidAl+U/WEP0LvwR/d+H7soF8R+v5++GD/PRCGJep6t/KvWffxaq3 wHXsR3Fi5k8rRM+HJnSUFaNy/TZsIYxbVqOcBqEexYT3TQ==
;; Received 640 bytes from 2001:501:b1f9::30#53(m.gtld-servers.net) in 36 ms

;; Warning: Message parser reports malformed message packet.
;; expected opt record in response
;; Received 58 bytes from 13.251.31.214#53(peel.ddddns.net) in 243 ms
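Both the dig warning and the traceback point at malformed answer data. The traceback shows exactly where dnspython stops. Its AAAA parser passes the record's RDATA to `dns.ipv6.inet_ntoa`, and that function raises unless the data is exactly 16 bytes. Below is a minimal sketch of the same check using only the standard library. The helper name `decode_aaaa_rdata` and the sample bytes are made up for illustration, and this is not dnspython's actual code:

```python
import ipaddress


def decode_aaaa_rdata(rdata: bytes) -> str:
    """Decode the RDATA of an AAAA record into an IPv6 address string.

    Hypothetical helper that mirrors the length check dnspython performs
    before formatting the address.
    """
    if len(rdata) != 16:
        # A well-formed AAAA record always carries exactly 16 bytes (RFC 3596).
        raise ValueError("IPv6 addresses are 16 bytes long")
    return str(ipaddress.IPv6Address(rdata))


# A valid record: 2001:db8::1
print(decode_aaaa_rdata(bytes.fromhex("20010db8000000000000000000000001")))

# A truncated 4-byte payload (made-up example data)
try:
    decode_aaaa_rdata(b"\x00\x01\x02\x03")
except ValueError as error:
    print("rejected:", error)
```

The exception type matters here. It is a plain `ValueError`, not one of dnspython's own DNS exception classes, so error handling that only expects resolver errors will not catch it.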

The third job was resolving www.thewebcomiclist.com, which does not appear to cause any problems for me now.

So I guess this happens due to misbehaving DNS servers or possibly corrupted packets. Either way, wpull should obviously catch the exception and treat it as a resolution error rather than crashing.
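The traceback shows wpull running the blocking dnspython query through `run_in_executor`. An exception raised in the worker thread is re-raised where the coroutine awaits the result, so the natural place to catch it is around that call. The sketch below makes several assumptions. `DNSResolutionError`, `query_dns_safely` and `broken_query` are hypothetical names, not wpull's API. The code also uses `async`/`await` and `asyncio.run` (Python 3.7+), whereas the traceback shows Python 3.6 with `yield from` coroutines:

```python
import asyncio
import functools


class DNSResolutionError(Exception):
    """Hypothetical error type meaning "this host could not be resolved"."""


async def query_dns_safely(query, host, record_type="AAAA"):
    """Run a blocking DNS query in a thread pool and convert parse failures.

    `query` is any callable taking (host, record_type).
    """
    loop = asyncio.get_running_loop()
    try:
        # An exception raised in the worker thread is re-raised at this await.
        return await loop.run_in_executor(
            None, functools.partial(query, host, record_type)
        )
    except ValueError as error:
        # Report a failed lookup for this one host instead of crashing.
        raise DNSResolutionError(
            f"malformed DNS response for {host}: {error}"
        ) from error


def broken_query(host, record_type):
    """Fake query that fails the same way dnspython did in the traceback."""
    raise ValueError("IPv6 addresses are 16 bytes long")


async def main():
    try:
        await query_dns_safely(broken_query, "www.snpp.com")
    except DNSResolutionError as error:
        print("skipping host:", error)


if __name__ == "__main__":
    asyncio.run(main())
```

Catching `ValueError` around the whole query is deliberately broad, since it will also swallow unrelated bugs inside the query callable. Malformed packets can also surface as exceptions from dnspython's own `dns.exception.DNSException` hierarchy, so a real patch would probably need to decide how those are reported too.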

@JustAnotherArchivist
Contributor

The third job was actually resolving bestmedicine.djnd.com at crash time, which uses the same authoritative nameservers and fails with the same error.

@JustAnotherArchivist
Contributor

The problem with ddddns.net has existed since at least last summer: blechschmidt/massdns#130

@Flashfire42

This issue has since killed job 19nmu5mdcqc6fbn0vh3glzc3z.

@pabs3

pabs3 commented Jun 30, 2023

This also killed job dy4zhbfss3ngk7cwulndgx35k, which was restarted with !ignore dy4zhbfss3ngk7cwulndgx35k ^https?://(www\.)?snpp\.com/.
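The `!ignore` workaround can be checked against sample URLs with a short sketch. It reads the final period of the command as sentence punctuation, not part of the pattern. It also assumes that ArchiveBot applies ignore patterns as Python regular expressions searched against each queued URL. The URLs and the `is_ignored` helper are illustrative only:

```python
import re

# Pattern from the !ignore command (final period read as punctuation).
# Assumption: it is matched with re.search against each queued URL.
IGNORE_SNPP = re.compile(r"^https?://(www\.)?snpp\.com/")


def is_ignored(url: str) -> bool:
    """Hypothetical helper: True if the ignore pattern matches the URL."""
    return IGNORE_SNPP.search(url) is not None


# Illustrative URLs, not taken from the job's actual queue.
for url in [
    "http://www.snpp.com/",
    "https://snpp.com/guides/",
    "http://bestmedicine.djnd.com/",
]:
    print(url, "ignored" if is_ignored(url) else "still crawled")
```

This only sidesteps one domain. Other hosts served by the same nameservers, such as bestmedicine.djnd.com, would still reach the resolver and can still trigger the crash.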

@pabs3

pabs3 commented Jul 10, 2023

This also killed job 1w4tgylnp6m83r5bpx1ulfa2z for https://sucs.org/News.

Development

No branches or pull requests

4 participants