disable tldextract caching #64

Merged: 2 commits merged on Nov 9, 2022

Conversation

mpkuth (Contributor) commented Nov 9, 2022

Fetching updated TLD lists is already disabled, so the default TLD list that is included with the library will always be used and there is nothing to cache. Explicitly disabling the cache prevents possible occurrences of john-kurkowski/tldextract#254.


I'm running into john-kurkowski/tldextract#254 when using pypac 0.16.0 on Windows Server 2016.

INFO  2022-11-08T22:06:29.184 Traceback (most recent call last):
  ...my code removed...
  File "C:\Program Files\...\pypac\api.py", line 86, in get_pac
    pac_candidate_urls = collect_pac_urls(from_os_settings=True, from_dns=from_dns)
  File "C:\Program Files\...\pypac\api.py", line 126, in collect_pac_urls
    pac_urls.extend(proxy_urls_from_dns())
  File "C:\Program Files\...\pypac\wpad.py", line 40, in proxy_urls_from_dns
    parsed = no_fetch_extract(local_hostname)
  File "C:\Program Files\...\tldextract\tldextract.py", line 213, in __call__
    return self.extract_str(url, include_psl_private_domains)
  File "C:\Program Files\...\tldextract\tldextract.py", line 228, in extract_str
    return self._extract_netloc(lenient_netloc(url), include_psl_private_domains)
  File "C:\Program Files\...\tldextract\tldextract.py", line 257, in _extract_netloc
    suffix_index = self._get_tld_extractor().suffix_index(
  File "C:\Program Files\...\tldextract\tldextract.py", line 302, in _get_tld_extractor
    fallback_to_snapshot=self.fallback_to_snapshot,
  File "C:\Program Files\...\tldextract\suffix_list.py", line 76, in get_suffix_lists
    hashed_argnames=["urls", "fallback_to_snapshot"],
  File "C:\Program Files\...\tldextract\cache.py", line 206, in run_and_cache
    with FileLock(lock_path, timeout=self.lock_timeout):
  File "C:\Program Files\...\filelock\_api.py", line 220, in __enter__
    self.acquire()
  File "C:\Program Files\...\filelock\_api.py", line 183, in acquire
    raise Timeout(self._lock_file)
filelock._error.Timeout: The file lock 'C:\Program Files\...\tldextract\.suffix_cache/publicsuffix.org-tlds\906337bdfc421126a1477ade77793840.tldextract.json.lock' could not be acquired.

Disabling the cache is a fix suggested in that issue, and it makes sense to include in this library: pypac doesn't use the dynamic TLD list feature of tldextract, so there is nothing to cache.

https://github.com/john-kurkowski/tldextract#note-about-caching

Once we made this change locally, pypac worked as expected.
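
For anyone hitting the same timeout, here is a minimal sketch of the kind of change involved. It assumes tldextract 3.x, where passing `cache_dir=None` turns off the disk cache as described in the README section linked above. The helper names `no_fetch_no_cache_kwargs` and `build_extractor` are hypothetical and only for illustration; this is not the exact pypac diff.

```python
"""Sketch: a tldextract extractor with both fetching and caching turned off."""


def no_fetch_no_cache_kwargs():
    # suffix_list_urls=() -> never download a fresh suffix list, so the
    #   snapshot bundled with tldextract is used.
    # cache_dir=None -> skip the on-disk cache, and with it the file lock
    #   that timed out in the traceback above.
    return {"suffix_list_urls": (), "cache_dir": None}


def build_extractor():
    # Imported lazily so this module still loads where tldextract is absent.
    import tldextract  # third-party: pip install tldextract

    return tldextract.TLDExtract(**no_fetch_no_cache_kwargs())


if __name__ == "__main__":
    extract = build_extractor()
    # "host.corp.example.com" is a made-up hostname for illustration.
    parsed = extract("host.corp.example.com")
    print(parsed.subdomain, parsed.domain, parsed.suffix)
```

With the cache disabled, nothing should be written under the `.suffix_cache` directory, so no lock file has to be acquired in the install location.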

mpkuth and others added 2 commits November 8, 2022 17:06
Fetching updated TLD lists is already disabled, so the default TLD list
that is included with the library will always be used and there is
nothing to cache. Explicitly disabling the cache prevents possible
occurrences of john-kurkowski/tldextract#254.

carsonyl (Owner) commented Nov 9, 2022

Thanks! I'll take your word for it, as I don't have a scenario to repro this issue.

@carsonyl carsonyl merged commit 9cfc959 into carsonyl:master Nov 9, 2022
@mpkuth mpkuth deleted the disable-tld-cache branch November 9, 2022 17:20