Incorrect tld for appspot.com (support excluding private domains on lookup) #19

Closed
davefd opened this issue Oct 16, 2012 · 11 comments

Comments

@davefd

davefd commented Oct 16, 2012

The Mozilla Public Suffix List serves multiple use cases, and as a result sites such as "appspot.com" appear on the list and are returned as a tld instead of being split into domain="appspot" and tld="com".

This is perfectly reasonable behavior for some use cases, but for others it would be helpful to have the "private" domains be excluded. Mozilla has split the list into "ICANN Domains" and "Private Domains", and it would be useful to optionally be able to exclude the private domains so that sites like "appspot.com" would have their tld reflected as "com".
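To illustrate the split described above, here is a minimal sketch of separating the two sections of a PSL-formatted text. It relies on the `===BEGIN ICANN DOMAINS===` and `===BEGIN PRIVATE DOMAINS===` comment markers found in the published list. The `SAMPLE_PSL` data and the `split_psl` helper are hypothetical names made up for this example, not part of tldextract.

```python
# Hypothetical sample in Public Suffix List format (a tiny excerpt-like stand-in).
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
com
co.uk
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
// Google, Inc.
appspot.com
blogspot.com
// ===END PRIVATE DOMAINS===
"""


def split_psl(text):
    """Return (icann, private) sets of suffix rules from PSL-formatted text."""
    icann, private = set(), set()
    current = None  # the set that rules are currently being added to
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("// ===BEGIN ICANN DOMAINS==="):
            current = icann
        elif line.startswith("// ===BEGIN PRIVATE DOMAINS==="):
            current = private
        elif line.startswith("// ===END"):
            current = None
        elif line and not line.startswith("//") and current is not None:
            # Any non-comment, non-blank line inside a section is a rule.
            current.add(line)
    return icann, private


icann_suffixes, private_suffixes = split_psl(SAMPLE_PSL)
```

With the sections separated, a lookup that only consults `icann_suffixes` would treat "appspot.com" as domain="appspot", tld="com".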

@john-kurkowski
Owner

I wholeheartedly agree. We should split the list and exclude private domains by default. "appspot.com" as a TLD does not fit the definition laid out in tldextract.

I also see the utility of the private domain list, and have long wondered how to let tldextract users define their own private domains. For some use cases similar to appspot.com, blogspot.com also could be a TLD. It would be a nice-to-have if the fix for this let users provide custom private domains, as well as the option to use the PSL's.

@john-kurkowski
Owner

I really want this feature. Reflecting a bit though, it's debatable if this should be the default in 1.x.

@fletom

fletom commented Apr 28, 2014

In my opinion, this should be the default. Your package is called "tldextract", and these public suffixes are not TLDs. An include_public_suffixes = False flag should do the trick.

From what I can see, it looks like this has confused many people trying to use this package, which further proves my point.

@pranavsharma

Has this issue been fixed? I just downloaded the latest version and I'm still seeing the issue. Thanks.

@john-kurkowski
Owner

Nope, this is still very much an issue: tldextract reads in the entire PSL as is. I welcome a PR along the lines of the suggestion in my comment above.

@soggychips

Just curious about the status of this. Are we waiting for someone to write a pull request? Is it in progress? Thanks! =)

@john-kurkowski
Owner

Nope, no progress that I'm aware of. I don't have the time or urge to do it. Again, I'd welcome a PR along the lines of my suggestion above.

@soggychips

Gotcha. I'm afraid I can't use this package with full confidence until then, seeing as it doesn't quite work out of the box.

I'll take a look at the code and your suggestion, see what I can come up with.

john-kurkowski added a commit that referenced this issue Sep 9, 2014
Set include_psl_private_domains=True and update your PSL snapshot to revert to the old behavior.
@john-kurkowski
Owner

Fixed.

I forewent any custom inclusions/exclusions at runtime, since that complicates caching and the upstream contents. Nobody was clamoring for it anyway. You can always point to a custom PSL URL, besides.
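For readers wondering what the include_psl_private_domains flag changes in practice, here is a toy sketch of the behavior. It is not tldextract's implementation: the hardcoded suffix sets and the `split_host` helper are assumptions made up for illustration, and it ignores the PSL's wildcard and exception rules.

```python
# Hypothetical, hardcoded stand-ins for the two PSL sections.
ICANN_SUFFIXES = {"com", "co.uk"}
PRIVATE_SUFFIXES = {"appspot.com", "blogspot.com"}


def split_host(host, include_psl_private_domains=False):
    """Return (subdomain, domain, suffix) using longest-suffix matching."""
    suffixes = set(ICANN_SUFFIXES)
    if include_psl_private_domains:
        suffixes |= PRIVATE_SUFFIXES

    labels = host.lower().split(".")
    # Walk from the longest candidate suffix to the shortest.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            domain = labels[i - 1] if i > 0 else ""
            subdomain = ".".join(labels[:i - 1]) if i > 1 else ""
            return subdomain, domain, candidate
    # No known suffix: report the whole host as the domain.
    return "", host.lower(), ""


default_result = split_host("myapp.appspot.com")
old_behavior = split_host("myapp.appspot.com", include_psl_private_domains=True)
```

With the flag off, "appspot" comes back as the domain and "com" as the suffix. With it on, "appspot.com" is the suffix, which matches the behavior originally reported in this issue.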

@john-kurkowski
Owner

This long-awaited fix is now in the wild in 1.5.

@soggychips

Awesome, thanks a lot, John!

5 participants