Skip to content
This repository has been archived by the owner on Jun 17, 2022. It is now read-only.

Improve hostname and domain retrieval #547

Closed
wants to merge 29 commits into from

Conversation

djsmith85
Copy link
Contributor

@djsmith85 djsmith85 commented Nov 11, 2021

Type of change

  • Bug fix
  • New feature development
  • Tech debt (refactoring, code cleanup, dependency upgrades, etc)
  • Build/deploy pipeline (DevOps)
  • Other

Objective

Improve the retrieval and validation of hostnames and domains, which affects url matching/auto-fill
Newly supported scenarios:

  • IPv6
  • subdomains with an underscore
  • Fixes a scenario where IPv4 addresses would not be returned as they got prefixed with http:// (Regex fails)
  • Fully up-to-date public-suffix list instead of relying on a regex only including the most common tlds
  • Support for parsing urls with umlauts

If for any reason the new detection does not work, I've included a fallback to the original parsing logic.

Marking this as draft:
As this affects a core part of the product/clients, I'd like to discuss the points below with the team.

  • I'd like to discuss the presence of the fallback to the original logic
    • If we can get rid of it, we could remove tldjs as a dependency
    • Remove tldjs.noop.ts
    • Clean up the tsconfig.json in some repos that map tldjs to tldjs.noop.ts
  • The need for a tldjs.noop.ts or for that matter tldts.noop.ts

Asana task: https://app.asana.com/0/1153292148278596/1201431275009049/f

Closes bitwarden/web#545

Could possibly improve or fix the following issues;

Code changes

  • spec/common/misc/utils.spec.ts: Added several new test cases
  • common/package.json: Added new dependency tldts to parse urls
  • common/package-lock.json: Updates after adding tldts and running npm i
  • package-lock.json: Updates after adding tldts and running npm i
  • common/src/misc/utils.ts: Added several new test cases
  • common/src/misc/tldts.noop.ts: Included method overrides similar to existing tldjs.noop.ts

Testing requirements

For jslib itself there won't be much to test/verify other than the present unit tests.
The following PR's for the affected repos will have a lot of areas to test. Will detail them there once this is ready for merge.

Before you submit

  • I have checked for linting errors (npm run lint) (required)
  • I have added unit tests where it makes sense to do so (encouraged but not required)
  • This change requires a documentation update (notify the documentation team)
  • This change has particular deployment requirements (notify the DevOps team)

@eliykat
Copy link
Member

eliykat commented Nov 24, 2021

May also fix bitwarden/clients#2177. Would need to check what method the Excluded Domains page calls during validation.

Copy link
Member

@eliykat eliykat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @djsmith85. See comments below. In answer to your discussion questions:

  • let's deprecate tldjs and the old tldjs logic. We seem to have lots of test case coverage here, that lets us be aggressive in our refactoring. There's no point keeping around old stuff "just in case".
  • let's try to get rid of the noop implementation by moving this to a separate class and only using it in the clients that require it. Utils is supposed to be shared by all Typescript clients, stubbing out dependencies with noop implementation is a bad pattern imo.

@@ -1,4 +1,5 @@
import * as tldjs from 'tldjs';
import * as tldts from 'tldts';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to import it this way when it's a native Typescript project? We should be able to use the usual import format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessarily, but I did do it this way, as the getHostname method clashes with ours in utils.ts

@@ -0,0 +1,7 @@
export function parse(url: string, options?: object): null {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume we have a noop version so that other clients can use the Utils class without needing this dependency. But this seems like a bad pattern. Can we extract this out to a new class/service instead, so that it's only imported by the clients that need it, it doesn't interfere with Utils, and we don't need the noop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment here: #547 (comment)

common/src/misc/utils.ts Outdated Show resolved Hide resolved
return null;
}

uriString = uriString.trim();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd include this in the previous if block just for brevity:
`if (uriString == null || uriString.trim() == '')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling trim() was actually important before executing the old parsing logic, as it would throw an exception if it was not trimmed beforehand

if (tldts.getHostname != null) {
// Does uriString contain invalid characters
// TODO Needs to possibly be extended, although '!' is a reserved character
if (uriString.indexOf('!') > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can tldts do this kind of validation for us? Crafting regexes here seems like a losing battle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only reason I included is, a test would be failing as it's assumed to be an invalid url. The issue is the exclamation mark is actually a reserved character, it's just not commonly used. So I'd prefer to adjust the test and get rid of this check.

return null;
}

if (tldts.getHostname != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is a holdover pattern from tldjs. I don't know if/why we need this, I suspect it's a holdover from using a js library and/or the noop implementation. If you move this into its own class then I think you can remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is just a holdover in case we'd use the tldts.noop. Has been removed in a recent push.

return hostname;
}
}

const url = Utils.getUrl(uriString);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably don't need this if we're removing the if statement on line 201, right? I assume it's a fallback in case tldjs/ts isn't present

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a fallback and I've removed it.

expect(Utils.getDomain('https://my_vault.bitwarden.com/')).toBe('bitwarden.com');
});

it('should support urls containing umlauts', () => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think these are valid test cases because domain names do not actually contain unicode characters. If we want the domain name from a string containing unicode, I think it should do the punycode translation in order to get the "real" domain name. e.g.

expect(Utils.getDomain('bütwarden.com')).toBe('xn--btwarden-65a.com');

However, this does depend on what components/services are using this method. There may be some situations where we want to display the unicode in the GUI. But for behind the scenes domain matching logic, we always need the punycode. I suggest doing some investigation here.

spec/common/misc/utils.spec.ts Outdated Show resolved Hide resolved
@djsmith85
Copy link
Contributor Author

@eliykat Thank you for taking the time to go through this and also the great feedback. I've decided to ditch the noop's and also the fallback logic. Pulling this stuff into a separate class/service would be great but it is all static and it's also being used inside our models (loginUriView). So no way to inject that.

I'll need to have a look into for example the web repo and why we are overriding tldjs with the noop. I don't really see any good value/reason for it anymore.

@Hinton
Copy link
Member

Hinton commented Jun 3, 2022

Hi,

We are currently in the process of archiving this repository and moving the code and future development to the bitwarden/clients repository. This PR will be closed but feel free to re-open it again on bitwarden/clients.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants