BeameryHQ/url-utils

URI Swiss Army Knife Utilities

You can install this via the published NPM package:

npm i beam-uri

URL Validation

A complete definition of what constitutes a valid URL can be found in RFC 3986 and RFC 3987. The short version is that a valid URL must, at minimum, consist of a scheme (for example https://, ftp://, or gopher://) and a host name. If it does not, validation should fail, and the browser should throw an error.
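As a rough illustration of that minimum requirement, the sketch below uses the WHATWG `URL` class built into Node.js. The helper name `hasSchemeAndHost` is hypothetical and is not part of this package; it only checks for a scheme and a host name, not full RFC conformance.

```javascript
// Sketch only: checks the two minimum requirements named above
// (a scheme and a host name) with the built-in WHATWG URL parser.
// hasSchemeAndHost is a hypothetical helper, not an export of this package.
function hasSchemeAndHost(input) {
  let parsed;
  try {
    parsed = new URL(input); // throws a TypeError when no scheme is present
  } catch (err) {
    return false;
  }
  // Schemes such as mailto: parse successfully but carry no host name.
  return parsed.protocol.length > 1 && parsed.hostname.length > 0;
}

console.log(hasSchemeAndHost("https://example.com/path"));   // true
console.log(hasSchemeAndHost("example.com/path"));           // false
console.log(hasSchemeAndHost("mailto:someone@example.com")); // false
```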

A URL string is a structured string containing multiple meaningful components. When parsed, a URL object is returned containing properties for each of these components.

The Node.js url module provides two APIs for working with URLs: a legacy API that is Node.js specific, and a newer API that implements the same WHATWG URL Standard used by web browsers.

┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              href                                              │
├──────────┬──┬─────────────────────┬────────────────────────┬───────────────────────────┬───────┤
│ protocol │  │        auth         │          host          │           path            │ hash  │
│          │  │                     ├─────────────────┬──────┼──────────┬────────────────┤       │
│          │  │                     │    hostname     │ port │ pathname │     search     │       │
│          │  │                     │                 │      │          ├─┬──────────────┤       │
│          │  │                     │                 │      │          │ │    query     │       │
"  https:   //    user   :   pass   @ sub.example.com : 8080   /p/a/t/h  ?  query=string   #hash "
│          │  │          │          │    hostname     │ port │          │                │       │
│          │  │          │          ├─────────────────┴──────┤          │                │       │
│ protocol │  │ username │ password │          host          │          │                │       │
├──────────┴──┼──────────┴──────────┼────────────────────────┤          │                │       │
│   origin    │                     │         origin         │ pathname │     search     │ hash  │
├─────────────┴─────────────────────┴────────────────────────┴──────────┴────────────────┴───────┤
│                                              href                                              │
└────────────────────────────────────────────────────────────────────────────────────────────────┘
(all spaces in the "" line should be ignored — they are purely for formatting)
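To make the diagram concrete, the following sketch parses the same example URL with the WHATWG `URL` class and collects the properties shown on the lower half of the diagram. The upper half (`auth`, `path`, `query`) belongs to the legacy `url.parse()` API and is not shown here. The variable names are illustrative only.

```javascript
// Parse the example URL from the diagram with the WHATWG URL API.
const parsed = new URL(
  "https://user:pass@sub.example.com:8080/p/a/t/h?query=string#hash"
);

// Collect the components named on the lower half of the diagram.
const parts = {
  protocol: parsed.protocol, // "https:"
  username: parsed.username, // "user"
  password: parsed.password, // "pass"
  host: parsed.host,         // "sub.example.com:8080"
  hostname: parsed.hostname, // "sub.example.com"
  port: parsed.port,         // "8080"
  origin: parsed.origin,     // "https://sub.example.com:8080"
  pathname: parsed.pathname, // "/p/a/t/h"
  search: parsed.search,     // "?query=string"
  hash: parsed.hash,         // "#hash"
};

console.log(parts);
```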

getDomain(url) ⇒ String

We can extract the domain from a URL by leveraging our method for parsing the hostname. Since getHostName() gets us very close to a solution, we just need to remove the sub-domain and clean up special cases (such as .co.uk).

Returns: String - the extracted domain
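The idea can be sketched as follows. This is not the package's implementation: `extractDomain` and the short `TWO_LABEL_SUFFIXES` list are assumptions made for illustration, and a production version would more likely consult the Public Suffix List.

```javascript
// Hypothetical, deliberately tiny list of two-label suffixes such as .co.uk.
const TWO_LABEL_SUFFIXES = new Set(["co.uk", "org.uk", "com.au", "co.jp"]);

// Sketch: drop sub-domains from the hostname, keeping one extra label
// when the hostname ends in a known two-label suffix.
// Does not handle IP-address hosts.
function extractDomain(url) {
  const labels = new URL(url).hostname.split(".");
  if (labels.length <= 2) {
    return labels.join(".");
  }
  const lastTwo = labels.slice(-2).join(".");
  const keep = TWO_LABEL_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(extractDomain("https://www.sub.example.com/page"));  // "example.com"
console.log(extractDomain("https://blog.example.co.uk/post"));   // "example.co.uk"
```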

getDomainName(url) ⇒ String

Extracts the main domain name without its trailing domain suffix (the ".com"-style ending).

Returns: String - the extracted domain

getHostName(url) ⇒ String

Extracting the hostname from a URL is generally easier than parsing the domain. The hostname consists of the entire domain plus any sub-domain. We can parse it with a regular expression that captures everything to the right of the double slash, up to the next delimiter. We also remove the “www” prefix (and any associated integers, e.g. www2), as it is typically not needed when parsing the hostname from a URL.

Returns: String - the extracted hostname
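A regular-expression approach of the kind described above might look like the sketch below. The pattern and the name `extractHostName` are assumptions for illustration, not the package's actual code, and bracketed IPv6 hosts are not handled.

```javascript
// Sketch: capture the host portion that follows "scheme://", skipping any
// "user:pass@" credentials and stopping at a slash, colon, "?" or "#".
const HOST_PATTERN = /^[a-z][a-z0-9+.-]*:\/\/(?:[^@\/?#]*@)?([^\/:?#]+)/i;

function extractHostName(url) {
  const match = HOST_PATTERN.exec(url);
  if (match === null) {
    return ""; // no scheme and double slash found
  }
  // Remove a leading "www", "www2", "www3", ... label.
  return match[1].toLowerCase().replace(/^www\d*\./, "");
}

console.log(extractHostName("https://www2.sub.example.com:8080/p?x=1")); // "sub.example.com"
console.log(extractHostName("http://user:pass@www.example.com/"));       // "example.com"
```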

getLinkType(source) ⇒ String

Identifies whether the link points to a social media website.

Kind: global function
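One plausible way to classify a link is to match its hostname against a table of known social sites. Everything in the sketch below is an assumption: the `SOCIAL_HOSTS` table, the `classifyLink` name, and the returned labels are illustrative and may differ from what getLinkType actually returns.

```javascript
// Hypothetical table mapping social-site domains to a label.
const SOCIAL_HOSTS = new Map([
  ["twitter.com", "twitter"],
  ["linkedin.com", "linkedin"],
  ["facebook.com", "facebook"],
  ["github.com", "github"],
]);

// Sketch: return the matching label, "other" for non-social links,
// or "invalid" when the input cannot be parsed as a URL.
function classifyLink(source) {
  let hostname;
  try {
    hostname = new URL(source).hostname.toLowerCase();
  } catch (err) {
    return "invalid";
  }
  for (const [domain, label] of SOCIAL_HOSTS) {
    // Match the domain itself or any of its sub-domains.
    if (hostname === domain || hostname.endsWith("." + domain)) {
      return label;
    }
  }
  return "other";
}

console.log(classifyLink("https://www.linkedin.com/in/someone")); // "linkedin"
console.log(classifyLink("https://example.com/about"));           // "other"
```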

isValidIP(ip) ⇒ Boolean

Validate if a passed string is a valid IP according to: http://jsfiddle.net/AJEzQ/

Returns: Boolean - whether the string is a valid IP address
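For illustration, a dotted-quad IPv4 check can be written as below. This is a sketch under the assumption that only IPv4 is being checked; the pattern is not taken from the linked fiddle, and `looksLikeIPv4` is a hypothetical name.

```javascript
// One octet: 0-255 with no leading zeros.
const IPV4_OCTET = "(?:25[0-5]|2[0-4]\\d|1\\d\\d|[1-9]?\\d)";
// Four octets separated by dots, anchored to the whole string.
const IPV4_PATTERN = new RegExp(`^${IPV4_OCTET}(?:\\.${IPV4_OCTET}){3}$`);

function looksLikeIPv4(ip) {
  return typeof ip === "string" && IPV4_PATTERN.test(ip);
}

console.log(looksLikeIPv4("192.168.0.1")); // true
console.log(looksLikeIPv4("256.1.1.1"));   // false
console.log(looksLikeIPv4("1.2.3"));       // false
```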

isValidURI(url) ⇒ Boolean

Validate if a passed string is a valid URI according to: https://gist.github.com/dperini/729294

Returns: Boolean - whether the string is a valid URI

normalize(url) ⇒ String

Normalizes and canonicalises URLs, including data URLs. The function first normalizes the URL by performing various steps, from lower-casing to encoding. It then strips any URL trackers and padding. Finally, it tries to canonicalise the URL where possible, based on configuration that depends on the domain name.

Returns: String - the normalized and canonical url
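The first stage (lower-casing and encoding) can be approximated with the WHATWG `URL` parser, as in the sketch below. It is not the package's implementation: `normalizeSketch` is a hypothetical name, sorting the query parameters is an added assumption, and the domain-specific canonicalisation step is omitted.

```javascript
// Sketch of basic normalization using the built-in URL parser.
function normalizeSketch(input) {
  // Trim surrounding whitespace (padding) before parsing.
  const url = new URL(input.trim());
  // The parser has already lower-cased the scheme and host, dropped the
  // default port, and percent-encoded characters such as spaces in the path.
  // Assumption: sort query parameters so equivalent URLs compare equal.
  url.searchParams.sort();
  return url.href;
}

console.log(normalizeSketch("  HTTPS://WWW.Example.COM:443/a b?z=1&a=2 "));
// "https://www.example.com/a%20b?a=2&z=1"
```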

removeURLTracking(url) ⇒ String

Removes tracking query parameters from the URL.

Returns: String - the URL after tracking parameters have been stripped
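A minimal sketch of tracker stripping is shown below. The parameter list (`utm_*`, `gclid`, `fbclid`) and the name `stripTracking` are assumptions chosen for illustration; the package's own list of trackers may differ.

```javascript
// Hypothetical set of tracking parameters, in addition to the utm_ prefix.
const TRACKING_PARAMS = new Set(["gclid", "fbclid"]);

function stripTracking(input) {
  const url = new URL(input);
  // Copy the keys first so that deleting does not disturb iteration.
  for (const key of [...url.searchParams.keys()]) {
    const lower = key.toLowerCase();
    if (lower.startsWith("utm_") || TRACKING_PARAMS.has(lower)) {
      url.searchParams.delete(key);
    }
  }
  return url.href;
}

console.log(stripTracking("https://example.com/page?utm_source=news&id=42&fbclid=abc#top"));
// "https://example.com/page?id=42#top"
```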

parse(url) ⇒ Object

Parses a valid URI into its subparts

Returns: Object - the parsed url

References