This project is done under the BDMA first semester at Université Libre de Bruxelles
Create a PostgreSQL (Postgres) extension that provides a Uniform Resource Locator URL data type. A URL represents a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine. A URL has a structure, which is standardized by W3C.
The goal of this project is to create an equivalent type to java.net.URL class in Postgres. More information about the java class can be found here
This project is done by:
- Ali AbuSaleh
- Muhammad Rizwan Khalid
- Ahmad
- Prashant Gupta
Under supervision of professor Mahmoud Sakr
- liburiparser library
- PostgreSQL
- Python
-
Constractor methods are
'www.google.com'::url
url_in(URL context, varchar spec)
url_in(varchar protocol, varchar host, varchar file)
url_in(varchar protocol, varchar host, int port, varchar file)
-
Casting
Cast ( text as url)
'text'::url
-
Operators
- operator (=) : compare Hosts Only.
-
Other methods
sameHost(url, url)
:Compare only hosts of the URLs
sameFile(url, url)
:Compare whole URLs without fragment
equals(url, url)
:Compare whole URLs including fragment
getPort(url)
:Get the port fir a Given URL.
getDefaultPort(url)
:Get Default Port for a given URL.
getHost(url)
:Get the Host for a given URL.
getProtocol(url)
:Get the protocol for a given URL.
getQuery(url)
Get the query for a given URL.
getUserInfo(url)
Get UserInfo for a given URL.
getRef(url)
Get the reference for a given URL.
getPath(url)
Get the path for a given URL.
getFile(url)
Get the file for a given URL.
getAuthority(url)
Get the authority for a given URL.
to_string(url)
cast URL to string.
$ pip install uniparser
$ git clone https://github.com/aliabusaleh/bdma-dbsa-url-postgres.git
$ cd bdma-dbsa-url-postgres
$ sudo make
$ sudo make install
Login to Postgres using your creds
postgres=# create extension url;
CREATE EXTENSION
using testing.sql
postgres=# create table test(arg url);
CREATE TABLE
postgres=#
postgres=# insert into test(arg) values (url_in('https', 'www.yahoo.com', 32, 'abc.png'));
INSERT 0 1
postgres=# insert into test select url_in(arg, '/test') from test;
INSERT 0 1
postgres=#
postgres=# insert into test values ('https://www.google.com?q=a&c=4');
INSERT 0 1
postgres=# insert into test values ('https://abc:xyz@www.yes.com/abc/xyz?s=3#njcdncnd');
INSERT 0 1
postgres=# insert into test values('www.abc.com');
INSERT 0 1
postgres=# insert into test values('www.abc.com/33');
INSERT 0 1
postgres=# insert into test values('www.google.com');
INSERT 0 1
postgres=# insert into test values('www.google.com/path1/path2');
INSERT 0 1
postgres=# insert into test values('www.google.com/path1/path2/file.html');
INSERT 0 1
postgres=# insert into test values('www.google.com/path1/path2/file.html#2');
INSERT 0 1
postgres=# insert into test values('www.google.com/path1/path2/file.html#2');
INSERT 0 1
postgres=# insert into test values('www.google.com/path1/path2/path3/file.html#2');
INSERT 0 1
postgres=#
postgres=# create index arg_idx on test(arg);
CREATE INDEX
postgres=# select * from test ;
arg
--------------------------------------------------
https://www.yahoo.com:32/abc.png
https://www.yahoo.com:32/abc.png/test
https://www.google.com?q=a&c=4
https://abc:xyz@www.yes.com/abc/xyz?s=3#njcdncnd
www.abc.com
www.abc.com/33
www.google.com
www.google.com/path1/path2
www.google.com/path1/path2/file.html
www.google.com/path1/path2/file.html#2
www.google.com/path1/path2/file.html#2
www.google.com/path1/path2/path3/file.html#2
(12 rows)
postgres=# set enable_seqscan=False; -- disable sequantal scan
SET
postgres=#
postgres=# select * from test where arg = 'www.abc.com'; -- check host only
arg
----------------
www.abc.com
www.abc.com/33
(2 rows)
postgres=# select * from test where arg = 'www.abc.com/22'; -- check host only
arg
----------------
www.abc.com
www.abc.com/33
(2 rows)
postgres=#
postgres=# select * from test where sameHost(arg, 'www.abc.com'); -- check only host
arg
----------------
www.abc.com
www.abc.com/33
(2 rows)
postgres=# select * from test where sameHost(arg, 'www.abc.com/newpath/'); -- check only host
arg
----------------
www.abc.com
www.abc.com/33
(2 rows)
postgres=# select * from test where sameHost(arg, 'www.google.com/newpath/pathtest'); -- check only host
arg
----------------------------------------------
https://www.google.com?q=a&c=4
www.google.com
www.google.com/path1/path2
www.google.com/path1/path2/file.html
www.google.com/path1/path2/file.html#2
www.google.com/path1/path2/file.html#2
www.google.com/path1/path2/path3/file.html#2
(7 rows)
postgres=# select * from test where sameHost(arg, 'www.yes.com'); -- check only host
arg
--------------------------------------------------
https://abc:xyz@www.yes.com/abc/xyz?s=3#njcdncnd
(1 row)
postgres=#
postgres=# select * from test where sameFile(arg, 'www.google.com/path1/path2/home.html'); -- check file
arg
-----
(0 rows)
postgres=# select * from test where sameFile(arg, 'www.google.com/path1/path2/file.html#4'); -- file only ignore fragment
arg
----------------------------------------
www.google.com/path1/path2/file.html
www.google.com/path1/path2/file.html#2
www.google.com/path1/path2/file.html#2
(3 rows)
postgres=# select * from test where sameFile(arg, 'www.google.com/path1/path2/path3/file.html#66'); -- file only ignore fragment
arg
----------------------------------------------
www.google.com/path1/path2/path3/file.html#2
(1 row)
postgres=# select * from test where sameFile(arg, 'https://abc:xyz@www.yes.com/abc/xyz?s=3#22newfrag'); -- file only ignore fragment ( case diff fragment )
arg
--------------------------------------------------
https://abc:xyz@www.yes.com/abc/xyz?s=3#njcdncnd
(1 row)
postgres=# select * from test where sameFile(arg, 'https://abc:xyz@www.yes.com/abc/xyz?s=newquery'); -- -- file only ignore fragment ( case diff file )
arg
-----
(0 rows)
postgres=#
postgres=#
postgres=# select * from test where equals(arg, 'www.abc.com/22'); -- check all url
arg
-----
(0 rows)
postgres=# select * from test where equals(arg, 'www.google.com/path1'); -- check all url
arg
-----
(0 rows)
postgres=# select * from test where equals(arg, 'www.abc.com/33'); -- check all url
arg
----------------
www.abc.com/33
(1 row)
postgres=#
To easily install and use the extension without worrying about platform. Please follow below steps
- Install docker in your system. Please follow the below link to install docker in windows and mac respectively
-
Move to the directory of source code of extension
-
Run Below command to start docker container
docker-compose up -d
- After start of docker container. Enter the docker container.
docker exec -it postgre_ext bash
- In docker container, start the postgres process.
pg_ctlcluster 14 main start
- Build the extension with
make
command
make
make install
- Setup extension in postgres
sudo -i -u postgres
psql postgres
- Inside postgres install the extension
create extension url;
Now extension is installed and ready to be used.
Feel free to contact us for more questions regarding this extension