Code Documentation

FlyingWolFox edited this page Feb 14, 2024 · 6 revisions

Netscape Bookmarks File Parser

The parser analyzes the file line by line and uses recursion to build the bookmark folder tree. The creator does the reverse, turning a bookmark tree back into a file.

There are three modules, plus the exceptions module: __init__.py contains all the classes, parser.py contains the parser, and creator.py contains the creator.

Classes

The classes are in __init__.py. The module also has the variable non_parsed, a dictionary containing every line that was ignored by the parser. It is kept in sync with the variable of the same name in the NetscapeBookmarksFile class.

The NetscapeBookmarksFile class

Represents the Netscape Bookmark File.

Variables:

html            # the bookmark file in a string
bookmarks       # BookmarkFolder object containing the bookmark tree
non_parsed      # lines of the file that haven't been parsed, synced with the global variable
doc_type        #\
http_equiv_meta # \ header info of the bookmark file
content_meta    # /
title           #/ 

Without any other module imported, this class doesn't have any methods. Importing the parser module adds the parse() method to the class, and importing the creator module adds the create_file() method.

The BookmarkItem class

Represents an item in the bookmarks. An item can be a folder or a shortcut (or a feed or web slice). Its attributes are:

num                 # the position of the item in the folder it's in
add_date_unix       # the creation date of the item in unix time
last_modified_unix  # the last modification date of the item in unix time
parent              # the parent folder of the item. Only the root folder has this set to None
name                # name of the item

BookmarkItem is a data class (data classes were introduced in Python 3.7). It doesn't have any methods.
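
As a rough illustration of what a data class with these attributes could look like, here is a minimal sketch. The type annotations and default values are assumptions made for the example and are not taken from the package source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BookmarkItem:
    # Types and defaults below are assumed for this sketch.
    num: int = 0
    add_date_unix: int = 0
    last_modified_unix: int = 0
    parent: Optional["BookmarkItem"] = None
    name: str = ""


# The dataclass decorator generates __init__, __repr__ and __eq__,
# so two items with the same field values compare equal.
item = BookmarkItem(name="Docs", add_date_unix=1530184751)
same = BookmarkItem(name="Docs", add_date_unix=1530184751)
print(item == same)
```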

The BookmarkFolder class

Represents a folder in the bookmarks. It's a subclass of BookmarkItem and also a data class.

Variables:

personal_toolbar    # true if the folder is the bookmarks toolbar
items               # list that contains all items inside this folder
children            # list that contains all subfolders inside this folder
shortcuts           # list that contains all shortcuts inside this folder

Methods:

sync_items()                    # clears and fills the items list with the contents of the children and shortcuts lists
split_items()                   # splits the items list between the children and shortcuts
sort_items()                    # sorts the items list
sort_children_and_shortcuts()   # sorts the children and shortcuts lists
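
The relationship between the three lists can be sketched as follows. This is a simplified stand-in, not the package's code: the classes are trimmed to a few fields, and sorting by num is an assumption about what sort_items() orders by.

```python
from dataclasses import dataclass, field


@dataclass
class BookmarkItem:
    num: int = 0
    name: str = ""


@dataclass
class BookmarkShortcut(BookmarkItem):
    href: str = ""


@dataclass
class BookmarkFolder(BookmarkItem):
    items: list = field(default_factory=list)
    children: list = field(default_factory=list)
    shortcuts: list = field(default_factory=list)

    def sync_items(self):
        # Clear items and refill it from the two typed lists.
        self.items.clear()
        self.items.extend(self.children)
        self.items.extend(self.shortcuts)

    def split_items(self):
        # Rebuild the typed lists from the mixed items list.
        self.children = [i for i in self.items if isinstance(i, BookmarkFolder)]
        self.shortcuts = [i for i in self.items if isinstance(i, BookmarkShortcut)]

    def sort_items(self):
        # Assumption: items are ordered by their position number.
        self.items.sort(key=lambda i: i.num)


root = BookmarkFolder(name="root")
root.items = [BookmarkShortcut(num=1, name="b"), BookmarkFolder(num=0, name="a")]
root.split_items()
root.sort_items()
```

After split_items(), the subfolder ends up in children and the shortcut in shortcuts, while items still holds both.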

The BookmarkShortcut class

Represents a shortcut in the bookmarks. It's a subclass of BookmarkItem and also a data class.

Variables:

href                # link to the web page (or anything alike) of the shortcut
last_visit_unix     # date when the web page was last visited, in unix time
private             # equal to the PRIVATE attribute
tags                # tags of this shortcut, if present
icon_url_fake       # true if the ICON_URI attribute starts with fake-favicon-uri.
icon_url            # the favicon url if icon_url_fake is false and the attribute ICON_URI is present
icon_base64         # the format/encoding of the favicon and the encoded favicon data. Commonly a PNG image encoded with base64. The string here can be really big
feed                # true if the attribute FEED is present. Legacy support for feeds
web_slice           # true if the attribute WEBSLICE is present. Legacy support for web slices
comment             # comment of the shortcut, if present
shortcut_url        # the shortcut keyword associated with the shortcut, if set. Used by Firefox (see #8)

Doesn't have any methods.

The BookmarkFeed class

Represents a Feed in the bookmarks. It's a subclass of BookmarkShortcut and also a data class.

Variables:

feed        # overrides super and its value is True
feed_url    # feed url

Doesn't have any methods. This class exists for legacy support.

The BookmarkWebSlice class

Represents a Web Slice in the bookmarks. It's a subclass of BookmarkShortcut and also a data class.

Variables:

web_slice           # overrides super and its value is True
is_live_preview     # same value of the attribute ISLIVEPREVIEW
preview_size        # the preview size in the attribute PREVIEWSIZE. It's a string

Doesn't have any methods. This class exists for legacy support.

Modules

NetscapeBookmarksFileParser.parser

The module with functions to parse a Netscape Bookmarks File

Functions

attribute_extractor(inside: str) -> dict

Gets the attributes and their values from the tag content (attributes only) and returns them in a dictionary.

Ex:

tag = '<A ATTRIBUTE="value">text</A>'
inside = ' ATTRIBUTE="value"'
attribute_extractor(inside) -> {'ATTRIBUTE': 'value'}
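
One way such an extraction could work is with a regular expression over the attribute text. The sketch below is an illustration only: the name extract_attributes and the pattern are hypothetical, and it handles only attributes with double-quoted values, so valueless attributes are skipped.

```python
import re

# Matches NAME="value" pairs. Quoted values only (a limitation of this sketch).
ATTRIBUTE_PATTERN = re.compile(r'([A-Za-z_:][\w:.-]*)\s*=\s*"([^"]*)"')


def extract_attributes(inside: str) -> dict:
    """Return {name: value} for each NAME="value" pair found in the text."""
    return {name: value for name, value in ATTRIBUTE_PATTERN.findall(inside)}


print(extract_attributes(' HREF="http://www.google.com" ADD_DATE="1471007115"'))
```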

doc_type_extractor(tag: str):

Handles the !DOCTYPE tag and checks that it matches the expected value, printing a warning if it doesn't. Returns the content of !DOCTYPE.

Ex:

tag = '<!DOCTYPE NETSCAPE-Bookmark-file-1>'
doc_type_extractor(tag) -> 'NETSCAPE-Bookmark-file-1'
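
A minimal sketch of this check, under stated assumptions: the name extract_doc_type is hypothetical, the expected value is taken from the example above, and the warning is issued through the standard warnings module rather than printed.

```python
import re
import warnings

EXPECTED_DOCTYPE = "NETSCAPE-Bookmark-file-1"


def extract_doc_type(tag: str) -> str:
    """Return the content of a !DOCTYPE tag, warning if it is unexpected."""
    match = re.match(r"\s*<!DOCTYPE\s+([^>]+?)\s*>", tag, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("not a !DOCTYPE tag: " + tag)
    doc_type = match.group(1)
    if doc_type != EXPECTED_DOCTYPE:
        # Assumption: warn and carry on instead of aborting the parse.
        warnings.warn("unexpected doctype: " + doc_type)
    return doc_type


print(extract_doc_type("<!DOCTYPE NETSCAPE-Bookmark-file-1>"))
```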

folder_tag_extractor(tag: str):

Handles H3 tags, which represent folders. It creates a BookmarkFolder object and fills some of its variables with the H3 tag's info, such as the name and add date. Returns the BookmarkFolder created.

Ex:

tag = '<DT><H3 ADD_DATE="1530184751">Folder</H3>'
folder_tag_extractor(tag) -> BookmarkFolder(add_date_unix=1530184751, name='Folder')

shortcut_tag_extractor(tag: str, comment='')

Handles A tags, which usually represent shortcuts, together with their comments. It creates a BookmarkShortcut (or, rarely, one of its subclasses BookmarkFeed or BookmarkWebSlice) and fills some of its variables with the A tag's info, such as name, href and comment. Returns the created BookmarkShortcut (or subclass).

Ex:

tag = '<DT><A HREF="http://www.google.com" ADD_DATE="1471007115">Google</A>'
comment = 'Google!'
shortcut_tag_extractor(tag, comment) -> BookmarkShortcut(href='http://www.google.com', add_date_unix=1471007115, name='Google', comment='Google!')

shortcut_handler(line: int, a_tag: str, dd_tag: str = ''):

Handles shortcuts and their comments (the A and DD tags, respectively). It verifies that a_tag contains both the opening and the closing of the A tag and prints a warning if it doesn't, extracts the comment from the DD tag, and calls shortcut_tag_extractor(), returning the object it returns. The line argument is only used in the warning.

Ex:

line = 56
a_tag = '<DT><A HREF="http://www.google.com" ADD_DATE="1471007115">Google</A>'
dd_tag = '<DD>Google!'
shortcut_handler(line, a_tag, dd_tag) -> BookmarkShortcut(href='http://www.google.com', add_date_unix=1471007115, name='Google', comment='Google!')

folder_handler(line: int, h3_tag: str, body: list):

Handles a folder and its tree. It verifies that the H3 tag has both its opening and its closing, calls folder_tag_extractor(h3_tag), and processes the body recursively. Items inside the folder are turned into BookmarkShortcut objects (or a subclass) by calling shortcut_handler(). For each subfolder, the body is checked for the closing tag </DL><p>; if it isn't found, an exception is raised, because the bookmarks file doesn't have the same number of <DL><p> and </DL><p> tags, which wrap a folder body. The function then calls itself on the subfolder. It's responsible for filling num and parent in the BookmarkShortcut and BookmarkFolder objects, and items, shortcuts and children in the BookmarkFolder objects. Returns a BookmarkFolder object with every possible variable filled and the folder tree inside its items, shortcuts and children lists.

Ex:

line = 52
h3_tag = '<DT><H3 ADD_DATE="1530184751">Folder</H3>'
body = ['<DL><p>', '<DT><A HREF="http://www.google.com" ADD_DATE="1471007115">Google</A>', '</DL><p>']
folder_handler(line, h3_tag, body) -> BookmarkFolder(add_date_unix=1530184751, name='Folder', items=[BookmarkShortcut(href='http://www.google.com', add_date_unix=1471007115, name='Google')], shortcuts=[BookmarkShortcut(href='http://www.google.com', add_date_unix=1471007115, name='Google')])
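
The recursive idea can be illustrated with a stripped-down sketch. It is not the package's implementation: it builds plain dictionaries instead of BookmarkFolder and BookmarkShortcut objects, the names parse_folder and find_closing are hypothetical, and it assumes every tag sits on its own line with each subfolder's <DL><p> directly after its H3 line.

```python
import re

H3_PATTERN = re.compile(r"<H3[^>]*>(.*?)</H3>", re.IGNORECASE)
A_PATTERN = re.compile(r'<A[^>]*HREF="([^"]*)"[^>]*>(.*?)</A>', re.IGNORECASE)


def find_closing(body, start):
    """Return the index of the </DL> line closing the <DL> at body[start]."""
    depth = 0
    for index in range(start, len(body)):
        line = body[index].strip().upper()
        if line.startswith("<DL>"):
            depth += 1
        elif line.startswith("</DL>"):
            depth -= 1
            if depth == 0:
                return index
    # The counts of <DL><p> and </DL><p> do not match.
    raise ValueError("unbalanced <DL><p> and </DL><p>")


def parse_folder(name, body):
    """Turn a folder body (from <DL><p> to </DL><p>) into a nested dict."""
    folder = {"name": name, "shortcuts": [], "children": []}
    end = find_closing(body, 0)
    index = 1
    while index < end:
        line = body[index]
        h3 = H3_PATTERN.search(line)
        a = A_PATTERN.search(line)
        if h3:
            # A subfolder: locate its body and recurse into it.
            sub_start = index + 1
            sub_end = find_closing(body, sub_start)
            child = parse_folder(h3.group(1), body[sub_start:sub_end + 1])
            folder["children"].append(child)
            index = sub_end + 1
            continue
        if a:
            folder["shortcuts"].append({"href": a.group(1), "name": a.group(2)})
        index += 1
    return folder


lines = [
    '<DL><p>',
    '<DT><A HREF="http://www.google.com" ADD_DATE="1471007115">Google</A>',
    '<DT><H3 ADD_DATE="1530184751">Folder</H3>',
    '<DL><p>',
    '<DT><A HREF="https://duckduckgo.com">Duck Duck Go</A>',
    '</DL><p>',
    '</DL><p>',
]
tree = parse_folder("Bookmarks", lines)
```

Skipping past the subfolder's closing line after the recursive call keeps the outer loop from visiting the nested shortcuts a second time.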

parse(netscape_bookmarks_file: NetscapeBookmarksFile)

Starts the parsing process. It also gets the H1 tag's content and builds a fake H3 tag from it, then calls folder_handler() with the fake H3 tag and the entire body of the root bookmarks folder. Returns the parsed NetscapeBookmarksFile. This function is added to the NetscapeBookmarksFile class at import time.

add_parser(cls)

Responsible for adding the parse() method to the NetscapeBookmarksFile class. It's executed when the module is imported
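
The pattern of attaching a method to a class when a module is imported can be sketched like this. The class and the body of parse() here are stand-ins made up for the example; only the names come from this page.

```python
class NetscapeBookmarksFile:
    """Stand-in for the real class; it only stores the raw text."""

    def __init__(self, html=""):
        self.html = html
        self.bookmarks = None


def parse(netscape_bookmarks_file):
    # Placeholder body: a real parser would build the folder tree here.
    netscape_bookmarks_file.bookmarks = "parsed"
    return netscape_bookmarks_file


def add_parser(cls):
    # Attach the module-level function to the class as a method.
    setattr(cls, "parse", parse)


# Placed at module level, this call runs as soon as the module is imported.
add_parser(NetscapeBookmarksFile)

result = NetscapeBookmarksFile("<H1>Bookmarks</H1>").parse()
```

Because parse() takes the file object as its first argument, it works unchanged both as a plain function and as a bound method.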

NetscapeBookmarksFileParser.creator

The module with the functions needed to create a Netscape Bookmark File. It's almost like the parser in reverse.

Functions

Many functions do the opposite of their parser counterparts.

http_verifier(url: str)

Checks whether the url starts with http:// or https://. If it does, the url is returned unchanged; if it doesn't, http:// is put at the start. This function exists because the shortcut's HREF attribute must start with http:// or https://.
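
The described behaviour fits in a few lines. This is a sketch with a hypothetical name (ensure_http_scheme), and it assumes the check is case-sensitive.

```python
def ensure_http_scheme(url: str) -> str:
    """Return url unchanged if it has an http(s) scheme, else prepend http://."""
    if url.startswith(("http://", "https://")):
        return url
    return "http://" + url


print(ensure_http_scheme("duckduckgo.com"))
print(ensure_http_scheme("https://duckduckgo.com"))
```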

attribute_printer(attributes: dict)

Creates a string with the attributes and their values from the dictionary received. If an attribute doesn't have a value, its name is still printed, but without the = sign.

Ex:

attribute_printer({'ATTRIBUTE': 'value'}) -> 'ATTRIBUTE="value"'
attribute_printer({'ATTRIBUTE': ''}) -> 'ATTRIBUTE'
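
A possible shape for this function is sketched below. The name print_attributes is hypothetical, and joining several attributes with a single space is an assumption, since the examples above only show one attribute at a time.

```python
def print_attributes(attributes: dict) -> str:
    """Render a dict as NAME="value" pairs; empty values render as bare names."""
    parts = []
    for name, value in attributes.items():
        parts.append(f'{name}="{value}"' if value else name)
    # Assumption: attributes are separated by one space.
    return " ".join(parts)


print(print_attributes({"HREF": "https://duckduckgo.com", "PRIVATE": ""}))
```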

meta_creator(doctype='NETSCAPE-Bookmark-file-1', meta=None, title='Bookmarks', h1='Bookmarks')

Creates the meta (start) of the file. h1 is the name of the root bookmark folder. If no argument is passed, prints a default meta

Ex:

out = creator.meta_creator('x', ['y', 'z'], 'B', 'b')
out == '''<!DOCTYPE x>
        <!-- This is an automatically generated file.
             It will be read and overwritten.
             DO NOT EDIT! -->
        <META HTTP-EQUIV="y" CONTENT="z">
        <TITLE>B</TITLE>
        <H1>b</H1>
        '''

shortcut_creator(shortcut: BookmarkShortcut)

Creates a shortcut A tag from the BookmarkShortcut. If a value in the object is the default, that attribute won't be printed. A list is returned, with the first element being the A tag and the second, if present, the DD tag

Ex:

arg = Classes.BookmarkShortcut()
arg.href = 'https://duckduckgo.com'
arg.name = 'Duck Duck Go'
out = creator.shortcut_creator(arg)
out == ['<DT><A HREF="https://duckduckgo.com">Duck Duck Go</A>']

arg = Classes.BookmarkShortcut()
arg.href = 'https://duckduckgo.com'
arg.name = 'Duck Duck Go'
arg.add_date_unix = 1515
out = creator.shortcut_creator(arg)
out == ['<DT><A HREF="https://duckduckgo.com" ADD_DATE="1515">Duck Duck Go</A>']

folder_creator(folder: BookmarkFolder)

Creates a folder H3 tag from the BookmarkFolder. If a value in the object is the default, that attribute won't be printed

Ex:

arg = creator.BookmarkFolder()
arg.name = 'Folder'
shortcut = creator.BookmarkShortcut()
shortcut.name = 'Duck Duck Go'
shortcut.href = 'https://duckduckgo.com'
arg.items.append(shortcut)
out = creator.folder_creator(arg)
out == ['<DT><H3>Folder</H3>',
        '<DL><p>',
        '    <DT><A HREF="https://duckduckgo.com">Duck Duck Go</A>',
        '</DL><p>']
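
The recursion that produces this kind of output can be sketched with plain dictionaries in place of the package's classes. The name create_folder, the dictionary layout and the four-space indent are assumptions chosen to match the example output above.

```python
def create_folder(folder: dict, indent: str = "    ") -> list:
    """Return the lines for a folder: H3 tag, <DL><p>, indented items, </DL><p>."""
    lines = [f'<DT><H3>{folder["name"]}</H3>', "<DL><p>"]
    for item in folder["items"]:
        if "items" in item:
            # A subfolder: recurse, then indent every line it returns.
            lines.extend(indent + line for line in create_folder(item, indent))
        else:
            lines.append(f'{indent}<DT><A HREF="{item["href"]}">{item["name"]}</A>')
    lines.append("</DL><p>")
    return lines


folder = {
    "name": "Folder",
    "items": [{"name": "Duck Duck Go", "href": "https://duckduckgo.com"}],
}
out = create_folder(folder)
```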

create_file(netscape_bookmarks_file: NetscapeBookmarksFile, print_meta=True)

Creates the html file. It creates the meta (if print_meta == True) and starts the folder creation recursion. The file is stored in netscape_bookmarks_file.html, and the function returns the lines of the file, without line breaks. This function is added to the NetscapeBookmarksFile class at import time.

add_creator(cls)

Adds the create_file() method to the NetscapeBookmarksFile class. It's run at import time.

NetscapeBookmarksFileParser.exceptions

Contains the only exception that the parser can raise: TagNotPresentException(), raised when a required tag isn't found.