Written in Java, httplinkchecker checks whether HTTP links are broken.
$ java -jar httplinkchecker.jar -v arg-1 arg-2 ...
Notes on options:
1. -r: recursively read all existing sub-directories/folders for regular files
2. -v: show version information
3. -h/help: show information about the tool and instructions on how to use it
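As a rough illustration of the -r option, the sketch below turns CLI path arguments into absolute paths of regular files. The class name PathCollector is hypothetical, and the behaviour without -r (only the direct children of a directory are read) is an assumption, since the notes above do not specify it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical helper, not part of the tool as described.
final class PathCollector {
    // Expands each argument into absolute paths of regular files.
    static List<Path> collect(List<String> args, boolean recursive) throws IOException {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            Path path = Paths.get(arg).toAbsolutePath().normalize();
            if (Files.isRegularFile(path)) {
                files.add(path);
            } else if (Files.isDirectory(path)) {
                // Assumption: without -r only the directory's direct children are read.
                int depth = recursive ? Integer.MAX_VALUE : 1;
                try (Stream<Path> entries = Files.walk(path, depth)) {
                    entries.filter(Files::isRegularFile).forEach(files::add);
                }
            }
            // Arguments that are neither files nor directories are skipped here.
        }
        return files;
    }
}
```

Files.walk is closed with try-with-resources because the returned stream holds open directory handles.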
1.1 Objects to be used:
1.1.1. ArgParser:
parsing all the user input via CLI arguments
generating the corresponding absolute paths for the files to be processed by LinkRetriever
public interfaces: only getter methods
1.1.2. LinkRetriever:
based on the absolute paths generated by the ArgParser,
searches the corresponding files for valid HTTP/HTTPS links
1.1.3. LinkValidator
1.1.4. LinkStatus: use enum
1.1.5. Message: as Singleton
1.1.6. PageContentParser.java: given the HTTP link to a webpage, this class reads all the contents of the page into a single file,
retrieves all valid HTTP links in the file, and checks whether those links are broken.
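To make the roles of LinkStatus and the link search more concrete, here is a minimal sketch. The enum constants, the status-code mapping, the class name LinkExtractor, and the regular expression are all assumptions chosen for illustration; the notes above only say that LinkStatus uses an enum and that files are searched for HTTP/HTTPS links.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical constants; the design notes only state "use enum".
enum LinkStatus {
    GOOD, BROKEN, UNKNOWN;

    // Assumed mapping: 2xx/3xx -> GOOD, 4xx/5xx -> BROKEN, anything else -> UNKNOWN.
    static LinkStatus fromStatusCode(int code) {
        if (code >= 200 && code < 400) return GOOD;
        if (code >= 400 && code < 600) return BROKEN;
        return UNKNOWN;
    }
}

// Hypothetical helper showing one way to pull links out of text.
final class LinkExtractor {
    // Deliberately simple: the scheme, then characters up to whitespace, a quote, or an angle bracket.
    private static final Pattern HTTP_LINK = Pattern.compile("https?://[^\\s\"'<>]+");

    static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = HTTP_LINK.matcher(text);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }
}
```

The pattern is intentionally loose and may keep trailing punctuation such as a closing parenthesis or full stop; a real implementation would likely need stricter rules.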
2.1 LinkRetriever: 1. Performance becomes a key factor
when the number of files to be searched, or the size of an individual file,
is large enough to slow the tool down noticeably.
2. GNU Grep is recommended for searching for HTTP links
due to its implementation of the Boyer–Moore string-search algorithm.
3. Multithreading is being considered.
4. SeekableByteChannel and ByteBuffer.allocateDirect are considered
based on the suggestions from
https://stackoverflow.com/questions/14037404/java-read-large-text-file-with-70million-line-of-text
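The SeekableByteChannel and ByteBuffer.allocateDirect idea could look roughly like the sketch below, which reads a file in fixed-size chunks and hands each line to a callback. The class name ChunkedLineReader, the 64 KiB buffer size, and the assumptions that files are UTF-8 and that a link never spans two lines are illustrative choices, not something the notes specify.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper, not part of the tool as described.
final class ChunkedLineReader {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed chunk size

    // Reads the file chunk by chunk and calls onLine once per line (without the '\n').
    static void forEachLine(Path file, Consumer<String> onLine) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(BUFFER_SIZE);
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        try (SeekableByteChannel channel = Files.newByteChannel(file, StandardOpenOption.READ)) {
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch the buffer from writing to reading
                while (buffer.hasRemaining()) {
                    byte b = buffer.get();
                    if (b == '\n') {
                        // Splitting on 0x0A is safe in UTF-8: it never occurs inside a multi-byte character.
                        onLine.accept(new String(line.toByteArray(), StandardCharsets.UTF_8));
                        line.reset();
                    } else {
                        line.write(b);
                    }
                }
                buffer.clear(); // make the whole buffer writable again
            }
        }
        if (line.size() > 0) {
            // Last line without a trailing newline.
            onLine.accept(new String(line.toByteArray(), StandardCharsets.UTF_8));
        }
    }
}
```

Only one line is held in memory at a time, so the file size does not bound memory use. A trailing '\r' from Windows line endings is left in place in this sketch.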
2.2 LinkValidator: 1. Performance could become a significant issue when sending out many HTTP requests
and waiting for responses.
2. Multithreading is being considered, with Java's ExecutorService managing the threads.
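A minimal sketch of how ExecutorService could spread link checks over a fixed thread pool is given below. The class name ParallelLinkValidator, the pluggable probe function, the use of HEAD requests, the 5-second timeouts, and the -1 sentinel for failures are all assumptions made for illustration.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not part of the tool as described.
final class ParallelLinkValidator {
    // Assumed default probe: HEAD request; returns the status code, or -1 on any failure.
    static int headStatus(String link) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(link).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (Exception e) {
            return -1;
        }
    }

    // Runs the probe for every link on a fixed-size pool; results keep the input order.
    static Map<String, Integer> validate(List<String> links, int threads,
                                         Function<String, Integer> probe) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String link : links) {
                Callable<Integer> task = () -> probe.apply(link);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> results = new LinkedHashMap<>();
            for (int i = 0; i < links.size(); i++) {
                try {
                    results.put(links.get(i), futures.get(i).get());
                } catch (ExecutionException e) {
                    results.put(links.get(i), -1); // the probe itself threw
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

A call such as ParallelLinkValidator.validate(links, 8, ParallelLinkValidator::headStatus) would issue real requests; passing a different probe keeps the threading logic testable without network access.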
2.3 Singleton pattern is used for utility classes: LinkStatus and Message
2.4 In Message class, use HashMap instead of 2D String array to store all fixed messages for output
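The combination of a Singleton and a HashMap for fixed messages might be sketched as follows. The message keys and texts are invented for illustration, as is the fallback for unknown keys.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a Singleton message store; keys and texts are hypothetical.
final class Message {
    private static final Message INSTANCE = new Message();

    private final Map<String, String> messages = new HashMap<>();

    private Message() {
        // Fixed messages are registered once, keyed by a readable name
        // instead of a row index in a 2D String array.
        messages.put("LINK_GOOD", "The link is reachable.");
        messages.put("LINK_BROKEN", "The link is broken.");
        messages.put("LINK_UNKNOWN", "The link status could not be determined.");
    }

    static Message getInstance() {
        return INSTANCE;
    }

    // Returns the message for the key, or a fallback text for unknown keys (assumed behaviour).
    String get(String key) {
        return messages.getOrDefault(key, "Unknown message key: " + key);
    }
}
```

Compared with a 2D array, lookups by key avoid index bookkeeping, and adding a message does not shift existing entries.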
2.5 Displaying too many response results at once could slow the tool down
because of console I/O. I am considering writing all response results to a file first.
When the writing is done, the single file is displayed on the console.
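The write-first, display-later idea could be sketched as below. The class name ResultReporter, the use of a temporary file, and the one-result-per-line format are assumptions; the notes do not say where the file lives or how results are formatted.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper, not part of the tool as described.
final class ResultReporter {
    // Writes every result line to a temporary report file through a buffered writer.
    static Path writeResults(List<String> resultLines) throws IOException {
        Path report = Files.createTempFile("httplinkchecker-report", ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(report, StandardCharsets.UTF_8)) {
            for (String line : resultLines) {
                writer.write(line);
                writer.newLine();
            }
        }
        return report;
    }

    // Copies the finished report to the given stream in one pass (System.out for the console).
    static void display(Path report, OutputStream out) throws IOException {
        Files.copy(report, out);
        out.flush();
    }
}
```

Calling ResultReporter.display(report, System.out) after all results are written sends the report to the console as one bulk copy instead of many small writes.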