bpan2/httplinkchecker

httplinkchecker

Written in Java, httplinkchecker checks whether HTTP links are broken.

0. How to use it on the command line:

	$ java -jar httplinkchecker.jar -v arg-1 arg-2 ...
	
	Options:
		1. -r: recursively read all existing sub-directories/folders for regular files
		2. -v: print version information
		3. -h/help: print information about the tool and instructions on how to use it

1. Design Considerations:

1.1 Objects to be used:
	1.1.1. ArgParser: 
		parses all user input passed as CLI arguments
		generates the corresponding absolute paths of the files to be processed by LinkRetriever

		public interface: getter methods only
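
A minimal sketch of how such a parser could look, assuming Java 8 or later. The class name `ArgParserSketch`, its field and getter names, the reading of `-h/help` as two spellings (`-h` and `help`), and the behavior for a directory given without `-r` (only its direct regular files are taken) are all assumptions made for illustration, not the tool's actual code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch of ArgParser: flags are read first, then every other
// argument is treated as a file or directory and turned into absolute paths.
final class ArgParserSketch {
    private boolean recursive;
    private boolean showVersion;
    private boolean showHelp;
    private final List<Path> files = new ArrayList<>();

    ArgParserSketch(String[] args) {
        List<String> operands = new ArrayList<>();
        for (String arg : args) {
            switch (arg) {
                case "-r": recursive = true; break;
                case "-v": showVersion = true; break;
                case "-h":
                case "help": showHelp = true; break;   // assumed spellings
                default: operands.add(arg);
            }
        }
        for (String operand : operands) {
            collect(Paths.get(operand).toAbsolutePath().normalize());
        }
    }

    // Adds a regular file directly; for a directory, walks it to depth 1
    // (assumption) or without a depth limit when -r was given.
    private void collect(Path path) {
        if (Files.isRegularFile(path)) {
            files.add(path);
        } else if (Files.isDirectory(path)) {
            int depth = recursive ? Integer.MAX_VALUE : 1;
            try (Stream<Path> walk = Files.walk(path, depth)) {
                walk.filter(Files::isRegularFile).forEach(files::add);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Paths that do not exist are silently skipped in this sketch.
    }

    // Public interface: getters only.
    boolean isRecursive() { return recursive; }
    boolean isShowVersion() { return showVersion; }
    boolean isShowHelp() { return showHelp; }
    List<Path> getFiles() { return Collections.unmodifiableList(files); }
}
```

Collecting the flags before resolving paths means `-r` applies no matter where it appears among the arguments.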
		
	1.1.2. LinkRetriever:
		based on the absolute paths generated by ArgParser,
		searches the corresponding files for legal http/https links
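
One possible way to pick links out of text is a regular expression. The sketch below is an illustration only: the class name `LinkExtractorSketch` and the working definition of a "legal" link (scheme `http` or `https`, followed by characters that are not whitespace, quotes, or angle brackets, and not ending in sentence punctuation) are assumptions, not the tool's actual rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts http/https links from a piece of text.
final class LinkExtractorSketch {
    // Assumed link shape: the scheme, then a run of URL characters whose
    // last character is not trailing punctuation such as '.' or ')'.
    private static final Pattern LINK = Pattern.compile(
            "https?://[^\\s\"'<>]+[^\\s\"'<>.,;:!?)]",
            Pattern.CASE_INSENSITIVE);

    static List<String> extractLinks(CharSequence text) {
        List<String> links = new ArrayList<>();
        Matcher matcher = LINK.matcher(text);
        while (matcher.find()) {
            links.add(matcher.group());
        }
        return links;
    }
}
```

A call such as `LinkExtractorSketch.extractLinks(line)` could then be applied to each line read from a file.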

	1.1.3. LinkValidator

	1.1.4. LinkStatus: use enum

	1.1.5. Message: as Singleton
	
	1.1.6. PageContentParser.java: given the HTTP link to a web page, this class reads the entire content of the page into a single file,
		                       retrieves all valid HTTP links in the file, and checks whether those links are broken.

2. Implementation Considerations:

2.1 LinkRetriever: 1. Performance is a key factor 
			when the number of files to be searched and the size of each file 
			are significant enough to affect the performance of this tool. 
		   2. GNU Grep is recommended for searching for HTTP links
			due to its implementation of the Boyer–Moore string-search algorithm.
		   3. Multithreading is considered. 
		   4. SeekableByteChannel and ByteBuffer.allocateDirect are considered, 
			based on the suggestions from
			https://stackoverflow.com/questions/14037404/java-read-large-text-file-with-70million-line-of-text 
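
As an illustration of item 4, the sketch below reads a file in fixed-size chunks through a SeekableByteChannel into a direct ByteBuffer and hands each decoded line to a callback. The class name `ChannelLineReaderSketch`, the 64 KiB buffer size, the UTF-8 encoding, and the choice to replace malformed bytes are assumptions; it shows one way to combine the two APIs, not a measured optimum.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical sketch: streams a large text file line by line without
// loading the whole file into memory.
final class ChannelLineReaderSketch {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed size

    static void forEachLine(Path file, Consumer<String> onLine) throws IOException {
        ByteBuffer bytes = ByteBuffer.allocateDirect(BUFFER_SIZE);
        CharBuffer chars = CharBuffer.allocate(BUFFER_SIZE);
        // REPLACE keeps the decoder from stalling on malformed input.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        StringBuilder line = new StringBuilder();
        try (SeekableByteChannel channel = Files.newByteChannel(file)) {
            boolean eof = false;
            while (!eof) {
                eof = channel.read(bytes) == -1;
                bytes.flip();
                decoder.decode(bytes, chars, eof);
                bytes.compact(); // keeps an incomplete multi-byte sequence for the next round
                drain(chars, line, onLine);
            }
            decoder.flush(chars);
            drain(chars, line, onLine);
        }
        if (line.length() > 0) {
            onLine.accept(line.toString()); // last line without a trailing newline
        }
    }

    // Moves decoded characters into the current line and emits a line at each '\n'.
    // Carriage returns are dropped so that "\r\n" files yield clean lines.
    private static void drain(CharBuffer chars, StringBuilder line, Consumer<String> onLine) {
        chars.flip();
        while (chars.hasRemaining()) {
            char c = chars.get();
            if (c == '\n') {
                onLine.accept(line.toString());
                line.setLength(0);
            } else if (c != '\r') {
                line.append(c);
            }
        }
        chars.clear();
    }
}
```

The callback is where a link search would run on each line, so memory use stays bounded by the buffer size plus the longest line.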

2.2 LinkValidator: 1. Performance could become a significant issue when sending out many HTTP requests 
			and waiting for the responses.
		   2. Multithreading is considered, with Java's ExecutorService managing the threads.
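
A rough sketch of the ExecutorService idea follows. The class name `LinkValidatorSketch`, the use of a HEAD request, the 5-second timeouts, and the rule that a response code from 200 to 399 means "not broken" are assumptions for illustration. The check itself is passed in as a `Predicate` so the concurrency part can be exercised without network access.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: validates many links concurrently on a fixed thread pool.
final class LinkValidatorSketch {

    // One possible check: a HEAD request whose status is 2xx or 3xx (assumed rule).
    static boolean isReachable(String link) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(link).toURL().openConnection();
            conn.setRequestMethod("HEAD");
            conn.setConnectTimeout(5000); // assumed timeout
            conn.setReadTimeout(5000);    // assumed timeout
            try {
                int code = conn.getResponseCode();
                return code >= 200 && code < 400;
            } finally {
                conn.disconnect();
            }
        } catch (IOException | IllegalArgumentException | ClassCastException e) {
            return false; // malformed link, non-HTTP scheme, or I/O failure
        }
    }

    // Submits one task per link and collects the results in input order.
    static Map<String, Boolean> validateAll(List<String> links, Predicate<String> check, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String link : links) {
                Callable<Boolean> task = () -> check.test(link);
                futures.add(pool.submit(task));
            }
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (int i = 0; i < links.size(); i++) {
                boolean ok;
                try {
                    ok = futures.get(i).get();
                } catch (ExecutionException e) {
                    ok = false; // the check itself threw; treat the link as broken
                }
                results.put(links.get(i), ok);
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real use the call might look like `LinkValidatorSketch.validateAll(links, LinkValidatorSketch::isReachable, 8)`, where the pool size of 8 is only an example value.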
	 
2.3 The Singleton pattern is used for the utility classes LinkStatus and Message
	
2.4 In the Message class, a HashMap is used instead of a 2D String array to store all fixed output messages
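
A small sketch of what a HashMap-backed Message singleton could look like. The class name `MessageSketch`, the keys, and the message texts are placeholders invented for illustration; the real class may store different entries.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a singleton holding the tool's fixed output messages.
final class MessageSketch {
    private static final MessageSketch INSTANCE = new MessageSketch();

    private final Map<String, String> messages = new HashMap<>();

    private MessageSketch() {
        // Placeholder keys and texts, not the tool's actual messages.
        messages.put("USAGE", "java -jar httplinkchecker.jar [options] arg-1 arg-2 ...");
        messages.put("GOOD", "link is reachable");
        messages.put("BROKEN", "link is broken");
    }

    static MessageSketch getInstance() {
        return INSTANCE;
    }

    // Looks a message up by name; unknown keys yield a visible fallback.
    String get(String key) {
        return messages.getOrDefault(key, "Unknown message key: " + key);
    }
}
```

Looking messages up by name keeps call sites readable (`get("BROKEN")` rather than an index pair into an array) and makes it harder to pick the wrong entry by accident.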
	
2.5 Displaying too many response results at once could slow the tool down 
			because of console I/O. The plan is to write all response results to a file first.  
			When the writing is done, that single file is displayed on the console. 
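
The write-first, display-later idea can be sketched as below. The class name `ResultSpoolSketch`, the temporary-file name, and the one-result-per-line format are assumptions.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch: spools results to a file, then copies the file to an
// output stream (for example System.out) in one pass.
final class ResultSpoolSketch {

    // Writes one result per line to a temporary file and returns its path.
    static Path spool(List<String> results) throws IOException {
        Path file = Files.createTempFile("httplinkchecker-results", ".txt");
        try (BufferedWriter writer = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String result : results) {
                writer.write(result);
                writer.newLine();
            }
        }
        return file;
    }

    // Streams the finished file to the given output in bulk.
    static void display(Path file, OutputStream out) throws IOException {
        Files.copy(file, out);
        out.flush();
    }
}
```

Displaying on the console would then be `ResultSpoolSketch.display(file, System.out)`. Whether this is actually faster than printing results as they arrive depends on the system and is worth measuring.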
