Skip to content

TLiu2121999/Web-Crawler

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 

Repository files navigation

Web-Crawler

Retrieve certain information from a given web site
Documentation for Sears.com Results Text Scraper Aug. 20, 2014

This package includes: TongtongLIU_BrightEdge_fat.jar Ducument_Sears_TongtongLIU.pdf Parsing.java (/ src) Result.java (/src)

Target The package is mainly for two questions:

Query 1: Total number of a given product on Sears.com Given a keyword, such as "digital camera", return the total number of results found. Implementation: java -jar Assignment.jar (e.g. java -jar Assignment.jar "baby strollers")

Query 2: Result Object Given a keyword (e.g. "digital cameras") and page number (e.g. "1"), return the results in a result object and then print results on screen. For each result, return title, price, vendor, and vendor information. Implementation: java -jar Assignment.jar (e.g. java -jar Assignment.jar "baby strollers" 2)

Implementation: Command line For Query 1: java -jar Assignment.jar (e.g. java -jar Assignment.jar "baby strollers") For Query 2: java -jar Assignment.jar (e.g. java -jar Assignment.jar "baby strollers" 2)

Import Package “Sears_Scraper.*” to specified class files

External libraries for HTML parsing jsoup-1.7.3.jar jsoup-1.7.3-javadoc.jar jsoup-1.7.3-sources.jar Web.jar

Forms of classes and methods:

Result

name:String price: String vendor: String Result ( String name,String price,String vendor) String getName() String getPrice() String getVendor()

Parsing

results: ArrayList

String retrieveUrlFragment(String url) retrievePage(String search) int getNumProducts(String product_str) ArrayList getProductDetails(String product_str, int paging) main(String[] args)

*Note: This scraper (Assignment.jar) is developed in Java with Eclipse in win8 system. Execution of this program requires JVM. To enable the function and make sure it can be implemented in cmd, another Eclipse plugin is also needed: Fat Jar.

About

Retrieve certain information from a given web site

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages