Run arguments for testing on local machine:
KVS Master: 8000
KVS Worker: 8001 localhost:8000
Flame Master: 9000
Flame Worker: 9001 localhost:9000
Run arguments for EC2:
KVS Master: sudo java -cp bin cis5550.kvs.Master 8000
KVS Worker: sudo java -cp bin cis5550.kvs.Worker 8001 worker1 44.198.182.73:8000
Flame Master: sudo java -cp bin cis5550.flame.Master 9000 44.198.182.73:8000
Flame Worker: sudo java -cp bin cis5550.flame.Worker 9001 44.198.182.73:9000
Flame submit(run this command on a local machine): java -cp bin cis5550.flame.FlameSubmit 44.198.182.73:9000 crawler.jar cis5550.jobs.Crawler https://en.wikipedia.org/
To restart the crawler after pausing, find in the kvs worker directory a lengthy-named table that contains column key and url. That's the queue we should continue with. You may safety delete the other lengthy-named table, which should be empty. Start the crawler with the usual setting, but in the arguments provided to the crawler, put in "-t table-name" (Don't put ".table" after the name). The crawler should pick up the queue. The initial crawls may show "Already attempted," and this is normal because we start from the beginning of the queue, which may contain urls we have tried.
- To compile use
javac cis5550/jobs/Indexer.java && jar -cf xxx.jar cis5550/jobs/xxx.class
- To run use
java cis5550.flame.FlameSubmit localhost:9000 xxx.jar cis5550.jobs.xxx
- Descending sort of 3 * w_td * w_tq + 0.75 * page_rank, where
- w_td uses Euclidean normalized tf_weighting without use of idf
- w_tq is the product of the query term prequency and the idf raised to 1.5
- page_rank is the the page rank value resulting out of the iterative page rank algorithm.
- To run the ranker, use
java -cp lib/\*:src cis5550.ranker.Ranker port kvs_ip:kvs_port
JSON stringified Java objects, where each object has three fields:
- url - url of the webpage,
- title - title of the webpage (max of 60 characters),
- page_head - snippet of the webpage (max of 300 characters)
- Has a GET request method with path
/search
with- a required query parameter
q
that has the encoded query phrase string - an optional query parameter
page
that defaults to 1 (first page).
- a required query parameter
- Each page returns 10 urls. Pages beyond the number of matching urls will return empty strings.
javac -cp "lib/*" --source-path src src/cis5550/webserver/TestServer.java
java -cp "../lib/*;" cis5550.webserver.TestServer <frontend-server port> <ranker ip:port> <kvs ip:port>
Open browser tab at frontend-server port
- TestServer class defined routes:
/
: this route simply return the home page/search?q=&p=
: this route expects a json object like below:
class SearchResult {
String title;
String url;
String page_head;
public SearchResult(String title, String url, String page_head) {
this.title = title;
this.url = url;
}
}
class SearchResultsResponse {
List<SearchResult> results;
int page;
int totalPages;
public SearchResultsResponse(List<SearchResult> results, int page, int totalPages) {
this.results = results;
this.page = page;
this.totalPages = totalPages;
}
}