Webcrawler Project

CS5700 Computer Networking

Briefly describe your high-level approach (all steps involved in logging in, crawling and getting the secret flags)

We first created a parser class based on Python's html.parser library (a subclass of HTMLParser). In it, we capture every 'a' tag to collect the URLs to be crawled and make sure that no URL is visited twice. We also handle the 'input' tag to find the middleware token. Finally, we use the handle_data method to search the page text for the secret flags.
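The parser described above can be sketched as follows. This is a minimal sketch, not the project's actual code: the class name FakebookHTMLParser comes from this README, but the constructor parameters (base_url, visited), the assumption that the middleware token lives in an input named "csrfmiddlewaretoken", the assumption that flags appear as text starting with "FLAG:", and the same-host link filter are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FakebookHTMLParser(HTMLParser):
    """Collects crawlable links, the middleware token, and secret flags from one page."""

    def __init__(self, base_url, visited):
        super().__init__()
        self.base_url = base_url   # URL of the page being parsed (hypothetical parameter)
        self.visited = visited     # set of URLs already seen, shared across pages
        self.links = []            # new URLs discovered on this page
        self.csrf_token = None     # value of the middleware token <input>, if present
        self.flags = []            # secret flags found in text nodes

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            # Assumption: only follow links on the same host, and never queue a URL twice
            same_host = urlparse(url).netloc == urlparse(self.base_url).netloc
            if same_host and url not in self.visited:
                self.visited.add(url)
                self.links.append(url)
        elif tag == "input" and attrs.get("name") == "csrfmiddlewaretoken":
            # Assumed name of the hidden middleware-token field
            self.csrf_token = attrs.get("value")

    def handle_data(self, data):
        # Assumption: a flag is a text node of the form "FLAG: <value>"
        text = data.strip()
        if text.startswith("FLAG:"):
            self.flags.append(text[len("FLAG:"):].strip())
```

Marking a URL as visited when it is first discovered, rather than when it is fetched, keeps the same page from being queued twice even if many pages link to it.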

Any challenges you faced

One of the major challenges we faced was handling the receive function, which involves a lot of conversion between bytes and strings, as well as parsing.
We also had to parse the received data correctly.
We adjusted the handle_starttag method in FakebookHTMLParser so that it picks up the special tag properly.
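The byte/string handling mentioned above can be illustrated with a small sketch. It is hypothetical, not the team's receive function: recv_response takes a recv callable (for example sock.recv) so it can be exercised without a live server, and it assumes the server always sends a Content-Length header (chunked transfer encoding is not handled). parse_response is likewise an illustrative helper name.

```python
def recv_response(recv, bufsize=4096):
    """Read one complete HTTP/1.1 response using recv (e.g. sock.recv); return raw bytes.

    Assumes the body length is given by a Content-Length header (no chunked encoding).
    """
    data = b""
    # 1. Keep reading until the blank line that terminates the headers arrives
    while b"\r\n\r\n" not in data:
        chunk = recv(bufsize)
        if not chunk:  # connection closed before the headers were complete
            return data
        data += chunk
    head, _, body = data.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())  # int() accepts ASCII digits as bytes
    # 2. Keep reading until the whole body has arrived
    while len(body) < length:
        chunk = recv(bufsize)
        if not chunk:
            break
        body += chunk
    return head + b"\r\n\r\n" + body


def parse_response(raw):
    """Split raw response bytes into (status_code, headers, body_text)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    # Header bytes are decoded as ISO-8859-1 so that any byte value is accepted
    lines = head.decode("iso-8859-1").split("\r\n")
    status_code = int(lines[0].split()[1])  # "HTTP/1.1 200 OK" -> 200
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Headers such as Set-Cookie can repeat, so keep every value in a list
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    return status_code, headers, body.decode("utf-8", errors="replace")
```

Keeping the data as bytes until the full response has arrived, and decoding only once at the end, avoids splitting a multi-byte UTF-8 character across two recv calls.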

An overview of how you tested your code

We mainly tested by printing intermediate output line by line whenever we ran into an issue, and then debugging from the point where the problem first appeared.
We then replaced the empty strings for the username and password in the parse_cmd_line() function with our own credentials.
We also commented out sys.argv[1] and sys.argv[2] so that we could test in the console first.
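One way to get the same console-testing convenience without commenting code in and out is to let parse_cmd_line() accept an optional argument list. This is a hypothetical sketch: the function name comes from this README, but the argv parameter and the DEFAULT_USERNAME and DEFAULT_PASSWORD placeholders are assumptions.

```python
import sys

# Hypothetical placeholders; fill these in only for local console testing
DEFAULT_USERNAME = ""
DEFAULT_PASSWORD = ""


def parse_cmd_line(argv=None):
    """Return (username, password), taken from argv when both are supplied."""
    if argv is None:
        argv = sys.argv
    if len(argv) >= 3:
        # Normal run: ./webcrawler <username> <password>
        return argv[1], argv[2]
    # Console testing: fall back to the placeholder credentials
    return DEFAULT_USERNAME, DEFAULT_PASSWORD
```

In a console session, calling parse_cmd_line(["webcrawler", "user", "pass"]) exercises the same code path as a real command-line run.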

You must also include a breakdown of who worked on what part(s) of the code. Also, give us the steps on how to run your code.

Each of us first did our own research and wrote about 80% of the code independently. We then met over Zoom to discuss the errors and issues we had encountered, and finished the remaining 20% together.
Amanda @amandaay: mainly focused on writing the HTML parser class and the receive function
Wayne @Chun-Wei-Tseng: mainly focused on the GET request function and the cookie jar
Jason @JasonKTChen: mainly focused on login_user, start_crawling and main
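The GET request function and cookie jar mentioned above can be sketched roughly as follows. This is an illustrative sketch only: build_get_request and update_cookie_jar are hypothetical names, the jar is assumed to be a plain dict of cookie names to values, and cookie attributes such as Path and expiry are ignored.

```python
def update_cookie_jar(jar, set_cookie_values):
    """Store the name=value pair from each Set-Cookie header value in jar (a dict)."""
    for value in set_cookie_values:
        # Only the leading "name=value" matters here; attributes like Path are dropped
        pair = value.split(";", 1)[0].strip()
        name, sep, val = pair.partition("=")
        if sep:
            jar[name.strip()] = val.strip()


def build_get_request(host, path, jar):
    """Return the raw bytes of an HTTP/1.1 GET request that carries the jar's cookies."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",  # required in every HTTP/1.1 request
        "Connection: keep-alive",
    ]
    if jar:
        cookies = "; ".join(f"{name}={value}" for name, value in jar.items())
        lines.append(f"Cookie: {cookies}")
    # Headers end with an empty line
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")
```

After each response, the Set-Cookie values can be fed into update_cookie_jar so that the next request sent with build_get_request stays logged in.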

