Charlotte

A Java web spider.

Charlotte starts from a seed URL (in this case, rockhopper.us) and crawls the pages reachable from it by following hyperlinks, using either breadth-first search (BFS) or depth-first search (DFS). The resulting link graph shows how the web is structured in the neighborhood of a particular page.
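The practical difference between the two crawl modes is the frontier data structure: a FIFO queue produces breadth-first order and a LIFO stack produces depth-first order. The following is a minimal sketch, not Charlotte's actual code. It runs over a hypothetical in-memory link graph, with a hypothetical `fetchLinks` function standing in for downloading a page and extracting its links (in Charlotte this step presumably uses JSoup) and a hypothetical `maxPages` cap to keep the crawl bounded.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Sketch only: fetchLinks and maxPages are illustrative assumptions, not Charlotte's API.
final class CrawlOrderSketch {

    /** Breadth-first crawl: a FIFO queue visits pages level by level from the seed. */
    static List<String> bfs(String seed, Function<String, List<String>> fetchLinks, int maxPages) {
        List<String> order = new ArrayList<>();
        Set<String> seen = new HashSet<>(); // URLs already queued, so none is queued twice
        Deque<String> queue = new ArrayDeque<>();
        queue.addLast(seed);
        seen.add(seed);
        while (!queue.isEmpty() && order.size() < maxPages) {
            String url = queue.pollFirst();
            order.add(url);
            for (String link : fetchLinks.apply(url)) {
                if (seen.add(link)) { // add() returns false for a URL already seen
                    queue.addLast(link);
                }
            }
        }
        return order;
    }

    /** Depth-first crawl: a LIFO stack follows each branch before its siblings. */
    static List<String> dfs(String seed, Function<String, List<String>> fetchLinks, int maxPages) {
        List<String> order = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(seed);
        while (!stack.isEmpty() && order.size() < maxPages) {
            String url = stack.pop();
            if (!visited.add(url)) {
                continue; // pushed more than once and already visited; skip it
            }
            order.add(url);
            List<String> links = fetchLinks.apply(url);
            // Push in reverse so the first link on the page is explored first.
            for (int i = links.size() - 1; i >= 0; i--) {
                if (!visited.contains(links.get(i))) {
                    stack.push(links.get(i));
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical link graph standing in for real pages.
        Map<String, List<String>> graph = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d", "e"),
                "d", List.of(),
                "e", List.of("a"));
        Function<String, List<String>> fetchLinks = url -> graph.getOrDefault(url, List.of());
        System.out.println("BFS: " + bfs("a", fetchLinks, 100));
        System.out.println("DFS: " + dfs("a", fetchLinks, 100));
    }
}
```

For this graph, BFS visits `[a, b, c, d, e]` while DFS visits `[a, b, d, c, e]`: DFS reaches `d` through `b` before it ever looks at `c`. A crawl limit matters more for DFS, since without one it can wander arbitrarily far from the seed.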

#### Database Structure

Charlotte stores its results in a MySQL database (the connection is configurable in the code, but my own login credentials have been scrubbed). My implementation ran on a WAMP stack, although the code could be adapted to many other common databases without much effort. The crawled web is stored as a graph in two tables. The first assigns an auto-incrementing integer ID to each unique URL encountered, and the second stores pairs of IDs, each representing one link (an edge in the graph). Together, these two tables are enough to reconstruct the structure of the web as Charlotte parsed it.
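To make the two-table layout concrete, here is an in-memory Java sketch that mirrors it. It does not connect to MySQL, and the class name `LinkGraphStore`, its methods, the table and column names in the comment, and the choice to store each edge only once are all hypothetical, since the actual schema is not given here. IDs start at 1 to mimic MySQL's `AUTO_INCREMENT` on an empty table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * In-memory sketch of the two-table graph layout. Hypothetical schema it mirrors:
 *   urls(id INT AUTO_INCREMENT PRIMARY KEY, url VARCHAR UNIQUE)
 *   links(from_id INT, to_id INT)
 */
final class LinkGraphStore {
    private final Map<String, Integer> idsByUrl = new HashMap<>(); // url -> id lookup
    private final List<String> urlsById = new ArrayList<>();       // position id - 1 holds the url
    // Assumption: each (from_id, to_id) pair is stored once, as if it were a composite key.
    private final Set<List<Integer>> edges = new LinkedHashSet<>();

    /** Returns the URL's id, assigning the next integer the first time it is seen. */
    int idFor(String url) {
        Integer existing = idsByUrl.get(url);
        if (existing != null) {
            return existing;
        }
        urlsById.add(url);
        int id = urlsById.size(); // first id is 1, as with AUTO_INCREMENT on an empty table
        idsByUrl.put(url, id);
        return id;
    }

    /** Records one link from page `from` to page `to` as an (id, id) pair. */
    void addLink(String from, String to) {
        int fromId = idFor(from);
        int toId = idFor(to);
        edges.add(List.of(fromId, toId));
    }

    int edgeCount() {
        return edges.size();
    }

    /** Rebuilds an adjacency list by mapping each edge's ids back to URLs. */
    Map<String, List<String>> reconstruct() {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (String url : urlsById) {
            adjacency.put(url, new ArrayList<>());
        }
        for (List<Integer> edge : edges) {
            String from = urlsById.get(edge.get(0) - 1);
            String to = urlsById.get(edge.get(1) - 1);
            adjacency.get(from).add(to);
        }
        return adjacency;
    }

    public static void main(String[] args) {
        LinkGraphStore store = new LinkGraphStore();
        store.addLink("http://rockhopper.us", "http://example.com");
        store.addLink("http://example.com", "http://rockhopper.us");
        System.out.println(store.reconstruct());
    }
}
```

Storing integer pairs rather than URL pairs keeps the edge table compact and makes each URL's text appear only once. Reconstruction is the inverse lookup; in SQL it would be a join of the edge table against the URL table twice, once for each end of the link.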

#### Application

In this video, I describe how I used Charlotte's data in my web project: Internet Visualization.

#### Acknowledgments

I learned how to write Charlotte's DFS code from ryanlr's excellent tutorial. JSoup was also invaluable in implementing this project. I brainstormed Charlotte's BFS method with Charles Nickerson at the recent PennApps XII. All equipment used and code implemented are my own.