This repository has been archived by the owner on Apr 24, 2020. It is now read-only.
Merge pull request #66 from amihaiemil/65
Cleaned phantomjs references
Showing 2 changed files with 65 additions and 38 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,43 @@ | ||
<img src="http://www.amihaiemil.com/images/roundcharleslogo.PNG" align="left" height="100" width="100"/> | ||
# charles | ||
|
||
Smart web crawler. | ||
[![Build Status](https://travis-ci.org/amihaiemil/charles.svg?branch=master)](https://travis-ci.org/amihaiemil/charles) | ||
[![Coverage Status](https://coveralls.io/repos/github/amihaiemil/charles/badge.svg?branch=master&service=github)](https://coveralls.io/github/amihaiemil/charles?branch=master) | ||
|
||
### v 1.0.0 (to be released) | ||
|
||
A smart web crawler that fetches data from a website and stores it in some way (writes it in files on the disk or POSTs it to an http endpoint etc) . | ||
|
||
2 options for crawling: | ||
|
||
1) crawl the links from a ``sitemap.xml`` | ||
|
||
2) crawl the website as a graph starting from a given url (the index) | ||
|
||
### Under the hood | ||
|
||
Charles is powered by [Selenium WebDriver 2.41](http://www.seleniumhq.org/projects/webdriver/) and [PhantomJS](http://phantomjs.org/) through [GhostDriver](https://github.com/detro/ghostdriver). Crawling with other, graphical, drivers like ChromeDriver and FirefoxDriver will also be implemented. | ||
|
||
### How to contribute | ||
|
||
1. Open an issue regarding an improvement you thought of, or a bug you noticed. | ||
2. If the issue is confirmed, fork the repository, do the changes on a sepparate branch and make a Pull Request. | ||
3. After review and acceptance, the PR is merged and closed. | ||
4. You are automatically listed as a contributor on the project's site | ||
|
||
Make sure the maven build | ||
|
||
``$mvn clean install -Pitcases`` | ||
|
||
passes before making a PR. | ||
|
||
### Running integration tests: | ||
|
||
In order to run the integration tests you need to have PhantomJS installed on your machine and set the JVM system property ``phantomjsExec`` to point to that location. By default the exe is looked up at ``/usr/local/bin/phantomjs`` (linux), so if it's not found the tests won't work. | ||
<img src="http://www.amihaiemil.com/images/roundcharleslogo.PNG" align="left" height="100" width="100"/> | ||
# charles | ||
|
||
Smart web crawler. | ||
[![Build Status](https://travis-ci.org/amihaiemil/charles.svg?branch=master)](https://travis-ci.org/amihaiemil/charles) | ||
[![Coverage Status](https://coveralls.io/repos/github/amihaiemil/charles/badge.svg?branch=master&service=github)](https://coveralls.io/github/amihaiemil/charles?branch=master) | ||
|
||
### v 1.0.0 (to be released) | ||
|
||
A smart web crawler that fetches data from a website and stores it in some way (writes it in files on the disk or POSTs it to an http endpoint etc) . | ||
|
||
More options for crawling: | ||
|
||
1) crawl the links from a ``sitemap.xml`` | ||
|
||
2) crawl the website as a graph starting from a given url (the index) | ||
|
||
3) crawl with retrial if any ``RuntimeException`` happens etc | ||
### Under the hood | ||
|
||
Charles is powered by [Selenium WebDriver](http://www.seleniumhq.org/projects/webdriver/). | ||
Any WebDriver implementation can be used to build a ``WebCrawl`` | ||
Examples: | ||
- [PhantomJsDriver](https://github.com/detro/ghostdriver) | ||
- FirefoxDriver | ||
- ChromeDriver etc | ||
|
||
### How to contribute | ||
|
||
1. Open an issue regarding an improvement you thought of, or a bug you noticed. | ||
2. If the issue is confirmed, fork the repository, do the changes on a sepparate branch and make a Pull Request. | ||
3. After review and acceptance, the PR is merged and closed. | ||
4. You are automatically listed as a contributor on the project's site | ||
|
||
Make sure the maven build | ||
|
||
``$mvn clean install -Pitcases`` | ||
|
||
passes before making a PR. | ||
|
||
### Running integration tests: | ||
|
||
In order to run the integration tests you need to have PhantomJS installed on your machine and set the JVM system property ``phantomjsExec`` to point to that location. By default the exe is looked up at ``/usr/local/bin/phantomjs`` (linux), so if it's not found the tests won't work. |
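
The lookup described in the integration-test note (use the ``phantomjsExec`` system property, otherwise fall back to ``/usr/local/bin/phantomjs``) could be sketched roughly as below. This is only an illustration: the class `PhantomJsExec` and its `resolve` method are hypothetical names, not part of charles, and the README does not show how the tests actually read the property.

```java
import java.util.Properties;

/**
 * Hypothetical helper (not part of charles): resolves the PhantomJS
 * executable path following the rule stated in the README.
 */
final class PhantomJsExec {

    /** Default location mentioned in the README (Linux). */
    static final String DEFAULT_PATH = "/usr/local/bin/phantomjs";

    /** System property name mentioned in the README. */
    static final String PROPERTY = "phantomjsExec";

    private PhantomJsExec() { }

    /**
     * Returns the configured path, or the default one if the property
     * is missing or blank. Taking a Properties argument (an assumption
     * made here for testability) avoids touching global JVM state.
     */
    static String resolve(final Properties props) {
        final String configured = props.getProperty(PROPERTY);
        if (configured == null || configured.trim().isEmpty()) {
            return DEFAULT_PATH;
        }
        return configured.trim();
    }

    public static void main(final String[] args) {
        // Reads the real JVM system properties, e.g. -DphantomjsExec=...
        System.out.println(resolve(System.getProperties()));
    }
}
```

Started with ``-DphantomjsExec=<path>`` the sketch prints that path; without the property it prints the default location. Whether a ``-D`` flag given to Maven reaches the test JVM depends on the build configuration, which is not shown here.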