This repository has been archived by the owner on Apr 24, 2020. It is now read-only.

Merge pull request #66 from amihaiemil/65
Cleaned phantomjs references
amihaiemil committed Nov 4, 2016
2 parents 9c446fa + 57fe055 commit cbfb7c1
Showing 2 changed files with 65 additions and 38 deletions.
80 changes: 43 additions & 37 deletions README.md
@@ -1,37 +1,43 @@
<img src="http://www.amihaiemil.com/images/roundcharleslogo.PNG" align="left" height="100" width="100"/>
# charles

Smart web crawler.
[![Build Status](https://travis-ci.org/amihaiemil/charles.svg?branch=master)](https://travis-ci.org/amihaiemil/charles)
[![Coverage Status](https://coveralls.io/repos/github/amihaiemil/charles/badge.svg?branch=master&service=github)](https://coveralls.io/github/amihaiemil/charles?branch=master)

### v 1.0.0 (to be released)

A smart web crawler that fetches data from a website and stores it somewhere, for example by writing it to files on disk or POSTing it to an HTTP endpoint.

Two crawling options are available:

1) crawl the links from a ``sitemap.xml``

2) crawl the website as a graph, starting from a given URL (the index)

### Under the hood

Charles is powered by [Selenium WebDriver 2.41](http://www.seleniumhq.org/projects/webdriver/) and [PhantomJS](http://phantomjs.org/) through [GhostDriver](https://github.com/detro/ghostdriver). Support for other, graphical, drivers such as ChromeDriver and FirefoxDriver is planned.

### How to contribute

1. Open an issue describing an improvement you thought of or a bug you noticed.
2. If the issue is confirmed, fork the repository, make the changes on a separate branch and open a Pull Request.
3. After review and acceptance, the PR is merged and closed.
4. You are automatically listed as a contributor on the project's site.

Make sure the Maven build passes before making a PR:

``$ mvn clean install -Pitcases``

### Running integration tests

To run the integration tests, PhantomJS must be installed on your machine and the JVM system property ``phantomjsExec`` must point to its executable. By default, the executable is looked up at ``/usr/local/bin/phantomjs`` (Linux); if it is not found there, the tests will not work.
<img src="http://www.amihaiemil.com/images/roundcharleslogo.PNG" align="left" height="100" width="100"/>
# charles

Smart web crawler.
[![Build Status](https://travis-ci.org/amihaiemil/charles.svg?branch=master)](https://travis-ci.org/amihaiemil/charles)
[![Coverage Status](https://coveralls.io/repos/github/amihaiemil/charles/badge.svg?branch=master&service=github)](https://coveralls.io/github/amihaiemil/charles?branch=master)

### v 1.0.0 (to be released)

A smart web crawler that fetches data from a website and stores it somewhere, for example by writing it to files on disk or POSTing it to an HTTP endpoint.
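
This page does not show how Charles models storage, so the following is only a sketch under that assumption: storage hidden behind a small interface, so that writing to disk and POSTing to an endpoint become interchangeable. `PageRepository` and `FileRepository` are hypothetical names, not part of the project.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical storage abstraction; Charles's real API may differ.
interface PageRepository {
    // Persists the content fetched from the given URL.
    void save(String url, String content) throws IOException;
}

// Hypothetical implementation that writes every page to its own file.
final class FileRepository implements PageRepository {
    private final Path dir;

    FileRepository(Path dir) {
        this.dir = dir;
    }

    @Override
    public void save(String url, String content) throws IOException {
        Files.createDirectories(dir);
        // URL-encode the address so slashes and '?' do not break the file path.
        String name = URLEncoder.encode(url, StandardCharsets.UTF_8) + ".html";
        Files.writeString(dir.resolve(name), content, StandardCharsets.UTF_8);
    }
}
```

An HTTP-posting variant could implement the same interface with `java.net.http.HttpClient` (Java 11+); the crawl itself would then depend only on `PageRepository`.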

Supported crawling options:

1) crawl the links from a ``sitemap.xml``

2) crawl the website as a graph, starting from a given URL (the index)

3) crawl with retries if a ``RuntimeException`` occurs, etc.

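
The three options above can be illustrated in plain Java. This is a sketch, not Charles's implementation: `CrawlSketch` and its methods are hypothetical, and `linksOf` stands in for "open the page and read its links", which in Charles would go through WebDriver.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helpers illustrating the crawling options; not Charles's API.
final class CrawlSketch {

    // Option 1: collect every <loc> entry of a sitemap.xml document.
    static List<String> sitemapUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList locs = doc.getElementsByTagName("loc");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            urls.add(locs.item(i).getTextContent().trim());
        }
        return urls;
    }

    // Option 2: breadth-first traversal of the site graph from the index page.
    static List<String> graphUrls(String index, Function<String, List<String>> linksOf) {
        Set<String> seen = new LinkedHashSet<>(); // keeps discovery order, no duplicates
        Deque<String> queue = new ArrayDeque<>();
        seen.add(index);
        queue.add(index);
        while (!queue.isEmpty()) {
            String page = queue.poll();
            for (String link : linksOf.apply(page)) {
                if (seen.add(link)) { // true only the first time a URL is met
                    queue.add(link);
                }
            }
        }
        return new ArrayList<>(seen);
    }

    // Option 3: rerun an action up to `attempts` times while it throws RuntimeException.
    static <T> T withRetries(int attempts, Supplier<T> action) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last; // every attempt failed: rethrow the most recent error
    }
}
```

The breadth-first order visits pages closest to the index first, and the `seen` set stops cycles between pages that link to each other. A parser reading sitemaps from untrusted sites should also disable external entity resolution.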
### Under the hood

Charles is powered by [Selenium WebDriver](http://www.seleniumhq.org/projects/webdriver/).
Any WebDriver implementation can be used to build a ``WebCrawl``, for example:

- [PhantomJsDriver](https://github.com/detro/ghostdriver)
- FirefoxDriver
- ChromeDriver

### How to contribute

1. Open an issue describing an improvement you thought of or a bug you noticed.
2. If the issue is confirmed, fork the repository, make the changes on a separate branch and open a Pull Request.
3. After review and acceptance, the PR is merged and closed.
4. You are automatically listed as a contributor on the project's site.

Make sure the Maven build passes before making a PR:

``$ mvn clean install -Pitcases``

### Running integration tests

To run the integration tests, PhantomJS must be installed on your machine and the JVM system property ``phantomjsExec`` must point to its executable. By default, the executable is looked up at ``/usr/local/bin/phantomjs`` (Linux); if it is not found there, the tests will not work.
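
The lookup described above can be sketched in plain Java. `PhantomJsLocator` is a hypothetical helper; the project's tests may resolve the ``phantomjsExec`` property differently.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper: resolves the PhantomJS executable as the README describes.
final class PhantomJsLocator {
    // Default location given in the README (Linux).
    static final String DEFAULT_EXEC = "/usr/local/bin/phantomjs";

    // Reads the phantomjsExec system property, falling back to the default,
    // and returns the path only if it is a regular, executable file.
    static Optional<Path> locate() {
        Path exec = Paths.get(System.getProperty("phantomjsExec", DEFAULT_EXEC));
        if (Files.isRegularFile(exec) && Files.isExecutable(exec)) {
            return Optional.of(exec);
        }
        return Optional.empty();
    }
}
```

A test could call `locate()` first and skip itself with JUnit's `Assume.assumeTrue(...)` when the result is empty, instead of failing. The property could be supplied on the Maven command line, for example `-DphantomjsExec=/opt/phantomjs/bin/phantomjs`, assuming the build forwards that user property to the test JVM.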
23 changes: 22 additions & 1 deletion pom.xml
@@ -45,10 +45,31 @@
<artifactId>slf4j-log4j12</artifactId>
<version>1.7.21</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-remote-driver</artifactId>
<version>2.41.0</version>
</dependency>
<dependency>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
<version>2.41.0</version>
</dependency>
<dependency>
<groupId>com.github.detro</groupId>
<artifactId>phantomjsdriver</artifactId>
<version>1.2.0</version>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-java</artifactId>
</exclusion>
<exclusion>
<groupId>org.seleniumhq.selenium</groupId>
<artifactId>selenium-remote-driver</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>com.jcabi</groupId>
@@ -118,7 +139,7 @@
<artifactId>coveralls-maven-plugin</artifactId>
<version>4.2.0</version>
<configuration>
<failOnServiceError>false</failOnServiceError>
<failOnServiceError>false</failOnServiceError>
</configuration>
</plugin>
</plugins>
