A data scraping library for Java
Licensed under the BSD License.
eXparity Data is a Java library which provides data scraping, manipulation, and ingestion tools for both structured and unstructured data sourced from the internet, local files, or any other source.
You can obtain eXparity Data binaries from Maven Central. To include it in your project:
For a Maven project:
<dependency>
    <groupId>org.exparity</groupId>
    <artifactId>exparity-data</artifactId>
    <version>1.0.0</version>
</dependency>
eXparity Data has a single binary, exparity-data.jar, which contains all the utilities. Sources and Javadoc jars are also available.
The exparity-data library currently supports four file formats: HTML, XML, CSV, and Text. It can load them from the internet, a classpath resource, the file system, or InputStream and Reader implementations.
The file format classes are all found in the org.exparity.data package and can be instantiated through static methods. For example:
HTML html = HTML.openURL("http://www.google.com/");
CSV csv = CSV.openFile("C:/Users/Bob/Desktop/MyCSV.csv");
Once a file has been instantiated, the library provides tools to interrogate and process the data. For example:
List<String> headers = CSV.openFile("...").getHeaders();
List<Anchor> anchors = HTML.openURL("...").findAnchors();
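To give a feel for the kind of result a header lookup produces, here is a minimal sketch using only the JDK. It does not use exparity-data and is not how the library is implemented. The class name CsvHeaderSketch and the method readHeaders are hypothetical, and the naive comma split assumes the header row contains no quoted commas.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper for illustration only; not part of exparity-data. */
class CsvHeaderSketch {

    /** Reads the first line from the source and splits it into trimmed column names. */
    public static List<String> readHeaders(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String firstLine = reader.readLine();
            if (firstLine == null) {
                return Collections.emptyList(); // empty input, so no header row
            }
            List<String> headers = new ArrayList<>();
            // Assumption: no quoted fields. The -1 limit keeps trailing empty columns.
            for (String cell : firstLine.split(",", -1)) {
                headers.add(cell.trim());
            }
            return headers;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader csv = new StringReader("name, email ,age\nBob,bob@example.com,42\n");
        System.out.println(readHeaders(csv)); // prints [name, email, age]
    }
}
```

With the library, the CSV class handles this for you; the sketch only shows the shape of the data you get back, a list of column names in file order.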
The Javadocs include examples for every method, so consult them for details on specific methods.
The source follows the standard Maven folder structure for a jar project.
- Core classes [src/main/java]
- Unit tests [src/test/java]
The source includes a pom.xml for building with Maven.
1.0.0
- Initial release cut of code
Developers:
- Stewart Bissett