Update jars, fix link rot #41

Merged: 2 commits, Jan 28, 2018
4 changes: 2 additions & 2 deletions .gitignore
@@ -1,6 +1,6 @@
-deps/tika-app-1.10.jar
+deps/*.jar
 deps/fop-2.0
-deps/fop-2.0-bin.tar.gz
+deps/*.tar.gz
 test/simple.pdf
 .DS_Store
 docs/build
12 changes: 6 additions & 6 deletions deps/build.jl
@@ -1,15 +1,15 @@
 tdeps = dirname(@__FILE__)
-tika_jar = joinpath(tdeps, "tika-app-1.10.jar")
+tika_jar = joinpath(tdeps, "tika-app-1.17.jar")
 if !isfile(tika_jar)
-    info(" Downloading tika-app-1.10.jar from Maven Central")
-    download("http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-app/1.10/tika-app-1.10.jar", tika_jar)
+    info(" Downloading tika-app-1.17.jar from Maven Central")
+    download("http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-app/1.17/tika-app-1.17.jar", tika_jar)
 end

-fop_jar = joinpath(tdeps, "fop-2,0", "fop-2.0.jar")
-fop_gz = joinpath(tdeps, "fop-2.0-bin.tar.gz")
+fop_jar = joinpath(tdeps, "fop-2,2", "fop-2.2.jar")
+fop_gz = joinpath(tdeps, "fop-2.2-bin.tar.gz")

 if !isfile(fop_gz)
-    info(" Downloading fop-2.0 binary from Apache OSUOSL Mirror")
+    info(" Downloading fop-2.2 binary from Apache OSUOSL Mirror")
     download("http://apache.osuosl.org/xmlgraphics/fop/binaries/fop-2.0-bin.tar.gz", fop_gz)
 end
 if !isfile(fop_jar)
6 changes: 3 additions & 3 deletions docs/src/guide/extract.md
@@ -5,7 +5,7 @@ Taro includes a few high level functions that extract data from various document
 ##Text extraction

 The [`Taro.extract`](@ref) method retrieves document metadata and the body text of a document,
-using [Apache Tika](https://tika.apache.org/). Formats [supported by Tika](https://tika.apache.org/1.13/formats.html)
+using [Apache Tika](https://tika.apache.org/). Formats [supported by Tika](https://tika.apache.org/1.17/formats.html)
 include MS Office and Open Office documents, as well as PDF files.

 The function return a Tuple of a Dict and String. The Dict contains name/value pairs of various metadata from the document, while the string contains the body text of the document.
@@ -21,12 +21,12 @@ text[1:53]
 ## Read Excel files into a DataFrame

 The [`Taro.readxl`](@ref) method reads a rectangular region from an excel sheet, and
-returns a [Dataframe](http://juliastats.github.io/DataFrames.jl/stable/man/getting_started/#The-DataFrame-Type-1).
+returns a [Dataframe](http://juliadata.github.io/DataFrames.jl/latest/man/getting_started.html#The-DataFrame-Type-1).
 This function takes as an input parameter the name and path of the Excel file to be read. A sheet name (or number) can optionally be supplied. If no sheet information is given, the first sheet (index 0) is read. Finally, this
 function is provided with the rectangular region from which data is extracted. This region is specified as an excel
 range.

-This function is similar to, and inspired by, the [readtable](http://juliastats.github.io/DataFrames.jl/stable/man/io/#DataFrames.readtable) function in DataFrames.
+This function is similar to, and inspired by, the [CSV.read/DataFrames.readtable](http://juliadata.github.io/CSV.jl/latest/#CSV.read) function in CSV/DataFrames.

 ```@repl
 using Taro # hide
6 changes: 3 additions & 3 deletions docs/src/index.md
@@ -19,8 +19,8 @@ The [ExcelReaders.jl](https://github.com/davidanthoff/ExcelReaders.jl) package a
 julia> Pkg.add("Taro")
 ```

-On installation, the `tika-app-1.4.jar` file will be downloaded from *Maven Central*
-and `fop-2.0` will be downloaded from an Apache mirror.
+On installation, the `tika-app-1.17.jar` file will be downloaded from *Maven Central*
+and `fop-2.2` will be downloaded from an Apache mirror.

 ## Usage

@@ -31,4 +31,4 @@ This will set up the correct classpath, and initialise the JVM.
 using Taro
 Taro.init()
 ```
-Note: The reason why we do not run init() automatically on module load has to do with the fact that only one embedded JVM can be loaded per process. We need to set the classpath when we start the JVM. Thus, when we load two different packages which both depend on the JVM, we need provide the ability for all packages to modify the Java classpath.
+Note: The reason why we do not run init() automatically on module load has to do with the fact that only one embedded JVM can be loaded per process. We need to set the classpath when we start the JVM. Thus, when we load two different packages which both depend on the JVM, we need provide the ability for all packages to modify the Java classpath.
2 changes: 1 addition & 1 deletion src/Taro.jl
@@ -4,7 +4,7 @@ using JavaCall
 using DataFrames
 using DataArrays

-tika_jar = joinpath(dirname(@__FILE__), "..", "deps", "tika-app-1.10.jar")
+tika_jar = joinpath(dirname(@__FILE__), "..", "deps", "tika-app-1.17.jar")
 fop_lib = joinpath(dirname(@__FILE__), "..", "deps", "fop-2.0", "lib", "*")
 fop_jar = joinpath(dirname(@__FILE__), "..", "deps", "fop-2.0", "build", "fop.jar")
