From b14b665a7995aec8a59a6f4500aa87fefa3878c1 Mon Sep 17 00:00:00 2001
From: randyzwitch
Date: Sat, 27 Jan 2018 18:56:20 -0500
Subject: [PATCH] Update jars, fix link rot

Update jars to the newest ones online, fix documentation where DataFrames moved to JuliaData.
---
 deps/build.jl             | 12 ++++++------
 docs/src/guide/extract.md |  6 +++---
 docs/src/index.md         |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/deps/build.jl b/deps/build.jl
index 7b6cbe1..beda1c2 100644
--- a/deps/build.jl
+++ b/deps/build.jl
@@ -1,15 +1,15 @@
 tdeps = dirname(@__FILE__)
 
-tika_jar = joinpath(tdeps, "tika-app-1.10.jar")
+tika_jar = joinpath(tdeps, "tika-app-1.17.jar")
 if !isfile(tika_jar)
-    info(" Downloading tika-app-1.10.jar from Maven Central")
-    download("http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-app/1.10/tika-app-1.10.jar", tika_jar)
+    info(" Downloading tika-app-1.17.jar from Maven Central")
+    download("http://search.maven.org/remotecontent?filepath=org/apache/tika/tika-app/1.17/tika-app-1.17.jar", tika_jar)
 end
 
-fop_jar = joinpath(tdeps, "fop-2,0", "fop-2.0.jar")
-fop_gz = joinpath(tdeps, "fop-2.0-bin.tar.gz")
+fop_jar = joinpath(tdeps, "fop-2,2", "fop-2.2.jar")
+fop_gz = joinpath(tdeps, "fop-2.2-bin.tar.gz")
 if !isfile(fop_gz)
-    info(" Downloading fop-2.0 binary from Apache OSUOSL Mirror")
+    info(" Downloading fop-2.2 binary from Apache OSUOSL Mirror")
     download("http://apache.osuosl.org/xmlgraphics/fop/binaries/fop-2.0-bin.tar.gz", fop_gz)
 end
 if !isfile(fop_jar)
diff --git a/docs/src/guide/extract.md b/docs/src/guide/extract.md
index 766c88f..692499c 100644
--- a/docs/src/guide/extract.md
+++ b/docs/src/guide/extract.md
@@ -5,7 +5,7 @@ Taro includes a few high level functions that extract data from various document
 ##Text extraction
 
 The [`Taro.extract`](@ref) method retrieves document metadata and the body text of a document,
-using [Apache Tika](https://tika.apache.org/). Formats [supported by Tika](https://tika.apache.org/1.13/formats.html)
+using [Apache Tika](https://tika.apache.org/). Formats [supported by Tika](https://tika.apache.org/1.17/formats.html)
 include MS Office and Open Office documents, as well as PDF files. The function return a Tuple
 of a Dict and String. The Dict contains name/value pairs of various metadata from the document,
 while the string contains the body text of the document.
@@ -21,12 +21,12 @@ text[1:53]
 ## Read Excel files into a DataFrame
 
 The [`Taro.readxl`](@ref) method reads a rectangular region from an excel sheet, and
-returns a [Dataframe](http://juliastats.github.io/DataFrames.jl/stable/man/getting_started/#The-DataFrame-Type-1).
+returns a [Dataframe](http://juliadata.github.io/DataFrames.jl/latest/man/getting_started.html#The-DataFrame-Type-1).
 This function takes as an input parameter the name and path of the Excel file to be read.
 A sheet name (or number) can optionally be supplied. If no sheet information is given,
 the first sheet (index 0) is read. Finally, this function is provided with the rectangular
 region from which data is extracted. This region is specified as an excel range.
-This function is similar to, and inspired by, the [readtable](http://juliastats.github.io/DataFrames.jl/stable/man/io/#DataFrames.readtable) function in DataFrames.
+This function is similar to, and inspired by, the [CSV.read/DataFrames.readtable](http://juliadata.github.io/CSV.jl/latest/#CSV.read) function in CSV/DataFrames.
 
 ```@repl
 using Taro # hide
diff --git a/docs/src/index.md b/docs/src/index.md
index acd3349..1da8afa 100644
--- a/docs/src/index.md
+++ b/docs/src/index.md
@@ -19,8 +19,8 @@ The [ExcelReaders.jl](https://github.com/davidanthoff/ExcelReaders.jl) package a
 julia> Pkg.add("Taro")
 ```
 
-On installation, the `tika-app-1.4.jar` file will be downloaded from *Maven Central*
-and `fop-2.0` will be downloaded from an Apache mirror.
+On installation, the `tika-app-1.17.jar` file will be downloaded from *Maven Central*
+and `fop-2.2` will be downloaded from an Apache mirror.
 
 ## Usage
 
@@ -31,4 +31,4 @@ This will set up the correct classpath, and initialise the JVM.
 using Taro
 Taro.init()
 ```
-Note: The reason why we do not run init() automatically on module load has to do with the fact that only one embedded JVM can be loaded per process. We need to set the classpath when we start the JVM. Thus, when we load two different packages which both depend on the JVM, we need provide the ability for all packages to modify the Java classpath.
+Note: The reason why we do not run init() automatically on module load has to do with the fact that only one embedded JVM can be loaded per process. We need to set the classpath when we start the JVM. Thus, when we load two different packages which both depend on the JVM, we need provide the ability for all packages to modify the Java classpath.