Commit

Merge branch 'v2' of github.com:aaronland/dogeared-extruder into main
thisisaaronland committed Feb 19, 2022
2 parents 4e97a84 + dc1fa81 commit eb8eda6
Showing 16 changed files with 157 additions and 112 deletions.
2 changes: 1 addition & 1 deletion Dockerfile
@@ -28,4 +28,4 @@ FROM openjdk:17-slim

RUN mkdir /usr/local/jar

COPY --from=builder /usr/src/dogeared-extruder/target/extruder-1.1.jar /usr/local/jar/dogeared-extruder.jar
COPY --from=builder /usr/src/dogeared-extruder/target/extruder-2.0.jar /usr/local/jar/dogeared-extruder.jar
2 changes: 1 addition & 1 deletion Makefile
@@ -7,7 +7,7 @@ build: todo
mvn install

run:
java -jar target/extruder-1.0.jar server
java -jar target/extruder-2.0.jar server

todo:
echo "# Generated automatically at" `date` > TODO.txt
10 changes: 3 additions & 7 deletions README.md
@@ -5,16 +5,12 @@ This is a meant to be a simple HTTP Pony to wrap the `boilerpipe` and `Tika` and
clones of the `readability` text extraction libraries using the `dropwizard`
framework.

Important
Version "2"
--

This package was not updated between May 2014 and February 2022.

There is a [v2 branch](https://github.com/aaronland/dogeared-extruder/tree/v2) for this package with up-to-date dependencies.
Unfortunately, some of those dependencies contain changes that need to be accounted for in this package's code. That
work is underway. Any help or suggestions would be appreciated.

In the meantime, known security vulnerabilities for older dependencies have been addressed.
In February 2022 "version 2" was released which introduces no new user-facing features but updated the internal code, where necessary, to account for updated dependencies and known security vulnerabilities.

Quick start
--
@@ -24,7 +20,7 @@ To start the server:
$> cd dogeared-extruder
$> make build
... JAVA STUFF ...
$> java -jar target/extruder-1.1.jar server
$> java -jar target/extruder-2.0.jar server
... MOAR JAVA STUFF ...
INFO [2013-08-30 12:49:12,184] org.eclipse.jetty.server.AbstractConnector: Started InstrumentedBlockingChannelConnector@0.0.0.0:8080
INFO [2013-08-30 12:49:12,189] org.eclipse.jetty.server.AbstractConnector: Started SocketConnector@0.0.0.0:8081
7 changes: 3 additions & 4 deletions TODO.txt
@@ -1,11 +1,10 @@
# Generated automatically at Tue Feb 15 22:45:38 PST 2022
# Generated automatically at Fri Feb 18 17:32:35 PST 2022

./src/main/java/info/aaronland/extruder/ExtruderService.java:30: // TODO: put me in the config file... (20130908/straup)
./src/main/java/info/aaronland/extruder/JavaReadabilityResource.java:54: // TODO: trap MalformedURLExceptions and return NOT_ACCEPTABLE here (20130901/straup)
./src/main/java/info/aaronland/extruder/ExtruderApplication.java~:30: // TODO: put me in the config file... (20130908/straup)
./src/main/java/info/aaronland/extruder/BoilerpipeResource.java:52: // TODO: trap MalformedURLExceptions and return NOT_ACCEPTABLE here (20130901/straup)
./src/main/java/info/aaronland/extruder/ExtruderApplication.java:40: // TODO: put me in the config file... (20130908/straup)
./src/main/java/com/basistech/readability/Readability.java:93: // TODO: reset the results.
./src/main/java/com/basistech/readability/Readability.java:368: * http://www.peachpit.com/articles/article.aspx?p=31567&seqNum=5 TODO: Shouldn't this be a reverse
./src/main/java/com/basistech/readability/Readability.java:686: * at the same time without effecting the traversal. TODO: Consider taking into account original
./src/main/java/info/aaronland/extruder/TikaResource.java:140: // TO DO: figure out how to make this return HTML instead of text
./src/main/java/info/aaronland/extruder/TikaResource.java:139: // TO DO: figure out how to make this return HTML instead of text
./src/main/java/info/aaronland/extruder/Upload.java:20: // TO DO: sort out file extensions etc.
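Two of the TODO entries above ask for MalformedURLExceptions to be trapped and answered with NOT_ACCEPTABLE. The following is only a minimal sketch of that idea in plain Java, not the project's code: the class `UrlCheckSketch`, the method `statusFor` and the use of bare integer status codes (instead of a JAX-RS response) are assumptions made so the example stays self-contained.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical illustration; none of these names exist in dogeared-extruder.
public class UrlCheckSketch {

    static final int OK = 200;
    static final int NOT_ACCEPTABLE = 406;

    // Return 406 when the input cannot be parsed as a URL, 200 otherwise.
    static int statusFor(String raw) {
        try {
            new URL(raw);
            return OK;
        } catch (MalformedURLException e) {
            // This is the case the TODO wants to trap.
            return NOT_ACCEPTABLE;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusFor("https://github.com/aaronland/dogeared-extruder/"));
        System.out.println(statusFor("not a url"));
    }
}
```

In a real resource class the same try/catch would presumably wrap the URL parsing and build the HTTP response there; that wiring is left out because it depends on code not shown in this diff.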
6 changes: 6 additions & 0 deletions configuration.yml
@@ -1,2 +1,8 @@
logging:
level: INFO
loggers:
info.aaronland.extruder: DEBUG
viewRendererConfiguration:
freemarker:
strict_syntax: yes
whitespace_stripping: yes
76 changes: 52 additions & 24 deletions pom.xml
@@ -8,38 +8,53 @@

<groupId>info.aaronland.extruder</groupId>
<artifactId>extruder</artifactId>
<version>1.1</version>
<version>2.0</version>

<dependencies>

<!-- dropwizard stuff -->

<dependency>
<groupId>com.yammer.dropwizard</groupId>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-core</artifactId>
<version>0.6.2</version>
<version>2.0.28</version>
</dependency>

<dependency>
<groupId>com.yammer.dropwizard</groupId>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-views</artifactId>
<version>0.6.2</version>
<version>2.0.28</version>
</dependency>

<dependency>
<groupId>com.yammer.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>2.2.0</version>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-forms</artifactId>
<version>2.0.28</version>
</dependency>

<dependency>
<groupId>com.sun.jersey.contribs</groupId>
<artifactId>jersey-multipart</artifactId>
<!-- this is important and needs to be in sync with the jersey defined in -->
<!-- https://github.com/codahale/dropwizard/blob/master/pom.xml -->
<version>1.17.1</version>
</dependency>
<groupId>io.dropwizard</groupId>
<artifactId>dropwizard-views-freemarker</artifactId>
<version>2.0.28</version>
</dependency>

<dependency>
<groupId>io.dropwizard.metrics</groupId>
<artifactId>metrics-core</artifactId>
<version>4.2.8</version>
</dependency>

<!--
not 3.x.y, because dropwizard is still using 2.29
- https://www.dropwizard.io/en/latest/manual/upgrade-notes/upgrade-notes-2_0_x.html
-->

<dependency>
<groupId>org.glassfish.jersey.media</groupId>
<artifactId>jersey-media-multipart</artifactId>
<version>2.29.1</version>
</dependency>

<!-- boilerpipe stuff -->

<dependency>
@@ -53,28 +68,41 @@
<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-core</artifactId>
<version>1.22</version>
<version>2.3.0</version>
</dependency>

<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers</artifactId>
<version>1.22</version>
<version>2.3.0</version>
<type>pom</type>
</dependency>

<dependency>
<groupId>org.apache.tika</groupId>
<artifactId>tika-parsers-standard-package</artifactId>
<version>2.3.0</version>
</dependency>

<!-- Misc dependencies -->

<!-- https://mvnrepository.com/artifact/javax.xml.bind/jaxb-api -->

<dependency>
<groupId>javax.xml.bind</groupId>
<artifactId>jaxb-api</artifactId>
<version>2.3.1</version>
</dependency>

<dependency>
<groupId>javax.mail</groupId>
<artifactId>javax.mail-api</artifactId>
<version>1.6.2</version>
</dependency>

<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.1</version>
<version>3.12.0</version>
</dependency>

<dependency>
@@ -86,7 +114,7 @@
<dependency>
<groupId>net.sourceforge.nekohtml</groupId>
<artifactId>nekohtml</artifactId>
<version>1.9.18</version>
<version>1.9.22</version>
</dependency>

<!-- JavaReadability depedencies -->
@@ -102,7 +130,7 @@
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.7</version>
<version>2.11.0</version>
<type>jar</type>
<scope>compile</scope>
</dependency>
@@ -120,7 +148,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.0</version>
<version>3.10.0</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
@@ -132,7 +160,7 @@
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-shade-plugin</artifactId>
<version>1.6<!--2.1--></version>
<version>3.2.4</version>
<executions>
<execution>
<phase>package</phase>
@@ -144,7 +172,7 @@
<transformers>
<transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
<transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
<mainClass>info.aaronland.extruder.ExtruderService</mainClass>
<mainClass>info.aaronland.extruder.ExtruderApplication</mainClass>
</transformer>
</transformers>
<!-- http://stackoverflow.com/questions/999489/invalid-signature-file-when-attempting-to-run-a-jar -->
@@ -166,7 +194,7 @@
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.2.1</version>
<version>3.0.0</version>
<configuration>
<mainClass>info.aaronland.extruder.ExtruderService</mainClass>
<arguments>
4 changes: 3 additions & 1 deletion src/main/java/com/basistech/readability/FilePageReader.java
@@ -22,7 +22,9 @@
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.tika.io.IOUtils;
import org.apache.commons.io.IOUtils ;

// import org.apache.tika.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

6 changes: 3 additions & 3 deletions src/main/java/info/aaronland/extruder/BoilerpipeResource.java
@@ -7,9 +7,9 @@
import java.io.InputStream;
import java.io.File;

import com.sun.jersey.core.header.FormDataContentDisposition;
import com.sun.jersey.multipart.FormDataMultiPart;
import com.sun.jersey.multipart.FormDataBodyPart;
import org.glassfish.jersey.media.multipart.FormDataContentDisposition;
import org.glassfish.jersey.media.multipart.FormDataMultiPart;
import org.glassfish.jersey.media.multipart.FormDataBodyPart;
import javax.ws.rs.core.MediaType;

import javax.ws.rs.GET;
15 changes: 1 addition & 14 deletions src/main/java/info/aaronland/extruder/DocumentView.java
@@ -1,11 +1,7 @@
package info.aaronland.extruder;

import info.aaronland.extruder.Document;
import com.yammer.dropwizard.views.View;

import com.google.common.base.Charsets;
import com.google.common.base.Optional;
import java.nio.charset.Charset;
import io.dropwizard.views.View;

public class DocumentView extends View {
private final Document document;
@@ -19,13 +15,4 @@ public Document getDocument(){
return document;
}

// Because in com/codahale/dropwizard/views/freemarker/FreemarkerViewRenderer.java this:
// final Charset charset = view.getCharset().or(Charset.forName(configuration.getEncoding(locale)));
// And since the default encoding for en-us is ISO-8859-1... good times
// (20130908/straup)

public Optional<Charset> getCharset(){
return Optional.of(Charsets.UTF_8);
}

}
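
The comment removed from DocumentView.java gives the reason the old `getCharset()` override existed: the renderer fell back to the locale's default encoding, which the comment says was ISO-8859-1 for en-us. As a rough illustration of why that fallback was a problem, the plain-Java sketch below round-trips a string through both encodings. The class `CharsetDemo` and the method `roundTrip` are hypothetical names for this example only and use nothing from dropwizard.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration; not part of dogeared-extruder.
public class CharsetDemo {

    // Encode the text with the given charset, then decode the bytes with the same charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "snowman: \u2603";

        // UTF-8 can represent the character, so the text survives intact.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));

        // ISO-8859-1 cannot, so the character is replaced during encoding.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals(text));
    }
}
```

Whether the 2.x view renderer still needs an explicit UTF-8 setting is not something this diff states; the commit only shows that the override was dropped.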
46 changes: 46 additions & 0 deletions src/main/java/info/aaronland/extruder/ExtruderApplication.java
@@ -0,0 +1,46 @@
package info.aaronland.extruder;

import io.dropwizard.Application;
import io.dropwizard.setup.Bootstrap;
import io.dropwizard.setup.Environment;
import io.dropwizard.views.ViewBundle;
import io.dropwizard.forms.MultiPartBundle;

import info.aaronland.extruder.ExtruderConfiguration;

import java.net.URL;
import java.util.Map;

public class ExtruderApplication extends Application<ExtruderConfiguration> {

public static void main(String[] args) throws Exception {
new ExtruderApplication().run(args);
}

public void initialize(Bootstrap<ExtruderConfiguration> bootstrap) {

bootstrap.addBundle(new MultiPartBundle());

bootstrap.addBundle(new ViewBundle<ExtruderConfiguration>(){

@Override
public Map<String, Map<String, String>> getViewConfiguration(ExtruderConfiguration config) {
return config.getViewRendererConfiguration();
}
});
}

@Override
public void run(ExtruderConfiguration conf, Environment env) throws Exception {

env.jersey().register(new BoilerpipeResource());
env.jersey().register(new TikaResource());
env.jersey().register(new JavaReadabilityResource());

// TODO: put me in the config file... (20130908/straup)
URL healthcheck_url = new URL("https://github.com/aaronland/dogeared-extruder/");

env.healthChecks().register("internets", new InternetsHealthCheck(healthcheck_url));
}

}
20 changes: 19 additions & 1 deletion src/main/java/info/aaronland/extruder/ExtruderConfiguration.java
@@ -1,9 +1,27 @@
package info.aaronland.extruder;

import com.yammer.dropwizard.config.Configuration;
import io.dropwizard.Configuration;

import javax.validation.Valid;
import javax.validation.constraints.NotNull;

import java.util.Collections;
import java.util.Map;

import com.fasterxml.jackson.annotation.JsonProperty;

public class ExtruderConfiguration extends Configuration {

@NotNull
private Map<String, Map<String, String>> viewRendererConfiguration = Collections.emptyMap();

@JsonProperty("viewRendererConfiguration")
public Map<String, Map<String, String>> getViewRendererConfiguration() {
return viewRendererConfiguration;
}

@JsonProperty("viewRendererConfiguration")
public void setViewRendererConfiguration(Map<String, Map<String, String>> viewRendererConfiguration) {
this.viewRendererConfiguration = viewRendererConfiguration;
}
}
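
The `Map<String, Map<String, String>>` exposed by `getViewRendererConfiguration()` appears to mirror the `viewRendererConfiguration` block added to configuration.yml: the outer key names a renderer (`freemarker`) and the inner map holds its options. The sketch below builds that shape by hand in plain Java to make the nesting concrete. It is an assumption-laden illustration: `ViewConfigSketch`, `sampleConfig` and `optionsFor` are invented names, and the literal `"yes"` values are copied from the YAML as written, while the strings a YAML parser actually produces may differ.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; not part of dogeared-extruder or dropwizard.
public class ViewConfigSketch {

    // Hand-built equivalent of the viewRendererConfiguration block in configuration.yml.
    static Map<String, Map<String, String>> sampleConfig() {
        Map<String, String> freemarker = new LinkedHashMap<>();
        freemarker.put("strict_syntax", "yes");          // value assumed, copied from the YAML text
        freemarker.put("whitespace_stripping", "yes");   // value assumed, copied from the YAML text

        Map<String, Map<String, String>> config = new LinkedHashMap<>();
        config.put("freemarker", freemarker);
        return config;
    }

    // Options for one renderer, or an empty map when that renderer has no entry.
    static Map<String, String> optionsFor(Map<String, Map<String, String>> config, String renderer) {
        return config.getOrDefault(renderer, Collections.emptyMap());
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> config = sampleConfig();
        System.out.println(optionsFor(config, "freemarker").keySet());
        System.out.println(optionsFor(config, "mustache").isEmpty());
    }
}
```

The empty-map default in `optionsFor` plays the same role as the `Collections.emptyMap()` initializer in `ExtruderConfiguration`: a missing block yields no options rather than a null.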
35 changes: 0 additions & 35 deletions src/main/java/info/aaronland/extruder/ExtruderService.java

This file was deleted.
