This repository has been archived by the owner on Jul 3, 2023. It is now read-only.

RELEASE-NOTES.md

     Release Notes - Apache Any23 - Version 2.7

New Feature

  • [ANY23-546] - Implement sonarcloud.io in Any23 continuous integration

Improvement

  • [ANY23-553] - Document MathUtils#md5 to warn that the weak hash algorithm is not to be used in a sensitive context
  • [ANY23-555] - Bump buildnumber-maven-plugin from 1.4 to 3.0.0
  • [ANY23-556] - Bump spotbugs-maven-plugin from 4.5.2.0 to 4.5.3.0
  • [ANY23-558] - Bump maven-jar-plugin from 3.2.0 to 3.2.1
  • [ANY23-559] - Bump maven-site-plugin from 3.9.1 to 3.10.0
  • [ANY23-560] - Bump jcommander from 1.81 to 1.82
  • [ANY23-561] - Bump slf4j-api from 1.7.32 to 1.7.33
  • [ANY23-562] - Bump maven-jar-plugin from 3.2.1 to 3.2.2
  • [ANY23-563] - Bump maven-compiler-plugin from 3.8.1 to 3.9.0
  • [ANY23-564] - Bump slf4j-api from 1.7.33 to 1.7.35
  • [ANY23-565] - Bump xercesImpl from 2.12.1 to 2.12.2
  • [ANY23-566] - Bump mockito-core from 4.2.0 to 4.3.1
  • [ANY23-568] - Bump maven-compiler-plugin from 3.9.0 to 3.10.0
  • [ANY23-569] - Bump ossindex-maven-plugin from 3.1.0 to 3.2.0
  • [ANY23-570] - Bump tika.version from 2.2.1 to 2.3.0
  • [ANY23-571] - Bump maven-project-info-reports-plugin from 3.1.2 to 3.2.1
  • [ANY23-572] - Upgrade owlapi to 5.1.20

     Release Notes - Apache Any23 - Version 2.6

Bug

  • [ANY23-524] - XFNExtractor: NPE on parsing links with incorrect HREF attribute

Improvement

  • [ANY23-307] - Ensure Microformats test suite compliance
  • [ANY23-485] - Activate dependabot on Any23 codebase
  • [ANY23-486] - Bump jackson.version from 2.12.2 to 2.12.5
  • [ANY23-487] - Bump jsoup from 1.13.1 to 1.14.2
  • [ANY23-488] - Bump jsonld-java from 0.13.2 to 0.13.3
  • [ANY23-489] - Bump slf4j.logger.version from 1.7.30 to 1.7.32
  • [ANY23-490] - Bump httpclient.version from 4.5.12 to 4.5.13
  • [ANY23-491] - Bump tika.version from 1.24 to 1.27
  • [ANY23-492] - Bump poi.version from 4.1.2 to 5.0.0
  • [ANY23-493] - Bump commons-io from 2.6 to 2.7
  • [ANY23-494] - Bump maven-gpg-plugin from 1.6 to 3.0.1
  • [ANY23-496] - Bump tika.version from 1.27 to 2.1.0
  • [ANY23-497] - Bump commons-codec from 1.14 to 1.15
  • [ANY23-498] - Bump httpcore from 4.4.13 to 4.4.14
  • [ANY23-499] - Bump spotbugs-maven-plugin from 4.1.3 to 4.3.0
  • [ANY23-500] - Bump maven-assembly-plugin from 3.1.1 to 3.3.0
  • [ANY23-501] - Bump maven-invoker-plugin from 3.2.1 to 3.2.2
  • [ANY23-502] - Bump maven-surefire-plugin from 3.0.0-M3 to 3.0.0-M5
  • [ANY23-503] - Bump apache from 21 to 24
  • [ANY23-504] - XML-based parsers should not load external DTDs by default
  • [ANY23-505] - Bump maven-scm-publish-plugin from 1.0-beta-2 to 3.1.0
  • [ANY23-506] - Bump jcommander from 1.78 to 1.81
  • [ANY23-507] - Bump commons-csv from 1.8 to 1.9.0
  • [ANY23-508] - Bump maven-project-info-reports-plugin from 3.0.0 to 3.1.2
  • [ANY23-509] - Bump velocity from 1.5 to 1.7
  • [ANY23-510] - Bump maven-site-plugin from 3.7.1 to 3.9.1
  • [ANY23-511] - Bump snakeyaml from 1.26 to 1.29
  • [ANY23-512] - Bump maven-jxr-plugin from 3.0.0 to 3.1.1
  • [ANY23-513] - Bump formatter-maven-plugin from 2.14.0 to 2.16.0
  • [ANY23-514] - Bump maven-scm-provider-gitexe from 1.9 to 1.12.0
  • [ANY23-515] - Bump commons-lang3 from 3.10 to 3.12.0
  • [ANY23-516] - Bump appassembler-booter from 1.10 to 2.1.0
  • [ANY23-517] - Bump maven-javadoc-plugin from 3.2.0 to 3.3.1
  • [ANY23-518] - Bump jacoco-maven-plugin from 0.8.4 to 0.8.7
  • [ANY23-519] - Bump maven-enforcer-plugin from 3.0.0-M2 to 3.0.0
  • [ANY23-520] - Augment any23 extractor CLI to print all mimetypes for a given extractor
  • [ANY23-521] - Bump jsoup from 1.14.2 to 1.14.3
  • [ANY23-523] - Bump owlapi.version from 5.1.13 to 5.1.19 and RDF4J to 3.7.3
  • [ANY23-525] - Remove maven-war-plugin configuration from build lifecycle
  • [ANY23-526] - Bump jackson.version from 2.12.5 to 2.13.0
  • [ANY23-527] - Bump biweekly from 0.6.3 to 0.6.6
  • [ANY23-528] - Bump maven-resources-plugin from 3.1.0 to 3.2.0
  • [ANY23-529] - Bump maven-checkstyle-plugin from 3.1.1 to 3.1.2
  • [ANY23-530] - Bump xercesImpl from 2.12.0 to 2.12.1
  • [ANY23-531] - Bump commons-compress from 1.20 to 1.21
  • [ANY23-532] - Bump poi.version from 5.0.0 to 5.1.0
  • [ANY23-533] - Bump formatter-maven-plugin from 2.16.0 to 2.17.0
  • [ANY23-536] - Upgrade to tika 2.2.0
  • [ANY23-537] - Bump formatter-maven-plugin from 2.17.0 to 2.17.1
  • [ANY23-538] - Replace existing logging with Slf4j over log4j2
  • [ANY23-539] - Introduce ossindex-maven-plugin support
  • [ANY23-540] - Bump spotbugs-maven-plugin from 4.5.0.0 to 4.5.2.0
  • [ANY23-541] - Bump rdf4j.version from 3.7.3 to 3.7.4
  • [ANY23-542] - Bump jackson.version from 2.13.0 to 2.13.1
  • [ANY23-543] - Bump tika.version from 2.1.0 to 2.2.0
  • [ANY23-544] - Bump snakeyaml from 1.29 to 1.30
  • [ANY23-547] - Bump httpcore from 4.4.14 to 4.4.15
  • [ANY23-548] - Bump mockito-core from 3.3.3 to 4.2.0
  • [ANY23-549] - Bump log4j2.version from 2.17.0 to 2.17.1
  • [ANY23-550] - Bump maven-deploy-plugin from 3.0.0-M1 to 3.0.0-M2
  • [ANY23-551] - Bump tika.version from 2.2.0 to 2.2.1

         Apache Any23 2.5
          Release Notes
       27/08/2021 (dd/mm/yyyy)

Sub-task

[ANY23-145] - SKOS Vocabulary
[ANY23-147] - Improve DCMIMetadataTerms Vocabulary
[ANY23-189] - Music Ontology Vocab
[ANY23-191] - BBC Sport ontology Vocab
[ANY23-192] - BBC Curriculum Ontology Vocab
[ANY23-193] - BBC Corenews Ontology Vocab
[ANY23-194] - BBC Storyline Ontology Vocab

Bug

[ANY23-159] - Error with nodes and markup extracted from HListingExtractorTest.testKelkoo & testKelkooFull
[ANY23-370] - Jenkins: IllegalStateException: checksum mismatch
[ANY23-371] - Any23 cannot start in CMD in Windows 10
[ANY23-451] - raise AirflowException("SSH operator error: {0}".format(str(e)))
[ANY23-455] - Format entire Any23 codebase with formatter-maven-plugin
[ANY23-456] - Add Github Action against Any23 pull requests
[ANY23-457] - Fix error: White spaces are required between publicId and systemId
[ANY23-458] - Improve extractor and writer information in Rover
[ANY23-462] - Address forbidden API violations

New Feature

[ANY23-10] - Integrate Javascript engine to extract dynamic data
[ANY23-138] - Add OSGi metadata
[ANY23-239] - Any23 Chrome Extension
[ANY23-294] - Create extractor plugin for IFC files
[ANY23-459] - Create git - .asf.yaml for ANY23

Improvement

[ANY23-30] - Improve Any23 Web Service Logging
[ANY23-104] - Make Any23 OSGi ready
[ANY23-127] - Parent Task for Improving Any23 Vocab Package
[ANY23-461] - Upgrade Any23 to JDK11
[ANY23-464] - Improve Performance for Inner Classes
[ANY23-465] - Use StringBuilder Instead of String Concatenation at Loop

Wish

[ANY23-216] - Any23 Firefox Extension

Task

[ANY23-15] - TODO reduction session needed


         Apache Any23 2.4
          Release Notes
       20/09/2020 (dd/mm/yyyy)

Sub-task

[ANY23-146] - CEN Metalex Vocabulary
[ANY23-149] - Expand SCHEMAORG Vocab
[ANY23-150] - Implement all vocab.sindice.net Vocabularies
[ANY23-269] - Support auto.schema.org
[ANY23-270] - Support bib.schema.org

Bug

[ANY23-427] - http://semanticweb.org/ down causes tests to fail
[ANY23-428] - RDFa parse issue if vocab not defined with trailing slash
[ANY23-430] - Microdata and HTML's attribute case
[ANY23-441] - TikaEncodingDetector: guessEncoding may throw an ArrayIndexOutOfBoundsException
[ANY23-446] - Fix bugs in Jsoup
[ANY23-449] - Fix the online microdata test failure
[ANY23-453] - Upgrade jsonld-java to 0.13.1

New Feature

[ANY23-5] - Add support for archive input.
[ANY23-6] - Integrate MetaX support

Improvement

[ANY23-51] - Full support for rel-tag's
[ANY23-178] - Add fully annotated Javadoc to o.a.any23.source.*
[ANY23-183] - Address javac warning's in Any23 code base
[ANY23-202] - Add analytics on any23.org landing page
[ANY23-254] - Demo frontend should provide interactive CLI usage examples
[ANY23-281] - Build Policeman's Forbidden API Checker into Maven config
[ANY23-426] - Address Javadoc WARNING's
[ANY23-439] - Replace commons-lang with commons-lang3
[ANY23-440] - any23 configuration documentation has a wrong property name
[ANY23-442] - Move HTML preprocessing logic from BaseRDFExtractor to semargl Extractors
[ANY23-443] - Improve efficiency of RDFa Extractor
[ANY23-444] - Update all dependencies and plugins
[ANY23-450] - Update Maven deps and plugin versions

Wish

[ANY23-225] - Fix Javadoc WARNING's in Any23 codebase

Task

[ANY23-72] - Evaluate the introduction of Aether to improve the Any23 plugin management system
[ANY23-429] - Website Build Fails due to Javadoc issues
[ANY23-431] - Upgrade jsoup to v1.12.1
[ANY23-432] - Upgrade owlapi to v5.1.11
[ANY23-433] - Upgrade rdf4j to v3.0.0
[ANY23-434] - Upgrade tika to v1.22
[ANY23-435] - Upgrade httpclient to v4.5.10
[ANY23-436] - Upgrade commons-csv to v1.7
[ANY23-437] - Upgrade snakeyaml to v1.25
[ANY23-438] - Upgrade slf4j-api to v1.7.28
[ANY23-448] - Move service and plugins out of core

         Apache Any23 2.3
          Release Notes
       10/02/2019 (dd/mm/yyyy)

Sub-task

[ANY23-184] - Update Javadoc in o.a.a.extractor.microdata.*
[ANY23-356] - Update dependencies
[ANY23-357] - Resolve mockito deprecation warnings
[ANY23-358] - Resolve junit.framework deprecation warnings & RDFa11Parser deprecation warnings
[ANY23-359] - Resolve org.apache.commons.io.IOUtils deprecation warning
[ANY23-360] - Resolve Xerces deprecation warnings
[ANY23-361] - Resolve Tika deprecation warning
[ANY23-362] - Resolve rdf4j deprecation warnings
[ANY23-363] - Update httpclient/httpcore to version 4.5.6/4.4.10
[ANY23-364] - Resolve POI deprecation warnings
[ANY23-365] - Resolve additional warnings
[ANY23-366] - Resolve additional warnings in build
[ANY23-369] - Resolve overlapping classes
[ANY23-388] - It should be possible to configure the NTriplesWriter to use unicode points
[ANY23-404] - Make MicrodataExtractor compliant with default registry
[ANY23-405] - Parse microdata property values correctly
[ANY23-407] - Allow microdata itemids to be created from relative URLs
[ANY23-408] - Use document IRI as default namespace in microdata strict mode
[ANY23-409] - Allow multiple microdata itemtype values
[ANY23-410] - Fix microdata itemrefs

Bug

[ANY23-13] - Verify why the maven-changelog-plugin doesn't work properly
[ANY23-16] - Property URI generation for Microdata/schema.org
[ANY23-17] - problem detecting media type for turtle content with comment at the top
[ANY23-55] - any23 is not following the redirection
[ANY23-67] - Microdata extraction using obsolete RDF conversion scheme
[ANY23-154] - Not able to extract microdata in few test cases
[ANY23-167] - Microdata itemscope properties incorrectly attached
[ANY23-169] - Incorrect interpretation of relative and absolute paths with Microdata
[ANY23-188] - NPE when ICBMExtractor#getDescription()#getExtractorLabel() called
[ANY23-237] - Fix RDFa test 0087: stylesheet reserved word is stripped out
[ANY23-245] - Infinite loop on some malformed markup
[ANY23-322] - Any23 embedded service is broken
[ANY23-329] - master branch broken with pom.xml any23 version
[ANY23-331] - Tool service implementations declared in wrong module?
[ANY23-334] - SingleDocumentExtraction.createExtractionContext() uses UUID as defaultLanguage
[ANY23-336] - Parsing json-ld content takes prohibitively long time
[ANY23-337] - BenchmarkTripleHandler does not report accurate extraction interval times
[ANY23-338] - Json-ld comment parsing fails in rare cases
[ANY23-339] - Microdata extractor can sometime merge two different itemscopes into one
[ANY23-340] - Any23 extraction does not pass Nutch plugin test
[ANY23-344] - MicrodataExtractor not resolving urls correctly
[ANY23-345] - MicrodataExtractorTest has a duplicated test
[ANY23-346] - rdf4j versions 2.3.0, 2.3.1 contain a regression: we need to switch back to version 2.2.4
[ANY23-347] - RDFParseException: the prefix "pw" is not bound
[ANY23-348] - IllegalArgumentException in MicrodataExtractor
[ANY23-349] - MicrodataExtractor errors for links that are telephone numbers
[ANY23-350] - RDFParseException: "icon" must be followed by ' = ' character
[ANY23-351] - NullPointerException in HCardExtractor
[ANY23-353] - RDFParseException: datatype rdf:langString requires a language tag
[ANY23-367] - latest.stable.released property is never used and out of date
[ANY23-368] - Jenkins builds are failing after running out of disk space
[ANY23-372] - LGPL-licensed transitive dependency
[ANY23-373] - Web page /install.html: software version variable was not decoded.
[ANY23-376] - IllegalArgumentException: invalid property name ''
[ANY23-377] - Microdata extractor replaces empty strings with "Null"
[ANY23-378] - JsonParseException caused by trailing commas in JSON-LD
[ANY23-379] - RDFa SAXParseException: invalid XML character
[ANY23-380] - RDFa SAXParseException: attribute was already specified
[ANY23-381] - JsonParseException: Illegal unquoted character
[ANY23-382] - Distinguish between fatal and recoverable json-ld parsing errors
[ANY23-383] - JsonParseException: Unexpected character 0x2028
[ANY23-386] - Item's properties are in the wrong item since the 2.2
[ANY23-387] - Possible OutOfMemoryError with bad deeply nested HTML
[ANY23-389] - RDFa extraction breaks when base element uses relative href
[ANY23-391] - ICAL vocab uses class "vcalendar" instead of "Vcalendar"
[ANY23-392] - Launching maven-jetty-plugin: Problem accessing /apache-any23-service/resources/form.html
[ANY23-395] - any23.org 500 Internal Server Error
[ANY23-406] - Cannot suppress Tika warnings
[ANY23-411] - Use Content-Type to help determine encoding
[ANY23-415] - NTriplesExtractor tries all text/plain files, causing numerous fatal issues
[ANY23-416] - NTriplesExtractor does not recognize "application/n-triples" mimetype
[ANY23-420] - Handle Json+ld extraction failure
[ANY23-425] - iCal, jCal, xCal extractors aren't listed in META-INF/services

New Feature

[ANY23-81] - Interactive web service

Improvement

[ANY23-38] - Use a single logging tool: slf4j
[ANY23-190] - any23.org homepage busted on IE11
[ANY23-212] - Improve naming convention for service output files
[ANY23-215] - Forward slashes in URL's should not be escaped in RDF output
[ANY23-231] - Make JSON Reporting output pretty print
[ANY23-240] - Option to process html tags as spaces in Microdata
[ANY23-323] - Update Eclipse RDF4J version to 2.3
[ANY23-332] - Plugin-specific properties shouldn't be declared in default-configuration.properties
[ANY23-341] - Remove dependency on defunct commons-httpclient 3.1
[ANY23-343] - Upgrade to jsonld-java v 0.12.0
[ANY23-352] - Update to rdf4j version 2.3.2
[ANY23-354] - Clean up dependencies
[ANY23-355] - Deprecate RDFa11Parser since Rio implementations are used instead
[ANY23-374] - Invalid nested item takes out everything
[ANY23-385] - Improve charset detection for (x)html documents
[ANY23-390] - Implement ICal, JCal, XCal extractors
[ANY23-393] - Any23 master to build under JDK 10.X
[ANY23-394] - JSON-LD Extractions Flag Errors in Google's Structured Data Tooling
[ANY23-396] - Overhaul WriterFactory API
[ANY23-399] - Upgrade Apache parent POM to version 21
[ANY23-401] - Upgrade to Tika 1.19.1
[ANY23-402] - Deprecate JSONWriter, JSONWriterFactory
[ANY23-403] - Upgrade to RDF4J 2.4.0
[ANY23-414] - Support reverse itemprops in microdata
[ANY23-418] - Take another look at encoding detection
[ANY23-419] - Add J2EE dependencies such that service runs under JDK11
[ANY23-424] - Update dependencies

Test

[ANY23-422] - Error message when any23 cli tool used

Task

[ANY23-333] - Augment use of Any23PluginManager in How to Register a Plugin documentation
[ANY23-423] - Update POM for the move to gitbox.

         Apache Any23 2.2
          Release Notes
       25/01/2018 (dd/mm/yyyy)

Sub-task

[ANY23-155] - Test failure: testRunOnHTTPResource(org.apache.any23.cli.MicrodataParserTest)
[ANY23-267] - Entire extractions fail due to "The element type 'meta' must be terminated by the matching end-tag </meta>"
[ANY23-268] - Entire extraction task fails due to "Element type "t.length" must be followed by either attribute specifications, ">" or "/>"

Bug

[ANY23-12] - characters are wrongly encoded in rdfxml output
[ANY23-131] - Nested Microdata are not extracted
[ANY23-140] - Revise Any23 tests to remove fetching of web content
[ANY23-166] - Parsing crashes with attributes that don't use quotes
[ANY23-201] - Service Regularly Times Out on DBPedia Queries
[ANY23-227] - not extracting opengraph rdfa
[ANY23-228] - Invalid URI
[ANY23-230] - any23.org redirects to single slash URI
[ANY23-256] - MicrodataParserTest failing locally but not on Jenkins
[ANY23-260] - Get Any23 listed as an Application capable of using DBPedia
[ANY23-266] - Fix Issues with Failing WebService Examples
[ANY23-271] - Address "...The entity "raquo" was referenced, but not declared" SAXParseException
[ANY23-273] - The content of elements must consist of well-formed character data or markup - no bogus comments
[ANY23-303] - JsonLdError: loading remote context failed: http://schema.org/
[ANY23-306] - Absent binaries for version 2.0
[ANY23-312] - Triple sub-pred-null should not be added into outcome. Change traversing method.
[ANY23-314] - Service fails to return extraction in case of extraction error
[ANY23-316] - Yaml parser does not handle intentional null value
[ANY23-317] - Any23 fails when dealing with JavaScript
[ANY23-318] - ExtractionException handling in BaseRDFExtractor.java kills entire extraction
[ANY23-326] - parsing unclosed meta and input tags fails

New Feature

[ANY23-8] - Write a separate RDFa/microformat detection tool usable in crawlers
[ANY23-233] - Add local extraction cache to Any23 service

Improvement

[ANY23-106] - Gracefully shut down Any23 service
[ANY23-213] - Implement JSON reporting for the Any23 service
[ANY23-214] - ë (e-umlaut or diaeresis) not decoded in RDF output
[ANY23-249] - Update all W3C and other Standards Compliance within Any23
[ANY23-280] - Refactor ContentExtractor to improve extraction flexibility
[ANY23-291] - JSON-LD should be looked up in entire HTML document, not just in <head>
[ANY23-298] - Revisit the OGP.java vocabulary and update it
[ANY23-309] - "Scraper" misspelled as "Scarper" on Downloads webpage
[ANY23-319] - Upgrade jsonld-java dependency to 0.11.1
[ANY23-324] - Replace net.sourceforge.nekohtml with jsoup
[ANY23-325] - Any23 incompatible with http://rdfa.info/test-suite/#

Test

[ANY23-320] - Address @Ignore tests in Any23

Wish

[ANY23-210] - Address 1.0 Release Review Discrepancies

Task

[ANY23-40] - Complete Documentation for Plugin Management system


         Apache Any23 2.1
          Release Notes
       14/09/2017 (dd/mm/yyyy)

Bug

[ANY23-244] - Broken Links on Web-Site
[ANY23-282] - Replacement for all Sindice namespaces and URI's
[ANY23-304] - Add extractor for OpenIE
[ANY23-305] - Missing appender in command line tool
[ANY23-308] - Adding option "-d" to yaml file parsing gives error
[ANY23-310] - Rover displays wrong statistical values

Improvement

[ANY23-206] - Overhaul Any23 site documentation
[ANY23-301] - Forward all logs into STDERR stream

New Feature

[ANY23-257] - Support OWL as an input format

Task

[ANY23-283] - access to analysis.apache.org

         Apache Any23 2.0
          Release Notes
       03/02/2017 (dd/mm/yyyy)

Sub-task

[ANY23-243] - Overhaul and update README.txt

Bug

[ANY23-79] - No execute permissions in command line tool
[ANY23-92] - NQuadsParser does not require whitespace between elements
[ANY23-99] - NQuadsWriter should force ASCII in OutputStream constructor
[ANY23-153] - Automatically Generate EARL reports for Any23 RDF Parsers
[ANY23-176] - DOC: Apache Any23 Installation Guide
[ANY23-200] - Build revision is not correctly defined
[ANY23-219] - rover does not work with -f nquads option
[ANY23-235] - NQuads links broken on Supported Formats Page
[ANY23-236] - Port Any23 site to Apache CMS
[ANY23-248] - NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14
[ANY23-252] - JSON-LD format MIME type is not detected
[ANY23-253] - JSON-LD cannot be processed by Rover
[ANY23-255] - apache-any23-quads dependency should not be <scope> test in core pom.xml
[ANY23-265] - ThreadSafety issue in ItemPropValue
[ANY23-272] - Service fails to start with any23server.bat
[ANY23-277] - Any23 master branch will not build due to lacking maven-assembly-plugin
[ANY23-279] - Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() implementation
[ANY23-296] - Tar complains about groupid value being too big
[ANY23-302] - rover JSON output is not valid

Improvement

[ANY23-80] - Split out command line tools into a separate module
[ANY23-163] - VocabPrinter tool broken with No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq)
[ANY23-185] - Add missing <meta> element attributes to HTMLMetaExtractor
[ANY23-207] - Implement Microformats2
[ANY23-246] - Add Open Graph Protocol and Facebook prefixes to popular.prefixes
[ANY23-247] - FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
[ANY23-250] - Upgrade to Tika 1.7
[ANY23-261] - Tiny typo in Data Extraction documentation source example
[ANY23-263] - Upgrade to Tika 1.14
[ANY23-274] - Change any23.microdata.ns.default configuration value to http://schema.org
[ANY23-276] - Upgrade sesame dependencies to RDF4J
[ANY23-278] - Upgrade all Maven plugin versions in parent pom.xml
[ANY23-293] - Package log4j configuration with core appassembler
[ANY23-297] - Any23 doesn't build under JDK1.8
[ANY23-299] - Missing YAML to RDF parser
[ANY23-300] - Ignore NetBeans configuration files

Task

[ANY23-141] - Upgrade OpenRDF Sesame to 2.7.0
[ANY23-242] - Address issues with 1.1 #1 RC

Wish

[ANY23-19] - Abstract away any specific RDF APIs
[ANY23-226] - Extract JSON-LD embedded in HTML

                     Apache Any23 1.1
                      Release Notes
                  15/10/2014 (dd/mm/yyyy)

Bug

[ANY23-205] - Remove xrefs from Any23 site and replace with Git(hub) links
[ANY23-220] - Run crawler plugin on Apache Any23 site
[ANY23-234] - No writer factory available for RDF format N-Quads (mimeTypes=text/x-nquads; ext=nq)

Improvement

[ANY23-157] - Update Any23 site to accommodate move to Git.
[ANY23-197] - Extract embedded json-ld from html documents
[ANY23-204] - fix url encoding problem : PR#3
[ANY23-209] - Bug in site generation
[ANY23-221] - Enable JSON-LD as an input format for the WebService at any23.org
[ANY23-238] - Fix generation of BNode name for microdata when 'itemid' is given without a value.

New Feature

[ANY23-7] - Performance test suite
[ANY23-160] - [SECURITY] Frame injection vulnerability in published Javadoc

Task

[ANY23-222] - Push 1.1-SNAPSHOT artifacts to the Any23 website
                       

                       Apache Any23 1.0
                         Release Notes
                     09/05/2014 (dd/mm/yyyy)

Sub-task

[ANY23-148] - Programmes Ontology

Bug

[ANY23-100] - Issue with RDFa extractor while processing nested properties
[ANY23-135] - Any23 RDFa Extractor ignores multiple prefix and property statements
[ANY23-136] - Some RDFa tests have incorrect expected results
[ANY23-168] - RDFa properties in <meta> elements not picked up
[ANY23-170] - Dependency error org.apache.commons:commons-csv:1.0-SNAPSHOT-rev1148315
[ANY23-172] - Fix minor issues with Any23 0.9.0 RC
[ANY23-173] - Please delete old releases from mirroring system
[ANY23-174] - Incorrect RDFa extractions
[ANY23-203] - Update version revisions from 0.9.1 to 1.0

Improvement

[ANY23-65] - Update to RDFa extraction stylesheet
[ANY23-128] - html-rdfa11 extractor fails on mailto: anchors
[ANY23-130] - Improve aesthetics of the output format when straying from default java.io.PrintStream
[ANY23-137] - RDFa parser implementation proposal
[ANY23-179] - Improve Javadoc and throwing of IllegalArgumentException in Any23#createDocumentSource
[ANY23-180] - Create an Apache hosted jail running an Any23 service instance
[ANY23-181] - Upgrade NekoHTML to 1.9.20

New Feature

[ANY23-134] - Create o.a.a.extractor.tika Parser and Extractor implementations
[ANY23-177] - Add support for JSON-LD

Task

[ANY23-162] - Add package.java for all LKIFCore classes

                       Apache Any23 0.9.0
                         Release Notes
                     28/10/2013 (dd/mm/yyyy)

Sub-task

[ANY23-142] - LKIF-Core Vocabulary
[ANY23-143] - LRICore Vocabulary

Bug

[ANY23-111] - Any23 raises an unmanaged exception from the Microdata parser
[ANY23-115] - Empty spans seem to break ANY23
[ANY23-161] - Fix service file generation
[ANY23-165] - "Invalid content" error if TITLE precedes encoding declaration in the document
[ANY23-171] - form.html not in correct location in service.

Improvement

[ANY23-47] - Migrate basic-crawler classes to org.apache.nutch
[ANY23-164] - office-scraper ExcelExtractorFactory.java to accept application/x-tika-ooxml and application/x-tika-msoffice formats

New Feature

[ANY23-120] - Split CLI tools out into a new module

Task

[ANY23-122] - Cleanup Distribution Mirrors

                       Apache Any23 0.8.0
                         Release Notes
                     01/05/2013 (dd/mm/yyyy)

Sub-task

[ANY23-109] - Missing tika-config.xml in o.a.a.mime
[ANY23-110] - DOAP Vocabulary

Bug

[ANY23-44] - error when parsing a document from http://www.afdsi.org/docs/test/html/RDFa/_food-stream_.htm
[ANY23-78] - Download page links are broken
[ANY23-108] - Broken schema.org microdata extraction
[ANY23-112] - Fix incubation disclaimer
[ANY23-113] - Remove dependencies from parent pom.xml file
[ANY23-116] - Empty values are skipped when reading tab separated CSV.
[ANY23-156] - Add logging dependencies to plugins and service

Improvement

[ANY23-2] - Add support for hreview-aggregate microformat.
[ANY23-26] - Upgrade dependency to Apache Tika 1.2
[ANY23-46] - Update Any23 web service
[ANY23-83] - Remove hardcoded formats throughout Any23 to make it useful as a library
[ANY23-101] - Use RDFFormat.NQUADS in nquads module
[ANY23-139] - Simplify site deploy by plugging in the maven-scm-publish-plugin
[ANY23-144] - Implement comprehensive naming of o.a.a.api.vocab classes

New Feature

[ANY23-4] - Integrate W3C's RDFa test suite and pass all tests
[ANY23-85] - Split NQuads out into its own module
[ANY23-96] - Add user agent string to basic-crawler
[ANY23-117] - Split Mime type detection out into its own module
[ANY23-118] - Split Encoding detection out into its own module

Task

[ANY23-41] - Write basic-crawler plugin documentation
[ANY23-125] - Drop the Incubating DISCLAIMER
                     

                         Apache Any23 0.7.0-incubating
                          Release Notes
                          25/06/2012

Sub-task

[ANY23-25] - Update all Maven POM's in trunk
[ANY23-31] - Move any23 site documentation out of trunk and into its own SVN directory
[ANY23-53] - Bad Web Service documentation

Bug

[ANY23-14] - Add support for Extractor sub results
[ANY23-20] - The Any23 PluginManager fails handling resource paths containing spaces.
[ANY23-34] - Plugin Integration Test Fails
[ANY23-37] - LGPL'ed components cannot be included in distribution packages
[ANY23-42] - Fix issue in RDFa11Parser.java: relative URIs are not resolved correctly
[ANY23-49] - N3/NQ parsers ignoring stopAtFirstError flag
[ANY23-58] - HCardExtractor infinite loop and memory exhaustion
[ANY23-62] - ExtractionResultImpl loses all issues generated by sub extractions
[ANY23-73] - The ToolRunner CLI driver -p (--plugins-dir) option doesn't work because it is parsed after the Tool list is loaded
[ANY23-77] - Facing an infinite loop problem in version 0.6.1 - Verify
[ANY23-78] - Download page links are broken
[ANY23-87] - Bogus argument in o.a.a.cli.CrawlerTest
[ANY23-88] - any23 script -v or --version option doesn't display actual version
[ANY23-94] - The Microdata CLI tool doesn't work anymore
[ANY23-95] - Activate the IgnoreAccidentalRDFa filter for the Any23 Service instance
[ANY23-97] - The test suite was not running all tests, minor regressions occurred

Improvement

[ANY23-18] - Add a new extractor for RDFa using java-rdfa
[ANY23-28] - Document munging of Any23 history to CHANGES.txt
[ANY23-32] - Replace hardcoded bash script with one generated via appassembler
[ANY23-33] - Replace proprietary SUN imports from Any23 classes.
[ANY23-45] - Improve issue verification support in Extractor tests
[ANY23-50] - Simplify plugin loading avoiding the classpath scanning
[ANY23-56] - Change repo-ext to Any23 SVN mirror repo.
[ANY23-63] - The Any23 web service doesn't return the Issue Report generated by activated Extractors, hiding major metadata issues
[ANY23-64] - Improve CLI usage aesthetics
[ANY23-70] - Establish searchable list archives
[ANY23-71] - improve the current CLI engine
[ANY23-74] - Disable domain triple generation in default configuration
[ANY23-75] - Improve runtime of the Microdata extractor on documents with many relations.
[ANY23-76] - Improve runtime of the Microformat extractor on documents with many relations.
[ANY23-82] - Don't use explicit reference to Log4j classes
[ANY23-86] - Better logging in SiteCrawlerTest

New Feature

[ANY23-9] - Prepare a dedicated homepage for Any23
[ANY23-29] - Migrate code base to ASF infrastructure
[ANY23-57] - Create Any23 History documentation and add to site
[ANY23-59] - Create KEYS file for Any23
[ANY23-68] - Create Powered By documentation/page
[ANY23-102] - Any23 DOAP file

Task

[ANY23-21] - Migrate all packages and classes to ORG.APACHE.ANY23
[ANY23-27] - Import revisions r1547 to r1607 from Google Code SVN to ASF SVN
[ANY23-36] - Merge GCode specific CHANGES.txt report in main changes.xml
[ANY23-39] - Write Down Overall Architecture Document to help new developers maintaining the Any23 core
[ANY23-48] - Update Documentation (Site + READMEs) to reflect changes in shell script usage
[ANY23-52] - Remove non ASF logos from Any23 Service page
[ANY23-66] - Fix Javadoc

==========================================================================

                         Apache Any23 0.6.1
                          Release Notes

Fixes

  • Improved MIMEType detection for CSV input. [172, 176]

==========================================================================

                         Apache Any23 0.6.0
                          Release Notes

Fixes

  • Fixed several bugs. [151, 153, 154, 155, 156, 164, 168]
  • Removed unused Apache Any23 dependencies. [162]
  • Introduced parent POM dependencyManagement. [163]
  • Minor code refactoring. [142]
  • Updated project documentation. [161]

Enhancements

  • Added support for Microdata. [114, 141, 144, 145, 152, 157]
  • Added RDFa 1.1 support for new prefix specification. [143]
  • Added CSV Extractor (RDFizer). [150, 165]
  • Added HTML/META Extractor. [148, 149]
  • Improved Configuration programmatic management. [147]
  • Added several flags to control metadata triples generation. [146]
  • Improved nesting relationship explicitation in Microformat extractors. [80]
  • Major Extractor interface refactoring. [160, 167]
  • Improved TagSoup Extractor based error reporting. [159]
  • Added command-line tool to print out the Apache Any23 declared vocabularies. [114]

==========================================================================

                          Apache Any23 0.6.0-M2
                            Release Notes

The release 0.6.0-M2 introduces major fixes to the M1 milestone [154, 155, 156] and improves Configuration [147] and Microdata error management [157].

==========================================================================

                         Apache Any23 0.6.0-M1
                           Release Notes

The release 0.6.0-M1 is an early preview of the Microdata support. [114]

==========================================================================

                         Apache Any23 0.5.0
                          Release Notes

Fixes

  • Fixed wrong conversion of a generic XML file to RDF. [131]
  • Fixed usage of 'base' tag when resolving relative URIs in RDFa. [75]
  • Fixed error parsing Turtle data. [87]
  • Fixed issue with escaping in NQuads parser. [126]
  • Fixed XML DTD validation attempt. [95]
  • Fixed concurrent modification exception in ExtractionContentBlocker filter. [86]
  • Fixed mime type detection of direct input when source contains blank chars. [83, 90]
  • Fixed reporting when producing no triples. [79]
  • Fixed any23-service packaging, added profile for excluding embedded dependencies. [113]

Enhancements

  • Improved extraction report: added list of activated extractors. [89]
  • Improved extraction of HTML link element. [133]
  • Added XPath HTML extractor. [124]
  • Added HRecipe Microformat extractor. [103]
  • Added plugin support for Apache Any23. [111]
  • Implemented HTML Scraper Plugin. [123]
  • Upgraded to Sesame 2.4.0. [136]
  • Upgraded to Jetty 8.0.0. [138]
  • Upgraded maven-site-plugin. [85]
  • Added flags to exclude metadata triples. [134]
  • Added removal of CSS related triples. [135]
  • Improved overall documentation. [130]
  • Overall POM refactoring. [125]

==========================================================================

                         Apache Any23 0.4.0 
                          Release Notes

  • The any23-service module has been separated from the any23-core module, and the Ant build system has been dropped. [Issue 44]
  • Added support for HTML metadata (RDFa / Microformats) validation and correction (validator). [Issue 77]
  • Added flag to disable the nesting relationship property enrichment. [Issue 67]
  • Improved coverage of Microformats tests. [Issue 65]
  • Improved documentation. [Issue 44]
  • Various code consolidation. [Issues 68, 69, 70, 71, 72, 73, 74, 77]

==========================================================================

                         Apache Any23 0.3.0 
                          Release Notes

  • Added detection and enrichment of nested microformats. [Issue #61]
  • Added detection and support of N-Quads as input and output format. [Issue #7]
  • General improvements in RDFa extraction. [Issue #12, Issue #14]
  • Added support of Turtle embedded in HTML script tag. [Issue #62]
  • Improvement in encoding support. [Issue #43]
  • Improvement in Core API. [Issue #27]
  • Improved support for Species Microformat. [Issue #63]
  • General code prettification.

==========================================================================

                         Apache Any23 0.2.2 
                          Release Notes

  • Fixed Maven dependency management. A second-level dependency on Xerces introduced a conflict in the javax.xml.transform API, causing wrong XSLT transformations within the RDFa extractor.

==========================================================================

                         Apache Any23 0.2.1 
                          Release Notes

  • Major fix to Tika configuration management. This fix resolves the auto-detection of the main Semantic Web related formats.

==========================================================================

                        Apache Any23 0.2
                         Release Notes

Introduction
============

This release features a redesigned API and incorporates enhancements and bug fixes that have accumulated since the 0.1 release. Apart from some new or changed dependencies on the underlying libraries, this version comes with improved unit test coverage and other features such as automatic charset encoding detection and improved documentation. A Maven build system has been introduced.

Summary of major changes since 0.1
==================================

  • Redesigned Java API
    • Input from string, stream, file, or URI
    • Allow choosing which extractors to use
    • Report origin of triples (document/extractor) to client processors
    • Various processors/serializers for extracted triples
  • Added flexible command-line tool for easy testing
  • Vastly improved website and documentation
  • Media type and encoding detection via Apache Tika
  • Switched RDF library from Jena to Sesame
  • Added Maven build
  • Better RDF extraction from Microformats
  • Extractors now come with an example file to document typical in- and output
  • Major refactoring
  • Lots and lots of bugfixes

Supported formats
=================

  • RDF/XML
  • Notation3 and Turtle
  • N-Triples
  • RDFa

Various microformats are also supported; see http://sindice.com/developers/microformat for details on Sindice Microformats support.

Dependency Upgrade
==================

The CyberNeko HTML parser has been upgraded to 1.9.14.

Apache Tika 0.3 has been replaced with 0.6, which adds support for automatic encoding detection.

EOF