24Media/escenic-syndication-converter

escenic-syndication-converter

About

Conversion of Exported Escenic Syndication Files to the Format Required by the Import Mechanism

Technologies / Tools

  • Spring Framework
  • Spring MVC
  • Hibernate ORM
  • JAXB
  • dom4j
  • jsoup

To Get You Started

  1. Restore the mysqldump you'll find in src/main/resources/database/mysql

  2. Create the file src/main/resources/database/database.properties with the following properties:

dataSource.driverClassName=com.mysql.jdbc.Driver
dataSource.url=jdbc:mysql://localhost:3306/escenic-syndication-converter?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8
dataSource.username=your-username
dataSource.password=your-password

  3. Create the file src/main/resources/miscellaneous.properties with the following property:

filepath.syndicationFiles=/home/blixabargeld/Desktop/folder-for-read-and-write

Import Methodology

  1. First, ensure that 'picture' and 'multipleTypeVideo' Content Items have been imported correctly. There are (fortunately few) cases where the exported 'picture' binaries are damaged, or where a 'multipleTypeVideo' cannot be imported for no obvious reason. All such cases must be given the characterization MISSING_BINARIES manually.

  2. Next, 'news' Contents whose home sections do not interest us are excluded. Home sections matching '%kairos%' are a good example. The characterization is EXCLUDED_BY_SECTION and can be assigned with the /administrator/analyze Controller. Note that, unless stated otherwise, this is the Controller that does everything described in the following paragraphs.

  3. All Content Items with state 'draft' or 'deleted' are excluded as well. The characterization is, surprisingly, named DRAFT_OR_DELETED.

  4. Next, ensure that all of a Content's Relations exist. If a related Content Item does not exist, or has a characterization such as MISSING_BINARIES, DRAFT_OR_DELETED etc., the Content is characterized as MISSING_RELATIONS. The same characterization is also given to the missing Relation Items themselves, so that, if we like, we can remove these Relations from the Content Item when we do the marshalling.

  5. To make our lives a little more difficult, if anchor tags follow an inline 'news' relation (such as 'picture' or 'multipleTypeVideo'), the Escenic Export Mechanism exports the article with that inline relation replacing all the following anchors (until it finds the next inline relation). So we have to define the RelationInline Item and try to correct these duplicate occurrences. The /administrator/analyze Controller parses all 'news' body fields to persist the Inline Relations, checks whether these relations exist (if not, the Content is characterized as MISSING_INLINE_RELATIONS and considered rubbish), and finally finds the duplicate Inline Relations, giving the characterization RELATIONS_NEEDS_REPLACEMENT to both the RelationInline Items and the Contents that have them.

  6. If there were a second source from which the missing inline anchors could be read, the duplicate inline relations could be corrected. Fortunately this source exists, and it is an RSS Feed (deliberately, no details will be given here). The implementation of the correction may be complicated, but the idea is very simple: for every damaged article, read all anchor tags from the RSS Feed. How many of these do not exist in the article read from Escenic? Let's say 5. How many duplicate relations exist for the same article? Let's also say 5. Figure out the rest. The Content and its Inline Relations are characterized as RELATIONS_CAN_BE_REPLACED and we can marshall the article corrected. If the missing anchors count does not match the duplicate relations count, the Content is characterized as RELATIONS_CANNOT_BE_REPLACED and is considered rubbish.
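
The counting idea in steps 5 and 6 can be sketched in plain Java. This is only an illustration, not the code of the /administrator/analyze Controller: the class name, the method names and the input shapes (lists of relation ids and anchor hrefs already extracted from the body field and the RSS Feed) are all hypothetical. It also assumes that a legitimate inline relation appears once per article, so every further occurrence of the same id counts as a duplicate.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the project itself.
final class InlineRelationRepairSketch {

    enum Outcome { RELATIONS_CAN_BE_REPLACED, RELATIONS_CANNOT_BE_REPLACED }

    // Counts every occurrence of an inline relation id beyond its first one.
    // Assumption: a legitimate inline relation appears only once per article.
    static int countDuplicates(List<String> inlineRelationIds) {
        Set<String> seen = new HashSet<>();
        int duplicates = 0;
        for (String id : inlineRelationIds) {
            if (!seen.add(id)) {
                duplicates++;
            }
        }
        return duplicates;
    }

    // Returns the anchors present in the RSS Feed but absent from the exported article.
    static List<String> missingAnchors(List<String> feedAnchors, List<String> exportedAnchors) {
        Set<String> exported = new HashSet<>(exportedAnchors);
        List<String> missing = new ArrayList<>();
        for (String href : feedAnchors) {
            if (!exported.contains(href)) {
                missing.add(href);
            }
        }
        return missing;
    }

    // The article can be repaired only when both counts match.
    static Outcome characterize(List<String> inlineRelationIds,
                                List<String> feedAnchors,
                                List<String> exportedAnchors) {
        int duplicates = countDuplicates(inlineRelationIds);
        int missing = missingAnchors(feedAnchors, exportedAnchors).size();
        return duplicates == missing
                ? Outcome.RELATIONS_CAN_BE_REPLACED
                : Outcome.RELATIONS_CANNOT_BE_REPLACED;
    }
}
```

For example, an article whose inline relation ids read pic-1, pic-1, pic-1, vid-2, vid-2 has three duplicates. If the feed lists four anchors and only one of them survives in the export, three anchors are missing, the counts match and the outcome is RELATIONS_CAN_BE_REPLACED.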

Having all characterizations in place, we can proceed to marshalling, following the links and advice given in the .jsp pages.
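
As a rough summary of how the characterizations above could drive the marshalling decision, here is a small Java sketch. The enum constants are the characterization names used in this README; everything else (the class name, the canMarshall method, the dropMissingRelations flag, and treating a Content without any characterization as clean) is an assumption made for illustration, not the project's actual model.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the marshalling decision; not the project's own code.
final class MarshallingEligibilitySketch {

    // Characterization names as they appear in the Import Methodology.
    enum Characterization {
        MISSING_BINARIES,
        EXCLUDED_BY_SECTION,
        DRAFT_OR_DELETED,
        MISSING_RELATIONS,
        MISSING_INLINE_RELATIONS,
        RELATIONS_NEEDS_REPLACEMENT,
        RELATIONS_CAN_BE_REPLACED,
        RELATIONS_CANNOT_BE_REPLACED
    }

    // Assumption: these are never marshalled (excluded, or "considered rubbish").
    private static final Set<Characterization> NEVER_MARSHALLED = EnumSet.of(
            Characterization.MISSING_BINARIES,
            Characterization.EXCLUDED_BY_SECTION,
            Characterization.DRAFT_OR_DELETED,
            Characterization.MISSING_INLINE_RELATIONS,
            Characterization.RELATIONS_CANNOT_BE_REPLACED);

    // A null characterization is read here as "nothing wrong was found" (assumption).
    static boolean canMarshall(Characterization characterization, boolean dropMissingRelations) {
        if (characterization == null) {
            return true;
        }
        if (NEVER_MARSHALLED.contains(characterization)) {
            return false;
        }
        if (characterization == Characterization.MISSING_RELATIONS) {
            // Only marshallable if we choose to strip the missing Relations first.
            return dropMissingRelations;
        }
        // RELATIONS_NEEDS_REPLACEMENT is still unresolved, so it is not marshalled yet.
        return characterization == Characterization.RELATIONS_CAN_BE_REPLACED;
    }
}
```

Keeping the decision in one predicate makes it easy to change the policy for MISSING_RELATIONS, which the methodology leaves as an option rather than a rule.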

Future Improvements

  1. Due to tight deadlines, the Field's ((ANYTHING|<relation>...</relation>|text)*|<field>...</field>*|<value>...</value>*)<options>...</options>? Element was substituted by text inside CDATA tokens. Further parsing of this Element may be a good idea.

  2. Find a way to correctly import Author and Creator Entities to Escenic.

  3. Model remaining Escenic Entities (Person, Inbox et al.)

  4. The application's performance is lame. Given the absence of a real user interface and the need for an object-oriented implementation, Hibernate was probably a bad choice. This does not mean that you cannot make it perform better.
