Add Processing to replace OCRProcessing #13

Closed
jukervin opened this issue Feb 20, 2014 · 3 comments

Comments

@jukervin
Member

jukervin commented Feb 20, 2014

The current elements for recording processing are tied to OCR and are also a bit redundant. I think it would make sense to change OCRProcessing to Processing, and to replace preProcessingStep, ocrProcessingStep, and postProcessingStep with a generic processingStep that has a processingStepType element to record the type of processing performed.

Currently:

<OCRProcessing ID="OCRPROCESSING_1">
  <preProcessingStep>
    <processingDateTime>2009-10-19</processingDateTime>
    <processingAgency>CCS Content Conversion Specialists GmbH, 
    </processingAgency>
    <processingStepDescription>align</processingStepDescription>
    <processingStepSettings>CCS OCR Processing Filter</processingStepSettings>
    <processingSoftware>
      <softwareCreator>CCS Content Conversion Specialists GmbH,Germany</softwareCreator>
      <softwareName>CCS docWORKS</softwareName>
      <softwareVersion>6.3-0.91</softwareVersion>
      <applicationDescription/>
    </processingSoftware>
  </preProcessingStep>
  <ocrProcessingStep>
    <processingSoftware>
      <softwareCreator>ABBYY (BIT Software), Russia</softwareCreator>
      <softwareName>FineReader</softwareName>
      <softwareVersion>8.1</softwareVersion>
    </processingSoftware>
  </ocrProcessingStep>
</OCRProcessing>

Suggestion

<Processing>
  <ProcessingStep ID="01">
    <processingDateTime>2009-10-19T10:10:10+05:00</processingDateTime>
    <processingStepType>image processing</processingStepType>
    <processingAgency>ACME Processing</processingAgency>
    <processingStepDescription>align</processingStepDescription>
    <processingStepSettings>ACME OCR Processing Filter</processingStepSettings>
    <processingSoftware>
      <softwareCreator>CCS Content Conversion Specialists GmbH, Germany</softwareCreator>
      <softwareName>CCS docWORKS</softwareName>
      <softwareVersion>6.3-0.91</softwareVersion>
      <softwareDescription/>
    </processingSoftware>
  </ProcessingStep>
  <ProcessingStep ID="02">
    <processingDateTime>2009-10-19T10:21:14+05:00</processingDateTime>
    <processingStepType>OCR</processingStepType>
    <processingAgency>CCS Content Conversion Specialists GmbH, www.content-conversion.com</processingAgency>
    <processingStepDescription></processingStepDescription>
    <processingStepSettings></processingStepSettings>
    <processingSoftware>
      <softwareCreator>ABBYY (BIT Software), Russia</softwareCreator>
      <softwareName>FineReader</softwareName>
      <softwareVersion>8.1</softwareVersion> 
      <softwareDescription/>
    </processingSoftware>
  </ProcessingStep>
  <ProcessingStep ID="03">
    <processingDateTime>2009-10-19T15:28:30+05:00</processingDateTime>
    <processingStepType>Proofreading</processingStepType>
    <processingAgency>ACME Corp.</processingAgency>
    <processingStepDescription></processingStepDescription>
    <processingStepSettings></processingStepSettings>
    <processingSoftware>
      <softwareCreator>ACME</softwareCreator>
      <softwareName>Proofreader</softwareName>
      <softwareVersion>9.9</softwareVersion>
      <softwareDescription/>
    </processingSoftware>
  </ProcessingStep>
</Processing>
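
To make the proposed mapping concrete, here is a minimal Python sketch of how a legacy OCRProcessing block could be rewritten into the suggested Processing structure. Several things in it are assumptions rather than part of the proposal: the `STEP_TYPES` values, the `PROCESSINGSTEP_n` ID pattern, the function name `migrate`, and the absence of an XML namespace (real ALTO files declare one). The new processingStepType element is placed first in each step, which follows the order of the proposed xsd:sequence.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical mapping from legacy step element names to processingStepType values.
STEP_TYPES = {
    "preProcessingStep": "pre-processing",
    "ocrProcessingStep": "OCR",
    "postProcessingStep": "post-processing",
}


def migrate(ocr_processing):
    """Build a <Processing> element from a legacy <OCRProcessing> element."""
    processing = ET.Element("Processing")
    count = 0
    for old_step in ocr_processing:
        step_type = STEP_TYPES.get(old_step.tag)
        if step_type is None:
            continue  # skip anything that is not a legacy step element
        count += 1
        # Assumed ID pattern; the issue does not prescribe one.
        new_step = ET.SubElement(
            processing, "ProcessingStep", {"ID": f"PROCESSINGSTEP_{count}"}
        )
        type_el = ET.SubElement(new_step, "processingStepType")
        type_el.text = step_type
        # Copy the existing children unchanged, in their original order.
        for child in old_step:
            new_step.append(copy.deepcopy(child))
    return processing


legacy = ET.fromstring(
    """
<OCRProcessing ID="OCRPROCESSING_1">
  <preProcessingStep>
    <processingDateTime>2009-10-19</processingDateTime>
    <processingStepDescription>align</processingStepDescription>
  </preProcessingStep>
  <ocrProcessingStep>
    <processingSoftware>
      <softwareName>FineReader</softwareName>
      <softwareVersion>8.1</softwareVersion>
    </processingSoftware>
  </ocrProcessingStep>
</OCRProcessing>
"""
)

migrated = migrate(legacy)
print(ET.tostring(migrated, encoding="unicode"))
```

The sketch copies child elements as they are, so it does not rename applicationDescription to softwareDescription; whether that rename is intended would need to be decided separately.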

Schema changes:

<xsd:element name="OCRProcessing" minOccurs="0" maxOccurs="unbounded">
+  <xsd:annotation>
+    <xsd:documentation>DEPRECATED: Processing element should be used instead.</xsd:documentation>
+  </xsd:annotation>
  <xsd:complexType>
    <xsd:complexContent>
      <xsd:extension base="ocrProcessingType">
        <xsd:attribute name="ID" type="xsd:ID" use="required"/>
      </xsd:extension>
    </xsd:complexContent>
  </xsd:complexType>
</xsd:element>


+<xsd:element name="Processing" minOccurs="0" maxOccurs="unbounded">
+  <xsd:complexType>
+    <xsd:complexContent>
+      <xsd:extension base="ProcessingStepType">
+        <xsd:attribute name="ID" type="xsd:ID" use="required"/>
+      </xsd:extension>
+    </xsd:complexContent>
+  </xsd:complexType>
+</xsd:element>


<xsd:complexType name="ProcessingStepType">
<xsd:annotation> 
  <xsd:documentation>A processing step.</xsd:documentation>
</xsd:annotation>
 <xsd:sequence>

+  <xsd:element name="processingStepType" type="xsd:string" minOccurs="0"> 
+   <xsd:annotation>
+    <xsd:documentation>Type of processing step</xsd:documentation>
+   </xsd:annotation>
+  </xsd:element>

  <xsd:element name="processingDateTime" type="dateTimeType" minOccurs="0"> 
   <xsd:annotation>
    <xsd:documentation>Date or DateTime the image was processed.</xsd:documentation>
   </xsd:annotation>
  </xsd:element>
  <xsd:element name="processingAgency" type="xsd:string" minOccurs="0">
   <xsd:annotation>
    <xsd:documentation>Identifies the organizationlevel producer(s) of the
      processed image.</xsd:documentation>
   </xsd:annotation>
  </xsd:element>
  <xsd:element name="processingStepDescription" type="xsd:string" minOccurs="0" maxOccurs="unbounded">
   <xsd:annotation>
    <xsd:documentation>An ordinal listing of the image processing steps performed.
        For example, "image despeckling."</xsd:documentation>
   </xsd:annotation>
  </xsd:element>
  <xsd:element name="processingStepSettings" type="xsd:string" minOccurs="0">
   <xsd:annotation>
    <xsd:documentation>A description of any setting of the processing application.
        For example, for a multi-engine OCR application this might include the
        engines which were used. Ideally, this description should be adequate so
        that someone else using the same application can produce identical
        results.</xsd:documentation>
   </xsd:annotation>
  </xsd:element>
  <xsd:element name="processingSoftware" type="processingSoftwareType" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType> 
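
Because the type above uses xsd:sequence, child elements have to appear in the declared order. The following Python sketch checks that for a single step; it is a simplified stand-in, not a replacement for real XSD validation. The names `SEQUENCE`, `REPEATABLE`, and `follows_sequence` are made up for illustration, and the snippets are assumed to carry no XML namespace. Note that the example instance in this issue places processingDateTime before processingStepType, which this check would flag.

```python
import xml.etree.ElementTree as ET

# Child order as declared in the proposed xsd:sequence.
SEQUENCE = [
    "processingStepType",
    "processingDateTime",
    "processingAgency",
    "processingStepDescription",
    "processingStepSettings",
    "processingSoftware",
]
# Only this element is declared with maxOccurs="unbounded".
REPEATABLE = {"processingStepDescription"}


def follows_sequence(step):
    """Return True if the step's children respect the declared order and counts."""
    tags = [child.tag for child in step]
    if any(tag not in SEQUENCE for tag in tags):
        return False  # unknown child element
    positions = [SEQUENCE.index(tag) for tag in tags]
    if positions != sorted(positions):
        return False  # children are out of order
    # Every non-repeatable element may occur at most once.
    return all(tags.count(tag) <= 1 for tag in set(tags) - REPEATABLE)


schema_order = ET.fromstring(
    "<ProcessingStep>"
    "<processingStepType>OCR</processingStepType>"
    "<processingDateTime>2009-10-19T10:21:14+05:00</processingDateTime>"
    "</ProcessingStep>"
)
example_order = ET.fromstring(
    "<ProcessingStep>"
    "<processingDateTime>2009-10-19T10:21:14+05:00</processingDateTime>"
    "<processingStepType>OCR</processingStepType>"
    "</ProcessingStep>"
)

print(follows_sequence(schema_order))   # True
print(follows_sequence(example_order))  # False
```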
@jukervin jukervin added this to the 2.2 milestone Feb 20, 2014
@jukervin jukervin self-assigned this Feb 20, 2014
@jukervin jukervin removed this from the 2.2 milestone Jun 5, 2014
@jukervin jukervin changed the title from "Change OCRProcessing to Processing" to "Add Processing to replace OCRProcessing" Jun 5, 2014
@jukervin jukervin modified the milestone: 3.1 Dec 11, 2014
@Jo-CCS Jo-CCS mentioned this issue Mar 23, 2016
@cneud
Member

cneud commented Jun 14, 2016

This seems very sensible to me!

Having a generic Processing element and an ID attribute on each processingStep would, it seems to me, also satisfy much of what has been requested in #35. What is still missing, though, is a way to track exactly which elements have been produced or altered by a particular processingStep.

@Jo-CCS
Member

Jo-CCS commented Jun 16, 2016

Tracking changes to elements will be impossible to cover within an XML file: XML is hierarchically structured, and changes made by (post-)processing actions will also change the hierarchy, which cannot be recorded. Elements might also be removed, after which they can no longer be referenced.
In such cases it makes much more sense to clone the files and just add history records, so that you know which file has which status and can compare them.
Storage management systems do the rest: they avoid fully redundant data by saving only the changes, while keeping the ability to roll back to a former version.

@cneud cneud mentioned this issue Jun 16, 2016
6 tasks
@cneud
Member

cneud commented Jun 16, 2016

Continued in #39.

@cneud cneud closed this as completed Jun 16, 2016