| Field | Value |
|---|---|
| Standard | Unstructured Information Management Architecture (UIMA) |
| OASIS Document | OASIS UIMA 1.0 (2009) |
| OASIS Specification | docs.oasis-open.org/uima/v1.0/uima-v1.0.html |
| ISO Adoption | ISO/IEC 24618 — Information Technology — UIMA framework |
| Apache Reference Implementation | Apache UIMA Java SDK 3.x |
| Authority | OASIS UIMA Technical Committee and Apache Software Foundation — UIMA Project |
| npm Package | @amlhubs/uima |
| npm Version | 0.0.1 |
| Peer Dependencies | @amlhubs/uml@^0.0.2, @amlhubs/mof@^0.0.1, @amlhubs/xmi@^0.0.1 |
| License | MIT |
The Unstructured Information Management Architecture (UIMA) is the OASIS-ratified component framework for orchestrating analysis of unstructured content (text, audio, video) through a type-safe shared data structure, the Common Analysis Structure (CAS), and a composable pipeline of analysis components (Analysis Engines, CAS Consumers, CAS Multipliers, Collection Readers, Flow Controllers). UIMA was originally engineered at IBM Research, contributed to the Apache Software Foundation in 2006, and ratified as the OASIS UIMA 1.0 standard in 2009 (also published as ISO/IEC 24618, Information Technology: UIMA framework). It is a widely adopted reference architecture for enterprise NLP, biomedical text-mining, and multi-modal information-extraction pipelines that require interoperable annotation interchange across vendors and analysis stages.
The @amlhubs/uima npm package repackages the OASIS UIMA 1.0 Component Descriptor metamodel as extensible TypeScript interfaces and base classes following the @amlhubs Three-Layer Pattern: every metaclass surfaces as IFoo<T> (generic interface), AbstractFoo<const T> (inference-site abstract class), and Foo{Instance} (registered concrete). The package surfaces the UIMA Component Descriptor metaclasses spanning the four UIMA core packages — Type System (TypeSystem, Type, Feature, AllowedValue, TypePriority, TypePriorityList), Common Analysis Structure (CAS, View, SofA, FeatureStructure, AnnotationFS, ArrayFS, ListFS, FSIndex, FSIterator), Analysis Engine descriptors (AnalysisEngineDescription, PrimitiveAnalysisEngineDescription, AggregateAnalysisEngineDescription, FlowController, FixedFlow, CapabilityLanguageFlow, FlowConstraints), and the Auxiliary Component descriptors (CollectionReader, CASConsumer, CASMultiplier, CASInitializer) — together with the descriptor substrate (Capability, ConfigurationParameter, ConfigurationParameterDeclaration, ConfigurationParameterSetting, ResourceManagerConfiguration, ExternalResourceDescription, ExternalResourceBinding, OperationalProperties) on which every UIMA component depends. Every interface carries a JSDoc header citing the precise OASIS UIMA 1.0 chapter and section that defines it, making each symbol an auditable projection of the OASIS standard rather than an internal invention.
Adopting UIMA through a typed package produces an immediate interoperability dividend. Many enterprise NLP and text-analytics platforms, including IBM Watson Discovery, Apache cTAKES (clinical NLP), DKPro Core (academic NLP), Apache OpenNLP UIMA wrappers, LAPPS Grid (Language Application Grid), the BioNLP-UIMA initiative, and biomedical informatics platforms that consume electronic health records, build on or interoperate with UIMA's CAS abstraction and Type System as the machine-readable substrate for cross-vendor annotation interchange. A pipeline expressed against the metaclasses @amlhubs/uima exports is designed to be portable across those platforms with little or no custom conversion. Ventures that would otherwise spend quarters re-implementing per-platform analysis-engine descriptors and CAS serializers (a custom XML-to-CAS converter, a per-vendor type-system mapper, a bespoke flow controller for each pipeline shape) can avoid most of that engineering cost.
OASIS standardization plus ISO adoption turns internal architectural decisions into regulator-recognized artifacts. OASIS UIMA 1.0 is cited in U.S. federal procurement frameworks for unstructured-data analysis, in the European Language Grid governance documents, in clinical NLP best-practice publications from the American Medical Informatics Association, and in the HL7 Clinical Quality Language interoperability stack. ISO/IEC 24618 — the ISO publication of the same UIMA architecture — is cited in international information-technology procurement standards. A regulated vendor — a clinical-NLP platform subject to HIPAA traceability requirements, a regulatory-document analysis platform subject to FDA 21 CFR Part 11, a financial-services KYC platform subject to ISO 22301 — presents the same UIMA surface to an auditor without translating internal jargon into standards language.
The second business lever is agentic runtime leverage. Ageni's Probabilistic Reduction Engine consumes the UIMA metamodel as the deterministic substrate over which large-language-model reasoning operates on unstructured-data pipelines. When an agent writes source code against IAnalysisEngineDescription, ITypeSystem, ICAS, and IAnnotationFS, the TypeScript compiler evaluates whether the proposed pipeline is a well-formed UIMA architecture at the same moment the compiler evaluates whether the code itself is well-formed — the two correctness checks collapse into one tsc pass. Structural hallucinations that would otherwise slip past a natural-language review (inventing an annotation type without a parent TypeSystem, mis-attributing a Feature to a CAS rather than a Type, violating the AggregateAnalysisEngine flow-controller binding constraint, declaring outputsNewCASes without backing CAS Multiplier semantics) are caught at compile time, and every surviving interface reference traces to a §-section of the OASIS standard through the JSDoc header. The agent's reasoning surface is reduced from the open set of possible English sentences about NLP pipelines to the closed set of typed UIMA metaclass compositions.
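As a rough illustration of the compile-time check described above, the sketch below uses simplified, hypothetical stand-ins for ITypeSystemDescription, ITypeDescription, and ICapability. The shapes actually exported by @amlhubs/uima are not shown in this README and may differ, and the `demo.*` type names are invented for the example. It assumes TypeScript 4.9 or later for `satisfies`. The idea is that a capability may only name annotation types its type system declares, so an invented type name fails under tsc.

```typescript
// Hypothetical, simplified stand-ins; not the real @amlhubs/uima declarations.
interface ITypeDescription {
  readonly name: string;
  readonly supertypeName: string;
}

interface ITypeSystemDescription {
  readonly name: string;
  readonly types: readonly ITypeDescription[];
}

// Union of the type names declared by a given type system.
type DeclaredTypeName<TS extends ITypeSystemDescription> =
  TS["types"][number]["name"];

// A capability can only reference types that its type system declares.
interface ICapability<TS extends ITypeSystemDescription> {
  readonly inputs: readonly DeclaredTypeName<TS>[];
  readonly outputs: readonly DeclaredTypeName<TS>[];
}

// `as const` keeps the literal names; `satisfies` checks the overall shape.
const demoTypeSystem = {
  name: "demo.TypeSystem",
  types: [
    { name: "demo.Sentence", supertypeName: "uima.tcas.Annotation" },
    { name: "demo.Token", supertypeName: "uima.tcas.Annotation" },
  ],
} as const satisfies ITypeSystemDescription;

type DemoTypeSystem = typeof demoTypeSystem;

// Accepted: both names are declared in demoTypeSystem.
const tokenizerCapability: ICapability<DemoTypeSystem> = {
  inputs: ["demo.Sentence"],
  outputs: ["demo.Token"],
};

// Rejected by tsc if uncommented: "demo.Entity" is not declared in demoTypeSystem.
// const invented: ICapability<DemoTypeSystem> = {
//   inputs: ["demo.Token"],
//   outputs: ["demo.Entity"],
// };
```

The check here is purely structural: it catches references to undeclared types, not semantic mistakes such as an annotator that declares an output it never produces.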
The third lever is compounding reuse across the AML NLP stack. UIMA is the load-bearing pipeline-orchestration substrate of the natural-language-processing layer in the AML graph: the @amlhubs/lmf-core package supplies the lexicon metamodel that UIMA Analysis Engines emit annotations against; the future @amlhubs/maf (ISO 24611 — Morpho-Syntactic Annotation Framework), @amlhubs/laf (ISO 24612 — Linguistic Annotation Framework), @amlhubs/semaf (ISO 24617 — Semantic Annotation Framework), @amlhubs/iso-24617-2 (DiAML — Dialogue Act Markup Language), @amlhubs/its (W3C Internationalization Tag Set), and @amlhubs/fipa-acl (FIPA Agent Communication Language) all interoperate with UIMA's CAS and Type System through declared annotation types. Every downstream ageni venture that reasons about text annotation, biomedical entity extraction, dialogue-act classification, or multi-stage NLP orchestration consumes these same UIMA Component Descriptor metaclasses through their transitively-dependent packages. The engineering investment that produced @amlhubs/uima is not charged to any single venture; it is amortized over every venture that ever extends it.
The fourth lever is composability across the OMG/ISO/OASIS specification stack. UIMA's metamodel is a UML-class-diagram-shaped metamodel: every UIMA metaclass realizes a UML IClass carrying owned attributes typed by UML IDataType/IPrimitiveType and connected by UML IAssociation instances; every UIMA descriptor serializes through XMI 2.5.1 (the OMG XML Metadata Interchange standard; the earlier XMI 2.4.2 revision is the one published as ISO/IEC 19509:2014). The @amlhubs/uima package depends on @amlhubs/uml for those structural types, on @amlhubs/mof for the reflective machinery that lets agents query a UIMA pipeline's metamodel-shape at runtime, and on @amlhubs/xmi for the canonical XML serialization of UIMA descriptors. Owning the typed packages from UML through MOF through XMI through UIMA, all rooted in the single UML foundation that underpins the entire AML graph, gives the ageni platform one coherent metamodeling surface for both structural code and unstructured-data analysis pipelines rather than two loosely-coupled specifications, and makes every future agentic capability that touches a text-analytics pipeline a composition of capabilities already expressed in the same formal language.
The package exports the OASIS UIMA 1.0 Component Descriptor metaclasses grouped by spec chapter. The complete enumeration lives in src/uima.ts; the table below summarizes the groups and cites the authoritative OASIS UIMA 1.0 chapter.
| UIMA Package | OASIS UIMA 1.0 Chapter | Metaclasses Surfaced |
|---|---|---|
| Type System | §3 (Conceptual Overview) + §6 (Type System Description) | ITypeSystemDescription, ITypeDescription, IFeatureDescription, IAllowedValue, ITypePriorities, ITypePriorityList, ITypeOrFeature, IImport |
| Common Analysis Structure | §3.4 (CAS) + §4 (Annotation and CAS Concepts) | ICAS, IView, ISofa, IFeatureStructure, IAnnotationFS, IArrayFS, IListFS, IFSIndex, IFSIndexDescription, IFSIndexCollection, IFSIterator |
| Analysis Engine Descriptors | §2 (Analysis Engine Architecture) + §5 (AE Descriptors) | IAnalysisEngineDescription, IPrimitiveAnalysisEngineDescription, IAggregateAnalysisEngineDescription, IAnalysisEngineMetaData, IFlowControllerDescription, IFlowConstraints, IFixedFlow, ICapabilityLanguageFlow |
| Auxiliary Components | §2.6 (CAS Multipliers) + §7 (Collection Processing) | ICollectionReaderDescription, ICASConsumerDescription, ICASMultiplierDescription, ICASInitializerDescription, ICollectionProcessingEngine |
| Capability + Configuration | §5.4 (Capabilities) + §5.5 (Configuration Parameters) | ICapability, IConfigurationParameter, IConfigurationParameterDeclaration, IConfigurationParameterSetting, INameValuePair, IConfigurationGroup |
| Resource Manager | §5.6 (External Resources) | IResourceManagerConfiguration, IResourceSpecifier, IExternalResourceDescription, IExternalResourceBinding, IFileResourceSpecifier, IRelativePathResourceSpecifier |
| Operational Properties | §2.5 (Operational Properties) | IOperationalProperties, IModifiesCas, IMultipleDeploymentAllowed, IOutputsNewCASes |
Every interface is accompanied by an extensible base class with the same name minus the I prefix (e.g., TypeSystemDescription, CAS, AnnotationFS). The full list and the JSDoc headers citing each OASIS UIMA 1.0 §-section live at src/uima.ts.
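To make the Common Analysis Structure row more concrete, here is a minimal sketch of how a view, its Sofa (subject of analysis), and span annotations relate. `ISofa` and `IAnnotationFS` are reduced, hypothetical versions of the interfaces named in the table, and `SketchView` is an invented helper class, not something the package is known to export. The ordering used in `select` (begin ascending, then end descending) mirrors the default annotation ordering in Apache UIMA and is an assumption here.

```typescript
// Hypothetical, simplified stand-ins; not the real @amlhubs/uima declarations.
interface ISofa {
  readonly sofaID: string;
  readonly mimeType: string;
  readonly sofaString: string; // the text under analysis
}

interface IAnnotationFS {
  readonly typeName: string;
  readonly begin: number; // inclusive character offset into the Sofa
  readonly end: number; // exclusive character offset into the Sofa
}

class SketchView {
  readonly sofa: ISofa;
  private readonly indexed: IAnnotationFS[] = [];

  constructor(sofa: ISofa) {
    this.sofa = sofa;
  }

  // Index an annotation after checking that its span lies inside the Sofa.
  addToIndexes(annotation: IAnnotationFS): void {
    const { begin, end } = annotation;
    if (begin < 0 || end < begin || end > this.sofa.sofaString.length) {
      throw new RangeError(`span [${begin}, ${end}) is outside the Sofa`);
    }
    this.indexed.push(annotation);
  }

  // Annotations of one type, ordered by begin ascending, then end descending.
  select(typeName: string): IAnnotationFS[] {
    return this.indexed
      .filter((a) => a.typeName === typeName)
      .sort((a, b) => a.begin - b.begin || b.end - a.end);
  }

  // The slice of Sofa text that an annotation covers.
  coveredText(annotation: IAnnotationFS): string {
    return this.sofa.sofaString.slice(annotation.begin, annotation.end);
  }
}

const view = new SketchView({
  sofaID: "_InitialView",
  mimeType: "text/plain",
  sofaString: "UIMA analyzes text.",
});
view.addToIndexes({ typeName: "demo.Token", begin: 5, end: 13 });
view.addToIndexes({ typeName: "demo.Token", begin: 0, end: 4 });
view.addToIndexes({ typeName: "demo.Sentence", begin: 0, end: 19 });

const tokens = view.select("demo.Token").map((t) => view.coveredText(t));
// tokens is ["UIMA", "analyzes"]
```

Annotations store offsets rather than copies of the text, which is why several analysis stages can add annotations over the same Sofa without modifying it.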
@amlhubs/uima is a downstream consumer of UML, MOF, and XMI. It depends on all three. Future NLP-pipeline extensions depend on it.
```
@amlhubs/uml  (root, zero dependencies)
    ▲
    │ peerDependency
    ├── @amlhubs/mof  (reflective machinery over UML)
    │       ▲
    │       │ peerDependency
    │       │
    │       ├─── @amlhubs/xmi  (XMI 2.5.1 serialization metamodel)
    │       │        ▲
    │       │        │ peerDependency
    │       │        │
    │       └─── @amlhubs/uima  (this package, OASIS UIMA 1.0)
    │                ▲
    │                │ (future)
    │                ├── @amlhubs/maf       (ISO 24611, MAF)
    │                ├── @amlhubs/laf       (ISO 24612, LAF)
    │                ├── @amlhubs/semaf     (ISO 24617, SemAF)
    │                └── @amlhubs/lmf-core  (ISO 24613-1)
    │
    └── peerDependency reused by uima for IPackage / IClass / IDataType
```
The edges are load-bearing. @amlhubs/uima imports IElement, IPackage, IClass, IDataType, IPrimitiveType, IProperty, IAssociation from @amlhubs/uml to type the UIMA metaclass surface as well-formed UML 2.5.1 classes. It imports IFactory from @amlhubs/mof to surface the runtime instantiation of CAS, Type, and Feature instances through MOF reflection. It imports IXMIDocument, IXMIElement, IXMIReference from @amlhubs/xmi to type the canonical XML serialization of UIMA descriptors per the OASIS UIMA 1.0 §5.7 (XML Representation of Resource Specifiers) requirement.
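The README does not show the @amlhubs/xmi API, so the following is only a loose sketch of what serializing a type description to XML could involve. It does not use XMI. The element names (`typeDescription`, `supertypeName`, `featureDescription`, `rangeTypeName`) follow the Apache UIMA type system descriptor format, and the `Sketch*` interfaces and helper functions are hypothetical.

```typescript
// Hypothetical stand-ins; not the real @amlhubs/uima or @amlhubs/xmi types.
interface SketchFeatureDescription {
  readonly name: string;
  readonly rangeTypeName: string;
}

interface SketchTypeDescription {
  readonly name: string;
  readonly supertypeName: string;
  readonly features: readonly SketchFeatureDescription[];
}

// Escape the characters that are unsafe in XML element content.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Wrap already-serialized content in a start tag and an end tag.
function xmlElement(tag: string, content: string): string {
  return `<${tag}>${content}</${tag}>`;
}

function featureToXml(feature: SketchFeatureDescription): string {
  return xmlElement(
    "featureDescription",
    xmlElement("name", escapeXml(feature.name)) +
      xmlElement("rangeTypeName", escapeXml(feature.rangeTypeName)),
  );
}

function typeToXml(description: SketchTypeDescription): string {
  return xmlElement(
    "typeDescription",
    xmlElement("name", escapeXml(description.name)) +
      xmlElement("supertypeName", escapeXml(description.supertypeName)) +
      xmlElement("features", description.features.map(featureToXml).join("")),
  );
}

const tokenXml = typeToXml({
  name: "demo.Token",
  supertypeName: "uima.tcas.Annotation",
  features: [{ name: "pos", rangeTypeName: "uima.cas.String" }],
});
```

A real serializer would also emit the XML declaration, the enclosing `typeSystemDescription` element, and namespaces; those are omitted to keep the sketch short.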
```bash
npm install @amlhubs/uima
```

```typescript
import type {
  ITypeSystemDescription,
  ITypeDescription,
  IFeatureDescription,
  ICAS,
  IView,
  ISofa,
  IAnnotationFS,
  IFeatureStructure,
  IAnalysisEngineDescription,
  IPrimitiveAnalysisEngineDescription,
  IAggregateAnalysisEngineDescription,
  IFlowControllerDescription,
  ICapability,
  IConfigurationParameter,
  IOperationalProperties,
} from '@amlhubs/uima';
```

Concrete implementations follow the Three-Layer Pattern: extend an Abstract{Foo} base class, supply registered Form-typed type arguments, and implement the abstract members with registry references.
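The three layers might look roughly like the sketch below for a single metaclass. Everything here is an assumption made for illustration: `ParameterForm`, the `form` member, and `describe()` are invented, the classes are local stand-ins rather than the package's own, and the registry mechanism mentioned above is not modeled. The `const` type parameter requires TypeScript 5.0 or later.

```typescript
// Hypothetical sketch of the Three-Layer Pattern; names and members are illustrative.
interface ParameterForm {
  readonly name: string;
  readonly type: "String" | "Integer" | "Float" | "Boolean";
  readonly mandatory: boolean;
}

// Layer 1: generic interface.
interface IConfigurationParameter<T extends ParameterForm> {
  readonly form: T;
  describe(): string;
}

// Layer 2: abstract class. The const type parameter keeps literal types
// from the constructor argument at the inference site.
abstract class AbstractConfigurationParameter<const T extends ParameterForm>
  implements IConfigurationParameter<T>
{
  readonly form: T;

  constructor(form: T) {
    this.form = form;
  }

  abstract describe(): string;
}

// Layer 3: concrete class that implements the abstract members.
class ConfigurationParameter<const T extends ParameterForm> extends AbstractConfigurationParameter<T> {
  describe(): string {
    const required = this.form.mandatory ? "mandatory" : "optional";
    return `${this.form.name}: ${this.form.type} (${required})`;
  }
}

const language = new ConfigurationParameter({
  name: "language",
  type: "String",
  mandatory: true,
});

// The literal type survives inference, so this annotation type-checks.
const parameterName: "language" = language.form.name;
```

Keeping the literal types is what would let later checks refer to a parameter by its exact name instead of a plain `string`.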
Every metaclass in src/uima.ts carries a JSDoc block whose @standard, @chapter, @section, @metaclass, @generalization, @definition, @associationEnds, @ownedAttributes, @operations, @constraints tags reproduce the OASIS UIMA 1.0 (2009) specification verbatim. The single source of truth for the spec is docs.oasis-open.org/uima/v1.0/uima-v1.0.html. The Apache UIMA 3.x reference implementation is the canonical Java realization at uima.apache.org. The ISO publication of the same architecture is ISO/IEC 24618.
- UIMA-AS (UIMA Asynchronous Scaleout): the JMS-based scale-out broker for distributed UIMA pipelines. Tracked as a separate `@amlhubs/uima-as` candidate.
- Apache Ruta (Rule-based Text Annotation): the UIMA rule language. Tracked as a separate `@amlhubs/uima-ruta` candidate.
- Apache UIMA DUCC (Distributed UIMA Cluster Computing): orchestration cluster for UIMA. Out of metamodel scope.
- CAS Editor / UIMA tooling: Eclipse-based authoring tooling. Out of metamodel scope.
- JCas Java code generation: Apache UIMA's Java code generator for typed CAS access. Realized through downstream tooling, not the metamodel.
These are explicit non-goals of this 0.0.1 release; future @amlhubs/uima-as and @amlhubs/uima-ruta packages will surface them as separate, peer-dependent metamodels.