LoggingRecordCounts
This page was used during development of record counts code.
Dave in email on 2011-03-31
I guess I have an initial comment to start:
We are counting active, updates, and deletes, but I believe there are at least four categories - perhaps we should count the following kinds of arriving records (I am assuming that active is defined as a record that is not marked-deleted):
1) new record (active)
2) new record (marked-deleted)
3) updated record (active)
4) updated record (marked-deleted)
We could also count some additional categories that look at what the updated records are replacing (two of the above cases):
3a) updated (active replacing an existing active)
3b) updated (active replacing an existing marked-deleted)
4a) updated (marked-deleted replacing an existing active)
4b) updated (marked-deleted replacing an existing marked-deleted)
additional facets
- num output records ?
- status (default to active only)
- unknown processing errors
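The categories in the email above can be sketched as a small classifier. This is only an illustration of the proposed breakdown: the class, enum, and method names are hypothetical and not taken from the MST code, and it assumes the caller can tell whether a record was seen before and, if so, what its stored status is.

```java
// Sketch only: names are hypothetical, not MST code.
final class ArrivalCategories {

    enum Status { ACTIVE, DELETED }

    /** The leaf categories from the list above. */
    enum Category {
        NEW_ACTIVE,                   // 1
        NEW_DELETED,                  // 2
        UPDATED_ACTIVE_PREV_ACTIVE,   // 3a
        UPDATED_ACTIVE_PREV_DELETED,  // 3b
        UPDATED_DELETED_PREV_ACTIVE,  // 4a
        UPDATED_DELETED_PREV_DELETED  // 4b
    }

    /**
     * @param incoming status of the arriving record
     * @param existing status of the stored record, or null if never seen before
     */
    static Category classify(Status incoming, Status existing) {
        if (existing == null) {
            return incoming == Status.ACTIVE ? Category.NEW_ACTIVE : Category.NEW_DELETED;
        }
        if (incoming == Status.ACTIVE) {
            return existing == Status.ACTIVE
                    ? Category.UPDATED_ACTIVE_PREV_ACTIVE
                    : Category.UPDATED_ACTIVE_PREV_DELETED;
        }
        return existing == Status.ACTIVE
                ? Category.UPDATED_DELETED_PREV_ACTIVE
                : Category.UPDATED_DELETED_PREV_DELETED;
    }

    public static void main(String[] args) {
        System.out.println(classify(Status.ACTIVE, null));           // prints NEW_ACTIVE
        System.out.println(classify(Status.DELETED, Status.ACTIVE)); // prints UPDATED_DELETED_PREV_ACTIVE
    }
}
```

Categories 3 and 4 are then just the sums of their two sub-categories, so only the six leaf values need to be stored.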
## Operation ##
### after harvest step completes ###
**external repository's completeListSize for this run**: 5,800,000 records
records added to repository this run (stored in a structured db table)
set-name | active records | updates | deleted records |
---|---|---|---|
| | 5,000,000 | 5,000 | 50 |
bib | 2,500,000 | 2,500 | 25 |
hold | 2,500,000 | 2,500 | 25 |
column definitions
column | definition |
---|---|
active records | all incoming records not marked as deleted |
updates | incoming records that have previously been harvested from this provider |
deleted records | incoming records that have been marked as deleted (if a record is active, then deleted, then active, then deleted, this counter will be incremented twice) |
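As a worked example of these three definitions, here is a minimal tally, assuming records are identified by a string id and that "previously harvested" can be answered from a set of ids already seen. The class and method names are hypothetical, and for illustration all four arrivals of the same record go through one tally, whereas in practice they would normally be spread over several harvest runs.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch only: names are hypothetical, not MST code.
final class HarvestRunTally {
    long active;   // incoming records not marked as deleted
    long updates;  // incoming records previously harvested from this provider
    long deleted;  // incoming records marked as deleted

    // Assumption: ids already harvested; starts empty here.
    private final Set<String> seenIds = new HashSet<>();

    void record(String id, boolean markedDeleted) {
        // Set.add returns false when the id was already present.
        if (!seenIds.add(id)) {
            updates++;
        }
        if (markedDeleted) {
            deleted++;
        } else {
            active++;
        }
    }

    public static void main(String[] args) {
        HarvestRunTally tally = new HarvestRunTally();
        // active, then deleted, then active, then deleted
        tally.record("oai:example:1", false);
        tally.record("oai:example:1", true);
        tally.record("oai:example:1", false);
        tally.record("oai:example:1", true);
        System.out.println(tally.active + " " + tally.updates + " " + tally.deleted);
        // prints "2 3 2": the deleted counter was incremented twice
    }
}
```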
total records in this repository (queried from live db tables and archived in a structured db summation table)
set-name | active records | updates | deleted records |
---|---|---|---|
| | 5,000,000 | 5,000 | 50 |
bib | 2,500,000 | 2,500 | 25 |
hold | 2,500,000 | 2,500 | 25 |
column definitions
column | definition |
---|---|
active records | all records available for harvest |
updates | # of record updates minus # of records |
deleted records | # of records marked deleted |
### after service completes ###
records received this run
type | active records | updates | deleted records |
---|---|---|---|
| | 5,000,000 | 5,000 | 50 |
bib | 2,500,000 | 2,500 | 25 |
hold | 2,500,000 | 2,500 | 25 |
column definitions
column | definition |
---|---|
active records | all incoming records not marked as deleted |
updates | incoming records that have previously been harvested from this provider |
deleted records | incoming records that have been marked as deleted (if a record is active, then deleted, then active, then deleted, this counter will be incremented twice) |
records added to repository this run
service name | type | active records | held records | deleted records | updates |
---|---|---|---|---|---|
marcnormalization | | 5,000,000 | 5,000 | 5,000 | 50 |
marcnormalization | bib | 2,500,000 | 2,500 | 25 | 10 |
marcnormalization | hold | 2,500,000 | 2,500 | 25 | 0 |
total records in repository
service name | type | active records | updates | deleted records | unknown processing errors |
---|---|---|---|---|---|
marcnormalization | | 5,000,000 | 5,000 | 50 | 10 |
marcnormalization | bib | 2,500,000 | 2,500 | 25 | 10 |
marcnormalization | hold | 2,500,000 | 2,500 | 25 | 0 |
column definitions
column | definition |
---|---|
active records | the # of (A)ctive records in the live repo |
updated records | the # of updates to records beyond the initial insert |
deleted records | the # of (D)eleted records in the live repo |
### after solr index completes for a specific repo ###
records received this run
set | type | active records | deleted records |
---|---|---|---|
| | | 5,000,000 | 5,000 |
| | bib | 2,500,000 | 2,500 |
| | hold | 2,500,000 | 2,500 |
records added to solr this run
set | type | active records | deleted records |
---|---|---|---|
| | | 5,000,000 | 5,000 |
| | bib | 2,500,000 | 2,500 |
| | hold | 2,500,000 | 2,500 |
total records in solr (this repository only)
set | type | active records | deleted records |
---|---|---|---|
| | | 5,000,000 | 5,000 |
| | bib | 2,500,000 | 2,500 |
| | hold | 2,500,000 | 2,500 |
### structured db table ###
repo name | incoming/outgoing | incremental/total | date | set | type | active | update | deleted | held |
---|---|---|---|---|---|---|---|---|---|
Example:

    $ mysql -u root --password=root -D xc_marcaggregation -e 'describe incoming_record_counts'
    +--------------------------+-------------+------+-----+---------+----------------+
    | Field                    | Type        | Null | Key | Default | Extra          |
    +--------------------------+-------------+------+-----+---------+----------------+
    | incoming_record_count_id | int(11)     | NO   | PRI | NULL    | auto_increment |
    | harvest_start_date       | datetime    | NO   | MUL | NULL    |                |
    | type_name                | varchar(35) | NO   | MUL | NULL    |                |
    | new_act_cnt              | int(11)     | NO   |     | 0       |                |
    | new_del_cnt              | int(11)     | NO   |     | 0       |                |
    | upd_act_cnt              | int(11)     | NO   |     | 0       |                |
    | upd_del_cnt              | int(11)     | NO   |     | 0       |                |
    | upd_act_prev_act_cnt     | int(11)     | NO   |     | 0       |                |
    | upd_act_prev_del_cnt     | int(11)     | NO   |     | 0       |                |
    | upd_del_prev_act_cnt     | int(11)     | NO   |     | 0       |                |
    | upd_del_prev_del_cnt     | int(11)     | NO   |     | 0       |                |
    | unexpected_error_cnt     | int(11)     | NO   |     | 0       |                |
    +--------------------------+-------------+------+-----+---------+----------------+
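The `incoming_record_counts` columns appear to line up with the categories from the email (`new_act_cnt` = 1, `new_del_cnt` = 2, `upd_act_cnt` = 3, and so on). Under that reading, the three display columns used in the tables above could be derived as sketched below. The holder class is hypothetical (it is not the MST's own `RecordCounts`), and both the derivations and the consistency check are assumptions based on the column definitions on this page.

```java
// Sketch only: a hypothetical holder, not the MST's RecordCounts class.
final class IncomingRecordCountsRow {
    // Field names mirror the columns of incoming_record_counts.
    long newActCnt, newDelCnt, updActCnt, updDelCnt;
    long updActPrevActCnt, updActPrevDelCnt, updDelPrevActCnt, updDelPrevDelCnt;
    long unexpectedErrorCnt;

    // "active records": all incoming records not marked as deleted.
    long activeRecords() { return newActCnt + updActCnt; }

    // "updates": incoming records that were previously harvested.
    long updates() { return updActCnt + updDelCnt; }

    // "deleted records": incoming records marked as deleted.
    long deletedRecords() { return newDelCnt + updDelCnt; }

    // Assumption: 3 = 3a + 3b and 4 = 4a + 4b.
    boolean subCountsAddUp() {
        return updActCnt == updActPrevActCnt + updActPrevDelCnt
            && updDelCnt == updDelPrevActCnt + updDelPrevDelCnt;
    }

    public static void main(String[] args) {
        IncomingRecordCountsRow row = new IncomingRecordCountsRow();
        row.newActCnt = 100;
        row.newDelCnt = 5;
        row.updActCnt = 20;
        row.updActPrevActCnt = 15;
        row.updActPrevDelCnt = 5;
        row.updDelCnt = 3;
        row.updDelPrevActCnt = 2;
        row.updDelPrevDelCnt = 1;
        System.out.println(row.activeRecords() + " " + row.updates() + " "
                + row.deletedRecords() + " " + row.subCountsAddUp());
        // prints "120 23 8 true"
    }
}
```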
- there's a hook after each service runs to check rules about the record counts
- at the end of service processing, processServiceRecordCounts(RecordCounts mostRecentIncomingRecordCounts) gets called. At the end of that method, applyRulesToRecordCounts(RecordCounts mostRecentIncomingRecordCounts) gets called. The individual services need to override applyRulesToRecordCounts(RecordCounts mostRecentIncomingRecordCounts) and check that the record counts are as expected by applying any pertinent rules.
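The call sequence described above might look roughly like the following. Only the two method names and the parameter name come from this page. The `RecordCounts` and `BaseService` classes here are simplified stand-ins written for this example, and the single rule (flagging unexpected errors) is an illustrative assumption rather than a rule any MST service is known to apply.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: RecordCounts and BaseService are stand-ins, not the MST's real classes.
final class RecordCountRulesSketch {

    static final class RecordCounts {
        final long newActive;
        final long updatedActive;
        final long unexpectedErrors;

        RecordCounts(long newActive, long updatedActive, long unexpectedErrors) {
            this.newActive = newActive;
            this.updatedActive = updatedActive;
            this.unexpectedErrors = unexpectedErrors;
        }
    }

    static abstract class BaseService {
        final List<String> warnings = new ArrayList<>();

        // Called at the end of service processing.
        void processServiceRecordCounts(RecordCounts mostRecentIncomingRecordCounts) {
            // ... storing or logging the counts would happen here ...
            applyRulesToRecordCounts(mostRecentIncomingRecordCounts);
        }

        // Default: no rules. Individual services override this.
        protected void applyRulesToRecordCounts(RecordCounts mostRecentIncomingRecordCounts) {
        }
    }

    static final class ExampleService extends BaseService {
        @Override
        protected void applyRulesToRecordCounts(RecordCounts mostRecentIncomingRecordCounts) {
            // Hypothetical rule: a clean run has no unexpected errors.
            if (mostRecentIncomingRecordCounts.unexpectedErrors > 0) {
                warnings.add("unexpected errors: " + mostRecentIncomingRecordCounts.unexpectedErrors);
            }
        }
    }

    public static void main(String[] args) {
        ExampleService service = new ExampleService();
        service.processServiceRecordCounts(new RecordCounts(100, 20, 2));
        System.out.println(service.warnings); // prints [unexpected errors: 2]
    }
}
```

Keeping the rules in an overridable method lets the shared processing code stay the same across services while each service decides which counts it considers suspicious.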