This primer describes curation of MS Access files.
Topic | Description |
---|---|
File Extension | mdb, accdb |
MIME type | Non/ application/octet-stream |
Structure | Proprietary relational database |
Versions | Access 1.0 - 2003, 2007 - 2016 |
Primary fields or areas of use | Storing relational data. Used in many disciplines. Surveys, data entry, geospatial, storing metadata for certain disciplines, etc. |
Source and affiliation | Microsoft |
Metadata standards | No widely adopted standards. Built-in documentation may be used |
Key questions for curation review |
|
Tools for curation review |
|
Date created | 2019-01-18 |
Created by | Fernando Rios (University of Arizona Libraries, Office of Digital Innovation & Stewardship), Dave Fearon (Johns Hopkins University Libraries Data Services) |
Date updated and summary of changes made | 2019-02-01 - 1.0: initial version 2019-04-18 - 2.0: DCN peer review |
Suggested Citation: Rios, Fernando; Fearon, Dave. (2019). Microsoft Access Data Curation Primer. Data Curation Network GitHub Repository.
This work was created as part of the Data Curation Network “Specialized Data Curation” Workshop #1 co-located with the Digital Library Federation (DLF) Forum 2018 in Las Vegas, Nevada on October 17-18, 2018. These workshops have been generously funded by the Institute of Museum and Library Services # RE-85-18-0040-18.
A Microsoft Access (MS Access) database could be in one of two main formats: MDB and ACCDB. There are variations on these1 but they will not be discussed further. The MDB format is the native format for MS Access 1.0 through MS Access 2003. Note that although the MDB file extension was used for all those versions, the internal format changed significantly. Starting with Access 2007, the ACCDB format was used by default.
The following descriptions of the MDB and ACCDB formats are largely summarized from the Library of Congress format description entries2 along with additional information on the differences between the formats. Refer to those for more detailed information of the format along with a large list of additional resources.
MDB | ACCDB | |
---|---|---|
Overview |
|
|
Version | The default format up until Access 2003
|
The default format starting from Access 2007
|
Metadata |
|
- Tabular data associated with spatial information - geodatabases
- Capturing survey results
- Storing metadata about other datasets
- Data entry forms - Makes use of the way Access can be used to easily create forms for more efficient data entry.
- Basic statistical analysis functionality and other data analysis uses such as content coding.
- https://figshare.com/articles/MS_Access_database_Seed_data_from_rattlesnake_GI_tracts/6034904
- https://figshare.com/articles/EPA_Literature_Database_Cleft_Palate_Chemical_Data/6062773
- Content linked to ftp://newftp.epa.gov/comptox/virtual_tissues_data/epa/litdb/litdb_projectsandpublications/cleftpalate/cp_chems.accdb
- https://figshare.com/articles/Matura_NP_ESA_vegetation_survey/5630335
- https://figshare.com/articles/Picoplankton_cytometry_accdb/3180280
- https://figshare.com/articles/Murray_Cod_Larval_Database_Murrumbidgee_2011-2013/5715097
- https://figshare.com/articles/Effects_of_Recreation_on_Animals_Revealed_as_Widespread_through_a_Global_Systematic_Review/4297430
- https://core.tdar.org/geospatial/394771/bell-site-geodatabase-mdb-file
- https://core.tdar.org/dataset/396397/gfs2011_nabone-database (locked)
- https://core.tdar.org/dataset/394059/akurvik-access-database
- Lasda Bergman, Elaine, 2014, "Replication data for: Citations", https://doi.org/10.7910/DVN/27655, Harvard Dataverse, V1
- Correlates of War, 2007, "National Material Capabilities (v3.02)", https://hdl.handle.net/1902.1/10170, Harvard Dataverse, V1, UNF:3:B4DsVV/TykpPeJQNXdtwAA== [fileUNF]
Recent version of Access (e.g., Access 2016) can still open and save databases down to Access 2000 however functionality may be changed when migrating from an older to newer or newer to older format. There is no version for Macintosh OS.
For opening very old Access versions, the Library of Congress provides resources:
- As of early 2017, Microsoft still provides instructions online for importing an Access 2.0 database into versions of Access from 2000 through 2007.
- As of early 2017, Microsoft still provides instructions online for importing an Access 95 database into Access 2007.
A basic requirement of opening these very old formats is a copy of Access 2007. If it is anticipated that there could be the need to open and curate these old databases, it is recommended to procure and maintain a virtual machine with a working copy of Access 2007 to allow migrations between older and newer versions of Access.
- Is documentation about the columns included? I.e., a codebook. The codebook is especially critical to understand the relationships between the tables
- Are the primary/foreign key relationships described? Exporting individual tables to a spreadsheet format does not keep these relationships so it's critical to document them (at least include an exported version of the Relationships diagram alongside the exported tables.)
- How easily could this database be exported into flat CSVs? The answer will depend on the above points
- If exporting is easy i.e., if there are no relationships between tables and there is no embedded code or forms that are integral to the purpose of sharing the dataset. Suggest simply exporting it rather than archiving the database itself, since the database adds no particular functionality.
- Is the data contained in the database sourced from somewhere else?
- It could be that the database was used as a way to organize other data.
- If the data is available elsewhere, it might be that it's in a more FAIR format there and it would be worth pointing that out in the metadata
- What version of Access was used?
- Most of the time, this won't be stated so you'll just have to try to open it and see if it works
- Is your version of MS Access secure?
- Make sure options to disable macros and other code from executing automatically is disabled since you don't know if the dataset could be malicious. The default of newer versions of Access is to disable such functionality
- For data entry databases with forms, a description of the data entry workflow would help with understanding the underlying database structure
- What is the main entry form? Where does the entry workflow start? Etc.
- Documentation of the meaning of the relationships in the database, not just which relationships exist (the latter can simply be obtained via the Relationship Diagram which is auto-generated by Access)
- Is the data available natively in a non-Access format elsewhere?
- Documentation description of database table fields, ideally within the database design metadata.
- Documentation of macros and modules, ideally as comments within the code.
- What version of Access was used?
- Which tables and/or queries do specific results from the paper come from (in cases where the database is specifically associated with a publication)
Accessing metadata schema: Most database platforms have standard means of exporting database schema: descriptions of the database structure and design, including the tables, fields, relationships, and, sometimes the calculations or procedures used by the database. In Access, the Database Documenter (on the Database Tools tab, Fig. 1) is the direct means of generating reports on the database schema.
The Documenter report includes a lengthy property report of each field, including description. Exporting as .txt format is recommended for utility. The decision to generate schema reports for curation may depend on plans for preservation rather than documentation that is of direct use to researchers interested in the data itself, since much of the content consists of properties of fields, tables, and relationships relevant to the database design schema. For example, table fields could include descriptions as a useful addition to documentation and usage. These are easily viewed within Access Table Design View. See the Preservation section for details on generating the Database Documenter report.
Metadata may be accessed in other ways. Access uses Database Object Listing (MSysObjects) as its underlying standard for its schema. This schema is not used by other platforms, in particular, SQL, which means they are not as interoperable.
- MS Access does NOT use information_schema standards common to a range of SQL databases, which is one of the standards for reporting information about tables, views, columns, and procedures or calculations in a database. (c.f. https://en.wikipedia.org/wiki/Information_schema.
- Access also uses its own SQL standard, Access SQL, which may be incompatible with other SQL standards such as Transaction-SQL. http://rogersaccessblog.blogspot.com/2013/05/what-are-differences-between-access-sql.html
In this section, we present the following recommendations for basic preservation of MS Access databases. These consist of 1) exporting tables and other objects into interoperable formats, and 2), Making extensive use of the Database Documenter and Relationship Tool to export contributor- and machine-generated metadata. Finally, additional preservation avenues developed by 3rd parties for more advanced preservation are presented.
Following the ICPSR recommendations, at minimum, the database tables should be exported into flat CSV files and included alongside the original database itself. To do this, first make a copy of the database and then
- In Access (2016 used as an example), make sure to group by Object Type and display All Access Objects in the side bar (Fig. 2a).
- Right click on a table and select Export -> Text File.
- Enter a name and location for the file. Make sure to change the extension to 'csv' instead of 'txt'. Do NOT check the box to 'Export data with formatting and layout'. Click Ok.
- In the Export Text Wizard window show in Fig 2b, make sure Delimited is selected.
- Click the Advanced button in Fig 2b to change date export format and other settings as shown in Fig. 2c. Save the export settings to apply the date export to other tables more quickly.
- Repeat 2-5 for every table in the database. Also see Linked Tables below for how to deal with linked tables. Note: there is no built-in way to export more than one table at a time.
- Forms: Can be exported directly to PDF by right-clicking and selecting Export -> PDF or XPS. Bound forms that are directly linked to a table will be exported in their entirety, including the records themselves.
- Macros cannot be exported unless they are first converted to Visual Basic (use the Convert Macros to Visual Basic button in the macro Design tab in Access)
- Modules can be directly exported as text files.
- Queries: Don't use the export menu as this will export the query results instead of the query itself. To export the query, open it in Access in Design View and from the View menu, select SQL View. The text can then be copied into an empty text file with extension '.sql'.
- Click on Database Tools in the ribbon and open the Database Documenter
- Click the All Object Types tab and select all
- In the Options, set the settings as shown in Fig. 4 so that all available comments are included.
- When complete, the report will appear in a Print Preview window. Export the report to PDF by clicking the appropriate button on the ribbon. Note: There are other export options but the PDF export is best for preservation since it is a self-contained document.
Finally, in addition to the Relationships Report generated by the Database Documenter, the diagram from the Relationships editor should be included as a standalone attachment in the form of an image. This is because the relationships report generated by the Database Documenter does not include an overview of all the relationships in the database. To obtain the diagram click on the Relationships button in the Database Tools tab (Fig. 5a) and you will see a window that details the relationships within the database (Fig 5b). You can take a screenshot or export the report to PDF. Typically, the database contributor, rather than the curator, would arrange the relationship diagram in a way that makes sense, especially if intending the shared database to be repurposed.
Linked Tables Linked tables are tables that exist in another Access database. If the table is not included with the database being curated, they cannot be opened. These tables can be converted to local tables by right clicking the table and selecting the appropriate menu entry (assuming the linked table was part of the deposit as a separate file or otherwise made available). You can also use the Linked Table Manager (Fig. 6) to see and update all linked tables. Additionally, there is a consideration if a query in Access references external databases (e.g., SQL servers)
ICPSR Recommendations ICPSR preservation recommendations: "To preserve relational databases, users should export the database tables as flat rectangular files and preserve the table relationships using, for instance, SQL schema statements. When databases are used as survey instruments or other data input/out mechanisms, the look and feel of the user interface can be preserved by creating a static PDF image of the interface. Promising software is currently under development to normalize relational databases into non-proprietary formats such XML." (https://www.icpsr.umich.edu/icpsrweb/content/deposit/guide/chapter4.html)
Database Preservation Toolkit (DBPTK) and SIARD standard The Database Preservation Toolkit facilitates digital preservation of databases by converting a range of platforms to preservation-ready formats, in particular, SIARD 2.0. This format currently supports MS Access 2000 to 2016 (both MDB and ACCDB) formats. It is based on four internationally recognized standards: XML, SQL:2008, UNICODE and ZIP64format. The SAIRD format and toolset have been adopted in Switzerland and other countries. Further information and the executable application can be found at http://www.database-preservation.com/. Note: The DBPTK is command-based and would take time to install and implement. The decision to convert to SIARD would be based on your archive's long-term preservation policy rather than shorter-term reuse for research, for which depositing the original Access format may be sufficient.
- Make sure to verify (either by testing each table individually or asking the researcher) that the database does not contain tables that are linked to other databases. In recent version of Access it is indicated by the small arrow beside the table icon (Fig. 7).
Trying to open these linked tables will result in an error. See the Curation section for more information
- Eporting tables to CSV will ensure at least the data is accessible in a non-proprietary format
- Can LibreOffice (open-source alternative to Microsoft Office) open Access databases directly or can the Access database be exported to a format that can be opened by LibreOffice Base?
- No, but tables can be linked between Base and Access. See here for details.
- Preserving databases that reside on servers. How to download (do they act as independent files?) (Sharepoint?) Similarly, Access may be the front-end to tables on server-based SQL, which may be challenging to export and keep associated as a single package for archiving.
- Adding comments to macros and modules (e.g., within the coding).
- Investigating 3rd party tools for accessing Access databases for obtaining version numbers, counts of tables, existence of linked tables without having to manually open each table and query in Access. There are some tools for *nix like mdbtools that claim to be able to interact with Access databases.
- Converting Access to MySQL as an alternative open format. Guidelines can be found at this link. Bullzip offers an Access to MySQL converter utility.
- Exporting tables to CSV: did curator do it or were they provided by the researcher?
- Documenting the presence of linked tables or queries that reference external servers
- Version of MS Access needed to open file
- Any changes to the internal metadata. E.g., if the curator added descriptions to fields or modules using MS Access built-in functionality. This would affect what is produced by the Database Documenter.
- Document relationship among any files or parts exported from the database, e.g., individual tables and the keyfields that relate them, spatial data, reports. Document whether the curator or the researcher generated them.
- Stating whether the curator generated the Database Documenter report vs being generated by the researcher. This implies differences in the quality of the documentation - especially if the curator-generated report was not vetted by the researcher. (A researcher-generated report does not necessarily imply quality either.)
- Data Carpentry SQL discussion
- Library of Congress has some good descriptions
- Database preservation: more advanced techniques that are platform-independent
- SIARD (Software Independent Archiving of Relational Databases)
- Relational Database Viewer for databases based on SIARD 2
- MS Access Documentation and Backup
- Create an Access app package for saving as a Snapshot for preservation
- Database Documenter: accessing and printing the database design
Check files and read documentation (risk mitigation, file inventory, appraisal/selection
Understand the data (or try to), if not… (run files/environment, QA/QC issues, readmes)
Request missing information or changes (tracking provenance of any changes and why)
Augment metadata for findability (DOIs, metadata standards, discoverability)
Transform file formats for reuse (data preservation, conversion tools, data viz)
Evaluate for FAIRness (licences, responsibility standards, metrics for tracking use)
Document your curation activities
CURATE Action | Curator Checklist |
---|---|
Check data files and read documentation
|
|
CURATE Action | Curator Checklist |
---|---|
Understand the data (or try to)
|
Varies based on file formats and subject domain. For example…. MS Access ☐ Is there documentation on tables, relationships, queries, etc?
☐ Which tables or queries are the relevant ones used in a publication? |
CURATE Action | Curator Checklist |
---|---|
Request missing information or changes
|
|
CURATE Action | Curator Checklist |
---|---|
Augment the submission
|
|
CURATE Action | Curator Checklist |
---|---|
Transform file formats
|
|
CURATE Action | Curator Checklist |
---|---|
Evaluate and rate the overall data record for FAIRness.*
|
|
*Rubric evaluating the FAIR principles are based on the scoring matrix by Dunning, de Smaele, & Böhmer (2017).
CURATE Action | Curator Checklist |
---|---|
Document throughout curation activities.
|
☐ Repository collection metadata ☐ Provenance logs ☐ Service workflow ☐ Preservation packaging ☐ Any additional requirements at your institution |