# Joe Bruin Example

# Data Management Plan (DMP) for NIH-Funded Research Project

## Project Title:
[Provide a concise and descriptive title for your research project]

## Principal Investigator (Lead PI):
- **Name:** Example Exam
- **Affiliation:** UCLA
- **Email:** Example@g.ucla.edu
- **ORCID:** 9168707386

## Collaborators:
- **Name:** Joe Bruin
- **Affiliation:** UCLA
- **Email:** JoeBruin@ucla.edu
- **ORCID:** 5597255183

## 1. Data Types:
The research project involves the collection and analysis of imaging data.

## 2. Data Collection Methods:
Data will be collected through imaging techniques using [Specify the instruments or equipment]. Data collection will adhere to ethical standards, and the necessary approvals will be obtained from the relevant review boards.

## 3. Documentation:
Comprehensive documentation will be maintained for each dataset, including details on data collection methods, instrument calibration, and any preprocessing steps applied. Standard operating procedures (SOPs) will be created and updated as needed.

## 4. Data Formats:
Data will be stored in standard formats compatible with the imaging instruments used. The use of open and widely accepted formats will be prioritized to enhance interoperability and future data sharing.

## 5. Metadata Standards:
Metadata will be collected and stored using [Specify the metadata standards]. This information will include details on the dataset, data provenance, and any transformations applied during analysis.
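
As an illustration of what such a record could look like before a specific standard is chosen, the Python sketch below stores dataset details and a list of applied transformations as a JSON sidecar. The class `DatasetMetadata` and all of its field names are hypothetical and would need to be mapped onto whichever metadata standard the project adopts.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    dataset_id: str
    title: str
    instrument: str
    collected_on: str  # ISO 8601 date, e.g. "2024-01-31"
    transformations: list[str] = field(default_factory=list)

    def record_transformation(self, description: str) -> None:
        """Append one processing step to the provenance trail."""
        self.transformations.append(description)

    def to_json(self) -> str:
        """Serialize the record for storage alongside the data file."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Writing the output of `to_json()` to a file next to each dataset keeps the provenance trail together with the data it describes.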

## 6. Storage Locations:
Data will be stored securely on servers hosted at UCLA. Access controls will be implemented to restrict unauthorized access. Regular monitoring of storage space and usage will be conducted.

## 7. Backup Procedures:
Regular backups of the data will be performed, and backup copies will be stored in a geographically separate location. Backup frequency will be [Specify the frequency, e.g., weekly], and the integrity of backup copies will be routinely verified.
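
One possible way to verify backup integrity is to compare checksums of the primary files against their backup copies. The Python sketch below does this with SHA-256; the function names `sha256_of` and `verify_backup` and the two-directory layout are assumptions made for illustration, not a description of UCLA's backup infrastructure.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(primary_dir: Path, backup_dir: Path) -> list[str]:
    """Return relative paths whose backup copy is missing or differs."""
    problems = []
    for source in sorted(primary_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(primary_dir)
        copy = backup_dir / relative
        if not copy.is_file() or sha256_of(source) != sha256_of(copy):
            problems.append(relative.as_posix())
    return problems
```

An empty list from `verify_backup` means every primary file has a byte-identical backup copy; any listed path would need to be backed up again and investigated.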

## 8. Data Security Measures:
Security measures, including encryption and access controls, will be implemented to protect the confidentiality and integrity of the data. Access to sensitive data will be restricted based on roles and responsibilities.

## 9. Data Sharing:
Data sharing is an integral part of this project. De-identified data will be shared with the scientific community through established repositories. The sharing process will comply with NIH data sharing policies, and datasets will be accompanied by detailed metadata.

## 10. Repositories for Data Sharing:
Data will be deposited in [Specify the repositories, e.g., NIH Data Archive (NDA)], ensuring compliance with repository-specific guidelines. Accession numbers and relevant information will be included in publications and shared with collaborators.

## 11. Long-Term Preservation Strategies:
Data will be preserved for the long term to facilitate future research and validation of results. Preservation will follow best practices, including format migration and periodic assessment of file integrity. A plan for the transfer of data stewardship will be established.

## 12. Roles and Responsibilities for Data Management:
Clear roles and responsibilities for data management will be established. The PI and collaborators will ensure the implementation of data management protocols, with specific individuals assigned to oversee documentation, storage, security, and sharing.

## 13. Compliance with Ethical and Legal Considerations:
The research team will adhere to all relevant ethical guidelines and legal requirements governing data collection, storage, and sharing. Informed consent will be obtained from participants, and data handling will comply with institutional, federal, and international regulations.

## 14. Related Tools:
The project will use R, Python, Jupyter Notebooks, and Google Colab for data analysis and visualization. Code and analysis scripts will be documented and stored alongside the data to ensure transparency and reproducibility.

## Contact Information for Data Management:
For inquiries related to data management, please contact:
- Example Exam (Lead PI)
  - Email: Example@g.ucla.edu
  - Phone: [Provide a contact number]

This Data Management Plan is subject to periodic review and updates as necessary to ensure ongoing compliance with NIH guidelines and best practices in data management.

# Covid-19 Example

**Data Management Plan (DMP) for COVID-19 Research Project**

*Project Title: Understanding the Impact of COVID-19: [Project Name]*

*Principal Investigator: [Your Name]*

*Institution: [Your Institution]*

*Date: [Date]*

**1. Introduction**

The purpose of this Data Management Plan (DMP) is to outline the strategies for effective management, storage, security, and sharing of data generated in the course of the COVID-19 research project. This DMP adheres to the National Institutes of Health (NIH) guidelines and incorporates the latest recommendations as of [current date].

**2. Data Types and Collection Methods**

*2.1 Data Types*

The project involves the collection of both primary and secondary data. Primary data includes patient health records, clinical trial data, and laboratory results. Secondary data includes publicly available datasets related to COVID-19.

*2.2 Collection Methods*

Data will be collected through structured interviews, surveys, clinical trials, and laboratory experiments. The instruments for data collection will include electronic health record systems, survey software, and laboratory equipment.

**3. Documentation**

*3.1 Metadata Standards*

Metadata will be documented using standard formats, such as the Data Documentation Initiative (DDI) for survey data and the Clinical Data Interchange Standards Consortium (CDISC) for clinical trial data.

*3.2 Version Control*

A version control system will be implemented to track changes in datasets over time. Each dataset will be assigned a unique identifier to facilitate traceability.
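
To make the idea concrete, the Python sketch below gives each dataset a stable unique identifier and records a new version only when the content actually changes. It is an in-memory illustration under stated assumptions: the class `DatasetHistory` and its fields are hypothetical, and a real project would more likely rely on an established version control or data versioning system.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass
class DatasetHistory:
    """Tracks versions of one dataset under a stable unique identifier."""
    name: str
    dataset_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    versions: list[dict] = field(default_factory=list)

    def commit(self, content: bytes, note: str) -> int:
        """Record a new version if the content changed; return its number."""
        checksum = hashlib.sha256(content).hexdigest()
        if self.versions and self.versions[-1]["sha256"] == checksum:
            return self.versions[-1]["version"]  # unchanged, no new version
        version = len(self.versions) + 1
        self.versions.append({"version": version, "sha256": checksum, "note": note})
        return version
```

Pairing the identifier with a content checksum lets a later reader confirm which exact version of a dataset an analysis used.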

**4. Storage**

*4.1 Storage Locations*

Data will be stored on secure servers located within the institution's data center. Access to these servers will be restricted to authorized personnel only.

*4.2 Backup Procedures*

Regular backups will be conducted to prevent data loss. Backup data will be stored on a separate secure server with redundancy measures in place.

**5. Data Security Measures**

*5.1 Access Control*

Access to sensitive data will be restricted based on roles and responsibilities. Only authorized personnel with the appropriate permissions will be able to access and modify data.
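
A role-based restriction of this kind can be sketched as a lookup from roles to permitted actions. In the Python example below, the role names and the `ROLE_PERMISSIONS` mapping are hypothetical; the actual roles would come from the project's own staffing and the institution's access management system.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    MODIFY = "modify"


# Hypothetical role-to-permission mapping; a real project would define its own.
ROLE_PERMISSIONS = {
    "data_manager": {Permission.READ, Permission.MODIFY},
    "analyst": {Permission.READ},
    "coordinator": set(),
}


def is_allowed(role: str, permission: Permission) -> bool:
    """Return True only if the role is known and grants the permission."""
    return permission in ROLE_PERMISSIONS.get(role, set())
```

Unknown roles are denied by default, so a missing or misspelled role never grants access by accident.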

*5.2 Encryption*

All data in transit will be encrypted using secure protocols. Additionally, data at rest will be encrypted to protect against unauthorized access.

**6. Data Sharing**

*6.1 Plans for Data Sharing*

Data sharing is an integral part of the project. De-identified datasets will be shared with other researchers through established data sharing platforms and repositories.

*6.2 Repositories for Data Sharing*

Datasets will be deposited in NIH-supported repositories, such as those hosted by the National Center for Biotechnology Information (NCBI) and the National Institute of Allergy and Infectious Diseases (NIAID).

**7. Long-Term Preservation Strategies**

*7.1 Data Retention Policy*

A data retention policy will be established to determine the duration for which data will be preserved. This policy will be aligned with NIH guidelines and any other relevant regulatory requirements.

*7.2 Archiving*

At the end of the project, datasets will be archived in a secure and sustainable repository for long-term preservation, ensuring accessibility for future research.

**8. Roles and Responsibilities for Data Management**

*8.1 Data Management Team*

A dedicated data management team will be responsible for overseeing the implementation of the DMP. This team will include data managers, IT professionals, and project coordinators.

*8.2 Training and Awareness*

All project personnel will receive training on data management practices, including ethical considerations and compliance with NIH guidelines.

**9. Compliance with Ethical and Legal Considerations**

*9.1 Informed Consent*

Informed consent will be obtained from all participants in the research project. The data management plan will adhere to ethical standards and institutional review board (IRB) requirements.

*9.2 Compliance with NIH Policies*

The project will comply with all NIH policies and guidelines, including those related to data sharing, privacy, and security.

**10. Conclusion**

This Data Management Plan outlines the procedures for the effective and secure management of data generated during the COVID-19 research project. Regular reviews and updates to the DMP will be conducted to ensure ongoing compliance with NIH guidelines and any emerging best practices in data management.


# Example 3

## Input:

## Output

**Data Management Plan (DMP) for the Research Project: "Toxic Stress and Immune Response in Adolescents Vulnerable to HIV Infection"**

**1. Project Overview:**
   - **Title:** Toxic Stress and Immune Response in Adolescents Vulnerable to HIV Infection
   - **Lead PI:**
      - **Name:** Example Exam
      - **Affiliation:** UCLA
      - **Email:** Example@g.ucla.edu
      - **ORCID:** 9168707386
   - **Collaborators:**
      - **Name:** Joe Bruin
      - **Affiliation:** UCLA
      - **Email:** JoeBruin@ucla.edu
      - **ORCID:** 5597255183

**2. Data Types:**
   - We will adhere to the FAIR principles (Findable, Accessible, Interoperable, Reusable) in managing our data.

**3. Data Collection:**
   - Data will be collected from approximately 1500 participants in the ATN 149 study, which is supported by the NIH U19 grant (U19HD089886).

**4. Data Formats and Instruments:**
   - Data will be stored in spreadsheet formats to facilitate sharing and analysis.
   - R will be used for data processing and analysis, ensuring consistency and compatibility.

**5. Metadata Standards:**
   - We will implement standardized metadata to accompany datasets, providing comprehensive information about the data's context and structure.

**6. Data Documentation:**
   - Comprehensive documentation of data collection methods, variable definitions, and any transformations applied will be maintained throughout the project.

**7. Storage Locations:**
   - Data will be securely stored on servers hosted at UCLA, ensuring controlled access for authorized personnel.
   - Cloud storage, with appropriate security measures, will be employed for backup purposes and enhanced accessibility.

**8. Backup Procedures:**
   - Regular automated backups will be conducted to mitigate the risk of data loss.
   - Backup copies will be stored on separate, secure servers to ensure redundancy.

**9. Data Security:**
   - Access to project data will be restricted to authorized personnel, and encryption measures will be applied during both data transmission and storage to enhance security.

**10. Data Sharing:**
   - All spreadsheets containing project data will be shared with collaborators, fostering transparency and collaborative analysis.
   - Data sharing will adhere to controlled access protocols, ensuring compliance with ethical and legal considerations.

**11. Data Repository:**
   - The NIMH Data Archive has been selected as the primary repository for the project's datasets, ensuring long-term accessibility.

**12. Long-Term Preservation:**
   - Data will be preserved for a minimum of 10 years beyond the project's completion.
   - The NIMH Data Archive will be notified of any changes in contact information to ensure continued accessibility and adherence to preservation standards.

**13. Roles and Responsibilities:**
   - The lead PI and collaborators will be responsible for data collection, documentation, and sharing.
   - UCLA's IT department will manage data storage, backup procedures, and security measures, ensuring the integrity and confidentiality of the project's data.

**14. Compliance:**
   - The project will strictly adhere to all NIH guidelines for data management throughout its lifecycle.
   - Regular reviews will be conducted to incorporate any updates or changes in NIH policies into the data management plan.

**15. Ethical and Legal Considerations:**
   - The project will strictly adhere to all relevant ethical standards and legal requirements.
   - Informed consent will be obtained from study participants, and all data handling processes will respect privacy and confidentiality.

This Data Management Plan is a dynamic document, subject to updates as needed throughout the research project. Regular reviews will be conducted to ensure ongoing compliance with NIH guidelines and any changes in project requirements.

# Example Giving an Example DMP

## Input

This is an example DMP:
Plan Overview
A Data Management Plan created using DMPTool
DMP ID: https://doi.org/10.48321/D10S99
Title: Toxic Stress and Immune Response in Adolescents Vulnerable to HIV Infection
Creator: Tara Kerin - ORCID: 0000-0003-2655-5605
Affiliation: University of California, Los Angeles (ucla.edu)
Funder: National Institutes of Health (nih.gov)
Funding opportunity number: PAR-23-061
Template: NIH-Default DMSP
Project abstract:
Psychological stress has been previously shown to alter immune system function and reduce the
body’s ability to fight infection, and additionally burdens those infected with HIV or at risk for
HIV. Utilizing a unique established cohort of young people in Los Angeles and New Orleans who
are at high risk for HIV, this proposed study focuses on examining immune response biomarkers
with stress measures, including adverse childhood experiences (ACEs), and how these change the
course of HIV acquisition. Prevention of HIV is of great relevance and research that may provide
data in decreasing the biological susceptibility of transmission may alleviate the public health
burden. Additionally, more information regarding how stress may affect those at high risk for
HIV, could direct programs and resources in the future for management and prevention.
Biomarkers will be assessed using ELISA, PCR and flow cytometry. ACEs were assessed with a
10-question interview. Additional Positive Childhood Experiences (PCEs) will also be assessed.
Baseline and follow up interviews provide demographic and covariate data, as well as other
markers of stress as potential confounders. We will explore if ACEs and PCEs scores are
associated with inflammation markers in this high-risk cohort. Additionally, we will investigate
how ACEs, as a proxy for inflammation, can predict the acquisition of HIV and other sexually
transmitted infections. These proposed activities will simultaneously provide results relevant to
Created using DMPTool. Last modified 07 September 2023 1 of 6
current gaps in the literature regarding toxic stress and immune response in youth at risk for HIV
and will also provide the preliminary data to advance research towards how trauma response and
treatment may be used in HIV prevention efforts.
Start date: 04-15-2024
End date: 04-14-2026
Last modified: 09-07-2023
Copyright information:
The above plan creator(s) have agreed that others may use as much of the text of this plan as they
would like in their own plans, and customize it as necessary. You do not need to credit the
creator(s) as the source of the language used, but using any of the plan's text does not imply that
the creator(s) endorse, or have any relationship to, your project or proposal.
Toxic Stress and Immune Response in Adolescents Vulnerable to HIV Infection
Data Type
Types and amount of scientific data expected to be generated in the project: Summarize
the types and estimated amount of scientific data expected to be generated in the project.
Describe data in general terms that address the type and amount/size of scientific data
expected to be collected and used in the project (e.g., 256-channel EEG data and fMRI
images from ~50 research participants). Descriptions may indicate the data modality (e.g.,
imaging, genomic, mobile, survey), level of aggregation (e.g., individual, aggregated,
summarized), and/or the degree of data processing that has occurred (i.e., how raw or
processed the data will be)
This proposal will be leveraging previously collected data from ~1500 participants in the ATN 149
study (NIH-funded U19 grant: U19HD089886). New data collected will include biomarker assays
(biological specimens) for ~200 participants and survey assessments (seven questions, conducted by
phone) for ~400 participants.
This project will produce biomarker assay data generated/obtained from EIA and flow cytometry,
and PCE data from a 7 question survey. Data will be collected from 200 research specimens, and 400
participants, generating two datasets totaling approximately 1 MB in size. Raw data will be
transformed by R and the subsequently processed dataset used for statistical analysis. To protect
research participant identities, summarized data will be made available for sharing.
Scientific data that will be preserved and shared, and the rationale for doing so: Describe
which scientific data from the project will be preserved and shared and provide the
rationale for this decision.
All data produced in the course of the project will be preserved and shared.
In this proposed project, the cleaned, item-level spreadsheet data for all variables will be shared
openly, along with example quantifications and transformations from initial raw data. The rationale
for sharing only cleaned data is to foster ease of data reuse. Data will be shared to allow for
replication of findings, as well as to encourage and promote collaboration.
Metadata, other relevant data, and associated documentation: Briefly list the metadata,
other relevant data, and any associated documentation (e.g., study protocols and data
collection instruments) that will be made accessible to facilitate interpretation of the
scientific data.
To facilitate interpretation of the data, a codebook/data dictionary and statistical analysis plans will
be created, shared, and associated with the relevant datasets.
Related Tools, Software and/or Code
State whether specialized tools, software, and/or code are needed to access or manipulate
shared scientific data, and if so, provide the name(s) of the needed tool(s) and software and
specify how they can be accessed.
All data will be analyzed with R, which is free, open-source software for statistical analysis.
The code used to process and analyze the data will be made freely available.
Standards
State what common data standards will be applied to the scientific data and associated
metadata to enable interoperability of datasets and resources, and provide the name(s) of
the data standards that will be applied and describe how these data standards will be
applied to the scientific data generated by the research proposed in this project. If
applicable, indicate that no consensus standards exist.
In accordance with FAIR Principles for data, we will use open file formats (e.g. JPEG, MP4, CSV,
TXT, PDF, HTML, etc.) and persistent unique identifiers (PIDs). Data will be stored in common and
open formats, such as .csv files for our data. Information needed to make use of this data along with
references to the sources of those standardized names and metadata items will be included wherever
applicable.
Data Preservation, Access, and Associated Timelines
Repository where scientific data and metadata will be archived: Provide the name of the
repository(ies) where scientific data and metadata arising from the project will be
archived.
All data will be formatted using existing standards and deposited in the NIMH Data Archive.
How scientific data will be findable and identifiable: Describe how the scientific data will
be findable and identifiable, i.e., via a persistent unique identifier or other standard
indexing tools.
The NIMH Data Archive provides searchable study-level metadata for dataset discovery, and assigns
DOIs as persistent identifiers, and has a robust preservation plan to ensure long-term access. Data
will be discoverable online through standard web search of the study-level metadata as well as the
persistent pointer from the DOI to the dataset.
When and how long the scientific data will be made available: Describe when the scientific
data will be made available to other users (i.e., no later than time of an associated
publication or end of the performance period, whichever comes first) and for how long
data will be available.
Shared data generated from this project will be made available as soon as possible, and no later than
the time of publication or the end of the funding period, whichever comes first. The duration of
preservation and sharing of the data will be a minimum of 3 years after the end of the funding
period.
Access, Distribution, or Reuse Considerations
Factors affecting subsequent access, distribution, or reuse of scientific data: NIH expects
that in drafting Plans, researchers maximize the appropriate sharing of scientific data.
Describe and justify any applicable factors or data use limitations affecting subsequent
access, distribution, or reuse of scientific data related to informed consent, privacy and
confidentiality protections, and any other considerations that may limit the extent of data
sharing.
The investigators do not foresee any factors preventing access, distribution, or reuse of scientific
data.
Whether access to scientific data will be controlled: State whether access to the scientific
data will be controlled (i.e., made available by a data repository only after approval).
Controlled access will not be used. The data that is shared will be shared by unrestricted download.
Protections for privacy, rights, and confidentiality of human research participants:
If generating scientific data derived from humans, describe how the privacy, rights, and
confidentiality of human research participants will be protected (e.g., through deidentification, Certificates of Confidentiality, and other protective measures).
Screening and study visits were conducted in a room or area where conversations cannot be
overheard by others. Additional follow-up interviews will be conducted via phone or Zoom in a
private location. A random subject code has previously been assigned to identifiable data to preserve
confidentiality. No individual identifiers (participant name, social security number, or device serial
number) will be stored. All data use will be in accordance with the Informed Consent Form signed
by the participant. Paper copies of case report forms and electronic copies of any notes will be stored
on an encrypted hard drive. Source forms will be stored in Dr. Kerin’s locked office located in
the Department of Pediatrics and any digital data in a password-protected computer located within
her office. After the study is completed, hard copies of the notes will be shredded and electronic
records will be digitally erased. To prevent breaches of data security, personally identifiable data
will not be transmitted via the internet. We will notify UCLA ethics committees in writing if there is
a breach of the data safety protocol, loss of files, or other misuse of the data.
Oversight of Data Management and Sharing
Describe how compliance with this Plan will be monitored and managed, frequency of
oversight, and by whom at your institution (e.g., titles, roles).
As part of data management, data monitoring will be performed on a regular basis to maintain the
integrity of the data. As data is entered into the system, the PI, Dr. Kerin (ORCID: tarakerin) will
perform regular checks in all the clinical databases for recurrent missing documentation, data
inaccuracies, errors in submitted data and missing data. Logs of these data issues will be maintained
to identify problem areas with specific variables, or with specific study teams allowing us to
proactively modify the data collection instruments or re-train study coordinator/data entry staff. Logs
of communications with study coordinators about data cleaning and management will also be
maintained to keep track of corrected issues. Broader issues of DMS Plan compliance oversight and
reporting will be handled by the PI and Co-I team as part of general UCLA stewardship, reporting,
and compliance processes.

I want you to create a DMP that adheres to the NIH guidelines for a research project on ChatGPT's effect on early academics.

## Output

**Data Management Plan (DMP) for Investigating ChatGPT's Impact on Early Academics**

**Plan Overview**
- **DMP ID:** [Your generated DMP ID]
- **Title:** Examining the Impact of ChatGPT on Early Academic Learning
- **Creator:** [Your Name] - ORCID: [Your ORCID]
- **Affiliation:** [Your University/Institution]
- **Funder:** National Institutes of Health (NIH) - [Funding Opportunity Number]
- **Template:** NIH-Default DMSP

**Project Abstract:**
This research aims to investigate the potential influence of ChatGPT, a language generation model, on early academic learning in children. Utilizing a diverse sample of young participants, we will explore the interaction between ChatGPT exposure and cognitive development, educational outcomes, and potential ethical considerations.

**Start Date:** [Start Date]
**End Date:** [End Date]
**Last Modified:** [Last Modified Date]

**Copyright Information:**
This plan is available for use by others with the understanding that it does not imply endorsement or a relationship with the original project. No credit is required for the use of this plan.

### **Data Type**

**Types and Amount of Scientific Data Expected:**
The project will involve collecting data from child participants, including cognitive assessments, educational performance metrics, and qualitative feedback. We estimate data from approximately 500 participants, with diverse demographic characteristics.

**Scientific Data Preservation and Sharing:**
All data produced during the project will be preserved and shared openly. Cleaned, de-identified data will be shared to facilitate replication and collaboration.

**Metadata and Documentation:**
A comprehensive codebook/data dictionary and statistical analysis plans will be created and shared to aid in data interpretation. Study protocols and data collection instruments will also be made accessible.
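
A first draft of a data dictionary can be generated from the data file itself and then completed by hand. The Python sketch below, assuming the data is available as CSV text, lists each variable with an inferred type and a count of missing values. The function name `build_data_dictionary` and the output keys are illustrative assumptions, and the `description` field is deliberately left blank for the study team to fill in.

```python
import csv
import io


def _is_number(value: str) -> bool:
    """Return True if the string parses as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def build_data_dictionary(csv_text: str) -> list[dict]:
    """Summarize each column: name, inferred type, and missing-value count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    dictionary = []
    for column in reader.fieldnames or []:
        # Empty strings and absent trailing fields both count as missing.
        values = [row[column] for row in rows if row[column] not in ("", None)]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        dictionary.append({
            "variable": column,
            "type": "numeric" if is_numeric else "text",
            "missing": len(rows) - len(values),
            "description": "",  # to be filled in by the study team
        })
    return dictionary
```

The type inference here is intentionally coarse; coded categorical variables stored as numbers would still need to be labeled manually.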

### **Related Tools, Software, and/or Code**

The analysis will be conducted using open-source tools, and all relevant code will be made freely available to ensure transparency and replicability.

### **Standards**

The project will adhere to FAIR Principles, using open file formats and persistent unique identifiers (PIDs) for data. Data will be stored in common and open formats such as .csv files.

### **Data Preservation, Access, and Associated Timelines**

**Repository:**
All data will be formatted using existing standards and deposited in [Repository Name].

**Findable and Identifiable:**
The data will be findable and identifiable through a persistent unique identifier provided by the repository.

**Data Availability:**
Shared data will be available as soon as possible, no later than the time of publication or the end of the funding period. Data will be preserved and shared for a minimum of 5 years after the end of the funding period.

### **Access, Distribution, or Reuse Considerations**

**Factors Affecting Access:**
No factors are foreseen that would prevent access, distribution, or reuse of scientific data.

**Controlled Access:**
Controlled access will not be used. Data will be shared through unrestricted download.

### **Privacy and Confidentiality Protections**

The privacy, rights, and confidentiality of child participants will be safeguarded through de-identification, encryption, and secure storage practices. All data use will comply with the informed consent form signed by the participants.
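
One common de-identification step is to replace direct identifiers with random subject codes. The Python sketch below shows the idea; the function name `assign_subject_codes`, the `subject_code` field, and the code format are assumptions for illustration only.

```python
import secrets


def assign_subject_codes(
    records: list[dict], id_fields: tuple[str, ...]
) -> tuple[list[dict], dict]:
    """Replace direct identifiers with random subject codes.

    Returns the de-identified records and the linking key, which should be
    stored separately under restricted access or destroyed.
    """
    key: dict[tuple, str] = {}
    used: set[str] = set()
    cleaned = []
    for record in records:
        identity = tuple(record[name] for name in id_fields)
        if identity not in key:
            code = "S" + secrets.token_hex(4)
            while code in used:  # regenerate on the rare collision
                code = "S" + secrets.token_hex(4)
            used.add(code)
            key[identity] = code
        stripped = {k: v for k, v in record.items() if k not in id_fields}
        cleaned.append({"subject_code": key[identity], **stripped})
    return cleaned, key
```

Removing direct identifiers does not by itself guarantee de-identification: combinations of remaining variables (for example school, age, and date) can still single out a child, so the shared dataset would need a separate disclosure review.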

### **Oversight of Data Management and Sharing**

Data management compliance will be monitored by the Principal Investigator [Your Name], with regular checks for data integrity and ongoing communication with the study team. Logs of data issues and resolutions will be maintained for proactive modification of data collection instruments. Oversight and reporting will align with institutional stewardship, reporting, and compliance processes.
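
The regular integrity checks and issue logs described above could be supported by a small script. The Python sketch below flags missing required values and tallies issues per variable so that recurring problem areas stand out. The function names and log fields are hypothetical, and only one kind of check (missing values) is shown.

```python
from collections import Counter
from datetime import date


def check_records(records: list[dict], required: tuple[str, ...]) -> list[dict]:
    """Return one log entry per missing required value."""
    issues = []
    for index, record in enumerate(records):
        for name in required:
            if record.get(name) in (None, ""):
                issues.append({
                    "logged_on": date.today().isoformat(),
                    "record_index": index,
                    "variable": name,
                    "issue": "missing value",
                    "resolved": False,
                })
    return issues


def issues_by_variable(issues: list[dict]) -> Counter:
    """Count logged issues per variable to highlight problem areas."""
    return Counter(entry["variable"] for entry in issues)
```

A variable that keeps appearing at the top of `issues_by_variable` is a candidate for revising the corresponding item in the data collection instrument.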

[Note: Replace placeholders like [Your Name], [Your ORCID], [Start Date], [End Date], etc., with actual information relevant to your project.]