Course Description
Scientific discovery is typically a collective process, as researchers build their work on the preceding efforts of other researchers. This is certainly the case for theory, empirical evidence, and methods, as empirical researchers use analytical techniques developed by methodologists, theoreticians build on up-to-date evidence, and data collection inspires new methods of analysis. The reality is that contemporary research is not possible in isolation. A key element of the web of research relationships is the basic unit of research output, which typically takes the form of a journal paper, book chapter, or report. This unit of output, however, represents only the face of a multilayered process, and by its very nature is limited in the amount of information that it can communicate.
Increasingly, recent technologies make it easier and less expensive to communicate research efficiently. From data repositories to supplementary e-content in journals, as well as data policy requirements of research funders, there is a strong incentive for research to become more open and reproducible. Reproducibility means that research results can be verified independently, including all relevant assumptions and decisions. Every figure, every table, and every result is open for inspection, including the processes used to generate them. Research reproducibility is essential to maintain trust in the process, and has numerous advantages, including accelerating discovery and reducing inequality in access to research tools and results. Furthermore, other researchers can more easily use methods and tools if they are open. Not surprisingly, as newer technologies facilitate the transfer of research findings (including open data, open software, and open publishing), there has been a growth of interest in ways of achieving openness and reproducibility.
The objective of this course is to equip students with the fundamental concepts and tools needed to develop a reproducible research workflow. The course should be of interest to new graduate students in the sciences and social sciences, and is relevant to research involving qualitative or quantitative data. The course is also appropriate for experienced researchers who would like to update their workflow to comply with reproducibility criteria.
The course will cover the following topics:
- Fundamentals of reproducible research
- Basic tools for implementing a reproducible research workflow: GitHub and R
- Data Management Plans
- Creating basic units of shareable code
- Documenting the process of doing research
- Generating reproducible research documents
By the end of the course, students will produce a report with all the components needed to make it a unit of reproducible research. In the spirit of the course, the materials will draw mostly on open resources.
Antonio Paez | Professor |
---|---|
Office: GSB 236 | |
Office Hours: Tuesday 11:30 am – 12:30 pm | |
Phone: (905) 525-9140, ext. 26099 | |
Email: paezha@mcmaster.ca | |

Krysha Dukacz | GWF Data Manager |
---|---|
Office: GSB 218 | |
Office Hours: TBD | |
Phone: (905) 525-9140, ext. 20132 | |
Email: dukaczka@mcmaster.ca | |

The course will be organized in weekly 2-hour meetings. The format of the meetings will be a combination of seminar-style discussion, hands-on activities, and guest speakers. The topics and readings are found in the Course Schedule.
Students are responsible for completing the readings indicated in the Course Schedule. Any resources that are not open will be shared by the instructors.
Students will be assessed based on the completion of a sequence of activities. Note that the activities are designed to combine towards one final deliverable, so it is not advisable to skip any of them.

Activity | Weight |
---|---|
Activity 1: R Markdown Exercise | 5% |
Activity 2: Version Control Exercise | 5% |
Activity 3: Initial DMP | 10% |
Activity 4: Updated DMP | 10% |
Activity 5: Data Package | 15% |
Activity 6: Data Analysis Documentation | 15% |
Activity 7: Peer Review Exercise | 20% |
Final Deliverable | 20% |

McMaster’s graduate grading system will be used. Note that according to section 2.5.3 of the Graduate Calendar, located at http://www.mcmaster.ca/graduate/grad_calendar.pdf, the only passing grades are A+, A, A-, B+, B, and B-.
Academic dishonesty consists of misrepresentation by deception or by other fraudulent means and can result in serious consequences, e.g. the grade of zero on an assignment, loss of credit with a notation on the transcript (notation reads: “Grade of F assigned for academic dishonesty”), and/or suspension or expulsion from the university. It is your responsibility to understand what constitutes academic dishonesty. For information on the various kinds of academic dishonesty please refer to the Academic Integrity Policy, specifically Appendix 3, located at http://www.mcmaster.ca/univsec/policy/AcademicIntegrity.pdf
The following illustrates only three forms of academic dishonesty:
- Plagiarism, e.g. the submission of work that is not one’s own or for which other credit has been obtained.
- Improper collaboration in group work.
- Copying or using unauthorized aids in tests and examinations.
Week 1 (Sept. 11)
Topic: Course overview and introduction: Why reproducible research?
Readings: No readings this week
For discussion: Principles of open science, advantages, funding and policy environment, journal policies and the publication process, roadmap for course
Week 2 (Sept. 18)
Topic: R + RStudio + markdown
Suggested Readings:
What is R?
R for Data Science
What is Markdown
Activity 1: Use markdown to create a document with basic operations in R
Week 3 (Sept. 25)
Topic: Version Control and GitHub
Readings:
What is version control?
What is GitHub?
Activity 2: Post a README notice in GitHub and one document with basic operations in R
Week 4 (Oct. 2)
Topic: Data Management Plans (DMP): Principles
Readings:
10 aspects of highly effective research data
Activity: Create a list of data that you will be creating and using as part of your project
Week 5 (Oct. 9)
Topic: Data Management Plans (DMP): Tools
Readings: TBD
Activity 3: Write an initial DMP and post in GitHub
Week 6 (Oct. 16)
Topic: Reading week
Readings: N/A
Week 7 (Oct. 23)
Topic: Forensic issues and archiving
Readings: TBD
Activity 4: Update the DMP and post in GitHub
Week 8 (Oct. 30)
Topic: Creating packages in R and documenting datasets
Readings:
Writing an R package from scratch
R Package Primer - A minimal Example
R Packages
Building R Packages
Activity 5: Create a small package with a dataset
Week 9 (Nov. 6)
Topic: Documenting data analysis and use of R Markdown
Readings:
Ten Simple Rules for Reproducible Computational Research
Best Practices for Scientific Computing
Activity 6: Create an R Markdown file with documented data analysis (a vignette for your package)
Week 10 (Nov. 13)
Topic: Peer review and collaboration
Readings: Review readings of Sessions 7 and 8
Activity 7: In-class peer review of packages and vignettes; revisions due in GitHub
Week 11 (Nov. 20)
Topic: Rticles and practical issues preparing self-contained open research documents (math notation and figures)
Readings:
LaTeX for Beginners
ggplot2: A Package for a Grammar of Graphics
Activity: No activity this week
Week 12 (Nov. 27)
Topic: Rticles and practical issues preparing self-contained open research documents (tables and citations)
Readings:
BibTeX
KableExtra for HTML
KableExtra for PDF
Activity: Final deliverable due on DATE TBD.
Week 13 (Dec. 4)
Topic: Extras: 3D plots
Readings: No readings assigned
The University reserves the right to change any aspect of this course outline.