Data Science in Education Using R

Note from Our Publisher

The authors of this text and the publisher Taylor and Francis are pleased to make Data Science in Education Using R available via bookdown at They request that readers access the book via the website or in print form only and do not download or reproduce copies in any other form. Any attempt to do so will be considered a contravention of the publisher’s terms of availability.

Reading the Book

We wrote this book for you and are excited to share it! You can read the current version at The print version is available now through Routledge.

The Aims of This Book

School districts, government agencies, and education businesses are generating data at a dizzying pace. They're serving it to teachers, administrators, and education consultants in a mind-boggling variety of formats. Educators and educational data practitioners wanting to use data to improve the lives of students know the questions they want to ask, but the available data is often not ready to be analyzed. Sometimes educators need to use high-cost proprietary systems to access and prepare data before using it to answer their questions.

Educational data rarely comes in a “ready-to-analyze” format. As a result, it's hard for enthusiastic practitioners to feel a connection between their questions and the data needed to answer them. To get value from the data deluge, some educational data practitioners are adopting data science tools like R, an open-source programming language for data analysis. When data science meets education, the numbers confined to websites and PDF reports are set free. Teachers, administrators, and consultants apply programming and statistics to prepare data, transform it, visualize it, and analyze it to answer questions that make a difference for their students.

Our book focuses on data science in education, which we define as using data science techniques like preparing, exploring, visualizing, and modeling data, in order to support schooling at all levels. We want to make a case for learning about data science through field-specific examples. Understanding the unique challenges and starting to use a common field-specific language is important for mastering data science in education. We feel that discussing data science using education-specific scenarios more effectively speaks to the needs of educators.

Technology is transforming both the administrative and student-facing sides of education. It's becoming increasingly important for educators, not just people hired to analyze data, to understand what stories this new data tells them about their students. Our book empowers educators from elementary school to higher education to transform educational data into actionable insights that help them serve their students and institutions. We wrote our book to be used as a main textbook in a graduate data science in education course. We also wrote it as a practical reference for data scientists working with education data.

By the end of this book the reader will understand:

  • The diversity of data analysis skills and applications in the education field
  • Special considerations that come with analyzing education data
  • That good data analysis has a basic workflow
  • The wonderful opportunity we have to shape the usefulness of data science in our education jobs

The reader will also be able to:

  • Reflect on and define their role as a data analyst and educator
  • Identify and apply solutions to education data’s unique challenges, such as cleaning datasets and working with aggregate student data
  • Apply a basic analytic workflow through practice with education datasets
  • Be thoughtful, empathetic, and effective when introducing data science techniques in their education jobs


  1. Introduction: Data Science in Education - You’re Invited to the Party!

  2. How to Use This Book

  3. What Does Data Science in Education Look Like?

  4. Special Considerations

  5. Getting Started with R and RStudio

  6. Foundational Skills

  7. Walkthrough 1: The Education Data Science Pipeline With Online Science Class Data

  8. Walkthrough 2: Approaching Gradebook Data From a Data Science Perspective

  9. Walkthrough 3: Using School-Level Aggregate Data to Illuminate Educational Inequities

  10. Walkthrough 4: Longitudinal Analysis With Federal Students With Disabilities Data

  11. Walkthrough 5: Text Analysis With Social Media Data

  12. Walkthrough 6: Exploring Relationships Using Social Network Analysis With Social Media Data

  13. Walkthrough 7: The Role (and Usefulness) of Multi-Level Models

  14. Walkthrough 8: Predicting Students’ Final Grades Using Machine Learning Methods with Online Course Data

  15. Introducing Data Science Tools To Your Education Job

  16. Teaching Data Science

  17. Learning More

  18. Additional Resources

  19. Conclusion: Where to Next?

  20. Appendices


This project started in the #dataedu Slack channel. You can join the workspace here.

Community members can contribute by making changes through a pull request. We encourage community members to do their pull requests on separate branches. This helps us keep all the changes synced up.
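
If you're new to branching, the following is a minimal sketch of making a change on a separate branch. It runs in a throwaway local repository so you can try it safely. The repository name `demo-book`, the branch name `fix-typo-readme`, and the file contents are all hypothetical stand-ins, not names used by this project.

```shell
# A sketch of working on a separate branch, in a scratch repository.
# All names below (demo-book, fix-typo-readme) are hypothetical examples.
set -eu

# Create a throwaway repository that stands in for your clone of the book.
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet demo-book
cd demo-book
git config user.email "you@example.com"
git config user.name "Your Name"
echo "# Demo" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# Create a separate branch for your change and switch to it.
git checkout --quiet -b fix-typo-readme

# Make the change, then stage and commit it on that branch.
echo "A small edit" >> README.md
git add README.md
git commit --quiet -m "Fix typo in README"

# Confirm which branch you are on.
git rev-parse --abbrev-ref HEAD
```

In a real clone, you would then push the branch (for example with `git push -u origin fix-typo-readme`, assuming your remote is named `origin`) and open the pull request on GitHub.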

Git Issue Labels

To help contributors participate, we're using labels so community members can identify tasks they want to help with. When working on an issue, assign yourself to the issue. This helps us keep track of the work and lets us know who to contact for more collaboration. The labels are:

  • good first issue: These are requests for changes that we think would be fun and achievable if you're new to git and GitHub.

  • discussion: Sometimes we need help talking through a topic to help us make a good design choice for our readers. These issues won't always result in a change, but they help us clarify what's best for the final product.

  • test code: These issues are for running code and giving feedback about how it went. If there were problems, you can help us by letting us know what happened.

  • bug: The code isn't running as expected and needs fixing.

  • help wanted: Need help getting code to run or writing a section. We'll make sure the problem we're working on is clearly described in the issue.

  • writing: New content needed. At least one author will be assigned to writing issues, but we welcome collaboration! Feel free to message the author on Slack or in the issue comments to coordinate.

  • review draft: These are requests to read through a draft chapter and provide feedback on the experience, including readability.

Contact Us

If you have questions, comments, or ideas you can reach the authors by email at or on Twitter:


Bovee, E. A., Estrellado, R. A., Mostipak, J., Rosenberg, J. M., & Velásquez, I. C. (under contract). Data science in education using R. London, England: Routledge. Nb. All authors contributed equally.