allissadillman/DS-Training-Resources

Data Science Training Resources

R

  • RStudio Recipes: Learn data science basics using these R code snippets. Topics include everything from data tidying to building interactive apps. Note: You need to be logged into Posit Cloud to use these tutorials.

  • R for Data Science: This is an online book that teaches you how to do data science with R. You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualize it, and model it. This book provides a practicum of skills for data science.

  • Data Analysis with R Specialization: Build statistical mastery of data analysis with R, including basic data visualization, statistical testing and inference, and linear modeling. Note: This is a Coursera course and may not be free.

  • DataTrail: DataTrail is a no-cost, 14-week educational initiative for young adults who are high school or GED graduates. DataTrail aims to equip members of underserved communities with the necessary skills and support required to work in the booming field of data science.

  • Data Science Specialization: This 10-course specialization covers the concepts and tools you'll need throughout the entire data science pipeline, from asking the right kinds of questions to making inferences and publishing results using R. It is a Coursera course and may not be free.

  • R Programming for Data Science: This book teaches the fundamentals of R programming using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for learning data science.

  • Tidyverse Skills for Data Science in R: This is a Leanpub book that will help you develop insights from data using tidy tools. The Tidyverse R packages allow you to import, wrangle, visualize, and model data.

  • Teacups, Giraffes, & Statistics: A delightful series of modules to learn statistics and R coding for students, scientists, and stats-enthusiasts.

  • swirl: swirl teaches you R programming and data science interactively, at your own pace, and right in the R console.

  • Exercism: Exercism offers programming puzzles to solve against a provided set of test cases. Mimicking the workflow of test-driven development (TDD), Exercism emphasizes iteration and refactoring. After solving a puzzle, solutions can be discussed with a mentor, and peers' solutions can be reviewed.

  • Esquisse: This R add-in lets you explore your data quickly to extract the information they hold. The interactive plots also come with the code used to generate them, which can be a useful way to learn data visualization with ggplot2.

  • ggThemeAssist: This R add-in provides a graphical user interface for customizing ggplot2 visualization themes. You can modify the graph's attributes in real time, and this package will modify your code for the graph output.

  • Statistics with R Specialization: Master statistics with R in this Coursera specialization, which includes five courses covering inference, modeling, and Bayesian approaches.

Python

  • A Whirlwind Tour of Python: A fast-paced introduction to essential features of the Python language aimed at researchers and developers who are already familiar with programming in another language. The material is particularly designed for those who wish to use Python for data science and/or scientific programming.

  • Python Challenge: In this game, each level can be solved by a bit of programming. You will be able to solve most riddles in any programming language, but some of them will require Python.

  • Data 8: The Foundations of Data Science: This UC Berkeley Foundations of Data Science course combines three perspectives: inferential thinking, computational thinking, and real-world relevance. The course teaches critical concepts and skills in computer programming and statistical inference in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social issues surrounding data analysis, such as privacy and design. The course is Python-based.

  • Automate the Boring Stuff with Python: This free e-book covers how to use Python to write programs that do in minutes what would take you hours to do by hand, with no prior programming experience required. Once you've mastered the basics of programming, you'll create Python programs that effortlessly perform useful feats of automation.

  • Practice Python: This site provides over 30 beginner Python exercises that are just waiting to be solved. Each exercise includes a brief discussion of a topic and a link to a solution. New exercises are posted monthly.

  • Python for Data Science and AI: This 4-module introduction to Python will kickstart your learning of Python for data science, as well as programming in general. This beginner-friendly Python course will take you from zero to programming in Python in hours.

  • Python for Everybody Specialization: Learn to Program and Analyze Data with Python. Develop programs to gather, clean, analyze, and visualize data.

  • Data Analysis with Python: This course will take you from the basics of Python to exploring many different data types. You will learn how to prepare data for analysis, perform simple statistical analysis, create meaningful data visualizations, predict future trends from data, and more.

  • Data Visualization with Python: This course will teach you how to take data that, at first glance, has little meaning and present it in a form that makes sense to people. Various techniques have been developed for presenting data visually, but in this course, we will be using several data visualization libraries in Python, namely Matplotlib, Seaborn, and Folium.

  • Machine Learning with Python: This course dives into the basics of machine learning using Python. You will learn about the purpose of Machine Learning and where it applies to the real world. You will also get a general overview of machine learning topics such as supervised vs unsupervised learning, model evaluation, and Machine Learning algorithms.

General Bioinformatics

  • Rosalind: Rosalind is a platform for learning bioinformatics and programming through problem solving.

  • Learn Galaxy: Galaxy is a scientific workflow, data integration, and data and analysis persistence and publishing platform that aims to make computational biology accessible and freely available to research scientists without computer programming experience.

  • CyVerse Learning Center: CyVerse provides life scientists with free powerful computational infrastructure to handle huge datasets and complex analyses, thus enabling data-driven discovery. This platform provides data storage, bioinformatics tools, image analyses, cloud services, APIs, and more.

  • Modern Statistics for Modern Biology: This book takes a hands-on approach and aims to enable scientists working in biological research to quickly learn many of the important ideas and methods that they need to make the most of their experiments and other available data.

  • Data Carpentry for Biologists: This website hosts introductory material for teaching biologists how to interact with data, including data structure, database management systems, and programming for data manipulation, analysis, and visualization. Most of the modules use R.

  • Statistical Methods for Functional Genomics: This site shares CSHL Statistical Methods for Functional Genomics materials, which use R and Bioconductor.

  • Python for Biologists: This is a collection of episodes with videos, code, and exercises for learning the basics of the Python programming language through genomics examples.

  • Genomic Data Science Specialization: This 8-course Coursera specialization covers the concepts and tools to understand, analyze, and interpret data from next-generation sequencing experiments. It teaches the most common tools used in genomic data science, including how to use the command line, Python, R, Bioconductor, and Galaxy.

  • Bioinformatics for the Terrified: This course will give you a broad overview of how bioinformatics can enable bench-based research. It is aimed at experimental researchers in the molecular life sciences who have little or no previous experience using bioinformatics databases or tools.

  • Quantitative Trait Mapping: This lesson introduces genetic mapping using qtl2, an R package for analyzing quantitative phenotypes and genetic data from complex crosses like the Diversity Outbred (DO). Genetic mapping with qtl2 allows researchers in fields as diverse as medicine, evolution, and agriculture to identify specific chromosomal regions that contribute to variation in phenotypes (quantitative trait loci or QTL).

  • A collection of genomic-scale cloud pipelines for bioinformatics

General Compute

  • Open Science Data Cloud (OSDC): The OSDC is a data science ecosystem in which researchers can house and share their own scientific data, access complementary public datasets, build and share customized virtual machines with whatever tools are necessary to analyze their data, and perform the analysis to answer their research questions.

  • Jetstream: A free national science & engineering cloud with a focus on ease of use and broad accessibility, Jetstream is designed for those who have not previously used high-performance computing and software resources, including some useful preconfigured virtual machines (VMs).

  • Cloud Based Data Science: A free online educational program to help anyone who can read, write, and use a computer to move into data science. It is a sequence of 11 Coursera courses offered by faculty members in the Johns Hopkins Department of Biostatistics, Bloomberg School of Public Health.

  • A gallery of interesting Jupyter Notebooks: This page is a curated collection of Jupyter/IPython notebooks that include interesting visual or technical content on a wide variety of programming and scientific computing topics such as image processing, NLP, and machine learning.

General Computer Science

  • Happy Git and GitHub for the useR: This tutorial will help you install Git and get it working smoothly with GitHub, in the shell and in RStudio, develop a few key workflows that cover your most common tasks, and integrate Git and GitHub into your daily work with R and RMarkdown.

  • Learn Git Branching: A visual and interactive way to learn Git. You'll be challenged with exciting levels, given step-by-step demonstrations of powerful features, and maybe even have a bit of fun along the way.

  • The Missing Semester of Your CS Education: Classes teach you all about advanced topics within CS, from operating systems to machine learning, but there’s one critical subject that’s rarely covered and is instead left to students to figure out on their own: proficiency with their tools. Learn how to master the command line, use a powerful text editor, use fancy features of version control systems, and much more!

  • How to Think Like a Computer Scientist: Interactive Edition: This book aims to teach you to think like a computer scientist. This way of thinking combines some of the best features of mathematics, engineering, and natural science. Chapters include interactive exercises and cover topics like iteration, selection, debugging, and more.

  • PacVim: PacVim is a fun game that teaches you vim commands. Vim is often called a "programmer's editor" and is perfect for all kinds of text editing, from composing emails to editing configuration files.

  • Kaggle Notebook Collection: Collection of useful Kaggle notebooks including various tutorials.

  • Introduction to Computer Science and Programming Specialization: This 3-course Coursera specialization covers topics ranging from basic computing principles to the mathematical foundations required for computer science. You will learn fundamental concepts of how computers work, which can be applied to any software or computer system. You will also gain the practical skillset needed to write interactive, graphical programs at an introductory level.

  • Software Development Processes and Methodologies: In this course, you will get an overview of how software teams work, processes they use, and industry-standard methodologies such as traditional, lean, and agile development methodologies.

  • Software Design and Architecture Specialization: In the 4-course Coursera Software Design and Architecture Specialization, you will learn how to apply design principles, patterns, and architectures to create reusable and flexible software applications and systems. You will learn how to express and document the design and architecture of a software system using a visual notation.

Datasets

  • Tidy Tuesday: Join the R4DS Online Learning Community in the weekly #TidyTuesday event! Every week, they post a raw dataset, a chart, or an article related to that dataset and ask you to explore the data. The goal of TidyTuesday is to apply your R skills, get feedback, explore others’ work, and connect with the greater #RStats community.

  • Kaggle Datasets: Explore, analyze, and share quality data. You can learn more about data types, creating, and collaborating.

  • Data Is Plural: A weekly newsletter of useful/curious datasets.

  • A list of public datasets: This is a list of high-quality, topic-centric public data sources. They are collected and tidied from blogs, answers, and user responses. Most of the datasets listed are free; however, some are not.
