GitHub - 2DegreesInvesting/ds.tidyeda: Meetups series about exploratory data analysis with the tidyverse

A series of meetups about exploring data with the tidyverse.

Syllabus

The goal is to learn how to explore data with the tidyverse. At the end of this series you will be able to do things like these:

Explore the variation in your data – visually with ggplot2 and analytically with dplyr.
Manipulate and visualize unusual and missing values.
Explore the covariation between categorical and continuous variables.
Explore patterns in your data and extract them with a model.

Each item in this syllabus corresponds to a meetup and a GitHub release that preserves a snapshot of this repository exactly as it was shown during the meetup.

Prerequisites

This meetup helps you quickly meet the prerequisites for the rest of this meetup-series. It covers the minimum elements of a defensive workflow, and of the data-science toolkit you’ll need during this series.

Objectives:

Overview the elements of a defensive workflow and know where to learn more.

The data science workflow and toolkit: An overview

This meetup overviews the data science workflow and the toolkit you’ll need for the rest of this meetup series.

Objectives:

Overview the data-science workflow and toolkit.
Introduce rmarkdown documents.
Overview the tool for visualization: ggplot2.
Overview the tool for transformation: dplyr.
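As a rough illustration of the two tools named above, the following sketch transforms a dataset with dplyr and visualizes it with ggplot2. It assumes the `mpg` dataset that ships with ggplot2; the meetup itself may use different data, and the object names `mpg_by_class` and `mpg_plot` are made up for this example.

```r
library(dplyr)
library(ggplot2)

# Transformation with dplyr: mean highway mileage and row count per vehicle class.
# `mpg` is an assumed example dataset bundled with ggplot2.
mpg_by_class <- mpg %>%
  group_by(class) %>%
  summarise(mean_hwy = mean(hwy), n = n())

# Visualization with ggplot2: engine displacement versus highway mileage.
mpg_plot <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

mpg_by_class
mpg_plot
```

The same chunk can be pasted into an rmarkdown document, where the table and the plot would render below the code.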

Exploring variation with ggplot2 and dplyr

This meetup covers Variation.

Objectives:

Understand the role of “questions” in an exploratory data analysis.
Review the definitions of the main components of a data set.
Explore the distribution of a categorical variable.
Explore the distribution of a continuous variable.
Explore the distribution of multiple groups of data in a single plot.
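The objectives above could be approached along these lines. This is a minimal sketch that assumes the `diamonds` dataset bundled with ggplot2; the bin widths are arbitrary choices for illustration, and the variable names are hypothetical.

```r
library(dplyr)
library(ggplot2)

# Categorical variable: bar heights show how many diamonds have each cut.
cut_bars <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()
# The same information, computed analytically.
cut_counts <- diamonds %>% count(cut)

# Continuous variable: a histogram with 0.5-carat bins.
carat_hist <- ggplot(diamonds, aes(x = carat)) +
  geom_histogram(binwidth = 0.5)
# The same bins, computed analytically.
carat_counts <- diamonds %>% count(cut_width(carat, 0.5))

# Multiple groups in a single plot: overlaid lines are easier to compare
# than overlapping bars.
carat_by_cut <- ggplot(diamonds, aes(x = carat, colour = cut)) +
  geom_freqpoly(binwidth = 0.1)

cut_counts
carat_counts
```

Printing `cut_bars`, `carat_hist`, or `carat_by_cut` draws the corresponding plot.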

Typical values

This meetup covers strategies to manipulate and visualize typical values.

Objectives:

Visualize the most common and rare values in a dataset.
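One possible way to look for common and rare values is sketched below. It assumes the `diamonds` dataset from ggplot2 and an arbitrary cutoff of 3 carats to drop the sparse right tail; both are illustrative assumptions, not necessarily what the meetup uses.

```r
library(dplyr)
library(ggplot2)

# Focus on the bulk of the data (the 3-carat cutoff is an assumption).
smaller <- diamonds %>% filter(carat < 3)

# Narrow bins reveal which carat values are common and which are rare.
carat_hist <- ggplot(smaller, aes(x = carat)) +
  geom_histogram(binwidth = 0.01)

# The same question answered analytically: sort values by frequency.
carat_freq <- smaller %>% count(carat, sort = TRUE)

head(carat_freq, 3) # most common values
tail(carat_freq, 3) # rarest values
```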

Unusual and missing values

This meetup covers strategies to manipulate and visualize unusual and missing values.

Objectives:

Explore outliers analytically, and by zooming into a plot.
Transform incorrect values into missing values.
Convert missing values into a logical variable so you can visualize them.
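The three objectives above might look like this in practice. The sketch assumes the `diamonds` dataset from ggplot2 and treats values of `y` below 3 or above 20 as incorrect; those thresholds are illustrative assumptions, as are the variable names.

```r
library(dplyr)
library(ggplot2)

# Zoom into the plot: coord_cartesian() limits the view without dropping data,
# so rare bins become visible.
y_zoom <- ggplot(diamonds, aes(x = y)) +
  geom_histogram(binwidth = 0.5) +
  coord_cartesian(ylim = c(0, 50))

# Explore the outliers analytically (thresholds are assumptions).
unusual <- diamonds %>%
  filter(y < 3 | y > 20) %>%
  select(price, x, y, z) %>%
  arrange(y)

# Transform incorrect values into missing values instead of dropping rows.
diamonds2 <- diamonds %>%
  mutate(y = ifelse(y < 3 | y > 20, NA, y))

# Convert missingness into a logical variable so it can be mapped to colour.
diamonds2 <- diamonds2 %>%
  mutate(y_missing = is.na(y))

missing_plot <- ggplot(diamonds2, aes(x = price, colour = y_missing)) +
  geom_freqpoly(binwidth = 500)

unusual
```

Replacing values with `NA` keeps every row, so the other variables of those observations remain available for analysis.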

Covariation: One categorical and one continuous variable

This meetup covers the first part of Covariation: Visualizing the relationship between a categorical variable and a continuous variable. We’ll also discuss some ways to make the code of your plots clearer using “noise cancelling” techniques.

Objectives:

Visualize the covariation between a categorical and a continuous variable.
Omit common, memorable arguments: data, mapping, x, and y.
Save ggplot elements and plots as variables.
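The "noise cancelling" ideas listed above can be sketched as follows, assuming the `diamonds` dataset from ggplot2. The variable names (`verbose`, `concise`, `base`, `flip`, and so on) are hypothetical.

```r
library(ggplot2)

# Fully spelled out: every argument is named.
verbose <- ggplot(data = diamonds, mapping = aes(x = cut, y = price)) +
  geom_boxplot()

# The same plot, omitting the common argument names data, mapping, x, and y.
concise <- ggplot(diamonds, aes(cut, price)) +
  geom_boxplot()

# Save ggplot elements and plots as variables, then combine them.
base <- ggplot(diamonds, aes(cut, price))
flip <- coord_flip()

box <- base + geom_boxplot()
box_flipped <- box + flip

box_flipped
```

Omitting the names works because `data` and `mapping` are the first two arguments of `ggplot()`, and `x` and `y` are the first two arguments of `aes()`.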

Covariation: Two categorical variables and two continuous variables

This meetup covers the last part of Covariation: Exploring the relationship between two categorical variables and between two continuous variables.

Objectives:

Explore the covariation between two categorical variables.
Map count to area and to colour-fill.
Explore the covariation between two continuous variables.
Solve over-plotting with transparency and bins.
Categorize a continuous variable then use a boxplot.
Map the width of a boxplot to the number of data points.
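Each objective above maps to a small ggplot2 idiom. The following is a sketch under the assumption that the `diamonds` dataset from ggplot2 is used; the 3-carat filter, the transparency level, and the 0.1-carat bin width are arbitrary illustrative choices.

```r
library(dplyr)
library(ggplot2)

# Two categorical variables: map the count to the area of a point.
count_area <- ggplot(diamonds, aes(cut, color)) +
  geom_count()

# Two categorical variables: map the count to the fill colour of a tile.
cut_color <- diamonds %>% count(color, cut)
count_fill <- ggplot(cut_color, aes(color, cut)) +
  geom_tile(aes(fill = n))

# Two continuous variables: reduce over-plotting with transparency.
smaller <- diamonds %>% filter(carat < 3)
transparent <- ggplot(smaller, aes(carat, price)) +
  geom_point(alpha = 1 / 100)

# Two continuous variables: reduce over-plotting with rectangular bins.
binned <- ggplot(smaller, aes(carat, price)) +
  geom_bin2d()

# Categorize a continuous variable, then use a boxplot per bin.
# varwidth = TRUE makes each box's width reflect its number of data points.
carat_boxes <- ggplot(smaller, aes(carat, price)) +
  geom_boxplot(aes(group = cut_width(carat, 0.1)), varwidth = TRUE)

cut_color
```

Printing any of the saved plot objects draws it.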

Patterns and models

This meetup covers Patterns and models.

Objectives:

Explore the relationship between two variables.
Extract a pattern with a model.
Explore the relationship again, after removing the pattern.
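The three steps above can be sketched with a linear model from base R. This assumes the `diamonds` dataset from ggplot2 and a log-log relationship between `price` and `carat`; the choice of model is an illustrative assumption, not necessarily the one shown at the meetup.

```r
library(dplyr)
library(ggplot2)

# 1. Explore the relationship between two variables.
relationship <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 1 / 100)

# 2. Extract the pattern with a model (assumed form: log-log linear).
mod <- lm(log(price) ~ log(carat), data = diamonds)

# 3. Remove the pattern: the residuals are what carat does not explain.
#    exp() puts them back on the original (multiplicative) price scale.
diamonds_resid <- diamonds %>%
  mutate(resid = exp(residuals(mod)))

# Explore again: with the effect of carat removed, how does price relate to cut?
resid_by_cut <- ggplot(diamonds_resid, aes(cut, resid)) +
  geom_boxplot()

coef(mod)
```

Once the strong effect of `carat` is removed, the remaining variation in price can be compared across other variables such as `cut`.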

Resources

YouTube playlist.
The ds-incubator project.
Ideas for future meetups.
Exploratory data analysis (R for data science).
Related issue.
Slides.

