This repository has been archived by the owner on Sep 30, 2022. It is now read-only.

Meetup series about exploratory data analysis with the tidyverse



2DegreesInvesting/ds.tidyeda


A series of meetups about exploring data with the tidyverse.

Syllabus

The goal is to learn how to explore data with the tidyverse. By the end of this series you will be able to do things like these:

  • Explore the variation in your data – visually with ggplot2 and analytically with dplyr.
  • Manipulate and visualize unusual and missing values.
  • Explore the covariation between categorical and continuous variables.
  • Explore patterns in your data and extract them with a model.

Each item in this syllabus corresponds to a meetup and to a GitHub release that preserves a snapshot of this repository exactly as it was shown during the meetup.

Prerequisites

This meetup helps you quickly meet the prerequisites for the rest of this meetup series. It covers the minimum elements of a defensive workflow and of the data-science toolkit you’ll need during this series.

Objectives:

  • Overview the elements of a defensive workflow and know where to learn more.

The data science workflow and toolkit: An overview

This meetup gives an overview of the data-science workflow and the toolkit you’ll need for the rest of this meetup series.

Objectives:

  • Overview the data-science workflow and toolkit.
  • Introduce rmarkdown documents.
  • Overview the tool for visualization: ggplot2.
  • Overview the tool for transformation: dplyr.

Exploring variation with ggplot2 and dplyr

This meetup covers Variation.

Objectives:

  • Understand the role of “questions” in an exploratory data analysis.
  • Review the definitions of the main components of a data set.
  • Explore the distribution of a categorical variable.
  • Explore the distribution of a continuous variable.
  • Explore the distribution of multiple groups of data in a single plot.

Typical values

This meetup covers strategies to manipulate and visualize typical values.

Objectives:

  • Visualize the most common and rare values in a dataset.

Unusual and missing values

This meetup covers strategies to manipulate and visualize unusual and missing values.

Objectives:

  • Explore outliers analytically, and by zooming into a plot.
  • Transform incorrect values into missing values.
  • Convert missing values into a logical variable so you can visualize them.

Covariation: One categorical and one continuous variable

This meetup covers the first part of Covariation: Visualizing the relationship between a categorical variable and a continuous variable. We’ll also discuss some ways to make the code of your plots clearer using “noise cancelling” techniques.

Objectives:

  • Visualize the covariation between a categorical and a continuous variable.

  • Omit common, memorable arguments: data, mapping, x, and y.

  • Save ggplot elements and plots as variables.

Covariation: Two categorical variables and two continuous variables

This meetup covers the last part of Covariation: Exploring the relationship between two categorical variables and between two continuous variables.

Objectives:

  • Explore the covariation between two categorical variables.

  • Map count to area and to colour-fill.

  • Explore the covariation between two continuous variables.

  • Solve overplotting with transparency and bins.

  • Categorize a continuous variable, then use a boxplot.

  • Map the width of a boxplot to the number of data points.

Patterns and models

This meetup covers Patterns and models.

Objectives:

  • Explore the relationship between two variables.

  • Extract a pattern with a model.

  • Explore the relationship again, after removing the pattern.

Resources