
daroczig/CEU-R-lab

2018-fall

This is the R script repository of the "Data Analysis 1a: Exploration" course in the 2018/2019 Fall term, part of the MSc in Business Analytics at CEU. For the previous years, see the 2015/2016 Winter, 2016/2017 Fall and 2017/2018 Fall branches.


Syllabus

See the syllabus folder of this repository.

Technical Prerequisites

Please bring your own laptop and make sure to install the following items before attending the first class:

  1. Install R from https://cran.r-project.org
  2. Install RStudio Desktop (Open Source License) from https://www.rstudio.com/products/rstudio/download
  3. Register an account at https://github.com
  4. Enter the following commands in the R console (bottom left panel of RStudio) and make sure you see a plot in the bottom right panel and no errors in the R console:
install.packages('ggplot2')
library(ggplot2)
ggplot(diamonds, aes(cut)) + geom_bar()

Optional steps I highly suggest completing before attending the class if you plan to use git:

  1. Bookmark, watch or star this repository so that you can easily find it later

  2. Install git from https://git-scm.com/

  3. Verify that in RStudio you can see the path of the git executable in the "Git/SVN" tab of the Tools/Global Options menu. If not, you might have to restart RStudio (if you installed git after starting RStudio), or you installed git on Windows without adding it to the PATH. Either way, you can browse for the "git executable" manually (look for the git executable file in a bin folder of your git installation).

  4. Create an RSA key (optionally with a passphrase for increased security, which you then have to enter every time you push to and pull from GitHub). Copy the public key and add it to the SSH keys on your GitHub profile.

  5. Create a new project choosing "version control", then "git" and paste the SSH version of the repo URL copied from GitHub in the pop-up -- now RStudio should be able to download the repo. If it asks you to accept GitHub's fingerprint, say "Yes".

  6. If RStudio/git is complaining that you have to set your identity, click on the "Git" tab in the top-right panel, then click on the Gear icon and then "Shell" -- here you can set your username and e-mail address in the command line, so that RStudio/git integration can work. Use the following commands:

    $ git config --global user.name "Your Name"
    $ git config --global user.email "Your e-mail address"
    

    Close this window, then commit and push your changes: you are all set.
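Step 3 can also be checked from the R console. Below is a minimal sketch using base R's `Sys.which()`, which returns the full path of an executable found on the PATH, or an empty string if it is not found. Note that RStudio may still locate git through the path set in its Global Options even when this check fails, so treat it as a quick diagnostic rather than a definitive test.

```r
# Look up the git executable on the PATH ("" if not found)
git_path <- Sys.which("git")
git_found <- nzchar(git_path)

if (git_found) {
  message("git found at: ", git_path)
  # Print the globally configured user name (prints nothing if it is not set yet)
  system2("git", c("config", "--global", "user.name"))
} else {
  message("git not found on the PATH: browse for the executable in RStudio's Git/SVN settings")
}
```

If the user name comes back empty, set it with the `git config --global` commands from step 6.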

Find more resources in Jenny Bryan's "Happy Git and GitHub for the useR" tutorial, or contact me if in doubt.

Class Schedule

Week 1 (100 min): Introduction to R

  • General overview of the R ecosystem: slides
  • Basic math operations: 1.R
  • Numbers, strings, vectors: 1.R
  • Functions: 1.R
  • Basic plots: 1.R
  • Basic stats: 1.R
  • Intro to data frames: 1.R
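The Week 1 topics can be previewed with a few lines of base R. This is a minimal sketch, not the content of 1.R. The `f_to_c` function and the `scores` data frame are hypothetical examples.

```r
# Basic math operations
1 + 2 * 3      # 7: multiplication before addition
10 %/% 3       # 3: integer division
10 %% 3        # 1: remainder
sqrt(16)       # 4

# Numbers, strings, vectors
x <- c(2, 4, 6, 8)                  # numeric vector
greeting <- paste("Hello", "CEU")   # "Hello CEU"
nchar(greeting)                     # 9
x * 2                               # vectorized: 4 8 12 16

# Functions: a hypothetical Fahrenheit-to-Celsius converter
f_to_c <- function(f) {
  (f - 32) * 5 / 9
}
f_to_c(212)    # 100

# Basic stats
mean(x)        # 5
median(x)      # 5
sd(x)

# Basic plots
hist(x)

# Intro to data frames: filter rows by a condition
scores <- data.frame(name = c("a", "b", "c"), value = c(1, 5, 3))
str(scores)
scores[scores$value > 2, ]
```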

Suggested reading: Hadley Wickham: Style guide. In Advanced R.

Homework: DataCamp

Week 2 (100 min): Introduction to Data Frames and column types

  • Recap on data frames: 2.R
  • Loading data from text and Excel files: 2.R
  • Variable types, conversion between variable types: 2.R
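A minimal base R sketch of loading a text file and converting column types, assuming an inline CSV string instead of a real file so it runs anywhere (the columns and values are hypothetical). Missing-value markers could also be handled at load time via `read.csv(..., na.strings = "n/a")`; Excel files are commonly read with `readxl::read_excel()`, although the class may use a different package.

```r
# Inline CSV text keeps the example self-contained; normally pass a file path or URL
csv_text <- "id,price,city
1,120.5,Vienna
2,n/a,Budapest
3,85,Budapest"

hotels_raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(hotels_raw)   # price is read as character because of the "n/a" value

# Character to numeric: "n/a" becomes NA (R warns about NAs introduced by coercion)
hotels_raw$price <- as.numeric(hotels_raw$price)

# Character to factor
hotels_raw$city <- factor(hotels_raw$city)
levels(hotels_raw$city)       # "Budapest" "Vienna"

# Pitfall: converting a factor of numbers directly returns the level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                 # 1 2 1
as.numeric(as.character(f))   # 10 20 10
```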

Homework: DataCamp

Week 3 (100 min): Introduction to Data Transformations

  • Recap on data frames: 3.R
  • Creating new variables: 3.R
  • Finding missing values and duplicates: 3.R
  • Introduction to data.table: 3.R
  • Summarizing data, aggregates: 3.R
  • Combining datasets: 3.R
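The transformation steps of Week 3 are sketched below in base R on a hypothetical toy dataset, so the example runs without extra packages. The class itself uses data.table, where for instance the per-city average would be written as `dt[, .(avg_price = mean(price, na.rm = TRUE)), by = city]`.

```r
# Hypothetical toy data: one missing price and one fully duplicated row
hotels_demo <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  city  = c("Vienna", "Budapest", "Budapest", "Vienna", "Budapest"),
  price = c(120, 80, 80, NA, 95)
)

# Creating a new variable
hotels_demo$expensive <- hotels_demo$price > 100

# Finding missing values
sum(is.na(hotels_demo$price))            # 1 missing price
hotels_demo[is.na(hotels_demo$price), ]

# Finding and dropping duplicated rows
duplicated(hotels_demo)                  # FALSE FALSE TRUE FALSE FALSE
hotels_demo <- hotels_demo[!duplicated(hotels_demo), ]

# Summarizing: mean price per city (the formula interface drops rows with NA)
avg_price <- aggregate(price ~ city, data = hotels_demo, FUN = mean)
avg_price                                # Budapest 87.5, Vienna 120

# Combining datasets: left join city-level info onto each hotel
cities <- data.frame(city = c("Vienna", "Budapest"), country = c("AT", "HU"))
merged <- merge(hotels_demo, cities, by = "city", all.x = TRUE)
merged
```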

Homework: DataCamp

Week 4 (100 min): More Data Transformations

  • Recap on data.table summaries: 4.R
  • Recap on merging datasets: 4.R
  • Creating new variables - numeric to factor: 4.R
  • Creating new variables - numeric to numeric: 4.R
  • Demo: multiple summaries: 4.R
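A minimal base R sketch of the Week 4 variable transformations on simulated prices (the data and the category cut-offs are hypothetical): `cut()` turns a numeric variable into a factor, while log transformation and z-score standardization map numbers to numbers. In data.table, several summaries per group would typically be computed as `dt[, .(mean = mean(price), sd = sd(price)), by = group]`.

```r
set.seed(42)
price <- round(runif(10, min = 30, max = 300))   # hypothetical hotel prices in EUR

# Numeric to factor: bin prices into labelled categories (0,100], (100,200], (200,Inf]
price_cat <- cut(price, breaks = c(0, 100, 200, Inf), labels = c("low", "mid", "high"))
table(price_cat)

# Numeric to numeric: log transform and z-score standardization
log_price <- log(price)
z_price <- (price - mean(price)) / sd(price)     # same as as.numeric(scale(price))

# Multiple summaries in one go: apply each function to the same vector
sapply(
  list(mean = mean, median = median, sd = sd, min = min, max = max),
  function(f) f(price)
)
```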

Homework: DataCamp

Week 5 (100 min): Introduction to Data Visualization

  • Recap on data.table summaries and merging datasets: 5.R
  • Introduction to data visualization with ggplot2: 5.R
  • Scales and coordinate transformations: 5.R
  • Plotting numeric variables: 5.R
  • Recap on factors: 5.R
  • Facets: 5.R
  • Stacked and clustered bar charts: 5.R
  • Histograms and density plots: 5.R
  • Popular and custom themes: 5.R
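Several of the Week 5 building blocks can be tried on the `diamonds` dataset that ships with ggplot2 (the same one used in the installation check above). This is a minimal sketch, not the content of 5.R. It contrasts stacked and clustered bars, a coordinate flip, log scales, facets and themes.

```r
library(ggplot2)

# Stacked bar chart: number of diamonds per cut, filled by clarity
p_stacked <- ggplot(diamonds, aes(cut, fill = clarity)) + geom_bar()

# Clustered (side-by-side) bar chart: same mapping, bars dodged,
# with a built-in theme plus a custom tweak
p_dodged <- ggplot(diamonds, aes(cut, fill = clarity)) +
  geom_bar(position = "dodge") +
  theme_minimal() +
  theme(legend.position = "top")

# Coordinate transformation: horizontal bars
p_flipped <- p_stacked + coord_flip()

# Histogram and density plot of price on a log10 scale
p_hist <- ggplot(diamonds, aes(price)) + geom_histogram(bins = 50) + scale_x_log10()
p_dens <- ggplot(diamonds, aes(price, fill = cut)) + geom_density(alpha = 0.3) + scale_x_log10()

# Scatterplot with one facet per cut and a built-in theme
p_facets <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.1) +
  facet_wrap(~ cut) +
  theme_bw()

print(p_dodged)
print(p_facets)
```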

Suggested reading:

Homework: DataCamp

Ideas to practice using the hotels dataset:

  • plot a barplot on the number of hotels per popularity
  • plot a barplot on the number of hotels per popularity by feeding a data.table summary to ggplot2
  • plot a histogram on the prices in EUR
  • plot a histogram on the prices in EUR with a binwidth of 100 EUR
  • plot a histogram on the prices in EUR split by popularity
  • plot a boxplot on the prices below 1000 EUR split by city type
  • plot a boxplot on the prices in EUR split by popularity
  • plot a scatterplot on the prices in EUR and the distance from city center
  • add a linear model to the previous plot
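A possible starting point for some of these exercises is sketched below. It uses a simulated stand-in for the hotels dataset, and the column names `price_EUR`, `popularity`, `city_type` and `distance` are hypothetical, so adjust them to the real data. The bar chart is fed a pre-computed summary (here via base R's `table()`; with data.table this would be something like `hotels[, .N, by = popularity]`).

```r
library(ggplot2)

# Hypothetical stand-in for the hotels dataset: real column names may differ
set.seed(1)
n <- 200
hotels_demo <- data.frame(
  price_EUR  = round(rlnorm(n, meanlog = 4.5, sdlog = 0.5)),
  popularity = factor(sample(1:3, n, replace = TRUE)),
  city_type  = sample(c("big", "small"), n, replace = TRUE),
  distance   = runif(n, 0, 10)
)

# Barplot from a summary table: counts per popularity level, drawn with geom_col()
counts <- as.data.frame(table(popularity = hotels_demo$popularity))
ggplot(counts, aes(popularity, Freq)) + geom_col()

# Histogram of prices with 100 EUR wide bins, split by popularity
ggplot(hotels_demo, aes(price_EUR)) +
  geom_histogram(binwidth = 100) +
  facet_wrap(~ popularity)

# Boxplot of prices below 1000 EUR by city type
ggplot(subset(hotels_demo, price_EUR < 1000), aes(city_type, price_EUR)) + geom_boxplot()

# Scatterplot of price vs distance with a fitted linear model
p <- ggplot(hotels_demo, aes(distance, price_EUR)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
p
```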

Week 6 (100 min): Sampling, Simulations and Hypothesis Testing

  • Introduction to random sampling: 6.R
  • Introduction to writing loops: 6.R
  • Estimating the standard error: 6.R
  • Confidence intervals: 6.R
  • t-test: 6.R
  • Required sample size calculations: 6.R
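The Week 6 ideas fit together in a short base R sketch on a simulated population (the population parameters and the 5-unit effect size are hypothetical). A `for` loop draws repeated samples to estimate the standard error of the mean by simulation, which can be compared with the analytical formula; the confidence interval is computed by hand and then matched against `t.test()`, and `power.t.test()` gives the required sample size.

```r
set.seed(2018)

# Hypothetical population: 10,000 values from a normal distribution
population <- rnorm(10000, mean = 100, sd = 15)

# Random sampling without replacement
s <- sample(population, size = 100)

# Standard error by simulation: spread of many sample means, drawn in a loop
means <- numeric(1000)
for (i in seq_along(means)) {
  means[i] <- mean(sample(population, size = 100))
}
sd(means)                   # simulated standard error (close to 15 / sqrt(100))
sd(s) / sqrt(length(s))     # analytical estimate from a single sample

# 95% confidence interval for the mean, based on the t distribution
m  <- mean(s)
se <- sd(s) / sqrt(length(s))
ci <- m + c(-1, 1) * qt(0.975, df = length(s) - 1) * se
ci

# One-sample t-test: reports the same confidence interval
t.test(s, mu = 100)

# Sample size per group to detect a 5-unit difference (sd = 15)
# with 80% power at alpha = 0.05
pw <- power.t.test(delta = 5, sd = 15, sig.level = 0.05, power = 0.8)
pw
```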

Suggested reading / materials:

Homework: DataCamp

Ideas to practice using the hotels dataset:

hotels <- readRDS(url('http://bit.ly/CEU-R-hotels-2018-merged'))
  • create a new pricecat variable based on avg_price_per_night: "cheap" below 100 EUR, "expensive" above
  • check if expensive hotels are rated higher than cheap ones
  • plot the difference of ratings between cheap and expensive hotels including the results of a t.test
  • create a new popularity variable with 3 categories from the number of bookings (0-3, 4-7, 8-10)
  • check if popular hotels are higher rated than less popular hotels
  • plot the difference
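One way to approach these exercises is sketched below on simulated stand-in data. Only `avg_price_per_night` is named above; the `rating` and `bookings` columns are hypothetical, and treating exactly 100 EUR as "expensive" is an assumption. Because `t.test()` with a formula needs exactly two groups, the popularity comparison here contrasts the least and most popular categories.

```r
library(ggplot2)

# Hypothetical stand-in data: rating and bookings are assumed column names
set.seed(6)
n <- 300
hotels_demo <- data.frame(
  avg_price_per_night = round(runif(n, 40, 250)),
  rating   = round(runif(n, 2, 5), 1),
  bookings = sample(0:10, n, replace = TRUE)
)

# pricecat: "cheap" below 100 EUR, "expensive" otherwise (100 EUR counts as expensive)
hotels_demo$pricecat <- factor(ifelse(hotels_demo$avg_price_per_night < 100, "cheap", "expensive"))

# Are expensive hotels rated higher? One-sided Welch t-test of cheap minus expensive < 0
tt <- t.test(rating ~ pricecat, data = hotels_demo, alternative = "less")
tt

# Plot the difference with the test result in the subtitle
ggplot(hotels_demo, aes(pricecat, rating)) +
  geom_boxplot() +
  labs(subtitle = sprintf("t = %.2f, p = %.3f", tt$statistic, tt$p.value))

# popularity from bookings: [0,3], (3,7], (7,10], i.e. 0-3, 4-7 and 8-10 for whole numbers
hotels_demo$popularity <- cut(hotels_demo$bookings, breaks = c(0, 3, 7, 10),
                              include.lowest = TRUE, labels = c("0-3", "4-7", "8-10"))

# Compare the least and most popular groups (droplevels keeps exactly two levels)
extremes <- droplevels(subset(hotels_demo, popularity %in% c("0-3", "8-10")))
t.test(rating ~ popularity, data = extremes)
ggplot(hotels_demo, aes(popularity, rating)) + geom_boxplot()
```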

Contact

File a GitHub issue.

About

Data Analysis 1a: Foundation of Data management in R @ CEU
