# Introduction to Statistical Data Science 

## About this book

This book is a collection of notes on statistical data science. It is a work in progress, and its materials are updated regularly.

## What this book is about
Learning to use R for:
* Exploratory Data Analysis
    - Data Visualization
    - Data Transformation
* Analyzing different types of data
    - Numerical data
    - Categorical data (factors)
    - Text / strings
    - Dates and times
* Programming topics:
    - Functions
    - Abstraction
    - Vectors / lists
    - Iteration
* Putting it all together to perform statistical inference
    - Regression
    - Model building
    - Hypothesis testing

## What this course is *not* about
This is not a traditional programming course. You will learn to program in R as a byproduct of learning how to visualize, clean, and model data. However, we will *not* cover topics such as:
- Algorithms
- Data structures
- Object-oriented programming (OOP)
- etc.

If you find that you enjoy programming and want to go further, these would be good topics to learn about in a future course.

## Accessing an R programming environment
Everything in this course will be done using [Jupyter notebooks](http://jupyter.org/) running the [R programming language](https://www.r-project.org/). Lecture notes will be distributed in Jupyter notebook format before each lecture, in two versions: fully annotated notes and skeleton notes. The online lectures use the skeleton notes, and you are strongly encouraged to follow along in the skeleton notes while watching the lecture videos.

# What is R?
R is a programming language developed by statisticians to perform statistical analysis. The "traditional" way to run R is from the Unix command line, by typing the command `R`:

    $ R
    R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
    Copyright (C) 2018 The R Foundation for Statistical Computing
    Platform: x86_64-apple-darwin17.7.0 (64-bit)

    R is free software and comes with ABSOLUTELY NO WARRANTY.
    You are welcome to redistribute it under certain conditions.
    Type 'license()' or 'licence()' for distribution details.

      Natural language support but running in an English locale

    R is a collaborative project with many contributors.
    Type 'contributors()' for more information and
    'citation()' on how to cite R or R packages in publications.

    Type 'demo()' for some demos, 'help()' for on-line help, or
    'help.start()' for an HTML browser interface to help.
    Type 'q()' to quit R.

    >

We won't use the command line in this class. The Jupyter notebook that runs these slides is running an R "kernel" in the background, and typing commands into its cells is the same as typing them into the R interpreter.
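
For example, a cell containing the following R code behaves just as it would at the `>` prompt. This is a minimal sketch; the numbers stored in `x` are arbitrary values chosen only for illustration.

```r
# Arithmetic works the same as at the R prompt
1 + 2

# Store a numeric vector of arbitrary example values in a variable named x
x <- c(4, 8, 15, 16, 23, 42)

# Built-in functions compute summary statistics of the vector
mean(x)     # returns 18
summary(x)  # minimum, quartiles, median, mean, and maximum
```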

# Setting up R and Jupyter

Install the Anaconda Python 3.x distribution from <https://www.anaconda.com/download/>.

Then:
- *Windows*: start Anaconda Prompt
- *Linux/macOS*: start your terminal

and type

    conda install -c r r-essentials r-irkernel r-tidyverse


Alternatively, you can set up Jupyter to use a standalone installation of R:

1. Install R from https://www.r-project.org/. If you already have R installed from 250 or another class, you can skip this step, but reinstalling may still be the easiest solution, since it should fix any paths that were changed by conda.
2. Open R and run
    ```
    install.packages(c('IRkernel', 'tidyverse'))
    ```    
3. On macOS, open Terminal (press Command-Space and type `Terminal`), then type `R` and hit Enter to open an R prompt. If you didn't reinstall `R` in step 1, you may instead need to enter something like `/Library/Frameworks/R.framework/Versions/3.6/Resources/bin/R`, where the version number `3.6` depends on when you originally installed `R`.

    On Windows, open Anaconda Prompt (or the console from the Anaconda Navigator). Then type or paste the path to the `R.exe` file, in quotes, into the console. If you used the default installation settings, it should be something like
    ```
    "C:\Program Files\R\R-3.6.2\bin\x64\R.exe"
    ```
    Then hit enter to open an R prompt.
    
4. Once you have an R prompt open, run 
    ```
    library(IRkernel)
    installspec()
    ```
    
Now you should be able to open Jupyter notebooks and set the kernel to R. To check, open a notebook, go to Kernel > Change kernel, and confirm that R is listed.
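
As an extra check, you could run a cell like the following in a notebook that uses the R kernel. It is a minimal sketch using only base R functions: it reports which version of R the kernel is running and whether the packages from the steps above can be found. The variable name `installed` is just an illustrative choice.

```r
# Which version of R is this notebook's kernel running?
R.version.string

# Try to load each package's namespace (without attaching it).
# requireNamespace() returns TRUE if the package is available and FALSE otherwise.
installed <- sapply(c("IRkernel", "tidyverse"), requireNamespace, quietly = TRUE)

# A named logical vector; both entries should be TRUE after a successful setup
installed
```

If `tidyverse` shows `FALSE`, running `install.packages('tidyverse')` from an R prompt (as in step 2) should fix it.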


Now you are ready to play around with a Jupyter notebook!

You can start the notebook server through the Anaconda Navigator or by typing

    jupyter notebook

into the prompt.

# Running Jupyter online

Notebook Binder: TBA