# RStudio and GitHub

## Intro to R and RStudio
- What is R?
    - Statistical programming language
    - Used for data processing and manipulation
    - Used for statistics, data analysis, and machine learning
    - R is used most in academia, healthcare, and government
    - R supports importing data from different sources: flat files, databases, web, statistical software
- R capabilities
    - easy to use compared to other data science tools
    - Great tool for visualization
    - Basic data analysis doesn't require installing packages
- What is RStudio
    - RStudio is an integrated development environment (IDE)
    - It makes working in the R programming language more productive
    - Includes a code editor, a console for typing R commands, Environment (workspace) and History tabs, and Files, Plots, Packages, and Help tabs
- Tabs in RStudio
    - Files
    - Plots
    - Packages
    - Help
    - Viewer
- Popular R libraries for Data science
    - dplyr (data manipulation)
    - stringr (string manipulation)
    - ggplot2 (data visualization)
    - caret (machine learning)
- Recap
    - capabilities of R and its uses in data science
    - RStudio interface for running R code
    - Popular R packages for Data Science
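
To illustrate the point that basic analysis needs no extra packages, here is a minimal sketch using only base R and the built-in mtcars dataset. The variable names (avg_mpg, mpg_by_cyl, mpg_wt_cor) are made up for this example.

```r
# Base R only: no install.packages() or library() calls are needed here.
data(mtcars)                                         # built-in dataset shipped with R

avg_mpg    <- mean(mtcars$mpg)                       # average miles per gallon
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)   # mean mpg per cylinder count
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)             # correlation between mpg and weight

print(summary(mtcars$mpg))                           # min, quartiles, median, mean, max
print(mpg_by_cyl)
print(mpg_wt_cor)
```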

## Downloading R
- Download R: https://cran.r-project.org/bin/windows/base/
- Download RStudio: https://posit.co/download/rstudio-desktop/


## R Basics with RStudio
- can type commands (e.g., print) directly in console
    - run by hitting Enter
    - Clear console: ctrl+L
- can create objects / variables and they will be saved in environment panel
    - to clean workspace, click Broom icon
- two methods of running code:
    - Highlight code and click Run
    - Click "Source" to run full script
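
A small sketch of the console workflow described above, assuming a fresh session. The object names x, y, and total are placeholders, not anything RStudio requires.

```r
# Typing an expression in the console and pressing Enter evaluates it
print("Hello, R")

# Assignments create objects that show up in the Environment panel
x <- 10
y <- c(2, 4, 6)
total <- x + sum(y)
print(total)

# ls() lists the objects in the workspace; rm() removes the named ones.
# Clicking the Broom icon is roughly equivalent to rm(list = ls()).
print(ls())
rm(x, y)
```

Saved in a script file, the same lines can be run in part by highlighting them and clicking Run, or all at once with Source.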

## Plotting in RStudio
- Using data visualization in R
    - to install packages, use command: install.packages("<package name>")
    - popular visualization packages:
        - ggplot2: histograms, bar charts, scatterplots
        - plotly: web-based data visualizations that can be displayed or saved as individual HTML files
        - lattice: complex, multi-variable data sets (can handle graphics without customizations)
        - leaflet: interactive maps
- Using plot function
    - e.g., define cars vector with 5 values and graph
        - cars <- c(1,4,6,5,10)
        - plot(cars) -> scatterplot with index on x axis and value on y axis
        - plot(cars, type = "o") -> graph cars vector with points joined by lines (other settings left at defaults)
        - title(main="Cars vs Index"): create a title
- Using ggplot
    - ggplot2 builds a plot in layers: start with ggplot() and add functions (geoms, titles, labels) with +
    - code:
        - library(ggplot2)
        - ggplot(mtcars, aes(x=mpg, y=wt))+geom_point()
- Adding titles to plot
    - ggplot(mtcars, aes(x=mpg, y=wt))+geom_point()+ggtitle("miles per gallon vs weight") + labs(y="weight", x="Miles per gallon")
- GGally extends ggplot by adding several functions to reduce complexity of combining geometric objects with transformed data
- Recap:
    - Popular data visualization packages in R
    - Plotting with inbuilt R plot function
    - Plotting with ggplot
    - Adding titles and changing the axis names using the ggtitle and labs functions
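
The base plot steps from this section, written out as a runnable sketch. It assumes nothing beyond base R; in RStudio the result appears in the Plots tab.

```r
# Define a vector of 5 values (the <- assignment operator is required)
cars <- c(1, 4, 6, 5, 10)

# type = "o" overplots points and lines; the x axis is the index 1..5
plot(cars, type = "o")

# title() annotates the plot that is currently open
title(main = "Cars vs Index")
```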
    


## Getting started with R
- View(data_name): shows dataset
- unique(data_frame$column_name): returns a vector of the unique values in a column
- install.packages("GGally", repos = "https://cran.r-project.org", type = "source") : install packages
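
A quick sketch of these inspection commands on the built-in mtcars dataset. View() only works in an interactive session such as RStudio, so it is left as a comment and head() stands in for it; cyl_values is a name invented for this example.

```r
data(mtcars)

# View(mtcars)                       # opens the spreadsheet-style viewer in RStudio
print(head(mtcars))                  # non-interactive alternative: first six rows

# unique() returns the distinct values in order of first appearance
cyl_values <- unique(mtcars$cyl)
print(cyl_values)
print(sort(cyl_values))              # sorted for easier reading
```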

## Creating data visualizations with ggplot
- ?mtcars: returns info on dataset
- ggplot(aes(x=disp,y=mpg),data=mtcars)+geom_point(): scatter plot
- ggplot(aes(x=disp,y=mpg),data=mtcars)+geom_point()+ggtitle("displacement vs miles per gallon"): scatter plot with title
- ggplot(aes(x=disp,y=mpg),data=mtcars)+geom_point()+ggtitle("displacement vs miles per gallon") + labs(x = "Displacement", y = "Miles per Gallon"): change name of x axis and y axis
- mtcars$vs <- as.factor(mtcars$vs): convert column vs to a factor (needed so the boxplot draws one box per group)
- ggplot(aes(x=vs, y=mpg), data = mtcars) + geom_boxplot(): boxplot
- ggplot(aes(x=vs, y=mpg, fill = vs), data = mtcars) + 
  geom_boxplot(alpha=0.3) +
  theme(legend.position="none"): Add color to boxplot
- ggplot(aes(x=wt),data=mtcars) + geom_histogram(binwidth=0.5): histogram of weight
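
The commands above can be combined into one script that builds each plot as an object and adds layers step by step. This is only a sketch: it assumes ggplot2 is installed, and cars_df, p, p_labeled, box, and hist_wt are names chosen for illustration. Working on a copy (cars_df) leaves the built-in mtcars untouched.

```r
library(ggplot2)

cars_df <- mtcars                          # work on a copy of the built-in data
cars_df$vs <- as.factor(cars_df$vs)        # factor, so the boxplot groups by vs

# Scatter plot, then a title and axis labels added as extra layers
p <- ggplot(cars_df, aes(x = disp, y = mpg)) + geom_point()
p_labeled <- p +
  ggtitle("displacement vs miles per gallon") +
  labs(x = "Displacement", y = "Miles per Gallon")

# Colored boxplot of mpg by engine shape, legend hidden
box <- ggplot(cars_df, aes(x = vs, y = mpg, fill = vs)) +
  geom_boxplot(alpha = 0.3) +
  theme(legend.position = "none")

# Histogram of weight with bins 0.5 wide
hist_wt <- ggplot(cars_df, aes(x = wt)) + geom_histogram(binwidth = 0.5)

print(p_labeled)
print(box)
print(hist_wt)
```

Storing a plot in a variable does not draw it; print() (or typing the name in the console) does.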


## Plotting with RStudio
- ggpairs(iris, mapping=ggplot2::aes(colour = Species)): matrix of pairwise plots for the iris dataset, colored by species (run library(GGally) first)
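
A sketch of the full call sequence, assuming GGally (and with it ggplot2) is already installed. Restricting the matrix to the four numeric columns with the columns argument is an optional choice made here to keep the output small; pairs_plot is an illustrative name.

```r
library(GGally)

# Pairwise plots of the four numeric iris measurements, colored by Species
pairs_plot <- ggpairs(
  iris,
  columns = 1:4,
  mapping = ggplot2::aes(colour = Species)
)

print(pairs_plot)
```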



# GitHub

## Overview of Git/GitHub
- Version Control
    - keep track of changes to docs
    - recover older versions of docs, makes collab easier
- Git
    - Free and open source software
    - Distributed version control system
    - Accessible anywhere in world
    - one of the most common version control systems available
    - can also version control images, docs, etc.
    - can run Git through command line but GitHub interface more common
- SHORT Glossary of terms
    - SSH protocol: method for secure remote login from one computer to another
    - Repository: folders of your project that are set up for version control
    - Fork: copy of a repository
    - Pull request: process you use to request that someone reviews and approves your changes before they become final
    - Working directory: directory on your file system, including its files and subdirectories, that is associated with git repository
- Basic Git Commands
    - init: creates a new repository locally, which you can then push to GitHub (the alternative is to clone an existing repository)
    - add: moves changes from working directory to staging area
    - status: see state of working directory and snapshot of changes
    - commit: takes staged snapshot of changes and commits them to project
    - reset: undoes changes; by default it unstages files, and with --hard it also discards changes to files in your working directory
    - log: lets you browse previous changes (the commit history)
    - branch: isolated environment to make changes
    - checkout: switch between existing branches
    - merge: put everything back together again
- https://try.github.io/
    - resources to help get started