## The contents include:

* 1. Installing and loading packages
* 2. Reading and writing data
* 3. Filtering and Subsetting data
* 4. Basic Statistical Operations

## 1. Installing and loading packages

### Three ways to install packages:

1. From the menu in the R GUI (requires an internet connection):
  * From the `Packages` menu in the toolbar, select `Install package(s)...`
  * Choose the packages you want in the pop-up dialog box, then click OK.

2. From the command line:
> options(repos=structure(c(CRAN="https://mirrors.tuna.tsinghua.edu.cn/CRAN/")))  ### choose a nearby mirror
> install.packages("packagename", lib = "dir")  ### "dir" is the library directory to install into

Or, for Bioconductor packages:

> if (!requireNamespace("BiocManager", quietly = TRUE))  
> install.packages("BiocManager")   
> BiocManager::install("biomaRt")

3. From local files:
  First, download the package archive. The file extension depends on the operating system:
  1) Linux: `.tar.gz` file (source package)
  2) Windows: `.zip` file (binary package)
  3) macOS: `.tgz` file (binary package)
  
  Second, type the following command to install:
> install.packages("path/to/mypackage.tar.gz", repos = NULL, type="source")

Or, from the R toolbar, choose the `Packages` menu and select `Install package(s) from local files...`
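
The `requireNamespace()` check shown above for `BiocManager` can be extended to several packages at once. The following is a minimal sketch, assuming a hypothetical helper named `missing_packages()` (it is not part of R or of any package):

```r
# Hypothetical helper (not part of base R): return the packages from `pkgs`
# that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# "stats" and "utils" ship with R, so nothing should be reported as missing.
to_install <- missing_packages(c("stats", "utils"))

# Install only what is missing (this step needs an internet connection).
if (length(to_install) > 0) {
  install.packages(to_install)
}
```

Checking first avoids reinstalling packages that are already present, which is the same idea as the `if (!requireNamespace(...))` guard.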

In [1]:
# options(repos=structure(c(CRAN="http://mirrors.tuna.tsinghua.edu.cn/CRAN/")))
# install.packages("dplyr")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")


BiocManager::install("biomaRt")
BiocManager::install("dplyr")

Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.2 (2019-12-12)

Installing package(s) 'biomaRt'



package 'biomaRt' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Master\AppData\Local\Temp\RtmponWTbe\downloaded_packages


Old packages: 'BH', 'bit', 'caTools', 'cli', 'fansi', 'GetoptLong', 'ggridges',
  'gplots', 'hms', 'multcomp', 'mvtnorm', 'precrec', 'prettyunits', 'pROC',
  'RSQLite', 'Rttf2pt1', 'shinyjs', 'stringi', 'tinytex', 'xfun', 'zoo'

Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.2 (2019-12-12)

Installing package(s) 'dplyr'



package 'dplyr' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Master\AppData\Local\Temp\RtmponWTbe\downloaded_packages


Old packages: 'BH', 'bit', 'caTools', 'cli', 'fansi', 'GetoptLong', 'ggridges',
  'gplots', 'hms', 'multcomp', 'mvtnorm', 'precrec', 'prettyunits', 'pROC',
  'RSQLite', 'Rttf2pt1', 'shinyjs', 'stringi', 'tinytex', 'xfun', 'zoo'



In [2]:
library(biomaRt)
library(dplyr)


Attaching package: 'dplyr'


The following object is masked from 'package:biomaRt':

    select


The following objects are masked from 'package:stats':

    filter, lag


The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
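
The messages above mean that, once `dplyr` is attached, a bare `filter()` or `select()` refers to the `dplyr` version. The `package::function` syntax selects a specific version regardless of load order. A small sketch, using only packages that ship with R so that it runs even without `dplyr` installed:

```r
# stats::filter() is the linear (time-series) filter that dplyr::filter() masks.
# Here it computes a centered 3-point moving average of 1..5.
x <- 1:5
moving_avg <- stats::filter(x, rep(1 / 3, 3))

# base::union() is likewise still reachable through its full name.
u <- base::union(c(1, 2, 3), c(3, 4))

print(as.numeric(moving_avg))
print(u)
```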




## 2. Reading and Writing Data

Before analyzing any data, we must read it into the R workspace. R can load external R objects (typical file extensions are `.rda` or `.RData`), datasets bundled with a package, and TXT, CSV, or Excel files.
First, learn how to find and set the working directory, which makes it easier to locate files.

In [5]:
getwd() ## Get the current working directory

In [None]:
## Change to a desired directory with the command:
setwd("D:/Example/file/folder")   ## replace with a path of your own
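
A minimal sketch of switching directories safely: it remembers the current directory, moves to R's session temporary directory (chosen here only so that the example runs on any machine), and then switches back. The file name `wd_example.txt` is an arbitrary placeholder.

```r
old_wd <- getwd()          # remember where we started
setwd(tempdir())           # tempdir() always exists for the current session

# Relative paths are now resolved against the new working directory.
writeLines("hello", "wd_example.txt")
found <- file.exists(file.path(getwd(), "wd_example.txt"))
print(found)

unlink("wd_example.txt")   # clean up the example file
setwd(old_wd)              # restore the original working directory
```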

## 3. Reading and Writing Data

This section covers the following steps:
* [3.1. Creating data frames](https://github.com/Chengshu21/bio-start-with-R/blob/master/Chapter%201/reading%20and%20writing%20data.md#3.1-creating-data-frames)
* [3.2. Writing data](https://github.com/Chengshu21/bio-start-with-R/blob/master/Chapter%201/reading%20and%20writing%20data.md#3.2-writing-data)
* [3.3. Reading data](https://github.com/Chengshu21/bio-start-with-R/blob/master/Chapter%201/reading%20and%20writing%20data.md#3.3-reading-data)

If a dataset is bundled with a package, use `data()` to access it.
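
As a small illustration of `data()`, the sketch below loads the `iris` dataset from the `datasets` package (attached by default in R) and lists the other datasets that package provides:

```r
# iris ships with the "datasets" package, which is attached by default.
data(iris)
head(iris, 3)      # first three rows
dim(iris)          # number of rows and columns

# List the names of the datasets provided by a given package.
available <- data(package = "datasets")$results[, "Item"]
head(available)
```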

In [8]:
## Create an object "d"; it is a data frame
d <- data.frame(obs = c(1, 2, 3, 4, 5, 6), 
                treat = c("A", "B", "A", "A", "O", "B"), 
                weight = c(2.3, NA, 9, 8, 4, 7))
d

obs,treat,weight
<dbl>,<fct>,<dbl>
1,A,2.3
2,B,
3,A,9.0
4,A,8.0
5,O,4.0
6,B,7.0


In [23]:
## Writing data to a TXT, CSV, or XLSX file, or to .RData/.RDS
#### A TXT file (tab-separated)
write.table(d, file = "F:/d.txt", row.names = FALSE, quote = TRUE, sep = "\t")

#### A CSV file
write.csv(d, file = "F:/d.csv", row.names = FALSE, quote = FALSE)

#### .RData (can hold one or more named objects)
save(d, file = "F:/d.RData")

#### .RDS (holds a single object)
saveRDS(d, file = "F:/d.RDS")

In [12]:
## Reading and writing .xlsx files requires the R package "openxlsx"
BiocManager::install("openxlsx")
library(openxlsx)

Bioconductor version 3.9 (BiocManager 1.30.10), R 3.6.2 (2019-12-12)

Installing package(s) 'openxlsx'



package 'openxlsx' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
	C:\Users\Master\AppData\Local\Temp\RtmponWTbe\downloaded_packages


Old packages: 'BH', 'bit', 'caTools', 'cli', 'fansi', 'GetoptLong', 'ggridges',
  'gplots', 'hms', 'multcomp', 'mvtnorm', 'precrec', 'prettyunits', 'pROC',
  'RSQLite', 'Rttf2pt1', 'shinyjs', 'stringi', 'tinytex', 'xfun', 'zoo'



In [24]:
#### An XLSX file (openxlsx uses "rowNames"; it has no "quote" argument)
write.xlsx(d, file = "F:/d.xlsx", rowNames = FALSE)

In [29]:
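## Reading data into R
## Note: read.table() does not read a header row by default, so the column
## names of d.txt come back as the first data row (V1, V2, V3 in the output);
## read.table("F:/d.txt", header = TRUE, sep = "\t") keeps them as names.
d1 <- read.table("F:/d.txt")
d1
d2 <- read.csv("F:/d.csv")
d2
d3 <- read.xlsx("F:/d.xlsx")
d3
d4 <- readRDS("F:/d.RDS")
d4
load("F:/d.RData")   ## restores the object "d" under its original name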

V1,V2,V3
<fct>,<fct>,<fct>
obs,treat,weight
1,A,2.3
2,B,
3,A,9
4,A,8
5,O,4
6,B,7


obs,treat,weight
<int>,<fct>,<dbl>
1,A,2.3
2,B,
3,A,9.0
4,A,8.0
5,O,4.0
6,B,7.0


obs,treat,weight
<dbl>,<chr>,<dbl>
1,A,2.3
2,B,
3,A,9.0
4,A,8.0
5,O,4.0
6,B,7.0


obs,treat,weight
<dbl>,<fct>,<dbl>
1,A,2.3
2,B,
3,A,9.0
4,A,8.0
5,O,4.0
6,B,7.0
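
The first table above shows `V1, V2, V3` because `read.table()` does not assume a header row by default. The following round-trip sketch passes `header = TRUE` and writes to temporary files (an assumption made so that the example does not depend on an `F:` drive):

```r
d <- data.frame(obs = c(1, 2, 3, 4, 5, 6),
                treat = c("A", "B", "A", "A", "O", "B"),
                weight = c(2.3, NA, 9, 8, 4, 7))

txt_file <- tempfile(fileext = ".txt")
rds_file <- tempfile(fileext = ".RDS")

# Tab-separated text: header = TRUE restores the column names on reading.
write.table(d, file = txt_file, row.names = FALSE, quote = TRUE, sep = "\t")
d_txt <- read.table(txt_file, header = TRUE, sep = "\t")

# RDS stores a single object as-is, including its column types.
saveRDS(d, file = rds_file)
d_rds <- readRDS(rds_file)

names(d_txt)          # column names recovered from the header line
identical(d_rds, d)   # the RDS copy matches the original object
```

Text formats re-infer column types on reading, while `.RDS` and `.RData` preserve them, which is why the printed types differ between the tables above.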


In [30]:
ls()  ## list the objects in the current workspace

## 4. Basic Statistical Operations

As a statistical programming environment, R has many built-in functions for computing statistics on data. More specialized functionality is either available in packages or can easily be written by hand.
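
For example, base R provides `sd()` but no built-in function for the standard error of the mean, which takes only a few lines to write. A sketch, where `std_error()` is a hypothetical name chosen for illustration:

```r
# Hypothetical helper: standard error of the mean, sd(x) / sqrt(n),
# computed over the non-missing values only.
std_error <- function(x) {
  x <- x[!is.na(x)]
  sd(x) / sqrt(length(x))
}

std_error(iris$Sepal.Length)
```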

In [31]:
summary(iris)

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width   
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500  
       Species  
 setosa    :50  
 versicolor:50  
 virginica :50  
                
                
                

In [33]:
## "mean()" computes the arithmetic mean:
mean(iris[, 1])

In [None]:
## "sd()" computes the sample standard deviation of the values in its argument.
sd(iris[, 1])
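
Both functions return `NA` when the input contains missing values, unless `na.rm = TRUE` is set. A short sketch using the same `weight` values as the data frame `d` created earlier, followed by a column-wise summary of `iris`:

```r
weight <- c(2.3, NA, 9, 8, 4, 7)

mean(weight)                 # NA, because one value is missing
mean(weight, na.rm = TRUE)   # mean of the five observed values
sd(weight, na.rm = TRUE)     # standard deviation of the observed values

# Apply mean() and sd() to every numeric column of iris (columns 1 to 4).
col_means <- sapply(iris[, 1:4], mean)
col_sds   <- sapply(iris[, 1:4], sd)
round(rbind(mean = col_means, sd = col_sds), 3)
```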