# Data Reshaping

## Adding Columns and Rows to a Data Frame

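We can use the cbind() function to join several vectors column-wise. Because these vectors mix character and numeric values, the result is a character matrix rather than a true data frame, which class(addresses) confirms.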

In [3]:
city = c("Tampa","Seattle","Hartford","Denver")
state = c("FL","WA","CT","CO")
zipcode = c(33602,98104,06161,80294)   # numeric, so the leading zero of 06161 is dropped

addresses = cbind(city, state, zipcode)   # bind the vectors together as columns
class(addresses)
addresses

city,state,zipcode
Tampa,FL,33602
Seattle,WA,98104
Hartford,CT,6161
Denver,CO,80294
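
If a real data frame is wanted from the start, one option is to call data.frame() directly instead of cbind(). The following is a minimal sketch, assuming the zip codes are stored as strings so that the leading zero survives; the names `m` and `df` are illustrative only.

```r
city <- c("Tampa", "Seattle", "Hartford", "Denver")
state <- c("FL", "WA", "CT", "CO")
# Assumption: zip codes are kept as character strings to preserve "06161"
zipcode <- c("33602", "98104", "06161", "80294")

m <- cbind(city, state, zipcode)    # character matrix
df <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)

is.matrix(m)        # TRUE
is.data.frame(df)   # TRUE
df$zipcode[3]       # "06161"
```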


In [7]:
new.address = data.frame(
   city = c("Lowry","Charlotte"),
   state = c("CO","FL"),
   zipcode = c("80230","33949"),
   stringsAsFactors = FALSE
)
new.address

city,state,zipcode
Lowry,CO,80230
Charlotte,FL,33949


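Use the rbind() function to combine the two by row. Because new.address is a data frame, the result is a data frame as well.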

In [8]:
all.addresses = rbind(addresses,new.address)
all.addresses

city,state,zipcode
Tampa,FL,33602
Seattle,WA,98104
Hartford,CT,6161
Denver,CO,80294
Lowry,CO,80230
Charlotte,FL,33949
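
For data frames, rbind() matches columns by name rather than by position, and it fails when the names differ. The sketch below illustrates both cases with made-up single-row data frames (`a`, `b`, and `c.bad` are hypothetical names).

```r
a <- data.frame(city = "Tampa", state = "FL", zipcode = "33602",
                stringsAsFactors = FALSE)
# Same columns as a, but in a different order
b <- data.frame(zipcode = "80230", city = "Lowry", state = "CO",
                stringsAsFactors = FALSE)

combined <- rbind(a, b)   # columns of b are aligned to a by name
combined

# A data frame with a mismatched column name cannot be appended
c.bad <- data.frame(town = "Denver", state = "CO", zipcode = "80294",
                    stringsAsFactors = FALSE)
failed <- inherits(try(rbind(a, c.bad), silent = TRUE), "try-error")
failed   # TRUE
```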


## Merging Data Frames

The MASS package ships data sets on diabetes in Pima Indian women, including Pima.te and Pima.tr.

In [31]:
'Diabetes in Pima Indian Women'
library(MASS)
class(Pima.te)
Pima.te[1:5,]    # first five rows of Pima.te, selected by index
head(Pima.tr, 5) # first five rows of Pima.tr, using head()

npreg,glu,bp,skin,bmi,ped,age,type
6,148,72,35,33.6,0.627,50,Yes
1,85,66,29,26.6,0.351,31,No
1,89,66,23,28.1,0.167,21,No
3,78,50,32,31.0,0.248,26,Yes
2,197,70,45,30.5,0.158,53,Yes


npreg,glu,bp,skin,bmi,ped,age,type
5,86,68,28,30.2,0.364,24,No
7,195,70,33,25.1,0.163,55,Yes
5,77,82,41,35.8,0.156,35,No
0,165,76,43,47.9,0.259,26,No
0,107,60,25,26.4,0.133,23,No


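Merge the two data sets on blood pressure ("bp") and body mass index ("bmi"). With these two columns as the merge keys, records whose values for both variables match across the two data sets are combined into a single data frame. Columns that share a name but are not keys receive the suffixes .x and .y.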

In [9]:
library(MASS)
merged.Pima = merge(x = Pima.te, y = Pima.tr,
   by.x = c("bp", "bmi"),
   by.y = c("bp", "bmi")
)
merged.Pima
cat("rows:", nrow(merged.Pima), "\tcolumns:", ncol(merged.Pima))

bp,bmi,npreg.x,glu.x,skin.x,ped.x,age.x,type.x,npreg.y,glu.y,skin.y,ped.y,age.y,type.y
60,33.8,1,117,23,0.466,27,No,2,125,20,0.088,31,No
64,29.7,2,75,24,0.37,33,No,2,100,23,0.368,21,No
64,31.2,5,189,33,0.583,29,Yes,3,158,13,0.295,24,No
64,33.2,4,117,27,0.23,24,No,1,96,27,0.289,21,No
66,38.1,3,115,39,0.15,28,No,1,114,36,0.289,21,No
68,38.5,2,100,25,0.324,26,No,7,129,49,0.439,43,Yes
70,27.4,1,116,28,0.204,21,No,0,124,20,0.254,36,Yes
70,33.1,4,91,32,0.446,22,No,9,123,44,0.374,40,No
70,35.4,9,124,33,0.282,34,No,6,134,23,0.542,29,Yes
72,25.6,1,157,21,0.123,24,No,4,99,17,0.294,28,No


rows: 17 	columns: 14
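
By default merge() keeps only rows whose keys appear in both inputs (an inner join). The sketch below uses two small made-up data frames (`x` and `y` are illustrative, not the Pima data) to contrast this with `all = TRUE`, which also keeps unmatched rows and fills the gaps with NA.

```r
# Illustrative toy data: only the key pair (60, 33.8) occurs in both
x <- data.frame(bp = c(60, 64, 70), bmi = c(33.8, 29.7, 27.4),
                glu = c(117, 75, 116))
y <- data.frame(bp = c(60, 64, 72), bmi = c(33.8, 31.2, 25.6),
                glu = c(125, 158, 99))

inner <- merge(x, y, by = c("bp", "bmi"))               # matching keys only
outer <- merge(x, y, by = c("bp", "bmi"), all = TRUE)   # keep every key pair

names(inner)   # "bp" "bmi" "glu.x" "glu.y"
nrow(inner)    # 1
nrow(outer)    # 5
```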

## Melting and Casting Data Frames

In [7]:
install.packages("reshape")

Updating HTML index of packages in '.Library'
Making 'packages.html' ... done


Melt the data so that every column other than type and year is stacked into rows (long format).

In [5]:
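library(MASS)      # provides the ships data set
library(reshape)   # provides melt() and cast()
ships[1 : 10,]
# keep type and year as identifiers and stack the remaining columns
molten.ships = melt(ships[1 : 5,], id = c("type", "year"))
molten.ships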

type,year,period,service,incidents
A,60,60,127,0
A,60,75,63,0
A,65,60,1095,3
A,65,75,1095,4
A,70,60,1512,6
A,70,75,3353,18
A,75,60,0,0
A,75,75,2244,11
B,60,60,44882,39
B,60,75,17176,29


type,year,variable,value
A,60,period,60
A,60,period,75
A,65,period,60
A,65,period,75
A,70,period,60
A,60,service,127
A,60,service,63
A,65,service,1095
A,65,service,1095
A,70,service,1512


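Recast the molten data into a new shape: use the cast() function to compute, for each ship type and year, the sum of every measured variable.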

In [8]:
recasted.ship = cast(melt(ships, id = c("type", "year")), type + year ~ variable, sum)
recasted.ship

type,year,period,service,incidents
A,60,135,190,0
A,65,135,2190,7
A,70,135,4865,24
A,75,135,2244,11
B,60,135,62058,68
B,65,135,48979,111
B,70,135,20163,56
B,75,135,7117,18
C,60,135,1731,2
C,65,135,1457,1
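
The same per-group sums can also be obtained without melting, for example with base R's aggregate(). The sketch below assumes only the first four rows of ships shown earlier (copied into a hypothetical `ships.sample`), so it reproduces just the first two rows of the table above.

```r
# First four rows of ships, as displayed earlier in this section
ships.sample <- data.frame(
  type = c("A", "A", "A", "A"),
  year = c(60, 60, 65, 65),
  period = c(60, 75, 60, 75),
  service = c(127, 63, 1095, 1095),
  incidents = c(0, 0, 3, 4)
)

# Sum each measured column within every type/year group
totals <- aggregate(cbind(period, service, incidents) ~ type + year,
                    data = ships.sample, FUN = sum)
totals
```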


# Browsing Data Sets

## Data Sets in a Specific Package

In [3]:
library(MASS)
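data(package = "MASS")   # list only the data sets in MASS; a bare data() lists those of every attached package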

In [16]:
library(MASS)
data(phones)
print(phones)

$year
 [1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73

$calls
 [1]   4.4   4.7   4.7   5.9   6.6   7.3   8.1   8.8  10.6  12.0  13.5  14.9
[13]  16.1  21.2 119.0 124.0 142.0 159.0 182.0 212.0  43.0  24.0  27.0  29.0



## Data Sets in All Installed Packages

In [None]:
data(package = .packages(all.available = TRUE))
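
The listing returned by data() can also be inspected programmatically: its `results` component is a character matrix with columns such as "Item" and "Title". A small sketch, assuming the MASS package is installed:

```r
# Capture the listing instead of opening it in a viewer
ds <- data(package = "MASS")$results

head(ds[, c("Item", "Title")])   # data set names with their descriptions
"phones" %in% ds[, "Item"]       # TRUE
```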