Daria-Maltseva edited this page Nov 6, 2018 · 4 revisions

Create and clean networks

September 13, 2018

We combined the file Clean_jun12 with the terminals from two files:

  • CompleteWoSDataBase
  • CompleteRISfinal(2)

Then we produced a new set of networks, SN17_sep, using WoS2Pajek.

We analyzed the indegrees in the citation network. Several of the most frequently cited works appear under several different short names. For example:

Input: WASSERMA_S(1994)
Possible Matching: WASSERMA_S(1994): (228)
Possible Matching: WASSERMA_S(1994):169 (1828)
Possible Matching: WASSERMA_S(1994):3 (1829)
Possible Matching: WASSERMA_S(1994)8: (36101)
Possible Matching: WASSERMA_S(1994):20 (39116)
Possible Matching: WASSERMA_S(1994):178 (45519)
Possible Matching: WASSERMA_S(1994):825 (70914)
Possible Matching: WASSERMA_S(1994):34 (118355)
Possible Matching: WASSERMA_S(1994):547 (123305)
Possible Matching: WASSERMA_S(1994):12 (128169)
Possible Matching: WASSERMA_S(1994):1 (138369)
Possible Matching: WASSERMA_S(1994):8 (147020)
Possible Matching: WASSERMA_S(1994):126 (193932)
Possible Matching: WASSERMA_S(1994)8:857 (209248)
Possible Matching: WASSERMA_S(1994):CBO9780511815478 (226137)
Possible Matching: WASSERMA_S(1994):CH8 (237161)
Possible Matching: WASSERMA_S(1994):220 (251825)
Possible Matching: WASSERMA_S(1994)XXXI: (284307)
Possible Matching: WASSERMA_S(1994)52:210 (303495)
Possible Matching: WASSERMA_S(1994):852 (305375)
Possible Matching: WASSERMA_S(1994):[XXXI (332210)
Possible Matching: WASSERMA_S(1994):CB09780511815478 (350950)
Possible Matching: WASSERMA_S(1994):231 (376830)
Possible Matching: WASSERMA_S(1994):2768 (391148)
Possible Matching: WASSERMA_S(1994):857 (408893)
Possible Matching: WASSERMA_S(1994):249 (430991)
Possible Matching: WASSERMA_S(1994):266 (474612)
Possible Matching: WASSERMA_S(1994)8:R31 (479486)
Possible Matching: WASSERMA_S(1994):92 (480233)
Possible Matching: WASSERMA_S(1994):4 (490561)
Possible Matching: WASSERMA_S(1994)9: (492359)
Possible Matching: WASSERMA_S(1994):46 (507753)
Possible Matching: WASSERMA_S(1994)24: (530909)
Possible Matching: WASSERMA_S(1994)171: (538123)
Possible Matching: WASSERMA_S(1994):12055 (573761)
Possible Matching: WASSERMA_S(1994)8:825 (627367)
Possible Matching: WASSERMA_S(1994):291 (645259)
Possible Matching: WASSERMA_S(1994)5:91 (679890)
Possible Matching: WASSERMA_S(1994)81: (730779)
Possible Matching: WASSERMA_S(1994):28 (756184)
Possible Matching: WASSERMA_S(1994):173 (763829)
Possible Matching: WASSERMA_S(1994):505 (772324)
Possible Matching: WASSERMA_S(1994):101 (791200)
Possible Matching: WASSERMA_S(1994):XI (809730)
Possible Matching: WASSERMA_S(1994):177 (835625)
Possible Matching: WASSERMA_S(1994):23 (844661)
Possible Matching: WASSERMA_S(1994)4:857 (866790)
Possible Matching: WASSERMA_S(1994)8:827 (884134)
Possible Matching: WASSERMA_S(1994):506 (901308)
Possible Matching: WASSERMA_S(1994):25 (919190)
Possible Matching: WASSERMA_S(1994)25:825 (924105)
Possible Matching: WASSERMA_S(1994):100 (936034)
Possible Matching: WASSERMA_S(1994)24:219 (1117244)
Possible Matching: WASSERMA_S(1994):31 (1167228)
Possible Matching: WASSERMA_S(1994)19:645 (1167817)
Possible Matching: WASSERMA_S(1994):130 (1169380)
Possible Matching: WASSERMA_S(1994):162 (1173128)
Possible Matching: WASSERMA_S(1994):1994 (1196724)
Possible Matching: WASSERMA_S(1994)506: (1293924)
Possible Matching: WASSERMA_S(1994):107 (1295856)
Possible Matching: WASSERMA_S(1994):CH12 (1295963)

Input: BOYD_D(2007)13
Possible Matching: BOYD_D(2007)13:210 (197230)
Possible Matching: BOYD_D(2007)13:28 (225411)
Possible Matching: BOYD_D(2007)13: (237556)
Possible Matching: BOYD_D(2007)13:1 (253022)
Possible Matching: BOYD_D(2007)13:NI224 (267989)
Possible Matching: BOYD_D(2007)13:2009 (314521)
Possible Matching: BOYD_D(2007)13:4 (318982)
Possible Matching: BOYD_D(2007)13:7 (318983)
Possible Matching: BOYD_D(2007)131: (330689)
Possible Matching: BOYD_D(2007)13:2010 (621029)
Possible Matching: BOYD_D(2007)13:201 (681308)
Possible Matching: BOYD_D(2007)13:335 (783043)
Possible Matching: BOYD_D(2007)13:23 (808827)
Possible Matching: BOYD_D(2007)13:20 (836827)
Possible Matching: BOYD_D(2007)13:56 (1102875)

Input: BOYD_D(2008)13
Possible Matching: BOYD_D(2008)13:11 (236878)
Possible Matching: BOYD_D(2008)13:210 (238102)
Possible Matching: BOYD_D(2008)13: (260157)
Possible Matching: BOYD_D(2008)13:2010 (532561)
Possible Matching: BOYD_D(2008)13:2 (858357)
Possible Matching: BOYD_D(2008)13:217 (928705)
Possible Matching: BOYD_D(2008)13:4 (1051371)

September 14, 2018

Cleaning

There are some identity problems among the most cited works. Two different books by Wasserman were published in 1994. To distinguish them, the volume 171 should be added at the end of the following references (by replacement in the clean WoS file):

WASSERMAN S, 1994, ADV SOCIAL NETWORK A
Wasserman S., 1994, ADV SOCIAL NETWORK A
Wasserman S., 1994, ADV SOCIAL NETWORK A, V171

and all the other WASSERMA_S(1994) node names should be treated as the same work.
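The replacement described above could be scripted. The following is a minimal R sketch, assuming that each cited reference is available as one string ending with the source abbreviation, exactly as in the three examples; the function name `addVolume171` is hypothetical, and real WoS records may carry further fields (e.g. a DOI) after the source, which this pattern does not handle.

```r
# Hypothetical sketch: append ", V171" to the references of the 1994
# edited volume "Advances in Social Network Analysis".
# Assumption: one cited reference per string, ending in the source name.
addVolume171 <- function(refs) {
  # Matches "WASSERMAN S" or "Wasserman S." (case-insensitive) with the
  # source at the very end, so references already ending in ", V171"
  # are left unchanged.
  pattern <- "^(Wasserman S\\.?, 1994, ADV SOCIAL NETWORK A)$"
  sub(pattern, "\\1, V171", refs, ignore.case = TRUE)
}

refs <- c("WASSERMAN S, 1994, ADV SOCIAL NETWORK A",
          "Wasserman S., 1994, ADV SOCIAL NETWORK A",
          "Wasserman S., 1994, ADV SOCIAL NETWORK A, V171")
fixed <- addVolume171(refs)
```

Anchoring the pattern at the end of the string is what keeps the third, already corrected, form from getting a second volume.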

After changing the clean WoS file into cleanNew.WoS, we regenerated all Pajek files using WoS2Pajek.

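The other problems among the most cited works can be resolved by constructing an equivalence partition (each vertex is assigned to the vertex that represents its work) and shrinking the Pajek networks by this partition.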
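To illustrate what shrinking by an equivalence partition does, here is a small R sketch on a citation network stored as an arc list. It is not Pajek's implementation: we assume `C[v]` holds the representative vertex of `v`, merge the parallel arcs that appear after relabelling by summing their weights, and keep loops. The function name `shrinkArcs` is hypothetical.

```r
# Sketch (not Pajek's code): shrink a citation network given as an arc
# list (from cites to) by an equivalence partition C, where C[v] is the
# representative vertex of v.
shrinkArcs <- function(from, to, C) {
  arcs <- data.frame(from = C[from], to = C[to], w = 1)
  # parallel arcs created by the merge are combined, weights summed;
  # loops (a variant citing another variant) are kept
  aggregate(w ~ from + to, data = arcs, FUN = sum)
}

# toy example: vertices 4 and 5 are variants of vertex 2
C <- c(1, 2, 3, 2, 2)
res <- shrinkArcs(from = c(1, 1, 3, 3), to = c(2, 4, 5, 1), C = C)
# indegrees in the shrunk network (sum of arc weights per target)
indeg <- tapply(res$w, res$to, sum)
```

In the toy example the merged work 2 collects all three citations that were split among its variants, which is exactly the effect wanted for WASSERMA_S(1994) or BOYD_D(2007)13.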

The paper by Boyd and Ellison was published in 2007:

Boyd, Danah M., and Nicole B. Ellison. "Social network sites: Definition, history, and scholarship." Journal of Computer-Mediated Communication 13.1 (2007): 210-230.

but in complete.WoS.file there are many cases where it is cited as published in 2008.

vertex      indegree   label

238102      263.0000   BOYD_D(2008)13:210
237556      475.0000   BOYD_D(2007)13:
197230     1702.0000   BOYD_D(2007)13:210

36101       344.0000   WASSERMA_S(1994)8:
228        4903.0000   WASSERMA_S(1994):

120        4239.0000   GRANOVET_M(1973)78:1360
1702        122.0000   GRANOVET_M(1973)78:6

566         753.0000   COLEMAN_J(1988)94:95
186381      987.0000   COLEMAN_J(1988)94:S95

Wasserman, Stanley, and Katherine Faust. Social Network Analysis: Methods and Applications. Vol. 8. Cambridge University Press, 1994.

Stanley Wasserman, Joseph Galaskiewicz (eds). Advances in Social Network Analysis: Research in the Social and Behavioral Sciences. Volume 171 of SAGE Focus Editions, 1994.

Mark S. Granovetter. The Strength of Weak Ties. The American Journal of Sociology, Vol. 78, No. 6. (May, 1973), pp. 1360-1380.

Coleman, James S. "Social capital in the creation of human capital." American Journal of Sociology 94 (1988): S95-S120.

To search for possible matches, use in Pajek:

Network / Info / Vertex Label -> Vertex Number

Coleman: join all the nodes whose labels contain the volume 94.
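Collecting the variant vertex numbers by hand, as in the transcript that follows, could also be done with a label pattern. The R sketch below assumes the vertex labels are already available as a character vector `labels` (reading them from the .net file is not shown); the labels in the example and the helper name `joinByPattern` are hypothetical.

```r
# Hypothetical sketch: map every vertex whose label matches a pattern to a
# chosen representative in the partition C.
joinByPattern <- function(C, labels, pattern, representative) {
  idx <- grep(pattern, labels)   # indices of the matching labels
  C[idx] <- representative
  C
}

# made-up labels for illustration
labels <- c("COLEMAN_J(1988)94:95", "GRANOVET_M(1973)78:1360",
            "COLEMAN_J(1988)94:S95", "COLEMAN_J(1988)94:", "COLEMAN_J(1990)")
C <- seq_along(labels)           # identity partition (integer)
# the integer literal 1L keeps C an integer vector
C <- joinByPattern(C, labels, "^COLEMAN_J\\(1988\\)94", 1L)
```

A pattern only proposes candidates: the matches should still be checked, since a prefix such as `WASSERMA_S(1994)` also matches the second Wasserman book.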

> setwd("C:/Users/batagelj/work/Python/WoS/SocNet/2018/WoS")
> nw <- 1297257
> C <- 1:nw
> # BOYD_D(2007)13
> boyd <- c(225412,237557,253023,267990,314522,318983,318984,330690,621030,681309,
+ 783044,808828,836828,1102876,236879,238103,260158,532563,858358,928706,1051372)
> C[boyd] <- 197231
> # WASSERMA_S(1994)
> wassA <- c(1828,1829,36102,39117,45520,70915,118356,123306,128170,147021,193933,
+ 209249,226138,237162,251826,284308,303496,305376,332211,350951,376831,391149,
+ 408894,430992,440728,474614,479488,480235,490563,492361,507755,530911,573762,
+ 627368,645260,679891,730780,756185,763830,772325,791201,835626,844662,866791,
+ 884135,901309,919191,924106,936035,1117245,1167229,1167818,1169381,1173129,
+ 1196725,1293925,1295854,1295961)
> C[wassA] <- 228
> wassB <- c(809731,138370)
> C[wassB] <- 14798
> # GRANOVET_M(1973)78
> gran <- c(1702,15444,33189,34144,76516,89155,94349,153939,163526,177693,194608,
+ 237055,246264,247622,259955,294934,327727,416147,420060,432480,584876,633580,
+ 894099,997315,1027981,1118604,1145175)
> C[gran] <- 120
> # COLEMAN_J(1988)94
> cole <- c(4153,9393,29764,88473,106270,161906,186382,252616,324404,376561,
+ 391662,434659,1053431,1104473,1225411,1225412)
> C[cole] <- 566
> out <- file("shrink.clu","w")
> cat(paste("*vertices ",nw),C,sep="\n",file=out)
> close(out)

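R's cat function wrote the values 100000, 200000, ..., 1000000 in shrink.clu in exponential form (1e+05, 2e+05, ..., 1e+06), so we corrected these values manually. The cause is that assignments such as C[boyd] <- 197231 turn the integer vector C <- 1:nw into a double vector, and cat prints such round doubles in scientific notation.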
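The manual correction can be avoided by formatting the values before writing them. A minimal sketch, assuming the same one-value-per-line .clu layout as in the transcript; the function name `writeClu` is ours.

```r
# Sketch: write a Pajek partition (.clu) without scientific notation.
writeClu <- function(C, fileName) {
  # format(..., scientific = FALSE) gives "100000" instead of "1e+05";
  # trim = TRUE avoids padding the values to a common width
  values <- format(C, scientific = FALSE, trim = TRUE)
  writeLines(c(paste("*vertices", length(C)), values), fileName)
}

tmp <- tempfile(fileext = ".clu")
writeClu(c(1, 100000, 200000, 1e6), tmp)
out <- readLines(tmp)
```

In the transcript above, `writeClu(C, "shrink.clu")` would replace the `file`/`cat`/`close` lines. Setting `options(scipen = 100)` before calling `cat`, or using integer literals such as `197231L`, has the same effect.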

Citation network

Boyd file: Operations - Network - Shrink network.

2-mode networks

We read the 2-mode networks WA (works × authors), WJ (works × journals) and WK (works × keywords). Using the Info button we get:

WA Rows=1297257, Cols=395972

WJ Rows=1297257, Cols=70425

WK Rows=1297257, Cols=32409

Read the shrink partition shrink.clu.

Select the WA network.
Partition / Create Identity partition [395972]
Select the shrink partition as First.
Select the identity partition as Second.
Partitions / Fuse partitions
Operations / Network + Partition / Shrink network [1][0]
Save the network as WAn.
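The effect of these steps on a 2-mode network can be sketched in R on an edge list of (work, author) pairs. This is our reading, not Pajek's code: only the work side is relabelled by the shrink partition `C`, while authors keep their own numbers, which is the role we take the identity partition on the second mode to play. Summing the weights of duplicated pairs is an assumption; keeping weight 1 would be an equally valid choice. `shrinkRows` is a hypothetical name.

```r
# Sketch: shrink only the row (work) side of a 2-mode network such as WA,
# given as (work, author) pairs; authors are left unchanged.
shrinkRows <- function(work, author, C) {
  edges <- data.frame(work = C[work], author = author, w = 1)
  # merged works may now share an author twice: combine, summing weights
  aggregate(w ~ work + author, data = edges, FUN = sum)
}

C <- c(1, 2, 1, 4)  # toy partition: work 3 is a variant of work 1
WAn <- shrinkRows(work = c(1, 3, 2, 4), author = c(10, 10, 11, 12), C = C)
```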

http://vlado.fmf.uni-lj.si/dl/WoS/SN17-sep.zip
http://vlado.fmf.uni-lj.si/dl/WoS/SN17new.zip

Year of publication partition

> all <- c(boyd,wassA,wassB,gran,cole)
> Y <- read.csv("./years.clu",header=FALSE,skip=2)$V1 
> Z <- Y[-all]; nN <- length(Z)
> out <- file("yearN.clu","w")
> cat(paste("*vertices ",nN),Z,sep="\n",file=out)
> close(out)

Number of pages vector

> Y <- read.csv("./NP.vec",header=FALSE,skip=2)$V1 
> N <- Y[-all]; nN <- length(N)
> out <- file("NPn.vec","w")
> cat(paste("*vertices ",nN),N,sep="\n",file=out)
> close(out)

DC partition

> Y <- read.csv("./DC.clu",header=FALSE,skip=2)$V1 
> D <- Y[-all]; nN <- length(D)
> out <- file("DCn.clu","w")
> cat(paste("*vertices ",nN),D,sep="\n",file=out)
> close(out)
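The three transcripts above repeat the same steps: read a partition or vector, drop the vertices that were merged into others, and write the result. A hypothetical helper in R could factor this out; it keeps `skip = 2` from the `read.csv` calls above and reuses the scientific-notation fix. It also guards against an R pitfall: `x[-integer(0)]` returns an empty vector rather than `x`.

```r
# Hypothetical helper: remove the shrunk vertices from a Pajek .clu/.vec
# file and write the remaining values with a new *vertices header.
dropShrunk <- function(inFile, outFile, removed, skip = 2) {
  # skip = 2 mirrors the read.csv calls above (two header lines assumed)
  vals <- read.csv(inFile, header = FALSE, skip = skip)$V1
  # x[-integer(0)] would drop everything, so handle the empty case
  kept <- if (length(removed) > 0) vals[-removed] else vals
  writeLines(c(paste("*vertices", length(kept)),
               format(kept, scientific = FALSE, trim = TRUE)), outFile)
  invisible(kept)
}

inF <- tempfile(); outF <- tempfile()
writeLines(c("% test", "*vertices 5", "1990", "2000", "100000", "2010", "1995"), inF)
kept <- dropShrunk(inF, outF, removed = c(2, 4))
```

With the files used here, `dropShrunk("./years.clu", "yearN.clu", all)` would correspond to the first transcript.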

http://vlado.fmf.uni-lj.si/dl/WoS/cluN.zip

WAn network

#GRANOVET_M(1973)78:1360 (21) GRANOVET_M - 63 GRANOVET_ - 22953

"#COLEMAN_J(1988)94:95" #WASSERMA_S(1994)171:" "#HASHTAG(2012):"