# **Bibliometrix**

## **Prerequisites and Documentation**  
- Install R
- Install the R kernel in Jupyter Lab (I followed the instructions at https://richpauloo.github.io/2018-05-16-Installing-the-R-kernel-in-Jupyter-Lab/):  
  1. Find the folder that contains R.exe on your computer (e.g. `C:\Program Files\R\R-3.4.3\bin`).
  2. Open Anaconda Prompt, move to that folder (`cd C:\Program Files\R\R-3.4.3\bin`) and type `R.exe` to start R.
  3. In R, type `install.packages("devtools")`.
  4. Download and install Git for Windows from the web (this step differs from the webpage, but it worked for me this way, probably because of Windows).
  5. Finally, in R, type `IRkernel::installspec()`.

### **Important Information**

**FIELD TAGS of Web Of Science core collection**
http://www.bibliometrix.org/documents/Field_Tags_bibliometrix.pdf  
**Main manual (for function references)**
https://cran.r-project.org/web/packages/bibliometrix/bibliometrix.pdf  
**Examples and explanations manual**
https://cran.r-project.org/web/packages/bibliometrix/vignettes/bibliometrix-vignette.html

## **Library execution and file reading** 
First we load the bibliometrix package and convert the BibTeX files downloaded from Scopus and WOS into dataframes in R.

In [None]:
library(bibliometrix)
library(ggplot2)
library(reshape2)
# options(jupyter.plot_mimetypes = c("text/plain", "image/png" ))
setwd("/Users/macadmin/Google Drive/Shared Folders/GW-ABM — Review/Bibliometric Analysis/")

### **I - New Database**

In [None]:
# First we set up the working directory (wd)
setwd("/Users/macadmin/Google Drive/GW-ABM — Review/Bibliometric Analysis/")

# We read databases from WOS and Scopus (BibTeX files in wd) 
SDB <- readFiles("Data/Floods/Floods_SCOPUS.bib")
WDB <- readFiles("Data/Floods/Floods_WOS1.bib")
WDB2 <- readFiles("Data/Floods/Floods_WOS2.bib")

# Converting BibTeX to Dataframes
SDB_DF <- convert2df(SDB, dbsource = "scopus", format = "bibtex")
WDB_DF <- convert2df(WDB, dbsource = "isi", format = "bibtex")
WDB_DF_2 <- convert2df(WDB2, dbsource = "isi", format = "bibtex")


# We merge dataframes and remove duplicates
#Merged <- mergeDbSources(WDB_DF, SDB_DF, remove.duplicated=TRUE)
Merged <- mergeDbSources(WDB_DF, SDB_DF, WDB_DF_2,remove.duplicated=TRUE)

# We remove duplicates by searching through the title field (then we use the DOI identifier in python, since NaN values could be dropped in R)
New_Merged <- duplicatedMatching(Merged, Field = "TI", tol = 0.90)
message(nrow(New_Merged), ' rows and ', ncol(New_Merged), ' columns')
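
To give a feel for what a title tolerance of 0.90 means, here is a minimal sketch in base R. It is not the implementation used by `duplicatedMatching`; it assumes a similarity defined as one minus the Levenshtein distance divided by the length of the longer title, and the helper names `title_similarity` and `flag_duplicates` as well as the titles are hypothetical.

```r
# Toy titles; the second is a near-duplicate of the first (hypothetical data)
titles <- c("AGENT-BASED MODELLING OF FLOOD RISK",
            "AGENT BASED MODELLING OF FLOOD RISK",
            "GROUNDWATER MANAGEMENT UNDER UNCERTAINTY")

# Normalized similarity: 1 - edit distance / length of the longer title
title_similarity <- function(x) {
  d <- adist(x)                               # pairwise Levenshtein distances (utils)
  longest <- outer(nchar(x), nchar(x), pmax)  # length of the longer string in each pair
  1 - d / longest
}

# Flag a title as a duplicate if it matches any EARLIER title at or above the tolerance
flag_duplicates <- function(x, tol = 0.90) {
  sim <- title_similarity(x)
  sim[upper.tri(sim, diag = TRUE)] <- 0       # keep only comparisons with earlier rows
  unname(apply(sim, 1, max) >= tol)
}

is_dup <- flag_duplicates(titles, tol = 0.90)
unique_titles <- titles[!is_dup]
unique_titles
```

Lowering `tol` removes more aggressively, so it is worth inspecting the flagged titles before dropping them.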

**Document Types Cleanup**

In [None]:
#  Here we erase rows in which column DT equals "BOOK"
M <- New_Merged[New_Merged$DT != "BOOK", ]
message(nrow(M), ' rows and ', ncol(M), ' columns')
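
One thing to watch with this kind of filter: if the `DT` column contains missing values, `!=` returns `NA` for them and the subset then contains all-`NA` rows. The sketch below uses a hypothetical toy data frame (`docs`) to show the difference between `!=` and an `%in%`-based filter; whether the real database has missing `DT` values is an assumption to check.

```r
# Toy data frame with a missing document type (hypothetical values)
docs <- data.frame(TI = c("A", "B", "C", "D"),
                   DT = c("ARTICLE", "BOOK", NA, "REVIEW"),
                   stringsAsFactors = FALSE)

# `!=` propagates NA, so the row with a missing DT comes back as an all-NA row
naive <- docs[docs$DT != "BOOK", ]

# `%in%` never returns NA, so the row with a missing DT is kept unchanged
safe <- docs[!(docs$DT %in% "BOOK"), ]

naive$TI
safe$TI
```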

**Metatag insertion into the Dataframe**

In [None]:
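# NOTE: this restarts from New_Merged, so the "BOOK" filter from the previous cell is not applied; skip this line to keep it
M <- New_Merged
# Here we check the cited references [CR] column to confirm that it uses ";" as separator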
#message(M$CR[1])  
message(nrow(M), ' rows and ', ncol(M), ' columns')

# This code retrieves FIELD TAGS (e.g. references authors) not included in the dataframe generated but still in the database and adds them as new columns
M <- metaTagExtraction(M, Field = "CR_AU", sep = ";")   # First author of each cited reference
M <- metaTagExtraction(M, Field = "CR_SO", sep = ";")   # Source of each cited reference
M <- metaTagExtraction(M, Field = "AU_CO", sep = ";")   # Country of affiliation for each co-author
M <- metaTagExtraction(M, Field = "AU1_CO", sep = ";")  # Country of affiliation for the first author
M <- metaTagExtraction(M, Field = "AU_UN", sep = ";")   # University of affiliation  for each co-author and the corresponding author
M <- metaTagExtraction(M, Field = "SR", sep = ";")      # Short tag of the document
message(nrow(M), ' rows and ', ncol(M), ' columns')

# Finally we create a .csv file for further cleaning of the database through Python
write.csv(M,'Merged_Dataframe_2.csv')

### **I.I - Database cleaning through Python**

### **I.II - Reupdate of database to Bibliometrix**

In [None]:
# After reading the file we set the proper type of each column. Be aware that the file may contain fewer column tags than listed here.
M <- read.csv(file = "Database.csv") # Or: "Merged_Dataframe_2_checked.csv"
M$DE<- as.character(M$DE)
M$SO<- as.character(M$SO)
M$AU_UN<- as.character(M$AU_UN)
M$AU_CO<- as.character(M$AU_CO)
M$AU<- as.character(M$AU)
M$AU1_CO<- as.character(M$AU1_CO)
M$AU_UN<- as.character(M$AU_UN)
M$AU_UN_NR<- as.logical(M$AU_UN_NR)
M$AU1_UN<- as.character(M$AU1_UN)
M$CR_AU<- as.character(M$CR_AU)
M$CR_SO<- as.character(M$CR_SO)
M$DT<- as.character(M$DT)
M$DT2<- as.character(M$DT2)
M$ID<- as.character(M$ID)
M$JI<- as.character(M$JI)
M$LA<- as.character(M$LA)
M$PN<- as.character(M$PN)
M$PP<- as.character(M$PP)
M$PU<- as.character(M$PU)
M$PY<- as.numeric(M$PY)
M$RP<- as.character(M$RP)
M$SN<- as.character(M$SN)
M$AB<- as.character(M$AB)
M$DB<- as.character(M$DB)
M$CR<- as.character(M$CR)
M$SR<- as.character(M$SR)
M$AR<- as.character(M$AR)
M$C1<- as.character(M$C1)
M$DI<- as.character(M$DI)
M$SR_FULL<- as.character(M$SR_FULL)
M$TC<- as.numeric(M$TC)
M$TI<- as.character(M$TI)
M$VL<- as.character(M$VL)

M$FU<- as.character(M$FU)
M$BN<- as.character(M$BN)
message('Database has ', nrow(M), ' rows and ', ncol(M), ' columns')
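
Because an assignment such as `M$FU <- as.character(M$FU)` will typically stop with an error when the column is missing from the CSV, a more forgiving variant is to coerce only the columns that are present. This is a minimal sketch, assuming a hypothetical helper `coerce_columns` and a toy data frame standing in for the result of `read.csv()`.

```r
# Coerce only the columns that are actually present in the data frame
coerce_columns <- function(df, char_cols, num_cols = character(0)) {
  for (col in intersect(char_cols, names(df))) df[[col]] <- as.character(df[[col]])
  for (col in intersect(num_cols, names(df)))  df[[col]] <- as.numeric(df[[col]])
  df
}

# Toy stand-in for the result of read.csv() (hypothetical values)
toy <- data.frame(TI = factor(c("PAPER A", "PAPER B")),
                  PY = c("2015", "2019"),
                  stringsAsFactors = FALSE)

# "AU", "FU" and "TC" are requested but absent, so they are skipped instead of raising an error
toy <- coerce_columns(toy, char_cols = c("TI", "AU", "FU"), num_cols = c("PY", "TC"))
str(toy)
```

With the real database, `char_cols` would hold the tags converted with `as.character()` above and `num_cols` would hold `PY` and `TC`.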

In [None]:
# Re-indexing, only if needed
M <- subset(M, select = -c(X) )
rownames(M) <- M$Unnamed..0
M <- subset(M, select = -c(Unnamed..0) )

### **II - Updating (previous) Database with new papers**

To add new papers to the database: first, we read the input BibTeX file. Then we export it as a CSV with all its columns. Finally, it is concatenated with the current database in CSV and read back into bibliometrix. 

In [None]:
# We read databases from WOS and/or Scopus (BibTeX files in wd) 
#WDB_new <- readFiles("Data/wos.bib")
SDB_new <- readFiles("Data/scopus_new_DB.bib")

# Converting BibTeX to Dataframes
#WDB_DF <- convert2df(WDB_new, dbsource = "isi", format = "bibtex")
SDB_DF <- convert2df(SDB_new, dbsource = "scopus", format = "bibtex")

# N <- WDB_DF
N <- SDB_DF

message('New database has ',nrow(N), ' rows and ', ncol(N), ' columns')
message('Old database has ', nrow(M), ' rows and ', ncol(M), ' columns')
Merged <- mergeDbSources(M, N)
message('Merged database now has ', nrow(Merged), ' rows and ', ncol(Merged), ' columns')


Merged <- metaTagExtraction(Merged, Field = "CR_AU", sep = ";")   # First author of each cited reference
Merged <- metaTagExtraction(Merged, Field = "CR_SO", sep = ";")   # Source of each cited reference
Merged <- metaTagExtraction(Merged, Field = "AU_CO", sep = ";")   # Country of affiliation for each co-author
Merged <- metaTagExtraction(Merged, Field = "AU1_CO", sep = ";")  # Country of affiliation for the first author
Merged <- metaTagExtraction(Merged, Field = "AU_UN", sep = ";")   # University of affiliation  for each co-author and the corresponding author
Merged <- metaTagExtraction(Merged, Field = "SR", sep = ";")      # Short tag of the document
message('Merged database now has ', nrow(Merged), ' rows and ', ncol(Merged), ' columns')

write.csv(Merged,'Updated_Database.csv')

### **II.I - Database cleaning through Python**

### **II.II - Reupdate of database to Bibliometrix**

In [None]:
# After reading the file we set the proper type of each column. Be aware that the file may contain fewer column tags than listed here.
M <- read.csv(file = "FL_ABM_DB.csv")
M$DE<- as.character(M$DE)
M$SO<- as.character(M$SO)
M$AU_UN<- as.character(M$AU_UN)
M$AU_CO<- as.character(M$AU_CO)
M$AU<- as.character(M$AU)
M$AU1_CO<- as.character(M$AU1_CO)
M$AU_UN<- as.character(M$AU_UN)
M$AU_UN_NR<- as.logical(M$AU_UN_NR)
M$AU1_UN<- as.character(M$AU1_UN)
M$CR_AU<- as.character(M$CR_AU)
M$CR_SO<- as.character(M$CR_SO)
M$DT<- as.character(M$DT)
M$DT2<- as.character(M$DT2)
M$ID<- as.character(M$ID)
M$JI<- as.character(M$JI)
M$LA<- as.character(M$LA)
M$PN<- as.character(M$PN)
M$PP<- as.character(M$PP)
#M$PU<- as.character(M$PU)
M$PY<- as.numeric(M$PY)
M$RP<- as.character(M$RP)
M$SN<- as.character(M$SN)
M$AB<- as.character(M$AB)
M$DB<- as.character(M$DB)
M$CR<- as.character(M$CR)
M$SR<- as.character(M$SR)
M$AR<- as.character(M$AR)
M$C1<- as.character(M$C1)
M$DI<- as.character(M$DI)
M$SR_FULL<- as.character(M$SR_FULL)
M$TC<- as.numeric(M$TC)
M$TI<- as.character(M$TI)
M$VL<- as.character(M$VL)

#M$FU<- as.character(M$FU)
#M$BN<- as.character(M$BN)
message('Database has ', nrow(M), ' rows and ', ncol(M), ' columns')

## **III - Outputs**

### **1. Descriptive Summary**  
Here we create a summary of the bibliographic data through several output tables and graphs.

In [None]:
# The variable "results" saves the bibliographic summary (here "k" indicates how many rows will be printed in each table)
results <- biblioAnalysis(M, sep = ";")     
options(width=100)
S <- summary(object = results, k = 20, pause = FALSE)
plot(x = results, k = 20, pause = FALSE)

### **2. Sankey Plot - Three Fields Plot**
This diagram visualizes the main items of three fields (e.g. authors, keywords, journals) and how they are related.

In [None]:
# Here "n" indicates the number of items to plot in each field, and width and height are in pixels
threeFieldsPlot(M, fields = c("SO", "DE", "AU1_CO"), n = c(20, 20, 20), width = 1000, height = 900)   # Sources (Journal) - Author Keywords - First author's country

In [None]:
# Here "n" indicates the number of items to plot in each field, and width and height are in pixels
threeFieldsPlot(M, fields = c("AU", "DE", "SO"), n = c(10, 11, 11), width = 1000, height = 900)   # Authors - Author Keywords - Sources (Journal)

In [None]:
# Here "n" indicates the number of items to plot in each field, and width and height are in pixels
threeFieldsPlot(M, fields = c("AU", "DE", "SO"), n = c(8, 11, 5), width = 1000, height = 900)   # Authors - Author Keywords - Sources (Journal)

### **3. Bibliographic Networks**  
Different bibliographic networks (matrices) can be built, considering "authors" (AU), "references" (CR_SO), "sources" (SO), "countries" (AU_CO + AU1_CO), "keywords" (ID), "author_keywords" (DE), "titles" (TI), or "abstracts" (AB). Outputs include main statistics and plots.

**NetworkPlot Function**  
- Normalize: The similarity index used to normalize the network. The association strength is also called the proximity index. The inclusion index, also called the Simpson coefficient, is an overlap measure used in information retrieval. The Jaccard index (or Jaccard similarity coefficient) gives a relative measure of the overlap of two sets. It is calculated as the ratio between the intersection and the union of the reference lists (of two manuscripts). The Salton index, instead, relates the intersection of the two lists to the geometric mean of the sizes of both sets. The square of the Salton index is also called the Equivalence index. The indices are equal to zero if the intersection of the reference lists is empty.


- Weighted:  This argument specifies whether to create a weighted graph from an adjacency matrix.   If it is NULL then an unweighted graph is created and the elements of the adjacency matrix give the number of edges between the vertices.  If it is a character constant then for every non-zero matrix entry an edge is created and the value of the entry is added as an edge attribute named by the weighted argument.  If it is TRUE then a weighted graph is created and the name of the edge attribute will be weight.
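
The overlap indices described under Normalize can be worked through on two small reference lists. This is a sketch in base R following the definitions given above; the reference lists `refs_a` and `refs_b` are hypothetical.

```r
# Reference lists of two hypothetical manuscripts
refs_a <- c("r1", "r2", "r3", "r4")
refs_b <- c("r3", "r4", "r5")

shared <- length(intersect(refs_a, refs_b))   # references in common
n_a <- length(refs_a)
n_b <- length(refs_b)

jaccard     <- shared / length(union(refs_a, refs_b))  # intersection over union
inclusion   <- shared / min(n_a, n_b)                  # Simpson coefficient
salton      <- shared / sqrt(n_a * n_b)                # intersection over geometric mean of sizes
equivalence <- salton^2                                # square of the Salton index

round(c(jaccard = jaccard, inclusion = inclusion,
        salton = salton, equivalence = equivalence), 3)
```

With two shared references out of five distinct ones, the Jaccard index is 0.4, while the inclusion index is higher (0.667) because it only compares against the shorter list.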

| Function Parameter |   Possibilities   |    Definition    |
|:--------------------:|:-------------------:|:------------------:|
|*1- Methods*| | |
|   **normalize**    | "association", "jaccard", "inclusion", "salton" or "equivalence"| Similarity index used to normalize the network: association strength or one of the other similarity indices|
|     **type**       | "circle", "sphere", "mds", "fruchterman", "kamada", "auto"         | Layout type of the network map|
|     **cluster**    | "none", "optimal", "louvain", "infomap", "edge_betweenness", "walktrap", "spinglass", "leading_eigen", "fast_greedy"         | Type of clustering to perform|
|     **degree**          |  *integer* or NULL        | Indicates the minimum frequency of a vertex. If different from NULL, n is ignored|
|     **weighted**          |  TRUE or NULL        |See description above|
|*2- Options*| | |
|     **noloops**          | TRUE or FALSE         | If TRUE, loops in the network are deleted|
|     **remove.multiple**          | TRUE or FALSE         | If TRUE, multiple links are plotted using just one edge|
|     **remove.isolates**          | TRUE or FALSE         | If TRUE, isolated vertices are not plotted|
|     **alpha**          | *numeric*  (0 to 1)     | Number from 0 (transparent) to 1 (opaque); default is 0.5|
|     **halo**          | TRUE or FALSE         | If TRUE, communities are plotted using different colors (default is FALSE)|
|*3- Labels*| | |
|     **label**          |  TRUE or FALSE        | Defines whether vertex labels are plotted|
|     **labelsize**          |  *integer*       | Label size in the plot (default is 1)|
|     **label.color**          |  TRUE or FALSE       |  If TRUE, the label color of each vertex is the same as its cluster|
|     **label.n**          |  *integer*    | Indicates the number of vertex labels to draw|
|     **label.cex**          |   TRUE or FALSE        | If TRUE, the label size of each vertex is proportional to its degree|
|*4- Vertex*| | |
|     **n**          |  *integer*        | Indicates the number of vertices to plot|
|     **size**          |  *integer*        | Indicates the size of each vertex (default is 3)|
|     **size.cex**          | TRUE or FALSE         | If TRUE, the size of each vertex is proportional to its degree|
|*5- Edges*| | |
|     **edgesize**          | *integer*         | Indicates the network edge size|
|    **edges.min**      | *integer*| Indicates the minimum frequency of edges between two vertices (if zero, all edges are plotted)|
|   **curved**   | TRUE, FALSE or a number between 0 and 1     | If TRUE, edges are plotted with an optimal curvature; a number sets the curvature directly (default is FALSE) |


####  **3.1 - Coupling Network: Analyzing citing documents**  
Two articles are said to be bibliographically coupled if at least one cited source appears in the reference lists of both articles. Since this depends on the number of references, `normalizeSimilarity` can be used when plotting the network afterwards. These networks can be built over references, authors, sources or countries.  
The strength of the bibliographic coupling of two articles, i and j, is defined simply by the number of references that the
articles have in common. **It can be calculated for: documents, authors, sources, keywords, and countries**
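
The coupling strength defined above can be computed by hand from a document-by-reference incidence matrix. This is a small sketch in base R with a hypothetical matrix `A`; it illustrates the definition only and does not reproduce the internals of `biblioNetwork`.

```r
# Rows are citing documents, columns are cited references (hypothetical toy data)
A <- matrix(c(1, 1, 0, 1,
              0, 1, 1, 1,
              1, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("refA", "refB", "refC", "refD")))

# Coupling strength: number of references two documents have in common
B <- A %*% t(A)
diag(B) <- 0     # drop self-coupling (each document's own reference count)
B
```

Here `doc1` and `doc2` share `refB` and `refD`, so their coupling strength is 2, while `doc2` and `doc3` share nothing and are not coupled.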


In [None]:
NetMatrix <- biblioNetwork(M, analysis="coupling", network="authors",sep = ";", shortlabel = TRUE)
net <- networkPlot( NetMatrix, Title="Authors' Coupling",
                   normalize="jaccard", type="auto", cluster="none", degree=NULL, weighted=NULL,                #methods
                   noloops=TRUE, remove.multiple=FALSE, remove.isolates = FALSE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 15,                            #labels
                   n=700, size=10, size.cex=FALSE,                                                                       #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 0)                                                               #edges

#### **3.2 - Co-citation Network: Analyzing cited documents**  
Two papers are linked (co-cited) if another paper cites both of them. Co-citation of two articles occurs when both are cited in a third article. Thus, co-citation is the counterpart of bibliographic coupling.  
The useful dimensions for commenting on co-citation networks are: (i) centrality and peripherality of nodes, (ii) their proximity and distance, (iii) strength of ties, (iv) clusters, (v) bridging contributions.  
**It can be calculated for sources, references or authors**
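
Co-citation counts can also be derived by hand from a document-by-reference incidence matrix, this time multiplying in the other direction. This is a sketch in base R with a hypothetical matrix `A`; it illustrates the definition and is not the code `biblioNetwork` runs internally.

```r
# Rows are citing documents, columns are cited references (hypothetical toy data)
A <- matrix(c(1, 1, 0, 1,
              0, 1, 1, 1,
              1, 0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("doc1", "doc2", "doc3"),
                            c("refA", "refB", "refC", "refD")))

# Co-citation: number of documents that cite both references of a pair
C <- t(A) %*% A
diag(C) <- 0     # drop the diagonal (how often each reference is cited on its own)
C
```

`refB` and `refD` are cited together by `doc1` and `doc2`, so their co-citation count is 2; `refA` and `refC` never appear in the same reference list.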

In [None]:
# Co-citation with sources uses CR_SO
## Our visualization algorithm treats each link as a spring and arranges the nodes to make links as short as possible
NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "sources", sep = ";")
net <- networkPlot( NetMatrix, Title="Co-Citation Network",
                   normalize="jaccard", type="auto", cluster="none", degree=NULL, weighted=NULL,                #methods
                   noloops=TRUE, remove.multiple=FALSE, remove.isolates = FALSE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 15,                            #labels
                   n=20, size=10, size.cex=TRUE,                                                                       #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 0)                                                              #edges

#### **3.3 - Collaboration Network**  
A scientific collaboration network is a network where nodes are authors and links are co-authorships, as the latter is one of the most well-documented forms of scientific collaboration. Collaboration networks show how authors, institutions (e.g. universities or departments) and countries relate to others in a specific field of research. **It can be built for authors, universities or countries**

In [None]:
# This one discovers regular study groups, hidden groups of scholars, and pivotal authors
NetMatrix <- biblioNetwork(M, analysis = "collaboration",  network = "authors", sep = ";")
net <- networkPlot( NetMatrix, Title="Author collaboration",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 100,                            #labels
                   n=100, size=10, size.cex=TRUE,                                                                       #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 1)                                                              #edges

#### **3.4 - Co-occurrences Network**  
This can be done over keywords (plus), authors, sources, author_keywords, titles or abstracts. For titles and abstracts, the term extraction algorithm must be run first so that a new column with the extracted terms is added to the dataframe.
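
As an illustration of what a co-occurrence matrix contains, the sketch below builds one by hand from keyword strings separated by ";". It uses base R only and hypothetical keyword strings (`de`); it is not the code that `biblioNetwork` runs.

```r
# Author-keyword strings as stored in a DE-like column (hypothetical values)
de <- c("FLOOD RISK;AGENT-BASED MODEL;ADAPTATION",
        "FLOOD RISK;INSURANCE",
        "AGENT-BASED MODEL;FLOOD RISK")

# Split each record on the separator and trim surrounding whitespace
kw_list <- lapply(strsplit(de, ";", fixed = TRUE), trimws)
terms <- sort(unique(unlist(kw_list)))

# Document x keyword incidence matrix (1 if the keyword appears in the document)
A <- t(vapply(kw_list, function(k) as.numeric(terms %in% k), numeric(length(terms))))
colnames(A) <- terms

# Keyword x keyword co-occurrence counts; the diagonal holds keyword frequencies
cooc <- crossprod(A)
cooc
```

"FLOOD RISK" and "AGENT-BASED MODEL" appear together in two records, so their co-occurrence count is 2.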

In [None]:
# Co-occurrence of authors in the author list of a document
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "authors", sep = ";")
net <- networkPlot( NetMatrix, Title="Authors Co-occurrences",
                   normalize="association", type="fruchterman", cluster="none", degree=NULL, weighted=NULL,                #methods
                   noloops=TRUE, remove.multiple=FALSE, remove.isolates = FALSE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=TRUE, label.color=FALSE, label.n = 15,                            #labels
                   n = 100, size=10, size.cex=FALSE,                                                                    #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 1)                                                              #edges

#### **3.5 - Descriptive analysis and statistics of networks**

In [None]:
# First we generate statistics
netstat <- networkStat(NetMatrix)
# Then we print the available structural properties of the network, and a short summary (showing "k" rows)
names(netstat$network)
summary(netstat, k=10)

#### **3.6 - Visualizing network (.net) files in VOSviewer**

In [None]:
# This code saves the "net" object returned by networkPlot into a Pajek network file named "vosnetwork.net" for VOSviewer
net2VOSviewer(net, vos.path="D:/Project R")

### **4. Co-Word Analysis: The conceptual structure of a field**  
Co-word analysis uses the most important words or keywords of documents to study the conceptual structure of a research field (it is the only method that uses the actual content of the documents to construct a similarity measure).  
It produces semantic maps of a field. Conceptual structure is often used to understand the topics covered by scholars (the so-called research front) and to identify the most important and most recent issues.
Here we map the conceptual structure by using the word co-occurrences in a bibliographic collection. The function performs Correspondence Analysis (CA) or Multiple Correspondence Analysis (MCA) to draw a conceptual structure of the field, and K-means clustering to identify clusters of documents which express common concepts.  
Outputs include: **Conceptual Structure Map, Topic Dendrogram, factorial map of the documents with the highest contributions and factorial map of the most cited documents**  

**Conceptual Structure Function**  

| Function Parameter |   Possibilities   |    Definition    |
|:--------------------:|:-------------------:|:------------------:|
|*1- Methods*| | |
|   **field**    | "ID", "DE", "ID_TM", "DE_TM", "TI" or "AB"| Terms extracted from Keywords Plus, author keywords, Keywords Plus stemmed through Porter's algorithm, author keywords stemmed through Porter's algorithm, terms extracted from titles and terms extracted from abstracts, respectively|
|     **method**       | "CA", "MCA" or "MDS"       | Indicates the factorial method used to create the factorial map: Correspondence Analysis, Multiple CA or Metric Multidimensional Scaling (default is MCA)|
|     **clust**          | "auto" or integer (2-8)     |Indicates the number of clusters to map |
|     **k.max**          | *integer* (max 20) | Indicates the maximum number of clusters to keep (default is 5)|
|     **stemming**          | TRUE or FALSE         | If TRUE, Porter's stemming algorithm is applied to all extracted terms (default is FALSE)|
|     **minDegree**          |  *integer*      |Indicates the minimum occurrences of terms to analyze and plot (default is 2)|
|     **labelsize**          | *integer*   | Indicates the label size in the plot (default is 10)|
|     **graph**          |  TRUE or FALSE        | If TRUE, the function plots the maps; otherwise they are saved in the output object (default is TRUE)|
|     **documents**          | *integer*      |Indicates the number of documents to plot in the factorial map, used for CA and MCA (default is 10).|
|*2- only for CA and MCA*| | |
|     **quali.supp**    | vector    |Vector indicating the indexes of the categorical supplementary variables, used only for CA and MCA |
|     **quanti.supp**          | vector   | Vector indicating the indexes of the quantitative supplementary variables, used only for CA and MCA |

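In [None]:
# Using the MCA method over Author Keywords (DE)
CS <- conceptualStructure(M, field="DE", method="MCA", clust="10", k.max = 10, stemming=FALSE, minDegree=2, labelsize=10, documents=10)

In [None]:
# Inspect the variables component of the result (CS must have been created by the cell above)
CS$res$var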

### **5. Thematic Evolution Analysis**  
It is based on co-word network analysis and clustering, and begins from the `thematicMap` function.

Co-word analysis draws clusters of keywords. They are considered as themes, whose density and centrality can be used in classifying themes and mapping in a two-dimensional diagram. Thematic map is a very intuitive plot and we can analyze themes according to the quadrant in which they are placed: (1) upper-right quadrant: motor-themes; (2) lower-right quadrant: basic themes; (3) lower-left quadrant: emerging or disappearing themes; (4) upper-left quadrant: very specialized/niche themes.
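
The quadrant reading described above can be written as a small classification rule. This is a sketch in base R, assuming the usual convention of centrality on the horizontal axis and density on the vertical axis, and assuming the quadrants are split at the median of each measure; the helper `classify_theme` and the theme values are hypothetical and not part of bibliometrix.

```r
# Assign a quadrant label from centrality (x axis) and density (y axis)
classify_theme <- function(centrality, density,
                           c_cut = median(centrality), d_cut = median(density)) {
  ifelse(centrality >= c_cut & density >= d_cut, "motor",
  ifelse(centrality >= c_cut & density <  d_cut, "basic",
  ifelse(centrality <  c_cut & density <  d_cut, "emerging or disappearing",
         "niche")))
}

# Hypothetical themes with made-up centrality and density values
themes <- data.frame(theme      = c("T1", "T2", "T3", "T4"),
                     centrality = c(0.9, 0.8, 0.1, 0.2),
                     density    = c(0.9, 0.1, 0.2, 0.8),
                     stringsAsFactors = FALSE)
themes$quadrant <- classify_theme(themes$centrality, themes$density)
themes
```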

In [None]:
message(M$CR[1])

In [None]:
Map=thematicMap(M, field = "DE", n = 250, minfreq = 5, stemming = FALSE, size = 0.5, n.labels=5, repel = TRUE)
plot(Map$map)

In [None]:
Clusters=Map$words[order(Map$words$Cluster,-Map$words$Occurrences),]
library(dplyr)
CL <- Clusters %>% group_by(.data$Cluster_Label) %>% top_n(5, .data$Occurrences)
CL

In [None]:
# First the thematic map function, then the plot (the thematic evolution step is not run here, so `years` is not used below)
years = c(2000, 2019)
res <- thematicMap(M, field = "ID", n = 250, minfreq = 5, size = 0.5, repel = TRUE)

plot(res$map)

### **6. Reference Publication Year Spectroscopy**  
Method used for detecting the Historical Roots of Research Fields.
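
The idea behind the method can be sketched in a few lines: count the cited references per publication year, then look for years whose count stands out from the surrounding years. The sketch below uses base R and hypothetical cited years; it assumes the common formulation in which each year is compared with the median of the five-year window around it, truncated at the ends of the series, which may differ in detail from what `rpys` computes.

```r
# Publication years of the cited references in a toy collection (hypothetical data)
cited_years <- c(1990, 1991, 1991, 1992, 1993, 1993, 1993, 1993, 1993, 1994, 1995, 1995)

years  <- seq(min(cited_years), max(cited_years))
counts <- as.numeric(table(factor(cited_years, levels = years)))

# Deviation of each year's count from the median of the window t-2 ... t+2
dev_from_median <- vapply(seq_along(years), function(i) {
  window <- counts[max(1, i - 2):min(length(counts), i + 2)]
  counts[i] - median(window)
}, numeric(1))

spectrum <- data.frame(year = years, citations = counts, deviation = dev_from_median)
spectrum
```

Years with a large positive deviation (1993 in this toy series) are candidates for the historical roots of the field.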

In [None]:
# Here sep is the separator used in the Cited References (CR) column of the dataframe. With timespan = NULL the whole time span is considered
res <- rpys(M, sep = ";", timespan = NULL, graph = TRUE)

### **7. Historical Direct Citation Network**  
The historiographic map represents a chronological network map of the most relevant direct citations resulting from a bibliographic collection. 

In [None]:
histResults <- histNetwork(M, min.citations = 10, sep = ";")
net <- histPlot(histResults, n=15, size = 10, labelsize=5)

### **8. Frequency Distributions and Dynamics**  
To continue the analysis, we first need to check the separator used to split information in the dataframe.

* With `citations` we calculate the frequency distribution of the most cited references or the most cited authors (only first authors for the WOS database), with field = "article" or field = "author" respectively
(i.e. from the reference lists, it finds the most cited articles (plus their source) or authors and returns them as a list)    

In [None]:
# Most cited articles and sources associated:
CR <- citations(M, field = "article", sep = ";")
cbind(CR$Cited[1:10]) 
cbind(CR$Source[1:10])

# Most cited authors
CR <- citations(M, field = "author", sep = ";")  
cbind(CR$Cited[1:10])

* Authors indexes and dominance

In [None]:
# Authors' indexes
authors=gsub(","," ",names(results$Authors)[1:10])
indices <- Hindex(M, field = "author", elements=authors, sep = ";", years = 50)
indices$H

# Authors' dominance can also be obtained through:
dominance(results, k = 10)
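
For reference, the h-index itself is simple to compute from a vector of citation counts: it is the largest h such that h papers have at least h citations each. This is a minimal sketch in base R with a hypothetical helper `h_index` and made-up counts; it is independent of the `Hindex` function used above.

```r
# h-index: largest h such that h papers have at least h citations each
h_index <- function(citations) {
  cites <- sort(citations, decreasing = TRUE)
  sum(cites >= seq_along(cites))   # count ranks whose citation count reaches the rank
}

# Citation counts of one hypothetical author
author_h <- h_index(c(25, 8, 5, 3, 3, 0))
author_h
```

Three papers have at least 3 citations each, but there are not four papers with at least 4, so the result is 3.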

* Top-Authors' Productivity over Time

In [None]:
# Here K is the number of authors
authorProdOverTime(M, k = 15, graph = TRUE)

* Sources can also be clustered through Bradford's law to obtain the most relevant journals

In [None]:
# See figure at the end
bradford(M)
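
The textbook reading of Bradford's law ranks sources by productivity and splits them into three zones that each hold roughly a third of the articles. The sketch below does this in base R on hypothetical article counts; it assumes this simple cumulative-share split, which may not match the exact zone boundaries computed by `bradford`.

```r
# Articles per source (hypothetical counts)
articles <- c(J1 = 28, J2 = 15, J3 = 10, J4 = 8, J5 = 7,
              J6 = 6, J7 = 5, J8 = 4, J9 = 3, J10 = 2)

ranked    <- sort(articles, decreasing = TRUE)
cum_share <- cumsum(ranked) / sum(ranked)

# Assign each source to one of three zones holding roughly a third of the articles each
zone <- cut(cum_share, breaks = c(0, 1/3, 2/3, 1),
            labels = c("Zone 1", "Zone 2", "Zone 3"), include.lowest = TRUE)

zones <- data.frame(source = names(ranked), articles = as.vector(ranked),
                    cum_share = round(as.vector(cum_share), 2), zone = zone,
                    stringsAsFactors = FALSE)
zones
```

In this toy example one source fills the core zone, two fill the second and seven fill the third, which is the widening pattern the law describes.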

In [None]:
# With local citations we measure how many times an author (or a document) included in this collection has been cited by other documents in the same collection
CR <- localCitations(M, sep = ";")    
CR$Authors[1:10,]
CR$Papers[1:10,]

* Sources Dynamics

In [None]:
# Top Sources Growth, considering the cumulative occurrences distribution
SW <- sourceGrowth(M, top = 5, cdf = TRUE)
DF=melt(SW, id='Year')
ggplot(DF,aes(Year,value, group=variable, color=variable))+geom_line()

## **OTHERS**

## **KeywordGrowth**  
`KeywordGrowth` calculates, for the top X keywords (DE or ID), the cumulative or per-year (non-cumulative) distribution of occurrences, returning a dataframe with one row per year.
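
The difference between the per-year and the cumulative distribution can be seen on a small example. This is a sketch in base R on hypothetical keyword occurrences (`occ`); it mirrors the two settings of the `cdf` argument but is not the function's own code.

```r
# One row per (year, keyword) occurrence (hypothetical toy data)
occ <- data.frame(
  Year    = c(2015, 2015, 2016, 2017, 2017, 2017),
  Keyword = c("FLOOD RISK", "ADAPTATION", "FLOOD RISK",
              "FLOOD RISK", "FLOOD RISK", "ADAPTATION"),
  stringsAsFactors = FALSE
)

# Per-year occurrences: years in rows, keywords in columns
per_year <- unclass(table(occ$Year, occ$Keyword))

# Cumulative occurrences: running total down each keyword column
cumulative <- apply(per_year, 2, cumsum)

per_year
cumulative
```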

In [None]:
KW_PDF <- KeywordGrowth(M, Tag = "DE", sep = ";", top = 4000, cdf = FALSE)
KW_CDF <- KeywordGrowth(M, Tag = "DE", sep = ";", top = 4000, cdf = TRUE)
write.csv(KW_PDF,'KW_PDF.csv')
write.csv(KW_CDF,'KW_CDF.csv')

**TERM EXTRACTION**

In [None]:
termExtraction(M, Field = "TI", stemming = FALSE,language = "english", remove.numbers = TRUE, 
               remove.terms = NULL,keep.terms = NULL, synonyms = NULL, verbose = TRUE)

# **Final Outputs**

In VOSviewer:
- The attraction/repulsion settings move the terms closer together or further apart
- The clustering options help reach the desired number of clusters to graph

In R:
- The number of elements displayed is crucial for visualization. 
- Leave type set to "auto" and then edit the layout in VOSviewer
- Using weighted = TRUE allows displaying either "total link strength" or "links" as a visualization
- The plot looks better with multiple edges and isolated vertices removed
- Setting degree distorts the result (n is then ignored)

## **Co-Occurrences of: Author Keywords**

In [None]:
# 1 - Full DB, Normal, weighted. I tried larger n (175, 200, 225, 250), but that only adds more small clusters 
# located far away from the central ones (non-related)
# Don't plot Principal Component Analysis and Dynamic Programming
# Specs: Clustering (resolution = 0.6, min cluster size = 1). Layout (2 Att. and 1 Rep.) Lines (min strength 3 and max lines 1483)
# 4 clusters, 1483 links and 2724 total link strength
# It might be a good idea to remove 3-5 terms that aren't actually well connected
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "author_keywords", sep = ";")
net <- networkPlot(NetMatrix, Title="Author Keywords Co-occurrences",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=TRUE, label.color=FALSE, label.n = 150,                            #labels
                   n = 150, size=10, size.cex=TRUE,                                                                   #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 0)                                                              #edges
#net2VOSviewer(net, vos.path="Outputs/DE_Co_occurrences/1")

In [None]:
netstat <- networkStat(NetMatrix, stat = "all", type = "authority")
# Then we print the available structural properties of the network, and a short summary (showing "k" rows)
names(netstat$network)
summary(netstat, k=50)
# Type: "degree", "closeness", "betweenness","eigenvector","pagerank","hub","authority"

In [None]:
# I think this one will be easier to understand once we do the whole revision, e.g. by identifying small clusters from each keystone
# paper or colouring each paper according to the colors in VOSviewer

# 2 - GW_ABM DATABASE, Normal, weighted, but less N
# Specs: Clustering (resolution = 0.03, min cluster size = 10). Layout (3 Att. and 1 Rep.) Lines (min strength 0 and max lines 1000)
# 3 clusters, 652 links and 677 total link strength
# It might be a good idea to remove 3-5 terms that aren't actually well connected
NetMatrix <- biblioNetwork(M, analysis = "co-occurrences", network = "author_keywords", sep = ";")
net <- networkPlot( NetMatrix, Title="Author Keywords Co-occurrences",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=TRUE, label.color=FALSE, label.n = 500,                            #labels
                   n = 500, size=10, size.cex=TRUE,                                                                   #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 0)                                                              #edges
net2VOSviewer(net, vos.path="Outputs/DE_Co_occurrences/2")

## **Co-Citation of Authors**

In [None]:
# There is an error in how the author names are being handled. Specifically, Lempert appears in 4 different ways, but the Authors
# column looks perfectly fine, so it is the cited references (CR) column that has the issue.

# 1 Full Database, weighted
# Specs: Clustering (resolution = 0.90, min cluster size = 1). Layout (4 Att. and 0 Rep.) Lines (min strength 0 and max lines 1000)
# 5 clusters, 21865 links and 75738 total link strength
# It might be a good idea to remove terms that aren't actually well connected

NetMatrix <- biblioNetwork(M, analysis = "co-citation", network = "authors", sep = ";")
net <- networkPlot( NetMatrix, Title="Authors Co-Citation Network",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 300,                            #labels
                   n=300, size=10, size.cex=TRUE,                                                                       #vertices/nodes 
                   edgesize=1, edges.min=0.5, curved = 0)
net2VOSviewer(net, vos.path="Outputs/Co-Citations/1")
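
The name-variant problem noted above can be inspected before building the network. This is a rough base-R sketch, assuming the cited references are stored as one string per document, separated by ";", with the first author at the start of each reference (as in WoS exports; Scopus references are laid out differently). The `cr` vector is invented toy data standing in for the cited references column of `M`.

```r
# Toy stand-in for the cited references column (invented strings).
cr <- c(
  "LEMPERT RJ, 2003, JOURNAL A; LEMPERT R, 2006, JOURNAL B",
  "LEMPERT R.J., 2013, JOURNAL C; HAASNOOT M, 2013, JOURNAL D"
)

# One element per cited reference, then keep the text before the first comma.
refs <- trimws(unlist(strsplit(cr, ";", fixed = TRUE)))
first_author <- trimws(sub(",.*$", "", refs))

# Count the spellings of one surname.
variants <- table(first_author[grepl("^LEMPERT", first_author)])
print(variants)

# A simple partial fix: drop the dots in initials, then recount.
cleaned <- gsub(".", "", names(variants), fixed = TRUE)
print(unique(cleaned))
```

Removing dots merges some spellings but not all of them (a single initial versus two initials still differ), so the remaining variants would need a manual mapping.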

## **Bibliographic Coupling of Authors**

In [None]:
# 2 Full Database, weighted
# Specs: Clustering (resolution = 1.05, min cluster size = 10). Layout (attraction 3, repulsion 0). Lines (min strength 1, max lines 1000)
# 6 clusters, 104650 links, total link strength 2456871, 500 elements
# Might be a good idea to remove terms that aren't well connected

NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "authors",sep = ";", shortlabel = FALSE)
net <- networkPlot( NetMatrix, Title="Authors' Coupling",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 500,                            #labels
                   n=500, size=10, size.cex=TRUE,                                                                       #vertices/nodes
                   edgesize=1, edges.min=0.5, curved = 0)
net2VOSviewer(net, vos.path="Outputs/Coupling/2")

In [None]:
# 3 (new) Full Database, weighted
# Specs: Clustering (resolution = 1.05, min cluster size = 10). Layout (attraction 3, repulsion 0). Lines (min strength 1, max lines 1000)
# 6 clusters, 104650 links, total link strength 2456871, 500 elements
# Might be a good idea to remove terms that aren't well connected

NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "authors",sep = ";", shortlabel = FALSE)
net <- networkPlot( NetMatrix, Title="Authors' Coupling",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 600,                            #labels
                   n=600, size=10, size.cex=TRUE,                                                                       #vertices/nodes
                   edgesize=1, edges.min=0.5, curved = 0)
net2VOSviewer(net, vos.path="Outputs/Coupling/3")

In [None]:
# 4 (new) Full Database, weighted
# Specs: Clustering (resolution = 1.05, min cluster size = 10). Layout (attraction 3, repulsion 0). Lines (min strength 1, max lines 1000)
# 6 clusters, 104650 links, total link strength 2456871, 500 elements
# Might be a good idea to remove terms that aren't well connected

NetMatrix <- biblioNetwork(M, analysis = "coupling", network = "authors",sep = ";", shortlabel = FALSE)
net <- networkPlot( NetMatrix, Title="Authors' Coupling",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE, label.n = 800,                            #labels
                   n=800, size=10, size.cex=TRUE,                                                                       #vertices/nodes
                   edgesize=1, edges.min=0.5, curved = 0)
net2VOSviewer(net, vos.path="Outputs/Coupling/4")
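
The three coupling cells above differ only in the node count and the output folder, so they could be driven from a small table. This is a sketch under assumptions: `export_coupling` is a hypothetical helper (not part of bibliometrix), it reuses the same `networkPlot` arguments as the cells above, and it only runs if `NetMatrix` already exists in the session. Note that the bibliometrix documentation describes `vos.path` as the folder where VOSviewer.jar is located, so it is worth checking that each output folder meets that expectation.

```r
# Run number and node count for each export (values taken from the cells above).
runs <- data.frame(run = 2:4, n = c(500, 600, 800))
runs$path <- file.path("Outputs/Coupling", runs$run)

# Hypothetical helper: plot the top-n coupling network and pass it to VOSviewer.
export_coupling <- function(NetMatrix, n, path) {
  net <- bibliometrix::networkPlot(NetMatrix, Title = "Authors' Coupling",
    normalize = "association", type = "auto", cluster = "none", degree = NULL, weighted = TRUE,
    noloops = TRUE, remove.multiple = TRUE, remove.isolates = TRUE, alpha = 0.5, halo = TRUE,
    label = TRUE, labelsize = 1, label.cex = FALSE, label.color = FALSE, label.n = n,
    n = n, size = 10, size.cex = TRUE,
    edgesize = 1, edges.min = 0.5, curved = 0)
  bibliometrix::net2VOSviewer(net, vos.path = path)
  invisible(net)
}

# Only executes when the coupling matrix from the cells above is available.
if (exists("NetMatrix") && requireNamespace("bibliometrix", quietly = TRUE)) {
  for (i in seq_len(nrow(runs))) {
    export_coupling(NetMatrix, runs$n[i], runs$path[i])
  }
}
```

Keeping the settings in one table makes it easier to record which VOSviewer statistics belong to which run.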

In [None]:
netstat <- networkStat(NetMatrix, stat = "all", type = "authority")
# Then we print the available structural properties of the network, and a short summary (showing "k" rows)
names(netstat$network)
summary(netstat,k=100)
# Options for type: "degree", "closeness", "betweenness", "eigenvector", "pagerank", "hub", "authority"

## **Collaboration**

In [None]:
# This analysis uses the AU_CO metatag (author countries), so that column must exist in M
NetMatrix <- biblioNetwork(M, analysis = "collaboration",  network = "countries", sep = ";")
net <- networkPlot( NetMatrix, Title="Country collaboration",
                   normalize="association", type="auto", cluster="none", degree=NULL, weighted=TRUE,                #methods
                   noloops=TRUE, remove.multiple=TRUE, remove.isolates = TRUE, alpha=0.5, halo=TRUE,                   #other options
                   label=TRUE, labelsize=1, label.cex=FALSE, label.color=FALSE,                            #labels
                   n = 30, size=10, size.cex=TRUE,                                                                     #vertices/nodes
                   edgesize=1, edges.min=0.5, curved = 0)                                                              #edges
net2VOSviewer(net, vos.path="Outputs/Collaboration/1")
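
Since the country network depends on the AU_CO column, it can help to create it only when it is missing. The sketch below assumes `M` is the bibliographic data frame used throughout. `ensure_au_co` is a hypothetical helper, `metaTagExtraction` is the bibliometrix function that derives AU_CO from the affiliation data, and the toy data frame is invented to show the pass-through case.

```r
# Hypothetical helper: add the AU_CO column only if it is not there yet.
ensure_au_co <- function(M) {
  if (!"AU_CO" %in% names(M)) {
    # Derives the country of each author's affiliation.
    M <- bibliometrix::metaTagExtraction(M, Field = "AU_CO", sep = ";")
  }
  M
}

# Toy data frame (invented) that already carries AU_CO: returned unchanged.
toy <- data.frame(AU = "SMITH J;ROSSI M", AU_CO = "USA;ITALY",
                  stringsAsFactors = FALSE)
toy_checked <- ensure_au_co(toy)
print(identical(toy_checked, toy))
```

With the real data, `M <- ensure_au_co(M)` would go right before the `biblioNetwork(..., network = "countries")` call.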

## **Network Stats**

In [None]:
netstat <- networkStat(NetMatrix, stat = "all", type = "authority")
# Then we print the available structural properties of the network, and a short summary (showing "k" rows)
names(netstat$network)
summary(netstat)
# Options for type: "degree", "closeness", "betweenness", "eigenvector", "pagerank", "hub", "authority"