# One-mode graph with directed and weighted edges
This is a basic workflow to analyse a one-mode network with Python.

Inspiration for network analysis in Python can be found in the documentation of the various libraries and packages.

More step-by-step descriptions and tutorials can be found here: https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python.

## The dataset in the context of the research project

My data comprises information about foreign journal reviews in psychiatry journals
from Europe, the United States, Japan and Russia between 1900 and 1916. I am interested
in pre-WWI psychiatry networks and their publication media. For the one-mode network
analysis module I will only use the sample of journal reviews made in the year
1902, because it is the most complete dataset and because it marks the year in which
the first journal of psychiatry was published in Japan.

Representing this data as a network was part of a broader research question about the transformation and transmission of psychiatric knowledge in the early 20th century, and more specifically at the time of the Russo-Japanese War (1904–5).
In this context, network analysis and visualisation served as a complementary approach alongside more traditional forms of historical research, and the results obtained from the digital methods served various purposes.

On the most basic level, network analysis and visualisation gave me a new perspective on my sources. Ideas on psychiatric knowledge circulated in a variety of scientific journals from very different disciplines, and the institutionalization of psychiatry as a discipline was not at the same stage in all the countries under consideration. Launching specialized journals can be considered part of the institutionalization process, and this step was taken somewhat later in Russia and Japan than in countries such as Germany and France. Investigating the relational properties of Russian and Japanese journals within the larger network of international psychiatric journals was one of the purposes of casting the data as a network.

It also allowed me to use the network in an exploratory way. By browsing the journal titles by country of publication, language, journal type, etc., I was able to obtain a good overview of the pre-WWI publication landscape. As mentioned above, the historical network does not only contain journals purely devoted to psychiatry. Psychiatrists published their work in general medicine journals (like The Lancet), in journals
specializing in pathology (like Virchow’s Archiv), and in journals of anthropology, psychology (like the
Annales médico-psychologiques), or philosophy. The network reveals psychiatry’s tight
bonds with neurology and pathology at the time, but it also foreshadows the emerging
influence of psychology on the field. This historical constellation differs slightly from psychiatry's present-day position among other fields of knowledge. The network as a whole thus serves to cast light on the historical development of the discipline and to understand psychiatric knowledge production in the context of this historical constellation.

Another, more specific, purpose was to use the network as a finding aid in order to limit the reading load. The publication landscape of early 20th-century psychiatry is vast, but not all journals were equally important to Japanese and Russian psychiatrists, who had their particular reading preferences and affiliations but also followed the popular journals of the time. In this sense, network analysis helped me to identify journals in
the network that are reviewed more often than others, which might be a sign of high
authority within the community. It also helped to identify journals
that served as intermediaries to bridge language barriers.
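One way to make the "intermediary" idea concrete is sketched below, assuming NetworkX, a hypothetical node attribute `language`, and edges that point from the reviewing journal to the reviewed journal (this direction convention is an assumption). For each reviewing journal it sums the weights of its reviews of journals published in another language; a high score suggests a journal that brought foreign-language work to its readers. The journal IDs are placeholders, and this is an illustration rather than the measure used in the project.

```python
import networkx as nx


def cross_language_reviews(G: nx.DiGraph, lang_attr: str = "language") -> dict:
    """Sum, for each reviewing journal, the weights of its edges to journals
    in a different language. Assumes edges run from reviewer to reviewed."""
    counts = {node: 0 for node in G.nodes}
    for source, target, weight in G.edges(data="weight", default=1):
        if G.nodes[source].get(lang_attr) != G.nodes[target].get(lang_attr):
            counts[source] += weight
    return counts


# Hypothetical toy network: journal IDs and languages are placeholders.
G = nx.DiGraph()
G.add_nodes_from([("A", {"language": "de"}), ("B", {"language": "ja"}), ("C", {"language": "de"})])
G.add_weighted_edges_from([("A", "B", 2), ("A", "C", 3), ("C", "B", 1), ("B", "A", 4)])

bridging = cross_language_reviews(G)
```

Here `B` scores highest because all of its reviewing crosses a language boundary, while `A`'s same-language review of `C` does not count. A path-based measure such as betweenness centrality is another way to look at intermediary roles.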


## Data content and structure

My data is stored in two different databases: a relational database (Access) and a graph database (Neo4j). Apart from the journals and journal reviews, the dataset also
contains information on article authors, journal editors, article reviewers and places
of publication. In Access, this data is stored in separate tables: records of people, places and
journals are kept apart in different tables but are connected through identifiers. The graph database is structured differently by design, but it contains the same information (and no null values).
Access can export the data as **CSV** files, and Neo4j can export it as **JSON**.

However, if the data does not come from a database and does not have the required structure, it will need to be put into shape first.

```python
# data wrangling for getting data into the appropriate structure; extracting nodes and edges.
```
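A minimal sketch of this wrangling step, assuming pandas and a hypothetical flat table with one row per review (the column names `reviewing_journal`, `reviewed_journal` and `year` are invented for illustration). It keeps the 1902 sample, counts repeated reviews between the same pair of journals as the edge weight, and derives a node list from the journals that occur in the edges.

```python
import pandas as pd

# Hypothetical raw records: one row per review (column names are assumptions).
reviews = pd.DataFrame(
    {
        "reviewing_journal": ["A", "A", "B", "A", "C"],
        "reviewed_journal": ["B", "B", "C", "C", "A"],
        "year": [1902, 1902, 1902, 1903, 1902],
    }
)

# Keep only the 1902 sample.
sample = reviews[reviews["year"] == 1902]

# Edge list: one row per (reviewing, reviewed) pair; weight = number of reviews.
edges = (
    sample.groupby(["reviewing_journal", "reviewed_journal"])
    .size()
    .reset_index(name="weight")
    .rename(columns={"reviewing_journal": "source", "reviewed_journal": "target"})
)

# Node list: every journal that appears as source or target.
nodes = pd.DataFrame({"id": sorted(set(edges["source"]) | set(edges["target"]))})
```

A node list built this way only contains journals with at least one edge in the sample; journals without reviews in 1902 would have to be added from a separate journal table.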

Even before I created the databases, I had decided to collect certain types of information. Since "knowledge transmission" was a key element in my research, I had planned to create a **directed network** that could represent knowledge flows. As follows from the research interest described above, I was also interested in the "intensity of the connections" between journals, which implies a **weighted network**. Several different network types are appropriate for exploration purposes. I have used the data to create a **multi-mode network** and a **one-mode network** to explore and analyse the data in different ways. This workflow focuses on the one-mode network type, with journals as nodes. A one-mode graph is a more abstract representation of the data and is therefore generally better suited for statistical analysis. Questions related to a journal's **influence** or **importance** in particular can be represented well through different types of node centrality in a one-mode graph. This does not exclude other graph types, but limiting the network to journals only (as opposed to multiple node types: journals, reviewers, places, review sections, etc.) can help to highlight the position of individual journals within the whole network of journals.
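To illustrate how "influence" or "importance" can map onto node centrality, the sketch below builds a small directed, weighted graph with NetworkX (an assumed library choice) from hypothetical edges that run from the reviewing to the reviewed journal. Weighted in-degree counts how often a journal was reviewed, plain in-degree counts by how many different journals, and betweenness centrality flags journals that lie on shortest paths between others.

```python
import networkx as nx

# Hypothetical edges: (reviewing journal, reviewed journal, number of reviews).
G = nx.DiGraph()
G.add_weighted_edges_from([("A", "B", 6), ("C", "B", 1), ("B", "D", 2), ("D", "C", 1)])

# How often each journal was reviewed (sum of incoming weights).
times_reviewed = dict(G.in_degree(weight="weight"))
# By how many different journals it was reviewed (count of incoming edges).
reviewed_by = dict(G.in_degree())
# Share of shortest paths passing through each journal (unweighted on purpose).
betweenness = nx.betweenness_centrality(G)

most_reviewed = sorted(times_reviewed, key=times_reviewed.get, reverse=True)
```

Betweenness is computed without weights on purpose: NetworkX treats the `weight` attribute as a distance in shortest-path measures, so passing review counts directly would make strongly connected journals look farther apart. Inverting the weights first (for example 1/weight) is one common workaround.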

If users don't have a clear understanding of how their research question can be phrased in (or "translated" into) the terms of network analysis, or lack a clear definition of the relationships they wish to study, the choice between different graph types will be difficult to make. Taking a data-centric approach, one could make this choice partly dependent on the data itself (if the relationships have no directionality, you cannot create a meaningful directed graph). However, there is no straightforward solution to this decision process when the goal is still unclear, as different approaches, or a combination of them, might be useful. A user could be more interested in data exploration or in statistical analysis, and they could be looking for patterns in flow or in structure, but these directions are not mutually exclusive, and when research is understood as experimental and open-ended, there can't be a strict guideline on how it should be done.
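As a sketch of such a data-centric check, assuming NetworkX: `nx.reciprocity` returns the share of directed edges whose reverse edge also exists. A value of 1.0 would mean every tie runs both ways, in which case an undirected graph might lose little; a low value indicates that direction carries information. The edge list is a hypothetical toy example.

```python
import networkx as nx

# Hypothetical toy edge list: A and B review each other, the other ties run one way.
G = nx.DiGraph([("A", "B"), ("B", "A"), ("A", "C"), ("C", "D")])

# Share of edges whose reverse edge also exists (1.0 = fully symmetric relation).
share_reciprocated = nx.reciprocity(G)
```

Reciprocity ignores weights, so a pair with six reviews in one direction and one in the other still counts as reciprocated; comparing the two weights would be a separate check.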

```python
# choosing one or several graph types, data sanity check
```
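A possible data sanity check is sketched below, assuming pandas and node/edge tables with the columns `id`, `source`, `target` and `weight` (names chosen for illustration). It reports edge endpoints missing from the node list, self-loops, duplicated source-target pairs, non-positive weights and isolated nodes, and leaves it to the researcher to decide which of these are actual errors.

```python
import pandas as pd


def check_network_data(nodes: pd.DataFrame, edges: pd.DataFrame) -> dict:
    """Flag common problems in a node list and an edge list (column names assumed)."""
    known = set(nodes["id"])
    endpoints = set(edges["source"]) | set(edges["target"])
    pairs = edges[["source", "target"]]
    return {
        # Edge endpoints that do not occur in the node list.
        "unknown_ids": sorted(endpoints - known),
        # Edges from a journal to itself.
        "self_loops": pairs[edges["source"] == edges["target"]].to_numpy().tolist(),
        # Repeated source-target pairs (the first occurrence is not listed).
        "duplicate_pairs": pairs[pairs.duplicated()].to_numpy().tolist(),
        # Number of weights that are zero or negative.
        "non_positive_weights": int((edges["weight"] <= 0).sum()),
        # Nodes without any edge.
        "isolated_nodes": sorted(known - endpoints),
    }


# Hypothetical toy data containing one example of each problem.
nodes = pd.DataFrame({"id": ["A", "B", "C", "D"]})
edges = pd.DataFrame(
    {
        "source": ["A", "A", "B", "A", "E"],
        "target": ["B", "B", "B", "C", "A"],
        "weight": [2, 1, 1, 0, 3],
    }
)
report = check_network_data(nodes, edges)
```

Whether a self-loop or an isolated journal is a mistake depends on the sources, which is why the function only flags these cases instead of removing them.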

## Data import with structured network data

All network analysis software requires the data to be in a certain structure.
If the data comes from a database, the first step usually involves querying the database and creating appropriate data files. This could also be a preprocessing step in the DHARPA software. Accessing a database (MySQL, Neo4j) or an API (PubMed, Twitter ...) from within the software would require specialised plug-ins.

```python
# import network data from relational database or API
```
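As a self-contained sketch of querying a relational database, the example below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the Access tables; the schema (`journals`, `reviews`, `reviewing_id`, `reviewed_id`) and the rows are hypothetical. The query joins the review records to the journal titles through their identifiers and aggregates them into a weighted edge list for one year. Reaching an actual Access file would typically go through an ODBC driver (for example with the `pyodbc` package) instead.

```python
import sqlite3

# In-memory SQLite database standing in for the Access tables (hypothetical schema).
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE journals (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE reviews (
        id INTEGER PRIMARY KEY,
        reviewing_id INTEGER REFERENCES journals(id),
        reviewed_id INTEGER REFERENCES journals(id),
        year INTEGER
    );
    INSERT INTO journals VALUES (1, 'Journal A'), (2, 'Journal B'), (3, 'Journal C');
    INSERT INTO reviews (reviewing_id, reviewed_id, year) VALUES
        (1, 2, 1902), (1, 2, 1902), (2, 3, 1902), (1, 3, 1903);
    """
)

# Join reviews to journal titles and count reviews per pair as the edge weight.
EDGE_QUERY = """
SELECT s.title AS source, t.title AS target, COUNT(*) AS weight
FROM reviews AS r
JOIN journals AS s ON s.id = r.reviewing_id
JOIN journals AS t ON t.id = r.reviewed_id
WHERE r.year = ?
GROUP BY s.title, t.title
ORDER BY s.title, t.title
"""
edges = con.execute(EDGE_QUERY, (1902,)).fetchall()
```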

One very common approach that is compatible with most network analysis software (Gephi, nodegoat, etc.) is to create a list of nodes (also called vertices) and a list of edges (also called links or ties).
In the network graph, the nodes are usually represented as circles and the edges as lines connecting the
circles. Typically, the two lists are stored as two separate CSV files. The edge list contains the relations,
and the node list is a complete list of the unique “objects” among which the relations exist.
In my project, the node list is a list of journal titles and their characteristics (or node
attributes); the edge list is a file of pairs of connected objects, stored in columns usually called “source” and “target”,
indicating that one journal was reviewed by the other. In
my project there is also a third column in the edge list called “weight”. It represents
the number of connections of the same kind (for example, journal A reviewing journal
B six times gives a weight of 6). There can also be additional columns with other edge attributes. Those
could denote different types of relations or specify the relations by other properties (time or location); for example, reviews could be subdivided into positive and negative ones. My full dataset contains different types of inter-journal relations, such as “reviewed”, “translated”, and “mentioned in bibliography”.
If the data is already in CSV format, it can usually be imported directly.
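When the edge list carries a relation-type column, one option is to build one directed, weighted graph per type, another is to sum the weights across types into a single graph. The sketch below assumes pandas and NetworkX and a hypothetical `type` column holding values such as “reviewed” and “translated”; the rows are toy data.

```python
import io

import networkx as nx
import pandas as pd

# Toy edge list with a hypothetical "type" column for the kind of relation.
edges = pd.read_csv(
    io.StringIO(
        "source,target,weight,type\n"
        "A,B,6,reviewed\n"
        "A,B,1,translated\n"
        "B,C,2,reviewed\n"
        "C,A,1,mentioned in bibliography\n"
    )
)

# Option 1: one directed, weighted graph per relation type.
graphs = {
    rel_type: nx.from_pandas_edgelist(
        group, source="source", target="target", edge_attr="weight", create_using=nx.DiGraph
    )
    for rel_type, group in edges.groupby("type")
}

# Option 2: ignore the type and sum the weights per source-target pair.
combined = edges.groupby(["source", "target"], as_index=False)["weight"].sum()
```

With `nx.DiGraph`, a repeated source-target pair within one type would keep only the last row's weight, so rows should be aggregated per type first. Keeping all types in one graph is also possible with `nx.MultiDiGraph`, which allows parallel edges between the same pair of journals, each with its own attributes.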

```python
# import network data from CSV files
```
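A minimal CSV import, assuming pandas and NetworkX: `io.StringIO` objects stand in for the node and edge files (in practice one would pass file paths, and the column names `id`, `country`, `language`, `source`, `target` and `weight` are assumptions). The edges become a directed, weighted graph, and the node list is added afterwards so that all node attributes and any journals without edges are kept.

```python
import io

import networkx as nx
import pandas as pd

# In-memory stand-ins for the node and edge CSV files (toy data, assumed column names).
nodes_csv = io.StringIO(
    "id,country,language\n"
    "A,Germany,de\n"
    "B,Japan,ja\n"
    "C,France,fr\n"
    "D,Russia,ru\n"
)
edges_csv = io.StringIO(
    "source,target,weight\n"
    "A,B,6\n"
    "B,A,1\n"
    "C,A,2\n"
)

nodes = pd.read_csv(nodes_csv)
edges = pd.read_csv(edges_csv)

# Directed, weighted graph from the edge list.
G = nx.from_pandas_edgelist(
    edges, source="source", target="target", edge_attr="weight", create_using=nx.DiGraph
)
# Add all nodes with their attributes; this also keeps journals that have no edges.
G.add_nodes_from(nodes.set_index("id").to_dict(orient="index").items())
```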