```python
import pandas as pd
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
```

#### Constructing Semantic and Product Networks

The edges in co-occurrence
networks are implicit: they are not given (and often not even obvious). You have
to deduce, extract, and calculate them from other data, which is a significant
departure from the relatively intuitive way you build social networks.
Co-occurrence networks are living proof that you can connect anything to
anything and make sense of the connections.

##### Semantic Networks

A semantic network is a network of nodes that represent terms—words, word
stems, word groups, or concepts—connected based on the similarity or
dissimilarity of their usage or meanings. Link terms that:
* Are commonly used together in the same place in text: same sentence,
paragraph, chapter, scene, act, list of keywords, list of interests in a social
network, and so on (“semantic” ↔ “network”)
* Describe the same property (“red” ↔ “blue”)
* Occupy the same semantic niche (synonyms: “program” ↔ “application”;
hypernyms: “pet” ↔ “cat”; antonyms: “erase” ↔ “restore”)

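If your network
has negatively weighted edges by construction (for example, edges that encode
dissimilarity), be prepared to remove them before analyzing the network,
because many network algorithms assume non-negative weights.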
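
A minimal sketch of that cleanup step, assuming the network is a NetworkX graph whose edges carry a `weight` attribute. The terms and weights below are invented for illustration.

```python
import networkx as nx

# Toy semantic network; the terms and weights are made up for illustration.
G = nx.Graph()
G.add_weighted_edges_from([
    ("program", "application", 0.9),  # synonyms: similar usage
    ("semantic", "network", 0.7),     # frequently used together
    ("erase", "restore", -0.8),       # antonyms: dissimilar meaning
])

# Collect the negatively weighted edges first, then remove them;
# a graph should not be modified while iterating over its edges.
negative = [(u, v) for u, v, w in G.edges(data="weight") if w < 0]
G.remove_edges_from(negative)
```

The nodes `erase` and `restore` stay in the graph as isolates; whether to drop them as well depends on the analysis.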

Knowledge specialists use semantic networks for graphical (and machine-readable)
knowledge representation, and social and behavioral researchers and
anthropologists use semantic networks for semantic domain analysis. Let’s have
a look at two not-so-typical semantic networks: a network of keywords for
fraud-related research papers and a network of characters from Othello.

##### Detect Food Fraud

Suppose you do research in accounting (namely, in fraud)
and want to know everything about fraud types. You understand that nobody
knows fraud better than other fraud researchers and fraudsters themselves. The
latter are typically off limits, but the former are well represented in numerous
databases of academic research papers. You could collect all research papers that
mention “fraud,” extract subject tags assigned to them by database editors, and
create a semantic network of the tags, based on their co-occurrence. The subject
tags (such as DNA and meat industry) are the nodes of the network. Two tag nodes
are adjacent if the tags are frequently assigned together to the same paper. For
example, the nodes food fraud and food safety are adjacent because many research
papers focus on food fraud and food safety.
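
The construction can be sketched as follows. The paper records and the `MIN_PAPERS` threshold are hypothetical; a real study would read the tags from a bibliographic database and choose its own cutoff for "frequently."

```python
from collections import Counter
from itertools import combinations

import networkx as nx

# Hypothetical records: each paper is reduced to its set of subject tags.
papers = [
    {"food fraud", "food safety", "DNA"},
    {"food fraud", "food safety", "meat industry"},
    {"food fraud", "meat industry", "DNA"},
    {"food fraud", "food safety"},
]

# Count how many papers carry each unordered pair of tags.
pair_counts = Counter()
for tags in papers:
    pair_counts.update(combinations(sorted(tags), 2))

MIN_PAPERS = 2  # assumed threshold for "frequently assigned together"

# Keep only the pairs that co-occur often enough.
G = nx.Graph()
for (a, b), n in pair_counts.items():
    if n >= MIN_PAPERS:
        G.add_edge(a, b, weight=n)
```

Sorting each tag set before pairing makes `("DNA", "food fraud")` and `("food fraud", "DNA")` count as the same pair.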

<img src="./images/food fraud.png" alt="Semantic network of subject tags from fraud-related research papers" />

##### Expose a Protagonist
The emerging field of digital humanities uses co-occurrence semantic networks
to analyze texts: plays, scripts, and other forms of prose and poetry. The method
allows us to identify the main and peripheral characters (see core-periphery
analysis); group characters and places (see Outline Modularity-Based
Communities); and eventually break down the storyline into scenes suitable, say,
for film or stage adaptation.

Let’s outline a semantic network construction from the text of Othello. After you
read the next chapter and the case studies, you will be able to implement the
algorithm in Python. This exercise is inspired by Measuring Tie Strength in
Implicit Social Networks [EG12].

1. You need a list of all characters. Othello is a short text; you can compose
the list by hand. Alternatively, find all references to Enter and Exit remarks;
or collect references to all characters as they speak if there is a property in
the text that identifies the characters. For example, a character may be
marked with an HTML tag, as in `<A NAME=speech1><b>RODERIGO</b></a>`.

2. You need a definition of co-occurrence. Play scripts are perfect from this
point of view: two characters co-occur if they occur in the same scene! In a
general text, co-occurrence may be based on paragraphs, sections, chapters,
pages, and so on.

3. Now that you have characters (nodes) and their co-occurrences (edges), you
can build a network. Remarkably, once constructed, this network is a social
network, of which you heard so much in Chapter 6, Understanding Social
Networks. The result is shown in the following figure.

4. Finally, you need a measure of importance. How do you know, indeed, who
is the protagonist of the story? Luckily, you have the whole box of network
centralities (Choose the Right Centralities) that you can apply to each node.
When you work with a social network, and the network in the figure is a
social one, the best importance measures are betweenness and eigenvector
centralities. In the figure, the node size is proportional to the eigenvector
centrality, and the node color reflects the betweenness centrality (the
darker, the more central). Both centralities are in good agreement:
Iago is the protagonist, not Othello.
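
The four steps above can be sketched on a toy input. The fragment below is made up: the `<h3>` scene headers are an assumed markup convention, and the scene contents are not taken from the play, so the ranking it prints says nothing about the real text.

```python
import re
from itertools import combinations

import networkx as nx

# Made-up fragment in the style of the tagged script. The <h3> scene
# headers and the scene contents are assumptions, not the real play.
html = """
<h3>SCENE I</h3>
<A NAME=speech1><b>RODERIGO</b></a>
<A NAME=speech2><b>IAGO</b></a>
<A NAME=speech3><b>BRABANTIO</b></a>
<h3>SCENE II</h3>
<A NAME=speech1><b>IAGO</b></a>
<A NAME=speech2><b>OTHELLO</b></a>
<A NAME=speech3><b>CASSIO</b></a>
<h3>SCENE III</h3>
<A NAME=speech1><b>IAGO</b></a>
<A NAME=speech2><b>RODERIGO</b></a>
<A NAME=speech3><b>OTHELLO</b></a>
"""

SPEAKER = re.compile(r"<A NAME=speech\d+><b>([^<]+)</b></a>")

# Steps 1 and 2: split the text into scenes and collect each scene's speakers.
scenes = [set(SPEAKER.findall(chunk))
          for chunk in re.split(r"<h3>SCENE [IVX]+</h3>", html)[1:]]

# Step 3: connect every pair of characters who share a scene;
# the edge weight counts the scenes they share.
G = nx.Graph()
for cast in scenes:
    for a, b in combinations(sorted(cast), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1)

# Step 4: rank the characters by two centralities (both unweighted here).
betweenness = nx.betweenness_centrality(G)
eigenvector = nx.eigenvector_centrality(G, max_iter=1000)
for name in sorted(G, key=eigenvector.get, reverse=True):
    print(f"{name:10s} {eigenvector[name]:.2f} {betweenness[name]:.2f}")
```

Only speaking characters are captured this way; silent characters would need the Enter and Exit remarks mentioned in step 1.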

##### Product Networks

A product network is a network of retail items. Network nodes in a product
network represent items purchased by individuals and co-occurring in their
shopping baskets or carts. You can connect two product nodes if customers often
or always buy the respective products together. We call such products
complements. Left and right shoes (if sold separately), nuts and bolts, nails and
hammers, and one-way airline tickets from Boston to Seattle and from Seattle to
Boston are good examples of complements: when you buy one, you almost
always buy the other as well.

Product networks can (but do not have to) be weighted: you can define the
weight of the edge as the frequency of co-purchasing. You can slice the
network later (see Slice Weighted Networks) to remove low-weight edges, if you
want.
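
One possible way to compute co-purchase weights is to multiply a basket-by-product incidence matrix by its transpose. This is a sketch: the purchase log, the column names, and the `THRESHOLD` value are hypothetical.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Hypothetical purchase log: one row per (basket, product) pair.
purchases = pd.DataFrame({
    "basket":  [1, 1, 2, 2, 2, 3, 3, 4, 4],
    "product": ["nuts", "bolts", "nuts", "bolts", "hammer",
                "nails", "hammer", "nails", "hammer"],
})

# Basket-by-product incidence matrix (1 if the basket contains the product).
incidence = pd.crosstab(purchases["basket"], purchases["product"]).clip(upper=1)

# Product-by-product co-purchase counts. The diagonal would hold each
# product's own purchase count, so zero it out to avoid self-loops.
B = incidence.to_numpy()
counts = B.T @ B
np.fill_diagonal(counts, 0)
co = pd.DataFrame(counts, index=incidence.columns, columns=incidence.columns)

# Weighted product network: edge weight = number of shared baskets.
G = nx.from_pandas_adjacency(co)

# Slice: keep only the edges at or above an assumed threshold.
THRESHOLD = 2
strong = [(u, v) for u, v, w in G.edges(data="weight") if w >= THRESHOLD]
H = G.edge_subgraph(strong).copy()
```

Here `G` is the full weighted network and `H` is the sliced one, which keeps only the product pairs bought together in at least two baskets.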

Sometimes product networks allow negatively weighted edges. If one of the
products in a pair is a reasonable replacement for the other—in some sense!—we
call them substitutes. If you live in Alaska and buy a husky to pull your sled,
then you probably won’t buy a reindeer for the same purpose, at least not at the
same time. (You can still get a reindeer as a pet.) A husky and reindeer are
substitutes; you can connect the respective nodes with a negatively weighted
edge to represent their substitutive nature.

##### Explore Your Pantry

To find a product network, look no further than your pantry.
When you buy prepared food (say, a can of baked beans), you buy an elaborate
concoction of ingredients: prepared beans, water, sugar, applewood smoked
bacon, molasses, textured vegetable protein, and many others. You can think of
the ingredients as separate products that happen to be packed together in the can.
They occur in the same place at the same time—therefore, they are excellent
candidates for becoming product network nodes. By constructing a product
network, you can learn which ingredient combinations are most common,
whether and how the ingredients group, and which ingredients are central to our
food.

You can collect data for a network of ingredients from the website of the United
States Department of Agriculture (USDA). There is no need to download all
several hundred thousand product descriptions. For starters, we suggest crawling
a couple of thousand pages—for example, 925 products with 356 distinct
ingredients.

For each node, we calculate its betweenness (color) and eigenvector (size)
centralities. The most central nodes represent the core ingredients.
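
A small sketch of this calculation, assuming the crawled data has already been reduced to a mapping from product to ingredient list. The four products below are hypothetical, not actual USDA records, and the scaling factor for node sizes is arbitrary.

```python
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical ingredient lists, not actual USDA records.
products = {
    "baked beans": ["beans", "water", "sugar", "bacon", "molasses"],
    "chili": ["beans", "water", "tomato"],
    "barbecue sauce": ["tomato", "sugar", "vinegar", "molasses"],
    "pickles": ["water", "vinegar", "sugar"],
}

# Two-mode network: products on one side, ingredients on the other.
# Tuple labels keep a product and an ingredient with the same name apart.
B = nx.Graph()
for product, ingredients in products.items():
    for ingredient in ingredients:
        B.add_edge(("product", product), ("ingredient", ingredient))

# Project onto the ingredients: two ingredients are adjacent if they share
# a product, and the weight counts the products they share.
ingredient_nodes = [n for n in B if n[0] == "ingredient"]
G = bipartite.weighted_projected_graph(B, ingredient_nodes)
G = nx.relabel_nodes(G, {n: n[1] for n in G})

# Centralities mapped to visual attributes: size and color.
btw = nx.betweenness_centrality(G)
eig = nx.eigenvector_centrality(G, max_iter=1000)
sizes = [3000 * eig[n] for n in G]   # arbitrary scaling for readability
colors = [btw[n] for n in G]
```

Passing `node_size=sizes` and `node_color=colors` to `nx.draw_networkx`, together with a sequential colormap such as `plt.cm.Greys`, would draw the more central ingredients larger and darker.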

##### Design a Do-It-Yourself Store

Networks of products are common in marketing analysis. Marketing specialists
construct product networks to reveal tightly coupled groups of products
frequently purchased together. Retailers may shelve the products of a
group close together in stores for ease of shopping. If someone buys a product
from a group, they may be reminded to buy other products from the same group.
Finally, a group of products may be a stepping stone in a long-term customer
project (for example, someone purchasing masonry products may be building a
garage and would later need carpentry tools and materials, followed by brushes
and paints).
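
One way to look for such groups is modularity-based community detection. The sketch below uses NetworkX's `greedy_modularity_communities`; the choice of algorithm is an assumption, not something this section prescribes, and the products and co-purchase counts are invented.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Invented co-purchase counts: two tight groups joined by one weak edge.
G = nx.Graph()
G.add_weighted_edges_from([
    ("cement", "bricks", 5), ("cement", "trowel", 5), ("bricks", "trowel", 5),
    ("paint", "brushes", 5), ("paint", "primer", 5), ("brushes", "primer", 5),
    ("bricks", "paint", 1),
])

# Each community is a set of products that are bought together more often
# than with the rest of the catalog.
groups = greedy_modularity_communities(G, weight="weight")
for group in groups:
    print(sorted(group))
```

On this toy network the masonry products and the painting products fall into separate groups; the single weak edge between `bricks` and `paint` hints at the kind of follow-up purchase described above.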