Let's put together a more advanced ConceptModel use case.

In this notebook we will write a quick script that builds a diagram of a "corporate network": a graph of companies branching out from IBM, our starting point.

In [102]:
from watsongraph.conceptmodel import ConceptModel
import time
import requests

We need some way of categorizing concepts according to whether or not they are companies. The following trick works well enough for our purposes, though as we shall see later, it's by no means foolproof.

In [103]:
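def categories_snapshot(concept):
    # Fetch the article HTML and keep everything from the category box onward.
    dat = requests.get('https://en.wikipedia.org/wiki/' + concept).text
    start = dat.find("<div id='catlinks' class='catlinks'>")
    # str.find returns -1 when the marker is missing; return '' in that case.
    return dat[start:] if start != -1 else ''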

In [104]:
'companies' in categories_snapshot('IBM')

True

In [105]:
def companies(l):
    ret = []
    for item in l:
        if 'companies' in categories_snapshot(item):
            ret.append(item)
    return ret

In [106]:
companies(['IBM', 'Microsoft', 'Apple', 'Apple Inc.', 'Apple pie'])

['IBM', 'Microsoft', 'Apple Inc.']
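
Because the check above is a plain substring test on raw HTML, any page whose tail happens to contain the word "companies" will pass. One possible tightening, sketched here under assumptions, is to test only the category names themselves. The helper names `category_names` and `looks_like_company` are hypothetical, and the link pattern is an assumption about how Wikipedia renders category links; the sample strings are made up so the sketch runs offline.

```python
import re

# Assumption: category links look like <a href="/wiki/Category:Some_name" ...>.
# Adjust the pattern if the page markup differs.
CATEGORY_LINK = re.compile(r'href="/wiki/Category:([^"]+)"')

def category_names(html):
    """Return the category names linked from a chunk of article HTML."""
    return [name.replace('_', ' ') for name in CATEGORY_LINK.findall(html)]

def looks_like_company(html):
    """True if any linked category name contains the word 'companies'."""
    return any('companies' in name.lower() for name in category_names(html))

# Made-up snippets standing in for real pages.
company_page = ('<a href="/wiki/Category:Computer_companies_of_the_United_States">'
                'Computer companies of the United States</a>')
other_page = ('<a href="/wiki/Category:Apple_dishes">Apple dishes</a>'
              '<p>sold by many companies</p>')

print(looks_like_company(company_page))
print(looks_like_company(other_page))
print('companies' in other_page)
```

The second snippet passes the raw substring test but not the category-name test. This narrows the check; it does not make it foolproof.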

At each step we will need to:

1. Augment the new nodes in the graph.
2. Pare the resulting graph down to companies.

Along the way we also need to keep track of the nodes we have already augmented, so that we don't waste time augmenting them again.

In [107]:
expanded_concepts = []

def iter_model(G):
    for concept in [concept for concept in G.concepts() if concept not in expanded_concepts]:
        G.augment(concept)
        expanded_concepts.append(concept)
    return G
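
To see the bookkeeping in `iter_model` without calling the remote service, here is a minimal offline sketch. `ToyModel` is a hypothetical stand-in for `ConceptModel` whose `augment` reads neighbours from a fixed dictionary, and `expand_once` mirrors `iter_model` but takes the record of expanded concepts as an argument (a set here, which is an assumption, not the notebook's choice).

```python
class ToyModel:
    """Hypothetical stand-in for ConceptModel, backed by a fixed link table."""

    def __init__(self, seeds, links):
        self._concepts = set(seeds)
        self._links = links

    def concepts(self):
        return sorted(self._concepts)

    def augment(self, concept):
        # Add the concept's neighbours, if the link table lists any.
        self._concepts.update(self._links.get(concept, []))

def expand_once(G, seen):
    """Augment every concept not yet in `seen`, then record it as expanded."""
    for concept in [c for c in G.concepts() if c not in seen]:
        G.augment(concept)
        seen.add(concept)
    return G

links = {'IBM': ['Intel', 'Linux'], 'Intel': ['Advanced Micro Devices']}
seen = set()
toy = expand_once(ToyModel(['IBM'], links), seen)
print(toy.concepts())
print(sorted(seen))
```

After one pass only `IBM` has been expanded; a second call to `expand_once(toy, seen)` would augment `Intel` and `Linux` and leave `IBM` alone.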

In [108]:
ibm = ConceptModel(['IBM'])
ibm = iter_model(ibm)
ibm.concepts()

['.NET Framework',
 'ARM architecture',
 'Advanced Micro Devices',
 'Application programming interface',
 'Berkeley Software Distribution',
 'C (programming language)',
 'Central processing unit',
 'Cloud computing',
 'Compiler',
 'Digital Equipment Corporation',
 'Fortran',
 'FreeBSD',
 'Graphical user interface',
 'Hard disk drive',
 'Hewlett-Packard',
 'IBM',
 'Intel',
 'Java (programming language)',
 'Library (computing)',
 'Linux',
 'Microprocessor',
 'MySQL',
 'Object-oriented programming',
 'Operating system',
 'Oracle Corporation',
 'Programming language',
 'SQL',
 'Server (computing)',
 'Solaris (operating system)',
 'Sun Microsystems',
 'Supercomputer',
 'Unix',
 'Unix-like',
 'X Window System',
 'X86',
 'X86-64',
 'XML']

In [109]:
def reduce_model(G):
    # Check the model that was passed in, not the global `ibm`.
    cmp = companies(G.concepts())
    for concept in [concept for concept in G.concepts() if concept not in cmp]:
        G.remove(concept)
    return G

In [110]:
ibm = reduce_model(ibm)

In [111]:
ibm.concepts()

['Advanced Micro Devices',
 'Digital Equipment Corporation',
 'Hewlett-Packard',
 'IBM',
 'Intel',
 'Oracle Corporation',
 'Sun Microsystems']
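
Note that `reduce_model` runs the company check on every concept in the graph, so concepts that survive one reduction are fetched again on the next. A possible way to avoid the repeated requests is to memoize the check. The sketch below uses `functools.lru_cache`; `make_cached_checker` and `fake_snapshot` are hypothetical names, and `fake_snapshot` stands in for `categories_snapshot` so that the example runs offline.

```python
from functools import lru_cache

def make_cached_checker(fetch_categories):
    """Wrap a page-fetching function so each concept is looked up at most once."""
    @lru_cache(maxsize=None)
    def is_company(concept):
        return 'companies' in fetch_categories(concept)
    return is_company

# Hypothetical stand-in for categories_snapshot that records each lookup.
calls = []
def fake_snapshot(concept):
    calls.append(concept)
    return 'Computer companies' if concept == 'IBM' else 'Desserts'

is_company = make_cached_checker(fake_snapshot)
results = [is_company(c) for c in ['IBM', 'Apple pie', 'IBM']]
print(results)
print(calls)
```

The second lookup of `IBM` is served from the cache, so only two fetches are recorded. In the notebook, one would pass `categories_snapshot` in place of the stand-in.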

Let's stitch our two operations together and try it out!

In [118]:
def step(G):
    return reduce_model(iter_model(G))

In [119]:
corporate_network = step(ibm)

In [120]:
corporate_network.concepts()

['Advanced Micro Devices',
 'Apple Inc.',
 'Cisco Systems',
 'Dell',
 'Digital Equipment Corporation',
 'Hewlett-Packard',
 'IBM',
 'Intel',
 'List of mobile network operators of Europe',
 'Motorola',
 'Nokia',
 'Oracle Corporation',
 'Samsung',
 'Sun Microsystems',
 'Texas Instruments',
 'Vodafone']

In [128]:
corporate_network = step(step(corporate_network))

In [129]:
corporate_network.concepts()

['AT&T',
 'Advanced Micro Devices',
 'Apple Inc.',
 'Avex Group',
 'Capcom',
 'Cisco Systems',
 'Comcast',
 'Dell',
 'Digital Equipment Corporation',
 'Fuji Television',
 'Goldman Sachs',
 'Hewlett-Packard',
 'Hudson Soft',
 'IBM',
 'Intel',
 'Konami',
 'Korean Broadcasting System',
 'List of mobile network operators of Europe',
 'London Stock Exchange',
 'Motorola',
 'NASDAQ',
 'Namco',
 'New York Stock Exchange',
 'Nokia',
 'Oracle Corporation',
 'Oricon',
 'Philips',
 'Public company',
 'Samsung',
 'Sega',
 'Seoul Broadcasting System',
 'Siemens',
 'Sony',
 'Sony Computer Entertainment',
 'Sony Music Entertainment Japan',
 'Sun Microsystems',
 'THQ',
 'Tesco',
 'Texas Instruments',
 'Ubisoft',
 'Vodafone']

In [130]:
expanded_concepts

['IBM',
 'Advanced Micro Devices',
 'Digital Equipment Corporation',
 'Hewlett-Packard',
 'Intel',
 'Apple Inc.',
 'Cisco Systems',
 'Dell',
 'Motorola',
 'Oracle Corporation',
 'Sun Microsystems',
 'Texas Instruments',
 'Nokia',
 'Samsung',
 'Vodafone',
 'AT&T',
 'Korean Broadcasting System',
 'List of mobile network operators of Europe',
 'London Stock Exchange',
 'Seoul Broadcasting System',
 'Sony']

Let's get greedy.

In [131]:
corporate_network = step(step(step(step(step(corporate_network)))))

In [133]:
len(corporate_network.concepts())

107
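
The nested `step(step(...))` calls above get harder to read as the depth grows. A small loop helper expresses the same thing; `repeat` is a hypothetical name, and the check below uses a trivial stand-in step so that it runs without the model.

```python
def repeat(step_fn, G, n):
    """Apply step_fn to G n times in a row and return the final result."""
    for _ in range(n):
        G = step_fn(G)
    return G

# Offline check with a stand-in step function that just adds one.
print(repeat(lambda x: x + 1, 0, 5))
```

With this helper, `repeat(step, corporate_network, 5)` would be equivalent to the five nested calls.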

In [136]:
corporate_network.edges()[:20]

[(0.9743249, 'London and North Eastern Railway', 'North Eastern Railway (UK)'),
 (0.9730256, 'Caledonian Railway', 'First ScotRail'),
 (0.97047436, 'Great Eastern Railway', 'London and North Eastern Railway'),
 (0.96971184, 'Advanced Micro Devices', 'Intel'),
 (0.9661626, 'Great Northern Railway (Great Britain)', 'Midland Railway'),
 (0.9650179, 'Shogakukan', 'Viz Media'),
 (0.964935,
  'London and North Eastern Railway',
  'Great Northern Railway (Great Britain)'),
 (0.9637322, 'Korean Broadcasting System', 'Seoul Broadcasting System'),
 (0.9636774, 'Vodafone', 'List of mobile network operators of Europe'),
 (0.9593524, 'First Great Western', 'Network Rail'),
 (0.95859057,
  'Great Northern Railway (Great Britain)',
  'North Eastern Railway (UK)'),
 (0.9582207, 'First Great Western', 'Great Western Railway'),
 (0.95611596, 'Hudson Soft', 'Nintendo'),
 (0.95601696, 'London and North Western Railway', 'Midland Railway'),
 (0.94842815,
  'Great Northern Railway (Great Britain)',
  'Londo

We will stop here for now.

In [140]:
corporate_network.neighborhood('Viz Media')

[('Fuji Television', 0.65638936),
 ('Sony Music Entertainment Japan', 0.60574347),
 ('TV Tokyo', 0.78721863),
 ('Avex Group', 0.50190413),
 ('Bandai', 0.5434743),
 ('Kodansha', 0.80848604),
 ('TV Asahi', 0.61470026),
 ('Tokyopop', 0.8365715),
 ('Square Enix', 0.51347756),
 ('Shueisha', 0.9482551),
 ('Shogakukan', 0.9650179),
 ('Viz Media', 1)]
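
The neighborhood comes back as unsorted `(concept, weight)` pairs and includes the concept itself with weight 1. A short sketch for picking out the strongest neighbours follows; `strongest_neighbors` is a hypothetical helper, and the data is a subset of the output above.

```python
# A few pairs copied from the neighborhood('Viz Media') output above.
neighborhood = [
    ('Fuji Television', 0.65638936),
    ('TV Tokyo', 0.78721863),
    ('Tokyopop', 0.8365715),
    ('Shueisha', 0.9482551),
    ('Shogakukan', 0.9650179),
    ('Viz Media', 1),
]

def strongest_neighbors(pairs, concept, k=3):
    """Return the k highest-weighted neighbours, skipping the concept itself."""
    others = [(name, w) for name, w in pairs if name != concept]
    return sorted(others, key=lambda pair: pair[1], reverse=True)[:k]

top = strongest_neighbors(neighborhood, 'Viz Media')
print(top)
```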