The [Concept Insights API](http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/concept-insights.html) is an IBM Bluemix-hosted service that exposes the powerful conceptual association and mapping technology behind the [IBM Watson](http://en.wikipedia.org/wiki/IBM_Watson) supercomputer to a wider developer audience.

The `watsongraph` module is a high-level abstraction of this service. At its core is the `ConceptModel`, a partial local reconstruction, built by querying Concept Insights, of the `wikipedia/en-20120601` cognitive graph: a machine-learned mapping of Wikipedia articles to concept nodes, constructed by IBM specifically for Concept Insights. This offers an interesting alternative to the usual link-based method of exploring the topical connections between Wikipedia articles, one based not on scattered "dumb" links but on "smart" learned relationships observed by IBM Watson.

This notebook is a basic demonstration of the module's capacities. It focuses in particular on the methods provided by the `ConceptModel` object. For a demonstration of this module's application as a recommendation engine see the [Recommendations notebook](./watsongraph - Recommendations.ipynb). The sequel to this notebook is the [Advanced Concept Modeling notebook](./watsongraph - Advanced Concept Modeling.ipynb): you should read this one first.

In [7]:
from watsongraph.conceptmodel import ConceptModel
import requests
import random
import time

## Instantiation

To create a new `ConceptModel` object we pass a list of concepts to be initialized.

We can use the `concepts()` method to check up on how many concepts we have inserted so far.

In [8]:
ibm = ConceptModel(['IBM'])
ibm.concepts()

['IBM']

At this point `ibm` is a `ConceptModel` with a single node in it: `IBM`. Now we shall extrapolate forward using `explode()`: this will run every concept in the model (in this case, just the one) through the `Concept Insights` API and attach the resultant graphs to our existing model.

In [9]:
ibm.explode()
len(ibm.concepts())

37

Use `edges()` to list relations amongst concepts in order of relevance.

In [10]:
ibm.edges()

[(0.89564085, 'IBM', 'Digital Equipment Corporation'),
 (0.8213564, 'IBM', 'X86'),
 (0.8081631, 'IBM', 'Fortran'),
 (0.80571836, 'IBM', 'Solaris (operating system)'),
 (0.803906, 'IBM', 'SQL'),
 (0.79933375, 'Supercomputer', 'IBM'),
 (0.79717726, 'IBM', 'X86-64'),
 (0.79349726, 'Advanced Micro Devices', 'IBM'),
 (0.780642, 'IBM', 'Sun Microsystems'),
 (0.7744718, 'IBM', 'Oracle Corporation'),
 (0.7431917, 'IBM', 'Operating system'),
 (0.7338766, 'IBM', 'Microprocessor'),
 (0.7300315, 'IBM', 'Unix'),
 (0.6878544, 'IBM', 'Compiler'),
 (0.6814177, 'IBM', 'Cloud computing'),
 (0.65513045, 'IBM', 'Berkeley Software Distribution'),
 (0.6541496, 'IBM', 'Intel'),
 (0.6436416, 'IBM', 'ARM architecture'),
 (0.62924486, 'Server (computing)', 'IBM'),
 (0.6270959, 'IBM', 'Hewlett-Packard'),
 (0.62568736, 'IBM', 'FreeBSD'),
 (0.6176665, 'IBM', 'Central processing unit'),
 (0.60758543, 'IBM', 'X Window System'),
 (0.60449404, 'Java (programming language)', 'IBM'),
 (0.59871125, 'IBM', 'MySQL'),
 (0.5
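The ordering shown above can be reproduced on any list of `(weight, source, target)` tuples. The following is a minimal sketch rather than `watsongraph`'s own implementation: the helper name `top_edges` and its `min_weight` cutoff are hypothetical, and the sample tuples are copied from the output above.

```python
# Hypothetical helper (not part of watsongraph): rank edge tuples by relevance.
def top_edges(edges, min_weight=0.0):
    """Return the edges at or above min_weight, strongest first."""
    kept = [e for e in edges if e[0] >= min_weight]
    return sorted(kept, key=lambda e: e[0], reverse=True)

# A few (weight, source, target) tuples taken from the ibm.edges() output.
sample = [
    (0.8081631, 'IBM', 'Fortran'),
    (0.89564085, 'IBM', 'Digital Equipment Corporation'),
    (0.6541496, 'IBM', 'Intel'),
    (0.8213564, 'IBM', 'X86'),
]

strong = top_edges(sample, min_weight=0.8)
for weight, source, target in strong:
    print(weight, source, target)
```

Keeping the weight as the first tuple element means a plain `sorted(edges, reverse=True)` would also work; the explicit `key` just makes the intent clearer.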

## Expansion

The Concept Insights API returns its results in order of their relevance to the concept at hand, but between the size of Wikipedia and the depth of IBM Watson's own cognitive understanding this can result in thousands of articles, far too many to manage. To keep the information firehose at a manageable level the Concept Insights service exposes two parameters, which are passed through by the `watsongraph` graph-expansion methods:
* `limit`: The maximum number of concepts to be returned. Can be any `int`. Throttled to 50 by default.
* `level`: The popularity threshold of the articles that will be returned, on a 0 (highest) to 5 (lowest) scale. Throttled to 0 by default.
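
To make the interaction of the two parameters concrete, here is a small local simulation. It is only a sketch under stated assumptions: the function `throttle`, the record layout, and the sample scores and popularity levels are all invented for illustration, and the real filtering happens on the Concept Insights side.

```python
# Hypothetical records: (relevance, title, popularity_level).
# Titles, scores, and levels are made up for illustration only.
def throttle(results, limit=50, level=0):
    """Keep results whose popularity level is within `level`, then cap at `limit`."""
    allowed = [r for r in results if r[2] <= level]   # 0 = most popular, 5 = least
    allowed.sort(key=lambda r: r[0], reverse=True)    # most relevant first
    return allowed[:limit]

results = [
    (0.99, 'Article A', 5),
    (0.95, 'Article B', 0),
    (0.90, 'Article C', 4),
    (0.85, 'Article D', 0),
    (0.80, 'Article E', 1),
]

level0 = throttle(results)                          # defaults: limit=50, level=0
level5_top3 = throttle(results, limit=3, level=5)   # everything, capped at 3
print([r[1] for r in level0])
print([r[1] for r in level5_top3])
```

Under these assumptions a level-0 query silently drops obscure articles however relevant they score, while a level-5 query lets them crowd out the popular ones once `limit` is reached.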

The basic `explode()` command is a level-0 query. What happens when we play with the levels a bit?

In [11]:
microsoft = ConceptModel(['Microsoft'])
microsoft.explode(limit=2000, level=1)
len(microsoft.concepts())

555

In [12]:
apple = ConceptModel(['Apple Inc.'])
apple.explode(limit=2000, level=5)
len(apple.concepts())

2000

The `microsoft` query took about 3 seconds to run, the `apple` query a little more than twice that. Notice that `apple` hit our `limit`: letting such a full-depth full-breadth command run in full would take several minutes and generate tens of thousands of results. Level-5 queries in general are mostly academic: as the following introspection shows they mostly generate a lot of junk.

In [13]:
apple.edges()[:20]

[(0.99995303, 'Apple Inc.', 'Apple Corporation'),
 (0.999906, 'Apple Computers Incorporated', 'Apple Inc.'),
 (0.99985904, "Apple's", 'Apple Inc.'),
 (0.99981207, 'Apple Inc.', 'Apple Inc. Slogans'),
 (0.9997651, 'Apple Inc.', 'Spruce Text List'),
 (0.99971807, 'Apple, inc.', 'Apple Inc.'),
 (0.9996711, 'Apple Inc.', 'Big fruit'),
 (0.9995853, 'Apple Inc.', 'FireWave'),
 (0.9995427, 'AppleShare IP Migration', 'Apple Inc.'),
 (0.99950033, 'Apple Inc.', 'QTSS Publisher'),
 (0.9994586, 'Apple Inc.', 'Nightmare 6'),
 (0.99941695, 'Apple Inc.', 'Machspeed'),
 (0.9993754, 'Apple Inc.', 'Apple Pugetsound Program Library Exchange'),
 (0.9993339, 'The Sheffield Institute for the Recording Arts', 'Apple Inc.'),
 (0.99929255, 'Nidomain', 'Apple Inc.'),
 (0.9992513, '/// Cheers!', 'Apple Inc.'),
 (0.99921036, 'Val Golding', 'Apple Inc.'),
 (0.9991697, 'Apple Inc.', 'William Martens'),
 (0.9991291, 'Apple Inc.', 'Cards (iOS)'),
 (0.9990886, 'Power Macintosh (second generation)', 'Apple Inc.')]

`Apple Corporation`, the top result, is just a redirect to `Apple Inc.`, as are several other top results. The `Sheffield Institute` is a minimally relevant article dominated by "Apple certification". Compare these outcomes to the much better `microsoft` ones:

In [14]:
microsoft.edges()[:20]

[(0.996133, 'Microsoft', 'Windows Live'),
 (0.9779484, '.NET Framework version history', 'Microsoft'),
 (0.97519916, 'Microsoft', 'Microsoft Office 2007'),
 (0.97378683, 'Windows Live Messenger', 'Microsoft'),
 (0.9682485, 'Microsoft', 'List of Microsoft Windows components'),
 (0.96686125, 'Windows Server 2008', 'Microsoft'),
 (0.9649106, 'Microsoft', 'Windows Presentation Foundation'),
 (0.96461135, 'Microsoft Exchange Server', 'Microsoft'),
 (0.96434224, 'Microsoft', 'Windows NT 4.0'),
 (0.9642824, 'Microsoft', 'Microsoft Visual Studio'),
 (0.9629393, 'Hotmail', 'Microsoft'),
 (0.9623143, 'Windows Server 2003', 'Microsoft'),
 (0.9609486, 'Features new to Windows Vista', 'Microsoft'),
 (0.9600303, 'Microsoft SharePoint', 'Microsoft'),
 (0.95917296, 'Windows 8', 'Microsoft'),
 (0.9589959, 'Active Directory', 'Microsoft'),
 (0.957729, 'Microsoft', 'Comparison of Microsoft Windows versions'),
 (0.957582, 'List of features removed in Windows Vista', 'Microsoft'),
 (0.9536003, 'Microsoft',

So far we've stuck to `explode()`; let's now explore the other two graph expansion methodologies provided by `watsongraph`.

First up, `augment()`. Although we started with `explode()`, `augment()` is actually the simplest of the three. It simply takes the concept you give it, runs it by the API, and merges the resulting graph into the existing one. That concept need not already be present in the graph!

Again, remember that `augment()` takes the same `limit` and `level` parameters as `explode()`, with the same defaults (50 and 0).
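
Conceptually, this merge can be pictured on a toy adjacency dictionary. The sketch below is not how `watsongraph` is implemented: the free function `augment` and the `related` argument (a stand-in for the API response) are hypothetical, and the weights are borrowed from outputs shown in this notebook.

```python
# Toy undirected graph stored as {concept: {neighbor: weight}}.
def augment(graph, concept, related):
    """Merge a star of (neighbor, weight) pairs around `concept` into `graph`."""
    graph.setdefault(concept, {})           # the concept need not exist yet
    for neighbor, weight in related:
        graph.setdefault(neighbor, {})
        graph[concept][neighbor] = weight
        graph[neighbor][concept] = weight   # undirected: store both directions
    return graph

graph = {
    'IBM': {'Supercomputer': 0.79933375},
    'Supercomputer': {'IBM': 0.79933375},
}

# `related` stands in for what a query on the new concept might return.
augment(graph, 'Watson (computer)', [
    ('IBM', 0.91326),
    ('Supercomputer', 0.91526705),
    ('Artificial intelligence', 0.9230226),
])
print(sorted(graph))
```

Existing nodes gain new edges, unseen nodes are created, and nothing already in the graph is removed.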

In [15]:
ibm.augment('Watson (computer)')
len(ibm.concepts())

62

Since `Watson (computer)` is a concept that's pretty closely related to `IBM`, we expect their nodes to overlap quite a bit.

Let's see by how much with an application of the `neighborhood()` command, which lists every concept directly connected to the chosen one.

In [16]:
ibm.neighborhood('Watson (computer)')

[(1, 'Watson (computer)'),
 (0.92557955, 'Rensselaer Polytechnic Institute'),
 (0.9230226, 'Artificial intelligence'),
 (0.91526705, 'Supercomputer'),
 (0.91326, 'IBM'),
 (0.893338, 'Wikipedia'),
 (0.84179896, 'Index of robotics articles'),
 (0.81731814, 'Game show'),
 (0.8082598, 'Association for Computing Machinery'),
 (0.8067764, 'Web search engine'),
 (0.8037912, 'Cognition'),
 (0.7920823, 'Semantics'),
 (0.779158, 'Institute of Electrical and Electronics Engineers'),
 (0.7453888, 'Carnegie Mellon University'),
 (0.7209754, 'Cloud computing'),
 (0.67921543, 'SQL'),
 (0.6643771, "List of minor The Hitchhiker's Guide to the Galaxy characters"),
 (0.66132975, 'Metadata'),
 (0.6553445, 'Hard disk drive'),
 (0.64754194, 'University of Massachusetts Amherst'),
 (0.63217145, 'Consciousness'),
 (0.61441857, 'Advanced Micro Devices'),
 (0.61389816, 'Sun Microsystems'),
 (0.5969063, 'Fortran'),
 (0.5954265, "Places in The Hitchhiker's Guide to the Galaxy"),
 (0.5881132, 'Database'),
 (0.5800

In [17]:
ibm_n = [t[1] for t in ibm.neighborhood('IBM')]
watson_n = [t[1] for t in ibm.neighborhood('Watson (computer)')]
watsonian_only_club = [c for c in ibm.concepts() if c in watson_n and c not in ibm_n]
watsonian_only_club

['2011 in the United States',
 'Artificial intelligence',
 'Association for Computing Machinery',
 'Carnegie Mellon University',
 'Cognition',
 'Computer program',
 'Computer science',
 'Consciousness',
 'Database',
 'Game show',
 'Index of robotics articles',
 'Institute of Electrical and Electronics Engineers',
 'List of game show hosts',
 'List of members of the National Academy of Engineering',
 "List of minor The Hitchhiker's Guide to the Galaxy characters",
 'Metadata',
 "Places in The Hitchhiker's Guide to the Galaxy",
 'Rensselaer Polytechnic Institute',
 'Robotics',
 'Semantics',
 'Syntax',
 'University of Massachusetts Amherst',
 'Web search engine',
 'Wikipedia']
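
The same comparison can be written with set operations, which avoid repeated scans of a list for every membership test. This is a sketch on short excerpts of the two neighborhoods; the variable names are illustrative, and the titles are taken from outputs shown earlier in this notebook.

```python
# Excerpts of the two neighborhoods, titles only.
ibm_titles = {'Supercomputer', 'Cloud computing', 'SQL', 'Fortran'}
watson_titles = {'Supercomputer', 'Cloud computing', 'SQL', 'Fortran',
                 'Game show', 'Cognition'}

only_watson = sorted(watson_titles - ibm_titles)   # set difference
shared = sorted(watson_titles & ibm_titles)        # set intersection
print(only_watson)
print(shared)
```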

Next let's try `expand()`. Expand is like a more focused `explode()`: it expands every concept in the model which has fewer than `n` edges, where `n` is an optional argument which defaults to 1. To summarize:

* `augment()` can net us more results much more quickly, but it only works on a single node.
* `expand()` works on as many as you let it.
* `explode()` works on *all* of them.

Again, the same `level` and `limit` parameters are present in all three of these methods.
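
The selection rule behind `expand()` can be sketched on a toy graph. This follows the rule exactly as worded above (fewer than `n` edges); whether the library's comparison is strict is not confirmed here, and the helper `expansion_targets` and the sample graph are hypothetical.

```python
# Toy undirected graph stored as {concept: set of neighbors}.
def expansion_targets(graph, n=1):
    """Return the concepts with fewer than n edges, per the rule stated above."""
    return sorted(c for c, neighbors in graph.items() if len(neighbors) < n)

graph = {
    'IBM': {'Watson (computer)', 'Supercomputer'},
    'Watson (computer)': {'IBM', 'Supercomputer', 'Game show'},
    'Supercomputer': {'IBM', 'Watson (computer)'},
    'Game show': {'Watson (computer)'},
    'Syntax': set(),
}

isolated = expansion_targets(graph)        # n=1: concepts with no edges at all
leaves = expansion_targets(graph, n=2)     # n=2: also single-edge concepts
print(isolated, leaves)
```

Raising `n` widens the set of concepts that get expanded, which is what makes `expand()` a tunable middle ground between `augment()` and `explode()`.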

In [18]:
ibm.expand()
len(ibm.concepts())

517

Of course at this point we are expanding concepts like "Wikipedia" and "Game show". So our results start to get pretty far away from where we started!

In [19]:
ibm.edges()[:20]

[(0.9824447,
  "Places in The Hitchhiker's Guide to the Galaxy",
  "List of minor The Hitchhiker's Guide to the Galaxy characters"),
 (0.9777767, 'C Sharp (programming language)', '.NET Framework'),
 (0.97736937, 'Intel', 'X86-64'),
 (0.9750814, 'List of linguists', 'Syntax'),
 (0.97479653, 'MySQL', 'PHP'),
 (0.9732626, 'Troy, New York', 'Rensselaer Polytechnic Institute'),
 (0.971852, 'Database', 'SQL'),
 (0.96971184, 'Advanced Micro Devices', 'Intel'),
 (0.96823686, 'Berkeley Software Distribution', 'FreeBSD'),
 (0.9644456, 'List of linguists', 'Semantics'),
 (0.9643342, 'MySQL', 'SQL'),
 (0.96209294, 'Unix-like', 'Berkeley Software Distribution'),
 (0.9582793, 'Unix-like', 'X Window System'),
 (0.95488656, 'Central processing unit', 'X86-64'),
 (0.95279807, 'Object-oriented programming', 'Ruby (programming language)'),
 (0.95201236, 'Robotics', 'Index of robotics articles'),
 (0.95163786, 'Unix', 'Berkeley Software Distribution'),
 (0.95121133, 'Unix-like', 'Unix'),
 (0.9511001, 'Un

## Deconstruction

So far we've talked at length about constructing graphs, but there may be times when you want to deconstruct them instead. There are two ways to do so. The simple way is by calling `remove()` on a single concept.

In [20]:
microsoft.add('Microsoft')
microsoft.remove('Microsoft')
len(microsoft.concepts())

554

The more complex but "funner" way is to call `abridge()`, which is an exact inverse operation to `augment()` (with the same parameters). Remember how we defined the `microsoft` model? The following result shouldn't come as a surprise!

In [21]:
microsoft.abridge('Microsoft', limit=2000, level=1)
len(microsoft.concepts())

0
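
One way to picture `abridge()` is as a set subtraction: whatever the equivalent query would return is removed from the model. The sketch below assumes exactly that and nothing more; the free function `abridge`, its `returned_titles` argument (a stand-in for the query result), and the toy edges are hypothetical.

```python
# Toy undirected graph stored as {concept: set of neighbors}.
def abridge(graph, returned_titles):
    """Remove every title in `returned_titles`, along with edges pointing to them."""
    doomed = set(returned_titles)
    for title in doomed:
        graph.pop(title, None)        # ignore titles that are not in the graph
    for neighbors in graph.values():
        neighbors -= doomed           # drop dangling edges in place
    return graph

graph = {
    'Microsoft': {'Windows Live', 'Hotmail'},
    'Windows Live': {'Microsoft', 'Hotmail'},
    'Hotmail': {'Microsoft', 'Windows Live'},
    'IBM': set(),
}

abridge(graph, ['Microsoft', 'Windows Live', 'Hotmail'])
print(graph)
```

A model built entirely from one query is emptied by abridging with the same query, while concepts that came from elsewhere survive.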

## Concept titles

Remember that any input we send to `watsongraph` should correspond *exactly* with the title of a Wikipedia article. Doing otherwise won't cause the command to fail, technically, but it will result in different output and cause problems down the road.

For example notice the difference between the following two constructions. Here `Bayes' theorem` is the proper name of a [Wikipedia article](https://en.wikipedia.org/wiki/Bayes'_theorem), while `Bayes' law`, a commonly-used synonym, is merely a [redirect](https://en.wikipedia.org/w/index.php?title=Bayes%27_Law&redirect=no) to the same.

In [22]:
one_bayes = ConceptModel(["Bayes' law"])
one_bayes.explode()
len(one_bayes.concepts())

38

In [23]:
another_bayes = ConceptModel(["Bayes' theorem"])
another_bayes.explode()
len(another_bayes.concepts())

26

We got different output! If you inspect the resultant models you'll also discover that the latter is a far closer match than the former (Watson processes redirects directly, resulting in somewhat arbitrary output).

You can ensure that you don't have this problem in one of two ways:
* By always making sure yourself that any input you use corresponds exactly with the title of a Wikipedia article (obviously).
* Or by using the `watsongraph`-provided correction method, the static `conceptualize` method.

In [24]:
from watsongraph.node import conceptualize
[conceptualize("Bayes' theorem"),
 conceptualize("Bayes' law"),
 conceptualize("Bayes"),
 conceptualize("Met"),
 conceptualize("IBM Watson"),
 conceptualize("Smithsonian Museum")]

["Bayes' theorem",
 "Bayes' theorem",
 'Thomas Bayes',
 'Metropolitan Opera',
 'Watson (computer)',
 'Smithsonian Institution']

## Serialization

`watsongraph` provides default bindings for saving to and loading from `JSON`. These take the form of the complementary `to_json()` and `load_from_json(data_repr)` object methods, which are useful for saving and loading your models.

In [25]:
ibm = ConceptModel(['IBM'])
ibm.explode()
ibm_data_repr = ibm.to_json()
ibm_data_repr

{'directed': False,
 'graph': {'name': 'compose( ,  )'},
 'links': [{'source': 0, 'target': 11, 'weight': 0.57220286},
  {'source': 32, 'target': 11, 'weight': 0.6176665},
  {'source': 27, 'target': 11, 'weight': 0.5745319},
  {'source': 26, 'target': 11, 'weight': 0.7338766},
  {'source': 30, 'target': 11, 'weight': 0.7300315},
  {'source': 1, 'target': 11, 'weight': 0.60758543},
  {'source': 28, 'target': 11, 'weight': 0.79717726},
  {'source': 11, 'target': 2, 'weight': 0.5351206},
  {'source': 11, 'target': 12, 'weight': 0.56133187},
  {'source': 11, 'target': 13, 'weight': 0.6436416},
  {'source': 11, 'target': 7, 'weight': 0.8081631},
  {'source': 11, 'target': 3, 'weight': 0.89564085},
  {'source': 11, 'target': 4, 'weight': 0.5565874},
  {'source': 11, 'target': 14, 'weight': 0.5598469},
  {'source': 11, 'target': 29, 'weight': 0.80571836},
  {'source': 11, 'target': 5, 'weight': 0.79349726},
  {'source': 11, 'target': 20, 'weight': 0.62924486},
  {'source': 11, 'target': 15, '

In [26]:
ibm_copy = ConceptModel()
ibm_copy.load_from_json(ibm_data_repr)
len(ibm_copy.edges())

36
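
Since `to_json()` returns a plain dictionary, persisting a model between sessions can be done with the standard `json` module. The sketch below uses a hand-built stand-in rather than a real model: the `'nodes'` entries are an assumption (the output above is cut off before they appear), the file name is arbitrary, and it presumes the representation holds only JSON-serializable built-in types.

```python
import json
import os
import tempfile

# Hand-built stand-in for the dict returned by to_json().
# The 'nodes' key is an assumption; only the other keys appear in the output above.
data_repr = {
    'directed': False,
    'graph': {'name': 'compose( ,  )'},
    'links': [{'source': 0, 'target': 1, 'weight': 0.89564085}],
    'nodes': [{'id': 'IBM'}, {'id': 'Digital Equipment Corporation'}],
}

path = os.path.join(tempfile.mkdtemp(), 'ibm_model.json')

with open(path, 'w') as f:
    json.dump(data_repr, f)        # write the representation to disk

with open(path) as f:
    restored = json.load(f)        # read it back, e.g. in a later session

print(restored == data_repr)
```

The restored dictionary could then be handed to `load_from_json()` on a fresh `ConceptModel`, as in the cell above.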

That's all!

For more advanced `ConceptModel` patterns see the [Advanced Concept Modeling notebook](./watsongraph - Advanced Concept Modeling.ipynb).

For an even higher-level, built-in abstraction of the `ConceptModel` object see the [Recommendations notebook](./watsongraph - Recommendations.ipynb).