FAQ

Rob Speer edited this page Nov 3, 2016 · 9 revisions

Here are answers to some frequently-asked questions, updated for ConceptNet 5.5.

The basics

What is ConceptNet?

ConceptNet is a knowledge graph of things people know and computers should know, expressed in various natural languages. See the main page for more details.

Is ConceptNet an AI? Can I talk to it?

ConceptNet is a resource. You can use it as part of making an AI that understands the meanings of words people use.

ConceptNet is not itself a chatbot. Some chatbot systems have used ConceptNet as a resource, but this is not a primary use case for ConceptNet.

How can I see what ConceptNet knows?

You can browse the knowledge graph at http://www.conceptnet.io/.

How do I use ConceptNet in my own code?

We recommend starting with the Web API. If you need a greater flow of information than the Web API provides, then consider downloading the data.
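As a rough sketch of what starting with the Web API can look like, the snippet below builds a lookup URL for a term and summarizes the edges in a response. The base URL `http://api.conceptnet.io`, the `/c/<language>/<term>` path layout, and the `edges`, `start`, `rel`, `end`, `label`, and `weight` field names are assumptions not stated in this FAQ, so check them against the API page. The sample response is hand-written for illustration and is not real output.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the Web API; confirm it on the API page.
API_ROOT = "http://api.conceptnet.io"


def concept_url(term, lang="en"):
    """Build a lookup URL for a term, e.g. 'tea kettle' -> .../c/en/tea_kettle."""
    slug = term.strip().lower().replace(" ", "_")
    return f"{API_ROOT}/c/{lang}/{quote(slug)}"


def summarize_edges(response):
    """Turn a decoded JSON response into (start, relation, end, weight) tuples."""
    return [
        (e["start"]["label"], e["rel"]["label"], e["end"]["label"], e["weight"])
        for e in response.get("edges", [])
    ]


def fetch_concept(term, lang="en"):
    """Fetch and decode one page of results (performs a network request)."""
    with urlopen(concept_url(term, lang)) as resp:
        return json.load(resp)


# Hand-written sample in the assumed response shape, for illustration only.
sample = {
    "edges": [
        {
            "start": {"label": "dog"},
            "rel": {"label": "IsA"},
            "end": {"label": "animal"},
            "weight": 2.0,
        }
    ]
}
print(concept_url("tea kettle"))
print(summarize_edges(sample))
```

Keeping the URL builder and the parser separate from `fetch_concept` means both can be tried without network access, and the same parser would work on downloaded data if it has the same shape.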

Comparisons to other projects

How does ConceptNet compare to WordNet?

This is an interesting comparison to make, as the projects have similar goals, and by now they both make use of multilingual linked data.

ConceptNet contains more kinds of relationships than WordNet. ConceptNet's vocabulary is larger and interconnected in many more ways. In exchange, it's somewhat messier than WordNet.

ConceptNet does only the bare minimum to distinguish word senses so far -- in the built graph of ConceptNet 5.5, word senses are only distinguished by their part of speech (similar to sense2vec). WordNet has a large number of senses for every word, though some of them are difficult to distinguish in practice.
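To illustrate what distinguishing senses only by part of speech can look like in practice, here is a minimal sketch that splits a concept URI into its parts. It assumes a URI layout of `/c/<language>/<term>` with an optional one-letter part-of-speech suffix such as `/n` or `/v`. That layout and the helper name `parse_concept_uri` are assumptions for illustration, not something this FAQ specifies.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    language: str
    term: str
    pos: Optional[str]  # e.g. "n" or "v"; None when no sense is specified


def parse_concept_uri(uri):
    """Split an assumed '/c/<lang>/<term>[/<pos>]' URI into its components."""
    parts = uri.strip("/").split("/")
    if len(parts) < 3 or parts[0] != "c":
        raise ValueError(f"not a concept URI: {uri!r}")
    pos = parts[3] if len(parts) > 3 else None
    return Concept(language=parts[1], term=parts[2], pos=pos)


# The same word as a noun and as a verb, plus the sense-agnostic form.
for uri in ("/c/en/run/n", "/c/en/run/v", "/c/en/run"):
    print(parse_concept_uri(uri))
```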

WordNet is too sparse for some applications. You can't build word vectors from WordNet alone. You can't compare nouns to verbs in WordNet, because they are mostly unconnected vocabularies.

ConceptNet does not assume that words fall into "synsets", sets of synonyms that are completely interchangeable. Synonymy in ConceptNet is a relation like any other. If you've worked with WordNet, you may have been frustrated by the implications of the synset assumption.

In ConceptNet, we incorporate as much of WordNet as we can while undoing the synset assumption, and we give it a high weight, because the information in WordNet is valuable and usually quite accurate.
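The difference from the synset assumption can be sketched in a few lines. Here synonymy is stored as an ordinary edge and looked up with the same function as any other relation. The edges are toy data invented for this example, and the relation names `/r/Synonym` and `/r/RelatedTo` are assumed; see the Relations page for the actual list.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    start: str
    rel: str
    end: str


# Toy edges, invented for illustration; not taken from ConceptNet's data.
edges = [
    Edge("/c/en/happy", "/r/Synonym", "/c/en/glad"),
    Edge("/c/en/happy", "/r/RelatedTo", "/c/en/smile"),
    Edge("/c/en/glad", "/r/Synonym", "/c/en/pleased"),
]


def neighbors(term, rel, edges):
    """Return terms linked to `term` by `rel`, following edges in either direction."""
    found = set()
    for e in edges:
        if e.rel != rel:
            continue
        if e.start == term:
            found.add(e.end)
        elif e.end == term:
            found.add(e.start)
    return found


# Synonymy is looked up exactly like any other relation...
print(neighbors("/c/en/happy", "/r/Synonym", edges))
print(neighbors("/c/en/happy", "/r/RelatedTo", edges))
# ...and terms are not merged into one interchangeable set: "happy" and
# "pleased" both link to "glad" but are not linked to each other.
print("/c/en/pleased" in neighbors("/c/en/happy", "/r/Synonym", edges))
```

Following edges in both directions is a simplification that suits symmetric relations like the two used here; a directed relation would need the direction preserved.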

How does ConceptNet compare to the Google Knowledge Graph?

ConceptNet is open. The Knowledge Graph isn't.

The Knowledge Graph seems to focus largely on things you can buy and things you can look up on Wikipedia. In ConceptNet we try to focus on words with general meanings, and much less on named entities. We want to understand all nouns, verbs, adjectives, and adverbs, not just proper nouns.

How does ConceptNet compare to the Microsoft Concept Graph?

ConceptNet is open. The Microsoft Concept Graph isn't. (You can download it, but their Terms of Use say you can't do anything with it: derivative works and commercial applications are forbidden.)

The Microsoft Concept Graph is a taxonomy of English nouns, connected with the "IsA" relation, with some automatic word sense disambiguation. Its data comes from machine reading of a Web search index. It resembles an automatically-generated version of OpenCyc.

How does ConceptNet compare to DBPedia?

DBPedia is very much focused on named entities. It's considerably messier than ConceptNet. Its edges are denser but its nodes are sparser: only terms that are titles of Wikipedia articles are represented in DBPedia.

ConceptNet imports a small amount of DBPedia, and also contains links to DBPedia and Wikidata.

How does ConceptNet compare to DBnary?

DBnary is a counterpart to DBPedia that's actually quite compatible with ConceptNet. Like ConceptNet, it focuses on word definitions rather than named entities, and it gets them from parsing Wiktionary.

Right now we use our own Wiktionary parser, which covers fewer Wiktionary sites than DBnary does but extracts more detail from each entry. We would gladly use DBnary instead, if DBnary starts extracting information such as links from definitions.

How does ConceptNet compare to (Open)Cyc?

Cyc is an ontology built on a predicate logic representation called CycL. CycL can enable very precise reasoning in a way that machine learning over ConceptNet doesn't. However, Cyc is intolerant of errors, and adding information to Cyc is a difficult task.

OpenCyc provides a hierarchy of types of things, with English names, some of which are automatically generated. It seems to be intended as a preview of the full Cyc system, which is not open.

Knowledge representation

How many statements (edges) are there in ConceptNet?

Approximately 28 million.

Does ConceptNet use logical predicates?

No. Its representation is words and phrases of natural language, and relations between them. Natural language can be vague, illogical, and incredibly useful.

How many languages is ConceptNet in?

The data that ConceptNet is built from spans many languages, with a long tail of marginally-represented languages. Ten languages have core support, and 304 languages are supported in total. See Languages for a complete list.

ConceptNet is missing facts.

This will always be true. We use machine-learning techniques, including word embeddings, to learn generalizable things from ConceptNet despite the incompleteness of the knowledge it contains.
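A toy sketch of why embeddings help with missing facts: terms with similar vectors can be treated as similar even when no edge connects them directly. The three-dimensional vectors below are made-up values chosen for illustration; they are not ConceptNet data, and this is not a description of how ConceptNet's own embeddings are built.

```python
import math

# Toy 3-dimensional vectors with made-up values; real embeddings have
# hundreds of dimensions and are learned, not written by hand.
vectors = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.8, 0.9, 0.2],
    "carburetor": [0.1, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest(term, vectors):
    """Return the other term whose vector is most similar to `term`'s."""
    others = (t for t in vectors if t != term)
    return max(others, key=lambda t: cosine(vectors[term], vectors[t]))


print(nearest("kitten", vectors))
```

In this toy setup, a fact recorded for "cat" but missing for "kitten" could still be guessed for "kitten" because the two vectors point in nearly the same direction.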

ConceptNet contains false information.

There will probably always be isolated mistakes or falsehoods in ConceptNet. Our data sources and our processes are not perfect. Machine learning can be relatively robust against errors, as long as the errors are not systematic.

If you've identified a systematic source of errors in ConceptNet, that is more important. It would probably improve ConceptNet to get rid of it. In that case, please go to the 'Issues' tab and describe it in an issue report.

What are the relations represented in ConceptNet? What do they mean?

See the table on the Relations page of this wiki.

Where do the edge weights in ConceptNet come from?

They are made-up numbers, programmed into the reader modules that import the various sources of knowledge. The weights are a rough heuristic for which statements you should trust more than others.
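Because the weights are a rough trust heuristic rather than calibrated probabilities, one plausible way to use them is to rank or threshold statements. The sketch below does that on hand-written statements. The weight values and the `min_weight` cutoff are invented for illustration and do not come from ConceptNet.

```python
# Hand-written statements with invented weights, for illustration only.
statements = [
    {"text": "a dog is an animal", "weight": 4.0},
    {"text": "a dog is a kind of chair", "weight": 0.5},
    {"text": "a dog can bark", "weight": 2.0},
]


def most_trusted(statements, min_weight=1.0):
    """Keep statements at or above min_weight, highest weight first."""
    kept = [s for s in statements if s["weight"] >= min_weight]
    return sorted(kept, key=lambda s: s["weight"], reverse=True)


for s in most_trusted(statements):
    print(f'{s["weight"]:>4}  {s["text"]}')
```

Any cutoff is a judgment call for your application; the weights only say which statements to trust more relative to others.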

Can I add new information to ConceptNet?

During the golden age of crowdsourcing (the decade of the 2000s), ConceptNet accepted direct contributions of knowledge. This was a great start, but now the opportunities for improving ConceptNet have changed, and we are content to leave crowdsourcing to the organizations that are really good at it, like the Wikimedia Foundation.

If you contribute to Wiktionary and follow their guidelines, the information you contribute will eventually be represented in ConceptNet.

What I mean is, can I make my own version of ConceptNet that includes information that I need in my domain?

Well, you can reproduce ConceptNet's build process using Docker and change the code to import a new source of data. This may or may not accomplish what you want.

What ConceptNet is designed for is representing general knowledge. Making a useful domain-specific semantic model is a rather different process, in our experience. The software we built on top of ConceptNet to make this possible eventually became our company, Luminoso. Luminoso provides software as a service that creates domain-specific semantic models, which make use of ConceptNet so they can start out knowing what words mean and just have to learn what's different in your domain.

Technologies

What kind of database does ConceptNet use?

We've tried a lot of them. Currently PostgreSQL.

Why not a graph database? Why not [insert new database name here]?

Probably one of the following reasons:

  • It isn't as efficient as PostgreSQL
  • It doesn't actually work as advertised
  • It is no longer maintained
  • It doesn't provide a documented workflow for importing a medium-sized graph such as ConceptNet
  • It takes more than a day to import a medium-sized graph such as ConceptNet
  • It inflates the size of the data it stores by a factor of more than 10
  • It assumes every user has access to and wants to use a distributed computing cluster
  • It doesn't run well inside a container
  • It's not free software
  • It has an unacceptable restriction on it that would prevent people from reusing ConceptNet, such as the GPL or "academic use only"

If you think you know of a database that doesn't fail one of these criteria, I'd still be interested to hear about it.

Is ConceptNet "big data"?

It fits on a hard disk, so no. It's enough data for many purposes. But text is small.

If you have textual knowledge that actually requires distributed computation, you work at a company that does Web search.

Is there a graph visualization of ConceptNet?

You're asking about a visualization like this, right?

Notice that that graph is a few thousand times smaller than ConceptNet and it's already an incomprehensible rainbow-colored hairball. I am not convinced that any existing technology can put all of ConceptNet in one meaningful image, although there may be an approach that involves spreading it out into local clusters using t-SNE.

It will almost certainly involve custom code -- ConceptNet makes off-the-shelf graph visualizers collapse under the insoluble problem of laying out its edges. I'm interested in making such a visualization, but the result has to be informative, not just a hairball.

Can ConceptNet be queried using SPARQL?

Heck no. SPARQL is computationally infeasible. A SPARQL endpoint is a denial-of-service attack. Similar projects that use SPARQL have unacceptable latency and go down whenever anyone starts using them in earnest.

The way to query ConceptNet is using a rather straightforward REST API, described on the API page. If you need to make a form of query that this API doesn't support, open an issue and we'll look into supporting it.
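As a hedged sketch of what such a REST query might look like, the snippet below builds a URL that asks for edges matching a node and a relation. The base URL, the `/query` path, and the `node` and `rel` parameter names are assumptions not given in this FAQ; the API page is the authority on what is actually supported.

```python
from urllib.parse import urlencode

# Assumed base URL and endpoint; verify both against the API page.
API_ROOT = "http://api.conceptnet.io"


def query_url(**criteria):
    """Build a query URL from keyword criteria, skipping any that are None."""
    params = {k: v for k, v in sorted(criteria.items()) if v is not None}
    # Keep "/" readable so URIs such as /c/en/dog appear unescaped.
    return f"{API_ROOT}/query?{urlencode(params, safe='/')}"


print(query_url(node="/c/en/dog", rel="/r/IsA"))
```

A request like this is a single filtered lookup, which is much cheaper to serve than an arbitrary graph-pattern query.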

AI hype

I heard that ConceptNet has the intelligence of a 4-year-old. Is this true?

Blame science reporting for doing what it usually does. There's a nugget of truth in there surrounded by a big wad of meaningless AI hype. It's true that ConceptNet 4 could compete with 4-year-olds on a particular question-answering task -- and ConceptNet 5 performs much better on a similar task. This is cool. It doesn't mean that anyone's about to make robot children.

Here's the background: A much older version of ConceptNet, ConceptNet 4, was evaluated on some intelligence tests involving question-answering and sentence comprehension. The researchers who performed these tests compared ConceptNet's performance to that of a 4-year-old child.

We found the comparison odd but flattering. 4-year-old children are incredible beings. They have desires, goals, and imagination, and they can communicate them in their spoken language with a level of competence that second-language learners have to put tremendous effort into achieving. No real AI system can come close to emulating the range of things a child can do.

When it comes to the narrower task of answering questions, though, it's believable that ConceptNet 4 performed comparably to a 4-year-old. We're always interested in measurably improving the general intelligence contained in ConceptNet. Excitingly, we now have a question-answering task on which ConceptNet 5 performs comparably to a 17-year-old: answering SAT-style analogy questions.

But there is much more to be done. The Story Cloze Test is a test of story understanding on which any human can score close to 100% in their native language. Natural language AI systems, including ConceptNet, have not yet surpassed 60% on this test.