Commit

[old datahub][m]: copied over DBLP, YAGO and open corporates datasets…
… from old datahub - refs datahubio/datahub-v2-pm#214

Also refs #29
anuveyatsu committed Jul 17, 2018
1 parent 963e928 commit a74198b
Showing 4 changed files with 94 additions and 1 deletion.
5 changes: 4 additions & 1 deletion README.md
@@ -6,8 +6,9 @@ keywords: Data Collections, Climate Change, Economic Data, Geodata, Inflation, L

The awesome section presents collections of high quality datasets organized by topic.

# Table of Contents
# Collections

* [Bibliographic data](/awesome/bibliographic-data)
* [Climate Change](/awesome/climate-change)
* [Demographics (population)](/awesome/demographics)
* [Economic Data and Indicators](/awesome/economic-data)
@@ -19,9 +20,11 @@ The awesome section presents collections of high quality datasets organized by t
* [Linked Open Data](/awesome/linked-open-data)
* [Logistics](/awesome/logistics-data)
* [Machine Learning / Statistical](/awesome/machine-learning-data)
* [Open Corporates](/awesome/opencorporates)
* [Property Prices](/awesome/property-prices)
* [Reference Data](/awesome/reference-data)
* [Stock Market Data](/awesome/stock-market-data)
* [War and Peace](/awesome/war-and-peace)
* [Wealth, Income and Inequality](/awesome/wealth-income-and-inequality)
* [World Bank](/awesome/world-bank)
* [YAGO](/awesome/yago)
30 changes: 30 additions & 0 deletions bibliographic-data.md
@@ -0,0 +1,30 @@
---
title: Bibliographic data
description: Existing databases or services providing substantial bibliographic data
keywords: DBLP data
date: 2018-07-17
modified: 2018-07-17
---

## The DBLP Computer Science Bibliography

The DBLP computer science bibliography contains the metadata of over 1.8 million publications, written by over 1 million authors in several thousand journals or conference proceedings series.

Although DBLP started with a focus on database systems and logic programming (hence the acronym), it has grown to cover all disciplines of computer science.

### Data

The resources below list the full dump of the DBLP XML records (see http://dblp.uni-trier.de/xml/). A [simple DTD](http://dblp.uni-trier.de/db/about/dblp.dtd) is available.

The paper "[DBLP - Some Lessons Learned](http://dblp.uni-trier.de/xml/docu/dblpxml.pdf)" documents technical details of this XML file. In the appendix ["DBLP XML Requests"][paper] you may find the description of a primitive DBLP API.

[paper]: http://dblp.uni-trier.de/xml/docu/dblpxml.pdf

### Openness: OPEN

As of 2011-12-09 this data is open (released under ODC-By). See the license information in the [Readme.txt](http://dblp.uni-trier.de/xml/README.txt) and the announcement post: http://openbiblio.net/2011/12/09/dblp-releases-its-1-8-million-bibliographic-records-as-open-data/

### Data and Resources

* [DBLP XML records (Full dump in xml (gzipped))](http://dblp.uni-trier.de/xml/dblp.xml.gz)
* [DBLP DTD - The XML file references this DTD.](http://dblp.uni-trier.de/xml/dblp.dtd)
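
Because the full dump is a single large XML file, it is usually read as a stream rather than loaded whole. The following is a minimal sketch of that idea, not part of the dataset or the DBLP tooling. The set of record element names is an assumption based on the DTD, and `count_records_in_dump` and its path argument are hypothetical. The real dump also relies on character entities declared in the DTD, so a DTD-aware parser may be needed in practice; the sample here avoids entities.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: top-level record element names as declared in the DBLP DTD.
RECORD_TAGS = {
    "article", "inproceedings", "proceedings", "book",
    "incollection", "phdthesis", "mastersthesis", "www",
}


def count_records(stream):
    """Count publication records per type while streaming through the XML."""
    counts = Counter()
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag in RECORD_TAGS:
            counts[elem.tag] += 1
            elem.clear()  # drop the record's children to keep memory low
    return counts


def count_records_in_dump(path):
    """Hypothetical helper: stream a locally downloaded dblp.xml.gz."""
    with gzip.open(path, "rb") as stream:
        return count_records(stream)


# Made-up sample in the shape of the dump (no DTD entities used).
SAMPLE = b"""<?xml version="1.0"?>
<dblp>
<article key="journals/example/A1"><author>A. Author</author><title>First</title><year>2001</year></article>
<article key="journals/example/A2"><author>B. Author</author><title>Second</title><year>2002</year></article>
<inproceedings key="conf/example/C1"><author>C. Author</author><title>Third</title><year>2003</year></inproceedings>
</dblp>
"""

sample_counts = count_records(io.BytesIO(SAMPLE))
print(dict(sample_counts))
```

Clearing each record once it has been counted is what keeps memory use low on a file with millions of entries.
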
32 changes: 32 additions & 0 deletions opencorporates.md
@@ -0,0 +1,32 @@
---
title: OpenCorporates - The Open Database Of The Corporate World
description: Open Database of corporate entities.
keywords: access-nobulk, corporations, database, ecommerce, format-rdf, government, lod, lodcloud-diagram-2011-09-19, no-deref-vocab, opendatachallenge, published-by-third-party, scraped, size.xlarge
date: 2018-07-17
modified: 2018-07-17
---

Open Database of corporate entities. As of 2011-04-09 it has information on 7,841,828 companies from around the world. Jurisdictions covered include:

* 41,292 Bermuda
* 3,886,733 United Kingdom
* 96,104 Gibraltar
* 105,640 Isle of Man
* 77,693 Iceland
* 60,827 Jersey
* 92,795 Luxembourg
* 2,188,873 Netherlands
* 97,653 Alaska (US)
* 197,798 District of Columbia (US)
* 996,420 Michigan (US)

There is good API access but currently no bulk availability.

## License

See https://opencorporates.com/info/licence. Note, however, that most data in OpenCorporates is scraped from elsewhere, so this license only covers the 'IP' that OpenCorporates has obtained as a result of their efforts (and the license of the original databases, e.g. Companies House in the UK, is unclear).

## Data and Resources

* [Example JSON record from the API (for Google)](http://opencorporates.com/companies/gb/03977902.json)
* [Example RDF record](http://opencorporates.com/companies/us_ak/124437.rdf)
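
The two example links above share one URL shape: a jurisdiction code, a company number, and a format suffix. A minimal sketch of building such URLs follows. The pattern is inferred only from these two examples and is an assumption, not a documented API contract; `record_url` is a hypothetical helper name.

```python
BASE_URL = "http://opencorporates.com/companies"

# Formats seen in the example links above; others are not assumed to exist.
KNOWN_FORMATS = ("json", "rdf")


def record_url(jurisdiction, company_number, fmt="json"):
    """Build a company record URL in the shape of the example links."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{BASE_URL}/{jurisdiction}/{company_number}.{fmt}"


print(record_url("gb", "03977902"))
print(record_url("us_ak", "124437", fmt="rdf"))
```

The company number is kept as a string so that leading zeros, as in `03977902`, survive.
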
28 changes: 28 additions & 0 deletions yago.md
@@ -0,0 +1,28 @@
---
title: YAGO
description: YAGO3 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames.
keywords: YAGO, ckanupload.esw.200910, crossdomain, format-rdf, linkeddata, lod, lodcloud-diagram-2011-09-19, lodcloud-diagram-2014-08-30, no-deref-vocab, no-license-metadata, no-provenance-metadata, ontology, published-by-producer
date: 2018-07-17
modified: 2018-07-17
---

YAGO3 is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO3 has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.

## Data and Resources

* [The entire YAGO in RDF/TTL/Turtle format](http://resources.mpi-inf.mpg.de/yago-naga/yago3.1/yago3.1_entire_ttl.7z)
* [The entire YAGO in TSV format](http://resources.mpi-inf.mpg.de/yago-naga/yago3.1/yago3.1_entire_tsv.7z)
* [Schema of YAGO in TTL/Turtle/RDF](http://resources.mpi-inf.mpg.de/yago-naga/yago3.1/yagoSchema.ttl.7z)
* [rdf:type facts of YAGO in RDF/TTL/Turtle](http://resources.mpi-inf.mpg.de/yago-naga/yago3.1/yagoTypes.ttl.7z)
* [Taxonomy of YAGO in RDF/TTL/Turtle](http://resources.mpi-inf.mpg.de/yago-naga/yago3.1/yagoTaxonomy.ttl.7z)
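
With more than 120 million facts, the TSV files are best read line by line. The sketch below assumes an already extracted file (the archives are 7z, which the Python standard library does not unpack) and assumes each row carries a fact id, subject, predicate and object in its first four tab-separated columns. That column layout and the sample rows are assumptions for illustration; check them against the actual files.

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple


class Fact(NamedTuple):
    fact_id: str
    subject: str
    predicate: str
    obj: str


def read_facts(lines: Iterable[str]) -> Iterator[Fact]:
    """Yield one Fact per well-formed row, skipping rows with too few columns."""
    # QUOTE_NONE: literal values may contain double quotes that are not CSV quoting.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 4:
            continue
        yield Fact(*row[:4])


# Made-up rows in the assumed layout; the second row is deliberately malformed.
SAMPLE = (
    "<id_1>\t<Example_Person>\trdf:type\t<example_class>\t\n"
    "incomplete\trow\n"
    "<id_2>\t<Example_Person>\trdfs:label\t\"Example Person\"@eng\t\n"
)

facts = list(read_facts(io.StringIO(SAMPLE)))
for fact in facts:
    print(fact.subject, fact.predicate, fact.obj)
```

For a real file, pass an open text handle, for example `open(path, encoding="utf-8", newline="")`, instead of the in-memory sample.
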

Go to the Web page of YAGO to check individual downloads: [Simplified taxonomy, multilingual, links to DBpedia, geonames, WordNet](http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/). These include:

* TAXONOMY: All types of entities, and the class structure of YAGO2s. Moreover, it has formal definitions of YAGO relations.
* SIMPLETAX: An alternative, simpler taxonomy of YAGO.
* CORE: Core facts of YAGO2s, such as facts between entities and facts containing literals (numbers, dates, strings, etc.).
* GEONAMES: Geographical entities and classes taken from GeoNames.
* META: Temporally and spatially scoped facts together with statistics and extraction sources about the facts.
* MULTILINGUAL: The multilingual names for entities.
* LINK: The connection of YAGO2s to WordNet, DBpedia, etc.
* OTHER: Miscellaneous features of YAGO2s, such as Wikipedia in-outlinks, etc.
