I3 Shared Data Processing Scripts

This is a shared repository for data processing scripts, with a focus on innovation-related data. 'Processing' in this context can refer to a number of different operations, including (but not limited to):

  • normalisation
  • disambiguation and entity reconciliation
  • web scraping
  • parsing web-scraped data
  • transforming or merging different datasets
  • standardising datasets
  • deduplication
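
As a rough illustration of two of the operations listed above (normalisation and deduplication), here is a minimal Python sketch. It is not taken from any script in this repository: the function names `normalise_name` and `deduplicate`, the normalisation rules, and the sample records are all hypothetical and chosen only to show the general idea.

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Lowercase, strip accents, and collapse whitespace (illustrative rules only)."""
    # NFKD splits accented characters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.lower().split())


def deduplicate(records: list[dict], key: str = "name") -> list[dict]:
    """Keep the first record seen for each normalised value of `key`."""
    seen = set()
    unique = []
    for record in records:
        norm = normalise_name(record[key])
        if norm not in seen:
            seen.add(norm)
            unique.append(record)
    return unique


# Hypothetical sample data: the first two records differ only in
# accents, case, and spacing.
records = [
    {"name": "Société  Générale"},
    {"name": "societe generale"},
    {"name": "Acme Corp"},
]
unique_records = deduplicate(records)
```

Real entity reconciliation usually needs fuzzier matching than exact comparison of normalised strings, so treat this only as a starting point.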

Adding to the catalog

If you'd like to link to or upload data processing scripts, please read our contribution guidelines and open a pull request using a pull request template. Links to external repositories are added below; uploaded scripts get their own folder.

Using code from this repository

Each folder here contains a repository of data processing scripts (or, more commonly, a link to one plus a description) contributed by a member of the community. Each repository listed here should be documented well enough to tell you how to run it and on what data. If you have problems with code files hosted directly in this repository, please open a GitHub issue, or a pull request if you have fixed the problem and would like to amend the documentation. If you're having trouble with an external repository that is linked by URL, raise an issue in that repository instead.

Patent data

Graph visualizations

Scholarly + scientific data

Benchmarks and other meta-datasets

  • Alaska: A data pipeline benchmark, with profiling data

Other (to review)

  • the Allen NLP Guide - general-purpose
  • linked-uspto-patent-data (rdf), forward43 (social innovation)

