Utility tools to help download and parse patent data made available to the public
Latest commit f74d2c1 on Jun 26, 2018 by bfeldman: Updated JSOUP library


Patent Public Bulk Files

Tool kit to download, read, and utilize open patent data provided to the public.

Notice

This source code is a work in progress and has not been fully vetted for a production environment.

Two main modules

  • Bulk Downloader automates downloading of public bulk patent data
  • Patent Document iterates over and reads patents directly from the large bulk download files. It supports patent documents from 1976 to the present in the Greenbook, SGML, PAP, and all Redbook XML formats, and reads them into a normalized Patent Object Model.
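
To illustrate what iterating over a bulk file involves, here is a minimal sketch. It assumes that a Redbook XML bulk file is a concatenation of XML documents, each starting with its own `<?xml` declaration on a new line. The class `BulkSplitSketch` and method `splitDocuments` are hypothetical names for this example and are not the Patent Document module's API; the older Greenbook, SGML, and PAP formats would need different handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class BulkSplitSketch {

    /**
     * Splits a stream of concatenated XML documents into one string per document.
     * Assumption: every document begins with an "<?xml" declaration at the start of a line.
     */
    public static List<String> splitDocuments(Reader source) {
        List<String> documents = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // A new declaration closes the document collected so far.
                if (line.startsWith("<?xml") && current.length() > 0) {
                    documents.add(current.toString());
                    current.setLength(0);
                }
                current.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Flush the final document, which has no declaration following it.
        if (current.length() > 0) {
            documents.add(current.toString());
        }
        return documents;
    }
}
```

This sketch collects every document in memory for simplicity. For real bulk files, which are large, handing each document to a callback as soon as it is complete would likely be a better fit.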

Features

  • Download Bulk Patent Grants and Applications, as well as additional resources
  • View individual Patent Documents directly from the large bulk files
  • Read Patent Documents from 1976 to the present (formats: Greenbook, SGML, PAP, Redbook XML) directly from the large bulk files into a normalized Patent Object Model
  • Extract Patent Documents from bulk files
  • Normalize and transform Patent data before loading into a data resource
  • Patent Claim Tree to facilitate analysis
  • Update Classifications from Master CPC File (current CPC classification for patents starting from patent number 1)
  • Include classification definitions from CPC Scheme
  • Build a corpus with Corpus Builder, which downloads and extracts the patents and applications matching specified classifications, one bulk file at a time over a date range
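
As an illustration of the claim tree idea, the sketch below links each dependent claim to the claim it refers back to, so that independent claims become the roots. Everything in it (`ClaimTreeSketch`, `ClaimNode`, `build`, `depth`) is a hypothetical name chosen for this example, not the project's Patent Claim Tree API. It assumes each claim has at most one parent, that a parent number of 0 marks an independent claim, and that the references contain no cycles.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ClaimTreeSketch {

    /** Hypothetical node: a claim number plus the claims that depend on it. */
    public static final class ClaimNode {
        public final int number;
        public final List<ClaimNode> children = new ArrayList<>();

        ClaimNode(int number) {
            this.number = number;
        }
    }

    /**
     * Builds the tree from a map of claim number to parent claim number.
     * A parent of 0, or any parent not present in the map, makes the claim a root.
     */
    public static List<ClaimNode> build(Map<Integer, Integer> parentOf) {
        Map<Integer, ClaimNode> nodes = new LinkedHashMap<>();
        for (Integer number : parentOf.keySet()) {
            nodes.put(number, new ClaimNode(number));
        }
        List<ClaimNode> roots = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : parentOf.entrySet()) {
            ClaimNode node = nodes.get(entry.getKey());
            ClaimNode parent = nodes.get(entry.getValue());
            if (parent == null) {
                roots.add(node);           // independent claim
            } else {
                parent.children.add(node); // dependent claim
            }
        }
        return roots;
    }

    /** Length of the longest dependency chain starting at this claim. */
    public static int depth(ClaimNode node) {
        int deepest = 0;
        for (ClaimNode child : node.children) {
            deepest = Math.max(deepest, depth(child));
        }
        return 1 + deepest;
    }
}
```

With the references resolved into a tree, questions such as how many independent claims a patent has, or how deep its dependency chains run, become simple traversals.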

Public Patent Data

  • Rate of Release: Every Tuesday, a new bulk file is released, containing roughly two to five thousand patents granted that same day.
  • Releases are available on both the USPTO Bulkdata and Reedtech websites.
  • Receiving changes to patents after publication: bulk files are not updated once published. Updates can be obtained by indexing the supplemental files, which are also publicly available. The following fields are updated periodically after publication:
    Field             Update available
    Assignee          Daily, within Patent Assignment XML Dump files
    Classifications   Monthly, within Master Classification File Dumps
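
Because grant files follow a weekly Tuesday schedule, a date range maps onto a predictable list of release dates. The sketch below enumerates them with `java.time`. The class `ReleaseDates` and method `tuesdaysBetween` are hypothetical names for this example, and the code only computes dates; it does not assume any particular file naming scheme or download URL.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;

public class ReleaseDates {

    /** Returns every Tuesday from start to end, both inclusive. */
    public static List<LocalDate> tuesdaysBetween(LocalDate start, LocalDate end) {
        List<LocalDate> dates = new ArrayList<>();
        // Move forward to the first Tuesday on or after the start date.
        LocalDate day = start.with(TemporalAdjusters.nextOrSame(DayOfWeek.TUESDAY));
        while (!day.isAfter(end)) {
            dates.add(day);
            day = day.plusWeeks(1);
        }
        return dates;
    }
}
```

For example, calling `tuesdaysBetween` for June 1 through June 30, 2018 yields June 5, 12, 19, and 26.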

Other Information

The United States Department of Commerce (DOC) and the United States Patent and Trademark Office (USPTO) GitHub project code is provided on an ‘as is’ basis without any warranty of any kind, either expressed, implied or statutory, including but not limited to any warranty that the subject software will conform to specifications, any implied warranties of merchantability, fitness for a particular purpose, or freedom from infringement, or any warranty that the documentation, if provided, will conform to the subject software. DOC and USPTO disclaim all warranties and liabilities regarding third party software, if present in the original software, and distribute it as is. The user or recipient assumes responsibility for its use. DOC and USPTO have relinquished control of the information and no longer have responsibility to protect the integrity, confidentiality, or availability of the information.

User and recipient agree to waive any and all claims against the United States Government, its contractors and subcontractors as well as any prior recipient, if any. If user or recipient’s use of the subject software results in any liabilities, demands, damages, expenses or losses arising from such use, including any damages from products based on, or resulting from recipient’s use of the subject software, user or recipient shall indemnify and hold harmless the United States government, its contractors and subcontractors as well as any prior recipient, if any, to the extent permitted by law. User or recipient’s sole remedy for any such matter shall be immediate termination of the agreement. This agreement shall be subject to United States federal law for all purposes including but not limited to the validity of the readme or license files, the meaning of the provisions and rights and the obligations and remedies of the parties. Any claims against DOC or USPTO stemming from the use of its GitHub project will be governed by all applicable Federal law. “User” or “Recipient” means anyone who acquires or utilizes the subject code, including all contributors. “Contributors” means any entity that makes a modification.

This agreement or any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise, does not in any manner constitute or imply their endorsement, recommendation or favoring by DOC or the USPTO, nor does it constitute an endorsement by DOC or USPTO or any prior recipient of any results, resulting designs, hardware, software products or any other applications resulting from the use of the subject software. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, including USPTO, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC, USPTO or the United States Government.



CC0
To the extent possible under law, https://github.com/USPTO/PatentPublicData has waived all copyright and related or neighboring rights to Patent Public Data. This work is published from: United States.