The purpose of this API Wrapper is to extend the functionality of the PatentsView API. The wrapper can take in a list of values (such as patent numbers), retrieve multiple data points, and then convert and merge the results into a CSV file.
- Clone or download this repository
- Install dependencies
- Modify the configuration file "config.cfg" to point to your input file(s) and specify your queries.
- Run the API Wrapper using Python 3
git clone https://github.com/CSSIP-AIR/PatentsView-APIWrapper.git
cd PatentsView-APIWrapper
pip install -r requirements.txt
python api_wrapper.py
The PatentsView API Wrapper reads query specifications from the configuration file "config.cfg", which also specifies where the input list of values to query can be found.
Input files must be text files containing a list of values (such as patent numbers), each separated by a new line. The file "sample_file.txt" provides an example of the correct format.
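A minimal Python sketch of reading such a file, assuming blank lines should be skipped and surrounding whitespace stripped. "parse_values" and "read_input_values" are hypothetical helpers for illustration, not functions taken from api_wrapper.py.

```python
from pathlib import Path
from typing import List


def parse_values(text: str) -> List[str]:
    """Split newline-separated input into values, dropping blank lines."""
    # Surrounding whitespace is stripped so stray spaces or Windows line
    # endings do not end up inside the queried values.
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_input_values(directory: str, input_file: str) -> List[str]:
    """Read the file named by the directory and input_file settings."""
    return parse_values((Path(directory) / input_file).read_text(encoding="utf-8"))
```

Skipping blank lines means a trailing newline at the end of the file does not produce an empty value to query.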
Specify queries in the configuration file "config.cfg". To do so, modify the required and optional parameters to point to the input file and specify the fields and criteria applied to the search.
- [QUERY_NAME]: defines the query that will be made; the name is also used for the output file (for example, "QUERY1.csv"). Multiple queries may be specified, as shown in the example configuration file "config.cfg".
- entity: the type of object that will be returned. Must be one of:
["patents", "inventors", "assignees", "locations", "cpc_subsections", "uspc_mainclasses", "nber_subcategories"]
- directory: the folder containing the input list of values to query
- input_file: the filename of the input list of values to query. The input file should be a text file with a list of values separated by newlines.
- input_type: the type of value in input_file. The full list of input types can be found in the PatentsView API documentation. Common input types include:
["patent_number", "inventor_id", "assignee_id", "cpc_subsection_id", "location_id", "uspc_mainclass_id"]
- fields: a list of fields to include in the resulting output
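The example configuration values (entity = "patents", fields = [...]) are written as JSON literals inside an INI-style file. Assuming that reading, a query section could be loaded and checked against the required parameters listed above with Python's standard configparser and json modules, as in this sketch. "load_queries", "REQUIRED_KEYS" and "VALID_ENTITIES" are hypothetical names; the wrapper's actual parsing and validation may differ.

```python
import configparser
import json
from typing import Any, Dict

# Hypothetical constants mirroring the required parameters and entities listed above.
REQUIRED_KEYS = ("entity", "directory", "input_file", "input_type", "fields")
VALID_ENTITIES = {
    "patents", "inventors", "assignees", "locations",
    "cpc_subsections", "uspc_mainclasses", "nber_subcategories",
}


def load_queries(config_text: str) -> Dict[str, Dict[str, Any]]:
    """Parse each [QUERY_NAME] section, decoding every value as JSON."""
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)  # full-line '#' comments are skipped
    queries = {}
    for name in parser.sections():
        # configparser lowercases option names by default.
        section = {key: json.loads(value) for key, value in parser[name].items()}
        missing = [key for key in REQUIRED_KEYS if key not in section]
        if missing:
            raise ValueError(f"[{name}] is missing required parameter(s): {missing}")
        if section["entity"] not in VALID_ENTITIES:
            raise ValueError(f"[{name}] has an unknown entity: {section['entity']!r}")
        queries[name] = section
    return queries
```

Because each value goes through json.loads, a quoting mistake in config.cfg surfaces as an error before any request is made, rather than as an unexpected API response.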
Optional parameters can be commented out with a hash sign (#) or deleted if not in use.
- sort: the fields and directions by which the output file will be sorted, specified as a JSON array of objects. For example:
To sort just by patent number (ascending):
sort = [{"patent_number": "asc"}]
To sort first by patent_date (descending), and then by patent title (ascending):
sort = [{"patent_date": "desc"}, {"patent_title": "asc"}]
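Assuming the sort value is read as JSON and passed on to the API's sort parameter, a check like the hypothetical "parse_sort" below could catch mistakes before any request is sent. Decoding with json.loads also rejects syntax slips such as a stray comma after a colon. This sketch is deliberately strict and accepts only the array-of-objects form described here; the wrapper itself may be more lenient.

```python
import json
from typing import Dict, List


def parse_sort(sort_value: str) -> List[Dict[str, str]]:
    """Decode a sort setting and check that each entry is {field: "asc" | "desc"}."""
    spec = json.loads(sort_value)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    if not isinstance(spec, list):
        raise ValueError("sort must be a JSON array of objects")
    for entry in spec:
        if not isinstance(entry, dict) or len(entry) != 1:
            raise ValueError(f"each sort entry must name exactly one field: {entry!r}")
        direction = next(iter(entry.values()))
        if direction not in ("asc", "desc"):
            raise ValueError(f"sort direction must be 'asc' or 'desc': {entry!r}")
    return spec
```

Entry order is preserved, so the first object is the primary sort key and later objects break ties.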
- criteria1, criteria2, ...: additional criteria applied to the query. Separate criteria are combined with AND; to express OR logic, place multiple conditions inside a single criterion using the _or operator. For example:
To limit results to patents from Jan. 1, 2014 to Dec. 31, 2016:
criteria1 = {"_gte":{"patent_date":"2014-01-01"}}
criteria2 = {"_lte":{"patent_date":"2016-12-31"}}
To limit results to patents before Jan. 1, 2014 OR after Dec. 31, 2016:
criteria1 = {"_or":[{"_lt":{"patent_date":"2014-01-01"}}, {"_gt":{"patent_date":"2016-12-31"}}]}
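A sketch of how the input values and the numbered criteria might be combined into one query object, assuming the criteria are joined with the query language's _and operator and that mapping a field to a list of values matches any value in that list. "collect_criteria" and "build_query" are hypothetical helpers, and the wrapper may batch values or structure the query differently.

```python
import json
from typing import Any, Dict, List


def collect_criteria(section: Dict[str, Any]) -> List[Any]:
    """Return criteria1, criteria2, ... from a parsed section, in numeric order."""
    numbered = [key for key in section
                if key.startswith("criteria") and key[len("criteria"):].isdigit()]
    return [section[key] for key in sorted(numbered, key=lambda k: int(k[len("criteria"):]))]


def build_query(input_type: str, values: List[str], criteria: List[Any]) -> Dict[str, Any]:
    """AND the input-value match together with every criteria object."""
    # Assumption: {field: [v1, v2, ...]} matches records where field equals any listed value.
    clauses = [{input_type: values}] + list(criteria)
    return clauses[0] if len(clauses) == 1 else {"_and": clauses}


if __name__ == "__main__":
    # Placeholder values; keys are deliberately out of order to show the sorting.
    section = {
        "criteria2": {"_lte": {"patent_date": "2016-12-31"}},
        "criteria1": {"_gte": {"patent_date": "2014-01-01"}},
    }
    query = build_query("patent_number", ["8000000", "8000001"], collect_criteria(section))
    print(json.dumps(query))
```

Sorting the keys numerically keeps "criteria10" after "criteria2", which a plain string sort would not.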
A full syntax guide for specifying criteria can be found at the PatentsView Query Language page.
This example queries the patents endpoint for each patent_number in "C:/path/to/input_file/sample_file.txt", limited to patents dated 2015 or earlier. The output file, "QUERY1.csv", contains the columns "patent_number", "patent_title", and "patent_date", sorted by patent_number.
[QUERY1]
entity = "patents"
input_file = "sample_file.txt"
directory = "C:/path/to/input_file"
input_type = "patent_number"
fields = ["patent_number", "patent_title", "patent_date"]
criteria1 = {"_lte":{"patent_date":"2015-12-31"}}
# criteria2 =
sort = [{"patent_number": "asc"}]
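The last step, merging the returned records and writing them to a file such as "QUERY1.csv", could look roughly like this sketch. It assumes each API response stores its records in a list under the entity name (for example "patents") and that the requested fields are flat values; nested fields would need extra flattening that is not shown. "merge_batches" and "write_csv" are hypothetical helpers, not the wrapper's own functions.

```python
import csv
from typing import Any, Dict, List, Optional, TextIO


def merge_batches(responses: List[Dict[str, Any]], entity: str) -> List[Dict[str, Any]]:
    """Concatenate the result rows stored under the entity key of each response."""
    # Assumption: each response looks like {"patents": [...], ...};
    # a missing or null entity key is treated as "no results".
    return [row for response in responses for row in (response.get(entity) or [])]


def write_csv(rows: List[Dict[str, Any]], fields: List[str], out: TextIO,
              sort_field: Optional[str] = None) -> None:
    """Write rows to an open CSV file, keeping only the requested fields."""
    if sort_field is not None:
        # Values are compared as strings, so ordering is lexicographic.
        rows = sorted(rows, key=lambda row: str(row.get(sort_field, "")))
    # extrasaction="ignore" drops keys not listed in fields; missing keys become "".
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, opening the file with open("QUERY1.csv", "w", newline="", encoding="utf-8") follows the csv module's recommendation and avoids blank lines between rows on Windows.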
The API wrapper is currently compatible with Python 3.