USPTO Design Patent Search Engine

This project implements a search engine for USPTO design patents based on various criteria. Users can search for design patents by patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class (if available).

Introduction

The United States Patent and Trademark Office (USPTO) publishes a dataset with information about the design patents it has granted. This project aims to create a search engine that enables users to search those design patents based on specific criteria.

Search Engine Architecture

Flow 1 -> JSON extraction and bulk inserts into PostgreSQL and Elasticsearch

Flow 2 -> Optimised wildcard search from Elasticsearch with all hits and pagination

Flow 3 -> Optimised query to get all the patent metadata from PostgreSQL

Features

Search design patents by patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class.
Efficiently parse and store USPTO design patent data.
Optimize search engine performance for large datasets.

Data Extraction

type Patent struct {
	PatentNumber    string         `json:"PatentNumber" gorm:"primaryKey"`
	PatentTitle     string         `json:"PatentTitle"`
	Authors         pq.StringArray `json:"Authors" gorm:"type:text[]"`
	Assignee        string         `json:"Assignee"`
	ApplicationDate string         `json:"ApplicationDate"`
	IssueDate       string         `json:"IssueDate"`
	DesignClass     string         `json:"DesignClass"`
	ReferencesCited pq.StringArray `json:"ReferencesCited" gorm:"type:text[]"`
	Description     pq.StringArray `json:"Description" gorm:"type:text[]"`
}

The fields are typed so that multi-valued attributes (authors, cited references, description paragraphs) are stored as lists. The extraction follows the DTD published on the USPTO page for design patents.

For extraction, I wrote a script that first unzips all the data and extracts the XML files to a folder called all_xml. The second step uses encoding/xml and encoding/json to derive all the extracted fields by specifying model structs.

type Inventor struct {
	LastName  string `xml:"addressbook>last-name"`
	FirstName string `xml:"addressbook>first-name"`
}

type UsPatentGrant struct {
	PatentTitle     string      `xml:"us-bibliographic-data-grant>invention-title"`
	PatentNumber    string      `xml:"us-bibliographic-data-grant>publication-reference>document-id>doc-number"`
	Authors         []Inventor  `xml:"us-bibliographic-data-grant>us-parties>inventors>inventor"`
	Assignee        string      `xml:"us-bibliographic-data-grant>us-parties>us-applicants>us-applicant>addressbook>orgname"`
	ApplicationDate CustomTime  `xml:"us-bibliographic-data-grant>application-reference>document-id>date"`
	IssueDate       CustomTime  `xml:"us-bibliographic-data-grant>publication-reference>document-id>date"`
	DesignClass     string      `xml:"us-bibliographic-data-grant>classification-national>main-classification"`
	ReferencesCited []Reference `xml:"us-bibliographic-data-grant>us-references-cited>us-citation,omitempty"`
	Description     Description `xml:"description"`
}

type Reference struct {
	Name string `xml:"patcit>document-id>name"`
}

type Description struct {
	DescriptionDrawings []string `xml:"description-of-drawings>p"`
}

type CustomTime struct {
	Time string `xml:",chardata"`
}

The structs above show the XML path mappings used to extract the data from the XML and map it to the respective JSON attributes. The extracted data was encoded using the NewEncoder method and appended to a combined JSON file.

Please refer to json_generator.go and xml_file_extractor.go.
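As a rough illustration of the XML-to-JSON step, the sketch below decodes one grant document with encoding/xml path tags and writes it as a single JSON line with json.NewEncoder. It is not the code from json_generator.go: the trimmed grant and patentJSON types, the xmlToJSONLine helper, and the "First Last" author format are assumptions made for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"encoding/xml"
	"strings"
)

// inventor and grant are trimmed-down copies of the README's XML models.
type inventor struct {
	LastName  string `xml:"addressbook>last-name"`
	FirstName string `xml:"addressbook>first-name"`
}

type grant struct {
	PatentTitle  string     `xml:"us-bibliographic-data-grant>invention-title"`
	PatentNumber string     `xml:"us-bibliographic-data-grant>publication-reference>document-id>doc-number"`
	Authors      []inventor `xml:"us-bibliographic-data-grant>us-parties>inventors>inventor"`
}

// patentJSON is a hypothetical, reduced version of the Patent model.
type patentJSON struct {
	PatentNumber string   `json:"PatentNumber"`
	PatentTitle  string   `json:"PatentTitle"`
	Authors      []string `json:"Authors"`
}

// xmlToJSONLine (hypothetical helper) parses one grant document and
// returns it as one newline-terminated JSON object.
func xmlToJSONLine(doc string) (string, error) {
	var g grant
	if err := xml.Unmarshal([]byte(doc), &g); err != nil {
		return "", err
	}
	p := patentJSON{PatentNumber: g.PatentNumber, PatentTitle: g.PatentTitle}
	for _, a := range g.Authors {
		// Assumed format: "First Last".
		p.Authors = append(p.Authors, strings.TrimSpace(a.FirstName+" "+a.LastName))
	}
	var buf bytes.Buffer
	// Encode appends a trailing newline, which suits an append-only file.
	if err := json.NewEncoder(&buf).Encode(p); err != nil {
		return "", err
	}
	return buf.String(), nil
}
```

Calling xmlToJSONLine once per extracted file and appending each result to the same output file would give one JSON object per line, which is convenient for the chunked bulk inserts described next.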

Bulk Insertion

Bulk insertion from the combined JSON file produced by the extraction step (with all the metadata) happens in two places:
- db_bulk_insertion: handles the bulk insert into Postgres. The code is modular and inserts data according to the schema defined in models.
- es_bulk_insertion: handles chunking of the JSON data and efficiently inserting it into the Elasticsearch index (ES_INDEX = design_patents).
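The chunking idea can be sketched as follows. This is an assumption-laden illustration, not the repository's es_bulk_insertion code: the patentDoc type and the chunk and bulkBody helpers are hypothetical. The only fixed part is the shape of the Elasticsearch _bulk payload, which is newline-delimited JSON with one action line followed by one document line.

```go
package main

import (
	"bytes"
	"encoding/json"
)

// patentDoc is a hypothetical, reduced document type for this sketch.
type patentDoc struct {
	PatentNumber string `json:"PatentNumber"`
	PatentTitle  string `json:"PatentTitle"`
}

// chunk splits docs into consecutive slices of at most size elements.
func chunk(docs []patentDoc, size int) [][]patentDoc {
	if size < 1 {
		size = 1
	}
	var out [][]patentDoc
	for start := 0; start < len(docs); start += size {
		end := start + size
		if end > len(docs) {
			end = len(docs)
		}
		out = append(out, docs[start:end])
	}
	return out
}

// bulkBody renders one chunk as an NDJSON body for the _bulk endpoint.
// Using the patent number as _id is an assumption; it keeps re-runs idempotent.
func bulkBody(index string, docs []patentDoc) ([]byte, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode writes one JSON value plus "\n"
	for _, d := range docs {
		action := map[string]map[string]string{
			"index": {"_index": index, "_id": d.PatentNumber},
		}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(d); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}
```

Each body returned by bulkBody would be sent as one POST to the _bulk endpoint with the Content-Type application/x-ndjson. The chunk size is a tuning knob: larger chunks mean fewer round trips but bigger request bodies.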

Performance Optimization

This repository demonstrates a performance-optimized search functionality using Elasticsearch (ES) for Postgres data. The optimization involves a two-step process:

Elasticsearch Indexing

Elasticsearch is utilized to index searchable fields, optimizing search performance.
Only specific searchable fields are stored in Elasticsearch, enhancing efficiency.
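To make the "only searchable fields" idea concrete, here is a minimal sketch of projecting the full record onto a smaller search document before indexing. The SearchDoc type and toSearchDoc helper are hypothetical, and the choice to leave ReferencesCited and Description out of the index is an assumption based on the list of searchable criteria, not something confirmed by the repository. Plain []string is used instead of pq.StringArray to keep the example dependency-free.

```go
package main

// Patent mirrors the model shown earlier, with plain string slices.
type Patent struct {
	PatentNumber    string
	PatentTitle     string
	Authors         []string
	Assignee        string
	ApplicationDate string
	IssueDate       string
	DesignClass     string
	ReferencesCited []string
	Description     []string
}

// SearchDoc (hypothetical) holds only the fields users can search on.
type SearchDoc struct {
	PatentNumber    string   `json:"PatentNumber"`
	PatentTitle     string   `json:"PatentTitle"`
	Authors         []string `json:"Authors"`
	Assignee        string   `json:"Assignee"`
	ApplicationDate string   `json:"ApplicationDate"`
	IssueDate       string   `json:"IssueDate"`
	DesignClass     string   `json:"DesignClass"`
}

// toSearchDoc drops the bulky, non-searchable fields so the index stays small.
// The full record remains available in Postgres under the same patent number.
func toSearchDoc(p Patent) SearchDoc {
	return SearchDoc{
		PatentNumber:    p.PatentNumber,
		PatentTitle:     p.PatentTitle,
		Authors:         p.Authors,
		Assignee:        p.Assignee,
		ApplicationDate: p.ApplicationDate,
		IssueDate:       p.IssueDate,
		DesignClass:     p.DesignClass,
	}
}
```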

Search and Retrieval

Search Process:
- The search process involves querying Elasticsearch for relevant results based on the search query.
Data Retrieval:
- Once search results are obtained, a second query is made to the original data source (e.g., Postgres) using the retrieved unique identifiers (e.g., Patent Number).

This two-step approach minimizes the load on the original data source, enhancing response speed and efficiency. By implementing pagination within the Elasticsearch query and selectively indexing necessary fields, we achieve an efficient search mechanism. Additionally, leveraging Elasticsearch for primary search operations optimizes the overall system's performance.
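The two steps can be sketched roughly as below. This is not the repository's implementation: buildSearchBody, extractIDs, and fetchByNumbersSQL are hypothetical names, and the table and column names (patents, patent_number) assume GORM's default snake_case naming for the Patent model. The request body uses the standard Elasticsearch from/size pagination and a wildcard query, with _source disabled because only the document IDs are needed for the second step.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// buildSearchBody returns a search request body for one page of a
// wildcard search on a single field. Pages are 1-based.
func buildSearchBody(field, term string, page, pageSize int) ([]byte, error) {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = 10
	}
	body := map[string]any{
		"from":    (page - 1) * pageSize, // offset of the first hit
		"size":    pageSize,
		"_source": false, // IDs only; metadata comes from Postgres
		"query": map[string]any{
			"wildcard": map[string]any{
				field: map[string]any{"value": "*" + term + "*"},
			},
		},
	}
	return json.Marshal(body)
}

// searchResponse models only the part of the response used here.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			ID string `json:"_id"`
		} `json:"hits"`
	} `json:"hits"`
}

// extractIDs pulls the document IDs (assumed to be patent numbers)
// out of a raw Elasticsearch search response.
func extractIDs(raw []byte) ([]string, error) {
	var resp searchResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, fmt.Errorf("decode search response: %w", err)
	}
	ids := make([]string, 0, len(resp.Hits.Hits))
	for _, h := range resp.Hits.Hits {
		ids = append(ids, h.ID)
	}
	return ids, nil
}

// fetchByNumbersSQL is the second step: one primary-key lookup for the
// whole page. Table and column names are assumptions (GORM defaults).
const fetchByNumbersSQL = `SELECT * FROM patents WHERE patent_number = ANY($1)`
```

Because the Postgres query only ever receives one page of primary keys, its cost stays bounded by the page size regardless of how many documents match in Elasticsearch.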

Search Functionality

The search engine uses fuzzy matching with Elasticsearch (indexed against a Postgres DB). It allows users to search for design patents based on various criteria, including patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class (if available).
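One common way to get typo-tolerant matching across several fields in Elasticsearch is a multi_match query with the fuzziness parameter. The sketch below builds such a request body. It is an assumption about how the fuzzy search could be expressed, not a copy of the project's query, and fuzzyQuery is a hypothetical helper name.

```go
package main

import "encoding/json"

// fuzzyQuery builds a multi_match request body that tolerates small
// spelling mistakes in term across the given text fields.
func fuzzyQuery(term string, fields []string) ([]byte, error) {
	body := map[string]any{
		"query": map[string]any{
			"multi_match": map[string]any{
				"query":  term,
				"fields": fields,
				// "AUTO" lets Elasticsearch choose the allowed edit
				// distance based on the length of each term.
				"fuzziness": "AUTO",
			},
		},
	}
	return json.Marshal(body)
}
```

For example, fuzzyQuery("chiar", []string{"PatentTitle", "Assignee"}) produces a body that can also match documents containing "chair". Fuzziness applies to text-like fields, so date criteria such as the application or issue date would need a separate range or term clause.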

Getting Started

Prerequisites

To run this project, you need the following prerequisites:

GoLang (v1.20)
PostgreSQL (v12+)
Elasticsearch (v7.17)

Installation

Clone the repository:

git clone https://github.com/yourusername/patent_designs.git
cd patent_designs

Usage

go mod download
go run main.go

Postman Documentation

Please refer to the Postman API documentation for this project.
