<a href="https://colab.research.google.com/github/GabeAspir/Patent-Prior-Art-Finder/blob/main/documentation/Stage_1_Documentation_The_Official_Patent_Prior_Art_Finder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# The Patent Prior Art Finder

### By Gabriel Aspir, Zach Fish, Ephraim Meiri<br>
### mentored by Dave Feltenberger

### Our Goal:<br>
Given a large dataset of patents, we want to compare the patents with one another in meaningful ways, and to compare new patents against them.<br>
Real-world example: Suppose somebody wants to obtain a patent of their own. They first have to research whether their idea is patentable and not too similar to any patent already out there; if it is too similar, a lawsuit could follow.

# Demonstration & Explanation

First, let's import our Patent-Prior-Art-Finder as an object. (Currently, this is available only to those with access to the GitHub repo, who can download the file itself into Colab.)

In [None]:
from _DevPatentPriorArtFinder import _DevPatentPriorArtFinder as d
patentFinder = d()

Now let's see the types of things we can do with 100 patents. These were already retrieved from Google's patent database and saved as a CSV file.

The `init()` method creates a pandas DataFrame from the passed CSV file (or rather, its file path) and adds the necessary metadata to that DataFrame as three new columns, described below. The method has 2 optional parameters that let the user name the column that contains each document's identifier and the column that holds the text we should use for the comparison.

**Tokens** contains a list of tokenized words for each document. This representation readies us for the bag of words that is to come by splitting the text into words and removing the least meaningful ones (numbers, as well as words taken from nltk's stopwords list).
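
As a rough illustration of this step, here is a minimal sketch of tokenization. It is not the tool's own code: the `tokenize` function and the abbreviated `STOP_WORDS` set are hypothetical stand-ins (the tool takes its stop words from nltk), chosen so the example runs without any downloads.

```python
import re

# Hypothetical, abbreviated stop-word list for illustration only;
# the tool described above takes its list from nltk.
STOP_WORDS = {"a", "an", "and", "the", "of", "for", "is", "to", "in"}

def tokenize(text):
    """Lowercase the text, split it into alphabetic words, and drop stop words."""
    # Only runs of letters match, so numbers and punctuation are discarded.
    words = re.findall(r"[a-z]+", text.lower())
    return [word for word in words if word not in STOP_WORDS]

tokens = tokenize("A system and method for externalizing sound in 2 rooms.")
print(tokens)
```

The result keeps only the content words of the sentence, in their original order.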

**BagOfWords** converts that tokenized text into vectors that are truly machine readable. This is accomplished by building a vocabulary from the words across all our documents and then representing each document as a dense, ordered list that holds its count for each word in that vocabulary. This vector representation lets us easily perform the mathematical operations needed to compute similarity.
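
The idea can be sketched in a few lines of plain Python. The helper names `build_vocabulary` and `bag_of_words` and the two toy documents are assumptions made for this example; the tool's actual ordering of the vocabulary may differ.

```python
def build_vocabulary(token_lists):
    """Collect every distinct word across all documents, in first-seen order."""
    vocabulary = []
    seen = set()
    for tokens in token_lists:
        for word in tokens:
            if word not in seen:
                seen.add(word)
                vocabulary.append(word)
    return vocabulary

def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in one document."""
    return [tokens.count(word) for word in vocabulary]

# Two toy documents, already tokenized.
docs = [["light", "reference", "system", "light"], ["sound", "system"]]
vocab = build_vocabulary(docs)
vectors = [bag_of_words(doc, vocab) for doc in docs]
print(vocab)
print(vectors)
```

Every document ends up with a vector of the same length, one slot per vocabulary word, which is what makes the later comparisons possible.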

**TF-IDF**, or Term Frequency-Inverse Document Frequency, is a statistical measure of how relevant a word is to one document within a collection of documents. Like the bag of words, it represents each document as a vector so that it becomes machine readable. TF-IDF works by multiplying two metrics: how many times a word appears in a document, and the inverse document frequency of the word across the set of documents, which shrinks as the word appears in more documents.
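
To make the multiplication concrete, here is a small sketch of one common TF-IDF variant: term frequency is the count divided by the document length, and inverse document frequency is the natural log of the number of documents divided by the number of documents containing the word. The tool's exact weighting is not stated here, so treat this formula and the `tf_idf` helper as assumptions.

```python
import math

def tf_idf(count_vectors):
    """Weight each count by term frequency times inverse document frequency."""
    n_docs = len(count_vectors)
    n_terms = len(count_vectors[0])
    # Document frequency: in how many documents does each term appear?
    doc_freq = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(n_terms)]
    weighted = []
    for vec in count_vectors:
        total = sum(vec)  # document length in tokens
        weighted.append([
            (vec[j] / total) * math.log(n_docs / doc_freq[j]) if vec[j] else 0.0
            for j in range(n_terms)
        ])
    return weighted

# Toy count vectors over a four-word vocabulary.
counts = [[2, 1, 1, 0], [0, 0, 1, 1]]
weights = tf_idf(counts)
print(weights)
```

In this toy input the third word appears in both documents, so its inverse document frequency is log(1) = 0 and it contributes nothing, while words unique to one document keep a positive weight.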

In [None]:
url = 'https://drive.google.com/file/d/1WPm941QT8m0bYJwZd_eY3SHjgcD6k2Z_/view?usp=sharing'
file_id = url.split('/')[-2]
dwn_url='https://drive.google.com/uc?id=' + file_id

dataframe = patentFinder.init(dwn_url, 'Publication_Number', 'Abstract')
dataframe

Unnamed: 0,Publication_Number,Abstract,description,claim,Tokens,BagOfWords,TF-IDF
0,US-8524855-B1,"Environmentally-friendly, biodegradable polyol...",CROSS-REFERENCE TO RELATED APPLICATIONS \n ...,We claim: \n \n 1. A method for maki...,"[environmentally, friendly, biodegradable, pol...","[1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, ...","[0.08689000350920928, 0.13232293952150875, 0.0..."
1,US-2017096783-A1,"A light reference system includes a first, sec...",CROSS-REFERENCE TO RELATED APPLICATION \n ...,"1 . A light reference system, comprising:\n a ...","[light, reference, system, includes, first, se...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
2,US-2019313201-A1,A system and method for externalizing sound. T...,BACKGROUND \n The disclosure relates to me...,"1 . A sound externalization system, comprising...","[system, method, externalizing, sound, system,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
3,US-2011175459-A1,A method for controlling and supplying power t...,The invention relates to a method for controll...,1 . Method for controlling and supplying power...,"[method, controlling, supplying, power, least,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
4,US-6692884-B2,A positive photoresist composition comprises: ...,FIELD OF THE INVENTION \n The present inve...,What is claimed is: \n \n 1. A posi...,"[positive, photoresist, composition, comprises...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
...,...,...,...,...,...,...,...
95,US-2017072818-A1,A seat reclining device includes a first brack...,TECHNICAL FIELD \n The present invention r...,1 . A seat reclining device comprising:\n a fi...,"[seat, reclining, device, includes, first, bra...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
96,US-2020184075-A1,An image forming apparatus to which an externa...,The entire disclosure of Japanese patent Appli...,What is claimed is: \n \n 1 . An i...,"[image, forming, apparatus, external, storage,...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
97,US-2010032961-A1,A wind turbine generator that allows for size ...,RELATED APPLICATIONS \n The present applic...,1 . A wind turbine generator that generates el...,"[wind, turbine, generator, allows, size, weigh...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ..."
98,US-4040872-A,A process for the strengthening of carbon stee...,This invention is directed to a process for th...,I claim: \n \n 1. A method for the s...,"[process, strengthening, carbon, steels, where...","[0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0782135304980..."


Using this data we can create a table representing the Jaccard Similarity between each and every pair of patents. This relatively simple metric measures the number of words two documents share relative to the total number of distinct words that appear in either document.
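
A minimal sketch of the metric, assuming both documents are given as word-count vectors over the same vocabulary. The function name `jaccard_similarity` is hypothetical and this is not necessarily how the tool computes it.

```python
def jaccard_similarity(vec_a, vec_b):
    """Jaccard similarity of two equal-length word-count vectors."""
    # A word is shared when both documents use it at least once.
    shared = sum(1 for a, b in zip(vec_a, vec_b) if a > 0 and b > 0)
    # A word counts toward the union when either document uses it.
    either = sum(1 for a, b in zip(vec_a, vec_b) if a > 0 or b > 0)
    return shared / either if either else 0.0

# One shared word out of four distinct words overall.
score = jaccard_similarity([2, 1, 1, 0], [0, 0, 1, 1])
print(score)  # 0.25
```

Note that the counts themselves are ignored: only whether a word is present matters.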

In [None]:
patentFinder.jaccardTable(dataframe)

Unnamed: 0,Publication_Number,US-8524855-B1,US-2017096783-A1,US-2019313201-A1,US-2011175459-A1,US-6692884-B2,US-5054994-A,US-9377224-B2,US-6237425-B1,US-2015380718-A1,US-2019156033-A1,US-2015158911-A1,US-2013246983-A1,US-10500666-B2,US-9887224-B2,US-2014018583-A1,US-2015035579-A1,US-10267841-B2,US-7439349-B2,US-10582237-B2,US-8854099-B1,US-2004245209-A1,US-2005094096-A1,US-2008048410-A1,US-3982652-A,US-6622902-B2,US-2009323086-A1,US-10742279-B2,US-2009224292-A1,US-4755421-A,US-4432138-A,US-2019151795-A1,US-7622310-B2,US-2001031846-A1,US-9622405-B2,US-5192958-A,US-2016109890-A1,US-10633787-B1,US-10233093-B2,US-7956235-B2,...,US-2020197726-A1,US-2003017285-A1,US-5170141-A,US-2014091645-A1,US-2010311401-A1,US-6098185-A,US-2014276522-A1,US-2013202617-A1,US-10453181-B2,US-7126647-B2,US-4355109-A,US-7565836-B2,US-PP30555-P2,US-4084873-A,US-3875565-A,US-6719650-B1,US-10909218-B2,US-2009288140-A1,US-6294928-B1,US-6438026-B2,US-7257204-B2,US-2016209929-A1,US-2013243609-A1,US-2013206255-A1,US-4177870-A,US-2020135384-A1,US-2012271456-A1,US-2016031761-A1,US-2002181953-A1,US-2015224267-A1,US-2014174613-A1,US-8046694-B1,US-2010071178-A1,US-8989241-B2,US-2018351330-A1,US-2017072818-A1,US-2020184075-A1,US-2010032961-A1,US-4040872-A,US-4122433-A
0,US-8524855-B1,1.000000,0.000000,0.000000,0.022727,0.013514,0.009346,0.028986,0.023256,0.014925,0.022727,0.000000,0.000000,0.014706,0.000000,0.052083,0.024096,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.013158,0.022472,0.000000,0.025316,0.023810,0.038095,0.000000,0.011111,0.014085,0.016129,0.012658,0.000000,...,0.022989,0.000000,0.013514,0.012195,0.014493,0.000000,0.000000,0.010989,0.011765,0.011905,0.000000,0.015625,0.000000,0.009804,0.000000,0.000000,0.024390,0.020000,0.009804,0.025316,0.029851,0.000000,0.000000,0.013699,0.015625,0.014493,0.014925,0.015625,0.000000,0.000000,0.014085,0.000000,0.000000,0.000000,0.000000,0.012987,0.000000,0.012500,0.020408,0.000000
1,US-2017096783-A1,0.000000,1.000000,0.083333,0.093333,0.000000,0.063830,0.050000,0.012658,0.016949,0.108108,0.041667,0.057971,0.051724,0.081967,0.021978,0.026667,0.040816,0.000000,0.111111,0.092105,0.012500,0.017241,0.000000,0.054054,0.028986,0.052632,0.083333,0.045455,0.012195,0.000000,0.089552,0.012987,0.020202,0.022727,0.024691,0.049180,0.078431,0.125000,0.043478,...,0.094595,0.000000,0.063492,0.041667,0.033333,0.112903,0.021277,0.000000,0.054054,0.013158,0.000000,0.055556,0.000000,0.055556,0.027778,0.037975,0.085714,0.068182,0.055556,0.042857,0.033898,0.088235,0.024691,0.047619,0.017857,0.087719,0.052632,0.036364,0.050000,0.086957,0.049180,0.057471,0.000000,0.030303,0.098361,0.111111,0.061538,0.042857,0.000000,0.042553
2,US-2019313201-A1,0.000000,0.083333,1.000000,0.071429,0.000000,0.028571,0.044118,0.023256,0.014925,0.097561,0.000000,0.025316,0.029851,0.042254,0.030612,0.036585,0.017241,0.014493,0.073171,0.045977,0.034884,0.000000,0.000000,0.011765,0.025974,0.046154,0.042857,0.040541,0.022472,0.000000,0.012500,0.023810,0.000000,0.031579,0.011111,0.028571,0.032787,0.066667,0.012658,...,0.047059,0.000000,0.027397,0.050633,0.029412,0.054795,0.037037,0.000000,0.011765,0.011905,0.014925,0.015625,0.000000,0.030000,0.064935,0.022727,0.090909,0.040816,0.061856,0.012500,0.061538,0.064935,0.022472,0.042254,0.015625,0.029412,0.030303,0.015625,0.033708,0.050633,0.028571,0.030928,0.012500,0.055556,0.027397,0.026316,0.013158,0.025316,0.000000,0.009524
3,US-2011175459-A1,0.022727,0.093333,0.071429,1.000000,0.012821,0.037037,0.056338,0.022222,0.028571,0.080460,0.011628,0.036585,0.042857,0.068493,0.029412,0.059524,0.032787,0.013699,0.069767,0.067416,0.033333,0.028986,0.000000,0.084337,0.024691,0.058824,0.013158,0.080000,0.010638,0.000000,0.049383,0.022727,0.018018,0.040816,0.032609,0.070423,0.015152,0.063291,0.037037,...,0.068966,0.013158,0.067568,0.060976,0.027778,0.051948,0.052632,0.000000,0.034483,0.098765,0.014085,0.045455,0.011765,0.070000,0.036145,0.021739,0.100000,0.081633,0.028846,0.024096,0.028169,0.023810,0.010638,0.054054,0.061538,0.057143,0.074627,0.045455,0.054945,0.048193,0.027027,0.040000,0.024096,0.052632,0.067568,0.064935,0.051948,0.049383,0.009709,0.037736
4,US-6692884-B2,0.013514,0.000000,0.000000,0.012821,1.000000,0.000000,0.016949,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.046512,0.013699,0.021277,0.000000,0.040541,0.000000,0.026316,0.000000,0.014286,0.013514,0.000000,0.017857,0.000000,0.015385,0.012658,0.000000,0.014493,0.013514,0.020833,0.000000,0.000000,0.016667,0.000000,0.000000,0.000000,...,0.012987,0.000000,0.000000,0.000000,0.017241,0.015385,0.000000,0.012500,0.000000,0.027778,0.000000,0.000000,0.014286,0.000000,0.014286,0.000000,0.000000,0.000000,0.033708,0.014493,0.000000,0.014286,0.000000,0.000000,0.000000,0.000000,0.036364,0.038462,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.015873,0.000000,0.000000,0.000000,0.011364,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
95,US-2017072818-A1,0.012987,0.111111,0.026316,0.064935,0.000000,0.075269,0.032787,0.066667,0.052632,0.051282,0.013514,0.042857,0.089286,0.081967,0.010870,0.040541,0.040816,0.000000,0.052632,0.024691,0.012500,0.035088,0.013699,0.054054,0.059701,0.016949,0.031746,0.029851,0.012195,0.017857,0.106061,0.012987,0.020202,0.011236,0.012195,0.049180,0.078431,0.090909,0.028571,...,0.065789,0.015625,0.063492,0.071429,0.016393,0.061538,0.000000,0.000000,0.026316,0.013158,0.000000,0.055556,0.000000,0.067416,0.013699,0.025000,0.041096,0.032967,0.021505,0.073529,0.016667,0.013699,0.037500,0.064516,0.036364,0.107143,0.071429,0.036364,0.090909,0.056338,0.103448,0.033708,0.013889,0.030303,0.116667,1.000000,0.029851,0.042857,0.000000,0.042553
96,US-2020184075-A1,0.000000,0.061538,0.013158,0.051948,0.000000,0.031250,0.033333,0.025974,0.000000,0.065789,0.000000,0.028571,0.016949,0.065574,0.010989,0.013333,0.000000,0.000000,0.053333,0.064935,0.025641,0.000000,0.000000,0.040541,0.029412,0.035088,0.015873,0.030303,0.012346,0.000000,0.028571,0.013158,0.000000,0.034884,0.012346,0.032787,0.038462,0.028986,0.014286,...,0.066667,0.000000,0.031250,0.027778,0.051724,0.014925,0.000000,0.012195,0.054795,0.041096,0.000000,0.037037,0.000000,0.021739,0.000000,0.025316,0.086957,0.021978,0.021739,0.014085,0.016949,0.028169,0.012346,0.015625,0.000000,0.033898,0.035088,0.000000,0.050633,0.027778,0.032787,0.034091,0.028571,0.046875,0.031250,0.029851,1.000000,0.014085,0.011111,0.031915
97,US-2010032961-A1,0.012500,0.042857,0.025316,0.049383,0.000000,0.040404,0.000000,0.024691,0.067797,0.036585,0.026316,0.027027,0.032258,0.029851,0.010526,0.012658,0.038462,0.000000,0.037500,0.011765,0.012048,0.016393,0.026667,0.038462,0.027778,0.000000,0.014925,0.014085,0.011765,0.034483,0.055556,0.025316,0.009709,0.033333,0.000000,0.046875,0.017544,0.027397,0.027397,...,0.012048,0.000000,0.060606,0.012987,0.015625,0.014085,0.020000,0.000000,0.025316,0.012658,0.000000,0.000000,0.000000,0.065217,0.000000,0.024096,0.012821,0.031915,0.010309,0.027027,0.032258,0.000000,0.011765,0.045455,0.052632,0.015625,0.050000,0.016949,0.035714,0.012987,0.000000,0.010638,0.000000,0.059701,0.044776,0.042857,0.014085,1.000000,0.000000,0.010000
98,US-4040872-A,0.020408,0.000000,0.000000,0.009709,0.011364,0.000000,0.024096,0.000000,0.000000,0.009709,0.010417,0.010638,0.012195,0.011494,0.017699,0.010204,0.000000,0.000000,0.000000,0.009615,0.009804,0.025316,0.010526,0.010101,0.000000,0.000000,0.000000,0.022472,0.009615,0.000000,0.000000,0.020408,0.033613,0.009009,0.039604,0.011765,0.000000,0.021739,0.021739,...,0.000000,0.023529,0.000000,0.000000,0.024390,0.000000,0.000000,0.009524,0.010101,0.000000,0.012346,0.025974,0.000000,0.008621,0.000000,0.000000,0.010309,0.000000,0.017391,0.010638,0.000000,0.010526,0.009615,0.011494,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.008850,0.021505,0.000000,0.000000,0.000000,0.011111,0.000000,1.000000,0.008403


The same data allows us to construct a Cosine Similarity table, with the help of the scikit-learn package. Cosine similarity's added complexity is meant to correct for an inaccuracy: a longer document can give the illusion of greater similarity even though its proportion of shared words is no greater. ([Example](https://www.machinelearningplus.com/nlp/cosine-similarity/))
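
The length independence can be seen in a small plain-Python sketch. The tool relies on scikit-learn for this; the `cosine_similarity` function below is a hand-written stand-in for illustration, and the vectors are toy data.

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)

short_doc = [1, 2, 0]
long_doc = [10, 20, 0]  # same word proportions, ten times longer
same_direction = cosine_similarity(short_doc, long_doc)
no_overlap = cosine_similarity([1, 0, 0], [0, 3, 0])
print(same_direction, no_overlap)
```

Because each vector is divided by its own length, the two documents with identical word proportions score (up to floating-point rounding) as fully similar, while documents with no words in common score 0.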

In [None]:
cosineTable1 = patentFinder.cosineTable(dataframe)
cosineTable1

Publication_Number,US-8524855-B1,US-2017096783-A1,US-2019313201-A1,US-2011175459-A1,US-6692884-B2,US-5054994-A,US-9377224-B2,US-6237425-B1,US-2015380718-A1,US-2019156033-A1,US-2015158911-A1,US-2013246983-A1,US-10500666-B2,US-9887224-B2,US-2014018583-A1,US-2015035579-A1,US-10267841-B2,US-7439349-B2,US-10582237-B2,US-8854099-B1,US-2004245209-A1,US-2005094096-A1,US-2008048410-A1,US-3982652-A,US-6622902-B2,US-2009323086-A1,US-10742279-B2,US-2009224292-A1,US-4755421-A,US-4432138-A,US-2019151795-A1,US-7622310-B2,US-2001031846-A1,US-9622405-B2,US-5192958-A,US-2016109890-A1,US-10633787-B1,US-10233093-B2,US-7956235-B2,US-7113967-B2,...,US-2020197726-A1,US-2003017285-A1,US-5170141-A,US-2014091645-A1,US-2010311401-A1,US-6098185-A,US-2014276522-A1,US-2013202617-A1,US-10453181-B2,US-7126647-B2,US-4355109-A,US-7565836-B2,US-PP30555-P2,US-4084873-A,US-3875565-A,US-6719650-B1,US-10909218-B2,US-2009288140-A1,US-6294928-B1,US-6438026-B2,US-7257204-B2,US-2016209929-A1,US-2013243609-A1,US-2013206255-A1,US-4177870-A,US-2020135384-A1,US-2012271456-A1,US-2016031761-A1,US-2002181953-A1,US-2015224267-A1,US-2014174613-A1,US-8046694-B1,US-2010071178-A1,US-8989241-B2,US-2018351330-A1,US-2017072818-A1,US-2020184075-A1,US-2010032961-A1,US-4040872-A,US-4122433-A
US-8524855-B1,1.000000,0.000000,0.000000,0.030330,0.052316,0.012760,0.051472,0.020656,0.004451,0.039290,0.000000,0.000000,0.011308,0.000000,0.065859,0.019017,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.006040,0.031072,0.000000,0.041547,0.078756,0.067463,0.000000,0.010271,0.013955,0.078606,0.005727,0.000000,0.000000,...,0.047665,0.000000,0.020715,0.014435,0.005719,0.000000,0.000000,0.017465,0.036868,0.005913,0.000000,0.198272,0.000000,0.013902,0.000000,0.000000,0.027199,0.009728,0.011714,0.011632,0.044706,0.000000,0.000000,0.017465,0.019295,0.023428,0.033499,0.013745,0.000000,0.000000,0.009586,0.000000,0.000000,0.000000,0.000000,0.007764,0.000000,0.011195,0.034095,0.000000
US-2017096783-A1,0.000000,1.000000,0.049970,0.062174,0.000000,0.068006,0.011724,0.005646,0.001825,0.057989,0.160863,0.148127,0.064903,0.209314,0.011572,0.192321,0.307278,0.000000,0.113813,0.230719,0.003717,0.008877,0.000000,0.032684,0.047078,0.060871,0.196959,0.024762,0.004246,0.000000,0.085167,0.032289,0.072604,0.008989,0.033686,0.028606,0.263188,0.267647,0.022780,0.002959,...,0.304296,0.000000,0.046709,0.008877,0.077377,0.192496,0.010582,0.000000,0.120923,0.014546,0.000000,0.205155,0.000000,0.031347,0.039381,0.008501,0.104076,0.019942,0.040821,0.149425,0.061096,0.054578,0.015847,0.014320,0.015821,0.444232,0.131846,0.033812,0.024129,0.063288,0.247597,0.027437,0.000000,0.032784,0.386807,0.261009,0.124208,0.032128,0.000000,0.104605
US-2019313201-A1,0.000000,0.049970,1.000000,0.076382,0.000000,0.061430,0.021180,0.015300,0.003297,0.128046,0.000000,0.010849,0.075378,0.050796,0.020906,0.037562,0.052870,0.034021,0.107481,0.188427,0.033576,0.000000,0.000000,0.021472,0.042526,0.054986,0.084721,0.044736,0.015343,0.000000,0.010258,0.025000,0.000000,0.037892,0.015215,0.031009,0.043667,0.106039,0.012347,0.026729,...,0.030261,0.000000,0.030686,0.042767,0.021180,0.034021,0.095590,0.000000,0.102404,0.004380,0.016667,0.006993,0.000000,0.036038,0.155230,0.010238,0.080582,0.018014,0.073749,0.002872,0.044151,0.161988,0.033402,0.045275,0.057166,0.013014,0.059549,0.030542,0.038749,0.041162,0.031951,0.041307,0.072655,0.076151,0.026537,0.023002,0.005219,0.033168,0.000000,0.005250
US-2011175459-A1,0.030330,0.062174,0.076382,1.000000,0.006268,0.021401,0.071259,0.013199,0.010664,0.233472,0.025071,0.011699,0.016256,0.058427,0.022544,0.109362,0.021379,0.044023,0.084657,0.177326,0.021724,0.083010,0.000000,0.111138,0.038520,0.044469,0.027407,0.063676,0.004963,0.000000,0.079640,0.010783,0.020206,0.021013,0.024609,0.107000,0.003139,0.038420,0.023965,0.017294,...,0.055474,0.011561,0.094305,0.031129,0.054814,0.039131,0.049477,0.000000,0.053004,0.198365,0.021567,0.031672,0.016075,0.116584,0.016739,0.016560,0.108618,0.027972,0.030875,0.005574,0.042848,0.077465,0.015436,0.041847,0.129452,0.036488,0.102742,0.171260,0.021937,0.050304,0.004594,0.088194,0.065810,0.054744,0.060093,0.052088,0.057398,0.042919,0.005446,0.013586
US-6692884-B2,0.052316,0.000000,0.000000,0.006268,1.000000,0.000000,0.053183,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.058327,0.013100,0.018438,0.000000,0.039113,0.000000,0.112410,0.000000,0.011567,0.007488,0.000000,0.009588,0.000000,0.012481,0.010701,0.000000,0.014309,0.011625,0.034853,0.000000,0.000000,0.014419,0.000000,0.000000,0.000000,0.000000,...,0.014071,0.000000,0.000000,0.000000,0.005909,0.010546,0.000000,0.009023,0.000000,0.018330,0.000000,0.000000,0.086646,0.000000,0.009023,0.000000,0.000000,0.000000,0.024206,0.056086,0.000000,0.019649,0.000000,0.000000,0.000000,0.000000,0.020767,0.056808,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.012340,0.000000,0.000000,0.000000,0.011743,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
US-2017072818-A1,0.007764,0.261009,0.023002,0.052088,0.000000,0.089999,0.017539,0.092909,0.008190,0.072294,0.008022,0.113796,0.124838,0.130864,0.011541,0.104978,0.153232,0.000000,0.050306,0.122935,0.016682,0.092962,0.006866,0.271151,0.070430,0.034149,0.080679,0.014818,0.019058,0.159037,0.101929,0.006901,0.036206,0.026895,0.037796,0.034237,0.349541,0.168592,0.034080,0.022134,...,0.171235,0.044390,0.044468,0.061975,0.031570,0.081385,0.000000,0.000000,0.028266,0.007254,0.000000,0.127400,0.000000,0.068212,0.026780,0.016956,0.044486,0.014917,0.025146,0.111772,0.009140,0.023328,0.075075,0.112475,0.047338,0.323310,0.086295,0.050583,0.060164,0.060596,0.176387,0.044467,0.016044,0.035033,0.271024,1.000000,0.038892,0.020599,0.000000,0.082593
US-2020184075-A1,0.000000,0.124208,0.005219,0.057398,0.000000,0.039061,0.009550,0.053654,0.000000,0.065606,0.000000,0.008153,0.006294,0.067861,0.005237,0.052926,0.000000,0.000000,0.119398,0.077235,0.015139,0.000000,0.000000,0.052440,0.121436,0.030990,0.044566,0.006723,0.005765,0.000000,0.026979,0.006262,0.000000,0.024407,0.005717,0.038837,0.010938,0.015937,0.009278,0.008034,...,0.098542,0.000000,0.103768,0.016069,0.219646,0.079537,0.000000,0.009721,0.174428,0.069119,0.000000,0.010510,0.000000,0.011606,0.000000,0.007694,0.287637,0.018952,0.081500,0.002158,0.174185,0.047633,0.017929,0.009721,0.000000,0.035860,0.055937,0.000000,0.043678,0.092795,0.010671,0.049665,0.021839,0.101734,0.036560,0.038892,1.000000,0.037387,0.006326,0.011835
US-2010032961-A1,0.011195,0.032128,0.033168,0.042919,0.000000,0.050781,0.000000,0.060895,0.023618,0.041698,0.023134,0.017272,0.020001,0.033696,0.008321,0.044851,0.031565,0.000000,0.044639,0.006818,0.008018,0.019149,0.029703,0.025638,0.040622,0.000000,0.010116,0.005342,0.018320,0.050961,0.042868,0.019901,0.007458,0.025854,0.000000,0.037026,0.005793,0.020258,0.014742,0.000000,...,0.006022,0.000000,0.064120,0.012766,0.045523,0.004514,0.022828,0.000000,0.032607,0.010460,0.000000,0.000000,0.000000,0.055326,0.000000,0.030562,0.016036,0.012906,0.020720,0.010287,0.026359,0.000000,0.005698,0.030892,0.102388,0.020720,0.065179,0.036469,0.023134,0.021844,0.000000,0.014797,0.000000,0.050515,0.021124,0.020599,0.037387,1.000000,0.000000,0.006268
US-4040872-A,0.034095,0.000000,0.000000,0.005446,0.011743,0.000000,0.041079,0.000000,0.000000,0.007055,0.011743,0.004384,0.050762,0.034208,0.025342,0.005692,0.000000,0.000000,0.000000,0.006921,0.032560,0.038881,0.010051,0.006507,0.000000,0.000000,0.000000,0.032537,0.009299,0.000000,0.000000,0.040406,0.075714,0.019685,0.147542,0.037588,0.000000,0.015425,0.079821,0.019440,...,0.000000,0.021660,0.000000,0.000000,0.015405,0.000000,0.000000,0.031361,0.008275,0.000000,0.040406,0.067816,0.000000,0.012481,0.000000,0.000000,0.008140,0.000000,0.015776,0.003481,0.000000,0.008537,0.017352,0.007840,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.005007,0.017614,0.000000,0.000000,0.000000,0.006326,0.000000,1.000000,0.006363


Looking at the diagonals, where each patent is compared against itself, we can see that both methods give the maximum similarity of 1.

Let's take a closer look at some of the other results:

- Cosine and Jaccard Similarity between 2 unlike patents

In [None]:
print(dataframe.at[76,'Abstract'])
print(dataframe.at[34,'Abstract'])

A system and method for providing content consumption data to users in a multi-device environment. Activity data from a plurality of UE devices associated with a subscriber account are obtained when one or more users tied to the subscriber account consume content on one or more UE devices. The activity data may be correlated with one or more pieces of information relating to the consumed content. When a journal request is received from a user operating a UE device associated with the subscriber account, a response is generated containing data for presentation in a journal format that includes correlated subscriber activity data for the subscriber account over a select period of time.
Overall length of LED print bars arranged in a single pass printing system is controlled by controlling the operating temperature of the LED arrays forming the light emitting portion of the print bar. Rather than trying to maintain the arrays at some predetermined temperature, the array temperatures are al

In [None]:
patent1 = dataframe.at[76,'BagOfWords']
patent2 = dataframe.at[34,'BagOfWords']

In [None]:
patentFinder.cosineSimilarity(patent1,patent2)

0.014712247158412491

In [None]:
patentFinder.jaccardSimilarity(patent1, patent2)

0.022988505747126436

Now for comparison, 2 similar patents:

In [None]:
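# First, we need to find some similar patents in this data.
# Scan every cell of the cosine table for pairs that are highly similar (>= 0.6)
# but are not simply a patent compared with itself (< 0.99).
# This assumes the table's rows and columns follow the same order as the dataframe's rows.
for r, (index, row) in enumerate(cosineTable1.iterrows()):
    for n, entry in enumerate(row):
        if not isinstance(entry, str) and 0.6 <= entry < 0.99:
            print(entry)                               # the similarity score
            print(index)                               # publication number of the row patent
            print(dataframe['Publication_Number'][n])  # publication number of the column patent
            print("col: " + str(n))
            print("row: " + str(r))
            print(dataframe['Abstract'][n])            # abstract of the column patent
            print(dataframe['Abstract'][r])            # abstract of the row patent
            print()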

0.6491261742122825
US-2015380718-A1
US-2004129268-A1
col: 58
row: 8
A medicament container ( 10 ) is disclosed which comprises at least two compartments ( 21, 22 ). Each of the compartments ( 21, 22 ) is adapted to contain a dose of a medicament in powdered form and is provided with at least one opening ( 23 ) through which, in use, the medicament can be dispensed. The container ( 10 ) is preferably cylindrical in form, the diameter of the cylinder being greater than its depth such that the container ( 10 ) has the form of a squat drum.
A terminal holder is provided for holding a connection terminal ( 20 ) for at least three round cells ( 14, 15, 16 ). The terminal holder has a centrally encircling rib ( 11 ) for placement of the terminal holder ( 10 ) on the round cells ( 14, 15, 16 ) and for spacing the connection terminal ( 20 ) apart from the round cells ( 14, 15, 16 ). An elastic lip ( 12 ) is arranged at the end for clamping the terminal holder ( 10 ) between the round cells ( 14
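
Because a similarity table is symmetric, a search like the one above only needs to visit the cells above the diagonal, which skips self-comparisons and avoids reporting each pair twice. A minimal sketch on plain lists follows; the `similar_pairs` helper, the labels, and the scores are all made up for illustration.

```python
def similar_pairs(labels, matrix, threshold=0.6):
    """Return (score, label_i, label_j) for each distinct pair scoring at least threshold."""
    pairs = []
    for i in range(len(labels)):
        # Start at i + 1: skips the diagonal and the mirrored lower triangle.
        for j in range(i + 1, len(labels)):
            if matrix[i][j] >= threshold:
                pairs.append((matrix[i][j], labels[i], labels[j]))
    return sorted(pairs, reverse=True)  # most similar pair first

# Toy symmetric similarity table for three hypothetical documents.
toy_labels = ["A", "B", "C"]
toy_matrix = [
    [1.0, 0.7, 0.1],
    [0.7, 1.0, 0.65],
    [0.1, 0.65, 1.0],
]
found = similar_pairs(toy_labels, toy_matrix)
print(found)
```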

In [None]:
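# Look up the same similarity score by position and by label, then show the abstract at row 8.
cosineTable1['US-2004129268-A1'].iloc[8]              # column by label, row by position
cosineTable1.columns[8]                                # publication number at position 8
cosineTable1['US-2004129268-A1']['US-2015380718-A1']   # column and row both by label
dataframe['Abstract'][8]                               # only this last expression is displayed
# dataframe['Abstract']['US-2015380718-A1']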

'A terminal holder is provided for holding a connection terminal ( 20 ) for at least three round cells ( 14, 15, 16 ). The terminal holder has a centrally encircling rib ( 11 ) for placement of the terminal holder ( 10 ) on the round cells ( 14, 15, 16 ) and for spacing the connection terminal ( 20 ) apart from the round cells ( 14, 15, 16 ). An elastic lip ( 12 ) is arranged at the end for clamping the terminal holder ( 10 ) between the round cells ( 14, 15, 16 ). A plastic head ( 13 ) is arranged opposite the lip ( 12 ), for fastening the terminal holder ( 10 ) on the connection terminal ( 20 ).'

In [None]:
dataframe['Abstract'][58]

'A medicament container ( 10 ) is disclosed which comprises at least two compartments ( 21, 22 ). Each of the compartments ( 21, 22 ) is adapted to contain a dose of a medicament in powdered form and is provided with at least one opening ( 23 ) through which, in use, the medicament can be dispensed. The container ( 10 ) is preferably cylindrical in form, the diameter of the cylinder being greater than its depth such that the container ( 10 ) has the form of a squat drum.'

In [None]:
print(cosineTable1.columns[8])  # publication number at column position 8


US-2015380718-A1


In [None]:
print(dataframe.at[])

# Findings & Observations