# HTML Report Template

This document shows how to use **pv_evaluation** to automatically generate a report on the performance of an inventor disambiguation.

The key assumption is that the disambiguation algorithm produces a table with one row per inventor mention and the following five columns:
- **mention_id**: the inventor mention ID, in the form `US<patent_number>-<sequence_number>`, such as "US6584128-0".
- **patent_id**: the inventor mention's patent number.
- **inventor_id**: the unique inventor ID assigned by the disambiguation algorithm.
- **name_first**: the inventor mention's first name.
- **name_last**: the inventor mention's last name.
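The column requirements above can be sketched as a small check on a single row. This is only an illustration: the helper names `make_mention_id` and `check_row` are hypothetical and not part of **pv_evaluation**, and the regular expression is an assumed reading of the `US<patent_number>-<sequence_number>` format.

```python
import re

# The five columns the report expects, as listed above.
REQUIRED_COLUMNS = ["mention_id", "patent_id", "inventor_id", "name_first", "name_last"]

# Assumption: a patent number is alphanumeric (e.g. "6584128" or "D474886")
# and a sequence number is a non-negative integer.
MENTION_ID_PATTERN = re.compile(r"US[A-Za-z0-9]+-\d+")


def make_mention_id(patent_id, sequence):
    """Build a mention ID of the form US<patent_number>-<sequence_number>."""
    return f"US{patent_id}-{sequence}"


def check_row(row):
    """Return a list of problems found in one row (a dict); empty if none."""
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS if col not in row]
    mention_id = str(row.get("mention_id", ""))
    if not MENTION_ID_PATTERN.fullmatch(mention_id):
        problems.append(f"malformed mention_id: {mention_id!r}")
    return problems
```

For example, `make_mention_id("6584128", 0)` gives `"US6584128-0"`, and `check_row` returns an empty list for a row that has all five columns and a well-formed mention ID.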

As an example, the current inventor disambiguation is available in the "rawinventor.tsv" file from [PatentsView's bulk data download](https://patentsview.org/download/data-download-tables). Below, this file is downloaded and the "mention_id" column is built from patent numbers and inventor sequence numbers:

In [1]:
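import os
import zipfile

import pandas as pd
import wget  # third-party "wget" package, used for the download

if not os.path.isfile("rawinventor.tsv"):
    # Download and unpack the raw inventor table, then delete the archive.
    wget.download("https://s3.amazonaws.com/data.patentsview.org/download/rawinventor.tsv.zip")
    with zipfile.ZipFile("rawinventor.tsv.zip", "r") as zip_ref:
        zip_ref.extractall(".")
    os.remove("rawinventor.tsv.zip")

    # Add "mention_id" as US<patent_id>-<sequence> and overwrite the file in place.
    rawinventor = pd.read_csv("rawinventor.tsv", sep="\t")
    rawinventor["mention_id"] = "US" + rawinventor.patent_id.astype(str) + "-" + rawinventor.sequence.astype(str)
    # index=False keeps the row index out of the file (otherwise it reloads as "Unnamed: 0").
    rawinventor.to_csv("rawinventor.tsv", sep="\t", index=False)

# Preview the first five rows of the required columns.
rawinventor = pd.read_csv("rawinventor.tsv", sep="\t", dtype=str, nrows=5)
rawinventor[["mention_id", "inventor_id", "patent_id", "name_first", "name_last"]]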

|   | mention_id | inventor_id | patent_id | name_first | name_last |
|---|---|---|---|---|---|
| 0 | US6584128-0 | fl:ri_ln:kroeger-1 | 6584128 | Richard | Kroeger |
| 1 | US4789863-0 | fl:th_ln:bush-1 | 4789863 | Thomas A. | Bush |
| 2 | US11161990-1 | fl:ma_ln:boudreaux-4 | 11161990 | Matthew F. | Boudreaux |
| 3 | US6795487-1 | fl:ge_ln:whitworth-1 | 6795487 | Gerald | Whitworth |
| 4 | USD474886-0 | fl:th_ln:fleming-3 | D474886 | Thomas W. | Fleming |


## Rendering Report

We can now generate the report using the `render_inventor_disambiguation_report()` function. The results are saved to the current folder ".".

To compare multiple disambiguations, add more files to the list `summary_table_files`. Each file in this list should be a table with the five columns "mention_id", "inventor_id", "patent_id", "name_first", and "name_last".
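Before rendering, it may help to confirm that every file in the list has the expected header. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper, not a **pv_evaluation** function, and it assumes tab-separated files with a header row.

```python
import csv

# The five columns each summary table is expected to contain.
REQUIRED_COLUMNS = {"mention_id", "inventor_id", "patent_id", "name_first", "name_last"}


def missing_columns(path, delimiter="\t"):
    """Return the sorted list of required columns absent from the file's header."""
    with open(path, newline="", encoding="utf-8") as f:
        # Only the first line is read, so this stays cheap even for large files.
        header = next(csv.reader(f, delimiter=delimiter), [])
    return sorted(REQUIRED_COLUMNS - set(header))
```

Calling `missing_columns(path)` for each entry of `summary_table_files` and checking for an empty list is one way to catch a malformed table before the report is rendered.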

In [2]:
from pv_evaluation.templates import render_inventor_disambiguation_report

summary_table_files = ["rawinventor.tsv"]

render_inventor_disambiguation_report(".", summary_table_files=summary_table_files)


Starting python3 kernel...Done

Executing 'index.ipynb'
  Cell 1/22...Done
  Cell 2/22...Done
  Cell 3/22...Done
  Cell 4/22...Done
  Cell 5/22...Done
  Cell 6/22...Done
  Cell 7/22...Done
  Cell 8/22...Done
  Cell 9/22...Done
  Cell 10/22...Done
  Cell 11/22...Done
  Cell 12/22...Done
  Cell 13/22...Done
  Cell 14/22...Done
  Cell 15/22...Done
  Cell 16/22...Done
  Cell 17/22...Done
  Cell 18/22...Done
  Cell 19/22...Done
  Cell 20/22...Done
  Cell 21/22...Done
  Cell 22/22...Done

pandoc --output index.html
  to: html
  standalone: true
  self-contained: true
  section-divs: true
  html-math-method: mathjax
  wrap: none
  default-image-extension: png
  toc: true
  toc-depth: 3
  
metadata
  document-css: false
  link-citations: true
  date-format: long
  lang: en
  title: Inventor Disambiguation Report
  date: today
  author: PatentsView-Evaluation
  toc-location: left
  jupyter: python3
  theme: cosmo
  fig-cap-location: margin
  code-block-border-left: '#31BAE9'
