# Patent Impact Calculation according to Kelly et al. (2021)

- **Novel** patents are distinct from their predecessors

- **Impactful** patents influence future scientific advances, manifested as high similarity with subsequent innovations/patents

- An **important** patent is both novel **and** impactful

-> An **important** patent is **dissimilar** to previous patents and **similar** to future patents

### Novelty

Novel patents should be distinct from earlier patents, i.e., a patent is novel if its similarity to prior patents is low. The authors suggest measuring novelty with the Backward Similarity (BS), which is the cumulative similarity between a focal patent $j$ and all patents filed in the $5$ years before $j$.
A novel patent should have a low BS score, as it is dissimilar to previous patents.

Kelly et al. calculate BS as follows:
$$
BS_{j}^{\tau} = \sum_{i \in \mathcal{B}_{j, \tau}} p_{j, i}
$$
where $\mathcal{B}_{j, \tau}$ denotes the set of prior patents filed in the $\tau = 5$ years before patent $j$, and $p_{j, i}$ is the similarity between patents $j$ and $i$.

### Impact

Impactful patents should be similar to future patents, i.e., a patent is impactful if its similarity to future patents is high. This is calculated as the Forward Similarity (FS), which is the cumulative similarity between a focal patent $j$ and all patents filed in the $10$ years after $j$.

FS is calculated as follows:

$$
FS_{j}^{\tau} = \sum_{i \in \mathcal{F}_{j, \tau}} p_{j, i}
$$

where $\mathcal{F}_{j, \tau}$ denotes the set of patents filed in the $\tau = 10$ years after patent $j$.
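The two sums can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not part of the original method description: filing dates are reduced to integer years, patents filed in the same year as the focal patent are ignored, and the function name `similarity_sums` and the `(filing_year, similarity)` input format are hypothetical.

```python
def similarity_sums(focal_year, patents, backward_years=5, forward_years=10):
    """Return (BS, FS) for one focal patent.

    patents: list of (filing_year, similarity_to_focal) pairs (assumed format).
    Filing dates are simplified to integer years (assumption).
    """
    # Backward set: patents filed in the `backward_years` before the focal patent.
    bs = sum(s for year, s in patents
             if focal_year - backward_years <= year < focal_year)
    # Forward set: patents filed in the `forward_years` after the focal patent.
    fs = sum(s for year, s in patents
             if focal_year < year <= focal_year + forward_years)
    return bs, fs


# Hypothetical data: the 2015 patent lies outside the 10-year forward window.
others = [(1996, 0.2), (1998, 0.3), (2003, 0.4), (2008, 0.6), (2015, 0.9)]
bs, fs = similarity_sums(2000, others)  # BS sums 0.2 + 0.3, FS sums 0.4 + 0.6
```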

### Importance
An important patent should be novel and impactful, i.e., the BS score should be low and the FS score should be high. This can be expressed by taking the ratio of FS to BS:

$$
q_{j}^{\tau} = \frac{FS_{j}^{\tau}}{BS_{j}^{\tau}}
$$

The value will be high if BS is low and FS is high.

**And here is the issue**: 

The number of patents in the sets $\mathcal{B}_{j, \tau}$ and $\mathcal{F}_{j, \tau}$ varies from patent to patent. For a focal patent $j$, a large set $\mathcal{B}_{j, \tau}$ or $\mathcal{F}_{j, \tau}$ leads to a high BS or FS even when every individual similarity between $j$ and the patents in that set is low (and a small set leads to a low score even when the individual similarities are high). The set sizes therefore heavily influence the BS and FS scores, which in turn determine the importance score, while the individual similarity scores that make up BS and FS carry far less weight than they are supposed to.
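A small worked example with made-up numbers illustrates the effect. Suppose a patent $A$ has $1000$ patents in its forward set, each with similarity $0.01$, while a patent $B$ has only $10$ patents in its forward set, each with similarity $0.5$:

$$
FS_{A} = 1000 \cdot 0.01 = 10, \qquad FS_{B} = 10 \cdot 0.5 = 5
$$

Patent $A$ receives twice the FS of patent $B$, although each of its individual similarities is fifty times lower.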

### Solution

An easy way to prevent this is to normalize the BS and FS scores by the sizes of the sets $\mathcal{B}_{j, \tau}$ and $\mathcal{F}_{j, \tau}$, i.e., the number of patents filed in the $\tau = 5$ years before and the $\tau = 10$ years after the focal patent. This corresponds to taking the average similarity. The resulting formulas are the following:


$$
BS_{j}^{\tau} = \frac{1}{| \mathcal{B}_{j, \tau} |} \cdot \sum_{i \in \mathcal{B}_{j, \tau}} p_{j, i}
$$

and

$$
FS_{j}^{\tau} = \frac{1}{| \mathcal{F}_{j, \tau} |} \cdot \sum_{i \in \mathcal{F}_{j, \tau}} p_{j, i}
$$

That way, the importance scores between patents are comparable and the actual similarity scores between patents influence the importance value of a patent. 
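As a rough sketch, the normalized scores and the resulting importance ratio could be computed as below. The function names are hypothetical, and the handling of an empty set (average of `0.0`) and of a zero backward average (raising an error) are assumptions, since the text above does not specify these edge cases.

```python
def mean_similarity(similarities):
    """Average similarity over one comparison set; 0.0 if the set is empty (assumption)."""
    return sum(similarities) / len(similarities) if similarities else 0.0


def normalized_importance(backward_sims, forward_sims):
    """Ratio of the averaged FS to the averaged BS for one focal patent."""
    bs = mean_similarity(backward_sims)
    fs = mean_similarity(forward_sims)
    if bs == 0:
        # The ratio is undefined here; how to treat this case is left open.
        raise ValueError("average backward similarity is zero")
    return fs / bs


# The set sizes no longer matter: only the typical similarity level does.
small_sets = normalized_importance([0.3] * 5, [0.7] * 10)
large_sets = normalized_importance([0.3] * 500, [0.7] * 1000)
```

Both calls give approximately the same value (about `2.33`), whereas the unnormalized sums would differ by orders of magnitude between the two cases.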

### Patent Similarity
The patent similarity is calculated based on the TFBIDF score.
For a patent $p$ and a term $w$, the TFBIDF score is calculated as follows:
$$
    TFBIDF_{w, p} = TF_{w, p} \cdot BIDF_{w, p}
$$
where 
$$
    TF_{w, p} = \frac{c_{p,w}}{\sum_{k} c_{p,k}}
$$
and
$$
    BIDF_{w, p} = \log \left( \frac{\# \text{ patents prior to } p}{1 + \# \text{ patents prior to } p \text{ that include term } w} \right)
$$
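The three formulas above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: it assumes that each patent is already tokenized into a list of terms, that the caller supplies the token lists of all patents filed before the focal patent, and that at least one prior patent exists. The function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(tokens):
    """TF: count of each term divided by the total number of terms in the patent."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def bidf(term, prior_term_sets):
    """BIDF: log of (# prior patents) / (1 + # prior patents containing the term)."""
    if not prior_term_sets:
        raise ValueError("BIDF needs at least one prior patent")
    n_with_term = sum(1 for terms in prior_term_sets if term in terms)
    return math.log(len(prior_term_sets) / (1 + n_with_term))


def tfbidf(tokens, prior_patents):
    """Map each term of the focal patent to its TFBIDF score."""
    prior_term_sets = [set(doc) for doc in prior_patents]
    return {term: tf * bidf(term, prior_term_sets)
            for term, tf in term_frequencies(tokens).items()}


# Hypothetical toy corpus: four prior patents, one focal patent.
prior = [["laser", "beam"], ["engine"], ["engine", "valve"], ["pump"]]
scores = tfbidf(["laser", "laser", "diode"], prior)
```

Note that, as the formula is written, a term that occurs in every prior patent gets a slightly negative BIDF because of the $1 +$ in the denominator.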

To calculate the similarity $p_{i, j}$ between patents $i$ and $j$, create two vectors $V_i$ and $V_j$ whose length is the size of the union of the terms in patents $i$ and $j$. In these vectors, store the TFBIDF scores of each term $w$ for $i$ and $j$ respectively, with $0$ for terms that do not occur in the patent. Finally, normalize both vectors to unit length; their dot product is then the cosine similarity between the two patents.

$$
    p_{i, j} = V_{i} \cdot V_{j}
$$
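A minimal sketch of this similarity step is shown below. It assumes the TFBIDF scores of each patent are available as a dictionary mapping terms to scores (a hypothetical input format), and it returns `0.0` when one of the vectors is all zeros, which is an assumption rather than something the description above prescribes.

```python
import math


def cosine_similarity(scores_i, scores_j):
    """Cosine similarity of two patents given as {term: TFBIDF score} dicts."""
    # Shared vocabulary: union of the terms of both patents.
    vocab = sorted(set(scores_i) | set(scores_j))
    v_i = [scores_i.get(term, 0.0) for term in vocab]
    v_j = [scores_j.get(term, 0.0) for term in vocab]

    norm_i = math.sqrt(sum(x * x for x in v_i))
    norm_j = math.sqrt(sum(x * x for x in v_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar

    # Dot product of the unit-length vectors.
    return sum(a * b for a, b in zip(v_i, v_j)) / (norm_i * norm_j)


# Hypothetical scores: the patents share one of their two terms each.
a = {"laser": 1.0, "diode": 1.0}
b = {"laser": 1.0, "beam": 1.0}
similarity = cosine_similarity(a, b)
```

Since the similarity is symmetric, $p_{i, j} = p_{j, i}$, so the order of the indices in the BS and FS sums does not matter.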