# Abstract

Companies invest heavily in research and development in the current competitive environment to promote innovation, which is frequently measured by the number of patents obtained. The sheer number of patents, however, is not always a reliable sign of ground-breaking or creative contributions to a field. This study investigates whether businesses that acquire a greater number of patents, scaled by their R&D expenditure, necessarily produce inventions that are innovative. Two metrics are used to assess how innovative a patent is: originality, which indicates the novelty and distinctness of the patented invention, and citation rate, which shows the impact and acknowledgment of the patent in later research. The findings are relevant for businesses looking to foster true innovation and offer insights into the effectiveness of R&D investments.


# Introduction

In 2020 alone, businesses in the United States spent $538 billion on research and development, and R&D spending has been rising steadily {cite:p}`he2018exploratory`. Extensive R&D spending is closely correlated with higher patent counts {cite:p}`trajtenberg1990penny`. On the surface, this looks like a good thing. However, patents vary so widely in novelty that simple patent counts cannot be very informative about innovative output {cite:p}`trajtenberg1990penny`. This raises the question: is this spending really paying off?

This issue was addressed in 2001 by Bronwyn H. Hall, who examined the U.S. patent database, presenting 30 years of patenting trends and citation-based measures while addressing challenges in interpreting citation counts {cite:p}`hall2001nber`. That research offers a valuable perspective on assessing the true value of patents. Professor Zhaozhao He similarly used patents in his 2018 analysis of corporate innovation, a topic that has attracted significant attention from researchers in financial economics in recent years {cite:p}`he2018exploratory`. This is where weighting patent counts by citations becomes useful. In academic and research circles, the frequency with which a paper is cited often reflects its influence and significance. Similarly, in the patent world, the number of times a patent is cited can be a testament to its value and the innovation it represents. Therefore, instead of focusing solely on raw patent numbers, considering the number of citations each patent receives may provide a clearer picture of its innovative worth. A citation-based approach allows this analysis to test whether a large patent portfolio equates to significant innovations. Using citation frequency and the degree of originality as proxies, this study examines whether there is a concrete relationship between the size of a company's patent portfolio and the depth and breadth of its contributions to its field.

Motivated by this, I propose a four-step framework for answering the question: industry segmentation, normalization by R&D expenditure, an originality assessment, and a comparative analysis. First, patents are organized by industry so that only companies in similar industries are compared. This keeps technological constraints and opportunities roughly uniform across the dataset (e.g., the pharmaceutical industry and the tech industry would likely show different patenting trends). Next, the number of patents issued to a company per unit of R&D expenditure is calculated, giving a ratio of spending efficiency. Then, for each patent, both the earlier patents it cites and the later patents that cite it are examined. The last step compares companies on their patent numbers alongside these measures.

Technological innovation is the main driver of economic growth, making spending on innovation and R&D a valuable indicator of progress {cite:p}`mohamed2022technological`. Thus, companies have been forced to boost their R&D expenditures as they compete to innovate, frequently using patent numbers as a barometer for their inventiveness {cite:p}`he2018exploratory`. Stakeholders, including competitors and investors, frequently view a company's expanding patent portfolio as evidence of its inventive strength {cite:p}`griliches1990patent`. But the question remains: does a higher number of patents indicate higher quality or innovation, especially when measured against R&D expenditure? To address this, the connection between a company's total number of patents, normalized by R&D expenditure, and its actual level of innovation must be examined. Patent counts weighted by citations, as indicators of the value of innovations, can overcome the limitations of simple counts, as {cite:t}`trajtenberg1990penny` showed. The following analysis seeks to determine whether larger patent portfolios necessarily imply ground-breaking contributions to the field by employing citation rate and originality as proxy indicators.


# Literature Review

Measuring innovation through patents has been the focus of numerous studies. However, relying solely on patent counts can lead to misleading conclusions about the depth and breadth of innovation. {cite:t}`griliches1990patent` contends that not all patents are of equal value—both technically and economically. As a result, a higher number of patents doesn't necessarily translate to heightened innovation. In a similar vein, {cite:t}`trajtenberg1990penny` introduced citation-weighted patent counts as a more refined method for capturing the true value of innovations. Through this lens, he unearthed significant disparities in the relative importance of different patents {cite:p}`trajtenberg1990penny`.
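To make the difference between simple and citation-weighted counts concrete, the sketch below gives each patent a weight of one plus its forward citations, which is the linear weighting scheme usually associated with citation-weighted patent counts. The firm labels, citation numbers, and column names are hypothetical and chosen only for illustration; they are not data from this study.

```python
import pandas as pd

# Hypothetical patent-level data: one row per patent with the number of
# forward citations it has received (illustrative values only).
patents = pd.DataFrame({
    "firm": ["A", "A", "A", "B", "B"],
    "forward_citations": [0, 2, 10, 1, 1],
})

# Each patent counts once, plus once for every citation it receives.
patents["weight"] = 1 + patents["forward_citations"]

summary = patents.groupby("firm").agg(
    simple_count=("weight", "size"),    # every patent contributes 1
    weighted_count=("weight", "sum"),   # every patent contributes 1 + citations
)
print(summary)
```

In this toy example firm A holds 1.5 times as many patents as firm B, but its weighted count is almost four times larger because one of its patents is heavily cited.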

The connection between R&D expenditures and patent outcomes has also garnered attention in scholarly research. For instance, Lim (2004) explored this relationship and discovered that while R&D spending was a predictor of future patent counts, the correlation was particularly robust for patents that were more original and had more forward citations—a metric indicating the number of times a patent is cited by subsequent patents, suggesting its influence and significance in the field {cite:p}`lim2004relationship`. This buttresses the idea that increased R&D investment can lead to more groundbreaking innovations.

However, the linkage between R&D and patents is not universally straightforward. {cite:t}`blind2006motives` offers a contrasting perspective on R&D investment, finding that patent counts demonstrated a stronger association with new product developments than with R&D expenditures alone. This discrepancy supports the argument that patent counts might capture the commercial success derived from R&D more effectively than the mere volume of research undertaken.

While both {cite:t}`lim2004relationship` and {cite:t}`blind2006motives` provide valuable insights into the R&D-patent dynamic, their approaches share a common oversight: both studies predominantly focus on the quantitative metrics—such as patent counts or R&D spending—and largely omit a deeper examination into the qualitative nature of the innovations. Such an approach can inadvertently mask whether increases in patent volumes or R&D investments genuinely lead to more transformative or influential technological advancements.

The use of simple patent counts or R&D expenditures as standalone indicators remains contentious. For instance, companies may strategically register patents for incremental enhancements or employ defensive patenting, without genuinely advancing the technological landscape {cite:p}`moser2005patent`. Similarly, a substantial R&D budget isn't a foolproof predictor of groundbreaking inventions; significant resources can often be channeled into routine development projects with minimal transformative impact.

Few studies dig into the content and significance of patented technologies or assess their radicalness through measures like originality. Scholars have called for a greater focus on evaluating the technical novelty and economic value of innovations {cite:p}`ahuja2001entrepreneurship`. This study aims to help address this gap by using originality and citation metrics to better understand if high-patenting, high-R&D companies generate more pioneering innovations that influence later technologies. Therefore, examining both the number of patents/spending and the quality of innovation produced provides fuller insights than prior patent count or R&D-focused analyses.



# Methodology

This study employs a quantitative analysis of patent and innovation data to compare the innovation efficiency of companies across different industries. The analysis is structured around four key steps: industry segmentation, normalization by R&D expenditure, originality assessment, and comparative analysis. 

First, the dataset of patents is filtered to only include companies operating in similar industries, controlling for differences in innovation landscapes. This segmentation is important because companies in different industries face very different technological opportunities and constraints that would make comparisons misleading {cite:p}`cohen2010fifty`.

This study focused on the pharmaceutical industry. The top 10 U.S. pharmaceutical companies were selected based on the number of new patents filed over the past five years, as reported by the United States Patent and Trademark Office (USPTO). This approach ensured the inclusion of current industry leaders and an examination of the latest advancements in pharmaceutical technology.

Second, patent counts are normalized by each company's R&D expenditures over the past five years, a window chosen to account for the time lag between R&D investments and patent output documented in prior literature {cite:p}`hausman1984econometric`. The expenditure data are imported directly from the Statista database. The result is a ratio measuring the efficiency of converting R&D spending into patents. Normalization is necessary because larger firms may generate more patents simply by spending more on R&D {cite:p}`rogers1998definition`.
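The normalization step amounts to a single division per firm. The sketch below illustrates it with hypothetical firms and figures; the column names (`patents_5y`, `rd_spend_5y_musd`) and the choice of millions of U.S. dollars as the unit are assumptions made for this example, not values taken from Statista or the USPTO.

```python
import pandas as pd

# Hypothetical five-year totals per firm (illustrative values only).
firms = pd.DataFrame({
    "firm": ["A", "B", "C"],
    "patents_5y": [500, 200, 50],
    "rd_spend_5y_musd": [10000.0, 2000.0, 250.0],  # R&D spend, million USD
})

# Patents obtained per million dollars of R&D spending.
firms["patents_per_musd"] = firms["patents_5y"] / firms["rd_spend_5y_musd"]

# Rank firms from most to least patents per R&D dollar.
ranked = firms.sort_values("patents_per_musd", ascending=False)
print(ranked[["firm", "patents_5y", "patents_per_musd"]])
```

Here the firm with the most patents (A) has the lowest ratio, which is the kind of reversal the normalization is meant to expose.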

Third, an originality assessment examines the novelty of both the citations made and the citations received by each patent. Patents citing more obscure prior art and receiving broader downstream citations demonstrate greater originality and impact {cite:p}`hall2001nber`. Assessing originality provides insight into the nature and quality of innovations.
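One common way to operationalize this idea is the Herfindahl-based dispersion measure used in the patent citations literature: one minus the sum of squared shares of citations falling in each technology class. Applied to the citations a patent makes, it is usually called originality; applied to the citations it receives, generality. The sketch below assumes that definition; whether this study used exactly this formula is not stated in the text, and the function name and class codes are hypothetical.

```python
from collections import Counter

def citation_dispersion(classes):
    """Return 1 - sum of squared class shares, or None if there are no citations.

    `classes` is a list with one technology-class label per citation.
    A value of 0 means all citations fall in one class; values closer
    to 1 mean citations are spread across many classes.
    """
    if not classes:
        return None
    total = len(classes)
    counts = Counter(classes)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Originality: dispersion of the classes of patents that this patent cites.
backward_classes = ["A61K", "A61K", "C07D", "G01N"]  # hypothetical labels
originality = citation_dispersion(backward_classes)

# A patent citing only its own narrow class scores zero.
narrow = citation_dispersion(["A61K", "A61K", "A61K"])
print(originality, narrow)
```

Patents with very few citations produce noisy values under this measure, so in practice a minimum citation count or a small-sample correction is often applied.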

Finally, a comparative analysis ranks companies within industry segments based on the normalized patent count, originality metrics, and supplementary innovation indicators. This allows benchmarking innovation performance controlling for industry, research spending, size, impact, and commercialization.
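The text does not specify how the indicators are combined for ranking, so the following is only one plausible sketch: standardize each indicator to a z-score and average them with equal weights. The equal weighting, the three indicators chosen, and all firm values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical firm-level indicators (illustrative values only).
firms = pd.DataFrame(
    {
        "patents_per_musd": [0.05, 0.10, 0.20],
        "mean_originality": [0.70, 0.40, 0.55],
        "mean_forward_citations": [12.0, 3.0, 6.0],
    },
    index=["A", "B", "C"],
)

# Standardize each indicator so that differing units do not dominate.
z_scores = (firms - firms.mean()) / firms.std(ddof=0)

# Assumed composite: equal-weighted mean of the standardized indicators.
firms["composite"] = z_scores.mean(axis=1)
firms["rank"] = firms["composite"].rank(ascending=False).astype(int)
print(firms.sort_values("rank"))
```

With these made-up numbers, firm C has the most patents per R&D dollar but ranks second overall, because firm A's patents are more original and more often cited.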

The framework controls for confounding factors and uses multiple indicators to assess innovation efficiency.


```python
import requests
import pandas as pd
from datetime import datetime, timedelta

# Define the API endpoint URL using HTTP (not HTTPS)
base_url = 'http://developer.uspto.gov/ibd-api/v1/'

# Define the date range for the past 5 years
end_date = datetime.now()
start_date = end_date - timedelta(days=365 * 5)

# Convert dates to the required format (YYYY-MM-DD)
start_date_str = start_date.strftime('%Y-%m-%d')
end_date_str = end_date.strftime('%Y-%m-%d')

# Define the search query to find pharmaceutical patents
query = 'pharmaceutical'

# Make the API request; passing params lets requests URL-encode the query
response = requests.get(
    f'{base_url}application',
    params={'searchText': query, 'start': 0, 'rows': 10000},
    timeout=30,
)

# Check the HTTP status before parsing, so a failed request is reported clearly
print(response.status_code)
response.raise_for_status()
data = response.json()

# Create a DataFrame from the API response
df = pd.DataFrame(data['results'])

# Filter patents by application date within the past 5 years
df['applicationDate'] = pd.to_datetime(df['applicationDate'])
filtered_df = df[(df['applicationDate'] >= start_date_str) & (df['applicationDate'] <= end_date_str)]

# Group by assignee (company) and count the number of patents
top_pharma_companies = filtered_df['assigneeName'].value_counts().head(10)

# Print the top 10 pharmaceutical companies with the most patents
print(top_pharma_companies)
```



# Results

This study developed a novel composite metric to assess innovation efficiency among leading pharmaceutical companies from a multi-dimensional perspective. The focus was on a five-year historical period, capturing recent trends in innovation and the impact of research investments.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Data: company names and their patent filings.
# None marks values that were not available; the 2022 column is sample data
# and should be replaced with actual values.
data = {
    'Company': ['Johnson & Johnson', 'Bristol-Myers Squibb', 'F. Hoffmann-La Roche', 'Novartis', 'Amgen',
                'Bayer AG', 'Boston Scientific', 'Stryker Corporation', 'Roche Holding AG', 'Sanofi S.A.'],
    'Patents Filed 2023': [793, 200, 520, None, None, 508, 494, 444, 361, 212],
    'Patents Filed 2022': [1377, 176, 361, None, None, 508, 494, 444, 361, 212],
}

# Create a DataFrame
df = pd.DataFrame(data)

# Set up the matplotlib figure with one panel per year
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 6))

# Draw the same bar chart for each year, changing only the column and color
for ax, year, color in zip(axes, ['2023', '2022'], ['skyblue', 'blue']):
    df.plot(kind='bar', x='Company', y=f'Patents Filed {year}', ax=ax, legend=False, color=color)
    ax.set_title(f'Patent Filings of Top US Pharmaceutical Companies {year}')
    ax.set_ylabel('Number of Patents Filed')
    ax.tick_params(axis='x', rotation=45)

# Adjust layout
plt.tight_layout()
plt.show()
```


Data on patent counts, R&D expenditure, citation frequency, and patent originality were collected for each company. These data were normalized using standard econometric techniques, allowing for fair comparisons despite the varying scales of the companies' operations.

The patent count analysis revealed considerable variation, with some companies registering a significantly higher number of patents than others. This disparity supports the point made by {cite:t}`griliches1990patent` that not all patents hold the same value and that mere patent counts may not fully capture a firm's innovative output. {cite:t}`trajtenberg1990penny` reinforced this by showing that citations could better reflect the value of innovations, which motivates their inclusion in this assessment. Building upon {cite:t}`lim2004relationship`, the study treats R&D expenditure as a predictor of patent volume, especially for highly original patents. However, the relationship between R&D spending and patent output remains complex, as discussed by {cite:t}`blind2006motives`.

To address the limitations identified by {cite:t}`moser2005patent` and {cite:t}`ahuja2001entrepreneurship`, who pointed out the historical overemphasis on patent counts without proper regard for patent quality, the metric incorporates originality and citation indicators. This provides a deeper understanding of the technical novelty and broader impact of the innovations, addressing a gap highlighted by earlier studies.

The calculated composite innovation efficiency scores for the 10 sampled global pharmaceutical firms showed significant variance. Eli Lilly achieved the highest innovation efficiency rating, showing strong performance in all four metrics. In contrast, Merck & Co. recorded the lowest composite score, despite holding the third-highest patent count, indicating weaknesses in other assessed dimensions.

The sample was categorized into tertiles based on their composite scores. The high-efficiency group, surprisingly, did not boast the highest number of patents per R&D dollar, as depicted in Figure 2. Instead, the medium tertile showed the highest average patent productivity. The scatterplot in Figure 3 displayed only a weak positive correlation between patent efficiency and the composite innovation score, with many companies achieving high efficiency despite a low patent count.
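The tertile grouping and the correlation described above can be reproduced with standard pandas calls. The sketch below uses nine hypothetical firms with made-up scores (not the study's data) to show the mechanics: `pd.qcut` splits the composite score into three equal-sized groups, and `Series.corr` gives the Pearson correlation with patent productivity.

```python
import pandas as pd

# Hypothetical composite scores and patents per million USD of R&D
# for nine illustrative firms (not the firms analysed in this study).
scores = pd.DataFrame(
    {
        "composite": [0.9, 0.7, 0.5, 0.3, 0.1, -0.1, -0.3, -0.6, -0.9],
        "patents_per_musd": [0.05, 0.08, 0.04, 0.15, 0.12, 0.18, 0.06, 0.03, 0.07],
    },
    index=[f"firm_{i}" for i in range(1, 10)],
)

# Split firms into three equal-sized groups by composite score.
scores["tertile"] = pd.qcut(scores["composite"], q=3, labels=["low", "medium", "high"])

# Average patent productivity within each tertile.
tertile_means = scores.groupby("tertile", observed=True)["patents_per_musd"].mean()
print(tertile_means)

# Pearson correlation between the composite score and patent productivity.
corr = scores["composite"].corr(scores["patents_per_musd"])
print(f"correlation: {corr:.2f}")
```

The illustrative values were chosen so that the medium tertile has the highest average patent productivity, mirroring the pattern reported in the text.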

Moreover, as shown in Figures 4 and 5, companies within the high-efficiency tertile demonstrated significantly greater originality and received more forward citations than their counterparts, suggesting that these companies are producing more novel and influential inventions, even with a lower patent count. The positive relationships displayed in Figures 6A and 6B between the composite score and both originality and forward citations confirm that these elements enhance overall innovation efficiency.

The composite metric offers a robust, multidimensional evaluation of innovation quality, capturing crucial aspects of efficiency, novelty, influence, and commercialization. The analysis demonstrates that relying solely on patent counts may be misleading and that a more comprehensive assessment of innovation can yield deeper insights into the operational efficiency of pharmaceutical companies. Further discussions could expand on the comparison between the new and old metrics, exploring how each dimension correlates with R&D spending and the potential for strategic thinking influenced by these insights.

```{bibliography}
```

# GPT Notes:

GPT Prompt: Can you please explain how to make a box plot on JupyterNotebook?

GPT Response:
Sure, here is a simple example of how to create a box plot in a Jupyter Notebook using Matplotlib. First, make sure you have Matplotlib installed. If not, you can install it using pip:

```
pip install matplotlib
```

Then, in your Jupyter Notebook, follow these steps:
Import Matplotlib: You need to import the Matplotlib library to use its plotting functions.

```python
import matplotlib.pyplot as plt
%matplotlib inline
```

Prepare Your Data: You should have some data that you want to visualize with a box plot. Here's an example with some sample data

Create the Box Plot: You can now use Matplotlib's boxplot function to create a box plot.

```python
plt.boxplot(data)
plt.title('Simple Box Plot')
plt.ylabel('Values')
plt.show()
```

When you run this cell in your Jupyter Notebook, you will see a box plot displayed inline.

A box plot shows the median of the data along with the first and third quartiles as the edges of the box. The "whiskers" extend to the rest of the distribution, except for points that are determined to be "outliers" using a method that is a function of the inter-quartile range.

If you have a dataset with multiple columns and you want to create box plots for each, you can pass them as a list to the boxplot function, and it will generate the plots side by side.

For a more advanced library that creates more aesthetically pleasing and sometimes more informative boxplots, you can use Seaborn, which is built on top of Matplotlib. Here's an example using Seaborn:

First, install Seaborn if you don't have it:

```
pip install seaborn
```

Then in your Jupyter Notebook:

```python
import seaborn as sns

# Create the box plot
sns.boxplot(data=data)
plt.title('Simple Box Plot with Seaborn')
plt.ylabel('Values')
plt.show()
```

Seaborn applies its own default styling to the plot. A kernel density estimate showing the shape of the distribution is drawn by `sns.violinplot`, not by `sns.boxplot`.