# TTO Patent Tracker
## Project Overview

This project is inspired by the 2024 AUTM University Operations Committee’s “Keeping Up With Patenting” curriculum. As an Applied Data Scientist in the tech transfer space, I am building an open-source ML product in public to demonstrate that it’s possible to create AI-driven tools for patent tracking without relying on commercial solutions. The tracker will be updated daily at 6am CST and will use only public data from the USPTO and other open sources.

The tool is designed to support Aether Intelligence, a quant hedge fund that leverages applied mathematics, data science, and algorithmic techniques to analyze patent data and inform investment decisions. By monitoring patents in priority technology areas, the tracker provides actionable insights into the patent landscape and highlights emerging technology trends that may impact financial markets and investment strategies. Aggregating and tagging US and PCT filings by technology domain and inventor institution enables the fund to identify disruptive innovations, assess competitive dynamics, and anticipate market shifts.

This data-driven approach empowers Aether Intelligence to make informed decisions about launching, investing in, or partnering with startups and technologies based on real-time IP insights.

**Function:**  
Build a lightweight, open-source patent tracking tool that aggregates US and PCT filings, tagging them by technology domain and inventor institution.
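
As a rough illustration of the aggregation step, the sketch below counts filings per filing route, technology domain, and assignee. It assumes records shaped like the synthetic data later in this notebook (`patent_number`, `technology_domain`, `assignee`), uses the assignee as a stand-in for inventor institution, and infers the route from the publication-number prefix (`US` for US filings, `WO` for published PCT applications). The function names and sample records are hypothetical.

```python
from collections import Counter


def filing_route(patent_number):
    """Infer the filing route from the publication-number prefix."""
    prefix = patent_number[:2].upper()
    if prefix == "US":
        return "US"
    if prefix == "WO":  # PCT applications are published with a WO prefix
        return "PCT"
    return "Other"


def aggregate_filings(records):
    """Count filings per (route, technology domain, assignee)."""
    counts = Counter()
    for record in records:
        key = (
            filing_route(record["patent_number"]),
            record["technology_domain"],
            record["assignee"],  # assumption: assignee approximates institution
        )
        counts[key] += 1
    return counts


# Hypothetical records for illustration only.
sample_records = [
    {"patent_number": "US12345678B2", "technology_domain": "AI", "assignee": "Example University"},
    {"patent_number": "US23456789B2", "technology_domain": "AI", "assignee": "Example University"},
    {"patent_number": "WO2024123456A1", "technology_domain": "Energy", "assignee": "Example Institute"},
]

for key, n in aggregate_filings(sample_records).most_common():
    print(key, n)
```

Keeping the grouping key as a plain tuple makes it easy to add or drop a dimension later, for example a filing year.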

**Tech Stack:**  
- USPTO API  
- NLP-based keyword classification
- Profiles RNS API (e.g., profiles.uchicago.edu) for publication and researcher data
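
The keyword classification step could start as simply as the sketch below, which tags a title or abstract with the domain whose keywords appear most often. The domain names match the synthetic data later in this notebook; the keyword lists and the `classify_domain` helper are assumptions made for illustration, not a curated taxonomy.

```python
import re
from collections import Counter

# Hypothetical keyword lists; a real tracker would curate these per domain.
DOMAIN_KEYWORDS = {
    "AI": ["neural network", "machine learning", "deep learning"],
    "Biotech": ["antibody", "crispr", "protein"],
    "Robotics": ["robot", "actuator", "manipulator"],
    "Materials": ["alloy", "polymer", "composite"],
    "Energy": ["battery", "photovoltaic", "fuel cell"],
}


def classify_domain(text, keywords=DOMAIN_KEYWORDS):
    """Return the domain with the most keyword hits, or 'Unclassified'."""
    lowered = text.lower()
    hits = Counter()
    for domain, terms in keywords.items():
        for term in terms:
            # Anchor at a word start so "robot" also matches "robotic".
            hits[domain] += len(re.findall(r"\b" + re.escape(term), lowered))
    # Ties go to the domain listed first in the keyword dictionary.
    domain, count = max(hits.items(), key=lambda item: item[1])
    return domain if count > 0 else "Unclassified"


print(classify_domain("A lithium battery with a polymer separator and a fuel cell backup"))
print(classify_domain("A method for brewing coffee"))
```

A count-based rule like this is transparent and easy to debug, which makes it a reasonable baseline before moving to a trained classifier.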

**Bonus Features:**  
- Link academic publications to patent filings via inventor name matching
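
One way to approach the publication link is to normalize inventor and author names and join on the result. The sketch below assumes patents carry an `inventor` field (as in the synthetic data) and that publication records expose hypothetical `author` and `title` fields. It does not call the Profiles RNS API, and exact-name matching will produce false matches for common names.

```python
import unicodedata
from collections import defaultdict


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_accents)
    return " ".join(cleaned.lower().split())


def link_publications(patents, publications):
    """Map each patent number to publication titles by the same normalized name."""
    titles_by_author = defaultdict(list)
    for pub in publications:
        titles_by_author[normalize_name(pub["author"])].append(pub["title"])
    return {
        patent["patent_number"]: titles_by_author.get(normalize_name(patent["inventor"]), [])
        for patent in patents
    }


# Hypothetical records for illustration only.
patents = [
    {"patent_number": "US12345678B2", "inventor": "María García"},
    {"patent_number": "US23456789B2", "inventor": "John Smith"},
]
publications = [
    {"author": "Maria  Garcia", "title": "Example paper on solid electrolytes"},
]

print(link_publications(patents, publications))
```

Adding a second key, such as institution, would reduce false matches at the cost of missing inventors who have changed affiliation.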



In [6]:
# Import packages

# Data handling
import pandas as pd
import numpy as np

# Modeling and evaluation
from sklearn import model_selection, metrics

# Data retrieval and parsing
import requests
from bs4 import BeautifulSoup
import xmltodict

# NLP
import nltk
import spacy

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns



In [2]:
# Generate synthetic data to simulate a real-world scenario
# Requires the Faker package; install it first with: pip install faker
import pandas as pd
import numpy as np
from faker import Faker

# Seed both generators so the synthetic records are reproducible
Faker.seed(42)
np.random.seed(42)

fake = Faker()
num_records = 100

# One list per column, each with num_records fake values
data = {
    'patent_number': [f"US{np.random.randint(10000000, 99999999)}B2" for _ in range(num_records)],
    'title': [fake.sentence(nb_words=6) for _ in range(num_records)],
    'abstract': [fake.paragraph(nb_sentences=3) for _ in range(num_records)],
    'inventor': [fake.name() for _ in range(num_records)],
    'filing_date': [fake.date_between(start_date='-10y', end_date='today') for _ in range(num_records)],
    'assignee': [fake.company() for _ in range(num_records)],
    'technology_domain': [np.random.choice(['AI', 'Biotech', 'Robotics', 'Materials', 'Energy']) for _ in range(num_records)]
}

df_synthetic = pd.DataFrame(data)
df_synthetic.head()

ModuleNotFoundError: No module named 'faker'

In [None]:
# Import data into df for exploratory data analysis
