Python Scripts for data pre- and post-processing (parsing, cleaning and analysis)
This branch is 103 commits ahead of patentnetwork:master.

Python scripts for disambiguating patent data

The following collection of scripts performs pre- and post-processing on patent data as part of the patent inventor disambiguation process.

CURRENT:

(I) DATASET PREPARATION

(1) XML Parsing

a. Open XMLParse2008.py
b. Set the variable flder = <folder that contains all raw XML files>
c. Run XMLParse2008.py
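
Before running the parser, it can help to confirm that the folder assigned to flder actually contains XML files. The following is a minimal sketch of such a check, not part of XMLParse2008.py; the function name list_xml_files is hypothetical, and the ".xml" extension is an assumption about how the raw files are named.

```python
import os


def list_xml_files(flder):
    """Return the sorted paths of all .xml files directly inside flder.

    Hypothetical helper: assumes the raw files end in ".xml" (any case)
    and sit at the top level of the folder, not in subfolders.
    """
    return sorted(
        os.path.join(flder, name)
        for name in os.listdir(flder)
        if name.lower().endswith(".xml")
    )


if __name__ == "__main__":
    # Replace "." with the same folder you assign to flder.
    for path in list_xml_files("."):
        print(path)
```

An empty result suggests that flder points at the wrong location.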

(2) Data Cleaning

- scripts_v2.py should be in the same directory as all the sqlite3 files produced by the XML Parsing step.
a. Run scripts_v2.py
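
To confirm that the sqlite3 files from the parsing step are where scripts_v2.py expects them, a short check like the one below may be useful. It is only a sketch: the function name summarize_sqlite_files and the "*.sqlite3" file pattern are assumptions, since the README does not state how the database files are named. The query against sqlite_master is standard SQLite.

```python
import glob
import os
import sqlite3


def summarize_sqlite_files(directory, pattern="*.sqlite3"):
    """Map each matching database file name to the sorted list of its tables.

    Hypothetical helper: the default pattern is an assumption about the
    file extension used by the parsing step.
    """
    summary = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        conn = sqlite3.connect(path)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        summary[os.path.basename(path)] = [row[0] for row in rows]
    return summary


if __name__ == "__main__":
    # Run from the directory that holds scripts_v2.py.
    for name, tables in summarize_sqlite_files(".").items():
        print(name, tables)
```

A file that is missing from the output, or that lists no tables, points to a problem in the XML Parsing step rather than in the cleaning step.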

(3) Table Consolidation

a. Run invpat.py
b.

(II) RESULTS ANALYSIS

From the command line, run bmVerify_v3.py.

Use python bmVerify_v3.py ? or python bmVerify_v3.py help for more information.

(III) Other scripts

Run from command line to create files:

python patentYear.py [year] [src]

python createFullSet.py [start_year] [end_year]
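
When several years need to be processed, the two commands can be generated rather than typed by hand. The sketch below only builds the command lines. It assumes that the bracketed placeholders are positional arguments, that end_year is inclusive, and that one patentYear.py run per year followed by a single createFullSet.py run is the intended order. None of these is confirmed by the scripts themselves, and the function name build_commands is hypothetical.

```python
def build_commands(start_year, end_year, src):
    """Build argv lists: one patentYear.py call per year, then createFullSet.py.

    Assumptions (not confirmed by the scripts): arguments are positional,
    end_year is inclusive, and the full set is created after the yearly runs.
    """
    commands = [
        ["python", "patentYear.py", str(year), src]
        for year in range(start_year, end_year + 1)
    ]
    commands.append(
        ["python", "createFullSet.py", str(start_year), str(end_year)]
    )
    return commands


if __name__ == "__main__":
    # "raw_data" is a placeholder for the [src] argument.
    for argv in build_commands(2000, 2002, "raw_data"):
        print(" ".join(argv))
```

Each returned list can be passed to subprocess.run once the argument order has been checked against the scripts.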

PREVIOUS:

bmVerify compares the consolidated results with an existing benchmark

compressBlk takes a "disambiguated" dataset and consolidates it into a new dataset.

fwork.py contains Python scripts I reuse
