LalaNguyen/patent-analysis-framework

patent-analysis-framework

Proposed framework for extracting and analyzing Patent data using Neo4j and MongoDB.

# Installation

To install the required modules, run the following command:

pip install -r /path/to/requirements.txt

Neo4j must be installed. For further information, please refer to http://neo4j.com/docs/stable/server-installation.html

MongoDB must be installed. For further information, please refer to http://docs.mongodb.org/manual/installation/

# Usage

## Patent resources

XML patent files can be downloaded from http://www.google.com/googlebooks/uspto-patents-grants-text.html

## Extract data from XML

You can extract data from XML with the following command:

python parse_patent.py --input=<inputfile.xml> --size=<number_of_xmls> --export=<export_type>

###### Arguments

inputfile.xml	--	Path to the input file
number_of_xmls	--	Number of patents to parse
export_type		--	File type to export (currently only json is supported; the exported file is named 'data.json')
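As a rough illustration of what this extraction step involves, the sketch below splits a bulk file into individual patent documents and collects a few fields for JSON export. It is not the code in parse_patent.py. It assumes the bulk file is a concatenation of XML documents, each starting with its own `<?xml` declaration, and that the patent number and title are found in `publication-reference/document-id/doc-number` and `invention-title`. All function names and the sample data are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

XML_DECL = "<?xml"

# Two tiny made-up documents standing in for a real bulk file.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><us-bibliographic-data-grant>"
    "<publication-reference><document-id><doc-number>0000001</doc-number>"
    "</document-id></publication-reference>"
    "<invention-title>Widget</invention-title>"
    "</us-bibliographic-data-grant></us-patent-grant>\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<us-patent-grant><us-bibliographic-data-grant>"
    "<publication-reference><document-id><doc-number>0000002</doc-number>"
    "</document-id></publication-reference>"
    "<invention-title>Gadget</invention-title>"
    "</us-bibliographic-data-grant></us-patent-grant>\n"
)


def split_documents(text):
    """Yield each XML document from a concatenated bulk file (assumed layout)."""
    start = text.find(XML_DECL)
    while start != -1:
        end = text.find(XML_DECL, start + len(XML_DECL))
        yield text[start:end] if end != -1 else text[start:]
        start = end


def extract_patent(doc):
    """Pull a few fields out of one document (element names are assumptions)."""
    root = ET.fromstring(doc.encode("utf-8"))
    return {
        "number": root.findtext(".//publication-reference/document-id/doc-number"),
        "title": root.findtext(".//invention-title"),
    }


def parse_bulk(text, size=None):
    """Parse at most `size` patents, mirroring the --size argument."""
    patents = []
    for doc in split_documents(text):
        if size is not None and len(patents) >= size:
            break
        patents.append(extract_patent(doc))
    return patents


def export_json(patents, path="data.json"):
    """Write the parsed patents to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(patents, handle, indent=2)
```

Calling `export_json(parse_bulk(SAMPLE, size=1))` would write only the first patent to 'data.json'. Real bulk files may also carry DOCTYPE declarations and entities that this sketch does not handle.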

## Insert data into database

You can insert data into Neo4j or MongoDB with the following command:

python db_client.py

This module reads 'data.json' from the working folder and loads its contents directly into the database through the REST API.
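To make the REST step more concrete, here is a minimal sketch of how one record from 'data.json' could be turned into HTTP requests. It is an assumption about the general approach, not the code in db_client.py. The Eve resource name `patents`, the `Patent` label, the record fields, and the function names are all hypothetical. The Neo4j URL and the `{param}` placeholder syntax are those of the legacy transactional HTTP endpoint in Neo4j 2.x and 3.x; newer servers use a different path and `$param`.

```python
import json
import urllib.request

# Hypothetical Eve resource (Eve runs on Flask's default port 5000).
EVE_URL = "http://127.0.0.1:5000/patents"
# Legacy transactional Cypher endpoint (Neo4j 2.x / 3.x).
NEO4J_URL = "http://127.0.0.1:7474/db/data/transaction/commit"

JSON_HEADERS = {"Content-Type": "application/json", "Accept": "application/json"}


def eve_request(patent, url=EVE_URL):
    """Build a POST request that stores one patent in MongoDB through Eve."""
    body = json.dumps(patent).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=JSON_HEADERS, method="POST")


def neo4j_request(patent, url=NEO4J_URL):
    """Build a POST request that merges one patent node via Cypher."""
    payload = {
        "statements": [
            {
                # Assumed graph model: one :Patent node keyed by its number.
                "statement": "MERGE (p:Patent {number: {number}}) SET p.title = {title}",
                "parameters": patent,
            }
        ]
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=JSON_HEADERS, method="POST")


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it keeps the payload easy to inspect before any server is running.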

Caution: You must start the Neo4j, Eve, and MongoDB servers before executing this module; otherwise it will raise an error.

To start Neo4j: ./path/to/neo4j/bin/neo4j start

To start Eve: python /path/to/eve/run.py

To start MongoDB: ./path/to/mongodb/bin/mongod
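Before running db_client.py, it can help to confirm that all three servers are accepting connections. The following is a small sketch of such a check, not part of the framework. It assumes the usual default ports (7474 for Neo4j's HTTP interface, 5000 for a Flask-based Eve app, 27017 for mongod) on the local machine; the function names are hypothetical.

```python
import socket

# Assumed default ports; adjust if your installation differs.
SERVICES = {"Neo4j": 7474, "Eve": 5000, "MongoDB": 27017}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_services(host="127.0.0.1", services=SERVICES):
    """Return the names of services that are not reachable."""
    return [name for name, port in services.items() if not is_listening(host, port)]
```

An empty list from `missing_services()` suggests all three servers are up. An open port only shows that something is listening there, not that it is the expected service.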

## Extract data from XML using lxml and insert data into Neo4j using Object Graph Mapping (Py2neo)

You can extract data from XML using lxml and insert it into Neo4j using Py2neo with the following command:

python xmlParser.py -i=<inputfile.xml> -o=<outputfile> -s=<number_of_xmls> <-p>

xmlParser.py is located at src/ogm

###### Arguments

inputfile.xml	--	Path to the input file
number_of_xmls	--	Number of patents to parse
outputfile		--	Path to the output file
-p				--	Insert the parsed data into Neo4j (if omitted, xmlParser only reads the whole XML file without inserting data into Neo4j)

## Generate csv from Neo4j database

You can export the nodes and relationships of the Neo4j database to two .csv files with the following command:

python generate_csv.py

Caution: You must start the Neo4j server first.
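The two-file layout can be pictured with the sketch below, which writes one CSV for nodes and one for relationships from in-memory data. It does not query Neo4j and is not the code in generate_csv.py. The column names, the sample records, and the function names are hypothetical.

```python
import csv
import io

# Made-up records standing in for data read from Neo4j.
NODES = [
    {"id": 1, "label": "Patent", "name": "Widget"},
    {"id": 2, "label": "Inventor", "name": "Ada"},
]
RELATIONSHIPS = [
    {"start": 2, "type": "INVENTED", "end": 1},
]


def write_nodes(nodes, out):
    """Write one row per node to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(["id", "label", "name"])
    for node in nodes:
        writer.writerow([node["id"], node["label"], node["name"]])


def write_relationships(relationships, out):
    """Write one row per relationship, referring to nodes by id."""
    writer = csv.writer(out)
    writer.writerow(["start", "type", "end"])
    for rel in relationships:
        writer.writerow([rel["start"], rel["type"], rel["end"]])
```

To write real files, pass handles opened with `open("nodes.csv", "w", newline="")`, as the csv module recommends, instead of `io.StringIO` buffers.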

## Other Info

For API documentation, please refer to http://patent-analysis-framework.readthedocs.org/en/latest/
