The Text Parsing and Analytics Tool is a lightweight text analysis application designed to assist researchers performing content analysis on newspaper articles. It is specifically designed to highlight proper nouns so a researcher can determine the people and objects that are common among the texts.

Text Parsing and Analytics Tool

The Text Parsing and Analytics Tool is a lightweight text analysis application designed to assist researchers performing content analysis on newspaper articles. This is useful for researchers who are analyzing numerous sources of newspaper articles. The application takes text files and converts them to a table for analysis.

The application serves three main purposes:

  1. Import Articles: Gather text output from discrete sources (Westlaw, LexisNexis, ProQuest) and convert the batch of articles in each text file to a table.
  2. Run Query A: Run a regular expression on the body text of each article to locate syntactical distinctions. The output is a list of keywords that match the criteria set by the regular expression.
  3. Run Query B: Take the list of keywords generated by Query A and provide additional context in a CSV file. Additional data includes:
    • the article ID of the article that contained the keyword
    • the total count of the keyword across the articles
    • 30 characters surrounding the keyword
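
The context step of Query B can be illustrated with a short sketch. This is written in Python for illustration only and is not the tool's own PHP code. The function name `keyword_contexts`, the row layout, and the reading of "30 characters surrounding" as 30 characters on each side of the keyword are all assumptions.

```python
def keyword_contexts(article_id, body, keyword, window=30):
    """Return (article_id, keyword, snippet) for every occurrence of keyword in body.

    Assumption: "surrounding" means `window` characters on each side of the match.
    """
    rows = []
    start = body.find(keyword)
    while start != -1:
        end = start + len(keyword)
        # Clamp at the start of the text; slicing past the end is safe in Python.
        snippet = body[max(0, start - window):end + window]
        rows.append((article_id, keyword, snippet))
        start = body.find(keyword, end)
    return rows


# Hypothetical article ID and text, for illustration only.
for row in keyword_contexts(7, "Officials said New York would host the summit.", "New York"):
    print(row)
```

Each returned row could then be written out as one line of the CSV file.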

Current state of software

  • This is an early-stage application with minimal security. We recommend connecting the Text Parsing and Analytics Tool to a local database rather than to one on a server.
  • When parsing Westlaw and ProQuest documents, the tool generates tab-separated value (TSV) output suitable for entering into the main database ("Master List"). At this stage of development, only LexisNexis imports are committed directly to the SQL database.

Installation Instructions

  • Download this repository
  • Install the "/TpaTool" folder on your server
  • Create a MySQL database, user, and password
  • Open "/TpaTool/settings.php" and enter your database, user, and password information
  • Save and run

License

Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

Code Snippets & Queries:

Query A Example:

The application allows a user to supply whatever regular expression they need. The default regular expression locates two or more consecutive capitalized words (proper nouns).

Default regular expression:

    ([A-Z][a-zA-Z0-9-]*)([\s][A-Z][a-zA-Z0-9-]*)+

Query B Example:

Query B allows a user to paste in the keywords generated by Query A and provides additional context in a CSV file (the GROUP BY clause of the SQL command is configurable).

Query, run once for each $keyword:

    SELECT Pub_Date, COUNT(*)
    FROM Master_List
    WHERE MATCH(Article) AGAINST('$keyword')
    GROUP BY Pub_Date;  -- the GROUP BY column may change
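
The per-keyword grouping can be sketched without a database. The Python below is an illustration under stated assumptions, not the tool's implementation: it swaps MySQL full-text matching for a plain substring test, takes articles as hypothetical `(pub_date, body)` pairs, and uses invented CSV column names.

```python
import csv
import io
from collections import Counter


def keyword_counts_by_date(articles, keywords):
    """Count, per (keyword, pub_date), the articles whose body contains the keyword.

    Simplification: a substring test stands in for MATCH ... AGAINST.
    """
    counts = Counter()
    for pub_date, body in articles:
        for keyword in keywords:
            if keyword in body:
                counts[(keyword, pub_date)] += 1
    return counts


def counts_to_csv(counts):
    """Render the grouped counts as CSV text (column names are assumptions)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["keyword", "pub_date", "article_count"])
    for (keyword, pub_date), n in sorted(counts.items()):
        writer.writerow([keyword, pub_date, n])
    return out.getvalue()


# Hypothetical sample data, for illustration only.
articles = [
    ("2015-01-01", "Jane Smith spoke at the rally."),
    ("2015-01-01", "Jane Smith responded to critics."),
    ("2015-01-02", "New York prepares for snow."),
]
print(counts_to_csv(keyword_counts_by_date(articles, ["Jane Smith", "New York"])))
```

Grouping by a different column, such as a source name, would only change the second element of the counter key, which mirrors the configurable GROUP BY in the query.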

Authors

  • Code Development: Austin Meyers (AK5A)
  • Research & Query: David Rheams (Dr-Heams)