Bernhard Rieder edited this page Feb 27, 2017 · 7 revisions

LineMiner

A tool for "distant reading" via word searches in temporalized text files, for example CSVs where every line is a textual unit with a timestamp (a tweet, a comment, etc.). It should work with any file that has some text and a timestamp on each line. It currently detects files exported from Netvizz, YouTube Data Tools, DMI-TCAT, and Reddit Tools automatically.
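To illustrate the basic idea of a word search over timestamped lines, here is a minimal Python sketch (not LineMiner's own PHP code) that counts, per day, how many lines contain a search term. The column names `time` and `text`, the timestamp format, and the function `count_matches_per_day` are hypothetical; real exports use their own headers and formats.

```python
import csv
import io
import re
from collections import Counter
from datetime import datetime


def count_matches_per_day(rows, text_col, time_col, term,
                          time_format="%Y-%m-%d %H:%M:%S"):
    """Count how many lines mention `term` (whole word, case-insensitive) per day.

    rows: iterable of dicts, e.g. from csv.DictReader.
    text_col / time_col: hypothetical column names for text and timestamp.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for row in rows:
        if pattern.search(row[text_col]):
            # Bucket each matching line by the calendar day of its timestamp
            day = datetime.strptime(row[time_col], time_format).date().isoformat()
            counts[day] += 1
    return dict(sorted(counts.items()))


# Small made-up sample: one textual unit per line, each with a timestamp
sample = """time,text
2017-02-01 10:00:00,Great video
2017-02-01 12:30:00,not a great idea
2017-02-02 09:15:00,boring
2017-02-02 18:45:00,GREAT stuff
"""
result = count_matches_per_day(csv.DictReader(io.StringIO(sample)), "text", "time", "great")
print(result)  # {'2017-02-01': 2, '2017-02-02': 1}
```

Day-level buckets are only one choice; the same loop works with hourly or weekly buckets by changing how `day` is derived.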

Requirements

LineMiner uses plain PHP (> 5.3) for server-side processing and JavaScript for the interface and visualizations. An SSD is recommended when working with larger files.

Installation

Clone the repository with Git, or download the files, into a directory on your server or machine. Make sure that the script can read from the /data and /stopwords folders and write to the /output folder.
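If the tool fails to load data or save results, the folder permissions are a likely culprit. The following Python sketch (a hypothetical helper, not part of LineMiner) checks them. It assumes the three folders sit directly under the installation directory, and because `os.access` tests the permissions of the user running the script, it should be run as the same user as the web server or PHP process.

```python
import os
import tempfile


def check_folders(base_dir):
    """Return a list of permission problems for LineMiner's folders (empty if all is fine).

    Assumes data/, stopwords/ and output/ live directly under base_dir.
    """
    problems = []
    # data/ and stopwords/ must be listable and readable
    for name in ("data", "stopwords"):
        path = os.path.join(base_dir, name)
        if not (os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)):
            problems.append(f"cannot read {path}")
    # output/ must be writable so results can be saved
    out = os.path.join(base_dir, "output")
    if not (os.path.isdir(out) and os.access(out, os.W_OK | os.X_OK)):
        problems.append(f"cannot write {out}")
    return problems


# Demo: create data/ and stopwords/ but deliberately omit output/
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "data"))
    os.mkdir(os.path.join(base, "stopwords"))
    problems = check_folders(base)
    print(problems)  # reports only the missing output/ folder
```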

File types and locations

The data files to analyze should be uploaded (via FTP) to /data and must have a .tab, .tsv, or .csv file extension. The tool currently detects the following file types automatically:

  • Netvizz comment files
  • YouTube Data Tools comment files
  • DMI-TCAT exports
  • Reddit Tools exports
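The extension rule for files in /data can be sketched in Python as below. This is an illustration, not LineMiner's code: the mapping of .tab/.tsv to tab separators and .csv to commas follows the usual conventions and is an assumption, as is the case-insensitive extension match. `list_data_files` is a hypothetical name.

```python
import os
import tempfile

# Accepted extensions from the wiki; the delimiter for each is an assumption
DELIMITERS = {".tab": "\t", ".tsv": "\t", ".csv": ","}


def list_data_files(data_dir):
    """Return (filename, delimiter) pairs for files with an accepted extension."""
    files = []
    for name in sorted(os.listdir(data_dir)):
        ext = os.path.splitext(name)[1].lower()
        if ext in DELIMITERS:
            files.append((name, DELIMITERS[ext]))
    return files


# Demo with empty placeholder files; notes.txt is ignored
with tempfile.TemporaryDirectory() as data_dir:
    for name in ("a.csv", "b.tsv", "c.tab", "notes.txt"):
        open(os.path.join(data_dir, name), "w").close()
    found = list_data_files(data_dir)
    print(found)  # [('a.csv', ','), ('b.tsv', '\t'), ('c.tab', '\t')]
```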

Additional stopword files can be added to /stopwords using the stopwords_nameoflanguage.txt naming scheme. Stopword files should contain one word per line. Some ready-made stopword files can be found here.
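As a sketch of how the naming scheme and the one-word-per-line format fit together, the Python helper below (hypothetical, not LineMiner's PHP code) builds a language-to-stopwords mapping from the folder. Reading the files as UTF-8, lowercasing the words, and skipping blank lines are assumptions.

```python
import os
import re
import tempfile

# stopwords_<language>.txt, e.g. stopwords_english.txt
STOPWORD_FILE = re.compile(r"^stopwords_(.+)\.txt$")


def load_stopwords(stopword_dir):
    """Map each language name to the set of stopwords in its file."""
    lists = {}
    for name in sorted(os.listdir(stopword_dir)):
        match = STOPWORD_FILE.match(name)
        if not match:
            continue  # ignore files that do not follow the naming scheme
        with open(os.path.join(stopword_dir, name), encoding="utf-8") as fh:
            # One word per line; blank lines are skipped
            words = {line.strip().lower() for line in fh if line.strip()}
        lists[match.group(1)] = words
    return lists


# Demo: one valid stopword file and one unrelated file
with tempfile.TemporaryDirectory() as stop_dir:
    with open(os.path.join(stop_dir, "stopwords_english.txt"), "w", encoding="utf-8") as fh:
        fh.write("the\nand\n\nof\n")
    with open(os.path.join(stop_dir, "readme.md"), "w", encoding="utf-8") as fh:
        fh.write("not a stopword list\n")
    stopwords = load_stopwords(stop_dir)
    print(sorted(stopwords["english"]))  # ['and', 'of', 'the']
```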
