Home
LineMiner is a tool for "distant reading" via word searches in temporalized text files, for example CSVs where every line is a textual unit with a timestamp (tweet, comment, etc.). It should work with any file that has some text and a timestamp per line. It currently detects files exported from Netvizz, YouTube Data Tools, DMI-TCAT, and Reddit Tools automatically.
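To make the idea of a word search over timestamped lines concrete, here is a minimal Python sketch. It is not LineMiner's own code (the tool itself is written in PHP and JavaScript), and the column names `timestamp` and `text`, the sample rows, and the function name `count_matches_per_day` are assumptions made up for illustration.

```python
import csv
import io
from collections import Counter


def count_matches_per_day(rows, query):
    """Count rows whose text contains `query`, grouped by day (YYYY-MM-DD)."""
    counts = Counter()
    needle = query.lower()
    for row in rows:
        # Case-insensitive substring match on the (assumed) "text" column.
        if needle in row["text"].lower():
            # Assumes ISO-like timestamps, so the first 10 characters are the date.
            counts[row["timestamp"][:10]] += 1
    return dict(counts)


# Hypothetical sample data: one textual unit with a timestamp per line.
sample = io.StringIO(
    "timestamp,text\n"
    "2020-01-01 10:00:00,I love this video\n"
    "2020-01-01 12:30:00,Nothing to see\n"
    "2020-01-02 09:15:00,Love it\n"
)
result = count_matches_per_day(csv.DictReader(sample), "love")
```

Counting matches per time bucket in this way gives the kind of series that can then be plotted over time.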
LineMiner uses basic PHP (> 5.3) for server-side processing and JavaScript for the interface and visualization. An SSD is recommended when working with larger files.
Clone the repository with Git or download the files into a directory on your server or machine. Make sure that the script can read from the folders /data and /stopwords, and write to /output.
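The folder permissions can be checked before the first run. The following is a rough Python sketch under the assumption that /data, /stopwords, and /output sit directly inside the installation directory; the helper name `check_directories` is hypothetical and not part of LineMiner. Note that it tests access for the user running the check, which may differ from the user the web server runs PHP as.

```python
import os
import tempfile


def check_directories(base_dir):
    """Map each required folder to True if it exists with the access it needs."""
    # Reading a folder's contents needs read + execute; writing needs write + execute.
    required = {
        "data": os.R_OK | os.X_OK,
        "stopwords": os.R_OK | os.X_OK,
        "output": os.W_OK | os.X_OK,
    }
    status = {}
    for name, mode in required.items():
        path = os.path.join(base_dir, name)
        status[name] = os.path.isdir(path) and os.access(path, mode)
    return status


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as base:
    for name in ("data", "stopwords", "output"):
        os.mkdir(os.path.join(base, name))
    status = check_directories(base)
```

Any folder reported as `False` is either missing or lacks the required permission.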
The data files to analyze should be uploaded (for example via FTP) to /data and must have a .tab, .tsv, or .csv file extension. Currently, the tool automatically detects the following file types:
- Netvizz comment files
- YouTube Data Tools comment files
- DMI-TCAT exports
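As an illustration of the extension rule above, this Python sketch maps a file name to a field delimiter. The mapping (tab for .tab/.tsv, comma for .csv), the case-insensitive matching, and the name `delimiter_for` are assumptions for the example; how LineMiner itself handles this is not documented here.

```python
import os

# Assumed mapping from accepted extensions to field delimiters.
DELIMITERS = {".tab": "\t", ".tsv": "\t", ".csv": ","}


def delimiter_for(filename):
    """Return the delimiter for an accepted data file, or None if the extension is not accepted."""
    extension = os.path.splitext(filename)[1].lower()
    return DELIMITERS.get(extension)
```

A file such as `notes.txt` would yield `None` and would need to be renamed or converted before being placed in /data.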
Additional stopword files can be added to /stopwords, using the stopwords_nameoflanguage.txt naming scheme. Stopword files should contain one word per line. Some stopword files can be found here.
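The naming scheme and the one-word-per-line format can be sketched in Python as follows. The helper names `language_of` and `parse_stopwords` are hypothetical, and lowercasing the words and skipping blank lines are assumptions rather than documented LineMiner behavior.

```python
import re

# Matches the stopwords_nameoflanguage.txt scheme and captures the language part.
STOPWORD_FILE = re.compile(r"^stopwords_([^.]+)\.txt$")


def language_of(filename):
    """Return the language encoded in a stopword file name, or None if it does not fit the scheme."""
    match = STOPWORD_FILE.match(filename)
    return match.group(1) if match else None


def parse_stopwords(lines):
    """Build a set of stopwords from an iterable of lines, one word per line."""
    return {line.strip().lower() for line in lines if line.strip()}
```

For example, `language_of("stopwords_english.txt")` gives `"english"`, and passing an open file object to `parse_stopwords` yields the set of words it contains.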