SOCC/scripts at master · sfu-discourse-lab/SOCC

History

Name		Name	Last commit message	Last commit date
parent directory ..
commenter_stats		commenter_stats
visualization_notebooks		visualization_notebooks
appraisal_analysis.R		appraisal_analysis.R
clean_comments.py		clean_comments.py
combine_webanno.py		combine_webanno.py
old_combine_comments.py		old_combine_comments.py
projects_to_tsv.py		projects_to_tsv.py
readme		readme
rename_webanno.py		rename_webanno.py
socc_comment_profilling.py		socc_comment_profilling.py
webanno_to_sentence.py		webanno_to_sentence.py
webanno_to_span.py		webanno_to_span.py

readme

rename_webanno.py renames all files in an unzipped WebAnno project to use the comment counter names used elsewhere for this project.

clean_comments.py removes lines beginning with '#' from curated files from a WebAnno project when given the path to a '.../curation' folder in an exported WebAnno project, assuming the project was exported in TSV format. This allows those TSVs to be read by the Pandas package in Python.

combine_webanno.py combines all WebAnno TSVs, pulling them from multiple folders into one TSV.

webanno_to_sentence.py uses the output of combine_webanno.py, a mapping of comment names, and a constructiveness and toxicity CSV. It combines Attitude, negation, constructiveness, and toxicity annotations into one dataframe, with a row for each sentence of the comment followed by a row for each annotated span in the comment. Constructiveness and toxicity were annotated at the comment level (not the word level), so all sentences/spans within a comment will have a column for whether the comment was constructive/toxic as well.

webanno_to_span.py asks for a path to a folder containing the files cleaned by clean_comments.py and writes a new CSV of those annotations combined to a path of your choice. It also reorganizes the data so that each row represents one annotated span (in original files, each row represents a word). It also sorts the rows by their position in the comment (so that the comments can be read from top to bottom in order) and then by the size of the annotation (so that longer spans appear before shorter ones). old_combine_comments.py is the older version of this.

appraisal_analysis.R uses the combined dataframe and does different counts and analysis. The analysis is complete, but the code is still messy.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts

scripts

readme

Files

scripts

Directory actions

More options

Directory actions

More options

Latest commit

History

scripts

Folders and files

parent directory

readme