### Gaslighting - Transcript Processing & Data Collection
_____________________________

This is the notebook I used to process all the transcripts. It contains data for the four primary transcripts, plus a fifth, the conversation with Munroe Bergdorf, which I transcribed but for the most part decided not to use, because the presence of a third speaker in the conversation complicated the analysis slightly.


In [1]:
#import libraries and scripts
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from Python.tr_init import *
from Python.tr_class import *
from Python.tr_overlap import *
from Python.tr_tag import *

In [2]:
#append names of txt files in txt/ dir to txt_list
txt_directory = './txt/'
tagged_directory = './tagged/'
txt_list = []

#os.listdir returns names in arbitrary order, so sort for a reproducible run
for file in sorted(os.listdir(txt_directory)):
    filename = os.fsdecode(file)
    txt_list.append(os.path.join(txt_directory, filename))


In [3]:
#define write_latex function

#write new file, given a transcript class instance and destination directory
def write_latex(transcript, directory):
    tagged_file = transcript.path[6:-4] + '_tagged.txt'
    destination = directory + tagged_file
    with open(destination, 'w') as f:
        for line in transcript.new_lines:
            f.write(line)
            f.write('\n')
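The `[6:-4]` slice in `write_latex` works because every path starts with the six characters `./txt/` and ends with the four characters `.txt`. A minimal sketch of an alternative that does not depend on those lengths, assuming the same file layout; `transcript_name` and `tagged_path` are hypothetical helper names, not part of the `Python.tr_*` scripts:

```python
from pathlib import Path

def transcript_name(txt_path):
    """Return the file name without its directory or extension."""
    return Path(txt_path).stem

def tagged_path(txt_path, directory):
    """Build the destination path for the tagged copy of a transcript."""
    return str(Path(directory) / (transcript_name(txt_path) + '_tagged.txt'))

print(transcript_name('./txt/andrew_1.txt'))  # andrew_1
print(tagged_path('./txt/andrew_1.txt', './tagged/'))
```

For the paths used in this notebook, `transcript_name` gives the same result as the slice, and it keeps working if `txt_directory` is renamed.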
   

In [4]:
#create transcript class instance & write_latex for each interview
tr_list = []
for txt_path in txt_list:
    #parse name from path and create class instance
    transcript_name = txt_path[6:-4]
    tr_class = Transcript(txt_path)
    #add to the notebook namespace under its own name and append to tr_list
    globals()[transcript_name] = tr_class
    if transcript_name != 'transcript_test_file':
        tr_list.append(tr_class)
    #process class instance
    init_routine(tr_class)
    overlap_routine(tr_class)
    tag_routine(tr_class)
    tr_class.create_df()

    #append line numbers to match transcript
    tr_class.line_number = [0] * len(tr_class.new_lines)
    i = 1
    for j in range(0, len(tr_class.new_lines)):
        if tr_class.new_lines[j] != r'\medskip':
            tr_class.line_number[j] = i
            i += 1
    
    write_latex(tr_class, tagged_directory)
    print("Transcript class instance name = ", transcript_name)


Transcript class instance name =  andrew_1
Transcript class instance name =  andrew_2
Transcript class instance name =  janet
Transcript class instance name =  munroe
Transcript class instance name =  transcript_test_file
Transcript class instance name =  trisha
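The numbering loop in the cell above gives every line a running number, except the `\medskip` spacer lines, which keep 0 and do not advance the counter. A self-contained sketch of that logic on made-up data; `number_lines` and `toy_lines` are hypothetical names used only for illustration:

```python
def number_lines(lines, spacer=r'\medskip'):
    """Number lines from 1, leaving spacer lines at 0."""
    numbers = []
    next_number = 1
    for line in lines:
        if line == spacer:
            numbers.append(0)       # spacer: no transcript line number
        else:
            numbers.append(next_number)
            next_number += 1
    return numbers

# Made-up lines, not taken from any transcript
toy_lines = ['A: hello', r'\medskip', 'B: hi', 'A: so', r'\medskip', 'B: yes']
print(number_lines(toy_lines))  # [1, 0, 2, 3, 0, 4]
```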


In [5]:
#check for errors in t.transcript_df[overlap_type]

err_found = False
for tr in tr_list:
    for i in range(0, len(tr.overlap_list)):
        if tr.overlap_list[i]:
            #avoid shadowing the built-in name `type`
            for overlap_type in tr.overlap_list[i]:
                if overlap_type == 'error':
                    err_found = True
                    print('parse error found: ', tr.path, "line no.", i + 2)
if not err_found:
    print("No parse errors found")

No parse errors found


In [6]:
#search for two interruption types occurring on consecutive lines
type1 = 'overlapped'
type2 = 'interruption_unsucc'

print('Examples where', type1, 'occurs immediately before', type2, ':')
for tr in tr_list:
    for i in range(0, len(tr.overlap_list) - 2):
        if tr.overlap_list[i] and tr.overlap_list[i][-1] == type1:
            if tr.overlap_list[i + 2] and tr.overlap_list[i + 2][0] == type2:
                print(tr.path, 'line =', tr.line_number[i])


Examples where overlapped occurs immediately before interruption_unsucc :
./txt/andrew_1.txt line = 290
./txt/andrew_2.txt line = 367
./txt/andrew_2.txt line = 382
./txt/janet.txt line = 179
./txt/janet.txt line = 181
./txt/janet.txt line = 188
./txt/janet.txt line = 227
./txt/munroe.txt line = 62
./txt/munroe.txt line = 241
./txt/munroe.txt line = 318
./txt/trisha.txt line = 60
./txt/trisha.txt line = 66
./txt/trisha.txt line = 102
./txt/trisha.txt line = 134
./txt/trisha.txt line = 164


In [7]:
#use this cell to display transcript_df or features_df

pd.options.display.max_rows = 999
pd.options.display.max_colwidth = 80

display(andrew_2.features_df)
#display(andrew_2.transcript_df)

| | Piers | Andrew |
|---|---|---|
| interrupted_succ | 9 | 29 |
| interrupted_unsucc | 4 | 25 |
| overlapped | 11 | 19 |
| interruption_succ | 27 | 4 |
| interruption_unsucc | 22 | 7 |
| overlap | 16 | 9 |
| minimal_response | 25 | 18 |
| word_count | 1571 | 2316 |
| words_% | 40 | 59 |
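As a worked example, the two rates reported for `andrew_2` in the next cell can be reproduced by hand from the Piers column of this table. A minimal sketch; `rate` is a hypothetical helper, and the counts are copied from the table:

```python
def rate(successes, failures):
    """Successes as a percentage of all attempts, rounded to one decimal place."""
    return round(successes / (successes + failures) * 100, 1)

# Piers column of the andrew_2 features table
piers = {
    'interrupted_succ': 9,
    'interrupted_unsucc': 4,
    'interruption_succ': 27,
    'interruption_unsucc': 22,
}

# How often Piers yields when interrupted: 9 / (9 + 4)
print(rate(piers['interrupted_succ'], piers['interrupted_unsucc']))      # 69.2
# How often Piers' own interruptions succeed: 27 / (27 + 22)
print(rate(piers['interruption_succ'], piers['interruption_unsucc']))    # 55.1
```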


In [9]:
for tr in tr_list:
    piers = tr.features['Piers']
    tr.interrupting_behaviour = {
        #share of interruptions against Piers that succeed, i.e. Piers yields the floor
        'piers_yield_rate': np.round(piers['interrupted_succ'] / (piers['interrupted_succ'] + piers['interrupted_unsucc']) * 100, 1),
        #share of Piers' own interruption attempts that succeed
        'piers_interrupt_success': np.round(piers['interruption_succ'] / (piers['interruption_succ'] + piers['interruption_unsucc']) * 100, 1)
    }
    print('For the transcript:', tr.path)
    print('Percentage of times Piers yields when interrupted =\t', tr.interrupting_behaviour['piers_yield_rate'])
    print('Piers\' success rate percentage when interrupting =\t', tr.interrupting_behaviour['piers_interrupt_success'])
    print('')


#mean of the two Andrew excerpts (each excerpt weighted equally)
andrew_agg = {
    'piers_yield_rate': (andrew_1.interrupting_behaviour['piers_yield_rate'] + andrew_2.interrupting_behaviour['piers_yield_rate']) / 2,
    'piers_interrupt_success': (andrew_1.interrupting_behaviour['piers_interrupt_success'] + andrew_2.interrupting_behaviour['piers_interrupt_success']) / 2
}
print('Aggregate of both Andrew excerpts:')
print('Percentage of times Piers yields when interrupted =\t', andrew_agg['piers_yield_rate'])
print('Piers\' success rate percentage when interrupting =\t', andrew_agg['piers_interrupt_success'])


For the transcript: ./txt/andrew_1.txt
Percentage of times Piers yields when interrupted =	 41.7
Piers' success rate percentage when interrupting =	 85.3

For the transcript: ./txt/andrew_2.txt
Percentage of times Piers yields when interrupted =	 69.2
Piers' success rate percentage when interrupting =	 55.1

For the transcript: ./txt/janet.txt
Percentage of times Piers yields when interrupted =	 65.0
Piers' success rate percentage when interrupting =	 20.8

For the transcript: ./txt/munroe.txt
Percentage of times Piers yields when interrupted =	 57.6
Piers' success rate percentage when interrupting =	 60.0

For the transcript: ./txt/trisha.txt
Percentage of times Piers yields when interrupted =	 33.3
Piers' success rate percentage when interrupting =	 60.9

Aggregate of both Andrew excerpts:
Percentage of times Piers yields when interrupted =	 55.45
Piers' success rate percentage when interrupting =	 70.2
