# A Notebook for Debugging Issues with James

In [1]:
# Uncomment and run this line of code if you need to install the package
# !pip install team_comm_tools

Next, we confirm that the package was installed successfully.

In [2]:
!pip list | grep team_comm_tools

team_comm_tools         0.1.4.post2


Now let's import the package along with other packages we will need to run the demo.

In [3]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# import seaborn as sns
import textwrap
from team_comm_tools import FeatureBuilder

[nltk_data] Downloading package wordnet to /Users/xehu/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [4]:
# Load the utterance-level transcript data (one row per transcribed utterance)
uber_transcript = pd.read_csv('all_utterances.csv')

In [5]:
uber_transcript.head(10)

Unnamed: 0,raw_start,raw_end,confidence,transcript,filepath_start,filepath_end,timestamp_start,timestamp_end,file,gameId,position,deliberationId,sampleId
0,8.08,9.3,0.994489,hey can you hear me,8.568,9.788,8.562,9.782,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
1,8.559999,9.62,0.708329,yes i can,10.291999,11.352,10.277999,11.338,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,0,4c923fdd-fd15-4780-9f16-5a60e040def6,667c2bfe-8401-4272-9ea6-44f0aef7c38c
2,11.44,11.94,0.995984,okay,11.928,12.428,11.922,12.422,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
3,13.655,15.514999,0.984508,so what are we talking about the,14.143,16.002999,14.137,15.996999,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
4,14.945001,15.445001,0.650629,yeah,16.677001,17.177001,16.663001,17.163001,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,0,4c923fdd-fd15-4780-9f16-5a60e040def6,667c2bfe-8401-4272-9ea6-44f0aef7c38c
5,16.775,18.075,0.99244,us government funded,17.263,18.563,17.257,18.557,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
6,18.535,19.835,0.977487,needle exchange programs,19.023,20.323,19.017,20.317,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
7,19.585,20.085,0.691941,yep,21.317,21.817,21.303,21.803,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,0,4c923fdd-fd15-4780-9f16-5a60e040def6,667c2bfe-8401-4272-9ea6-44f0aef7c38c
8,22.855,24.395,0.86814,or are you a pro,23.343,24.883,23.337,24.877,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
9,24.775,25.675,0.99936,needle exchange,25.263,26.163,25.257,26.157,recordings/20241001_1844_ctt_w1V35R4E/17278117...,01J94SKS2GJ71CT03G7XV35R4E,1,a125d806-2bcf-496a-a9ae-f727a2803d2f,0be567b7-cda5-42eb-9098-c0187b0225c3
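
Before handing the data to the FeatureBuilder, it can help to rule out simple input problems. The sketch below is a hypothetical pre-flight check and is not part of `team_comm_tools`: the helper name `check_input` and the toy data are assumptions for illustration, while the column names come from the preview above. It only verifies that the expected columns exist, that no message is empty, and that no utterance ends before it starts.

```python
import pandas as pd

def check_input(df, conversation_col, speaker_col, message_col, start_col, end_col):
    """Hypothetical helper: return a list of human-readable problems in the input."""
    required = [conversation_col, speaker_col, message_col, start_col, end_col]
    missing = [c for c in required if c not in df.columns]
    if missing:
        # Without these columns the remaining checks cannot run.
        return [f"missing columns: {missing}"]

    problems = []
    n_empty = int(df[message_col].isna().sum())
    if n_empty:
        problems.append(f"{n_empty} rows have an empty message")
    n_bad = int((df[end_col] < df[start_col]).sum())
    if n_bad:
        problems.append(f"{n_bad} rows end before they start")
    return problems

# Toy data shaped like the first rows of the preview (IDs shortened for readability).
toy = pd.DataFrame({
    "gameId": ["g1", "g1", "g1"],
    "sampleId": ["A", "B", "A"],
    "transcript": ["hey can you hear me", "yes i can", "okay"],
    "filepath_start": [8.568, 10.291999, 11.928],
    "filepath_end": [9.788, 11.352, 12.428],
})
problems = check_input(
    toy, "gameId", "sampleId", "transcript", "filepath_start", "filepath_end"
)
print(problems)
```

On the real data, the same call would take `uber_transcript` in place of `toy`; an empty list means none of these three problems were found.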


# Run the FeatureBuilder

In [6]:
transcript_featurizer = FeatureBuilder(
    input_df = uber_transcript,
    conversation_id_col = "gameId",
    speaker_id_col = "sampleId",
    message_col = "transcript",
    timestamp_col = ("filepath_start", "filepath_end"),  # (start, end) columns for each utterance
    turns = True,  # if True, combine successive messages by the same speaker into a single turn
    # These features depend on sentence vectors, so they take longer to generate on larger datasets.
    # List them here if you want them included in your output.
    custom_features = [
        "(BERT) Mimicry",
        "Moving Mimicry",
        "Forward Flow",
        "Discursive Diversity"
    ],
    )
# this line of code runs the FeatureBuilder on your data
transcript_featurizer.featurize()

Initializing Featurization...
Confirmed that data has conversation_id: gameId, speaker_id: sampleId and message: transcript columns!
Generating SBERT sentence vectors...


100%|██████████| 9460/9460 [00:00<00:00, 85446.89it/s]


Generating RoBERTa sentiments...


100%|██████████| 148/148 [02:16<00:00,  1.08it/s]


Chat Level Features ...


100%|██████████| 17/17 [04:06<00:00, 14.49s/it]


Generating features for the first 100.0% of messages...
Generating User Level Features ...
Generating Conversation Level Features ...
All Done!
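
When debugging, it is useful to keep in mind what `turns = True` does to the input: successive messages by the same speaker are combined. The following is a minimal pandas sketch of that idea, not the library's actual implementation; the helper name `collapse_turns` is hypothetical, and joining messages with a single space is an assumption.

```python
import pandas as pd

def collapse_turns(df, conversation_col, speaker_col, message_col):
    """Hypothetical helper: merge consecutive rows by the same speaker."""
    df = df.reset_index(drop=True)
    # A new turn starts whenever the speaker or the conversation differs
    # from the previous row (the first row always starts a turn).
    new_turn = (
        (df[speaker_col] != df[speaker_col].shift())
        | (df[conversation_col] != df[conversation_col].shift())
    )
    turn_id = new_turn.cumsum()
    return (
        df.groupby(turn_id)
        .agg({conversation_col: "first", speaker_col: "first", message_col: " ".join})
        .reset_index(drop=True)
    )

# Toy data modeled on the first five utterances (IDs shortened for readability).
toy = pd.DataFrame({
    "gameId": ["g1"] * 5,
    "sampleId": ["A", "B", "A", "A", "B"],
    "transcript": [
        "hey can you hear me",
        "yes i can",
        "okay",
        "so what are we talking about the",
        "yeah",
    ],
})
turns = collapse_turns(toy, "gameId", "sampleId", "transcript")
print(turns)
```

In this toy example the two consecutive utterances by speaker `A` become one row, so five utterances collapse into four turns. A drop in row count of this kind is expected when comparing the input to turn-level output.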
