# Comparative Analysis of Syntax Patterns in GPT-Generated vs. Crowd-Sourced Datasets

This Jupyter notebook analyzes and compares the unique syntax patterns found in six datasets: three generated by GPT (GPT-P-12, GPT-T-12, and GPT bootstrap) and three crowdsourced datasets (Crowd-P, Crowd-T, and Crowd bootstrap) from [Jorge et al.](https://link.springer.com/chapter/10.1007/978-3-031-07472-1_15). The goal is to count the syntax patterns that appear in each GPT dataset but are missing from the corresponding crowd dataset, and vice versa. This analysis helps identify syntactic structures that distinguish AI-generated text from human-written text.

In [None]:
import os
import pathlib
import json

# let's just make sure we are at the root
os.chdir(pathlib.Path().absolute().parent)
current_directory = os.getcwd()
new_directory = f"{current_directory}/Jorge_paper_replication"
os.chdir(new_directory)
print(f"Current working directory: {os.getcwd()}")

In [67]:
# Load the Stanza English pipeline (tokenizer, POS tagger, constituency parser)
import stanza
nlp = stanza.Pipeline(lang='en', processors='tokenize,pos,constituency', tokenize_no_ssplit=True)

2024-09-17 08:07:58 INFO: Checking for updates to resources.json in case models have been updated.  Note: this behavior can be turned off with download_method=None or download_method=DownloadMethod.REUSE_RESOURCES


2024-09-17 08:07:59 INFO: Loading these models for language: en (English):
| Processor    | Package  |
---------------------------
| tokenize     | combined |
| pos          | combined |
| constituency | wsj      |

2024-09-17 08:07:59 INFO: Use device: cpu
2024-09-17 08:07:59 INFO: Loading: tokenize
2024-09-17 08:07:59 INFO: Loading: pos
2024-09-17 08:08:00 INFO: Loading: constituency
2024-09-17 08:08:00 INFO: Done loading processors!


In [3]:
import random
import requests
from collections import Counter
from itertools import cycle
import time
import math
import warnings
import pandas as pd
import numpy as np
import stanza
import seaborn as sns
from tqdm import tqdm
from transformers import logging
from bert_score import score
import matplotlib.pyplot as plt
from termcolor import colored

from lib import metrics
from lib import utility as utlty
from lib import prompts_utility
from lib import gpt_utility

logging.set_verbosity_error()  # suppress transformers warning messages
warnings.simplefilter(action='ignore', category=FutureWarning)  # suppress pandas FutureWarning

# Read GPT-P-12 dataset
---
Our experiment counts the unique syntax patterns that are present in GPT-T/P-12 but absent from Crowd-T/P, and vice versa.

In [73]:
input_file = f"{os.getcwd()}/CompareSyntaxTemplate/pattern_by_example_12-raw.csv"  # test data
df = pd.read_csv(input_file)
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns}")
df.head()

Shape: (102, 56)
Columns: Index(['input_utterance', 'intent', 'parameters', 'utterance_template',
       'source', 'seed_id', 'selected_templates',
       'selected_templates_related_paraphrases', 'p1', 'p1_template',
       'p1_ted_to_selected_templates', 'p1_is_similar', 'p2', 'p2_template',
       'p2_ted_to_selected_templates', 'p2_is_similar', 'p3', 'p3_template',
       'p3_ted_to_selected_templates', 'p3_is_similar', 'p4', 'p4_template',
       'p4_ted_to_selected_templates', 'p4_is_similar', 'p5', 'p5_template',
       'p5_ted_to_selected_templates', 'p5_is_similar', 'p6', 'p6_template',
       'p6_ted_to_selected_templates', 'p6_is_similar', 'p7', 'p7_template',
       'p7_ted_to_selected_templates', 'p7_is_similar', 'p8', 'p8_template',
       'p8_ted_to_selected_templates', 'p8_is_similar', 'p9', 'p9_template',
       'p9_ted_to_selected_templates', 'p9_is_similar', 'p10', 'p10_template',
       'p10_ted_to_selected_templates', 'p10_is_similar', 'p11',
       'p11_template',

Unnamed: 0,input_utterance,intent,parameters,utterance_template,source,seed_id,selected_templates,selected_templates_related_paraphrases,p1,p1_template,...,p10_ted_to_selected_templates,p10_is_similar,p11,p11_template,p11_ted_to_selected_templates,p11_is_similar,p12,p12_template,p12_ted_to_selected_templates,p12_is_similar
0,Terminate i-a541 now,EndEC2Instance,"[['VM', 'i-a541']]",( S ( VP ( VB ) ( NP ) ( ADVP ) ) ),ParaQuality,1,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...","['define dfw', 'Can you find me the most recen...",Can you terminate i-a541 immediately?,( SQ ( MD ) ( NP ( PRP ) ) ( VP ( VB ) ( NP ) ...,...,"{'( S ( VP ( VB ) ( NP ) ) )': 4.0, '( SQ ( MD...",False,Forbid any further activity from program insta...,( S ( VP ( VB ) ( NP ) ) ( '' ) ( . ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 2.0, '( SQ ( MD...",True,Put an end to all actions performed by applica...,( S ( VP ( VB ) ( NP ) ( PP ) ) ( '' ) ( . ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 3.0, '( SQ ( MD...",True
1,Search for cooking videos,SearchWeb,"[['query', 'cooking videos']]",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,2,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...","['Find videos on cooking', 'Can you find me a ...",Look up videos about cooking,( S ( VP ( VB ) ( PRT ) ( NP ) ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 5.0, '( SQ ( MD...",False,Do you have any recommendations for instructio...,( SQ ( VBP ) ( NP ( PRP ) ) ( VP ( VB ) ( NP )...,"{'( S ( VP ( VB ) ( NP ) ) )': 5.0, '( SQ ( MD...",True,Can I get recommendations or search results fo...,( SQ ( MD ) ( NP ( PRP ) ) ( VP ( VB ) ( NP ) ...,"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",True
2,Find most popular photos tagged #LOVE,SearchWeb,"[['Tag', '#LOVE']]",( S ( S ( VP ) ) ( VP ( VBP ) ) ),ParaQuality,3,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...",['list the airlines that fly from boston to at...,Discover the top-rated images with the #LOVE tag,( S ( VP ( VB ) ( NP ) ( PP ) ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 1.0, '( SQ ( MD...",True,What are some of today's most fashionable pict...,( SBARQ ( WHNP ( WP ) ) ( SQ ( VP ) ) ( . ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",False,Do you know where I can see currently trendy v...,( SQ ( VBP ) ( NP ( PRP ) ) ( VP ( VB ) ( SBAR...,"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",True
3,Search for a few nice photos matching Opera Ho...,SearchWeb,"[['size', '1024px * 768px'], ['query', 'Opera ...",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,4,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...",['find showtimes for the movie The Caretaker a...,Look up some nice Opera House pictures with di...,( S ( VP ( VB ) ( PRT ) ( NP ) ) ( . ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 4.0, '( SQ ( MD...",False,Find multiple aesthetically pleasing pics ill...,( S ( S ( VP ) ) ( NP ( DT ) ( NNS ) ) ( VP ( ...,"{'( S ( VP ( VB ) ( NP ) ) )': 7.0, '( SQ ( MD...",False,Help me collect handfulin count vibrant actuat...,"( S ( S ( VP ) ) ( , ) ( S ( VP ) ) ( CC ) ( V...","{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",False
4,Are the burglar alarms in the office malfuncti...,CheckDevice,"[['location', 'office']]",( SQ ( VBP ) ( NP ( DT ) ( NN ) ( NNS ) ) ( PP...,ParaQuality,5,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...","[""Drop a message to Phil indicating that I've ...",Are the alarms in the office for burglars not ...,( SQ ( VBP ) ( NP ( NP ) ( PP ) ( PP ) ) ( RB ...,...,"{'( S ( VP ( VB ) ( NP ) ) )': 11.0, '( SQ ( M...",False,Are the burglar alarms at work malfunctioning ...,( SQ ( VBP ) ( NP ( DT ) ( NN ) ( NNS ) ) ( PP...,"{'( S ( VP ( VB ) ( NP ) ) )': 16.0, '( SQ ( M...",False,Do you think there's something wrong with the ...,( SQ ( VBP ) ( NP ( PRP ) ) ( VP ( VB ) ( SBAR...,"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",True


#### Remove some columns

In [74]:
# List of columns to remove
columns_to_remove = ['parameters', 'utterance_template',
       'source', 'seed_id', 'selected_templates',
       'selected_templates_related_paraphrases',
       'p1_ted_to_selected_templates', 'p1_is_similar',
       'p2_ted_to_selected_templates', 'p2_is_similar',
       'p3_ted_to_selected_templates', 'p3_is_similar',
       'p4_ted_to_selected_templates', 'p4_is_similar',
       'p5_ted_to_selected_templates', 'p5_is_similar',
       'p6_ted_to_selected_templates', 'p6_is_similar',
       'p7_ted_to_selected_templates', 'p7_is_similar',
       'p8_ted_to_selected_templates', 'p8_is_similar',
       'p9_ted_to_selected_templates', 'p9_is_similar',
       'p10_ted_to_selected_templates', 'p10_is_similar',
       'p11_ted_to_selected_templates', 'p11_is_similar',
       'p12_ted_to_selected_templates', 'p12_is_similar']

# Drop the columns from the DataFrame
df = df.drop(columns=columns_to_remove)

# Display the updated DataFrame
print(f"Columns: {df.columns}")
print(df)


Columns: Index(['input_utterance', 'intent', 'p1', 'p1_template', 'p2', 'p2_template',
       'p3', 'p3_template', 'p4', 'p4_template', 'p5', 'p5_template', 'p6',
       'p6_template', 'p7', 'p7_template', 'p8', 'p8_template', 'p9',
       'p9_template', 'p10', 'p10_template', 'p11', 'p11_template', 'p12',
       'p12_template'],
      dtype='object')
                                       input_utterance              intent  \
0                                 Terminate i-a541 now      EndEC2Instance   
1                            Search for cooking videos           SearchWeb   
2                Find most popular photos tagged #LOVE           SearchWeb   
3    Search for a few nice photos matching Opera Ho...           SearchWeb   
4    Are the burglar alarms in the office malfuncti...         CheckDevice   
..                                                 ...                 ...   
97      show me the airlines between boston and denver             airline   
98                    

#### Extract syntax templates from Paraphrases

In [75]:
# List of 'P' and 'Template' columns
p_columns = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12']
template_columns = ['p1_template', 'p2_template', 'p3_template', 'p4_template', 'p5_template',
                    'p6_template', 'p7_template', 'p8_template', 'p9_template', 'p10_template',
                    'p11_template', 'p12_template']

# Extract values from the original DataFrame
p_values = df[p_columns].values.flatten()
template_values = df[template_columns].values.flatten()

# Repeat 'input_utterance' and 'intent' values for each pair of P and Template
input_utterance_values = df['input_utterance'].repeat(len(p_columns)).values
intent_values = df['intent'].repeat(len(p_columns)).values

# Create a new DataFrame with columns 'Input Utterance', 'Intent', 'P', and 'Template'
df_GPT_P_12 = pd.DataFrame({
    'Input Utterance': input_utterance_values,
    'Intent': intent_values,
    'P': p_values,
    'Template': template_values
})

# Display the new DataFrame
df_GPT_P_12.head()


Unnamed: 0,Input Utterance,Intent,P,Template
0,Terminate i-a541 now,EndEC2Instance,Can you terminate i-a541 immediately?,( SQ ( MD ) ( NP ( PRP ) ) ( VP ( VB ) ( NP ) ...
1,Terminate i-a541 now,EndEC2Instance,End i-a541 right away.,( S ( VP ( NN ) ( NP ) ( ADVP ) ) ( . ) )
2,Terminate i-a541 now,EndEC2Instance,Cease the process of i-a541 now.,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ( . ) )
3,Terminate i-a541 now,EndEC2Instance,Stop the execution of i-a541 at this moment.,( S ( VP ( VB ) ( NP ) ( PP ) ) ( . ) )
4,Terminate i-a541 now,EndEC2Instance,Halt i-a541 at once.,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ( . ) )
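
The wide-to-long reshaping above depends on two operations lining up: `values.flatten()` reads the paraphrase columns row by row, and `repeat(len(p_columns))` repeats each seed utterance once per paraphrase. The following is a minimal sketch of that alignment on a made-up two-row, two-paraphrase table. The data and the names `wide`, `long_df`, `p_cols`, and `t_cols` are illustrative only and are not part of the notebook.

```python
import pandas as pd

# Hypothetical toy data with the same layout as the GPT CSV files,
# reduced to two seed utterances and two paraphrases each.
wide = pd.DataFrame({
    "input_utterance": ["u1", "u2"],
    "intent": ["i1", "i2"],
    "p1": ["u1-a", "u2-a"], "p1_template": ["t1a", "t2a"],
    "p2": ["u1-b", "u2-b"], "p2_template": ["t1b", "t2b"],
})

p_cols = ["p1", "p2"]
t_cols = [f"{c}_template" for c in p_cols]

long_df = pd.DataFrame({
    # Each seed value is repeated once per paraphrase column.
    "Input Utterance": wide["input_utterance"].repeat(len(p_cols)).values,
    "Intent": wide["intent"].repeat(len(p_cols)).values,
    # Row-major flattening yields p1, p2 of row 0, then p1, p2 of row 1.
    "P": wide[p_cols].values.flatten(),
    "Template": wide[t_cols].values.flatten(),
})

print(long_df)
```

Each paraphrase ends up on the same row as its own seed utterance and template, which is the property the later pattern comparison relies on.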


# Read GPT-T-12 dataset

In [76]:
input_file = f"{os.getcwd()}/CompareSyntaxTemplate/taboo_patterns_12-raw.csv"  # test data
df = pd.read_csv(input_file)
print(f"Shape: {df.shape}")
print(f"Columns: {df.columns}")
df.head()

Shape: (102, 56)
Columns: Index(['input_utterance', 'intent', 'parameters', 'utterance_template',
       'source', 'seed_id', 'selected_templates',
       'selected_templates_related_paraphrases', 'p1', 'p1_template',
       'p1_ted_to_selected_templates', 'p1_is_similar', 'p2', 'p2_template',
       'p2_ted_to_selected_templates', 'p2_is_similar', 'p3', 'p3_template',
       'p3_ted_to_selected_templates', 'p3_is_similar', 'p4', 'p4_template',
       'p4_ted_to_selected_templates', 'p4_is_similar', 'p5', 'p5_template',
       'p5_ted_to_selected_templates', 'p5_is_similar', 'p6', 'p6_template',
       'p6_ted_to_selected_templates', 'p6_is_similar', 'p7', 'p7_template',
       'p7_ted_to_selected_templates', 'p7_is_similar', 'p8', 'p8_template',
       'p8_ted_to_selected_templates', 'p8_is_similar', 'p9', 'p9_template',
       'p9_ted_to_selected_templates', 'p9_is_similar', 'p10', 'p10_template',
       'p10_ted_to_selected_templates', 'p10_is_similar', 'p11',
       'p11_template',

Unnamed: 0,input_utterance,intent,parameters,utterance_template,source,seed_id,selected_templates,selected_templates_related_paraphrases,p1,p1_template,...,p10_ted_to_selected_templates,p10_is_similar,p11,p11_template,p11_ted_to_selected_templates,p11_is_similar,p12,p12_template,p12_ted_to_selected_templates,p12_is_similar
0,Terminate i-a541 now,EndEC2Instance,"[['VM', 'i-a541']]",( S ( VP ( VB ) ( NP ) ( ADVP ) ) ),ParaQuality,1,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...","['Find the trending photos tagged as #LOVE', '...",Halt the functioning of i-a541 immediately.,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ( . ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 1.0, '( SQ ( MD...",True,Cease running task with ID number a-1541 insta...,( NP ( NP ( VB ) ( VBG ) ( NN ) ) ( PP ( IN ) ...,"{'( S ( VP ( VB ) ( NP ) ) )': 9.0, '( SQ ( MD...",False,Stop running instance a-1541 right now,( S ( VP ( VB ) ( S ) ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 1.0, '( SQ ( MD...",True
1,Search for cooking videos,SearchWeb,"[['query', 'cooking videos']]",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,2,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...",['locate Fox Theatres featuring The Caretaker'...,Find videos on how to cook,( S ( VP ( VB ) ( NP ) ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 0.0, '( SQ ( MD...",True,Look up demonstrations of culinary techniques,( S ( VP ( VB ) ( PRT ) ( NP ) ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 1.0, '( SQ ( MD...",True,Search for videos that teach how to cook,( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 4.0, '( SQ ( MD...",False
2,Find most popular photos tagged #LOVE,SearchWeb,"[['Tag', '#LOVE']]",( S ( S ( VP ) ) ( VP ( VBP ) ) ),ParaQuality,3,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...","['Find videos about cooking', 'Can you show me...",Discover the most popular images with the #LOV...,( S ( VP ( VB ) ( NP ) ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 0.0, '( SQ ( MD...",True,I would like to see famous photographs that ar...,( S ( NP ( PRP ) ) ( VP ( MD ) ( VP ) ) ( . ) ),"{'( S ( VP ( VB ) ( NP ) ) )': 5.0, '( SQ ( MD...",False,Can you find me some well-known pictures label...,( SQ ( MD ) ( NP ( PRP ) ) ( VP ( VB ) ( NP ) ...,"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",True
3,Search for a few nice photos matching Opera Ho...,SearchWeb,"[['size', '1024px * 768px'], ['query', 'Opera ...",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,4,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...",['list all the airlines that have flights from...,Find a couple of beautiful pictures that match...,( S ( VP ( VB ) ( NP ) ) ( . ) ),...,"{'( S ( VP ( VB ) ( NP ) ) )': 7.0, '( SQ ( MD...",False,I wonder it would be possible if you search mu...,( S ( NP ( PRP ) ) ( VP ( VBP ) ( SBAR ) ) ( ....,"{'( S ( VP ( VB ) ( NP ) ) )': 5.0, '( SQ ( MD...",False,Please try to locate various colorful photogra...,( S ( S ( VP ) ) ( . ) ( NP ( DT ) ( NNS ) ) (...,"{'( S ( VP ( VB ) ( NP ) ) )': 8.0, '( SQ ( MD...",False
4,Are the burglar alarms in the office malfuncti...,CheckDevice,"[['location', 'office']]",( SQ ( VBP ) ( NP ( DT ) ( NN ) ( NNS ) ) ( PP...,ParaQuality,5,"['( S ( VP ( VB ) ( NP ) ) )', '( SQ ( MD ) ( ...",['display the airlines that operate flights be...,Do the burglar alarms in the office have any i...,( SQ ( VBP ) ( NP ( DT ) ( NN ) ) ( VP ( NNS )...,...,"{'( S ( VP ( VB ) ( NP ) ) )': 6.0, '( SQ ( MD...",True,Do we need someone from maintenance staff or t...,( SQ ( VBP ) ( NP ( PRP ) ) ( VP ( VB ) ( NP )...,"{'( S ( VP ( VB ) ( NP ) ) )': 8.0, '( SQ ( MD...",False,Any chance those annoying false-positives I've...,"( S ( NP ( DT ) ( NN ) ( S ) ) ( , ) ( SQ ( MD...","{'( S ( VP ( VB ) ( NP ) ) )': 8.0, '( SQ ( MD...",False


In [77]:
# List of 'P' and 'Template' columns
p_columns = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12']
template_columns = ['p1_template', 'p2_template', 'p3_template', 'p4_template', 'p5_template',
                    'p6_template', 'p7_template', 'p8_template', 'p9_template', 'p10_template',
                    'p11_template', 'p12_template']

# Extract values from the original DataFrame
p_values = df[p_columns].values.flatten()
template_values = df[template_columns].values.flatten()

# Repeat 'input_utterance' and 'intent' values for each pair of P and Template
input_utterance_values = df['input_utterance'].repeat(len(p_columns)).values
intent_values = df['intent'].repeat(len(p_columns)).values

# Create a new DataFrame with columns 'Input Utterance', 'Intent', 'P', and 'Template'
df_GPT_T_12 = pd.DataFrame({
    'Input Utterance': input_utterance_values,
    'Intent': intent_values,
    'P': p_values,
    'Template': template_values
})

# Display the new DataFrame
df_GPT_T_12.head()


Unnamed: 0,Input Utterance,Intent,P,Template
0,Terminate i-a541 now,EndEC2Instance,Halt the functioning of i-a541 immediately.,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ( . ) )
1,Terminate i-a541 now,EndEC2Instance,Abort the execution of i-a541 instantly.,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ( . ) )
2,Terminate i-a541 now,EndEC2Instance,Suspend the execution of script named a-1541 w...,( S ( VP ( VB ) ( NP ) ) )
3,Terminate i-a541 now,EndEC2Instance,Finish executing program i-a541 now,( NP ( NN ) ( VP ( VBG ) ( NP ) ( NP ) ( ADVP ...
4,Terminate i-a541 now,EndEC2Instance,Terminate the process named i-a541 right away,( S ( VP ( VB ) ( NP ) ) )


## Read GPT Bootstrap dataset

In [78]:
# input_file = f"{os.getcwd()}/CompareSyntaxTemplate/Jorge-bootstrap-with-bertscores.csv"
input_file = f"{os.getcwd()}/CompareSyntaxTemplate/bootstrap-clean.csv"

df_GPT_bootstrap = pd.read_csv(input_file)
print(f"Shape: {df_GPT_bootstrap.shape}")
print(f"Columns: {df_GPT_bootstrap.columns}")
df_GPT_bootstrap.head()

Shape: (1188, 10)
Columns: Index(['input_utterance', 'intent', 'parameters', 'utterance_template',
       'source', 'seed_id', 'paraphrase_value', 'paraphrase_template',
       'bertscore', 'ted'],
      dtype='object')


Unnamed: 0,input_utterance,intent,parameters,utterance_template,source,seed_id,paraphrase_value,paraphrase_template,bertscore,ted
0,Terminate i-a541 now,EndEC2Instance,"[['VM', 'i-a541']]",( S ( VP ( VB ) ( NP ) ( ADVP ) ) ),ParaQuality,1,End i-a541 immediately,( NP ( NP ( NN ) ( NN ) ) ( ADVP ( RB ) ) ),0.947383,5.0
1,Search for cooking videos,SearchWeb,"[['query', 'cooking videos']]",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,2,Find videos on how to cook,( S ( VP ( VB ) ( NP ) ) ),0.917345,4.0
2,Find most popular photos tagged #LOVE,SearchWeb,"[['Tag', '#LOVE']]",( S ( S ( VP ) ) ( VP ( VBP ) ) ),ParaQuality,3,Discover the trending pictures labeled #LOVE,( S ( S ( VP ) ) ( VP ( VBP ) ) ),0.954254,0.0
3,Search for a few nice photos matching Opera Ho...,SearchWeb,"[['size', '1024px * 768px'], ['query', 'Opera ...",( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),ParaQuality,4,Find some beautiful pictures that match Opera ...,( S ( VP ( VB ) ( NP ) ) ( . ) ),0.928975,5.0
4,Are the burglar alarms in the office malfuncti...,CheckDevice,"[['location', 'office']]",( SQ ( VBP ) ( NP ( DT ) ( NN ) ( NNS ) ) ( PP...,ParaQuality,5,Do the burglar alarms in the office have a mal...,( SQ ( VBP ) ( NP ( DT ) ( NN ) ) ( VP ( NNS )...,0.969968,6.0


In [79]:
# List of columns to remove
columns_to_remove = ['parameters', 'source', 'seed_id', 'ted']


# Drop the columns from the DataFrame
df_GPT_bootstrap = df_GPT_bootstrap.drop(columns=columns_to_remove)

# Display the updated DataFrame
print(f"Columns: {df_GPT_bootstrap.columns}")
df_GPT_bootstrap.head()


Columns: Index(['input_utterance', 'intent', 'utterance_template', 'paraphrase_value',
       'paraphrase_template', 'bertscore'],
      dtype='object')


Unnamed: 0,input_utterance,intent,utterance_template,paraphrase_value,paraphrase_template,bertscore
0,Terminate i-a541 now,EndEC2Instance,( S ( VP ( VB ) ( NP ) ( ADVP ) ) ),End i-a541 immediately,( NP ( NP ( NN ) ( NN ) ) ( ADVP ( RB ) ) ),0.947383
1,Search for cooking videos,SearchWeb,( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),Find videos on how to cook,( S ( VP ( VB ) ( NP ) ) ),0.917345
2,Find most popular photos tagged #LOVE,SearchWeb,( S ( S ( VP ) ) ( VP ( VBP ) ) ),Discover the trending pictures labeled #LOVE,( S ( S ( VP ) ) ( VP ( VBP ) ) ),0.954254
3,Search for a few nice photos matching Opera Ho...,SearchWeb,( NP ( NP ( VB ) ) ( PP ( IN ) ( NP ) ) ),Find some beautiful pictures that match Opera ...,( S ( VP ( VB ) ( NP ) ) ( . ) ),0.928975
4,Are the burglar alarms in the office malfuncti...,CheckDevice,( SQ ( VBP ) ( NP ( DT ) ( NN ) ( NNS ) ) ( PP...,Do the burglar alarms in the office have a mal...,( SQ ( VBP ) ( NP ( DT ) ( NN ) ) ( VP ( NNS )...,0.969968


In [80]:
df_GPT_bootstrap.shape

(1188, 6)

## Read Crowd Bootstrap dataset

In [81]:
input_file = f"{os.getcwd()}/CompareSyntaxTemplate/Jorge_bootstrap_correct_only.csv"  # test data
df_crowd_bootstrap = pd.read_csv(input_file)
print(f"Shape: {df_crowd_bootstrap.shape}")
print(f"Columns: {df_crowd_bootstrap.columns}")
df_crowd_bootstrap.head()

Shape: (790, 13)
Columns: Index(['seed_id', 'intent', 'input_utterance', 'paraphrase_value',
       'parameters', 'is_correct', 'source', 'bert_score',
       'utterance_template', 'paraphrase_template', 'ted', 'duplicate',
       'semantics'],
      dtype='object')


Unnamed: 0,seed_id,intent,input_utterance,paraphrase_value,parameters,is_correct,source,bert_score,utterance_template,paraphrase_template,ted,duplicate,semantics
0,3,SearchWeb,Find most popular photos tagged #LOVE,Find photos tagged #LOVE,"[['Tag', '#LOVE']]",0,ParaQuality,0.553468,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) ),0.0,False,True
1,3,SearchWeb,Find most popular photos tagged #LOVE,Find popular photos tagged #LOVE,"[['Tag', '#LOVE']]",1,ParaQuality,0.642269,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) ),0.0,False,True
2,3,SearchWeb,Find most popular photos tagged #LOVE,Discover most popular photos tagged #LOVE,"[['Tag', '#LOVE']]",1,ParaQuality,0.689777,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) ),0.0,False,True
3,3,SearchWeb,Find most popular photos tagged #LOVE,Find for the most popular photos tagged #LOVE,"[['Tag', '#LOVE']]",0,ParaQuality,0.699084,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( PP ) ) ),1.0,False,True
4,3,SearchWeb,Find most popular photos tagged #LOVE,Find for most popular photos tagged with #LOVE,"[['Tag', '#LOVE']]",0,ParaQuality,0.660166,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( PP ) ) ),1.0,False,True


In [82]:
# List of columns to remove
columns_to_remove = ['seed_id', 'parameters', 'is_correct', 'source', 'ted', 'duplicate', 'semantics']


# Drop the columns from the DataFrame
df_crowd_bootstrap = df_crowd_bootstrap.drop(columns=columns_to_remove)

# Display the updated DataFrame
print(f"Columns: {df_crowd_bootstrap.columns}")
df_crowd_bootstrap.head()

Columns: Index(['intent', 'input_utterance', 'paraphrase_value', 'bert_score',
       'utterance_template', 'paraphrase_template'],
      dtype='object')


Unnamed: 0,intent,input_utterance,paraphrase_value,bert_score,utterance_template,paraphrase_template
0,SearchWeb,Find most popular photos tagged #LOVE,Find photos tagged #LOVE,0.553468,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) )
1,SearchWeb,Find most popular photos tagged #LOVE,Find popular photos tagged #LOVE,0.642269,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) )
2,SearchWeb,Find most popular photos tagged #LOVE,Discover most popular photos tagged #LOVE,0.689777,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( NP ) ) )
3,SearchWeb,Find most popular photos tagged #LOVE,Find for the most popular photos tagged #LOVE,0.699084,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( PP ) ) )
4,SearchWeb,Find most popular photos tagged #LOVE,Find for most popular photos tagged with #LOVE,0.660166,( S ( VP ( VB ) ( NP ) ) ),( S ( VP ( VB ) ( PP ) ) )


In [83]:
# # Filter rows where 'bertscore' is between 0.5 and 0.98
# df_filtered = df[(df['bertscore'] >= 0.5) & (df['bertscore'] <= 0.98)]

df_crowd_bootstrap.shape

(790, 6)

# Extract Crowd paraphrases per configuration

For each of the following configurations, extract the paraphrases into a DataFrame: bootstrap, taboo_patterns, patterns_by_example.

In [59]:
CONDITIONS = ["baseline", "baseline-cw", "baseline-outliers", "patterns-examples", "taboo-patterns"]

input_file = f"{os.getcwd()}/CompareSyntaxTemplate/Jorge-main-all-with-bertscores.csv"  # test data
df_crowd = pd.read_csv(input_file)

print(f"df_crowd.shape: {df_crowd.shape}")
print(len(df_crowd.columns))
print(df_crowd.columns)
# print(df_crowd['INPUT:pool_id'].unique())
df_crowd.head(3)

df_crowd.shape: (6120, 58)
58
Index(['coding_1', 'select', 'rand', 'select_overlap', 'coding_2',
       'Calibration', 'INPUT:row_pk', 'INPUT:intent',
       'input utterance [coding]', 'INPUT:input_utterance_bootstrap',
       'INPUT:input_utterance', 'INPUT:parameters', 'paraphrase_value',
       'is_correct', 'tag', 'paraphrase_key', 'INPUT:source', 'INPUT:pool_id',
       'INPUT:input_pattern', 'paraphrase_pattern', 'INPUT:rand_cw',
       'INPUT:is_correct', 'INPUT:masked_ngrams', 'INPUT:rand_baseline',
       'INPUT:prompt_context', 'INPUT:target_patterns',
       'INPUT:distance_to_mean', 'INPUT:distance_to_seed',
       'INPUT:input_utterance_words', 'INPUT:input_utterance_lemmatized_words',
       'OUTPUT:trace', 'OUTPUT:worker_uuid', 'OUTPUT:screen_width',
       'OUTPUT:screen_height', 'OUTPUT:page_started_at',
       'OUTPUT:provided_ngrams', 'OUTPUT:page_started_at_string',
       'ASSIGNMENT:link', 'ASSIGNMENT:task_id', 'ASSIGNMENT:assignment_id',
       'ASSIGNMENT:task_

Unnamed: 0,coding_1,select,rand,select_overlap,coding_2,Calibration,INPUT:row_pk,INPUT:intent,input utterance [coding],INPUT:input_utterance_bootstrap,...,ASSIGNMENT:expired,ASSIGNMENT:reward,judgment_time_3p,judgment_time_1p,Jorge,Jorge-tag,Marcos,Marcos-tag,J-M Agreement,bert_score
0,Jorge,False,0.623851,False,,False,1,EndEC2Instance,"""Terminate i-a541 now""","""Terminate i-a541 now""",...,,0.15,97.699,32.566333,1.0,,,,,0.436592
1,Jorge,False,0.662712,False,,False,1,EndEC2Instance,"""Terminate i-a541 now""","""Terminate i-a541 now""",...,,0.15,97.699,32.566333,1.0,,,,,0.52665
2,Jorge,False,0.118898,False,,False,1,EndEC2Instance,"""Terminate i-a541 now""","""Terminate i-a541 now""",...,,0.15,97.699,32.566333,0.0,,,,,0.485834


In [60]:
# List of columns to remove
columns_to_remove = ['coding_1', 'select', 'rand', 'select_overlap', 'coding_2',
       'Calibration', 'INPUT:input_utterance_bootstrap',
       'INPUT:input_utterance', 'INPUT:parameters', 'is_correct', 'tag', 'paraphrase_key', 'INPUT:source',
       'INPUT:input_pattern', 'paraphrase_pattern', 'INPUT:rand_cw',
       'INPUT:is_correct', 'INPUT:masked_ngrams', 'INPUT:rand_baseline',
       'INPUT:prompt_context', 'INPUT:target_patterns',
       'INPUT:distance_to_mean', 'INPUT:distance_to_seed',
       'INPUT:input_utterance_words', 'INPUT:input_utterance_lemmatized_words',
       'OUTPUT:trace', 'OUTPUT:worker_uuid', 'OUTPUT:screen_width',
       'OUTPUT:screen_height', 'OUTPUT:page_started_at',
       'OUTPUT:provided_ngrams', 'OUTPUT:page_started_at_string',
       'ASSIGNMENT:link', 'ASSIGNMENT:task_id', 'ASSIGNMENT:assignment_id',
       'ASSIGNMENT:task_suite_id', 'ASSIGNMENT:worker_id', 'ASSIGNMENT:status',
       'ASSIGNMENT:started', 'ASSIGNMENT:submitted', 'ASSIGNMENT:accepted',
       'ASSIGNMENT:rejected', 'ASSIGNMENT:skipped', 'ASSIGNMENT:expired',
       'ASSIGNMENT:reward', 'judgment_time_3p', 'judgment_time_1p', 'Jorge',
       'Jorge-tag', 'Marcos', 'Marcos-tag', 'J-M Agreement']

# Drop the columns from the DataFrame
df_crowd = df_crowd.drop(columns=columns_to_remove)

# Display the updated DataFrame
print(f"Columns: {df_crowd.columns}")
df_crowd.head()


Columns: Index(['INPUT:row_pk', 'INPUT:intent', 'input utterance [coding]',
       'paraphrase_value', 'INPUT:pool_id', 'bert_score'],
      dtype='object')


Unnamed: 0,INPUT:row_pk,INPUT:intent,input utterance [coding],paraphrase_value,INPUT:pool_id,bert_score
0,1,EndEC2Instance,"""Terminate i-a541 now""",Get over with i-a541,baseline-cw,0.436592
1,1,EndEC2Instance,"""Terminate i-a541 now""",Get i-a541 done,baseline-cw,0.52665
2,1,EndEC2Instance,"""Terminate i-a541 now""",Do I-a541,baseline-cw,0.485834
3,1,EndEC2Instance,Terminate i-a541 now,I want to visit terminate i-a541,taboo-patterns,0.35168
4,1,EndEC2Instance,Terminate i-a541 now,Is terminate i-a541 opened?,taboo-patterns,0.485301


### Extract rows for each INPUT:pool_id value into separate DataFrames:

In [62]:
# List the experimental conditions that Jorge et al. coded in the 'INPUT:pool_id' column
df_crowd['INPUT:pool_id'].unique()

array(['baseline-cw', 'taboo-patterns', 'baseline-outliers',
       'patterns-examples', 'baseline'], dtype=object)

In [63]:
# Extract rows where INPUT:pool_id is 'taboo-patterns'
df_crowd_T = df_crowd[df_crowd['INPUT:pool_id'] == 'taboo-patterns']

# Extract rows where INPUT:pool_id is 'patterns-examples'
df_crowd_P = df_crowd[df_crowd['INPUT:pool_id'] == 'patterns-examples']

df_crowd_P.shape


(1224, 6)

In [65]:
# Keep rows whose 'bert_score' is between 0.5 and 0.98 (inclusive).
# .copy() makes each result an independent DataFrame, so adding a column to it
# later does not trigger pandas' SettingWithCopyWarning.
df_crowd_P = df_crowd_P[(df_crowd_P['bert_score'] >= 0.5) & (df_crowd_P['bert_score'] <= 0.98)].copy()
print(df_crowd_P.shape)

df_crowd_T = df_crowd_T[(df_crowd_T['bert_score'] >= 0.5) & (df_crowd_T['bert_score'] <= 0.98)].copy()
print(df_crowd_T.shape)

(726, 6)
(658, 6)


### Add syntax template column for the paraphrase value
For each paraphrase, we extract the constituency parse tree with the Stanza library, then derive the syntax template from the first two levels of that tree.
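
As a rough illustration of what "the first two levels" means, the sketch below truncates a Penn-style bracketed parse to a fixed depth and drops the words, producing strings in the same shape as the templates shown earlier in this notebook. It is not the implementation of `utlty.extract_syntax_pattern`, whose internals are not shown here. The functions `parse_brackets`, `render_template`, and `syntax_template`, the hand-written example parse, and the assumption that the `ROOT` wrapper is skipped are all hypothetical.

```python
import re


def parse_brackets(text):
    """Parse a Penn-style bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read_node():
        nonlocal pos
        pos += 1                      # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read_node())
            else:
                pos += 1              # skip the word attached to a leaf tag
        pos += 1                      # consume ")"
        return (label, children)

    return read_node()


def render_template(node, depth):
    """Render a node and its descendants down to `depth` levels, labels only."""
    label, children = node
    if depth == 0 or not children:
        return f"( {label} )"
    inner = " ".join(render_template(child, depth - 1) for child in children)
    return f"( {label} {inner} )"


def syntax_template(bracketed, levels=2):
    """Keep the top node plus `levels` levels below it (assumed convention)."""
    root = parse_brackets(bracketed)
    label, children = root
    if label == "ROOT" and len(children) == 1:
        root = children[0]            # assumption: the ROOT wrapper is not part of the template
    return render_template(root, levels)


# Hand-written parse for illustration; a real parse would come from the Stanza pipeline.
example = "(ROOT (S (VP (VB Terminate) (NP (NN i-a541)) (ADVP (RB now)))))"
print(syntax_template(example))
```

With this hand-written parse, the result has the same form as the `utterance_template` value listed for "Terminate i-a541 now" in the GPT datasets.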

In [70]:
def apply_extract_syntax_pattern(df, nlp, selected_column_name, syntax_columns_name):
    """
    Apply utlty.extract_syntax_pattern to every row of the DataFrame and store
    the resulting templates in a new column. The DataFrame is modified in place.

    :args
        df (pandas.DataFrame): The DataFrame to process.
        nlp (stanza.Pipeline): The Stanza pipeline used to extract the constituency parse tree.
        selected_column_name (str): Name of the column holding the text to parse.
        syntax_columns_name (str): Name of the column that will receive the syntax templates.

    :returns
        None
    """
    p_templates = []

    for index, row in tqdm(df.iterrows(), position=0, leave=True):
        text = row[selected_column_name]
        # Some utterances are wrapped in quotes; remove them, e.g. "Terminate i-a541 now" -> Terminate i-a541 now
        text = text.strip('"') if text.startswith('"') and text.endswith('"') else text
        paraphrase_syntax_template, _, _ = utlty.extract_syntax_pattern(nlp, text)
        p_templates.append(paraphrase_syntax_template)

    df[syntax_columns_name] = p_templates

#### Extract Syntax templates for Crowd_Taboo dataset

In [71]:
# Column to parse and the column that will hold the extracted templates
selected_column = 'paraphrase_value'
syntax_column_name = 'paraphrase_template'

apply_extract_syntax_pattern(df_crowd_T, nlp, selected_column, syntax_column_name)

658it [04:45,  2.31it/s]


#### Extract Syntax templates for Crowd_Patterns_by_examples dataset

In [72]:
# Column to parse and the column that will hold the extracted templates
selected_column = 'paraphrase_value'
syntax_column_name = 'paraphrase_template'

apply_extract_syntax_pattern(df_crowd_P, nlp, selected_column, syntax_column_name)

726it [05:20,  2.26it/s]


# Syntax Pattern Comparison and Extraction

## Overview

This section compares syntax patterns across six datasets: three generated by GPT models and three collected through crowdsourcing. For each pair, we identify the syntax patterns that appear in the GPT dataset but not in the crowd dataset, and vice versa. The unique patterns are then used to extract the matching rows, keeping the `Paraphrase` and `paraphrase_template` columns.

## Datasets

- **GPT Datasets:**
  - `df_GPT_P_12`
  - `df_GPT_T_12`
  - `df_GPT_bootstrap`

- **Crowd Datasets:**
  - `df_crowd_P`
  - `df_crowd_T`
  - `df_crowd_bootstrap`

## Columns

- `paraphrase_template`: Contains the syntax patterns to be analyzed.
- `Paraphrase`: The paraphrase text from which the template was extracted (renamed from `P` or `paraphrase_value` in the cells below).
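The comparison described above is a pairwise set difference between template sets. The sketch below shows it on toy data using plain Python sets. The template strings and the helper `pairwise_unique` are made up for illustration and are not part of the notebook, which builds its sets from the `paraphrase_template` columns.

```python
from itertools import product

# Toy template sets (hypothetical values, not taken from the real datasets)
gpt_sets = {
    "GPT_P_12": {"(ROOT (S (VP)))", "(ROOT (S (NP) (VP)))", "(ROOT (SQ (MD) (NP) (VP)))"},
    "GPT_T_12": {"(ROOT (S (VP)))", "(ROOT (FRAG (NP)))"},
}
crowd_sets = {
    "Crowd_P": {"(ROOT (S (VP)))", "(ROOT (SBARQ (WHNP) (SQ)))"},
    "Crowd_T": {"(ROOT (S (NP) (VP)))", "(ROOT (S (VP)))"},
}


def pairwise_unique(left, right):
    """Map (left_name, right_name) -> patterns found in left but not in right."""
    return {(l, r): left[l] - right[r] for l, r in product(left, right)}


gpt_only = pairwise_unique(gpt_sets, crowd_sets)
crowd_only = pairwise_unique(crowd_sets, gpt_sets)

for (l, r), patterns in gpt_only.items():
    print(f"- In {l} but not in {r}: {len(patterns)}")
for (l, r), patterns in crowd_only.items():
    print(f"- In {l} but not in {r}: {len(patterns)}")
```

Set difference is not symmetric, so each direction is computed separately. This is why the GPT-only and Crowd-only counts for the same pair of datasets can differ.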

In [87]:
def get_unique_rows(df, unique_patterns):
    """
    Extract rows from the DataFrame where the 'paraphrase_template' column matches the unique syntax patterns.

    Parameters:
    -----------
    df : pandas.DataFrame
        The input DataFrame containing the data (e.g., GPT or Crowd datasets).
    unique_patterns : set
        A set of unique syntax patterns (from the 'paraphrase_template' column) that are either unique to GPT or Crowd datasets.

    Returns:
    --------
    pandas.DataFrame
        A filtered DataFrame containing only the rows where the 'paraphrase_template' column values are in the set of unique patterns.
        The returned DataFrame includes only the 'Paraphrase' and 'paraphrase_template' columns.
    """
    return df[df['paraphrase_template'].isin(unique_patterns)][['Paraphrase', 'paraphrase_template']]

In [86]:
# Rename columns so every dataset exposes 'Paraphrase' and 'paraphrase_template'
df_GPT_P_12.rename(columns={'P': 'Paraphrase','Template': 'paraphrase_template'}, inplace=True)
df_GPT_T_12.rename(columns={'P': 'Paraphrase','Template': 'paraphrase_template'}, inplace=True)
df_GPT_bootstrap.rename(columns={'paraphrase_value': 'Paraphrase'}, inplace=True)

df_crowd_bootstrap.rename(columns={'paraphrase_value': 'Paraphrase'}, inplace=True)
df_crowd_P.rename(columns={'paraphrase_value': 'Paraphrase'}, inplace=True)
df_crowd_T.rename(columns={'paraphrase_value': 'Paraphrase'}, inplace=True)

In [103]:


# Extract unique syntax patterns (paraphrase_template) from each dataset
gpt_p_12_patterns = set(df_GPT_P_12['paraphrase_template'].unique())
gpt_t_12_patterns = set(df_GPT_T_12['paraphrase_template'].unique())
gpt_bootstrap_patterns = set(df_GPT_bootstrap['paraphrase_template'].unique())

crowd_p_patterns = set(df_crowd_P['paraphrase_template'].unique())
crowd_t_patterns = set(df_crowd_T['paraphrase_template'].unique())
crowd_bootstrap_patterns = set(df_crowd_bootstrap['paraphrase_template'].unique())

# Find unique patterns for GPT that are NOT in Crowd
text = "Find unique patterns for GPT that are NOT in Crowd:"
print('-'*len(text))
print(text)
print('-'*len(text))
print()

gpt_p_12_unique = gpt_p_12_patterns - crowd_p_patterns
print(f'- In GPT_P_12 but not in Crowd_P: {len(gpt_p_12_unique)}')

gpt_p_12_unique_2 = gpt_p_12_patterns - crowd_t_patterns
print(f'- In GPT_P_12 but not in Crowd_T: {len(gpt_p_12_unique_2)}')

gpt_p_12_unique_3 = gpt_p_12_patterns - crowd_bootstrap_patterns
print(f'- In GPT_P_12 but not in Crowd_Bootstrap: {len(gpt_p_12_unique_3)}')

gpt_t_12_unique = gpt_t_12_patterns - crowd_t_patterns
print(f'- In GPT_T_12 but not in Crowd_T: {len(gpt_t_12_unique)}')

gpt_t_12_unique_2 = gpt_t_12_patterns - crowd_p_patterns
print(f'- In GPT_T_12 but not in Crowd_P: {len(gpt_t_12_unique_2)}')

gpt_t_12_unique_3 = gpt_t_12_patterns - crowd_bootstrap_patterns
print(f'- In GPT_T_12 but not in Crowd_Bootstrap: {len(gpt_t_12_unique_3)}')

gpt_bootstrap_unique = gpt_bootstrap_patterns - crowd_bootstrap_patterns
print(f'- In GPT_Bootstrap but not in Crowd_Bootstrap: {len(gpt_bootstrap_unique)}')

gpt_bootstrap_unique_2 = gpt_bootstrap_patterns - crowd_t_patterns
print(f'- In GPT_Bootstrap but not in Crowd_T: {len(gpt_bootstrap_unique_2)}')

gpt_bootstrap_unique_3 = gpt_bootstrap_patterns - crowd_p_patterns
print(f'- In GPT_Bootstrap but not in Crowd_P: {len(gpt_bootstrap_unique_3)}')

print(f"{'-' * 30:^{40}}")

# Find unique patterns for Crowd that are NOT in GPT
crowd_p_unique = crowd_p_patterns - gpt_p_12_patterns
print(f'- In Crowd_P but not in GPT_P_12: {len(crowd_p_unique)}')

crowd_p_unique_2 = crowd_p_patterns - gpt_t_12_patterns
print(f'- In Crowd_P but not in GPT_T_12: {len(crowd_p_unique_2)}')

crowd_p_unique_3 = crowd_p_patterns - gpt_bootstrap_patterns
print(f'- In Crowd_P but not in GPT_Bootstrap: {len(crowd_p_unique_3)}')


crowd_t_unique = crowd_t_patterns - gpt_t_12_patterns
print(f'- In Crowd_T but not in GPT_T_12: {len(crowd_t_unique)}')

crowd_t_unique_2 = crowd_t_patterns - gpt_p_12_patterns
print(f'- In Crowd_T but not in GPT_P_12: {len(crowd_t_unique_2)}')

crowd_t_unique_3 = crowd_t_patterns - gpt_bootstrap_patterns
print(f'- In Crowd_T but not in GPT_Bootstrap: {len(crowd_t_unique_3)}')


crowd_bootstrap_unique = crowd_bootstrap_patterns - gpt_bootstrap_patterns
print(f'- In Crowd_Bootstrap but not in GPT_Bootstrap: {len(crowd_bootstrap_unique)}')

crowd_bootstrap_unique_2 = crowd_bootstrap_patterns - gpt_t_12_patterns
print(f'- In Crowd_Bootstrap but not in GPT_T_12: {len(crowd_bootstrap_unique_2)}')

crowd_bootstrap_unique_3 = crowd_bootstrap_patterns - gpt_p_12_patterns
print(f'- In Crowd_Bootstrap but not in GPT_P_12: {len(crowd_bootstrap_unique_3)}')

print(f"{'-' * 30:^{40}}")

# Get the rows (paraphrase and template) behind the unique patterns of each matched GPT/Crowd pair
df_gpt_p_12_unique = get_unique_rows(df_GPT_P_12, gpt_p_12_unique)
df_gpt_t_12_unique = get_unique_rows(df_GPT_T_12, gpt_t_12_unique)
df_gpt_bootstrap_unique = get_unique_rows(df_GPT_bootstrap, gpt_bootstrap_unique)

df_crowd_p_unique = get_unique_rows(df_crowd_P, crowd_p_unique)
df_crowd_t_unique = get_unique_rows(df_crowd_T, crowd_t_unique)
df_crowd_bootstrap_unique = get_unique_rows(df_crowd_bootstrap, crowd_bootstrap_unique)

# Combine the results into new DataFrames for GPT and Crowd
df_unique_gpt = pd.concat([df_gpt_p_12_unique, df_gpt_t_12_unique, df_gpt_bootstrap_unique]).reset_index(drop=True)
df_unique_crowd = pd.concat([df_crowd_p_unique, df_crowd_t_unique, df_crowd_bootstrap_unique]).reset_index(drop=True)

# Display the resulting DataFrames
print("Unique GPT rows:")
print(df_unique_gpt)

print("\nUnique Crowd rows:")
print(df_unique_crowd)

---------------------------------------------------
Find unique patterns for GPT that are NOT in Crowd:
---------------------------------------------------

- In GPT_P_12 but not in Crowd_P: 375
- In GPT_P_12 but not in Crowd_T: 387
- In GPT_P_12 but not in Crowd_Bootstrap: 378
- In GPT_T_12 but not in Crowd_T: 407
- In GPT_T_12 but not in Crowd_P: 400
- In GPT_T_12 but not in Crowd_Bootstrap: 402
- In GPT_Bootstrap but not in Crowd_Bootstrap: 158
- In GPT_Bootstrap but not in Crowd_T: 170
- In GPT_Bootstrap but not in Crowd_P: 153
     ------------------------------     
- In Crowd_P but not in GPT_P_12: 341
- In Crowd_P but not in GPT_T_12: 348
- In Crowd_P but not in GPT_Bootstrap: 348
- In Crowd_T but not in GPT_T_12: 356
- In Crowd_T but not in GPT_P_12: 354
- In Crowd_T but not in GPT_Bootstrap: 366
- In Crowd_Bootstrap but not in GPT_Bootstrap: 271
- In Crowd_Bootstrap but not in GPT_T_12: 268
- In Crowd_Bootstrap but not in GPT_P_12: 262
     ------------------------------     

(511, 2)