# Goals

Like the prior notebooks, this notebook manually inspects data quality, this time for the BAD synthetic data (the `*cot_bad*` model outputs).

The structure mirrors the other notebooks. First, I download the data from the remote server, where I run both open-source and closed-source models. Then:

1. I manually inspect data generated from the Chicago Manual of Style.
2. I manually inspect data generated from other style guides (e.g., BuzzFeed).

# Download the Data from the Server

In [1]:
import pandas as pd 

In [137]:
proj_dir = "/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data"
local_data_dir = "../tasks/generate_synthetic_example_data/output_data/chicago_only"
! ssh end "ls $proj_dir/*cot_bad*"

/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/buzzfeed__command-r_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/buzzfeed__gpt-4-turbo_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/buzzfeed__llama-3-70b_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/command-r_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/gpt-3.5-turbo_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/gpt-4-turbo_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/generate_synthetic_example_data/guardian__command-r_zeroshot_cot_bad_only.jsonl
/project/jonmay_231/spangher/Projects/style-guides/tasks/gene

In [138]:
! scp end:"$proj_dir/*cot_bad*" $local_data_dir

buzzfeed__command-r_zeroshot_cot_bad_only.jso 100% 1599KB   1.4MB/s   00:01    
buzzfeed__gpt-4-turbo_zeroshot_cot_bad_only.j 100% 1883KB   1.7MB/s   00:01    
buzzfeed__llama-3-70b_zeroshot_cot_bad_only.j 100% 1676KB   2.2MB/s   00:00    
command-r_zeroshot_cot_bad_only.jsonl         100% 3966KB   4.2MB/s   00:00    
gpt-3.5-turbo_zeroshot_cot_bad_only.jsonl     100% 1361KB   4.3MB/s   00:00    
gpt-4-turbo_zeroshot_cot_bad_only.jsonl       100% 2204KB   4.0MB/s   00:00    
guardian__command-r_zeroshot_cot_bad_only.jso 100% 4568KB   4.7MB/s   00:00    
guardian__gpt-4-turbo_zeroshot_cot_bad_only.j 100% 7792KB   6.0MB/s   00:01    
guardian__llama-3-70b_zeroshot_cot_bad_only.j 100% 2034KB   5.8MB/s   00:00    
llama-3-70b_zeroshot_cot_bad_only.jsonl       100% 4427KB   6.1MB/s   00:00    
mixtral_zeroshot_cot_bad_only.jsonl           100% 2133KB   4.2MB/s   00:00    
mother_jones__command-r_zeroshot_cot_bad_only 100%  426KB   4.1MB/s   00:00    
mother_jones__gpt-4-turbo_zeroshot_cot_b
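
The filenames above appear to follow a `<guide>__<model>_zeroshot_cot_bad_only.jsonl` pattern, with the style-guide prefix missing for some files. A minimal sketch for splitting a filename into its parts is below. The helper name `parse_output_filename` is hypothetical, and treating unprefixed files as Chicago runs is my assumption based on how they are read in the next section.

```python
from pathlib import Path

# Shared suffix of every file listed in the scp output above.
SUFFIX = "_zeroshot_cot_bad_only.jsonl"

def parse_output_filename(path):
    """Split an output filename into (style_guide, model)."""
    name = Path(path).name
    if not name.endswith(SUFFIX):
        raise ValueError(f"unexpected filename: {name}")
    stem = name[: -len(SUFFIX)]
    guide, sep, model = stem.partition("__")
    if not sep:
        # Assumption: no '<guide>__' prefix means a Chicago Manual of Style run.
        return "chicago", stem
    return guide, model

examples = [
    "buzzfeed__command-r_zeroshot_cot_bad_only.jsonl",
    "gpt-4-turbo_zeroshot_cot_bad_only.jsonl",
    "mother_jones__command-r_zeroshot_cot_bad_only.jsonl",
]
parsed = [parse_output_filename(f) for f in examples]
```

This could replace the repeated `filter(lambda x: ... in x, ...)` lookups further down, but the notebook does not depend on it.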

# Read in Chicago-Only Data

In [139]:
import glob
import pandas as pd 

orig_input_file = pd.read_json('../corpora/chicago-style-guide/all-rules.jsonl', lines=True)
bad_sentence_files = glob.glob(local_data_dir + '/*cot*')

mixtral_data_df = pd.read_json(local_data_dir + '/mixtral_zeroshot_cot_bad_only.jsonl', lines=True)
gpt4_data_df = pd.read_json(local_data_dir + '/gpt-4-turbo_zeroshot_cot_bad_only.jsonl', lines=True)
llama_data_df = pd.read_json(local_data_dir + '/llama-3-70b_zeroshot_cot_bad_only.jsonl', lines=True)
gpt3_data_df = pd.read_json(local_data_dir + '/gpt-3.5-turbo_zeroshot_cot_bad_only.jsonl', lines=True)
command_r_data_df = pd.read_json(local_data_dir + '/command-r_zeroshot_cot_bad_only.jsonl', lines=True)

In [None]:
idx = 7

In [69]:
input_data_df_w_all_annotations = (
    orig_input_file
    ## gpt 4 (default inner join on the columns the two frames share)
    .merge(
        gpt4_data_df
        # .drop(columns=['message'])
        .rename(columns={'key': 'url'})
    )
    # .assign(was_unclear_gpt4=lambda df: df['parsed_output'] == WAS_UNCLEAR_STR)
    # .assign(parsed_output=lambda df: df['parsed_output'].apply(lambda x: x if x != WAS_UNCLEAR_STR else None))
    # .rename(columns={'parsed_output': 'parsed_gpt4'})
    .rename(columns={'message': 'message_gpt4'})
    ## mixtral (left join: keep rows even when this model has no output)
    .merge(
        mixtral_data_df
        # .drop(columns=['message'])
        .rename(columns={'key': 'url'}),
        how='left',
    )
    # .assign(was_unclear_mixtral=lambda df: df['parsed_output'] == WAS_UNCLEAR_STR)
    # .assign(parsed_output=lambda df: df['parsed_output'].apply(lambda x: x if x != WAS_UNCLEAR_STR else None))
    # .rename(columns={'parsed_output': 'parsed_mixtral'})
    .rename(columns={'message': 'message_mixtral'})
    ## command-r
    .merge(
        command_r_data_df
        # .drop(columns=['message'])
        .rename(columns={'key': 'url'}),
        how='left',
    )
    # .assign(was_unclear_command_r=lambda df: df['parsed_output'] == WAS_UNCLEAR_STR)
    # .assign(parsed_output=lambda df: df['parsed_output'].apply(lambda x: x if x != WAS_UNCLEAR_STR else None))
    # .rename(columns={'parsed_output': 'parsed_command_r'})
    .rename(columns={'message': 'message_command_r'})
    ## llama 3
    .merge(
        llama_data_df
        # .drop(columns=['message'])
        .rename(columns={'key': 'url'}),
        how='left',
    )
    # .assign(was_unclear_llama=lambda df: df['parsed_output'] == WAS_UNCLEAR_STR)
    # .assign(parsed_output=lambda df: df['parsed_output'].apply(lambda x: x if x != WAS_UNCLEAR_STR else None))
    # .rename(columns={'parsed_output': 'parsed_llama'})
    .rename(columns={'message': 'message_llama'})
    # keep one row per rule
    .drop_duplicates('url')
)
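
The merge chain above repeats the same rename-and-merge step once per model. A possible way to express it as a loop is sketched below on toy data. The helper `merge_model_outputs` and the toy frames are hypothetical, and unlike the cell above, this version joins explicitly on `url` and uses a left join for every model (the cell above uses an inner join for GPT-4).

```python
from functools import reduce

import pandas as pd

def merge_model_outputs(rules_df, outputs_by_model, key_col="url"):
    """Left-join each model's 'message' column onto the rules table."""
    def join_one(acc, item):
        model, df = item
        msg_col = f"message_{model}"
        cols = (
            df.rename(columns={"key": key_col, "message": msg_col})
              [[key_col, msg_col]]
              .drop_duplicates(key_col)  # one output per rule and model
        )
        return acc.merge(cols, on=key_col, how="left")
    return reduce(join_one, outputs_by_model.items(), rules_df)

# Toy stand-ins for orig_input_file and the per-model frames.
rules = pd.DataFrame({"url": ["a", "b"], "content": ["rule A", "rule B"]})
outputs = {
    "gpt4": pd.DataFrame({"key": ["a", "b"], "message": ["g-a", "g-b"]}),
    "llama": pd.DataFrame({"key": ["a"], "message": ["l-a"]}),
}
merged = merge_model_outputs(rules, outputs)
```

Joining on an explicit key avoids surprises if two frames happen to share another column name, at the cost of assuming `url` alone identifies a rule.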

In [186]:
d = (input_data_df_w_all_annotations
     [['content', 'message_gpt4', 'message_mixtral', 'message_command_r', 'message_llama']]
     .iloc[700]
     # .iloc[0]
     .to_dict()
)

for k, v in d.items():
    print(k)
    print('---------------------------------')
    print(v)

    print()
    print()

content
---------------------------------
Large or complex fractions are expressed as numeric decimal fractions (cf. 9.14). When a quantity equals less than 1.00, a zero normally appears before the decimal point as an aid to readability, particularly in scientific contexts and especially if quantities greater than 1.00 appear in the same context. Note that a unit of measure with a quantity of less than one is generally treated as if it were plural (see 10.65, 10.53). See also 9.55, 9.58.

a mean of 0.73
the ratio 0.85
In Cyprus, there were 0.96 females for every male in the general population; in the sixty-five-and-over age group, the number was 1.30.

In contexts where decimal quantities must be 1.00 or less, as in probabilities, batting averages, and the like, or between −1.00 and 1.00, as in correlation coefficients, a zero is typically omitted before the decimal point. For zeros with decimal points in tables, see 3.72.


p < .05

R = .10
Ty Cobb’s career batting average was .367.



# Read in the Data from other Style Guides

In [148]:
import glob
import pandas as pd 

orig_input_file = pd.read_csv('../corpora/mother_jones/parsed_rules.csv', index_col=0)
mother_jones_data_files = list(filter(lambda x: 'mother_jones' in x, bad_sentence_files))

gpt4_data_df = pd.read_json(list(filter(lambda x: 'gpt-4' in x, mother_jones_data_files))[0], lines=True).assign(algo='gpt4')
llama_data_df = pd.read_json(list(filter(lambda x: 'llama-3' in x, mother_jones_data_files))[0], lines=True).assign(algo='llama')
command_r_data_df = pd.read_json(list(filter(lambda x: 'command-r' in x, mother_jones_data_files))[0], lines=True).assign(algo='command_r')

In [155]:
gpt4_data_df.head(2)

| | message | key | algo |
|---|---|---|---|
| 0 | 1. Is there a rule being expressed? \n Yes, ... | mother_jones__0 | gpt4 |
| 1 | 1. Is there a rule being expressed? \n Yes, ... | mother_jones__1 | gpt4 |


In [165]:
full_mother_jones_data_df = (
    orig_input_file
        # Build the 'mother_jones__<row index>' key used in the model output files.
        .assign(key=lambda df: 'mother_jones__' + df.reset_index()['index'].astype(str))
        # Inner joins: a rule missing from any one model's output is dropped.
        .merge(gpt4_data_df[['key','message']].rename(columns={'message': 'gpt4_message'}), on='key')
        .merge(llama_data_df[['key','message']].rename(columns={'message': 'llama_message'}), on='key')
        .merge(command_r_data_df[['key','message']].rename(columns={'message': 'command_r_message'}), on='key')
)
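
Because the merges above are inner joins, rules that are missing from any model's output drop out silently. A small sketch for checking coverage before merging is below. The helper `coverage_report` and the toy frames are hypothetical; it relies on the `indicator=True` option of `DataFrame.merge`, which adds a `_merge` column.

```python
import pandas as pd

def coverage_report(rules_df, model_df, key_col="key"):
    """Count keys found in both tables, only in the rules, or only in the model output."""
    joined = rules_df[[key_col]].merge(
        model_df[[key_col]].drop_duplicates(),
        on=key_col,
        how="outer",
        indicator=True,  # adds '_merge' with both / left_only / right_only
    )
    return joined["_merge"].value_counts().to_dict()

# Toy stand-ins: three rules, model output for only two of them.
toy_rules = pd.DataFrame({"key": ["mother_jones__0", "mother_jones__1", "mother_jones__2"]})
toy_model = pd.DataFrame({"key": ["mother_jones__0", "mother_jones__1"]})
report = coverage_report(toy_rules, toy_model)
```

A nonzero `left_only` count would mean some rules have no output from that model and would be dropped by the inner join.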

In [183]:
algos_to_compare = ['gpt4', 'llama']
idx = 250
row = full_mother_jones_data_df.iloc[idx]

In [184]:
print(row['rule_text'])
print()
print('-----------------------------------------')
for a in algos_to_compare:
    print(a)
    print()
    print(row[f'{a}_message'])
    print('------------------------------')

Results were barely in Tuesday night, but…

-----------------------------------------
gpt4

1. Is there a rule being expressed?

The entry provided from the style guide, "Results were barely in Tuesday night, but…", does not explicitly state a rule. It appears to be a fragment of a sentence likely used to illustrate a specific style or usage in the context of live blogs.

2. Is this rule something that can be violated?

Since no clear rule or specific grammatical/spelling preference is expressed in the fragment provided, it's challenging to determine a rule that can be explicitly violated.

Conclusion: No clear rule or preference expressed.
------------------------------
llama

Let's break it down step by step.

1. Is there a rule being expressed?
Yes, the rule appears to be related to formatting and punctuation in live blog updates.

2. Is this rule something that can be violated?
Yes, it's possible to violate this rule by not following the specified format.

Now, let's simplify the r
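
In the output above, the two models disagree on whether the entry expresses a rule at all. To find more rows like this for manual inspection, one option is a rough heuristic that reads off each model's answer to question 1. The sketch below is only that: the helper `first_answer` is hypothetical, and it assumes the answer starts a line with "Yes" or "No" somewhere before question 2.

```python
import re

def first_answer(message):
    """Return 'yes', 'no', or 'unclear' for the model's answer to question 1."""
    for line in message.splitlines():
        # Stop once the model moves on to question 2.
        if re.match(r"\s*2\.", line):
            break
        # Drop a leading "1." and the restated question, if present.
        text = re.sub(r"^\s*1\.\s*", "", line)
        text = re.sub(r"^Is there a rule being expressed\?\s*", "", text, flags=re.I)
        match = re.match(r"(yes|no)\b", text.strip(), flags=re.I)
        if match:
            return match.group(1).lower()
    return "unclear"

llama_style = (
    "Let's break it down step by step.\n\n"
    "1. Is there a rule being expressed?\n"
    "Yes, the rule appears to be related to formatting.\n\n"
    "2. Is this rule something that can be violated?\nYes."
)
gpt4_style = (
    "1. Is there a rule being expressed?\n\n"
    "The entry does not explicitly state a rule.\n\n"
    "2. Is this rule something that can be violated?\n\nNo."
)
answers = [first_answer(llama_style), first_answer(gpt4_style)]
```

Rows where the models' answers differ, or where one is `unclear`, are good candidates to read by hand. The heuristic will miss answers phrased as full sentences, as in the GPT-4 output above.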

In [170]:
row

rule_title           Capitalization, Punctuation, Grammar, General ...
rule_hierarchy       Capitalization, Punctuation, Grammar, General ...
rule_text            • Use serial commas: one, two, and three. • Us...
key                                                   mother_jones__10
gpt4_message         1. Yes, there is a rule being expressed in the...
llama_message        Let's break down the entry step-by-step.\n\n1....
command_r_message    Yes, there is a rule being expressed, and it c...
Name: 10, dtype: object