In [1]:
import sys
sys.path.append('../src')

In [99]:
import os
import numpy as np
from dotenv import load_dotenv
from openai import OpenAI
from utils import get_combined_df
from tqdm import tqdm
import pandas as pd

In [61]:
load_dotenv()

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

In [51]:
SYSTEM_MESSAGE="""
You are a developer who is tasked with a very specific goal. You will be given commit messages from a GitHub repo, and your task is to identify the core problem(s) that this particular commit is trying to solve. With that information, you need to write a short description of the problem itself in the style of a GitHub issue, as it would have been written before this change was committed. In essence, you are trying to reconstruct what potential bugs led to certain commits happening. Do not mention any information about the solution in the commit (as that would be cheating). Describe only the potential issue or problem that could have led to this commit being needed.

Do not mention the names or contact information of any people involved in the commits (like authors or reviewers). Do not start responding with any description, just start the issue.

For example:
I will give you this commit message:
'[Fizz] Fix for failing id overwrites for postpone (#27684)
When we postpone during a render we inject a new segment synchronously
which we postpone. That gets assigned an ID so we can refer to it
immediately in the postponed state.

When we do that, the parent segment may complete later even though it's
also synchronous. If that ends up not having any content in it, it'll
inline into the child and that will override the child's segment id
which is not correct since it was already assigned one.

To fix this, we simply opt-out of the optimization in that case which is
unfortunate because we'll generate many more unnecessary empty segments.
So we should come up with a new strategy for segment id assignment but
this fixes the bug.

Co-authored-by: Josh Story <story@hey.com>'


And I want you to respond:
'When postponing during a render, injecting a new segment synchronously that gets assigned an ID can lead to issues. If the parent segment, also synchronous, completes later without any content, it inlines into the child segment, incorrectly overriding the child's segment ID which was already assigned.'
"""

In [4]:
repo_path = "../smalldata/fbr"
# index_path = f"{repo_path}/index_commit_tokenized"
combined_df = get_combined_df(repo_path)

In [27]:
def filter_df(combined_df):
    # Step 1: Keep only the columns we need
    filtered_df = combined_df[['commit_date', 'commit_message', 'commit_id', 'file_path', 'diff']]

    # Step 2: Group by commit_id
    grouped_df = filtered_df.groupby(['commit_id', 'commit_date', 'commit_message'])['file_path'].apply(list).reset_index()
    grouped_df.rename(columns={'file_path': 'actual_files_modified'}, inplace=True)

    # Step 3: Determine midpoint and filter dataframe
    midpoint_date = np.median(grouped_df['commit_date'])
    recent_df = grouped_df[grouped_df['commit_date'] > midpoint_date]
    print(f'Number of commits after midpoint date: {len(recent_df)}')

    # Step 4: Keep only commits whose message has more words than the average message
    average_commit_len = recent_df['commit_message'].str.split().str.len().mean()
    recent_df = recent_df[recent_df['commit_message'].str.split().str.len() > average_commit_len]
    print(f'Number of commits after filtering by commit message length: {len(recent_df)}')

    # Step 5: Remove upper outliers (above Q3 + 1.5 * IQR) by commit message length
    # in characters and by number of files modified
    q1_commit_len = recent_df['commit_message'].str.len().quantile(0.25)
    q3_commit_len = recent_df['commit_message'].str.len().quantile(0.75)
    iqr_commit_len = q3_commit_len - q1_commit_len

    q1_files = recent_df['actual_files_modified'].apply(len).quantile(0.25)
    q3_files = recent_df['actual_files_modified'].apply(len).quantile(0.75)
    iqr_files = q3_files - q1_files

    filter_condition = (
        (recent_df['commit_message'].str.len() <= (q3_commit_len + 1.5 * iqr_commit_len)) &
        (recent_df['actual_files_modified'].apply(len) <= (q3_files + 1.5 * iqr_files))
    )
    filtered_df = recent_df[filter_condition]

    print(f'Number of commits after filtering by commit message length and number of files modified: {len(filtered_df)}')

    return filtered_df
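
Step 5 of filter_df keeps rows at or below the upper Tukey fence, Q3 + 1.5 * IQR. The sketch below isolates that rule on made-up numbers; the helper name `upper_fence` and the toy column `msg_len` are illustrative assumptions, not part of the notebook.

```python
import pandas as pd


def upper_fence(values: pd.Series, k: float = 1.5) -> float:
    """Return Q3 + k * IQR, the upper Tukey fence of a numeric series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    return q3 + k * (q3 - q1)


# Toy message lengths (invented for illustration); 400 is an obvious outlier.
toy = pd.DataFrame({"msg_len": [10, 12, 14, 16, 18, 20, 400]})
fence = upper_fence(toy["msg_len"])

# Same comparison as filter_condition: keep rows at or below the fence.
kept = toy[toy["msg_len"] <= fence]
print(f"fence={fence}, kept {len(kept)} of {len(toy)} rows")
```

Only the upper fence is applied, as in filter_df, so unusually short messages or very small commits are not removed by this step.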

In [9]:
def random_print_commit_message(df):
    print(df.sample()['commit_message'].values[0])

In [28]:
filtered_df = filter_df(combined_df)

Number of commits after midpoint date: 5795
Number of commits after filtering by commit message length: 1536
Number of commits after filtering by commit message length and number of files modified: 1325


In [43]:
random_print_commit_message(filtered_df)

Reuse hooks when replaying a suspended component

When a component suspends, under some conditions, we can wait for the
data to resolve and replay the component without unwinding the stack or
showing a fallback in the interim. When we do this, we reuse the
promises that were unwrapped during the previous attempts, so that if
they aren't memoized, the result can still be used.

We should do the same for all hooks. That way, if you _do_ memoize an
async function call with useMemo, it won't be called again during the
replay. This effectively gives you a local version of the functionality
provided by `cache`, using the normal memoization patterns that have
long existed in React.



In [81]:
# sample 100 commits from filtered_df with seed 42 without replacement;
# .copy() gives final_df its own data, so columns added later never alias filtered_df
final_df = filtered_df.sample(n=100, random_state=42, replace=False).copy()

In [82]:
# number of unique commit_ids
print(f"Number of unique commit_ids: {final_df['commit_id'].nunique()}")

Number of unique commit_ids: 100


In [83]:
final_df

Unnamed: 0,commit_id,commit_date,commit_message,actual_files_modified
7907,af1b039bdd5a8b5def5d51acad00b79e9b7b377c,1586481094,ESLint rule to forbid cross fork imports (#185...,"[.eslintrc.js, scripts/eslint-rules/__tests__/..."
4082,5aa0c5671fdddc46092d46420fff84a82df558ac,1623102438,Fix Issue with Undefined Lazy Imports By Refac...,[packages/react-reconciler/src/__tests__/React...
7901,af08b5cbcaf4d3e3ad965a9165e41688733a7771,1509740372,Release script follow-up work after 16.1.0-bet...,[scripts/release/build-commands/add-git-tag.js...
1546,24dbe851e8a3a3a5233654183fd80b0d64b99295,1576610956,fix(dev-tools): fix show correct displayName w...,[packages/react-devtools-shared/src/backend/re...
10057,ddc4b65cfe17b3f08ff9f18f8804ff5b663788c8,1586291681,Clear finished discrete updates during commit ...,[packages/react-reconciler/src/ReactFiberWorkL...
...,...,...,...,...
226,05a55a4b09b7b7c8f63778fb8252a001ca66f8d7,1642620847,Fix change events for custom elements (#22938)...,[packages/react-dom/src/__tests__/DOMPropertyO...
1693,27b5699694f20220e0448f0ba3eb6bfa0d3a64ed,1644619917,Simplify cache pool contexts (#23280) The `po...,[packages/react-reconciler/src/ReactFiberCache...
394,09916479219a61ae86d2ec8ce159a161337b9007,1613595642,Use setImmediate when available over MessageCh...,[packages/scheduler/src/SchedulerFeatureFlags....
9041,c826dc50de288758a0b783b2fd37b40a3b512fc4,1681936268,Add (Client) Functions as Form Actions (#26674...,"[fixtures/flight/src/Button.js, fixtures/fligh..."


In [77]:
# Function to send request to OpenAI API
def transform_message(commit_message, model):
    user_message = f"Now do the same thing for this commit message: {commit_message}"
    try:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM_MESSAGE},
                {"role": "user", "content": user_message}
            ]
        )
        return response.choices[0].message.content
    except Exception as e:
        print(f"Error: {e}")
        return None
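
transform_message gives up after a single failed request. One possible hardening is a small retry wrapper with exponential backoff. This is a sketch under assumptions: `with_retries`, its parameters, and the `flaky` stand-in are hypothetical names, and no real API call is made here.

```python
import time


def with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() until it returns without raising, or attempts run out.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Returns None after the final failure, mirroring transform_message.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    return None


# Stand-in for an API call that fails twice and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = with_retries(flaky, sleep=delays.append)
print(result, delays)
```

In the notebook, the wrapped callable would be the request itself rather than transform_message, since transform_message already swallows exceptions and returns None.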

In [94]:
# Specify the model name as a variable
model_name = "gpt-3.5-turbo"


In [90]:
# Iterate over DataFrame rows with tqdm progress bar
for index, row in tqdm(final_df.iterrows(), total=final_df.shape[0]):
    transformed_message = transform_message(row['commit_message'], model_name)
    final_df.at[index, 'transformed_message_gpt3'] = transformed_message

  final_df.at[index, 'transformed_message_gpt3'] = transformed_message
  2%|▏         | 2/100 [00:35<29:07, 17.83s/it]


KeyboardInterrupt: 
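
The run above was interrupted after 2 of 100 rows, and restarting the loop would repeat the finished requests. A resumable variant could process only rows whose target column is still empty. This is a minimal sketch on toy data: `fill_missing` and the `issue` column are hypothetical names, the lambda stands in for transform_message, and a unique index is assumed.

```python
import pandas as pd


def fill_missing(df, source_col, target_col, transform):
    """Apply transform to rows whose target_col is empty; return how many were filled."""
    if target_col not in df.columns:
        df[target_col] = None
    # Collect the pending index labels before mutating the frame.
    pending = df[df[target_col].isna()].index
    filled = 0
    for index in pending:
        result = transform(df.at[index, source_col])
        if result is not None:  # leave failed rows empty so a later run retries them
            df.at[index, target_col] = result
            filled += 1
    return filled


toy = pd.DataFrame({"commit_message": ["fix a", "fix b", "fix c"]})
# Pretend the first row finished before the interrupt.
toy["issue"] = ["problem a", None, None]

n_first = fill_missing(toy, "commit_message", "issue", lambda m: m.upper())
n_second = fill_missing(toy, "commit_message", "issue", lambda m: m.upper())
print(n_first, n_second)
```

Calling the function again after an interrupt or a partial failure only touches the rows that are still empty.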

In [92]:
final_df.iloc[0]['transformed_message_gpt3']

'When syncing changes across implementations, it is important to ensure that modules from one fork do not import modules from the other fork. This rule helps prevent potential issues, such as code size regressions that may occur if one fork accidentally depends on two copies of the same module.'

In [97]:
# save final_df to csv
save_dir = '../gold/'
repo_name = 'facebook_react'
output_dir = os.path.join(save_dir, repo_name)
os.makedirs(output_dir, exist_ok=True)
save_path = os.path.join(output_dir, 'gpt3.csv')
final_df.to_csv(save_path, index=False)


# also save as parquet
save_path = os.path.join(output_dir, 'gpt3.parquet')
final_df.to_parquet(save_path, index=False)
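
One difference between the two saved formats is worth noting: a column of Python lists, such as actual_files_modified, is written to CSV as its string representation and comes back as plain strings. The sketch below shows the round trip on an in-memory toy frame (the values are invented) and one way to recover the lists with `ast.literal_eval`.

```python
import ast
import io

import pandas as pd

toy = pd.DataFrame({
    "commit_id": ["abc123"],
    "actual_files_modified": [["a.js", "b.js"]],
})

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

raw = loaded.at[0, "actual_files_modified"]  # a string, no longer a list
restored = loaded["actual_files_modified"].apply(ast.literal_eval)
print(type(raw).__name__, restored[0])
```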

In [100]:
pd.read_parquet(save_path)

Unnamed: 0,commit_id,commit_date,commit_message,actual_files_modified,transformed_message_gpt3
0,af1b039bdd5a8b5def5d51acad00b79e9b7b377c,1586481094,ESLint rule to forbid cross fork imports (#185...,"[.eslintrc.js, scripts/eslint-rules/__tests__/...","When syncing changes across implementations, i..."
1,5aa0c5671fdddc46092d46420fff84a82df558ac,1623102438,Fix Issue with Undefined Lazy Imports By Refac...,[packages/react-reconciler/src/__tests__/React...,"When lazy importing, there is an issue with un..."
2,af08b5cbcaf4d3e3ad965a9165e41688733a7771,1509740372,Release script follow-up work after 16.1.0-bet...,[scripts/release/build-commands/add-git-tag.js...,
3,24dbe851e8a3a3a5233654183fd80b0d64b99295,1576610956,fix(dev-tools): fix show correct displayName w...,[packages/react-devtools-shared/src/backend/re...,
4,ddc4b65cfe17b3f08ff9f18f8804ff5b663788c8,1586291681,Clear finished discrete updates during commit ...,[packages/react-reconciler/src/ReactFiberWorkL...,
...,...,...,...,...,...
95,05a55a4b09b7b7c8f63778fb8252a001ca66f8d7,1642620847,Fix change events for custom elements (#22938)...,[packages/react-dom/src/__tests__/DOMPropertyO...,
96,27b5699694f20220e0448f0ba3eb6bfa0d3a64ed,1644619917,Simplify cache pool contexts (#23280) The `po...,[packages/react-reconciler/src/ReactFiberCache...,
97,09916479219a61ae86d2ec8ce159a161337b9007,1613595642,Use setImmediate when available over MessageCh...,[packages/scheduler/src/SchedulerFeatureFlags....,
98,c826dc50de288758a0b783b2fd37b40a3b512fc4,1681936268,Add (Client) Functions as Form Actions (#26674...,"[fixtures/flight/src/Button.js, fixtures/fligh...",


In [84]:
final_df

Unnamed: 0,commit_id,commit_date,commit_message,actual_files_modified
7907,af1b039bdd5a8b5def5d51acad00b79e9b7b377c,1586481094,ESLint rule to forbid cross fork imports (#185...,"[.eslintrc.js, scripts/eslint-rules/__tests__/..."
4082,5aa0c5671fdddc46092d46420fff84a82df558ac,1623102438,Fix Issue with Undefined Lazy Imports By Refac...,[packages/react-reconciler/src/__tests__/React...
7901,af08b5cbcaf4d3e3ad965a9165e41688733a7771,1509740372,Release script follow-up work after 16.1.0-bet...,[scripts/release/build-commands/add-git-tag.js...
1546,24dbe851e8a3a3a5233654183fd80b0d64b99295,1576610956,fix(dev-tools): fix show correct displayName w...,[packages/react-devtools-shared/src/backend/re...
10057,ddc4b65cfe17b3f08ff9f18f8804ff5b663788c8,1586291681,Clear finished discrete updates during commit ...,[packages/react-reconciler/src/ReactFiberWorkL...
...,...,...,...,...
226,05a55a4b09b7b7c8f63778fb8252a001ca66f8d7,1642620847,Fix change events for custom elements (#22938)...,[packages/react-dom/src/__tests__/DOMPropertyO...
1693,27b5699694f20220e0448f0ba3eb6bfa0d3a64ed,1644619917,Simplify cache pool contexts (#23280) The `po...,[packages/react-reconciler/src/ReactFiberCache...
394,09916479219a61ae86d2ec8ce159a161337b9007,1613595642,Use setImmediate when available over MessageCh...,[packages/scheduler/src/SchedulerFeatureFlags....
9041,c826dc50de288758a0b783b2fd37b40a3b512fc4,1681936268,Add (Client) Functions as Form Actions (#26674...,"[fixtures/flight/src/Button.js, fixtures/fligh..."


In [63]:
test_commit = final_df['commit_message'].values[0]
test_commit

'ESLint rule to forbid cross fork imports (#18568)\n\nModules that belong to one fork should not import modules that belong to\r\nthe other fork.\r\n\r\nHelps make sure you correctly update imports when syncing changes across\r\nimplementations.\r\n\r\nAlso could help protect against code size regressions that might happen\r\nif one of the forks accidentally depends on two copies of the same\r\nmodule.'
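
The message above mixes `\n` and `\r\n` line endings. If consistent input to the model is wanted, the endings could be normalized before building the prompt. A minimal sketch, where `normalize_newlines` is a hypothetical helper and the sample string is a shortened version of the message above:

```python
def normalize_newlines(message: str) -> str:
    """Convert Windows (\\r\\n) and bare \\r line endings to \\n."""
    return message.replace("\r\n", "\n").replace("\r", "\n")


sample = "ESLint rule to forbid cross fork imports\n\nModules that belong to\r\nthe other fork.\r\n"
cleaned = normalize_newlines(sample)
print(repr(cleaned))
```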

In [64]:
user_message = f"Now do the same thing for this commit message: {test_commit}"
try:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": user_message}
        ]
    )
    # openai>=1.0 returns an object, so use attribute access instead of subscripting
    output = response.choices[0].message.content
except Exception as e:
    print(f"Error: {e}")


In [72]:
response.choices[0].message.content

'Imports between forks are currently allowed, which can lead to issues when syncing changes across implementations. Additionally, this can potentially result in code size regressions if one fork accidentally depends on two copies of the same module. To address this, an ESLint rule is being added to forbid cross fork imports.'

In [73]:
user_message = f"Now do the same thing for this commit message: {test_commit}"
try:
    response2 = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_MESSAGE},
            {"role": "user", "content": user_message}
        ]
    )
    # openai>=1.0 returns an object, so use attribute access instead of subscripting
    output2 = response2.choices[0].message.content
except Exception as e:
    print(f"Error: {e}")


In [75]:
response2.choices[0].message.content

"'Problems have been detected where modules that belong to one fork are importing modules that belong to another fork. This can create issues when syncing changes across implementations and could potentially lead to code size regressions if a fork accidentally depends on two copies of the same module.'"

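The gpt-4 reply above is wrapped in an extra pair of single quotes, probably mirroring the quoted example in SYSTEM_MESSAGE. A small post-processing step could strip one enclosing pair. This is a sketch: `strip_wrapping_quotes` is a hypothetical helper, and it would wrongly trim a reply that merely starts and ends with separately quoted phrases.

```python
def strip_wrapping_quotes(text: str) -> str:
    """Remove one pair of matching quotes that encloses the whole string."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1].strip()
    return text


wrapped = "'Problems have been detected where modules import across forks.'"
plain = "It's fine to leave unquoted text alone."
print(strip_wrapping_quotes(wrapped))
print(strip_wrapping_quotes(plain))
```
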
In [15]:
import pandas as pd
import os

In [17]:
df = pd.read_parquet('../gold/facebook_react/v2_facebook_react_gp4_gold.parquet')
repo_name = 'facebook_react'

In [11]:
df.head()

Unnamed: 0,commit_id,commit_date,commit_message,actual_files_modified,transformed_message_gpt4
0,f74c89b145c3ed48b0e6e6185c915b96c7f161dc,1563992711,Misc improvements based on user feedback from ...,[packages/react-devtools-core/src/standalone.j...,"Default setting for ""collapse new nodes"" is en..."
1,cfd8c1bd43fc4fcf708a584924f27a6c79803eae,1614020660,DevTools: Restore inspect-element bridge optim...,[packages/react-devtools-shared/src/__tests__/...,Removal of certain optimizations in the inspec...
2,efd8f6442d1aa7c4566fe812cba03e7e83aaccc3,1644508750,Resolve default onRecoverableError at root ini...,"[packages/react-art/src/ReactARTHostConfig.js,...",`createRoot` doesn't provide a default impleme...
3,3cc8a9347bd540ac68acfb181fbc7fba7c371648,1693943707,[Fizz] Move formatContext tracking back to the...,[packages/react-server/src/ReactFizzServer.js],The current architecture potentially conflates...
4,31518135c25aaa1b5c2799d2a18b6b9e9178409c,1553179971,Strengthen nested update counter test coverage...,[packages/react-dom/src/__tests__/ReactUpdates...,Nested update counter behavior needs better co...


In [24]:
openai_model = 'gpt4'
VERSION = 'v2'
def gpt_inspect(openai_model, seed=None):
    """Print one sampled commit message next to its transformed version."""
    print(f'Model: {openai_model}')
    # repo_name and VERSION are read from the notebook's global scope
    gold_path = os.path.join('..', 'gold', repo_name)
    gold_file_path = os.path.join(gold_path, f'{VERSION}_{repo_name}_{openai_model}_gold.parquet')

    gold_df = pd.read_parquet(gold_file_path)

    # Sample one row; with seed=None a different row is drawn on each call.
    # Passing seed straight through also handles seed=0, which `if seed:` would skip.
    random_row = gold_df.sample(1, random_state=seed)
    print(f'Original message: \n{random_row["commit_message"].values[0]}')
    print(f'Transformed message: \n{random_row[f"transformed_message_{openai_model}"].values[0]}')

In [27]:
gpt_inspect(openai_model)

Model: gpt4
Original message: 
Remove unnecessary throw catch (#19044)

This was originally added so you could use "break on caught exceptions"
but that feature is pretty useless these days since it's used for feature
detection and Suspense.

The better pattern is to use the stack trace, jump to source and set a
break point here.

Since DevTools injects its own console.error, we could inject a "debugger"
statement in there. Conditionally. E.g. React DevTools could have a flag

Transformed message: 
The presence of an unnecessary throw-catch block is disturbing the feature detection and Suspense, rendering the "break on caught exceptions" feature unusable.
