# Analysing the Impact of Individual Factors on PR Acceptance

## Imports & Load Dataset

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as stats
from utils import df_column_statistics, chi_squared_test, fishers_exact_test  # project-local helpers (utils.py)


pr_df = pd.read_csv('data/filtered/pull_request.csv')
print("Shape:", pr_df.shape)
pr_df.head()

Shape: (7156, 34)


,id,number,title,body,agent,user_id,user,state,created_at,closed_at,...,total_churn,num_bot_users,num_human_users,num_total_users,num_comments,num_human_comments,num_bot_comments,num_reviews,num_human_reviews,num_bot_reviews
0,3264933329,2911,Fix: Wait for all partitions in load_collectio...,## Summary\n\nFixes an issue where `load_colle...,Claude_Code,108661493,weiliu1031,closed,2025-07-26T02:59:01Z,2025-07-29T07:01:20Z,...,396.0,0,2,2,2,2,0,0,0,0
1,3265709660,205,feat: add comprehensive README screenshots wit...,## Type of Change\n\n- [ ] 🐛 `bug` - Bug fix (...,Claude_Code,80381,sugyan,closed,2025-07-26T14:07:22Z,2025-07-26T14:45:30Z,...,300.0,1,0,1,0,0,0,1,0,1
2,3214555104,16658,Add function signature breaking change detector,<details><summary>&#x1F6E0 DevTools &#x1F6E0</...,Claude_Code,17039389,harupy,closed,2025-07-09T05:35:26Z,2025-07-11T05:13:35Z,...,620.0,0,2,2,3,3,0,8,8,0
3,3214724259,5489,feat: add comprehensive test coverage for form...,## Summary\n\nThis PR enhances the forms plugi...,Claude_Code,82053242,wtfsayo,closed,2025-07-09T06:43:46Z,2025-07-09T06:44:02Z,...,1353.0,3,0,3,2,0,2,1,0,1
4,3214782537,1538,Major Architecture Refactor - Configuration Sy...,### **User description**\r\nResolves #1529 \r\...,Claude_Code,1206,delano,closed,2025-07-09T07:05:44Z,2025-07-17T18:34:41Z,...,2828.0,3,0,3,9,0,9,3,0,3


## Time to Closure

In [2]:
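# Annotate days to close: whole elapsed days (floored by .dt.days) plus one,
# so a PR closed within 24 hours of creation counts as 1 day
pr_df['days_to_close'] = (pd.to_datetime(pr_df['closed_at']) - pd.to_datetime(pr_df['created_at'])).dt.days + 1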
pr_df[['created_at', 'closed_at', 'days_to_close']].head()

,created_at,closed_at,days_to_close
0,2025-07-26T02:59:01Z,2025-07-29T07:01:20Z,4
1,2025-07-26T14:07:22Z,2025-07-26T14:45:30Z,1
2,2025-07-09T05:35:26Z,2025-07-11T05:13:35Z,2
3,2025-07-09T06:43:46Z,2025-07-09T06:44:02Z,1
4,2025-07-09T07:05:44Z,2025-07-17T18:34:41Z,9
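Because `.dt.days` keeps only the whole-day component of each timedelta, the `+ 1` turns `days_to_close` into a one-based day count: anything closed within 24 hours of creation lands in day 1. The standard-library sketch below applies the same rule to the five timestamp pairs displayed above. `days_to_close` here is a hypothetical helper written for illustration; it is not part of the notebook or its `utils` module.

```python
from datetime import datetime

# (created_at, closed_at) pairs copied from the five rows displayed above.
pairs = [
    ("2025-07-26T02:59:01Z", "2025-07-29T07:01:20Z"),
    ("2025-07-26T14:07:22Z", "2025-07-26T14:45:30Z"),
    ("2025-07-09T05:35:26Z", "2025-07-11T05:13:35Z"),
    ("2025-07-09T06:43:46Z", "2025-07-09T06:44:02Z"),
    ("2025-07-09T07:05:44Z", "2025-07-17T18:34:41Z"),
]

# Both timestamps are UTC with a literal trailing "Z", so naive parsing is safe.
FMT = "%Y-%m-%dT%H:%M:%SZ"


def days_to_close(created: str, closed: str) -> int:
    """Hypothetical helper: floored whole elapsed days plus one (mirrors `.dt.days + 1`)."""
    delta = datetime.strptime(closed, FMT) - datetime.strptime(created, FMT)
    return delta.days + 1


print([days_to_close(c, d) for c, d in pairs])  # [4, 1, 2, 1, 9]
```

Under this rule a PR closed 16 seconds after opening (row 3) and one closed 38 minutes after opening (row 1) both count as 1 day, so the 1-day bucket spans everything from near-instant closures to same-day ones.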


In [3]:
# Days to Close Distribution
pr_df['days_to_close'].value_counts()

days_to_close
1      4491
2       549
3       293
9       269
4       229
       ... 
132       1
54        1
69        1
121       1
77        1
Name: count, Length: 70, dtype: int64
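`value_counts` orders buckets by frequency rather than by day, which is why the 9-day bucket (269) appears ahead of the 4-day bucket (229) in the listing. Converting the counts to shares of all PRs makes the concentration easier to read. This is a small arithmetic check that uses only the counts printed above and the row count from `pr_df.shape`.

```python
# Counts copied from the value_counts output above; only the head is printed there.
total_prs = 7156  # rows in pr_df, from pr_df.shape
counts = {1: 4491, 2: 549, 3: 293}

share_day1 = counts[1] / total_prs
share_within_3 = sum(counts[day] for day in (1, 2, 3)) / total_prs

print(f"closed on day 1: {share_day1:.1%}")            # closed on day 1: 62.8%
print(f"closed within 3 days: {share_within_3:.1%}")   # closed within 3 days: 74.5%

# With the full frame, the whole cumulative curve in day order would be:
# pr_df['days_to_close'].value_counts(normalize=True).sort_index().cumsum()
```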

In [4]:
df_column_statistics(pr_df, 'days_to_close')

Accepted PR Statistics for 'days_to_close':
Mean = 3.006649204110417
Median = 1.0
Standard Deviation = 5.940535313404627
Min = 1
Max = 68

Rejected PR Statistics for 'days_to_close':
Mean = 6.89420884632923
Median = 2.0
Standard Deviation = 11.115630156810978
Min = 1
Max = 148
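`df_column_statistics` is imported from the project's `utils` module, which is not shown. Judging from its output, it splits PRs on acceptance and reports the mean, median, standard deviation, minimum and maximum of a column for each group. The following is a minimal pandas sketch of that behaviour, assuming a boolean `accepted` column (the name used in the contingency tables further down). `column_stats_by_acceptance` is a hypothetical name, not the helper itself, and its `std` is pandas' default sample standard deviation (`ddof=1`), which may differ from what the helper computes.

```python
import pandas as pd


def column_stats_by_acceptance(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical stand-in for utils.df_column_statistics.

    Assumes a boolean 'accepted' column; returns one row per outcome
    (True = accepted, False = rejected) with summary statistics of `column`.
    """
    return df.groupby('accepted')[column].agg(['mean', 'median', 'std', 'min', 'max'])


# Toy frame for illustration only; these values are NOT taken from the dataset.
toy = pd.DataFrame({
    'accepted': [True, True, True, False, False, False],
    'days_to_close': [1, 1, 4, 1, 2, 9],
})

summary = column_stats_by_acceptance(toy, 'days_to_close')
print(summary)
```

In the real output above, accepted PRs have a mean of about 3.0 days against a median of 1, and rejected PRs a mean of about 6.9 against a median of 2. Both distributions are therefore strongly right-skewed, and the medians are far less affected by the long tails (maxima of 68 and 148 days).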


## Related Issues

In [5]:
pr_df['related_issue'].value_counts()

related_issue
False    4871
True     2285
Name: count, dtype: int64

In [6]:
chi_squared_test(pr_df, 'related_issue')
fishers_exact_test(pr_df, 'related_issue')

Chi-squared Test for 'related_issue' vs 'accepted'

Contingency Table:
accepted       False  True 
related_issue              
False           1391   3480
True             802   1483

Chi-squared statistic: 31.0095
P-value: 0.000000025676976
Degrees of freedom: 1
Fisher's Exact Test for 'related_issue' vs 'accepted'

Contingency Table:
accepted       False  True 
related_issue              
False           1391   3480
True             802   1483

Odds Ratio: 0.7391
P-value: 0.0000
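`chi_squared_test` and `fishers_exact_test` also come from `utils`, so their internals are not visible here, but the reported figures can be reproduced by hand from the contingency table. The sketch below uses only the Python standard library. It computes Pearson's chi-squared with Yates' continuity correction via the 2x2 shortcut formula, converts it to a p-value (for one degree of freedom the chi-squared survival function equals `erfc(sqrt(x / 2))`), and computes the sample odds ratio ad/bc. The corrected statistic matches the reported 31.0095, whereas the uncorrected one would be about 31.32. This suggests, but does not confirm, that the helper applies the continuity correction, as `scipy.stats.chi2_contingency` does by default for 2x2 tables.

```python
import math

# Contingency table from the output above:
# rows = related_issue (False, True), columns = accepted (False, True).
a, b = 1391, 3480   # related_issue=False: rejected, accepted
c, d = 802, 1483    # related_issue=True:  rejected, accepted
n = a + b + c + d

# Pearson chi-squared with Yates' continuity correction (2x2 shortcut formula).
numerator = n * (abs(a * d - b * c) - n / 2) ** 2
denominator = (a + b) * (c + d) * (a + c) * (b + d)
chi2 = numerator / denominator

# With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# Sample odds ratio (a*d)/(b*c): the odds of acceptance with a linked issue
# divided by the odds of acceptance without one, for the layout above.
odds_ratio = (a * d) / (b * c)

# Acceptance rate within each row.
rate_without_issue = b / (a + b)
rate_with_issue = d / (c + d)

print(f"chi2 = {chi2:.4f}, p = {p_value:.3e}, OR = {odds_ratio:.4f}")
print(f"accepted: {rate_without_issue:.1%} without issue, {rate_with_issue:.1%} with issue")
```

On these numbers, 71.4% of PRs without a linked issue were accepted versus 64.9% of those with one. An odds ratio of about 0.74 means PRs with a linked issue had roughly 26% lower odds of acceptance, assuming the helpers use the table in the orientation printed. The Fisher p-value shown as `0.0000` reflects the helper's four-decimal formatting, not an exact zero.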


## Reviews

## Comments

## Commits

## Overall Final Plot