# Case studies

1. **Gold standard**: `mine50-andor` contains the 50 most recent articles from [arxiv.org in both the cs.LG and stat.ML categories](https://arxiv.org/list/cs.LG/recent) between 2022-10-24 and 2022-10-25; the search returned 570 results at the time the dataset was created. We select articles that belong to the cs.LG `or` (cs.LG `and` stat.ML) categories.

2. `mine50` contains the 50 most recent articles from [arxiv.org in both the cs.LG and stat.ML categories](https://arxiv.org/list/cs.LG/recent) between 2022-10-24 and 2022-10-25; the search returned 570 results at the time the dataset was created. The search results are sorted by date in descending order.

    !!! note
        The queried date is the last-updated date, not the paper submission date.

3. `mine50-csLG` is built with the same method as `mine50`, but without requiring articles to be in both cs.LG and stat.ML.
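The selection rules above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the script used to build the corpus: the `Article` record, its fields, the inclusive date window, and the helpers `in_both`, `in_andor` and `select_recent` are all hypothetical, and only the two category rules stated explicitly above (`mine50` and `mine50-andor`) are encoded.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical metadata record; the real corpus format is not shown in this section.
@dataclass
class Article:
    arxiv_id: str
    categories: set[str]
    updated: date  # last-updated date (not the submission date), per the note above

def in_both(a: Article) -> bool:
    """mine50 rule: the article is listed in both cs.LG and stat.ML."""
    return {"cs.LG", "stat.ML"} <= a.categories

def in_andor(a: Article) -> bool:
    """mine50-andor rule, transcribed literally: cs.LG or (cs.LG and stat.ML)."""
    return "cs.LG" in a.categories or ("cs.LG" in a.categories and "stat.ML" in a.categories)

def select_recent(articles, rule, n=50, start=date(2022, 10, 24), end=date(2022, 10, 25)):
    """Hypothetical helper: keep articles matching `rule` whose last-updated date
    falls in [start, end] (inclusive bounds are an assumption), newest first, at most n."""
    hits = [a for a in articles if rule(a) and start <= a.updated <= end]
    hits.sort(key=lambda a: a.updated, reverse=True)  # descending by date
    return hits[:n]

# Toy records, not real arXiv data.
toy = [
    Article("x1", {"cs.LG"}, date(2022, 10, 25)),
    Article("x2", {"cs.LG", "stat.ML"}, date(2022, 10, 24)),
    Article("x3", {"stat.ML"}, date(2022, 10, 25)),
    Article("x4", {"cs.LG", "stat.ML"}, date(2022, 10, 20)),  # outside the date window
]

print([a.arxiv_id for a in select_recent(toy, in_both)])   # ['x2']
print([a.arxiv_id for a in select_recent(toy, in_andor)])  # ['x1', 'x2']
```

Read literally, the `or` rule admits every cs.LG article, because the parenthesized clause is already covered by its first operand; that is why `x1`, listed only in cs.LG, is selected by `in_andor` but not by `in_both`.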

## Evaluating ReproScreener on the manually labeled (gold standard) dataset

```python
import pandas as pd
from IPython.display import display
from pathlib import Path

path_corpus_andor = Path("../case-studies/arxiv-corpus/mine50-andor/")

# Load ReproScreener's output for the mine50-andor corpus; read `id` as a string
dtypes_repro = {'id': str, 'link_count': float, 'found_links': str}
eval_andor = pd.read_csv(path_corpus_andor / 'output/repro_eval_tex.csv', dtype=dtypes_repro)[['id', 'link_count', 'found_links']]
```
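Reading `id` as `str` matters because arXiv identifiers look like decimal numbers: without an explicit `dtype`, pandas infers a float column, so an id such as `2210.13500` loses its trailing zero and no longer matches the string ids in a later merge. A minimal demonstration on a hypothetical two-row CSV (the ids and URL are made up):

```python
import io

import pandas as pd

# Hypothetical CSV with the same columns as repro_eval_tex.csv; values are made up.
csv_text = (
    "id,link_count,found_links\n"
    "2210.13500,1,https://github.com/example/repo\n"
    "2210.13410,0,\n"
)

# Default parsing: pandas infers a float64 dtype for the id column.
as_float = pd.read_csv(io.StringIO(csv_text))
# Explicit dtypes, mirroring dtypes_repro: ids are kept as exact strings.
as_str = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"id": str, "link_count": float, "found_links": str},
)

print(as_float["id"].dtype)    # a float dtype; the trailing zero of 2210.13500 is gone
print(as_str["id"].tolist())   # ['2210.13500', '2210.13410']
```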

```python
# Keep only papers in which ReproScreener found at least one link
eval_andor_links = eval_andor[eval_andor['link_count'] > 0]
eval_andor_links.info()  # info() prints its summary itself and returns None
```

```text
<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 4 to 49
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   id           21 non-null     object 
 1   link_count   21 non-null     float64
 2   found_links  21 non-null     object 
dtypes: float64(1), object(2)
memory usage: 672.0+ bytes
```

```python
# Manually labeled (gold standard) evaluation
manual = pd.read_csv("./manual_eval.csv")
manual.info()
```

```text
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52 entries, 0 to 51
Data columns (total 15 columns):
 #   Column                         Non-Null Count  Dtype 
---  ------                         --------------  ----- 
 0   paper                          52 non-null     object
 1   notes                          44 non-null     object
 2   empirical_dataset              30 non-null     object
 3   article_link_avail             25 non-null     object
 4   code_available_article_desc    26 non-null     object
 5   pwc_link_avail                 25 non-null     object
 6   pwc_link_match                 25 non-null     object
 7   pwc_link_desc                  24 non-null     object
 8   result_replication_code_avail  23 non-null     object
 9   code_language                  23 non-null     object
 10  package                        7 non-null      object
 11  wrapper_script                 16 non-null     object
 12  hardware_specifications        8 non-null      object
 13  softwar
```

```python
# Keep the paper id and the numeric criteria columns
manual_df_numerical = manual[['paper', 'article_link_avail', 'pwc_link_avail', 'pwc_link_match', 'result_replication_code_avail']]
manual_df_numerical = manual_df_numerical.drop(index=[0, 51])  # drop first row (summary) and last row (totals)
manual_df_numerical = manual_df_numerical.fillna(0)  # fill NaN (blank cells) with 0
dtypes_manual = {'paper': str, 'article_link_avail': float, 'pwc_link_avail': float, 'pwc_link_match': float, 'result_replication_code_avail': float}
manual_df_numerical = manual_df_numerical.astype(dtypes_manual)  # convert criteria columns to float
manual_df_numerical.sum(axis=0, numeric_only=True)  # number of papers meeting each criterion
```

```text
article_link_avail               23.0
pwc_link_avail                   22.0
pwc_link_match                   19.0
result_replication_code_avail    20.0
dtype: float64
```

```python
# Left-join ReproScreener's link results onto the manual labels by arXiv id;
# papers without a ReproScreener hit get NaN link_count, and NaN > 0 evaluates to False
manual_vs_repro = manual_df_numerical.merge(eval_andor_links, left_on='paper', right_on='id', how='left')
print(f"Manual evaluation found links in {manual_vs_repro.article_link_avail.sum()} papers, ReproScreener found links in {(manual_vs_repro.link_count>0).sum()} papers")
```

```text
Manual evaluation found links in 23.0 papers, ReproScreener found links in 21 papers
```
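The two totals do not say on which papers the methods agree. A per-paper comparison can be sketched as below; this is a hypothetical illustration on made-up ids, where `manual_toy` and `repro_toy` stand in for `manual_df_numerical` and the full, unfiltered `eval_andor`. Merging against the unfiltered table with `indicator=True` separates papers where ReproScreener found no links from ids that failed to match at all, which the merge with the filtered `eval_andor_links` cannot distinguish (both appear as `NaN`).

```python
import pandas as pd

# Hypothetical stand-ins for manual_df_numerical and the unfiltered eval_andor.
manual_toy = pd.DataFrame({
    "paper": ["2210.0001", "2210.0002", "2210.0003", "2210.0004"],
    "article_link_avail": [1.0, 1.0, 0.0, 0.0],
})
repro_toy = pd.DataFrame({
    "id": ["2210.0001", "2210.0002", "2210.0003", "2210.0004v1"],  # last id deliberately mismatched
    "link_count": [2.0, 0.0, 1.0, 0.0],
})

# indicator=True adds a `_merge` column ("both" / "left_only") so unmatched ids are visible.
merged = manual_toy.merge(repro_toy, left_on="paper", right_on="id", how="left", indicator=True)
n_unmatched = int((merged["_merge"] == "left_only").sum())

# Per-paper labels: manual judgment vs. ReproScreener finding at least one link
# (unmatched rows are treated as "no link", as in the comparison above).
manual_has_link = merged["article_link_avail"] > 0
repro_has_link = merged["link_count"].fillna(0) > 0

agreement = pd.crosstab(manual_has_link, repro_has_link,
                        rownames=["manual"], colnames=["reproscreener"])
agreement_rate = float((manual_has_link == repro_has_link).mean())

print(f"unmatched ids: {n_unmatched}")
print(agreement)
print(f"per-paper agreement: {agreement_rate:.2f}")
```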
