
This repo supports: Imker, H.J., Luong, H., Mischo, W.H., Schlembach, M.C., Wiley, C.A. (2021) “An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University,” The Journal of Academic Librarianship 47, 102369.

Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University


  • Name: Heidi Imker

    • Organization/institution: University of Illinois at Urbana-Champaign
    • ORCID: 0000-0003-4748-7453
    • Email:
  • Name: Hoa Luong

    • Organization/institution: University of Illinois at Urbana-Champaign
    • ORCID: 0000-0001-8054-9412
    • Email:
  • Name: William H. Mischo

    • Organization/institution: University of Illinois at Urbana-Champaign
    • ORCID: 0000-0003-4234-9836
    • Email:
  • Name: Mary C. Schlembach

    • Organization/institution: University of Illinois at Urbana-Champaign
    • ORCID: 0000-0002-3145-4828
    • Email:
  • Name: Chris Wiley

    • Organization/institution: University of Illinois at Urbana-Champaign
    • ORCID: 0000-0003-0433-9151
    • Email:


  • Language: English


This dataset was developed as part of a study that assessed data reuse. First, through bibliometric analysis, corresponding authors of highly cited papers published in 2015 at the University of Illinois at Urbana-Champaign in nine STEM disciplines were identified and then surveyed to determine whether data were generated for their article and whether they knew of reuse by other researchers. Second, the corresponding authors of articles that cited those 2015 articles were identified and surveyed to ascertain whether they reused data from the original article and how those data were obtained. The project goal was to better understand data reuse in practice and to explore whether research data from an initial publication were reused in subsequent publications.

  • Please read the associated openly available research article for context, additional details, and results:

Imker, H.J., Luong, H., Mischo, W.H., Schlembach, M.C., Wiley, C.A. (2021) “An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University,” The Journal of Academic Librarianship.


Keywords: data reuse; data sharing; data management; data services; Scopus API


  • This dataset consists of the following files:

    • 1 Readme file (TXT)
    • 1 Survey A file (TXT)
    • 1 Survey B file (TXT)
    • 14 data files (CSV)
    • 6 R script files (R)
  • The initial input data file for SURVEY A is survey_a_results.csv.

  • The initial input data file for SURVEY B is survey_b_results.csv.

  • Each script is named for its step in the analysis process, with an additional short descriptor (see also “DATA ANALYSIS” section below).

  • Files necessary to create Tables for Survey A

    • Table 1 Survey A: A_STEP_1_Summary.R & a_overall_summary.csv
    • Table 2 Survey A: A_STEP_2_Others_Use.R & a_summary_importance.csv
    • Table 3 Survey A: A_STEP_2_Others_Use.R & a_summary_why_data_not_used.csv
    • Table 4 Survey A: A_STEP_3_Self_Use.R & a_summary_importance_UIUC_data_used.csv
    • Table 5 Survey A: A_STEP_3_Self_Use.R & a_summary_UIUC_why_data_not_used.csv
  • Additional files to aggregate "write-in" comments in Survey A (no tables)

    • A_STEP_2_Others_Use.R & a_write_ins_why_data_not_used.csv
    • A_STEP_3_Self_Use.R & a_write_ins_UIUC_why_data_not_used.csv
  • Files necessary to create Tables for Survey B

    • Table 6 Survey B: B_STEP_1_Summary.R & b_overall_summary.csv
    • Table 7 Survey B: B_STEP_2_Why_Not_Used.R & b_summary_why_data_not_used.csv
    • Table 8 Survey B: B_STEP_3_Why_Cite.R & b_summary_authors_cited_UIUC_article.csv
  • Additional files to aggregate "write-in" comments in Survey B (no tables)

    • B_STEP_2_Why_Not_Used.R & b_write_ins_why_data_not_used.csv
    • B_STEP_3_Why_Cite.R & b_write_ins_why_cited_UIUC_article.csv


Data sources:

  • Initial article metadata was obtained through Scopus via the University of Illinois at Urbana-Champaign’s library subscription.


Variables for initial input CSV file, survey_a_results.csv:

  • ResponseId_A = Random unique ID generated by Qualtrics for Survey A
  • Discipline = Scopus subject area, coded as Biochemistry (BIOC), Chemistry (CHEM), Computer Science (COMP), Earth Science (EART), Engineering (ENGI), Environmental Science (ENVI), Materials Science (MATE), Medicine (MEDI), or Physics (PHYS)
  • Q1_A_was_data = See Q1 in Survey_A.txt
  • Q2_A_anyone_used = See Q2 in Survey_A.txt
  • Q3_A_importance_to_them = See Q3 in Survey_A.txt
  • Q4_A_how_provided = See Q4 in Survey_A.txt
  • Q4_A_specify_online_database = write in for Q4
  • Q4_A_other_specify = write in for Q4
  • Q5_A_reason_not_used = See Q5 in Survey_A.txt
  • Q5_A_other_specify = write in for Q5
  • Q6_A_UIUC_reuse = See Q6 in Survey_A.txt
  • Q7_A_importance_to_you = See Q7 in Survey_A.txt
  • Q8_A_data_obtained_by_you = See Q8 in Survey_A.txt
  • Q8_A_specify_online_database = write in for Q8
  • Q8_A_other_specify = write in for Q8
  • Q9_A_reason_not_used_UIUC_author = See Q9 in Survey_A.txt
  • Q9_A_other_specify = write in for Q9

Variables for initial input CSV file, survey_b_results.csv:

  • ResponseId_B = Random unique ID generated by Qualtrics for Survey B
  • ResponseId_A = as above
  • Discipline = as above
  • Q1_B_UIUC_data_used = See Q1 in Survey_B.txt
  • Q2_B_how_provided = See Q2 in Survey_B.txt
  • Q2_B_specify_online_database = write in for Q2
  • Q2_B_other_specify = write in for Q2
  • Q3_B_why_opt_not_use = See Q3 in Survey_B.txt
  • Q3_B_other_specify = write in for Q3
  • Q4_B_why_cited = See Q4 in Survey_B.txt
  • Q4_B_other_specify = write in for Q4

Variables for output CSV file: a_overall_summary.csv:

  • Discipline = as above
  • count_discip_response = total count of responses per discipline
  • count_data_yes = count of articles containing data per discipline
  • percent_data_yes = percent of articles containing data per discipline
  • count_used = count of articles using others' data per discipline
  • percent_used = percent of articles using others' data per discipline
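The per-discipline counts and percentages above can be sketched roughly as follows. This is an illustration on toy data, not the code in A_STEP_1_Summary.R. The "Yes"/"No" response coding and the choice of denominator for the percentages are assumptions; check Survey_A.txt and the actual script before relying on them.

```r
library(dplyr)

# Toy stand-in for survey_a_results.csv. The "Yes"/"No" coding is assumed,
# not taken from the real file.
survey_a <- data.frame(
  Discipline       = c("CHEM", "CHEM", "CHEM", "PHYS", "PHYS"),
  Q1_A_was_data    = c("Yes", "Yes", "No", "Yes", "Yes"),
  Q2_A_anyone_used = c("Yes", "No", NA, "Yes", "Yes"),
  stringsAsFactors = FALSE
)

a_overall <- survey_a %>%
  group_by(Discipline) %>%
  summarise(
    # Total responses per discipline
    count_discip_response = n(),
    # Responses reporting that the article contained data
    count_data_yes = sum(Q1_A_was_data == "Yes", na.rm = TRUE),
    # Assumption: percentages are taken over all responses in the discipline
    percent_data_yes = 100 * count_data_yes / count_discip_response,
    # Responses reporting that the data were used
    count_used = sum(Q2_A_anyone_used == "Yes", na.rm = TRUE),
    percent_used = 100 * count_used / count_discip_response
  )

print(a_overall)
```

If the real script computes percent_used over count_data_yes instead of all responses, only the denominator in the last line changes.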

Variables for output CSV file: b_overall_summary.csv:

  • Discipline = as above
  • count_discip_response = as above
  • percent_discip_response =
  • count_data_used = count of articles that used data from the cited UIUC paper
  • percent_data_used = percent of articles that used data from the cited UIUC paper
  • count_not_used = count of articles that did not use data from the cited UIUC paper
  • percent_not_used = percent of articles that did not use data from the cited UIUC paper

Variables for output CSV files (2) tallying importance: a_summary_importance.csv and a_summary_importance_UIUC_data_used.csv

  • Discipline = as above
  • N = total count of responses
  • Extremely = count of responses (code 5)
  • Very = count of responses (code 4)
  • Somewhat = count of responses (code 3)
  • Not_very = count of responses (code 2)
  • Not_at_all = count of responses (code 1)
  • ave_importance = average per discipline based on count per discipline
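A minimal sketch of how such an importance tally could be built, assuming the importance question is stored as the numeric codes 1 to 5 listed above and that ave_importance is the mean code per discipline. Both points are assumptions; the README does not state how the scripts store the responses or compute the average.

```r
library(dplyr)

# Toy responses; real values would come from Q3_A_importance_to_them
# (or Q7_A_importance_to_you) in survey_a_results.csv.
responses <- data.frame(
  Discipline = c("BIOC", "BIOC", "BIOC", "ENGI", "ENGI"),
  Q3_A_importance_to_them = c(5, 4, 4, 3, 1),
  stringsAsFactors = FALSE
)

a_importance <- responses %>%
  # Keep only respondents who answered the importance question
  filter(!is.na(Q3_A_importance_to_them)) %>%
  group_by(Discipline) %>%
  summarise(
    N          = n(),
    Extremely  = sum(Q3_A_importance_to_them == 5),
    Very       = sum(Q3_A_importance_to_them == 4),
    Somewhat   = sum(Q3_A_importance_to_them == 3),
    Not_very   = sum(Q3_A_importance_to_them == 2),
    Not_at_all = sum(Q3_A_importance_to_them == 1),
    # Assumed definition: mean of the 1-5 codes within the discipline
    ave_importance = mean(Q3_A_importance_to_them)
  )

print(a_importance)
```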

Variables for output CSV files (3) summarizing reasons selected that explain lack of reuse: a_summary_why_data_not_used.csv, a_summary_UIUC_why_data_not_used.csv, and b_summary_why_data_not_used.csv

  • fix_explain = explanations for why data were not used (see Q5 and Q9 in Survey_A.txt and Q3 in Survey_B.txt). Note: "fix" is prefixed to the variable name because explanations were truncated in code to be more manageable
  • sum = sum of all responses that selected a given explanation
  • all_disciplines = list of all disciplines represented in the responses that selected a given explanation
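The shape of these summary files can be illustrated with a short sketch. It assumes one selected explanation per row; the explanation labels below are hypothetical placeholders, not the actual survey options, and the separator used in all_disciplines is also an assumption.

```r
library(dplyr)

# Toy data: one row per selected explanation. The labels are invented
# for illustration and do not come from Survey_A.txt or Survey_B.txt.
reasons <- data.frame(
  Discipline  = c("CHEM", "PHYS", "CHEM", "MATE", "PHYS"),
  fix_explain = c("Reason one", "Reason one", "Reason two",
                  "Reason one", "Reason two"),
  stringsAsFactors = FALSE
)

why_not_used <- reasons %>%
  group_by(fix_explain) %>%
  summarise(
    # Number of responses that selected this explanation
    sum = n(),
    # Distinct disciplines that selected it, collapsed into one string
    all_disciplines = paste(sort(unique(Discipline)), collapse = ", ")
  )

print(why_not_used)
```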

Variables for output CSV file (1) summarizing reasons selected why the UIUC article was cited: b_summary_authors_cited_UIUC_article.csv

  • explain = explanations for why UIUC article was cited (see Q4 in Survey_B.txt)
  • sum = as above
  • all_disciplines = as above

Variables for output CSV files (4) aggregating write ins: a_write_ins_why_data_not_used.csv, a_write_ins_UIUC_why_data_not_used.csv, b_write_ins_why_data_not_used.csv, and b_write_ins_why_cited_UIUC_article.csv

  • all variables as above


Program used:

  • R version 4.0.2 (2020-06-22)
  • Platform: x86_64-apple-darwin17.0 (64-bit)
  • Running under: macOS Catalina 10.15.7

  • Attached base packages:
    • stats
    • graphics
    • grDevices
    • utils
    • datasets
    • methods
    • base
  • Other attached packages:
    • forcats_0.5.0
    • stringr_1.4.0
    • dplyr_1.0.2
    • purrr_0.3.4
    • readr_1.4.0
    • tidyr_1.1.2
    • tibble_3.0.4
    • ggplot2_3.3.2
    • tidyverse_1.3.0

There are 3 scripts for analysis of Survey A:

STEP 1 Purpose: Summary Across Disciplines

  • Package(s): tidyverse
  • Input file(s): survey_a_results.csv
  • Output file(s): a_overall_summary.csv

STEP 2 Purpose: Analyze responses where data have been used by others (importance and access; access results are within commented code - no output), then responses where data have not been used and the reasons why

  • Package(s): tidyverse, stringr
  • Input file(s): survey_a_results.csv
  • Output file(s): a_summary_importance.csv, a_write_ins_why_data_not_used.csv, a_summary_why_data_not_used.csv

STEP 3 Purpose: Analyze whether data were reused in the initial UIUC article, the importance of those data or why data were not used, and access (results within commented code - no output)

  • Package(s): tidyverse, stringr
  • Input file(s): survey_a_results.csv
  • Output file(s): a_summary_importance_UIUC_data_used.csv, a_write_ins_UIUC_why_data_not_used.csv, a_summary_UIUC_why_data_not_used.csv

There are 3 scripts for analysis of Survey B:

STEP 1 Purpose: Determine counts of data from initial UIUC article used or not used, and access mechanisms (results within commented code - no output)

  • Package(s): tidyverse
  • Input file(s): survey_b_results.csv
  • Output file(s): b_overall_summary.csv

STEP 2 Purpose: Analyze why data have not been used

  • Package(s): tidyverse, stringr
  • Input file(s): survey_b_results.csv
  • Output file(s): b_write_ins_why_data_not_used.csv, b_summary_why_data_not_used.csv

STEP 3 Purpose: Analyze why UIUC articles were cited

  • Package(s): tidyverse, stringr
  • Input file(s): survey_b_results.csv
  • Output file(s): b_write_ins_why_cited_UIUC_article.csv, b_summary_authors_cited_UIUC_article.csv
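To regenerate all output files in one pass, the six scripts could be run in step order with a sketch like the one below. It assumes the scripts and the two input CSV files sit together in the current working directory and that each script reads its input by relative path; the README does not confirm either point, so adjust paths as needed.

```r
# Script and input file names as listed in this README.
scripts <- c(
  "A_STEP_1_Summary.R", "A_STEP_2_Others_Use.R", "A_STEP_3_Self_Use.R",
  "B_STEP_1_Summary.R", "B_STEP_2_Why_Not_Used.R", "B_STEP_3_Why_Cite.R"
)
inputs <- c("survey_a_results.csv", "survey_b_results.csv")

# Assumption: everything lives in the current working directory.
missing_files <- setdiff(c(scripts, inputs), list.files())

if (length(missing_files) > 0) {
  # Report what is missing instead of failing partway through
  message("Missing from working directory: ",
          paste(missing_files, collapse = ", "))
} else {
  for (script in scripts) {
    message("Running ", script)
    source(script)
  }
}
```

The scripts attach tidyverse themselves, so the package must be installed before running this.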


  • Formally: CC0 to facilitate ease-of-use

  • Informally: Please cite this dataset regardless. It matters to us, and provenance is important. The citation is:

  • Imker, H.J., Luong, H., Mischo, W.H., Schlembach, M.C., Wiley, C.A. (2021) Data for: An Examination of Data Reuse Practices within Highly Cited Articles of Faculty at a Research University. University of Illinois at Urbana-Champaign

