An empirical dataset and analysis of bugs in modern TypeScript projects
pip install requests nltk beautifulsoup4 label-studio-sdk setfit pandas

TSBugsArtifact/
├── raw_data                    # All raw data we collect
├── Data_Collection             # Collect bug-fix-related commits and extract the commit URLs from a given repository
├── Model_Train_Predict_Setfit  # Train the model on labeled data and use it to predict labels for unlabeled commits
├── Summarize_and_Verification  # Produce summarization and verification reports for the model's predicted labels
└── Experimental Analyses       # Scripts, data, and output of the analyses conducted on the data obtained from the previous steps (bugs, projects, dependencies), used to carry out the analyses discussed for RQ1 and to reproduce the results, diagrams, and tables used in the paper
JSON files path: <relative_path>/TSBugsArtifact/Summarize_and_Verification
Name: github_repository_name.json
Description: List of all repository names (these should match the repositories' URLs)
Name: manually_label_collection.json
Description: Label results for the training set
Name: verification_set.json
Description: Label results for the verification set
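A quick way to sanity-check these files after obtaining the artifact; this assumes github_repository_name.json is a flat JSON array of repository names, as its description suggests (the actual structure may differ):

```python
import json

# Assumption for illustration: a flat JSON array of repository names.
with open("github_repository_name.json") as f:
    repositories = json.load(f)

print(f"{len(repositories)} repositories in the dataset")
for name in repositories[:5]:
    print(" -", name)
```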
To correctly extract data from a Label Studio project, the project must have the following configuration.
The labeling interface of the project has to follow this pattern:
<View>
<Text name="text" value="$commit_index"/>
<View style="box-shadow: 2px 2px 5px #999; padding: 20px; margin-top: 2em; border-radius: 5px;">
<Header value="Choose bug category"/>
<Choices name="sentiment" toName="text" choice="single" showInLine="true">
<Choice value="Test Fault"/>
<Choice value="Asynchrony / Event Handling Bug"/>
<Choice value="Tooling / Configuration Issue"/>
<Choice value="Missing Cases"/>
<Choice value="Exception Handling"/>
<Choice value="Missing Features"/>
<Choice value="Type Error"/>
<Choice value="UI Behavior Bug"/>
<Choice value="API Misuse"/>
<Choice value="Logic Error"/>
<Choice value="Runtime Exception"/></Choices>
</View>
</View>
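If you prefer to create the project and its labeling interface programmatically, the following is a minimal sketch using the legacy label-studio-sdk Client API (pre-1.0 style); the server URL, access token, and project title are placeholders, and the Choices list is truncated to two categories for brevity:

```python
from label_studio_sdk import Client

# Truncated label config; the full interface above lists all eleven categories.
LABEL_CONFIG = """<View>
  <Text name="text" value="$commit_index"/>
  <Choices name="sentiment" toName="text" choice="single" showInLine="true">
    <Choice value="Logic Error"/>
    <Choice value="Type Error"/>
  </Choices>
</View>"""

# Placeholders: point these at your own Label Studio instance and token.
ls = Client(url="http://localhost:8080", api_key="<YOUR_LabelStudio_Access_Token>")
project = ls.start_project(title="TS bug labeling", label_config=LABEL_CONFIG)
print(project.id)
```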
Setting up local file serving is not required, but it is strongly recommended if you want to upload all local data to your Label Studio project at once.
Start Label Studio with the following command, filling in your root directory:
LABEL_STUDIO_LOCAL_FILES_SERVING_ENABLED=true LABEL_STUDIO_LOCAL_FILES_DOCUMENT_ROOT=<YOUR_ROOT_DIRECTORY> label-studio

Connect local files to your project:
- In your project Settings, open Cloud Storage.
- Click Add source storage and set Storage type to Local files
- Fill in the fields:
  Absolute local path: <relative_path>/TSBugsArtifact/raw_data/predict-sample-collection
  File Filter Regex: .*json
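As an alternative to local storage, tasks can also be pushed into the project directly with the SDK; a minimal sketch, again assuming the legacy Client API, with a placeholder project ID and placeholder task data (the key commit_index matches value="$commit_index" in the labeling interface above):

```python
from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key="<YOUR_LabelStudio_Access_Token>")
project = ls.get_project(1)  # placeholder: your project ID

# Hypothetical tasks; in the real workflow the content comes from the JSON files
# in raw_data/predict-sample-collection.
tasks = [
    {"commit_index": "owner/repo@a1b2c3d: fix crash when unmounting component"},
    {"commit_index": "owner/repo@d4e5f6a: fix race in file watcher"},
]
project.import_tasks(tasks)
```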
Script: github_commits_collector.py
Description: Extracts bug-fix-related commits from the given GitHub repository and creates a JSON file under the current directory to store them
Arguments:
- argument 1: URL of the target GitHub repository
- argument 2: Number of commits to collect
- argument 3: Personal access token (classic); optional
python github_commits_collector.py <GITHUB_REPOSITORY_URL> [<Collect_Commit_Number>] [<YOUR_GITHUB_TOKEN>]
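For reference, a minimal sketch of this kind of collection using the GitHub REST API via requests; the repository name and token are placeholders, and the keyword filter is an illustrative heuristic rather than the script's exact selection criterion:

```python
import requests

repo = "owner/repo"   # placeholder, taken from <GITHUB_REPOSITORY_URL>
token = None          # optional personal access token (classic)

headers = {"Accept": "application/vnd.github+json"}
if token:
    headers["Authorization"] = f"token {token}"

# List recent commits for the repository (one page of up to 100).
resp = requests.get(f"https://api.github.com/repos/{repo}/commits",
                    headers=headers, params={"per_page": 100})
resp.raise_for_status()

# Keep commits whose message looks bug-fix related (illustrative keyword heuristic).
bugfix_urls = [c["html_url"] for c in resp.json()
               if any(k in c["commit"]["message"].lower() for k in ("fix", "bug"))]
print(f"{len(bugfix_urls)} candidate bug-fix commits")
```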
Script: index.py
Description: Collects key information from the commit(s) you provide and from the related discussion (GitHub PRs, issues, and security pages). The data are stored as JSON files in the /TSBugsArtifact/raw_data/output directory (the script creates the directory if it does not exist)
Arguments:
- argument 1: URL of the GitHub commit to collect
- argument 2: Personal access token (classic); default: None
For a single commit:
python index.py <GITHUB_COMMIT_URL> [<YOUR_GITHUB_TOKEN>]
For multiple commits:
python index.py <JSON_FILE>
JSON file format
See collect_commit_template.json for reference:
{
"token": "YOUR_GITHUB_TOKEN" OR null,
"urls": [
"GITHUB_COMMIT_URL",
.........
]
}
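A minimal sketch for generating such a file programmatically; the output filename and the commit URL below are placeholders:

```python
import json

payload = {
    "token": None,  # or "YOUR_GITHUB_TOKEN"
    "urls": [
        # Placeholder commit URL; in practice these come from github_commits_collector.py.
        "https://github.com/owner/repo/commit/0123456789abcdef0123456789abcdef01234567",
    ],
}

with open("collect_commits.json", "w") as f:
    json.dump(payload, f, indent=2)
```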
Script: index.py
Prerequisite: You need a Label Studio personal access token and a project with a few labeled samples
Description: Fine-tunes the model on the labeled samples in your Label Studio project and uploads predicted labels for the unlabeled samples back into the same project
Arguments:
- argument 1: Label Studio personal access token
- argument 2: Label Studio project ID
python index.py <YOUR_LabelStudio_Access_Token> <PROJECT_ID>
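For context, a minimal sketch of SetFit fine-tuning on a handful of labeled commit messages, assuming setfit >= 1.0; the texts, labels, categories, and base model below are illustrative placeholders rather than the artifact's actual training data or configuration (the real script reads its samples from the Label Studio project):

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer, TrainingArguments

# Illustrative placeholder data; the real pipeline pulls labeled samples
# from the Label Studio project instead.
categories = ["Logic Error", "Asynchrony / Event Handling Bug"]
train_ds = Dataset.from_dict({
    "text": [
        "fix: off-by-one when paginating search results",
        "fix: guard against stale state after reconnect",
        "fix: await socket close before retrying",
        "fix: race between debounce timer and unmount",
    ],
    "label": [0, 0, 1, 1],  # indices into `categories`
})

model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
trainer = Trainer(
    model=model,
    args=TrainingArguments(batch_size=4, num_epochs=1),
    train_dataset=train_ds,
)
trainer.train()

# Predict labels for unlabeled commit messages.
preds = model.predict(["fix: promise rejection not handled in uploader"])
print([categories[int(p)] for p in preds])
```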
Script: label_studio_extractor.py
Prerequisite: You need a Label Studio personal access token and a project with a few predicted samples
Description: Produces a summary of the predicted or manually assigned labels for the given project
Arguments:
- argument 1: Label Studio personal access token
- argument 2: Label Studio project ID
- argument 3: Summarize the manual label results (pass 1) or the predicted label results (pass anything else); defaults to the predicted label results
python label_studio_extractor.py <YOUR_LabelStudio_Access_Token> <PROJECT_ID> [<Is_Manual_Label_Summarize>]
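A minimal sketch of the kind of summary this script produces, pulling annotations with the legacy label-studio-sdk Client API and counting the chosen categories; the server URL, token, and project ID are placeholders:

```python
from collections import Counter
from label_studio_sdk import Client

ls = Client(url="http://localhost:8080", api_key="<YOUR_LabelStudio_Access_Token>")
project = ls.get_project(1)  # placeholder: your project ID

counts = Counter()
for task in project.get_tasks():
    # Each manual annotation stores its chosen category under result -> value -> choices.
    for ann in task.get("annotations", []):
        for result in ann.get("result", []):
            for choice in result.get("value", {}).get("choices", []):
                counts[choice] += 1

for category, n in counts.most_common():
    print(f"{category}: {n}")
```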
Script: compare_verification_predict.py
Prerequisite: You need a JSON file containing the summary of the predicted labels (run prediction_extractor.py first)
Description: Generates two Excel files, one containing the samples the model labeled correctly and one containing the samples it mislabeled, based on a comparison between the model's predicted labels and the verification set
Arguments:
- argument 1: Name of the verification set JSON file (see verification_set.json); default: verification_set.json
- argument 2: Name of the predicted label JSON summary file; default: predict_label_collection.json
python compare_verification_predict.py [<Verification_Set>] [<Predict_Label_Result>]
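A minimal sketch of such a comparison with pandas; it assumes, purely for illustration, that both JSON files map a commit identifier to its bug category (the artifact's actual file structure may differ), the output filenames are placeholders, and writing .xlsx files requires openpyxl:

```python
import json
import pandas as pd

# Assumed (illustrative) structure: {"<commit identifier>": "<bug category>", ...}
with open("verification_set.json") as f:
    verified = json.load(f)
with open("predict_label_collection.json") as f:
    predicted = json.load(f)

rows = [
    {"commit": commit, "verified": label, "predicted": predicted[commit]}
    for commit, label in verified.items()
    if commit in predicted
]
df = pd.DataFrame(rows)

correct = df[df["verified"] == df["predicted"]]
mislabeled = df[df["verified"] != df["predicted"]]
correct.to_excel("correctly_labeled.xlsx", index=False)   # needs openpyxl installed
mislabeled.to_excel("mislabeled.xlsx", index=False)
print(f"{len(correct)} correct, {len(mislabeled)} mislabeled")
```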
Script: predict_summarize_repository.py
Prerequisite: You need a JSON file containing the summary of the predicted labels (run prediction_extractor.py first)
Description: Based on the model's predicted labels, prints the number of commits in each repository that fall under each bug category
Arguments:
- argument 1: Name of the predicted label JSON summary file; default: predict_label_collection.json
- argument 2: Name of the collected GitHub repository JSON file (see github_repository_name.json); default: github_repository_name.json
python predict_summarize_repository.py [<Predict_Label_Result>] [<GitHub_Repository_Lists>]

Please run the R scripts in Experimental Analyses/scripts to conduct the analyses for RQ1 and RQ2 (a, b, and c), respectively. These analyses use the data in Experimental Analyses/scripts and reproduce the results, diagrams, and tables used in the paper under Experimental Analyses/output.