In [11]:
import pandas as pd

df = pd.read_csv("../../../replication_package/gpt4_dataset/peta/gpt4_result_0_2500.csv")
df.groupby(['Prediction', 'Label']).size().reset_index()

Unnamed: 0,Prediction,Label,0
0,0,0,1732
1,1,0,767
2,1,1,1



I have constructed a call graph using static analysis, and I need your help to prune irrelevant edges. For each edge, I will provide the following information:

1. Caller Code: The function or block of code attempting the call.
2. Callee Candidates: One or more possible target functions for the call.

Your task is to determine whether the edge between the caller and each callee candidate should be kept or pruned. Follow these steps:
1. Special Case - <boot> Callees:
   - <boot> callees often represent system-generated, default, or framework-specific initialization functions. Keep it.
2. Analyze the Semantic Relationship:
   - Check if the caller and callee are logically connected based on their functionality and purpose.
   - Prune the edge if the callee's functionality does not align with the caller’s context or intent.
3. Utility Filtering:
   - Identify generic or utility functions (e.g., print(), log()) that are commonly used but not central to the logic.
   - Prune such edges unless they are directly relevant to the caller’s purpose.
4. Dynamic Dispatch:
   - For polymorphic or dynamic calls, analyze the caller’s context to determine which callee candidates could realistically be invoked.
   - Keep only the valid candidates and prune the rest. 
5. Analyze the Directionality:
   - Validate whether the caller explicitly calls the callee. The edge should only be kept if the caller directly invokes the callee within its logic.
   - If the callee invokes the caller or the relationship is undefined, prune the edge.
6. Optional - Use Static Features as a Final Gate:
    If the semantic analysis is inconclusive, use provided static features (e.g., in-degree, out-degree) to refine your decision:
        1. Prune edges with very high in-degree if they are likely noise (e.g., utility functions).
        2. Retain edges for nodes with low in-degree or out-degree if they seem critical to the flow (e.g., leaf nodes or controllers).


**Input**:

**Caller Code**:
```java
{dest}
```

**Callee Code**:
```java
{src}
```

**Structure**:
```json
{
    {struct_feat}
}
```

Response Format:
Provide the final decision as `1` (Keep) or `0` (Prune).
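
The notebook does not show how the template is filled or how the reply is read back, so the following is only a sketch under assumptions. It uses a shortened stand-in for the prompt (`PROMPT_TEMPLATE`), two hypothetical helpers (`fill_prompt`, `parse_decision`), and a made-up feature string (`"in_degree": 4`). One detail is worth noting: the **Structure** block contains literal `{` and `}` around `{struct_feat}`, so `str.format` would raise on the template as written. The sketch therefore substitutes only the three named placeholders in a single pass.

```python
import re

# Hypothetical, shortened stand-in for the prompt above; only the placeholders matter here.
PROMPT_TEMPLATE = """**Caller Code**:
{dest}

**Callee Code**:
{src}

**Structure**:
{
    {struct_feat}
}

Provide the final decision as `1` (Keep) or `0` (Prune)."""


def fill_prompt(template, dest, src, struct_feat):
    """Substitute the three named placeholders, leaving the literal JSON braces alone."""
    values = {"dest": dest, "src": src, "struct_feat": struct_feat}
    # One pass over the template, so inserted code is never re-scanned for placeholders.
    return re.sub(r"\{(dest|src|struct_feat)\}", lambda m: values[m.group(1)], template)


def parse_decision(response):
    """Return the last standalone 0 or 1 in the reply, or None if there is none."""
    matches = re.findall(r"\b([01])\b", response)
    return int(matches[-1]) if matches else None


filled = fill_prompt(
    PROMPT_TEMPLATE,
    dest="void a() { b(); }",      # assumed caller snippet
    src="void b() {}",             # assumed callee snippet
    struct_feat='"in_degree": 4',  # assumed feature rendering
)
print(filled)
print(parse_decision("Decision: `0`"))
```

Taking the last standalone digit is a design choice for replies that reason step by step before answering. A stricter parser could reject anything that is not exactly `0` or `1`.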

In [2]:
import pandas as pd
from sklearn.metrics import precision_score, recall_score, f1_score

df = pd.read_csv("../../../replication_package/gpt4_dataset/wala/gpt4_result_0_2500.csv")
df.groupby(['Prediction', 'Label']).size().reset_index()


all_labels = df['Label'].tolist()
all_outputs = df['Prediction'].tolist()

overall_precision = precision_score(all_labels, all_outputs, zero_division=0) if all_labels else 0
overall_recall = recall_score(all_labels, all_outputs, zero_division=0) if all_labels else 0
overall_f1 = f1_score(all_labels, all_outputs, zero_division=0) if all_labels else 0

overall_precision, overall_recall, overall_f1

(0.25096227867590454, 0.6777546777546778, 0.36629213483146067)
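
As a cross-check, the same three numbers can be recomputed by hand from confusion counts. The sketch below assumes the Prediction/Label table with counts 1047 / 155 / 973 / 326 that appears later in this notebook belongs to this WALA file. That is not stated anywhere, but those counts reproduce the tuple above. The helper name `prf_from_counts` is hypothetical.

```python
def prf_from_counts(tp, fp, fn):
    """Precision, recall and F1 for the positive class (label 1) from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return precision, recall, f1


# Prediction=1, Label=1 -> tp; Prediction=1, Label=0 -> fp; Prediction=0, Label=1 -> fn.
precision, recall, f1 = prf_from_counts(tp=326, fp=973, fn=155)
print(precision, recall, f1)
```

Precision is 326 / (326 + 973) and recall is 326 / (326 + 155). The low precision therefore comes from the 973 edges that were kept although their label is 0.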

In [3]:
df

Unnamed: 0,Index,Start,Destination,Structure,Label,Prediction
0,0,<boot>,"class CMAExample1 { void main ( String[] a0, )...","tensor([ 0.0000, 0.0000, 30.0000, 1.0000...",1,1
1,1,/** * Constructs an array of control specifica...,class PrintfFormat$ConversionSpecification { v...,"tensor([ 4.0000, 18.0000, 6.0000, 1.0000...",1,1
2,2,/** * Format an int. * @param x The int to for...,/** * Format an int argument using this conver...,"tensor([ 4.0000, 4.0000, 10.0000, 1.0000...",1,1
3,3,"class PrintfFormat { String sprintf ( long a0,...",/** * Format a long argument using this conver...,"tensor([ 4.0000, 4.0000, 10.0000, 1.0000...",1,1
4,4,/** * Format a double. * @param x The double t...,/** * Format a double argument using this conv...,"tensor([ 4.0000, 10.0000, 3.0000, 1.0000...",1,1
...,...,...,...,...,...,...
2496,2496,class OptionConverter { Object instantiateByCl...,"class LogLog { void error ( String a0, Throwab...","tensor([3.0000e+00, 4.0000e+00, 0.0000e+00, 5....",0,0
2497,2497,"class RootLogger { void setLevel ( Level a0, )...","class LogLog { void error ( String a0, Throwab...","tensor([3.0000e+00, 3.0000e+00, 0.0000e+00, 5....",0,0
2498,2498,class RendererMap { void addRenderer ( Rendere...,"class LogLog { void error ( String a0, Throwab...","tensor([6.0000e+00, 1.0000e+00, 0.0000e+00, 5....",0,0
2499,2499,class PropertyConfigurator { void doConfigure ...,"class LogLog { void warn ( String a0, ) { retu...","tensor([4.0000e+00, 1.0000e+00, 0.0000e+00, 6....",0,0


In [4]:
import streamlit as st
import pandas as pd

# Streamlit app
st.title("Dataset Viewer")

# Display the dataset
st.write("### Dataset Overview")
st.dataframe(df)

# Filtering options
st.write("### Filter Options")

# Filter by Label
label_filter = st.multiselect(
    "Filter by Label",
    options=df["Label"].unique(),
    default=df["Label"].unique()
)

# Filter by Prediction
prediction_filter = st.multiselect(
    "Filter by Prediction",
    options=df["Prediction"].unique(),
    default=df["Prediction"].unique()
)

# Apply filters
filtered_data = df[
    (df["Label"].isin(label_filter)) &
    (df["Prediction"].isin(prediction_filter))
]

# Display filtered data
st.write("### Filtered Dataset")
st.dataframe(filtered_data)

2025-02-19 12:24:59.425 
  command:

    streamlit run C:\Users\rioau\AppData\Roaming\Python\Python310\site-packages\ipykernel_launcher.py [ARGUMENTS]


DeltaGenerator()

Unnamed: 0,Prediction,Label,0
0,0,0,1047
1,0,1,155
2,1,0,973
3,1,1,326


Unnamed: 0,Prediction,Label,0
0,0,0,11
1,0,1,12
2,1,0,29
3,1,1,49


Unnamed: 0,Prediction,Label,0
0,0,0,8
1,0,1,5
2,1,0,32
3,1,1,56


In [None]:
import os
import shutil

def move_media_files(source_dir, destination_dir, extensions=(".jpg", ".mov"), dry_run=True):
    """
    Move all files with the given extensions from subdirectories to a single destination folder.
    
    :param source_dir: The root directory to scan for files.
    :param destination_dir: The directory where the files should be moved.
    :param extensions: A tuple of file extensions to move.
    :param dry_run: If True, only print the planned moves without touching any files.
    """
    os.makedirs(destination_dir, exist_ok=True)
    planned = set()  # destinations already claimed during this run
    
    for root, _, files in os.walk(source_dir):
        for file in files:
            if file.lower().endswith(extensions):
                source_path = os.path.join(root, file)
                destination_path = os.path.join(destination_dir, file)
                
                # Ensure we don't overwrite existing files
                name, ext = os.path.splitext(file)
                counter = 1
                while os.path.exists(destination_path) or destination_path in planned:
                    destination_path = os.path.join(destination_dir, f"{name}_{counter}{ext}")
                    counter += 1
                planned.add(destination_path)
                
                if dry_run:
                    print(f"Would move: {source_path} -> {destination_path}")
                else:
                    shutil.move(source_path, destination_path)
                    print(f"Moved: {source_path} -> {destination_path}")

if __name__ == "__main__":
    source_directory = "D:/backup-iphone-rio/"  # Change this to your external SSD's path
    destination_directory = "D:/backup-iphone-rio-all-photo/"  # Change this to where you want to store the files
    
    move_media_files(source_directory, destination_directory)
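
The renaming loop inside `move_media_files` can be tried in isolation before pointing the script at a real backup. The sketch below pulls that step into a hypothetical helper, `unique_destination`, and runs it in a temporary directory. The file name `IMG_0001.jpg` is only an example.

```python
import os
import tempfile


def unique_destination(destination_dir, filename):
    """Return a path in destination_dir that does not exist yet, adding _1, _2, ... if needed."""
    name, ext = os.path.splitext(filename)
    candidate = os.path.join(destination_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(destination_dir, f"{name}_{counter}{ext}")
        counter += 1
    return candidate


# Try it in a throwaway directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    first = unique_destination(tmp, "IMG_0001.jpg")
    open(first, "w").close()   # simulate a file that was already moved
    second = unique_destination(tmp, "IMG_0001.jpg")
    open(second, "w").close()
    third = unique_destination(tmp, "IMG_0001.jpg")
    names = [os.path.basename(p) for p in (first, second, third)]
    print(names)
```

Photos exported from different folders often share names, so each collision gets the next free numeric suffix and nothing is overwritten.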

