# Section 1: Introduction to FlowDroid for Taint Analysis

## Research Android App Components

## Understand FlowDroid and Configuration Options

FlowDroid exposes many command-line options, as its help output shows:

In [3]:
%%!
java -jar soot-infoflow-cmd-2.13.0-jar-with-dependencies.jar --help

['usage: soot-infoflow-cmd [OPTIONS]',
 ' -?,--help                                Print this help message',
 ' -a,--apkfile <arg>                       APK file to analyze',
 ' -aa,--aliasalgo <arg>                    Use the specified aliasing',
 '                                          algorithm (NONE, FLOWSENSITIVE,',
 '                                          PTSBASED, LAZY)',
 ' -ac,--additionalclasspath <arg>          Additional JAR file that shal be',
 '                                          put on the classpath',
 ' -af,--aliasflowins                       Use a flow-insensitive alias',
 '                                          analysis',
 ' -al,--aplength <arg>                     Maximum access path length',
 ' -c,--configfile <arg>                    Use the given configuration file',
 ' -ca,--callbackanalyzer <arg>             Use the specified callback',
 '                                          analyzer (DEFAULT, FAST)',
 ' -ce,--codeelimination <arg>          

The options related to timeouts are the following:

In [5]:
%%!
java -jar soot-infoflow-cmd-2.13.0-jar-with-dependencies.jar | grep timeout

[' -ct,--callbacktimeout <arg>              Timeout for the callback',
 ' -dt,--timeout <arg>                      Timeout for the main data flow',
 ' -rt,--resulttimeout <arg>                Timeout for the result']

You can also define timeouts:

- `-dt N`: Aborts the data flow analysis after N seconds and returns the results obtained so far.
- `-ct N`: Aborts the callback collection during callgraph construction after N seconds and continues with the (incomplete) callgraph constructed so far.


In **FlowDroid**, these options allow us to set timeouts for various phases of its static analysis process. Timeouts are used to ensure that the analysis doesn't hang indefinitely or take an excessively long time on complex code. Here's what each option does:

1. **`-ct`, `--callbacktimeout <arg>`**:
    Sets a timeout for callback collection. Callbacks (e.g., `onClick`, `onResume`) are entry points triggered by the Android framework, and FlowDroid collects them while it constructs the callgraph. If the collection takes longer than the timeout (as can happen in large or obfuscated apps), FlowDroid aborts it and continues with the incomplete callgraph constructed so far.

2. **`-dt`, `--timeout <arg>`**:
    Sets a timeout for the main data flow analysis, which tracks how sensitive information propagates from sources to sinks throughout the app. When the timeout expires, FlowDroid aborts the analysis and returns the results obtained so far, so a run cannot continue indefinitely on apps with very many data flows or overly complex dependencies.

3. **`-rt`, `--resulttimeout <arg>`**:
    Sets a timeout for result collection, the step that follows the data flow analysis and in which FlowDroid assembles the findings it reports (e.g., the identified leaks). If this step takes too long (e.g., due to the volume of findings), it is terminated when the timeout expires.

For all three options, `<arg>` is given in seconds.
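
As a minimal sketch, the three timeouts can be passed together when assembling the command line. The helper name `build_flowdroid_command` and the example APK and output paths are hypothetical; the flags are the ones that appear in the help output above and in the commands used elsewhere in this notebook.

```python
def build_flowdroid_command(jar, apk, platforms, sources_sinks, output,
                            data_flow_timeout=None, callback_timeout=None,
                            result_timeout=None):
    """Assemble a FlowDroid command line as an argument list.

    Hypothetical helper for illustration. All timeouts are in seconds;
    a timeout left as None is simply not passed.
    """
    cmd = ["java", "-jar", jar,
           "-a", apk, "-p", platforms, "-s", sources_sinks, "-o", output]
    for flag, seconds in (("-dt", data_flow_timeout),
                          ("-ct", callback_timeout),
                          ("-rt", result_timeout)):
        if seconds is not None:
            cmd += [flag, str(int(seconds))]
    return cmd


cmd = build_flowdroid_command(
    jar="soot-infoflow-cmd-2.13.0-jar-with-dependencies.jar",
    apk="./APKs/example.apk",        # hypothetical APK path
    platforms="./tools",
    sources_sinks="./SourcesAndSinks.txt",
    output="./outputs/example.xml",  # hypothetical output path
    data_flow_timeout=60,
    callback_timeout=60,
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)` as is, which avoids shell quoting issues with paths that contain spaces.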

## Identify Sources and Sinks
We decided to use FlowDroid's default list of sources and sinks, found in its repository.

In [6]:
%%!
head -n 5 SourcesAndSinks.txt

['<javax.servlet.ServletRequest: java.lang.String getParameter(java.lang.String)> -> _SOURCE_',
 '<javax.persistence.EntityManager: javax.persistence.TypedQuery createQuery(java.lang.String,java.lang.Class)> -> _SINK_',
 '<javax.servlet.http.HttpServletResponse: void sendRedirect(java.lang.String)> -> _SINK_',
 '<java.io.File: boolean delete()> -> _SINK_',
 '']
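
Each entry in the listing above has the form `<signature> -> _SOURCE_` or `<signature> -> _SINK_`. Below is a minimal sketch for reading such a file, assuming the entries of interest follow exactly that form; any other line (blank lines, or line types not shown in the excerpt) is skipped. The function name `parse_sources_and_sinks` is hypothetical.

```python
import re

# Matches lines such as "<cls: ret name(args)> -> _SOURCE_".
ENTRY = re.compile(r"^(<.+>)\s*->\s*_(SOURCE|SINK)_\s*$")


def parse_sources_and_sinks(lines):
    """Split entries into (sources, sinks); lines that do not match are skipped."""
    sources, sinks = [], []
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        signature, kind = match.groups()
        (sources if kind == "SOURCE" else sinks).append(signature)
    return sources, sinks


sample = [
    "<javax.servlet.ServletRequest: java.lang.String getParameter(java.lang.String)> -> _SOURCE_",
    "<java.io.File: boolean delete()> -> _SINK_",
    "",
]
sources, sinks = parse_sources_and_sinks(sample)
print(len(sources), "source(s),", len(sinks), "sink(s)")
```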

## Run FlowDroid Analysis
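For each APK, perform three separate analyses by setting different timeout values: 1 minute, 5 minutes, and 20 minutes. Each APK should thus be analyzed three times.

The following script was used to do this. Note that its `TIMEOUTS` values (60, 300, 1200) are seconds, the unit the timeout options expect, despite the comment in the listing.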

In [11]:
%%!
batcat flowdroid_v1.bash

['#!/bin/bash',
 '',
 '# Paths (Modify these paths according to your environment)',
 'FLOWDROID_JAR="soot-infoflow-cmd-2.13.0-jar-with-dependencies.jar"',
 'PLATFORMS_DIR="./tools"',
 'SOURCES_AND_SINKS="./SourcesAndSinks.txt"',
 'APK_DIR="./APKs"',
 '',
 '# Timeout settings in minutes',
 'TIMEOUTS=(60 300 1200)',
 '',
 '# Create an array of APK files in the APK_DIR',
 'APK_FILES=("$APK_DIR"/*.apk)',
 '',
 '# Loop over each APK file',
 'for APK_PATH in "${APK_FILES[@]}"; do',
 '    # Extract the APK filename without the directory path',
 '    APK_FILENAME=$(basename "$APK_PATH")',
 '    # Remove the .apk extension to get the base name',
 '    APK_NAME="${APK_FILENAME%.apk}"',
 '',
 '    # Loop over each timeout setting',
 '    for TIMEOUT in "${TIMEOUTS[@]}"; do',
 '        # Output file name',
 '        OUTPUT_FILE="./outputs/${APK_NAME}-${TIMEOUT}min.xml"',
 '',
 '        echo "Analyzing ${APK_FILENAME} with a timeout of ${TIMEOUT} minute(s)..."',
 '',
 '        # Run FlowDroid analy
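
The timeout options take seconds, while the runs are described in minutes (1, 5, and 20). Below is a minimal sketch of how the runs could be planned with that conversion made explicit. The function `plan_runs`, the example APK path, and the output naming `<app>-<N>min.xml` are assumptions for illustration, not a transcription of the truncated script above.

```python
from pathlib import Path

TIMEOUTS_MINUTES = (1, 5, 20)


def plan_runs(apk_paths, timeouts_minutes=TIMEOUTS_MINUTES, output_dir="./outputs"):
    """Return one (apk, timeout_seconds, output_file) tuple per APK and timeout."""
    runs = []
    for apk in apk_paths:
        app_name = Path(apk).stem  # file name without directory and ".apk"
        for minutes in timeouts_minutes:
            output_file = f"{output_dir}/{app_name}-{minutes}min.xml"
            runs.append((str(apk), minutes * 60, output_file))
    return runs


# Example APK path, for illustration only.
for apk, seconds, output_file in plan_runs(["./APKs/com.tado.apk"]):
    print(f"{apk}: -dt {seconds} -> {output_file}")
```

Keeping the minute value for the file name and the second value for the flag in one place avoids labelling a 60-second run as "60min".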

In [None]:
%%!
java -jar soot-infoflow-cmd-2.10.0-jar-with-dependencies.jar \
-a /home/yacine/Art/SDSA_Project/Project/APKs/com.hawaiianairlines.app.apk \
-o output.xml \
-p /home/yacine/Android/Sdk/platforms/ -s ./SourcesAndSinks.txt --timeout 10

The run prints the following warning, so it might be worth trying the option it suggests:

[main] WARN soot.dexpler.DexFileProvider - Multiple dex files detected, only processing 'classes.dex'. Use '-process-multiple-dex' option to process them all.

We had a problem analyzing the tado app (`com.tado`).
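
The multiple-dex warning appears when an APK contains more than one dex file. Since an APK is a ZIP archive, a quick way to see which APKs are affected is to list their `classes*.dex` entries. Below is a minimal sketch; the helper name `list_dex_files` is hypothetical, and the demonstration builds a tiny in-memory archive instead of reading a real APK.

```python
import io
import re
import zipfile

# Top-level dex entries: classes.dex, classes2.dex, classes3.dex, ...
DEX_NAME = re.compile(r"^classes\d*\.dex$")


def list_dex_files(apk):
    """Return the top-level classes*.dex entries of an APK (path or file object)."""
    with zipfile.ZipFile(apk) as archive:
        return sorted(name for name in archive.namelist() if DEX_NAME.match(name))


# Demonstration on a fake in-memory "APK" with two dex entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("classes.dex", b"")
    archive.writestr("classes2.dex", b"")
    archive.writestr("AndroidManifest.xml", b"")
buffer.seek(0)

dex_files = list_dex_files(buffer)
print(dex_files, "-> multi-dex" if len(dex_files) > 1 else "-> single dex")
```

Running `list_dex_files` over every file in the APK directory would show which analyses only covered `classes.dex`.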

In [17]:
%%bash
#!/bin/bash

# Output file for the table
output_file="leak_analysis_table.csv"

# Create an associative array to hold the data
declare -A table
declare -A times_set
apps_set=()

# Process each log file
for log_file in ./outputs/*.xml.log; do
    # Extract app name
    app_name=$(echo "$log_file" | awk -F '-' '{print $1}')
    
    # Extract execution time
    exec_time=$(echo "$log_file" | awk -F '-' '{print $2}' | sed 's/min.xml.log//')min
    
    # Extract the number of leaks: the second-to-last field of the log line containing "Found"
    leaks_found=$(grep "Found" "$log_file" | awk '{print $(NF-1)}')

    # Store the app name in the list of apps
    if [[ ! " ${apps_set[@]} " =~ " ${app_name} " ]]; then
        apps_set+=("$app_name")
    fi

    # Mark the time for the header
    times_set["$exec_time"]=1

    # Store the leaks found in the table
    table["$app_name,$exec_time"]=$leaks_found
done

# Create the header row
echo -n "apps" > $output_file
for time in $(echo "${!times_set[@]}" | tr ' ' '\n' | sort -n); do
    echo -n ",$time" >> $output_file
done
echo "" >> $output_file

# Populate the rows with app data
for app in "${apps_set[@]}"; do
    echo -n "$app" >> $output_file
    for time in $(echo "${!times_set[@]}" | tr ' ' '\n' | sort -n); do
        leaks=${table["$app,$time"]}
        if [[ -z "$leaks" ]]; then
            echo -n ",-" >> $output_file
        else
            echo -n ",$leaks" >> $output_file
        fi
    done
    echo "" >> $output_file
done

echo "Table created in $output_file"

Table created in leak_analysis_table.csv
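
The same extraction can be sketched in Python, which makes the two parsing steps easier to test in isolation. This assumes log names of the form `<app>-<N>min.xml.log` and a log line of the form `Found <N> leaks`, as the Bash cell does; the function names, the app name `com.example.app`, and the sample log contents are hypothetical.

```python
import re
from pathlib import Path

LOG_NAME = re.compile(r"^(?P<app>.+)-(?P<label>\d+min)\.xml\.log$")
LEAKS = re.compile(r"Found (\d+) leaks?")


def parse_log_name(log_path):
    """Return (app_name, timeout_label), e.g. ("com.tado", "5min")."""
    match = LOG_NAME.match(Path(log_path).name)
    if match is None:
        raise ValueError(f"unexpected log file name: {log_path}")
    return match["app"], match["label"]


def count_leaks(log_text):
    """Return the leak count from the last 'Found N leaks' line, or None if absent."""
    counts = LEAKS.findall(log_text)
    return int(counts[-1]) if counts else None


def build_table(logs):
    """Map {log_path: log_text} to {app: {label: leaks}}."""
    table = {}
    for log_path, log_text in logs.items():
        app, label = parse_log_name(log_path)
        table.setdefault(app, {})[label] = count_leaks(log_text)
    return table


# Hypothetical log contents, for illustration only.
logs = {
    "./outputs/com.example.app-1min.xml.log": "... - Found 2 leaks\n",
    "./outputs/com.example.app-5min.xml.log": "... - Found 3 leaks\n",
}
print(build_table(logs))
```

Unlike the Bash cell, this sketch drops the `./outputs/` prefix from the app name and tolerates app names that contain a hyphen.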


In [None]:
import pandas as pd

# Step 1: Load the CSV file generated by the Bash cell above
output_file = "leak_analysis_table.csv"
df = pd.read_csv(output_file)

# Step 2: Display the DataFrame in the notebook
df

| | apps | 1min | 5min | 20min |
|---|---|---|---|---|
| 0 | ./outputs/com.delhi.metro.dtc | 0 | 3 | 3 |
| 1 | ./outputs/com.hawaiianairlines.app | 6 | 7 | 0 |
| 2 | ./outputs/com.imo.android.imoim | 4 | 4 | 4 |
| 3 | ./outputs/com.tado | 5 | 5 | 15 |
| 4 | ./outputs/com.walkme.azores.new | 4 | 5 | 7 |
| 5 | ./outputs/com.wooxhome.smart | 0 | 0 | 0 |
| 6 | ./outputs/com.yourdelivery.pyszne | 2 | 2 | 3 |
| 7 | ./outputs/linko.home | 5 | 24 | 24 |
| 8 | ./outputs/mynt.app | 8 | 9 | 16 |
| 9 | ./outputs/nz.co.stuff.android.news | 14 | 17 | 26 |
