SQL injection is one of the oldest attacks on web applications, and it remains effective: applications were still being compromised through it in 2020. Developers are expected to take precautions such as parameterizing SQL queries and escaping special characters. However, developers, especially inexperienced ones, often fail to follow these guidelines. Quite a few SQL injection detection tools exist to expose unaddressed SQL injection vulnerabilities in source code. However, to the best of our knowledge, very little work has been done on suggesting fixes for these vulnerabilities in the source code. We have developed a learning-based approach that builds abstractions of SQL-injection-vulnerable code from a training dataset and groups them using hierarchical clustering. Each test sample is matched with a cluster of similar samples, and a fix suggestion is generated from that cluster. To evaluate our language-agnostic approach, we have built manually validated training and test datasets from real-world Java and PHP projects. The results establish the superiority of our technique over comparable techniques. The code and dataset are released publicly to encourage reproduction.
Outputs generated by our model are given in the Output folder. In each output, "Target" is the selected code segment from the input Java file. It is followed by one or more suggestions named "suggestionX", depending on the number of suggestions found in the training dataset.
The Sample Output folder contains a few output files of various types.
We have two types of Train Data: Manual and Synthetic. In both cases, 'Before' contains the Java functions with vulnerable SQL queries, and 'After' contains the same functions after the queries are replaced with PreparedStatement. The Test Set contains only 'Before', as we are going to generate solutions for those samples.
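To make the 'Before' / 'After' pairing concrete, here is a minimal sketch of what such a pair could look like. It is an illustration only and is not taken from the dataset: the class name, the method names, and the `users` table are all hypothetical. The 'Before' function concatenates user input into the query text, and the 'After' function binds the same input through a PreparedStatement placeholder.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical example; names and schema are assumptions, not dataset content.
public class BeforeAfterExample {

    // Query text that the 'Before' function sends: user input is concatenated in.
    public static String vulnerableQuery(String userName) {
        return "SELECT id FROM users WHERE name = '" + userName + "'";
    }

    // 'Before': vulnerable, because the input becomes part of the SQL text.
    public static boolean userExistsBefore(Connection conn, String userName)
            throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(vulnerableQuery(userName))) {
            return rs.next();
        }
    }

    // Query text that the 'After' function sends: fixed text with a placeholder.
    public static final String SAFE_QUERY = "SELECT id FROM users WHERE name = ?";

    // 'After': the input is bound as a parameter and never parsed as SQL.
    public static boolean userExistsAfter(Connection conn, String userName)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SAFE_QUERY)) {
            ps.setString(1, userName);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }

    public static void main(String[] args) {
        // Shows how a crafted input changes the structure of the 'Before' query.
        System.out.println(vulnerableQuery("x' OR '1'='1"));
    }
}
```

Running `main` needs no database. It only prints the 'Before' query text for a crafted input, which shows the input altering the WHERE clause. The two `userExists...` methods need a JDBC connection supplied by the caller.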