# Experimenting with Python Clang Bindings

The goal of this notebook is to load the Python Clang bindings and the Juliet dataset, then extract an AST from some of Juliet's code snippets and see whether we can make sense of any of it.

## Setup
You need Clang and LLVM installed.

Then make sure the Clang Python bindings are on your Python path. In practice, run the next cell, and if it fails:
  1. Download the Clang source code from this location: http://releases.llvm.org/download.html
     - Note: The bindings should match your installed Clang version. Check it by running `clang --version` in your shell, then download the Clang source code for that version from the page linked above. I had to download 7.0.1.
  2. Extract the source to a known location. I chose "/home/dan/masters-cyber-security/project/clang-src/".
  3. Open `~/.bashrc` in a text editor and add the following line at the end:
 `export PYTHONPATH=/home/dan/masters-cyber-security/project/clang-src/:$PYTHONPATH`
     - Note: The directory you add must directly contain the `clang` package. In a stock Clang source tree that package lives under `bindings/python`, so adjust the path if the import still fails.

  4. Restart Jupyter Notebook and all Python sessions.

Hopefully it will work then.
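
Before restarting everything, a quick check like the one below can tell you whether Python can locate the bindings. This is a minimal sketch: `bindings_available` is a hypothetical helper name, not part of Clang, and it only checks that the module can be found, not that libclang itself loads.

```python
import importlib.util
import sys


def bindings_available(module_name="clang.cindex", extra_path=None):
    """Return True if `module_name` can be located on the Python path.

    `extra_path` (optional) is appended to sys.path first, which is roughly
    what a PYTHONPATH entry does at interpreter start-up. The change to
    sys.path persists for the rest of the session.
    """
    if extra_path and extra_path not in sys.path:
        sys.path.append(extra_path)
    try:
        # find_spec imports the parent package ("clang") to locate the
        # submodule, so a missing parent surfaces as ModuleNotFoundError.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


print("clang.cindex importable:", bindings_available())
```

If this prints `False`, passing the directory from step 2 as `extra_path` is a quick way to test a candidate location without editing `~/.bashrc`.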

In [None]:
import clang.cindex

In [2]:
import os
import pandas as pd

In [None]:
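# Only needed if libclang is not found automatically (LIBRARY_PATH = directory containing libclang).
library_path = os.environ.get('LIBRARY_PATH')
if library_path:
    clang.cindex.Config.set_library_path(library_path)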

Load the Juliet dataset and pick the first data point as an example.

In [6]:
juliet = pd.read_csv("../data/juliet.csv.zip")

In [10]:
example = juliet.iloc[0]
example

Unnamed: 0                                                     0
testcase_ID                                                61940
filename       000/061/940/CWE114_Process_Control__w32_char_c...
code           /* TEMPLATE GENERATED TESTCASE FILE\nFilename:...
flaw                                                     CWE-114
flaw_loc                                                     121
CWE-015                                                    False
CWE-023                                                    False
CWE-036                                                    False
CWE-078                                                    False
CWE-090                                                    False
CWE-114                                                     True
CWE-121                                                    False
CWE-122                                                    False
CWE-123                                                    False
CWE-124                  

In [27]:
print(example.code)

/* TEMPLATE GENERATED TESTCASE FILE
Filename: CWE114_Process_Control__w32_char_connect_socket_01.c
Label Definition File: CWE114_Process_Control__w32.label.xml
Template File: sources-sink-01.tmpl.c
*/
/*
 * @description
 * CWE: 114 Process Control
 * BadSource: connect_socket Read data using a connect socket (client side)
 * GoodSource: Hard code the full pathname to the library
 * Sink:
 *    BadSink : Load a dynamic link library
 * Flow Variant: 01 Baseline
 *
 * */

#include "std_testcase.h"

#include <wchar.h>

#ifdef _WIN32
#include <winsock2.h>
#include <windows.h>
#include <direct.h>
#pragma comment(lib, "ws2_32") /* include ws2_32.lib when linking */
#define CLOSE_SOCKET closesocket
#else /* NOT _WIN32 */
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>
#define INVALID_SOCKET -1
#define SOCKET_ERROR -1
#define CLOSE_SOCKET close
#define SOCKET int
#endif

#define TCP_PORT 27015
#define IP_ADDRESS "127.0.0.1"


#if

Instantiate the clang parser and give it our example. We use `unsaved_files` to tell it to parse a file that doesn't actually exist on disk.

In [15]:
index = clang.cindex.Index.create()
translation_unit = index.parse(path=example.filename, unsaved_files=[(example.filename, example.code)])

In [17]:
translation_unit

<clang.cindex.TranslationUnit at 0x7f897d498cf8>
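
Getting a `TranslationUnit` back does not mean the code parsed cleanly. libclang usually still returns one when the source has problems, such as an `#include` it cannot resolve, and reports them as diagnostics. The sketch below summarizes such diagnostics. It assumes each item exposes `severity`, `spelling` and `location.line`, as `clang.cindex.Diagnostic` does; `summarize_diagnostics` and the severity labels are illustrative names, and the stand-in objects at the bottom are made up so the snippet runs without libclang.

```python
from types import SimpleNamespace

# Severity values as defined on clang.cindex.Diagnostic
# (Ignored=0, Note=1, Warning=2, Error=3, Fatal=4).
SEVERITY_NAMES = {0: "ignored", 1: "note", 2: "warning", 3: "error", 4: "fatal"}


def summarize_diagnostics(diagnostics, min_severity=2):
    """Return 'severity:line: message' strings for diagnostics at or above min_severity."""
    lines = []
    for diag in diagnostics:
        if diag.severity >= min_severity:
            label = SEVERITY_NAMES.get(diag.severity, str(diag.severity))
            lines.append(f"{label}:{diag.location.line}: {diag.spelling}")
    return lines


# Stand-in diagnostics (made up for illustration); in the notebook you would
# pass translation_unit.diagnostics instead.
fake_diagnostics = [
    SimpleNamespace(severity=4, spelling="'std_testcase.h' file not found",
                    location=SimpleNamespace(line=17)),
    SimpleNamespace(severity=1, spelling="some note",
                    location=SimpleNamespace(line=3)),
]
for line in summarize_diagnostics(fake_diagnostics):
    print(line)
```

In the notebook, `summarize_diagnostics(translation_unit.diagnostics)` would show whether headers such as `std_testcase.h` were found, which matters because missing headers can leave parts of the AST incomplete.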

`root` is the root node of the AST. Try exploring it to figure out what it all means. It's pretty dense.

In [18]:
root = translation_unit.cursor

In [21]:
dir(root)

['__class__',
 '__ctypes_from_outparam__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_b_base_',
 '_b_needsfree_',
 '_fields_',
 '_kind_id',
 '_objects',
 '_tu',
 'access_specifier',
 'availability',
 'brief_comment',
 'canonical',
 'data',
 'displayname',
 'enum_type',
 'enum_value',
 'exception_specification_kind',
 'extent',
 'from_cursor_result',
 'from_location',
 'from_result',
 'get_arguments',
 'get_bitfield_width',
 'get_children',
 'get_definition',
 'get_field_offsetof',
 'get_included_file',
 'get_num_template_arguments',
 'get_template_argument_kind',
 'get_template_argument_type',
 'get_template_argument_unsigned_value',
 'get_template_argument_v
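
Most of that listing is Python and ctypes machinery. One way to make it less dense is to drop every name that starts with an underscore. This is a small sketch: `public_names` and the `Demo` class are hypothetical, and `Demo` exists only so the snippet runs without libclang.

```python
def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Demo:
    """Stand-in object for illustration; in the notebook use `root` instead."""

    kind = "x"
    _hidden = 1

    def get_children(self):
        return []


print(public_names(Demo()))
```

Calling `public_names(root)` in the notebook should leave only the cursor's own properties and methods, such as `kind`, `extent` and `get_children`.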

In [33]:
children = list(root.get_children())

In [40]:
children[0].kind

CursorKind.TYPEDEF_DECL
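
Inspecting children one at a time gets tedious, so it can help to print the tree with indentation. The sketch below assumes only that each node has `kind`, `spelling` and `get_children()`, which `clang.cindex.Cursor` provides. `dump_ast` and the `FakeCursor` stand-in are hypothetical names, and the small tree is made up so the example runs without libclang.

```python
class FakeCursor:
    """Tiny stand-in for clang.cindex.Cursor, for illustration only."""

    def __init__(self, kind, spelling, children=()):
        self.kind = kind
        self.spelling = spelling
        self._children = list(children)

    def get_children(self):
        return iter(self._children)


def dump_ast(node, depth=0, max_depth=3):
    """Return indented 'kind spelling' lines for node and its descendants."""
    # rstrip() drops the trailing space left behind by an empty spelling.
    lines = [f"{'  ' * depth}{node.kind} {node.spelling}".rstrip()]
    if depth < max_depth:
        for child in node.get_children():
            lines.extend(dump_ast(child, depth + 1, max_depth))
    return lines


# Made-up tree; in the notebook, pass `root` instead.
tree = FakeCursor("TRANSLATION_UNIT", "example.c", [
    FakeCursor("TYPEDEF_DECL", "size_t"),
    FakeCursor("FUNCTION_DECL", "main", [FakeCursor("COMPOUND_STMT", "")]),
])
print("\n".join(dump_ast(tree)))
```

In the notebook, `print("\n".join(dump_ast(root, max_depth=2)))` would do the same for the real tree. Keeping `max_depth` small is a deliberate choice here, since a full translation unit can contain a very large number of nodes.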