# Data Exploration JSON Parser
The goal of this notebook is to flatten the .apk.json files included in the MalDroid dataset, which contain the CopperDroid analysis of Android APKs, and extract their data into .csv files. This notebook focuses on the data under the dynamic:host: headers of the JSON file. Doing so makes it easier to visualize and analyze the hundreds of thousands of rows of system calls.
## Objectives
1. Isolate the data under the 'dynamic' header, whose 'host' entry holds a JSON array of objects
2. Create separate JSON files of objects with the same "class" attribute
3. Record a nested dictionary of common attributes associated with each "class" attribute
4. Flatten "class"-separated JSON files into their own CSVs for analysis

In [1]:
import pandas as pd
import json
import ast

# The handle is a file object, not a path, so name it accordingly
with open('075049984D2937039DDE452818BC6B844C8C8CD17232DB8D951306F02234B2EA/sample_for_analysis.apk.json') as json_file:
    full_json = json.load(json_file)

# Objective 1

In [2]:
isolated_json = full_json['behaviors']['dynamic']

# Objective 2
A list of all class names is compiled, and a dictionary mapping each class name to its objects is built. Each dictionary entry is then exported to its own JSON file, and a second dictionary recording the relative file path for each class is exported for reference.

In [3]:
class_list = []
class_dict = {}

for item in isolated_json['host']:
    if not isinstance(item, dict):
        print("item not of type dict")
        break
    item_class = item['class']
    if item_class not in class_list:
        class_list.append(item_class)
        class_dict[item_class] = [item]
    else:
        class_dict[item_class].append(item)
        
file_path_dict = {}

for class_type in class_dict.keys():
    # Wrap each class's objects under a single 'content' key
    holder_dict = {'content': class_dict[class_type]}
    path = 'data_exploration/' + class_type + '.json'
    file_path_dict[class_type] = path
    with open(path, 'w') as write_json:
        json.dump(holder_dict, write_json)

# Record where each class file was written, for reference
with open('data_exploration/relative_class_file_paths.json', 'w') as write_json:
    json.dump(file_path_dict, write_json)
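
The grouping step can also be sketched on toy data. The records below are invented for illustration (the real entries come from the 'host' array), and `group_by_class` is a hypothetical helper name. It uses `collections.defaultdict`, so no separate list of class names is needed, and it raises on a non-dict entry rather than stopping silently.

```python
from collections import defaultdict

# Toy stand-in for isolated_json['host']; names and values are invented.
toy_host = [
    {"class": "class_a", "tid": 1},
    {"class": "class_b", "tid": 2},
    {"class": "class_a", "tid": 3},
]

def group_by_class(records):
    """Group dict records by their 'class' value, keeping first-seen order."""
    grouped = defaultdict(list)
    for record in records:
        if not isinstance(record, dict):
            raise TypeError(f"expected dict, got {type(record).__name__}")
        grouped[record["class"]].append(record)
    return dict(grouped)

toy_groups = group_by_class(toy_host)
print(list(toy_groups))              # ['class_a', 'class_b']
print(len(toy_groups["class_a"]))    # 2
```

Because plain dictionaries preserve insertion order, the keys of the result double as the ordered list of class names.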

# Objective 3
Objects within the same class can vary in structure, and these variations need to be recorded and accounted for. For each class, a list of the distinct structures found in that class is kept. Within a structure, each attribute is a key whose value is either the attribute's dtype or, for a nested attribute, a dictionary of its sub-attributes.
The end result is a JSON object keyed by class type, where each value is the list of dictionaries describing every structure observed in that class.
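
To make the idea of a "structure" concrete, here is a minimal sketch on invented records. `structure_of` is a hypothetical, simplified helper: it only descends into nested dictionaries and treats lists as plain values, so it illustrates the shape of the result rather than reproducing the notebook's full parser.

```python
def structure_of(value):
    """Describe a value: nested dicts are kept, everything else becomes a type name."""
    if isinstance(value, dict):
        return {key: structure_of(sub) for key, sub in value.items()}
    return str(type(value))

# Invented records that share one class but not one structure.
toy_items = [
    {"class": "class_a", "tid": 7, "args": {"path": "/tmp/x"}},
    {"class": "class_a", "tid": 9, "args": {"path": "/tmp/y"}},
    {"class": "class_a", "tid": 9},
]

# Keep each distinct structure once, in first-seen order.
structures = []
for item in toy_items:
    shape = structure_of(item)
    if shape not in structures:
        structures.append(shape)

print(len(structures))  # 2
print(structures[0])
# {'class': "<class 'str'>", 'tid': "<class 'int'>", 'args': {'path': "<class 'str'>"}}
```

The first two records differ only in their values, so they collapse into a single structure; the third lacks `args` and is recorded as a second one.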

In [4]:
class_attributes_dict = {}

def dictparse(item):
    # A bit messy, but seems to be accurate. Works around some strange formatting,
    # long dtypes, and unicode prefixes left over from Python 2 style reprs.
    attribute_dict = {}
    for key in item.keys():
        attribute = item[key]
        if key == 'blob' and type(attribute) is str:
            if '{' in attribute:
                # Strip Python 2 long suffixes (1L) and unicode prefixes (u'...')
                attribute = attribute.replace("L,", ",")
                attribute = attribute.replace("L}", "}")
                attribute = attribute.replace("u'", "'")
                try:
                    attribute = ast.literal_eval(attribute)
                except (ValueError, TypeError, SyntaxError):
                    # Leave the blob as a string and show the offending item
                    print(item)
        if type(attribute) is dict:
            attribute_dict[key] = dictparse(attribute)
        elif type(attribute) is list:
            # WARNING: This does not account for n-dimensional lists
            # Default covers empty lists, which would otherwise be skipped
            attribute_dict[key] = str(type(attribute))
            for entry in attribute:
                if type(entry) is dict:
                    attribute_dict[key] = dictparse(entry)
                else:
                    attribute_dict[key] = str(type(attribute))
        else:
            attribute_dict[key] = str(type(attribute))
    return attribute_dict

for class_type in class_dict.keys():
    class_attributes_dict[class_type] = []
    for item in class_dict[class_type]:
        item_dict = dictparse(item)
        if item_dict not in class_attributes_dict[class_type]:
            class_attributes_dict[class_type].append(item_dict)

with open('data_exploration/class_attributes.json', 'w') as write_json:
    json.dump(class_attributes_dict, write_json)
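
The 'blob' workaround in the cell above can be sketched in isolation. This assumes the blobs are Python 2 style dictionary reprs, which is what the `L` and `u'` replacements suggest. The sample string is invented and `parse_blob` is a hypothetical helper. A regular expression removes the long-integer suffix wherever it follows digits, including before `]`. The `u'` prefix is left alone because Python 3's `ast.literal_eval` accepts it.

```python
import ast
import re

# Matches a Python 2 long-integer literal such as "1L" or "42L".
LONG_SUFFIX = re.compile(r"\b(\d+)L\b")

def parse_blob(text):
    """Parse a Python 2 style dict repr; return None if it cannot be parsed."""
    cleaned = LONG_SUFFIX.sub(r"\1", text)
    try:
        return ast.literal_eval(cleaned)
    except (ValueError, SyntaxError):
        return None

toy_blob = "{u'flags': 1L, u'name': u'wlan0', u'ids': [2L, 3L]}"
print(parse_blob(toy_blob))   # {'flags': 1, 'name': 'wlan0', 'ids': [2, 3]}
print(parse_blob("{broken"))  # None
```

One caveat: the pattern also rewrites digit-plus-`L` sequences that appear inside quoted string values, so it is only a reasonable fit when such strings are not expected in the data.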

# Objective 4
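The notebook does not include the flattening step itself, so the following is only a sketch of one possible approach, using invented records and only the standard library. `flatten` is a hypothetical helper that joins nested keys with a dot, and an in-memory buffer stands in for the output CSV file.

```python
import csv
import io

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with '.'."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

# Invented records standing in for the 'content' list of one class file.
toy_records = [
    {"class": "class_a", "tid": 7, "args": {"path": "/tmp/x", "mode": 1}},
    {"class": "class_a", "tid": 9},
]

rows = [flatten(record) for record in toy_records]
# Union of all column names, in first-seen order.
columns = list(dict.fromkeys(key for row in rows for key in row))

buffer = io.StringIO()  # stands in for an open CSV file
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
# class,tid,args.path,args.mode
# class_a,7,/tmp/x,1
# class_a,9,,
```

Records that lack an attribute get an empty cell, which keeps every structure variant of a class in one table. Since the notebook already imports pandas, `pandas.json_normalize` followed by `DataFrame.to_csv` would likely give a similar dot-separated layout. Lists nested inside records are not expanded by this sketch.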