# Visualize Directory
- Follows the approach described at "http://www.austintaylor.io/d3/python/pandas/2016/02/01/create-d3-chart-python-force-directed/"

## The Network Structure
- The network is a dictionary with two lists, `nodes` and `links`.
- `nodes` contains one entry per file or directory.
- `links` contains the relationships between nodes; `source` and `target` are zero-based positions in `nodes`.

```json
{
  "nodes":  [
    { "name": "desktop", "group":  1},
    { "name": "desktop/apples.txt", "group":  1},
    { "name": "desktop/pineapple/apples.txt", "group":  1},
    { "name": "desktop/bananas.txt", "group":  1}
  ],

  "links":  [
    { "source":  1,  "target":  0,  "value":  5555 },
    { "source":  2,  "target":  0,  "value":  1 },
    { "source":  3,  "target":  0,  "value": 1 }
  ]
}
```
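The same structure can be written as a plain Python dictionary. The sketch below is illustrative only: the sample entries are made up, and `check_links` is a hypothetical helper (not part of this notebook) that reports links pointing at positions outside `nodes`.

```python
# Toy version of the structure above; keys mirror the JSON example.
network = {
    "nodes": [
        {"name": "desktop", "group": 1},
        {"name": "desktop/apples.txt", "group": 1},
        {"name": "desktop/bananas.txt", "group": 1},
    ],
    "links": [
        {"source": 1, "target": 0, "value": 1},
        {"source": 2, "target": 0, "value": 1},
    ],
}


def check_links(graph):
    """Return the links whose source or target is not a valid node position."""
    node_count = len(graph["nodes"])
    return [
        link for link in graph["links"]
        if not (0 <= link["source"] < node_count
                and 0 <= link["target"] < node_count)
    ]


bad_links = check_links(network)
```

An empty `bad_links` list means every link refers to a node that exists.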

## Setup

### Modules

In [1]:
import os
import pandas
import json

### Set path of directory you wish to visualize

In [2]:
path = '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/'
export_path = "/users/danielcorcoran/desktop/github_repos/python_nb_networks/json/"

### Helper functions to get size of directory/files, define node sizes

In [3]:
def get_directory_size(directory_path = ""):
    
    directory_size = 0
    
    for dirpath, dirnames, filenames in os.walk(directory_path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            directory_size += os.path.getsize(fp)
            
    return directory_size

In [4]:
def get_file_size(file_path = ""):
    
    file_size = os.path.getsize(file_path)
    
    return file_size

In [5]:
def get_size_bracket(file_size_megabytes):
    
    if file_size_megabytes <= 1:
        node_size = 3
    elif file_size_megabytes <= 10:
        node_size = 4
    elif file_size_megabytes <= 100:
        node_size = 4.5
    elif file_size_megabytes <= 1000:
        node_size = 5
    elif file_size_megabytes <= 10000:
        node_size = 6
    else:
        node_size = 10
        
    return node_size
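
The bracket thresholds can also be kept in a lookup table, which may be easier to adjust than a chain of `elif` branches. This is only a sketch of an alternative: `get_size_bracket_table`, `UPPER_BOUNDS_MB` and `NODE_SIZES` are hypothetical names, and the values are copied from `get_size_bracket`.

```python
import bisect

# Upper bounds (in megabytes) and the node size for each bracket,
# copied from get_size_bracket. The last size covers everything larger.
UPPER_BOUNDS_MB = [1, 10, 100, 1000, 10000]
NODE_SIZES = [3, 4, 4.5, 5, 6, 10]


def get_size_bracket_table(file_size_megabytes):
    """Table-driven equivalent of get_size_bracket (hypothetical name)."""
    # bisect_left returns the position of the first bound >= the size,
    # which matches the chain of "<=" comparisons.
    position = bisect.bisect_left(UPPER_BOUNDS_MB, file_size_megabytes)
    return NODE_SIZES[position]
```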
        

### Set group node option

In [6]:
set_groups_to_file_types = True

## Collect Data

### Create list to store all the absolute paths within the path directory, this will be used to branch out relationships

In [7]:
absolute_paths = []

In [8]:
for dirpath, dirnames, filenames in os.walk(path):

    # Record directories first, then files, as absolute paths.
    # os.path.join avoids doubled separators, and leaving the names
    # untouched keeps the paths valid for the size lookups later on.
    for name in dirnames + filenames:
        absolute_paths.append(os.path.join(dirpath, name))

### Store data in pandas dataframe

In [9]:
data = pandas.DataFrame(absolute_paths)
data.rename({0: "absolute_path"}, axis=1, inplace=True)
data.head(15)

| | absolute_path |
|---|---|
| 0 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 1 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 2 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 3 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 4 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 5 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 6 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 7 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 8 | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 9 | /Users/danielcorcoran/desktop/github_repos/pyt... |


### Create two columns, destination and source

In [10]:
for index in range(data.shape[0]):
    item = data.iloc[index, 0]
    split = item.split("/")

    # The source is the parent directory; the destination is the path itself.
    source = "/".join(split[:-1])
    destination = "/".join(split)

    data.loc[index, "destination"] = destination
    data.loc[index, "source"] = source
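
Because the source of each entry is simply its parent directory, the same pairs can be derived with `os.path.dirname` instead of splitting on "/". The sketch below uses made-up sample paths and plain lists; it is an illustration, not the notebook's method.

```python
import os

# Hypothetical sample paths standing in for the notebook's absolute_paths.
sample_paths = ["/tmp/demo/data", "/tmp/demo/data/roads.shp"]

# The destination is the path itself; the source is its parent directory.
destinations = list(sample_paths)
sources = [os.path.dirname(p) for p in sample_paths]

# With a DataFrame, the same idea could be written as:
# data["source"] = data["absolute_path"].map(os.path.dirname)
```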

In [11]:
data.head()

| | absolute_path | destination | source |
|---|---|---|---|
| 0 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 1 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 2 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 3 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 4 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |


In [12]:
data

| | absolute_path | destination | source |
|---|---|---|---|
| 0 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 1 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 2 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 3 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 4 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 5 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 6 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 7 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 8 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |
| 9 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... |


## Calculate size of each file/directory 

In [13]:
for index in range(data.shape[0]):
    item = data.loc[index, "absolute_path"]
    last_item = item.split("/")[-1]
    
    if last_item.startswith(".") == False and "." in last_item:
        print(last_item, "is a filename")
        size = get_file_size(item)
    else:
        print(last_item, "is a directory")
        size = get_directory_size(item)
        
    data.loc[index, "size_bytes"] = size
    
data["size_megabytes"] = data["size_bytes"]/1000000

drivers is a directory
.ipynb_checkpoints is a directory
.git is a directory
data is a directory
notebook_calculate_closest_street_pool.ipynb is a filename
.DS_Store is a directory
notebook_reverse_geocode.ipynb is a filename
README.md is a filename
notebook_calculate_closest_street.ipynb is a filename
geckodriver.log is a filename
.gitignore is a directory
notebook_spatial_joins.ipynb is a filename
gnaf_tests.ipynb is a filename
geckodriver is a directory
chromedriver is a directory
notebook_reverse_geocode-checkpoint.ipynb is a filename
notebook_spatial_joins-checkpoint.ipynb is a filename
objects is a directory
info is a directory
logs is a directory
hooks is a directory
refs is a directory
config is a directory
HEAD is a directory
description is a directory
index is a directory
packed-refs is a directory
COMMIT_EDITMSG is a directory
3e is a directory
50 is a directory
6f is a directory
9b is a directory
32 is a directory
51 is a directory
94 is a directory
02 is a directory
a4 is 

In [14]:
data.loc[25, "absolute_path"]

'/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.git/index'
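
The output above reports entries such as `.DS_Store`, `.gitignore`, `HEAD` and `index` as directories, although these are normally files: the check only looks for a dot inside the name. For such paths `get_directory_size` has nothing to walk, so their size is recorded as 0. A minimal sketch of an alternative, assuming a hypothetical helper `get_path_size` that asks the filesystem instead of inspecting the name:

```python
import os
import tempfile


def get_path_size(target_path):
    """Return the size in bytes of a file, or the total of a directory tree.

    Hypothetical helper: it uses os.path.isdir rather than the file name
    to decide whether the path is a directory.
    """
    if os.path.isdir(target_path):
        total = 0
        for dirpath, dirnames, filenames in os.walk(target_path):
            for filename in filenames:
                total += os.path.getsize(os.path.join(dirpath, filename))
        return total
    return os.path.getsize(target_path)


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    head_file = os.path.join(demo_dir, "HEAD")  # extensionless file
    with open(head_file, "w") as handle:
        handle.write("ref: refs/heads/master\n")
    head_size = get_path_size(head_file)
    demo_dir_size = get_path_size(demo_dir)
```

In the demonstration the extensionless `HEAD` file gets its real size, and the directory total equals it because the directory contains only that file.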

In [15]:
data.head()

| | absolute_path | destination | source | size_bytes | size_megabytes |
|---|---|---|---|---|---|
| 0 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 16498524.0 | 16.498524 |
| 1 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 68776.0 | 0.068776 |
| 2 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 103168925.0 | 103.168925 |
| 3 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 136112411.0 | 136.112411 |
| 4 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 150444.0 | 0.150444 |


### Calculate size multipliers

In [16]:
for index in range(data.shape[0]):
    
    item = data.loc[index, "size_megabytes"]
    
    node_size = get_size_bracket(item)
    
    data.loc[index, "node_size"] = node_size

data.head(5)

| | absolute_path | destination | source | size_bytes | size_megabytes | node_size |
|---|---|---|---|---|---|---|
| 0 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 16498524.0 | 16.498524 | 4.5 |
| 1 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 68776.0 | 0.068776 | 3.0 |
| 2 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 103168925.0 | 103.168925 | 5.0 |
| 3 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 136112411.0 | 136.112411 | 5.0 |
| 4 | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | /Users/danielcorcoran/desktop/github_repos/pyt... | 150444.0 | 0.150444 | 3.0 |


### Create groups based on file type (optional)

In [17]:
for index in range(data.shape[0]):
    
    absolute_path = data.loc[index, "absolute_path"]
    
    last_item = absolute_path.split("/")[-1] 
    
    if "." in last_item:
        data.loc[index, "file_extension"] = "." + last_item.split(".")[-1]
    else:
        data.loc[index, "file_extension"] = "folder"

In [18]:
unique_extensions = list(data["file_extension"].unique())
unique_extensions

['folder',
 '.ipynb_checkpoints',
 '.git',
 '.ipynb',
 '.DS_Store',
 '.md',
 '.log',
 '.gitignore',
 '.sample',
 '.shx',
 '.xml',
 '.shp',
 '.dbf',
 '.prj']
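
The list above includes `.ipynb_checkpoints` and `.git`, which are directories, and whole dotfile names such as `.gitignore`, because any name containing a dot is treated as having an extension. A possible refinement, sketched with a hypothetical helper `classify_entry` and the made-up label "no extension", is to combine a directory flag with `os.path.splitext`:

```python
import os


def classify_entry(name, is_directory):
    """Return a group label for a file or directory name (hypothetical helper)."""
    if is_directory:
        return "folder"
    extension = os.path.splitext(name)[1]
    # splitext treats a leading dot as part of the name, so ".gitignore"
    # and "HEAD" both come back with an empty extension.
    return extension if extension else "no extension"


labels = [
    classify_entry(".git", True),
    classify_entry(".gitignore", False),
    classify_entry("README.md", False),
    classify_entry("roads.shp.xml", False),
]
```

In the notebook the directory flag could come from `os.path.isdir(absolute_path)`.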

### Create a list containing only destinations, this will be used to build the nodes list as part of the main dictionary

In [19]:
destination_list = list(data["destination"])
destination_list

['/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/drivers',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.ipynb_checkpoints',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.git',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/data',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/notebook_calculate_closest_street_pool.ipynb',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.DS_Store',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/notebook_reverse_geocode.ipynb',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/README.md',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/notebook_calculate_closest_street.ipynb',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/geckodriver.log',
 '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.gitignore',
 '/Users/danielcorcoran/deskto

In [20]:
destination_list.append(path)

### Create nodes_list

In [21]:
nodes_list = []

for index in range(len(destination_list)):

    if index == len(destination_list) - 1:
        group_index = 999999999999999999
        
        nodes_list.append({"group": group_index, 
                           "name": destination_list[index],
                           "size" : get_size_bracket(get_directory_size(directory_path= path))
                          })
        
    else:
        if set_groups_to_file_types == True:
            group_text = data.loc[index, "file_extension"]
            group_index = unique_extensions.index(group_text)
        else:
            group_index = 1

        nodes_list.append({"group": group_index, 
                           "name": destination_list[index],
                           "size" : data.loc[index, "node_size"]
                          })

In [22]:
nodes_list

[{'group': 0,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/drivers',
  'size': 4.5},
 {'group': 1,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.ipynb_checkpoints',
  'size': 3.0},
 {'group': 2,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.git',
  'size': 5.0},
 {'group': 0,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/data',
  'size': 5.0},
 {'group': 3,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/notebook_calculate_closest_street_pool.ipynb',
  'size': 3.0},
 {'group': 4,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.DS_Store',
  'size': 3.0},
 {'group': 3,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/notebook_reverse_geocode.ipynb',
  'size': 3.0},
 {'group': 5,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/README.md',
  'size': 3

In [23]:
nodes_list[:3]

[{'group': 0,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/drivers',
  'size': 4.5},
 {'group': 1,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.ipynb_checkpoints',
  'size': 3.0},
 {'group': 2,
  'name': '/Users/danielcorcoran/desktop/github_repos/python_nb_data_spatial/.git',
  'size': 5.0}]

### Next, build up the second part of the dictionary, the links list

In [24]:
links_list = []

In [25]:
for index in range(data.shape[0]):

    try:

        target = index

        source_text = data.loc[index, "source"]

        source = destination_list.index(source_text)

        links_list.append({"source": source, "target": target, "value": 1})
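    except ValueError:

        # list.index raises ValueError when the parent is not in
        # destination_list; fall back to the root node, appended last.
        print(index, ' has failed, attempting alternative method')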

        target = index

        source = len(destination_list) - 1

        links_list.append({"source": source, "target": target, "value": 1})

0  has failed, attempting alternative method
1  has failed, attempting alternative method
2  has failed, attempting alternative method
3  has failed, attempting alternative method
4  has failed, attempting alternative method
5  has failed, attempting alternative method
6  has failed, attempting alternative method
7  has failed, attempting alternative method
8  has failed, attempting alternative method
9  has failed, attempting alternative method
10  has failed, attempting alternative method
11  has failed, attempting alternative method
12  has failed, attempting alternative method
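
The failures above are the top-level entries: their source is the root directory without a trailing slash, while `path` was appended to `destination_list` with one, so `destination_list.index` cannot find it and the fallback links them to the last node. The sketch below shows one way to make the lookup succeed directly, by normalising the trailing slash and using a dictionary of positions. The sample paths and the names `destinations` and `position_of` are hypothetical.

```python
import os

# Hypothetical miniature of destination_list, with the root appended last
# as in the notebook (note its trailing slash).
root = "/tmp/demo/"
destinations = ["/tmp/demo/data", "/tmp/demo/data/roads.shp", root]

# Map each normalised path to its position so lookups need no list scan.
position_of = {p.rstrip("/"): i for i, p in enumerate(destinations)}

links = []
for target, destination in enumerate(destinations[:-1]):
    parent = os.path.dirname(destination)
    # Every parent is either another entry or the root, so this finds it.
    links.append({"source": position_of[parent], "target": target, "value": 1})
```

The dictionary also replaces a linear scan of the list for every row with a constant-time lookup, which may matter for large directory trees.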


## Process final data

### Merge nodes and links lists into one dictionary

In [26]:
json_data = {"nodes": nodes_list, "links": links_list}

### Convert python dictionary to json string

In [27]:
json_dump = json.dumps(json_data, indent=1, sort_keys=True)

### Export to filename 'pcap_export.json' to be used in index.html

In [28]:
# The context manager closes the file even if the write fails.
with open(export_path + "pcap_export.json", "w") as json_out:
    json_out.write(json_dump)