# CSV to NPZ
In this notebook I gather the data provided by Max Hinne and convert it into a single `.npz` file.

In [1]:
import csv, sys, os
import numpy as np

print("Python Version: ", sys.version, "\n")
print("Numpy Version: ",np.__version__)

Python Version:  3.9.12 (main, Apr  5 2022, 01:53:17) 
[Clang 12.0.0 ] 

Numpy Version:  1.22.3


## SC & ID
The following two cells extract the values and the IDs (taken from the file names) from the CSV files supplied by Max Hinne.

In [2]:
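def hinne_csv_to_graph(name):
    # Read every comma-separated value into one flat integer array.
    with open(name, newline="") as csvfile:
        reader = csv.reader(csvfile, delimiter=",")
        read = np.array([item for row in reader for item in row]).astype(np.int64)
    # The file holds a square matrix, so the side length is the root of the value count.
    side = int(round(np.sqrt(read.size)))
    if side * side != read.size:
        raise ValueError(f"{name}: {read.size} values do not form a square matrix")
    return read.reshape((side, side))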
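
To see the flatten-and-reshape step in isolation, here is a minimal sketch that writes a small matrix to a temporary CSV file and reads it back. The helper name `csv_to_square_matrix` and the 3x3 values are made up for illustration and are not part of the supplied data.

```python
import csv
import math
import os
import tempfile

import numpy as np


def csv_to_square_matrix(path):
    """Read a comma-separated file of integers into a square matrix (illustrative helper)."""
    with open(path, newline="") as csvfile:
        values = [item for row in csv.reader(csvfile, delimiter=",") for item in row]
    # A square matrix with n*n entries has side length n.
    side = math.isqrt(len(values))
    if side * side != len(values):
        raise ValueError(f"{path}: {len(values)} values do not form a square matrix")
    return np.array(values).astype(np.int64).reshape(side, side)


# Made-up 3x3 matrix, written to a temporary file and read back.
expected = np.array([[0, 2, 5], [2, 0, 1], [5, 1, 0]])
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    np.savetxt(demo_path, expected, fmt="%d", delimiter=",")
    loaded = csv_to_square_matrix(demo_path)

print(loaded.shape)
```

Checking that the value count is a perfect square turns a truncated or malformed file into an explicit error, where a bare `reshape` would only report a shape mismatch.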

In [3]:
combined = []
ids = []
for file in sorted(os.listdir("./HCP/anatomical")):
    if file.endswith(".csv"):
        path = os.path.join("./HCP/anatomical", file)
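        # The ID is the part of the file name before the first underscore.
        # Using the file name directly avoids depending on the path separator.
        ids.append(file.split("_")[0])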
        arr = hinne_csv_to_graph(path)
        combined.append(arr)
streamline_counts = np.array(combined)
ids = np.array(ids)
print(streamline_counts.shape)
print(ids.shape)

(30, 164, 164)
(30,)
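
As a hedged illustration of the ID extraction and stacking step, the sketch below uses hypothetical file names and placeholder matrices, since the real naming scheme is only known to contain an underscore after the ID. The helper `id_from_path` is my own name, not part of the supplied material.

```python
import os

import numpy as np


def id_from_path(path):
    """Return the part of the file name before the first underscore."""
    return os.path.basename(path).split("_")[0]


# Hypothetical file names; the real files may be named differently.
paths = ["./HCP/anatomical/000002_sc.csv", "./HCP/anatomical/000001_sc.csv"]
ids = np.array([id_from_path(p) for p in sorted(paths)])

# Placeholder matrices standing in for the per-file streamline counts.
matrices = [np.zeros((4, 4), dtype=np.int64) for _ in paths]
stacked = np.stack(matrices)  # raises ValueError if the shapes differ

print(ids, stacked.shape)
```

Sorting the paths before extracting the IDs keeps the ID array and the stacked matrices in the same order, and `np.stack` fails loudly if one file has a different size.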


## Labels
In the following cell I am loading the ordered list of labels supplied by Max Hinne.

In [4]:
labels = np.load("HCP/labels.npz")["labels"]
labels.shape

(164,)

## Component Positions
In the following cell I am extracting the brain component locations from the file structural_labels.csv.

# Warning!!!
I have since learned that these values do not represent the correct locations. Do not plot them.

In [5]:
def structural_positions_to_arr():
    with open("HCP/structural_labels.csv", newline="") as csvfile:
        lines = csvfile.readlines()
        component_positions = []
        for line in lines:
            # Fields 2 to 4 of each whitespace-separated line hold the three coordinates.
            fields = line.split()
            component_positions.append(fields[2:5])
        component_positions = np.array(component_positions).astype(np.uint8)
    return component_positions
structural_positions_to_arr().shape

(164, 3)

## Save to File
In this cell I save the four arrays to one `.npz` file.

In [6]:
np.savez_compressed("HCP/data_hinne.npz", ID=ids, SC=streamline_counts, Labels=labels, Pos=structural_positions_to_arr())
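
The keyword names passed to `np.savez_compressed` become the keys of the archive. A minimal sketch with made-up arrays and a temporary file (not the real data) shows this:

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    # Made-up stand-ins for the ID and SC arrays.
    np.savez_compressed(
        demo_path,
        ID=np.array(["a", "b"]),
        SC=np.zeros((2, 3, 3), dtype=np.int64),
    )
    # The archive exposes the keyword names through its `files` attribute.
    with np.load(demo_path) as archive:
        keys = sorted(archive.files)
        sc_shape = archive["SC"].shape

print(keys, sc_shape)
```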

## Test the file
In the following cells I load the `.npz` file and check the shapes of the arrays as a sanity check.

In [7]:
test = np.load("HCP/data_hinne.npz")

In [8]:
print(test["SC"].shape)
print(test["Labels"].shape)
print(test["Pos"].shape)
print(test["ID"].shape)

(30, 164, 164)
(164,)
(164, 3)
(30,)
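
Matching shapes do not guarantee matching contents. A slightly stricter check could compare the stored arrays with the originals. The sketch below does this on made-up arrays; the helper `npz_matches` is a hypothetical name of mine, not a NumPy function.

```python
import os
import tempfile

import numpy as np


def npz_matches(path, expected):
    """Return True if every expected array is stored unchanged under its key."""
    with np.load(path) as archive:
        return all(
            key in archive.files and np.array_equal(archive[key], value)
            for key, value in expected.items()
        )


# Made-up arrays standing in for the real ones.
expected = {
    "ID": np.array(["a", "b"]),
    "SC": np.arange(18, dtype=np.int64).reshape(2, 3, 3),
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.npz")
    np.savez_compressed(demo_path, **expected)
    same = npz_matches(demo_path, expected)
    different = npz_matches(demo_path, {"SC": expected["SC"] + 1})

print(same, different)
```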
