# Data Conditioning when using data from Neuprint and FlyWire
When we ask questions about neuron connectivity, such as how many synapses a neuron receives and where, we have to make sure that the data we pull from the databases answers that question accurately. Following the same steps every time ensures that the data is treated consistently.
1. Neuprint
2. FlyWire

## Neuprint

In [1]:
# Connecting to Neuprint
# Import packages from neuprint (Setting up access is shown in the tutorial file on basecamp)
from neuprint import Client, fetch_neurons, NeuronCriteria as NC, fetch_adjacencies

# Load the authentication token from a file
# I store my authentication token in a file called "flybrain.auth.txt", which keeps it out of the code and makes it easy to load
with open("flybrain.auth.txt", 'r') as auth_token_file:
    auth_token = auth_token_file.readline().strip()
try:
    np_client = Client('neuprint.janelia.org', dataset='hemibrain:v1.2.1', token=auth_token)
except Exception as err:
    # Report why the connection failed instead of silently swallowing the error
    print(f"Failed to connect to Neuprint: {err}")
    np_client = None



In [None]:
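# Pulling data from Neuprint using fetch_adjacencies
# fetch_adjacencies returns the connections between neurons that match the criteria you set.
# Passing None as the source places no restriction on the upstream neurons, so this pulls every input to bodyId 423101189.
neuron_data, conn_data = fetch_adjacencies(None, NC(bodyId=423101189))

# Here we can see the expected number of rows (one for each neuron, including our neuron of interest)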
neuron_data


Unnamed: 0,bodyId,type,instance
0,423101189,oviIN,oviIN_R
1,234630133,SMP184,SMP184(PDL05)_L
2,263674097,LHPD2a5_a,LHPD2a5_a_R
3,266187480,SMP349,SMP349_R
4,266187559,SLP399,SLP399_R
...,...,...,...
2520,5901231318,,
2521,5901232053,SMP272,SMP272(PDL21)_L
2522,6400000773,SMP411,SMP411_R
2523,7112622044,LAL137,LAL137(PVL05)_L


In [11]:
# In the connection dataframe, we can see that fetch_adjacencies returns one row per ROI for each pair of neurons, rather than one row per pair.
# For this specific query, that adds roughly 1000 rows compared with the number of unique pairs.
conn_data

Unnamed: 0,bodyId_pre,bodyId_post,roi,weight
0,234630133,423101189,CRE(R),2
1,263674097,423101189,SMP(R),2
2,266187480,423101189,SMP(R),1
3,266187559,423101189,SMP(R),3
4,267214250,423101189,SMP(R),9
...,...,...,...,...
3526,6400000773,423101189,SMP(R),2
3527,7112622044,423101189,SIP(R),1
3528,7112622044,423101189,SMP(R),1
3529,7112622044,423101189,SMP(L),1
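
To put a number on the repetition, one option is to compare the row count with the count of unique pre/post pairs. The snippet below is a minimal sketch on a toy dataframe (`toy_conn` is a hypothetical stand-in built from the last few rows shown above, not the full query result); the same two lines should work on `conn_data` itself.

```python
import pandas as pd

# Toy stand-in for conn_data, using the last four rows printed above
toy_conn = pd.DataFrame({
    "bodyId_pre":  [6400000773, 7112622044, 7112622044, 7112622044],
    "bodyId_post": [423101189, 423101189, 423101189, 423101189],
    "roi":         ["SMP(R)", "SIP(R)", "SMP(R)", "SMP(L)"],
    "weight":      [2, 1, 1, 1],
})

pair_cols = ["bodyId_pre", "bodyId_post"]

# Total rows versus unique pre/post pairs
n_rows = len(toy_conn)
n_pairs = len(toy_conn.drop_duplicates(subset=pair_cols))

# Rows that exist only because a pair is split across several ROIs
extra_rows = n_rows - n_pairs
print(f"{n_rows} rows, {n_pairs} unique pairs, {extra_rows} extra rows")
```

If `extra_rows` is greater than zero, at least one pair is split across ROIs and the weights need to be summed before they are treated as per-pair synapse counts.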


In [13]:
# When working with synapses, we need to make sure each pair of neurons is accurately represented by a single number: the total number of synapses between them.
# To do this, we group the data by the pre- and post-synaptic neurons and sum the weights across ROIs.
# Selecting the weight column explicitly keeps the non-numeric roi column out of the sum.
conn_grouped = conn_data.groupby(['bodyId_pre','bodyId_post'])[['weight']].sum()

# After grouping, we get the expected number of rows. Each row represents the connection between one pre-synaptic neuron and our neuron of interest.
# There is one row fewer than in neuron_data because that dataframe also includes our neuron of interest.
conn_grouped


bodyId_pre,bodyId_post,weight
234630133,423101189,2
263674097,423101189,2
266187480,423101189,1
266187559,423101189,3
267214250,423101189,9
...,...,...
5901231318,423101189,1
5901232053,423101189,3
6400000773,423101189,2
7112622044,423101189,3
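
A quick sanity check after grouping is to confirm that no synapses were lost and that each pair now appears exactly once. The following is a minimal sketch on a toy dataframe (`toy_conn`, `toy_grouped`, and `toy_flat` are hypothetical names used only for illustration); the same checks should carry over to `conn_data` and `conn_grouped`.

```python
import pandas as pd

# Toy stand-in for conn_data, using the last four rows of the ungrouped output
toy_conn = pd.DataFrame({
    "bodyId_pre":  [6400000773, 7112622044, 7112622044, 7112622044],
    "bodyId_post": [423101189, 423101189, 423101189, 423101189],
    "roi":         ["SMP(R)", "SIP(R)", "SMP(R)", "SMP(L)"],
    "weight":      [2, 1, 1, 1],
})

# Sum the weights across ROIs for each pre/post pair
toy_grouped = toy_conn.groupby(["bodyId_pre", "bodyId_post"])[["weight"]].sum()

# The total synapse count must be unchanged by the grouping
assert toy_grouped["weight"].sum() == toy_conn["weight"].sum()

# Each pre/post pair must now appear exactly once
assert toy_grouped.index.is_unique

# Optional: move the pair IDs from the index back into ordinary columns
toy_flat = toy_grouped.reset_index()
print(toy_flat)
```

Flattening with `reset_index()` is a matter of preference; it can make later merges with `neuron_data` on `bodyId` simpler than working with a two-level index.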


## FlyWire

This can be a bit different depending on the dataset you download from the Codex. In general, it is important to check whether the connections are already collapsed across neuropils or still listed once per neuropil.
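
One way to run that check is to look for pre/post pairs that appear in more than one row. The sketch below assumes a connections table with columns named `pre_root_id`, `post_root_id`, `neuropil`, and `syn_count`; those names, the helper functions `is_collapsed` and `collapse_by_pair`, and the toy values are all assumptions for illustration, so adjust them to match the file you actually downloaded.

```python
import pandas as pd

def is_collapsed(conn, pair_cols=("pre_root_id", "post_root_id")):
    """Return True when every pre/post pair appears in exactly one row."""
    return not conn.duplicated(subset=list(pair_cols)).any()

def collapse_by_pair(conn, pair_cols=("pre_root_id", "post_root_id"),
                     weight_col="syn_count"):
    """Sum the synapse counts across neuropils for each pre/post pair."""
    return conn.groupby(list(pair_cols), as_index=False)[weight_col].sum()

# Made-up toy table; column names and values are assumptions, not real data
toy_fw = pd.DataFrame({
    "pre_root_id":  [101, 101, 202],
    "post_root_id": [900, 900, 900],
    "neuropil":     ["SMP_R", "CRE_R", "SMP_R"],
    "syn_count":    [5, 2, 7],
})

print(is_collapsed(toy_fw))      # pair (101, 900) is split across two neuropils
toy_fw_collapsed = collapse_by_pair(toy_fw)
print(is_collapsed(toy_fw_collapsed))
print(toy_fw_collapsed)
```

If `is_collapsed` already returns `True` for the downloaded table, no grouping is needed and the synapse counts can be used as they are.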

In [None]:
# Let's import the dataframe I have downloaded from the codex
