In [3]:
import os
import pandas as pd
import numpy as np

def getMotifCoordDiff(motifCoordinatesFile,outputDirectory=""):
    
    if not os.path.exists(motifCoordinatesFile):
        print('Path to motifCoordinatesFile is incorrect or does not exist\n')
        return 1
    
    allCoords = pd.read_table(motifCoordinatesFile, sep='\t', header=0, index_col=0)
    allCoords = allCoords.sort_index(axis = 1)
    
    numOfMotifs = allCoords.shape[0]
    numOfpeaks = allCoords.shape[1]
    
    # number of unordered motif pairs: n*(n-1)/2
    allCombinations = numOfMotifs*(numOfMotifs-1)//2
    
    numpyMatrix = np.zeros(shape=(numOfpeaks,allCombinations))
    
    for i in range(numOfpeaks):
        distanceMatrix = np.zeros(shape=(numOfMotifs,numOfMotifs))
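        # convert to a plain array so that [j] is always positional, not label-based
        motifCoords = allCoords.iloc[:,i].to_numpy()
        for j in range(numOfMotifs):
            for k in range(numOfMotifs):
                # a coordinate of 0 is treated as a missing motif,
                # so the pair gets the placeholder value 10000
                if motifCoords[j] == 0 or motifCoords[k] == 0:
                    distanceMatrix[j][k] = 10000
                else:
                    distanceMatrix[j][k] = motifCoords[j]-motifCoords[k]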
        # get Upper Triangular Half of the Matrix as to not repeat distances
        # this also flattens it into a n*(n-1)/2 vector
        numpyMatrix[i] = distanceMatrix[np.triu_indices(numOfMotifs,k=1)] 
    
    pdMatrix = pd.DataFrame(numpyMatrix.transpose())
    pdMatrix.columns = allCoords.columns
    
    # os.path.join works whether or not outputDirectory ends with a separator
    output = os.path.join(outputDirectory, 'coordinateDifference.csv')
    pdMatrix.to_csv(output,header=True, index=True, sep='\t')
    
    print("Coordinates Differences have been placed in " + output)

# getMotifCoordDiff
getMotifCoordDiff( motifCoordinatesFile, outputDirectory (optional) )

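The function writes an N*(N-1)/2 by M matrix containing the differences between the locations of motifs on each peak, computed from the motif coordinates file produced by the getMotifLocation function. N is the number of motifs and M is the number of peaks. Each row holds one motif pair. If either motif in a pair has a coordinate of 0 on a peak, the function writes 10000 for that entry instead of a difference.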
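
The per-peak calculation can be sketched with NumPy broadcasting in place of the nested loops. This is only an illustration of the same idea, not the code used above; `pairwise_differences` and `missing_value` are hypothetical names, and the coordinates are made up.

```python
import numpy as np

def pairwise_differences(coords, missing_value=10000):
    """Return the N*(N-1)/2 upper-triangle differences for one peak."""
    coords = np.asarray(coords, dtype=float)
    # diff[j, k] = coords[j] - coords[k]
    diff = coords[:, None] - coords[None, :]
    # a coordinate of 0 is treated as a missing motif
    absent = coords == 0
    diff[absent[:, None] | absent[None, :]] = missing_value
    # keep each unordered pair once (upper triangle, diagonal excluded)
    rows, cols = np.triu_indices(len(coords), k=1)
    return diff[rows, cols]

# three motifs on one peak; the second motif is absent (coordinate 0)
example = pairwise_differences([12, 0, 44])
print(example)
```

With these toy coordinates only the first and third motifs are present, so only their pair gets a real difference (12 - 44 = -32); the two pairs involving the absent motif get the placeholder.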
## Parameters:
**motifCoordinatesFile** - Path to the tab-separated file containing the motif coordinates, as produced by the getMotifLocation function.

**outputDirectory** - Optional. The directory the function writes its output to. If not specified, the output is written to the directory where this code is run. Include the trailing / in your input; for example, if the directory is glasslab/data, pass it as glasslab/data/

## Output
The function writes the coordinate differences to a file named coordinateDifference.csv in the following format. 
Despite the .csv extension, the file is separated by tabs and has a header and index. 
You can read it in pandas like this `pd.read_table('coordinateDifference.csv', sep='\t', header=0, index_col=0)`

| Motif pair      | peak1 | peak2  |  ...  | peakM  |
| --------------- | ----- | ------ | ----- | ------ |
| (Motif1_Coordinate)-(Motif2_Coordinate) |   a1  |   b1   |  ...  |   x1   |
| (Motif1_Coordinate)-(Motif3_Coordinate) |   a2  |   b2   |  ...  |   x2   |
|  ...            |  ...  |   ...  |  ...  |  ...   |
| (MotifN-1_Coordinate)-(MotifN_Coordinate) |   aY  |   bY   |  ...  |   xY   |

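N is the number of motifs, M is the number of peaks, and Y = N*(N-1)/2 is the number of motif pairs. In the file itself the rows are indexed by the integers 0 to Y-1, in the pair order shown in the table.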
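
Because the output rows carry only integer indices, it can help to rebuild the pair labels. The sketch below assumes the motif names are listed in the same order as the rows of the motif coordinates file; `pair_labels` and the motif names are hypothetical.

```python
import numpy as np

def pair_labels(motif_names):
    """Label each output row with the motif pair it represents."""
    # same pair ordering as np.triu_indices(N, k=1) in getMotifCoordDiff
    rows, cols = np.triu_indices(len(motif_names), k=1)
    return [f"{motif_names[r]}-{motif_names[c]}" for r, c in zip(rows, cols)]

labels = pair_labels(["Motif1", "Motif2", "Motif3", "Motif4"])
print(labels)
```

The resulting list can be assigned to the index of the DataFrame read from coordinateDifference.csv, provided its length matches the number of rows.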
## Example Usage:

Produces coordinateDifference.csv in the directory where the program is run:

`getMotifCoordDiff("motifCoordinates.csv")`

Produces coordinateDifference.csv in the directory /user/test/:

`getMotifCoordDiff("motifCoordinates.csv","/user/test/")`

In [5]:
getMotifCoordDiff("motifCoordinates.csv")
motifDiff = pd.read_table('coordinateDifference.csv', sep='\t', header=0, index_col=0)
motifDiff

Coordinates Differences have been placed in coordinateDifference.csv


Unnamed: 0,chr1-1,chr1-10,chr1-100,chr1-1003,chr1-1005,chr1-1008,chr1-101,chr1-1010,chr1-1012,chr1-1014,...,chrX-87,chrX-88,chrX-9,chrX-92,chrX-95,chrY-1,chrY-11,chrY-2,chrY-3,chrY-9
0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,-32.0,10000.0,...,10000.0,10000.0,1.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
1,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,...,10000.0,10000.0,2.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
2,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,...,10000.0,10000.0,1.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0,10000.0
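
Most entries in the output above are the 10000 placeholder. One possible way to handle them downstream is to convert the placeholder to NaN so that pandas skips those entries in summaries. The sketch uses a small made-up DataFrame rather than the real file, and it assumes no genuine difference is exactly 10000, since such a value would be masked as well.

```python
import numpy as np
import pandas as pd

# toy stand-in for the DataFrame read from coordinateDifference.csv
toy = pd.DataFrame({"peakA": [10000.0, -32.0], "peakB": [1.0, 10000.0]})

# replace the placeholder with NaN so it is ignored by pandas aggregations
cleaned = toy.replace(10000.0, np.nan)

# number of peaks on which each motif pair actually co-occurs
counts = cleaned.notna().sum(axis=1)
print(counts)
```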
