<h1>Welcome to our Interactive Documentation</h1>
<p>This notebook walks you through the steps needed to get started with the Ghost PII code base in Python.</p><br><br><br>

<h2>Imports</h2>
Once all the dependencies for Ghost PII are installed, you need to import the following packages every time:

In [2]:
import pandas as pd
import numpy as np

In [3]:
# use this import if you installed Ghost PII via pip
import ghostPii as gp



<br><br><br><br>
<h2>CryptoContext</h2>
<p>
    <b>class </b>db_toolbox.CryptoContext(<b>headers</b>)
    <p>This class stores your authentication details and controls all communication with the API.</p>

<h3>Parameters</h3>
<ul>
    <li><b>headers</b> -- JSON dict<br>
        A dictionary containing your personal authentication token, which can be obtained 
        <a href="https://ghostpii.com/rest-auth/registration/">here</a>
    </li>
</ul>

In [4]:
headers = {'Authorization': 'Token fb5f4075ad70614393a70f644397e4b14d612571'}
myContext = gp.CryptoContext(headers)
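
<p>The cell above hardcodes the token for demonstration. A minimal sketch of one alternative is to build the same <b>headers</b> dictionary from an environment variable, so the token stays out of the notebook. The helper name <code>build_headers</code> and the variable name <code>GHOSTPII_TOKEN</code> are assumptions made for this example; they are not part of Ghost PII.</p>

```python
import os


def build_headers(env_var="GHOSTPII_TOKEN"):
    """Build the Authorization header dict from an environment variable.

    GHOSTPII_TOKEN is a hypothetical variable name chosen for this sketch.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your API token")
    # Same shape as the headers dict used in the cell above
    return {"Authorization": f"Token {token}"}
```

<p>The returned dictionary can then be passed to <code>gp.CryptoContext</code> exactly as in the cell above.</p>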

<br><br><br><br>
<h2>NormCipherFrame</h2>
<p>
    <b>class </b>NormCipherFrame(<b> myContext, cipherListOfListOfList, indexData=False,fromPlain=False,dataTypes=False,keyRange=32766,allFloat=False,permLevel='standard' </b>)
    <p>This class mimics a pandas DataFrame except it stores encrypted data instead of plaintext. It can be sliced and/or indexed similarly.</p>

<h3>Parameters</h3>
<ul>
    <li><b>myContext</b> -- CryptoContext<br>
        This argument accepts a CryptoContext object that is used for authentication/communication with the API
    </li><br>
    <li><b>cipherListOfListOfList</b> -- Pandas dataframe or 3D list of ciphertext integers<br>
        This is almost always a plaintext pandas DataFrame, unless you are building a custom frame from already encrypted data
    </li><br>
    <li><b>indexData</b> -- None or 3D list of index integers<br>
        Almost always omitted, unless you are building a custom frame from already encrypted data
    </li><br>
    <li><b>fromPlain</b> -- None or True<br>
        Indicates whether the data needs encryption
    </li><br>
    <li><b>dataTypes</b> -- None or a list of types corresponding to each column<br>
        Almost always omitted, unless you already have a different encrypted frame. Data types are determined at runtime when plaintext is given
    </li><br>
    <li><b>keyRange</b> -- None or int<br>
        This argument specifies the range of values to use for one time pad keys. Smaller numbers provide increased accuracy (particularly for numerical computations) at the cost of security
    </li><br>
    <li><b>allFloat</b> -- None or True<br>
        This specifies if all numbers in the frame should be padded with floating point keys or if they should be padded according to their basic data type (int vs float)
    </li><br>
    <li><b>permLevel</b> -- string<br>
        This specifies the level of permissions granted to a newly encrypted frame
    </li><br>
</ul>
<h3>Methods</h3>
<ul>
    <li><b>horiz_merge</b>( otherNCF )<br>
        This method accepts another NormCipherFrame or NormCipherList and performs a horizontal merge
    </li><br>
    <li><b>vert_merge</b>( otherNCF )<br>
        This method accepts another NormCipherFrame and performs a vertical merge
    </li><br>
    <li><b>frame_of_ciphertext</b>()<br>
        Accepts no arguments and returns a pandas dataframe of printable ciphertext (will not work if you have floating point numbers in the frame)
    </li><br>
    <li><b>metadata</b>()<br>
        Accepts no arguments and returns a JSON dict of metadata useful for sending encrypted data to others
    </li><br>
    <li><b>decrypt</b>()<br>
        Accepts no arguments and returns the decrypted dataframe. Will raise an error if you lack the permissions for this operation
    </li><br>
</ul>
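
<p>To make the two merge directions concrete, here is a plaintext analogy using plain Python lists. It assumes a frame is stored as a list of columns, which matches how indexing a NormCipherFrame returns a single column later in this notebook. The functions below are illustrative stand-ins written for this sketch. They are not Ghost PII code, and the real methods may validate shapes or return results differently.</p>

```python
def horiz_merge(frame_a, frame_b):
    """Place the columns of frame_b to the right of frame_a (same row count required)."""
    if frame_a and frame_b and len(frame_a[0]) != len(frame_b[0]):
        raise ValueError("frames must have the same number of rows")
    return frame_a + frame_b


def vert_merge(frame_a, frame_b):
    """Append the rows of frame_b below frame_a (same column count required)."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must have the same number of columns")
    return [col_a + col_b for col_a, col_b in zip(frame_a, frame_b)]


names = [["ann", "bob"]]    # one column, two rows
ages = [[31, 45]]           # one column, two rows
more_names = [["cy"]]       # one column, one row

wide = horiz_merge(names, ages)        # two columns, two rows
tall = vert_merge(names, more_names)   # one column, three rows
```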
<h3>Attributes</h3>
<ul>
    <li><b>rows</b><br>
        Number of rows in this NCF
    </li><br>
    <li><b>cols</b><br>
        Number of columns in this NCF
    </li><br>
    <li><b>listOfColMaxChars</b><br>
        Length of strings in each column
    </li><br>
    <li><b>cipherListOfListOfList</b><br>
        3D list of ciphertext integers
    </li><br>
    <li><b>indicesListOfListOfList</b><br>
        3D list of index integers
    </li><br>
    <li><b>dataTypes</b><br>
        A list of strings indicating what type of data is stored in each column
    </li>
</ul><br>
<h3>Supported Operations</h3>
<ul>
    <li>len( )</li>
    <li>[ i ]  (indexing)</li>
    <li>[ i : j ]  (slicing)</li>
    <li>for encryptedList in NormCipherFrame  (iteration)</li>
</ul>

In [5]:
plaintext = pd.read_csv('demo_data/rldata500.csv')
myCipherFrame = gp.NormCipherFrame(myContext,plaintext)
print(myCipherFrame.frame_of_ciphertext())

                               0                               1    2    3  \
0    j0!`(!X'!o'!%"!i'!i,!@7!f.!  "-!E9!>&!;,!:6!V+!z!!D!!e0!:4!  jG!  _+!   
1    f3!+%!c-!d.!!1!*,!D9!F7!)%!  1#!s4!B0!90!@-!M&!%1!%0!i-!6*!  ZC!  T,!   
2    &.!]9!r4!B'!s-!]%!(0!R7!x(!  (9!49!2,!#"!l,!p&!O5!#4!Z(!j2!  L9!  &(!   
3    z#!b.!;+!M2!r&!o,!!.!A6!!9!  j1!M0!,,!("!+&!*)!e5!V4!Y2!)*!  W?!  O,!   
4    ]3!"8!"&!S+!m1!a3!V&!h#!29!  P1!t*!!.!a"!.6!O(!"9!O0!(7!6%!  qG!  8%!   
..                           ...                             ...  ...  ...   
495  -+!B6!@5!H4!f-!20!51!l4!U%!  b%!"0!53!(2!F+!t&!?*!C(!J2!P&!  uO!  r&!   
496  i1!s&!Z7!H'!Z5!d5!U7!B0!0+!  T+!a1!]+!;:!d#!T(!j2!O+!51!d8!  =F!  P"!   
497  %2!2'!-*!g0!P%!F9!z4!N,!w+!  W.!;7!C%!s+!d3!:1!n(!K*!m-!8+!  )C!  Z"!   
498  >5!:(!c7!#*!=(!A-!X-!^&!S+!  V+!v8!d1!g1!##!P3!H&!Y0!@7!1&!  CH!  K+!   
499  !9!=3!r&!1%!x+!_*!'#!i-!&%!  *"!*(!l.!'0!%0!U9!i1!Q3!l7!s'!  N>!  x(!   

       4    5  
0    t3!  k9!  
1    >8!  (0!  
2    V+!  ^%!  

<br><br><br>
<h2>NormCipherList</h2>
<p>
    <b>class </b>NormCipherList(<b> myContext, cipherListOfList, indexData=False,fromPlain=False,seedString=False,keyRange=32766,permLevel='standard' </b>)
   <p>This class mimics a list object with some additional methods and features.</p>

<h3>Parameters</h3>
<ul>
    <li><b>myContext</b> -- CryptoContext<br>
        This argument accepts a CryptoContext object that is used for authentication/communication with the API
    </li><br>
    <li><b>cipherListOfList</b> -- 2D list of ciphertext integers<br>
        A two dimensional list of ciphertext integers
    </li><br>
    <li><b>indexData</b> -- None or int or 2D list of index integers<br>
        A two dimensional list of index integers (if passed an int this 2D list is procedurally generated)
    </li><br>
    <li><b>fromPlain</b> -- None or True<br>
        Indicates if the list needs to be encrypted
    </li><br>
    <li><b>seedString</b> -- None or String<br>
        An optional seed from which to generate the one time pad keys
    </li><br>
    <li><b>keyRange</b> -- None or int<br>
        The max value to use when generating one time pad keys
    </li><br>
    <li><b>permLevel</b> -- string<br>
        This specifies the level of permissions granted to a newly encrypted list
    </li><br>
</ul>
<h3>Methods</h3>
<ul>
    <li><b>pad</b>( int )<br>
        Pads each string in the list by the specified amount
    </li><br>
    <li><b>vert_merge</b>( otherNCL )<br>
        This method accepts another NormCipherList and performs a vertical merge
    </li><br>
    <li><b>ngram_hashes</b>( n-int )<br>
        Accepts an integer less than the length of the strings in the list and returns a list of hash values for the ngrams of the specified length
    </li><br>
    <li><b>ngram_distance_matrix</b>( n-int )<br>
        Accepts an integer less than the length of the strings in the list and returns a matrix of approximate ngram distances between words in the list
    </li><br>
    <li><b>list_of_ciphertext</b>( )<br>
        Returns a ciphertext representation of the encrypted data
    </li><br>
    <li><b>search</b>( queryString )<br>
        Accepts either a plaintext string or a NormCipherString and returns indices of matches contained in the list
    </li><br>
    <li><b>levenshtein</b>()<br>
        Accepts no arguments and returns a matrix of the Levenshtein distance between words in the list
    </li><br>
    <li><b>custom_equality</b>( func )<br>
        Accepts a function as an argument and applies it to the list. The function is intended to be a distance formula written in regular Python (example shown below)
    </li><br>
    <li><b>decrypt</b>()<br>
        Accepts no arguments and returns the decrypted list. Will raise an error if you lack the permissions for this operation
    </li><br>
</ul>
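
<p>For readers unfamiliar with the metric behind <b>levenshtein</b>, the sketch below computes the same kind of pairwise distance matrix on ordinary plaintext strings. It uses the textbook dynamic-programming algorithm and is only a reference for interpreting the output. It is not Ghost PII's implementation, and the encrypted method may treat padded strings differently.</p>

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two plaintext strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        cur = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            cur.append(min(prev[j] + 1,          # delete ch_a
                           cur[j - 1] + 1,       # insert ch_b
                           prev[j - 1] + cost))  # substitute (or keep)
        prev = cur
    return prev[-1]


def levenshtein_matrix(words):
    """Pairwise distance matrix; entry [i][j] compares words[i] with words[j]."""
    return [[levenshtein(w1, w2) for w2 in words] for w1 in words]


words = ["kitten", "sitting", "mitten"]
matrix = levenshtein_matrix(words)
```

<p>The matrix is symmetric with zeros on the diagonal, since every word is at distance zero from itself.</p>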
<h3>Attributes</h3>
<ul>
    <li><b>colMaxChars</b><br>
        Length of strings in this column
    </li><br>
    <li><b>cipherListOfList</b><br>
        2D list of ciphertext integers
    </li><br>
    <li><b>indicesListOfList</b><br>
        2D list of index integers
    </li>
</ul>
<h3>Supported Operations</h3>
<ul>
    <li>len( )</li>
    <li>[ i ]  (indexing)</li>
    <li>[ i : j ]  (slicing)</li>
    <li>for encryptedWord in NormCipherList (iteration)</li>
    
</ul>

In [6]:
def bigramDistance(stringList):
    n = len(stringList)

    # the matrix starts out as all zeros, so the diagonal
    # (each string compared with itself) needs no extra work
    distanceMx = np.zeros((n, n), dtype=int)
    strLength = len(stringList[0])

    # the matrix is symmetric, so only visit pairs with i < j
    for i in range(n):
        for j in range(i + 1, n):
            numMatches = 0

            for k in range(len(stringList[i]) - 1):
                # compare the pair of characters starting at position k in each string
                if stringList[i][k:k+2] == stringList[j][k:k+2]:
                    numMatches += 1

            # distance is the number of bigram positions that did not match
            curWordDistance = strLength - 1 - numMatches
            distanceMx[i, j] = curWordDistance
            distanceMx[j, i] = curWordDistance

    return distanceMx

# index into our NCF to get one of the NormCipherLists
myCipherList = myCipherFrame[0]

# apply the function defined above to the list
bigramDistances = myCipherList.custom_equality(bigramDistance)

print(bigramDistances)

[[0 7 7 ... 6 7 6]
 [7 0 6 ... 5 4 5]
 [7 6 0 ... 6 6 6]
 ...
 [6 5 6 ... 0 5 3]
 [7 4 6 ... 5 0 5]
 [6 5 6 ... 3 5 0]]
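
<p>To help read the matrix above, here is the same bigram distance worked out on two plaintext strings. It counts the positions where the two-character windows differ. The helper name <code>bigram_distance</code> is introduced for this sketch only. Like the notebook function, it assumes both strings have the same length.</p>

```python
def bigram_distance(a, b):
    """Count the bigram positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    # a string of length L has L - 1 overlapping bigrams
    matches = sum(1 for k in range(len(a) - 1) if a[k:k + 2] == b[k:k + 2])
    return (len(a) - 1) - matches


# "smith" -> sm, mi, it, th ; "smyth" -> sm, my, yt, th ; two of four positions match
d1 = bigram_distance("smith", "smyth")
d2 = bigram_distance("smith", "smith")
```

<p>Here <code>d1</code> is 2 and <code>d2</code> is 0. Smaller values mean the strings share more bigrams at the same positions.</p>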


<br><br><br>
<h2>NormCipherString</h2>
<p>
    <b>class </b>NormCipherString(<b> myContext, cipherList, indexData = False </b>)
   <p>This class mimics a plaintext string.</p>

<h3>Parameters</h3>
<ul>
    <li><b>myContext</b> -- CryptoContext<br>
        This argument accepts a CryptoContext object that is used for authentication/communication with the API
    </li><br>
    <li><b>cipherList</b> -- list of ciphertext integers or str<br>
        A one-dimensional list of ciphertext integers. If given a string instead, the string is encrypted into a list of ciphertext integers
    </li><br>
    <li><b>indexData</b> -- None or int or list of index integers<br>
        A one dimensional list of index integers
    </li><br>
    <li><b>keyRange</b> -- None or int<br>
        The max value to use when generating one time pad keys
    </li><br>
    <li><b>permLevel</b> -- string<br>
        This specifies the level of permissions granted to a newly encrypted string
    </li><br>
</ul>
<h3>Methods</h3>
<ul>
    <li><b>ciphertext</b>( )<br>
        This method accepts no arguments and returns a string of printable ciphertext
    </li><br>
    <li><b>decrypt</b>()<br>
        Accepts no arguments and returns the decrypted string. Will raise an error if you lack the permissions for this operation
    </li><br>
</ul>
<h3>Attributes</h3>
<ul>
    <li><b>length</b><br>
        Length of string
    </li><br>
    <li><b>cipherList</b><br>
        List of ciphertext integers
    </li><br>
    <li><b>indicesList</b><br>
        List of index integers
    </li><br>
    <li><b>pairsList</b><br>
        List of ciphertext integers with their corresponding index integers in tuples
    </li>
</ul>
<h3>Supported Operations</h3>
<ul>
    <li>len( )</li>
    <li>str( )</li>
    <li>[ i ]  (indexing)</li>
    <li>[ i : j ]  (slicing)</li>

</ul>

In [7]:
myCipherString = gp.NormCipherString(myContext, "This is a string")
print(myCipherString.ciphertext())

!8!Lb%O4"8=!Ll!Jw"rh"*d"."#x2%)P!mm#T&!Io!.6"s;!


<br><br><br>
<h2>NormCipherQuant</h2>
<p>
    <b>class </b>NormCipherQuant(<b> myContext, cipherList, indexData=False,fromPlain=False,keyRange=32766,floatData=False,permLevel='standard' </b>)
   <p>This class mimics a list object with some additional methods and features.</p>

<h3>Parameters</h3>
<ul>
    <li><b>myContext</b> -- CryptoContext<br>
        This argument accepts a CryptoContext object that is used for authentication/communication with the API
    </li><br>
    <li><b>cipherList</b> -- A list of ciphertext integers<br>
        A list of ciphertext integers
    </li><br>
    <li><b>indexData</b> -- None or int or list of index integers<br>
        A list of index integers (if passed an int, this list is procedurally generated)
    </li><br>
    <li><b>fromPlain</b> -- None or True<br>
        Indicates if the list needs to be encrypted
    </li><br>
    <li><b>keyRange</b> -- None or int<br>
        The max value to use when generating one time pad keys
    </li><br>
    <li><b>floatData</b> -- None or True<br>
        Indicates whether the encrypted data should be treated as floating point values
    </li><br>
    <li><b>permLevel</b> -- string<br>
        This specifies the level of permissions granted to a newly encrypted list
    </li><br>
    
</ul>
<h3>Methods</h3>
<ul>
    <li> <b>vert_merge</b> (otherNCQ) <br>
    This method accepts another NormCipherQuant and performs a vertical merge
    </li> <br>
    <li> <b> mean </b> ( ) <br>
        Accepts no arguments and calculates the mean value of the column
    </li><br>
    <li><b>stdev</b>( )<br>
        Accepts no arguments and calculates the standard deviation of the column
    </li><br>
    <li><b>median</b>( )<br>
        Accepts no arguments and calculates the median of the column
    </li><br>
    <li><b>cosine_similarity</b>( other )<br>
        Accepts another NCQ and calculates the cosine similarity between them, treating them as vectors
    </li><br>
    <li><b>dot_product</b>( other )<br>
        Accepts another NCQ and calculates the dot product between them, treating them as vectors
    </li><br>
    <li><b> magnitude</b> ( ) <br>
    Accepts no arguments and calculates the magnitude of this NCQ treating it as a vector
    </li> <br>
    <li><b>ciphertext or list_of_ciphertext</b>( )<br>
        Accepts no arguments and returns a ciphertext representation of the data. Won't work if data is floating point
    </li><br>
    <li><b>summation</b>( )<br>
        Accepts no arguments and returns the sum of the column
    </li><br>
    <li><b>decrypt</b>( )<br>
        Accepts no arguments and returns the decrypted list. Will raise an error if you lack the permissions for this operation
    </li><br>
    
</ul>
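
<p>As a reference for what these methods compute, the sketch below evaluates the same quantities on plaintext lists using only the Python standard library. It is not Ghost PII code. The documentation above does not say whether <b>stdev</b> is the sample or the population standard deviation, so both are shown.</p>

```python
import math
import statistics


def dot_product(u, v):
    """Sum of element-wise products of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(u, v))


def magnitude(u):
    """Euclidean length of a vector."""
    return math.sqrt(dot_product(u, u))


def cosine_similarity(u, v):
    """Dot product divided by the product of the magnitudes."""
    denom = magnitude(u) * magnitude(v)
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot_product(u, v) / denom


u = [3, 4]
v = [4, 3]
dot = dot_product(u, v)        # 3*4 + 4*3
cos = cosine_similarity(u, v)  # dot / (|u| * |v|)

data = [2, 4, 4, 4, 5, 5, 7, 9]
col_mean = statistics.mean(data)
col_median = statistics.median(data)
col_sample_stdev = statistics.stdev(data)       # divides by n - 1
col_population_stdev = statistics.pstdev(data)  # divides by n
```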
<h3>Attributes</h3>
<ul>
    <li><b>floatData</b><br>
        Indicates if the data is treated as floating point numbers or integers
    </li><br>
    <li><b>cipherList</b><br>
        List of ciphertext integers
    </li><br>
    <li><b>indicesList</b><br>
        List of index integers
    </li>
</ul>
<h3>Supported Operations</h3>
<ul>
    <li>len( )</li>
    <li>[ i ]  (indexing)</li>
    <li>[ i : j ]  (slicing)</li>
    <li>for encryptedNum in NormCipherQuant (iteration)</li>
    <li>==, >, < </li>
    
</ul>

In [10]:
myCipherQuant = myCipherFrame[3]

In [11]:
myCipherQuant[0:10].dot_product(myCipherQuant[10:20])

334

<br><br><br>
<h2>NormCipherNum</h2>
<p>
    <b>class </b>NormCipherNum(<b> apiContext,cipher,index=False,fromPlain=False,floatData=False,keyRange = 32766,permLevel='standard'</b>)
   <p>This class mimics a plaintext int or float.</p>

<h3>Parameters</h3>
<ul>
    <li><b>apiContext</b> -- CryptoContext<br>
        This argument accepts a CryptoContext object that is used for authentication/communication with the API
    </li><br>
    <li><b>cipher</b> -- int or float<br>
        A ciphertext int or float. Alternatively if fromPlain is True, this is a plaintext number to be encrypted
    </li><br>
    <li><b>index</b> -- None or int<br>
        An index integer
    </li><br>
    <li><b>fromPlain</b> -- None or True<br>
        Indicates if the number needs to be encrypted
    </li><br>
    <li><b>keyRange</b> -- None or int<br>
        The max value to use when generating one time pad keys
    </li><br>
    <li><b>floatData</b> -- None or True<br>
        Indicates whether the encrypted data should be treated as a floating point value
    </li><br>
    <li><b>permLevel</b> -- string<br>
        This specifies the level of permissions granted to a newly encrypted number
    </li><br>
</ul>
<h3>Methods</h3>
<ul>
    <li><b>ciphertext</b>( )<br>
        This method accepts no arguments and returns a string of printable ciphertext. Will not work if data is floating point
    </li><br>
    <li><b>decrypt</b>( )<br>
        Accepts no arguments and returns the decrypted number. Will raise an error if you lack the permissions for this operation
    </li><br>
</ul>
<h3>Attributes</h3>
<ul>
    <li><b>length</b><br>
        Length of string
    </li><br>
    <li><b>cipherList</b><br>
        List of ciphertext integers
    </li><br>
    <li><b>indicesList</b><br>
        List of index integers
    </li><br>
    <li><b>pairsList</b><br>
        List of ciphertext integers with their corresponding index integers in tuples
    </li>
</ul>
<h3>Supported Operations</h3>
<ul>
    <li>str( )</li>
    <li>+ , -, *, **</li>
    <li>==, >=, <=, >, <, != </li>

</ul>

In [14]:
myCipherNum1 = myCipherQuant[1]
myCipherNum2 = myCipherQuant[2]

In [17]:
myCipherNum1 > myCipherNum2

True

In [16]:
myCipherNum1+myCipherNum2

<ghostPii.data_structures.norm_cipher_num.NormCipherNum at 0x7efd8e073d30>
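
<p>The last two cells show a pattern worth spelling out: a comparison printed a plain <code>True</code>, while addition printed an object rather than a number. The toy class below imitates that behaviour with Python operator overloading. <code>WrappedNum</code> and <code>reveal</code> are hypothetical names invented for this sketch. The class performs no encryption and is not how Ghost PII is implemented.</p>

```python
class WrappedNum:
    """Toy stand-in for an encrypted number: hides its value but supports operators.

    This is NOT encryption and not Ghost PII's implementation.
    """

    def __init__(self, value):
        self._value = value

    def __add__(self, other):
        # arithmetic returns another wrapped object, not a bare number
        return WrappedNum(self._value + other._value)

    def __mul__(self, other):
        return WrappedNum(self._value * other._value)

    def __gt__(self, other):
        # comparisons return an ordinary bool
        return self._value > other._value

    def reveal(self):
        """Return the hidden value (the toy analogue of a decrypt step)."""
        return self._value


a = WrappedNum(7)
b = WrappedNum(5)
total = a + b        # a WrappedNum, so printing it shows an object repr
is_bigger = a > b    # a plain bool
```

<p>In the notebook, the analogous way to see the value of the sum would be to call the documented <b>decrypt</b>() method on it, provided you have the permissions for that operation.</p>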