By combining a face, a voiceprint and a secret spoken phrase (the same recording supplies both the voiceprint and the recognised text), we can derive a private key that is deterministic and tied to one person.

In [11]:
!pip install deterministic-rsa-keygen mtcnn matplotlib scipy librosa numpy scikit-learn pocketsphinx SpeechRecognition pydub

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


# Face metrics

Build a rudimentary face detector that finds a face in the image and returns its keypoints (eyes, nose, mouth corners) as a set of numbers. Run these through multidimensional scaling (MDS) and reduce the result to a single value that can serve as part of the seed for a key and is specific to a face.

What we really want is to place the face measurements in a vector space and get one number out to use as the seed. Values are rounded along the way so that slightly different pictures of the same face still end up "close" to each other (approach borrowed from https://stackabuse.com/guide-to-multidimensional-scaling-in-python-with-scikit-learn/).
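
The tolerance that rounding buys can be seen in isolation. The sketch below is not part of the pipeline: the pixel coordinates are made up and `quantise_keypoints` is a hypothetical helper. It snaps box-relative keypoints to the nearest multiple of ten, so two detections that differ by a few pixels give identical values, while a coordinate that sits near a rounding boundary can still flip.

```python
def round_ten(num):
    # Snap a coordinate to the nearest multiple of ten.
    return round(num / 10) * 10


def quantise_keypoints(keypoints, box_origin):
    # Express each (x, y) keypoint relative to the face box, then snap it.
    x0, y0 = box_origin
    return [(round_ten(x - x0), round_ten(y - y0)) for x, y in keypoints]


# Hypothetical detections of the same face in two photos (pixel coordinates).
photo_a = quantise_keypoints([(131, 162), (198, 160), (166, 203)], (100, 100))
photo_b = quantise_keypoints([(133, 159), (196, 163), (168, 201)], (100, 100))

# A third detection whose first x coordinate crosses a rounding boundary.
photo_c = quantise_keypoints([(136, 162), (198, 160), (166, 203)], (100, 100))

print(photo_a == photo_b)  # small jitter is absorbed
print(photo_a == photo_c)  # a boundary crossing is not
```

A coarser grid absorbs more jitter but also makes different faces more likely to collapse to the same values, so the step size is a trade-off rather than a fixed choice.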

In [12]:
def face_to_stress(face_file):
  import numpy as np
  from matplotlib import pyplot
  from mtcnn.mtcnn import MTCNN
  from sklearn.manifold import MDS

  # load image from file
  pixels = pyplot.imread(face_file)

  # create the detector, using default weights
  detector = MTCNN()

  # detect faces in the image and use the first one found
  faces = detector.detect_faces(pixels)
  if not faces:
    raise ValueError("no face detected in " + face_file)
  face = faces[0]

  # top-left corner of the face bounding box
  x_offset, y_offset = face['box'][0], face['box'][1]

  def round_ten(num):
    # snap to the nearest multiple of ten to absorb small detection differences
    return round(num / 10) * 10

  # keypoints relative to the bounding box, rounded
  names = ['left_eye', 'right_eye', 'nose', 'mouth_left', 'mouth_right']
  X = np.array([[round_ten(face['keypoints'][name][0] - x_offset),
                 round_ten(face['keypoints'][name][1] - y_offset)]
                for name in names])

  # fit MDS with a fixed random state so the result is repeatable
  mds = MDS(random_state=0)
  mds.fit_transform(X)

  stress = mds.stress_
  print(stress)

  # return the value so make_seed can use it; rounding to one decimal place
  # is an assumed tolerance so that nearby stress values give the same seed
  return round(stress, 1)



# Voice print

Most people have a reasonably distinctive voiceprint that is hard to spoof, especially when combined with a secret phrase. librosa provides some simple utilities to calculate one. Here we use linear predictive coding (https://en.wikipedia.org/wiki/Linear_predictive_coding) to get a fingerprint that tolerates variation between utterances. On its own it is not secure against replay, so it needs to be combined with a spoken secret.

In [3]:
import librosa
import numpy as np

def calculate_voiceprint(audio_file, num_coeffs=5):
  # Load the audio (librosa resamples to 22050 Hz mono by default)
  audio, sr = librosa.load(audio_file)

  # Calculate the linear predictive coefficients (LPCs) for the audio signal
  lpcs = librosa.lpc(audio, order=num_coeffs)

  def round_array(x, round_to):
    # Round each element in the array to the nearest round_to value
    return [round(float(n) / round_to) * round_to for n in x]

  # Keep only the first three rounded coefficients
  return round_array(lpcs, 0.5)[:3]
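
To see what the rounding to the nearest 0.5 in `calculate_voiceprint` does on its own, here is a small sketch. The coefficient values are made up (they are not real LPC output) and `quantise` is a hypothetical stand-in for the rounding step.

```python
def quantise(values, step):
    # Snap each value to the nearest multiple of `step`.
    return [round(v / step) * step for v in values]


# Made-up coefficients from two hypothetical recordings of the same speaker.
take_1 = [1.0, -1.62, 0.71]
take_2 = [1.0, -1.48, 0.66]

print(quantise(take_1, 0.5))
print(quantise(take_2, 0.5))
```

Both takes collapse to the same list. A larger step tolerates more variation between recordings but also makes it more likely that different speakers produce the same values.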


# Voice to text

A rudimentary speech-to-text step recovers the spoken phrase, which adds some extra signal to the seed.

In [14]:
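def voice_text(audio_file):
  import speech_recognition as sr
  from pydub import AudioSegment

  # decode the m4a file (pydub needs ffmpeg for this) and mix down to mono,
  # since AudioData represents single-channel audio
  audio = AudioSegment.from_file(audio_file, format="m4a").set_channels(1)
  audio_data = sr.AudioData(audio.raw_data, audio.frame_rate, audio.sample_width)

  # run offline recognition with CMU Sphinx (pocketsphinx)
  r = sr.Recognizer()
  text = r.recognize_sphinx(audio_data)
  print("text detected: " + text)
  return text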
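
Because the recognised text goes straight into the seed, any difference in casing, punctuation or spacing would change the key. One way to reduce that risk is to normalise the phrase first. The sketch below is an assumption, not something the notebook does: `normalise_phrase` is a hypothetical helper, and it assumes an English phrase made of ASCII letters and digits.

```python
import re


def normalise_phrase(text):
    # Lower-case, drop anything that is not a letter, digit or whitespace,
    # and collapse runs of whitespace to a single space.
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", "", text)
    return " ".join(text.split())


print(normalise_phrase("  If my voice  is my passport. "))
```

Normalisation does not help when the recogniser hears a different word, so the phrase still has to be spoken clearly.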

# Combine into deterministic seed

In [9]:
def make_seed(face_file, voice_file):
  return str(face_to_stress(face_file)) + str(calculate_voiceprint(voice_file)) + voice_text(voice_file)
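
The property that matters here is that identical inputs always give an identical seed, and any change in an input gives a different one. The sketch below illustrates this with hypothetical values. It joins the parts with a separator and hashes them with SHA-256; the notebook itself passes the raw concatenated string to `generate_key`, so the hashing step and the `seed_digest` helper are assumptions of this sketch.

```python
import hashlib


def seed_digest(face_value, voiceprint, phrase):
    # Join the parts with an explicit separator so that neighbouring fields
    # cannot run into each other, then hash to a fixed-length hex string.
    material = "|".join([str(face_value), str(voiceprint), phrase])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# Hypothetical component values, not taken from a real run.
a = seed_digest(0.1, [1.0, -1.5, 0.5], "if my voice is my passport")
b = seed_digest(0.1, [1.0, -1.5, 0.5], "if my voice is my passport")
c = seed_digest(0.1, [1.0, -1.5, 0.5], "if my voice is my password")

print(a == b)  # same inputs, same seed
print(a == c)  # one word differs, so the seed differs
print(len(a))  # hex digest length
```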

# Encrypt from face and voice

Use the deterministic seed to create a private key.

In [15]:
from rsa import generate_key, encrypt, decrypt

secret_key = generate_key(make_seed("test1.jpg", "voice_mic1.m4a"))

public_key = secret_key.publickey().exportKey("PEM")

# eg round trip:
secret = encrypt("Hello World using face as key", public_key)

print(secret)


0.07707175558959707




text detected: if my voice is my passport
b'qCL+DYTW7+l+7oIPnmMnwr6fWm7Ci5YFdC/papv5PQI0l5yACGxYygPO28G0v3OO+svNiX0pZwreAofz0l9dBlcAb+XIJox+7x0afmBcWrD4oiKCHLCwHwgKs2LnH8O3y3Gk1SwmD2u2M0AmBUEB6Hxacolv0aNdDuQZzpg0wuKB9LVbfMPIL83P0+1IkYdXIY+46Ka1WVgzhIQ0A8pN7YkXW91izcIwC0wMLbkJwX4UyiffAFiqvYfACRkMmUENQ/AogznM4AEOGX/ONgZ7U4SneU/m2M0uIt+QsNplVt1GQxgYHa5gkGB8xZu1WKsi6qrmZtvZ99WHgvrxwe91Cg=='


Now we use a different photo of the same face, together with the voice recording, to check that we derive the same key and can decrypt.

In [19]:

# using the other photo we can make the same key
secret_key = generate_key(make_seed("test2.jpg", "voice_mic1.m4a"))

private_key = secret_key.exportKey("PEM")

# and we get the secret back (and can use alternative audio if we are clear enough)
decrypt(secret, private_key)



0.12541588643659957




text detected: if my voice is my passport


b'Hello World using face as key'