# Experiment probing the possibility of using facial expressions as input data
Taken from "Investigating a User Controlled Narrative through Interactive Technologies Applied to Cinematographic Storytelling" by Hannes Andersson, which can be found here: http://donuan.tv/academic.htm
In this experiment, a characteristic of a facial expression controls a particular video action: a smile/frown controls the frame number in a video of a smile/frown. The aim is to better understand both the tool and the interaction itself. I chose this interaction, even though frame-by-frame playback might not be the most useful functionality in an interactive film, because the similarity between the controlling action and the controlled object makes it easier to analyse performance factors such as correspondence, delay, and general performance. The experiment uses FaceOSC in combination with Pure Data Extended (Pd), and relies on the FaceOSCReceiver patch (Appendix 1) by Danomatica to receive the facial metrics.
# Outline of the Experiment
The FaceOSCReceiver patch receives the OSC message stream from FaceOSC and divides it into the categories left eye, right eye, nose, and mouth, as well as subcategories such as height and width (Figure 2).
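The splitting that the FaceOSCReceiver patch performs can be approximated outside Pd. The Python sketch below is not the patch itself: it assumes FaceOSC addresses of the form `/gesture/<feature>/<metric>` (for example `/gesture/mouth/width`), and the function names `split_address` and `group_metrics` are hypothetical.

```python
from typing import Optional


def split_address(address: str) -> Optional[tuple[str, str]]:
    """Split '/gesture/<feature>/<metric>' into (feature, metric).

    Single-level gestures such as '/gesture/jaw' get the metric name 'value'.
    Any other address (e.g. '/found' or '/pose/scale') returns None.
    """
    parts = address.strip("/").split("/")
    if len(parts) < 2 or parts[0] != "gesture":
        return None
    feature = parts[1]
    metric = "/".join(parts[2:]) or "value"
    return feature, metric


def group_metrics(messages: list[tuple[str, float]]) -> dict[str, dict[str, float]]:
    """Keep the latest value of each metric, grouped by facial feature."""
    grouped: dict[str, dict[str, float]] = {}
    for address, value in messages:
        key = split_address(address)
        if key is None:
            continue  # not a gesture message; ignored in this sketch
        feature, metric = key
        grouped.setdefault(feature, {})[metric] = value
    return grouped
```

Keeping only the latest value per metric mirrors how a Pd number box simply shows the most recent reading.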
The face is mapped in FaceOSC, and the facial metrics are sent as an OSC message stream, which Pd receives as numeric values (Figure 3).
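To show what "received as numeric values" involves, here is a minimal decoder for a single OSC 1.0 message, following the layout in the OSC specification: a null-padded address string, a type-tag string beginning with a comma, then big-endian arguments. It is an illustrative sketch that only handles float (`f`) and int (`i`) arguments and ignores bundles; in the experiment this work is done by the receiving objects in Pd, not by code like this.

```python
import struct


def _read_osc_string(data: bytes, offset: int) -> tuple[str, int]:
    """Read a null-terminated OSC string starting at a 4-byte-aligned offset.

    Returns the string and the offset of the next 4-byte-aligned field.
    """
    end = data.index(b"\x00", offset)
    text = data[offset:end].decode("ascii")
    # String + terminating null is padded up to a multiple of 4 bytes.
    return text, (end + 4) & ~3


def decode_osc_message(data: bytes) -> tuple[str, list]:
    """Decode one OSC message carrying float ('f') and int ('i') arguments."""
    address, offset = _read_osc_string(data, 0)
    type_tags, offset = _read_osc_string(data, offset)
    if not type_tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in type_tags[1:]:
        if tag == "f":
            (value,) = struct.unpack_from(">f", data, offset)  # big-endian float32
        elif tag == "i":
            (value,) = struct.unpack_from(">i", data, offset)  # big-endian int32
        else:
            raise ValueError(f"unsupported OSC type tag {tag!r}")
        args.append(value)
        offset += 4
    return address, args
```

In a live setup the bytes would come from `socket.recvfrom` on a UDP socket bound to the port configured in FaceOSC.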
Since the smile/frown is the controlling action, the width of the mouth is the most relevant facial metric. When the mouth is compressed, the value received is lower than when the mouth smiles (Figure 4).
A combination of objects is added to the patch to make it easier to determine the minimum, maximum, and idle values (Figure 5).
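Finding the minimum, maximum, and idle values amounts to simple running statistics. The hypothetical `MouthCalibration` class below records the extremes over all samples and takes the median of samples captured while the face is at rest as the idle value; using the median is an assumption of this sketch, chosen because it is less sensitive to occasional tracking glitches than the mean.

```python
from statistics import median


class MouthCalibration:
    """Hypothetical helper: running min/max of a metric plus an idle baseline."""

    def __init__(self) -> None:
        self.minimum = float("inf")
        self.maximum = float("-inf")
        self.idle_samples: list[float] = []

    def observe(self, value: float, idle: bool = False) -> None:
        """Record a sample; pass idle=True while the face is at rest."""
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)
        if idle:
            self.idle_samples.append(value)

    @property
    def idle(self) -> float:
        """Median of the idle samples (an assumption; less sensitive to glitches)."""
        if not self.idle_samples:
            raise ValueError("no idle samples recorded yet")
        return median(self.idle_samples)
```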
As the intention of the experiment is to control a smile/frown in a video, a short video capturing this action is recorded (Figure 6).
A configuration of objects is created to import and play back the video (using GEM, the graphics library bundled with Pd Extended) and to determine the relevant number of frames. The video has 116 frames, but since the smile peaks at frame 0 and the frown peaks at frame 31, frames 0 to 31 form the relevant frame range (Figure 7).
A configuration of objects is created to remap the relevant range of numeric data generated by FaceOSC to the relevant range of frames. A slider is added to receive the remapped data, with a range corresponding to the total number of frames in the video (Figure 8).
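The remapping can be expressed as a clamped linear map. Since a compressed mouth gives a lower width value while the smile peak sits at frame 0 and the frown peak at frame 31, the map is inverted: the widest calibrated mouth lands on frame 0 and the narrowest on frame 31. The sketch assumes the calibrated minimum and maximum widths are already known; the function name `width_to_frame` is hypothetical.

```python
SMILE_FRAME = 0   # smile peak in the recorded clip
FROWN_FRAME = 31  # frown peak in the recorded clip


def width_to_frame(width: float, width_min: float, width_max: float) -> int:
    """Map a mouth-width reading onto the smile/frown frame range.

    width_max (widest smile) -> SMILE_FRAME, width_min (compressed mouth)
    -> FROWN_FRAME; readings outside the calibrated range are clamped.
    """
    if width_max <= width_min:
        raise ValueError("width_max must be greater than width_min")
    t = (width - width_min) / (width_max - width_min)  # 0.0 at frown, 1.0 at smile
    t = min(max(t, 0.0), 1.0)
    return round(FROWN_FRAME + t * (SMILE_FRAME - FROWN_FRAME))
```

Clamping keeps an unusually wide or narrow reading from pushing playback outside the smile/frown segment of the clip.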
The slider is connected to the object controlling the video playback (Figure 9).
As a result, the smile in the video is controlled by the width of the user's mouth: the degree of the smile in the video corresponds to the degree of the smile on the user's face (Figures 10 and 11).
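The whole chain from mouth width to displayed frame can be simulated as a generator. Two details in this sketch are additions not described in the experiment: the generator only emits a frame when it differs from the previous one (comparable in spirit to Pd's `[change]` object), and it offers an optional exponential smoothing factor, which reduces jitter at the cost of added delay. The function name and parameters are hypothetical.

```python
from collections.abc import Iterable, Iterator


def frames_from_widths(
    widths: Iterable[float],
    width_min: float,
    width_max: float,
    smile_frame: int = 0,
    frown_frame: int = 31,
    smoothing: float = 0.0,
) -> Iterator[int]:
    """Turn a stream of mouth widths into frame changes for the player.

    smoothing is an exponential smoothing factor in [0, 1): 0 disables it,
    values closer to 1 reduce jitter but add delay.
    """
    if width_max <= width_min:
        raise ValueError("width_max must be greater than width_min")
    if not 0.0 <= smoothing < 1.0:
        raise ValueError("smoothing must be in [0, 1)")
    smoothed = None
    previous = None
    for width in widths:
        if smoothed is None:
            smoothed = width
        else:
            smoothed = smoothing * smoothed + (1.0 - smoothing) * width
        t = min(max((smoothed - width_min) / (width_max - width_min), 0.0), 1.0)
        frame = round(frown_frame + t * (smile_frame - frown_frame))
        if frame != previous:  # only emit changes, like Pd's [change]
            previous = frame
            yield frame
```

With `smoothing=0.0` every reading maps directly to a frame, which matches the direct slider connection in the experiment.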
# Conclusion of the Experiment
The main strength of this interaction is the intuitive nature of the action itself. It requires no explanation, and the user does not have to engage with it actively, but only behave as s/he normally would. This suits interaction in fiction film, as it does not distract the viewer from the film. Because the viewer is not consciously acting, the consequences of a given expression also do not have to be consistent. The same facial expression can therefore trigger different reactions at different times, which makes it possible to distinguish, for example, between a sad face in a tragic scene and the same face in a comic scene. The interaction is also very flexible and could be extended to control when cuts are made, how much weight is given to various aspects of the story, how long dialogues last, how frequently characters appear, and the general outcome of the story. Another strength is that, unlike forms of interaction that require specific infrastructure, it could easily be reproduced on laptops, using the built-in camera as the interaction interface.
The limitations of this type of interaction are mainly dictated by how accurately the correspondence between facial expressions and emotions can be defined. A wider range of emotions able to influence the film would allow for a more dynamic interaction, provided that each could be clearly defined. Too many, however, could easily result in emotions being misinterpreted. Some facial expressions have very subtle characteristics, which would require very specific definitions; if expressions are defined too specifically, they are hard to apply universally. Conversely, an expression defined too generally will too often be misread.
Another problem is that the expressions of some emotions may resemble those of emotions of an entirely different nature, for example anticipation and confusion. If anticipation is interpreted as confusion, and therefore triggers further explanation of the plot, it would mainly frustrate the viewer. If this frustration in turn makes the story change track, so that the anticipated action never arrives, the interactive system would have failed altogether.
Facial expressions that cannot be clearly defined, either by their characteristics or by context, are better left out, as they might break the system. Limiting interaction to moments in which the context is clear lowers the risk of error, and therefore allows a wider range of facial expressions to influence the film. Emotions with very specific characteristics and a small margin for error should not be assigned to control important actions, as they are liable to go unread.
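The design principles above (context-dependent reactions, interaction only where the context is clear, stricter requirements for important actions) can be sketched as a small decision function. Everything in it is hypothetical: the scene and expression labels, the reaction names, the thresholds, and the assumption that some classifier supplies an expression label with a confidence score, which FaceOSC itself does not provide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reading:
    expression: str    # label from a hypothetical expression classifier
    confidence: float  # hypothetical confidence score in [0, 1]


# Hypothetical table: the same expression can trigger different reactions per scene.
REACTIONS = {
    ("tragic", "sad"): "linger_on_close_up",
    ("comic", "sad"): "cut_to_punchline",
    ("plot_twist", "confused"): "show_explanation",
}

# Hypothetical: reactions that change the course of the story need a surer reading.
IMPORTANT_REACTIONS = {"show_explanation"}


def choose_reaction(
    scene: str,
    reading: Reading,
    interactive: bool,
    min_confidence: float = 0.6,
    min_confidence_important: float = 0.9,
) -> Optional[str]:
    """Return a reaction name, or None to let the film continue unchanged."""
    if not interactive:
        return None  # interaction is disabled where the context is unclear
    reaction = REACTIONS.get((scene, reading.expression))
    if reaction is None:
        return None  # expression not assigned to anything in this scene
    if reaction in IMPORTANT_REACTIONS:
        threshold = min_confidence_important
    else:
        threshold = min_confidence
    return reaction if reading.confidence >= threshold else None
```

Returning `None` as the default keeps the failure mode mild: an uncertain reading leaves the film unchanged instead of steering it in the wrong direction.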
FaceOSC works well if the face is well lit. It has problems in low lighting, where it often loses or misinterprets the face. This is a serious limitation for a tool to be used in cinema, where low lighting is preferred. Using an infrared camera to collect the facial data would solve this issue, although it would probably require slight modifications to the code for the face readings to be accurate. Face tracking is also easily lost if the face moves too much; this is, however, a lesser limitation in cinema, where viewers usually do not move much. Another limitation is the low polygon resolution of the tracked 3D face. This was not an issue in this specific experiment, but could be problematic when attempting to measure subtler facial expressions defined by a combination of readings, in which case the resolution would need to be improved. Furthermore, the tool cannot correctly interpret the faces of people with beards or glasses. This is probably the biggest limitation, because it seems not to be specific to the tool but a problem with the technique itself, which might be difficult to solve.
Pd Extended is a good tool for prototyping, as it is relatively easy to use while still being highly flexible. The software is open source and has an active online community whose members generally share their code publicly, along with documentation of their work. This makes it easier to find solutions to problems encountered, and provides practical examples of possible implementations of different functions. The key strength of Pd is that its flexibility allows for experimenting with interconnections between functions, software, and hardware, as in this case, where Pd established the connection between FaceOSC and the video playback.
For video processing, however, the tool is limited to lower resolutions, as it has playback problems and is liable to crash when playing high-definition video. Consequently, it would not be a suitable tool if the functionality in this example were to be applied in the production of a face-controlled film. One could work around this issue by having other software play back the video while still controlling it from Pd, a process I illustrated in chapters 5.7 and 5.8. If the work is to be shown in several places, this is not very practical, as it requires combining several different programs; a stand-alone product would be more convenient. Pd's commercial equivalent, MaxMSP, could export the part handled here by Pd as a standalone application, but FaceOSC would still need to run separately. A dedicated application would therefore need to be designed if the film were to be publicly distributed. As FaceTracker exists as an add-on for openFrameworks (ofxFaceTracker), developed by Kyle McDonald, who also created FaceOSC, openFrameworks would be a suitable platform for such an application.
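If another program plays the video, the controlling side only needs to send it the current frame number. The sketch below encodes that as an OSC 1.0 message with a single int argument and sends it over UDP. Python stands in for Pd here, and the address `/video/frame`, the host, and port 9000 are hypothetical values that the receiving player would have to be configured to match.

```python
import socket
import struct


def _osc_string(text: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to a multiple of 4 bytes."""
    raw = text.encode("ascii")
    return raw + b"\x00" * (4 - len(raw) % 4)


def encode_frame_message(frame: int, address: str = "/video/frame") -> bytes:
    """Build an OSC message with a single int32 argument (hypothetical address)."""
    return _osc_string(address) + _osc_string(",i") + struct.pack(">i", frame)


def send_frame(frame: int, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the frame number over UDP to a player listening on host:port (hypothetical)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_frame_message(frame), (host, port))
```

Opening a new socket per call keeps the sketch short; a long-running controller would keep one socket open and send only when the frame changes.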