eluent

A tool for conducting hybrid microgenetic analysis with sensor data. Eluent consists of a Python package for data analysis and processing and a Ruby web app interface for post-study qualitative analysis. Eluent accompanies our paper:

Cesar Torres, Matthew Jörke, Emily Hill, Eric Paulos. Hybrid Microgenetic Analysis: Using Activity Codebooks to Identify and Characterize Creative Process, Creativity & Cognition 2019.

Each sensor data file should be a JSON object structured as follows (the # comments are explanatory only and are not valid JSON):

{
    "timestamp":            # start of session (Unix time)
    "sampling_rate":        # sensor sampling rate in ms
    "data": [...]           # an array of sensor data sampled @ sampling_rate
}
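The uniformly sampled format implies a timestamp for every element of data. The sketch below recovers those timestamps. It assumes that timestamp is in Unix seconds and that sampling_rate is the interval between samples in milliseconds, as the field comments suggest; the helper name sample_times is hypothetical and not part of eluent.

```python
import json

def sample_times(record):
    """Return the Unix time (seconds) of each sample in a uniformly sampled record.
    Assumes `timestamp` is in seconds and `sampling_rate` is the sample
    interval in milliseconds."""
    start = record["timestamp"]
    step = record["sampling_rate"] / 1000.0  # ms -> s
    return [start + i * step for i in range(len(record["data"]))]

# Hypothetical session: four samples taken every 500 ms.
record = json.loads(
    '{"timestamp": 1546300800, "sampling_rate": 500, "data": [0.1, 0.4, 0.2, 0.9]}'
)
times = sample_times(record)
print(times)  # [1546300800.0, 1546300800.5, 1546300801.0, 1546300801.5]
```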

Asynchronously sampled data (e.g. Jupyter notebook events) can alternatively be represented as

{
    "data": {           
        "t": [...]          # array of time values (Unix time)
        "y": [...]          # array of sample values
    }                       # --> value y[i] was sampled at time t[i]
}
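Before asynchronous streams can be combined with uniformly sampled ones, they have to be placed on a common time grid. The README does not say how eluent time-aligns them; the sketch below shows one common option, a zero-order hold in which each grid point takes the most recent value sampled at or before it. The function name resample_hold and its parameters are hypothetical.

```python
from bisect import bisect_right

def resample_hold(t, y, start, step, n):
    """Resample asynchronous (t, y) samples onto the uniform grid
    start, start + step, ..., using a zero-order hold: each grid point takes
    the latest value sampled at or before it (None before the first sample).
    Assumes t is sorted in ascending order."""
    out = []
    for i in range(n):
        g = start + i * step
        k = bisect_right(t, g)  # number of samples with t <= g
        out.append(y[k - 1] if k > 0 else None)
    return out

# Hypothetical notebook events at irregular times (Unix seconds).
events = {"data": {"t": [100.0, 100.7, 102.2], "y": [1, 2, 3]}}
grid = resample_hold(events["data"]["t"], events["data"]["y"], start=100.0, step=0.5, n=6)
print(grid)  # [1, 1, 2, 2, 2, 3]
```

A hold keeps categorical event values (such as notebook actions) intact; linear interpolation would be an alternative for continuous sensor readings.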

MTS Matrix Construction

MTS matrix tools are contained in the MTS object within the dataset module. Properly formatted input data can be automatically read, time-aligned, normalized, and sampled into windows of size L.

mts = dataset.MTS(users, 'jupyter')     # construct the MTS matrix for the Jupyter features
L = mts.time2L(8)                       # convert a duration into a window size L (in samples) using the sampling rate
mts.extract_samples(L, normalize=True)  # extract windows of size L and normalize each sample
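To make the windowing step concrete, here is a minimal univariate sketch of what time2L and extract_samples do conceptually. It is not eluent's implementation: it assumes the argument to time2L is a duration in seconds, that windows are taken with stride 1, and that "per sample normalization" means z-normalizing each window independently. Both function names below are hypothetical.

```python
import statistics

def time_to_window_length(seconds, sampling_interval_ms):
    """Hypothetical analogue of time2L: the number of samples spanning `seconds`."""
    return max(1, round(seconds * 1000 / sampling_interval_ms))

def extract_windows(series, L, normalize=True):
    """Slide a length-L window (stride 1) over a 1-D series. With normalize=True,
    each window is z-normalized using its own mean and population stddev."""
    windows = []
    for i in range(len(series) - L + 1):
        w = series[i:i + L]
        if normalize:
            mu = statistics.fmean(w)
            sd = statistics.pstdev(w)
            # A constant window has zero spread; map it to all zeros.
            w = [(x - mu) / sd if sd > 0 else 0.0 for x in w]
        windows.append(w)
    return windows

L = time_to_window_length(2, 500)  # 2 s at one sample every 500 ms -> 4 samples
windows = extract_windows([1, 2, 3, 4, 5, 5, 5, 5], L)
print(L, len(windows))  # 4 5
```

Normalizing each window separately makes windows comparable by shape rather than by absolute level, which is presumably why it is applied before codebook construction.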


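An activity codebook is then built from the MTS matrix in three phases: distillation, extraction, and application.

codebook = activity.Codebook(mts)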

(1) Distillation

The distillation phase uses an adaptive greedy centers algorithm to reduce the size of the dataset. The cull_threshold ε controls how many samples are discarded at every step; a higher cull threshold produces fewer samples.

codebook.distill(cull_threshold=10)

After subsampling, the pruned dataset is hierarchically clustered. Because this step is computationally intensive, set the cull threshold so that approximately 1,000 samples remain.
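The sketch below illustrates distillation and how one might pick a cull threshold for a target sample count. It substitutes a plain, non-adaptive leader-style greedy culling (keep a center, drop every remaining sample within ε of it, repeat) for eluent's adaptive greedy centers algorithm, whose center-selection rule may differ. The helper tune_cull_threshold is hypothetical: it bisects ε and only ever returns a threshold whose survivor count is within budget, since a larger ε discards more samples.

```python
import math
import random

def greedy_cull(samples, eps):
    """Leader-style greedy culling: keep the first remaining sample as a center,
    drop every remaining sample within eps (Euclidean) of it, and repeat."""
    remaining = list(samples)
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(c)
        remaining = [s for s in remaining if math.dist(c, s) > eps]
    return centers

def tune_cull_threshold(samples, target=1000, iters=20):
    """Bisect eps so that culling keeps at most `target` samples.
    `hi` always satisfies the budget: it starts at the largest distance from
    the first sample, where culling keeps exactly one center."""
    lo = 0.0
    hi = max(math.dist(samples[0], s) for s in samples)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if len(greedy_cull(samples, mid)) <= target:
            hi = mid  # within budget: try a smaller threshold
        else:
            lo = mid  # too many samples survive: raise the threshold
    return hi

# Hypothetical data: 600 random 2-D points, aiming for at most 200 survivors.
random.seed(0)
points = [(random.random(), random.random()) for _ in range(600)]
eps = tune_cull_threshold(points, target=200)
print(eps, len(greedy_cull(points, eps)))
```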

(2) Extraction

The extraction phase identifies K maximally distinctive codewords by pruning the dendrogram at the K-th level.

codebook.extract(K=5)
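Pruning a dendrogram so that K branches remain yields the same partition as stopping agglomerative clustering once K clusters are left. The pure-Python sketch below shows this with average linkage and takes each cluster's centroid as its codeword; eluent's actual linkage method and codeword definition (centroid, medoid, or otherwise) are assumptions here, and extract_codewords is a hypothetical name. The naive search is cubic in the number of samples, which is one reason distillation first reduces the dataset.

```python
import math

def average_linkage(a, b):
    """Mean pairwise Euclidean distance between two clusters of points."""
    return sum(math.dist(p, q) for p in a for q in b) / (len(a) * len(b))

def extract_codewords(points, K):
    """Average-linkage agglomerative clustering, merged until K clusters remain
    (equivalent to cutting the full dendrogram into K branches).
    Returns the centroid of each cluster as its codeword."""
    clusters = [[p] for p in points]
    while len(clusters) > K:
        # Find the closest pair of clusters and merge them.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i is unaffected
    return [tuple(sum(coord) / len(cl) for coord in zip(*cl)) for cl in clusters]

codewords = extract_codewords([(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)], K=3)
print(codewords)  # [(0.0, 0.5), (10.0, 10.5), (20.0, 0.0)]
```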

(3) Application

Finally, the codebook is applied to the original MTS matrix, producing a Chromatogram object.

chromatogram = codebook.apply()
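Conceptually, applying a codebook labels every window with its closest codeword, turning each user's session into a sequence of codeword indices. A minimal sketch, assuming nearest-codeword assignment under Euclidean distance (eluent's distance measure is not stated in this README; apply_codebook is a hypothetical name):

```python
import math

def apply_codebook(windows, codewords):
    """Label each window with the index of its nearest codeword (Euclidean)."""
    return [min(range(len(codewords)), key=lambda k: math.dist(w, codewords[k]))
            for w in windows]

codewords = [(0.0, 0.5), (10.0, 10.5)]
# Hypothetical per-user windows (already extracted and normalized).
user_windows = {"u1": [(0, 0), (9, 9), (1, 1)], "u2": [(11, 11)]}
sequences = {user: apply_codebook(ws, codewords) for user, ws in user_windows.items()}
print(sequences)  # {'u1': [0, 1, 0], 'u2': [1]}
```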

Chromatogram Analysis

Rendering

To generate and visualize the full chromatogram, the chromatogram object must first be rendered. Rendering applies window smoothing, clusters users on a chosen feature vector, and recolors the chromatogram based on clustering statistics.

chromatogram.render(smoothing_window=3, segment_on='freqs', reorder_colors=True)

- smoothing_window: the kernel size W used during window smoothing
- segment_on: the feature vector η on which users are segmented
  - freqs: codeword frequency vector
  - logfreqs: log of the codeword frequency vector
  - markov: codeword transition matrix
  - width: bandwidth means and standard deviations
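The README does not specify the smoothing kernel. For a sequence of discrete codeword labels, one natural choice is a mode (majority-vote) filter of size W, sketched below as an assumption; smooth_labels is a hypothetical name. It removes isolated one-step codewords while keeping longer runs.

```python
from collections import Counter

def smooth_labels(labels, W=3):
    """Mode filter with an odd kernel size W: each position takes the most common
    label within W // 2 positions on either side (window truncated at the ends).
    Ties go to the label encountered first in the window."""
    half = W // 2
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out

smoothed = smooth_labels([0, 0, 1, 0, 0, 2, 2, 1, 2, 2], W=3)
print(smoothed)  # [0, 0, 0, 0, 0, 2, 2, 2, 2, 2]
```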

Statistics & Analysis

The Chromatogram object supports the following methods for codeword analysis.

Method	Returns
`get_length_stats`	dict from codeword number to (mean bandwidth, bandwidth stddev)
`get_codeword_length_distribution`	dict from codeword number to raw list of all bandwidth lengths
`get_lengths_per_user`	dict from user to dict of raw bandwidth lengths per codeword
`get_markov_model`	dict from user to Markov transition matrix
`get_freqs_per_user`	dict from user to codeword frequency vector
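The statistics above can be computed from a single user's codeword sequence, as in the sketch below. It assumes that a codeword's "bandwidth" is the length of a contiguous run of that codeword, that frequencies are fractions of time steps, and that the Markov matrix counts self-transitions; eluent's definitions may differ, and all function names are hypothetical. The per-user methods would apply the same computations to each user's sequence.

```python
import statistics
from collections import defaultdict
from itertools import groupby

def run_lengths(seq):
    """dict codeword -> list of lengths of its contiguous runs ("bandwidths")."""
    runs = defaultdict(list)
    for code, grp in groupby(seq):
        runs[code].append(sum(1 for _ in grp))
    return dict(runs)

def length_stats(seq):
    """dict codeword -> (mean run length, population stddev of run length)."""
    return {k: (statistics.fmean(v), statistics.pstdev(v)) for k, v in run_lengths(seq).items()}

def freqs(seq, K):
    """Fraction of time steps spent in each of the K codewords."""
    return [seq.count(k) / len(seq) for k in range(K)]

def markov(seq, K):
    """Row-normalized K x K matrix of transitions between consecutive codewords.
    Self-transitions are counted; rows with no outgoing transitions stay zero."""
    counts = [[0] * K for _ in range(K)]
    for a, b in zip(seq, seq[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

seq = [0, 0, 1, 1, 1, 0, 2, 2]
print(run_lengths(seq))  # {0: [2, 1], 1: [3], 2: [2]}
print(freqs(seq, 3))     # [0.375, 0.375, 0.25]
```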

Exporting

The qualitative analysis interface can annotate each user's screen capture with the current codeword. To export this codeword data, generate one time-tracked VTT (subtitle) file per user:

save_path = 'vtt'
prefix = 'jupyter'
dataset.save_subtitles(save_path, chromatogram, prefix)
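A VTT file is plain text: a WEBVTT header followed by timed cues. The sketch below turns a codeword sequence into such a document, emitting one cue per run of identical codewords. It assumes each codeword covers a fixed step of step_seconds from the start of the session and that the cue text is simply the codeword number; the function names and cue wording are hypothetical, not what save_subtitles necessarily writes.

```python
from itertools import groupby

def fmt_ts(seconds):
    """Format seconds as a WebVTT timestamp HH:MM:SS.mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

def codewords_to_vtt(seq, step_seconds):
    """Build a WebVTT document with one cue per run of identical codewords.
    Assumes seq[i] covers [i * step, (i + 1) * step) from the session start."""
    lines = ["WEBVTT", ""]
    i = 0
    for code, grp in groupby(seq):
        n = sum(1 for _ in grp)
        start, end = i * step_seconds, (i + n) * step_seconds
        lines += [f"{fmt_ts(start)} --> {fmt_ts(end)}", f"codeword {code}", ""]
        i += n
    return "\n".join(lines)

vtt = codewords_to_vtt([0, 0, 1, 1, 1, 0], step_seconds=0.5)
print(vtt)
```

The resulting string can be written to a .vtt file with any text-file API and loaded as a subtitle track alongside the user's screen capture.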

Visualizations

The following methods are supported:

Codebook.visualize()
Codebook.visualize_linkage()
Chromatogram.visualize()
Chromatogram.plot_user()
Chromatogram.plot_freq_diff()
(in development) Chromatogram.plot_bandwidth_diff()
(in development) Chromatogram.plot_markov_diff()

See the Jupyter notebook example.ipynb for examples of each visualization method.

Note that all visualizations use the open source CMU Bright font.

Logo attribution: Binary Code by Taylan Sentürk, Flask by iconsmind.com from the Noun Project
