Orthographic-DNN

This is the code repository for the research project "Convolutional Neural Networks Trained to Identify Words Provide a Good Account of Visual Form Priming Effects".

Click to see Video Demo

Project objective: to compare human orthographic perception with that of visual DNN models (CNNs and ViTs).

Project outcome: the CNNs predicted the pattern of human priming scores across conditions well, with correlations ranging from τ = .49 (AlexNet) to τ = .71 (ResNet101), all p-values < .01. The CNNs performed similarly to, and often better than, the various orthographic coding schemes and word recognition models. This contrasts with the relatively poor performance of the Transformer networks, whose τ ranged from .25 to .38.

Rationale

  1. Prime conditions: The Form Priming Project includes 28 prime conditions, each specifying how a letter string can be altered to form a prime. For example, in the "final transposition" condition the word $design$ is presented as $deigns$.

  2. Measuring humans' perceptual similarity of words or letter strings: For a human participant, the similarity $sim(s_1, s_2)$ of a target string $s_1$ and a prime string $s_2$ is measured with a Lexical Decision Task (LDT): the prime $s_2$ and then the target $s_1$ are presented one after the other, separated by a fixation cross, and the participant decides as quickly as possible whether $s_1$ is a word. The resulting reaction time is compared with the reaction time obtained when the same target is preceded by an arbitrary control string $s_3$, so that a more effective prime yields a larger score: $sim(s_1, s_2) = RT_{s_1|s_3} - RT_{s_1|s_2}$. For each condition $C$, the mean human priming score $\bar{sim}_{human}(C)$ is obtained by averaging $sim(s_1, s_2)$ over the 420 target-prime pairs of $C$.

  3. Measuring models' perceptual similarity of words or letter strings: For a model, the similarity is the cosine similarity $sim(s_1, s_2) = \cos(v_1, v_2)$ between the vectors $v_1$ and $v_2$, where $v_i$ is the flattened penultimate-layer output produced when the model is fed an image of the string $s_i$ (a code sketch of this computation follows the list). For each condition $C$ and each model $M$, the mean model priming score $\bar{sim}_M(C)$ is obtained by averaging $sim(s_1, s_2)$ over the same 420 target-prime pairs.

    Example

  4. Comparing the perceptual patterns between humans and models: Kendall's rank correlation coefficient $\tau$ is used to measure the correlation between the human and model priming scores across conditions (a sketch of this step also follows the list). The human priming scores are taken from the Form Priming Project, and the model priming scores are calculated by the code in this repository. For a given model $M$, its agreement with human priming over the $n = 28$ conditions is $\tau(M) = \frac{2}{n(n-1)} \sum_{i<j} \text{sign}\big(\bar{sim}_M(C_i) - \bar{sim}_M(C_j)\big)\,\text{sign}\big(\bar{sim}_{human}(C_i) - \bar{sim}_{human}(C_j)\big)$, where $\bar{sim}_M(C)$ and $\bar{sim}_{human}(C)$ are the mean priming scores of model $M$ and of the human participants for condition $C$.
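
A hedged illustration of step 3: the sketch below loads a torchvision ResNet-50 with ImageNet weights, strips its classification head, and compares the flattened penultimate-layer activations of two word images by cosine similarity. The file names, preprocessing values, torchvision weights API (>= 0.13) and the choice of the final pooling output as the "penultimate layer" are assumptions made for illustration, not the repository's exact pipeline.

```python
# Minimal sketch: cosine similarity between flattened penultimate-layer
# activations of two word images. File names, preprocessing values and
# the feature-extraction cut point are illustrative assumptions.
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()

# Keep everything up to and including the global average pool, so the
# forward pass ends at the 2048-d penultimate representation.
feature_extractor = torch.nn.Sequential(*list(model.children())[:-1])

def embed(path: str) -> torch.Tensor:
    """Return the flattened penultimate-layer activation for one image."""
    image = Image.open(path).convert("RGB")
    with torch.no_grad():
        features = feature_extractor(preprocess(image).unsqueeze(0))
    return features.flatten()

v_target = embed("design.png")  # image of the target string
v_prime = embed("deigns.png")   # image of the prime string
print(f"cosine similarity = {F.cosine_similarity(v_target, v_prime, dim=0).item():.3f}")
```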

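A hedged illustration of step 4: given per-condition mean priming scores, Kendall's $\tau$ can be computed with scipy.stats.kendalltau. The condition names and values below are placeholders rather than the study's results, and whether the repository itself relies on SciPy is an assumption.

```python
# Minimal sketch: Kendall's tau between per-condition mean priming
# scores of one model and of the human participants. The condition
# names and numbers are placeholders, not the study's actual data.
from scipy.stats import kendalltau

human_means = {"identity": 55.0, "final transposition": 30.0, "all letters different": 2.0}
model_means = {"identity": 0.95, "final transposition": 0.80, "all letters different": 0.35}

conditions = sorted(human_means)
tau, p_value = kendalltau([human_means[c] for c in conditions],
                          [model_means[c] for c in conditions])
print(f"tau = {tau:.2f}, p = {p_value:.3f}")
```
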
Data

  1. The fonts used to generate the data are stored as .ttf files in assets/fonts.
  2. The human priming data is sourced from the Form Priming Project (FPP), available at this link or here or here.
  3. You can either download the training data and the prime data as zip files or run the generate_data.py script to generate as many images as you like (a rendering sketch follows this list). The configuration for letter translation, rotation, and variation of font and size is here. The zip file of the training data contains 800,000 images, which should be enough for all models used in the current research.
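
As referenced in item 3, here is a minimal sketch of rendering one letter string to an image with random translation, rotation and font size using Pillow. The font file name, canvas size and jitter ranges are illustrative assumptions, not the exact settings of generate_data.py.

```python
# Minimal sketch of rendering a letter string with random translation,
# rotation and font size. Font file name, canvas size and jitter ranges
# are illustrative, not the settings used by generate_data.py.
import random
from PIL import Image, ImageDraw, ImageFont

def render_word(word: str, font_path: str = "assets/fonts/Arial.ttf") -> Image.Image:
    canvas = Image.new("RGB", (224, 224), "white")
    font = ImageFont.truetype(font_path, size=random.randint(28, 44))

    # Draw on a transparent layer so the text can be rotated cleanly.
    layer = Image.new("RGBA", canvas.size, (0, 0, 0, 0))
    ImageDraw.Draw(layer).text((112, 112), word, font=font, fill="black", anchor="mm")
    layer = layer.rotate(random.uniform(-10, 10))

    # Paste onto the white canvas with a small random translation.
    canvas.paste(layer, (random.randint(-15, 15), random.randint(-15, 15)), mask=layer)
    return canvas

render_word("design").save("design.png")
```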

Setup

  1. Install Python 3.10.4.
  2. Install the CUDA driver.
  3. Install PyTorch (torch >= 1.11.0) from pytorch.org.
  4. Run pip install -r requirements.txt.

Psychological Priming Models and Coding Schemes

  1. The LTRS model simulator is available at AdelmanLab.
  2. The Interactive Activation Model and the Spatial Coding Model are implemented using this calculator developed by Prof. Colin Davis.

Model Parameters

The tested models are AlexNet, DenseNet169, EfficientNet-B1, ResNet50, ResNet101, VGG16, VGG19, ViT-B/16, ViT-B/32, ViT-L/16 and ViT-L/32. The models are initialized with ImageNet pre-trained weights from Torchvision; the code for loading the weights is in tune.py (a hedged sketch follows). The trained parameters are available here.
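
A sketch of that initialization step, under assumptions: it loads ImageNet pre-trained weights through torchvision's weights API (>= 0.13) and replaces the classification head with a word-identification layer. The 1,000-word class count and the two models shown are illustrative, and tune.py remains the authoritative code.

```python
# Minimal sketch: initialise a network with ImageNet pre-trained weights
# from Torchvision and swap its classifier for a word-identification
# head. NUM_WORDS and the models shown are illustrative assumptions.
import torch.nn as nn
from torchvision import models

NUM_WORDS = 1000  # illustrative vocabulary size, not the repository's

def build_model(name: str = "resnet50") -> nn.Module:
    if name == "resnet50":
        model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        model.fc = nn.Linear(model.fc.in_features, NUM_WORDS)
    elif name == "vit_b_16":
        model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
        model.heads.head = nn.Linear(model.heads.head.in_features, NUM_WORDS)
    else:
        raise ValueError(f"unknown model: {name}")
    return model

model = build_model("resnet50")
```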

Additional Findings

  1. Layer-wise correlation coefficients: to be added.

Acknowledgement

The project was conducted under the auspices of the University of Bristol Mind & Machine Lab and supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 741134).

Contact

For further instructions and enquiries, please contact Don Yin.

License

MIT License (see LICENSE file)
