TAALES_ES

Code for the Spanish version of TAALES (TAALES Español)

This page includes the code for the Spanish Version of TAALES (TAALES Español) used in Diez-Ortega & Kyle (2023).

Citation

If you use TAALES_ES in your work, please cite the following paper:

Díez-Ortega, M., & Kyle, K. (2023). Measuring the development of lexical richness of L2 Spanish: A longitudinal learner corpus study. Studies in Second Language Acquisition, 1-31. https://doi.org/10.1017/S0272263123000384

Getting Started

In order to use this code, you will first need to have a working Python 3 environment. If this is your first time using Python, we suggest using the Anaconda distribution of Python.

You will also need to install (Spacy)[https://spacy.io/usage] and download/install the "es_core_news_sm" Spanish language model for Spacy. Spacy has excellent tutorials for installing both the Spacy package itself and its language models.

Finally, you will need to install the lexical-diversity package.

You can install this using pip:

pip install lexical-diversity

Running TAALES_ES

Once Spacy, the "es_core_news_sm" model, and lexical-diversity are installed, TAALES_ES can be run.

Make sure that your working directory includes the TAALES_ES_0501 script and the data files. Then, open a Python interpreter and run the following code:

import TAALES_ES_0501 as tes
tes.TAALES_ES("path_to_texts_files/","results.csv") #where path_to_texts_files/ is the name of the folder that your text files are in

TAALES_ES will then process each text file in the target folder and produce a .csv file with the results for each text AND three files that include the index scores for each word/bigram in each file.

Brief Description of Indices

Index Name	Index Category	Basic Description	Reference Corpus/Database	Details
filename	Basic Description	name of processed text file	n/a	n/a
nwords	Basic Description	number of word tokens	n/a	n/a
ncontentwords	Basic Description	number of content word tokens	n/a	n/a
MATTR_lemmas	Lexical Diversity	moving average type-token ratio (50-word window)	n/a	n/a
lemma_frequency_log_AW	Word Frequency	Average reference-corpus frequency (log-transformed) for all words	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
lemma_frequency_log_CW	Word Frequency	Average reference-corpus frequency (log-transformed) for content words	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
lemma_frequency_log_verbs	Word Frequency	Average reference-corpus frequency (log-transformed) for verbs	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
lemma_frequency_log_nouns	Word Frequency	Average reference-corpus frequency (log-transformed) for nouns	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_freq_log	Bigram Frequency	Average reference-corpus frequency (log-transformed) for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_soa_dp_LR	Bigram Strength of Association	Average delta p score (left word as cue) for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_soa_dp_RL	Bigram Strength of Association	Average delta p score (right word as cue) for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_soa_dp_max	Bigram Strength of Association	Average max delta p score max(right word as cue, left word as cue) for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_soa_MI	Bigram Strength of Association	Average mutual information score for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
bigram_soa_T	Bigram Strength of Association	Average T score for bigrams	Spanish Corpus of the Web (ESCOW14; Schäfer, 2015; Schäfer & Bildhauer, 2012)	Section ax01 (450 million words)
valency_CW	Psycholinguistic Information	Average valency score for content words	Guash et al., 2016	Includes ratings for 1,400 words
arousal_CW	Psycholinguistic Information	Average arousal score for content words	Guash et al., 2017	Includes ratings for 1,400 words
concreteness_CW	Psycholinguistic Information	Average concreteness score for content words	Guash et al., 2018	Includes ratings for 1,400 words
imageability_CW	Psycholinguistic Information	Average imageability score for content words	Guash et al., 2019	Includes ratings for 1,400 words
availability_CW	Psycholinguistic Information	Average availability score for content words	Guash et al., 2020	Includes ratings for 1,400 words
familiarity_CW	Psycholinguistic Information	Average familiarity score for content words	Guash et al., 2021	Includes ratings for 1,400 words

License

TAALES_ES is available for use under a Creative Commons Attribution-NonCommercial-Sharealike license (4.0)

For a summary of this license (and a link to the full license) click here.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
ESM_database.txt		ESM_database.txt
README.md		README.md
TAALES_ES_0501.py		TAALES_ES_0501.py
escowax01_bi_soa.txt		escowax01_bi_soa.txt
escowax01_pos_freq.txt		escowax01_pos_freq.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TAALES_ES

Citation

Getting Started

Running TAALES_ES

Brief Description of Indices

License

About

Releases

Packages

Languages

LCR-ADS-Lab/TAALES_ES

Folders and files

Latest commit

History

Repository files navigation

TAALES_ES

Citation

Getting Started

Running TAALES_ES

Brief Description of Indices

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages