R script to tokenize, split and arrange sentences by order of complexity
ordersentences.R tokenizes a text and splits it into sentences, then computes sentence length and the most common terms and orders the sentences accordingly. To see the results, run the script and type the commands listed in the comments at the end of the script. The example text is cyntaf2.txt. To use any other text, change the file name in ordersentences.R.
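The general idea can be illustrated with a minimal base R sketch. This is not the code from ordersentences.R: the sample text, the column names (sentence, n_tokens, mean_freq) and the ordering rule (shorter sentences first, then sentences with more frequent terms) are assumptions made only for illustration.

```r
# Hypothetical sample text; the real script reads a file such as cyntaf2.txt.
text <- "The cat sat. The cat saw the dog on the mat. A dog barked."

# Split into sentences at whitespace that follows ., ! or ?
sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))

# Tokenize: lower-case, split on runs of non-alphanumeric characters,
# and drop any empty strings left over from the split.
tokenize <- function(s) {
  tokens <- unlist(strsplit(tolower(s), "[^[:alnum:]]+"))
  tokens[tokens != ""]
}
tokens_by_sentence <- lapply(sentences, tokenize)

# Count how often each term occurs in the whole text.
term_freq <- table(unlist(tokens_by_sentence))

# One row per sentence: token count and mean frequency of its terms.
sentence_stats <- data.frame(
  sentence  = sentences,
  n_tokens  = lengths(tokens_by_sentence),
  mean_freq = vapply(
    tokens_by_sentence,
    function(tk) mean(as.numeric(term_freq[tk])),
    numeric(1)
  ),
  stringsAsFactors = FALSE
)

# Assumed ordering: fewest tokens first, ties broken by more common terms.
ordered <- sentence_stats[order(sentence_stats$n_tokens,
                                -sentence_stats$mean_freq), ]
print(ordered)
```

Any other scoring rule can be swapped in by changing the arguments passed to order().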
cmu_wfreq.R runs a series of analyses and plots graphs, mainly based on Zipf's laws. As an example, and to make the analysis easier, brawddeg.csv and terms.csv are included as data files (they contain the same data as the data frames produced by ordersentences.R).
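A rank-frequency check of the kind Zipf's law suggests can be sketched in base R as follows. This is not the code from cmu_wfreq.R: the term counts are invented for illustration, and the column layout of terms.csv is not assumed here. Zipf's law says that a term's frequency is roughly proportional to 1/rank, so a straight line fitted on log-log axes should have a slope near -1.

```r
# Hypothetical term counts, chosen so that frequency = 120 / rank.
# Real counts could be loaded from a file such as terms.csv.
counts <- c(the = 120, of = 60, and = 40, to = 30, a = 24, is = 20)

# Sort by decreasing frequency and assign ranks 1, 2, 3, ...
freq <- sort(counts, decreasing = TRUE)
term_rank <- seq_along(freq)

# Fit log(frequency) as a linear function of log(rank).
fit <- lm(log(freq) ~ log(term_rank))
slope <- unname(coef(fit)[2])
print(slope)

# Draw the rank-frequency plot only in an interactive session.
if (interactive()) {
  plot(term_rank, freq, log = "xy",
       xlab = "Rank", ylab = "Frequency", main = "Rank-frequency plot")
  lines(term_rank, exp(fitted(fit)))
}
```

With real text the slope is only approximately -1, and the fit is usually worst for the most frequent and the rarest terms.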