
Lexical-Database-Archiving-Stats

This GitHub repository contains scripts, formulas, and data used in posters, presentations, and papers authored or co-authored by Hugh Paterson III on the topic of user experience in digital interactions surrounding language archives. The purpose of these papers and presentations is to present research and raise questions about the activities and attitudes of people interacting with language archives.

Papers, Publications, and Presentations (and notes or additional discussion items)

In chronological order, most recent first, these are:

  • Paterson, Hugh J, III. 2015. Lexical dataset archiving: an assessment of practice. Poster presented at the 4th International Conference on Language Documentation and Conservation, at the University of Hawai’i Mānoa, Honolulu, HI. February 26 – March 1st. Version 1.0

  • Paterson, Hugh J, III. 2014. Language archiving and data ecology. Presentation at the University of Oregon GLOSS and the Department of Linguistics Colloquium, Eugene, OR, 23 May. Handout online at: http://linguistics.uoregon.edu/wp-content/uploads/2013/09/Paterson-Handout.pdf

  • Paterson, Hugh J, III and Jeremy Nordmoe. 2013. Challenges of implementing a tool to extract metadata from linguists: The use case of RAMP. Poster presented at SIL-UND, Summer 2013 session in Grand Forks, North Dakota. Version 1.5 (Essentially this is the same poster as presented in Hawai'i, but laid out with different dimensions.)

  • Paterson, Hugh J, III and Jeremy Nordmoe. 2013. Challenges of implementing a tool to extract metadata from linguists: The use case of RAMP. Poster presented at 3rd International Conference on Language Documentation and Conservation, at the University of Hawai’i Mānoa, Honolulu, HI. February 28 – March 3rd. Version 1.4 Online at: http://hdl.handle.net/10125/26178

Formulas

Formulas used in the papers and posters about Lexical Database Archiving.

  • Hawai'i 2015 - Formulas used in the Hawai'i 2015 presentation
  • Oregon 2014 - Methodologies are mostly the same as Hawai'i 2015; only the data changes
  • SIL-UND 2013 - Methodologies are mostly the same as Hawai'i 2013, and the data is the same
  • Hawai'i 2013 - Not available yet.

Scripts

Combinations of R scripts and Python scripts used to process the data and accomplish the goals of the formulas.

  • Hawai'i 2015
  • Oregon 2014 - Methodologies are mostly the same as Hawai'i 2015; only the data changes
  • SIL-UND 2013 - Methodologies are mostly the same as Hawai'i 2013, and the data is the same
  • Hawai'i 2013 - Not available yet. (includes Gephi data and use)

Data

Some data are used across several presentations, which makes it difficult to place each data set in a single presentation folder.

There are several sets of data outlined as follows:

  • Archive-specific data from SIL International - This data is corporation-specific and confidential. It cannot be released, but it can be discussed in general terms. This data is only used in Paterson & Nordmoe (2013).

  • Questionnaire response data - This data is collected via the Google form at the following link: http://bit.ly/19QSPMb This data must be anonymized before release, as indicated in the terms of collection. One portion of this data is accessible via: Data File 1

    A subset of the questionnaire data should be made public and included in this project. However, the data still needs to be anonymized as much as possible, to comply with the terms under which it was collected. It is well known in both the linguistics and big-data communities that anonymized data can usually be reconstructed to some degree given access to, and comparison with, tertiary data sets. So even my efforts may not completely obscure retraceable facts.

  • ISO 639-3 data - This data is openly available from the ISO 639-3 Registrar, but is replicated in this repo for consistency across papers and presentations. The original documents are presented in a folder titled: iso-639-3_Code_Tables_20140320. Additionally, a CSV file titled: iso-639-3_20140320.csv is in the data folder and is an export of the table used to collate other data.

  • SIL.org data for GIS locations of languages - This data was taken from SIL.org and was merged against ISO 639-3 tables. This data appears in two files: a .txt file with the raw data grabbed from SIL.org on 20 February 2015, and a .csv file, SIL-org GIS Data.csv, with the data as originally grabbed in 2014.
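
As a rough illustration of what merging the GIS data against the ISO 639-3 table can look like, here is a minimal Python sketch. It is not one of the scripts in this repo. The column names (`Id`, `Ref_Name`, `iso_code`, `latitude`, `longitude`), the function names, and the sample rows are all assumptions made for the example, and the coordinates are placeholders rather than real SIL.org values. The actual CSV files may use different headers.

```python
import csv
import io


def read_csv_text(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join_on_code(iso_rows, gis_rows, iso_key="Id", gis_key="iso_code"):
    """Keep every ISO 639-3 row; attach coordinates when the code matches.

    The key and field names are hypothetical, not taken from the repo files.
    """
    gis_by_code = {row[gis_key]: row for row in gis_rows}
    merged = []
    for row in iso_rows:
        match = gis_by_code.get(row[iso_key], {})
        combined = dict(row)
        # Languages with no GIS record get empty coordinate fields.
        combined["latitude"] = match.get("latitude", "")
        combined["longitude"] = match.get("longitude", "")
        merged.append(combined)
    return merged


# Small in-memory stand-ins for the two files (placeholder values only).
ISO_SAMPLE = "Id,Ref_Name\neng,English\nfra,French\n"
GIS_SAMPLE = "iso_code,latitude,longitude\neng,53.0,-1.0\n"

merged_rows = left_join_on_code(read_csv_text(ISO_SAMPLE), read_csv_text(GIS_SAMPLE))
for row in merged_rows:
    print(row)
```

A left join is used here so that languages without a GIS record still appear in the result with empty coordinates. Whether the original merge kept or dropped such rows is not stated above.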

Leave a comment in the issue tracker for more information.

Contents of this repo, when not covered under other licenses, are licensed under Creative Commons 4.0 with the BY, NC, and SA clauses (CC-BY-NC-SA). Some materials may be licensed in other ways, as directly indicated.
