Skip to content

cdli-gh/data

master
Switch branches/tags
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 

CDLI Daily Bulk Data Dump

Last update was August 2022. Head to the open-data repository for the current data dumps

The repository contains a daily dump of all public catalogue and text data from the Cuneiform Digital Library Initiative.

Getting the data

Make sure you have the Git Large File Storage extentions (git-lsf) installed, see here for instructions. For installing under, say, Ubuntu, you can also use

$> curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
$> sudo apt-get install git-lfs

Clone the repository

$> git clone https://github.com/cdli-gh/data

Retrieve Git LSF data:

$> cd data
$> git lfs fetch

Format

Text Data

The CDLI transliterations dump is offered in plain text UTF-8 ATF format. For more information about ATF, visit : http://oracc.museum.upenn.edu/doc/help/editinginatf/cdliatf/index.html (Scroll down for an example).

Catalogue data

The catalogue is offered in a UTF-8 comma separated format. Most fields are thoroughly explained here: https://cdli.ucla.edu/?q=cdli-search-information
Our data schema is currently being remodeled, get in touch if you would like a sneak peak!

To view a sample of the catalogue, you can use the head command on a Unix machine using this syntax, while you are in the directory where the file is stored:

head cdli_catalogue_1of2.csv

With Windows Power Shell, try

Get-Content *filename* -Head *n*

EPP cdli@ucla.edu

About

This is a copy of the daily dump of catalogue and ATF data from the Cuneiform Digital Library Initiative (http://cdli.ucla.edu)

Topics

Resources

Stars

Watchers

Forks

Packages

No packages published

Contributors 4

  •  
  •  
  •  
  •