CBETA-txt

Content:

  • This repository contains plain-text files converted from the XML files in the CBETA_TAFxml repo. Two formats are provided: the whole text of a scripture in one file, and the text of a scripture split into one file per fascicle.

Scope:

  • Taisho vol.1 - vol.55 + vol.85
  • Taisho No. T0001-T2184, T2732-T2920
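
The ranges above can be turned into a simple membership check. The following is a minimal sketch, assuming scripture IDs are written as `T` followed by exactly four digits, as in the list above; the function name `in_scope` is hypothetical, and IDs in any other form are treated as out of scope.

```python
import re

# Inclusive number ranges copied from the Scope list above.
SCOPE_RANGES = [(1, 2184), (2732, 2920)]


def in_scope(scripture_id: str) -> bool:
    """Return True if an ID such as "T0001" falls inside the listed ranges.

    Assumption: IDs are "T" plus exactly four digits. Anything else
    is reported as out of scope rather than guessed at.
    """
    match = re.fullmatch(r"T(\d{4})", scripture_id)
    if match is None:
        return False
    number = int(match.group(1))
    return any(low <= number <= high for low, high in SCOPE_RANGES)


if __name__ == "__main__":
    for candidate in ("T0001", "T2184", "T2500", "T2732"):
        print(candidate, in_scope(candidate))
```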

Notices:

  • A scripture's text often includes a preface (序) or postscript (跋) written by someone other than the original author, or Sanskrit text. In the XML files, a preface is marked up as <div type="xu"> and a postscript as <div type="w">. Because none of this should be included in text analysis, these sections are skipped when the text files are generated.
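
The skipping rule for prefaces and postscripts can be illustrated as follows. This is only a sketch of the idea, not the conversion script used for this repository: it assumes the only sections to drop are `<div>` elements whose `type` is `xu` or `w`, it ignores XML namespaces by comparing local tag names, and it does not attempt any handling of Sanskrit text. The function names and the sample markup (including `type="other"`) are hypothetical.

```python
import xml.etree.ElementTree as ET

# div types named in the notice above: "xu" (preface) and "w" (postscript).
SKIPPED_DIV_TYPES = {"xu", "w"}


def local_name(tag: str) -> str:
    """Strip a "{namespace}" prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def collect_text(elem, parts):
    """Append the text under elem to parts, skipping excluded divs."""
    if local_name(elem.tag) == "div" and elem.get("type") in SKIPPED_DIV_TYPES:
        return  # drop the whole preface or postscript subtree
    if elem.text:
        parts.append(elem.text)
    for child in elem:
        collect_text(child, parts)
        # Text that follows a child's closing tag belongs to the parent,
        # so it is kept even when the child itself was skipped.
        if child.tail:
            parts.append(child.tail)


def extract_text(xml_source: str) -> str:
    """Return the plain text of an XML string without xu/w divs."""
    parts = []
    collect_text(ET.fromstring(xml_source), parts)
    return "".join(parts)


if __name__ == "__main__":
    sample = (
        '<body><div type="xu"><p>preface</p></div>'
        '<div type="other"><p>main text</p></div>'
        '<div type="w"><p>postscript</p></div></body>'
    )
    print(extract_text(sample))  # prints "main text"
```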

Files:

  • Files are stored in separate subdirectories by scripture number; each directory represents one scripture in CBETA.
  • Within a directory, the file whose name ends with _000.txt contains the complete text of the scripture, not divided into fascicles.
  • A file whose name ends with _NNN.txt contains the text of the NNN-th fascicle of that scripture.
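
The naming convention above makes it straightforward to separate the whole-text file from the per-fascicle files when reading a directory. Below is a minimal sketch, assuming the fascicle number is always three digits immediately before `.txt`, as described in the list; the function name `classify_files` and the directory name in the usage lines are hypothetical.

```python
import re
from pathlib import Path

# Matches the trailing "_NNN.txt" part of a filename (three digits).
FILE_PATTERN = re.compile(r"_(\d{3})\.txt$")


def classify_files(directory):
    """Split a scripture directory into (whole_text_path, fascicle_paths).

    whole_text_path is the *_000.txt file, or None if it is missing.
    fascicle_paths maps each fascicle number to its file path.
    """
    whole_text = None
    fascicles = {}
    for path in sorted(Path(directory).glob("*.txt")):
        match = FILE_PATTERN.search(path.name)
        if match is None:
            continue  # not part of the naming scheme; ignore it
        number = int(match.group(1))
        if number == 0:
            whole_text = path
        else:
            fascicles[number] = path
    return whole_text, fascicles


if __name__ == "__main__":
    # "T0001" is a placeholder directory name for illustration only.
    whole_text, fascicles = classify_files("T0001")
    print("whole text:", whole_text)
    for number, path in sorted(fascicles.items()):
        print(f"fascicle {number}: {path.name}")
```

Reading only the `_000.txt` file is enough when the whole scripture is analyzed at once; the numbered files are useful when results need to be reported per fascicle.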

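Our Team

  • Dharma Drum Institute of Liberal Arts (法鼓文理學院), Department of Buddhist Studies
  • Buddhist Texts Digital Research and Learning Platform project (佛典數位研究學習平台專案)
  • Project lead: 洪振洲
  • Data preparation: 董惠珠, 李慧萍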