## Pycantus tutorial

This tutorial covers the basics of working with the `pycantus` library.  
`pycantus`: A Python library designed to enhance accessibility of Gregorian chants for both coders and non-coders.

First make sure you have pycantus installed.  
  
This requires `python` version 3.11 or newer to be downloaded and installed (e.g. from [here](https://www.python.org/downloads/)), as well as `pip` ([guide here](https://packaging.python.org/en/latest/tutorials/installing-packages/)).  
  
Next, download the pycantus source code directory (e.g. from [here]()).  
You can then install the pycantus library locally.  
To do so, run the following command from the directory that contains the root directory of the project (`PyCantus`):  
`pip install -e pycantus`
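
If you are not sure which Python version you are running, a quick check like the one below may help. It is only a small sketch using the standard library; the helper name `python_is_recent_enough` is made up for this tutorial and is not part of `pycantus`.

```python
import sys

# Minimum version named in the installation notes above.
REQUIRED = (3, 11)


def python_is_recent_enough(version_info=sys.version_info, required=REQUIRED):
    """Hypothetical helper: compare (major, minor) with the required version."""
    return tuple(version_info[:2]) >= required


if python_is_recent_enough():
    print('Python version is fine:', sys.version.split()[0])
else:
    print('Python 3.11 or newer is needed, found:', sys.version.split()[0])
```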

Now let's make sure you can use `pycantus`.

In [106]:
import pycantus
import pycantus.data as data

pycantus.hello_pycantus()


    *********************************************
    *                                           *
    *           Welcome to PyCantus!            *
    *                                           *
    *    A Python library designed to enhance   *
    *    accessibility of Gregorian chants for  *
    *    both coders and non-coders.            *
    *                                           *
    *********************************************
    


## Get your first corpus to play with
At its core, `pycantus` is about working with data.  
The data are stored in a `Corpus` object, which contains a list of chants (`Chant` objects) and possibly also a list of sources (`Source` objects) and a list of melodies (`Melody` objects) associated with the chants of the corpus.  
You can load one of the predefined datasets as well as your own files.

In [107]:
sample_corpus = data.load_dataset('sample_dataset')

Loading chants and sources...
Data loaded!


Now we can look at what `Chant` and `Source` look like as data holders:

In [108]:
sample_corpus.csv_chants_header

'cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,genre,office,position,melody_id,image,mode,full_text,melody,century'

In [109]:
sample_corpus.csv_sources_header

'title,siglum,century,provenance,srclink,numeric_century'

And also at what particular instances of them look like:

In [110]:
sample_corpus.chants[0].to_csv_row

'004141,Omnibus se invocantibus benignus adest,A-Gu 29,https://cantusdatabase.org/source/123610,https://cantusdatabase.org/chant/245439,215r,CD,,Nicolai,A,M,2.6,,https://unipub.uni-graz.at/obvugrscript/content/pageview/6705437,4,Omnibus se invocantibus benignus adest sanctus Nicolaus gloria tibi trinitas deus,,'

In [111]:
sample_corpus.sources[2].to_csv_row

'"Linz, Oberösterreichische Landesbibliothek, 290 (olim 183; olim Gamma p 19)",A-LIb 290 (olim 183; olim Gamma p 19),12th century,Kremsmünster,https://cantusdatabase.org/source/123617,12'
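
A header string and a row string like the ones above can be paired up with the standard `csv` module, for example to inspect a single record field by field. This is just a sketch outside of `pycantus`: the two strings are copied from the outputs above, and the variable names are our own.

```python
import csv
import io

# Header and row copied from the outputs shown above.
sources_header = 'title,siglum,century,provenance,srclink,numeric_century'
source_row = (
    '"Linz, Oberösterreichische Landesbibliothek, 290 (olim 183; olim Gamma p 19)",'
    'A-LIb 290 (olim 183; olim Gamma p 19),12th century,Kremsmünster,'
    'https://cantusdatabase.org/source/123617,12'
)

# csv.DictReader handles the quoted title, which itself contains commas.
reader = csv.DictReader(io.StringIO(sources_header + '\n' + source_row))
source_record = next(reader)

for field, value in source_record.items():
    print(f'{field}: {value}')
```

Splitting the row on commas by hand would break the title into several pieces, which is why a CSV parser is the safer choice here.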

In [112]:
print('My first 20 chants have incipits:')
for chant in sample_corpus.chants[:20]:
    print('\t', chant.incipit)

My first 20 chants have incipits:
	 Omnibus se invocantibus benignus adest
	 Omnibus se*
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus*
	 Omnibus se invocantibus
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus benignus adest sanctus 
	 Omnibus se invocantibus
	 Omnibus se invocantibus
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus
	 Humiliamini sub potenti manu dei ut 
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer 
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer


### Editability

By default a `Corpus` is not editable, and neither are its `Chant` and `Source` objects.  
That means their values are locked: if we try to change one, we should receive an error.

In [113]:
sample_corpus.chants[0].incipit = 'Mamma Mia! Here I go again!'

AttributeError: Cannot modify 'incipit' because the object is locked.

However, if you need to make such edits (e.g. to clean volpiano melodies), you can load the whole `Corpus` as editable:

In [114]:
sample_corpus_editable = data.load_dataset('sample_dataset', is_editable=True)
sample_corpus_editable.chants[0].incipit = 'Mamma Mia! Here I go again!'
print('Edited incipit:')
print('\t', sample_corpus_editable.chants[0].incipit)

Loading chants and sources...
Data loaded!
Edited incipit:
	 Mamma Mia! Here I go again!
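
To get a feeling for how such locking can work, here is a minimal sketch in plain Python. It is not how `pycantus` implements it (we make no claim about the library internals); the class `LockableRecord` and its `locked` flag are hypothetical names used only for illustration.

```python
class LockableRecord:
    """Hypothetical stand-in for a data holder; not part of pycantus."""

    def __init__(self, incipit, locked=True):
        # Bypass our own __setattr__ while the object is being built.
        object.__setattr__(self, 'incipit', incipit)
        object.__setattr__(self, 'locked', locked)

    def __setattr__(self, name, value):
        # Refuse every assignment while the record is locked.
        if self.locked:
            raise AttributeError(f"Cannot modify '{name}' because the object is locked.")
        object.__setattr__(self, name, value)


record = LockableRecord('Omnibus se invocantibus benignus adest')
try:
    record.incipit = 'Mamma Mia! Here I go again!'
except AttributeError as error:
    print(error)

editable = LockableRecord('Omnibus se invocantibus benignus adest', locked=False)
editable.incipit = 'Mamma Mia! Here I go again!'
print(editable.incipit)
```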


### Handling sources during load
When loading data into a `Corpus`, we may want to know whether we have a `Source` for each of our `Chant` objects (e.g. when creating a collection).


One way of handling missing source records is to create them. This is possible because all three mandatory values of a `Source` can be obtained from mandatory fields of the corresponding `Chant`, with the siglum also serving as the title (title <= siglum):

In [115]:
sample_corpus_edit_miss_s = data.load_dataset('sample_dataset', is_editable=True, create_missing_sources=True)

Loading chants and sources...
Data loaded!


Another way is to ask for a warning about each such missing source:

In [116]:
sample_corpus_edit_miss_s = data.load_dataset('sample_dataset', is_editable=True, check_missing_sources=True)

Loading chants and sources...
Data loaded!


The default settings are:
- `check_missing_sources=False`
- `create_missing_sources=False`
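
The idea behind the two options can be pictured with plain dictionaries. The sketch below is only an illustration under our own assumptions: it matches chants to sources by `srclink`, the helper `find_missing_sources` is hypothetical, and the records are made-up examples. The way `pycantus` does this internally may differ.

```python
def find_missing_sources(chants, sources):
    """Hypothetical helper: build minimal source records for chants without a source."""
    known_links = {source['srclink'] for source in sources}
    created = {}
    for chant in chants:
        link = chant['srclink']
        if link not in known_links and link not in created:
            # All mandatory source values come from the chant; title <= siglum.
            created[link] = {'title': chant['siglum'], 'siglum': chant['siglum'], 'srclink': link}
    return list(created.values())


# Made-up example records, for illustration only.
chants = [
    {'siglum': 'A-ABC Fragm. 1', 'srclink': 'https://yourdatabase.org/source/123'},
    {'siglum': 'A-ABC Fragm. 2', 'srclink': 'https://yourdatabase.org/source/456'},
    {'siglum': 'A-ABC Fragm. 2', 'srclink': 'https://yourdatabase.org/source/456'},
]
sources = [
    {'title': 'A-ABC Fragm. 1', 'siglum': 'A-ABC Fragm. 1', 'srclink': 'https://yourdatabase.org/source/123'},
]

missing = find_missing_sources(chants, sources)

# "Check" style: only report what is missing.
for source in missing:
    print('No source record for siglum:', source['siglum'])

# "Create" style: add the generated records to the list of sources.
sources_with_created = sources + missing
print('Number of sources after creating missing ones:', len(sources_with_created))
```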

### Export results of your work

In [137]:
chants_csv_file_name = 'my_great_corpus-mamma_mia-CHANTS.csv'
sources_csv_file_name = 'my_great_corpus-mamma_mia-SOURCES.csv'
sample_corpus_editable.export_csv(chants_csv_file_name, sources_csv_file_name)

### Use your own data
You can also use your own data and process them in the provided data model with our analytics and data filtration tools.  
The only thing to do is to prepare your data in CSV file(s) of the prescribed format, that is, file(s) with the correct fields and with all mandatory fields present. The fields are:  
For chants:  
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1").
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- chantlink (*): URL link directly to the chant entry in the external database (e.g., "https://yourdatabase.org/chant/45678").
- folio (*): Folio information for the chant (e.g., "001v").
- sequence: The order of the chant on the folio (e.g., "1").
- incipit (*): The opening words or phrase of the chant (e.g., "Non sufficiens sibi semel aspexisse vis ").
- feast: Feast or liturgical occasion associated with the chant (e.g., "Nativitas Mariae").
- feast_code: Additional identifier unifying feasts with multiple spellings. The values themselves are meaningful in Cantus Index.
- genre: Genre of the chant, such as antiphon (A), responsory (R), hymn (H), etc. (e.g., "V").
- office: The office in which the chant is used, such as Matins (M) or Lauds (L) (e.g., "M").
- position: Liturgical position of the chant in the office (e.g., "01").
- cantus_id (*): The unique Cantus ID associated with the chant (e.g., "007129a").
- melody_id: The unique Melody ID associated with the chant (e.g., "001216m1").
- image: URL link to an image of the manuscript page, if available (e.g., "https://yourdatabase.org/image/12345").
- mode: Mode of the chant, if available (e.g., "1").
- full_text: Full text of the chant (e.g., "Non sufficiens sibi semel aspexisse vis amoris multiplicavit in ea intentionem inquisitionis").
- melody: Melody encoded in Volpiano, if available (e.g., "1---dH---h7--h--ghgfed--gH---h--h---").
- century: Number identifying the century of the source. If multiple centuries apply, the lowest number should be used.
- db (*): Code for the database providing the data, used for identification within CI (e.g., "DBcode").

For sources:  
- title (*): Name of the manuscript (can be the same as siglum).
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1"). Use RISM whenever possible.
- century: Textual value identifying the century of the source (e.g., "14th century").
- provenance: Place of origin or place of use of the source.
- numeric_century: Number-only representation of the century value.
- cursus: Secular (Cathedral, Roman) or Monastic cursus of the source.

(Fields marked with an asterisk (*) are obligatory and must be included in every record. Other fields are optional but recommended when data is available.)
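
Before loading your own files, it can be handy to check that no mandatory field is missing from the header. The following is a small sketch with the standard `csv` module; the helper `missing_mandatory_fields` and the two constants are our own names (not part of `pycantus`), and the field sets are taken from the lists above.

```python
import csv
import io

# Mandatory fields as marked with (*) in the lists above.
MANDATORY_CHANT_FIELDS = {'siglum', 'srclink', 'chantlink', 'folio', 'incipit', 'cantus_id', 'db'}
MANDATORY_SOURCE_FIELDS = {'title', 'srclink', 'siglum'}


def missing_mandatory_fields(csv_text, mandatory_fields):
    """Hypothetical helper: list mandatory fields absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return sorted(mandatory_fields - {name.strip() for name in header})


# Header of the sample dataset, as printed earlier in this tutorial.
chants_header = ('cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,genre,'
                 'office,position,melody_id,image,mode,full_text,melody,century')

print(missing_mandatory_fields(chants_header, MANDATORY_CHANT_FIELDS))      # []
print(missing_mandatory_fields('siglum,century', MANDATORY_SOURCE_FIELDS))  # ['srclink', 'title']
```

To check a real file, you could pass the text of its first line instead of the literal strings used here.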

In [118]:
# Fill in your data paths
my_great_corpus_chants_filename = 'corpus_monodicum_pilot_2025-06-18/cm_pycantus_data/chants.csv'  # or e.g. 'cantuscorpus_1.0/cantuscorpus_1.0/chants.csv'
my_great_corpus_sources_filename = 'corpus_monodicum_pilot_2025-06-18/cm_pycantus_data/sources.csv'  # or e.g. 'cantuscorpus_1.0/cantuscorpus_1.0/sources.csv'

In [119]:
# And then load your data into pycantus data model
my_great_corpus_editable = data.load_dataset(my_great_corpus_chants_filename, my_great_corpus_sources_filename, 
                                             is_editable=True, create_missing_sources=True)

Loading chants and sources...
Data loaded!


In [120]:
print('number of chants in my great corpus:', len(my_great_corpus_editable.chants))
print('My great corpus first chant record:\n', my_great_corpus_editable.chants[0])

number of chants in my great corpus: 401
My great corpus first chant record:
 https://corpus-monodicum.de/d/76459913-d885-4a4f-aef6-2e72a32d5bba : g00553


All other steps are the same as when using one of the available datasets (such as the sample dataset).

### Filter data
Most experimental workflows involve some filtering of the input chants: by genre, feast set, sources, etc. Describing and replicating this filtering exactly is crucial for research reproducibility. This is partly mitigated by just providing the filtered dataset, but not entirely: it is important to assess how findings apply to different sets of repertoire selected according to the same principles (e.g., as new sources and database segments are added).
To this end, PyCantus provides a mechanism to export filtering conditions to a YAML configuration file, and to load and apply such a file to a corpus.

Data filtration is handled by the `Filter` class.  

In [121]:
# Import the appropriate class
from pycantus.filtration import Filter

In [122]:
# Create filter object
tutorial_filter = Filter('tutorial_filter')

In [123]:
# Specify fields together with the values that you want to drop
# Fields you do not specify are treated as 'I do not care', so they are not checked
# When a source is dropped, its chants are dropped as well
tutorial_filter.add_value_exclude('cantus_id', '004141')
tutorial_filter.add_value_exclude('feast', 'Pascha')
tutorial_filter.add_value_exclude('provenance', ['St. Martial', 'St-Martial', 'Albi'])

In [124]:
# Specify fields together with the values that you want to keep
# Other values of that field will be dropped
tutorial_filter.add_value_include('numeric_century', [10, 11, 12, 13, 14])

In [125]:
# Apply filter on corpus:
print('Number of chants before filtration:', len(sample_corpus_editable.chants))
print('Number of sources before filtration:', len(sample_corpus_editable.sources))
sample_corpus_editable.apply_filter(tutorial_filter)
print('Number of chants after filtration:', len(sample_corpus_editable.chants))
print('Number of sources after filtration:', len(sample_corpus_editable.sources))

Number of chants before filtration: 100
Number of sources before filtration: 78
Number of chants after filtration: 65
Number of sources after filtration: 57
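
The include and exclude rules described in the comments above can be pictured on plain dictionaries. This is only a sketch of the idea under our own assumptions (flat records, a made-up `keep_record` helper, made-up example records), not the actual `Filter` implementation.

```python
def keep_record(record, include_values, exclude_values):
    """Hypothetical helper: decide whether a record passes include/exclude rules."""
    # Drop the record if any listed field holds an excluded value.
    for field, excluded in exclude_values.items():
        if record.get(field) in excluded:
            return False
    # For fields with an include list, keep only the listed values.
    for field, included in include_values.items():
        if record.get(field) not in included:
            return False
    # Fields that appear in neither mapping are not checked at all.
    return True


# Made-up example records, for illustration only.
records = [
    {'cantus_id': '004141', 'feast': 'Nicolai', 'genre': 'A'},
    {'cantus_id': '007129a', 'feast': 'Pascha', 'genre': 'R'},
    {'cantus_id': '007129a', 'feast': 'Nativitas Mariae', 'genre': 'V'},
]
exclude_values = {'cantus_id': ['004141'], 'feast': ['Pascha']}
include_values = {'genre': ['A', 'V']}

kept = [r for r in records if keep_record(r, include_values, exclude_values)]
print('Kept records:', kept)
```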


In [126]:
# If you want to discard sources that have no chants in the corpus after filtration, call
sample_corpus_editable.drop_empty_sources()
print('Number of sources after drop_empty_sources:', len(sample_corpus_editable.sources))

Number of sources after drop_empty_sources: 55


In [127]:
EXPORTED_YAML_DIR_PATH = 'tutorial_filter_export'
tutorial_filter.export_yaml(EXPORTED_YAML_DIR_PATH)

Filter 'tutorial_filter' successfully exported to: tutorial_filter_export\tutorial_filter.yaml


Let's see how our stored filter works, e.g. when moved to a different project:

In [128]:
loaded_filter = Filter('loaded_tutorial_filter')
loaded_filter.import_yaml(EXPORTED_YAML_DIR_PATH+'/tutorial_filter.yaml')

In [129]:
print(loaded_filter)

name: tutorial_filter
include_values:
  numeric_century:
  - 10
  - 11
  - 12
  - 13
  - 14
exclude_values:
  cantus_id:
  - '004141'
  feast:
  - Pascha
  provenance:
  - St. Martial
  - St-Martial
  - Albi



In [130]:
print('Number of chants before filtration:', len(my_great_corpus_editable.chants))
my_great_corpus_editable.apply_filter(loaded_filter)
print('Number of chants after filtration:', len(my_great_corpus_editable.chants))

Number of chants before filtration: 401
Number of chants after filtration: 264


#### Clean corpus after filtration

In [131]:
print('number of sources after filtration, before cleaning:', len(my_great_corpus_editable.sources))
my_great_corpus_editable.drop_empty_sources()
print('number of sources after filtration, after cleaning:', len(my_great_corpus_editable.sources))

number of sources after filtration, before cleaning: 26
number of sources after filtration, after cleaning: 25
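
What "empty source" means here can be sketched in a few lines of plain Python. We assume, for illustration only, that chants and sources are linked through `srclink`; the helper `sources_with_chants` and the records are made up, and the real `drop_empty_sources()` may work differently.

```python
def sources_with_chants(sources, chants):
    """Hypothetical helper: keep only sources referenced by at least one chant."""
    used_links = {chant['srclink'] for chant in chants}
    return [source for source in sources if source['srclink'] in used_links]


# Made-up example records, for illustration only.
sources = [
    {'siglum': 'A-ABC Fragm. 1', 'srclink': 'https://yourdatabase.org/source/123'},
    {'siglum': 'A-ABC Fragm. 2', 'srclink': 'https://yourdatabase.org/source/456'},
]
chants_after_filtration = [
    {'incipit': 'O Emmanuel rex et legifer', 'srclink': 'https://yourdatabase.org/source/123'},
]

cleaned_sources = sources_with_chants(sources, chants_after_filtration)
print('Sources before cleaning:', len(sources))
print('Sources after cleaning:', len(cleaned_sources))
```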


### Work with melody


In [132]:
sample_corpus_editable.create_melodies()

In [133]:
print('First melody in my great corpus:')
print(sample_corpus_editable.melodies[0])

First melody in my great corpus:
1---hk-kj---kh--jh--gh--h---j---h---h--jk--hj---hh--g---h--eh--hj--h--h---j--k--lml-mnm---m---l--k--kj---hk--kj--g7---hk--k---kJ---h--jH--gh---hge---h--g--h---kj--g---jk--h---4---k--k--k--j--g--h---3


In [134]:
for melody in sample_corpus_editable.melodies:
    melody.clean_volpiano()

In [135]:
print('First melody in my great corpus after clean_volpiano():')
print(sample_corpus_editable.melodies[0])

First melody in my great corpus after clean_volpiano():
hkkjkhjhghhjhhjkhjhhghehhjhhjklmlmnmmlkkjhkkjghkkkJhjHghhgehghkjgjkhkkkjgh
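
Judging only from the two outputs above, the cleaning seems to remove the clef (`1`), the separators (`-`) and digit markers such as `7`, `4` and `3`, while keeping the note letters (including uppercase ones). A rough imitation of that effect, which is our guess and not the actual `clean_volpiano()` logic, could look like this; the helper `keep_note_letters` is a made-up name.

```python
def keep_note_letters(volpiano):
    """Hypothetical helper: keep only the alphabetic characters of a Volpiano string."""
    return ''.join(char for char in volpiano if char.isalpha())


# Two short excerpts taken from the first melody printed above.
print(keep_note_letters('1---hk-kj---kh--jh--gh'))   # hkkjkhjhgh
print(keep_note_letters('g7---hk--k---kJ---h--jH'))  # ghkkkJhjH
```

Other Volpiano symbols not present in this melody may well be treated differently by the library, so use this only to understand the example.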


In [136]:
from collections import Counter

mode_frequency = Counter([m.mode for m in sample_corpus_editable.melodies])
print('modes in melodies of my great corpus and their frequency in corpus:')
for mode, frequency in mode_frequency.items():
    print(mode, '\t:\t', frequency)

modes in melodies of my great corpus and their frequency in corpus:
2T 	:	 3
2 	:	 2
