## Pycantus tutorial

This tutorial covers the basics of working with the `pycantus` library.  
`pycantus` is a Python library designed to enhance the accessibility of Gregorian chants for both coders and non-coders.

First, make sure `pycantus` is installed.

This requires Python 3.11 or newer (available e.g. from [here](https://www.python.org/downloads/)) and `pip` ([guide here](https://packaging.python.org/en/latest/tutorials/installing-packages/)).

Then download the `pycantus` source code directory (e.g. from [here]()) and install the library locally by running the following command in the root directory of the project (`PyCantus`):

`pip install -e pycantus`
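Before installing, you may want to confirm that your interpreter meets the Python 3.11 requirement. Below is a minimal sketch using only the standard library. The helper name `check_python_version` is our own and is not part of `pycantus`.

```python
import sys

# Minimum Python version stated in this tutorial
REQUIRED = (3, 11)


def check_python_version(required=REQUIRED):
    """Return True if the running interpreter is at least `required` (major, minor)."""
    return sys.version_info[:2] >= required


if __name__ == "__main__":
    current = sys.version.split()[0]
    if check_python_version():
        print(f"Python {current} satisfies the pycantus requirement.")
    else:
        print(f"pycantus needs Python {REQUIRED[0]}.{REQUIRED[1]}+, found {current}.")
```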

Now let's make sure you can use `pycantus`.

```python
import pycantus
import pycantus.data as data

pycantus.hello_pycantus()
```

```
*********************************************
*                                           *
*           Welcome to PyCantus!            *
*                                           *
*    A Python library designed to enhance   *
*    accessibility of Gregorian chants for  *
*    both coders and non-coders.            *
*                                           *
*********************************************
```


### Get your first corpus to play with
Working with `pycantus` revolves around data.  
The data are stored in a `Corpus` object, which contains a list of chants (`Chant` objects) and, optionally, a list of sources (`Source` objects) and a list of melodies (`Melody` objects) associated with the chants of the corpus.  
You can load one of the predefined datasets as well as your own files.
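To picture how these objects relate, here is a minimal, hypothetical sketch of the container structure built with dataclasses. It is not the actual `pycantus` implementation: the classes below only mirror the description above, and the handful of fields shown are taken from the chant and source field lists later in this tutorial (the `Melody.volpiano` field is our own assumption).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Chant:
    # A few of the documented chant fields; a real chant record holds many more
    cantus_id: str
    incipit: str
    siglum: str
    srclink: str


@dataclass
class Source:
    title: str
    siglum: str
    srclink: str


@dataclass
class Melody:
    # Hypothetical: a melody stored as a Volpiano string
    volpiano: str


@dataclass
class Corpus:
    chants: List[Chant]
    # Sources and melodies are optional, so they default to empty lists
    sources: List[Source] = field(default_factory=list)
    melodies: List[Melody] = field(default_factory=list)


corpus = Corpus(chants=[Chant("004141", "Omnibus se invocantibus benignus adest",
                              "A-Gu 29", "https://cantusdatabase.org/source/123610")])
print(len(corpus.chants), len(corpus.sources), len(corpus.melodies))  # 1 0 0
```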

```python
sample_corpus = data.load_dataset('sample_dataset')
```

```
Loading chants and sources...
Data loaded!
```


Now let's look at what `Chant` and `Source` objects hold as data holders, starting with their CSV headers:

```python
sample_corpus.csv_chants_header
```

```
'cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,genre,office,position,melody_id,image,mode,full_text,melody,century'
```

```python
sample_corpus.csv_sources_header
```

```
'title,siglum,century,provenance,srclink,numeric_century'
```
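Both headers are plain comma-separated strings, so Python's standard `csv` module can turn them into lists of field names. The strings below are copied from the outputs above so the snippet runs on its own; with a loaded corpus you could pass `sample_corpus.csv_chants_header` instead. The helper `header_fields` is our own illustration.

```python
import csv

chants_header = ('cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,'
                 'genre,office,position,melody_id,image,mode,full_text,melody,century')
sources_header = 'title,siglum,century,provenance,srclink,numeric_century'


def header_fields(header):
    """Split a single CSV header line into a list of field names."""
    return next(csv.reader([header]))


chant_fields = header_fields(chants_header)
source_fields = header_fields(sources_header)
print(len(chant_fields), chant_fields[:3])  # 18 ['cantus_id', 'incipit', 'siglum']
print(source_fields)
```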

And here is what individual records look like:

```python
sample_corpus.chants[0].to_csv_row
```

```
'004141,Omnibus se invocantibus benignus adest,A-Gu 29,https://cantusdatabase.org/source/123610,https://cantusdatabase.org/chant/245439,215r,CD,,Nicolai,A,M,2.6,,https://unipub.uni-graz.at/obvugrscript/content/pageview/6705437,4,Omnibus se invocantibus benignus adest sanctus Nicolaus gloria tibi trinitas deus,,'
```

```python
sample_corpus.sources[2].to_csv_row
```

```
'"Linz, Oberösterreichische Landesbibliothek, 290 (olim 183; olim Gamma p 19)",A-LIb 290 (olim 183; olim Gamma p 19),12th century,Kremsmünster,https://cantusdatabase.org/source/123617,12'
```
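Each row follows its header, and values that contain commas (such as the source title above) are wrapped in double quotes. A `csv.reader` handles that quoting, so pairing a row with its header yields a readable field-to-value mapping. This is a minimal sketch using strings copied from the outputs above; `row_to_dict` is a hypothetical helper, not a `pycantus` function.

```python
import csv

sources_header = 'title,siglum,century,provenance,srclink,numeric_century'
source_row = ('"Linz, Oberösterreichische Landesbibliothek, 290 (olim 183; olim Gamma p 19)",'
              'A-LIb 290 (olim 183; olim Gamma p 19),12th century,Kremsmünster,'
              'https://cantusdatabase.org/source/123617,12')


def row_to_dict(header, row):
    """Pair each field name in `header` with the matching value in `row`."""
    fields = next(csv.reader([header]))
    values = next(csv.reader([row]))  # quoted values keep their inner commas
    return dict(zip(fields, values))


record = row_to_dict(sources_header, source_row)
for name, value in record.items():
    print(f'{name:>16}: {value}')
```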

```python
print('My first 20 chants have incipits:')
for chant in sample_corpus.chants[:20]:
    print('\t', chant.incipit)
```

```
My first 20 chants have incipits:
	 Omnibus se invocantibus benignus adest
	 Omnibus se*
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus*
	 Omnibus se invocantibus
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus benignus adest sanctus 
	 Omnibus se invocantibus
	 Omnibus se invocantibus
	 Omnibus se invocantibus benignus adest
	 Omnibus se invocantibus
	 Humiliamini sub potenti manu dei ut 
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer 
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
	 O Emmanuel rex et legifer
```
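The listing shows many chants sharing the same, sometimes shortened, incipit. For a quick tally, `collections.Counter` works on any iterable of strings; with a loaded corpus you could pass `(chant.incipit for chant in sample_corpus.chants)`. Below, a few incipits copied from the output above stand in for the corpus. Stripping whitespace and the trailing `*` (which here appears to mark an abbreviated incipit) is our own normalization choice, not something `pycantus` is known to do.

```python
from collections import Counter

# A few incipits copied from the output above
incipits = [
    'Omnibus se invocantibus benignus adest',
    'Omnibus se*',
    'Omnibus se invocantibus benignus adest',
    'Omnibus se invocantibus*',
    'Omnibus se invocantibus',
    'O Emmanuel rex et legifer',
    'O Emmanuel rex et legifer ',
]


def normalize(incipit):
    """Drop surrounding whitespace and a trailing '*' mark."""
    return incipit.strip().rstrip('*').strip()


counts = Counter(normalize(i) for i in incipits)
for incipit, n in counts.most_common():
    print(n, incipit)
```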


### Editability

By default, a `Corpus` is not editable, and neither are its `Chant` and `Source` objects.  
Their values are locked, so any attempt to change one of them raises an error.

```python
sample_corpus.chants[0].incipit = 'Mamma Mia! Here I go again!'
```

```
AttributeError: Cannot modify 'incipit' because the object is locked.
```
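One common way for a Python object to refuse attribute changes is to override `__setattr__`. The class below is a hypothetical, minimal sketch of that idea; its `is_editable` flag and error message are modeled on this tutorial, but it is not how `pycantus` necessarily implements locking.

```python
class LockableRecord:
    """Hypothetical record whose attributes are frozen unless it is editable."""

    def __init__(self, is_editable=False, **fields):
        # Use object.__setattr__ directly so setup is not blocked by the lock
        for name, value in fields.items():
            object.__setattr__(self, name, value)
        object.__setattr__(self, '_is_editable', is_editable)

    def __setattr__(self, name, value):
        if not self._is_editable:
            raise AttributeError(
                f"Cannot modify '{name}' because the object is locked.")
        object.__setattr__(self, name, value)


locked = LockableRecord(incipit='Omnibus se invocantibus benignus adest')
try:
    locked.incipit = 'Mamma Mia! Here I go again!'
except AttributeError as err:
    print(err)  # Cannot modify 'incipit' because the object is locked.

editable = LockableRecord(is_editable=True, incipit='Omnibus se')
editable.incipit = 'Mamma Mia! Here I go again!'
print(editable.incipit)  # Mamma Mia! Here I go again!
```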

If you need to make such edits (e.g. to clean Volpiano melodies), you can load the whole `Corpus` as editable by passing `is_editable=True`:

```python
sample_corpus_editable = data.load_dataset('sample_dataset', is_editable=True)
sample_corpus_editable.chants[0].incipit = 'Mamma Mia! Here I go again!'
print('Edited incipit:')
print('\t', sample_corpus_editable.chants[0].incipit)
```

```
Loading chants and sources...
Data loaded!
Edited incipit:
	 Mamma Mia! Here I go again!
```


### Export the results of your work

A corpus can be exported to two CSV files, one for chants and one for sources:

```python
chants_csv_file_name = 'my_great_corpus-mamma_mia-CHANTS.csv'
sources_csv_file_name = 'my_great_corpus-mamma_mia-SOURCES.csv'
sample_corpus_editable.export_csv(chants_csv_file_name, sources_csv_file_name)
```
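To double-check an export, you can read the file back with the standard `csv` module. The helper below is a hypothetical sketch, and it assumes the exported file begins with a header row (such as the one from `csv_chants_header`), which we have not verified. So that it runs without `pycantus`, the demo writes a tiny stand-in file to a temporary directory; in practice you would pass `chants_csv_file_name`.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return the column names and the number of data rows in a CSV file."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        rows = list(reader)  # reading also populates reader.fieldnames
        return reader.fieldnames, len(rows)


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, 'demo-CHANTS.csv')
    with open(demo_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['cantus_id', 'incipit', 'siglum'])
        writer.writerow(['004141', 'Mamma Mia! Here I go again!', 'A-Gu 29'])
    columns, n_rows = summarize_csv(demo_path)

print(columns, n_rows)
```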

### Use your own data
You can also use your own data and process it within the provided data model using the analytics and data filtration tools.  
All you need to do is prepare your data as CSV file(s) in the prescribed format, i.e. with correctly named fields and all mandatory fields present, as listed below.  
For chants:  
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1").
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- chantlink (*): URL link directly to the chant entry in the external database (e.g., "https://yourdatabase.org/chant/45678").
- folio (*): Folio information for the chant (e.g., "001v").
- sequence: The order of the chant on the folio (e.g., "1").
- incipit (*): The opening words or phrase of the chant (e.g., "Non sufficiens sibi semel aspexisse vis").
- feast: Feast or liturgical occasion associated with the chant (e.g., "Nativitas Mariae").
- feast_code: Additional identifier unifying feasts with multiple spellings. The values themselves are meaningful in Cantus Index.
- genre: Genre of the chant, such as antiphon (A), responsory (R), hymn (H), etc. (e.g., "V").
- office: The office in which the chant is used, such as Matins (M) or Lauds (L) (e.g., "M").
- position: Liturgical position of the chant in the office (e.g., "01").
- cantus_id (*): The unique Cantus ID associated with the chant (e.g., "007129a").
- melody_id: The unique Melody ID associated with the chant (e.g., "001216m1").
- image: URL link to an image of the manuscript page, if available (e.g., "https://yourdatabase.org/image/12345").
- mode: Mode of the chant, if available (e.g., "1").
- full_text: Full text of the chant (e.g., "Non sufficiens sibi semel aspexisse vis amoris multiplicavit in ea intentionem inquisitionis").
- melody: Melody encoded in Volpiano, if available (e.g., "1---dH---h7--h--ghgfed--gH---h--h---").
- century: Number identifying the century of the source. If multiple centuries apply, the lowest number should be used.
- db (*): Code for the database providing the data, used for identification within Cantus Index (CI) (e.g., "DBcode").

For sources:  
- title (*): Name of the manuscript (can be the same as the siglum).
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1"). Use RISM sigla whenever possible.
- century: Textual value identifying the century of the source (e.g., "14th century").
- provenance: Place of origin or place of use of the source.
- numeric_century: Numeric-only representation of the century value (e.g., "14").

(Fields marked with an asterisk (*) are obligatory and must be included in every record. Other fields are optional but recommended when data is available.)
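Before loading your own files, you may want to check that their headers contain every mandatory field. The sets below are transcribed from the lists above (fields marked with an asterisk); the `missing_fields` helper is our own sketch, not part of `pycantus`, and it only inspects the header row, not the values.

```python
import csv
import io

# Mandatory fields, transcribed from the lists above
MANDATORY_CHANT_FIELDS = {'siglum', 'srclink', 'chantlink', 'folio',
                          'incipit', 'cantus_id', 'db'}
MANDATORY_SOURCE_FIELDS = {'title', 'srclink', 'siglum'}


def missing_fields(csv_file, mandatory):
    """Return the mandatory field names absent from the file's header row."""
    header = next(csv.reader(csv_file), [])
    return sorted(mandatory - {name.strip() for name in header})


# In-memory example; for real data use open('chants.csv', newline='', encoding='utf-8')
chants_csv = io.StringIO(
    'cantus_id,incipit,siglum,srclink,folio,db\n'
    '007129a,Non sufficiens,A-ABC Fragm. 1,https://yourdatabase.org/source/123,001v,DBcode\n'
)
result = missing_fields(chants_csv, MANDATORY_CHANT_FIELDS)
print(result)  # ['chantlink']
```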

```python
# Fill in your data paths
my_great_corpus_chants_filename = ...  # e.g. cantuscorpus_1.0/chants.csv
my_great_corpus_sources_filename = ...  # e.g. cantuscorpus_1.0/sources.csv
```

```python
my_great_corpus_editable = data.load_dataset(my_great_corpus_chants_filename, my_great_corpus_sources_filename, is_editable=True)
```

```python
print('My great corpus first chant record:\n', my_great_corpus_editable.chants[0])
```

All other steps are the same as with the predefined datasets (such as the sample dataset).

### Filter data
For data filtration, `pycantus` provides the `Filter` class.

```python
from pycantus.filtration import Filter
```