## Pycantus tutorial

This tutorial covers the basics of working with the `pycantus` library.  
`pycantus` is a Python library designed to enhance the accessibility of Gregorian chants for both coders and non-coders.

First, make sure you have the latest version of `pycantus` installed.  
You can do so by running the following on the command line:  
`pip install --upgrade pycantus`
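If you want to confirm from Python which version ended up installed, the standard library's `importlib.metadata` can report it. This is a small sketch that assumes Python 3.8+ and that the installed distribution is named `pycantus` (as in the `pip` command above); the helper name `installed_version` is our own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of `distribution`, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment
        return None


print("pycantus version:", installed_version("pycantus") or "not installed")
```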

Now let's make sure you can use `pycantus`.

In [1]:
import pycantus
import pycantus.data as data

pycantus.hello_pycantus()


    *********************************************
    *                                           *
    *           Welcome to PyCantus!            *
    *                                           *
    *    A Python library designed to enhance   *
    *    accessibility of Gregorian chants for  *
    *    both coders and non-coders.            *
    *                                           *
    *********************************************
    


#### Get your first corpus to play with
Working with data is at the core of `pycantus`.  
The data is stored in a `Corpus` object, which contains a list of chants (`Chant` objects) and, optionally, a list of sources (`Source` objects).  
You can load one of the predefined datasets as well as your own files.

In [2]:
sample_corpus = data.load_dataset('sample_dataset')

Loading chants and sources...
Data loaded!


Now let's look at which fields `Chant` and `Source` hold as data containers:

In [3]:
sample_corpus.csv_chants_header

'cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,genre,office,position,melody_id,image,mode,full_text,melody,century'

In [4]:
sample_corpus.csv_sources_header

'title,siglum,century,provenance,srclink'

And here is what individual records look like:

In [5]:
sample_corpus.chants[0].to_csv_row

'206135,Quasi David dejicit funda Philistaeum dum ,SK-Bra (Bratislava) Antiphonary of Bratislava I EC Lad.3,https://cantus.sk/source/14828,https://cantus.sk/chant/21201,163v,CSK,,Emerici,A,L,4,,,,Quasi David dejicit funda Philistaeum dum de carne perficit Emericus trophaeum,,'

In [6]:
sample_corpus.sources[2].to_csv_row

'Wien, Österreichische Nationalbibliothek, 1799**,A-Wn 1799**,13th century,Rein,https://cantusdatabase.org/source/123667'
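To see which value belongs to which field, you can pair a header with a row. The sketch below does this with Python's standard `csv` module on the strings copied from the outputs above; `row_to_dict` is a hypothetical helper, not part of `pycantus`.

```python
import csv

# Header and row strings copied from the notebook output above
CHANTS_HEADER = ("cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,"
                 "genre,office,position,melody_id,image,mode,full_text,melody,century")
CHANT_ROW = ("206135,Quasi David dejicit funda Philistaeum dum ,SK-Bra (Bratislava) "
             "Antiphonary of Bratislava I EC Lad.3,https://cantus.sk/source/14828,"
             "https://cantus.sk/chant/21201,163v,CSK,,Emerici,A,L,4,,,,"
             "Quasi David dejicit funda Philistaeum dum de carne perficit Emericus trophaeum,,")


def row_to_dict(header, row):
    """Pair each field name in `header` with the matching value in `row`."""
    names = header.split(",")
    # csv.reader handles quoted values; a one-element list acts as a one-line file
    values = next(csv.reader([row]))
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} values, got {len(values)}")
    return dict(zip(names, values))


record = row_to_dict(CHANTS_HEADER, CHANT_ROW)
for name, value in record.items():
    if value:  # skip empty optional fields
        print(f"{name:>10}: {value}")
```

Note that the source row as displayed above would not pass this check: its title contains commas but is shown without quotes, so it splits into more values than the source header has fields.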

In [7]:
print('My chants have incipits:')
for chant in sample_corpus.chants:
    print('\t', chant.incipit)

My chants have incipits:
	 Quasi David dejicit funda Philistaeum dum 
	 Quasi David deicit funda Philisteum
	 Quidam autem ex Judaeis
	 Quidam autem ex Judaeis volebant
	 Quidam autem ex Judaeis volebant
	 Quidam autem ex Judaeis volebant 
	 Quidam autem ex Judaeis volebant inter 
	 Quidam autem ex Judaeis
	 Quidam autem ex Judaeis volebant inter 
	 Quidam autem ex Judaeis
	 Iussit eam virgis cedi suspensam
	 His festis mundus jubili sit
	 Puer Jesus crescebat plenus
	 Puer Jesus crescebat plenus
	 Puer Jesus*
	 Puer Jesus crescebat plenus
	 Puer Jesus crescebat plenus
	 Maxima namque dominus per fimbrias
	 Maxima*
	 Cum oraret beatus Martialis
	 Cum oraret*
	 Cum oraret beatus Martialis apparens Dominus 
	 Cum oraret *
	 Milites sumus*
	 Utquem jam venerabilem vitam fecerat etiam 
	 Utquem jam venerabilem vitam
	 Deo cum laetitia serviens puella
	 Deo cum laetitia serviens
	 Deo cum laetitia serviens
	 Deo cum laetitia serviens
	 Deo cum laetitia serviens puella
	 Deo cum laetitia serv
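The listing shows that many incipits repeat. As a sketch of further processing with plain Python, the hypothetical helper below counts identical incipits (ignoring surrounding whitespace). In the notebook it could be called as `incipit_counts(chant.incipit for chant in sample_corpus.chants)`; here it runs on a few incipits copied from the listing.

```python
from collections import Counter


def incipit_counts(incipits):
    """Count how often each incipit occurs, ignoring leading/trailing whitespace."""
    # `or ""` guards against missing (None) incipits, which we assume could occur
    return Counter((incipit or "").strip() for incipit in incipits)


# A few incipits copied from the listing above
sample = [
    "Puer Jesus crescebat plenus",
    "Puer Jesus crescebat plenus",
    "Puer Jesus*",
    "Quidam autem ex Judaeis volebant",
    "Quidam autem ex Judaeis volebant ",
]
for incipit, n in incipit_counts(sample).most_common():
    print(n, incipit)
```

Exact string matching is deliberately simple: abbreviated forms such as "Puer Jesus*" are counted separately from the full incipit.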

##### Editability

By default, a `Corpus` is not editable, and neither are its `Chant` and `Source` objects.  
They are locked, which means their values cannot be changed; any attempt to modify a value raises an error:

In [8]:
sample_corpus.chants[0].incipit = 'Mamma Mia! Here I go again!'

AttributeError: Cannot modify 'incipit' because the object is locked.

However, if you need to make such edits (e.g., to clean Volpiano melodies), you can load the whole `Corpus` as editable:

In [None]:
sample_corpus_editable = data.load_dataset('sample_dataset', is_editable=True)
sample_corpus_editable.chants[0].incipit = 'Mamma Mia! Here I go again!'
print('Edited incipit:')
print('\t', sample_corpus_editable.chants[0].incipit)

Loading chants and sources...
Data loaded!
Edited incipit:
	 Mamma Mia! Here I go again!
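The locking behaviour shown above can be pictured with a small stand-alone class that overrides `__setattr__`. This is only an illustration of the idea, not the actual `pycantus` implementation; `LockableRecord` is hypothetical, and its `is_editable` flag merely mirrors the `load_dataset` parameter.

```python
class LockableRecord:
    """Hypothetical record whose attributes are frozen unless created as editable."""

    def __init__(self, is_editable=False, **fields):
        # Bypass our own __setattr__ while populating the initial fields
        for name, value in fields.items():
            object.__setattr__(self, name, value)
        object.__setattr__(self, "_locked", not is_editable)

    def __setattr__(self, name, value):
        # Refuse any assignment once the record is locked
        if self._locked:
            raise AttributeError(f"Cannot modify '{name}' because the object is locked.")
        object.__setattr__(self, name, value)


locked = LockableRecord(incipit="Quasi David dejicit funda Philistaeum dum ")
try:
    locked.incipit = "Mamma Mia! Here I go again!"
except AttributeError as err:
    print(err)  # Cannot modify 'incipit' because the object is locked.

editable = LockableRecord(is_editable=True, incipit="Quasi David dejicit funda Philistaeum dum ")
editable.incipit = "Mamma Mia! Here I go again!"
print(editable.incipit)  # Mamma Mia! Here I go again!
```

Catching `AttributeError` in the same way is also a reasonable pattern in your own code if you are unsure whether a corpus was loaded as editable.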


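#### Export results of your work
You can export a corpus (for example, the edited one) to two CSV files, one for chants and one for sources: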

In [None]:
chants_csv_file_name = 'my_great_chant_corpus-CHANTS.csv'
sources_csv_file_name = 'my_great_chant_corpus-SOURCES.csv'
sample_corpus_editable.export_csv(chants_csv_file_name, sources_csv_file_name)
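To sanity-check an exported file, you can read it back with the standard `csv` module. This sketch assumes that `export_csv` writes UTF-8 files with a header row followed by one row per record; `summarize_export` is a hypothetical helper.

```python
import csv


def summarize_export(path, expected_header=None):
    """Return (field_names, row_count) for a CSV file; optionally verify its header."""
    # Assumption: the exported file is UTF-8 and starts with a header row
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fields = reader.fieldnames or []
    if expected_header is not None and fields != expected_header.split(","):
        raise ValueError(f"Unexpected header in {path}: {fields}")
    return fields, len(rows)
```

For example, `summarize_export(chants_csv_file_name, sample_corpus_editable.csv_chants_header)` would compare the exported header with the corpus header and report how many chants were written.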

#### Use your own data
You can also use your own data and process it within the provided data model using our analytics and data filtering tools.  
All you need to do is prepare your data as CSV file(s) in the prescribed format, i.e., file(s) with the correct fields, including all mandatory ones:  
For chants:  
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1").
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- chantlink (*): URL link directly to the chant entry in the external database (e.g., "https://yourdatabase.org/chant/45678").
- folio (*): Folio information for the chant (e.g., "001v").
- sequence: The order of the chant on the folio (e.g., "1").
- incipit (*): The opening words or phrase of the chant (e.g., "Non sufficiens sibi semel aspexisse vis ").
- feast: Feast or liturgical occasion associated with the chant (e.g., "Nativitas Mariae").
- genre: Genre of the chant, such as antiphon (A), responsory (R), hymn (H), etc. (e.g., "V").
- office: The office in which the chant is used, such as Matins (M) or Lauds (L) (e.g., "M").
- position: Liturgical position of the chant in the office (e.g., "01").
- cantus_id (*): The unique Cantus ID associated with the chant (e.g., "007129a").
- melody_id: The unique Melody ID associated with the chant (e.g., "001216m1").
- image: URL link to an image of the manuscript page, if available (e.g., "https://yourdatabase.org/image/12345").
- mode: Mode of the chant, if available (e.g., "1").
- full_text: Full text of the chant (e.g., "Non sufficiens sibi semel aspexisse vis amoris multiplicavit in ea inten]tionem inquisitionis").
- melody: Melody encoded in Volpiano, if available (e.g., "1---dH---h7--h--ghgfed--gH---h--h---").
- century: Number identifying the century of the source. If multiple centuries apply, the lowest number should be used. (e.g., "12").
- db (*): Code for the database providing the data, used for identification within Cantus Index (CI) (e.g., "DBcode").

For sources:  
- title (*): Name of the source (may be the same as the siglum).
- srclink (*): URL link to the source in the external database (e.g., "https://yourdatabase.org/source/123").
- siglum (*): Abbreviation for the source manuscript or collection (e.g., "A-ABC Fragm. 1"). Use RISM sigla whenever possible.
- century: Century of the source's origin (e.g., "13th century").
- provenance: Name of the place of the source's origin (e.g., "Rein").

(Fields marked with an asterisk (*) are obligatory and must be included in every record. Other fields are optional but recommended when data is available.)
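If your data lives in Python structures, one way to produce a chants file is `csv.DictWriter`. The sketch below writes every column in the order of `csv_chants_header` shown earlier and leaves missing optional fields empty; whether `pycantus` requires all columns or only the mandatory ones is an assumption here. `write_chants_csv` is hypothetical, and the example values are the placeholders from the field list.

```python
import csv

# Column order taken from `csv_chants_header` shown earlier in this tutorial
CHANT_FIELDS = ("cantus_id,incipit,siglum,srclink,chantlink,folio,db,sequence,feast,"
                "genre,office,position,melody_id,image,mode,full_text,melody,century").split(",")


def write_chants_csv(path, records):
    """Write chant records (dicts) to `path`, leaving absent optional fields empty."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        # restval="" fills any column a record does not provide;
        # the default extrasaction="raise" rejects misspelled field names
        writer = csv.DictWriter(f, fieldnames=CHANT_FIELDS, restval="")
        writer.writeheader()
        writer.writerows(records)


# Placeholder values taken from the field descriptions above (mandatory fields only)
example_chant = {
    "cantus_id": "007129a",
    "incipit": "Non sufficiens sibi semel aspexisse vis ",
    "siglum": "A-ABC Fragm. 1",
    "srclink": "https://yourdatabase.org/source/123",
    "chantlink": "https://yourdatabase.org/chant/45678",
    "folio": "001v",
    "db": "DBcode",
}
```

Calling `write_chants_csv('my_chants.csv', [example_chant])` would produce a one-record file you can then load as shown below.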

In [None]:
# Fill in your data paths
my_great_corpus_chants_filename = ...
my_great_corpus_sources_filename = ...
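Before loading, you may want to check that your files contain every mandatory field, and that no record leaves one empty. The helper below is a hypothetical sketch using only the standard `csv` module (assuming UTF-8 files with a header row); it is not the `pycantus` validation component. The required names are the fields marked with (*) in the lists above.

```python
import csv

# Mandatory fields as marked with (*) in the field lists above
REQUIRED_CHANT_FIELDS = ["cantus_id", "incipit", "siglum", "srclink", "chantlink", "folio", "db"]
REQUIRED_SOURCE_FIELDS = ["title", "siglum", "srclink"]


def check_required_fields(path, required):
    """Return a list of human-readable problems found in the CSV file at `path`."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [name for name in required if name not in header]
        if missing:
            # Without the columns there is no point in checking individual records
            return [f"missing column(s): {', '.join(missing)}"]
        # Records are numbered from 1, not counting the header row
        for n, row in enumerate(reader, start=1):
            # Short rows yield None for absent columns, so treat None as empty
            empty = [name for name in required if not (row[name] or "").strip()]
            if empty:
                problems.append(f"record {n}: empty {', '.join(empty)}")
    return problems
```

For example, `check_required_fields(my_great_corpus_chants_filename, REQUIRED_CHANT_FIELDS)` returns an empty list when every record has all mandatory chant fields filled in.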

In [None]:
my_great_corpus_editable = data.load_dataset(my_great_corpus_chants_filename, my_great_corpus_sources_filename, is_editable=True)

In [None]:
print('My great corpus first chant record:\n', my_great_corpus_editable.chants[0])

All other steps are the same as when using one of the predefined datasets (such as the sample dataset).

#### Filter data

#### Validation component

#### Analytics tools