swedish_poetry

Poetic Corpus for Swedish Language

Database

Current corpus is in an SQLite database that can be viewed with, for example, DB Browser for SQLite.

Database has following schema:

Corpus has texts from three swedish authors:

Karin Boye, volumes:
- MOLN
- GÖMDA LAND
- HÄRDARNA
- FÖR TRÄDETS SKULL
- DE SJU DÖDSSYNDERNA
Nils Ferlin, volumes:
- GOGGLES
- BARFOTABARN
- FRÅN MITT EKORRHJUL
- EN DÖDDANSARES VISOR
- KEJSARENS PAPEGOJA
- MED MÅNGA KULÖRTA LYKTOR
Esaias Tegnér, volumes:
- MINDRE DIKTER
- FRITHIOFS SAGA

Source code

Available source code by file:

get_texts_boye.py:
- Crawl http://www.karinboye.se/verk/dikter/dikter/ to save poems as .txt files.
- List of poems to collect is in dikt.txt
get_texts_tegner.py:
- Crawl https://svenskadikter.com/ to collect texts of Frithiofs saga in JSON format to the output file tegner_frithiofs_saga_source_text.json.
- List of poems to collect is in file tegner_saga.txt
count_strofika.py:
- Split poems in stanzas.
- Split poems into individual lines.
- Determine whether stanza pattern is regular vs. irregular.
- Update number of syllables per line.
count_syllabs.py:
- Insert poems into DB.
- Create connection between poem and author.
- Split poems into stanzas and lines.
add_words.py:
- Add individual words from each poem to the database table word, including its gloss, lemma and line id it came from.
- Stagger was used to obtain words' glosses and lemmas.

Some code (DB creation, determining genre, getting Ferlin's texts) has been lost, most of the other code I wouldn't recommend using as is. It would require significant refactoring for any future use and to adhere to better coding practices.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
annotated		annotated
annotated_conll		annotated_conll
doc		doc
models		models
poems		poems
scripts		scripts
src		src
README.md		README.md
add_words.py		add_words.py
build.xml		build.xml
count_strofika.py		count_strofika.py
count_syllabs.py		count_syllabs.py
db_schema-1.png		db_schema-1.png
dikt.txt		dikt.txt
get_texts_boye.py		get_texts_boye.py
get_texts_tegner.py		get_texts_tegner.py
stagger.jar		stagger.jar
start.bat		start.bat
swedish_poetry.db		swedish_poetry.db
tegner_frithiofs_saga_source_text.json		tegner_frithiofs_saga_source_text.json
tegner_saga.txt		tegner_saga.txt
temp.txt		temp.txt
train_file.conll		train_file.conll
train_file.py		train_file.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

swedish_poetry

Database

Source code

About

Releases

Packages

Languages

aischeveva/swedish_poetry

Folders and files

Latest commit

History

Repository files navigation

swedish_poetry

Database

Source code

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages