Language Modeling

This project contains a Bert-model on norwegian and the code to rebuild. Feel free to reuse it.

The model is in the data-directory. When checkout use Git lfs, as the file is pretty big.

Why

This model works out of the box with f.eks. the sentence transfomer library. It is built on top of the multilingual bert.

Getting Started

from sentence_transformers import SentenceTransformer
model = SentenceTransformer('checkpoint-33500')
vector =  model.encode("Ibsen er en norsk forfatter som ikke var redd for litt kinnskjegg..")

Fine-tuning

Bert works best when fine tuned against the relevant material.

Put data in a directory and follow the train-part in the scripts here.

Notebooks

To use your module code (src/) in Jupyter notebooks (notebooks/) without running into import errors, make sure to install the source locally

pip install -e .

This way, you'll always use the latest version of your module code in your notebooks via import language_modeling.

Assuming you already have Jupyter installed, you can make your virtual environment available as a separate kernel by running:

pip install ipykernel
python -m ipykernel install --user --name="language-modeling"

Note that we mainly use notebooks for experiments, visualizations and reports. Every piece of functionality that is meant to be reused should go into module code and be imported into notebooks.

Distribution Package

To build a distribution package (wheel), please use

python setup.py dist

this will clean up the build folder and then run the bdist_wheel command.

Contributions

Before contributing, please set up the pre-commit hooks to reduce errors and ensure consistency

pip install -U pre-commit
pre-commit install

Contact

petter.egesund@sannsyn.com

Thanks to Russ and Wassim for cooperation :)

License

Freebsd

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
build/lib/language_modeling		build/lib/language_modeling
config		config
data		data
dist		dist
docs		docs
scripts		scripts
src		src
tests		tests
.DS_Store		.DS_Store
.gitattributes		.gitattributes
Dockerfile		Dockerfile
README.md		README.md
docker-compose.yml		docker-compose.yml
requirements.txt		requirements.txt
setup.py		setup.py

pegesund/nor_bert

Folders and files

Latest commit

History

Repository files navigation

Language Modeling

Why

Getting Started

Fine-tuning

Notebooks

Distribution Package

Contributions

Contact

License

About

Resources

Stars

Watchers

Forks

Languages