Topic-Modeling-Using-Transformer-Models

About this repo

In this analysis, I apply topic modeling to a collection of scientific publications from the Deggendorf Institute of Technology's publication database. The objective is to group similar documents and identify the active research areas in the database. The documents are encoded with TinyLlama-1.1B-Chat-v1.0 (https://arxiv.org/abs/2401.02385), a small chat model that adopts the Llama 2 architecture.
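As a rough illustration of how documents can be turned into vectors with this model, the sketch below averages the model's last hidden states over the non-padding tokens of each document (masked mean pooling). This is an assumption for illustration: the pooling strategy used in the notebooks is not described here, and the function names `mean_pool` and `embed_documents`, the Hugging Face model id, and the `max_length` value are hypothetical choices, not taken from the repository.

```python
import numpy as np


def mean_pool(hidden_states, attention_mask):
    """Average token vectors per document, ignoring padding positions.

    hidden_states: array of shape (batch, seq_len, dim)
    attention_mask: array of shape (batch, seq_len); 1 = real token, 0 = padding
    """
    hidden_states = np.asarray(hidden_states, dtype=np.float32)
    mask = np.asarray(attention_mask, dtype=np.float32)[..., None]
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1.0, None)  # guard against all-padding rows
    return summed / counts


def embed_documents(texts, model_name="TinyLlama/TinyLlama-1.1B-Chat-v1.0"):
    """Return one vector per text (assumed model id; heavy imports deferred)."""
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    if tokenizer.pad_token is None:
        # Batched padding needs a pad token; reuse EOS if none is defined.
        tokenizer.pad_token = tokenizer.eos_token
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    batch = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        output = model(**batch)
    return mean_pool(
        output.last_hidden_state.float().numpy(),
        batch["attention_mask"].numpy(),
    )
```

Calling `embed_documents(["first abstract", "second abstract"])` downloads the model on first use and returns an array with one row per document, which can then be passed to a dimensionality-reduction or clustering step.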

Instructions

To load and run the model, you need sufficient RAM and preferably a GPU; alternatively, you can use Google Colab.
Please ensure that the following libraries are installed: torch, transformers, keybert, langdetect, tqdm, matplotlib, scikit-learn (imported as sklearn), nltk, and umap-learn. If you encounter the error "ModuleNotFoundError: No module named 'X'", install the missing module, for example with pip.
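A quick way to see which of these libraries are missing before opening the notebooks is sketched below. It is only a convenience check, not part of the repository. The helper name `missing_packages` is hypothetical, and the mapping reflects the usual difference between import names and pip package names (`sklearn` is installed as `scikit-learn`, `umap` as `umap-learn`).

```python
import importlib.util

# Import name -> pip package name for the libraries listed above.
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "keybert": "keybert",
    "langdetect": "langdetect",
    "tqdm": "tqdm",
    "matplotlib": "matplotlib",
    "sklearn": "scikit-learn",
    "nltk": "nltk",
    "umap": "umap-learn",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of modules that cannot be found in this environment."""
    return [
        pip_name
        for module_name, pip_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Print a ready-to-run install command for whatever is absent.
        print("pip install " + " ".join(missing))
    else:
        print("All required libraries are installed.")
```

The check only looks up whether each module can be located; it does not import the libraries, so it runs quickly even when torch is installed.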

References

@misc{zhang2024tinyllama,
  title={TinyLlama: An Open-Source Small Language Model},
  author={Peiyuan Zhang and Guangtao Zeng and Tianduo Wang and Wei Lu},
  year={2024},
  eprint={2401.02385},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
