CSC2611-project

This is the Repo for Computational Semantic Change Course Capstone Project.

Full Presentation Slides: https://docs.google.com/presentation/d/1J9_k48yt2Msn836-zUJ1p2kl1bj13QsNrtHj_hC_9E4/edit?usp=sharing

For the full datasets and trained word embeddings / models, contact: mhorusli@cs.toronto.edu

Previous References (Deprecated)

Learning Dynamic Contextualised Embedding (2022) https://arxiv.org/pdf/2208.10734.pdf
Di Carlo, V., Bianchi, F., & Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6326-6334. https://doi.org/10.1609/aaai.v33i01.33016326
@inproceedings{jawahar-seddah-2019-contextualized, title = "Contextualized Diachronic Word Representations", author = "Jawahar, Ganesh and Seddah, Djam{'e}", booktitle = "Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change", month = aug, year = "2019", address = "Florence, Italy", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/W19-4705", doi = "10.18653/v1/W19-4705", pages = "35--47", abstract = "Diachronic word embeddings play a key role in capturing interesting patterns about how language evolves over time. Most of the existing work focuses on studying corpora spanning across several decades, which is understandably still not a possibility when working on social media-based user-generated content. In this work, we address the problem of studying semantic changes in a large Twitter corpus collected over five years, a much shorter period than what is usually the norm in diachronic studies. We devise a novel attentional model, based on Bernoulli word embeddings, that are conditioned on contextual extra-linguistic (social) features such as network, spatial and socio-economic variables, which are associated with Twitter users, as well as topic-based features. We posit that these social features provide an inductive bias that helps our model to overcome the narrow time-span regime problem. Our extensive experiments reveal that our proposed model is able to capture subtle semantic shifts without being biased towards frequency cues and also works well when certain contextual features are absent. Our model fits the data better than current state-of-the-art dynamic word embedding models and therefore is a promising tool to study diachronic semantic changes over small time periods.", }
Maja Rudolph and David Blei, 2017. Dynamic Bernoulli Embeddings for Language Evolution. arxiv preprint arxiv:1703.08052.

[evaluation related]

word embedding direct evaluation method

Wang, B., Wang, A., Chen, F., Wang, Y., & Kuo, C. (2019). Evaluating word embedding models: Methods and experimental results. APSIPA Transactions on Signal and Information Processing, 8, E19. doi:10.1017/ATSIP.2019.12 https://www.cambridge.org/core/services/aop-cambridge-core/content/view/EDF43F837150B94E71DBB36B28B85E79/S204877031900012Xa.pdf/div-class-title-evaluating-word-embedding-models-methods-and-experimental-results-div.pdf [grand general guidelines]
Arianna Betti, Martin Reynaert, Thijs Ossenkoppele, Yvette Oortwijn, Andrew Salway, and Jelke Bloem. 2020. Expert Concept-Modeling Ground Truth Construction for Word Embeddings Evaluation in Concept-Focused Domains. In Proceedings of the 28th International Conference on Computational Linguistics, pages 6690–6702, Barcelona, Spain (Online). International Committee on Computational Linguistics. https://aclanthology.org/2020.coling-main.586.pdf [philosphy, no code available]
https://arxiv.org/pdf/2005.03812.pdf [COMPARATIVE ANALYSIS OF WORD EMBEDDINGS FOR CAPTURING WORD SIMILARITIES]
https://arxiv.org/pdf/1605.09096.pdf [histwords]

use domain specific embedding not as an "end" but applied (e.g. social science prior knowledge/results)

use eval methods from lab, visualize & compare the sims/clusters

first - last
max/min/mean/median/...
cluster & jaccard

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
__pycache__		__pycache__
data		data
PPMI_Results.ipynb		PPMI_Results.ipynb
README.md		README.md
augmented_train_twec.ipynb		augmented_train_twec.ipynb
comparative_train_and_analysis.ipynb		comparative_train_and_analysis.ipynb
data_ETL_with_spark.py		data_ETL_with_spark.py
data_preprocess.py		data_preprocess.py
download_s2orc.py		download_s2orc.py
frequency_of_original_common_words.csv		frequency_of_original_common_words.csv
pre_results.ipynb		pre_results.ipynb
requirements_dev.txt		requirements_dev.txt
trainTWEC.py		trainTWEC.py
twec.py		twec.py
twec_training_experiment_hyper.ipynb		twec_training_experiment_hyper.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CSC2611-project

Previous References (Deprecated)

[evaluation related]

word embedding direct evaluation method

use domain specific embedding not as an "end" but applied (e.g. social science prior knowledge/results)

use eval methods from lab, visualize & compare the sims/clusters

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

ProjectsOfMLee/Semantic-Change-Detection

Folders and files

Latest commit

History

Repository files navigation

CSC2611-project

Previous References (Deprecated)

[evaluation related]

word embedding direct evaluation method

use domain specific embedding not as an "end" but applied (e.g. social science prior knowledge/results)

use eval methods from lab, visualize & compare the sims/clusters

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages