Experiments with word2vec embeddings for synonym detection, for the Romanian language. (Python; updated Sep 10, 2023)
I tried to distinguish positive and negative comments on my YouTube videos, so I used NLP to analyze them. The main language is set to Korean, but you can try English instead.
A dataset of 2,095 plain-text articles across 5 categories, with over 805k words in total.
4,308 short stories (4 million words) scraped from https://reddit.com/r/WritingPrompts
Repo for Turkish sentiment analysis dataset, "Vitamins and Supplements Customer Reviews"
Data preprocessing and training on Drug Review Dataset using Hugging Face library
Sentiment Analysis on Product Reviews (project associated with Zummit Infolabs).
My NLP project storage.
Web crawler for Turkish news.
Official repository for "Demonstrations Are All You Need: Advancing Offensive Content Paraphrasing using In-Context Learning".
Python program for detecting unintentional bilingual and translation instances in NLP datasets.
A novel Romanian-language dataset for offensive message detection, with comments from a local Romanian news website (stiri de cluj) manually annotated into five classes.
A Python library for managing and annotating text corpora in different formats.
➰Loop through a TSV file and pass columns of data to an external program. A Bash script.
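A minimal sketch of the technique that entry describes — looping over a TSV file in Bash and handing a column to an external program. The file names, column layout, and the use of `wc -w` as the external command are assumptions for illustration, not details taken from that repo:

```shell
#!/usr/bin/env bash
# Create a tiny sample TSV (assumed layout: an id column and a text column).
printf 'a\thello world\nb\thi\n' > sample.tsv

# Start with a fresh output file.
: > counts.tsv

# Read the TSV line by line, splitting on tabs only. `read -r` keeps
# backslashes literal; IFS=$'\t' prevents splitting on spaces inside fields.
while IFS=$'\t' read -r id text; do
    # Pass the text column to an external program (`wc -w` here, standing
    # in for any command). Arithmetic expansion strips wc's padding.
    words=$(( $(printf '%s' "$text" | wc -w) ))
    printf '%s\t%s\n' "$id" "$words" >> counts.tsv
done < sample.tsv
```

Keeping the per-line split in `read` (rather than `cut`-ing the file twice) means each row is processed once, and extra columns can be captured by adding a trailing variable to the `read` list.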
Classifying an SMS as spam or non-spam using Natural Language Processing (NLP) and Machine Learning.
EACL 2021 paper (SJ_AJ@DravidianLangTech-EACL2021: Task-Adaptive Pre-Training of Multilingual BERT models for Offensive Language Identification)
Creating an NLP pipeline to clean movie review data and write the cleaned data to an output file.