Part Of Speech Tagger

Part-of-speech (POS) tagging is one of the most widely addressed tasks in natural language processing (NLP), and effective POS taggers exist for many languages. We set out to develop a POS tagger for Arabic, specifically Modern Standard Arabic (MSA), because it is the language used in formal textbooks and in the news.

The solution has two parts. First, a tokenizer splits any file you choose into a list of words, removing all punctuation and numbers. Second, a POS tagger takes the list of words from the tokenizer and tags each word with its appropriate part of speech (verb, noun, or particle) based on a combination of rules. Finally, we built a gold-standard corpus from a sample of the Corpus folder to test the algorithm and measure how accurate and precise its tagging is.
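The tokenize, tag, and evaluate pipeline described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in tokenizer.py or tagger.py: the regular expression, the particle list, the prefix/suffix rules, and the noun fallback are all hypothetical, chosen only to show how a rule-based tagger of this kind fits together and how accuracy and per-tag precision are computed against a gold-standard corpus.

```python
import re

# Hypothetical character class: Arabic letters and short-vowel diacritics
# (U+0621..U+0652). Punctuation such as "،" and "؟", Latin and Arabic-Indic
# digits, and whitespace all fall outside it, so they act as separators.
ARABIC_WORD = re.compile(r"[\u0621-\u0652]+")

# Hypothetical rule tables, for illustration only.
PARTICLES = {"في", "من", "إلى", "على", "عن", "لم", "لن", "قد", "ثم", "أو", "هل", "لا"}
NOUN_PREFIXES = ("ال",)                    # definite article
NOUN_SUFFIXES = ("ة", "ات")                 # ta marbuta, sound feminine plural
VERB_PREFIXES = ("سي", "ست", "سن", "سأ")    # future marker + imperfect prefix
VERB_SUFFIXES = ("وا",)                     # e.g. plural past-tense ending


def tokenize(text):
    """Return the Arabic words in text, dropping punctuation and numbers."""
    return ARABIC_WORD.findall(text)


def tokenize_file(path):
    """Tokenize a whole file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as f:
        return tokenize(f.read())


def tag_word(word):
    """Tag one word as 'particle', 'noun' or 'verb' by checking rules in order."""
    if word in PARTICLES:
        return "particle"
    if word.startswith(NOUN_PREFIXES) or word.endswith(NOUN_SUFFIXES):
        return "noun"
    if word.startswith(VERB_PREFIXES) or word.endswith(VERB_SUFFIXES):
        return "verb"
    return "noun"  # fallback when no rule fires (an assumption)


def tag(words):
    """Tag every word in a token list."""
    return [tag_word(w) for w in words]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold tags must be aligned token by token")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def precision(predicted, gold, label):
    """Of the tokens tagged `label`, the fraction whose gold tag is also `label`."""
    hits = [g for p, g in zip(predicted, gold) if p == label]
    return sum(g == label for g in hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    words = tokenize("ذهب الطلاب إلى المدرسة، 3 مرات.")
    predicted = tag(words)
    gold = ["verb", "noun", "particle", "noun", "noun"]  # hand-made example
    print(words)       # the comma, the digit and the full stop are gone
    print(predicted)   # "ذهب" falls through to the noun fallback
    print(accuracy(predicted, gold))           # 0.8
    print(precision(predicted, gold, "noun"))  # 0.75
```

Ordering the checks (particles first, then noun cues, then verb cues) is one simple way to combine rules: a word such as "سيارة" starts like a future-tense verb but is caught by the ta marbuta noun rule first. The deliberately missed "ذهب" shows why a gold-standard corpus is needed to measure such heuristics.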

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

What you need to install the software, and how to install it:

Python, pip, pandas, matplotlib, xlrd

Installing Python:

https://www.python.org/downloads/

Installing pip:

If you are running Python 2.7.9+ or Python 3.4+, you should already have pip installed. If not, read on.

Download get-pip.py (https://bootstrap.pypa.io/get-pip.py) to a folder on your computer.
Open a command prompt and navigate to the folder containing get-pip.py.
Run the following command:

python get-pip.py

Pip is now installed!

Installing pandas, matplotlib and xlrd:

pip install pandas matplotlib xlrd
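To confirm that the installation worked, a check along these lines can be run with the same python that will launch the program. It is a hypothetical helper, not part of the repository; the module list simply mirrors the prerequisites above plus tkinter, the GUI toolkit listed under "Built With".

```python
import importlib.util
import sys

# Modules named in this README: the prerequisites above plus tkinter.
REQUIRED = ["pip", "pandas", "matplotlib", "xlrd", "tkinter"]


def missing_packages(names):
    """Return the names the current interpreter cannot locate.

    find_spec only checks that a top-level module can be found; it does not
    import it, so a broken installation could still pass this check.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All prerequisites were found.")
```

Anything reported as missing can be installed with pip as shown above, except tkinter, which is bundled with the python.org installers rather than installed through pip.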

Running the program

Now you can double-click POSTagger.bat, and the program window should pop up:

[Screenshot: program window]

After testing one of the files:

[Screenshot: window after testing one of the files]

Built With

Tkinter - Tkinter is Python's de facto standard GUI (Graphical User Interface) package.

Developers

Omar AlQaisi - OmarQaisi
Marwan AlRamahi - Marwan998
Motassem Naqawah - moenaqawah

License

This project is licensed under the MIT License - see the LICENSE file for details.

