Advanced Data Analysis (ADA) Project
This is the repository for the code of the ADA project - Carnegie Mellon Statistics Department, in collaboration with the Human Rights Data Analysis Group (HRDAG).
The repository has three main subfolders:
- Arabic Soundex - Folder dedicated to the implementation and testing of the Arabic soundex
- Arabic Parser - Folder dedicated to the implementation and testing of the Arabic parser
- Arabic Utils - Folder dedicated to general functions for dealing with Arabic characters and strings
The functions in main.py and utils.py are used to retrieve language-agnostic features from strings.
The code is written in Python 2.7. The only extra package that needs to be installed is "unittest2", which is imported in the following way when testing is needed:
import unittest2 as unittest
All files are encoded in "utf-8" to support the Unicode characters used in Arabic. All functions are provided with docstrings. After importing a function, use the following to view its docstring, or look at the file in which it is defined:
from utils import foo
help(foo)
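The following sketch illustrates the same idea on a self-contained example. The function `foo` defined here is hypothetical (it stands in for any function from the repository), and `inspect.getdoc` is used as an alternative to `help` that returns the docstring as a string instead of opening the interactive pager.

```python
# -*- coding: utf-8 -*-
import inspect


def foo(text):
    """Hypothetical function: return the number of characters in `text`."""
    return len(text)


# Retrieve the docstring as a plain string and print it.
doc = inspect.getdoc(foo)
print(doc)

# With the utf-8 source encoding declared above, Arabic literals can be
# written directly in the file; this word has four characters.
length = foo(u"سلام")
```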
Running the tests
The tests for each subfolder and for the main functions are contained in the corresponding "test" folder. The tests are meant to be run from within that folder, i.e. you have to cd into the "test" folder in order to run them.
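One possible way to script this is sketched below. It assumes the test files follow the default `test*.py` naming pattern of `unittest` discovery, which the repository does not state; the function name `run_tests_in` is hypothetical.

```python
import os
import unittest


def run_tests_in(folder, pattern="test*.py"):
    """Change into `folder`, discover the test files there, and run them.

    The working directory is restored afterwards, even if a test fails.
    """
    previous = os.getcwd()
    os.chdir(folder)
    try:
        suite = unittest.defaultTestLoader.discover(".", pattern=pattern)
        return unittest.TextTestRunner(verbosity=1).run(suite)
    finally:
        os.chdir(previous)


# Example call (the path is an assumption about the local checkout):
# run_tests_in("test")
```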
Tests are commented within the test files. The names used are synthetic and do not represent any real individual.
- Nic Dalmasso
- Jared Murray, Jordan Rodu and Robin Mejia for pivotal feedback and discussions
(README style from https://gist.github.com/PurpleBooth/109311bb0361f32d87a2)