
Advanced Data Analysis (ADA) Project

This is the repository for the code of the ADA project at the Carnegie Mellon Statistics Department, in collaboration with the Human Rights Data Analysis Group (HRDAG).

Getting Started

The repository has three main subfolders:

  • Arabic Soundex - Folder dedicated to the implementation and testing of the Arabic Soundex
  • Arabic Parser - Folder dedicated to the implementation and testing of the Arabic parser
  • Arabic Utils - Folder dedicated to general functions for working with Arabic characters and strings
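
As an illustration of the kind of helper a character utilities folder might hold, here is a minimal sketch that strips Arabic diacritics (harakat) from a string. The function name `strip_diacritics` is hypothetical and is not taken from this repository; it relies only on the standard library `unicodedata` module.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import unicodedata


def strip_diacritics(text):
    """Return text with combining marks (e.g. Arabic harakat) removed.

    Hypothetical helper for illustration only, not the repository's code.
    """
    # unicodedata.combining() returns a non-zero class for combining marks
    # such as fatha (U+064E) and kasra (U+0650), and 0 for base letters.
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# A common noun written with vowel marks, given as escapes for clarity.
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628"
plain = strip_diacritics(vocalized)
```

The base letters pass through untouched, so the result can be compared against unvocalized text.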

The functions in file are functions used to retrieve language agnostic features from strings - same for the functions in


The code is written in Python 2.7. The only extra package required is "unittest2", which is imported as follows when testing is needed:

import unittest2 as unittest

All files are encoded in UTF-8 to support the Unicode characters used in Arabic. All functions have docstrings. When importing them, use the following to view the docstring, or look at the file in which they are defined:

from import foo
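
A minimal sketch of viewing a docstring in Python. The function `foo` here is a hypothetical placeholder defined inline so the example runs on its own; it is not one of the repository's functions.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import inspect


def foo(name):
    """Return the given name unchanged (hypothetical placeholder)."""
    return name


# The raw docstring is stored on the function object itself.
print(foo.__doc__)

# inspect.getdoc() returns the same text with indentation cleaned up.
doc = inspect.getdoc(foo)
print(doc)
```

In an interactive session, `help(foo)` shows the same text together with the function signature.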

Running the tests

The tests for each subfolder and for the main functions are contained in the corresponding "test" folder. The tests are meant to be run from within that folder, i.e. you need to cd into the "test" folder in order to run them.
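
A minimal sketch of what a test module in a "test" folder could look like. The function `normalize_spaces` and the test class are hypothetical and only illustrate the layout; the import falls back to the standard library `unittest` when "unittest2" is not installed, which is an assumption added here and not something the repository states.

```python
# -*- coding: utf-8 -*-
try:
    import unittest2 as unittest  # backport used with Python 2.7
except ImportError:
    import unittest  # assumption: fall back to the standard library


def normalize_spaces(text):
    """Collapse runs of whitespace into single spaces (hypothetical)."""
    return " ".join(text.split())


class NormalizeSpacesTest(unittest.TestCase):
    def test_collapses_repeated_spaces(self):
        self.assertEqual(normalize_spaces("a  b "), "a b")

    def test_empty_string(self):
        self.assertEqual(normalize_spaces(""), "")


def run_tests():
    """Load and run the test case, returning the result object."""
    suite = unittest.TestLoader().loadTestsFromTestCase(NormalizeSpacesTest)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_tests()
```

Saved under a hypothetical name such as `test_example.py`, the module can be run with `python test_example.py` after changing into the "test" folder.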

The tests are documented with comments inside the test files. The names used are synthetic and do not represent any real individual.


Authors

  • Nic Dalmasso


Acknowledgments

  • Jared Murray, Jordan Rodu and Robin Mejia for pivotal feedback and discussions

(README style from
