Repository for the ICPE machine learning workgroup
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
cache/mlmodels
docker
learner
tests
.editorconfig
.gitignore
.pylintrc
.style.yapf
README.md
analysis.R
circle.yml
install-packages.R
main.py
requirements.txt
setup.cfg
setup.py
setup.sh
test-requirements.txt
test.sh
todo

README.md

Learning Emotions

CircleCI

This is the repository for the ICPE machine learning workgroup. In this readme we present how one can setup the software and run the analysis.

Intalling

The procedure to run the software is as follows. There exists a setup.sh file, but that's still in development, and following the next steps probably gives a better result

1. Install the dependencies

First the dependencies used by the application need to be installed. Open a terminal, clone the project, and cd to the cloned directory. Make sure you have python 3.6 installed. Then, depending on your preferences, create a virtual environment to save the dependencies in. Note that this is a Python 3.6 project, and we need to use a Python 3.6 virtual environment.

python3.6 -m venv venv
source venv/bin/activate

Your terminal should now show that you are using the venv virtual environment. The final step is to install the dependencies. The easiest way to installing the dependencies is by using pip:

pip install -r requirements.txt

2. Initializing the data and cache

The data used for the present project is provided by NESDA. The easiest method to get the data in the project is by simlinking to the location where the data is stored. In case of Compsy development machines the following lines suffice:

ln -s ~/vault/NESDA/SPSS data

After linking the data, it is important to create a cache directory to store the data and model caches.

mkdir -p cache/mlmodels
mkdir exports

3. Setting up AWS credentials

In the current setup the whole system is based on exchanging data with AWS. ML models are created in the application, dumped to disk, and uploaded to AWS so multiple clients could work on the same project. The package uses a python package to perform these data uploads, and these use the following env variables:

AWS_ACCESS_KEY_ID=CHANGEMEINTHECORRECTTOKEN
AWS_SECRET_ACCESS_KEY=CHANGEMEINTHECORRECTTOKEN

4. Running the software

To test whether everything works, we can now run the application. Because the design of the application is built in such a way that it can potentially be distributed over a number of machines, there are a number of different configurations one could use to start the analysis. The first step is to split the data in a test and training set. The test set is only used for evaluating the algorithm after it was trained on the training set. This training set is internally used as a cross-validation set. Creating the set can be done as follows:

python3.6 main.py -t createset -f -p -n

In this case, -t specifies the part of the application to run, -f specifies the use of feature selection, -p allows the use of polynomial features, and -n removes previous cached files.

The next step is to actually train the algorithms on these created datasets. This can be done using the following command:

python3.6 main.py -t train

What this steps does is run all of the models specified in driver.py and upload the fitted models to S3. After this step has completed, we can retrieve the results from S3 using the following command:

python3.6 main.py -t evaluate

Which exports the output of the project.

Docker

Apart from installing everything locally, one could also run the application in Docker using the following command:

docker run --rm -it \
  -e "AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID"\
  -e "AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY"\
  --name ml frbl/icpe-machine-learning

The docker image can be updated by changing the code, and running the build script.