This repository holds code related to the business communication analysis project.

Components

Pipeline

The pipeline preprocesses raw data and stores it in ArangoDB. Its parts are mostly interchangeable, as long as the data a component depends on is present. Different sources and sinks can also be implemented. This makes it possible to add parts to the pipeline that fix errors already in the database without importing everything again, while keeping that code so it can later run as part of a full import.
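The idea of interchangeable parts with pluggable sources and sinks can be sketched as follows. This is only an illustration under assumed names: `list_source`, `normalize_subject`, `count_words`, `ListSink` and `run_pipeline` are hypothetical and do not come from this repository, and the real code in pipeline/ may be organized differently. The in-memory sink stands in for a database sink.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# All names below are hypothetical; they only illustrate the structure.
Document = Dict[str, object]
Stage = Callable[[Document], Document]


def list_source(records: Iterable[Document]) -> Iterator[Document]:
    """Source: yields documents from an in-memory iterable."""
    for record in records:
        yield dict(record)  # copy so stages do not mutate the caller's data


def normalize_subject(doc: Document) -> Document:
    """Stage: depends on a 'subject' field being present."""
    doc["subject"] = str(doc["subject"]).strip().lower()
    return doc


def count_words(doc: Document) -> Document:
    """Stage: depends on a 'body' field being present."""
    doc["word_count"] = len(str(doc["body"]).split())
    return doc


class ListSink:
    """Sink: collects documents in memory instead of writing to a database."""

    def __init__(self) -> None:
        self.stored: List[Document] = []

    def write(self, doc: Document) -> None:
        self.stored.append(doc)


def run_pipeline(source: Iterable[Document], stages: List[Stage], sink: ListSink) -> None:
    """Pass every document from the source through each stage, then to the sink."""
    for doc in source:
        for stage in stages:
            doc = stage(doc)
        sink.write(doc)


if __name__ == "__main__":
    sink = ListSink()
    raw = [{"subject": "  Re: Meeting ", "body": "See you at noon"}]
    run_pipeline(list_source(raw), [normalize_subject, count_words], sink)
    print(sink.stored)
```

Because each stage only requires the fields it reads, a repair run could use a source that reads existing documents and a short stage list, while a full import would use the complete list.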

Check the options available using python pipeline/main.py --help.

Setup

Get data

There are many versions of the Enron Corpus around:

Download the original set (first in the list) and store it in data/original/.

Requirements

Install and run ArangoDB

yaourt -S arangodb
sudo systemctl start arangodb.service

Install python dependencies

pip install numpy pandas keras python-arango
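A quick way to confirm that the dependencies are importable is a small check like the one below. It is a sketch, not part of the repository: `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the only assumption about the packages is that python-arango is imported under the module name `arango`.

```python
import importlib.util
from typing import Iterable, List

# Import names for the packages listed above; python-arango is imported as "arango".
REQUIRED_MODULES = ["numpy", "pandas", "keras", "arango"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```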
