Econobox

First things first.

Dependencies

To use this repo, install the dependencies listed in requirements.txt.
The easiest way is to use a virtual environment (recommended):

# install pipenv if you don't have it
pip install --user pipenv
pipenv install

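To update the requirements file (recent pipenv releases have dropped the -r option of lock; there, use pipenv requirements > requirements.txt instead):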

pipenv lock -r > requirements.txt

However, if you don't want to set up a virtual environment, you can still install the requirements with pip:

# make sure to run this command in the project directory
pip install -r requirements.txt

Data

To run the code in this repo you need the data.
Again, there are two options:

Place the .zip file containing the dataset inside the data directory and, from a terminal in the project directory, run
```
./data/load_data.sh
```
Alternatively, copy the unzipped folder containing the data directly into the data directory.

Now the structure.

This repo is organised into 4 modules:

|_ Data

Here you can find the code and scripts needed to load the data.

|_ Preprocessing

Here you can find all the code responsible for the preprocessing of the input files. The preprocessing steps include:

tokenize and lemmatize the input files
build a vocabulary
build a co-occurrence matrix
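
The steps listed above can be sketched in a few lines of plain Python. This is only an illustration, not the repository's code: the function names (tokenize, build_vocabulary, build_cooc), the whitespace tokenizer and the choice to count co-occurrences within a whole tweet are all assumptions, and lemmatization is left out because it needs an external library.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def build_vocabulary(tweets, min_count=1):
    """Map each sufficiently frequent token to an integer index."""
    counts = Counter(tok for tweet in tweets for tok in tokenize(tweet))
    kept = sorted(tok for tok, c in counts.items() if c >= min_count)
    return {tok: i for i, tok in enumerate(kept)}


def build_cooc(tweets, vocab):
    """Count how often two vocabulary words appear in the same tweet.

    Returns a sparse dict {(i, j): count} with i < j.
    """
    cooc = Counter()
    for tweet in tweets:
        # Use a set so a word repeated inside one tweet is counted once.
        ids = sorted({vocab[t] for t in tokenize(tweet) if t in vocab})
        for i, j in combinations(ids, 2):
            cooc[(i, j)] += 1
    return cooc


# Made-up example tweets.
tweets = ["good day today", "bad day", "good good vibes"]
vocab = build_vocabulary(tweets)
cooc = build_cooc(tweets, vocab)
```

A dict keyed by index pairs keeps the sketch dependency-free; for a real vocabulary a sparse matrix type would probably be a better fit.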

|_ Embedding

Here you can find all the code responsible for the embedding of the tweets. The embedding can be done in two ways:

learn word embeddings, then choose a way to aggregate the word embeddings into a sentence embedding
learn the sentence embeddings directly
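
For the first option, one simple aggregation is to average the word vectors of a tweet. The sketch below assumes this mean-pooling strategy and a hypothetical average_embedding helper; the repository may aggregate differently.

```python
def average_embedding(tokens, word_vectors, dim):
    """Mean of the vectors of known tokens; zero vector if none are known."""
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(vec[k] for vec in known) / len(known) for k in range(dim)]


# Toy 2-dimensional word vectors, made up for illustration.
word_vectors = {"good": [1.0, 0.0], "day": [0.0, 1.0]}
sentence = average_embedding(["good", "day", "unknown"], word_vectors, dim=2)
```

Tokens missing from the embedding are skipped rather than treated as zeros, so rare words do not pull the average towards the origin.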

|_ Classifier

Here you can find all the code responsible for the classification of the sentiment of tweets. The classifier module is responsible for both training and testing.

The idea behind the module structure

The goal of the refactoring is to give each module a simple (and hence easily extensible) internal structure.

Ideally each module consists of a main, an init, a pipeline and other helper scripts. The main should contain little or no code; the init should define the constants needed across the module; the pipeline should define the functions that call and organize the methods implemented in the helper scripts. The heavy logic of each implementation should live in a helper script.
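
To make the division of roles concrete, here is a minimal single-file sketch. In a real module each commented section would live in its own file; the names MIN_TOKEN_LENGTH, clean_tokens and run_pipeline are hypothetical, and reading "init" as __init__.py is an assumption.

```python
# __init__.py: constants shared across the module (hypothetical name)
MIN_TOKEN_LENGTH = 2


# helper script: the heavy logic lives here
def clean_tokens(text, min_length=MIN_TOKEN_LENGTH):
    """Lowercase, split on whitespace and drop very short tokens."""
    return [t for t in text.lower().split() if len(t) >= min_length]


# pipeline: chooses and orders the helper functions
def run_pipeline(texts):
    """Apply the preprocessing helper to each input text."""
    return [clean_tokens(text) for text in texts]


# main: almost no code, it only calls the pipeline
def main():
    return run_pipeline(["A good day", "So bad"])


if __name__ == "__main__":
    print(main())
```

With this split, adding a new step means writing another helper and adding one call in the pipeline, while the main stays unchanged.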

How to extend the structure?

Add a new preprocessing method: write it in a helper script and make sure the pipeline calls it.
Add a new embedding method: write a new subclass of the EmbeddingBase class and implement its train_embedding method.
Add a new classifier method: write a new subclass of the ClassifierBase class.
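
As an illustration of the embedding case, the sketch below defines a stand-in EmbeddingBase and a toy subclass. Only the names EmbeddingBase and train_embedding come from this README; the constructor arguments, the matrix attribute and the RandomEmbedding class are assumptions, so check the real base class in the embedding module before copying this.

```python
from abc import ABC, abstractmethod
import random


class EmbeddingBase(ABC):
    """Stand-in for the repository's base class; its real interface may differ."""

    def __init__(self, vocabulary, dim):
        self.vocabulary = vocabulary  # token -> index
        self.dim = dim
        self.matrix = None  # filled in by train_embedding

    @abstractmethod
    def train_embedding(self):
        """Populate self.matrix with one vector per vocabulary entry."""


class RandomEmbedding(EmbeddingBase):
    """Toy method: assigns each word a reproducible random vector."""

    def __init__(self, vocabulary, dim, seed=0):
        super().__init__(vocabulary, dim)
        self.seed = seed

    def train_embedding(self):
        rng = random.Random(self.seed)
        self.matrix = [
            [rng.uniform(-1.0, 1.0) for _ in range(self.dim)]
            for _ in self.vocabulary
        ]


embedding = RandomEmbedding({"good": 0, "bad": 1}, dim=3)
embedding.train_embedding()
```

A new classifier would presumably follow the same pattern with ClassifierBase, overriding whichever training and testing methods that class declares.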



GiuliaLanzillotta/Econobox-SA
