Whiskey Business

Setup Instructions

Please run everything as the w205 user unless otherwise stated.

You should already have Hadoop and Hive installed and running.
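A quick way to confirm both tools are reachable is to check that their executables are on the PATH. The sketch below assumes the command names are `hadoop` and `hive`. The helper `require_cmd` is a hypothetical name, and it only verifies that the binaries can be found, not that the services are actually running.

```shell
# Hypothetical helper: succeed only if every named command is on the PATH.
require_cmd() {
    missing=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumes the executables are named hadoop and hive):
#   require_cmd hadoop hive || echo "install or start Hadoop and Hive first"
```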

More specifically, if you're booting a UCB instance, you can use the following commands:

As root (change the device to match where your EBS volume is attached):

mount /dev/xvdf /data
su - w205

As w205 (Optional):


Env setup

If you don't have Anaconda installed already, please install it from:

Create a conda env called "w205-project":

conda env create -f environment.yml

Activate env:

source activate w205-project

Update the env when activated if environment.yml is updated:

conda env update -f environment.yml
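The create and update commands can be folded into one step that picks the right one depending on whether the env already exists. This is a minimal sketch, not part of the project scripts: `env_exists` is a hypothetical helper that reads the output of `conda env list` on stdin, and it assumes the env name appears in the first column of that listing.

```shell
# Hypothetical helper: read `conda env list` output on stdin and succeed
# if an env with the given name appears in the first column.
env_exists() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}

# Usage (assumes environment.yml names the env w205-project):
#   if conda env list | env_exists w205-project; then
#       conda env update -f environment.yml
#   else
#       conda env create -f environment.yml
#   fi
```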

To remove the env:

conda remove --name w205-project --all

Run all

Activate environment:

source activate w205-project

Add Google Docs credentials to: export_data/client_secret.json
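Before running the scripts, it can help to confirm the credentials file is actually in place. The sketch below uses a hypothetical helper, `check_secret`, which only checks that the file exists and is non-empty. It does not validate the contents.

```shell
# Hypothetical helper: succeed if the credentials file exists and is non-empty.
# Defaults to the path given above when no argument is passed.
check_secret() {
    secret_path="${1:-export_data/client_secret.json}"
    if [ ! -s "$secret_path" ]; then
        echo "credentials not found or empty: $secret_path" >&2
        return 1
    fi
}

# Usage, from the repository root:
#   check_secret || exit 1
```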

Run all scripts: ./

Manual Data setup commands

Download data to data source:

python data_get/

Transform data in data source:

python data_get/

Put data into HDFS:

cd loading_and_modelling


Transform data in hive:

cd ../transforming


Pull final table down as CSV with headers (the output path is relative, so run this from the repository root rather than from transforming):

hive -e 'set hive.cli.print.header=true;select * from whiskey_business;' | sed 's/[\t]/,/g' | sed 's/whiskey_business\.//g' > export_data/data/whiskey_business.csv
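To see what the two sed stages do, the sketch below applies the same transformation to a made-up header and row instead of live Hive output. The column names `name` and `rating` are invented for illustration, and `tsv_to_csv` is a hypothetical helper. It uses `tr` for the tab replacement, since `\t` inside a bracket expression is not understood by every sed implementation. Fields that themselves contain commas are not quoted by this approach.

```shell
# Hypothetical helper: convert tab-separated Hive CLI output on stdin to CSV
# and strip the "whiskey_business." prefix that Hive adds to header names.
tsv_to_csv() {
    tr '\t' ',' | sed 's/whiskey_business\.//g'
}

# Sample input with invented column names, standing in for Hive output.
printf 'whiskey_business.name\twhiskey_business.rating\nExample\t4\n' | tsv_to_csv
# prints:
#   name,rating
#   Example,4
```

With the helper defined, the export could be written as `hive -e '...' | tsv_to_csv > export_data/data/whiskey_business.csv`, keeping the same query as above.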

Export data from CSV to Google Sheets:

python export_data/

