This repository has been archived by the owner on Jul 15, 2023. It is now read-only.

MichiganDataScienceTeam/googleanalytics

googleanalytics

Running the code

  1. Clone this repo
git clone git@github.com:MichiganDataScienceTeam/googleanalytics.git

If you don't have an SSH key set up on GitHub, the command above will not work. As a temporary workaround, clone over HTTPS instead:

git clone https://github.com/MichiganDataScienceTeam/googleanalytics.git
  2. Download the data from Google Drive and place it in ./data

  3. Unzip the data files and make sure they are readable

cd data
unzip train.csv.zip
unzip test.csv.zip
unzip sample_submission.csv.zip
chmod +r train.csv test.csv sample_submission.csv
cd ..
  4. Create a virtualenv named env to prevent version conflicts (this will likely solve any package installation issues you have).
sudo pip install virtualenv
python -m virtualenv env
  5. Activate the virtualenv
source env/bin/activate
  6. Install the required packages.
pip install -r requirements.txt
  7. Create the train/validation split.
python split_train_valid.py
  8. Make sure the dataset is in the correct place and run the exploration code. Note: omitting the --debug flag loads the full dataset, which may take a long time on your machine.
python dataset.py --debug
python explore.py --debug
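
If one of the scripts fails early, it can help to confirm that the setup steps above actually took effect. The following is a minimal sketch of such a check and is not part of this repository: the function names `unreadable_files` and `in_virtualenv` are hypothetical, and the file names and the `data` directory are assumed from the unzip step above.

```python
import os
import sys

# File names assumed from the unzip step above; ./data is the directory
# the setup steps ask for.
EXPECTED_FILES = ("train.csv", "test.csv", "sample_submission.csv")


def unreadable_files(data_dir="data", names=EXPECTED_FILES):
    """Return the expected files that are missing or not readable."""
    problems = []
    for name in names:
        path = os.path.join(data_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.R_OK)):
            problems.append(name)
    return problems


def in_virtualenv():
    """Best-effort check that the interpreter runs inside a virtualenv."""
    # Older virtualenv releases set sys.real_prefix; newer ones (and venv)
    # make sys.prefix differ from sys.base_prefix.
    base = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base


if __name__ == "__main__":
    bad = unreadable_files()
    if bad:
        print("Missing or unreadable in ./data:", ", ".join(bad))
    if not in_virtualenv():
        print("No active virtualenv detected; try: source env/bin/activate")
    if not bad and in_virtualenv():
        print("Setup looks ready.")
```

Saved under any name in the repository root and run with python, it only reports problems and changes nothing on disk.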

Contributing

  1. Create an account on GitHub and add an SSH key to your account
  2. Ask @stroud on Slack to join the MDST Organization
  3. Assign yourself to an issue
  4. Create a branch and write your code
  5. Submit a pull request when you are done!