Introductions + kaggle data download Fall 2020 #68

Open
chendaniely opened this issue Aug 31, 2020 · 0 comments

chendaniely commented Aug 31, 2020

Hi all:

As you join the team and get acquainted with the project (and learn Git), please work through the two parts below.

Part 1: Introduce yourself to the repository

Please do the following to add your name to the "Teams" section of the README.md.

Follow the non-maintainer steps in: https://chendaniely.github.io/training_ds_r/help-faq.html#general-workflow

Once you push, your name should appear under the "Teams" section on the repository page.
This task also checks your understanding of Git and confirms that the settings on this repository are correct, so please let me know if you run into issues.
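If the Git steps are new to you, here is a minimal sketch of the clone, edit, add, commit, push loop. It is not a copy of the linked workflow. It assumes a throwaway local bare repository in place of GitHub so that it runs offline, and the name, email, and README contents are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of: clone -> edit README.md -> add -> commit -> push.
# ASSUMPTION: a local bare repo stands in for the GitHub remote; for the real
# project you would clone the GitHub URL and follow the linked steps instead.
set -euo pipefail

work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"              # stand-in for GitHub
git clone --quiet "$work/remote.git" "$work/project" 2>/dev/null
cd "$work/project"
git config user.name "Jane Doe"                         # hypothetical identity
git config user.email "jane@example.com"

# Add your name under the "Teams" heading (illustrative file contents)
printf '# Project\n\n## Teams\n\n- Jane Doe\n' > README.md
git add README.md
git commit --quiet -m "Add Jane Doe to Teams"
git push --quiet origin HEAD                            # publish to the "remote"
```

With the real repository, only the clone URL and your own edit to README.md change; the add, commit, and push commands stay the same.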

If you are having issues

Depending on when you cloned the repository, your push may be blocked for one of two reasons:

  1. Permissions (403 error): let me know your GitHub username so I can add you to this repository as a maintainer.
  2. The remote has changes you don't have locally: if you read the error message to the end, it's telling you to pull first and then push again.
  • You may run into a merge conflict here, depending on which lines were changed. Just let me know if you end up with problems.
  • The tl;dr: open README.md, remove the conflict markers (<<<<<<<, =======, and >>>>>>>), and clean up the file until you're happy with it. Then add, commit, and push again.
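For the merge-conflict case, a small check can confirm that the markers are really gone before you add and commit. The helper name has_conflict_markers is made up for illustration, and the commented commands show one typical recovery sequence, assuming a merge-based pull.

```bash
# has_conflict_markers FILE: exit 0 if FILE still contains Git conflict markers.
# Only the <<<<<<< and >>>>>>> lines are matched; the ======= separator is
# skipped because a Markdown heading underline can look identical.
has_conflict_markers() {
  grep -qE '^(<<<<<<<|>>>>>>>)( |$)' "$1"
}

# Typical recovery after a rejected push (comments only, not run here):
#   git pull --no-rebase          # merge the remote's changes; may stop with a conflict
#   (edit README.md by hand)
#   has_conflict_markers README.md || { git add README.md && git commit && git push; }
```

Each conflict hunk has both an opening and a closing marker, so checking those two lines is enough to catch an unresolved hunk.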

Part 2: Download the Kaggle dataset

Tasks:

  1. Make sure you have Python installed (Anaconda or Miniconda is preferred; otherwise you'll have to manage your own virtual environment). Then configure conda to use conda-forge:
     ```bash
     # run this in your terminal (Anaconda Prompt on Windows)
     conda config --add channels conda-forge
     conda config --set channel_priority strict
     ```
  2. Pull down the new updates from master. What you see on your computer should match what's displayed on GitHub.
  3. Install or update the conda environment by navigating to this project's folder in the terminal and running conda env create -f environment.yml (if the environment already exists, run conda env update -f environment.yml --prune instead)
    • The environment.yml includes make, so it should install make for you now
  4. Activate the environment with conda activate db_covid19
  5. Get your Kaggle API credentials onto your computer; directions are here: https://github.com/Kaggle/kaggle-api#api-credentials
  6. Make the data with make data_kaggle. It will download and unzip the Kaggle dataset; note that it is about 4.2GB after extraction.
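Before running make data_kaggle, it can help to confirm that the credentials from the Kaggle API step above are where the kaggle tool looks for them. Below is a hypothetical helper for macOS/Linux shells (it is not part of this project or of the kaggle package). It assumes the locations described in the kaggle-api README: ~/.kaggle/kaggle.json, a directory override via KAGGLE_CONFIG_DIR, or the KAGGLE_USERNAME/KAGGLE_KEY environment variables.

```bash
# kaggle_creds_ok: exit 0 if Kaggle API credentials look usable (macOS/Linux).
# Hypothetical helper; locations follow the kaggle-api README.
kaggle_creds_ok() {
  # Credentials may come from environment variables instead of a file
  if [ -n "${KAGGLE_USERNAME:-}" ] && [ -n "${KAGGLE_KEY:-}" ]; then
    return 0
  fi

  # Default file location; KAGGLE_CONFIG_DIR overrides the directory
  local cfg="${KAGGLE_CONFIG_DIR:-$HOME/.kaggle}/kaggle.json"
  if [ ! -f "$cfg" ]; then
    echo "missing $cfg; see the kaggle-api README for how to create it" >&2
    return 1
  fi

  # Permission bits in octal (e.g. 600); GNU stat uses -c, BSD/macOS stat uses -f
  local mode
  mode=$(stat -c '%a' "$cfg" 2>/dev/null || stat -f '%Lp' "$cfg")
  case "$mode" in
    *00) return 0 ;;   # group and others have no access
    *) echo "other users can read $cfg; run: chmod 600 $cfg" >&2; return 1 ;;
  esac
}
```

Usage: kaggle_creds_ok && make data_kaggle. On Windows the file lives under your user folder instead, so this check does not apply as written.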

You should now have all the Kaggle files in the data/db/original/kaggle folder.
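A quick way to check that folder is a sketch like the one below. check_kaggle_data is a hypothetical helper, not a target in this project's Makefile. It fails if the folder is missing or empty and otherwise prints its disk usage, so you can compare it with the roughly 4.2GB expected after extraction.

```bash
# check_kaggle_data [DIR]: exit 0 if DIR exists and is not empty, then print
# its disk usage. Hypothetical helper; the default path is the one named above.
check_kaggle_data() {
  local dir="${1:-data/db/original/kaggle}"
  if [ ! -d "$dir" ] || [ -z "$(ls -A "$dir")" ]; then
    echo "no files in $dir; try re-running: make data_kaggle" >&2
    return 1
  fi
  du -sh "$dir"   # human-readable total size of the folder
}
```

Run it from the project root with no arguments to check the default folder.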

Optional, but will make your life easier later

If you install make outside the db_covid19 environment, you can do all of the setup steps above by running make setup_env followed by make data_kaggle.
make setup_env deletes your db_covid19 environment and reinstalls all the packages in environment.yml from scratch. To install make outside that environment:

  1. Go back to your base environment: conda activate base
  2. Install make there: conda install make
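Because make setup_env deletes db_covid19, it should be launched from base with make installed there. Here is a hypothetical wrapper that checks both conditions before calling the two targets named above. The function name is made up; it relies on conda setting CONDA_DEFAULT_ENV to the name of the active environment.

```bash
# setup_from_base: hypothetical wrapper around make setup_env + make data_kaggle.
setup_from_base() {
  # conda sets CONDA_DEFAULT_ENV to the active environment's name
  if [ "${CONDA_DEFAULT_ENV:-}" = "db_covid19" ]; then
    echo "run 'conda activate base' first; setup_env deletes db_covid19" >&2
    return 1
  fi
  # make must be reachable from the current (non-project) environment
  if ! command -v make >/dev/null 2>&1; then
    echo "make not found; run 'conda install make' in base" >&2
    return 1
  fi
  make setup_env && make data_kaggle
}
```

Run it from the project root after conda activate base.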

Copy of #1 #3
