Download the kaggle dataset #3

Closed
6 of 7 tasks
chendaniely opened this issue Mar 31, 2020 · 3 comments
Labels
setup Tasks to get things started

Comments

@chendaniely
Member

chendaniely commented Mar 31, 2020

PR #2 updates the README.md file with the links @datajonbrig sent out in the project description (via sharepoint).

The PR also sets up the Makefile target to download and unzip the kaggle dataset into the data/db/original/kaggle folder.

This issue is for everyone to set up their Kaggle API keys and get their environments set up.

Tasks:

  1. Make sure you have Python installed (Anaconda or Miniconda is preferred; otherwise you'll have to manage your own virtual environment)
# run this in your terminal (Anaconda command prompt on Windows)
conda config --add channels conda-forge
conda config --set channel_priority strict
  2. Ensure make is installed on your computer
    • If you are using conda, you can install it with conda install make once you have conda-forge set up (see the commands above)
  3. Pull down the new updates from master. What you see on your computer should match what's displayed on GitHub
  4. Install/update the conda environment by going to this project in the terminal and running: conda env create -f environment.yml
    • The environment.yml has make in it, so it should install make for you now
  5. Enable the environment with conda activate db_covid19
  6. Get your Kaggle API information on your computer; directions here: https://github.com/Kaggle/kaggle-api#api-credentials
  7. Make the data with make data_kaggle (previously make data). It will download and unzip the Kaggle dataset; note that it's about 4.2GB after extraction
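
Before building the data, it can help to confirm that the tools and credentials from the tasks above are in place. The following is a minimal preflight sketch, not part of this repository: the function names are hypothetical, and it assumes the Kaggle API reads kaggle.json from ~/.kaggle unless the KAGGLE_CONFIG_DIR environment variable points elsewhere. It does not cover credentials supplied through environment variables instead of a file.

```python
import os
import shutil
import stat
from pathlib import Path


def missing_tools(tools=("conda", "make")):
    """Return the names of required command-line tools not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def kaggle_credentials_path():
    """Return the expected kaggle.json location.

    Assumption: KAGGLE_CONFIG_DIR overrides the default ~/.kaggle folder.
    """
    config_dir = os.environ.get("KAGGLE_CONFIG_DIR")
    base = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    return base / "kaggle.json"


def credentials_problems(path):
    """Return a list of human-readable problems with the credentials file."""
    path = Path(path)
    if not path.is_file():
        return [f"{path} not found"]
    problems = []
    # On POSIX systems, flag a key file that group/other users can access.
    if os.name == "posix" and path.stat().st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        problems.append(f"{path} is accessible by other users; consider chmod 600")
    return problems


if __name__ == "__main__":
    for tool in missing_tools():
        print(f"missing tool: {tool}")
    for problem in credentials_problems(kaggle_credentials_path()):
        print(problem)
```

If the script prints nothing, conda and make are on your PATH and a credentials file was found where this sketch expects it.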

You should have all the kaggle files in the data/db/original/kaggle folder.
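
To confirm the download worked, a rough check is to count the files in that folder and total their size, which should land in the neighborhood of the 4.2GB mentioned above. This is a sketch under the assumption that you run it from the project root; summarize_folder is a hypothetical helper, not something the Makefile provides.

```python
from pathlib import Path

KAGGLE_DIR = Path("data/db/original/kaggle")  # folder named in this issue


def summarize_folder(folder):
    """Return (file_count, total_bytes) for all files under folder, recursively."""
    files = [p for p in Path(folder).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)


if __name__ == "__main__":
    count, size = summarize_folder(KAGGLE_DIR)
    print(f"{count} files, {size / 1e9:.1f} GB in {KAGGLE_DIR}")
```

A result of 0 files most likely means the make target has not run yet or you are not in the project root.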

Optional, but will make your life easier later

If you get make installed outside the db_covid19 environment, you can do all the setup steps listed above by running make setup_env followed by make data_kaggle (previously make data).
Note that this will delete your db_covid19 environment and re-install all the packages in environment.yml from scratch.

  1. Go back to your base environment: conda activate base
  2. Install make there: conda install make

Please check off your name when you've completed the tasks:

@chendaniely
Member Author

If you don't have make installed, you can also run these commands in your environment:

conda config --add channels conda-forge
conda config --set channel_priority strict 

@chendaniely
Member Author

#15 changes the make data target to make data_kgl_text

@chendaniely
Member Author

@benrayden37 don't close the issue. You just needed to check off your name.


8 participants