When will the data used in the book be available? #2

gary-mu opened this issue May 27, 2017 · 5 comments

Comments

@gary-mu

gary-mu commented May 27, 2017

The book uses a couple of datasets for coding examples. When will they become available here?

@trejas

trejas commented Jun 11, 2017

Any word here?

@jpcartailler

Yes, having access to the data is rather important.

@andrewgbruce
Owner

The scripts for the book have been uploaded, organized by chapter.
To download the data, run the script download_data.r, which pulls the data from Google Drive.

The scripts all assume that you have cloned the repository into your home directory (~).
If you saved the repository elsewhere, you will need to edit the line

PSDS_PATH <- file.path('~', 'statistics-for-data-scientists')

to point to the appropriate directory in all of the scripts.
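
As a rough illustration of that edit, the sketch below sets PSDS_PATH to a different location and checks that the directory exists before any chapter script runs. The `projects` directory and the `psds_path_ok` helper are hypothetical and not part of the repository.

```r
# Example location only; 'projects' is a hypothetical directory.
# Replace it with wherever you cloned the repository.
PSDS_PATH <- file.path('~', 'projects', 'statistics-for-data-scientists')

# Hypothetical helper (not part of the repository): TRUE if the path
# exists and is a directory, so a wrong path is caught early.
psds_path_ok <- function(path) {
  dir.exists(path.expand(path))
}

if (!psds_path_ok(PSDS_PATH)) {
  message('No repository found at ', path.expand(PSDS_PATH),
          '; update PSDS_PATH to match your clone location.')
}
```

Running this check once at the top of a script should make a wrong path obvious before any file reads fail.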

@singing-scientist

The download_data.r script assumes that a data directory already exists inside the repository, i.e. ~/statistics-for-data-scientists/data/

Create it manually, or add a line to the script that creates it, and the download will work.
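
A minimal sketch of creating that directory from R, assuming the default PSDS_PATH quoted earlier in the thread. The `ensure_data_dir` helper is hypothetical and not part of download_data.r.

```r
PSDS_PATH <- file.path('~', 'statistics-for-data-scientists')

# Hypothetical helper: create <root>/data if it is missing and return its path.
ensure_data_dir <- function(root) {
  data_dir <- file.path(path.expand(root), 'data')
  if (!dir.exists(data_dir)) {
    # recursive = TRUE also creates missing parent directories
    dir.create(data_dir, recursive = TRUE)
  }
  data_dir
}
```

Calling `ensure_data_dir(PSDS_PATH)` once before sourcing download_data.r should be enough. Calling it again is harmless because an existing directory is left untouched.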

@ventean

ventean commented Oct 18, 2017

Tip: use Dropbox if you download manually. The Google Drive folder is missing a few files.

I got an error with download_data.r, so I went the manual route.

source("~/statistics-for-data-scientists/src/download_data.r")
Error in file(con, "wb") : cannot open the connection

$ diff dropbox.files googledocs.files
0a1
> 762609057_112015_3429_airline_delay_causes.csv
12a14
> loanStats.csv
15,16c17,18
< sp500_px.csv
< sp500_sym.csv
---
> sp500_data.csv
> sp500_sectors.csv
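
The same comparison can be sketched in R with `setdiff`. The `compare_listings` helper is hypothetical, and the vectors hold only the file names that differ in the diff above, not the full listings.

```r
# Hypothetical helper: report which names appear in only one of two listings.
compare_listings <- function(a, b) {
  list(only_in_a = setdiff(a, b), only_in_b = setdiff(b, a))
}

# File names taken from the diff above; full listings would normally come
# from readLines('dropbox.files') and readLines('googledocs.files').
dropbox_files <- c('sp500_px.csv', 'sp500_sym.csv')
googledocs_files <- c('762609057_112015_3429_airline_delay_causes.csv',
                      'loanStats.csv', 'sp500_data.csv', 'sp500_sectors.csv')

result <- compare_listings(dropbox_files, googledocs_files)
print(result)
```

Unlike `diff`, this ignores ordering and reports only set membership, which is usually what matters when checking whether a download folder is complete.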
