Commit
0 parents
commit aea3bc4
Showing 148 changed files with 225,635 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
*.lock.db | ||
.DS_Store | ||
*~ | ||
.#* | ||
#*# | ||
.RHistory | ||
.Rhistory | ||
c:\\sw\\text.txt | ||
*.temp.xml | ||
temp.xml | ||
.Rdata | ||
.RData | ||
*_external_links.xml | ||
generated | ||
|
||
|
||
|
||
# History files | ||
.Rhistory | ||
.Rapp.history | ||
|
||
# Session Data files | ||
.RData | ||
|
||
# Example code in package build process | ||
*-Ex.R | ||
|
||
# Output files from R CMD build | ||
/*.tar.gz | ||
|
||
# Output files from R CMD check | ||
/*.Rcheck/ | ||
|
||
# RStudio files | ||
.Rproj.user/ | ||
|
||
# produced vignettes | ||
vignettes/*.html | ||
vignettes/*.pdf | ||
|
||
# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3 | ||
.httr-oauth | ||
|
||
# knitr and R markdown default cache directories | ||
/*_cache/ | ||
/cache/ | ||
|
||
# Temporary files created by R markdown | ||
*.utf8.md | ||
*.knit.md | ||
.Rproj.user |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.</h1> | ||
<ul> | ||
<li>The book: <a href="http://www.manning.com/zumel/">"Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li> | ||
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li> | ||
</ul> | ||
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2> | ||
<ul> | ||
<li>Chapter 8: Using Unsupervised Methods</li> | ||
</ul> | ||
<h2 id="original-data">Original data:</h2> | ||
<p>Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg original link http://www.informatik.uni-freiburg.de/~cziegler/BX/</p> | ||
<p>Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.</p> | ||
<p>Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):</p> | ||
<p>Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.</p> | ||
<p>http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf</p> | ||
<h2 id="derived-works-no-claim-of-license-on-these">Derived works (no claim of license on these):</h2> | ||
<ul> | ||
<li>bxBooks.RData : R-binary version of Book-Crossing dataset.</li> | ||
<li>bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating</li> | ||
</ul> | ||
<h2 id="our-additional-documentation-notes-code-and-example-data">Our additional documentation, notes, code, and example data:</h2> | ||
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p> | ||
<ul> | ||
<li>read_bookcrossing.R : script to read in original data files and create bxBooks.RData</li> | ||
<li>create_bookdata.R : script to create the data file bookdata.tsv</li> | ||
</ul> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,49 @@ | ||
|
||
# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014. | ||
|
||
|
||
* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved) | ||
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR) | ||
|
||
|
||
## The code and data in this directory support examples from: | ||
* Chapter 8: Using Unsupervised Methods | ||
|
||
|
||
## Original data: | ||
Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg | ||
original link http://www.informatik.uni-freiburg.de/~cziegler/BX/ | ||
|
||
Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September | ||
2004) from the Book-Crossing community with kind permission from Ron | ||
Hornbaker, CTO of Humankind Systems. Contains 278,858 users | ||
(anonymized but with demographic information) providing 1,149,780 | ||
ratings (explicit / implicit) about 271,379 books. | ||
|
||
Freely available for research use when acknowledged with the | ||
following reference (further details on the dataset are given in this | ||
publication): | ||
|
||
Improving Recommendation Lists Through Topic | ||
Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph | ||
A. Konstan, Georg Lausen; Proceedings of the 14th International World | ||
Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To | ||
appear. | ||
|
||
http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf | ||
|
||
|
||
## Derived works (no claim of license on these): | ||
|
||
* bxBooks.RData : R-binary version of Book-Crossing dataset. | ||
* bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating | ||
|
||
## Our additional documentation, notes, code, and example data: | ||
|
||
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. | ||
|
||
* read_bookcrossing.R : script to read in original data files and create bxBooks.RData | ||
* create_bookdata.R : script to create the data file bookdata.tsv | ||
|
||
|
||
|
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
load("bxBooks.RData") | ||
colnames(bxBooks) <- gsub(".", "_", colnames(bxBooks), fixed=TRUE) | ||
colnames(bxBookRatings) <- gsub(".", "_", colnames(bxBookRatings), fixed=TRUE) | ||
colnames(bxUsers) <- gsub(".", "_", colnames(bxUsers), fixed=TRUE) | ||
|
||
Sys.setlocale('LC_ALL','C') # to deal with the non-US characters | ||
# Remove parentheticals, which are usually at the end of the title. | ||
# First mark every open paren with '#'; a paren that starts the title is restored below. | ||
booktokens <- gsub("(", "#", bxBooks$Book_Title, fixed=TRUE) | ||
booktokens <- gsub("^#", "(", booktokens) | ||
booktokens <- gsub("#.*$", "", booktokens) # leaves a trailing white space | ||
cleantitles <- sub("[[:space:]]+$","",booktokens) # save these | ||
|
||
booktokens <- tolower(cleantitles) | ||
Books <- data.frame(ISBN=bxBooks$ISBN, token=booktokens, title=cleantitles) | ||
|
||
library(sqldf) | ||
# pick one ISBN (the smallest) for every token; bookmap has one row per unique token | ||
bookmap <- sqldf('SELECT min(ISBN) as misbn, | ||
token | ||
FROM Books | ||
GROUP BY token') | ||
|
||
# displaymap has a title for every unique token | ||
displaymap <- sqldf('SELECT Books.title as title, | ||
bookmap.token as token | ||
FROM Books, | ||
bookmap | ||
WHERE Books.ISBN=bookmap.misbn') | ||
|
||
# bookdata1 is shorter than bxBookRatings because | ||
# some of the rated books are not in the bxBooks data | ||
bookdata1 <- sqldf('SELECT ratings.User_ID as userid, | ||
Books.token as token, | ||
ratings.Book_Rating as rating | ||
FROM Books, | ||
bxBookRatings as ratings | ||
WHERE ratings.ISBN=Books.ISBN') | ||
|
||
# add the displayname | ||
bookdata <- merge(bookdata1, displaymap, by="token") | ||
|
||
write.table(bookdata, file="bookdata.tsv", | ||
            sep="\t", row.names=FALSE, col.names=TRUE) |
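The pipeline in this script can be sketched on toy data using base R only. This is a hedged illustration, not the authors' code: the data frames `books` and `ratings`, their values, and the helper `clean_title()` are all invented for the example, and `tapply()`/`merge()` stand in for the `sqldf` queries. One caveat follows directly from the script's logic: because `#` is used as the marker for an open paren, a title that already contains a literal `#` would be truncated at that character.

```r
# Toy stand-ins for bxBooks and bxBookRatings; every value here is invented.
books <- data.frame(
  ISBN = c("0003", "0001", "0002", "0004"),
  Book_Title = c("Dune (Dune Chronicles, Book 1)", "Dune", "DUNE ",
                 "(Untitled) Poems"),
  stringsAsFactors = FALSE
)
ratings <- data.frame(
  User_ID = c(1, 1, 2, 3),
  ISBN = c("0003", "0004", "0002", "9999"),  # "9999" does not appear in books
  Book_Rating = c(8, 5, 10, 7),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the script's gsub()/sub() steps.
clean_title <- function(title) {
  tok <- gsub("(", "#", title, fixed = TRUE)  # mark every open paren
  tok <- gsub("^#", "(", tok)                 # restore a paren that starts the title
  tok <- gsub("#.*$", "", tok)                # drop text from the first marked paren on
  sub("[[:space:]]+$", "", tok)               # trim trailing whitespace
}

books$title <- clean_title(books$Book_Title)
books$token <- tolower(books$title)

# One representative ISBN per token (the smallest), standing in for
# SELECT min(ISBN) ... GROUP BY token.
misbn <- tapply(books$ISBN, books$token, min)
bookmap <- data.frame(token = names(misbn), misbn = as.vector(misbn),
                      stringsAsFactors = FALSE)

# Display title for each token: the title of the representative ISBN.
displaymap <- merge(bookmap, books[, c("ISBN", "title")],
                    by.x = "misbn", by.y = "ISBN")[, c("title", "token")]

# Inner join: ratings whose ISBN is missing from books drop out.
joined <- merge(ratings, books[, c("ISBN", "token")], by = "ISBN")
bookdata1 <- data.frame(userid = joined$User_ID, token = joined$token,
                        rating = joined$Book_Rating, stringsAsFactors = FALSE)

# Attach the display title to every rating.
bookdata <- merge(bookdata1, displaymap, by = "token")
print(bookdata)
```

In this sketch the three "Dune" variants collapse to a single token, and the rating for the unknown ISBN is dropped by the inner join, which is the same reason the script's comment gives for `bookdata1` being shorter than `bxBookRatings`.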
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
|
||
# before reading BX-Users.csv: replace each \" in the file with ' | ||
bxUsers <- read.table('BX-Users.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE) | ||
# before reading BX-Book-Ratings.csv: replace each \" in the file with a blank | ||
bxBookRatings <- read.table('BX-Book-Ratings.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE) | ||
# before reading BX-Books.csv: replace each \" in the file with ' | ||
bxBooks <- read.table('BX-Books.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE) |
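The README describes this script as creating bxBooks.RData, and create_bookdata.R loads `bxBooks`, `bxBookRatings`, and `bxUsers` from that file, yet the lines shown here contain no `save()` call. The following is a minimal sketch of the read-then-save round trip on a toy file. The file contents, the temporary paths, and the `save()` step itself are assumptions made for illustration, not the authors' code.

```r
# Toy semicolon-separated file; the rows are invented and only mimic the
# layout assumed for BX-Users.csv.
csv_path <- file.path(tempdir(), "BX-Users-toy.csv")
writeLines(c(
  '"User-ID";"Location";"Age"',
  '"1";"nyc, new york, usa";NULL',
  '"2";"stockton, california, usa";"18"'
), csv_path)

# Same read.table() arguments as the script.
bxUsers <- read.table(csv_path, header = TRUE, sep = ";",
                      comment.char = "", stringsAsFactors = FALSE)

# check.names = TRUE (the default) turns "User-ID" into "User.ID".
print(colnames(bxUsers))

# Assumed final step: persist the data frame so a later script can load() it.
rdata_path <- file.path(tempdir(), "bxToy.RData")
save(bxUsers, file = rdata_path)

# Load into a fresh environment to confirm the object round-trips.
restored <- new.env()
load(rdata_path, envir = restored)
```

The dotted column name produced by `read.table()` is what the `gsub(".", "_", ..., fixed=TRUE)` lines at the top of create_bookdata.R rewrite, so `User.ID` becomes the `User_ID` referenced in its SQL.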
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
buzz.aux | ||
buzz.log | ||
buzz.out | ||
cache |
Binary file not shown.
Binary file not shown.
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.</h1> | ||
<ul> | ||
<li>The book: <a href="http://www.manning.com/zumel/">"Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li> | ||
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li> | ||
</ul> | ||
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2> | ||
<ul> | ||
<li>Chapter 10: Documentation and Deployment</li> | ||
<li>Chapter 11: Producing Effective Presentations</li> | ||
</ul> | ||
<h2 id="original-data">Original data:</h2> | ||
<p>10-13-2013 Data from: http://ama.liglab.fr/datasets/buzz/ Using: http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data</p> | ||
<p>(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names )</p> | ||
<p>Crypto hashes: $ shasum TomsHardware-*.txt 5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt 86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt</p> | ||
<p>R objects produced by commands in rsteps.R saved in thRS500.Rdata</p> | ||
<p>11-6-2013</p> | ||
<p>Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation.</p> | ||
<h2 id="license-for-additional-documentation-notes-code-and-example-data">License for additional documentation, notes, code, and example data:</h2> | ||
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p> | ||
<p>No guarantee, indemnification or claim of fitness is made regarding any of these items.</p> | ||
<p>No claim of license on works of others or derived data.</p> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
|
||
# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014. | ||
|
||
|
||
* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved) | ||
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR) | ||
|
||
|
||
## The code and data in this directory support examples from: | ||
* Chapter 10: Documentation and Deployment | ||
* Chapter 11: Producing Effective Presentations | ||
|
||
|
||
## Original data: | ||
|
||
|
||
10-13-2013 | ||
Data from: http://ama.liglab.fr/datasets/buzz/ | ||
Using: | ||
http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data | ||
|
||
(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names ) | ||
|
||
Crypto hashes: | ||
$ shasum TomsHardware-*.txt | ||
5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt | ||
86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt | ||
c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt | ||
ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt | ||
|
||
|
||
R objects produced by commands in rsteps.R saved in thRS500.Rdata | ||
|
||
|
||
11-6-2013 | ||
|
||
Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation. | ||
|
||
|
||
|
||
## License for additional documentation, notes, code, and example data: | ||
|
||
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. | ||
|
||
No guarantee, indemnification or claim of fitness is made regarding any of these items. | ||
|
||
No claim of license on works of others or derived data. |