Newer examples
JohnMount committed Jul 18, 2018
0 parents commit aea3bc4
Showing 148 changed files with 225,635 additions and 0 deletions.
51 changes: 51 additions & 0 deletions .gitignore
@@ -0,0 +1,51 @@
*.lock.db
.DS_Store
*~
.#*
#*#
.RHistory
.Rhistory
c:\\sw\\text.txt
*.temp.xml
temp.xml
.Rdata
.RData
*_external_links.xml
generated



# History files
.Rhistory
.Rapp.history

# Session Data files
.RData

# Example code in package build process
*-Ex.R

# Output files from R CMD build
/*.tar.gz

# Output files from R CMD check
/*.Rcheck/

# RStudio files
.Rproj.user/

# produced vignettes
vignettes/*.html
vignettes/*.pdf

# OAuth2 token, see https://github.com/hadley/httr/releases/tag/v0.3
.httr-oauth

# knitr and R markdown default cache directories
/*_cache/
/cache/

# Temporary files created by R markdown
*.utf8.md
*.knit.md
.Rproj.user
26 changes: 26 additions & 0 deletions Bookdata/README.html
@@ -0,0 +1,26 @@
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for &quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014.</h1>
<ul>
<li>The book: <a href="http://www.manning.com/zumel/">&quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li>
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li>
</ul>
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2>
<ul>
<li>Chapter 8: Using Unsupervised Methods</li>
</ul>
<h2 id="original-data">Original data:</h2>
<p>Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg. Original link: http://www.informatik.uni-freiburg.de/~cziegler/BX/</p>
<p>Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.</p>
<p>Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):</p>
<p>Improving Recommendation Lists Through Topic Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.</p>
<p>http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf</p>
<h2 id="derived-works-no-claim-of-license-on-these">Derived works (no claim of license on these):</h2>
<ul>
<li>bxBooks.RData : R-binary version of Book-Crossing dataset.</li>
<li>bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating</li>
</ul>
<h2 id="our-additional-documentation-notes-code-and-example-data">Our additional documentation, notes, code, and example data:</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p>
<ul>
<li>read_bookcrossing.R : script to read in original data files and create bxBooks.RData</li>
<li>create_bookdata.R : script to create the data file bookdata.tsv</li>
</ul>
49 changes: 49 additions & 0 deletions Bookdata/README.md
@@ -0,0 +1,49 @@

# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.


* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved)
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR)


## The code and data in this directory support examples from:
* Chapter 8: Using Unsupervised Methods


## Original data:
Book-Crossing dataset mined by Cai-Nicolas Ziegler, DBIS Freiburg.
Original link: http://www.informatik.uni-freiburg.de/~cziegler/BX/

Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September
2004) from the Book-Crossing community with kind permission from Ron
Hornbaker, CTO of Humankind Systems. Contains 278,858 users
(anonymized but with demographic information) providing 1,149,780
ratings (explicit / implicit) about 271,379 books.

Freely available for research use when acknowledged with the
following reference (further details on the dataset are given in this
publication):

Improving Recommendation Lists Through Topic
Diversification, Cai-Nicolas Ziegler, Sean M. McNee, Joseph
A. Konstan, Georg Lausen; Proceedings of the 14th International World
Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To
appear.

http://www.informatik.uni-freiburg.de/~cziegler/BX/WWW-2005-Preprint.pdf


## Derived works (no claim of license on these):

* bxBooks.RData : R-binary version of Book-Crossing dataset.
* bookdata.tsv.gz : gzipped tab-separated file containing customer book ratings by title and numerical rating
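
A minimal sketch of reading a file shaped like bookdata.tsv.gz back into R. It is not part of the original scripts: the toy rows and the temporary path are made up for illustration, and the column names (token, userid, rating, title) are assumed from what create_bookdata.R appears to write.

```r
# Build a tiny stand-in file so the sketch runs without the real data.
# The column names are an assumption based on create_bookdata.R.
toy <- data.frame(
  token  = c("dune", "emma"),
  userid = c(1L, 2L),
  rating = c(9L, 0L),
  title  = c("Dune", "Emma"),
  stringsAsFactors = FALSE
)
path <- file.path(tempdir(), "bookdata_toy.tsv.gz")  # hypothetical path
con <- gzfile(path, "w")
write.table(toy, file = con, sep = "\t", row.names = FALSE, col.names = TRUE)
close(con)

# Read the gzipped tab-separated file back.
# quote is limited to the double quote so apostrophes in titles are kept,
# and comment.char is disabled so a "#" in a title does not truncate the line.
bookdata <- read.table(gzfile(path), header = TRUE, sep = "\t",
                       quote = "\"", comment.char = "",
                       stringsAsFactors = FALSE)
str(bookdata)
```

For the real file, replacing `path` with "bookdata.tsv.gz" should be enough, provided the file was written with the same separator and quoting.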

## Our additional documentation, notes, code, and example data:

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

* read_bookcrossing.R : script to read in original data files and create bxBooks.RData
* create_bookdata.R : script to create the data file bookdata.tsv
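
The title clean-up that create_bookdata.R performs can be sketched on a few toy titles. The titles and ISBNs below are invented, and the last step uses base R (`order` plus `duplicated`) as a stand-in for the script's sqldf queries, so treat it as an illustration rather than the script's exact behavior.

```r
# Invented example titles; the real ones come from bxBooks$Book_Title.
titles <- c("Dune (Dune Chronicles, Book 1)", "DUNE",
            "(Untitled) Poems", "Emma  ")

# Swap every "(" for the placeholder "#", restore a "(" that starts the
# title, then cut from the first remaining placeholder to the end.
tok <- gsub("(", "#", titles, fixed = TRUE)
tok <- gsub("^#", "(", tok)
tok <- gsub("#.*$", "", tok)

clean <- sub("[[:space:]]+$", "", tok)  # trim trailing whitespace
token <- tolower(clean)                 # case-insensitive matching key

# One display title per token: keep the row with the smallest ISBN.
# (Base-R stand-in for the min(ISBN) ... GROUP BY token query.)
books <- data.frame(ISBN = c("0002", "0001", "0003", "0004"),
                    token = token, title = clean,
                    stringsAsFactors = FALSE)
books <- books[order(books$ISBN), ]
displaymap <- books[!duplicated(books$token), c("title", "token")]
print(displaymap)
```

One consequence of the placeholder approach is that a title already containing "#" is also cut at that character.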



Binary file added Bookdata/bookdata.tsv.gz
Binary file not shown.
Binary file added Bookdata/bxBooks.RData
Binary file not shown.
44 changes: 44 additions & 0 deletions Bookdata/create_bookdata.R
@@ -0,0 +1,44 @@
load("bxBooks.RData")
colnames(bxBooks) <- gsub(".", "_", colnames(bxBooks), fixed=TRUE)
colnames(bxBookRatings) <- gsub(".", "_", colnames(bxBookRatings), fixed=TRUE)
colnames(bxUsers) <- gsub(".", "_", colnames(bxUsers), fixed=TRUE)

Sys.setlocale('LC_ALL','C') # to deal with the non-US characters
# Remove parentheticals, which are usually at the end of the title.
# First swap each open paren for "#" (a title already containing "#" is cut there too).
booktokens <- gsub("(", "#", bxBooks$Book_Title, fixed=TRUE)
booktokens <- gsub("^#", "(", booktokens) # restore a paren that starts the title
booktokens <- gsub("#.*$", "", booktokens) # drop the rest; leaves trailing whitespace
cleantitles <- sub("[[:space:]]+$","",booktokens) # trimmed display titles; save these

booktokens <- tolower(cleantitles)
Books <- data.frame(ISBN=bxBooks$ISBN, token=booktokens, title=cleantitles)

library(sqldf)
# picks a unique isbn for every token -- this is the number of unique tokens
bookmap <- sqldf('SELECT min(ISBN) as misbn,
token
FROM Books
GROUP BY token')

# displaymap has a title for every unique token
displaymap <- sqldf('SELECT Books.title as title,
bookmap.token as token
FROM Books,
bookmap
WHERE Books.ISBN=bookmap.misbn')

# bookdata1 is shorter than bxBookRatings because
# some of the rated books are not in the bxBooks data
bookdata1 <- sqldf('SELECT ratings.User_ID as userid,
Books.token as token,
ratings.Book_Rating as rating
FROM Books,
bxBookRatings as ratings
WHERE ratings.ISBN=Books.ISBN')

# add the displayname
bookdata <- merge(bookdata1, displaymap, by="token")

write.table(bookdata, file="bookdata.tsv",
sep="\t", row.names=FALSE, col.names=TRUE)
7 changes: 7 additions & 0 deletions Bookdata/read_bookcrossing.R
@@ -0,0 +1,7 @@

# first: in BX-Users.csv, replace \" with '
bxUsers <- read.table('BX-Users.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
# first: in BX-Book-Ratings.csv, replace \" with blank
bxBookRatings <- read.table('BX-Book-Ratings.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
# first: in BX-Books.csv, replace \" with '
bxBooks <- read.table('BX-Books.csv',header=TRUE,sep=';',comment.char='',stringsAsFactors=FALSE)
4 changes: 4 additions & 0 deletions Buzz/.gitignore
@@ -0,0 +1,4 @@
buzz.aux
buzz.log
buzz.out
cache
Binary file added Buzz/BuzzDataSetDoc.pdf
Binary file not shown.
Binary file added Buzz/PeerPresentation.pdf
Binary file not shown.
Binary file added Buzz/ProjectSponsorPresentation.pdf
Binary file not shown.
21 changes: 21 additions & 0 deletions Buzz/README.html
@@ -0,0 +1,21 @@
<h1 id="example-code-and-data-for-practical-data-science-with-r-by-nina-zumel-and-john-mount-manning-2014.">Example code and data for &quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014.</h1>
<ul>
<li>The book: <a href="http://www.manning.com/zumel/">&quot;Practical Data Science with R&quot; by Nina Zumel and John Mount, Manning 2014</a> (book copyright Manning Publications Co., all rights reserved)</li>
<li>The support site: <a href="https://github.com/WinVector/zmPDSwR">GitHub WinVector/zmPDSwR</a></li>
</ul>
<h2 id="the-code-and-data-in-this-directory-supports-examples-from">The code and data in this directory support examples from:</h2>
<ul>
<li>Chapter 10: Documentation and Deployment</li>
<li>Chapter 11: Producing Effective Presentations</li>
</ul>
<h2 id="original-data">Original data:</h2>
<p>10-13-2013 Data from: http://ama.liglab.fr/datasets/buzz/ Using: http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data</p>
<p>(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names )</p>
<p>Crypto hashes: $ shasum TomsHardware-*.txt 5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt 86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt</p>
<p>R objects produced by commands in rsteps.R saved in thRS500.Rdata</p>
<p>11-6-2013</p>
<p>Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation.</p>
<h2 id="license-for-additional-documentation-notes-code-and-example-data">License for additional documentation, notes, code, and example data:</h2>
<p><a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.</p>
<p>No guarantee, indemnification or claim of fitness is made regarding any of these items.</p>
<p>No claim of license on works of others or derived data.</p>
47 changes: 47 additions & 0 deletions Buzz/README.md
@@ -0,0 +1,47 @@

# Example code and data for "Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014.


* The book: ["Practical Data Science with R" by Nina Zumel and John Mount, Manning 2014](http://www.manning.com/zumel/) (book copyright Manning Publications Co., all rights reserved)
* The support site: [GitHub WinVector/zmPDSwR](https://github.com/WinVector/zmPDSwR)


## The code and data in this directory support examples from:
* Chapter 10: Documentation and Deployment
* Chapter 11: Producing Effective Presentations


## Original data:


10-13-2013
Data from: http://ama.liglab.fr/datasets/buzz/
Using:
http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.data

(described in http://ama.liglab.fr/datasets/buzz/classification/TomsHardware/Relative_labeling/sigma=500/TomsHardware-Relative-Sigma-500.names )

Crypto hashes:
$ shasum TomsHardware-*.txt
5a1cc7863a9da8d6e8380e1446f25eec2032bd91 TomsHardware-Absolute-Sigma-500.data.txt
86f2c0f4fba4fb42fe4ee45b48078ab51dba227e TomsHardware-Absolute-Sigma-500.names.txt
c239182c786baf678b55f559b3d0223da91e869c TomsHardware-Relative-Sigma-500.data.txt
ec890723f91ae1dc87371e32943517bcfcd9e16a TomsHardware-Relative-Sigma-500.names.txt


R objects produced by commands in rsteps.R saved in thRS500.Rdata


11-6-2013

Adding latex ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzz.pdf ) and markdown ( https://github.com/WinVector/zmPDSwR/blob/master/Buzz/buzzm.md ) versions of the documentation.



## License for additional documentation, notes, code, and example data:

<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="http://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.

No guarantee, indemnification or claim of fitness is made regarding any of these items.

No claim of license on works of others or derived data.