# Web Scraping on [GoodReads](https://www.goodreads.com/)

Enable automatic reloading of external modules:

In [1]:
# Some more magic so that the notebook will reload external python modules;
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

# 1. Scrape data with [scrapereads](https://github.com/arthurdjn/scrape-goodreads)

Basic import:

In [2]:
from scrapereads import GoodReads
from scrapereads import Author, Book, Quote

## 1.1. GoodReads scraping from the API

The API uses an ``id`` to search for authors, books, or quotes.

In [5]:
goodreads = GoodReads()

In [6]:
# Examples
# Author id: 3389 -> Stephen King
#            1077326 -> J. K. Rowling

AUTHOR_ID = 3389
author = goodreads.search_author(AUTHOR_ID)

author

Author: Stephen King

In [7]:
info = author.get_info()

print(info['Description'])

Stephen Edwin King was born the second son of Donald and Nellie Ruth Pillsbury King. After his father left them when Stephen was two, he and his older brother, David, were raised by his mother. Parts of his childhood were spent in Fort Wayne, Indiana, where his father's family was at the time, and in Stratford, Connecticut. When Stephen was eleven, his mother brought her children back to Durham, Maine, for good. Her parents, Guy and Nellie Pillsbury, had become incapacitated with old age, and Ruth King was persuaded by her sisters to take over the physical care of them. Other family members provided a small house in Durham and financial support. After Stephen's grandparents passed away, Mrs. King found work in the kitchens of Pineland, a nearby residential facility for the mentally challenged.

Stephen attended the grammar school in Durham and Lisbon Falls High School, graduating in 1966. From his sophomore year at the University of Maine at Orono, he wrote a weekly column for the scho

Similarly, you can retrieve an author's books; pass ``top_k=NUMBER`` to limit the search to the first $k$ books:

In [9]:
books = goodreads.search_books(AUTHOR_ID, top_k=5)

books

[Stephen King: "The Shining", 387 editions,
 Stephen King: "It", 313 editions,
 Stephen King: "The Stand", 230 editions,
 Stephen King: "Misery", 263 editions,
 Stephen King: "Carrie", 315 editions]
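
The idea behind a ``top_k`` limit can be sketched with a lazy generator: results are produced page by page, and iteration stops as soon as $k$ items have been collected. This is only an illustration, not how *scrapereads* is implemented; ``iter_items``, ``first_k``, ``PAGE_SIZE`` and the fake "pages" are hypothetical names introduced for the example.

```python
from itertools import islice

PAGE_SIZE = 3  # assumed page size, for illustration only


def iter_items(num_pages=4, log=None):
    """Hypothetical stand-in for a scraper that yields items page by page."""
    for page in range(num_pages):
        if log is not None:
            log.append(page)  # record that this page was "requested"
        for i in range(PAGE_SIZE):
            yield f"item-{page * PAGE_SIZE + i}"


def first_k(items, top_k=None):
    """Return the first top_k items, or every item if top_k is None."""
    return list(islice(items, top_k))


pages_requested = []
result = first_k(iter_items(log=pages_requested), top_k=5)
print(result)
print(pages_requested)
```

Because the generator is lazy, only the pages needed to reach five items are requested; the remaining pages are never touched.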

## 1.2. Retrieve data

#### WARNING: *scrapereads* uses a cache system. Each time a connection is made and data is extracted, it is saved within the object.

For example, if 5 quotes are retrieved from *goodreads.com*, they will be stored in the private ``_quotes`` attribute. You can then access them through the ``.get_quotes()`` method (or ``.quote()`` if you want to iterate over ``_quotes``).

In [10]:
author = Author(AUTHOR_ID)

author

Author: Stephen King

In [11]:
quotes = author.get_quotes(top_k=5)

for quote in quotes:
    print(quote)
    print()

“Books are a uniquely portable magic.”
― Stephen King, from "On Writing: A Memoir Of The Craft"
  Likes: 16225, Tags: books, magic, reading

“If you don't have time to read, you don't have the time (or the tools) to write. Simple as that.”
― Stephen King
  Likes: 12565, Tags: reading, writing

“Get busy living or get busy dying.”
― Stephen King, from "Different Seasons"
  Likes: 9014, Tags: life

“Books are the perfect entertainment: no commercials, no batteries, hours of enjoyment for each dollar spent. What I wonder is why everybody doesn't carry a book around for those inevitable dead spots in life.”
― Stephen King
  Likes: 8667, Tags: books

“When his life was ruined, his family killed, his farm destroyed, Job knelt down on the ground and yelled up to the heavens, "Why god? Why me?" and the thundering voice of God answered, There's just something about you that pisses me off.”
― Stephen King, from "Storm Of The Century"
  Likes: 7686, Tags: god, humor, religion



In [12]:
author.get_books()

[Stephen King: "On Writing: A Memoir Of The Craft",
 Stephen King: "Different Seasons",
 Stephen King: "Storm Of The Century"]

With the cache system, 5 quotes have been added to the cache ``_quotes`` of ``author``.

In [13]:
num_quotes = len(author.get_quotes())

print(f'Number of quotes in the cache: {num_quotes}')

Number of quotes in the cache: 5


``Book`` objects behave the same way.

**The ``cache=False`` argument can be used to suppress this behavior.**

In [14]:
quotes = author.get_quotes(top_k=3, cache=False)

num_quotes = len(author.get_quotes())

print(f'Number of quotes in the cache: {num_quotes}')

Number of quotes in the cache: 3


In [15]:
books = author.get_books(top_k=3)

books

[Stephen King: "On Writing: A Memoir Of The Craft",
 Stephen King: "Different Seasons",
 Stephen King: "Storm Of The Century"]

Setting ``cache=False`` will ignore the cache and scrape data from *goodreads.com*.
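
The general pattern of caching fetched results on the object can be sketched as follows. This is a toy illustration, not *scrapereads* code: ``ToyAuthor``, ``_fetch``, the ``requests`` counter, and the exact semantics of ``cache=False`` are assumptions, chosen to be consistent with the outputs shown above (5 cached quotes, then 3 after a ``cache=False`` call).

```python
class ToyAuthor:
    """Hypothetical class illustrating a per-object cache."""

    def __init__(self, source):
        self._source = source  # stand-in for the remote site
        self._quotes = []      # private cache, filled on demand
        self.requests = 0      # counts simulated network requests

    def _fetch(self, top_k):
        # A real scraper would open a connection here.
        self.requests += 1
        return list(self._source[:top_k])

    def get_quotes(self, top_k=None, cache=True):
        # Serve from the cache when allowed and it already holds enough items.
        enough = top_k is None or len(self._quotes) >= top_k
        if cache and self._quotes and enough:
            return self._quotes[:top_k]
        # Otherwise fetch again and store the result on the object.
        self._quotes = self._fetch(top_k)
        return list(self._quotes)


toy = ToyAuthor(["q1", "q2", "q3", "q4", "q5", "q6"])
toy.get_quotes(top_k=5)                   # first call: one simulated request
cached = len(toy.get_quotes())            # served from the cache
toy.get_quotes(top_k=3, cache=False)      # bypass the cache: second request
refreshed = len(toy.get_quotes())
print(cached, refreshed, toy.requests)
```

In this sketch a ``cache=False`` call replaces the stored items with the freshly fetched ones, which is why the later count drops.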

## 1.3. Objects interactions

Recall that ``Quote`` and ``Book`` inherit from the ``Author`` structure.

From a quote or a book, you can retrieve its author. From an author, you can search for specific quotes or books by their id.

In [16]:
quote

“When his life was ruined, his family killed, his farm destroyed, Job knelt down on the ground and yelled up to the heavens, "Why god? Why me?" and the thundering voice of God answered, There's just something about you that pisses me off.”
― Stephen King, from "Storm Of The Century"
  Likes: 7686, Tags: god, humor, religion

In [17]:
quote.get_author()

Author: Stephen King

In [18]:
book = books[-1]

book

Stephen King: "Storm Of The Century"

In [19]:
quote.get_book()

Stephen King: "Storm Of The Century"

...and their parents are the same object:

In [20]:
book.get_author() == quote.get_author()

True

In [21]:
quote.get_book() == book

True
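
Note that ``==`` tests equality, while ``is`` tests whether two names refer to the very same object. The sketch below models the parent links with plain references to show the difference. The ``ToyAuthor``, ``ToyBook`` and ``ToyQuote`` classes are hypothetical and make no claim about how *scrapereads* structures its own classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyAuthor:
    name: str


@dataclass(frozen=True)
class ToyBook:
    title: str
    author: ToyAuthor

    def get_author(self):
        return self.author


@dataclass(frozen=True)
class ToyQuote:
    text: str
    book: ToyBook

    def get_book(self):
        return self.book

    def get_author(self):
        # The quote reaches its author through its parent book.
        return self.book.author


king = ToyAuthor("Stephen King")
toy_book = ToyBook("Storm Of The Century", king)
toy_quote = ToyQuote("(placeholder quote text)", toy_book)

# Both navigation paths lead to the same object in memory.
same_object = toy_book.get_author() is toy_quote.get_author()

# A separately built author compares equal but is a distinct object.
twin = ToyAuthor("Stephen King")
print(same_object, twin == king, twin is king)
```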

# 2. Export data

You can use the ``.to_json(encode='ascii')`` method to save an object in JSON format. The ``encode='ascii'`` argument encodes quotes and text as ASCII (removing all accents). You can turn it off by setting ``encode=None``.

In [22]:
quote.to_json(encode=None)

{'author': 'Stephen King',
 'book': 'Storm Of The Century',
 'likes': 7686,
 'tags': ['god', 'humor', 'religion'],
 'quote': 'When his life was ruined, his family killed, his farm destroyed, Job knelt down on the ground and yelled up to the heavens, "Why god? Why me?" and the thundering voice of God answered, There\'s just something about you that pisses me off.'}
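
One common way to reduce text to ASCII and drop accents uses only the standard library. It is a sketch of the general technique, not necessarily what ``encode='ascii'`` does internally; ``to_ascii`` and the sample record are hypothetical.

```python
import json
import unicodedata


def to_ascii(text):
    """Strip accents and drop any character with no ASCII equivalent."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards the marks and other non-ASCII.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Placeholder data, not scraped from goodreads.com.
record = {"author": "Renée Example", "quote": "“Café déjà vu”"}
ascii_record = {key: to_ascii(value) for key, value in record.items()}

print(json.dumps(ascii_record))
print(json.dumps(record, ensure_ascii=False))  # keeps the original characters
```

With this approach, characters that have no ASCII decomposition, such as curly quotation marks, are removed rather than replaced.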