## Live Code: Python Sets

In this example we'll work with responses from the [New York Times Books API](https://developer.nytimes.com/docs/books-product/1/overview) to demonstrate the use of __set operations__ (stay tuned for week 6 to learn more about APIs).

With the Books API we can access data from the NY Times Bestseller List.
The Books API has a service that returns the best sellers for a specified date and list name.
The request requires two parameters: {publishing date} and {list}.
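
As a rough sketch, the two parameters slot into the request path as shown below. The URL pattern is the one used later in this notebook; `build_list_url` and the placeholder key `YOUR_API_KEY` are hypothetical names for illustration, and no request is actually sent.

```python
# Hypothetical helper: it only builds the URL string, it does not call the API.
def build_list_url(date, list_name, api_key):
    # date: 'current' or a date such as '2020-06-14'
    # list_name: an encoded list name such as 'hardcover-fiction'
    return "https://api.nytimes.com/svc/books/v3/lists/{}/{}.json?api-key={}".format(
        date, list_name, api_key
    )

# 'YOUR_API_KEY' is a placeholder, not a working key
url = build_list_url("current", "hardcover-fiction", "YOUR_API_KEY")
print(url)
```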

We'll look at the following categories:
* Hardcover Fiction
* Hardcover Nonfiction
* Paperback Trade Fiction
* Paperback Nonfiction

These lists are updated weekly; we’ll look at the lists of the current and the previous week for comparison.

In the first part of this code we'll create sets of titles for each category and week. In the second part we'll use set operations to get insights about the bestsellers.

Things that we can find out:
- which books have stayed in the top 15 compared to the previous week? 
- which titles are newcomers?
- ...
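
Before touching the API, here is a tiny sketch of how sets answer the first two questions. The titles are made up for illustration; the real sets are built further down.

```python
# Made-up titles, only to illustrate the idea
last_week = {"BOOK A", "BOOK B", "BOOK C"}
this_week = {"BOOK B", "BOOK C", "BOOK D"}

# titles that appear on both lists
stayed = this_week & last_week
# titles that appear on this week's list only
newcomers = this_week - last_week

# sets are unordered, so we sort before printing
print(sorted(stayed))
print(sorted(newcomers))
```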

### Generating Sets

In [23]:
# import requests and json libraries
import requests
import json

# this function will make requests to the Books API
# and generate sets of bestsellers for different lists
# by passing 'date' as an argument, we can later call this function 
# several times for the lists of the current and the previous weeks
def generateSets(date):
    
    # if you want to play around with the API, please make your own key at https://developer.nytimes.com/
    authorized_key = "QftZeSssSfBqTSFet3RBaTE9inc3iWAw"
    # create list of the categories we want to access:
    # please refer to the 'class_code/nytimes_bestseller_find_lists.ipynb' notebook
    # to understand how you can retrieve the encoded list names
    categories = ['hardcover-fiction', 'hardcover-nonfiction', 'paperback-nonfiction', 'trade-fiction-paperback']
    
    """ This is an excerpt of the data structure the API will return:      
{(...)
 (...)
 'results': {(...)
     (...)
     'books': [{(...)
         (...)
         'title': 'LITTLE FIRES EVERYWHERE',
         'contributor': 'by Celeste Ng',
    
    """
    
    # our goal is to create a set for each of the above categories, 
    # containing the title of the top 15 books
    
    # step 1: 
    # declare a global variable, so we can access it later outside of the function
    global bestseller_titles 
    # create an empty, nested list (one list for each category)
    bestseller_titles = [[],[],[],[]] 
                
    # step 2: 
    # populate those lists with the top-15 titles in the respective category

    # create a variable to index the nested list
    index = 0

    for category in categories:
        # build the API URL
        # use str.format() to insert the date, category, and API key
        api_url = "https://api.nytimes.com/svc/books/v3/lists/{}/{}.json?api-key={}".format(date, category, authorized_key)

        # call the API with requests
        response = requests.get(api_url)
        # create a variable called 'data' to hold the json formatted result
        data = response.json()

        # define the 'path' inside the json structure
        books = data['results']['books']

        for book in books:
            # append the title to bestseller_titles at the current index
            bestseller_titles[index].append(book['title'])
        
        # +1 to jump to the next nested list
        index += 1

    print(bestseller_titles)
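
The function above stores its result in a global variable, which is overwritten on every call. One possible alternative, sketched here under assumptions, returns a dictionary of sets instead. `titles_by_category`, `fetch_json`, and `fake_fetch` are hypothetical names; `fake_fetch` stands in for the API so the sketch runs offline, and its response only mimics the excerpt shown above.

```python
def titles_by_category(date, categories, fetch_json):
    """Return a dict that maps each category to a set of its titles.

    fetch_json is a hypothetical callable (date, category) -> parsed JSON dict.
    In the notebook it could wrap requests.get(api_url).json().
    """
    result = {}
    for category in categories:
        data = fetch_json(date, category)
        # a set comprehension collects the titles without an index variable
        result[category] = {book['title'] for book in data['results']['books']}
    return result

# Stand-in for the API (made-up response with the structure shown above)
def fake_fetch(date, category):
    return {'results': {'books': [{'title': 'LITTLE FIRES EVERYWHERE',
                                   'contributor': 'by Celeste Ng'}]}}

sets_current = titles_by_category('current', ['trade-fiction-paperback'], fake_fetch)
print(sets_current)
```

Because the result is returned, calls for different weeks can be stored under different names without one overwriting the other.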

In [11]:
# call the generateSets() function
# with 'date' = 'current' to receive this week's bestseller list
generateSets('current')



In [3]:
# create a set from each nested list
hc_fiction_jun21 = set(bestseller_titles[0]) 
hc_nonfiction_jun21 = set(bestseller_titles[1])
pb_nonfiction_jun21 = set(bestseller_titles[2])
pb_fiction_jun21 = set(bestseller_titles[3])

print('Hardcover Fiction, June 21:\n', hc_fiction_jun21)
print('\nHardcover Nonfiction, June 21:\n', hc_nonfiction_jun21)
print('\nPaperback Nonfiction, June 21:\n', pb_nonfiction_jun21)
print('\nPaperback Fiction, June 21:\n', pb_fiction_jun21)

Hardcover Fiction, June 21:

Hardcover Nonfiction, June 21:
 {'BETWEEN THE WORLD AND ME', 'BECOMING', 'HOW TO BE AN ANTIRACIST', 'UNTAMED', 'ME AND WHITE SUPREMACY', 'UNITED STATES OF SOCIALISM', 'EDUCATED', 'THE MAMBA MENTALITY', 'FORTITUDE', 'OUR TIME IS NOW', 'COUNTDOWN 1945', 'THE DEFICIT MYTH', 'MY VANISHING COUNTRY', "I'M STILL HERE", 'THE SPLENDID AND THE VILE'}

Paperback Nonfiction, June 21:
 {'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'THE COLOR OF COMPROMISE', 'BORN A CRIME', 'WHITE RAGE', 'STAMPED FROM THE BEGINNING', 'THE GREAT INFLUENZA', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'WAKING UP WHITE', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'SO YOU WANT TO TALK ABOUT RACE', 'RAISING WHITE KIDS', 'THE BODY KEEPS THE SCORE', 'WHITE FRAGILITY'}

Paperback Fiction, June 21:
 {'THE NIGHTINGALE', 'HUSH', "THE HANDMAID'S TALE", 'NORMAL PEOPLE', 'THE TATTOOIST OF AUSCHWITZ', 'THE BLUEST EYE', 'A MINUTE TO MIDNIGHT', 'THEN SHE WAS GON

In [4]:
# call the generateSets() function again
# with 'date' = '2020-06-14' to receive last week's bestseller list
generateSets('2020-06-14')



In [5]:
# create a set from each nested list
hc_fiction_jun14 = set(bestseller_titles[0]) 
hc_nonfiction_jun14 = set(bestseller_titles[1]) 
pb_nonfiction_jun14 = set(bestseller_titles[2]) 
pb_fiction_jun14 = set(bestseller_titles[3]) 

print('Hardcover Fiction, June 14:\n', hc_fiction_jun14)
print('\nHardcover Nonfiction, June 14:\n', hc_nonfiction_jun14)
print('\nPaperback Nonfiction, June 14:\n', pb_nonfiction_jun14)
print('\nPaperback Fiction, June 14:\n', pb_fiction_jun14)

Hardcover Fiction, June 14:

Hardcover Nonfiction, June 14:
 {'PLAGUE OF CORRUPTION', 'HOW TO BE AN ANTIRACIST', 'BECOMING', 'UNTAMED', 'ME AND WHITE SUPREMACY', 'EDUCATED', 'THE MAMBA MENTALITY', 'BREATH', 'FORTITUDE', 'THE CHIFFON TRENCHES', 'AMERICAN CRUSADE', 'MY VANISHING COUNTRY', 'HOLLYWOOD PARK', 'HIDDEN VALLEY ROAD', 'THE SPLENDID AND THE VILE'}

Paperback Nonfiction, June 14:
 {'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'A WOMAN OF NO IMPORTANCE', 'OUTLIERS', 'THE GREAT INFLUENZA', 'BORN A CRIME', 'BRAIDING SWEETGRASS', 'UNORTHODOX', 'WHITE FRAGILITY', 'SAPIENS', 'SO YOU WANT TO TALK ABOUT RACE', 'THINKING, FAST AND SLOW', 'THE BODY KEEPS THE SCORE', 'GRIT'}

Paperback Fiction, June 14:
 {'THE NIGHTINGALE', 'THE BOOK WOMAN OF TROUBLESOME CREEK', 'CITY OF GIRLS', 'NORMAL PEOPLE', 'THE TATTOOIST OF AUSCHWITZ', 'CALL ME BY YOUR NAME', 'THEN SHE WAS GONE', 'LITTLE FIRES EVERYWHERE', 'THIS TENDER LAND', 'CIRCE', 'BEFORE WE WERE YOURS', 'BEACH READ', 'A GENTLEMAN IN MOSC

### Set Operations

Now that we have declared multiple sets of books, let's make use of set operations to get insights about the bestsellers.
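
As a quick reference, each set operator used in this section has an equivalent method. The sketch below uses three small hand-picked sets; the variable names `a` and `b` are just for illustration.

```python
a = {'BECOMING', 'EDUCATED', 'UNTAMED'}
b = {'EDUCATED', 'GRIT'}

# each operator has an equivalent method; every line prints True
print(a & b == a.intersection(b))   # intersection
print(a | b == a.union(b))          # union
print(a - b == a.difference(b))     # difference

# the methods accept any iterable, the operators need a set on both sides
print(a.union(['GRIT']) == a | {'GRIT'})
```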

In [6]:
# create an intersection function to find the books that show up in both sets
def intersection(A, B):
    inter = set(A) & set(B)
    print('A & B\nFollowing books match your criteria:\n{}\n'.format(inter))

# call the function
# show paperback nonfiction titles that were on both this week's and last week's bestseller lists
intersection(pb_nonfiction_jun21, pb_nonfiction_jun14)

A & B
Following books match your criteria:
{'THE COLOR OF LAW', 'JUST MERCY', 'THE NEW JIM CROW', 'THE GREAT INFLUENZA', 'BORN A CRIME', 'SO YOU WANT TO TALK ABOUT RACE', 'THE BODY KEEPS THE SCORE', 'WHITE FRAGILITY'}



In [7]:
# create a difference function
def difference(A, B):
    diff = set(A) - set(B)
    print('A - B\nFollowing books match your criteria:\n{}\n'.format(diff))

# call the function
# show this week's newcomers in the paperback nonfiction category
difference(pb_nonfiction_jun21, pb_nonfiction_jun14)

A - B
Following books match your criteria:
{'THE COLOR OF COMPROMISE', 'WHITE RAGE', 'STAMPED FROM THE BEGINNING', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'WAKING UP WHITE', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'RAISING WHITE KIDS'}
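
Note that difference depends on the order of its arguments: swapping them answers the opposite question. The sketch below uses a small subset of the paperback nonfiction titles printed earlier (not the full lists) to show last week's dropouts, plus the symmetric difference operator `^`, which is not used elsewhere in this notebook.

```python
# A small subset of the paperback nonfiction titles shown above
this_week = {'JUST MERCY', 'WHITE RAGE', 'WAKING UP WHITE'}
last_week = {'JUST MERCY', 'OUTLIERS', 'GRIT'}

# titles that were on last week's list but are gone this week
dropouts = last_week - this_week
# symmetric difference: titles that are on exactly one of the two lists
changed = this_week ^ last_week

print(sorted(dropouts))
print(sorted(changed))
```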



In [8]:
# create a union function to show two sets combined
def union(A, B):
    # use a name other than 'union' so the variable doesn't shadow the function
    combined = set(A) | set(B)
    print('A | B\nFollowing books match your criteria:\n{}\n'.format(combined))

# call the function
# show paperback nonfiction titles of this and last week combined
union(pb_nonfiction_jun21, pb_nonfiction_jun14)

A | B
Following books match your criteria:
{'THE NEW JIM CROW', 'BORN A CRIME', 'STAMPED FROM THE BEGINNING', 'BRAIDING SWEETGRASS', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'WAKING UP WHITE', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'SO YOU WANT TO TALK ABOUT RACE', 'RAISING WHITE KIDS', 'THE BODY KEEPS THE SCORE', 'WHITE FRAGILITY', 'GRIT', 'THE COLOR OF LAW', 'JUST MERCY', 'A WOMAN OF NO IMPORTANCE', 'OUTLIERS', 'THE COLOR OF COMPROMISE', 'THE GREAT INFLUENZA', 'WHITE RAGE', 'UNORTHODOX', 'SAPIENS', 'THINKING, FAST AND SLOW'}



In [9]:
# Show ALL nonfiction bestsellers, current and last week combined
all_nonfiction = pb_nonfiction_jun21 | pb_nonfiction_jun14 | hc_nonfiction_jun21 | hc_nonfiction_jun14
print(all_nonfiction)

{'THE NEW JIM CROW', 'HOW TO BE AN ANTIRACIST', 'BORN A CRIME', 'BRAIDING SWEETGRASS', 'MY VANISHING COUNTRY', 'COUNTDOWN 1945', 'AMERICAN CRUSADE', 'HOLLYWOOD PARK', 'RAISING WHITE KIDS', 'THE SPLENDID AND THE VILE', 'THE BODY KEEPS THE SCORE', 'THE COLOR OF LAW', 'A WOMAN OF NO IMPORTANCE', 'ME AND WHITE SUPREMACY', 'THE CHIFFON TRENCHES', 'HIDDEN VALLEY ROAD', 'THINKING, FAST AND SLOW', 'BECOMING', 'UNITED STATES OF SOCIALISM', 'STAMPED FROM THE BEGINNING', 'EDUCATED', 'FORTITUDE', "WHY I'M NO LONGER TALKING TO WHITE PEOPLE ABOUT RACE", 'WAKING UP WHITE', 'WHY ARE ALL THE BLACK KIDS SITTING TOGETHER IN THE CAFETERIA?', 'SO YOU WANT TO TALK ABOUT RACE', 'THE DEFICIT MYTH', 'WHITE FRAGILITY', 'GRIT', 'JUST MERCY', 'PLAGUE OF CORRUPTION', 'OUTLIERS', 'BETWEEN THE WORLD AND ME', 'UNTAMED', 'THE COLOR OF COMPROMISE', 'THE GREAT INFLUENZA', 'WHITE RAGE', 'THE MAMBA MENTALITY', 'BREATH', 'UNORTHODOX', 'OUR TIME IS NOW', 'SAPIENS', "I'M STILL HERE"}
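
When the number of sets grows, chaining `|` gets long. A minimal sketch of an alternative, assuming the sets are collected in a list first: `set().union(*sets)` combines any number of them in one call. The titles and the name `nonfiction_sets` are made up for illustration.

```python
# Made-up stand-ins for several nonfiction sets
nonfiction_sets = [
    {"TITLE A", "TITLE B"},
    {"TITLE B", "TITLE C"},
    {"TITLE D"},
]

# start from an empty set and unpack the list into union()
all_titles = set().union(*nonfiction_sets)

print(sorted(all_titles))
```

"TITLE B" is in two of the input sets but appears only once in the result, since a set keeps each element a single time.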
