## Live Code: Python Sets

In this example we'll play around with call results from the [New York Times Books API](https://developer.nytimes.com/docs/books-product/1/overview) to demonstrate the use of __set operations__ (stay tuned for week 6, to learn more about APIs). 

With the Book API we can access data from the NY Times Bestseller List
The Books API has service, that returns best sellers for a specified date and list-name.
The request requires two parameters: {publishing date} and {list}

We'll look at following categories: 
* Hardcover Fiction
* Hardcover Nonfiction
* Paperback Trade Fiction
* Paperback Nonfiction

These are updated weekly, we’ll look at lists of the current and previous week for comparison.

In the first part of this code we'll create sets of titles for each category and week, in the second section we'll make use of set operations to get insights about the bestsellers. 

Things that we cand find out:
- which books have stayed in the top 15 compared to the previous week? 
- which titles are newcomers?
- ...

### Generating Sets

In [6]:
# import requests and json libraries
import requests
import json

# this function will make requests to the Books API
# and generate sets of bestsellers for different lists
# by passing 'date' as an argument, we can later call this function 
# several times for the lists of the current and the previous weeks
def generateSets(date):
    
    # if you want to play around with the API, please make your own key at https://developer.nytimes.com/
    authorized_key = "hIuztxwsXJG9CIqWAXU5em7NFCEAWFs2"
    # create list of the categories we want to access:
    # please refer to the 'class_code/nytimes_bestseller_find_lists.ipynb' notebook
    # to understand how you can retrieve the encoded list names
    categories = ['hardcover-fiction', 'hardcover-nonfiction', 'paperback-nonfiction', 'trade-fiction-paperback']
    
    """ This is an excerpt of the data structure the API will return:      
{(...)
 (...)
 'results': {(...)
     (...)
     'books': [{(...)
         (...)
         'title': 'LITTLE FIRES EVERYWHERE',
         'contributor': 'by Celeste Ng',
    
    """
    
    # our goal is to create a set for each of the above categories, 
    # containing the title of the top 15 books
    
    # step 1: 
    # declare a global variable, so we can access it later outside of the function
    global bestseller_titles 
    # create an empty, nested list (one list for each category)
    bestseller_titles = [[],[],[],[]] 
                
    # step 2: 
    # populate those lists with the top-15 titles in the respective category

    # create a variable to index the nested list
    index = 0

    for category in categories:
        # call the API-url
        # use string formatters to parse in the date, category, and the API-key
        api_url = "https://api.nytimes.com/svc/books/v3/lists/{}/{}.json?api-key={}".format(date, category, authorized_key)

        # call the API with requests
        response = requests.get(api_url)
        # create a variable called 'data' to hold the json formatted result
        data = response.json()

        # define the 'path' inside the json structure
        books = data['results']['books']

        for book in books:
            # append the title to bestseller_titles at the current index
            bestseller_titles[index].append(book['title'])
        
        # +1 to jump to the next nested list
        index += 1
    print(bestseller_titles)

In [7]:
# call the generateSets() function 
# with 'date' = 'current' to recieve this week's bestseller list
generateSets('current')



In [8]:
# call the generateSets() function again
# with 'date' = '2020-06-14' to recieve last week's bestseller list
generateSets('2020-06-14')



In [9]:
# create a set from each nested list
hc_fiction_jun21 = set(bestseller_titles[0]) 
hc_nonfiction_jun21 = set(bestseller_titles[1])
pb_nonfiction_jun21 = set(bestseller_titles[2])
pb_fiction_jun21 = set(bestseller_titles[3])

print('Hardcover Fiction, June 21:\n', hc_fiction_jun21)
print('\nHardcover Nonfiction, June 21:\n', hc_nonfiction_jun21)
print('\nPaperback Nonfiction, June 21:\n', pb_nonfiction_jun21)
print('\nPaperback Fiction, June 21:\n', pb_fiction_jun21)

Hardcover Fiction, June 21:

Hardcover Nonfiction, June 21:
 {'THE MAMBA MENTALITY', 'HIDDEN VALLEY ROAD', 'MY VANISHING COUNTRY', 'HOW TO BE AN ANTIRACIST', 'HOLLYWOOD PARK', 'PLAGUE OF CORRUPTION', 'THE SPLENDID AND THE VILE', 'BREATH', 'FORTITUDE', 'EDUCATED', 'AMERICAN CRUSADE', 'BECOMING', 'THE CHIFFON TRENCHES', 'ME AND WHITE SUPREMACY', 'UNTAMED'}

Paperback Nonfiction, June 21:
 {'THINKING, FAST AND SLOW', 'SO YOU WANT TO TALK ABOUT RACE', 'UNORTHODOX', 'JUST MERCY', 'GRIT', 'THE GREAT INFLUENZA', 'THE BODY KEEPS THE SCORE', 'BRAIDING SWEETGRASS', 'THE COLOR OF LAW', 'A WOMAN OF NO IMPORTANCE', 'WHITE FRAGILITY', 'SAPIENS', 'BORN A CRIME', 'OUTLIERS', 'THE NEW JIM CROW'}

Paperback Fiction, June 21:
 {'THIS TENDER LAND', 'NORMAL PEOPLE', 'LITTLE FIRES EVERYWHERE', 'THEN SHE WAS GONE', 'BEACH READ', 'THE NIGHTINGALE', 'THE WOMAN IN THE WINDOW', 'CIRCE', 'THE OVERSTORY', 'A GENTLEMAN IN MOSCOW', 'CALL ME BY YOUR NAME', 'CITY OF GIRLS', 'BEFORE WE WERE YOURS', 'THE TATTOOIST OF AU

In [10]:
# create a set from each nested list
hc_fiction_jun14 = set(bestseller_titles[0]) 
hc_nonfiction_jun14 = set(bestseller_titles[1]) 
pb_nonfiction_jun14 = set(bestseller_titles[2]) 
pb_fiction_jun14 = set(bestseller_titles[3]) 

print('Hardcover Fiction, June 14:\n', hc_fiction_jun14)
print('\nHardcover Nonfiction, June 14:\n', hc_nonfiction_jun14)
print('\nPaperback Nonfiction, June 14:\n', pb_nonfiction_jun14)
print('\nPaperback Fiction, June 14:\n', pb_fiction_jun14)

Hardcover Fiction, June 14:

Hardcover Nonfiction, June 14:
 {'THE MAMBA MENTALITY', 'HIDDEN VALLEY ROAD', 'MY VANISHING COUNTRY', 'HOW TO BE AN ANTIRACIST', 'HOLLYWOOD PARK', 'PLAGUE OF CORRUPTION', 'THE SPLENDID AND THE VILE', 'BREATH', 'FORTITUDE', 'EDUCATED', 'AMERICAN CRUSADE', 'BECOMING', 'THE CHIFFON TRENCHES', 'ME AND WHITE SUPREMACY', 'UNTAMED'}

Paperback Nonfiction, June 14:
 {'THINKING, FAST AND SLOW', 'SO YOU WANT TO TALK ABOUT RACE', 'UNORTHODOX', 'JUST MERCY', 'GRIT', 'THE GREAT INFLUENZA', 'THE BODY KEEPS THE SCORE', 'BRAIDING SWEETGRASS', 'THE COLOR OF LAW', 'A WOMAN OF NO IMPORTANCE', 'WHITE FRAGILITY', 'SAPIENS', 'BORN A CRIME', 'OUTLIERS', 'THE NEW JIM CROW'}

Paperback Fiction, June 14:
 {'THIS TENDER LAND', 'NORMAL PEOPLE', 'LITTLE FIRES EVERYWHERE', 'THEN SHE WAS GONE', 'BEACH READ', 'THE NIGHTINGALE', 'THE WOMAN IN THE WINDOW', 'CIRCE', 'THE OVERSTORY', 'A GENTLEMAN IN MOSCOW', 'CALL ME BY YOUR NAME', 'CITY OF GIRLS', 'BEFORE WE WERE YOURS', 'THE TATTOOIST OF AU

## Set Operations

Now that we have declared multiple sets of books, let's make use of set operations to get insights about the bestsellers.

In [11]:
# create an intersection function
def intersection(A,B):
    inter =A&B
    print(inter)

intersection(hc_fiction_jun21,hc_fiction_jun14)



In [25]:
# create a difference function
def difference(A,B):
    diff=A-B
    print(diff)
    
difference(hc_fiction_jun14,hc_fiction_jun21)

set()


In [22]:
# create a union function 
def union(A,B):
    uni= A|B
    print(uni)
union(hc_fiction_jun21,hc_fiction_jun14)




In [24]:
# perform an operation on more than two sets
all_fiction = hc_fiction_jun21|hc_fiction_jun14|pb_nonfiction_jun14
all_fiction

{'A WOMAN OF NO IMPORTANCE',
 'ALL ADULTS HERE',
 'AMERICAN DIRT',
 'BIG SUMMER',
 'BORN A CRIME',
 'BRAIDING SWEETGRASS',
 'CAMINO WINDS',
 'GRIT',
 'HIDEAWAY',
 'IF IT BLEEDS',
 'JUST MERCY',
 'OUTLIERS',
 'SAPIENS',
 'SO YOU WANT TO TALK ABOUT RACE',
 'THE 20TH VICTIM',
 'THE BODY KEEPS THE SCORE',
 'THE BOOK OF LONGINGS',
 'THE COLOR OF LAW',
 'THE GIVER OF STARS',
 'THE GREAT INFLUENZA',
 'THE LAST TRIAL',
 'THE NEW JIM CROW',
 'THE SILENT PATIENT',
 'THINKING, FAST AND SLOW',
 'UNORTHODOX',
 'WALK THE WIRE',
 'WHERE THE CRAWDADS SING',
 'WHITE FRAGILITY',
 'WRATH OF POSEIDON'}