In this notebook we scrape newly released fiction books, organized by subject, from [Barnes & Noble](https://www.barnesandnoble.com/). We collect our data as follows:
- scrape the list of subjects used to categorize fiction books.
- for each subject, collect 40 different books, recording the title, author, and summary (if one exists) of each book.
<br>If a summary does not exist and only a few are missing, we should be able to fill them in manually using targeted scrape functions.

Note: we take only 40 books because the B&N website is clunky and shows at most 40 books per page. We may collect more data by scraping more than one page if it proves necessary for the purpose of our project.
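With the 40-per-page cap, collecting N books for a subject takes ⌈N / 40⌉ result pages. A minimal sketch of that arithmetic, using a hypothetical `pages_for` helper that is not part of the notebook:

```python
import math

PER_PAGE = 40  # B&N shows at most 40 books per listing page


def pages_for(n_books, per_page=PER_PAGE, start_page=1):
    """Return the range of page numbers needed to collect n_books, starting at start_page."""
    n_pages = math.ceil(n_books / per_page)
    return range(start_page, start_page + n_pages)


# 200 books per subject at 40 per page -> 5 pages
print(list(pages_for(200, start_page=26)))  # [26, 27, 28, 29, 30]
```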



In [9]:
from bs4 import BeautifulSoup
import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

We use both the Requests and Selenium Python libraries because some pages on B&N's website do not work with Requests; for those pages we use Selenium.
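Instead of deciding page by page which library to use, one option is to try the lighter Requests fetch first and switch to Selenium only when the expected content is missing from the raw HTML. A minimal sketch, assuming the two fetchers are passed in as plain callables and that a marker string (here `pImageLink`, the class the notebook uses to find product links) indicates that the listing loaded; `fetch_with_fallback` and the fake fetchers are hypothetical:

```python
def fetch_with_fallback(url, static_fetch, dynamic_fetch, marker):
    """Return (html, method) for url, using a browser only when needed.

    static_fetch and dynamic_fetch are callables that take a URL and return HTML,
    for example wrappers around requests.get(url).text and a Selenium driver's
    page_source. marker is a substring expected in a fully loaded page.
    """
    html = static_fetch(url)
    if marker in html:
        return html, "static"
    # The marker is missing from the static HTML, so render the page instead.
    return dynamic_fetch(url), "dynamic"


# Stand-in fetchers so the logic can be tried without network access.
def fake_requests(url):
    return "<div id='app'></div>"  # listing content not present in the initial HTML


def fake_selenium(url):
    return "<a class='pImageLink' href='/w/example'>"


html, method = fetch_with_fallback("https://example.com/list", fake_requests, fake_selenium, "pImageLink")
print(method)  # dynamic
```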

In [21]:
# Collect (url path, subject name) pairs for the Romance and Horror subjects
# listed in the Fiction sidebar.

URL = "https://www.barnesandnoble.com/b/fiction/books/_/N-2usxZ29Z8q8"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")

results = soup.find(id="sidebar-section-FictionSubjects")

rom_hor = []
for litag in results.find_all('a'):
    name = litag.text.strip()
    if name in ('Romance', 'Horror'):
        # Read the href attribute directly instead of splitting the tag's HTML on quotes.
        rom_hor.append((litag.get('href'), name))


In [22]:
rom_hor

[('/b/books/fiction/horror/_/N-29Z8q8Z1d51', 'Horror'),
 ('/b/books/romance/_/N-29Z8q8Z17y3', 'Romance')]
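Splitting `str(litag)` on quotes only returns the URL when `href` happens to be the first attribute of the tag. In BeautifulSoup, `litag.get('href')` avoids that dependency; the stdlib-only sketch below shows the same attribute-based idea with `html.parser`, on a hypothetical sidebar fragment modeled on the hrefs printed above (`LinkCollector` is an illustrative name, not part of the notebook):

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs, so attribute order does not matter
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical fragment: in the first link, class comes before href.
snippet = (
    '<ul><li><a class="x" href="/b/books/romance/_/N-29Z8q8Z17y3"> Romance </a></li>'
    '<li><a href="/b/books/fiction/horror/_/N-29Z8q8Z1d51">Horror</a></li></ul>'
)
parser = LinkCollector()
parser.feed(snippet)
print(parser.links)

# Splitting the raw tag on quotes picks up whichever attribute comes first:
raw_tag = '<a class="x" href="/b/books/romance/_/N-29Z8q8Z17y3"> Romance </a>'
print(raw_tag.split('"')[1])  # x  (not the URL)
```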

In [23]:
# For each subject, find the "See All" link of its new-releases section.
# Horror has no new-releases section, so its "Digital Horror: Chilling eBooks" section is used instead.

def subjects_new_relases_url(subjects):
    new_releases_url = []
    for path, name in subjects:
        url = 'https://www.barnesandnoble.com' + path
        page = requests.get(url)
        soup = BeautifulSoup(page.content, "html.parser")
        if name == 'Horror':
            section_id = 'hotBooksWithDesc_DigitalHorror:ChillingeBooks'
        else:
            section_id = 'hotBooksWithDesc_NewReleases'
        target = soup.find(id=section_id)
        see_all = target.find('a', class_='see-all-link')
        new_releases_url.append((see_all.get('href'), name))
    return new_releases_url

In [25]:
subjects_new_relases_url(rom_hor)

[('/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51', 'Horror'),
 ('/b/books/romance/_/N-1sZ29Z8q8Z17y3?Ns=P_Sales_Rank', 'Romance')]

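The new releases page displays only 20 books per page by default, but it can be set to show 40 per page. The `show_200` function below does this by adding `Nrpp=40` to each listing URL, and it also walks through result pages 26 to 30, giving 5 × 40 = 200 book URLs per subject (400 in total).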

Note: the Horror subject has no New Releases section, so we scraped the category *Digital Horror: Chilling eBooks* instead.
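`show_200` builds its URLs by string concatenation and by rewriting `s=P_Sales_Rank` inside the Romance URL, which depends on the exact shape of each URL. A sketch of the same idea using `urllib.parse` from the standard library, which merges `Nrpp` and `page` into whatever query string is already present. `page_url` and `listing_urls` are hypothetical helpers, and the sketch assumes B&N ignores the order of query parameters (the Romance URL comes out as `?Ns=...&Nrpp=40&page=...` rather than `?Nrpp=40&Ns=...`):

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

BASE = "https://www.barnesandnoble.com"


def page_url(path, page, per_page=40):
    """Join path to the B&N base URL and set the Nrpp/page query parameters,
    keeping any parameters already present (such as Ns=P_Sales_Rank)."""
    parts = urlsplit(BASE + path)
    query = dict(parse_qsl(parts.query))
    query.update({"Nrpp": per_page, "page": page})
    return urlunsplit(parts._replace(query=urlencode(query)))


def listing_urls(subjects, pages):
    """Same shape as show_200's output: one (url, subject) pair per page and subject."""
    return [(page_url(path, i), name) for i in pages for path, name in subjects]


print(page_url("/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51", 26))
print(page_url("/b/books/romance/_/N-1sZ29Z8q8Z17y3?Ns=P_Sales_Rank", 26))
```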

In [110]:
def show_200(subjects):
    """Build listing URLs for result pages 26-30, 40 books per page, for each subject."""
    show_200_list = []
    for i in range(26, 31):
        for path, name in subjects:
            if name == 'Horror':
                # The Horror listing URL has no query string yet.
                url = 'https://www.barnesandnoble.com' + path + f'?Nrpp=40&page={i}'
            else:
                # The Romance URL ends in "?Ns=P_Sales_Rank": insert Nrpp before Ns and append the page.
                url = 'https://www.barnesandnoble.com' + path.replace("s=P_Sales_Rank", f'rpp=40&Ns=P_Sales_Rank&page={i}')
            show_200_list.append((url, name))
    return show_200_list

In [111]:
show_200(subjects_new_relases_url(rom_hor))[:8]

[('https://www.barnesandnoble.com/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51?Nrpp=40&page=26',
  'Horror'),
 ('https://www.barnesandnoble.com/b/books/romance/_/N-1sZ29Z8q8Z17y3?Nrpp=40&Ns=P_Sales_Rank&page=26',
  'Romance'),
 ('https://www.barnesandnoble.com/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51?Nrpp=40&page=27',
  'Horror'),
 ('https://www.barnesandnoble.com/b/books/romance/_/N-1sZ29Z8q8Z17y3?Nrpp=40&Ns=P_Sales_Rank&page=27',
  'Romance'),
 ('https://www.barnesandnoble.com/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51?Nrpp=40&page=28',
  'Horror'),
 ('https://www.barnesandnoble.com/b/books/romance/_/N-1sZ29Z8q8Z17y3?Nrpp=40&Ns=P_Sales_Rank&page=28',
  'Romance'),
 ('https://www.barnesandnoble.com/b/books/fiction/horror/_/N-1z13q39Z29Z8q8Z1d51?Nrpp=40&page=29',
  'Horror'),
 ('https://www.barnesandnoble.com/b/books/romance/_/N-1sZ29Z8q8Z17y3?Nrpp=40&Ns=P_Sales_Rank&page=29',
  'Romance')]

In [112]:
#Make the list containing the urls for all the newly released books along with their subjects.

def get_all_new_relases(pages):

    chrome_options = Options()
    chrome_options.add_argument('--headless')
    browser = webdriver.Chrome(options=chrome_options)

    all_new_releases_url_list = []
    try:
        for url, subject in pages:
            browser.get(url)
            soup = BeautifulSoup(browser.page_source, "html.parser")
            # Each book on a listing page is linked through an <a class="pImageLink"> tag.
            for a in soup.find_all('a', attrs={'class': 'pImageLink'}):
                all_new_releases_url_list.append((a.get('href'), subject))
    finally:
        browser.quit()  # close the headless Chrome session even if a page fails
    return all_new_releases_url_list

In [113]:
urls = get_all_new_relases(show_200(subjects_new_relases_url(rom_hor)))

In [114]:
urls[:5]

[('/w/30-days-of-night-tim-lebbon/1100332337;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8.prodny_store01-atgap17?ean=9781439169339',
  'Horror'),
 ('/w/monsters-corner-christopher-golden/1103110013;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8.prodny_store01-atgap17?ean=9781429984447',
  'Horror'),
 ('/w/red-white-and-blood-christopher-farnsworth/1107390383;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8.prodny_store01-atgap17?ean=9781101580646',
  'Horror'),
 ('/w/march-in-country-e-e-knight/1019557939;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8.prodny_store01-atgap17?ean=9781101188484',
  'Horror'),
 ('/w/preacher-book-five-garth-ennis/1118958569;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8.prodny_store01-atgap17?ean=9781401254513',
  'Horror')]
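The returned paths embed a session id (`;jsessionid=...`), so the same book can show up under different paths across runs. A sketch that normalizes each path and keys it on the `ean` query parameter, assuming (not confirmed by the source) that `ean` identifies one edition; `product_key` and `dedupe_product_links` are hypothetical helpers:

```python
from urllib.parse import urlsplit, parse_qs


def product_key(path):
    """Return (clean_path, ean) for a B&N product path.

    The ';jsessionid=...' part of the path changes from session to session, so it
    is dropped; the 'ean' query parameter is read as a per-edition identifier.
    """
    parts = urlsplit(path)  # urlsplit keeps ';...' inside parts.path
    clean_path = parts.path.split(";", 1)[0]
    ean = parse_qs(parts.query).get("ean", [None])[0]
    return clean_path, ean


def dedupe_product_links(pairs):
    """Drop (path, subject) pairs that point at the same book within a subject."""
    seen = set()
    unique = []
    for path, subject in pairs:
        clean_path, ean = product_key(path)
        key = (ean or clean_path, subject)
        if key not in seen:
            seen.add(key)
            unique.append((path, subject))
    return unique


p = ("/w/30-days-of-night-tim-lebbon/1100332337;jsessionid=6A34B9C8CA9F330F13BD6A4A9F3414E8"
     ".prodny_store01-atgap17?ean=9781439169339")
print(product_key(p))  # ('/w/30-days-of-night-tim-lebbon/1100332337', '9781439169339')
```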

In [115]:
len(urls)

400

In [116]:
#Get the title, author and summary for each newly released book while keeping track of what subject the book belongs to.

def get_data(links):

    chrome_options = Options()
    chrome_options.add_argument('--headless')
    browser = webdriver.Chrome(options=chrome_options)

    data_dict = {"title": [], "author": [], "overview": [], "subject": []}
    try:
        for path, subject in links:
            browser.get('https://www.barnesandnoble.com' + path)
            soup = BeautifulSoup(browser.page_source, 'html.parser')

            # get title
            title = soup.find('h1').get_text(strip=True)

            # get author (the first linked name inside the key-contributors span)
            author = soup.find('span', attrs={'id': 'key-contributors'}).find('a').get_text(strip=True)

            # get summary if there is one; otherwise store the placeholder string 'None'
            overview = soup.find('div', attrs={'class': 'overview-cntnt'})
            overview = overview.text.strip() if overview else 'None'

            data_dict["title"].append(title)
            data_dict["author"].append(author)
            data_dict["overview"].append(overview)
            data_dict["subject"].append(subject)
    finally:
        browser.quit()  # close the headless Chrome session even if a page fails
    return data_dict
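Each scraping function makes hundreds of page loads in a row (400 product pages for `get_data` here), and because results accumulate in a local dictionary, one timeout partway through discards the whole run. A common safeguard is to retry each page load with exponential backoff. The sketch uses a hypothetical `with_retries` helper; in the notebook, `fetch` could be a small function that calls `browser.get(url)` and returns `browser.page_source`. The `sleep` parameter is injectable only so the example runs instantly:

```python
import time


def with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff if it raises.

    Waits base_delay, 2 * base_delay, ... seconds between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


# Example: a fetch that fails twice, then succeeds (no real network access).
attempts_seen = []


def flaky_fetch(url):
    attempts_seen.append(url)
    if len(attempts_seen) < 3:
        raise ConnectionError("temporary failure")
    return "<html>ok</html>"


waits = []
print(with_retries(flaky_fetch, "https://example.com", sleep=waits.append))
print(waits)  # [1.0, 2.0]
```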
        
        

In [117]:
data3 = get_data(urls)

In [67]:
import pandas as pd

In [68]:
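# `data`, `data1` and `data2` are not created in the cells shown here; they are assumed to hold the output of earlier get_data runs.
df = pd.DataFrame.from_dict(data)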

In [78]:
df

|   | title | author | overview | subject |
|---|---|---|---|---|
| 0 | The Book of Accidents: A Novel | Chuck Wendig | A family returns to their hometown—and to the ... | Horror |
| 1 | The Final Girl Support Group | Grady Hendrix | THE INSTANT NEW YORK TIMES BESTSELLER“The horr... | Horror |
| 2 | The Institute: A Novel | Stephen King | From #1 New York Times bestselling author Step... | Horror |
| 3 | The Southern Book Club's Guide to Slaying Vamp... | Grady Hendrix | The New York Times Best SellerA Barnes & Noble... | Horror |
| 4 | Lisey's Story | Stephen King | \*Now an Apple TV+ limited series starring Juli... | Horror |
| ... | ... | ... | ... | ... |
| 395 | A Gentleman Never Tells | Amelia Grey | Enter the unforgettable world of New York Time... | Romance |
| 396 | Nursing A Broken Heart Complete BOXSET : Weste... | Molly Maco | Dean left everything he knew behind, the day h... | Romance |
| 397 | Summer Lovin' | Julie Cannon | Three stories of fun in the sun during one unf... | Romance |
| 398 | One Weekend in Aspen | Jaime Clevenger | Fresh out of a ten-year relationship, Emily Br... | Romance |


In [81]:
df1 = pd.DataFrame.from_dict(data1)

In [82]:
df1

|   | title | author | overview | subject |
|---|---|---|---|---|
| 0 | The Girl Who Loved Tom Gordon | Stephen King | From international bestseller Stephen King, a ... | Horror |
| 1 | Winter Moon | Dean Koontz | BONUS: This edition contains an excerpt from D... | Horror |
| 2 | The Fisherman | John Langan | In upstate New York, in the woods around Woods... | Horror |
| 3 | Ruddy Gore (Phryne Fisher Series #7) | Kerry Greenwood | "The appeal of this story is the glimpse it pr... | Horror |
| 4 | The Illustrated Man | Ray Bradbury | You could hear the voices murmuring, small and... | Horror |
| ... | ... | ... | ... | ... |
| 395 | Ethan Frome | Edith Wharton | Ethan Frome, written in 1911 by Pulitzer Prize... | Romance |
| 396 | The Druid Chronicles: Four Book Complete Colle... | Christina Phillips | They were destined to be enemies, but fate had... | Romance |
| 397 | The Dark Crusader | Jackie Ivie | Set in the wake of the disastrous Seventh Crus... | Romance |
| 398 | Cassethea of Mercia | Regina Hanel | History and archeology professor at St. Albans... | Romance |


In [91]:
df2 = pd.DataFrame.from_dict(data2)

In [92]:
df2

|   | title | author | overview | subject |
|---|---|---|---|---|
| 0 | From a Buick 8 | Stephen King | The #1 New York Times bestseller from Stephen ... | Horror |
| 1 | The Fall (Strain Trilogy #2) | Guillermo del Toro | “A cross between The Hot Zone and ’Salem’s Lot... | Horror |
| 2 | Exquisite Corpse | Poppy Z. Brite | From the author of Lost Souls, Drawing Blood, ... | Horror |
| 3 | Rain (Hudson Series #1) | V. C. Andrews | RAIN ALREADY KNOWS HOW HARD LIFE CAN BE. BUT S... | Horror |
| 4 | Prince Lestat (Vampire Chronicles Series #11) | Anne Rice | A stunning departure, a surprising and compell... | Horror |
| ... | ... | ... | ... | ... |
| 395 | The Lucky Heart | Devney Perry | Life on his ranch could be their future. If th... | Romance |
| 396 | The Outpost | Devney Perry | Trapped in his tiny mountain cabin, she didn't... | Romance |
| 397 | The Clover Chapel | Devney Perry | She took a dare and fell in love. Truth would ... | Romance |
| 398 | Agnes Grey | Anne Brontë | Part of our collection of Young Adult Classics... | Romance |


In [118]:
df3 = pd.DataFrame.from_dict(data3)

In [122]:
df3

|   | title | author | overview | subject |
|---|---|---|---|---|
| 0 | 30 Days of Night: Fear of the Dark | Tim Lebbon | Marty Volk has a guardian angel. For the past ... | Horror |
| 1 | The Monster's Corner: Stories Through Inhuman ... | Christopher Golden | An all original anthology from some of todays ... | Horror |
| 2 | Red, White, and Blood (Nathaniel Cade Series #3) | Christopher Farnsworth | “This is not a man. In almost a hundred years,... | Horror |
| 3 | March in Country (Vampire Earth Series #9) | E. E. Knight | The race is on to claim the area between the O... | Horror |
| 4 | Preacher Book Five (NOOK Comic with Zoom View) | Garth Ennis | Writer Garth Ennis’s violent, scabrous journey... | Horror |
| ... | ... | ... | ... | ... |
| 395 | Castles in the Sand: Bennett: The Patriot | Allie McCormack | A proud fighter pilot for the Taqara National ... | Romance |
| 396 | Wish Upon A Rockstar: A Steamy Wrong Sister Ro... | Annabeth Saryu | She never intended to deceive him. Now he's in... | Romance |
| 397 | Jungle Inferno | Desiree Holt | For Faith and Mark, the telepathic connection ... | Romance |
| 398 | The Bowie Brothers (A Calamity Falls Small Tow... | Erika Kelly | Steamy, small town romance starring four sexy,... | Romance |


In [123]:
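# DataFrame.append was removed in pandas 2.0; pd.concat stacks the four batches the same way (original indices kept).
Data = pd.concat([df, df1, df2, df3])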

In [126]:
Data

|   | title | author | overview | subject |
|---|---|---|---|---|
| 0 | The Book of Accidents: A Novel | Chuck Wendig | A family returns to their hometown—and to the ... | Horror |
| 1 | The Final Girl Support Group | Grady Hendrix | THE INSTANT NEW YORK TIMES BESTSELLER“The horr... | Horror |
| 2 | The Institute: A Novel | Stephen King | From #1 New York Times bestselling author Step... | Horror |
| 3 | The Southern Book Club's Guide to Slaying Vamp... | Grady Hendrix | The New York Times Best SellerA Barnes & Noble... | Horror |
| 4 | Lisey's Story | Stephen King | \*Now an Apple TV+ limited series starring Juli... | Horror |
| ... | ... | ... | ... | ... |
| 395 | Castles in the Sand: Bennett: The Patriot | Allie McCormack | A proud fighter pilot for the Taqara National ... | Romance |
| 396 | Wish Upon A Rockstar: A Steamy Wrong Sister Ro... | Annabeth Saryu | She never intended to deceive him. Now he's in... | Romance |
| 397 | Jungle Inferno | Desiree Holt | For Faith and Mark, the telepathic connection ... | Romance |
| 398 | The Bowie Brothers (A Calamity Falls Small Tow... | Erika Kelly | Steamy, small town romance starring four sexy,... | Romance |
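The four batches come from separate scraping runs over listings that can change between runs, so the same book could end up in more than one batch (an assumption worth checking, not a confirmed problem). In pandas, `Data.drop_duplicates(subset=['title', 'author', 'subject'])` removes exact duplicates directly; the stdlib sketch here, with a hypothetical `drop_duplicate_books` helper, shows the same keep-first logic on plain records (such as those from `Data.to_dict('records')`) and additionally ignores case and surrounding whitespace:

```python
def drop_duplicate_books(rows):
    """Keep the first record for each (title, author, subject) combination.

    rows: list of dicts with 'title', 'author', 'overview' and 'subject' keys.
    Titles and authors are compared case-insensitively after trimming whitespace.
    """
    seen = set()
    unique = []
    for row in rows:
        key = (row["title"].strip().lower(), row["author"].strip().lower(), row["subject"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


batches = [
    {"title": "Lisey's Story", "author": "Stephen King", "overview": "...", "subject": "Horror"},
    {"title": "Lisey's Story ", "author": "Stephen King", "overview": "...", "subject": "Horror"},
    {"title": "Agnes Grey", "author": "Anne Brontë", "overview": "...", "subject": "Romance"},
]
print(len(drop_duplicate_books(batches)))  # 2
```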


In [127]:
Data.to_csv('new_releases_Horror&Romance.csv', index=False)
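The introduction plans to fill in missing summaries by hand if only a few are missing. Since `get_data` stores the string `'None'` when no overview is found, a per-subject count shows whether that is feasible. A stdlib sketch with a hypothetical `missing_overviews` helper, run here on a small in-memory sample rather than the real CSV file:

```python
import csv
import io
from collections import Counter


def missing_overviews(csv_file):
    """Count rows per subject whose overview is the placeholder string 'None'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if row["overview"] == "None":
            counts[row["subject"]] += 1
    return counts


# Small in-memory stand-in for new_releases_Horror&Romance.csv
sample = io.StringIO(
    "title,author,overview,subject\n"
    "Book A,Author A,None,Horror\n"
    'Book B,Author B,"A summary, with a comma",Romance\n'
    "Book C,Author C,None,Romance\n"
)
print(missing_overviews(sample))

# On the real file:
# with open('new_releases_Horror&Romance.csv', newline='', encoding='utf-8') as f:
#     print(missing_overviews(f))
```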