# Gift Giving
### Finding the perfect gift
---

**Introduction**

What should you buy for Mother's Day? Your mom never really asks for anything, and she seems to have everything she wants, though you know that can't be true. You feel pressure to get her something she will really like, but it seems so difficult that you want to give up.

It is hard to come up with a meaningful gift. If you are anything like me, you love to think of the perfect surprise and watch for the look of happiness it brings.

**Gifts matter**

An effective gift can strengthen a relationship and build friendship between the giver and receiver. 


**The Problem with Presents**

In "Money Can't Buy Love," we learn that the amount of money spent on a gift is not correlated with the satisfaction it generates. Wanting to impress the receiver, gift givers often overspend. Unfortunately, this often produces a gift the receiver doesn't really want. Though anthropologists describe gift giving as a positive social process, economists describe it as an objective waste of resources (e.g., the "deadweight loss of Christmas").

Several psychology papers highlight this unfortunate disconnect between the giver and the receiver. Some hypothesize that people tend to have trouble learning from their own gift-receiving experiences.

It takes time to think of a gift, and often we may not know what the person wants at all. We may end up sacrificing a stronger relationship for the time we save by giving a lame gift.

**Solution**

Using the tools of data science and new psychology studies, we can help gift givers spend less time finding a meaningful gift.

### Quantify Meaningful Gifts
---
Recent studies highlight three key principles that can lead to a meaningful gift.

1. Give something that reflects their personal interests
2. Give something they can actually use
3. Give something that provides long-lasting value

(see "Why Certain Gifts Are Great to Give but Not to Get")

Next, we need data to analyze people's interests, the usefulness of gifts, and how long they last. We will start by determining people's interests.

Amazon's public wishlist data is available online. Because it is owned by Amazon, we can't use it commercially, but we can use it in research and development. We will begin by writing a script that scrapes Amazon wishlists.


# Amazon Wishlist Webscraper

The following code searches through Amazon wishlists and retrieves item names, prices, and categories. Unfortunately, Amazon does not allow its data to be distributed, so this code is only for personal use.

To begin, I collected the top ten names for each decade from the US Social Security web page. We will read them in and use them to search the Amazon lists.

In [36]:
topNames = set()

#Open the top names file and parse
#(the with block closes the file when parsing is done)
with open('topNames.txt') as file:
    lines = [l.split() for l in file]

#Handpicked indices for each line
#based on file
for l in lines:
    topNames.add(l[1])
    topNames.add(l[3])

topNames = list(topNames)
print(topNames)

['Sarah', 'Charles', 'Heather', 'Thomas', 'Richard', 'Robert', 'Patricia', 'Abigail', 'Megan', 'Brian', 'Matthew', 'Melissa', 'Joseph', 'Ethan', 'Linda', 'Ashley', 'Nicholas', 'Mary', 'Deborah', 'Amy', 'Jeffrey', 'John', 'Elizabeth', 'Donna', 'Hannah', 'Samantha', 'Andrew', 'William', 'Olivia', 'Jacob', 'Christopher', 'Michelle', 'Isabella', 'Emma', 'Susan', 'James', 'Michael', 'Brittany', 'David', 'Stephanie', 'Angela', 'Nancy', 'Lisa', 'Barbara', 'Mark', 'Amanda', 'Jessica', 'Joshua', 'Madison', 'Taylor', 'Jason', 'Kimberly', 'Debra', 'Cynthia', 'Jennifer', 'Tyler', 'Karen', 'Daniel', 'Nicole', 'Emily']
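
The indices 1 and 3 used above depend on the layout of `topNames.txt`, which is not shown here. As a minimal sketch, assuming each row follows a "rank, male name, count, female name, count" layout, the same parsing can be tried on an in-memory sample. The sample rows, their counts, and the `parse_top_names` helper are illustrative, not taken from the real file.

```python
def parse_top_names(lines):
    """Collect the two names from each whitespace-separated row.

    Assumed row layout: rank, male name, count, female name, count.
    """
    names = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        names.add(fields[1])
        names.add(fields[3])
    return sorted(names)


# Made-up rows in the assumed layout (the counts are placeholders)
sample = [
    "1 James 100 Mary 90",
    "2 Robert 80 Linda 70",
    "",
]
print(parse_top_names(sample))
```

Skipping short rows keeps a stray blank line at the end of the file from raising an `IndexError`.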


In [1]:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
import numpy as np
from bs4 import BeautifulSoup
import time
import sqlite3

By swapping out the names in the wishlist lookup URL, we can find pages of random people and their wishlists.

In [38]:
#The url for searching wish lists
wishlistSearch = 'https://www.amazon.com/gp/registry/search/'
siteVar = 'ref=cm_wl_search__sortbar_page_2?ie=UTF8&field-firstname=&field-lastname=&'
changeVar = 'field-name={}&index=us-xml-wishlist&page={}&submit.search=1'

#Put together the pieces:
searchPage = wishlistSearch+siteVar+changeVar.format(topNames[5],2)
print(searchPage)

https://www.amazon.com/gp/registry/search/ref=cm_wl_search__sortbar_page_2?ie=UTF8&field-firstname=&field-lastname=&field-name=Robert&index=us-xml-wishlist&page=2&submit.search=1


The above URL works!
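
String concatenation is fine for simple first names, but a name containing a space or an accented letter would need escaping. Here is a minimal sketch of the same URL built with the standard library's `urlencode` (Python 3). The `build_search_url` helper and the `BASE` constant are names introduced for illustration only. The parameters are copied from the URL above.

```python
from urllib.parse import urlencode

# Everything before the "?" in the search URL shown above
BASE = 'https://www.amazon.com/gp/registry/search/ref=cm_wl_search__sortbar_page_2'


def build_search_url(name, page):
    """Build the wishlist search URL for one name and page number."""
    params = [
        ('ie', 'UTF8'),
        ('field-firstname', ''),
        ('field-lastname', ''),
        ('field-name', name),
        ('index', 'us-xml-wishlist'),
        ('page', page),
        ('submit.search', 1),
    ]
    # urlencode escapes spaces and other reserved characters in the values
    return BASE + '?' + urlencode(params)


print(build_search_url('Robert', 2))
```

For plain names this produces the same string as the concatenation above. A value such as `'Mary Ann'` is encoded as `Mary+Ann`.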

### Functions for scraping each portion

**1. Find a bunch of random people**

In [39]:
def parseSearchResults(searchPage,browser):
    #Open the search page
    browser.get(searchPage)
    searchResultsHTML = browser.page_source
    soup = BeautifulSoup(searchResultsHTML,'html.parser')
    
    #Create an empty dictionary
    namesUrls = {'Name':[],'Place':[],'userUrls':[]}
    
    #Find the first wishlist box
    name_box = soup.find('div', attrs={'class':
"a-box a-spacing-top-medium a-color-base-background a-text-left people-box"})
    
    while name_box:
        #Get the name and user url
        textSection = name_box.find('div',attrs={'class':
                 'a-section a-spacing-none a-spacing-top-none'})
        namesUrls['userUrls'].append(textSection.find('a').get('href'))
        namesUrls['Name'].append(textSection.find('a').text.strip())


        #If displayed, get where the user is from
        item1 = None
        item2 = None
        
        textSection = textSection.find_next_sibling()
        if textSection:
            item1 = textSection.text
            textSection = textSection.find_next_sibling()
            if textSection:
                item2 = textSection.text

        #Check which item is a birthday and which is a place
        if item1 and item2:
            place = item2.strip()
        elif item1 and item1.strip() and not item1.strip()[-1].isdigit():
            #A single entry that does not end in a digit is a place
            place = item1.strip()
        else:
            #Only a birthday (ends in a digit) or nothing is displayed
            place = None
                
        namesUrls['Place'].append(place)
        
        #Iterate
        name_box = name_box.find_next_sibling()
    
    #Rest
    time.sleep(1)
    
    return namesUrls
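
The birthday-versus-place rule in `parseSearchResults` can be tested on its own without opening a browser. The sketch below isolates that rule in a small function. The name `pick_place` and the sample strings are made up for illustration. It assumes, as the code above does, that a lone entry ending in a digit is a birthday.

```python
def pick_place(item1, item2):
    """Return the location text, or None if only a birthday (or nothing) is shown.

    With two entries the second one is taken as the place; a single entry
    ending in a digit is treated as a birthday.
    """
    if item1 and item2:
        return item2.strip()
    if item1:
        text = item1.strip()
        if text and not text[-1].isdigit():
            return text
    return None


# Made-up inputs covering the three cases
print(pick_place(' March 3 ', ' Springfield '))  # birthday and place
print(pick_place('March 3', None))               # birthday only
print(pick_place('Springfield', None))           # place only
```

Keeping the rule in a pure function makes it easy to add cases if the page layout turns out to differ from this assumption.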

**2. Get people's wishlists**

In [40]:
def findWishlistUrls(userUrl,browser):
    """Parses the user list page and returns
    the wish lists urls
    """
    #To store wishlist urls
    wishlists = []
    
    try:
        browser.get(userUrl)
        userPageHTML = browser.page_source
        soup = BeautifulSoup(userPageHTML,'html.parser')
        listsBox = soup.find('div', attrs={'id':'my-lists-tab'})
        if listsBox:
            listsBox = listsBox.find('div',attrs={"aria-expanded":"true"})
        if listsBox:
            #Keep only links that actually carry an href
            for l in listsBox.find_all('a', href=True):
                wishlists.append(str(l['href']))
        
    except TimeoutException:
        pass
    
    #Rest
    time.sleep(1)
    
    return wishlists

**3. Get all the items in each wishlist**

In [47]:
def parseWishlist(wishlistUrl,browser):
    allItems = {
        'Name':[],
        'Price':[],
        'Url':[]
    }
    try:
        browser.get(wishlistUrl)
        time.sleep(3)
        #Scroll to the bottom of the page using an arbitrarily large number
        browser.execute_script("window.scrollTo(0,10000000)")
        time.sleep(3)
        
    except TimeoutException:
        pass
    
    #Search through the HTML for the data
    listPageHTML = browser.page_source
    soup = BeautifulSoup(listPageHTML,'html.parser')

    gifts = soup.find('ul', attrs={'id':'g-items'})
    try:
        item = gifts.find('li')

        while item:
            #Find the name and url
            nameLink = item.find('a', attrs={'class':'a-link-normal'})

            #Store the data
            if nameLink:
                allItems['Name'].append(nameLink.get('title'))
                allItems['Url'].append(nameLink.get('href'))
                allItems['Price'].append(item.get('data-price'))

            #iterate
            item = item.find_next('li',attrs={'class':'a-spacing-none g-item-sortable'})

    except AttributeError:
        return None
    
    return allItems
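
`parseWishlist` stores `data-price` exactly as it appears in the HTML, so each price is a string or `None`. Before analysis it may help to convert these values to numbers. The sketch below is one way to do that. It assumes some values may be missing, non-numeric, or placeholder values such as an infinity. The `parse_price` helper is hypothetical and not part of the scraper.

```python
import math


def parse_price(raw):
    """Convert a raw price attribute to a float, or None if unusable."""
    if raw is None:
        return None
    try:
        value = float(str(raw).strip())
    except ValueError:
        return None
    # Reject NaN, infinities and negative placeholders
    if not math.isfinite(value) or value < 0:
        return None
    return value


print(parse_price('19.99'))
print(parse_price(''))
print(parse_price(None))
```

Returning `None` for unusable values lets SQLite store them as `NULL` rather than as misleading numbers.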

**4. Get the item category for each item**

In [2]:
def getCategory(itemUrl,browser):
    #Find the category
    try:
        browser.get(itemUrl)
        soup = BeautifulSoup(browser.page_source,'html.parser')
    except TimeoutException:
        return ''
    
    breadcrumb = soup.find('div',attrs={'id':'wayfinding-breadcrumbs_container'})
    
    if breadcrumb:
        #Process Strings
        category = breadcrumb.text
        category = category.split('\n')
        category = [c.strip() for c in category]
        category = ', '.join(category)
        return category
    
    else:
        return ''
    

### Execute Scraping

In [3]:
#Browser Object
browser = webdriver.Chrome()

#Urls
amazon = 'https://www.amazon.com'
wishlistSearch = amazon + '/gp/registry/search/ref=cm_wl_search__\
sortbar_page_2?ie=UTF8&field-firstname=&field-lastname=&'

We store the data in a SQLite database.

In [4]:
#Create a sql database to store data

db = sqlite3.connect("amazonData.SQL")
cur = db.cursor()

#Create tables to store data
#cur.execute("""DROP TABLE IF EXISTS people """)
#cur.execute("""DROP TABLE IF EXISTS wishlists """)
#cur.execute("""DROP TABLE IF EXISTS items """)

#cur.execute("""CREATE TABLE people (ID TEXT, Name TEXT,Place TEXT,userUrls TEXT)""")
#cur.execute("""CREATE TABLE wishlists (ID TEXT, wishlistUrl TEXT)""")
#cur.execute("""CREATE TABLE items (ID TEXT, Name TEXT, Price FLOAT, Url TEXT, Category TEXT)""")
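
The table definitions above are commented out so that re-running the cell does not wipe existing data. As a sketch of an alternative, `CREATE TABLE IF NOT EXISTS` can be re-run safely. The example uses an in-memory database and a made-up row, so it does not touch `amazonData.SQL`. The names `demo_db` and `demo_cur` are illustrative.

```python
import sqlite3

# In-memory database so the sketch leaves the real data file untouched
demo_db = sqlite3.connect(':memory:')
demo_cur = demo_db.cursor()

# Same columns as the commented-out statements above
demo_cur.execute("""CREATE TABLE IF NOT EXISTS people
                    (ID TEXT, Name TEXT, Place TEXT, userUrls TEXT)""")
demo_cur.execute("""CREATE TABLE IF NOT EXISTS wishlists
                    (ID TEXT, wishlistUrl TEXT)""")
demo_cur.execute("""CREATE TABLE IF NOT EXISTS items
                    (ID TEXT, Name TEXT, Price FLOAT, Url TEXT, Category TEXT)""")

# Naming the columns lets Category stay NULL until it is scraped later
rows = [('1', 'Example item', 9.99, 'https://example.com/item')]  # made-up row
demo_cur.executemany(
    "INSERT INTO items (ID, Name, Price, Url) VALUES (?,?,?,?)", rows)
demo_db.commit()

print(demo_cur.execute("SELECT * FROM items").fetchall())
```

Because `items` has five columns, an insert that supplies only four values has to list the column names explicitly.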


**Execute Function 1**

In [None]:
#Reuses browser, amazon and wishlistSearch from the setup cell above
#(creating a second webdriver here would open another Chrome window)

#Static Variable
nextID = 1

#Urls with variables to edit
changeVar = 'field-name={}&index=us-xml-wishlist&page={}&submit.search=1'

#Loop through five random pages of search results for each name
for commonName in topNames:
    #Draw page numbers from 1 to 40; randint(40) could also return page 0
    for pageNum in np.random.randint(1,41,size=5):
        searchPage = wishlistSearch+changeVar.format(commonName,pageNum)
        peopleInfo = parseSearchResults(searchPage,browser)
        
        #Create ID numbers
        numUsers = len(peopleInfo['Name'])
        ID = [str(id) for id in range(nextID,nextID+numUsers)]
        #Advance to the next unused ID without leaving a gap
        nextID += numUsers

        #Store User Data
        peopleInfo['userUrls'] = [amazon+url for url in peopleInfo['userUrls']]
        peopleTuple = zip(ID,peopleInfo['Name'],
                            peopleInfo['Place'],
                            peopleInfo['userUrls'])
        cur.executemany("INSERT INTO people VALUES(?,?,?,?)",peopleTuple)
        #Commit each batch so a crash does not lose earlier pages
        db.commit()
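
The loop above does two pieces of bookkeeping: it picks random result pages and it hands out ID numbers. Both can be sketched as small standalone helpers. `sample_pages` and `allocate_ids` are hypothetical names. The sketch assumes result pages are numbered from 1 and keeps the limit of 40 used above. Unlike `np.random.randint`, `random.sample` never returns the same page twice for one name.

```python
import random


def sample_pages(max_page=40, count=5, rng=random):
    """Pick `count` distinct page numbers between 1 and max_page inclusive."""
    return rng.sample(range(1, max_page + 1), count)


def allocate_ids(next_id, num_users):
    """Return string IDs for one batch and the next unused ID."""
    ids = [str(i) for i in range(next_id, next_id + num_users)]
    return ids, next_id + num_users


pages = sample_pages()
ids, next_id = allocate_ids(1, 3)
print(pages)
print(ids, next_id)
```

Passing a seeded `random.Random` instance as `rng` makes the page selection repeatable between runs.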
        


**Execute Function 2**

In [23]:
#Loop through each user and find all wishlists
cur.execute("SELECT ID, userUrls FROM people")

for  idUrl in cur.fetchall():
    #Scrape Data
    ID,userUrl = idUrl
    wishlists = findWishlistUrls(userUrl,browser)
    
    #Store Data
    wishlists = [amazon+wl for wl in wishlists]
    wlTuple = zip([ID]*len(wishlists),wishlists)
    cur.executemany("INSERT INTO wishlists VALUES(?,?)",wlTuple)
    #Commit each user's wishlists as we go
    db.commit()
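
Prefixing every link with `amazon` assumes each `href` is a site-relative path. If a link is already absolute, concatenation would produce a broken URL. A minimal sketch using the standard library's `urljoin` (Python 3) handles both cases. The `absolute_url` helper and the example paths are made up for illustration.

```python
from urllib.parse import urljoin

amazon = 'https://www.amazon.com'


def absolute_url(href):
    """Resolve a scraped href against the Amazon base URL."""
    return urljoin(amazon, href)


# Made-up paths: one relative, one already absolute
print(absolute_url('/example/list/123'))
print(absolute_url('https://www.amazon.com/example/list/123'))
```

Both calls print the same absolute URL, so the stored links stay consistent whichever form the page uses.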

**Execute Function 3**

In [48]:
#Loop through each wishlist to find all items
cur.execute("SELECT ID, wishlistUrl FROM wishlists")

for idUrl in cur.fetchall():
    #Scrape data
    ID,wlUrl = idUrl
    items = parseWishlist(wlUrl,browser)
    
    #Store Data
    if items:
        items['Url'] = [amazon + url for url in items['Url']]
        itemTuple = zip([ID]*len(items['Price']),
                       items['Name'],
                       items['Price'],
                       items['Url'])
        #Name the columns: items has a fifth column, Category, filled in later
        cur.executemany("INSERT INTO items (ID, Name, Price, Url) VALUES(?,?,?,?)",itemTuple)

db.commit()

**Execute Function 4**

In [None]:
cur.execute("SELECT Url, Name FROM items")
items = cur.fetchall()

for it in items:
    url,Name = it
    cat = getCategory(url,browser)

    #Items are matched by name, so identical titles share one category
    cur.execute('''
    UPDATE items
    SET Category=?
    WHERE Name = ?;
    ''',(cat,Name))
    
db.commit()

Amazon started blocking me with CAPTCHAs, but not before I got 14464 item categories.
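
Since a run can be cut short, it may help to make the category step resumable. The sketch below only visits items whose category is still missing and commits in batches. `fill_categories` is a hypothetical helper, and a small dictionary stands in for `getCategory` so the example runs without a browser. The demo table and its rows are made up.

```python
import sqlite3


def fill_categories(db, get_category, batch_size=50):
    """Fill in missing categories, committing every `batch_size` updates."""
    cur = db.cursor()
    cur.execute("SELECT DISTINCT Url FROM items WHERE Category IS NULL")
    urls = [row[0] for row in cur.fetchall()]
    done = 0
    for url in urls:
        category = get_category(url)
        if not category:
            continue  # leave NULL so a later run retries this item
        cur.execute("UPDATE items SET Category=? WHERE Url=?", (category, url))
        done += 1
        if done % batch_size == 0:
            db.commit()
    db.commit()
    return done


# Demo on an in-memory table with a stand-in for getCategory
demo_db = sqlite3.connect(':memory:')
demo_db.execute(
    "CREATE TABLE items (ID TEXT, Name TEXT, Price FLOAT, Url TEXT, Category TEXT)")
demo_db.executemany(
    "INSERT INTO items (ID, Name, Price, Url) VALUES (?,?,?,?)",
    [('1', 'A', 1.0, 'u1'), ('2', 'B', 2.0, 'u2')])
fake = {'u1': 'Books'}  # made-up lookup table
first = fill_categories(demo_db, lambda url: fake.get(url, ''))
print(first)
```

With the real scraper, the callback could be `lambda url: getCategory(url, browser)`. Because `getCategory` returns an empty string on failure, blocked items stay `NULL` and are picked up on the next run.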

In [8]:
cur.execute("Select Category from items")

<sqlite3.Cursor at 0x10650aa40>

In [9]:
cat = cur.fetchall()
cat[:5]

[(u', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , \u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , \u203a, , , , Knits & Tees, , , , , ',),
 (u', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , \u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , \u203a, , , , Blouses & Button-Down Shirts, , , , , ',),
 (u', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , \u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , \u203a, , , , Vests, , , , , ',),
 (u', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , \u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , \u203a, , , , Blouses & Button-Down Shirts, , , , , ',),
 (u', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , \u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , \u203a, , , , Blouses & Button-Down Shirts, , , , , ',)]

In [25]:
cat[14564]

(u', , , , , Beauty & Personal Care, , , , \u203a, , , , Hair Care, , , , \u203a, , , , Styling Tools & Appliances, , , , \u203a, , , , Hot-Air Brushes, , , , , ',)

In [26]:
len(cat)

227640
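
The category strings shown above are padded with empty fields, and the levels are separated by the `›` character (`\u203a`). As a minimal sketch, they can be turned into clean lists by splitting on that separator and stripping the padding. Splitting on commas would not work, because a level such as "Clothing, Shoes & Jewelry" contains a comma itself. The `clean_category` helper is a name introduced for illustration.

```python
def clean_category(raw):
    """Turn a scraped breadcrumb string into a list of category levels."""
    # Split on the breadcrumb separator, then trim the comma/space padding
    levels = [level.strip(', ') for level in raw.split('\u203a')]
    return [level for level in levels if level]


# First category string from the output above
raw = (', , , , , Clothing, Shoes & Jewelry, , , , \u203a, , , , Women, , , , '
       '\u203a, , , , Clothing, , , , \u203a, , , , Tops & Tees, , , , '
       '\u203a, , , , Knits & Tees, , , , , ')
print(clean_category(raw))
```

An empty string, which `getCategory` returns when no breadcrumb is found, becomes an empty list.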

The scraper was a success! The next step of this project will be finding a way to predict what gifts people want.

**References**
* [Why Certain Gifts Are Great to Give but Not to Get](http://journals.sagepub.com/doi/pdf/10.1177/0963721416656937)
* [Money can’t buy love](https://doi.org/10.1016/j.jesp.2008.11.003)
* [Give a piece of you](https://doi.org/10.1016/j.jesp.2015.04.006)
* [Gift giving behavior](https://www.ideals.illinois.edu/bitstream/handle/2142/27449/giftgivingbehavi449belk.pdf?sequence=1)
* [Sentimental value and gift giving](http://dx.doi.org/10.1016/j.jcps.2017.06.002)
* [How to choose a hobby](https://hobbylark.com/misc/How-to-Choose-a-Hobby)