##  <center>Assessment 3</center>
##  <center>WebCrawler and NLP System</center>

## Task1

## Overview:

<font size="3"> The majority of wine drinkers are not experts. Most buy wine by price, label type and reputation (aka marketing) [1]. To buy a wine without being disappointed, one would need a trial-and-error approach: recording metadata for each bottle, for example price, grape, blend, region and country, not forgetting the drinker's own descriptors such as 'a subtle hint of chocolate'. This would take considerable effort, expense and discipline. What if chance could be taken out of buying wine, so that the odds of not being disappointed are better than 51%?<br><br>This project compares wine reviews and sentiment from wine experts and from the average wine drinker, aka the wine plebs. The project's goal is to use NLP to produce a review that balances the expert and the average wine drinker. To achieve this goal, a WebCrawler takes a wine list from a popular Australian website, nicks.com.au, chosen for the large number of wines it lists. The wines chosen are red wines priced at \$15&ndash;\$40, mainly to reduce the size of the wine list; this price range is also typical for the non-expert wine drinker. Another WebCrawler is used to find wine reviews from experts and non-experts. Natural Language Processing (NLP) sentiment analysis is used to find key words within each review, and the sentiment of these words is then used to create a new review. This new review is a complementary decision aid for the average wine drinker that removes the impact of label type and reputation from the purchase.</font>
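As a rough illustration of the balancing idea, the sketch below scores each review against a tiny hand-made sentiment lexicon, averages the expert and non-expert scores separately, and then weights the two groups equally. The lexicon, the 50/50 weighting and all function names are hypothetical assumptions for illustration only; this is not the project's actual NLP method.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (an assumption, not project data): word -> sentiment in [-1, 1].
LEXICON = {
    "subtle": 0.5, "chocolate": 0.4, "smooth": 0.6, "balanced": 0.7,
    "bitter": -0.5, "thin": -0.6, "harsh": -0.7, "jammy": 0.2,
}


def review_keywords(text):
    """Return the lexicon words found in a review, in order of appearance."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w in LEXICON]


def review_score(text):
    """Mean lexicon score of the keywords in one review (0.0 if none are found)."""
    words = review_keywords(text)
    return sum(LEXICON[w] for w in words) / len(words) if words else 0.0


def group_score(reviews):
    """Mean review score for a group of reviews (0.0 for an empty group)."""
    return sum(review_score(r) for r in reviews) / len(reviews) if reviews else 0.0


def balanced_summary(expert_reviews, pleb_reviews, top_n=3):
    """Weight expert and non-expert sentiment equally (assumed 50/50) and return
    the balanced score plus the most common keywords across both groups."""
    score = 0.5 * group_score(expert_reviews) + 0.5 * group_score(pleb_reviews)
    counts = Counter(w for r in expert_reviews + pleb_reviews for w in review_keywords(r))
    return score, [w for w, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    experts = ["A subtle hint of chocolate, smooth and balanced."]
    plebs = ["Bit thin and harsh.", "Smooth, easy drinking."]
    print(balanced_summary(experts, plebs))
```

Averaging within each group first stops the larger group (usually the plebs) from drowning out the experts; the keyword list is what a generated review could be built from.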

    

## References
[1] Spence, C. (2020). Wine psychology: basic & applied. _Cogn. Research_, _5_, 22. https://doi.org/10.1186/s41235-020-00225-6

## Task2

<font size="3">This section scrapes 10,080 red wine products priced at \$15&ndash;\$40, collecting each wine's 'vintage', 'product', 'price' and 'rating'. The scraper then iterates through the wine list, enters each wine's vintage and name (product) into the search bar of nicks.com.au, and scrapes the wine description. The descriptions are used in the NLP analysis.</font>
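The listing step relies on each product's `info` block splitting into four lines (vintage, product, rating, price). The sketch below shows that parsing offline, without Selenium: it returns `None` for malformed blocks instead of letting the four columns drift out of step, converts the price string to a number, builds the `'<vintage> <product>'` search string used for the nicks.com.au search bar, and checks the page arithmetic behind the 10,080 figure. The four-line layout comes from the scraper code; the helper names are hypothetical, and the sample row is taken from the printed output in this notebook.

```python
import math


def parse_info_block(text):
    """Turn one listing 'info' block into a dict, or return None when the block
    is malformed (fewer than four non-empty lines or an unparseable price)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 4:
        return None
    vintage, product, rating, price = lines[:4]
    try:
        # "$1,234.50" -> 1234.5
        price_value = float(price.replace("$", "").replace(",", ""))
    except ValueError:
        return None
    return {"Vintage": vintage, "Wine": product, "Rating": rating, "Price": price_value}


def search_query(record):
    """Build the '<vintage> <product>' string typed into the site search bar."""
    return f"{record['Vintage']} {record['Wine']}"


def pages_needed(n_items, per_page=60):
    """Listing pages needed to show n_items at per_page items per page."""
    return math.ceil(n_items / per_page)


if __name__ == "__main__":
    record = parse_info_block("2018\nPowell & Son Barossa Valley Shiraz\n96\n$357.00")
    print(record)
    print(search_query(record))
    print(pages_needed(11571), pages_needed(10080))
```

With 60 wines per page, 11,571 wines span 193 pages (rounding up), and 10,080 wines correspond to exactly 168 full pages.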

In [None]:
import math
import time
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd

In [40]:
# Chrome driver setup with loading/scroll

# scroll to the bottom of the page and return the new page height
SCROLL_JS = ("window.scrollTo(0, document.body.scrollHeight);"
             "return document.body.scrollHeight;")


def selem_drive(driver):
    driver.set_window_size(1024, 1000)
    # 11,571 red wines between $15-$40
    url = 'https://www.nicks.com.au/red-wines?cat=9&dir=desc&limit=60&mode=list&order=score&price=3.00-.00000'
    driver.get(url)
    # scroll, wait 3 seconds, and repeat until the page height stops changing
    page_len = driver.execute_script(SCROLL_JS)
    while True:
        last_count = page_len
        time.sleep(3)
        page_len = driver.execute_script(SCROLL_JS)
        if last_count == page_len:
            break

    return driver


# get the wine list from the URL loaded in selem_drive() and save it to nicks_wine_data_list.csv
def get_elements_nicks(driver):
    vintage_list, product_list, rating_list, price_list = [], [], [], []
    xpath = '/html/body/div[1]/main/menu[2]/span'
    check_num_items = WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, xpath)))
    # slice the item count out of the result-count label (depends on the label's fixed layout)
    total_items = int(check_num_items.text[7:13])
    total_pages = math.ceil(total_items / 60)  # 60 wines per page (limit=60 in the URL)
    pages_to_scrape = min(total_pages, 168)  # 168 pages x 60 = 10,080 wines

    # start on page 1 (already loaded), then click through the numbered page links
    for page in range(1, pages_to_scrape + 1):
        if page > 1:
            old_item = driver.find_element(By.CLASS_NAME, 'info')
            driver.find_element(By.LINK_TEXT, str(page)).click()
            # wait until the previous page's items are replaced before reading the new ones
            WebDriverWait(driver, 20).until(EC.staleness_of(old_item))

        for res in driver.find_elements(By.CLASS_NAME, 'info'):
            lines = res.text.splitlines()
            # skip malformed items so the four lists stay aligned row by row
            if len(lines) < 4:
                print('Skipped item on page', page, lines)
                continue
            vintage_list.append(lines[0])
            product_list.append(lines[1])
            rating_list.append(lines[2])
            price_list.append(lines[3])

    wine_data = pd.DataFrame({'Vintage': vintage_list, 'Wine': product_list,
                              'Rating': rating_list, 'Price': price_list})
    wine_data.to_csv('nicks_wine_data_list.csv', index=False, header=True, encoding='utf-8')
    print(wine_data.head())
    return wine_data


# search each wine in nicks_wine_data_list.csv, scrape its description,
# and combine the wine list with the descriptions in nicks_wine_data.csv
def search_elements_nicks(driver):
    wine_data = pd.read_csv('nicks_wine_data_list.csv')
    queries = wine_data['Vintage'].astype(str) + ' ' + wine_data['Wine'].astype(str)
    hit_xpath = '//*[@id="hits"]/div/div/div/ul/li[2]/a'
    desc_xpath = '/html/body/div[1]/main/div[1]/div[2]/div[3]'
    desc_list = []

    for query in queries:
        description = None  # placeholder keeps one row per wine if the search fails
        try:
            search = driver.find_element(By.XPATH, '//*[@id="search"]')
            search.clear()
            search.send_keys(query)
            search.send_keys(Keys.RETURN)
            WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, hit_xpath))).click()
            description = WebDriverWait(driver, 20).until(
                EC.presence_of_element_located((By.XPATH, desc_xpath))).text
        except WebDriverException as e:
            print('No description for', query, '-', type(e).__name__)
        desc_list.append(description)

    nicks_wine_data_desc = pd.DataFrame({'Description': desc_list})
    nicks_wine_data_desc.to_csv('nicks_wine_data_desc.csv', index=False, header=True, encoding='utf-8')

    # both tables have one row per wine, so they line up by position
    nicks_wine_data = pd.concat([wine_data, nicks_wine_data_desc], axis=1)
    # wine dataset to use
    nicks_wine_data.to_csv('nicks_wine_data.csv', index=False, header=True, encoding='utf-8')
    print(nicks_wine_data.head())
    return nicks_wine_data


if __name__ == '__main__':
    # Selenium 4 style: the chromedriver path is passed through a Service object
    service = Service(r'C:/seleniumChromeDriver/chromedriver_win32/chromedriver.exe')
    driver = webdriver.Chrome(service=service)
    try:
        selem_drive(driver)
        get_elements_nicks(driver)
        search_elements_nicks(driver)
    finally:
        driver.quit()

  Vintage                                     Wine Rating    Price
0    2018       Powell & Son Barossa Valley Shiraz     96  $357.00
1    2018            Brave Souls The Whaler Shiraz     96  $258.00
2    2016         Incygnes Green’s Vineyard Shiraz     96  $354.00
3    2018  Heathcote Estate Single Vineyard Shiraz     96  $479.88
4    2018     Mr Riggs Outpost Coonawarra Cabernet     96  $239.88
   Vintage                                     Wine  Rating    Price  \
0     2018       Powell & Son Barossa Valley Shiraz      96  $357.00   
1     2018            Brave Souls The Whaler Shiraz      96  $258.00   
2     2016         Incygnes Green’s Vineyard Shiraz      96  $354.00   
3     2018  Heathcote Estate Single Vineyard Shiraz      96  $479.88   
4     2018     Mr Riggs Outpost Coonawarra Cabernet      96  $239.88   

                                         Description  
0  Fruit for this wine is sourced from mature vin...  
1  New comer Julia Weirich in collaboration with ...  
