# Loan Default Prediction - Classification

## Part 0a. Web Scraping + Data Collection

**This notebook contains code to scrape Lendingclub.com for raw loan data.**

---

1. 19 data files are available from Lendingclub.com.
2. Each file can be retrieved manually by selecting it from the dropdown menu, agreeing to the terms, and clicking download. However, automating those steps is a good exercise in learning and using web scraping.

---
<a id = 'toc'></a>
**Table of contents**
1. [Download files](#download)
2. [Notes](#notes)

In [1]:
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

In [2]:
# choose download folder
# reference: https://stackoverflow.com/questions/35331854/downloading-a-file-at-a-specified-location-through-python-and-selenium-using-chr
options = webdriver.ChromeOptions() 
options.add_experimental_option("prefs", {
  "download.default_directory": '/proj-classification-loanDefault-webScrape-realData-AWS/data/raw/',
  "download.prompt_for_download": False,
  "download.directory_upgrade": True,
  "safebrowsing.enabled": True
})
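The download folder above is hard-coded. As far as I know, Chrome expects `download.default_directory` to be an absolute path, so a relative path may be silently ignored. Below is a minimal sketch of a helper that resolves the folder and builds the same `prefs` dictionary. The name `build_download_prefs` and the example folder `data/raw` are hypothetical and not part of the original notebook.

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Return a Chrome "prefs" dict that points downloads at download_dir."""
    # Resolve to an absolute path and create the folder if it is missing.
    target = Path(download_dir).expanduser().resolve()
    target.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(target),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "safebrowsing.enabled": True,
    }


# Hypothetical usage with the options object from the cell above:
# options.add_experimental_option("prefs", build_download_prefs("data/raw"))
```

Creating the folder up front also avoids depending on how the browser behaves when the target directory does not exist yet.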

In [55]:
# access download page
# reference: http://jonathansoma.com/lede/foundations-2017/classes/more-scraping/selenium/
driver = webdriver.Chrome(options = options)
driver.get("https://www.lendingclub.com/info/download-data.action")

In [56]:
# access file selector menu
# reference: https://sqa.stackexchange.com/questions/1355/what-is-the-correct-way-to-select-an-option-using-seleniums-python-webdriver
select = Select(driver.find_element(By.ID, 'loanStatsDropdown'))

[back to top](#toc)

<a id = 'download'></a>
### 1. Download files

In [78]:
# download file
# reference: https://irwinkwan.com/2013/04/05/automating-the-web-with-selenium-complete-tasks-automatically-and-write-test-cases/
for i in range(len(select.options)):
    # step 1 - select file
    print('Selecting data file {}: {}...'.format(i, select.options[i].text))
    select.select_by_value(str(i))

    # step 2 - click download, which opens the terms window
    driver.find_element(By.ID, 'currentLoanStatsFileNameHandler').click()

    # step 3 - agree to terms, download should start at this point
    # wait up to 30 seconds for the button to become clickable, then click it
    WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="currentLoanStatsFileName"]'))).click()
    # second click kept as a backup; it can trigger a duplicate download (see Notes)
    driver.find_element(By.ID, 'currentLoanStatsFileName').click()
    print('Downloading data file {}...'.format(i))

    # step 4 - close the download window
    driver.find_element(By.XPATH, '//*[@id="myModal"]/div[2]/div/div[1]/button').click()
    print('Onto the next file\n')

print('All data files available have been extracted, please check browser window for completion.')

Selecting data file 0: 2007 - 2011...
Downloading data file 0...
Onto the next file

Selecting data file 1: 2012 - 2013...
Downloading data file 1...
Onto the next file

Selecting data file 2: 2014...
Downloading data file 2...
Onto the next file

Selecting data file 3: 2015...
Downloading data file 3...
Onto the next file

Selecting data file 4: 2016 Q1...
Downloading data file 4...
Onto the next file

Selecting data file 5: 2016 Q2...
Downloading data file 5...
Onto the next file

Selecting data file 6: 2016 Q3...
Downloading data file 6...
Onto the next file

Selecting data file 7: 2016 Q4...
Downloading data file 7...
Onto the next file

Selecting data file 8: 2017 Q1...
Downloading data file 8...
Onto the next file

Selecting data file 9: 2017 Q2...
Downloading data file 9...
Onto the next file

Selecting data file 10: 2017 Q3...
Downloading data file 10...
Onto the next file

Selecting data file 11: 2017 Q4...
Downloading data file 11...
Onto the next file

Selecting data file 12

In [79]:
# stop scraper
driver.quit()
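The final message of the download loop asks the reader to check the browser window for completion. That check could be automated before calling `driver.quit()`. Below is a minimal sketch, assuming Chrome marks unfinished downloads with a `.crdownload` extension in the download folder. The helper name `wait_for_downloads` and its parameters are hypothetical.

```python
import time
from pathlib import Path


def wait_for_downloads(download_dir, timeout=300, poll_interval=1.0):
    """Return True once no '.crdownload' files remain, False on timeout."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Chrome keeps partial downloads as '<name>.crdownload' until done.
        pending = list(folder.glob("*.crdownload"))
        if not pending:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


# Hypothetical usage before stopping the scraper:
# if wait_for_downloads("data/raw"):
#     driver.quit()
```

Note that this returns `True` immediately if a download has not started yet, so it is best called only after the last click has had a moment to take effect.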

<a id = 'notes'></a>

### 2. Notes:
The code above contains a bug caused by the two click() calls in step 3. I expected the first click alone to be sufficient, but testing showed that some files were not downloaded, so a second click() was added as a backup. With the current setup, all files were downloaded, but some were downloaded twice.
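Since some files end up downloaded twice, it can help to list the extra copies after the run. Below is a minimal sketch, assuming Chrome names a repeated download by appending ` (1)`, ` (2)`, and so on before the final extension. The helper name `find_duplicate_downloads` is hypothetical, and it only reports files; it does not delete anything.

```python
import re
from pathlib import Path

# Assumed naming pattern for repeated downloads: "name (1).zip", "name (2).zip", ...
DUPLICATE_SUFFIX = re.compile(r"^(?P<stem>.+) \(\d+\)$")


def find_duplicate_downloads(download_dir):
    """Return paths that look like re-downloads of a file that also exists."""
    folder = Path(download_dir)
    duplicates = []
    for path in sorted(folder.iterdir()):
        match = DUPLICATE_SUFFIX.match(path.stem)
        if match is None:
            continue
        # Only flag the copy when the un-numbered original is present too.
        original = path.with_name(match.group("stem") + path.suffix)
        if original.exists():
            duplicates.append(path)
    return duplicates
```

Requiring the un-numbered original to exist avoids flagging a file whose real name happens to end in a number in parentheses.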

The question is posted on Stack Overflow:
https://stackoverflow.com/questions/59168568/selenium-python-select-from-dropdown-click-button-modal-window-bug

No answer as of 2020/03/13
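One possible workaround, which is not a confirmed answer to the question above, is to click once and click again only if nothing new appears in the download folder. Below is a minimal sketch under that assumption. The helper name `click_until_new_file` and its parameters are hypothetical, and any new entry in the folder (including a partial `.crdownload` file) is treated as evidence that the download started.

```python
import time
from pathlib import Path


def click_until_new_file(click, download_dir, attempts=2, wait=5.0, poll_interval=0.25):
    """Call click() and retry only if no new file shows up in download_dir.

    Returns the number of clicks made; raises RuntimeError if nothing appeared.
    """
    folder = Path(download_dir)
    before = {p.name for p in folder.iterdir()}
    for attempt in range(1, attempts + 1):
        click()
        deadline = time.monotonic() + wait
        while True:
            # Any name not seen before the first click counts as a started download.
            if {p.name for p in folder.iterdir()} - before:
                return attempt
            if time.monotonic() >= deadline:
                break
            time.sleep(poll_interval)
    raise RuntimeError("no new file appeared after {} click(s)".format(attempts))


# Hypothetical usage in step 3, replacing the two unconditional clicks:
# button = WebDriverWait(driver, 30).until(
#     EC.element_to_be_clickable((By.ID, 'currentLoanStatsFileName')))
# click_until_new_file(button.click, "data/raw")
```

Passing the click as a callable keeps the helper independent of Selenium, so the retry logic can be checked against a temporary folder without opening a browser.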

---

**End of current notebook**

**[back to top](#toc)**

---

**Next notebook: [part 0b - dataAggregation](proj-classification-loanDefault-p0b-dataAggregation-max-v2019Dec.ipynb)**