# Initial Thoughts

My thoughts as I read over the sections:  

**Background**  
- Context is clear, not much further to note

**The Goal**
- CVS "or" Walgreens - Probably best to use both and combine
- CVS has an option for ZIP Code and distance ceiling. Lists generic locations that are not CVS stores.
- Walgreens just has ZIP Code, apparently setting distance ceiling to 50 miles automatically. Only lists Walgreens stores.
- It \*appears\* that the CVS page reloads in full when parameters are changed, while the Walgreens site updates its content in place.
    - Noticed that CVS includes search arguments in URL, Walgreens doesn't.
    - Walgreens is a .jsp page, so definitely dynamic. 

**Write A Script...**
- CVS listings don't always have hours, and also do have names (unlike Walgreens results which are all brick & mortar stores).
- Manhattan is pretty small (22.82 square miles according to Google).
    - Means that scraping the results off of a single ZIP code with a distance of even 10 miles might get all results.
        - Note: we don't want results *only* on the island - for some people in Manhattan, the nearest off-island location might be closer than the nearest on-island location; I think the above point still holds, however.
    - I'll test using both a single ZIP code and all Manhattan ZIP codes to see how this works out.
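
As a rough sanity check on the distance reasoning above, here is a minimal sketch that estimates the end-to-end length of the island with the haversine formula. The two coordinate pairs are approximate values assumed for illustration only; they are not taken from either store locator.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    earth_radius_miles = 3958.8  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_miles * math.asin(math.sqrt(a))

# Approximate coordinates, assumed for illustration only.
battery = (40.703, -74.017)   # near the southern tip of Manhattan
inwood = (40.872, -73.925)    # near the northern tip of Manhattan

island_length = haversine_miles(*battery, *inwood)
print(round(island_length, 1))
```

If the tip-to-tip distance comes out at a bit more than 10 miles, a 10-mile radius around a central ZIP code should reach both ends of the island, though whether the locator measures distance this way is an assumption.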


At this point I noticed this file has had 4 revisions pushed to GitHub. They're interesting for getting ideas about what might be expected, and the dates of revisions are interesting also, but nothing too important in there. Only once I went through these did I notice that there was the typo 'dynamtic', though.

# The Code

## Imports

In [2]:
import requests
import bs4
import lxml
from bs4 import BeautifulSoup
import fnmatch
import timeit
import time
import pandas as pd
import numpy as np
import selenium
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

## Get Manhattan ZIP codes

In [3]:
# zcp == ZIP code page
zcp = requests.get('http://www.citidex.com/map/zipco.html')
print(zcp.status_code, zcp.headers)

soup = BeautifulSoup(zcp.content, 'lxml');
table = soup.find_all("table", {"bgcolor":"#ffffff"} ) # There are 3 tables of ZIP codes, with this defining characteristic
zc = [] # The list of ZIP codes to be filled

200 {'Date': 'Mon, 14 May 2018 15:41:48 GMT', 'Server': 'Apache', 'Content-Encoding': 'gzip', 'Vary': 'Accept-Encoding', 'X-UA-Compatible': 'IE=EmulateIE7', 'X-Powered-By': 'PleskLin', 'Content-Length': '5052', 'Keep-Alive': 'timeout=5, max=100', 'Connection': 'Keep-Alive', 'Content-Type': 'text/html'}


I found two ways of grabbing the ZIP codes off of the page:

In [4]:
# Method 1: filter based on td attributes
def findEntries(tag):
    return tag.has_attr('align') and tag.has_attr('class') and not tag.has_attr('width')

t1 = timeit.default_timer()
for t in table:
    rows = t.find_all('tr')
    for r in rows:
        entry = r.find_all(findEntries)
        if(len(entry) in {2,3}):
            #print(entry[-1].contents)
            zc.append(entry[-1].contents[0])
t2 = timeit.default_timer()
        
# Method 2: filter based on finding a ZIP code-like number
zc2 = []
def findZips(tag):
    try:
        return float(tag.contents[0]) # truthy only if the first child is a non-empty numeric string
    except (IndexError, TypeError, ValueError):
        return False

t3 = timeit.default_timer()
for t in table:
    rows = t.find_all('tr')
    for r in rows:
        entries = r.find_all(findZips)
        for e in entries:
            #print(e)
            zc2.append(e.contents[0])
t4 = timeit.default_timer()

# Over a few runs the second method was usually slightly quicker, but the gap is within noise
# (in the run shown here it is about 6% slower)
print(zc == zc2, t2 - t1, t4 - t3, (t4-t3)/(t2-t1)) 

True 0.0036321576816717993 0.003858353350794985 1.0622758395827885
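
Single-run timings like the one above are noisy, so a comparison of this kind is usually steadier when each candidate is repeated and the best time is kept. The sketch below shows that pattern with `timeit.repeat`; the two functions and the sample list are made-up stand-ins, not the scraping code from this notebook.

```python
import timeit

def method_a(values):
    """Keep entries made only of digits."""
    return [v for v in values if v.isdigit()]

def method_b(values):
    """Keep entries that convert to float, mirroring the try/except style."""
    kept = []
    for v in values:
        try:
            float(v)
        except ValueError:
            continue
        kept.append(v)
    return kept

sample = ["10001", "n/a", "10019", "", "10128"] * 200  # made-up input

def best_time(func, repeats=5, number=50):
    """Best of `repeats` measurements, each timing `number` calls of func(sample)."""
    return min(timeit.repeat(lambda: func(sample), repeat=repeats, number=number))

time_a = best_time(method_a)
time_b = best_time(method_b)
print(method_a(sample) == method_b(sample), time_a, time_b, time_b / time_a)
```

Taking the minimum rather than the mean is a common choice here because slower runs mostly reflect interference from other processes.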


There are some duplicate ZIP codes; let's remove them.

In [5]:
print(len(zc), len(zc) - len(set(zc))) # There are some duplicates, let's remove them
zc = list(set(zc))
print(len(zc))

108 13
95
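
One caveat with `list(set(zc))`: a set does not keep insertion order, so which ZIP code ends up first in `zc` is arbitrary. If order matters (for instance when slicing the first few ZIP codes), an order-preserving alternative is `dict.fromkeys`. A minimal sketch with a made-up sample list:

```python
zip_codes = ["10019", "10001", "10019", "10128", "10001"]  # made-up sample

# dict keys are unique and keep first-seen order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(zip_codes))
print(unique_in_order)  # ['10019', '10001', '10128']
```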


## Set Up DataFrame Logistics

I was at first going to append to the DataFrame row-by-row, but then remembered this is terribly inefficient. 

So, there's not a ton to do here, but we can at least set up our columns.

In [6]:
columns=['Name', 'Address', 'City', 'State', 'ZIP', 'Hours']

## CVS

### Toy example using my ZIP code to figure things out

In [7]:
urlString = 'http://www.cvssavingscentral.com/storelocator/SaferCommunities.aspx?zipcode=10567&distance=40'
zcp = requests.get(urlString)#; print(zcp.status_code)

soup = BeautifulSoup(zcp.content, 'lxml');
table = soup.find("table", {'class':'address_table'})
rows = table.find_all('tr')
rows = rows[1:]           # toss header row
data = []
for r in rows:
    # contents[1:6] are the Name, Address, City, State, and ZIP cells; contents[6] is Hours
    outputRow = [x for cell in r.contents[1:6] for x in cell.contents]
    last = r.contents[6].contents[0].strip()   # the Hours cell may be blank
    outputRow.append(last if last else 'N/A')
    data.append(outputRow)

In [None]:
pd.DataFrame(data, columns=columns).sort_values('Address') # Since some places are duplicated, it will be easier to see which if indexed by address
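
Since the locator takes its search parameters in the query string, the URL can also be built from a dictionary instead of by string concatenation. This is a minimal sketch using the standard library's `urlencode`; the helper name `locator_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://www.cvssavingscentral.com/storelocator/SaferCommunities.aspx"

def locator_url(zipcode, distance):
    """Build the locator URL for a ZIP code and a distance ceiling in miles."""
    return BASE_URL + "?" + urlencode({"zipcode": zipcode, "distance": distance})

print(locator_url("10567", 40))
```

This keeps the parameters in one place and takes care of escaping if a value ever contains characters that are not URL-safe.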

### Scraping with Every Manhattan ZIP Code

In [8]:
data = []
for z in log_progress(zc):
    urlString = 'http://www.cvssavingscentral.com/storelocator/SaferCommunities.aspx?zipcode='+z+'&distance=10'
    zcp = requests.get(urlString);
    soup = BeautifulSoup(zcp.content, 'lxml');
    table = soup.find("table", {'class':'address_table'})
    rows = table.find_all('tr')
    rows = rows[1:]           # toss header row
    for r in rows:
        # contents[1:6] are the Name, Address, City, State, and ZIP cells; contents[6] is Hours
        outputRow = [x for cell in r.contents[1:6] for x in cell.contents]
        last = r.contents[6].contents[0].strip()
        outputRow.append(last if last else 'N/A')
        data.append(outputRow)


In [10]:
CVS = ((pd.DataFrame(data, columns=columns)).drop_duplicates()).sort_values("Address"); CVS
#print(len(CVS))

|     | Name | Address | City | State | ZIP | Hours |
| --- | --- | --- | --- | --- | --- | --- |
| 457 | Paramus Police Department | 1 Carlough Drive | Paramus | NJ | 7652 | |
| 11 | Lodi Police Department | 1 Memorial Drive | Lodi | NJ | 7644 | |
| 122 | Elizabeth Police Department | 1 Police Plaza | Elizabeth | NJ | 7201 | |
| 8 | Leonia Police Department | 1 Wood Park | Leonia | NJ | 7605 | |
| 915 | Nassau County Sixth Precinct | 100 Community Dr | Manhasset | NY | 11003 | |
| 23 | Tenafly Police Department | 100 Riveredge Road | Tenafly | NJ | 7670 | |
| 25 | Maywood Police Dept | 15 Park Avenue | Maywood | NJ | 7607 | 24 hours a day, 7 days a week |
| 12 | Belleville Police Department | 152 Washington Avenue | Belleville | NJ | 7109 | |
| 27 | Mount Vernon Police Department | 2 Roosevelt Square No. | Mt Vernon | NY | 10550 | |
| 7 | Little Ferry Police Department | 215 Liberty Street #217 | Little Ferry | NJ | 7643 | |


### Scraping with a Single Manhattan ZIP Code

In [11]:
dataSingle = []
for z in log_progress(zc[0:1]):
    print(z)
    urlString = 'http://www.cvssavingscentral.com/storelocator/SaferCommunities.aspx?zipcode='+z+'&distance=20'
    zcp = requests.get(urlString);
    soup = BeautifulSoup(zcp.content, 'lxml');
    table = soup.find("table", {'class':'address_table'})
    rows = table.find_all('tr')
    rows = rows[1:]           # toss header row
    for r in rows:
        # contents[1:6] are the Name, Address, City, State, and ZIP cells; contents[6] is Hours
        outputRow = [x for cell in r.contents[1:6] for x in cell.contents]
        last = r.contents[6].contents[0].strip()
        outputRow.append(last if last else 'N/A')
        dataSingle.append(outputRow)


10019


Using only the first ZIP code (or even the first 5, 10, or 20) at a 10-mile distance doesn't quite hit all the places covered when using the full list.

This is even true when trying a centrally located ZIP code (10019) to ensure we're making full use of our distance range.

However, if we increase the distance covered to 20 miles, we get all of the many-ZIP-code-method locations and more.

In [12]:
CVSSingle = ((pd.DataFrame(dataSingle, columns=columns)).drop_duplicates()).sort_values("Address"); CVSSingle
#print(len(CVSSingle))

|     | Name | Address | City | State | ZIP | Hours |
| --- | --- | --- | --- | --- | --- | --- |
| 22 | Paramus Police Department | 1 Carlough Drive | Paramus | NJ | 7652 | |
| 46 | Rahway Police Department | 1 City Hall Plaza | Rahway | NJ | 7065 | |
| 11 | Lodi Police Department | 1 Memorial Drive | Lodi | NJ | 7644 | |
| 17 | Elizabeth Police Department | 1 Police Plaza | Elizabeth | NJ | 7201 | |
| 33 | Caldwell Police Department | 1 Provost Square Avenue | Caldwell | NJ | 7006 | |
| 8 | Leonia Police Department | 1 Wood Park | Leonia | NJ | 7605 | |
| 29 | Nassau County Sixth Precinct | 100 Community Dr | Manhasset | NY | 11003 | |
| 40 | Springfield Police Department | 100 Mountain Avenue | Springfield Township | NJ | 7081 | |
| 16 | Tenafly Police Department | 100 Riveredge Road | Tenafly | NJ | 7670 | |
| 34 | Westwood Police Department | 101 Washington Avenue | Westwood | NJ | 7675 | |


In [13]:
a = CVS['Name'].values
b = CVSSingle['Name'].values
print(len([x for x in a if x not in b]), len([x for x in b if x not in a]))

0 26


There are 0 names in CVS that aren't in CVSSingle, and 26 names in CVSSingle that aren't in CVS.
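
The same comparison can be written with set differences, which is shorter and avoids scanning the whole array for every element. A minimal sketch on small made-up lists (the names are reused from the tables above purely as sample values):

```python
names_all = ["Lodi Police Department", "Leonia Police Department"]
names_single = [
    "Lodi Police Department",
    "Leonia Police Department",
    "Rahway Police Department",
]

only_in_all = set(names_all) - set(names_single)      # in the first list only
only_in_single = set(names_single) - set(names_all)   # in the second list only
print(len(only_in_all), len(only_in_single))  # 0 1
```

Note that sets count each distinct name once, whereas the list comprehension counts every occurrence, so the two approaches can differ if a name appears on more than one row.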

## Walgreens

Because this site is dynamically generated, the simple approach above will not work correctly, so this will definitely be more difficult.

**Plan**
- I'm guessing we will need selenium here, in order to deal with dynamic elements
- Use that to get to the right results page, searching using ZIP code 10019, and 20 miles.
- Parse the results into a DataFrame, concat with CVS DataFrame


### Getting Page Source with Selenium

From tidbits I read on Stack Overflow here and there, using XPath isn't exactly the most stable approach. However, it's sometimes the only way I managed to make things work. 'ID' is probably more stable, and so is used wherever it works.

In [30]:
driver = webdriver.Chrome(executable_path=r'C:\Users\Avik\chromedriver_win32\chromedriver.exe')
driver.get('https://www.walgreens.com/storelocator/find.jsp')

textbox = driver.find_element_by_id('resultsPageTextField')
textbox.send_keys(Keys.CONTROL + 'a')
textbox.send_keys(Keys.BACKSPACE)
#textbox.send_keys('10019')  Some testing shows that just using the word 'Manhattan' works better
textbox.send_keys('Manhattan')
time.sleep(1)
textbox.send_keys(Keys.ENTER)

time.sleep(1)

driver.find_element_by_id('moreFilterToggle').click()
driver.find_element_by_xpath('//*[@id="accordion4"]/section[3]').click()
time.sleep(1) # Without this, selects the Optical Services option. Loading issues?
driver.find_element_by_xpath('//*[@id="healthservicescontents"]/section/aside[4]/section').click()
driver.find_element_by_id('store-refine-btn1').click()

time.sleep(4)

result = driver.page_source
driver.close()
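
The fixed `time.sleep` calls above either wait too long or, on a slow load, not long enough. The usual alternative is to poll for a condition until it holds or a timeout expires; Selenium provides its own helper for this (`WebDriverWait`), which this sketch does not use. Instead, it shows the underlying idea with a generic, hypothetical `wait_until` function and a fake condition standing in for "is the element on the page yet?".

```python
import time

def wait_until(condition, timeout=10.0, poll=0.25):
    """Call condition() repeatedly until it returns something truthy.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)

# Stand-in for a page check: becomes truthy on the third call.
attempts = {"count": 0}

def fake_element_ready():
    attempts["count"] += 1
    return "element" if attempts["count"] >= 3 else None

found = wait_until(fake_element_ready, timeout=2.0, poll=0.01)
print(found, attempts["count"])  # element 3
```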

In [31]:
soup = BeautifulSoup(result, 'lxml');

### Parsing Addresses

Based on the way the soup came out, it looks like we will need to grab all addresses and then split them up.

We should also try to differentiate Duane Reade stores from regular Walgreens.  
    \- In each address tag, the Duane Reade stores have 5 'p' tags; the others have only 4.

In [20]:
addr = soup.find_all("address", {"class": "mb0 p0"})

In [21]:
print(len(addr)) # Turns out there are 90, because each of the 45 results also has a mobile copy
addr = addr[:45]
print(len(addr))

90
45


In [None]:
a = addr[0]
(a.find_all("p"))

In [22]:
street = [] # Street name and number
cityState = [] # City, State, ZIP
isDuane = []
for i, a in enumerate(addr):
    elmnt = a.find_all("p")
    if len(elmnt) == 5:        # Duane Reade entries have one extra leading 'p' tag
        isDuane.append(i)
        street.append((elmnt[1].contents[0]).title())
        cityState.append(elmnt[2].contents[0]) # .title() is applied later, to the city only (not the state)
    else:
        street.append((elmnt[0].contents[0]).title())
        cityState.append(elmnt[1].contents[0])

In [23]:
cityState = list(map(lambda x: x.split(", "), cityState));

In [24]:
city = [x[0].title() for x in cityState]
state = [x[1].split()[0] for x in cityState]
zips = [x[1].split()[1] for x in cityState]
print(len(city), len(state), len(zips))

45 45 45
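
The split-based parsing above assumes every entry looks like `City, ST ZIP`. A regular expression makes that assumption explicit and lets malformed entries be detected instead of raising an `IndexError`. This is a minimal sketch; the pattern and the helper name `split_city_state_zip` are illustrative, and the sample strings are made up.

```python
import re

# City, two-letter state, five-digit ZIP with an optional ZIP+4 suffix
CITY_STATE_ZIP = re.compile(
    r"^\s*(?P<city>.+?),\s*(?P<state>[A-Za-z]{2})\s+(?P<zip>\d{5})(?:-\d{4})?\s*$"
)

def split_city_state_zip(text):
    """Return (city, state, zip) for 'City, ST ZIP' text, or None if it does not match."""
    match = CITY_STATE_ZIP.match(text)
    if match is None:
        return None
    return match.group("city").title(), match.group("state").upper(), match.group("zip")

print(split_city_state_zip("NEW YORK, NY 10025"))  # ('New York', 'NY', '10025')
print(split_city_state_zip("no address here"))     # None
```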


### Parsing Hours

Ok so the addresses are basically good to go. We can set up a list of lists and turn that into a dataframe easily. First, however, we need to get the hours of operation for each location.

This is weird, since only the hours of operation for *today* are listed within the search result page source. We can get to hours for the upcoming days by going to each search result's 'details' page.

Trying to get the hours through BeautifulSoup alone, I am unable to access the 'p' tags holding the days and respective hours. The tags simply do not exist in the requests.get contents (but they are visible using Chrome dev tools). Thus, it seems I have to use Selenium. For the sake of time, I will do the operation for one location as a proof-of-concept, but for the main challenge data-table I'll include the hours for 'today' returned on the search page.

I will do this proof-of-concept with the pharmacy hours for the store at (https://www.walgreens.com/locator/duane+reade-1498+york+ave-new+york-ny-10075/id=14232) as this is the first store I came across whose store or pharmacy hours differ during the week. It's the pharmacy hours in this case - I imagine the store hours are what's really important here, but this is good enough for our example.

In [25]:
hours = soup.find_all("a", {"class": "wag-store-color wag-o-trk-index"}); 
# length 180 - 45 per desktop, mobile; x2 for 'details' and 'directions'

urlList = []
for h in hours:
    if h['href']:
        urlList.append("https://www.walgreens.com" + h['href'])
urlList = urlList[:45]

In [26]:
# This being a proof-of-concept I will hard-code the url here; I was planning on drawing it from
# the urlList above when I thought BeautifulSoup might have been enough.

driver = webdriver.Chrome(executable_path=r'C:\Users\Avik\chromedriver_win32\chromedriver.exe')
driver.get("https://www.walgreens.com/locator/duane+reade-1498+york+ave-new+york-ny-10075/id=14232")
time.sleep(4)
driver.find_element_by_xpath("//*[@id=\"pharmacyid\"]/a").click()
time.sleep(2)
det = driver.page_source
driver.close()

In [27]:
soup = BeautifulSoup(det, 'lxml');

In [28]:
# Today's hours are kept in a different 'strong' tag. This is a bit of hard-coding, but since
# today is Monday, we can just find this 'strong'-tagged time and label it Monday, then find
# the rest of the week more idiomatically
days = ["Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
dailyHours = []

# I don't think we need to worry about the ng-if attribute throwing things off - I believe this affects the text to 
# the right noting whether the pharmacy is open, or, whether to replace the usual time text format with "Open 24 Hours"
monday = soup.find("p", {"ng-if" : "pharmacy.open != pharmacy.close"}).contents[1].find_all("strong")
dailyHours.append("Monday " + monday[0].contents[0] + ' ' + monday[1].contents[0])

In [29]:
for i, d in enumerate(soup.find_all("p", {"ng-if" : "pharmacy.open != pharmacy.close"})[1:]):   # [1:] to skip Monday
    s = d.find("span", {"class" : "ng-binding"}).contents[0]
    c = d.find("span", {"ng-if" : "pharmacy.close"}).contents[0]
    dailyHours.append(days[i] + ' ' + s + ' ' + c)
dailyHours

['Monday 8AM - 9PM',
 'Tuesday 8AM - 9PM',
 'Wednesday 8AM - 9PM',
 'Thursday 8AM - 9PM',
 'Friday 8AM - 9PM',
 'Saturday 9AM - 6PM',
 'Sunday 10AM - 5PM']

So with Selenium, this can technically be done for each of the 45 search results. For now, though, I'm going to move forward using today's hours as given on the search results page.
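
The proof-of-concept above labels the first entry "Monday" by hand because it was run on a Monday. One way to drop that assumption is to derive the day order from a date. A minimal sketch, assuming the details page always lists today first and then the following six days in order; the helper name is hypothetical.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def days_starting_from(day):
    """Return the seven weekday names, beginning with the weekday of `day`."""
    start = day.weekday()  # Monday is 0, Sunday is 6
    return DAY_NAMES[start:] + DAY_NAMES[:start]

print(days_starting_from(date(2018, 5, 14)))  # starts with 'Monday'
print(days_starting_from(date(2018, 5, 16)))  # starts with 'Wednesday'
```

In live use, `date.today()` would replace the fixed dates.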

In [32]:
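def findHours(tag):
    # Match tags whose first two classes are 'hidden-xs' followed by 'ng-binding' or 'ng-scope'
    classes = tag.get('class') or []
    return len(classes) >= 2 and classes[0] == 'hidden-xs' and classes[1] in ('ng-binding', 'ng-scope')

# `soup` was reused for the details page above, so re-parse the search results page first
soup = BeautifulSoup(result, 'lxml')
hours = soup.find_all(findHours)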

In [33]:
# There is a single search result which has a "Virtual Care" timeslot, throwing off my indexing. I will hard code it out
# for now, and see if I can do it idiomatically later. Noticed this checking len(hours), which was 91 - not the usual 2x45

hours = (hours[0:24] + hours[25:])    # Did not really find a way to do this idiomatically, after review
len(hours)

90
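
Dropping the extra entry by position works for this one search but breaks as soon as the result order changes. A possibly more robust option is to filter on the text of each entry. The sketch below assumes the stray entry can be recognized by the words "Virtual Care" in its text, which was not verified against the page; the sample strings are made up, and with BeautifulSoup tags the text would first be pulled out with `tag.get_text()`.

```python
# Made-up stand-ins for the text of each matched tag
raw_hours = ["24 hours", "8AM - 9PM", "Virtual Care 9AM - 5PM", "9AM - 9PM", "9AM - 7PM"]

def is_regular_slot(text, unwanted=("Virtual Care",)):
    """True unless the text mentions one of the unwanted labels (case-insensitive)."""
    lowered = text.lower()
    return not any(label.lower() in lowered for label in unwanted)

filtered = [text for text in raw_hours if is_regular_slot(text)]
print(len(raw_hours), len(filtered))  # 5 4
```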

In [34]:
storeHours = []
for i, h in enumerate(hours):
    if i % 2 == 0:                            # entries alternate store, pharmacy; keep the store times
        storeHours.append(h.contents[0])

In [35]:
data = []
for k in range(45):
    outputRow = []
    if k in isDuane:
        outputRow.append("Duane Reade")
    else:
        outputRow.append("Walgreens")
    outputRow.extend([street[k], city[k], state[k], zips[k], storeHours[k]])
    data.append(outputRow)
    
WG = pd.DataFrame(data, columns=columns) 

# I don't like that '7th Street' becomes '7Th Street' because of .title()..
import re
pattern1 = re.compile(r"(?<=\d)St")   # raw strings, so '\d' is not treated as a string escape
pattern2 = re.compile(r"(?<=\d)Nd")
pattern3 = re.compile(r"(?<=\d)Rd")
pattern4 = re.compile(r"(?<=\d)Th")
WG = WG.replace(pattern1, 'st')
WG = WG.replace(pattern2, 'nd')
WG = WG.replace(pattern3, 'rd')
WG = WG.replace(pattern4, 'th')
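
The four patterns above can be folded into one substitution with a callback that lowercases whichever suffix matched. A minimal sketch on plain strings; the trailing `\b` is an addition that keeps the pattern from touching a longer word that merely starts with one of the suffixes.

```python
import re

# An ordinal suffix that directly follows a digit and ends at a word boundary
ORDINAL_SUFFIX = re.compile(r"(?<=\d)(St|Nd|Rd|Th)\b")

def fix_ordinals(text):
    """Lowercase the ordinal suffixes that str.title() capitalized, e.g. '7Th' -> '7th'."""
    return ORDINAL_SUFFIX.sub(lambda match: match.group(1).lower(), text)

print(fix_ordinals("51 W 51St St"))  # 51 W 51st St
print(fix_ordinals("661 8Th Ave"))   # 661 8th Ave
```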

In [36]:
WG

|     | Name | Address | City | State | ZIP | Hours |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Duane Reade | 2522 Broadway | New York | NY | 10025 | 24 hours |
| 1 | Duane Reade | 1231 Madison Ave | New York | NY | 10128 | 24 hours |
| 2 | Duane Reade | 775 Columbus Ave | New York | NY | 10025 | 24 hours |
| 3 | Duane Reade | 1498 York Ave | New York | NY | 10075 | 24 hours |
| 4 | Duane Reade | 949 3rd Ave | New York | NY | 10022 | 24 hours |
| 5 | Duane Reade | 51 W 51st St | New York | NY | 10019 | 24 hours |
| 6 | Duane Reade | 661 8th Ave | New York | NY | 10036 | 24 hours |
| 7 | Duane Reade | 155 E 34th St | New York | NY | 10016 | 24 hours |
| 8 | Duane Reade | 71 W 23rd St | New York | NY | 10010 | 24 hours |
| 9 | Duane Reade | 1 Union Square South | New York | NY | 10003 | 24 hours |


## Final Output

In [43]:
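# De-duplicate before indexing: drop_duplicates ignores the index, so after set_index two
# different addresses with otherwise identical columns would be treated as duplicates
finalOutput = pd.concat([CVS, WG]).drop_duplicates().set_index("Address"); finalOutput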

| Address | Name | City | State | ZIP | Hours |
| --- | --- | --- | --- | --- | --- |
| 1 Carlough Drive | Paramus Police Department | Paramus | NJ | 07652 | |
| 1 Memorial Drive | Lodi Police Department | Lodi | NJ | 07644 | |
| 1 Police Plaza | Elizabeth Police Department | Elizabeth | NJ | 07201 | |
| 1 Wood Park | Leonia Police Department | Leonia | NJ | 07605 | |
| 100 Community Dr | Nassau County Sixth Precinct | Manhasset | NY | 11003 | |
| 100 Riveredge Road | Tenafly Police Department | Tenafly | NJ | 07670 | |
| 15 Park Avenue | Maywood Police Dept | Maywood | NJ | 07607 | 24 hours a day, 7 days a week |
| 152 Washington Avenue | Belleville Police Department | Belleville | NJ | 07109 | |
| 2 Roosevelt Square No. | Mount Vernon Police Department | Mt Vernon | NY | 10550 | |
| 215 Liberty Street #217 | Little Ferry Police Department | Little Ferry | NJ | 07643 | |


UPDATE: After I committed this to GitHub and sent over the relevant email, I realized I hadn't double-checked the actual challenge specification, so I did that. It says to save the output table as CSV or some other standard data format. I didn't do that here originally, but the code for that is just:

`finalOutput.to_csv(<filepath>)` # The other default parameter arguments are fine here.
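
One thing to watch when the table goes through CSV: New Jersey ZIP codes start with a zero, and that zero is lost whenever the column is interpreted as a number (the `7652` versus `07652` seen in the outputs above). A minimal sketch of putting the padding back; the helper name `normalize_zip` is hypothetical.

```python
def normalize_zip(value):
    """Return a five-character ZIP string, restoring any leading zeros."""
    return str(value).strip().zfill(5)

print(normalize_zip(7652))     # 07652
print(normalize_zip("10019"))  # 10019
```

When reading the file back with pandas, passing `dtype={"ZIP": str}` to `pd.read_csv` should avoid the problem in the first place.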

## Final Thoughts

- Surprisingly, it feels hard to tell when exactly I'm truly hard-coding something or not. For example: 
    - Is the first method I used to find NYC ZIP codes, where I used 'td' tag attributes to filter my scraping, considered hard-coding? 
    - What about the use of the following for clicking the 'Disposal' checkbox for Walgreens?
    > `driver.find_element_by_xpath('//*[@id="healthservicescontents"]/section/aside[4]/section').click()`  

- ZIP codes aren't changing in Manhattan anytime soon, so hard-coding there is probably fine.


- I think I did pretty well in using minimal hardcoding for the CVS portion. The fact that the data came in the asked-for format was really nice.


- The Walgreens section was definitely an interesting process and furthest from what I've done in the past. I feel like there must be some way to speed up the automated browser process, considering how long it took for just one search here. I'm also curious to know how hard-coded this section in fact is, i.e. what can be made more robust.


- I have considered things like 'what happens if the page doesn't return?' as we talked about, but implementing things like changing cookies or looking for backup sources are things that I decided were probably outside the scope of this experiment.


- Ultimately, this was a fun challenge. I knew I liked webscraping in that it felt analogous to solving riddles, but this exercise (particularly the part with Selenium) brought the riddle-solving to a new level. I especially enjoyed those times I managed to figure out a shortcut past, or simplification to, a process. With my limited exposure to this field before today, I'm quite happy with what I was able to produce in a single evening.
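
On the "what happens if the page doesn't return?" question, one small step short of cookies or backup sources is a retry wrapper with a growing pause between attempts. This is a minimal sketch under that assumption; `fetch_with_retries` is a hypothetical helper, and the flaky function stands in for something like a `requests.get` call.

```python
import time

def fetch_with_retries(fetch, attempts=3, delay=1.0):
    """Call fetch() up to `attempts` times, pausing longer after each failure."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:  # broad on purpose: the wrapper does not know the fetcher
            last_error = error
            if attempt < attempts:
                time.sleep(delay * attempt)
    raise RuntimeError(f"all {attempts} attempts failed") from last_error

# Stand-in fetcher that fails twice and then succeeds.
calls = {"count": 0}

def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "<html>ok</html>"

page = fetch_with_retries(flaky_fetch, attempts=3, delay=0.01)
print(page, calls["count"])  # <html>ok</html> 3
```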

# References
 
- Introduction to Scraping w/Python
    - https://elitedatascience.com/python-web-scraping-libraries
    - https://medium.freecodecamp.org/how-to-scrape-websites-with-python-and-beautifulsoup-5946935d93fe  
    
- Table Parsing
    - https://stackoverflow.com/questions/23377533/python-beautifulsoup-parsing-table?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa


- Tag Scraping with Functions (the stackoverflow post answer led me to the information I actually used, from the docs):
    - https://stackoverflow.com/questions/22726860/beautifulsoup-webscraping-find-all-finding-exact-match?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    - https://www.crummy.com/software/BeautifulSoup/bs4/doc/#a-function


- Review on how to best time Python code:
    - https://stackoverflow.com/questions/2866380/how-can-i-time-a-code-segment-for-testing-performance-with-pythons-timeit?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    - https://stackoverflow.com/questions/15707056/get-time-of-execution-of-a-block-of-code-in-python-2-7?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    
    
- Review on Efficient Pandas DataFrame Allocation:
    - https://stackoverflow.com/questions/31674557/how-to-append-rows-in-a-pandas-dataframe-in-a-for-loop?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa


- Selenium; Introduction:
    - https://medium.com/the-andela-way/introduction-to-web-scraping-using-selenium-7ec377a8cf72
    - Interesting example of someone scraping the Walgreens website, albeit a different section
        - https://github.com/timctho/walgreens-product-parser/blob/master/parse_wallgreens_beauty.py


- Selenium; Selecting elements:
    - https://codeburst.io/how-to-collect-data-through-web-scraping-using-selenium-e0f7a58c863d


- Selenium; Interfacing with elements:
    - Text Boxes
        - https://stackoverflow.com/questions/18557275/locating-entering-a-value-in-a-text-box-using-selenium-and-python?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    - Using Ctrl+A
        - https://stackoverflow.com/questions/27775759/send-keys-control-click-in-selenium-with-python-bindings/27777509?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
    - Introduction to Keys and BACKSPACE
        - https://stackoverflow.com/questions/27338742/how-do-i-send-a-delete-keystroke-to-a-text-field-using-selenium-with-python

- Pulling href with BeautifulSoup
    - https://stackoverflow.com/questions/5815747/beautifulsoup-getting-href?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

In [1]:
# Code from https://github.com/alexanderkuk/log-progress for a nice progress bar on long for-loops

def log_progress(sequence, every=None, size=None, name='Items'):
    from ipywidgets import IntProgress, HTML, VBox
    from IPython.display import display

    is_iterator = False
    if size is None:
        try:
            size = len(sequence)
        except TypeError:
            is_iterator = True
    if size is not None:
        if every is None:
            if size <= 200:
                every = 1
            else:
                every = int(size / 200)     # every 0.5%
    else:
        assert every is not None, 'sequence is iterator, set every'

    if is_iterator:
        progress = IntProgress(min=0, max=1, value=1)
        progress.bar_style = 'info'
    else:
        progress = IntProgress(min=0, max=size, value=0)
    label = HTML()
    box = VBox(children=[label, progress])
    display(box)

    index = 0
    try:
        for index, record in enumerate(sequence, 1):
            if index == 1 or index % every == 0:
                if is_iterator:
                    label.value = '{name}: {index} / ?'.format(
                        name=name,
                        index=index
                    )
                else:
                    progress.value = index
                    label.value = u'{name}: {index} / {size}'.format(
                        name=name,
                        index=index,
                        size=size
                    )
            yield record
    except:
        progress.bar_style = 'danger'
        raise
    else:
        progress.bar_style = 'success'
        progress.value = index
        label.value = "{name}: {index}".format(
            name=name,
            index=str(index or '?')
        )