# What's that Doggy in the Window?
## Data Gathering - Web Scraping

By: **Bryan Santos**

Have you ever wanted to know the breed of a dog you saw on social media or out with its owner because you liked how it looked, whether tough or cute?

This project aims to build an application that lets users upload an image of a dog and get its breed. The application will then assess whether the breed's characteristics suit the user's lifestyle. If they do, the system will redirect the user to dogs of that breed that are up for adoption. If not, the system will suggest the five most compatible breeds.

The project will use multi-class image classification and recommendation system models to achieve these goals.

The pet industry is a multi-billion-dollar industry in the United States alone, and pet ownership is on a steady rise. Unfortunately, so is the number of dogs left without a permanent home or euthanized. Many people buy dogs on a fad or for their looks and later abandon them, most likely because they did not realize that breeds have distinct characteristics that may not match their lifestyles.

***

This is the first notebook in the series. It gathers breed characteristics and listings of dogs up for adoption through web scraping. Breed attributes are scraped from www.dogtime.com, and adoption listings from www.petfinder.com.

## 1: Package Imports

Below are the libraries used to scrape information from the two sources mentioned. Beautiful Soup would have been enough for basic websites, but www.petfinder.com uses JavaScript to generate its HTML, so I had to use Selenium as well. tqdm is particularly useful for code that runs for a while, as it adds progress bars.

In [1]:
import numpy as np
import pandas as pd
import requests
from requests import get
from bs4 import BeautifulSoup
import tqdm
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

In [2]:
%%capture

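### tqdm_notebook and tnrange are deprecated aliases; tqdm.notebook provides the same bars
from tqdm.notebook import tqdm
from tqdm.notebook import trange as tnrange
tqdm.pandas()  # registers progress_apply on pandas objects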

***

## 2: Web Scraping www.dogtime.com

This is part 1 of the notebook's web scraping objectives: collecting the pertinent characteristics of each dog breed.

### Initial Setup

In [25]:
### Initial Beautiful Soup setup
user_agent = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://dogtime.com/dog-breeds/profiles/'
webpage = requests.get(url, headers = user_agent).content
soup = BeautifulSoup(webpage,'html.parser')
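The request above runs once, with no timeout or status check, so a slow or failing response could hang the cell or hand an error page to Beautiful Soup. A minimal sketch of a more defensive fetch follows. `fetch_html`, its parameters, and the retry policy are illustrative assumptions, not part of the original notebook. It accepts any session-like object with a `get` method, such as `requests.Session()`.

```python
import time


def fetch_html(session, url, headers=None, timeout=10, retries=3, backoff=1.0):
    """Return the response body for `url`, retrying on failure.

    `session` is assumed to behave like `requests.Session`: `session.get`
    returns an object with `raise_for_status()` and `.content`.
    """
    last_error = None
    for attempt in range(retries):
        try:
            response = session.get(url, headers=headers, timeout=timeout)
            response.raise_for_status()  # turn 4xx/5xx responses into exceptions
            return response.content
        except Exception as error:  # in practice, narrow this to requests.RequestException
            last_error = error
            if attempt < retries - 1:
                time.sleep(backoff * (attempt + 1))  # wait a little longer each time
    raise RuntimeError(f"Giving up on {url} after {retries} attempts") from last_error
```

With `requests`, this could be called as `fetch_html(requests.Session(), url, headers=user_agent)` and the result passed to `BeautifulSoup` as before.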

In [33]:
### Collect the links of all breed details for further scraping
def scrape_breed_links(soup):
    '''
    Get all the breed href links in a parsed page.

    Parameters:
    - soup: BeautifulSoup object for the breed listing page

    Output:
    - List of the URLs of all breeds
    '''
    ### Taking soup as a parameter avoids relying on a global that later cells overwrite
    breeds = soup.find_all('a', {'class': 'list-item-title'})
    return [breed['href'] for breed in breeds]

In [34]:
### Use the function above
breed_links = scrape_breed_links(soup)
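The scraped `href` values are full URLs, and a later cell turns each one into a breed name by slicing at a fixed offset (`breed_link[31:]`). That works only while every link shares exactly the same prefix. As a hedged alternative, the sketch below derives the name from the URL path instead. `breed_slug` is a hypothetical helper, not something the notebook defines.

```python
from urllib.parse import urlparse


def breed_slug(breed_link):
    """Return the last non-empty path segment of a breed URL."""
    path = urlparse(breed_link).path
    segments = [segment for segment in path.split('/') if segment]
    return segments[-1] if segments else ''


print(breed_slug('https://dogtime.com/dog-breeds/rat-terrier'))   # rat-terrier
print(breed_slug('https://dogtime.com/dog-breeds/rat-terrier/'))  # rat-terrier
```

For the URLs seen in this notebook, the result matches the fixed-offset slice, and it also tolerates a trailing slash.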

### Main Scraping

In [94]:
### Prepare the dataframe where all the scraped info will be stored
dogs_df = pd.DataFrame(columns=['adaptability', 
                                'adapts_well_to_apartment_living', 
                                'good_for_novice_owners', 
                                'sensitivity_level', 
                                'tolerates_being_alone', 
                                'tolerates_cold_weather', 
                                'tolerates_hot_weather', 
                                'all_around_friendliness', 
                                'affectionate_with_family', 
                                'kid_friendly_dogs', 
                                'dog_friendly', 
                                'friendly_towards_strangers', 
                                'health_and_grooming_needs', 
                                'amount_of_shedding', 
                                'drooling_potential', 
                                'easy_to_groom', 
                                'general_health', 
                                'potential_for_weight_gain', 
                                'size', 
                                'trainability', 
                                'easy_to_train', 
                                'intelligence', 
                                'potential_for_mouthiness', 
                                'prey_drive', 
                                'tendency_to_bark_or_howl', 
                                'wanderlust_potential',
                                'physical_needs',
                                'energy_level',
                                'intensity',
                                'exercise_needs',
                                'potential_for_playfulness',
                                'dog_breed_group',
                                'height',
                                'weight',
                                'life_span',
                                'highlights',
                                'size_description',
                                'personality',
                                'health',
                                'care',
                                'feeding',
                                'coat_color_and_grooming',
                                'children_and_other_pets',
                                'rescue_groups',
                                'breed_organizations'])
dogs_df

Unnamed: 0,adaptability,adapts_well_to_apartment_living,good_for_novice_owners,sensitivity_level,tolerates_being_alone,tolerates_cold_weather,tolerates_hot_weather,all_around_friendliness,affectionate_with_family,kid_friendly_dogs,...,highlights,size_description,personality,health,care,feeding,coat_color_and_grooming,children_and_other_pets,rescue_groups,breed_organizations


In [107]:
### This is the main www.dogtime.com scraping code block

i = 1

for breed_link in tqdm(breed_links[296:]):
    
    ### Set up each breed's detailed page
    user_agent = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
    url = breed_link
    webpage = requests.get(url, headers = user_agent).content
    soup = BeautifulSoup(webpage,'html.parser')
    
    ### Prints current status
    print('[' + str(i) + '] Scraping ' + breed_link)
    i+=1
    
    star_blocks = soup.findAll('div',{'class':'characteristic-star-block'})

    stars = []
    
    ### Gets all the star count in the page
    for star_block in star_blocks:
        star_class = star_block.find('div', {'class': 'star'})
        star_class = star_class.get_attribute_list('class')
        stars.append(star_class)
    
    ### Pinpoints specific HTML codes to get the information
    vital = soup.findAll('div',{'class':'vital-stat-box'})
    org = soup.find('h3',{'class':'js-section-heading description-title'},text='Breed Organizations')
    rg = soup.find('h3',{'class':'js-section-heading description-title'},text='Rescue Groups')
    children = soup.find('h3',{'class':'js-section-heading description-title'},text='Children And Other Pets')
    cc = soup.find('h3',{'class':'js-section-heading description-title'},text='Coat Color And Grooming')
    feeding = soup.find('h3',{'class':'js-section-heading description-title'},text='Feeding')
    care = soup.find('h3',{'class':'js-section-heading description-title'},text='Care')
    health = soup.find('h3',{'class':'js-section-heading description-title'},text='Health')
    personality = soup.find('h3',{'class':'js-section-heading description-title'},text='Personality')
    size_description = soup.find('h3',{'class':'js-section-heading description-title'},text='Size')
    highlights = soup.find('h3',{'class':'js-section-heading description-title'},text='Highlights')

    ### Assigning derived information to variables
    adaptability = int(stars[0][1][-1])
    adapts_well_to_apartment_living = int(stars[1][1][-1])
    good_for_novice_owners = int(stars[2][1][-1])
    sensitivity_level = int(stars[3][1][-1])
    tolerates_being_alone = int(stars[4][1][-1])
    tolerates_cold_weather = int(stars[5][1][-1])
    tolerates_hot_weather = int(stars[6][1][-1])
    all_around_friendliness = int(stars[7][1][-1])
    affectionate_with_family = int(stars[8][1][-1])
    kid_friendly_dogs = int(stars[9][1][-1])
    dog_friendly = int(stars[10][1][-1])
    friendly_towards_strangers = int(stars[11][1][-1])
    health_and_grooming_needs = int(stars[12][1][-1])
    amount_of_shedding = int(stars[13][1][-1])
    drooling_potential = int(stars[14][1][-1])
    easy_to_groom = int(stars[15][1][-1])
    general_health = int(stars[16][1][-1])
    potential_for_weight_gain = int(stars[17][1][-1])
    size = int(stars[18][1][-1])
    trainability = int(stars[19][1][-1])
    easy_to_train = int(stars[20][1][-1])
    intelligence = int(stars[21][1][-1])
    potential_for_mouthiness = int(stars[22][1][-1])
    prey_drive = int(stars[23][1][-1])
    tendency_to_bark_or_howl = int(stars[24][1][-1])
    wanderlust_potential = int(stars[25][1][-1])
    physical_needs = int(stars[26][1][-1])
    energy_level = int(stars[27][1][-1])
    intensity = int(stars[28][1][-1])
    exercise_needs = int(stars[29][1][-1])
    potential_for_playfulness = int(stars[30][1][-1])

    dog_breed_group = vital[0].get_text()[16:]
    height = vital[1].get_text()[7:]
    weight = vital[2].get_text()[7:]
    life_span = vital[3].get_text()[10:]
    highlights = highlights.findNext().get_text() if highlights else 'Not available'
    size_description = size_description.findNext().get_text() if size_description else 'Not available'
    personality = personality.findNext().get_text() if personality else 'Not available'
    health = health.findNext().get_text() if health else 'Not available'
    care = care.findNext().get_text() if care else 'Not available'
    feeding = feeding.findNext().get_text() if feeding else 'Not available'
    coat_color_and_grooming = cc.findNext().get_text() if cc else 'Not available'
    children_and_other_pets = children.findNext().get_text() if children else 'Not available'
    rescue_groups = rg.findNext().get_text() if rg else 'Not available'
    breed_organizations = org.findNext().get_text() if org else 'Not available'
    
    ### Adding each breed details to the dataframe 
    ### DataFrame.append was removed in pandas 2.0, so build a one-row frame and concatenate it
    new_row = pd.DataFrame([{'breed': breed_link[31:],
                                'adaptability': adaptability, 
                                'adapts_well_to_apartment_living': adapts_well_to_apartment_living, 
                                'good_for_novice_owners': good_for_novice_owners, 
                                'sensitivity_level': sensitivity_level, 
                                'tolerates_being_alone': tolerates_being_alone, 
                                'tolerates_cold_weather': tolerates_cold_weather, 
                                'tolerates_hot_weather': tolerates_hot_weather, 
                                'all_around_friendliness': all_around_friendliness, 
                                'affectionate_with_family': affectionate_with_family, 
                                'kid_friendly_dogs': kid_friendly_dogs, 
                                'dog_friendly': dog_friendly, 
                                'friendly_towards_strangers': friendly_towards_strangers, 
                                'health_and_grooming_needs': health_and_grooming_needs, 
                                'amount_of_shedding': amount_of_shedding, 
                                'drooling_potential': drooling_potential, 
                                'easy_to_groom': easy_to_groom, 
                                'general_health': general_health, 
                                'potential_for_weight_gain': potential_for_weight_gain, 
                                'size': size, 
                                'trainability': trainability, 
                                'easy_to_train': easy_to_train, 
                                'intelligence': intelligence, 
                                'potential_for_mouthiness': potential_for_mouthiness, 
                                'prey_drive': prey_drive, 
                                'tendency_to_bark_or_howl': tendency_to_bark_or_howl, 
                                'wanderlust_potential': wanderlust_potential,
                                'physical_needs': physical_needs,
                                'energy_level': energy_level,
                                'intensity': intensity,
                                'exercise_needs': exercise_needs,
                                'potential_for_playfulness': potential_for_playfulness,
                                'dog_breed_group': dog_breed_group,
                                'height': height,
                                'weight': weight,
                                'life_span': life_span,
                                'highlights': highlights,
                                'size_description': size_description,
                                'personality': personality,
                                'health': health,
                                'care': care,
                                'feeding': feeding,
                                'coat_color_and_grooming': coat_color_and_grooming,
                                'children_and_other_pets': children_and_other_pets,
                                'rescue_groups': rescue_groups,
                                'breed_organizations': breed_organizations}])
    dogs_df = pd.concat([dogs_df, new_row], ignore_index = True)



[1] Scraping https://dogtime.com/dog-breeds/puli
[2] Scraping https://dogtime.com/dog-breeds/pyredoodle
[3] Scraping https://dogtime.com/dog-breeds/pyrenean-shepherd
[4] Scraping https://dogtime.com/dog-breeds/rat-terrier
[5] Scraping https://dogtime.com/dog-breeds/redbone-coonhound
[6] Scraping https://dogtime.com/dog-breeds/rhodesian-ridgeback
[7] Scraping https://dogtime.com/dog-breeds/rottador
[8] Scraping https://dogtime.com/dog-breeds/rottle
[9] Scraping https://dogtime.com/dog-breeds/rottweiler
[10] Scraping https://dogtime.com/dog-breeds/saint-berdoodle
[11] Scraping https://dogtime.com/dog-breeds/saint-bernard
[12] Scraping https://dogtime.com/dog-breeds/saluki
[13] Scraping https://dogtime.com/dog-breeds/samoyed
[14] Scraping https://dogtime.com/dog-breeds/samusky
[15] Scraping https://dogtime.com/dog-breeds/schipperke
[16] Scraping https://dogtime.com/dog-breeds/schnoodle
[17] Scraping https://dogtime.com/dog-breeds/scottish-deerhound
[18] Scraping https://dogtime.com/dog-br
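The scraping cell reads each rating as the last character of the second CSS class (`stars[0][1][-1]`) and removes labels from the vital-stat text with fixed slices such as `[7:]`. Both depend on the exact markup. The sketch below shows a slightly more tolerant way to do the same two steps. It assumes the rating is encoded as trailing digits in a class name such as `star-4`, which is inferred from the indexing above rather than confirmed. The helper names and sample strings are made up for illustration.

```python
import re


def parse_star_rating(class_list):
    """Return the first integer rating found in a list of CSS class names.

    Assumes the rating is encoded as trailing digits, e.g. ['star', 'star-4'].
    Returns None when no class name ends in a number.
    """
    for class_name in class_list:
        match = re.search(r'(\d+)$', class_name)
        if match:
            return int(match.group(1))
    return None


def strip_label(text, label):
    """Remove a leading label such as 'Height:' instead of slicing at a fixed index."""
    text = text.strip()
    if text.startswith(label):
        return text[len(label):].strip()
    return text


# Sample inputs below are invented, not taken from the site.
print(parse_star_rating(['star', 'star-4']))               # 4
print(parse_star_rating(['star']))                         # None
print(strip_label('Height: 16 to 17 inches', 'Height:'))   # 16 to 17 inches
```

Returning `None` for a missing rating makes a markup change visible in the data instead of raising an `IndexError` partway through a long run.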

### Validation

In [239]:
dogs_df.shape

(359, 46)

### Data Export

In [113]:
dogs_df.to_csv("dogs_df.csv")
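The main loop slices `breed_links[296:]`, so the scrape appears to have been run in more than one pass. One way to make such restarts less manual is to write each row to disk as soon as it is scraped and to skip breeds already on file. The sketch below uses the standard `csv` module. The file name, column names, and helper names are assumptions for illustration only.

```python
import csv
import os


def load_done(path, key='breed'):
    """Return the set of key values already saved in the CSV at `path`."""
    if not os.path.exists(path):
        return set()
    with open(path, newline='', encoding='utf-8') as handle:
        return {row[key] for row in csv.DictReader(handle)}


def append_row(path, row, fieldnames):
    """Append one row, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, 'a', newline='', encoding='utf-8') as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


# Hypothetical usage inside the scraping loop:
# done = load_done('dogs_checkpoint.csv')
# if slug not in done:
#     append_row('dogs_checkpoint.csv', {'breed': slug, 'size': 3}, ['breed', 'size'])
```

The checkpoint file can then be loaded with `pd.read_csv` once the run completes.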

***

## 3: Web Scraping www.petfinder.com

This is part 2 of the notebook's web scraping objectives: collecting the listings of dogs available for adoption. Please note that this part uses Selenium because the site uses JavaScript to generate its HTML content.
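To make that flow concrete: Selenium loads each page in a real browser so the JavaScript runs, and the rendered HTML is then handed to a parser. The sketch below shows one possible shape for this, reusing a single browser session across pages instead of starting one per URL. `render_pages` and `handle` are hypothetical names. The function relies only on the driver's `get`, `page_source`, and `quit` members, which Selenium's WebDriver provides.

```python
def render_pages(driver, urls, handle):
    """Load each URL in one browser session and pass the rendered HTML to `handle`.

    `driver` is assumed to be a Selenium WebDriver (or anything exposing
    `get`, `page_source`, and `quit`). The driver is always closed at the
    end, even if a page load or the handler raises.
    """
    try:
        for url in urls:
            driver.get(url)                   # the browser executes the page's JavaScript
            handle(url, driver.page_source)   # HTML as currently rendered
    finally:
        driver.quit()


# Hypothetical usage, mirroring the options used in this notebook:
# options = Options()
# options.add_argument('--headless')
# render_pages(webdriver.Chrome(options=options), adoption_links, handle=parse_cards)
```

Reading `page_source` immediately after `get` can run ahead of the page's scripts. If cards come back empty, an explicit wait with Selenium's `WebDriverWait` before reading the source may help.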

### Initial Setup

In [235]:
### Prepares dataframe for storage
adoption_df = pd.DataFrame(columns=['label', 
                                'link'])
adoption_df

Unnamed: 0,label,link


In [217]:
### Stores URLs of all pages to be scraped
adoption_links = []

for i in range(1, 1593):
    link = 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=' + str(i)
    adoption_links.append(link)

In [218]:
adoption_links[:5]

['https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1',
 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=2',
 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=3',
 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=4',
 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=5']
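String concatenation works here because the query is fixed. If the distance or other filters ever become variables, building the query string with `urllib.parse.urlencode` avoids escaping mistakes. A small sketch follows. `build_search_urls` and its parameters are illustrative names, and the URL pattern is copied from the cell above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://www.petfinder.com/search/dogs-for-adoption/us/new-york/'


def build_search_urls(first_page, last_page, distance='Anywhere'):
    """Return one search URL per page number, inclusive of both ends."""
    return [
        BASE_URL + '?' + urlencode({'distance': distance, 'page': page})
        for page in range(first_page, last_page + 1)
    ]


urls = build_search_urls(1, 3)
print(urls[0])
# https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1
```

Calling `build_search_urls(1, 1592)` would reproduce the list built above.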

### Main Scraping

In [236]:
### This is the main scraping code block

i = 0

### Go through each link
for adoption_link in tqdm(adoption_links):
    
    ### Print status
    print('[' + str(i) + '] Scraping ' + adoption_link)
    i+=1
    
    ### Set up selenium for beautiful soup processing
    url = adoption_link
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(options=options)
    driver.get(url)
    page = driver.page_source
    driver.quit()
    soup = BeautifulSoup(page, 'html.parser')
    
    body_details = soup.findAll('a', {'class': 'petCard-link'})
    
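    ### Storing the information to the dataframe
    ### (DataFrame.append was removed in pandas 2.0, so collect the rows and concatenate once per page)
    rows = []
    for body_detail in body_details:
        label = body_detail.get_attribute_list('aria-label')
        link = body_detail.get_attribute_list('href')
        rows.append({'label': label, 'link': link})
    adoption_df = pd.concat([adoption_df, pd.DataFrame(rows, columns=['label', 'link'])],
                            ignore_index = True)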


[0] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1
[1] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=2
[2] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=3
[3] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=4
[4] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=5
[5] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=6
[6] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=7
[7] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=8
[8] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=9
[9] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?dista

[79] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=80
[80] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=81
[81] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=82
[82] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=83
[83] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=84
[84] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=85
[85] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=86
[86] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=87
[87] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=88
[88] Scraping https://www.petfinder.com/search/dogs-for-adoption

[157] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=158
[158] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=159
[159] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=160
[160] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=161
[161] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=162
[162] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=163
[163] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=164
[164] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=165
[165] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=166
[166] Scraping https://www.petfinder.com/searc

[235] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=236
[236] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=237
[237] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=238
[238] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=239
[239] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=240
[240] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=241
[241] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=242
[242] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=243
[243] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=244
[244] Scraping https://www.petfinder.com/searc

[313] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=314
[314] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=315
[315] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=316
[316] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=317
[317] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=318
[318] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=319
[319] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=320
[320] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=321
[321] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=322
[322] Scraping https://www.petfinder.com/searc

[391] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=392
[392] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=393
[393] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=394
[394] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=395
[395] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=396
[396] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=397
[397] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=398
[398] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=399
[399] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=400
[400] Scraping https://www.petfinder.com/searc

[469] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=470
[470] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=471
[471] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=472
[472] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=473
[473] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=474
[474] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=475
[475] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=476
[476] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=477
[477] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=478
[478] Scraping https://www.petfinder.com/searc

[547] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=548
[548] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=549
[549] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=550
[550] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=551
[551] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=552
[552] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=553
[553] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=554
[554] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=555
[555] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=556
[556] Scraping https://www.petfinder.com/searc

[625] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=626
[626] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=627
[627] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=628
[628] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=629
[629] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=630
[630] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=631
[631] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=632
[632] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=633
[633] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=634
[634] Scraping https://www.petfinder.com/searc

[703] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=704
[704] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=705
[705] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=706
[706] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=707
[707] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=708
[708] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=709
[709] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=710
[710] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=711
[711] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=712
[712] Scraping https://www.petfinder.com/searc

[781] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=782
[782] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=783
[783] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=784
[784] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=785
[785] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=786
[786] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=787
[787] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=788
[788] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=789
[789] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=790
[790] Scraping https://www.petfinder.com/searc

[859] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=860
[860] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=861
[861] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=862
[862] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=863
[863] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=864
[864] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=865
[865] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=866
[866] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=867
[867] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=868
[868] Scraping https://www.petfinder.com/searc

[937] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=938
[938] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=939
[939] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=940
[940] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=941
[941] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=942
[942] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=943
[943] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=944
[944] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=945
[945] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=946
[946] Scraping https://www.petfinder.com/searc

[1015] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1016
[1016] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1017
[1017] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1018
[1018] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1019
[1019] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1020
[1020] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1021
[1021] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1022
[1022] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1023
[1023] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1024
[1024] Scraping https://www.

[1091] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1092
[1092] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1093
[1093] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1094
[1094] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1095
[1095] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1096
[1096] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1097
[1097] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1098
[1098] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1099
[1099] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1100
[1100] Scraping https://www.

[1167] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1168
[1168] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1169
[1169] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1170
[1170] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1171
[1171] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1172
[1172] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1173
[1173] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1174
[1174] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1175
[1175] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1176
[1176] Scraping https://www.

[1243] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1244
[1244] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1245
[1245] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1246
[1246] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1247
[1247] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1248
[1248] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1249
[1249] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1250
[1250] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1251
[1251] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1252
[1252] Scraping https://www.

[1319] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1320
[1320] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1321
[1321] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1322
[1322] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1323
[1323] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1324
[1324] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1325
[1325] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1326
[1326] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1327
[1327] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1328
[1328] Scraping https://www.

[1395] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1396
[1396] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1397
[1397] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1398
[1398] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1399
[1399] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1400
[1400] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1401
[1401] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1402
[1402] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1403
[1403] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1404
[1404] Scraping https://www.

[1471] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1472
[1472] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1473
[1473] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1474
[1474] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1475
[1475] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1476
[1476] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1477
[1477] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1478
[1478] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1479
[1479] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1480
[1480] Scraping https://www.

[1547] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1548
[1548] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1549
[1549] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1550
[1550] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1551
[1551] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1552
[1552] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1553
[1553] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1554
[1554] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1555
[1555] Scraping https://www.petfinder.com/search/dogs-for-adoption/us/new-york/?distance=Anywhere&page=1556
[1556] Scraping https://www.

### Evaluation

Here we inspect the resulting dataframe to confirm that each row holds a label and a link for one adoptable dog.

In [237]:
adoption_df

Unnamed: 0,label,link
0,"[Cary in NY - So Gentle & Sweet!, adoptable Do...",[https://www.petfinder.com/dog/cary-in-ny-so-g...
1,"[Darlin', adoptable Dog, Young Female Pit Bull...",[https://www.petfinder.com/dog/darlin-33661138...
2,"[Thea Queen, adoptable Dog, Adult Female Pit B...",[https://www.petfinder.com/dog/thea-queen-4725...
3,"[Chloe, adoptable Dog, Senior Female Pit Bull ...",[https://www.petfinder.com/dog/chloe-47211945/...
4,"[Sheba, adoptable Dog, Adult Female Pit Bull T...",[https://www.petfinder.com/dog/sheba-47211940/...
...,...,...
63586,"[Pooches, adoptable Dog, Adult Female Terrier]",[https://www.petfinder.com/dog/pooches-4581812...
63587,"[Diggity, adoptable Dog, Adult Male Dachshund ...",[https://www.petfinder.com/dog/diggity-4581819...
63588,"[Amaya, adoptable Dog, Young Female Miniature ...",[https://www.petfinder.com/dog/amaya-45818077/...
63589,"[Libby, adoptable Dog, Adult Female Pit Bull T...",[https://www.petfinder.com/dog/libby-45818067/...


In [227]:
adoption_df.iloc[2516].label

['Walter, adoptable Dog, Adult Male Collie & Corgi Mix']
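One further check is to confirm that each label splits cleanly into its parts. The sketch below assumes labels follow the layout `<name>, adoptable Dog, <age> <sex> <breed>`. That layout is inferred from the rows shown above, not from any Petfinder documentation, and `parse_label` is a hypothetical helper, not part of the original notebook.

```python
import re

# Assumed layout: "<name>, adoptable Dog, <age> <sex> <breed>".
# Inferred from the sample rows only; real labels may deviate.
LABEL_PATTERN = re.compile(
    r"^(?P<name>.+), adoptable Dog, "
    r"(?P<age>\w+) (?P<sex>\w+) (?P<breed>.+)$"
)


def parse_label(label):
    """Split one scraped label string into its parts, or return None."""
    match = LABEL_PATTERN.match(label.strip())
    return match.groupdict() if match else None


parsed = parse_label("Walter, adoptable Dog, Adult Male Collie & Corgi Mix")
print(parsed)
```

Rows for which `parse_label` returns `None` would be the ones worth inspecting by hand.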

### Data Export

In [238]:
adoption_df.to_csv("adoption_df.csv")
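Note that the `label` and `link` cells hold Python lists, and CSV stores only text, so the lists come back as strings when the file is read again. The sketch below illustrates this with the standard `csv` module standing in for pandas (an assumption made to keep the example self-contained) and restores the list with `ast.literal_eval`.

```python
import ast
import csv
import io

label = ["Walter, adoptable Dog, Adult Male Collie & Corgi Mix"]

# Write the list-valued cell to an in-memory CSV; it is stored as str(label).
buffer = io.StringIO()
csv.writer(buffer).writerow([label])
buffer.seek(0)

# Reading it back yields plain text, not a list.
cell = next(csv.reader(buffer))[0]
print(type(cell).__name__)

# literal_eval safely turns the text back into a Python list.
restored = ast.literal_eval(cell)
print(restored == label)
```

With pandas, the same conversion could be applied at load time by passing `converters={"label": ast.literal_eval}` to `pd.read_csv`.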

***

## 4: Web Scraping AKC for the Top 50 breeds

The final version of this project will focus on just the top 50 breeds in America, as ranked by the American Kennel Club (AKC). This ranking is a good basis because it reflects how many dogs of each breed are registered each year.

In [3]:
# Initial Beautiful Soup setup
# Send a browser-like User-Agent so the request is not rejected as a bot.
user_agent = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'}
url = 'https://www.akc.org/expert-advice/news/most-popular-dog-breeds-full-ranking-list/'
# A timeout keeps the cell from hanging if the server does not respond.
webpage = requests.get(url, headers=user_agent, timeout=30).content
soup = BeautifulSoup(webpage, 'html.parser')

In [32]:
# find_all is the current name; findAll is a deprecated alias.
spans = soup.find_all('span', {'style': 'font-weight: 400;'})

In [42]:
# Keep the text of the first 60 matching spans (the breed names, in rank order).
top_60 = [span.text for span in spans[:60]]
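The cell above takes the first 60 spans exactly as they appear on the page. If the page were to contain blank spans or stray whitespace (an assumption, not something observed here), a slightly more defensive variant could look like the sketch below. `first_n_names` is a hypothetical helper, and the sample strings stand in for the `span.text` values.

```python
from itertools import islice


def first_n_names(texts, n=60):
    """Return the first n non-empty strings from texts, whitespace-stripped."""
    cleaned = (text.strip() for text in texts)
    non_empty = (text for text in cleaned if text)
    return list(islice(non_empty, n))


# Stand-in values; the real input would be (span.text for span in spans).
sample = ["Labrador Retrievers", "  ", "German Shepherd Dogs\n", "Golden Retrievers"]
print(first_n_names(sample, n=2))
# → ['Labrador Retrievers', 'German Shepherd Dogs']
```

Because `islice` stops as soon as `n` names are collected, the rest of the page is never processed.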

In [43]:
top_60

['Labrador Retrievers',
 'German Shepherd Dogs',
 'Golden Retrievers',
 'French Bulldogs',
 'Bulldogs',
 'Beagles',
 'Poodles',
 'Rottweilers',
 'Yorkshire Terriers',
 'Boxers',
 'Pembroke Welsh Corgis',
 'Siberian Huskies',
 'Australian Shepherds',
 'Cavalier King Charles Spaniels',
 'Shih Tzu',
 'Boston Terriers',
 'Bernese Mountain Dogs',
 'Havanese',
 'Brittanys',
 'English Springer Spaniels',
 'Pugs',
 'Mastiffs',
 'Cocker Spaniels',
 'Vizslas',
 'Cani Corsi',
 'Miniature American Shepherds',
 'Border Collies',
 'Weimaraners',
 'Maltese',
 'Basset Hounds',
 'Newfoundlands',
 'Rhodesian Ridgebacks',
 'West Highland White Terriers',
 'Belgian Malinois',
 'Shiba Inu',
 'Bichon Frises',
 'Akitas',
 'St. Bernards',
 'Bloodhounds',
 'Bullmastiffs',
 'English Cocker Spaniels',
 'Soft Coated Wheaten Terriers',
 'Papillons',
 'Australian Cattle Dogs',
 'Dalmatians',
 'Scottish Terriers',
 'Alaskan Malamutes',
 'Airedale Terriers',
 'Whippets',
 'Bull Terriers',
 'Chinese Shar-Pei',
 'Wireh

In [44]:
len(top_60)

60

The top 60 breeds were collected, rather than just 50, because not every breed had images available from the API calls made in the next notebook.
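A minimal sketch of how the longer list could be narrowed down later, assuming a set of breeds for which images were actually retrieved. `pick_top_breeds`, `ranked`, and `with_images` are hypothetical names used only for illustration; the real selection happens in the next notebook and may differ.

```python
def pick_top_breeds(ranked_breeds, breeds_with_images, n=50):
    """Keep the n highest-ranked breeds that have images available."""
    selected = [breed for breed in ranked_breeds if breed in breeds_with_images]
    return selected[:n]


# Toy data: four ranked breeds, one of which has no images.
ranked = [
    "Labrador Retrievers",
    "German Shepherd Dogs",
    "Golden Retrievers",
    "French Bulldogs",
]
with_images = {"Labrador Retrievers", "Golden Retrievers", "French Bulldogs"}

print(pick_top_breeds(ranked, with_images, n=2))
# → ['Labrador Retrievers', 'Golden Retrievers']
```

The original ranking order is preserved, so a breed without images is simply skipped and the next-ranked breed takes its place.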