In [103]:
#Import Packages
import requests
from bs4 import BeautifulSoup
import re
import pandas as pd

### Coding by Renan Peneluppi 

### This assignment asked for the breadcrumbs of a subcategory and its product attributes from homedepot.com. For example, a power tool subcategory can have attributes such as voltage, box dimensions, and power output.

### The task only required looping through one subcategory and its product attributes. However, I chose to write logic that can do this for the entire website, including the full breadcrumb path from the home page down to the sub_subcategories and their attributes. It is a bit more challenging, but it made sense to build a complete scraper.

### One limitation of the current code is that only the first product page of each sub_subcategory is explored. To explore all pages I would add Selenium to the logic. For the current purpose I didn't find that necessary, but if we wanted to use this code to build a product price list, it would have to be added.

### As this is an exercise, I left in a few blocks that show how I started building each of the main loops. In the end, all of this code could be combined, even turned into a function, and run in one go.

##### For this task I'm using the Python packages Requests, BeautifulSoup and Pandas.
##### HomeDepot.com, like many other websites, stops responding to repeated page requests if it thinks we are a bot.
##### To get around that issue I used both a timeout argument, to keep the code from getting stuck waiting for a response, and a headers argument within the request to emulate an actual browser. In other projects I did that using Selenium to emulate user activity, but in this case that is not necessary.
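
The timeout-plus-header idea can be wrapped in a small retry helper. The following is only a minimal sketch, not the code used in the cells below: `fetch_html`, its `get` parameter, and the retry/pause values are hypothetical names and defaults chosen for illustration. `get` stands for any callable with the same call shape as `requests.get`, which lets the helper be tried without a network connection.

```python
import time

# Same browser-like header the notebook sends with every request.
HEADERS = {
    "user-agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36"
    )
}


def fetch_html(url, get, retries=3, timeout=5, pause=1.0):
    """Return the page text, retrying a few times before giving up.

    `get` is a hypothetical parameter: any callable shaped like requests.get.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            response = get(url, headers=HEADERS, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx answers as failures
            return response.text
        except Exception as err:  # broad on purpose in this sketch
            last_error = err
            if attempt < retries:
                time.sleep(pause * attempt)  # wait a little longer each time
    raise last_error
```

With Requests installed, a call would look like `fetch_html(url, requests.get)`. Pausing between attempts is a design choice of this sketch, intended to make the traffic look less like a burst from a bot.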

##### Note that the final result for this task can take a long time to run when you loop through all categories, sub_categories, sub_subcategories, products and attributes. For that reason, I chose to build three different tables that can easily be joined together but also allow some "filtering", so that the loop only outputs the attributes for that filter. If no filter is applied the code still works, but the full run may take several minutes to finish.


### Let's get started one step at a time - I'll work on getting a list of Categories, then Subcategories and Sub_subCategories

#### Start with the list of Categories


In [104]:
# Set a user agent so that homedepot.com is less likely to refuse the request; this seems to be working for now
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}



In [105]:
#Access home page to get the main categories into a list
url='https://www.homedepot.ca/en/home.html'
response = requests.get(url,headers=headers, timeout=5)
pagecontent = response.content.decode('utf-8')


In [106]:
#check if request worked
response.status_code

200

In [107]:
#Use BeautifulSoup to parse and process the page content
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())

<!DOCTYPE HTML>
<html lang="en">
 <head typeof="og:website">
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport"/>
  <meta content="telephone=no" name="format-detection"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <title>
   Home Improvement, Home Renovation, Tools, &amp; Hardware | The Home Depot Canada
  </title>
  <meta content="Shop online at The Home Depot Canada for all of your home improvement needs. Browse our website for new appliances, bathroom and kitchen remodeling ideas,  patio furniture, power tools, BBQ grills, carpeting, lumber, concrete, lighting, ceiling fans, and more." name="description"/>
  <meta content="The Home Depot Canada" property="og:title"/>
  <meta content="website" property="og:type"/>
  <meta content="https://www.homedepot.ca/en/home.html" property="og:url"/>
  <meta content="Shop online at The Home Depot Canada for all of your home impr

In [108]:
#Build a list of all the category links, using the class I identified on the Home Depot homepage with the browser's inspect tool
categories = soup.find_all('a', class_='hdca-cms-category-banner__link')
categories

[<a class="hdca-cms-category-banner__link" href="https://www.homedepot.ca/en/home/categories/tools.html?intid=HP_D1_ALL_Tools_EN_NA">
 <div class="hdca-cms-category-banner__image-box">
 <picture>
 <source data-large="true" data-srcset="/content/dam/homedepot/images/homepage/shop-by-department/2020/oct-2020-homepage-shop-by-department-tools-200x147.png.imgtransform.82.1280.png" media="(min-width: 768px)"/>
 <source data-srcset="/content/dam/homedepot/images/homepage/shop-by-department/2020/oct-2020-homepage-shop-by-department-tools-140x118-mb.png.imgtransform.75.800.png"/>
 <img aria-hidden="true" class="lazy hdca-cms-category-banner__image" srcset="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"/>
 </picture>
 </div>
 <div class="hdca-button-container hdca-button-container--content-horizontal--center" data-component="ActionButton" data-component-id="767d63c7-c08b-4986-a0ab-91516e103f51/root/responsivegrid/categorybannercontai" data-component-instance="76

In [109]:
#list category names and URLs

cat_list = []
cat_url=[]
for cat in categories:
    cat_list.append(cat.text)
    cat_url.append(cat.get('href'))
    
print(cat_list)
print(cat_url)

['\n\n\n\n\n\n\n\n\n\nTools\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nAppliances\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Building Materials & \n      Plumbing\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nBath\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Lighting & Ceiling \n      Fans\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nOutdoors\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Home Décor & \n      Furniture\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Floors & Area \n      Rugs\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nKitchen\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Smart \n      Home\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Storage & \n      Organization\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nElectrical\n        \n\n\n\n\n\n\n\n']
['https://www.homedepot.ca/en/home/categories/tools.html?intid=HP_D1_ALL_Tools_EN_NA', 'https://www.homedepot.ca/en/home/ca

In [110]:
#Check what the list looks like    
print(cat_list)
print(cat_url)

# URL list looks fine, category names list requires some cleaning

['\n\n\n\n\n\n\n\n\n\nTools\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nAppliances\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Building Materials & \n      Plumbing\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nBath\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Lighting & Ceiling \n      Fans\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nOutdoors\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Home Décor & \n      Furniture\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Floors & Area \n      Rugs\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nKitchen\n        \n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Smart \n      Home\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\n\n      Storage & \n      Organization\n        \n\n\n\n\n\n\n\n\n', '\n\n\n\n\n\n\n\n\n\nElectrical\n        \n\n\n\n\n\n\n\n']
['https://www.homedepot.ca/en/home/categories/tools.html?intid=HP_D1_ALL_Tools_EN_NA', 'https://www.homedepot.ca/en/home/ca

In [111]:
#still needs some cleaning

cat_list1=[]

for i in cat_list:
    cat_list1.append(i.replace('\n',''))
    
    
    
cat_list1

['Tools        ',
 'Appliances        ',
 '      Building Materials &       Plumbing        ',
 'Bath        ',
 '      Lighting & Ceiling       Fans        ',
 'Outdoors        ',
 '      Home Décor &       Furniture        ',
 '      Floors & Area       Rugs        ',
 'Kitchen        ',
 '      Smart       Home        ',
 '      Storage &       Organization        ',
 'Electrical        ']

In [112]:
#Remove white spaces

cat_list = []

for i in cat_list1:
    cat_list.append(i.replace(' ',''))
    
cat_list

['Tools',
 'Appliances',
 'BuildingMaterials&Plumbing',
 'Bath',
 'Lighting&CeilingFans',
 'Outdoors',
 'HomeDécor&Furniture',
 'Floors&AreaRugs',
 'Kitchen',
 'SmartHome',
 'Storage&Organization',
 'Electrical']
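
The two `replace` passes above remove every space, which is why multi-word names end up glued together (`'BuildingMaterials&Plumbing'`). If readable names are preferred, one possible alternative is to collapse each run of whitespace into a single space. This is a sketch only; `clean_label` is a hypothetical helper name, and the sample strings are shortened versions of the raw labels shown earlier.

```python
def clean_label(raw):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    # str.split() with no argument splits on any whitespace and drops empties,
    # so joining the pieces with a single space also trims both ends.
    return " ".join(raw.split())


raw_labels = [
    "\n\n\nTools\n        \n\n",
    "\n\n      Building Materials & \n      Plumbing\n        \n",
]
labels = [clean_label(text) for text in raw_labels]
print(labels)  # ['Tools', 'Building Materials & Plumbing']
```

The later cells rely on the space-free names, so switching to this helper would change the values shown in the outputs that follow.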

In [113]:
#Check that my URL and category name lists are the same size

print(len(cat_list))
print(len(cat_url))

12
12


### From this I have the first data required to build a dataframe with Categories, Sub Categories and its attributes.

##### As a next step I will check how to access one of the URLs and what info there is to extract.


In [114]:
#check if the first URL looks about right
cat_url[0]
##Looks like it can also be cleaned a little

'https://www.homedepot.ca/en/home/categories/tools.html?intid=HP_D1_ALL_Tools_EN_NA'

In [115]:
cat_url1=[]
for i in cat_url:
    cat_url1.append(i.split('?')[0])
    
cat_url=cat_url1
cat_url

['https://www.homedepot.ca/en/home/categories/tools.html',
 'https://www.homedepot.ca/en/home/categories/appliances.html',
 'https://www.homedepot.ca/en/home/categories/building-materials.html',
 'https://www.homedepot.ca/en/home/categories/bath.html',
 'https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans.html',
 'https://www.homedepot.ca/en/home/categories/outdoors.html',
 'https://www.homedepot.ca/en/home/categories/decor.html',
 'https://www.homedepot.ca/en/home/categories/floors.html',
 'https://www.homedepot.ca/en/home/categories/kitchen.html',
 'https://www.homedepot.ca/en/home/categories/smart-home.html',
 'https://www.homedepot.ca/en/home/categories/decor/storage-and-organization.html',
 'https://www.homedepot.ca/en/home/categories/building-materials/electrical.html']
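
The `split('?')` trick above works because the tracking parameter is the only thing after the path. A slightly more defensive variant, shown here as a sketch, parses the URL with the standard library instead. `strip_query` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
from urllib.parse import urlsplit


def strip_query(url):
    """Drop the query string and fragment, keep scheme, host and path."""
    parts = urlsplit(url)
    return parts._replace(query="", fragment="").geturl()


tracked = "https://www.homedepot.ca/en/home/categories/tools.html?intid=HP_D1_ALL_Tools_EN_NA"
print(strip_query(tracked))
# https://www.homedepot.ca/en/home/categories/tools.html
```

Unlike a plain string split, this also removes a `#fragment` if one is present.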

In [117]:
cat_url[0]

'https://www.homedepot.ca/en/home/categories/tools.html'

In [118]:
#Now to the subcategory URLs
#Access the first category URL as a test. In the final loop this becomes one request per category URL.

#Keep the 5-second timeout, as I've been having trouble with requests hanging at this point
response = requests.get(cat_url[0],headers=headers, timeout=5) # Get first url in list
pagecontent = response.content.decode('utf-8')

#check if request worked
print(response.status_code)
#Use BeautifulSoup to parse and process the page content
soup = BeautifulSoup(response.content, 'lxml')
print(soup.prettify())

200
<!DOCTYPE HTML>
<html lang="en">
 <head typeof="og:website">
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no" name="viewport"/>
  <meta content="telephone=no" name="format-detection"/>
  <meta content="IE=edge" http-equiv="X-UA-Compatible"/>
  <title>
   Tools: Power Tools &amp; Hand Tools | The Home Depot Canada
  </title>
  <meta content="Shop our selection of power tools &amp; hand tools including electrical tools, plumbing tools, carpentry tools, construction tools &amp; more." name="description"/>
  <meta content="Tools" property="og:title"/>
  <meta content="website" property="og:type"/>
  <meta content="https://www.homedepot.ca/en/home/categories/tools.html" property="og:url"/>
  <meta content="Shop our selection of power tools &amp; hand tools including electrical tools, plumbing tools, carpentry tools, construction tools &amp; more." property="og:description"/>
  <meta content="text/ht

In [119]:
soup.find_all('a', class_='hdca-cms-side-navigation__list-link')

[<a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/all/collections/tools-special-value.html">
         Tools Special Value
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/tools/power-tools.html">
         Power Tools
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/tools/hand-tools.html">
         Hand Tools
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/tools/tool-storage.html">
         Tool Storage
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/tools/power-tool-accessories.html">
         Power Tools &amp; Accessories
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/tools/air-tools-and-compressors.html">
         Air Tools &amp; Compressors
       </a>,
 <a class="hdca-cms-side-navigation__list-link" href="/en/home/categories/appliances/vacuums-and-carpet-cleaners/wet-

In [120]:
#I need to feed two lists at the same time with the desired information so that they both have the same size and indexing,
#This way when they are zipped to each other I will have an accurate relationship between them.


tool_subcat=[]
tool_subcat_url=[]

for i in soup.find_all('a', class_='hdca-cms-side-navigation__list-link'):
    tool_subcat.append(i.text)
    tool_subcat_url.append(i.get('href'))
    
print(len(tool_subcat))   
print(len(tool_subcat_url))

37
37


In [121]:
#Again the name list requires some cleaning
tool_subcat

['\n        Tools Special Value\n      ',
 '\n        Power Tools\n      ',
 '\n        Hand Tools\n      ',
 '\n        Tool Storage\n      ',
 '\n        Power Tools & Accessories\n      ',
 '\n        Air Tools & Compressors\n      ',
 '\n        Wet & Dry Vacuums\n      ',
 '\n        Automotive\n      ',
 '\n        Woodworking Tools & Accessories\n      ',
 '\n        Apparel & Safety Gear\n      ',
 '\n        Ladders & Scaffolding\n      ',
 '\n        Welding & Soldering Torches\n      ',
 '\n        Milwaukee\n      ',
 '\n        DeWalt\n      ',
 '\n        RIDGID\n      ',
 '\n        RYOBI\n      ',
 '\n        Makita\n      ',
 '\n        Husky\n      ',
 '\n        Bosch\n      ',
 '\n        Klein Tools\n      ',
 '\n        Diablo\n      ',
 '\n        Dremel\n      ',
 '\n        Lincoln Electric\n      ',
 '\n        FEIN\n      ',
 '\n        Tradesman\n      ',
 '\n        Milwaukee\n      ',
 '\n        Hardware\n      ',
 '\n        Outdoor Power Equipment\n    

In [122]:
#It's a small list, so a couple of loops quickly clean the data

tool_subcat1= []
for i in tool_subcat:
    tool_subcat1.append(i.replace(' ',''))
  
tool_subcat=[]
    
for i in tool_subcat1:
    tool_subcat.append(i.replace('\n',''))
    
tool_subcat    

['ToolsSpecialValue',
 'PowerTools',
 'HandTools',
 'ToolStorage',
 'PowerTools&Accessories',
 'AirTools&Compressors',
 'Wet&DryVacuums',
 'Automotive',
 'WoodworkingTools&Accessories',
 'Apparel&SafetyGear',
 'Ladders&Scaffolding',
 'Welding&SolderingTorches',
 'Milwaukee',
 'DeWalt',
 'RIDGID',
 'RYOBI',
 'Makita',
 'Husky',
 'Bosch',
 'KleinTools',
 'Diablo',
 'Dremel',
 'LincolnElectric',
 'FEIN',
 'Tradesman',
 'Milwaukee',
 'Hardware',
 'OutdoorPowerEquipment',
 'Generators',
 'DrywallTools',
 'ElectricalTools',
 'PlumbingTools',
 'Sandpaper&SandingSponges',
 'GarageOrganization',
 'PaintTools',
 'PortableFans',
 'ShopNow']

#### Now let me see how to get a loop of the subcategories working

In [123]:
#Build a list of (category, URL) tuples

Categories = list(zip(cat_list,cat_url))
print(Categories)

[('Tools', 'https://www.homedepot.ca/en/home/categories/tools.html'), ('Appliances', 'https://www.homedepot.ca/en/home/categories/appliances.html'), ('BuildingMaterials&Plumbing', 'https://www.homedepot.ca/en/home/categories/building-materials.html'), ('Bath', 'https://www.homedepot.ca/en/home/categories/bath.html'), ('Lighting&CeilingFans', 'https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans.html'), ('Outdoors', 'https://www.homedepot.ca/en/home/categories/outdoors.html'), ('HomeDécor&Furniture', 'https://www.homedepot.ca/en/home/categories/decor.html'), ('Floors&AreaRugs', 'https://www.homedepot.ca/en/home/categories/floors.html'), ('Kitchen', 'https://www.homedepot.ca/en/home/categories/kitchen.html'), ('SmartHome', 'https://www.homedepot.ca/en/home/categories/smart-home.html'), ('Storage&Organization', 'https://www.homedepot.ca/en/home/categories/decor/storage-and-organization.html'), ('Electrical', 'https://www.homedepot.ca/en/home/categories/building-materials/

In [124]:
#Just check that the list is in the expected order
for cat in Categories:
    print(str(cat[0]))

Tools
Appliances
BuildingMaterials&Plumbing
Bath
Lighting&CeilingFans
Outdoors
HomeDécor&Furniture
Floors&AreaRugs
Kitchen
SmartHome
Storage&Organization
Electrical


### Now I loop over each category, find all its subcategories, and create new lists of the same size holding the category, the subcategory and the subcategory URL

In [125]:
#Set empty lists

CAT_list=[]
subcat=[]
subcat_url=[]


# Set a user agent so that homedepot.com is less likely to refuse the request; this seems to be working for now
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36'}

#Loop over the category URLs to retrieve the subcategories and their URLs
# At this point a try/except block is probably recommended, but I'll try without one for now

for cat in Categories:
    
    # keep the 5-second timeout, as I've been having trouble at this point
    response = requests.get(cat[1],headers=headers, timeout=5) # cat[1] is the category URL
    pagecontent = response.content.decode('utf-8')
    #check if request worked - just some flow control
    print(str(cat[0]) +' URL Response: ' + str(response.status_code))
        
    #Set each subcat soup
    soup = BeautifulSoup(response.content, 'lxml')
        
       
    for sub in soup.find_all('a', class_='hdca-cms-side-navigation__list-link'):
            
        subcat.append(sub.text)
        subcat_url.append(sub.get('href'))
        CAT_list.append(cat[0])

    # I expect these lists to have the same number of elements, so I print the lengths after each iteration to check that it's working
    print(str(len(subcat))+ ' '+str(len(subcat_url))+ ' '+str(len(CAT_list)))   
                    
                  

Tools URL Response: 200
37 37 37
Appliances URL Response: 200
86 86 86
BuildingMaterials&Plumbing URL Response: 200
107 107 107
Bath URL Response: 200
129 129 129
Lighting&CeilingFans URL Response: 200
169 169 169
Outdoors URL Response: 200
206 206 206
HomeDécor&Furniture URL Response: 200
239 239 239
Floors&AreaRugs URL Response: 200
263 263 263
Kitchen URL Response: 200
302 302 302
SmartHome URL Response: 200
357 357 357
Storage&Organization URL Response: 200
380 380 380
Electrical URL Response: 200
415 415 415
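
The loop above keeps three parallel lists in step by appending to all of them in the same iteration. An alternative that makes the alignment structural is to append one record per link. The sketch below assumes a hypothetical `get_links(url)` callable that returns `(label, href)` pairs for a page; in this notebook that role is played by the `requests.get` call plus the `soup.find_all(...)` loop. `build_rows` and `get_links` are illustrative names only.

```python
def build_rows(categories, get_links):
    """Return one dict per subcategory link, tagged with its parent category.

    categories: iterable of (name, url) tuples, like the Categories list.
    get_links:  hypothetical callable, url -> iterable of (label, href) pairs.
    """
    rows = []
    for name, url in categories:
        for label, href in get_links(url):
            # Category, label and URL travel together in one record,
            # so they cannot drift out of step with each other.
            rows.append(
                {"Category": name, "Subcategory": label, "Subcategory_URL": href}
            )
    return rows


def demo_links(url):
    """Stand-in for a real page fetch, used only to exercise the sketch."""
    pages = {
        "tools-url": [("Power Tools", "/en/home/categories/tools/power-tools.html")],
        "bath-url": [],
    }
    return pages[url]


demo_rows = build_rows([("Tools", "tools-url"), ("Bath", "bath-url")], demo_links)
print(demo_rows)
```

A list of dicts like this can be passed directly to `pd.DataFrame(rows)`, which uses the dict keys as column names.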


#### Now let's check what the lists look like

In [126]:
pd.DataFrame(list(zip(CAT_list, subcat, subcat_url)),
            columns=['Category','Subcategory','Subcategory_URL'])
                    


Unnamed: 0,Category,Subcategory,Subcategory_URL
0,Tools,\n Tools Special Value\n,/en/home/categories/all/collections/tools-spec...
1,Tools,\n Power Tools\n,/en/home/categories/tools/power-tools.html
2,Tools,\n Hand Tools\n,/en/home/categories/tools/hand-tools.html
3,Tools,\n Tool Storage\n,/en/home/categories/tools/tool-storage.html
4,Tools,\n Power Tools & Accessories\n,/en/home/categories/tools/power-tool-accessori...
...,...,...,...
410,Electrical,\n Iberville\n,https://www.homedepot.ca/en/home/categories/bu...
411,Electrical,\n Carlon\n,https://www.homedepot.ca/en/home/categories/bu...
412,Electrical,\n Siemens\n,https://www.homedepot.ca/en/home/categories/bu...
413,Electrical,\n Square D\n,https://www.homedepot.ca/en/home/categories/bu...


It seems that I still need to clean the subcategory names list to get rid of white spaces and '\n'.

In [127]:
subcat1= []
for i in subcat:
    subcat1.append(i.replace(' ',''))
  
subcat=[]
    
for i in subcat1:
    subcat.append(i.replace('\n',''))
    
subcat    

['ToolsSpecialValue',
 'PowerTools',
 'HandTools',
 'ToolStorage',
 'PowerTools&Accessories',
 'AirTools&Compressors',
 'Wet&DryVacuums',
 'Automotive',
 'WoodworkingTools&Accessories',
 'Apparel&SafetyGear',
 'Ladders&Scaffolding',
 'Welding&SolderingTorches',
 'Milwaukee',
 'DeWalt',
 'RIDGID',
 'RYOBI',
 'Makita',
 'Husky',
 'Bosch',
 'KleinTools',
 'Diablo',
 'Dremel',
 'LincolnElectric',
 'FEIN',
 'Tradesman',
 'Milwaukee',
 'Hardware',
 'OutdoorPowerEquipment',
 'Generators',
 'DrywallTools',
 'ElectricalTools',
 'PlumbingTools',
 'Sandpaper&SandingSponges',
 'GarageOrganization',
 'PaintTools',
 'PortableFans',
 'ShopNow',
 'AppliancePromotions',
 'ShopAllRefrigerators',
 'FrenchDoorRefrigerators',
 'FreezerlessRefrigerators',
 'BottomFreezerRefrigerators',
 'SideBySideRefrigerators',
 'MiniRefrigerators',
 'TopFreezerRefrigerators',
 'Propane&SolarRefrigerators',
 'RefrigeratorParts',
 'Freezers&IceMakers',
 'Wine&BeverageCoolers',
 'ShopAllWasher&Dryer',
 'Washers',
 'Dryers',

In [128]:
pd.DataFrame(list(zip(CAT_list, subcat, subcat_url)),
            columns=['Category','Subcategory','Subcategory_URL'])

Unnamed: 0,Category,Subcategory,Subcategory_URL
0,Tools,ToolsSpecialValue,/en/home/categories/all/collections/tools-spec...
1,Tools,PowerTools,/en/home/categories/tools/power-tools.html
2,Tools,HandTools,/en/home/categories/tools/hand-tools.html
3,Tools,ToolStorage,/en/home/categories/tools/tool-storage.html
4,Tools,PowerTools&Accessories,/en/home/categories/tools/power-tool-accessori...
...,...,...,...
410,Electrical,Iberville,https://www.homedepot.ca/en/home/categories/bu...
411,Electrical,Carlon,https://www.homedepot.ca/en/home/categories/bu...
412,Electrical,Siemens,https://www.homedepot.ca/en/home/categories/bu...
413,Electrical,SquareD,https://www.homedepot.ca/en/home/categories/bu...


The URL list also has an inconsistency: I need to normalize the subcategory URLs so that they all contain the complete URL. Note that some only have the path that comes after www.homedepot.ca.

In [129]:
clean_subcat_url=[]

for i in subcat_url:
    try:
        if i[:3]=='/en':
            n='https://www.homedepot.ca'+i # prepend the site root to items that only returned a relative path
            clean_subcat_url.append(n)
                        
        else:
            n=i                            # items with a full URL keep the same value
            clean_subcat_url.append(n)
    except TypeError:                      # href was None, so i[:3] cannot be evaluated
        pass                               # caution: skipping an item makes this list shorter than subcat and CAT_list

In [130]:
len(clean_subcat_url)

414
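
Note that the count here is 414 while the name lists hold 415 items, so one entry was skipped and every URL after it lines up with the wrong name once the lists are zipped. A possible way to avoid that, sketched below, is to emit exactly one output per input and keep a `None` placeholder for links without a usable `href`. That the skipped item was a missing `href` is an assumption; `absolutize` and `BASE_URL` are hypothetical names.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.homedepot.ca"


def absolutize(hrefs, base=BASE_URL):
    """Return absolute URLs, one per input, so positions stay aligned.

    Relative paths are joined onto the base, absolute URLs pass through
    unchanged, and missing values stay None instead of being dropped.
    """
    return [urljoin(base, href) if href else None for href in hrefs]


sample = [
    "/en/home/categories/tools/power-tools.html",
    None,
    "https://www.homedepot.ca/en/home/categories/bath.html",
]
fixed = absolutize(sample)
print(len(sample), len(fixed))  # 3 3
```

Rows with a `None` URL can then be filtered out after the DataFrame is built, which removes the name and the URL together.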

### Now I can set a Dataframe with the correct CAT, Sub CAT and each URL

In [131]:
#set dataframe
HomeDepotDF = pd.DataFrame(list(zip(CAT_list, subcat, clean_subcat_url)),
            columns=['Category','Subcategory','Subcategory_URL'])
#check data frame
HomeDepotDF

Unnamed: 0,Category,Subcategory,Subcategory_URL
0,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...
1,Tools,PowerTools,https://www.homedepot.ca/en/home/categories/to...
2,Tools,HandTools,https://www.homedepot.ca/en/home/categories/to...
3,Tools,ToolStorage,https://www.homedepot.ca/en/home/categories/to...
4,Tools,PowerTools&Accessories,https://www.homedepot.ca/en/home/categories/to...
...,...,...,...
409,Electrical,GoogleNest,https://www.homedepot.ca/en/home/categories/bu...
410,Electrical,Iberville,https://www.homedepot.ca/en/home/categories/bu...
411,Electrical,Carlon,https://www.homedepot.ca/en/home/categories/bu...
412,Electrical,Siemens,https://www.homedepot.ca/en/home/categories/bu...


### Getting Closer...

### From this point, I need to loop into each URL to retrieve the item names and URLs, and from each Item URL get a list of attributes

I expect some of the URL requests to fail, so I have to make sure to use a "try / except: pass" for those.
In this case, I'm assuming that since many of the attributes are repeated, I won't lose any important data if a URL fails.
After all, the objective is a list of the attributes used in each subcategory and its breadcrumb.
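
One way to skip failing URLs without losing track of them is to record each failure instead of passing silently. The following is a sketch under stated assumptions: `collect_pages` is a hypothetical helper, and `fetch` stands for any callable that returns the page content for a URL or raises on failure (for example, a thin wrapper around `requests.get`).

```python
def collect_pages(urls, fetch):
    """Fetch every URL, skipping failures but remembering which ones failed.

    fetch: hypothetical callable, url -> page content, raising on error.
    Returns (pages, failed): a dict of url -> content and a list of
    (url, error description) tuples.
    """
    pages, failed = {}, []
    for url in urls:
        try:
            pages[url] = fetch(url)
        except Exception as err:  # broad on purpose: any failure is skipped
            failed.append((url, repr(err)))
    return pages, failed
```

Checking `failed` after a run shows how many pages were skipped, which helps confirm the assumption that nothing important was lost.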


In [42]:
#Test scraping the first url in the list

# this time I will set a maximum time out, as I've been having trouble at this point
response = requests.get(clean_subcat_url[0],headers=headers, timeout=5) # Get first url in list
pagecontent = response.content.decode('utf-8')
#check if request worked - just some flow control
print(response)
        
#Set each subcat soup
soup = BeautifulSoup(response.content, 'lxml')

<Response [200]>


In [134]:
clean_subcat_url[0]

'https://www.homedepot.ca/en/home/categories/all/collections/tools-special-value.html'

In [43]:
sub_subcategory=[]

for subsub in soup.find_all('a', class_='hdca-cms-side-navigation__list-link'):
    print(subsub.text)
    print(subsub.get('href'))


        Power Tools
      
https://www.homedepot.ca/en/home/categories/tools/power-tools/f/wi0-j2z-ni6

        Power Tool Accessories
      
https://www.homedepot.ca/en/home/categories/tools/power-tool-accessories/f/uoe-j2z-ni6

        Tool Storage
      
https://www.homedepot.ca/en/home/categories/tools/tool-storage/f/gpz-j2z-ni6

        Hand Tools
      
https://www.homedepot.ca/en/home/categories/tools/hand-tools/f/92u-j2z-ni6

        Air Tools & Compressors
      
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors/f/plr-j2z-ni6

        Apparel & Safety Gear
      
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/f/2ne-j2z-ni6

        Woodworking Tools
      
https://www.homedepot.ca/en/home/categories/tools/power-tools/woodworking-tools/f/2t0-j2z-ni6

        All Tools Savings
      
https://www.homedepot.ca/en/home/categories/tools/f/ie8-j2z-ni6

        Milwaukee
      
https://www.homedepot.ca/en/home/categories/tools/f/ie

In [135]:
# build a list of (subcategory, subcategory URL) tuples
subcategories=list(zip(subcat,clean_subcat_url))

In [136]:
for i in subcategories:
    print(i[1] + ' - '+i[0])

https://www.homedepot.ca/en/home/categories/all/collections/tools-special-value.html - ToolsSpecialValue
https://www.homedepot.ca/en/home/categories/tools/power-tools.html - PowerTools
https://www.homedepot.ca/en/home/categories/tools/hand-tools.html - HandTools
https://www.homedepot.ca/en/home/categories/tools/tool-storage.html - ToolStorage
https://www.homedepot.ca/en/home/categories/tools/power-tool-accessories.html - PowerTools&Accessories
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors.html - AirTools&Compressors
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-cleaners/wet-and-dry-vacuums.html - Wet&DryVacuums
https://www.homedepot.ca/en/home/categories/tools/automotive.html - Automotive
https://www.homedepot.ca/en/home/categories/tools/power-tools/woodworking-tools.html - WoodworkingTools&Accessories
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear.html - Apparel&SafetyGear
https://www.homedepot.ca/en/

In [137]:
# Verify what the first 3 lines look like
for c in subcategories[:3]:
    print(c[1])

https://www.homedepot.ca/en/home/categories/all/collections/tools-special-value.html
https://www.homedepot.ca/en/home/categories/tools/power-tools.html
https://www.homedepot.ca/en/home/categories/tools/hand-tools.html


In [138]:
## Again I will use empty lists that are fed on each iteration, making sure they have the same number of elements

sub_subcategory=[]
sub_subcategoryURL=[]
subcat2=[]

for c in subcategories:

    # get response for each subcategory url
    response = requests.get(c[1],headers=headers, timeout=5) # c[1] is the subcategory URL
    pagecontent = response.content.decode('utf-8')
    #check if request worked - just some flow control
    print(str(c[1])+' '+str(response))

    #Set each subcat soup
    soup = BeautifulSoup(response.content, 'lxml')

    for subsub in soup.find_all('a', class_='hdca-cms-side-navigation__list-link'):
        print(subsub.text)
        sub_subcategory.append(subsub.text)
        sub_subcategoryURL.append(subsub.get('href'))
        subcat2.append(c[0])

        
print(len(subcat2))
print(len(sub_subcategory))
print(len(sub_subcategoryURL))
    

https://www.homedepot.ca/en/home/categories/all/collections/tools-special-value.html <Response [200]>

        Power Tools
      

        Power Tool Accessories
      

        Tool Storage
      

        Hand Tools
      

        Air Tools & Compressors
      

        Apparel & Safety Gear
      

        Woodworking Tools
      

        All Tools Savings
      

        Milwaukee
      

        DeWalt
      

        Ryobi
      

        RIDGID
      

        Husky
      

        Makita
      

        Bosch
      

        Below $99
      

        $100 - $399
      

        $400 - $799
      

        $800 - $999
      

        Above $1000
      

        Shop All New Low Prices
      

        Shop All Tools Special Buys
      
https://www.homedepot.ca/en/home/categories/tools/power-tools.html <Response [200]>

        Tools Special Value
      

        Drills
      

        Saws
      

        Combo Kits
      

        Grinders
      

        Impact Drivers
      

https://www.homedepot.ca/en/home/categories/tools/welding-and-soldering-torches.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A1%252Fhd-classes%252Fl1-tools%3AmanufacturerName%3AMilwaukee%2BTool <Response [200]>
https://www.homedepot.ca/en/home/categories/tools.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A1%252Fhd-classes%252Fl1-tools%3AmanufacturerName%3ADEWALT <Response [200]>
https://www.homedepot.ca/en/home/categories/tools.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A1%252Fhd-classes%252Fl1-tools%3AmanufacturerName%3ARIDGID <Response [200]>
https://www.homedepot.ca/en/home/categories/tools.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A1%252Fhd-classes%252Fl1-tools%3AmanufacturerName%3ARYOBI <Response [200]>
https://www.homedepot.ca/en/home/categories/tools.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A1%252Fhd-classes%252Fl1-tools%3AmanufacturerName%3AMAK

https://www.homedepot.ca/en/home/categories/appliances/refrigerators/french-door-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/freezerless-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/bottom-freezer-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/side-by-side-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/mini-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/top-freezer-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/propane-and-solar-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/appliances-parts-and-accessories/refrigerator-parts.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appli

https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-purifiers-and-filters.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/heat-recovery-ventilators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/heaters.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/humidifiers-and-dehumidifiers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/thermostats.html <Response [200]>
https://www.homedepot.ca/en/home/home-services/home-depot-installer/water-heater-installation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/small-appliances.html <Response [200]>

        Blenders
      

        Coffee Makers
      

        Food Processors
      

        Griddler, Sandwich & Waffle Makers
      

    


        Sheathing Plywood
      

        Waferboards
      

        Hardwood Plywood
      

        Sanded Plywood
      

        Pressure Treated Plywood
      

        MDF sheets
      

        Underlayment
      

        Shop All Plywood, MDF & OSB
      

        Deck Boards
      

        Deck Railings
      

        Deck Posts & Accessories
      

        Lattices
      

        Privacy Screens
      

        Shop All Decking
      

        Dimensional Lumber
      

        Studs
      

        Wood Shim
      

        Drywall Steel Studs & Framing
      

        Shop All Framing Lumber & Studs
      

        Pine Boards
      

        Cedar Boards
      

        Hardwood Boards
      

        Weathered Barn Boards
      

        Shelving
      

        Shop All Appearance Boards
      
https://www.homedepot.ca/en/home/categories/building-materials/moulding-and-millwork.html <Response [200]>

        Millwork Accents
      

        Moulding
      

      

https://www.homedepot.ca/en/home/categories/bath/toilets-toilet-seats-and-bidets.html <Response [200]>

        Bidets & Bidet Seats
      

        Toilet Bowls
      

        Toilet Seats
      

        Toilet Tanks
      

        Toilets
      

        Urinals
      

        TOTO
      

        American Standard
      

        Kohler
      

        Glacier Bay
      

        Fill Valves
      

        Toilet Repair Kits & Flappers
      

        Seals Gaskets & Wax Rings
      

        Supply Line
      

        Bath Safety
      

        Tank Levers
      

        Toilet Bolt Caps
      
https://www.homedepot.ca/en/home/categories/all/collections/home-depot-vanity-collections.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/vanity-lighting.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-furniture.html <Response [200]>

        Bathroom Shelves & Shower Niches
      

        Bathroom Cabinets 

https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/outdoor-lighting/deck-and-step-lights.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/outdoor-lighting/outdoor-decorative-lighting.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/lamps-and-shades/table-lamps.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/lamps-and-shades/floor-lamps.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/lamps-and-shades/desk-lamps.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/lamps-and-shades/lamp-shades.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/commercial-lighting/shop-lights.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/commercial-lighting/commercial-strip-lights.html <Respon

https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment.html <Response [200]>

        Pressure Washers
      

        Lawn Mowers
      

        Lawn Tractors
      

        Leaf Blowers
      

        Trimmers & Edgers
      

        Combo Kits
      

        Chainsaws
      

        Pole Saws
      

        Tillers & Cultivators
      

        Log Splitters
      

        Wood Chippers
      

        Generators
      

        Outdoor Power Equipment Parts & Accessories
      

        RYOBI
      

        TORO
      

        John Deere
      

        Milwaukee
      

        ECHO
      

        Black Decker
      

        Dewalt
      

        Sun Joe
      

        Lawnboy
      

        Powersmart
      

        Chainsaw Parts & Accessories
      

        Gas Cans
      

        Lawn Mower Parts & Accessories
      

        Lawn Tractor Attachments
      

        Lawn Tractor Parts & Accessories
      

        Leaf Blower Parts & Acc

https://www.homedepot.ca/en/home/categories/outdoors/patio-furniture/outdoor-heating/outdoor-log-storage.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-cooking-and-bbqs/propane-bbqs.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-cooking-and-bbqs/natural-gas-bbqs.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-cooking-and-bbqs/kettle-and-charcoal-bbqs.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-cooking-and-bbqs/outdoor-cookware-and-accessories.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-cooking-and-bbqs/bbq-replacement-parts.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/leaf-blowers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/lawn-mowers.html <Response [200]>

        Self Propelled Lawn Mowers
    

https://www.homedepot.ca/en/home/categories/decor/wallpaper-and-supplies.html <Response [200]>

        Wallpaper
      

        Wallpaper Samples
      

        Tools & Supplies
      

        Peel & Stick Wallpaper
      

        Paintable Wallpaper
      

        Baby & Kids Wallpaper
      

        Textured Wallpaper
      

        Graham & Brown
      

        Joanna Gaines
      

        RoomMates
      

        York Wallcoverings
      

        Wall Murals
      

        Wall Decals
      

        Wall Panels
      

        Wall Tiles
      

        Window Film
      

        Con-Tact
      

        Shelf Liner
      
https://www.homedepot.ca/en/home/categories/decor/window-coverings.html <Response [200]>

        Blinds & Shades
      

        Curtain Rods & Hardware
      

        Curtains & Drapes
      

        Window Film
      

        Ready to Go Blinds
      

        Special Order Blinds
      

        Virtual & In-Home Design
      

        Liner

https://www.homedepot.ca/en/home/categories/decor/home-decor-accents/wall-decor/wall-clocks.html <Response [200]>
https://www.homedepot.ca/en/home/categories/decor/home-decor-accents.html <Response [200]>

        Candles & Holders
      

        Decorative Bowls, Plates & Trays
      

        Decorative Chests, Boxes & Cages
      

        Decorative Pillows & Cushions
      

        Figurines & Sculptures
      

        Incense, Diffusers & Fragrances
      

        Indoor Fountains
      

        Magazine Racks
      

        Picture Frames
      

        Room Dividers
      

        Vases
      

        Wall Décor
      

        Artificial Flowers & Plants
      

        Accent Chairs
      

        Accent Tables
      

        Table Lamps
      

        Wallpaper
      
https://www.homedepot.ca/en/home/categories/decor/storage-and-organization/closet-storage-and-organization.html <Response [200]>

        Closet Doors
      

        Closet Drawers 
      

       

https://www.homedepot.ca/en/home/categories/floors/exercise-and-gym-flooring.html <Response [200]>
https://www.homedepot.ca/en/home/categories/floors/flooring-samples.html <Response [200]>
https://www.homedepot.ca/en/home/categories/floors/flooring-tools-and-accessories.html <Response [200]>

        Adhesives
      

        Carpet Tools & Accessories
      

        Floor & Tile Spacers
      

        Floor Installation Kits
      

        Floor Moulding & Trim
      

        Floor Protection Materials
      

        Floor Rollers
      

        Floor Scrapers & Strippers
      

        Floor Tape
      

        Tile Setting & Grout
      

        Tile Tools & Accessories
      

        Transition Strips
      

        Underlayment & Surface Prep
      

        Wood, Laminate & Vinyl Tools
      

        Floor Baseboards
      

        Floor Cleaners
      

        Flooring Nailers
      

        Floor Registers
      
https://www.homedepot.ca/en/home/categories/floors

https://www.homedepot.ca/en/home/categories/appliances.html <Response [200]>

        Appliance Promotions
      

        Shop All Refrigerators
      

        French Door Refrigerators
      

        Freezerless Refrigerators
      

        Bottom Freezer Refrigerators
      

        Side By Side Refrigerators
      

        Mini Refrigerators
      

        Top Freezer Refrigerators
      

        Propane & Solar Refrigerators
      

        Refrigerator Parts
      

        Freezers & Ice Makers
      

        Wine & Beverage Coolers
      

        Shop All Washer & Dryer
      

        Washers
      

        Dryers
      

        Washer & Dryer Parts
      

        Dishwashers
      

        Shop All Cooking
      

        Cooktops
      

        Microwaves
      

        Ranges
      

        Range Hoods
      

        Wall Ovens
      

        Range Hood Parts
      

        Range & Cooktop Parts
      

        Over-the-Range Microwaves
      

        Sh

https://www.homedepot.ca/en/home/categories/kitchen/cooking-and-food-preparation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/kitchen/kitchen-storage-and-organization.html <Response [200]>

        Cabinet Organization
      

        Drawer Organization
      

        Flatware & Utensil Holders
      

        Kitchen Racks & Dividers
      

        Towel & Napkin Holder
      

        Kitchen Cabinets & Drawers
      

        Storage Bins & Totes
      

        Storage Cabinets
      

        Storage Shelves & Racks
      
https://www.homedepot.ca/en/home/categories/kitchen/kitchen-island-and-carts.html? <Response [200]>
https://www.homedepot.ca/en/home/categories/floors/tile.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%2Fhd-classes%2Fl1-floors%2Fl2-floor-walltile%3Aattributeidtileusedfor2018%3ABacksplash+Tile <Response [200]>
https://www.homedepot.ca/en/home/categories/decor/furniture/kitchen-and-dining-room-furniture.html?NAVID=CLP_LN_RC_Kitc

https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers/washers/f/bok-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers/dryers/f/59g-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/dishwashers/f/r2d-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/wall-ovens/f/sm-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/microwaves/f/2at-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-purifiers-and-filters/f/pg0-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-and-furnace-filters/f/f98-x79 <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/home-security-and-surveillance/alarm-systems-and-sensors/water-leak-detectors/f/14c8-x79 <Response [200]>
https://www.homedepot.ca/en/ho

https://www.homedepot.ca/en/home/categories/decor/storage-and-organization/f/tzw-mhz <Response [200]>
https://www.homedepot.ca/en/home/categories/decor/storage-and-organization.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-decor%252Fl2-storage-organization%3AmanufacturerName%3ARubbermaid# <Response [200]>
https://www.homedepot.ca/en/home/categories/decor/storage-and-organization.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-decor%252Fl2-storage-organization%3AmanufacturerName%3AGladiator# <Response [200]>
https://www.homedepot.ca/en/home/categories/decor/storage-and-organization.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-decor%252Fl2-storage-organization%3AmanufacturerName%3AUmbra# <Response [200]>
https://www.homedepot.ca/en/home/home-services/home-depot-installer/storage-unit-rental.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/ele

https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/ceiling-lights/flush-mount-lighting.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/outdoor-lighting.html <Response [200]>

        Outdoor Wall Lights
      

        Outdoor Ceiling Lights
      

        Outdoor Decorative Lighting
      

        Landscape Lighting
      

        Security Lights
      

        Post Lights
      

        Deck and Step Lights
      

        Shop All Outdoor Lighting
      

        Solar Landscape Lighting
      

        Solar Wall Lights
      

        Solar Deck Lights
      

        Solar Security Lights
      

        Solar Post Lights
      

        Motion Sensor Wall Lights
      

        Motion Sensor Ceiling Lights
      

        Dusk to Dawn Wall Lights
      

        Dusk to Dawn Ceiling Lights
      

        Dusk to Dawn Post Lights
      

        Dusk to Dawn Security Lights
      

        Motion Sensor Security

### Again the sub_subCategory name and URL require some cleaning steps

In [141]:
sub_subcategory[:10]

['\n        Power Tools\n      ',
 '\n        Power Tool Accessories\n      ',
 '\n        Tool Storage\n      ',
 '\n        Hand Tools\n      ',
 '\n        Air Tools & Compressors\n      ',
 '\n        Apparel & Safety Gear\n      ',
 '\n        Woodworking Tools\n      ',
 '\n        All Tools Savings\n      ',
 '\n        Milwaukee\n      ',
 '\n        DeWalt\n      ']

In [142]:
## Apply the same logic to clean names: remove every space, then every newline
sub_subcategory1 = [name.replace(' ', '') for name in sub_subcategory]
sub_subcategory = [name.replace('\n', '') for name in sub_subcategory1]

sub_subcategory

['PowerTools',
 'PowerToolAccessories',
 'ToolStorage',
 'HandTools',
 'AirTools&Compressors',
 'Apparel&SafetyGear',
 'WoodworkingTools',
 'AllToolsSavings',
 'Milwaukee',
 'DeWalt',
 'Ryobi',
 'RIDGID',
 'Husky',
 'Makita',
 'Bosch',
 'Below$99',
 '$100-$399',
 '$400-$799',
 '$800-$999',
 'Above$1000',
 'ShopAllNewLowPrices',
 'ShopAllToolsSpecialBuys',
 'ToolsSpecialValue',
 'Drills',
 'Saws',
 'ComboKits',
 'Grinders',
 'ImpactDrivers',
 'Sanders',
 'Routers',
 'Planers&Jointers',
 'OscillatingTools',
 'RotaryTools',
 'Polishers',
 'HeatGuns',
 'Shears&Nibblers',
 'InspectionCameras',
 '3DPrinters&Accessories',
 'Radios',
 'SpecialtyPowerTools',
 '12V',
 '18V',
 '20V',
 '36V',
 '48V',
 '60V',
 '100V+',
 'DrillBits',
 'SawBlades',
 'Batteries&Chargers',
 'WoodworkingToolAccessories',
 'ShopAllPowerToolAccessories',
 'Drills',
 'Saws',
 'ImpactDrivers',
 'Grinders',
 'ShopAllBareTools',
 'MitreSaws',
 'CircularSaws',
 'TableSaws',
 'ReciprocatingSaws',
 'ShopAllSaws',
 'DrillDrivers'
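
The cleaning step above removes every space, which is why 'Power Tools' ends up as 'PowerTools'. If readable labels are preferred, one possible alternative is to collapse whitespace instead of deleting it. This is only a sketch: the helper name `clean_label` and the short sample list are illustrative and not part of the notebook.

```python
def clean_label(raw):
    """Collapse runs of whitespace (spaces, newlines) into single spaces and trim both ends."""
    return ' '.join(raw.split())

# Sample values copied from the raw scraped names shown earlier
raw_names = ['\n        Power Tools\n      ',
             '\n        Air Tools & Compressors\n      ']

clean_names = [clean_label(name) for name in raw_names]
print(clean_names)
```

Keeping the inner spaces changes the label values, so any later join on these names would need both tables cleaned the same way.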

In [143]:
### Apply the same logic to clean URLs

sub_subcategoryURL1 = []

for i in sub_subcategoryURL:
    try:
        if i[:3] == '/en':
            n = 'https://www.homedepot.ca' + i  # prepend the site root to links that came back as relative paths
            sub_subcategoryURL1.append(n)
        else:
            n = i                               # links that are already full URLs are kept unchanged
            sub_subcategoryURL1.append(n)
    except TypeError:
        # a link without an href comes back as None and cannot be sliced; skip it
        pass
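
As an alternative to checking the '/en' prefix by hand, the standard library's `urllib.parse.urljoin` resolves relative links against a base and leaves absolute links untouched. The sketch below assumes the same site root used above; the helper name `absolutize` and the sample list are illustrative only.

```python
from urllib.parse import urljoin

BASE_URL = 'https://www.homedepot.ca'

def absolutize(href, base=BASE_URL):
    """Return an absolute URL, or None when the link has no href."""
    if not href:
        return None
    return urljoin(base, href)

# Sample links: one relative, one already absolute, one missing
links = ['/en/home/categories/tools.html',
         'https://www.homedepot.ca/en/home/categories/appliances.html',
         None]

absolute_links = [absolutize(link) for link in links]
print(absolute_links)
```

Returning None for a missing link, instead of skipping it, keeps the URL list the same length as the list of names, which matters when the two are zipped together afterwards.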


In [144]:
#Set a Data frame with the new information

Sub_Category_DF= pd.DataFrame(list(zip(subcat2, sub_subcategory, sub_subcategoryURL1)),
                              columns=['SubCategory','Sub_SUB_Category','Sub_Sub_category_URL'])
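
One thing to keep in mind with `zip` is that it stops at the shortest input, so if one list is shorter than the others (for example because a URL was skipped during cleaning), rows are dropped or shifted without any error. A small length check before building the data frame can catch that. The helper `check_same_length` and the toy lists below are hypothetical and only illustrate the idea.

```python
def check_same_length(**columns):
    """Raise ValueError if the named lists do not all have the same length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f'Column lengths differ: {lengths}')
    return next(iter(lengths.values()), 0)

# Toy lists standing in for the name and URL lists; one URL is missing on purpose
names = ['PowerTools', 'HandTools']
urls = ['https://www.homedepot.ca/en/home/categories/tools/power-tools.html']

try:
    check_same_length(names=names, urls=urls)
except ValueError as err:
    print(err)
```

On Python 3.10 or later, `zip(..., strict=True)` raises a similar error on its own.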

In [145]:
Sub_Category_DF

      SubCategory           Sub_SUB_Category                  Sub_Sub_category_URL
0     ToolsSpecialValue     PowerTools                        https://www.homedepot.ca/en/home/categories/to...
1     ToolsSpecialValue     PowerToolAccessories              https://www.homedepot.ca/en/home/categories/to...
2     ToolsSpecialValue     ToolStorage                       https://www.homedepot.ca/en/home/categories/to...
3     ToolsSpecialValue     HandTools                         https://www.homedepot.ca/en/home/categories/to...
4     ToolsSpecialValue     AirTools&Compressors              https://www.homedepot.ca/en/home/categories/to...
...   ...                   ...                               ...
2712  Lighting&CeilingFans  FacilitiesMaintenance             https://www.homedepot.ca/en/home/categories/li...
2713  Lighting&CeilingFans  Tools,Truck,LargeEquipmentRental  https://www.homedepot.ca/en/home/categories/sm...
2714  Lighting&CeilingFans  CeilingLights                     https://www.homedepot.ca/en/home/categories/de...
2715  Lighting&CeilingFans  CeilingFans                       https://www.homedepot.ca/en/home/categories/to...


### Now I have two data frames:
### The first holds all categories, their subcategories and the subcategory URLs.
### The second holds all subcategories, their sub_subcategories and the sub_subcategory URLs.

An additional loop will now gather each product name and product URL and, from those, the product attributes.

##### Once the logic is complete, it will be possible either to join these tables or, if required, to combine the entire logic so that the final result is updated all at once.
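
To make the idea of joining the tables concrete, here is a minimal sketch using `DataFrame.merge` on the shared `SubCategory` column. The two small frames are toy stand-ins, and the names `category_df`, `sub_category_df`, `Category` and `Breadcrumb` are assumptions made for illustration; only `SubCategory` and `Sub_SUB_Category` are taken from the data frame built above.

```python
import pandas as pd

# Toy stand-in for the first table (categories and their subcategories)
category_df = pd.DataFrame({
    'Category': ['Tools', 'Outdoors'],
    'SubCategory': ['PowerTools', 'OutdoorPowerEquipment'],
})

# Toy stand-in for the second table (subcategories and their sub_subcategories)
sub_category_df = pd.DataFrame({
    'SubCategory': ['PowerTools', 'PowerTools', 'OutdoorPowerEquipment'],
    'Sub_SUB_Category': ['Drills', 'Saws', 'Chainsaws'],
})

# A left join keeps every category/subcategory row and attaches its children
breadcrumbs = category_df.merge(sub_category_df, on='SubCategory', how='left')

# Assemble a readable breadcrumb path from the three levels
breadcrumbs['Breadcrumb'] = (breadcrumbs['Category'] + ' > '
                             + breadcrumbs['SubCategory'] + ' > '
                             + breadcrumbs['Sub_SUB_Category'])
print(breadcrumbs)
```

Filtering `category_df` to one category before the merge is one way to get the "filtering" behaviour described in the introduction without running the full loop.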


In [146]:
### Pair each sub_subcategory name with its URL (a list of (name, URL) tuples, despite the _dict suffix)
sub_subcategory_dict = list(zip(sub_subcategory,sub_subcategoryURL1))
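
The request log printed by the next cell shows some URLs being fetched more than once (for example the hand-tools page), because several menu entries can link to the same page. One possible way to cut the run time is to drop repeated URLs before looping. This is a sketch under that assumption; the helper `dedupe_by_url` and the sample pairs are illustrative and not part of the notebook.

```python
def dedupe_by_url(pairs):
    """Keep the first (name, url) pair seen for each URL, preserving order."""
    seen = set()
    unique = []
    for name, url in pairs:
        if url in seen:
            continue
        seen.add(url)
        unique.append((name, url))
    return unique

# Sample pairs in the same (name, URL) shape as the list built above
pairs = [
    ('HandTools', 'https://www.homedepot.ca/en/home/categories/tools/hand-tools.html'),
    ('ToolStorage', 'https://www.homedepot.ca/en/home/categories/tools/tool-storage.html'),
    ('HandTools', 'https://www.homedepot.ca/en/home/categories/tools/hand-tools.html'),
]

unique_pairs = dedupe_by_url(pairs)
print(len(pairs), '->', len(unique_pairs))
```

The trade-off is that a page reachable from two differently named menu entries is attributed only to the first name; caching responses by URL would avoid the repeated requests while keeping every name.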

In [179]:
Sub_subCategory_new = []
Prod_name = []
Prod_url = []

for zz in sub_subcategory_dict:
    # Request the page for this sub_subcategory; zz is a (name, URL) pair
    response = requests.get(zz[1], headers=headers, timeout=5)
    # Print the URL and the response status as simple flow control
    print(str(zz[1]) + ' ' + str(response))

    # Parse the page for this sub_subcategory
    soup = BeautifulSoup(response.content, 'lxml')

    # Collect the name and URL of every product card on the first results page
    for prod in soup.find_all('a', class_='acl-product-card__title-link ng-star-inserted'):
        Prod_name.append(prod.text)
        Prod_url.append(prod.get('href'))
        Sub_subCategory_new.append(zz[0])

https://www.homedepot.ca/en/home/categories/tools/power-tools/f/wi0-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/power-tool-accessories/f/uoe-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/tool-storage/f/gpz-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/hand-tools/f/92u-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors/f/plr-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/f/2ne-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/power-tools/woodworking-tools/f/2t0-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/f/ie8-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/f/ie8-9wh-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/f/ie8-t3n-j2z-ni6 <Response [200]>
https://www.homedepot.ca/en/home/categories

https://www.homedepot.ca/en/home/categories/tools/power-tools/woodworking-tools.html <Response [200]>
https://www.homedepot.ca/en/home/brand-pages/milwaukee-tools-and-accessories/milwaukee-exclusive-store-event.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/tool-storage.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/hand-tools.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-cleaners/wet-and-dry-vacuums.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/generators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/hardware/fasteners.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/ladders-and-s

https://www.homedepot.ca/en/home/categories/tools/power-tool-accessories/tool-stands.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/power-tool-accessories/woodworking-tool-accessories.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/hand-tools.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/outdoor-power-equipment-accessories.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/power-tools.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/tool-storage.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors/air-compressors.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors/air-nailers-and-staplers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/air-tools-and-compressors/air-tools.html <Response [200]>
https://www.homedepot.ca/en/home/cate

https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/safety-and-protective-gear.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/commercial-lighting/work-lights.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/clothing-and-workwear.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/footwear.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/safety-and-protective-gear.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/outdoor-power-equipment-parts-and-accessories/chainsaw-parts-and-accessories/chainsaw-protective-gear.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/apparel-and-safety-gear/safety-and-protective-gear/first-aid-kits.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/safety-and-securit

https://www.homedepot.ca/en/home/categories/decor/furniture/mudroom-and-entryway-furniture/coat-racks.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/pressure-washers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/lawn-mowers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/lawn-tractors.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/leaf-blowers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/trimmers-and-edgers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/combo-kits.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/chainsaws.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/pole-saws

https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment.html?products=&q=*%253Arelevance%253AmanufacturerName%253ASun%252BJoe%253AcategoryPathHierarchy%253A2%25252Fhd-classes%25252Fl1-outdoors%25252Fl2-outdoor-power%253AattributeidOPEBatteryPlatform17%253A40V <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment.html?products=&q=*%253Arelevance%253AmanufacturerName%253AMAKITA%253AcategoryPathHierarchy%253A2%25252Fhd-classes%25252Fl1-outdoors%25252Fl2-outdoor-power%253AattributeidOPEBatteryPlatform17%253A18V <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/outdoor-power-equipment-parts-and-accessories/gas-cans.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/outdoor-power-equipment-parts-and-accessories/lawn-tractor-attachments.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/outdoor-

https://www.homedepot.ca/en/home/categories/appliances/f/92r-fs2-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-d15-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-a1j-lum-v0w-v74-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-8s0-j2z-qs7 <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-10g-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-jf5-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-c7b-5ye-7gz-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/f/92r-oj2-j2z <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/french-door-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/refrigerators/side-by-side-refrigerators.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliance

https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers/all-in-one-washer-dryer.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-washers-dryers%3Aattributeidstackable%3AYes <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers/unitized-washers-and-dryers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers/washers/portable-washers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-washers-dryers%3AmanufacturerName%3AWhirlpool <Response [200]>
https://www.homedepot.ca/en/home/brand-pages/lg-appliances.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/washers-and-dryers.html?produ

https://www.homedepot.ca/en/home/categories/appliances/dishwashers.html?products <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/dishwashers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-dishwashers%3Aattributeidsizeh20%3ASlimline <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/dishwashers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-dishwashers%3Aattributeiddishtypeh20%3APortable%252FFreestanding <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/dishwashers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-dishwashers%3Aattributeiddishtypeh20%3ACountertop%252FCompact <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/dishwashers.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-appliances%252Fl2-dishwashers%3Aat

https://www.homedepot.ca/en/home/categories/appliances/cooking/cooktops.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods/under-cabinet-range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods/wall-mount-range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods/insert-range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods/downdraft-range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods/insert-range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/cooking/range-hoods.html <Response [200]>
https://www.homedepot.ca/en/home/brand-pages/lg-appliances.html?q=*%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-appliances%252Fl2-cooking%252F1010214%3AmanufacturerName%3ALG%2BElectronics <Response

https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-and-furnace-filters.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A3/hd-classes/l1-appliances/l2-heatingcoolingvent/l3-affilters2017%3AmanufacturerName%3AFiltrete <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-and-furnace-filters.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A3/hd-classes/l1-appliances/l2-heatingcoolingvent/l3-affilters2017%3AmanufacturerName%3ABlueair <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-and-furnace-filters.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A3/hd-classes/l1-appliances/l2-heatingcoolingvent/l3-affilters2017%3AmanufacturerName%3AHoneywell <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/heating-cooling-and-air-quality/air-and-furnace-filters.html?products=&q=*%3Arelevance%3AcategoryPathH

https://www.homedepot.ca/en/home/categories/appliances/small-appliances/rotisseries.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/small-appliances/slow-cookers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/small-appliances/specialty-small-appliances.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/small-appliances/toasters.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/small-appliances/air-fryer.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-cleaners/canister-vacuums.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-cleaners/carpet-and-hard-surface-cleaners.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-cleaners/central-vacuums.html <Response [200]>
https://www.homedepot.ca/en/home/categories/appliances/vacuums-and-carpet-clea

https://www.homedepot.ca/en/home/categories/building-materials/electrical/alternative-power-solutions/solar-power.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/fire-safety.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/home-security-and-surveillance.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/home-electronics-and-communications.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/electrical/batteries-and-chargers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/automotive/automotive-batteries-and-chargers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/automotive/interior-automotive-accessories/car-electronics.html <Response [200]>
https://www.homedepot.ca/en/home/categories/outdoors/outdoor-power-equipment/generators.html?searchterm=generators <Response [200]

https://www.homedepot.ca/en/home/categories/building-materials/insulation/blow-in-insulation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/fiberglass.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/foil-insulation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/housewrap.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/insulation-accessories.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/rigid-insulation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/spray-foam-insulation.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/insulation/stone-wool.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/plumbing/pipe-and-fittings/pipe-insulati

https://www.homedepot.ca/en/home/categories/building-materials/roofing-and-gutters/roofing.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/ventilation-and-ductwork/roof-vents.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/siding/vinyl-siding.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/hardware/fasteners/nails/roofing-nails.html <Response [200]>
https://www.homedepot.ca/en/home/categories/tools/ladders-and-scaffolding/extension-ladders.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/siding/vinyl-siding.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/siding/stone-veneer.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/siding/metal-siding.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/siding/siding-tools-and-accessories.html <Res

https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads/bathroom-sink-faucets.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads/bathtub-faucets.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads/shower-faucets.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads/shower-panels-and-wall-bar-shower-sets.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads/showerheads-and-hand-showers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads.html?products=&q=*%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-bath%252Fl2-bath-faucets%3AattributeidFinishGlobal%3AStainless%2BSteel <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-faucets-and-shower-heads.html?products=&q=:r

https://www.homedepot.ca/en/home/categories/bath/bathroom-vanities/bathroom-vanities-with-tops.html?q=%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-bath%252Fl2-vanities%252Fl3-vanitycombos%3Aattributeidbthpopularwidthin%3A48%2BInch%2BVanities# <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-vanities/bathroom-vanities-with-tops.html?q=*%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-bath%252Fl2-vanities%252Fl3-vanitycombos%3Aattributeidbthpopularwidthin%3A54%2BInch%2BVanities <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-vanities/bathroom-vanities-with-tops.html?q=%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-bath%252Fl2-vanities%252Fl3-vanitycombos%3Aattributeidbthpopularwidthin%3A60%2BInch%2BVanities# <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/bathroom-vanities/bathroom-vanities-with-tops.html?q=:relevance:attributeidbthpopularwidthin:63%2BInch%2BVanities:cat

https://www.homedepot.ca/en/home/categories/bath/toilets-toilet-seats-and-bidets/toilets.html?q=%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-bath%252Fl2-toilets%252Fl3-toilets%3AmanufacturerName%3AAmerican%2BStandard# <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/toilets-toilet-seats-and-bidets/toilets.html?q=%3Arelevance%3AcategoryPathHierarchy%3A3%252Fhd-classes%252Fl1-bath%252Fl2-toilets%252Fl3-toilets%3AmanufacturerName%3AKOHLER# <Response [200]>
https://www.homedepot.ca/en/home/categories/bath/toilets-toilet-seats-and-bidets/toilets/f/glacier-bay/uid-jek <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/plumbing/plumbing-repair-parts/toilet-repair-parts/flush-valves.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/plumbing/plumbing-repair-parts/toilet-repair-parts/flappers.html <Response [200]>
https://www.homedepot.ca/en/home/categories/building-materials/plumbing/plumbing-repa

https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/light-bulbs/light-bulb-accessories.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/light-bulbs/hid-bulbs.html <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/light-bulbs.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-lighting%252Fl2-lightbulbs%3Aattributeidapplicationtype2016%3ASpot%2B%2526%2BFlood%2BLight <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/light-bulbs.html?products=&q=%3Arelevance%3AcategoryPathHierarchy%3A2%252Fhd-classes%252Fl1-lighting%252Fl2-lightbulbs%3Aattributeidapplicationtype2016%3ATrack%2BLighting <Response [200]>
https://www.homedepot.ca/en/home/categories/lighting-and-ceiling-fans/light-bulbs.html?products=&q=%3Arelevance%3Aattributeidbulbshapecode2016%3AA21%3Aattributeidbulbshapecode2016%3AA19%3Aattributeidbulbshapecode2016%3AA15%3Aattrib

MissingSchema: Invalid URL '#': No schema supplied. Perhaps you meant http://#?
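
### The MissingSchema error above is raised because one of the collected hrefs is just '#', which is not a URL that requests can fetch.
### A minimal sketch of a guard that could be applied before each request. The helper name is_fetchable and the sample list are assumptions for illustration, not part of the original loop.

```python
from urllib.parse import urlparse

def is_fetchable(href):
    """Return True only for absolute http(s) URLs that an HTTP client can request."""
    if not isinstance(href, str):
        return False
    parsed = urlparse(href)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)

# Hypothetical sample of hrefs, including the '#' placeholder that stopped the loop
sample_hrefs = [
    "https://www.homedepot.ca/en/home/categories/building-materials/insulation/fiberglass.html",
    "#",
    "",
    None,
]

fetchable = [h for h in sample_hrefs if is_fetchable(h)]
print(fetchable)
```

### Filtering the URL list this way before the loop would let the run finish instead of stopping at the first placeholder link.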

In [181]:
pd.DataFrame(list(zip(Sub_subCategory_new, Prod_name, Prod_url)),
                              columns=['Sub_SUBCategory','Product_name','Product_URL']).tail()

Unnamed: 0,Sub_SUBCategory,Product_name,Product_URL
15860,SolarSecurityLights,"Gama Sonic Windsor Solar Lamp, 3 inch Fitter M...",/product/gama-sonic-windsor-solar-lamp-3-inch-...
15861,SolarSecurityLights,Sun-Ray Kenwick Solar Lamp Post and Planter - ...,/product/sun-ray-kenwick-solar-lamp-post-and-p...
15862,SolarSecurityLights,Fusion Solar Lamp Post Light,/product/fusion-solar-lamp-post-light/1000694212
15863,SolarSecurityLights,Gama Sonic Aurora Black Solar Post-Mount/Wall-...,/product/gama-sonic-aurora-black-solar-post-mo...
15864,SolarSecurityLights,Sun-Ray Avalon Three Head Solar Lamp Post and ...,/product/sun-ray-avalon-three-head-solar-lamp-...


In [182]:
## The URLs need some fixing: some were returned as relative paths without the domain

Prod_url1=[]

for i in Prod_url:
    try:
        if i[:8]=='/product':
            n='https://www.homedepot.ca'+i # prepend the site root to items that only returned a relative path
            Prod_url1.append(n)

        else:
            n=i                            # items that already have the full URL keep the same value
            Prod_url1.append(n)
    except TypeError:                      # skip entries that are not strings
        pass
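
### An alternative to checking the '/product' prefix by hand is urljoin from the standard library, which resolves relative paths against a base URL and leaves absolute URLs untouched.
### This is only a sketch: BASE_URL and to_absolute are names introduced here for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.homedepot.ca"  # site root used as the base for relative paths

def to_absolute(url, base=BASE_URL):
    """Resolve a relative path against the site root; absolute URLs pass through unchanged."""
    return urljoin(base, url)

relative = "/product/fusion-solar-lamp-post-light/1000694212"
absolute = "https://www.homedepot.ca/product/fusion-solar-lamp-post-light/1000694212"

print(to_absolute(relative))
print(to_absolute(absolute))
```

### Both calls print the same full URL, so the two branches of the loop collapse into one call.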


In [183]:
Product_listDF = pd.DataFrame(list(zip(Sub_subCategory_new, Prod_name, Prod_url1)),
                              columns=['Sub_SUBCategory','Product_name','Product_URL'])

In [185]:
Product_listDF.tail()

Unnamed: 0,Sub_SUBCategory,Product_name,Product_URL
15860,SolarSecurityLights,"Gama Sonic Windsor Solar Lamp, 3 inch Fitter M...",https://www.homedepot.ca/product/gama-sonic-wi...
15861,SolarSecurityLights,Sun-Ray Kenwick Solar Lamp Post and Planter - ...,https://www.homedepot.ca/product/sun-ray-kenwi...
15862,SolarSecurityLights,Fusion Solar Lamp Post Light,https://www.homedepot.ca/product/fusion-solar-...
15863,SolarSecurityLights,Gama Sonic Aurora Black Solar Post-Mount/Wall-...,https://www.homedepot.ca/product/gama-sonic-au...
15864,SolarSecurityLights,Sun-Ray Avalon Three Head Solar Lamp Post and ...,https://www.homedepot.ca/product/sun-ray-avalo...


In [186]:

### Now build a list of (product name, product URL) pairs

product_dict=list(zip(Prod_name,Prod_url1))

In [189]:
for i in product_dict[:10]:
    print(i)

('DEWALT 20V MAX XR Lithium-Ion Cordless Brushless Hammer Drill/Impact Combo Kit (2-Tool) with 2 Batteries 4 Ah and Charger', 'https://www.homedepot.ca/product/dewalt-20v-max-xr-lithium-ion-cordless-brushless-hammer-drill-impact-combo-kit-2-tool-with-2-batteries-4-ah-and-charger/1001011706')
('Milwaukee Tool M12 FUEL 12V Lithium-Ion Brushless Cordless Surge Impact & Drill Combo Kit (2-Tool) w/ 2 Batteries', 'https://www.homedepot.ca/product/milwaukee-tool-m12-fuel-12v-lithium-ion-brushless-cordless-surge-impact-drill-combo-kit-2-tool-w-2-batteries/1001527742')
('Milwaukee Tool M18 FUEL 18-Volt Cordless 1/2 in. Impact Wrench with Friction Ring Kit W/ Two 5.0 Ah Batteries', 'https://www.homedepot.ca/product/milwaukee-tool-m18-fuel-18-volt-cordless-1-2-in-impact-wrench-with-friction-ring-kit-w-two-5-0-ah-batteries/1001411923')
('Dremel 120 V Variable Speed High Performance Rotary Tool Kit', 'https://www.homedepot.ca/product/dremel-120-v-variable-speed-high-performance-rotary-tool-kit/1001

In [None]:
### Finally I can join my dataframes so that I get features only for the categories or subcategories that interest me
### Let's check our dataframes




In [190]:
### Higher level category and subcat DF
HomeDepotDF.head()

Unnamed: 0,Category,Subcategory,Subcategory_URL
0,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...
1,Tools,PowerTools,https://www.homedepot.ca/en/home/categories/to...
2,Tools,HandTools,https://www.homedepot.ca/en/home/categories/to...
3,Tools,ToolStorage,https://www.homedepot.ca/en/home/categories/to...
4,Tools,PowerTools&Accessories,https://www.homedepot.ca/en/home/categories/to...


In [192]:
# The subcategory DF
Sub_Category_DF.head()

Unnamed: 0,SubCategory,Sub_SUB_Category,Sub_Sub_category_URL
0,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...
1,ToolsSpecialValue,PowerToolAccessories,https://www.homedepot.ca/en/home/categories/to...
2,ToolsSpecialValue,ToolStorage,https://www.homedepot.ca/en/home/categories/to...
3,ToolsSpecialValue,HandTools,https://www.homedepot.ca/en/home/categories/to...
4,ToolsSpecialValue,AirTools&Compressors,https://www.homedepot.ca/en/home/categories/to...


In [194]:
#the product level
Product_listDF.head()

Unnamed: 0,Sub_SUBCategory,Product_name,Product_URL
0,PowerTools,DEWALT 20V MAX XR Lithium-Ion Cordless Brushle...,https://www.homedepot.ca/product/dewalt-20v-ma...
1,PowerTools,Milwaukee Tool M12 FUEL 12V Lithium-Ion Brushl...,https://www.homedepot.ca/product/milwaukee-too...
2,PowerTools,Milwaukee Tool M18 FUEL 18-Volt Cordless 1/2 i...,https://www.homedepot.ca/product/milwaukee-too...
3,PowerTools,Dremel 120 V Variable Speed High Performance R...,https://www.homedepot.ca/product/dremel-120-v-...
4,PowerTools,MAKITA 18V LXT 9-Piece Combo 4.0Ah Kit with 3 ...,https://www.homedepot.ca/product/makita-18v-lx...


In [197]:
# Merge the main category and the subcategory DF

Cat_SubCAT_DF = pd.merge(HomeDepotDF,Sub_Category_DF, left_on= 'Subcategory', right_on= 'SubCategory')

In [198]:
Cat_SubCAT_DF.head()

Unnamed: 0,Category,Subcategory,Subcategory_URL,SubCategory,Sub_SUB_Category,Sub_Sub_category_URL
0,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...
1,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerToolAccessories,https://www.homedepot.ca/en/home/categories/to...
2,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,ToolStorage,https://www.homedepot.ca/en/home/categories/to...
3,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,HandTools,https://www.homedepot.ca/en/home/categories/to...
4,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,AirTools&Compressors,https://www.homedepot.ca/en/home/categories/to...


In [200]:
# Finally merge with the product level DF
MyHomeDepot_DF = pd.merge(Cat_SubCAT_DF,Product_listDF, left_on= 'Sub_SUB_Category', right_on= 'Sub_SUBCategory')


In [201]:
MyHomeDepot_DF

Unnamed: 0,Category,Subcategory,Subcategory_URL,SubCategory,Sub_SUB_Category,Sub_Sub_category_URL,Sub_SUBCategory,Product_name,Product_URL
0,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...,PowerTools,DEWALT 20V MAX XR Lithium-Ion Cordless Brushle...,https://www.homedepot.ca/product/dewalt-20v-ma...
1,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...,PowerTools,Milwaukee Tool M12 FUEL 12V Lithium-Ion Brushl...,https://www.homedepot.ca/product/milwaukee-too...
2,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...,PowerTools,Milwaukee Tool M18 FUEL 18-Volt Cordless 1/2 i...,https://www.homedepot.ca/product/milwaukee-too...
3,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...,PowerTools,Dremel 120 V Variable Speed High Performance R...,https://www.homedepot.ca/product/dremel-120-v-...
4,Tools,ToolsSpecialValue,https://www.homedepot.ca/en/home/categories/al...,ToolsSpecialValue,PowerTools,https://www.homedepot.ca/en/home/categories/to...,PowerTools,MAKITA 18V LXT 9-Piece Combo 4.0Ah Kit with 3 ...,https://www.homedepot.ca/product/makita-18v-lx...
...,...,...,...,...,...,...,...,...,...
39515,Electrical,FlushmountLighting,https://www.homedepot.ca/en/home/categories/li...,FlushmountLighting,SolarSecurityLights,https://www.homedepot.ca/en/home/categories/li...,SolarSecurityLights,"Gama Sonic Windsor Solar Lamp, 3 inch Fitter M...",https://www.homedepot.ca/product/gama-sonic-wi...
39516,Electrical,FlushmountLighting,https://www.homedepot.ca/en/home/categories/li...,FlushmountLighting,SolarSecurityLights,https://www.homedepot.ca/en/home/categories/li...,SolarSecurityLights,Sun-Ray Kenwick Solar Lamp Post and Planter - ...,https://www.homedepot.ca/product/sun-ray-kenwi...
39517,Electrical,FlushmountLighting,https://www.homedepot.ca/en/home/categories/li...,FlushmountLighting,SolarSecurityLights,https://www.homedepot.ca/en/home/categories/li...,SolarSecurityLights,Fusion Solar Lamp Post Light,https://www.homedepot.ca/product/fusion-solar-...
39518,Electrical,FlushmountLighting,https://www.homedepot.ca/en/home/categories/li...,FlushmountLighting,SolarSecurityLights,https://www.homedepot.ca/en/home/categories/li...,SolarSecurityLights,Gama Sonic Aurora Black Solar Post-Mount/Wall-...,https://www.homedepot.ca/product/gama-sonic-au...


In [212]:
### This gives us a complete list with the full breadcrumb path for all products, 
### now it's easy to select any specific category or subcategory and check for attributes
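
### One thing worth checking before filtering: the product table ends at index 15864, while the merged table runs past index 39518, so the merge produced more rows than there are products.
### That happens when the join key is not unique on the left side (the same Sub_SUB_Category name listed under several subcategories). A small sketch with toy data, where the product names are made up for illustration:

```python
import pandas as pd

# Toy subcategory table: 'PowerTools' is listed under two different subcategories
subcats = pd.DataFrame({
    "SubCategory": ["ToolsSpecialValue", "PowerTools&Accessories"],
    "Sub_SUB_Category": ["PowerTools", "PowerTools"],
})

# Toy product table with hypothetical product names
products = pd.DataFrame({
    "Sub_SUBCategory": ["PowerTools", "PowerTools", "HandTools"],
    "Product_name": ["Drill A", "Saw B", "Hammer C"],
})

merged = pd.merge(subcats, products,
                  left_on="Sub_SUB_Category", right_on="Sub_SUBCategory")

# Each PowerTools product matches both subcategory rows, so it appears twice;
# the HandTools product has no match and is dropped by the default inner join.
print(len(products), "products ->", len(merged), "merged rows")

# A quick check for repeated keys on the left side before merging
has_repeated_keys = bool(subcats["Sub_SUB_Category"].duplicated().any())
print("repeated keys on the left:", has_repeated_keys)
```

### If one row per product is needed, the subcategory table could be deduplicated on its key first, or the duplicates dropped after the merge.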

In [None]:
# Build a list of product URLs for every product in the Tools main category
ToolProduct_list = list(MyHomeDepot_DF['Product_URL'][MyHomeDepot_DF['Category']=='Tools'])

In [214]:
ToolProduct_list[:5]

['https://www.homedepot.ca/product/dewalt-20v-max-xr-lithium-ion-cordless-brushless-hammer-drill-impact-combo-kit-2-tool-with-2-batteries-4-ah-and-charger/1001011706',
 'https://www.homedepot.ca/product/milwaukee-tool-m12-fuel-12v-lithium-ion-brushless-cordless-surge-impact-drill-combo-kit-2-tool-w-2-batteries/1001527742',
 'https://www.homedepot.ca/product/milwaukee-tool-m18-fuel-18-volt-cordless-1-2-in-impact-wrench-with-friction-ring-kit-w-two-5-0-ah-batteries/1001411923',
 'https://www.homedepot.ca/product/dremel-120-v-variable-speed-high-performance-rotary-tool-kit/1001168865',
 'https://www.homedepot.ca/product/makita-18v-lxt-9-piece-combo-4-0ah-kit-with-3-batteries/1001128533']

In [220]:
ToolSubCAT_list= list(MyHomeDepot_DF['Sub_SUBCategory'][MyHomeDepot_DF['Category']=='Tools'])

In [221]:
#zip the two lists

workdict = list(zip(ToolSubCAT_list,ToolProduct_list))

In [222]:
workdict

[('PowerTools',
  'https://www.homedepot.ca/product/dewalt-20v-max-xr-lithium-ion-cordless-brushless-hammer-drill-impact-combo-kit-2-tool-with-2-batteries-4-ah-and-charger/1001011706'),
 ('PowerTools',
  'https://www.homedepot.ca/product/milwaukee-tool-m12-fuel-12v-lithium-ion-brushless-cordless-surge-impact-drill-combo-kit-2-tool-w-2-batteries/1001527742'),
 ('PowerTools',
  'https://www.homedepot.ca/product/milwaukee-tool-m18-fuel-18-volt-cordless-1-2-in-impact-wrench-with-friction-ring-kit-w-two-5-0-ah-batteries/1001411923'),
 ('PowerTools',
  'https://www.homedepot.ca/product/dremel-120-v-variable-speed-high-performance-rotary-tool-kit/1001168865'),
 ('PowerTools',
  'https://www.homedepot.ca/product/makita-18v-lxt-9-piece-combo-4-0ah-kit-with-3-batteries/1001128533'),
 ('PowerTools',
  'https://www.homedepot.ca/product/dewalt-20v-max-lithium-ion-cordless-brushless-oscillating-tool-kit/1001431602'),
 ('PowerTools',
  'https://www.homedepot.ca/product/dewalt-20v-max-lithium-ion-cord

In [None]:
### Now I can check each product's attributes

In [241]:
Attributes=[]
SubCat = []

for i in workdict[:10]:
    # request each product page (i[0] is the subcategory, i[1] is the product URL)
    response = requests.get(i[1],headers=headers, timeout=5)
    # check if the request worked - just some flow control
    print(str(i[1])+' '+str(response))

    # parse the product page
    soup = BeautifulSoup(response.content, 'lxml')

    # each attribute name sits in a <dt> tag with this class
    for att in soup.find_all('dt', class_='ng-star-inserted'):
        Attributes.append(att.text)
        SubCat.append(i[0])


https://www.homedepot.ca/product/dewalt-20v-max-xr-lithium-ion-cordless-brushless-hammer-drill-impact-combo-kit-2-tool-with-2-batteries-4-ah-and-charger/1001011706 <Response [200]>
https://www.homedepot.ca/product/milwaukee-tool-m12-fuel-12v-lithium-ion-brushless-cordless-surge-impact-drill-combo-kit-2-tool-w-2-batteries/1001527742 <Response [200]>
https://www.homedepot.ca/product/milwaukee-tool-m18-fuel-18-volt-cordless-1-2-in-impact-wrench-with-friction-ring-kit-w-two-5-0-ah-batteries/1001411923 <Response [200]>
https://www.homedepot.ca/product/dremel-120-v-variable-speed-high-performance-rotary-tool-kit/1001168865 <Response [200]>
https://www.homedepot.ca/product/makita-18v-lxt-9-piece-combo-4-0ah-kit-with-3-batteries/1001128533 <Response [200]>
https://www.homedepot.ca/product/dewalt-20v-max-lithium-ion-cordless-brushless-oscillating-tool-kit/1001431602 <Response [200]>
https://www.homedepot.ca/product/dewalt-20v-max-lithium-ion-cordless-7-1-4-inch-circular-saw-with-battery-5ah-cha
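
### The loop above stops at the first request that raises, as happened earlier with the MissingSchema error. A sketch of a loop that records failures and keeps going is shown below.
### The names fetch_all and fake_fetch, and the example-a / example-b URLs, are assumptions for illustration; fake_fetch stands in for the real HTTP call so the sketch runs without network access.

```python
import time

def fetch_all(pairs, fetch, delay=0.0):
    """Call fetch(url) for each (label, url) pair, collecting failures instead of raising."""
    results, failures = [], []
    for label, url in pairs:
        try:
            results.append((label, url, fetch(url)))
        except Exception as exc:  # broad on purpose in this sketch
            failures.append((label, url, repr(exc)))
        if delay:
            time.sleep(delay)  # optional pause between requests
    return results, failures

def fake_fetch(url):
    """Stand-in for a real HTTP call; rejects anything that is not an http(s) URL."""
    if not url.startswith("http"):
        raise ValueError("Invalid URL " + repr(url))
    return "<html>stub page</html>"

# Hypothetical (subcategory, url) pairs, one of them broken
pairs = [
    ("PowerTools", "https://www.homedepot.ca/product/example-a"),
    ("PowerTools", "#"),
    ("HandTools", "https://www.homedepot.ca/product/example-b"),
]

results, failures = fetch_all(pairs, fake_fetch)
print(len(results), "fetched;", len(failures), "failed")
```

### In the notebook, the fetch argument could be a small wrapper around requests.get with the same headers and timeout used above, and the failures list shows which URLs to retry or clean up.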

In [242]:
print(len(Attributes))
print(len(SubCat))

174
174


In [249]:
Tool_subcat_att=pd.DataFrame(list(zip(SubCat, Attributes)),
                              columns=['Sub_SUBCategory','Attribute'])
Tool_subcat_att

Unnamed: 0,Sub_SUBCategory,Attribute
0,PowerTools,Assembled Depth (in inches)
1,PowerTools,Assembled Height (in inches)
2,PowerTools,Assembled Weight (in lbs)
3,PowerTools,Assembled Width (in inches)
4,PowerTools,Packaged Depth (in inches)
...,...,...
169,PowerTools,Amps
170,PowerTools,Batteries Included
171,PowerTools,Cordless/Corded
172,PowerTools,Country of Origin


In [250]:
Tool_subcat_att['Category']='Tools'

In [251]:
Tool_subcat_att = Tool_subcat_att[['Category','Sub_SUBCategory','Attribute']]
Tool_subcat_att

Unnamed: 0,Category,Sub_SUBCategory,Attribute
0,Tools,PowerTools,Assembled Depth (in inches)
1,Tools,PowerTools,Assembled Height (in inches)
2,Tools,PowerTools,Assembled Weight (in lbs)
3,Tools,PowerTools,Assembled Width (in inches)
4,Tools,PowerTools,Packaged Depth (in inches)
...,...,...,...
169,Tools,PowerTools,Amps
170,Tools,PowerTools,Batteries Included
171,Tools,PowerTools,Cordless/Corded
172,Tools,PowerTools,Country of Origin


In [253]:
# Eliminate any duplicates that might arise
# (reassigning the result instead of using inplace=True avoids the SettingWithCopyWarning)
Tool_subcat_att = Tool_subcat_att.drop_duplicates(subset='Attribute')


In [254]:
Tool_subcat_att

Unnamed: 0,Category,Sub_SUBCategory,Attribute
0,Tools,PowerTools,Assembled Depth (in inches)
1,Tools,PowerTools,Assembled Height (in inches)
2,Tools,PowerTools,Assembled Weight (in lbs)
3,Tools,PowerTools,Assembled Width (in inches)
4,Tools,PowerTools,Packaged Depth (in inches)
5,Tools,PowerTools,Packaged Height (in inches)
6,Tools,PowerTools,Packaged Weight (In lbs)
7,Tools,PowerTools,Packaged Width (in inches)
8,Tools,PowerTools,Amps
9,Tools,PowerTools,Batteries Included
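
### Note that deduplicating on 'Attribute' alone keeps each attribute only once across the whole table. With a single subcategory that is fine, but with several subcategories a shared attribute would survive only for the first one.
### A sketch with toy data of the difference when the subcategory is included in the key (the HandTools rows are made up for illustration):

```python
import pandas as pd

att = pd.DataFrame({
    "Category": ["Tools"] * 5,
    "Sub_SUBCategory": ["PowerTools", "PowerTools", "PowerTools",
                        "HandTools", "HandTools"],
    "Attribute": ["Amps", "Amps", "Country of Origin",
                  "Country of Origin", "Country of Origin"],
})

# Deduplicate on the attribute name only: one row per attribute overall
by_attribute = att.drop_duplicates(subset="Attribute")

# Deduplicate on (subcategory, attribute): one row per attribute per subcategory
by_pair = att.drop_duplicates(subset=["Sub_SUBCategory", "Attribute"])

print(len(by_attribute), len(by_pair))
```

### In the first result HandTools disappears entirely, because its only attribute was already seen under PowerTools.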


In [None]:
### So there it is. Any category or subcategory can now be scraped for its attributes.
### This logic also makes it easy to adapt the code to scrape prices, or even images.
### It takes about 30 minutes on my machine to scrape all categories, but an individual subcategory finishes quickly.

### As next steps I would explore using Selenium to access every product page, not only the first one,
# and optimizing the code to run faster, maybe by turning it into a function that accepts a subcategory to scrape and returns the desired list.
# I set this to only print the final dataframe, but it could easily be exported and saved as a .csv file.
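
### A rough sketch of what such a function could look like. The name attributes_for, the stub_pages lookup and the example URLs are assumptions for illustration; stub_pages.get stands in for the request-and-parse step so the sketch runs offline.

```python
import csv
import io

def attributes_for(pairs, subcategory, get_attributes):
    """Return the unique attribute names for one subcategory, in first-seen order."""
    seen = {}
    for subcat, url in pairs:
        if subcat != subcategory:
            continue
        for name in get_attributes(url):
            seen.setdefault(name, None)  # dict keys keep insertion order
    return list(seen)

# Hypothetical stand-in for the scraping step: maps a product URL to attribute names
stub_pages = {
    "https://www.homedepot.ca/product/example-a": ["Amps", "Batteries Included"],
    "https://www.homedepot.ca/product/example-b": ["Amps", "Country of Origin"],
    "https://www.homedepot.ca/product/example-c": ["Handle Material"],
}
pairs = [
    ("PowerTools", "https://www.homedepot.ca/product/example-a"),
    ("PowerTools", "https://www.homedepot.ca/product/example-b"),
    ("HandTools", "https://www.homedepot.ca/product/example-c"),
]

power_tool_attributes = attributes_for(pairs, "PowerTools", stub_pages.get)
print(power_tool_attributes)

# Write the result as CSV text (an in-memory buffer here; a file object works the same way)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Sub_SUBCategory", "Attribute"])
writer.writerows(("PowerTools", name) for name in power_tool_attributes)
csv_text = buffer.getvalue()
print(csv_text)
```

### With the dataframes built in this notebook, the same export could also be done with the DataFrame's to_csv method.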

