# Web Scraping Project - www.Hanna Andersson.com

I have chosen to practise my scraping skills on a US website that mainly sells pajamas: 
https://www.hannaandersson.com/

## Technical Requirements

The technical requirements for this project are as follows:

** You must clean and normalize your database.
* You must have at least 200 rows and 8 columns 9in the final clean database. More data is always welcome.


## Necessary Deliverables

The following deliverables should be pushed to your **Github repo** for this chapter.
* The result should be stored in **CSV format and SQL format. 
* A **Jupyter Notebook (.ipynb) file** that contains the code used to get the data. 
* An **output folder** containing the outputs of your API and scraping efforts.
* A **`README.md` file** containing a detailed explanation of your approach and code for retrieving data from the API and scraping the web page as well as your results, obstacles encountered, and lessons learned.

## Presentation

You will have **7 minutes** to present your project to the class and then **3 minutes** for Q&A,
so keep it simple!

The slides of your presentation must include the content listed below:
- Title of the project + Student name
- Description of your idea and project
- Challenges
- Process
- Learnings
- If I were to start from scratch...
- Improvements
- Highlights


## Suggested Ways to Get Started

* **Define a problem** - think what exactly you are willing to study. Prices on Black Friday? Biggest discounts?  Select your topic based on your points of interest and search for websites that contain some useful information.
* **Commit early, commit often**, don’t be afraid of doing something incorrectly because you can always roll back to a previous version.
* **Consult documentation and resources provided** to better understand the tools you are using and how to accomplish what you want.


## Useful Resources

* [Requests Library Documentation: Quickstart](http://docs.python-requests.org/en/master/user/quickstart/)
* [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
* [BeautifulSoup Documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
* [Stack Overflow Python Requests Questions](https://stackoverflow.com/questions/tagged/python-requests)
* [StackOverflow BeautifulSoup Questions](https://stackoverflow.com/questions/tagged/beautifulsoup)
* [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
* [Public APIs](https://github.com/toddmotto/public-apis)
* [API List](https://apilist.fun/)
* [GOOGLE!!!](https://www.google/com)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)



#### Below are the libraries and modules you may need. `requests`,  `BeautifulSoup` and `pandas` are already imported for you. If you prefer to use additional libraries feel free to do it.

In [1]:
import requests as r
from bs4 import BeautifulSoup
import pandas as pd

#### Download, parse (using BeautifulSoup), and print the content from the sale page of website:

In [2]:
# This is the url I have scraped in this project
url = 'https://www.hannaandersson.com/sale/?start=12&sz=12&format=page-element'

In [3]:
# your code here
response=r.get(url)
response

<Response [403]>

In [4]:
headers="""accept: text/html, */*; q=0.01
accept-encoding: gzip, deflate, br
accept-language: fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: no-cache
cookie: __cfduid=db177e99449198c2582f571806cacecd51606386575; dwanonymous_e4fdf894e6616217dca137d1f8a3f000=bca7iDVLBk7UZL2wB8TbaSRzw6; RfkEnabled=false; __cq_dnt=0; cqcid=bca7iDVLBk7UZL2wB8TbaSRzw6; cquid=||; dw_dnt=0; notice_behavior=expressed,eu; _gcl_au=1.1.1332759810.1606386580; FPC=bd5af2f0-c6de-4e1d-90c30c1a5eb93f44; variantCookie=1; variantCookieTestID=back2criteo100; _ga=GA1.2.198259505.1606386580; _gid=GA1.2.953923278.1606386580; dw=1; dw_cookies_accepted=1; haNewVisitor=here; _fbp=fb.1.1606386581818.462643727; _pin_unauth=dWlkPU4yVTBNalE1TURZdFl6RTJOQzAwTkRCakxXSTNOVEF0WkRrNFpEWTVOamRoT1dVMQ; scarab.visitor=%222F6EB931F62030DA%22; __cq_uuid=bca7iDVLBk7UZL2wB8TbaSRzw6; IR_gbd=hannaandersson.com; __ruid=40293435-86-s5-49-1p-8bxxp3ofr62357vagmib-1606386582496; __rcmp=0!bj1ydzEsZj1ydyxzPTEsYz0yNDQwLHQ9MjAyMDA0MDguMTk1OTtuPXNiMSxmPXNiLHM9MSxjPTI0MzcsdD0yMDIwMDQwOC4yMDUw; bfx.apiKey=0fac4c60-6e15-11ea-ae9f-6965eb1b85ea; bfx.env=PROD; bfx.logLevel=ERROR; extole_access_token=TVPSF1BPU88L99B674HRTNKSPL; bfx.currency=EUR; bfx.language=en; bfx.isInternational=true; bfx.lcpRuleId=; notice_preferences=2:; notice_gdpr_prefs=0,1,2:; cmapi_gtm_bl=; cmapi_cookie_privacy=permit 1,2,3; __olapicU=1606408027741; SIZEBAY_SESSION_ID_V3=1625582F56D36f3d231bb78e4378abca47af504a3dd4; scarab.profile=%2262634%252DGL7%7C1606408223%22; styliticsWidgetSession=92d5534e-7690-4359-9d9c-c736b12d680d; styliticsWidgetData={%22cohortType%22:%22test%22%2C%22visitor_id%22:2902676716}; bfx.sessionId=bb5e3627-f82a-48f6-8bba-0c909b23cd2b; bfx.country=FR; cbt-consent-banner=CROSS-BORDER%20Consent%20Banner; bfx.isWelcomed=true; bfx.currencyQuoteId=71703387; __rfkp=; scarab.mayAdd=%5B%7B%22i%22%3A%2262364-SW5%22%7D%2C%7B%22i%22%3A%2257421-M23%22%7D%2C%7B%22i%22%3A%2265015-011%22%7D%2C%7B%22i%22%3A%2262317-ST0%22%7D%2C%7B%22i%22%3A%2257435-GM3%22%7D%2C%7B%22i%22%3A%2262341-PF8%22%7D%2C%7B%22i%22%3A%2262251-GL7%22%7D%2C%7B%22i%22%3A%2262634-GL7%22%7D%2C%7B%22i%22%3A%2265291-TE5%22%7D%2C%7B%22i%22%3A%2262627-TD6%22%7D%5D; __cq_bc=%7B%22bblm-hannaandersson%22%3A%5B%7B%22id%22%3A%2262634%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262634-GL7%22%7D%2C%7B%22id%22%3A%2257435%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2257435-GM3%22%7D%2C%7B%22id%22%3A%2262627%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262627-TD6%22%7D%2C%7B%22id%22%3A%2265291%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2265291-TE5%22%7D%2C%7B%22id%22%3A%2262251%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262251-GL7%22%7D%2C%7B%22id%22%3A%2262341%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262341-PF8%22%7D%2C%7B%22id%22%3A%2262317%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262317-ST0%22%7D%2C%7B%22id%22%3A%2265015%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2265015-011%22%7D%2C%7B%22id%22%3A%2257421%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2257421-M23%22%7D%2C%7B%22id%22%3A%2262364%22%2C%22type%22%3A%22vgroup%22%2C%22alt_id%22%3A%2262364-SW5%22%7D%5D%7D; __cq_seg=0~0.51!1~-0.07!2~-0.40!3~-0.30!4~-0.10!5~-0.20!6~-0.02!7~-0.42!8~-0.21!9~0.46!f0~31~22; dwac_c15d78007bc7c83b06823fd5e8=Iaz7MWH6orFLkJbD08h-YBo55qfZmzCU9os%3D|dw-only|||USD|false|US%2FPacific|true; sid=Iaz7MWH6orFLkJbD08h-YBo55qfZmzCU9os; dwsid=WF3qbMla8ZG46JuTxrb9E2PI9_pxO2O0BfPKKOSxDMQxOs_Smw7ws-0sLgRNNbADVMj5wlfruyCAAgjpPAmL7w==; _fphu=%7B%22value%22%3A%225.414hvI3ylwvTD3HIKgv.1606386583%22%2C%22ts%22%3A1606516086236%7D; IR_PI=4b772917-2fd2-11eb-8667-0a35d197d7d2%7C1606605791433; __rutmb=40293435; ABTasty=uid=xfk1nev9k76xats5&fst=1606386578875&pst=1606516072195&cst=1606519388405&ns=14&pvt=83&pvis=83&th=552141.0.79.2.12.1.1606408052970.1606519394536.1_609645.754796.1.1.1.1.1606468885141.1606468885141.1_630789.782682.83.2.14.1.1606386579098.1606519394559.1_643924.799342.45.2.7.1.1606386578949.1606519394671.1_643925.799343.36.10.7.1.1606408221955.1606503658737.1_645356.801101.36.10.7.1.1606408221197.1606503658654.1_648502.0.36.2.7.1.1606386578962.1606519394692.1; IR_5644=1606519396304%7C417361%7C1606519391433%7C%7C; __rutma=40293435-86-s5-49-1p-8bxxp3ofr62357vagmib-1606386582496.1606516083780.1606519391786.19.61.2; fanplayr=%7B%22uuid%22%3A%221606386583528-702241ba9e3f3eb8df26e0e7%22%2C%22uk%22%3A%225.414hvI3ylwvTD3HIKgv.1606386583%22%2C%22sk%22%3A%222d488f9217c6e74ae69e37b1a9046e37%22%2C%22se%22%3A%22e1.fanplayr.com%22%2C%22tm%22%3A1%2C%22t%22%3A1606519397180%7D; __rpck=0!eyJwcm8iOiJkaXJlY3QiLCJidCI6eyIwIjpmYWxzZSwiMSI6bnVsbCwiMiI6NDk3OSwiMyI6MC4zM30sIkMiOnt9LCJOIjp7fSwiZHRzIjotNjU5LCJjc3AiOnsiYiI6MTI5NDU1LCJ0Ijo2NjkwLCJzcCI6MTU0ODA0LCJjIjo4fX0~; ABTastySession=mrasn=&lp=https://www.hannaandersson.com/sale/&sen=11; _gat_UA-6112906-3=1; __rpckx=0!eyJlYyI6NjUsInQ3Ijp7IjYxIjoxNjA2NTE5Mzk2NjgzfSwidDd2Ijp7IjYxIjoxNjA2NTE5NDU2NzQ2fSwiaXRpbWUiOiIyMDIwMTEyNy4yMzIzIn0~
pragma: no-cache
referer: https://www.hannaandersson.com/sale/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36"""

In [5]:
headers=dict([i.split(': ') for i in headers.split('\n')])
headers

{'accept': 'text/html, */*; q=0.01',
 'accept-encoding': 'gzip, deflate, br',
 'accept-language': 'fr-FR,fr;q=0.9,en-US;q=0.8,en;q=0.7',
 'cache-control': 'no-cache',
 'cookie': '__cfduid=db177e99449198c2582f571806cacecd51606386575; dwanonymous_e4fdf894e6616217dca137d1f8a3f000=bca7iDVLBk7UZL2wB8TbaSRzw6; RfkEnabled=false; __cq_dnt=0; cqcid=bca7iDVLBk7UZL2wB8TbaSRzw6; cquid=||; dw_dnt=0; notice_behavior=expressed,eu; _gcl_au=1.1.1332759810.1606386580; FPC=bd5af2f0-c6de-4e1d-90c30c1a5eb93f44; variantCookie=1; variantCookieTestID=back2criteo100; _ga=GA1.2.198259505.1606386580; _gid=GA1.2.953923278.1606386580; dw=1; dw_cookies_accepted=1; haNewVisitor=here; _fbp=fb.1.1606386581818.462643727; _pin_unauth=dWlkPU4yVTBNalE1TURZdFl6RTJOQzAwTkRCakxXSTNOVEF0WkRrNFpEWTVOamRoT1dVMQ; scarab.visitor=%222F6EB931F62030DA%22; __cq_uuid=bca7iDVLBk7UZL2wB8TbaSRzw6; IR_gbd=hannaandersson.com; __ruid=40293435-86-s5-49-1p-8bxxp3ofr62357vagmib-1606386582496; __rcmp=0!bj1ydzEsZj1ydyxzPTEsYz0yNDQwLHQ9MjAyMDA0MD

In [6]:
response=r.get(url,headers=headers)
response

<Response [200]>

""" **Instructions:**

. Find out the html tag and class names used for the products in sale, using CSS Selector.
. Use BeautifulSoup to extract all the html elements that contain the product characteristics.
. Use string manipulation techniques to replace whitespaces and linebreaks (i.e. `\n`) in the *text* of each html element. Use a list to store the clean names.
. Print the list of products."""


In [7]:
# your code here
response.content

b'\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n<!-- Product Search Hit current-cat="pdict.category.ID" search-query="null" product="57437-011" product-classification-cat="root" -->\n\n<div class="category-results__product" data-container="product" data-product-id="57437-011" data-colors-to-show="011" itemprop="itemListElement" itemscope itemtype="http://schema.org/ListItem">\n\n\n\n\n<!-- CQuotient Activity Tracking (viewCategory-cquotient.js) -->\n<script type="text/javascript">//<!--\n/* <![CDATA[ */\n(function(){\ntry {\n    if(window.CQuotient) {\n\tvar cq_params = {};\n\t\n\tcq_params.cookieId = window.CQuotient.getCQCookieId();\n\tcq_params.userId = window.CQuotient.getCQUserId();\n\tcq_params.emailId = CQuotient.getCQHashedEmail();\n\tcq_params.loginId = CQuotient.getCQHashedLogin();\n\tcq_params.accumulate = tru

In [8]:
soup= BeautifulSoup(response.content)

In [9]:
type(soup)

bs4.BeautifulSoup

In [10]:
productselect=soup.select('.product__image .product__image--link')
type(productselect)

bs4.element.ResultSet

In [11]:
len(productselect)

12

In [12]:
productselect

[<a aria-label="Image link for Heather Grey Baby Snap Footed Sleeper In Organic Cotton 57437-011" class="product__image--link thumb-link" data-product-id="57437-011" href="https://www.hannaandersson.com/pajamas-baby/57437-011.html?dwvar_57437-011_color=011&amp;cgid=Sale" onclick="gtmAnalytics.submitProductImpressionClick(_analytics_f29a4b1b2dc4a4e1b9ad822d91, this, 'image');" title="Baby Snap Footed Sleeper In Organic Cotton">
 <img alt="Product image for 57437-011" class="pt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=90" src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=50"/>
 <img alt="Alternate product image for 57437-011" class="alt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/dema

In [13]:
names=[i.get("title") for i in productselect]
len(names)

12

In [14]:
names

['Baby Snap Footed Sleeper In Organic Cotton',
 'Double Knee Woven Pants',
 'Baby Dress & Bloomer Set In Organic Cotton',
 'Super Soft Skater Dress',
 'Baby Dress & Bloomer Set In Organic Cotton',
 'Who Will You Be Cap',
 'Baby Snap Footed Sleeper In Organic Cotton',
 'Baby Sweatpants In French Terry',
 'Baby Snap Footed Sleeper In Organic Cotton',
 'Woven Canvas Pants',
 'Athletic Shorts',
 'Disney Princess Lunch Bag']

In [15]:
imageselect=soup.select('.pt-image')
len(imageselect)

12

In [16]:
imageselect

[<img alt="Product image for 57437-011" class="pt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=90" src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=50"/>,
 <img alt="Product image for 64344-SR4" class="pt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dw47092f5b/images/main/64344/64344_SR4_110_01.jpg?sw=369&amp;q=90" src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dw47092f5b/images/main/64344/64344_SR4_110_01.jpg?sw=369&amp;q=50"/>,
 <img alt="Product image for 57421-A91" class="pt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-

In [17]:
images=[i.get('data-src') for i in imageselect]
images

['https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&q=90',
 'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dw47092f5b/images/main/64344/64344_SR4_110_01.jpg?sw=369&q=90',
 'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwdfb2713f/images/main/57421/57421_A91_60_01.jpg?sw=369&q=90',
 'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwdda104e6/images/main/60363/60363_M07_110_01.jpg?sw=369&q=90',
 'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dw02ce41b1/images/main/57421/57421_PW1_60_01.jpg?sw=369&q=90',
 'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwaf8b6ae4/images/main/48200/48200_J86_60_01.jp

In [18]:
productfeat=soup.select('.product__image--link')
productfeat

[<a aria-label="Image link for Heather Grey Baby Snap Footed Sleeper In Organic Cotton 57437-011" class="product__image--link thumb-link" data-product-id="57437-011" href="https://www.hannaandersson.com/pajamas-baby/57437-011.html?dwvar_57437-011_color=011&amp;cgid=Sale" onclick="gtmAnalytics.submitProductImpressionClick(_analytics_f29a4b1b2dc4a4e1b9ad822d91, this, 'image');" title="Baby Snap Footed Sleeper In Organic Cotton">
 <img alt="Product image for 57437-011" class="pt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=90" src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&amp;q=50"/>
 <img alt="Alternate product image for 57437-011" class="alt-image lazyload" data-src="https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/dema

In [19]:
links=[i.get('href') for i in productselect]
len(links)

12

In [20]:
links

['https://www.hannaandersson.com/pajamas-baby/57437-011.html?dwvar_57437-011_color=011&cgid=Sale',
 'https://www.hannaandersson.com/boys-clothing-pants-shorts/64344-SR4.html?dwvar_64344-SR4_color=SR4&cgid=Sale',
 'https://www.hannaandersson.com/baby-girl-dresses-skirts/57421-A91.html?dwvar_57421-A91_color=A91&cgid=Sale',
 'https://www.hannaandersson.com/girls-clothing-dresses/60363-M07.html?dwvar_60363-M07_color=M07&cgid=Sale',
 'https://www.hannaandersson.com/baby-girl-dresses-skirts/57421-PW1.html?dwvar_57421-PW1_color=PW1&cgid=Sale',
 'https://www.hannaandersson.com/accessories-baby-hats/48200-J86.html?dwvar_48200-J86_color=J86&cgid=Sale',
 'https://www.hannaandersson.com/pajamas-baby/57437-G43.html?dwvar_57437-G43_color=G43&cgid=Sale',
 'https://www.hannaandersson.com/baby-girl-pants-leggings-shorts/52315-A91.html?dwvar_52315-A91_color=A91&cgid=Sale',
 'https://www.hannaandersson.com/pajamas-baby/57436-Ql7.html?dwvar_57436-Ql7_color=Ql7&cgid=Sale',
 'https://www.hannaandersson.com/

In [21]:
stdpriceselect=soup.select('.bfx-original-price')
len(stdpriceselect)

12

In [22]:
standard_prices=[i.text.strip("Standard Price:") for i in stdpriceselect]
standard_prices

['$40.00',
 '$48.00',
 '$40.00',
 '$48.00',
 '$40.00',
 '$20.00',
 '$40.00',
 '$30.00',
 '$40.00',
 '$50.00',
 '$38.00',
 '$30.00']

In [23]:
salepriceselect=soup.select('.bfx-price')
len(salepriceselect)

12

In [24]:
sale_prices=[i.text.strip('Sale Price:') for i in salepriceselect]
sale_prices

['$15.99',
 '$18.99',
 '$15.99',
 '$18.99',
 '$15.99',
 '$4.79',
 '$15.99',
 '$15.00',
 '$15.99',
 '$19.99',
 '$14.99',
 '$11.99']

In [25]:
ratingselect=soup.select('.product__ratings')
len(ratingselect)

12

In [26]:
print(ratingselect[0])
print(ratingselect[1])
print(ratingselect[2])
print(ratingselect[3])
print(ratingselect[4])
print(ratingselect[5])
print(ratingselect[6])
print(ratingselect[7])
print(ratingselect[8])
print(ratingselect[9])
print(ratingselect[10])
print(ratingselect[11])

<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="57437-011" data-starrating="4.5"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="64344-SR4" data-starrating="4.0"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="57421-A91" data-starrating="5.0"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="60363-M07" data-starrating="5.0"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="57421-PW1" data-starrating="5.0"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="48200-J86" data-starrating="5.0"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="57437-G43" data-starrating="4.5"></div>
</div>
<div class="product__ratings">
<div class="TTteaser TTteaser-tile" data-productid="52315-A

In [27]:
for i in range(0,12):
    print(i)

0
1
2
3
4
5
6
7
8
9
10
11


In [28]:
for i in range(0,12):
    print(len(str(list(ratingselect[i]))))


105
105
105
105
105
105
105
105
105
6
105
105


In [29]:
ratings=[]
for i in range(0,12):
    if len(str(list(ratingselect[i])))<=8:
        i='n/a'
    else:
        i=str(list(ratingselect[i])[1]).split()[4].strip('datsrivng>"/<-=')
    ratings.append(i)
print(ratings)

['4.5', '4.0', '5.0', '5.0', '5.0', '5.0', '4.5', '5.0', '5.0', 'n/a', '3.5', '5.0']


In [30]:
categoryselect=soup.select("div.product script[type]")
len(categoryselect)

12

In [31]:
import re

In [32]:
product_category=re.findall("\"dimension7\" : \"(.*?)\"", ' '.join([i.text for i in categoryselect]))
product_category

['pajamas-baby',
 'boys-clothing-pants-shorts',
 'baby-girl-dresses-skirts',
 'girls-clothing-dresses',
 'baby-girl-dresses-skirts',
 'accessories-baby-hats',
 'pajamas-baby',
 'baby-girl-pants-leggings-shorts',
 'pajamas-baby',
 'boys-clothing-pants-shorts',
 'boys-clothing-shorts',
 'girls-accessories-backpacks-bags']

In [33]:
len(product_category)

12

In [34]:
product_color=re.findall("\"variant\" : \"(.*?)\"", ' '.join([i.text for i in categoryselect]))
product_color

['Heather Grey',
 'Navy Blue',
 'Navy',
 'Sunshine',
 'Trek Teal',
 'Pumpkin',
 'Happy Pink',
 'Navy',
 'Fancy Frogs',
 'Deep Olive',
 'Multi',
 'Rapunzel']

In [35]:
len(product_color)

12

In [36]:
dict={"Name":names,
      "Image":images,
      "Product Color":product_color,
      "Product link":links,
      "Standard price":standard_prices,
      "Sale price":sale_prices,
      "Product category":product_category,
      "Rating":ratings}
dict

{'Name': ['Baby Snap Footed Sleeper In Organic Cotton',
  'Double Knee Woven Pants',
  'Baby Dress & Bloomer Set In Organic Cotton',
  'Super Soft Skater Dress',
  'Baby Dress & Bloomer Set In Organic Cotton',
  'Who Will You Be Cap',
  'Baby Snap Footed Sleeper In Organic Cotton',
  'Baby Sweatpants In French Terry',
  'Baby Snap Footed Sleeper In Organic Cotton',
  'Woven Canvas Pants',
  'Athletic Shorts',
  'Disney Princess Lunch Bag'],
 'Image': ['https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwe6d7c706/images/main/57437/57437_011_60_01.jpg?sw=369&q=90',
  'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dw47092f5b/images/main/64344/64344_SR4_110_01.jpg?sw=369&q=90',
  'https://www.hannaandersson.com/dw/image/v2/BBLM_PRD/on/demandware.static/-/Sites-master-catalog/default/dwdfb2713f/images/main/57421/57421_A91_60_01.jpg?sw=369&q=90',
  'https://www.hannaandersson.com/d

In [37]:
df = pd.DataFrame(dict)
df

Unnamed: 0,Name,Image,Product Color,Product link,Standard price,Sale price,Product category,Rating
0,Baby Snap Footed Sleeper In Organic Cotton,https://www.hannaandersson.com/dw/image/v2/BBL...,Heather Grey,https://www.hannaandersson.com/pajamas-baby/57...,$40.00,$15.99,pajamas-baby,4.5
1,Double Knee Woven Pants,https://www.hannaandersson.com/dw/image/v2/BBL...,Navy Blue,https://www.hannaandersson.com/boys-clothing-p...,$48.00,$18.99,boys-clothing-pants-shorts,4.0
2,Baby Dress & Bloomer Set In Organic Cotton,https://www.hannaandersson.com/dw/image/v2/BBL...,Navy,https://www.hannaandersson.com/baby-girl-dress...,$40.00,$15.99,baby-girl-dresses-skirts,5.0
3,Super Soft Skater Dress,https://www.hannaandersson.com/dw/image/v2/BBL...,Sunshine,https://www.hannaandersson.com/girls-clothing-...,$48.00,$18.99,girls-clothing-dresses,5.0
4,Baby Dress & Bloomer Set In Organic Cotton,https://www.hannaandersson.com/dw/image/v2/BBL...,Trek Teal,https://www.hannaandersson.com/baby-girl-dress...,$40.00,$15.99,baby-girl-dresses-skirts,5.0
5,Who Will You Be Cap,https://www.hannaandersson.com/dw/image/v2/BBL...,Pumpkin,https://www.hannaandersson.com/accessories-bab...,$20.00,$4.79,accessories-baby-hats,5.0
6,Baby Snap Footed Sleeper In Organic Cotton,https://www.hannaandersson.com/dw/image/v2/BBL...,Happy Pink,https://www.hannaandersson.com/pajamas-baby/57...,$40.00,$15.99,pajamas-baby,4.5
7,Baby Sweatpants In French Terry,https://www.hannaandersson.com/dw/image/v2/BBL...,Navy,https://www.hannaandersson.com/baby-girl-pants...,$30.00,$15.00,baby-girl-pants-leggings-shorts,5.0
8,Baby Snap Footed Sleeper In Organic Cotton,https://www.hannaandersson.com/dw/image/v2/BBL...,Fancy Frogs,https://www.hannaandersson.com/pajamas-baby/57...,$40.00,$15.99,pajamas-baby,5.0
9,Woven Canvas Pants,https://www.hannaandersson.com/dw/image/v2/BBL...,Deep Olive,https://www.hannaandersson.com/boys-clothing-p...,$50.00,$19.99,boys-clothing-pants-shorts,


In [38]:
df.shape

(12, 8)