# **Web Scraping**
Web scraping, web harvesting, or web data extraction is data scraping used for extracting data from websites. Web scraping software may directly access the World Wide Web using the Hypertext Transfer Protocol or a web browser.

Although web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered from the web and stored, typically in a central local database or spreadsheet, for later retrieval or analysis.

The most popular Python scraping tools are BeautifulSoup, Scrapy, and Selenium.

This notebook uses BeautifulSoup and Selenium.


## **BeautifulSoup Library**
Beautiful Soup is a Python library for parsing structured data. It allows you to interact with HTML in a similar way to how you interact with a web page using developer tools. The library exposes a couple of intuitive functions you can use to explore the HTML you received. 

**For More Information :** https://realpython.com/beautiful-soup-web-scraper-python/

In [None]:
# pip install beautifulsoup4

### 1. Intro to Scraping 
------------------------------
> First scraping example: try to scrape data from https://jekso.github.io/scrapping-example/index.html

In [None]:
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

In [None]:
response = requests.get('https://jekso.github.io/scrapping-example/index.html')
print(response.text)

<!DOCTYPE html>

<html>

    <head>
        <title>My First HTML Page</title>

        <style>

            p {
                color: green;
                
            }

            h1 {
                color:blue;
                text-shadow: 2px 2px 5px red;
            }


            img {
                width: 200px;
                height: 200px;
            }


            select {
                width: 300px;
                height: 50px;
                background-color: orange;
            }

            div#bigred {
                width: 100%%;
                height: 200px;
                background-color: red;
                margin: 20px;
                color: white;
            }


            div#smallblue {
                width: 100%%;
                height: 100px;
                background-color: blue;
                margin: 20px;
                color: yellow;
            }

            li.redul {
                color: red;
            }

            li

In [None]:
type(response.text)

str

In [None]:
soup = BeautifulSoup(response.text, 'html.parser')
type(soup)

bs4.BeautifulSoup

In [None]:
print(soup)

<!DOCTYPE html>

<html>
<head>
<title>My First HTML Page</title>
<style>

            p {
                color: green;
                
            }

            h1 {
                color:blue;
                text-shadow: 2px 2px 5px red;
            }


            img {
                width: 200px;
                height: 200px;
            }


            select {
                width: 300px;
                height: 50px;
                background-color: orange;
            }

            div#bigred {
                width: 100%%;
                height: 200px;
                background-color: red;
                margin: 20px;
                color: white;
            }


            div#smallblue {
                width: 100%%;
                height: 100px;
                background-color: blue;
                margin: 20px;
                color: yellow;
            }

            li.redul {
                color: red;
            }

            li.blueol {
            

In [None]:
title = soup.find('title')
title

<title>My First HTML Page</title>

In [None]:
page = title.get_text().strip()
page

'My First HTML Page'

In [None]:
paragraphs = soup.find_all('p')
paragraphs

[<p>Iam a paragrapph</p>, <p>Iam a paragrapph</p>]

In [None]:
for p in paragraphs :
    print(p.get_text())
    print('----' *5)

Iam a paragrapph
--------------------
Iam a paragrapph
--------------------


In [None]:
heads_2 = soup.find_all('h2')
heads_2

[<h2 class="h2_orange bg-yellow">My First H2 Header</h2>,
 <h2 class="h2_orange">My Second H2 Header</h2>,
 <h2>My 3rd H2 Header</h2>]

In [None]:
h2_orange = soup.find_all('h2' , attrs={'class': 'h2_orange'})
print(h2_orange)

[<h2 class="h2_orange bg-yellow">My First H2 Header</h2>, <h2 class="h2_orange">My Second H2 Header</h2>]


In [None]:
for h2 in h2_orange :
    print(h2.get_text())
    print(h2.attrs)
    print('----'*5)
    

My First H2 Header
{'class': ['h2_orange', 'bg-yellow']}
--------------------
My Second H2 Header
{'class': ['h2_orange']}
--------------------


In [None]:
for h2 in h2_orange :
    print(h2.get_text())
    print(h2.get('class'))
    print('----'*5)

My First H2 Header
['h2_orange', 'bg-yellow']
--------------------
My Second H2 Header
['h2_orange']
--------------------


In [None]:
for h2 in h2_orange :
    print(h2.get_text())
    print(h2.name)
    print('----'*5)

My First H2 Header
h2
--------------------
My Second H2 Header
h2
--------------------


In [None]:
h2_bg_yellow = soup.find_all('h2' , attrs={'class': 'bg-yellow'})
h2_bg_yellow

[<h2 class="h2_orange bg-yellow">My First H2 Header</h2>]

In [None]:
h2_bg_yellow[0].get_text()

'My First H2 Header'

In [None]:
h2_bg_yellow[0].attrs

{'class': ['h2_orange', 'bg-yellow']}

In [None]:
h2_bg_yellow[0].attrs['class']

['h2_orange', 'bg-yellow']

In [None]:
h2_bg_yellow[0].has_attr('class')

True

In [None]:
h2_bg_yellow[0].get('class')

['h2_orange', 'bg-yellow']

In [None]:
h2_bg_yellow[0].attrs['class'][1]

'bg-yellow'

In [None]:
h2_bg_yellow[0].get('id')

In [None]:
h2_bg_yellow[0].has_attr('id')

False

In [None]:
all_link = soup.find_all('a')
all_link

[<a href="http://www.google.com">Go To Google</a>]

In [None]:
for a in all_link :
    print(a.get_text())
    print(a.get('href'))

Go To Google
http://www.google.com


In [None]:
form = soup.find_all('form')
form

[<form>
             Username: <input class="bg-yellow" type="text"/>
 <br/><br/>
             Passowrd: <input type="password"/>
 <br/><br/>
 <select>
 <option value="volvo">Volvo</option>
 <option value="saab">Saab</option>
 <option value="fiat">Fiat</option>
 <option value="audi">Audi</option>
 </select>
 <br/><br/>
 <input type="checkbox"/>
             I agree for the terms
             <br/><br/>
 <input type="submit" value="Search"/>
 </form>]

In [None]:
for option in form[0].find_all('option'):
    print(option.get_text().strip())

Volvo
Saab
Fiat
Audi


In [None]:
for op in form[0].find_all('option'):
    print(op.get('value'))

volvo
saab
fiat
audi


In [None]:
all_divs = soup.find_all('div')
all_divs

[<div id="bigred">
             Hello world
         </div>,
 <div id="smallblue">
             Hello python
             <p>Iam a paragrapph</p>
 </div>]

In [None]:
for div in all_divs :
    print(div.get_text().strip())

Hello world
Hello python
            Iam a paragrapph


In [None]:
for div in all_divs :
    for p in div.find_all('p') :
        print(p.get_text())

Iam a paragrapph


In [None]:
for div in all_divs :
    print(div.get('id'))
    print('-----')

bigred
-----
smallblue
-----


In [None]:
iframe  = soup.find('iframe')
iframe

<iframe height="345" src="https://www.youtube.com/embed/tgbNymZ7vqY" width="420"></iframe>

In [None]:
iframe.attrs

{'width': '420',
 'height': '345',
 'src': 'https://www.youtube.com/embed/tgbNymZ7vqY'}

In [None]:
iframe.attrs['width']

'420'

In [None]:
iframe.attrs['class']

KeyError: 'class'

In [None]:
iframe.get('width')

'420'

In [None]:
iframe.get('class')

In [None]:
width = int(iframe.get('width'))
height = int(iframe.get('height'))
resolution = width * height
resolution

144900
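
The cells above show two ways of reading an attribute: indexing `attrs` raises `KeyError` when the attribute is missing, while `.get()` returns `None`. Below is a minimal sketch of a helper built on that difference. It uses a plain dict as a stand-in for `iframe.attrs` so that it runs without a network request, and the helper name `int_attr` is hypothetical, not part of BeautifulSoup.

```python
# Stand-in for iframe.attrs (values copied from the output above).
iframe_attrs = {
    'width': '420',
    'height': '345',
    'src': 'https://www.youtube.com/embed/tgbNymZ7vqY',
}


def int_attr(attrs, name, default=None):
    """Return attrs[name] as an int, or default if it is missing or not numeric."""
    value = attrs.get(name)      # None instead of KeyError when the key is missing
    if value is None:
        return default
    try:
        return int(value)
    except ValueError:
        return default


width = int_attr(iframe_attrs, 'width')
height = int_attr(iframe_attrs, 'height')
print(width * height)
print(int_attr(iframe_attrs, 'class'))   # the iframe has no class attribute
```

Returning a default keeps a long scraping loop running when one element lacks an attribute, at the cost of hiding the gap; whether that trade-off is right depends on the data.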

In [None]:
link = iframe.get('src').strip()
link

'https://www.youtube.com/embed/tgbNymZ7vqY'

In [None]:
table = soup.find_all('tr')
table

[<tr>
 <th>House Size</th>
 <th>Num of Rooms</th>
 <th class="bg-yellow">Has Garden</th>
 <th>Price</th>
 </tr>,
 <tr>
 <td>120</td>
 <td>3</td>
 <td>yes</td>
 <td>500000</td>
 </tr>,
 <tr>
 <td>140</td>
 <td>4</td>
 <td class="bg-yellow">no</td>
 <td>750000</td>
 </tr>,
 <tr>
 <td>80</td>
 <td>1</td>
 <td>no</td>
 <td>300000</td>
 </tr>]

In [None]:
for row in table :
    print(row)
    print('-----------' + '\n\n')

<tr>
<th>House Size</th>
<th>Num of Rooms</th>
<th class="bg-yellow">Has Garden</th>
<th>Price</th>
</tr>
-----------


<tr>
<td>120</td>
<td>3</td>
<td>yes</td>
<td>500000</td>
</tr>
-----------


<tr>
<td>140</td>
<td>4</td>
<td class="bg-yellow">no</td>
<td>750000</td>
</tr>
-----------


<tr>
<td>80</td>
<td>1</td>
<td>no</td>
<td>300000</td>
</tr>
-----------




In [None]:
for head in soup.find_all('th') :
    print(head.get_text())
    print('----')

House Size
----
Num of Rooms
----
Has Garden
----
Price
----


In [None]:
header = [head.get_text() for head in soup.find_all('th') ]
header

['House Size', 'Num of Rooms', 'Has Garden', 'Price']

In [None]:
for td in soup.find_all('td') :
    print(td.get_text()) 

120
3
yes
500000
140
4
no
750000
80
1
no
300000


In [None]:
all_rows = []
for tr in soup.find_all('tr')[1:]:
    row = [ td.get_text() for td in tr.find_all('td')]
    all_rows.append(row)
print(all_rows)    

[['120', '3', 'yes', '500000'], ['140', '4', 'no', '750000'], ['80', '1', 'no', '300000']]


In [None]:
all_rows = []
for tr in soup.find_all('tr')[1:]:
    row = []
    for td in tr.find_all('td') : 
         row.append(td.get_text())
    all_rows.append(row)
print(all_rows)    

[['120', '3', 'yes', '500000'], ['140', '4', 'no', '750000'], ['80', '1', 'no', '300000']]


In [None]:
# enumerate yields the row number and the row together
for number, row in enumerate(all_rows, start=1):
    print('Row Number', number, ':', row)

Row Number 1 : ['120', '3', 'yes', '500000']
Row Number 2 : ['140', '4', 'no', '750000']
Row Number 3 : ['80', '1', 'no', '300000']


In [None]:
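# newline='' stops the csv module from writing blank lines between rows on Windows
with open('data.csv', mode='w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)

    # Header row: the text of every <th> cell
    header = [th.get_text() for th in soup.find_all('th')]
    writer.writerow(header)

    # Data rows: skip the first <tr>, which holds the header cells
    for tr in soup.find_all('tr')[1:]:
        row = [td.get_text() for td in tr.find_all('td')]
        writer.writerow(row)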
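
Every value written to `data.csv` is a string, because `get_text()` always returns text. Below is a minimal sketch of reading such a file back and converting each column to a more useful type. It writes to an in-memory `io.StringIO` buffer instead of a real file, and the function name `to_typed_records` is hypothetical.

```python
import csv
import io

# Header and rows copied from the outputs of the cells above.
header = ['House Size', 'Num of Rooms', 'Has Garden', 'Price']
all_rows = [['120', '3', 'yes', '500000'],
            ['140', '4', 'no', '750000'],
            ['80', '1', 'no', '300000']]

# Write the table to an in-memory buffer, standing in for data.csv.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(all_rows)


def to_typed_records(csv_text):
    """Parse the CSV text and convert each column from str to int or bool."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text, newline='')):
        records.append({
            'House Size': int(row['House Size']),
            'Num of Rooms': int(row['Num of Rooms']),
            'Has Garden': row['Has Garden'] == 'yes',
            'Price': int(row['Price']),
        })
    return records


records = to_typed_records(buffer.getvalue())
print(records[0])
```

With typed values, calculations such as the total or average price work directly, without converting strings at every use.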


### 2. USD To EGP Exchange

Scrape the USD to EGP exchange rate from this website:
https://www.exchangerates.org.uk/Dollars-to-Egyptian-Pounds-currency-conversion-page.html
Then use it to build a small program that takes an amount in US dollars from the user and calculates its value in Egyptian pounds.




In [None]:
response = requests.get('https://www.exchangerates.org.uk/Dollars-to-Egyptian-Pounds-currency-conversion-page.html')
soup = BeautifulSoup(response.text, 'html.parser')

In [None]:
div = soup.find_all('div' , {'class':'p_conv30'})
div

[<div class="p_conv30">
 <p>Welcome to the <b>Dollars to Egyptian Pounds</b> page, updated every minute between Sunday 22:00 and Friday 22:00 (UK)</p>
 </div>,
 <div class="p_conv30">
 <span id="shd2a">1 USD = <span id="shd2b;">18.9174</span> EGP</span>
 </div>]

In [None]:
title  = div[0].find('p').get_text().split('page')[0] 
title

'Welcome to the Dollars to Egyptian Pounds '

In [None]:
USD_to_EGP =  float(div[1].find('span',{'id' :'shd2b;'}).get_text())
USD_to_EGP

18.9174

In [None]:
print(title)
print('--'*40)
USD = int(input('Enter USD Dollars : '))
EGP = USD * USD_to_EGP
print(f'{USD} USD =  {EGP} EGP ')

Welcome to the Dollars to Egyptian Pounds 
--------------------------------------------------------------------------------
Enter USD Dollars : 4
4 USD =  75.6696 EGP 
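
`int(input(...))` in the cell above accepts only whole numbers and raises a `ValueError` for input such as `2.5`. Below is a hedged sketch of a slightly more forgiving conversion step. The function name `usd_to_egp` and the choice to round to two decimals are assumptions for illustration, and the rate is the one shown in the scraped output above, which changes over time.

```python
USD_TO_EGP = 18.9174  # rate from the scraped output above; not a current value


def usd_to_egp(amount_text, rate=USD_TO_EGP):
    """Convert user-entered USD text to EGP, rejecting invalid amounts."""
    try:
        amount = float(amount_text.strip())
    except ValueError:
        raise ValueError(f'Not a number: {amount_text!r}') from None
    if amount < 0:
        raise ValueError('Amount must not be negative')
    return round(amount * rate, 2)


print(usd_to_egp('4'))      # 75.67
print(usd_to_egp(' 10 '))   # 189.17
```

In the notebook, the call would take the place of the multiplication, for example `usd_to_egp(input('Enter USD Dollars : '), USD_to_EGP)`.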


### 3. Scraping MIT Algorithm Course
-----------------------------------------------------
> Scrape the lecture video links for MIT's Introduction to Algorithms course and save each lecture's name and video link in a CSV file named after the course (for example, `Introduction to Algorithms(try).csv`).

> Scraping Link : https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/lecture-videos/# 

In [None]:
response = requests.get('https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-spring-2020/lecture-videos/#')
print(response.text)

<!doctype html>
<html lang="en">
<head>
  <link href="/static/css/course.80259.css" rel="stylesheet">
  
    
    <script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
    new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
    j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
    'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-NMQZ25T');</script>
    
  
  <meta charset="utf-8">
  <meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover">
  
  <meta name="description" content="MIT OpenCourseWare is a web based publication of virtually all MIT course content. OCW is open and available to the world and is a permanent MIT activity">
  <meta name="keywords" content="opencourseware,MIT OCW,courseware,MIT opencourseware,Free Courses,class notes,class syllabus,class materials,tutorials,online courses,MIT courses">
  <link rel

In [None]:
soup = BeautifulSoup(response.text, 'html.parser')
print(soup)

<!DOCTYPE html>

<html lang="en">
<head>
<link href="/static/css/course.80259.css" rel="stylesheet"/>
<script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
    new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
    j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
    'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-NMQZ25T');</script>
<meta charset="utf-8"/>
<meta content="width=device-width, initial-scale=1, viewport-fit=cover" name="viewport"/>
<meta content="MIT OpenCourseWare is a web based publication of virtually all MIT course content. OCW is open and available to the world and is a permanent MIT activity" name="description"/>
<meta content="opencourseware,MIT OCW,courseware,MIT opencourseware,Free Courses,class notes,class syllabus,class materials,tutorials,online courses,MIT courses" name="keywords"/>
<link href="https://ocw.mit.edu/course

In [None]:
soup.find('div',attrs= {'id' : "course-banner" })

<div class="p-0" id="course-banner">
<div class="max-content-width m-auto px-5 py-6">
<a class="text-uppercase display-4 font-weight-bold m-0 text-white" href="/courses/6-006-introduction-to-algorithms-spring-2020/">Introduction to Algorithms</a>
</div>
</div>

In [None]:
soup.find('div',attrs= {'id' : "course-banner" }).find('a')

<a class="text-uppercase display-4 font-weight-bold m-0 text-white" href="/courses/6-006-introduction-to-algorithms-spring-2020/">Introduction to Algorithms</a>

In [None]:
course_name = soup.find('div',attrs= {'id' : "course-banner" }).find('a').get_text().strip()
course_name

'Introduction to Algorithms'

In [None]:
soup.find('title').get_text()

'Lecture Videos | Introduction to Algorithms | Electrical Engineering and Computer Science | MIT OpenCourseWare'

In [None]:
soup.find('title').get_text().split('|')

['Lecture Videos ',
 ' Introduction to Algorithms ',
 ' Electrical Engineering and Computer Science ',
 ' MIT OpenCourseWare']

In [None]:
course_name = soup.find('title').get_text().split('|')[1].strip()
course_name

'Introduction to Algorithms'

In [None]:
file_name = course_name +'(try).csv'
file_name

'Introduction to Algorithms(try).csv'
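
The file name is built from scraped text, so it can contain characters that some operating systems do not allow in file names (the page title itself contains `|`, and the lecture titles contain `:`). Below is a minimal sketch of a clean-up step. The helper `safe_filename` is hypothetical, and the character set it replaces is the one Windows rejects.

```python
import re


def safe_filename(name, replacement='_'):
    """Replace characters that are not allowed in Windows file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', replacement, name)
    return cleaned.strip()


page_title = ('Lecture Videos | Introduction to Algorithms | '
              'Electrical Engineering and Computer Science | MIT OpenCourseWare')

# Split the title and strip every part in one step.
parts = [part.strip() for part in page_title.split('|')]
course_name = parts[1]

print(safe_filename(course_name) + '.csv')
print(safe_filename('Lecture 1: Algorithms and Computation') + '.csv')
```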

In [None]:
all_lectures_links = soup.find_all('a', attrs= {'class' :'resource-list-title'})
all_lectures_links

[<a class="resource-list-title" href="/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-1-algorithms-and-computation/">
         Lecture 1: Algorithms and Computation
       </a>,
 <a class="resource-list-title" href="/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-10-depth-first-search/">
         Lecture 10: Depth-First Search
       </a>,
 <a class="resource-list-title" href="/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-11-weighted-shortest-paths/">
         Lecture 11: Weighted Shortest Paths
       </a>,
 <a class="resource-list-title" href="/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-12-bellman-ford/">
         Lecture 12: Bellman-Ford
       </a>,
 <a class="resource-list-title" href="/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-13-dijkstra/">
         Lecture 13: Dijkstra
       </a>,
 <a class="resource-list-title" href="/courses/6-006-introduction-to

In [None]:
all_lectures_links[1].get('href')

'/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-10-depth-first-search/'

In [None]:
print('https://ocw.mit.edu'+all_lectures_links[1].get('href'))

https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-10-depth-first-search/


In [None]:
# Iterate over the links directly instead of indexing by position
for link in all_lectures_links:
    print(f"https://ocw.mit.edu{link.get('href')}")
    print('--'*60)

https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-1-algorithms-and-computation/
------------------------------------------------------------------------------------------------------------------------
https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-10-depth-first-search/
------------------------------------------------------------------------------------------------------------------------
https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-11-weighted-shortest-paths/
------------------------------------------------------------------------------------------------------------------------
https://ocw.mit.edu/courses/6-006-introduction-to-algorithms-spring-2020/resources/lecture-12-bellman-ford/
------------------------------------------------------------------------------------------------------------------------
https://ocw.mit.edu/courses/6-006-introduction-to-algo
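
Concatenating `'https://ocw.mit.edu'` with the `href` works here because every `href` on this page starts with `/`. For pages that mix absolute, root-relative, and document-relative links, `urllib.parse.urljoin` from the standard library resolves all three against the page URL. The sketch below uses the lecture link shown above plus a document-relative `href` from the Books to Scrape page used later in this notebook.

```python
from urllib.parse import urljoin

page_url = ('https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/'
            '6-006-introduction-to-algorithms-spring-2020/lecture-videos/')
href = ('/courses/6-006-introduction-to-algorithms-spring-2020/'
        'resources/lecture-10-depth-first-search/')

# A root-relative href (leading "/") replaces the whole path of the page URL.
lecture_url = urljoin(page_url, href)
print(lecture_url)

# A document-relative href is resolved against the directory of the page.
books_page = 'https://books.toscrape.com/catalogue/category/books/travel_2/index.html'
book_url = urljoin(books_page, '../../../its-only-the-himalayas_981/index.html')
print(book_url)
```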

In [None]:
all_lectures_links[1].get_text().strip()

'Lecture 10: Depth-First Search'

In [None]:
for lecture in all_lectures_links : 
    print(f'{lecture.get_text().strip()}')
    print('--'*40)

Lecture 1: Algorithms and Computation
--------------------------------------------------------------------------------
Lecture 10: Depth-First Search
--------------------------------------------------------------------------------
Lecture 11: Weighted Shortest Paths
--------------------------------------------------------------------------------
Lecture 12: Bellman-Ford
--------------------------------------------------------------------------------
Lecture 13: Dijkstra
--------------------------------------------------------------------------------
Lecture 14: APSP and Johnson
--------------------------------------------------------------------------------
Lecture 15: Dynamic Programming, Part 1: SRTBOT, Fib, DAGs, Bowling
--------------------------------------------------------------------------------
Lecture 16: Dynamic Programming, Part 2: LCS, LIS, Coins
--------------------------------------------------------------------------------
Lecture 17: Dynamic Programming, Part 3: APSP, 
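
In the output above, Lecture 10 directly follows Lecture 1, so the links come back in the page's own order rather than in numeric lecture order. If numeric order matters for the CSV, one option is to sort by the number in the title. Below is a minimal sketch, assuming every title starts with `Lecture <number>`; the helper `lecture_number` is hypothetical.

```python
import re

# A few titles copied from the output above, deliberately shuffled.
titles = [
    'Lecture 13: Dijkstra',
    'Lecture 1: Algorithms and Computation',
    'Lecture 10: Depth-First Search',
    'Lecture 12: Bellman-Ford',
]


def lecture_number(title):
    """Return the integer after 'Lecture ', or infinity if there is none."""
    match = re.match(r'Lecture (\d+)', title)
    return int(match.group(1)) if match else float('inf')


by_text = sorted(titles)                        # plain string comparison
by_number = sorted(titles, key=lecture_number)  # numeric comparison

print(by_text)
print(by_number)
```

A plain string sort places `Lecture 1:` after `Lecture 13` because `:` compares greater than any digit, which is why the numeric key is needed.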

In [None]:
with open(file_name, mode='w', encoding='utf-8', newline='') as csv_file:
    fieldnames = ['video_name', 'video_url']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    for a in soup.find_all('a', attrs={'class' :'resource-list-title'}):
        video_name = a.get_text().strip()
        video_url = 'https://ocw.mit.edu' + a.get('href')
        writer.writerow({'video_name': video_name, 'video_url': video_url})

In [None]:
pd.read_csv(file_name)

Unnamed: 0,video_name,video_url
0,Lecture 1: Algorithms and Computation,https://ocw.mit.edu/courses/6-006-introduction...
1,Lecture 10: Depth-First Search,https://ocw.mit.edu/courses/6-006-introduction...
2,Lecture 11: Weighted Shortest Paths,https://ocw.mit.edu/courses/6-006-introduction...
3,Lecture 12: Bellman-Ford,https://ocw.mit.edu/courses/6-006-introduction...
4,Lecture 13: Dijkstra,https://ocw.mit.edu/courses/6-006-introduction...
5,Lecture 14: APSP and Johnson,https://ocw.mit.edu/courses/6-006-introduction...
6,"Lecture 15: Dynamic Programming, Part 1: SRTBO...",https://ocw.mit.edu/courses/6-006-introduction...
7,"Lecture 16: Dynamic Programming, Part 2: LCS, ...",https://ocw.mit.edu/courses/6-006-introduction...
8,"Lecture 17: Dynamic Programming, Part 3: APSP,...",https://ocw.mit.edu/courses/6-006-introduction...
9,"Lecture 18: Dynamic Programming, Part 4: Rods,...",https://ocw.mit.edu/courses/6-006-introduction...


### 4. Scraping Books
--------------------------------------------------------

1. Scrape the name, price, and rating of each book and save the data in a CSV file.

2. Scraping link: https://books.toscrape.com/catalogue/category/books/travel_2/index.html

In [None]:
response = requests.get('https://books.toscrape.com/catalogue/category/books/travel_2/index.html')

In [None]:
soup = BeautifulSoup(response.text, 'html.parser')

In [None]:
title = soup.find('title')
# The page title starts with the category name (here, "Travel")
category_name = title.get_text().strip().split('|')[0].strip()
file_name = category_name + '.csv'
file_name

'Travel.csv'

## **Selenium Library**
Selenium is a powerful tool for controlling web browsers programmatically and automating browser tasks. It supports all major browsers and operating systems, and its scripts can be written in several languages, including Python, Java, and C#. This notebook uses Python.

## **Selenium can be used to:**
1. Control a browser programmatically
2. Run automated browser tests
3. Download files from the internet
4. Scrape data from websites, for example:
     1. Quotes
     2. Gold and currency rates
     3. News
     
**For More Information :** https://www.geeksforgeeks.org/selenium-python-tutorial/

**Note:** This is only an introduction to Selenium, covering the part of the library that is useful for web scraping.

In [None]:
#pip install selenium==3.141.0

In [None]:
#pip install webdriver-manager

In [None]:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
import requests
import time
from bs4 import BeautifulSoup

In [None]:
Path = r'E:\Aslm\Driver\chromedriver.exe'  # raw string, so the backslashes are not read as escape sequences
browser = webdriver.Chrome(Path)
#browser = webdriver.Chrome(ChromeDriverManager().install())

In [None]:
browser.get('https://www.techwithtim.net/')  

In [None]:
browser.title

'Tech With Tim - Python & Java Programming Tutorials - techwithtim.net'

In [None]:
print(browser.page_source)

<html lang="en-US" class=" js csstransforms csstransforms3d csstransitions"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"><link rel="profile" href="http://gmpg.org/xfn/11"><link rel="pingback" href="https://www.techwithtim.net/xmlrpc.php">  <script src="https://partner.googleadservices.com/gampad/cookie.js?domain=www.techwithtim.net&amp;callback=_gfp_s_&amp;client=ca-pub-6240468619130074&amp;gpid_exp=1"></script><script src="https://pagead2.googlesyndication.com/pagead/managed/js/adsense/m202207280101/show_ads_impl_fy2021.js?bust=31068684" id="google_shimpl"></script><script type="text/javascript" async="" src="https://www.google-analytics.com/analytics.js"></script><script async="" src="https://www.googletagmanager.com/gtag/js?id=UA-129383985-3"></script> <script>window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());

  gtag('config', 'UA-129383985-3');</script> <script>functi

		}</style><link rel="icon" href="https://www.techwithtim.net/wp-content/uploads/2020/02/cropped-Tech-With-TimXL-32x32.png" sizes="32x32"><link rel="icon" href="https://www.techwithtim.net/wp-content/uploads/2020/02/cropped-Tech-With-TimXL-192x192.png" sizes="192x192"><link rel="apple-touch-icon" href="https://www.techwithtim.net/wp-content/uploads/2020/02/cropped-Tech-With-TimXL-180x180.png"><meta name="msapplication-TileImage" content="https://www.techwithtim.net/wp-content/uploads/2020/02/cropped-Tech-With-TimXL-270x270.png"><link rel="stylesheet" href="https://use.fontawesome.com/releases/v5.6.0/css/all.css" integrity="sha384-aOkxzJ5uQz7WBObEZcHvV5JvRW3TUc2rNPA7pe3AwnsUohiw1Vj2Rgx2KSOkF5+h" crossorigin="anonymous"><meta name="google-site-verification" content="Klz-majLq2UU1MPxJLTwwrCBRgin6tHkBkqiIx8eaPA"><meta http-equiv="origin-trial" content="AzoawhTRDevLR66Y6MROu167EDncFPBvcKOaQispTo9ouEt5LvcBjnRFqiAByRT+2cDHG1Yj4dXwpLeIhc98/gIAAACFeyJvcmlnaW4iOiJodHRwczovL2RvdWJsZWNsaWNrLm5ldDo

In [None]:
type(browser.page_source)

str

In [None]:
search = browser.find_element_by_name('s')
search.clear()
search.send_keys('test')
search.send_keys(Keys.RETURN)

In [None]:
home = browser.find_element_by_id('text-title-desc').find_element_by_tag_name('a')
home.click()
time.sleep(3)
python = browser.find_element_by_link_text('Python Programming')
python.click()

In [None]:
browser.back()
time.sleep(1)

In [None]:
browser.forward()
time.sleep(1)

In [None]:
browser.back()
time.sleep(1)
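
The fixed `time.sleep` pauses above wait the same amount of time whether the page loads quickly or slowly. The `WebDriverWait` and `expected_conditions` imports at the top of this section exist for that purpose but are not used in this notebook. The sketch below is not Selenium's implementation. It is a standard-library illustration of the polling idea behind an explicit wait, with a hypothetical helper `wait_until` and a fake condition standing in for a real element lookup.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'Condition not met within {timeout} seconds')
        time.sleep(poll_interval)


# Stand-in for "is the element on the page yet?": becomes true on the third check.
checks = {'count': 0}


def element_is_ready():
    checks['count'] += 1
    return checks['count'] >= 3


wait_until(element_is_ready, timeout=2.0, poll_interval=0.01)
print(checks['count'])
```

Polling returns as soon as the condition holds, so a fast page costs almost no waiting, while a slow page still gets the full timeout.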

In [None]:
search = browser.find_element_by_name('s')
search.clear()
search.send_keys('python')
search.send_keys(Keys.RETURN)

In [None]:
main = browser.find_element_by_id('main')
headers = main.find_elements_by_class_name('entry-title')

for header in headers :
        text_tag = header.find_element_by_tag_name('a')
        print(text_tag.get_attribute('text'))
        print(text_tag.get_attribute('href'))
        print('--'*50)

Python AI ChatBot Tutorial
https://www.techwithtim.net/tutorials/ai-chatbot/
----------------------------------------------------------------------------------------------------
Python Neural Networks
https://www.techwithtim.net/tutorials/python-neural-networks/
----------------------------------------------------------------------------------------------------
Sending Emails with Python
https://www.techwithtim.net/tutorials/sending-emails-with-python/
----------------------------------------------------------------------------------------------------
Python Multi-Threading Tutorials
https://www.techwithtim.net/tutorials/python-programming/python-multi-threading/
----------------------------------------------------------------------------------------------------
Google Sheets – Python API Tutorial
https://www.techwithtim.net/tutorials/google-sheets-python-api-tutorial/
----------------------------------------------------------------------------------------------------
Python Online Gam

In [None]:
from selenium.common.exceptions import NoSuchElementException

i = 1
while True:
    main = browser.find_element_by_id('main')
    headers = main.find_elements_by_class_name('entry-title')
    print('Page ' + str(i))
    print('--------')
    print('\n')
    for header in headers:
        text_tag = header.find_element_by_tag_name('a')
        print('\tName :' + text_tag.get_attribute('text'))
        print('\t' + text_tag.get_attribute('href'))
        print('--' * 50)
    try:
        # Go to the next results page; the link is absent on the last page
        browser.find_element_by_link_text('Next →').click()
    except NoSuchElementException:
        # Catch only the "no Next link" case instead of hiding every error
        break
    i += 1
    print('\n\n')
    print('\tTo Next Page\n\n')
    time.sleep(2)

Page 1
--------


	Name :Python AI ChatBot Tutorial
	https://www.techwithtim.net/tutorials/ai-chatbot/
----------------------------------------------------------------------------------------------------
	Name :Python Neural Networks
	https://www.techwithtim.net/tutorials/python-neural-networks/
----------------------------------------------------------------------------------------------------
	Name :Sending Emails with Python
	https://www.techwithtim.net/tutorials/sending-emails-with-python/
----------------------------------------------------------------------------------------------------
	Name :Python Multi-Threading Tutorials
	https://www.techwithtim.net/tutorials/python-programming/python-multi-threading/
----------------------------------------------------------------------------------------------------
	Name :Google Sheets – Python API Tutorial
	https://www.techwithtim.net/tutorials/google-sheets-python-api-tutorial/
------------------------------------------------------------

	https://www.techwithtim.net/tutorials/python-programming/intermediate-python-tutorials/map-function/
----------------------------------------------------------------------------------------------------
	Name :Static & Class Methods
	https://www.techwithtim.net/tutorials/python-programming/intermediate-python-tutorials/static-class-methods/
----------------------------------------------------------------------------------------------------



	To Next Page


Page 9
--------


	Name :Optional Parameters
	https://www.techwithtim.net/tutorials/python-programming/intermediate-python-tutorials/optional-parameters/
----------------------------------------------------------------------------------------------------
	Name :Classes and Objects
	https://www.techwithtim.net/tutorials/python-programming/beginner-python-tutorials/classes-and-objects/
----------------------------------------------------------------------------------------------------
	Name :Global vs Local
	https://www.techwithtim.n

In [None]:
browser.quit()

In [None]:
frist_travel_book = soup.find('article',  attrs = {'class':'product_pod'})
frist_travel_book

<article class="product_pod">
<div class="image_container">
<a href="../../../its-only-the-himalayas_981/index.html"><img alt="It's Only the Himalayas" class="thumbnail" src="../../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg"/></a>
</div>
<p class="star-rating Two">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
<h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the Himalayas</a></h3>
<div class="product_price">
<p class="price_color">Â£45.17</p>
<p class="instock availability">
<i class="icon-ok"></i>
    
        In stock
    
</p>
<form>
<button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
</form>
</div>
</article>

In [None]:
frist_travel_book_name = frist_travel_book.find('h3').get_text().strip()
frist_travel_book_name

"It's Only the Himalayas"

In [None]:
frist_travel_book_rating = frist_travel_book.find('p', attrs = {'class':'star-rating'}).get('class')[1]
frist_travel_book_rating

'Two'

In [None]:
frist_travel_book_price = frist_travel_book.find('div', attrs = {'class':'product_price'}).find('p',{'class' :"price_color"})
frist_travel_book_price = float(frist_travel_book_price.get_text().split('Â£')[1])
frist_travel_book_price

45.17
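
The `Â£` in the price is an encoding artifact rather than part of the page: it is what the UTF-8 bytes of `£` look like when decoded as Latin-1 (ISO-8859-1), which is likely what happened when `response.text` was decoded. Setting `response.encoding = 'utf-8'` before reading `response.text`, or passing `response.content` to `BeautifulSoup`, would probably avoid it, though that is untested here. Either way, splitting on the literal `'Â£'` stops working as soon as the encoding is fixed. Below is a minimal sketch of a parser that does not depend on the currency symbol; `parse_price` is a hypothetical helper.

```python
import re

# '£' encoded as UTF-8 and then wrongly decoded as Latin-1 yields 'Â£'.
mojibake = '£'.encode('utf-8').decode('latin-1')
print(mojibake)


def parse_price(text):
    """Return the first decimal number found in text as a float."""
    match = re.search(r'\d+(?:\.\d+)?', text)
    if match is None:
        raise ValueError(f'No price found in {text!r}')
    return float(match.group())


print(parse_price('Â£45.17'))  # 45.17
print(parse_price('£45.17'))   # 45.17
```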

In [None]:
travel_books = soup.find_all('article',  attrs = {'class':'product_pod'})
travel_books

[<article class="product_pod">
 <div class="image_container">
 <a href="../../../its-only-the-himalayas_981/index.html"><img alt="It's Only the Himalayas" class="thumbnail" src="../../../../media/cache/27/a5/27a53d0bb95bdd88288eaf66c9230d7e.jpg"/></a>
 </div>
 <p class="star-rating Two">
 <i class="icon-star"></i>
 <i class="icon-star"></i>
 <i class="icon-star"></i>
 <i class="icon-star"></i>
 <i class="icon-star"></i>
 </p>
 <h3><a href="../../../its-only-the-himalayas_981/index.html" title="It's Only the Himalayas">It's Only the Himalayas</a></h3>
 <div class="product_price">
 <p class="price_color">Â£45.17</p>
 <p class="instock availability">
 <i class="icon-ok"></i>
     
         In stock
     
 </p>
 <form>
 <button class="btn btn-primary btn-block" data-loading-text="Adding..." type="submit">Add to basket</button>
 </form>
 </div>
 </article>,
 <article class="product_pod">
 <div class="image_container">
 <a href="../../../full-moon-over-noahs-ark-an-odyssey-to-mount-ararat-an

In [None]:
len(travel_books)

11

In [None]:
# Map the rating word in the class attribute to a number (defined once, outside the loop)
rates = {'One' : 1 , 'Two' : 2 , 'Three' : 3 , 'Four' : 4 , 'Five' : 5}

for book in travel_books:
    travel_book_name = book.find('h3').get_text().strip()
    travel_book_rating = rates[book.find('p', attrs={'class': 'star-rating'}).get('class')[1]]
    travel_book_price = book.find('div', attrs={'class': 'product_price'}).find('p', {'class': 'price_color'})
    # Strip the currency prefix (it appears as 'Â£' in this response) before converting to float
    travel_book_price = float(travel_book_price.get_text().strip().lstrip('Â£'))
    print(f'Name : {travel_book_name} ||  Rating : {travel_book_rating} || Price :{travel_book_price}')
    print('---'*40)

Name : It's Only the Himalayas ||  Rating : 2 || Price :45.17
------------------------------------------------------------------------------------------------------------------------
Name : Full Moon over Noah’s ... ||  Rating : 4 || Price :49.43
------------------------------------------------------------------------------------------------------------------------
Name : See America: A Celebration ... ||  Rating : 3 || Price :48.87
------------------------------------------------------------------------------------------------------------------------
Name : Vagabonding: An Uncommon Guide ... ||  Rating : 2 || Price :36.94
------------------------------------------------------------------------------------------------------------------------
Name : Under the Tuscan Sun ||  Rating : 3 || Price :37.33
------------------------------------------------------------------------------------------------------------------------
Name : A Summer In Europe ||  Rating : 2 || Price :44.34
---------
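
The `Â£` in the prices is most likely mojibake: the page looks like UTF-8 text that was decoded as ISO-8859-1 (Latin-1). Under that assumption, the garbling can be reversed by re-encoding and decoding again. `repair_mojibake` is a hypothetical helper written for this illustration.

```python
def repair_mojibake(text):
    """Undo UTF-8 text that was wrongly decoded as ISO-8859-1 (Latin-1).

    If the text does not fit that pattern, it is returned unchanged.
    """
    try:
        return text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


print(repair_mojibake('Â£45.17'))
```

A cleaner fix, if the diagnosis holds, is to avoid the wrong decoding in the first place: set `response.encoding = 'utf-8'` before reading `response.text`, or pass `response.content` to `BeautifulSoup` and let it detect the encoding.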

In [None]:
# Map the rating word in the class attribute to a number
rates = {'One' : 1 , 'Two' : 2 , 'Three' : 3 , 'Four' : 4 , 'Five' : 5}

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(file_name, mode='w', encoding='utf-8', newline='') as csv_file:
    fieldnames = ['Book Name', 'Book Rating' , 'Book Price']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    for book in travel_books:
        travel_book_name = book.find('h3').get_text().strip()
        travel_book_rating = rates[book.find('p', attrs={'class': 'star-rating'}).get('class')[1]]
        travel_book_price = book.find('div', attrs={'class': 'product_price'}).find('p', {'class': 'price_color'})
        # Strip the currency prefix (it appears as 'Â£' in this response) before converting to float
        travel_book_price = float(travel_book_price.get_text().strip().lstrip('Â£'))
        writer.writerow({'Book Name': travel_book_name, 'Book Rating': travel_book_rating,
                         'Book Price': travel_book_price})
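
The cell above scrapes and writes in the same loop. One possible variation, sketched here under the assumption that the rows have already been collected as dictionaries, is to keep the writing step separate so it can be tried without touching the network. `write_books` is a hypothetical helper, and the two rows are copied from the output shown earlier.

```python
import csv
import io


def write_books(rows, csv_file):
    """Write a list of book dictionaries to an open text file as CSV."""
    fieldnames = ['Book Name', 'Book Rating', 'Book Price']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


rows = [
    {'Book Name': "It's Only the Himalayas", 'Book Rating': 2, 'Book Price': 45.17},
    {'Book Name': 'Under the Tuscan Sun', 'Book Rating': 3, 'Book Price': 37.33},
]

# An in-memory buffer stands in for a real file; newline='' leaves line
# endings to the csv module, as its documentation recommends for files too.
buffer = io.StringIO(newline='')
write_books(rows, buffer)
print(buffer.getvalue())
```

With a real file, the same function would be called inside `with open(file_name, mode='w', encoding='utf-8', newline='') as csv_file:`.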
        
    
 

In [None]:
df = pd.read_csv(file_name)
df

|   | Book Name | Book Rating | Book Price |
|---|-----------|-------------|------------|
| 0 | It's Only the Himalayas | 2 | 45.17 |
| 1 | Full Moon over Noah’s ... | 4 | 49.43 |
| 2 | See America: A Celebration ... | 3 | 48.87 |
| 3 | Vagabonding: An Uncommon Guide ... | 2 | 36.94 |
| 4 | Under the Tuscan Sun | 3 | 37.33 |
| 5 | A Summer In Europe | 2 | 44.34 |
| 6 | The Great Railway Bazaar | 1 | 30.54 |
| 7 | A Year in Provence ... | 4 | 56.88 |
| 8 | The Road to Little ... | 1 | 23.21 |
| 9 | Neither Here nor There: ... | 3 | 38.95 |


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Book Name    11 non-null     object 
 1   Book Rating  11 non-null     int64  
 2   Book Price   11 non-null     float64
dtypes: float64(1), int64(1), object(1)
memory usage: 392.0+ bytes


In [None]:
df.describe()

|       | Book Rating | Book Price |
|-------|-------------|------------|
| count | 11.0 | 11.0 |
| mean  | 2.727273 | 39.794545 |
| std   | 1.272078 | 10.394198 |
| min   | 1.0 | 23.21 |
| 25%   | 2.0 | 33.74 |
| 50%   | 3.0 | 38.95 |
| 75%   | 3.5 | 47.02 |
| max   | 5.0 | 56.88 |


In [None]:
response = requests.get('https://books.toscrape.com/catalogue/category/books/womens-fiction_9/index.html')
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.text, 'html.parser')

# Use the category name from the page title as the CSV file name
file_name = soup.find('title').get_text().strip().split(' | ')[0]
file_name += '.csv'

rates = {'One': 1, 'Two': 2, 'Three': 3, 'Four': 4, 'Five': 5}

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open(file_name, mode='w', encoding='utf-8', newline='') as csv_file:
    fieldnames = ['book_name', 'book_price', 'book_rate']
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)

    writer.writeheader()
    for article in soup.find_all('article', attrs={'class': 'product_pod'}):
        # The title attribute holds the full name; the link text is truncated with '...'
        book_name = article.find('h3').find('a').get('title')
        # Strip the currency prefix (it appears as 'Â£' in this response) instead of slicing a fixed [2:]
        book_price = float(article.find('p', attrs={'class': 'price_color'}).get_text().strip().lstrip('Â£'))
        book_rate = rates[article.find('p', attrs={'class': 'star-rating'}).get('class')[-1]]
        writer.writerow({'book_name': book_name, 'book_price': book_price, 'book_rate': book_rate})
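
The file name in this cell comes from the page title: everything before the first ` | ` is taken as the category name. The step can be tried in isolation with the sketch below. `csv_name_from_title` is a hypothetical helper, and the sample title is an assumption about what the page returns, inferred from the `Womens Fiction.csv` name used in the next cell.

```python
def csv_name_from_title(title):
    """Turn a page title like 'Category | Site name' into 'Category.csv'."""
    category = title.strip().split(' | ')[0].strip()
    return f'{category}.csv'


# Assumed title text, including the surrounding whitespace typical of HTML.
sample_title = '\n    Womens Fiction | Books to Scrape - Sandbox\n'
print(csv_name_from_title(sample_title))
```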

In [None]:
data = pd.read_csv('Womens Fiction.csv')
data

|   | book_name | book_price | book_rate |
|---|-----------|------------|-----------|
| 0 | I Had a Nice Time And Other Lies...: How to fi... | 57.36 | 4 |
| 1 | Will You Won't You Want Me? | 13.86 | 3 |
| 2 | Keep Me Posted | 20.46 | 4 |
| 3 | Grey (Fifty Shades #4) | 48.49 | 4 |
| 4 | Meternity | 43.58 | 3 |
| 5 | Some Women | 13.73 | 5 |
| 6 | Shopaholic Ties the Knot (Shopaholic #3) | 48.39 | 5 |
| 7 | Can You Keep a Secret? | 21.94 | 1 |
| 8 | Twenties Girl | 42.8 | 2 |
| 9 | The Undomestic Goddess | 45.75 | 4 |
