# Web Scraping Guide
#### Web scraping = programmatically downloading web pages and extracting structured data (HTML → CSV/JSON/database). Use it for price monitoring, research, competitive intel, ad-hoc data collection, and automation.
#### Legal & ethical rules to keep in mind
+ Respect robots.txt. It signals the site's crawler rules but does not enforce them. Always check it and follow the site owner's crawl-rate limits and disallowed paths where possible.
+ Be transparent and polite: include an informative User-Agent, provide contact info if scraping at scale, and avoid harming site availability. Industry groups and advocates, such as the Electronic Frontier Foundation, strongly recommend this.
+ Don't scrape private or personal data or violate Terms of Service; legality varies by country. When in doubt, ask for permission or use official APIs.
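The robots.txt check above can be sketched with Python's standard `urllib.robotparser`. The robots.txt body, the bot name `my-research-bot`, and the contact address are hypothetical; a real scraper would load the file from the target site rather than parse an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

# Hypothetical User-Agent that names the bot and gives a contact address.
USER_AGENT = "my-research-bot/0.1 (contact: you@example.com)"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_allowed(url: str) -> bool:
    """Return True if the parsed rules permit USER_AGENT to fetch url."""
    return parser.can_fetch(USER_AGENT, url)


print(is_allowed("https://example.com/tutorials"))   # path is not disallowed
print(is_allowed("https://example.com/private/x"))   # path is under /private/
print(parser.crawl_delay(USER_AGENT))                # requested delay, if any
```

The same `USER_AGENT` string could be sent with each request through the `headers` argument of `requests.get`.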
#### Web Scraping with Requests and BeautifulSoup (how it works)
1. Send an HTTP request to a webpage
2. Get the HTML
3. Parse HTML with BeautifulSoup
4. Select the elements (CSS selectors / tags)
5. Extract clean text
6. Save data (CSV / JSON)
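The six steps above can be sketched end to end as follows. To keep the sketch runnable offline, steps 1 and 2 are replaced by a hypothetical inline page; the `course` class name and the column names are assumptions made for illustration.

```python
import csv
import io

from bs4 import BeautifulSoup

# Steps 1-2 (request + HTML) are stood in for by a hypothetical inline page;
# with requests this string would come from requests.get(url).text.
HTML = """
<html><body>
  <div class="course"><h3>Python Basics</h3><p> Learn the syntax. </p></div>
  <div class="course"><h3>Web Scraping</h3><p> Parse HTML pages. </p></div>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")            # step 3: parse

rows = []
for card in soup.select("div.course"):               # step 4: select elements
    rows.append({
        "title": card.h3.get_text(strip=True),       # step 5: extract clean text
        "summary": card.p.get_text(strip=True),
    })

buffer = io.StringIO()                               # step 6: save as CSV
writer = csv.DictWriter(buffer, fieldnames=["title", "summary"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the example free of side effects; replacing `io.StringIO()` with `open("courses.csv", "w", newline="")` would write a file instead.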

# Install the libraries from the command prompt
### pip install requests
### pip install beautifulsoup4
### pip in

+ ### Check the status code: 200 = OK (the page was fetched and can be scraped), 3xx = redirect, 4xx = client error, 5xx = server error

In [148]:
import requests

web = requests.get("https://www.tutorialsfreak.com/")
print(web)

<Response [200]>


In [149]:
# View the page's raw HTML source (as bytes)
web.content

b'<!DOCTYPE html><html lang="en"><head><meta charSet="utf-8"/><title>Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill</title><link rel="canonical" href="https://www.tutorialsfreak.com/"/><meta name="viewport" content="initial-scale=1.0, width=device-width"/><meta name="keyword"/><meta name="description" content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more."/><meta property="og:title" content="Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill"/><meta property="og:site_name" content="Tutorials Freak"/><meta property="og:url" content="https://www.tutorialsfreak.com/"/><meta property="og:type" content="business.business"/><meta property="og:description" content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more."/><meta property="og:image" con

In [150]:
# View the URL of the response
web.url

'https://www.tutorialsfreak.com/'

In [151]:
web.status_code

200
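As a small illustration of the status-code classes listed earlier, the first digit of the code determines its class. The helper name `status_class` is hypothetical and not part of requests.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the broad class its first digit denotes."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```

With requests itself, `web.ok` is `True` for codes below 400, and `web.raise_for_status()` raises an exception for 4xx and 5xx responses, which is often enough in a script.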

+ #### web.content: the HTML from the webpage
+ #### BeautifulSoup(...): converts the HTML into a searchable soup object
+ #### soup.prettify(): shows the HTML neatly formatted
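A minimal offline sketch of these three pieces, assuming a tiny hypothetical page in place of the real `web.content`:

```python
from bs4 import BeautifulSoup

# Hypothetical page body; web.content from requests is a bytes object like this.
raw = b"<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

soup = BeautifulSoup(raw, "html.parser")   # bytes and str are both accepted
print(type(raw).__name__)                  # bytes
print(soup.title.string)                   # Demo
print(soup.prettify())                     # one tag per line, indented by depth
```

Naming the parser explicitly (`"html.parser"`) keeps the result the same across machines, since BeautifulSoup otherwise picks whichever parser happens to be installed.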

In [153]:
from bs4 import BeautifulSoup

soup = BeautifulSoup(web.content, "html.parser")
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill
  </title>
  <link href="https://www.tutorialsfreak.com/" rel="canonical"/>
  <meta content="initial-scale=1.0, width=device-width" name="viewport"/>
  <meta name="keyword"/>
  <meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." name="description"/>
  <meta content="Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill" property="og:title"/>
  <meta content="Tutorials Freak" property="og:site_name"/>
  <meta content="https://www.tutorialsfreak.com/" property="og:url"/>
  <meta content="business.business" property="og:type"/>
  <meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." property="og:de

In [154]:
soup.title.name

'title'

In [155]:
soup.title

<title>Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill</title>

In [156]:
soup.p

<p class="section-subheading">Kickstart effective learning with Tutorials Freak, with new content published every day.</p>

In [157]:
soup.a

<a class="header-logo-wrapper" href="/"><span class="me-lg-5 navbar-brand"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27192%27%20height=%2756%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span>High-Quality, Practical-based, Free Tutorials</h1>

In [159]:
print(soup.body.prettify())

<body>
 <noscript>
  <iframe height="0" src="https://www.googletagmanager.com/ns.html?id=GTM-PC6BJKM8" style="display:none;visibility:hidden" width="0">
  </iframe>
 </noscript>
 <div id="__next">
  <div class="Toastify">
  </div>
  <div class="wrapperr h-100">
   <header class="header-section d-flex align-items-lg-center px-xl-5 px-4 position-fixed w-100 bg-white header-new-padding">
    <div class="header-bottom-line">
    </div>
    <nav class="w-100 navbar navbar-expand-lg navbar-light">
     <div class="px-0 container-fluid">
      <a class="header-logo-wrapper" href="/">
       <span class="me-lg-5 navbar-brand">
        <span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%">
         <span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%">
          <img al

## Scraping tags

In [160]:
tag = soup.html
type(tag)


bs4.element.Tag

In [161]:
tag = soup.p
tag

<p class="section-subheading">Kickstart effective learning with Tutorials Freak, with new content published every day.</p>

In [162]:
tag = soup.h3
tag

<h3 class="fs-20 lh-30 fw-600 label-color-5">Verified &amp; Reliable Content</h3>
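A `Tag` object exposes the tag name, its attributes, and its text. The sketch below reuses the `<h3>` markup from the output above as an inline string so it runs without a network request.

```python
from bs4 import BeautifulSoup

# The <h3> markup is copied from the output above.
html = '<h3 class="fs-20 lh-30 fw-600 label-color-5">Verified &amp; Reliable Content</h3>'
tag = BeautifulSoup(html, "html.parser").h3

print(tag.name)        # h3
print(tag["class"])    # class is multi-valued, so this is a list of class names
print(tag.get("id"))   # None: .get() avoids a KeyError for a missing attribute
print(tag.get_text())  # Verified & Reliable Content
```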

In [163]:
tag = soup.a
tag

<a class="header-logo-wrapper" href="/"><span class="me-lg-5 navbar-brand"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27192%27%20height=%2756%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span><noscript><iframe height="0" src="https://www.googletagmanager.com/ns.html?id=GTM-PC6BJKM8" style="display:none;visibility:hidden" width="0"></iframe></noscript><div id="__next"><div class="Toastify"></div><div class="wrapperr h-100"><header class="header-section d-flex align-items-lg-center px-xl-5 px-4 position-fixed w-100 bg-white header-new-padding"><div class="header-bottom-line"></div><nav class="w-100 navbar navbar-expand-lg navbar-light"><div class="px-0 container-fluid"><a class="header-logo-wrapper" href="/"><span class="me-lg-5 navbar-brand"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/sv

In [168]:
soup.title

<title>Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill</title>

In [169]:
soup.find("p")

<p class="section-subheading">Kickstart effective learning with Tutorials Freak, with new content published every day.</p>

In [170]:
soup.find_all("p")

[<p class="section-subheading">Kickstart effective learning with Tutorials Freak, with new content published every day.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">The entire content on the site is verified by pro developers and tech freaks.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">The entire content on the site is verified by pro developers and tech freaks. Enabling self-directed learning so that you can learn at your pace and shape your own path.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">Along with simplified tutorials, you get video content created by industry experts.</p>,
 <p class="fw-400 fs-20 lh-30 label-color-2 mb-lg-5">Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.</p>,
 <p class="section-subheading mb-0">Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.</p>,
 <p class="sec
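The difference between `find` and `find_all` shows up most clearly when nothing matches. A short sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

html = "<div><p>first</p><p>second</p></div>"   # hypothetical markup
soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")                 # first match, or None if nothing matches
every = soup.find_all("p")             # list-like result, empty if nothing matches
missing = soup.find("table")           # there is no <table> in this markup
limited = soup.find_all("p", limit=1)  # stop after the first match

print(first.get_text())
print([p.get_text() for p in every])
print(missing is None, len(soup.find_all("table")))
print(len(limited))
```

Because `find` can return `None`, it is worth checking the result before calling `.text` or `.get(...)` on it.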

# Strings (the text inside a tag)

In [171]:
com = soup.p.string
com

'Kickstart effective learning with Tutorials Freak, with new content published every day.'
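`.string` only works when a tag has a single child; for tags with nested markup it returns `None`, and `.get_text()` is the safer choice. A sketch on hypothetical markup:

```python
from bs4 import BeautifulSoup

html = "<p>Plain text</p><p>Text with <b>bold</b> inside</p>"   # hypothetical
soup = BeautifulSoup(html, "html.parser")
plain, mixed = soup.find_all("p")

print(plain.string)      # Plain text
print(mixed.string)      # None, because this <p> has more than one child
print(mixed.get_text())  # Text with bold inside
```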

## Finding elements by class or id

In [172]:
import requests
from bs4 import BeautifulSoup

In [173]:
web = requests.get("https://www.tutorialsfreak.com/")
web

<Response [200]>

In [174]:
# The soup object lets us inspect the page's data
soup = BeautifulSoup(web.content, "html.parser")

In [175]:
print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill
  </title>
  <link href="https://www.tutorialsfreak.com/" rel="canonical"/>
  <meta content="initial-scale=1.0, width=device-width" name="viewport"/>
  <meta name="keyword"/>
  <meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." name="description"/>
  <meta content="Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill" property="og:title"/>
  <meta content="Tutorials Freak" property="og:site_name"/>
  <meta content="https://www.tutorialsfreak.com/" property="og:url"/>
  <meta content="business.business" property="og:type"/>
  <meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." property="og:de

In [176]:
soup.find_all('h3')

[<h3 class="fs-20 lh-30 fw-600 label-color-5">Verified &amp; Reliable Content</h3>,
 <h3 class="fs-20 lh-30 fw-600 label-color-5">Self-Directed Learning</h3>,
 <h3 class="fs-20 lh-30 fw-600 label-color-5">Dual Learning Methods</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 heading-turnicate" title="jQuery Interview Questions">jQuery Interview Questions</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 heading-turnicate" title="PHP Interview Questions">PHP Interview Questions</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 heading-turnicate" title="Git Interview Questions">Git Interview Questions</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 heading-turnicate" title="ReactJS Interview Questions">ReactJS Interview Questions</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 heading-turnicate" title="Web App Penetration Testing Interview Questions">Web App Penetration Testing Interview Questions</h3>,
 <h3 class="fs-16 lh-30 fw-500 label-color-5 my-1 

In [177]:
class_data = soup.find("div", class_ = "why-choose-card card-shadow card")
class_data

<div class="why-choose-card card-shadow card"><div class="why-choose-us-img"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%2748%27%20height=%2748%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span>The entire content on the site is verified by pro developers and tech freaks.</p>]

In [179]:
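# Find by id: the keyword is `id`; only `class_` needs the trailing underscore,
# because `class` is a reserved word in Python.
id_data = soup.find("div", id="__next")
class_data  # re-display the class-based match from the previous cell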

<div class="why-choose-card card-shadow card"><div class="why-choose-us-img"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%2748%27%20height=%2748%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span>Kickstart effective learning with Tutorials Freak, with new content published every day.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">The entire content on the site is verified by pro developers and tech freaks.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">The entire content on the site is verified by pro developers and tech freaks. Enabling self-directed learning so that you can learn at your pace and shape your own path.</p>,
 <p class="fs-16 fw-400 lh-24 label-color-1 card-text">Along with simplified tutorials, you get video content created by industry experts.</p>,
 <p class="fw-400 fs-20 lh-30 label-color-2 mb-lg-5">Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.</p>,
 <p class="section-subheading mb-0">Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.</p>,
 <p class="sec

In [181]:
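# Assumed definition (the original cell is missing): every <p> tag on the page
lines = soup.find_all("p")
for l in lines:
    print(l.text)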

Kickstart effective learning with Tutorials Freak, with new content published every day.
The entire content on the site is verified by pro developers and tech freaks.
The entire content on the site is verified by pro developers and tech freaks. Enabling self-directed learning so that you can learn at your pace and shape your own path.
Along with simplified tutorials, you get video content created by industry experts.
Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.
Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.
Explore the expert-curated interview questions with answers.
Plenty of quizzes with time duration and skill assessment.
Find coding examples with programs, output, easy explanations, and videos.

With an account, you get access to premium content and courses at no cost.
Download the app now to learn and practice hassle-free.
© 
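The printed text contains a repeated line and an empty line. A minimal cleaning step before saving, sketched with the standard library only; the helper name `clean_texts` and the `paragraphs` key are hypothetical.

```python
import json

# Sample values copied from the printed output above,
# including the repeated line and the empty line.
texts = [
    "Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.",
    "Learning programming and technical things can be complex. We are here to make it easy with simple and interactive tutorials.",
    "Explore the expert-curated interview questions with answers.",
    "",
    "Download the app now to learn and practice hassle-free.",
]


def clean_texts(values):
    """Collapse whitespace, then drop empty strings and repeats, keeping order."""
    seen, cleaned = set(), []
    for value in values:
        text = " ".join(value.split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned


cleaned = clean_texts(texts)
payload = json.dumps({"paragraphs": cleaned}, indent=2)
print(payload)
```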

In [182]:
s = soup.find("div", class_ = "why-choose-card card-shadow card")
s

<div class="why-choose-card card-shadow card"><div class="why-choose-us-img"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%2748%27%20height=%2748%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span>The entire content on the site is verified by pro developers and tech freaks.</p>]

In [184]:
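# Assumed definition (the original cell is missing): the <p> tags inside `s`
lines_1 = s.find_all("p")
for l1 in lines_1:
    print(l1.text)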

The entire content on the site is verified by pro developers and tech freaks.


In [185]:
# Shortcut: select the <p> directly by its class instead of going through the parent <div>
s1 = soup.find("p", class_="fs-16 fw-400 lh-24 label-color-1 card-text")
s1

<p class="fs-16 fw-400 lh-24 label-color-1 card-text">The entire content on the site is verified by pro developers and tech freaks.</p>

In [186]:
s1.text

'The entire content on the site is verified by pro developers and tech freaks.'
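Passing the full class string to `class_` only matches when the attribute is written exactly that way. Matching on a single class, or using a CSS selector, tends to be more robust. A sketch on a simplified, hypothetical version of the page's markup:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical version of the page's markup.
html = """
<div id="__next">
  <p class="fs-16 fw-400 card-text">Verified content.</p>
  <p class="section-subheading">New content every day.</p>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

by_one_class = soup.find("p", class_="card-text")   # any one of the tag's classes
by_selector = soup.select_one("p.fs-16.card-text")  # CSS: tag must have both classes
by_id = soup.find(id="__next")                      # keyword is id, not id_
by_id_css = soup.select_one("#__next")              # CSS id selector

print(by_one_class.get_text())
print(by_selector.get_text())
print(by_id.name, by_id_css.get("id"))
```

`select` and `select_one` are the CSS-selector counterparts of `find_all` and `find`.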

# Extracting links and images from a web page

In [187]:
soup.find_all("a")

[<a class="header-logo-wrapper" href="/"><span class="me-lg-5 navbar-brand"><span style="box-sizing:border-box;display:inline-block;overflow:hidden;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;position:relative;max-width:100%"><span style="box-sizing:border-box;display:block;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0;max-width:100%"><img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27192%27%20height=%2756%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/></span>
<html lang="en"><head><meta charset="utf-8"/><title>Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill</title><link href="https://www.tutorialsfreak.com/" rel="canonical"/><meta content="initial-scale=1.0, width=device-width" name="viewport"/><meta name="keyword"/><meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." name="description"/><meta content="Tutorials Freak: Free Online Tutorials to Learn &amp; Upskill" property="og:title"/><meta content="Tutorials Freak" property="og:site_name"/><meta content="https://www.tutorialsfreak.com/" property="og:url"/><meta content="business.business" property="og:type"/><meta content="Tutorials Freak is your source for upskilling with free online tutorials on web development, coding, cyber security, digital marketing, technologies &amp; more." property="og:description"/>
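`find_all("a")` returns whole tags; usually only the `href` values are wanted, resolved against the page URL. In this sketch only the first anchor mirrors the logo link in the output above; the other anchors are hypothetical.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.tutorialsfreak.com/"   # the page fetched earlier

# Only the first anchor mirrors the real page; the rest are hypothetical.
html = """
<a class="header-logo-wrapper" href="/">Home</a>
<a href="/quiz">Quizzes</a>
<a href="https://example.com/about">About</a>
<a>No href here</a>
"""
soup = BeautifulSoup(html, "html.parser")

links = [
    urljoin(BASE_URL, a["href"])            # turn relative paths into full URLs
    for a in soup.find_all("a", href=True)  # skip anchors that have no href
]
print(links)
```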

In [190]:
img = soup.find_all("img")
img

[<img alt="" aria-hidden="true" src="data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27192%27%20height=%2756%27/%3e" style="display:block;max-width:100%;width:initial;height:initial;background:none;opacity:1;border:0;margin:0;padding:0"/>,
 ,
 <img alt="Logo-Img" class="img-fluid" data-nimg="intrinsic" decoding="async" loading="lazy" src="/images/tutorials-freak-logo.svg" srcset="/images/tutorials-freak-logo.svg 1x, /images/tutorials-freak-logo.svg 2x" style="position:absolute;top:0;left:0;bottom:0;right:0;box-sizing:border-box;padding:0;border:none;margin:auto;display:block;wi

In [191]:
for i in img:
    print(i.get("src"))
    print(type(i.get("src")))

data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%27192%27%20height=%2756%27/%3e
<class 'str'>
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
<class 'str'>
/images/tutorials-freak-logo.svg
<class 'str'>
/images/search.svg
<class 'str'>
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
<class 'str'>
/_next/image?url=%2Fimages%2Fbanner-middle.webp&w=3840&q=75
<class 'str'>
/images/kotlin-img.svg
<class 'str'>
/images/eagle-img.svg
<class 'str'>
/images/apple-img.svg
<class 'str'>
/images/angular-img.svg
<class 'str'>
/images/python-img.svg
<class 'str'>
/images/react-img.svg
<class 'str'>
/images/android-img.svg
<class 'str'>
/images/java-img.svg
<class 'str'>
data:image/svg+xml,%3csvg%20xmlns=%27http://www.w3.org/2000/svg%27%20version=%271.1%27%20width=%2748%27%20height=%2748%27/%3e
<class 'str'>
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7
<class 
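Many of the `src` values above are inline `data:` placeholders or relative paths. One possible way to keep only downloadable image URLs is sketched below; the helper name `real_image_urls` is hypothetical, and the sample values are copied from the output above.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.tutorialsfreak.com/"

# A few src values copied from the output above.
srcs = [
    "data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
    "/images/tutorials-freak-logo.svg",
    "/images/search.svg",
    "/images/tutorials-freak-logo.svg",
    None,  # i.get("src") returns None when the attribute is missing
]


def real_image_urls(values, base_url):
    """Drop missing values, inline data: URIs and repeats; return absolute URLs."""
    seen, urls = set(), []
    for src in values:
        if not src or src.startswith("data:"):
            continue
        absolute = urljoin(base_url, src)
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


image_urls = real_image_urls(srcs, BASE_URL)
print(image_urls)
```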

In [192]:
for i in img:
    print(i.get("alt"))


Logo-Img
Logo-Img
Serch-Img
Banner-Img
Banner-Img
Course-Img
Course-Img
Course-Img
Course-Img
Course-Img
Course-Img
Course-Img
Course-Img

Why-Choose-Img
Why-Choose-Img

Why-Choose-Img
Why-Choose-Img

Why-Choose-Img
Why-Choose-Img


compiler-img
compiler-img

compiler-img
compiler-img

compiler-img
compiler-img

compiler-img
compiler-img

compiler-img
compiler-img

compiler-img
compiler-img

compiler-img
compiler-img
https://d20evgacl8spoj.cloudfront.net/uploads/learning-course/images/lcourses/861b734d-1200-4ef9-9f80-fdcbf2d5918b-1654781169.png
https://d20evgacl8spoj.cloudfront.net/uploads/learning-course/images/lcourses/99334bcb-06ec-402d-8461-a1d886d97b8e-1654319072.png
https://d20evgacl8spoj.cloudfront.net/uploads/learning-course/images/lcourses/28a317b0-5132-4aba-8a49-5ea970f23a49-1671862480.png
https://d20evgacl8spoj.cloudfront.net/uploads/learning-course/images/lcourses/6dee8d03-5667-4b32-b483-af2b62dcd8d3-1647338788.png
https://d20evgacl8spoj.cloudfront.net/uploads/learning-cou

In [None]:
THANK YOU --- HIMANSHU SHARMA [HimCodex Github]