
**Web Scraping using Python:**

Web scraping is essentially the process of automatically collecting data from websites.

Imagine you want to gather product information from an e-commerce site, or news articles from a publication. Web scraping automates this by using a software tool to extract the specific data you need, instead of manually copying and pasting it yourself.

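Here's a breakdown of how it works:

*   **Data Extraction:** a program downloads a web page and pulls out only the pieces of data you need.
*   **Tools and Techniques:** libraries such as Requests and BeautifulSoup handle downloading the page and parsing its HTML.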

It's important to remember that scraping should be done responsibly. Always check the website's terms and conditions to ensure scraping is allowed, and avoid overloading the site with too many requests.
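One common way to put this into practice is to consult the site's robots.txt file and to pause between requests. The sketch below uses Python's standard-library urllib.robotparser. The robots.txt rules, the user-agent name my-scraper, the helper names, and the one-second delay are all made up for illustration, and robots.txt is a convention that complements the site's terms and conditions rather than replacing them.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real scraper would instead call
# parser.set_url("https://www.example.com/robots.txt") and parser.read().
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())


def allowed_urls(urls, user_agent="my-scraper"):
    """Return only the URLs that the parsed robots.txt rules permit."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def polite_visit(urls, delay_seconds=1.0):
    """Yield permitted URLs one at a time, pausing between them."""
    for i, url in enumerate(allowed_urls(urls)):
        if i > 0:
            time.sleep(delay_seconds)  # spread requests out over time
        yield url


candidates = [
    "https://www.example.com/products",
    "https://www.example.com/private/admin",
]
print(allowed_urls(candidates))  # ['https://www.example.com/products']
```

A scraping loop could then iterate over polite_visit(candidates) and fetch each URL it yields.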



**Modules in Web Scraping:**


Two Python libraries do most of the work in a typical web scraper:

**Requests**: This library handles sending HTTP requests (like GET or POST) to websites and retrieving the response content. It's a popular choice for its simplicity and ease of use.
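To make the idea of an HTTP request more concrete, the sketch below builds a GET request with Requests and inspects it without sending it, so it runs without network access. The URL, the page query parameter, and the User-Agent value are placeholders chosen for this example.

```python
import requests

# Build (but do not send) a GET request so the example runs offline.
# The URL, query parameter, and User-Agent value are placeholders.
request = requests.Request(
    "GET",
    "https://www.example.com/products",
    params={"page": 2},
    headers={"User-Agent": "my-scraper/0.1"},
)
prepared = request.prepare()

print(prepared.method)  # GET
print(prepared.url)     # https://www.example.com/products?page=2

# Sending it would look like this (requires network access):
# with requests.Session() as session:
#     response = session.send(prepared, timeout=10)
#     response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx
#     html = response.text
```

In everyday use, requests.get(url, params=..., headers=..., timeout=...) performs the build and send steps in one call.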

**BeautifulSoup**: This library excels at parsing HTML and XML content. It helps navigate the structure of the downloaded webpage, allowing you to target specific elements containing the desired data.
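As a small illustration of that navigation, the sketch below parses an HTML string directly instead of a downloaded page. The snippet, its id, and its class names are invented for the example.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML snippet standing in for a downloaded page.
html = """
<html>
  <body>
    <h1 id="page-title">Catalog</h1>
    <ul class="links">
      <li><a href="/item/1">First item</a></li>
      <li><a href="/item/2">Second item</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first match (or None); find_all() returns a list.
heading = soup.find("h1", id="page-title").get_text(strip=True)
hrefs = [a["href"] for a in soup.find_all("a")]

# select() accepts CSS selectors as another way to target elements.
labels = [a.get_text(strip=True) for a in soup.select("ul.links a")]

print(heading)  # Catalog
print(hrefs)    # ['/item/1', '/item/2']
print(labels)   # ['First item', 'Second item']
```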

**Code for Web Scraping**

This code performs the following steps:

1. Imports the requests and BeautifulSoup libraries.
2. Defines the url variable for the target webpage.
3. Sends a GET request using requests.get and stores the response in a variable.
4. Checks the response status code. If it is 200 (success), the code proceeds to parse the content.
5. Creates a BeautifulSoup object from the response content.
6. Finds all elements containing product information using find_all. You'll need to adjust the tag and class name based on the website's HTML structure.
7. Iterates through each product element and searches for its title element using find. Again, adjust the tag and class name as needed.
8. Extracts the text content of the title element using .text, strips whitespace with .strip(), and appends the result to the titles list.
9. Prints the final list of product titles, or an error message with the status code if the request fails.

```python
import requests
from bs4 import BeautifulSoup

# Define the target URL (placeholder; replace with the page you want to scrape)
url = "https://www.example.com/products"

# Send an HTTP GET request; the timeout keeps the call from hanging indefinitely
response = requests.get(url, timeout=10)

# Check for a successful response
if response.status_code == 200:
    # Parse the HTML content
    soup = BeautifulSoup(response.content, "html.parser")

    # Find all product elements (adjust the tag and class name to match the site's HTML)
    products = soup.find_all("div", class_="product-item")

    # Extract product titles
    titles = []
    for product in products:
        title_element = product.find("h3", class_="product-title")  # Adjust as needed
        if title_element:  # Skip products without a title element
            titles.append(title_element.text.strip())

    # Print the extracted titles
    print(titles)
else:
    print("Error:", response.status_code)
```
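Because the script above depends on a live page, it can be hard to check that the parsing logic does what you expect. One possible refactoring, sketched here, moves the parsing into a function that takes an HTML string, so it can be tried on a sample snippet first. The function name extract_titles and the sample HTML are illustrative; product-item and product-title are the same placeholder class names used above.

```python
from bs4 import BeautifulSoup


def extract_titles(html):
    """Return the stripped text of each product title found in html."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for product in soup.find_all("div", class_="product-item"):
        title_element = product.find("h3", class_="product-title")
        if title_element:  # skip products that have no title element
            titles.append(title_element.get_text(strip=True))
    return titles


# Made-up sample page; the second product deliberately lacks a title.
sample_html = """
<div class="product-item"><h3 class="product-title">  Blue Mug </h3></div>
<div class="product-item"><p>No title here</p></div>
<div class="product-item"><h3 class="product-title">Desk Lamp</h3></div>
"""

print(extract_titles(sample_html))  # ['Blue Mug', 'Desk Lamp']
```

With a real page, the same function could be called as extract_titles(response.text) after a successful request.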
