# INTRODUCTION TO WEB SCRAPING

## What is Web Scraping?

Web scraping is the process of automatically extracting data from websites. It allows you to download data from web pages and extract specific information that you need. This can be a very useful technique for data analysis, research, and automation.

There are several tools and libraries available for web scraping, but in this notebook, we will focus on using the **BeautifulSoup** library in Python.

Real-world **examples**:

- Scraping e-commerce websites for product data
- Scraping job postings for market research
- Scraping news articles for sentiment analysis


## What is BeautifulSoup?

BeautifulSoup is a Python library that makes it easy to scrape information from web pages. It parses the HTML or XML structure of a web page and provides a simple interface for navigating and searching the page.

With BeautifulSoup, you can easily extract data from web pages, such as text, links, images, and tables. You can also use it to search for specific elements or attributes on a page, and to navigate between pages.
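
As a minimal sketch of what this looks like in practice, the snippet below parses a small hand-written HTML string and pulls out the heading text and the links. The markup and the `extract_links` helper are invented for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, made up for this example
html = """
<html>
  <body>
    <h1>Sample page</h1>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
  </body>
</html>
"""

def extract_links(markup):
    """Return (text, href) pairs for every <a> tag in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    return [(a.get_text(strip=True), a.get("href")) for a in soup.find_all("a")]

# find() returns the first matching tag; get_text() returns its text content
heading = BeautifulSoup(html, "html.parser").find("h1").get_text()
links = extract_links(html)

print(heading)
print(links)
```

`find()` returns the first match and `find_all()` returns every match, which is why the links come back as a list.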

# Using Requests and Beautiful Soup for web scraping

When scraping a website, the **first step** is to make an HTTP request to the website's server to retrieve the HTML content. The Requests library makes this request, and it supports the standard HTTP methods such as GET, POST, PUT, and DELETE.

Requests is a Python library that is often used in conjunction with Beautiful Soup for web scraping. It is a popular HTTP library that allows Python developers to send HTTP/1.1 requests and handle responses easily.

Once the HTML content is retrieved, Beautiful Soup can be used to parse it and extract the relevant data. It provides a convenient interface for parsing HTML and XML documents.

Therefore, requests and Beautiful Soup are often used together in web scraping projects. Requests is used to retrieve the HTML content, and Beautiful Soup is used to parse and extract data from the HTML content.
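
The division of labour described above can be sketched as two small functions, one per library. The names `fetch_html` and `extract_title`, the 10-second timeout, and the sample HTML are assumptions made for this example. Only the parsing half is run here, so the snippet works without network access.

```python
import requests
from bs4 import BeautifulSoup

def fetch_html(url, timeout=10):
    """Download a page and return its HTML text, or None on a non-200 response."""
    response = requests.get(url, timeout=timeout)
    if response.status_code != 200:
        return None
    return response.text

def extract_title(html):
    """Parse HTML and return the text of the <title> tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None

# Offline demonstration of the parsing half, using a made-up HTML string
sample_html = "<html><head><title>Example Shop</title></head><body></body></html>"
print(extract_title(sample_html))
```

In a real run, the output of `fetch_html(...)` for a page you are allowed to scrape would be passed to `extract_title`.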

# Installing Requests

To install the Requests library in Python, you can use pip, which is the package installer for Python. Here are the steps to install the Requests library using pip:

1) Open a command prompt or terminal on your computer.
2) Type the following command and press Enter to ensure that pip is up-to-date:

                > pip install --upgrade pip
                
3) Type the following command and press Enter to install the Requests library:

                > pip install requests
                
After completing these steps, the Requests library should be installed on your computer, and you can start using it for your web scraping projects in Python.
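
One way to confirm the installation from Python, without importing the packages themselves, is to ask the import system whether it can locate them. This is a small sketch using only the standard library; the helper name `is_installed` is made up for this example.

```python
import importlib.util

def is_installed(module_name):
    """Return True if the named top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None

# "bs4" is the import name of the Beautiful Soup 4 package
for name in ("requests", "bs4"):
    status = "installed" if is_installed(name) else "missing"
    print(name, "is", status)
```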

## AGENDA : To scrape the reviews of 'iPhone 11' on Flipkart

**link** : https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&page=< Specify page number >
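
The page number is the only part of this link that changes from request to request. As a sketch, the URL can be assembled from its fixed parts with the standard library's `urlencode`. The names `BASE_URL`, `PRODUCT_ID`, and `build_review_url` are chosen here for illustration.

```python
from urllib.parse import urlencode

# Fixed parts of the review link shown above
BASE_URL = "https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd"
PRODUCT_ID = "MOBFWQ6BXGJCEYNY"

def build_review_url(page):
    """Return the review-listing URL for the given page number."""
    query = urlencode({"pid": PRODUCT_ID, "page": page})
    return f"{BASE_URL}?{query}"

print(build_review_url(1))
```

Building the query string this way avoids manual string concatenation and keeps the page number in one place.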

In [4]:
# Load the necessary libraries

from bs4 import BeautifulSoup as bs # bs4 is the package name of Beautiful Soup 4

import requests 

name_list = []
rating_list = []
review_list = []

for page in range(1,1105):
    
    url = "https://www.flipkart.com/apple-iphone-11-black-64-gb/product-reviews/itm4e5041ba101fd?pid=MOBFWQ6BXGJCEYNY&page="+str(page)
    response = requests.get(url)
    
    # get() sends an HTTP GET request and returns a Response object

    # The HTTP status code is available as response.status_code:
    # 200 means OK
    # 404 means page not found
    # 503 means the server is currently unavailable, etc.
    
    if response.status_code == 200:
        
        soup = bs(response.content,'html.parser') # soup is a BeautifulSoup object
        
        print("Started to scrape page ",page)
        
        # From Inspect, click on the select option (top right corner in the Inspect tab),
        # then click on any customer name on the page to get the name tag and its class name
        
        
        names = soup.find_all('p',class_='_2sc7ZR _2V5EHH')
        
        # same with ratings and reviews
        
        ratings = soup.find_all('div',class_='_3LWZlK _1BLPMq')
        reviews = soup.find_all('div',class_='t-ZTKy')
        
        # find_all() returns every element that matches the given tag and class
        
        for i in range(len(names)):
            name_list.append(names[i].get_text())
            if len(ratings) > i:
                rating_list.append(ratings[i].get_text())
            else:
                rating_list.append("N/A")
            if len(reviews) > i:
                review_list.append(reviews[i].get_text())
            else:
                review_list.append("N/A")
            
        print("Page ",page," scraped succesfully")
            

Started to scrape page  1
Page  1  scraped succesfully
Started to scrape page  2
Page  2  scraped succesfully
Started to scrape page  3
Page  3  scraped succesfully
Started to scrape page  4
Page  4  scraped succesfully
Started to scrape page  5
Page  5  scraped succesfully
Started to scrape page  6
Page  6  scraped succesfully
Started to scrape page  7
Page  7  scraped succesfully
Started to scrape page  8
Page  8  scraped succesfully
Started to scrape page  9
Page  9  scraped succesfully
Started to scrape page  10
Page  10  scraped succesfully
Started to scrape page  11
Page  11  scraped succesfully
Started to scrape page  12
Page  12  scraped succesfully
Started to scrape page  13
Page  13  scraped succesfully
Started to scrape page  14
Page  14  scraped succesfully
Started to scrape page  15
Page  15  scraped succesfully
Started to scrape page  16
Page  16  scraped succesfully
Started to scrape page  17
Page  17  scraped succesfully
Started to scrape page  18
Page  18  scraped succ
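
The status codes mentioned in the comments of the cell above can be looked up with the standard library's `http.HTTPStatus`. The sketch below is only an illustration of how a scraper might decide whether to parse a page. The helper names `describe_status` and `is_success` are assumptions, and treating every 2xx code as success is a simplification.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short description such as '404 Not Found' for a status code."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # HTTPStatus raises ValueError for codes it does not know
        return f"{code} (unrecognized status code)"

def is_success(code):
    """Treat any 2xx code as success (a simplifying assumption)."""
    return 200 <= code < 300

for code in (200, 404, 503):
    action = "-> parse page" if is_success(code) else "-> skip page"
    print(describe_status(code), action)
```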

In [5]:
# now we'll import pandas to convert the data into a csv
import pandas as pd 

df = pd.DataFrame()

df['Customer names'] = name_list
df['Ratings out of 5'] = rating_list
df['Review'] = review_list
df

| | Customer names | Ratings out of 5 | Review |
|---|---|---|---|
| 0 | Flipkart Customer | 5 | Really satisfied with the Product I received..... |
| 1 | Flipkart Customer | 5 | Amazing phone with great cameras and better ba... |
| 2 | Manish Raghuvanshi | 5 | Great iPhone very snappy experience as apple k... |
| 3 | George Haokip | 4 | I'm Really happy with the productDelivery was ... |
| 4 | Jimly Gogoi | 5 | It's my first time to use iOS phone and I am l... |
| ... | ... | ... | ... |
| 9962 | Rajat Garg | 4 | GoodREAD MORE |
| 9963 | Amarjit Singh | 5 | AwesomeREAD MORE |
| 9964 | Subham S | 4 | Works as usual .READ MORE |
| 9965 | Rohan Sk | 5 | killer 😘😘 PhoneREAD MORE |
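
Several reviews in the output end with the text "READ MORE", which appears to be captured along with the review text by `get_text()`. As a hedged sketch, a small helper can strip that trailing marker before the data is saved. The function name `clean_review` and the sample strings (copied from the rows above) are for illustration only.

```python
def clean_review(text, suffix="READ MORE"):
    """Strip a trailing 'READ MORE' marker and surrounding whitespace."""
    text = text.strip()
    if text.endswith(suffix):
        text = text[: -len(suffix)]
    return text.strip()

samples = ["GoodREAD MORE", "Works as usual .READ MORE", "No marker here"]
print([clean_review(s) for s in samples])
```

The same function could be applied to the review column with `df['Review'].apply(clean_review)`.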


In [32]:
df.to_csv(r'C:\ScrapedFiles\iPhone_11_Reviews.csv',index=False)
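
To check that a file written this way reads back cleanly, the round trip can be tried on a tiny made-up frame. This sketch writes to a temporary directory instead of the `C:\ScrapedFiles` path above. The sample rows and the file name `reviews_demo.csv` are invented for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Tiny made-up frame standing in for the scraped reviews
demo = pd.DataFrame({
    "Customer names": ["A. Customer", "B. Customer"],
    "Ratings out of 5": ["5", "4"],
    "Review": ["Great phone", "Good value"],
})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "reviews_demo.csv"  # hypothetical file name
    demo.to_csv(out_path, index=False)         # index=False leaves out the row index
    restored = pd.read_csv(out_path)

print(list(restored.columns))
```

Because `index=False` is used, the file contains only the three named columns and no extra unnamed index column appears when it is read back.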