# Web Scraping Project

### Problem Statement - Scrape data from the ESPNcricinfo website on the top batters in the history of Test cricket, ranked by career runs scored


### Process:

1.   Download the webpage at 'https://stats.espncricinfo.com/ci/content/records/223646.html' using the "requests" library

2.   Create a BeautifulSoup object to parse the downloaded HTML

3.   Extract each batter's statistics from the parsed table rows using Python

4.   Load the data into an Excel file using the "openpyxl" library






In [71]:
##Import the required libraries

import requests
import bs4
import lxml  ##Not called directly; importing it confirms the 'lxml' parser used by BeautifulSoup is installed
import openpyxl

In [72]:
##This is the URL of the target webpage

url = 'https://stats.espncricinfo.com/ci/content/records/223646.html'

In [73]:
##Creating an Excel Workbook to load the scraped data

excel = openpyxl.Workbook()
sheet = excel.active
sheet.title = "Top Batters in Test Cricket"
sheet.append(['Batter Name','Total Matches','Total Runs','Highest Score','Average','Total Centuries','Total Half Centuries'])

In [74]:



##Download the page, parse it and append one row per batter to the sheet

try:
    response = requests.get(url, timeout=30)  ##Send the HTTP request; give up if the server does not respond within 30 seconds
    response.raise_for_status()  ##Raise an exception for 4xx/5xx responses
    soup = bs4.BeautifulSoup(response.text, 'lxml')  ##Soup object
    batters = soup.find_all('tr', class_='data1')  ##One table row per batter

    for batter in batters:
        stats = batter.find_all('td')  ##All cells of the row, used for lookups by position

        name = batter.find('td', class_='left').a.text
        matches = batter.find('td', class_='padAst').text
        runs = batter.select('td > b')[0].text  ##First bold value in the row

        highest_score = stats[6].text
        avg = stats[7].text
        no_of_centuries = stats[8].text
        half_centuries = stats[9].text

        sheet.append([name, matches, runs, highest_score, avg, no_of_centuries, half_centuries])

except Exception as error:
    ##Any request or parsing failure is printed; rows appended before the failure stay in the sheet
    print(error)
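To see how the selectors in the loop above pick values out of a row, it can help to run them against a small inline HTML snippet without any network access. The sketch below is only an illustration: `SAMPLE_HTML` and `parse_row` are hypothetical names, the row layout is an assumption inferred from the selectors and cell positions used above, and the values are placeholders rather than real statistics. It uses Python's built-in `'html.parser'` instead of `'lxml'` so that it needs only `bs4`.

```python
import bs4

# Hypothetical sample row. The layout is assumed from the selectors used in the
# scraping loop, and every value here is a placeholder, not a real statistic.
SAMPLE_HTML = """
<table>
  <tr class="data1">
    <td class="left"><a href="#">Example Batter</a></td>
    <td>2000-2010</td>
    <td class="padAst">100</td>
    <td>170</td>
    <td>15</td>
    <td><b>8000</b></td>
    <td>201*</td>
    <td>51.61</td>
    <td>25</td>
    <td>35</td>
    <td>10</td>
  </tr>
</table>
"""


def parse_row(row):
    """Extract one batter's fields from a <tr> using the same lookups as the loop."""
    cells = row.find_all('td')
    return {
        'name': row.find('td', class_='left').a.text,       # link text in the left-aligned cell
        'matches': row.find('td', class_='padAst').text,    # first cell with class 'padAst'
        'runs': row.select('td > b')[0].text,               # first <b> that is a direct child of a <td>
        'highest_score': cells[6].text,                     # remaining fields are read by position
        'average': cells[7].text,
        'centuries': cells[8].text,
        'half_centuries': cells[9].text,
    }


soup = bs4.BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = [parse_row(row) for row in soup.find_all('tr', class_='data1')]
print(rows[0])
```

Testing against a fixed snippet like this makes it easier to notice when a positional index such as `stats[6]` no longer matches the intended column, for example if the page layout changes.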

In [75]:
excel.save('Top Test Batters.xlsx')     ## Saving the Excel file created 
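Every value appended to the sheet comes from `.text`, so the workbook stores the numbers as strings. One possible refinement, sketched below under stated assumptions, is to convert the values before appending them. `to_number` and `parse_highest_score` are hypothetical helpers, not part of the notebook, and the input strings are sample values chosen for illustration. The sketch assumes that a trailing `*` on a highest score marks a not-out innings, as is conventional in cricket scorecards.

```python
def to_number(text):
    """Convert a scraped string to int or float, returning it unchanged if it is not numeric."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # e.g. a '-' placeholder stays as text


def parse_highest_score(text):
    """Split a highest score into (runs, not_out), assuming a trailing '*' means not out."""
    text = text.strip()
    not_out = text.endswith('*')
    return int(text.rstrip('*')), not_out


# Sample strings for illustration only
converted = [to_number(value) for value in ['200', '53.78', '-']]
best, not_out = parse_highest_score('248*')
print(converted, best, not_out)
```

With values converted this way, Excel can sort and sum the columns numerically instead of treating them as text.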