# Web Scraping from Wikipedia

## Web Scraping Technique

<h2>Web Scraping</h2>
<p>
Web scraping is an automated method used to extract large amounts of data from websites. It involves fetching the HTML content of web pages and parsing the data to retrieve specific information. Commonly, web scraping is used for data analysis, price monitoring, market research, and more.
</p>
<p>
Popular Python libraries for web scraping include <strong>BeautifulSoup</strong> for parsing HTML and XML documents, <strong>requests</strong> for sending HTTP requests, <strong>Scrapy</strong> for more complex and large-scale scraping tasks, and <strong>Selenium</strong> for scraping dynamic content rendered by JavaScript.
</p>
<p>
While web scraping can be a powerful tool, it is important to be mindful of the website's <a href="https://en.wikipedia.org/wiki/Robots_exclusion_standard">robots.txt</a> file and the terms of service to ensure compliance with legal and ethical standards.
</p>
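
The robots.txt check mentioned above can be done with Python's standard library. The sketch below is only an illustration: the rules in `SAMPLE_ROBOTS_TXT`, the `learning-bot` user agent, and the `example.org` URLs are made up for the demo and are not Wikipedia's actual policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used so the example runs without a network call.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_page = is_allowed(SAMPLE_ROBOTS_TXT, "learning-bot", "https://example.org/wiki/World_population")
blocked_page = is_allowed(SAMPLE_ROBOTS_TXT, "learning-bot", "https://example.org/private/data")
print(allowed_page, blocked_page)
```

For a live site, `RobotFileParser.set_url()` followed by `read()` downloads the real file instead of parsing a string.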


## Import Libraries

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import requests


### Using the libraries to fetch and parse the page from its URL

In [2]:
url = 'https://en.wikipedia.org/wiki/World_population'
response = requests.get(url)
response.raise_for_status()  # stop early if the request failed
soup = BeautifulSoup(response.content, 'html.parser')

print(soup.title)


<title>World population - Wikipedia</title>
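
A slightly more defensive version of the request is sketched here. It assumes you want a timeout, an explicit error on bad status codes, and a descriptive User-Agent header, which some sites (Wikipedia among them) ask automated clients to send. The header value and the `fetch_html` helper are hypothetical names chosen for this example.

```python
import requests

# Hypothetical User-Agent string; replace it with something that identifies your project.
HEADERS = {"User-Agent": "population-notebook/0.1 (learning example)"}


def fetch_html(url, timeout=10, get=requests.get):
    """Download a page and return its HTML text, raising on HTTP errors.

    The `get` parameter defaults to requests.get; it exists only so the
    function can be exercised with a stand-in during testing.
    """
    response = get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raises requests.HTTPError for 4xx/5xx responses
    return response.text
```

Calling `fetch_html('https://en.wikipedia.org/wiki/World_population')` should return the same HTML that `response.content` holds above, decoded to a string.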


In [3]:
# Extract all the text from the page
soup.text
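
`soup.text` returns every text node joined together, which can be hard to read on a large page. As a small illustration on made-up HTML (the `sample_html` string is not from Wikipedia), `get_text()` accepts a separator and a `strip` flag that tidy the result:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, used only to demonstrate get_text().
sample_html = (
    "<html><head><title>Demo page</title></head>"
    "<body><h1>Heading</h1><p>First <b>bold</b> paragraph.</p></body></html>"
)
sample_soup = BeautifulSoup(sample_html, "html.parser")

# strip=True trims each text fragment; separator controls how fragments are joined.
clean_text = sample_soup.get_text(separator=" ", strip=True)
print(clean_text)
```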



In [4]:
# Extract the title text
soup.title.text

'World population - Wikipedia'

In [5]:
# Pretty-print the parsed HTML with indentation
prettified = soup.prettify()

In [6]:
# Find all the tables on the page
tables=soup.find_all('table')
tables

[<table class="wikitable" style="text-align:center; float:right; clear:right; margin-left:8px; margin-right:0;">
 <caption>World population milestones in billions<sup class="reference" id="cite_ref-:6_59-0"><a href="#cite_note-:6-59">[59]</a></sup> (Worldometers estimates)
 </caption>
 <tbody><tr>
 <th scope="row">Population
 </th>
 <th scope="col">1
 </th>
 <th scope="col">2
 </th>
 <th scope="col">3
 </th>
 <th scope="col">4
 </th>
 <th scope="col">5
 </th>
 <th scope="col">6
 </th>
 <th scope="col">7
 </th>
 <th scope="col">8
 </th>
 <th scope="col">9
 </th>
 <th scope="col">10
 </th></tr>
 <tr>
 <th scope="row">Year
 </th>
 <td>1804</td>
 <td>1927</td>
 <td>1960</td>
 <td>1974</td>
 <td>1987</td>
 <td>1999</td>
 <td>2011</td>
 <td>2022</td>
 <td><i>2037</i></td>
 <td><i>2057</i>
 </td></tr>
 <tr>
 <th scope="row">Years elapsed
 </th>
 <td>200,000+</td>
 <td>123</td>
 <td>33</td>
 <td>14</td>
 <td>13</td>
 <td>12</td>
 <td>12</td>
 <td>11</td>
 <td><i>15</i></td>
 <td><i>20</i>
 </t
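
Because `find_all('table')` returns every table on the page, picking one by position can break if the page layout changes. One possible alternative, sketched below on invented HTML, is to select a table by its caption text. The `find_table_by_caption` helper and the second caption ("Ten most populous countries") are assumptions made for this example; only the "milestones" caption appears in the output above.

```python
from bs4 import BeautifulSoup

# Small stand-in for the real page: two wikitable-style tables with captions.
SAMPLE_HTML = """
<table class="wikitable"><caption>World population milestones in billions</caption>
<tr><th>Population</th><th>1</th></tr>
<tr><th>Year</th><td>1804</td></tr></table>
<table class="wikitable"><caption>Ten most populous countries</caption>
<tr><th>Country</th><th>Population</th></tr>
<tr><td>India</td><td>1425775850</td></tr></table>
"""


def find_table_by_caption(soup, keyword):
    """Return the first wikitable whose caption contains keyword, else None."""
    for table in soup.find_all("table", class_="wikitable"):
        caption = table.find("caption")
        if caption and keyword.lower() in caption.get_text(strip=True).lower():
            return table
    return None


sample_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
milestones = find_table_by_caption(sample_soup, "milestones")
print(milestones.find("td").get_text(strip=True))
```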

## Use a for loop to convert each table into a DataFrame

In [7]:

dataframe = []  # one DataFrame per table on the page
for table in tables:
    rows = table.find_all("tr")[1:]  # skip the first (header) row
    data = []  # cell text for each remaining row
    for row in rows:
        cols = row.find_all("td")
        cols = [col.text.strip() for col in cols]
        data.append(cols)
    df = pd.DataFrame(data)
    dataframe.append(df)
dataframe[4]  # the fifth table on the page at the time of scraping


|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | India | 1425775850 | 17.6% | 14 Apr 2023 | UN projection[90] |
| 1 | China | 1409670000 | 17.4% | 17 Jan 2024 | National annual estimate[91] |
| 2 | United States | 336769645 | 4.15% | 7 Aug 2024 | National population clock[92] |
| 3 | Indonesia | 278696200 | 3.43% | 1 Jul 2023 | National annual estimate[93] |
| 4 | Pakistan | 229488994 | 2.83% | 1 Jul 2022 | UN projection[94] |
| 5 | Nigeria | 216746934 | 2.67% | 1 Jul 2022 | UN projection[94] |
| 6 | Brazil | 218007126 | 2.68% | 7 Aug 2024 | National population clock[95] |
| 7 | Bangladesh | 168220000 | 2.07% | 1 Jul 2020 | Annual Population Estimate[96] |
| 8 | Russia | 147190000 | 1.81% | 1 Oct 2021 | 2021 preliminary census results[97] |
| 9 | Mexico | 128271248 | 1.58% | 31 Mar 2022 | |
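
The loop above skips the first row and reads only `td` cells, so the resulting columns are numbered 0 to 4 rather than named. A possible variation that keeps the header row as column names is sketched here. It assumes a simple table with no `colspan` or `rowspan`, and the header labels in `SAMPLE_TABLE` are invented for the demo since the real headers are not shown in this notebook.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical table; the header labels are assumptions, the rows echo the output above.
SAMPLE_TABLE = """
<table>
<tr><th>Country</th><th>Population</th><th>% of world</th></tr>
<tr><td>India</td><td>1425775850</td><td>17.6%</td></tr>
<tr><td>China</td><td>1409670000</td><td>17.4%</td></tr>
</table>
"""


def table_to_dataframe(table):
    """Convert a simple HTML table to a DataFrame, using the first row as headers."""
    rows = table.find_all("tr")
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]
    body = [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in rows[1:]
    ]
    return pd.DataFrame(body, columns=header)


sample_table = BeautifulSoup(SAMPLE_TABLE, "html.parser").find("table")
sample_df = table_to_dataframe(sample_table)
print(sample_df)
```

Reading both `th` and `td` cells also keeps row labels such as "Year" in the milestones table, which the `td`-only loop drops.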


## Save the DataFrame to a CSV file

In [8]:

dataframe[4].to_csv("world population1.csv", index=False)
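
The scraped values are all strings, and the source column still carries citation markers such as `[90]`. If the CSV is meant for analysis, a cleaning step before saving might look like the sketch below. The column names and the `clean_population_table` helper are assumptions for illustration; the two sample rows are copied from the output shown earlier.

```python
import pandas as pd

# Two rows copied from the scraped output; columns are still numbered 0 to 4.
raw = pd.DataFrame([
    ["India", "1425775850", "17.6%", "14 Apr 2023", "UN projection[90]"],
    ["China", "1409670000", "17.4%", "17 Jan 2024", "National annual estimate[91]"],
])


def clean_population_table(df):
    """Return a copy with named columns, numeric values, and no citation markers."""
    cleaned = df.copy()
    # Hypothetical column names; adjust them to match the real table headers.
    cleaned.columns = ["country", "population", "percent_of_world", "date", "source"]
    # Drop thousands separators if present, then convert to integers.
    cleaned["population"] = (
        cleaned["population"].str.replace(",", "", regex=False).astype("int64")
    )
    # Remove the trailing percent sign and convert to float.
    cleaned["percent_of_world"] = cleaned["percent_of_world"].str.rstrip("%").astype(float)
    # Strip citation markers like [90] from the source text.
    cleaned["source"] = (
        cleaned["source"].str.replace(r"\[\d+\]", "", regex=True).str.strip()
    )
    return cleaned


cleaned = clean_population_table(raw)
print(cleaned.dtypes)
```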

## I hope this is helpful for any learner. Thank you for visiting.