<a href="https://colab.research.google.com/github/AnanyaSharma2/webScraper/blob/main/web_scraping.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Title**:
 QuoteScraper: Extracting Insights from "quotes.toscrape.com"

**Project Description:**

This project uses web scraping to extract quote data from the "quotes.toscrape.com" website.

**Data Extracted:**

The following fields were extracted for each quote on the site's first page:

*  *Quote*: The text of the quote.
*  *Author Name*: The name of the person the quote is attributed to.
*  *Author Bio Link*: A relative link to a web page with more information about the author.
*  *Tags*: Keywords associated with the quote's meaning or theme.

**Data Storage and Use:**

>The extracted data was stored in a structured format (a CSV file, `quotes.csv`) for further analysis and potential applications.





In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [None]:
link = 'https://quotes.toscrape.com/'

In [None]:
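res = requests.get(link, timeout=10)  # give up after 10 seconds instead of hanging
res.raise_for_status()  # stop early if the server returned an error status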

In [None]:
print(res.text)  # print the raw HTML of the page

In [None]:
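# save a local copy of the page for offline use
html = res.text
# an explicit encoding keeps the curly quotation marks intact on every platform
with open('main.html', 'w', encoding='utf-8') as fd:
  fd.write(html)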
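
Once the page is saved, later cells can work from the local copy instead of requesting the site again. The sketch below shows the save-and-reload round trip with `pathlib`. The short `sample_html` string and the temporary directory are stand-ins chosen for illustration, not the real page or file location.

```python
import tempfile
from pathlib import Path

# Stand-in for res.text; the real page is much longer (assumption for illustration).
sample_html = '<span class="text">“A sample quote.”</span>'

with tempfile.TemporaryDirectory() as tmp_dir:
    saved_page = Path(tmp_dir) / 'main.html'
    # Write and read with the same explicit encoding so non-ASCII characters survive.
    saved_page.write_text(sample_html, encoding='utf-8')
    reloaded_html = saved_page.read_text(encoding='utf-8')

# True when the reloaded copy matches what was written.
print(reloaded_html == sample_html)
```

The reloaded string can be passed to `BeautifulSoup` in the same way as `res.text`.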

In [None]:
soup = BeautifulSoup(res.text,'html.parser')

In [None]:
print(soup)

In [None]:
# print the first quote, stored in a <span> tag with class "text";
# [1:-1] drops the quotation marks that surround it
print(soup.find('span', class_='text').text[1:-1])

The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.


In [None]:
# print every quote element on the page (the full <span> tags, not just the text)
for quotes in soup.find_all('span', class_='text'):
  print(quotes)

In [None]:
for quotes in soup.find_all('span',class_ ='text'):
  print(quotes.text ,end ='\n\n')

In [None]:
# remove the opening and closing quotation marks from each quote
for quotes in soup.find_all('span', class_='text'):
  print(quotes.text[1:-1], end='\n\n')
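
Slicing with `[1:-1]` assumes every quote starts and ends with exactly one quotation mark. A slightly more defensive sketch uses `str.strip` with the two curly-quote characters, so text that lacks them is left untouched. The helper name `clean_quote` is hypothetical.

```python
def clean_quote(raw_text):
    """Remove surrounding whitespace and curly quotation marks, if present."""
    return raw_text.strip().strip('“”')

# A quote wrapped in curly quotation marks, as on the site.
print(clean_quote('“A day without sunshine is like, you know, night.”'))
# Text without quotation marks is returned unchanged; slicing would cut off letters.
print(clean_quote('No quotation marks here'))
```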

Next, work with a single quote block to extract its author.

In [None]:
# each div with class "quote" holds one quote together with its author and tags
quote_blocks = soup.find_all('div', class_='quote')

In [None]:
for sp in soup.find_all('div',class_='quote'):
  print(sp)
  print()

In [None]:
sp  # after the loop, sp still refers to the last quote block on the page

In [None]:
# printing quote
quote = sp.find('span',class_='text').text[1:-1]
quote

'A day without sunshine is like, you know, night.'

In [None]:
#printing author of quote
sp.find('small').text

'Steve Martin'

In [None]:
# store the relative link to the author's page (the first <a> tag in the block)
author_id = sp.find('a').get('href')
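
The `href` value is a relative path, so it needs the site's base URL before it can be opened or requested. A minimal sketch using the standard library's `urljoin`, with the relative link written out by hand rather than scraped:

```python
from urllib.parse import urljoin

base_url = 'https://quotes.toscrape.com/'
author_id = '/author/Steve-Martin'  # relative link, as returned by .get('href')

# Combine the base URL and the relative path into an absolute URL.
author_url = urljoin(base_url, author_id)
print(author_url)
```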

In [None]:
# collect the text of every tag attached to this quote
tag = []
for tags in sp.find_all('a', class_='tag'):
  tag.append(tags.text)
tag

['humor', 'obvious', 'simile']

In [None]:
','.join(tag)

'humor,obvious,simile'
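
The comma-joined string is convenient for a single CSV cell, and it can be turned back into a list when the tags are needed individually. A minimal sketch, with a hypothetical helper `split_tags` that also handles a quote with no tags:

```python
def split_tags(tag_string):
    """Turn 'humor,obvious,simile' back into a list; an empty string gives []."""
    return tag_string.split(',') if tag_string else []

joined = ','.join(['humor', 'obvious', 'simile'])
print(joined)
print(split_tags(joined))
print(split_tags(''))
```

The empty-string check matters because `''.split(',')` returns `['']` rather than an empty list.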

# **Printing the quote, author, author_id, and tags for every quote on a single page**

In [None]:
for sp in soup.find_all('div', class_='quote'):
  quote = sp.find('span', class_='text').text[1:-1]
  author = sp.find('small', class_='author').text
  author_id = sp.find('a').get('href')
  tag = []
  for tags in sp.find_all('a', class_='tag'):
    tag.append(tags.text)
  tag = ','.join(tag)
  print(quote, author, author_id, tag, sep='\n')
  print('*' * 50)

In [None]:
# collect one [quote, author, author_id, tag] row per quote in a list
data = []
for sp in soup.find_all('div', class_='quote'):
  quote = sp.find('span', class_='text').text[1:-1]
  author = sp.find('small', class_='author').text
  author_id = sp.find('a').get('href')
  tag = []
  for tags in sp.find_all('a', class_='tag'):
    tag.append(tags.text)
  tag = ','.join(tag)
  data.append([quote, author, author_id, tag])
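
The same extraction can be wrapped in a function so that it runs on any HTML string, including a saved copy of the page. The sketch below is one possible arrangement, not the notebook's own code: the function name `parse_quotes` and the trimmed-down `sample_html` are assumptions, with markup modelled on the tags and classes used in the cells above.

```python
from bs4 import BeautifulSoup

def parse_quotes(html):
    """Return one [quote, author, author_id, tag] row per div.quote in html."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for block in soup.find_all('div', class_='quote'):
        # Strip the curly quotation marks that surround the quote text.
        quote = block.find('span', class_='text').text.strip('“”')
        author = block.find('small', class_='author').text
        # The first <a> in the block is the link to the author's page.
        author_id = block.find('a').get('href')
        tags = ','.join(a.text for a in block.find_all('a', class_='tag'))
        rows.append([quote, author, author_id, tags])
    return rows

# Hand-written stand-in for one quote block (assumption for illustration).
sample_html = '''
<div class="quote">
  <span class="text">“A day without sunshine is like, you know, night.”</span>
  <span>by <small class="author">Steve Martin</small>
  <a href="/author/Steve-Martin">(about)</a></span>
  <div class="tags">
    <a class="tag" href="/tag/humor/page/1/">humor</a>
    <a class="tag" href="/tag/obvious/page/1/">obvious</a>
    <a class="tag" href="/tag/simile/page/1/">simile</a>
  </div>
</div>
'''

rows = parse_quotes(sample_html)
print(rows[0])
```

Keeping the parsing separate from the request makes it possible to test the extraction without touching the network.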



In [None]:
data[0]

['The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.',
 'Albert Einstein',
 '/author/Albert-Einstein',
 'change,deep-thoughts,thinking,world']

# **Storing the entire data in a dataframe**

In [None]:
data = pd.DataFrame(data, columns=['quotes', 'author', 'author_id', 'tag'])

In [None]:
data

| | quotes | author | author_id | tag |
|---|---|---|---|---|
| 0 | The world as we have created it is a process o... | Albert Einstein | /author/Albert-Einstein | change,deep-thoughts,thinking,world |
| 1 | It is our choices, Harry, that show what we tr... | J.K. Rowling | /author/J-K-Rowling | abilities,choices |
| 2 | There are only two ways to live your life. One... | Albert Einstein | /author/Albert-Einstein | inspirational,life,live,miracle,miracles |
| 3 | The person, be it gentleman or lady, who has n... | Jane Austen | /author/Jane-Austen | aliteracy,books,classic,humor |
| 4 | Imperfection is beauty, madness is genius and ... | Marilyn Monroe | /author/Marilyn-Monroe | be-yourself,inspirational |
| 5 | Try not to become a man of success. Rather bec... | Albert Einstein | /author/Albert-Einstein | adulthood,success,value |
| 6 | It is better to be hated for what you are than... | André Gide | /author/Andre-Gide | life,love |
| 7 | I have not failed. I've just found 10,000 ways... | Thomas A. Edison | /author/Thomas-A-Edison | edison,failure,inspirational,paraphrased |
| 8 | A woman is like a tea bag; you never know how ... | Eleanor Roosevelt | /author/Eleanor-Roosevelt | misattributed-eleanor-roosevelt |
| 9 | A day without sunshine is like, you know, night. | Steve Martin | /author/Steve-Martin | humor,obvious,simile |


In [None]:
# save the dataframe as a CSV file; index=True writes the row index as the first column
data.to_csv('quotes.csv', index=True)
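
With `index=True`, `to_csv` writes the row index as an unnamed first column, which shows up as `Unnamed: 0` when the file is read back with default settings. The sketch below demonstrates this with an in-memory buffer standing in for `quotes.csv` and a one-row frame, both of which are assumptions for illustration.

```python
import io
import pandas as pd

frame = pd.DataFrame(
    [['A day without sunshine is like, you know, night.', 'Steve Martin']],
    columns=['quotes', 'author'],
)

buffer = io.StringIO()  # in-memory stand-in for quotes.csv
frame.to_csv(buffer, index=True)

buffer.seek(0)
plain = pd.read_csv(buffer)  # the saved index comes back as a regular column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # first column becomes the index again

print(list(plain.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is the other common choice when the row numbers carry no meaning.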