## Making a Soup
### Explanation:
- BeautifulSoup(html_content, 'html.parser') takes the HTML content and parses it into a structured format that we can easily search and manipulate.
- soup.prettify() is used to output the HTML in a more readable (pretty-printed) format.

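In [5]:
from bs4 import BeautifulSoup

# Parse a local file: BeautifulSoup accepts an open file handle as well as a string
with open("index.html") as fp:
    soup = BeautifulSoup(fp, features="lxml")

# Parse a string instead (this replaces the soup built from the file).
# Naming the parser avoids the "No parser was explicitly specified" warning;
# lxml wraps the bare text in <body><p>, as the output shows.
soup = BeautifulSoup("<html>Data</html>", "lxml")

print(soup.prettify())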

<html>
 <body>
  <p>
   Data
  </p>
 </body>
</html>
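The `<body>` and `<p>` in the output above come from the parser, not from the input string. A minimal sketch comparing parsers on the same markup, assuming Python's built-in `html.parser` and, optionally, `lxml` (a separate install); the `lxml` branch just reports when it is missing. The variable names are illustrative and chosen so they don't overwrite the notebook's `soup`.

```python
from bs4 import BeautifulSoup, FeatureNotFound

sample = "<html>Data</html>"

# Python's built-in parser keeps the markup as written
builtin_soup = BeautifulSoup(sample, "html.parser")
print(builtin_soup)  # <html>Data</html>

# lxml repairs the document, adding <body> and <p> around the bare text
try:
    lxml_soup = BeautifulSoup(sample, "lxml")
    print(lxml_soup)
except FeatureNotFound:
    print("lxml is not installed")
```

Because parsers can build different trees from the same input, naming the parser explicitly keeps results reproducible across machines.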



In [13]:
from bs4 import BeautifulSoup

html_content = """<html>
    <body>
        <h1>Welcome to BeautifulSoup</h1>
        <p>This is a sample HTML page.</p>
    </body>
</html>"""

# parse the HTML using 'html.parser' as the parser
soup = BeautifulSoup(html_content, 'html.parser')

print(soup.prettify())

<html>
 <body>
  <h1>
   Welcome to BeautifulSoup
  </h1>
  <p>
   This is a sample HTML page.
  </p>
 </body>
</html>



### Kinds of objects
Beautiful Soup transforms a complex HTML document into a complex tree of Python objects. But you’ll only ever have to deal with about four kinds of objects: 

- Tag: A tag object corresponds to an HTML tag (e.g., a, h1, p).
- NavigableString: A string inside a tag, such as the text between the opening and closing tags.
- BeautifulSoup: This represents the whole document, and is the main object we’ll work with.
- Comment: A special kind of NavigableString that holds the text of an HTML comment (`<!-- ... -->`).
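A small sketch that produces all four kinds of objects from one snippet, assuming a made-up `<p>` with a comment inside it (the markup and variable names are illustrative, not from `index.html`):

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

demo = BeautifulSoup("<p>Hello <!-- a note --></p>", "html.parser")

p_el = demo.p                # a Tag
greeting = p_el.contents[0]  # the NavigableString "Hello "
note = p_el.contents[1]      # the Comment " a note "

# Prints BeautifulSoup, Tag, NavigableString, Comment (one per line)
for obj in (demo, p_el, greeting, note):
    print(type(obj).__name__)

# Comment subclasses NavigableString, so this check is also True
print(isinstance(note, NavigableString))  # True
```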

In [15]:
# Accessing a tag

h1_tag = soup.h1

print(f"Tag: {h1_tag}")

# accessing the string inside the tag
h1_string = h1_tag.string
print(f"String inside h1: {h1_string}")


Tag: <h1>Welcome to BeautifulSoup</h1>
String inside h1: Welcome to BeautifulSoup


#### Tags

In [6]:
soup = BeautifulSoup('<b class="boldest">Extremely bold</b>', 'html.parser')
tag = soup.b
type(tag)


bs4.element.Tag

In [7]:
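import requests
from bs4 import BeautifulSoup

# Fetch a live page's HTML with requests before parsing it.
# This site answered "401 Authorization Required", so r.text is the server's error page.
url = "https://courses.wscubetech.com/"
r = requests.get(url, timeout=10)
print(r.text)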

<html>
<head><title>401 Authorization Required</title></head>
<body>
<center><h1>401 Authorization Required</h1></center>
<hr><center>openresty</center>
</body>
</html>



In [8]:
# Every tag has a name, available as .name

tag.name

'b'

In [22]:
# Changing .name renames the tag in the parsed document
tag.name = "blockquote"
tag

<blockquote class="boldest">Extremely bold</blockquote>

#### Attributes
A tag may have any number of attributes. The tag `<b id="boldest">` has an attribute “id” whose value is “boldest”. You can access a tag’s attributes by treating the tag like a dictionary.
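A minimal sketch of reading attributes, assuming the `<b id="boldest">` tag from the sentence above (parsed fresh here under the illustrative names `b_soup` and `b_tag`, so the notebook's `tag` is left alone):

```python
from bs4 import BeautifulSoup

b_soup = BeautifulSoup('<b id="boldest">Extremely bold</b>', "html.parser")
b_tag = b_soup.b

# Read one attribute like a dictionary key
print(b_tag["id"])  # boldest

# .attrs holds every attribute as a dictionary
print(b_tag.attrs)  # {'id': 'boldest'}

# A missing key raises KeyError; .get() returns None or a default instead
print(b_tag.get("class"))              # None
print(b_tag.get("class", "no-class"))  # no-class
```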

In [25]:
# You can add, remove, and modify a tag’s attributes. Again, this is done by treating the tag as a dictionary.

# Add tag attributes
tag['id'] = 'verybold'
tag['another-attribute'] = 1

# Delete tag attributes
del tag['id']
del tag['another-attribute']

# Changing the tag name from <blockquote> back to <b>:
# tag.name = "b"

# After deleting 'id', indexing it raises KeyError:
# tag['id']

# .get() returns None instead, because the tag no longer has an id
print(tag.get('id'))


None


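#### Multi-valued attributes
HTML defines some attributes, such as `class`, `rel`, `rev`, `accept-charset`, `headers`, and `accesskey`, that can hold more than one value. Beautiful Soup returns the value of a multi-valued attribute as a list, even when only one value is present.
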
In [26]:
css_soup = BeautifulSoup('<p class="body"></p>', 'html.parser')
css_soup.p['class']
# ['body']

css_soup = BeautifulSoup('<p class="body strikeout"></p>', 'html.parser')
css_soup.p['class']
# ['body', 'strikeout']

['body', 'strikeout']

In [31]:
# Changing or adding values of a multi-valued attribute: assign a list

# rel_soup = BeautifulSoup('<p>Back to the <a rel="index">homepage</a></p>', 'html.parser')
# rel_soup.a['rel']
# # ['index']
# rel_soup.a['rel'] = ['index', 'contents']
# print(rel_soup.p)
# # <p>Back to the <a rel="index contents">homepage</a></p>

# Changing the class of the <h1> in index.html.
# This edits only the parsed tree in memory; index.html on disk is unchanged.
with open("index.html", 'r') as file:
    page = BeautifulSoup(file, "lxml")

page.h1['class'] = ['header', 'new-header']
print(page.body)





<body>
<h1 class="header new-header">Welcome to My First Webpage!</h1>
<p>This is a paragraph.</p>
<a href="https://amazon.com">Click Here</a>
<script src="script.js"></script>
</body>
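A short sketch contrasting a multi-valued attribute with an ordinary one, using made-up `<p>` tags (the names `multi_soup` and `id_soup` are illustrative). It also shows that mutating the returned list changes the attribute, which is written back space-separated:

```python
from bs4 import BeautifulSoup

# class is multi-valued in HTML, so its value comes back as a list
multi_soup = BeautifulSoup('<p class="body strikeout"></p>', "html.parser")
print(multi_soup.p["class"])  # ['body', 'strikeout']

# id is not multi-valued, so a space-separated value stays one string
id_soup = BeautifulSoup('<p id="my id"></p>', "html.parser")
print(id_soup.p["id"])  # my id

# Mutating the list changes the attribute; output joins the values with spaces
multi_soup.p["class"].append("highlight")
print(multi_soup.p)  # <p class="body strikeout highlight"></p>
```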


#### NavigableString
NavigableString is a class in the Beautiful Soup library that represents the text inside an HTML or XML document. When you read a tag's text through `.string`, you get a NavigableString.

In [32]:
from bs4 import BeautifulSoup

html = "<blockquote>Extremely bold</blockquote>"
soup = BeautifulSoup(html, "html.parser")

tag = soup.blockquote  # Get the <blockquote> tag
text = tag.string      # Get the text inside it
print(text)            # Output: Extremely bold
print(type(text))      # Output: <class 'bs4.element.NavigableString'>


Extremely bold
<class 'bs4.element.NavigableString'>


In [34]:
# Convert a NavigableString into a plain Python (Unicode) string with str()

unicode_string = str(tag.string)  # A plain str, no longer tied to the parse tree
print(unicode_string)              # Output: Extremely bold
print(type(unicode_string))        # Output: <class 'str'>


Extremely bold
<class 'str'>


In [35]:
# A NavigableString can't be edited in place, but it can be swapped out with replace_with()

tag.string.replace_with("No longer bold")  # Replace the text
print(tag)                                  # Output: <blockquote>No longer bold</blockquote>


<blockquote>No longer bold</blockquote>
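`.string` only works when a tag has a single string to return. A minimal sketch of what happens with mixed content, assuming a made-up `<p>` that contains text and a `<b>` tag (`mixed_soup` is an illustrative name):

```python
from bs4 import BeautifulSoup

mixed_soup = BeautifulSoup("<p>Hello <b>world</b>!</p>", "html.parser")

# .string works when a tag has exactly one child string
print(mixed_soup.b.string)  # world

# <p> has three children ("Hello ", <b>, "!"), so .string is None
print(mixed_soup.p.string)  # None

# get_text() joins every string inside the tag into one plain str
print(mixed_soup.p.get_text())  # Hello world!

# .strings yields each NavigableString in document order
print(list(mixed_soup.p.strings))  # the three strings 'Hello ', 'world', '!'
```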


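#### BeautifulSoup Object
The BeautifulSoup object represents the whole document. You can use it to find tags and modify them, and since it supports most of the methods of a Tag, you can even insert one BeautifulSoup object into another document, as the example below does.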


In [36]:
doc = BeautifulSoup("<document><content/>INSERT FOOTER HERE</document>", "xml")
footer = BeautifulSoup("<footer>Here's the footer</footer>", "xml")

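# Replace the placeholder text with the footer document
# (string= is the current name of the older text= argument)
doc.find(string="INSERT FOOTER HERE").replace_with(footer)
print(doc)  # Output: the XML declaration, then <document><content/><footer>Here's the footer</footer></document>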


<?xml version="1.0" encoding="utf-8"?>
<document><content/><footer>Here's the footer</footer></document>


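A small sketch of the BeautifulSoup object acting like a Tag for the whole document, assuming a made-up two-paragraph snippet (`doc_soup` is an illustrative name):

```python
from bs4 import BeautifulSoup

doc_soup = BeautifulSoup("<p>One</p><p>Two</p>", "html.parser")

# The document object has a special name instead of a real tag name
print(doc_soup.name)  # [document]

# It supports the usual Tag search methods across the whole document
print([p.get_text() for p in doc_soup.find_all("p")])  # ['One', 'Two']

# Changes made through it modify the document
doc_soup.p.string = "First"
print(doc_soup)  # <p>First</p><p>Two</p>
```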


#### Comment
An HTML comment is represented by the `Comment` class, a special kind of NavigableString.

In [37]:
markup = "<b><!--Hey, buddy. Want to buy a used parser?--></b>"
soup = BeautifulSoup(markup, "html.parser")
comment = soup.b.string   # Get the comment inside <b>
print(comment)            # Output: Hey, buddy. Want to buy a used parser?


Hey, buddy. Want to buy a used parser?
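Because comments are NavigableString subclasses, a string filter can pick them out. A minimal sketch that finds and removes every comment, assuming a made-up `<div>` (`commented_soup` and `found_comments` are illustrative names):

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

commented_soup = BeautifulSoup(
    "<div><!-- keep out --><p>Visible</p><!-- also hidden --></div>", "html.parser"
)

# The function receives each string in the tree; keep only the Comment objects
found_comments = commented_soup.find_all(string=lambda s: isinstance(s, Comment))
print(len(found_comments))  # 2

# extract() removes each comment from the tree
for found in found_comments:
    found.extract()
print(commented_soup)  # <div><p>Visible</p></div>
```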


## Navigating the Tree
One of the powerful features of BeautifulSoup is the ability to navigate the parse tree and access different elements.

#### Going Down the Tree

In [44]:
# Accessing the children of the <body> tag

with open("index.html", 'r') as test_tree_file:
    access_child = BeautifulSoup(test_tree_file, "lxml")
    body_tag = access_child.body
    print("Children of <body> are: ", list(body_tag.children))

Children of <body> are:  ['\n', <h1 class="header">Welcome to My First Webpage!</h1>, '\n', <p>This is a paragraph.</p>, '\n', <a href="https://amazon.com">Click Here</a>, '\n', <script src="script.js"></script>, '\n']
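The `'\n'` entries in the output above are the newlines between tags, which count as children. A minimal sketch of keeping only tag children and of going further down with `.descendants`, assuming a small made-up `<body>` instead of `index.html` (`tree_soup` and `body_el` are illustrative names):

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

tree_soup = BeautifulSoup("<body>\n<h1>Title</h1>\n<p>Text</p>\n</body>", "html.parser")
body_el = tree_soup.body

# .children includes the newline strings between the tags
print(len(list(body_el.children)))  # 5

# Keep only the Tag children
child_tags = [child for child in body_el.children if isinstance(child, Tag)]
print([t.name for t in child_tags])  # ['h1', 'p']

# find_all(recursive=False) returns the same direct child tags
print([t.name for t in body_el.find_all(recursive=False)])  # ['h1', 'p']

# .descendants goes all the way down, including the text inside each tag
print(len(list(body_el.descendants)))  # 7
```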


#### Going Up the Tree

In [47]:
# Go up the tree to access a tag's parent element

with open("index.html", 'r') as up_tree_file:
    access_child = BeautifulSoup(up_tree_file, "lxml")
    h2_tag = access_child.h2
    print(h2_tag.parent)

<div>
<h2 class="header2">This is h2 Tag</h2>
</div>
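`.parent` goes up one level; `.parents` keeps going until it reaches the BeautifulSoup object. A minimal sketch, assuming a made-up page with an `<h2>` inside a `<div>` like the one above (`parent_soup` and `h2_el` are illustrative names):

```python
from bs4 import BeautifulSoup

parent_soup = BeautifulSoup(
    "<html><body><div><h2>This is h2 Tag</h2></div></body></html>", "html.parser"
)
h2_el = parent_soup.h2

print(h2_el.parent.name)  # div

# .parents walks every ancestor, ending at the document itself
print([parent.name for parent in h2_el.parents])  # ['div', 'body', 'html', '[document]']
```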


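#### Going Sideways
To navigate to siblings (elements that are at the same level):

- next_sibling: Access the next sibling node. This is often a whitespace string such as '\n' rather than a tag.
- previous_sibling: Access the previous sibling node.
- find_next_sibling(name): Search the following siblings and return the first tag with that name, or None.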

In [53]:
# Find the next <p> sibling of the <h1>.
# find_next_sibling() returns None when no later sibling matches.

h1_tag = access_child.h1
p_tag = h1_tag.find_next_sibling('p')
print(f"Next sibling: {p_tag}")

Next sibling: None
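A minimal sketch of the difference between `next_sibling` and `find_next_sibling()`, assuming a made-up `<div>` with newlines between its tags (`sib_soup`, `title_el`, and `para_el` are illustrative names):

```python
from bs4 import BeautifulSoup

sib_soup = BeautifulSoup("<div>\n<h1>Title</h1>\n<p>Text</p>\n</div>", "html.parser")
title_el = sib_soup.h1
para_el = sib_soup.p

# next_sibling is the very next node: a newline string, not the <p> tag
print(repr(title_el.next_sibling))

# find_next_sibling() skips nodes that don't match
print(title_el.find_next_sibling("p"))  # <p>Text</p>

# The same idea in reverse
print(para_el.find_previous_sibling("h1"))  # <h1>Title</h1>

# None means no later sibling matches
print(para_el.find_next_sibling("p"))  # None
```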


#### Going Back and Forth
- next_element: Access whatever was parsed right after this element. For a tag this is usually its own first child, not its next sibling.
- previous_element: Access whatever was parsed right before this element.

In [54]:
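# next_element and previous_element are attributes, not methods. Calling
# h1_tag.find_next_element() raised "TypeError: 'NoneType' object is not callable",
# because an unknown attribute on a tag is treated as a search for a child tag with
# that name (returning None here), and None can't be called.
next_element = h1_tag.next_element
print(f"Next element: {next_element}")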
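A minimal sketch of parse order versus sibling order, assuming a made-up snippet with a `<b>` at the end of a paragraph (`order_soup` and `bold_el` are illustrative names):

```python
from bs4 import BeautifulSoup

order_soup = BeautifulSoup("<p>One <b>bold</b></p><p>Two</p>", "html.parser")
bold_el = order_soup.b

# next_element follows parse order, so it steps into <b> first
print(bold_el.next_element)  # bold

# <b> is the last child of the first <p>, so it has no next sibling
print(bold_el.next_sibling)  # None

# After the string "bold", the next thing parsed is the second <p>
print(bold_el.next_element.next_element)  # <p>Two</p>

# previous_element steps backwards to the string "One " just before <b>
print(bold_el.previous_element)
```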

## Searching the Tree
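Searching for tags and content in BeautifulSoup is powerful and flexible. You can search using various filters: a tag name, attribute values, the text, a regular expression, a list, or a function.

#### find() and find_all() Methods
- find(): Returns the first matching element, or None if nothing matches.
- find_all(): Returns a list (a ResultSet) of every matching element.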

In [59]:
# Finding the first <p> tag
p_tag = access_child.find('p')
print(p_tag)

# Finding all <p> tags
p_tags = access_child.find_all('p')
print(p_tags)


<p>This is a paragraph.</p>
[<p>This is a paragraph.</p>]
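A sketch of the other filters, assuming a small made-up page rather than `index.html` (`search_soup` is an illustrative name). `select_one()` relies on the soupsieve package, which recent Beautiful Soup versions install as a dependency:

```python
import re

from bs4 import BeautifulSoup

search_soup = BeautifulSoup("""
<div id="main">
  <p class="intro">Hello</p>
  <p class="note">First note</p>
  <p class="note">Second note</p>
  <a href="https://example.com">Example</a>
</div>
""", "html.parser")

# Filter by CSS class (class_ avoids clashing with the Python keyword)
notes = search_soup.find_all("p", class_="note")
print([p.get_text() for p in notes])  # ['First note', 'Second note']

# Filter by any attribute, with a keyword argument or attrs= plus a regex
print(search_soup.find(id="main").name)  # div
secure = search_soup.find("a", attrs={"href": re.compile(r"^https://")})
print(secure.get_text())  # Example

# A list matches any of the names; limit stops after N results
print(len(search_soup.find_all(["p", "a"])))     # 4
print(len(search_soup.find_all("p", limit=2)))  # 2

# Match on the tag's text
print(search_soup.find("p", string="Hello"))  # <p class="intro">Hello</p>

# CSS selectors
print(search_soup.select_one("div#main > p.intro").get_text())  # Hello
```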
