In this exercise, you will write Python code using **Beautiful Soup** to extract and organize information about the **2020 Oscars winners and nominees** from the provided HTML. This includes:

1. Extracting **highlights** of the ceremony.
2. Listing nominees for the **Best Film**, **Best Actor**, and **Best Actress** categories.
3. Identifying the winners in each category.

---

### **Tasks**

#### **1. Extract Highlights**
- Find the `<ul>` element with the class `highlights`.
- Extract all `<li>` elements inside it and print their text.
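
The two steps above can be sketched as follows. This is a minimal illustration rather than the required solution: `sample_html`, `sample_soup`, and `sample_highlights` are hypothetical names, and the fragment is a trimmed-down stand-in for the full page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down fragment used only for this illustration.
sample_html = """
<ul class="highlights">
    <li>Joker</li>
    <li>1917</li>
</ul>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None if nothing matches.
highlights_ul = sample_soup.find("ul", class_="highlights")

# get_text(strip=True) drops leading and trailing whitespace from each item.
sample_highlights = [li.get_text(strip=True) for li in highlights_ul.find_all("li")]

for item in sample_highlights:
    print(item)
```

The keyword is spelled `class_` because `class` is a reserved word in Python.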

---

#### **2. Extract All Nominees**
For each category (`best film`, `best actor`, `best actress`):
- Locate the category's `<div>` by its class name (`best-film-category`, `best-actor-category`, or `best-actress-category`).
- Extract all nominees from the `<li>` elements of the `<ul>` inside that `<div>`.
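
One possible way to express these steps is with CSS selectors through `select()`, which Beautiful Soup provides via its `soupsieve` dependency. The sketch below assumes a shortened, hypothetical fragment with two categories; `category_classes` and `sample_nominees` are illustrative names, not part of the exercise.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mirrors the page's structure with fewer entries.
sample_html = """
<div class="best-film-category">
    <h2>Category: best film</h2>
    <ul>
        <li>Joker</li>
        <li class="winner">Parasite <strong>[WINNER]</strong></li>
    </ul>
</div>
<div class="best-actor-category">
    <h2>Category: best actor</h2>
    <ul>
        <li>Adam Driver - Marriage Story</li>
        <li class="winner">Joaquin Phoenix - Joker <strong>[WINNER]</strong></li>
    </ul>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")

# Map a readable category name to the class of its <div>.
category_classes = {
    "best film": "best-film-category",
    "best actor": "best-actor-category",
}

sample_nominees = {}
for name, css_class in category_classes.items():
    # "div.<class> ul li" matches every <li> nested in the category's list.
    items = sample_soup.select(f"div.{css_class} ul li")
    sample_nominees[name] = [li.get_text(" ", strip=True) for li in items]

print(sample_nominees)
```

Note that the winner's entry still carries the `[WINNER]` marker, because the marker is part of the `<li>` text.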

---

#### **3. Identify the Winners**
For each category:
- Locate the `<li>` element with the class `winner`.
- Print the name of the winner along with their category.
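
A hedged sketch of the winner lookup is shown here. Because `find()` returns `None` when nothing matches, the hypothetical helper `find_winner` checks each step before reading the text; the second category in the sample fragment deliberately has no winner to show that case.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the second category has no <li class="winner">.
sample_html = """
<div class="best-film-category">
    <ul>
        <li>Joker</li>
        <li class="winner">Parasite <strong>[WINNER]</strong></li>
    </ul>
</div>
<div class="best-actor-category">
    <ul>
        <li>Adam Driver - Marriage Story</li>
    </ul>
</div>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")


def find_winner(soup, css_class):
    """Return the winner's text for a category, or None if it cannot be found."""
    category_div = soup.find("div", class_=css_class)
    if category_div is None:
        return None
    winner_li = category_div.find("li", class_="winner")
    if winner_li is None:
        return None
    return winner_li.get_text(" ", strip=True)


for css_class in ("best-film-category", "best-actor-category"):
    print(f"{css_class}: {find_winner(sample_soup, css_class)}")
```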


Finally, collect the extracted data in a dictionary and export it to a JSON file.

In [39]:
html = '''<html>
     <head>
         <title>2020 Oscar nominees and winners</title>
     </head>
     <body>
         <h1>Oscars 2020</h1>
         <p>The biggest cinema awards ceremony took place in February. This year's highlights were:</p>
         <ul class="highlights">
             <li>Joker</li>
             <li>1917</li>
             <li>Once Upon a Time in Hollywood</li>
             <li>The Irishman</li>
         </ul>
         <p>See the winners of 3 categories below.</p>
         <div>
             <div class="best-film-category">
                 <h2>Category: best film</h2>
                 <ul>
                     <li>Ford vs Ferrari</li>
                     <li>The Irishman</li>
                     <li>JoJo Rabbit</li>
                     <li>Joker</li>
                     <li>Little Women</li>
                     <li>Marriage Story</li>
                     <li>1917</li>
                     <li>Once Upon a Time in Hollywood</li>
                     <li class="winner">Parasite <strong>[WINNER]</strong></li>
                 </ul>
             </div>
             <br>
             <div class="best-actor-category">
                 <h2>Category: best actor</h2>
                 <ul>
                     <li>Antonio Banderas - Pain and Glory</li>
                     <li>Leonardo DiCaprio - Once Upon a Time In... Hollywood</li>
                     <li>Adam Driver - Marriage Story</li>
                     <li class="winner">Joaquin Phoenix - Joker <strong>[WINNER]</strong></li>
                     <li>Jonathan Price - The Two Popes</li>
                 </ul>
             </div>
             <br>
             <div class="best-actress-category">
                 <h2>Category: best actress</h2>
                 <ul>
                     <li>Cythia Erivo - Harriet</li>
                     <li>Scarlett Johansson - Marriage Story</li>
                     <li>Saoirse Ronan - Little Women</li>
                     <li>Charlize Theron - The Scandal</li>
                     <li class="winner">Renée Zellweger - Judy: Far Over the Rainbow <strong>[WINNER]</strong></li>
                 </ul>
             </div>
         </div>
     </body>
</html>
'''
from bs4 import BeautifulSoup

# Parse the HTML string with Python's built-in parser.
soup = BeautifulSoup(html, 'html.parser')

# Pretty-print the parsed tree to inspect its structure.
print(soup.prettify())


<html>
 <head>
  <title>
   2020 Oscar nominees and winners
  </title>
 </head>
 <body>
  <h1>
   Oscars 2020
  </h1>
  <p>
   The biggest cinema awards ceremony took place in February. This year's highlights were:
  </p>
  <ul class="highlights">
   <li>
    Joker
   </li>
   <li>
    1917
   </li>
   <li>
    Once Upon a Time in Hollywood
   </li>
   <li>
    The Irishman
   </li>
  </ul>
  <p>
   See the winners of 3 categories below.
  </p>
  <div>
   <div class="best-film-category">
    <h2>
     Category: best film
    </h2>
    <ul>
     <li>
      Ford vs Ferrari
     </li>
     <li>
      The Irishman
     </li>
     <li>
      JoJo Rabbit
     </li>
     <li>
      Joker
     </li>
     <li>
      Little Women
     </li>
     <li>
      Marriage Story
     </li>
     <li>
      1917
     </li>
     <li>
      Once Upon a Time in Hollywood
     </li>
     <li class="winner">
      Parasite
      <strong>
       [WINNER]
      </strong>
     </li>
    </ul>
   </div>
   <br/>
 

In [40]:
import json

# Collect the nominees and the winner of each category, keyed by category name.
categories_dict = {}
categories = ["best-film-category", "best-actor-category", "best-actress-category"]
for category in categories:
    category_div = soup.find("div", class_=category)
    # The heading reads "Category: best film"; keep only the name.
    category_name = category_div.find("h2").text.replace("Category: ", "")
    nominees_list = category_div.find("ul").find_all("li")
    nominees = [li.text for li in nominees_list]
    # The winner's text still includes the "[WINNER]" marker from the <strong> tag.
    winner = category_div.find("li", class_="winner").text
    categories_dict[category_name] = {"Nominees": nominees, "Winner": winner}
categories_dict


{'best film': {'Nominees': ['Ford vs Ferrari',
   'The Irishman',
   'JoJo Rabbit',
   'Joker',
   'Little Women',
   'Marriage Story',
   '1917',
   'Once Upon a Time in Hollywood',
   'Parasite [WINNER]'],
  'Winner': 'Parasite [WINNER]'},
 'best actor': {'Nominees': ['Antonio Banderas - Pain and Glory',
   'Leonardo DiCaprio - Once Upon a Time In... Hollywood',
   'Adam Driver - Marriage Story',
   'Joaquin Phoenix - Joker [WINNER]',
   'Jonathan Price - The Two Popes'],
  'Winner': 'Joaquin Phoenix - Joker [WINNER]'},
 'best actress': {'Nominees': ['Cythia Erivo - Harriet',
   'Scarlett Johansson - Marriage Story',
   'Saoirse Ronan - Little Women',
   'Charlize Theron - The Scandal',
   'Renée Zellweger - Judy: Far Over the Rainbow [WINNER]'],
  'Winner': 'Renée Zellweger - Judy: Far Over the Rainbow [WINNER]'}}
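
The output above keeps the `[WINNER]` marker because `.text` concatenates the text of every descendant, including the nested `<strong>` tag. If only the name is wanted, one option is to join just the text nodes that are direct children of the `<li>`. The helper `own_text` below is a hypothetical name introduced for this sketch.

```python
from bs4 import BeautifulSoup

# A single winner entry, copied from the page's markup.
sample_li = BeautifulSoup(
    '<li class="winner">Parasite <strong>[WINNER]</strong></li>', "html.parser"
).li


def own_text(tag):
    """Join only the text nodes that are direct children of `tag`."""
    # recursive=False skips text nested inside child tags such as <strong>.
    return "".join(tag.find_all(string=True, recursive=False)).strip()


print(own_text(sample_li))
```

This leaves the parsed tree untouched, unlike removing the `<strong>` tag with `decompose()` before reading the text.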

In [41]:
highlights_list = soup.find('ul', class_='highlights').find_all('li')
highlights = [li.text for li in highlights_list]

In [43]:
oscars_data = {
    "Highlights": highlights,
    "Categories": categories_dict
}
with open("oscars_data.json", "w") as json_file:
    json.dump(oscars_data, json_file, indent=4)
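
One detail worth knowing about this export: by default the `json` module escapes non-ASCII characters, so a name such as "Renée" is written as `Ren\u00e9e`. The file is still valid JSON and loads back correctly. The sketch below uses `json.dumps` on a small hypothetical dictionary (`sample_data`) to show the difference that `ensure_ascii=False` makes.

```python
import json

# Hypothetical one-entry dictionary used only to show the escaping behaviour.
sample_data = {"Winner": "Renée Zellweger - Judy: Far Over the Rainbow"}

# Default: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(sample_data)

# ensure_ascii=False keeps the characters as they are.
readable = json.dumps(sample_data, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same Python object.
assert json.loads(escaped) == json.loads(readable) == sample_data
```

When writing to a file with `ensure_ascii=False`, it is safer to open the file with `encoding="utf-8"` so the result does not depend on the platform's default encoding.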

In [8]:
# Best Film Nominees
best_film_nominees = soup.find('div', class_='best-film-category').find_all('li')
print(f"Best Film Nominees: {[nominee.get_text() for nominee in best_film_nominees]}")

# Best Actor Nominees
best_actor_nominees = soup.find('div', class_='best-actor-category').find_all('li')
print(f"Best Actor Nominees: {[nominee.get_text() for nominee in best_actor_nominees]}")

# Best Actress Nominees
best_actress_nominees = soup.find('div', class_='best-actress-category').find_all('li')
print(f"Best Actress Nominees: {[nominee.get_text() for nominee in best_actress_nominees]}")

Best Film Nominees: ['Ford vs Ferrari', 'The Irishman', 'JoJo Rabbit', 'Joker', 'Little Women', 'Marriage Story', '1917', 'Once Upon a Time in Hollywood', 'Parasite [WINNER]']
Best Actor Nominees: ['Antonio Banderas - Pain and Glory', 'Leonardo DiCaprio - Once Upon a Time In... Hollywood', 'Adam Driver - Marriage Story', 'Joaquin Phoenix - Joker [WINNER]', 'Jonathan Price - The Two Popes']
Best Actress Nominees: ['Cythia Erivo - Harriet', 'Scarlett Johansson - Marriage Story', 'Saoirse Ronan - Little Women', 'Charlize Theron - The Scandal', 'Renée Zellweger - Judy: Far Over the Rainbow [WINNER]']


In [10]:
best_film_winner = soup.find('div', class_='best-film-category').find('li', class_='winner')
print(f"Best Film Winner: {best_film_winner.get_text()}")
best_actor_winner = soup.find('div', class_='best-actor-category').find('li', class_='winner')
print(f"Best Actor Winner: {best_actor_winner.get_text()}")
best_actress_winner = soup.find('div', class_='best-actress-category').find('li', class_='winner')
print(f"Best Actress Winner: {best_actress_winner.get_text()}")


Best Film Winner: Parasite [WINNER]
Best Actor Winner: Joaquin Phoenix - Joker [WINNER]
Best Actress Winner: Renée Zellweger - Judy: Far Over the Rainbow [WINNER]
