# Become a movie director

Let's use BeautifulSoup to get some information about the 250 top-rated movies on <a href="http://www.imdb.com/" target="_blank">IMDb</a>.

To complete this exercise, feel free to consult the <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" target="_blank">BeautifulSoup documentation</a>.

1. Import the `BeautifulSoup` class (from the `bs4` package) and the `requests` library:

In [6]:
from bs4 import BeautifulSoup
import requests


2. With `requests`, get the source code of the webpage at this URL: <a href="http://www.imdb.com/chart/top" target="_blank">http://www.imdb.com/chart/top</a>

In [7]:
page = requests.get('https://www.imdb.com/chart/top')
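Some sites, IMDb included, may reject requests that carry the default `python-requests` User-Agent, or serve localized content; the French titles in this notebook's results are likely an effect of that localization. A minimal sketch, assuming browser-like headers are acceptable for your use (the header values and the `make_session` / `fetch_html` names are illustrative, not part of the exercise):

```python
import requests

# Illustrative header values (an assumption, not required by the exercise):
# a browser-like User-Agent and a preference for English titles.
DEFAULT_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept-Language": "en-US,en;q=0.9",
}


def make_session(headers=None):
    """Return a requests.Session that sends the given headers with every request."""
    session = requests.Session()
    session.headers.update(headers or DEFAULT_HEADERS)
    return session


def fetch_html(url, session=None, timeout=10):
    """Download `url` and return its HTML, raising an HTTPError on 4xx/5xx responses."""
    if session is None:
        session = make_session()
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly instead of parsing an error page
    return response.text


# Usage (requires network access):
# html = fetch_html("https://www.imdb.com/chart/top")
# soup = BeautifulSoup(html, "html.parser")
```

Reusing one session also keeps the connection open if you later fetch the individual movie pages.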


3. Use BeautifulSoup to extract the following items from the page's HTML for each of the 250 movies: ranking, title, URL, crew, rating and number of voters.

Use the `.select` method to find the tags you need in the page, then store the extracted values in lists.

Finally, create a list named `imdb` in which each item is a dictionary containing the information related to one movie.

**Hint**: You can check out the <a href="https://docs.python.org/3/library/string.html" target="_blank">string documentation</a>, in particular the `.split`, `.join` and `.replace` methods.
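Two examples of these string methods applied to text from the page's HTML: the rating cell's `title` attribute (`9.2 based on 2,569,298 user ratings`) and the whitespace-heavy text of the title cell. This is a sketch; it assumes the English wording of the attribute, where the count is the fourth word, and the helper names are hypothetical:

```python
def parse_vote_count(rating_title):
    """Return the vote count from e.g. '9.2 based on 2,569,298 user ratings'."""
    words = rating_title.split()  # ['9.2', 'based', 'on', '2,569,298', 'user', 'ratings']
    return int(words[3].replace(",", ""))


def collapse_whitespace(text):
    """Collapse runs of spaces and newlines into single spaces."""
    return " ".join(text.split())


print(parse_vote_count("9.2 based on 2,569,298 user ratings"))  # 2569298
print(collapse_whitespace("\n      1.\n      Les Évadés\n(1994)\n"))  # 1. Les Évadés (1994)
```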

In [8]:
soup = BeautifulSoup(page.text, 'html.parser')

In [9]:
content = soup.select("tbody.lister-list")[0]
content

<tbody class="lister-list">
<tr>
<td class="posterColumn">
<span data-value="1" name="rk"></span>
<span data-value="9.240095502042621" name="ir"></span>
<span data-value="7.791552E11" name="us"></span>
<span data-value="2569298" name="nv"></span>
<span data-value="-1.759904497957379" name="ur"></span>
<a href="/title/tt0111161/"> <img alt="Les Évadés" height="67" src="https://m.media-amazon.com/images/M/MV5BMDFkYTc0MGEtZmNhMC00ZDIzLWFmNTEtODM1ZmRlYWMwMWFmXkEyXkFqcGdeQXVyMTMxODk2OTU@._V1_UY67_CR0,0,45,67_AL_.jpg" width="45"/>
</a> </td>
<td class="titleColumn">
      1.
      <a href="/title/tt0111161/" title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">Les Évadés</a>
<span class="secondaryInfo">(1994)</span>
</td>
<td class="ratingColumn imdbRating">
<strong title="9.2 based on 2,569,298 user ratings">9.2</strong>
</td>
<td class="ratingColumn">
<div class="seen-widget seen-widget-tt0111161 pending" data-titleid="tt0111161">
<div class="boundary">
<div class="popover">
...
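A self-contained sketch of the selectors used in this notebook, run against a trimmed copy of the first row shown above (the fragment is abbreviated by hand). `select_one` returns the first match or `None`, which avoids the `IndexError` that `select(...)[0]` raises when nothing matches:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first row of the chart (abbreviated by hand).
HTML = """
<table><tbody class="lister-list"><tr>
  <td class="posterColumn">
    <span name="rk" data-value="1"></span>
    <span name="nv" data-value="2569298"></span>
  </td>
  <td class="titleColumn">
    1. <a href="/title/tt0111161/"
          title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">Les Évadés</a>
  </td>
  <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
</tr></tbody></table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select_one returns the first match, or None when nothing matches.
content = soup.select_one("tbody.lister-list")
missing = soup.select_one("tbody.does-not-exist")  # None, no IndexError

# Attribute selector: every <span> whose name attribute is "rk".
ranks = [span.get("data-value") for span in content.select('span[name="rk"]')]

# Descendant selector: the <a> inside the title cell.
link = content.select_one("td.titleColumn a")

print(ranks)            # ['1']
print(link.get_text())  # Les Évadés
print(link["href"])     # /title/tt0111161/
print(missing)          # None
```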

In [5]:
ranks = []

for i in content.select('span[name="rk"]'):
    rank = i.get("data-value")
    ranks.append(rank)

In [6]:
# Slow method (not recommended): re-runs the CSS selector on the whole table at every iteration, about 26 seconds per list

ranks = []

for i in range(250):
    rank = content.select('span[name="rk"]')[i].get("data-value")
    ranks.append(rank)
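Why the loop above is slow: it calls `content.select(...)` over the whole table on every iteration, so the table is searched 250 times instead of once (roughly quadratic work). A rough sketch comparing both patterns on synthetic rows; the row markup and the function names are made up for illustration, and timings will vary by machine:

```python
import time

from bs4 import BeautifulSoup

# Synthetic table: 250 rows that mimic the rank spans (made-up markup).
rows = "".join(
    f'<tr><td><span name="rk" data-value="{n}"></span></td></tr>' for n in range(1, 251)
)
content = BeautifulSoup(f"<table><tbody>{rows}</tbody></table>", "html.parser")


def ranks_slow(content):
    # Searches the whole table again on each of the 250 iterations.
    return [content.select('span[name="rk"]')[i].get("data-value") for i in range(250)]


def ranks_fast(content):
    # Searches the table once, then iterates over the resulting list.
    return [span.get("data-value") for span in content.select('span[name="rk"]')]


for fn in (ranks_slow, ranks_fast):
    start = time.perf_counter()
    result = fn(content)
    print(f"{fn.__name__}: {len(result)} ranks in {time.perf_counter() - start:.3f} s")
```

Both functions return the same list; only the amount of repeated searching differs.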

In [7]:
# Quick method: runs each selector once, about 0.1 s per list

ranks = [i.get('data-value') for i in content.select('span[name="rk"]')]
votes = [i.get('data-value') for i in content.select('span[name="nv"]')]
ratings = [i.get_text() for i in content.select("td.imdbRating strong")]
titles = [i.find('a').get_text() for i in content.select("td.titleColumn")]
crews = [i.select('a')[0].get('title') for i in content.select("td.titleColumn")]
urls = [f"https://www.imdb.com{i.select('a')[0].get('href')}" for i in content.select("td.titleColumn")]
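Parallel lists like these are only correct if every selector returns exactly one item per movie, in the same order; a broad selector such as `find_all("strong")` would also pick up any other `<strong>` tag inside the table. A sketch of an alternative that walks the table row by row and reads every field from the same `<tr>`, so the fields cannot drift out of alignment. The `parse_row` helper is hypothetical, and the selectors assume the chart markup shown above:

```python
from bs4 import BeautifulSoup

BASE_URL = "https://www.imdb.com"


def parse_row(tr):
    """Read one movie's fields from a single <tr> of the chart (hypothetical helper)."""
    link = tr.select_one("td.titleColumn a")
    return {
        "rank": tr.select_one('span[name="rk"]')["data-value"],
        "votes": tr.select_one('span[name="nv"]')["data-value"],
        "rating": tr.select_one("td.imdbRating strong").get_text(strip=True),
        "title": link.get_text(strip=True),
        "crew": link.get("title"),
        "url": BASE_URL + link["href"],
    }


# One trimmed row, copied from the HTML shown earlier.
ROW = """
<table><tbody class="lister-list"><tr>
  <td class="posterColumn"><span name="rk" data-value="1"></span>
    <span name="nv" data-value="2569298"></span></td>
  <td class="titleColumn">1. <a href="/title/tt0111161/"
     title="Frank Darabont (dir.), Tim Robbins, Morgan Freeman">Les Évadés</a></td>
  <td class="ratingColumn imdbRating"><strong>9.2</strong></td>
</tr></tbody></table>
"""

content = BeautifulSoup(ROW, "html.parser").select_one("tbody.lister-list")
# Only direct <tr> children of the table body, one per movie.
imdb = [parse_row(tr) for tr in content.find_all("tr", recursive=False)]
print(imdb[0]["title"], imdb[0]["url"])  # Les Évadés https://www.imdb.com/title/tt0111161/
```

If a row is missing a field, `select_one` returns `None` and the error surfaces on that row instead of silently shifting every later movie.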

In [8]:
imdb = []

for rank, vote, rating, title, crew, url in zip(ranks, votes, ratings, titles, crews, urls):
    imdb.append({
        "rank": rank,
        "votes": vote,
        "rating": rating,
        "title": title,
        "crew": crew,
        "url": url,
    })

imdb

[{'rank': '1',
  'votes': '2567151',
  'rating': '9.2',
  'title': 'Les Évadés',
  'crew': 'Frank Darabont (dir.), Tim Robbins, Morgan Freeman',
  'url': 'https://www.imdb.com/title/tt0111161/'},
 {'rank': '2',
  'votes': '1767166',
  'rating': '9.2',
  'title': 'Le Parrain',
  'crew': 'Francis Ford Coppola (dir.), Marlon Brando, Al Pacino',
  'url': 'https://www.imdb.com/title/tt0068646/'},
 {'rank': '3',
  'votes': '2531168',
  'rating': '9.0',
  'title': 'The Dark Knight : Le Chevalier noir',
  'crew': 'Christopher Nolan (dir.), Christian Bale, Heath Ledger',
  'url': 'https://www.imdb.com/title/tt0468569/'},
 {'rank': '4',
  'votes': '1222799',
  'rating': '9.0',
  'title': 'Le Parrain, 2ᵉ partie',
  'crew': 'Francis Ford Coppola (dir.), Al Pacino, Robert De Niro',
  'url': 'https://www.imdb.com/title/tt0071562/'},
 {'rank': '5',
  'votes': '758177',
  'rating': '9.0',
  'title': '12 Hommes en colère',
  'crew': 'Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb',
  'url': 'https://www
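All values in `imdb` are still strings (`'9.2'`, `'2567151'`). If you plan to sort or compute with them, here is a sketch that converts ranks and votes to `int` and ratings to `float` while zipping the lists together. The `build_records` name is hypothetical; it raises an error if the lists have different lengths instead of silently truncating as plain `zip` would:

```python
def build_records(ranks, votes, ratings, titles, crews, urls):
    """Combine parallel lists into one dict per movie, with numeric fields typed."""
    columns = (ranks, votes, ratings, titles, crews, urls)
    if len({len(col) for col in columns}) != 1:
        raise ValueError("the scraped lists have different lengths")

    return [
        {
            "rank": int(rank),
            "votes": int(vote),
            "rating": float(rating),
            "title": title,
            "crew": crew,
            "url": url,
        }
        for rank, vote, rating, title, crew, url in zip(*columns)
    ]


records = build_records(
    ["1", "2"], ["2567151", "1767166"], ["9.2", "9.2"],
    ["Les Évadés", "Le Parrain"],
    ["Frank Darabont (dir.), Tim Robbins, Morgan Freeman",
     "Francis Ford Coppola (dir.), Marlon Brando, Al Pacino"],
    ["https://www.imdb.com/title/tt0111161/", "https://www.imdb.com/title/tt0068646/"],
)
most_voted = max(records, key=lambda movie: movie["votes"])
print(most_voted["title"])  # Les Évadés
```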

4. To check your code, loop over the `imdb` list and print some information for each movie:

In [9]:
for i in imdb:
    print(f"{i['rank']} - {i['title']} - Starring: {i['crew']} - imdb rating: {i['rating']} - {i['votes']} votes")

1 - Les Évadés - Starring: Frank Darabont (dir.), Tim Robbins, Morgan Freeman - imdb rating: 9.2 - 2567151 votes
2 - Le Parrain - Starring: Francis Ford Coppola (dir.), Marlon Brando, Al Pacino - imdb rating: 9.2 - 1767166 votes
3 - The Dark Knight : Le Chevalier noir - Starring: Christopher Nolan (dir.), Christian Bale, Heath Ledger - imdb rating: 9.0 - 2531168 votes
4 - Le Parrain, 2ᵉ partie - Starring: Francis Ford Coppola (dir.), Al Pacino, Robert De Niro - imdb rating: 9.0 - 1222799 votes
5 - 12 Hommes en colère - Starring: Sidney Lumet (dir.), Henry Fonda, Lee J. Cobb - imdb rating: 9.0 - 758177 votes
6 - La liste de Schindler - Starring: Steven Spielberg (dir.), Liam Neeson, Ralph Fiennes - imdb rating: 8.9 - 1308155 votes
7 - Le Seigneur des anneaux : Le Retour du roi - Starring: Peter Jackson (dir.), Elijah Wood, Viggo Mortensen - imdb rating: 8.9 - 1766301 votes
8 - Pulp Fiction - Starring: Quentin Tarantino (dir.), John Travolta, Uma Thurman - imdb rating: 8.9 - 1970762 vote
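The `crew` string starts with the director, so the "Starring" label is slightly misleading, and raw vote counts are hard to read. A sketch of a hypothetical `describe` helper that separates the director from the cast and adds thousands separators with the `,` format specifier; it assumes the `(dir.)` marker seen in the data above and string-valued fields as in `imdb`:

```python
def describe(movie):
    """Format one entry of `imdb` as a single readable line (hypothetical helper)."""
    director, *cast = movie["crew"].split(", ")
    votes = int(movie["votes"])
    return (
        f"{movie['rank']}. {movie['title']} - "
        f"directed by {director.replace(' (dir.)', '')}, "
        f"starring {', '.join(cast)} - "
        f"IMDb rating {movie['rating']} ({votes:,} votes)"
    )


movie = {
    "rank": "1",
    "votes": "2567151",
    "rating": "9.2",
    "title": "Les Évadés",
    "crew": "Frank Darabont (dir.), Tim Robbins, Morgan Freeman",
    "url": "https://www.imdb.com/title/tt0111161/",
}
print(describe(movie))
# 1. Les Évadés - directed by Frank Darabont, starring Tim Robbins, Morgan Freeman - IMDb rating 9.2 (2,567,151 votes)
```

To print the whole chart, loop over `imdb` and call `describe` on each entry.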