# Web Scraping Lab

This notebook contains web scraping exercises to help you practise your scraping skills.

**Tips:**

- Check the response status code for each request to ensure you have obtained the intended content.
- Print the response text in each request to understand the kind of info you are getting and its format.
- Check for patterns in the response text to extract the data/info requested in each question.
- Visit the URLs below and inspect their source code with Chrome DevTools. You will need to identify the HTML tags, class names, and other attributes used in the content you are expected to extract.
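The first tip can be sketched with the standard library alone. The helper names `describe_status` and `is_success` are hypothetical and only illustrate how a numeric status code might be turned into something readable before you parse a response.

```python
from http import HTTPStatus


def describe_status(status_code: int) -> str:
    """Return a readable label such as '404 Not Found' for a status code."""
    try:
        phrase = HTTPStatus(status_code).phrase
    except ValueError:
        # Codes that are not registered in http.HTTPStatus
        phrase = "Unknown status"
    return f"{status_code} {phrase}"


def is_success(status_code: int) -> bool:
    """True for 2xx codes, which normally mean the content was delivered."""
    return 200 <= status_code < 300


for code in (200, 404, 503):
    verdict = "parse the body" if is_success(code) else "inspect before parsing"
    print(describe_status(code), "->", verdict)
```

With `requests`, the value to pass in would be `response.status_code`.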

**Resources**:
- [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup Doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re lib](https://docs.python.org/3/library/re.html)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are already imported for you. Feel free to use additional libraries if you prefer.

In [8]:
import requests
import bs4
import pandas as pd
import re


In [9]:
from bs4 import BeautifulSoup

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [10]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'

In [11]:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
headers

{'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}

In [12]:
response = requests.get(url,
                        headers=headers)
response

<Response [200]>

In [13]:
html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 


#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. Each name should not contain any html tag.

**Instructions:**

1. Find out the html tag and class names used for the developer names. You can achieve this using Chrome DevTools.

1. Use BeautifulSoup to extract all the html elements that contain the developer names.

1. Use string manipulation techniques to remove extra whitespace and line breaks (i.e. `\n`) from the *text* of each HTML element. Use a list to store the clean names.

1. Print the list of names.

Your output should look like below:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
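The whitespace clean-up step can be sketched as follows. The raw strings are made-up examples of what element text often looks like, and `clean_text` is a hypothetical helper name.

```python
def clean_text(raw: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) to one space."""
    # str.split() with no argument splits on any whitespace and drops empties
    return " ".join(raw.split())


# Made-up raw strings, similar to text pulled out of indented HTML
raw_names = ["\n\n            Jarred Sumner\n ", "  hwchase17\n"]
clean_names = [clean_text(name) for name in raw_names]
print(clean_names)
```

Using `"".join(raw.split())` instead would remove the inner spaces as well, which matches the sample output above where names appear without spaces.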

In [18]:
parsed_tags = list(set([tag.name for tag in parsed_html.find_all(True)]))
print(type(parsed_tags), '\n')
print(parsed_tags)

<class 'list'> 

['h3', 'details-menu', 'head', 'span', 'circle', 'script', 'article', 'details-dialog', 'div', 'header', 'p', 'html', 'h2', 'clipboard-copy', 'img', 'link', 'g-emoji', 'template', 'footer', 'svg', 'input', 'label', 'h1', 'meta', 'include-fragment', 'filter-input', 'path', 'a', 'li', 'main', 'nav', 'ul', 'summary', 'details', 'title', 'button', 'form', 'body']


In [19]:
trending_developers = parsed_html.find_all("a", {"data-hydro-click" : re.compile('"TRENDING_DEVELOPER"')})


In [20]:
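# Anchors whose data-hydro-click payload marks a trending-developer link
trending_developers = parsed_html.find_all("a", {"data-hydro-click" : re.compile('"TRENDING_DEVELOPER"')})
# Keep anchors that hold a single text node and strip surrounding whitespace
names = [tag.string.strip() for tag in trending_developers if tag.string is not None]
# Manually inserted placeholders: the positions are hard-coded for this
# snapshot of the page and will not hold once the trending list changes
names.insert(20, "No Name")
names.insert(22, "No Name")
names.insert(26, "No Name")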
names

['pilcrowOnPaper',
 'Yifei Zhang',
 'Yidadaa',
 'Jarred Sumner',
 'Jarred-Sumner',
 'Daniel Vaz Gaspar',
 'dpgaspar',
 'dkhamsing',
 'oobabooga',
 'bmaltais',
 'J. Nick Koston',
 'bdraco',
 'DaniPopes',
 'lijianan',
 'li-jia-nan',
 'Adrienne Walker',
 'quisquous',
 'Dessalines',
 'dessalines',
 'Etienne BAUDOUX',
 'No Name',
 'veler',
 'No Name',
 'fengmk2',
 'fengmk2',
 'Bob Nystrom',
 'No Name',
 'munificent',
 'Ismail Pelaseyed',
 'homanp',
 'Henrik Rydgård',
 'hrydgard',
 'Thibault Duplessis',
 'ornicar',
 'Harrison Chase',
 'hwchase17',
 'Alex Gaynor',
 'alex',
 'Jack Lloyd',
 'randombit',
 'Alexey Milovidov',
 'alexey-milovidov',
 'Lianmin Zheng',
 'merrymercy',
 'rootmelo92118',
 'Connor Tumbleson',
 'iBotPeaches']

In [21]:

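# Heuristic split: strings containing a space are treated as display names,
# everything else as a user handle (wrapped in parentheses)
names2 = [x for x in names if ' ' in x]
users = [f'({x})' for x in names if ' ' not in x]
print(names2)
print(users)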

['Yifei Zhang', 'Jarred Sumner', 'Daniel Vaz Gaspar', 'J. Nick Koston', 'Adrienne Walker', 'Etienne BAUDOUX', 'No Name', 'No Name', 'Bob Nystrom', 'No Name', 'Ismail Pelaseyed', 'Henrik Rydgård', 'Thibault Duplessis', 'Harrison Chase', 'Alex Gaynor', 'Jack Lloyd', 'Alexey Milovidov', 'Lianmin Zheng', 'Connor Tumbleson']
['(pilcrowOnPaper)', '(Yidadaa)', '(Jarred-Sumner)', '(dpgaspar)', '(dkhamsing)', '(oobabooga)', '(bmaltais)', '(bdraco)', '(DaniPopes)', '(lijianan)', '(li-jia-nan)', '(quisquous)', '(Dessalines)', '(dessalines)', '(veler)', '(fengmk2)', '(fengmk2)', '(munificent)', '(homanp)', '(hrydgard)', '(ornicar)', '(hwchase17)', '(alex)', '(randombit)', '(alexey-milovidov)', '(merrymercy)', '(rootmelo92118)', '(iBotPeaches)']


In [22]:
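df1 = pd.DataFrame(names2, columns=['names'])
df2 = pd.DataFrame(users, columns=['names'])
# The addition aligns rows by index, not by developer: the two lists have
# different lengths, so pairs can be mismatched and the unmatched rows become NaN
df3 = df1 + ' ' + df2
df3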

| | names |
|---|---|
| 0 | Yifei Zhang (pilcrowOnPaper) |
| 1 | Jarred Sumner (Yidadaa) |
| 2 | Daniel Vaz Gaspar (Jarred-Sumner) |
| 3 | J. Nick Koston (dpgaspar) |
| 4 | Adrienne Walker (dkhamsing) |
| 5 | Etienne BAUDOUX (oobabooga) |
| 6 | No Name (bmaltais) |
| 7 | No Name (bdraco) |
| 8 | Bob Nystrom (DaniPopes) |
| 9 | No Name (lijianan) |


In [23]:
names3 = df3.values.tolist()
names3


[['Yifei Zhang (pilcrowOnPaper)'],
 ['Jarred Sumner (Yidadaa)'],
 ['Daniel Vaz Gaspar (Jarred-Sumner)'],
 ['J. Nick Koston (dpgaspar)'],
 ['Adrienne Walker (dkhamsing)'],
 ['Etienne BAUDOUX (oobabooga)'],
 ['No Name (bmaltais)'],
 ['No Name (bdraco)'],
 ['Bob Nystrom (DaniPopes)'],
 ['No Name (lijianan)'],
 ['Ismail Pelaseyed (li-jia-nan)'],
 ['Henrik Rydgård (quisquous)'],
 ['Thibault Duplessis (Dessalines)'],
 ['Harrison Chase (dessalines)'],
 ['Alex Gaynor (veler)'],
 ['Jack Lloyd (fengmk2)'],
 ['Alexey Milovidov (fengmk2)'],
 ['Lianmin Zheng (munificent)'],
 ['Connor Tumbleson (homanp)'],
 [nan],
 [nan],
 [nan],
 [nan],
 [nan],
 [nan],
 [nan],
 [nan],
 [nan]]

In [24]:
def flatten_list(_2d_list):
    flat_list = []
    # Iterate through the outer list
    for element in _2d_list:
        if type(element) is list:
            # If the element is of type list, iterate through the sublist
            for item in element:
                flat_list.append(item)
        else:
            flat_list.append(element)
    return flat_list
final_names = (flatten_list(names3))
final_names

['Yifei Zhang (pilcrowOnPaper)',
 'Jarred Sumner (Yidadaa)',
 'Daniel Vaz Gaspar (Jarred-Sumner)',
 'J. Nick Koston (dpgaspar)',
 'Adrienne Walker (dkhamsing)',
 'Etienne BAUDOUX (oobabooga)',
 'No Name (bmaltais)',
 'No Name (bdraco)',
 'Bob Nystrom (DaniPopes)',
 'No Name (lijianan)',
 'Ismail Pelaseyed (li-jia-nan)',
 'Henrik Rydgård (quisquous)',
 'Thibault Duplessis (Dessalines)',
 'Harrison Chase (dessalines)',
 'Alex Gaynor (veler)',
 'Jack Lloyd (fengmk2)',
 'Alexey Milovidov (fengmk2)',
 'Lianmin Zheng (munificent)',
 'Connor Tumbleson (homanp)',
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan,
 nan]
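Splitting names and handles into two separate lists and re-joining them by position can pair a name with the wrong handle whenever a developer has no display name. A minimal sketch of an alternative is to read both values from the same container element. The markup below is a simplified, hypothetical stand-in for the real page (one `<article>` per developer, with a handle-only entry first), so the tag names are assumptions to verify in DevTools.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: one <article> per developer
sample_html = """
<article><h1><a href="/pilcrowOnPaper">pilcrowOnPaper</a></h1></article>
<article><h1><a href="/Yidadaa">Yifei Zhang</a></h1>
         <p><a href="/Yidadaa">Yidadaa</a></p></article>
<article><h1><a href="/Jarred-Sumner">Jarred Sumner</a></h1>
         <p><a href="/Jarred-Sumner">Jarred-Sumner</a></p></article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

developers = []
for article in soup.find_all("article"):
    # All link texts inside this one developer's block, whitespace stripped
    texts = [a.get_text(strip=True) for a in article.find_all("a")]
    if len(texts) >= 2:
        # Display name first, handle second
        developers.append(f"{texts[0]} ({texts[1]})")
    else:
        # Only a handle is shown for this developer
        developers.append(texts[0])

print(developers)
```

Because each pair is built inside one loop iteration, a missing display name cannot shift the entries that follow it.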

#### Display the trending Python repositories in GitHub.

The steps to solve this problem are similar to the previous one, except that you need to find the repository names instead of the developer names.

In [25]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/python?since=daily'

In [27]:
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url, headers=headers)
html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser")

In [28]:
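# Collect the first link inside each matching <h1>. This selector returned no
# matches in this run, so the page markup may differ from what is assumed here
trending_repositories = [tag.a for tag in parsed_html.find_all('h1',{'class':['h3', 'lh-condensed']})]
names = [tag.string for tag in trending_repositories]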
names

[]
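An empty result usually means the selector does not match the current markup. The sketch below runs on hypothetical sample HTML and assumes the repository link sits inside an `<h2 class="h3 lh-condensed">`. That heading level is an assumption to check in DevTools, not something this notebook confirms. It also shows two details worth knowing: a CSS selector such as `h2.h3.lh-condensed` requires both classes, whereas a list of classes passed to `find_all` matches tags carrying any of them, and `tag.string` is `None` when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page must be inspected to confirm the tags
sample_html = """
<article><h2 class="h3 lh-condensed"><a href="/octo/alpha">
  <span>octo /</span>
  alpha</a></h2></article>
<article><h2 class="h3 lh-condensed"><a href="/octo/beta">octo / beta</a></h2></article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# CSS selector: an <a> inside an <h2> that has BOTH classes
repo_links = soup.select("h2.h3.lh-condensed a")

# The first anchor has several children, so .string gives None for it
first_string = repo_links[0].string

# The href already holds "owner/repo", so no whitespace clean-up is needed
repo_names = [link["href"].strip("/") for link in repo_links]
print(repo_names)
```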

#### Display all the image links from Walt Disney wikipedia page.

In [29]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/Walt_Disney'


headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 

In [30]:
images = parsed_html.find_all('img', {'src':True})
for image in images:
    if "upload.wikimedia" in image['src']:
        print('https:'+image['src']+'\n')

https://upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png

https://upload.wikimedia.org/wikipedia/en/thumb/8/8c/Extended-protection-shackle.svg/20px-Extended-protection-shackle.svg.png

https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG

https://upload.wikimedia.org/wikipedia/commons/thumb/8/87/Walt_Disney_1942_signature.svg/150px-Walt_Disney_1942_signature.svg.png

https://upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Walt_Disney_Birthplace_Exterior_Hermosa_Chicago_Illinois.jpg/220px-Walt_Disney_Birthplace_Exterior_Hermosa_Chicago_Illinois.jpg

https://upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Walt_Disney_envelope_ca._1921.jpg/220px-Walt_Disney_envelope_ca._1921.jpg

https://upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Trolley_Troubles_poster.jpg/170px-Trolley_Troubles_poster.jpg

https://upload.wikimedia.org/wikipedia/en/thumb/4/4e/Steamboat-willie.jpg/170px-Steamboat-willi
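Prefixing `'https:'` works for protocol-relative `src` values (those starting with `//`), but would break on other forms. A minimal sketch using `urllib.parse.urljoin`, which resolves any `src` against the page URL. The second `src` value is a hypothetical site-relative path added only for illustration.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"

srcs = [
    # Protocol-relative, as found in the page
    "//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG",
    # Hypothetical site-relative path, for illustration only
    "/static/images/example.png",
]

# urljoin fills in whatever the src is missing (scheme, host) from page_url
absolute_srcs = [urljoin(page_url, src) for src in srcs]
for src in absolute_srcs:
    print(src)
```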

#### Retrieve an arbitrary Wikipedia page for "Python" and create a list of the links on that page.

In [31]:
# This is the url you will scrape in this exercise
url ='https://en.wikipedia.org/wiki/Python' 

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 

In [32]:
# Build the list of link targets, keeping only anchors that have an href
links = [link["href"] for link in parsed_html.find_all("a", href=True)]
for href in links:
    print(href)

#bodyContent
/wiki/Main_Page
/wiki/Wikipedia:Contents
/wiki/Portal:Current_events
/wiki/Special:Random
/wiki/Wikipedia:About
//en.wikipedia.org/wiki/Wikipedia:Contact_us
https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en
/wiki/Help:Contents
/wiki/Help:Introduction
/wiki/Wikipedia:Community_portal
/wiki/Special:RecentChanges
/wiki/Wikipedia:File_upload_wizard
/wiki/Main_Page
/wiki/Special:Search
/w/index.php?title=Special:CreateAccount&returnto=Python
/w/index.php?title=Special:UserLogin&returnto=Python
/w/index.php?title=Special:CreateAccount&returnto=Python
/w/index.php?title=Special:UserLogin&returnto=Python
/wiki/Help:Introduction
/wiki/Special:MyContributions
/wiki/Special:MyTalk
#
#Snakes
#Computing
#People
#Roller_coasters
#Vehicles
#Weaponry
#Other_uses
#See_also
https://af.wikipedia.org/wiki/Python
https://als.wikipedia.org/wiki/Python
https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D
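The raw list contains in-page fragments (`#...`) and repeated entries. A minimal sketch of tidying it, using a handful of values copied from the output above:

```python
# A few href values taken from the printed output
hrefs = [
    "#bodyContent",
    "/wiki/Main_Page",
    "/wiki/Wikipedia:Contents",
    "/wiki/Main_Page",
    "//en.wikipedia.org/wiki/Wikipedia:Contact_us",
    "#Snakes",
    "https://af.wikipedia.org/wiki/Python",
]

# Drop fragment-only links, then de-duplicate while keeping first-seen order
# (dict keys are unique and preserve insertion order)
page_links = [href for href in hrefs if not href.startswith("#")]
unique_links = list(dict.fromkeys(page_links))
print(unique_links)
```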

#### Find the number of titles that have changed in the United States Code since its last release point.

In [33]:
url = 'http://uscode.house.gov/download/download.shtml'



headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 

In [34]:
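# Titles changed since the last release point carry the "usctitlechanged" class
changed_titles = [str(tag) for tag in parsed_html.find_all("div",{"class": "usctitlechanged"})]
# The number of changed titles is the number of matching elements
num_changed_titles = len(changed_titles)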
changed_titles_list = [j.split(" <") for i in [i.split("          ") for i in changed_titles] for j in i]


#### Create a Python list with the names of the FBI's Ten Most Wanted.

In [35]:
# This is the url you will scrape in this exercise
url = 'https://www.fbi.gov/wanted/topten'


headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 
#print(parsed_html.prettify())

In [36]:
# your code here

top_wanted = [tag.attrs for tag in parsed_html.find_all("img",{"alt": True})]

[i["alt"] for i in top_wanted if "topten" in i["src"]]       

['WILVER VILLEGAS-PALOMINO',
 'ALEJANDRO ROSALES CASTILLO',
 'RUJA IGNATOVA',
 'DONALD EUGENE FIELDS II',
 'ARNOLDO JIMENEZ',
 'OMAR ALEXANDER CARDENAS',
 'ALEXIS FLORES',
 'YULAN ADONAY ARCHAGA CARIAS',
 'BHADRESHKUMAR CHETANBHAI PATEL',
 'JOSE RODOLFO VILLARREAL-HERNANDEZ']

#### Display information on the 20 latest earthquakes (date, time, latitude, longitude and region name) reported by the EMSC as a pandas dataframe.

In [37]:
# This is the url you will scrape in this exercise
url = 'https://www.emsc-csem.org/Earthquake/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "html.parser") 
#print(parsed_html.prettify())

In [38]:
# your code here

# Date and Time

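# Event timestamps are link texts; keep those that contain the year.
# The year is hard-coded to 2023, so update it when running at a later date
date_time = [str(tag.string) for tag in parsed_html.find_all("a",{"href": True})]
date_time = [i.replace("\xa0"," ") for i in date_time if "2023" in i]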
date_time =[i.split("   ") for i in date_time]

date = [i[0] for i in date_time]
time = [i[1] for i in date_time]

# Latitude and Longitude

# Coordinate values and their hemisphere letters live in separate table cells
latitude_longitude1 = [str(tag.string) for tag in parsed_html.find_all("td",{"class": "tabev1"})]
latitude_longitude2 = [str(tag.string) for tag in parsed_html.find_all("td",{"class": "tabev2"})]
latitude_longitude2 = [i for i in latitude_longitude2 if "S" in i or "N" in i or "E" in i or "W" in i]
latitude_longitude = [j.replace("\xa0","") + " " + k.replace("\xa0","") for j,k in zip(latitude_longitude1,latitude_longitude2)]

latitude = [i for i in latitude_longitude if "S" in i or "N" in i]
longitude = [i for i in latitude_longitude if "E" in i or "W" in i]

# Region

region = [str(tag.string).replace("\xa0","") for tag in parsed_html.find_all("td",{"class": "tb_region"})]

# DataFrame

earthquakes_dict = {"Date":date,"Time":time,"Latitude":latitude,"Longitude":longitude,"Region":region}
# Show the 20 latest events requested by the exercise
pd.DataFrame(earthquakes_dict).head(20)

| | Date | Time | Latitude | Longitude | Region |
|---|---|---|---|---|---|
| 0 | 2023-06-26 | 18:04:08.0 | 24.47 N | 93.09 E | MANIPUR, INDIA REGION |
| 1 | 2023-06-26 | 17:54:34.3 | 15.28 S | 174.46 W | TONGA |
| 2 | 2023-06-26 | 17:52:07.6 | 38.06 N | 29.03 E | WESTERN TURKEY |
| 3 | 2023-06-26 | 17:41:56.9 | 43.70 N | 20.64 E | SERBIA |
| 4 | 2023-06-26 | 17:35:22.1 | 44.19 S | 168.64 E | SOUTH ISLAND OF NEW ZEALAND |
| 5 | 2023-06-26 | 17:34:02.9 | 38.03 N | 37.54 E | CENTRAL TURKEY |
| 6 | 2023-06-26 | 17:30:31.1 | 36.55 N | 28.44 E | DODECANESE IS.-TURKEY BORDER REG |
| 7 | 2023-06-26 | 17:26:52.3 | 40.37 N | 20.71 E | ALBANIA |
| 8 | 2023-06-26 | 17:26:01.9 | 43.59 S | 172.40 E | SOUTH ISLAND OF NEW ZEALAND |
| 9 | 2023-06-26 | 17:16:04.5 | 40.68 N | 29.98 E | WESTERN TURKEY |
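Coordinates stored as text such as `24.47 N` cannot be sorted or plotted numerically. A minimal sketch of converting them to signed decimal degrees, using the common convention that south and west are negative. The helper name `to_signed_degrees` is hypothetical.

```python
def to_signed_degrees(value: str) -> float:
    """Convert e.g. '15.28 S' to -15.28 and '93.09 E' to 93.09."""
    number, hemisphere = value.split()
    # South latitudes and west longitudes are negative by convention
    sign = -1.0 if hemisphere.upper() in ("S", "W") else 1.0
    return sign * float(number)


# Values taken from the first two rows of the table above
latitudes = [to_signed_degrees(v) for v in ("24.47 N", "15.28 S")]
longitudes = [to_signed_degrees(v) for v in ("93.09 E", "174.46 W")]
print(latitudes, longitudes)
```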


#### Count the number of tweets by a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the number of tweets for any provided account.

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [None]:
# your code here
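A minimal sketch of the input-handling part of this exercise, with a `try/except` block around it. It only checks the format of the handle, assuming the usual rule of 1 to 15 letters, digits or underscores. Whether the account exists still has to be determined from the response to the request. `normalize_handle` and `profile_url` are hypothetical helper names.

```python
import re

# Assumed handle format: 1-15 letters, digits or underscores
HANDLE_PATTERN = re.compile(r"[A-Za-z0-9_]{1,15}")


def normalize_handle(raw: str) -> str:
    """Strip whitespace and a leading '@', then validate the format."""
    handle = raw.strip().lstrip("@")
    if not HANDLE_PATTERN.fullmatch(handle):
        raise ValueError(f"not a valid handle: {raw!r}")
    return handle


def profile_url(raw: str) -> str:
    """Build the profile URL for a user-supplied handle."""
    return "https://twitter.com/" + normalize_handle(raw)


for candidate in ("@jack", "not a handle!"):
    try:
        print(profile_url(candidate))
    except ValueError as error:
        print("Skipping:", error)
```

In the notebook, `candidate` would come from `input()` and the request itself would sit inside the same `try` block.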

#### Number of followers of a given Twitter account
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the followers for any provided account.

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [None]:
# your code here

#### List all language names and number of related articles in the order they appear in wikipedia.org.

In [39]:
# This is the url you will scrape in this exercise
url = 'https://www.wikipedia.org/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "lxml") 
#print(parsed_html.prettify())

In [41]:
# your code here

# Each language on the portal page is an <a class="link-box"> element
name_list = parsed_html.find_all('a', {'class' : 'link-box'})

for name in name_list:
    print(name.get_text())


English
6 668 000+ articles


日本語
1 376 000+ 記事


Español
1 869 000+ artículos


Русский
1 921 000+ статей


Deutsch
2 808 000+ Artikel


Français
2 528 000+ articles


Italiano
1 814 000+ voci


中文
1 360 000+ 条目 / 條目


فارسی
965 000+ مقاله


Português
1 103 000+ artigos



#### Create a list of the different kinds of datasets available on data.gov.uk.

In [42]:
# This is the url you will scrape in this exercise
url = 'https://data.gov.uk/'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "lxml") 
#print(parsed_html.prettify())

In [43]:
# your code here

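# Topic links: the first four "govuk-link" anchors come before the topic list and are skipped
db_types = [tag.string for tag in parsed_html.find_all("a",{"class": "govuk-link"})]
db_types = db_types[4:]

# Topic descriptions: the first "govuk-body" paragraph is skipped for the same reason
db_sub_types = [tag.string for tag in parsed_html.find_all("p",{"class": "govuk-body"})]
db_sub_types = db_sub_types[1:]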

datasets_uk = {"Available Datasets":db_types,"Specifics":db_sub_types}
pd.DataFrame(datasets_uk)

| | Available Datasets | Specifics |
|---|---|---|
| 0 | Business and economy | Small businesses, industry, imports, exports a... |
| 1 | Crime and justice | Courts, police, prison, offenders, borders and... |
| 2 | Defence | Armed forces, health and safety, search and re... |
| 3 | Education | Students, training, qualifications and the Nat... |
| 4 | Environment | Weather, flooding, rivers, air quality, geolog... |
| 5 | Government | Staff numbers and pay, local councillors and d... |
| 6 | Government spending | Includes all payments by government department... |
| 7 | Health | Includes smoking, drugs, alcohol, medicine per... |
| 8 | Mapping | Addresses, boundaries, land ownership, aerial ... |
| 9 | Society | Employment, benefits, household finances, pove... |


#### Display the top 10 languages by number of native speakers stored in a pandas dataframe.

In [45]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0'}
response = requests.get(url,
                        headers=headers,
                        timeout=5)

html = response.content
parsed_html = bs4.BeautifulSoup(html, "lxml") 
#print(parsed_html.prettify())

In [46]:
# your code here

# Language names are redirect links whose title mentions an ISO code
redirect_links = parsed_html.find_all("a",{"class": "mw-redirect"})

[tag.string for tag in redirect_links if "ISO" in tag["title"]][:10]


['Mandarin Chinese',
 'Spanish',
 'English',
 'Hindi',
 'Portuguese',
 'Bengali',
 'Russian',
 'Japanese',
 'Yue Chinese',
 'Vietnamese']

## Bonus
#### Scrape a certain number of tweets of a given Twitter account.

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [None]:
# your code here

#### Display IMDB's top 250 data (movie name, initial release, director name and stars) as a pandas dataframe.

In [None]:
# This is the url you will scrape in this exercise 
url = 'https://www.imdb.com/chart/top'

In [None]:
# your code here

#### Display the movie name, year and a brief summary of 10 random movies from IMDB's top chart as a pandas dataframe.

In [None]:
#This is the url you will scrape in this exercise
url = 'http://www.imdb.com/chart/top'

In [None]:
# your code here

#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [None]:
#https://openweathermap.org/current
city = input('Enter the city: ')
url = 'http://api.openweathermap.org/data/2.5/weather?'+'q='+city+'&APPID=b35975e18dc93725acb092f7272cc6b8&units=metric'

In [None]:
# your code here
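Concatenating the city straight into the URL can produce an invalid query for names with spaces or accents. A minimal sketch that builds the query with `urllib.parse.urlencode` and then picks the requested fields out of a response. `YOUR_API_KEY` is a placeholder, and the sample payload is illustrative data shaped like the API's JSON response (`main.temp`, `wind.speed`, `weather[0]`), not a real reading.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city: str, api_key: str, units: str = "metric") -> str:
    """Return the request URL with every parameter safely encoded."""
    query = urlencode({"q": city, "APPID": api_key, "units": units})
    return f"{BASE_URL}?{query}"


def summarize(payload: dict) -> dict:
    """Pick temperature, wind speed, weather and description from the JSON."""
    return {
        "temperature": payload["main"]["temp"],
        "wind_speed": payload["wind"]["speed"],
        "weather": payload["weather"][0]["main"],
        "description": payload["weather"][0]["description"],
    }


weather_url = build_weather_url("New York", "YOUR_API_KEY")
print(weather_url)

# Illustrative payload only; real values come from response.json()
sample_payload = {
    "weather": [{"main": "Clouds", "description": "scattered clouds"}],
    "main": {"temp": 21.5},
    "wind": {"speed": 3.6},
}
report = summarize(sample_payload)
print(report)
```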

#### Find the book name, price and stock availability as a pandas dataframe.

In [None]:
# This is the url you will scrape in this exercise. 
# It is a fictional bookstore created to be scraped. 
url = 'http://books.toscrape.com/'

In [None]:
# your code here
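A minimal sketch of the extraction step, run on hypothetical sample HTML instead of the live site. It assumes each book sits in an `<article class="product_pod">` with the full title in the link's `title` attribute, the price in `p.price_color` and the stock text in a `p` carrying the `availability` class. Confirm these names in DevTools before relying on them. The book titles and prices are made up.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on a typical product listing
sample_html = """
<article class="product_pod">
  <h3><a href="book-1.html" title="Example Book One">Example Book ...</a></h3>
  <p class="price_color">£10.00</p>
  <p class="instock availability">
      In stock
  </p>
</article>
<article class="product_pod">
  <h3><a href="book-2.html" title="Example Book Two">Example Book ...</a></h3>
  <p class="price_color">£22.50</p>
  <p class="instock availability">
      In stock
  </p>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

books = []
for product in soup.find_all("article", class_="product_pod"):
    books.append({
        # The visible link text may be shortened, so read the title attribute
        "name": product.h3.a["title"],
        "price": product.find("p", class_="price_color").get_text(strip=True),
        "availability": product.find("p", class_="availability").get_text(strip=True),
    })

print(books)
```

Passing `books` to `pd.DataFrame` gives one row per book with `name`, `price` and `availability` columns.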