# Web Scraping Lab

This notebook contains web scraping exercises to practise your scraping skills.

**Tips:**

- Check the response status code for each request to ensure you have obtained the intended content.
- Print the response text in each request to understand the kind of info you are getting and its format.
- Check for patterns in the response text to extract the data/info requested in each question.
- Visit the URLs below and inspect their source code with Chrome DevTools. You'll need to identify the HTML tags, class names, and other markers used in the HTML content you are expected to extract.
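
The first two tips can be sketched as a small check that runs before any parsing. This is a minimal illustration, not part of the lab: `summarize_response` is a hypothetical helper, and the stand-in object only mimics the `status_code` and `text` attributes of a real `requests` response so that the sketch runs offline.

```python
from types import SimpleNamespace


def summarize_response(response, preview_chars=200):
    """Return (ok, preview) for a requests-style response object.

    Hypothetical helper for illustration; it is not part of requests.
    """
    ok = 200 <= response.status_code < 300   # any 2xx code counts as success
    preview = response.text[:preview_chars]  # first characters of the body
    return ok, preview


# Stand-in object so the sketch runs offline; in the lab you would pass
# the object returned by requests.get(url) instead.
fake_response = SimpleNamespace(status_code=200, text="<!DOCTYPE html><html></html>")
ok, preview = summarize_response(fake_response)
print(ok, preview)
```

If `ok` is `False`, it is usually worth inspecting the status code before looking for patterns in the text.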

**Resources**:
- [Requests library](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup Doc](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [Urllib](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re lib](https://docs.python.org/3/library/re.html)
- [lxml lib](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are already imported for you. Feel free to use additional libraries if you prefer.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

In [None]:
# I got as far as the earthquakes exercise and stopped there
# I am submitting this unfinished and adding the rest to my pending tasks, sorry

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [2]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'

In [3]:
# your code here
res = requests.get(url).content
res[0:600]
type(res)

bytes

In [4]:
soup = BeautifulSoup(res, "html.parser")
type(soup)

bs4.BeautifulSoup

#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. Each name should not contain any html tag.

**Instructions:**

1. Find out the html tag and class names used for the developer names. You can achieve this using Chrome DevTools.

1. Use BeautifulSoup to extract all the html elements that contain the developer names.

1. Use string manipulation techniques to remove extra whitespace and line breaks (i.e. `\n`) from the *text* of each HTML element. Use a list to store the clean names.

1. Print the list of names.

Your output should look like this:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
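
The instructions above can be sketched on a small inline snippet. The markup below is a simplified assumption (developer names inside an `a` within an `h1` of class `h3 lh-condensed`); the live page may use different tags or classes, so check it in DevTools first.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page may be structured differently.
sample_html = """
<article><h1 class="h3 lh-condensed"><a href="/jerryjliu">
    Jerry Liu
</a></h1></article>
<article><h1 class="h3 lh-condensed"><a href="/dgtlmoon">
    dgtlmoon
</a></h1></article>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

clean_names = []
for heading in sample_soup.find_all("h1", class_="h3"):
    raw_text = heading.a.get_text()
    # split() with no arguments drops spaces and line breaks alike;
    # joining with a single space keeps multi-word names readable.
    clean_names.append(" ".join(raw_text.split()))

print(clean_names)
```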

In [5]:
# name: string inside the a tag within the h1 tag
# link: string inside the a tag within the p tag
#soup = 

In [6]:
tags = [tag for tag in soup.find_all('article')]
tags

[<article class="Box-row d-flex" id="pa-jerryjliu">
 <a class="color-fg-muted f6" data-view-component="true" href="#pa-jerryjliu" style="width: 16px;" text="center">
     1
 </a>
 <div class="mx-3">
 <a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":4858925,"originating_url":"https://github.com/trending/developers","user_id":null}}' data-hydro-click-hmac="98cdf677281db729746da5d1bd41c0f9897e930798697694f5b3ae5427e599e2" data-view-component="true" href="/jerryjliu">
 <img alt="@jerryjliu" class="rounded avatar-user" height="48" src="https://avatars.githubusercontent.com/u/4858925?s=96&amp;v=4" width="48">
 </img></a> </div>
 <div class="d-sm-flex flex-auto">
 <div class="col-sm-8 d-md-flex">
 <div class="col-md-6">
 <h1 class="h3 lh-condensed">
 <a data-hydro-click='{"event_type":"explore.click","payload":{"click_context":"TRENDING_

In [7]:
tags[0].h1.a.attrs


{'data-hydro-click': '{"event_type":"explore.click","payload":{"click_context":"TRENDING_DEVELOPERS_PAGE","click_target":"OWNER","click_visual_representation":"TRENDING_DEVELOPER","actor_id":null,"record_id":4858925,"originating_url":"https://github.com/trending/developers","user_id":null}}',
 'data-hydro-click-hmac': '98cdf677281db729746da5d1bd41c0f9897e930798697694f5b3ae5427e599e2',
 'href': '/jerryjliu',
 'data-view-component': 'true'}

In [8]:

tags[0].h1.a.string.strip()

'Jerry Liu'

In [9]:
print(type(tags))
[tag.h1.a.string.strip() for tag in tags if tag.h1.a.string is not None]

<class 'list'>


['Jerry Liu',
 'Matthias Fey',
 'Sertaç Özercan',
 'Harrison Chase',
 'Agniva De Sarker',
 'Yair Morgenstern',
 'Alessandro Ros',
 'Ed Page',
 'Steven Tey',
 'Sayak Paul',
 'dgtlmoon',
 'atomiks',
 'Aaron Pham',
 'bmaltais',
 'Ha Thach',
 'Pedro Cuenca',
 'Matthew Tancik',
 'Ee Durbin',
 'hiroki osame',
 'Younes Belkada',
 'Evan Wallace',
 'Jesse Glick',
 'Chansung Park',
 'Steve Sanderson',
 'Arvin Xu']

In [10]:
tags = [tag.a for tag in soup.find_all('h1', {'class':['h3', 'lh-condensed']})]
names = [tag.string for tag in tags]


In [11]:
names = [name for name in names if name is not None]
names = [name.strip() for name in names]
names

['Jerry Liu',
 'Matthias Fey',
 'Sertaç Özercan',
 'Harrison Chase',
 'Agniva De Sarker',
 'Yair Morgenstern',
 'Alessandro Ros',
 'Ed Page',
 'Steven Tey',
 'Sayak Paul',
 'dgtlmoon',
 'atomiks',
 'Aaron Pham',
 'bmaltais',
 'Ha Thach',
 'Pedro Cuenca',
 'Matthew Tancik',
 'Ee Durbin',
 'hiroki osame',
 'Younes Belkada',
 'Evan Wallace',
 'Jesse Glick',
 'Chansung Park',
 'Steve Sanderson',
 'Arvin Xu']

In [12]:
tags = [tag.a.attrs['href'] for tag in soup.find_all('h1', {'class':['h3', 'lh-condensed']})]
users = [tag for tag in tags]

In [13]:
users = [name for name in users if name is not None]
users = [name.strip().split('/')[1] for name in users]
users

['jerryjliu',
 'rusty1s',
 'sozercan',
 'hwchase17',
 'agnivade',
 'yairm210',
 'aler9',
 'epage',
 'steven-tey',
 'sayakpaul',
 'dgtlmoon',
 'atomiks',
 'aarnphm',
 'bmaltais',
 'hathach',
 'pcuenca',
 'tancik',
 'ewdurbin',
 'privatenumber',
 'younesbelkada',
 'evanw',
 'jglick',
 'deep-diver',
 'SteveSandersonMS',
 'arvinxx']

In [14]:
lista = []
for i in range(len(names)):
    lista.append(f'{names[i]} ({users[i]})')
    
lista

['Jerry Liu (jerryjliu)',
 'Matthias Fey (rusty1s)',
 'Sertaç Özercan (sozercan)',
 'Harrison Chase (hwchase17)',
 'Agniva De Sarker (agnivade)',
 'Yair Morgenstern (yairm210)',
 'Alessandro Ros (aler9)',
 'Ed Page (epage)',
 'Steven Tey (steven-tey)',
 'Sayak Paul (sayakpaul)',
 'dgtlmoon (dgtlmoon)',
 'atomiks (atomiks)',
 'Aaron Pham (aarnphm)',
 'bmaltais (bmaltais)',
 'Ha Thach (hathach)',
 'Pedro Cuenca (pcuenca)',
 'Matthew Tancik (tancik)',
 'Ee Durbin (ewdurbin)',
 'hiroki osame (privatenumber)',
 'Younes Belkada (younesbelkada)',
 'Evan Wallace (evanw)',
 'Jesse Glick (jglick)',
 'Chansung Park (deep-diver)',
 'Steve Sanderson (SteveSandersonMS)',
 'Arvin Xu (arvinxx)']
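
Because `names` is filtered for `None` values while `users` is built separately, the two lists could drift out of step if any name were missing. A minimal sketch of a safer pairing uses `zip` with an explicit length check; `format_developers` is a hypothetical helper and the sample values are taken from the output above.

```python
def format_developers(names, users):
    """Pair each display name with its handle as 'Name (handle)'."""
    if len(names) != len(users):
        # A mismatch means at least one entry was dropped from one list only.
        raise ValueError("names and users have different lengths")
    return [f"{name} ({user})" for name, user in zip(names, users)]


sample_names = ["Jerry Liu", "Matthias Fey"]
sample_users = ["jerryjliu", "rusty1s"]
print(format_developers(sample_names, sample_users))
```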

#### Display the trending Python repositories on GitHub.

The steps to solve this problem are similar to the previous one, except that you need to find the repository names instead of the developer names.

In [15]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/python?since=daily'

In [16]:
# your code here
# for this exercise I add the suffix _t (for trending) to tell the variables apart

In [17]:
res_t = requests.get(url)
res_t = res_t.content
res_t[0:50]

b'\n\n<!DOCTYPE html>\n<html lang="en" data-color-mode='

In [18]:
soup_t = BeautifulSoup(res_t, 'html.parser')
type(soup_t)

bs4.BeautifulSoup

In [19]:
tags_t = [tag_t for tag_t in soup_t.find_all('article')]
tags_t[0:1]

[<article class="Box-row">
 <div class="float-right d-flex">
 <div class="BtnGroup d-flex" data-view-component="true">
 <a aria-label="You must be signed in to star a repository" class="tooltipped tooltipped-s btn-sm btn BtnGroup-item" data-hydro-click='{"event_type":"authentication.click","payload":{"location_in_page":"star button","repository_id":615973283,"auth_type":"LOG_IN","originating_url":"https://github.com/trending/python?since=daily","user_id":null}}' data-hydro-click-hmac="b7043555985ea2979eca68c8f789c8710bac60ea3a2fb64096c16e5875805573" data-view-component="true" href="/login?return_to=%2Fnsarrazin%2Fserge" rel="nofollow"> <svg aria-hidden="true" class="octicon octicon-star v-align-text-bottom d-none d-md-inline-block mr-2" data-view-component="true" height="16" version="1.1" viewbox="0 0 16 16" width="16">
 <path d="M8 .25a.75.75 0 0 1 .673.418l1.882 3.815 4.21.612a.75.75 0 0 1 .416 1.279l-3.046 2.97.719 4.192a.751.751 0 0 1-1.088.791L8 12.347l-3.766 1.98a.75.75 0 0 1-1.0

In [20]:
#tags_t[0].h1.a.string.strip()


In [21]:
tags_t = [tag_t.a for tag_t in soup_t.find_all('h1', {'class':['h3', 'lh-condensed']})]
names_t = [tag_t.string for tag_t in tags_t]


In [22]:
names_t = [name_t for name_t in names_t if name_t is not None]
#names_t = [name_t.strip() for name_t in names_t]
names_t

[]
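
An empty list here is consistent with how `.string` behaves in BeautifulSoup: it returns `None` when a tag has more than one child, and `get_text()` is the usual alternative. The sketch below assumes the repository link holds a nested `span` plus trailing text; that markup is a simplified guess, not a copy of the live page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the link holds a nested span plus trailing text.
sample_html = (
    '<h1 class="h3 lh-condensed"><a href="/nsarrazin/serge">'
    '<span class="text-normal">nsarrazin /</span>\n serge</a></h1>'
)

link = BeautifulSoup(sample_html, "html.parser").find("h1", class_="h3").a

print(link.string)  # None, because the tag has two children

# get_text() concatenates every text node; split/join tidies the whitespace.
repo_name = " ".join(link.get_text().split())
print(repo_name)
```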

In [23]:
tags_t = [tag_t.a.attrs['href'] for tag_t in soup_t.find_all('h1', {'class':['h3', 'lh-condensed']})]
users_t = [tag_t for tag_t in tags_t]

In [24]:
users_t = [name_t for name_t in users_t if name_t is not None]
users_t = [name_t.split('/') for name_t in users_t]
# each href looks like '/owner/repo', so index 2 holds the repository name
users_t = [parts[2] for parts in users_t]
users_t

['serge',
 'myGPTReader',
 'zhao',
 'RWKV-LM',
 'BELLE',
 'fauxpilot',
 'Alpaca-LoRA-Serve',
 'ChuanhuChatGPT',
 'chatgpt_stock_report',
 'ChatRWKV',
 'text2room',
 'gerev',
 'GLM-130B',
 'llama_index',
 '30-Days-Of-Python',
 'redash',
 'instruct-pix2pix',
 'langchain',
 'CodeGen',
 'BlenderGPT',
 'MM-REACT',
 'saime-script',
 'sqlfluff',
 'sentry',
 'LAVIS']

In [25]:
#[tags[i].h1.string.strip() for i in range(len(tags_t)) if tags[i].h1.string != None]

In [26]:

'''
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

repo_list = soup.find_all('h1', {'class': 'h3 lh-condensed'})
trending_repos = []

for repo in repo_list:
    trending_repos.append(repo.text.strip())

print(trending_repos)
'''

"\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, 'html.parser')\n\nrepo_list = soup.find_all('h1', {'class': 'h3 lh-condensed'})\ntrending_repos = []\n\nfor repo in repo_list:\n    trending_repos.append(repo.text.strip())\n\nprint(trending_repos)\n"

#### Display all the image links from the Walt Disney Wikipedia page.

In [18]:
# This is the url you will scrape in this exercise
url_d = 'https://en.wikipedia.org/wiki/Walt_Disney'

In [19]:
# your code here
res_d = requests.get(url_d)
res_d = res_d.content
res_d[0:50]

b'<!DOCTYPE html>\n<html class="client-nojs vector-fe'

In [20]:
soup_d = BeautifulSoup(res_d, 'html.parser')
type(soup_d)

bs4.BeautifulSoup

In [21]:
'''
tags_d = [tag_d for tag_d in soup_d.find_all('a')]
tags_d[0:5]
'''

"\ntags_d = [tag_d for tag_d in soup_d.find_all('a')]\ntags_d[0:5]\n"

In [22]:
'''
tags_d = [tag_d.img.attrs['src'] for tag_d in soup_d.find_all('h1', {'class':['image']})]
users_d = [tag_d for tag_d in tags_d]
users_d
'''

"\ntags_d = [tag_d.img.attrs['src'] for tag_d in soup_d.find_all('h1', {'class':['image']})]\nusers_d = [tag_d for tag_d in tags_d]\nusers_d\n"

In [23]:
tags_d = soup_d.find_all('img')
tags_d = [img['src'] for img in tags_d]
tags_d

['/static/images/icons/wikipedia.png',
 '/static/images/mobile/copyright/wikipedia-wordmark-en.svg',
 '/static/images/mobile/copyright/wikipedia-tagline-en.svg',
 '//upload.wikimedia.org/wikipedia/en/thumb/e/e7/Cscr-featured.svg/20px-Cscr-featured.svg.png',
 '//upload.wikimedia.org/wikipedia/en/thumb/8/8c/Extended-protection-shackle.svg/20px-Extended-protection-shackle.svg.png',
 '//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG',
 '//upload.wikimedia.org/wikipedia/commons/thumb/8/87/Walt_Disney_1942_signature.svg/150px-Walt_Disney_1942_signature.svg.png',
 '//upload.wikimedia.org/wikipedia/commons/thumb/3/3a/Walt_Disney_Birthplace_Exterior_Hermosa_Chicago_Illinois.jpg/220px-Walt_Disney_Birthplace_Exterior_Hermosa_Chicago_Illinois.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/c/c4/Walt_Disney_envelope_ca._1921.jpg/220px-Walt_Disney_envelope_ca._1921.jpg',
 '//upload.wikimedia.org/wikipedia/commons/thumb/0/0d/Trolley_Troubles_p
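
Several of the `src` values above are relative (`/static/...`) or protocol-relative (`//upload.wikimedia.org/...`). If full URLs are wanted, one option is `urllib.parse.urljoin`, sketched here with two values copied from the output.

```python
from urllib.parse import urljoin

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"

raw_links = [
    "/static/images/icons/wikipedia.png",
    "//upload.wikimedia.org/wikipedia/commons/thumb/d/df/Walt_Disney_1946.JPG/220px-Walt_Disney_1946.JPG",
]

# urljoin resolves each link against the page URL, filling in the scheme
# (and the host, when the link is only a path).
absolute_links = [urljoin(page_url, link) for link in raw_links]

for link in absolute_links:
    print(link)
```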

#### Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page.

In [54]:
# This is the url you will scrape in this exercise
url_p ='https://en.wikipedia.org/wiki/Python' 

In [55]:
# your code here
res_p = requests.get(url_p)
res_p = res_p.content
res_p[0:1]

b'<'

In [56]:
soup_p = BeautifulSoup(res_p, 'html.parser')
type(soup_p)

bs4.BeautifulSoup

In [57]:
tags_p = soup_p.find_all('li')
print(len(tags_p))
tags_p

136


[<li class="mw-list-item" id="n-mainpage-description"><a accesskey="z" href="/wiki/Main_Page" title="Visit the main page [z]"><span>Main page</span></a></li>,
 <li class="mw-list-item" id="n-contents"><a href="/wiki/Wikipedia:Contents" title="Guides to browsing Wikipedia"><span>Contents</span></a></li>,
 <li class="mw-list-item" id="n-currentevents"><a href="/wiki/Portal:Current_events" title="Articles related to current events"><span>Current events</span></a></li>,
 <li class="mw-list-item" id="n-randompage"><a accesskey="x" href="/wiki/Special:Random" title="Visit a randomly selected article [x]"><span>Random article</span></a></li>,
 <li class="mw-list-item" id="n-aboutsite"><a href="/wiki/Wikipedia:About" title="Learn about Wikipedia and how it works"><span>About Wikipedia</span></a></li>,
 <li class="mw-list-item" id="n-contactpage"><a href="//en.wikipedia.org/wiki/Wikipedia:Contact_us" title="How to contact Wikipedia"><span>Contact us</span></a></li>,
 <li class="mw-list-item" id

In [58]:
tags_p = soup_p.find_all('a')
print(len(tags_p))
tags_p

159


[<a class="mw-jump-link" href="#bodyContent">Jump to content</a>,
 <a accesskey="z" href="/wiki/Main_Page" title="Visit the main page [z]"><span>Main page</span></a>,
 <a href="/wiki/Wikipedia:Contents" title="Guides to browsing Wikipedia"><span>Contents</span></a>,
 <a href="/wiki/Portal:Current_events" title="Articles related to current events"><span>Current events</span></a>,
 <a accesskey="x" href="/wiki/Special:Random" title="Visit a randomly selected article [x]"><span>Random article</span></a>,
 <a href="/wiki/Wikipedia:About" title="Learn about Wikipedia and how it works"><span>About Wikipedia</span></a>,
 <a href="//en.wikipedia.org/wiki/Wikipedia:Contact_us" title="How to contact Wikipedia"><span>Contact us</span></a>,
 <a href="https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&amp;utm_medium=sidebar&amp;utm_campaign=C13_en.wikipedia.org&amp;uselang=en" title="Support us by donating to the Wikimedia Foundation"><span>Donate</span></a>,
 <a href=

In [59]:
tags_p = [tag['href'] for tag in tags_p]
tags_p

['#bodyContent',
 '/wiki/Main_Page',
 '/wiki/Wikipedia:Contents',
 '/wiki/Portal:Current_events',
 '/wiki/Special:Random',
 '/wiki/Wikipedia:About',
 '//en.wikipedia.org/wiki/Wikipedia:Contact_us',
 'https://donate.wikimedia.org/wiki/Special:FundraiserRedirector?utm_source=donate&utm_medium=sidebar&utm_campaign=C13_en.wikipedia.org&uselang=en',
 '/wiki/Help:Contents',
 '/wiki/Help:Introduction',
 '/wiki/Wikipedia:Community_portal',
 '/wiki/Special:RecentChanges',
 '/wiki/Wikipedia:File_upload_wizard',
 '/wiki/Main_Page',
 '/wiki/Special:Search',
 '/w/index.php?title=Special:CreateAccount&returnto=Python',
 '/w/index.php?title=Special:UserLogin&returnto=Python',
 '/w/index.php?title=Special:CreateAccount&returnto=Python',
 '/w/index.php?title=Special:UserLogin&returnto=Python',
 '/wiki/Help:Introduction',
 '/wiki/Special:MyContributions',
 '/wiki/Special:MyTalk',
 '#',
 '#Snakes',
 '#Computing',
 '#People',
 '#Roller_coasters',
 '#Vehicles',
 '#Weaponry',
 '#Other_uses',
 '#See_also',
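
The list above mixes in-page anchors (values starting with `#`) with real links and repeats some entries. Whether to keep them depends on how the exercise is read; as one possible clean-up, the sketch below drops the anchors and removes duplicates while preserving order, using a few values copied from the output.

```python
hrefs = [
    "#bodyContent",
    "/wiki/Main_Page",
    "/wiki/Wikipedia:Contents",
    "/wiki/Main_Page",
    "#",
    "#Snakes",
]

# Keep only links that lead to another page, not to a section of this one.
page_links = [href for href in hrefs if not href.startswith("#")]

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_links = list(dict.fromkeys(page_links))

print(unique_links)
```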
 

#### Find the number of titles that have changed in the United States Code since its last release point.

In [41]:
# This is the url you will scrape in this exercise
url_c = 'http://uscode.house.gov/download/download.shtml'

In [43]:
# your code here
res_c = requests.get(url_c)
res_c = res_c.content
res_c[0:1]

b'<'

In [45]:
soup_c = BeautifulSoup(res_c, 'html.parser')
type(soup_c)

bs4.BeautifulSoup

In [67]:
tags_c = [tag_c.string.strip().split('\n\n') for tag_c in soup_c.find_all('div', {'class':['usctitlechanged']})]
tags_c

[['Title 50 - War and National Defense']]
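
The exercise asks for a number, so the matched elements still need to be counted. A minimal sketch on inline markup follows; the `usctitlechanged` class comes from the cell above, while the surrounding structure, the `usctitle` class and the second changed title are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the usctitlechanged class is taken from the notebook.
sample_html = """
<div class="usctitle">Unchanged example title</div>
<div class="usctitlechanged">
  Title 50 - War and National Defense
</div>
<div class="usctitlechanged">
  Changed example title
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

changed_titles = [
    div.get_text(strip=True)
    for div in sample_soup.find_all("div", class_="usctitlechanged")
]

print(len(changed_titles), changed_titles)
```

Applied to the notebook's `soup_c`, the same `len(...)` call would give the count for the live page at the time of the request.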

#### Create a Python list with the names on the FBI's Ten Most Wanted list.

In [64]:
# This is the url you will scrape in this exercise
url_f = 'https://www.fbi.gov/wanted/topten'

In [65]:
# your code here
res_f = requests.get(url_f)
res_f = res_f.content
res_f[0:1]

b'<'

In [66]:
soup_f = BeautifulSoup(res_f, 'html.parser')
type(soup_f)

bs4.BeautifulSoup

In [76]:
tags_f = [tag_f.a.string for tag_f in soup_f.find_all('h3', {'class':['title']})]
tags_f

['OMAR ALEXANDER CARDENAS',
 'ALEXIS FLORES',
 'BHADRESHKUMAR CHETANBHAI PATEL',
 'ALEJANDRO ROSALES CASTILLO',
 'YULAN ADONAY ARCHAGA CARIAS',
 'RUJA IGNATOVA',
 'ARNOLDO JIMENEZ',
 'MICHAEL JAMES PRATT',
 'JOSE RODOLFO VILLARREAL-HERNANDEZ',
 'RAFAEL CARO-QUINTERO']

#### Display the 20 latest earthquakes info (date, time, latitude, longitude and region name) by the EMSC as a pandas dataframe.

In [2]:
# This is the url you will scrape in this exercise
url_e = 'https://www.emsc-csem.org/Earthquake/'

In [3]:
# your code here
res_e = requests.get(url_e)
res_e = res_e.content
res_e[0:1]

b'<'

In [4]:
soup_e = BeautifulSoup(res_e, 'html.parser')
#soup_e

In [5]:
soup_e = soup_e.find_all('tr', {'class':['ligne1 normal']})
print(len(soup_e))
soup_e

25


[<tr class="ligne1 normal" id="1243374" onclick="go_details(event,1243374);"><td class="tabev0"></td><td class="tabev0"></td><td class="tabev0"></td><td class="tabev6"><b><i style="display:none;">earthquake</i><a href="/Earthquake/earthquake.php?id=1243374">2023-03-29   08:03:25.6</a></b><i class="ago" id="ago0">06min ago</i></td><td class="tabev1">38.13 </td><td class="tabev2">N  </td><td class="tabev1">37.14 </td><td class="tabev2">E  </td><td class="tabev3">7</td><td class="tabev5" id="magtyp0">ML</td><td class="tabev2">2.2</td><td class="tb_region" id="reg0"> CENTRAL TURKEY</td><td class="comment updatetimeno" id="upd0" style="text-align:right;">2023-03-29 08:07</td></tr>,
 <tr class="ligne1 normal" id="1243370" onclick="go_details(event,1243370);"><td class="tabev0"></td><td class="tabev0"></td><td class="tabev0"></td><td class="tabev6"><b><i style="display:none;">earthquake</i><a href="/Earthquake/earthquake.php?id=1243370">2023-03-29   07:51:10.5</a></b><i class="ago" id="ago2">

In [6]:
soup_e = soup_e.find_all('td')
soup_e

AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
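
The error above arises because `find_all` returns a `ResultSet` (a list of rows), and `find_all` has to be called on each row rather than on the list. The sketch below loops over the rows instead. It uses one row adapted from the output above (attributes trimmed, hidden `i` element removed), and the column positions are read off that single row, so they are an assumption about the rest of the table.

```python
from bs4 import BeautifulSoup

# One row adapted from the notebook output, wrapped in a table.
sample_html = (
    '<table><tr class="ligne1 normal" id="1243374">'
    '<td class="tabev0"></td><td class="tabev0"></td><td class="tabev0"></td>'
    '<td class="tabev6"><b><a href="/Earthquake/earthquake.php?id=1243374">'
    '2023-03-29   08:03:25.6</a></b><i class="ago">06min ago</i></td>'
    '<td class="tabev1">38.13 </td><td class="tabev2">N  </td>'
    '<td class="tabev1">37.14 </td><td class="tabev2">E  </td>'
    '<td class="tabev3">7</td><td class="tabev5">ML</td><td class="tabev2">2.2</td>'
    '<td class="tb_region"> CENTRAL TURKEY</td>'
    '<td class="comment updatetimeno">2023-03-29 08:07</td></tr></table>'
)

rows = BeautifulSoup(sample_html, "html.parser").find_all("tr", class_="ligne1")

records = []
for row in rows:                      # iterate: each row is a single Tag
    cells = row.find_all("td")
    date, time = cells[3].a.get_text().split()
    records.append({
        "date": date,
        "time": time,
        "latitude": f"{cells[4].get_text(strip=True)} {cells[5].get_text(strip=True)}",
        "longitude": f"{cells[6].get_text(strip=True)} {cells[7].get_text(strip=True)}",
        "region": cells[11].get_text(strip=True),
    })

print(records)
```

Passing `records` to `pd.DataFrame(records)` and keeping the first 20 rows would then give the dataframe the exercise asks for.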

#### Count the number of tweets by a given Twitter account.
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the number of tweets for any provided account.

In [37]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [38]:
# your code here

#### Number of followers of a given Twitter account
Ask the user for the handle (@handle) of a Twitter account. You will need to include a ***try/except block*** for account names not found.
<br>***Hint:*** the program should count the followers for any provided account.

In [39]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [40]:
# your code here

#### List all language names and number of related articles in the order they appear in wikipedia.org.

In [41]:
# This is the url you will scrape in this exercise
url = 'https://www.wikipedia.org/'

In [42]:
# your code here

#### Create a list of the different kinds of datasets available on data.gov.uk.

In [43]:
# This is the url you will scrape in this exercise
url = 'https://data.gov.uk/'

In [44]:
# your code here

#### Display the top 10 languages by number of native speakers stored in a pandas dataframe.

In [45]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

In [46]:
# your code here

## Bonus
#### Scrape a certain number of tweets of a given Twitter account.

In [47]:
# This is the url you will scrape in this exercise 
# You will need to add the account credentials to this url
url = 'https://twitter.com/'

In [48]:
# your code here

#### Display IMDB's top 250 data (movie name, initial release, director name and stars) as a pandas dataframe.

In [49]:
# This is the url you will scrape in this exercise 
url = 'https://www.imdb.com/chart/top'

In [50]:
# your code here

#### Display the movie name, year and a brief summary of the top 10 random movies (IMDB) as a pandas dataframe.

In [51]:
#This is the url you will scrape in this exercise
url = 'http://www.imdb.com/chart/top'

In [52]:
# your code here

#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [None]:
#https://openweathermap.org/current
city = input('Enter the city: ')
url = 'http://api.openweathermap.org/data/2.5/weather?'+'q='+city+'&APPID=b35975e18dc93725acb092f7272cc6b8&units=metric'
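
City names with spaces or accents can break a URL built by plain string concatenation. As a hedged alternative, the query string can be assembled with `urllib.parse.urlencode`; `build_weather_url` is a hypothetical helper and `YOUR_API_KEY` is a placeholder, not a working key.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key):
    """Return the request URL with every query parameter percent-encoded."""
    query = urlencode({"q": city, "APPID": api_key, "units": "metric"})
    return f"{BASE_URL}?{query}"


# Placeholder key for illustration only.
print(build_weather_url("New York", "YOUR_API_KEY"))
```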

In [None]:
# your code here

#### Find the book name, price and stock availability as a pandas dataframe.

In [None]:
# This is the url you will scrape in this exercise. 
# It is a fictional bookstore created to be scraped. 
url = 'http://books.toscrape.com/'

In [None]:
# your code here