- [Requests documentation](http://docs.python-requests.org/en/master/#the-user-guide)
- [Beautiful Soup documentation](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
- [urllib documentation](https://docs.python.org/3/library/urllib.html#module-urllib)
- [re documentation](https://docs.python.org/3/library/re.html)
- [lxml documentation](https://lxml.de/)
- [Scrapy](https://scrapy.org/)
- [List of HTTP status codes](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes)
- [HTML basics](http://www.simplehtmlguide.com/cheatsheet.php)
- [CSS basics](https://www.cssbasics.com/#page_start)

#### Below are the libraries and modules you may need. `requests`, `BeautifulSoup` and `pandas` are imported for you. If you prefer to use additional libraries, feel free to uncomment them.

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
# from pprint import pprint
# from lxml import html
# from lxml.html import fromstring
# import urllib.request
# from urllib.request import urlopen
# import random
# import re
# import scrapy

#### Download, parse (using BeautifulSoup), and print the content from the Trending Developers page from GitHub:

In [3]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/developers'

In [12]:
#your code

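# Download the raw HTML of the page as bytes
html = requests.get(url).content
html[0:100]

# Parse the HTML with the lxml parser
soup = BeautifulSoup(html, 'lxml')
soup

# Collect the text of every heading (h1-h6; HTML has no h7) and paragraph
tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']
text = [element.text for element in soup.find_all(tags)]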
text


['Learn and contribute',
 'Connect with others',
 'Trending',
 '\n      These are the developers building the hot tools today.\n    ',
 '\n\n            Armani Ferrante\n ',
 '\n\n              armaniferrante\n ',
 ' San Francisco, California',
 '\n\n            Julien Le Coupanec\n ',
 '\n\n              LeCoupa\n ',
 '\n\n\n      awesome-cheatsheets\n ',
 '\n\n            Anton Medvedev\n ',
 '\n\n              antonmedv\n ',
 '\n\n\n      fx\n ',
 '\n\n            Kyle Conroy\n ',
 '\n\n              kyleconroy\n ',
 '\n\n\n      sqlc\n ',
 '\n\n            mattn\n ',
 '\n\n              mattn\n ',
 '\n\n\n      go-sqlite3\n ',
 '\n\n            Bartlomiej Plotka\n ',
 '\n\n              bwplotka\n ',
 '\n\n\n      unity-grpc\n ',
 '\n\n            Will McGugan\n ',
 '\n\n              willmcgugan\n ',
 '\n\n\n      rich\n ',
 '\n\n            Stephan Dilly\n ',
 '\n\n              extrawurst\n ',
 '\n\n\n      gitui\n ',
 '\n\n            Juliette\n ',
 '\n\n              jrfnl\n '

#### Display the names of the trending developers retrieved in the previous step.

Your output should be a Python list of developer names. No name should contain any HTML tags.

**Instructions:**

1. Find out the HTML tag and class names used for the developer names. You can do this with Chrome DevTools.

1. Use BeautifulSoup to extract all the HTML elements that contain the developer names.

1. Use string manipulation techniques to remove whitespace and line breaks (i.e. `\n`) from the *text* of each HTML element. Use a list to store the clean names.

1. Print the list of names.

Your output should look like below:

```
['trimstray (@trimstray)',
 'joewalnes (JoeWalnes)',
 'charlax (Charles-AxelDein)',
 'ForrestKnight (ForrestKnight)',
 'revery-ui (revery-ui)',
 'alibaba (Alibaba)',
 'Microsoft (Microsoft)',
 'github (GitHub)',
 'facebook (Facebook)',
 'boazsegev (Bo)',
 'google (Google)',
 'cloudfetch',
 'sindresorhus (SindreSorhus)',
 'tensorflow',
 'apache (TheApacheSoftwareFoundation)',
 'DevonCrawford (DevonCrawford)',
 'ARMmbed (ArmMbed)',
 'vuejs (vuejs)',
 'fastai (fast.ai)',
 'QiShaoXuan (Qi)',
 'joelparkerhenderson (JoelParkerHenderson)',
 'torvalds (LinusTorvalds)',
 'CyC2018',
 'komeiji-satori (神楽坂覚々)',
 'script-8']
 ```
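
The instructions above can be sketched on a small inline HTML fragment. The markup, the `h1` tag, and the `dev-name` class are placeholders chosen for illustration, not GitHub's actual structure; inspect the live page in DevTools and substitute the real tag and class names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the tag and class names are assumptions for illustration.
sample_html = """
<article>
  <h1 class="dev-name">
    <a href="/armaniferrante">
            Armani Ferrante
    </a>
  </h1>
  <p class="dev-login">
    <a href="/armaniferrante">
              armaniferrante
    </a>
  </p>
</article>
<article>
  <h1 class="dev-name"><a href="/mattn">
            mattn
  </a></h1>
</article>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Select only the elements that hold a developer name
name_elements = soup.find_all("h1", class_="dev-name")

# split() + join() collapses line breaks and runs of spaces into single spaces
names = [" ".join(element.get_text().split()) for element in name_elements]
print(names)
```

Filtering by tag and class keeps page headings and repository names out of the result, which a plain search over all `h1`-`h6` and `p` elements would include.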

In [19]:
#your code

text_clean = [te.strip() for te in text]
text_clean

['Learn and contribute',
 'Connect with others',
 'Trending',
 'These are the developers building the hot tools today.',
 'Armani Ferrante',
 'armaniferrante',
 'San Francisco, California',
 'Julien Le Coupanec',
 'LeCoupa',
 'awesome-cheatsheets',
 'Anton Medvedev',
 'antonmedv',
 'fx',
 'Kyle Conroy',
 'kyleconroy',
 'sqlc',
 'mattn',
 'mattn',
 'go-sqlite3',
 'Bartlomiej Plotka',
 'bwplotka',
 'unity-grpc',
 'Will McGugan',
 'willmcgugan',
 'rich',
 'Stephan Dilly',
 'extrawurst',
 'gitui',
 'Juliette',
 'jrfnl',
 'Advies en zo',
 '陈帅',
 'chenshuai2144',
 'useMediaQuery',
 'Andrey Sitnik',
 'ai',
 'nanoid',
 'Marc Rousavy',
 'mrousavy',
 'react-native-mmkv',
 'Josh Bleecher Snyder',
 'josharian',
 'impl',
 'Ritchie Vink',
 'ritchie46',
 'serverless-model-aws',
 'maiyang',
 'yangwenmai',
 'learning-golang',
 'David Pedersen',
 'davidpdrsn',
 'json-parser',
 'Kenny Kerr',
 'kennykerr',
 'cppwinrt',
 'Ha Thach',
 'hathach',
 'tinyusb',
 'Arvid Norberg',
 'arvidn',
 'libtorrent',
 'Simo

#### Display the trending Python repositories in GitHub

The steps to solve this problem are similar to the previous one, except that you need to find the repository names instead of the developer names.
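
Repository titles on this page tend to come back as `owner /` followed by line breaks and the repository name. A minimal sketch of the cleanup step, assuming the raw strings look like the two samples below (`clean_repo_name` is a hypothetical helper):

```python
# Raw titles as they might come out of element.text (assumed shape)
raw_titles = [
    'donnemartin /\n\n      system-design-primer',
    'pyston /\n\n      pyston',
]


def clean_repo_name(raw):
    """Remove all whitespace so 'owner /\\n  repo' becomes 'owner/repo'."""
    return "".join(raw.split())


repo_names = [clean_repo_name(title) for title in raw_titles]
print(repo_names)
```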

In [20]:
# This is the url you will scrape in this exercise
url = 'https://github.com/trending/python?since=daily'

In [22]:
#your code

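# Download the raw HTML of the page as bytes
html = requests.get(url).content
html[0:100]

# Parse the HTML with the lxml parser
soup = BeautifulSoup(html, 'lxml')
soup

# Collect the text of every heading (h1-h6; HTML has no h7) and paragraph
tags = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']
text = [element.text for element in soup.find_all(tags)]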
text

text_clean = [te.strip() for te in text]
text_clean

['Learn and contribute',
 'Connect with others',
 'Trending',
 'See what the GitHub community is most excited about today.',
 'donnemartin /\n\n      system-design-primer',
 'Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.',
 'pyston /\n\n      pyston',
 'A faster and highly-compatible implementation of the Python programming language.',
 'pallupz /\n\n      covid-vaccine-booking',
 'This very basic script can be used to automate some steps on Co-WIN Platform.',
 'hellerve /\n\n      programming-talks',
 'Awesome & interesting talks about programming',
 'ericaltendorf /\n\n      plotman',
 'Chia plotting manager',
 'fighting41love /\n\n      funNLP',
 '中英文敏感词、语言检测、中外手机/电话归属地/运营商查询、名字推断性别、手机号抽取、身份证抽取、邮箱抽取、中日文人名库、中文缩写库、拆字词典、词汇情感值、停用词、反动词表、暴恐词表、繁简体转换、英文模拟中文发音、汪峰歌词生成器、职业名称词库、同义词库、反义词库、否定词库、汽车品牌词库、汽车零件词库、连续英文切割、各种中文词向量、公司名字大全、古诗词库、IT词库、财经词库、成语词库、地名词库、历史名人词库、诗词词库、医学词库、饮食词库、法律词库、汽车词库、动物词库、中文聊天语料、中文谣言数据、百度中文问答数据集、句子相似度匹配算法集合、bert资源、文本生成&

#### Display all the image links from the Walt Disney Wikipedia page
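
Image links live in the `src` attribute of `img` elements and are often relative or protocol-relative, so they need to be resolved against the page URL. A minimal sketch on a hypothetical HTML fragment (the image paths are made up for illustration; the real page would be downloaded with `requests` first):

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

page_url = "https://en.wikipedia.org/wiki/Walt_Disney"

# Hypothetical fragment standing in for the downloaded page
sample_html = """
<div>
  <img src="//upload.wikimedia.org/example/portrait.jpg" alt="Portrait">
  <img src="/static/images/example-logo.png" alt="Logo">
  <img alt="No source attribute">
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# src=True skips <img> tags without a src attribute;
# urljoin turns relative and protocol-relative paths into absolute URLs
image_links = [
    urljoin(page_url, img["src"])
    for img in soup.find_all("img", src=True)
]
print(image_links)
```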

In [None]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/Walt_Disney'

In [None]:
#your code

#### Retrieve an arbitrary Wikipedia page of "Python" and create a list of links on that page

In [None]:
# This is the url you will scrape in this exercise
url ='https://en.wikipedia.org/wiki/Python' 

In [None]:
#your code

#### Number of Titles that have changed in the United States Code since its last release point 

In [None]:
# This is the url you will scrape in this exercise
url = 'http://uscode.house.gov/download/download.shtml'

In [None]:
#your code

#### A Python list with the names of the FBI's Ten Most Wanted

In [None]:
# This is the url you will scrape in this exercise
url = 'https://www.fbi.gov/wanted/topten'

In [None]:
#your code 

#### 20 latest earthquakes info (date, time, latitude, longitude and region name) by the EMSC as a pandas dataframe

In [None]:
# This is the url you will scrape in this exercise
url = 'https://www.emsc-csem.org/Earthquake/'

In [None]:
#your code

#### Display the date, days, title, city, country of next 25 hackathon events as a Pandas dataframe table

In [None]:
# This is the url you will scrape in this exercise
url ='https://hackevents.co/hackathons'

In [None]:
#your code

#### Count number of tweets by a given Twitter account.

You will need to include a ***try/except block*** for account names not found. 
<br>***Hint:*** the program should count the number of tweets for any provided account
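
One possible shape for the try/except block, assuming a missing account shows up either as an HTTP error status or as a failed request. `fetch_page` is a hypothetical helper, not part of `requests`:

```python
import requests


def fetch_page(url, timeout=10):
    """Return the page body as text, or None if the request fails."""
    try:
        response = requests.get(url, timeout=timeout)
        # Turn 4xx/5xx responses (e.g. 404 for an unknown account) into HTTPError
        response.raise_for_status()
    except requests.exceptions.RequestException as error:
        # RequestException is the base class for HTTPError, ConnectionError,
        # Timeout and invalid-URL errors raised by requests
        print(f"Could not retrieve {url}: {error}")
        return None
    return response.text
```

A caller can then check for `None` before parsing, for example `page = fetch_page(url + account)` followed by `if page is not None: ...`, where `account` is the name supplied by the user.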

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account name to this url
url = 'https://twitter.com/'

In [None]:
#your code

#### Number of followers of a given Twitter account

You will need to include a ***try/except block*** in case the account name is not found. 
<br>***Hint:*** the program should count the followers for any provided account

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account name to this url
url = 'https://twitter.com/'

In [None]:
#your code

#### List all language names and number of related articles in the order they appear in wikipedia.org

In [None]:
# This is the url you will scrape in this exercise
url = 'https://www.wikipedia.org/'

In [None]:
#your code

#### A list with the different kinds of datasets available on data.gov.uk

In [None]:
# This is the url you will scrape in this exercise
url = 'https://data.gov.uk/'

In [None]:
#your code 

#### Top 10 languages by number of native speakers stored in a Pandas Dataframe

In [None]:
# This is the url you will scrape in this exercise
url = 'https://en.wikipedia.org/wiki/List_of_languages_by_number_of_native_speakers'

In [None]:
#your code

### BONUS QUESTIONS

#### Scrape a certain number of tweets of a given Twitter account.

In [None]:
# This is the url you will scrape in this exercise 
# You will need to add the account name to this url
url = 'https://twitter.com/'

In [None]:
# your code

#### IMDB's Top 250 data (movie name, initial release, director name and stars) as a pandas dataframe

In [None]:
# This is the url you will scrape in this exercise 
url = 'https://www.imdb.com/chart/top'

In [None]:
# your code

#### Movie name, year and a brief summary of the top 10 random movies (IMDB) as a pandas dataframe.

In [None]:
#This is the url you will scrape in this exercise
url = 'http://www.imdb.com/chart/top'

In [None]:
#your code

#### Find the live weather report (temperature, wind speed, description and weather) of a given city.

In [None]:
#https://openweathermap.org/current
city = input('Enter the city: ')
url = 'http://api.openweathermap.org/data/2.5/weather?'+'q='+city+'&APPID=b35975e18dc93725acb092f7272cc6b8&units=metric'

In [None]:
# your code

#### Book name, price and stock availability as a pandas dataframe.

In [None]:
# This is the url you will scrape in this exercise. 
# It is a fictional bookstore created to be scraped. 
url = 'http://books.toscrape.com/'

In [None]:
#your code