# <p style="text-align:center;font-size:70px;background-color:#0ad61b;color:white;font-style:italic;">Python for web</p>

![](http://www.blog.skytopper.com/wp-content/uploads/2015/06/Global-computer-network.jpg)

This bootcamp is all about interacting with the **web** using the Python programming language!

In this bootcamp, we will learn:

- to work with web APIs
- to download content from the web
- to scrape web pages
- to automate web tasks

using simple Python scripts!

![](https://i.amz.mshcdn.com/mqczOBQlR2uS7uALqB4fkKylDx0=/fit-in/1200x9600/https%3A%2F%2Fblueprint-api-production.s3.amazonaws.com%2Fuploads%2Fcard%2Fimage%2F193985%2Fnewhere.jpg)

# 1. Working with web APIs

- **What is an API?**<br>
    An API (application programming interface) is a set of routines, protocols, and tools for building software applications. An API specifies how software components should interact.
![](https://www.retriever.nl/wp-content/uploads/2016/11/api-321x250.png)
-------------

- **What is a web API?**<br>
    A web API is an API exposed over HTTP, so it can be consumed by a broad range of clients, including browsers, mobile phones, and tablets.
![](http://dselva.co.in/blog/wp-content/uploads/2017/09/Web-APIs.png)
-----------------
- **Some examples of public web APIs:**
    - [Facebook Graph API](https://developers.facebook.com/docs/graph-api)
    - [Twitter API](https://dev.twitter.com/rest/public)
    - [Google API explorer](https://developers.google.com/apis-explorer/#p/)
--------------

- **What is REST?**<br>
    REST (Representational State Transfer) is an architectural style for web services in which requesting systems access and manipulate web resources through a uniform, predefined set of **stateless operations**.
    
    >In computing, a stateless protocol is a communications protocol in which no information is retained by either sender or receiver. The sender transmits a packet to the receiver and does not expect an acknowledgment of receipt. There is nothing saved that has to be remembered by the next transaction. The server must be able to completely understand the client request without using any server context or server session state. 
    
   Advantages of REST:
   - Because the transactions are stateless and no sessions are involved, any request can be directed to any instance of the web service. Hence, the web service can scale to accommodate load changes.
   - Binding to a service through an API is a matter of controlling how the URL is decoded.
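
To make the first advantage concrete, here is a minimal offline sketch, not a real web service. `ServiceInstance` and the request layout are hypothetical names made up for illustration. Each instance answers using only the data carried by the request, so either one can serve it.

```python
class ServiceInstance:
    """Toy stand-in for one server process; it stores no per-client session."""

    def __init__(self, name):
        self.name = name

    def handle(self, request):
        # Everything needed to answer is read from the request itself.
        user = request["params"]["user"]
        page = int(request["params"].get("page", 1))
        return {"status": 200, "body": "page {} for {}".format(page, user)}


request = {"method": "GET", "path": "/feed", "params": {"user": "alice", "page": "2"}}

# A load balancer could pick either instance; the reply is the same.
response_a = ServiceInstance("a").handle(request)
response_b = ServiceInstance("b").handle(request)
print(response_a == response_b)
```

If the handler had to look up a session stored on one particular instance, the request could no longer be routed freely.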

-----------------
- **Types of HTTP requests**
    - GET
    - POST
    - DELETE
    - PUT
    - PATCH, etc.
    
![](http://lotsofthing.com/wp-content/uploads/2017/11/rest-api-1.jpg)
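
As a rough guide to the methods listed above, the sketch below pairs each one with its usual meaning in a REST API and builds (without sending) a request object for it using the standard library. The URL is a placeholder, and the one-line descriptions are a simplification rather than a formal definition. In `requests`, the matching calls are `requests.get`, `requests.post`, `requests.put`, `requests.patch`, and `requests.delete`.

```python
from urllib.request import Request

# Typical REST meaning of each method (simplified).
METHOD_PURPOSE = {
    "GET": "read a resource",
    "POST": "create a resource or trigger an action",
    "PUT": "replace a resource",
    "PATCH": "partially update a resource",
    "DELETE": "remove a resource",
}

url = "https://api.example.com/albums/1"  # placeholder URL, never contacted

# Build one request object per method; nothing is sent over the network.
prepared = [Request(url, method=method) for method in METHOD_PURPOSE]

for req in prepared:
    print(req.get_method(), "->", METHOD_PURPOSE[req.get_method()])
```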

### HTTP  for humans: [requests](http://docs.python-requests.org/en/master/)

<img src="http://docs.python-requests.org/en/master/_static/requests-sidebar.png"  height=200 width=200>


- Requests is one of the most downloaded Python packages of all time, pulling in over 7,000,000 downloads every month. All the cool kids are doing it!

- Recreational use of other HTTP libraries may result in dangerous side-effects, including: security vulnerabilities, verbose code, reinventing the wheel, constantly reading documentation, depression, headaches, or even death. Requests is the only Non-GMO HTTP library for Python, safe for human consumption.

- Python HTTP: When in doubt, or when not in doubt, use Requests. Beautiful, simple, Pythonic.

***Everybody loves it!***

#### Installation

```
pip install requests
```

## GET request

### Example 1

http://graph.facebook.com/4/picture?type=large

![](http://graph.facebook.com/4/picture?type=large)

![](https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTJ3ELNuC_coeH9tvLn62fsTMoe-vMVQrsfTrLUOIhsUI69i5QIyg)

![](http://i.imgur.com/gRvt4lV.png)

In [1]:
import requests

In [2]:
url = "http://graph.facebook.com/4/picture?type=large"

In [3]:
r = requests.get(url)

In [4]:
r.content

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x02\x00\x00\x01\x00\x01\x00\x00\xff\xed\x00\x9cPhotoshop 3.0\x008BIM\x04\x04\x00\x00\x00\x00\x00\x80\x1c\x02g\x00\x14EijAN2tIMJ-3dRhxQN6D\x1c\x02(\x00bFBMD01000ac0030000dc060000890b0000fe0c0000600e00001e1100005f1600000f17000066180000a619000065220000\xff\xe2\x02\x1cICC_PROFILE\x00\x01\x01\x00\x00\x02\x0clcms\x02\x10\x00\x00mntrRGB XYZ \x07\xdc\x00\x01\x00\x19\x00\x03\x00)\x009acspAPPL\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf6\xd6\x00\x01\x00\x00\x00\x00\xd3-lcms\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\ndesc\x00\x00\x00\xfc\x00\x00\x00^cprt\x00\x00\x01\\\x00\x00\x00\x0bwtpt\x00\x00\x01h\x00\x00\x00\x14bkpt\x00\x00\x01|\x00\x00\x00\x14rXYZ\x00\x00\x01\x90\x00\x00\x00\x14gXYZ\x00\x00\x01\xa4\x00\x00\x00\x14bXYZ\x00\x00\x01\xb8\x00\x00\x00\x1

In [5]:
r.status_code

200

In [6]:
with open("mark.jpg", 'wb') as f:
    f.write(r.content)
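
Before writing `r.content` to disk, it can help to check the status code so that an error page is not saved as an image. The sketch below is one way to do that. `save_binary` is a hypothetical helper, and `FakeResponse` is a stand-in for the object returned by `requests.get` so that the example runs offline.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in for a requests response (only the two attributes used here).
FakeResponse = namedtuple("FakeResponse", ["status_code", "content"])


def save_binary(response, path):
    """Write response.content to path only if the status code is 200."""
    if response.status_code != 200:
        return False
    with open(path, "wb") as f:
        f.write(response.content)
    return True


target = os.path.join(tempfile.mkdtemp(), "mark.jpg")
ok = save_binary(FakeResponse(200, b"\xff\xd8\xff\xe0 fake jpeg bytes"), target)
failed = save_binary(FakeResponse(404, b"Not Found"), target + ".missing")
print(ok, failed)
```

With the real response from the cells above, the call would be `save_binary(r, "mark.jpg")`.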

### Example 2

[Google maps geocoding API](https://developers.google.com/maps/documentation/geocoding/intro)

In [12]:
key = "AIzaSyBG1or9IV1cQrAUx7GZrUD6xvASD6mSJy"

In [13]:
url = "https://maps.googleapis.com/maps/api/geocode/json"

In [14]:
params = {
    "address": "coding blocks, pitampura",
    "key": key
}

In [15]:
r = requests.get(url, params=params)

In [None]:
r.url
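
`r.url` shows the final URL after `requests` has encoded `params` into the query string. The standard-library sketch below illustrates the same kind of encoding without making a request. The key is a placeholder, and the exact string produced by `requests` may differ in detail.

```python
from urllib.parse import urlencode

base_url = "https://maps.googleapis.com/maps/api/geocode/json"
params = {
    "address": "coding blocks, pitampura",
    "key": "YOUR_API_KEY",  # placeholder, not a real key
}

# Spaces become "+" and the comma becomes "%2C" in the query string.
query = urlencode(params)
full_url = base_url + "?" + query
print(full_url)
```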

In [18]:
r.status_code

200

In [17]:
r.content

b'{\n   "results" : [\n      {\n         "address_components" : [\n            {\n               "long_name" : "New Delhi",\n               "short_name" : "New Delhi",\n               "types" : [ "locality", "political" ]\n            },\n            {\n               "long_name" : "Delhi",\n               "short_name" : "DL",\n               "types" : [ "administrative_area_level_1", "political" ]\n            },\n            {\n               "long_name" : "India",\n               "short_name" : "IN",\n               "types" : [ "country", "political" ]\n            },\n            {\n               "long_name" : "110034",\n               "short_name" : "110034",\n               "types" : [ "postal_code" ]\n            }\n         ],\n         "formatted_address" : "47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main Road, Opposite Metro Pillar 337, Pitampura, New Delhi, Delhi 110034, India",\n         "geometry" : {\n            "location" : {\n               "lat" : 28.6969421,\n    

In [19]:
data = r.json()

In [25]:
data['results'][0].keys()

dict_keys(['address_components', 'formatted_address', 'geometry', 'place_id', 'plus_code', 'types'])

In [29]:
data['results'][0]['formatted_address']

'47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main Road, Opposite Metro Pillar 337, Pitampura, New Delhi, Delhi 110034, India'

In [31]:
data['results'][0]['geometry']['location']

{'lat': 28.6969421, 'lng': 77.14238250000001}
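
Indexing `data['results'][0]` directly fails with an `IndexError` when the list is empty. A small helper can guard against that. This is a sketch: `first_location` is a hypothetical name, `sample` is a trimmed copy of the response shown above, and the empty-list case is an assumption about what an unmatched address might return.

```python
def first_location(data):
    """Return (lat, lng) of the first geocoding result, or None if there is none."""
    results = data.get("results", [])
    if not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# Trimmed copy of the response shown above.
sample = {
    "results": [
        {"geometry": {"location": {"lat": 28.6969421, "lng": 77.14238250000001}}}
    ]
}

print(first_location(sample))
print(first_location({"results": []}))
```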

In [27]:
data['results'][0]

{'address_components': [{'long_name': 'New Delhi',
   'short_name': 'New Delhi',
   'types': ['locality', 'political']},
  {'long_name': 'Delhi',
   'short_name': 'DL',
   'types': ['administrative_area_level_1', 'political']},
  {'long_name': 'India',
   'short_name': 'IN',
   'types': ['country', 'political']},
  {'long_name': '110034', 'short_name': '110034', 'types': ['postal_code']}],
 'formatted_address': '47, Nishant Kunj, 1st & 2nd Floor, Pitampura Main Road, Opposite Metro Pillar 337, Pitampura, New Delhi, Delhi 110034, India',
 'geometry': {'location': {'lat': 28.6969421, 'lng': 77.14238250000001},
  'location_type': 'GEOMETRIC_CENTER',
  'viewport': {'northeast': {'lat': 28.6982910802915,
    'lng': 77.14373148029152},
   'southwest': {'lat': 28.6955931197085, 'lng': 77.14103351970851}}},
 'place_id': 'ChIJ_ZBg-dEDDTkRCYK3Ee8ywoI',
 'plus_code': {'compound_code': 'M4WR+QX Delhi, India',
  'global_code': '7JWVM4WR+QX'},
 'types': ['establishment', 'point_of_interest']}

In [33]:
from pprint import pprint

In [34]:
pprint(data)

{'results': [{'address_components': [{'long_name': 'New Delhi',
                                      'short_name': 'New Delhi',
                                      'types': ['locality', 'political']},
                                     {'long_name': 'Delhi',
                                      'short_name': 'DL',
                                      'types': ['administrative_area_level_1',
                                                'political']},
                                     {'long_name': 'India',
                                      'short_name': 'IN',
                                      'types': ['country', 'political']},
                                     {'long_name': '110034',
                                      'short_name': '110034',
                                      'types': ['postal_code']}],
              'formatted_address': '47, Nishant Kunj, 1st & 2nd Floor, '
                                   'Pitampura Main Road, Opposite Metro Pillar '
 

In [36]:
url = "https://httpbin.org/ip"

In [37]:
r = requests.get(url)

In [38]:
r.content

b'{\n  "origin": "112.196.171.229"\n}\n'

In [39]:
r.json()['origin']

'112.196.171.229'

In [40]:
r.headers

{'Connection': 'keep-alive', 'Server': 'gunicorn/19.9.0', 'Date': 'Wed, 02 Jan 2019 13:08:51 GMT', 'Content-Type': 'application/json', 'Content-Length': '34', 'Access-Control-Allow-Origin': '*', 'Access-Control-Allow-Credentials': 'true', 'Via': '1.1 vegur'}

In [41]:
r.request.headers

{'User-Agent': 'python-requests/2.21.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}

In [75]:
proxies = {
    "https": "179.127.241.133:49187"
}

In [76]:
url = "https://httpbin.org/ip"

In [77]:
r = requests.get(url, proxies=proxies)

In [78]:
r.json()

{'origin': '179.127.240.254'}

## POST request

![](https://indianpythonista.files.wordpress.com/2016/12/iservice_post_get.png?w=809)

### Example 1

[Imgur API](https://apidocs.imgur.com/)

In [18]:
image_upload_url = "https://api.imgur.com/3/image"

In [19]:
new_album_url = "https://api.imgur.com/3/album"

In [20]:
headers = {
    "Authorization": "Client-ID 8bd0be01cb87666"
}

In [5]:
data = {
    'title': 'My album'
}

In [9]:
r = requests.post(new_album_url, data=data, headers=headers)

In [10]:
r.content

b'{"data":{"id":"FkGY4fT","deletehash":"jpMwxaoqVT4DU9g"},"success":true,"status":200}'
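
The response body carries two different identifiers, `id` and `deletehash`, and they are easy to mix up. The sketch below parses the exact bytes shown above with the standard library and keeps them under separate names. `album_public_id` and `album_deletehash` are illustrative variable names, not part of the API.

```python
import json

# Raw body returned by the album-creation request above.
raw = b'{"data":{"id":"FkGY4fT","deletehash":"jpMwxaoqVT4DU9g"},"success":true,"status":200}'

payload = json.loads(raw)  # for this body, equivalent to r.json()
album_public_id = payload["data"]["id"]
album_deletehash = payload["data"]["deletehash"]
print(album_public_id, album_deletehash)
```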

In [33]:
# Note: this value is the album's deletehash from the response above, not its id
album_id = "jpMwxaoqVT4DU9g"

In [12]:
album_id

'FkGY4fT'

In [13]:
import os

In [14]:
os.getcwd()

'/home/nikhil/Desktop/PythonForWeb'

In [15]:
os.listdir()

['.git',
 'PyWeb.ipynb',
 'solution.cpp',
 '.ipynb_checkpoints',
 '.gitignore',
 'mark.jpg',
 'BasicsOfPython.ipynb']

In [16]:
os.chdir('/home/nikhil/Pictures/python/')

In [21]:
image_paths = os.listdir()

In [24]:
image_ids = []

In [40]:
for image_path in image_paths[:1]:
    with open(image_path, 'rb') as f:
        image = f.read()
    data = {
        'image': image
    }
    r = requests.post(image_upload_url, headers=headers, data=data)
    image_ids.append(r.json()['data']['id'])
    print(r.json())

{'data': {'id': 'L7IvsoX', 'title': None, 'description': None, 'datetime': 1546596701, 'type': 'image/png', 'animated': False, 'width': 1000, 'height': 1000, 'size': 246622, 'views': 0, 'bandwidth': 0, 'vote': None, 'favorite': False, 'nsfw': None, 'section': None, 'account_url': None, 'account_id': 0, 'is_ad': False, 'in_most_viral': False, 'has_sound': False, 'tags': [], 'ad_type': 0, 'ad_url': '', 'in_gallery': False, 'deletehash': 'sZsRGK2j4OLQ8X4', 'name': '', 'link': 'https://i.imgur.com/L7IvsoX.png'}, 'success': True, 'status': 200}


In [26]:
image_ids

['VYgV1sU',
 '1cSNagE',
 'ODxNwZD',
 'JA2UaHX',
 'fQIND6u',
 'fzZwhvX',
 'y819mPV',
 'R4sxWE1',
 'Qo1n0nK',
 'L6kCCDJ',
 's2YSNAm',
 'xzmellF',
 'W9NeDcD']

In [34]:
album_update_url = "https://api.imgur.com/3/album/{}/add".format(album_id)

In [35]:
album_update_url

'https://api.imgur.com/3/album/jpMwxaoqVT4DU9g/add'

In [36]:
data = {
    'ids': image_ids
}

In [37]:
r = requests.post(album_update_url, data=data, headers=headers)

In [38]:
r.json()

{'data': {'error': 'You must own all the image ids to add them to album FkGY4fT',
  'request': '/3/album/jpMwxaoqVT4DU9g/add',
  'method': 'POST'},
 'success': False,
 'status': 403}
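
The payload above reports the failure through its `success` and `status` fields, so the calling code has to check them explicitly. Here is one possible way to turn such a payload into an exception. `ApiError` and `unwrap` are hypothetical helpers, and the dictionary is copied from the output above.

```python
class ApiError(Exception):
    """Raised when an API payload reports failure (hypothetical helper type)."""


def unwrap(payload):
    """Return payload['data'] on success, otherwise raise ApiError."""
    if not payload.get("success"):
        raise ApiError("{}: {}".format(payload.get("status"), payload["data"].get("error")))
    return payload["data"]


failed = {
    "data": {
        "error": "You must own all the image ids to add them to album FkGY4fT",
        "request": "/3/album/jpMwxaoqVT4DU9g/add",
        "method": "POST",
    },
    "success": False,
    "status": 403,
}

try:
    unwrap(failed)
except ApiError as exc:
    message = str(exc)
    print(message)
```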

# 2. Downloading files

![](https://pics.onsizzle.com/downloading-98-downloading-99-downloading-failed-11367153.png)

Downloading large files in chunks!

http://www.greenteapress.com/thinkpython/thinkpython.pdf

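```python
import requests

url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"
chunk_size = 256

# stream=True delays fetching the body until iter_content() is consumed
r = requests.get(url, stream=True)

with open("python.pdf", "wb") as f:
    for chunk in r.iter_content(chunk_size=chunk_size):
        f.write(chunk)
```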

In [46]:
url = "http://www.greenteapress.com/thinkpython/thinkpython.pdf"

In [47]:
chunk_size = 256

In [62]:
r = requests.get(url, stream=True)

In [55]:
from math import ceil

In [56]:
ceil(1.1)

2

In [58]:
total = ceil(int(r.headers['Content-Length']) / chunk_size)

In [59]:
total

3261
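
The `ceil` is needed because the last chunk is usually shorter than `chunk_size` but still counts as one iteration. The sketch below isolates that arithmetic. `chunk_count` is a hypothetical helper, the byte sizes are made-up examples, and the approach assumes the server sends a `Content-Length` header.

```python
from math import ceil


def chunk_count(content_length, chunk_size):
    """Number of chunks needed to read content_length bytes."""
    return ceil(content_length / chunk_size)


print(chunk_count(1024, 256))  # exactly 4 full chunks
print(chunk_count(1025, 256))  # 4 full chunks plus 1 leftover byte, so 5
```

By the same arithmetic, a total of 3261 chunks of 256 bytes corresponds to a file of between 834,561 and 834,816 bytes.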

In [61]:
from tqdm import tqdm_notebook

In [63]:
with open("python.pdf", 'wb') as f:
    for chunk in tqdm_notebook(r.iter_content(chunk_size=chunk_size), total=total):
        f.write(chunk)

HBox(children=(IntProgress(value=0, max=3261), HTML(value='')))

In [65]:
for x in tqdm_notebook(range(10000000)):
    pass

HBox(children=(IntProgress(value=0, max=10000000), HTML(value='')))




# 3. Web scraping

![](https://image.slidesharecdn.com/scrapingtotherescue-160713133749/95/getting-started-with-web-scraping-in-python-9-638.jpg?cb=1468417631)


## [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)

>Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.

### Installation

```
pip install bs4
```

**Bonus:**
```
pip install html5lib
```

https://www.passiton.com/inspirational-quotes

In [66]:
url = "https://www.passiton.com/inspirational-quotes"

In [67]:
r = requests.get(url)

In [68]:
r.content

b'<!DOCTYPE html>\n<html dir="ltr" lang="en-US">\n<head>\n    <meta http-equiv="content-type" content="text/html; charset=utf-8" />\n    <link href="https://fonts.googleapis.com/css?family=Roboto:900|Lato:300,400,400italic,600,700|Raleway:300,400,500,600,700|Crete+Round:400italic|Zilla+Slab" rel="stylesheet" type="text/css" />\n    <link rel="stylesheet" media="all" href="/assets/application-f7c42ac1a26bc766307e42ff5df71b58.css" />\n\n    <meta charset="utf-8" />\n    <meta name="viewport" content="width=device-width, initial-scale=1" />\n    <!--[if lt IE 9]><script src="http://css3-mediaqueries-js.googlecode.com/svn/trunk/css3-mediaqueries.js"></script><![endif]-->\n    <title>Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com</title>\n    <meta name="description" content="Find the perfect quotation from our hand-picked collection of inspiring quotes by hundreds of authors." />\n    <meta name="csrf-param" content="authenticity_token" />\n<meta name="csrf-t

In [69]:
from bs4 import BeautifulSoup

In [72]:
soup = BeautifulSoup(r.content, 'html5lib')

In [73]:
print(soup.prettify())

<!DOCTYPE html>
<html dir="ltr" lang="en-US">
 <head>
  <meta content="text/html; charset=utf-8" http-equiv="content-type"/>
  <link href="https://fonts.googleapis.com/css?family=Roboto:900|Lato:300,400,400italic,600,700|Raleway:300,400,500,600,700|Crete+Round:400italic|Zilla+Slab" rel="stylesheet" type="text/css"/>
  <link href="/assets/application-f7c42ac1a26bc766307e42ff5df71b58.css" media="all" rel="stylesheet"/>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <!--[if lt IE 9]><script src="http://css3-mediaqueries-js.googlecode.com/svn/trunk/css3-mediaqueries.js"></script><![endif]-->
  <title>
   Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com
  </title>
  <meta content="Find the perfect quotation from our hand-picked collection of inspiring quotes by hundreds of authors." name="description"/>
  <meta content="authenticity_token" name="csrf-param"/>
  <meta content="xPcLg04+AnBnXXbyyT4OLbGnNrsz3emGNA

In [77]:
article_divs = soup.find_all('article')

In [79]:
article_div = article_divs[0]

In [80]:
article_div

<article class="portfolio-item quotation optimism">
    <div class="portfolio-image">
        <a href="/inspirational-quotes/7856-the-secret-of-change-is-to-focus-all-of-your"><img alt="The secret of change is to focus all of your energy, not on fighting the old, but on building the new. #&lt;Author:0x007f2246ce37a8&gt;" class="hover" src="https://quotes.values.com/quote_artwork/7856/medium/20190103_thursday_quote.jpg?1546034335"/></a>
    </div>
</article>

In [84]:
article_div.div.a.img['alt']

'The secret of change is to focus all of your energy, not on fighting the old, but on building the new. #<Author:0x007f2246ce37a8>'

In [87]:
article_div.div.a.img['src']

'https://quotes.values.com/quote_artwork/7856/medium/20190103_thursday_quote.jpg?1546034335'

In [90]:
soup.head.title.text

'Inspirational Quotes - Motivational Quotes - Leadership Quotes | PassItOn.com'

In [91]:
articles = []

In [92]:
for article_div in article_divs:
    article = {}
    article['img'] = article_div.div.a.img['src']
    article['txt'] = article_div.div.a.img['alt']
    articles.append(article)
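
The loop above assumes every `<article>` contains a `div > a > img` chain; an article without an image would raise an error. To show the extraction logic in isolation, here is a dependency-free sketch that uses the standard library's `html.parser` instead of Beautiful Soup. `ArticleImageParser`, the HTML snippet, and the `example.com` URL are made up for illustration. It collects `src` and `alt` for each image inside an article and skips articles that have none.

```python
from html.parser import HTMLParser

html = """
<article class="portfolio-item quotation optimism">
    <div class="portfolio-image">
        <a href="/inspirational-quotes/7856"><img alt="The secret of change" src="https://example.com/7856.jpg"/></a>
    </div>
</article>
<article class="portfolio-item quotation">
    <div class="portfolio-image"></div>
</article>
"""


class ArticleImageParser(HTMLParser):
    """Collect src/alt of every <img> found inside an <article>."""

    def __init__(self):
        super().__init__()
        self.depth = 0  # greater than 0 while inside an <article>
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1
        elif tag == "img" and self.depth > 0:
            attributes = dict(attrs)
            self.articles.append({"img": attributes.get("src"), "txt": attributes.get("alt")})

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1


parser = ArticleImageParser()
parser.feed(html)
print(parser.articles)
```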

In [94]:
import json

In [95]:
with open("quotes.json", 'w') as f:
    f.write(json.dumps(articles))

In [96]:
import csv

In [100]:
with open("quotes.csv", 'w', newline='') as f:
    csv_writer = csv.DictWriter(f, ['img', 'txt'])
    csv_writer.writeheader()
    csv_writer.writerows(articles)
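
Quote text often contains commas and quotation marks, which is why the `csv` module is used rather than joining strings by hand. The sketch below writes two made-up rows to an in-memory buffer and reads them back to show that such values survive the round trip. The URLs and texts are placeholders.

```python
import csv
import io

articles = [
    {"img": "https://example.com/1.jpg", "txt": "First quote, with a comma"},
    {"img": "https://example.com/2.jpg", "txt": 'Second "quoted" text'},
]

# An in-memory text buffer stands in for the file on disk.
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["img", "txt"])
writer.writeheader()
writer.writerows(articles)

# Rewind and parse the CSV text back into dictionaries.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored == articles)
```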

![](http://www.entropywebscraping.com/wp-content/uploads/2017/02/Screenshot-from-2017-02-01-10-23-00.png)

# 4. Web automation
 
 ![](https://images.contentful.com/qs7jgwzogkzr/6HeUbprAsMYek2Keqi0WYo/d8ad7cf2f15e706ead76e00a53859cc7/testing-automation-alternatives.jpg)
 
 **Task:** Automatically submit the code for a [problem](https://www.codechef.com/problems/TEST) on [codechef](https://www.codechef.com/).
 
 ### [Selenium](http://selenium-python.readthedocs.io/) : Web automation and testing
 
 ![](https://udemy-images.udemy.com/course/750x422/482754_7146_4.jpg)
 
 
 #### Installation
 
 - To install python bindings for selenium:
     ```
     pip install selenium
     ```
     
 - To install webdriver:
 
     http://selenium-python.readthedocs.io/installation.html#drivers
     
     [How to put webdriver in PATH?](https://stackoverflow.com/questions/40208051/selenium-using-python-geckodriver-executable-needs-to-be-in-path)
 
 #### To start a browser session
 ```python
 from selenium import webdriver
 browser = webdriver.Chrome()
 ```
 
 #### To open a webpage
 ```python
 browser.get('https://www.codechef.com')
 ```
 
 #### To select an element by its id
 ```python
 # Selenium 4.3+ spells this browser.find_element(By.ID, "edit-name")
 element = browser.find_element_by_id("edit-name")
 ```
 
 #### Input value in element
 ```python
 element.send_keys("text to type")
 ```
 
 #### Click on an element
 ```python
 element.click()
 ```

In [101]:
from selenium import webdriver

In [133]:
browser = webdriver.Chrome()

In [134]:
browser.get("https://www.codechef.com")

In [135]:
user_element = browser.find_element_by_id('edit-name')

In [136]:
user_element.send_keys("nikhilksingh97")

In [137]:
pass_element = browser.find_element_by_id('edit-pass')

In [138]:
from getpass import getpass

In [None]:
pass_element.send_keys(getpass("Enter password:"))

In [140]:
browser.find_element_by_id('edit-submit').click()

In [141]:
browser.get("https://www.codechef.com/submit/TEST/")

In [143]:
browser.find_element_by_id('edit_area_toggle_checkbox_edit-program').click()

In [144]:
with open("solution.cpp", 'r') as f:
    code = f.read()

In [146]:
browser.find_element_by_id('edit-program').send_keys(code)

In [147]:
browser.find_element_by_xpath('//*[@id="edit-language"]/option[2]').click()

In [148]:
browser.find_element_by_id('edit-submit-1').click()

In [150]:
soup = BeautifulSoup(browser.page_source, 'html5lib')

In [156]:
soup.find('div', {'id': 'result-box'}).div.strong.text

'Correct Answer'

In [157]:
requests.__version__

'2.21.0'

![](https://i.imgflip.com/poxkz.jpg)

## Resources:

- Python packages:

    - [requests](http://docs.python-requests.org/en/master/)

    - [bs4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)
    
    - [html5lib](https://html5lib.readthedocs.io/en/latest/)
 

- Articles:

    - https://indianpythonista.wordpress.com/2016/12/10/get-and-post-requests-using-python/

    - https://indianpythonista.wordpress.com/2016/10/18/requests-http-for-pythonistas/

    - https://indianpythonista.wordpress.com/2016/12/10/downloading-files-from-web-using-python/

    - https://indianpythonista.wordpress.com/2016/12/10/implementing-web-scraping-in-python-with-beautiful-soup/


- Videos:

    - File downloader: https://www.youtube.com/watch?v=Xhw2l-hzoKk
    - Web scraping: https://www.youtube.com/watch?v=lIkd_jt28i0&t=557s