# ch18_The_Web_Untangled

### The design of the World Wide Web comes down to three simple ideas:
1. HTTP (Hypertext Transfer Protocol): a protocol for web clients and servers to interchange requests and responses.
2. HTML (Hypertext Markup Language): a presentation format for results.
3. URL (Uniform Resource Locator): a way to uniquely represent a server and a resource on that server.

Almost every computer language has been used to write web clients and
web servers. The dynamic languages Perl, PHP, and Ruby have been
especially popular. In this chapter, I show why Python is a particularly good language for web work at every level:
1. Clients: to access remote sites
2. Servers: to provide data for websites and web APIs
3. Web APIs and services: to interchange data in other ways than viewable web pages


## Web Clients
1. A web client opens a TCP/IP connection to a server and sends the URL and other information via HTTP.
2. Each HTTP connection is independent of the others. This simplifies basic web operations but complicates others. Here are just a few samples of the challenges:
   1. Caching: Remote content that doesn’t change should be saved by the web client and used to avoid downloading from the server again.
   2. Sessions: A shopping website should remember the contents of your shopping cart.
   3. Authentication: Sites that require your username and password should remember them while you’re logged in.

#### Test with telnet
1. On Windows 10, enable the Telnet Client feature first.
2. In CMD: telnet www.google.com 80
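
What you type into that telnet session is plain text. As a rough sketch of the same exchange in Python: the helper names build_request and fetch_raw are made up for these notes, and HTTP/1.0 is chosen here only so that the server closes the connection when it has finished answering.

```python
import socket

def build_request(host, path="/"):
    """Return the bytes of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",   # an empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

def fetch_raw(host, port=80, path="/"):
    """Send the request over a TCP socket and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # the server closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)

# Show the text that would go over the wire (no network access needed).
print(build_request("www.google.com").decode("ascii"))
```

Calling fetch_raw("www.google.com") should return the status line, the headers, and the page body as one block of bytes, which is roughly what telnet displays.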

#### Test with curl (https://curl.se/)
The curl program is probably the most popular command-line web client.
Documentation includes the book Everything Curl (https://curl.se/book.html), in HTML, PDF, and
ebook formats. A table (https://curl.se/docs/comparison-table.html) compares curl with similar tools. The download
page (https://curl.se/download.html) includes all the major platforms, and many obscure ones.

1. In CMD: curl http://www.example.com
2. To use HEAD instead, in CMD: curl --head http://www.example.com
3. If you’re passing arguments, you can include them in the command line or a data file. In these examples, I use the following:
   1. url for any website
   2. data.txt as a text data file with these contents: a=1&b=2
   3. data.json as a JSON data file with these contents: {"a":1, "b":2}
   4. a=1&b=2 as two data arguments
4. Using default (form-encoded) arguments:
$ curl -X POST -d "a=1&b=2" url
$ curl -X POST -d "@data.txt" url
5. For JSON-encoded arguments (JSON needs double quotes around keys, and the Content-Type header tells the server the body is JSON):
$ curl -X POST -d '{"a":1,"b":2}' -H "Content-Type: application/json" url
$ curl -X POST -d "@data.json" -H "Content-Type: application/json" url
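
To see what the two encodings look like as request bodies, here is a small sketch using only the standard library. The dictionary args is just the a=1, b=2 example from the curl commands; nothing is sent over the network.

```python
import json
from urllib.parse import urlencode, parse_qs

args = {"a": 1, "b": 2}

# Form encoding: the body that curl -d "a=1&b=2" sends.
form_body = urlencode(args)
# JSON encoding: the kind of text stored in data.json.
json_body = json.dumps(args)

print(form_body)
print(json_body)

# Decoding shows the difference: form values come back as lists of
# strings, while JSON keeps the integers.
print(parse_qs(form_body))
print(json.loads(json_body))
```

The design point is that form encoding has no types, so a server has to convert "1" to a number itself, while JSON carries numbers, lists, and nesting directly.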

In [None]:
#### Test with httpie
A more Pythonic alternative to curl is httpie.
1.  pip install httpie 

$ http -f POST url a=1 b=2
$ http POST -f url < data.txt
The default encoding is JSON:
$ http POST url a=1 b=2
$ http POST url < data.json

## Python’s Standard Web Libraries
Every HTTP response carries a status code. The codes fall into five ranges:
1. 1xx (information): The server received the request but has some extra information for the client.
2. 2xx (success): It worked; every success code other than 200 conveys extra details.
3. 3xx (redirection): The resource moved, so the response returns the new URL to the client.
4. 4xx (client error): Some problem from the client side, such as the well-known 404 (not found). 418 (I’m a teapot) was an April Fool’s joke.
5. 5xx (server error): 500 is the generic whoops; you might see a 502 (bad gateway) if there’s some disconnect between a web server and a backend application server.
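
The ranges above can be checked from Python with the standard library's http.HTTPStatus enum. This is only a sketch; the helper status_class and its wording of the range names are made up for these notes.

```python
from http import HTTPStatus

def status_class(code):
    """Map a numeric status code to the name of its range."""
    names = {
        1: "information",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of the code selects the range.
    return names.get(code // 100, "unknown")

for code in (200, 301, 404, 500, 502):
    status = HTTPStatus(code)
    print(code, status.phrase, "->", status_class(code))
```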

Python’s standard library bundles its web client and server modules into two packages (remember from Chapter 11 that a package is just a directory containing module files):
1. http manages all the client-server HTTP details:
   1. client does the client-side stuff
   2. server helps you write Python web servers
   3. cookies and cookiejar manage cookies, which save data between site visits
2. urllib runs on top of http:
   1. request handles the client request
   2. response handles the server response
   3. parse cracks the parts of a URL
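
As a quick illustration of what "cracks the parts of a URL" means, this sketch runs urllib.parse over a search URL. The query string is written by hand for illustration and is not taken from a real request.

```python
from urllib.parse import urlparse, parse_qs

url = "http://archive.org/advancedsearch.php?q=title:(wendigo)&output=json&rows=50"

parts = urlparse(url)
print(parts.scheme)    # the protocol
print(parts.netloc)    # the server
print(parts.path)      # the resource on that server

# parse_qs turns the query string into a dict of lists of strings.
query = parse_qs(parts.query)
print(query["q"], query["output"], query["rows"])
```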

In [None]:
import urllib.request as ur

url = 'http://www.example.com/'
conn = ur.urlopen(url)
print(conn.status)

data = conn.read()
str_data = data.decode('utf8')
print(str_data)
print("===================================================")
print(str_data[:50])

In [None]:
# what HTTP headers were sent back to us?
for key, value in conn.getheaders():
    print(f'key = {key}, value = {value}')


## Beyond the Standard Library: requests
1. https://requests.kennethreitz.org/en/master/
2. pip install requests

In [None]:
import requests
resp = requests.get('http://example.com')
print(resp)
print(resp.status_code)
resp.text

In [None]:
# Example 18-1. ia.py #
# ====================#
# To show a JSON query
import json
import sys
import requests

def search(title):
    url = "http://archive.org/advancedsearch.php"
    params = {"q": f"title:({title})",
              "output": "json",
              "fields": "identifier,title",
              "rows": 50,
              "page": 1,}
    resp = requests.get(url, params = params )
    return resp.json()

if __name__ == "__main__":
    title = sys.argv[1]   # run: python ia.py wendigo, so sys.argv[1] == "wendigo"
    data = search(title)
    docs = data["response"]["docs"]
    print(f"Found {len(docs)} items, showing first 10")
    print("identifier\ttitle")
    for row in docs[:10]:
        print(row["identifier"], row["title"], sep="\t")

In [None]:
%run ia.py wendigo

# Web Servers
This section covers the Python web servers and frameworks that I’ve found to be relatively simple to use and suitable for real websites. I’ll also show how to run the dynamic parts of a website with Python and other parts with a traditional web server.

In [None]:
# In CMD: python -m http.server  (the simplest Python web server)
# In a web browser, type: http://localhost:8000
# To use a different port: python -m http.server 9999

### Web Server Gateway Interface (WSGI)
Python web development made a leap with the definition of the Web Server
Gateway Interface (WSGI), a universal API between Python web
applications and web servers. All of the Python web frameworks and web
servers in the rest of this chapter use WSGI. This is a synchronous
connection: one step follows another.
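
To make the interface concrete, here is a minimal sketch of a WSGI application and of the call a server makes to it. The function call_app is a made-up helper for these notes; it uses wsgiref.util.setup_testing_defaults from the standard library to build a request environment without opening a socket.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A WSGI app: takes the request environ, returns an iterable of bytes."""
    path = environ.get("PATH_INFO", "/")
    body = f"You asked for {path}".encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

def call_app(app, path):
    """Call the app the way a server would, without any network."""
    environ = {}
    setup_testing_defaults(environ)   # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body

print(call_app(application, "/echo/Mothra"))
```

The same application object could be handed to any WSGI server; for a quick local try, wsgiref.simple_server.make_server('localhost', 9999, application).serve_forever() should serve it.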

### ASGI
ASGI (Asynchronous Server Gateway Interface) is the asynchronous counterpart of WSGI.
In Appendix C, you’ll see more discussion, and
examples of new web frameworks that use ASGI.
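
For comparison, an ASGI application is an async callable that receives a scope dictionary plus receive and send coroutines. The sketch below drives one by hand with asyncio instead of a real ASGI server; call_app is a made-up helper, and the scope it builds holds only the few keys this toy app reads, far fewer than a real server supplies.

```python
import asyncio

async def application(scope, receive, send):
    """A minimal ASGI app that answers every HTTP request with plain text."""
    assert scope["type"] == "http"
    body = f"You asked for {scope['path']}".encode("utf-8")
    # An HTTP response is sent as two messages: the start, then the body.
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": body})

async def call_app(app, path):
    """Drive the app by hand and collect the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app({"type": "http", "method": "GET", "path": path}, receive, send)
    return sent

messages = asyncio.run(call_app(application, "/echo/Gamera"))
print(messages[0]["status"], messages[1]["body"])
```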

### Apache
1. The Apache web server’s best WSGI module is mod_wsgi (https://code.google.com/p/modwsgi). This can run
Python code within the Apache process or in separate processes that
communicate with Apache.
2. http://httpd.apache.org/

In [None]:
# Example 18-2. home.wsgi #
#=========================#
import bottle
application = bottle.default_app()

@bottle.route('/')
def home():
    return "apache and wsgi, sitting in a tree"

### NGINX 
The NGINX (http://nginx.org/) web server does not have an embedded Python module. Instead,
it’s a frontend to a separate WSGI server such as uWSGI or Gunicorn.
Together they make a very fast and configurable platform for Python web
development.

1. https://www.nginx.com/resources/wiki/start/topics/tutorials/install/
2. https://www.nginx.com/
3. You can install NGINX from its website. For examples of setting up Flask
with NGINX and a WSGI server, see https://flask.palletsprojects.com/en/1.0.x/deploying/wsgi-standalone/

## Other Python Web Servers
Following are some of the independent Python-based WSGI servers that
work like Apache or NGINX, using multiple processes and/or threads (see
“Concurrency”) to handle simultaneous requests:
1. uwsgi
2. cherrypy
3. pylons

Here are some event-based servers, which use a single process but avoid
blocking on any single request:
1. tornado
2. gevent
3. gunicorn

I have more to say about events in the discussion about concurrency in
Chapter 15.

# Web Server Frameworks
1. Web servers handle the HTTP and WSGI details, but you use web
frameworks to actually write the Python code that powers the site.
2. A web framework handles, at a minimum, client requests and server responses. Most major web frameworks include these tasks:
   1. HTTP protocol handling
   2. Authentication (authn, or who are you?)
   3. Authorization (authz, or what can you do?)
   4. Establish a session
   5. Get parameters
   6. Validate parameters (required/optional, type, range)
   7. Handle HTTP verbs
   8. Route (functions/classes)
   9. Serve static files (HTML, JS, CSS, images)
   10. Serve dynamic data (databases, services)
   11. Return values and HTTP status
3. Optional features include:
   1. Backend templates
   2. Database connectivity, ORMs
   3. Rate limiting
   4. Asynchronous tasks

## Bottle
1. Bottle consists of a single Python file, so it’s very easy to try out, and it’s
easy to deploy later.
2. pip install bottle

In [None]:
# Example 18-3. bottle1.py #
# -------------------------#
from bottle import route, run

@route('/')
def home():
    return "It isn't fancy, but it's my home page"
run(host='localhost', port=9999)

# In CMD: python bottle1.py
# You should see this in your browser when you access
# http://localhost:9999/:
# It isn't fancy, but it's my home page

In [None]:
# Example 18-4. bottle2.py #
# =========================#
from bottle import route, run, static_file

@route('/')
def main():
    return static_file('index.html', root='.')
run(host='localhost', port=9999)

# In CMD: python bottle2.py
# In a browser: http://localhost:9999/

In [None]:
# Example 18-5. bottle3.py #
#==========================#
from bottle import route, run, static_file
@route('/')
def home():
    return static_file('index.html', root='.')

@route('/echo/<thing>')
def echo(thing):
    return "Say hello to my little friend: %s!" % thing
run(host='localhost', port=9999)

# python bottle3.py
# http://localhost:9999/echo/Mothra

In [None]:
# Example 18-6. bottle_test.py #
#==============================#
import requests

# The server from bottle3.py must already be running on port 9999.
resp = requests.get('http://localhost:9999/echo/Mothra')

if (resp.status_code == 200 and
        resp.text == 'Say hello to my little friend: Mothra!'):
    print('It worked! That almost never happens!')
else:
    print('Argh, got this:', resp.text)
    
# python bottle_test.py

## Flask
1. It’s my personal favorite among Python web frameworks because it balances ease of use with a rich feature set.
2. pip install flask

In [None]:
# Example 18-7. flask1.py #
# ======================= #
from flask import Flask
# Flask’s default directory home for static files is static, and URLs
# for files there also begin with /static. We change the folder to
# '.' (current directory) and the URL prefix to '' (empty) to allow
# the URL / to map to the file index.html.
app = Flask(__name__, static_folder='.', static_url_path='')

@app.route('/')
def home():
    return app.send_static_file('index.html')

@app.route('/echo/<thing>')
def echo(thing):
    return "Say hello to my little friend: %s" % thing

# In the run() function, setting debug=True also activates the
# automatic reloader
app.run(port=9999, debug=True)

In [None]:
# Step 1: run the server from a terminal or window. In CMD:
# python flask1.py

In [None]:
# Step2 Test the home page by typing this URL into your browser:
# http://localhost:9999/

# Step3 Try the /echo endpoint:
# http://localhost:9999/echo/Godzilla

In [None]:
# Example 18-8. flask2.html (save it in a directory named templates) #
# ==========================#
<html>
<head>
<title>Flask2 Example</title>
</head>
<body>
Say hello to my little friend: {{ thing }}
</body>
</html>

In [None]:
# Example 18-9. flask2.py #
# ======================= #
from flask import Flask, render_template
app = Flask(__name__)

@app.route('/')
def home():
    return app.send_static_file('index.html')

@app.route('/echo/<thing>')
def echo(thing):
    return render_template('flask2.html', thing=thing)

app.run(port=9999, debug=True)

# in CMD : python flask2.py
# Now, type this URL:
# http://localhost:9999/echo/Gamera
# You should see the following:
# Say hello to my little friend: Gamera


In [None]:
# flask3.html (also in the templates directory) #
#=============#
<html>
<head>
<title>Flask3 Example</title>
</head>
<body>
Say hello to my little friend: {{ thing }}.
Alas, it just destroyed {{ place }}!
</body>
</html>

In [None]:
# Example 18-10. flask3a.py #
# ==========================#
from flask import Flask, render_template
app = Flask(__name__)

@app.route('/echo/<thing>/<place>')
def echo(thing, place):
    return render_template('flask3.html', thing=thing, place=place)
app.run(port=9999, debug=True)

# In CMD: python flask3a.py
# This time, use this URL:
# http://localhost:9999/echo/Rodan/McKeesport


In [None]:
# Example 18-11. flask3b.py #
# ========================= #
from flask import Flask, render_template, request

app = Flask(__name__)

@app.route('/echo/')
def echo():
    thing = request.args.get('thing')
    place = request.args.get('place')
    return render_template('flask3.html', thing=thing, place=place)
app.run(port=9999, debug=True)

# python flask3b.py
# This time, use this URL:
# http://localhost:9999/echo?thing=Gorgo&place=Wilmerding

# Django
Django (https://www.djangoproject.com/) is a very popular Python web framework, especially for large sites. It’s worth learning for many reasons, including frequent requests for
Django experience in Python job ads. It includes ORM code (we talked
about ORMs in “The Object-Relational Mapper (ORM)”) to create
automatic web pages for the typical database CRUD functions (create,
read, update, delete) that we looked at in Chapter 16.

# Other Frameworks
1. You can compare the frameworks by viewing this online table:
https://wiki.python.org/moin/WebFrameworks
2. fastapi: handles both synchronous (WSGI) and asynchronous
(ASGI) calls, uses type hints, generates test pages, and is well
documented. Recommended.
3. web2py:  covers much the same ground as django, with a different
style.
4. pyramid: grew from the earlier pylons project, and is similar to
django in scope.
5. turbogears supports an ORM, many databases, and multiple
template languages.
6. wheezy.web is a newer framework optimized for performance. It
was faster than the others in a recent test.
7. molten also uses type hints, but only supports WSGI.
8. apistar is similar to fastapi, but is more of an API validation tool
than a web framework.
9. masonite is a Python version of Ruby on Rails, or PHP’s Laravel.

# Database Frameworks
1. The web and databases are the peanut butter and jelly (an inseparable pair) of computing: where
you find one, you’ll eventually find the other. In real-life Python
applications, at some point you’ll probably need to provide a web interface
(site and/or API) to a relational database.
2. You could build your own with:
2_1. A web framework like Bottle or Flask
2_2. A database package, like db-api or SQLAlchemy
2_3. A database driver, like pymysql

#### Instead, you could use a web/database package like one of these:

1. connexion : https://connexion.readthedocs.io/en/stable/
2. datasette : https://docs.datasette.io/en/stable/
3. sandman2 : https://github.com/jeffknupp/sandman2
4. flask-restless : https://flask-restless.readthedocs.io/en/stable/

Your database may not be a relational one. If your data schema varies
significantly—columns that differ markedly across rows—it might be
worthwhile to consider a schemaless database, such as one of the NoSQL
databases discussed in Chapter 16.

# Web Services and Automation

### webbrowser
Let’s begin with a little surprise. Start a Python session in a terminal
window and type the following:

In [None]:
import antigravity

In [None]:
import webbrowser
url = 'http://www.python.org/'
webbrowser.open(url)

In [None]:
webbrowser.open_new(url)

In [None]:
webbrowser.open_new_tab('http://www.python.org/')

### webview
1. Rather than calling your browser as webbrowser does, webview displays
the page in its own window, using your machine’s native GUI.
2. For Windows: $ pip install pywebview[cef]

In [None]:
import webview
url = input("URL? ")   # for example: http://time.gov
webview.create_window(f"webview display of {url}", url)
webview.start()        # pywebview 3.0 and later need this call to show the window

## Web APIs (Application Programming Interfaces) and REST (Representational State Transfer)
1. Often, data is available only within web pages. If you want to access it, you
need to access the pages through a web browser and read it. If the authors
of the website made any changes since the last time you visited, the location
and style of the data might have changed.
2. Instead of publishing web pages, you can provide data through a web
application programming interface (API). Clients access your service by
making requests to URLs and getting back responses containing status and
data. Instead of HTML pages, the data is in formats that are easier for
programs to consume, such as JSON or XML (refer to Chapter 16 for more
about these formats).
3. Representational State Transfer (REST): services often describe themselves as having a REST interface or a RESTful
interface. In practice, this often only means that they have a web interface:
definitions of URLs to access a web service.
4. A RESTful service uses the HTTP verbs in specific ways:
   1. HEAD gets information about the resource, but not its data.
   2. GET retrieves the resource’s data from the server. This is the
   standard method used by your browser. GET should not be used to create, change, or delete data.
   3. POST creates a new resource.
   4. PUT replaces an existing resource, creating it if it doesn’t exist.
   5. PATCH partially updates a resource.
   6. DELETE deletes. Truth in advertising!
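
The verb rules above can be sketched without any web framework. ResourceStore below is a hypothetical in-memory toy, not part of any library, and the status codes it returns (201, 204, 404, 405) are common conventions chosen for this sketch rather than anything these notes prescribe.

```python
class ResourceStore:
    """Toy in-memory collection of resources keyed by id."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def handle(self, verb, item_id=None, data=None):
        """Return (status_code, body) for one request."""
        if verb == "POST":                 # create a new resource
            item_id = self.next_id
            self.next_id += 1
            self.items[item_id] = dict(data)
            return 201, {"id": item_id}
        if verb == "PUT":                  # replace, or create if missing
            created = item_id not in self.items
            self.items[item_id] = dict(data)
            return (201 if created else 200), self.items[item_id]
        if item_id not in self.items:      # the remaining verbs need an existing resource
            return 404, None
        if verb == "HEAD":                 # information only, no data
            return 200, None
        if verb == "GET":                  # read without changing anything
            return 200, self.items[item_id]
        if verb == "PATCH":                # partial update
            self.items[item_id].update(data)
            return 200, self.items[item_id]
        if verb == "DELETE":
            del self.items[item_id]
            return 204, None
        return 405, None                   # verb not supported

store = ResourceStore()
print(store.handle("POST", data={"name": "Mothra"}))
print(store.handle("PATCH", 1, {"place": "Tokyo"}))
print(store.handle("GET", 1))
print(store.handle("DELETE", 1))
print(store.handle("GET", 1))
```

In a real service, a framework such as Bottle or Flask would route each URL and verb to code like the branches of handle.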

# Crawl and Scrape
You could extract what you’re looking for manually by doing the following:
1. Type the URL into your browser.
2. Wait for the remote page to load.
3. Look through the displayed page for the information you want.
4. Write it down somewhere.
5. Possibly repeat the process for related URLs.

## An automated web fetcher is called a crawler or spider
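
As a rough sketch of what a crawler does, the code below automates the manual steps using only the standard library. The names LinkParser, extract_links, and crawl are made up for these notes, and the pages dictionary is a fake three-page site so the example runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkParser(HTMLParser):
    """Collect the href of every <a> tag in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))

def extract_links(base_url, html):
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links

def crawl(start_url, fetch, limit=10):
    """Breadth-first crawl; fetch is any function mapping a URL to HTML text."""
    seen, queue = set(), [start_url]
    while queue and len(seen) < limit:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        queue.extend(extract_links(url, fetch(url)))
    return seen

# A fake site held in memory, standing in for real downloads.
pages = {
    "http://example.com/": '<a href="/a">A</a> <a href="b.html">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b.html": "",
}
print(sorted(crawl("http://example.com/", lambda url: pages.get(url, ""))))
```

Against a live site, fetch could be something like lambda url: requests.get(url).text; the limit argument keeps the sketch from wandering across the whole web.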

### Scrapy
1.  pip install scrapy
2. Scrapy is a framework, not just a module such as BeautifulSoup. It does
more, but it’s more complex to set up. To learn more about Scrapy, read
“Scrapy at a Glance” (https://docs.scrapy.org/en/latest/intro/overview.html) and the tutorial (https://docs.scrapy.org/en/latest/intro/tutorial.html).

### BeautifulSoup
1. If you already have the HTML data from a website and just want to extract
data from it, BeautifulSoup(https://www.crummy.com/software/BeautifulSoup/)
is a good choice. 
2. pip install beautifulsoup4

In [None]:
# Example 18-13. links.py #
#=========================#
import sys

import requests
from bs4 import BeautifulSoup as soup

def get_links(url):
    """Return the href of every <a> tag on the page at url."""
    result = requests.get(url)
    page = result.text
    # Name the parser explicitly so bs4 doesn't have to guess one.
    doc = soup(page, 'html.parser')
    links = [element.get('href') for element in doc.find_all('a')]
    return links

if __name__ == '__main__':
    for url in sys.argv[1:]:
        print('Links in', url)
        # List the links for this URL before moving on to the next one.
        for num, link in enumerate(get_links(url), start=1):
            print(num, link)
        print()
# In CMD: python links.py http://boingboing.net

###  requests-html 
Kenneth Reitz, the author of the popular web client package requests, has
written a new scraping library called requests-html (for Python 3.6 and
newer versions): https://requests-html.kennethreitz.org/

## Let’s Watch a Movie
The program shown in Example 18-14 does the following:
1. Prompts you for part of a movie or video title
2. Searches for it at the Internet Archive
3. Returns a list of identifiers, names, and descriptions
4. Lists them and asks you to select one
5. Displays that video in your web browser

Save this as iamovies.py.

In [None]:
# Example 18-14. iamovies.py #
# ===========================#
"""Find a video at the Internet Archive
by a partial title match and display it."""
import sys
import webbrowser
import requests

# The search() function uses requests to access the URL
def search(title):
    """Return a list of 3-item tuples (identifier,
       title, description) about videos
       whose titles partially match :title."""
    search_url = "https://archive.org/advancedsearch.php"
    params = {
        "q": "title:({}) AND mediatype:(movies)".format(title),
        "fl": "identifier,title,description",
        "output": "json",
        "rows": 10,
        "page": 1,
        }
    resp = requests.get(search_url, params=params)
    data = resp.json()
    # Some items have no description, so fall back to an empty string.
    docs = [(doc["identifier"], doc["title"], doc.get("description", ""))
            for doc in data["response"]["docs"]]
    return docs

def choose(docs):
    """Print line number, title and truncated description for
       each tuple in :docs. Get the user to pick a line
       number. If it's valid, return the first item in the
       chosen tuple (the "identifier"). Otherwise, return None."""
    last = len(docs) - 1
    for num, doc in enumerate(docs):
        print(f"{num}: ({doc[1]}) {doc[2][:30]}...")
    index = input(f"Which would you like to see (0 to {last})? ")
    try:
        return docs[int(index)][0]
    except (ValueError, IndexError):
        # Not a number, or a number outside the list.
        return None

def display(identifier):
    """Display the Archive video with :identifier in the browser"""
    details_url = "https://archive.org/details/{}".format(identifier)
    print("Loading", details_url)
    webbrowser.open(details_url)

def main(title):
    """Find any movies that match :title.
    Get the user's choice and display it in the browser."""
    identifiers = search(title)
    if identifiers:
        identifier = choose(identifiers)
        if identifier:
            display(identifier)
        else:
            print("Nothing selected")
    else:
        print("Nothing found for", title)

if __name__ == "__main__":
    main(sys.argv[1])

In [None]:
# python iamovies.py eegah