# Chapter 17. Data in Space: Networks
There are many good reasons to challenge time and space:
1. Performance: Your goal is to keep fast components busy, not waiting for slow ones.
2. Robustness: There’s safety in numbers, so you want to duplicate tasks to work around hardware and software failures.
3. Simplicity: It’s best practice to break complex tasks into many little ones that are easier to create, understand, and fix.
4. Scalability: Increase your servers to handle load, decrease them to save money.

In this chapter, we work our way up from networking primitives to higher-level concepts.

# 1. TCP/IP
1. UDP (User Datagram Protocol): This is used for short exchanges. A datagram is a tiny message sent in a single burst, like a note on a postcard. The sender gets no confirmation that the message reached its destination.
2. TCP (Transmission Control Protocol): This protocol is used for longer-lived connections. It sends streams of bytes and ensures that they arrive in order without duplication. TCP sets up a secret handshake between sender and receiver to ensure a good connection.

### Sockets
1. The lowest level of network programming uses a socket.
2. In the following client and server code, address is a tuple of (address, port). The address is a string, which can be a name or an IP address. When your programs are just talking to one another on the same machine, you can use the name 'localhost' or the equivalent address string '127.0.0.1'.

### UDP
1. Open a command prompt (cmd.exe) in the directory that contains udp_server.py (Example 17-1) and udp_client.py (Example 17-2).
2. First, run: python udp_server.py
3. Then, in a second window, run: python udp_client.py
4. Note: UDP sends data in single chunks. It does not guarantee delivery.

In [None]:
# Example 17-1. udp_server.py
from datetime import datetime
import socket

server_address = ('localhost', 6789)   # tuple

max_size = 4096

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
# AF_INET means we’ll create an IP socket. SOCK_DGRAM means we’ll send and
# receive datagrams; in other words, we’ll use UDP.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create a UDP socket
# bind() listens for any data arriving at that IP address and port
server.bind(server_address)     # only the server needs to bind
# the server sits and waits for a datagram to come in (recvfrom)
data, client = server.recvfrom(max_size)    # wait for a client datagram
print('At', datetime.now(), client, 'said', data)
server.sendto(b'Are you talking to me?', client)
server.close()


In [None]:
# Example 17-2. udp_client.py
import socket
from datetime import datetime
server_address = ('localhost', 6789)
max_size = 4096

print('Starting the client at', datetime.now())
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # create a UDP socket
client.sendto(b'Hey!', server_address)       # the client sends a datagram first
data, server = client.recvfrom(max_size)     # then waits for the reply
print('At', datetime.now(), server, 'said', data)
client.close()

### TCP
1. Open a command prompt (cmd.exe) in the directory that contains tcp_client.py (Example 17-3) and tcp_server.py (Example 17-4).
2. First, run: python tcp_server.py
3. Then, in a second window, run: python tcp_client.py
4. TCP maintains the client-server connection across multiple socket calls and remembers the client’s IP address.

In [None]:
# Example 17-3. tcp_client.py
import socket
from datetime import datetime

address = ('localhost', 6789)
max_size = 1000

print('Starting the client at', datetime.now())
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # SOCK_STREAM creates a TCP socket
client.connect(address)                                    # connect() sets up the stream
client.sendall(b'Hey!')
data = client.recv(max_size)
print('At', datetime.now(), 'someone replied', data)
client.close()

In [None]:
# Example 17-4. tcp_server.py
from datetime import datetime
import socket

address = ('localhost', 6789)
max_size = 1000

print('Starting the server at', datetime.now())
print('Waiting for a client to call.')
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # SOCK_STREAM creates a TCP socket
server.bind(address)

server.listen(5)                                    # queue up to five client connections
client, addr = server.accept()                      # get the first available connection as it arrives

data = client.recv(max_size)                        # read at most max_size (1,000) bytes
print('At', datetime.now(), client, 'said', data)
client.sendall(b'Are you talking to me?')
client.close()
server.close()


### Sockets really operate at a low level:
1. UDP sends messages, but their size is limited and they’re not guaranteed to reach their destination.
2. TCP sends streams of bytes, not messages. You don’t know how many bytes the system will send or receive with each call.
3. To exchange entire messages with TCP, you need some extra information to reassemble the full message from its segments: a fixed message size (bytes), or the size of the full message, or some delimiting character.
4. Because messages are bytes, not Unicode text strings, you need to use the Python bytes type. For more information on that, see Chapter 12.

After all of this, if you find yourself interested in socket programming, check out the Python socket programming HOWTO for more details: https://docs.python.org/3/howto/sockets.html
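One way to exchange whole messages over TCP is to send the size of the full message first. The following is a minimal sketch of that idea, not code from the book: the helper names send_msg, recv_exactly, and recv_msg are made up for illustration, and the 4-byte length header is an assumed convention. It uses socket.socketpair() so that it runs in a single process.

```python
import socket
import struct

# Assumed convention: each message starts with a 4-byte unsigned length
# in network byte order, followed by that many payload bytes.
HEADER = struct.Struct('!I')

def send_msg(sock, payload):
    """Send one message: length header, then the payload bytes."""
    sock.sendall(HEADER.pack(len(payload)) + payload)

def recv_exactly(sock, count):
    """Call recv() repeatedly until exactly count bytes have arrived."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError('socket closed before the full message arrived')
        chunks.append(chunk)
        remaining -= len(chunk)
    return b''.join(chunks)

def recv_msg(sock):
    """Read the length header, then read that many payload bytes."""
    (length,) = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return recv_exactly(sock, length)

# Two connected sockets in one process stand in for a client and a server.
left, right = socket.socketpair()
send_msg(left, b'Hey!')
send_msg(left, b'Are you talking to me?')
print(recv_msg(right))
print(recv_msg(right))
left.close()
right.close()
```

Even though both messages may arrive in a single TCP segment, the receiver splits them correctly because it trusts the length header rather than the size of each recv() result.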

## Scapy
1. You may want to debug a web API, or track down some security issue. The scapy library and program provide a domain-specific language to create and inspect packets in Python.
2. Install it with: pip install scapy
3. The docs are extremely thorough: https://scapy.readthedocs.io/en/latest/

## Netcat
1. Another tool to test networks and ports is Netcat, often abbreviated to nc: https://en.wikipedia.org/wiki/Netcat
2. Here’s an example of an HTTP connection to Google’s website, requesting some basic information about its home page:

In [None]:
nc www.google.com 80
HEAD / HTTP/1.1

## Networking Patterns:
You can build networking applications from some basic patterns:
1. The most common pattern is request-reply, also known as request-response or client-server. This pattern is synchronous: the client waits until the server responds. You’ve seen many examples of request-reply in this book. Your web browser is also a client, making an HTTP request to a web server, which returns a reply.
2. Another common pattern is push, or fanout: you send data to any available worker in a pool of processes. An example is a web server behind a load balancer.
3. The opposite of push is pull, or fanin: you accept data from one or more sources. An example would be a logger that takes text messages from multiple processes and writes them to a single log file.
4. One pattern is similar to radio or television broadcasting: publish-subscribe, or pub-sub. With this pattern, a publisher sends out data. In a simple pub-sub system, all subscribers would receive a copy. More often, subscribers can indicate that they’re interested only in certain types of data (often called a topic), and the publisher will send just those. So, unlike the push pattern, more than one subscriber might receive a given piece of data. If there’s no subscriber for a topic, the data are ignored.
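The topic behavior of pub-sub can be sketched without any network at all. The Broker class below is a hypothetical in-process stand-in, written only to illustrate the rules described above; real systems such as Redis or ZeroMQ deliver over sockets and have their own APIs.

```python
from collections import defaultdict

class Broker:
    """Hypothetical in-process broker, for illustration only."""

    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, data):
        """Deliver data to every subscriber of topic; return how many got it."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, data)
        return len(callbacks)

broker = Broker()
inbox_a, inbox_b = [], []
broker.subscribe('persian', lambda topic, data: inbox_a.append((topic, data)))
broker.subscribe('persian', lambda topic, data: inbox_b.append((topic, data)))

broker.publish('persian', 'fedora')    # both subscribers receive a copy
broker.publish('siamese', 'bowler')    # no subscriber, so the data are ignored
print(inbox_a, inbox_b)
```

In a push (fanout) pattern, by contrast, only one of the two workers would have received the 'persian' message.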

# The Request-Reply Pattern

## ZeroMQ
1. ZeroMQ is a library, not a server.
2. ZeroMQ sockets do the things that you sort of expected plain sockets to do:
   (1). Exchange entire messages
   (2). Retry connections
   (3). Buffer data to preserve them when the timing between senders and receivers doesn’t line up
3. The online guide is well written and witty: https://zguide.zeromq.org/   
4. The Python examples are also viewable : https://github.com/booksbyus/zguide/tree/master/examples/Python

ZeroMQ socket types include:
1. REQ (synchronous request)
2. REP (synchronous reply)
3. DEALER (asynchronous request)
4. ROUTER (asynchronous reply)
5. PUB (publish)
6. SUB (subscribe)
7. PUSH (fanout)
8. PULL (fanin)

Install the Python library with: pip install pyzmq

In [None]:
# Example 17-5. zmq_server.py
import zmq
host = '127.0.0.1'
port = 6789
context = zmq.Context()
server = context.socket(zmq.REP)   # the socket type is REP (for REPly)
server.bind("tcp://%s:%s" % (host, port))

while True:
    # Wait for next request from client
    request_bytes = server.recv()
    request_str = request_bytes.decode('utf-8')
    print("That voice in my head says: %s" % request_str)
    reply_str = "Stop saying: %s" % request_str
    reply_bytes = bytes(reply_str, 'utf-8')
    server.send(reply_bytes)


In [None]:
#Example 17-6. zmq_client.py
import zmq
host = '127.0.0.1'
port = 6789
context = zmq.Context()
client = context.socket(zmq.REQ)    # the socket type is REQ (for REQuest)
client.connect("tcp://%s:%s" % (host, port))
for num in range(1, 6):
    request_str = "message #%s" % num
    request_bytes = request_str.encode('utf-8')
    client.send(request_bytes)
    reply_bytes = client.recv()
    reply_str = reply_bytes.decode('utf-8')
    print("Sent %s, received %s" % (request_str, reply_str))

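Start the server in the background with python zmq_server.py &, then start the client in the same window with python zmq_client.py. (The trailing & backgrounds a process in a Unix shell; in Windows cmd.exe, run the server in a separate window instead.)

The connection string starts with the network type. ZeroMQ supports these:
1. tcp between processes, on one or more machines
2. ipc between processes on one machine
3. inproc between threads in a single process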
![image.png](attachment:image.png)

## Other Messaging Tools
1. ZeroMQ is certainly not the only message-passing library that Python supports.
2. The Apache project, whose web server we saw in “Apache”, also maintains the ActiveMQ project, including several Python interfaces using the simple-text STOMP protocol: https://activemq.apache.org/ , http://stomp.github.io/implementations.html
3. RabbitMQ is also popular, and it has useful online Python tutorials. https://www.rabbitmq.com/ ,
   https://www.rabbitmq.com/tutorials/tutorial-one-python.html
4. NATS is a fast messaging system, written in Go. https://nats.io/


# The Publish-Subscribe Pattern 
### redis
1. Publish-subscribe is not a queue but a broadcast.
2. Start the subscriber first. In a command prompt, run: python redis_sub.py
3. Next, start the publisher. It will send 10 messages and then quit: python redis_pub.py

In [None]:
# Example 17-7. redis_pub.py (publisher)
import redis
import random
conn = redis.Redis()
cats = ['siamese', 'persian', 'maine coon', 'norwegian forest']
hats = ['stovepipe', 'bowler', 'tam-o-shanter', 'fedora']
for msg in range(10):
    cat = random.choice(cats)
    hat = random.choice(hats)
    print('Publish: %s wears a %s' % (cat, hat))
    conn.publish(cat, hat)

In [None]:
# Example 17-8. redis_sub.py (subscriber)
import redis
conn = redis.Redis()
topics = ['maine coon', 'persian']
sub = conn.pubsub()
sub.subscribe(topics)
for msg in sub.listen():
    if msg['type'] == 'message':
        cat = msg['channel']
        hat = msg['data']
        print('Subscribe: %s wears a %s' % (cat, hat))

### ZeroMQ
1. ZeroMQ has no central server, so each publisher writes to all subscribers.
2. Notice that we call send_multipart() in the publisher and recv_multipart() in the subscriber. This makes it possible for us to send multipart messages and use the first part as the topic.
3. Run in a command prompt: first python zmq_sub.py, then python zmq_pub.py

In [None]:
# Example 17-9. zmq_pub.py    publisher 
import zmq
import random
import time
host = '*'
port = 6789
ctx = zmq.Context()
pub = ctx.socket(zmq.PUB)
pub.bind('tcp://%s:%s' % (host, port))
cats = ['siamese', 'persian', 'maine coon', 'norwegian forest']
hats = ['stovepipe', 'bowler', 'tam-o-shanter', 'fedora']
time.sleep(1)
for msg in range(10):
    cat = random.choice(cats)
    cat_bytes = cat.encode('utf-8')
    hat = random.choice(hats)
    hat_bytes = hat.encode('utf-8')
    print('Publish: %s wears a %s' % (cat, hat))
    pub.send_multipart([cat_bytes, hat_bytes])


In [None]:
# Example 17-10. zmq_sub.py
import zmq
host = '127.0.0.1'
port = 6789
ctx = zmq.Context()
sub = ctx.socket(zmq.SUB)
sub.connect('tcp://%s:%s' % (host, port))
topics = ['maine coon', 'persian']
for topic in topics:
    sub.setsockopt(zmq.SUBSCRIBE, topic.encode('utf-8'))
while True:
    cat_bytes, hat_bytes = sub.recv_multipart()
    cat = cat_bytes.decode('utf-8')
    hat = hat_bytes.decode('utf-8')
    print('Subscribe: %s wears a %s' % (cat, hat))

### Other Pub-Sub Tools
1. RabbitMQ is a well-known messaging broker, and pika is a Python API for it. See the pika documentation (https://pika.readthedocs.io/) and a pub-sub tutorial (https://www.rabbitmq.com/tutorials/tutorial-three-python.html).
2. Go to the PyPI (https://pypi.org/) search window and type pubsub to find Python packages like pypubsub (https://sourceforge.net/projects/pubsub/).
3. PubSubHubbub (https://github.com/pubsubhubbub/) enables subscribers to register callbacks with publishers.
4. NATS(https://nats.io/) is a fast, open source messaging system that supports pub-sub, request-reply, and queuing.

# Internet Services
1. Python has an extensive networking toolset. The official, comprehensive documentation is available online: https://docs.python.org/3/library/internet.html

### Domain Name System
1. Computers have numeric IP addresses such as 85.2.101.94, but we remember names better than numbers. The Domain Name System (DNS) is a critical internet service that converts IP addresses to and from names via a distributed database.

In [3]:
import socket
# gethostbyname() returns the IP address for a domain name
socket.gethostbyname('www.crappytaxidermy.com')
# answer: '66.6.44.4'

# gethostbyname_ex() returns the name, a list of alternative names, and a list of addresses
socket.gethostbyname_ex('www.crappytaxidermy.com')


('crappytaxidermy.com', ['www.crappytaxidermy.com'], ['66.6.44.4'])

In [4]:
# The getaddrinfo() method looks up the IP address, but it also returns 
# enough information to create a socket to connect to it
socket.getaddrinfo('www.crappytaxidermy.com', 80)

[(<AddressFamily.AF_INET: 2>, 0, 0, '', ('66.6.44.4', 80))]

In [5]:
socket.getaddrinfo('www.crappytaxidermy.com', 80, socket.AF_INET,socket.SOCK_STREAM)

[(<AddressFamily.AF_INET: 2>,
  <SocketKind.SOCK_STREAM: 1>,
  0,
  '',
  ('66.6.44.4', 80))]

In [6]:
# These functions convert between service names and port numbers
import socket
socket.getservbyname('http')

80

In [7]:
socket.getservbyport(80)

'http'

### Python Email Modules
1. smtplib for sending email messages via Simple Mail Transfer Protocol (SMTP): https://docs.python.org/3/library/smtplib.html
2. email for creating and parsing email messages: https://docs.python.org/3/library/email.html
3. poplib for reading email via Post Office Protocol 3 (POP3): https://docs.python.org/3/library/poplib.html
4. imaplib for reading email via Internet Message Access Protocol (IMAP): https://docs.python.org/3/library/imaplib.html
5. If you want to write your own Python SMTP server, try smtpd (https://docs.python.org/3/library/smtpd.html; removed from the standard library in Python 3.12), or the newer asynchronous version aiosmtpd (https://aiosmtpd.readthedocs.io/en/latest/).
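As a small illustration of the email module, the sketch below creates a message, converts it to the bytes that would go over the wire, and parses those bytes back. The addresses and text are placeholders, and nothing is actually sent; handing the message to smtplib would be a separate step.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Build a message (placeholder addresses, for illustration only)
msg = EmailMessage()
msg['From'] = 'sender@example.com'
msg['To'] = 'receiver@example.com'
msg['Subject'] = 'Cat hats'
msg.set_content('The persian wears a fedora.')

wire_bytes = msg.as_bytes()     # the serialized form of the message

# Parse the bytes back into a message object
parsed = message_from_bytes(wire_bytes, policy=policy.default)
print(parsed['Subject'])
print(parsed.get_content())
```

Passing policy=policy.default makes the parser return the modern EmailMessage class, which provides get_content().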

### Other Protocols
1. Using the standard ftplib module, you can push bytes around by using the
File Transfer Protocol (FTP). Although it’s an old protocol, FTP still performs very well.
https://docs.python.org/3/library/ftplib.html
2. You’ve seen many of these modules in various places in this book, but also
try the documentation for standard library support of internet protocols.
https://docs.python.org/3/library/internet.html    

# Web Services and APIs
Here are some interesting service APIs:
1. New York Times: https://developer.nytimes.com/
2. Twitter: https://python-twitter.readthedocs.io/en/latest/
3. Facebook: https://developers.facebook.com/tools
4. Weather Underground: http://www.wunderground.com/weather/api
5. Marvel Comics: http://developer.marvel.com/

# Data Serialization
1. The conversion between data in memory and byte sequences “on the wire” is called serialization or marshaling. JSON is a popular serialization format.
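A minimal sketch of a JSON round trip, using only the standard json module; the record contents are made up for illustration:

```python
import json

record = {'cat': 'persian', 'hat': 'fedora', 'lives': 9}

# Serialize: Python data -> JSON text -> bytes ready for the wire
wire_bytes = json.dumps(record).encode('utf-8')

# Deserialize: bytes -> JSON text -> Python data
restored = json.loads(wire_bytes.decode('utf-8'))

print(wire_bytes)
print(restored == record)
```

JSON handles only basic types (dicts, lists, strings, numbers, booleans, and None), which is one reason other formats exist.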

### Serialize with pickle
1. Python provides the pickle module to save and restore any object in a special binary format.
2. Use dump() to pickle to a file, and load() to unpickle from one. Use dumps() and loads() to pickle to and from a bytes object.
3. If pickle can’t serialize your data format, a newer third-party package called dill might.
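The dump() and load() calls mentioned above work with file objects opened in binary mode. Here is a short sketch; the dictionary contents and the file name hats.pickle are made up for illustration, and the file goes into a temporary directory.

```python
import os
import pickle
import tempfile

hats = {'persian': 'fedora', 'siamese': 'bowler'}

# Write the pickle to a file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'hats.pickle')
with open(path, 'wb') as outfile:
    pickle.dump(hats, outfile)

# Read it back
with open(path, 'rb') as infile:
    restored = pickle.load(infile)

print(restored == hats)
```

Only unpickle data that you trust: loading a pickle can run arbitrary code.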

In [12]:
import pickle
import datetime
now1 = datetime.datetime.utcnow()
pickled = pickle.dumps(now1)      # convert to the binary pickle format
print(pickled)
now2 = pickle.loads(pickled)      # convert back to a datetime
print(type(now2))
print(now1)   #datetime.datetime(2014, 6, 22, 23, 24, 19, 195722)
print(now2)   #datetime.datetime(2014, 6, 22, 23, 24, 19, 195722)

b'\x80\x04\x95*\x00\x00\x00\x00\x00\x00\x00\x8c\x08datetime\x94\x8c\x08datetime\x94\x93\x94C\n\x07\xe5\t\x06\x06/4\x05B(\x94\x85\x94R\x94.'
<class 'datetime.datetime'>
2021-09-06 06:47:52.344616
2021-09-06 06:47:52.344616


In [13]:
# pickle works with your own classes and objects, too

import pickle
class Tiny():
    def __str__(self):
        return 'tiny'

obj1 = Tiny()
print(obj1)

print(str(obj1))

pickled = pickle.dumps(obj1)  # convert to the binary pickle format
print(pickled)
obj2 = pickle.loads(pickled)  # convert back to a Tiny object
print(type(obj2))      # <class '__main__.Tiny'>
str(obj2)


tiny
tiny
b'\x80\x04\x95\x18\x00\x00\x00\x00\x00\x00\x00\x8c\x08__main__\x94\x8c\x04Tiny\x94\x93\x94)\x81\x94.'
<class '__main__.Tiny'>


'tiny'

### Other Serialization Formats
These binary data interchange formats are usually more compact and faster than XML or JSON:
1. MsgPack: http://msgpack.org/
2. Protocol Buffers: https://code.google.com/p/protobufa
3. Avro: http://avro.apache.org/docs/current
4. Thrift: http://thrift.apache.org/
5. Lima: https://lima.readthedocs.io/
6. Serialize is a Python frontend to other formats, including JSON, YAML, pickle, and MsgPack: https://pypi.org/project/Serialize

A benchmark of various Python serialization packages is available at https://oreil.ly/S3ESH

1. Some third-party packages interconvert objects and basic Python data types
(allowing further conversion to/from formats like JSON), and provide validation of the following:
(1) Data types
(2) Value ranges
(3) Required versus optional data
2. These include:
(1) Marshmallow : https://marshmallow.readthedocs.io/en/3.0
(2) Pydantic—uses type hints, so requires at least Python 3.6 : https://pydantic-docs.helpmanual.io/
(3) TypeSystem :  https://pydantic-docs.helpmanual.io/
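To make the three kinds of checks concrete, here is a hand-rolled sketch in plain Python. It is not the API of Marshmallow, Pydantic, or TypeSystem; the validate function and the schema layout are invented for illustration only.

```python
def validate(data, schema):
    """Return a list of error strings; an empty list means the data passed."""
    errors = []
    for field, rules in schema.items():
        if field not in data:
            if rules.get('required', False):
                errors.append('%s: required field is missing' % field)
            continue                      # optional and absent: nothing to check
        value = data[field]
        if not isinstance(value, rules['type']):        # data type check
            errors.append('%s: expected %s' % (field, rules['type'].__name__))
            continue
        if 'range' in rules:                            # value range check
            low, high = rules['range']
            if not low <= value <= high:
                errors.append('%s: %r is outside %r..%r' % (field, value, low, high))
    return errors

# Hypothetical schema: name is required; lives is optional but must be 0..9
cat_schema = {
    'name': {'type': str, 'required': True},
    'lives': {'type': int, 'range': (0, 9)},
}

print(validate({'name': 'Chester', 'lives': 9}, cat_schema))
print(validate({'lives': 12}, cat_schema))
```

The packages listed above do the same job declaratively, and also convert validated data to and from your own classes.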

# Remote Procedure Calls (RPCs) 
1. Your local machine:
   (1) Serializes your function arguments into bytes.
   (2) Sends the encoded bytes to the remote machine.
2. The remote machine:
   (1) Receives the encoded request bytes.
   (2) Deserializes the bytes back to data structures.
   (3) Finds and calls the service function with the decoded data.
   (4) Encodes the function results.
   (5) Sends the encoded bytes back to the caller.
3. And finally, the local machine that started it all decodes the bytes to return values.
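The steps above can be traced in a single process. In this sketch, JSON is an assumed wire format, the SERVICES registry and the function names are hypothetical, and a plain function call stands in for the network hop between the two machines.

```python
import json

def double(num):
    return num * 2

# Hypothetical registry of functions that the "remote" side exposes
SERVICES = {'double': double}

def remote_handle(request_bytes):
    """Remote side: decode the request, call the function, encode the result."""
    request = json.loads(request_bytes.decode('utf-8'))
    func = SERVICES[request['method']]
    result = func(*request['args'])
    return json.dumps({'result': result}).encode('utf-8')

def local_call(method, *args):
    """Local side: encode the call, 'send' it, and decode the reply."""
    request_bytes = json.dumps({'method': method, 'args': args}).encode('utf-8')
    reply_bytes = remote_handle(request_bytes)   # stands in for the network
    return json.loads(reply_bytes.decode('utf-8'))['result']

print(local_call('double', 7))
```

Real RPC libraries add the parts this sketch leaves out: the network transport, error reporting, and a proxy object so that the call looks like an ordinary function call.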

### XML RPC 
1. The standard library includes one RPC implementation that uses XML as the exchange format: xmlrpc.
2. You define and register functions on the server, and the client calls them as though they were imported.    

In [None]:
# Example 17-11. xmlrpc_server.py 
from xmlrpc.server import SimpleXMLRPCServer
def double(num):
    return num * 2
server = SimpleXMLRPCServer(("localhost", 6789)) # The server starts up on an address and port
server.register_function(double, "double")   # register the function to make it available to clients via RPC
server.serve_forever()  #  Finally, start serving and carry on.


In [None]:
# Example 17-12. xmlrpc_client.py
import xmlrpc.client
proxy = xmlrpc.client.ServerProxy("http://localhost:6789/")
num = 7
result = proxy.double(num)
print("Double %s is %s" % (num, result))

1. start the server and then run the client: (IN CMD)
2. python xmlrpc_server.py
3. python xmlrpc_client.py

### JSON RPC
1. There are many Python JSON-RPC libraries, but the simplest one I’ve found comes in two parts: client (https://oreil.ly/8npxf) and server (https://oreil.ly/P_uDr).
2. Install them with: pip install jsonrpcserver and pip install jsonrpcclient.

In [None]:
# Example 17-13. jsonrpc_server.py
from jsonrpcserver import method, serve

@method
def double(num):
    return num * 2

if __name__ == "__main__":
     serve()

In [None]:
# Example 17-14. jsonrpc_client.py    1.python jsonrpc_server.py &   2.python jsonrpc_client.py 
from jsonrpcclient import request
num = 7
response = request("http://localhost:5000", "double", num=num)
print("Double", num, "is", response.data.result)

### MessagePack RPC
1. The encoding library MessagePack has its own Python RPC implementation: https://github.com/msgpack-rpc/msgpack-rpc-python
2. Here’s how to install it: pip install msgpack-rpc-python (the package was last updated many years ago; after installing it, Jupyter Notebook stopped working)
3. In a command prompt, first run python msgpack_server.py, then run python msgpack_client.py

In [None]:
# Example 17-15. msgpack_server.py
from msgpackrpc import Server, Address
class Services():
    def double(self, num):
        return num * 2
server = Server(Services())
server.listen(Address("localhost", 6789))
server.start()

In [None]:
# Example 17-16. msgpack_client.py
from msgpackrpc import Client, Address
client = Client(Address("localhost", 6789))
num = 8
result = client.call('double', num)   # call the server's double() function directly
print("Double %s is %s" % (num, result))

### Zerorpc: Written by the developers of Docker
1. http://www.zerorpc.io/
2. Install it with: pip install zerorpc
3. zerorpc uses ZeroMQ and MsgPack to connect clients and servers. It magically exposes functions as RPC endpoints.
4. The site has many more examples: https://github.com/0rpc/zerorpc-python


In [None]:
# Example 17-17. zerorpc_server.py
import zerorpc
class RPC():
    def double(self, num):
        return 2 * num
server = zerorpc.Server(RPC())
server.bind("tcp://0.0.0.0:4242")
server.run()

In [None]:
# Example 17-18. zerorpc_client.py
import zerorpc
client = zerorpc.Client()
client.connect("tcp://127.0.0.1:4242")
num = 7
result = client.double(num)
print("Double", num, "is", result)

In a command prompt:
1. First: python zerorpc_server.py &
2. Next: python zerorpc_client.py

### gRPC : (Remember to learn) 
1. Google created gRPC(https://grpc.io/) as a portable and fast way to define and connect services. It encodes data as protocol buffers.
https://developers.google.com/protocol-buffers/
2. Install the Python parts from a command prompt:
   (1) gRPC: python -m pip install grpcio
   (2) gRPC tools: python -m pip install grpcio-tools
3. The Python client docs (https://grpc.io/docs/languages/python/quickstart/) are very detailed, so I’m giving only a brief overview here. You may also like this separate tutorial (https://www.semantics3.com/blog/a-simplified-guide-to-grpc-in-python-6c4e25f0c506/).

1. To use gRPC, you write a .proto file to define a service and its rpc methods.
2. An rpc method is like a function definition (describing its arguments and return types) and may specify one of these networking patterns:
   (1) Request-response (sync or async)
   (2) Request-streaming response
   (3) Streaming request-response (sync or async)
   (4) Streaming request-streaming response
3. Single responses can be blocking or asynchronous. Streaming responses are iterated.
4. Next, you would run the grpc_tools.protoc program to create Python code for the client and server. gRPC handles the serialization and network communication; you add your application-specific code to the client and server stubs.
5. gRPC is a top-level alternative to web REST APIs. It seems to be a better fit than REST for inter-service communication, and REST may be preferred for public APIs.

### Twirp
Twirp is similar to gRPC, but claims to be simpler. You define a .proto file
as you would with gRPC, and twirp can generate Python code to handle the
client and server ends.
https://blog.twitch.tv/en/2018/01/16/twirp-a-sweet-new-rpc-framework-for-go-5f2febbf35f/

# Remote Management Tools
1. Salt (https://docs.saltproject.io/en/getstarted/system/python.html) is written in Python. It started as a way to implement remote execution, but grew to a full-fledged systems management platform. Based on ZeroMQ rather than SSH, it can scale to thousands of servers.
2. Puppet (http://puppetlabs.com/) and Chef (https://www.chef.io/products/chef-infra) are popular and closely tied to Ruby.
3. The Ansible (https://www.ansible.com/) package, which like Salt is written in Python, is also comparable. It’s free to download and use, but support and some add-on packages require a commercial license. It uses SSH by default and does not require any special software to be installed on the machines that it will manage.

# Big Fat Data: 
1. Developers found that it was faster to distribute and analyze data on many
networked machines than on individual ones. They could use algorithms
that sounded simplistic but actually worked better overall with massively
distributed data. One of these is MapReduce, which spreads a calculation
across many machines and then gathers the results. It’s similar to working
with queues.
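The scatter-and-gather shape of MapReduce can be sketched on one machine. In this illustration, word counting is an assumed example task, the two chunks stand in for data stored on different machines, and the function names are made up.

```python
from collections import Counter
from functools import reduce

def map_chunk(lines):
    """Map step: count the words in one chunk of the data."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right

# Each inner list stands in for the data held by one machine
chunks = [
    ['the cat wears a hat', 'the hat is a fedora'],
    ['a cat is a cat'],
]

partials = map(map_chunk, chunks)     # on a cluster, these would run in parallel
totals = reduce(reduce_counts, partials, Counter())
print(totals['cat'], totals['a'])
```

Because each chunk is mapped independently and the merge step does not care about order, the work can be spread over as many machines as there are chunks.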


### Hadoop
1. After Google published its MapReduce results in a paper(https://static.googleusercontent.com/media/research.google.com/zh-TW//archive/mapreduce-osdi04.pdf), Yahoo followed
with an open source Java-based package named Hadoop (named after the
toy stuffed elephant of the lead programmer’s son).
2. The phrase big data applies here. Often it just means “data too big to fit on my machine”: data that exceeds the disk, memory, CPU time, or all of the above. To some organizations, if big data is mentioned somewhere in a question, the answer is always Hadoop. Hadoop copies data among machines, running them through map (scatter) and reduce (gather) programs, and saving the results on disk at each step.
3. This batch process can be slow. A quicker method called Hadoop streaming
works like Unix pipes, streaming the data through programs without
requiring disk writes at each step. You can write Hadoop streaming
programs in any language, including Python.
4. Many Python modules have been written for Hadoop, and some are
discussed in the blog post “A Guide to Python Frameworks for Hadoop”.(https://www.slideshare.net/InfoQ/a-guide-to-python-frameworks-for-hadoop, https://blog.cloudera.com/)
Spotify, known for streaming music, open sourced its Python component
for Hadoop streaming, Luigi. (https://github.com/spotify/luigi)
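Hadoop streaming programs read lines from standard input and write tab-separated key/value lines to standard output, with the framework sorting the mapper output by key before the reducer sees it. The sketch below imitates that pipeline in memory so that it runs without Hadoop; the word-count task and the function names are assumptions for illustration.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield '%s\t1' % word.lower()

def reducer(sorted_lines):
    """Sum the counts for each run of identical keys in the sorted input."""
    pairs = (line.split('\t') for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield '%s\t%d' % (word, sum(int(count) for _, count in group))

text = ['The cat wears a hat', 'the hat is a fedora']
shuffled = sorted(mapper(text))      # stands in for Hadoop's sort between steps
results = list(reducer(shuffled))
print(results)
```

In a real job, the mapper and reducer would be two separate scripts that loop over sys.stdin and print their output lines.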

1. Spark (http://spark.apache.org/docs/latest/index.html): A rival named Spark was designed to run 10 to 100 times faster than Hadoop. It can read and process any Hadoop data source and format. Spark includes APIs for Python and other languages. You can find the installation documents online (https://spark.apache.org/docs/latest/quick-start.html).
2. Disco (http://discoproject.org/): Another alternative to Hadoop is Disco, which uses Python for MapReduce processing and Erlang for communication. Alas, you can’t install it with pip; see the documentation (http://spark.apache.org/downloads.html) (https://disco.readthedocs.io/en/latest/start/download.html).
3. Dask (https://dask.org/): Dask is similar to Spark, although it’s written in Python and is largely used with scientific Python packages like NumPy, Pandas, and scikit-learn. It can spread tasks across thousand-machine clusters. To get Dask and all of its extra helpers: pip install dask[complete]
4. See Chapter 22 for related examples of parallel programming, in which a large structured calculation is distributed among many machines.

# Clouds
The eight fallacies of distributed computing, according to Peter
Deutsch, are as follows:
1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn’t change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

The big cloud vendors are:
1. Amazon (AWS)
2. Google
3. Microsoft Azure

### Amazon Web Services
1. Amazon borrowed or innovated many solutions, evolving into Amazon Web Services (AWS: https://aws.amazon.com/tw/), which now dominates the market. The official Python AWS library is boto3.
2. Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/index.html
3. SDK pages: https://aws.amazon.com/tw/sdk-for-python/
4. Install it with: pip install boto3

### Google Cloud 
Google uses Python a lot internally, and it employs some prominent Python
developers (even Guido van Rossum himself, for some time). From its main
and Python pages, you can find details on its many services.
(https://cloud.google.com/python)

### Microsoft Azure
Microsoft caught up with Amazon and Google with its cloud offering,
Azure(https://azure.microsoft.com/zh-tw/). See Python on Azure to learn how to develop and deploy Python (https://azure.microsoft.com/en-us/develop/python/) applications.

###  OpenStack
OpenStack (https://www.openstack.org/) is an open source framework of Python services and REST APIs. Many of the services are similar to those in the commercial clouds.

# Docker
1. Docker applied the container name and
analogy to a virtualization method using some little-known Linux features.
Containers are much lighter than virtual machines, and a bit heavier than
Python virtualenvs. They allow you to package an application separately
from other applications on the same machine, sharing only the operating
system kernel.
To install Docker’s Python client library(https://pypi.org/project/docker/) : pip install docker

### Kubernetes
1. Containers caught on and spread through the computing world. Eventually, people needed ways to manage multiple containers and wanted to automate some of the manual steps that have usually been required in large distributed systems:
   (1) Failover
   (2) Load balancing
   (3) Scaling up and down
2. It looks like Kubernetes (https://kubernetes.io/zh/) is leading the pack in this new area of container orchestration.
3. To install the Python client library (https://github.com/kubernetes-client/python): pip install kubernetes