### Finding Numbers in a Haystack

In [1]:
import re
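# Read the whole file; the with-block closes it automatically
with open('regex_sum_952396.txt') as pile:
    gold = pile.read()
# Extract every run of digits and convert each one to an integer
copier = re.findall("[0-9]+", gold)
dice = [int(i) for i in copier]
# Use the built-in sum() instead of shadowing it with a variable
print(sum(dice))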

291775
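
The pattern can be checked without the data file. The following is a minimal sketch on an in-memory string; the sample text and the names `sample`, `numbers`, and `total` are made up for illustration.

```python
import re

# Hypothetical sample text standing in for the contents of the data file
sample = "Why 3 should 42 you learn\nto write 7 programs 100?"

# "[0-9]+" matches each maximal run of digits, so "42" is one match, not two
numbers = [int(n) for n in re.findall("[0-9]+", sample)]
total = sum(numbers)
print(numbers, total)
```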


### Understanding the Request / Response Cycle

In [3]:
import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('data.pr4e.org', 80))
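# An HTTP/1.0 request is the request line followed by a blank line
cmd = 'GET http://data.pr4e.org/intro-short.txt HTTP/1.0\r\n\r\n'.encode()
mysock.sendall(cmd)  # sendall keeps sending until the whole request is written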

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data.decode(),end='')

mysock.close()

HTTP/1.1 200 OK
Date: Mon, 07 Sep 2020 06:30:02 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "1d3-54f6609240717"
Accept-Ranges: bytes
Content-Length: 467
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

Why should you learn to write programs?

Writing programs (or programming) is a very creative 
and rewarding activity.  You can write programs for 
many reasons, ranging from making your living to solving
a difficult data analysis problem to having fun to helping
someone else solve a problem.  This book assumes that 
everyone needs to know how to program, and that once 
you know how to program you will figure out what you want 
to do with your newfound skills.  
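
The response above is a status line, a block of headers, a blank line, and then the body. As a rough sketch of how those parts could be separated once the bytes are collected, assuming a shortened, hypothetical response held in `raw` (the variable names are illustrative, not part of the exercise):

```python
# Hypothetical shortened response, shaped like the output above
raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Length: 467\r\n"
       b"Content-Type: text/plain\r\n"
       b"\r\n"
       b"Why should you learn to write programs?\n")

# The first blank line (CRLF CRLF) ends the header block
head, _, body = raw.partition(b"\r\n\r\n")
status_line, *header_lines = head.decode().split("\r\n")

# Build a dict mapping header names to values
headers = {}
for line in header_lines:
    name, _, value = line.partition(":")
    headers[name.strip()] = value.strip()

print(status_line)
print(headers["Content-Type"])
print(body.decode(), end="")
```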


### Scraping HTML Data with BeautifulSoup

In [5]:
from urllib.request import urlopen
from bs4 import BeautifulSoup
import ssl

ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
html = urlopen('http://py4e-data.dr-chuck.net/comments_952398.html', context=ctx).read()
soup = BeautifulSoup(html, "html.parser")
tags = soup('span')
total = 0
count = 0
print('Enter - ')
for tag in tags:
    count += 1
    # The first child of each <span> is its text content
    total += int(tag.contents[0])
print('Count', count, '\nSum', total)

Enter - 
Count 50 
Sum 2701
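
For comparison, the same count-and-sum idea can be tried offline with only the standard library. This sketch swaps BeautifulSoup for `html.parser.HTMLParser`; the class `SpanCollector` and the inline `page` markup are hypothetical and only guess at the shape of the real page.

```python
from html.parser import HTMLParser

class SpanCollector(HTMLParser):
    """Collect the integer found inside each <span> element."""

    def __init__(self):
        super().__init__()
        self.in_span = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        # Only keep non-blank text that appears between <span> and </span>
        if self.in_span and data.strip():
            self.values.append(int(data))

# Hypothetical markup; the real page layout is an assumption
page = ('<tr><td>Ann</td><td><span class="comments">97</span></td></tr>'
        '<tr><td>Bob</td><td><span class="comments">3</span></td></tr>')

collector = SpanCollector()
collector.feed(page)
collector.close()
print('Count', len(collector.values), 'Sum', sum(collector.values))
```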


### Following Links in HTML Using BeautifulSoup

In [6]:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter URL: ')
count = int(input('Enter count: '))
position = int(input('Enter position: '))
print('Retrieving: ', url)
for _ in range(count):
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    # Positions are 1-based, list indexes are 0-based
    url = tags[position - 1].get('href', None)
    print('Retrieving: ', url)

Enter URL: http://py4e-data.dr-chuck.net/known_by_Caelyn.html
Enter count: 7
Enter position: 18
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Caelyn.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Aedyn.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Afifah.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Naideen.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Selina.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Elidh.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Scarlett.html
Retrieving:  http://py4e-data.dr-chuck.net/known_by_Anisa.html
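
The position and count logic can be exercised without any network access. This is a sketch only: the `pages` table and the `follow` helper are hypothetical stand-ins for real pages and their anchor tags.

```python
# Hypothetical link table: each page maps to the hrefs of its anchor tags, in order
pages = {
    'a.html': ['b.html', 'c.html', 'd.html'],
    'b.html': ['a.html', 'd.html', 'c.html'],
    'c.html': ['d.html', 'a.html', 'b.html'],
    'd.html': ['c.html', 'b.html', 'a.html'],
}

def follow(start, count, position):
    """Follow the link at a 1-based position, count times; return every page visited."""
    visited = [start]
    url = start
    for _ in range(count):
        url = pages[url][position - 1]  # convert 1-based position to 0-based index
        visited.append(url)
    return visited

print(follow('a.html', 3, 2))
```

The returned list has `count + 1` entries because it includes the starting page, mirroring the eight `Retrieving:` lines printed for a count of 7.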


### Extracting Data from XML

In [1]:
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter location: ')
print('Retrieving ', url)
data = urllib.request.urlopen(url, context=ctx).read()
print('Retrieved', len(data), 'characters')
tree = ET.fromstring(data)
# './/count' matches <count> elements at any depth
print('Count: ', len(tree.findall('.//count')))
total = 0
for r in tree.findall("./comments/comment"):
    total += int(r.find('count').text)
print('Sum: ', total)

Enter location: http://py4e-data.dr-chuck.net/comments_952400.xml
Retrieving  http://py4e-data.dr-chuck.net/comments_952400.xml
Retrieved 4216 characters
Count:  50
Sum:  2528
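
The two XPath expressions used above can be compared on a small inline document. This is a sketch under assumptions: the root element name `commentinfo` and the sample entries are invented, and only the `comments/comment/count` nesting is taken from the code.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only the comments/comment/count nesting comes from the code above
sample = '''<commentinfo>
  <comments>
    <comment><name>Ann</name><count>97</count></comment>
    <comment><name>Bob</name><count>3</count></comment>
  </comments>
</commentinfo>'''

tree = ET.fromstring(sample)
# './/count' searches at any depth; './comments/comment' walks one exact path
counts = [int(c.text) for c in tree.findall('.//count')]
comments = tree.findall('./comments/comment')
print('Count:', len(counts), 'Sum:', sum(counts))
```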


### Extracting Data from JSON

In [2]:
import urllib.request, urllib.parse, urllib.error
import json

url = input('Enter location: ')
data = urllib.request.urlopen(url).read()
info = json.loads(data)
comments = info['comments']
print('Retrieving', url, '\nRetrieved', len(data), 'characters', '\nCount:', len(comments))
total = 0
for item in comments:
    total += int(item['count'])
print('Sum:', total)

Enter location: http://py4e-data.dr-chuck.net/comments_952401.json
Retrieving http://py4e-data.dr-chuck.net/comments_952401.json 
Retrieved 2729 characters 
Count: 50
Sum: 2595
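
The same parsing step can be tried on an inline string. A minimal sketch, assuming a made-up document whose `comments` list has the `count` key used above; the `note` and `name` fields are illustrative guesses.

```python
import json

# Hypothetical sample; only the "comments" list and "count" key come from the code above
sample = '''{
  "note": "hypothetical sample data",
  "comments": [
    {"name": "Ann", "count": 97},
    {"name": "Bob", "count": 3}
  ]
}'''

info = json.loads(sample)
# json.loads turns the array into a list of dicts, so a generator can sum the counts
total = sum(int(item['count']) for item in info['comments'])
print('Count:', len(info['comments']), 'Sum:', total)
```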


### Using the GeoJSON API

In [6]:
import urllib.request, urllib.parse, urllib.error
import json
import ssl

api_key = False
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/json?'
else:
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    address = input('Enter location: ')
    if len(address) < 1: break

    parms = dict()
    parms['address'] = address
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms)

    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)
    data = uh.read().decode()
    print('Retrieved', len(data), 'characters')

    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        # Treat an unparseable response as a failed lookup
        js = None

    if not js or 'status' not in js or js['status'] != 'OK':
        continue

    location = js['results'][0]['place_id']
    print('Place id', location)

Enter location: Elon University
Retrieving http://py4e-data.dr-chuck.net/json?address=Elon+University&key=42
Retrieved 2298 characters
Place id ChIJ3RFkxAkpU4gRCZzO6Zydjk4
Enter location: 
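
The URL construction and the `place_id` lookup can be checked without calling the service. In this sketch the response string in `data` is a hypothetical, heavily trimmed stand-in for the roughly 2,300 characters the real service returned.

```python
import json
import urllib.parse

serviceurl = 'http://py4e-data.dr-chuck.net/json?'
parms = {'address': 'Elon University', 'key': 42}

# urlencode escapes each value and joins the pairs with '&'; spaces become '+'
url = serviceurl + urllib.parse.urlencode(parms)
print('Retrieving', url)

# Hypothetical trimmed response keeping only the fields the code reads
data = '{"status": "OK", "results": [{"place_id": "ChIJ3RFkxAkpU4gRCZzO6Zydjk4"}]}'
js = json.loads(data)

place_id = None
if js.get('status') == 'OK' and js.get('results'):
    place_id = js['results'][0]['place_id']
print('Place id', place_id)
```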
