# Python for Everybody

## 06. [Intermediate Strings](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/intermediate-strings)

Extract email domain:

In [1]:
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
atpos = data.find('@')  # -> 21
sppos = data.find(' ', atpos)  # -> 31
host = data[atpos+1:sppos]  # -> uct.ac.za
print(host) 

uct.ac.za
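The same host can also be pulled out with `str.split()` and `str.partition()`. Here is a minimal sketch. It assumes the address is always the second whitespace-separated word on the line:

```python
data = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'

# Assumption: the email address is the second word on the line
email = data.split()[1]                 # 'stephen.marquard@uct.ac.za'
user, sep, host = email.partition('@')  # split once, at the first '@'
print(user)   # stephen.marquard
print(host)   # uct.ac.za
```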


## 07. [Files as a Sequence](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/files-as-a-sequence)

Search through a file and extract lines that start with `From:`:

In [2]:
fhand = open('datasets/mbox-short.txt')
for line in fhand:
    line = line.rstrip()  # strip trailing whitespace, including the newline '\n'
    if line.startswith('From:'):
        print(line)

From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: gsilver@umich.edu
From: gsilver@umich.edu
From: zqian@umich.edu
From: gsilver@umich.edu
From: wagnermr@iupui.edu
From: zqian@umich.edu
From: antranig@caret.cam.ac.uk
From: gopal.ramasammycook@gmail.com
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: david.horwitz@uct.ac.za
From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: louis@media.berkeley.edu
From: ray@media.berkeley.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
From: cwen@iupui.edu
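A file handle is just an iterable of lines, so the same loop works on any file-like object. The sketch below uses `io.StringIO` with a few made-up lines in place of `mbox-short.txt`. It also shows why the colon in `'From:'` matters: the envelope line `From stephen...` (no colon) is skipped. `from_lines` is a hypothetical helper name.

```python
import io

def from_lines(fhand):
    """Yield every line that starts with 'From:', without its trailing newline."""
    for line in fhand:
        line = line.rstrip()
        if line.startswith('From:'):
            yield line

# Made-up sample lines standing in for datasets/mbox-short.txt
sample = io.StringIO(
    'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n'
    'Subject: a sample line\n'
    'From: stephen.marquard@uct.ac.za\n'
    'From: louis@media.berkeley.edu\n'
)
found = list(from_lines(sample))
print(found)  # ['From: stephen.marquard@uct.ac.za', 'From: louis@media.berkeley.edu']

# With a real file, a `with` block closes the handle automatically:
# with open('datasets/mbox-short.txt') as fhand:
#     for line in from_lines(fhand):
#         print(line)
```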


## 09. [Dictionaries: Common Applications](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/dictionaries-common-applications)

Count elements method 1:

In [3]:
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names:
    if name not in counts:
        counts[name] = 1
    else:
        counts[name] = counts[name] + 1
print(counts)

{'csev': 2, 'cwen': 2, 'zqian': 1}


-----------------------

Count elements method 2. The same result using `.get()`:

In [4]:
# the same result using .get()
counts = dict()
names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
for name in names:
    counts[name] = counts.get(name, 0) + 1
print(counts)

{'csev': 2, 'cwen': 2, 'zqian': 1}
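The standard library's `collections.Counter` does the same counting in one call. A short sketch:

```python
from collections import Counter

names = ['csev', 'cwen', 'csev', 'zqian', 'cwen']
counts = Counter(names)        # a dict subclass mapping item -> count
print(dict(counts))            # {'csev': 2, 'cwen': 2, 'zqian': 1}
print(counts['nobody'])        # missing keys count as 0 -> 0
print(counts.most_common(1))   # ties keep first-seen order -> [('csev', 2)]
```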


## 10. [Comparing and Sorting Tuples](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/comparing-and-sorting-tuples)

Extract the top 10 most common words:

In [5]:
fhand = open('datasets/mbox-short.txt')
counts = dict()
for line in fhand:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

lst = list()
for key, val in counts.items():
    newtup = (val, key)
    lst.append(newtup)

lst = sorted(lst, reverse=True)

for val, key in lst[:10]:
    print(key, val)

Jan 352
2008 324
by 245
Received: 243
-0500 219
from 218
4 203
with 194
Fri, 183
id 136
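With `(val, key)` tuples and `reverse=True`, words that share a count are ordered by comparing the words themselves, in reverse alphabetical order. To break ties alphabetically instead, you can pass a `key` function that sorts on the negated count first. Here is a sketch on a small made-up sentence:

```python
text = 'the cat and the hat and the bat'

counts = dict()
for word in text.split():
    counts[word] = counts.get(word, 0) + 1

# Tuple approach from above: ties between equal counts are broken by the word, reversed
by_tuple = sorted([(val, key) for key, val in counts.items()], reverse=True)
print(by_tuple[:3])   # [(3, 'the'), (2, 'and'), (1, 'hat')]

# key= approach: highest count first, then alphabetical order for ties
by_key = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_key[:3])     # [('the', 3), ('and', 2), ('bat', 1)]
```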


---------------

Sort a dictionary by value, largest first:

In [6]:
c = {'a': 10, 'b':2, 'c':50}
sorted_c = sorted([(v, k) for k, v in c.items()], reverse=True)
print([(v, k) for k, v in sorted_c])

[('c', 50), ('a', 10), ('b', 2)]
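You can avoid the two list comprehensions by passing a `key` function to `sorted()`. This sorts the `(key, value)` pairs directly on their value. A sketch using `operator.itemgetter`:

```python
from operator import itemgetter

c = {'a': 10, 'b': 2, 'c': 50}
# itemgetter(1) picks the value out of each (key, value) pair
sorted_c = sorted(c.items(), key=itemgetter(1), reverse=True)
print(sorted_c)   # [('c', 50), ('a', 10), ('b', 2)]
```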


## 11. [RegEx](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/regular-expressions)

Regular expression quick guide:

`^` Matches the __beginning__ of a line  
`$` Matches the __end__ of the line  
`.` Matches __any__ character  
`\s` Matches __whitespace__  
`\S` Matches any __non-whitespace__ character  
`*` __Repeats__ a character zero or more times  
`*?` __Repeats__ a character zero or more times (non-greedy)  
`+` __Repeats__ a character one or more times  
`+?` __Repeats__ a character one or more times (non-greedy)  
`[aeiou]` Matches a single character in the listed __set__  
`[^XYZ]` Matches a single character __not in__ the listed __set__  
`[a-z0-9]` The set of characters can include a __range__. Square brackets match exactly one character from the set or range inside them  
`(` Indicates where string __extraction__ is to start  
`)` Indicates where string __extraction__ is to end
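Here are a few of the symbols above in action, as a quick sketch. The sample strings are made up for illustration. Raw strings (`r'...'`) stop Python from interpreting the backslashes itself.

```python
import re

print(re.findall(r'[aeiou]', 'regex'))                         # ['e', 'e']
print(re.findall(r'[^XYZ]+', 'abXcdYZef'))                     # ['ab', 'cd', 'ef']
print(re.findall(r'\S+', 'a  bc\td'))                          # ['a', 'bc', 'd']
print(bool(re.search(r'^From', 'From: x')))                    # True
print(bool(re.search(r'x$', 'From: x')))                       # True
print(re.findall(r'^Subject: (\S+)', 'Subject: hello world'))  # ['hello']
```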


## 11. [RegEx (2)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/regular-expressions-matching-and-extracting-data)

### Example 1:
`[0-9]+` Finds one or more digits 

In [7]:
import re

x = 'My 2 favorite numbers are 23 and 45'
y = re.findall('[0-9]+', x)
print(y)

['2', '23', '45']
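`re.findall()` always returns strings, so the digits must be converted before any arithmetic. A minimal sketch:

```python
import re

x = 'My 2 favorite numbers are 23 and 45'
numbers = [int(n) for n in re.findall(r'[0-9]+', x)]
print(numbers)        # [2, 23, 45]
print(sum(numbers))   # 70
```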


-----------------

### Example 2A: Greedy matching

`^F.+:` matches a string that starts with `F`, followed by one or more characters, and ends with `:`
* `^F` - the first character in the match is an `F`
* `.+` - one or more characters of any kind
* `:` - the last character in the match is a `:`

__Greedy__ means that of the two candidates that fit here, _'From:'_ and _'From: Using the:'_, the regex engine picks the longer one.

In [8]:
# Greedy Matching
x = 'From: Using the: character'
y = re.findall('^F.+:', x)
print(y)

['From: Using the:']


### Example 2B: Non-greedy matching
`^F.+?:` matches a string that starts with `F`, followed by one or more characters, and ends at the first `:`
* `^F` - the first character in the string is an `F`, followed by
* `.+?` - one or more characters of any kind, matched __non-greedily__
* `:` - the last character in the match is a `:`

__Non-greedy__ means that of the two candidates that fit here, _'From:'_ and _'From: Using the:'_, the regex engine picks the shorter one.

In [9]:
# Non-greedy Matching
x = 'From: Using the: character'
y = re.findall('^F.+?:', x)
print(y)

['From:']
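The difference is easier to see when a line contains several possible matches. In this sketch (the tag-like string is made up), the greedy `.+` swallows everything up to the last `>`. The non-greedy `.+?` stops at the first `>` each time.

```python
import re

s = '<b>bold</b> and <i>italic</i>'
print(re.findall(r'<.+>', s))    # ['<b>bold</b> and <i>italic</i>']
print(re.findall(r'<.+?>', s))   # ['<b>', '</b>', '<i>', '</i>']
```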


-----------------

### Example 3A: Fine-Tuning String Extraction
`\S+@\S+` matches one or more non-whitespace characters, then an '@', then one or more non-whitespace characters
* `\S+` - one or more non-whitespace characters, followed by
* `@` - an '@' sign, followed by
* `\S+` - one or more non-whitespace characters

In [10]:
x = 'From stephen.marquard@uct.az.za Sat Jan  5 09:14:16 2008'
y = re.findall(r'\S+@\S+', x)  # raw string: '\S' is an invalid escape in a normal string literal
print(y)

['stephen.marquard@uct.az.za']
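`\S+` accepts any non-whitespace character, so punctuation next to an address ends up in the match. One way to fine-tune the pattern is to require the match to start with a letter or digit and end with a letter. Here is a sketch with a made-up line:

```python
import re

x = 'Contact <csev@umich.edu>, or zqian@umich.edu.'
print(re.findall(r'\S+@\S+', x))
# ['<csev@umich.edu>,', 'zqian@umich.edu.']

# Start with a letter/digit, end with a letter
print(re.findall(r'[a-zA-Z0-9]\S*@\S*[a-zA-Z]', x))
# ['csev@umich.edu', 'zqian@umich.edu']
```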


### Example 3B: Fine-Tuning String Extraction (2)
In the following example, the parts of the pattern outside the `(`parentheses`)` are not extracted, but they must still match

* `^From ` - 'From' at the beginning of the string, followed by a space
* `(\S+@\S+)` - extract one or more non-whitespace characters, followed by an '@' sign, followed by one or more non-whitespace characters

In [11]:
x = 'From stephen.marquard@uct.az.za Sat Jan  5 09:14:16 2008'
y = re.findall(r'^From (\S+@\S+)', x)
print(y)

['stephen.marquard@uct.az.za']
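When you expect only one match per line, `re.search()` plus `.group(1)` is an alternative to `findall()`. It returns a match object, or `None` when the pattern does not match. A sketch:

```python
import re

x = 'From stephen.marquard@uct.az.za Sat Jan  5 09:14:16 2008'
m = re.search(r'^From (\S+@\S+)', x)
if m:
    print(m.group(1))   # stephen.marquard@uct.az.za

# A line without a leading 'From ' gives None rather than an empty list
print(re.search(r'^From (\S+@\S+)', 'Subject: hello'))   # None
```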


## 11. [RegEx (3)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/regular-expressions-practical-applications)

### Example 4: 
`@([^ ]*)`
* `@` - an '@' sign (matched, but not extracted because it is outside the parentheses)
* `([^ ]*)` - extract any number of characters that are not a space

In [12]:
x = 'From stephen.marquard@uct.az.za Sat Jan  5 09:14:16 2008'
y = re.findall('@([^ ]*)', x)
print(y)

['uct.az.za']
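Because `*` allows zero repetitions, the group can also match an empty string, for example after a lone '@'. Switching to `+` requires at least one character. Here is a sketch with made-up strings:

```python
import re

print(re.findall(r'@([^ ]*)', 'a@one.org b@two.net'))   # ['one.org', 'two.net']
print(re.findall(r'@([^ ]*)', 'reply @ noon'))          # ['']
print(re.findall(r'@([^ ]+)', 'reply @ noon'))          # []
```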


-----------------

### Example 5:

`'^From .*@([^ ]*)'`\
* `^From ` - the string must start with 'From ', followed by
* `.*` - any number of characters of any kind (greedy), followed by
* `@` - an '@' sign, and then
* `([^ ]*)` - extract any number of non-space characters following the '@' sign

In [13]:
x = 'From stephen.marquard@uct.az.za Sat Jan  5 09:14:16 2008'
y = re.findall('^From .*@([^ ]*)', x)
print(y)

['uct.az.za']
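The greedy `.*` matters when a line contains more than one '@'. It runs on to the *last* '@' before extracting, while the non-greedy `.*?` stops at the first. Here is a sketch with a made-up line:

```python
import re

x = 'From a@b.com cc c@d.org'
print(re.findall(r'^From .*@([^ ]*)', x))    # ['d.org']
print(re.findall(r'^From .*?@([^ ]*)', x))   # ['b.com']
```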


-----------------

### Example 6: 
* `\$` - a backslash before the dollar sign escapes it, so it matches a literal `$` instead of the end of the line
* `[0-9.]+` - one or more digits or periods

In [14]:
x = 'We just received $10.00 for cookies'
y = re.findall(r'\$[0-9.]+', x)
print(y)

['$10.00']
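Without the backslash, `$` keeps its special meaning (end of the line), so the pattern can never match. A sketch for comparison:

```python
import re

x = 'We just received $10.00 for cookies'
print(re.findall(r'\$[0-9.]+', x))   # ['$10.00']
print(re.findall(r'$[0-9.]+', x))    # []
```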


## 12. [Networking: Write a Web Browser (1)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/networking-write-a-web-browser)

The following code builds a simple web browser with the `socket` library.

It prints the raw response: the HTTP headers followed by the HTTP body.

In [15]:
import socket

# Make the socket
mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Connect to web server
mysock.connect(('data.pr4e.org', 80))
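# Send the request as bytes (str.encode() uses UTF-8 by default).
# HTTP lines end with \r\n and a blank line ends the request; the original course code's \n\n had to be changed to \r\n\r\n
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode()
mysock.sendall(cmd)  # sendall() keeps sending until every byte is written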

# Receive loop
while True:
    # Receive up to 512 bytes at a time
    data = mysock.recv(512)
    # recv() returns an empty bytes object once the server has closed the connection
    if len(data) < 1:
        break
    # Decode bytes to str; print() adds a newline after each chunk,
    # which is why 'sick' is split across two lines in the output below
    print(data.decode())

# Close the socket
mysock.close()

HTTP/1.1 200 OK
Date: Sun, 13 Jun 2021 22:08:21 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already s
ick and pale with grief
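Printing each chunk as it arrives splits words at chunk boundaries, and it could in principle split a multi-byte UTF-8 character. An alternative is to collect all the bytes first, then split the headers from the body at the first blank line (`\r\n\r\n`). This sketch assumes the whole response fits comfortably in memory. The response below is a shortened, hand-written stand-in for the one shown above, so the code runs without a network connection.

```python
# Shortened stand-in for the raw bytes received from the server
raw = (b'HTTP/1.1 200 OK\r\n'
       b'Content-Type: text/plain\r\n'
       b'Connection: close\r\n'
       b'\r\n'
       b'But soft what light through yonder window breaks\n'
       b'It is the east and Juliet is the sun\n')

# Simulate recv(16) returning the data in small pieces;
# in the real loop you would do `buf += data` after each recv()
chunks = [raw[i:i + 16] for i in range(0, len(raw), 16)]

buf = b''.join(chunks)                      # reassemble before decoding
head, _, body = buf.partition(b'\r\n\r\n')  # headers end at the first blank line
status_line, *header_lines = head.decode().split('\r\n')
headers = dict(line.split(': ', 1) for line in header_lines)

print(status_line)               # HTTP/1.1 200 OK
print(headers['Content-Type'])   # text/plain
print(body.decode(), end='')
```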



## 12. [Networking: Using urllib in Python (2)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/networking-using-urllib-in-python)

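Repeat the same request using `urllib`.

`urllib` returns an object that behaves like a file handle, so you can iterate over it with a `for` loop. Unlike the raw socket version, it handles the HTTP headers itself and yields only the body.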

In [16]:
import urllib.request

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    print(line.decode().strip())

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief


Because the returned object behaves like a file, you can process it just like one. For example, you can count words:

In [17]:
import urllib.request

counts = dict()
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    words = line.decode().split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1
print(counts)

{'But': 1, 'soft': 1, 'what': 1, 'light': 1, 'through': 1, 'yonder': 1, 'window': 1, 'breaks': 1, 'It': 1, 'is': 3, 'the': 3, 'east': 1, 'and': 3, 'Juliet': 1, 'sun': 2, 'Arise': 1, 'fair': 1, 'kill': 1, 'envious': 1, 'moon': 1, 'Who': 1, 'already': 1, 'sick': 1, 'pale': 1, 'with': 1, 'grief': 1}


## 12. [Networking: Web Scraping with Python (3)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/networking-web-scraping-with-python)

In [18]:
import urllib.request
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'http://www.dr-chuck.com/page1.htm'
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print(tag.get('href', None))

http://www.dr-chuck.com/page2.htm
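BeautifulSoup is a third-party package (`bs4`). Without it, the standard library's `html.parser.HTMLParser` can also collect anchor `href` values, with a bit more manual work. `LinkCollector` is a hypothetical helper, and the HTML string is a hand-written stand-in for the downloaded page. In the original cell, `html` is bytes from `.read()`, so it would need `.decode()` before `feed()`.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            self.links.append(dict(attrs).get('href'))

# Hand-written stand-in for the downloaded page
html = '<p>Switch to the <a href="http://www.dr-chuck.com/page2.htm">Second Page</a>.</p>'

parser = LinkCollector()
parser.feed(html)
print(parser.links)   # ['http://www.dr-chuck.com/page2.htm']
```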


## 13. [Web Services: XML Schema (1)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/web-services-xml-schema)


In [19]:
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <phone type="intl">+1 734 303 4456</phone>
  <email hide="yes"/>
</person>'''

# parse a string of data (XML) to create an object which we can query to pull data out 
tree = ET.fromstring(data)
print(f"Name: {tree.find('name').text}")
print(f"Attr: {tree.find('email').get('hide')}")
print(f"Phone type: {tree.find('phone').get('type')}")
print(f"Phone text: {tree.find('phone').text}")

Name: Chuck
Attr: yes
Phone type: intl
Phone text: +1 734 303 4456
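`find()` returns `None` when an element is missing, so `tree.find('address').text` would raise an `AttributeError`. `findtext()` accepts a default instead. Here is a sketch; the `address` element is deliberately absent:

```python
import xml.etree.ElementTree as ET

data = '''<person>
  <name>Chuck</name>
  <email hide="yes"/>
</person>'''

tree = ET.fromstring(data)
print(tree.find('address'))                      # None
print(tree.findtext('name'))                     # Chuck
print(tree.findtext('address', default='n/a'))   # n/a
```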


In [20]:
import xml.etree.ElementTree as ET

input_ = '''<stuff>
    <users>
        <user x="2">
            <id>001</id>
            <name>Chuck</name>
        </user>
        <user x="7">
            <id>009</id>
            <name>Brent</name>
        </user>
    </users>
</stuff>
'''

# parse a string of data (XML) to create an object which we can query to pull data out 
stuff = ET.fromstring(input_)
lst = stuff.findall('users/user')

print(f"User count: {len(lst)}")

for item in lst:
    print(f"Name: {item.find('name').text}")
    print(f"Id: {item.find('id').text}")
    print(f"Attribute: {item.get('x')}")

User count: 2
Name: Chuck
Id: 001
Attribute: 2
Name: Brent
Id: 009
Attribute: 7
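ElementTree also supports a limited subset of XPath, including attribute predicates such as `[@x='7']`. Its `iter()` method walks every descendant with a given tag. A sketch on the same (compacted) document:

```python
import xml.etree.ElementTree as ET

input_ = '''<stuff>
    <users>
        <user x="2"><id>001</id><name>Chuck</name></user>
        <user x="7"><id>009</id><name>Brent</name></user>
    </users>
</stuff>'''

stuff = ET.fromstring(input_)
match = stuff.findall("users/user[@x='7']")
print(match[0].find('name').text)             # Brent
print([n.text for n in stuff.iter('name')])   # ['Chuck', 'Brent']
```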


## 13. [Web Services - JSON (2)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/web-services-json)


In [21]:
import json

data = '''{
    "name": "Chuck",
    "phone": {
        "type": "intl",
        "number": "+1 734 303 4456"
    },
    "email": {
        "hide": "yes"
    }
}'''

info = json.loads(data)  # loads -> load from string
print(f"Name: {info['name']}")
print(f"Hide: {info['email']['hide']}")
print(f"Phone text: {info['phone']['number']}")
print(f"Phone type: {info['phone']['type']}")

Name: Chuck
Hide: yes
Phone text: +1 734 303 4456
Phone type: intl
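`json.loads()` produces ordinary dicts and lists, so `.get()` works for optional keys. `json.dumps()` turns the data back into a string. A sketch:

```python
import json

info = json.loads('{"name": "Chuck", "email": {"hide": "yes"}}')
print(info.get('phone', 'no phone'))   # no phone

text = json.dumps(info, indent=2)      # pretty-printed JSON string
print(text)
print(json.loads(text) == info)        # True
```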


In [22]:
import json

input_ = '''[
    {
    "id": "001",
    "x": "2",
    "name": "Chuck"
    }, 
    {
    "id": "009",
    "x": "7",
    "name": "Brent"
    }
]'''

info = json.loads(input_)
print(f"User count: {len(info)}")
for item in info:
    print(f"Name: {item['name']}")
    print(f"Id: {item['id']}")
    print(f"Attribute: {item['x']}")

User count: 2
Name: Chuck
Id: 001
Attribute: 2
Name: Brent
Id: 009
Attribute: 7
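The parsed result is a list of dicts, so a dict comprehension can index the users by `id`. Note that `x` is stored as a JSON string and needs `int()` before any arithmetic. A sketch:

```python
import json

info = json.loads('[{"id": "001", "x": "2", "name": "Chuck"},'
                  ' {"id": "009", "x": "7", "name": "Brent"}]')

by_id = {item['id']: item['name'] for item in info}
print(by_id)                                  # {'001': 'Chuck', '009': 'Brent'}
print(sum(int(item['x']) for item in info))   # 9
```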


## 13. [Web Services: APIs (3)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/web-services-apis)

Below is sample code that connects to the Google Maps Geocoding API and retrieves data for a specified location. \
The intention was to print the latitude and longitude of any location entered. \
Since Google changed its terms and conditions, the API no longer works without an API key from a Google Cloud account. \
The code therefore prints the error message instead. \
An example of a successful response is provided further below. 

The reason for this `REQUEST_DENIED` status is covered in the API Rate Limiting and Security section below.

In [23]:
import urllib.request, urllib.parse, urllib.error
import json

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'

while True:
    address = input('Enter location: ')
    if len(address) < 1: break
        
    url = serviceurl + urllib.parse.urlencode({'address': address})
    print(f"Retrieving: {url}")
    
    uh = urllib.request.urlopen(url)
    data = uh.read().decode()
    print(f"Retrieved: {len(data)} characters")
    
    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        js = None
    
    if not js or 'status' not in js or js['status'] != 'OK':
        print("==== Failure To Retrieve ====")
        print(data)
        continue
    
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print(f'lat: {lat}, lng: {lng}')
    location = js['results'][0]['formatted_address']
    print(location)

Enter location: Warrington
Retrieving: http://maps.googleapis.com/maps/api/geocode/json?address=Warrington
Retrieved: 237 characters
==== Failure To Retrieve ====
{
   "error_message" : "You must use an API key to authenticate each request to Google Maps Platform APIs. For additional information, please refer to http://g.co/dev/maps-no-account",
   "results" : [],
   "status" : "REQUEST_DENIED"
}

Enter location: 


--------------------------------------

#### Example output from service

In [24]:
maps = '''{
    "status" : "OK",
    "results": [
        {
            "geometry": {
                "location_type": "APPROXIMATE",
                "location": {
                    "lat": 42.2808256,
                    "lng": -83.7430378
                }
            },
            "address_components": [
                {
                    "long_name": "Ann Arbor",
                    "types": [
                        "locality",
                        "political"
                    ],
                    "short_name": "Ann Arbor"
                }
            ],
            "formatted_address": "Ann Arbor, MI, USA",
            "types": [
                "locality",
                "political"
            ]
        }
    ]
}'''

In [25]:
import json

data = maps
print(f"Retrieved: {len(data)} characters")

try:
    js = json.loads(data)
    print('OK')
except json.JSONDecodeError:
    js = None

if not js or 'status' not in js or js['status'] != 'OK':
    print("==== Failure To Retrieve ====")
    print(data)
else:
    lat = js["results"][0]["geometry"]["location"]["lat"]
    lng = js["results"][0]["geometry"]["location"]["lng"]
    print(f'lat: {lat}, lng: {lng}')
    location = js['results'][0]['formatted_address']
    print(location)

Retrieved: 735 characters
OK
lat: 42.2808256, lng: -83.7430378
Ann Arbor, MI, USA
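You can wrap the status check and the chained indexing in one defensive helper. This guards against replies such as the `REQUEST_DENIED` one above, where `results` is an empty list. The helper returns `None` unless the status is `OK` and at least one result is present. `extract_location` is a hypothetical name, and the two JSON strings are trimmed-down versions of the replies shown in this section.

```python
import json

def extract_location(js):
    """Return (lat, lng, formatted_address), or None if the reply is unusable."""
    if not js or js.get('status') != 'OK' or not js.get('results'):
        return None
    first = js['results'][0]
    loc = first['geometry']['location']
    return loc['lat'], loc['lng'], first.get('formatted_address')

denied = json.loads('{"results": [], "status": "REQUEST_DENIED"}')
ok = json.loads('{"status": "OK", "results": [{"geometry": {"location": '
                '{"lat": 42.2808256, "lng": -83.7430378}}, '
                '"formatted_address": "Ann Arbor, MI, USA"}]}')

print(extract_location(denied))   # None
print(extract_location(ok))       # (42.2808256, -83.7430378, 'Ann Arbor, MI, USA')
```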


## 13. [Web Services: API Rate Limiting and Security (4)](https://www.freecodecamp.org/learn/scientific-computing-with-python/python-for-everybody/web-services-api-rate-limiting-and-security)

* The compute resources to run these APIs are not "free"
* The data provided by these APIs is usually valuable
* The data providers might limit the number of requests per day, demand an API "key", or even charge for usage
* They might change the rules as things progress...
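Two common client-side responses to these limits are sending an API key with each request and pacing requests. This sketch rests on several assumptions. `'key'` is the query parameter name Google's Maps APIs use for keys, and `'YOUR_API_KEY'` is a placeholder. `Throttle` is a hypothetical helper that enforces a minimum delay between calls. Nothing here contacts a server.

```python
import time
import urllib.parse

serviceurl = 'http://maps.googleapis.com/maps/api/geocode/json?'
params = {'address': 'Ann Arbor, MI', 'key': 'YOUR_API_KEY'}  # placeholder key
url = serviceurl + urllib.parse.urlencode(params)
print(url)
# http://maps.googleapis.com/maps/api/geocode/json?address=Ann+Arbor%2C+MI&key=YOUR_API_KEY

class Throttle:
    """Hypothetical helper: wait() blocks until min_interval seconds have passed since the previous call."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        if self._last is not None:
            remaining = self.min_interval - (time.monotonic() - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

throttle = Throttle(0.1)
start = time.monotonic()
for _ in range(3):
    throttle.wait()   # a real script would call urlopen(url) after this
elapsed = time.monotonic() - start
print(f"elapsed: {elapsed:.2f}s")
```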