# Week 1.4: Generators and Map Reduce 
Author: Juana Karina Diaz Barba
  

  
### Exercise 1: refactoring your own code.
In this exercise I was asked to replace the for-loops created in **week1.3 Multiple-class** with list comprehensions.  

In my current implementation of the classes I already use comprehensions. I have summarized them here:
- In the **AverageYear** class I use a list comprehension to collect the monthly values of a year, ignoring the keys that are not months:   
     ```monthly_values = [eval(value) for key, value in year_data.items() if key in self.months]```
- In the **AverageMonth** class I use a dictionary comprehension to create a dictionary with the months as keys and empty lists as values:  
```self.month_mean = {month: [] for month in self.months}```
- In the **AverageMonth** class I use a list comprehension to collect the monthly temperature anomalies over a period of 5 years:  
```[self.month_mean[month].append(float(year_data[month])) for month in self.months if month in year_data]```  
- In the **AverageMonth** class I use a dictionary comprehension to calculate the mean per month:  
```monthly_means = {month: mean(temperatures) for month, temperatures in self.month_mean.items()}```
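
Two of the comprehensions above could be tightened further: `eval` executes whatever string it receives, and the comprehension that calls `append` is used only for its side effect. The following is a minimal sketch of an alternative, assuming the values are numeric strings. The names `months`, `rows`, and `month_values` and the sample numbers are hypothetical stand-ins for the data handled by the week 1.3 classes.

```python
from statistics import mean

# Hypothetical sample rows; the real data comes from the week 1.3 classes.
months = ["Jan", "Feb", "Mar"]
rows = [
    {"Year": "1880", "Jan": "-0.20", "Feb": "-0.26", "Mar": "-0.09"},
    {"Year": "1881", "Jan": "-0.20", "Feb": "-0.15", "Mar": "0.02"},
]

# float() parses numeric strings without executing arbitrary code (unlike eval).
yearly_values = [float(value) for key, value in rows[0].items() if key in months]

# Build each month's list directly instead of appending inside a comprehension.
month_values = {
    month: [float(row[month]) for row in rows if month in row]
    for month in months
}

# Same final step as in AverageMonth: one mean per month.
monthly_means = {month: mean(values) for month, values in month_values.items()}
print(monthly_means)
```

With this shape the dictionary of empty lists is no longer needed, because every list is created already filled.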

### Exercise 2: functions with data  
Create a function that has two parameters: one for data (you can assume this to be a list) and one for the function to be applied to that data. Let the function return a list with new values that are created by applying the function to all the values.

In [22]:
import math
def circle_area(radius_list):
    '''Calculate the area of a circle for each radius in radius_list'''
    area_list = [math.pi * math.pow(radius, 2) for radius in radius_list]
    return area_list

# Generate a list of radii from 0 to 9
radius_data = list(range(10))
radius_data
circle_area(radius_data)

[0.0,
 3.141592653589793,
 12.566370614359172,
 28.274333882308138,
 50.26548245743669,
 78.53981633974483,
 113.09733552923255,
 153.93804002589985,
 201.06192982974676,
 254.46900494077323]
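
The exercise text asks for a function with two parameters, one for the data and one for the function to apply, whereas `circle_area` above has the formula built in. A minimal sketch of the two-parameter variant could look as follows; `apply_function` and `single_circle_area` are hypothetical names chosen for this illustration.

```python
import math


def apply_function(data, function):
    """Return a new list with function applied to every value in data."""
    return [function(value) for value in data]


def single_circle_area(radius):
    """Area of one circle; the looping is left to apply_function."""
    return math.pi * radius ** 2


areas = apply_function(range(10), single_circle_area)
print(areas)
```

Keeping the formula in a function of a single radius means the same `apply_function` can be reused with any other one-argument function.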

Now enhance your function so that it can take in an arbitrary number of functions that all need to be applied to the given data.  

So if I were to give two functions to this function, it should return a list of two lists.  Make use of list comprehensions in your elaboration.

In [27]:
# Creating a function with a variable number of arguments, functions in this case
def cicle_calculations(radius_list, *functions):
    '''Make circle calculations. Takes an arbitrary number of functions
    provided by the user and applies each of them to the same set of data'''
    # One inner list of results per function
    return [[funct(radius) for radius in radius_list] for funct in functions]

Testing the implementation with one function

In [30]:
# Generate a list of radii from 0 to 4
radius_data = list(range(5))
print(radius_data)

# Define functions to be applied
functions = [lambda x: math.pi * math.pow(x, 2)] # Calculates circle area

# Apply functions to the data
result = cicle_calculations(radius_data, *functions)
print(result)

[0, 1, 2, 3, 4]
[[0.0, 3.141592653589793, 12.566370614359172, 28.274333882308138, 50.26548245743669]]


Testing the implementation with two functions

In [32]:
functions = [lambda x: math.pi * math.pow(x, 2),  # Calculates circle area
             lambda x: math.pi * x * 2]            # Calculates circle perimeter

# Apply functions to the data
result = cicle_calculations(radius_data, *functions)
print(result)

[[0.0, 3.141592653589793, 12.566370614359172, 28.274333882308138, 50.26548245743669], [0.0, 6.283185307179586, 12.566370614359172, 18.84955592153876, 25.132741228718345]]
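
Since this week is about generators and map, the same idea can also be written lazily. The sketch below is one possible variant, not part of the exercise answer: `lazy_calculations` is a hypothetical name, and it assumes the data is a re-iterable sequence such as a list, because every function iterates over it again.

```python
import math


def lazy_calculations(data, *functions):
    """Return a generator of lazy map objects, one per function."""
    return (map(function, data) for function in functions)


radius_data = list(range(5))
lazy_results = lazy_calculations(
    radius_data,
    lambda r: math.pi * math.pow(r, 2),  # circle area
    lambda r: math.pi * r * 2,           # circle perimeter
)

# Nothing has been computed yet; values are produced when the maps are consumed.
results = [list(values) for values in lazy_results]
print(results)
```

The generator can be consumed only once, so the results are stored in a list here before printing.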


### Exercise 3: refactoring other people's code.

Code downloaded to be refactored:

In [33]:
""" 
This is a crawler program using beautifulsoup.
It crawls the website "https://sport050.nl/sportaanbieders/alle-aanbieders/"
and fetches all the sport suppliers in the city of Groningen. It outputs 
a csv-file with the url;phone-number;email-address of all the suppliers it can find.
"""

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import re


def hack_ssl():
    """ ignores the certificate errors"""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def open_url(url):
    """ reads url file as a big string and cleans the html file to make it
        more readable. input: url, output: soup object
    """
    ctx = hack_ssl()
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    return soup


def read_hrefs(soup):
    """ get from soup object a list of anchor tags,
        get the href keys and and prints them. Input: soup object
    """
    reflist = []
    tags = soup('a')
    for tag in tags:
        reflist.append(tag)
    return reflist

def read_li(soup):
    lilist = []
    tags = soup('li')
    for tag in tags:
        lilist.append(tag)
    return lilist

def get_phone(info):
    reg = r"(?:(?:00|\+)?[0-9]{4})?(?:[ .-][0-9]{3}){1,5}"
    phone = [s for s in filter(lambda x: 'Telefoon' in str(x), info)]
    try:
        phone = str(phone[0])
    except:
        phone = [s for s in filter(lambda x: re.findall(reg, str(x)), info)]
        try:
            phone = str(phone[0])
        except:
            phone = ""   
    return phone.replace('Facebook', '').replace('Telefoon:', '')

def get_email(soup):
    try: 
        email = [s for s in filter(lambda x: '@' in str(x), soup)]
        email = str(email[0])[4:-5]
        bs = BeautifulSoup(email, features="html.parser")
        email = bs.find('a').attrs['href'].replace('mailto:', '')
    except:
        email = ""
    return email

def remove_html_tags(text):
    """Remove html tags from a string"""
    clean = re.compile('<.*?>')
    return re.sub(clean, '', text)

def fetch_sidebar(soup):
    """ reads html file as a big string and cleans the html file to make it
        more readable. input: html, output: tables
    """
    sidebar = soup.findAll(attrs={'class': 'sidebar'})
    return sidebar[0]

def extract(url):
    text = str(url)
    text = text[26:].split('"')[0] + "/"
    return text


print ('fetch urls')
url = "https://sport050.nl/sportaanbieders/alle-aanbieders/"
s = open_url(url)
reflist = read_hrefs(s)

print ('getting sub-urls')
sub_urls = [s for s in filter(lambda x: '<a href="/sportaanbieders' in str(x), reflist)]
sub_urls = sub_urls[3:]

print ('extracting the data')
print (f'{len(sub_urls)} sub-urls')

for sub in sub_urls:
    try:
        sub = extract(sub)
        site = url[:-16] + sub
        soup = open_url(site)    
        info = fetch_sidebar(soup)
        info = read_li(info)
        phone = get_phone(info)
        phone = remove_html_tags(phone).strip()
        email = get_email(info)
        email = remove_html_tags(email).replace("/","")
        print (f'{site} ; {phone} ; {email}')
    except Exception as e:
        print (e)
        exit()

    

fetch urls


HTTPError: HTTP Error 308: Permanent Redirect
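
As a possible starting point for the refactoring, the sketch below rewrites three of the helpers from the downloaded code. It merges `read_hrefs` and `read_li` into one function and replaces the nested `try`/`except` in `get_phone` with `next()` on a generator expression. `read_tags`, `first_matching`, and `PHONE_RE` are hypothetical names that do not appear in the downloaded code, and the sample strings are made up. The sketch uses only the standard library and does not open any URL, so it is independent of the redirect error shown above.

```python
import re

# Same pattern as in the downloaded get_phone, compiled once at import time.
PHONE_RE = re.compile(r"(?:(?:00|\+)?[0-9]{4})?(?:[ .-][0-9]{3}){1,5}")


def read_tags(soup, tag_name):
    """Hypothetical replacement for read_hrefs and read_li.

    The downloaded code already calls the soup object with a tag name to get
    the matching tags, so the copy loop reduces to a single list() call.
    """
    return list(soup(tag_name))


def first_matching(items, predicate, default=""):
    """Return the text of the first item that satisfies predicate."""
    texts = (str(item) for item in items)  # generator: nothing evaluated yet
    return next((text for text in texts if predicate(text)), default)


def get_phone(info):
    """Prefer an item labelled 'Telefoon'; fall back to the phone regex."""
    phone = first_matching(info, lambda text: "Telefoon" in text)
    if not phone:
        phone = first_matching(info, lambda text: PHONE_RE.search(text) is not None)
    return phone.replace("Facebook", "").replace("Telefoon:", "")


# Plain strings stand in for the <li> tags of a sidebar.
sample_info = ["<li>Adres: Hoofdstraat 1</li>", "<li>Telefoon: 050 123 456</li>"]
print(get_phone(sample_info))
```

Because `next()` takes a default value, the empty-result case no longer relies on catching an `IndexError` with a bare `except`.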