# Introduction to Programming
## Foundations - Assignment 3

- [Prime factor decomposition](#decomposition)

- [Merging dictionaries](#merging)

---

This notebook contains two exercises. The second exercise was partially solved during lab session 2. Complete both exercises as part of your third assignment, due on Sunday, September 26th, at midnight. Please email your solutions as a Jupyter notebook to nbertani@ucp.pt by the deadline. These exercises are graded on effort: non-empty submissions receive full credit. Note that three DataCamp chapters are also part of assignment 3.

<a id="decomposition"></a>
### Prime factor decomposition

Prime factor decomposition consists of writing a natural number (positive integer) greater than 1 as a product of prime numbers. For instance, the prime factor decomposition of 9 is $3 * 3$, that of 22 is $2 * 11$, and that of 252 is $2 * 2 * 3 * 3 * 7 = 2^2 * 3^2 * 7$.

To factorize a number, take the following steps.

- Consider a natural number greater than 1.

- Find the smallest prime number that divides this number exactly (that is, the quotient of the division is an integer, with no remainder). Store that prime number: this is your first prime factor.

- Take the quotient (the result) of the division above: 

    - If the quotient is one, you are done.
    
    - If the quotient is greater than one, repeat the previous step: find the next factor, store it, and then consider the new quotient. In particular:
    
        - If possible, divide the quotient again by the same prime number you used in the previous step.

        - When the quotient can no longer be divided exactly by the previous factor, divide it by the next possible prime number, that is, the smallest prime number that is greater than your last used factor and that divides your quotient exactly.
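
As an illustration, the steps above can be traced on 252, the example from the introduction. The helper name `division_steps` is made up for this sketch and is not part of the assignment. It records each exact division as a (dividend, divisor, quotient) triple, and it simply tries every integer as a candidate divisor rather than primes only.

```python
def division_steps(n):
    """Trace the repeated exact divisions that factorize n (assumes n > 1).

    Hypothetical helper, used for illustration only.
    """
    steps = []
    divisor = 2  # smallest prime
    while n > 1:
        if n % divisor == 0:
            # Exact division: record it and keep going with the quotient
            steps.append((n, divisor, n // divisor))
            n //= divisor
        else:
            # No exact division: try the next candidate divisor
            divisor += 1
    return steps


for dividend, divisor, quotient in division_steps(252):
    print(f"{dividend} / {divisor} = {quotient}")
# 252 / 2 = 126
# 126 / 2 = 63
# 63 / 3 = 21
# 21 / 3 = 7
# 7 / 7 = 1
```

The divisors used, read in order (2, 2, 3, 3, 7), are exactly the prime factors of 252.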


Write a program that performs prime factor decomposition and outputs a list of 2-element tuples summarizing the result. The list must have as many entries as there are distinct prime numbers in the decomposition (i.e. the number of distinct prime factors). In each 2-tuple, the first element records the factor and the second records how many times that factor is used.

As an example of output, consider the decomposition of $727, \, 776 = 2^5 * 3^2 * 7 * 19 ^ 2$.

In [32]:
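def prime_factors(n):
    """Return the prime factors of n (with repetition) and how many there are."""
    i = 2        # current candidate divisor
    fact = []    # prime factors found so far
    count = 0    # total number of prime factors, counted with repetition
    while i * i <= n:
        if n % i:
            # i does not divide n exactly: try the next candidate
            i += 1
        else:
            # i divides n exactly: record it and continue with the quotient
            n //= i
            fact.append(i)
            count += 1
    if n > 1:
        # whatever remains is itself a prime factor
        fact.append(n)
        count += 1

    return fact, count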


prime_factors(727776)

([2, 2, 2, 2, 2, 3, 3, 7, 19, 19], 10)

In [None]:
2 ** 5 * 3 ** 2 * 7 * 19 ** 2

In [None]:
pretty_factor_decomposition(2 ** 5 * 3 ** 2 * 7 * 19 ** 2)

You can check whether your prime factor decomposition is correct [here](https://www.calculatorsoup.com/calculators/math/prime-factors.php).

**Hints:**

- To create your output, initialize an empty list. Then, create a variable to store your possible divisor, which you can initialize to 2 (the first possible prime divisor). Use a counter to keep track of how many times you use a given divisor. Before you move on to the next divisor, if you have used a given divisor at least once, append the tuple of its value and its counter to the output.

- Start with the smallest prime (which is 2) and, when you can no longer divide exactly by it, iteratively consider larger primes (3, 5, 7, etc.). To find the next prime divisor, you can simply try, in increasing order, every number larger than your last factor (3, 4, 5, 6, 7, etc.). A non-prime number will never come out as a factor, because its own prime factors are smaller and have already been divided out of the quotient.

- The simplest check on your code is that the number 2 must be in the factors whenever you are decomposing an even number.
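
The hints above can be put together along the following lines. This is a minimal sketch rather than a reference solution: the function name `pretty_factor_decomposition` is taken from the cell above, while its single-argument signature and the assumption that the input is an integer greater than 1 are choices made for this example.

```python
def pretty_factor_decomposition(n):
    """Return the prime factorization of n as a list of (factor, count) tuples.

    Sketch only: assumes n is an integer greater than 1.
    """
    output = []
    divisor = 2          # first possible prime divisor
    while n > 1:
        count = 0        # how many times the current divisor is used
        while n % divisor == 0:
            n //= divisor
            count += 1
        if count > 0:
            # The divisor was used at least once: record it before moving on
            output.append((divisor, count))
        divisor += 1     # composite candidates never divide n at this point
    return output


print(pretty_factor_decomposition(727776))
# [(2, 5), (3, 2), (7, 1), (19, 2)]
```

Resetting `count` for each new divisor is what turns the flat list of factors into one tuple per distinct prime.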

<a id="merging"></a>
### Merging dictionaries

Dictionaries and the JSON format are popular ways of storing data or observations because of their flexibility. 
In particular, their flexibility lies in the key:value structure, which allows storing any number of such pairs. 
This is very useful when observations share a (small) number of common characteristics while all their remaining characteristics are very heterogeneous. 

Often, data you get from the internet will be in dictionary format for this very reason.
However, to perform analyses, you will first need to unify all the common information in these observations.

We will do this for the following objects.

In [2]:
entries = {
    'cao2017modelling' : {
        'type' : 'article',
        'title' : 'Modelling the interdependence of tourism demand: The global vector autoregressive approach',
        'author' : 'Cao, Zheng and Li, Gang and Song, Haiyan',
        'journal' : 'Annals of Tourism Research',
        'volume' : 67,
        'pages' : '1--13',
        'year' : 2017,
        'publisher' : 'Elsevier'
    },
    'ishwaran2011consistency' : {
        'type' : 'article',
        'author' : 'Ishwaran, Hemant and Rao, J Sunil',
        'date-added' : '2019-09-05 21:53:03 +0200',
        'date-modified' : '2019-09-05 21:53:14 +0200',
        'journal' : 'Statistics \\& Probability Letters',
        'keywords' : ['spike-and-slab', 'shrinkage', 'regularization'],
        'number' : 12,
        'pages' : '1920--1928',
        'publisher' : 'Elsevier',
        'title' : 'Consistency of spike and slab regression',
        'volume' : 81,
        'year' : 2011
    },
    'mccullagh2019generalized' : {
        'type' : 'book',
        'title' : 'Generalized linear models',
        'author': 'McCullagh, Peter and Nelder, John A',
        'year': 1983,
        'publisher' : 'Routledge'
    }
}

additional_entry = {
    'friedman2001elements' : {
        'type' : 'book',
        'title' : 'The elements of statistical learning',
        'author' : 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert',
        'volume' : 1,
        'number' : 10,
        'year' : 2001,
        'publisher' : 'Springer series in statistics New York'
    }
}

additional_entry_2 = {
    'FanLv2008sure' : {
        'type' : 'article',
        'author' : 'Fan, Jianqing and Lv, Jinchi',
        'journal' : 'Journal of the Royal Statistical Society. Series B (Statistical Methodology)',
        'keywords' : ['Adaptive lasso', 'Dantzig selector', 'Dimensionality reduction', 'Oracle estimator', 'Sure independence screening'],
        'number' : 5,
        'pages' : 849,
        'title' : 'Sure Independence Screening for Ultrahigh Dimensional Feature Space.',
        'volume' : 70,
        'year' : 2008
    }    
}

print(entries)
print(additional_entry)
print(additional_entry_2)

{'cao2017modelling': {'type': 'article', 'title': 'Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'author': 'Cao, Zheng and Li, Gang and Song, Haiyan', 'journal': 'Annals of Tourism Research', 'volume': 67, 'pages': '1--13', 'year': 2017, 'publisher': 'Elsevier'}, 'ishwaran2011consistency': {'type': 'article', 'author': 'Ishwaran, Hemant and Rao, J Sunil', 'date-added': '2019-09-05 21:53:03 +0200', 'date-modified': '2019-09-05 21:53:14 +0200', 'journal': 'Statistics \\& Probability Letters', 'keywords': ['spike-and-slab', 'shrinkage', 'regularization'], 'number': 12, 'pages': '1920--1928', 'publisher': 'Elsevier', 'title': 'Consistency of spike and slab regression', 'volume': 81, 'year': 2011}, 'mccullagh2019generalized': {'type': 'book', 'title': 'Generalized linear models', 'author': 'McCullagh, Peter and Nelder, John A', 'year': 1983, 'publisher': 'Routledge'}}
{'friedman2001elements': {'type': 'book', 'title': 'The elements of statistic

Perform the following actions:

1. Merge all elements of `entries`, `additional_entry` and `additional_entry_2` into a single dictionary called `data`. This was solved in class.

1. Create a new dictionary `common_data` that summarizes the information common to all elements of `data`. Its keys must be the keys shared by all the dictionaries nested as values in `data`, and each associated value must be the list of the corresponding values across those nested dictionaries. In addition, add a key named `entry name` that stores the list of the names of the elements of `data` (i.e. the top-level keys of `data`).

1. Create two separate dictionaries, one for the entries whose `'type'` is `'article'` and one for those whose `'type'` is `'book'`. Populate each of them as in the previous point. The best way to do this is to write a function that takes as an argument the value of the key `type` by which to filter the elements of `data`. A simpler solution is to use the same code twice.

**1. Merge data**

In [3]:
data = {}
for i in [entries, additional_entry, additional_entry_2]:
    data.update(i)
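
For reference, the same merge can also be written without a loop, using dictionary unpacking. The toy dictionaries `first`, `second` and `third` below are made up for this sketch and merely stand in for `entries`, `additional_entry` and `additional_entry_2`. As with `update()`, when the same key appears more than once, the value from the later dictionary wins.

```python
# Toy stand-ins (hypothetical data, not the bibliography entries)
first = {'a': {'year': 2017}, 'b': {'year': 2011}}
second = {'c': {'year': 2001}}
third = {'a': {'year': 2008}}   # reuses key 'a' on purpose

# Loop with update(), as in the cell above
merged_loop = {}
for d in [first, second, third]:
    merged_loop.update(d)

# Same result with dictionary unpacking
merged_unpacked = {**first, **second, **third}

print(merged_loop == merged_unpacked)   # True
print(merged_unpacked['a'])             # {'year': 2008}
print(list(merged_unpacked))            # ['a', 'b', 'c']
```

In the assignment data all entry names are distinct, so no value is overwritten by the merge.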

In [4]:
print(data.keys())

dict_keys(['cao2017modelling', 'ishwaran2011consistency', 'mccullagh2019generalized', 'friedman2001elements', 'FanLv2008sure'])


In [4]:
print(data)

{'cao2017modelling': {'type': 'article', 'title': 'Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'author': 'Cao, Zheng and Li, Gang and Song, Haiyan', 'journal': 'Annals of Tourism Research', 'volume': 67, 'pages': '1--13', 'year': 2017, 'publisher': 'Elsevier'}, 'ishwaran2011consistency': {'type': 'article', 'author': 'Ishwaran, Hemant and Rao, J Sunil', 'date-added': '2019-09-05 21:53:03 +0200', 'date-modified': '2019-09-05 21:53:14 +0200', 'journal': 'Statistics \\& Probability Letters', 'keywords': ['spike-and-slab', 'shrinkage', 'regularization'], 'number': 12, 'pages': '1920--1928', 'publisher': 'Elsevier', 'title': 'Consistency of spike and slab regression', 'volume': 81, 'year': 2011}, 'mccullagh2019generalized': {'type': 'book', 'title': 'Generalized linear models', 'author': 'McCullagh, Peter and Nelder, John A', 'year': 1983, 'publisher': 'Routledge'}, 'friedman2001elements': {'type': 'book', 'title': 'The elements of statistica

In [7]:
print(data.values())

dict_values([{'type': 'article', 'title': 'Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'author': 'Cao, Zheng and Li, Gang and Song, Haiyan', 'journal': 'Annals of Tourism Research', 'volume': 67, 'pages': '1--13', 'year': 2017, 'publisher': 'Elsevier'}, {'type': 'article', 'author': 'Ishwaran, Hemant and Rao, J Sunil', 'date-added': '2019-09-05 21:53:03 +0200', 'date-modified': '2019-09-05 21:53:14 +0200', 'journal': 'Statistics \\& Probability Letters', 'keywords': ['spike-and-slab', 'shrinkage', 'regularization'], 'number': 12, 'pages': '1920--1928', 'publisher': 'Elsevier', 'title': 'Consistency of spike and slab regression', 'volume': 81, 'year': 2011}, {'type': 'book', 'title': 'Generalized linear models', 'author': 'McCullagh, Peter and Nelder, John A', 'year': 1983, 'publisher': 'Routledge'}, {'type': 'book', 'title': 'The elements of statistical learning', 'author': 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert', 'v

**2. Create the new dictionary with common data**

Your output should look as follows:

In [11]:
print(common_data)

{'entry name': ['cao2017modelling', 'ishwaran2011consistency', 'mccullagh2019generalized', 'friedman2001elements', 'FanLv2008sure'], 'type': ['article', 'article', 'book', 'book', 'article'], 'title': ['Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'Consistency of spike and slab regression', 'Generalized linear models', 'The elements of statistical learning', 'Sure Independence Screening for Ultrahigh Dimensional Feature Space.'], 'author': ['Cao, Zheng and Li, Gang and Song, Haiyan', 'Ishwaran, Hemant and Rao, J Sunil', 'McCullagh, Peter and Nelder, John A', 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert', 'Fan, Jianqing and Lv, Jinchi'], 'year': [2017, 2011, 1983, 2001, 2008]}


In [48]:
# Keep the keys of the first entry that also appear in every other entry
entry_dicts = list(data.values())
common_keys = [key for key in entry_dicts[0]
               if all(key in entry for entry in entry_dicts[1:])]

# Start with the entry names, then collect the values of each common key
common_data = {"entry name": list(data.keys())}
for key in common_keys:
    common_data[key] = [entry[key] for entry in entry_dicts]

print(common_data)
        

{'entry name': ['cao2017modelling', 'ishwaran2011consistency', 'mccullagh2019generalized', 'friedman2001elements', 'FanLv2008sure'], 'type': ['article', 'article', 'book', 'book', 'article'], 'title': ['Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'Consistency of spike and slab regression', 'Generalized linear models', 'The elements of statistical learning', 'Sure Independence Screening for Ultrahigh Dimensional Feature Space.'], 'author': ['Cao, Zheng and Li, Gang and Song, Haiyan', 'Ishwaran, Hemant and Rao, J Sunil', 'McCullagh, Peter and Nelder, John A', 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert', 'Fan, Jianqing and Lv, Jinchi'], 'year': [2017, 2011, 1983, 2001, 2008]}


**3. Create the new dictionary with common data for a specific entry type**

Using a user-defined function `find_common_data()`, your output should look as follows:

In [13]:
print(find_common_data('article'))

{'entry name': ['cao2017modelling', 'ishwaran2011consistency', 'FanLv2008sure'], 'type': ['article', 'article', 'article'], 'title': ['Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'Consistency of spike and slab regression', 'Sure Independence Screening for Ultrahigh Dimensional Feature Space.'], 'author': ['Cao, Zheng and Li, Gang and Song, Haiyan', 'Ishwaran, Hemant and Rao, J Sunil', 'Fan, Jianqing and Lv, Jinchi'], 'journal': ['Annals of Tourism Research', 'Statistics \\& Probability Letters', 'Journal of the Royal Statistical Society. Series B (Statistical Methodology)'], 'volume': [67, 81, 70], 'pages': ['1--13', '1920--1928', 849], 'year': [2017, 2011, 2008]}


In [14]:
print(find_common_data('book'))

{'entry name': ['mccullagh2019generalized', 'friedman2001elements'], 'type': ['book', 'book'], 'title': ['Generalized linear models', 'The elements of statistical learning'], 'author': ['McCullagh, Peter and Nelder, John A', 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert'], 'year': [1983, 2001], 'publisher': ['Routledge', 'Springer series in statistics New York']}
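
One possible way to obtain output of this shape is sketched below. It is not the reference solution. Two things are assumptions of this example: the second parameter `records`, which passes the dictionary to filter explicitly instead of relying on the global `data`, and the `sample` dictionary, which is a trimmed copy of four entries from `data` with only some fields kept so that the block runs on its own.

```python
def find_common_data(entry_type, records):
    """Summarize the keys shared by all entries of the given type.

    Sketch only: `records` (the nested dictionary to filter) is passed
    explicitly here, which is an assumption of this example.
    """
    # Keep only the entries whose 'type' matches
    selected = {name: entry for name, entry in records.items()
                if entry.get('type') == entry_type}
    summary = {'entry name': list(selected)}
    if not selected:
        return summary

    entries = list(selected.values())
    # Walk the keys of the first entry so the output order is predictable
    for key in entries[0]:
        if all(key in entry for entry in entries):
            summary[key] = [entry[key] for entry in entries]
    return summary


# Trimmed copy of four entries from `data` (only some fields kept)
sample = {
    'cao2017modelling': {'type': 'article', 'volume': 67, 'year': 2017,
                         'publisher': 'Elsevier'},
    'mccullagh2019generalized': {'type': 'book', 'year': 1983,
                                 'publisher': 'Routledge'},
    'friedman2001elements': {'type': 'book', 'volume': 1, 'year': 2001,
                             'publisher': 'Springer series in statistics New York'},
    'FanLv2008sure': {'type': 'article', 'volume': 70, 'year': 2008},
}

print(find_common_data('article', sample))
# {'entry name': ['cao2017modelling', 'FanLv2008sure'], 'type': ['article', 'article'], 'volume': [67, 70], 'year': [2017, 2008]}
print(list(find_common_data('book', sample)))
# ['entry name', 'type', 'year', 'publisher']
```

Filtering first and then reusing the common-key logic from point 2 means the same function serves both `'article'` and `'book'`.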


In [47]:
def find_common_data(value):
    # Partial solution: this only prints the entries in `data` whose 'type'
    # matches `value`; it does not yet build the summary dictionary
    for entry in data.values():
        if entry["type"] == value:
            print(entry)


find_common_data("book")
print('\n')
find_common_data("article")

{'type': 'book', 'title': 'Generalized linear models', 'author': 'McCullagh, Peter and Nelder, John A', 'year': 1983, 'publisher': 'Routledge'}
{'type': 'book', 'title': 'The elements of statistical learning', 'author': 'Friedman, Jerome and Hastie, Trevor and Tibshirani, Robert', 'volume': 1, 'number': 10, 'year': 2001, 'publisher': 'Springer series in statistics New York'}


{'type': 'article', 'title': 'Modelling the interdependence of tourism demand: The global vector autoregressive approach', 'author': 'Cao, Zheng and Li, Gang and Song, Haiyan', 'journal': 'Annals of Tourism Research', 'volume': 67, 'pages': '1--13', 'year': 2017, 'publisher': 'Elsevier'}
{'type': 'article', 'author': 'Ishwaran, Hemant and Rao, J Sunil', 'date-added': '2019-09-05 21:53:03 +0200', 'date-modified': '2019-09-05 21:53:14 +0200', 'journal': 'Statistics \\& Probability Letters', 'keywords': ['spike-and-slab', 'shrinkage', 'regularization'], 'number': 12, 'pages': '1920--1928', 'publisher': 'Elsevier', '