# Fundamentals of Data Analysis (FoDA) - Tasks 2020

This is the workbook for the tasks that I have created for the FoDA module. It was created by Sheldon D'Souza (G00387857@gmit.ie).

***

### Task for the week of October 5, 2020 - Writing a function 'counts'

The objective of this task is to write a Python function called 'counts' that takes a list as input and returns a dictionary with the unique items in the list as keys and the number of times each item appears as values.

#### How the function works

- The function requires a list to be passed as an argument
- It will then ask the user for input on whether the user wants to count upper and lower case items as unique. A message will appear on the screen:
```       
Do you want the  count to be case sensitive
Enter '1' to treat all as upper case
Enter '2' to treat all as lower case
Enter '3' to count in original case
```
- Depending on the choice the user makes, the function will treat the unique items in the list as case sensitive or otherwise (e.g. a list containing 'A' and 'a' will give a count of A:2 or a:2 when option 1 or 2 is selected respectively, and A:1, a:1 when option 3 is selected). If the user inputs anything other than 1, 2 or 3, the function returns an error message and terminates
- The function will return a dictionary of unique items OR return an error message if the list has items which are outside the parameters (mentioned in the function limitations below)
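
The effect of the three options can be sketched without the interactive prompt. This is a minimal illustration only: the helper `count_with_case` and its `option` parameter are hypothetical names, not part of the function in this workbook, and the sketch ignores embedded lists, tuples and dictionaries.

```python
def count_with_case(items, option):
    """Count items, folding the case of strings according to option '1', '2' or '3'."""
    counts = {}
    for item in items:
        if isinstance(item, str):
            if option == '1':      # treat all strings as upper case
                item = item.upper()
            elif option == '2':    # treat all strings as lower case
                item = item.lower()
            # option '3' leaves the original case untouched
        counts[item] = counts.get(item, 0) + 1
    return counts

sample = ['A', 'a', 1]
upper_counts = count_with_case(sample, '1')     # {'A': 2, 1: 1}
lower_counts = count_with_case(sample, '2')     # {'a': 2, 1: 1}
original_counts = count_with_case(sample, '3')  # {'A': 1, 'a': 1, 1: 1}
```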


#### Function scope and limitations

The function accepts a list subject to the following limitations:

- The list can contain any characters (alphanumeric, special characters etc.)
- The list can contain other lists or tuples. However, an embedded list or tuple cannot contain any further lists, tuples or dictionaries. In other words, ONLY one layer of embedded lists, tuples or dictionaries is allowed
- The list can contain a dictionary. However, that dictionary cannot contain any further list, tuple or dictionary (see previous point). Furthermore, the embedded dictionary needs to contain only numeric values.
- The items of embedded lists and tuples are counted as if they were items of the main list, and the values of an embedded dictionary are added to the counts for its keys

If the above limitations are exceeded, the function will return an error message and terminate.
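
What counts as a 'numeric value' depends on how the check is written. Assuming the check is `str(value).isnumeric()` (the string method described in the main code block section), only non-negative whole numbers pass; decimals and negative numbers do not. The helper name `passes_numeric_check` is made up for this sketch.

```python
def passes_numeric_check(value):
    """Mirror a str(value).isnumeric() test on a single dictionary value."""
    return str(value).isnumeric()

# Map a few candidate values to the result of the check
results = {value: passes_numeric_check(value) for value in [7, 12, 1.5, -3, 'x']}
# 7 and 12 pass; 1.5, -3 and 'x' fail because '.', '-' and letters are not numeric characters
```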


#### Researching and planning the exercise

I had seen a similar exercise covered in 'Automate the Boring Stuff with Python' by Al Sweigart [1] which I used as a base for the exercise. 

I planned the assignment as follows:

- Researching and writing the core code which does the count
- Adding the code for the embedded lists, tuples and dictionaries
- Adding code to check whether the values of the embedded dictionaries are numeric
- Adding code to merge the embedded dictionaries and the 'output' dictionary together
- Adding the code for the user input, which sets a flag for whether the characters should be counted as case sensitive, and adding code for each 'flag' scenario
- Adding try and except code to return an error message and terminate the function in case the input exceeds the inbuilt limitations



#### Writing the code

*Core code*

The core piece of code is not very complicated and mainly revolves around the use of a for loop to iterate over the list [2].

The core code uses the 'setdefault' method of a dictionary object to create keys where they do not already exist and uses a counter to keep track of the number of instances of a particular key. A sample of this code is below:

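```python
count_dict.setdefault(item, 0)  # create the key with a count of zero if it does not exist yet
count_dict[item] = count_dict[item] + 1  # add one to the count for this key
```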
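
For comparison, the standard library's `collections.Counter` does the same bookkeeping. The sketch below is not the approach used in this workbook; it only shows that the 'setdefault' pattern and `Counter` give the same result on a flat list. The sample list is made up.

```python
from collections import Counter

items = ['spam', 'eggs', 'spam', 1, 1, 1]

# The setdefault pattern: create a missing key at zero, then add one
count_dict = {}
for item in items:
    count_dict.setdefault(item, 0)
    count_dict[item] = count_dict[item] + 1

# Counter performs the same counting in a single call
counter_dict = dict(Counter(items))
```

Writing the loop by hand keeps the logic visible, which is useful here because the function later needs to treat strings, lists, tuples and dictionaries differently.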

*Input code*

The input code uses the input function to display a prompt asking the user to choose one of the numbers 1, 2 or 3. The function then uses an if, elif and else statement to evaluate the user input and sets a variable to a certain state depending on that input. The else statement terminates the function and returns an error message if the user inputs an incorrect choice. (See documentation above for details of the choices offered) [3]
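
The mapping from the user's choice to the flag can be sketched as a small pure function. The name `case_flag` is hypothetical, and returning `None` for an invalid choice is an assumption made for this sketch; the function in this workbook does the mapping inline and returns an error message instead.

```python
def case_flag(choice):
    """Map the menu choice ('1', '2' or '3') to a flag; return None if the choice is invalid."""
    if choice == '1':
        return 'upper'
    elif choice == '2':
        return 'lower'
    elif choice == '3':
        return ''      # empty string means: count in original case
    else:
        return None    # caller decides how to report the invalid choice

# In interactive use the argument would come from input(), e.g. case_flag(input("Enter 1, 2 or 3"))
flags = [case_flag(choice) for choice in ['1', '2', '3', 'x']]
```

Keeping the prompt separate from the mapping makes the mapping easy to check without typing at a prompt.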


*The main code block*

The main code of the function is written as follows:
1. Two empty dictionaries and an empty list are created. 'count_dict' holds the running count of the items, 'final_dict' holds the output of the function, and 'templist' accumulates any embedded dictionaries passed in the argument (these are merged in a later part of the code).
2. A for loop iterates through the list and checks the data type of each element.
3. The function uses Python's built-in type function to check the data type of each element of the list passed as an argument.
4. A series of if, elif and else statements then runs separate code depending on the data type.
5. If the element is a list or tuple (i.e. one embedded within the main list passed), the function runs another for loop to iterate over each element within that list or tuple and runs the 'core code' as detailed in the 'Core code' section above.
6. If the element is a dictionary, the code uses a for loop to iterate through the dictionary, with an if statement to check whether each value of the dictionary is numeric (so that it can be combined with the main output dictionary). If any of the values of the dictionary is not numeric, the function terminates and returns a string with the reason for the error. The code to accomplish this uses a flag variable that increases by 1 each time a value within the dictionary is numeric (isnumeric method of a string object, applied to the value converted to a string). The flag variable is compared to the length of the dictionary, and if the two are the same it means that all the values in the dictionary are numeric. In this case the function appends the 'validated' dictionary to a temporary list using the append method of a list object.
7. The next three blocks of code (2 elif and 1 else) check whether the user wants the alpha characters to be upper case, lower case or original case and then count them accordingly, converting them where necessary. The function checks whether the data type is a string and also checks the value of the 'flag' variable set in the input block of the code. The code then uses the 'upper' or 'lower' method of the string to convert the data item to upper or lower case. If no flag is selected (i.e. the flag variable is set to ''), the function treats upper and lower case data items as distinct for the purpose of the output dictionary.
8. The final code in the code block merges the output and the 'embedded input' dictionaries together. The code was inspired by a Stack Overflow post [4]. As mentioned in point 6 above, the embedded dictionaries are checked for validity and then appended onto a temporary list. The output dictionary is also appended onto this list, so we get a list of dictionaries. We then use two for loops to (1) go through each dictionary and (2) go through the items (key, value pairs) within each dictionary. Using the 'setdefault' method of the dictionary object, we add the keys of the dictionaries to a new dictionary called 'final_dict'. Setdefault ensures that the current key will be added to 'final_dict' if it does not already exist, and the values associated with that key are then added together on each iteration of the for loop.
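
The merge in step 8 can be sketched on its own. The function name `merge_counts` and the sample dictionaries are made up for illustration; the loop body follows the 'setdefault' pattern described above.

```python
def merge_counts(dicts):
    """Merge a list of dictionaries, adding together the values of shared keys."""
    final_dict = {}
    for d in dicts:                       # (1) go through each dictionary
        for key, value in d.items():      # (2) go through its key, value pairs
            final_dict.setdefault(key, 0) # create the key at zero if it is new
            final_dict[key] = final_dict[key] + value
    return final_dict

# Two 'embedded' dictionaries followed by a dictionary of counts
merged = merge_counts([{1: 7, 3: 2}, {1: 12, 'cat': 4}, {1: 3, 3: 5}])
# key 1 appears in all three: 7 + 12 + 3 = 22
```

Because the values are simply added, an embedded dictionary such as `{1: 7}` behaves as if the item 1 had already been counted seven times.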



*Error handling*

As mentioned in the function scope and limitations section above, the function will accept one layer of embedded lists, tuples or dictionaries within the original input list. A list or dictionary placed inside an embedded list or tuple is unhashable, so using it as a dictionary key raises a TypeError. The whole function body sits inside a try and except block, which catches any TypeError generated and returns an error message to the user specifying the error. [5]
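
A minimal sketch of this pattern is below. `safe_count` is a simplified, hypothetical stand-in for the full function: it only shows how a TypeError from an unhashable key is caught and turned into a returned message.

```python
def safe_count(items):
    """Count hashable items; return an error string if an item cannot be a dictionary key."""
    try:
        counts = {}
        for item in items:
            counts.setdefault(item, 0)   # raises TypeError if item is a list or dict
            counts[item] = counts[item] + 1
        return counts
    except TypeError as error:
        return f"Error: {error}"

ok = safe_count([1, 1, (2, 3)])   # a tuple of hashable items is a valid key
bad = safe_count([1, [2, 3]])     # a list is not hashable, so an error string is returned
```

Note that a tuple of hashable items does not trigger the error, since it can be used as a dictionary key.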




#### References
[1] 'Automate the Boring Stuff with Python' by Al Sweigart - Chapter 5 - Dictionaries and Structuring Data: The Dictionary Data Type

[2] 'A Whirlwind Tour of Python' by Jake VanderPlas - 'Control Flow'

[3] https://www.askpython.com/python/examples/python-user-input

[4] https://stackoverflow.com/questions/20509570/merge-dictionaries-without-overwriting-previous-value-where-value-is-a-list

[5] 'A Whirlwind Tour of Python' by Jake VanderPlas - 'Errors and Exceptions'


In [44]:
def count(a_list):
    '''Take a list as an argument and return a dictionary of the unique items
    and the number of times they appear in the list.'''

    try: # a try/except block handles the TypeError raised when an invalid list is input

        # ask the user whether the count for alpha characters should be case sensitive or not
        # (adjacent string literals are joined, so no stray indentation ends up in the prompt)
        flag = input("Do you want the count to be case sensitive. "
                     "Enter '1' to treat all as upper case. "
                     "Enter '2' to treat all as lower case. "
                     "Enter '3' to count in original case")
        if flag == '1':
            flag = 'upper'
        elif flag == '2':
            flag = 'lower'
        elif flag == '3':
            flag = ''
        else:
            # an invalid choice terminates the function
            return ('Invalid input. Please rerun program '
                    'and make the correct case sensitivity choice')
        
        
        
        count_dict = {} # dictionary that counts the items in the list. Starts as blank
        final_dict = {} # the final dictionary returned by the function. Starts as blank
        templist = [] # temp list to accumulate the dictionaries. Starts as blank

        for item in a_list: # iterate through each item in the list and count it based on its type
            
            if type(item)==list or type(item)==tuple: #The program will allow one layer of list or tuples within the input list
                for inner_item in item:
                    if type(inner_item) == str and flag == 'upper':
                        inner_item = inner_item.upper()
                    elif type(inner_item) == str and flag == 'lower':
                        inner_item = inner_item.lower()
                    count_dict.setdefault(inner_item, 0) #setdefault will generate a key if none exists and give it the value of zero
                    count_dict[inner_item] = count_dict[inner_item] + 1 #1 will be added to the 'value' of the key in the previous line
                        
                    
            elif type(item)==dict:
                tempflag = 0 # counts how many values of the dictionary are numeric
                for value in item.values():
                    if str(value).isnumeric(): # checks whether each value in the dictionary is numeric
                        tempflag = tempflag + 1 # if numeric, the flag value increases by 1
                if tempflag == len(item): # if all values are numeric, the flag value equals the length of the dictionary
                    templist.append(item) # the dictionary is valid and is added to the temporary list for further processing
                else:
                    # if the values of the inner dictionary are not all numbers, the function terminates and returns an error to the user
                    return ('Error. One or more values of the input dictionary are not numeric. '
                            'Please include a proper dictionary')
                    
            
            #The next three blocks of code (2 elif and 1 else) checks whether the user wants the alpha characters to be upper
            # or lower case or original case and then counts them accordingly converting them where necessary.
            elif type(item)==str and flag == 'lower':
                item_lower = item.lower()
                count_dict.setdefault(item_lower, 0)
                count_dict[item_lower] = count_dict[item_lower] + 1

            elif type(item)==str and flag == 'upper':
                item_upper = item.upper()
                count_dict.setdefault(item_upper, 0)
                count_dict[item_upper] = count_dict[item_upper] + 1

            else:
                count_dict.setdefault(item, 0)
                count_dict[item] = count_dict[item] + 1

        templist.append(count_dict) # add the counts to the list so that it holds all the dictionaries

        for d in templist: # go through each item in the list (each item is a dictionary)
            for k, v in d.items(): # and for each dictionary iterate through each key, value pair
                final_dict.setdefault(k, 0) # setdefault adds the key to final_dict if it doesn't already exist
                final_dict[k] = final_dict[k] + v # and add the 'values' of the keys together
        
        return final_dict
    
    except TypeError:
        return ("Error. You have included a list/tuple within a list/tuple or an invalid dictionary. "
                "Please refer to Readme file to see acceptable input parameters")

#### Testing the function

See below a test of the function. Option 2 (treat all as lower case) was entered at the prompt:

In [46]:
count([[1,2,1,2,3],[2,3,5,7,9],3,7,{1:7, 3:2},{1:12, 5:7, 'cat':4},'a', 'A', 'a', 'A',(1,2,3),(3,4,5),'spam','eggs',('spam','eggs')])

Do you want the count to be case sensitive. Enter '1' to treat all as upper case. Enter '2' to treat all as lower case. Enter '3' to count in original case2


{1: 22,
 3: 7,
 5: 9,
 'cat': 4,
 2: 4,
 7: 2,
 9: 1,
 'a': 4,
 4: 1,
 'spam': 2,
 'eggs': 2}
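
Because the function reads its option with input(), repeating a test like the one above means typing at the prompt each time. One possible way around this, sketched below, is to replace input temporarily with `unittest.mock.patch`. The function `ask_case` is a hypothetical stand-in for any function that calls input(); it is not the function from this workbook.

```python
from unittest.mock import patch

def ask_case():
    """Hypothetical stand-in: read a menu choice with input() and map it to a flag."""
    choice = input("Enter '1', '2' or '3': ")
    return {'1': 'upper', '2': 'lower', '3': ''}.get(choice, 'invalid')

# Inside the with block every call to input() returns '2' without prompting
with patch('builtins.input', return_value='2'):
    result = ask_case()
```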

### END TASK 1

***

### Task for the week of November 2nd, 2020 - Writing a function 'dicerolls'

The objective of this task is to write a Python function called 'dicerolls' that simulates rolling 'k' dice 'n' times. The function takes two inputs: (1) 'k', the number of dice rolled, and (2) 'n', the number of times the 'k' dice are rolled. The face values of the dice are added together after each roll. The function returns a dictionary whose keys are the sums rolled and whose values are the number of times each sum was rolled.

The end dictionary should look something like: 

`{2:19,3:50,4:82,5:112,6:135,7:174,8:133,9:114,10:75,11:70,12:36}`
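
The counts in the sample dictionary add up to 1000 and the keys run from 2 to 12, which suggests 1000 rolls of two dice. As a rough sanity check, the exact number of ways to reach each sum can be enumerated. This is a sketch of my own rather than part of the task, and `exact_sum_counts` is a made-up name.

```python
from itertools import product
from collections import Counter

def exact_sum_counts(k):
    """Count how many of the 6**k equally likely outcomes of k dice give each sum."""
    return dict(Counter(sum(faces) for faces in product(range(1, 7), repeat=k)))

two_dice = exact_sum_counts(2)
# 6 of the 36 outcomes sum to 7, so about 1000 * 6 / 36 = 167 sevens are expected in 1000 rolls
expected_sevens = 1000 * two_dice[7] / 36
```

The 174 sevens in the sample dictionary are close to this expected value.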

Project Plan:

- Write proof of concept code (as the code seems pretty simple). The code consists mainly of two for loops
- Include input validation and error handling
- Tighten up the code by using list comprehension etc.
- Finalise the code by cleaning up variables etc.
- Include an analysis of the data using graphs
- Any other interesting analysis
- Add commentary and observations
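
The validation and list comprehension steps of the plan might look something like the sketch below. It is one possible shape, not the final code: the name `dicerolls_compact`, the `seed` parameter and the choice to raise a ValueError are all assumptions made for illustration.

```python
import random
from collections import Counter

def dicerolls_compact(k, n, seed=None):
    """Roll k dice n times and count how often each sum of face values occurs."""
    # Input validation: both arguments must be positive whole numbers
    if not (isinstance(k, int) and isinstance(n, int)) or k < 1 or n < 1:
        raise ValueError("k and n must be positive integers")
    rng = random.Random(seed)  # a seed makes the simulation repeatable
    # One sum per roll: the inner generator adds k dice, the outer comprehension repeats n times
    totals = [sum(rng.randint(1, 6) for _ in range(k)) for _ in range(n)]
    return dict(Counter(totals))

result = dicerolls_compact(2, 1000, seed=1)
```

With two dice every key in the result lies between 2 and 12, and the values add up to the number of rolls.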

In [15]:
import random #import random module

def dicerolls(k, n):
    totals = [] #empty list to accumulate the sum of the dice after each roll
    d = {} # dictionary to hold the final output
    for x in range(n): #first for loop to simulate the number of rolls
        count = 0 # variable to hold the sum of the values per roll
        for y in range(k): # for loop to simulate the number of dice
            count = count + random.randint(1, 6) # random selection of the die face and accumulation in the count variable
        totals.append(count) #append the sum once per roll, after all k dice have been added
    for i in totals: #for loop to iterate through the list of roll sums
        d.setdefault(i, 0) #generate keys where none exist using the setdefault method
        d[i] = d[i] + 1 #accumulate the values for each key
    return d #returns the final dictionary

In [14]:
dicerolls(1,100000)

{4: 16636, 1: 16685, 5: 16586, 2: 16878, 3: 16706, 6: 16509}