# Topic: More Data Types
In this tutorial you will learn about some more data types in Python. The core exercises focus on NumPy arrays, because much of this content will be used when we start learning about data processing. The extension exercises focus on the other data types: strings, sets and dictionaries.

## Core Exercises

[NumPy Basics and Element-wise Operations](#numpy_basics)<br>

[Filtering and Aggregates](#aggregates)<br>

[User Input and Formatted Output Strings](#input_and_output):
> [Exercise: Cash Register](#cash_register)<br>

## Extension Exercises

[Strings](#strings):
> [Exercise: Word Count](#word_count)<br>

[Sets](#sets):
> [Exercise: Unique Words](#unique_words)<br>
> [Exercise: Shared Words](#shared_words)<br>

[Dictionaries](#dictionaries):
> [Exercise: Word Frequency Counter](#word_frequency)<br>
> [Exercise: Sorting a Dictionary](#dictionary_sort)<br>



<a name="numpy_basics"> </a>
## NumPy Basics and Element-wise Operations

In [None]:
import numpy as np

### One dimensional array (vector)

In [None]:
# create a list containing all the integers from 0 to 100


In [None]:
# now create a numpy array containing the integers from 0 to 100


In [None]:
# you would like to determine the square of each number from 0 to 100 and store them in a sequence (a list or an array is fine)

# a) demonstrate how to do this with a list by iterating with a for loop.


# b) demonstrate how to do this with an array by using an element-wise operation.


# which method is easier?

In [None]:
# now suppose we wanted to find the cosine of 100 evenly spaced values between 0 and 2*pi

# a) create a numpy array of 100 evenly spaced values between 0 and 2*pi


# b) use numpy's cos function to find the cosine of each of these numbers


# now run the code below to visualise your answer

In [None]:
# plotting previous answer (assumes your arrays from the cell above are named x and cos_x)
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(x, cos_x)
ax.set_title('Plot of $y = cos(x)$')
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()

### Two dimensional array (matrix)

In [None]:
# below is an example 4 x 3 matrix that has been represented with a list of lists.
matrix = [[5, 1, 3], [7, 4, 2], [1, 3, 1], [4, 5, 9]]

In [None]:
# convert the list of lists above to a numpy array


In [None]:
# use the ndim attribute to confirm the matrix has two dimensions


In [None]:
# use the shape attribute to confirm the dimensions of the matrix are 4 x 3


In [None]:
# divide every element of the matrix by 10


<a name="aggregates"> </a>
## Filtering and Aggregates

In [None]:
# below is a numpy array that represents daily minimum and maximum temperatures over the course of a month
temperature_data = np.array([[1, 13.3, 25.8],
                           [2, 15.1, 23.5],
                           [3, 13.1, 20.8],
                           [4, 13.1, 22.9],
                           [5, 12.1, 23.4],
                           [6, 12.7, 19.3],
                           [7, 12.4, 21.6],
                           [8, 12.2, 22.6],
                           [9, 11.8, 22.9],
                           [10, 10.3, 23.3],
                           [11, 9.9, 27.8],
                           [12, 10.1, 26.4],
                           [13, 12.1, 29.1],
                           [14, 13.1, 27.9],
                           [15, 11.3, 26.1],
                           [16, 14.5, 23.7],
                           [17, 14.7, 24.2],
                           [18, 16.7, 25.1],
                           [19, 13.4, 22.4],
                           [20, 8.7, 23.3],
                           [21, 11.1, 24.4],
                           [22, 12.4, 25.9],
                           [23, 14.0, 28.1],
                           [24, 13.6, 24.3],
                           [25, 13.2, 25.3],
                           [26, 11.8, 24.1],
                           [27, 14.2, 24.7],
                           [28, 13.4, 24.7],
                           [29, 13.2, 25.5],
                           [30, 13.7, 25.7]])

In [None]:
# we can access each column of the temperature data by indexing
days = temperature_data[:, 0]
min_temps = temperature_data[:, 1]
max_temps = temperature_data[:, 2]
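
The questions below rely on aggregates and boolean (logical) arrays. As a rough sketch of the general pattern, the cell below uses a small made-up array called `readings` (hypothetical data, not the temperature data above), so the names and thresholds are only for illustration.

```python
import numpy as np

# hypothetical data: column 0 is a label, column 1 is a measurement
readings = np.array([[1, 4.0],
                     [2, 9.5],
                     [3, 7.2],
                     [4, 10.1]])
labels = readings[:, 0]
values = readings[:, 1]

# aggregates summarise a whole array with a single number
largest = values.max()
average = values.mean()

# a comparison produces a boolean array with one entry per element
is_high = values > 9

# True counts as 1, so summing a boolean array counts the True entries
n_high = is_high.sum()

# indexing with a boolean array keeps only the entries where it is True
high_labels = labels[is_high]

# boolean arrays can be combined element-wise with | (or) and & (and)
is_low = values < 5
high_or_low_labels = labels[is_high | is_low]

print(largest, n_high, high_labels, high_or_low_labels)
```

The same steps should carry over to `max_temps`, `min_temps` and `days` with the thresholds given in each question.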

In [None]:
# what was the highest temperature in the month?


In [None]:
# what was the average maximum daily temperature for the month?


In [None]:
# what is the standard deviation for maximum temperature that month?


In [None]:
# create a logical array that says whether a day is warm. It should say True (1) for days with a maximum temperature above 26 degrees and False (0) otherwise


In [None]:
# how many warm days occurred in the month?


In [None]:
# which days of the month were warm days?


In [None]:
# create a logical array that says whether a day is cool. It should say True for days with a minimum temperature below 10 degrees and False otherwise.


In [None]:
# how many cool days occurred in the month?


In [None]:
# which days of the month were either warm or cool?


<a name="input_and_output"> </a>
## User Input and Formatted Output Strings
User input and formatted output strings are useful for creating code that will interact with people who aren't programmers. A simple example of a cost calculator is shown below. You will use this example to help create a function that interacts with an attendant at a cash register.


### Cost calculator (example)

In [1]:
def cost_calculator():
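    # input always returns a string, so the answers are converted to numbers before multiplying
    item_cost = input('How much (in dollars) does a single item cost? ')
    n_items = input('How many items would you like to purchase? ')
    cost = float(item_cost) * float(n_items)

    # an f-string substitutes the value of each expression inside curly braces
    print(f'It will cost ${cost} to purchase {n_items} items that cost ${item_cost} each.')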

In [None]:
cost_calculator()
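
When printing money it is often tidier to show exactly two decimal places. One way to do this, sketched below, is to add a format specifier such as `:.2f` inside the curly braces of an f-string. The values of `item_cost` and `n_items` here are hypothetical stand-ins for what a user might type, so that the cell runs without prompting.

```python
# hypothetical values standing in for converted user input
item_cost = 2.5
n_items = 3
cost = item_cost * n_items

# :.2f formats a number as a decimal with exactly two digits after the point
message = f'It will cost ${cost:.2f} to purchase {n_items} items that cost ${item_cost:.2f} each.'
print(message)
```

This prints `It will cost $7.50 to purchase 3 items that cost $2.50 each.`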

<a name="cash_register"> </a>
### Exercise: Cash Register
You are designing a cash register program to be used at a sausage sizzle. There are three items for sale: standard sausages (\\$3), premium sausages (\\$5), and soft drink cans (\\$2). Your function should ask the person at the register how many of each item were ordered, and then how much cash the customer handed over. It should then use a formatted output string to report how much change to give the customer. If the customer did not hand over enough money, the output string should say so and state how much more is needed.

Bonus (optional): you can also get your function to track inventory (how much of each item has been sold) and revenue (total money made) over the course of multiple purchases.

In [None]:
def sausage_sizzle_purchase():
    pass # replace pass with your solution

In [None]:
# call the function here to test if it works

<a name="strings"> </a>
## Working with Strings
Often in programming we are required to extract certain features from a piece of text. For example, you might want to find the most frequently used words in a collection of product reviews to capture how people feel about the product. This typically involves the use of string methods and loops. There are many string methods, so this tutorial will point out the ones that are useful for each task.

The test cases for the following exercises will be two quotes from Hamlet.

In [None]:
quote_1 = """To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them. To die: to sleep..."""

In [None]:
quote_2 = """This above all: to thine own self be true,
And it must follow, as the night the day,
Thou canst not then be false to any man."""

### Accessing lines and words in text

In [None]:
# access lines with the splitlines method
quote_1_lines = quote_1.splitlines()
quote_1_lines

In [None]:
# since each line is stored as a separate element of a list, you can now access individual lines or iterate through each line.
# example: access the first line
quote_1_lines[0]

In [None]:
# you can access each word using the split method
quote_1_words = quote_1.split()
quote_1_words

In [None]:
# You will notice that there is punctuation attached to some of the words. Some of the words also start with capital letters.
# This can be cleaned up with the lower method and the strip method.
import string

quote_1_words = quote_1.lower().split() # converts to lower case, then splits into words

# for loop to strip punctuation from each word
for word in quote_1_words:
    cleaned_word = word.strip(string.punctuation)
    print(cleaned_word)

<a name="word_count"> </a>
### Exercise: Word Count
Create a function that counts the number of words in a piece of text. Then test your function by calling it on the two example texts.

In [None]:
def word_count(text):
    pass # replace with your own solution

In [None]:
word_count(quote_1) # should return 43

In [None]:
word_count(quote_2) # should return 27

<a name="sets"> </a>
## Sets

<a name="unique_words"> </a>
### Exercise: Finding all unique words in a piece of text
Suppose we were instead interested in each unique word in a piece of text. A convenient way to achieve this is to store each word in a set. A set does not allow duplicate entries, so it stores each word only once. You can start with an empty set using the set function and then use the add method to add each word to the set. Alternatively, you could add all the words to a list using the append method, and then convert that list to a set.
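
As a small illustration of the two approaches, here is a sketch that uses a made-up list called `colours` (hypothetical data, chosen so it does not give away the exercise).

```python
# hypothetical list that contains duplicates
colours = ['red', 'blue', 'red', 'green', 'blue']

# approach 1: start with an empty set and add each item in a loop
unique_colours = set()
for colour in colours:
    unique_colours.add(colour)  # adding an item that is already present has no effect

# approach 2: convert the whole list to a set in one step
also_unique = set(colours)

print(len(colours), len(unique_colours))
print(unique_colours == also_unique)
```

Both approaches give a set of the same three colours. Sets are unordered, so the order in which the items are displayed may vary.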

In [None]:
def unique_words(text):
    pass # replace with your own solution

In [None]:
unique_words(quote_1)

<a name="shared_words"> </a>
### Exercise: Finding words shared between two texts
Another benefit of working with sets is that you gain access to set operations such as intersection and union. We can use intersection (either the & operator or the intersection method) to easily find which words appear in both of two pieces of text.
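
A minimal sketch of these operations, using two made-up sets of numbers (`evens` and `small` are hypothetical names for illustration only):

```python
# two hypothetical sets with some elements in common
evens = {2, 4, 6, 8}
small = {1, 2, 3, 4}

# intersection: elements that appear in both sets
in_both = evens & small
in_both_method = evens.intersection(small)  # same result as the & operator

# union: elements that appear in at least one of the sets
in_either = evens | small

print(in_both)
print(in_either)
```

Here `in_both` contains 2 and 4, while `in_either` contains 1, 2, 3, 4, 6 and 8.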

Create the function below, which accepts two strings as arguments. It should output the set of words that the two texts have in common.

In [None]:
def shared_words(text1, text2):
    text1_words = unique_words(text1)
    text2_words = unique_words(text2)
    return text1_words & text2_words


In [None]:
shared_words(quote_1, quote_2) # should return {'and', 'be', 'not', 'the', 'to'}

<a name="dictionaries"> </a>
## Dictionaries
A dictionary is a collection of unique identifiers (called keys) that each have some information associated with them (called values). In these exercises you will create dictionaries that store information about each word, such as its length or how frequently it appears in a text.
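
Before moving on to words, here is a short sketch of the basic dictionary operations. It reuses the sausage sizzle prices from earlier in the tutorial; the variable name `prices` is just an illustrative choice.

```python
# keys are item names, values are prices in dollars
prices = {'standard sausage': 3, 'premium sausage': 5}

# assigning to a new key adds an entry to the dictionary
prices['soft drink can'] = 2

# look up a value using its key
premium_price = prices['premium sausage']

# the in operator checks whether a key is present
has_soft_drink = 'soft drink can' in prices

# the items method gives each key together with its value
for item, price in prices.items():
    print(f'{item}: ${price}')
```

Assigning to a key that already exists would overwrite its value rather than create a duplicate, which is why the keys are always unique.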

### Example: Word Lengths
The example below shows how each unique word can be stored as a key of a dictionary, with its length stored as the corresponding value. It calls the unique_words function from earlier in the tutorial to simplify the task, so you will need to complete that exercise first.

In [None]:
def word_lengths(text):
    words = unique_words(text)
    word_dictionary = {}
    for word in words:
        word_dictionary[word] = len(word)
    
    return word_dictionary

In [None]:
word_lengths(quote_1)

<a name="word_frequency"> </a>
### Exercise: Word Frequency Counter
Create a function that reads through a piece of text. It should output a dictionary where the keys are the unique words in the text, and the values are the frequency of each word (how many times it appeared in the text).

In [None]:
def word_frequency(text):
    pass # replace with your own code

In [None]:
# test case
word_frequency(quote_1)

<a name="dictionary_sort"> </a>
### Exercise: Sorting a Dictionary
You will have noticed that the dictionary entries aren't sorted in alphabetical order: the order is based on when the word was entered into the dictionary. Create a function below that accepts a dictionary input and outputs a dictionary where the keys have been sorted in alphabetical order.

In [None]:
def sort_dictionary(dictionary):
    pass # replace with your own code
           

In [None]:
# test your code here by calling it on a dictionary

In [None]:
# optional: can you create a function that sorts the dictionary by value instead of by key?