<a href="https://colab.research.google.com/github/acedesci/scanalytics/blob/master/S03_Data_Structures_1/03_Lecture_Example1_Intro_Structures.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# S3 - Python Data Structures I

Programming topics covered in this section:
- Strings/Characters
- Lists/Tuples
- Dictionaries



---
## 1. String
A string is an immutable sequence of characters. Python provides a large number of functions and methods to manipulate strings. See [this page](https://www.w3schools.com/python/python_strings.asp) and [this page](https://docs.python.org/3.7/library/stdtypes.html#string-methods) for more information.
 
Each element of a string (and of other sequences) can be accessed with an index between square brackets. All indexes in Python start at 0.

**Useful functions for a string (and other sequences):**
- `len()`: get the length of a string (i.e., the number of characters)
- `in` keyword: check if a certain phrase or character is present in a string
- `not in` keyword: check if a certain phrase or character is NOT present in a string

**Useful string methods:**
- `.capitalize()`: returns a copy of the string with its first character capitalized and the rest lowercased
- `.lower()`: returns a copy of the string where all characters are in lower case
- `.upper()`: returns a copy of the string where all characters are in upper case
- `.split(sep)`: returns a list of the words in the string, using `sep` as the delimiter string

**String operations:**
- `+`: performs string concatenation
- `*`: performs repetition of the string 
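
The methods and operations listed above can be sketched as follows. This is only a minimal illustration; the variable names (`city`, `label`, `separator`, and so on) and the values are hypothetical and are not part of the lecture data.

```python
# Illustrative values (hypothetical, not from the lecture data)
city = "montreal"

label = city.capitalize()          # first character upper case, rest lower case
full_label = label + "-" + "QC"    # concatenation with +
separator = "=" * 10               # repetition with *
parts = full_label.split("-")      # split into a list of strings

print(label)
print(full_label)
print(separator)
print(parts)

# Strings are immutable: assigning to an index raises a TypeError
try:
    city[0] = "M"
except TypeError as error:
    print("Cannot modify a string in place:", error)
```

Note that `.capitalize()` does not change `city`; it returns a new string, which is why the result is stored in a separate variable.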




### Example 1.1: Using common string methods and functions
Using common string methods, functions and operations.

In [None]:
# defining a variable of type string
province = "Alberta-AB"
print('Structure type', type(province))

# Using some common functions
print('Number of characters: ', len(province))
print('First letter/character: ', province[0])   # you can access the character at index n in a similar way (e.g., province[n])
print('Last letter/character: ', province[-1])   
print('Last 3 characters (string slices): ', province[-3:])
print('First 3 characters (string slices): ', province[:3])
print('Upper case the string: ', province.upper())
print('Lower case the string: ', province.lower())
print("Is 'A' in the string?:", "A" in province)
print("Is 'A' not in the string?:", "A" not in province)

In [None]:
# Using the split method
province_split = province.split('-')
print("Splitting the string using the separator '-' (it returns a list of strings): ", province_split)
print("The province's name (first element): ", province_split[0])
print("Abbreviation of the province's name (second element): ", province_split[1])

### Example 1.2: Extracting information from the product's reference
Define a function which returns the following information using the reference code of a product.

- The product reference (number)
- The day, month and year in which the product was produced
- The supplier reference (number)
- The full name of the province from which the product was delivered. The reference code uses the following convention:
    * QC: Quebec
    * ON: Ontario
    * BC: British Columbia
    * SK: Saskatchewan
    * MB: Manitoba
    * AB: Alberta
    
The format of the product's reference is as follows:

<div>
  <img src="attachment:ProductRef-2.png" width="500">
</div>

In [None]:
# defining a function which provides product information based on its reference
def ProductInfo(reference):
    """
    Return information based on the product's reference
    Parameters:
        reference: (string) the product's reference code, e.g. 'ON41-12012012-56'
    Return:
        prod_ref: (number) product's reference
        day: (number) production day
        month: (number) production month
        year: (number) production year
        sup_province: (string) name of the province from which the product was delivered
        sup_ref: (number) supplier's reference

    """
    # province abbreviations used in the reference code and their full names
    # (this is a dictionary; dictionaries are covered in Section 3)
    province_names = {'QC': 'Quebec', 'ON': 'Ontario', 'BC': 'British Columbia',
                      'SK': 'Saskatchewan', 'MB': 'Manitoba', 'AB': 'Alberta'}

    # split the input string once using the '-' separator
    supply_info, date, prod_ref = reference.split('-')

    day, month, year = int(date[:2]), int(date[2:4]), int(date[4:])
    sup_province = province_names[supply_info[:2]]   # first two characters: province abbreviation
    sup_ref = int(supply_info[2:])                   # remaining characters: supplier's reference

    return int(prod_ref), day, month, year, sup_province, sup_ref

# Determining the production year of the product based on its reference
prod_ref = 'ON41-12012012-56'
print('Production year of the product with reference number ', prod_ref,' : ', ProductInfo(prod_ref)[3])


---
## 2. Lists and Tuples

### Lists
Like a string, a **list** is a sequence of values. In a string, the values are characters; in a list, they can be any type. The values in a list are called **elements** or sometimes **items**.

There are several ways to create a new list; the simplest is to enclose the elements in square brackets (`[` and `]`):
- `[10, 20, 30, 40]`
- `['Quebec', 'Ontario', 'Alberta']`

Lists can contain strings, floats, and other lists. A list within another list is **nested**. A list with no elements is an **empty** list, which is created with empty brackets `[]`.
- `nested = [[5, 10], [12, 21], [10, 20]]`
- `empty = []`
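
As a small sketch of how nested and empty lists behave, the elements of an inner list can be reached by chaining indexes. The `record` list below is a hypothetical example added only to show that a list can mix types.

```python
nested = [[5, 10], [12, 21], [10, 20]]
empty = []

first_pair = nested[0]     # the first inner list
value = nested[1][0]       # first element of the second inner list
print(first_pair, value)

# len() counts the top-level elements only
print(len(nested), len(empty))

# A list can mix types (hypothetical record: name, rate, list of years)
record = ['Quebec', 8.0, [2019, 2020]]
print(record[2][-1])       # last element of the inner list
```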

For more information, check [this page](https://www.w3schools.com/python/python_lists.asp).

Code example:

In [None]:
prices = [12.5, 12.4, 12.0, 13.0, 12.6, 13.5, 12.8, 11.7]

# function "len(x)" gives the number of elements in the list
print(len(prices))

# Manipulating lists
# note that the index of the list always starts at zero, and then 1, 2, 3,...
print(prices[0])
print(prices[1])

In [None]:
# to get the last element, you can use the index -1. The second-to-last element is at -2, the third-to-last at -3, and so on
print(prices[-1])
print(prices[-2])

In [None]:
# if you write prices[a:b], where a and b are the starting and ending indexes, respectively,
# this gives the "slice" of the list from index a to index b-1
print(prices[2:5])
print(prices[2:-1])

In [None]:
# you can add an element to the list
province = ['Quebec', 'Ontario', 'Alberta']
province.append('British Columbia')
print(province)

# or remove an element from the list
province.remove('British Columbia')
print(province)

# or combine lists
province_2 = ['Prince Edward Island','Saskatchewan']
print(province + province_2)

Python lists support list comprehensions, which can be used in conjunction with `for` and conditional statements (see [this link](https://www.w3schools.com/python/python_lists_comprehension.asp) for more detail).
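
The general shape of a list comprehension is `[expression for item in iterable if condition]`, where the `if` part is optional. A minimal sketch, using a hypothetical list `quantities` that is not part of the lecture data:

```python
quantities = [3, 8, 5, 12]   # hypothetical order quantities

# expression applied to every element
doubled = [q * 2 for q in quantities]

# only the elements that satisfy the condition are kept
large = [q for q in quantities if q > 5]

print(doubled)
print(large)
```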

In [None]:
product_price = [1.5, 2.4, 2.0, 3.0]
product_demand = [100, 200, 150, 400]

n_elements = len(product_price) # get the length of the list
revenue = [product_price[i]*product_demand[i] for i in range(n_elements)] # total revenue for each product
print(revenue)

In [None]:
# calculate the revenue only for the products with a price of more than $2

In [None]:
# total revenue for each product for price > 2
revenue_p2 = [product_price[i]*product_demand[i] for i in range(n_elements) if product_price[i] > 2] 
print(revenue_p2)

In [None]:
# which is the same as using the for and if statements as follows
revenue_p2_loop = []
for i in range(n_elements):
    if product_price[i] > 2:
        revenue_p2_loop.append(product_price[i]*product_demand[i])

print(revenue_p2_loop)

----
### Tuples
A tuple is a sequence of values, and the values can be of any type. Unlike lists, tuples are immutable: once a tuple is created, its values cannot be modified. You can see [this page](https://www.w3schools.com/python/python_tuples.asp) for more information.

Syntactically, a tuple is a comma-separated list of values. It is common to enclose tuples in parentheses:
`t = ('a', 'b', 'c', 'd', 'e')`

*NOTE:* you can think of a tuple as a list that cannot be modified. Thus, a tuple is appropriate when the values must not be changed.
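
A short sketch of what can and cannot be done with a tuple: reading elements works as for lists, while item assignment fails. The names `name` and `abbreviation` are illustrative only.

```python
t = ('a', 'b', 'c', 'd', 'e')

# Indexing and slicing work as for lists
print(t[0], t[-1], t[1:3])

# Unpacking assigns each element to a variable (names are illustrative)
name, abbreviation = ('Quebec', 'QC')
print(name, abbreviation)

# Item assignment is not allowed on a tuple
try:
    t[0] = 'z'
except TypeError as error:
    print("Tuples are immutable:", error)
```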

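To create a tuple with a single element, you have to include a final comma. For example, after `t1 = 'a',`, the call `type(t1)` returns `<class 'tuple'>`.

A single value in parentheses, without a trailing comma, is not a tuple. For example, after `t2 = ('a')`, the call `type(t2)` returns `<class 'str'>`.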


In [None]:
province_tuple = ('Quebec', 'Ontario', 'Alberta')
print(type(province_tuple))

# we cannot make any change to the tuple: tuples have no append() method,
# so the next line raises an AttributeError
province_tuple.append('British Columbia')


In [None]:
# but we can copy it to a list and modify the list instead
province_list = list(province_tuple)
print(province_list)
province_list.append('British Columbia')
print(province_list)

---
## 3. Dictionaries
A **dictionary** is like a list, but more general. In a list, the indices have to be integers; in a dictionary they can be (almost) any type.

A dictionary contains a collection of indices, which are called **keys**, and a collection of values. Each key is associated with a single value. The association of a key and a value is called a **key-value pair** or sometimes an **item**.
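
As a minimal sketch of keys, values, and key-value pairs, the hypothetical dictionary `province_names` below maps province abbreviations to full names.

```python
# Hypothetical mapping from province abbreviation (key) to full name (value)
province_names = {'QC': 'Quebec', 'ON': 'Ontario'}

province_names['AB'] = 'Alberta'      # add a new key-value pair
print(province_names['QC'])           # look up a value by its key
print('BC' in province_names)         # membership tests check the keys
print(len(province_names))            # number of key-value pairs

# .items() gives each key together with its value
for abbreviation, name in province_names.items():
    print(abbreviation, '->', name)
```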

In mathematical language, a dictionary represents a **mapping** from keys to values, so you can also say that each key "maps to" a value. For more information, see [this page](https://www.w3schools.com/python/python_dictionaries.asp).

### Example: Shipment Rates
An online retailer determines its shipment rates (in $) based on the location of the customer as follows.



| Alberta (AB) | British Columbia (BC) | Manitoba (MB) | New Brunswick (NB) | Newfoundland and Labrador (NL) | Nova Scotia (NS) | Ontario (ON) | Prince Edward Island (PE) | Quebec (QC) | Saskatchewan (SK) | Yukon (YT) |
| :- | :- | :- | :- | :- | :- | :- | :- | :- | :- | :- |
|10 | 15 | 12.5 | NA | 30.5 | 25 | 8 | NA | 8 | 16 | 18.5 |

We would like to define a function which returns the shipment rate to charge a customer based on the customer's province. First, we create a dictionary to store the shipment rates adopted by the retailer.



In [None]:
# Format of the dictionary: {'province_abv': shipment rate}
# keys: abbreviation of the province
# values:  float - shipment rate if applicable

ship_rates = {'AB':  10,
             'BC':  15,
             'MB': 12.5,
             'NL': 30.5,
             'NS': 25,
             'ON':  8,
             'QC':  8,
             'SK':  16,
             'YT':  18.5} 
# Note that NB and PE are not included in the dictionary, as shipment is not available to these provinces
print(ship_rates)

In [None]:
cust_loc = 'QC'
print("The shipment rate for a customer in", cust_loc," is: ", ship_rates[cust_loc])
cust_loc = 'PE'
# 'PE' is not a key of the dictionary, so the lookup below raises a KeyError
print("The shipment rate for a customer in", cust_loc," is: ", ship_rates[cust_loc])
    

We can also access the dictionary to look up the shipment rate given the abbreviated name of the customer's province. When no shipment is available for a province, we can display a message instead of raising an error. The `.get()` method is useful in this case: it returns `None` (or a default value of our choice) when the key is not found (check [this page](https://www.w3schools.com/python/ref_dictionary_get.asp) for more information). 
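
One possible way to wrap this lookup in a function is sketched below. The function name `shipment_rate` and the wording of the messages are assumptions made for illustration; the rates are repeated from the table above so that the block runs on its own.

```python
# Shipment rates from the table above (NB and PE are not served)
ship_rates = {'AB': 10, 'BC': 15, 'MB': 12.5, 'NL': 30.5, 'NS': 25,
              'ON': 8, 'QC': 8, 'SK': 16, 'YT': 18.5}

def shipment_rate(province_abv):
    """Return the shipment rate for a province, or None if shipment is unavailable."""
    return ship_rates.get(province_abv)   # .get() returns None for a missing key

for cust_loc in ['QC', 'PE']:
    rate = shipment_rate(cust_loc)
    if rate is None:
        print("Shipment is not available for a customer in", cust_loc)
    else:
        print("The shipment rate for a customer in", cust_loc, "is:", rate)

# .get() also accepts a default value to return when the key is missing
print(ship_rates.get('NB', 'not available'))
```

Returning `None` leaves the choice of message to the caller, while the second argument of `.get()` is convenient when a fixed fallback value is enough.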

In [None]:
cust_loc = 'QC'
print("The shipment rate for a customer in", cust_loc," is: ", ship_rates.get(cust_loc))
cust_loc = 'PE'
print("The shipment rate for a customer in", cust_loc," is: ", ship_rates.get(cust_loc))