<a href="https://colab.research.google.com/github/cbedart/CBPPS/blob/2024/CBPPS_part5_dictionaries_modules.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**<h1><center>Part 4 - Dictionaries and modules</center></h1>**

---

# **➤ Dictionaries**

### **General overview of dictionaries**

- Dictionary = Mutable collection, indexed by keys instead of positions (insertion order is preserved since Python 3.7)
- **Very similar to lists**, but written with `{` and `}`, with a colon linking each key to its value
- Collection of key-value pairs, in the form `"key": "value"`
- `Keys` = Unique and immutable (strings, numbers, tuples)
- `Values` = Can be any type of data
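
The rule that keys must be immutable can be checked directly. Below is a minimal sketch (the `plate_wells` and `bad_key` names are made up for the illustration): a tuple is accepted as a key, while a list is rejected with a `TypeError`.

```python
# Strings, numbers and tuples are immutable, so they can be used as keys
plate_wells = {("A", 1): "control", ("A", 2): "sample"}
print(plate_wells[("A", 2)])

# A list is mutable, so Python refuses to use it as a key
bad_key = ["A", 3]
try:
    plate_wells[bad_key] = "blank"
except TypeError as error:
    print("TypeError:", error)
```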


In [None]:
apple_dict = {"name":"apple", "color":"red", "taste":"sweet", "count":5}

print(apple_dict)

- You can retrieve the value associated with a given key using a syntax similar to the one used for lists: `dict["key"]`. If the key does not exist, you get a `KeyError`.
- You can also use the `.get("key")` method to avoid this `KeyError`: if the key is not found, the output is `None`. You can customise this output by passing a second argument, as in `.get("key", "Not found !")`.
- Unlike lists, adding an element uses the same syntax: `dict["new_key"] = "new value"`. If the key already exists, its value is replaced.

In [None]:
apple_dict["name"]

In [None]:
apple_dict["shape"]

In [None]:
print(apple_dict.get("shape"))

In [None]:
print(apple_dict.get("shape", "Not found !"))

In [None]:
apple_dict["type"] = "fruit"

print(apple_dict)
print(apple_dict["type"])

<br />

- You can check if a key exists using the `in` operator, which gives a `True` or `False` boolean

In [None]:
if "type" in apple_dict:
    print(apple_dict["type"])
else:
    print("Not found !")

<br />

- Dictionaries are very useful for counting different items, because all the counters are grouped in the same place

In [None]:
count_dicts = {"Actives":0, "Inactives":0, "Missing":0}

count_dicts["Actives"] += 3
count_dicts["Inactives"] += 8
count_dicts["Missing"] += 8

print(count_dicts)
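
When the categories are not known in advance, the counters can be created on the fly. This is only a sketch, using a made-up `results` list, that combines a loop with `.get()` and a default of 0:

```python
# Hypothetical list of screening results to count
results = ["Actives", "Inactives", "Actives", "Missing", "Inactives", "Actives"]

result_counts = {}
for label in results:
    # .get() returns 0 the first time a label is seen, so no KeyError is raised
    result_counts[label] = result_counts.get(label, 0) + 1

print(result_counts)
```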

<br />

- And you can get the number of key-value pairs in a dictionary by using the `len()` function, as with lists

In [None]:
len(count_dicts)

<br />

---   

### **List of dictionaries, dictionary of lists, and dictionary of dictionaries**

Since a list and a dictionary can contain all types of data, we can combine them to create very organised ways of storing data


- If several elements share the same dictionary format, we can create a list of dictionaries, with one dictionary per element.
- We can access the element in the list with square brackets and the index number, then the value in the dictionary using square brackets and a key, as `list[index]["key"]`
- You can add a new element to the list using the concatenation methods for lists, for example `+` or `.append()`.

In [None]:
fruits_list = [{"name":"apple", "color":"red", "taste":"sweet", "count":5},
               {"name":"orange", "color":"orange", "taste":"sour", "count":10},
               {"name":"banana", "color":"yellow", "taste":"sweet", "count":3}]

print(fruits_list)

In [None]:
# If we want to get the name of the second fruit (so index 1):
fruits_list[1]["name"]

In [None]:
fruits_list.append({"name":"pear", "color":"green", "taste":"sweet", "count":7})
print(fruits_list)

<br /><br />

- If several elements share the same dictionary format, we can also do the opposite: create a dictionary of lists, with one list per key.
- We can access a list in the dictionary with square brackets and a key, then the value in that list using an index number, as `dict["key"][index]`
- New elements can be added by retrieving each list in the dictionary and using the list concatenation methods, such as `+` or `.append()`. Every list must be updated so that they all keep the same length.

In [None]:
fruits_dict = {
    "name":["apple", "orange", "banana"],
    "color":["red", "orange", "yellow"],
    "taste":["sweet", "sour", "sweet"],
    "count":[5, 10, 3]
}

In [None]:
# If we want to get the name of the second fruit (so index 1):
fruits_dict["name"][1]

In [None]:
fruits_dict["name"] = fruits_dict["name"] + ["pear"]
fruits_dict["color"] = fruits_dict["color"] + ["green"]
fruits_dict["taste"].append("sweet")
fruits_dict["count"].append(7)
print(fruits_dict)

<br /><br />

- You can also create nested dictionaries, which work in the same way
- This is the closest you can get to a spreadsheet using only base Python elements, with:
  - The outer dictionary as the rows
  - The inner dictionaries as the columns

In [None]:
fruits_dict_nested = {"apple":{"color":"red", "taste":"sweet", "count":5},
                      "orange":{"color":"orange", "taste":"sour", "count":10},
                      "banana":{"color":"yellow", "taste":"sweet", "count":3}}

fruits_dict_nested["apple"]["count"]
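
To read such a "spreadsheet" row by row, you can loop over the outer dictionary. A short sketch under that reading (looping directly over a dictionary gives its keys; `total_count` is a name introduced here for the illustration):

```python
fruits_dict_nested = {"apple":{"color":"red", "taste":"sweet", "count":5},
                      "orange":{"color":"orange", "taste":"sour", "count":10},
                      "banana":{"color":"yellow", "taste":"sweet", "count":3}}

total_count = 0
for fruit_name in fruits_dict_nested:
    # fruit_name is the row label, fruit_info is the inner dictionary (the columns)
    fruit_info = fruits_dict_nested[fruit_name]
    print(fruit_name, fruit_info["color"], fruit_info["count"])
    total_count += fruit_info["count"]

print(total_count)
```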

<br />

---   

### **Manipulating dictionaries using methods**

A whole set of methods can be used to manipulate dictionaries with precision.

#### **`.keys()` and `.values()` methods**

- As expected, the `.keys()` and `.values()` methods return the keys and values of a dictionary
- These elements are of type `dict_keys` and `dict_values`, which are special view objects. They are **not indexable** (you can't get an element by index: `dict.keys()[0]` raises a `TypeError`) ...
  - ... BUT you can iterate over them in a loop (like the `range()` function)
  - ... BUT you can convert them into a list using the `list()` function (like the `range()` function, again)


In [None]:
apple_dict = {"name":"apple", "color":"red", "taste":"sweet", "count":5}

print(apple_dict.keys())
print()
print(apple_dict.values())

In [None]:
for key in apple_dict.keys():
    print(key)

In [None]:
apple_dict.keys()[0]

In [None]:
list(apple_dict.keys())[0]

<br />

---   

#### **`.items()` method**

- The `.items()` method returns a `dict_items` object
- Again, not indexable but iterable
- Like `enumerate()`, this method yields two values at each iteration, as `(key, value)` tuples, which you can unpack into two loop variables

In [None]:
apple_dict = {"name":"apple", "color":"red", "taste":"sweet", "count":5}

print(apple_dict.items())

In [None]:
for key, value in apple_dict.items():
    print(key, value)

<br />

---   

#### **`.pop()` method**

- You can remove a key-value pair using its key with the `.pop()` method, which also returns the removed value:

In [None]:
apple_dict = {"name":"apple", "color":"red", "taste":"sweet", "count":5}

print(apple_dict)

apple_dict.pop("count")

print(apple_dict)
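
As a complement, the sketch below keeps the value returned by `.pop()` and shows its optional second argument, which is returned instead of raising a `KeyError` when the key is missing (the `removed_count` and `removed_shape` names are illustrative):

```python
apple_dict = {"name":"apple", "color":"red", "taste":"sweet", "count":5}

# .pop() returns the value that was removed
removed_count = apple_dict.pop("count")
print(removed_count)

# With a second argument, a missing key gives this default instead of a KeyError
removed_shape = apple_dict.pop("shape", None)
print(removed_shape)

print(apple_dict)
```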

<br />

---   

# **➤ Modules?**

- A module in Python is a file containing Python code that defines functions, classes, and variables
- Can be reused in other Python programs
- Modules allow developers to organize code logically and efficiently, enabling code reuse and simplifying debugging
- By using modules, you can access built-in Python functionality or extend the language with custom modules.
- Modules can be:
  - **Built-in modules**: Pre-installed with Python (os, sys, math, etc.).
  - **Third-party modules**: Available for installation via package managers like pip.
  - **User-defined modules**: Created by writing your own Python scripts.
- To import and use a module:
  - `import module_name` statement to load the full module and its components
  - `from module_name import component` to only load a specific component of the module
- A component can be a function, a group of functions, classes, new global variables, etc.
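
To make the idea of a user-defined module concrete, here is a minimal sketch. The module name `dose_tools`, its content, and the temporary folder are all assumptions made for the demonstration; in practice you would simply save a `.py` file next to your script or notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny hypothetical module, dose_tools.py, into a temporary folder
module_dir = Path(tempfile.mkdtemp())
(module_dir / "dose_tools.py").write_text(
    "MG_PER_G = 1000\n"
    "\n"
    "def grams_to_mg(grams):\n"
    "    return grams * MG_PER_G\n"
)

# Python looks for modules in the folders listed in sys.path
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# Both import forms work on our own file, exactly as for math
import dose_tools
from dose_tools import grams_to_mg

print(dose_tools.MG_PER_G)
print(grams_to_mg(0.5))
```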

<br />

**Good practices:**
- Import modules at the very beginning of your Python `.py` script, or in the first code block of a Jupyter notebook
- Import one module per line
- Import only the modules you really need, to avoid wasting loading time and memory. When you only need a few components, you can import just those with `from ... import ...` to keep your code short; note that Python still loads the whole module behind the scenes.
- Group imports logically
- Use aliases for modules with long names or commonly used ones to improve code readability

In [None]:
import math

print(math.sqrt(16))
print(math.sin(12))

In [None]:
from math import sqrt

print(sqrt(16))

- You can also define an alias, a short name, for your module using `as`

In [None]:
import math as mt

print(mt.sqrt(16))

- Almost all built-in and third-party modules include help that you can access at any time using the `help()` function with the name of the module
- This lists all the components that can be used from this module

In [None]:
help(math)

- You can also see all the functions or variables associated with a module object using the `dir()` function.

In [None]:
dir(math)

### **Some common modules**

There are a number of modules that you are likely to use if you program in Python. Here is a non-exhaustive list:

**➤ `math` module** = Basic mathematical functions and constants (sin, cos, exp, pi, e, ...)

**➤ `statistics` module** = Basic statistical functions for numeric data analysis, used in much the same way as the `math` module

In [None]:
import math

print(math.cos(12)) # Cosine (the angle is in radians)
print(math.sin(12)) # Sine (the angle is in radians)
print(math.exp(12)) # Exponential
print(math.sqrt(12)) # Square-root
print(math.pi) # Pi value


In [None]:
import statistics

print(statistics.mean([1,2,3,4,5])) # The mean of the list
print(statistics.median([1,2,3,4,5])) # The median of the list
print(statistics.mode([1,2,3,4,5,5])) # The most common value
print(statistics.stdev([1,2,3,4,5])) # The standard deviation of the list

**➤ `os` module** = Operating System interface, essential for tasks like file and directory management, environment variable handling, and process control

In [None]:
import os

# Access an environment variable from Linux, Windows, etc.
print(os.getenv('PATH'))


# Create or remove a directory
os.mkdir('new_dir')
os.rmdir('new_dir')

# Check the current working directory:
print(os.getcwd())

# Change the current working directory
# A really good use of the os module is to setup your working directory at the very
# start of your script, so every single file you will create will be located in
# this specific directory
os.chdir("/content")



**➤ `sys` module** = System-specific parameters and functions, that gives access to system-level information and allows interaction with the Python runtime environment

In [None]:
import sys

# Command-line argument
print(sys.argv)

# sys.exit() exits the script, but in Colab it will shut down your working environment, so never use it there

# Check python version
print(sys.version)

**➤ `random` module** = Random number generation, used for generating random numbers and performing random selections

In [None]:
import random

# Random float between 0 and 1
print(random.random())

# Random integer between 0 and 10 (both 0 and 10 can be drawn)
print(random.randint(0, 10))

# Random item from a list
test_list = [1,2,3,4]
print(random.choice(test_list))

# Shuffle a list in place
random.shuffle(test_list)
print(test_list)
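
The `random` module can also draw several distinct items at once with `random.sample()`. A brief sketch, where the `lottery_numbers` list and the seed value are arbitrary choices for the illustration:

```python
import random

# Optional: fixing the seed makes the draw reproducible from one run to the next
random.seed(42)

lottery_numbers = list(range(1, 50))

# Draw 6 items without replacement: the same position is never picked twice
drawn_numbers = random.sample(lottery_numbers, 6)
print(drawn_numbers)
```

Unlike `random.shuffle()`, `random.sample()` returns a new list and leaves the original list unchanged.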

**➤ `datetime` module** = Date and time manipulation, essential for working with dates and times in Python

In [None]:
import datetime

# Get the current date and time
today = datetime.datetime.now()
print(today)

# Get the date and time X days from now (here, 7 days)
future_date = today + datetime.timedelta(days=7)
print(future_date)

# String format a datetime object using strftime (as "string-f time") and format codes
print(today.strftime("%A the %d of %B, and the time is %I:%M %p, or %H:%M."))
print(today.strftime("We are the week number %W, the day number %j of the year %Y"))

**➤ `pathlib` module** = File and directory management
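
A small sketch of what `pathlib` looks like in use. The folder and file names are made up, and a temporary folder is used so that nothing is left behind:

```python
import tempfile
from pathlib import Path

# A temporary folder is used here so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_folder:
    results_dir = Path(tmp_folder) / "results"   # "/" joins path components
    results_dir.mkdir()

    report_path = results_dir / "report.txt"
    report_path.write_text("3 actives\n")

    report_name = report_path.name        # file name with its extension
    report_suffix = report_path.suffix    # extension only
    report_exists = report_path.exists()  # True while the file is there
    report_content = report_path.read_text()

print(report_name, report_suffix, report_exists)
print(report_content)
```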

**➤ `urllib` module** = Retrieve data from the Internet using Python

**➤ `tkinter` module** = Python interface to Tk to create graphical user interfaces (named `Tkinter` in Python 2)

**➤ `re` module** = Management of regular expressions
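
As a quick illustration of `re`, the sketch below extracts numbers from a made-up `prescription` string; the patterns are just examples:

```python
import re

prescription = "Take 500 mg twice a day for 10 days"

# \d+ matches one or more digits; findall returns every match as a string
numbers = re.findall(r"\d+", prescription)
print(numbers)

# re.search returns a match object, or None if nothing matches
dose_match = re.search(r"(\d+) mg", prescription)
if dose_match:
    # group(1) is the part captured by the first pair of parentheses
    dose = dose_match.group(1)
    print(dose)
```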

**➤ `json` module** = To work using JSON data, commonly used in APIs and web applications
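
JSON maps naturally onto Python dictionaries. A minimal sketch of the round trip between a dictionary and its JSON text (the variable names are illustrative):

```python
import json

apple_dict = {"name": "apple", "color": "red", "count": 5}

# Dictionary -> JSON text (a plain string that can be saved or sent)
json_text = json.dumps(apple_dict)
print(json_text)

# JSON text -> dictionary
restored_dict = json.loads(json_text)
print(restored_dict == apple_dict)
```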

**➤ `collections` module** = Advanced data structures
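
One example from `collections` is `Counter`, a dictionary specialised for counting. A short sketch with a made-up `tastes` list:

```python
from collections import Counter

tastes = ["sweet", "sour", "sweet", "bitter", "sweet"]

# Counter builds the "item: number of occurrences" dictionary for you
taste_counts = Counter(tastes)
print(taste_counts["sweet"])

# most_common(1) gives a list with the single most frequent (item, count) pair
top_taste = taste_counts.most_common(1)
print(top_taste)
```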

**➤ `itertools` module** = Tools for creating iterators to perform efficient looping
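
Two commonly used `itertools` functions are `combinations()` and `product()`. The sketch below uses made-up lists to show what each one generates:

```python
from itertools import combinations, product

drugs = ["aspirin", "ibuprofen", "paracetamol"]

# All unordered pairs of drugs, each pair appearing once
drug_pairs = list(combinations(drugs, 2))
print(drug_pairs)

# Every (dose, time) combination, like two nested for loops
dose_grid = list(product(["low", "high"], ["morning", "evening"]))
print(dose_grid)
```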


<br />

---   

# **➤ Exercises:**


**<u>Exercise 1:</u>**
1. Create a dictionary to store information about a drug, with "name", "dose_mg", "side_effects", and "type" keys. You need to use a list as the value for "side_effects".
2. Add a new key "price" with a value.
3. Update the "side_effects" key to add 2 additional side effects.
4. Remove the "price" key.

In [None]:
# Exercise 1 - #1



In [None]:
# Exercise 1 - #2



In [None]:
# Exercise 1 - #3



In [None]:
# Exercise 1 - #4



**<u>Exercise 2:</u>** As a reminder, you can also nest dictionaries in dictionaries to act as spreadsheets. You have a dictionary of hospitals, `hospitals_dict`, that includes the number of beds in each one, total and available.
1. Print, using an f-string, the number of available beds in the "Roger Salengro" hospital.
2. Add the information for the "Jeanne De Flandre" hospital, with 12 total beds and 4 available.
3. Update the available beds in "Claude Huriez" from 2 to 7.
4. Calculate how many total and available beds there are in all the hospitals (hint = loop).

In [None]:
# Exercise 2 - #1
hospitals_dict = {
    "Claude Huriez": {"total_beds": 10, "available_beds": 2},
    "Roger Salengro": {"total_beds": 8, "available_beds": 5},
    "Oscar Lambret": {"total_beds": 17, "available_beds": 9}
}



In [None]:
# Exercise 2 - #2



In [None]:
# Exercise 2 - #3



In [None]:
# Exercise 2 - #4



**<u>Exercise 3 - The belote hand counter:</u>** In a game of belote with the sans-atout variant, each card is worth a certain number of points, whatever its suit (clubs, diamonds, hearts, spades). A dictionary can be used to find the correspondence between each card and its number of points:
`dict_belote_points = {"7": 0, "8": 0, "9": 0, "10":10, "V": 2, "D": 3, "R": 4, "A": 11}`

Belote is played with a deck of 32 cards, which can be represented by a list:

`card_game = ["7", "8", "9", "10", "V", "D", "R", "A"] * 4`

<br />

Create a Python function that draws X random cards without replacement (using the `random.sample()` function, whose second argument sets how many random elements you get) and counts the total number of points in the hand. For example, drawing 8 cards should give an output like:


```
Your hand is ["7", "A", "9", "A", "R", "10", "V", "D"].
The total number of points in the hand is 41.
```



In [None]:
dict_belote_points = {"7": 0, "8": 0, "9": 0, "10":10, "V": 2, "D": 3, "R": 4, "A": 11}
card_game = ["7", "8", "9", "10", "V", "D", "R", "A"] * 4

####################################







**<u>Exercise 4 - Use case in data analysis:</u>** You are going to study some data from IMDb, a website that lists movies and series and lets users rate them.

- By launching the first block below, you will download the file `IMDb_dataset.tsv` under the path `/content/imdb/IMDb_dataset.tsv` (you can click on this link after running the block to see the file in a new window on the right).
- This is a `tab-separated value` text file, where all the fields in a line are separated by a tab `\t`.
- All movies are considered to be longer than 60 minutes, and all series shorter than 60 minutes.

<br />

From the data in this file:

1. Read the file, and transform it into a more suitable format, using what you've seen so far, so you can use it easily.
2. What is the average score for all the movies, and for all the series? You can use the `statistics.mean()` function.
3. Using the "certificate" data, and using a dictionary, what is the most common recommended audience category across all these productions? (hint = `if key in dict: += 1` `else: add the key`)
4. How many productions were created in 1998? In 2002? In 2015? (hint = `a="(2008–2013)"` ; `a[1:5] → "2008"`)
5. Calculate a new variable dividing the number of votes by the rating awarded. Which production is the most successful, with the highest score in relation to the number of votes? And the least successful?
6. Search your data for information on "Arcane", the best series ever created, by displaying its score and the number of voters using `print()` and an f-string. Generalize your code with a function to give the information for any row in the dataset if it exists, otherwise say that the name is not in the list.

In [None]:
##### RUN BEFORE YOUR EXERCISE #####
!wget https://github.com/cbedart/CBPPS/raw/refs/heads/2024/IMDb_dataset.tsv > /dev/null 2>&1
!mkdir imdb ; mv IMDb_dataset.tsv imdb/IMDb_dataset.tsv
####################################

In [None]:
# Exercise 4 - #1
# Read the file, and transform it into a more suitable format, using what you've seen so far, so you can use it easily.





In [None]:
# Exercise 4 - #2
# What is the average score for all the movies, and for all the series? You can use the statistics.mean() function.





In [None]:
# Exercise 4 - #3
# Using the "certificate" data, and using a dictionary, what is the most common recommended audience category across all these
# productions? (hint = if key in dict: += 1 else: add the key)





In [None]:
# Exercise 4 - #4
# How many productions were created in 1998? In 2002? In 2015? (hint = a="(2008–2013)" ; a[1:5] → "2008")





In [None]:
# Exercise 4 - #5
# Calculate a new variable dividing the number of votes by the rating awarded. Which production is the most successful, with
# the highest score in relation to the number of voters? And the least one?





In [None]:
# Exercise 4 - #6
# Search your data for information on "Arcane", the best series ever created, by displaying its score and the number of voters
# using print() and an f-string. Generalise your code to give the information for any row in the dataset if it exists, otherwise
# say that the name is not in the list.



