## Python Fundamentals for Data Engineering



What you will learn in Python Fundamentals for Data Engineering:

https://youtu.be/yDvRR8nWMNI

Goals of this Unit

The goal of this unit is to develop your Python 3 skills. Python is an essential tool for Data Engineers because much of your job is to write custom programs to process data.

After this unit, you will be able to:
- Write Python programs
- Use Python functions
- Manage programs using control flow
- Store data using Python lists
- Iterate over lists of data using loops and list comprehensions
- Manipulate strings in Python
- Map data using Python dictionaries
- Create your own data types with Python classes
- Work with modules and files
- Build Python projects that tackle real data


Two common errors that we encounter while writing Python are SyntaxError and NameError.
- SyntaxError means there is something wrong with the way your program is written — punctuation that does not belong, a command where it is not expected, or a missing parenthesis can all trigger a SyntaxError.
- A NameError occurs when the Python interpreter sees a word it does not recognize. Code that contains something that looks like a variable but was never defined will throw a NameError.
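To see both errors for yourself, the sketch below feeds small code strings to Python's built-in `compile()` and `exec()` and reports which error each one raises. The helper name `error_name` is hypothetical, made up for this illustration.

```python
def error_name(source):
    """Run a snippet of Python code and return the name of the error it raises, or None."""
    try:
        # compile() reports a SyntaxError before anything runs; exec() then runs the code
        exec(compile(source, "<example>", "exec"), {})
    except (SyntaxError, NameError) as err:
        return type(err).__name__
    return None

print(error_name('print("hello"'))   # missing ")" -> SyntaxError
print(error_name('print(total)'))    # "total" was never defined -> NameError
print(error_name('total = 1 + 1'))   # valid code -> None
```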

Plus Equals

Python offers a shorthand for updating variables. When a variable holds a number and you want to add to its current value, you can use the += (plus-equals) operator. The same operator also works on strings, where it appends text to the end of the existing string:



```
hike_caption = "What an amazing time to walk through nature!"

# Almost forgot the hashtags!
hike_caption += " #nofilter"
hike_caption += " #blessed"

```
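A small sketch showing `+=` on a number as well as on a string; the variable `steps` is made up for illustration.

```python
steps = 1000
steps += 250          # same as: steps = steps + 250
print(steps)          # 1250

hike_caption = "What an amazing time to walk through nature!"
hike_caption += " #nofilter"   # for strings, += appends to the end
print(hike_caption)   # What an amazing time to walk through nature! #nofilter
```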




### CONTROL FLOW
Boolean Expressions

Boolean Variables

Before we go any further, let’s talk a little bit about True and False. You may notice that when you type them in the code editor (with uppercase T and F), they appear in a different color than variables or strings. This is because True and False are their own special type: bool.

True and False are the only two values of the bool type, and any variable that is assigned one of these values is called a boolean variable.

Boolean variables can be created in several ways. The easiest way is to simply assign True or False to a variable:
- `set_to_true = True`
- `set_to_false = False`

You can also set a variable equal to a boolean expression.
- `bool_one = 5 != 7`
- `bool_two = 1 + 1 != 2`
- `bool_three = 3 * 3 == 9`

These variables now hold boolean values, so printing them shows the True or False result of the expression each one was assigned:
- `print(bool_one)    # True`
- `print(bool_two)    # False`
- `print(bool_three)  # True`
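The same expressions as a runnable sketch, which also checks their type and uses one as the condition of an `if` statement (the control flow this unit builds towards):

```python
bool_one = 5 != 7
bool_two = 1 + 1 != 2
bool_three = 3 * 3 == 9

print(type(bool_one))                  # <class 'bool'>
print(bool_one, bool_two, bool_three)  # True False True

# A boolean variable can be used directly as the condition of an if statement
if bool_three:
    print("3 * 3 equals 9")
```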




### ERRORS IN PYTHON
Introduction to Bugs

“First actual case of a bug being found.”

Computer scientist Grace Hopper and her colleagues found a moth in the Harvard Mark II computer, taped it into the machine’s logbook, and reported the world’s first literal computer bug. However, the term “bug,” in the sense of a technical error, dates back at least to 1878, when Thomas Edison used it.

Python refers to the mistakes within the program as errors and will point to the location where an error occurred with a ^ character. When programs throw errors that we didn’t expect to encounter, we call those errors bugs. Programmers call the process of updating the program so that it no longer produces bugs debugging.

During your programming journey, you are destined to encounter innumerable red errors. Some even say that 75% of development time is spent on debugging. But what makes a programmer successful isn’t avoiding errors, it’s knowing how to solve them. And a good place to start is understanding what they are.

In Python, there are many different ways of classifying errors, but here are some common ones:
1. SyntaxError: Errors caused by not following the proper structure (syntax) of the language.
2. NameError: Errors reported when the interpreter encounters a name it doesn't recognize, for example a variable that was never defined or was misspelled.
3. TypeError: Errors thrown when an operation is applied to an object of an inappropriate type.
4. Logic errors: Errors where the program runs but doesn't do what it was intended to do. Python raises no error message for these, so the programmer has to find them.
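A minimal sketch of the last two categories. The TypeError comes from adding a string to an integer; the logic error is a hypothetical average function with the wrong divisor, which runs without complaint but returns the wrong answer.

```python
# TypeError: + cannot combine a str and an int
try:
    label = "Total: " + 5
except TypeError as err:
    print("TypeError:", err)

# Logic error: the code runs, but the formula is wrong
def average_buggy(values):
    return sum(values) / 2   # bug: should divide by len(values)

def average_fixed(values):
    return sum(values) / len(values)

print(average_buggy([2, 4, 6]))   # 6.0  (wrong, yet no error is raised)
print(average_fixed([2, 4, 6]))   # 4.0
```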

In this mini-lesson, we will be looking at these different error messages, and you’ll get some practice by debugging them one by one!




### What is a List?
In programming, it is common to want to work with collections of data. In Python, a list is one of the many built-in data structures that allows us to work with a collection of data in sequential order.

Suppose we want to make a list of the heights of students in a class:
- Noelle is 61 inches tall
- Ava is 70 inches tall
- Sam is 67 inches tall
- Mia is 64 inches tall

In Python, we can create a variable called heights to store these integers into a list:

`heights = [61, 70, 67, 64]`

Notice that:
- A list begins and ends with square brackets ([ and ]).
- Each item (e.g., 67 or 70) is separated by a comma (`,`).
- It’s considered good practice to insert a space after each comma, but your code will run just fine if you forget the space.
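A short sketch of working with the `heights` list: its length, indexing by position, looping with `zip()` alongside the students' names, and a list comprehension (both loops and comprehensions are among this unit's goals).

```python
heights = [61, 70, 67, 64]
names = ["Noelle", "Ava", "Sam", "Mia"]

print(len(heights))    # 4
print(heights[0])      # 61  (positions start at 0)
print(heights[-1])     # 64  (negative positions count from the end)

# Loop over two lists in step with zip()
for name, height in zip(names, heights):
    print(f"{name} is {height} inches tall")

# List comprehension: keep only heights above 65 inches
tall = [h for h in heights if h > 65]
print(tall)            # [70, 67]
```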


### MODULES IN PYTHON
Modules: Python Namespaces

Notice that when we want to invoke the randint() function, we call random.randint(). This is because, by default, Python gives each imported module its own namespace. A namespace isolates the functions, classes, and variables defined in the module from the code in the file doing the importing. Your local namespace, meanwhile, is where your own code runs.

Python defaults to naming the namespace after the module being imported, but sometimes this name could be ambiguous or lengthy. Sometimes, the module’s name could also conflict with an object you have defined within your local namespace.

Fortunately, this name can be altered by aliasing using the as keyword:

`import module_name as name_you_pick_for_the_module`

Aliasing is most often done if the name of the library is long and typing the full name every time you want to use one of its functions is laborious.

You might also occasionally encounter import *. The * is known as a “wildcard” and matches anything and everything. This syntax is considered dangerous because it could pollute our local namespace. Pollution occurs when the same name could apply to two possible things. For example, if you happen to have a function floor() focused on floor tiles, using from math import * would also import a function floor() that rounds down floats.
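A sketch of namespaces, aliasing, and wildcard pollution side by side. The tile-themed `floor()` is a hypothetical function standing in for the one described above.

```python
import random
import math as m    # alias: refer to the math module as "m"

print(random.randint(1, 10))   # a random integer from 1 to 10, inclusive
print(m.floor(3.7))            # 3

def floor(tiles):
    """Hypothetical helper about floor tiles."""
    return f"{tiles} floor tiles"

print(floor(12))               # 12 floor tiles

from math import *             # wildcard import pulls math.floor into this namespace
print(floor(3.7))              # 3: our floor() has been silently replaced
```

Nothing warns you that the name was overwritten, which is why explicit imports (or an alias) are usually preferred.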

Let’s combine your knowledge of the random library with another fun library called matplotlib, which allows you to plot data in 2D.

You’ll use a new random function, random.sample(), which takes a sequence (such as a range or a list) and a number k as its arguments. It returns a new list of k unique elements chosen at random from that sequence.

```
import random

# random.sample(population, k) randomly selects k unique items from population
# General form: new_list = random.sample(population, k)
nums = [1, 2, 3, 4, 5]
sample_nums = random.sample(nums, 3)
print(sample_nums)  # e.g. [2, 5, 1]; the selection changes on every run
```




# Python Pandas for Data Engineers
After this unit, you will be able to:
- Load, modify, and analyse a DataFrame
- Transform a Series of data
- Leverage lambda expressions to write custom programs to manipulate your DataFrames
- Perform SQL-like functions on multiple DataFrames


## CREATING, LOADING, AND SELECTING DATA WITH PANDAS

### Create a DataFrame I
A DataFrame is an object that stores data as rows and columns. You can think of a DataFrame as a spreadsheet or as a SQL table. You can manually create a DataFrame or fill it with data from a CSV, an Excel spreadsheet, or a SQL query.
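A minimal sketch of three ways to build a DataFrame. The data is made up, and `io.StringIO` stands in for a real CSV file so the example runs without one.

```python
import io
import pandas as pd

# From a dictionary: keys become column names
df1 = pd.DataFrame({
    "name": ["John", "Jane"],
    "age": [23, 29],
})

# From a list of rows plus explicit column names
df2 = pd.DataFrame(
    [[1, "3 inch screw", 0.50], [2, "hammer", 3.00]],
    columns=["Product ID", "Description", "Price"],
)

# From CSV text; pd.read_csv also accepts a file path such as 'shoefly.csv'
csv_text = "id,first_name\n1,Frances\n2,Louis\n"
df3 = pd.read_csv(io.StringIO(csv_text))
print(df3.shape)   # (2, 2)
```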


### Select Columns

When we select a single column, the result is called a Series.

`<class 'pandas.core.series.Series'>`

There are two possible syntaxes for selecting all values from a column:

- Select the column as if you were selecting a value from a dictionary using a key. In our example, we would type `customers['age']` to select the ages.
- If the name of a column follows all of the rules for a variable name (doesn’t start with a number, doesn’t contain spaces or special characters, etc.), then you can select it using the following notation: `df.MySecondColumn`. In our example, we would type `customers.age`.



```
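import pandas as pd
df = pd.DataFrame({'clinic_north': [100, 51], 'clinic_south': [145, 45]})  # hypothetical data
clinic_north = df['clinic_north']  # or df.clinic_north
print(type(clinic_north))  # Output: <class 'pandas.core.series.Series'>
print(type(df))  # Output: <class 'pandas.core.frame.DataFrame'>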
```

### Selecting Multiple Columns
When you have a larger DataFrame, you might want to select just a few columns.

```
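# A list of column names inside the brackets returns a DataFrame
clinic_north_south = df[['clinic_north', 'clinic_south']]
print(type(clinic_north_south))  # Output: <class 'pandas.core.frame.DataFrame'>

# Even a single column name in double brackets returns a DataFrame, not a Series
clinic_north = df[['clinic_north']]
print(type(clinic_north))  # Output: <class 'pandas.core.frame.DataFrame'>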
```
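A runnable comparison of the selection styles, using made-up clinic visit counts: single brackets and attribute notation give the same Series, while double brackets always give a DataFrame.

```python
import pandas as pd

# Hypothetical visit counts for two clinics
df = pd.DataFrame({
    "month": ["January", "February", "March"],
    "clinic_north": [100, 51, 81],
    "clinic_south": [145, 45, 96],
})

as_series = df["clinic_north"]                 # single brackets -> Series
same_series = df.clinic_north                  # attribute notation, same column
as_frame = df[["clinic_north"]]                # double brackets -> one-column DataFrame
both = df[["clinic_north", "clinic_south"]]    # list of columns -> DataFrame

print(type(as_series).__name__, type(as_frame).__name__)   # Series DataFrame
print(both.shape)                                           # (3, 2)
```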


### Select Rows
To select a single row, use `.iloc[]` with the row's position (positions start at 0). For example, to select the row at position 2 (the third row):


```
row = df.iloc[2]
print(type(row))  # Output: <class 'pandas.core.series.Series'>
```
When we select a single row, the result is a Series (just like when we select a single column).

### Selecting Multiple Rows
You can also select multiple rows from a DataFrame.

Here are some different ways of selecting multiple rows:

1. `orders.iloc[3:7]` selects the rows at positions 3, 4, 5, and 6
   - starting at position 3
   - up to, but not including, position 7

2. `orders.iloc[:4]` selects the rows at positions 0, 1, 2, and 3
   - starting at position 0
   - up to, but not including, position 4

3. `orders.iloc[-3:]` selects the last three rows
   - starting at the third-to-last row
   - up to and including the final row


```
rows = df.iloc[3:]  # every row from position 3 to the end
print(type(rows))  # Output: <class 'pandas.core.frame.DataFrame'>
```
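A runnable sketch of the slices above on a small made-up `orders` table whose `id` values make each row's position easy to see.

```python
import pandas as pd

# Hypothetical orders table with 8 rows (ids 100 to 107)
orders = pd.DataFrame({"id": range(100, 108)})

row = orders.iloc[2]           # single row -> Series
middle = orders.iloc[3:7]      # positions 3, 4, 5, 6 -> DataFrame
first_four = orders.iloc[:4]   # positions 0 to 3
last_three = orders.iloc[-3:]  # the final three rows

print(type(row).__name__)           # Series
print(middle["id"].tolist())        # [103, 104, 105, 106]
print(first_four["id"].tolist())    # [100, 101, 102, 103]
print(last_three["id"].tolist())    # [105, 106, 107]
```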

### Select Rows with Logic I (Filter)

You can select a subset of a DataFrame by using logical statements:

```
df[df.MyColumnName == desired_column_value]
```

The comparison operators you can use include:
1. Equal to, `==`. Here, we select all rows where the customer’s age is 30:
```
df[df.age == 30]
```

2. Greater than, `>`. Here, we select all rows where the customer’s age is greater than 30:
```
df[df.age > 30]
```

3. Less than, `<`. Here, we select all rows where the customer’s age is less than 30:
```
df[df.age < 30]
```

4. Not equal to, `!=`. This snippet selects all rows where the customer’s name is not Clara Oswald:
```
df[df.name != 'Clara Oswald']
```
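The four filters above, run against a small made-up customer table. Bracket notation is used for `name` here; it works for any column name.

```python
import pandas as pd

# Hypothetical customer data
df = pd.DataFrame({
    "name": ["Martha Jones", "Rose Tyler", "Amy Pond", "Clara Oswald"],
    "age": [28, 30, 35, 27],
})

age_30 = df[df.age == 30]
over_30 = df[df.age > 30]
under_30 = df[df.age < 30]
not_clara = df[df["name"] != "Clara Oswald"]

print(age_30["name"].tolist())     # ['Rose Tyler']
print(over_30["name"].tolist())    # ['Amy Pond']
print(under_30["name"].tolist())   # ['Martha Jones', 'Clara Oswald']
print(len(not_clara))              # 3
```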






### Select Rows with Logic II (filter2)
You can also combine multiple logical statements, as long as each statement is in parentheses.
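In pandas, `|` means “or” and `&` means “and”, applied row by row. Because `&` is evaluated before `|`, add extra parentheses around any group of conditions you want evaluated together.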

```
df[(filter_1) | (filter_2) & (filter_3) ...]
```

### Select Rows with Logic III
Suppose we want to select the rows where the customer’s name is either “Martha Jones”, “Rose Tyler” or “Amy Pond”.

We could use the `isin` command to check that `df.name` is one of a list of values:

```
my_list = ['Martha Jones','Rose Tyler','Amy Pond']
df[ df.name.isin(my_list) ]
```
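A sketch combining conditions with `&` and `|`, and checking that `isin()` gives the same rows as several `==` tests joined with `|`. The parentheses matter: in Python `&` and `|` bind more tightly than comparisons like `<`, so each condition needs its own pair. The data is made up.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Martha Jones", "Rose Tyler", "Amy Pond", "Clara Oswald"],
    "age": [28, 30, 35, 27],
})

# "and": both conditions must hold
young_and_not_clara = df[(df.age < 30) & (df["name"] != "Clara Oswald")]

# "or": either condition may hold
edge_ages = df[(df.age < 28) | (df.age > 30)]

# isin() is a shorter way to write several == tests joined with |
companions = ["Martha Jones", "Rose Tyler", "Amy Pond"]
with_isin = df[df["name"].isin(companions)]
with_or = df[(df["name"] == "Martha Jones") | (df["name"] == "Rose Tyler") | (df["name"] == "Amy Pond")]

print(young_and_not_clara["name"].tolist())  # ['Martha Jones']
print(edge_ages["name"].tolist())            # ['Amy Pond', 'Clara Oswald']
print(with_isin.equals(with_or))             # True
```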

### Setting indices
`df.reset_index()`
- Returns a new DataFrame with a fresh set of consecutive indices starting at 0
- The old indices are moved into a new column called `index`

`df.reset_index(drop=True)`
- The old indices are discarded instead of being kept as a column

`df.reset_index(inplace=True)` or `df.reset_index(inplace=True, drop=True)`
- Modifies the existing DataFrame instead of returning a new one
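A sketch of when `reset_index()` is useful: after filtering, the surviving rows keep their old index labels. The shoe data is made up; `.copy()` makes the filtered result an independent DataFrame before it is modified in place.

```python
import pandas as pd

df = pd.DataFrame({"shoe_type": ["clogs", "boots", "clogs", "sandals"]})
clogs = df[df.shoe_type == "clogs"].copy()   # keeps the original labels 0 and 2
print(clogs.index.tolist())                  # [0, 2]

with_old = clogs.reset_index()               # old labels kept in a column named 'index'
print(with_old.columns.tolist())             # ['index', 'shoe_type']

clean = clogs.reset_index(drop=True)         # old labels discarded
print(clean.index.tolist())                  # [0, 1]

clogs.reset_index(inplace=True, drop=True)   # modifies clogs itself and returns None
print(clogs.index.tolist())                  # [0, 1]
```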


1. In this example, you’ll be the data analyst for ShoeFly.com, a fictional online shoe store. You’ve seen this data; now it’s your turn to work with it! Load the data from shoefly.csv into the variable orders.
```
import pandas as pd
orders = pd.read_csv('shoefly.csv')
```





2. Inspect the first 5 lines of the data.
```
print(orders.head())  # head() shows the first 5 rows by default
```


3. Your marketing department wants to send out an email blast to everyone who ordered shoes! Select all of the email addresses from the column email and save them to a variable called emails.
```
emails = orders.email
```


4. Frances Palmer claims that her order was wrong. What did Frances Palmer order? Use logic to select that row of orders and save it to the variable frances_palmer.
```
frances_palmer = orders[(orders.first_name == 'Frances' ) & (orders.last_name == 'Palmer')]
```

5. We need some customer reviews for our comfortable shoes. Select all orders for shoe_type: clogs, boots, and ballet flats and save them to the variable comfy_shoes.
```
my_list = ['clogs', 'boots', 'ballet flats']
comfy_shoes = orders[orders.shoe_type.isin(my_list)]
```

## Modifying DataFrames
- Adding columns to a DataFrame
- Using lambda functions to calculate complex quantities
- Renaming columns

### Adding a Column
Adding a column to an existing DataFrame:


```
import pandas as pd

df = pd.DataFrame([
  [1, '3 inch screw', 0.5, 0.75],
  [2, '2 inch nail', 0.10, 0.25],
  [3, 'hammer', 3.00, 5.50],
  [4, 'screwdriver', 2.50, 3.00]
  ],
  columns=['Product ID', 'Description', 'Cost to Manufacture', 'Price']
)
```

1. We can add a new column by giving a list with one value per row, i.e. the same length as the existing DataFrame.
```
df['Sold in Bulk?'] = ['Yes','Yes','No','No']
#same length as the existing DataFrame
```
2. We can add a new column that has the same value for every row in the DataFrame.
```
df['Is taxed?'] = 'Yes'
```

3. We can add a new column by performing an operation on existing columns.
```
df['Margin'] = df.Price - df['Cost to Manufacture']
```
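The three approaches as one runnable sketch on the hardware DataFrame above. It also shows what happens when a list has the wrong length: pandas raises a `ValueError` and leaves the DataFrame unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, "3 inch screw", 0.50, 0.75],
     [2, "2 inch nail", 0.10, 0.25],
     [3, "hammer", 3.00, 5.50],
     [4, "screwdriver", 2.50, 3.00]],
    columns=["Product ID", "Description", "Cost to Manufacture", "Price"],
)

df["Is taxed?"] = "Yes"                              # one value, repeated for every row
df["Margin"] = df.Price - df["Cost to Manufacture"]  # computed row by row

try:
    df["Sold in Bulk?"] = ["Yes", "No"]              # only 2 values for 4 rows
except ValueError as err:
    print("ValueError:", err)
```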

### Performing Column Operations
We can use the `apply()` method, `df.ColumnName.apply(...)`, to apply a function to every value in a particular column.

1. Apply the function lower to all names in column 'Name' in df. Assign these new names to a new column of df called 'Lowercase Name'.
```
df['Lowercase Name'] = df.Name.apply(str.lower)
```

2. Applying a Lambda to a Column

Use a lambda function together with the string method `.split()`. For example, to extract each email provider (everything after the `@`):
```
df['Email Provider'] = df.Email.apply(lambda x: x.split('@')[-1])
```
Use the lambda function get_last_name to create a new column last_name containing only each employee’s last name:
```
get_last_name = lambda full_name: full_name.split(' ')[-1]
df['last_name'] = df.name.apply(get_last_name)
```
or, with the lambda written inline:
```
df['last_name'] = df.name.apply(lambda full_name: full_name.split(' ')[-1])
```
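A self-contained sketch of column-wise `apply()` with both a built-in function and lambdas. The employee names and emails are made up, and the column is called `Name` here to match the `str.lower` example above.

```python
import pandas as pd

# Hypothetical employee data
df = pd.DataFrame({
    "Name": ["Jane Doe", "JOHN SMITH"],
    "Email": ["jane@example.com", "john@mail.example.org"],
})

df["Lowercase Name"] = df.Name.apply(str.lower)
df["Email Provider"] = df.Email.apply(lambda x: x.split("@")[-1])
df["last_name"] = df.Name.apply(lambda full_name: full_name.split(" ")[-1])
print(df)
```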

3. Applying a Lambda to a Row

If we call `apply()` on the whole DataFrame (without selecting a single column) and add the argument `axis=1`, the input to our lambda function will be an entire row, not a column.
```
df['Price with Tax'] = df.apply(lambda row: row['Price'] * 1.075 if row['Is taxed?'] == 'Yes' else row['Price'], axis=1)
```
Use the lambda function total_earned and apply to add a column total_earned to df with the total amount earned by each employee. If an employee worked for more than 40 hours, she needs to be paid overtime (1.5 times the normal hourly wage).
```
total_earned = lambda row: row['hours_worked']*row['hourly_wage'] if row['hours_worked'] <= 40 else 40*row['hourly_wage'] + (row['hours_worked'] - 40)*(row['hourly_wage'] * 1.50)
df = pd.read_csv('employees.csv')
df['total_earned'] = df.apply(total_earned,axis=1)
```
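The same overtime rule, written as a named function instead of a lambda, which is often easier to read once the logic spans several steps. The small DataFrame is a made-up stand-in for `employees.csv`.

```python
import pandas as pd

# Hypothetical stand-in for employees.csv
df = pd.DataFrame({
    "name": ["Ana", "Ben"],
    "hours_worked": [35, 45],
    "hourly_wage": [20.0, 20.0],
})

def total_earned(row):
    """Regular pay up to 40 hours, 1.5x the hourly wage for each hour beyond 40."""
    regular_hours = min(row["hours_worked"], 40)
    overtime_hours = max(row["hours_worked"] - 40, 0)
    return regular_hours * row["hourly_wage"] + overtime_hours * row["hourly_wage"] * 1.5

df["total_earned"] = df.apply(total_earned, axis=1)   # axis=1: one row at a time
print(df)   # Ana earns 700.0; Ben earns 800.0 + 5 * 30.0 = 950.0
```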

4. Renaming Columns
```
df = pd.DataFrame({ 'name': ['John', 'Jane', 'Sue', 'Fred'], 'age': [23, 29, 21, 18] })
df.columns = ['First Name', 'Age']
```
Assigning a list to `df.columns` edits the existing DataFrame df; the list must contain exactly one name per column, in order.

5. Rename individual columns by using the `.rename()` method

Pass a dictionary that maps old column names to new ones to the `columns` keyword argument.

Here’s an example:
```
df = pd.DataFrame({'name': ['John', 'Jane', 'Sue', 'Fred'], 'age': [23, 29, 21, 18]})
df.rename(columns={'name': 'First Name', 'age': 'Age'}, inplace=True)
```

- Syntax for the `columns` keyword argument:
```
{'old_column_name1': 'new_column_name1', 'old_column_name2': 'new_column_name2'}
```
- `.rename()` with only the `columns` keyword returns a new DataFrame. Using `inplace=True` edits the original DataFrame instead.
- You can rename just one column

```
{'old_column_name1': 'new_column_name1'}
```
- You can be specific about which column names are getting changed
- If you misspell one of the original column names, this command won’t fail. It just won’t change anything.
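A sketch confirming the points above: `rename()` returns a new DataFrame by default, a misspelled key is silently ignored, and `inplace=True` changes the original.

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "Jane", "Sue", "Fred"], "age": [23, 29, 21, 18]})

renamed = df.rename(columns={"name": "First Name"})     # returns a new DataFrame
print(df.columns.tolist())          # ['name', 'age']  (original unchanged)
print(renamed.columns.tolist())     # ['First Name', 'age']

unchanged = df.rename(columns={"nmae": "First Name"})   # typo: no error, no change
print(unchanged.columns.tolist())   # ['name', 'age']

df.rename(columns={"age": "Age"}, inplace=True)          # edits df itself
print(df.columns.tolist())          # ['name', 'Age']
```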

#### Question


1. Many of our customers want to buy vegan shoes (shoes made from materials that do not come from animals). Add a new column called shoe_source, which is vegan if the material is not leather and animal otherwise.
```
orders['shoe_source'] = orders.shoe_material.apply(lambda x: 'animal' if x == 'leather' else 'vegan')
print(orders.head(5))
```

2. Our marketing department wants to send out an email to each customer. Using the columns last_name and gender create a column called salutation which contains Dear Mr. <last_name> for men and Dear Ms. <last_name> for women.
```
salutation = lambda row:'Dear Ms. {}'.format(row.last_name) if row['gender'] == 'female' else 'Dear Mr. {}'.format(row.last_name)
orders['salutation'] = orders.apply(salutation,axis=1)
print(orders.head(5))
```

3. Data for all of the locations of Petal Power is in the file inventory.csv. Load the data into a DataFrame called inventory.
```
inventory = pd.read_csv('inventory.csv')
```


4. Inspect the first 10 rows of inventory.
```
print(inventory.head(10))
```



5. The first 10 rows represent data from your Staten Island location. Select these rows and save them to staten_island.
```
staten_island = inventory.iloc[:10]  # positions 0 to 9, i.e. the first 10 rows
```
or
```
staten_island = inventory[inventory.location == 'Staten Island']
```


6. A customer just emailed you asking what products are sold at your Staten Island location. Select the column product_description from staten_island and save it to the variable product_request.
```
product_request = staten_island.product_description
```


7. Another customer emails to ask what types of seeds are sold at the Brooklyn location. Select all rows where location is equal to Brooklyn and product_type is equal to seeds and save them to the variable seed_request.
```
seed_request = inventory[(inventory.location == 'Brooklyn') & (inventory.product_type == 'seeds')]
```




8. Add a column to inventory called in_stock which is True if quantity is greater than 0 and False if quantity equals 0.
```
inventory['in_stock'] = inventory.quantity.apply(lambda x: False if x == 0 else True)  # real booleans, not strings
```


9. Petal Power wants to know how valuable their current inventory is. Create a column called total_value that is equal to price multiplied by quantity.
```
inventory['total_value'] = inventory.apply(lambda row: row['price']*row['quantity']  , axis=1)
```


10. The Marketing department wants a complete description of each product for their catalog. The following lambda function combines product_type and product_description into a single string
```
combine_lambda = lambda row: '{} - {}'.format(row.product_type, row.product_description)
```


11. Using combine_lambda, create a new column in inventory called full_description that has the complete description of each product.
```
inventory['full_description'] = inventory.apply(combine_lambda,axis=1)
```
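A runnable sketch of steps 8 to 11 on two made-up rows shaped like `inventory.csv`. For `in_stock` and `total_value` it uses vectorized column arithmetic instead of `apply()`; both give the same result, but the vectorized form is shorter and faster on large tables.

```python
import pandas as pd

# Hypothetical rows in the shape of inventory.csv
inventory = pd.DataFrame({
    "location": ["Staten Island", "Brooklyn"],
    "product_type": ["seeds", "seeds"],
    "product_description": ["daisy", "calla lily"],
    "quantity": [4, 0],
    "price": [6.99, 19.99],
})

inventory["in_stock"] = inventory.quantity > 0                   # booleans
inventory["total_value"] = inventory.price * inventory.quantity  # element-wise product
combine_lambda = lambda row: "{} - {}".format(row.product_type, row.product_description)
inventory["full_description"] = inventory.apply(combine_lambda, axis=1)
print(inventory[["in_stock", "total_value", "full_description"]])
```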


## AGGREGATES IN PANDAS

### Calculating Column Statistics
- Aggregate functions summarise many data points (i.e., a column of a dataframe) into a smaller set of values.
- The general syntax for these calculations is:

```
df.column_name.command()
```


```
Command |Description
--------+----------------------------------
mean    |Average of all values in column
--------+----------------------------------
std     |Standard deviation
--------+----------------------------------
median  |Median
--------+----------------------------------
max     |Maximum value in column
--------+----------------------------------
min     |Minimum value in column
--------+----------------------------------
count   |Number of values in column
--------+----------------------------------
nunique |Number of unique values in column
--------+----------------------------------
unique  |List of unique values in column


```
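The commands from the table applied to one made-up price column. The missing value (`None`) shows that these statistics skip nulls: `count()` reports 5 even though the column has 6 rows.

```python
import pandas as pd

# Hypothetical shoe prices, one of them missing
orders = pd.DataFrame({"price": [20, 30, 30, 40, 80, None]})

print(orders.price.mean())      # 40.0
print(orders.price.median())    # 30.0
print(orders.price.max())       # 80.0
print(orders.price.min())       # 20.0
print(orders.price.count())     # 5  (non-missing values only)
print(len(orders.price))        # 6
print(orders.price.nunique())   # 4
print(orders.price.unique())    # distinct values, including the missing one
print(orders.price.std())       # sample standard deviation
```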




### Calculating Aggregate Functions
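Attribute notation and bracket notation select the same column, so `df.Location` is equivalent to `df['Location']`. A column whose name contains spaces, such as `Total Sales`, can only be selected with bracket notation:
```
df.groupby(['Location', 'Day of Week'])['Total Sales']
```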


#### Part 1 (groupby())
When we have a bunch of data, we often want to calculate aggregate statistics (mean, standard deviation, median, percentiles, etc.) over certain subsets of the data.
- Selecting one column after `groupby()` and applying an aggregate function returns a Series, not a DataFrame

In general, we use the following syntax to calculate aggregates:
```
PandasCoreSeries = df.groupby('column1').column2.measurement()
```
where:
- column1 is the column that we want to group by ('student' in our example)
- column2 is the column that we want to perform a measurement on (grade in our example)
- measurement is the measurement function we want to apply (mean in our example)

Example 1
```
grades = df.groupby('student').grade.mean()
```
Output:
```
student |grade
--------+-------
Amy     |80
--------+-------
Bob     |90
--------+-------
Chris   |75
```
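A runnable version of Example 1. The individual grades are hypothetical, chosen so that each student's mean matches the output table above.

```python
import pandas as pd

# Hypothetical grades chosen so the means match the output above
df = pd.DataFrame({
    "student": ["Amy", "Amy", "Bob", "Bob", "Chris", "Chris"],
    "grade": [75, 85, 90, 90, 70, 80],
})

grades = df.groupby("student").grade.mean()
print(type(grades).__name__)                          # Series
print(grades["Amy"], grades["Bob"], grades["Chris"])  # 80.0 90.0 75.0
```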


Example 2
```
pricey_shoes = orders.groupby('shoe_type').price.max()
print(type(pricey_shoes)) # <class 'pandas.core.series.Series'>
```



#### Part 2 (reset_index())
Use `reset_index()`:
- This will transform our Series into a DataFrame
- move the indices into their own column.

You'll usually see a `groupby()` statement followed by `reset_index()`:
```
df_1  = df.groupby('column1').column2.measurement().reset_index()
```

Example 1:
```
teas_counts = teas.groupby('category').id.count().reset_index()
```
output:

```
  |category |id
--+---------+----
0 |black    |3
--+---------+----
1 |green    |4
--+---------+----
2 |herbal   |8
--+---------+----
3 |white    |2
--+---------+----
```
We can then rename the `id` column to something more descriptive:
```
teas_counts = teas_counts.rename(columns={"id": "counts"})
```
Output:

```
  |category |counts
--+---------+-------
0 |black    |3
--+---------+-------
1 |green    |4
--+---------+-------
2 |herbal   |8
--+---------+-------
3 |white    |2
--+---------+-------
```

Example 2
```
pricey_shoes = orders.groupby('shoe_type').price.max().reset_index()
print(type(pricey_shoes)) #<class 'pandas.core.frame.DataFrame'>
```
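The `groupby()` → `reset_index()` → `rename()` pattern end to end, on a small made-up tea catalogue. Note that `groupby()` sorts the groups alphabetically by default.

```python
import pandas as pd

# Hypothetical tea catalogue
teas = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["black", "green", "green", "herbal", "black"],
})

teas_counts = teas.groupby("category").id.count().reset_index()
teas_counts = teas_counts.rename(columns={"id": "counts"})
print(type(teas_counts).__name__)   # DataFrame
print(teas_counts)
```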




#### Part 3 (apply() and lambda)

Example1:  

Calculating percentiles. Suppose we have a DataFrame of employee information called df that has the following columns:

- id: the employee’s id number
- name: the employee’s name
- wage: the employee’s hourly wage
- category: the type of work that the employee does


```
id    |name          |wage |category
------+--------------+-----+---------
10131 |Sarah Carney  |39   |product
------+--------------+-----+---------
14189 |Heather Carey |17   |design
```
To calculate the 75th percentile value for each category:
- i.e., the point at which 75% of employees have a lower wage and 25% have a higher wage
- we can use the following combination of apply and a lambda function
```
import numpy as np

# np.percentile can calculate any percentile over an array of values
high_earners = df.groupby('category').wage.apply(lambda x: np.percentile(x, 75)).reset_index()
```
Output:


```
  |category  |wage
--+----------+-----
0 |design    |23
--+----------+-----
1 |marketing |35
--+----------+-----
2 |product   |48
```
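A runnable sketch of the percentile calculation on made-up wages. It also shows pandas' built-in `quantile()` on the grouped column, which uses the same linear interpolation as `np.percentile` by default and gives the same numbers without a lambda.

```python
import numpy as np
import pandas as pd

# Hypothetical wages for two categories
df = pd.DataFrame({
    "category": ["design", "design", "design", "design", "product", "product"],
    "wage": [10, 20, 30, 40, 40, 60],
})

high_earners = df.groupby("category").wage.apply(lambda x: np.percentile(x, 75)).reset_index()
print(high_earners)   # design 32.5, product 55.0

# Equivalent without a lambda
alt = df.groupby("category").wage.quantile(0.75)
```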



#### Part 4 (groupby(['Column1', 'Column2']))
- passing a list of column names into the groupby() method.

Example1:
```
df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
```
Output:
```
Location     |Day of Week |Total Sales
-------------+------------+-------------
Chelsea      |M           |402.50
-------------+------------+-------------
Chelsea      |Tu          |422.75
-------------+------------+-------------
Chelsea      |W           |452.00
-------------+------------+-------------
…            |…           |…
-------------+------------+-------------
West Village |M           |390
-------------+------------+-------------
West Village |Tu          |400
```
Example2:

At ShoeFly.com, our Purchasing team thinks that certain `shoe_type`/`shoe_color` combinations are particularly popular this year (for example, blue ballet flats are all the rage in Paris).

Create a DataFrame with the total number of shoes of each `shoe_type`/`shoe_color` combination purchased. Save it to the variable `shoe_counts`.

You should be able to do this using `groupby()` and `count()`.

Remember to use `reset_index()` at the end of your code!

Note: When we’re using count(),
- it doesn’t really matter which column we perform the calculation on, as long as that column has no missing values (count() skips nulls).
- You should use id in this example, but we would get the same answer if we used shoe_type or last_name.

```
shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).first_name.count().reset_index()
```
Or
```
shoe_counts = orders.groupby(['shoe_type', 'shoe_color']).id.count().reset_index()
```
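A runnable sketch of grouping by two columns, using a few made-up orders. Each row of the result is one `shoe_type`/`shoe_color` combination.

```python
import pandas as pd

# Hypothetical orders
orders = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "shoe_type": ["ballet flats", "ballet flats", "boots", "ballet flats"],
    "shoe_color": ["blue", "blue", "black", "red"],
})

shoe_counts = orders.groupby(["shoe_type", "shoe_color"]).id.count().reset_index()
print(shoe_counts)
```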

### Pivot Tables
- Output of a pivot() command is a new DataFrame,
- but the indexing tends to be “weird”, so we usually follow up with .reset_index().

In Pandas, the command for pivot is:
```
new_pivot_table = df.pivot(columns='ColumnToPivot', index='ColumnToBeRows', values='ColumnToBeValues').reset_index()
```

Pivot without `.reset_index()`:
![Pivot table output without reset_index()](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20AGGREGATES%20IN%20PANDAS%201.png)

Example 1:
We suspected that there might be different sales on different days of the week at different stores:

- so we performed a `groupby()` across two different columns (`Location` and `Day of Week`).
- The result would be easier to read if each day of the week were its own column. Reorganising a table in this way is called pivoting.
- The new table is called a pivot table.



```
Before:  

Location     |Day of Week |Total Sales
-------------+------------+-------------
Chelsea      |M           |300
-------------+------------+-------------
Chelsea      |Tu          |310
-------------+------------+-------------
Chelsea      |W           |375
-------------+------------+-------------
Chelsea      |Th          |390
-------------+------------+-------------
…            |…           |…
-------------+------------+-------------
West Village |Th          |450
-------------+------------+-------------
West Village |F           |390
-------------+------------+-------------
West Village |Sa          |250
```
```
After pivoting:


Location     |M   |Tu  |W   |Th  |F   |Sa  |Su
-------------+----+----+----+----+----+----+----
Chelsea      |300 |310 |375 |390 |300 |150 |175
-------------+----+----+----+----+----+----+----
West Village |300 |310 |400 |450 |390 |250 |200


```

```
# First use the groupby statement:
unpivoted = df.groupby(['Location', 'Day of Week'])['Total Sales'].mean().reset_index()
# Now pivot the table
pivoted = unpivoted.pivot(columns='Day of Week', index='Location', values='Total Sales')
```
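The two steps above as a self-contained sketch on a few made-up sales records (two Monday sales at Chelsea, so the `groupby()` mean has something to average). Adding `.reset_index()` after `pivot()` turns `Location` back into an ordinary column.

```python
import pandas as pd

# Hypothetical sales records
df = pd.DataFrame({
    "Location": ["Chelsea", "Chelsea", "Chelsea", "West Village", "West Village", "West Village"],
    "Day of Week": ["M", "M", "Tu", "M", "Tu", "Tu"],
    "Total Sales": [290, 310, 310, 300, 300, 320],
})

unpivoted = df.groupby(["Location", "Day of Week"])["Total Sales"].mean().reset_index()
pivoted = unpivoted.pivot(columns="Day of Week", index="Location", values="Total Sales").reset_index()
print(pivoted)
```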

### Review
This lesson introduced you to aggregates in Pandas. You learned:
- How to calculate aggregate statistics over groups of rows that share the same value, using `groupby()`.
- How to rearrange a DataFrame into a pivot table, a great way to compare data across two dimensions.

Questions
1.
Let’s examine some more data from ShoeFly.com. This time, we’ll be looking at data about user visits to the website (the same dataset that you saw in the introduction to this lesson).
The data is a DataFrame called `user_visits`. Use `print` and `head()` to examine the first few rows of the DataFrame.

```
print(user_visits.head())
```
```
  |id    |first_name |last_name |email                      |month      |utm_source
--+------+-----------+----------+---------------------------+-----------+------------
0 |10043 |Louis      |Koch      |LouisKoch43@gmail.com      |3-March    |yahoo
--+------+-----------+----------+---------------------------+-----------+------------
1 |10150 |Bruce      |Webb      |BruceWebb44@outlook.com    |3-March    |twitter
--+------+-----------+----------+---------------------------+-----------+------------
2 |10155 |Nicholas   |Hoffman   |Nicholas.Hoffman@gmail.com |2-February |google
--+------+-----------+----------+---------------------------+-----------+------------
3 |10178 |William    |Key       |William.Key@outlook.com    |3-March    |yahoo
--+------+-----------+----------+---------------------------+-----------+------------
4 |10208 |Karen      |Bass      |KB4971@gmail.com           |2-February |google
```


2.
The column `utm_source` contains information about how users got to ShoeFly’s homepage. For instance, if `utm_source` = `Facebook`, then the user came to ShoeFly by clicking on an ad on Facebook.com.
Use a `groupby()` statement to calculate how many visits came from each of the different sources. Save your answer to the variable `click_source`. Remember to use reset_index()!
```
click_source = user_visits.groupby('utm_source').id.count().reset_index()
```
3.
```
print(click_source)
```

```
  |utm_source |id
--+-----------+-----
0 |email      |462
--+-----------+-----
1 |facebook   |823
--+-----------+-----
2 |google     |543
--+-----------+-----
3 |twitter    |415
--+-----------+-----
4 |yahoo      |757
```


4.
Our Marketing department thinks that the traffic to our site has been changing over the past few months. Use groupby to calculate the number of visits to our site from each utm_source for each month. Save your answer to the variable click_source_by_month.
```
click_source_by_month = user_visits.groupby(['utm_source', 'month']).id.count().reset_index()
```
5.
The head of Marketing is complaining that this table is hard to read. Use pivot to create a pivot table where the rows are utm_source and the columns are month. Save your results to the variable click_source_by_month_pivot.

```
click_source_by_month_pivot = click_source_by_month.pivot(
                              columns='month',
                              index='utm_source',
                              values='id'
                              ).reset_index()
```


View your pivot table:
```
print(click_source_by_month_pivot)
```


A/B Testing for ShoeFly.com

Answer walkthrough: https://youtu.be/cW7B7PR03mg

1.
Examine the first few rows of ad_clicks.

```
print(ad_clicks.head(10))
```


2.
Your manager wants to know which ad platform is getting you the most views. How many views (i.e., rows of the table) came from each utm_source?


```
max_ad_platform_views = ad_clicks.groupby('utm_source').user_id.count().reset_index()
```

3.
If the column ad_click_timestamp is not null, then someone actually clicked on the ad that was displayed. Create a new column called is_click, which is True if ad_click_timestamp is not null and False otherwise.

```
ad_clicks['is_click'] = ad_clicks.ad_click_timestamp.notnull()
print(ad_clicks.head(10))
```
4.
We want to know the percent of people who clicked on ads from each utm_source. Start by grouping by utm_source and is_click and counting the number of user_id‘s in each of those groups. Save your answer to the variable clicks_by_source.

```
clicks_by_source = ad_clicks.groupby(['utm_source' , 'is_click']).user_id.count().reset_index()
```
5.
Now let’s pivot the data so that the columns are is_click (either True or False), the index is utm_source, and the values are user_id. Save your results to the variable clicks_pivot.

```
clicks_pivot = clicks_by_source.pivot(index='utm_source',  columns='is_click',  values='user_id' )
```

6.
Create a new column in clicks_pivot called percent_clicked which is equal to the percent of users who clicked on the ad from each utm_source. Was there a difference in click rates for each source?

```
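# Fraction of each source's views that led to a click (multiply by 100 for a percentage)
clicks_pivot['percent_clicked'] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])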
```
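Steps 3 to 6 as one runnable sketch on six made-up `ad_clicks` rows. A missing `ad_click_timestamp` means the ad was shown but not clicked; after pivoting on `is_click`, the columns are labelled `True` and `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical ad_clicks rows: NaN timestamp = ad shown but not clicked
ad_clicks = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "utm_source": ["google", "google", "google", "email", "email", "email"],
    "ad_click_timestamp": ["7:18", np.nan, "2:01", np.nan, np.nan, "9:30"],
})
ad_clicks["is_click"] = ad_clicks.ad_click_timestamp.notnull()

clicks_by_source = ad_clicks.groupby(["utm_source", "is_click"]).user_id.count().reset_index()
clicks_pivot = clicks_by_source.pivot(index="utm_source", columns="is_click", values="user_id")
clicks_pivot["percent_clicked"] = clicks_pivot[True] / (clicks_pivot[True] + clicks_pivot[False])
print(clicks_pivot)   # email: 1 of 3 clicked; google: 2 of 3 clicked
```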

7.
The column experimental_group tells us whether the user was shown Ad A or Ad B. Were approximately the same number of people shown both ads?

```
value1 = ad_clicks.groupby(['experimental_group']).user_id.count().reset_index()
```


8.
Using the column is_click that we defined earlier, check to see if a greater percentage of users clicked on Ad A or Ad B.

```
value2 = ad_clicks.groupby(['experimental_group', 'is_click']).user_id.count().reset_index()
value2_pivot = value2.pivot(index='experimental_group', columns='is_click', values='user_id')
value2_pivot['percent_clicked'] = value2_pivot[True] / (value2_pivot[True] + value2_pivot[False])
```

9.
The Product Manager for the A/B test thinks that the clicks might have changed by day of the week. Start by creating two DataFrames: a_clicks and b_clicks, which contain only the results for A group and B group, respectively.

```
a_clicks = ad_clicks[(ad_clicks.experimental_group == 'A')]
b_clicks = ad_clicks[(ad_clicks.experimental_group == 'B')]
```

10.
For each group (a_clicks and b_clicks), calculate the percent of users who clicked on the ad by day.

```
# Count clicks (True) and non-clicks (False) per day for group A
a_clicks_groupby = a_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
a_clicks_groupby_pivot = a_clicks_groupby.pivot(
    index='is_click',
    columns='day',
    values='user_id'
)
# Divide each day's column by that day's total so each column sums to 1
a_clicks_groupby_pivot = (a_clicks_groupby_pivot / a_clicks_groupby_pivot.sum()).round(2)

# Repeat the same steps for group B
b_clicks_groupby = b_clicks.groupby(['is_click', 'day']).user_id.count().reset_index()
b_clicks_groupby_pivot = b_clicks_groupby.pivot(
    index='is_click',
    columns='day',
    values='user_id'
)
b_clicks_groupby_pivot = (b_clicks_groupby_pivot / b_clicks_groupby_pivot.sum()).round(2)
```
11.
Compare the results for A and B. What happened over the course of the week? Do you recommend that your company use Ad A or Ad B?

```
print(a_clicks_groupby_pivot.head(10))
print(b_clicks_groupby_pivot.head(10))
```

## WORKING WITH MULTIPLE DATA FRAMES


### Introduction
To store data efficiently, we often spread related information across multiple tables. For example, an online store could organise its order data in two ways:

1. One table with all of the information. However, a lot of this information would be repeated:
- if the same customer makes multiple `orders`, their information is repeated in every order
- if the same product is ordered by multiple customers, its information is repeated too
2. Split the data into three tables:
- `orders` would contain the information necessary to describe an order: `order_id`, `customer_id`, `product_id`, `quantity`, and `timestamp`
- `products` would contain the information to describe each product: `product_id`, `product_description` and `product_price`
- `customers` would contain the information for each customer: `customer_id`, `customer_name`, `customer_address`, and `customer_phone_number`




### Inner Merge
This type of merge (where we only include matching rows) is called an inner merge. There are other types (outer, left and right merges), which are covered later in this lesson.


#### Part 1 .merge()
The .merge() method
- looks for common columns between two DataFrames
- then looks for rows where those columns’ values are the same.
- It then combines the matching rows into a single row in a new table.


```
new_df = pd.merge(df_table_1, df_table_2)
```
or
```
new_df = df_table_1.merge( df_table_2)
```
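If `df_table_1` were an `orders` table and `df_table_2` a `customers` table sharing a `customer_id` column, this would match up all of the customer information with the orders that each customer made.
The method form is especially handy when joining more than two DataFrames, because the calls can be “chained”: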
```
new_df = df_table_1.merge(df_table_2).merge(df_table_3)
```
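To see the matching in action, here is a minimal sketch with small, made-up `orders`, `customers` and `products` tables (all names and values are hypothetical). Because the key columns share names, `.merge()` finds them automatically, and the second `.merge()` is chained onto the first:

```python
import pandas as pd

# Hypothetical tables; the shared column names drive each merge
orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'customer_id': [2, 2, 3],
    'product_id': [3, 2, 1],
    'quantity': [1, 3, 1],
})
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'customer_name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
})
products = pd.DataFrame({
    'product_id': [1, 2, 3],
    'product_description': ['pen', 'notebook', 'stapler'],
})

# Inner merge on customer_id, then chain a second inner merge on product_id
orders_full = orders.merge(customers).merge(products)
print(orders_full)
```

Customer 1 has no orders, so the inner merge leaves that customer out of the result.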


### Merge on Specific Columns
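If the key columns have different names in the two DataFrames, `.merge()` cannot find a common column to match on. There are two ways to handle this.

#### Using `.rename()`

We can use `.rename()` to give the key columns a common name before the merge: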

Example1:
- we will rename the column id to customer_id, so that orders and customers have a common column for the merge.
```
orders_and_customers = orders.merge( customers.rename(columns={'id': 'customer_id'})  )
```

#### Using keywords left_on and right_on
Use the keywords left_on and right_on to specify which columns we want to perform the merge on.

In the example below, the
- “left” table is the one that comes first (orders)
- “right” table is the one that comes second (customers).

This syntax says that we should match the customer_id from orders to the id in customers.

```
new_df = pd.merge(orders,customers,left_on='customer_id',right_on='id')
```

Output:
```
id_x |customer_id |product_id |quantity |timestamp           |id_y |customer_name |address       |phone_number
-----+------------+-----------+---------+--------------------+-----+--------------+--------------+--------------
1    |2           |3          |1        |2017-01-01 00:00:00 |2    |Jane Doe      |456 Park Ave  |949-867-5309
-----+------------+-----------+---------+--------------------+-----+--------------+--------------+--------------
2    |2           |2          |3        |2017-01-01 00:00:00 |2    |Jane Doe      |456 Park Ave  |949-867-5309
-----+------------+-----------+---------+--------------------+-----+--------------+--------------+--------------
3    |3           |1          |1        |2017-01-01 00:00:00 |3    |Joe Schmo     |789 Broadway  |112-358-1321
```
With this syntax, both tables' `id` columns end up in the result. Because they would share a name, pandas renames them by adding the default suffixes `_x` and `_y`:
- `id_x` is the `id` from the first (left) table
- `id_y` is the `id` from the second (right) table

We can use the `suffixes` keyword to provide a list of suffixes to use instead of “_x” and “_y”.

For example, we could use the following code to make the suffixes reflect the table names:



```
new_df = pd.merge(orders, customers,
                  left_on='customer_id',
                  right_on='id',
                  suffixes=['_order', '_customer']
                  )
```


```
id_order |customer_id |product_id |quantity |timestamp           |id_customer |customer_name |address       |phone_number
---------+------------+-----------+---------+--------------------+------------+--------------+--------------+--------------
1        |2           |3          |1        |2017-01-01 00:00:00 |2           |Jane Doe      |456 Park Ave  |949-867-5309
---------+------------+-----------+---------+--------------------+------------+--------------+--------------+--------------
2        |2           |2          |3        |2017-01-01 00:00:00 |2           |Jane Doe      |456 Park Ave  |949-867-5309
---------+------------+-----------+---------+--------------------+------------+--------------+--------------+--------------
3        |3           |1          |1        |2017-01-01 00:00:00 |3           |Joe Schmo     |789 Broadway  |112-358-1321
```
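A small self-contained sketch of `left_on`, `right_on` and `suffixes`, using hypothetical `orders` and `customers` tables shaped like the example above (the values are made up, apart from the names that mirror the output shown):

```python
import pandas as pd

# Hypothetical tables: both have a column called 'id', but it means different things
orders = pd.DataFrame({
    'id': [1, 2, 3],
    'customer_id': [2, 2, 3],
    'quantity': [1, 3, 1],
})
customers = pd.DataFrame({
    'id': [1, 2, 3],
    'customer_name': ['John Smith', 'Jane Doe', 'Joe Schmo'],
})

# Match orders.customer_id against customers.id and label the clashing 'id' columns
new_df = pd.merge(
    orders,
    customers,
    left_on='customer_id',
    right_on='id',
    suffixes=('_order', '_customer'),
)
print(new_df.columns.tolist())
# ['id_order', 'customer_id', 'quantity', 'id_customer', 'customer_name']
```

Both key columns are kept because their names differ; drop one afterwards (for example with `new_df.drop(columns='id_customer')`) if you do not need it.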

### Mismatched Merges
When we merge two DataFrames whose rows don’t match perfectly, the default inner merge drops the unmatched rows.

### Outer Merge
An outer merge includes all rows from both tables, even if they don’t match. Any missing values are filled in with `None` or `NaN`.
```
new_df = pd.merge(df_table_1, df_table_2, how='outer')
```
or
```
new_df = df_table_1.merge( df_table_2, how='outer')
```
Output :
```
name          |email                   |phone
--------------+------------------------+------------
Sally Sparrow |sally.sparrow@gmail.com |nan
--------------+------------------------+------------
Peter Grant   |pgrant@yahoo.com        |212-345-6789
--------------+------------------------+------------
Leslie May    |leslie_may@gmail.com    |626-987-6543
--------------+------------------------+------------
Aaron Burr    |nan                     |303-456-7891
```




### Left and Right Merge

#### Left Merge

- includes all rows from the first (left) table,
- only rows from the second (right) table that match the first table.


```
new_df =pd.merge( df_table_1, df_table_2, how='left')
```
Or
```
new_df = df_table_1.merge( df_table_2, how='left')
```

Output :
```
name          |email                   |phone
--------------+------------------------+------------
Sally Sparrow |sally.sparrow@gmail.com |nan
--------------+------------------------+------------
Peter Grant   |pgrant@yahoo.com        |212-345-6789
--------------+------------------------+------------
Leslie May    |leslie_may@gmail.com    |626-987-6543
```



#### Right Merge
- includes all rows from the second (right) table,
- only rows from the first (left) table that match the second  table.
```
new_df =pd.merge( df_table_1, df_table_2, how='right')
```
Or
```
new_df = df_table_1.merge( df_table_2, how='right')
```

Output :

```
name        |email                |phone
------------+---------------------+------------
Peter Grant |pgrant@yahoo.com     |212-345-6789
------------+---------------------+------------
Leslie May  |leslie_may@gmail.com |626-987-6543
------------+---------------------+------------
Aaron Burr  |nan                  |303-456-7891
```
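One way to compare the join types side by side is to rebuild the two tables implied by the outputs above (an assumption: the first table holds `name` and `email`, the second `name` and `phone`) and count the rows each `how` value keeps. The `indicator=True` option adds a `_merge` column recording whether each row came from the left table only, the right table only, or both:

```python
import pandas as pd

# Tables reconstructed from the merge outputs above (assumed layout)
df_table_1 = pd.DataFrame({
    'name': ['Sally Sparrow', 'Peter Grant', 'Leslie May'],
    'email': ['sally.sparrow@gmail.com', 'pgrant@yahoo.com', 'leslie_may@gmail.com'],
})
df_table_2 = pd.DataFrame({
    'name': ['Peter Grant', 'Leslie May', 'Aaron Burr'],
    'phone': ['212-345-6789', '626-987-6543', '303-456-7891'],
})

# Number of rows kept by each join type
counts = {}
for how in ['inner', 'outer', 'left', 'right']:
    counts[how] = len(pd.merge(df_table_1, df_table_2, how=how))
print(counts)  # {'inner': 2, 'outer': 4, 'left': 3, 'right': 3}

# _merge tells us where each row came from: 'left_only', 'right_only' or 'both'
outer = pd.merge(df_table_1, df_table_2, how='outer', indicator=True)
print(outer[['name', '_merge']])
```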

### Concatenate df : `pd.concat()`
To reconstruct a single DataFrame from multiple smaller DataFrames, we can use the method `pd.concat([df1, df2, df3, ...])`.
This works most cleanly when all of the DataFrames have the same columns; if they don't, pandas keeps every column and fills the gaps with `NaN`.

df1
```
name           |email
---------------+--------------------------
Katja Obinger  |k.obinger@gmail.com
---------------+--------------------------
Alison Hendrix |alisonH@yahoo.com
---------------+--------------------------
Cosima Niehaus |cosi.niehaus@gmail.com
---------------+--------------------------
Rachel Duncan  |rachelduncan@hotmail.com
---------------+--------------------------
```


df2

```
name           |email
---------------+--------------------------
Jean Gray      |jgray@netscape.net
---------------+--------------------------
Scott Summers  |ssummers@gmail.com
---------------+--------------------------
Kitty Pryde    |kitkat@gmail.com
---------------+--------------------------
Charles Xavier |cxavier@hotmail.com
---------------+--------------------------
```

Solution
```
df3 = pd.concat([df1, df2])
```


Output:

```
name           |email
---------------+--------------------------
Katja Obinger  |k.obinger@gmail.com
---------------+--------------------------
Alison Hendrix |alisonH@yahoo.com
---------------+--------------------------
Cosima Niehaus |cosi.niehaus@gmail.com
---------------+--------------------------
Rachel Duncan  |rachelduncan@hotmail.com
---------------+--------------------------
Jean Gray      |jgray@netscape.net
---------------+--------------------------
Scott Summers  |ssummers@gmail.com
---------------+--------------------------
Kitty Pryde    |kitkat@gmail.com
---------------+--------------------------
Charles Xavier |cxavier@hotmail.com
---------------+--------------------------
```
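A minimal sketch using two rows from each table above. `pd.concat()` keeps each piece's original row labels, so the index repeats unless you pass `ignore_index=True`. The last lines use a hypothetical `df3` with no `email` column to show what happens when the columns differ:

```python
import pandas as pd

# Two rows from each of the tables above
df1 = pd.DataFrame({
    'name': ['Katja Obinger', 'Alison Hendrix'],
    'email': ['k.obinger@gmail.com', 'alisonH@yahoo.com'],
})
df2 = pd.DataFrame({
    'name': ['Jean Gray', 'Scott Summers'],
    'email': ['jgray@netscape.net', 'ssummers@gmail.com'],
})

# By default each piece keeps its own row labels, so 0 and 1 appear twice
stacked_default = pd.concat([df1, df2])
print(stacked_default.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked.index.tolist())  # [0, 1, 2, 3]

# Hypothetical df3 without an 'email' column: the missing cell becomes NaN
df3 = pd.DataFrame({'name': ['Kitty Pryde']})
mixed = pd.concat([df1, df3], ignore_index=True)
print(mixed)
```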



### Review
This lesson introduced some methods for combining multiple DataFrames:
- Creating a DataFrame made by matching the common columns of two DataFrames is called a .merge()
- We can specify which columns should be matched by using the keyword arguments left_on and right_on
- We can combine DataFrames whose rows don’t all match using left, right, and outer merges and the how keyword argument
- We can stack or concatenate DataFrames with the same columns using pd.concat()

# Data Wrangling and Tidying
The goal of this unit is to learn common industry tools and best practices for data wrangling and data tidying.

Goals of this Unit:
- Understand the difference between data wrangling and data tidying
- Identify and select specific data using regular expressions
- Clean datasets using pandas
- Apply Tidy Data best practices to facilitate data analysis
- Begin developing expertise in exploratory data analysis


## Introduction: Data Wrangling and Tidying


### Data Wrangling and Tidying
We often encounter data that is unstructured and/or messy.

Messy data can take a variety of forms:
- The columns are mislabeled or do not have variable names.
- The dataset contains nonsensical data.
- Variables are stored in both the columns and rows.


#### Data wrangling
- We need to clean, transform, and sometimes restructure the data before we can gain any insights.
- The goal is `Messy dataset → Tidy dataset`, where:
  - each column is a variable
  - each row is an observation
- A tidy dataset is easy to use for modelling or visualisation.



##### Preliminary data cleaning
To make the dataset cleaner and much easier to read:
1. Remove any duplicate rows with `df.drop_duplicates()`
2. Convert the column names to all lowercase
  - `df.columns = map(str.lower, df.columns)`
3. Rename specific columns
  - `df = df.rename({'oldname1': 'newname1', 'oldname2': 'newname2'}, axis=1)`
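The three cleaning steps can be tried on a tiny, made-up DataFrame (the column names and values are hypothetical). This sketch uses `df.columns.str.lower()`, a vectorised alternative to `map(str.lower, df.columns)` that produces the same lowercase names:

```python
import pandas as pd

# Hypothetical messy table: one repeated row and inconsistently cased headers
df = pd.DataFrame({
    'Name': ['Cafe A', 'Cafe A', 'Cafe B'],
    'BORO': ['Bronx', 'Bronx', 'Queens'],
    'URL': ['https://www.cafea.com', 'https://www.cafea.com', 'https://www.cafeb.com'],
})

# 1. Drop rows that are exact copies of an earlier row
df = df.drop_duplicates()

# 2. Lowercase every column name
df.columns = df.columns.str.lower()

# 3. Rename selected columns; axis=1 targets column labels rather than the row index
df = df.rename({'boro': 'borough'}, axis=1)

print(df)
```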

###### Data Types
Look at each column’s data type by appending `.dtypes` to our pandas DataFrame:
- `df.dtypes`

Count the number of unique values in each column:
- `df.nunique()`

###### Missing Data
Count the number of missing values in each column:
- `df.isna().sum()`

###### Characterising missingness with crosstab
We can characterise the missingness in the url column by counting the missing values in each borough.
- We will use the `pd.crosstab()` function in pandas.
- `crosstab()` computes a frequency table of two or more variables.
  - To check for missingness in the url column, we add `.isna()` to the column.
  - This returns a boolean for each row: `True` if the value is NaN and `False` if it is not.
- In our crosstab, we will look at all the boroughs present in our data and whether or not they have missing url links.

```
pd.crosstab(
       # tabulates the boroughs as the index
       df['boro'],
       # tabulates the number of missing values in the url column as columns
       df['url'].isna(),
       # names the rows
       rownames = ['boro'],
       # names the columns
       colnames = ['url is na'])
```


```
url is na |False |True
boro      |      |
----------+------+-----
Bronx     |1     |1
----------+------+-----
Brooklyn  |2     |4
----------+------+-----
Manhattan |11    |2
----------+------+-----
Queens    |2     |2
```
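A minimal sketch of these diagnostics on a hypothetical five-row table with a `boro` column and a partly missing `url` column (the data is made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data: three of the five url values are missing
df = pd.DataFrame({
    'boro': ['Bronx', 'Bronx', 'Queens', 'Queens', 'Queens'],
    'url': ['a.com', np.nan, 'b.com', np.nan, np.nan],
})

print(df.dtypes)        # data type of each column
print(df.nunique())     # unique values per column (NaN is not counted)
print(df.isna().sum())  # missing values per column

# Rows: boroughs; columns: whether the url is missing (False/True)
table = pd.crosstab(
    df['boro'],
    df['url'].isna(),
    rownames=['boro'],
    colnames=['url is na'],
)
print(table)
```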

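###### Removing prefixes
Remove the “https://” from the start of each URL:
```
df['url'] = df['url'].str.replace(r'^https://', '', regex=True)
```
Remove the “www.” from the start of each URL:
```
df['url'] = df['url'].str.replace(r'^www\.', '', regex=True)
```
The `^` anchors each pattern to the start of the string, so only the literal prefix is removed; `str.lstrip()` is not a safe substitute because it strips a set of characters rather than a prefix.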
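Here is a sketch with two made-up URLs comparing chained `str.lstrip()` calls with an anchored regex. `lstrip` keeps removing leading characters as long as they belong to the set it was given, so it can eat into the domain name:

```python
import pandas as pd

# Hypothetical URLs chosen to expose the problem
urls = pd.Series(['https://www.wildflower.com', 'https://shopnow.com'])

# lstrip treats its argument as a SET of characters, not a prefix
naive = urls.str.lstrip('https://').str.lstrip('www.')
print(naive.tolist())  # ['ildflower.com', 'opnow.com']

# A regex anchored with ^ removes only the literal prefix
fixed = (urls.str.replace(r'^https://', '', regex=True)
             .str.replace(r'^www\.', '', regex=True))
print(fixed.tolist())  # ['wildflower.com', 'shopnow.com']
```

On pandas 1.4 or newer, `Series.str.removeprefix('https://')` is another way to remove a literal prefix.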

#### Tidy Data
Melting is the reverse of pivoting: it turns column headers into row values.

We will use the melt() function in pandas to turn the current values (2000 and 2007) in the column headers into row values and add year and avg_annual_wage as our column labels.

![annual_wage table before melt()](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Data%20Wrangling%20and%20Tidying%201.png)

```
annual_wage=annual_wage.melt(
            # which column to use as identifier variables
            id_vars=["boro"],
            # column name to use for “variable” names/column headers (ie. 2000 and 2007)
            var_name="year",
            # column name for the values originally in the columns 2000 and 2007
            value_name="avg_annual_wage"
            )
```
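A self-contained sketch of the same reshape, assuming a two-borough `annual_wage` table whose wage values are made up:

```python
import pandas as pd

# Hypothetical wide table: one column per year (values are made up)
annual_wage = pd.DataFrame({
    'boro': ['Bronx', 'Queens'],
    '2000': [40000, 45000],
    '2007': [46000, 52000],
})

annual_wage = annual_wage.melt(
    id_vars=['boro'],              # kept as an identifier column
    var_name='year',               # new column holding the old headers
    value_name='avg_annual_wage',  # new column holding the old cell values
)
print(annual_wage)
```

After melting, `year` holds the old column headers as strings (`'2000'`, `'2007'`); convert it with `astype(int)` if you need numbers.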

Output:

![annual_wage table after melt()](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Data%20Wrangling%20and%20Tidying%202.png)


## INTRODUCTION TO REGULAR EXPRESSIONS

RegExr Regular Expression Builder

RegExr: https://regexr.com/

In this resource, you will learn how to build regular expressions. This is helpful if you wish to experiment with regular expressions before using them professionally.

### Introduction
Nearly every piece of information you enter into a web form is validated.
- Did you enter a properly formatted email including an @ symbol?
- Did you enter a phone number 10 digits long, with or without -s and parentheses?
- Did your new password meet the requirements for inclusion (and exclusion) of symbols, digits, and both upper and lower case letters?

- The technology that fuels this verification system on nearly every website and application is the ever reliable, often quirky language of regular expressions (regex).
- A regular expression is a special sequence of characters that describes a pattern of text to be found (matched) in a string or document.
- By matching text, we can identify how often and where certain pieces of text occur, as well as have the opportunity to replace or update these pieces of text if needed.

Regular Expressions have a variety of use cases including:
- validating user input in HTML forms
- verifying and parsing text in files, code and applications
- examining test results
- finding keywords in emails and web pages



### Literals
- Literals are the simplest text we can match with regular expressions.
- With literals, our regular expression contains the exact text that we want to match.
- Regular expressions operate by moving character by character, from left to right, through a piece of text.
- When the regular expression finds a character that matches the first piece of the expression, it looks to find a continuous sequence of matching characters.

For example,
- regex `a` will match the text `a`
- regex `bananas` will match the text `bananas`
- regex `5 gibbons` will completely match the text `5 gibbons`
- regex `3` will match the `3` in the piece of text `34`
- regex `monkey` will match `monkey` in the piece of text `The monkeys like to eat bananas`.
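In Python, the standard-library `re` module applies these rules. A small sketch: `re.search()` scans left to right and returns the first match, while `re.fullmatch()` succeeds only when the pattern covers the entire text (the "completely match" idea used in these notes):

```python
import re

# re.search scans left to right and returns the first match (or None)
first_digit = re.search(r'3', '34').group()
animal = re.search(r'monkey', 'The monkeys like to eat bananas').group()
print(first_digit, animal)  # 3 monkey

# re.fullmatch succeeds only if the pattern covers the entire text
print(bool(re.fullmatch(r'5 gibbons', '5 gibbons')))  # True
print(bool(re.fullmatch(r'monkey', 'monkeys')))       # False
```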


### Alternation
Alternation, performed in regular expressions with the pipe symbol, `|`, allows us to match either the characters preceding the `|` OR the characters after the `|`.
The regex `baboons|gorillas` will :
- match `baboons` in the text `I love baboons`,

but will also
- match `gorillas` in the text `I love gorillas`.  
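A quick check of the alternation example with Python's `re` module; `re.findall()` returns every non-overlapping match in the text:

```python
import re

pattern = r'baboons|gorillas'

print(re.search(pattern, 'I love baboons').group())   # baboons
print(re.search(pattern, 'I love gorillas').group())  # gorillas
print(re.search(pattern, 'I love lemurs'))            # None

# findall collects every match, whichever side of the | it came from
both = re.findall(pattern, 'baboons and gorillas')
print(both)  # ['baboons', 'gorillas']
```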

### Character Sets
Character sets, denoted by a pair of brackets [], let us match one character from a series of characters, allowing for matches with incorrect or different spellings.

The beauty of character sets (and alternation) is that they allow our regular expressions to become more flexible and less rigid than by just matching with literals!

The regex `con[sc]en[sc]us` will:
- match `consensus`, the correct spelling of the word
- match the three incorrect spellings `concensus`, `consencus`, and `concencus`

1. The first brackets, `s` and `c`, are the possible characters that come after `con` and before `en`.
2. The second brackets, `s` and `c`, are the possible characters that come after `en` and before `us`.

Note that a character set matches only one character: the regex `[cat]` will match the characters `c`, `a`, or `t`, but not the text `cat`.

#### Negated character sets
Placed at the front of a character set, the `^` negates the set, matching any character that is not stated.
Thus the regex `[^cat]` will match any character that is ***not*** `c`, `a`, or `t`, and would completely match each character `d`, `o` or `g`.
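Trying the character-set examples with Python's `re` module (`conzensus` is an extra made-up misspelling that should be rejected):

```python
import re

spelling = r'con[sc]en[sc]us'
words = ['consensus', 'concensus', 'consencus', 'concencus', 'conzensus']
accepted = [w for w in words if re.fullmatch(spelling, w)]
print(accepted)  # ['consensus', 'concensus', 'consencus', 'concencus']

# A set matches ONE character, so [cat] finds three single-letter matches in 'cat'
print(re.findall(r'[cat]', 'cat'))   # ['c', 'a', 't']

# A negated set matches any single character that is not listed
print(re.findall(r'[^cat]', 'dog'))  # ['d', 'o', 'g']
```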


### Wild for Wildcards
Enter the wildcard `.`! Wildcards will match any single character (letter, number, symbol or whitespace) in a piece of text. They are useful when we do not care about the specific value of a character, but only that a character exists!

Let’s say we want to `match any 9-character piece of text`. The regex `.........` will completely match ***`orangutan`*** and ***`marsupial`***!

Similarly, the regex `I ate . bananas` will completely match both
- ***`I ate 3 bananas`***
- ***`I ate 8 bananas`***

What if we want to match an actual full stop ( . )? We can use the ***`escape character \ (backslash)`*** to escape the wildcard functionality of the `.`: the regex `\.` matches only an actual full stop.

The regex `Howler monkeys are really lazy\.` will completely match the text `Howler monkeys are really lazy.`
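A sketch of the wildcard and escaping examples in Python. One caveat specific to Python's `re` module: by default `.` does not match a newline character unless you pass the `re.DOTALL` flag:

```python
import re

# Nine dots match any nine characters
print(bool(re.fullmatch(r'.........', 'orangutan')))              # True
print(bool(re.fullmatch(r'I ate . bananas', 'I ate 8 bananas')))  # True

# Unescaped . accepts any character; \. accepts only a literal full stop
print(bool(re.fullmatch(r'lazy.', 'lazy!')))   # True
print(bool(re.fullmatch(r'lazy\.', 'lazy!')))  # False
print(bool(re.fullmatch(r'lazy\.', 'lazy.')))  # True

# By default . skips newlines; re.DOTALL lets it match them too
print(bool(re.fullmatch(r'a.b', 'a\nb')))             # False
print(bool(re.fullmatch(r'a.b', 'a\nb', re.DOTALL)))  # True
```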

### Ranges
Ranges allow us to specify a range of characters in which we can make a match, without having to type out each individual character:
- regex `[A-Z]` - match any `single capital` letter
- regex `[a-z]` - match any `single lowercase` letter
- regex `[0-9]` - match any `single digit`
- we can also put multiple ranges in the same character set:
  - regex `[A-Za-z]` - match any `single capital or lowercase` letter


Examples
```
regex [abc] ≡ regex [a-c]
```
The regex `I adopted [2-9] [b-h]ats` will match the texts:
- `I adopted 4 bats`
- `I adopted 8 cats`
- `I adopted 5 hats`
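Checking the range example in Python, plus one made-up sentence (`I adopted 1 rats`) that should fail:

```python
import re

pattern = r'I adopted [2-9] [b-h]ats'
texts = ['I adopted 4 bats', 'I adopted 8 cats', 'I adopted 5 hats', 'I adopted 1 rats']
results = {t: bool(re.fullmatch(pattern, t)) for t in texts}
print(results)
# 'I adopted 1 rats' fails: 1 is outside 2-9 and r is outside b-h

# Two ranges in one set: any single capital or lowercase letter
letters = re.findall(r'[A-Za-z]', 'R2-d2')
print(letters)  # ['R', 'd']
```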





### Shorthand Character Classes
Shorthand character classes represent common ranges, and they make writing regular expressions much simpler.

- word character: regex `\w` = regex `[A-Za-z0-9_]`
  - matches a single uppercase character, lowercase character, digit or underscore
- digit character: regex `\d` = regex `[0-9]`
  - matches a single digit character
- whitespace character: regex `\s` = regex `[ \t\r\n\f\v]`
  - matches a single:
    - space `[ ]`
    - tab `[\t]`
    - carriage return `[\r]`
    - line break `[\n]`
    - form feed `[\f]`
    - vertical tab `[\v]`


Example:
The regex `\d\s\w\w\w\w\w\w\w` matches:
- `\d` = a digit character,

followed by
- `\s` = a whitespace character,

followed by
- `\w\w\w\w\w\w\w` = seven word characters.

… Thus the regex completely matches `3 monkeys`.

#### Negated Shorthand Character Classes
  - non-word character regex`\W` = regex`[^A-Za-z0-9_]`
  - non-digit character regex`\D` = regex`[^0-9]`
  - non-whitespace character regex`\S` = regex`[^ \t\r\n\f\v]`  
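A Python sketch of the shorthand classes. Note that for ordinary (Unicode) strings, Python's `\w`, `\d` and `\s` also match non-ASCII letters, digits and whitespace; passing the `re.ASCII` flag restricts them to the ASCII ranges listed above:

```python
import re

print(bool(re.fullmatch(r'\d\s\w\w\w\w\w\w\w', '3 monkeys')))  # True

print(re.findall(r'\d', 'a1b22'))     # ['1', '2', '2']
print(re.findall(r'\W', 'hi, you!'))  # [',', ' ', '!']

# '\u0663' is ARABIC-INDIC DIGIT THREE: \d accepts it unless re.ASCII is set
print(bool(re.fullmatch(r'\d', '\u0663')))            # True
print(bool(re.fullmatch(r'\d', '\u0663', re.ASCII)))  # False
```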

### Grouping  
Grouping, denoted with `()`, lets us group parts of a regular expression together, and allows us to limit alternation to part of the regex.

Example:
The regex `I love (baboons|gorillas)` will match the text
  - `I love baboons`
  - `I love gorillas`

The regex `I love (baboons|gorillas)` will match the text `I love` and then match either `baboons` ***or*** `gorillas`, as the grouping limits the reach of the | to the text within the parentheses.
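The grouping example in Python, contrasted with the same pattern written without parentheses. Python's `re` module also *captures* the text matched by each group, which can be retrieved with `.group(1)`:

```python
import re

grouped = r'I love (baboons|gorillas)'
ungrouped = r'I love baboons|gorillas'  # here | splits the WHOLE pattern

print(bool(re.fullmatch(grouped, 'I love gorillas')))    # True
print(bool(re.fullmatch(ungrouped, 'I love gorillas')))  # False
print(bool(re.fullmatch(ungrouped, 'gorillas')))         # True

# Parentheses also capture the text matched inside them
captured = re.search(grouped, 'I love baboons').group(1)
print(captured)  # baboons
```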

### Quantifiers - Fixed
Fixed quantifiers, denoted with curly braces {}, let us indicate the exact quantity of a character we wish to match, or allow us to provide a quantity range to match on.
- regex`\w{6}\s\w{6}` = regex`\w\w\w\w\w\w\s\w\w\w\w\w\w`
  - match 6 word characters, followed by a whitespace character, and then followed by another 6 word characters
- regex`\w{3}` will match exactly 3 word characters
- regex`\w{4,7}` will match at minimum 4 word characters and at maximum 7 word characters
- regex `roa{3}r` will match the characters `ro` followed by `3 a’s`, and then the character `r`, such as in the text `roaaar`.
- regex `roa{3,7}r` will match the characters `ro` followed by at `least 3 a’s` and at `most 7 a’s`, followed by an `r`, matching the strings
  - `roaaar`
  - `roaaaaaar`
  - `roaaaaaaar`

An important note is that quantifiers are greedy: they match the greatest quantity of characters they possibly can.

Example, the regex `mo{2,4}`
- will match the text `moooo` in the string “moooo”.
- not return a match of `moo`, or `mooo` in the string “moooo”.

This is because the fixed quantifier wants to match the largest number of o’s as possible, which is 4 in the string “moooo”.
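A Python sketch of fixed quantifiers and greediness. As an aside, Python's `re` module also supports *lazy* quantifiers: a `?` placed after a quantifier (`{2,4}?`) makes it match as few characters as allowed:

```python
import re

print(bool(re.fullmatch(r'\w{6}\s\w{6}', 'monkey island')))  # True

# Build 'ro' + n a's + 'r' for several n and keep those with 3 to 7 a's
texts = ['ro' + 'a' * n + 'r' for n in (1, 3, 6, 8)]
roars = [t for t in texts if re.fullmatch(r'roa{3,7}r', t)]
print(roars)  # ['roaaar', 'roaaaaaar']

# Greedy: take as many o's as allowed
print(re.search(r'mo{2,4}', 'moooo').group())   # moooo
# Lazy (note the trailing ?): take as few o's as allowed
print(re.search(r'mo{2,4}?', 'moooo').group())  # moo
```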

### Quantifiers - Optional
Optional quantifiers, indicated by the question mark ?, allow us to indicate a character in a regex is optional, or can appear either 0 times or 1 time.

Example,

The regex `humou?r` matches the characters `humo`, then (u?) either …
  - 0 occurrences of the letter `u`
  - 1 occurrence of the letter `u`

  … and finally the letter `r`.

```
Note the ? only applies to the character (or parenthesised group) directly before it.
```

The regex `The monkey ate a (rotten )?banana` will completely match both:
- `The monkey ate a rotten banana`
- `The monkey ate a banana`
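The optional-quantifier examples checked with `re.fullmatch()` (`humouur` is an extra made-up string that should fail, since `u?` allows at most one `u`):

```python
import re

humour = {word: bool(re.fullmatch(r'humou?r', word)) for word in ['humor', 'humour', 'humouur']}
print(humour)  # {'humor': True, 'humour': True, 'humouur': False}

# ? after a group makes the whole group optional
pattern = r'The monkey ate a (rotten )?banana'
print(bool(re.fullmatch(pattern, 'The monkey ate a rotten banana')))  # True
print(bool(re.fullmatch(pattern, 'The monkey ate a banana')))         # True
```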


### Quantifiers - Kleene star * and Kleene plus +

#### Kleene star *
Kleene star, denoted with the asterisk `*`, matches the preceding character 0 or more times. This means that the character doesn’t need to appear, can appear once, or can appear many many times.

Example

The regex `meo*w` will match the characters `me`, followed by 0 or more `o’s`, followed by a `w`.

Thus the regex `meo*w` will match:
  - `mew`
  - `meow`
  - `meoooow`
  - `meoooooooooooooooooooow`

#### Kleene plus +
Kleene plus, denoted with the plus sign `+`, matches the preceding character 1 or more times. This means that the character can appear once, or can appear many many times.

The regex `meo+w` will match the characters `me`, followed by `1 or more o’s`, followed by a `w`.

Thus the regex `meo+w` will match:
  - `meow`
  - `meoooow`
  - `meoooooooooooooooooooow`
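Comparing `*` and `+` on the same made-up list of strings:

```python
import re

words = ['mw', 'mew', 'meow', 'meoooow']

star = [w for w in words if re.fullmatch(r'meo*w', w)]  # zero or more o's
plus = [w for w in words if re.fullmatch(r'meo+w', w)]  # one or more o's

print(star)  # ['mew', 'meow', 'meoooow']
print(plus)  # ['meow', 'meoooow']
```

`mw` fails both patterns because the `e` is not quantified and must appear exactly once.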


### Anchors
- Anchor hat `^` is used to match text at the start of a string.
- Anchor dollar sign `$` is used to match text at the end of a string.

The regex `^Monkeys: my mortal enemy$` will completely match the text `Monkeys: my mortal enemy` but will not match:
- `Spider Monkeys: my mortal enemy in the wild`
- `Squirrel Monkeys: my mortal enemy in the wild`

```
The ^ ensures that the matched text begins with Monkeys.
                      and
The $ ensures the matched text ends with enemy.
```
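Anchors in Python: `re.search()` may match anywhere in the text, so the `^` and `$` are what force the whole string to match. One Python-specific detail: with the `re.MULTILINE` flag, `^` and `$` match at the start and end of every line instead of only the whole string:

```python
import re

anchored = r'^Monkeys: my mortal enemy$'
unanchored = r'Monkeys: my mortal enemy'
longer = 'Spider Monkeys: my mortal enemy in the wild'

print(bool(re.search(anchored, 'Monkeys: my mortal enemy')))  # True
print(bool(re.search(anchored, longer)))                      # False
print(bool(re.search(unanchored, longer)))                    # True: found mid-string

# With re.MULTILINE, ^ and $ apply to each line of a multi-line string
text = 'Monkeys: my mortal enemy\nSquirrels: fine'
print(bool(re.search(anchored, text)))                # False
print(bool(re.search(anchored, text, re.MULTILINE)))  # True
```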

### Review
Do you feel those regular expression superpowers coursing through your body? Do you just want to scream ah+ really loud? Awesome! You are now ready to take these skills and use them out in the wild. Before beginning your adventures, let’s review what we’ve learned.

```
- Regular expressions are special sequences of characters that describe a pattern of text that is to be matched
- We can use literals to match the exact characters that we desire
- Alternation, using the pipe symbol |, allows us to match the text preceding or following the |
- Character sets, denoted by a pair of brackets [], let us match one character from a series of characters
- Wildcards, represented by the period or dot ., will match any single character (letter, number, symbol or whitespace)
- Ranges allow us to specify a range of characters in which we can make a match
- Shorthand character classes like \w, \d and \s represent the ranges representing word characters, digit characters, and whitespace characters, respectively
- Groupings, denoted with parentheses (), group parts of a regular expression together, and allow us to limit alternation to part of a regex
- Fixed quantifiers, represented with curly braces {}, let us indicate the exact quantity or a range of quantity of a character we wish to match
- Optional quantifiers, indicated by the question mark ?, allow us to indicate a character in a regex is optional, or can appear either 0 times or 1 time
- The Kleene star, denoted with the asterisk *, is a quantifier that matches the preceding character 0 or more times
- The Kleene plus, denoted by the plus +, matches the preceding character 1 or more times
- The anchor symbols hat ^ and dollar sign $ are used to match text at the start and end of a string, respectively
```




## HOW TO CLEAN DATA WITH PYTHON



### Introduction

A huge part of data science involves acquiring raw data and getting it into a form ready for analysis. Some have estimated that data scientists spend
- 80% of their time cleaning and manipulating data
- 20% of their time actually analysing it or building models from it.

When we receive raw data, we have to do a number of things before we’re ready to analyse it, possibly including:
- diagnosing the “tidiness” of the data — how much data cleaning we will have to do
- reshaping the data — getting right rows and columns for effective analysis
- combining multiple files
- changing the types of values — how we fix a column where numerical values are stored as strings, for example
- dropping or filling missing values - how we deal with data that is incomplete or missing
- manipulating strings to represent the data better

### Diagnose the Data
We often describe data that is easy to analyse and visualise as “tidy data”. What does it mean to have tidy data?

For data to be tidy, it must have:
- Each variable as a separate column
- Each row as a separate observation

The first step of diagnosing whether or not a dataset is tidy is using pandas functions to explore and probe the dataset.

You’ve seen most of the functions we often use to diagnose a dataset for cleaning. Some of the most useful ones are:
- `df.head()` — display the first 5 rows of the table
- `df.info()` — display a summary of the table
- `df.describe()` — display the summary statistics of the table
- `df.columns` — display the column names of the table
- `df['col_name'].value_counts()` - display the distinct values in a column and how often each one appears


### Dealing with Multiple Files
Often, you have the same data separated out into multiple files.

Let’s say that we have a ton of files following the filename structure:
  - 'file1.csv'
  - 'file2.csv'
  - 'file3.csv'
  -  so on...


`glob` is a Python standard-library module for finding files whose names match a pattern.

`glob` can find multiple files at once by matching filenames against shell-style wildcards (such as `*`, which matches any run of characters; these are not regular expressions):
```
import glob
files = glob.glob("path/to/folder/file*.csv")
```
`glob.glob()` returns `files`, a list of the paths of all the files in the folder that:
- start with `'file'`
- have an extension of `.csv`.



This code goes through every file that starts with 'file' and has an extension of .csv. It opens each file, reads the data into a DataFrame, and then concatenates all of those DataFrames together.
```
import glob
import pandas as pd

files = glob.glob("file*.csv")
df_list = []
for filename in files:
    data = pd.read_csv(filename)
    df_list.append(data)
df = pd.concat(df_list)
print(df.head(10))
```
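Because the loop above needs real files on disk, this sketch first writes three tiny hypothetical CSV files into a temporary folder, then globs and concatenates them. `sorted()` is added because `glob.glob()` does not guarantee any particular order:

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write three small hypothetical files: file1.csv, file2.csv, file3.csv
    for i in range(1, 4):
        pd.DataFrame({'id': [i], 'value': [i * 10]}).to_csv(
            os.path.join(folder, f'file{i}.csv'), index=False
        )

    # * is a shell-style wildcard: any run of characters
    files = sorted(glob.glob(os.path.join(folder, 'file*.csv')))

    # Read each file and stack the results into one DataFrame
    df = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)

print([os.path.basename(f) for f in files])  # ['file1.csv', 'file2.csv', 'file3.csv']
print(df)
```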

### Reshaping your Data .melt()
Recall that for tidy data we want:
- Each variable as a separate column
- Each row as a separate observation

For example, we would want to reshape a table like:
```
Account    |Checking |Savings
-----------+---------+----------
“12456543” |8500     |8900
-----------+---------+----------
“12283942” |6410     |8020
-----------+---------+----------
“12839485” |78000    |92000
```

Into a table that looks more like:
```
Account    |Account Type |Amount
-----------+-------------+----------
“12456543” |“Checking”   |8500
-----------+-------------+----------
“12456543” |“Savings”    |8900
-----------+-------------+----------
“12283942” |“Checking”   |6410
-----------+-------------+----------
“12283942” |“Savings”    |8020
-----------+-------------+----------
“12839485” |“Checking”   |78000
-----------+-------------+----------
“12839485” |“Savings”    |92000
```

We can use `pd.melt()` to do this transformation. `.melt()` takes in a DataFrame, and the columns to unpack:

```
df = pd.melt(
            frame=df
            , id_vars="Account"
            , var_name="Account Type"
            , value_vars=["Checking","Savings"]
            , value_name="Amount"
            )
```

The parameters you provide are:
- `frame`: the DataFrame you want to melt
- `id_vars`: the column(s) of the old DataFrame to preserve
- `value_vars`: the column(s) of the old DataFrame that you want to turn into variables
- `value_name`: what to call the column of the new DataFrame that stores the values
- `var_name`: what to call the column of the new DataFrame that stores the variables
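A runnable version of the account example, building the wide table shown above (with the account numbers stored as strings):

```python
import pandas as pd

# The wide account table from above
df = pd.DataFrame({
    'Account': ['12456543', '12283942', '12839485'],
    'Checking': [8500, 6410, 78000],
    'Savings': [8900, 8020, 92000],
})

tidy = pd.melt(
    frame=df,
    id_vars='Account',                   # column to keep as-is
    value_vars=['Checking', 'Savings'],  # columns to unpivot
    var_name='Account Type',             # holds the old column names
    value_name='Amount',                 # holds the old cell values
)
print(tidy)
```

`melt()` stacks all of the `Checking` rows first and then all of the `Savings` rows; sort the result if you want each account's rows next to each other.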

### Dealing with Duplicates
Often we see duplicated rows occur due to errors in data collection or in saving and loading the data.

To check for duplicates, we can use the pandas function `df.duplicated()`, which will return a Series telling us which rows are duplicate rows.

Let’s say we have a DataFrame fruits that represents this table:



```
item         |price   |calories
-------------+--------+----------
“banana”     |“$1”    |105
-------------+--------+----------
“apple”      |“$0.75” |95
-------------+--------+----------
“apple”      |“$0.75” |95
-------------+--------+----------
“peach”      |“$3”    |55
-------------+--------+----------
“peach”      |“$4”    |55
-------------+--------+----------
“clementine” |“$2.5”  |35
```

If we call `fruits.duplicated()`, we would get the following Series:

```
id |value
---+-------
0  |False
---+-------
1  |False
---+-------
2  |True
---+-------
3  |False
---+-------
4  |False
---+-------
5  |False
```
Row 2, which represents an "apple" with price "$0.75" and 95 calories, is a duplicate of row 1.

We can use the pandas `df.drop_duplicates()` function to remove all rows that are duplicates of another row.
If we call `fruits.drop_duplicates()`, we would get the table:


```
item         |price   |calories
-------------+--------+----------
“banana”     |“$1”    |105
-------------+--------+----------
“apple”      |“$0.75” |95
-------------+--------+----------
“peach”      |“$3”    |55
-------------+--------+----------
“peach”      |“$4”    |55
-------------+--------+----------
“clementine” |“$2.5”  |35
```
The "apple" row was deleted because it was exactly the same as another row. But the two "peach" rows remain because there is a difference in the price column.

If we wanted to remove every row with a duplicate value in the item column, we could specify a subset:

```
fruits = fruits.drop_duplicates(subset=['item'])
```
By default, this keeps the first occurrence of the duplicate:
```
item         |price   |calories
-------------+--------+----------
“banana”     |“$1”    |105
-------------+--------+----------
“apple”      |“$0.75” |95
-------------+--------+----------
“peach”      |“$3”    |55
-------------+--------+----------
“clementine” |“$2.5”  |35
```
Make sure that the columns you drop duplicates from are specifically the ones where duplicates don’t belong. You wouldn’t want to drop duplicates with the price column as a subset, for example, because it’s okay if multiple items cost the same amount!
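A minimal sketch that rebuilds the fruits table and runs the duplicate checks described above. Passing `keep="last"` to `drop_duplicates()` would keep the later "$4" peach instead of the first one.

```python
import pandas as pd

# The fruits table from above (prices are still strings at this point)
fruits = pd.DataFrame({
    "item": ["banana", "apple", "apple", "peach", "peach", "clementine"],
    "price": ["$1", "$0.75", "$0.75", "$3", "$4", "$2.5"],
    "calories": [105, 95, 95, 55, 55, 35],
})

# True only for rows that exactly repeat an earlier row
dupes = fruits.duplicated()
print(dupes.tolist())  # [False, False, True, False, False, False]

# Exact duplicates removed: both peach rows stay because their prices differ
deduped = fruits.drop_duplicates()

# One row per item; keep="first" is the default
one_per_item = fruits.drop_duplicates(subset=["item"], keep="first")
print(one_per_item)
```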

### Splitting by Index
- In clean data, we want to make sure each column represents one type of measurement.
- Often, multiple measurements are recorded in the same column, and we want to separate these out so that we can do individual analysis on each variable.

Example:
- We have a column “birthday” with data formatted as MMDDYYYY.
  - “11011993” represents a birthday of 1st November 1993.
- We want to split this data into day, month, and year so that we can use these columns as separate features.

```
# Create the 'month' column
df['month'] = df.birthday.str[0:2]
# Create the 'day' column
df['day'] = df.birthday.str[2:4]
# Create the 'year' column
df['year'] = df.birthday.str[4:]
```
- The first command takes the first two characters of each value in the birthday column and puts them into a month column.
- The second command takes the next two characters (positions 2 and 3) of each value and puts them into a day column.
- The third command takes the rest of each value (from position 4 onwards) and puts it into a year column.

This would transform a table like:
```
id   |birthday
-----+------------
1011 |“12241989”
-----+------------
1112 |“10311966”
-----+------------
1113 |“01052011”
```

into a table like:
```
id   |birthday   |month |day  |year
-----+-----------+------+-----+--------
1011 |“12241989” |“12”  |“24” |“1989”
-----+-----------+------+-----+--------
1112 |“10311966” |“10”  |“31” |“1966”
-----+-----------+------+-----+--------
1113 |“01052011” |“01”  |“05” |“2011”
```
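A runnable version of the birthday example, assuming the column is stored as strings. If it had been read in as integers, the leading zero in "01052011" would be lost and every slice would shift. The last lines show an alternative that is not part of the original lesson: parsing the whole value with `pd.to_datetime()` and an explicit format, which raises an error on impossible dates instead of silently slicing them.

```python
import pandas as pd

# Birthdays stored as MMDDYYYY strings, as in the table above
df = pd.DataFrame({"id": [1011, 1112, 1113],
                   "birthday": ["12241989", "10311966", "01052011"]})

# Slice each string by character position (the end index is exclusive)
df["month"] = df["birthday"].str[0:2]
df["day"] = df["birthday"].str[2:4]
df["year"] = df["birthday"].str[4:]
print(df)

# Alternative: parse the whole value as a date
dates = pd.to_datetime(df["birthday"], format="%m%d%Y")
print(dates.dt.month.tolist(), dates.dt.day.tolist())  # [12, 10, 1] [24, 31, 5]
```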

### Splitting by Character
Example:
- We have a column called “type” with data entries in the format "admin_US" or "user_Kenya".
- The “type” column actually contains two types of data:
  - the user type (with values like “admin” or “user”)
  - the country this user is in (with values like “US” or “Kenya”)

We can no longer just split along the first 4 characters, because "admin" and "user" are of different lengths. Instead, we know that we want to split along the "_".
```
# Create the 'str_split' column
df['str_split'] = df.type.str.split('_')
# Create the 'usertype' column
df['usertype'] = df.str_split.str.get(0)
# Create the 'country' column
df['country'] = df.str_split.str.get(1)
```

This would transform a table like:
```
id   |type
-----+-----------------
1011 |“user_Kenya”
-----+-----------------
1112 |“admin_US”
-----+-----------------
1113 |“moderator_UK”
```

into a table like:
```
id   |type           |country |usertype
-----+---------------+--------+-------------
1011 |“user_Kenya”   |“Kenya” |“user”
-----+---------------+--------+-------------
1112 |“admin_US”     |“US”    |“admin”
-----+---------------+--------+-------------
1113 |“moderator_UK” |“UK”    |“moderator”
```
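A sketch of the same split done in one step, assuming the same three example rows. With `expand=True`, `.str.split()` returns the pieces as columns instead of a column of lists, and `n=1` splits only at the first underscore, so a country name that itself contained an underscore would stay in one piece.

```python
import pandas as pd

df = pd.DataFrame({"id": [1011, 1112, 1113],
                   "type": ["user_Kenya", "admin_US", "moderator_UK"]})

# expand=True returns one column per piece; n=1 splits at the first "_" only
parts = df["type"].str.split("_", n=1, expand=True)
df["usertype"] = parts[0]
df["country"] = parts[1]
print(df[["id", "type", "country", "usertype"]])
```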

### Looking at Types
Each column of a DataFrame holds items of a single data type, or dtype.
We often want to convert between types so that we can do better analysis.

The dtypes that pandas uses are:
- float
- int
- bool
- datetime
- timedelta
- category
- object

Example:
If a numerical column like "num_users" is stored as a Series of objects (strings) instead of ints, it is more difficult to do something like make a line graph of users over time.


```
print(df.dtypes)
```
Output:
```
item     |object
---------+--------
price    |object
---------+--------
calories |int64

dtype: object
```

The last line shows that the dtype of the `dtypes` attribute itself is object! That is because `df.dtypes` is a Series (one entry per column), which you have already worked with. Series objects compose all DataFrames.
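A short sketch of inspecting and converting dtypes. The `num_users` column is hypothetical and stands in for the example above: numbers that were loaded as text and need converting with `.astype()` before they can be plotted or summed.

```python
import pandas as pd

fruits = pd.DataFrame({
    "item": ["banana", "apple", "peach"],
    "price": ["$1", "$0.75", "$3"],
    "calories": [105, 95, 55],
})
# One entry per column; the text columns are not numeric
print(fruits.dtypes)

# Hypothetical column of user counts that was loaded as text
users = pd.DataFrame({"num_users": ["120", "135", "150"]})
users["num_users"] = users["num_users"].astype("int64")
print(users["num_users"].sum())  # 405
```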


### String Parsing
Sometimes we need to modify strings in our DataFrames to help us transform them into more meaningful metrics.

For example, in our fruits table from before:

```
item         |price   |calories
-------------+--------+----------
“banana”     |“$1”    |105
-------------+--------+----------
“apple”      |“$0.75” |95
-------------+--------+----------
“peach”      |“$3”    |55
-------------+--------+----------
“peach”      |“$4”    |55
-------------+--------+----------
“clementine” |“$2.5”  |35
```
We can see that the 'price' column is actually composed of strings representing dollar amounts. This column could be much better represented in floats, so that we could take the aggregate statistics or compare different fruits to one another in terms of price.


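First, we strip the `$` (and any commas) from each price with a regular expression:
```
fruits['price'] = fruits['price'].replace(r'[\$,]', '', regex=True)
```
or, using `.str.replace()` to remove the literal `$` (a plain `.replace('$', '')` would only match cells whose entire value is `"$"`):
```
fruits['price'] = fruits['price'].str.replace('$', '', regex=False)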
```

Then, we can use the pandas function `pd.to_numeric()` to convert strings containing numerical values to integers or floats:
```
fruits['price'] = pd.to_numeric(fruits['price'])
```
```
item         |price   |calories
-------------+--------+----------
“banana”     |1       |105
-------------+--------+----------
“apple”      |0.75    |95
-------------+--------+----------
“peach”      |3       |55
-------------+--------+----------
“peach”      |4       |55
-------------+--------+----------
“clementine” |2.5     |35
```
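A runnable sketch of the whole price clean-up, using the fruits table from above. It uses `.str.replace()` because that works on substrings; `Series.replace()` without `regex=True` only replaces cells whose entire value matches.

```python
import pandas as pd

fruits = pd.DataFrame({
    "item": ["banana", "apple", "peach", "peach", "clementine"],
    "price": ["$1", "$0.75", "$3", "$4", "$2.5"],
    "calories": [105, 95, 55, 55, 35],
})

# Remove the literal "$" from every price string
fruits["price"] = fruits["price"].str.replace("$", "", regex=False)
# Convert the remaining numeric strings to numbers (floats, because of 0.75 and 2.5)
fruits["price"] = pd.to_numeric(fruits["price"])

# Aggregate statistics now work on the price column
print(fruits["price"].mean())  # 2.25
```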







### More String Parsing
Sometimes we want to do analysis on numbers that are hidden within string values. We can use regex to extract this numerical data from the strings they are trapped in. Suppose we had this DataFrame df representing a workout regimen:


```
date       |exerciseDescription
-----------+--------------------
10/18/2018 |“lunges - 30 reps”
-----------+--------------------
10/18/2018 |“squats - 20 reps”
-----------+--------------------
10/18/2018 |“deadlifts - 25 reps”
-----------+--------------------
10/18/2018 |“jumping jacks - 30 reps”
-----------+--------------------
10/19/2018 |“lunges - 40 reps”
-----------+--------------------
10/19/2018 |“chest flyes - 15 reps”
-----------+--------------------
…          |…
```
It would be helpful to separate out data like "lunges - 30 reps" into 2 columns:
- number of reps, "30"
- type of exercise, "lunges"

Then, we could compare the increase in the number of lunges done over time.

Example:

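To extract the numbers from the strings, we can use pandas’ `.str.split()` function with a regular expression. Wrapping `\d+` in parentheses (a capture group) keeps the matched number in the output:
```
split_df = df['exerciseDescription'].str.split(r'(\d+)', expand=True)
```
This results in a DataFrame split_df with three columns: column 0 holds the text before the number (e.g. "lunges - "), column 1 holds the number ("30"), and column 2 holds the text after it (" reps"). We can then convert the numbers and clean up the exercise names:

```
df['reps'] = pd.to_numeric(split_df[1])
# Strip the trailing " - " but keep inner spaces, as in "jumping jacks"
df['exercise'] = split_df[0].str.strip(' -')
```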
Now, our df looks like this:

```
date       |exerciseDescription       |reps |exercise
-----------+--------------------------+-----+-----------------
10/18/2018 |“lunges - 30 reps”        |30   |“lunges”
-----------+--------------------------+-----+-----------------
10/18/2018 |“squats - 20 reps”        |20   |“squats”
-----------+--------------------------+-----+-----------------
10/18/2018 |“deadlifts - 25 reps”     |25   |“deadlifts”
-----------+--------------------------+-----+-----------------
10/18/2018 |“jumping jacks - 30 reps” |30   |“jumping jacks”
-----------+--------------------------+-----+-----------------
10/19/2018 |“lunges - 40 reps”        |40   |“lunges”
-----------+--------------------------+-----+-----------------
10/19/2018 |“chest flyes - 15 reps”   |15   |“chest flyes”
-----------+--------------------------+-----+-----------------
…          |…                         |…    |…
```
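A sketch of an alternative approach, swapped in for illustration: `.str.extract()` with named groups pulls the exercise name and the rep count out in one step, so the leftover text needs no extra clean-up. It assumes every description follows the "name - N reps" pattern; rows that do not match come back as `NaN`.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["10/18/2018", "10/18/2018", "10/18/2018",
             "10/18/2018", "10/19/2018", "10/19/2018"],
    "exerciseDescription": ["lunges - 30 reps", "squats - 20 reps",
                            "deadlifts - 25 reps", "jumping jacks - 30 reps",
                            "lunges - 40 reps", "chest flyes - 15 reps"],
})

# Each named group becomes a column: everything before " - " is the exercise,
# and the digits before " reps" are the count
parts = df["exerciseDescription"].str.extract(r"^(?P<exercise>.+?) - (?P<reps>\d+) reps$")

df["exercise"] = parts["exercise"]
df["reps"] = pd.to_numeric(parts["reps"])
print(df[["date", "reps", "exercise"]])
```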

### Missing Values
We often have data with missing elements, as a result of a problem with the data collection process or errors in the way the data was stored.

- The missing elements normally show up as NaN (or Not a Number) values

```
day   |bill  |tip |num_guests
------+------+----+------------
“Mon” |10.1  |1   |1
------+------+----+------------
“Mon” |20.75 |5.5 |2
------+------+----+------------
“Tue” |19.95 |5.5 |NaN
------+------+----+------------
“Wed” |44.10 |15  |3
------+------+----+------------
“Wed” |NaN   |1   |1
```

The `num_guests` value for the 3rd row is missing, and the bill value for the 5th row is missing. Some calculations we do will just skip the `NaN` values, but some calculations or visualisations we try to perform will break when a `NaN` is encountered.
Most of the time, we use one of two methods to deal with missing values.

#### Method 1 df.dropna()
```
bill_df = bill_df.dropna()
```
removes every row that has a `NaN` value in any column. If we wanted to remove only the rows with a `NaN` in the num_guests column, we could specify a subset:
```
bill_df = bill_df.dropna(subset=['num_guests'])
```

#### Method 2 df.fillna()
This command replaces the `NaN`s in the bill and num_guests columns with the mean of each column:
```
bill_df = bill_df.fillna(
    value={
        "bill": bill_df.bill.mean(),
        "num_guests": bill_df.num_guests.mean(),
    }
)
```
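A runnable sketch of both methods on the bill table above. Note that mean-filling `num_guests` gives 1.75 guests, which is not a whole number; depending on the analysis, the median or a rounded value may be a better fill.

```python
import numpy as np
import pandas as pd

bill_df = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Wed", "Wed"],
    "bill": [10.1, 20.75, 19.95, 44.10, np.nan],
    "tip": [1, 5.5, 5.5, 15, 1],
    "num_guests": [1, 2, np.nan, 3, 1],
})

# Method 1: drop rows with a NaN in any column (the "Tue" row and the last "Wed" row)
any_missing_dropped = bill_df.dropna()
# Method 1 with a subset: drop only rows missing num_guests (just the "Tue" row)
guests_dropped = bill_df.dropna(subset=["num_guests"])

# Method 2: replace each column's NaNs with that column's mean
# (.mean() skips NaN values when computing the mean)
filled = bill_df.fillna(value={
    "bill": bill_df["bill"].mean(),
    "num_guests": bill_df["num_guests"].mean(),
})
print(filled)
```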


### Review
Great! We have looked at a number of different methods we may use to get data into the format we want for analysis.

Specifically, we have covered:
- diagnosing the “tidiness” of the data
- reshaping the data
- combining multiple files
- changing the types of values
- dropping or filling missing values - how we deal with data that is incomplete or missing
- manipulating strings to represent the data better

You can use these methods to transform your datasets to be clean and easy to work with!


## Handling Missing Data


### Introduction
Just as data goes missing for different reasons, there are different ways in which it is missing.

Example:
In a health survey, we are collecting:
- activity level (measured in minutes)
- location (city)
- blood pressure

If we are missing a lot of data on blood pressure, then we can’t use that dataset to answer questions about blood pressure. We can still use it to understand the relationship between location and activity, but not blood pressure.

Different types of missing data:

1. `Structurally Missing Data (SMD)`
  - we expect this data to be missing for some logical reason
  - when this data is missing, we aren’t surprised at all
2. `Missing Completely at Random (MCAR)`
  - the probability of any data point being MCAR is the same for all data points; this type of missing data is mostly hypothetical
  - sometimes data is just missing. It can happen for any reason, but the important thing is that it could have happened to any observation.
  - there’s no logic, no outside force, or unseen behaviour. It’s just a completely random, fluke occurrence that there isn’t data. MCAR data demands statistical perfection, which is extremely rare because, more often than not, there is some unseen reason why data might be missing, such as:
    - a bug in the software that causes the device to not record steps
    - people not being willing to share a specific data point

3. `Missing at Random (MAR)`
  - the probability of any data point being MAR is the same within groups of the observed data; this is much more realistic than MCAR

4. `Missing Not at Random (MNAR)`
  - there is some reason why the data is missing
  - to look for that reason, we might examine:
    - last-reported weight, to see if data is missing from higher or lower BMI groups
    - demographics such as age, race, and gender, to see if there is a pattern
    - the date of data collection

## Handling Missing Data with Deletion
- Deletion is, quite simply, when we remove some aspect of our missing data so that our resulting dataset is as complete as possible, leading to accurate analytics.
- Because missing data does not provide a complete picture of what happened in our observations, we can’t rely on it for analytics, which is why deleting it can be a good solution.
- On the other hand, the more data we have, the more confidence we can have that our conclusions reflect what is actually happening, and are not due to random chance.
- We should only drop a variable as a last resort, and only if that variable is missing a very significant amount of data (at least 60%).

The big risk with deletion is that we could introduce bias, or unrepresentative data, into the dataset. If we delete:
- too much data
- the wrong kind of data

… then the resulting dataset doesn’t accurately describe what actually happened.

In general, data is safe to delete when:
- It is either MAR or MCAR missing data. We can remove data that falls into either of these categories without affecting the rest of the data, since we assume that the data is missing at random.
  - Don't delete if the percentage of missing data is too high.
- The missing data has a low correlation with other features in the data. If the missing data is not important for what we’re doing, then we can safely remove that data.


### Types of deletion
Depending on the kind of analysis we are doing, we have two kinds of deletion available to us:


#### 1. Listwise (complete-case analysis)
- Technique in which we remove the entire observation when there is missing data
- Used when the missing variable(s) can directly impact the analysis we are trying to perform, usually with MAR or MCAR missing data

```
# Drop rows that have any missing data
df.dropna(inplace=True)
```

Note:
- We lose a lot of information when we remove an entire row.
- This decreases the amount of data that we can use for analysis.
- We have less confidence in the accuracy of any conclusions we draw from the resulting dataset.

Best practice:
- Use listwise deletion only when the number of rows with missing data is relatively small, to avoid significant bias.
- If only about 5% of the data is missing, then we are safe to use listwise deletion.

#### 2. Pairwise
In pairwise deletion, we only remove rows when there are missing values in the variables we are directly analysing. Unlike listwise deletion, we do not care if other variables are missing, and can retain those rows.

- Pairwise deletion has the advantage of retaining as much data as possible, while still letting us handle the missing data for our key variables.
- Pairwise deletion is the preferred technique to use.



```
df.dropna(subset=['columns_1', 'columns_2'],  # only look at these two columns
          inplace=True,                        # modify df directly instead of returning a copy
          how='any')                           # drop a row if either column is missing
```
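A minimal sketch contrasting the two kinds of deletion. The survey columns are hypothetical. As an aside, pandas' `DataFrame.corr()` already excludes missing values pair by pair when it computes correlations.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; column names are made up for illustration
df = pd.DataFrame({
    "work_life_balance": [4, np.nan, 3, 5, 2],
    "overall_happiness": [5, 4, np.nan, 4, 3],
    "comments": ["ok", np.nan, np.nan, "great", np.nan],
})

# Listwise: a row is removed if ANY column is missing
listwise = df.dropna()

# Pairwise: only the columns we analyse decide whether a row is removed
pairwise = df.dropna(subset=["work_life_balance", "overall_happiness"], how="any")

print(len(listwise), len(pairwise))  # 2 3
```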

#### Question:
Here are a few scenarios and how you might respond:
- I have a dataset of survey responses I sent out to my company. We had 90% of the company fill out the response. For the 10% that didn’t fill out the response, all of the fields are empty. Since I only want to analyse the responses that came in, I can delete the missing 10% of data using listwise deletion.
- I have a dataset of survey responses I sent out to my company. Most people responded, but we have some missing values in a few fields: Overall Happiness, Work Life Balance, Commute Time, and Comments. I only care about understanding the correlation of a few other fields with Work Life Balance, so I can use pairwise deletion to only remove blanks in that field.
- I have a dataset of survey responses I sent out to my company. Although not everyone responded, I don’t think I can delete any of my data. Part of our analysis is to understand who didn’t respond, or what data is missing.

Why these responses?
- This is a good answer because they understand that, for their analysis, they only care about responses, so non-responses can be completely removed using listwise deletion.
- This is a good answer because they have shown they understand which data is missing, but more importantly, which data is important for them. They identified the field they want to clean up, and that pairwise deletion is the best solution here.
- This is a good answer as well because they understand that we can’t always delete data. Sometimes the fact that data is missing is important to know. There will be other techniques they can use, depending on the data.


## Single Imputation
- Filling in the blanks one at a time
- Missing data is handled by using the data around it to “fill in the blanks”
- Effective technique to handle missing data for our time-series datasets.

Disadvantages to using single imputation methods:
- The potential for adding bias into a dataset.
  - We are assuming that the data we are using to fill in the blanks is reliable and accurate for that observation.
- Values can change over time, but single imputation will ignore these potential changes and will “smooth” out our data, which could lead to inaccurate results.

### Is it Missing Not at Random (MNAR)?
Before we can start describing techniques, we must verify that our missing data can be categorised as MNAR (Missing Not at Random), because these techniques assume that to be the case.

There are two key aspects to be able to accurately describe missing data as MNAR:

1. Use domain knowledge:

  For example, someone might know that data in a survey is missing in a particular column because the participant was either too embarrassed to answer, or didn’t know the answer. This would let us know that the data is MNAR.

2. Analyse the dataset to find patterns:

  Example: if we have some survey data, we might find that our missing data almost exclusively comes from men older than 55 years old. If we see a pattern like this emerge, we can confidently say it is MNAR.


### LOCF - Last Observation Carried Forward
- fill in the missing data with the previous value.
- LOCF is used often when we see a relatively consistent pattern that has continued to either increase or decrease over time.

In Python, there are a variety of methods we can employ:

1. If your data is in a pandas DataFrame, we can use the `.ffill()` method on a particular column:
```
# Applying Forward Fill (another name for LOCF) on a particular column
df['column_name'] = df['column_name'].ffill()
```

2. If our data is in a NumPy array, there is a commonly used library called impyute that has a variety of time-series functions. To use LOCF, we can write the following code:
```
impyute.imputation.ts.locf(data, axis=0)
# Applying LOCF to the dataset
```
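A minimal sketch of LOCF with pandas on hypothetical weekly happiness scores. It assigns the result back to the column rather than relying on `inplace=True` on a selected column, which newer pandas versions may not apply to the original DataFrame. A leading `NaN` stays missing, because there is no earlier observation to carry forward.

```python
import numpy as np
import pandas as pd

# Hypothetical weekly happiness scores (NaN = no survey response that week)
df = pd.DataFrame({"week": [1, 2, 3, 4, 5],
                   "happiness": [np.nan, 4, np.nan, 5, np.nan]})

# LOCF: each NaN takes the last value observed before it
df["happiness"] = df["happiness"].ffill()
print(df["happiness"].tolist())  # [nan, 4.0, 4.0, 5.0, 5.0]
```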

Example:

I analyse surveys from our employees on how they are feeling each week at work. Some of our employees didn’t fill out the data this week, but they have been marking 5/5 on their overall happiness for months now, so I can probably assume they are still at 5/5.

Why this answer:

This is a good response because they understand that the historical data shows a strong trend to a particular value, so they can carry that data forward.




### NOCB - Next Observation Carried Backward
- NOCB is usually used when we have more recent data, and we know enough about the past to fill in the blanks that way.
- With NOCB, we can use information from more recent entries to fill in data from the past.

Similarly to LOCF, there are a couple of common techniques we can use:
1. If your data is in a pandas DataFrame, we can use the `.bfill()` method on a particular column. By “back-filling” (another name for NOCB) our data, we take data from the next data point in the time series and carry it backwards.

```
df['column_name'] = df['column_name'].bfill()
```

2. To use impyute to perform NOCB, we can write the following code:

```
impyute.imputation.ts.nocb(data, axis=0)
```
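The same idea in reverse, as a sketch on hypothetical readings: with `.bfill()`, each gap takes the next observed value, and a trailing `NaN` stays missing because nothing comes after it.

```python
import numpy as np
import pandas as pd

# Hypothetical readings where the earliest values are missing
readings = pd.Series([np.nan, np.nan, 7.0, np.nan, 9.0, np.nan])

# NOCB: each NaN takes the next value observed after it
nocb = readings.bfill()
print(nocb.tolist())  # [7.0, 7.0, 7.0, 9.0, 9.0, nan]
```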

### BOCF - Baseline Observation Carried Forward
The initial value for a given variable is applied to its missing values.

This is a common approach in medical studies, particularly in drug studies. For example, we could assume that missing data for a reported pain level could return to the baseline value.

```
# Isolate the first (baseline) value for our data; .iloc[0] selects by position
baseline = df['column_name'].iloc[0]
# Replace missing values with our baseline value
df['column_name'] = df['column_name'].fillna(value=baseline)
```
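A sketch of BOCF on a hypothetical series of pain scores, where the first visit is the baseline. It assumes the baseline (first) value itself is present.

```python
import numpy as np
import pandas as pd

# Hypothetical pain scores for one patient; visit 0 is the baseline
df = pd.DataFrame({"visit": [0, 1, 2, 3, 4],
                   "pain_level": [7, 5, np.nan, 3, np.nan]})

# .iloc[0] takes the first row by position, whatever the index labels are
baseline = df["pain_level"].iloc[0]
df["pain_level"] = df["pain_level"].fillna(baseline)
print(df["pain_level"].tolist())  # [7.0, 5.0, 7.0, 3.0, 7.0]
```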




### WOCF - Worst Observation Carried Forward
The worst values for a given variable are applied to missing values.

This would be useful if the purpose of our analysis was to record improvement in some value (for example, if we wanted to study whether a treatment was helping a particular patient’s condition). By filling in missing data with the worst value, we avoid reporting improvements that didn’t actually happen.

```
# Isolate worst pain value (in this case, the highest)
worst = df['column_name'].max()
# Replace all missing values with the worst value
df['column_name'] = df['column_name'].fillna(value=worst)
```
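A sketch of WOCF on hypothetical pain scores, where higher means worse, so the worst value is the maximum. For a measure where lower is worse, `.min()` would be used instead.

```python
import numpy as np
import pandas as pd

# Hypothetical pain scores; higher means worse
df = pd.DataFrame({"visit": [0, 1, 2, 3, 4],
                   "pain_level": [6, 8, np.nan, 4, np.nan]})

# .max() skips NaN by default, so worst is 8.0 here
worst = df["pain_level"].max()
df["pain_level"] = df["pain_level"].fillna(worst)
print(df["pain_level"].tolist())  # [6.0, 8.0, 8.0, 4.0, 8.0]
```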




## Multiple Imputation
1. Multiple imputation, in particular, is used when we have `missing data across multiple categorical columns in our dataset`.
2. Multiple imputation is a technique for filling in missing data, in which we replace the missing data multiple times.
  - After we have tried different values, we use an `algorithm` to pick the best values to replace our missing data.
  - By repeating this over many iterations, we converge on good estimates for our missing data.

Key benefits:
- Confident that the `resulting data will be a close approximation of the real data`. This would include calculations like standard error and overall statistics.

The data needs to meet certain criteria:
- Multiple imputation is best for MNAR data (MNAR = Missing Not at Random)
- With MNAR data, there is an assumption that there is an underlying reason to have missing data, and we have a good understanding of why that data is missing.

### Flow chart:
Repeat multiple cycles: Because we have missing data in multiple variables, it is very unlikely that we would have accurate predictions on the first pass through the data. Thus, we need to run these predictive iterations multiple times so we can improve the model on each pass. The number of iterations will vary based on the data and many other factors, but a good place to start is 10 iterations.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Multiple%20Imputation%201.png)

- Assign placeholder values: For all of the missing data we have in our variables, we need to assume a value to start with. In most cases, it is best to assume a random value from within the dataset for that particular variable, as this will avoid introducing bias into the dataset.
- Remove missing data for one variable: We now want to isolate a particular variable's missing data, so we can remove the assumed / predicted values from this variable only.
- Predict values based on other variables: For our missing data, use the values in the other variables to predict what our missing data should be. This is typically done through either a regression or nearest neighbour process.
- Replace values in variable: Put the results of the prediction into our variable, so that we can continue the process with the new data.
- Integrate predicted values into dataset: After we have completed all of our cycles, we must gather all the predicted values, verify that they are the most accurate set we have, and then place them into the final dataset.

After each iteration, our predicted values for each variable should get more and more accurate, since the models continue to refine to better fit our dataset.

The goal of multiple imputation is to fill in the missing data so that we can find a model (typically either a normal or chi-square model) that best fits the dataset.

### How to use it
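We can use the `IterativeImputer` class from sklearn's `sklearn.impute` module. It models each variable with missing values as a function of the other variables, in a round-robin fashion over several iterations, and works with pandas DataFrames. It is still marked experimental, so it has to be enabled with `from sklearn.experimental import enable_iterative_imputer` before it can be imported.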

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Multiple%20Imputation%202.png)

to

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Multiple%20Imputation%203.png)




```
import numpy as np
from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer
import pandas as pd
```
Create the dataset as a Python dictionary
```
d = {
    'X': [5.4, 13.8, 14.7, 17.6, np.nan, 1.1, 12.9, 3.4, np.nan, 10.2],
    'Y': [18, 27.4, np.nan, 18.3, 49.6, 48.9, np.nan, 13.6, 16.1, 42.7],
    'Z': [7.6, 4.6, 4.2, np.nan, 4.7, 8.5, 3.5, np.nan, 1.8, 4.7]
}
dTest = {
    'X': [13.1, 10.8, np.nan, 9.7, 11.2],
    'Y': [18.3, np.nan, 14.1, 19.8, 17.5],
    'Z': [4.2, 3.1, 5.7, np.nan, 9.6]
}
```

Create the pandas DataFrame from our dictionary
```
df = pd.DataFrame(data=d)
dfTest = pd.DataFrame(data=dTest)
```
Create the IterativeImputer model to predict missing values
```
imp = IterativeImputer(max_iter=10, random_state=0)
```
Fit the model to the test dataset
```
imp.fit(dfTest)
```
Use the fitted model to fill in the missing values of the main dataset
```
dfComplete = pd.DataFrame(np.round(imp.transform(df),1), columns=['X','Y','Z'])
print(dfComplete.head(10))
```
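The code above produces a single completed dataset. `IterativeImputer` can also be used for multiple imputation in the stricter sense: with `sample_posterior=True`, each run draws the filled-in values from a predictive distribution, so different `random_state` values give different plausible datasets. A sketch, assuming we are interested in the mean of X; the number of imputations (5) is an arbitrary choice. The analysis is run on each completed dataset and the estimates are then averaged, which is the point-estimate part of Rubin's rules.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (required to import IterativeImputer)
from sklearn.impute import IterativeImputer

# Same data as the dictionary d above
df = pd.DataFrame({
    'X': [5.4, 13.8, 14.7, 17.6, np.nan, 1.1, 12.9, 3.4, np.nan, 10.2],
    'Y': [18, 27.4, np.nan, 18.3, 49.6, 48.9, np.nan, 13.6, 16.1, 42.7],
    'Z': [7.6, 4.6, 4.2, np.nan, 4.7, 8.5, 3.5, np.nan, 1.8, 4.7],
})

n_imputations = 5  # arbitrary choice for this sketch
completed = []
for seed in range(n_imputations):
    # sample_posterior=True draws each filled-in value from a predictive
    # distribution, so each seed yields a different plausible dataset
    imp = IterativeImputer(max_iter=10, sample_posterior=True, random_state=seed)
    completed.append(pd.DataFrame(imp.fit_transform(df), columns=df.columns))

# Run the same analysis (here: the mean of X) on every completed dataset ...
estimates = [d['X'].mean() for d in completed]
# ... then pool: the point estimate is the average of the per-dataset estimates,
# and the spread between them reflects uncertainty caused by the missing data
pooled_x_mean = float(np.mean(estimates))
between_var = float(np.var(estimates, ddof=1))
print(round(pooled_x_mean, 2), round(between_var, 3))
```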
where:
- datasetType = categorical
- missingDataType = MAR



# Intermediate Python for Data Engineers
- build powerful, sophisticated applications
- learn how to expedite your data processing and management, manage your resources, test your code using the unittest testing framework, and more.

Overview of the course's modules:
- Functions Deep Dive
- Object-Oriented Programming
- Iterators & Generators
- Specialized Collections
- Resource Management
- Unit Testing


## Function Arguments

### Python Gotcha: Mutable Default Arguments

```
def createStudent(name, age, grades=[ ]):
   return {  'name': name, 'age': age, 'grades': grades }
```

- Default parameter values are evaluated from left to right when the function definition is executed. The expression is evaluated once, when the function is defined, and the same “pre-computed” value is used for each call.
- So the default values we provide for parameters are created only once and reused for every subsequent call of the function. Our grades=[] from the earlier function was created only once, and any time we accessed it, the same list was being modified. We can even see that the memory id of the grades property for both students is the same (using the built-in id() function):

```
# The ids printed will vary depending on the computer we are using.
print(id(chrisley['grades'])) # output: 139828567365696
print(id(dallas['grades'])) # output: 139828567365696
```
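The snippet above assumes two students, `chrisley` and `dallas`, were already created without passing any grades. A self-contained sketch of that setup, with hypothetical ages and a hypothetical grade, shows the shared list:

```python
def createStudent(name, age, grades=[]):
    # The [] above is created once, when 'def' runs, and shared by every call
    return {'name': name, 'age': age, 'grades': grades}

# Hypothetical students: neither call passes its own grades list
chrisley = createStudent('Chrisley', 15)
dallas = createStudent('Dallas', 16)

chrisley['grades'].append(90)
print(dallas['grades'])                        # [90]  (Dallas "got" Chrisley's grade)
print(chrisley['grades'] is dallas['grades'])  # True
```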


#### The None Workaround
If we want an empty list as a potential default argument value, we can use None as a special value to indicate we did not receive anything. After we check whether an argument was provided, we can instantiate a new list if it wasn’t. Here is what the solution looks like for our program from earlier:


```
def createStudent(name, age, grades=None):
  if grades is None:
    grades = []
  return { 'name': name, 'age': age, 'grades': grades}
```
```
def addGrade(student, grade):
  student['grades'].append(grade)
  # To help visualise the grades we have added a print statement
  print(student['grades'])
```
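A sketch of the fixed version in use, with the same hypothetical students: each call that omits `grades` now gets its own new list.

```python
def createStudent(name, age, grades=None):
    # None signals "no list given"; build a fresh list on every such call
    if grades is None:
        grades = []
    return {'name': name, 'age': age, 'grades': grades}

def addGrade(student, grade):
    student['grades'].append(grade)
    print(student['grades'])

# Hypothetical students, as before
chrisley = createStudent('Chrisley', 15)
dallas = createStudent('Dallas', 16)

addGrade(chrisley, 90)  # prints [90]
addGrade(dallas, 100)   # prints [100]; Dallas no longer shares Chrisley's list
```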

### A Recap
- Positional arguments: arguments that are called by their position in the function definition. `print_name('Jiho', 'Baggins')`
- Keyword arguments: arguments that are called by their name. `print_name(last_name='Baggins', first_name='Jiho')`
- Default arguments: arguments that are given default values. `print_name()`


### Variable number of arguments: *args
Python provides the unpacking operator (`*`), which allows us to give our functions a variable number of arguments by performing what’s known as positional argument packing.



```
def my_function(*args):
 print(args)
```
Or
```
def my_function(*random_name):
 print(random_name)
```
Whatever name follows the unpacking operator (*) will store the arguments passed into the function in the form of a tuple. In this case, args has three values inside, but it can have many more (or fewer).
```
my_function('Arg1', 245, False)
```
Output:
```
('Arg1', 245, False)
```

Note to self:
- In our `print()` call, we simply use the name args with the unpacking operator omitted. The name args is completely arbitrary, so the `*random_name` version above works just the same.



#### Working with *args
```
def shout_strings(*args):
 for argument in args:
   print(argument.upper())

shout_strings('Working on', 'learning', 'argument unpacking!')
```

Output:
```
WORKING ON
LEARNING
ARGUMENT UNPACKING!
```

```
def truncate_sentences(length, *sentences):
 for sentence in sentences:
   print(sentence[:length])

truncate_sentences(8, "What's going on here", "Looks like we've been cut off")
```

Output:
```
What's g
Looks li
```


### Variable number of arguments: **kwargs
```
def arbitrary_keyword_args(**kwargs):
 print(type(kwargs))
 print(kwargs)
 # See if there's an 'anything_goes' keyword arg and print it
 print(kwargs.get('anything_goes'))

arbitrary_keyword_args(this_arg='wowzers', anything_goes=101)
```

Output:
```
<class 'dict'>
{'this_arg': 'wowzers', 'anything_goes': 101}
101
```

We can observe two things:
- **kwargs takes the form of a dictionary with all the keyword argument values passed to arbitrary_keyword_args. Since **kwargs is a dictionary, we can use standard dictionary functions like .get() to retrieve values.

- Just as we saw with *args, the name of kwargs is completely arbitrary, and this example works exactly the same with the name becoming data:

```
def arbitrary_keyword_args(**data):
 # ...
```


#### Working with **kwargs

```
def print_data(**data):
 for arg in data.values():
   print(arg)

print_data(a='arg1', b=True, c=100)
```
Output:
```
arg1
True
100
```

```
def print_data(positional_arg, **data):
 print(positional_arg)
 for arg in data.values():
   print(arg)

print_data('position 1', a='arg1', b=True, c=100)
```
Output:
```
position 1
arg1
True
100
```


### All together now
So far we have seen how both *args and **kwargs can be combined with standard arguments. This is useful, but in some cases, we may want to use all three types together! Thankfully Python allows us to do so as long as we follow the correct order in our function definition. The order is as follows:
1. Standard positional arguments
2. *args
3. Standard keyword arguments
4. **kwargs

Example:
```
def print_animals(animal1, animal2, *args, animal4, **kwargs):
    print(animal1, animal2)
    print(args)
    print(animal4)
    print(kwargs)


print_animals('Snake', 'Fish', 'Guinea Pig', 'Owl', animal4='Cat', animal5='Dog')
```

Output:
```
Snake Fish
('Guinea Pig', 'Owl')
Cat
{'animal5': 'Dog'}
```



Let’s break it down:
- The first two parameters our function accepts are standard positional arguments. When we call the function, the first two values provided map to `animal1` and `animal2`. Thus, the first line of output is `Snake Fish`.
- The remaining non-keyword arguments that follow `'Snake'` and `'Fish'` in our function call are all packed into the `args` tuple. Thus, our result is `('Guinea Pig', 'Owl')`.
- Then we transition to regular keyword arguments. Because `animal4` comes after `*args`, it can only be passed by keyword; we passed `animal4='Cat'`, so the result of that print statement is `Cat`.
- Lastly, we have one more keyword argument, which is packed into `**kwargs`. Thus, our last line of output is `{'animal5': 'Dog'}`.
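A quick way to see why `animal4` must be passed by keyword is to call the function without one. The sketch below is a variation of `print_animals` (renamed to the hypothetical `collect_animals`) that returns its arguments instead of printing them, so the result is easy to inspect:

```python
def collect_animals(animal1, animal2, *args, animal4, **kwargs):
    # Same parameter order as print_animals, but returns the values
    return animal1, animal2, args, animal4, kwargs


result = collect_animals('Snake', 'Fish', 'Guinea Pig', 'Owl', animal4='Cat', animal5='Dog')
print(result)  # ('Snake', 'Fish', ('Guinea Pig', 'Owl'), 'Cat', {'animal5': 'Dog'})

try:
    # Without a keyword, 'Cat' is simply packed into args with the others
    collect_animals('Snake', 'Fish', 'Guinea Pig', 'Owl', 'Cat')
except TypeError as error:
    print(error)  # collect_animals() missing 1 required keyword-only argument: 'animal4'
```

Anything listed after `*args` in the definition behaves this way, which is why the order of parameter types matters.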

### Review
We covered a lot of ground in this lesson! We learned all about how functions can accept different arguments and different styles in which we can pass those arguments in.

We learned:
- How to pack positional arguments in a function with *args.
- How to work with *args using iteration and other positional arguments.
- How to pack keyword arguments in a function with **kwargs.
- How to work with **kwargs using iteration and other keyword arguments.
- How to combine all different types of arguments to gain the most flexibility in our function declarations.
- How to use an unpacking operator (* or **) to unpack arguments in a function call.
- How to use an unpacking operator (* or **) on iterables.

We should now be able to read many different styles of function writing in Python and come up with ways to call those functions with style and clarity.
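As a quick refresher on the unpacking operators listed in the review, here is a small sketch. The function name `collect_args` and the sample data are hypothetical:

```python
def collect_args(positional_arg, *args, **kwargs):
    # Return everything received so we can see where each value landed
    return positional_arg, args, kwargs


values = ['position 1', 'extra 1', 'extra 2']
options = {'a': 'arg1', 'b': True}

# * unpacks the list into positional arguments; ** unpacks the dict into keyword arguments
print(collect_args(*values, **options))
# ('position 1', ('extra 1', 'extra 2'), {'a': 'arg1', 'b': True})

# The same operators also unpack iterables outside of function calls
first, *rest = [1, 2, 3, 4]
print(first, rest)             # 1 [2, 3, 4]
print([*range(3), *'ab'])      # [0, 1, 2, 'a', 'b']
print({**options, 'c': 100})   # {'a': 'arg1', 'b': True, 'c': 100}
```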


## NAMESPACES AND SCOPE
There is often some confusion about what distinguishes scope from namespaces. While both concepts are interlinked and work together:
- Namespaces are the mechanism for storing name-object pairs.
- Scope is the set of rules that determines at which points in our code we can retrieve those name-object pairs.

Scope defines which namespaces our program will look into (to check names) and in what order.


### NAMESPACES
### Introduction to Names and Namespaces
A namespace is a collection of names and the objects that they reference. Python will host a dictionary where the keys are the names that have been defined and the mapped values are the objects that they reference.

Example: if we define the variable

`color = 'cyan'`

the namespace Python creates would be:

`{'color': 'cyan'}`

So, in this case, if we tried to print the variable `color`:

`print(color)`

Python would search the namespace defined above for a key named `color` and provide its value to our program. Thus we would see the output `cyan`.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20NAMESPACES%20AND%20SCOPE%201.png)

Python generates four distinct types of namespaces:
1. Built-In
2. Global
3. Local
4. Enclosing

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20NAMESPACES%20AND%20SCOPE%202.png)

#### 1. Built-in Namespace

When we run a Python application:
- We are provided a built-in namespace that is created when the interpreter starts.
- The built-in namespace exists until the interpreter terminates (usually when our program finishes running).
- Since Python provides this namespace, its objects are accessible without the need to import a separate module.

```
print(dir(__builtins__))
```

Some interesting facts about the objects hosted in the built-in namespace:
- It holds about 150 names (152 at the time of writing; the exact count depends on the Python version), including exceptions, functions, types, special attributes, and other built-in objects.
- It contains many of the built-in functions we use in our Python programs, such as `str()`, `zip()`, `slice()`, `sorted()`, and many more.
- It also hosts many of the exceptions that we may encounter, such as `ArithmeticError`, `IndexError`, `KeyError`, and many more.
- There are even constants like `True` and `False`!
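To explore the built-in namespace yourself, the sketch below uses the standard `builtins` module, which (when running a script directly) is the same object `__builtins__` refers to. The count it prints depends on your Python version, so no expected value is shown for it:

```python
import builtins  # the module behind the built-in namespace

builtin_names = dir(builtins)

# The exact number varies between Python versions
print(len(builtin_names))

# A few of the names mentioned above; each line should end with True
for name in ['str', 'zip', 'sorted', 'IndexError', 'KeyError', 'True']:
    print(name, name in builtin_names)

# Built-in names are usable without any import: zip here IS builtins.zip
print(zip is builtins.zip)  # True
```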

#### 2. Global Namespace
The global namespace exists one level below the built-in namespace.

It includes all non-nested names in the module (file) we are choosing to run the Python interpreter on.

The global namespace:
- is created when we run our main program
- exists until the interpreter terminates (usually when our program finishes running).

```
# Imaginary file: main.py
import random

first_name = "Jaya"
last_name = "Bodegard"

def print_variables():
    random_number = random.randint(0, 9)
    print(first_name)
    print(last_name)
    print(random_number)

print(globals())
```

- The global namespace contains all of the non-nested objects of our program. This includes the variables `first_name` and `last_name` as well as the function `print_variables`.
- The `random_number` variable is not included in the global namespace because it is nested inside our function.
- Any time we use the `import` statement to bring a new module into our program, Python does not add every name from that module to our global namespace. Instead, it creates a new namespace for the module.
  - This means there can be multiple global namespaces in a single program (one per module).
  - In the output of `globals()`, the module's namespace is hidden behind a single entry, as seen with the random module: `<module 'random' from '/usr/lib/python3.8/random.py'>`.
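Instead of reading through the whole `globals()` dictionary, we can check specific names. A sketch reusing the names from the example above (the module path printed for `random` will differ between machines):

```python
import random

first_name = "Jaya"
last_name = "Bodegard"

def print_variables():
    random_number = random.randint(0, 9)  # local to the function
    print(first_name, last_name, random_number)

namespace = globals()

print('first_name' in namespace)       # True: a non-nested name
print('print_variables' in namespace)  # True: functions are names too
print('random_number' in namespace)    # False: nested inside the function
print('randint' in namespace)          # False: it lives in random's own namespace
print(namespace['random'])             # <module 'random' from '...'> (path varies)
print('randint' in vars(random))       # True: vars() shows the module's namespace
```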

#### 3. Local Namespace


##### 3.1 Local Namespace
- When the interpreter executes a function, it generates a local namespace for that specific function.
- The local namespace only exists inside the function and remains in existence until the function terminates.

Notice the following:
- Calling `locals()` inside a function returns the local namespace generated when that function is executed.
  - If we call `locals()` outside of any function (at the top level of our program), it behaves the same as `globals()`.
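A minimal sketch of calling `locals()` inside a function. The function name `show_local_namespace` is hypothetical, and a fixed number stands in for `random.randint()` so the output is predictable:

```python
def show_local_namespace():
    first_name = "Jaya"
    random_number = 7  # fixed value instead of random.randint(0, 9)
    # locals() returns this call's local namespace as a dictionary
    return locals()

local_namespace = show_local_namespace()
print(local_namespace)  # {'first_name': 'Jaya', 'random_number': 7}

# The local names do not leak into the global namespace
print('first_name' in globals())  # False
```

Each call builds a fresh local namespace, which is discarded once the call returns.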

##### 3.2 Enclosing Namespace
The enclosing namespace is a special type of local namespace.

Enclosing namespaces are created specifically when we work with nested functions and, just like the local namespace, only exist until the function is done executing.



```
global_variable = 'global'

def outer_function():
  outer_value = "outer"

  def inner_function():
    inner_value = "inner"

  inner_function()

outer_function()
```
1. We define a function called outer_function() and nest another function inside it called inner_function(). To generate a namespace, functions must be executed, so we are calling both of them.

2. Here, `outer_function()` serves the role of an enclosing function while `inner_function()` is an enclosed function. By creating this structure, we generate an enclosing namespace: a namespace created by an enclosing function and any number of enclosed functions inside it.

#### Review
In this lesson, we’ve covered:
- Names as identifiers for objects in Python.
- What namespaces are.
- The built-in namespace and how to access it using `__builtins__`.
- The global namespace and how to access it using `globals()`.
- The local namespace and how to access it using `locals()`.
- The enclosing namespace - a special type of local namespace that occurs when working with nested functions.

Knowing these concepts allows for a stronger mastery of Python since names are the basis of how our programs store and retrieve information. Keep up the great work!


### SCOPE

#### Introduction
- Scope defines which namespaces our program will look into (to check names) and in what order.
- Multiple namespaces usually exist at once, but we can't access all of them from every part of our program!
- Understanding scope lets us recognize when and where certain objects may or may not be accessed.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20NAMESPACES%20AND%20SCOPE%203.png)


Similar to namespaces, there are four different levels of scope. These levels are:
1. Built-in Scope (We will skip)
2. Global Scope
3. Enclosing Scope
4. Local Scope

Each of these scopes has a different level of access to the namespaces our programs generate.



#### Local Scope
- The local scope is the deepest level of the four scopes.
- Name-object pairs in a local scope can't be accessed or modified by code running in outer scopes.
  - As a rule of thumb, any names created in a local namespace are usually also locally scoped.

- Calling a function generates a new local scope.
- Each subsequent call to the function generates another new local scope.

```
def favorite_color():
    color = 'Red'

print(color)  # NameError: name 'color' is not defined
```

The name `color` is scoped locally to the function `favorite_color()`.
Since the statement `print(color)` is called outside of the function, it has no access to the local scope (and thus the local namespace) inside `favorite_color()`, so Python raises a `NameError`.
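The sketch below contrasts the two sides: code inside the function can use its local name, while code in the global scope cannot. The `return` statement is added here (it is not in the original snippet) so the function's local value can be observed:

```python
def favorite_color():
    color = 'Red'  # color exists only while favorite_color() runs
    return color

print(favorite_color())  # Red: the function can read its own local name

try:
    print(color)  # the global scope cannot see favorite_color()'s local namespace
except NameError as error:
    print(error)  # name 'color' is not defined
```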




#### Enclosing/Nonlocal Scope
- Nested functions form a unique namespace within their enclosing functions (the enclosing namespace).
- Special rules apply for accessing values across these nested levels.
- These rules make up the enclosing scope.

```
def outer_function():
    enclosing_value = 'Enclosing Value'

    def nested_function():
        nested_value = 'Nested Value'
        print(enclosing_value)

    nested_function()


outer_function()
```
Output:
```
Enclosing Value
```
-------------------------------------------

```
def outer_function():
    enclosing_value = 'Enclosing Value'

    def nested_function():
        nested_value = 'Nested Value'

        def second_nested():
            print(enclosing_value)
            print(nested_value)

        second_nested()

    nested_function()


outer_function()
```
Output:
```
Enclosing Value
Nested Value
```
Enclosing scope allows any value defined in an enclosing function to be accessed in nested functions below it. We can observe this scope since nested_function() can access a variable defined one level above in the enclosing function (outer_function()).

There are two caveats to be aware of with enclosing scope:
1. Scope access only flows upwards.
   - The deepest level has access to every enclosing namespace above it, but not the other way around.
2. Immutable objects
   - Names bound to immutable objects, such as strings or numbers, can be read in nested functions but cannot be reassigned there (unless we use the `nonlocal` statement).
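Both caveats can be seen in one sketch. The names `count`, `history`, and `increment` are hypothetical additions; note that a mutable object such as a list can still be changed in place from a nested function, because that is not a reassignment of the name:

```python
def outer_function():
    enclosing_value = 'Enclosing Value'
    count = 0      # immutable int
    history = []   # mutable list

    def nested_function():
        nested_value = 'Nested Value'
        history.append(nested_value)  # mutating an enclosing list works
        return enclosing_value        # reading an enclosing name works

    def increment():
        count += 1  # rebinding makes count local here, so reading it first fails

    print(nested_function())  # Enclosing Value
    print(history)            # ['Nested Value']

    try:
        print(nested_value)   # caveat 1: access does not flow downward
    except NameError as error:
        print(error)          # name 'nested_value' is not defined

    try:
        increment()           # caveat 2: the enclosing int cannot be reassigned
    except UnboundLocalError as error:
        print(type(error).__name__)  # UnboundLocalError

    return count, history


print(outer_function())  # printed last: (0, ['Nested Value'])
```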








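##### Immutable objects and `nonlocal`
The `nonlocal` statement lets a nested function modify name-object pairs in the enclosing scope.

Without `nonlocal`, this prints `value`, because the assignment inside `nested_function()` creates a new local variable instead of changing the enclosing one: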
```
def enclosing_function():
  var = "value"
  def nested_function():
    var = "new_value"
  nested_function()
  print(var)
enclosing_function()
```

With `nonlocal`, this prints `new_value`:
```
def enclosing_function():
  var = "value"
  def nested_function():
    nonlocal var
    var = "new_value"
  nested_function()
  print(var)
enclosing_function()
```

After using the `nonlocal` statement, the enclosing variable can be modified from the nested function's local scope.
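A common use of `nonlocal` is a counter that remembers its state between calls. A minimal sketch with hypothetical names (`make_counter`, `increment`):

```python
def make_counter():
    count = 0  # lives in make_counter's (enclosing) namespace

    def increment():
        nonlocal count  # rebind the enclosing count instead of creating a local one
        count += 1
        return count

    return increment


counter = make_counter()
print(counter())  # 1
print(counter())  # 2
print(counter())  # 3

other_counter = make_counter()  # each call to make_counter gets its own count
print(other_counter())  # 1
```

Without the `nonlocal count` line, `count += 1` would raise an `UnboundLocalError`, because the assignment would make `count` local to `increment()`.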

#### Global Scope
- The global scope is the highest level of access.
- Name-object pairs defined in the global namespace are automatically globally scoped and can be accessed anywhere in our program.
- Immutable objects
  - Global names can be read inside functions, but assigning to them inside a function creates a new local variable instead (unless we use `global`).
- The `global` keyword is used within a local scope to associate a variable name with a name in the global namespace.
- This association is only valid within the local scope where `global` is used.

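```
global_var = 10

def some_function():
    global_var = 20  # creates a new local variable; the global one is untouched

some_function()
print(global_var)  # 10
```

```
global_var = 10

def some_function():
    global global_var  # refer to the global name instead of creating a local one
    global_var = 20

some_function()
print(global_var)  # 20
```

```
def some_function():
    global x  # x does not exist yet, so this creates it in the global namespace
    x = 30

some_function()
print(x)  # 30
```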


In addition, the global statement can be used even if the name has not been defined in the global namespace. Using the global statement would create the new variable in the global namespace.


#### Scope Resolution: The LEGB Rule
- LEGB stands for Local, Enclosing, Global, and Built-in.
- These letters represent the order in which Python checks namespaces to see if a name exists.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20NAMESPACES%20AND%20SCOPE%204.png)
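The LEGB order can be observed by defining the same name at several levels. A sketch with hypothetical function names; each function reports which level Python found first:

```python
name = 'global'

def outer():
    name = 'enclosing'

    def inner():
        name = 'local'
        return name  # L: a local name is found first

    def inner_without_local():
        return name  # E: no local name, so the enclosing one is used

    return inner(), inner_without_local()

def reads_global():
    return name  # G: no local or enclosing name, so the global one is used

def shadows_builtin():
    len = lambda value: 'shadowed'  # a local name hides the built-in len
    return len('abc')

print(outer())            # ('local', 'enclosing')
print(reads_global())     # global
print(len('abc'))         # 3 (B: len is found in the built-in namespace)
print(shadows_builtin())  # shadowed
```

The last call shows why reusing built-in names such as `len` or `list` for your own variables is risky: the local name wins.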

#### Review
In this lesson, we’ve covered:
- The concept of scope and the LEGB rule.
- What the local scope is.
- What a nested function is and the enclosing/nonlocal scope.
- What the global scope is.
- How to modify behaviour using the global statement.
- How to modify behaviour using the nonlocal statement.

1. Accessing a name outside of the proper scope will cause Python to raise a `NameError`, since it cannot find the name in any accessible namespace.
2. The function is a nested function; as a result, `replicated` is in its enclosing (nonlocal) scope.
3. Scope defines which namespaces a program will search within and in what order.
4. Only nested functions have access to an enclosing scope.

## Function Deep Dive



### Lambda Functions

In Python, a lambda function (also commonly called an anonymous function) is a one-line shorthand for a function. Let's start by examining how lambda functions compare to the normal functions we have already been writing.

Let's break this syntax down:
1. The function is stored in a variable called add_two.
2. The lambda keyword declares that this is a lambda function (similar to how we use def to declare a normal function).
3. my_input is a parameter used to hold the value passed to add_two.
4. In the lambda function version, we are returning my_input + 2 without the use of a return keyword (the normal Python function explicitly uses the keyword return).



Remember that to make a lambda function you can use the syntax:

```
lambda my_input: <returns my_input modified somehow>
```


```
# A simple lambda function might look like this:
add_two = lambda my_input: my_input + 2

# which behaves the same as this normal function:
def add_two(my_input):
    return my_input + 2

# So this code:
print(add_two(3))
print(add_two(100))
print(add_two(-2))
```
Output:
```
5
102
0
```


Syntax break down:
* The function is stored in a variable called add_two
* lambda declares that this is a lambda function (if you are familiar with normal Python functions, this is similar to how we use def to declare a function)
* my_input is what we call the input we are passing into add_two
* We are returning my_input plus 2 (with normal Python functions, we use the keyword return)
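Lambdas can also take more than one parameter, and they are often passed straight into another function instead of being stored in a variable. A short sketch; the names and data are hypothetical:

```python
# Two parameters, separated by commas, just like a def function
add = lambda a, b: a + b
print(add(2, 3))  # 5

# Passing a lambda straight to sorted() as the key function
words = ['banana', 'Apple', 'cherry']
print(sorted(words, key=lambda word: word.lower()))  # ['Apple', 'banana', 'cherry']

# Sorting (name, age) pairs by the second element
people = [('Ana', 31), ('Ben', 25), ('Cy', 40)]
print(sorted(people, key=lambda person: person[1]))  # [('Ben', 25), ('Ana', 31), ('Cy', 40)]
```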


```
# Q1
is_substring = lambda my_string: my_string in "This is the master string"

print(is_substring('I'))       # False
print(is_substring('am'))      # False
print(is_substring('the'))     # True
print(is_substring('master'))  # True
```
Output:
```
False
False
True
True
```


```
# Version 1: a conditional expression
contains_a = lambda word: True if 'a' in word else False
print(contains_a("banana"))  # True
print(contains_a("apple"))   # True
print(contains_a("cherry"))  # False

# Version 2 (simpler): the in operator already returns a bool
contains_a = lambda word: 'a' in word
print(contains_a("banana"))  # True
print(contains_a("apple"))   # True
print(contains_a("cherry"))  # False
```
Output:
```
True
True
False
True
True
False
```


```
# Q2
# Alternative: long_string = lambda in_put: True if 12 < len(in_put) else False
long_string = lambda in_put: 12 < len(in_put)

print(long_string("short"))           # False
print(long_string("photosynthesis"))  # True
```
Output:
```
False
True
```


```
# Q3
ends_in_a = lambda in_put: in_put.endswith('a')  # also safe for empty strings

print(ends_in_a("data"))      # True
print(ends_in_a("aardvark"))  # False
```
Output:
```
True
False
```


```
# Q4
double_or_zero = lambda num: 0 if num <= 10 else 2 * num

print(double_or_zero(15))
print(double_or_zero(5))
```
Output:
```
30
0
```


```
# Q5: Even/Odd
# * In Python, %, or the modulo operator, returns the remainder after division.
# * You can use % 2 to determine if a number is even or odd.
even_or_odd = lambda num: "even" if num % 2 == 0 else "odd"

print(even_or_odd(10))  # even
print(even_or_odd(5))   # odd
```
Output:
```
even
odd
```


```
# Q6
multiple_of_three = lambda num: "multiple of three" if num % 3 == 0 else "not a multiple"

print(multiple_of_three(9))   # multiple of three
print(multiple_of_three(10))  # not a multiple
```
Output:
```
multiple of three
not a multiple
```


```
# Q7
rate_movie = lambda rating: "I liked this movie" if rating > 8.5 else "This movie was not very good"

print(rate_movie(9.2))  # I liked this movie
print(rate_movie(7.2))  # This movie was not very good
```
Output:
```
I liked this movie
This movie was not very good
```


```
# Q8
# You can use the modulo operator (%) with 10 to find the ones' place of an integer.
ones_place = lambda num: num % 10

print(ones_place(123))  # 3
print(ones_place(4))    # 4
```
Output:
```
3
4
```


```
# Q9
double_square = lambda num: 2 * (num ** 2)

print(double_square(5))  # 50
print(double_square(3))  # 18
```
Output:
```
50
18
```


```
# Q10
# random.randint(a, b) returns an integer between a and b (inclusive).
# * random.randint(0, 100) could return any integer from 0 to 100, including both 0 and 100.
# * random.randint(5, 8) could return any integer from 5 to 8, including both 5 and 8.
import random

add_random = lambda num: num + random.randint(1, 10)

print(add_random(5))
print(add_random(100))
```
Sample output (your numbers will vary, since the amount added is random):
```
11
107
```


```
# Q11
# Create a lambda function mylambda that returns the first and last letters of a string,
# assuming the string is at least 2 characters long.
mylambda = lambda in_put: in_put[0] + in_put[-1] if len(in_put) > 1 else 'The input string must be at least 2 characters long.'

print(mylambda('This is a string'))  # Tg
print(mylambda('Th'))                # Th
print(mylambda('T'))                 # The input string must be at least 2 characters long.
```
Output:
```
Tg
Th
The input string must be at least 2 characters long.
```


```
# Q12
# lambda x: [OUTCOME IF TRUE] if [CONDITIONAL] else [OUTCOME IF FALSE]
# You are managing the webpage of a somewhat violent video game and want to check
# that each user's age is 13 or greater when they visit the site.
mylambda = lambda in_put: 'Welcome to BattleCity!' if in_put >= 13 else 'You must be 13 or older'

print(mylambda(13))  # Welcome to BattleCity!
print(mylambda(9))   # You must be 13 or older
print(mylambda(18))  # Welcome to BattleCity!
```
Output:
```
Welcome to BattleCity!
You must be 13 or older
Welcome to BattleCity!
```


### Built-In Higher-Order Functions
In this article, you'll learn about three useful higher-order functions that come with Python.

What You'll Be Learning
We've already learned what defines higher-order functions, how to use them, and why they are useful. Now we will get acquainted with three of Python's built-in higher-order functions: map(), filter(), and reduce() (which, in Python 3, lives in the functools module).


#### 1. map()
The map() function applies a passed function to each element in an iterable and returns a map object.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20%20Built-In%20Higher-Order%20Functions%201.png)






```
# Create a list of grades, including both floating-point numbers and integers
grade_list = [3.5, 3.7, 2.6, 95, 87]

# Use the map function to transform each element in 'grade_list'.
# The lambda function checks if the element is a floating-point number (type(x) == float).
# If it's a float, multiply it by 25 to convert it to a 100-point scale.
grades_100scale = map(lambda x: x * 25 if type(x) == float else x, grade_list)

# Convert the map object to a list, storing the upgraded grades in 'upgrade_grade_list'
upgrade_grade_list = list(grades_100scale)

# Print the upgraded grades in the 100-point scale
print(upgrade_grade_list)
```
Output:
```
[87.5, 92.5, 65.0, 95, 87]
```


Note:
* We can use map() to iterate through a dictionary, for example to compute the cost of every item sold, and potentially store the results in a tuple.
* When passing a dictionary as the iterable, map() iterates through the dictionary's keys.
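A sketch of the dictionary case, assuming hypothetical sales data where each item maps to a (price per unit, units sold) pair. Because iterating a dictionary yields its keys, the lambda receives item names and looks up the values itself; mapping over `.items()` passes (key, value) pairs instead:

```python
# Hypothetical data: item -> (price per unit, units sold)
sales = {"coffee": (3.0, 10), "bagel": (2.5, 4), "juice": (4.0, 3)}

# map() over the dict itself: the lambda gets each key (item name)
costs = tuple(map(lambda item: sales[item][0] * sales[item][1], sales))
print(costs)  # (30.0, 10.0, 12.0)

# map() over .items(): the lambda gets (key, value) pairs
labeled = tuple(map(lambda pair: (pair[0], pair[1][0] * pair[1][1]), sales.items()))
print(labeled)  # (('coffee', 30.0), ('bagel', 10.0), ('juice', 12.0))
```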


```
# Define a tuple of numbers from 1 to 10.
numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Use the 'map' function with a lambda function to square each number in the tuple.
mapped_numbers = map(lambda x: x * x, numbers)

# Convert the 'map' object to a tuple and print the result.
print(tuple(mapped_numbers))
```
Output:
```
(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
```


#### 2. filter()
The filter() function applies a filtering function (a function that returns a boolean) to each element in an iterable. filter() returns a filter object with only the elements for which the filtering function returned True.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20%20Built-In%20Higher-Order%20Functions%202.png)


```
# Define a list of books, each represented as a list with the book title and its release year.
books = [
    ["The Mystery of the Lost Key", 2021],
    ["A Tale of Two Cities", 1859],
    ["The Wizard of Oz", '190Q'],  # Note: '190Q' is a string (and not a valid year).
    ["The Great Gatsby", "Nineteen twenty-five"],  # Note: The release year is written in words.
    ["To Kill a Mockingbird", "Nineteen sixty"],  # Note: The release year is written in words.
    ["Harry Potter and the Philosopher's Stone", 1997],
    ["The Da Vinci Code", "Two thousand three"],  # Note: The release year is written in words.
    ["The Hunger Games", '2008'],  # Note: '2008' is a string, not an int.
    ["A Song of Ice and Fire: A Game of Thrones", 1996],
    ["The Hobbit", '1937'],  # Note: '1937' is a string, not an int.
]

# Use the filter function with a lambda function to select books whose release year is not an int.
string_title = filter(lambda book: type(book[-1]) != int, books)

# Convert the filter object to a list, storing books with non-int release years in 'string_title_list'.
string_title_list = list(string_title)

# Print the list of books with non-int release years.
print(string_title_list)
```
Output:
```
[['The Wizard of Oz', '190Q'], ['The Great Gatsby', 'Nineteen twenty-five'], ['To Kill a Mockingbird', 'Nineteen sixty'], ['The Da Vinci Code', 'Two thousand three'], ['The Hunger Games', '2008'], ['The Hobbit', '1937']]
```


```
# Define a tuple of numbers from 1 to 10.
nums = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# Use the 'filter' function with a lambda function to select even numbers from the tuple.
filtered_numbers = tuple(filter(lambda x: x % 2 == 0, nums))

# Print the resulting tuple containing the filtered even numbers.
print(filtered_numbers)
```
Output:
```
(2, 4, 6, 8, 10)
```


#### 3. reduce()
reduce() must be imported from the functools module. It reduces an iterable to a single value by cumulatively applying a passed function to the first pair of elements in the iterable and then each sequential element with the return value.


* In Python 3, the `reduce()` function has been moved to the `functools` library, so we need to import it before we can use it.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20%20Built-In%20Higher-Order%20Functions%203.png)



```
# Import the 'reduce' function from the 'functools' module.
from functools import reduce

# Define a list of letters that form a word.
letters = ['r', 'e', 'd', 'u', 'c', 'e']

# Use reduce() with a lambda function to concatenate the letters cumulatively:
# 'r' + 'e' -> 're', then 're' + 'd' -> 'red', and so on.
word = reduce(lambda x, y: x + y, letters)

# Print the resulting word formed by combining the letters.
print(word)
```
Output:
```
reduce
```


```
from functools import reduce

# Define a tuple of numbers.
nums = (2, 6, 7, 9, 1, 4, 8)

# Use the 'reduce' function with a lambda function to sum all the numbers in the tuple.
reduced_nums = reduce(lambda x, y: x + y, nums)

# Print the result, which is the sum of all the numbers in the 'nums' tuple.
print(reduced_nums)
```
Output:
```
37
```


### Decorators

https://www.codecademy.com/paths/data-engineer/tracks/decp-intermediate-python-for-data-engineers/modules/int-python-functions-deep-dive/videos/learn-python-decorators

Decorators in Python:
- Decorators are special functions that we can use to add functionality to existing functions.
- They can make our programs a lot easier to read and write.

- A key idea behind decorators is that in Python, all functions are objects.
- A function is very similar to a string, a number, or a list: we can store it in a variable, pass it to another function, or return it from a function.
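A small sketch of treating functions as objects, which is exactly what decorators rely on. The names `shout` and `apply_twice` are hypothetical:

```python
def shout(text):
    return text.upper() + '!'

# A function is an object: we can store it in a variable...
yell = shout
print(yell('hello'))  # HELLO!

# ...put it in a data structure...
transforms = [shout, str.lower, len]
print([f('Data') for f in transforms])  # ['DATA!', 'data', 4]

# ...and pass it to another function as an argument
def apply_twice(func, value):
    return func(func(value))

print(apply_twice(shout, 'hi'))  # HI!!
```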




```
# Returning a function from another function
def get_math_function(operator):
    def add(n1, n2):
        return n1 + n2

    def sub(n1, n2):
        return n1 - n2

    if operator == "+":
        return add
    elif operator == "-":
        return sub


add_function = get_math_function("+")
print(add_function)        # <function get_math_function.<locals>.add at 0x...>
print(add_function(5, 2))  # 7

sub_function = get_math_function("-")
print(sub_function)        # <function get_math_function.<locals>.sub at 0x...>
print(sub_function(5, 2))  # 3
```

```
# Decorating a function by hand
def title_decorator(print_name_function):
    def wrapper():
        print("OG")
        print_name_function()
    return wrapper


def print_name():
    print('lilly')


# decorator_function stores the wrapper function without executing it
decorator_function = title_decorator(print_name)
# Appending () executes decorator_function (that is, the wrapper function)
decorator_function()
```
Output:
```
OG
lilly
```

```
# Decorators with the @ syntax
def title_decorator(print_name_function):
    def wrapper():
        print("OG")
        print_name_function()
    return wrapper


@title_decorator  # same as: print_name = title_decorator(print_name)
def print_name(name='lilly'):
    print(name)


print_name()  # prints OG, then lilly (name defaults to 'lilly')
# print_name("lol") raises a TypeError, because wrapper()
# takes 0 positional arguments but 1 was given
```

```
# Decorating functions that take parameters
def title_decorator(print_name_function):
    def wrapper(*args, **kwargs):  # accept any inputs...
        print("OG")
        print_name_function(*args, **kwargs)  # ...and pipe them through
    return wrapper


@title_decorator
def print_name(name='lilly'):
    print(name)


print_name()
print_name("lol")
```
Output:
```
OG
lilly
OG
lol
```

## Object-Oriented Programming

https://youtu.be/u8gRq4OojXY

### Introduction

For a language to be classified as an OOP language, it must provide the ability to create programs around classes and objects.

The four core pillars of OOP are:
1. Inheritance
2. Polymorphism
3. Abstraction
4. Encapsulation


### 1. OOP Pillar: Inheritance
Here is what the base structure looks like:


```
class ParentClass:
    # class methods/properties...
    pass

class ChildClass(ParentClass):
    # class methods/properties...
    pass
```

- Reuse methods across multiple subclasses using our parent class
- We are also able to create parent-child relationships between entities!

Example:


```
class Animal:
    def eat(self):
        print("Nom Nom Nom...eating food!")


class Dog(Animal):
    def bark(self):
        print('Bark!')


class Cat(Animal):
    def meow(self):
        print('Meow!')


fluffy = Dog()
zoomie = Cat()
fluffy.eat()  # Nom Nom Nom...eating food!
zoomie.eat()  # Nom Nom Nom...eating food!
```

#### Overriding Methods
A child class may want to change the behaviour of a method from its parent class. It can do this with an overriding method.

An overriding method in a subclass has the same name (and parameters) as a method in the parent class but contains different behaviour.


```
class Animal:
    def __init__(self, name):
        self.name = name

    def make_noise(self):
        print("{} says, Grrrr".format(self.name))

pet1 = Animal("Rex")
pet1.make_noise()  # Rex says, Grrrr


class Cat(Animal):
    def make_noise(self):
        print("{} says, Meow!".format(self.name))

pet2 = Cat("Maisy")
pet2.make_noise()  # Maisy says, Meow!
```

#### super()
When overriding methods we sometimes want to still access the behaviour of the parent method.

- super() gives us a proxy object.
- With this proxy object, we can invoke the method of an object’s parent class (also called its superclass).
- super() is used in subclasses to invoke a needed behaviour from the superclass alongside the behaviour of a subclass method.


To call a superclass method from inside a subclass method, use the following syntax:

```
super().method()
```




```
# Example 1
class Animal:
    def __init__(self, name, sound="Grrrr"):
        self.name = name
        self.sound = sound

    def make_noise(self):
        print("{} says, {}".format(self.name, self.sound))
```

Using the superclass constructor, `super().__init__()`:

```
class Cat(Animal):
    def __init__(self, name):
        super().__init__(name, "Meow!")

pet_cat = Cat("Rachel")
pet_cat.make_noise()  # Rachel says, Meow!
```

- The .__init__() method from the subclass is overriding the one from the superclass.
- super().__init__(name, "Meow!") is called inside the subclass .__init__() method. This additional logic allows us to add the "Meow" sound from within the Cat class, but still use the .__init__() method of the Animal class.


Example 2

Here, the Admin class calls the Employee class .say_id() method from within its own .say_id() method.


```
class Employee():
    new_id = 1

    def __init__(self):
        self.id = Employee.new_id
        Employee.new_id += 1

    def say_id(self):
        print("My id is {}.".format(self.id))


class Admin(Employee):
    def say_id(self):
        super().say_id()
        print("I am an admin.")
```

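The `Admin.say_id()` method:

- overrides `Employee.say_id()`
- but still calls `Employee.say_id()` through `super().say_id()`, so its output includes the Employee line before the admin line

Because each new employee gets the next id, `e3` has id 3 once two employees have been created before it:

```
e1 = Employee()
e2 = Employee()
e3 = Admin()
e3.say_id()
```
Output:
```
My id is 3.
I am an admin.
```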



#### Multiple Inheritance
A related pattern is multilevel inheritance, where there are multiple levels of inheritance: a class inherits members from its superclass and its super-superclass.


```
class Animal:
    def __init__(self, name):
        self.name = name

    def say_hi(self):
        print("{} says, Hi!".format(self.name))


class Cat(Animal):
    pass


class Angry_Cat(Cat):
    pass


my_pet = Angry_Cat("Mr. Cranky")
my_pet.say_hi()  # Mr. Cranky says, Hi!
```

In the above example, `Angry_Cat` inherits from `Cat`, and `Cat` inherits from `Animal`. Both `Angry_Cat` and `Cat` have access to the `Animal` class `name` attribute and `.say_hi()` method. `Angry_Cat` will also have access to any feature added to `Cat`.





With multiple inheritance proper, a subclass inherits directly from two (or more) classes and can use the attributes and methods of both.

Example:

```
class Animal:
    def __init__(self, name):
        self.name = name


class Dog(Animal):
    def action(self):
        print("{} wags tail. Awwww".format(self.name))


class Wolf(Animal):
    def action(self):
        print("{} bites. OUCH!".format(self.name))


class Hybrid(Dog, Wolf):
    def action(self):
        super().action()
        Wolf.action(self)


my_pet = Hybrid("Fluffy")
my_pet.action()
```
Output:
```
Fluffy wags tail. Awwww
Fluffy bites. OUCH!
```

Take a closer look:

```
class Hybrid(Dog, Wolf):
    def action(self):
        super().action()
        Wolf.action(self)
```

- `super().action()` inside the Hybrid class invokes the `.action()` method of the Dog class.
  - This is due to `Dog` being listed before `Wolf` in the `Hybrid(Dog, Wolf)` definition.
- `Wolf.action(self)` calls the `Wolf` class `.action()` method.
  - Note that `self` is passed as an argument: `Wolf.action(self)`.
  - This ensures that the `.action()` method in Wolf receives the Hybrid class instance, so it outputs the correct `name`.
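To see why `super()` resolves to `Dog` here, a short sketch can inspect the method resolution order (MRO), which Python stores on every class as `__mro__`. The classes repeat the example above so the block runs on its own; as an illustrative change, `action()` returns strings instead of printing them so the result is easy to check.

```python
class Animal:
    def __init__(self, name):
        self.name = name

class Dog(Animal):
    def action(self):
        return "{} wags tail. Awwww".format(self.name)

class Wolf(Animal):
    def action(self):
        return "{} bites. OUCH!".format(self.name)

class Hybrid(Dog, Wolf):
    def action(self):
        # super() follows the MRO, so it finds Dog.action before Wolf.action
        return [super().action(), Wolf.action(self)]

# The MRO is the order in which Python searches classes for an attribute
mro_names = [cls.__name__ for cls in Hybrid.__mro__]
print(mro_names)  # ['Hybrid', 'Dog', 'Wolf', 'Animal', 'object']
print(Hybrid("Fluffy").action())
```

Listing `Wolf` first in `class Hybrid(Wolf, Dog)` would swap the two entries in the MRO, and `super().action()` would then call `Wolf.action()`.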

### 2. OOP Pillar: Polymorphism
In computer programming, polymorphism is the ability to apply an identical operation onto different types of objects.


```
class Animal:
    def __init__(self, name):
        self.name = name

    def make_noise(self):
        print("{} says, Grrrr".format(self.name))

class Cat(Animal):
    def make_noise(self):
        print("{} says, Meow!".format(self.name))

# And

class Robot:
    def make_noise(self):
        print("beep.boop...BEEEEP!!!")
```

- We have an Animal class, its subclass Cat, and a standalone class Robot.
- Each class has a method .make_noise() with different outputs. The identical method name with different behaviours is a form of polymorphism.


```
an_animal = Animal("Bear")
my_pet = Cat("Maisy")
my_vacuum = Robot()
objects = [an_animal, my_pet, my_vacuum]
for o in objects:
    o.make_noise()
# Output:
# Bear says, Grrrr
# Maisy says, Meow!
# beep.boop...BEEEEP!!!
```

### 3. OOP Pillar: Abstraction
- Abstraction helps with the design of code by defining necessary behaviour to be implemented within a class structure.
- By doing so, abstraction also helps avoid leaving out or overlapping class functionality as class hierarchies get larger.


```
from abc import ABC, abstractmethod

class Animal(ABC):
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def make_noise(self):
        pass
```

The `Animal` class now inherits from an imported class `ABC`, which stands for Abstract Base Class.

- Animal is now an abstract class that cannot be instantiated.
- The imported decorator `@abstractmethod` is applied to the empty method `.make_noise()`.

```
class Cat(Animal):
    def make_noise(self):
        print("{} says, Meow!".format(self.name))

class Dog(Animal):
    def make_noise(self):
        print("{} says, Woof!".format(self.name))

kitty = Cat("Maisy")
doggy = Dog("Amber")
kitty.make_noise()  # Maisy says, Meow!
doggy.make_noise()  # Amber says, Woof!
```

- Cat and Dog are classes that inherit from Animal.

1. The abstraction process defines what an Animal is …
  - but does not allow the creation of one.
2. The .__init__() method still requires a name, since we feel all animals deserve a name.
3. The .make_noise() method exists since all animals make some form of noise, but it is not implemented because each animal makes a different noise.
  - Each subclass of Animal is now required to define its own .make_noise() method; otherwise, creating an instance of that subclass raises a TypeError.
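The rule in point 3 can be checked with a small sketch. It repeats the abstract `Animal` class and adds a hypothetical `Fish` subclass that forgets to implement `.make_noise()`; the `try_to_create()` helper is also illustrative. Python raises the TypeError when such a class is instantiated, not when it is defined. The exact error text varies between Python versions, so the sketch only checks the exception type.

```python
from abc import ABC, abstractmethod

class Animal(ABC):
    def __init__(self, name):
        self.name = name

    @abstractmethod
    def make_noise(self):
        pass

class Cat(Animal):
    def make_noise(self):
        return "{} says, Meow!".format(self.name)

class Fish(Animal):  # hypothetical subclass that forgets make_noise()
    pass

def try_to_create(cls, name):
    """Return the new instance, or a TypeError description if instantiation fails."""
    try:
        return cls(name)
    except TypeError as error:
        return "TypeError: {}".format(error)

print(try_to_create(Animal, "Generic"))          # abstract class: TypeError
print(try_to_create(Fish, "Nemo"))               # missing make_noise(): TypeError
print(try_to_create(Cat, "Maisy").make_noise())  # Maisy says, Meow!
```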


### 4. OOP Pillar: Encapsulation
Encapsulation is the process of making methods and data hidden inside the object they relate to.

Many object-oriented languages control this with access modifiers: public, protected, and private.
- Public members can be accessed from anywhere.
- Protected members can only be accessed from code within the same module. In Python, a single leading underscore, `self._x`, indicates that a member is protected.
- Private members can only be accessed from code within the class where they are defined. In Python, two leading underscores, `self.__x`, indicate that a member is private.

Members preceded by two underscores have their names modified in the background to `obj._Classname__x`. This mechanism, called name mangling, prevents clashes with members of the same name that an inheriting class might define.
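Name mangling can be observed directly. The sketch below uses hypothetical `Account` and `SavingsAccount` classes: both assign `self.__pin`, but each assignment is stored under a different mangled name, so the subclass does not overwrite the parent's value. The single-underscore member stays fully accessible, because in Python it is only a convention.

```python
class Account:
    def __init__(self):
        self.balance = 100   # public
        self._owner = "Ada"  # protected by convention (single underscore)
        self.__pin = 1234    # private: stored as _Account__pin

class SavingsAccount(Account):
    def __init__(self):
        super().__init__()
        self.__pin = 9999    # stored as _SavingsAccount__pin, so no clash

acct = SavingsAccount()
print(acct._owner)                # Ada (the underscore does not block access)
print(acct._Account__pin)         # 1234
print(acct._SavingsAccount__pin)  # 9999
print(hasattr(acct, "__pin"))     # False: no attribute is literally named __pin
```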


### Getters, Setters and Deleters
Using getter, setter, and deleter functions is one way to implement encapsulation in Python, so that the state of class attributes is handled within the class. These functions help make sure that the data being handled is appropriate for the defined class functionality.


```
class Animal:
    def __init__(self, name):
        self._name = name
        self._age = None

    def get_age(self):
        return self._age

    def set_age(self, new_age):
        # Only accept integer ages
        if isinstance(new_age, int):
            self._age = new_age
        else:
            raise TypeError("age must be an int")

    def delete_age(self):
        print("_age Deleted")
        del self._age


a = Animal("Rufus")
print(a.get_age())  # None
a.set_age(10)
print(a.get_age())  # 10
try:
    a.set_age("Ten")  # Raises a TypeError
except TypeError as error:
    print("TypeError:", error)
a.delete_age()  # "_age Deleted"
try:
    print(a.get_age())  # Raises an AttributeError
except AttributeError as error:
    print("AttributeError:", error)
```

### Review

- Inheritance

Python allows classes to inherit on multiple levels, meaning a class can inherit from a base class as well as from a class that is itself derived from that base class. Python also supports multiple inheritance, where one class can inherit from any number of other classes. This allows us to describe complex relationships between objects with minimal repeated code.

- Polymorphism

Polymorphism is a concept that allows functions and objects to behave in different ways depending on context. There is the polymorphism of functions like len() or the addition operator +, which can act differently depending on the provided data.

- Abstraction

Python supports the concept of abstraction by allowing objects with methods that have the same name to be called in a general manner. Further, Python provides the Abstract Base Class (ABC) for us to create a more clearly defined interface.

- Encapsulation

Python’s approach to encapsulation is unique compared to most other object-oriented programming languages. In Python, all members of an object are publicly accessible but there are conventions to indicate to developers that a member is intended to be protected or private.

In this lesson, we learned more complicated relationships between classes. We learned:

- How to create a subclass of an existing class.
- How to redefine existing methods of a parent class in a subclass by overriding them.
- How to leverage a parent class’s methods in the body of a subclass method using the super() function.
- How to write programs that are flexible using interfaces and polymorphism.
- How to write data types that look and feel like native data types with dunder methods.

These are really complicated concepts! It’s a long journey to get to the state of comfortably being able to build class hierarchies that embody the concerns that your software will need to. Give yourself a pat on the back, you earned it!




## The @property Decorator

When using the decorator, remember three rules:
1. All three methods must use the same member name (ex. weight).
2. The first method must be the getter and is identified using @property.
3. The decorators for the setter and deleter are named after the method decorated with @property (ex. @weight.setter and @weight.deleter).

Keep the @property decorator in mind when approaching any object-oriented program! It will save time and keep code cleaner and more maintainable.

```
class Box:
    def __init__(self, weight):
        self.__weight = weight

    def getWeight(self):
        return self.__weight

    def setWeight(self, weight):
        # Ignore negative weights
        if weight >= 0:
            self.__weight = weight


box = Box(10)
box.setWeight(-5)
print(box.getWeight())  # output: 10
box.setWeight(5)
print(box.getWeight())  # output: 5
```

Let’s break down how the same class changes when it uses the @property decorator:
- First, we rename all of our methods to simply be weight().
- Then we mark our getter with @property. The name of this method (weight) becomes the prefix for decorating the setter and deleter methods.
- Lastly, we use @weight.setter and @weight.deleter to define our setter and deleter methods, respectively.




```
class Box:
    def __init__(self, weight):
        self.__weight = weight

    @property  # getter
    def weight(self):
        """Docstring for the 'weight' property"""
        return self.__weight

    @weight.setter  # setter
    def weight(self, weight):
        if weight >= 0:
            self.__weight = weight

    @weight.deleter  # deleter
    def weight(self):
        del self.__weight


box = Box(10)
box.weight = 5  # calls the setter
del box.weight  # calls the deleter
```
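A short usage sketch shows the property in action: reading `box.weight` goes through the getter, assigning to it goes through the setter (which, as in the example, silently ignores negative values), and `del` calls the deleter. The class is repeated so the block runs on its own.

```python
class Box:
    def __init__(self, weight):
        self.__weight = weight

    @property
    def weight(self):
        """Docstring for the 'weight' property"""
        return self.__weight

    @weight.setter
    def weight(self, weight):
        # Negative weights are ignored, as in the example above
        if weight >= 0:
            self.__weight = weight

    @weight.deleter
    def weight(self):
        del self.__weight

box = Box(10)
box.weight = -5     # rejected by the setter
print(box.weight)   # 10
box.weight = 5      # accepted by the setter
print(box.weight)   # 5
del box.weight      # the deleter removes the underlying attribute
try:
    box.weight
except AttributeError:
    print("weight has been deleted")
```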

## UNIT TESTING


### Exceptions



#### Introduction
It is important to know how to control errors and use them effectively to our advantage.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%201.png)


- Syntax errors are mistakes in the structure of Python code. They are caught during a special parsing stage before a program is executed. They always prevent the entire program from running.
- Exceptions are runtime errors because they occur during program execution, only when the offending code (the code causing the error) is reached.
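The difference can be sketched with the built-in `compile()`, which parses source code without running it. A missing parenthesis is rejected at parse time, while `1/0` parses fine and only fails when the line is actually executed. The `parses()` and `divide_by_zero()` helpers are illustrative.

```python
def parses(source):
    """Return True if source is valid Python syntax (nothing is executed)."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

def divide_by_zero():
    return 1 / 0  # defining this function raises nothing

print(parses("print(1/0"))   # False: the missing parenthesis is a syntax error
print(parses("print(1/0)"))  # True: valid syntax, even though it fails at runtime

try:
    divide_by_zero()  # the exception only occurs when this line runs
except ZeroDivisionError as error:
    print("Runtime exception:", error)  # Runtime exception: division by zero
```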


A traceback is a summary that includes the exception type, a message, and the series of function calls preceding the exception, along with file names and line numbers.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%202.png)

Output:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%203.png)

In the traceback above, reading from the bottom line, we see the…
- exception originated on line 1 of a file called script.py while calling print(1/0)
- exception type (ZeroDivisionError)
- message (division by zero)
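The same pieces of a traceback can be read from code. In this sketch, `type(error).__name__` gives the exception type, `str(error)` gives the message, and `traceback.format_exc()` from the standard `traceback` module returns the full traceback text; its last line combines the type and the message.

```python
import traceback

try:
    print(1 / 0)
except ZeroDivisionError as error:
    exception_type = type(error).__name__  # 'ZeroDivisionError'
    message = str(error)                   # 'division by zero'
    # format_exc() returns the traceback text Python would print
    traceback_text = traceback.format_exc()

print(exception_type)
print(message)
print(traceback_text.splitlines()[-1])  # ZeroDivisionError: division by zero
```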


#### Built-in Exceptions
The full hierarchy of built-in exceptions is the following:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%204.png)
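Each exception's place in this hierarchy can be inspected in code through the class's `__mro__` attribute or with `issubclass()`. The `ancestry()` helper below is illustrative.

```python
def ancestry(exception_class):
    """List class names from exception_class up to object."""
    return [cls.__name__ for cls in exception_class.__mro__]

print(ancestry(ZeroDivisionError))
# ['ZeroDivisionError', 'ArithmeticError', 'Exception', 'BaseException', 'object']
print(issubclass(KeyError, LookupError))  # True
```

This matters when handling exceptions, because an except clause for a parent class such as `ArithmeticError` also catches its subclasses.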


#### Raising Exceptions

We can raise an exception any time we think a mistake has occurred or will occur in our program. Raising an exception stops program execution immediately and provides a useful error message, instead of allowing mistakes to occur that may be difficult to diagnose later.

One way to use the raise keyword is by pairing it with a specific exception class name. We can either call the class by itself or call a constructor and provide a specific error message.
Syntax:


```
# Note: each raise stops execution immediately,
# so only the first raise in this cell actually runs.

# Syntax:
raise NameError
# or
raise NameError('Custom Message')

# Examples:
raise TypeError  # or: raise TypeError('Message')
# or
raise Exception('Employee does not have access!')
```


#### Try / Except clauses
- try/except clauses allow programs to continue executing even after encountering an exception.
- This process is known as exception handling.

The following flow chart shows the mechanics of try/except:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%205.png)

- The try block lets you test a block of code for errors.
- The except block lets you handle the error.
- The else block lets you execute code when there is no error.
- The finally block lets you execute code, regardless of the result of the try- and except blocks.




Let’s break it down:
- Python will first attempt to execute code inside the try clause code block.

- If no exception is encountered in the code, the except clause is skipped and the program continues normally.

- If an exception does occur inside of the try code block, Python will immediately stop executing the code and begin executing the code inside the except code block (sometimes called a handler).



```
try:
    print(x)
except NameError:
    print("Variable x is not defined")
except Exception:
    print("Something else went wrong")
```

- The first except block is only executed if a NameError is encountered in the try block, rather than any exception.
- If any other exception occurs, it will be handled by the second except block.
  - When an exception is encountered, Python executes the first except clause that matches its type. Using a generic Exception in the last except clause, as here, is a valid strategy: it acts as a backup for any exception that no more specific clause catches.


##### The else Clause
We have seen what happens when we encounter exceptions during a try clause, but what if we want to run some code only if we do not encounter an exception? For that, we use the else clause.

The flow chart below shows how the else clause fits in:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%206.png)

```
# Example (check_password() and login_user() are placeholder functions)
try:
    check_password()
except ValueError:
    print('Wrong Password! Try again!')
else:
    login_user()
    # 20 other lines of imaginary code
```

Python will only execute the else clause if no exception was encountered in the try clause.

##### The finally Clause
Code in the finally clause always runs, whether or not an exception occurred. Note that the finally clause can also be used independently (without an except or else clause). This is a convenient way to guarantee that a behaviour will occur, regardless of whether an exception occurs:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%207.png)


```
# Example (check_password(), login_user() and load_footer() are placeholder functions)
try:
    check_password()
except ValueError:
    print('Wrong Password! Try again!')
else:
    login_user()
    # 20 other lines of imaginary code
finally:
    load_footer()


# Also valid: both the try and finally blocks will run
try:
    check_password()
finally:
    load_footer()
```
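A runnable sketch of both forms, again assuming a hypothetical `check_password(password)` that raises ValueError for a wrong password. In the second form there is no except clause, so the ValueError still propagates to the caller, but the finally block runs first.

```python
def check_password(password):
    # Hypothetical check: raise ValueError for a wrong password
    if password != "open sesame":
        raise ValueError("wrong password")

def load_page(password):
    events = []
    try:
        check_password(password)
    except ValueError:
        events.append("wrong password")
    else:
        events.append("logged in")
    finally:
        events.append("footer loaded")  # always runs
    return events

print(load_page("guess"))        # ['wrong password', 'footer loaded']
print(load_page("open sesame"))  # ['logged in', 'footer loaded']

footer_log = []

def load_page_unhandled(password):
    try:
        check_password(password)
    finally:
        footer_log.append("footer loaded")  # runs before the error propagates

try:
    load_page_unhandled("guess")
except ValueError:
    pass
print(footer_log)  # ['footer loaded']
```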


#### User-defined Exceptions
The core syntax is a class that inherits from the built-in Exception class:


```
class CustomError(Exception):
    pass


# Using a custom class as an error type
class LocationTooFarError(Exception):
    pass


def schedule_delivery(distance_from_store):
    if distance_from_store > 10:
        raise LocationTooFarError
    else:
        print('Scheduling the delivery...')
```
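A custom exception is caught exactly like a built-in one. In this sketch, `schedule_delivery()` returns its message instead of printing it (an illustrative change so the result is easy to check), and a hypothetical `try_schedule()` wrapper handles the error.

```python
class LocationTooFarError(Exception):
    pass

def schedule_delivery(distance_from_store):
    if distance_from_store > 10:
        raise LocationTooFarError
    return 'Scheduling the delivery...'

def try_schedule(distance):
    try:
        return schedule_delivery(distance)
    except LocationTooFarError:
        return 'Sorry, that location is too far away.'

print(try_schedule(5))   # Scheduling the delivery...
print(try_schedule(15))  # Sorry, that location is too far away.
```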

#### Customising User-defined Exceptions

```
class LocationTooFarError(Exception):
    def __init__(self, distance):
        self.distance = distance

    def __str__(self):
        return 'Location is not within 10 km: ' + str(self.distance)
```


Let’s break this down:

- We have a class named LocationTooFarError that still inherits from the built-in Exception class.
- We have added a constructor that is going to take in a distance argument when we instantiate our exception class.
  - we have overridden the constructor of the Exception class to accept our own custom argument of distance.
  - The reason for taking in a distance is to use it in our __str__ method that will return a custom error message when the exception is hit!

- The __str__ method provides our exception a custom message by returning a string with the distance property from the constructor.
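Raising and catching the customised exception shows where the `__str__` message appears. Converting the caught exception to a string with `str()` calls the overridden `__str__` method. The distance value 15 is illustrative.

```python
class LocationTooFarError(Exception):
    def __init__(self, distance):
        self.distance = distance

    def __str__(self):
        return 'Location is not within 10 km: ' + str(self.distance)

try:
    raise LocationTooFarError(15)
except LocationTooFarError as error:
    message = str(error)  # calls our __str__ method
    print(message)        # Location is not within 10 km: 15
```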


#### Review
We learned:

- How exceptions differ from syntax errors
- How to read tracebacks
- How try/except/else/finally provides us with a powerful control flow for handling exceptions
- How to create and raise custom exceptions to provide more helpful errors to users of our code
- These tools will get you very far as a Python developer!


### Unit testing



#### Introduction to Testing

The world of testing can generally be divided into two categories:
1. Manual Testing:
  - With manual testing, a physical person interacts with software much as a user would. In fact, we have been manually testing our code any time we run it and observe the results!

2. Automated Testing:
  - With automated testing, tests are performed with code. Generally, automated testing is faster and less prone to human error.


#### The assert Statement
A simple tool for automated testing is the assert statement. An assert statement can be used to test that a condition is met. If the condition evaluates to False, an AssertionError is raised with an optional error message.

The general syntax looks like this:


```
assert <condition>, 'Message if condition is not met'
```



```
# Example:
def times_ten(number):
    return number * 100  # bug on purpose: multiplies by 100, not 10

result = times_ten(20)
assert result == 200, 'Expected times_ten(20) to return 200, instead got ' + str(result)
```




Here we test whether our times_ten() function works as intended. We use the assert statement to evaluate the expression result == 200, since we expect the function to return 200 given an input of 20. There is a bug in times_ten() (it actually multiplies by 100!), so this expression evaluates to False and we get the following exception:

```
AssertionError: Expected times_ten(20) to return 200, instead got 2000
```


#### Unit Testing
- Unit testing means testing the smallest units of a program, such as individual functions.
- Tests that check each function individually are called unit tests.
- A unit test validates a single behaviour; together, unit tests make sure all of the units of a program are functioning properly.

Let’s examine a test case for our times_ten() function from the previous exercise:


```
# The unit we want to test (still multiplies by 100 by mistake)
def times_ten(number):
    return number * 100

# A unit test function with a single test case
def test_multiply_ten_by_zero():
    assert times_ten(0) == 0, 'Expected times_ten(0) to return 0'
```

We can improve our testing coverage of this function by adding some more test cases with different inputs.
- Create test cases for specific edge case inputs as well as reasonable ones.


```
def test_multiply_ten_by_one_million():
    assert times_ten(1000000) == 10000000, 'Expected times_ten(1000000) to return 10000000'

def test_multiply_ten_by_negative_number():
    assert times_ten(-10) == -100, 'Expected times_ten(-10) to return -100'
```

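#### Python's unittest Framework
The unittest framework groups test cases as methods of a class that inherits from `unittest.TestCase`, and provides assert methods such as `assertEqual()`.

```
# Importing unittest framework
import unittest

# Function that gets tested (the bug is still there: it multiplies by 100)
def times_ten(number):
    return number * 100

# Test class
class TestTimesTen(unittest.TestCase):
    def test_multiply_ten_by_zero(self):
        self.assertEqual(times_ten(0), 0, 'Expected times_ten(0) to return 0')

    def test_multiply_ten_by_one_million(self):
        self.assertEqual(times_ten(1000000), 10000000, 'Expected times_ten(1000000) to return 10000000')

    def test_multiply_ten_by_negative_number(self):
        self.assertEqual(times_ten(-10), -100, 'Expected times_ten(-10) to return -100')

# Run the tests. In a script, unittest.main() alone is enough;
# argv=[''] and exit=False let it also run inside a notebook.
unittest.main(argv=[''], exit=False)
```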

#### Assert Methods
Python documentation:
- https://docs.python.org/3/library/unittest.html#unittest.TestCase.debug
- https://docs.python.org/3/library/unittest.html#unittest.TestCase.output
- https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertNotIsInstance

##### Equality and Membership

- assertEqual: The assertEqual() method takes two values as arguments and checks that they are equal. If they are not, the test fails.

```
self.assertEqual(value1, value2)
```


- assertIn: The assertIn() method takes two arguments. It checks that the first argument is found in the second argument, which should be a container. If it is not found in the container, the test fails.

```
self.assertIn(value, container)
```

- assertTrue: The assertTrue() method takes a single argument and checks that the argument evaluates to True. If it does not evaluate to True, the test fails.

```
self.assertTrue(value)
```

Comparison:



```
Method                      | Equivalent
----------------------------+-----------------------
self.assertEqual(2, 5)      | assert 2 == 5
----------------------------+-----------------------
self.assertIn(5, [1, 2, 3]) | assert 5 in [1, 2, 3]
----------------------------+-----------------------
self.assertTrue(0)          | assert bool(0) is True

```






##### Quantitative Methods
Often we need to test conditions related to numbers. Here are assert methods related to quantitative comparisons:

- assertLess: The assertLess() method takes two arguments and checks that the first argument is less than the second one. If it is not, the test will fail.

```
self.assertLess(value1, value2)
```

- assertAlmostEqual: The assertAlmostEqual() method takes two arguments and checks that their difference, when rounded to 7 decimal places, is 0. In other words, if they are almost equal. If the values are not close enough to equality, the test will fail.

```
self.assertAlmostEqual(value1, value2)
```



```
Method                            | Equivalent
----------------------------------+-----------------------------------
self.assertLess(2, 5)             | assert 2 < 5
----------------------------------+-----------------------------------
self.assertAlmostEqual(.22, .225) | assert round(.22 - .225, 7) == 0
```






##### Exception and Warning Methods
Here are assert methods related to exceptions and warnings:

- assertRaises: The assertRaises() method takes an exception type as its first argument, a function reference as its second, and an arbitrary number of arguments as the rest.

```
self.assertRaises(specificException, function, functionArguments....)
```


It calls the function with the given arguments and checks whether an exception is raised as a result. The test passes if the specified exception is raised, is an error if a different exception is raised, and fails if no exception is raised. This method can be used with custom exceptions as well!

- assertWarns: The assertWarns() method takes a warning type as its first argument, a function reference as its second, and an arbitrary number of arguments for the rest.

```
self.assertWarns(specificWarningException, function, functionArguments...)
```

It calls the function and checks that the warning occurs. The test passes if a warning is triggered and fails if it isn’t.
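A sketch of both methods, using two hypothetical functions: `get_item()`, which raises IndexError for an out-of-range index, and `old_api()`, which emits a DeprecationWarning through the standard `warnings` module. The third test shows that assertRaises() can also be used as a context manager, which is often easier to read.

```python
import unittest
import warnings

def get_item(inventory, index):
    # Hypothetical function: an out-of-range index raises IndexError
    return inventory[index]

def old_api():
    # Hypothetical deprecated function that emits a warning
    warnings.warn("old_api() is deprecated", DeprecationWarning)
    return "ok"

class ExceptionWarningExamples(unittest.TestCase):
    def test_raises(self):
        # Exception type, function reference, then the function's arguments
        self.assertRaises(IndexError, get_item, ["a", "b"], 5)

    def test_warns(self):
        self.assertWarns(DeprecationWarning, old_api)

    def test_raises_context_manager(self):
        with self.assertRaises(KeyError):
            {}["missing"]

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExceptionWarningExamples)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```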


#### Parameterizing Tests
- By parameterizing tests, we can use a single test to cover a large number of different inputs.
- This is useful for tests that differ only slightly from one another.
  - For this, the unittest framework provides the subTest context manager.

https://docs.python.org/3/library/unittest.html#distinguishing-test-iterations-using-subtests

```
# Template:
# def test_function(self):
#     ... more code above ...
#     for num in [0, 1000000, -10]:
#         with self.subTest(num=num):
#             ... assert statement containing num ...
#     ... more code below ...

# Example:
import unittest

# The function we want to test (still multiplies by 100 by mistake)
def times_ten(number):
    return number * 100

# Our test class
class TestTimesTen(unittest.TestCase):
    # A test method
    def test_times_ten(self):
        for num in [0, 1000000, -10]:
            # Each subtest is reported separately, labelled with its num value
            with self.subTest(num=num):
                expected_result = num * 10
                errorMessage = 'Expected times_ten(' + str(num) + ') to return ' + str(expected_result)
                self.assertEqual(times_ten(num), expected_result, errorMessage)
```

#### Test Fixtures: setUp and tearDown
- One of the most important principles of testing is that tests need to occur in a known state.
- If test conditions are not controlled, then our results could be
  - false negatives (invalid failed results)
  - false positives (invalid passed results).

A test fixture is a mechanism for ensuring proper:
- test setup (putting tests into a known state)
- test teardown (restoring the state prior to the test running)

Test fixtures guarantee that our tests are running in predictable conditions, and thus the results are reliable.


```
import unittest

def power_cycle_device():
    print('Power cycling bluetooth device...')

class BluetoothDeviceTests(unittest.TestCase):
    def setUp(self):
        power_cycle_device()

    def test_feature_a(self):
        print('Testing Feature A')

    def test_feature_b(self):
        print('Testing Feature B')

    def tearDown(self):
        power_cycle_device()
```

The unittest framework automatically identifies setup and teardown methods based on their names.
- A method named setUp runs before each test case in the class.
- A method named tearDown gets called after each test case.

Now, we can guarantee that our Bluetooth module is in a working state before and after every test. Here is the output when these tests are run:


```
Power cycling bluetooth device...
Testing Feature A
Power cycling bluetooth device...
.Power cycling bluetooth device...
Testing Feature B
Power cycling bluetooth device...
.
---------------------------------------
Ran 2 tests in 0.000s
OK

```

It’s generally good practice to create fixtures that run for every test.

However, when a fixture has a large cost (i.e. it takes a long time), then it might make more sense to have it run once per test class rather than once per test.

Let’s practise setting up test fixtures!




```
import unittest

def power_cycle_device():
    print('Power cycling bluetooth device...')

class BluetoothDeviceTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        power_cycle_device()

    def test_feature_a(self):
        print('Testing Feature A')

    def test_feature_b(self):
        print('Testing Feature B')

    @classmethod
    def tearDownClass(cls):
        power_cycle_device()
```

We replaced:
- the setUp method with the setUpClass class method
- the tearDown method with the tearDownClass class method

We also added the @classmethod decorator to both and changed the argument from self to cls, because these are class methods.

Now, we get the following output:


```
Power cycling bluetooth device...
Testing Feature A
Testing Feature B
Power cycling bluetooth device...
-----------------------------------------------
Ran 2 tests in 0.000s
OK

```
In addition to calling functions, we can also use setup methods to instantiate objects and/or gather any other data needed. Anything a setup method stores on the test class (cls) or test instance (self) will be available in our test methods.
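A sketch combining both kinds of fixture, with hypothetical data: setUpClass() stores a shared, read-only catalogue on the class once, while setUp() gives every test a fresh, empty cart. Because the cart is rebuilt before each test, the second test still sees an empty cart even though the first test added to it.

```python
import unittest

class InventoryTests(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the class: shared, read-only data (hypothetical)
        cls.catalog = {"sku-1": 4, "sku-2": 10}

    def setUp(self):
        # Runs before every test: each test gets its own empty cart
        self.cart = []

    def test_add_to_cart(self):
        self.cart.append("sku-1")
        self.assertEqual(self.cart, ["sku-1"])

    def test_cart_starts_empty(self):
        self.assertEqual(self.cart, [])
        self.assertIn("sku-2", self.catalog)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(InventoryTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```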





#### Skipping tests
We might have a group of tests that only runs on the Windows operating system but not Linux or macOS. It's helpful to be able to skip tests.

The unittest framework provides two different ways to skip tests:

1. The @unittest skip decorators (such as skipIf and skipUnless)
  - Skip decorators are slightly more convenient and make it easy to see under what conditions the test is skipped.
2. The skipTest() method
  - When the conditions for skipping a test are too complicated to pass into a skip decorator, the skipTest method is the recommended alternative.




1. The @unittest skip decorators (such as skipIf and skipUnless)

```
import sys
import unittest

class LinuxTests(unittest.TestCase):
    @unittest.skipUnless(sys.platform.startswith("linux"), "This test only runs on Linux")
    def test_linux_feature(self):
        print("This test should only run on Linux")

    @unittest.skipIf(not sys.platform.startswith("linux"), "This test only runs on Linux")
    def test_other_linux_feature(self):
        print("This test should only run on Linux")
```

In this example, both decorators achieve the same goal: skipping the test if the operating system is not Linux. Note the `not` in the skipIf() condition.

Let’s break down both skip decorator options:
- The skipUnless option skips the test if the condition evaluates to False: `@unittest.skipUnless(condition, "string message")`
- The skipIf option skips the test if the condition evaluates to True: `@unittest.skipIf(condition, "string message")`

2. The skipTest() method

```
import sys
import unittest

class LinuxTests(unittest.TestCase):
    def test_linux_feature(self):
        if not sys.platform.startswith("linux"):
            self.skipTest("Test only runs on Linux")
```

The skipTest() method takes a single string message as its argument and causes the test to be skipped whenever it is called.
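Skipped tests are recorded separately from passes and failures. This sketch skips one test with a decorator and one with skipTest(), using a hypothetical platform name that `sys.platform` never matches, then inspects `result.skipped`. A run where tests are only skipped still counts as successful.

```python
import sys
import unittest

class PlatformTests(unittest.TestCase):
    @unittest.skipIf(True, "always skipped to show the decorator")
    def test_decorator_skip(self):
        self.fail("never runs")

    def test_method_skip(self):
        if not sys.platform.startswith("hypothetical-os"):
            self.skipTest("only runs on a hypothetical platform")
        self.fail("never reached on real platforms")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlatformTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.skipped), result.wasSuccessful())  # 2 True
```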

#### Expected Failures
Sometimes a feature has a known bug or is designed to fail on purpose. We wouldn’t want an expected failure to cloud our test results, so we can either:
- simply skip the test, or
- mark it as an expected failure. Expected failures are counted as passed in our test results. If the test passes when we expected it to fail, then it is marked as failed in the test results.

To set up a test to have an expected failure, we can use the expectedFailure decorator. Let’s consider the following example:




```
import unittest

class FeatureTests(unittest.TestCase):
    @unittest.expectedFailure
    def test_broken_feature(self):
        raise Exception("This test is going to fail")
```


The `expectedFailure` decorator takes **`no arguments`**. The test in the example will always fail because an exception was raised during test execution. When run, we get the following output:


```
x
--------------------------------
Ran 1 test in 0.000s
OK (expected failures=1)
```





#### Review
We learned:
- The difference between manual and automated testing.
- What unit tests are.
- How to write simple tests with the assert keyword.
- How to create and run test cases with the unittest framework.
- Best practices for test fixtures, test parameterization, skipped tests and expected failures.

The world of software testing is vast and can take time to master, but the basic principles of unit testing will almost always be applicable to any language we work with.

Incorporating testing into our software is the best way to prevent unexpected bugs from occurring. The sooner we write tests, the faster we can catch and fix bugs and make our software better!


## Iterators and generators



### Iterables & Iterators



#### Introduction to Iterables
A for loop relies on three core components. In the next sections we will explore:

- The iter() function, which creates an iterator object out of an iterable (such as our dictionary).
- The next() function, which retrieves each individual value during the iteration process.
- The StopIteration exception, which forces our loop to stop when there are no elements remaining.


```
# Example
dog_foods = {
    "Great Dane Foods": 4,
    "Min Pin Pun Foods": 10,
    "Pawsome Puns Foods": 8
}

for food_brand in dog_foods:
    print(food_brand + " has " + str(dog_foods[food_brand]) + " bags")
```


![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%208.png)

#### Iterator Objects: __iter__() and iter()
- The first thing the for loop has to do is convert our dictionary of dog_foods (the iterable) into an iterator object.
- An iterator object represents a stream of data that we can operate on. To create one, the for loop uses a built-in function called iter().

Iterable → iterator object using:
- iter(iterable)
Or
- iterable.__iter__()



```
print(iter(dog_foods))
# Output: <dict_keyiterator object at 0x...>
# Note: The memory address is omitted since it varies on the system you run the script on.
```
![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%209.png)





#### Iterator Objects:__next__() and next()

The iterator object has a method called __next__(), which retrieves the iterator’s next value. Let’s take a look using our SKU iterable for our shop:


```
sku_list = [7046538, 8289407, 9056375, 2308597]
sku_iterator = iter(sku_list)
# The list has 4 items, so the 5th call to __next__() raises StopIteration
for i in range(5):
    next_sku = sku_iterator.__next__()
    print(next_sku)
```

Output:

```
7046538
8289407
9056375
2308597
Traceback (most recent call last):
  File "script.py", line 12, in <module>
    next_sku = sku_iterator.__next__()
StopIteration
```

But how does the iterator object know when to stop retrieving values? Does it keep calling __next__() forever? Luckily, the __next__() method raises an exception called StopIteration when all items have been iterated through.

Similarly to __iter__() and iter(), there is a Python built-in function called next() that we can use in place of calling the __next__() method. Calling next() simply calls the iterator object’s __next__() method, so the loop above could equally use `next_sku = next(sku_iterator)`.




```
dog_foods = {
    "Great Dane Foods": 4,
    "Min Pin Pun Foods": 10,
    "Pawsome Puns Foods": 8
}

for food_brand in dog_foods:
    print(food_brand + " has " + str(dog_foods[food_brand]) + " bags")
```

To summarise, the three main steps are:

1. The for loop will first retrieve an iterator object for the dog_foods dictionary using iter().
2. Then, next() is called on each iteration of the for loop to retrieve the next value. This value is set to the for loop’s variable, food_brand.
3. On each for loop iteration, the print statement is executed, until finally, the for loop executes a call to next() that raises the StopIteration exception. The for loop then exits and is finished iterating.
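The three steps can be written out by hand. This sketch is roughly what the for loop does behind the scenes: a `while` loop calls next() until StopIteration is raised. It collects the lines in a list instead of printing them so the result is easy to check.

```python
dog_foods = {
    "Great Dane Foods": 4,
    "Min Pin Pun Foods": 10,
    "Pawsome Puns Foods": 8
}

lines = []
iterator = iter(dog_foods)           # step 1: get an iterator from the iterable
while True:
    try:
        food_brand = next(iterator)  # step 2: retrieve the next key
    except StopIteration:            # step 3: no items remain, so stop looping
        break
    lines.append(food_brand + " has " + str(dog_foods[food_brand]) + " bags")

print(lines)
```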

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2010.png)


#### Custom Iterators
The implementation of methods __iter__() and __next__() is known as the iterator protocol.

If we desire to create our own custom iterator class, we must implement the iterator protocol, meaning we need to have a class that defines at minimum the __iter__() and __next__() methods.


In [None]:
class FishInventory:
    def __init__(self, fishList):
        self.available_fish = fishList

By default, custom classes are not iterable. To make the custom class iterable, we can simply define `__iter__()` and `__next__()` methods.
- The `__iter__()` method must always return the iterator object itself. Typically, this is accomplished by returning `self`. It can also initialise class members.
- The `__next__()` method must either return the next value available or raise the `StopIteration` exception. It can also include any number of operations.

In [None]:
# Let’s return to our custom FishInventory class:

class FishInventory:
    def __init__(self, fishList):
        self.available_fish = fishList

    def __iter__(self):
        # Start (or restart) iteration at the first fish
        self.index = 0
        return self

    def __next__(self):
        if self.index < len(self.available_fish):
            fish_status = self.available_fish[self.index] + " is available!"
            self.index += 1
            return fish_status
        else:
            # No fish left: signal the end of iteration
            raise StopIteration

- Define the `__iter__()` method.
  - The `__iter__()` method returns the object itself, since this class is its own iterator object. Typically this is done with `return self`.
  - We can initialise a class member called `index` within the `__iter__()` method to help us track the current position within the `self.available_fish` list.

- Define the `__next__()` method.

  We can perform operations inside this method, such as incrementing class members or looping over data. Here, `__next__()` will:
  - Return the next available fish status as a string value.
  - Increment our class member `index` by 1.
  - Stop the iterator by raising the `StopIteration` exception once `index` reaches the length of `available_fish`.
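Here is a short usage sketch. It assumes the `FishInventory` class defined above, repeated here so the block runs on its own, along with a made-up list of fish names.

```python
class FishInventory:
    def __init__(self, fishList):
        self.available_fish = fishList

    def __iter__(self):
        self.index = 0  # reset the position each time a loop starts
        return self

    def __next__(self):
        if self.index < len(self.available_fish):
            fish_status = self.available_fish[self.index] + " is available!"
            self.index += 1
            return fish_status
        else:
            raise StopIteration


# Hypothetical inventory for illustration
fish_inventory_cls = FishInventory(["Bubbles", "Finley", "Goldie"])

for fish in fish_inventory_cls:
    print(fish)

# Because __iter__() resets self.index, a second pass starts from the beginning
second_pass = list(fish_inventory_cls)
```

Because `__iter__()` returns `self`, every loop over the same object shares one `index`. Resetting `index` in `__iter__()` lets sequential loops restart from the beginning. However, two nested loops over the same inventory would interfere with each other.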

#### Python’s Itertools: Built-in Iterators
Python has a built-in module named `itertools` that is used to create complex iterator manipulations. These iterator operations can take either a single iterable or a combination of iterables as input.

https://docs.python.org/3/library/itertools.html


There are three categories of itertool iterators:
- **Infinite**: Infinite iterators repeat an infinite number of times. They do not raise a StopIteration exception and require some type of stop condition to exit from.
- **Input-Dependent**: Input-dependent iterators are terminated by their input iterable(s) rather than by a stop condition. Depending on the iterator, this means stopping after every input has been consumed (as `chain()` does) or as soon as the shortest input runs out.
- **Combinatoric**: Combinatoric iterators perform combinatorial operations, such as generating combinations, on the input iterable(s).

We can use the itertools module by simply supplying an import statement at the top of the module like this:

```
import itertools
```

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2011.png)






##### Infinite Iterator: Count
- An infinite iterator will repeat an infinite number of times with no endpoint and no StopIteration exception raised.
- Infinite iterators are useful when we have unbounded streams of data to process.

Example: the `count()` itertool. This infinite iterator counts from a starting value until we provide some type of stop condition.

The base syntax of the function looks like this:

```
count(start=0, step=1)
```

The arguments of count():
- start (optional, defaults to 0) is the value we start counting from.
- step (optional, defaults to 1) is added to the current value on each iteration. The step value can be…
  - positive
  - negative
  - an integer or a float

We first import our itertools module and then create a loop (this can be a while loop or a for loop), that will iterate through our count() iterator:

In [None]:
import itertools
for i in itertools.count(start=0, step=2):
    print(i)
    if i >= 20:
        break

Here is what happens in the script:
- We set our start argument to 0 so that we start counting from 0.
- We set our step argument to 2 so that we increment by 2 on each iteration.
- We create a stop condition, which is i >= 20, otherwise this for loop would continue printing forever!
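As mentioned above, a `while` loop works too. This sketch drives `count()` with `next()` and uses a negative float step. The start value, step, and stop condition of `-1` are arbitrary choices for illustration.

```python
import itertools

# Hypothetical values: start at 1.0 and step down by 0.5
counter = itertools.count(start=1.0, step=-0.5)
values = []
while True:
    value = next(counter)
    if value < -1:  # our stop condition; count() never stops on its own
        break
    values.append(value)
print(values)  # [1.0, 0.5, 0.0, -0.5, -1.0]
```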


##### Input-Dependent Iterator: Chain
`chain()` takes in one or more iterables and combines them into a single iterator. Here is what the base syntax looks like:


```
chain(*iterables)
```

Example:

- Imports the itertools module.
- Sets all_numbers to the iterator returned by the itertool chain().
- Uses the list iterable odd and the set iterable even as the arguments to chain().
- Implements a for loop using the iterator in all_numbers
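- Prints the results: the elements of odd, followed by the elements of even. Because even is a set, the order of its elements is not guaranteed.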




In [None]:
import itertools
odd = [5, 7, 9]
even = {6, 8, 10}
all_numbers = itertools.chain(odd, even)
for number in all_numbers:
    print(number)
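A set has no guaranteed order, so it can help to see `chain()` with ordered inputs. In this sketch, the tuple and string inputs are illustrative assumptions. `list()` collects the chained iterator so the order is visible: every element of the first iterable, then every element of the second, and so on.

```python
import itertools

odd = [5, 7, 9]
even = (6, 8, 10)  # a tuple keeps its order, unlike a set
letters = "ab"     # a string is an iterable of characters

combined = list(itertools.chain(odd, even, letters))
print(combined)  # [5, 7, 9, 6, 8, 10, 'a', 'b']
```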

##### Combinatoric Iterator: Combinations
A combinatorial iterator will perform a set of statistical or mathematical operations on an input iterable.

A useful combinatoric iterator is the combinations() itertool. It produces an iterator of tuples containing every combination of r elements from the input.

```
combinations(iterable, r)
```

The combinations() itertool takes in two inputs,
- Iterable
- r : represents the length of each combination tuple.

The return type of combinations() is an iterator. It can be used in a for loop, or converted into a list or a set using list() or set().


```
import itertools
even = [2, 4, 6]
even_combinations = list(itertools.combinations(even, 2))
print(even_combinations)
```

Here we:
- Import the module itertools.
- Create an iterator using combinations() with the list of even numbers as the first argument and 2 as the second argument.
- Set even_combinations equal to a list of the elements in the iterator returned from combinations().
- Print even_combinations. The resulting list of 2-member tuples contains every combination of 2 of the 3 members of even:



```
Output:
[(2, 4), (2, 6), (4, 6)]

```




In [None]:
# Ex:

import itertools
collars = ["Red-S","Red-M", "Blue-XS", "Green-L", "Green-XL", "Yellow-M"]
# Write your code below:
collar_combo_iterator = itertools.combinations(collars, 3)
for i in collar_combo_iterator:
    print(i)




('Red-S', 'Red-M', 'Blue-XS')
('Red-S', 'Red-M', 'Green-L')
('Red-S', 'Red-M', 'Green-XL')
('Red-S', 'Red-M', 'Yellow-M')
('Red-S', 'Blue-XS', 'Green-L')
('Red-S', 'Blue-XS', 'Green-XL')
('Red-S', 'Blue-XS', 'Yellow-M')
('Red-S', 'Green-L', 'Green-XL')
('Red-S', 'Green-L', 'Yellow-M')
('Red-S', 'Green-XL', 'Yellow-M')
('Red-M', 'Blue-XS', 'Green-L')
('Red-M', 'Blue-XS', 'Green-XL')
('Red-M', 'Blue-XS', 'Yellow-M')
('Red-M', 'Green-L', 'Green-XL')
('Red-M', 'Green-L', 'Yellow-M')
('Red-M', 'Green-XL', 'Yellow-M')
('Blue-XS', 'Green-L', 'Green-XL')
('Blue-XS', 'Green-L', 'Yellow-M')
('Blue-XS', 'Green-XL', 'Yellow-M')
('Green-L', 'Green-XL', 'Yellow-M')
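One way to sanity-check the 20 tuples above is to compare them against the binomial coefficient. This sketch uses `math.comb()`, which requires Python 3.8 or later.

```python
import itertools
import math

collars = ["Red-S", "Red-M", "Blue-XS", "Green-L", "Green-XL", "Yellow-M"]
collar_combos = list(itertools.combinations(collars, 3))

# Choosing 3 of 6 items without regard to order: 6! / (3! * 3!) = 20
expected = math.comb(len(collars), 3)
print(len(collar_combos), expected)  # 20 20

# Each tuple keeps its elements in the same order as the input list
print(collar_combos[0])  # ('Red-S', 'Red-M', 'Blue-XS')
```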


#### Review
Good job! In this lesson, we covered:

- Iterables and iterators and how they differ.
- Using the `iter()` function to create an iterator.
- Using the `next()` function to manually iterate over an iterator.
- How for loops use iterables and iterators.
- How to write custom iterators by implementing the `__iter__()` and `__next__()` methods.
- How to use built-in itertools including count(), chain() and combinations().


### Generators



#### Introduction

- Generators let us create iterators without having to implement the `__iter__()` and `__next__()` methods. Generators:
  - improve code readability,
  - save memory by allowing for iterative access of elements, and
  - allow for the traversal of infinite streams of data.

There are two types of generators in Python:
1. Generator functions
2. Generator expressions

- Both produce a generator object that can be looped over, similar to a list.
- Unlike a list, a generator object does not store all of its contents in memory; it produces each value when it is requested.
- This allows for complex and even infinite iteration of data.





#### yield vs return
Generator functions (which use yield) are similar to regular functions (which use return), except that:
- they return an iterator (a generator object), and
- they produce their values with an expression called yield.

- Any code written after a yield expression will execute on the next iteration of the iterator.
- Code written after a return statement will not execute.

In [None]:
# Example
def class_standing_generator():
    yield 'Freshman'
    yield 'Sophomore'
    yield 'Junior'
    yield 'Senior'
"""
This function will return an iterator that contains the string values 'Freshman', 'Sophomore', 'Junior', and 'Senior'.
On each iteration of the iterator, each yield will return its corresponding class standing value.
"""
class_standings = class_standing_generator()
for class_stand in class_standings:
    print(class_stand)

Freshman
Sophomore
Junior
Senior


- `yield` expression will suspend the execution of the function and preserve any local variables that exist within the function.
- `return` statement will terminate the function immediately and return the result(s) to the caller.
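Here is a sketch with two hypothetical functions that differ only in `yield` versus `return`. The generator keeps its local `total` between iterations. The regular function exits on the first pass through its loop.

```python
def running_total_generator(numbers):
    total = 0
    for n in numbers:
        total += n
        yield total  # pause here; total is preserved for the next iteration


def running_total_function(numbers):
    total = 0
    for n in numbers:
        total += n
        return total  # the function ends here, so the loop never continues


totals = list(running_total_generator([1, 2, 3]))
single = running_total_function([1, 2, 3])
print(totals)  # [1, 3, 6]
print(single)  # 1
```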

#### next() and StopIteration
Generator functions return an iterator object that contains traversable values.

Calling next() on a generator object retrieves its next value. Each call will:

- resume the generator function's execution until the next yield expression is found,
- pause the function's execution again once that yield expression is reached, and
- raise a StopIteration exception if no additional yield expressions are found.

Generator functions are not limited to just single yield statements. They can also include loops where the yield occurs.


In [None]:
def prize_generator():
    student_info = {"Joan Stark": 355, "Billy Mars": 45, "Tori Rivers": 18, "Kyle Newman": 25}
    for student in student_info:
        student_id = student_info[student]
        if student_id % 3 == 0 and student_id % 5 == 0:
            yield student + " gets prize C"
        elif student_id % 3 == 0:
            yield student + " gets prize A"
        elif student_id % 5 == 0:
            yield student + " gets prize B"


prizes = prize_generator()
print(next(prizes))  # Joan Stark gets prize B
print(next(prizes))  # Billy Mars gets prize C
print(next(prizes))  # Tori Rivers gets prize A
print(next(prizes))  # Kyle Newman gets prize B
print(next(prizes))  # no students left, so StopIteration is raised

# Traceback (most recent call last):
#   File "script.py", line 18, in <module>
#     print(next(prizes))
# StopIteration

Joan Stark gets prize B
Billy Mars gets prize C
Tori Rivers gets prize A
Kyle Newman gets prize B


StopIteration: 

In [None]:
# Ex:
def student_standing_generator():
  student_standings = ['Freshman','Senior', 'Junior', 'Freshman']
  # Write your code below:
  for student_stand in student_standings:
    if student_stand == 'Freshman':
      yield 500

standing_values = student_standing_generator()
print(next(standing_values))
print(next(standing_values))
print(next(standing_values))
"""
Output:
500
500
Traceback (most recent call last):
 File "script.py", line 13, in <module>
   print(next(standing_values))
StopIteration
"""


500
500


StopIteration: 

#### Generator Expressions
Generator expressions allow for a clean, single-line definition and creation of an iterator.

Using a generator expression, there is no need to define a full generator function as we covered in the previous exercises.

Generator expressions resemble the syntax of list comprehensions. However, they do differ in the following ways:



```
Generator Expressions            | List Comprehensions
---------------------------------+-------------------------
Returns a newly defined iterator | Returns a new list
---------------------------------+-------------------------
Uses parentheses                 | Uses brackets
```

In [None]:
# List comprehension
a_list = [i*i for i in range(4)]
# Generator expression
a_generator = (i*i for i in range(4))


print(a_list) # [0, 1, 4, 9]
print(a_generator) # <generator object <genexpr> at 0x7f82e0e4d4c0>

for i in a_generator:
   print(i)
# 0
# 1
# 4
# 9


[0, 1, 4, 9]
<generator object <genexpr> at 0x78a5775e17e0>
0
1
4
9
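Because a generator expression returns an iterator, it can only be traversed once. This sketch shows that difference from a list. It also shows that a generator expression can be passed straight into a function such as `sum()`.

```python
a_list = [i * i for i in range(4)]
a_generator = (i * i for i in range(4))

first_pass = list(a_generator)   # consumes every value
second_pass = list(a_generator)  # already exhausted, so this is empty

print(first_pass)   # [0, 1, 4, 9]
print(second_pass)  # []

# A list can be traversed as many times as we like
print(sum(a_list), sum(a_list))  # 14 14

# A generator expression can be passed directly into a function call
print(sum(i * i for i in range(4)))  # 14
```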


In [None]:
# Ex:
def cs_generator():
 for i in range(1,5):
   yield "Computer Science " + str(i)


# Write your code below:
cs_courses = cs_generator()
for val in cs_courses:
 print (val)
"""
Output:
Computer Science 1
Computer Science 2
Computer Science 3
Computer Science 4
"""

cs_generator_exp = ("Computer Science {}".format(i) for i in range(1,5) )


for val in cs_generator_exp:
 print (val)
"""
Output:
Computer Science 1
Computer Science 2
Computer Science 3
Computer Science 4
"""


Computer Science 1
Computer Science 2
Computer Science 3
Computer Science 4
Computer Science 1
Computer Science 2
Computer Science 3
Computer Science 4



#### Generator Methods:


##### send()
The .send() method allows us to send a value into a generator through the yield expression.

- If we assign yield to a variable (for example, n = yield), the argument passed to the .send() method will be assigned to that variable.
- Calling .send() will also cause the generator to perform an iteration.

Look at the following example to see the behaviour of the .send() method:



```
def count_generator():
    while True:
        n = yield
        print(n)

my_generator = count_generator()
next(my_generator)    # 1st iteration: no output
next(my_generator)    # 2nd iteration output: None
my_generator.send(3)  # 3rd iteration output: 3
next(my_generator)    # 4th iteration output: None
```

The last 4 lines in the code are 4 iterations, 3 using next() and one using the .send() method:

- The 1st iteration creates no output since the execution stops at n = yield which is before print(n).
- The 2nd iteration assigns None to n through the n = yield expression. None is printed.
- The 3rd iteration is caused by my_generator.send(3). The value 3 is passed through yield and assigned to n. 3 is printed.
- The 4th and last iteration assigns None to n. None is printed.



In [None]:
# Ex:
def generator():
  count = 0
  while True:
    n = yield count
    if n is not None:
      count = n
    count += 1
my_generator = generator()
print(next(my_generator)) # Output: 0
print(next(my_generator)) # Output: 1
print(my_generator.send(3)) # Output: 4
print(next(my_generator)) # Output: 5

0
1
4
5


In the above example, the generator function defines count = 0 as the iteration value. n is used to hold the value provided by yield. Just like next(), the .send() method returns the value yielded by the iteration it causes. In this example, the return values are printed using print().

The updated line, n = yield count, has 2 behaviours:

- At the start of each iteration the value provided by yield is assigned to n. This value will be None when next() causes an iteration or it will be equal to the value passed using .send()
- At the end of each iteration, the value stored in count is returned by the generator.

If n is not None, the value stored in n is assigned to the iteration variable, count. This means the generator only jumps to a new value of count when the .send() method is called.
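A common use of `.send()` is a generator that keeps a running result. In this sketch, the `running_average` name and logic are hypothetical. The generator must first be advanced to its first `yield` with `next()`, because sending a non-`None` value to a generator that has not started yet raises a `TypeError`.

```python
def running_average():
    # Hypothetical generator that averages every value sent to it
    total = 0
    count = 0
    average = None
    while True:
        value = yield average  # receives the argument passed to .send()
        if value is not None:
            total += value
            count += 1
            average = total / count


averager = running_average()
next(averager)  # prime: run up to the first yield (yields None)
results = [averager.send(10), averager.send(20), averager.send(30)]
print(results)  # [10.0, 15.0, 20.0]
```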


##### throw()
The .throw() method lets the caller throw an exception inside the generator, at the point where the generator is paused.

In [None]:
def generator():
  i = 0
  while True:
    yield i
    i += 1

my_generator = generator()
for item in my_generator:
  if item == 3:
    # Pass an exception instance (the older throw(type, value) form is deprecated)
    my_generator.throw(ValueError("Bad value given"))

ValueError: Bad value given
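If the generator wraps its `yield` in `try`/`except`, it can handle the thrown exception and keep going instead of stopping. Here is a sketch with a hypothetical `resilient_counter()` generator. Note that `.throw()`, like `.send()`, returns the next value the generator yields.

```python
def resilient_counter():
    i = 0
    while True:
        try:
            yield i
        except ValueError as error:
            # The caller's exception arrives here, at the paused yield
            print("Caught:", error)
            i = -1  # reset so the next yielded value is 0
        i += 1


counter = resilient_counter()
first = next(counter)   # 0
second = next(counter)  # 1
after_throw = counter.throw(ValueError("Bad value given"))  # prints "Caught: Bad value given"
after_reset = next(counter)
print(first, second, after_throw, after_reset)  # 0 1 0 1
```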

##### close()
The .close() method is used to terminate a generator early. It works by raising a GeneratorExit exception inside the generator at the point where it is paused.

Once the .close() method is called, the generator is finished, just like the end of a for loop. Any further iteration attempts will raise a StopIteration exception.


In [None]:
def generator():
  i = 0
  while True:
    yield i
    i += 1
my_generator = generator()
next(my_generator)
next(my_generator)
my_generator.close() # my_generator is finished
next(my_generator) # raises StopIteration exception

StopIteration: 

In the above example, my_generator holds an infinite generator object. After a couple of next(my_generator) calls, my_generator.close() is called. When we attempt to call next(my_generator) again, a StopIteration exception is raised.

In [None]:
def generator():
  i = 0
  while True:
    try:
      yield i
    except GeneratorExit:
      print("Early exit, BYE!")
      break
    i += 1
my_generator = generator()
for item in my_generator:
  print(item)
  if item == 1:
    my_generator.close()

0
1
Early exit, BYE!


By putting the yield expression in a try block, we can handle the GeneratorExit exception. In this case, we simply print out a message. Because we intercepted the automatic behaviour of the .close() method, we must also use a break to exit the loop; otherwise, a RuntimeError will occur.
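To see that RuntimeError, here is a minimal sketch of a hypothetical `stubborn_generator()`. It catches GeneratorExit and yields again instead of exiting. `.close()` detects the extra yield and raises a RuntimeError.

```python
def stubborn_generator():
    try:
        yield 1
    except GeneratorExit:
        # Yielding again instead of exiting is not allowed
        yield "not closing!"


gen = stubborn_generator()
next(gen)  # run to the first yield
error_message = None
try:
    gen.close()
except RuntimeError as error:
    error_message = str(error)
    print("RuntimeError:", error_message)
```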

#### Connecting Generators
It can be useful to connect multiple generators into one. This allows us to delegate the operations of one generator to another sub-generator. Connecting generators is similar to using the itertools chain() function to combine iterators into a single iterator.

In order to connect generators, we use the yield from statement. An example of how it is used is below:



```
def cs_courses():
       yield 'Computer Science'
       yield 'Artificial Intelligence'
def art_courses():
       yield 'Intro to Art'
       yield 'Selecting Mediums'
def all_courses():
       yield from cs_courses()
       yield from art_courses()
```
Let’s break down this example:

- We have a generator function called cs_courses() that yields two results, 'Computer Science' and 'Artificial Intelligence'.
- We have another generator function called art_courses() that will yield two separate results, 'Intro to Art' and 'Selecting Mediums'.
- Our all_courses() generator function will yield results from both cs_courses() and art_courses() to create one combined generator with all four string values representing the courses.



```
combined_generator = all_courses()
print(next(combined_generator)) # Computer Science
print(next(combined_generator)) # Artificial Intelligence
print(next(combined_generator)) # Intro to Art
print(next(combined_generator)) # Selecting Mediums
```
By calling next(), we can see that yield from retrieves one yielded item at a time, in the order that the yields occur within the generator functions.




In [None]:
def cs_courses():
       yield 'Computer Science'
       yield 'Artificial Intelligence'
def art_courses():
       yield 'Intro to Art'
       yield 'Selecting Mediums'
def all_courses():
       yield from cs_courses()
       yield from art_courses()

combined_generator = all_courses()
print(next(combined_generator)) # Computer Science
print(next(combined_generator)) # Artificial Intelligence
print(next(combined_generator)) # Intro to Art
print(next(combined_generator)) # Selecting Mediums

Computer Science
Artificial Intelligence
Intro to Art
Selecting Mediums
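To make the comparison with `chain()` concrete, this sketch checks that `all_courses()` produces the same sequence as chaining the two sub-generators. It also shows that `yield from` accepts any iterable, not only generators. The `numbered_courses()` function and its values are hypothetical.

```python
import itertools


def cs_courses():
    yield 'Computer Science'
    yield 'Artificial Intelligence'


def art_courses():
    yield 'Intro to Art'
    yield 'Selecting Mediums'


def all_courses():
    yield from cs_courses()
    yield from art_courses()


from_yield = list(all_courses())
from_chain = list(itertools.chain(cs_courses(), art_courses()))
print(from_yield == from_chain)  # True


def numbered_courses():
    yield from ['Course A', 'Course B']  # a list works with yield from
    yield from range(1, 3)               # so does a range


print(list(numbered_courses()))  # ['Course A', 'Course B', 1, 2]
```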


Ex:

We have a generator function called science_students(x) that yields science major students with student IDs 1 to x. We have another generator function, non_science_students(x,y), that yields non-science major students with student IDs x to y. We want to retrieve student IDs in the following order:
- Science students with IDs 1-5
- Non-science students with IDs 10-15
- Non-science students with IDs 25-30

Use a connected generator function called combined_students that uses yield from statements to achieve this.


In [None]:
def science_students(x):
  for i in range(1,x+1):
    yield i


def non_science_students(x,y):
  for i in range(x,y+1):
    yield i
# Write your code below
def combined_students():
  yield from science_students(5)
  yield from non_science_students(10,15)
  yield from non_science_students(25,30)

student_generator = combined_students()
for i in student_generator:
  print (i)

1
2
3
4
5
10
11
12
13
14
15
25
26
27
28
29
30


#### Generator Pipelines (Nested Generators)

Generator pipelines allow us to use multiple generators to perform a series of operations all within one expression.

To pipeline generators, the output of one generator function can be the input of another generator function. That resulting generator can then be used as input for another generator function, and so on.

Pipeline generators are also often referred to as nested generators. We can use a pipelined generator like in the following example:


In [None]:
def number_generator():
    i = 0
    while True:
        yield i
        i += 1

def even_number_generator(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n

even_numbers = even_number_generator(number_generator())
for e in even_numbers:
    print(e)
    if e == 100:
        break

0
2
4
6
8
10
12
14
16
18
20
22
24
26
28
30
32
34
36
38
40
42
44
46
48
50
52
54
56
58
60
62
64
66
68
70
72
74
76
78
80
82
84
86
88
90
92
94
96
98
100


The above example contains:
- The infinite generator number_generator() that yields numbers incrementing by 1
- The infinite generator even_number_generator() which takes a generator as a parameter, iterates through that generator and only yields even numbers.
- The even_numbers variable which holds an even_number_generator() object with number_generator() as its argument.

When we iterate over even_numbers, only even numbers are output. The even_number_generator() iterates over all numbers using number_generator(). When an even number occurs, that number is yielded by even_number_generator().
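Pipelines can have more than two stages. In this sketch, `number_generator()` is a hypothetical finite variant that takes a `limit`, and `square_generator()` is a hypothetical extra stage. No value is computed until `list()` pulls it through every stage.

```python
def number_generator(limit):
    # Hypothetical finite source stage
    for i in range(limit):
        yield i


def even_number_generator(numbers):
    for n in numbers:
        if n % 2 == 0:
            yield n


def square_generator(numbers):
    # Hypothetical extra stage: square each value it receives
    for n in numbers:
        yield n * n


squares = list(square_generator(even_number_generator(number_generator(10))))
print(squares)  # [0, 4, 16, 36, 64]
```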

#### Review
In this lesson, you learned how to:
- Create generator functions using yield
- Implement generator expressions
- Use built-in generator methods like .send(), .throw(), and .close()
- Connect generators into single generators
- Use nested or pipelined generators

Question

1. Create a generator function called graduation_countdown() that will count down the number of days left before student graduation. It should take in as input days and yield one less day on each next() call, so the last value yielded is 0. Use a while loop for yielding and decrementing the day.

2. Create an equivalent generator expression called countdown_generator for the graduation_countdown generator function. It should generate the days in a descending order starting from the provided days value. Place the code after the days = 25 line.

3. Modify the graduation_countdown() generator function to accept values sent using send(). Use a local variable called days_left to store sent values. Use an if/else statement to check for sent values.

4. Call the graduation_countdown() function and set it to a variable called grad_days. Iterate through grad_days generator to print the number of days left with a string of “Days Left: x” where x represents the countdown value.
On the 15th day of the graduation countdown, the school president announces that graduation will be moved up 5 days. Send a value of 10 to the grad_days generator when the 15th day in the countdown is reached.

5. It’s our lucky day! The school president announces that graduation will now occur on the 3rd day left of the countdown. Modify the for loop so that when the countdown day is 3, the generator will close. Insert the condition check and close() before the “Days Left” printout.

6. We have three honours achievements to assign to students that are defined within the summa(), magna(), and cum_laude() generator functions. Each honour is assigned based on a given GPA range listed below. Given a list of input GPAs, create a generator function called honors_generator that takes in 1 input argument named gpas that represents the list of GPAs from the variable gpas. The function should use yield from on each input GPA to determine the honours assignment.

| Honors Assignment | GPA   |
|-------------------|-------|
| Summa Cum Laude   | > 3.9 |
| Magna Cum Laude   | > 3.7 |
| Cum Laude         | > 3.5 |

7. Call the connected generator function honors_generator with the gpas list and set it to a variable called honors. Loop through the honors generator and print out each honor_label value to see which honours labels will be generated given the gpas list.



In [None]:
def summa():
  yield 'Summa Cum Laude'
def magna():
  yield 'Magna Cum Laude'
def cum_laude():
  yield 'Cum Laude'


def honors_generator(gpas):
  for gpa in gpas:
    if gpa > 3.9:
      yield from summa()
    elif gpa > 3.7:
      yield from magna()
    elif gpa > 3.5:
      yield from cum_laude()




def graduation_countdown(days):
    while days >= 0:
        days_left = yield days
        if days_left is not None:
            days = days_left
        else:
            days -= 1




days = 25
countdown_generator = (day for day in range(days, -1,-1))
grad_days = graduation_countdown(days)
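for day in grad_days:
    if day == 15:
        # send() makes the generator yield 10 and returns it here,
        # so the loop never receives 10 and continues from 9
        grad_days.send(10)
    elif day == 3:
        grad_days.close()
    print("Days Left: " + str(day))




days = 25
gpas = [3.2, 4.0, 3.6, 2.9]
honors = honors_generator(gpas)
for honor_label in honors:
    print(honor_label)

Days Left: 25
Days Left: 24
Days Left: 23
Days Left: 22
Days Left: 21
Days Left: 20
Days Left: 19
Days Left: 18
Days Left: 17
Days Left: 16
Days Left: 15
Days Left: 9
Days Left: 8
Days Left: 7
Days Left: 6
Days Left: 5
Days Left: 4
Days Left: 3
Summa Cum Laude
Cum Laude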


Which of the following are true about the GeneratorExit exception in relation to generators?
- The generator ends where it was last paused when a GeneratorExit exception is received.
- The .throw() method can raise the GeneratorExit exception to close a generator.
- It is raised by the .close() method.

Which of the following are true about generator pipelines?
- The input to one generator function can be the output to another generator function.
- They allow for multiple generators to perform a series of operations.
- They allow us to break down complex operations into smaller parts that can be pipelined using generators.


Which is NOT true about the GeneratorExit exception in relation to generators?
- GeneratorExit can be used in place of StopIteration to notify the generator when it is done iterating.

Which of the following is NOT true about generator pipelines?
- They require the use of yield from statements.

Which of the following is true about Python generators?
- A generator allows us to create iterators without needing to implement the iterator protocol.

How does a generator object know when to stop executing?
- When it raises a StopIteration exception.

What does the .throw() method do to a generator?
- It will allow us to throw an exception inside the generator from the caller object.

What does the .close() method do to a generator?
- It is used to force terminate a generator that is mid-sequence.

How are generator connections created in Python?
- A generator connection function must use the yield from statement.

Which of the following is FALSE when comparing yield and return statements?
- return statements will preserve any locally defined variables that come before it.
  - Locally defined variables are kept following a yield statement but will be lost with a return statement.

How do generator expressions and list comprehensions differ syntactically?
- Generator expressions use parentheses () while list comprehensions use brackets [].

Which of the following is FALSE when comparing generator functions to standard Python functions?
- Generator function results are returned all at once.

Which of the following is NOT true when using generator expressions?
- Generator expressions are memory inefficient.
  - Generator expressions are memory efficient since they return results one at a time and avoid the need to store all results in memory at once.

What does the .send() method do to a generator?
- It will send a given value to the generator’s yield statement.

How are generators created?
- Generator functions require a yield statement to be used, unlike standard functions.
- They are created by defining generator functions or generator expressions.



## Special collections

### Set

#### Introduction

In Python, a set is a group of elements that are unordered and do not contain duplicates. Although it may seem that the usefulness of this data structure is limited, it can actually be very helpful for organising items and performing set mathematics.
For example, we can imagine two different groups of items that have some similarities and differences. Using set mathematics, we can find the matching items, differences, combine the sets based on different parameters, and more! This is especially helpful when combing through very large datasets.

Alternatively, there is also an immutable version of a set called a frozenset. A frozenset behaves similarly to a normal set, but it does not include methods that modify the frozenset in any way.

In this lesson, we’ll explore:

- How to create a set and a frozenset.
- How to add to a set (we won’t be able to mutate a frozenset).
- How to remove from a set (we won’t be able to mutate a frozenset).
- How to find specific elements in a set and a frozenset.
- How to perform set operations such as unions, intersections, and more.



#### Creating a Set
In Python, there are multiple ways to create a set. A set object can be created by passing an iterable object into its constructor, using curly braces, or using a set comprehension.

Let’s examine the syntax of these methods:

In [None]:
# Creating a set with curly braces
music_genres = {'country', 'punk', 'rap', 'techno', 'pop', 'latin'}
# Creating a set from a list using set()
music_genres_3 = set(['country', 'punk', 'rap', 'pop', 'pop', 'pop'])

It’s worth noting that creating a set from a list with duplicates produces a set with the duplicates removed. Here is an example:



```
print(music_genres_3)

output:
{'country', 'punk', 'pop', 'rap'}
```

Lastly, similar to list comprehensions, we can create sets using a set comprehension and a data set (such as a list). Here is an example:



In [None]:
items = ['country', 'punk', 'rap', 'techno', 'pop', 'latin']
music_genres = {category for category in items if category[0] == 'p'}
print(music_genres)

# Would output a set containing all elements from items starting with the letter 'p':

{'pop', 'punk'}


#### Creating a Frozenset
Unlike a normal set, you can only create a frozenset using its constructor. Remember that using a frozenset means that you cannot modify the elements inside of it.

Creating a frozenset using its constructor looks like this:



In [None]:
# Creating a frozenset from a list
frozen_music_genres = frozenset(['country', 'punk', 'rap', 'techno', 'pop', 'latin'])
# We can also create an empty frozenset:

empty_frozen_set = frozenset()
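This sketch shows what "cannot modify" means in practice. A frozenset has no `.add()` method, so calling it raises an `AttributeError`. Because a frozenset is immutable, it is also hashable and can be used as a dictionary key. The `playlist_sizes` data is hypothetical.

```python
frozen_music_genres = frozenset(['country', 'punk', 'rap', 'techno', 'pop', 'latin'])

# A frozenset has no methods that modify it, so .add() does not exist
try:
    frozen_music_genres.add('jazz')
except AttributeError as error:
    print("AttributeError:", error)

# Because it cannot change, a frozenset is hashable and can be a dictionary key
playlist_sizes = {frozen_music_genres: 12}  # hypothetical playlist data
same_genres = frozenset(['latin', 'pop', 'techno', 'rap', 'punk', 'country'])
print(playlist_sizes[same_genres])  # 12
```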

#### Adding to a Set
There are two different ways to add elements to a set:

- The .add() method can add a single element.
- The .update() method can add multiple elements.

There are a few things to note about adding to a set:
- Neither of these methods will add a duplicate item to a set.
- A frozenset cannot have any items added to it, so neither of these methods will work on it.
- Notice that when the elements are printed, they are not printed in the same order in which they entered the set. This is because set and frozenset containers are unordered.




In [None]:
# .add() method can add a single element

# Create a set to hold the song tags
song_tags = {'country', 'folk', 'acoustic'}
# Add a new tag to the set and try to add a duplicate.
song_tags.add('guitar')
song_tags.add('country')


print(song_tags)
{'country', 'acoustic', 'guitar', 'folk'}





In [None]:
# .update() method can add multiple elements.

# Create a set to hold the song tags
song_tags = {'country', 'folk', 'acoustic'}
# Add more tags using an iterable object (such as a list of elements)
other_tags = ['live', 'blues', 'acoustic']
song_tags.update(other_tags)




print(song_tags)
{'acoustic', 'folk', 'country', 'live', 'blues'}



#### Removing From a Set
There are two methods for removing specific elements from a set:

- The .remove() method searches for an element within the set and removes it if it exists; otherwise, a KeyError is raised.
- The .discard() method works the same way but does not raise an exception if the element is not present.



In [None]:
""".remove() method searches for an element within the set and removes it
if it exists, otherwise, a KeyError is thrown
"""
# Given a set of song tags
song_tags = {'guitar', 'acoustic', 'folk', 'country', 'live', 'blues'}
# Remove an existing element
song_tags.remove('folk')


print(song_tags)
{'blues', 'acoustic', 'country', 'guitar', 'live'}
# Try removing a non-existent element
song_tags.remove('fiddle')

"""
Traceback (most recent call last):
File "some_file_name.py", line 9, in <module>
song_tags.remove('fiddle')
KeyError: 'fiddle'
"""



```
# .discard() works the same way as .remove() but does not raise an
# exception if an element is not present

# Given a set of song tags
song_tags = {'guitar', 'acoustic', 'folk', 'country', 'live', 'blues'}
# Remove an existing element with the discard method
song_tags.discard('guitar')

print(song_tags)
# {'folk', 'acoustic', 'blues', 'live', 'country'}  (order may vary)

# Try removing a non-existent element with the discard method: no error
song_tags.discard('fiddle')
print(song_tags)
# {'folk', 'acoustic', 'blues', 'live', 'country'}  (order may vary)
```



### Finding Elements in a Set
In Python, set and frozenset items cannot be accessed by a specific index, because both containers are unordered and have no indices. However, like most other Python containers, we can use the in keyword to test if an element is in a set or frozenset.

Here is an example of testing whether an element exists in a set:


```
# Given a set of song tags
song_tags = {'guitar', 'acoustic', 'folk', 'country', 'live', 'blues'}
# Print the result of testing whether 'country' is in the set of tags or not
print('country' in song_tags)
# True
```
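The same membership test works on a frozenset, and not in checks for absence. This short sketch also shows what happens if we try to index a set anyway; the tags are the same illustrative values as above.

```python
song_tags = {'guitar', 'acoustic', 'folk', 'country', 'live', 'blues'}
frozen_tags = frozenset(song_tags)

# The in keyword works the same way on a frozenset
print('folk' in frozen_tags)        # True
# not in tests that an element is absent
print('fiddle' not in frozen_tags)  # True

# Sets have no indices, so indexing raises a TypeError
try:
    song_tags[0]
except TypeError as error:
    print('TypeError:', error)
```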

### Introduction to Set Operations
A lot of the usefulness of a set container comes from the set operations. These allow you to combine sets, find the differences and intersections of sets, and more! You can chain these operations to express more complex logic across multiple sets. This can be useful for filtering, categorising, and combining items, as well as many other uses.

The operations which we will be looking at are:
- Unions
- Intersections (and Intersection Updates)
- Differences (and Difference Updates)
- Symmetric Differences (and Symmetric Difference Updates)

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2012.png)

https://docs.python.org/3/library/stdtypes.html#set



#### Set Union
When working with a set or frozenset container, one of the most common operations we can perform is a merge. To do this, we can return the union of two sets using the .union() method or the | operator. Either way, the result is a new set or frozenset containing all elements from both sets, without duplicates.

Take a look at the Venn diagram representing a union of set A and set B:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2013.png)


Notice the resulting set contains every element that is in set A, set B, or both, with shared elements appearing only once. In this case we are only merging two sets, but it’s also common to perform the operation on as many sets as we need!

Let’s look at two examples of creating a union: one using the .union() method and one using the | operator.




```
# Using union():
# Given a set and frozenset of song tags for two Python-related hits
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})
# Get the union using the .union() method
combined_tags = prepare_to_py.union(py_and_dry)

print(combined_tags)
# {'electric guitar', 'classic', 'heavy metal', 'rock and roll', 'rock', 'synth'}
```



```
# Using |:
# Get the union using the | operator
# The result is a frozenset because the frozenset is the first operand
frozen_combined_tags = py_and_dry | prepare_to_py

print(frozen_combined_tags)
# frozenset({'electric guitar', 'rock and roll', 'rock', 'synth', 'heavy metal', 'classic'})
```
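As mentioned above, a union is not limited to two sets. A minimal sketch, assuming a third hypothetical song's tags (third_song_tags is made up for this illustration):

```python
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})
# Hypothetical tags for a third song, added only for this illustration
third_song_tags = {'synth', 'pop', 'dance'}

# .union() accepts several sets at once...
all_tags = prepare_to_py.union(py_and_dry, third_song_tags)
# ...and the | operator can be chained
all_tags_chained = prepare_to_py | py_and_dry | third_song_tags

print(all_tags == all_tags_chained)  # True
print(len(all_tags))                 # 8 unique tags
print(type(all_tags).__name__)       # set: the first operand is a set

# .union() also accepts any iterable, such as a list; | only works between sets
more_tags = prepare_to_py.union(['pop', 'rock'])
try:
    prepare_to_py | ['pop', 'rock']
except TypeError:
    print('| needs a set or frozenset on both sides')
```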



#### Set Intersection
Let’s say that we have two or more sets, and we want to find which items they all have in common. The set container has a method called .intersection() which returns a new set or frozenset consisting of those elements. An intersection can also be performed on two or more sets using the & operator.

Similar to the other operations, the type of the first operand (a set or frozenset on the left side of the operator or method) determines if a set or frozenset is returned when finding the intersection.

Take a look at the Venn diagram representing an intersection of set A and set B:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2014.png)


Let’s look at an example using the .intersection() method and one using the & operator:



```
# .intersection()
# Given a set and frozenset of song tags for two Python-related hits
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})
# Find the intersection between them while providing the frozenset first
frozen_intersected_tags = py_and_dry.intersection(prepare_to_py)

print(frozen_intersected_tags)
# frozenset({'electric guitar', 'rock'})
```



```
# & operator
# Find the intersection using the & operator and providing the normal set first
intersected_tags = prepare_to_py & py_and_dry

print(intersected_tags)
# {'rock', 'electric guitar'}
```



In addition to a regular intersection, the set container can also use a method called .intersection_update(). Instead of returning a new set, it updates the original set to contain the result of the intersection. Because a frozenset is immutable, it does not have this method.
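Here is a minimal sketch of the in-place version, reusing the song tags from above. It also shows the &= operator, which performs the same in-place update on a set; the tags set at the end is made up for illustration.

```python
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})

# .intersection_update() changes prepare_to_py in place and returns None
result = prepare_to_py.intersection_update(py_and_dry)
print(result)                 # None
print(sorted(prepare_to_py))  # ['electric guitar', 'rock']

# The &= operator is the in-place operator form
tags = {'rock', 'synth', 'pop'}
tags &= {'rock', 'pop', 'jazz'}
print(sorted(tags))           # ['pop', 'rock']
```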


#### Set Difference
Similar to how we can find elements in common between sets, we can also find elements that are unique to one set. To do so, set and frozenset containers provide the .difference() method and the - operator. These return a set or frozenset containing only the elements from the first set which are not found in the second set. As with the other operations, the type of the first operand (the set or frozenset on the left side of the operator or method) determines whether a set or frozenset is returned.

Take a look at the Venn diagram representing a difference operation that captures elements that are unique to set A:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2015.png)

Let’s look at an example using the .difference() method and one using the - operator:



```
# .difference()
# Given a set and frozenset of song tags for two Python-related hits
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})
# Find the elements which are only in prepare_to_py
only_in_prepare_to_py = prepare_to_py.difference(py_and_dry)

print(only_in_prepare_to_py)
# {'heavy metal', 'synth'}
```



```
# - operator
# Find the elements which are only in py_and_dry
only_in_py_and_dry = py_and_dry - prepare_to_py

print(only_in_py_and_dry)
# frozenset({'rock and roll', 'classic'})
```



This operation also supports an updating version of the method. You can use .difference_update() to update the original set with the result instead of returning a new set or frozenset object.
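A minimal sketch of .difference_update() on the same song tags, plus a reminder that difference is not symmetric: A - B and B - A usually differ. The small integer sets are made up for illustration.

```python
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})

# Remove every tag that also appears in py_and_dry, in place
prepare_to_py.difference_update(py_and_dry)
print(sorted(prepare_to_py))  # ['heavy metal', 'synth']

# Order matters for difference
a = {1, 2, 3}
b = {2, 3, 4}
print(a - b, b - a)  # {1} {4}

# -= is the in-place operator form of .difference_update()
a -= b
print(a)  # {1}
```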

#### Symmetric Difference
The last operation we will be looking at is the symmetric difference. We can think of this operation as the complement of the intersection: the resulting set includes every element that is in one set or the other, but not both. In other words, it contains the elements that are unique to each set.

To perform this operation on the set or frozenset containers, we can use the .symmetric_difference() method or the ^ operator. Like the other operators, the type of the first operand (a set or frozenset on the left side of the operator or method) determines if a set or frozenset is returned when finding the symmetric difference.

Take a look at the Venn diagram that represents a symmetric difference between set A and set B:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Pyhton%20UNIT%20TESTING%2016.png)

Let’s look at an example using the .symmetric_difference() method and one using the ^ operator:




```
# .symmetric_difference()
# Given a set and frozenset of song tags for two Python-related hits
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})
# Find the elements which are exclusive to each song and not shared using the method
exclusive_tags = prepare_to_py.symmetric_difference(py_and_dry)

print(exclusive_tags)
# {'heavy metal', 'synth', 'rock and roll', 'classic'}
```



```
# ^ operator
# Find the elements which are exclusive to each song and not shared using the operator
frozen_exclusive_tags = py_and_dry ^ prepare_to_py

print(frozen_exclusive_tags)
# frozenset({'synth', 'rock and roll', 'heavy metal', 'classic'})
```





To update the original set with the result instead of returning a new set or frozenset object, we can use the .symmetric_difference_update() method.
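A minimal sketch of the in-place symmetric difference on the same tags, plus a quick check of the description above: the symmetric difference equals the union minus the intersection. The integer sets are made up for illustration.

```python
prepare_to_py = {'rock', 'heavy metal', 'electric guitar', 'synth'}
py_and_dry = frozenset({'classic', 'rock', 'electric guitar', 'rock and roll'})

# Keep only the tags that belong to exactly one of the two songs, in place
prepare_to_py.symmetric_difference_update(py_and_dry)
print(sorted(prepare_to_py))
# ['classic', 'heavy metal', 'rock and roll', 'synth']

# The symmetric difference is the union minus the intersection
a, b = {1, 2, 3}, {2, 3, 4}
print((a ^ b) == (a | b) - (a & b))  # True

# ^= is the in-place operator form
a ^= b
print(sorted(a))  # [1, 4]
```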

### Review

Creating a set or frozenset:
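- For set containers, we can use curly braces {} containing at least one element, the set() constructor, or a set comprehension. An empty pair of braces {} creates an empty dict, so use set() for an empty set.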
- For frozenset containers, we can only use the frozenset() constructor.

Adding items to a set:
- We can add items to a set individually using the .add() method.
- We can add multiple items at once using the .update() method.

Removing items from a set:
- The .remove() method is used to remove elements from a set.
- The .discard() method can also be used to remove elements from a set. It does not throw a KeyError if the element is not found.

Finding Elements:
- The in keyword can be used with set and frozenset containers to test if an element exists inside of them.

Union:
- A union can be found using set or frozenset containers with the .union() method or | operator.

Intersection:
- An intersection can be found using set or frozenset containers with the .intersection() method or & operator.

Difference:
- The difference can be found using set or frozenset containers with the .difference() method or - operator.

Symmetric Difference:
- The symmetric difference can be found using set or frozenset containers with the .symmetric_difference() method or ^ operator.

Want to learn more about sets? Check out everything about sets, including additional methods, testing for supersets and subsets, and more, in the Python documentation.
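As a small taste of those additional methods, here is a sketch of subset, superset, and disjointness tests, each with its operator form where one exists. The tag values are illustrative.

```python
all_tags = {'rock', 'heavy metal', 'electric guitar', 'synth', 'classic'}
rock_tags = {'rock', 'electric guitar'}

print(rock_tags.issubset(all_tags))    # True
print(rock_tags <= all_tags)           # True, operator form of issubset()
print(all_tags.issuperset(rock_tags))  # True
print(all_tags >= rock_tags)           # True, operator form of issuperset()
# isdisjoint() is True when two sets share no elements
print(rock_tags.isdisjoint({'pop', 'jazz'}))  # True
```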


### Recap: Python Containers
In Python, there are many ways to store and organize data. So far, we have added elements to lists, written key-value pairs to dictionaries, and accessed data within tuples.
Any object which stores data is called a container. If you have written code in Python, you have likely been using containers this whole time!

We are familiar with Python’s built-in containers (such as lists or dictionaries), but there are many other containers that exist in Python. These containers each specialise in a specific job and can be imported into your code from other modules or even be custom-made! In this lesson, we will be looking at these specialised containers from the Python collections module.


We will start to dive deeper in the next exercises, but for now, let’s take some time to review some of the common built-in containers we are most familiar with:



#### Lists
Lists are an ordered group of elements. Elements can be added, removed, accessed, and modified.



```
products = ['t-shirt', 'pants', 'shoes', 'dress', 'blouse']
products.append('jacket')
products.sort()
products.remove('shoes')
```



#### Tuples
Tuples are immutable objects which group multiple elements together. They are similar to lists, except that they cannot be modified once created.

```
searched_terms = ('clothes', 'phone', 'app', 'purchase', 'clothes', 'store', 'app', 'clothes')
term = searched_terms[2]
num_of_occurrences = searched_terms.count('clothes')
```




#### Dictionaries
Dictionaries are mutable groups of key-value pairs. Since Python 3.7, they preserve the order in which keys were inserted.

```
orders = {'order_4829': {'type': 't-shirt', 'size': 'large', 'price': 9.99},
'order_6184': {'type': 'pants', 'size': 'medium', 'price': 14.99}
}
order_4829_price = orders['order_4829']['price']
order_6184_size = orders['order_6184']['size']
orders['order_4829']['size'] = 'x-large'
num_of_orders = len(orders)
```





#### Sets
Sets are unordered groups of unique elements. A set can grow or shrink, but its elements must be immutable (hashable) values, so an element cannot be modified in place.



```
old_products_set = {'t-shirt', 'pants', 'shoes'}
new_products_set = {'t-shirt', 'pants', 'blouse', 'dress'}
updated_products = new_products_set | old_products_set
removed_products = old_products_set - new_products_set
```

You can learn more about these built-in containers in earlier lessons or in the Python Documentation.

Now that we have reviewed the most common containers in Python, let’s practise using them, and then move on to exploring the specialised containers we mentioned earlier!

### Introduction to Specialized Containers
Now that we’ve had a refresher on some of the built-in containers Python provides, let’s dive into the collections module.

The classes from the collections module are very similar to the built-in containers we’ve been already using, but they contain new methods and utilities. Each of these specialised containers focuses on a certain improvement to its built-in counterpart such as optimising performance, better organisation, fewer steps for performing tasks, and more!

In order to use classes from the collections module, we will first need to import the module into our code. This is different from the previous containers we’ve seen because they were built-in and did not require an import.

Here are some of the ways importing can look:



```
# To import a single class or multiple classes
from collections import name_of_class, name_of_another_class
# To import all classes in the collections module
from collections import *
# Import the module itself; its classes are then accessed as collections.name_of_class
import collections

```
For a more specific example, here is what importing the OrderedDict (one of the specialized containers) would look like. We will dive deeper into the details of this particular container later on in this lesson but for now just observe the syntax:






```
from collections import OrderedDict
orders = OrderedDict({'order_4829': {'type': 't-shirt', 'size': 'large', 'price': 9.99},
                      'order_6184': {'type': 'pants', 'size': 'medium', 'price': 14.99},
                      'order_2905': {'type': 'shoes', 'size': 12, 'price': 22.50}})
# Move 'order_4829' to the end of the OrderedDict
orders.move_to_end('order_4829')
# Remove and return the last (key, value) pair
orders.popitem()
```

You might have noticed that this is a similar example to the dictionary review in the last exercise, but there are some new methods that provide even more functionality to the traditional dictionary.

Here is a list of all of the advanced containers we will be looking at in this lesson:

Advanced Containers
- deque
- namedtuple
- Counter
- defaultdict
- OrderedDict
- ChainMap

Container Wrappers
- UserDict
- UserList
- UserString


#### Deque ≈ List
https://docs.python.org/3/library/collections.html#collections.deque

Lists are great at accessing data at any index, but they are not optimised for adding or removing elements at the front: every insert or pop at index 0 has to shift all of the remaining elements, which becomes slow for large amounts of data.

To solve this problem, we can use deque containers. These are similar to lists, but they are optimised for appending and popping at both the front and the back, at the cost of slower access to elements in the middle.

Because of this, they are great for working with data where you rarely or never need to access elements in the middle.

Here is a program that loads bug reports and places high-priority bugs at the front of the queue, implemented with a deque:


```
from collections import deque

bug_data = deque()
# get_all_bug_reports() is assumed to be defined elsewhere in the lesson;
# it returns a list of dictionaries that each have a 'priority' key
loaded_bug_reports = get_all_bug_reports()
for bug in loaded_bug_reports:
    if bug['priority'] == 'high':
        # With a deque, we can append to the front directly
        bug_data.appendleft(bug)
    else:
        bug_data.append(bug)
# With a deque, we can pop from the front directly
next_bug_to_fix = bug_data.popleft()
```
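Because get_all_bug_reports() is not shown, here is a self-contained sketch of the same idea with a hypothetical hard-coded list of reports. It also shows the maxlen argument, which turns a deque into a fixed-size buffer.

```python
from collections import deque

# Hypothetical bug reports standing in for get_all_bug_reports()
loaded_bug_reports = [
    {'id': 1, 'priority': 'low'},
    {'id': 2, 'priority': 'high'},
    {'id': 3, 'priority': 'medium'},
    {'id': 4, 'priority': 'high'},
]

bug_data = deque()
for bug in loaded_bug_reports:
    if bug['priority'] == 'high':
        bug_data.appendleft(bug)  # high priority jumps to the front
    else:
        bug_data.append(bug)      # everything else waits at the back

print([bug['id'] for bug in bug_data])  # [4, 2, 1, 3]
next_bug_to_fix = bug_data.popleft()
print(next_bug_to_fix['id'])            # 4

# maxlen keeps only the most recent items; older ones drop off the other end
recent = deque(maxlen=3)
for event in ['a', 'b', 'c', 'd']:
    recent.append(event)
print(list(recent))  # ['b', 'c', 'd']
```

Note that appendleft() places each new high-priority bug ahead of earlier ones, so among high-priority bugs the most recently loaded one is fixed first.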

#### Named Tuple ≈ Immutable Dict
The namedtuple collection allows us to create an immutable tuple in which every element is labeled with a name, so the data documents itself. Here is an example that stores data about an actor using a namedtuple:


```
from collections import namedtuple
# General Structure: namedtuple(typename, field_names, *, rename=False, defaults=None, module=None)
ActorData = namedtuple('ActorData', ['name', 'birth_year', 'movie', 'movie_release_date'])
```

In this example, we call namedtuple() to create a new tuple subclass. We pass it a typename, 'ActorData', and a sequence of strings, field_names, that represent the labels for the data we want to store.

We are saying we want our namedtuple to be called 'ActorData' and for it to have name, birth_year, movie, and movie_release_date fields. It’s like creating a label system for the type of data inside of the tuple!

We can then define an instance of our ActorData:


```
actor_data = ActorData('Leonardo DiCaprio', 1974, 'Titanic', 1997)
```

We can then access each value by its field name using dot notation:

```
print(actor_data.name)
# Leonardo DiCaprio
```


Some things to note about namedtuples:
- You may have noticed we use a CapWords convention when naming our namedtuple. This is because namedtuple() actually returns a new class (a subclass of tuple), so it falls under the naming conventions we use for classes.
- The field_names argument can alternatively be a single string with each field name separated by whitespace and/or commas, for example, 'x y' or 'x, y'.
- At first glance, a namedtuple might seem like it is trying to replicate a dictionary. While the key idea of labeling values is the same in both structures, namedtuples have some key advantages over a regular dictionary:
  - They are immutable, while a dictionary can be changed after it is created.
  - They are more lightweight than dictionaries and take no more memory than a regular tuple.

There are other useful features of a namedtuple, such as converting a namedtuple to a dict (._asdict()), creating a copy with some values replaced (._replace()), listing the field names (._fields), and setting default values for fields (the defaults argument). More information about namedtuple containers can be found in the Python Documentation.
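A short sketch of those features using the ActorData example. The Point namedtuple at the end is made up to illustrate defaults, which apply to the rightmost fields.

```python
from collections import namedtuple

ActorData = namedtuple('ActorData', ['name', 'birth_year', 'movie', 'movie_release_date'])
actor_data = ActorData('Leonardo DiCaprio', 1974, 'Titanic', 1997)

# A namedtuple is still a tuple: indexing and unpacking work
print(actor_data[2])  # Titanic
name, birth_year, movie, release = actor_data

print(ActorData._fields)              # the field names, as a tuple
print(actor_data._asdict()['movie'])  # Titanic, via a regular dict

# ._replace() returns a NEW instance; the original is unchanged
updated = actor_data._replace(movie='Inception', movie_release_date=2010)
print(updated.movie, actor_data.movie)  # Inception Titanic

# Fields cannot be reassigned, because namedtuples are immutable
try:
    actor_data.movie = 'Inception'
except AttributeError as error:
    print('AttributeError:', error)

# defaults= fills in the rightmost fields when they are omitted
Point = namedtuple('Point', 'x y', defaults=[0])
print(Point(5))  # Point(x=5, y=0)
```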


#### DefaultDict ≈ Dict
When we try to access a key in a dictionary, but the key does not exist, a dictionary will normally raise a KeyError. Take a look at this example of accessing an invalid key from a normal dictionary:
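A minimal sketch, using a small made-up price dictionary:

```python
prices = {'jeans': 19.99, 'shoes': 24.99}

try:
    print(prices['jacket'])  # 'jacket' was never added
except KeyError as error:
    print('KeyError:', error)  # KeyError: 'jacket'

# .get() is the built-in dict's way to supply a fallback without raising
print(prices.get('jacket', 'No Price Assigned'))
```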

Dealing with frequent KeyError exceptions can be quite cumbersome, and in certain cases it might be better to avoid raising an error at all. One way Python offers to deal with this issue is to supply a default value for missing keys, and this is exactly what the defaultdict collection does. Let’s explore this new collection together!

First, we import the class and set the default value:


```
from collections import defaultdict
# The lambda is called to produce the value for any missing key
validate_prices = defaultdict(lambda: 'No Price Assigned')
```

Notice the following:
- We pass defaultdict a callable (here, a lambda expression) that produces the default value.
- Any time we access a key that does not exist using square brackets, the defaultdict calls that function and automatically creates a new key-value pair using the missing key and the default value. The .get() method and the in keyword do not trigger this.


```
validate_prices['jeans'] = 19.99
validate_prices['shoes'] = 24.99
validate_prices['t-shirt'] = 9.99
validate_prices['blouse'] = 19.99

# Accessing 'jacket' also adds it to validate_prices with the default value
print(validate_prices['jacket'])
# No Price Assigned
```
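In data processing, defaultdict is often used with a built-in type as the default factory: int() returns 0, which suits counting, and list() returns an empty list, which suits grouping. A minimal sketch with hypothetical order records:

```python
from collections import defaultdict

# Hypothetical (product, order status) records
records = [('jeans', 'purchased'), ('shoes', 'returned'),
           ('jeans', 'returned'), ('jeans', 'purchased')]

status_counts = defaultdict(int)         # missing keys start at 0
statuses_by_product = defaultdict(list)  # missing keys start as []

for product, status in records:
    status_counts[status] += 1
    statuses_by_product[product].append(status)

print(dict(status_counts))  # {'purchased': 2, 'returned': 2}
print(dict(statuses_by_product))
# {'jeans': ['purchased', 'returned', 'purchased'], 'shoes': ['returned']}

# .get() and the in keyword do not create missing keys
print(status_counts.get('cancelled'))  # None
print('cancelled' in status_counts)    # False
```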


#### OrderedDict
When keeping track of many different dictionaries with the built-in Python containers, we could try storing dictionaries in a list, or even a dictionary of dictionaries. This may work in some cases, but there are a few problems which might come up.

When storing dictionaries in a list, the order is preserved, but we have to access the elements by their index before we can access the dictionary:


```
first_order = {'order_2905': {'type': 'shoes', 'size': 12, 'price': 22.50}}
second_order = {'order_6184': {'type': 'pants', 'size': 'medium', 'price': 14.99}}
third_order = {'order_4829': {'type': 't-shirt', 'size': 'large', 'price': 9.99}}
list_of_dicts = [first_order, second_order, third_order]

print(list_of_dicts[1]['order_6184']['price'])
# Output
# 14.99
```


On the other hand, depending on the Python version, the dict container can preserve the order, but it is difficult to move elements around:

```
dict_of_dicts = {}
first_order = {'order_2905': {'type': 'shoes', 'size': 12, 'price': 22.50}}
second_order = {'order_6184': {'type': 'pants', 'size': 'medium', 'price': 14.99}}
third_order = {'order_4829': {'type': 't-shirt', 'size': 'large', 'price': 9.99}}
dict_of_dicts.update(first_order)
dict_of_dicts.update(second_order)
dict_of_dicts.update(third_order)
print(dict_of_dicts['order_6184']['price'])
# Output
# 14.99
```


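Note: The dict class does not preserve insertion order in earlier versions of Python. The order is guaranteed from Python 3.7 (CPython 3.6 already preserved it as an implementation detail), so relying on it this way requires Python 3.7 or greater.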

To solve these issues, we can use an OrderedDict!
The OrderedDict container allows us to access values using keys, but it also preserves the order of the elements inside of it. Let’s take a closer look at the example of processing customer orders from earlier in the lesson:


```
from collections import OrderedDict
orders = OrderedDict()
# The order of the data is preserved when adding it to the OrderedDict:
orders.update({'order_2905': {'type': 'shoes', 'size': 12, 'price': 22.50}})
orders.update({'order_6184': {'type': 'pants', 'size': 'medium', 'price': 14.99}})
orders.update({'order_4829': {'type': 't-shirt', 'size': 'large', 'price': 9.99}})

# Data can be accessed using keys like a normal dictionary:
# Get a specific order
find_order = orders['order_2905']

# The order can be retrieved by converting it to a list then accessing by index:
# Get the data in a list format
orders_list = list(orders.items())
third_order = orders_list[2]

# When using an OrderedDict, we are able to use its methods for moving the data
# around. We can move an element to the back or front and pop the data from the
# back or front of the OrderedDict:
# Move an item to the end of the OrderedDict
orders.move_to_end('order_4829')
# Pop the last item in the dictionary
last_order = orders.popitem()
```


Note: Both methods also accept a boolean last argument (default True), which determines whether the element is moved to or popped from the back (True) or the front (False) of the OrderedDict.
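A minimal sketch of last=False with both methods, using simplified order values for readability:

```python
from collections import OrderedDict

orders = OrderedDict([('order_2905', 'shoes'),
                      ('order_6184', 'pants'),
                      ('order_4829', 't-shirt')])

# last=False moves the key to the FRONT instead of the end
orders.move_to_end('order_4829', last=False)
print(list(orders))  # ['order_4829', 'order_2905', 'order_6184']

# popitem(last=False) pops from the front (FIFO) instead of the back (LIFO)
first_order = orders.popitem(last=False)
print(first_order)   # ('order_4829', 't-shirt')
print(list(orders))  # ['order_2905', 'order_6184']
```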

```
from collections import OrderedDict

# The first 15 orders are provided
order_data = [['Order: 1', 'purchased'],
              ['Order: 2', 'purchased'],
              ['Order: 3', 'purchased'],
              ['Order: 4', 'returned'],
              ['Order: 5', 'purchased'],
              ['Order: 6', 'cancelled'],
              ['Order: 7', 'returned'],
              ['Order: 8', 'purchased'],
              ['Order: 9', 'returned'],
              ['Order: 10', 'cancelled'],
              ['Order: 11', 'purchased'],
              ['Order: 12', 'returned'],
              ['Order: 13', 'purchased'],
              ['Order: 14', 'cancelled'],
              ['Order: 15', 'purchased']]

# Checkpoint #1: build an OrderedDict from the (key, value) pairs
orders = OrderedDict(order_data)

# Checkpoint #2: collect the keys first, because the OrderedDict must not
# change size while we are iterating over it
to_move = []
to_remove = []
for key, val in orders.items():
    if val == 'returned':
        to_move.append(key)
    elif val == 'cancelled':  # must match the spelling used in order_data
        to_remove.append(key)

# Checkpoint #3: delete the cancelled orders
for item in to_remove:
    orders.pop(item)

# Checkpoint #4: move the returned orders to the end
for item in to_move:
    orders.move_to_end(item)

# Checkpoint #5
print(orders)
```

#### ChainMap
https://docs.python.org/3/library/collections.html#collections.ChainMap

There is another way to store dictionaries or other mappings in Python. We have looked at the defaultdict and OrderedDict so far and they handle a lot of situations so what else could we possibly need?

Well, the ChainMap container allows us to store many mappings in an ordered group. A lookup (accessing a value using a key) searches each mapping in turn until the key is found or the end is reached.

If we try to modify the data in any way, then only the first mapping in the ChainMap will receive the changes. When accessing data, one way to think of the ChainMap is that it treats all of the stored dictionaries as one large dictionary, where if there are repeated keys, then the first found result is returned. Let’s see what this looks like with an example using a customer’s clothing dimensions!


```
# First, we import the ChainMap container and set up our data.
from collections import ChainMap
customer_info = {'name': 'Dmitri Buyer', 'age': '31', 'address': '123 Python Lane', 'phone_number': '5552930183'}
shirt_dimensions = {'shoulder': 20, 'chest': 42, 'torso_length': 29}
pants_dimensions = {'waist': 36, 'leg_length': 42.5, 'hip': 21.5, 'thigh': 25, 'bottom': 18}

# Next, we initialise a ChainMap with the mappings which we want to use:
# the customer's details followed by the dimensions dictionaries.
customer_data = ChainMap(customer_info, shirt_dimensions, pants_dimensions)

# Now we can access values from any of the stored mappings.
customer_leg_length = customer_data['leg_length']  # 42.5, found in pants_dimensions

# The parents property skips the first mapping and returns a ChainMap of
# everything else (all of the parents of the first mapping).
customer_size_data = customer_data.parents

# Writes through the ChainMap only go to the first dictionary (customer_info).
customer_data['address'] = '456 ChainMap Drive'
```


Note: To modify data in dictionaries which are deeper in the ChainMap, we need to reach them directly, for example through the .maps attribute, a list of all of the stored mappings in order (customer_data.maps[1]['chest'] = 44 would update shirt_dimensions).

As we can see in this example, we create a new ChainMap using three different dictionaries. This allows us to access any of the key:value pairs stored inside.

Another interesting concept that the ChainMap uses is that of parent mappings. The .parents property returns a new ChainMap containing all mappings except the first one, because those mappings are considered to be the parent mappings of the first one. You can add a new “child” mapping to the front with the .new_child() method, which returns a new ChainMap rather than modifying the existing one.
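A minimal sketch of lookups, writes, .maps, .new_child(), and .parents together. The settings dictionaries are hypothetical and only serve to make the layering visible.

```python
from collections import ChainMap

defaults = {'size': 'medium', 'color': 'black'}
user_prefs = {'color': 'blue'}

settings = ChainMap(user_prefs, defaults)
print(settings['color'])  # blue: found in the first mapping
print(settings['size'])   # medium: falls through to defaults

# Writes only ever go to the first mapping
settings['size'] = 'large'
print(user_prefs)  # {'color': 'blue', 'size': 'large'}
print(defaults)    # {'size': 'medium', 'color': 'black'} (unchanged)

# .maps is the underlying list of mappings; use it to reach deeper ones
settings.maps[1]['color'] = 'white'
print(defaults['color'])  # white

# .new_child() returns a NEW ChainMap with an extra mapping in front
session = settings.new_child({'color': 'red'})
print(session['color'], settings['color'])  # red blue
# .parents is a ChainMap of every mapping except the first
print(session.parents['color'])  # blue
```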

Now let’s use a ChainMap to keep track of our clothes business profits for the last 12 months!


```
from collections import ChainMap

year_profit_data = [
    {'jan_profit': 15492.30, 'jan_holiday_profit': 2589.12},
    {'feb_profit': 17018.05, 'feb_holiday_profit': 3701.88},
    {'mar_profit': 11849.13},
    {'apr_profit': 9870.68},
    {'may_profit': 13662.34},
    {'jun_profit': 12903.54},
    {'jul_profit': 16965.08, 'jul_holiday_profit': 4360.21},
    {'aug_profit': 17685.69},
    {'sep_profit': 9815.57},
    {'oct_profit': 10318.28},
    {'nov_profit': 23295.43, 'nov_holiday_profit': 9896.55},
    {'dec_profit': 21920.19, 'dec_holiday_profit': 8060.79}
]

new_months_data = [
    {'jan_profit': 13977.85, 'jan_holiday_profit': 2176.43},
    {'feb_profit': 16692.15, 'feb_holiday_profit': 3239.74},
    {'mar_profit': 17524.35, 'mar_holiday_profit': 4301.92}
]

# Checkpoint #1: one ChainMap over all twelve monthly dictionaries
profit_map = ChainMap(*year_profit_data)

# Checkpoint #2
def get_profits(input_map):
    total_standard_profit = 0.0
    total_holiday_profit = 0.0
    for key in input_map.keys():
        if 'holiday' in key:
            total_holiday_profit += input_map[key]
        else:
            total_standard_profit += input_map[key]
    # Return only after every key has been summed
    return total_standard_profit, total_holiday_profit

last_year_standard_profit, last_year_holiday_profit = get_profits(profit_map)

# Checkpoint #3: new_child() returns a new ChainMap with the mapping in front,
# so the new jan/feb/mar values shadow last year's values for the same keys
for item in new_months_data:
    profit_map = profit_map.new_child(item)

current_year_standard_profit, current_year_holiday_profit = get_profits(profit_map)

# Checkpoint #4
year_diff_standard_profit = current_year_standard_profit - last_year_standard_profit
year_diff_holiday_profit = current_year_holiday_profit - last_year_holiday_profit
print(year_diff_standard_profit)
print(year_diff_holiday_profit)
```

#### Container Wrappers
In Python, wrappers are modifications to functions or classes which change the behaviour in some way. They are called wrappers because they “wrap” around the existing code to modify it. This is most commonly used with function wrapping, but we can also wrap classes. Let’s take a look at an example of a class wrapper:




```
# First, we need a class to wrap around.
class Customer:
    def __init__(self, name, age, address, phone_number):
        self.name = name
        self.age = age
        self.address = address
        self.phone_number = phone_number

# Next, we create a wrapper class which stores an object of the class we are
# wrapping around. It also includes some additional functionality.
# It does not inherit from Customer: the wrapped instance holds the data.
class CustomerWrap:
    def __init__(self, name, age, address, phone_number):
        self.customer = Customer(name, age, address, phone_number)

    def display_customer_info(self):
        print('Name: ' + self.customer.name)
        print('Age: ' + str(self.customer.age))
        print('Address: ' + self.customer.address)
        print('Phone Number: ' + self.customer.phone_number)

# Finally, we can create an object from the wrapper class to access the new
# functionality and the wrapped class contained inside.
customer = CustomerWrap('Dmitri Buyer', 38, '123 Python Avenue', '5557098603')
customer.display_customer_info()

# Output
# Name: Dmitri Buyer
# Age: 38
# Address: 123 Python Avenue
# Phone Number: 5557098603
```


Wrapper classes allow us to create different variations of classes with different purposes while avoiding duplicate code. Since we use an instance of the wrapped class inside of it, it preserves all of the attributes and methods from the wrapped class and keeps us from having to re-type all of the code.

In the case of containers, the collections module has three different wrapper classes set up for us to modify! Because of this, we can refer to them as wrapper containers. The advanced containers which we have already looked at are variations of the standard built-in containers, and wrapper containers let us create our own variations as well.

The three wrapper containers we will be looking at are:
- UserDict
- UserList
- UserString



#### UserDict
In this lesson, we have seen advanced containers which modify the functionality of a dictionary, such as the defaultdict and OrderedDict. The UserDict container wrapper lets us create our own version of a dictionary. This class contains all of the functionality of a normal dict, and it also exposes the underlying dictionary through its data attribute. Here’s an example of creating a modified dictionary:


```
from collections import UserDict

# Create a class which inherits from the UserDict class
class DisplayDict(UserDict):
    # A new method to increase the dictionary's functionality
    def display_info(self):
        print("Number of Keys: " + str(len(self.keys())))
        print("Keys: " + str(list(self.keys())))
        print("Number of Values: " + str(len(self.values())))
        print("Values: " + str(list(self.values())))

    # We can also override a method from the dictionary class
    def clear(self):
        print("Deleting all items from the dictionary!")
        super().clear()

disp_dict = DisplayDict({'user': 'Mark', 'device': 'desktop', 'num_visits': 37})
disp_dict.display_info()
# Number of Keys: 3
# Keys: ['user', 'device', 'num_visits']
# Number of Values: 3
# Values: ['Mark', 'desktop', 37]
disp_dict.clear()
# Deleting all items from the dictionary!
```

As shown in this code example, we can add new methods and override existing methods from the UserDict class. This is the same as inheriting from regular classes in Python.
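Overriding a special method such as __setitem__ shows why the data attribute is useful. A minimal sketch of a hypothetical dictionary that lowercases its string keys (LowerKeyDict is made up for this illustration, and only square-bracket access is normalised here):

```python
from collections import UserDict

class LowerKeyDict(UserDict):
    """Hypothetical dict that stores string keys in lowercase."""

    def __setitem__(self, key, value):
        # self.data is the real dict that a UserDict wraps
        if isinstance(key, str):
            key = key.lower()
        self.data[key] = value

    def __getitem__(self, key):
        if isinstance(key, str):
            key = key.lower()
        return self.data[key]

tags = LowerKeyDict({'Genre': 'rock'})  # the constructor goes through __setitem__
tags['MOOD'] = 'upbeat'
print(tags['genre'], tags['Mood'])  # rock upbeat
print(tags.data)  # {'genre': 'rock', 'mood': 'upbeat'}
```

This is a common reason to subclass UserDict rather than dict: the constructor and update() of a UserDict route through the overridden __setitem__, whereas the built-in dict's update() bypasses an overridden __setitem__ in a dict subclass.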

#### UserList
Not only can we create our own version of a dictionary, the UserList wrapper container lets us create our own list as well! This class contains all of the functionality of a regular list, but it also has a property called data which allows us to access the list contents directly. Here is an example of a modified list using the container wrapper:


```
from collections import UserList

# Create a class which inherits from the UserList class
class CondenseList(UserList):
    # A new method to remove duplicate items from the list
    # (converting to a set does not preserve the original order)
    def condense(self):
        self.data = list(set(self.data))
        print(self.data)

    # We can also override a method from the list class
    def clear(self):
        print("Deleting all items from the list!")
        super().clear()

condense_list = CondenseList(['t-shirt', 'jeans', 'jeans', 't-shirt', 'shoes'])
condense_list.condense()
# e.g. ['jeans', 'shoes', 't-shirt']  (order may vary)
condense_list.clear()
# Deleting all items from the list!
```


As shown in this code example, we can add additional methods and overwrite methods from the UserList class. This is the same as inheriting from regular classes in Python.
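Overriding a single list method works the same way. The sketch below uses a hypothetical UniqueList whose append() skips items that are already present; it also shows that slicing a UserList subclass returns an instance of that subclass rather than a plain list.

```python
from collections import UserList


class UniqueList(UserList):
    # Hypothetical example: append() silently skips items that are already present
    def append(self, item):
        if item not in self.data:
            self.data.append(item)


cart = UniqueList(["jeans"])
cart.append("jeans")   # already present, so it is skipped
cart.append("shoes")
print(cart.data)       # ['jeans', 'shoes']

# Slicing goes through UserList.__getitem__, which wraps the result in the subclass
print(type(cart[0:1]).__name__)  # UniqueList
```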

#### UserString
Since strings are also considered containers, the collections module also provides a container wrapper for the string class. This contains all of the functionality of a regular string, but it includes the string’s data inside of a property called data. Inheriting from this class allows us to create our own version of a string! Here is an example:


```
from collections import UserString

# Create a class which inherits from the UserString class
class IntenseString(UserString):
    # A new method to capitalise the string and add exclamation points
    def exclaim(self):
        self.data = self.data.upper() + '!!!'
        return self.data

    # Overwrite the count method so that it ignores its arguments
    # and always counts the uppercase letter 'P'
    def count(self, sub=None, start=0, end=None):
        return self.data.count('P')


intense_string = IntenseString("python rules")
print(intense_string.exclaim())  # PYTHON RULES!!!
print(intense_string.count())    # 1
```

This shows how we can add additional methods to the original container’s class or even overwrite existing methods. This is the same as inheriting from regular classes in Python.
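UserString methods that return a new string, such as upper(), wrap their result in the subclass, so custom methods can be chained after them. A short sketch with a hypothetical Shout class that, unlike IntenseString above, builds a new object instead of modifying self.data:

```python
from collections import UserString


class Shout(UserString):
    # Hypothetical helper: returns a new Shout with "!!!" added, leaving self.data unchanged
    def exclaim(self):
        return self.__class__(self.data + "!!!")


s = Shout("python rules")
loud = s.upper().exclaim()  # .upper() returns a Shout, so .exclaim() is still available
print(loud)  # PYTHON RULES!!!
print(s)     # python rules
```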

### Review of Specialized Containers
Nice work! We have learned about all sorts of advanced containers which can help make programming easier, more organised, and more optimised! We even learned how to make our own advanced containers using container wrappers. Let’s review the use case for each of the advanced containers from the collections module:

1. Deque
- An advanced container that is optimised for appending and popping items at both the front and the back. If we need to access many elements in the middle of the sequence, a list is a better choice.
2. Namedtuple
- The namedtuple lets us create an immutable data structure similar to a tuple, but we don’t have to access the stored data using indices. Instead, instances of a namedtuple have named attributes, and we can use the . operator to retrieve data by attribute name.
3. Counter
- This advanced container automatically counts the elements of the iterable (whose elements must be hashable) that we pass into its constructor. It stores the results like a dictionary, where the keys are the elements and the values are the number of occurrences.
4. Defaultdict
- An advanced container that behaves like a regular dictionary, except that it does not throw a KeyError when we access a key that does not exist. Instead, it creates a new key:value pair whose value is produced by calling the default factory (such as int or list) that we provide in the defaultdict constructor.
5. OrderedDict
- The OrderedDict remembers the order in which keys were inserted, combining list-like ordering with dictionary-style access by key. (Since Python 3.7, regular dicts also preserve insertion order; OrderedDict adds extras such as move_to_end() and order-sensitive equality checks between OrderedDict objects.)
6. ChainMap
- This interesting container combines multiple mappings into a single container. When accessing a value using a key, it will search through every mapping contained within until a match is found or the end is reached. It also provides some useful methods for grouping parent and child mappings.
7. UserDict
- This is a container wrapper which lets us create our own version of a dictionary
8. UserList
- This is a container wrapper which lets us create our own version of a list
9. UserString
- This is a container wrapper which lets us create our own version of a string
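As a quick refresher, the sketch below exercises several of the containers from the review list in one place; the sample data (sizes, preferences, SKUs) is made up for illustration.

```python
from collections import ChainMap, Counter, defaultdict, deque, namedtuple

# deque: fast appends and pops at both ends
queue = deque([2, 3])
queue.appendleft(1)
queue.append(4)
print(queue)  # deque([1, 2, 3, 4])

# namedtuple: fields are read by name with the . operator
Item = namedtuple("Item", ["sku", "price"])
shirt = Item("shirt_103985", 15.99)
print(shirt.price)  # 15.99

# Counter: counts elements; most_common(n) returns the top n
sizes = Counter(["M", "L", "M", "S", "M", "L"])
print(sizes.most_common(2))  # [('M', 3), ('L', 2)]

# defaultdict: a missing key gets a value produced by the factory (here list())
by_type = defaultdict(list)
by_type["pants"].append("pants_906841")
print(dict(by_type))  # {'pants': ['pants_906841']}

# ChainMap: lookups search each mapping in order until a match is found
defaults = {"theme": "light", "lang": "en"}
user_prefs = {"theme": "dark"}
settings = ChainMap(user_prefs, defaults)
print(settings["theme"], settings["lang"])  # dark en
```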


```
from collections import deque, namedtuple


overstock_items = [['shirt_103985', 15.99],
                   ['pants_906841', 19.99],
                   ['pants_765321', 15.99],
                   ['shoes_948059', 29.99],
                   ['shoes_356864', 9.99],
                   ['shirt_865327', 10.99],
                   ['shorts_086853', 9.99],
                   ['pants_267953', 21.99],
                   ['dress_976264', 32.99],
                   ['shoes_135786', 17.99],
                   ['skirt_196543', 12.99],
                   ['jacket_976535', 26.99],
                   ['pants_086367', 30.99],
                   ['dress_357896', 29.99],
                   ['shoes_157895', 14.99]]


# Write your code below!

# Checkpoint #1
split_prices = deque()

# Checkpoint #2: items over $20 go to the front, the rest go to the back
for item in overstock_items:
    if item[1] > 20.0:
        split_prices.appendleft(item)
    else:
        split_prices.append(item)
print(split_prices)

# Checkpoint #3
ClothesBundle = namedtuple('ClothesBundle', ['bundle_items', 'bundle_price'])

# Checkpoint #4: each bundle takes three items from the back (cheaper end)
# and two items from the front (pricier end)
bundles = []
while len(split_prices) >= 5:
    bundle_list = [split_prices.pop(), split_prices.pop(), split_prices.pop(),
                   split_prices.popleft(), split_prices.popleft()]
    calc_price = sum(b[1] for b in bundle_list)
    bundles.append(ClothesBundle(bundle_list, calc_price))

# Checkpoint #5
promoted_bundles = []
for bundle in bundles:
    if bundle.bundle_price > 100:
        promoted_bundles.append(bundle)

# Checkpoint #6
print(promoted_bundles)

# Alternatively, print one bundle per line:
# for bundle in promoted_bundles:
#     print(bundle)
```


- True or False: For a Counter, the .most_common() method will only return the one element which appears the most within the counted data.
  - Correct: False
  - The .most_common() method accepts an optional parameter n that determines how many top results are returned; without it, every element is returned, ordered from most to least common.
- What is a container wrapper?
  - A class which acts as a wrapper around certain built-in containers.
- What is the purpose of containers in Python?
  - Containers make it easy to store and work with data!
- Which of the following containers is an advanced container from the collections module?
  - Deque


## Resource Management


### Introduction to Context Managers

A context manager is an object that takes care of acquiring and releasing resources (files, database connections, etc.).
- Computers have finite resources, in the form of:
  - memory
  - storage
  - power
- If these resources are not managed well, the computer can run out of memory or space, and programs can even crash.

Learning to properly use context managers will give our software benefits such as:
- Preventing resource leaks
- Preventing crashes
- Decreasing the vulnerability of our data
- Preventing program slow-down


#### A Familiar Face: The with Statement
The `with` statement invokes a context manager. When we open a file with it, the file is automatically closed as soon as the with block finishes (even if an error occurs inside the block), so we never have to worry about forgetting to close the resource.

```
with open("file_name.txt", "w") as file:
    file.write("How you gonna win when you ain't right within?")
```

Here is what is happening in our small script:
- The with statement calls the built-in open() function on "file_name.txt" with a mode of "w" which represents write mode.
- The as clause assigns the object opened (the file) to a target variable called file, which can be accessed inside of the context manager.
- file.write() writes a sentence to "file_name.txt"


The with statement above is roughly equivalent to the following try/finally block:


```
file = open("file_name.txt", "w")
try:
  file.write("How you gonna win when you ain't right within?")
finally:
  file.close()
```
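To see that the with statement really does close the file even when something goes wrong inside the block, we can raise an exception on purpose. This sketch writes to a throwaway file in a temporary directory (an assumption, so the demo doesn't touch real files):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "file_name.txt")  # scratch file for the demo

try:
    with open(path, "w") as file:
        file.write("How you gonna win when you ain't right within?")
        raise ValueError("something went wrong mid-write")
except ValueError as err:
    print("Caught:", err)

print(file.closed)  # True: the with statement closed the file despite the error
```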







### Class Based Context Managers
We can also create our own context managers.
The class-based approach requires explicitly defining and implementing the following two methods inside of a class:
- An `__enter__()` method
  - The `__enter__()` method sets up the context manager.
  - This method commonly takes care of opening resources (like files).
  - It also begins the runtime context: the period during which the code inside the with block runs.

- An `__exit__()` method
  - The `__exit__()` method tears down the context manager.
  - This method commonly takes care of closing open resources that are no longer in use.


```
class PoemFiles:
    def __init__(self):
        print('Creating Poems!')

    def __enter__(self):
        print('Opening poem file')

    def __exit__(self, *exc):
        print('Closing poem file')
```

By defining these two methods, we are implementing the context management protocol - a guideline for the required methods for a context manager.

Invoking the context manager class with a with statement:
- Implementing the context management protocol allows us to immediately invoke the class using the with statement as shown below:


```
# __enter__() returns nothing, so `manager` is bound to None here
with PoemFiles() as manager:
    print('Hope is the thing with feathers')

# Output:
# Creating Poems!
# Opening poem file
# Hope is the thing with feathers
# Closing poem file
```


The methods of the PoemFiles class run in the following sequence:
- `__init__()` method
- `__enter__()` method
- The code in the with statement block
- `__exit__()` method
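The order of the calls can be checked directly. In this sketch, a hypothetical Tracer class records each step in a list instead of printing, and its `__enter__()` returns self so that the name after as refers to the manager (PoemFiles returns nothing, so its `manager` variable is None):

```python
class Tracer:
    # Hypothetical context manager that records when each of its methods runs
    def __init__(self, log):
        self.log = log
        self.log.append("__init__")

    def __enter__(self):
        self.log.append("__enter__")
        return self  # whatever __enter__() returns is bound to the name after `as`

    def __exit__(self, exc_type, exc_val, traceback):
        self.log.append("__exit__")
        return False  # do not suppress exceptions


events = []
with Tracer(events) as t:
    events.append("with block")

print(events)  # ['__init__', '__enter__', 'with block', '__exit__']
```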


#### Class Based Context Managers II


```
class WorkWithFile:
    def __init__(self, file, mode):
        self.file = file
        self.mode = mode

    def __enter__(self):
        self.opened_file = open(self.file, self.mode)
        return self.opened_file

    def __exit__(self, *exc):
        self.opened_file.close()
```


The `__init__()` method:
- self: This is standard for any class we work with and allows us to work with methods and properties we assign to an instance of a class.
- file: Since we are working with files, we need to be able to take in a file argument when we call the class with a with statement.
- mode: Lastly, we need to provide the file a mode. This allows us to manage what our context manager will actually be doing, such as reading, writing, or both!

The `__enter__()` method:
- Every instance of our context manager has a file and a mode attribute, so we can pass them into the open() function to open a specific file in a specific mode.
- We save the opened file as self.opened_file.
- Finally, the method returns self.opened_file, which is bound to the variable after as in the with statement.

The `__exit__()` method:
- This method closes the file we opened in `__enter__()`. It still takes a *exc argument, but we won’t touch on that until the next exercise. For now, this method is solely responsible for closing the resource.


```
with WorkWithFile("file.txt", "r") as file:
    print(file.read())
```
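Putting WorkWithFile to use for both writing and reading: a minimal round-trip sketch that repeats the class so it runs on its own, and that uses a file in a temporary directory (assumed here so the demo has a file to read):

```python
import os
import tempfile


class WorkWithFile:
    def __init__(self, file, mode):
        self.file = file
        self.mode = mode

    def __enter__(self):
        self.opened_file = open(self.file, self.mode)
        return self.opened_file

    def __exit__(self, *exc):
        self.opened_file.close()


path = os.path.join(tempfile.mkdtemp(), "file.txt")  # scratch location for the demo

with WorkWithFile(path, "w") as file:
    file.write("Hope is the thing with feathers")

with WorkWithFile(path, "r") as file:
    contents = file.read()

print(contents)     # Hope is the thing with feathers
print(file.closed)  # True
```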



### Handling Exceptions



#### Part 1 (def __exit__(self, *exc):)
The `__exit__()` method needs four total arguments!
- Until now we wrote `*exc`, which collects the extra arguments into a tuple. Python always passes them when the with block exits, even though we never passed them ourselves.

The `__exit__()` method has three required arguments (in addition to self):

1. exception type (exc_type): the class of the exception, for example the AttributeError class or the NameError class
2. exception value (exc_val): the actual value of the error
3. traceback: a report detailing the sequence of steps that caused the error and all the details needed to fix the error

```
# Example
class OpenFile:
    def __init__(self, file, mode):
        self.file = file
        self.mode = mode

    def __enter__(self):
        self.opened_file = open(self.file, self.mode)
        return self.opened_file

    def __exit__(self, exc_type, exc_val, traceback):
        print(exc_type)
        print(exc_val)
        print(traceback)
        self.opened_file.close()


with OpenFile("file.txt", "r") as file:
    # .see() is not a real method
    print(file.see())

# Output (the traceback address and layout vary by Python version):
# <class 'AttributeError'>
# '_io.TextIOWrapper' object has no attribute 'see'
# <traceback object at 0x7f08dcfb5040>
#
# Traceback (most recent call last):
#   File "script.py", line 20, in <module>
#     print(file.see())
# AttributeError: '_io.TextIOWrapper' object has no attribute 'see'
```

The output tells us that:
- we have an AttributeError
- our object has no attribute 'see'
- a traceback object was provided

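Sequence of events:
1. An error occurs inside the with block.
2. The rest of the with block is skipped.
3. `__exit__()` runs and receives the three exception arguments, so the resource (file) is still closed.
4. The exception is then either re-raised or suppressed, depending on what `__exit__()` returns.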
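When the with block finishes without an error, Python still calls `__exit__()`, but all three exception arguments are None. A small sketch with a hypothetical Inspect class that stores what it received, first for a clean block and then for one that raises a KeyError:

```python
class Inspect:
    # Hypothetical context manager that remembers the arguments passed to __exit__()
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        self.received = (exc_type, exc_val, traceback)
        return False  # do not suppress the exception


ok = Inspect()
with ok:
    pass
print(ok.received)  # (None, None, None)

bad = Inspect()
try:
    with bad:
        {}["missing"]  # raises KeyError
except KeyError:
    pass
print(bad.received[0])  # <class 'KeyError'>
```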


#### Part 2: Suppressing an Error

An exception that occurs in a context manager can be handled in two ways:
1. If we want the error to be raised, we can either:
  - Return False after the .close() method
  - Do nothing (the method then returns None, which also lets the error through)
2. If we want to suppress the error, we can:
  - Return True after the .close() method




```
# Example of suppressing an error with return True
class OpenFile:
    ...  # __init__() and __enter__() as in the earlier OpenFile example

    def __exit__(self, exc_type, exc_val, traceback):
        print(exc_type, exc_val, traceback)
        print("The exception has been handled")
        self.opened_file.close()
        return True


# We can choose to handle a specific exception
class OpenFile:
    ...  # __init__() and __enter__() as in the earlier OpenFile example

    def __exit__(self, exc_type, exc_val, traceback):
        self.opened_file.close()  # always release the resource first
        if isinstance(exc_val, TypeError):
            # Handle TypeError here...
            print("The exception has been handled")
            return True
        # Any other exception is re-raised, because the method returns None
```

This is useful if we want our context manager to let the rest of our code keep running, while also customising the output when a certain exception occurs. In the TypeError example, an if statement compares exc_val to the specific exception we are trying to catch. Anything we want to happen for this specific exception can occur in the conditional code block. Lastly, we return True to suppress the exception so that it does not stop the rest of our code from running.
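A self-contained sketch of the specific-exception pattern, assuming a hypothetical SuppressTypeError class that manages no real resource: a TypeError inside the block is swallowed, while a ValueError still reaches the caller.

```python
class SuppressTypeError:
    # Hypothetical context manager: suppresses TypeError, lets everything else propagate
    def __init__(self):
        self.handled = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, traceback):
        if isinstance(exc_val, TypeError):
            self.handled.append(str(exc_val))
            print("The exception has been handled")
            return True   # suppress this exception
        return False      # re-raise anything else (no effect if there was no error)


manager = SuppressTypeError()
with manager:
    total = 1 + "2"       # raises TypeError, which __exit__() suppresses
print("Still running after the TypeError")

propagated = False
try:
    with SuppressTypeError():
        int("abc")        # raises ValueError, which is not suppressed
except ValueError as err:
    print("ValueError reached the caller:", err)
    propagated = True
```

For the simple case of ignoring one exception type entirely, the standard library also provides `contextlib.suppress()`, used as `with suppress(TypeError):`.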


### Introduction to Contextlib
Python also has a built-in module called contextlib!

  - The contextlib module allows for the creation of a context manager with the use of a generator function (a function that uses yield instead of return) and the contextlib decorator - @contextmanager.
  - Instead of creating a class and defining `__enter__()` and `__exit__()` methods, we can use a simple function!

First, we will need to import the built-in module into our script and grab the @contextmanager decorator:

```
from contextlib import contextmanager
```

Syntax:

```
@contextmanager
def generator_function(<parameters>):
    <setup section - equivalent to __enter__>
    try:
        yield <value>
    finally:
        <cleanup section - equivalent to __exit__>
```

Example:

```
from contextlib import contextmanager

@contextmanager
def open_file_contextlib(file, mode):
    opened_file = open(file, mode)
    try:
        yield opened_file
    finally:
        opened_file.close()
```


We are doing a few things here:
1. We have written a generator function called open_file_contextlib that takes two arguments, a file and a mode.
2. We then use the built-in open() function to open the file (that we received as an argument) and save it to a variable called opened_file.
3. Inside a try statement, the function yields the opened file. The yielded value is bound to the variable after as, and the code inside the with block runs at this point. More on this in a bit!
4. Lastly, the finally clause closes the resource (file) once the with block has finished, even if an error occurred.
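To make the timing concrete, this sketch uses a hypothetical traced() generator that logs to a list: the code before yield runs when the with block is entered, the yielded value is bound by as, and the finally clause runs when the block exits.

```python
from contextlib import contextmanager


@contextmanager
def traced(log):
    log.append("setup")          # runs like __enter__()
    try:
        yield "resource"         # this value is bound to the name after `as`
    finally:
        log.append("cleanup")    # runs like __exit__(), even if the block raises


events = []
with traced(events) as value:
    events.append("body:" + value)

print(events)  # ['setup', 'body:resource', 'cleanup']
```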

Now that we have created the function and marked it as a context manager with the @contextmanager decorator, we can immediately use it in a with statement, just like before:

```
with open_file_contextlib('file.txt', 'w') as opened_file:
    opened_file.write('We just made a context manager using contextlib')
```

### Contextlib Error Handling
Earlier, we saw how the class-based method deals with exceptions. For the decorator method, errors are most commonly handled within an except block, which we add to our existing try/finally block. There are two main ways to deal with errors:
1. To throw an error and stop the execution of our entire program, we can:
  - do nothing
2. To catch errors and continue the execution of our program, we can:
  - Handle the exception via an except block.




```
from contextlib import contextmanager

@contextmanager
def open_file_contextlib(file, mode):
    open_file = open(file, mode)
    try:
        yield open_file
    # Exception Handling
    except Exception as e:
        print('We hit an error: ' + str(e))
    finally:
        open_file.close()


with open_file_contextlib('file.txt', 'w') as opened_file:
    # .sign() is not a real file method, so this raises an AttributeError
    opened_file.sign('We just made a context manager using contextlib')
```

Notice:
- The inclusion of the except clause
- The except clause catches a generic Exception and, if one is raised, saves it to a variable called e.
- The handler then prints out the error.

Note: we can catch any exception class, not just the generic Exception, if we know the specific exception we are trying to catch.

In the with statement above, .sign() is not a file method, so the output would look like this:


```
We hit an error: '_io.TextIOWrapper' object has no attribute 'sign'
```
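The except block can also be selective. In this sketch (the report_errors name is hypothetical), an AttributeError is handled and swallowed, while any other exception is logged and then re-raised with a bare raise so that the caller still sees it:

```python
from contextlib import contextmanager


@contextmanager
def report_errors(log):
    try:
        yield
    except AttributeError as e:
        # Handled here: the exception is swallowed and execution continues after the with block
        log.append("handled: " + type(e).__name__)
    except Exception:
        log.append("re-raising")
        raise  # every other error propagates to the caller


log = []
with report_errors(log):
    "text".sign()  # AttributeError: str has no .sign()

try:
    with report_errors(log):
        int("abc")  # ValueError
except ValueError:
    log.append("caller saw ValueError")

print(log)  # ['handled: AttributeError', 're-raising', 'caller saw ValueError']
```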






### Nested Context Managers
Sometimes we need to work with multiple files at once! For example, we might want to:
- Work with information from multiple files.
- Copy the same information to multiple files.
- Copy information from one file to another.

To accomplish this goal of working with multiple resources at once, context managers can be nested together in a with statement to manage multiple resources simultaneously.

Example

Suppose we have two files, teacher.txt and student.txt, and we want to copy all the information in the student file into the teacher file. Our code might look like this:

```
with open('teacher.txt', 'w') as teacher, open('student.txt', 'r') as student:
    teacher.write(student.read())
```

- The with statement is being called once but invoking two context managers. This is a single-line nested with statement.
- Each context manager is separated by a comma and has its own target variable.
- Our teacher.txt file is opened in write mode because it will be written into, and student.txt is opened in read mode because we are copying its text into the teacher’s file.
- The resulting teacher.txt file will now include everything that was in the student.txt file.
- Here we have chosen to use the open() built-in function rather than a custom context manager. It is entirely possible to use our own in place of the open() function.

or

```
with open("teacher.txt", "w") as teacher:
    with open("student.txt", "r") as student:
        teacher.write(student.read())
```

- The with statement is being called twice
- The with statement to open student.txt in read mode is nested in the code block of the with statement that opens teacher.txt in write mode.
- This method, though slightly longer, gives a clearer visual of nesting and is preferable when working with more than two context managers.
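The same single-line pattern scales to more than two managers, which covers the "copy the same information to multiple files" case listed above. A sketch using files in a temporary directory; backup.txt is a hypothetical second destination:

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # scratch directory so the demo doesn't touch real files
student_path = os.path.join(folder, "student.txt")
teacher_path = os.path.join(folder, "teacher.txt")
backup_path = os.path.join(folder, "backup.txt")  # hypothetical second destination

with open(student_path, "w") as student:
    student.write("Homework: 2 + 2 = 4\n")

# One with statement managing three files: read once, write the text to two destinations
with open(student_path, "r") as student, \
        open(teacher_path, "w") as teacher, \
        open(backup_path, "w") as backup:
    text = student.read()
    teacher.write(text)
    backup.write(text)

with open(teacher_path) as teacher, open(backup_path) as backup:
    print(teacher.read() == backup.read() == text)  # True
```

From Python 3.10, the managers can also be wrapped in parentheses across several lines instead of using backslashes.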


### Review
https://docs.python.org/3/library/contextlib.html

Context Managers:
- Context managers are a form of resource management in Python invoked by the with statement.
- They ensure that resources are closed/released after usage regardless of whether or not an error occurs.
- They can be created from scratch using either the class-based method or the contextlib decorator-based method.
- Behind every context manager, there is an `__enter__()` and an `__exit__()` method at work.
- Context managers can be nested together to work with resources simultaneously.

Class-Based Context Managers
- They can be created from scratch with the manual implementation of the `__enter__()` and `__exit__()` methods.
- The `__exit__()` method takes three arguments (in addition to self): an exception type, an exception value, and a traceback. The method can then handle exceptions.

Decorator Based Context Managers
- They can be created from scratch using the contextlib contextmanager decorator on a generator function.
- In the contextlib method, exceptions are handled in an except block placed around the yield statement.


# Advanced Python for Data Engineers

## Introduction

In this unit, you’ll learn how to leverage Python’s unique features and techniques to build powerful, sophisticated applications. You’ll also learn how to debug and track your software, write clean, efficient code, work with databases, and more. By the end of this unit you will be able to:
- Debug and monitor software with logging
- Create clean, efficient programs with functional programming
- Use the sqlite3 module to manage databases
- Implement code more efficiently with concurrent programming
- Deploy packages with Flask



## Logging in Python


### Introduction
Utilising a logging utility helps us:
- keep our code neatly organised
- make our code easy for other programmers to view and understand
- debug significantly more easily, by providing detailed information on
  - what caused an error,
  - what time the error occurred,
  - where inside the code the error occurred,
  - the values of variables that you need to debug.

With Python’s logging module, we can:
- Identify the date and time of a custom or error message
- Format logs to make debugging easier
- Set severity levels for the logs
- Output the logs to various streams


Dated Timestamps
- Knowing the date and time when a logged message or error occurred is useful for debugging or investigative purposes.
- The logging module provides a formatting option to include dated timestamps for logged messages.

Example:
```
[2021-11-08 03:16:05,980] {ERROR} login:attempt_login - Invalid credentials for username: foo!
```

Severity Level:
The logging module supports varying levels of severity for logged messages:
- NOTSET
- DEBUG
- INFO
- WARNING
- ERROR
- CRITICAL

Being able to set different severity levels for logged messages allows us to easily filter out messages below a chosen level.

For example, once our application reaches a production-ready state, we can filter out any logging statements that have a DEBUG severity level by setting a log level for the logger.

Log Files

Logs can be saved to a file, allowing the log messages to be accessed after the code has finished running. Additionally, the saved log file can then be indexed and made searchable through other software and applications.









### Creating a Logger
1. Import the logging module
```
import logging
```

2. Create a logger object using the `getLogger(name)` method.
```
logger = logging.getLogger(__name__)
```
- `getLogger()` takes a single optional input parameter, `name`, which represents the name of the logger.
- Calling `getLogger()` with the same `name` value returns the same logger object.
- If we give no name value, the root logger is returned.
- It is recommended to pass the built-in variable `__name__`, which holds the current module’s name.
- This reduces the chance of accidentally reusing a logger name and retrieving the wrong logger object.

3. Create a handler using the `StreamHandler` class
```
import sys
stream_handler = logging.StreamHandler(sys.stdout)
```

- A handler tells the logger where to send logged messages.
- The `stream` parameter is optional and defaults to `sys.stderr`.
- To direct output to standard output instead, pass `sys.stdout` as the `stream`.

4. Add the handler to the logger with the `addHandler(hdlr)` method:
```
logger.addHandler(stream_handler)
```

The hdlr input represents the handler object to add, which in our example is the StreamHandler object.



```
import logging
import sys

logger = logging.getLogger(__name__)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
```
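A quick check of the getLogger() rules above. The logger names here are hypothetical; a dotted name like "inventory.pricing" also makes that logger a child of "inventory":

```python
import logging

a = logging.getLogger("inventory")
b = logging.getLogger("inventory")
print(a is b)  # True: the same name returns the same logger object

root = logging.getLogger()  # no name: the root logger
print(root.name)  # root

child = logging.getLogger("inventory.pricing")
print(child.parent is a)  # True: dotted names form a hierarchy
```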

### Log Levels
The logging module has predefined logging levels that indicate specific levels of severity for a log message. Each logging level is a constant within the logging module with an associated numeric value. The higher this numeric value, the higher the severity of the log message. Each logging level is defined as:

| Level    | Value |
| -------- | ----- |
| NOTSET   | 0     |
| DEBUG    | 10    |
| INFO     | 20    |
| WARNING  | 30    |
| ERROR    | 40    |
| CRITICAL | 50    |

- Numeric value of 0: a logger set to `logging.NOTSET` does not filter on its own; instead, it looks for the first ancestor logger whose level is not NOTSET and inherits that logger’s level.
- Numeric value of 10: we should use `logging.DEBUG` to provide detailed information that is useful for debugging the application.
- Numeric value of 20: we should use `logging.INFO` for general operations where expected information or output is logged.
- Numeric value of 30: we should use `logging.WARNING` to alert us to a current or impending unexpected issue or error. The software or application continues to run despite the warning message.
- Numeric value of 40: we should use `logging.ERROR` to indicate serious problems that cause functionality within the software or application to break.
- Numeric value of 50: we should use `logging.CRITICAL` for the most severe errors and issues. These errors indicate that the software or application may stop running altogether.
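The level constants are plain integers on the logging module, so they can be inspected or compared directly:

```python
import logging

# Print each level constant with its numeric value
for name in ["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]:
    print(name, getattr(logging, name))

# getLevelName() maps a numeric level back to its name
print(logging.getLevelName(30))  # WARNING
```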



### Logging Errors and Messages
The logging module has several methods that we can use to log messages and errors with an assigned severity level. Those methods are:

- `debug(msg)` which logs a message with level `DEBUG`
```
logger.debug("message")
# or, equivalently:
logger.log(logging.DEBUG, "message")
```

- `info(msg)` which logs a message with level `INFO`
```
logger.info("message")
# or, equivalently:
logger.log(logging.INFO, "message")
```

- `warning(msg)` which logs a message with level `WARNING`
```
logger.warning("message")
# or, equivalently:
logger.log(logging.WARNING, "message")
```

- `error(msg)` which logs a message with level `ERROR`
```
logger.error("message")
# or, equivalently:
logger.log(logging.ERROR, "message")
```

- `critical(msg)` which logs a message with level `CRITICAL`
```
logger.critical("message")
# or, equivalently:
logger.log(logging.CRITICAL, "message")
```
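The convenience methods and the generic log() method produce identical records. This sketch captures output in an in-memory `io.StringIO` stream (instead of the console) so the result can be checked; the logger name and format string are assumptions for the demo:

```python
import io
import logging

buffer = io.StringIO()  # in-memory stream, so we can inspect exactly what was logged
logger = logging.getLogger("methods_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo's records away from any root handlers

handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.error("disk almost full")               # convenience method
logger.log(logging.ERROR, "disk almost full")  # generic method with an explicit level

print(buffer.getvalue())
```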

### Setting the Log Level
```
logger.setLevel(logging.DEBUG)
```

There are several cases where setting the log level can be helpful.
- If we are trying to troubleshoot a problem within our application that is causing it to crash, it would be optimal to see only the CRITICAL level log messages and filter out all other messages.
- If we are still in a development phase of our application and have to debug often, setting the log level to DEBUG is useful for obtaining helpful debugging log messages.
- When the application is ready for deployment, we can easily set the log level to WARNING to filter out developer-friendly information like DEBUG and INFO messages.

By default, the root logger’s level is WARNING, and a new logger without a level of its own inherits it. This means only log messages with level WARNING or higher are processed unless we change the level.

To change the log level for a logger, we can use a method called `setLevel(level)`.
- The `level` parameter is the log level to use, usually given as a constant such as `logging.DEBUG` (numeric value 10); a level name string such as `"DEBUG"` also works.
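Here is setLevel() filtering in action, again logging into an `io.StringIO` buffer so the surviving messages are easy to see. The logger name is hypothetical, and the level is set to WARNING explicitly at the start to mirror the default:

```python
import io
import logging

buffer = io.StringIO()
logger = logging.getLogger("level_demo")  # hypothetical logger name
logger.propagate = False
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))
logger.addHandler(handler)

logger.setLevel(logging.WARNING)  # mirrors the default effective level
logger.info("hidden")             # INFO (20) < WARNING (30): dropped

logger.setLevel(logging.INFO)
logger.info("now visible")

logger.setLevel(logging.ERROR)
logger.warning("filtered out")    # WARNING (30) < ERROR (40): dropped
logger.error("still shown")

print(buffer.getvalue())
```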

### Pipe Logging to a File



#### Logging output to console

```
import logging
import sys


logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)
```

#### Pipe Logging to a File (output.log)

```
import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
# stream_handler = logging.StreamHandler(sys.stdout)
# logger.addHandler(stream_handler)
file_handler = logging.FileHandler("output.log")
logger.addHandler(file_handler)
```

#### Logging to Console and File

```
import logging
import sys

logger = logging.getLogger(__name__)
logger.setLevel(logging.DEBUG)
file_handler = logging.FileHandler("calculator.log")
logger.addHandler(file_handler)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)   
```


To write logs to a saved file, we can use the logging module class FileHandler(filename). The filename initialization parameter is a string value that represents the filename for the written log file.

Similarly to the StreamHandler, we can add the FileHandler object to the logger by using the addHandler(hdlr) method as shown below:


```
file_handler = logging.FileHandler("output.log")
logger.addHandler(file_handler)
```
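A runnable version of logging to both a file and the console, writing the log file into a temporary directory (an assumption, so the demo leaves no output.log behind) and reading it back:

```python
import logging
import os
import sys
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "output.log")  # scratch location for the demo

logger = logging.getLogger("file_demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False

file_handler = logging.FileHandler(log_path)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(file_handler)
logger.addHandler(stream_handler)

logger.debug("written to both the file and the console")

# Detach and close the file handler so the log file is flushed and released
logger.removeHandler(file_handler)
file_handler.close()

with open(log_path) as log_file:
    contents = log_file.read()
print(contents)
```

FileHandler opens its file in append mode by default, so rerunning a script adds to an existing log rather than replacing it.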









### Formatting the Logs
https://docs.python.org/3/library/logging.html#logrecord-attributes

Another useful feature of the logging module is the ability to format log messages to include helpful information such as:
- timestamps
- the module name
- the line number

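A handler that we create ourselves without a formatter outputs only the message itself (`%(message)s`). When logging is configured with basicConfig() (which also happens automatically when a module-level function such as logging.warning() is called before any configuration), messages use this default format: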
```
%(levelname)s:%(name)s:%(message)s
```

- %(levelname)s is a string that represents the log level name
- %(name)s represents the string of the name of the logger
- %(message)s is the logged message string.
- : (colon) separates each of these strings

Example:
```
logger.warning("This is a warning!")
```

output:
```
WARNING:script:This is a warning!
```

We can create a custom formatter object with the logging module’s `logging.Formatter()` class, which accepts the format string as its first input value.

For example, to add a timestamp and the line number to the format:



```
formatter = logging.Formatter("[%(asctime)s] %(levelname)s:%(name)s:%(lineno)d:%(message)s")
```

```
# In use:
import logging
import sys

logger = logging.getLogger(__name__)
stream_handler = logging.StreamHandler(sys.stdout)
formatter = logging.Formatter("[%(asctime)s] %(levelname)s:%(name)s:%(lineno)d:%(message)s")
stream_handler.setFormatter(formatter)
logger.addHandler(stream_handler)
logger.warning("This is a warning!")

# Example output (the timestamp, logger name, and line number depend on where this runs):
# [2021-11-04 02:58:51,847] WARNING:script:10:This is a warning!
```

Key point:

- `setFormatter()` method will work on several different handler classes (`StreamHandler()` and `FileHandler()`)
  - we can set different formatting options for different handlers

For example, if we choose to log to both the console and a file, we can format the log messages differently for each one by applying a separate Formatter object to each handler. This can be useful if we need to, for example, log DEBUG messages to the console with more information in each log message than we need in our written log file.
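A sketch of two handlers with different formatters on the same logger. Two `io.StringIO` buffers stand in for the console and the log file, and the logger name and format strings are illustrative choices:

```python
import io
import logging

console = io.StringIO()  # stand-in for the console
logfile = io.StringIO()  # stand-in for a log file

logger = logging.getLogger("two_formats")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False

# Verbose format for the console: includes the logger name and line number
verbose = logging.StreamHandler(console)
verbose.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(lineno)d:%(message)s"))

# Terse format for the file
terse = logging.StreamHandler(logfile)
terse.setFormatter(logging.Formatter("%(levelname)s - %(message)s"))

logger.addHandler(verbose)
logger.addHandler(terse)

logger.debug("cache miss")

print(console.getvalue(), end="")  # DEBUG:two_formats:<line number>:cache miss
print(logfile.getvalue(), end="")  # DEBUG - cache miss
```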


### Using basicConfig()
The `basicConfig()` method allows for the basic configuration of logging, including the log level, handlers, log message formatting options, and more.

Let’s say we want to simplify our calculator code by using the basicConfig() method:

Default (no arguments passed to the basicConfig() method):

```
logger = logging.getLogger(__name__)
logging.basicConfig()
```

In that case:

- A StreamHandler object is automatically created and added to the root logger (messages from our logger propagate up to it).
- It logs to the default sys.stderr.
- The format is `%(levelname)s:%(name)s:%(message)s`.

Custom:
```
logger = logging.getLogger(__name__)
logging.basicConfig(filename='calculator.log',
                    level=logging.DEBUG,
                    format='[%(asctime)s] %(levelname)s - %(message)s')
```



- This code automatically adds a FileHandler object to the root logger that writes to the file calculator.log.
- It sets the root logger’s level to DEBUG.
- It formats all logged messages to include a
  - timestamp,
  - log level,
  - message information.
After the configuration is set, the logger object is still retrieved using the getLogger(name) method.
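A runnable take on the custom basicConfig() call, writing calculator.log into a temporary directory (an assumption for the demo). One caveat worth knowing: basicConfig() does nothing if the root logger already has handlers, so the sketch passes force=True (available from Python 3.8) to replace them:

```python
import logging
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "calculator.log")  # scratch location for the demo

# force=True removes any handlers already attached to the root logger;
# without it, basicConfig() silently does nothing if the root logger is already configured
logging.basicConfig(filename=log_path,
                    level=logging.DEBUG,
                    format="[%(asctime)s] %(levelname)s - %(message)s",
                    force=True)

logger = logging.getLogger(__name__)
logger.debug("adding 2 + 3")  # propagates to the root logger's FileHandler

with open(log_path) as log_file:
    contents = log_file.read()
print(contents)  # one line: a bracketed timestamp, then "DEBUG - adding 2 + 3"
```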


### Review
Congratulations! You completed the Logging lesson! In this lesson, we learned:
- The benefits of using the logging module over print statements
- How to create a logger object
- What log levels are
- How to log errors and messages with a given log level
- How to set the default log level
- How to log to the console, to a file, and to both
- How to format log messages
- How to simplify log configuration using basicConfig()

Questions:
- The basicConfig() method allows for:
  - Simple, one-line configuration of the logging module
- Which of the following is true of the log(level, msg) method?
  - It will log the input argument msg with the level that is associated with the numeric level input argument.
- Which of the following is NOT a logging severity level?
  - EXCEPTION
- How can log messages be directed to the console?
  - By using the StreamHandler class
- To configure the logger objects for individual modules, which should be used?
  - Use getLogger(name) to retrieve the logger object that can then be configured.
- What does the logging module do?
  - Logs formatted messages to the user that allow for events that occur within software to be tracked.


## Functional Programming in Python
In this article, we will explore the concept of functional programming, including how it differs from object-oriented programming.

This article is divided into the following sections:

- Introduction to functional programming
- Functional vs. object-oriented programming
- Declarative vs. imperative programming
- Writing functions in functional programming
- Using recursion instead of loops
- Passing functions as arguments to other functions


### Introduction
Get ready to become a master of functional programming! This content builds on material you may have seen in our Intermediate Python course. We will do a quick review of:
- tuples
- lambda functions
- map(), reduce(), filter()
- working with CSV and JSON files

Then we will dive into more advanced problems using these tools!

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/SQL%20Functional%20Programming%20in%20Python%201.png)

### Review of Lambda Functions
A lambda function is a short anonymous function that can accept several parameters but whose body is a single expression, so it returns only one value. Lambdas can be stored in a variable or defined inline as an argument to the function that accepts them.

https://www.codecademy.com/resources/docs/python/functions/anonymous-functions?page_ref=catalog


In [None]:
# Checkpoint 1 code goes here.
def odd_or_even(n, even_function, odd_function):
    if n % 2 == 0:
        return even_function(n)
    else:
        return odd_function(n)
# Checkpoint 2 code goes here.
square = lambda x: x * x
cube = lambda x: x * x * x
# Checkpoint 3 code goes here.
test = odd_or_even(5, cube, square)
print(test)  # 5 is odd, so odd_function (square) is applied: 25

25


### Review of filter(), map(), and reduce()
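- filter(), map(), and reduce() are higher-order functions provided by Python.
- Each one accepts a processing function and an iterable as arguments. map() and filter() return a new iterator, which can be converted to a tuple. reduce() instead combines the items into a single value.
- Similar to the exercise on lambdas, this section should serve as a refresher.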


In [None]:
"""filter()
"""
nums = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
# filter_values is an imperative equivalent of filter(); it takes a function (predicate) as an argument
def filter_values(predicate, lst):
    # Mutable list required because this example is imperative, not declarative
    ret = []
    for i in lst:
        if predicate(i):
            ret.append(i)
    return ret
filtered_numbers = filter_values(lambda x: x % 2 == 0, nums)
print(filtered_numbers)
# This will output the list: [2, 4, 6, 8, 10]

"""Into the following declarative code:
"""
nums = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
filtered_numbers = tuple(filter(lambda x: x % 2 == 0, nums))
print(filtered_numbers)
# This will output the tuple: (2, 4, 6, 8, 10)


[2, 4, 6, 8, 10]
(2, 4, 6, 8, 10)


In [None]:
"""map()
"""
nums = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
def mapper(function, lst):
    ret = []
    for i in lst:
        ret.append(function(i))
    return ret
mapped_numbers = mapper(lambda x: x*x, nums)
print(tuple(mapped_numbers))
# This will output: (1, 4, 9, 16, 25, 36, 49, 64, 81, 100)

"""Into the following declarative code:
"""
numbers = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
mapped_numbers = map(lambda x: x*x, numbers)
print(tuple(mapped_numbers))
# This will also output: (1, 4, 9, 16, 25, 36, 49, 64, 81, 100)


(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)
(1, 4, 9, 16, 25, 36, 49, 64, 81, 100)


Note:

We can use map() to iterate through a dictionary, for example to compute the cost of every item sold, and store the results in a tuple.

When a dictionary is passed as an iterable, the function iterates through the dictionary's keys.
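The note above can be checked directly. This is a small sketch that uses a shortened version of the `costs` dictionary from later in this lesson (item -> (units sold, price per unit in GBP)). It shows that map() receives the keys and that each key can be used to look up and compute an item's cost.

```python
# Shortened version of the costs dictionary used later in this lesson:
# item -> (number_of_units_sold, price_per_unit_GBP)
costs = {"shirt": (4, 13.00), "shoes": (2, 80.00), "socks": (5, 5.00)}

# Iterating over a dictionary yields its keys, so map() receives "shirt", "shoes", ...
keys = tuple(map(str.upper, costs))

# Use each key to look up its tuple and compute the cost of every item sold
item_costs = tuple(map(lambda item: costs[item][0] * costs[item][1], costs))

print(keys)        # ('SHIRT', 'SHOES', 'SOCKS')
print(item_costs)  # (52.0, 160.0, 25.0)
```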


**reduce()**

Without an initial value, reduce() starts from the first item, so the lambda passed to it should take two parameters of the same type and return a value of that type. For example, when `lambda x, y: x*y` is applied to integers, x, y, and the return value are all integers. Because iterating over a dictionary yields its keys, you cannot directly reduce a dictionary to a number. We must process the data in the dictionary first.
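Here is a short sketch of why the dictionary has to be processed first. It reuses two entries from the lesson's `costs` data. Reducing the dictionary itself combines its string keys, while mapping it to numbers first gives reduce() a sequence of floats to add.

```python
from functools import reduce

costs = {"shirt": (4, 13.00), "socks": (5, 5.00)}

# Reducing the dictionary directly: x and y are keys (strings), so + concatenates them
joined = reduce(lambda x, y: x + y, costs)

# Process the dictionary first so every item is a number, then reduce the numbers
item_costs = map(lambda item: costs[item][0] * costs[item][1], costs)
total = reduce(lambda x, y: x + y, item_costs)

print(joined)  # shirtsocks
print(total)   # 77.0
```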


In [None]:
"""reduce()
"""
nums = (2, 6, 7, 9, 1, 4, 8)
total = 0  # avoid naming this "sum", which would shadow the built-in sum()
for i in nums:
    total += i
print(total) # Output: 37

"""Using reduce() to find the sum declaratively:
In Python 3, the `reduce()` function has been moved to the `functools` library,
so we need to import it before we can use it.
"""
from functools import reduce
nums = (2, 6, 7, 9, 1, 4, 8)
reduced_nums = reduce(lambda x, y: x + y, nums) # reduced_nums is a number
print(reduced_nums) # Output: 37

37
37


Explanation:

```
from functools import reduce
# Define a list of numbers
numbers = [1, 2, 3, 4, 5]
# Use reduce() to find the product of all numbers in the list
product = reduce(lambda x, y: x * y, numbers)
print(product) # Output: 120
```

1. First, the functools module is imported to make the reduce() function available.
2. A list of numbers is defined and assigned to the variable numbers.
3. The reduce() function is used with two arguments: a lambda function and the list of numbers. The lambda function is defined as lambda x, y: x * y, which takes two arguments and returns their product.
4. The reduce() function applies the lambda function to the first two elements of the numbers list, which are 1 and 2. It multiplies them together, producing a result of 2.
5. The reduce() function then applies the lambda function to the result of the previous step (2) and the next element of the numbers list (3). It multiplies them together, producing a result of 6.
6. The reduce() function then applies the lambda function to the result of the previous step (6) and the next element of the numbers list (4). It multiplies them together, producing a result of 24.
7. The reduce() function then applies the lambda function to the result of the previous step (24) and the next element of the numbers list (5). It multiplies them together, producing a result of 120.
8. The reduce() function returns the final result (120), which is assigned to the variable product.
9. Finally, the value of the product is printed to the console, producing an output of 120.

So, the reduce() function is used to iteratively apply a lambda function to the elements of a list, in this case multiplying them together to find their product. The result is a single value that represents the product of all the numbers in the list.

In [None]:
"""Question
"""
nums = (16, 2, 19, 22, 10, 23, 16, 2, 27, 29, 19, 26, 12, 20, 16, 29, 6, 2, 12, 20)


# Checkpoint 1 code goes here.
filtered_numbers = filter(lambda y: y % 2 == 0, nums)
print(tuple(filtered_numbers))


# Checkpoint 2 code goes here.
mapped_numbers = map(lambda y: 3 * y, nums)
print(tuple(mapped_numbers))
# Checkpoint 3 code goes here.
from functools import reduce
total = reduce(lambda x, y: x + y, nums)  # "total" avoids shadowing the built-in sum()
print(total)

(16, 2, 22, 10, 16, 2, 26, 12, 20, 16, 6, 2, 12, 20)
(48, 6, 57, 66, 30, 69, 48, 6, 81, 87, 57, 78, 36, 60, 48, 87, 18, 6, 36, 60)
328


### Mapping a Filtered Collection
In this exercise, we will see how we can combine the map() and filter() functions.

Conceptually, if you’re working with a collection of items and find yourself saying "I need to map only values that have property x," you will likely need to use map() and filter() together. In English, "I need to map filtered values" translates into Python like this:

```map(mapping_function, filter(predicate, iterable))```



#### Example
Suppose we have a tuple of records of students in various math classes. Each record is structured in the following way: student(name, grade, course).
* As an example, student("Peter", 'B', 101) will represent a student named Peter who received a grade of B in a Math 101 course.

Students who receive a grade of B or better in their respective math courses are eligible for a special advanced math course: Math 201. To create the records for this course, we will filter all students with grade B or higher and map their math course to 201.
* student("Peter", 'B', 101) → student("Peter", 'X', 201)

The grade in the new tuple will contain the letter ‘X’ to show that it is not yet defined, because the student has not yet finished Math 201.
We can do this in Python like so:


In [None]:
from collections import namedtuple
# Create a class called student
student = namedtuple("student", ["name", "grade", "course_number"])
# Create the records for the students in the form of tuples
peter = student("Peter", 'B', 101)
amanda = student("Amanda", 'C', 101 )
sarah = student("Sarah", 'A', 102)
lisa = student("Lisa", 'D', 101)
alex = student("Alex", 'A', 102)
maria = student("Maria", 'B', 101)
andrew = student("Andrew", 'C', 102)
math_class = (peter, amanda, sarah, lisa, alex, maria, andrew)

# Create a new iterable 'math_201' using the map() function.
# For each student in the 'math_class' iterable, this code:
# 1. Filters students whose grade is 'B' or better. Single letter grades compare
#    alphabetically, so q.grade <= 'B' keeps 'A' and 'B'.
# 2. Maps each filtered student to a new student record with grade 'X' and course number 201.
# 3. Both steps use lambda functions.

math_201 = map(lambda s: student(s.name, 'X', 201), filter(lambda q: q.grade <= 'B', math_class))

# Convert the 'math_201' iterable to a tuple and print the result.
print(tuple(math_201))
# Output: (student(name='Peter', grade='X', course_number=201), student(name='Sarah', grade='X', course_number=201), student(name='Alex', grade='X', course_number=201), student(name='Maria', grade='X', course_number=201))

### Reducing a Filtered Collection
In this exercise, we will explore how to use reduce() and filter() together.


In [None]:
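# Use the reduce() function to find the cheapest appetizer in the 'menu' iterable.
# Assumes 'menu' is a tuple of menu_item namedtuples (with dish_type and price fields) defined earlier in the exercise.
# reduce() takes two arguments: a lambda function and an iterable.
# It repeatedly applies the lambda function to pairs of items in the iterable until a single result is obtained.
from functools import reduce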

cheapest_app = reduce(
    lambda x, y: x if x.price < y.price else y,  # Lambda function compares prices of two menu items and returns the cheaper one.
    filter(lambda x: x.dish_type == "Appetizer", menu)  # Filter the 'menu' to select only items of type "Appetizer".
)

# Print the result, which is the cheapest appetizer in the 'menu'.
# The expected output is the menu item with the lowest price.
print(cheapest_app)

# Output will be: menu_item("Sizzling Canadian Bacon", "Appetizer", 9.95)


In the reduce() function, the lambda lambda x, y: x if x.price < y.price else y returns the cheaper of the two menu_items compared by their price.

The filter() function returns an iterable with all menu_items that have “Appetizer” as dish_type. This iterable is then passed into reduce() to be processed by its lambda function. Combining these two functions converts the imperative code into a one-line solution!
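The `menu` data from the exercise is not shown here. This is a self-contained sketch with a small hypothetical menu. Only the "Sizzling Canadian Bacon" entry and the field names dish_type and price come from the lesson, and `name` is an assumed first field. It compares the imperative loop with the filter() + reduce() one-liner.

```python
from collections import namedtuple
from functools import reduce

# Hypothetical menu; "Garlic Bread" and "Roast Chicken" are made-up entries
menu_item = namedtuple("menu_item", ["name", "dish_type", "price"])
menu = (
    menu_item("Garlic Bread", "Appetizer", 10.50),
    menu_item("Sizzling Canadian Bacon", "Appetizer", 9.95),
    menu_item("Roast Chicken", "Entree", 21.00),
)

# Imperative version: track the cheapest appetizer seen so far
cheapest_loop = None
for item in menu:
    if item.dish_type == "Appetizer":
        if cheapest_loop is None or item.price < cheapest_loop.price:
            cheapest_loop = item

# Declarative version: filter to appetizers, then reduce to the cheapest one
cheapest_app = reduce(
    lambda x, y: x if x.price < y.price else y,
    filter(lambda x: x.dish_type == "Appetizer", menu),
)

print(cheapest_loop == cheapest_app)  # True
print(cheapest_app.name)              # Sizzling Canadian Bacon
```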


In [None]:
# Checkpoint 1:
# Find the most expensive "Entree" in the 'menu' iterable.
entree = reduce(
    lambda x, y: x if x.price > y.price else y,  # Lambda function compares prices and returns the more expensive item.
    filter(lambda x: x.dish_type == 'Entree', menu)  # Filter the 'menu' to select only items of type "Entree."
)
print(entree)

# Checkpoint 2:
# Find the least expensive item that is either a "Salad" or a "Side" in the 'menu' iterable.
least_expensive = reduce(
    lambda x, y: x if x.price < y.price else y,  # Lambda function compares prices and returns the cheaper item.
    filter(lambda x: x.dish_type == 'Salad' or x.dish_type == 'Side', menu)  # Filter the 'menu' to select "Salad" or "Side" items.
)
print(least_expensive)


### Reducing a Mapped Collection
In this exercise, we will be focussing on the benefits of using reduce() and map() together.


In [None]:
# Import the 'reduce' function from the 'functools' module.
from functools import reduce

# Create a dictionary 'costs' where each entry represents an item and its associated cost information.
costs = {
    "shirt": (4, 13.00),   # Example: 4 shirts sold at 13.00 GBP each
    "shoes": (2, 80.00),   # Example: 2 pairs of shoes sold at 80.00 GBP each
    "pants": (3, 100.00),  # Example: 3 pairs of pants sold at 100.00 GBP each
    "socks": (5, 5.00)     # Example: 5 pairs of socks sold at 5.00 GBP each
}

# Use the 'map' and 'reduce' functions to calculate the total cost.
# The 'map' function multiplies the number of units sold by the price per unit for each item.
# The 'reduce' function then adds up the individual costs to find the total cost.

# Calculate the total cost and assign it to 'k' using 'reduce'.
k = reduce(
    lambda x, y: x + y,  # Lambda function to sum up individual costs.
    map(lambda q: costs[q][0] * costs[q][1], costs)  # Map each item to its cost and then sum them up.
)

# Print the total cost, which is the result of the calculations.
print(k)  # Output will be the total cost of: 537.0 GBP



537.0


The dictionary is passed into map() along with the lambda `lambda q: costs[q][0] * costs[q][1]`. Because iterating over a dictionary yields its keys, q is an item name such as "shirt". The lambda uses it to look up the item's tuple and computes the total cost for that item by multiplying the number of units sold (costs[q][0]) by the price per unit in GBP (costs[q][1]). The lambda in the reduce() function now works strictly with numbers (floats) and sums them, returning a total cost of £537.

### Combining all Three Higher-Order Functions

Now that you’ve learned how to combine any two functions, let’s see how (and why) we can combine all three! A reason for doing this would be when you need to “filter” a collection before you “map” it (or “map” then “filter”) and then “reduce” it to a single number.

Example:

Let’s revisit the inventory problem from earlier. This time we are only interested in the total sum of prices of items that sold for £150 or less. We do this by:

* First “map” the items to their individual total cost ((number of units sold) * (price per unit)).
* Then eliminate (“filter” out) all items that cost more than £150.
* Then “reduce” the individual costs to a single number that represents the total cost of the items.



In [None]:
# Import the 'reduce' function from the 'functools' module.
from functools import reduce

# Create a dictionary 'costs' where each entry represents an item and its associated cost information.
costs = {
    "shirt": (4, 13.00),   # Example: 4 shirts sold at 13.00 GBP each
    "shoes": (2, 80.00),   # Example: 2 pairs of shoes sold at 80.00 GBP each
    "pants": (3, 100.00),  # Example: 3 pairs of pants sold at 100.00 GBP each
    "socks": (5, 5.00),    # Example: 5 pairs of socks sold at 5.00 GBP each
    "ties": (3, 14.00),    # Example: 3 ties sold at 14.00 GBP each
    "watch": (1, 145.00)   # Example: 1 watch sold at 145.00 GBP
}

# Use the 'map', 'filter', and 'reduce' functions to calculate the total cost.
# 1. The 'map' function calculates the cost of each item by multiplying the number of units sold by the price per unit.
# 2. The 'filter' function selects only those costs that are less than or equal to 150.00 GBP.
# 3. The 'reduce' function then adds up the individual costs that passed the filter to find the total cost.

# Calculate the total cost and assign it to 'k' using 'reduce'.
k = reduce(
    lambda x, y: x + y,  # Lambda function to sum up the filtered costs.
    filter(
        lambda r: r <= 150.00,  # Lambda function filters costs less than or equal to 150.00 GBP.
        map(lambda q: costs[q][0] * costs[q][1], costs)  # Map each item to its cost.
    )
)

# Print the total cost, which is the result of the calculations.
print(k)  # Output will be the total cost of: 264.0 GBP


In [None]:
# Import the 'reduce' function from the 'functools' module.
from functools import reduce

# Create a dictionary 'costs' where each entry represents an item and its associated cost information.
costs = {
    "shirt": (4, 13.00),   # Example: 4 shirts sold at 13.00 GBP each
    "shoes": (2, 80.00),   # Example: 2 pairs of shoes sold at 80.00 GBP each
    "pants": (3, 100.00),  # Example: 3 pairs of pants sold at 100.00 GBP each
    "socks": (5, 5.00),    # Example: 5 pairs of socks sold at 5.00 GBP each
    "ties": (3, 14.00),    # Example: 3 ties sold at 14.00 GBP each
    "watch": (1, 145.00)   # Example: 1 watch sold at 145.00 GBP
}

# Create a tuple 'nums' containing a list of numbers.
nums = (24, 6, 7, 16, 8, 2, 3, 11, 21, 20, 22, 23, 19, 12, 1, 4, 17, 9, 25, 15)

# Checkpoint 1:
# Calculate the total cost of items from the 'costs' dictionary where the individual cost is greater than 150.00 GBP.
total = reduce(
    lambda x, y: x + y,  # Lambda function to sum up the filtered costs.
    filter(
        lambda r: r > 150.00,  # Lambda function filters costs greater than 150.00 GBP.
        map(lambda q: costs[q][0] * costs[q][1], costs)  # Map each item to its cost.
    )
)
print(total)

# Checkpoint 2:
# Calculate the product of the numbers in the 'nums' tuple that are less than 10, after adding 5 to each of them.
product = reduce(
    lambda x, y: x * y,  # Lambda function to calculate the product of filtered numbers.
    map(
        lambda x: x + 5,  # Lambda function to add 5 to each number.
        filter(
            lambda x: x < 10,  # Lambda function filters numbers less than 10.
            nums  # Filter the numbers in the 'nums' tuple.
        )
    )
)
print(product)


460.0
72648576


### Importing Data From a CSV File
### (Pointer)
In this exercise we will use map() to import the data from the CSV file and represent it using a namedtuple.


```
Index | Square footage | Year | List price (USD)
------+----------------+------+------------------
   1  | 2222           | 1981 | 250000          
------+----------------+------+------------------
   2  | 1628           | 2009 | 185000           
------+----------------+------+------------------
   3  | 3824           | 1954 | 399000          
------+----------------+------+------------------
   4  | 1137           | 1993 | 150000          
------+----------------+------+------------------
   5  | 3560           | 1973 | 315000          
```




In [None]:
# Import the 'csv' module for reading CSV files and 'namedtuple' from 'collections' to create named tuples.
import csv
from collections import namedtuple

# Define a namedtuple called 'house' with field names: index, square_footage, year, and list_price.
house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Open and read the CSV file 'zillow.csv'.
with open('zillow.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')

    # Read the records from the CSV file and convert them into named tuples.
    # map() is lazy, so the records must be consumed while the file is still open.
    houses = map(
        lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])),  # Create a 'house' namedtuple for each record.
        reader  # Iterate through the CSV records.
    )

    # Convert the map object to a tuple of 'house' named tuples and print it.
    print(tuple(houses))


A CSV file can contain millions of lines of data. Importing the entire contents of such a file at once is impractical, because it would occupy too much RAM and hurt program performance. To avoid importing all the data at once, the object returned by csv.reader() is an iterator. It maintains a pointer into the file and reads the next row only when next(reader) is called, or when a loop or higher-order function asks for it.

```reader = csv.reader(csvfile, delimiter=',', quotechar='|')```
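This is a sketch of the pointer behaviour described above. Here io.StringIO stands in for an open zillow.csv so the example runs without the file. The data rows come from the table earlier in this section, and the header line is an assumption. Nothing is converted until next() pulls a row through map().

```python
import csv
import io
from collections import namedtuple

# io.StringIO stands in for an open zillow.csv; the header line is assumed
csvfile = io.StringIO(
    "index,square_footage,year,list_price\n"
    "1,2222,1981,250000\n"
    "2,1628,2009,185000\n"
    "3,3824,1954,399000\n"
)
house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

reader = csv.reader(csvfile)
header = next(reader)  # advances the pointer past the header row only

# map() is lazy: no data row has been read or converted yet
houses = map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader)

first = next(houses)  # reads and converts exactly one data row
print(header)  # ['index', 'square_footage', 'year', 'list_price']
print(first)   # house(index=1, square_footage=2222, year=1981, list_price=250000)
```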

In [None]:
# Import the 'csv' module for reading CSV files, 'namedtuple' from 'collections' to create named tuples,
# and 'reduce' from 'functools' for potential future use.
import csv
from collections import namedtuple
from functools import reduce

# Checkpoint 1:
# Define a namedtuple 'tree' with field names: index, width, height, and volume.
tree = namedtuple("tree", ["index", "width", "height", "volume"])

# Open and read the CSV file 'trees.csv'.
with open('trees.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    next(reader)  # Skip the first line in trees.csv that contains the data labels.

    # Checkpoint 2:
    # Use the 'map()' function to create a 'tree' namedtuple for each row in the CSV.
    mapper = map(
        lambda row: tree(int(row[0]), float(row[1]), int(row[2]), float(row[3])),  # Create a 'tree' namedtuple for each row.
        reader  # Iterate through the CSV rows.
    )

    # Consume the lazy map object while the file is still open, storing the results in a tuple.
    trees = tuple(mapper)

# Print the 'trees' tuple, which contains named tuples representing tree data from the CSV file.
print(trees)


### (Pointer)
- When working with a file that contains a large amount of data, generating every possible record using tuple() is inefficient.
- We can apply the higher-order functions to this iterator to process the data and only bring in the relevant set of records.

Example:

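Let’s say we only want to retain tuples for houses built after 1985. We can do this by applying filter() with the lambda `lambda q: q[2] > 1985` like so:

```
import csv
from collections import namedtuple
from functools import reduce

house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

with open('zillow.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    # q[2] is the year field of each converted record; filter() keeps houses built after 1985
    h = filter(lambda q: q[2] > 1985, map(lambda x: house(int(x[0]), int(x[1]), int(x[2]), int(x[3])), reader))
    houses = tuple(h)  # consume the iterator before the file is closed
print(houses) # Print out tuples of houses built after 1985

# Now let’s say we want the most expensive house built after 1985. We can use reduce() to do this for us like so:
most_expensive = reduce(lambda x, y: x if x.list_price > y.list_price else y, houses)
print(most_expensive)

"""Output:
Prints the tuple of the most expensive house built after 1985
"""
```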

Question


```
import csv
from collections import namedtuple
from functools import reduce
tree = namedtuple("tree", ["index", "width", "height", "volume"])
with open('trees.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, delimiter=',', quotechar='|')
    next(reader)
    mapper = map(lambda x: tree(int(x[0]), float(x[1]), int(x[2]), float(x[3])), reader)

    # Checkpoint 1 code goes here.
    t = filter(lambda row: row.height > 75, mapper)
    trees = tuple(t)

    # Checkpoint 2 code goes here.
    widest = reduce(lambda x, y: x if x.width > y.width else y, trees)
    print(widest)
```





### Processing Data From a JSON File

Data:


In [None]:
data = {
  "city": [
      {
           "name": "New York",
           "country": "United States of America",
           "coordinates": [40.7128, -74.0060],
           "continent": "North America"
       },
       {
           "name":"Los Angeles",
           "country": "United States of America",
           "coordinates": [34.0522, -118.2437],
           "continent": "North America"
       },
       {
           "name": "Montreal",
           "country": "Canada",
           "coordinates": [45.5017, -73.5673],
           "continent": "North America"
       },
       {
           "name": "Toronto",
           "country": "Canada",
           "coordinates": [43.6532, -79.3832],
           "continent": "North America"
       }
    ]
}

import json
from collections import namedtuple
from functools import reduce


city = namedtuple("city", ["name", "country", "coordinates", "continent"])


# with open('cities.json') as json_file:
#  data = json.load(json_file)


cities = map(lambda x: city(x["name"], x["country"], x["coordinates"], x["continent"]), data["city"])


# Code for Checkpoint 1 goes here.
asia = tuple(filter(lambda row: row.continent =="Asia" ,cities))
print(asia)


# # Code for Checkpoint 2 goes here.
# west = reduce(lambda x,y: x if x.coordinates[1] < y.coordinates[1] else y,asia)
# print(west)

()
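The sample data contains no Asian cities, so the filter above returns an empty tuple. Calling reduce() on an empty iterable without an initial value raises a TypeError. The sketch below runs the same filter-then-reduce idea on a continent that is present. It parses an in-memory JSON string, which stands in for cities.json. `json.load(file)` does the same for an open file object.

```python
import json
from collections import namedtuple
from functools import reduce

# JSON text standing in for cities.json (a subset of the data above)
raw = """{"city": [
  {"name": "New York", "country": "United States of America",
   "coordinates": [40.7128, -74.0060], "continent": "North America"},
  {"name": "Los Angeles", "country": "United States of America",
   "coordinates": [34.0522, -118.2437], "continent": "North America"},
  {"name": "Toronto", "country": "Canada",
   "coordinates": [43.6532, -79.3832], "continent": "North America"}
]}"""
data = json.loads(raw)  # a dict whose "city" key holds a list of dicts

city = namedtuple("city", ["name", "country", "coordinates", "continent"])
cities = map(lambda x: city(x["name"], x["country"], x["coordinates"], x["continent"]), data["city"])

north_america = tuple(filter(lambda c: c.continent == "North America", cities))

# Westernmost city = smallest longitude (coordinates[1]); the tuple must not be empty
west = reduce(lambda x, y: x if x.coordinates[1] < y.coordinates[1] else y, north_america)
print(west.name)  # Los Angeles
```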


### Review
- Functional programming follows the declarative approach to programming; the programmer describes what needs to be done as opposed to how it is done.
- Functional programming relies heavily on immutable data structures.
- In functional-style code, prefer a tuple over a list, because a tuple is immutable.
- Using a namedtuple to store data leads to a more readable data structure.
- The namedtuple collection resides in the collections library.
- A namedtuple type is created like so:
```
name = namedtuple("name", ["property1", "property2", …, "propertyN"])
```
- Representing an object by a namedtuple is done like so:
```
obj = name(property1, property2, …, propertyN)
```
- Referencing an element of a namedtuple is done like this:
```
property1 = obj.property1
```
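The three namedtuple patterns above can be put together in one short sketch. It uses the `house` type from the CSV exercise as an illustration. It also shows the immutability mentioned in this review: assigning to a field fails, and `_replace()` returns a new record instead.

```python
from collections import namedtuple

# Create the namedtuple type (fields borrowed from the CSV exercise)
house = namedtuple("house", ["index", "square_footage", "year", "list_price"])

# Represent an object with the namedtuple
h = house(1, 2222, 1981, 250000)

# Reference an element by field name or by position
print(h.year)  # 1981
print(h[2])    # 1981

# namedtuples are immutable: assigning to a field raises AttributeError
try:
    h.year = 2000
except AttributeError:
    print("cannot modify a namedtuple")

# _replace() returns a new namedtuple and leaves the original unchanged
newer = h._replace(year=2000)
print(h.year, newer.year)  # 1981 2000
```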



- A lambda function is an anonymous function.
- A lambda function can be stored in a variable like any other data type.
- A lambda function can be defined directly while passing it as an argument of another function.

- The three higher-order functions Python provides are:
  - `map()`
  - `filter()`
  - `reduce()`
- The iterator returned by map() or filter() can be supplied as the iterable parameter to any of the higher-order functions, so the calls can be nested:
```
Ex:
h = map(mapping_function, filter(predicate, map(another_mapping_function, iterable)))
```

- Combining `map()` and `filter()` filters an iterable before a mapping function is applied.
- Combining `reduce()` and `filter()` filters an iterable before it is reduced.
- A CSV file can be processed by supplying the CSV reader as the iterable to the higher-order functions.
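- json.load() reads a JSON file into Python objects; a top-level JSON object becomes a dictionary.
- Unlike csv.reader(), json.load() does not provide a lazy, row-by-row iterator: the whole file is parsed at once.
- To process JSON data with the higher-order functions, pass them an iterable taken from the loaded data, such as the list data["city"].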






## QUERYING SQLITE DATABASES WITH PYTHON


### What is SQL?

SQL, which stands for Structured Query Language, is a programming language designed to manage data stored in a relational database. A relational database is a database that organizes information into one or more data tables.

Databases Contain Data Tables

These data tables each contain many fields and records. Below is an image of a data table, where we see:
- fields: the columns or attributes of the table
- records: the rows or observations associated with each field

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%201.png)

The next image is a data table from a SQL relational database. This is a customer data table where the fields are: first_name, last_name, phone_number, email, and address_id.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%202.png)

This data table comes from the SQL relational database shown below. Notice how the database is organised into data tables, and each data table has many fields. The lines that run between the data tables depict how the tables are related: each line connects tables that share an identical field. When two or more data tables need data on related topics, they may need to contain the same field. This is how the tables are related, hence the name relational database.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%203.png)

### Why Use Python to Access SQLite?

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%204.png)

SQLite is a lightweight disk-based database, meaning we store data on a hard drive or another type of local storage.
- Many people use SQLite because it doesn’t require a separate server process. Programmers can edit or retrieve the data using a slightly nonstandard variant of the SQL query language.
- It can also be useful to prototype an application with SQLite and then port the code to a larger database, such as Oracle or PostgreSQL.
- Additionally, some applications use SQLite for internal data storage.

How can Python access SQLite?
- Using Python’s Database API (DB-API 2.0), we can connect Python to an RDBMS (Relational Database Management System) such as SQLite.
- With the sqlite3 module, we can create, read, update, and delete data in a SQLite relational database from within the Python environment.

- Without needing an additional application, you can utilise the SQL database storage system using a Python script.

- Python is excellent at data manipulation. Once you pull the SQLite data into your Python environment, you can analyse, visualise, change, and test this data.
  - Python data manipulation libraries: pandas, numpy, matplotlib, and more. Together, SQLite and Python make a great team!


### Connecting to SQLite in Python
First, we need to connect to a database. We can connect to a new or pre-existing database with the sqlite3.connect() API.

- Remember: an API is simply a way to communicate between different applications. In this case, we want Python and SQLite to communicate with one another.
  - This call will connect to the database with the given name.
  - If the database does not exist, it will create a new blank database.

Our connection object is a cable that connects our python environment to our SQLite database.

```
import sqlite3

# Create connection to database
connection = sqlite3.connect("titanic.db")
```

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%205.png)

**Creating a Cursor Object**

1. With sqlite3.connect() we have established a connection to the SQLite database “titanic.db”.
2. We need a way to call SQL statements on the data within the database. To do this, we use something called a cursor object.

Using a cursor object, we can:
- represent a database cursor
- call statements to our SQLite database
- return the data in our python environment.

We create a cursor object by using the cursor method from the connection class:

```
# Create cursor object
cursor = connection.cursor()
```

If we imagine the connection object as a cable that connects Python to SQLite, the cursor uses the cable to move back and forth to send messages and exchange data between the two.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20QUERYING%20SQLITE%20DATABASES%20WITH%20PYTHON%206.png)

### Executing SQL Statements in Python
To run a SQL command (also known as a SQL statement), we call the `.execute()` method on our cursor object: `cursor.execute()`.



```
# Assumes `connection` was created with sqlite3.connect() as shown above
# Create cursor object
curs = connection.cursor()
# Create table named toys
curs.execute('''CREATE TABLE toys (
                    id INTEGER,
                    name TEXT,
                    price REAL,
                    type TEXT)''')

# Insert a row of data in the toys table
curs.execute('''INSERT INTO toys VALUES (2244560, 'Ultimate Ninja Fighter', 24.99, 'action')''')
# commit changes to database
connection.commit()
```
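This is a self-contained sketch of the same steps. It assumes an in-memory database (`":memory:"`), so no file such as titanic.db is needed. It passes the row values separately with `?` placeholders, the same mechanism .executemany() uses, so sqlite3 handles the quoting. Reading the row back confirms the insert.

```python
import sqlite3

# ":memory:" creates a throwaway database in RAM, so no file is needed
connection = sqlite3.connect(":memory:")
curs = connection.cursor()

curs.execute("""CREATE TABLE toys (
                    id INTEGER,
                    name TEXT,
                    price REAL,
                    type TEXT)""")

# Values are passed as a tuple; each ? is filled in order by sqlite3
toy = (2244560, "Ultimate Ninja Fighter", 24.99, "action")
curs.execute("INSERT INTO toys VALUES (?, ?, ?, ?)", toy)
connection.commit()

row = curs.execute("SELECT * FROM toys").fetchone()
print(row)  # (2244560, 'Ultimate Ninja Fighter', 24.99, 'action')
```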



### Inserting Multiple Rows with .executemany()
Let’s insert multiple rows of data with the .executemany() method.

In the example below, the list new_students contains the rows we want to insert. These rows follow the same table schema as the existing students table.


In [None]:
# Insert multiple values into table at once
new_students = [(102, 'Joe', 32, '2022-05-16', 'Pass'),
          (103, 'Stacy', 10, '2022-05-16', 'Pass'),
          (104, 'Angela', 21, '2022-12-20', 'Pass'),
          (105, 'Mark', 21, '2022-12-20', 'Fail'),
          (106, 'Nathan', 21, '2022-12-20', 'Pass')
          ]
# Insert values into the students table
cursor.executemany('''INSERT INTO students VALUES (?,?,?,?,?)''', new_students)
# commit changes to database
connection.commit()

The `(?,?,?,?,?)` in the statement are placeholders. The five question marks represent each of the five fields in the table we are inserting values into. Lastly, we pass the list new_students as the second argument to the `.executemany()` method, which runs the statement once per tuple.
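The cell above assumes a students table already exists. This sketch creates one in memory so the example runs on its own. The column names StudentID, Name, and Date are hypothetical. Only major_code and Grade appear elsewhere in this lesson. The first row reuses the Alex record shown in the retrieval examples.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Hypothetical schema: only major_code and Grade are column names used in this lesson
cursor.execute("""CREATE TABLE students (
                      StudentID INTEGER,
                      Name TEXT,
                      major_code INTEGER,
                      Date TEXT,
                      Grade TEXT)""")
cursor.execute("INSERT INTO students VALUES (?,?,?,?,?)",
               (101, "Alex", 32, "2022-05-16", "Pass"))

new_students = [(102, "Joe", 32, "2022-05-16", "Pass"),
                (103, "Stacy", 10, "2022-05-16", "Pass"),
                (104, "Angela", 21, "2022-12-20", "Pass"),
                (105, "Mark", 21, "2022-12-20", "Fail"),
                (106, "Nathan", 21, "2022-12-20", "Pass")]

# One statement, executed once per tuple; each ? is filled from the tuple in order
cursor.executemany("INSERT INTO students VALUES (?,?,?,?,?)", new_students)
connection.commit()

count = cursor.execute("SELECT COUNT(*) FROM students").fetchone()[0]
print(count)  # 6
```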

### Retrieving Data
In this exercise, we will pull SQLite data into our Python environment.

SQL Commands Used for Data Retrieval

For all data retrieval methods in this exercise, you will need to use a SQL statement such as:

```
SELECT * FROM table_name
```
 to identify which table in the database you will be pulling data from.


#### .fetchone() method
```
# Return first row in students
cursor.execute("SELECT * FROM students").fetchone()
# Output
(101, 'Alex', 32, '2022-05-16', 'Pass')
```


#### .fetchmany() method
```
# Return first three rows in students
cursor.execute("SELECT * FROM students").fetchmany(3)


# This will output a list of tuples where each tuple is a separate row.
# Output
[(101, 'Alex', 32, '2022-05-16', 'Pass'),
(102, 'Joe', 32, '2022-05-16', 'Pass'),
(103, 'Stacy', 10, '2022-05-16', 'Pass')]
```



#### .fetchall() method
```
# Return all rows in students
cursor.execute("SELECT * FROM students").fetchall()
```
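This sketch compares the three fetch methods on a small in-memory students table. The schema is hypothetical, and the rows reuse values from the outputs above. `ORDER BY StudentID` is added so the row order is explicit rather than relying on insertion order.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()

# Hypothetical schema; the rows reuse values shown in this lesson
cursor.execute("CREATE TABLE students (StudentID INTEGER, Name TEXT, "
               "major_code INTEGER, Date TEXT, Grade TEXT)")
cursor.executemany("INSERT INTO students VALUES (?,?,?,?,?)", [
    (101, "Alex", 32, "2022-05-16", "Pass"),
    (102, "Joe", 32, "2022-05-16", "Pass"),
    (103, "Stacy", 10, "2022-05-16", "Pass"),
    (104, "Angela", 21, "2022-12-20", "Pass"),
])

query = "SELECT * FROM students ORDER BY StudentID"
first = cursor.execute(query).fetchone()          # one tuple
first_three = cursor.execute(query).fetchmany(3)  # list of up to 3 tuples
every_row = cursor.execute(query).fetchall()      # list of all remaining tuples

# When no rows match, fetchall() returns an empty list
no_fail = cursor.execute("SELECT * FROM students WHERE Grade = 'Fail'").fetchall()

print(first)             # (101, 'Alex', 32, '2022-05-16', 'Pass')
print(len(first_three))  # 3
print(len(every_row))    # 4
print(no_fail)           # []
```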








### Using Loops with SQLite
As an alternative to the fetch methods, we can iterate over the cursor with a for loop to retrieve data. The following code iterates through each row in the students table and prints each row where the Grade field is 'Pass'.


```
# for loop that prints every row whose Grade is 'Pass'
for row in cursor.execute('''SELECT * FROM students WHERE Grade = 'Pass';'''):
    print(row)
```

We can also use a loop to iterate through a table field and calculate a measurement.
```
# Save all rows from a field with .fetchall(), then use a for loop to compute a result
major_codes = cursor.execute("SELECT major_code FROM students;").fetchall()
# Obtain the average of the list of tuples by using a for loop
total = 0
for num in major_codes:
    total = total + num[0]
average = total / len(major_codes)
# Show average
print(average)
```

Let’s walk through this example code:

1. We used a SQL statement to retrieve the major_code field from the students table.
2. We created the variable total, initialised at 0, to add up the values in the data. (We avoid the name sum so that we do not shadow Python's built-in sum() function.)
3. We used a for loop to iterate through every row in major_codes to compute the mean major_code.
4. We add num[0] to total at each iteration. Note that num is a tuple of length 1 (such as (12,)), so num[0] gives us the actual integer.
5. We find the average by dividing total by the number of rows in major_codes.
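Two shorter ways to get the same result are sketched below on a small in-memory table (the schema and values are illustrative assumptions): the built-in `sum()` with tuple unpacking on the Python side, or SQLite's `AVG()` aggregate so that the database does the arithmetic. For large tables, pushing the calculation into SQL avoids pulling every row into Python.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE students (id INTEGER, name TEXT, major_code INTEGER)")
cursor.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(101, 'Alex', 32), (102, 'Joe', 32), (103, 'Stacy', 10), (104, 'Angela', 21)],
)

major_codes = cursor.execute("SELECT major_code FROM students").fetchall()

# Python side: unpack each 1-tuple and use the built-in sum()
python_average = sum(code for (code,) in major_codes) / len(major_codes)

# SQL side: let SQLite compute the mean with its AVG() aggregate
sql_average = cursor.execute("SELECT AVG(major_code) FROM students").fetchone()[0]

connection.close()
print(python_average, sql_average)  # 23.75 23.75
```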

### Committing Changes and Closing the Database Connection
We have learned how to create tables, insert and edit data, and pull specific data from a SQLite database. If we create or edit a table using SQLite, we MUST call the .commit() method on the connection object to save any alterations made to the database.
- Committing the changes ensures that others who view the database will also see these changes.
- If we do not commit these changes, they can be lost!


```
# Insert row into toys table
cursor.execute('''INSERT INTO toys VALUES (2244560, 'Ultimate Ninja Fighter', 24.99, 'action')''')

# commit changes to database
connection.commit()
```

### Closing the Connection
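Once we have committed all changes, we may close the connection to the database with `.close()`. This releases the connection and prevents any further changes from being made through it by mistake. Note that `.close()` does not commit for us: any changes that have not been saved with `.commit()` are lost when the connection closes.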

```
# close connection
connection.close()
```
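The sketch below shows why committing before closing matters. It writes to a temporary database file (the toys schema and values are illustrative assumptions), commits one insert, leaves a second insert uncommitted, and closes the connection. Reopening the file shows that only the committed row survived.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "toys.db")

    connection = sqlite3.connect(path)
    connection.execute("CREATE TABLE toys (id INTEGER, name TEXT, price REAL, type TEXT)")
    connection.execute("INSERT INTO toys VALUES (1, 'Committed Toy', 9.99, 'puzzle')")
    connection.commit()  # this row is saved to disk
    connection.execute("INSERT INTO toys VALUES (2, 'Uncommitted Toy', 4.99, 'puzzle')")
    connection.close()   # closing without commit() discards the second insert

    # Reopen the file to see what actually persisted
    connection = sqlite3.connect(path)
    saved_names = [row[0] for row in connection.execute("SELECT name FROM toys")]
    connection.close()

print(saved_names)  # ['Committed Toy']
```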

Notice again that we used the .close() method with the connection object.

Question
```
from start import helper
helper()
import sqlite3
con = sqlite3.connect("titanic.db")
curs = con.cursor()


# insert a row in new_table table
curs.execute("""INSERT INTO new_table VALUES ('Stephanie Bready', 37, 'stephB423', 30.00)""")
# commit this change
con.commit()
# close the connection
con.close()
```





### Review
Congratulations! You have completed your journey through database operations with Python. Let’s reflect on everything we have learned in this lesson:

- SQL, which stands for Structured Query Language, is a programming language designed to manage data stored in relational databases.
- Once you pull the SQLite data into your Python environment, you can analyse, visualise, change, and test this data.
- You may also edit a new or pre-existing SQLite database directly from a Python environment by connecting to the database using the sqlite3.connect() API.
- In a database, a cursor allows us to traverse over the data one row at a time to call statements and return data. We can create a cursor object using the .cursor() method.
- Using the .execute() method with a CREATE TABLE statement, we can create a table within the SQLite database.
- Using the .execute() method with an INSERT statement, we can insert data into a pre-existing table.
- To insert multiple rows/records of data at once, we can use the .executemany() method.
- To retrieve SQLite data, we can use the fetch methods: .fetchone(), .fetchmany(), and .fetchall().
- A Python for loop can be used to retrieve SQLite data. It can also be used to analyse already pulled data.
- After making changes to the SQLite database, we must commit the changes using the .commit() method.
- When we finish editing the SQLite database and commit the changes, we can use the .close() method to close the database connection.

Question
```
# Import module sqlite3
import sqlite3
# Create connection object
con = sqlite3.connect("titanic.db")
# Create cursor object
curs = con.cursor()
# Retrieve the row where username = 'stephB423' from new_table
n_row = curs.execute("""SELECT * FROM new_table WHERE username = 'stephB423'""").fetchall()
print(n_row)
# Close the connection
con.close()
```







## What is Concurrent Programming

### Introduction
In this article, we are going to talk about the following topics:
- Sequential Programming
- Concurrency and Parallelism
- Asynchronous Programming
- threading, multiprocessing, and asyncio modules

### Sequential Programming

As you have gone along your Python journey, you have most likely worked with sequential programs. These are programs that follow a set order of instructions. We can view this on the following diagram:

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20What%20is%20Concurrent%20Programming%201.png)

Great! We have a deterministic algorithm that suits our needs. However, this program has only four steps, none of which takes long. Let’s imagine we are data scientists working on a complex machine learning model or backend developers working with an extensive database. Suddenly, we are dealing with many more steps that take far more processing power.

In situations like this, the concepts of concurrent programming and parallel programming come into play. To give us a basic intuition behind each of these paradigms, let’s picture three separate scenarios of ordering food at a deli.


### Analogies
We can view sequential programming as a line of customers with access to one register. In this scenario, each customer has to wait in a single-file line until they can make their order. This is fine if there are only a few customers. However, as the number of customers piles up, this can be incredibly inefficient for the deli.

We can view concurrent programming as multiple lines of customers with access to one register. In this model, we might have three separate lines. The cashier can serve any customer who is ready to order at the front of a line. This can speed up ordering for the deli: if the customer at the front of one line is not quite ready to order yet, the cashier can serve the person at the front of another line instead.

We can view parallel programming as multiple lines of customers with access to multiple registers. In this model, there are multiple registers that can divide up the customers. This way, the deli is more likely to avoid long lines and run a more efficient operation.

Finally, we can also have asynchronous programs. If we add this paradigm to our analogy, we can view it as a wait queue where each customer holds a ticket. Once a customer’s food is ready, their ticket is called. However, the tickets are not necessarily called in order; any ticket can be called as soon as that customer’s food is ready. In a sequential (synchronous) model, by contrast, each customer would have to wait until every customer ahead of them had received their food.
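The ticket analogy can be modelled with Python's `asyncio` module, which this lesson returns to later. In this minimal sketch, `prepare_order()` and `deli()` are hypothetical names and `asyncio.sleep()` stands in for the time the kitchen needs; `asyncio.as_completed()` hands back the orders in the order they finish, not the order they were placed.

```python
import asyncio

async def prepare_order(ticket, prep_seconds):
    # While this order waits, the event loop is free to work on other orders
    await asyncio.sleep(prep_seconds)
    return ticket

async def deli():
    # Tickets 1, 2, 3 are placed in order but take different times to prepare
    orders = [
        asyncio.create_task(prepare_order(1, 0.3)),
        asyncio.create_task(prepare_order(2, 0.1)),
        asyncio.create_task(prepare_order(3, 0.2)),
    ]
    called = []
    for finished in asyncio.as_completed(orders):
        called.append(await finished)  # a ticket is called as soon as its food is ready
    return called

called_order = asyncio.run(deli())
print(called_order)  # [2, 3, 1]
```

Because the three orders wait at the same time, the whole run takes about 0.3 seconds rather than the 0.6 seconds a sequential version would need.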


### Defining Technical Terms
Now that we have some high-level understanding of these concepts, let’s get back to programming mode and develop technical definitions for these terms.
- Concurrency is the process in which multiple tasks run and complete during overlapping periods of time.
- Parallelism is the process in which multiple tasks, or separate parts of the same task, run simultaneously on multiple CPUs (central processing units) or CPU cores.

These definitions seem quite similar; however, there are some key differences between the two processes:
- Parallelism needs hardware with multiple processing units, whereas concurrency can work with only one.
- Concurrency requires at least two tasks to exist, whereas parallelism can split even a single task across cores.
- Parallelism assigns each task to a core to execute, whereas concurrency makes progress on all tasks by rapidly switching between them.
- Parallelism means we can do multiple things at once, while concurrency means we can juggle between tasks.


These diagrams should remind us of our deli line analogy from before. Think of each processing unit as a register and each task as a line. With one core, we can split up the tasks cleverly to speed up the runtime. With multiple cores, we can run those tasks at the same time.

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20What%20is%20Concurrent%20Programming%202.png)

## PROCESSES AND THREADS

Multiple programs can often be found running at the same time on a computer, for example:
* one for playing music
* one for creating documents
* one for browsing the web

All of these programs have certain functionalities, but on their own they do nothing. To actually make use of them, they must be executed.

* A computer program is a static collection of coded instructions stored on a disk.
* A process is an abstraction representing the program while it is running. Processes are sometimes also called “tasks” or “jobs”.
* A process is created when a program is executed.

These processes are not only central to the usability of a computer; they are also the building blocks of an operating system. Managing these processes is central to operating system development.

The key defining factor is that processes generally operate independently and do not share data.

Example: a music player program will launch a music player task that is independent of the task managing an office suite.
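To see that separate processes do not share data, the sketch below (our own illustration, using the standard `subprocess` module rather than anything from the lesson) launches a second Python interpreter as a separate process. The child sets its own `counter`, but the parent's `counter` is untouched because each process has its own memory.

```python
import os
import subprocess
import sys

counter = 0  # lives only in this (parent) process's memory

# Launch a second Python interpreter as a separate process with its own memory
child_code = "import os; counter = 100; print(os.getpid(), counter)"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True, text=True, check=True,
)
child_pid, child_counter = (int(value) for value in result.stdout.split())

print("parent pid:", os.getpid(), "counter:", counter)        # counter is still 0
print("child pid:", child_pid, "counter:", child_counter)     # counter is 100 there
```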

### Methods in Python
https://www.codecademy.com/learn/paths/data-engineer/tracks/decp-advanced-python-for-data-engineers/modules/concurrent-programming/cheatsheet


In Python, we are going to go over the following libraries:
* threading
* multiprocessing
* asyncio

Using these, we can create concurrent, parallel, and asynchronous programs that can run much faster than sequential ones, especially when tasks spend a lot of time waiting or can be split across cores.
We can map these libraries one-to-one as:
* threading –> concurrency
* multiprocessing –> parallelism
* asyncio –> asynchronous

In the next few lessons, we will learn about the concepts of threads and processes. Then we will see how to use these concepts to create powerful and efficient programs and see our conceptual understanding of concurrency, parallelism, and asynchronous programming come to life!


### Introduction to Processes

![](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20What%20is%20Concurrent%20Programming%203.png)


### Lifecycle of a Task
To best optimise the performance of tasks as their priority changes or as they wait for access to a limited resource, tasks are put into one of five states:
1. New: The program has been started and waits to be added into memory in order to become a full task.
2. Ready: Process fully initialised, loaded into memory, and waiting to be picked up by the processor.
3. Running: Currently being executed by the processor.
4. Blocked: The task requires a contested resource that it must wait for.
5. Finished: The task has been completed.


![Lifecycle of a Task](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Lifecycle%20of%20a%20Task.png)


The life cycle of a task is its journey between these five states.
* A CPU core typically executes only one task at a time.
* Managing the state of each task allows the processor to interleave tasks, so that multiple tasks can share these cores and other limited computer resources effectively.

Example: instead of a task occupying the processor while waiting for user input, it can be marked as blocked, and have the processor focus on another task in the ready state until that input arrives.


* Blocking isn’t inherently negative, as some tasks simply have to wait for a resource.
* Marking these tasks as blocked allows the processor to prioritise other tasks, creating a more responsive and efficient system.

Similarly, some tasks may also be returned to the Ready state through preemption, where a task is temporarily interrupted by an external scheduler for urgent reasons, such as a hardware interrupt signal asking the system to shut down.

All of this switching between tasks comes with an overhead that is best kept to a minimum. Switching the processor from one task to another is called context switching, and it is typically an expensive operation, because the current state of the task must be saved and later reloaded to resume execution.
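The lifecycle above can be modelled as a small state machine. This is a toy sketch, not how an operating system stores task state: `TaskState`, `TRANSITIONS`, and `move()` are hypothetical names, and the transitions are the standard textbook ones (including Blocked back to Ready once the resource arrives, and Running back to Ready through preemption).

```python
from enum import Enum

class TaskState(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    BLOCKED = "blocked"
    FINISHED = "finished"

# Allowed moves between states, following the lifecycle described above
TRANSITIONS = {
    TaskState.NEW: {TaskState.READY},        # loaded into memory
    TaskState.READY: {TaskState.RUNNING},    # picked up by the processor
    TaskState.RUNNING: {
        TaskState.BLOCKED,   # waiting on a contested resource
        TaskState.READY,     # preempted by the scheduler
        TaskState.FINISHED,  # completed
    },
    TaskState.BLOCKED: {TaskState.READY},    # the awaited resource became available
    TaskState.FINISHED: set(),               # terminal state
}

def move(current, target):
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target

# Walk one task through a typical journey, including a block and a preemption
state = TaskState.NEW
for step in [TaskState.READY, TaskState.RUNNING, TaskState.BLOCKED,
             TaskState.READY, TaskState.RUNNING, TaskState.READY,
             TaskState.RUNNING, TaskState.FINISHED]:
    state = move(state, step)
print(state)  # TaskState.FINISHED
```

Each arrow out of Running is a point where the scheduler may context switch to another task.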



### Process Layout and Process Control Block
When a task is initialised, its layout within memory has four distinct sections:
* A text section for the compiled code
* A data section for initialised variables
* A stack for local variables defined within functions
* A heap for dynamic memory allocation

Tasks are also initialised with a task control block (also called a process control block) that the operating system requires for managing the task. This contains:
* A unique task ID and the ID of any parent tasks that launched the current one
* The current task state
* How long the task has been running and any time limits the task may have
* Allowed system resources and other permissions
* The priority of the task
* The program counter for the address of the instruction currently being executed
* The contents of other CPU registers holding intermediate values
* Information required for memory management, such as page and segment tables

Additionally, when one task launches another, the original enters a parent-child relationship with the newly-launched task that shares much of the above data. For example, when an existing music player task starts a new task for scanning the user’s music library, both of these tasks generally share the same system resources and permissions. Parent tasks usually also wait for their children to complete before terminating themselves, unless the child was created specifically to run independently in the background.
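The process control block itself lives inside the kernel and is not directly accessible from Python, but a few of the values it holds can be read for the current process through the standard library. The dataclass below is a hypothetical, simplified model of some of the fields listed above, filled in with `os.getpid()`, `os.getppid()`, and `time.process_time()`.

```python
import os
import time
from dataclasses import dataclass

@dataclass
class ProcessControlBlockSketch:
    """A toy model of a few PCB fields; the real PCB lives inside the kernel."""
    pid: int            # unique task ID
    parent_pid: int     # ID of the task that launched this one
    state: str          # current lifecycle state
    cpu_seconds: float  # CPU time the process has used so far

pcb = ProcessControlBlockSketch(
    pid=os.getpid(),
    parent_pid=os.getppid(),
    state="running",  # this code is executing, so the process is running
    cpu_seconds=time.process_time(),
)
print(pcb)
```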

![Process Control Block](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Process%20Layout%20and%20Process%20Control%20Block.png)




### Introduction to Threads

While a process is an abstract data structure that represents all of the necessary information to run a program, a thread represents the actual sequence of processor instructions that are actively being executed.

Each process contains at least one thread to be able to execute, although more can be created to allow for concurrent processing if it is supported by the CPU. These threads live within the process and share all of the common resources available to it, such as memory pages and open files, as shown in the image below.

These shared resources are critical for the definition of a thread. While each process is typically independent, multiple threads usually work together within the context of a process. By sharing data directly, there is faster communication and context switching between threads than what is possible for processes, all while taking fewer system resources.

For example, within a video game process, multiple threads may exist to manage separate services relating to the operation of the game, such as one thread for collecting user inputs and another for producing sounds. As these threads live within the same process, they can easily share information about the game, such as the type of ground the player is walking on. This can be used to affect both the speed the character moves from the input thread as well as the noises created by the sound thread.
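A minimal sketch of the video game example, assuming a hypothetical `game_state` dictionary: because both threads belong to the same process, the sound thread reads the very object that the input thread modified, with no copying or messaging. The `.join()` calls (covered later in this lesson) simply force the input thread to finish first so the output is predictable.

```python
import threading

# Shared game state: both threads live in the same process, so they see the same dict
game_state = {"ground": "grass"}
observations = {}

def input_thread():
    # The player walks onto ice; the input thread updates the shared state
    game_state["ground"] = "ice"

def sound_thread():
    # The sound thread reads the same object directly
    observations["footstep_sound"] = f"{game_state['ground']} footsteps"

t_input = threading.Thread(target=input_thread)
t_input.start()
t_input.join()  # make sure the update has happened before the sound thread reads it

t_sound = threading.Thread(target=sound_thread)
t_sound.start()
t_sound.join()

print(observations["footstep_sound"])  # ice footsteps
```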

![Threads](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Threads.png)

### Multithreading

Typically, a single CPU core can only execute one thread, and therefore one process, at a time. With a clever use of blocking and context switching, this limitation can be obscured to users through nanosecond-long pauses that allow processes to be completed near-simultaneously. With some hardware advances, single CPU cores can now execute multiple threads at once, which is a capability called multithreading.

Parallelizing computations has a variety of benefits, such as improved system utilization and system responsiveness. This is because tasks can be split more evenly between multiple threads, making use of all available computing resources and allowing longer tasks to run in the background, separate from user input. The image below shows how threads share data to achieve this.

However, these optimizations come with disadvantages due to the additional complexity required for the implementation. Not only are these programs more difficult to write because of their non-sequential nature, but they also create whole new classes of bugs.

Two of the most common examples are data races, where multiple threads modify the same piece of data at the same time without coordination, and deadlocks, where multiple threads each wait for another to release a resource, so none of them can continue and the program freezes. Also, since these bugs usually depend on the precise timing of CPU interactions, such programs can behave non-deterministically, which makes them very hard to test and compounds the problem.
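A common defence against data races is a lock. In the sketch below, four threads each increment a shared counter, and `threading.Lock` ensures that only one thread at a time performs the read-modify-write of `counter += 1`, so no updates are lost. Deadlocks are usually avoided by other disciplines, such as having every thread acquire multiple locks in the same order; that case is not shown here.

```python
import threading

counter = 0
counter_lock = threading.Lock()

def add_many(times):
    global counter
    for _ in range(times):
        # counter += 1 reads, adds, then writes; without the lock, two threads
        # could read the same old value and one update would be lost
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(50_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 200000
```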

![Multithreading](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Multithreading.png)

### Kernel Threads vs User Threads

Threads can behave differently depending on the environment they are created in.

A thread built into the existing process is considered a kernel thread. This means that the kernel within the operating system is fully aware of these threads and directly manages their execution.

There are also user threads that exist solely in userspace and, while functionally identical, are not known to or controlled by the kernel. This gives developers more fine-grained control. These threads can be even more efficient than their kernel counterparts because they avoid the costly indirection of constantly making system calls to interact with the kernel.

While these user threads typically operate independently of the kernel, they do need to be mapped to existing kernel threads in order to have the operating system execute them. There are three common models for mapping user threads to kernel threads, as shown in the image below:
- 1:1 Kernel-level threading for a simple implementation that best allows for hardware acceleration provided by the kernel threads.
- N:1 User-level threading for ultra-light threads that can quickly communicate and context switch, but do not benefit from hardware acceleration due to sharing the same kernel thread.
- M:N Hybrid threading to get the best of both of the above solutions: very light and fast threads that can be hardware accelerated as necessary. However, this complex implementation can lead to bugs such as priority inversion where less important tasks are mistakenly prioritised and run first.
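In CPython, each `threading.Thread` is backed by its own operating system thread that the kernel schedules, which resembles the 1:1 model, while `asyncio` tasks are scheduled by an event loop in user space on a single OS thread, which resembles the N:1 model. The analogy is ours, not the lesson's; the sketch below checks it with `threading.get_native_id()` (Python 3.8+), which returns the kernel-assigned ID of the calling thread.

```python
import asyncio
import threading

NUM_WORKERS = 3

# threading.Thread: each thread gets its own OS-level (kernel) thread
thread_ids = []
ids_lock = threading.Lock()
barrier = threading.Barrier(NUM_WORKERS)  # keeps all threads alive at the same time

def record_thread_id():
    with ids_lock:
        thread_ids.append(threading.get_native_id())
    barrier.wait()

threads = [threading.Thread(target=record_thread_id) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# asyncio tasks: many lightweight tasks share a single OS thread
async def record_task_id():
    await asyncio.sleep(0)  # yield to the event loop so the tasks interleave
    return threading.get_native_id()

async def run_tasks():
    return await asyncio.gather(*(record_task_id() for _ in range(NUM_WORKERS)))

task_ids = asyncio.run(run_tasks())

print(len(set(thread_ids)), "OS threads for", NUM_WORKERS, "threading.Thread objects")
print(len(set(task_ids)), "OS thread for", NUM_WORKERS, "asyncio tasks")
```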

![Kernel Threads vs User Threads](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Kernel%20Threads%20vs%20User%20Threads.png)

### Review and Wrap-up

Congratulations! You have finished learning about some of the key foundations of an operating system: the processes and threads that all of the other code on the system rests upon.

A process is an abstraction within the operating system that represents the program while it is in execution. These processes exist in five states that are leveraged to allow the CPU cores to alternate between ready and blocked processes to best take advantage of limited computing resources.

A thread represents the actual sequence of processor instructions that are actively being executed. Each process contains at least one thread and can contain many such structures that all share resources among each other to allow for faster communication and context switching between them. This all allows them to be “lighter” and require fewer system resources. With the hardware advancement of multithreading, individual cores can also execute multiple threads at once, further improving system utilisation and responsiveness by more efficiently splitting up tasks.

Threads behave differently depending on the environment they were created in. Kernel threads are constructed through system calls to the kernel while user threads are constructed using local function calls. User threads, therefore, allow for more fine-grained control by developers that can be more efficient than their kernel counterparts. However, these user threads have to be mapped to their kernel counterparts in order to be actually executed.

![Lifecycle](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Lifecycle%201.png)


1. How does an operating system attempt to best take advantage of limited computing resources?

Context switching between ready and blocked processes on the CPU core.

Context switching allows CPU cores to alternate between ready and blocked processes to best take advantage of limited computing resources.


2. What is the parent-child relationship between processes?

When a process launches another process, the original process enters a parent-child relationship with the new process. This relationship facilitates the sharing of common data and signals along the hierarchy as well as the arrangement of which process may terminate first.

The parent-child relationship describes the hierarchy between processes.


3. What is the difference between kernel and user threads?

Kernel threads are constructed through system calls to the kernel while user threads are constructed using local function calls.

Kernel threads are threads created in kernel space using kernel code and libraries through a system call. The kernel is fully aware of these threads and can properly manage them. User threads are threads created in user space using local code and function calls. The kernel is not aware of these threads and cannot directly control them.


4. What is preemption?

When a process is temporarily interrupted by an external scheduler to prioritise a more important task.

Preemption is the interruption of a process that occurs when the scheduler decides that another process needs to be run instead.

5. When is a process blocked?

When the process has to wait for a contested, limited, or slow resource, such as accessing a specific file or waiting for a network request.

A process is blocked any time it has to wait for a separate resource outside of its control.

6. Within memory, the layout of a process has four distinct sections: A text section for the compiled code, a data section for initialised variables, a stack for local variables, and a heap for dynamic memory allocation.

The layout of a process contains the compiled code of the program, the initialised variables, the stack of local variables, and the dynamic memory heap.

7. A process typically exists in five states: New, Ready, Running, Blocked, or Finished.

The lifecycle of a process is its journey between the five possible states: New, Ready, Running, Blocked, or Finished.


8. What is a benefit of multithreading?

Multithreading allows a single CPU core to execute multiple threads at once, thereby improving system utilisation and responsiveness by more efficiently splitting up tasks.

By separating tasks across multiple threads, multithreading allows computer systems to usually complete tasks more quickly due to the more efficient prioritisation of targeted jobs.


9. What is a thread?

A thread represents the sequence of programmed instructions that are actively being executed.

Threads are the programmed instructions within a process that are actively being executed. They share resources, which allows for faster communication and context switching and requires fewer system resources than processes.


10. What is a downside of multithreading?

Multithreading can create more complex code due to having to manage shared resources and the nondeterminism brought about by synchronising many concurrent threads.

Managing multiple threads requires more advanced coding practices and testing.


11. What is the difference between a computer program and a process?

A computer program is a collection of instructions stored on the disk while a process is an abstraction used to represent a program while it is executing.

A computer program is a static collection of code while a process is a dynamic data structure containing relevant operating system information for a running program.


12. What is the purpose of the process control block?

For the operating system to identify and control the process.

Each process is initialised with a process control block that is required by the operating system to be able to identify and control the process.


## CONCURRENT PROGRAMMING IN PYTHON
### Introduction
So far, we have seen what concurrent programming is and learned about processes and threads. Now we are going to apply our knowledge of these concepts using Python! In this lesson, we are going to learn about three modules:
- threading
- asyncio
- multiprocessing

These modules can help speed up programs and provide design clarity. We will walk through what they are and practice writing short programs with them. Along the way, think back to the previous article and lesson you read. Make sure you understand the connection between these modules and the concepts we have covered already.




Much of this lesson will involve comparing the efficiency and behaviour of sequential programming with concurrent programming.

In script.py, you should see a sequential program with a timer. We will use the timer throughout this lesson to track how long each program takes to run.

After you have looked over the code a bit, click Run to move on to the next exercise.

```
import time

def sequential():
    s = time.perf_counter()
    print("Codecademy")
    time.sleep(2)
    print("says hello!")
    elapsed = time.perf_counter() - s
    print("Sequential Programming Elapsed Time: " + str(elapsed) + " seconds")

sequential()

# Output:
# Codecademy
# says hello!
# Sequential Programming Elapsed Time: 2.002037300997472 seconds
```


### The Threading Module
In the previous lesson, we saw what a thread is. Now let’s see how we can speed up our Python programs with the threading module!
To review, a thread is a unique flow of execution. In theory, multiple threads mean the ability to run multiple things at the same time. However, in CPython (the standard Python interpreter), the Global Interpreter Lock (GIL) lets only one thread execute Python code at a time, so threads do not actually run simultaneously; they merely appear to do so. Still, we will find that threading can increase the speed of our program. Generally, threading performs much better than sequential programming when tasks within the program spend a lot of time waiting for external events to occur, such as network responses or file reads, because a waiting thread lets other threads run.

To create a thread instance in Python, we use the following code:



```
import threading
example_thread = threading.Thread(target=some_function, args=(some_arg,))
```

The two main parameters we will focus on are:
- target: this is the function you want to execute with thread(s). It defaults to None.
- args: this is the argument or set of arguments passed to the target function, given as a tuple. It defaults to an empty tuple, (). A single argument still needs a trailing comma, as in args=(some_arg,).

For example, let’s say we wanted to run the function analyze_list() in a thread with the arguments l1, l2, l3. We would use the following code:
```
t = threading.Thread(target=analyze_list, args=(l1, l2, l3))
```

After creating our thread instance, we also have to “start” our thread using .start(). Therefore, to make sure t executes on Run, we write:
```
t.start()
```
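Putting these pieces together, here is a runnable sketch. `analyze_list()` and the lists `l1`, `l2`, `l3` are hypothetical stand-ins, and `t.join()` (introduced later in this lesson) waits for the thread to finish so that we can read its result safely. Because `Thread` calls `target(*args)`, a single argument must still be wrapped in a one-element tuple with a trailing comma: `args=('Codecademy')` would be just a string, and its characters would be passed as separate arguments.

```python
import threading

results = {}

def analyze_list(l1, l2, l3):
    # Hypothetical analysis: store the total number of items across the lists
    results["total_items"] = len(l1) + len(l2) + len(l3)

l1, l2, l3 = [1, 2], [3, 4, 5], [6]

# Thread calls target(*args), so args must be a tuple of arguments
t = threading.Thread(target=analyze_list, args=(l1, l2, l3))
t.start()
t.join()  # wait for the thread to finish before reading its result
print(results["total_items"])  # 6

# A single argument still needs a one-element tuple: note the trailing comma
single = threading.Thread(target=print, args=("Codecademy",))
single.start()
single.join()
```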







```
import time
import threading

def greeting_with_sleep(string):
    s = time.perf_counter()
    print(string)
    time.sleep(2)
    print("says hello!")
    elapsed = time.perf_counter() - s
    print("Sequential Programming Elapsed Time: " + str(elapsed) + " seconds")

greeting_with_sleep('Codecademy')

t = threading.Thread(target=greeting_with_sleep, args=('Codecademy',))
t.start()

# Output:
# Codecademy
# says hello!
# Sequential Programming Elapsed Time: 2.0021024430000125 seconds
# Codecademy
```

### Using Multiple Threads
We have seen how to create a thread in Python; however, we didn’t see any speed or efficiency benefits in our previous example.

To see the benefits of threading, let’s learn how to create multiple threads and test them out in our example.

When creating multiple threads throughout this course, we will use one of the following approaches:


```
# create each thread
t1 = threading.Thread(target=target_function, args=(arg1,))
t2 = threading.Thread(target=target_function, args=(arg2,))
t3 = threading.Thread(target=target_function, args=(arg3,))
# start each thread
t1.start()
t2.start()
t3.start()
```
or

```
threads = []
# list of arguments to use
args = [arg1, arg2, arg3]
# iterate through the length of arguments
for i in range(len(args)):
     # create thread
     t = threading.Thread(target=target_function, args=(args[i],))
     # add thread to threads list
     threads.append(t)
     # start thread
     t.start()
```

The benefit of the second approach is that it keeps track of our threads in a list called threads. In the next exercise, we will see why this is helpful. Therefore, we will follow this approach going forward.





In script.py, we have some code set up for you. greeting_with_sleep() is loaded in; however, this time, it does not have the timer code. Instead, a function called main_threading() contains the timer code. In this function, you will write code to create and initiate four threads.

Inside main_threading(), write a for loop that iterates through greetings. Inside this for loop:
- Create a thread called t that has greeting_with_sleep() as its target and takes one arg from greetings at each iteration.
- Start the created thread.

For now, do not worry about creating a threads list.

Finally, call main_threading().

Inside the terminal, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

Does your output make sense?


```
import time
import threading

def greeting_with_sleep(string):
    print(string)
    time.sleep(2)
    print("says hello!")

def main_threading():
    s = time.perf_counter()
    # your code goes here
    greetings = ['Codecademy', 'Chelsea', 'Hisham', 'Ashley']
    for i in range(len(greetings)):
        t = threading.Thread(target=greeting_with_sleep, args=(greetings[i],))
        t.start()
    elapsed = time.perf_counter() - s
    print("Threading Elapsed Time: " + str(elapsed) + " seconds")

main_threading()

# Output:
# Codecademy
# Chelsea
# Hisham
# AshleyThreading Elapsed Time: 0.007209773999989011 seconds
```



### Joining a Thread
In the previous exercise, we started working with multiple threads and noticed some interesting behaviour. We will investigate this behaviour more with the .join() threading method.

We can call .join() on a thread to make the calling code wait until that thread has finished before moving on. Let’s see how we implement this and then try it out in our previous example.
We call .join() only after each thread has already been started.


```
# create each thread
t1 = threading.Thread(target=target_function, args=(arg1,))
t2 = threading.Thread(target=target_function, args=(arg2,))
t3 = threading.Thread(target=target_function, args=(arg3,))
# start each thread
t1.start()
t2.start()
t3.start()
# With the first approach from the previous exercise, we then join each thread:
t1.join()
t2.join()
t3.join()
```
Or
```
threads = []
# list of arguments to use
args = [arg1, arg2, arg3]
# iterate through the arguments
for arg in args:
    # create thread
    t = threading.Thread(target=target_function, args=(arg,))
    # add thread to threads list
    threads.append(t)
    # start thread
    t.start()
# join every thread in the list so the main thread waits for all of them
for t in threads:
    t.join()
```



#### Examples  
1. In script.py, the code loaded in is almost identical to the final code you had in the previous exercise, but an empty list called threads has been added to main_threading().

Within the for loop, beneath # add append code here, write a line of code so that t is added to list threads.
Inside the terminal, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

2. Underneath # add join code here, write a for loop that traverses threads and call .join() for each thread in threads.

Inside the terminal, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

Is your output different than the output of the previous exercise?

3. You should see something similar to the following output:


```
Codecademy
Chelsea
Hisham
Ashley
says hello!
says hello!
says hello!
says hello!
Threading Elapsed Time: 2.002612964999571 seconds
```
Let’s walk through this. Now that we are using .join(), main_threading() does not finish until every thread has finished. Therefore, we get a more accurate measurement of about two seconds. If we were to run this sequentially, it would take about eight seconds, since calling:
```
greeting_with_sleep('Codecademy')
greeting_with_sleep('Chelsea')
greeting_with_sleep('Hisham')
greeting_with_sleep('Ashley')
```
would run into time.sleep(2) four times, one after another (4 × 2 = 8 seconds).
You may also wonder why all the names print before any “says hello!”. Each thread prints its name and then blocks in time.sleep(2). While one thread is sleeping, the others get a chance to run, so all four threads print their names and start sleeping at roughly the same time, and the four “says hello!” lines follow about two seconds later. In the next checkpoint, we’ll see some interesting behaviour resulting from this concurrency.
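The eight-versus-two-seconds arithmetic can be checked directly. This sketch uses a hypothetical `wait_and_greet()` helper with a 0.1-second sleep instead of 2 seconds (an assumption to keep it fast), so four sequential calls should take about 4 × 0.1 = 0.4 s, while four joined threads should take about 0.1 s:

```python
import threading
import time


def wait_and_greet(name):
    # Hypothetical stand-in for greeting_with_sleep() with a shorter sleep.
    time.sleep(0.1)


def run_sequential(names):
    s = time.perf_counter()
    for name in names:
        wait_and_greet(name)  # each call waits for the previous one to finish
    return time.perf_counter() - s


def run_threaded(names):
    s = time.perf_counter()
    threads = [threading.Thread(target=wait_and_greet, args=(name,)) for name in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread before stopping the timer
    return time.perf_counter() - s


names = ['Codecademy', 'Chelsea', 'Hisham', 'Ashley']
sequential = run_sequential(names)
threaded = run_threaded(names)
print(f"Sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```

The sleeps overlap only because `time.sleep()` releases the thread while it waits; the threads are not doing heavy computation at the same time.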

4. Before running script.py, make a prediction: what will the order of the outputs be?


```
import time
import threading


def greeting_with_sleep(string):
    print(string)
    time.sleep(2)
    print(string + " says hello!")


def main_threading():
    s = time.perf_counter()
    threads = []
    greetings = ['Codecademy', 'Chelsea', 'Hisham', 'Ashley']
    for greeting in greetings:
        t = threading.Thread(target=greeting_with_sleep, args=(greeting,))
        t.start()
        # add append code here
        threads.append(t)
    # add join code here
    for t in threads:
        t.join()

    elapsed = time.perf_counter() - s
    print("Threading Elapsed Time: " + str(elapsed) + " seconds")


main_threading()
```

Output:
```
Codecademy
Chelsea
Hisham
Ashley
Codecademy says hello!
Chelsea says hello!
Hisham says hello!
Ashley says hello!
Threading Elapsed Time: 2.0130919530001847 seconds
```


### The Asyncio Module

Now we will cover another concurrent programming model: the asyncio module.

The asyncio module uses async/await syntax. async and await are two keywords that allow you to build and execute asynchronous code in your programs.

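The async keyword, written as async def, declares a coroutine function. Calling it does not run its body right away; it returns a coroutine object that an event loop can run. Coroutines are functions that can pause themselves partway through (at an await) and be resumed later, which lets an event loop switch between several tasks to mimic multitasking. This is conceptually similar to what we saw with threads, with one key difference: a coroutine only gives up control at an await, whereas Python can switch between threads at almost any point. Coroutines are at the heart of asynchronous programs in Python.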

The await keyword suspends execution of the current coroutine until whatever is being awaited has completed. For example, if a coroutine task1 contains await task2(), this tells Python: “Suspend task1 until task2 is completed.” While task1 is suspended, the event loop is free to run other tasks.
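To see that suspending task1 frees the event loop for other work, here is a small sketch. `task1`, `task2`, and `other` are hypothetical coroutines that append to a shared `log` list, and `asyncio.create_task()` schedules `other` so it can run while `task1` is waiting. It is written as a script; in Jupyter, replace `asyncio.run(main())` with `await main()`.

```python
import asyncio


async def task2(log):
    log.append("task2 starts")
    await asyncio.sleep(0.1)  # task2 pauses here, which also pauses task1
    log.append("task2 finishes")


async def task1(log):
    log.append("task1 starts")
    await task2(log)  # suspend task1 until task2 is completed
    log.append("task1 resumes")


async def other(log):
    log.append("other runs while task1 is suspended")


async def main():
    log = []
    background = asyncio.create_task(other(log))  # scheduled, but not run yet
    await task1(log)
    await background
    return log


log = asyncio.run(main())
for entry in log:
    print(entry)
```

`other` only gets its turn once `task2` reaches `await asyncio.sleep(0.1)`; until then, `task1` and `task2` run without interruption.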

Let’s apply this to some real examples and see how we declare asynchronous functions in Python. The following code block defines hello_async(), which prints “hello”, waits three seconds, and prints “how are you”.



```
import asyncio
async def hello_async():
  print("hello")
  await asyncio.sleep(3)
  print("how are you")
```
To run the coroutine, we have to use the following syntax:


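```
# Older style (before Python 3.7); newer Python versions may warn that
# get_event_loop() is deprecated when no event loop is running.
loop = asyncio.get_event_loop()
loop.run_until_complete(hello_async())
```
Or
```
# Python 3.7+ (recommended in a script): pass the coroutine object, not the function
asyncio.run(hello_async())
```
Or
```
# Jupyter notebook: an event loop is already running, so await the coroutine directly
# (asyncio.run() raises a RuntimeError when called from a running event loop)
await hello_async()
```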
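`asyncio.run()` and `loop.run_until_complete()` both expect a coroutine object, which is what you get by calling a coroutine function with parentheses. This sketch, using a hypothetical `add_later()` coroutine, shows that calling the function only creates that object; nothing inside it runs until an event loop drives it. Run it as a script, since `asyncio.run()` fails inside Jupyter's already-running loop.

```python
import asyncio


async def add_later(a, b):
    # Hypothetical coroutine: wait briefly without blocking, then return a value.
    await asyncio.sleep(0.1)
    return a + b


coro = add_later(2, 3)            # nothing inside add_later has run yet
print(asyncio.iscoroutine(coro))  # True: we only created a coroutine object
result = asyncio.run(coro)        # the event loop runs it to completion
print(result)                     # 5
```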

1. Let’s change greeting_with_sleep_async() into an asynchronous function. To do this:
- Add in the async keyword.
- Replace time.sleep(2) with await asyncio.sleep(2).

Inside the terminal output, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

2. Use the two lines of loop syntax noted in this lesson to run your asynchronous function. Call your function with a string set equal to 'Codecademy'. What do you think the output will be?

Inside the terminal output, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.


```
import time
import asyncio


async def greeting_with_sleep_async(string):
    s = time.perf_counter()
    print(string)
    await asyncio.sleep(2)
    print("says hello!")
    elapsed = time.perf_counter() - s
    print("Asyncio Elapsed Time: " + str(elapsed) + " seconds")


# Jupyter notebook: await directly. In a script, use
# asyncio.run(greeting_with_sleep_async('Codecademy')) instead.
await greeting_with_sleep_async('Codecademy')
```

Output:
```
Codecademy
says hello!
Asyncio Elapsed Time: 2.0037986050001564 seconds
```


```
import time
import asyncio


async def greeting_with_sleep_async(string):
    s = time.perf_counter()
    print(string)
    await asyncio.sleep(2)
    print("says hello!")
    elapsed = time.perf_counter() - s
    print("Asyncio Elapsed Time: " + str(elapsed) + " seconds")
    return elapsed


# Jupyter notebook: await the coroutine directly to get its return value.
elapsed = await greeting_with_sleep_async('Codecademy')
print("elapsed:", elapsed)
```

Output:
```
Codecademy
says hello!
Asyncio Elapsed Time: 2.004042247000143 seconds
elapsed: 2.004042247000143
```


### Multiple Asynchronous Tasks

Let’s use our knowledge of async/await syntax to do some complicated tasks.
If we wanted to run multiple tasks, we can do a setup that is similar to how we created multiple threads. Let’s walk through the following code:


```
async def main():
    tasks = [task1(arg1), task2(arg2), task3(arg3)]
    await asyncio.gather(*tasks)
```
Breaking this down, we define main() as a coroutine function. tasks is a list of coroutine objects: task1(), task2(), and task3() are each coroutine functions, so calling them does not run them yet; it only creates the coroutines that gather() will run.

The next line is where the magic happens. asyncio.gather() groups all of our tasks together and allows them to be run concurrently. You can read more about it here.

[python docs on asyncio-task](https://docs.python.org/3/library/asyncio-task.html#running-tasks-concurrently)

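We await the gather() call, so main() does not finish until every task has finished. Finally, the * in *tasks unpacks the tasks list, passing each coroutine to gather() as a separate argument.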
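One useful detail from the asyncio documentation: `gather()` returns a list of the coroutines' return values in the same order the awaitables were passed in, not the order in which they finished. Here is a sketch with a hypothetical `fetch()` coroutine whose delays are arbitrary choices; it is written as a script (in Jupyter, use `await main()` instead of `asyncio.run(main())`):

```python
import asyncio
import time


async def fetch(name, delay):
    # Hypothetical coroutine standing in for real I/O such as a network call.
    await asyncio.sleep(delay)
    return name


async def main():
    tasks = [fetch("slow", 0.3), fetch("fast", 0.1), fetch("medium", 0.2)]
    return await asyncio.gather(*tasks)


s = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - s
print(results)             # ['slow', 'fast', 'medium']
print(f"{elapsed:.1f} s")  # close to the longest delay (0.3 s), not the 0.6 s total
```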


Example
1. Inside of main_async() below # your code goes here, use the .gather() asyncio method to group together the tasks in greetings.

  Inside the terminal output, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

2. Call your main_async() function. Get the event loop with asyncio.get_event_loop(), assign it to loop, and pass main_async() to loop.run_until_complete(). What do you think the output will be? How long do you think the execution will take?

  Inside the terminal output, type python3 script.py and press Enter. To check your answer, click Check Work once your program finishes running.

3. Change the following print statement:


```
print("says hello!")
```
to:
```
print(string + " says hello!")
```
Before running script.py, make a prediction: what will the order of the outputs be?

```
import time
import asyncio


# Print a greeting, wait 2 seconds without blocking the event loop,
# then print the greeting followed by " says hello!".
async def greeting_with_sleep_async(string):
    print(string)
    await asyncio.sleep(2)
    print(string + " says hello!")


# Run four greetings concurrently and report the total elapsed time.
async def main_async():
    s = time.perf_counter()
    greetings = [
        greeting_with_sleep_async('Codecademy'),
        greeting_with_sleep_async('Chelsea'),
        greeting_with_sleep_async('Hisham'),
        greeting_with_sleep_async('Ashley'),
    ]
    # your code goes here
    await asyncio.gather(*greetings)
    elapsed = time.perf_counter() - s
    print("Asyncio Elapsed Time: " + str(elapsed) + " seconds")


# Outside Jupyter (a plain script), use:
# loop = asyncio.get_event_loop()
# loop.run_until_complete(main_async())
# or, on Python 3.7+:
# asyncio.run(main_async())

# In a Jupyter notebook, await the coroutine directly:
await main_async()
```

Output:
```
Codecademy
Chelsea
Hisham
Ashley
Codecademy says hello!
Chelsea says hello!
Hisham says hello!
Ashley says hello!
Asyncio Elapsed Time: 2.0037702019999415 seconds
```


```
import time
import asyncio


# Define an asynchronous function to count and print messages.
async def count():
    print("count one")
    # Asynchronously sleep for 1 second (non-blocking delay).
    await asyncio.sleep(1)
    print("count four")


# Define another asynchronous counting function.
async def count_further():
    print("count two")
    await asyncio.sleep(1)
    print("count five")


# Define one more asynchronous counting function.
async def count_even_further():
    print("count three")
    await asyncio.sleep(1)
    print("count six")


# Define the main asynchronous function to run the counting tasks concurrently.
async def main():
    # Use asyncio.gather to run the count functions concurrently.
    await asyncio.gather(count(), count_further(), count_even_further())


# Record the start time for performance measurement.
s = time.perf_counter()

# Jupyter notebook: await main() directly. In a script, use asyncio.run(main()).
await main()

# Calculate the elapsed time since the start.
elapsed = time.perf_counter() - s

# Print the total time taken for the script execution.
print(f"Script executed in {elapsed:0.2f} seconds.")
```

Output:
```
count one
count two
count three
count four
count five
count six
Script executed in 1.01 seconds.
```
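The whole script takes about one second, not three, because `asyncio.sleep()` hands control back to the event loop. A common pitfall is calling the blocking `time.sleep()` inside a coroutine: it holds up the entire event loop, so the coroutines end up running one after another. This sketch compares the two with hypothetical `nap_blocking()` and `nap_async()` coroutines and 0.1-second delays (assumptions to keep it quick). It is written as a script; in Jupyter, await `timed_gather(...)` directly instead of calling `asyncio.run()`.

```python
import asyncio
import time


async def nap_blocking():
    time.sleep(0.1)  # blocks the whole event loop; nothing else can run meanwhile


async def nap_async():
    await asyncio.sleep(0.1)  # pauses only this coroutine


async def timed_gather(make_coro, count=3):
    # Run `count` copies of the coroutine concurrently and time the whole batch.
    s = time.perf_counter()
    await asyncio.gather(*(make_coro() for _ in range(count)))
    return time.perf_counter() - s


blocking = asyncio.run(timed_gather(nap_blocking))
non_blocking = asyncio.run(timed_gather(nap_async))
print(f"time.sleep: {blocking:.2f} s, asyncio.sleep: {non_blocking:.2f} s")
```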


### The Multiprocessing Module
We are onto our final Python module, multiprocessing. This module differs from threading and asyncio in that it lets you use multiple processors on a machine at the same time. Instead of threads or asynchronous tasks, multiprocessing runs work in separate processes, each with its own Python interpreter and memory, so CPU-bound work can run in parallel.

Implementation will look very similar to how we approached the threading module.

We went over processes in the previous lesson; now we are going to implement them in Python.

To create a process in Python, we do the following:


```
import multiprocessing
p = multiprocessing.Process(target=target_function, args=(arg,))
```
This might look familiar, as the arguments are identical to those of threading.Thread: target is the function the process executes, and args is the tuple of arguments passed to that function.

Also identical to the threading module, to start the process, we use:


```
p.start()
```
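Because each process has its own memory, changes a child process makes to a global variable are not visible to the parent; results have to be sent back explicitly, for example through a `multiprocessing.Queue`. The sketch below, with a hypothetical `worker()` function and `shared_names` list, shows both. The `if __name__ == "__main__":` guard matters here: with the "spawn" start method (the default on Windows and macOS), each child re-imports the script, and the guard keeps that import from launching more processes. Run it as a `.py` file, since functions defined in a notebook cell may not be importable by the child on those platforms.

```python
import multiprocessing
import os

shared_names = []  # every process gets its own, separate copy of this list


def worker(name, queue):
    # Runs in the child process: this append only changes the child's copy.
    shared_names.append(name)
    queue.put((name, os.getpid(), len(shared_names)))


def run_workers(names):
    queue = multiprocessing.Queue()
    processes = [multiprocessing.Process(target=worker, args=(name, queue)) for name in names]
    for p in processes:
        p.start()
    # Read one result per process before joining, so no child is left
    # waiting to flush data into the queue.
    results = [queue.get() for _ in processes]
    for p in processes:
        p.join()
    return results, processes


if __name__ == "__main__":
    results, processes = run_workers(['Codecademy', 'Chelsea'])
    for name, pid, count in sorted(results):
        print(f"{name} ran in process {pid}; the child's list had {count} item(s)")
    print(f"Parent process {os.getpid()} still sees: {shared_names}")
```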


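```
import time
import multiprocessing


def greeting_with_sleep(string):
    s = time.perf_counter()
    print(string)
    time.sleep(2)
    print("says hello!")
    elapsed = time.perf_counter() - s
    print("Multiprocessing Elapsed Time: " + str(elapsed) + " seconds")


# Run as a .py script. The __main__ guard stops child processes started with
# "spawn" (the default on Windows and macOS) from re-running this code.
if __name__ == "__main__":
    p = multiprocessing.Process(target=greeting_with_sleep, args=('Codecademy',))
    p.start()
    p.join()  # wait for the child process to finish
```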


Output:
```
Codecademy
says hello!
Multiprocessing Elapsed Time: 2.0092333778738976 seconds
```



### Using Multiple Processes
We have seen how to create a process in Python; however, we didn’t see any speed or efficiency benefits in our previous example.

To see the benefits of processes, let’s learn how to create multiple processes and test them out in our example.

We will create multiple processes using one of the following approaches:


```
import multiprocessing


# Placeholder target function and arguments so the snippet runs; swap in your own.
def target_function(arg):
    print(arg)


arg1, arg2, arg3 = "a", "b", "c"

# Guard the process-creating code so child processes started with "spawn"
# (the default on Windows and macOS) do not re-run it.
if __name__ == "__main__":
    # Create an empty list to store the process objects.
    processes = []

    # Define a list of arguments to be passed to target_function.
    args = [arg1, arg2, arg3]

    # Iterate through the list of arguments.
    for arg in args:
        # Create a Process that runs target_function with the given argument.
        p = multiprocessing.Process(target=target_function, args=(arg,))

        # Add the created process to the processes list.
        processes.append(p)

        # Start the process.
        p.start()

    # Wait for all processes to finish before continuing.
    for p in processes:
        p.join()
```
or


```
import multiprocessing


# Placeholder target function and arguments so the snippet runs; swap in your own.
def target_function(arg):
    print(arg)


arg1, arg2, arg3 = "a", "b", "c"

if __name__ == "__main__":
    # Define a list of arguments to be passed to target_function.
    args = [arg1, arg2, arg3]

    # Create one Process per argument, each targeting target_function.
    processes = [multiprocessing.Process(target=target_function, args=(arg,)) for arg in args]

    # Start every process; they run target_function concurrently.
    for p in processes:
        p.start()

    # Wait for all processes to finish before continuing.
    for p in processes:
        p.join()
```

Notice that we also use .join() for processes, for exactly the same reason as with threads: the main program waits until every process has finished before it continues.

Both approaches are effective. For the rest of the lesson, we will use the first approach. We may ask you to try the second approach later on.

Note: you can also use the second approach for the threading module!



```
import time
import multiprocessing


# Print a greeting, sleep for 2 seconds (a blocking delay), then say hello.
def greeting_with_sleep(string):
    print(string)
    time.sleep(2)
    print(string + " says hello!")


# Start one process per greeting, wait for all of them, and report the time taken.
def main_multiprocessing():
    # Record the start time for performance measurement.
    s = time.perf_counter()

    # Create an empty list to store the process objects.
    processes = []

    # Define a list of greetings.
    greetings = ['Codecademy', 'Chelsea', 'Hisham', 'Ashley']

    # Create and start a separate process for each greeting.
    for greeting in greetings:
        p = multiprocessing.Process(target=greeting_with_sleep, args=(greeting,))
        processes.append(p)
        p.start()

    # Wait for each process to finish before measuring elapsed time.
    for p in processes:
        p.join()

    # Print the total time taken by the multiprocessing execution.
    elapsed = time.perf_counter() - s
    print("Multiprocessing Elapsed Time: " + str(elapsed) + " seconds")


# The __main__ guard is required when child processes are started with
# "spawn" (the default on Windows and macOS).
if __name__ == "__main__":
    main_multiprocessing()
```

Output:
```
Codecademy
Chelsea
Hisham
Ashley
Codecademy says hello!
Chelsea says hello!
Hisham says hello!
Ashley says hello!
Multiprocessing Elapsed Time: 2.0981327630015585 seconds
```


Explanation:

1. The greeting_with_sleep function runs in a separate process for each greeting. Each process prints its greeting, sleeps for 2 seconds, and then prints the greeting followed by " says hello!".

1. The greetings are printed in parallel in different processes, so the order in which they appear in the output may vary.

1. After all processes have completed, the total elapsed time is printed. It is approximately 2 seconds rather than 8, because the four 2-second sleeps happen at the same time in separate processes.



### Review
Congrats on making it through the Concurrent Programming in Python lesson! We have covered:
- Creating a thread in Python
- Creating and joining multiple threads in Python
- Using async/await in Python
- Creating and joining processes in Python

Next up, we have a fun project for you to practise these skills and compare their performances. Happy coding!


Ex

1.


  ![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20CONCURRENT%20PROGRAMMING%20IN%20PYTHON%201.png)


2.
```
import threading
t = threading.Thread(target=sum_function, args=(l1, l2, l3))
t.start()
```
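A thread throws away whatever its target function returns, so `t.start()` alone never gives you the sum back. One common pattern, sketched here with hypothetical `sum_function`, `l1`, `l2`, and `l3` definitions (the quiz does not show them, and this version adds a `results` parameter), is to have the target store its result in a container the main thread reads after `.join()`:

```python
import threading


def sum_function(l1, l2, l3, results):
    # Hypothetical target: add up three lists and store the total in `results`.
    results.append(sum(l1) + sum(l2) + sum(l3))


l1, l2, l3 = [1, 2], [3, 4], [5, 6]
results = []
t = threading.Thread(target=sum_function, args=(l1, l2, l3, results))
t.start()
t.join()           # wait for the thread so the result is ready
print(results[0])  # 21
```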

3. asyncio uses coroutine functions (defined with async def).


4.

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20CONCURRENT%20PROGRAMMING%20IN%20PYTHON%202.png)


5. What does asyncio.gather() do?

  asyncio.gather() groups asynchronous tasks together and allows them to be run concurrently.

6. What are the benefits of the threading, asyncio, and multiprocessing modules?
  - they can increase the processing speed of a program
  - they can help clean up the design of your program






## Deploying a Simple Python Script With Flask
Learn how to deploy a program in Python using Flask!




### Introduction
In this article, we will cover how to locally deploy a Python application using Flask, including:
- The importance of deployment.
- How to install Flask using pip.

### What is deployment?
After we have designed, built, and thoroughly tested our application, it is time to package it up and get it ready for production. This process is called deployment. It allows our application to run successfully in environments other than our development and test environments.

There are several methods for deploying applications, but in this article, we will dive into deploying a simple web application to a local server using the Python web framework, Flask.

Flask is a lightweight framework for developing web applications. It provides an efficient and easy way to deploy web-based projects. Flask offers a built-in development server that we can use to test and deploy web application code locally (which we will cover in this article). It also includes useful features like templating that allow us to render HTML templates and provide the functional back-end code for template components.


### Installing Flask

To use the Flask library for deployment, we must first install it in our environment. The following steps will show how to install Flask on a Windows or macOS environment.

To install Flask, we will use pip, the package manager for Python that allows us to install and manage Python packages that are not a part of the Python standard library. If you have Python installed on your computer, you most likely have pip automatically installed. However, if you need to install it, follow the steps in pip’s documentation.

Open a command line terminal in your environment where Python3 is already installed and set up. First, let’s make sure we are running the latest, upgraded version of pip by running the following command:

```
pip install --upgrade pip
```
If a newer version of pip is available, it will download and install as shown in the screenshots below:

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Deploying%20a%20Simple%20Python%20Script%20With%20Flask%201.png)


Now that we have the latest version of pip installed, let’s install Flask. Run the following command:
```
pip install flask
```
Flask will download and install as shown in the below screenshot:

![link text](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/PYTHON%20Deploying%20a%20Simple%20Python%20Script%20With%20Flask%202.png)


### Understanding the code

Take a look at the following calculator application code. We must first import the Flask library into our application in order to use it later on.

```
from flask import Flask

app = Flask(__name__)


@app.route('/')
def welcome():
    return "Welcome to the Codecademy Calculator!"
```
Next, we create a Flask instance and assign it to the variable named app. We can create several routes using the **`@app.route`** **decorator**.

1. These routes correspond to the URL web request made to access different pages of the web application.

2. The function that follows the **decorator** executes upon the web request. The main URL of our application is denoted by `/`.

3. If we want to **add additional URL routes** for `/division` or `/multiplication`, we can define routes in our Flask application for each.
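To make the decorator idea concrete, here is a toy router written in plain Python. `TinyApp`, `route()`, and `handle()` are hypothetical names; this only mimics the shape of Flask's `@app.route` and is not how Flask is implemented internally. The decorator records which function handles each URL path, and `handle()` looks that function up when a request for the path arrives:

```python
class TinyApp:
    """Hypothetical, minimal stand-in for a Flask app's routing table."""

    def __init__(self):
        self.routes = {}  # maps a URL path such as '/division' to a view function

    def route(self, path):
        # Return a decorator that registers the decorated function for `path`.
        def decorator(func):
            self.routes[path] = func
            return func  # the function itself is left unchanged
        return decorator

    def handle(self, path):
        # Look up the view for `path` and call it, as a web request would.
        view = self.routes.get(path)
        if view is None:
            return "404 Not Found"
        return view()


app = TinyApp()


@app.route('/')
def welcome():
    return "Welcome to the Codecademy Calculator!"


@app.route('/multiplication')
def multiplication():
    return "Now multiplying 6 and 7! The result is: " + str(6 * 7)


print(app.handle('/'))                # Welcome to the Codecademy Calculator!
print(app.handle('/multiplication'))  # Now multiplying 6 and 7! The result is: 42
print(app.handle('/missing'))         # 404 Not Found
```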






```
from flask import Flask

app = Flask(__name__)

# Prompt the user for two numbers and convert them to integers.
num1 = int(input('Enter your first number: '))
num2 = int(input('Enter your second number: '))


@app.route('/')
def welcome():
    return "Welcome to the Codecademy Calculator!"


@app.route('/division')
def division():
    # Divide num1 by num2 and return the result as a string.
    # (Entering 0 as the second number makes this route fail with ZeroDivisionError.)
    return f"Now dividing {num1} and {num2}! The result is: {num1 / num2}"


@app.route('/multiplication')
def multiplication():
    # Multiply num1 and num2 and return the result as a string.
    return f"Now multiplying {num1} and {num2}! The result is: {num1 * num2}"


# app.run() starts Flask's development server so the app loads in a browser.
# The guard skips it when the file is imported, e.g. by `flask --app app run`.
if __name__ == "__main__":
    app.run()
```




Let’s review the following code:

- num1 and num2 allow users to input two numbers to multiply or divide.
- @app.route('/division') specifies the web request with /division.
- division() is a function that executes in the /division route. This function divides the values num1 and num2.
- @app.route('/multiplication') specifies the web request with /multiplication.
- multiplication() is a function that executes in the /multiplication route. This function multiplies the values num1 and num2.
- Finally, we need to add the last line to our program: `app.run()`
- Calling `app.run()` will start the application server allowing the application to load in the web browser.


### Running the code
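Now, it’s time to run our program and see Flask come to life! Copy the code above into a Python file named app.py.

Once you have the code ready on your computer, run it from the same folder with `flask --app app run` (the `--app` option requires Flask 2.2 or later). You can also run `python3 app.py`, which calls `app.run()`.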

You should see the following:
```
Enter your first number:
```
After you enter two numbers, you should see output similar to this (the exact lines vary between Flask versions):
```
* Serving Flask app 'app' (lazy loading)
* Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
* Debug mode: off
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
```

Visit http://127.0.0.1:5000/ in your browser, and the following page should appear:
![Flask welcome page](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Deployment_1.png)

To go to the /division or /multiplication route, you can add division or multiplication after the final /. If you wanted to see the division() function executed, go to http://127.0.0.1:5000/division, and you should see something like the following:
![Division route output](https://raw.githubusercontent.com/Hardi-Lore/codeacademy-notebook/main/pictures/Deployment_2.png)

### Conclusion
Deploying allows us to quickly package up our application to be easily configured and installed into environments other than where we developed the application. This was a brief introduction to deployment with Flask. If you want to learn more, you can dive deeper into creating web applications in our Learn Flask course.




There are many methods for deploying applications, and the right choice depends on factors such as the nature of the application, the target platform, the intended audience, the deployment environment, and the deployment timeline. Some common methods for deploying applications are:

1. Manual deployment: In this method, the application is manually installed and configured on each target system. This method is suitable for small-scale deployments and simple applications but can be time-consuming and error-prone for larger deployments.

2. Scripted deployment: In this method, the application deployment process is automated using scripts or configuration management tools. This method is more efficient than manual deployment and allows for greater consistency and repeatability.

3. Containerization: In this method, the application is packaged into a container along with its dependencies and runtime environment. The container can then run on any platform that supports the container technology, such as Docker, and an orchestrator such as Kubernetes can manage many containers across machines. Because the container carries its own dependencies, this approach makes an application easier to move between environments and to scale.

4. Virtual machine deployment: In this method, the application is deployed on a virtual machine (VM) running on a host system. Because each VM runs its own operating system, this method isolates the application more strongly than containers do and gives flexibility in choosing the target platform, at the cost of more overhead.

5. Cloud deployment: In this method, the application is deployed to a cloud platform such as AWS, Azure, or Google Cloud. Cloud deployment can offer greater scalability, reliability, and availability than self-managed servers and can also be more cost-effective for certain types of applications.

6. Serverless deployment: In this method, the application code is deployed to a serverless platform such as AWS Lambda or Azure Functions. The platform automatically manages the runtime environment and resources required to run the application, and users are only charged for the actual usage of the application. This method can be more cost-effective and scalable than other deployment methods for certain types of applications.
