# Python modules

## Creating, importing

We can create a module simply by creating a file with the `.py` extension (don't use spaces in the file name).

Let's create a `products.py` file.

⚠️⚠️ **The file must be in the same folder as the Jupyter Notebook.** ⚠️⚠️

Inside, let's write a list for the valid product categories.

```python
valid_categories = [
    "vegetable",
    "fruit",
    "pet",
    "house",
    "bread",
    "cleaning"
]
```

Save the file.

We can now import that module into our notebook!

In [5]:
import products

In [10]:
products.valid_categories

['vegetable', 'fruit', 'pet', 'house', 'bread', 'cleaning']

Let's go back to our module and add coffee to the category list:
```python
valid_categories = [
    "vegetable",
    "fruit",
    "pet",
    "house",
    "bread",
    "cleaning",
    "coffee",
]
```

Save the file.

Now check the value of `valid_categories` again, here in the notebook.

In [11]:
products.valid_categories

['vegetable', 'fruit', 'pet', 'house', 'bread', 'cleaning']

## Reloading

As you can see, the new category does not appear in our notebook.

We made the change after importing the module. Running `import products` again won't help: Python sees that the module has already been imported and, to save computation, does nothing.
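The caching described above can be observed through `sys.modules`, the dictionary where Python keeps every module it has already loaded. The sketch below uses the standard-library `json` module as a stand-in for `products`, so it runs anywhere; the variable names are only illustrative.

```python
import sys

import json  # first import: the module is loaded and stored in sys.modules
first = sys.modules["json"]

import json  # second import: Python finds the cached entry and reuses it
second = sys.modules["json"]

# Both imports refer to the very same module object.
same_object = (first is second) and (second is json)
print(same_object)
```

Because the second `import` only looks up the cached object, edits made to the file on disk are not picked up.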

One fix is to restart the notebook kernel (Kernel > Restart Kernel...), but that means losing all the variables we've loaded and the computations we've run.

An alternative is the `importlib` module, which lets us reload a given module and get its updated contents without restarting the kernel.

In [16]:
import importlib
importlib.reload(products)

<module 'products' from 'C:\\Users\\DASilva\\Documents\\GitHub\\diogoaos\\content\\slides\\iseg_python\\code\\lesson3\\products.py'>

In [17]:
products.valid_categories

['vegetable', 'fruit', 'pet', 'house', 'bread', 'cleaning', 'coffee']
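The whole edit, re-import, reload cycle can also be reproduced in one self-contained script. This is a minimal sketch, assuming a throwaway module called `demo_products` (a hypothetical name, chosen to avoid clashing with the real `products.py`) written to a temporary folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "demo_products.py"  # hypothetical module
    module_path.write_text('valid_categories = ["fruit", "bread"]\n')

    sys.path.insert(0, tmp)        # make the temporary folder importable
    importlib.invalidate_caches()  # let the import system notice the new file

    import demo_products
    before = list(demo_products.valid_categories)

    # Edit the file on disk, as we did by hand with products.py.
    module_path.write_text('valid_categories = ["fruit", "bread", "coffee"]\n')

    import demo_products  # already imported, so this does nothing
    after_reimport = list(demo_products.valid_categories)

    importlib.reload(demo_products)  # re-executes the module's source
    after_reload = list(demo_products.valid_categories)

    sys.path.remove(tmp)

print(before, after_reimport, after_reload)
```

Only the list captured after `importlib.reload` contains the new category.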

## Creating fake data

Let's create some made up transactions for a day.

In [3]:
import random

import products  # makes this cell work on its own, e.g. after a kernel restart

# transactions
n_transactions = 200
min_price, max_price = 2, 50
transaction_prices = [round(random.random() * random.randint(min_price, max_price), 2)
                      for i in range(n_transactions)]
transaction_categories = random.choices(products.valid_categories, k=n_transactions)
transactions = list(zip(transaction_prices, transaction_categories))

In [66]:
# let's see part of the result
transactions[:20]

[(1.67, 'pet'),
 (38.9, 'vegetable'),
 (9.27, 'fruit'),
 (19.78, 'cleaning'),
 (4.31, 'bread'),
 (20.98, 'bread'),
 (1.62, 'bread'),
 (2.63, 'coffee'),
 (1.06, 'bread'),
 (2.97, 'house'),
 (2.66, 'house'),
 (2.2, 'pet'),
 (2.9, 'coffee'),
 (1.7, 'house'),
 (3.81, 'cleaning'),
 (5.63, 'fruit'),
 (2.65, 'house'),
 (0.62, 'house'),
 (35.82, 'pet'),
 (7.32, 'fruit')]
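Two details of the cell above are worth noting. Multiplying by `random.random()` lets prices fall below `min_price` (the sample contains 0.62), and the data changes on every run. The variant below is a sketch, not the notebook's method: it assumes we want prices strictly within the bounds and reproducible results, so it uses `random.uniform` and a seeded generator. The seed value 42 and the names `rng`, `categories` and `transactions_demo` are arbitrary choices.

```python
import random

rng = random.Random(42)  # fixed seed (arbitrary) so reruns give the same data

# Same categories as products.valid_categories, copied here to stay self-contained.
categories = ["vegetable", "fruit", "pet", "house", "bread", "cleaning", "coffee"]
n_transactions = 200
min_price, max_price = 2, 50

# uniform() draws each price directly from the [min_price, max_price] range.
prices = [round(rng.uniform(min_price, max_price), 2) for _ in range(n_transactions)]
chosen = rng.choices(categories, k=n_transactions)
transactions_demo = list(zip(prices, chosen))

print(transactions_demo[:5])
```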

## Should we have more house products?

To answer this question, let's compute the percentage of revenue that comes from "house" transactions.

In [48]:
day_revenue = sum([p for p,c in transactions])
print(f"day_revenue={day_revenue}")

house_revenue = sum([p for p,c in transactions if c == "house"])
print(f"house_revenue={house_revenue}")

house_percent = house_revenue / day_revenue * 100

print(f"{house_percent :.1f}% of revenue comes from transactions of category house")

day_revenue=2358.4700000000025
house_revenue=467.90999999999985
19.8% of revenue comes from transactions of category house


This is something we might do often, and for other categories too, so it makes sense to have a function for it.

It also makes sense to have that function in our products module, because it can be reused across programs and analysis notebooks, by different people.

Let's add the following function to our `products.py` module.

```python
def category_revenue(transactions, category):
    """Return the percentage of total revenue that comes from `category`."""
    total = sum([p for p,c in transactions])
    cat_total = sum([p for p,c in transactions if c == category])
    return cat_total / total * 100
```

Save the file, reload the module, and check the **% of revenue that comes from pet transactions**.

In [58]:
importlib.reload(products)

products.category_revenue(transactions, "pet")

12.540757355404125

Let's do the same for all categories:

In [62]:
for cat in products.valid_categories:
    print(f"{products.category_revenue(transactions, cat) :.1f}% of revenue from {cat}")

12.4% of revenue from vegetable
13.8% of revenue from fruit
12.5% of revenue from pet
19.8% of revenue from house
13.8% of revenue from bread
15.8% of revenue from cleaning
11.9% of revenue from coffee
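Since the transactions are random, it is hard to tell by eye whether these percentages are right. A quick sanity check is to call the function on a tiny hand-made list where the answer is obvious. The sketch below copies `category_revenue` so that it runs on its own; the `sample` data is made up for illustration.

```python
def category_revenue(transactions, category):
    """Return the percentage of total revenue that comes from `category`."""
    total = sum(p for p, c in transactions)
    cat_total = sum(p for p, c in transactions if c == category)
    return cat_total / total * 100

# Total revenue is 4.0: 3.0 from "pet" and 1.0 from "house".
sample = [(1.0, "pet"), (1.0, "house"), (2.0, "pet")]

pet_share = category_revenue(sample, "pet")      # 3 / 4 * 100
house_share = category_revenue(sample, "house")  # 1 / 4 * 100
print(pet_share, house_share)
```

Here pet accounts for 75% and house for 25% of revenue. Note that an empty transaction list would raise a `ZeroDivisionError`, because the total would be zero.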


## Importing directly into the program's namespace

We can also import a module's functions, classes and variables directly into our notebook/program, without having to type the module's name.

To test this:
1. **restart the kernel (Kernel > Restart kernel...)**
2. go back and **rerun the cell for creating fake data (and only that one)**
3. run the following cells

In [64]:
from products import category_revenue

In [67]:
category_revenue(transactions, "pet")

10.677846521220014

This way, we can use the function directly by its name, without writing the module's name.

We can also import the module under a different name. This is useful when a module's name is long or used very frequently.

In [68]:
import products as prod

In [69]:
prod.category_revenue(transactions, "pet")

10.677846521220014

That's why pandas is often imported as `pd` and numpy as `np`.

```python
import pandas as pd
import numpy as np
```
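The different import styles only change which names end up in our namespace; the underlying module is loaded once and shared. The sketch below illustrates this with the standard-library `math` module as a stand-in, so it runs without `products.py`; the alias `m` is an arbitrary choice.

```python
import math              # binds the name "math"
import math as m         # binds the alias "m" to the same module
from math import sqrt    # binds only the function name "sqrt"

same_module = m is math            # the alias points to the same module object
same_function = sqrt is math.sqrt  # the imported name is the module's function
root = sqrt(16)

print(same_module, same_function, root)
```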

# Python packages

## Creating

As we grow our modules in size and number, we might want to group them as well. We can do that with packages.

Let's say we want to group all our modules in a single package with the name of the company `RetailXY`.

1. Create a folder named `retailxy`, in the same directory as this notebook.
2. Inside that folder, create a file named `__init__.py`.
3. Leave it blank and save it.
4. Move the `products.py` module inside the `retailxy` folder.
5. Restart the kernel.
6. Execute the following cells.


## Importing

In [1]:
import retailxy

In [2]:
retailxy.products

AttributeError: module 'retailxy' has no attribute 'products'

In [4]:
import retailxy.products

- As you can see, after importing just `retailxy`, we can't access products directly.
- We need to explicitly import the `products` module within the `retailxy` package.
- We don't need to import just `retailxy` before importing `retailxy.products`.

There are ways to change this behaviour, but what you've learned so far is enough to organize your modules well.
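One common way to change this behaviour is to import the submodule inside the package's `__init__.py`. The sketch below is self-contained: it builds two throwaway packages in a temporary folder, one with an empty `__init__.py` and one whose `__init__.py` contains `from . import products`. The package names `demo_lazy` and `demo_eager` and the helper `make_package` are hypothetical, used only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

def make_package(root, name, init_source):
    """Create a package folder holding __init__.py and a small products.py."""
    pkg = Path(root) / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text(init_source)
    (pkg / "products.py").write_text('valid_categories = ["fruit", "bread"]\n')

with tempfile.TemporaryDirectory() as tmp:
    make_package(tmp, "demo_lazy", "")                           # blank __init__.py
    make_package(tmp, "demo_eager", "from . import products\n")  # imports the submodule
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    import demo_lazy
    lazy_before = hasattr(demo_lazy, "products")  # submodule not loaded yet

    import demo_lazy.products
    lazy_after = hasattr(demo_lazy, "products")   # available after the explicit import

    import demo_eager
    eager = hasattr(demo_eager, "products")       # loaded by __init__.py itself

    sys.path.remove(tmp)

print(lazy_before, lazy_after, eager)
```

With the blank `__init__.py`, `products` becomes an attribute of the package only after the explicit submodule import. With the import inside `__init__.py`, it is available right away.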

## Test with practical example

In [5]:
import random

# transactions
n_transactions = 200
min_price, max_price = 2, 50
transaction_prices = [round(random.random() * random.randint(min_price,max_price), 2)
                      for i in range(n_transactions)]
transaction_categories = random.choices(retailxy.products.valid_categories, k=n_transactions)
transactions = list(zip(transaction_prices, transaction_categories))

In [6]:
retailxy.products.category_revenue(transactions, "pet")

11.26611887482757


We could also add more packages inside our `retailxy` package, e.g.

```text
retailxy/
 |- __init__.py
 |- products.py
 |
 |- marketing/
 |    |- __init__.py
 |    |- discounts.py
 |    |- promotions.py
 |
 |- logistics/
 |    |- __init__.py
 |    |- fleet.py
 |    |- inventory.py
 |
 |- data_science/
 |    |- __init__.py
 |    |- forecasts.py
 |    |- segmentation.py
```
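Modules in nested packages are imported by chaining the package names with dots. The sketch below recreates only the `marketing/discounts.py` branch of the tree in a temporary folder, under the hypothetical top-level name `demo_retailxy` so it cannot clash with a real `retailxy` folder. The function `apply_discount` is invented for illustration and is not part of the original module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    marketing = Path(tmp) / "demo_retailxy" / "marketing"
    marketing.mkdir(parents=True)

    # Each folder becomes a package through its (blank) __init__.py.
    (marketing.parent / "__init__.py").write_text("")
    (marketing / "__init__.py").write_text("")

    # Hypothetical module content, just to have something to call.
    (marketing / "discounts.py").write_text(
        "def apply_discount(price, percent):\n"
        "    return round(price * (1 - percent / 100), 2)\n"
    )

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()

    # Dotted path: package, then sub-package, then module.
    from demo_retailxy.marketing import discounts
    discounted = discounts.apply_discount(20.0, 25)

    sys.path.remove(tmp)

print(discounted)
```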