# Using the @property decorator in data science projects
The @property decorator is a simple but useful Python feature that can make your data science code easier to read and write.

## The scenario
Let's say you have just completed a small but useful data science project that looks something like this.

In [29]:
# Model A
# Mock example of a simple data science project

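# Hypothetical modules, each exposing a class of the same name
from CustomDataPrep import CustomDataPrep
from CustomFeatureEngineering import CustomFeatureEngineering
from CustomModel import CustomModel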

data = CustomDataPrep()
data.data_prep()

features = CustomFeatureEngineering(data)
features.make_features()

model = CustomModel(features)
predictions = model.make_predictions()

ModuleNotFoundError: No module named 'CustomDataPrep'

`Model A`'s code is short and sweet. It follows [Object Oriented Programming (OOP)](https://en.wikipedia.org/wiki/Object-oriented_programming) principles, which results in well-defined, modular classes that each handle one section of the pipeline.

You see an opportunity to contribute to the team by sharing these modules.
For example, the `CustomDataPrep` class might be useful to another data scientist who needs the same data preparation. Likewise, the `CustomFeatureEngineering` class could shorten the development cycle for a data scientist starting a brand new project by providing a common set of features ready-made.

## Illustration
Let's sketch out how our `CustomDataPrep` class might look in this scenario.
Briefly, let us imagine that this class does the following:
- Records the name of a fictional database `data.db`
- Records a start and end date 
- Does some common data cleaning, e.g. fills in missing values, converts dtypes
- Saves final data as a pandas dataframe

In [8]:
from datetime import datetime
class CustomDataPrep:
    """
    This class loads our data between a start and end date
    Performs some data cleaning
    Then saves the prepped data as a pandas dataframe
    """
    
    def __init__(self):
        self.data_source = 'data.db'
        self.start_date = datetime(2021,1,1)
        self.end_date = datetime(2021,2,1)
        self.df = None
    
    def data_prep(self):
        "This function loads data, cleans data and saves the final df"
        self.df = load_data(start_date = self.start_date, end_date = self.end_date)
        self.df = clean_data(self.df)

```{note}
For the sake of brevity, I'll leave out the code for our fictional functions `load_data` and `clean_data`.
```

## Example issues
The next step is simply to share the code with the team! We do so with good intentions, because classes like the one in our example genuinely have the potential to increase everyone's productivity.

However, issues like the following can easily crop up when others start using your code.

### Accidentally overwriting or deleting parameters
Somewhere within a long script, a crucial parameter is accidentally overwritten. This might cause the main script to break or, worse, the script does not break but keeps working on erroneous data that only gets caught much later down the line.

In [21]:
data = CustomDataPrep()

### many lines of code later

data.data_source = 'wrong.db' # accidental overwriting of our data source

data.data_prep() # This either breaks or (worse) silently works on the wrong data source

### Accessing parameters at the wrong time
When `CustomDataPrep` is initialized, the `df` parameter that will hold our final dataframe is initialized as `None`. Our intention is that the user should first call the `data_prep()` function before trying to access `df`, but nothing actually stops users from accessing `df` at any point in time.

In [51]:
data = CustomDataPrep()

df = data.df # User accessed df but forgot to run data.data_prep() first

### many lines of code later

df.info() # User discovers that df is still None way too late

AttributeError: 'NoneType' object has no attribute 'info'

## Library code vs. User code
The examples above come from my own experience of the tension between writing library code and user code. I first learnt of this distinction from a (rather provocatively named 😝) video - [PyData Seattle 2017 - So you want to be a Python Expert?](https://youtu.be/cKPlPJyQrt4) (around the 22:30 mark).

[![James Powell](http://img.youtube.com/vi/cKPlPJyQrt4/0.jpg)](http://www.youtube.com/watch?v=cKPlPJyQrt4)

The entire section on Library vs User code covers multiple different angles, but in the context of my data science example, I would highlight these key points:

- **User code** is code written to solve a specific use case/business problem. In our example, the models other data scientists/teams are going to write would be user code. 

- **Library code** is code written to be used by different users, with the goal of making these users more productive and for standardisation. In our example, `CustomDataPrep` is the library code.

- There are **many potential pitfalls and gotchas** for both the author of the library code (who cannot control how his/her library code will be used) and the author of the user code (who either should not or could not modify the library code at runtime).

So really, the question I want to answer is - **How do we write our library code to minimize these potential issues/errors**? 

## The @property decorator
The @property decorator is a Python built-in that gives you a little more control over the attributes of your classes. Used correctly, it can minimize some of the issues mentioned above. Here are some examples.

### Getters, setters and deleters

Recall that in our first issue, the user of our library code accidentally overwrote the `data_source` attribute. Suppose that, as the author of the library code, you want to let users inspect `data_source` but protect them from accidentally modifying it.

In [53]:
class CustomDataPrep:
    """
    This class loads our data between a start and end date
    Performs some data cleaning
    Then saves the prepped data as a pandas dataframe
    """
    
    def __init__(self):
        self._data_source = 'data.db'
        self._start_date = datetime(2021,1,1)
        self._end_date = datetime(2021,2,1)
        self._df = None
    
    @property
    def data_source(self):
        return self._data_source
    

In [54]:
# Users can inspect the data_source attribute
data = CustomDataPrep()
data.data_source

'data.db'

In [55]:
# Users can't accidentally modify the data_source attribute
# If they try to do so, they trigger an immediate error (fail fast!)
data = CustomDataPrep()
data.data_source = 'wrong.db'

AttributeError: can't set attribute
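
The same read-only pattern can also help with the second issue, reading `df` before `data_prep()` has run. The sketch below is one possible approach rather than part of the original class: the `df` getter raises an error while `_df` is still `None`. The choice of `RuntimeError`, the error message, and the stand-in body of `data_prep()` (a plain list of dicts instead of a real pandas dataframe built by `load_data` and `clean_data`) are all assumptions made to keep the example self-contained.

```python
from datetime import datetime


class CustomDataPrep:
    """Sketch: `df` can only be read after data_prep() has run."""

    def __init__(self):
        self._data_source = 'data.db'
        self._start_date = datetime(2021, 1, 1)
        self._end_date = datetime(2021, 2, 1)
        self._df = None

    @property
    def df(self):
        # Fail fast instead of silently handing back None
        if self._df is None:
            raise RuntimeError("df is not ready yet, call data_prep() first")
        return self._df

    def data_prep(self):
        # Stand-in for load_data/clean_data (assumption: a list of dicts
        # replaces the real pandas dataframe so the sketch stays runnable)
        self._df = [{'date': self._start_date, 'value': 1.0}]


data = CustomDataPrep()
try:
    data.df  # accessed too early
except RuntimeError as err:
    print(err)

data.data_prep()
print(len(data.df))
```

With this in place, the mistake surfaces on the line where `df` is first read, not many lines later.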

If you do want to let users modify or delete an attribute, the property can be extended with a setter or deleter method like so.

In [35]:
class CustomDataPrep:
    """
    This class loads our data between a start and end date
    Performs some data cleaning
    Then saves the prepped data as a pandas dataframe
    """
    
    def __init__(self):
        self._data_source = 'data.db'
        self._start_date = datetime(2021,1,1)
        self._end_date = datetime(2021,2,1)
        self._df = None
    
    @property
    def data_source(self):
        return self._data_source
    
    @data_source.setter
    def data_source(self, new_data_source):
        self._data_source = new_data_source
    
    @data_source.deleter
    def data_source(self):
        del self._data_source
    

In [48]:
# Users are now allowed to modify data_source
data = CustomDataPrep()
data.data_source

'data.db'

In [46]:
data.data_source = 'new.db'
data.data_source

'new.db'

In [49]:
# Or delete them
del data.data_source
data.data_source

AttributeError: 'CustomDataPrep' object has no attribute '_data_source'
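
A setter is also a natural place to validate new values before they are stored. As a hedged sketch (the validation rules here are my own assumptions, not requirements of the original class), the `start_date` setter below rejects anything that is not a `datetime` or that falls after the end date.

```python
from datetime import datetime


class CustomDataPrep:
    """Sketch: start_date is writable, but only with sensible values."""

    def __init__(self):
        self._start_date = datetime(2021, 1, 1)
        self._end_date = datetime(2021, 2, 1)

    @property
    def start_date(self):
        return self._start_date

    @start_date.setter
    def start_date(self, new_start_date):
        # Assumed rules, for illustration only
        if not isinstance(new_start_date, datetime):
            raise TypeError("start_date must be a datetime")
        if new_start_date > self._end_date:
            raise ValueError("start_date must not be after end_date")
        self._start_date = new_start_date


data = CustomDataPrep()
data.start_date = datetime(2021, 1, 15)  # accepted

try:
    data.start_date = '2021-01-15'       # wrong type, rejected immediately
except TypeError as err:
    print(err)
```

Users still write a plain assignment, while the library author decides which values are acceptable.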

### Readability
If you have worked with other OOP languages before (e.g. Java, C++), this concept of separating private from public variables, properly termed [Encapsulation](https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)), will not be new. Where @property adds value is in achieving the same result with less boilerplate, less namespace pollution and more readable code.

````{panels}
Using @property
^^^
```
class Class:
    def __init__(self):
        self._a = 'a'
    
    @property
    def a(self):
        return self._a

```
+++

---

Using getter methods
^^^
```
class Class:
    def __init__(self):
        self._a = 'a'
    
    def get_a(self):
        return self._a
```
+++

````
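
One practical consequence of the comparison above: because a property is read with plain attribute syntax, a class can start out with an ordinary public attribute and switch to a property later without changing any calling code. A minimal sketch, using the hypothetical class names `ConfigV1` and `ConfigV2`:

```python
class ConfigV1:
    """First version: `a` is a plain public attribute."""

    def __init__(self):
        self.a = 'a'


class ConfigV2:
    """Later version: `a` becomes a read-only property."""

    def __init__(self):
        self._a = 'a'

    @property
    def a(self):
        return self._a


def user_code(obj):
    # The caller reads obj.a the same way for both versions
    return obj.a.upper()


print(user_code(ConfigV1()))  # A
print(user_code(ConfigV2()))  # A
```

With a `get_a()` method, the same migration would force every caller to change from `obj.a` to `obj.get_a()`.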

### Encourages the good habit of using underscore prefix
You might have also noticed that when we introduced the property decorator, we also slipped in another change. All the attributes initialized under the `__init__()` method now have an underscore prefix e.g. `_data_source`. 

This is a PEP 8 convention that signals that an attribute or method is meant to be private to its class. Python (preferring to give more flexibility to users) does not enforce private/public distinctions.

So practically, the reason why we use this convention is to give our users a subtle hint that they are accessing a private attribute, and to take an extra second to consider if they are writing safe code.
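
To make the "hint, not a lock" point concrete, the short sketch below (using a trimmed-down version of the class) shows that the public property rejects assignment, while the underscore-prefixed attribute can still be overwritten by anyone who chooses to.

```python
class CustomDataPrep:
    def __init__(self):
        self._data_source = 'data.db'

    @property
    def data_source(self):
        return self._data_source


data = CustomDataPrep()

try:
    data.data_source = 'wrong.db'  # blocked: the property has no setter
except AttributeError:
    print('public name is read-only')

data._data_source = 'wrong.db'     # not blocked: the underscore is only a convention
print(data.data_source)            # wrong.db
```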

```{tip}

Also, if you use wildcards when importing, e.g. `from module import *`, Python will not import any module-level name with an underscore prefix (unless the module lists it in `__all__`).

If you'd like to learn more about all the PEP 8 naming conventions, you can explore them [here](https://pep8.org/#descriptive-naming-styles).
```
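
The wildcard-import behaviour mentioned in the tip can be checked with a small experiment. The sketch below builds a throwaway in-memory module (the name `demo_module` and its two variables are hypothetical) so that it runs without creating any files, then performs the wildcard import into a fresh namespace.

```python
import sys
import types

# Source of a throwaway module with one public and one "private" name
source = """
public_helper = 'visible'
_private_helper = 'hidden'
"""

# Create the module in memory and register it so that it can be imported
demo_module = types.ModuleType('demo_module')
exec(source, demo_module.__dict__)
sys.modules['demo_module'] = demo_module

# Run the wildcard import inside a fresh namespace and inspect the result
namespace = {}
exec('from demo_module import *', namespace)

print('public_helper' in namespace)    # True
print('_private_helper' in namespace)  # False
```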

## Conclusion
Ultimately, the dynamics between library and user code cover far more potential issues than I have illustrated in this blog post. But the @property decorator is a small fix that goes a long way towards covering some of the more common ones. I have used it obsessively since learning about it, and I hope it provides value for you too.