## Pandas Metadata Issue

### Goal

The aim sounds simple: implement a pandas extension that computes some additional columns and attaches a metadata attribute to each of them.

For example:

```python
>>> region = Region('GBR')
>>> region_data = region.data # Data with columns like ['confirmed', 'deaths', 'recovered']
>>> region_data = region_data.covid.smooth() # Adds ['confirmed_smooth', 'deaths_smooth', 'recovered_smooth']

# And then...
>>> region_data.confirmed_smooth.attrs # Contains some metadata, in this case arguments used to create the smoothed data:
{
    'smooth': (
        'weak',
        (
            {'window': 9, 'center': True, 'win_type': 'gaussian', 'min_periods': 1},
            {'mean_std': 3},
        ),
    )
}

# Or with a custom attribute instead of attrs:
>>> region_data.confirmed_smooth.oscovida_metadata
{
    'smooth': (
        'weak',
        (
            {'window': 9, 'center': True, 'win_type': 'gaussian', 'min_periods': 1},
            {'mean_std': 3},
        ),
    )
}
```

But apparently this is not easy to implement...

### Minimum Example

In [1]:
import pandas as pd

In [2]:
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df

   col1  col2
0     1     3
1     2     4


In [3]:
@pd.api.extensions.register_series_accessor('test')
class TestSeriesAccessor:
    def __init__(self, pandas_object: pd.Series):
        self._obj = pandas_object
    
    def metacs(self) -> pd.Series:
        res = self._obj.cumsum()
        res.attrs['meta'] = f"Metadata attached to cumsum column {self._obj.name}"
        return res

In [4]:
col1_with_meta = df.col1.test.metacs()
col1_with_meta

0    1
1    3
Name: col1, dtype: int64

In [5]:
col1_with_meta.attrs

{'meta': 'Metadata attached to cumsum column col1'}

Well that was easy!

Bbbbuuuttt try to use this within a dataframe and:

In [6]:
df['col1_with_meta'] = col1_with_meta

In [7]:
df.col1_with_meta.attrs

{}

RIP metadata.
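One plausible reading of this, offered as an assumption rather than something confirmed by the pandas documentation: the DataFrame keeps the column's values, not the Series object that was assigned, and `attrs` lives on that Series object. A minimal sketch that checks the object identity (it builds the cumsum Series by hand so that it does not depend on the accessor above):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Build a Series carrying metadata, the same way metacs() does
res = df['col1'].cumsum()
res.attrs['meta'] = "Metadata attached to cumsum column col1"

df['col1_with_meta'] = res

# The column handed back by the frame holds the same values...
values_match = df['col1_with_meta'].tolist() == res.tolist()
# ...but it is not the object that was assigned
same_object = df['col1_with_meta'] is res

# The original Series still carries its metadata
res_meta = dict(res.attrs)
```

If `same_object` is `False`, anything stored only on `res` has no obvious way to reach the Series that `df['col1_with_meta']` returns.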

### Troubleshooting

In [47]:
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
df

   col1  col2
0     1     3
1     2     4


In [50]:
col1_with_meta = df.col1.test.metacs()
df = df.assign(col1_with_meta=col1_with_meta)

In [51]:
col1_with_meta.attrs
df['col1_with_meta'].attrs

{}

In [52]:
df['col1_with_meta'].__finalize__(col1_with_meta)
df['col1_with_meta'].attrs

{'meta': 'Metadata attached to cumsum column col1'}

In [53]:
col2_with_meta = df.col2.test.metacs()
df = df.assign(col2_with_meta=col2_with_meta)

In [54]:
df['col2_with_meta'].__finalize__(col2_with_meta)
df['col2_with_meta'].attrs

{'meta': 'Metadata attached to cumsum column col2'}

In [55]:
df['col1_with_meta'].attrs

{}

Woo it worked! How about doing this to another column?

In [25]:
col2_with_meta = df.col2.test.metacs()
col2_with_meta.attrs

{'meta': 'Metadata attached to cumsum column col2'}

In [10]:
# Force the finalize step?
print("Stored attrs:", col2_with_meta.attrs) # Check that the metadata is still there
df['col2_with_meta'] = col2_with_meta
df['col2_with_meta'].__finalize__(col2_with_meta)
print("df col2 meta attrs: ", df.col2_with_meta.attrs)

Stored attrs: {'meta': 'Metadata attached to cumsum column col2'}
df col2 meta attrs:  {'meta': 'Metadata attached to cumsum column col2'}


Awesome!

In [11]:
print("df col1 meta attrs: ", df.col1_with_meta.attrs)
print("df col2 meta attrs: ", df.col2_with_meta.attrs)

df col1 meta attrs:  {}
df col2 meta attrs:  {'meta': 'Metadata attached to cumsum column col2'}


![](https://media2.giphy.com/media/3oriO5t2QB4IPKgxHi/200.gif)

### Troubleshooting 2: Electric Boogaloo

In [12]:
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

In [13]:
col1_with_meta = df.col1.test.metacs()
df['col1_with_meta'] = col1_with_meta
df['col1_with_meta'].__finalize__(col1_with_meta)
print("df col1 meta attrs: ", df.col1_with_meta.attrs)
print("df col1 meta id:", id(df.col1_with_meta.attrs))

df col1 meta attrs:  {'meta': 'Metadata attached to cumsum column col1'}
df col1 meta id: 139777223281344


In [14]:
col2_with_meta = df.col2.test.metacs()
df['col2_with_meta'] = col2_with_meta
df['col2_with_meta'].__finalize__(col2_with_meta)
print("df col2 meta attrs: ", df.col2_with_meta.attrs)
print("df col2 meta id:", id(df.col2_with_meta.attrs))

df col2 meta attrs:  {'meta': 'Metadata attached to cumsum column col2'}
df col2 meta id: 139777223288320


In [15]:
print("df col1 meta id:", id(df.col1_with_meta.attrs)) # Changes for some reason
print("df col2 meta id:", id(df.col2_with_meta.attrs))

df col1 meta id: 139777223281152
df col2 meta id: 139777223288320


In [16]:
print("df col1 meta attrs: ", df.col1_with_meta.attrs)
print("df col2 meta attrs: ", df.col2_with_meta.attrs)

df col1 meta attrs:  {}
df col2 meta attrs:  {'meta': 'Metadata attached to cumsum column col2'}


### Troubleshooting 3

In [17]:
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

In [18]:
col1_with_meta = df.col1.test.metacs()
df['col1_with_meta'] = col1_with_meta
print("df col1 meta id:", id(df.col1_with_meta.attrs))
df['col1_with_meta'].__finalize__(col1_with_meta)
print("df col1 meta id:", id(df.col1_with_meta.attrs))

df col1 meta id: 139777223305088
df col1 meta id: 139777223305088


In [19]:
col2_with_meta = df.col2.test.metacs()
print("df col1 meta id:", id(df.col1_with_meta.attrs))
df['col2_with_meta'] = col2_with_meta
print("df col1 meta id:", id(df.col1_with_meta.attrs), "!!! col1 attrs changes after col2 is added 😭😭😭") # But why though
df['col2_with_meta'].__finalize__(col2_with_meta)
print("df col1 meta id:", id(df.col1_with_meta.attrs))

df col1 meta id: 139777223305088
df col1 meta id: 139777222784960 !!! col1 attrs changes after col2 is added 😭😭😭
df col1 meta id: 139777222784960


### Troubleshooting 4

In [122]:
@pd.api.extensions.register_series_accessor('test2')
class TestSeriesAccessor:
    def __init__(self, pandas_object: pd.Series):
        self._obj = pandas_object
        # `_metadata` is a list of strings, strings in it refer to
        # attributes which *should* be kept during standard pandas
        # operations. If you add an attribute manually and don't add
        # it to the _metadata list it will be lost
        if 'oscovida_metadata' not in self._obj._metadata:
            self._obj._metadata.append('oscovida_metadata')
        
        if not hasattr(self._obj, 'oscovida_metadata'):
            self._obj.oscovida_metadata = {}
    
    def metacs(self) -> pd.Series:
        res = self._obj.cumsum()
        res.oscovida_metadata['meta'] = f"Metadata attached to cumsum column {self._obj.name}"
        return res


In [123]:
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

In [124]:
col1_with_meta = df.col1.test2.metacs()
col1_with_meta.oscovida_metadata

{'meta': 'Metadata attached to cumsum column col1'}

In [125]:
df['col1_with_meta'] = col1_with_meta
col1_with_meta.oscovida_metadata

{'meta': 'Metadata attached to cumsum column col1'}

In [126]:
df['col1_with_meta'].oscovida_metadata

AttributeError: 'Series' object has no attribute 'oscovida_metadata'

ok...

In [127]:
df['col1_with_meta'].__finalize__(col1_with_meta)

0    1
1    3
Name: col1, dtype: int64

In [128]:
df['col1_with_meta'].oscovida_metadata

{'meta': 'Metadata attached to cumsum column col1'}

Progress!

In [129]:
col2_with_meta = df.col2.test2.metacs()
df['col2_with_meta'] = col2_with_meta
df['col2_with_meta'].__finalize__(col2_with_meta)

0    3
1    7
Name: col2, dtype: int64

In [134]:
df['col2_with_meta'].test2.oscovida_metadata

AttributeError: 'TestSeriesAccessor' object has no attribute 'oscovida_metadata'

In [131]:
df['col1_with_meta'].oscovida_metadata

AttributeError: 'Series' object has no attribute 'oscovida_metadata'

Aaannndd it's gone again

![](https://en.meming.world/images/en/thumb/6/6e/Surprised_Pikachu.jpg/300px-Surprised_Pikachu.jpg)
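A side note on `_metadata`: it is a class-level list, so `self._obj._metadata.append(...)` in the accessor above changes `pd.Series._metadata` for every Series in the process. The pandas subclassing guide declares `_metadata` on a subclass instead. Below is a minimal sketch of that pattern; `MetaSeries` is a hypothetical name, and only `copy()` is exercised here, so no claim is made about which other operations carry the attribute along.

```python
import pandas as pd


class MetaSeries(pd.Series):
    # Extend the list on the subclass rather than appending to the
    # list shared by all pd.Series instances
    _metadata = pd.Series._metadata + ['oscovida_metadata']

    @property
    def _constructor(self):
        # Make pandas operations return MetaSeries instead of pd.Series
        return MetaSeries


s = MetaSeries([1, 2], name='col1')
s.oscovida_metadata = {'meta': 'Metadata attached to column col1'}

# copy() goes through __finalize__, which copies the names listed in _metadata
copied = s.copy()
copied_meta = copied.oscovida_metadata
```

This only covers Series-to-Series operations. It does not by itself make the attribute survive being assigned into a DataFrame and read back out.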

### Accepting Defeat

Before I open an issue on pandas, does anybody have a clue why this happens? Am I missing something obvious in Python that everybody else knows about and that I have somehow never run into?

In [1]:
import pandas as pd

In [2]:
class SubclassedSeries(pd.Series):

    @property
    def _constructor(self):
        print("SubclassedSeries._constructor:", self.attrs)
        return SubclassedSeries

    @property
    def _constructor_expanddim(self):
        print("SubclassedSeries._constructor_expanddim:", self.attrs)
        return SubclassedDataFrame


class SubclassedDataFrame(pd.DataFrame):
    @property
    def _constructor(self):
        print("SubclassedDataFrame._constructor:", self.attrs)
        return SubclassedDataFrame

    @property
    def _constructor_sliced(self):
        # Note: the label below says "_constructor_expanddim", but this property is _constructor_sliced
        print("SubclassedDataFrame._constructor_expanddim:", self.attrs)
        return SubclassedSeries

In [3]:
@pd.api.extensions.register_series_accessor('test')
class TestSeriesAccessor:
    def __init__(self, pandas_object: SubclassedSeries):
        self._obj = pandas_object
        # `_metadata` is a list of strings, strings in it refer to
        # attributes which *should* be kept during standard pandas
        # operations. If you add an attribute manually and don't add
        # it to the _metadata list it will be lost
    
    def metacs(self) -> SubclassedSeries:
        res = self._obj.cumsum()
        res.attrs['meta'] = f"Metadata attached to cumsum column {self._obj.name}"
        print("TestSeriesAccessor.metacs:", res.attrs)
        return res

In [4]:
df = SubclassedDataFrame({'col1': [1, 2], 'col2': [3, 4]})

In [5]:
df.attrs['df_meta'] = True

In [6]:
col1_m = df.col1.test.metacs()

SubclassedDataFrame._constructor_expanddim: {'df_meta': True}
SubclassedSeries._constructor: {}
TestSeriesAccessor.metacs: {'meta': 'Metadata attached to cumsum column col1'}


In [7]:
col1_m.attrs

{'meta': 'Metadata attached to cumsum column col1'}

In [8]:
df = df.assign(col1_m=col1_m)

SubclassedDataFrame._constructor: {'df_meta': True}


In [9]:
df.col1_m.attrs

SubclassedDataFrame._constructor_expanddim: {'df_meta': True}


{}

In [10]:
df.col1_m.__finalize__(col1_m)

0    1
1    3
Name: col1, dtype: int64

In [11]:
df.col1_m.attrs

{'meta': 'Metadata attached to cumsum column col1'}

In [12]:
col2_m = df.col2.test.metacs()

SubclassedDataFrame._constructor_expanddim: {'df_meta': True}
SubclassedSeries._constructor: {}
TestSeriesAccessor.metacs: {'meta': 'Metadata attached to cumsum column col2'}


In [13]:
df = df.assign(col2_m=col2_m)

SubclassedDataFrame._constructor: {'df_meta': True}


In [14]:
df.col2_m.attrs

SubclassedDataFrame._constructor_expanddim: {'df_meta': True}


{}

In [15]:
df.col2_m.__finalize__(col2_m)

0    3
1    7
Name: col2, dtype: int64

In [16]:
df.col2_m.attrs

{'meta': 'Metadata attached to cumsum column col2'}

In [17]:
df.col1_m.attrs

SubclassedDataFrame._constructor_expanddim: {'df_meta': True}


{}
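One possible workaround, sketched here under assumptions and not taken from the pandas documentation as a recommended pattern: keep the per-column metadata in the DataFrame's own `attrs`, keyed by column name, instead of on the column Series. The key `'column_meta'` and both helper functions are hypothetical names invented for this sketch.

```python
import pandas as pd

META_KEY = 'column_meta'  # hypothetical key name, chosen for this sketch


def set_column_with_meta(df: pd.DataFrame, name: str, values: pd.Series, meta: dict) -> None:
    """Assign `values` as column `name` and record `meta` on the frame itself."""
    df[name] = values
    df.attrs.setdefault(META_KEY, {})[name] = dict(meta)


def get_column_meta(df: pd.DataFrame, name: str) -> dict:
    """Return the metadata recorded for column `name`, or an empty dict."""
    return df.attrs.get(META_KEY, {}).get(name, {})


df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
for col in ['col1', 'col2']:
    meta = {'meta': f"Metadata attached to cumsum column {col}"}
    set_column_with_meta(df, f"{col}_with_meta", df[col].cumsum(), meta)

# Both entries are still there after the second column was added
col1_meta = get_column_meta(df, 'col1_with_meta')
col2_meta = get_column_meta(df, 'col2_with_meta')
```

The trade-off is that the metadata is looked up through the frame (`get_column_meta(df, name)`) rather than as `df[name].attrs`, and whether `df.attrs` survives a given DataFrame operation would still need checking for each operation used.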