# Question from Lilia Schuster

> Sometimes we have to apply operations to parts of DataFrames [here, mostly a merge operation].
> Until now I have always solved that by splitting and then reuniting the DataFrame. Is there a better solution, such as filtering and applying?

_Note: I translated the original notebook from German to English._

## Example: wrangling an FE* network


\***F**inite **E**lement

In [1]:
import pandas as pd
import numpy as np

# Load pickles

There are two DataFrames:

1. The first one contains the elements and, for each element, the ids of its nodes (corners).
2. The second one contains the coordinates of the nodes.

![Elements](./src/all_elements.png)


![Nodes](./src/element_eid2.png)

In [2]:
df_elements = pd.read_pickle('./src/mymesh_elements.pkl')
df_nodes = pd.read_pickle('./src/mymesh_nodes.pkl')

display(df_elements.head())
df_nodes.head()

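| | eid | pid | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | 2 | 11 | 10 | 4 | 5 | 14 | 13 |
| 1 | 2 | 1 | 2 | 3 | 12 | 11 | 5 | 6 | 15 | 14 |
| 2 | 3 | 1 | 4 | 5 | 14 | 13 | 7 | 8 | 17 | 16 |
| 3 | 4 | 1 | 5 | 6 | 15 | 14 | 8 | 9 | 18 | 17 |
| 4 | 5 | 1 | 10 | 11 | 20 | 19 | 13 | 14 | 23 | 22 |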


| | nid | x | y | z |
|---|---|---|---|---|
| 0 | 1 | 0.0 | 0.0 | 2.0 |
| 1 | 2 | 1.0 | 0.0 | 2.0 |
| 2 | 3 | 2.0 | 0.0 | 2.0 |
| 3 | 4 | 0.0 | 1.0 | 2.0 |
| 4 | 5 | 1.0 | 1.0 | 2.0 |
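
For readers who do not have the pickle files, the structure of the two frames can be rebuilt by hand. The sketch below covers only element 1 and its eight nodes, with values copied from the outputs shown in this notebook. The names `toy_elements` and `toy_nodes` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Element 1 and the ids of its eight corner nodes
toy_elements = pd.DataFrame(
    {'eid': [1], 'pid': [1],
     'n1': [1], 'n2': [2], 'n3': [11], 'n4': [10],
     'n5': [4], 'n6': [5], 'n7': [14], 'n8': [13]}
)

# Coordinates of those eight nodes (hypothetical stand-in for `df_nodes`)
toy_nodes = pd.DataFrame(
    {'nid': [1, 2, 4, 5, 10, 11, 13, 14],
     'x': [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
     'y': [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0],
     'z': [2.0, 2.0, 2.0, 2.0, 1.0, 1.0, 1.0, 1.0]}
)
```

Every node id referenced by the element has a row in `toy_nodes`, so the steps that follow can be tried on these two frames as well.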


# Melt `df_elements` into a long format similar to `df_nodes`, then merge with `df_nodes`

Note 1: `pd.melt` is similar to `df.stack`, but it makes it easier to name the columns that hold the stacked variables and values. It also returns a DataFrame rather than a Series.

Note 2: Like Lilia, I had difficulties at first because I tried to bring `df_nodes` into a structure similar to `df_elements`. Doing the opposite turned out to be much easier.
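
To make the comparison in Note 1 concrete, here is a small sketch on a made-up frame (the name `toy` and its values are assumptions, not data from the mesh). It shows that `stack` needs a few extra steps to arrive at what `melt` returns directly.

```python
import pandas as pd

# Hypothetical two-element frame, reduced to two node columns for brevity
toy = pd.DataFrame({'eid': [1, 2], 'n1': [1, 2], 'n2': [2, 3]})

# melt: returns a DataFrame and lets us name both new columns directly
melted = pd.melt(toy, id_vars='eid', var_name='node_id', value_name='subnode_id')

# stack: returns a Series with a MultiIndex, so naming takes extra steps
stacked = (toy.set_index('eid')
              .stack()
              .rename('subnode_id')             # name of the values
              .rename_axis(['eid', 'node_id'])  # names of the index levels
              .reset_index())                   # back to a flat DataFrame
```

Both results hold the same rows, only in a different order: `melt` goes column by column, `stack` row by row.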

In [3]:
df_elements_melted = pd.melt(df_elements,
                             id_vars=('eid', 'pid'),
                             value_vars=[c for c in df_elements.columns if c.startswith('n')],
                             var_name='node_id',
                             value_name='subnode_id')
df_elements_with_coordinates = df_elements_melted.merge(right=df_nodes.rename(columns={'nid':'subnode_id'}),
                                                        how='left',
                                                        on='subnode_id')
df_elements_with_coordinates

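| | eid | pid | node_id | subnode_id | x | y | z |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | n1 | 1 | 0.0 | 0.0 | 2.0 |
| 1 | 2 | 1 | n1 | 2 | 1.0 | 0.0 | 2.0 |
| 2 | 3 | 1 | n1 | 4 | 0.0 | 1.0 | 2.0 |
| 3 | 4 | 1 | n1 | 5 | 1.0 | 1.0 | 2.0 |
| 4 | 5 | 1 | n1 | 10 | 0.0 | 0.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 59 | 4 | 1 | n8 | 17 | 1.0 | 2.0 | 1.0 |
| 60 | 5 | 1 | n8 | 22 | 0.0 | 1.0 | 0.0 |
| 61 | 6 | 1 | n8 | 23 | 1.0 | 1.0 | 0.0 |
| 62 | 7 | 1 | n8 | 25 | 0.0 | 2.0 | 0.0 |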
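
A left merge silently leaves `NaN` coordinates when a node id is missing from the nodes table. As an optional safeguard, `merge` offers the `validate` and `indicator` parameters. The sketch below uses made-up frames (`toy_long`, `toy_nodes`) in which node id 99 is missing on purpose.

```python
import pandas as pd

# Hypothetical long-format elements; node id 99 has no entry in toy_nodes
toy_long = pd.DataFrame({'eid': [1, 1, 2], 'subnode_id': [1, 2, 99]})
toy_nodes = pd.DataFrame({'subnode_id': [1, 2], 'x': [0.0, 1.0]})

checked = toy_long.merge(toy_nodes,
                         how='left',
                         on='subnode_id',
                         validate='many_to_one',  # raises if a node id is duplicated in toy_nodes
                         indicator=True)          # adds a `_merge` column

# Rows whose node id had no match in toy_nodes
unmatched = checked[checked['_merge'] == 'left_only']
```

If `unmatched` is empty, every element node received its coordinates.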


# Pivot to bring it to the desired format (Lilia's requirements)

In [4]:
df_pivot = df_elements_with_coordinates.set_index(['eid', 'pid']).pivot(columns='node_id')
df_pivot.columns = df_pivot.columns.rename('coordinate', level=0) # add name for the first column level
df_pivot

| | coordinate | subnode_id | subnode_id | subnode_id | subnode_id | subnode_id | subnode_id | subnode_id | subnode_id | x | x | ... | y | y | z | z | z | z | z | z | z | z |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | node_id | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 | n1 | n2 | ... | n7 | n8 | n1 | n2 | n3 | n4 | n5 | n6 | n7 | n8 |
| eid | pid | | | | | | | | | | | | | | | | | | | | | |
| 1 | 1 | 1 | 2 | 11 | 10 | 4 | 5 | 14 | 13 | 0.0 | 1.0 | ... | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 2 | 1 | 2 | 3 | 12 | 11 | 5 | 6 | 15 | 14 | 1.0 | 2.0 | ... | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 3 | 1 | 4 | 5 | 14 | 13 | 7 | 8 | 17 | 16 | 0.0 | 1.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 4 | 1 | 5 | 6 | 15 | 14 | 8 | 9 | 18 | 17 | 1.0 | 2.0 | ... | 2.0 | 2.0 | 2.0 | 2.0 | 1.0 | 1.0 | 2.0 | 2.0 | 1.0 | 1.0 |
| 5 | 1 | 10 | 11 | 20 | 19 | 13 | 14 | 23 | 22 | 0.0 | 1.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 6 | 1 | 11 | 12 | 21 | 20 | 14 | 15 | 24 | 23 | 1.0 | 2.0 | ... | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 7 | 1 | 13 | 14 | 23 | 22 | 16 | 17 | 26 | 25 | 0.0 | 1.0 | ... | 2.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
| 8 | 1 | 14 | 15 | 24 | 23 | 17 | 18 | 27 | 26 | 1.0 | 2.0 | ... | 2.0 | 2.0 | 1.0 | 1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 | 0.0 |
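
As far as I can tell, the `set_index(...).pivot(...)` combination used above can also be written as a single `pivot` call by passing the index columns as a list. This assumes pandas 1.1 or newer, and the frame `toy_long` below is made up for illustration.

```python
import pandas as pd

# Hypothetical long-format frame: two elements, two nodes each
toy_long = pd.DataFrame({
    'eid': [1, 2, 1, 2],
    'pid': [1, 1, 1, 1],
    'node_id': ['n1', 'n1', 'n2', 'n2'],
    'subnode_id': [1, 2, 2, 3],
    'x': [0.0, 1.0, 1.0, 2.0],
})

# Variant used in this notebook
via_set_index = toy_long.set_index(['eid', 'pid']).pivot(columns='node_id')

# Single call; a list for `index` assumes pandas >= 1.1
via_pivot = toy_long.pivot(index=['eid', 'pid'], columns='node_id')
```

Both variants should give a frame indexed by (`eid`, `pid`) with two column levels, so which one to use is mostly a matter of taste.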


This DataFrame has MultiIndex columns with two levels. I think it would actually make more sense to have the levels the other way around (n1 -> subnode_id, x, y, z) 🤔. Let's do that.

In [5]:
df_result = df_pivot.swaplevel(i=0, j=1, axis='columns').sort_index(axis='columns')
df_result

| | node_id | n1 | n1 | n1 | n1 | n2 | n2 | n2 | n2 | n3 | n3 | ... | n6 | n6 | n7 | n7 | n7 | n7 | n8 | n8 | n8 | n8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | coordinate | subnode_id | x | y | z | subnode_id | x | y | z | subnode_id | x | ... | y | z | subnode_id | x | y | z | subnode_id | x | y | z |
| eid | pid | | | | | | | | | | | | | | | | | | | | | |
| 1 | 1 | 1 | 0.0 | 0.0 | 2.0 | 2 | 1.0 | 0.0 | 2.0 | 11 | 1.0 | ... | 1.0 | 2.0 | 14 | 1.0 | 1.0 | 1.0 | 13 | 0.0 | 1.0 | 1.0 |
| 2 | 1 | 2 | 1.0 | 0.0 | 2.0 | 3 | 2.0 | 0.0 | 2.0 | 12 | 2.0 | ... | 1.0 | 2.0 | 15 | 2.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 |
| 3 | 1 | 4 | 0.0 | 1.0 | 2.0 | 5 | 1.0 | 1.0 | 2.0 | 14 | 1.0 | ... | 2.0 | 2.0 | 17 | 1.0 | 2.0 | 1.0 | 16 | 0.0 | 2.0 | 1.0 |
| 4 | 1 | 5 | 1.0 | 1.0 | 2.0 | 6 | 2.0 | 1.0 | 2.0 | 15 | 2.0 | ... | 2.0 | 2.0 | 18 | 2.0 | 2.0 | 1.0 | 17 | 1.0 | 2.0 | 1.0 |
| 5 | 1 | 10 | 0.0 | 0.0 | 1.0 | 11 | 1.0 | 0.0 | 1.0 | 20 | 1.0 | ... | 1.0 | 1.0 | 23 | 1.0 | 1.0 | 0.0 | 22 | 0.0 | 1.0 | 0.0 |
| 6 | 1 | 11 | 1.0 | 0.0 | 1.0 | 12 | 2.0 | 0.0 | 1.0 | 21 | 2.0 | ... | 1.0 | 1.0 | 24 | 2.0 | 1.0 | 0.0 | 23 | 1.0 | 1.0 | 0.0 |
| 7 | 1 | 13 | 0.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 | 23 | 1.0 | ... | 2.0 | 1.0 | 26 | 1.0 | 2.0 | 0.0 | 25 | 0.0 | 2.0 | 0.0 |
| 8 | 1 | 14 | 1.0 | 1.0 | 1.0 | 15 | 2.0 | 1.0 | 1.0 | 24 | 2.0 | ... | 2.0 | 1.0 | 27 | 2.0 | 2.0 | 0.0 | 26 | 1.0 | 2.0 | 0.0 |


In [6]:
df_result['n1']

| | coordinate | subnode_id | x | y | z |
|---|---|---|---|---|---|
| eid | pid | | | | |
| 1 | 1 | 1 | 0.0 | 0.0 | 2.0 |
| 2 | 1 | 2 | 1.0 | 0.0 | 2.0 |
| 3 | 1 | 4 | 0.0 | 1.0 | 2.0 |
| 4 | 1 | 5 | 1.0 | 1.0 | 2.0 |
| 5 | 1 | 10 | 0.0 | 0.0 | 1.0 |
| 6 | 1 | 11 | 1.0 | 0.0 | 1.0 |
| 7 | 1 | 13 | 0.0 | 1.0 | 1.0 |
| 8 | 1 | 14 | 1.0 | 1.0 | 1.0 |


In [7]:
df_result['n2']

| | coordinate | subnode_id | x | y | z |
|---|---|---|---|---|---|
| eid | pid | | | | |
| 1 | 1 | 2 | 1.0 | 0.0 | 2.0 |
| 2 | 1 | 3 | 2.0 | 0.0 | 2.0 |
| 3 | 1 | 5 | 1.0 | 1.0 | 2.0 |
| 4 | 1 | 6 | 2.0 | 1.0 | 2.0 |
| 5 | 1 | 11 | 1.0 | 0.0 | 1.0 |
| 6 | 1 | 12 | 2.0 | 0.0 | 1.0 |
| 7 | 1 | 14 | 1.0 | 1.0 | 1.0 |
| 8 | 1 | 15 | 2.0 | 1.0 | 1.0 |


We can also flatten the columns so that the result is easier to work with (selecting from MultiIndex columns or indexes is not trivial).
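
To illustrate why selection gets less obvious with two column levels, here is a sketch on a made-up frame (`toy`) that mimics the column levels of `df_result`. Selecting on the outer level works with plain indexing, while the inner level needs `xs` or `pd.IndexSlice`.

```python
import pandas as pd

# Hypothetical frame with the same two column levels as `df_result`
columns = pd.MultiIndex.from_product([['n1', 'n2'], ['subnode_id', 'x']],
                                     names=['node_id', 'coordinate'])
toy = pd.DataFrame([[1, 0.0, 2, 1.0],
                    [2, 1.0, 3, 2.0]],
                   index=pd.Index([1, 2], name='eid'),
                   columns=columns)

# Outer level: plain indexing is enough
n1_block = toy['n1']

# Inner level: `xs` selects it and drops that level from the result
x_by_node = toy.xs('x', axis='columns', level='coordinate')

# Inner level: `pd.IndexSlice` with `.loc` keeps both levels
x_keep_levels = toy.loc[:, pd.IndexSlice[:, 'x']]
```

With flat column names such as `x1` or `nid2`, all of this reduces to ordinary column selection.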

In [8]:
def rename_columns(tup):
    """Map a (node_id, metric) column tuple to a flat name, e.g. ('n1', 'x') -> 'x1'."""
    node_id, metric = tup  # level 0 and level 1 of the columns
    if metric == 'subnode_id':
        return node_id.replace('n', 'nid')
    elif metric in ('x', 'y', 'z'):
        return metric + node_id.lstrip('n')
    else:
        raise ValueError(f'metric must be one of "subnode_id", "x", "y" or "z", got "{metric}"')


df_result_flattened = df_result.copy()
df_result_flattened.columns = df_result_flattened.columns.map(rename_columns)
df_result_flattened

| eid | pid | nid1 | x1 | y1 | z1 | nid2 | x2 | y2 | z2 | nid3 | x3 | ... | y6 | z6 | nid7 | x7 | y7 | z7 | nid8 | x8 | y8 | z8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0.0 | 0.0 | 2.0 | 2 | 1.0 | 0.0 | 2.0 | 11 | 1.0 | ... | 1.0 | 2.0 | 14 | 1.0 | 1.0 | 1.0 | 13 | 0.0 | 1.0 | 1.0 |
| 2 | 1 | 2 | 1.0 | 0.0 | 2.0 | 3 | 2.0 | 0.0 | 2.0 | 12 | 2.0 | ... | 1.0 | 2.0 | 15 | 2.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 |
| 3 | 1 | 4 | 0.0 | 1.0 | 2.0 | 5 | 1.0 | 1.0 | 2.0 | 14 | 1.0 | ... | 2.0 | 2.0 | 17 | 1.0 | 2.0 | 1.0 | 16 | 0.0 | 2.0 | 1.0 |
| 4 | 1 | 5 | 1.0 | 1.0 | 2.0 | 6 | 2.0 | 1.0 | 2.0 | 15 | 2.0 | ... | 2.0 | 2.0 | 18 | 2.0 | 2.0 | 1.0 | 17 | 1.0 | 2.0 | 1.0 |
| 5 | 1 | 10 | 0.0 | 0.0 | 1.0 | 11 | 1.0 | 0.0 | 1.0 | 20 | 1.0 | ... | 1.0 | 1.0 | 23 | 1.0 | 1.0 | 0.0 | 22 | 0.0 | 1.0 | 0.0 |
| 6 | 1 | 11 | 1.0 | 0.0 | 1.0 | 12 | 2.0 | 0.0 | 1.0 | 21 | 2.0 | ... | 1.0 | 1.0 | 24 | 2.0 | 1.0 | 0.0 | 23 | 1.0 | 1.0 | 0.0 |
| 7 | 1 | 13 | 0.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 | 23 | 1.0 | ... | 2.0 | 1.0 | 26 | 1.0 | 2.0 | 0.0 | 25 | 0.0 | 2.0 | 0.0 |
| 8 | 1 | 14 | 1.0 | 1.0 | 1.0 | 15 | 2.0 | 1.0 | 1.0 | 24 | 2.0 | ... | 2.0 | 1.0 | 27 | 2.0 | 2.0 | 0.0 | 26 | 1.0 | 2.0 | 0.0 |


# Load target pickle and check if results are identical

I created a pickle from the notebook Lilia sent me (in which she used another strategy to obtain this result).

In [9]:
df_target = pd.read_pickle('./src/df_target.pickle')
df_target

| eid | pid | nid1 | x1 | y1 | z1 | nid2 | x2 | y2 | z2 | nid3 | x3 | ... | y6 | z6 | nid7 | x7 | y7 | z7 | nid8 | x8 | y8 | z8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0.0 | 0.0 | 2.0 | 2 | 1.0 | 0.0 | 2.0 | 11 | 1.0 | ... | 1.0 | 2.0 | 14 | 1.0 | 1.0 | 1.0 | 13 | 0.0 | 1.0 | 1.0 |
| 2 | 1 | 2 | 1.0 | 0.0 | 2.0 | 3 | 2.0 | 0.0 | 2.0 | 12 | 2.0 | ... | 1.0 | 2.0 | 15 | 2.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 |
| 3 | 1 | 4 | 0.0 | 1.0 | 2.0 | 5 | 1.0 | 1.0 | 2.0 | 14 | 1.0 | ... | 2.0 | 2.0 | 17 | 1.0 | 2.0 | 1.0 | 16 | 0.0 | 2.0 | 1.0 |
| 4 | 1 | 5 | 1.0 | 1.0 | 2.0 | 6 | 2.0 | 1.0 | 2.0 | 15 | 2.0 | ... | 2.0 | 2.0 | 18 | 2.0 | 2.0 | 1.0 | 17 | 1.0 | 2.0 | 1.0 |
| 5 | 1 | 10 | 0.0 | 0.0 | 1.0 | 11 | 1.0 | 0.0 | 1.0 | 20 | 1.0 | ... | 1.0 | 1.0 | 23 | 1.0 | 1.0 | 0.0 | 22 | 0.0 | 1.0 | 0.0 |
| 6 | 1 | 11 | 1.0 | 0.0 | 1.0 | 12 | 2.0 | 0.0 | 1.0 | 21 | 2.0 | ... | 1.0 | 1.0 | 24 | 2.0 | 1.0 | 0.0 | 23 | 1.0 | 1.0 | 0.0 |
| 7 | 1 | 13 | 0.0 | 1.0 | 1.0 | 14 | 1.0 | 1.0 | 1.0 | 23 | 1.0 | ... | 2.0 | 1.0 | 26 | 1.0 | 2.0 | 0.0 | 25 | 0.0 | 2.0 | 0.0 |
| 8 | 1 | 14 | 1.0 | 1.0 | 1.0 | 15 | 2.0 | 1.0 | 1.0 | 24 | 2.0 | ... | 2.0 | 1.0 | 27 | 2.0 | 2.0 | 0.0 | 26 | 1.0 | 2.0 | 0.0 |


If the following assertion does not raise an error, the two DataFrames are identical and it worked 😎

In [10]:
pd.testing.assert_frame_equal(df_result_flattened, df_target)
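
For anyone unfamiliar with `pd.testing.assert_frame_equal`: it returns nothing when the frames match and raises an `AssertionError` describing the first mismatch otherwise. A minimal sketch with made-up frames (`expected`, `same`, `different`):

```python
import pandas as pd

expected = pd.DataFrame({'nid1': [1, 2], 'x1': [0.0, 1.0]})
same = expected.copy()
different = expected.assign(x1=[0.0, 1.5])  # one value changed on purpose

# Identical frames: returns None and nothing is printed
pd.testing.assert_frame_equal(same, expected)

# Differing frames: raises AssertionError
try:
    pd.testing.assert_frame_equal(different, expected)
    frames_match = True
except AssertionError:
    frames_match = False
```

If the two frames differ only in dtype (for example integer versus float node ids), passing `check_dtype=False` relaxes the comparison.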