# Post Processing

Post-processing of extractions on the cogs project is handled almost exclusively with the Python pandas library.

There are many sources of information on pandas, so for this guide we'll focus on a few commonly recurring patterns.

As we're only concerned with post-processing at this point, we'll start by loading an already-flattened CSV into pandas.


```python
#! Runnable Cell

# Load the flattened CSV via pandas.
# Note: this gets us to the same point (a DataFrame of flat data) as ConversionSegment().topandas() would.
import pandas as pd

df = pd.read_csv("./images/ExampleDatabakerOutput.csv")

df[:10]  # 10-row preview; change the slice to preview as many rows as you want
```

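| Unnamed: 0 | observation | Assets | Name | Group |
|---|---|---|---|---|
| 0 | 1 | Houses | John | Beatles |
| 1 | 6 | Cars | John | Beatles |
| 2 | 1 | Businesses | John | Beatles |
| 3 | 2 | Houses | Paul | Beatles |
| 4 | 4 | Cars | Paul | Beatles |
| 5 | 6 | Businesses | Paul | Beatles |
| 6 | 3 | Houses | George | Beatles |
| 7 | 3 | Cars | George | Beatles |
| 8 | 2 | Businesses | George | Beatles |
| 9 | 8 | Houses | Ringo | Beatles |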
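
The `Unnamed: 0` column in the preview is most likely the row index that was saved into the CSV alongside the data: when the first header cell is blank, `read_csv` names that column `Unnamed: 0`. A minimal sketch, using a few inline rows as a stand-in for the real file (an assumption, not the actual file contents), of passing `index_col=0` so that column becomes the DataFrame index instead:

```python
import io

import pandas as pd

# Stand-in for ./images/ExampleDatabakerOutput.csv, assumed to have been
# written with its index (hence the blank first header cell).
csv_text = """,observation,Assets,Name,Group
0,1,Houses,John,Beatles
1,6,Cars,John,Beatles
2,1,Businesses,John,Beatles
"""

# Default read: the blank header becomes a column called "Unnamed: 0".
df_default = pd.read_csv(io.StringIO(csv_text))
print(list(df_default.columns))
# → ['Unnamed: 0', 'observation', 'Assets', 'Name', 'Group']

# index_col=0 uses the first column as the row index instead of as data.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(df.columns))
# → ['observation', 'Assets', 'Name', 'Group']
```

For the real file, the equivalent would be `pd.read_csv("./images/ExampleDatabakerOutput.csv", index_col=0)`.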


## Example Operations

From here we'll rapid-fire through some simple techniques.

Feel free to run them, alter them and run them again.

## 1.) Pandas Function Apply

```python
# String replacement
# Replace the name "John" with "James" in the "Name" dimension.
# Series.apply calls the function on each cell in turn.

def replace(value):
    if value == "John":
        return "James"
    return value

df["Name"] = df["Name"].apply(replace)
df[:5]
```

| Unnamed: 0 | observation | Assets | Name | Group |
|---|---|---|---|---|
| 0 | 1 | Houses | James | Beatles |
| 1 | 6 | Cars | James | Beatles |
| 2 | 1 | Businesses | James | Beatles |
| 3 | 2 | Houses | Paul | Beatles |
| 4 | 4 | Cars | Paul | Beatles |
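
When the change is a straight swap of one whole value for another, `Series.replace` with a dict does the same job as the `replace` function above without writing a custom function. A sketch on a small hypothetical frame, which also uses `DataFrame.replace` with a nested `{column: {old: new}}` dict to remap several columns in one call:

```python
import pandas as pd

# Hypothetical sample data shaped like the example output above.
df = pd.DataFrame({
    "observation": [1, 6, 2],
    "Assets": ["Houses", "Cars", "Houses"],
    "Name": ["John", "John", "Paul"],
    "Group": ["Beatles", "Beatles", "Beatles"],
})

# Series.replace with a dict swaps whole cell values only: "John" -> "James".
# Cells that don't exactly match a key are left unchanged.
df["Name"] = df["Name"].replace({"John": "James"})

# DataFrame.replace reads a nested dict as {column: {old: new}},
# so several columns can be remapped in a single call.
df = df.replace({"Name": {"Paul": "Pete"}, "Assets": {"Cars": "Bikes"}})
print(df["Name"].tolist())    # → ['James', 'James', 'Pete']
print(df["Assets"].tolist())  # → ['Houses', 'Bikes', 'Houses']
```

Keeping the mapping in a dict also makes it easy to grow the list of replacements without adding more `if` branches.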


## 2.) Using a lambda function

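```python
# String replacement
# Replace the name "Paul" with "Pete" in the "Name" dimension.
# Series.map calls the lambda on each cell in turn.
# Note: str.replace swaps substrings, so a name like "Paula" would also change.

df["Name"] = df["Name"].map(lambda x: x.replace("Paul", "Pete"))
df[:5]
```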

| Unnamed: 0 | observation | Assets | Name | Group |
|---|---|---|---|---|
| 0 | 1 | Houses | James | Beatles |
| 1 | 6 | Cars | James | Beatles |
| 2 | 1 | Businesses | James | Beatles |
| 3 | 2 | Houses | Pete | Beatles |
| 4 | 4 | Cars | Pete | Beatles |


## 3.) Whole Series Operations

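```python
# String replacement
# Replace every occurrence of "Cars" in the "Assets" dimension with "Bikes".
# Operates against the whole Series at once rather than cell by cell.

# Select rows where Assets == "Cars", and only the "Assets" column within them.
# (Without the column label, every column of those rows would be set to "Bikes".)
df.loc[df["Assets"] == "Cars", "Assets"] = "Bikes"
df[:5]
```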

| Unnamed: 0 | observation | Assets | Name | Group |
|---|---|---|---|---|
| 0 | 1 | Houses | James | Beatles |
| 1 | 6 | Bikes | James | Beatles |
| 2 | 1 | Businesses | James | Beatles |
| 3 | 2 | Houses | Pete | Beatles |
| 4 | 4 | Bikes | Pete | Beatles |
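
With `.loc`, what gets overwritten depends on whether a column label is passed alongside the boolean row selector. A small sketch on hypothetical data showing both forms side by side, each applied to a copy so the original frame is untouched:

```python
import pandas as pd

# Hypothetical sample shaped like the example data (string columns only).
df = pd.DataFrame({
    "Assets": ["Houses", "Cars", "Businesses"],
    "Name": ["James", "James", "James"],
    "Group": ["Beatles", "Beatles", "Beatles"],
})

mask = df["Assets"] == "Cars"  # boolean Series, True where Assets is "Cars"

# Row selector only: every column of the matching rows is overwritten.
whole_rows = df.copy()
whole_rows.loc[mask] = "Bikes"
print(whole_rows.loc[1].tolist())  # → ['Bikes', 'Bikes', 'Bikes']

# Row selector plus column label: only the Assets cell changes.
targeted = df.copy()
targeted.loc[mask, "Assets"] = "Bikes"
print(targeted.loc[1].tolist())  # → ['Bikes', 'James', 'Beatles']
```

The sample sticks to string columns on purpose: writing a string into a numeric column via the row-only form triggers a dtype warning or error in recent pandas versions.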
