# Match with suffix `in ground`

Many flows are identical in both lists except that the ecoinvent name carries the suffix `, in ground`, which the SimaPro name omits, e.g.

* Copper, 0.52% in sulfide, Cu 0.27% and Mo 8.2E-3% in crude ore
* Copper, 0.52% in sulfide, Cu 0.27% and Mo 8.2E-3% in crude ore, in ground

These are all natural resources: the context is `('Resource', 'in ground')` in SimaPro and `('natural resource', 'in ground')` in ecoinvent.
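The idea can be sketched on toy data before running it on the real flow lists. The frames `sp_toy` and `ei_toy` and the name "Hypothetical flow A" are made up for illustration; only the copper flow name comes from the example above.

```python
import pandas as pd

# Toy stand-ins for the two flow lists (hypothetical rows, not the real data)
sp_toy = pd.DataFrame({
    "Flowable": [
        "Copper, 0.52% in sulfide, Cu 0.27% and Mo 8.2E-3% in crude ore",
        "Hypothetical flow A",
    ],
})
ei_toy = pd.DataFrame({
    "Flowable": [
        "Copper, 0.52% in sulfide, Cu 0.27% and Mo 8.2E-3% in crude ore, in ground",
        "Hypothetical flow A",
    ],
})

# Append the suffix to the SimaPro-style name and join on the result
sp_toy["plus_in_ground"] = sp_toy["Flowable"] + ", in ground"
matched = sp_toy.merge(
    ei_toy,
    how="inner",
    left_on="plus_in_ground",
    right_on="Flowable",
    suffixes=("_sp", "_ei"),
)
print(matched[["Flowable_sp", "Flowable_ei"]])
```

Only the copper flow matches here. "Hypothetical flow A" has the same name in both lists, so it is not picked up by the suffix rule and would have to be handled by an exact-name match instead.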

In [1]:
import pandas as pd
from pathlib import Path

In [22]:
input_data_dir = (Path.cwd().parent / "Mapping" / "Input" / "Flowlists").resolve()
output_dir = (Path.cwd().parent / "Contribute").resolve()

In [3]:
sp = pd.read_csv(input_data_dir / 'SimaProv9.4.csv')

Add a column to `sp` that holds each flow name with the suffix `, in ground` appended:

In [4]:
sp['plus_in_ground'] = sp['Flowable'] + ", in ground"

Filter to only consider natural resources:

In [5]:
sp = sp[sp.Context == 'Raw materials']

In [6]:
ei = pd.read_csv(input_data_dir / 'ecoinventEFv3.7.csv')

In [7]:
ei.columns

Index(['Flowable', 'CASNo', 'Formula', 'Synonyms', 'Unit', 'Class',
       'ExternalReference', 'Preferred', 'Context', 'FlowUUID', 'AltUnit',
       'Unnamed: 11', 'Second CAS'],
      dtype='object')

In [9]:
df = sp.merge(ei, how="inner", left_on="plus_in_ground", right_on="Flowable")
len(df)

167
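An inner merge repeats rows whenever a key occurs more than once on either side, so the row count alone does not show whether every match is unique. A minimal sketch of one way to check this, using hypothetical toy frames (`left`, `right`, and the UUID values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy frames; the target list repeats one name on purpose
left = pd.DataFrame({"plus_in_ground": ["A, in ground", "B, in ground"]})
right = pd.DataFrame({
    "Flowable": ["A, in ground", "A, in ground", "B, in ground"],
    "FlowUUID": ["u1", "u2", "u3"],
})

merged = left.merge(right, how="inner", left_on="plus_in_ground", right_on="Flowable")

# Rows whose source key occurs more than once are ambiguous matches
ambiguous = merged[merged.duplicated("plus_in_ground", keep=False)]
print(len(merged), len(ambiguous))
```

If `ambiguous` is empty, each source flow matched a single target flow. Alternatively, passing `validate="one_to_one"` to `merge` makes pandas raise a `MergeError` when the keys are not unique on both sides.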

Adjust the column names to match the expected format:

In [10]:
# After the merge, `_x` columns come from SimaPro and `_y` columns from ecoinvent
df = df.rename(columns={
    'Flowable_x': 'SourceFlowName',
    'Flow UUID': 'SourceFlowUUID',
    'Unit_x': 'SourceUnit',
    'FlowUUID': 'TargetFlowUUID',
    'Context_y': 'TargetFlowContext',
    'Flowable_y': 'TargetFlowName',
    'Unit_y': 'TargetUnit',
})

Add some useful columns:

In [11]:
df['SourceListName'] = 'SimaPro9.4'
df['TargetListName'] = 'ecoinventEFv3.7'
df['SourceFlowContext'] = 'Resource/in ground'
df['MatchCondition'] = "="
df['Mapper'] = 'Chris Mutel'
df['MemoMapper'] = 'Automated match. Notebook: Match - Match with suffix in ground'
df['Verifier'] = ''

In [23]:
def export_dataframe(df, name):
    """Write the mapping-spec columns of `df` to `output_dir / name` as CSV."""
    SPEC_COLUMNS = [
        "SourceListName", "SourceFlowName", "SourceFlowUUID", "SourceFlowContext", "SourceUnit",
        "MatchCondition", "ConversionFactor", "TargetListName", "TargetFlowName", "TargetFlowUUID",
        "TargetFlowContext", "TargetUnit", "Mapper", "Verifier", "LastUpdated", "MemoMapper",
        "MemoVerifier", "MemoSource", "MemoTarget"
    ]
    # Keep only the spec columns that exist in df, in spec order
    df = df[[col for col in SPEC_COLUMNS if col in df.columns]]
    df.to_csv(output_dir / name, index=False)

In [24]:
export_dataframe(df, "with_in_ground.csv")
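Note that the column selection in `export_dataframe` silently omits spec columns that are not present in `df` (here, for instance, `ConversionFactor` and `LastUpdated` are never set). The sketch below illustrates this behaviour on a hypothetical three-column spec and, as an alternative that is an assumption and not part of the notebook, shows how `DataFrame.reindex` would keep the missing columns as empty ones.

```python
import pandas as pd

# Hypothetical, shortened spec and toy frame for illustration only
SPEC = ["SourceFlowName", "ConversionFactor", "TargetFlowName"]
toy = pd.DataFrame({"TargetFlowName": ["b"], "Extra": [1], "SourceFlowName": ["a"]})

# Same selection as in export_dataframe: spec order, missing columns dropped
selected = toy[[col for col in SPEC if col in toy.columns]]
print(list(selected.columns))

# Alternative: reindex keeps every spec column, filling missing ones with NaN
full = toy.reindex(columns=SPEC)
print(list(full.columns))
```

Which variant is preferable depends on whether the consumer of the CSV expects every spec column to be present in the header.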