In [None]:
#| default_exp desapher

# DeSAPher

> SAP Data Dictionary Explorer. Decipher cryptic SAP column names into something more descriptive.

## The Problem
Working with SAP systems often means dealing with hundreds of tables with cryptic names like MARA, VBAK, or MAKT. Understanding what these tables contain and how they relate to each other traditionally requires extensive documentation lookup or SAP expertise.

## Our Solution
We created a tool that automatically fetches and structures SAP table definitions from the online documentation at sapdatasheet.org. This gives us instant access to:
- Field names and their meanings
- Data types and lengths
- Relationships between tables (via check tables)
- Short descriptions of each field
- Table names and descriptions

## Benefits
- **Time Saving**: No more manual documentation lookups
- **Better Understanding**: Clear visibility of table structures and meanings
- **Easier Data Analysis**: Quick reference for field names and their purposes
- **Knowledge Sharing**: Makes SAP data structures more accessible to team members

## Future Directions

### 1. Interactive Interface
- Build a searchable interface for quick table/field lookups
- Implement full-text search across descriptions
- Add semantic search using LLMs to find relevant tables by describing needs in plain English

### 2. Visual Data Model
- Create interactive graph visualizations showing table relationships
- Highlight primary/foreign key connections
- Enable visual exploration of the SAP data model

### 3. AI-Powered Data Assistant
- Use LLMs with our structured documentation as context
- Generate SQL queries from natural language questions
- Suggest relevant tables for specific business questions
- Provide data model explanations in plain language

# Our plan

As data scientists working with ERP systems like SAP, we often encounter large datasets with hundreds of cryptically named columns. To better understand these data sources, we aim to scrape the SAP Data Dictionary documentation published on sapdatasheet.org to get programmatic access to table definitions, column descriptions, and data types.

## Steps
1. Scrape the column descriptions for one table (for example, MARA)
2. Bring them into an easily searchable format
3. Apply the process to all tables


### Scrape the column descriptions for one table (for example, MARA)

In [None]:
url = 'https://www.sapdatasheet.org/abap/tabl/mara.html'

In [None]:
#| export
from mis_analytics.core import *
import httpx
from bs4 import BeautifulSoup
import pandas as pd

In [None]:
response = httpx.get(url)
response

<Response [200 OK]>

In [None]:
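# Abridged copy of the components table HTML from the MARA page (two rows kept), for reference only; the code below parses the live response
sample_text = '''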
<table class="table table-sm">
                                <caption class="text-right sapds-alv">
                                    <a href="/download/abap-tabl-component.php?format=csv&amp;tabname=MARA" title="Download components as CSV file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_wdvtxe.gif"></a> &nbsp;
                                    <a href="/download/abap-tabl-component.php?format=xls&amp;tabname=MARA" title="Download components as Excel 97-2003 Worksheet (.xls) file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_x__xls.gif"></a> &nbsp;
                                    <a href="/download/abap-tabl-component.php?format=xlsx&amp;tabname=MARA" title="Download components as Excel Open XML Format Spreadsheet (.xlsx) file.
The downloaded file contains more columns than displayed here." target="_blank">
                                        <img src="/abap/icon/s_lisvie.gif"></a> &nbsp;
                                </caption>
                                <thead>
                                    <tr>
                                        <th class="sapds-alv"> <img src="/abap/icon/s_b_pvre.gif"> </th>
                                        <th class="sapds-alv"> Field </th>
                                        <th class="sapds-alv"> Key </th>
                                        <th class="sapds-alv"> Data Element</th>
                                        <th class="sapds-alv"> Domain</th>
                                        <th class="sapds-alv"> Data<br>Type</th>
                                        <th class="sapds-alv"> Length</th>
                                        <th class="sapds-alv"> Decimal<br>Places</th>
                                        <th class="sapds-alv"> Short Description</th>
                                        <th class="sapds-alv"> Check<br>table</th>
                                    </tr>
                                </thead>
                                <tbody>
                                                                            <tr>
                                            <td class="sapds-alv"> <a id="FIELD_MANDT"></a> 1 </td>
                                            <td class="sapds-alv"> <img src="/abap/icon/s_struct.gif">                                                 <a href="/abap/tabl/mara-mandt.html" title="MANDT" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv text-center"> <input type="checkbox" name="field_MANDT" disabled="disabled" checked="checked"> </td>
                                            <td class="sapds-alv"> <a href="/abap/dtel/mandt.html" title="Client" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/mandt.html" title="Client (key field in client-specific tables)" target="_blank">MANDT</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" title="Dictionary Data Type" target="_blank">CLNT</a> </td>
                                            <td class="sapds-alv text-right"> 3 &nbsp; </td>
                                            <td class="sapds-alv text-right"> 0 &nbsp; </td>
                                            <td class="sapds-alv"> Client </td>
                                            <td class="sapds-alv"> <a href="/abap/tabl/t000.html" title="Clients" target="_blank">T000</a> </td>
                                        </tr>
                                                                            <tr>
                                            <td class="sapds-alv"> <a id="FIELD_FASHGRD"></a> 239 </td>
                                            <td class="sapds-alv"> <img src="/abap/icon/s_struct.gif">                                                 <a href="/abap/tabl/mara-fashgrd.html" title="FASHGRD" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv text-center"> <input type="checkbox" name="field_FASHGRD" disabled="disabled"> </td>
                                            <td class="sapds-alv"> <a href="/abap/dtel/fashgrd.html" title="Fashion Grade" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/fashgrd.html" title="Fashion Grade" target="_blank">FASHGRD</a> </td>
                                            <td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" title="Dictionary Data Type" target="_blank">CHAR</a> </td>
                                            <td class="sapds-alv text-right"> 4 &nbsp; </td>
                                            <td class="sapds-alv text-right"> 0 &nbsp; </td>
                                            <td class="sapds-alv"> Fashion Grade </td>
                                            <td class="sapds-alv"> <a href="/abap/tabl/t6wfg.html" title="Degree of Fashion" target="_blank">T6WFG</a> </td>
                                        </tr>
                                                                    </tbody>
                            </table>'''

In [None]:
soup = BeautifulSoup(response.text, 'lxml')
table = soup.find('table', class_='table table-sm')

### Bring them into an easily searchable format

In [None]:
headers = [th.text.strip() for th in table.find('thead').find_all('th')]
headers

['',
 'Field',
 'Key',
 'Data Element',
 'Domain',
 'DataType',
 'Length',
 'DecimalPlaces',
 'Short Description',
 'Checktable']
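The `<br>` inside headers such as `Data<br>Type` is why the output shows `'DataType'`, `'DecimalPlaces'` and `'Checktable'`: `.text` concatenates the text on either side of the `<br>` with nothing in between. The sketch below uses a hand-written, abridged copy of the page's header row to compare this with BeautifulSoup's `get_text(" ", strip=True)`, which strips each text piece and joins them with a space. The notebook keeps `.text`, since later code looks up the `'DataType'` column by that exact name.

```python
from bs4 import BeautifulSoup

# Abridged copy of the header row from the MARA components table
html = """
<table><thead><tr>
  <th> Field </th>
  <th> Data<br>Type</th>
  <th> Decimal<br>Places</th>
  <th> Check<br>table</th>
</tr></thead></table>
"""

soup = BeautifulSoup(html, "html.parser")
ths = soup.find("thead").find_all("th")

# .text glues the text around <br> together with no separator
joined = [th.text.strip() for th in ths]

# get_text(" ", strip=True) strips each text piece and joins the pieces with a space
spaced = [th.get_text(" ", strip=True) for th in ths]

print(joined)  # ['Field', 'DataType', 'DecimalPlaces', 'Checktable']
print(spaced)  # ['Field', 'Data Type', 'Decimal Places', 'Check table']
```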

In [None]:
trs = table.find('tbody').find_all('tr')  # find_all already returns a list of <tr> tags
trs[0]

<tr>
<td class="sapds-alv"> <a id="FIELD_MANDT"></a> 1 </td>
<td class="sapds-alv"> <img src="/abap/icon/s_struct.gif"/> <a href="/abap/tabl/mara-mandt.html" target="_blank" title="MANDT">MANDT</a> </td>
<td class="sapds-alv text-center"> <input checked="checked" disabled="disabled" name="field_MANDT" type="checkbox"/> </td>
<td class="sapds-alv"> <a href="/abap/dtel/mandt.html" target="_blank" title="Client">MANDT</a> </td>
<td class="sapds-alv"> <a href="/abap/doma/mandt.html" target="_blank" title="Client (key field in client-specific tables)">MANDT</a> </td>
<td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" target="_blank" title="Dictionary Data Type">CLNT</a> </td>
<td class="sapds-alv text-right"> 3   </td>
<td class="sapds-alv text-right"> 0   </td>
<td class="sapds-alv"> Client </td>
<td class="sapds-alv"> <a href="/abap/tabl/t000.html" target="_blank" title="Clients">T000</a> </td>
</tr>

In [None]:
trs[1]

<tr>
<td class="sapds-alv"> <a id="FIELD_MATNR"></a> 2 </td>
<td class="sapds-alv"> <img src="/abap/icon/s_struct.gif"/> <a href="/abap/tabl/mara-matnr.html" target="_blank" title="MATNR">MATNR</a> </td>
<td class="sapds-alv text-center"> <input checked="checked" disabled="disabled" name="field_MATNR" type="checkbox"/> </td>
<td class="sapds-alv"> <a href="/abap/dtel/matnr.html" target="_blank" title="Material Number">MATNR</a> </td>
<td class="sapds-alv"> <a href="/abap/doma/matnr.html" target="_blank" title="Material number (field C18)">MATNR</a> </td>
<td class="sapds-alv"> <a href="/abap/doma/datatype.html#values" target="_blank" title="Dictionary Data Type">CHAR</a> </td>
<td class="sapds-alv text-right"> 18   </td>
<td class="sapds-alv text-right"> 0   </td>
<td class="sapds-alv"> Material Number </td>
<td class="sapds-alv">   </td>
</tr>

In [None]:
[td.text.strip() for td in trs[1].find_all('td')]

['2', 'MATNR', '', 'MATNR', 'MATNR', 'CHAR', '18', '0', 'Material Number', '']
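The Key column comes out as `''` even for MATNR, whose checkbox is checked in `trs[1]`. That cell contains only a disabled `<input type="checkbox">` with no text, so the key flag lives in the `checked` attribute rather than in `.text`. A minimal sketch of reading it, using the MANDT and FASHGRD rows from `sample_text` above trimmed to their first three cells; `key_flags` is just an illustrative name.

```python
from bs4 import BeautifulSoup

# MANDT (key field) and FASHGRD (non-key) rows, trimmed from the MARA components table
html = """
<table><tbody>
<tr>
  <td> <a id="FIELD_MANDT"></a> 1 </td>
  <td> <a href="/abap/tabl/mara-mandt.html">MANDT</a> </td>
  <td> <input type="checkbox" name="field_MANDT" disabled="disabled" checked="checked"> </td>
</tr>
<tr>
  <td> <a id="FIELD_FASHGRD"></a> 239 </td>
  <td> <a href="/abap/tabl/mara-fashgrd.html">FASHGRD</a> </td>
  <td> <input type="checkbox" name="field_FASHGRD" disabled="disabled"> </td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")

key_flags = {}
for tr in soup.find("tbody").find_all("tr"):
    tds = tr.find_all("td")
    field = tds[1].get_text(strip=True)
    # The Key cell has no text; a key field is marked by a checkbox carrying a "checked" attribute
    key_flags[field] = tds[2].find("input", checked=True) is not None

print(key_flags)  # {'MANDT': True, 'FASHGRD': False}
```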

In [None]:
rows = []
for tr in table.find('tbody').find_all('tr'):
    row = [td.text.strip() for td in tr.find_all('td')]
    rows.append(row)

rows[10]

['11',
 'MTART',
 '',
 'MTART',
 'MTART',
 'CHAR',
 '4',
 '0',
 'Material type',
 'T134']

In [None]:
df = pd.DataFrame(rows, columns=headers)
df.head()

|   |   | Field | Key | Data Element | Domain | DataType | Length | DecimalPlaces | Short Description | Checktable |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 |
| 1 | 2 | MATNR |  | MATNR | MATNR | CHAR | 18 | 0 | Material Number |  |
| 2 | 3 | .INCLUDE |  |  |  |  | 0 | 0 | Data Division MARA |  |
| 3 | 4 | ERSDA |  | ERSDA | DATUM | DATS | 8 | 0 | Created On |  |
| 4 | 5 | ERNAM |  | ERNAM | USNAM | CHAR | 12 | 0 | Name of Person who Created the Object |  |


In [None]:
#| export
def get_sap_table_structure(url):
    """
    Scrapes SAP table structure from sapdatasheet.org and returns a pandas DataFrame
    """
    import httpx
    from bs4 import BeautifulSoup
    import pandas as pd
    
    response = httpx.get(url)
    soup = BeautifulSoup(response.text, 'lxml')
    
    table = soup.find('table', class_='table table-sm')
    headers = [th.text.strip() for th in table.find('thead').find_all('th')]
    
    rows = []
    for tr in table.find('tbody').find_all('tr'):
        row = [td.text.strip() for td in tr.find_all('td')]
        rows.append(row)
        
    return pd.DataFrame(rows, columns=headers)

In [None]:
df = get_sap_table_structure(url)
df.head()

|   |   | Field | Key | Data Element | Domain | DataType | Length | DecimalPlaces | Short Description | Checktable |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 |
| 1 | 2 | MATNR |  | MATNR | MATNR | CHAR | 18 | 0 | Material Number |  |
| 2 | 3 | .INCLUDE |  |  |  |  | 0 | 0 | Data Division MARA |  |
| 3 | 4 | ERSDA |  | ERSDA | DATUM | DATS | 8 | 0 | Created On |  |
| 4 | 5 | ERNAM |  | ERNAM | USNAM | CHAR | 12 | 0 | Name of Person who Created the Object |  |


In [None]:
url_2 = 'https://www.sapdatasheet.org/abap/tabl/makt.html'

df_2 = get_sap_table_structure(url_2)
df_2

|   |   | Field | Key | Data Element | Domain | DataType | Length | DecimalPlaces | Short Description | Checktable |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 |
| 1 | 2 | MATNR |  | MATNR | MATNR | CHAR | 18 | 0 | Material Number | MARA |
| 2 | 3 | SPRAS |  | SPRAS | SPRAS | LANG | 1 | 0 | Language Key | T002 |
| 3 | 4 | MAKTX |  | MAKTX | TEXT40 | CHAR | 40 | 0 | Material description |  |
| 4 | 5 | MAKTG |  | MAKTG | CHAR40 | CHAR | 40 | 0 | Material description in upper case for matchcodes |  |


### Apply the process to all tables

In [None]:
#| export
def get_sap_table_url(table_name):
    """
    Constructs sapdatasheet.org URL from SAP table name
    """
    return f'https://www.sapdatasheet.org/abap/tabl/{table_name.lower()}.html'

In [None]:
get_sap_table_url('VBAK')

'https://www.sapdatasheet.org/abap/tabl/vbak.html'

In [None]:
#| export
def get_sap_tables_structure(tables):
    """ Gets structure for multiple SAP tables and combines them into one DataFrame with a column indicating the source table """
    dfs = []
    for table in tables:
        url = get_sap_table_url(table)
        df = get_sap_table_structure(url)
        df['Table'] = table
        dfs.append(df)
    
    return pd.concat(dfs, ignore_index=True)

In [None]:
#| export
def get_sap_table_structure(url):
    """
    Scrapes SAP table structure from sapdatasheet.org and returns a pandas DataFrame
    Returns None if table not found or other error occurs
    """
    try:
        import httpx
        from bs4 import BeautifulSoup
        import pandas as pd
        
        response = httpx.get(url)
        response.raise_for_status()  # Raise error for bad status codes
        
        soup = BeautifulSoup(response.text, 'lxml')
        table = soup.find('table', class_='table table-sm')
        
        if table is None:
            print(f"No table found at {url}")
            return None
            
        headers = [th.text.strip() for th in table.find('thead').find_all('th')]
        
        rows = []
        for tr in table.find('tbody').find_all('tr'):
            row = [td.text.strip() for td in tr.find_all('td')]
            rows.append(row)
            
        return pd.DataFrame(rows, columns=headers)
        
    except Exception as e:
        print(f"Error processing {url}: {str(e)}")
        return None


In [None]:
#| export
def get_sap_tables_structure(tables):
    """ Gets structure for multiple SAP tables and combines them into one DataFrame with a column indicating the source table """
    dfs = []
    for table in tables:
        url = get_sap_table_url(table)
        df = get_sap_table_structure(url)
        if df is not None:
            df['Table'] = table
            dfs.append(df)
    
    if not dfs:
        return None
    return pd.concat(dfs, ignore_index=True)

In [None]:
tables = ['MARC', 'MARD', 'MARM', 'MBEW']

In [None]:
df = get_sap_tables_structure(tables)
df.sample(10)

|   |   | Field | Key | Data Element | Domain | DataType | Length | DecimalPlaces | Short Description | Checktable | Table |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 259 | 31 | DISKZ |  | DISKZ | DISKZ | CHAR | 1 | 0 | Storage location MRP indicator |  | MARD |
| 350 | 41 | BWPRH |  | BWPRH | WERT11 | CURR | 11 | 2 | Valuation price based on commercial law: level 1 |  | MBEW |
| 151 | 152 | CCFIX |  | CCFIX | XFELD | CHAR | 1 | 0 | CC indicator is fixed |  | MARC |
| 314 | 5 | .INCLUDE |  |  |  |  | 0 | 0 | Data Division MBEW |  | MBEW |
| 417 | 108 | QKLAS |  | QKLAS | BKLAS | CHAR | 4 | 0 | Valuation Class for Project Stock | T025 | MBEW |
| 207 | 208 | UCMAT |  | VBOB_OB_RFMAT | MATNR | CHAR | 18 | 0 | Reference Material for Original Batches | MARA | MARC |
| 268 | 40 | KINSM |  | KINSM | MENG13V | QUAN | 13 | 3 | Consignment stock in quality inspection |  | MARD |
| 170 | 171 | FVIDK |  | CK_VERID | VERID | CHAR | 4 | 0 | Production version to be costed | * | MARC |
| 0 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 | MARC |
| 230 | 2 | MATNR |  | MATNR | MATNR | CHAR | 18 | 0 | Material Number | MARA | MARD |
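The Checktable column is where relationships between tables show up (for example, MARC's UCMAT field points to MARA). As a first step toward the visual data model listed under Future Directions, the combined sheet can be reduced to an edge list. This is a minimal sketch on a few rows copied from the sample above; blank values and the `*` placeholder are dropped because they don't name a concrete table (we don't try to interpret `*` here).

```python
import pandas as pd

# A few rows copied from the combined structure sheet above
sheet = pd.DataFrame({
    "Table":      ["MARC", "MARC", "MARC", "MARD", "MARD", "MBEW"],
    "Field":      ["MANDT", "UCMAT", "FVIDK", "DISKZ", "MATNR", "QKLAS"],
    "Checktable": ["T000", "MARA", "*", "", "MARA", "T025"],
})

# Keep only rows whose Checktable names a concrete table
mask = sheet["Checktable"].str.strip().ne("") & sheet["Checktable"].ne("*")

# One edge per (table, field) -> check table relationship
edges = (
    sheet.loc[mask, ["Table", "Field", "Checktable"]]
    .rename(columns={"Table": "source", "Checktable": "target"})
    .reset_index(drop=True)
)
print(edges)
```

An edge list like this can be fed to most graph libraries, or aggregated with `edges.groupby(["source", "target"]).size()` to see which tables are referenced most.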


### Let's also grab the description of each table

In [None]:
url

'https://www.sapdatasheet.org/abap/tabl/mara.html'

In [None]:
#| export
def get_sap_table_description(url):
    """
    Scrapes SAP table description from sapdatasheet.org
    Returns None if not found or error occurs
    """
    try:
        import httpx
        from bs4 import BeautifulSoup
        
        response = httpx.get(url)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'lxml')
        header = soup.find('div', class_='card-header sapds-card-header')
        
        if header:
            description = header.text.strip()
            return description
            
        return None
        
    except Exception as e:
        print(f"Error getting description from {url}: {str(e)}")
        return None

In [None]:
get_sap_table_description(url)

'SAP ABAP Table MARA (General Material Data)'
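The header bundles the table name and its description into one string. If only the descriptive part is wanted (for example, to label nodes or columns), a regular expression can split it. This sketch assumes every header follows the `SAP ABAP Table <NAME> (<description>)` pattern seen here; `split_table_header` is a hypothetical helper, and anything that doesn't match returns `None`.

```python
import re

# Assumes headers look like "SAP ABAP Table <NAME> (<description>)", as on the MARA page.
# The greedy (.*) runs to the last ")", so descriptions containing parentheses stay intact.
HEADER_RE = re.compile(r"^SAP ABAP Table (\S+) \((.*)\)$")

def split_table_header(header):
    """Hypothetical helper: return (table_name, description), or None if the pattern doesn't match."""
    m = HEADER_RE.match(header.strip())
    return (m.group(1), m.group(2)) if m else None

print(split_table_header('SAP ABAP Table MARA (General Material Data)'))
# ('MARA', 'General Material Data')
```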

In [None]:
#| export
def get_sap_tables_structure(tables):
    """ Gets structure for multiple SAP tables and combines them into one DataFrame with a column indicating the source table """
    dfs = []
    for table in tables:
        url = get_sap_table_url(table)
        df = get_sap_table_structure(url)
        if df is not None:
            df['Table'] = table
            df['Table Description'] = get_sap_table_description(url)
            dfs.append(df)
    
    if not dfs:
        return None
    return pd.concat(dfs, ignore_index=True)
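This version downloads each table's page twice: once in `get_sap_table_structure` and again in `get_sap_table_description`. One option is to parse both pieces from a single HTML string, so the page only needs to be fetched once (e.g. with `httpx.get`) and `response.text` passed in. `parse_sap_table_page` is a hypothetical helper, not part of the notebook; it reuses the same selectors as the functions above, and the sample HTML is a trimmed stand-in for a real page.

```python
from bs4 import BeautifulSoup
import pandas as pd

def parse_sap_table_page(html):
    """Hypothetical helper: return (description, structure DataFrame or None) from one page's HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Same selector as get_sap_table_description
    header = soup.find("div", class_="card-header sapds-card-header")
    description = header.text.strip() if header else None
    # Same selectors as get_sap_table_structure
    table = soup.find("table", class_="table table-sm")
    if table is None:
        return description, None
    headers = [th.text.strip() for th in table.find("thead").find_all("th")]
    rows = [[td.text.strip() for td in tr.find_all("td")] for tr in table.find("tbody").find_all("tr")]
    return description, pd.DataFrame(rows, columns=headers)

# Trimmed stand-in for a real page
html = """
<div class="card-header sapds-card-header">SAP ABAP Table MAKT (Material Descriptions)</div>
<table class="table table-sm">
  <thead><tr><th> Field </th><th> Short Description</th></tr></thead>
  <tbody>
    <tr><td> SPRAS </td><td> Language Key </td></tr>
    <tr><td> MAKTX </td><td> Material description </td></tr>
  </tbody>
</table>
"""
description, parsed = parse_sap_table_page(html)
print(description)
print(parsed)
```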

In [None]:
sap_sheet = get_sap_tables_structure(tables)
sap_sheet.sample(10)

|   |   | Field | Key | Data Element | Domain | DataType | Length | DecimalPlaces | Short Description | Checktable | Table | Table Description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 389 | 80 | HRKFT |  | HRKFT | HRKFT | CHAR | 4 | 0 | Origin Group as Subdivision of Cost Element | TKKH1 | MBEW | SAP ABAP Table MBEW (Material Valuation) |
| 0 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 | MARC | SAP ABAP Table MARC (Plant Data for Material) |
| 77 | 78 | USEQU |  | USEQU | USEQU | CHAR | 1 | 0 | Quota arrangement usage | TMQ2 | MARC | SAP ABAP Table MARC (Plant Data for Material) |
| 103 | 104 | FXHOR |  | FXHOR | FXHOR | NUMC | 3 | 0 | Planning time fence |  | MARC | SAP ABAP Table MARC (Plant Data for Material) |
| 262 | 34 | LBSTF |  | LBSTF | MENG13 | QUAN | 13 | 3 | Replenishment quantity for storage location MRP |  | MARD | SAP ABAP Table MARD (Storage Location Data for... |
| 310 | 1 | MANDT |  | MANDT | MANDT | CLNT | 3 | 0 | Client | T000 | MBEW | SAP ABAP Table MBEW (Material Valuation) |
| 427 | 118 | OIPPINV |  | JV_PPINV | JV_PPINV | CHAR | 1 | 0 | Prepaid Inventory Flag for Material Valuation ... |  | MBEW | SAP ABAP Table MBEW (Material Valuation) |
| 63 | 64 | INSMK |  | INSMK_MAT | QKZ | CHAR | 1 | 0 | Post to Inspection Stock |  | MARC | SAP ABAP Table MARC (Plant Data for Material) |
| 162 | 163 | MDACH |  | MDACH | MDACH | CHAR | 2 | 0 | Action control: planned order processing | T46AC | MARC | SAP ABAP Table MARC (Plant Data for Material) |
| 372 | 63 | KALSC |  | KALSC | KALSC | CHAR | 6 | 0 | Overhead key (deactivated) |  | MBEW | SAP ABAP Table MBEW (Material Valuation) |


In [None]:
sap_sheet['Table Description'].unique()

array(['SAP ABAP Table MARC (Plant Data for Material)',
       'SAP ABAP Table MARD (Storage Location Data for Material)',
       'SAP ABAP Table MARM (Units of Measure for Material)',
       'SAP ABAP Table MBEW (Material Valuation)'], dtype=object)
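With every table's fields in one DataFrame, the full-text search listed under Future Directions can start as a plain case-insensitive substring match over the descriptions. This is a minimal sketch on a few rows copied from the outputs above; `search_fields` and `toy_sheet` are hypothetical names, and a real version would run against `sap_sheet`.

```python
import pandas as pd

# A few rows copied from the combined sheet above
toy_sheet = pd.DataFrame({
    "Table": ["MARC", "MARD", "MBEW", "MARC"],
    "Field": ["FXHOR", "LBSTF", "HRKFT", "INSMK"],
    "Short Description": [
        "Planning time fence",
        "Replenishment quantity for storage location MRP",
        "Origin Group as Subdivision of Cost Element",
        "Post to Inspection Stock",
    ],
})

def search_fields(sheet, term):
    """Hypothetical helper: rows whose Short Description contains `term` (case-insensitive, literal match)."""
    hits = sheet["Short Description"].str.contains(term, case=False, regex=False, na=False)
    return sheet.loc[hits, ["Table", "Field", "Short Description"]]

print(search_fields(toy_sheet, "stock"))
```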

In [None]:
# Made-up sample rows for MARC, used to try out the type conversion and renaming helpers below
data = {
    'MANDT': [100, 100, 100],
    'MATNR': ['MAT001', 'MAT002', 'MAT003'],
    'WERKS': ['1000', '1000', '2000'],
    'PSTAT': ['KVEB', 'KVEB', 'KVEB'],
    'LVORM': [None, None, None],
    'BWTTY': [None, None, None],
    'MMSTA': ['1', '1', '9'],
    'MMSTD': [20200101, 20200115, 20210201],
    'MAABC': ['A', 'B', None],
    'KZKRI': [None, None, None],
    'EKGRP': ['100', '100', '200'],
    'DISPO': ['001', '001', '002'],
    'BESKZ': ['E', 'E', 'X'],
    'SOBSL': [None, None, None],
    'EISBE': [10.0, 20.0, 15.0],
    'MABST': [100.0, 200.0, 150.0],
    'ALTSL': [None, None, None],
    'KZAUS': [None, None, None],
    'AUSDT': [0, 0, 0],
    'NFMAT': [None, None, None],
    'KZBED': [None, 'T', None],
    'RGEKZ': [None, None, None],
    'FEVOR': ['G01', None, 'G02'],
    'BASMG': [1.0, 1.0, 1.0],
    'STAWN': ['84141025', '84141025', '84148073'],
    'HERKL': ['DE', 'DE', 'US'],
    'HERKR': ['05', '05', '16'],
    'EXPME': ['ST', 'ST', 'ST'],
    'MTVER': ['1', '1', '1'],
    'PRCTR': ['PC100', 'PC100', 'PC200'],
    'VERKZ': [None, 'X', None],
    'STLAL': [None, None, None],
    'STLAN': [None, None, None],
    'PLNNR': [None, None, None],
    'APLAL': [None, None, None],
    'FRTME': [None, None, None],
    'LGPRO': ['1001', '1001', '2001'],
    'DISGR': ['2000', '2000', '6000'],
    'SERNP': [None, None, None],
    'PREFE': [None, None, None],
    'PRENE': [None, None, None],
    'SCHGT': [None, None, None],
    'MCRUE': ['X', 'X', 'X'],
    'LFGJA': [2024, 2024, 2025],
    'EISLO': [0.0, 0.0, 0.0],
    'TARGET_STOCK': [50.0, 100.0, 75.0],
    'SCM_SCOST': [0.0, 0.0, 0.0],
    'SCM_LSUOM': [None, None, None],
    'SCM_STRA1': [None, None, None],
}

df_marc = pd.DataFrame(data)
df_marc

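|   | MANDT | MATNR | WERKS | PSTAT | LVORM | BWTTY | MMSTA | MMSTD | MAABC | KZKRI | ... | PREFE | PRENE | SCHGT | MCRUE | LFGJA | EISLO | TARGET_STOCK | SCM_SCOST | SCM_LSUOM | SCM_STRA1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | MAT001 | 1000 | KVEB |  |  | 1 | 20200101 | A |  | ... |  |  |  | X | 2024 | 0.0 | 50.0 | 0.0 |  |  |
| 1 | 100 | MAT002 | 1000 | KVEB |  |  | 1 | 20200115 | B |  | ... |  |  |  | X | 2024 | 0.0 | 100.0 | 0.0 |  |  |
| 2 | 100 | MAT003 | 2000 | KVEB |  |  | 9 | 20210201 |  |  | ... |  |  |  | X | 2025 | 0.0 | 75.0 | 0.0 |  |  |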


In [None]:
#| export
sap_to_pandas_dtype = {
    "CLNT": "str",               # Client (typically a 3-character fixed string)
    "CHAR": "str",               # Character-based fields
    "NUMC": "str",               # Numeric character field (stored as a string)
    "DATS": "datetime64[ns]",    # Date field (YYYYMMDD format)
    "CUKY": "str",               # Currency Key
    "CURR": "float64",           # Currency amount (with decimal places)
    "TIMS": "str",               # Time field (HHMMSS format, can be converted to time)
    "QUAN": "float64",           # Quantity field (can include decimal places)
    "UNIT": "str",               # Unit of measure
    "DEC": "float64",            # Decimal field (can also be int if no decimal places)
    "LANG": "str",               # Language key
    "INT2": "int16",             # Small integer (5-digit max)
    "INT4": "int32",             # Standard integer (10-digit max)
    "ACCP": "str",               # Accounting period (stored as a string)
    "RAW": "bytes",              # Raw binary data (e.g., UUIDs)
    "FLTP": "float64"            # Floating-point number
}

In [None]:
#| export
def convert_sap_types(df: pd.DataFrame, sap_sheet: pd.DataFrame) -> pd.DataFrame:
    """
    Converts the columns of the 'df' DataFrame to the correct data types based on 'sap_sheet'.
    
    :param df: Pandas DataFrame containing SAP table data (e.g., "df").
    :param sap_sheet: Pandas DataFrame containing SAP metadata with column data types.
    :return: Converted copy of the DataFrame; the input DataFrame is left unchanged.
    """
    df = df.copy()  # work on a copy so the caller's DataFrame isn't modified in place
    sap_sheet = sap_sheet.rename(columns=lambda x: x.strip())
    column_type_mapping = sap_sheet.set_index("Field")["DataType"].to_dict()

    for column, sap_type in column_type_mapping.items():
        if column in df.columns:
            pandas_dtype = sap_to_pandas_dtype.get(sap_type, "str")
            try:
                if pandas_dtype == "datetime64[ns]":
                    # DATS is YYYYMMDD: parse it as text with an explicit format, because bare integers
                    # would be read as nanoseconds since 1970. Columns that are already datetimes are kept.
                    if not pd.api.types.is_datetime64_any_dtype(df[column]):
                        df[column] = pd.to_datetime(df[column].astype("string"), format="%Y%m%d", errors="coerce")
                elif pandas_dtype == "float64": df[column] = pd.to_numeric(df[column], errors="coerce")
                elif pandas_dtype == "int16": df[column] = pd.to_numeric(df[column], errors="coerce").astype("Int16")
                elif pandas_dtype == "int32": df[column] = pd.to_numeric(df[column], errors="coerce").astype("Int32")
                elif pandas_dtype == "bytes": df[column] = df[column].astype("string").apply(lambda x: x.encode() if pd.notna(x) else None)
                # The nullable "string" dtype keeps missing values as <NA> instead of turning them into the text "None"
                elif pandas_dtype == "str": df[column] = df[column].astype("string")
                else: df[column] = df[column].astype(pandas_dtype)
            except Exception as e:
                print(f"Error converting column {column} to {pandas_dtype}: {e}")
    return df

In [None]:
df_marc.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 49 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   MANDT         3 non-null      int64  
 1   MATNR         3 non-null      object 
 2   WERKS         3 non-null      object 
 3   PSTAT         3 non-null      object 
 4   LVORM         0 non-null      object 
 5   BWTTY         0 non-null      object 
 6   MMSTA         3 non-null      object 
 7   MMSTD         3 non-null      int64  
 8   MAABC         2 non-null      object 
 9   KZKRI         0 non-null      object 
 10  EKGRP         3 non-null      object 
 11  DISPO         3 non-null      object 
 12  BESKZ         3 non-null      object 
 13  SOBSL         0 non-null      object 
 14  EISBE         3 non-null      float64
 15  MABST         3 non-null      float64
 16  ALTSL         0 non-null      object 
 17  KZAUS         0 non-null      object 
 18  AUSDT         3 non-null      int6

In [None]:
df_marc_converted = convert_sap_types(df_marc, sap_sheet)
df_marc_converted.head()

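|   | MANDT | MATNR | WERKS | PSTAT | LVORM | BWTTY | MMSTA | MMSTD | MAABC | KZKRI | ... | PREFE | PRENE | SCHGT | MCRUE | LFGJA | EISLO | TARGET_STOCK | SCM_SCOST | SCM_LSUOM | SCM_STRA1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 100 | MAT001 | 1000 | KVEB |  |  | 1 | 1970-01-01 00:00:00.020200101 | A |  | ... |  |  |  | X | 2024 | 0.0 | 50.0 | 0.0 |  |  |
| 1 | 100 | MAT002 | 1000 | KVEB |  |  | 1 | 1970-01-01 00:00:00.020200115 | B |  | ... |  |  |  | X | 2024 | 0.0 | 100.0 | 0.0 |  |  |
| 2 | 100 | MAT003 | 2000 | KVEB |  |  | 9 | 1970-01-01 00:00:00.020210201 |  |  | ... |  |  |  | X | 2025 | 0.0 | 75.0 | 0.0 |  |  |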
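The MMSTD values in the output above (`1970-01-01 00:00:00.020200101`) come from a pandas pitfall: `pd.to_datetime` reads plain integers as nanoseconds since the Unix epoch, not as `YYYYMMDD`. Casting to string and passing an explicit `format` parses SAP DATS values as calendar dates. `errors="coerce"` then turns unparseable values, such as SAP's initial date `00000000`, into `NaT`. A small sketch using the MMSTD values from `df_marc`:

```python
import pandas as pd

mmstd = pd.Series([20200101, 20200115, 20210201])  # MMSTD values from df_marc

# Integers are interpreted as nanoseconds since 1970-01-01
as_nanoseconds = pd.to_datetime(mmstd, errors="coerce")

# Parsing the YYYYMMDD text instead yields calendar dates
as_dates = pd.to_datetime(mmstd.astype(str), format="%Y%m%d", errors="coerce")

# SAP's initial date value "00000000" is not a valid date and becomes NaT
with_blank = pd.to_datetime(pd.Series(["20200101", "00000000"]), format="%Y%m%d", errors="coerce")

print(as_nanoseconds.iloc[0])  # 1970-01-01 00:00:00.020200101
print(as_dates.tolist())
print(with_blank.tolist())
```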


In [None]:
df_marc_converted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 49 columns):
 #   Column        Non-Null Count  Dtype         
---  ------        --------------  -----         
 0   MANDT         3 non-null      object        
 1   MATNR         3 non-null      object        
 2   WERKS         3 non-null      object        
 3   PSTAT         3 non-null      object        
 4   LVORM         3 non-null      object        
 5   BWTTY         3 non-null      object        
 6   MMSTA         3 non-null      object        
 7   MMSTD         3 non-null      datetime64[ns]
 8   MAABC         3 non-null      object        
 9   KZKRI         3 non-null      object        
 10  EKGRP         3 non-null      object        
 11  DISPO         3 non-null      object        
 12  BESKZ         3 non-null      object        
 13  SOBSL         3 non-null      object        
 14  EISBE         3 non-null      float64       
 15  MABST         3 non-null      float64       
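In the `.info()` output above, LVORM, BWTTY, KZKRI and SOBSL go from 0 to 3 non-null values after conversion. Under pandas 2.x, `astype(str)` turns missing values into the literal text `'None'` (or `'nan'`), so they no longer count as missing. The nullable `"string"` dtype keeps them as `<NA>`. A minimal comparison on an all-missing column like LVORM:

```python
import pandas as pd

lvorm = pd.Series([None, None, None])  # an all-missing CHAR column, like LVORM in df_marc

as_str = lvorm.astype(str)          # pandas 2.x: object dtype holding the text "None"
as_string = lvorm.astype("string")  # nullable string dtype: missing values stay <NA>

print(as_str.tolist())
print(as_string.isna().sum())  # 3: every value still counts as missing
```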

In [None]:
#| export
def rename_sap_columns(df: pd.DataFrame, sap_sheet: pd.DataFrame) -> pd.DataFrame:
    """
    Renames the columns in the 'df' DataFrame using the 'Short Description' from 'sap_sheet'.
    If a column is not found in 'sap_sheet', it remains unchanged.

    :param df: Pandas DataFrame containing SAP table data (e.g., "df").
    :param sap_sheet: Pandas DataFrame containing SAP metadata with column names and short descriptions.
    :return: DataFrame with renamed columns.
    """
    sap_sheet = sap_sheet.rename(columns=lambda x: x.strip())
    column_name_mapping = sap_sheet.set_index("Field")["Short Description"].to_dict()
    return df.rename(columns=lambda col: column_name_mapping.get(col, col))

In [None]:
df_marc_converted = rename_sap_columns(df_marc_converted, sap_sheet)
df_marc_converted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 49 columns):
 #   Column                                                       Non-Null Count  Dtype         
---  ------                                                       --------------  -----         
 0   Client                                                       3 non-null      object        
 1   Material Number                                              3 non-null      object        
 2   Plant                                                        3 non-null      object        
 3   Maintenance status                                           3 non-null      object        
 4   Deletion flag for all material data of a valuation type      3 non-null      object        
 5   Valuation Category                                           3 non-null      object        
 6   Plant-Specific Material Status                               3 non-null      object        
 7   Date from which the p
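One caveat: `sap_sheet` combines MARC, MARD, MARM and MBEW, and fields such as LVORM appear in more than one of them with different descriptions. `set_index("Field")[...].to_dict()` keeps the last occurrence, so MARC's LVORM above was labelled "Deletion flag for all material data of a valuation type", which is MBEW's wording (MBEW is last in the list). Filtering the sheet to the source table avoids this. The toy metadata below copies the MBEW text from the output above and uses a placeholder for the MARC text.

```python
import pandas as pd

# Toy metadata: the MBEW text is copied from the output above; the MARC text is a placeholder
toy_sheet = pd.DataFrame({
    "Table": ["MARC", "MBEW"],
    "Field": ["LVORM", "LVORM"],
    "Short Description": [
        "<placeholder: MARC description of LVORM>",
        "Deletion flag for all material data of a valuation type",
    ],
})

# With duplicate Field values, to_dict() keeps the last row (here MBEW's text)
combined = toy_sheet.set_index("Field")["Short Description"].to_dict()

# Restricting the sheet to the source table first gives MARC's own description
marc_only = toy_sheet[toy_sheet["Table"] == "MARC"]
per_table = marc_only.set_index("Field")["Short Description"].to_dict()

toy_df = pd.DataFrame({"LVORM": [None]})
print(toy_df.rename(columns=combined).columns.tolist())   # MBEW's description
print(toy_df.rename(columns=per_table).columns.tolist())  # the MARC placeholder
```

In the notebook, the same filter can be passed straight to the existing helpers, e.g. `rename_sap_columns(df_marc, sap_sheet[sap_sheet['Table'] == 'MARC'])`, and likewise for `convert_sap_types`.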

In [None]:
df_marc_converted = clean_col_names(df_marc_converted)
df_marc_converted.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 49 columns):
 #   Column                                                       Non-Null Count  Dtype         
---  ------                                                       --------------  -----         
 0   client                                                       3 non-null      object        
 1   material_number                                              3 non-null      object        
 2   plant                                                        3 non-null      object        
 3   maintenance_status                                           3 non-null      object        
 4   deletion_flag_for_all_material_data_of_a_valuation_type      3 non-null      object        
 5   valuation_category                                           3 non-null      object        
 6   plant_specific_material_status                               3 non-null      object        
 7   date_from_which_the_p
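`clean_col_names` comes from `mis_analytics.core` (imported at the top), and its implementation isn't shown here. Judging only from the output, it lower-cases names and replaces spaces and hyphens with underscores. `to_snake` below is a hypothetical stand-in under that assumption, not the library's code. It also checks for names that collide after cleaning, since two fields could end up with the same description and duplicate column names make `df[...]` return a DataFrame instead of a Series. The last two input names are toy values chosen to show a collision.

```python
import re
from collections import Counter

def to_snake(name):
    """Hypothetical stand-in: lower-case and replace runs of non-alphanumerics with '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")

names = ["Client", "Plant-Specific Material Status", "Material Number", "Material number"]
cleaned = [to_snake(n) for n in names]
print(cleaned)
# ['client', 'plant_specific_material_status', 'material_number', 'material_number']

# Names that appear more than once after cleaning
dupes = [n for n, c in Counter(cleaned).items() if c > 1]
print(dupes)  # ['material_number']
```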