# ITEMS Table Sales Calculation Exploration

This notebook explores the ITEMS table to understand how sales are calculated, particularly:
- How CREDITUS and DEBITUS are used
- How VAT amounts (CREDITVATAMOUNT, DEBITVATAMOUNT) are calculated
- How FTYPE differentiates sales vs returns
- How the USD calculation formula works: `SUM(CREDITUS - DEBITUS) + SUM(CREDITVATAMOUNT - DEBITVATAMOUNT)`


In [35]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')

# Try to import interbase for direct InterBase connection
try:
    import interbase
    INTERBASE_AVAILABLE = True
    print("‚úÖ InterBase Python driver available for direct connection")
except ImportError:
    INTERBASE_AVAILABLE = False
    print("‚ùå InterBase Python driver not available. Please install: pip install interbase")
    raise ImportError("InterBase library not available")

# Database connection parameters (matching Flask app config)
DATA_SOURCE = "100.200.2.1"
DATABASE_PATH = r"D:\dolly2008\fer2015.dol"
USERNAME = "ALIOSS"
PASSWORD = "Ali@123"  # Updated to match database config
CLIENT_LIBRARY = r"C:\Users\User\Downloads\Compressed\ibclient64-14.1_x86-64\ibclient64-14.1.dll"

def connect_and_load_table(table_name, limit=None):
    """Load a table from the database using direct InterBase connection (same as Flask app)"""
    try:
        print(f"üîÑ Connecting to database for table {table_name}...")
        
        if not INTERBASE_AVAILABLE:
            raise Exception("InterBase Python library not available")
        
        print(f"üîó Using direct InterBase connection for {table_name}...")
        
        # Build direct connection for InterBase (same as Flask app)
        # Format: host:database_path
        dsn = f"{DATA_SOURCE}:{DATABASE_PATH}"
        print(f"üì° DSN: {dsn}")
        
        # Connect with explicit client library (same as Flask app)
        conn = interbase.connect(
            dsn=dsn,
            user=USERNAME,
            password=PASSWORD,
            ib_library_name=CLIENT_LIBRARY,
            charset='NONE'  # Use NONE charset (same as Flask app)
        )
        
        print(f"‚úÖ Direct InterBase connection successful for {table_name}")
        
        # Execute query and fetch data (same as Flask app - no LIMIT in query)
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table_name}")
        
        # Get column names (same as Flask app)
        columns = [desc[0] for desc in cursor.description]
        
        # Fetch all rows (same as Flask app)
        rows = cursor.fetchall()
        
        # Convert to DataFrame (same as Flask app)
        df = pd.DataFrame(rows, columns=columns)
        
        conn.close()
        
        # Apply limit after loading (for exploration purposes)
        if limit and len(df) > limit:
            print(f"üìä Loaded {len(df):,} rows, taking first {limit:,} for exploration")
            df = df.head(limit)
        
        print(f"‚úÖ {table_name}: {df.shape[0]:,} rows √ó {df.shape[1]} columns (direct connection)")
        return df
        
    except Exception as e:
        print(f"‚ùå {table_name}: Failed to load - {e}")
        print(f"   Error type: {type(e).__name__}")
        print(f"   DSN attempted: {dsn}")
        print(f"   Client Library: {CLIENT_LIBRARY}")
        return None

print("‚úÖ Connection setup complete - Using direct InterBase connection")


‚úÖ InterBase Python driver available for direct connection
‚úÖ Connection setup complete - Using direct InterBase connection


## 1. Load ITEMS Table (Sample Rows)

Let's load a sample of rows from the ITEMS table to examine the structure and data.


In [36]:
# Load sample rows from ITEMS table
items_df = connect_and_load_table('ITEMS', limit=1000)

if items_df is not None:
    print(f"\nüìä ITEMS Table Structure:")
    print(f"   Total columns: {len(items_df.columns)}")
    print(f"\nüìã Column Names:")
    for i, col in enumerate(items_df.columns, 1):
        print(f"   {i:2d}. {col}")
    
    print(f"\nüìä First few rows:")
    display(items_df.head(10))


üîÑ Connecting to database for table ITEMS...
üîó Using direct InterBase connection for ITEMS...
üì° DSN: 100.200.2.1:D:\dolly2008\fer2015.dol
‚úÖ Direct InterBase connection successful for ITEMS
üìä Loaded 3,282,593 rows, taking first 1,000 for exploration
‚úÖ ITEMS: 1,000 rows √ó 54 columns (direct connection)

üìä ITEMS Table Structure:
   Total columns: 54

üìã Column Names:
    1. ID
    2. MID
    3. ITEM
    4. SITE
    5. STTYPE
    6. FRAC
    7. QTY
    8. PACK
    9. PRICE
   10. DISCOUNT
   11. VAT
   12. COSTUS
   13. COSTLC
   14. CATREGORYID
   15. VATAMOUNT
   16. DEBITUS
   17. CREDITUS
   18. BARCODE
   19. BONENO
   20. DEBITQTY
   21. CREDITQTY
   22. YESNO
   23. TOTAL
   24. FDATE
   25. ALLQTY
   26. JOB
   27. SID
   28. SALESMAN
   29. CONTACT
   30. TSITE
   31. DEBITLC
   32. CLC
   33. CREDITLC
   34. STQTY
   35. AUTOCURRFAC
   36. FTYPE
   37. NOVTOTAL
   38. CURRVAL
   39. CURRVALLC
   40. DEPENSE
   41. CARTOON
   42. CARTOONDC
   43. FIDATE
   44.

Unnamed: 0,ID,MID,ITEM,SITE,STTYPE,FRAC,QTY,PACK,PRICE,DISCOUNT,...,FROMBAL,TOBAL,ITCOLOR,EXTRANOTE,MYORDER,PRICEKILO,MYCATEGORYID,POID,DEBITVATAMOUNT,CREDITVATAMOUNT
0,8695710,SI14108717GEM,T400,GEM,-,0.0,1.0,1.0,14.66,0.0,...,-1.0,0.0,,,,,100006.0,,0.0,2.3456
1,9083937,FO15940961AZB,F412,DEP,-,0.0,34000.0,1.0,13.5,0.0,...,0.0,0.0,16777215.0,,,,5059.0,,0.0,0.0
2,10040519,SI14107738BUM,S365,BUM,-,0.0,64.0,1.0,8.12,0.0,...,-64.0,0.0,,,,,5076.0,,0.0,83.1488
3,10040520,SI14107738BUM,P252_A20-LINT,BUM,-,0.0,5.0,1.0,11.68,0.0,...,-5.0,0.0,,,,,5071.0,,0.0,9.344
4,10040521,SI14107738BUM,P202_A20-MINT,BUM,-,0.0,2.0,1.0,11.68,0.0,...,-2.0,0.0,,,,,5071.0,,0.0,3.7376
5,10040522,SI14107738BUM,P009_E5-AR,BUM,-,0.0,3.0,1.0,12.5,0.0,...,-3.0,0.0,,,,,5071.0,,0.0,6.0
6,10040523,SI14107738BUM,P008_E1-AR,BUM,-,0.0,5.0,1.0,2.84,0.0,...,-5.0,0.0,,,,,5071.0,,0.0,2.272
7,10040524,SI14107738BUM,J120,BUM,-,0.0,1.0,1.0,3.1,0.0,...,-1.0,0.0,,,,,100025.0,,0.0,0.496
8,10040525,SI14107738BUM,G00034,BUM,-,0.0,3.0,1.0,11.87,0.0,...,-3.0,0.0,,,,,5079.0,,0.0,5.6976
9,10040526,SI14107738BUM,G00033,BUM,-,0.0,20.0,1.0,15.09,0.0,...,-20.0,0.0,,,,,5079.0,,0.0,48.288


## 2. Examine Key Columns for Sales Calculation

Let's look at the columns that are used in the sales calculation formula.


In [None]:
if items_df is not None:
    # Key columns for sales calculation (using uppercase column names)
    key_columns = [
        'ITEM', 'SID', 'MID', 'FDATE', 'FTYPE', 'QTY', 'QTY1',
        'CREDITUS', 'DEBITUS', 'CREDITVATAMOUNT', 'DEBITVATAMOUNT',
        'SITE', 'PRICE', 'TOTAL'
    ]
    
    # Check which columns exist
    existing_columns = [col for col in key_columns if col in items_df.columns]
    missing_columns = [col for col in key_columns if col not in items_df.columns]
    
    print("‚úÖ Existing key columns:")
    for col in existing_columns:
        print(f"   - {col}")
    
    if missing_columns:
        print("\n‚ö†Ô∏è Missing columns:")
        for col in missing_columns:
            print(f"   - {col}")
    
    # Show sample data with key columns
    if existing_columns:
        print("\nüìä Sample rows with key columns:")
        display(items_df[existing_columns].head(20))


‚úÖ Existing key columns:
   - ITEM
   - SID
   - MID
   - FDATE
   - FTYPE
   - QTY
   - CREDITUS
   - DEBITUS
   - SITE
   - PRICE
   - TOTAL

‚ö†Ô∏è Missing columns:
   - QTY1
   - creditvatamount
   - debitvatamount

üìä Sample rows with key columns:


Unnamed: 0,ITEM,SID,MID,FDATE,FTYPE,QTY,CREDITUS,DEBITUS,SITE,PRICE,TOTAL
0,T400,53020022,SI14108717GEM,2024-03-20,1,1.0,14.66,0.0,GEM,14.66,14.66
1,F412,53020008,FO15940961AZB,2024-05-29,12,34000.0,0.0,0.0,DEP,13.5,459000.0
2,S365,53020014,SI14107738BUM,2024-11-08,1,64.0,519.68,0.0,BUM,8.12,519.68
3,P252_A20-LINT,53020014,SI14107738BUM,2024-11-08,1,5.0,58.4,0.0,BUM,11.68,58.4
4,P202_A20-MINT,53020014,SI14107738BUM,2024-11-08,1,2.0,23.36,0.0,BUM,11.68,23.36
5,P009_E5-AR,53020014,SI14107738BUM,2024-11-08,1,3.0,37.5,0.0,BUM,12.5,37.5
6,P008_E1-AR,53020014,SI14107738BUM,2024-11-08,1,5.0,14.2,0.0,BUM,2.84,14.2
7,J120,53020014,SI14107738BUM,2024-11-08,1,1.0,3.1,0.0,BUM,3.1,3.1
8,G00034,53020014,SI14107738BUM,2024-11-08,1,3.0,35.61,0.0,BUM,11.87,35.61
9,G00033,53020014,SI14107738BUM,2024-11-08,1,20.0,301.8,0.0,BUM,15.09,301.8


## 3. Understand FTYPE Values

FTYPE indicates the transaction type:
- FTYPE = 1: Sales transactions
- FTYPE = 2: Returns transactions


In [None]:
if items_df is not None and 'FTYPE' in items_df.columns:
    print("üìä FTYPE Value Distribution:")
    ftype_counts = items_df['FTYPE'].value_counts().sort_index()
    print(ftype_counts)
    
    print("\nüìä Sample Sales (FTYPE=1):")
    sales_sample = items_df[items_df['FTYPE'] == 1].head(10)
    if 'CREDITUS' in sales_sample.columns and 'DEBITUS' in sales_sample.columns:
        display(sales_sample[['ITEM', 'SID', 'FDATE', 'FTYPE', 'QTY', 'CREDITUS', 'DEBITUS', 
                              'CREDITVATAMOUNT', 'DEBITVATAMOUNT']].head(10))
    
    print("\nüìä Sample Returns (FTYPE=2):")
    returns_sample = items_df[items_df['FTYPE'] == 2].head(10)
    if 'CREDITUS' in returns_sample.columns and 'DEBITUS' in returns_sample.columns:
        display(returns_sample[['ITEM', 'SID', 'FDATE', 'FTYPE', 'QTY', 'CREDITUS', 'DEBITUS',
                                'CREDITVATAMOUNT', 'DEBITVATAMOUNT']].head(10))


üìä FTYPE Value Distribution:
FTYPE
1     203
2       1
3      27
12    222
14     36
15     38
22    473
Name: count, dtype: int64

üìä Sample Sales (FTYPE=1):


KeyError: "['creditvatamount', 'debitvatamount'] not in index"

## 4. Calculate USD Amounts - Step by Step

Let's demonstrate the USD calculation formula:
`QUANTITY_USD = SUM(CREDITUS - DEBITUS) + SUM(CREDITVATAMOUNT - DEBITVATAMOUNT)`


In [None]:
if items_df is not None:
    # Filter for office clients (SID starting with 4112) - like Report 7
    if 'SID' in items_df.columns:
        office_clients = items_df[items_df['SID'].astype(str).str.startswith('4112')].copy()
        print(f"üìä Office clients (SID starting with 4112): {len(office_clients)} rows")
    else:
        office_clients = items_df.copy()
    
    # Fill NaN values
    if 'CREDITUS' in office_clients.columns:
        office_clients['CREDITUS'] = office_clients['CREDITUS'].fillna(0)
    if 'DEBITUS' in office_clients.columns:
        office_clients['DEBITUS'] = office_clients['DEBITUS'].fillna(0)
    if 'CREDITVATAMOUNT' in office_clients.columns:
        office_clients['CREDITVATAMOUNT'] = office_clients['CREDITVATAMOUNT'].fillna(0)
    if 'DEBITVATAMOUNT' in office_clients.columns:
        office_clients['DEBITVATAMOUNT'] = office_clients['DEBITVATAMOUNT'].fillna(0)
    
    # Step 1: Calculate BASE_AMOUNT = CREDITUS - DEBITUS
    if 'CREDITUS' in office_clients.columns and 'DEBITUS' in office_clients.columns:
        office_clients['BASE_AMOUNT'] = office_clients['CREDITUS'] - office_clients['DEBITUS']
        print("\n‚úÖ Step 1: Calculated BASE_AMOUNT = CREDITUS - DEBITUS")
        
        # Show sample calculations
        print("\nüìä Sample BASE_AMOUNT calculations:")
        sample_cols = ['ITEM', 'SID', 'FTYPE', 'CREDITUS', 'DEBITUS', 'BASE_AMOUNT']
        display(office_clients[sample_cols].head(15))
    
    # Step 2: Calculate VAT_AMOUNT = CREDITVATAMOUNT - DEBITVATAMOUNT
    if 'CREDITVATAMOUNT' in office_clients.columns and 'DEBITVATAMOUNT' in office_clients.columns:
        office_clients['VAT_AMOUNT'] = office_clients['CREDITVATAMOUNT'] - office_clients['DEBITVATAMOUNT']
        print("\n‚úÖ Step 2: Calculated VAT_AMOUNT = CREDITVATAMOUNT - DEBITVATAMOUNT")
        
        # Show sample calculations
        print("\nüìä Sample VAT_AMOUNT calculations:")
        vat_cols = ['ITEM', 'SID', 'FTYPE', 'CREDITVATAMOUNT', 'DEBITVATAMOUNT', 'VAT_AMOUNT']
        display(office_clients[vat_cols].head(15))
    
    # Step 3: Calculate QUANTITY_USD = BASE_AMOUNT + VAT_AMOUNT
    if 'BASE_AMOUNT' in office_clients.columns and 'VAT_AMOUNT' in office_clients.columns:
        office_clients['QUANTITY_USD'] = office_clients['BASE_AMOUNT'] + office_clients['VAT_AMOUNT']
        print("\n‚úÖ Step 3: Calculated QUANTITY_USD = BASE_AMOUNT + VAT_AMOUNT")
        
        # Show complete calculation
        print("\nüìä Complete USD calculation sample:")
        usd_cols = ['ITEM', 'SID', 'FTYPE', 'CREDITUS', 'DEBITUS', 'BASE_AMOUNT',
                    'CREDITVATAMOUNT', 'DEBITVATAMOUNT', 'VAT_AMOUNT', 'QUANTITY_USD']
        display(office_clients[usd_cols].head(15))


## 5. Group by Client (SID) - Like Report 7

Let's calculate USD amounts per client, matching the Report 7 logic.


In [None]:
if items_df is not None and 'SID' in office_clients.columns:
    # Group by SID and calculate totals
    if 'BASE_AMOUNT' in office_clients.columns:
        base_amounts = office_clients.groupby('SID')['BASE_AMOUNT'].sum().reset_index()
        base_amounts.columns = ['SID', 'TOTAL_BASE_AMOUNT']
    
    if 'VAT_AMOUNT' in office_clients.columns:
        vat_amounts = office_clients.groupby('SID')['VAT_AMOUNT'].sum().reset_index()
        vat_amounts.columns = ['SID', 'TOTAL_VAT_AMOUNT']
    
    # Merge base and VAT amounts
    if 'TOTAL_BASE_AMOUNT' in base_amounts.columns and 'TOTAL_VAT_AMOUNT' in vat_amounts.columns:
        client_usd = pd.merge(base_amounts, vat_amounts, on='SID', how='outer').fillna(0)
        client_usd['TOTAL_QUANTITY_USD'] = client_usd['TOTAL_BASE_AMOUNT'] + client_usd['TOTAL_VAT_AMOUNT']
        
        print("üìä USD Amounts per Client (Top 20):")
        client_usd_sorted = client_usd.sort_values('TOTAL_QUANTITY_USD', ascending=False)
        display(client_usd_sorted.head(20))
        
        print(f"\nüí∞ Total USD across all clients: ${client_usd['TOTAL_QUANTITY_USD'].sum():,.2f}")
        print(f"üìä Number of clients: {len(client_usd)}")
        print(f"üìä Average USD per client: ${client_usd['TOTAL_QUANTITY_USD'].mean():,.2f}")


## 6. Compare Sales vs Returns

Let's see how sales (FTYPE=1) and returns (FTYPE=2) differ in their USD calculations.


In [None]:
if items_df is not None and 'FTYPE' in office_clients.columns:
    if 'QUANTITY_USD' in office_clients.columns:
        # Group by FTYPE and SID
        ftype_summary = office_clients.groupby(['FTYPE', 'SID']).agg({
            'QUANTITY_USD': 'sum',
            'BASE_AMOUNT': 'sum',
            'VAT_AMOUNT': 'sum',
            'QTY': 'sum' if 'QTY' in office_clients.columns else 'count'
        }).reset_index()
        
        print("üìä Sales (FTYPE=1) vs Returns (FTYPE=2) Summary:")
        
        sales_summary = ftype_summary[ftype_summary['FTYPE'] == 1].groupby('FTYPE').agg({
            'QUANTITY_USD': ['sum', 'mean', 'count'],
            'QTY': 'sum'
        })
        
        returns_summary = ftype_summary[ftype_summary['FTYPE'] == 2].groupby('FTYPE').agg({
            'QUANTITY_USD': ['sum', 'mean', 'count'],
            'QTY': 'sum'
        })
        
        print("\nüí∞ SALES (FTYPE=1):")
        display(sales_summary)
        
        print("\nüí∞ RETURNS (FTYPE=2):")
        display(returns_summary)
        
        # Show sample sales transactions
        print("\nüìä Sample Sales Transactions (FTYPE=1):")
        sales_sample = office_clients[office_clients['FTYPE'] == 1].head(10)
        if 'QUANTITY_USD' in sales_sample.columns:
            display(sales_sample[['ITEM', 'SID', 'FTYPE', 'QTY', 'CREDITUS', 'DEBITUS', 
                                 'CREDITVATAMOUNT', 'DEBITVATAMOUNT', 'BASE_AMOUNT', 'VAT_AMOUNT', 'QUANTITY_USD']])
        
        # Show sample returns transactions
        print("\nüìä Sample Returns Transactions (FTYPE=2):")
        returns_sample = office_clients[office_clients['FTYPE'] == 2].head(10)
        if 'QUANTITY_USD' in returns_sample.columns:
            display(returns_sample[['ITEM', 'SID', 'FTYPE', 'QTY', 'CREDITUS', 'DEBITUS',
                                   'CREDITVATAMOUNT', 'DEBITVATAMOUNT', 'BASE_AMOUNT', 'VAT_AMOUNT', 'QUANTITY_USD']])


## 7. Statistical Summary

Let's get some statistics about the USD amounts and key fields.


In [None]:
if items_df is not None:
    print("üìä Statistical Summary of Key Fields:")
    
    summary_cols = []
    if 'CREDITUS' in office_clients.columns:
        summary_cols.append('CREDITUS')
    if 'DEBITUS' in office_clients.columns:
        summary_cols.append('DEBITUS')
    if 'CREDITVATAMOUNT' in office_clients.columns:
        summary_cols.append('CREDITVATAMOUNT')
    if 'DEBITVATAMOUNT' in office_clients.columns:
        summary_cols.append('DEBITVATAMOUNT')
    if 'BASE_AMOUNT' in office_clients.columns:
        summary_cols.append('BASE_AMOUNT')
    if 'VAT_AMOUNT' in office_clients.columns:
        summary_cols.append('VAT_AMOUNT')
    if 'QUANTITY_USD' in office_clients.columns:
        summary_cols.append('QUANTITY_USD')
    
    if summary_cols:
        display(office_clients[summary_cols].describe())
    
    # Check for negative values
    if 'QUANTITY_USD' in office_clients.columns:
        negative_count = (office_clients['QUANTITY_USD'] < 0).sum()
        positive_count = (office_clients['QUANTITY_USD'] > 0).sum()
        zero_count = (office_clients['QUANTITY_USD'] == 0).sum()
        
        print(f"\nüìä QUANTITY_USD Value Distribution:")
        print(f"   Positive values: {positive_count:,} ({positive_count/len(office_clients)*100:.1f}%)")
        print(f"   Negative values: {negative_count:,} ({negative_count/len(office_clients)*100:.1f}%)")
        print(f"   Zero values: {zero_count:,} ({zero_count/len(office_clients)*100:.1f}%)")


## 8. Formula Verification

Let's verify that the formula matches what's used in Report 7:
`QUANTITY_USD = SUM(CREDITUS - DEBITUS) + SUM(CREDITVATAMOUNT - DEBITVATAMOUNT)`


In [None]:
if items_df is not None:
    print("üîç Formula Verification:")
    print("\nFormula: QUANTITY_USD = SUM(CREDITUS - DEBITUS) + SUM(CREDITVATAMOUNT - DEBITVATAMOUNT)")
    
    # Calculate using the formula
    if all(col in office_clients.columns for col in ['CREDITUS', 'DEBITUS', 'CREDITVATAMOUNT', 'DEBITVATAMOUNT']):
        formula_base = (office_clients['CREDITUS'] - office_clients['DEBITUS']).sum()
        formula_vat = (office_clients['CREDITVATAMOUNT'] - office_clients['DEBITVATAMOUNT']).sum()
        formula_total = formula_base + formula_vat
        
        # Calculate using our step-by-step method
        if 'QUANTITY_USD' in office_clients.columns:
            calculated_total = office_clients['QUANTITY_USD'].sum()
            
            print(f"\nüìä Results:")
            print(f"   SUM(CREDITUS - DEBITUS): ${formula_base:,.2f}")
            print(f"   SUM(CREDITVATAMOUNT - DEBITVATAMOUNT): ${formula_vat:,.2f}")
            print(f"   Formula Total: ${formula_total:,.2f}")
            print(f"   Calculated Total (QUANTITY_USD.sum()): ${calculated_total:,.2f}")
            
            if abs(formula_total - calculated_total) < 0.01:
                print("\n‚úÖ Formula verification PASSED - Results match!")
            else:
                print(f"\n‚ö†Ô∏è Formula verification - Small difference: ${abs(formula_total - calculated_total):,.2f}")
                print("   (This might be due to rounding or data type differences)")
