# ITEMS Table Exploration

This notebook explores the ITEMS table structure and data, loading it exactly as the webapp does.

**Purpose:**
- Discover all columns in the ITEMS table
- Understand the CONTACT column and its usage
- Analyze data types and sample values
- Explore relationships with other tables

**Loading Method:**
- Uses direct InterBase connection (same as webapp)
- Loads ITEMS table as `sales_details` dataframe
- Matches the exact loading process from `webapp/services/database_service.py`


In [None]:
# Import Required Libraries
import pandas as pd
import warnings
from datetime import datetime

# Suppress warnings to match webapp behavior
warnings.filterwarnings('ignore')

# Database configuration (same as webapp/config/database.py)
DATABASE_CONFIG = {
    'DATA_SOURCE': "100.200.2.1",
    'DATABASE_PATH': r"D:\dolly2008\fer2015.dol",
    'USERNAME': "ALIOSS",
    'PASSWORD': "Ali@123",
    'CLIENT_LIBRARY': r"C:\Users\User\Downloads\Compressed\ibclient64-14.1_x86-64\ibclient64-14.1.dll"
}

# Try to import interbase for direct InterBase connection (same as webapp)
try:
    import interbase
    INTERBASE_AVAILABLE = True
    print("✅ InterBase Python driver available for direct connection")
except ImportError:
    INTERBASE_AVAILABLE = False
    print("⚠️ InterBase Python driver not available")
    print("   Please install: pip install interbase")

print("✅ Libraries imported and configuration loaded")


✅ InterBase Python driver available for direct connection
✅ Libraries imported and configuration loaded
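
The configuration cell keeps the connection settings, including the password, inline in the notebook. As a minimal sketch only, the same `DATABASE_CONFIG` shape could be built from environment variables. The `IB_*` variable names and the `load_database_config` helper are hypothetical; nothing in the webapp is known to use them.

```python
import os


def load_database_config(env=None):
    """Build a DATABASE_CONFIG-style dict from environment variables.

    The IB_* names are hypothetical placeholders, not webapp settings.
    """
    env = os.environ if env is None else env
    config = {
        'DATA_SOURCE': env['IB_HOST'],
        'DATABASE_PATH': env['IB_DB_PATH'],
        'USERNAME': env['IB_USER'],
        'PASSWORD': env['IB_PASSWORD'],
    }
    # CLIENT_LIBRARY stays optional so that a check such as
    # `'CLIENT_LIBRARY' in DATABASE_CONFIG` keeps working as before.
    client_library = env.get('IB_CLIENT_LIBRARY')
    if client_library:
        config['CLIENT_LIBRARY'] = client_library
    return config


# In the notebook this would replace the inline dict:
# DATABASE_CONFIG = load_database_config()
```

A missing required variable raises `KeyError` immediately, which surfaces a configuration problem before any connection attempt.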


In [None]:
# Define connect_and_load_table function (exact copy from webapp/services/database_service.py)
def connect_and_load_table(table_name):
    """Load a table from the database using direct InterBase connection only"""
    try:
        print(f"🔄 Connecting to database for table {table_name}...")
        
        # Use direct InterBase connection only (no fallback to ODBC)
        if not INTERBASE_AVAILABLE:
            raise Exception("InterBase Python library not available")
        
        print(f"🔗 Using direct InterBase connection for {table_name}...")
        
        # Build direct connection for InterBase
        # Format: host:database_path
        dsn = f"{DATABASE_CONFIG['DATA_SOURCE']}:{DATABASE_CONFIG['DATABASE_PATH']}"
        print(f"📡 DSN: {dsn}")
        print(f"📚 Client Library: {DATABASE_CONFIG.get('CLIENT_LIBRARY', 'system default')}")
        
        # Connect with explicit client library if specified
        if 'CLIENT_LIBRARY' in DATABASE_CONFIG:
            conn = interbase.connect(
                dsn=dsn,
                user=DATABASE_CONFIG['USERNAME'],
                password=DATABASE_CONFIG['PASSWORD'],
                ib_library_name=DATABASE_CONFIG['CLIENT_LIBRARY'],
                charset='NONE'  # 'NONE' applies no character set conversion (it is not UTF-8)
            )
        else:
            conn = interbase.connect(
                dsn=dsn,
                user=DATABASE_CONFIG['USERNAME'],
                password=DATABASE_CONFIG['PASSWORD'],
                charset='NONE'  # 'NONE' applies no character set conversion (it is not UTF-8)
            )
        
        print(f"✅ Direct InterBase connection successful for {table_name}")
        
        # Execute query and fetch data
        cursor = conn.cursor()
        cursor.execute(f"SELECT * FROM {table_name}")
        
        # Get column names
        columns = [desc[0] for desc in cursor.description]
        
        # Fetch all rows
        rows = cursor.fetchall()
        
        # Convert to DataFrame
        df = pd.DataFrame(rows, columns=columns)
        
        conn.close()
        print(f"✅ {table_name}: {df.shape[0]:,} rows × {df.shape[1]} columns (direct connection)")
        return df
        
    except Exception as e:
        print(f"❌ {table_name}: Failed to load - {e}")
        print(f"   Error type: {type(e).__name__}")
        print(f"   DSN attempted: {DATABASE_CONFIG['DATA_SOURCE']}:{DATABASE_CONFIG['DATABASE_PATH']}")
        print(f"   Client Library: {DATABASE_CONFIG.get('CLIENT_LIBRARY', 'system default')}")
        return None

print("✅ Data loading function defined (matching webapp exactly)")


✅ Data loading function defined (matching webapp exactly)


## 1. Load ITEMS Table

Load the ITEMS table exactly as the webapp does (stored as `sales_details` dataframe).


In [None]:
# Load ITEMS table (same as webapp: sales_details_df = connect_and_load_table('ITEMS'))
sales_details = connect_and_load_table('ITEMS')

if sales_details is not None:
    print(f"\n📊 ITEMS Table Loaded Successfully!")
    print(f"   Shape: {sales_details.shape[0]:,} rows × {sales_details.shape[1]} columns")
    print(f"   Memory usage: {sales_details.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
else:
    print("❌ Failed to load ITEMS table")


🔄 Connecting to database for table ITEMS...
🔗 Using direct InterBase connection for ITEMS...
📡 DSN: 100.200.2.1:D:\dolly2008\fer2015.dol
📚 Client Library: C:\Users\User\Downloads\Compressed\ibclient64-14.1_x86-64\ibclient64-14.1.dll
✅ Direct InterBase connection successful for ITEMS
✅ ITEMS: 3,322,772 rows × 54 columns (direct connection)

📊 ITEMS Table Loaded Successfully!
   Shape: 3,322,772 rows × 54 columns
   Memory usage: 2671.59 MB
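
The loaded frame occupies roughly 2.6 GB, and much of that is likely to come from repeated Python strings in object columns such as `SITE`. The sketch below is an optional technique, not something the webapp is known to do. It converts low-cardinality object columns to the pandas `category` dtype and compares memory on a toy frame. The `shrink_object_columns` helper and the `max_unique_ratio` threshold are illustrative assumptions.

```python
import pandas as pd


def shrink_object_columns(df, max_unique_ratio=0.5):
    """Return a copy in which low-cardinality object columns use the 'category' dtype."""
    out = df.copy()
    for col in out.select_dtypes(include='object').columns:
        # Only convert when values repeat often enough to pay off.
        if out[col].nunique(dropna=True) <= max_unique_ratio * len(out):
            out[col] = out[col].astype('category')
    return out


# Toy data shaped like two ITEMS columns; dtype=object mirrors the notebook's dtypes.
toy = pd.DataFrame({
    'SITE': pd.Series(['GEM', 'BUM', 'BA2'] * 1000, dtype=object),
    'QTY': [1.0, 64.0, 38.0] * 1000,
})
shrunk = shrink_object_columns(toy)
before = toy.memory_usage(deep=True).sum()
after = shrunk.memory_usage(deep=True).sum()
print(f"{before:,} bytes -> {after:,} bytes")
```

Categorical columns change how some operations behave (for example, assigning a value outside the known categories), so this fits read-only exploration better than code that edits the frame.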


## 2. Explore Table Structure

Examine all columns, their data types, and basic statistics.


In [None]:
if sales_details is not None:
    print("📋 ITEMS Table Structure:")
    print(f"   Total columns: {len(sales_details.columns)}\n")
    
    print("📋 Column Names (in order):")
    for i, col in enumerate(sales_details.columns, 1):
        print(f"   {i:2d}. {col}")
    
    print(f"\n📊 Column Data Types:")
    print(sales_details.dtypes)
    
    print(f"\n📊 Basic Info:")
    # DataFrame.info() prints its report directly and returns None, so it is not wrapped in print()
    sales_details.info()


📋 ITEMS Table Structure:
   Total columns: 54

📋 Column Names (in order):
    1. ID
    2. MID
    3. ITEM
    4. SITE
    5. STTYPE
    6. FRAC
    7. QTY
    8. PACK
    9. PRICE
   10. DISCOUNT
   11. VAT
   12. COSTUS
   13. COSTLC
   14. CATREGORYID
   15. VATAMOUNT
   16. DEBITUS
   17. CREDITUS
   18. BARCODE
   19. BONENO
   20. DEBITQTY
   21. CREDITQTY
   22. YESNO
   23. TOTAL
   24. FDATE
   25. ALLQTY
   26. JOB
   27. SID
   28. SALESMAN
   29. CONTACT
   30. TSITE
   31. DEBITLC
   32. CLC
   33. CREDITLC
   34. STQTY
   35. AUTOCURRFAC
   36. FTYPE
   37. NOVTOTAL
   38. CURRVAL
   39. CURRVALLC
   40. DEPENSE
   41. CARTOON
   42. CARTOONDC
   43. FIDATE
   44. MYLINES
   45. FROMBAL
   46. TOBAL
   47. ITCOLOR
   48. EXTRANOTE
   49. MYORDER
   50. PRICEKILO
   51. MYCATEGORYID
   52. POID
   53. DEBITVATAMOUNT
   54. CREDITVATAMOUNT

📊 Column Data Types:
ID                          int64
MID                        object
ITEM                       object
SIT

## 3. Explore CONTACT Column

Check if CONTACT column exists and analyze its values.


In [None]:
if sales_details is not None:
    # Check if CONTACT column exists (case-insensitive match on the column name)
    contact_col = None
    for col in sales_details.columns:
        if col.upper() == 'CONTACT':
            contact_col = col
            break
    
    if contact_col:
        print(f"✅ CONTACT column found: '{contact_col}'")
        print(f"\n📊 CONTACT Column Analysis:")
        print(f"   Data type: {sales_details[contact_col].dtype}")
        print(f"   Non-null count: {sales_details[contact_col].notna().sum():,}")
        print(f"   Null count: {sales_details[contact_col].isna().sum():,}")
        print(f"   Unique values: {sales_details[contact_col].nunique():,}")
        
        print(f"\n📋 Sample Values:")
        print(sales_details[contact_col].value_counts().head(20))
        
        print(f"\n📋 Sample Rows with CONTACT:")
        display(sales_details[[contact_col] + [col for col in sales_details.columns if col != contact_col][:10]].head(10))
        
        # Check for non-null CONTACT values
        non_null_contact = sales_details[sales_details[contact_col].notna()]
        if len(non_null_contact) > 0:
            print(f"\n✅ Found {len(non_null_contact):,} rows with non-null CONTACT values")
            print(f"\n📋 Sample non-null CONTACT values:")
            display(non_null_contact[[contact_col, 'SID', 'ITEM', 'FDATE', 'FTYPE']].head(20))
        else:
            print(f"\n⚠️ All CONTACT values are null")
    else:
        print("❌ CONTACT column not found")
        print(f"\nAvailable columns containing 'contact' (case-insensitive):")
        matching_cols = [col for col in sales_details.columns if 'contact' in col.lower()]
        if matching_cols:
            for col in matching_cols:
                print(f"   - {col}")
        else:
            print("   No columns found containing 'contact'")


✅ CONTACT column found: 'CONTACT'

📊 CONTACT Column Analysis:
   Data type: float64
   Non-null count: 1,195,102
   Null count: 2,127,670
   Unique values: 4

📋 Sample Values:
CONTACT
0.0     945371
1.0     136548
2.0     113156
29.0        27
Name: count, dtype: int64

📋 Sample Rows with CONTACT:


Unnamed: 0,CONTACT,ID,MID,ITEM,SITE,STTYPE,FRAC,QTY,PACK,PRICE,DISCOUNT
0,0.0,8695710,SI14108717GEM,T400,GEM,-,0.0,1.0,1.0,14.66,0.0
1,,9083937,FO15940961AZB,F412,DEP,-,0.0,34000.0,1.0,13.5,0.0
2,0.0,10040519,SI14107738BUM,S365,BUM,-,0.0,64.0,1.0,8.12,0.0
3,0.0,10040520,SI14107738BUM,P252_A20-LINT,BUM,-,0.0,5.0,1.0,11.68,0.0
4,0.0,10040521,SI14107738BUM,P202_A20-MINT,BUM,-,0.0,2.0,1.0,11.68,0.0
5,0.0,10040522,SI14107738BUM,P009_E5-AR,BUM,-,0.0,3.0,1.0,12.5,0.0
6,0.0,10040523,SI14107738BUM,P008_E1-AR,BUM,-,0.0,5.0,1.0,2.84,0.0
7,0.0,10040524,SI14107738BUM,J120,BUM,-,0.0,1.0,1.0,3.1,0.0
8,0.0,10040525,SI14107738BUM,G00034,BUM,-,0.0,3.0,1.0,11.87,0.0
9,0.0,10040526,SI14107738BUM,G00033,BUM,-,0.0,20.0,1.0,15.09,0.0



✅ Found 1,195,102 rows with non-null CONTACT values

📋 Sample non-null CONTACT values:


Unnamed: 0,CONTACT,SID,ITEM,FDATE,FTYPE
0,0.0,53020022,T400,2024-03-20,1
2,0.0,53020014,S365,2024-11-08,1
3,0.0,53020014,P252_A20-LINT,2024-11-08,1
4,0.0,53020014,P202_A20-MINT,2024-11-08,1
5,0.0,53020014,P009_E5-AR,2024-11-08,1
6,0.0,53020014,P008_E1-AR,2024-11-08,1
7,0.0,53020014,J120,2024-11-08,1
8,0.0,53020014,G00034,2024-11-08,1
9,0.0,53020014,G00033,2024-11-08,1
12,1.0,41120027,F408,2024-11-01,1
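
CONTACT holds whole-number codes (0, 1, 2, 29) but is reported as `float64`, which is how pandas represents an integer column that contains nulls by default. A minimal sketch, using toy values rather than the real table, of converting such a column to the nullable `Int64` dtype so the codes display as integers while nulls are preserved:

```python
import pandas as pd

# Toy stand-in for sales_details['CONTACT']: integer codes plus a missing value.
contact = pd.Series([0.0, None, 0.0, 1.0, 2.0, 29.0], name='CONTACT')

# 'Int64' (capital I) is the pandas nullable integer dtype; NaN becomes <NA>.
contact_int = contact.astype('Int64')

# dropna=False keeps the null bucket visible in the distribution.
print(contact_int.value_counts(dropna=False))
```

The conversion raises an error if any value has a fractional part, which also serves as a check that the column really contains only integer codes.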


## 4. Sample Data Exploration

View sample rows to understand the data structure.


In [None]:
if sales_details is not None:
    print("📋 First 10 Rows:")
    display(sales_details.head(10))
    
    print("\n📋 Last 10 Rows:")
    display(sales_details.tail(10))
    
    print("\n📋 Random 10 Rows:")
    display(sales_details.sample(10))


📋 First 10 Rows:


Unnamed: 0,ID,MID,ITEM,SITE,STTYPE,FRAC,QTY,PACK,PRICE,DISCOUNT,...,FROMBAL,TOBAL,ITCOLOR,EXTRANOTE,MYORDER,PRICEKILO,MYCATEGORYID,POID,DEBITVATAMOUNT,CREDITVATAMOUNT
0,8695710,SI14108717GEM,T400,GEM,-,0.0,1.0,1.0,14.66,0.0,...,-1.0,0.0,,,,,100006.0,,0.0,2.3456
1,9083937,FO15940961AZB,F412,DEP,-,0.0,34000.0,1.0,13.5,0.0,...,0.0,0.0,16777215.0,,,,5059.0,,0.0,0.0
2,10040519,SI14107738BUM,S365,BUM,-,0.0,64.0,1.0,8.12,0.0,...,-64.0,0.0,,,,,5076.0,,0.0,83.1488
3,10040520,SI14107738BUM,P252_A20-LINT,BUM,-,0.0,5.0,1.0,11.68,0.0,...,-5.0,0.0,,,,,5071.0,,0.0,9.344
4,10040521,SI14107738BUM,P202_A20-MINT,BUM,-,0.0,2.0,1.0,11.68,0.0,...,-2.0,0.0,,,,,5071.0,,0.0,3.7376
5,10040522,SI14107738BUM,P009_E5-AR,BUM,-,0.0,3.0,1.0,12.5,0.0,...,-3.0,0.0,,,,,5071.0,,0.0,6.0
6,10040523,SI14107738BUM,P008_E1-AR,BUM,-,0.0,5.0,1.0,2.84,0.0,...,-5.0,0.0,,,,,5071.0,,0.0,2.272
7,10040524,SI14107738BUM,J120,BUM,-,0.0,1.0,1.0,3.1,0.0,...,-1.0,0.0,,,,,100025.0,,0.0,0.496
8,10040525,SI14107738BUM,G00034,BUM,-,0.0,3.0,1.0,11.87,0.0,...,-3.0,0.0,,,,,5079.0,,0.0,5.6976
9,10040526,SI14107738BUM,G00033,BUM,-,0.0,20.0,1.0,15.09,0.0,...,-20.0,0.0,,,,,5079.0,,0.0,48.288



📋 Last 10 Rows:


Unnamed: 0,ID,MID,ITEM,SITE,STTYPE,FRAC,QTY,PACK,PRICE,DISCOUNT,...,FROMBAL,TOBAL,ITCOLOR,EXTRANOTE,MYORDER,PRICEKILO,MYCATEGORYID,POID,DEBITVATAMOUNT,CREDITVATAMOUNT
3322762,12810958,SI16207866AMM,G00021,BA2,-,0.0,38.0,1.0,3.02,0.0,...,-38.0,0.0,,,,,100063.0,,0.0,18.3616
3322763,12810959,SI16207866AMM,G00023,BA2,-,0.0,16.0,1.0,4.61,0.0,...,-16.0,0.0,,,,,100063.0,,0.0,11.8016
3322764,12810960,SI16207866AMM,P007_E5-AG,BA2,-,0.0,1.0,1.0,12.5,0.0,...,-1.0,0.0,,,,,5071.0,,0.0,2.0
3322765,12810961,SI16207866AMM,P009_E5-AR,BA2,-,0.0,3.0,1.0,11.99,0.0,...,-3.0,0.0,,,,,5071.0,,0.0,5.7552
3322766,12810962,SI16207866AMM,P012_A1-AR,BA2,-,0.0,2.0,1.0,2.1,0.0,...,-2.0,0.0,,,,,5071.0,,0.0,0.672
3322767,12810963,SI16207866AMM,P150_A1-EW,BA2,-,0.0,3.0,1.0,3.55,0.0,...,-3.0,0.0,,,,,5071.0,,0.0,1.704
3322768,12810964,SI16207866AMM,P151_A4-EW,BA2,-,0.0,2.0,1.0,14.21,0.0,...,-2.0,0.0,,,,,5071.0,,0.0,4.5472
3322769,12810965,SI16207866AMM,P153_E1-EW,BA2,-,0.0,4.0,1.0,4.66,0.0,...,-4.0,0.0,,,,,5071.0,,0.0,2.9824
3322770,12810966,SI16207866AMM,P156_E1-EC,BA2,-,0.0,4.0,1.0,4.66,0.0,...,-4.0,0.0,,,,,5071.0,,0.0,2.9824
3322771,12810967,SI16207866AMM,P157_E4-EC,BA2,-,0.0,2.0,1.0,18.45,0.0,...,-2.0,0.0,,,,,5071.0,,0.0,5.904



📋 Random 10 Rows:


Unnamed: 0,ID,MID,ITEM,SITE,STTYPE,FRAC,QTY,PACK,PRICE,DISCOUNT,...,FROMBAL,TOBAL,ITCOLOR,EXTRANOTE,MYORDER,PRICEKILO,MYCATEGORYID,POID,DEBITVATAMOUNT,CREDITVATAMOUNT
78072,8081859,FO15875000AZB,F002,BOM,-,0.0,100.0,1.0,8.55,0.0,...,0.0,0.0,16777215.0,,,,5074.0,,0.0,0.0
859271,9195121,SI144834229XL,F185,P-O,-,0.0,552.0,1.0,14.66,0.0,...,-552.0,552.0,16777215.0,,,,5058.0,,0.0,0.0
595627,8800527,SI14109452KIM,F202,KIE,-,0.0,1.0,1.0,19.17,0.0,...,-1.0,0.0,,,,,5080.0,,0.0,3.0672
1945753,10716698,SI16052354AMM,F399,NG3,-,0.0,2.0,1.0,9.75,0.0,...,-2.0,0.0,,,,,5059.0,,0.0,0.0
278351,8357735,SI15893595AMM,CR001,MOK,-,0.0,7.0,1.0,2.93,0.0,...,-7.0,0.0,,,,,100007.0,,0.0,3.2816
3201351,12621109,SI16194746AMM,JKL102,BIB,-,0.0,1.0,1.0,0.39,0.0,...,-1.0,0.0,,,,,100016.0,,0.0,0.0624
3156284,12542014,SI16189277AMM,F455,LEM,-,0.0,6.0,1.0,5.34,0.0,...,-6.0,0.0,,,,,100026.0,,0.0,5.1264
13043,7993091,SI15869168AMM,JKL070,JKL,-,0.0,29.0,1.0,1.9,0.0,...,-29.0,0.0,,,,,100016.0,,0.0,8.816
2675943,11838042,SI144891679XL,F183,P-O,-,0.0,7284.0,1.0,7.98,0.0,...,-7284.0,7284.0,16777215.0,,,,5058.0,,0.0,0.0
2227569,11136417,SI16085287AMM,ECO802,DEB,-,0.0,31.0,1.0,1.07,0.0,...,-31.0,0.0,,,,,5072.0,,0.0,5.3072


## 5. Key Columns Analysis

Analyze important columns used in Reports 7 and 8.


In [None]:
if sales_details is not None:
    # Key columns used in reports
    key_columns = ['SID', 'ITEM', 'FDATE', 'FTYPE', 'QTY', 'MID', 
                   'CREDITUS', 'DEBITUS', 'CREDITVATAMOUNT', 'DEBITVATAMOUNT',
                   'SITE', 'CONTACT', 'SALESMAN']
    
    print("📊 Key Columns Analysis:")
    print("=" * 80)
    
    for col in key_columns:
        if col in sales_details.columns:
            print(f"\n✅ {col}:")
            print(f"   Data type: {sales_details[col].dtype}")
            print(f"   Non-null: {sales_details[col].notna().sum():,} ({sales_details[col].notna().sum()/len(sales_details)*100:.1f}%)")
            print(f"   Null: {sales_details[col].isna().sum():,} ({sales_details[col].isna().sum()/len(sales_details)*100:.1f}%)")
            if sales_details[col].dtype in ['object', 'string']:
                print(f"   Unique values: {sales_details[col].nunique():,}")
                print(f"   Sample values: {list(sales_details[col].dropna().unique()[:5])}")
            elif sales_details[col].dtype in ['int64', 'float64']:
                print(f"   Min: {sales_details[col].min()}")
                print(f"   Max: {sales_details[col].max()}")
                print(f"   Mean: {sales_details[col].mean():.2f}")
        else:
            print(f"\n❌ {col}: Column not found")


📊 Key Columns Analysis:

✅ SID:
   Data type: object
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)
   Unique values: 1,277
   Sample values: ['53020022', '53020008', '53020014', '53010055', '41120027']

✅ ITEM:
   Data type: object
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)
   Unique values: 983
   Sample values: ['T400', 'F412', 'S365', 'P252_A20-LINT', 'P202_A20-MINT']

✅ FDATE:
   Data type: datetime64[ns]
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)

✅ FTYPE:
   Data type: int64
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)
   Min: 1
   Max: 23
   Mean: 4.25

✅ QTY:
   Data type: float64
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)
   Min: -29559.0
   Max: 600000.0
   Mean: 78.40

✅ MID:
   Data type: object
   Non-null: 3,322,772 (100.0%)
   Null: 0 (0.0%)
   Unique values: 287,854
   Sample values: ['SI14108717GEM', 'FO15940961AZB', 'SI14107738BUM', 'SI16000864AMM', 'SI16000149AZB']

✅ CREDITUS:
   Data type: float64
   Non-null: 3,322,73

## 6. CONTACT Column Relationship Analysis

Explore how CONTACT relates to other columns, especially SID (client).


In [None]:
if sales_details is not None:
    contact_col = None
    for col in sales_details.columns:
        if col.upper() == 'CONTACT':
            contact_col = col
            break
    
    if contact_col and 'SID' in sales_details.columns:
        print("📊 CONTACT vs SID Analysis:")
        print("=" * 80)
        
        # Check if CONTACT is unique per SID or varies
        contact_sid_analysis = sales_details.groupby('SID')[contact_col].agg(['nunique', 'count']).reset_index()
        contact_sid_analysis.columns = ['SID', 'unique_contacts', 'total_records']
        
        print(f"\n📋 SIDs with multiple different CONTACT values:")
        multiple_contacts = contact_sid_analysis[contact_sid_analysis['unique_contacts'] > 1]
        print(f"   Found {len(multiple_contacts):,} SIDs with multiple CONTACT values")
        if len(multiple_contacts) > 0:
            display(multiple_contacts.head(20))
        
        print(f"\n📋 Sample: SID -> CONTACT mapping:")
        sample_sid_contact = sales_details[['SID', contact_col]].drop_duplicates()
        print(f"   Found {len(sample_sid_contact):,} unique SID-CONTACT combinations")
        display(sample_sid_contact.head(20))
        
        # Check CONTACT distribution
        print(f"\n📊 CONTACT Value Distribution:")
        contact_dist = sales_details[contact_col].value_counts()
        print(f"   Top 20 CONTACT values:")
        display(contact_dist.head(20))
        
        # Check if CONTACT is related to SALESMAN
        if 'SALESMAN' in sales_details.columns:
            print(f"\n📊 CONTACT vs SALESMAN Analysis:")
            contact_salesman = sales_details[[contact_col, 'SALESMAN']].drop_duplicates()
            print(f"   Unique CONTACT-SALESMAN combinations: {len(contact_salesman):,}")
            display(contact_salesman.head(20))


📊 CONTACT vs SID Analysis:

📋 SIDs with multiple different CONTACT values:
   Found 46 SIDs with multiple CONTACT values


Unnamed: 0,SID,unique_contacts,total_records
4,3700002,3,8718
5,3700003,3,13980
6,3700004,3,39812
261,41110508,2,142
554,41120490,2,84
598,41120567,2,43
600,41120569,2,60
1134,53010049,2,4262
1151,53010089,2,3371
1152,53010095,2,4083



📋 Sample: SID -> CONTACT mapping:
   Found 1,466 unique SID-CONTACT combinations


Unnamed: 0,SID,CONTACT
0,53020022,0.0
1,53020008,
2,53020014,0.0
10,53010055,
12,41120027,1.0
13,3700002,
15,41120760,1.0
25,53010067,0.0
26,53010063,0.0
29,53020037,0.0



📊 CONTACT Value Distribution:
   Top 20 CONTACT values:


CONTACT
0.0     945371
1.0     136548
2.0     113156
29.0        27
Name: count, dtype: int64


📊 CONTACT vs SALESMAN Analysis:
   Unique CONTACT-SALESMAN combinations: 35


Unnamed: 0,CONTACT,SALESMAN
0,0.0,4.0
1,,4.0
2,0.0,10.0
10,,1.0
12,1.0,1.0
25,0.0,1.0
29,0.0,0.0
48,2.0,1.0
74,1.0,4.0
91,2.0,4.0
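
Two pandas details affect how the numbers above should be read. In the groupby, `nunique` and `count` both skip nulls, so the column labelled `total_records` is the number of non-null CONTACT rows per SID. `drop_duplicates` keeps `(SID, NaN)` pairs, so the 1,466 combinations include SIDs whose CONTACT is sometimes null. A minimal sketch on toy data, using hypothetical column names for the aggregates, separates these counts:

```python
import pandas as pd

# Toy data: SID 'A' has one code plus a null, 'B' has two codes, 'C' has only a null.
toy = pd.DataFrame({
    'SID': ['A', 'A', 'B', 'B', 'C'],
    'CONTACT': [0.0, None, 1.0, 2.0, None],
})

per_sid = toy.groupby('SID')['CONTACT'].agg(
    unique_contacts='nunique',             # distinct non-null codes
    non_null_rows='count',                 # rows with a CONTACT value
    null_rows=lambda s: s.isna().sum(),    # rows where CONTACT is missing
).reset_index()
print(per_sid)

# drop_duplicates treats NaN as a value, so (A, NaN) and (C, NaN) are both kept.
pairs = toy[['SID', 'CONTACT']].drop_duplicates()
print(len(pairs))
```

Reporting null rows alongside the distinct count makes it clear whether a SID has a single stable CONTACT code or a code that is only sometimes recorded.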


## 7. Date Range Analysis

Check the date range of the data in the ITEMS table.


In [None]:
if sales_details is not None and 'FDATE' in sales_details.columns:
    print("📊 Date Range Analysis:")
    print("=" * 80)
    
    # Convert FDATE to datetime
    sales_details['FDATE_DT'] = pd.to_datetime(sales_details['FDATE'], errors='coerce')
    
    print(f"\n📅 Date Statistics:")
    print(f"   Earliest date: {sales_details['FDATE_DT'].min()}")
    print(f"   Latest date: {sales_details['FDATE_DT'].max()}")
    print(f"   Date range: {(sales_details['FDATE_DT'].max() - sales_details['FDATE_DT'].min()).days} days")
    print(f"   Records with valid dates: {sales_details['FDATE_DT'].notna().sum():,}")
    print(f"   Records with null dates: {sales_details['FDATE_DT'].isna().sum():,}")
    
    print(f"\n📊 Records by Year:")
    sales_details['YEAR'] = sales_details['FDATE_DT'].dt.year
    year_counts = sales_details['YEAR'].value_counts().sort_index()
    display(year_counts)
    
    print(f"\n📊 Records by Month (last 12 months):")
    sales_details['YEAR_MONTH'] = sales_details['FDATE_DT'].dt.to_period('M')
    month_counts = sales_details['YEAR_MONTH'].value_counts().sort_index().tail(12)
    display(month_counts)


📊 Date Range Analysis:

📅 Date Statistics:
   Earliest date: 2023-10-31 00:00:00
   Latest date: 2025-12-26 00:00:00
   Date range: 787 days
   Records with valid dates: 3,322,772
   Records with null dates: 0

📊 Records by Year:


YEAR
2023     235325
2024    1447535
2025    1639912
Name: count, dtype: int64


📊 Records by Month (last 12 months):


YEAR_MONTH
2025-01    108585
2025-02    119483
2025-03    133207
2025-04    129717
2025-05    138226
2025-06    124312
2025-07    154351
2025-08    149634
2025-09    150515
2025-10    163116
2025-11    137845
2025-12    130921
Freq: M, Name: count, dtype: int64
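
The cell above stores `FDATE_DT`, `YEAR` and `YEAR_MONTH` as new columns on `sales_details`, which changes the frame that later cells inspect. Since `FDATE` is already `datetime64[ns]`, the same counts can be computed from temporary Series without adding columns. A minimal sketch on toy dates:

```python
import pandas as pd

# Toy stand-in for sales_details with an already-parsed FDATE column.
toy = pd.DataFrame({
    'FDATE': pd.to_datetime(['2025-11-03', '2025-11-20', '2025-12-26']),
})

# Derive the grouping keys on the fly; toy itself is left unchanged.
year_counts = toy['FDATE'].dt.year.value_counts().sort_index()
month_counts = toy['FDATE'].dt.to_period('M').value_counts().sort_index()

print(year_counts)
print(month_counts)
print(list(toy.columns))
```

This keeps the column count and memory figures identical before and after the date analysis.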

## 8. Summary Statistics

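Overall summary of the ITEMS table. The column count and memory usage reported here are higher than in Section 1 because the Section 7 cell adds three helper columns (`FDATE_DT`, `YEAR`, `YEAR_MONTH`) to `sales_details`.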


In [None]:
if sales_details is not None:
    print("📊 ITEMS Table Summary:")
    print("=" * 80)
    
    print(f"\n✅ Table loaded successfully")
    print(f"   Total rows: {len(sales_details):,}")
    print(f"   Total columns: {len(sales_details.columns)}")
    print(f"   Memory usage: {sales_details.memory_usage(deep=True).sum() / 1024**2:.2f} MB")
    
    print(f"\n📋 Column Summary:")
    print(f"   Numeric columns: {len(sales_details.select_dtypes(include=['int64', 'float64']).columns)}")
    print(f"   String/Object columns: {len(sales_details.select_dtypes(include=['object', 'string']).columns)}")
    print(f"   Date columns: {len(sales_details.select_dtypes(include=['datetime64']).columns)}")
    
    # Check for CONTACT column
    contact_col = None
    for col in sales_details.columns:
        if col.upper() == 'CONTACT':
            contact_col = col
            break
    
    if contact_col:
        print(f"\n✅ CONTACT column exists: '{contact_col}'")
        print(f"   Non-null values: {sales_details[contact_col].notna().sum():,}")
        print(f"   Unique values: {sales_details[contact_col].nunique():,}")
    else:
        print(f"\n❌ CONTACT column not found")
    
    print(f"\n📊 Key Statistics:")
    if 'QTY' in sales_details.columns:
        print(f"   Total QTY: {sales_details['QTY'].sum():,.2f}")
        print(f"   Average QTY: {sales_details['QTY'].mean():.2f}")
    
    if 'SID' in sales_details.columns:
        print(f"   Unique SIDs (clients): {sales_details['SID'].nunique():,}")
    
    if 'ITEM' in sales_details.columns:
        print(f"   Unique Items: {sales_details['ITEM'].nunique():,}")
    
    if 'FTYPE' in sales_details.columns:
        print(f"\n📊 FTYPE Distribution:")
        print(sales_details['FTYPE'].value_counts().sort_index())
    
    print(f"\n✅ Exploration complete!")


📊 ITEMS Table Summary:

✅ Table loaded successfully
   Total rows: 3,322,772
   Total columns: 57
   Memory usage: 2734.97 MB

📋 Column Summary:
   Numeric columns: 39
   String/Object columns: 13
   Date columns: 3

✅ CONTACT column exists: 'CONTACT'
   Non-null values: 1,195,102
   Unique values: 4

📊 Key Statistics:
   Total QTY: 260,499,140.66
   Average QTY: 78.40
   Unique SIDs (clients): 1,277
   Unique Items: 983

📊 FTYPE Distribution:
FTYPE
1     2698036
2        3276
3       42955
4         688
12      99617
13       3712
14      30626
15      25331
22     402858
23      15673
Name: count, dtype: int64

✅ Exploration complete!
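
The raw FTYPE counts are easier to compare as shares of the table. The sketch below reuses the counts printed above (no new data) and converts them to percentages. On the live frame, `sales_details['FTYPE'].value_counts(normalize=True)` should give the same proportions.

```python
# FTYPE counts copied from the output above.
ftype_counts = {
    1: 2698036, 2: 3276, 3: 42955, 4: 688, 12: 99617,
    13: 3712, 14: 30626, 15: 25331, 22: 402858, 23: 15673,
}

total = sum(ftype_counts.values())  # should equal the table's row count
shares = {ftype: count / total for ftype, count in ftype_counts.items()}

# Print the shares from largest to smallest.
for ftype, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"FTYPE {ftype:>2}: {share:6.2%}")
```

The total of the counts matches the 3,322,772 rows reported for the table, which confirms that FTYPE has no nulls.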
