# AtliQ Products Analysis

## Contents

1. [Introduction](#introduction)
2. [Data loading and preprocessing](#data-loading-and-preprocessing)
3. [Analysis]
    1. [Finding the bestsellers]
    2. [Popularity across time and markets]
    3. [Variant sales]
    4. [Division sales]
    5. [Product margin]
    6. [Price vs cost]
4. [Conclusion]


## Introduction

Our team has been commissioned by AtliQ Hardware to conduct a thorough analysis of their product portfolio and sales data.

As a prominent computer hardware producer in India, AtliQ is keen on enhancing their understanding of product performance. This analysis aims to identify top-selling products, uncover trends, and develop strategies to optimize sales and market share.

To that end, the analysis is organized around these key questions:
- Which items are the bestsellers?
- How has popularity changed over time/across markets?
- Are there some variants that contribute a disproportionate amount to the product sales?
- Are some channels responsible for a large portion of a division’s sales?
- What are the products with the best/worst margin?
- Is gross price keeping up with manufacturing costs?

Through this analysis, our goal is to provide AtliQ Hardware with actionable insights and recommendations to help drive business growth.

## Data loading and preprocessing

These are the libraries that we are going to use for this project:

In [2]:
import pandas as pd
import sqlite3
import os
import requests
import shutil

We have access to an SQLite database with data on products, clients and sales.

First, let's check that it exists locally. If it doesn't, we'll download it.

In [3]:
# Local path to the Database
db_directory_path = 'Data'
db_file_path = os.path.join(db_directory_path, 'atliq_db.sqlite3')


In [4]:
# Create the directory if it doesn't exist yet
os.makedirs(db_directory_path, exist_ok=True)


In [5]:
# Check if file exists. If it doesn't, download it
if not os.path.exists(db_file_path):
    print('Database not found. Downloading the file...')

    db_url = 'https://practicum-content.s3.us-west-1.amazonaws.com/data-eng/databases/atliq_db.sqlite3'
    
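    # Fail loudly on HTTP errors instead of saving an error page as the database
    response = requests.get(db_url, timeout=60)
    response.raise_for_status()
    with open(db_file_path, 'wb') as f:
        f.write(response.content)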
    
    print('Database downloaded successfully!')
else:
    print('Database found.')


Database not found. Downloading the file...
Database downloaded successfully!


We have our database. We will be working directly with the database as much as possible, and we don't want to change the raw data, so we'll make a copy and modify that instead.

In [6]:
# Check if the copy exists
work_db_path = os.path.join(db_directory_path, 'atliq_db_processed.sqlite3')

work_db_found = False
if os.path.exists(work_db_path):
    work_db_found = True
    print('Previous copy found.')
else:
    shutil.copyfile(db_file_path, work_db_path)
    print('Database duplicated.')


Database duplicated.


We can now connect to our working copy and start processing it. If we found that the copy already exists, we can assume that it is already processed, and we can skip those steps.
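One way to make that skip explicit is to wrap the processing steps in a function and call it only when the copy has just been created. The sketch below is an assumption about how this could be wired up, not code from this notebook: `prepare_working_copy` and `process_database` are hypothetical names, the cleaning step inside `process_database` is only a stand-in, and the demo runs on a throwaway database in a temporary directory.

```python
import os
import shutil
import sqlite3
import tempfile


def prepare_working_copy(raw_path: str, work_path: str) -> bool:
    """Copy the raw database if no working copy exists yet.

    Returns True when a fresh copy was made and still needs processing.
    """
    if os.path.exists(work_path):
        return False
    shutil.copyfile(raw_path, work_path)
    return True


def process_database(con: sqlite3.Connection) -> None:
    """Hypothetical stand-in for the cleaning steps."""
    con.execute("DELETE FROM fact_sales_monthly WHERE customer_code IS NULL")
    con.commit()


with tempfile.TemporaryDirectory() as tmp_dir:
    raw_path = os.path.join(tmp_dir, "raw.sqlite3")
    work_path = os.path.join(tmp_dir, "work.sqlite3")

    # Build a tiny raw database with made-up rows for the demo
    raw = sqlite3.connect(raw_path)
    raw.execute(
        "CREATE TABLE fact_sales_monthly (product_code TEXT, customer_code INTEGER)"
    )
    raw.executemany(
        "INSERT INTO fact_sales_monthly VALUES (?, ?)", [("A1", 1), ("A0", None)]
    )
    raw.commit()
    raw.close()

    first_run = prepare_working_copy(raw_path, work_path)   # copies the file
    second_run = prepare_working_copy(raw_path, work_path)  # finds the copy

    demo_con = sqlite3.connect(work_path)
    if first_run:
        # Only a freshly made copy gets processed
        process_database(demo_con)
    remaining_rows = demo_con.execute(
        "SELECT COUNT(*) FROM fact_sales_monthly"
    ).fetchone()[0]
    demo_con.close()
```

Returning the flag from the copy step keeps the decision in one place, so every later processing cell can check the same value.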

In [7]:
# Connect to the DB
con = sqlite3.connect(work_db_path)

First, let's check that we have access to all the tables we expect.

In [8]:
# Check all tables
cursor = con.cursor()
cursor.execute("SELECT name FROM sqlite_master WHERE type='table';")
print(*cursor.fetchall(), sep='\n')

('dim_customer',)
('dim_product',)
('fact_pre_discount',)
('fact_manufacturing_cost',)
('fact_gross_price',)
('fact_sales_monthly',)
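The same check can be automated by comparing the listing against a set of expected names. This is a minimal sketch that assumes the six tables listed above are the complete expected set; `EXPECTED_TABLES` and `missing_tables` are hypothetical names, and the demo uses an in-memory database rather than the working copy.

```python
import sqlite3

# Table names taken from the listing above (assumed to be the full expected set)
EXPECTED_TABLES = {
    "dim_customer",
    "dim_product",
    "fact_pre_discount",
    "fact_manufacturing_cost",
    "fact_gross_price",
    "fact_sales_monthly",
}


def missing_tables(con: sqlite3.Connection, expected: set) -> set:
    """Return the expected table names that are absent from the database."""
    rows = con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    found = {name for (name,) in rows}
    return expected - found


# Demo on an in-memory database that only contains two of the tables
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE dim_customer (customer_code INTEGER)")
demo_con.execute("CREATE TABLE dim_product (product_code TEXT)")
missing = missing_tables(demo_con, EXPECTED_TABLES)
print(sorted(missing))
demo_con.close()
```

An empty result means every expected table is present.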


Let's check for missing values. We can't load whole tables into pandas, so we'll have to rely on SQL queries only.

Let's build a few helper functions for this, similar to pandas `info()`.

In [9]:
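# Count NULL values in a column
# Table and column names can't be bound as query parameters,
# so they are interpolated here; only pass trusted names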
def count_nulls_in_column(column: str, table: str):
    query = f"""
    SELECT COUNT(*)
    FROM {table}
    WHERE {column} IS NULL
    """

    cursor.execute(query)
    return cursor.fetchone()[0]

In [10]:
# Get all the column names from a table
def get_column_names(table: str):
    query = f"""
    PRAGMA table_info({table}) 
    """

    cursor.execute(query)
    result = cursor.fetchall()
    # In each PRAGMA table_info row, the column name is at index 1
    name_pos_in_row = 1

    return [row[name_pos_in_row] for row in result]

In [11]:
# Check missing values in all columns of the table
def check_nulls(table: str):
    column_names = get_column_names(table)
    null_counts = []
    for column in column_names:
        null_counts.append((column, count_nulls_in_column(column, table)))

    return null_counts
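The helpers above issue one query per column, which means one table scan per column. As a possible alternative, all counts for a table can be collected in a single scan, because `column IS NULL` evaluates to 0 or 1 in SQLite and can be summed. The sketch below is an assumption, not part of the original notebook: `null_counts_single_pass` is a hypothetical name, and the demo rows are made up.

```python
import sqlite3


def null_counts_single_pass(con: sqlite3.Connection, table: str) -> dict:
    """Count NULLs in every column of `table` with one table scan."""
    # PRAGMA table_info returns one row per column; the name is at index 1
    columns = [row[1] for row in con.execute(f'PRAGMA table_info("{table}")')]
    # `col IS NULL` is 0 or 1 in SQLite, so SUM counts the NULL rows
    select_list = ", ".join(f'SUM("{col}" IS NULL)' for col in columns)
    counts = con.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    # SUM over an empty table yields NULL, so map None to 0
    return {col: (n or 0) for col, n in zip(columns, counts)}


# Demo on an in-memory table with made-up rows
demo_con = sqlite3.connect(":memory:")
demo_con.execute(
    "CREATE TABLE fact_sales_monthly "
    "(date TEXT, product_code TEXT, customer_code INTEGER)"
)
demo_con.executemany(
    "INSERT INTO fact_sales_monthly VALUES (?, ?, ?)",
    [("2019-06-01", "A0", None), ("2019-06-01", "A1", 7)],
)
demo_counts = null_counts_single_pass(demo_con, "fact_sales_monthly")
print(demo_counts)
demo_con.close()
```

As with the helpers above, the table name is interpolated into the query, so it should only come from trusted input.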

In [29]:
# Show the column names, declared types and primary-key flags of a table
def table_schema(table: str):
    query = f"""
    PRAGMA table_info({table})
    """

    return pd.read_sql_query(query, con)[['name', 'type', 'pk']]


In [30]:
table_schema('dim_customer')

|   | name | type | pk |
|---|---|---|---|
| 0 | customer_code | INTEGER | 0 |
| 1 | customer | TEXT | 0 |
| 2 | platform | TEXT | 0 |
| 3 | channel | TEXT | 0 |
| 4 | market | TEXT | 0 |
| 5 | sub_zone | TEXT | 0 |
| 6 | region | TEXT | 0 |


Now let's check where we have null values.

In [13]:
print(*check_nulls('dim_customer'), sep='\n')

('customer_code', 0)
('customer', 0)
('platform', 0)
('channel', 0)
('market', 0)
('sub_zone', 0)
('region', 0)


In [14]:
print(*check_nulls('dim_product'), sep='\n')

('product_code', 0)
('division', 0)
('segment', 0)
('category', 0)
('product', 0)
('variant', 0)


In [15]:
print(*check_nulls('fact_pre_discount'), sep='\n')

('customer_code', 0)
('fiscal_year', 0)
('pre_invoice_discount_pct', 0)


In [16]:
print(*check_nulls('fact_manufacturing_cost'), sep='\n')

('product_code', 0)
('cost_year', 0)
('manufacturing_cost', 0)


In [17]:
print(*check_nulls('fact_gross_price'), sep='\n')

('product_code', 0)
('fiscal_year', 0)
('gross_price', 0)


In [18]:
print(*check_nulls('fact_sales_monthly'), sep='\n')

('date', 0)
('product_code', 0)
('customer_code', 1)
('sold_quantity', 1)
('fiscal_year', 1)


In [19]:
query='''
SELECT *
FROM fact_sales_monthly
WHERE fiscal_year IS NULL
'''

cursor.execute(query)
print(*cursor.fetchall(), sep='\n')

('2019-06-01', 'A0', None, None, None)


There is only one row with missing values in the whole database. It's for product `A0` in June 2019. It could mean that this product didn't get any sales that month. Let's look for more information about it.

In [20]:
# Look for other sales of this product
query='''
SELECT *
FROM fact_sales_monthly
WHERE product_code = 'A0'
'''

cursor.execute(query)
print(*cursor.fetchall(), sep='\n')

('2019-06-01', 'A0', None, None, None)


There are no other sales records for this product.

In [21]:
# What product is this
query='''
SELECT *
FROM dim_product
WHERE product_code = 'A0'
'''

cursor.execute(query)
print(*cursor.fetchall(), sep='\n')




This product doesn't exist in `dim_product`, so the sales row refers to nothing and we can delete it.

In [22]:
# Delete the row that refers to the nonexistent product
query='''
DELETE
FROM fact_sales_monthly
WHERE product_code = 'A0'
'''

cursor.execute(query)
print(f'Rows deleted: {cursor.rowcount}')

con.commit()
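The lookup above was done for a single product code. A more general check for sales rows that point at a nonexistent product can be written as a `LEFT JOIN` against `dim_product`. This is a sketch under assumptions: `orphan_product_codes` is a hypothetical helper, and the demo runs on an in-memory database with made-up rows rather than on the working copy.

```python
import sqlite3


def orphan_product_codes(con: sqlite3.Connection) -> list:
    """Return product codes used in fact_sales_monthly but missing from dim_product."""
    query = """
    SELECT DISTINCT s.product_code
    FROM fact_sales_monthly AS s
    LEFT JOIN dim_product AS p ON p.product_code = s.product_code
    WHERE p.product_code IS NULL
    """
    return [code for (code,) in con.execute(query)]


# Demo: one valid product and one sales row with an unknown product code
demo_con = sqlite3.connect(":memory:")
demo_con.execute("CREATE TABLE dim_product (product_code TEXT)")
demo_con.execute("CREATE TABLE fact_sales_monthly (date TEXT, product_code TEXT)")
demo_con.execute("INSERT INTO dim_product VALUES ('A1')")
demo_con.executemany(
    "INSERT INTO fact_sales_monthly VALUES (?, ?)",
    [("2019-06-01", "A0"), ("2019-06-01", "A1")],
)

orphans_before = orphan_product_codes(demo_con)
demo_con.execute("DELETE FROM fact_sales_monthly WHERE product_code = 'A0'")
orphans_after = orphan_product_codes(demo_con)
demo_con.close()
```

Running such a check after the delete would confirm that no other sales rows refer to a missing product.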





In [12]:
query="""
SELECT *
FROM sqlite_schema

"""

df = pd.read_sql_query(query, con)
df

|   | type | name | tbl_name | rootpage | sql |
|---|---|---|---|---|---|
| 0 | table | dim_customer | dim_customer | 2 | `CREATE TABLE "dim_customer" ("customer_code" i...` |
| 1 | table | dim_product | dim_product | 7 | `CREATE TABLE "dim_product" ("product_code" tex...` |
| 2 | table | fact_pre_discount | fact_pre_discount | 16 | `CREATE TABLE "fact_pre_discount" ("customer_co...` |
| 3 | table | fact_manufacturing_cost | fact_manufacturing_cost | 23 | `CREATE TABLE "fact_manufacturing_cost" ("produ...` |
| 4 | table | fact_gross_price | fact_gross_price | 33 | `CREATE TABLE "fact_gross_price" ("product_code...` |
| 5 | table | fact_sales_monthly | fact_sales_monthly | 43 | `CREATE TABLE "fact_sales_monthly" ("date" text...` |


In [32]:
display(table_schema('dim_customer'))
display(table_schema('fact_pre_discount'))
display(table_schema('fact_manufacturing_cost'))
display(table_schema('fact_gross_price'))
display(table_schema('fact_sales_monthly'))

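|   | name | type | pk |
|---|---|---|---|
| 0 | customer_code | INTEGER | 0 |
| 1 | customer | TEXT | 0 |
| 2 | platform | TEXT | 0 |
| 3 | channel | TEXT | 0 |
| 4 | market | TEXT | 0 |
| 5 | sub_zone | TEXT | 0 |
| 6 | region | TEXT | 0 |


|   | name | type | pk |
|---|---|---|---|
| 0 | customer_code | INTEGER | 0 |
| 1 | fiscal_year | INTEGER | 0 |
| 2 | pre_invoice_discount_pct | float | 0 |


|   | name | type | pk |
|---|---|---|---|
| 0 | product_code | TEXT | 0 |
| 1 | cost_year | INTEGER | 0 |
| 2 | manufacturing_cost | float | 0 |


|   | name | type | pk |
|---|---|---|---|
| 0 | product_code | TEXT | 0 |
| 1 | fiscal_year | INTEGER | 0 |
| 2 | gross_price | float | 0 |


|   | name | type | pk |
|---|---|---|---|
| 0 | date | TEXT | 0 |
| 1 | product_code | TEXT | 0 |
| 2 | customer_code | INTEGER | 0 |
| 3 | sold_quantity | INTEGER | 0 |
| 4 | fiscal_year | INTEGER | 0 |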
