<br>

# Parch & Posey

## This notebook contains the code used to create the SQLite db file for Parch and Posey

Parch & Posey is a hypothetical paper company with 50 sales representatives spread across four regions of the United States.

It sells three types of paper:
 - Regular
 - Poster
 - Glossy
 
It supplies clients that are Fortune 100 companies, such as Walmart, Exxon Mobil, and Apple.

SQL was then used to answer tricky business questions about this data.

An entity relationship diagram (ERD) is a common way to view the structure of a database. Below is the ERD for the Parch & Posey database. Such diagrams help to visualize the data you are analyzing, including:

 - The names of the tables.
 - The columns in each table.
 - The way the tables work together.

In the Parch & Posey database there are five tables:

 - web_events
 - accounts
 - orders
 - sales_reps
 - region
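The way these tables work together can be sketched with a small join. The example below is only an illustration: it builds a throwaway in-memory database with a few rows copied from the table previews later in this notebook, and it assumes (from the column names `sales_rep_id` and `region_id`) that accounts point to sales reps and sales reps point to regions.

```python
import sqlite3

# In-memory stand-in for the real database; only a subset of columns and rows is used
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_reps (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, sales_rep_id INTEGER);
    INSERT INTO region VALUES (1, 'Northeast'), (2, 'Midwest');
    INSERT INTO sales_reps VALUES (321500, 'Samuel Racine', 1), (321510, 'Eugena Esser', 1);
    INSERT INTO accounts VALUES (1001, 'Walmart', 321500), (1011, 'Exxon Mobil', 321510);
""")

# Follow the assumed links: accounts -> sales_reps -> region
rows = conn.execute("""
    SELECT a.name, s.name, r.name
    FROM accounts AS a
    JOIN sales_reps AS s ON a.sales_rep_id = s.id
    JOIN region AS r ON s.region_id = r.id
    ORDER BY a.id
""").fetchall()

for account, rep, region in rows:
    print(account, "|", rep, "|", region)

conn.close()
```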

The dataset was obtained in text format and pasted into SQL Server Management Studio (SSMS) to create the database.

The tables in the created database were also saved as CSV files, giving two options for creating a SQLite database file.

One option is to build the SQLite db file from the CSV files. The other is to connect directly to the SQL Server instance and copy each table into the db file. The second option seems to be more efficient in terms of time and space.

The database file was created from the SQL Server database and then read into a Jupyter Notebook.

![ERD of Parch and Posey](ERD_for_parch_and_porsey.png)

<br>
<br>

### Importing Libraries

In [1]:
# importing needed libraries

import pyodbc
import pandas as pd
import glob
import sqlite3
import warnings
warnings.filterwarnings("ignore")

<br>

# Creating the SQLite db file by connecting to SQL Server

## Connecting to the database

In [2]:
def establish_connection(server, database):
    '''
        This function is used to establish a connection with a SQL Server instance.
        
        Input:
        
                server: A string representing the instance name of the server
            
                database: A string representing the name of the attached database to establish connection with
                
        Output:
                connection: The established connection, which can be used to access the database
                            (None if the connection attempt fails)
    '''
    
    # A needed library
    import pyodbc
    
    
    trusted_connection = 'yes'
    
    connection_string = f'DRIVER=ODBC Driver 17 for SQL Server;SERVER={server};DATABASE={database};Trusted_Connection={trusted_connection}'
    
    # An exception to indicate the status of the connection and catch errors
    try:
        conn = pyodbc.connect(connection_string)
        print("Connected to the database!")
        return conn
    except Exception as e:
        print(f"Error: {str(e)}")
        return
        
    

<br>
<br>
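ODBC connection strings are commonly written with the driver name wrapped in braces, which keeps a name containing spaces unambiguous. The sketch below shows one way to assemble such a string; `build_connection_string` is a hypothetical helper, not part of this notebook or of `pyodbc`, and the default driver name is simply the one used above.

```python
def build_connection_string(server, database, driver="ODBC Driver 17 for SQL Server"):
    """Hypothetical helper: assemble an ODBC connection string for Windows authentication."""
    # Doubled braces in an f-string produce literal braces around the driver name
    return (
        f"DRIVER={{{driver}}};"
        f"SERVER={server};"
        f"DATABASE={database};"
        "Trusted_Connection=yes"
    )


# A raw string keeps the backslash in the instance name literal
connection_string = build_connection_string(r"localhost\SQLEXPRESS", "perch_and_posey")
print(connection_string)
```

The resulting string could be passed to `pyodbc.connect` in the same way as the one built inside `establish_connection`.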

## Saving the content of the database as a SQLite db file

In [3]:
def Save_As_Db_File(conn, database):
    '''
        This function creates a SQLite .db file in the same directory as this notebook
        using a given connection to a SQL Server database
        
        Input:
                conn: This is the connection that has been established with a database server
                
                database: A string representing the name of the SQLite .db file to be created
            
    '''
    
    
    #Important libraries
    import pandas as pd
    import sqlite3
    
    '''
        Creating an empty SQLite database file using database, the parameter passed into the function
    '''
    
    database_file_conn = sqlite3.connect(database+'.db')
    
    
    '''
        The names of the tables in the database will be obtained and saved as a list
        This will be used when reading the content of the remote database and when
        populating the created SQLite .db file.
    '''     
    # SQL query to retrieve tables names from the database using sys.tables
    table_names_from_db = "SELECT name FROM sys.tables"
    
    # Fetching the table names into a Pandas Dataframe
    table_names_df = pd.read_sql(table_names_from_db, conn)
    
    # Table names as a list
    table_names = list(table_names_df['name'].values)
    
    
    '''
        The contents of the database are read using a loop, and then saved in the created
        SQLite .db file
    '''
    # Copy each table from SQL Server into the SQLite file, one table at a time
    for table_name in table_names:
        
        # SQL query to fetch individual tables from the database
        query_table = f'SELECT * FROM {table_name}'
        
        # Execute `query_table` and fetch the result into a Pandas DataFrame
        df = pd.read_sql(query_table, conn)
        
        # The Pandas Dataframe with its content is passed into the created SQLite .db file as a table
        df.to_sql(table_name, database_file_conn, if_exists="replace", index=False)
        
        
    try:
        # create a cursor
        cursor = database_file_conn.cursor()
        
        # Execute the query
        cursor.execute("SELECT name FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ORDER BY 1")

        # Fetch the result into a variable
        result = cursor.fetchall()
        
        #Tables in created database
        tables = pd.DataFrame([item[0] for item in result], columns=['Tables'])
        print(f'The database {database} was created successfully and the tables in it are:')
        print()
        display(tables)
        return
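    except Exception as e:
        print(f"Error: {str(e)}")
        return
    finally:
        # Release the SQLite file handle whether or not the check succeeded
        database_file_conn.close()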
        
    
    

### Connection parameters for the Database Server

In [4]:
# Defining connection parameters

server = r'localhost\SQLEXPRESS'  # SQL Server instance name (raw string keeps the backslash literal)
database = 'perch_and_posey'  # name of attached database

### Creating the locally available SQLite database file

In [5]:
conn = establish_connection(server, database)

Connected to the database!


In [6]:
Save_As_Db_File(conn, 'Parch_and_Posey')

The database Parch_and_Posey was created successfully and the tables in it are:



,Tables
0,accounts
1,orders
2,region
3,sales_reps
4,web_events
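Listing the table names confirms that the tables exist, but not that their rows were copied. A quick follow-up check could count the rows in each table. The sketch below is a minimal illustration using only the standard library; `table_row_counts` is a hypothetical helper, and the demonstration runs on a throwaway in-memory database rather than the real file.

```python
import sqlite3


def table_row_counts(sqlite_conn):
    """Hypothetical helper: return {table_name: row_count} for every user table."""
    names = [
        row[0]
        for row in sqlite_conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY 1"
        )
    ]
    # Names come from sqlite_master itself, so quoting them as identifiers is sufficient here
    return {
        name: sqlite_conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
        for name in names
    }


# Demonstration on an in-memory database (not the real data)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE region (id INTEGER, name TEXT)")
demo.executemany("INSERT INTO region VALUES (?, ?)", [(1, "Northeast"), (2, "Midwest")])
counts = table_row_counts(demo)
print(counts)
demo.close()
```

Against the file created above, the same helper could be called with `sqlite3.connect('Parch_and_Posey.db')`, and the counts compared with those in the SQL Server database.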


<br>
<br>

# Creating the SQLite .db file from CSV files

In [None]:
# Connect to or create the SQLite database

conn = sqlite3.connect('parch_and_posey.db')

In [None]:
# Using glob.glob to get a list of matching file names
from pathlib import Path

file_names = glob.glob('data/*.csv')

for file in file_names:
    table_name = Path(file).stem  # file name without folder or extension, e.g. 'accounts'
    df = pd.read_csv(file)  # create a dataframe from each csv file
    df.to_sql(table_name, conn, if_exists="replace", index=False)  # Insert data into SQLite table
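Each table name is derived from a CSV file path, and `glob` returns paths using the separator of the platform it runs on. The sketch below compares two ways of extracting the name on Windows-style and POSIX-style paths; the example paths are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

windows_style = "data\\accounts.csv"  # what glob would return on Windows
posix_style = "data/accounts.csv"     # what glob would return on Linux or macOS

# Splitting on a backslash only strips the folder from Windows-style paths
split_windows = windows_style.split(".")[0].split("\\")[-1]
split_posix = posix_style.split(".")[0].split("\\")[-1]

# .stem drops both the folder and the extension, whatever the separator
stem_windows = PureWindowsPath(windows_style).stem
stem_posix = PurePosixPath(posix_style).stem

print(split_windows, split_posix)  # accounts data/accounts
print(stem_windows, stem_posix)    # accounts accounts
```

In a notebook, `pathlib.Path(file).stem` picks the right flavour for the current platform automatically.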

<br>
<br>

# Viewing the contents of each table in the database

In [7]:
%load_ext sql

In [8]:
%sql sqlite:///parch_and_posey.db

### Tables in the database

In [9]:
%%sql

SELECT name FROM sqlite_master WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' ORDER BY 1

 * sqlite:///parch_and_posey.db
Done.


name
accounts
orders
region
sales_reps
web_events


### Accounts Table

In [10]:
%%sql

SELECT *
FROM accounts
LIMIT 5;

 * sqlite:///parch_and_posey.db
Done.


id,name,website,lat,long,primary_poc,sales_rep_id
1001,Walmart,www.walmart.com,40.23849561,-75.10329704,Tamara Tuma,321500
1011,Exxon Mobil,www.exxonmobil.com,41.1691563,-73.84937379,Sung Shields,321510
1021,Apple,www.apple.com,42.29049481,-76.08400942,Jodee Lupo,321520
1031,Berkshire Hathaway,www.berkshirehathaway.com,40.94902131,-75.76389759,Serafina Banda,321530
1041,McKesson,www.mckesson.com,42.21709326,-75.28499823,Angeles Crusoe,321540


### Orders Table

In [11]:
%%sql

SELECT *
FROM orders
LIMIT 5;

 * sqlite:///parch_and_posey.db
Done.


id,account_id,occurred_at,standard_qty,gloss_qty,poster_qty,total,standard_amt_usd,gloss_amt_usd,poster_amt_usd,total_amt_usd
1,1001,2015-10-06 17:31:14,123,22,24,169,613.77,164.78,194.88,973.43
2,1001,2015-11-05 03:34:33,190,41,57,288,948.1,307.09,462.84,1718.03
3,1001,2015-12-04 04:21:55,85,47,0,132,424.15,352.03,0.0,776.18
4,1001,2016-01-02 01:18:24,144,32,0,176,718.56,239.68,0.0,958.24
5,1001,2016-02-01 19:27:27,108,29,28,165,538.92,217.21,227.36,983.49


### Region Table

In [12]:
%%sql

SELECT *
FROM region
LIMIT 5;

 * sqlite:///parch_and_posey.db
Done.


id,name
1,Northeast
2,Midwest
3,Southeast
4,West


### Sales Reps Table

In [13]:
%%sql

SELECT *
FROM sales_reps
LIMIT 5;

 * sqlite:///parch_and_posey.db
Done.


id,name,region_id
321500,Samuel Racine,1
321510,Eugena Esser,1
321520,Michel Averette,1
321530,Renetta Carew,1
321540,Cara Clarke,1


### Web Events Table

In [14]:
%%sql

SELECT *
FROM web_events
LIMIT 5;

 * sqlite:///parch_and_posey.db
Done.


id,account_id,occurred_at,channel
1,1001,2015-10-06 17:13:58,direct
2,1001,2015-11-05 03:08:26,direct
3,1001,2015-12-04 03:57:24,direct
4,1001,2016-01-02 00:55:03,direct
5,1001,2016-02-01 19:02:33,direct
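With the tables in place, business questions can be phrased as joins and aggregates. As a hedged illustration only, the sketch below totals orders per account on a throwaway in-memory database holding three of the order rows previewed above (a subset of columns); the question itself is an assumed example, not one taken from the original analysis.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        account_id INTEGER,
        total INTEGER,
        total_amt_usd REAL
    );
    INSERT INTO accounts VALUES (1001, 'Walmart');
    INSERT INTO orders VALUES
        (1, 1001, 169, 973.43),
        (2, 1001, 288, 1718.03),
        (3, 1001, 132, 776.18);
""")

# Number of orders, total quantity, and total revenue per account
summary = conn.execute("""
    SELECT a.name,
           COUNT(o.id) AS order_count,
           SUM(o.total) AS total_qty,
           ROUND(SUM(o.total_amt_usd), 2) AS revenue_usd
    FROM orders AS o
    JOIN accounts AS a ON o.account_id = a.id
    GROUP BY a.name
""").fetchall()

for name, order_count, total_qty, revenue_usd in summary:
    print(name, order_count, total_qty, revenue_usd)

conn.close()
```

The same query text could be run against the real file in a `%%sql` cell, as in the cells above.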
