
#### The core ways to handle missing values should be familiar to all data scientists, a phrase which here means ‘if you aren’t familiar, you should memorise the following list’:


1. **Listwise deletion:** if a variable has so many missing cases that it appears useless, delete it.

2. **Casewise deletion:** if there are too many factors missing for a particular observation, delete it.

3. **Dummy Variable Adjustment:** if the variable is missing for a particular case, use an assumed value in its stead. Depending on the problem, the intuitive choice may be the median or a value that represents a ‘neutral’ setting.

4. **Imputation:** use an algorithm to fill in the value, from a simple random number at the most basic end of the spectrum, to a value imputed by its own model at the more complex end.



## **Imports**

### Import SQLite database

> **Pros:** SQLite is easier to integrate with Colab

> **Cons:** SQLite has some limitations compared with MySQL or MS SQL Server







In [1]:
from sqlite3 import connect

## DB Connection

In [3]:
conn = connect('test.db')
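
Some queries later in this notebook use window functions (`LAST_VALUE`, `RANK`, `ROW_NUMBER`), which SQLite supports from version 3.25.0. The following is a minimal sketch for checking which SQLite library the current Python build ships with; the variable name `supports_window_functions` is only illustrative.

```python
import sqlite3

# Version of the SQLite library bundled with this Python build
# (not the version of the sqlite3 Python module itself).
print(sqlite3.sqlite_version)

# Window functions were added in SQLite 3.25.0, so compare numerically.
supports_window_functions = sqlite3.sqlite_version_info >= (3, 25, 0)
print("Window functions available:", supports_window_functions)
```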

# **Basic NULL values handling**

### SQLite Creating a table

### **Background**

> **Scenario:** There is a database table `employees` that stores employee data. Its column names and data types are listed below.

**Special words**
* `employees` table name
* `id` employee id (intended primary key)
* `f_name` first name of employee
* `l_name` last name of employee
* `title` title of employee
* `age` age of employee
* `wage` wage of employee
* `hire_date` date the employee was hired
* `NIC` national id number of employee (intended to be unique)

---
**Data types**
* `id` Integer
* `f_name` Varchar(50)
* `l_name` Varchar(50)
* `title` Varchar(10)
* `age` Integer
* `wage` Integer
* `hire_date` Date
* `NIC` Varchar(12)

---
**Output:** All rows of the table



In [4]:
c = conn.cursor()

# dropping an existing table
c.execute("DROP TABLE IF EXISTS employees")

# create table
c.execute('''
  CREATE TABLE employees(
    id INT,
    f_name VARCHAR(50),
    l_name VARCHAR(50),
    title VARCHAR(10),
    age INT,
    wage INT,
    hire_date DATE,
    NIC VARCHAR(12)
  )
''')

# List of values
employees = [(1, 'kavishka', 'tim', 'Mr', 22, 28, '2022-05-01', '200005303420'), 
             (1, 'Bill', 'Tibb', 'Mr', 61, 28, '2012-05-02', '900239889v'), 
             (3, 'Bill', 'Sadat', None, 18, 12, '2019-11-08', '640239889v'),
             (4, 'Christine', 'Riveles', None, 36, 20, '2018-03-30', '200014303420'),
             (5, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v'),
             (6, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v'),
             (None, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '124097654988')]

c.executemany("INSERT INTO employees VALUES (?,?,?,?,?,?,?,?)", employees)

c.execute('SELECT * FROM employees')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1, 'kavishka', 'tim', 'Mr', 22, 28, '2022-05-01', '200005303420')
(1, 'Bill', 'Tibb', 'Mr', 61, 28, '2012-05-02', '900239889v')
(3, 'Bill', 'Sadat', None, 18, 12, '2019-11-08', '640239889v')
(4, 'Christine', 'Riveles', None, 36, 20, '2018-03-30', '200014303420')
(5, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v')
(6, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v')
(None, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '124097654988')




---



# **Completeness (DQ Dimension)**

Data completeness refers to the comprehensiveness or wholeness of the data. 
> There should be no gaps or missing information for data to be truly complete

> Sometimes incomplete data is unusable, but often it's still used even with missing information, **which can lead to costly mistakes and false conclusions**.


---

> A `null` value in a relational database is used when the value in a column is unknown or missing.

> A `null` is neither an empty string (for character or datetime data types) nor a zero value (for numeric data types)
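
The distinction between `null`, an empty string and zero can be checked directly. This is a small sketch on a throwaway in-memory database (the name `conn_demo` is illustrative and separate from the notebook's `conn`):

```python
import sqlite3

conn_demo = sqlite3.connect(":memory:")  # throwaway in-memory database

row = conn_demo.execute("""
    SELECT
        '' IS NULL,      -- empty string is a value, not NULL
        0 IS NULL,       -- zero is a value, not NULL
        NULL IS NULL,    -- IS NULL is the way to test for NULL
        NULL = NULL      -- comparing with = yields NULL (unknown), not true
""").fetchone()

print(row)  # (0, 0, 1, None)
conn_demo.close()
```

This is why the queries in this notebook use `IS NULL` and `IS NOT NULL` rather than `= NULL`.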

Identifying `null` values is a crucial step before handling them.
This notebook handles `null` values at two levels:

1.   Basic
2.   Advanced

Each level covers:


1.   `null` value identification
2.   `null` value handling








## (BASIC) Identify `NULL` values



---



> **Scenario:** Count the `null` values in each column

 **Importance:** Shows which columns contain `null` values, so you can decide which columns need handling

 **Output:** The `null` count for each column



In [4]:
c = conn.cursor()

c.execute('''
  SELECT 
    SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END) AS id ,
    SUM(CASE WHEN f_name IS NULL THEN 1 ELSE 0 END) AS f_name,
    SUM(CASE WHEN l_name IS NULL THEN 1 ELSE 0 END) AS l_name,
    SUM(CASE WHEN title IS NULL THEN 1 ELSE 0 END) AS title,
    SUM(CASE WHEN age IS NULL THEN 1 ELSE 0 END) AS age,
    SUM(CASE WHEN wage IS NULL THEN 1 ELSE 0 END) AS wage,
    SUM(CASE WHEN hire_date IS NULL THEN 1 ELSE 0 END) AS hire_date
  FROM employees
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1, 0, 0, 2, 0, 0, 0)
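
Writing one `SUM(CASE ...)` per column by hand gets tedious for wide tables. The sketch below builds the same query from `PRAGMA table_info`; the helper name `null_counts` and the small stand-in table are assumptions for illustration. Table and column names cannot be passed as query parameters, so this should only be used with trusted names.

```python
import sqlite3

def null_counts(conn, table):
    """Return {column_name: number_of_NULLs} for every column of `table`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    select_list = ", ".join(
        f'SUM(CASE WHEN "{col}" IS NULL THEN 1 ELSE 0 END)' for col in columns
    )
    counts = conn.execute(f'SELECT {select_list} FROM "{table}"').fetchone()
    return dict(zip(columns, counts))

# Small stand-in table so the sketch runs on its own.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees(id INT, f_name TEXT, title TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?,?,?)",
    [(1, "Bill", "Mr"), (3, "Bill", None), (None, "David", "Honorable")],
)
print(null_counts(demo, "employees"))  # {'id': 1, 'f_name': 0, 'title': 1}
```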


### **Selecting rows having `NULL` value for a particular column**

In [5]:
c = conn.cursor()

c.execute('''
  SELECT *
  FROM employees
  WHERE title IS NULL
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(3, 'Bill', 'Sadat', None, 18, 12, '2019-11-08', '640239889v')
(4, 'Christine', 'Riveles', None, 36, 20, '2018-03-30', '200014303420')


### **Selecting rows having `NULL` value for any of the columns**

In [5]:
c = conn.cursor()

c.execute('''
  SELECT *
  FROM employees
  WHERE (id || f_name || l_name || title || age || wage || hire_date) IS NULL
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(3, 'Bill', 'Sadat', None, 18, 12, '2019-11-08', '640239889v')
(4, 'Christine', 'Riveles', None, 36, 20, '2018-03-30', '200014303420')
(None, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '124097654988')
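
The query above works because `||` (string concatenation) returns `NULL` as soon as any operand is `NULL`. A minimal sketch on a hypothetical three-column table `t`, comparing the concatenation trick with an explicit `OR` chain:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t(id INT, name TEXT, title TEXT)")
demo.executemany(
    "INSERT INTO t VALUES (?,?,?)",
    [(1, "Ann", "Ms"), (2, "Bob", None), (None, "Cy", "Mr")],
)

# Concatenation trick: one NULL operand makes the whole expression NULL.
concat_rows = demo.execute(
    "SELECT * FROM t WHERE (id || name || title) IS NULL"
).fetchall()

# Equivalent explicit form: test each column separately.
or_rows = demo.execute(
    "SELECT * FROM t WHERE id IS NULL OR name IS NULL OR title IS NULL"
).fetchall()

print(concat_rows)             # [(2, 'Bob', None), (None, 'Cy', 'Mr')]
print(concat_rows == or_rows)  # True
```

The `OR` form is longer, but it states the intent directly and does not convert every column to text first.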




---

## Handling `NULL` values

---



Dropping a column can be the right choice in cases such as these:

> A column has many missing values (NULLs), and the column itself is not of interest for the analysis.

> A column has no missing values, but it is excluded from the analysis the dataset is being prepared for.

> A column has very few missing values, but dropping the field (column) is still better than replacing those values.



### **Deleting rows where column value is `NULL`**

In [7]:
c = conn.cursor()

c.execute('''
  DELETE FROM employees WHERE id IS NULL
''')

c.close()

### **Dropping a column when all rows have `NULL` for that column**

There are several ways to check whether every value in a column is `NULL`:

* Selecting `id` by grouping
* Selecting distinct `id` values
* Counting the non-null `id` values

#### **Grouping: if the only row returned is `NULL`, drop the column**

In [8]:
# If all null
c = conn.cursor()

c.execute('''SELECT id FROM employees GROUP BY id''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1,)
(3,)
(4,)
(5,)
(6,)


#### **Distinct values: if the only row returned is `NULL`, drop the column**

In [9]:
# If all null
c = conn.cursor()

c.execute('''SELECT DISTINCT id FROM employees''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1,)
(3,)
(4,)
(5,)
(6,)


#### **Non-null count: if the result is zero, drop the column**

In [10]:
# If zero (0)
c = conn.cursor()

c.execute('''SELECT count(id) FROM employees WHERE id IS NOT NULL''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(6,)


> **Note: Run the following code block only if one of the checks above shows that the column contains nothing but `NULL`. It drops columns by rebuilding `employees` without them (here, without `id` and `NIC`).**

In [11]:
#  if you only get back just NULL (or 0 for that last one)
c = conn.cursor()

c.execute('''CREATE TEMPORARY TABLE t1_backup(f_name, l_name, title, age, wage, hire_date)''')
c.execute('''INSERT INTO t1_backup SELECT f_name, l_name, title, age, wage, hire_date FROM employees''')
c.execute('''DROP TABLE employees''')
c.execute('''CREATE TABLE employees(f_name, l_name, title, age, wage, hire_date)''')
c.execute('''INSERT INTO employees SELECT f_name, l_name, title, age, wage, hire_date FROM t1_backup''')
c.execute('''DROP TABLE t1_backup''')

c.execute('''
  SELECT *
  FROM employees
''')
results = c.fetchall()

for result in results:
  print(result)

c.close()

('kavishka', 'tim', 'Mr', 22, 28, '2022-05-01')
('Bill', 'Tibb', 'Mr', 61, 28, '2012-05-02')
('Bill', 'Sadat', None, 18, 12, '2019-11-08')
('Christine', 'Riveles', None, 36, 20, '2018-03-30')
('David', 'Guerin', 'Honorable', 36, 20, '2018-03-30')
('David', 'Guerin', 'Honorable', 36, 20, '2018-03-30')
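
The backup-and-rebuild steps above work on any SQLite version. From SQLite 3.35.0 onward, `ALTER TABLE ... DROP COLUMN` does the same in one statement for plain columns (not for primary key, unique or indexed ones). The sketch below is an illustration on a stand-in table where `id` really is all `NULL`; it falls back to a rebuild on older versions.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees(id INT, f_name TEXT, title TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?,?,?)",
    [(None, "Bill", "Mr"), (None, "Christine", None)],
)

# COUNT(col) ignores NULLs, so 0 means the column is entirely NULL.
non_null_ids = demo.execute("SELECT COUNT(id) FROM employees").fetchone()[0]

if non_null_ids == 0:
    if sqlite3.sqlite_version_info >= (3, 35, 0):
        # ALTER TABLE ... DROP COLUMN is available from SQLite 3.35.0.
        demo.execute("ALTER TABLE employees DROP COLUMN id")
    else:
        # Older SQLite: rebuild the table without the column.
        demo.execute("CREATE TABLE employees_new AS SELECT f_name, title FROM employees")
        demo.execute("DROP TABLE employees")
        demo.execute("ALTER TABLE employees_new RENAME TO employees")

remaining = [row[1] for row in demo.execute("PRAGMA table_info(employees)")]
print(remaining)  # ['f_name', 'title']
```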


### **Replace `NULL` values with a sentinel (standard value)**

There are different ways to replace `null` with a standard (sentinel) value, depending on the data type. For example:
> - **Numeric data:** a value outside the valid range (e.g. wind speed: 9999.99, iron melting temperature: -1)
> - **Dates:** an unreasonably large or small date
> - **Default value:** a label such as `UNKNOWN`

In this case, the default value `Honorable` is used for missing `title` values.

In [12]:
c = conn.cursor()

c.execute('''
  SELECT 
    f_name,
    l_name,
    CASE WHEN title IS NULL THEN 'Honorable' ELSE title END AS NewTitle
  FROM employees
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

('kavishka', 'tim', 'Mr')
('Bill', 'Tibb', 'Mr')
('Bill', 'Sadat', 'Honorable')
('Christine', 'Riveles', 'Honorable')
('David', 'Guerin', 'Honorable')
('David', 'Guerin', 'Honorable')
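
The `CASE WHEN ... IS NULL` expression above only changes what the query returns; the table itself still contains `NULL`. As a hedged sketch on a stand-in table, `COALESCE` gives a shorter read-time form, and an `UPDATE` stores the sentinel permanently:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees(f_name TEXT, title TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?,?)",
    [("kavishka", "Mr"), ("Bill", None), ("Christine", None)],
)

# Read-time replacement: COALESCE returns its first non-NULL argument.
filled = demo.execute(
    "SELECT f_name, COALESCE(title, 'Honorable') FROM employees"
).fetchall()
print(filled)

# Write-time replacement: store the sentinel in the table itself.
cur = demo.execute("UPDATE employees SET title = 'Honorable' WHERE title IS NULL")
demo.commit()
print(cur.rowcount)  # 2 rows updated
```

Whether to overwrite the stored data is a design choice: once the sentinel is written, the rows that were originally missing can no longer be told apart from rows that genuinely held that value.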


## (ADVANCED) `NULL` values handling

---

## Replace with a statistical measure such as the mean

Another form of data wrangling is to impute (replace) missing values with the mean to improve data quality. This method calculates the mean of the known values in a series and uses it in place of each missing value. In the example below, a missing `price` is replaced with the mean price of the same `city`.

In [13]:
c = conn.cursor()

# dropping an existing table
c.execute("DROP TABLE IF EXISTS house_price")

# create table
c.execute('''
  CREATE TABLE house_price(
    id INT,
    country VARCHAR(50),
    city VARCHAR(50),
    price DOUBLE,
    a DOUBLE,
    b DOUBLE,
    c DOUBLE
  )
''')

house_price = [(1, 'USA', 'LA', 1000000.00, 1, 3, 5), 
             (2, 'UK', 'London', 400000.00, None, 5, 7), 
             (3, 'USA', 'LA', 850000.00, 9, None, None),
             (4, 'USA', 'LA', None, 12, 4, 9),
             (5, 'USA', 'LA', 900000.00, 2, 6, 1),
             (6, 'UK', 'London', 550000.00, None, 4, 8),
             (7, 'USA', 'LA', 1000000.00, 8, 8, 8), 
             (8, 'UK', 'London', 400000.00, 1, 4, 9), 
             (9, 'USA', 'LA', 850000.00, 4, 4, 5),
             (10, 'USA', 'LA', 1050000.00, None, None, None),
             (11, 'USA', 'LA', 900000.00, 3, 8.5, 9),
             (12, 'UK', 'London', None, 10, 7, None)]

c.executemany("INSERT INTO house_price VALUES (?,?,?,?,?,?,?)", house_price)

c.execute('SELECT * FROM house_price')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1, 'USA', 'LA', 1000000.0, 1.0, 3.0, 5.0)
(2, 'UK', 'London', 400000.0, None, 5.0, 7.0)
(3, 'USA', 'LA', 850000.0, 9.0, None, None)
(4, 'USA', 'LA', None, 12.0, 4.0, 9.0)
(5, 'USA', 'LA', 900000.0, 2.0, 6.0, 1.0)
(6, 'UK', 'London', 550000.0, None, 4.0, 8.0)
(7, 'USA', 'LA', 1000000.0, 8.0, 8.0, 8.0)
(8, 'UK', 'London', 400000.0, 1.0, 4.0, 9.0)
(9, 'USA', 'LA', 850000.0, 4.0, 4.0, 5.0)
(10, 'USA', 'LA', 1050000.0, None, None, None)
(11, 'USA', 'LA', 900000.0, 3.0, 8.5, 9.0)
(12, 'UK', 'London', None, 10.0, 7.0, None)


In [14]:
c = conn.cursor()

c.execute('''
  SELECT 
    h.country, 
    h.city,
    COALESCE(h.price, n.newprice) AS price_new
  FROM house_price h, (SELECT s.city, AVG(s.price) AS newprice
        FROM house_price s
        GROUP BY s.city) n
  WHERE h.city = n.city
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

('USA', 'LA', 1000000.0)
('UK', 'London', 400000.0)
('USA', 'LA', 850000.0)
('USA', 'LA', 935714.2857142857)
('USA', 'LA', 900000.0)
('UK', 'London', 550000.0)
('USA', 'LA', 1000000.0)
('UK', 'London', 400000.0)
('USA', 'LA', 850000.0)
('USA', 'LA', 1050000.0)
('USA', 'LA', 900000.0)
('UK', 'London', 450000.0)


> **Note:** Instead of `AVG()`, use functions such as `MAX()` or `MIN()` as necessary.
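
The same per-city mean imputation can also be written without a join, using `AVG()` as a window function (this needs SQLite 3.25 or newer). The sketch below uses a reduced stand-in version of `house_price` with made-up rows, so the numbers differ from the output above.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE house_price(id INT, city TEXT, price DOUBLE)")
demo.executemany(
    "INSERT INTO house_price VALUES (?,?,?)",
    [(1, "LA", 1000000.0), (2, "London", 400000.0), (3, "LA", None),
     (4, "London", 500000.0), (5, "LA", 900000.0), (6, "London", None)],
)

rows = demo.execute("""
    SELECT
        id,
        city,
        -- AVG() skips NULLs, so the window average uses only known prices.
        COALESCE(price, AVG(price) OVER (PARTITION BY city)) AS price_new
    FROM house_price
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Rows 3 and 6 receive the mean of the known prices in their own city (950000.0 for LA and 450000.0 for London in this stand-in data).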

In [61]:
c = conn.cursor()

# dropping an existing table
c.execute("DROP TABLE IF EXISTS hourly_machine_data")

# create table
c.execute('''
  CREATE TABLE hourly_machine_data(
    Observation_datetime DATE,
    Machine_ID INT,
    Casing_Temperature_F INT,
    Bearing_Temperature_F INT,
    Flywheel_rpm INT,
    alarm_status VARCHAR(15),
    Flywheel_rpm_2 INT,
    Filter_airflow DOUBLE,
    TARGET_failure_in_next_90 INT
  )
''')

hourly_machine_data = [('2022-01-01',1,84,131,2374,'normal',2171,1.1,0),
                ('2022-01-02',1,None,132,1587,'normal',1877,1.5,0),
                ('2022-01-03',1,85,133,1206,'normal',1296,1.8,0),
                ('2022-01-04',1,None,None,2181,'normal',1879,2,0),
                ('2022-01-05',1,None,134,1271,'normal',2170,1.7,0),
                ('2022-01-06',1,None,None,1508,'normal',1556,1.3,0),
                ('2022-01-07',1,86,135,1298,'normal',1749,1.9,0),
                ('2022-01-08',1,87,None,1327,'normal',2058,None,0),
                ('2022-01-09',1,88,136,1978,'normal',1501,1.3,0),
                ('2022-01-10',1,None,None,2131,'normal',1952,1.4,0),
                ('2022-01-11',1,None,137,1611,'normal',2049,1,0),
                ('2022-01-12',1,89,None,1388,'normal',2400,2,0),
                ('2022-01-13',1,90,138,1596,'normal',1453,1.2,0),
                ('2022-01-14',1,91,None,1911,'warning',1680,None,0),
                ('2022-01-15',1,None,139,2368,'warning',1496,1,0),
                ('2022-01-16',1,None,None,2055,'warning',1574,1.2,0),
                ('2022-01-17',1,92,140,1961,'warning',2252,1.9,0),
                ('2022-01-18',1,93,None,2314,'warning',1860,2,0),
                ('2022-01-19',1,None,141,2046,'warning',2378,1.5,0),
                ('2022-01-20',1,92,None,1880,'warning',1364,1.2,0),
                ('2022-01-21',1,91,142,1289,'warning',2174,1.5,0),
                ('2022-01-22',1,90,141,1648,'normal',1928,1.9,0),
                ('2022-01-23',1,89,140,1225,'normal',2035,1.8,0),
                ('2022-01-24',1,88,139,1403,'normal',2139,1.1,0),
                ('2022-01-25',1,87,138,1381,'normal',1230,None,0),
                ('2022-01-26',1,86,137,1720,'normal',1203,1.6,0),
                ('2022-01-27',1,85,136,2392,'normal',2148,1.3,0),
                ('2022-01-28',1,84,135,1956,'normal',2073,1.2,0)]

c.executemany("INSERT INTO hourly_machine_data VALUES (?,?,?,?,?,?,?,?,?)", hourly_machine_data)

c.execute('SELECT * FROM hourly_machine_data;')

results = c.fetchall()

for result in results:
  print(result)

c.close()

('2022-01-01', 1, 84, 131, 2374, 'normal', 2171, 1.1, 0)
('2022-01-02', 1, None, 132, 1587, 'normal', 1877, 1.5, 0)
('2022-01-03', 1, 85, 133, 1206, 'normal', 1296, 1.8, 0)
('2022-01-04', 1, None, None, 2181, 'normal', 1879, 2.0, 0)
('2022-01-05', 1, None, 134, 1271, 'normal', 2170, 1.7, 0)
('2022-01-06', 1, None, None, 1508, 'normal', 1556, 1.3, 0)
('2022-01-07', 1, 86, 135, 1298, 'normal', 1749, 1.9, 0)
('2022-01-08', 1, 87, None, 1327, 'normal', 2058, None, 0)
('2022-01-09', 1, 88, 136, 1978, 'normal', 1501, 1.3, 0)
('2022-01-10', 1, None, None, 2131, 'normal', 1952, 1.4, 0)
('2022-01-11', 1, None, 137, 1611, 'normal', 2049, 1.0, 0)
('2022-01-12', 1, 89, None, 1388, 'normal', 2400, 2.0, 0)
('2022-01-13', 1, 90, 138, 1596, 'normal', 1453, 1.2, 0)
('2022-01-22', 1, 90, 141, 1648, 'normal', 1928, 1.9, 0)
('2022-01-23', 1, 89, 140, 1225, 'normal', 2035, 1.8, 0)
('2022-01-24', 1, 88, 139, 1403, 'normal', 2139, 1.1, 0)
('2022-01-25', 1, 87, 138, 1381, 'normal', 1230, None, 0)
('2022-01-26

## Replace By Last Value

### For a single column

References:

* `LAST_VALUE()` window function: https://www.sqlitetutorial.net/sqlite-window-functions/sqlite-last_value/
* Missing data in time series: https://towardsdatascience.com/4-techniques-to-handle-missing-values-in-time-series-data-c3568589b5a8

In [None]:
c = conn.cursor()

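# SQLite's LAST_VALUE() cannot skip NULLs, so with a frame ending at the current
# row it just returns the current (possibly NULL) value. Instead, COUNT() the
# non-NULL readings so far to label each run, then spread that run's value.
c.execute('''
  SELECT
      Machine_ID,
      Observation_datetime,
      MAX(Casing_Temperature_F) OVER (
        PARTITION BY Machine_ID, grp
      ) AS Latest_Casing_Temperature_F
  FROM (
    SELECT *,
      COUNT(Casing_Temperature_F) OVER (
        PARTITION BY Machine_ID
        ORDER BY Observation_datetime
      ) AS grp
    FROM hourly_machine_data
  )
  ORDER BY Machine_ID, Observation_datetime
''')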

results = c.fetchall()

for result in results:
  print(result)

c.close()

### For multiple columns

In [None]:
c = conn.cursor()

# LAST_VALUE(col IS NOT NULL) only returns a 0/1 flag, not the reading itself.
# Label each run of rows per column with COUNT() (which skips NULLs), then
# spread the single non-NULL value of every run with MAX().
c.execute('''
  SELECT
    Machine_ID,
    Observation_datetime,
    MAX(Casing_Temperature_F) OVER (
      PARTITION BY Machine_ID, casing_grp
    ) AS Latest_Casing_Temperature_F,
    MAX(Bearing_Temperature_F) OVER (
      PARTITION BY Machine_ID, bearing_grp
    ) AS Latest_Bearing_Temperature_F,
    MAX(Flywheel_rpm) OVER (
      PARTITION BY Machine_ID, rpm_grp
    ) AS Flywheel_rpm
  FROM (
    SELECT *,
      COUNT(Casing_Temperature_F) OVER w AS casing_grp,
      COUNT(Bearing_Temperature_F) OVER w AS bearing_grp,
      COUNT(Flywheel_rpm) OVER w AS rpm_grp
    FROM hourly_machine_data
    WINDOW w AS (PARTITION BY Machine_ID ORDER BY Observation_datetime)
  )
  ORDER BY Machine_ID, Observation_datetime;
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()
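
SQLite's `LAST_VALUE()` has no `IGNORE NULLS` option, so with a frame that ends at the current row it returns the current row's value, gaps included. One alternative way to carry the last known reading forward is a correlated subquery. The sketch below uses a small hypothetical `readings` table rather than `hourly_machine_data`, so that the result is easy to check by eye.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE readings(day TEXT, machine_id INT, temp INT)")
demo.executemany(
    "INSERT INTO readings VALUES (?,?,?)",
    [("2022-01-01", 1, 84), ("2022-01-02", 1, None), ("2022-01-03", 1, 85),
     ("2022-01-04", 1, None), ("2022-01-05", 1, None)],
)

rows = demo.execute("""
    SELECT
        r.day,
        (SELECT p.temp
         FROM readings p
         WHERE p.machine_id = r.machine_id
           AND p.day <= r.day          -- this row or an earlier one
           AND p.temp IS NOT NULL      -- skip the gaps
         ORDER BY p.day DESC           -- most recent first
         LIMIT 1) AS temp_filled
    FROM readings r
    ORDER BY r.day
""").fetchall()

print(rows)
```

Each gap takes the most recent earlier reading: 84 for 2 January and 85 for 4 and 5 January. A row with no earlier reading would stay `NULL`.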

## **Uniqueness violations at record level**


---



## Using HAVING and GROUP BY

### **Identify uniqueness violations for a key**

In [38]:
c = conn.cursor()

c.execute('''
  SELECT *, COUNT(*) AS duplicates
  FROM employees
  GROUP BY id
  HAVING COUNT(*) > 1
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1, 'Bill', 'Tibb', 'Mr', 61, 28, '2012-05-02', '900239889v', 2)


### **Identify uniqueness violations (all columns duplicated except the primary key)**

In [67]:
c = conn.cursor()

c.execute('''
  SELECT *, COUNT(*) AS duplicates
  FROM employees
  GROUP BY f_name, l_name, title, age, wage, hire_date, NIC
  HAVING COUNT(*) > 1
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(6, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v', 2)


### **Identify uniqueness violations for a key - Display all**

In [68]:
c = conn.cursor()

c.execute('''
  SELECT a.*
  FROM employees a, (SELECT *, COUNT(*)
        FROM employees
        GROUP BY id
        HAVING COUNT(*) > 1) b
  WHERE a.id = b.id
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(1, 'kavishka', 'tim', 'Mr', 22, 28, '2022-05-01', '200005303420')
(1, 'Bill', 'Tibb', 'Mr', 61, 28, '2012-05-02', '900239889v')


### **Identify uniqueness violations (all columns duplicated except the primary key) - Display all**

In [69]:
c = conn.cursor()

c.execute('''
  SELECT a.*
  FROM employees a, (SELECT *, COUNT(*) AS duplicates
        FROM employees
        GROUP BY f_name, l_name, title, age, wage, hire_date, NIC
        HAVING COUNT(*) > 1) b
  -- IS (unlike =) treats two NULLs as equal, so duplicates with a NULL title also match
  WHERE a.f_name IS b.f_name and
        a.l_name IS b.l_name and
        a.title IS b.title and
        a.age IS b.age and
        a.wage IS b.wage and
        a.hire_date IS b.hire_date and
        a.NIC IS b.NIC
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

(5, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v')
(6, 'David', 'Guerin', 'Honorable', 36, 20, '2018-03-30', '123456789v')


### **Handling**

In [None]:
c = conn.cursor()

c.execute('''
  SELECT *, COUNT(id) AS duplicates
  FROM employees
  GROUP BY f_name, l_name, title, age, wage, hire_date, NIC
  HAVING COUNT(*) > 1
''')

results = c.fetchall()

for result in results:
  c.execute("DELETE FROM employees WHERE id = ?", (result[0],))
  
c.execute('''
  SELECT *
  FROM employees
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()
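
The loop above removes one row per duplicated group, so a group with three or more copies would need several passes. A common alternative in SQLite, sketched here on a stand-in table with fewer columns, keeps the row with the lowest `rowid` in each group and deletes all the others in a single statement. It assumes the table is an ordinary rowid table (not created `WITHOUT ROWID`).

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE employees(id INT, f_name TEXT, NIC TEXT)")
demo.executemany(
    "INSERT INTO employees VALUES (?,?,?)",
    [(5, "David", "123456789v"), (6, "David", "123456789v"),
     (7, "David", "123456789v"), (4, "Christine", "200014303420")],
)

# Keep the first-inserted row of every (f_name, NIC) group and delete the rest.
demo.execute("""
    DELETE FROM employees
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM employees GROUP BY f_name, NIC
    )
""")
demo.commit()

remaining = demo.execute("SELECT * FROM employees ORDER BY id").fetchall()
print(remaining)
# [(4, 'Christine', '200014303420'), (5, 'David', '123456789v')]
```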

## Using RANK() function

### **Identify**

In [None]:
c = conn.cursor()

c.execute('''
  SELECT E.*, T.rank
  FROM employees E
    INNER JOIN
    (
    SELECT *, 
      RANK() OVER(PARTITION BY f_name, 
                                l_name,  
                                title, 
                                age, 
                                wage, 
                                hire_date, 
                                NIC
      ORDER BY id) rank
    FROM employees e
  ) T ON E.id = t.id;
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

### **Handling**

In [None]:
c = conn.cursor()

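# SQLite has no "DELETE alias FROM ... JOIN" form, so collect the rowids of the
# lower-ranked copies in a subquery and delete by rowid instead.
c.execute('''
DELETE FROM employees
WHERE rowid IN (
  SELECT rid
  FROM (
    SELECT rowid AS rid,
      RANK() OVER(PARTITION BY f_name,
                               l_name,
                               title,
                               age,
                               wage,
                               hire_date,
                               NIC
      ORDER BY id) AS rnk
    FROM employees
  )
  WHERE rnk > 1
);
''')

# A DELETE returns no rows, so query the table to see what is left.
c.execute('SELECT * FROM employees')

results = c.fetchall()

for result in results:
  print(result)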

c.close()

## Using Common Table Expressions (CTE)

### **Identify**

In [None]:
c = conn.cursor()

c.execute('''
WITH CTE(f_name, 
    l_name, 
    title, 
    age, 
    wage, 
    hire_date,
    NIC,
    duplicatecount)
AS (SELECT f_name, 
            l_name, 
            title, 
            age, 
            wage, 
            hire_date,
            NIC, 
           ROW_NUMBER() OVER(PARTITION BY f_name, 
                                          l_name, 
                                          title, 
                                          age, 
                                          wage, 
                                          hire_date,
                                          NIC
           ORDER BY id) AS DuplicateCount
    FROM employees)
SELECT *
FROM CTE;
''')

results = c.fetchall()

for result in results:
  print(result)

c.close()

### **Handling**

In [None]:
c = conn.cursor()

# SQLite cannot DELETE from a CTE, so carry each row's rowid through the CTE
# and delete the matching rows from the base table.
c.execute('''
WITH CTE(rid, duplicatecount)
AS (SELECT rowid,
           ROW_NUMBER() OVER(PARTITION BY f_name,
                                          l_name,
                                          title,
                                          age,
                                          wage,
                                          hire_date,
                                          NIC
           ORDER BY id) AS DuplicateCount
    FROM employees)
DELETE FROM employees
WHERE rowid IN (SELECT rid FROM CTE WHERE duplicatecount > 1);
''')

c.close()