### **Database Monitoring (Google Sheets) - Cocaine Seizures 2025**
#### InSight Crime - MAD Unit. 
June, 2025

##### Luis Felipe Villota Macías

---------------------



### 1. Goals

* MONITOR & VALIDATE -> Run automated checks on the data in a shared Google Sheet to ensure accuracy and consistency.

* HIGHLIGHT & CORRECT -> Flag and fix invalid or suspicious entries directly in the sheet.

* REPORT -> Generate a weekly report (every Friday) to support data governance.


### 2. Project Setup

#### 2.1 Version Control

I created a single GitHub repository ([FelipeVillota/db-check-cocaine-seizures](https://github.com/FelipeVillota/db-check-cocaine-seizures)). I keep it `private` and can grant access to the online repo at any time.

#### 2.2 Reproducible Environment

In [2]:
# IMPORTANT
# To create the virtual environment:
# python -m venv venv-db-watch

# To activate the environment, run in the terminal (PowerShell):
# (optional, temporarily allows script execution for this session only)
# Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
# venv-db-watch\Scripts\activate

# Then select the matching kernel (install the ipykernel package so the notebook can connect to it)

# Update the master list of dependencies:
# pip freeze > requirements.txt

In [3]:
# Checking venv-db-watch works
import sys
print(sys.executable)

c:\Users\USER\Desktop\ic\db-check-cocaine-seizures\venv-db-watch\Scripts\python.exe


#### 2.3 Loading Libraries

In [4]:
# pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas

In [5]:
import os
import re
import requests
import pandas as pd
from datetime import datetime
from google.oauth2 import service_account
from googleapiclient.discovery import build
# pip freeze > requirements.txt

### 3. Approach

The general idea is to build a modular client (frontend) call that extracts the desired subset of data from the API server (backend) and is easy to reuse for future queries.
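One way to make the call reusable is to wrap the read step in a small function. The sketch below is only an illustration: `values_to_dataframe` and `read_sheet` are hypothetical helper names, not part of the Google client library. It assumes the first row of the range is the header and pads short rows, since the Sheets API omits trailing empty cells.

```python
import pandas as pd


def values_to_dataframe(values):
    """Turn the list of rows returned by the Sheets API into a DataFrame.

    Assumes the first row is the header. Rows shorter than the header are
    padded with '' and rows longer than the header are truncated.
    """
    if not values:
        # Nothing came back (empty sheet or empty range)
        return pd.DataFrame()
    header, rows = values[0], values[1:]
    width = len(header)
    padded = [list(row[:width]) + [''] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=header)


def read_sheet(service, spreadsheet_id, range_name):
    """Fetch one range from a spreadsheet and return it as a DataFrame.

    `service` is the object returned by build('sheets', 'v4', ...).
    """
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=spreadsheet_id, range=range_name)
        .execute()
    )
    return values_to_dataframe(result.get('values', []))
```

With the objects defined in section 4.1, a call would look like `df = read_sheet(service, SPREADSHEET_ID, RANGE_NAME)`, and the same function can be pointed at another sheet or range later.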


### 4. Execution

#### 4.1 Accessing the API

In [6]:
# Path to downloaded JSON credentials
SERVICE_ACCOUNT_FILE = 'C:/Users/USER/Desktop/ic/llavero/summer-sector-439022-v6-2eafffbbfb90.json'

In [7]:
# ID of the Google Sheet (from the URL)

original_id = '1t61MafCmnRe2QN082Bk1V0IxBSIW8UUqH1g5mULgb2o'
test_id = '18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk'

SPREADSHEET_ID = test_id
SPREADSHEET_ID

'18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk'

In [8]:

# Range to read from your sheet (e.g. 'Sheet1!A1:Z1000')
RANGE_NAME = '2025!A1:Z10000'  # Adjust the range as needed

In [9]:

# Define scopes for Google Sheets and Drive API
SCOPES = ['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive']

In [10]:
# Authenticate and build the service
creds = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('sheets', 'v4', credentials=creds)

# Call the Sheets API to read data
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SPREADSHEET_ID, range=RANGE_NAME).execute()
values = result.get('values', [])

# Convert to DataFrame for easier manipulation
df = pd.DataFrame(values[1:], columns=values[0])
print(df.head()) 

      Type   Time unit        Date Date 2  Year Month Day Duration  \
0  Seizure  Individual  2025-03-24         2025     3  24            
1  Seizure  Individual  2025-03-23         2025     3  23            
2  Seizure  Individual  2025-03-22         2025     3  22            
3  Seizure  Individual  2025-03-22         2025     3  22            
4  Seizure  Individual  2025-03-19         2025     3  19            

                       Type Drugs Quantity  ... Department/State  \
0                         Cocaine       10  ...                    
1                         Cocaine      2.5  ...                    
2                         Cocaine     2619  ...            Zulia   
3  Other (explain in Description)     1240  ...            Zulia   
4                         Cocaine    16.05  ...            Texas   

          Municipality/Port Origin country         Origin Area  \
0                  Florence                                      
1             Santo Domingo           

#### 4.2 Data Validation Management

In [11]:
print(df.columns)

Index(['Type', 'Time unit', 'Date', 'Date 2', 'Year', 'Month', 'Day',
       'Duration', 'Type Drugs', 'Quantity', 'Weight unit', 'seizure_kgs',
       'Modus Operandi/place of seizure', 'Sub MO', 'Region', 'Country',
       'Department/State', 'Municipality/Port', 'Origin country',
       'Origin Area', 'Origin municipality', 'Transit 1/Region',
       'Transit 1/country', 'Transit 1/Department', 'Transit 1/Municipality',
       'Transit 2/ region'],
      dtype='object')


In [None]:
print(df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 378 entries, 0 to 377
Data columns (total 26 columns):
 #   Column                           Non-Null Count  Dtype 
---  ------                           --------------  ----- 
 0   Type                             378 non-null    object
 1   Time unit                        378 non-null    object
 2   Date                             378 non-null    object
 3   Date 2                           378 non-null    object
 4   Year                             378 non-null    object
 5   Month                            378 non-null    object
 6   Day                              378 non-null    object
 7   Duration                         378 non-null    object
 8   Type Drugs                       378 non-null    object
 9   Quantity                         378 non-null    object
 10  Weight unit                      378 non-null    object
 11  seizure_kgs                      378 non-null    object
 12  Modus Operandi/place of seizure  378
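The `info()` output reports every column as `object` with no nulls, because the values arrive as strings and blank cells as `''`. A minimal sketch of a type-coercion step follows; `coerce_types` is a hypothetical helper, and the grouping of columns into numeric and date fields is an assumption based on the column names, so adjust it as needed.

```python
import pandas as pd

# Assumed column groups, inferred from the column names; adjust as needed.
NUMERIC_COLS = ['Year', 'Month', 'Day', 'Duration', 'Quantity', 'seizure_kgs']
DATE_COLS = ['Date', 'Date 2']


def coerce_types(df):
    """Return a copy with numeric and date columns converted from strings.

    Blank or unparseable cells become NaN (numeric) or NaT (dates), so they
    can be counted with isna() afterwards. Other columns are left untouched.
    """
    out = df.copy()
    for col in NUMERIC_COLS:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors='coerce')
    for col in DATE_COLS:
        if col in out.columns:
            out[col] = pd.to_datetime(out[col], format='%Y-%m-%d', errors='coerce')
    return out
```

After this step, `coerce_types(df).isna().sum()` gives a per-column count of cells that are blank or could not be parsed.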

In [14]:
for col in df.columns:
    print(f"{col} → {df[col].dropna().unique()[:10]}")


Type → ['Seizure']
Time unit → ['Individual' 'Month' 'Multi-Month' 'Other (explain in Description)'
 'Year']
Date → ['2025-03-24' '2025-03-23' '2025-03-22' '2025-03-19' '2025-03-25'
 '2025-03-26' '2025-03-27' '2025-03-20' '2025-03-10' '2025-03-21']
Date 2 → ['' '2025-03-20' '2025-03-31' '2025-04-01' '2025-04-03' '2025-04-24'
 '2025-05-12' '2025-05-17' '2025-04-30']
Year → ['2025' '']
Month → ['3' '1' '4' '2' '5' '' '6']
Day → ['24' '23' '22' '19' '25' '26' '27' '20' '10' '21']
Duration → ['' '10' '89' '1' '5' '92' '3' '82' '131' '136']
Type Drugs → ['Cocaine' 'Other (explain in Description)' 'Cocaine Base' 'Coca Crops'
 'All/Unspecified/Multiple' 'Cocaine - Crack']
Quantity → ['10' '2.5' '2619' '1240' '16.05' '65' '6600' '142' '550' '2050']
Weight unit → ['Kilogram' 'Pounds (lbs)' 'Plant' 'Package'
 'Other (explain in Description)'
 'Other currency (say which in Description)' '' 'USD' 'Euro' 'Pounds (£)']
seizure_kgs → ['9' '2.5' '2619' '1240' '16.05' '65' '6600' '142' '250' '2050']
Mo
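The unique values above suggest a few rule-based checks. The sketch below is one possible shape for them, not the project's final rule set: `find_issues` is a hypothetical helper, and the three rules (a parseable `Date`, `Year`/`Month`/`Day` matching `Date`, and `Quantity` equal to `seizure_kgs` when `Weight unit` is `Kilogram`) are assumptions about how the columns relate.

```python
import pandas as pd


def find_issues(df):
    """Return one row per problem found, as columns (row, column, issue)."""
    issues = []
    date = pd.to_datetime(df['Date'], format='%Y-%m-%d', errors='coerce')
    qty = pd.to_numeric(df['Quantity'], errors='coerce')
    kgs = pd.to_numeric(df['seizure_kgs'], errors='coerce')

    # Rule 1 (assumed): Date must be present and formatted as YYYY-MM-DD
    for idx in df.index[date.isna().to_numpy()]:
        issues.append((idx, 'Date', 'missing or not YYYY-MM-DD'))

    # Rule 2 (assumed): Year, Month and Day must agree with Date
    for part in ('Year', 'Month', 'Day'):
        given = pd.to_numeric(df[part], errors='coerce')
        expected = getattr(date.dt, part.lower())
        bad = date.notna() & (given != expected)
        for idx in df.index[bad.to_numpy()]:
            issues.append((idx, part, f'does not match the {part.lower()} of Date'))

    # Rule 3 (assumed): for kilogram rows, seizure_kgs should equal Quantity
    is_kg = df['Weight unit'] == 'Kilogram'
    bad = is_kg & qty.notna() & kgs.notna() & ((qty - kgs).abs() > 1e-9)
    for idx in df.index[bad.to_numpy()]:
        issues.append((idx, 'seizure_kgs', 'differs from Quantity for a Kilogram row'))

    return pd.DataFrame(issues, columns=['row', 'column', 'issue'])
```

Calling `find_issues(df)` returns a table that can feed both the highlighting step and the weekly report. The `row` value is the DataFrame index; if the header sits in row 1 of the sheet and the range starts at `A1`, the corresponding sheet row would be `row + 2`.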

#### 4.3 Data Analysis & Reporting
