### **Database Monitoring (Google Sheets) - Cocaine Seizures 2025**
#### InSight Crime - MAD Unit. 
June, 2025

##### Luis Felipe Villota Macías

---------------------



### 1. Goals

* Monitor and validate data in a shared Google Sheet using automated checks to ensure accuracy, consistency and data quality.

* Highlight and correct invalid or suspicious entries directly in the sheet. 

* Generate weekly reports (every Friday) to support data governance. 
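Below is a minimal sketch of what such automated checks could look like. It flags rows whose `Date` does not parse or disagrees with the `Year`/`Month`/`Day` columns, and rows whose `Quantity` is missing, non-numeric or negative. The column names come from the sheet header printed in section 4.1. The specific rules and the `flag_invalid_rows` name are hypothetical illustrations, not the project's confirmed checks.

```python
import pandas as pd


def flag_invalid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a boolean frame marking which rule each row violates.

    Hypothetical rules (assumptions, not the project's confirmed checks):
      * bad_date: 'Date' is missing or not a valid YYYY-MM-DD date
      * date_mismatch: 'Year'/'Month'/'Day' disagree with 'Date'
      * bad_quantity: 'Quantity' is missing, non-numeric or negative
    """
    # Sheet values arrive as strings, so parse them explicitly;
    # anything unparsable becomes NaT/NaN instead of raising.
    dates = pd.to_datetime(df["Date"], format="%Y-%m-%d", errors="coerce")
    year = pd.to_numeric(df["Year"], errors="coerce")
    month = pd.to_numeric(df["Month"], errors="coerce")
    day = pd.to_numeric(df["Day"], errors="coerce")
    qty = pd.to_numeric(df["Quantity"], errors="coerce")

    flags = pd.DataFrame(index=df.index)
    flags["bad_date"] = dates.isna()
    # Only compare the date parts when the date itself parsed correctly.
    flags["date_mismatch"] = dates.notna() & (
        (dates.dt.year != year) | (dates.dt.month != month) | (dates.dt.day != day)
    )
    flags["bad_quantity"] = qty.isna() | (qty < 0)
    return flags


# Made-up sample rows, used only to exercise the rules.
sample = pd.DataFrame({
    "Date": ["2025-03-24", "2025-03-23", "not a date"],
    "Year": ["2025", "2025", "2025"],
    "Month": ["3", "4", "3"],
    "Day": ["24", "23", "22"],
    "Quantity": ["10", "2.5", "-1"],
})
flags = flag_invalid_rows(sample)
print(flags)
print(sample[flags.any(axis=1)])  # rows that need review
```

Keeping one boolean column per rule makes it easy to report how many rows fail each check, which fits the weekly report.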


### 2. Project Setup

#### 2.1 Version Control

I created a single GitHub repository for the project ([FelipeVillota/db-check-cocaine-seizures](https://github.com/FelipeVillota/db-check-cocaine-seizures)). The repository is `private`, but I can grant access to it at any time.

#### 2.2 Reproducible Environment

In [None]:
# IMPORTANT
# Create the virtual environment:
# python -m venv venv-db-watch

# Activate the environment (run in a PowerShell terminal):
# (optional) temporarily allow script execution for this session only
# Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
# venv-db-watch\Scripts\activate

# Then select the matching Jupyter kernel; the ipykernel package
# must be installed in the environment for Jupyter to connect to it:
# pip install ipykernel

# Update the master requirements list
# pip freeze > requirements.txt

In [1]:
# Checking venv-db-watch works
import sys
print(sys.executable)

c:\Users\USER\Desktop\ic\db-check-cocaine-seizures\venv-db-watch\Scripts\python.exe
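Printing `sys.executable` shows which interpreter the kernel uses. A complementary check, sketched below, relies on the fact that inside a virtual environment created with `venv`, `sys.prefix` points at the environment while `sys.base_prefix` still points at the base Python installation. The helper name `in_virtualenv` is hypothetical.

```python
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # venv sets sys.prefix to the environment folder but leaves
    # sys.base_prefix pointing at the original Python installation.
    return sys.prefix != sys.base_prefix


print("Inside a venv:", in_virtualenv())
# Folder name of the active environment (e.g. 'venv-db-watch' when it is active)
print("Environment folder:", Path(sys.prefix).name)
```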


#### 2.3 Loading Libraries

In [None]:
# pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib pandas requests

In [11]:
# Standard library
import os
import re
from datetime import datetime

# Third-party
import requests
import pandas as pd
from google.oauth2 import service_account
from googleapiclient.discovery import build

# pip freeze > requirements.txt
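Before freezing `requirements.txt`, it can help to confirm which versions of the key packages are actually installed in the active environment. The sketch below uses the standard-library `importlib.metadata` (Python 3.8+). The helper name `installed_versions` is hypothetical, and the package list simply covers the third-party imports above.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


REQUIRED = [
    "google-api-python-client",
    "google-auth-httplib2",
    "google-auth-oauthlib",
    "pandas",
    "requests",
]
for pkg, ver in installed_versions(REQUIRED).items():
    print(f"{pkg:28} {ver or 'MISSING'}")
```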

### 3. Approach

My general idea is to build a modular client-side (frontend) call that extracts only the subset of data required from the API server (backend), and to make it easily reusable for future queries.
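Here is a minimal sketch of such a reusable call. It assumes a `service` object built with `googleapiclient.discovery.build('sheets', 'v4', ...)`, reads one A1 range and returns a DataFrame. The Sheets API omits trailing empty cells, so rows can be shorter than the header. The sketch pads them before building the frame. The name `read_range` and the `unnamed_<i>` placeholder headers are hypothetical choices.

```python
from typing import Any

import pandas as pd


def read_range(service: Any, spreadsheet_id: str, range_name: str) -> pd.DataFrame:
    """Fetch one A1 range from a Google Sheet and return it as a DataFrame.

    Assumes the first row of the range holds the column headers.
    """
    result = (
        service.spreadsheets()
        .values()
        .get(spreadsheetId=spreadsheet_id, range=range_name)
        .execute()
    )
    values = result.get("values", [])
    if not values:
        return pd.DataFrame()  # empty range
    header, rows = values[0], values[1:]
    width = max([len(header)] + [len(r) for r in rows])
    # Data beyond the last header cell gets placeholder column names.
    header = header + [f"unnamed_{i}" for i in range(len(header), width)]
    # The API omits trailing empty cells, so short rows are padded with "".
    padded = [row + [""] * (width - len(row)) for row in rows]
    return pd.DataFrame(padded, columns=header)


# Usage with the objects defined in section 4.1:
# df = read_range(service, SPREADSHEET_ID, RANGE_NAME)
```

Passing the `service` in as an argument, rather than building it inside the function, keeps authentication separate from querying. The same function then serves the original sheet, the test copy, or any other range.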


### 4. Execution

#### 4.1 Accessing the API

In [6]:
# Path to downloaded JSON credentials
SERVICE_ACCOUNT_FILE = 'C:/Users/USER/Desktop/ic/llavero/summer-sector-439022-v6-2eafffbbfb90.json'
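Hard-coding the key path ties the notebook to one machine, and the key file should never be committed to the repository. A common alternative is to read the path from the `GOOGLE_APPLICATION_CREDENTIALS` environment variable, the variable Google's client libraries conventionally use for Application Default Credentials, and fall back to a local default. This is sketched below as an assumption rather than the project's actual setup. The helper name `credentials_path` is hypothetical.

```python
import os
from pathlib import Path


def credentials_path(default: str) -> Path:
    """Resolve the service-account key path, preferring the environment variable."""
    path = Path(os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", default))
    if not path.is_file():
        raise FileNotFoundError(f"Service-account key not found: {path}")
    return path


# Usage (keeps the hard-coded path as the fallback):
# SERVICE_ACCOUNT_FILE = str(credentials_path(SERVICE_ACCOUNT_FILE))
```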


In [None]:
# ID of the Google Sheet (from the URL)

original_id = '1t61MafCmnRe2QN082Bk1V0IxBSIW8UUqH1g5mULgb2o'
test_id = '18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk'

SPREADSHEET_ID = test_id
SPREADSHEET_ID

'18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk'
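Both IDs are the segment that follows `/spreadsheets/d/` in the sheet's URL. A small helper can extract it, so the notebook can be pointed at a sheet by pasting its link. The helper is hypothetical and relies only on that URL pattern.

```python
import re

# The spreadsheet ID sits between '/spreadsheets/d/' and the next '/' in the URL.
_SHEET_ID_RE = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def spreadsheet_id_from_url(url: str) -> str:
    """Return the spreadsheet ID embedded in a Google Sheets URL."""
    match = _SHEET_ID_RE.search(url)
    if match is None:
        raise ValueError(f"No spreadsheet ID found in: {url}")
    return match.group(1)


url = "https://docs.google.com/spreadsheets/d/18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk/edit#gid=0"
print(spreadsheet_id_from_url(url))
# → 18hdnhuqvH4vdXuL16CBI7BpaMCu8K81T2NThbK6OJzk
```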

In [14]:

# Range to read from your sheet (e.g. 'Sheet1!A1:Z1000')
RANGE_NAME = '2025!A1:Z10000'  # Adjust the range as needed
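In A1 notation the sheet name comes before the `!`. Names with spaces or special characters must be wrapped in single quotes, and by the usual spreadsheet convention a literal single quote inside the name is doubled. The helper below always quotes the name and is hypothetical. Instead of a fixed end row such as `10000`, you can pass only the sheet name as the range to request all of that sheet's data.

```python
from typing import Optional


def a1_range(sheet: str, start: Optional[str] = None, end: Optional[str] = None) -> str:
    """Build an A1-notation range, always quoting the sheet name."""
    # Escape embedded single quotes by doubling them, then wrap the name.
    quoted = "'" + sheet.replace("'", "''") + "'"
    if start is None:
        return quoted  # whole sheet
    return f"{quoted}!{start}:{end}" if end else f"{quoted}!{start}"


print(a1_range("2025", "A1", "Z10000"))  # → '2025'!A1:Z10000
print(a1_range("2025"))                  # → '2025'
print(a1_range("Bob's data", "A1"))      # → 'Bob''s data'!A1
```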

In [9]:

# Define scopes for Google Sheets and Drive API
SCOPES = ['https://www.googleapis.com/auth/spreadsheets', 'https://www.googleapis.com/auth/drive']
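The full `spreadsheets` scope (rather than `spreadsheets.readonly`) is what allows the notebook to write back to the sheet, for example to highlight suspicious rows as described in the goals. The sketch below only builds the request body for `spreadsheets().batchUpdate`, using `repeatCell` requests that set a background colour. The colour, the helper name and the example row indices are assumptions. `sheetId` is the numeric tab ID (the `gid` in the URL), not the tab name `2025`. Row indices are 0-based with an exclusive end.

```python
from typing import Dict, Iterable, List


def highlight_rows_request(sheet_id: int, row_indices: Iterable[int],
                           n_columns: int) -> Dict[str, List[dict]]:
    """Build a batchUpdate body that paints the given sheet rows light red.

    row_indices are 0-based sheet rows; with the header in row 0,
    DataFrame row i corresponds to sheet row i + 1.
    """
    requests_ = []  # trailing underscore avoids shadowing the `requests` module
    for row in sorted(set(row_indices)):
        requests_.append({
            "repeatCell": {
                "range": {
                    "sheetId": sheet_id,
                    "startRowIndex": row,
                    "endRowIndex": row + 1,  # end index is exclusive
                    "startColumnIndex": 0,
                    "endColumnIndex": n_columns,
                },
                "cell": {"userEnteredFormat": {
                    "backgroundColor": {"red": 1.0, "green": 0.8, "blue": 0.8}}},
                # Only overwrite the background colour, not other formatting.
                "fields": "userEnteredFormat.backgroundColor",
            }
        })
    return {"requests": requests_}


body = highlight_rows_request(sheet_id=0, row_indices=[3, 5], n_columns=26)
# To apply it (requires the write scope above):
# service.spreadsheets().batchUpdate(spreadsheetId=SPREADSHEET_ID, body=body).execute()
```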

In [15]:
# Authenticate with the service account and build the Sheets API client
creds = service_account.Credentials.from_service_account_file(
    SERVICE_ACCOUNT_FILE, scopes=SCOPES)
service = build('sheets', 'v4', credentials=creds)

# Call the Sheets API to read the values in RANGE_NAME
sheet = service.spreadsheets()
result = sheet.values().get(spreadsheetId=SPREADSHEET_ID, range=RANGE_NAME).execute()
values = result.get('values', [])

# Convert to a DataFrame: the first row is the header, the rest are records
if not values:
    raise ValueError(f'No data found in range {RANGE_NAME}')
df = pd.DataFrame(values[1:], columns=values[0])
print(df.head())

      Type   Time unit        Date Date 2  Year Month Day Duration  \
0  Seizure  Individual  2025-03-24         2025     3  24            
1  Seizure  Individual  2025-03-23         2025     3  23            
2  Seizure  Individual  2025-03-22         2025     3  22            
3  Seizure  Individual  2025-03-22         2025     3  22            
4  Seizure  Individual  2025-03-19         2025     3  19            

                       Type Drugs Quantity  ... Department/State  \
0                         Cocaine       10  ...                    
1                         Cocaine      2.5  ...                    
2                         Cocaine     2619  ...            Zulia   
3  Other (explain in Description)     1240  ...            Zulia   
4                         Cocaine    16.05  ...            Texas   

          Municipality/Port Origin country         Origin Area  \
0                  Florence                                      
1             Santo Domingo           
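By default the API returns formatted cell values as strings, so every column above is text and empty cells are blank strings. Before running checks, it helps to convert columns to proper types. The sketch below assumes the column names from the printed header (`Date`, `Year`, `Month`, `Day`, `Quantity`). It coerces unparsable or blank entries to missing values, so they can be flagged instead of raising errors. The `coerce_types` name is hypothetical, and the sample reuses values from the preview plus one made-up row with a blank quantity.

```python
import pandas as pd


def coerce_types(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with dates and numbers parsed; bad or blank values become NaT/NaN."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"], format="%Y-%m-%d", errors="coerce")
    for col in ["Year", "Month", "Day", "Quantity"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


sample = pd.DataFrame({
    "Date": ["2025-03-24", "2025-03-23", "2025-03-22", "2025-03-22"],
    "Year": ["2025", "2025", "2025", "2025"],
    "Month": ["3", "3", "3", "3"],
    "Day": ["24", "23", "22", "22"],
    "Quantity": ["10", "2.5", "2619", ""],  # last row: made-up blank quantity
})
typed = coerce_types(sample)
print(typed.dtypes)
print(typed[typed["Quantity"].isna()])  # rows with a missing or non-numeric quantity
```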

#### 4.2 Data Analysis

________________