# Building a Database for Crime Reports in Boston

In this project, we will create a PostgreSQL database named `crime_db` to store and manage crime data from Boston. Using the provided `boston.csv` dataset, we will:

- Design a schema (`crimes`) and a table (`boston_crimes`) with appropriate datatypes.
- Import the data into the database.
- Create user roles (`readonly` and `readwrite`) with appropriate privileges.
- Assign users to these roles to manage database access.

## Creating the Crime Database

We will start by creating a database for storing our crime data, along with a schema for the tables. 

In [None]:
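import psycopg2

# Connect to the default database. CREATE DATABASE cannot run inside a
# transaction block, so autocommit is enabled first.
conn = psycopg2.connect(dbname="postgres", user="postgres")
conn.autocommit = True
cur = conn.cursor()
cur.execute("CREATE DATABASE crime_db;")
conn.close()

# Reconnect to the new database and create the schema that will hold the tables.
conn = psycopg2.connect(dbname="crime_db", user="postgres")
cur = conn.cursor()
cur.execute("CREATE SCHEMA crimes;")
conn.commit()  # this connection is not in autocommit mode, so persist the schema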

## Obtaining the Column Names and Sample

Before we start creating tables, let's gather some data about our crime dataset so that we can select the right datatypes to use.

In [None]:
import csv

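# newline="" lets the csv module handle line endings inside quoted fields.
with open("boston.csv", newline="") as f:
    reader = csv.reader(f)
    col_headers = next(reader)  # header row
    first_row = next(reader)    # first data row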

print("Column names -->", col_headers)
print("First row of data -->", first_row)

## Creating an Auxiliary Function

To help us identify the right datatypes for the columns, we'll create a function that computes a Python set with all of the distinct values contained within a column.

The function will be useful for:

- Checking whether an enumerated datatype might be a good choice for representing a column.
- Computing the maximum length of any text-like column to select appropriate sizes for `VARCHAR` columns. 

In [None]:
def get_col_set(csv_filename, col_index):
    """Return the set of distinct values found in one column of a CSV file."""
    with open(csv_filename, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        return {row[col_index] for row in reader}

# Count the distinct values in every column.
for i, header in enumerate(col_headers):
    num_unique_values = len(get_col_set("boston.csv", i))
    print(f"Column '{header}' has {num_unique_values} unique values")
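The two uses listed above (spotting candidates for an enumerated datatype and sizing `VARCHAR` columns) can be combined into a single summary per column. The following is only a minimal sketch: `summarize_column` is a hypothetical helper that does not appear elsewhere in this project, and the rows are a small made-up sample standing in for `boston.csv`, not real data from it.

```python
import csv
import io

def summarize_column(rows, col_index):
    """Return (number of distinct values, length of the longest value) for one column."""
    values = {row[col_index] for row in rows}
    # default=0 keeps the helper safe when there are no rows at all.
    longest = max((len(v) for v in values), default=0)
    return len(values), longest

# Made-up sample data (hypothetical column names and rows, not taken from boston.csv).
sample = io.StringIO(
    "id,day,description\n"
    "1,Monday,VANDALISM\n"
    "2,Monday,LARCENY\n"
    "3,Friday,VANDALISM\n"
)
reader = csv.reader(sample)
headers = next(reader)
rows = list(reader)

for i, name in enumerate(headers):
    n_distinct, longest = summarize_column(rows, i)
    print(f"{name}: {n_distinct} distinct values, longest is {longest} characters")
```

A column with only a handful of distinct values relative to the number of rows is a plausible candidate for an enumerated type, while the longest-value figure gives a lower bound for the size of a `VARCHAR` column.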