# Breakout room goals

```
* Skim https://docs.google.com/document/d/1xk5wB-1xiTkH8wawYwK2hlYFkZKYD3CQcEDi5ewaiVk/edit?usp=sharing to understand what one of the Just Data Lab community partners is interested in
* Use the skills we have learned so far to (1) download the raw data, (2) use DuckDB to parse the CSV into a Parquet file, and (3) use Altair to visualize the columns pertinent to a research question the organization is interested in
* Post a screenshot in this Figma
```

## Data download

https://drive.google.com/file/d/12-MZ2I2qWoECBHQaEX1JJ2pd77itDmej/view?usp=sharing 


## Load data into duckdb

In [9]:
!head ./data/surveillanceresistancelab.org/raw/JDL_NYPD\ Contracts\ 12.15.22.csv

Status,Category,Vendor Record Type,Vendor,Associated Prime Vendor,M/WBE Category,Woman Owned Business,Emerging Business,Agency,Expense Category,Contract ID,Parent Contract ID,Version Number,Contract type,Purpose,Industry, Current Amount , Original Amount ,Start date,End Date,Registration date,Received date,Award Method,APT PIN,PIN
Registered,Expense,Prime Vendor,PINA M INC,N/A,Women (Non-Minority),Yes,No ,Police Department,EQUIPMENT GENERAL,CT105620231408551,,1,SUPPLIES/MATERIALS/EQUIPMENT,LACTATION PODS  QMS 1529,Goods," $55,590.00 "," $55,590.00 ",11/9/2022,6/30/2024,12/13/2022,,SM PURCH GOODS SERVICES 100K,,233660122
Registered,Expense,Prime Vendor,LIRO ENGINEERS INC,N/A,Non-M/WBE,No ,No ,Police Department,PROF SERV ENGINEER & ARCHITECT,CT105620238805005,,1,REQUIREMENTS-SERVICES,Environmental Engineering & Laboratory Services - Renewal #1,Professional Services," $1,225,000.00 "," $1,225,000.00 ",11/3/2022,11/2/2025,12/9/2022,,RENEWAL OF CONTRACT,,05620P8148KXLR001
Registered,Expense

In [11]:
# Load duckdb, which lets us efficiently load large files
import duckdb

# Load pandas, which lets us manipulate dataframes
import pandas as pd

# Import jupysql Jupyter extension to create SQL cells
%load_ext sql

# Configure jupysql to return query results as pandas DataFrames and to simplify the output printed to the notebook.
%config SqlMagic.autopandas = True

%config SqlMagic.feedback = False
%config SqlMagic.displaycon = False

# Connect jupysql to DuckDB using a SQLAlchemy-style connection string. Connect either to an in-memory DuckDB or to a file-backed database.
%sql duckdb:///:memory:



### GPT-4 prompt

```quote
%%sql
SELECT *
FROM read_csv('https://data.cityofnewyork.us/api/views/erm2-nwe9/rows.csv?accessType=DOWNLOAD',
    header=True,
    delim=',',
    quote='"',
    columns={'Unique Key': 'BIGINT',
    'Created Date': 'VARCHAR',
    'Closed Date': 'VARCHAR',
    'Agency': 'VARCHAR',
    'Agency Name': 'VARCHAR',
    'Complaint Type': 'VARCHAR',
    'Descriptor': 'VARCHAR',
    'Location Type': 'VARCHAR',
    'Incident Zip': 'VARCHAR',
    'Incident Address': 'VARCHAR',
    'Street Name': 'VARCHAR',
    'Cross Street 1': 'VARCHAR',
    'Cross Street 2': 'VARCHAR',
    'Intersection Street 1': 'VARCHAR',
    'Intersection Street 2': 'VARCHAR',
    'Address Type': 'VARCHAR',
    'City': 'VARCHAR',
    'Landmark': 'VARCHAR',
    'Facility Type': 'VARCHAR',
    'Status': 'VARCHAR',
    'Due Date': 'VARCHAR',
    'Resolution Description': 'VARCHAR',
    'Resolution Action Updated Date': 'VARCHAR',
    'Community Board': 'VARCHAR',
    'BBL': 'VARCHAR',
    'Borough': 'VARCHAR',
    'X Coordinate (State Plane)': 'VARCHAR',
    'Y Coordinate (State Plane)': 'VARCHAR',
    'Open Data Channel Type': 'VARCHAR',
    'Park Facility Name': 'VARCHAR',
    'Park Borough': 'VARCHAR',
    'Vehicle Type': 'VARCHAR',
    'Taxi Company Borough': 'VARCHAR',
    'Taxi Pick Up Location': 'VARCHAR',
    'Bridge Highway Name': 'VARCHAR',
    'Bridge Highway Direction': 'VARCHAR',
    'Road Ramp': 'VARCHAR',
    'Bridge Highway Segment': 'VARCHAR',
    'Latitude': 'DOUBLE',
    'Longitude': 'DOUBLE',
    'Location': 'VARCHAR'}) 
LIMIT 10;

please rewrite the above example using this csv file: `./data/surveillanceresistancelab.org/raw/JDL_NYPD\ Contracts\ 12.15.22.csv`, whose header is the following: 

```
Status,Category,Vendor Record Type,Vendor,Associated Prime Vendor,M/WBE Category,Woman Owned Business,Emerging Business,Agency,Expense Category,Contract ID,Parent Contract ID,Version Number,Contract type,Purpose,Industry, Current Amount , Original Amount ,Start date,End Date,Registration date,Received date,Award Method,APT PIN,PIN
Registered,Expense,Prime Vendor,PINA M INC,N/A,Women (Non-Minority),Yes,No ,Police Department,EQUIPMENT GENERAL,CT105620231408551,,1,SUPPLIES/MATERIALS/EQUIPMENT,LACTATION PODS  QMS 1529,Goods," $55,590.00 "," $55,590.00 ",11/9/2022,6/30/2024,12/13/2022,,SM PURCH GOODS SERVICES 100K,,233660122
Registered,Expense,Prime Vendor,LIRO ENGINEERS INC,N/A,Non-M/WBE,No ,No ,Police Department,PROF SERV ENGINEER & ARCHITECT,CT105620238805005,,1,REQUIREMENTS-SERVICES,Environmental Engineering & Laboratory Services - Renewal #1,Professional Services," $1,225,000.00 "," $1,225,000.00 ",11/3/2022,11/2/2025,12/9/2022,,RENEWAL OF CONTRACT,,05620P8148KXLR001
Registered,Expense,Prime Vendor,AMCHAR WHOLESALE  INC,N/A,Non-M/WBE,No ,No ,Police Department,SUPPLIES + MATERIALS - GENERAL,CT105620231408596,,1,SUPPLIES/MATERIALS/EQUIPMENT,SIMUNITION AMMO  QMS 1349,Goods," $79,970.40 "," $79,970.40 ",12/1/2022,6/30/2023,12/9/2022,,SM PURCH GOODS SERVICES 100K,,233700031
Registered,Expense,Prime Vendor,ADVANTAGE TRAVEL INC,N/A,Non-M/WBE,No ,No ,Police Department,OVERNIGHT TRVL EXP-GENERAL,CT105620231407951,,1,REQUIREMENTS-SERVICES,TRAVEL AGENT SERVICE  QMS 1248,Goods," $80,000.00 "," $80,000.00 ",1/1/2023,12/31/2025,12/9/2022,,SM PURCH GOODS SERVICES 100K,,233580015
Registered,Expense,Prime Vendor,US CHILLER SERVICE NY LLC,N/A,Non-M/WBE,No ,No ,Police Department,MAINT & OPER OF INFRASTRUCTURE,CT105620238804507,,1,WORK/LABOR,Renewal,Standardized Services," $145,731.00 "," $145,731.00 ",12/8/2022,12/7/2023,12/8/2022,,RENEWAL OF CONTRACT,,05618B8229KXLR001
Registered,Expense,Prime Vendor,AVCO ENTERPRISES DENTSERVE,N/A,Asian American,Yes,No ,Police Department,EQUIPMENT GENERAL,CT105620231408327,,1,SUPPLIES/MATERIALS/EQUIPMENT,COMPRESSION BANDAGES FOR NYPD-POLICE ACADEMY_MERCI UNIT,Goods," $40,329.54 "," $40,329.54 ",12/2/2022,6/30/2023,12/8/2022,,SM PURCH GOODS SERVICES 100K,,233840073
Registered,Expense,Prime Vendor,Walton Isaacson LLC,N/A,Non-M/WBE,No ,No ,Police Department,ADVERTISING,CT105620238804876,,1,REQUIREMENTS-SERVICES,NYPD Recruitment Advertising Media Strategy Services,Professional Services," $5,000,000.00 "," $5,000,000.00 ",11/1/2022,10/31/2024,12/8/2022,,RENEWAL OF CONTRACT,,05618P8214KXLR001
Registered,Expense,Prime Vendor,LUCCAH CONSULTING LLC,N/A,Women (Non-Minority),Yes,No ,Police Department,EQUIPMENT GENERAL,CT105620231409841,,1,SUPPLIES/MATERIALS/EQUIPMENT,RUFF LAND PERFORMANCE DOG KENNELS,Goods," $13,610.00 "," $13,610.00 ",12/5/2022,6/30/2023,12/6/2022,,SMALL PURCHASE - WRITTEN,,237080033
Registered,Expense,Prime Vendor,CEN-MED ENTERPRISES INC,N/A,Asian American,Yes,No ,Police Department,CONTRACTUAL SERVICES GENERAL,CT105620231409331,,1,SUPPLIES/MATERIALS/EQUIPMENT,AGILENT INSTRUMENTS REPAIR AND EXCHANGE  QMS 1580,Goods," $18,001.00 "," $18,001.00 ",11/28/2022,6/30/2023,12/6/2022,,SMALL PURCHASE - WRITTEN,,235640044
```
```

Response: https://chat.openai.com/share/dbe04ed0-b6ff-4d20-83aa-bd96d649f6c3 and below

In [12]:
%%sql
SELECT *
FROM read_csv('./data/surveillanceresistancelab.org/raw/JDL_NYPD Contracts 12.15.22.csv',
    header=True,
    delim=',',
    quote='"',
    columns={'Status': 'VARCHAR',
    'Category': 'VARCHAR',
    'Vendor Record Type': 'VARCHAR',
    'Vendor': 'VARCHAR',
    'Associated Prime Vendor': 'VARCHAR',
    'M/WBE Category': 'VARCHAR',
    'Woman Owned Business': 'VARCHAR',
    'Emerging Business': 'VARCHAR',
    'Agency': 'VARCHAR',
    'Expense Category': 'VARCHAR',
    'Contract ID': 'VARCHAR',
    'Parent Contract ID': 'VARCHAR',
    'Version Number': 'VARCHAR',
    'Contract type': 'VARCHAR',
    'Purpose': 'VARCHAR',
    'Industry': 'VARCHAR',
    'Current Amount': 'VARCHAR',
    'Original Amount': 'VARCHAR',
    'Start date': 'VARCHAR',
    'End Date': 'VARCHAR',
    'Registration date': 'VARCHAR',
    'Received date': 'VARCHAR',
    'Award Method': 'VARCHAR',
    'APT PIN': 'VARCHAR',
    'PIN': 'VARCHAR'})
LIMIT 10;


Unnamed: 0,Status,Category,Vendor Record Type,Vendor,Associated Prime Vendor,M/WBE Category,Woman Owned Business,Emerging Business,Agency,Expense Category,...,Industry,Current Amount,Original Amount,Start date,End Date,Registration date,Received date,Award Method,APT PIN,PIN
0,Registered,Expense,Prime Vendor,PINA M INC,,Women (Non-Minority),Yes,No,Police Department,EQUIPMENT GENERAL,...,Goods,"$55,590.00","$55,590.00",11/9/2022,6/30/2024,12/13/2022,,SM PURCH GOODS SERVICES 100K,,233660122
1,Registered,Expense,Prime Vendor,LIRO ENGINEERS INC,,Non-M/WBE,No,No,Police Department,PROF SERV ENGINEER & ARCHITECT,...,Professional Services,"$1,225,000.00","$1,225,000.00",11/3/2022,11/2/2025,12/9/2022,,RENEWAL OF CONTRACT,,05620P8148KXLR001
2,Registered,Expense,Prime Vendor,AMCHAR WHOLESALE INC,,Non-M/WBE,No,No,Police Department,SUPPLIES + MATERIALS - GENERAL,...,Goods,"$79,970.40","$79,970.40",12/1/2022,6/30/2023,12/9/2022,,SM PURCH GOODS SERVICES 100K,,233700031
3,Registered,Expense,Prime Vendor,ADVANTAGE TRAVEL INC,,Non-M/WBE,No,No,Police Department,OVERNIGHT TRVL EXP-GENERAL,...,Goods,"$80,000.00","$80,000.00",1/1/2023,12/31/2025,12/9/2022,,SM PURCH GOODS SERVICES 100K,,233580015
4,Registered,Expense,Prime Vendor,US CHILLER SERVICE NY LLC,,Non-M/WBE,No,No,Police Department,MAINT & OPER OF INFRASTRUCTURE,...,Standardized Services,"$145,731.00","$145,731.00",12/8/2022,12/7/2023,12/8/2022,,RENEWAL OF CONTRACT,,05618B8229KXLR001
5,Registered,Expense,Prime Vendor,AVCO ENTERPRISES DENTSERVE,,Asian American,Yes,No,Police Department,EQUIPMENT GENERAL,...,Goods,"$40,329.54","$40,329.54",12/2/2022,6/30/2023,12/8/2022,,SM PURCH GOODS SERVICES 100K,,233840073
6,Registered,Expense,Prime Vendor,Walton Isaacson LLC,,Non-M/WBE,No,No,Police Department,ADVERTISING,...,Professional Services,"$5,000,000.00","$5,000,000.00",11/1/2022,10/31/2024,12/8/2022,,RENEWAL OF CONTRACT,,05618P8214KXLR001
7,Registered,Expense,Prime Vendor,LUCCAH CONSULTING LLC,,Women (Non-Minority),Yes,No,Police Department,EQUIPMENT GENERAL,...,Goods,"$13,610.00","$13,610.00",12/5/2022,6/30/2023,12/6/2022,,SMALL PURCHASE - WRITTEN,,237080033
8,Registered,Expense,Prime Vendor,CEN-MED ENTERPRISES INC,,Asian American,Yes,No,Police Department,CONTRACTUAL SERVICES GENERAL,...,Goods,"$18,001.00","$18,001.00",11/28/2022,6/30/2023,12/6/2022,,SMALL PURCHASE - WRITTEN,,235640044
9,Registered,Expense,Prime Vendor,QUEST DIAGNOSTICS CLINICAL LABORATORIES INC,,Non-M/WBE,No,No,Police Department,PROF SERV OTHER,...,Not Classified,"$440,000.00","$440,000.00",3/4/2020,7/15/2022,12/2/2022,,RENEWAL OF CONTRACT,,05616B8252KXLR001


In [45]:
%%sql
COPY (SELECT *
FROM read_csv('./data/surveillanceresistancelab.org/raw/JDL_NYPD Contracts 12.15.22.csv',
    header=True,
    delim=',',
    quote='"',
    nullstr='N/A',
    columns={'Status': 'VARCHAR',
    'Category': 'VARCHAR',
    'Vendor Record Type': 'VARCHAR',
    'Vendor': 'VARCHAR',
    'Associated Prime Vendor': 'VARCHAR',
    'M/WBE Category': 'VARCHAR',
    'Woman Owned Business': 'VARCHAR',
    'Emerging Business': 'VARCHAR',
    'Agency': 'VARCHAR',
    'Expense Category': 'VARCHAR',
    'Contract ID': 'VARCHAR',
    'Parent Contract ID': 'VARCHAR',
    'Version Number': 'VARCHAR',
    'Contract type': 'VARCHAR',
    'Purpose': 'VARCHAR',
    'Industry': 'VARCHAR',
    'Current Amount': 'VARCHAR',
    'Original Amount': 'VARCHAR',
    'Start date': 'VARCHAR',
    'End Date': 'VARCHAR',
    'Registration date': 'VARCHAR',
    'Received date': 'VARCHAR',
    'Award Method': 'VARCHAR',
    'APT PIN': 'VARCHAR',
    'PIN': 'VARCHAR'})
-- LIMIT 1000000 -- uncomment this line to create a smaller version of the file for testing purposes
) TO './data/surveillanceresistancelab.org/processed/JDL_NYPD Contracts 12.15.22.parquet' (COMPRESSION ZSTD);


Unnamed: 0,Count
0,17809


## Visualizing the data

In [46]:
import vegafusion as vf
import polars as pl
import altair as alt
alt.data_transformers.disable_max_rows()
alt.renderers.enable('html')

# Configure DuckDB connection
vf.runtime.set_connection("duckdb")

# Enable Mime Renderer
vf.enable(row_limit=100000000)

vegafusion.enable(mimetype='html', row_limit=100000000, embed_options=None)

In [47]:
# Load the data into a Polars DataFrame
contracts = pl.read_parquet("./data/surveillanceresistancelab.org/processed/JDL_NYPD Contracts 12.15.22.parquet")


In [48]:
print(contracts.schema)

{'Status': Utf8, 'Category': Utf8, 'Vendor Record Type': Utf8, 'Vendor': Utf8, 'Associated Prime Vendor': Utf8, 'M/WBE Category': Utf8, 'Woman Owned Business': Utf8, 'Emerging Business': Utf8, 'Agency': Utf8, 'Expense Category': Utf8, 'Contract ID': Utf8, 'Parent Contract ID': Utf8, 'Version Number': Utf8, 'Contract type': Utf8, 'Purpose': Utf8, 'Industry': Utf8, 'Current Amount': Utf8, 'Original Amount': Utf8, 'Start date': Utf8, 'End Date': Utf8, 'Registration date': Utf8, 'Received date': Utf8, 'Award Method': Utf8, 'APT PIN': Utf8, 'PIN': Utf8}
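
The schema shows that `Current Amount` and `Original Amount` are strings like ` $55,590.00 `, so they cannot be summed or placed on a quantitative axis as they are. One possible way to convert them is sketched below. `parse_amount` is a hypothetical helper, and treating blank or unparseable values as missing is an assumption rather than something the notebook specifies.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_amount(raw: Optional[str]) -> Optional[Decimal]:
    """Convert a currency string such as ' $55,590.00 ' to a Decimal."""
    if raw is None:
        return None
    # Drop surrounding spaces, the dollar sign, and thousands separators.
    cleaned = raw.strip().replace("$", "").replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        # Assumption: unparseable values are treated as missing.
        return None


# Amounts copied from the sample rows shown earlier.
samples = [" $55,590.00 ", " $1,225,000.00 ", " $79,970.40 "]
total = sum(parse_amount(s) for s in samples)
print(total)
```

`Decimal` is used instead of `float` so that cents are preserved exactly when amounts are added.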


In [49]:
contracts

Status,Category,Vendor Record Type,Vendor,Associated Prime Vendor,M/WBE Category,Woman Owned Business,Emerging Business,Agency,Expense Category,Contract ID,Parent Contract ID,Version Number,Contract type,Purpose,Industry,Current Amount,Original Amount,Start date,End Date,Registration date,Received date,Award Method,APT PIN,PIN
str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str,str
"""Registered""","""Expense""","""Prime Vendor""","""PINA M INC""",,"""Women (Non-Min…","""Yes""","""No ""","""Police Departm…","""EQUIPMENT GENE…","""CT105620231408…","""""","""1""","""SUPPLIES/MATER…","""LACTATION PODS…","""Goods""",""" $55,590.00 """,""" $55,590.00 ""","""11/9/2022""","""6/30/2024""","""12/13/2022""","""""","""SM PURCH GOODS…","""""","""233660122"""
"""Registered""","""Expense""","""Prime Vendor""","""LIRO ENGINEERS…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""PROF SERV ENGI…","""CT105620238805…","""""","""1""","""REQUIREMENTS-S…","""Environmental …","""Professional S…",""" $1,225,000.00…",""" $1,225,000.00…","""11/3/2022""","""11/2/2025""","""12/9/2022""","""""","""RENEWAL OF CON…","""""","""05620P8148KXLR…"
"""Registered""","""Expense""","""Prime Vendor""","""AMCHAR WHOLESA…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""SUPPLIES + MAT…","""CT105620231408…","""""","""1""","""SUPPLIES/MATER…","""SIMUNITION AMM…","""Goods""",""" $79,970.40 """,""" $79,970.40 ""","""12/1/2022""","""6/30/2023""","""12/9/2022""","""""","""SM PURCH GOODS…","""""","""233700031"""
"""Registered""","""Expense""","""Prime Vendor""","""ADVANTAGE TRAV…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""OVERNIGHT TRVL…","""CT105620231407…","""""","""1""","""REQUIREMENTS-S…","""TRAVEL AGENT S…","""Goods""",""" $80,000.00 """,""" $80,000.00 ""","""1/1/2023""","""12/31/2025""","""12/9/2022""","""""","""SM PURCH GOODS…","""""","""233580015"""
"""Registered""","""Expense""","""Prime Vendor""","""US CHILLER SER…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""MAINT & OPER O…","""CT105620238804…","""""","""1""","""WORK/LABOR""","""Renewal""","""Standardized S…",""" $145,731.00 """,""" $145,731.00 ""","""12/8/2022""","""12/7/2023""","""12/8/2022""","""""","""RENEWAL OF CON…","""""","""05618B8229KXLR…"
"""Registered""","""Expense""","""Prime Vendor""","""AVCO ENTERPRIS…",,"""Asian American…","""Yes""","""No ""","""Police Departm…","""EQUIPMENT GENE…","""CT105620231408…","""""","""1""","""SUPPLIES/MATER…","""COMPRESSION BA…","""Goods""",""" $40,329.54 """,""" $40,329.54 ""","""12/2/2022""","""6/30/2023""","""12/8/2022""","""""","""SM PURCH GOODS…","""""","""233840073"""
"""Registered""","""Expense""","""Prime Vendor""","""Walton Isaacso…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""ADVERTISING""","""CT105620238804…","""""","""1""","""REQUIREMENTS-S…","""NYPD Recruitme…","""Professional S…",""" $5,000,000.00…",""" $5,000,000.00…","""11/1/2022""","""10/31/2024""","""12/8/2022""","""""","""RENEWAL OF CON…","""""","""05618P8214KXLR…"
"""Registered""","""Expense""","""Prime Vendor""","""LUCCAH CONSULT…",,"""Women (Non-Min…","""Yes""","""No ""","""Police Departm…","""EQUIPMENT GENE…","""CT105620231409…","""""","""1""","""SUPPLIES/MATER…","""RUFF LAND PERF…","""Goods""",""" $13,610.00 """,""" $13,610.00 ""","""12/5/2022""","""6/30/2023""","""12/6/2022""","""""","""SMALL PURCHASE…","""""","""237080033"""
"""Registered""","""Expense""","""Prime Vendor""","""CEN-MED ENTERP…",,"""Asian American…","""Yes""","""No ""","""Police Departm…","""CONTRACTUAL SE…","""CT105620231409…","""""","""1""","""SUPPLIES/MATER…","""AGILENT INSTRU…","""Goods""",""" $18,001.00 """,""" $18,001.00 ""","""11/28/2022""","""6/30/2023""","""12/6/2022""","""""","""SMALL PURCHASE…","""""","""235640044"""
"""Registered""","""Expense""","""Prime Vendor""","""QUEST DIAGNOST…",,"""Non-M/WBE""","""No ""","""No ""","""Police Departm…","""PROF SERV OTHE…","""CT105620238803…","""""","""1""","""WORK/LABOR""","""Drug Screening…","""Not Classified…",""" $440,000.00 """,""" $440,000.00 ""","""3/4/2020""","""7/15/2022""","""12/2/2022""","""""","""RENEWAL OF CON…","""""","""05616B8252KXLR…"


In [31]:
# Bar chart: number of contracts per agency
alt.Chart(contracts).mark_bar().encode(
    x='Agency:N',
    y='count()',
)

In [32]:
# Bar chart: number of contracts per M/WBE category
alt.Chart(contracts).mark_bar().encode(
    x='M/WBE Category:N',
    y='count()',
)

In [33]:
# Bar chart: number of contracts per expense category
alt.Chart(contracts).mark_bar().encode(
    x='Expense Category:N',
    y='count()',
)

In [39]:
# Bar chart: number of contracts per industry
alt.Chart(contracts).mark_bar().encode(
    x='Industry:N',
    y='count()',
)

In [50]:
# Bar chart: number of contracts per award method
alt.Chart(contracts).mark_bar().encode(
    x='Award Method:N',
    y='count()',
)
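
Each chart above uses `count()` to tally rows per category. The same tally can be sketched in plain Python, which may help when checking a chart or picking the most frequent categories. `top_categories` is a hypothetical helper, and the `industries` list holds only the `Industry` values from the ten preview rows shown earlier, not the full dataset.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_categories(values: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Return the n most frequent values with their counts, most frequent first."""
    # Strip whitespace so that values like 'No ' and 'No' are counted together.
    return Counter(v.strip() for v in values).most_common(n)


# Industry values from the ten preview rows (illustrative subset only).
industries = [
    "Goods", "Professional Services", "Goods", "Goods",
    "Standardized Services", "Goods", "Professional Services",
    "Goods", "Goods", "Not Classified",
]

print(top_categories(industries, 2))
```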