## Washington State Liquor and Cannabis Board (WSLCB) Open Data Portal

*For an introduction to the WSLCB, [see the README in the parent directory](../README.md).*

*For an introduction to the WSLCB's Socrata-based Open Data Portal, [see the README in this directory](./README.md).*

### Dataset: Compliance Checks

* Canonical Dataset ID: **3qmf-vgdg**
* Detail screen on the WSLCB Portal: https://data.lcb.wa.gov/dataset/Compliance-Checks-Dataset/auqz-2kjf
* Detail screen on Socrata's Open Data Foundry: https://dev.socrata.com/foundry/data.lcb.wa.gov/3qmf-vgdg

We'll use the [`cannapy`](https://github.com/CannabisData/cannapy) library to access the portal data. `cannapy` aims to provide an abstract interface for accessing and working with *Cannabis* data from around the world. It uses [xmunoz](https://github.com/xmunoz)'s [`sodapy`](https://github.com/xmunoz/sodapy) client to access Socrata-based open data portals and can return the data as [Pandas DataFrames](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

In [1]:
import time
import cannapy.us.wa.wslcb.portal as wslcb
import pandas as pd

In [2]:
# Specify your own Socrata App Token if you plan to experiment
app_token = 'XaB9MBqc81C3KT4Vps6Wh5LZt'

# Instantiate a cannapy interface to the WSLCB open data portal
portal = wslcb.WSLCBPortal(app_token)

# We'll be using the Compliance Checks dataset
dataset_id = '3qmf-vgdg'

In [3]:
# Check when the dataset was last updated
last_updated = portal.dataset_last_updated(dataset_id)
print('Last updated: {}'.format(time.strftime('%c', last_updated)))

Last updated: Thu Nov 16 10:54:59 2017
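
The `'%c'` pattern used above produces a locale-dependent string. If a stable, sortable timestamp is preferable, an explicit ISO 8601 pattern is one option. This is a minimal sketch that assumes `dataset_last_updated()` returns a `time.struct_time` (as the `time.strftime` call above implies); the value below is a stand-in built with `time.strptime`, not a live portal response.

```python
import time

# Stand-in for the value returned by portal.dataset_last_updated();
# assumed here to be a time.struct_time, as the strftime call above implies.
last_updated = time.strptime('2017-11-16 10:54:59', '%Y-%m-%d %H:%M:%S')

# An explicit ISO 8601 pattern does not vary with the locale the way '%c' does.
iso_stamp = time.strftime('%Y-%m-%dT%H:%M:%S', last_updated)
print('Last updated: {}'.format(iso_stamp))
```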


In [4]:
# Retrieve the dataset preloaded into a Pandas DataFrame
df = portal.get_dataframe(dataset_id)

# Validate we've got the right data by examining the first few rows
df.head()

| | date | license_number | county_name | city_name | action |
|---|---|---|---|---|---|
| 0 | 2017-06-09T00:00:00.000 | 420741 | KING | BURIEN | Marijuana Compliance Check-no Sale |
| 1 | 2017-05-25T00:00:00.000 | 420619 | SNOHOMISH | UNINCORPORATED AREAS | Marijuana Compliance Check-no Sale |
| 2 | 2017-06-08T00:00:00.000 | 414280 | COWLITZ | UNINCORP. AREAS | Marijuana Compliance Check-no Sale |
| 3 | 2017-06-12T00:00:00.000 | 353993 | GRAYS HARBOR | HOQUIAM | Marijuana Compliance Check-no Sale |
| 4 | 2017-06-13T00:00:00.000 | 419640 | THURSTON | UNINCORPORATED AREAS | Marijuana Compliance Check-no Sale |
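
The `date` values in the preview look like ISO 8601 timestamps. Whether `get_dataframe()` already parses them is not shown here, so the sketch below assumes they arrive as plain strings and converts them with `pd.to_datetime` before counting checks per month. The `sample` frame is a hand-built stand-in that copies three rows from the preview.

```python
import pandas as pd

# Stand-in frame copying three rows from the preview above (not a live query)
sample = pd.DataFrame({
    'date': ['2017-06-09T00:00:00.000',
             '2017-05-25T00:00:00.000',
             '2017-06-08T00:00:00.000'],
    'license_number': ['420741', '420619', '414280'],
    'county_name': ['KING', 'SNOHOMISH', 'COWLITZ'],
})

# Assumption: the column holds strings; convert them to datetime64 values
sample['date'] = pd.to_datetime(sample['date'])

# Tally compliance checks per calendar month, ordered by month label
checks_per_month = sample['date'].dt.strftime('%Y-%m').value_counts().sort_index()
print(checks_per_month)
```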


In [9]:
# The city name for unincorporated areas appears under two spellings; merge them into one.
df.replace(to_replace='UNINCORP. AREAS', value='UNINCORPORATED AREAS', inplace=True)
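
The call above replaces the abbreviated label wherever it occurs in the DataFrame. A narrower alternative, sketched here on a hand-built stand-in frame, limits the replacement to the `city_name` column so that other columns cannot be touched by accident.

```python
import pandas as pd

# Stand-in frame with both spellings seen in the preview (not a live query)
sample = pd.DataFrame({
    'city_name': ['BURIEN', 'UNINCORPORATED AREAS', 'UNINCORP. AREAS'],
})

# Replace the abbreviated label in this one column only
sample['city_name'] = sample['city_name'].replace(
    'UNINCORP. AREAS', 'UNINCORPORATED AREAS')
print(sample['city_name'].tolist())
```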

In [10]:
# The Series value_counts() method tallies how often each distinct value occurs, which makes it
# useful for asking basic questions of any dataset column/Series with consistent text values.

# Question: How many compliance checks have been carried out in each city?
df.city_name.value_counts()

UNINCORPORATED AREAS    316
SEATTLE                  91
TACOMA                   53
SPOKANE (CITY)           35
VANCOUVER                26
BELLINGHAM               24
LONGVIEW                 23
EVERETT                  19
SHORELINE                16
OLYMPIA                  15
UNION GAP                14
BELLEVUE                 12
PORT ORCHARD             12
ELLENSBURG               12
YAKIMA                   10
BREMERTON                 9
EDMONDS                   9
RENTON                    8
BUCKLEY                   8
WENATCHEE                 8
MOUNT VERNON              8
ABERDEEN                  8
SPOKANE VALLEY            8
LACEY                     7
SEDRO WOOLLEY             7
GOLD BAR                  7
MOSES LAKE                7
WALLA WALLA (CITY)        7
ANACORTES                 6
AUBURN                    6
                       ... 
TENINO                    4
PROSSER                   4
FORKS                     4
BLAINE                    4
BURIEN              
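
Pandas elides the middle of long outputs like the one above. Two ways to control that are sketched below: `head()` keeps only the top few entries, and `to_string()` renders every row. The city labels here are made up for illustration and are not portal data.

```python
import pandas as pd

# Hypothetical data: 100 made-up city labels, each appearing once
cities = pd.Series(['CITY_{:03d}'.format(i) for i in range(100)])
counts = cities.value_counts()

# Keep only the five most frequent entries
top_five = counts.head(5)

# Render every row as text, without the '...' elision
full_text = counts.to_string()
print(top_five)
```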

In [6]:
# Question: How many compliance checks have been carried out in each county?
df.county_name.value_counts()

KING            194
SNOHOMISH       112
PIERCE           90
SPOKANE          72
THURSTON         61
WHATCOM          54
KITSAP           48
SKAGIT           34
COWLITZ          33
CLARK            32
GRAYS HARBOR     30
YAKIMA           26
JEFFERSON        22
CLALLAM          21
KITTITAS         19
ISLAND           16
MASON            16
GRANT            15
CHELAN           14
STEVENS          11
WHITMAN           9
OKANOGAN          9
ADAMS             8
PACIFIC           7
BENTON            7
WALLA WALLA       7
DOUGLAS           5
LEWIS             4
ASOTIN            4
KLICKITAT         4
SAN JUAN          3
PEND OREILLE      2
SKAMANIA          2
FERRY             1
Name: county_name, dtype: int64
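
The county counts above sum to 992, while the action counts sum to 1,000. Because `value_counts()` skips missing values by default, one possible explanation is that a few rows have no `county_name`; that is an inference from the totals, not something the output confirms. The sketch below uses a hand-built stand-in frame to show how `dropna=False` and `isna()` would surface such rows.

```python
import pandas as pd

# Stand-in frame with two missing county names (hypothetical, not portal data)
sample = pd.DataFrame({'county_name': ['KING', 'KING', None, 'PIERCE', None]})

# Include missing values as their own bucket in the tally
with_missing = sample['county_name'].value_counts(dropna=False)

# Count the rows that have no county name at all
missing_rows = int(sample['county_name'].isna().sum())
print(with_missing)
print('Rows without a county:', missing_rows)
```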

In [7]:
# Question: What are the types and quantities of actions taken?
df.action.value_counts()

Marijuana Compliance Check-no Sale    931
Marijuana Compliance Check-Sale        69
Name: action, dtype: int64
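
Raw counts are easier to compare as proportions: 69 sales out of 1,000 checks is 6.9%. The sketch below rebuilds a Series with the same two counts and lets `value_counts(normalize=True)` do the division. Note that the total is exactly 1,000 rows, which matches the default page size of Socrata's SODA API; whether this retrieval was capped at that size is an assumption worth checking before treating these figures as the full dataset.

```python
import pandas as pd

no_sale = 'Marijuana Compliance Check-no Sale'
sale = 'Marijuana Compliance Check-Sale'

# Stand-in Series rebuilt from the two counts shown above
actions = pd.Series([no_sale] * 931 + [sale] * 69, name='action')

# normalize=True returns each value's share of the total instead of a count
shares = actions.value_counts(normalize=True)
print(shares)
```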

In [8]:
# Question: How many actions have been taken per licensee?
df.license_number.value_counts()

414733    7
414812    6
415658    6
425273    6
413358    6
417880    5
084045    5
414569    5
421409    5
414495    5
414280    5
412466    5
415523    5
414449    5
421789    5
421709    5
413314    5
420793    5
413544    5
413773    5
423413    5
415567    5
413374    5
410798    5
415410    5
415333    5
415325    5
415303    4
415132    4
415484    4
         ..
422380    1
417646    1
420338    1
414356    1
086219    1
422010    1
413558    1
415124    1
414750    1
079720    1
415641    1
422414    1
414500    1
422572    1
365856    1
080184    1
415216    1
424848    1
415425    1
414959    1
415314    1
422796    1
421910    1
421900    1
420278    1
416248    1
423542    1
413529    1
422900    1
412495    1
Name: license_number, Length: 426, dtype: int64
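
With 426 distinct licensees, a summary of how many were checked once, twice, and so on can be easier to read than the per-license listing. One way to get it, sketched on a hand-built stand-in Series, is to apply `value_counts()` a second time to the per-licensee counts. The license numbers are kept as strings here because some in the output above begin with a zero (for example `084045`), which an integer conversion would drop.

```python
import pandas as pd

# Stand-in Series reusing a few license numbers from the output above;
# the repetition pattern is illustrative, not the real per-licensee counts
licenses = pd.Series(
    ['414733', '414733', '414733', '084045', '084045', '422380', '417646'],
    name='license_number')

# Checks per licensee, as in the cell above
checks_per_licensee = licenses.value_counts()

# Number of licensees that received 1, 2, 3, ... checks
distribution = checks_per_licensee.value_counts().sort_index()
print(distribution)
```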