## Washington State Liquor and Cannabis Board (WSLCB) Open Data Portal

*For an introduction to the WSLCB, [see the README in the parent directory](../README.md).*

*For an introduction to the WSLCB's Socrata-based Open Data Portal, [see the README in this directory](./README.md).*

### Dataset: Enforcement Visits

* Canonical Dataset ID: **w7wg-8m52**
* Detail screen on the WSLCB Portal: https://data.lcb.wa.gov/dataset/Enforcement-Visits-Dataset/jizx-thwg
* Detail screen on Socrata's Open Data Foundry: https://dev.socrata.com/foundry/data.lcb.wa.gov/w7wg-8m52

We'll use the [`cannapy`](https://github.com/CannabisData/cannapy) library to access the portal data. `cannapy` aims to provide an abstract interface for accessing and working with *Cannabis* data from around the world. It uses [xmunoz](https://github.com/xmunoz)'s [`sodapy`](https://github.com/xmunoz/sodapy) client to access Socrata-based open data portals and can return data loaded into [Pandas DataFrames](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

In [1]:
import time
import cannapy.us.wa.wslcb.portal as wslcb
import pandas as pd

In [2]:
# Specify your own Socrata App Token if you plan to experiment
app_token = 'XaB9MBqc81C3KT4Vps6Wh5LZt'

# Instantiate a cannapy interface to the WSLCB open data portal
portal = wslcb.WSLCBPortal(app_token)

# We'll be using the Enforcement Visits dataset
dataset_id = 'w7wg-8m52'

In [3]:
# Check when the dataset was last updated
last_updated = portal.dataset_last_updated(dataset_id)
print('Last updated: {}'.format(time.strftime('%c', last_updated)))

Last updated: Thu Nov 16 10:55:40 2017
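
`time.strftime` accepts the value returned above, which suggests that `dataset_last_updated` hands back a `time.struct_time`. Socrata's dataset metadata reports update times as Unix epoch seconds (for example in the `rowsUpdatedAt` field), so a conversion along the following lines is plausible. This is only a sketch, not `cannapy`'s actual implementation; the `metadata` dict, its value, and the helper name are made up for illustration.

```python
import time

# Hypothetical metadata fragment; a real one would come from the portal.
metadata = {'rowsUpdatedAt': 1500000000}


def last_updated_from_metadata(meta):
    """Convert an epoch-seconds field into a time.struct_time in UTC."""
    return time.gmtime(meta['rowsUpdatedAt'])


last_updated = last_updated_from_metadata(metadata)
print('Last updated (UTC): {}'.format(
    time.strftime('%Y-%m-%d %H:%M:%S', last_updated)))
```

Using `time.gmtime` keeps the result independent of the machine's time zone; `time.localtime` would match the local-time style of the output above.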


In [4]:
# Retrieve the dataset preloaded into a Pandas DataFrame
df = portal.get_dataframe(dataset_id)

# Validate we've got the right data by examining the first few rows
df.head()

| | date | license_number | city_name | county_name | activity |
|---|---|---|---|---|---|
| 0 | 2014-08-27T00:00:00.000 | | | | Marijuana Applicant Site Verification |
| 1 | 2017-06-08T00:00:00.000 | 360307.0 | SPOKANE (CITY) | SPOKANE | Marijuana Premises Check |
| 2 | 2015-02-19T00:00:00.000 | | | | Marijuana Premises Check |
| 3 | 2015-04-20T00:00:00.000 | | | | Marijuana Premises Check |
| 4 | 2015-11-16T00:00:00.000 | | | | Marijuana Applicant Site Verification |
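
The `date` column arrives as ISO 8601 strings, so date-based questions need it parsed first. The sketch below rebuilds the five rows shown above as a small stand-in DataFrame (the name `sample` is ours, not the notebook's) and counts visits per year.

```python
import pandas as pd

# Stand-in for df, using the dates and activities from the head() output.
sample = pd.DataFrame({
    'date': ['2014-08-27T00:00:00.000', '2017-06-08T00:00:00.000',
             '2015-02-19T00:00:00.000', '2015-04-20T00:00:00.000',
             '2015-11-16T00:00:00.000'],
    'activity': ['Marijuana Applicant Site Verification',
                 'Marijuana Premises Check',
                 'Marijuana Premises Check',
                 'Marijuana Premises Check',
                 'Marijuana Applicant Site Verification'],
})

# Parse the strings into datetime64 values.
sample['date'] = pd.to_datetime(sample['date'])

# Count visits per calendar year, ordered by year rather than by frequency.
visits_per_year = sample['date'].dt.year.value_counts().sort_index()
print(visits_per_year)
```

The same two lines should apply to the full `df`, assuming its `date` column holds strings in this format throughout.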


In [6]:
# Looks like we need to combine the city name values for "Unincorporated Areas".
df.replace(to_replace='UNINCORP. AREAS', value='UNINCORPORATED AREAS', inplace=True)
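
Note that `df.replace` scans every column. A column-scoped variant makes the intent explicit and shows the effect on the counts. The sample below is hypothetical; only the two spellings come from the notebook.

```python
import pandas as pd

# Hypothetical sample containing both spellings of the same label.
cities = pd.DataFrame({'city_name': ['UNINCORP. AREAS',
                                     'UNINCORPORATED AREAS',
                                     'SEATTLE',
                                     'UNINCORP. AREAS']})

# Before: the two spellings are counted as separate cities.
before = cities['city_name'].value_counts()

# Replace within the city_name column only.
cities['city_name'] = cities['city_name'].replace(
    'UNINCORP. AREAS', 'UNINCORPORATED AREAS')

# After: a single combined entry.
after = cities['city_name'].value_counts()
print(after)
```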

In [7]:
# The Series value_counts() method tallies how often each value occurs, which makes it useful for asking
# basic questions of any dataset column with consistent text values.

# Question: How many enforcement visits have been documented in each city?
df.city_name.value_counts()

UNINCORPORATED AREAS    387
SEATTLE                  65
TACOMA                   47
SPOKANE (CITY)           44
BELLINGHAM               29
YAKIMA                   14
ARLINGTON                14
UNION GAP                14
RENTON                   13
LONGVIEW                 13
VANCOUVER                12
TUMWATER                 11
EDMONDS                  11
MOUNT VERNON             10
SPOKANE VALLEY           10
BREMERTON                10
BUCKLEY                   9
PORT ORCHARD              9
COVINGTON                 9
OLYMPIA                   9
MILLWOOD                  9
AUBURN                    8
RITZVILLE                 8
GOLD BAR                  8
PULLMAN                   8
LACEY                     8
RAYMOND                   7
REDMOND                   7
ELLENSBURG                7
AIRWAY HEIGHTS            6
                       ... 
KIRKLAND                  2
BLAINE                    2
FORKS                     2
NORTH BONNEVILLE          2
LYNNWOOD            

In [8]:
# Question: How many enforcement visits have been documented in each county?
df.county_name.value_counts()

KING            158
SPOKANE          96
PIERCE           92
SNOHOMISH        74
THURSTON         66
WHATCOM          54
OKANOGAN         46
CHELAN           41
KITSAP           38
YAKIMA           36
SKAGIT           36
GRAYS HARBOR     28
COWLITZ          20
GRANT            19
KITTITAS         19
PACIFIC          16
CLARK            16
MASON            13
WHITMAN          12
ISLAND           12
ADAMS            10
DOUGLAS          10
JEFFERSON        10
BENTON            8
LEWIS             6
KLICKITAT         5
ASOTIN            5
CLALLAM           5
STEVENS           4
SKAMANIA          2
SAN JUAN          2
WALLA WALLA       2
LINCOLN           1
PEND OREILLE      1
FERRY             1
Name: county_name, dtype: int64
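
`value_counts()` skips missing values by default. That matters here, because most rows in the `head()` output have no city or county. A minimal sketch of how to surface them, using a made-up Series modeled on those five rows:

```python
import pandas as pd

# Modeled on the head() output: one known county, four missing.
county = pd.Series([None, 'SPOKANE', None, None, None], name='county_name')

# Default behaviour: missing values are left out of the tally.
counted = county.value_counts()

# dropna=False gives missing values their own bucket.
with_missing = county.value_counts(dropna=False)

# A direct count of the missing entries.
missing = county.isna().sum()

print(counted.sum(), with_missing.sum(), missing)
```

Comparing `counted.sum()` with `len(county)` is a quick check on how many visits a per-county (or per-city) tally leaves out.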

In [9]:
# Question: How many enforcement visits have been documented per licensee?
df.license_number.value_counts()

413596    9
423096    8
423413    7
421695    7
420889    6
421786    6
415333    6
353928    6
424751    6
413813    6
412490    5
415486    5
412923    5
422570    5
082587    5
424257    5
414398    4
422658    4
417183    4
354876    4
424647    4
414958    4
414733    4
414889    4
414539    4
415032    4
353993    4
422278    4
084154    4
424747    4
         ..
412214    1
416806    1
416106    1
422380    1
416008    1
410332    1
418021    1
413002    1
414755    1
415425    1
417646    1
415734    1
412858    1
415517    1
414273    1
423977    1
412875    1
412681    1
420389    1
422290    1
417051    1
412567    1
421726    1
413948    1
359314    1
415094    1
413801    1
412174    1
414143    1
082522    1
Name: license_number, Length: 564, dtype: int64
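
Some license numbers above start with a zero (`082587`, `084154`), so they are safer kept as strings than as numbers. The sketch below shows what an integer conversion loses and one way to restore it. It assumes license numbers are always six digits, which holds for every value shown here but is not confirmed by the source.

```python
import pandas as pd

# License numbers taken from the output above, stored as strings.
licenses = pd.Series(['413596', '082587', '084154'], name='license_number')

# Converting to integers silently drops the leading zero.
as_int = licenses.astype(int)

# Pad back to six characters (assumption: all license numbers have six digits).
restored = as_int.astype(str).str.zfill(6)

print(as_int.tolist())
print(restored.tolist())
```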

In [11]:
# Question: What are the types and quantities of activities?
df.activity.value_counts()

Marijuana Premises Check                 807
Marijuana Compliance Check-no Sale       177
Marijuana Applicant Site Verification      9
Marijuana Compliance Check-Sale            7
Name: activity, dtype: int64
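
The activity counts above sum to exactly 1,000 (807 + 177 + 9 + 7), which is also the default page size of Socrata's SODA API when no `$limit` is given. That may be a coincidence, or it may mean the DataFrame holds only the first page of the dataset; the notebook does not say which. If paging turns out to be needed, a loop like the following would collect every row. `fetch_all` and `fake_fetch` are hypothetical names, and the fake fetcher stands in for a real request so the sketch runs without network access.

```python
def fetch_all(fetch_page, page_size=1000):
    """Request pages until a short (or empty) page signals the end."""
    rows = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        rows.extend(page)
        if len(page) < page_size:
            break
        offset += page_size
    return rows


# Stand-in data source. With sodapy, fetch_page could instead wrap
# something like client.get(dataset_id, limit=limit, offset=offset),
# ideally with a stable ordering (an assumption, not shown in the notebook).
fake_rows = [{'id': i} for i in range(2500)]


def fake_fetch(limit, offset):
    return fake_rows[offset:offset + limit]


all_rows = fetch_all(fake_fetch)
print(len(all_rows))
```

Stopping on a short page costs one extra request when the row count is an exact multiple of the page size, but it avoids needing the total row count up front.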