## Washington State Liquor and Cannabis Board (WSLCB) Open Data Portal

*For an introduction to the WSLCB, [see the README in the parent directory](../README.md).*

*For an introduction to the WSLCB's Socrata-based Open Data Portal, [see the README in this directory](./README.md).*

### Dataset: Violations

* Canonical Dataset ID: **dgm4-3cm6**
* Detail screen on the WSLCB Portal: https://data.lcb.wa.gov/dataset/Violations-Dataset/dx3i-tzh2
* Detail screen on Socrata's Open Data Foundry: https://dev.socrata.com/foundry/data.lcb.wa.gov/dgm4-3cm6

We'll be using the [`cannapy`](https://github.com/CannabisData/cannapy) library to access the portal data. `cannapy` aims to provide an abstract interface for accessing and working with *Cannabis* data from around the world. It uses [xmunoz](https://github.com/xmunoz)'s [`sodapy`](https://github.com/xmunoz/sodapy) client to access Socrata-based open data portals and can return the data loaded into [Pandas DataFrames](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html).

In [6]:
import time
import cannapy.us.wa.wslcb.portal as wslcb
import pandas as pd

In [7]:
# Specify your own Socrata App Token if you plan to experiment
app_token = 'XaB9MBqc81C3KT4Vps6Wh5LZt'

# Instantiate a cannapy interface to the WSLCB open data portal
portal = wslcb.WSLCBPortal(app_token)

# We'll be using the Violations dataset
dataset_id = 'dgm4-3cm6'

In [8]:
# Check when the dataset was last updated
last_updated = portal.dataset_last_updated(dataset_id)
print('Last updated: {}'.format(time.strftime('%c', last_updated)))

Last updated: Thu Nov 16 10:54:40 2017
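
The `'%c'` format used above produces a locale-dependent string. If a sortable, locale-independent timestamp is preferable, something like the following sketch may help. It assumes that `dataset_last_updated()` returns a `time.struct_time`, which the call to `time.strftime` suggests but which we have not verified; the epoch value below is a hypothetical stand-in for that return value.

```python
import time

# Hypothetical stand-in for the value returned by
# portal.dataset_last_updated(dataset_id); assumed to be a time.struct_time.
last_updated = time.gmtime(1510829680)

# Format as an ISO 8601 style string, which does not depend on the locale
iso = time.strftime('%Y-%m-%dT%H:%M:%S', last_updated)
print('Last updated: {}'.format(iso))
```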


In [9]:
# Retrieve the dataset preloaded into a Pandas DataFrame
df = portal.get_dataframe(dataset_id)

# Validate we've got the right data by examining the first few rows
df.head()

| | visit_date | license_number | county_name | city_name | case | penalty_type | violation_code | wac_code |
|---|---|---|---|---|---|---|---|---|
| 0 | 2016-10-28T00:00:00.000 | 412199 | SPOKANE | UNINCORPORATED AREAS | 7N6302A | Written Warning | | 314.55.020 |
| 1 | 2017-01-25T00:00:00.000 | 416619 | ISLAND | UNINCORPORATED AREAS | 7I7025B | Written Warning | | 314.55.017 |
| 2 | 2017-01-11T00:00:00.000 | 421490 | THURSTON | UNINCORPORATED AREAS | 1C7011A | Written Warning | | 314.55.083(3) |
| 3 | 2015-09-10T00:00:00.000 | 417379 | BENTON | UNINCORP. AREAS | 7O5253B | Written Warning | | 314.55.083(4) |
| 4 | 2015-05-04T00:00:00.000 | 413570 | KING | SEATTLE | 2A5124A | Written Warning | | 314.55.155 |
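
The `visit_date` values shown above look like ISO 8601 strings. If they do arrive as plain strings (an assumption; the column's dtype is not shown here), converting them with `pd.to_datetime` makes date-based questions easier to ask. The sketch below uses a small hypothetical sample that mirrors the rows above rather than the live dataset.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows shown above; the real
# DataFrame would come from portal.get_dataframe(dataset_id).
sample = pd.DataFrame({
    'visit_date': ['2016-10-28T00:00:00.000', '2017-01-25T00:00:00.000',
                   '2017-01-11T00:00:00.000', '2015-09-10T00:00:00.000',
                   '2015-05-04T00:00:00.000'],
    'county_name': ['SPOKANE', 'ISLAND', 'THURSTON', 'BENTON', 'KING'],
})

# Convert the timestamp strings to datetime64 values
sample['visit_date'] = pd.to_datetime(sample['visit_date'])

# Question: How many violations were documented in each calendar year?
per_year = sample['visit_date'].dt.year.value_counts().sort_index()
print(per_year)
```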


In [10]:
# Looks like we need to combine the city name values for "Unincorporated Areas".
df.replace(to_replace='UNINCORP. AREAS', value='UNINCORPORATED AREAS', inplace=True)
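
The call above replaces the abbreviated value wherever it appears in the DataFrame. A narrower alternative is to normalize only the `city_name` column with a mapping of known aliases. The following is a minimal sketch on hypothetical sample values; the `aliases` dictionary is our own illustration, and the whitespace and case cleanup is a precaution rather than something the dataset is known to need.

```python
import pandas as pd

# Hypothetical sample values, not taken from the live dataset
sample = pd.DataFrame({
    'city_name': ['UNINCORPORATED AREAS', 'UNINCORP. AREAS', 'SEATTLE', ' Seattle '],
})

# Map each known alias to its canonical spelling (illustrative only)
aliases = {'UNINCORP. AREAS': 'UNINCORPORATED AREAS'}

# Trim whitespace, upper-case, then apply the alias mapping to this column only
sample['city_name'] = (
    sample['city_name'].str.strip().str.upper().replace(aliases)
)
print(sample['city_name'].value_counts())
```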

In [11]:
# The Series value_counts() method tallies how often each distinct value occurs, which makes it useful
# for asking basic questions of any dataset column/Series with consistent text values.

# Question: How many violations have been documented in each city?
df.city_name.value_counts()

UNINCORPORATED AREAS    478
SEATTLE                 110
TACOMA                   46
SPOKANE (CITY)           32
ARLINGTON                25
MOXEE                    24
SPOKANE VALLEY           20
BELLINGHAM               20
VANCOUVER                16
BELLEVUE                 14
EVERETT                  12
UNION GAP                 9
ELLENSBURG                9
WENATCHEE                 9
LACEY                     9
RAYMOND                   9
KIRKLAND                  7
LONGVIEW                  6
RENTON                    6
AUBURN                    6
AIRWAY HEIGHTS            6
KELSO                     6
MOSES LAKE                5
OAK HARBOR                5
OLYMPIA                   5
GOLD BAR                  5
TUMWATER                  5
LAKE STEVENS              5
ANACORTES                 4
MOUNT VERNON              4
                       ... 
BURIEN                    2
EAST WENATCHEE            2
SEDRO WOOLLEY             2
WOODLAND                  2
MOUNTLAKE TERRACE   
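
Pandas truncates long Series when printing, which is why the middle of the listing above is elided. A sketch of a few ways to work around that follows, using a hypothetical handful of city names rather than the real data: `head()` for the top entries, `normalize=True` for shares instead of raw counts, and `pd.option_context` to print every row.

```python
import pandas as pd

# Hypothetical sample; the real Series would be df.city_name
cities = pd.Series(
    ['SEATTLE', 'SEATTLE', 'TACOMA', 'SEATTLE', 'TACOMA',
     'MOXEE', 'SEATTLE', 'SEATTLE'],
    name='city_name',
)

counts = cities.value_counts()                 # raw counts, largest first
shares = cities.value_counts(normalize=True)   # fraction of all rows
top = counts.head(2)                           # only the two most frequent

# Temporarily lift the row limit so nothing is elided when printing
with pd.option_context('display.max_rows', None):
    print(counts)
```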

In [12]:
# Question: How many violations have been documented in each county?
df.county_name.value_counts()

KING            173
SNOHOMISH       119
SPOKANE         113
PIERCE           58
YAKIMA           46
CHELAN           44
BENTON           44
WHATCOM          42
THURSTON         40
OKANOGAN         35
STEVENS          34
SKAGIT           25
MASON            23
CLARK            22
KITTITAS         21
KITSAP           19
COWLITZ          18
GRANT            15
CLALLAM          14
JEFFERSON        13
PACIFIC          12
GRAYS HARBOR     12
SAN JUAN          9
ISLAND            8
LINCOLN           8
DOUGLAS           7
WHITMAN           5
FERRY             4
KLICKITAT         4
WAHKIAKUM         3
FRANKLIN          3
LEWIS             2
ADAMS             2
WALLA WALLA       2
ASOTIN            1
Name: county_name, dtype: int64
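
Counting one column at a time hides how columns relate to each other. As a possible next step, `pd.crosstab` tabulates two columns against each other, for example county against cited code. The sketch below runs on a hypothetical sample, so its numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical sample; column names match the dataset, values are illustrative
sample = pd.DataFrame({
    'county_name': ['KING', 'KING', 'SPOKANE', 'KING', 'SPOKANE'],
    'wac_code': ['314.55.155', '314.55.083(4)', '314.55.020',
                 '314.55.155', '314.55.020'],
})

# Rows are counties, columns are cited codes, cells are violation counts
table = pd.crosstab(sample['county_name'], sample['wac_code'])
print(table)
```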

In [13]:
# Question: How many violations have been documented per licensee?
df.license_number.value_counts()

413718    8
416183    8
412069    8
414958    8
412784    7
414723    7
412672    7
414785    7
412149    7
415325    6
413558    6
416968    5
416458    5
414755    5
417174    5
413719    5
413773    5
412969    5
417125    5
421667    5
413287    5
417643    5
413319    5
415812    5
415726    5
416627    5
416539    5
413426    5
414495    4
415984    4
         ..
413215    1
412959    1
416893    1
412544    1
423380    1
423390    1
412240    1
412603    1
423000    1
412949    1
415889    1
412347    1
417206    1
413150    1
413090    1
415183    1
417155    1
413922    1
413819    1
416155    1
415674    1
416235    1
417053    1
420894    1
415998    1
413131    1
415665    1
412208    1
417586    1
413659    1
Name: license_number, Length: 530, dtype: int64
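
With hundreds of licensees, the per-licensee listing is hard to read at a glance. Applying `value_counts()` a second time summarizes it: how many licensees have one violation, how many have two, and so on. This is a sketch on made-up license numbers, not the values above.

```python
import pandas as pd

# Made-up license numbers for illustration; the real Series is df.license_number
licenses = pd.Series(
    [100001, 100001, 100001, 100002, 100002, 100003, 100004],
    name='license_number',
)

# Violations per licensee
per_licensee = licenses.value_counts()

# Number of licensees at each violation count, ordered by violation count
distribution = per_licensee.value_counts().sort_index()

# Licensees with more than one documented violation
repeat = per_licensee[per_licensee > 1]

print(distribution)
print(repeat)
```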

In [17]:
# Question: What are the types and quantities of violation codes?
# It looks like this column/field is not currently used or populated in the dataset.
df.violation_code.value_counts()

Series([], Name: violation_code, dtype: int64)
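
An empty result from `value_counts()` is consistent with a column holding only missing values, because the method drops them by default. One way to confirm this is sketched below on a hypothetical sample; we assume, without having checked the live data, that the unpopulated field arrives as NaN or None.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one entirely empty column
sample = pd.DataFrame({
    'violation_code': [np.nan, np.nan, np.nan],
    'wac_code': ['314.55.020', '314.55.017', None],
})

# True when every value in the column is missing
is_empty = sample['violation_code'].isna().all()

# Count of populated (non-missing) values in each column
populated = sample.notna().sum()

# dropna=False keeps missing values in the tally
with_missing = sample['violation_code'].value_counts(dropna=False)

print(is_empty)
print(populated)
```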

In [18]:
# Question: What are the types and quantities of Washington Administrative Code (WAC) citations?
df.wac_code.value_counts()

314.55.083(4)       240
314.55.083(3)       145
69.50.357            81
314.55.155           69
314.55.079           57
314.55.020           41
314.55.085           37
314.55.089           35
314.55.087           33
314.55.083(1)        28
314.55.155(1)        25
314.55.035           22
314.55.105           19
314.55.084           19
314.55.092           17
314.55.050           16
69.50.401            14
314.55.097           14
314.55.155(2)        11
314.55.104            7
314.55.083(5)         7
314.55.083(6)         6
314.55.086            5
314.55.120            5
314.55.099            5
314.55.082            5
314.55.096            4
69.50.535             3
314.55.018            3
314.55.087(1)(f)      3
314.55.017            3
314.55.079(6)         3
314.55.075            2
69.50.369             2
69.50.328             2
69.50.4015            1
314.55.110            1
314.55.135            1
66.44.310(1)(a)       1
314.55.106            1
314.55.083            1
314.55.077      
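
Many of the codes above differ only by a parenthesized subsection, such as `314.55.083(3)` and `314.55.083(4)`. Grouping them by section, or by the leading chapter numbers, may give a clearer picture. Entries beginning with `69.50` or `66.44` also look like Revised Code of Washington (RCW) sections rather than WAC sections, though that is our reading and not something the dataset states. The sketch below uses a few codes copied from the listing with hypothetical frequencies.

```python
import pandas as pd

# A few codes from the listing above; the frequencies here are illustrative
wac = pd.Series(
    ['314.55.083(4)', '314.55.083(3)', '69.50.357',
     '314.55.155', '314.55.155(1)', '314.55.087(1)(f)'],
    name='wac_code',
)

# Keep everything before the first "(" to drop subsection suffixes
section = wac.str.extract(r'^([^(]+)', expand=False)

# Keep only the first two dotted numbers, e.g. "314.55" or "69.50"
chapter = wac.str.extract(r'^(\d+\.\d+)', expand=False)

by_section = section.value_counts()
by_chapter = chapter.value_counts()

print(by_section)
print(by_chapter)
```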

In [16]:
# Question: What are the types and quantities of penalties?
df.penalty_type.value_counts()

AVN                540
Name: penalty_type, dtype: int64