This is a dataset of assisted living, nursing and residential care facilities in Oregon that were open as of January 2017.

Data were munged [here](https://github.com/TheOregonian/database-story/blob/master/notebooks/transformation/mung-3-29-scrape.ipynb).

For each facility, we have:

1. <i>facility_id:</i> Unique ID used to join to complaints
2. <i>fac_ccmunumber:</i> Unique ID used to join to ownership history
3. <i>facility_type:</i> NF - Nursing Facility; RCF - Residential Care Facility; ALF - Assisted Living Facility
4. <i>fac_capacity:</i> Number of beds the facility is licensed to have, which is not necessarily the number of beds it actually has.
5. <i>offline:</i> Created in the munging notebook. A count of complaints that DO NOT appear when the facility is searched on the state's complaint search website (https://apps.state.or.us/cf2/spd/facility_complaints/).
6. <i>online:</i> Created in the munging notebook. A count of complaints that DO appear when the facility is searched on the state's complaint search website (https://apps.state.or.us/cf2/spd/facility_complaints/).

In [1]:
import pandas as pd
import numpy as np
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

In [3]:
df = pd.read_csv('../../data/processed/facilities-3-29-scrape.csv')

<h3>How many facilities have accurate records online?</h3>

Those that have no offline records.

In [10]:
len(df[df['offline'].isnull()])

57

<h3>How many facilities have inaccurate records online?</h3>

Those that have offline records.

In [11]:
len(df[df['offline'].notnull()])

585

<h3>How many facilities had more than double the number of complaints shown online?</h3>

In [12]:
len(df[(df['offline'] > df['online']) & (df['online'].notnull())])

357
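The comparison above works because a facility's total complaints are its offline count plus its online count, so "total is more than double the online count" reduces to "offline is greater than online". A minimal sketch of that equivalence on invented rows; the `toy` frame and its `total` column are illustrative assumptions, not part of the dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not real facilities.
toy = pd.DataFrame({
    "offline": [5.0, 1.0, 3.0, np.nan],
    "online": [2.0, 4.0, np.nan, 6.0],
})

# Total complaints per facility, treating a missing count as zero.
toy["total"] = toy["offline"].fillna(0) + toy["online"].fillna(0)

# Only facilities that show at least one complaint online are considered.
has_online = toy["online"].notnull()

# Long form: total is more than twice the online count.
more_than_double = has_online & (toy["total"] > 2 * toy["online"])

# Short form used in the notebook: offline exceeds online.
offline_exceeds = has_online & (toy["offline"] > toy["online"])

print(int(more_than_double.sum()), int(offline_exceeds.sum()))
```

Both masks select the same rows here, since comparisons against a missing value evaluate to False in pandas.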

<h3>How many facilities show zero complaints online but have complaints offline?</h3>

In [13]:
len(df[(df['online'].isnull()) & (df['offline'].notnull())])

59

<h3>How many facilities have complaints and are accurate online?</h3>

In [14]:
len(df[(df['online'].notnull()) & (df['offline'].isnull())])

14

<h3>How many facilities have complaints?</h3>

In [15]:
len(df[df['online'].notnull() | df['offline'].notnull()])

599
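The questions so far each filter on some combination of missing and present values in `offline` and `online`. One way to see all four combinations at once is a cross-tabulation. This is a sketch on invented rows, assuming (as the filters above do) that a missing value means no complaints of that kind; the `toy` frame and the flag labels are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not real facilities.
toy = pd.DataFrame({
    "offline": [5.0, np.nan, 3.0, np.nan, 2.0],
    "online": [2.0, 4.0, np.nan, np.nan, 1.0],
})

# Label each facility by whether it has any offline / online complaints.
offline_flag = pd.Series(
    np.where(toy["offline"].notnull(), "has offline", "no offline"),
    name="offline",
)
online_flag = pd.Series(
    np.where(toy["online"].notnull(), "has online", "no online"),
    name="online",
)

# Rows: offline status. Columns: online status. Cells: facility counts.
breakdown = pd.crosstab(offline_flag, online_flag)
print(breakdown)

# Facilities with any complaint are everything except the
# ("no offline", "no online") cell.
with_complaints = int(breakdown.values.sum()) - int(
    breakdown.loc["no offline", "no online"]
)
print(with_complaints)
```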

<h3>What percent of facilities have accurate records online?</h3>

In [16]:
len(df[df['offline'].isnull()]) / len(df) * 100

8.8785046728971952
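The percentage above is the number of facilities with no offline records divided by the total number of facilities (57 of 642, going by the counts earlier in this notebook). An equivalent way to get a share like this is to take the mean of a boolean mask. A minimal sketch on invented rows; the `toy` frame is illustrative only:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one of four facilities has no offline complaints.
toy = pd.DataFrame({"offline": [5.0, np.nan, 3.0, 1.0]})

# True counts as 1 and False as 0, so the mean is the share of True rows.
pct_accurate = toy["offline"].isnull().mean() * 100

# The same figure as an explicit count divided by the number of rows.
pct_by_count = toy["offline"].isnull().sum() / len(toy) * 100

print(pct_accurate, pct_by_count)
```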

<h3>What is the total capacity of all facilities with inaccurate records?</h3>

In [17]:
df[df['offline'].notnull()].sum()['fac_capacity']

35238.0
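The cell above sums every column of the filtered frame and then picks out `fac_capacity`. A possible alternative, sketched here on invented rows, is to select the column first so that only the capacity values are summed. The `toy` frame and its values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not real facilities.
toy = pd.DataFrame({
    "facility_name": ["A", "B", "C"],
    "fac_capacity": [40, 25, 60],
    "offline": [2.0, np.nan, 1.0],
})

# Keep facilities with offline complaints, then sum only their capacity.
has_offline = toy["offline"].notnull()
capacity = toy.loc[has_offline, "fac_capacity"].sum()

print(capacity)
```

Selecting the column before summing avoids aggregating text columns such as `facility_name`.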

In [18]:
df[df['fac_capacity'].isnull()]

Unnamed: 0,facility_id,fac_ccmunumber,facility_type,fac_capacity,facility_name,offline,online


In [20]:
df[df['online']<1]

Unnamed: 0,facility_id,fac_ccmunumber,facility_type,fac_capacity,facility_name,offline,online


<h3>How many facilities appear to have no complaints, whether or not they do?</h3>

In [22]:
len(df[df['online'].isnull()])

102