# Clean up California Ownership Data

This notebook takes the data BuzzFeed News scraped from the Cal Health Find Database, plus additional data provided by the state of California, and combines it with the QCOR data to produce the dataset used in the final analysis.

In [1]:
import pandas as pd

## Load CDPH Data Re-Formatted for QCOR

The data below was initially scraped from the [Cal Health Find Database](https://www.cdph.ca.gov/Programs/CHCQ/LCP/CalHealthFind/Pages/SearchResult.aspx).

In [2]:
RAW_OWNERSHIP_DIR = "../../data/ownership/raw/"
INTERMEDIATE_OWNERSHIP_DIR = "../../data/ownership/intermediate/"

In [3]:
ca_state = pd.read_csv(
    INTERMEDIATE_OWNERSHIP_DIR + 
    "california-data-for-qcor.csv"
).dropna(subset=["HFCISFacilityProviderNumber"])

In [4]:
len(ca_state)

1058

In [5]:
ca_state.head()

Unnamed: 0,clean_owner,HFCISFacilityAddress,HFCISFacilityCity,HFCISFacilityCounty,HFCISFacilityID,HFCISFacilityName,HFCISFacilityZIPCode,HFCISNPINumber,HFCISFacilityProviderNumber,HFCISInitialLicenseDate
0,United Cerebral Palsy/Spastic Children's Found...,13272 Dronfield Ave,Sylmar,LOS ANGELES,960001997,UCP DRONFIELD NORTH,91342,1831342000.0,05G973,1996-02-13
1,Portsmouth,27695 Portsmouth Ave,Hayward,ALAMEDA,20001191,PORTSMOUTH ICF/DDH,94545,1861534000.0,55G320,1999-10-27
2,St. Mary Investments,3706 Pine Ave,Long Beach,LOS ANGELES,960001702,BIXBY KNOLL PLACE,90807,1427266000.0,05G725,1993-10-28
3,"LRC Homes, Inc.",24402 Aphena Ave,Mission Viejo,ORANGE,630004641,LORI'S HOME,92691,1588731000.0,55G559,2006-04-11
4,MITCHELL HOMES INCORPORATED,1280 Mcandrew Rd,Ojai,VENTURA,50000629,MITCHELL HOME,93023,1750406000.0,05G578,1992-04-20


## Load Spreadsheets California Provided

In addition to the Cal Health Find Database, BuzzFeed News requested the ownership information for facilities that were either closed or recently opened, because they did not appear on the website. That information was provided by the state in two additional Excel files.

In [6]:
other_ca_state = pd.read_excel(RAW_OWNERSHIP_DIR + "california-missing-facility-list-ICF-DDS.xlsx")

In [7]:
other_ca_state.head()

Unnamed: 0,#,name,particip_date,provider_id,Facility Status,CalHealthFind Status,Licensee name,Facility Type,Closed Date
0,1,CASA DEL MAR #1,1985-08-14,05G178,Open,Exists in CHF,"J & J Care Centers, Inc.",ICFDDN,NaT
1,2,"HASSIBAH - TLC, INC",1987-02-10,05G246,Closed,,"Hassibah - Tlc, Inc.",ICFDDN,2020-05-01
2,3,ADULTS IN COMMUNITY TRANSITION - A C T 1,1988-09-08,05G314,Closed,,Advocates For Independent Living,ICFDDH,2019-05-31
3,4,CASA TERCEIRA,1988-09-20,05G342,Closed,,"Noia Residential Services, Inc.",ICFDDH,2020-04-22
4,5,"LOYD'S LIBERTY HOMES, INC - SAN JOSE",1989-11-16,05G400,Closed,,"Loyd's Liberty Homes, Inc.",ICFDDN,2021-04-16


In [8]:
historical_ca_state = pd.read_excel(RAW_OWNERSHIP_DIR + "ICF-DD_Facility_Closures_and_Ownership_FINAL.xlsx")

In [9]:
historical_ca_state.head()

Unnamed: 0,NAME,PROVIDER_ID,STATE,ADDRESS,FACID,FAC_STATUS_TYPE_CODE,FAC_CLOSURE_DATE,DESCRIPTION,BUSINESS_NAME,FIRST_NAME,MIDDLE_INITIAL,LAST_NAME
0,"ALPINE HOME, THE",05G579,CA,"6156 RIPLEY LANE PARADISE, CA 95969",230000427,Closed,2018-11-16 00:00:00,Licensee,None on File,Marna,,Carli
1,ALTA LOMA HOUSE,05G697,CA,"10366 ALTA LOMA DRIVE RANCHO CUCAMONGA, CA 91701",240001246,Closed,2018-05-18 00:00:00,Licensee,"Rockcreek, Inc.",Rockcreek INC,,
2,ANACAPA,55G758,CA,"2538 ANACAPA DRIVE, APT. 101/108, COSTA MESA, ...",630015860,Open,,Licensee,"RSCR California, Inc.",RSCR California INC,,
3,ANTON'S HOME,55G447,CA,"2598 OLYMPIC DRIVE SAN BRUNO, CA 94066",220001147,Closed,2021-04-08 00:00:00,Licensee,"ANTON'S HOME, INC.",Estelita,,Evangelista
4,ARLINGTON HOME # 1,05G582,CA,"1750 ARLINGTON AVE TORRANCE, CA 90501",960001484,Closed,2017-09-14 00:00:00,Licensee,Arlington Home Care Inc.,Arlington Home Care Inc.,,


## Load QCOR Data

In [10]:
ca_facs = (
    pd
    .read_csv("../../data/qcor/facilities.csv", parse_dates=["termination_date"])
    .loc[
        lambda x: 
         (x["state"] == "CA") & 
         (
             x["termination_code"].isnull() | 
             (x["termination_date"] >= "2017-03-01")
         )
    ]
)
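
Reading the filter in the cell above: it keeps California facilities that either have no termination code (still active) or were terminated on or after March 1, 2017. The sketch below applies the same condition to a small made-up frame so the effect of each clause is visible. The rows are invented for illustration and are not taken from QCOR.

```python
import pandas as pd

# Toy stand-in for facilities.csv; every value here is made up.
facs = pd.DataFrame({
    "state": ["CA", "CA", "CA", "NV"],
    "termination_code": [None, "Vol-Other", "Vol-Other", None],
    "termination_date": pd.to_datetime([None, "2019-11-12", "2015-01-01", None]),
})

# Same condition as the notebook: in California AND
# (never terminated OR terminated on/after 2017-03-01).
keep = (facs["state"] == "CA") & (
    facs["termination_code"].isnull()
    | (facs["termination_date"] >= "2017-03-01")
)
kept = facs.loc[keep]
```

Rows with a missing `termination_date` compare as `False` against the cutoff, so an active facility is retained only through the `isnull()` clause.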

In [11]:
len(ca_facs)

1166

In [12]:
ca_facs.head()

Unnamed: 0,name,provider_id,type,region,state,address,phone,particip_date,certified_beds,hospital_based,ownership_type,termination_code,termination_date
63,PORTERVILLE DEVELOPMENT CENTER - DP ICF/IID,05G014,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"26501 AVENUE 140\nPORTERVILLE, CA 93257",559 782-2222,1980-03-01,512,No,Government,,NaT
64,GOLDEN STATE CARE CENTER,05G015,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"1758 N BIG DALTON AVENUE\nBALDWIN PARK, CA 91706",626 962-3274,1978-12-01,155,No,For Profit,,NaT
65,HY-LOND GARDEN GROVE,05G017,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"9861 11TH STREET\nGARDEN GROVE, CA 92844",714 531-8741,1978-05-01,59,No,Government,Vol-Other,2019-11-12
66,EDGEWOOD CENTER,05G019,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"200 WEST PARAMOUNT\nAZUSA, CA 91702",626 334-7861,1978-10-01,45,No,For Profit,,NaT
67,GLENRIDGE CENTER #140,05G020,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"611 SOUTH CENTRAL AVENUE\nGLENDALE, CA 91204",818 637-7727,1978-11-01,59,No,For Profit,,NaT


There is one facility that was opened after BuzzFeed News scraped the data from the Cal Health Find Database. Further research determined it is not owned by BrightSpring.

In [13]:
missing_facs = ca_facs.loc[lambda x: ~x["provider_id"].isin(ca_state["HFCISFacilityProviderNumber"])]

In [14]:
missing_facs.loc[
    lambda x: ~x["provider_id"].isin(historical_ca_state["PROVIDER_ID"]) & ~x["termination_code"].isnull()
]

Unnamed: 0,name,provider_id,type,region,state,address,phone,particip_date,certified_beds,hospital_based,ownership_type,termination_code,termination_date
253,"LOYD'S LIBERTY HOMES, INC - SAN JOSE",05G400,Intermediate Care Facilities for Individuals w...,(IX) San Francisco,CA,"3567 SAN JOSE AVENUE\nMERCED, CA 95340",209 384-5833,1989-11-16,6,No,For Profit,"Vol-Merg, Close",2021-05-03
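
The two cells above amount to an anti-join: keep the QCOR rows whose `provider_id` does not appear in another source. As an alternative to `~isin(...)`, the same check can be sketched with `merge(..., indicator=True)`. The helper name `anti_join` and the toy IDs below are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Toy stand-ins for ca_facs and ca_state; IDs are made up.
facs = pd.DataFrame({"provider_id": ["A1", "A2", "A3"], "name": ["x", "y", "z"]})
state = pd.DataFrame({"HFCISFacilityProviderNumber": ["A1", "A3"]})


def anti_join(left, right_ids, key="provider_id"):
    """Return rows of `left` whose key is absent from `right_ids` (hypothetical helper)."""
    # De-duplicate the lookup keys so the left merge cannot multiply rows.
    lookup = pd.DataFrame({key: pd.Series(right_ids).dropna().unique()})
    merged = left.merge(lookup, on=key, how="left", indicator=True)
    # "_merge" is "left_only" for rows with no match on the right.
    return merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")


missing = anti_join(facs, state["HFCISFacilityProviderNumber"])
```

Unlike the `isin` version, the merge resets the row index, so the `isin` form is preferable when the original index needs to be preserved.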


## Merge the datasets and create the final set

In [15]:
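def find_owner(row):
    # Prefer the owner name already cleaned from the Cal Health Find data.
    if pd.notnull(row["clean_owner"]):
        return row["clean_owner"]
    # These licensee names are treated as BrightSpring.
    if row["Licensee name"] in ["Res-Care California, Inc.", "RSCR California, Inc.", "RSCR INLAND, INC."]:
        return "BrightSpring"
    # Otherwise fall back to the licensee name as given by the state.
    return row["Licensee name"]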

In [16]:
combined_state_owner_info = (
    pd
    .concat([
        ca_state[
            ["HFCISFacilityProviderNumber", "clean_owner"]
        ]
        .rename(columns={"HFCISFacilityProviderNumber": "provider_id"}),
        other_ca_state[
            ["provider_id", "Licensee name", "Closed Date"]
        ],
        historical_ca_state[
            ["PROVIDER_ID", "BUSINESS_NAME", "FAC_CLOSURE_DATE"]
        ]
        .rename(columns={
            "PROVIDER_ID": "provider_id", 
            "BUSINESS_NAME": "Licensee name", 
            "FAC_CLOSURE_DATE": "Closed Date"
        })
    ])
    .assign(owner = lambda df: df.apply(find_owner, axis=1))
    [["provider_id", "owner", "Closed Date"]]
    .drop_duplicates(subset="provider_id")
)
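
Because `pd.concat` stacks the three sources in order and `drop_duplicates` keeps the first occurrence by default, the Cal Health Find rows take priority over the missing-facility list, which in turn takes priority over the historical closures file. The sketch below shows that precedence on invented data. The frame names and owner strings are hypothetical.

```python
import pandas as pd

# Three toy sources listed in priority order; all values are made up.
scraped = pd.DataFrame({"provider_id": ["P1"], "owner": ["Scraped Owner"]})
missing_list = pd.DataFrame(
    {"provider_id": ["P1", "P2"], "owner": ["List Owner", "List Owner 2"]}
)
historical = pd.DataFrame(
    {"provider_id": ["P2", "P3"], "owner": ["Hist Owner", "Hist Owner 3"]}
)

combined = (
    pd.concat([scraped, missing_list, historical], ignore_index=True)
    # keep="first" is the default; spelled out here to make the priority explicit.
    .drop_duplicates(subset="provider_id", keep="first")
)
owners = dict(zip(combined["provider_id"], combined["owner"]))
```

One side effect worth keeping in mind: when a provider appears in an earlier source, the whole later row is discarded, including any `Closed Date` it carried.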

In [17]:
def clean_closed_date(row):
    if not pd.isnull(row["Closed Date"]):
        return row["Closed Date"]
    elif not pd.isnull(row["termination_date"]):
        return row["termination_date"]
    else:
        return None
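
The row-wise function above prefers the state's closed date and falls back to the QCOR termination date. Assuming both columns hold datetime values, the same rule could be sketched without `apply` by using `Series.fillna`. The data below is invented for illustration.

```python
import pandas as pd

# Toy frame with the two date columns; values are made up.
df = pd.DataFrame({
    "Closed Date": pd.to_datetime(["2020-05-01", None, None]),
    "termination_date": pd.to_datetime(["2020-06-15", "2019-11-12", None]),
})

# Take "Closed Date" where present, otherwise "termination_date".
resolved = df["Closed Date"].fillna(df["termination_date"])
```

Rows where both dates are missing stay missing (`NaT`), which matches the `None` branch of `clean_closed_date`.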

In [18]:
# Final data for analysis
final_ca_ownership = (
    ca_facs
    .merge(
        combined_state_owner_info,
        how="left",
        on="provider_id"
    )
    .assign(termination_date = lambda df: df.apply(clean_closed_date, axis=1))
    .assign(is_brightspring = lambda df: df["owner"] == "BrightSpring")
    .rename(columns={"owner": "legal_owner"})
)

In [19]:
len(final_ca_ownership)

1166
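
The row count matches the 1,166 facilities loaded from QCOR, which indicates the left merge did not duplicate any rows. A stricter way to guard against that, sketched here on invented data, is the `validate` argument of `DataFrame.merge`, which raises a `MergeError` if the right-hand keys are not unique.

```python
import pandas as pd

# Toy stand-ins for ca_facs and combined_state_owner_info; values are made up.
facs = pd.DataFrame({"provider_id": ["P1", "P2"], "name": ["a", "b"]})
owners = pd.DataFrame({"provider_id": ["P1"], "owner": ["Owner 1"]})

# Passes: each provider_id appears at most once on the right.
merged = facs.merge(owners, on="provider_id", how="left", validate="many_to_one")

# Fails: a duplicated key on the right would silently add rows without validate.
dup_owners = pd.concat([owners, owners], ignore_index=True)
try:
    facs.merge(dup_owners, on="provider_id", how="left", validate="many_to_one")
    raised = False
except pd.errors.MergeError:
    raised = True
```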

In [20]:
# output for analysis
final_ca_ownership[[
    "name", "provider_id", "type", "region", "state", "address",
    "phone", "particip_date", "certified_beds",
    "hospital_based", "ownership_type", "termination_code",
    "termination_date", "legal_owner", "is_brightspring"
]].to_csv("../../data/ownership/final/ca.csv", index=False)
