### Table of Contents
* [Imports etc](#imports)
* [What kind of behavioral crises occur](#kind)
* [What are the trends for different crises](#trends)
* [What is the ambulance use/frequency of people who have behavioral crises](#frequency)

In [None]:
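import sys

# Explicit imports for the names used throughout this notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib_venn import venn3
from sqlalchemy import create_engine

sys.path.append('../src/')
# Project helpers (get_database_connection is expected to come from here)
from helpers.helpers import *

db_conn = get_database_connection()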

In [None]:

def get_county_info(county):
    '''
    Return the county-specific table and column names used to access the data later.
    '''
    if county == 'johnson':
        table_name = 'jocomedactincidents'
        id_n = 'joid'
        case_id = 'id'
        date_name = 'incidentdate'
    elif county == 'douglas':
        table_name = 'joco110hsccclientmisc2eaimpression'
        id_n = 'clientid'
        case_id = 'clientmiscid'
        date_name = 'timeincident'
    else:
        # Fail early instead of hitting an UnboundLocalError at the return below
        raise ValueError(f"Unknown county: {county!r} (expected 'johnson' or 'douglas')")

    return table_name, id_n, case_id, date_name

In [None]:
def get_all_data(county) :
    
    ''' Get the main ambulance data for a given county.
    For Johnson County, the ambulance table must be joined to the client table to get the joid for each row,
    because the per-person identifier in the ambulance table (hash_rcdid) is not actually one per individual.
    Some analyses in this notebook look at the number of calls per person, so the joid is needed.
    The per-person identifier in the Douglas County table is unique, so no join is necessary.
    '''
    
    table_name, id_n, case_id, date_name = get_county_info(county)
    
    
    
    query_begin_string = """
    
        select *,
        case when (suicidal_flag = true 
            or drug_flag = true) then true 
            else false end as suicide_or_drug,
        case when (suicidal_flag = true 
            or drug_flag = true
            or other_mental_crisis_flag = true
            or alcohol_flag= true
            ) then true 
            else false end as any_behavioral_crisis"""
    
    if county == 'johnson' :
        q = f"""
        {query_begin_string},
        encode(hash_rcdid, 'hex') as hash_rcdid2
        from (select
        ambulance.*, 
        client.joid
        from clean.{table_name} ambulance
        join clean.jocojococlient client
        on client.hash_sourceid = ambulance.hash_rcdid
        ) joined
        
        """
        

    elif county == 'douglas' :
        q = f"""
                {query_begin_string}
                from clean.{table_name}

        """
    
    all_ambulance_runs = pd.read_sql(q, db_conn)
    return all_ambulance_runs

all_ambulance_runs_jc = get_all_data('johnson')
all_ambulance_runs_dc = get_all_data('douglas')

### What kind of behavioral crises occur
<a id='kind'></a>
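
The loading query above flags a call as `any_behavioral_crisis` when at least one of the four crisis flags is true, and as `suicide_or_drug` when either of the first two is. A minimal pandas sketch of the same logic, assuming made-up rows (not real data) and treating a missing flag as false, as the SQL `else false` branch does:

```python
import pandas as pd

# Hypothetical toy rows; the real flag columns live in the clean.* tables.
calls = pd.DataFrame({
    "suicidal_flag": [True, False, None, False],
    "drug_flag": [False, False, True, None],
    "alcohol_flag": [False, False, False, None],
    "other_mental_crisis_flag": [False, True, None, False],
})

flag_cols = ["suicidal_flag", "drug_flag", "alcohol_flag", "other_mental_crisis_flag"]

# `== True` is False for missing values, mirroring `flag = true` in the SQL CASE.
flags = calls[flag_cols].eq(True)

calls["suicide_or_drug"] = flags[["suicidal_flag", "drug_flag"]].any(axis=1)
calls["any_behavioral_crisis"] = flags.any(axis=1)

print(calls[["suicide_or_drug", "any_behavioral_crisis"]])
```

The last row has no flag set (only false or missing values), so both derived columns are false for it.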

##### explore different types of overlap
We want to investigate how many calls there are for different kinds of mental health crises, and how they overlap (e.g. how many calls are both suicide attempts and overdoses).

**suicidal_id** = every call relating to suicide. This includes suicidal ideation and suicide attempts.

**suicide_attempt_id** = every call that is assigned as 'suicide attempt'.  
For instance, self-harm should be assigned as a suicide attempt. This should be a subset of suicidal_id.

**drug_id** = every call related to drugs (e.g. overdose or drug poisoning). Note that it currently also includes alcohol withdrawal calls.

**alcohol_id** = calls related to alcohol. This should be a subset of drug_id.

**other_mental_id** = calls for other mental health crises. This includes any call where the impression or chief complaint contains words like "psychiatric episode", "psychotic episode", "anxiety", or "depression".
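
The overlap between three of these categories can also be counted directly with set operations, which is a handy cross-check on the numbers printed in a Venn diagram. The sketch below uses made-up call ids, and `venn3_region_sizes` is a hypothetical helper, not part of `matplotlib_venn`:

```python
# Hypothetical call ids per category (illustrative only, not real data)
suicidal = {1, 2, 3, 4}
drug = {3, 4, 5, 6}
other_mental = {4, 6, 7}


def venn3_region_sizes(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "a_and_b_only": len((a & b) - c),
        "c_only": len(c - a - b),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "a_and_b_and_c": len(a & b & c),
    }


regions = venn3_region_sizes(suicidal, drug, other_mental)
print(regions)

# The seven regions partition the union, so their sizes add up to its size.
assert sum(regions.values()) == len(suicidal | drug | other_mental)
```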


In [None]:
def make_venn_data(county) :
    '''Build the id columns used later to draw Venn diagrams of how crisis categories overlap (e.g.
    whether an ambulance run was for a suicide attempt, an overdose, or both).
    A Venn diagram needs one collection of ambulance runs per category (e.g. suicide, overdose).
    This function keeps every run that matches at least one category and adds one column per category
    that holds the run's case_id when the category matches and null otherwise.
    A case_id that appears in more than one of these columns is counted as an overlapping run.
    '''
    table_name, id_n, case_id, date_name = get_county_info(county)

    q = f"""

     select *,
            case when suicidal_flag = true then {case_id}
                else null end as suicidal_id,
            case when suicide_attempt_flag = true then {case_id}
                else null end as suicide_attempt_id,
            case when drug_flag = true then {case_id}
                else null end as drug_id,
            case when other_mental_crisis_flag = true then {case_id}
                else null end as other_mental_id,
            case when alcohol_flag = true then {case_id}
                else null end as alcohol_id 
            from clean.{table_name}
            where suicidal_flag = true
                or suicide_attempt_flag = true
                or drug_flag = true
                or alcohol_flag = true
                or other_mental_crisis_flag = true
        """
    temp_crises = pd.read_sql(q, db_conn)
    
    return temp_crises

temp_crises_jc = make_venn_data('johnson')
temp_crises_dc = make_venn_data('douglas')

In [None]:
def make_venn(county, temp_crises, id1, id2, id3) :
    '''
    Use the per-category id columns (e.g. suicidal_id, drug_id) to draw a Venn diagram
    showing how many calls fall in each category and how many overlap.
    '''
    print(county.upper())
    # One set of case ids per category; nulls mark calls outside that category
    id_sets = [set(temp_crises[col].dropna()) for col in (id1, id2, id3)]

    venn3(id_sets, set_labels=(id1, id2, id3))
    plt.show()

#### What is the overlap between suicidal-related calls, suicide attempts, and drug-related calls?
Suicide attempts should be a subset of suicide-related calls (suicidal_id also includes 'suicidal ideation' calls).
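
Whether the expected subset relationship actually holds can be checked by listing the calls that break it. This is a minimal sketch on made-up rows; `subset_violations` and the `case_id` column name are hypothetical and only used for illustration:

```python
import pandas as pd


def subset_violations(df, child_flag, parent_flag):
    """Return the rows where child_flag is set but parent_flag is not."""
    child = df[child_flag].eq(True)
    parent = df[parent_flag].eq(True)
    return df[child & ~parent]


# Hypothetical toy rows (not real data): call 104 is an attempt without the suicidal flag.
crises = pd.DataFrame({
    "case_id": [101, 102, 103, 104],
    "suicidal_flag": [True, True, False, False],
    "suicide_attempt_flag": [True, False, False, True],
})

violations = subset_violations(crises, "suicide_attempt_flag", "suicidal_flag")
print(violations["case_id"].tolist())  # -> [104]
```

An empty result means the child category is a true subset of the parent. The same helper applies to any other pair of flags that is expected to nest.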

In [None]:
make_venn('johnson', temp_crises_jc, 'suicidal_id', 'suicide_attempt_id', 'drug_id')
make_venn('douglas', temp_crises_dc, 'suicidal_id', 'suicide_attempt_id', 'drug_id')

#### What is the overlap between suicidal-related calls, drug-related calls, and alcohol related calls?
alcohol_id should be a subset of drug_id

In [None]:
make_venn('johnson', temp_crises_jc, 'suicidal_id', 'drug_id', 'alcohol_id')
make_venn('douglas', temp_crises_dc, 'suicidal_id', 'drug_id', 'alcohol_id')

#### What is the overlap between suicidal-related calls, drug-related calls, and other mental health crises?



In [None]:
make_venn('johnson', temp_crises_jc, 'suicidal_id', 'drug_id', 'other_mental_id')
make_venn('douglas', temp_crises_dc, 'suicidal_id', 'drug_id', 'other_mental_id')

### What are the trends for different crises
<a id='trends'></a>
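
The trend tables in this section report each crisis type both as a raw count and as a percentage of all ambulance calls in the same period. The sketch below shows that percentage calculation on made-up yearly counts (the numbers are illustrative, not results). It maps each row to its year's total instead of merging the table with itself, which is an alternative to the approach used in the functions that follow:

```python
import pandas as pd

# Hypothetical yearly counts in the wide shape returned by the SQL query
counts = pd.DataFrame({
    "year": [2018, 2019],
    "all_calls": [200, 250],
    "all_suicidal": [10, 20],
    "all_drug": [30, 25],
})

# Long format: one row per (year, call_type)
long = counts.melt(id_vars="year", var_name="call_type", value_name="count")

# Look up the total number of calls for each row's year
totals = counts.set_index("year")["all_calls"]
long["percent"] = long["count"] / long["year"].map(totals) * 100

print(long.sort_values(["year", "call_type"]))
```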

In [None]:
def get_data_by_year(county): 
    '''To track the trends over time, this function gets the data grouped by each year. 
    '''
    table_name, id_n, case_id, date_name = get_county_info(county)

    q = f"""

    select
    extract(year from {date_name}) as year,
    count(*) as all_calls,
    count(case when suicidal_flag=true then true else null end) as all_suicidal,
    count(case when drug_flag=true then true else null end) as all_drug,
    count(case when alcohol_flag=true then true else null end) as all_alcohol,
    count(case when other_mental_crisis_flag=true then true else null end) as all_other_mental
    from clean.{table_name} 
    group by year
    order by year

    """

    ambulance_runs_over_time = pd.read_sql(q, db_conn)
    ambulance_runs_over_time = ambulance_runs_over_time.melt(id_vars = "year", var_name = 'call_type', value_name = "count")
    ambulance_runs_over_time['year'] = ambulance_runs_over_time['year'].astype(int)


    all_calls_count = ambulance_runs_over_time[ambulance_runs_over_time['call_type'] == 'all_calls']

    result = pd.merge(ambulance_runs_over_time, all_calls_count, on = 'year').drop(columns = ['call_type_y'])
    result['percent'] = result['count_x'] / result['count_y'] * 100
    result = result.drop(columns = 'count_y').rename(columns = {'call_type_x': 'call_type', 'count_x': 'count'}).sort_values('year')


    return result

ambulance_by_year_dc = get_data_by_year('douglas')
ambulance_by_year_jc = get_data_by_year('johnson')

In [None]:
## get data by year and month
def get_data_by_month(county):
    '''
    To track the trends over time, this function gets the data grouped by each month. 
    '''
    table_name, id_n, case_id, date_name = get_county_info(county)

    q = f"""
    select
    extract(year from {date_name}) as year,
    extract(month from {date_name}) as month,
    count(*) as all_calls,
    count(case when suicidal_flag=true then true else null end) as all_suicidal,
    count(case when drug_flag=true then true else null end) as all_drug,
    count(case when alcohol_flag=true then true else null end) as all_alcohol,
    count(case when other_mental_crisis_flag=true then true else null end) as all_other_mental
    from clean.{table_name} 
    group by year, month
    order by year, month
    """
    ambulance_runs_over_time = pd.read_sql(q, db_conn)
    ambulance_runs_over_time = ambulance_runs_over_time.melt(id_vars = ["year", "month"],  var_name = 'call_type', value_name = "count")
    ambulance_runs_over_time['year'] = ambulance_runs_over_time['year'].astype(int)
    ambulance_runs_over_time['month'] = ambulance_runs_over_time['month'].astype(int)


    all_calls_count = ambulance_runs_over_time[ambulance_runs_over_time['call_type'] == 'all_calls']

    result = pd.merge(ambulance_runs_over_time, all_calls_count, on = ['year', 'month']).drop(columns = ['call_type_y'])
    result['percent'] = result['count_x'] / result['count_y'] * 100
    result = result.drop(columns = 'count_y').rename(columns = {'call_type_x': 'call_type', 'count_x': 'count'}).sort_values(['year', 'month'])
    
    return result

ambulance_by_month_dc = get_data_by_month('douglas')
ambulance_by_month_jc = get_data_by_month('johnson')

In [None]:
def plot_trends(county, data_year, data_month) :
    ''' Plot the trends of ambulance calls over years and over months
    Do this for both the absolute counts, and for the percentage of all ambulance calls
    '''
    #YEAR:
    
    #plot all counts with "all calls" across years
    sns.barplot(x= "year", y = "count", hue = "call_type", data = data_year)
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)

    plt.title(f'{county.capitalize()} County: Ambulance Calls Over Years')
    plt.show()


    #plot just mental crises across years
    data_year_without_total = data_year[data_year["call_type"] != 'all_calls']

    sns.barplot(x= "year", y = "count", hue = "call_type", data = data_year_without_total, palette = 'husl')
    plt.title(f'{county.capitalize()} County: Ambulance Calls for Behavioral Crises Over Years')

    plt.show()

    sns.barplot(x= "year", y = "percent", hue = "call_type", data = data_year_without_total, palette = 'husl')
    plt.title(f'{county.capitalize()} County: Percent of Ambulance Calls for Behavioral Crises Over Years')

    plt.show()
    
    #MONTHS:
    
    data_month_without_total = data_month[data_month["call_type"] != 'all_calls']

    sns.barplot(x= "call_type", y = "count", hue = "month", data = data_month_without_total, palette = 'husl')
    plt.title(f'{county.capitalize()} County: Ambulance Calls for Behavioral Crises Over Months')

    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)

    plt.show()

    sns.barplot(x= "call_type", y = "percent", hue = "month", data = data_month_without_total, palette = 'husl')
    plt.title(f'{county.capitalize()} County: Percent of Ambulance Calls for Behavioral Crises Over Months')

    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0)

    plt.show()

In [None]:
#PLOT TRENDS FOR JOHNSON COUNTY
plot_trends('johnson', ambulance_by_year_jc, ambulance_by_month_jc)

In [None]:
#PLOT TRENDS FOR DOUGLAS COUNTY
plot_trends('douglas', ambulance_by_year_dc, ambulance_by_month_dc)

### What is the ambulance use/frequency of people who have behavioral crises
<a id='frequency'></a>
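
In this section a person counts as having had a behavioral crisis if any of their per-type call counts is non-zero. A minimal vectorized sketch of that rule, assuming made-up per-person counts and a hypothetical `person_id` column, which avoids a row-by-row `apply`:

```python
import pandas as pd

# Hypothetical per-person call counts (not real data)
per_person = pd.DataFrame({
    "person_id": ["a", "b", "c"],
    "suicidal_num": [0, 2, 0],
    "suicide_num": [0, 1, 0],
    "drug_num": [0, 0, 0],
    "alcohol_num": [0, 0, 3],
    "other_num": [0, 0, 0],
})

count_cols = ["suicidal_num", "suicide_num", "drug_num", "alcohol_num", "other_num"]

# True when at least one crisis-type count is non-zero for that person
per_person["crisis"] = per_person[count_cols].ne(0).any(axis=1)

print(per_person[["person_id", "crisis"]])
```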

In [None]:
def label_crises(row):
    '''Return True if a row of per-person call counts includes at least one behavioral crisis call.
    '''
    crisis_columns = ['suicidal_num', 'suicide_num', 'drug_num', 'alcohol_num', 'other_num']
    return any(row[col] != 0 for col in crisis_columns)
        
        
def find_calls_per_person(county) :
    
    '''
    This function finds, for each person in the dataset, whether they are a person who
    has used the ambulance for some behavioral crises.
    '''
    table_name, id_n, case_id, date_name = get_county_info(county)





    if county == 'johnson' :
        
        q = f"""
        
        
        with ambulance_data as( 
            select ambulance.*, 
            client.joid from clean.{table_name} ambulance 
            join clean.jocojococlient client 
            on client.hash_sourceid = ambulance.hash_rcdid 
        )
        
        select 
            joid, 
                count(CASE WHEN suicidal_flag THEN 1 end) as suicidal_num, 
                count(CASE WHEN suicide_attempt_flag THEN 1 end) as suicide_num, 
                count(CASE WHEN drug_flag THEN 1 end) as drug_num, 
                count(CASE WHEN alcohol_flag THEN 1 end) as alcohol_num, 
                count(CASE WHEN other_mental_crisis_flag THEN 1 end) as other_num 
                from ambulance_data 
                group by joid
        
        """

    elif county == 'douglas' :
        q = f"""

        select
        clientid,
        count(CASE WHEN suicidal_flag THEN 1 end) as suicidal_num,
        count(CASE WHEN suicide_attempt_flag THEN 1 end) as suicide_num,
        count(CASE WHEN drug_flag THEN 1 end) as drug_num,
        count(CASE WHEN alcohol_flag THEN 1 end) as alcohol_num,
        count(CASE WHEN other_mental_crisis_flag THEN 1 end) as other_num
        from clean.{table_name}
        group by clientid

        """
        
    people_call_crises = pd.read_sql(q, db_conn)




    people_call_crises['crisis'] = people_call_crises.apply(lambda row: label_crises(row), axis=1)

    # drop() returns a new frame, so the result has to be assigned back
    people_call_crises = people_call_crises.drop(columns = [ 'suicidal_num', 'suicide_num', 'drug_num', 'alcohol_num', 'other_num'])

    return people_call_crises

people_call_crises_jc = find_calls_per_person('johnson')
people_call_crises_dc = find_calls_per_person('douglas')

In [None]:
def num_calls_by_type(county, all_ambulance_data, call_data) :
    '''
    this function counts the # of ambulance calls per person, 
    divided by whether the person is one who has ever used the ambulance for
    a behavioral health crisis or not. 
    '''
    table_name, id_n, case_id, date_name = get_county_info(county)

    new = pd.merge(all_ambulance_data, call_data, on= id_n)

    new = pd.DataFrame(new.groupby([id_n, 'crisis']).size()).reset_index()
    new.columns = ['id', 'crisis', 'num_calls']
    new = new.sort_values('num_calls',ascending = False)

    sns.stripplot(y = 'num_calls', x = 'crisis', data = new, palette = 'husl')
    plt.xlabel('Has the person ever called for a mental health crisis?')
    plt.ylabel('Number of Ambulance Calls')
    plt.title(f"{county.capitalize()} County: # Ambulance Calls Per Person")
    plt.show()

num_calls_by_type('johnson', all_ambulance_runs_jc, people_call_crises_jc)
num_calls_by_type('douglas', all_ambulance_runs_dc, people_call_crises_dc)

The strip plots suggest that the highest utilizers of the ambulance may be people who have used it at least once for a behavioral health crisis.
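
A numeric summary per group is one way to back up a visual impression like this. The sketch below assumes a frame shaped like the one built in `num_calls_by_type` (columns `id`, `crisis`, `num_calls`) and uses made-up values, so the numbers it prints are illustrative only and not findings:

```python
import pandas as pd

# Hypothetical calls-per-person table (not real data)
calls_per_person = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "crisis": [True, True, True, False, False, False],
    "num_calls": [12, 3, 40, 1, 2, 5],
})

# Compare the distribution of calls per person between the two groups
summary = (
    calls_per_person
    .groupby("crisis")["num_calls"]
    .agg(["count", "median", "max"])
    .reset_index()
)

print(summary)
```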