# Family and Children's Services Crisis Project by DataCampers
### Objective 2: Analyze trends in call content by finding which issues are most frequent and the average count of issues per call.
- *For the second objective, the columns that contain the information about the documented issues are those that begin with CRISIS Issues. When calls come in, call takers use a form to indicate the various issues the individual is experiencing.*
- *If an individual is experiencing multiple issues, the issues can be grouped together into a single cell, which makes understanding individual issues difficult. For this project you will parse the data from the different CRISIS Issues columns to allow deeper investigation of each individual issue.*
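A minimal sketch of the parsing step described above, assuming `;` is the delimiter between issues within a cell. The toy Series and its values are illustrative stand-ins, not rows from the dataset, and `issue_counts` is a hypothetical name. If `/` should also split issues, the split pattern would need to change accordingly.

```python
import pandas as pd

# Toy data standing in for one CRISIS Issues column (values are illustrative)
cells = pd.Series(
    ["Anxious/Stressed; Holiday Stress", "Holiday Stress", None, "Lonely ; Anxious/Stressed"],
    name="CRISIS Issues - Emotional State",
)

# Split each non-null cell on ';', put every issue on its own row, trim whitespace
issues = cells.dropna().str.split(";").explode().str.strip()

# Frequency of each individual issue
issue_counts = issues.value_counts()
print(issue_counts)
```

Exploding to one issue per row keeps the original call index on every row, so the parsed issues can still be joined back to call-level columns.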

## Self EDA

#### Read in the relevant data

In [1]:
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import requests

In [2]:
df1 = pd.read_excel('../data/NSSCrisisv_1.xlsx') # 2020-2021
df2 = pd.read_excel('../data/NSSCrisisv_2.xlsx') # 2022
# df_2020 = pd.read_excel('../data/2020callcenter.xlsx')
# df_2021 = pd.read_excel('../data/2021callcenter.xlsx')
# df_2022 = pd.read_excel('../data/2022callcenter.xlsx')

In [15]:
# Rob: Change name of CRISIS Subjective in v2 to CRISIS Issues to match v1
df2.columns = df2.columns.str.replace('Subjective', 'Issues', regex=False)

In [16]:
# Rob: Drops columns that contain only NaN
df1.dropna(how='all', axis=1, inplace=True)
df2.dropna(how='all', axis=1, inplace=True)

In [None]:
# Rob: Drop columns that are more than 98% NaN
# pct_null = df1.isnull().sum() / len(df1)
# missing_features = pct_null[pct_null > 0.98].index
# df1.drop(missing_features, axis=1, inplace=True)

In [None]:
# Same 98% threshold applied to df2
# pct_null = df2.isnull().sum() / len(df2)
# missing_features = pct_null[pct_null > 0.98].index
# df2.drop(missing_features, axis=1, inplace=True)
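The commented-out cells above drop columns by their share of missing values. A minimal sketch of the same idea as a reusable helper that returns a new frame instead of modifying in place; `drop_mostly_null_columns` and the toy frame are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def drop_mostly_null_columns(df, threshold=0.98):
    """Return a copy of df without columns whose share of NaN exceeds threshold."""
    pct_null = df.isnull().mean()  # fraction of missing values per column
    to_drop = pct_null[pct_null > threshold].index
    return df.drop(columns=to_drop)

# Toy frame: 'mostly_empty' is 99% NaN, 'full' has no NaN
toy = pd.DataFrame({
    "full": range(100),
    "mostly_empty": [1.0] + [np.nan] * 99,
})
cleaned = drop_mostly_null_columns(toy)
print(list(cleaned.columns))
```

Computing the null share from the same frame that is being filtered avoids applying one frame's statistics to another.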

In [17]:
# Maggie: Find the columns that are the same in both dfs
samecolumnsdf = pd.DataFrame(df1.columns.intersection(df2.columns))
samecolumnsdf

Unnamed: 0,0
0,CallReportNum
1,ReportVersion
2,LinkedToCallReportNum
3,CallDateAndTimeStart
4,CallDateAndTimeEnd
5,CallLength
6,CallerNum
7,PhoneWorkerNum
8,PhoneWorkerName
9,PostalCode


## Group EDA
We are using these columns in addition to the CRISIS Issues columns:
- CallReportNum, CallDateAndTimeStart, CallDateAndTimeEnd, CallLength, CallerNum

In [25]:
# Rob: List comprehension that selects the CRISIS Issues columns plus every column starting with 'Call' (this also picks up CallersFeedback)
filter_col1 = [col for col in df1 if col.startswith('CRISIS Issues') or col.startswith('Call')]
df1_ci = df1[filter_col1]

In [26]:
# Rob: Same selection for df2: CRISIS Issues columns plus every column starting with 'Call'
filter_col2 = [col for col in df2 if col.startswith('CRISIS Issues') or col.startswith('Call')]
df2_ci = df2[filter_col2]

In [27]:
all_ci = pd.concat([df1_ci, df2_ci])
all_ci

Unnamed: 0,CallReportNum,CallDateAndTimeStart,CallDateAndTimeEnd,CallLength,CallerNum,CallersFeedback,CRISIS Issues - Abusive Behavior,CRISIS Issues - Emotional State,CRISIS Issues - Financial/Basic Needs,CRISIS Issues - Health/Physical,CRISIS Issues - Homicide,CRISIS Issues - Information or Services Needed,CRISIS Issues - Mental Health,CRISIS Issues - No Issue Call,CRISIS Issues - Other Description,CRISIS Issues - Relationships,CRISIS Issues - Substances,CRISIS Issues - Suicide,CRISIS Issues - No presenting problems
0,96627022,2021-12-31 23:35:00,2021-12-31 23:58:00,23,-1,,,Anxious/Stressed; Financial Stress,Employment/Job Placement,,,,Depression ; Anxiety/Panic,,Politics,Male-Female,,,
1,96626934,2021-12-31 23:29:00,2021-12-31 23:35:00,6,-1,,,Anxious/Stressed; Holiday Stress,,,,,Anxiety/Panic,,,,,,
2,96626654,2021-12-31 23:07:00,2021-12-31 23:28:00,21,1030262,,,Holiday Stress,,,,,Grief,,,Peer Group/Friend; Therapist/Medical Doctor,,,
3,96626370,2021-12-31 22:45:00,2021-12-31 22:55:00,10,2479348,,,Anxious/Stressed,,,,,Anxiety/Panic,,,Neighbor/Landlord; Animal/Pet,,,
4,96625909,2021-12-31 22:15:00,2021-12-31 22:44:00,29,-1,,,,Employment/Job Placement,,,,Depression ; Anxiety/Panic; Medical Related An...,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10643,96673255,2022-01-01 20:34:00,2022-01-01 20:58:00,24,-1,,,,,,,,,,,,,,
10644,96645142,2022-01-01 18:27:00,2022-01-01 19:12:00,45,-1,,,,,,,,,,,,,,
10645,96640741,2022-01-01 14:44:00,2022-01-01 15:19:00,35,-1,,,,,,,,,,,,,,
10646,96849680,2022-01-01 12:09:00,2022-01-01 12:10:00,1,-1,,,,,,,,,,,,,Information about SOSL support group,


Refresher: the second half of the objective asks **what is the average count of issues per call.**

Example CRISIS column cell: A/B; C; D
- There are 4 issues within that cell
- Issues are separated by either a / or a ;
- Therefore the count of issues in any non-empty cell is the count of ('/|;') + 1, so a cell with no separator holds exactly 1 issue
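The counting rule above can be sketched on a toy Series. The values are made up for illustration, and treating empty cells as zero issues is an assumption rather than something the notebook states.

```python
import pandas as pd

# Toy cells; None stands for a call with nothing recorded in this column
cells = pd.Series(["A/B; C; D", "E", None, "F; G"])

# Separators + 1 gives the number of issues in each non-null cell
issues_per_cell = cells.str.count(r"[/;]") + 1

# Null cells stay NaN after str.count, so treat them as zero issues (assumption)
issues_per_cell = issues_per_cell.fillna(0).astype(int)
print(issues_per_cell.tolist())  # [4, 1, 0, 2]
```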

## ROB'S CODE TO COUNT ISSUES (Abusive Behavior)

In [47]:
ab = all_ci['CRISIS Issues - Abusive Behavior'].notnull()

In [49]:
ab = all_ci[ab]

In [50]:
ab.head()

Unnamed: 0,CallReportNum,CallDateAndTimeStart,CallDateAndTimeEnd,CallLength,CallerNum,CallersFeedback,CRISIS Issues - Abusive Behavior,CRISIS Issues - Emotional State,CRISIS Issues - Financial/Basic Needs,CRISIS Issues - Health/Physical,CRISIS Issues - Homicide,CRISIS Issues - Information or Services Needed,CRISIS Issues - Mental Health,CRISIS Issues - No Issue Call,CRISIS Issues - Other Description,CRISIS Issues - Relationships,CRISIS Issues - Substances,CRISIS Issues - Suicide,CRISIS Issues - No presenting problems
16,96622241,2021-12-31 18:35:00,2021-12-31 18:50:00,15,-1,,Bullying,Anger/Hostility; Anxious/Stressed,,,,,,,,Family; Conflict With Other,,,
27,96615571,2021-12-31 14:35:00,2021-12-31 15:04:00,29,-1,,Abuse/Neglect of Spouse/Partner; Adult Abused ...,Anxious/Stressed; Parenting Stress,,,,,Anxiety/Panic,,,Family; Male-Female,,,
68,96595820,2021-12-30 21:40:00,2021-12-30 21:49:00,9,-1,,Abuse/Neglect of Spouse/Partner,Anxious/Stressed,,,,,Anxiety/Panic,,,Custody Issues; Marital/Divorce,,,
74,96594605,2021-12-30 20:39:00,2021-12-30 20:59:00,20,-1,,Rape/Sexual Abuse,Anxious/Stressed,,,,,Depression ; Anxiety/Panic; Grief due to Suicide,,,Blended Family; Family,,Suicide History/Previous Attempts; CURRENT THO...,
83,96592609,2021-12-30 19:16:00,2021-12-30 20:43:00,87,-1,,Adult Abused as a Child,Anxious/Stressed; Lonely ; Sad/Depressed; Over...,,,,,Moral/Religious Issues; Trauma/PTSD,,,Family; Male-Female; Peer Group/Friend; Confli...,,,


In [51]:
count = []

# Issues in a cell are separated by ';' or '/', so issues = separators + 1
for x in ab['CRISIS Issues - Abusive Behavior']:
    separators = x.count(';') + x.count('/')
    count.append(separators + 1)

In [52]:
# Take an explicit copy first so the assignment does not trigger SettingWithCopyWarning
ab = ab.copy()
ab['AB_Issues_Count'] = count
ab.head()


Unnamed: 0,CallReportNum,CallDateAndTimeStart,CallDateAndTimeEnd,CallLength,CallerNum,CallersFeedback,CRISIS Issues - Abusive Behavior,CRISIS Issues - Emotional State,CRISIS Issues - Financial/Basic Needs,CRISIS Issues - Health/Physical,CRISIS Issues - Homicide,CRISIS Issues - Information or Services Needed,CRISIS Issues - Mental Health,CRISIS Issues - No Issue Call,CRISIS Issues - Other Description,CRISIS Issues - Relationships,CRISIS Issues - Substances,CRISIS Issues - Suicide,CRISIS Issues - No presenting problems,AB_Issues_Count
16,96622241,2021-12-31 18:35:00,2021-12-31 18:50:00,15,-1,,Bullying,Anger/Hostility; Anxious/Stressed,,,,,,,,Family; Conflict With Other,,,,1
27,96615571,2021-12-31 14:35:00,2021-12-31 15:04:00,29,-1,,Abuse/Neglect of Spouse/Partner; Adult Abused ...,Anxious/Stressed; Parenting Stress,,,,,Anxiety/Panic,,,Family; Male-Female,,,,4
68,96595820,2021-12-30 21:40:00,2021-12-30 21:49:00,9,-1,,Abuse/Neglect of Spouse/Partner,Anxious/Stressed,,,,,Anxiety/Panic,,,Custody Issues; Marital/Divorce,,,,3
74,96594605,2021-12-30 20:39:00,2021-12-30 20:59:00,20,-1,,Rape/Sexual Abuse,Anxious/Stressed,,,,,Depression ; Anxiety/Panic; Grief due to Suicide,,,Blended Family; Family,,Suicide History/Previous Attempts; CURRENT THO...,,2
83,96592609,2021-12-30 19:16:00,2021-12-30 20:43:00,87,-1,,Adult Abused as a Child,Anxious/Stressed; Lonely ; Sad/Depressed; Over...,,,,,Moral/Religious Issues; Trauma/PTSD,,,Family; Male-Female; Peer Group/Friend; Confli...,,,,1


In [53]:
sum_ab = ab['AB_Issues_Count'].sum()
sum_ab

6062

## MY CODE TO COUNT ISSUES (Abusive Behavior) = SAME AS ROB'S!!!

In [54]:
# Total number of abusive behavior issues
ab = all_ci['CRISIS Issues - Abusive Behavior'].str.count('/|;')+1
ab.sum()
# In the column CRISIS Issues - Abusive Behavior, there are 6062 issues (without the +1, each non-null cell would be missing 1)

6062.0
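To get from a per-column total to the average count of issues per call, one option is to apply the same separator rule to every CRISIS Issues column and sum across each row. This is a sketch under assumptions: the toy frame `calls` stands in for `all_ci`, its values are illustrative, and calls with no recorded issues are counted as zero rather than excluded.

```python
import pandas as pd

# Toy stand-in for all_ci: three calls, two CRISIS Issues columns (values are illustrative)
calls = pd.DataFrame({
    "CallReportNum": [1, 2, 3],
    "CRISIS Issues - Emotional State": ["Anxious/Stressed; Holiday Stress", None, "Lonely"],
    "CRISIS Issues - Mental Health": ["Grief", None, "Depression ; Anxiety/Panic"],
})

crisis_cols = [c for c in calls.columns if c.startswith("CRISIS Issues")]

# Per cell: separators + 1, with empty cells counted as zero issues
per_cell = calls[crisis_cols].apply(lambda s: s.str.count(r"[/;]") + 1).fillna(0)

# Per call: add up the cells in each row, then average over calls
issues_per_call = per_cell.sum(axis=1)
print(issues_per_call.tolist(), issues_per_call.mean())
```

Whether calls with zero issues belong in the denominator changes the average, so that choice is worth stating alongside any reported figure.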