## Medical Credit Balance

<font size="5">Reading Medical Credit Balance</font>

In [2]:
## As always, let's first import pandas
import pandas as pd

In [3]:
## Let's get ready to work on the medical credit balance
## First, we'll load the data and look at it
df_medical = pd.read_csv('balance_medical_services.csv')

## Printing
df_medical.head(5)

(index),Studio Name,Day of Issue Date,Day of Expire Date,Client ID,Full Name,Email,Phone #,Credit Description,Remaining Credits,Max date
0,"Pinecrest, FL","November 21, 2023","February 29, 2024",969062,Greg - Solano,greg.solano@gmail.com,3054317251,10 Ingredient Bundle,6,12/20/2023
1,"Pinecrest, FL","December 1, 2023","March 10, 2024",840386,Billy - Zanski,info@skinnybeatsdrums.com,8287682826,10 Ingredient Bundle,10,12/20/2023
2,"Pinecrest, FL","December 20, 2023","March 29, 2024",969062,Greg - Solano,greg.solano@gmail.com,3054317251,10 Ingredient Bundle,10,12/20/2023
3,"Pinecrest, FL","July 8, 2023","January 24, 2024",926215,Donny - Tse,donnychiang888@gmail.com,9173740964,100 Ingredient Bundle,41,12/20/2023
4,"Pinecrest, FL","August 23, 2023","March 10, 2024",231125,David - Adler,dadler@adlergroup.com,3055629241,100 Ingredient Bundle,89,12/20/2023


In [4]:
## Again, we can get rid of columns that we don't need
df_medical = df_medical.drop(columns=['Studio Name', 'Day of Issue Date', 'Day of Expire Date', 'Client ID', 'Email', 'Max date'])

## Printing
df_medical.head(5)

(index),Full Name,Phone #,Credit Description,Remaining Credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41
4,David - Adler,3055629241,100 Ingredient Bundle,89


In [5]:
## We were told to reach out to those that have 60% utilization of their pack
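## To start, we're going to have to extract the integers from the credit description because our tech team messed up the initial credits field
## pandas' str.extract() does the extraction with a regular expression; we also import Python's regex library, re, for its flags later on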
import re

## Let's make a new column and populate it with the integers
df_medical['initial_credits'] = df_medical['Credit Description'].astype(str).str.extract(r'(\d+)', expand=False)
## Printing
df_medical.head(10)

(index),Full Name,Phone #,Credit Description,Remaining Credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
5,Amanda - Fuentes,3053056025,30 Ingredient Bundle,19,30
6,Mario - Finamore,7866815056,30 Ingredient Bundle,20,30
7,Lissette - Aguila,7862952181,30 Ingredient Bundle,10,30
8,Marc - Anderson,9084163153,30 Ingredient Bundle,30,30
9,Megan - Hamann,8594217809,30 Ingredient Bundle,28,30


In [6]:
## As mentioned before, we are trying to find the ones who have 60% utilization of their packs
## We'll make a list first so we can make a dataframe out of it later
medical_pack_list = []

## The column names also have to be changed to snake_case so they are consistent and easy to reference (for example with row.full_name later)
df_medical = df_medical.rename(columns={
    'Full Name': 'full_name',
    'Phone #': 'phone',
    'Credit Description': 'credit_description',
    'Remaining Credits': 'remaining_credits',
})

## Printing
df_medical

(index),full_name,phone,credit_description,remaining_credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
...,...,...,...,...,...
100,Ingrid - Jimenez,7863696463,Signature IM Shot | 5 Pack,1,5
101,Beth - Erickson,3152866384,Signature IM Shot | 5 Pack,3,5
102,Manny - Ramirez,3059651626,Signature IM Shot | 5 Pack,5,5
103,Janet - Onate,9172911469,Signature IM Shot | 5 Pack,1,5


In [7]:
## There are going to be some discrepancies with this because some rows have missing values (for example, a credit description with no number in it leaves 'initial_credits' empty)
## Because of that, I'm going to go ahead and get rid of those rows, as they would mess up the calculations
df_medical = df_medical.dropna()

## Printing
df_medical

## As you'll see, we went from 105 rows to 94

(index),full_name,phone,credit_description,remaining_credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
...,...,...,...,...,...
100,Ingrid - Jimenez,7863696463,Signature IM Shot | 5 Pack,1,5
101,Beth - Erickson,3152866384,Signature IM Shot | 5 Pack,3,5
102,Manny - Ramirez,3059651626,Signature IM Shot | 5 Pack,5,5
103,Janet - Onate,9172911469,Signature IM Shot | 5 Pack,1,5
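A minimal sketch of where the missing values most likely come from, on hypothetical toy rows (the description "Consultation Credit" is made up and not from the real export): `str.extract()` returns NaN when a description contains no digits, and `dropna()` then removes that whole row. Passing `subset=` limits the check to the column you actually care about, so a blank in an unrelated column wouldn't cost you a client.

```python
import pandas as pd

# Hypothetical toy rows; "Consultation Credit" is made up and not from the real export
toy = pd.DataFrame({
    "credit_description": ["10 Ingredient Bundle", "Consultation Credit", "Base IV Drip | 3 Pack"],
    "remaining_credits": [6, 1, 1],
})

# With expand=False, str.extract returns a Series; a description with no digits yields NaN
toy["initial_credits"] = toy["credit_description"].str.extract(r"(\d+)", expand=False)
print(toy["initial_credits"].isna().tolist())  # [False, True, False]

# dropna() drops any row with a NaN in ANY column; subset= restricts the check to one column
cleaned = toy.dropna(subset=["initial_credits"]).copy()
print(len(toy), "->", len(cleaned))  # 3 -> 2
print(cleaned.index.tolist())  # [0, 2]
```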


In [8]:
## Checking types
df_medical.dtypes

full_name             object
phone                  int64
credit_description    object
remaining_credits      int64
initial_credits       object
dtype: object

In [9]:
## As you can see, the 'initial_credits' column is an object (it holds the text that str.extract() pulled out). We need it to be an int to perform calculations
## Since the values are already digit strings, a single astype(int) is enough to convert them
## dropna() handed us a filtered copy of the original DataFrame, so we make it an explicit copy first; otherwise pandas warns that "a value is trying to be set on a copy of a slice from a DataFrame" (SettingWithCopyWarning)
df_medical = df_medical.copy()
df_medical['initial_credits'] = df_medical['initial_credits'].astype(int)

## Checking type now
df_medical.dtypes


full_name             object
phone                  int64
credit_description    object
remaining_credits      int64
initial_credits        int32
dtype: object
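`astype(int)` raises a `ValueError` as soon as one value isn't a clean integer string. A hedged, more forgiving alternative is `pd.to_numeric(..., errors="coerce")`, which turns unparseable values into NaN so they can be inspected or dropped before casting. The values below are hypothetical (`"n/a"` stands in for a bad entry). Casting to `"int64"` explicitly also sidesteps the `int32` shown above, which is likely platform-dependent (for example, Windows with NumPy older than 2.0 maps `int` to a 32-bit integer).

```python
import pandas as pd

# Hypothetical extracted values; "n/a" stands in for a bad entry
raw = pd.Series(["10", "30", "n/a", "5"])

# errors="coerce" turns anything unparseable into NaN instead of raising
as_numbers = pd.to_numeric(raw, errors="coerce")
print(as_numbers.isna().sum())  # 1

# Drop the bad entry, then cast the remaining floats to a fixed-width integer type
clean_ints = as_numbers.dropna().astype("int64")
print(clean_ints.tolist())  # [10, 30, 5]
```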

<font size="5">Working with bad data</font>

Since the original dataset's "initial credits" column is bugged (which is why we made our own), and our own column is only correct for some of the rows, we're going to have to come up with a solution.
If we look at the data, we can see that some of the credits come from packs while others come from bundles. To automate this (and save us time in the future, in the event that this bug doesn't get fixed), I'm thinking we can make 2 separate sets:

1. One set that accounts for the purchases with the word "bundle" in the description
    - I am doing this because there are no other numbers in these credit descriptions. Knowing this, we can easily extract the number and perform calculations

2. One set that accounts for the purchases with the word "pack" in the description
    - If you look at the data, we have credit descriptions that read things like "mHbOT | 60 min | 10 Pack." With those, we won't be able to simply grab the first number
    - Using that example, our first extraction only takes the first number it finds, so it gives us 60 (the session length) instead of 10 (the pack size), and the calculations would be wrong
    - But using a regular expression (the same tool we used for the first set), we can grab the number right before the word "Pack," which is the number of credits they initially started with
    - We will then be able to perform the calculations
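The difference between the two patterns can be checked on descriptions taken from the data. This is a small sketch: `r"(\d+)"` is the first-number pattern from the extraction cell, and `r"(\d+)\s*Pack"` is the pattern used for the packs set. A bundle description has no "Pack" in it, so the second pattern returns NaN there, which is fine because the bundles set keeps the first pattern.

```python
import re
import pandas as pd

# Descriptions copied from the export
descriptions = pd.Series([
    "10 Ingredient Bundle",
    "mHbOT | 60 min | 10 Pack",
    "NAD+ IV Drip Therapy 750mg | 4 Pack",
])

# First run of digits anywhere in the text (what the initial_credits column used)
first_number = descriptions.str.extract(r"(\d+)", expand=False)
print(first_number.tolist())  # ['10', '60', '750']

# Digits immediately before the word "Pack", ignoring case
pack_size = descriptions.str.extract(r"(\d+)\s*Pack", flags=re.IGNORECASE, expand=False)
print(pack_size.tolist())  # [nan, '10', '4']
```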

<font size="5">Dataset #1 - Bundles</font>

In [10]:
## Let's take a look at our data once again
df_medical.head(25)

(index),full_name,phone,credit_description,remaining_credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
5,Amanda - Fuentes,3053056025,30 Ingredient Bundle,19,30
6,Mario - Finamore,7866815056,30 Ingredient Bundle,20,30
7,Lissette - Aguila,7862952181,30 Ingredient Bundle,10,30
8,Marc - Anderson,9084163153,30 Ingredient Bundle,30,30
9,Megan - Hamann,8594217809,30 Ingredient Bundle,28,30


In [11]:
## First we'll filter for the purchases whose description contains the word "bundle"
## The str.contains() method (with case=False, so capitalization doesn't matter) will help us with that
df_medical.loc[df_medical['credit_description'].str.contains('bundle', case=False)]

(index),full_name,phone,credit_description,remaining_credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
5,Amanda - Fuentes,3053056025,30 Ingredient Bundle,19,30
6,Mario - Finamore,7866815056,30 Ingredient Bundle,20,30
7,Lissette - Aguila,7862952181,30 Ingredient Bundle,10,30
8,Marc - Anderson,9084163153,30 Ingredient Bundle,30,30
9,Megan - Hamann,8594217809,30 Ingredient Bundle,28,30
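Under the hood, `str.contains()` produces a True/False mask that `.loc` uses to pick rows. A minimal sketch on a hypothetical mini-frame (the descriptions come from the export; the `None` row is made up): `case=False` matches any capitalization, and `na=False` treats a missing description as "no match" instead of putting NaN into the mask. The notebook already ran `dropna()`, so `na=False` is only a safeguard there. The pattern is treated as a regular expression by default (`regex=True`), which is harmless for a plain word like "bundle".

```python
import pandas as pd

# Hypothetical mini-frame; the None description is made up
df = pd.DataFrame({
    "credit_description": ["10 Ingredient Bundle", "Base IV Drip | 3 Pack", None],
    "remaining_credits": [6, 1, 2],
})

# case=False matches "Bundle", "bundle", "BUNDLE", ...
# na=False makes missing descriptions count as "no match"
is_bundle = df["credit_description"].str.contains("bundle", case=False, na=False)
print(is_bundle.tolist())  # [True, False, False]

bundles = df.loc[is_bundle]
print(len(bundles))  # 1
```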


In [12]:
## Since that looks right, let's define it
df_medical_bundles = df_medical.loc[df_medical['credit_description'].str.contains('bundle', case=False)]

## As mentioned before, we are trying to find the ones who have 60% utilization of their packs
## We'll make a list first so we can make a DataFrame out of it later
medical_list_bundles = []

## Now we can do some calculating
## The .iterrows() method lets you iterate through the rows of a pandas DataFrame
## We must make sure that we are working with integers when performing calculations
## The thought process is to grab every row, divide the remaining credits by the initial credits, and multiply by 100 to get the percentage remaining
## Then if you subtract that percentage from 100, you get the utilization percentage
## If the utilization is above 60% (strictly greater than, so a client at exactly 60% is not included), we'll put it on the list
for index, row in df_medical_bundles.iterrows():
    remaining = int(row['remaining_credits'])
    total = int(row['initial_credits'])
    calculated_percentage = (remaining / total) * 100
    utilization = 100 - calculated_percentage
    if utilization > 60:
        my_list = [row.full_name, row.phone, row.credit_description, row.remaining_credits]
        medical_list_bundles.append(my_list)

## As you can see, we also keep which package they purchased and how many credits they have remaining
## Let's print the list to check if it worked
medical_list_bundles

[['Lissette - Aguila', 7862952181, '30 Ingredient Bundle', 10]]
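The loop applies utilization = 100 − (remaining ÷ initial) × 100 one row at a time. As a worked check, Lissette has 10 of 30 credits left, so 100 − 33.3 = 66.7% used, which clears the bar, while Donny (41 of 100 left) sits at 59%. The same rule can be written column-wise without `iterrows()`. This is a sketch on a few rows copied from the bundle output, not a replacement the notebook itself uses.

```python
import pandas as pd

# A few rows copied from the bundle output above
bundles = pd.DataFrame({
    "full_name": ["Greg - Solano", "Donny - Tse", "Lissette - Aguila"],
    "remaining_credits": [6, 41, 10],
    "initial_credits": [10, 100, 30],
})

# utilization (%) = 100 - (remaining / initial) * 100, computed for every row at once
bundles["utilization"] = 100 - bundles["remaining_credits"] / bundles["initial_credits"] * 100

# Same strict "> 60" threshold the loop uses
over_60 = bundles.loc[bundles["utilization"] > 60, ["full_name", "remaining_credits"]]
print(bundles["utilization"].round(1).tolist())  # [40.0, 59.0, 66.7]
print(over_60["full_name"].tolist())  # ['Lissette - Aguila']
```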

In [13]:
## Now let's make it into a dataframe using pandas
bundles_frame = pd.DataFrame(medical_list_bundles, columns=['full_name', 'phone', 'credit_description', 'remaining_credits'])

<font size="5">Dataset #2 - Packs</font>

We'll use bundles_frame later to build one combined medical text list.

For now, let's work on the second dataset.

In [14]:
## As mentioned before, the packs list will be a little more complicated
## Let's start at the beginning and take a look at our data
df_medical.head(25)

(index),full_name,phone,credit_description,remaining_credits,initial_credits
0,Greg - Solano,3054317251,10 Ingredient Bundle,6,10
1,Billy - Zanski,8287682826,10 Ingredient Bundle,10,10
2,Greg - Solano,3054317251,10 Ingredient Bundle,10,10
3,Donny - Tse,9173740964,100 Ingredient Bundle,41,100
4,David - Adler,3055629241,100 Ingredient Bundle,89,100
5,Amanda - Fuentes,3053056025,30 Ingredient Bundle,19,30
6,Mario - Finamore,7866815056,30 Ingredient Bundle,20,30
7,Lissette - Aguila,7862952181,30 Ingredient Bundle,10,30
8,Marc - Anderson,9084163153,30 Ingredient Bundle,30,30
9,Megan - Hamann,8594217809,30 Ingredient Bundle,28,30


In [15]:
## Now we can go ahead and look for descriptions with the word "pack" in it
## We'll use the same str.contains() function to locate them
df_medical.loc[df_medical['credit_description'].str.contains('pack', case=False)]

(index),full_name,phone,credit_description,remaining_credits,initial_credits
17,Namir - Hakem,7866592797,Base IV Drip | 10 Pack,6,10
18,Ryan - Eidelstein,3058121185,Base IV Drip | 10 Pack,9,10
19,Peter - Gross,3052981226,Base IV Drip | 10 Pack,10,10
20,Lindsey - Wolfson,3055828891,Base IV Drip | 3 Pack,1,3
21,Diego - Ballina,3059175676,Base IV Drip | 3 Pack,6,3
...,...,...,...,...,...
100,Ingrid - Jimenez,7863696463,Signature IM Shot | 5 Pack,1,5
101,Beth - Erickson,3152866384,Signature IM Shot | 5 Pack,3,5
102,Manny - Ramirez,3059651626,Signature IM Shot | 5 Pack,5,5
103,Janet - Onate,9172911469,Signature IM Shot | 5 Pack,1,5


In [16]:
## It looks about right, so we can now define it
## .copy() gives us an independent DataFrame, so assigning a new column below doesn't raise a SettingWithCopyWarning
df_medical_packs = df_medical.loc[df_medical['credit_description'].str.contains('pack', case=False)].copy()

## Now we'll use a regular expression (with the re.IGNORECASE flag from the re library we imported earlier) to get the number right before the word "Pack"
## and overwrite the initial credits column with it, converted to an int like before
df_medical_packs['initial_credits'] = df_medical_packs['credit_description'].str.extract(r'(\d+)\s*Pack', flags=re.IGNORECASE, expand=False).astype(int)


In [17]:
## Let's print it out and see if our code is right
df_medical_packs.head(15)

(index),full_name,phone,credit_description,remaining_credits,initial_credits
17,Namir - Hakem,7866592797,Base IV Drip | 10 Pack,6,10
18,Ryan - Eidelstein,3058121185,Base IV Drip | 10 Pack,9,10
19,Peter - Gross,3052981226,Base IV Drip | 10 Pack,10,10
20,Lindsey - Wolfson,3055828891,Base IV Drip | 3 Pack,1,3
21,Diego - Ballina,3059175676,Base IV Drip | 3 Pack,6,3
22,Greg - Solano,3054317251,Base IV Drip | 3 Pack,3,3
23,Audra - Kurland,3058155255,Base IV Drip | 5 Pack,3,5
24,Billy - Zanski,8287682826,Base IV Drip | 5 Pack,5,5
25,Jose - Alvarez,3052195379,Base IV Drip | 5 Pack,5,5
26,David - Walsh,8476443433,Base IV Drip | 5 Pack,4,5


In [18]:
## Now we can perform the same calculations we performed earlier
## First we'll make the empty list
medical_list_packs = []

for index, row in df_medical_packs.iterrows():
    remaining = int(row['remaining_credits'])
    total = int(row['initial_credits'])
    calculated_percentage = (remaining / total) * 100
    utilization = 100 - calculated_percentage
    if utilization > 60:
        my_list = [row.full_name, row.phone, row.credit_description, row.remaining_credits]
        medical_list_packs.append(my_list)

## Let's print the list to check if it worked
medical_list_packs

[['Lindsey - Wolfson', 3055828891, 'Base IV Drip | 3 Pack', 1],
 ['Monika - Tefel', 7865020010, 'mHbOT | 60 min | 10 Pack', 3],
 ['Juan - Anillo', 7864738195, 'mHbOT | 60 min | 10 Pack', 2],
 ['Roberto - Campillo', 3523284101, 'mHbOT | 60 min | 10 Pack', 2],
 ['Madeleine - Palmer', 4379229555, 'mHbOT | 60 min | 3 Pack', 1],
 ['Barbara - Corton', 7738058590, 'mHbOT | 60 min | 5 Pack', 1],
 ['Nicole - Klobedanz', 8609973571, 'NAD+ IV Drip Therapy 750mg | 4 Pack', 1],
 ['Sandra - Alonso', 3056133376, 'Premium IM Shot | 5 Pack', 1],
 ['Lucas - Diaz', 7863388505, 'Signature IM Shot | 3 Pack', 1],
 ['Wanda - Alvarez', 3052195380, 'Signature IM Shot | 5 Pack', 1],
 ['Ethel Rodas - Rodas', 3053026227, 'Signature IM Shot | 5 Pack', 1],
 ['Christina - Xiques', 3052990750, 'Signature IM Shot | 5 Pack', 1],
 ['Ingrid - Jimenez', 7863696463, 'Signature IM Shot | 5 Pack', 1],
 ['Janet - Onate', 9172911469, 'Signature IM Shot | 5 Pack', 1]]
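The bundles loop and the packs loop are identical apart from their input and output names. One way to avoid the duplication is a small helper; `high_utilization` is a hypothetical name, not something the notebook defines. It vectorizes the same strict `> 60` rule and returns the same four columns. The sample rows are copied from the pack output; Diego's row has more credits remaining than the pack size, which gives negative utilization, so it simply falls below the threshold.

```python
import pandas as pd

# Hypothetical helper; the notebook itself repeats the loop once per dataset
def high_utilization(frame: pd.DataFrame, threshold: float = 60) -> pd.DataFrame:
    """Return rows whose used share of the pack is strictly above `threshold` percent."""
    used_pct = 100 - frame["remaining_credits"].astype(int) / frame["initial_credits"].astype(int) * 100
    columns = ["full_name", "phone", "credit_description", "remaining_credits"]
    # reset_index gives the result a clean 0..n-1 index
    return frame.loc[used_pct > threshold, columns].reset_index(drop=True)

# Rows copied from the pack output above
packs = pd.DataFrame({
    "full_name": ["Namir - Hakem", "Lindsey - Wolfson", "Diego - Ballina"],
    "phone": [7866592797, 3055828891, 3059175676],
    "credit_description": ["Base IV Drip | 10 Pack", "Base IV Drip | 3 Pack", "Base IV Drip | 3 Pack"],
    "remaining_credits": [6, 1, 6],
    "initial_credits": ["10", "3", "3"],
})

packs_frame = high_utilization(packs)
print(packs_frame["full_name"].tolist())  # ['Lindsey - Wolfson']
```

The same call would work on the bundles set, e.g. `high_utilization(df_medical_bundles)`.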

In [19]:
## Like before, we'll make it a DataFrame
packs_frame = pd.DataFrame(medical_list_packs, columns=['full_name', 'phone', 'credit_description', 'remaining_credits'])

<font size="5">Making the lists actionable</font>

Now that we have created both lists, let's combine them and save them as an Excel sheet for the reps.

In [21]:
## First let's combine the two frames using the concat() function
medical_text_list = pd.concat([bundles_frame, packs_frame], axis=0)

## Printing to make sure
medical_text_list

(index),full_name,phone,credit_description,remaining_credits
0,Lissette - Aguila,7862952181,30 Ingredient Bundle,10
0,Lindsey - Wolfson,3055828891,Base IV Drip | 3 Pack,1
1,Monika - Tefel,7865020010,mHbOT | 60 min | 10 Pack,3
2,Juan - Anillo,7864738195,mHbOT | 60 min | 10 Pack,2
3,Roberto - Campillo,3523284101,mHbOT | 60 min | 10 Pack,2
4,Madeleine - Palmer,4379229555,mHbOT | 60 min | 3 Pack,1
5,Barbara - Corton,7738058590,mHbOT | 60 min | 5 Pack,1
6,Nicole - Klobedanz,8609973571,NAD+ IV Drip Therapy 750mg | 4 Pack,1
7,Sandra - Alonso,3056133376,Premium IM Shot | 5 Pack,1
8,Lucas - Diaz,7863388505,Signature IM Shot | 3 Pack,1
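Notice that the index above starts 0, 0, 1, ...: by default `pd.concat()` keeps each input frame's own index, so the two frames' labels collide. A short sketch with two toy frames (names copied from the output) shows the effect and the `ignore_index=True` option, which renumbers the combined rows. Duplicate labels matter if you later look rows up with `.loc`, since one label can return several rows.

```python
import pandas as pd

bundles_frame = pd.DataFrame({"full_name": ["Lissette - Aguila"], "remaining_credits": [10]})
packs_frame = pd.DataFrame({"full_name": ["Lindsey - Wolfson", "Monika - Tefel"], "remaining_credits": [1, 3]})

# Default concat keeps each frame's own index, so label 0 appears twice
stacked = pd.concat([bundles_frame, packs_frame], axis=0)
print(stacked.index.tolist())  # [0, 0, 1]
print(len(stacked.loc[0]))  # 2 rows share the label 0

# ignore_index=True renumbers the combined rows 0..n-1
combined = pd.concat([bundles_frame, packs_frame], axis=0, ignore_index=True)
print(combined.index.tolist())  # [0, 1, 2]
```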


In [22]:
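## Save to an Excel file (pandas needs an Excel writer engine such as openpyxl or XlsxWriter installed for .xlsx)
## index=False leaves out the DataFrame index, which after concat() has repeated values (0, 0, 1, ...) and isn't useful to the reps
medical_text_list.to_excel('medical_text_list.xlsx', index=False)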