# Attendance Generation from Absent Codes

Aeries only reports Year to Date attendance, so it cannot give the percent present for a student over a chosen date range. This is a problem when we want to look at quarterly, semester or monthly attendance for students.

The following code calculates the percent attendance for each student from two queries: the All Day codes for the year and the Enrollment Data for the students.

The only inputs required are the date range of interest and the dates of the school holidays that fall within it.

__The code for the Holidays might have to be changed if there are any alterations to the academic calendar__

In [1]:
import numpy as np
import pandas as pd

In [2]:
# Query for the All Day codes for the school year
# LIST ATT STU ATT.SC STU.ID ATT.DY ATT.AL ATT.DT ATT.RS ATT.DTS ATT.ACO

absent_codes = pd.read_excel(r"C:\Users\derek.castleman\Desktop\absentcodes.xlsx")

# Obtain enrollment data for the students
# LIST STU ID LN FN SC GR ED 

enrollment = pd.read_excel(r"C:\Users\derek.castleman\Desktop\enrollment.xlsx")

  warn("Workbook contains no default style, apply openpyxl's default")


In [3]:
absent_codes

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
0,1,1095014,7,,08/09/2022,,2022-08-09 07:34:45,
1,1,1095014,8,,08/10/2022,,2022-08-11 10:03:32,
2,1,1095014,10,,08/12/2022,,2022-08-12 10:31:39,
3,1,1095014,18,U,08/24/2022,,2022-08-25 09:30:01,
4,1,1095014,29,,09/08/2022,,2022-08-11 08:54:25,
...,...,...,...,...,...,...,...,...
63881,6,1094375,71,I,11/07/2022,,2022-11-07 13:38:02,
63882,6,1094375,72,M,11/08/2022,,2022-11-10 11:41:28,
63883,6,1094375,91,I,12/05/2022,,2022-12-05 11:32:11,
63884,6,1094375,135,M,02/03/2023,,2023-02-09 14:09:40,
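
The filters later in this notebook compare the `All day` column against text codes such as `'0'` and `'7'`. If Excel stored some of those codes as numbers, `pd.read_excel` would return them as integers or floats and the text comparisons would silently miss them. Below is a minimal sketch of one way to normalize the column first, assuming that is a risk in your export. `normalize_code` is a hypothetical helper, not part of the original notebook, and the data is a toy stand-in for `absent_codes['All day']`.

```python
import pandas as pd

def normalize_code(value):
    """Return an All Day code as stripped text, leaving missing values alone."""
    if pd.isna(value):
        return value
    if isinstance(value, float) and value.is_integer():
        value = int(value)  # 7.0 -> 7 so the text becomes '7', not '7.0'
    return str(value).strip()

# Toy data standing in for absent_codes['All day'] (assumed mixed types)
codes = pd.Series([0, 'U', None, 7.0, ' M '], dtype=object)
normalized = codes.map(normalize_code)
```

In the notebook this could be applied as `absent_codes['All day'] = absent_codes['All day'].map(normalize_code)` right after the files are read.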


## Selecting Date Range

The absence dates and the date range you enter are converted to datetime, and the All Day codes are then filtered to that range.

In [4]:
absent_codes['Date']= pd.to_datetime(absent_codes['Date']) # Changes absent date to datetime
absent_codes

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
0,1,1095014,7,,2022-08-09,,2022-08-09 07:34:45,
1,1,1095014,8,,2022-08-10,,2022-08-11 10:03:32,
2,1,1095014,10,,2022-08-12,,2022-08-12 10:31:39,
3,1,1095014,18,U,2022-08-24,,2022-08-25 09:30:01,
4,1,1095014,29,,2022-09-08,,2022-08-11 08:54:25,
...,...,...,...,...,...,...,...,...
63881,6,1094375,71,I,2022-11-07,,2022-11-07 13:38:02,
63882,6,1094375,72,M,2022-11-08,,2022-11-10 11:41:28,
63883,6,1094375,91,I,2022-12-05,,2022-12-05 11:32:11,
63884,6,1094375,135,M,2023-02-03,,2023-02-09 14:09:40,


In [5]:
a = input('What is the start date you are interested in (mm/dd/yyyy):          ') #Input start date

What is the start date you are interested in (mm/dd/yyyy):          12/01/2022


In [6]:
a = pd.to_datetime(a) # Change start date to datetime
a

Timestamp('2022-12-01 00:00:00')

In [7]:
b = input('What is the end date you are interested in (mm/dd/yyyy):          ') #Input end date

What is the end date you are interested in (mm/dd/yyyy):          12/31/2022


In [8]:
b = pd.to_datetime(b) # Turn end date to date time
b

Timestamp('2022-12-31 00:00:00')
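
The two inputs above are parsed without checking their format or order. Below is a minimal sketch of a stricter parse, assuming the dates are always typed as mm/dd/yyyy as the prompts request. `parse_date_range` is a hypothetical helper name and is not part of the original notebook.

```python
import pandas as pd

def parse_date_range(start_text, end_text):
    """Parse two mm/dd/yyyy strings and check that they are in order."""
    start = pd.to_datetime(start_text, format='%m/%d/%Y')
    end = pd.to_datetime(end_text, format='%m/%d/%Y')
    if end < start:
        raise ValueError(f'End date {end.date()} is before start date {start.date()}')
    return start, end

# Same values that were typed into the prompts above
start, end = parse_date_range('12/01/2022', '12/31/2022')
```

Passing an explicit `format` makes a mistyped date raise an error at input time instead of being guessed at.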

In [9]:
# Filters date range from All Day code table
dates_interested = absent_codes[(absent_codes['Date'] >=a) & (absent_codes['Date'] <=b)]
dates_interested

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
25,1,1095014,89,U,2022-12-01,,2022-12-01 11:23:06,
26,1,1095014,90,,2022-12-02,,2022-12-08 11:05:19,
27,1,1095014,92,,2022-12-06,,2022-12-06 10:55:13,
28,1,1095014,95,,2022-12-09,,2022-12-09 09:13:22,
29,1,1095014,98,,2022-12-14,,2022-12-14 11:21:00,
...,...,...,...,...,...,...,...,...
63838,4,1094487,97,U,2022-12-13,,2022-12-14 10:59:34,
63839,4,1094487,98,U,2022-12-14,,2022-12-14 10:59:42,
63840,4,1094487,99,T,2022-12-15,,2022-12-15 09:06:28,
63841,4,1094487,100,U,2022-12-16,,2022-12-16 12:56:55,
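
The same inclusive filter can also be written with `Series.between`, which some readers find easier to scan than the paired comparisons. This is a small sketch on toy data, not the notebook's real table; the `rows`, `start` and `end` names are illustrative.

```python
import pandas as pd

# Toy rows standing in for absent_codes
rows = pd.DataFrame({
    'Student ID': [1, 2, 3, 4],
    'Date': pd.to_datetime(['2022-11-30', '2022-12-01', '2022-12-16', '2023-01-09']),
})
start = pd.Timestamp('2022-12-01')
end = pd.Timestamp('2022-12-31')

# between() includes both endpoints by default, like the >= and <= pair
in_range = rows[rows['Date'].between(start, end)]
```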


## Calculating Absences, Tardies and Truancies

Absences are counted from the All Day codes that mark the student absent for the day.

Unexcused absences are the subset of those codes that relate to an unexcused absence.

Tardies are counted from the codes that relate to tardies.

Truancies are counted from the All Day code for >30, which the filter below matches as code Z.
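
The four counts in this section all follow the same pattern: filter the rows to a set of codes, then count rows per school and student. Below is a minimal sketch of that pattern as one function. The code lists are copied from the filters in this notebook; `count_codes` and the toy `rows` table are hypothetical names used only for illustration.

```python
import pandas as pd

# Code groups taken from the filters used in this notebook
ABSENT_CODES = ['0', '4', '5', 'H', 'I', 'L', 'M', 'X', '7', 'A', 'Q', 'S', 'U', 'P']
UNEXCUSED_CODES = ['7', 'A', 'Q', 'S', 'U']
TARDY_CODES = ['T', 'D', 'C']
TRUANT_CODES = ['Z']

def count_codes(rows, codes, column_name):
    """Count the rows per school and student whose All day code is in codes."""
    matching = rows[rows['All day'].isin(codes)]
    return (matching.groupby(['School', 'Student ID'])
                    .size()
                    .reset_index(name=column_name))

# Toy rows standing in for dates_interested
rows = pd.DataFrame({
    'School': [1, 1, 4, 4],
    'Student ID': [10, 10, 20, 20],
    'All day': ['U', 'M', 'T', 'Z'],
})
absent_counts = count_codes(rows, ABSENT_CODES, 'Absent')
tardy_counts = count_codes(rows, TARDY_CODES, 'Tardy')
```

Counting rows with `size()` gives the same totals as adding a column of ones and summing it.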

In [10]:
# Filtering for rows that correspond to absences
absent_code_list = ['0', '4', '5', 'H', 'I', 'L', 'M', 'X', '7', 'A', 'Q', 'S', 'U', 'P']
absent_students = dates_interested[dates_interested['All day'].isin(absent_code_list)]
absent_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
25,1,1095014,89,U,2022-12-01,,2022-12-01 11:23:06,
85,4,1093925,90,I,2022-12-02,,2022-12-02 08:46:57,
86,4,1093925,91,M,2022-12-05,,2022-12-05 13:27:34,
87,4,1093925,92,M,2022-12-06,,2022-12-05 13:27:36,
88,4,1093925,93,M,2022-12-07,,2022-12-05 13:27:37,
...,...,...,...,...,...,...,...,...
63836,4,1094487,95,U,2022-12-09,,2022-12-12 16:27:32,
63838,4,1094487,97,U,2022-12-13,,2022-12-14 10:59:34,
63839,4,1094487,98,U,2022-12-14,,2022-12-14 10:59:42,
63841,4,1094487,100,U,2022-12-16,,2022-12-16 12:56:55,


In [11]:
# Adding a column that gives one day for each absent code
absent_students = absent_students.copy() # Work on a copy so pandas does not raise SettingWithCopyWarning
absent_students['Absent'] = 1
absent_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment,Absent
25,1,1095014,89,U,2022-12-01,,2022-12-01 11:23:06,,1
85,4,1093925,90,I,2022-12-02,,2022-12-02 08:46:57,,1
86,4,1093925,91,M,2022-12-05,,2022-12-05 13:27:34,,1
87,4,1093925,92,M,2022-12-06,,2022-12-05 13:27:36,,1
88,4,1093925,93,M,2022-12-07,,2022-12-05 13:27:37,,1
...,...,...,...,...,...,...,...,...,...
63836,4,1094487,95,U,2022-12-09,,2022-12-12 16:27:32,,1
63838,4,1094487,97,U,2022-12-13,,2022-12-14 10:59:34,,1
63839,4,1094487,98,U,2022-12-14,,2022-12-14 10:59:42,,1
63841,4,1094487,100,U,2022-12-16,,2022-12-16 12:56:55,,1


In [12]:
# Grouping by school and student ID to calculate total number of days absent
absent = absent_students.groupby(by=['School', 'Student ID'])['Absent'].sum().reset_index()
absent

Unnamed: 0,School,Student ID,Absent
0,1,1091975,1
1,1,1091979,3
2,1,1091981,2
3,1,1091987,1
4,1,1091990,5
...,...,...,...
1042,8,1095289,5
1043,8,1095593,9
1044,8,1095830,1
1045,8,1095866,3


In [13]:
# Filters for the codes that relate to unexcused absences
unexcused_code_list = ['7', 'A', 'Q', 'S', 'U']
unexcused_absent_students = dates_interested[dates_interested['All day'].isin(unexcused_code_list)]
unexcused_absent_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
25,1,1095014,89,U,2022-12-01,,2022-12-01 11:23:06,
232,1,1094155,90,U,2022-12-02,,2022-12-08 15:47:29,
236,1,1094155,100,U,2022-12-16,,2022-12-16 14:23:16,
289,2,1095206,90,U,2022-12-02,,2022-12-12 08:25:48,
418,8,1094578,99,U,2022-12-15,,2022-12-15 09:21:15,
...,...,...,...,...,...,...,...,...
63832,4,1094487,91,U,2022-12-05,,2022-12-05 13:53:01,
63836,4,1094487,95,U,2022-12-09,,2022-12-12 16:27:32,
63838,4,1094487,97,U,2022-12-13,,2022-12-14 10:59:34,
63839,4,1094487,98,U,2022-12-14,,2022-12-14 10:59:42,


In [14]:
# Gives one day for each unexcused absence
unexcused_absent_students = unexcused_absent_students.copy() # Work on a copy so pandas does not raise SettingWithCopyWarning
unexcused_absent_students['Unexcused Absences'] = 1
unexcused_absent_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment,Unexcused Absences
25,1,1095014,89,U,2022-12-01,,2022-12-01 11:23:06,,1
232,1,1094155,90,U,2022-12-02,,2022-12-08 15:47:29,,1
236,1,1094155,100,U,2022-12-16,,2022-12-16 14:23:16,,1
289,2,1095206,90,U,2022-12-02,,2022-12-12 08:25:48,,1
418,8,1094578,99,U,2022-12-15,,2022-12-15 09:21:15,,1
...,...,...,...,...,...,...,...,...,...
63832,4,1094487,91,U,2022-12-05,,2022-12-05 13:53:01,,1
63836,4,1094487,95,U,2022-12-09,,2022-12-12 16:27:32,,1
63838,4,1094487,97,U,2022-12-13,,2022-12-14 10:59:34,,1
63839,4,1094487,98,U,2022-12-14,,2022-12-14 10:59:42,,1


In [15]:
# Sums up the number of unexcused absences for each student
unexcused_absent = unexcused_absent_students.groupby(by=['School', 'Student ID'])['Unexcused Absences'].sum().reset_index()
unexcused_absent

Unnamed: 0,School,Student ID,Unexcused Absences
0,1,1091979,2
1,1,1091987,1
2,1,1091990,1
3,1,1091992,1
4,1,1091999,1
...,...,...,...
430,8,1095046,1
431,8,1095184,1
432,8,1095240,1
433,8,1095242,2


In [16]:
# Filters for truancies
truancies = dates_interested[dates_interested['All day'] == 'Z']
truancies

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
89,4,1093925,96,Z,2022-12-12,,2022-12-12 08:32:31,
510,4,1094557,98,Z,2022-12-14,,2022-12-14 08:26:42,
619,4,1092361,90,Z,2022-12-02,,2022-12-02 08:19:51,
620,4,1092361,92,Z,2022-12-06,,2022-12-06 09:15:46,
1402,4,1094308,90,Z,2022-12-02,,2022-12-02 08:20:59,
...,...,...,...,...,...,...,...,...
62933,4,1095019,91,Z,2022-12-05,,2022-12-05 08:58:09,
63152,4,1094989,100,Z,2022-12-16,,2022-12-16 08:16:45,
63182,4,1095842,100,Z,2022-12-16,,2022-12-16 08:47:54,
63834,4,1094487,93,Z,2022-12-07,,2022-12-07 08:28:05,


In [17]:
# Gives one truancy for each day
truancies = truancies.copy() # Work on a copy so pandas does not raise SettingWithCopyWarning
truancies['Truant'] = 1
truancies

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment,Truant
89,4,1093925,96,Z,2022-12-12,,2022-12-12 08:32:31,,1
510,4,1094557,98,Z,2022-12-14,,2022-12-14 08:26:42,,1
619,4,1092361,90,Z,2022-12-02,,2022-12-02 08:19:51,,1
620,4,1092361,92,Z,2022-12-06,,2022-12-06 09:15:46,,1
1402,4,1094308,90,Z,2022-12-02,,2022-12-02 08:20:59,,1
...,...,...,...,...,...,...,...,...,...
62933,4,1095019,91,Z,2022-12-05,,2022-12-05 08:58:09,,1
63152,4,1094989,100,Z,2022-12-16,,2022-12-16 08:16:45,,1
63182,4,1095842,100,Z,2022-12-16,,2022-12-16 08:47:54,,1
63834,4,1094487,93,Z,2022-12-07,,2022-12-07 08:28:05,,1


In [18]:
# Sums up the truancies for each student
truant = truancies.groupby(by=['School', 'Student ID'])['Truant'].sum().reset_index()
truant

Unnamed: 0,School,Student ID,Truant
0,2,1092461,1
1,2,1092982,1
2,2,1092989,1
3,2,1092996,1
4,2,1093004,1
...,...,...,...
241,6,1095774,1
242,6,1095775,1
243,6,1095939,1
244,6,1095940,1


In [19]:
# Filters for the tardies for each student
tardy_code_list = ['T', 'D', 'C']
tardy_students = dates_interested[dates_interested['All day'].isin(tardy_code_list)]
tardy_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment
689,1,1093789,90,T,2022-12-02,,2022-12-08 11:05:19,
932,1,1092532,97,T,2022-12-13,,2022-12-15 15:13:56,
1869,4,1094955,93,C,2022-12-07,,2022-12-07 07:53:48,
1910,4,1095165,93,C,2022-12-07,,2022-12-07 07:53:52,
2355,2,1095953,95,T,2022-12-09,,2022-12-13 19:11:18,
...,...,...,...,...,...,...,...,...
63674,1,1093579,89,T,2022-12-01,,2022-12-08 11:05:25,
63681,1,1093579,96,T,2022-12-12,,2022-12-12 10:37:38,
63833,4,1094487,92,T,2022-12-06,,2022-12-06 09:23:36,
63837,4,1094487,96,T,2022-12-12,,2022-12-12 09:05:15,


In [20]:
# Gives one tardy for each day
tardy_students = tardy_students.copy() # Work on a copy so pandas does not raise SettingWithCopyWarning
tardy_students['Tardy'] = 1
tardy_students

Unnamed: 0,School,Student ID,Day#,All day,Date,Reason,Date Timestamp,ADA Comment,Tardy
689,1,1093789,90,T,2022-12-02,,2022-12-08 11:05:19,,1
932,1,1092532,97,T,2022-12-13,,2022-12-15 15:13:56,,1
1869,4,1094955,93,C,2022-12-07,,2022-12-07 07:53:48,,1
1910,4,1095165,93,C,2022-12-07,,2022-12-07 07:53:52,,1
2355,2,1095953,95,T,2022-12-09,,2022-12-13 19:11:18,,1
...,...,...,...,...,...,...,...,...,...
63674,1,1093579,89,T,2022-12-01,,2022-12-08 11:05:25,,1
63681,1,1093579,96,T,2022-12-12,,2022-12-12 10:37:38,,1
63833,4,1094487,92,T,2022-12-06,,2022-12-06 09:23:36,,1
63837,4,1094487,96,T,2022-12-12,,2022-12-12 09:05:15,,1


In [21]:
# Sums up the tardies for each student
tardies = tardy_students.groupby(by=['School', 'Student ID'])['Tardy'].sum().reset_index()
tardies

Unnamed: 0,School,Student ID,Tardy
0,1,1091990,3
1,1,1092002,1
2,1,1092026,1
3,1,1092059,1
4,1,1092063,1
...,...,...,...
201,6,1095929,1
202,6,1095932,1
203,6,1095940,1
204,6,1095990,1


## Calculating Days Enrolled

The number of days each student was enrolled at the school during the selected time period is calculated here.

In [22]:
enrollment

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date
0,1095014,A Cerda,Eddie,1,9,08/09/2022
1,1094167,Aaron,Anastasia,4,4,08/10/2021
2,1095258,Aaron,Lillie,4,1,08/10/2021
3,1093925,Aburto-Ramirez,Ayden,4,2,08/10/2021
4,1095224,Acevedo,Italivi,1,10,08/10/2021
...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,08/10/2021
2363,1094273,Zuniga,Matthew,4,5,08/10/2021
2364,1092848,Zuniga,Michelle,6,5,08/10/2021
2365,1094487,Zuniga,Pedro,4,3,08/10/2021


In [23]:
# Changing the enter date to datetime format
enrollment['Enter Date']= pd.to_datetime(enrollment['Enter Date'])
enrollment

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date
0,1095014,A Cerda,Eddie,1,9,2022-08-09
1,1094167,Aaron,Anastasia,4,4,2021-08-10
2,1095258,Aaron,Lillie,4,1,2021-08-10
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10
4,1095224,Acevedo,Italivi,1,10,2021-08-10
...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10
2363,1094273,Zuniga,Matthew,4,5,2021-08-10
2364,1092848,Zuniga,Michelle,6,5,2021-08-10
2365,1094487,Zuniga,Pedro,4,3,2021-08-10


In [24]:
# Creating a function that sets the enrollment start based on when the student enrolled and the time period selected
# (named enrollment_start because the name f is reused for the Veterans Day input further down)
def enrollment_start(row):
    if row['Enter Date'] <= a: # Student enrolled on or before the start date, so use the start date
        val = a
    else:
        val = row['Enter Date'] # Student enrolled after the start date, so use the actual enter date
    return val

In [25]:
# Creates enrollment column using function defined above
enrollment['Enrollment'] = enrollment.apply(enrollment_start, axis=1)
enrollment

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrollment
0,1095014,A Cerda,Eddie,1,9,2022-08-09,2022-12-01
1,1094167,Aaron,Anastasia,4,4,2021-08-10,2022-12-01
2,1095258,Aaron,Lillie,4,1,2021-08-10,2022-12-01
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,2022-12-01
4,1095224,Acevedo,Italivi,1,10,2021-08-10,2022-12-01
...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,2022-12-01
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,2022-12-01
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,2022-12-01
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,2022-12-01
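
The row-by-row function above picks the later of the enter date and the start date. The same result can be sketched without `apply` by using `Series.where`, which may be faster on large tables. The values below are toy data, and `enter_dates` stands in for `enrollment['Enter Date']`.

```python
import pandas as pd

start = pd.Timestamp('2022-12-01')
enter_dates = pd.Series(pd.to_datetime(['2021-08-10', '2022-12-01', '2022-12-07']))

# Keep the enter date when it is after the start date, otherwise use the start date
enrollment_dates = enter_dates.where(enter_dates > start, start)
```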


## Inputting Holidays

Enter the dates of the holidays that fall within the time range of interest. Any holiday outside the range can be skipped by hitting enter.

In [26]:
# Takes an input for the date then converts it to datetime and a dataframe
c = input('When is Labor Day (mm/dd/yyyy) - Hit enter if not in time range?:      ')
c = pd.to_datetime(c)
c=[c]
c = pd.DataFrame(c, columns=['Dates'])
c

When is Labor Day (mm/dd/yyyy) - Hit enter if not in time range?:      


Unnamed: 0,Dates
0,NaT


In [27]:
c["Date"] = pd.to_datetime(c['Dates']).dt.date
c.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   Dates   0 non-null      datetime64[ns]
 1   Date    0 non-null      datetime64[ns]
dtypes: datetime64[ns](2)
memory usage: 144.0 bytes


In [28]:
d = input('When is first date of Fall Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
d = pd.to_datetime(d)
d

When is first date of Fall Break (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [29]:
e = input('When is last date of Fall Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
e = pd.to_datetime(e)
e

When is last date of Fall Break (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [30]:
# If the start and end date are not null it will create a dataframe between the date range
if pd.notna(d) and pd.notna(e):
    fall_break = pd.date_range(d,e,freq='d')
    fall_break = pd.DataFrame(fall_break, columns =['Dates'])
    fall_break["Date"] = fall_break['Dates'].dt.date
else:
    fall_break = None # Returns null if the start and end date are not entered

In [31]:
fall_break

In [32]:
f = input('When is Veterans Day (mm/dd/yyyy)? - Hit enter if not in time range:      ')
f = pd.to_datetime(f)
f=[f]
f = pd.DataFrame(f, columns=['Dates'])
f["Date"] = f['Dates'].dt.date
f

When is Veterans Day (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT


In [33]:
g = input('When is first date of Thanksgiving Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
g = pd.to_datetime(g)
g

When is first date of Thanksgiving Break (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [34]:
h = input('When is last date of Thanksgiving Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
h = pd.to_datetime(h)
h

When is last date of Thanksgiving Break (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [35]:
if pd.notna(g) and pd.notna(h):
    thanksgiving_break = pd.date_range(g,h,freq='d')
    thanksgiving_break = pd.DataFrame(thanksgiving_break, columns =['Dates'])
    thanksgiving_break["Date"] = thanksgiving_break['Dates'].dt.date
else:
    thanksgiving_break = None

In [36]:
i = input('List first date of Winter Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
i = pd.to_datetime(i)
i

List first date of Winter Break (mm/dd/yyyy)? - Hit enter if not in time range:      12/19/2022


Timestamp('2022-12-19 00:00:00')

In [37]:
j = input('List last date of Winter Break (mm/dd/yyyy)? - Hit enter if not in time range:      ')
j = pd.to_datetime(j)
j

List last date of Winter Break (mm/dd/yyyy)? - Hit enter if not in time range:      01/06/2023


Timestamp('2023-01-06 00:00:00')

In [38]:
if pd.notna(i) and pd.notna(j):
    winter_break = pd.date_range(i,j,freq='d')
    winter_break = pd.DataFrame(winter_break, columns =['Dates'])
    winter_break["Date"] = winter_break['Dates'].dt.date
else:
    winter_break = None

In [39]:
k = input('When is MLK Day (mm/dd/yyyy)? - Hit enter if not in time range:      ')
k = pd.to_datetime(k)
k=[k]
k = pd.DataFrame(k, columns=['Dates'])
k["Date"] = k['Dates'].dt.date
k

When is MLK Day (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT


In [40]:
l = input('When is Presidents Day (mm/dd/yyyy)? - Hit enter if not in time range:      ')
l = pd.to_datetime(l)
l=[l]
l = pd.DataFrame(l, columns=['Dates'])
l["Date"] = l['Dates'].dt.date
l

When is Presidents Day (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT


In [41]:
m = input('When does Spring Break begin (mm/dd/yyyy)? - Hit enter if not in time range:      ')
m = pd.to_datetime(m)
m

When does Spring Break begin (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [42]:
n = input('When does Spring Break end (mm/dd/yyyy)? - Hit enter if not in time range:      ')
n = pd.to_datetime(n)
n

When does Spring Break end (mm/dd/yyyy)? - Hit enter if not in time range:      


NaT

In [43]:
if pd.notna(m) and pd.notna(n):
    spring_break = pd.date_range(m,n,freq='d')
    spring_break = pd.DataFrame(spring_break, columns =['Dates'])
    spring_break["Date"] = spring_break['Dates'].dt.date
else:
    spring_break = None

In [44]:
o = input('When is Cesar Chavez Day (mm/dd/yyyy)? - Hit enter if not in time range:      ')
o = pd.to_datetime(o)
o=[o]
o = pd.DataFrame(o, columns=['Dates'])
o["Date"] = o['Dates'].dt.date
o

When is Cesar Chavez Day (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT


In [45]:
p = input('When is Easter Holiday (mm/dd/yyyy)? - Hit enter if not in time range:      ')
p = pd.to_datetime(p)
p=[p]
p = pd.DataFrame(p, columns=['Dates'])
p["Date"] = p['Dates'].dt.date
p

When is Easter Holiday (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT


In [46]:
q = input('When is Memorial Day (mm/dd/yyyy)? - Hit enter if not in time range:      ')
q = pd.to_datetime(q)
q=[q]
q = pd.DataFrame(q, columns=['Dates'])
q["Date"] = q['Dates'].dt.date
q

When is Memorial Day (mm/dd/yyyy)? - Hit enter if not in time range:      


Unnamed: 0,Dates,Date
0,NaT,NaT
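
The input cells above repeat the same two steps for every holiday: parse the text, then build either a single date or a range. Below is a minimal sketch of how that could be folded into one helper, assuming blank input means the holiday is skipped. `holiday_dates` and the `entries` list are hypothetical and stand in for the typed answers.

```python
import pandas as pd

def holiday_dates(first_text, last_text=''):
    """Return every date from first to last, or no dates if first is blank."""
    if not first_text.strip():
        return pd.DatetimeIndex([])
    first = pd.to_datetime(first_text, format='%m/%d/%Y')
    # A single-day holiday has no last date, so the range is just that one day
    last = pd.to_datetime(last_text, format='%m/%d/%Y') if last_text.strip() else first
    return pd.date_range(first, last, freq='D')

# (first, last) answers: a skipped holiday and the Winter Break entered above
entries = [('', ''), ('12/19/2022', '01/06/2023')]

all_days = []
for first_text, last_text in entries:
    all_days.extend(holiday_dates(first_text, last_text))
holidays = pd.DatetimeIndex(all_days)
```

With this shape, a change to the academic calendar only means editing the list of entries rather than adding or removing cells.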


## Removing Holidays

The holidays that were input are concatenated into one dataframe. The business days in the selected date range are then generated and matched against the holidays, and any date that matches a holiday is removed from the range.

In [47]:
# The input holidays will be concatenated into one dataframe (pd.concat skips the breaks that are None)
holidays = pd.concat([c, fall_break, f, thanksgiving_break, winter_break, k, l, spring_break, o, p, q]).reset_index(drop=True)
holidays

Unnamed: 0,Dates,Date
0,NaT,
1,NaT,
2,2022-12-19,2022-12-19
3,2022-12-20,2022-12-20
4,2022-12-21,2022-12-21
5,2022-12-22,2022-12-22
6,2022-12-23,2022-12-23
7,2022-12-24,2022-12-24
8,2022-12-25,2022-12-25
9,2022-12-26,2022-12-26


In [48]:
holidays = holidays[['Dates']] #Select the datetime column
holidays = holidays.rename(columns={"Dates": "Holidays"}) #Change the name of column to holidays
holidays

Unnamed: 0,Holidays
0,NaT
1,NaT
2,2022-12-19
3,2022-12-20
4,2022-12-21
5,2022-12-22
6,2022-12-23
7,2022-12-24
8,2022-12-25
9,2022-12-26


In [49]:
# The dates between the selected range will be generated
date_range = pd.date_range(a,b,freq='B')
date_range = pd.DataFrame(date_range, columns =['Dates'])
date_range

Unnamed: 0,Dates
0,2022-12-01
1,2022-12-02
2,2022-12-05
3,2022-12-06
4,2022-12-07
5,2022-12-08
6,2022-12-09
7,2022-12-12
8,2022-12-13
9,2022-12-14


In [50]:
# Holidays are matched with corresponding dates in date range
holiday_match = pd.merge(date_range, holidays, how='left', left_on='Dates', right_on='Holidays')
holiday_match

Unnamed: 0,Dates,Holidays
0,2022-12-01,NaT
1,2022-12-02,NaT
2,2022-12-05,NaT
3,2022-12-06,NaT
4,2022-12-07,NaT
5,2022-12-08,NaT
6,2022-12-09,NaT
7,2022-12-12,NaT
8,2022-12-13,NaT
9,2022-12-14,NaT


In [51]:
# The dates without holidays are selected
dates = holiday_match[holiday_match.Holidays.isnull()].reset_index(drop=True)
dates

Unnamed: 0,Dates,Holidays
0,2022-12-01,NaT
1,2022-12-02,NaT
2,2022-12-05,NaT
3,2022-12-06,NaT
4,2022-12-07,NaT
5,2022-12-08,NaT
6,2022-12-09,NaT
7,2022-12-12,NaT
8,2022-12-13,NaT
9,2022-12-14,NaT


In [52]:
# The holidays column is dropped
dates = dates[['Dates']]
dates

Unnamed: 0,Dates
0,2022-12-01
1,2022-12-02
2,2022-12-05
3,2022-12-06
4,2022-12-07
5,2022-12-08
6,2022-12-09
7,2022-12-12
8,2022-12-13
9,2022-12-14


In [53]:
# A column for day is generated
dates['Day'] = 'Day'
dates

Unnamed: 0,Dates,Day
0,2022-12-01,Day
1,2022-12-02,Day
2,2022-12-05,Day
3,2022-12-06,Day
4,2022-12-07,Day
5,2022-12-08,Day
6,2022-12-09,Day
7,2022-12-12,Day
8,2022-12-13,Day
9,2022-12-14,Day


In [54]:
# A countdown of days enrolled by date is generated
dates['Enrolled'] = dates.groupby(['Day']).cumcount(ascending=False)+1
dates

Unnamed: 0,Dates,Day,Enrolled
0,2022-12-01,Day,12
1,2022-12-02,Day,11
2,2022-12-05,Day,10
3,2022-12-06,Day,9
4,2022-12-07,Day,8
5,2022-12-08,Day,7
6,2022-12-09,Day,6
7,2022-12-12,Day,5
8,2022-12-13,Day,4
9,2022-12-14,Day,3


In [55]:
# The day column is dropped leaving enrolled days for each date
dates = dates.drop(columns=['Day'])
dates

Unnamed: 0,Dates,Enrolled
0,2022-12-01,12
1,2022-12-02,11
2,2022-12-05,10
3,2022-12-06,9
4,2022-12-07,8
5,2022-12-08,7
6,2022-12-09,6
7,2022-12-12,5
8,2022-12-13,4
9,2022-12-14,3
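
The merge, filter and countdown steps in this section can also be sketched with pandas' custom business day frequency, which drops weekends and a supplied list of holidays in one call. This is an alternative under the same assumptions as above (school runs Monday to Friday), not the notebook's own method. The dates are the ones entered earlier.

```python
import pandas as pd

start = pd.Timestamp('2022-12-01')
end = pd.Timestamp('2022-12-31')
winter_break = list(pd.date_range('2022-12-19', '2023-01-06', freq='D'))

# freq='C' is a custom business day: weekdays minus the listed holidays
school_days = pd.bdate_range(start, end, freq='C', holidays=winter_break)

# Countdown of remaining school days, matching the Enrolled column
days_left = pd.Series(range(len(school_days), 0, -1), index=school_days)
```

For this range the sketch gives 12 school days, ending on 2022-12-16, which agrees with the Enrolled value of 12 shown for 2022-12-01.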


## Combining All Tables

This section combines all the tables. Matching each student's enrollment date against the dates table gives the number of days the student was enrolled.

The attendance tables are then merged in, adding one column for each.

In [56]:
enrollment

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrollment
0,1095014,A Cerda,Eddie,1,9,2022-08-09,2022-12-01
1,1094167,Aaron,Anastasia,4,4,2021-08-10,2022-12-01
2,1095258,Aaron,Lillie,4,1,2021-08-10,2022-12-01
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,2022-12-01
4,1095224,Acevedo,Italivi,1,10,2021-08-10,2022-12-01
...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,2022-12-01
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,2022-12-01
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,2022-12-01
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,2022-12-01


In [57]:
# Enrollment dates are matched with the date dataframe giving the days each student enrolled
enrolled_numbers = pd.merge(enrollment, dates, how='left', left_on='Enrollment', right_on='Dates')
enrolled_numbers

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrollment,Dates,Enrolled
0,1095014,A Cerda,Eddie,1,9,2022-08-09,2022-12-01,2022-12-01,12.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,2022-12-01,2022-12-01,12.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,2022-12-01,2022-12-01,12.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,2022-12-01,2022-12-01,12.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,2022-12-01,2022-12-01,12.0
...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,2022-12-01,2022-12-01,12.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,2022-12-01,2022-12-01,12.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,2022-12-01,2022-12-01,12.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,2022-12-01,2022-12-01,12.0
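
Because this is an exact-match left merge, a student whose enrollment date is not in the dates table (for example a date that falls on a weekend or a holiday) would get a missing Enrolled value. Below is a sketch of one way to handle that case, assuming the student should be counted from the next school day. It uses `numpy.searchsorted` on toy data; `school_days` and `enrollment_dates` are illustrative names.

```python
import numpy as np
import pandas as pd

# 12 weekdays with no holidays in this toy range
school_days = pd.bdate_range('2022-12-01', '2022-12-16')
# The second date is a Saturday, so it has no exact match in school_days
enrollment_dates = pd.Series(pd.to_datetime(['2022-12-01', '2022-12-03', '2022-12-16']))

# Position of the first school day on or after each enrollment date
first_day_position = np.searchsorted(
    school_days.to_numpy(), enrollment_dates.to_numpy(), side='left'
)
days_enrolled = len(school_days) - first_day_position
```

A student who enrolls after the last school day in the range gets 0 days here, which would need to be excluded before dividing to get a percentage.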


In [58]:
# Dropping extra dates columns
enrolled_numbers = enrolled_numbers[['Student ID', 'Last Name', 'First Name', 'School', 'Grade', 'Enter Date', 'Enrolled']]
enrolled_numbers

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0
...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0


In [59]:
absent

Unnamed: 0,School,Student ID,Absent
0,1,1091975,1
1,1,1091979,3
2,1,1091981,2
3,1,1091987,1
4,1,1091990,5
...,...,...,...
1042,8,1095289,5
1043,8,1095593,9
1044,8,1095830,1
1045,8,1095866,3


In [60]:
# Adding days absent column
absent_enrolled = pd.merge(enrolled_numbers, absent, how='left', on=['Student ID', 'School' ])
absent_enrolled

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Absent
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,1.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,5.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,1.0
...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,4.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,1.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,7.0


In [61]:
# Giving students with no absences a zero
absent_enrolled["Absent"] = absent_enrolled["Absent"].fillna(0)
absent_enrolled

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Absent
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,1.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,0.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,0.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,5.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,1.0
...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,4.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,0.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,1.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,7.0


In [62]:
# Create a present column by subtracting days absent from those enrolled
absent_enrolled['Present'] = absent_enrolled['Enrolled'] - absent_enrolled['Absent']
absent_enrolled

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Absent,Present
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,1.0,11.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,0.0,12.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,0.0,12.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,5.0,7.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,1.0,11.0
...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,4.0,8.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,0.0,12.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,1.0,11.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,7.0,5.0


In [63]:
present = absent_enrolled[['Student ID', 'Last Name', 'First Name', 'School', 'Grade', 'Enter Date', 'Enrolled', 'Present',
                          'Absent']] #Moves the present column over
present

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0
...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0


In [64]:
# Calculates percent present by dividing days present by days enrolled
present = present.copy() # Work on a copy so pandas does not raise SettingWithCopyWarning
present['% Present'] = present['Present'] / present['Enrolled']
present

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667
...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667
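
The division above returns `NaN` or `inf` for any student whose enrolled-day count in the date range is zero. The following is a minimal sketch of one way to guard against that, assuming the same `Enrolled` and `Present` column names; the helper name `percent_present` and the sample rows are made up for illustration and are not part of the notebook.

```python
import numpy as np
import pandas as pd

def percent_present(df):
    """Return a copy of df with a '% Present' column (hypothetical helper)."""
    out = df.copy()
    # Treat zero enrolled days as missing so the division gives NaN instead of inf
    enrolled = out['Enrolled'].replace(0, np.nan)
    out['% Present'] = out['Present'] / enrolled
    return out

# Illustrative data only, not taken from the Aeries export
demo = pd.DataFrame({
    'Student ID': [1, 2, 3],
    'Enrolled': [12.0, 12.0, 0.0],
    'Present': [11.0, 7.0, 0.0],
})
demo_pct = percent_present(demo)
```

Because the helper works on a copy, the input frame is left unchanged and no `SettingWithCopyWarning` is raised.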


In [65]:
unexcused_absent

Unnamed: 0,School,Student ID,Unexcused Absences
0,1,1091979,2
1,1,1091987,1
2,1,1091990,1
3,1,1091992,1
4,1,1091999,1
...,...,...,...
430,8,1095046,1
431,8,1095184,1
432,8,1095240,1
433,8,1095242,2


In [66]:
# Adds unexcused absences column
unexcused = pd.merge(present, unexcused_absent, how='left', on=['Student ID', 'School'])
unexcused

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,
...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0
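
A left merge keeps every student row, but it silently duplicates a student if the count table holds more than one row for the same `Student ID` and `School`. As a hedged sketch, the `validate` argument of `pd.merge` can be used to catch that case; the sample frames below are invented for illustration.

```python
import pandas as pd

# Illustrative data only: student 2 appears twice in the count table
students = pd.DataFrame({'Student ID': [1, 2], 'School': [1, 1]})
counts = pd.DataFrame({
    'School': [1, 1],
    'Student ID': [2, 2],
    'Unexcused Absences': [1, 2],
})

try:
    # 'many_to_one' requires the merge keys to be unique in the right-hand table
    pd.merge(students, counts, how='left', on=['Student ID', 'School'],
             validate='many_to_one')
    duplicate_detected = False
except pd.errors.MergeError:
    duplicate_detected = True
```

If the count tables are already grouped by student and school, the check passes and the merge behaves exactly as in the cell above.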


In [67]:
# Students with no unexcused absences come out of the left merge as NaN; replace those with zero
unexcused["Unexcused Absences"] = unexcused["Unexcused Absences"].fillna(0)
unexcused

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,0.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,0.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,0.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,0.0
...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,0.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0


In [68]:
truant

Unnamed: 0,School,Student ID,Truant
0,2,1092461,1
1,2,1092982,1
2,2,1092989,1
3,2,1092996,1
4,2,1093004,1
...,...,...,...
241,6,1095774,1
242,6,1095775,1
243,6,1095939,1
244,6,1095940,1


In [69]:
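# Adds the truant column to the dataframe
# Note: this reassigns `truant`, so the cell cannot be rerun without first regenerating the truant counts
truant = pd.merge(unexcused, truant, how='left', on=['Student ID', 'School'])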
truant

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences,Truant
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0,
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,0.0,
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,0.0,
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,0.0,1.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0,
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,0.0,
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0,
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0,2.0


In [70]:
# Students with no truancies come out of the left merge as NaN; replace those with zero
truant["Truant"] = truant["Truant"].fillna(0)
truant

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences,Truant
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0,0.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,0.0,1.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0,0.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0,0.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0,2.0


In [71]:
tardies

Unnamed: 0,School,Student ID,Tardy
0,1,1091990,3
1,1,1092002,1
2,1,1092026,1
3,1,1092059,1
4,1,1092063,1
...,...,...,...
201,6,1095929,1
202,6,1095932,1
203,6,1095940,1
204,6,1095990,1


In [72]:
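# Adds the tardies column to the dataframe
# Note: this reassigns `tardies`, so the cell cannot be rerun without first regenerating the tardy counts
tardies = pd.merge(truant, tardies, how='left', on=['Student ID', 'School'])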
tardies

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences,Truant,Tardy
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0,0.0,
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,0.0,1.0,
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,0.0,0.0,
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0,0.0,2.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0,0.0,
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0,2.0,3.0


In [73]:
# Students with no tardies come out of the left merge as NaN; replace those with zero
tardies["Tardy"] = tardies["Tardy"].fillna(0)
tardies

Unnamed: 0,Student ID,Last Name,First Name,School,Grade,Enter Date,Enrolled,Present,Absent,% Present,Unexcused Absences,Truant,Tardy
0,1095014,A Cerda,Eddie,1,9,2022-08-09,12.0,11.0,1.0,0.916667,1.0,0.0,0.0
1,1094167,Aaron,Anastasia,4,4,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,0.0
2,1095258,Aaron,Lillie,4,1,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,0.0
3,1093925,Aburto-Ramirez,Ayden,4,2,2021-08-10,12.0,7.0,5.0,0.583333,0.0,1.0,0.0
4,1095224,Acevedo,Italivi,1,10,2021-08-10,12.0,11.0,1.0,0.916667,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
2362,1093579,Zuniga,Bailey,1,10,2021-08-10,12.0,8.0,4.0,0.666667,4.0,0.0,2.0
2363,1094273,Zuniga,Matthew,4,5,2021-08-10,12.0,12.0,0.0,1.000000,0.0,0.0,0.0
2364,1092848,Zuniga,Michelle,6,5,2021-08-10,12.0,11.0,1.0,0.916667,1.0,0.0,0.0
2365,1094487,Zuniga,Pedro,4,3,2021-08-10,12.0,5.0,7.0,0.416667,5.0,2.0,3.0
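
The merge-then-fill step is repeated three times above, once each for unexcused absences, truancies and tardies. The steps can be sketched as a single helper. The name `add_count_column` and the sample frames are hypothetical, and the cast to `int` is an optional choice that the notebook itself does not make (it leaves the counts as floats).

```python
import pandas as pd

def add_count_column(base, counts, column, keys=('Student ID', 'School')):
    """Left-merge a per-student count table onto base and fill gaps with 0 (hypothetical helper)."""
    merged = pd.merge(base, counts, how='left', on=list(keys))
    # Students missing from the count table get NaN from the left merge; treat that as zero
    merged[column] = merged[column].fillna(0).astype(int)
    return merged

# Illustrative data only, not taken from the Aeries export
base = pd.DataFrame({'Student ID': [1, 2, 3], 'School': [1, 1, 4]})
tardy_counts = pd.DataFrame({'School': [1], 'Student ID': [2], 'Tardy': [3]})
result = add_count_column(base, tardy_counts, 'Tardy')
```

With a helper like this, the three count tables could be added in a loop without reassigning the `truant` and `tardies` variables.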


In [74]:
# Builds a download link that serves the final dataframe as a csv file
import base64
from IPython.display import HTML

def create_download_link(df, title="Attendance for Date Range", filename="Attendance for Date Range.csv"):
    csv = df.to_csv()
    # Encode the csv text as base64 so it can be embedded in a data URI
    b64 = base64.b64encode(csv.encode())
    payload = b64.decode()
    html = '<a download="{filename}" href="data:text/csv;base64,{payload}" target="_blank">{title}</a>'
    html = html.format(payload=payload, title=title, filename=filename)
    return HTML(html)

create_download_link(tardies)
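
When the notebook runs locally, the same result can be written straight to disk instead of through a download link. This is a minimal sketch under that assumption: the output path (a temporary directory) and the sample frame are illustrative only, and `index=False` is an optional choice that leaves the row index out of the file.

```python
import os
import tempfile
import pandas as pd

# Illustrative data only; in the notebook this would be the final dataframe
report = pd.DataFrame({'Student ID': [1, 2], '% Present': [0.916667, 1.0]})

# Hypothetical output location; any writable folder works
out_path = os.path.join(tempfile.gettempdir(), 'attendance_for_date_range.csv')
report.to_csv(out_path, index=False)

# Read the file back to confirm it was written as expected
round_trip = pd.read_csv(out_path)
```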