### Data Analysis Process

![Data Analysis Process](https://github.com/Data-Analytics-Tutors/0-Introduction/blob/master/images/Intro%20to%20Data%20Analytics%20Case%20Study.png?raw=1)


### Data Understanding

Data sourced from the Dallas Open Data Portal: [Vendor Payments for Fiscal Year 2019-Present](https://www.dallasopendata.com/Economy/Vendor-Payments-for-Fiscal-Year-2019-Present/x5ih-idh7). Latest 500k query.

Resources:
- Questions sourced from [Dallas Open Records](https://dallastx.govqa.us/WEBAPP/_rs/(S(yvwdnfffcg5vrmly43gmvfag))/OpenRecordsSummary.aspx?sSessionID=)

The CSV has 22 columns and 850,000+ rows. The columns are described below.


| Field name | Description |
|---|---|
| RUN DATE | Date posted |
| FY | Fiscal year |
| FM | Fiscal month: October is month 1, September is month 12, and period 13 is an adjustment period at year end |
| DOC-ID | System document ID, read as document type-department-document number |
| CHKSUBTOT | Payment subtotal by line item |
| VCODE | Vendor code |
| VENDOR | Vendor name |
| ZIP5 | Vendor ZIP code |
| FTYP | Fund type abbreviation (see the 'Read More' section of this website for descriptions) |
| FUND TYPE | Fund type (see the 'Read More' section of this website for descriptions) |
| DPT | Department abbreviation |
| DEPARTMENT | Department |
| ACTV | Internal activity code |
| ACTIVITY | Activity |
| OGRP | Object group abbreviation |
| OBJECTGROUP | Object group (specifies the expenditure classification) |
| OBJ | Numeric code associated with the object |
| OBJECT | Description of OBJ; objects are placed into object groups for classification, based on their OBJ code |
| COMM | Commodity code |
| COMMODITY DSCR | Description of COMM (commodity code) |
| INVOICEDATE | Not applicable |
| INVOICENUMBER | Not applicable |
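The `FM` and `DOC-ID` descriptions above imply two small decoding rules. The sketch below is one way to express them; the function names and the sample document ID are hypothetical, and it assumes the real data separates the `DOC-ID` parts with a plain ASCII hyphen.

```python
import calendar


def fiscal_month_label(fm: int) -> str:
    """Map a fiscal month number (1-13) to a readable label."""
    if fm == 13:
        # Period 13 is the year-end adjustment period, not a calendar month.
        return "Adjustment period"
    if not 1 <= fm <= 12:
        raise ValueError(f"fiscal month must be between 1 and 13, got {fm}")
    # Fiscal month 1 is October (calendar month 10), so shift and wrap around.
    calendar_month = (fm + 8) % 12 + 1
    return calendar.month_name[calendar_month]


def split_doc_id(doc_id: str) -> dict:
    """Split a DOC-ID into document type, department, and document number."""
    # maxsplit=2 keeps any further hyphens inside the document number.
    doc_type, department, number = doc_id.split("-", 2)
    return {"doc_type": doc_type, "department": department, "number": number}


labels = [fiscal_month_label(fm) for fm in (1, 4, 12, 13)]
parts = split_doc_id("AAA-BBB-000123")  # hypothetical ID, for illustration only
```

A helper like `fiscal_month_label` can be applied to the `fm` column (for example with `df['fm'].map(fiscal_month_label)`) to make month-level charts easier to read.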

In [None]:
!pip install sidetable
import sidetable as stb

In [None]:
#Import Libraries

import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
import seaborn as sns
#from pandas_profiling import ProfileReport

In [None]:
#Read in data (could be done from a file (Excel/CSV), API, link, or SQL DB)
df = pd.read_csv("https://www.dallasopendata.com/resource/x5ih-idh7.csv?$limit=50000")
#examine the shape of the data
df.shape
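The query above pulls a single page of 50,000 rows. If more of the dataset is needed, one possible approach is to request it in pages with the `$limit`, `$offset`, and `$order` query parameters. The sketch below only builds the page URLs; the helper name `page_urls` is hypothetical, and ordering by `:id` is an assumption about how to keep pages stable between requests.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.dallasopendata.com/resource/x5ih-idh7.csv"


def page_urls(total_rows, page_size=50_000):
    """Build one query URL per page of `page_size` rows."""
    urls = []
    for offset in range(0, total_rows, page_size):
        params = {"$limit": page_size, "$offset": offset, "$order": ":id"}
        # safe="$:" leaves the '$' and ':' characters unescaped in the query string
        urls.append(f"{BASE_URL}?{urlencode(params, safe='$:')}")
    return urls


urls = page_urls(120_000)  # three pages: offsets 0, 50000, 100000
```

Each URL can then be passed to `pd.read_csv`, and the resulting frames combined with `pd.concat(..., ignore_index=True)`.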

In [None]:
#formatting columns, specifically the chksubtot column
format_dict =  {'chksubtot':'${:,.2f}'}

In [None]:
#returns the first 5 rows of data
df.head()

#returns the last 5 rows of data
#df.tail()

### Exploratory Data Analysis

In [None]:
df.dtypes
#examine the data types of each column

In [None]:
df.info()
#examine the info for the data

In [None]:
df.describe()
#describe the data

In [None]:
df.department.value_counts()
# here we are looking at the number of times a dept is represented

In [None]:
#number of null values per column
df.isnull().sum()

#percentage of null values
#df.isnull().mean()
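The count and the percentage of nulls are easier to compare side by side. A minimal sketch on made-up data (the `sample` frame and its values are illustrative only, not taken from the dataset):

```python
import pandas as pd

# Small stand-in frame; the real df would be used in the notebook.
sample = pd.DataFrame({
    "vendor": ["A", None, "C", "D"],
    "zip5": ["75201", "75202", None, None],
    "chksubtot": [10.0, 20.0, 30.0, 40.0],
})

is_null = sample.isnull()
null_summary = pd.DataFrame({
    "missing": is_null.sum(),                      # count of nulls per column
    "percent": is_null.mean().mul(100).round(1),   # share of nulls per column
}).sort_values("missing", ascending=False)
```

Sorting by `missing` puts the columns that need the most attention at the top.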



In [None]:
df['department'] = df.department.astype('string')
df.department.dtypes

#### Which department has the most vendor payouts and what is its average payout amount?

In [None]:
format_dict = {'chksubtot':'${:,.0f}'}


In [None]:
#fy 2024
fy_24 = df[df.fy == 2024]
#total payout per department (style=True formats the displayed table)
fy_24.stb.freq(['department'], value='chksubtot', style=True, cum_cols=False)
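The question asks for both the number of payouts and the average payout, so a count and a mean per department may help alongside the totals. A minimal sketch on made-up rows, assuming the same `fy`, `department`, and `chksubtot` columns:

```python
import pandas as pd

# Illustrative rows only; in the notebook this would be df.
payments = pd.DataFrame({
    "fy": [2024, 2024, 2024, 2024, 2023],
    "department": ["Aviation", "Aviation", "Aviation", "Library", "Library"],
    "chksubtot": [100.0, 200.0, 300.0, 1000.0, 50.0],
})

fy_rows = payments[payments["fy"] == 2024]
dept_summary = (
    fy_rows.groupby("department")["chksubtot"]
    .agg(n_payouts="count", total="sum", average="mean")
    .sort_values("n_payouts", ascending=False)
)
top_department = dept_summary.index[0]  # department with the most payouts
```

Note that the department with the most payouts is not necessarily the one with the largest total, which is why the sketch keeps both columns.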

#### Question 2 : AVI Vendors
##### Which vendor received the largest overall payout from the Aviation department?



In [None]:
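#isolate the Aviation department from the rest of the data
avi_dept = df[df.department == 'Aviation']
#preview the first rows with currency formatting applied to chksubtot
avi_dept.head().style.format(format_dict)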

In [None]:
avi_dept.shape

In [None]:
#total the Aviation department's payouts by vendor, largest first
aviation_vendors_sum = avi_dept.groupby(['vendor'])['chksubtot'].agg(['sum']).sort_values(by=['sum'],ascending=False)
aviation_vendors_sum
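Once the totals are sorted, the top vendor and its share of the department's spending can be read off directly. A small sketch on made-up data (vendor names and amounts are placeholders):

```python
import pandas as pd

# Placeholder rows standing in for avi_dept.
avi = pd.DataFrame({
    "vendor": ["Vendor A", "Vendor B", "Vendor A", "Vendor C"],
    "chksubtot": [500.0, 200.0, 250.0, 50.0],
})

vendor_totals = avi.groupby("vendor")["chksubtot"].sum()
top_vendor = vendor_totals.idxmax()           # vendor with the largest total
top_two = vendor_totals.nlargest(2)           # the two largest totals
share_of_total = (top_two / vendor_totals.sum()).round(3)
```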

In [None]:
#search for vendor payouts to "Flatiron Contractors" (na=False skips rows with a missing vendor name)
avi_dept.loc[avi_dept['vendor'].str.contains('Flatiron', case=False, na=False)]

## Practice ##

Using the previous exercise, perform the following:
- Create a dataframe for the purchases by the Fire Department
- Find out who the top vendors are
- Find out how much the most recent payout was

In [None]:
#Create a dataframe for the purchases by the Fire Department
fire_dept = df[df.department == 'Dallas Fire Department']
#preview the first rows with currency formatting applied to chksubtot
fire_dept.head().style.format(format_dict)

In [None]:
fire_dept.shape

In [None]:
#Find out who the top vendors are

fire_vendors =fire_dept.groupby(['vendor'])['chksubtot'].agg(['sum']).sort_values(by=['sum'],ascending=False)
fire_vendors


In [None]:
#How much was the most recent payout? Start with the rows for the vendor "Siddons"
fire_dept.loc[fire_dept['vendor'].str.contains('Siddons', case=False, na=False)]
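Filtering by vendor name lists the matching rows but does not by itself identify the most recent one. One way to do that is to parse the posting date and sort on it. The sketch below uses made-up rows, and the column name `rundate` is an assumption; check `df.columns` for the actual name of the RUN DATE field.

```python
import pandas as pd

# Placeholder rows standing in for fire_dept; "rundate" is a hypothetical column name.
fire = pd.DataFrame({
    "vendor": ["Vendor A", "Vendor B", "Vendor A"],
    "rundate": ["2024-01-15", "2024-03-02", "2023-11-30"],
    "chksubtot": [1200.0, 450.0, 980.0],
})

# Parse the dates so they sort chronologically rather than as text.
fire["rundate"] = pd.to_datetime(fire["rundate"])

# Most recent payout overall.
latest = fire.sort_values("rundate", ascending=False).iloc[0]

# Most recent payout for each vendor.
latest_per_vendor = fire.loc[fire.groupby("vendor")["rundate"].idxmax()]
```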

#### You receive a request from a resident who asks the following via email: ####
    "I am requesting an opportunity to inspect or obtain copies of public records for the contract - Temporary Staffing. The details I am requesting are given below: Proposals of the awarded vendors. Spending on this contract till now."
    - Citizen Candy

In [None]:
temp_staff_contract = df[df.object =='Outside Temps/Staffing']
temp_staff_contract.head()

In [None]:
temp_staff_contract.shape

In [None]:
#temp staffing contract spending by vendor and fiscal year
temp_staff_contract.groupby(['vendor','fy'])['chksubtot'].agg(['sum']).sort_values(by=['vendor','sum'],ascending=False)
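For a reply to the resident, a vendor-by-year table with totals may be easier to read than the long grouped output. A minimal sketch with `pivot_table` on made-up rows (vendor names and amounts are placeholders):

```python
import pandas as pd

# Placeholder rows standing in for temp_staff_contract.
temp = pd.DataFrame({
    "vendor": ["Staffing Co", "Staffing Co", "Temps Inc", "Temps Inc", "Temps Inc"],
    "fy": [2023, 2024, 2023, 2024, 2024],
    "chksubtot": [100.0, 150.0, 80.0, 20.0, 30.0],
})

spend_by_year = temp.pivot_table(
    index="vendor",
    columns="fy",
    values="chksubtot",
    aggfunc="sum",
    fill_value=0,          # vendors with no payments in a year show 0
    margins=True,          # add row and column totals
    margins_name="Total",
)
```

The `Total` row and column give the spending to date per year and per vendor, which is the figure the request asks for.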

### Modeling (optional)

In [None]:
df.columns

In [None]:
#visualizing mean payout ($) by fiscal month (fm)
df.groupby('fm')['chksubtot'].mean().plot(kind='bar', color='C1')
plt.title("Mean Vendor Payout ($) by Fiscal Month", color='C1')
plt.xlabel("Fiscal Month", color='C1')
plt.ylabel("Mean Check Paid ($)", color='C1')
plt.show()

### Evaluation (Data Presentation/Validation: how to present data to stakeholders)

In [None]:
# export the Fire Department data to Excel (pd.ExcelWriter needs an Excel engine such as openpyxl installed)
file_name = "fire_dept_Data_Profile.xlsx"
with pd.ExcelWriter(file_name) as writer:
    # writing to the 'Fire_Profile' sheet
    fire_dept.to_excel(writer, sheet_name='Fire_Profile', index=False)
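Before sharing an export, it can be worth checking that grouped totals reconcile with the overall total. One common source of a gap is that `groupby` drops rows whose key is missing unless `dropna=False` is passed. A small sketch on made-up rows:

```python
import pandas as pd

# Placeholder rows; one payment has no vendor name.
fire = pd.DataFrame({
    "vendor": ["Vendor A", None, "Vendor B"],
    "chksubtot": [100.0, 50.0, 25.0],
})

overall_total = fire["chksubtot"].sum()

# Default groupby silently drops the row with the missing vendor.
default_total = fire.groupby("vendor")["chksubtot"].sum().sum()

# dropna=False keeps it as its own group, so the totals reconcile.
full_total = fire.groupby("vendor", dropna=False)["chksubtot"].sum().sum()

gap = overall_total - default_total  # amount attached to rows with no vendor
```

If `gap` is not zero, the per-vendor table understates spending and the missing-vendor rows should be reported separately.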



### Deployment (Visualization & Presentation)