# Single Column Top 10 Table
This script runs on a single column of any dataset and lists the 10 most frequent values in that column. A top 10 table is used instead of a full table or pie chart because displaying dozens of distinct values would not be useful. The script was written for the following columns:
1. Procedures (Procedure_Performed_Description)
2. Medications (Medication_Given_Description)

_Author: Jared Gauntt_

## Prepare for Analysis

### Set Parameters

In [1]:
localFolder='C:/Users/jared/Documents/My Files/DAEN 690/Analysis/'
fileName='DAEN 690 2021-02-14 V2.xlsx'
sheetName='Procedures'
columnOfInterest='Procedure_Performed_Description'

### Import Libraries

In [2]:
import pandas as pd

### Import From Excel Spreadsheet

In [3]:
#Import single tab
df=pd.read_excel(localFolder+fileName,sheet_name=sheetName)
print('Original Number of Rows = '+str(len(df)))

Original Number of Rows = 171515


### Remove Duplicate Rows & Reduce To Column Of Interest

In [4]:
#Determine which rows are duplicates (True=duplicate, False=first instance of row)
duplicateRowIdentifier=df.duplicated()

#Reduce to the rows that were not flagged as duplicates
df=df.loc[~duplicateRowIdentifier,:]

#Reduce to column of interest
ds=df[columnOfInterest]

#Confirm
print('Number of Unique Rows in Dataset = '+str(len(ds)))

Number of Unique Rows in Dataset = 150524
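
The duplicate-removal step above can be illustrated on a small frame. This is a minimal sketch, not part of the original notebook: the `toy` data and the `Incident_ID` column are hypothetical, and `drop_duplicates()` is shown only as a one-call alternative to flagging with `duplicated()` and filtering.

```python
import pandas as pd

# Hypothetical toy data (not taken from the actual spreadsheet)
toy = pd.DataFrame({
    'Incident_ID': [1, 1, 2, 3, 3],
    'Procedure_Performed_Description': ['A', 'A', 'A', 'B', 'B'],
})

# Same approach as the cell above: flag repeated rows, keep first instances
flagged = toy.loc[~toy.duplicated(), :]

# One-call alternative that should give the same rows
dropped = toy.drop_duplicates()

print(len(toy), len(flagged), len(dropped))
print(flagged.equals(dropped))
```

Duplicates are judged on the whole row, before the data is reduced to one column, so the same procedure recorded on different incidents still counts once per incident.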


## Analysis

### Count Number of Instances of Each Unique Value

In [5]:
#Create data frame for counting 
dfCount=pd.DataFrame(ds.unique(),columns=[columnOfInterest])
dfCount['Number of Rows']=0

#Count the number of rows per each value
for index in dfCount.index:
    dfCount.loc[index,'Number of Rows']=len(ds[ds==dfCount.loc[index,columnOfInterest]])
dfCount.sort_values(by='Number of Rows',ascending=False,inplace=True)
dfCount.reset_index(drop=True,inplace=True)

#Calculate the percentage of rows per each value
dfCount['Percent of Rows']=round(dfCount['Number of Rows']/len(ds)*100,2)
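
As a possible alternative to the counting loop above, pandas' `Series.value_counts()` counts and sorts in one call. The sketch below uses a hypothetical `toy` series in place of `ds`; it is not the notebook's own method.

```python
import pandas as pd

# Hypothetical toy series standing in for ds
toy = pd.Series(['IV', 'ECG', 'IV', 'IV', 'ECG', 'BGL'], name='Procedure')

# value_counts() returns one count per unique value, sorted descending
counts = toy.value_counts()

# Rebuild the same three-column layout used by dfCount
summary = pd.DataFrame({
    'Procedure': counts.index,
    'Number of Rows': counts.values,
})
summary['Percent of Rows'] = (summary['Number of Rows'] / len(toy) * 100).round(2)

print(summary)
```

One difference to be aware of: `value_counts()` excludes missing values by default, whereas the loop lists a missing value with a count of 0, because `NaN` never compares equal to itself. Values with equal counts may also come out in a different order.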

### Visualizations

In [6]:
#Calculate percentage of rows covered by top 10 values
print('Percent of Rows With Top 10 Values = '+str(round(dfCount['Percent of Rows'].iloc[0:10].sum(),2)))

#Print top 10
dfCount.iloc[0:10,:]

Percent of Rows With Top 10 Values = 89.85


| | Procedure_Performed_Description | Number of Rows | Percent of Rows |
|---|---|---|---|
| 0 | IV Start - Extremity Vein (arm or leg) | 57621 | 38.28 |
| 1 | CV - ECG - 12 Lead Obtained | 53188 | 35.34 |
| 2 | MS - Cervical Spinal Restriction of Motion | 4708 | 3.13 |
| 3 | Assess - Assessment of Patient | 4212 | 2.8 |
| 4 | Assess - Blood Glucose Level (BGL) | 3521 | 2.34 |
| 5 | CV - ECG - Limb Lead Monitoring | 3120 | 2.07 |
| 6 | Assess - Capnography (ETCO2) | 2886 | 1.92 |
| 7 | Electrocardiographic monitoring | 2444 | 1.62 |
| 8 | IO Start - Intraosseous Access | 1799 | 1.2 |
| 9 | MS - Full Spinal Restriction of Motion | 1726 | 1.15 |
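
The top 10 coverage figure above is the sum of percentages that were each already rounded to two decimals. As a cross-check, the sketch below recomputes coverage directly from the counts in the table and the deduplicated row total printed earlier. The variable names are illustrative only.

```python
# Counts copied from the top 10 table above
top10_counts = [57621, 53188, 4708, 4212, 3521, 3120, 2886, 2444, 1799, 1726]

# Deduplicated row total printed by the notebook
total_rows = 150524

# Coverage computed from raw counts, rounded once at the end
coverage = round(sum(top10_counts) / total_rows * 100, 2)
print('Percent of Rows With Top 10 Values (from counts) = ' + str(coverage))
```

Rounding once at the end can differ in the last digit from a sum of already-rounded percentages, so a small discrepancy between the two figures is expected.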
