# Analyze Data Using Straight Python

This example analyzes one of our test data sets using plain Python lists
(not pandas), while avoiding explicit loops where possible and treating
the data as a data set.

We begin by loading the file into a list of rows using the "csv" module.

In [22]:
import csv

file = "datasets/lunch_10.csv"
# newline='' lets the csv module handle line endings itself.
with open( file, newline='' ) as csvfile:
    # csv.reader yields one list of strings per line of the file.
    rawrows = list( csv.reader( csvfile ) )

labels = rawrows[ 0 ]
rows = rawrows[ 1: ]
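
The later cells address columns by position (`r[ 1 ]`, `r[ 2 ]`, and so on). As a small sketch of an alternative, each row can be paired with the header to give a dictionary keyed by label. The sample text below is made up for illustration and only mimics the first four columns of the file; `sample_csv`, `sample_labels`, `sample_rows`, and `records` are hypothetical names, not part of the notebook.

```python
import csv
import io

# Hypothetical sample that mimics the first four columns of the lunch file.
sample_csv = """Date,School,Enrollment,Attendance
2020-01-06,Example H.S.,100,90
2020-01-07,Example H.S.,100,80
"""

# csv.reader accepts any iterable of lines, so a StringIO stands in for the file.
sample_rawrows = list(csv.reader(io.StringIO(sample_csv)))
sample_labels = sample_rawrows[0]
sample_rows = sample_rawrows[1:]

# Pair each row with the header so columns can be read by name.
records = [dict(zip(sample_labels, row)) for row in sample_rows]

print(records[0]["School"], records[0]["Attendance"])
```

The values are still strings at this point, just as with the positional rows. The standard library's `csv.DictReader` produces similar dictionaries directly.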

All right, let's get a basic feel for the data. What are the labels? How many rows are there?

In [23]:
labels

['Date',
 'School',
 'Enrollment',
 'Attendance',
 'Hamburger',
 'Pizza',
 'HotDog',
 'Skipped']

In [24]:
len( rows )

1800

And now we want to ask some questions. How many unique schools are in my dataset? How many
days does my dataset cover?

In [25]:
# We can find the unique elements by building a set.
schools = { r[ 1 ] for r in rows }
[ schools, len( schools ) ]

[{'Central Arthur H.S.',
  'East York H.S.',
  'North Jefferson H.S.',
  'North Lincoln M.S.',
  'Outer Roosevelt M.S.',
  'West Jefferson M.S.',
  'West King M.S.',
  'West Lincoln High',
  'West Lincoln M.S.',
  'West York High'},
 10]

In [26]:
# Similarly, we can find the unique days.
len( { r[ 0 ] for r in rows } )

180

OK, so now we need to do something harder. For each school we have the enrollment
and the number of students in attendance on a given day. Can we figure out the average
daily attendance? That's a bit tricky because the data is all in one list and we want to
derive an answer by 'grouping' the data by school.

There are lots of ways to do this, and toolkits like 'pandas' make it really easy, but
since we are working with just a list of rows here, we need to recreate that grouping ourselves.

In [27]:
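# Running totals per school: [ total enrollment, total attendance ]
attendance = dict()
for s in schools:
    attendance[ s ] = [ 0, 0 ]

# The csv module reads every field as a string, so convert before adding.
for r in rows:
    school = r[ 1 ]
    enroll = int(r[ 2 ])
    attend = int(r[ 3 ])

    prior = attendance[ school ]
    newval = [ prior[ 0 ] + enroll, prior[ 1 ] + attend ]
    attendance[ school ] = newval

# Attendance as a percentage of enrollment for each school.
result = dict()
for k,v in attendance.items():
    result[ k ] = v[ 1 ] / v[ 0 ] * 100

result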

{'Central Arthur H.S.': 87.0143080706461,
 'East York H.S.': 93.280077463084,
 'North Jefferson H.S.': 87.82571288102261,
 'North Lincoln M.S.': 88.82559774964838,
 'Outer Roosevelt M.S.': 84.80662983425414,
 'West Jefferson M.S.': 88.63853123509776,
 'West King M.S.': 87.04585537918871,
 'West Lincoln High': 91.74394319131162,
 'West Lincoln M.S.': 89.16858237547892,
 'West York High': 90.41666666666667}
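
Note that the figures above are total attendance divided by total enrollment for each school. If enrollment changes from day to day, that ratio can differ from the mean of the individual daily rates. The sketch below contrasts the two on made-up numbers; the rows and the school name are hypothetical and not taken from the data set.

```python
from statistics import mean

# Hypothetical rows in the same column order: Date, School, Enrollment, Attendance.
sample_rows = [
    ["day 1", "Example M.S.", "100", "90"],
    ["day 2", "Example M.S.", "200", "100"],
]

enroll = [int(r[2]) for r in sample_rows]
attend = [int(r[3]) for r in sample_rows]

# Ratio of totals: the same calculation the cell above applies to each school.
ratio_of_totals = sum(attend) / sum(enroll) * 100

# Mean of the per-day rates: each day counts equally, whatever its enrollment.
mean_of_daily_rates = mean(a / e * 100 for a, e in zip(attend, enroll))

print(ratio_of_totals, mean_of_daily_rates)
```

When enrollment is the same every day, the two numbers coincide, so which one to report only matters if enrollment varies.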

And so on. We could continue, but writing all this loop code by hand
gets tedious.
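
One way to cut down on the repeated loop code, short of switching to pandas, is to factor the grouping step into a small helper. The sketch below is only an illustration: `group_sums` is a hypothetical helper, not part of any library, and the sample rows are made up.

```python
from collections import defaultdict


def group_sums(rows, key_index, value_indexes):
    """Sum the chosen integer columns of rows, grouped by the key column."""
    # Each new key starts with a list of zeros, one per summed column.
    totals = defaultdict(lambda: [0] * len(value_indexes))
    for r in rows:
        bucket = totals[r[key_index]]
        for pos, col in enumerate(value_indexes):
            bucket[pos] += int(r[col])
    return dict(totals)


# Hypothetical rows in the same column order: Date, School, Enrollment, Attendance.
sample_rows = [
    ["day 1", "School A", "100", "90"],
    ["day 1", "School B", "50", "40"],
    ["day 2", "School A", "100", "80"],
]

totals = group_sums(sample_rows, key_index=1, value_indexes=[2, 3])

# Attendance as a percentage of enrollment, per school.
rates = {school: attend / enroll * 100 for school, (enroll, attend) in totals.items()}
print(rates)
```

Called on the notebook's `rows` with the same indexes, the helper should give the same enrollment and attendance totals as the loop above. The loop is still there, but it is written once and reused for each new question.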