### Formatting data read from a csv file using Python's built-in data types and functions

Importing the <b>csv</b> module, along with <b>Counter</b> and <b>defaultdict</b> from the collections module.

In [None]:
import csv
from collections import Counter, defaultdict

Reading the csv file using the <b>DictReader</b> class of the csv module. It converts each row of the csv file into an Ordered Dictionary (OrderedDict) object (a plain dict from Python 3.8 onwards), whose <b>key</b> is the column name and whose <b>value</b> is the cell content for the corresponding row.

Storing all the OrderedDict objects into a list <b>food</b>.

In [None]:
with open(r'E:\Personal Projects\Food_Inspections.csv', newline='') as f:  # raw string keeps the backslashes literal
    food = list(csv.DictReader(f))
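
As a small, self-contained illustration of what <b>DictReader</b> produces, the sketch below reads from an in-memory string instead of the real file. The two sample rows and the name `sample_rows` are made up for this example and are not taken from Food_Inspections.csv.

```python
import csv
import io

# Hypothetical two-row sample standing in for the real csv file.
sample_csv = io.StringIO(
    "DBA Name,Results\n"
    "SAMPLE CAFE,Pass\n"
    "SAMPLE DINER,Fail\n"
)

# The header line supplies the keys; every cell value is read as a string.
sample_rows = list(csv.DictReader(sample_csv))

print(len(sample_rows))
print(sample_rows[1]['Results'])
```

Note that the header line is consumed as the field names, so it does not appear as a row in the resulting list.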

Printing the length of list <b>food</b>.

In [None]:
len(food)

Printing the first element stored in list <b>food</b>, indexed at position <b>0</b>.

In [None]:
food[0]

Printing the second element stored in list <b>food</b>, indexed at position <b>1</b>.

In [None]:
food[1]

Using a set comprehension, extracting the unique values of the <b>Results</b> key across the OrderedDict objects in the list <b>food</b>.

In [None]:
{row['Results'] for row in food}
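
To make the effect of the curly braces concrete, here is a minimal sketch comparing a list comprehension with a set comprehension over the same rows. The three sample rows are hypothetical stand-ins for the real data.

```python
# Hypothetical rows; the real rows come from the csv file.
sample_rows = [
    {'Results': 'Pass'},
    {'Results': 'Fail'},
    {'Results': 'Pass'},
]

# Square brackets build a list: duplicates and order are kept.
as_list = [row['Results'] for row in sample_rows]

# Curly braces build a set: each distinct value appears once, in no particular order.
as_set = {row['Results'] for row in sample_rows}

print(as_list)
print(as_set)
```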

Using a list comprehension with a conditional filter, selecting the OrderedDict objects in the list <b>food</b> whose <b>Results</b> value is <b>Fail</b>.

This filtered list is then stored as <b>fail</b>, which is also a list of OrderedDict objects, like the list <b>food</b>.

In [None]:
fail = [row for row in food if row['Results'] == 'Fail']
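
One detail worth seeing in isolation is that a filtering comprehension does not copy the rows: the new list holds references to the same objects as the original list. The sketch below uses made-up rows and hypothetical names (`sample_rows`, `fail_rows`) to show this.

```python
# Hypothetical rows standing in for the real inspection data.
sample_rows = [
    {'DBA Name': 'SAMPLE CAFE', 'Results': 'Pass'},
    {'DBA Name': 'SAMPLE DINER', 'Results': 'Fail'},
    {'DBA Name': 'SAMPLE GRILL', 'Results': 'Fail'},
]

# Keep only the rows whose Results value is exactly 'Fail'.
fail_rows = [row for row in sample_rows if row['Results'] == 'Fail']

print(len(fail_rows))
# The filtered list shares its row objects with the original list.
print(fail_rows[0] is sample_rows[1])
```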

Printing the length of list <b>fail</b>.

In [None]:
len(fail)

Printing the first element stored in list <b>fail</b>, indexed at position <b>0</b>.

In [None]:
fail[0]

Using a generator expression, extracting the value for the key <b>DBA Name</b> from each OrderedDict object in the list <b>fail</b>. These values are passed to <b>Counter</b> from the collections module, which counts how many times each <b>DBA Name</b> occurs.

The result stored as <b>worst</b> is a Counter (a dictionary subclass) whose keys are the DBA Names and whose values are the number of occurrences of the corresponding DBA Name.

In [None]:
worst = Counter(row['DBA Name'] for row in fail)

From the dictionary <b>worst</b>, getting the five business names with the most occurrences, together with their counts.

In [None]:
worst.most_common(5)
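
The behaviour of <b>Counter</b> and <b>most_common</b> can be checked on a tiny input. This is a minimal sketch with made-up single-letter names; `names` and `counts` are illustrative only.

```python
from collections import Counter

# Hypothetical business names; 'A' appears three times, 'B' twice, 'C' once.
names = ['A', 'B', 'A', 'C', 'A', 'B']

counts = Counter(names)

# most_common(n) returns (value, count) tuples, highest count first.
top_two = counts.most_common(2)
print(top_two)

# A key that was never counted reports 0 instead of raising KeyError.
print(counts['Z'])
```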

From the dictionary <b>worst</b>, getting the fifteen business names with the most occurrences, together with their counts.

In [None]:
worst.most_common(15)

Normalizing the <b>DBA Name</b> value in each of the OrderedDict objects inside the list <b>fail</b>. This is done by iterating over each object in the list <b>fail</b> (list comprehension) and building a new dictionary from it: the existing key/value pairs are unpacked with <b>**row</b>, followed by the key <b>DBA Name</b> whose value is the existing DBA Name with the single quotes (') removed and the text converted to uppercase.

As a dictionary cannot hold the same key twice, the later value for <b>DBA Name</b> replaces the old one.

The list <b>fail</b> is overwritten with these new dictionaries. Note that they are plain dict objects rather than OrderedDict objects.

In [None]:
fail = [{**row, 'DBA Name': row['DBA Name'].replace("'", '').upper()} for row in fail]
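
The key-override behaviour of dictionary unpacking is easy to verify on a single row. In this sketch the row contents and the names `row` and `cleaned` are hypothetical.

```python
# Hypothetical row with an apostrophe and mixed case in the name.
row = {'DBA Name': "Sample's Grill", 'Results': 'Fail'}

# **row copies every pair; the explicit 'DBA Name' entry that follows wins.
cleaned = {**row, 'DBA Name': row['DBA Name'].replace("'", '').upper()}

print(cleaned)
# The original row is left untouched because a new dict was built.
print(row['DBA Name'])
```

Because the key already exists when the override is applied, it keeps its original position in the new dictionary and only its value changes.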

Applying <b>Counter</b> again to the <b>DBA Name</b> values of the modified list <b>fail</b>, so that names differing only by letter case or apostrophes are now counted together.

In [None]:
worst = Counter(row['DBA Name'] for row in fail)

From the dictionary <b>worst</b>, getting the five business names with the most occurrences, together with their counts.

In [None]:
worst.most_common(5)

From the dictionary <b>worst</b>, getting the twenty business names with the most occurrences, together with their counts.

In [None]:
worst.most_common(20)

Using a generator expression, extracting the value for the key <b>Address</b> from each dictionary in the list <b>fail</b>. These values are passed to <b>Counter</b> from the collections module, which counts how many times each <b>Address</b> occurs.

The result stored as <b>bad</b> is a Counter whose keys are the Address values and whose values are the number of occurrences of the corresponding Address.

In [None]:
bad = Counter(row['Address'] for row in fail)

From the dictionary <b>bad</b>, getting the five addresses with the most occurrences, together with their counts.

In [None]:
bad.most_common(5)

Creating a default dictionary of counters, where every value stored in this dictionary is of type Counter.

In [None]:
by_year = defaultdict(Counter)

Iterating over each element in the list <b>fail</b> and updating the defaultdict <b>by_year</b>: the key is the last four characters of the <b>Inspection Date</b> (the year), and its value is a Counter in which the count for the row's <b>Address</b> is incremented by one for each occurrence.

In [None]:
for row in fail:
    by_year[row['Inspection Date'][-4:]][row['Address']] += 1
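
The two-level lookup above can be traced on a few rows. This sketch assumes the dates are strings ending in a four-digit year (here written as MM/DD/YYYY, which is an assumption about the file's format); the sample rows and the name `counts_by_year` are made up.

```python
from collections import Counter, defaultdict

# Hypothetical failed inspections; dates are assumed to end in the year.
sample_fail = [
    {'Inspection Date': '01/15/2015', 'Address': '1 MAIN ST'},
    {'Inspection Date': '03/02/2015', 'Address': '1 MAIN ST'},
    {'Inspection Date': '07/09/2014', 'Address': '2 OAK AVE'},
]

counts_by_year = defaultdict(Counter)
for row in sample_fail:
    year = row['Inspection Date'][-4:]          # last four characters
    counts_by_year[year][row['Address']] += 1   # missing year -> new empty Counter

print(counts_by_year['2015'].most_common(1))
print(sorted(counts_by_year))
```

One consequence of using a defaultdict is that merely looking up a year that has no data, for example `counts_by_year['1999']`, creates an empty Counter under that key.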

Looking up the Counter stored under the key <b>2015</b> and getting its five most common addresses.

In [None]:
by_year['2015'].most_common(5)

Looking up the Counter stored under the key <b>2014</b> and getting its five most common addresses.

In [None]:
by_year['2014'].most_common(5)

Looking up the Counter stored under the key <b>2013</b> and getting its five most common addresses.

In [None]:
by_year['2013'].most_common(5)

Looking up the Counter stored under the key <b>2016</b> and getting its five most common addresses.

In [None]:
by_year['2016'].most_common(5)

From the dictionary <b>bad</b>, getting the five addresses with the most occurrences, together with their counts.

In [None]:
bad.most_common(5)

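Get the first item (the address) of the first tuple in the list returned by the cell above.

In an interactive Python session, including IPython and Jupyter, _ refers to the result of the most recently evaluated expression, which in this case is bad.most_common(5).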

In [None]:
_[0][0]

Get the identity of the object that _ currently refers to, which is now the string returned by the previous cell.

In [None]:
id(_)

Using a list comprehension with a conditional filter, selecting the dictionaries in the list <b>fail</b> whose <b>Address</b> value starts with <b>11601 W TOUHY</b>.

This filtered list is then stored as <b>ohare</b>, which is also a list of dictionaries.

In [None]:
ohare = [row for row in fail if row['Address'].startswith('11601 W TOUHY')]

Printing the length of list <b>ohare</b>.

In [None]:
len(ohare)

Using a set comprehension over the list <b>ohare</b> to obtain the unique values of the <b>Address</b> key.

In [None]:
{row['Address'] for row in ohare}

Using a set comprehension over the list <b>ohare</b> to obtain the unique values of the <b>DBA Name</b> key.

In [None]:
{row['DBA Name'] for row in ohare}

Printing the first element stored in list <b>ohare</b>, indexed at position <b>0</b>.

In [None]:
ohare[0]

Using a generator expression, extracting the value for the key <b>AKA Name</b> from each dictionary in the list <b>ohare</b>. These values are passed to <b>Counter</b> from the collections module, which counts how many times each <b>AKA Name</b> occurs.

The result stored as <b>c</b> is a Counter whose keys are the AKA Names and whose values are the number of occurrences of the corresponding AKA Name.

In [None]:
c = Counter(row['AKA Name'] for row in ohare)

From the dictionary <b>c</b>, getting the ten business names with the most occurrences, together with their counts.

In [None]:
c.most_common(10)

Creating a default dictionary of lists, where every value stored in this dictionary is of type list.

In [None]:
inspections = defaultdict(list)

Iterating over each element in the list <b>ohare</b> and appending it to the defaultdict <b>inspections</b>, where the key is the <b>License #</b> and the value is the list of all rows sharing that license number.

In [None]:
for row in ohare:
    inspections[row['License #']].append(row)
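
Grouping rows with a defaultdict of lists can be sketched on three made-up rows. The license numbers, dates and the name `by_license` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; two of them share the same license number.
sample_rows = [
    {'License #': '100', 'Inspection Date': '01/15/2015'},
    {'License #': '200', 'Inspection Date': '02/20/2015'},
    {'License #': '100', 'Inspection Date': '06/30/2015'},
]

by_license = defaultdict(list)
for row in sample_rows:
    # A missing key starts out as an empty list, so append always works.
    by_license[row['License #']].append(row)

# All inspection dates recorded for one license, in input order.
dates = [row['Inspection Date'] for row in by_license['100']]
print(dates)
```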

Referring to a particular element stored in <b>inspections</b> by its key.

In [None]:
inspections['2428080']

Getting all the keys present in <b>inspections</b>.

In [None]:
inspections.keys()

Get the values of <b>Inspection Date</b> for one particular key from <b>inspections</b> using a list comprehension.

In [None]:
[row['Inspection Date'] for row in inspections['34211']]

From the second element in the list <b>ohare</b>, get the value for the key <b>Violations</b> and split it on the <b>|</b> symbol.

In [None]:
ohare[1]['Violations'].split('|')

Storing the result of the cell above, available as _, under the name <b>violations</b>.

In [None]:
violations = _

Using a list comprehension, slicing each string in the list <b>violations</b> from its beginning up to the index of <b>- Comments</b>.

In [None]:
[v[:v.find('- Comments')] for v in violations]

Using a list comprehension, slicing each string in the list <b>violations</b> from its beginning up to the index of <b>- Comments</b>, and removing any whitespace from the start and end of the sliced string.

In [None]:
[v[:v.find('- Comments')].strip() for v in violations]
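
One edge case of slicing with <b>find</b> is worth noting: when the marker is absent, find returns -1 and the slice silently drops the last character. The sketch below shows this and one possible alternative based on str.partition. The helper `violation_title` and both sample strings are hypothetical, not part of the original notebook, and the real violation text may be formatted differently.

```python
def violation_title(text, marker='- Comments'):
    """Return the text before marker, stripped; the whole text if marker is absent."""
    head, _sep, _tail = text.partition(marker)
    return head.strip()

# Hypothetical violation strings, one with and one without the marker.
with_marker = '32. SAMPLE RULE - Comments: sample note '
without_marker = '33. ANOTHER SAMPLE RULE'

# find() returns -1 here, so the slice is v[:-1] and loses the final 'E'.
sliced = without_marker[:without_marker.find('- Comments')]
print(sliced)

print(violation_title(with_marker))
print(violation_title(without_marker))
```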

Using a list comprehension, extracting the <b>Violations</b> value from each row in the list <b>ohare</b> and splitting it on the <b>|</b> symbol. The result, a list of lists of strings, is stored as <b>all_violations</b>.

In [None]:
all_violations = [row['Violations'].split('|') for row in ohare ]

Create a new Counter object.

In [None]:
c = Counter()

Iterate over each list in <b>all_violations</b>, and within it over each violation string. Slice the string from its beginning up to the index of <b>- Comments</b> and strip any leading and trailing whitespace. Use the result as a key in <b>c</b> and increment its count by one, so that repeated violations accumulate.

In [None]:
for violations in all_violations:
    for v in violations:
        c[v[:v.find('- Comments')].strip()] += 1
    

From the dictionary <b>c</b>, getting the five violation descriptions with the most occurrences, together with their counts.

In [None]:
c.most_common(5)
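
The nested counting loop above can also be written in a flatter form. This is a sketch of one alternative, using itertools.chain.from_iterable to walk all the split strings in a single pass; the sample violation text and the names `split_rows` and `titles` are made up for the example.

```python
from collections import Counter
from itertools import chain

# Hypothetical Violations values; the real text may be formatted differently.
sample_violations = [
    '32. SAMPLE RULE - Comments: note one | 33. OTHER RULE - Comments: note two',
    '32. SAMPLE RULE - Comments: note three',
]

# One list of violation strings per row, as in all_violations.
split_rows = [text.split('|') for text in sample_violations]

# chain.from_iterable flattens the list of lists, so Counter sees every string once.
titles = Counter(
    v[:v.find('- Comments')].strip()
    for v in chain.from_iterable(split_rows)
)

print(titles.most_common(1))
```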