# Part 1 - CSV to lists
One of the things we might want to do a lot of is take a CSV file and turn it into a list of lists, which looks like a rectangular spreadsheet. For the first part of the midterm, create a module called csvlist with a class in it called CsvList.

The CsvList class will need the following methods:

- A constructor (aka init) that take a single parameter, which is the file name
- A way to retreive a list of the column names that were read from the first line of the file
- A way to retrieve the actual data rows themselves, which should be available as a list of lists

In parsing data from the file, our CsvList class can assume that the file will always have a header. The code below should be able to work if your module and class are working.

In [1]:
import jupyterimporter
from csvlist import CSVList

#f = CsvList('State_by_state.csv')
f = CSVList('/midterm/preexisting.csv')
print(f.header)  # Returns a list of the column names

importing Jupyter notebook from csvlist.ipynb
['State', 'Federally/State Administered', '"Enrolled Through January 31', ' 2014  "', '"Enrolled Through December 31', ' 2013"', '"Enrolled Through November 30', ' 2013"', '"Enrolled Through October 31', ' 2013"', '"Enrolled Through September 30', ' 2013"', '"Enrolled Through August 31', ' 2013"', '"Enrolled Through July 31', ' 2013"', '"Enrolled Through June 30', ' 2013"', '"Enrolled Through May 31', ' 2013"', '"Enrolled Through April 30', ' 2013"', '"Enrolled Through Mar. 31', ' 2013"', '"Enrolled Through Feb. 28', ' 2013"', '"Enrolled Through Jan. 31', ' 2013"', '"Enrolled Through Dec. 31', ' 2012"', '"Enrolled Through Nov. 30', ' 2012"', '"Enrolled Through October 31', ' 2012"', '"Enrolled Through September 30', ' 2012"', '"Enrolled Through August 31', ' 2012"', '"Enrolled Through July 31', ' 2012"', '"Enrolled Through June 30', ' 2012"', '"Enrolled Through May 31', ' 2012"', '"Enrolled Through April 30', ' 2012"', '"Enrolled Through Ma

In [2]:
print(f.data)

[['Alabama', 'Federal', '115', '123', '594', '639', '672', '711', '736', '766', '795', '820', '821', '1006', '972', '911', '838', '794', '735', '679', '635', '590', '559', '524', '466', '429', '389', '340', '296', '275', '230', '182', '138', '118', '103', '91', '77', '61'], ['Alaska', 'State', '5', '25', '32', '34', '34', '36', '36', '38', '43', '46', '51', '52', '47', '46', '45', '47', '42', '45', '43', '43', '45', '47', '45', '42', '42', '44', '45', '48', '45', '43', '47', '38', '35', '34', '32', '20'], ['Arizona', 'Federal', '1373', '1392', '3857', '4021', '4154', '4293', '4389', '4541', '4653', '4779', '4779', '5254', '5082', '4861', '4628', '4402', '4149', '3898', '3655', '3480', '3282', '3065', '2748', '2448', '2139', '1783', '1533', '1391', '1178', '967', '759', '639', '573', '457', '374', '270'], ['Arkansas', 'Transitioned from State to Federally-administered program 7/1/2013', '126', '129', '648', '688', '716', '750', '767', '736', '871', '878', '895', '921', '883', '868', '85

# Part 2 - "Pivoting" data
You'll notice that CMS data file includes several columns, one for each different timeframe. It's not atypical to recieve data files where time is measure horizontally across the row, but you may need to pivot that so that have one row per time entry. As a simplified example, below is a sample of input data that has a separate column for each quarter in 2015:

    Last,First,2015Q1,2015Q2,2015Q3,2015Q4
    Boal,Paul,10,9,10,8
    Westhus,Eric,9,10,10,9
    
When we say we want to pivot that data, what mean is that we want one row for each data value. To be specific, we say that we're going to pivot columns 2 throuh the end (assuming the first column is 0). The output of doing this looks like this:

    Last,First,Time,Value
    Boal,Paul,2015Q1,10
    Boal,Paul,2015Q2,9
    Boal,Paul,2015Q3,10
    Boal,Paul,2015Q4,8
    Westhus,Eric,2015Q1,9
    Westhus,Eric,2015Q2,10
    Westhus,Eric,2015Q3,10
    Westhus,Eric,2015Q4,9
    
So, create a new module called pivot with one function in it called pivot_columns(). This function should take a list of lists as well as a list of column numbers that should be pivoted as shown in the example below. You can assume the file is a CSV. The return value should be a list of lists.

In [4]:
import jupyterimporter
from pivot import pivot_columns

test = [['Last','First','2015Q1','2015Q2','2015Q3','2015Q4'],
        ['Boal','Paul',10,9,10,8],
        ['Westhus','Eric',9,10,10,9]]

out = pivot_columns(test,list(range(2,6)))
print(out)

importing Jupyter notebook from pivot.ipynb
[['Last', 'First', 'Column', 'Value'], ['Boal', 'Paul', '2015Q1', 10], ['Boal', 'Paul', '2015Q2', 9], ['Boal', 'Paul', '2015Q3', 10], ['Boal', 'Paul', '2015Q4', 8], ['Westhus', 'Eric', '2015Q1', 9], ['Westhus', 'Eric', '2015Q2', 10], ['Westhus', 'Eric', '2015Q3', 10], ['Westhus', 'Eric', '2015Q4', 9]]
None


# Part 3 - "Joining" data together


In [3]:
import jupyterimporter
from joiner import join_lists

list1 = [['Last','First','Time','Value'],
         ['Boal','Paul','2015Q1',10],
         ['Boal','Paul','2015Q2',9]]

list2 = [['Time','Census'],
         ['2015Q1',932],
         ['2015Q2',943]]

join_lists(list1, list2, [2], [0], [1,2,3,4], [1])


[['Last', 'First', 'Time', 'Value', 'Census'],
 ['Boal', 'Paul', '2015Q1', 10, 932],
 ['Boal', 'Paul', '2015Q2', 9, 943]]

In [6]:
#For #3 the idea is to find which lines have the same time and if they do append the census to the end of that row. 
#this is similar to vlookup in excel

# Part 4 - Final Code to Test with
Something roughlyt like the code below shoud run successfully for your modules. You can add code that you may need to cleanup or further parse any data, and make any corrections to the code you find below that may not work exactly with how you've implemented things.

In [10]:

import jupyterimporter
from csvlist import CSVList
from pivot import pivot_columns
from joiner import join_lists

preexisting = CSVList('/midterm/preexisting.csv')
census = CSVList('/midterm/census.csv')

### CLEANUP CODE STARTS
pre = pivot_columns([preexisting.header] + preexisting.data, list(range(2,37)))
cen = pivot_columns([census.header] + census.data, list(range(5,12)))

pre_clean = [pre[0] + ['Year']] + [(r + [r[2].split(',')[1].strip()]) for r in pre[1:]]
cen_clean = [cen[0] + ['Year']] + [(r + [r[5][-4:]]) for r in cen[1:]]
### CLEANUP CODE ENDS


out = join_lists(pre_clean, cen_clean, [0,4], [4,7], [0,2,3], [6])


[['State', 'Federally/State Administered', 'Column', 'Value'], ['Alabama', 'Federal', '"Enrolled Through January 31', '115'], ['Alabama', 'Federal', ' 2014  "', '123'], ['Alabama', 'Federal', '"Enrolled Through December 31', '594'], ['Alabama', 'Federal', ' 2013"', '639'], ['Alabama', 'Federal', '"Enrolled Through November 30', '672'], ['Alabama', 'Federal', ' 2013"', '711'], ['Alabama', 'Federal', '"Enrolled Through October 31', '736'], ['Alabama', 'Federal', ' 2013"', '766'], ['Alabama', 'Federal', '"Enrolled Through September 30', '795'], ['Alabama', 'Federal', ' 2013"', '820'], ['Alabama', 'Federal', '"Enrolled Through August 31', '821'], ['Alabama', 'Federal', ' 2013"', '1006'], ['Alabama', 'Federal', '"Enrolled Through July 31', '972'], ['Alabama', 'Federal', ' 2013"', '911'], ['Alabama', 'Federal', '"Enrolled Through June 30', '838'], ['Alabama', 'Federal', ' 2013"', '794'], ['Alabama', 'Federal', '"Enrolled Through May 31', '735'], ['Alabama', 'Federal', ' 2013"', '679'], ['Ala

TypeError: 'NoneType' object is not subscriptable

In [9]:
"""The pre_clean variable equals a new list of lists. That list consists of the first column of pre and a new column Year. Adding
    to that the rest of the list modified by the (r + [r[2].split(',')[1].strip()]. This part of the code splits the 
        3rd value at the comma and strips the second value. It makes pre_clean a 5 item lsit.
        
    Cen_clean is a varialbe of a list of lists consisiting of the first column of cen plus the year plus column 6 and
    the last 3 columns in the orgiinals list. This allows for the join lists function to be executed properly which 
    compares pre_clean and cen_clean to make sure that multiple rows mathc before the rows join. 
    """

NameError: name 'out' is not defined

NameError: name 'out' is not defined

# # Submitting

Use the commands below to add, commit, and push you code.

In [4]:
%%bash
git add csvlist.ipynb
git add joiner.ipynb
git add pivot.ipynb
git add Ndollinger0419_MIDTERM.ipynb
git commit -a -m "Submitting my midterm!!"
git push

[master 79af4a5] Submitting my midterm!!
 2 files changed, 40 insertions(+), 32 deletions(-)


Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the traditional behavior, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

When push.default is set to 'matching', git will push local branches
to the remote branches that already exist with the same name.

Since Git 2.0, Git defaults to the more conservative 'simple'
behavior, which only pushes the current branch to the corresponding
remote branch that 'git pull' uses to update the current branch.

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

To git@github.com:Ndollinger0419/hds5210-week02.git
   2e7c94f..79af4a5  master -> master
