
# `Final Exam`, `Fall 2022`: `Time Series Analysis of US Inflation`
_Version 1.0.1_

Change history:   
1.0.1 - bugfix ex2 test code.  
1.0 - initial release  

*All of the header information is important. Please read it.*

**Topics, number of exercises:** This problem builds on your knowledge of Pandas, Numpy, basic Python data structures, and implementing mathematical functions. It has **9** exercises, numbered 0 to **8**. There are **18** available points. However, to earn 100% the threshold is **13** points. (Therefore, once you hit **13** points, you can stop. There is no extra credit for exceeding this threshold.)

**Exercise ordering:** Each exercise builds logically on previous exercises, but you may solve them in any order. That is, if you can't solve an exercise, you can still move on and try the next one. Use this to your advantage, as the exercises are **not** necessarily ordered in terms of difficulty. Higher point values generally indicate more difficult exercises.

**Demo cells:** Code cells starting with the comment `### define demo inputs` load results from prior exercises applied to the entire data set and use those to build demo inputs. These must be run for subsequent demos to work properly, but they do not affect the test cells. The data loaded in these cells may be rather large (at least in terms of human readability). You are free to print or otherwise use Python to explore them, but we did not print them in the starter code.

**Debugging your code:** Right before each exercise test cell, there is a block of text explaining the variables available to you for debugging. You may use these to test your code and can print/display them as needed (careful when printing large objects, you may want to print the head or chunks of rows at a time).

**Exercise point breakdown:**

- Exercise 0: **1** point(s)
- Exercise 1: **1** point(s)
- Exercise 2: **2** point(s)
- Exercise 3: **2** point(s)
- Exercise 4: **2** point(s)
- Exercise 5: **2** point(s)
- Exercise 6: **2** point(s)
- Exercise 7: **3** point(s)
- Exercise 8: **3** point(s)

**Final reminders:**

- Submit after **every exercise**
- Review the generated grade report after you submit to see what errors were returned
- Stay calm, skip problems as needed, and take short breaks at your leisure


## Background: Inflation

Inflation is an increase in overall prices in an economy over time. Deflation is "negative inflation", a decrease in prices over time. A common way to measure inflation is to first calculate the Consumer Price Index, or CPI (the price of a representative basket of goods), then compute the percent change in CPI over a time interval. In other words, if the CPI is 100 at one point in time and 105 one year later, then the inflation rate over that year was 5%.
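As a rough illustration of the arithmetic above, the sketch below computes the percent change between two CPI readings. The function name `percent_change` is hypothetical and is not part of this notebook's API; the exercises may define the inflation rate slightly differently.

```python
def percent_change(cpi_start, cpi_end):
    """Relative change from cpi_start to cpi_end, expressed as a percentage."""
    return (cpi_end - cpi_start) / cpi_start * 100.0

# CPI rising from 100 to 105 corresponds to roughly 5% inflation
print(percent_change(100.0, 105.0))

# A falling CPI gives a negative rate (deflation)
print(percent_change(100.0, 98.0))
```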

## Data

We have obtained the US CPI for each month going back to the early 20th century from The Organisation for Economic Co-operation and Development.

## Analysis goals
- Use the CPI data to calculate the inflation rate at any point in history over an arbitrary number of months.
- Attempt to predict the inflation rate in future months based on the inflation rate in previous months using exponential smoothing models.
    - Evaluate how "good" the predictions are.
    - Tune the models to pick the best parameters.
    - Make inferences based on the selected parameters.
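For orientation, one common form of simple exponential smoothing is sketched below. This is an assumption about the general technique, not this notebook's specification: the exercises may use a different initialization or a different variant. The function name `simple_exp_smoothing`, the parameter `alpha`, and the sample values are all illustrative.

```python
import numpy as np

def simple_exp_smoothing(x, alpha):
    """Return s with s[0] = x[0] and
    s[t] = alpha * x[t] + (1 - alpha) * s[t-1] for t >= 1.
    Assumes x is non-empty and 0 <= alpha <= 1."""
    x = np.asarray(x, dtype=float)
    s = np.empty_like(x)
    s[0] = x[0]  # assumption: seed the recursion with the first observation
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s

rates = [2.0, 4.0, 3.0, 5.0]  # made-up inflation rates
smoothed = simple_exp_smoothing(rates, alpha=0.5)
print(smoothed)
```

With `alpha` close to 1 the smoothed series follows the most recent observations closely; with `alpha` close to 0 it changes slowly.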

In [1]:
# uncomment in Google Colab
# !python --version
!pip install dill
import dill as pickle
!pip install cryptography

Collecting dill
  Downloading dill-0.3.7-py3-none-any.whl (115 kB)
Installing collected packages: dill
Successfully installed dill-0.3.7


In [2]:
### Global Imports
import pandas as pd
import numpy as np
import pickle

# Some functionality needed by the notebook and demo cells:
from pprint import pprint, pformat
import math

# === Messages === #

def status_msg(s, verbose=True, **kwargs):
    if verbose:
        print(s, **kwargs)

# === Input/output === #

# def load_df_from_file(basename, dirname='resource/asnlib/publicdata/', abort_on_error=False, verbose=False):
def load_df_from_file(basename, dirname='', abort_on_error=False, verbose=False):
    from os.path import isfile
    from dill import loads
    from pandas import DataFrame
    df = DataFrame()
    filename = f"{dirname}{basename}"
    status_msg(f"Loading `DataFrame` from '{filename}'...", verbose=verbose)
    if isfile(filename):
        try:
            with open(filename, "rb") as fp:
                df = loads(fp.read())
            status_msg(f"  ==> Done!", verbose=verbose)
        except:
            if abort_on_error:
                raise
            else:
                df = DataFrame()
                status_msg(f"  ==> An error occurred.", verbose=verbose)
    return df

# def load_obj_from_file(basename, dirname='resource/asnlib/publicdata/', abort_on_error=False, verbose=False):
def load_obj_from_file(basename, dirname='', abort_on_error=False, verbose=False):
    from os.path import isfile
    from dill import loads
    from pandas import DataFrame
    filename = f"{dirname}{basename}"
    status_msg(f"Loading object from '{filename}'...", verbose=verbose)
    if isfile(filename):
        try:
            with open(filename, "rb") as fp:
                df = loads(fp.read())
            status_msg(f"  ==> Done! Type: `{type(df)}`", verbose=verbose)
        except:
            if abort_on_error:
                raise
            else:
                df = DataFrame()
                status_msg(f"  ==> An error occurred.", verbose=verbose)
    else:
        df = None
    return df

In [3]:
# import files
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tc_1
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tc_2
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tc_3
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/cpi_urban_all.csv

!mkdir tester_fw
%cd tester_fw

!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tester_fw/__init__.py
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tester_fw/test_utils.py
!wget https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tester_fw/testers.py

%cd ..

--2023-11-28 14:18:02--  https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tc_1
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 530616 (518K) [text/plain]
Saving to: ‘tc_1’


2023-11-28 14:18:02 (11.0 MB/s) - ‘tc_1’ saved [530616/530616]

--2023-11-28 14:18:02--  https://raw.githubusercontent.com/gt-cse-6040/topic_12_FEX_FA22_0123/main/tc_2
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 364812 (356K) [text/plain]
Saving to: ‘tc_2’


2023-11-28 14:18:03 (7.90 MB/s) - ‘tc_2’ saved [364812/364812]

--2023-11-

## Exercise 0 - (**1** Points):
To start things off we will load the CPI data into the notebook environment. You do not need to modify the cell below, just execute the test and collect your free point!

This cell will also display the first few rows and last few rows of the CPI data we just loaded.

In [4]:
cpi_all_df = pd.read_csv('cpi_urban_all.csv')
display(cpi_all_df.head())
display(cpi_all_df.tail())

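|    |   Year |   Jan |   Feb |   Mar |   Apr |   May |   Jun |   Jul |   Aug |   Sep |   Oct |   Nov |   Dec | HALF1 | HALF2 |
|---:|-------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|------:|
|  0 |   1913 |   9.8 |   9.8 |   9.8 |   9.8 |   9.7 |   9.8 |   9.9 |   9.9 |  10.0 |  10.0 |  10.1 |  10.0 |       |       |
|  1 |   1914 |  10.0 |   9.9 |   9.9 |   9.8 |   9.9 |   9.9 |  10.0 |  10.2 |  10.2 |  10.1 |  10.2 |  10.1 |       |       |
|  2 |   1915 |  10.1 |  10.0 |   9.9 |  10.0 |  10.1 |  10.1 |  10.1 |  10.1 |  10.1 |  10.2 |  10.3 |  10.3 |       |       |
|  3 |   1916 |  10.4 |  10.4 |  10.5 |  10.6 |  10.7 |  10.8 |  10.8 |  10.9 |  11.1 |  11.3 |  11.5 |  11.6 |       |       |
|  4 |   1917 |  11.7 |  12.0 |  12.0 |  12.6 |  12.8 |  13.0 |  12.8 |  13.0 |  13.3 |  13.5 |  13.5 |  13.7 |       |       |

|     |   Year |     Jan |     Feb |     Mar |     Apr |     May |     Jun |     Jul |     Aug |     Sep |     Oct |     Nov |     Dec |   HALF1 |   HALF2 |
|----:|-------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
| 105 |   2018 | 247.867 | 248.991 | 249.554 | 250.546 | 251.588 | 251.989 | 252.006 | 252.146 | 252.439 | 252.885 | 252.038 | 251.233 | 250.089 | 252.125 |
| 106 |   2019 | 251.712 | 252.776 | 254.202 | 255.548 | 256.092 | 256.143 | 256.571 | 256.558 | 256.759 | 257.346 | 257.208 | 256.974 | 254.412 | 256.903 |
| 107 |   2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.28  | 260.388 | 260.229 | 260.474 | 257.557 | 260.065 |
| 108 |   2021 | 261.582 | 263.014 | 264.877 | 267.054 | 269.195 | 271.696 | 273.003 | 273.567 | 274.31  | 276.589 | 277.948 | 278.802 | 266.236 | 275.703 |
| 109 |   2022 | 281.148 | 283.716 | 287.504 | 289.109 | 292.296 | 296.311 | 296.276 | 296.171 | 296.808 | 298.012 |         |         | 288.347 |         |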


<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 0. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [5]:
### test_cell_ex0
assert 'cpi_all_df' in globals()
assert isinstance(cpi_all_df, pd.DataFrame)
print('Passed! Please submit.')

Passed! Please submit.


## Exercise 1 - (**1** Points):
The raw data needs some light cleaning. There are some columns which we do not need for analysis, some of the numerical columns have blanks, and (due to the blanks) some numerical columns are the wrong type. We need to correct these issues before moving forward.

Define the function `cleanup_df(df, drop_cols)`. Input `df` is a DataFrame and `drop_cols` is a list of column names **which may or may not** appear in `df`.

Your function should return a new DataFrame having the same contents as `df` with the following exceptions:  
- All columns included in `drop_cols` should be dropped.  
    - Your function **should not** raise an error if a column in `drop_cols` does not appear in `df`.
- All cells which contain the value `' '` should be replaced with `np.nan`.
- All columns with month abbreviations for names (`Jan, Feb, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec`) should be converted to `float64`.
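The snippet below is a small illustration of why the blanks matter; it is not the reference solution. The frame `raw` and its values are made up for the example, and `errors='ignore'` is shown as one possible way to tolerate missing column names.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: the blank in 'Nov' keeps the column from being numeric
raw = pd.DataFrame({'Year': [2021, 2022], 'Nov': ['277.948', ' ']})
print(raw.dtypes)

# Replace the blank marker with NaN, then cast the month column to float64
fixed = raw.replace(' ', np.nan).astype({'Nov': 'float64'})
print(fixed.dtypes)

# errors='ignore' lets drop() skip labels that are absent from the frame
trimmed = fixed.drop(columns=['HALF1', 'NOT A COLUMN'], errors='ignore')
print(list(trimmed.columns))
```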


In [6]:
### Define demo inputs
demo_df_ex1 = cpi_all_df.tail().reset_index(drop=True)
display(demo_df_ex1)
demo_drop_cols_ex1 = ['HALF1', 'HALF2', 'THIS COLUMN DOESN\'T EXIST']

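|    |   Year |     Jan |     Feb |     Mar |     Apr |     May |     Jun |     Jul |     Aug |     Sep |     Oct |     Nov |     Dec |   HALF1 |   HALF2 |
|---:|-------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
|  0 |   2018 | 247.867 | 248.991 | 249.554 | 250.546 | 251.588 | 251.989 | 252.006 | 252.146 | 252.439 | 252.885 | 252.038 | 251.233 | 250.089 | 252.125 |
|  1 |   2019 | 251.712 | 252.776 | 254.202 | 255.548 | 256.092 | 256.143 | 256.571 | 256.558 | 256.759 | 257.346 | 257.208 | 256.974 | 254.412 | 256.903 |
|  2 |   2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.28  | 260.388 | 260.229 | 260.474 | 257.557 | 260.065 |
|  3 |   2021 | 261.582 | 263.014 | 264.877 | 267.054 | 269.195 | 271.696 | 273.003 | 273.567 | 274.31  | 276.589 | 277.948 | 278.802 | 266.236 | 275.703 |
|  4 |   2022 | 281.148 | 283.716 | 287.504 | 289.109 | 292.296 | 296.311 | 296.276 | 296.171 | 296.808 | 298.012 |         |         | 288.347 |         |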


<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
  
|    |   Year |     Jan |     Feb |     Mar |     Apr |     May |     Jun |     Jul |     Aug |     Sep |     Oct |     Nov |     Dec |  
|---:|-------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|  
|  0 |   2018 | 247.867 | 248.991 | 249.554 | 250.546 | 251.588 | 251.989 | 252.006 | 252.146 | 252.439 | 252.885 | 252.038 | 251.233 |  
|  1 |   2019 | 251.712 | 252.776 | 254.202 | 255.548 | 256.092 | 256.143 | 256.571 | 256.558 | 256.759 | 257.346 | 257.208 | 256.974 |  
|  2 |   2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.28  | 260.388 | 260.229 | 260.474 |  
|  3 |   2021 | 261.582 | 263.014 | 264.877 | 267.054 | 269.195 | 271.696 | 273.003 | 273.567 | 274.31  | 276.589 | 277.948 | 278.802 |  
|  4 |   2022 | 281.148 | 283.716 | 287.504 | 289.109 | 292.296 | 296.311 | 296.276 | 296.171 | 296.808 | 298.012 | NaN     | NaN     |

Notice:  
- The columns 'HALF1' and 'HALF2' were dropped.
- There was no error for trying to drop 'THIS COLUMN DOESN'T EXIST' which does not exist in `df`.
- The blanks are replaced with `np.nan` (which displays as 'NaN'). FYI `np.nan` is a `float`.

Notes:
- Check the `dtypes` attribute of your result. Columns which are months ('Jan', 'Feb', ...) should be `float64`. Any other remaining columns should have the same `dtype` as the original column in the input.

In [7]:
### Exercise 1 solution
def cleanup_df(df, drop_cols):
    ### BEGIN SOLUTION
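    months = 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split()
    month_types = {m: 'float64' for m in months}
    # Keep only the requested columns that actually exist, so drop() cannot raise
    existing_drop_cols = [c for c in df.columns if c in drop_cols]
    # Blanks -> NaN, drop the unwanted columns, then cast the month columns to float64
    return (df.replace(' ', np.nan)
              .drop(columns=existing_drop_cols)
              .astype(month_types))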
    ### END SOLUTION

### demo function call
demo_output_ex1 = cleanup_df(demo_df_ex1, demo_drop_cols_ex1)
display(demo_output_ex1)

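|    |   Year |     Jan |     Feb |     Mar |     Apr |     May |     Jun |     Jul |     Aug |     Sep |     Oct |     Nov |     Dec |
|---:|-------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|
|  0 |   2018 | 247.867 | 248.991 | 249.554 | 250.546 | 251.588 | 251.989 | 252.006 | 252.146 | 252.439 | 252.885 | 252.038 | 251.233 |
|  1 |   2019 | 251.712 | 252.776 | 254.202 | 255.548 | 256.092 | 256.143 | 256.571 | 256.558 | 256.759 | 257.346 | 257.208 | 256.974 |
|  2 |   2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.28  | 260.388 | 260.229 | 260.474 |
|  3 |   2021 | 261.582 | 263.014 | 264.877 | 267.054 | 269.195 | 271.696 | 273.003 | 273.567 | 274.31  | 276.589 | 277.948 | 278.802 |
|  4 |   2022 | 281.148 | 283.716 | 287.504 | 289.109 | 292.296 | 296.311 | 296.276 | 296.171 | 296.808 | 298.012 | NaN     | NaN     |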


<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 1. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [8]:
### test_cell_ex1
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_1',
    'func': cleanup_df, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'df':{
            'dtype':'pd.DataFrame', # data type of param.
            'check_modified':True,
        },
        'drop_cols':{
            'dtype':'list', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'pd.DataFrame',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-10)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

Passed! Please submit.


## Exercise 2 - (**2** Points):
To complete our time series analysis we need to reshape the data into a proper time series. By using earlier functions we are able to pare down the data into this form:

|    |   Year |     Jan |     Feb |     Mar |     Apr |     May |     Jun |     Jul |     Aug |     Sep |     Oct |     Nov |     Dec |  
|---:|-------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|--------:|  
|  2 |   2020 | 257.971 | 258.678 | 258.115 | 256.389 | 256.394 | 257.797 | 259.101 | 259.918 | 260.28  | 260.388 | 260.229 | 260.474 |  
|  3 |   2021 | 261.582 | 263.014 | 264.877 | 267.054 | 269.195 | 271.696 | 273.003 | 273.567 | 274.31  | 276.589 | 277.948 | 278.802 |  
|  4 |   2022 | 281.148 | 283.716 | 287.504 | 289.109 | 292.296 | 296.311 | 296.276 | 296.171 | 296.808 | 298.012 | NaN     | NaN     |  

We want to further transform it into a single dimension in chronological order. (i.e. all the data points for 2020 followed by all the data points for 2021 followed by all the data points for 2022.)

**Note**: In the example above there are no records for November and December of 2022 (because those months had not concluded at the time this exam was written).  
- For most months out of the year there will be missing values _at the end_ of the time interval.  
    - Our solution should handle this gracefully.  
- However, missing values _in the middle or at the start_ of the time interval are not expected and indicate an invalid input.  
    - Our solution should take care of this validation.

Define the function `to_ts(df)`. The input `df` can be assumed to have the following characteristics:  
- Its columns will be `'Year' 'Jan' 'Feb' 'Mar' 'Apr' 'May' 'Jun' 'Jul' 'Aug' 'Sep' 'Oct' 'Nov' 'Dec'` in that particular order.
- All of the "month" columns will be type `float64`.
- There may be some missing values which will be populated with `np.nan`.
- The records will be sorted by the `'Year'` column in ascending order.

Your function should return a new NumPy array or `None` by implementing this logic:
- Extract the values for the "month" columns only into a 2-D array.
- Flatten it to a 1-D array such that each data point is in chronological order.
- Handle the missing values.
    - Identify the index of all missing values in the 1-D array.
    - Identify the largest index of a non-missing value in the 1-D array.
    - If there are missing values anywhere except the end of the 1-D array, return `None`
    - Otherwise, return the 1-D array with the missing values removed from the end.
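The steps above can be tried out on a tiny grid before working with a full DataFrame. The sketch below is one possible arrangement of the same checks, not the reference solution; the helper name `trim_trailing_nans` is hypothetical, and returning an empty array for an all-missing input is an assumption, since the exercise does not cover that case.

```python
import numpy as np

def trim_trailing_nans(values):
    """Return the flattened values without trailing NaNs,
    or None if a NaN appears before the last real number."""
    # Row-major flatten: all of row 0, then all of row 1, and so on
    flat = np.asarray(values, dtype=float).reshape(-1)
    # Indices of the non-missing entries
    valid = np.flatnonzero(~np.isnan(flat))
    if valid.size == 0:
        return flat[:0]  # assumption: all-missing input yields an empty array
    last_valid = valid[-1]
    # Any NaN at or before the last real value is a gap in the middle or at the start
    if np.isnan(flat[:last_valid + 1]).any():
        return None
    return flat[:last_valid + 1]

grid = [[1.0, 2.0, 3.0],
        [4.0, np.nan, np.nan]]
print(trim_trailing_nans(grid))                   # trailing NaNs removed
print(trim_trailing_nans([[1.0, np.nan, 3.0]]))   # gap in the middle -> None
```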

In [9]:
### Define demo inputs

demo_df_ex2 = \
pd.DataFrame([[2021,261.582,263.014, 264.877, 267.054, 269.195, 271.696, 273.003, 273.567, 274.31, 276.589, 277.948, 278.802],
            [2022, 281.148, 283.716, 287.504, 289.109, 292.296, 296.311, 296.276, 296.171, 96.808, 298.012, np.nan, np.nan]],
            columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

demo_invalid_df_ex2 = \
pd.DataFrame([[2021,261.582,263.014, np.nan, 267.054, 269.195, 271.696, 273.003, 273.567, 274.31, 276.589, 277.948, 278.802],
            [2022, 281.148, 283.716, 287.504, 289.109, 292.296, 296.311, 296.276, 296.171, 96.808, 298.012, np.nan, np.nan]],
            columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])

<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
```
Demo output
[261.582 263.014 264.877 267.054 269.195 271.696 273.003 273.567 274.31
 276.589 277.948 278.802 281.148 283.716 287.504 289.109 292.296 296.311
 296.276 296.171  96.808 298.012]

Demo handling invalid input
None
```
The demo first runs your solution on a `df` input with missing values only at the end (an array is expected as output). It then runs your solution on a `df` input with a missing value in the middle (`None` is expected as output).

In [10]:
### Exercise 2 solution
def to_ts(df):
    assert (['Year'] + 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split()) == list(df.columns)
    ### BEGIN SOLUTION
    # 2-D array holding the month columns only
    just_vals = df.drop(columns='Year').values
    # Row-major flatten puts the months in chronological order
    ts_vals = just_vals.reshape(-1)
    null_inds = np.argwhere(np.isnan(ts_vals)).reshape(-1)
    last_not_null_ind = np.argwhere(~np.isnan(ts_vals)).reshape(-1).max()
    # Valid only if every missing value sits after the last real value
    # (.all() on an empty array is True, so "no missing values" also passes)
    if (null_inds > last_not_null_ind).all():
        return ts_vals[:last_not_null_ind + 1]
    return None
    ### END SOLUTION

### demo function call
demo_output_ex2 = to_ts(demo_df_ex2)
demo_invalid_ex2 = to_ts(demo_invalid_df_ex2)
print('Demo output')
print(demo_output_ex2)
print()
print('Demo handling invalid input')
print(demo_invalid_ex2)

Demo output
[261.582 263.014 264.877 267.054 269.195 271.696 273.003 273.567 274.31
 276.589 277.948 278.802 281.148 283.716 287.504 289.109 292.296 296.311
 296.276 296.171  96.808 298.012]

Demo handling invalid input
None


In [18]:
# Kathie solution
def to_ts(df):
    assert (['Year'] + 'Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec'.split()) == list(df.columns)
    ### BEGIN SOLUTION
    """
    Input:
    df: a DataFrame whose columns are 'Year' followed by the twelve months; rows are in ascending order by year.

    Goal:
    Return a new 1-D array, or None:
    1. Return the values read left to right, top to bottom, with any np.nan at the end removed.
    2. Return None if the input contains np.nan anywhere before the end (i.e. before the bottom-right entries).

    Strategy:
    1. Convert the month columns to a 2-D array.
    2. Flatten the 2-D array into a 1-D array.
    3. Iterate over the 1-D array and record the location of the first nan.
    4. If a number appears after a nan, the input is invalid and we immediately return None.
    5. If the nans are all located at the end of the array, return the array up to the first nan.
    """
    # https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array
    arr_2d = df.iloc[:,1:].to_numpy()
    print(arr_2d)

    # https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html
    arr_1d = arr_2d.flatten()

    starting_nan_location = None
    for i in range(arr_1d.size):
        if np.isnan(arr_1d[i]) and starting_nan_location is None:
            starting_nan_location = i
        elif not np.isnan(arr_1d[i]) and starting_nan_location is not None:
            # A number after a nan means the gap is not at the end
            return None

    if starting_nan_location is None:
        return arr_1d
    return arr_1d[:starting_nan_location]
    ### END SOLUTION

### demo function call
demo_output_ex2 = to_ts(demo_df_ex2)
demo_invalid_ex2 = to_ts(demo_invalid_df_ex2)
print('Demo output')
print(demo_output_ex2)
print()
print('Demo handling invalid input')
print(demo_invalid_ex2)

[[261.582 263.014 264.877 267.054 269.195 271.696 273.003 273.567 274.31
  276.589 277.948 278.802]
 [281.148 283.716 287.504 289.109 292.296 296.311 296.276 296.171  96.808
  298.012     nan     nan]]
[[261.582 263.014     nan 267.054 269.195 271.696 273.003 273.567 274.31
  276.589 277.948 278.802]
 [281.148 283.716 287.504 289.109 292.296 296.311 296.276 296.171  96.808
  298.012     nan     nan]]
Demo output
[261.582 263.014 264.877 267.054 269.195 271.696 273.003 273.567 274.31
 276.589 277.948 278.802 281.148 283.716 287.504 289.109 292.296 296.311
 296.276 296.171  96.808 298.012]

Demo handling invalid input
None


<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 2. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [19]:
### test_cell_ex2
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_2',
    'func': to_ts, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'df':{
            'dtype':'pd.DataFrame', # data type of param.
            'check_modified':True,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'np.ndarray',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-10)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(200):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')

[[30.9 30.9 30.9 30.9 30.9 31.  31.1 31.  31.1 31.1 31.2 31.2]
 [31.2 31.2 31.3 31.4 31.4 31.6 31.6 31.6 31.6 31.7 31.7 31.8]
 [31.8 32.  32.1 32.3 32.3 32.4 32.5 32.7 32.7 32.9 32.9 32.9]
 [32.9 32.9 33.  33.1 33.2 33.3 33.4 33.5 33.6 33.7 33.8 33.9]
 [34.1 34.2 34.3 34.4 34.5 34.7 34.9 35.  35.1 35.3 35.4 35.5]
 [35.6 35.8 36.1 36.3 36.4 36.6 36.8 37.  37.1 37.3 37.5 37.7]
 [37.8 38.  38.2 38.5 38.6 38.8 39.  39.  39.2 39.4 39.6 39.8]
 [39.8 39.9 40.  40.1 40.3 40.6 40.7 40.8 40.8 40.9 40.9 41.1]
 [41.1 41.3 41.4 41.5 41.6 41.7 41.9 42.  42.1 42.3 42.4 42.5]
 [42.6 42.9 43.3 43.6 43.9 44.2 44.3 45.1 45.2 45.6 45.9 46.2]
 [46.6 47.2 47.8 48.  48.6 49.  49.4 50.  50.6 51.1 51.5 51.9]
 [52.1 52.5 52.7 52.9 53.2 53.6 54.2 54.3 54.6 54.9 55.3 55.5]
 [55.6 55.8 55.9 56.1 56.5 56.8 57.1 57.4 57.6 57.9 58.  58.2]
 [58.5 59.1 59.5 60.  60.3 60.7 61.  61.2 61.4 61.6 61.9 62.1]
 [62.5 62.9 63.4 63.9 64.5 65.2 65.7 66.  66.5 67.1 67.4 67.7]
 [68.3 69.1 69.8 70.6 71.5 72.3 73.1 73.8 74.6 75.2 75.

## Exercise 3 - (**2** Points):
Eventually, we are going to plot some of the time series data, so we will need a date axis to provide context for users. We can extract this from our source DataFrame.

Define the function `date_series(df, n)`. The input `df` will have the columns `['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']` in that order, and the `'Year'` column will be sorted in ascending order. The values and types in the other columns do not matter. The input `n` will be a positive integer smaller than `12*df.shape[0]`.  

Your function should return a Pandas Series with dtype of `datetime64` containing the timestamp for midnight on the first day of the first `n` months represented in `df`. The [`pd.to_datetime()`](https://pandas.pydata.org/docs/reference/api/pandas.to_datetime.html) function is useful in converting the dates.
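As a small illustration of the conversion, the sketch below builds first-of-month labels for a hand-written list of years and passes them to `pd.to_datetime()`. It is not the reference solution: the names `years`, `labels`, and `dates` and the sample values are made up, and the zero-padded `YYYY-MM-01` label format is just one convenient choice.

```python
import pandas as pd

years = [1961, 1962]  # illustrative values standing in for df['Year']
n = 14

# Build 'YYYY-MM-01' labels year by year, then keep the first n of them
labels = [f"{year}-{month:02d}-01" for year in years for month in range(1, 13)][:n]

# to_datetime converts the strings to timestamps; wrapping in a Series gives a datetime64 dtype
dates = pd.Series(pd.to_datetime(labels))

print(dates.iloc[0], dates.iloc[-1])
print(pd.api.types.is_datetime64_any_dtype(dates))
```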

In [None]:
### Define demo inputs
demo_df_ex3 = pd.DataFrame(columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
demo_df_ex3['Year'] = [1961, 1962, 1963]
demo_n_ex3 = 14
display(demo_df_ex3)

<!-- Expected demo output text block -->
The demo included in the solution cell below should display the following output:
```
0    1961-01-01
1    1961-02-01
2    1961-03-01
3    1961-04-01
4    1961-05-01
5    1961-06-01
6    1961-07-01
7    1961-08-01
8    1961-09-01
9    1961-10-01
10   1961-11-01
11   1961-12-01
12   1962-01-01
13   1962-02-01
dtype: datetime64[ns]
```
Notice that the items are `datetime64` and not strings.

In [None]:
### Exercise 3 solution
def date_series(df, n):
    ### BEGIN SOLUTION
    # 'Year' is column 0, so enumerate() gives each month column its month number (1-12)
    return pd.Series([pd.to_datetime(str(year)+'-'+str(i)+'-01')
        for year in df['Year']
            for i, month in enumerate(df.columns) if month!='Year'])[:n]
    ### END SOLUTION

### demo function call
demo_output_ex3 = date_series(demo_df_ex3, demo_n_ex3)
demo_output_ex3

<!-- Test Cell Boilerplate -->
The cell below will test your solution for Exercise 3. The testing variables will be available for debugging under the following names in a dictionary format.
- `input_vars` - Input variables for your solution.
- `original_input_vars` - Copy of input variables from prior to running your solution. These _should_ be the same as `input_vars` - otherwise the inputs were modified by your solution.
- `returned_output_vars` - Outputs returned by your solution.
- `true_output_vars` - The expected output. This _should_ "match" `returned_output_vars` based on the question requirements - otherwise, your solution is not returning the correct output.

In [None]:
### test_cell_ex3
from tester_fw.testers import Tester

conf = {
    'case_file':'tc_3',
    'func': date_series, # replace this with the function defined above
    'inputs':{ # input config dict. keys are parameter names
        'df':{
            'dtype':'pd.DataFrame', # data type of param.
            'check_modified':True,
        },
        'n':{
            'dtype':'int', # data type of param.
            'check_modified':False,
        }
    },
    'outputs':{
        'output_0':{
            'index':0,
            'dtype':'pd.Series',
            'check_dtype': True,
            'check_col_dtypes': True, # Ignored if dtype is not df
            'check_col_order': True, # Ignored if dtype is not df
            'check_row_order': True, # Ignored if dtype is not df
            'check_column_type': True, # Ignored if dtype is not df
            'float_tolerance': 10 ** (-6)
        }
    }
}
tester = Tester(conf, key=b'z0BNF11iKYQicR63590bVXZGa19YGvJcmzrbP6R7oAY=', path='')
for _ in range(70):
    try:
        tester.run_test()
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
    except:
        (input_vars, original_input_vars, returned_output_vars, true_output_vars) = tester.get_test_vars()
        raise

print('Passed! Please submit.')