## Writing tests for SGLM

#### For most analysis packages that we write, we have particular data in mind as the input. However, it is unlikely that every user will have the same data. It is therefore important to write tests that check that the package works as expected.

##### Your assignment here is to write two tests for the SGLM package: 
1. The first test should check that the function `sglm` returns an error when given an invalid input.
2. The second test should check that the function `sglm` returns the correct output for a given input. 
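
As a rough illustration of the shape these two tests can take in pytest, here is a minimal sketch. The function `fit_stub` is a hypothetical stand-in, not part of the SGLM API; swap in the real call once you know which function and arguments you want to test. The first test uses `pytest.raises` to confirm that bad input fails loudly, and the second compares the output against a value worked out by hand.

```python
import numpy as np
import pandas as pd
import pytest


def fit_stub(df):
    """Hypothetical stand-in for the SGLM fitting call (not the real API)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError("input must be a pandas DataFrame")
    if df.empty:
        raise ValueError("input DataFrame is empty")
    # Placeholder "model": return the column means as a Series.
    return df.mean()


def test_invalid_input_raises():
    # Test 1: invalid input should produce an informative error.
    with pytest.raises(TypeError):
        fit_stub([1, 2, 3])
    with pytest.raises(ValueError):
        fit_stub(pd.DataFrame())


def test_known_input_gives_expected_output():
    # Test 2: a small input whose correct answer is known in advance.
    df = pd.DataFrame({"predictor1": [1.0, 3.0], "photometryNI": [0.0, 4.0]})
    result = fit_stub(df)
    np.testing.assert_allclose(result["predictor1"], 2.0)
    np.testing.assert_allclose(result["photometryNI"], 2.0)
```

Saved in a file named `test_*.py`, these functions would be collected automatically when you run `pytest` from the project root.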


##### We will provide the backbone of the tests in this notebook. You will need to fill in the details. Feel free to use any resources you like to help you write the tests (including generative AI). Just make sure to cite any resources you use.

In [None]:
import numpy as np
import pandas as pd
import sys
import os 
if os.path.basename(os.getcwd()) == 'notebooks':
    os.chdir('..')
import glob
import pytest
import random

from sglm import utils, glm_fit

In [None]:
# Write some functions to test data input
def check_input(input_data, input_type):
    if input_type == 'something':
        pass  # TODO: fill in the check for this input type
    print('Success! Data format is correct.')

def check_predictors(data, predictor_columns):
    if data is None:
        raise ValueError('data is empty!')

    print('Success! All checks passed.')
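
One possible way to fill in the two skeletons above is sketched here. The specific checks are assumptions about what "correct format" means for this exercise (a pandas DataFrame, non-empty, with every listed predictor column present and numeric), so adjust them to match what the package actually requires.

```python
import pandas as pd


def check_input(input_data, input_type):
    """Raise if input_data does not match the declared input_type."""
    if input_type == 'df':
        if not isinstance(input_data, pd.DataFrame):
            raise TypeError('expected a pandas DataFrame, got {}'.format(
                type(input_data).__name__))
    else:
        # Assumption: only DataFrame input is supported in this sketch.
        raise ValueError('unknown input_type: {}'.format(input_type))
    print('Success! Data format is correct.')


def check_predictors(data, predictor_columns):
    """Raise if data is missing, empty, or lacks a usable predictor column."""
    if data is None or len(data) == 0:
        raise ValueError('data is empty!')
    missing = [col for col in predictor_columns if col not in data.columns]
    if missing:
        raise KeyError('missing predictor columns: {}'.format(missing))
    for col in predictor_columns:
        if not pd.api.types.is_numeric_dtype(data[col]):
            raise TypeError('predictor {} is not numeric'.format(col))
    print('Success! All checks passed.')
```

With these versions, `check_predictors` raises `ValueError` on a DataFrame that has column names but no rows, so the data needs to be populated before the check will pass.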

In [None]:
##Generate fake data

num_rows = 2000  
timeseries_range = (-1, 1)

# Create an empty DataFrame
data = pd.DataFrame(columns=["SessionName", "TrialNumber", "Timestamp", "predictor1", "predictor2", "predictor3", "photometryNI"])
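
The cell above only creates the column names, so the DataFrame still has no rows. A minimal sketch of one way to populate it follows. The trial length (`rows_per_trial`), the single session name, the binary predictors, and the uniformly distributed signal are all assumptions made for illustration; they are not requirements of the package.

```python
import numpy as np
import pandas as pd

num_rows = 2000
timeseries_range = (-1, 1)
rows_per_trial = 100  # assumption: every fake trial has the same length

rng = np.random.default_rng(0)  # fixed seed so the fake data is reproducible
row = np.arange(num_rows)

data = pd.DataFrame({
    "SessionName": "session1",            # assumption: one fake session
    "TrialNumber": row // rows_per_trial,  # 0, 0, ..., 1, 1, ...
    "Timestamp": row % rows_per_trial,     # restarts at 0 within each trial
    # Assumption: predictors are binary event indicators (0 or 1).
    "predictor1": rng.integers(0, 2, size=num_rows),
    "predictor2": rng.integers(0, 2, size=num_rows),
    "predictor3": rng.integers(0, 2, size=num_rows),
    # Fake signal drawn uniformly from timeseries_range.
    "photometryNI": rng.uniform(*timeseries_range, size=num_rows),
})
```

Because the signal is random noise that is unrelated to the predictors, a model fit to this data should explain very little of it, which is fine for checking format and reproducibility.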


In [None]:
check_input(data, 'df')

In [None]:
check_predictors(data, ['predictor1', 'predictor2', 'predictor3'])

### Now that you have your data and have checked that it is in the correct format, try fitting the model and checking that the outputs are similar across different runs.
Note that the outputs will not look biologically plausible, but that is okay for now. We are only checking that the outputs are similar.

In [None]:
project_name = 'test'
project_dir = r'path/to/project'

utils.create_new_project(project_name, project_dir)
project_path = os.path.join(project_dir, project_name)
files = os.listdir(project_path)

assert 'data' in files, 'data folder not found! {}'.format(files)
assert 'results' in files, 'results folder not found! {}'.format(files)
assert 'config.yaml' in files, 'config.yaml not found! {}'.format(files)

In [None]:
df = data.copy()
index_col = ['SessionName', 'TrialNumber', 'Timestamp']
df.set_index(index_col, inplace=True)

print('Your dataframe has {} rows and {} columns'.format(df.shape[0], df.shape[1]))

In [None]:
config_file = os.path.join(project_path, 'config.yaml')
config = utils.load_config(config_file)

In [None]:
response_shift, df_predictors_shift, shifted_params = glm_fit.shift_predictors(config, df)
print('Your dataframe was shifted using: {}'.format(shifted_params))

In [None]:
X_train, X_test, y_train, y_test = glm_fit.split_data(df_predictors_shift, response_shift, config)

print('Training data has {} rows and {} columns'.format(X_train.shape[0], X_train.shape[1]))
print('Testing data has {} rows and {} columns'.format(X_test.shape[0], X_test.shape[1]))
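
Beyond printing the shapes, the split can be checked programmatically. The helper below is a hypothetical sketch, not part of SGLM: it assumes the four outputs are array-like with rows as samples, and that the training and testing sets together should account for every row that went into the split.

```python
import numpy as np


def check_split(X_train, X_test, y_train, y_test, n_total=None):
    """Hypothetical helper: basic consistency checks on a train/test split."""
    if len(X_train) != len(y_train):
        raise ValueError('X_train and y_train have different numbers of rows')
    if len(X_test) != len(y_test):
        raise ValueError('X_test and y_test have different numbers of rows')
    if np.shape(X_train)[1] != np.shape(X_test)[1]:
        raise ValueError('X_train and X_test have different numbers of columns')
    # Optional: confirm that no rows were lost or duplicated by the split.
    if n_total is not None and len(X_train) + len(X_test) != n_total:
        raise ValueError('train and test rows do not add up to n_total')
    print('Success! The split is consistent.')
```

If you pass `n_total`, use the number of rows in the shifted predictors rather than in the original DataFrame, since shifting may drop rows.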

In [None]:
# Fit the model
model, y_pred, score, beta, intercept, sparse_beta = glm_fit.fit_glm(config, X_train, X_test, y_train, y_test)
print('Your model can account for {} percent of your data'.format(score*100))

In [None]:
# Fit the model again
model2, y_pred2, score2, beta2, intercept2, sparse_beta2 = glm_fit.fit_glm(config, X_train, X_test, y_train, y_test)
print('Your model can account for {} percent of your data'.format(score2*100))

In [None]:
# Write a function to check the outputs of glm_fit.fit_glm
def check_y_pred(y_pred, y_pred2, tolerance=1):
    # TODO: compare y_pred and y_pred2 and raise an error if they differ by more than tolerance
    print('Success! y_pred and y_pred2 are similar.')
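
One way to fill in this skeleton is sketched below. It assumes "similar" means that the largest element-wise absolute difference between the two prediction arrays is at most `tolerance`; other definitions (relative difference, correlation) would be equally reasonable.

```python
import numpy as np


def check_y_pred(y_pred, y_pred2, tolerance=1):
    """Raise if two sets of predictions differ by more than tolerance."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_pred2 = np.asarray(y_pred2, dtype=float)
    if y_pred.shape != y_pred2.shape:
        raise ValueError('shape mismatch: {} vs {}'.format(
            y_pred.shape, y_pred2.shape))
    # Largest absolute element-wise difference (0.0 for empty arrays).
    max_diff = float(np.max(np.abs(y_pred - y_pred2))) if y_pred.size else 0.0
    if max_diff > tolerance:
        raise AssertionError(
            'predictions differ by up to {}, which exceeds tolerance {}'.format(
                max_diff, tolerance))
    print('Success! y_pred and y_pred2 are similar.')
    return max_diff
```

The default `tolerance=1` is kept from the skeleton, but it is loose for a signal that spans only -1 to 1. Passing a smaller value, for example `check_y_pred(y_pred, y_pred2, tolerance=1e-6)`, makes the check more meaningful when the same data and config are fit twice.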
