# COMP 352 Final Project - Spotify Predictions

### By Cameron McNamara, Bilal Adam, Maximo Babun

Requirements:

There are four sections of the final project. You are expected to perform the following tasks within each section to fulfill the project requirements. Remember that data science is cyclical in nature and requires multiple attempts and iterations. It is okay if your code moves between sections as you try different approaches, but at the end please try to organize your code into these sections for grading purposes.

- Data Importing and Pre-processing (100 Points)
  - Import dataset and describe characteristics such as dimensions, data types, file types, and import methods used
  - Clean, wrangle, and handle missing data, duplicate data, etc.
  - Encode any categorical variables
  - Perform feature engineering on the dataset
  - Transform data appropriately using techniques such as aggregation, normalization, and feature construction
  - Reduce redundant data and perform need-based discretization
- Data Analysis and Visualization (100 Points)
  - Identify categorical, ordinal, and numerical variables within data
  - Provide measures of centrality and distribution with visualizations
  - Diagnose for correlations between variables and determine independent and dependent variables
  - Perform exploratory analysis in combination with visualization techniques to discover patterns and features of interest
  - Create visualizations that allow for the discovery of insights in the data

- Data Analytics (100 Points)
  - Determine the need for a supervised or unsupervised learning method and identify dependent and independent variables
  - Choose and provide reasoning for the selected metric or metrics employed to assess your model.
  - Train, test, cross-validate, and provide performance metrics for model results
  - Try multiple different types of algorithms to determine the best model for your dataset
  - Analyze your model performance


First, we must set up our environment to make sure all required modules are installed. I have provided two methods for this. The first is to install all modules from a `.yml` environment file via `conda`.

To do this, run:
```bash
conda env create -f env_setup/data_environment.yml
```
Then activate the environment with:
```bash
conda activate data_env
```
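
Once the environment is active, a quick check can confirm that the packages this notebook imports are actually installed. The snippet below is a minimal sketch, not part of the original setup: the helper name `missing_modules` is hypothetical, and the package list is simply copied from the import cell in this notebook.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Top-level packages imported by this notebook (taken from its import cell).
REQUIRED = [
    "pandas", "numpy", "matplotlib", "seaborn", "scipy", "sklearn",
    "xgboost", "lightgbm", "folium", "branca", "geopandas",
]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

If any packages are reported as missing, the environment was probably not created or activated correctly, so re-running the two `conda` commands above is a reasonable first step.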

## Data Importing and Pre-processing <a class="anchor" id="data-importing"></a>

```python
# Import the libraries needed for this notebook
import warnings

# Third-party libraries: data handling, mapping, plotting, and modeling
import branca
import folium
import geopandas as gpd
import lightgbm as lgb
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
import xgboost as xgb
from branca.element import Figure
from folium import Marker
from folium.plugins import HeatMap
from scipy.special import boxcox1p
from scipy.stats import norm, probplot, skew
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import ElasticNet, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Project-local helpers (require the `utils` package to be importable)
from utils.metrics_utils import (
    compute_rmse_std,
    print_rmse_and_dates,
)
from utils.model_utils import (
    StackedEnsembleCVRegressor,
    time_series_split_regression,
)

# Show every column when displaying DataFrames
pd.set_option("display.max_columns", None)

# Silence all warnings (this already covers pandas FutureWarnings)
warnings.filterwarnings("ignore")

# IPython magic: render matplotlib figures inline in the notebook
%matplotlib inline
```

Running this cell raised `ModuleNotFoundError: No module named 'utils'`.
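
This `ModuleNotFoundError` typically means the project-local `utils` package is not on Python's import path, for example when the notebook is started from a different working directory. The sketch below shows one possible workaround under an assumption the notebook does not confirm: that a `utils/` folder exists in the current directory or one of its parents. The helper name `add_project_root` is hypothetical.

```python
import sys
from pathlib import Path
from typing import Optional


def add_project_root(marker: str = "utils", start: Optional[Path] = None) -> Optional[Path]:
    """Walk upward from `start` until a folder containing `marker` is found,
    then put that folder on sys.path so `import utils...` can resolve."""
    start = (start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            if str(candidate) not in sys.path:
                sys.path.insert(0, str(candidate))
            return candidate
    return None  # marker folder not found; the import would still fail


# Assumption: the notebook runs somewhere inside the project tree.
root = add_project_root()
print(f"Project root added to sys.path: {root}")
```

Running this before the import cell should let `from utils.model_utils import ...` resolve, provided the folder is found. If `None` is printed, the `utils` package is probably missing from the checkout rather than just off the path.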