# 2022-54 OPIS Fuel Price Data

Test Plan: https://sandag.sharepoint.com/qaqc/_layouts/15/Doc.aspx?sourcedoc={c0fd7f23-7faa-4f2e-b7fb-79e477e131a6}&action=edit&wd=target%282022-41.one%7C5c70225e-b636-4090-a946-7545acf4abc3%2FTest%20Plan%7C683def87-dc70-4863-8cad-59d9ef21bd00%2F%29

In [1]:
# Note: this notebook reads Excel spreadsheets, and pandas needs the openpyxl package to read .xlsx files
# Install it in your environment with "pip install openpyxl"
import pandas as pd
import sqlalchemy as sql

from pathlib import Path

# SQLAlchemy engine for the dpoe_stage database on DDAMWSQL16 (requires the pymssql driver)
ddam = sql.create_engine('mssql+pymssql://DDAMWSQL16/dpoe_stage')

## Download Data

In [2]:
def download_raw_data(user):
    """
    Load the four raw data files from the locally synced SharePoint folder. Note that copies of
    these files were put into SharePoint. Also note that the file
    "San Diego County Retail Margin History.csv" is not actually formatted as a csv file, so
    additional processing was done.

    :param user:    Windows user name whose synced SharePoint folder holds the data. This is a
                    parameter so that anyone can run the code
    :returns:       Tuple of four dataframes, one per file, in this order:
                    "San Diego County Retail Margin History.csv"
                    "OPIS_FUEL_010105to123113.xlsx"
                    "San Diego County Monthly Jan 2018-May 2019.xlsx"
                    "SDAG_OPIS retail margin history_020618.xlsx"
    """
    # The folder where the raw data is stored
    base_dir = Path(f"C:/Users/{user}/San Diego Association of Governments/SANDAG QA QC - Documents/Service Requests/2022/2022-54 OPIS Fuel Price Data QC/data/")

    # The four raw data files we are loading
    raw_files = [
        Path("San Diego County Retail Margin History.csv"),
        Path("OPIS_FUEL_010105to123113.xlsx"),
        Path("San Diego County Monthly Jan 2018-May 2019.xlsx"),
        Path("SDAG_OPIS retail margin history_020618.xlsx")
    ]

    # Read each file with the pandas reader that matches its extension
    raw_data = []
    for file in raw_files:
        path = base_dir / file
        # Print each path so the cell output records which files were read
        print(path)
        if file.suffix == ".csv":
            raw_data.append(pd.read_csv(path))
        elif file.suffix == ".xlsx":
            raw_data.append(pd.read_excel(path))
        else:
            raise ValueError(f"Unsupported file type: {path}")

    return tuple(raw_data)

a, b, c, d = download_raw_data("eli")

C:\Users\eli\San Diego Association of Governments\SANDAG QA QC - Documents\Service Requests\2022\2022-54 OPIS Fuel Price Data QC\data\San Diego County Retail Margin History.csv
C:\Users\eli\San Diego Association of Governments\SANDAG QA QC - Documents\Service Requests\2022\2022-54 OPIS Fuel Price Data QC\data\OPIS_FUEL_010105to123113.xlsx
C:\Users\eli\San Diego Association of Governments\SANDAG QA QC - Documents\Service Requests\2022\2022-54 OPIS Fuel Price Data QC\data\San Diego County Monthly Jan 2018-May 2019.xlsx
C:\Users\eli\San Diego Association of Governments\SANDAG QA QC - Documents\Service Requests\2022\2022-54 OPIS Fuel Price Data QC\data\SDAG_OPIS retail margin history_020618.xlsx
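The docstring above notes that "San Diego County Retail Margin History.csv" is not actually formatted as a csv file. When an extension can't be trusted, the first bytes of the file usually reveal its real format. The sketch below is one way to check, assuming the file is either a spreadsheet container (.xlsx or legacy .xls), an HTML/XML export, or delimited text; `detect_format`, `sniff_delimiter`, and `sniff_file_format` are hypothetical helpers, not part of this notebook, and `bytes.removeprefix` needs Python 3.9 or later.

```python
import csv

# Leading bytes ("magic numbers") of common spreadsheet containers
XLSX_MAGIC = b"PK\x03\x04"                        # .xlsx files are ZIP archives
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # legacy .xls (OLE2 compound file)
UTF8_BOM = b"\xef\xbb\xbf"


def detect_format(head: bytes) -> str:
    """Guess a file format from its first bytes (hypothetical helper)."""
    if head.startswith(XLSX_MAGIC):
        return "xlsx"
    if head.startswith(XLS_MAGIC):
        return "xls"
    # Drop a UTF-8 byte-order mark and leading whitespace before checking for markup
    if head.removeprefix(UTF8_BOM).lstrip().startswith(b"<"):
        return "html/xml"
    return "text"


def sniff_delimiter(sample: str):
    """Return the most likely delimiter of a text sample, or None if csv.Sniffer can't tell."""
    try:
        return csv.Sniffer().sniff(sample, delimiters=",\t;|").delimiter
    except csv.Error:
        return None


def sniff_file_format(path, n_bytes=4096):
    """Report a file's apparent format and, for plain text, its delimiter."""
    with open(path, "rb") as f:
        head = f.read(n_bytes)
    fmt = detect_format(head)
    if fmt != "text":
        return fmt, None
    return fmt, sniff_delimiter(head.decode("utf-8-sig", errors="replace"))
```

For example, if `sniff_file_format(path)` returned `("xlsx", None)` for the ".csv" file, it could be read with `pd.read_excel` despite its extension; a result of `("text", "\t")` would instead point to `pd.read_csv(path, sep="\t")`.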


In [5]:
b

| Unnamed: 0 | Region Name | Product | Start Date | Station Count | Retail Average |
|---|---|---|---|---|---|
| 0 | County - CA, San Diego | UNL | 2005-01-01 | 421 | 2.003603 |
| 1 | County - CA, San Diego | MID | 2005-01-01 | 377 | 2.124144 |
| 2 | County - CA, San Diego | PRE | 2005-01-01 | 283 | 2.220431 |
| 3 | County - CA, San Diego | DSL | 2005-01-01 | 148 | 2.209344 |
| 4 | County - CA, San Diego | UNL | 2005-02-01 | 440 | 2.147615 |
| ... | ... | ... | ... | ... | ... |
| 427 | County - CA, San Diego | DSL | 2013-11-01 | 393 | 4.041660 |
| 428 | County - CA, San Diego | UNL | 2013-12-01 | 676 | 3.643526 |
| 429 | County - CA, San Diego | MID | 2013-12-01 | 664 | 3.736425 |
| 430 | County - CA, San Diego | PRE | 2013-12-01 | 664 | 3.832388 |
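In the output of `b`, the `Unnamed: 0` column holds 0, 1, 2, ..., which looks like a row index that was saved into the workbook along with the data. The sketch below reproduces that effect with a small in-memory CSV instead of the real Excel file (the two-row frame is made up for illustration) and shows how `index_col=0` folds the column back into the index. `pd.read_excel` accepts the same `index_col` parameter, so the same fix should apply if the column is confirmed to be a saved index.

```python
import io

import pandas as pd

# Made-up two-row frame standing in for the OPIS workbook
df = pd.DataFrame({"Product": ["UNL", "MID"], "Retail Average": [2.003603, 2.124144]})

# Writing with the default index=True stores the row index as an unnamed first column
buffer = io.StringIO()
df.to_csv(buffer)

# Reading it back without index_col turns that column into "Unnamed: 0"
buffer.seek(0)
without_index_col = pd.read_csv(buffer)

# index_col=0 uses the first column as the index instead
buffer.seek(0)
with_index_col = pd.read_csv(buffer, index_col=0)

print(list(without_index_col.columns))  # ['Unnamed: 0', 'Product', 'Retail Average']
print(list(with_index_col.columns))     # ['Product', 'Retail Average']
```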


In [3]:
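def download_SQL_data(engine=ddam):
    """
    Download the two SQL tables

    :param engine:  SQLAlchemy engine for the dpoe_stage database (defaults to ``ddam``)
    :returns:       Tuple containing two dataframes. In order, data contained comes from the SQL
                    tables:
                    [dpoe_stage].[fuel_price_opis].[price_fact]
                    [dpoe_stage].[fuel_price_opis].[date_dim]
    """
    # The engine already points at dpoe_stage, so only the schema and table names are needed
    tables = ["price_fact", "date_dim"]
    return tuple(
        pd.read_sql_table(table, engine, schema="fuel_price_opis") for table in tables
    )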
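Once both the raw files and the SQL tables are loaded, one natural QC step is to line them up and look for rows that exist in only one source or whose prices disagree. The sketch below does this with an outer `merge` and `indicator=True`. `compare_prices` is a hypothetical helper; it assumes both frames have already been given matching key columns and a value column with the same name, since the SQL tables' actual column names are not shown in this notebook.

```python
import pandas as pd


def compare_prices(raw, sql_data, keys, value_col, tol=1e-6):
    """
    Hypothetical QC helper: outer-join two frames on `keys` and return
    (rows missing from one side, rows whose `value_col` differs by more than `tol`).
    Assumes both frames contain `keys` and a column named `value_col`.
    """
    merged = raw.merge(sql_data, on=keys, how="outer",
                       suffixes=("_raw", "_sql"), indicator=True)
    # Rows present in only one of the two sources
    missing = merged[merged["_merge"] != "both"]
    # Rows present in both sources whose values disagree beyond the tolerance
    both = merged[merged["_merge"] == "both"]
    diff = (both[f"{value_col}_raw"] - both[f"{value_col}_sql"]).abs()
    mismatched = both[diff > tol]
    return missing, mismatched
```

The `tol` argument avoids flagging differences that come only from floating-point rounding between Excel and SQL Server; the right value depends on how many decimals each source stores.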

## Test Functions
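The output of `b` suggests one row per month for each product code (UNL, MID, PRE, DSL). A minimal sketch of a test for that, assuming the `Start Date` and `Product` column names shown above; `find_missing_products` and `EXPECTED_PRODUCTS` are hypothetical names, and the expected codes are inferred from the displayed rows rather than from a specification. Months that are absent entirely are not caught by this check.

```python
import pandas as pd

# Product codes seen in the raw OPIS data (inferred from the displayed rows)
EXPECTED_PRODUCTS = {"UNL", "MID", "PRE", "DSL"}


def find_missing_products(df, date_col="Start Date", product_col="Product",
                          expected=EXPECTED_PRODUCTS):
    """Return {month: sorted missing product codes} for months lacking any expected product."""
    missing = {}
    for month, group in df.groupby(date_col):
        gap = expected - set(group[product_col])
        if gap:
            missing[month] = sorted(gap)
    return missing


# Made-up example: February is missing its DSL row
example = pd.DataFrame({
    "Start Date": ["2005-01-01"] * 4 + ["2005-02-01"] * 3,
    "Product": ["UNL", "MID", "PRE", "DSL", "UNL", "MID", "PRE"],
})
print(find_missing_products(example))  # {'2005-02-01': ['DSL']}
```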

## Running Tests