# LAB 1
## Load Data and visualize
In this lab you will load measurements from a text file.<br>
We could do it by hand, but we will use the pandas library instead. It is a powerful tool for time-series processing, although it is not efficient for real-time applications. <br>
First let's import required libraries.

In [None]:
import os, sys
import pandas
import numpy as np
from datetime import datetime, timedelta
import pytz
import matplotlib.pyplot as plt

## Read a measurement csv file

### old school way
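Open the file <br>
Read it line by line, skip the comment lines, split each line on the delimiter and convert the fields to numbers.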
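The manual approach could look like the minimal sketch below. It assumes every non-comment line holds only numeric, semicolon-separated fields, as in the lab file. `parse_measurements` is a hypothetical helper name, and the sample lines are made-up values in the same column layout.

```python
import io

def parse_measurements(lines, delimiter=';', comment='#'):
    """Hypothetical helper: turn semicolon-separated lines into rows of floats."""
    rows = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and lines starting with the comment character
        if not line or line.startswith(comment):
            continue
        # Split on the delimiter and convert every field to a number
        rows.append([float(field) for field in line.split(delimiter)])
    return rows

# Made-up sample in the lab layout: Year;Month;Day;UT;GHI;DHI;BNI;T2;RH
sample = io.StringIO(
    "# header comment\n"
    "2005;1;1;0.0167;-999;-999;-999;-2.1;85\n"
    "2005;1;1;12.5;310;120;480;3.4;60\n"
)
rows = parse_measurements(sample)
print(rows[1][3])  # 12.5
```

On the real file the same helper would be called as `with open('DATA/BSRN_PAY_1MIN_2005.csv') as f: rows = parse_measurements(f)`. Building the columns, handling dates and missing values would still be left to you, which is what pandas saves us.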

### pandas
A CSV file can be read directly with pandas, and dates can be parsed on the fly when their format is standard: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html <br>
In our case, however, the hour is given as a decimal number, which is not a supported standard format, so the conversion has to be done separately.

In [None]:
Data=pandas.read_csv('DATA/BSRN_PAY_1MIN_2005.csv',delimiter=';',comment='#',header=None,names=['Year','Month','Day','UT','GHI','DHI','BNI','T2','RH'])
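To see what each `read_csv` argument does without the data file, here is a small sketch that runs the same call on two made-up lines held in memory (`io.StringIO` stands in for the file; the values are illustrative only).

```python
import io
import pandas

# Two made-up lines in the lab-file layout, preceded by a comment line
csv_text = (
    "# this whole line is ignored because of comment='#'\n"
    "2005;1;1;0.0167;-999;-999;-999;-2.1;85\n"
    "2005;1;1;12.5;310;120;480;3.4;60\n"
)
demo = pandas.read_csv(
    io.StringIO(csv_text),
    delimiter=';',   # fields are separated by semicolons
    comment='#',     # the rest of a line after '#' is not parsed
    header=None,     # the file has no header row...
    names=['Year', 'Month', 'Day', 'UT', 'GHI', 'DHI', 'BNI', 'T2', 'RH'],  # ...so we name the columns
)
print(demo.shape)  # (2, 9)
```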

Visualise the first rows of the results with the head command

In [None]:
Data.head(10)

As mentioned above, the time is not in a standard format. The following lines convert the decimal UT time into hours and minutes.

In [None]:
#convert decimal UT time to a whole number of minutes, then split it into hours and minutes
#(rounding once avoids inconsistent pairs such as hour 0 / minute 60 for values just below a full hour)
total_minutes=np.round(Data['UT']*60).astype(int)
#insert the new columns in the dataframe at a given position
Data.insert(3,'Hour',total_minutes//60)
Data.insert(4,'Minutes',total_minutes%60)
#remove 'UT' from dataframe
Data.drop(columns='UT',inplace=True)
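A quick check of the decimal-hour arithmetic on a few hypothetical UT values. It compares splitting hour and minute independently (`int(UT)` and `round(UT*60 % 60)`) with rounding once to a whole number of minutes. The value 0.99999 is an assumed example of a time stored just below a full hour.

```python
import numpy as np

# Hypothetical decimal UT hours; 0.99999 mimics a value just below 01:00
ut = np.array([0.0167, 12.5, 0.99999, 23.9833])

# Independent split: hour is truncated, minute is rounded separately
hour_naive = ut.astype(int)
minute_naive = np.round((ut * 60) % 60).astype(int)

# Round once to whole minutes, then split with integer division
total_min = np.round(ut * 60).astype(int)
hour_safe = total_min // 60
minute_safe = total_min % 60

print(hour_naive.tolist(), minute_naive.tolist())  # [0, 12, 0, 23] [1, 30, 60, 59]
print(hour_safe.tolist(), minute_safe.tolist())    # [0, 12, 1, 23] [1, 30, 0, 59]
```

The independent split produces the invalid pair hour 0 / minute 60, which `to_datetime` would reject. The rounded-once version gives 01:00.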

Check the result with the head command once again.

In [None]:
Data.head()

Now that the time is in a standard format, it can be converted to the datetime format.<br>
Use the pandas to_datetime command: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html

In [None]:
Data['Datetime']=pandas.to_datetime(Data[['Year','Month','Day','Hour','Minutes']],utc=True)
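When given a dataframe, to_datetime assembles timestamps from columns named after time units. The matching ignores case and accepts plurals, which is why 'Year' and 'Minutes' are recognised. A small sketch with two made-up rows:

```python
import pandas

# Made-up date components using the same column names as the lab
parts = pandas.DataFrame({
    'Year': [2005, 2005], 'Month': [1, 7], 'Day': [1, 10],
    'Hour': [0, 12], 'Minutes': [1, 30],
})
# utc=True returns timezone-aware (UTC) timestamps
stamps = pandas.to_datetime(parts, utc=True)
print(stamps.iloc[1])  # 2005-07-10 12:30:00+00:00
```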

Check the regularity of the timestamps <br>
The mean interval should be 1 min

In [None]:
np.mean(Data['Datetime'].diff())

The standard deviation should be close to 0

In [None]:
np.std(Data['Datetime'].diff())
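The sketch below shows what the two statistics reveal when samples are missing. It uses a hypothetical series with one 3-minute step (two missing samples) and then locates the gap by keeping the rows whose step exceeds 1 min.

```python
import pandas

# Hypothetical 1-min timestamps where 00:03 and 00:04 are missing
t = pandas.Series(pandas.to_datetime(
    ['2005-01-01 00:01', '2005-01-01 00:02', '2005-01-01 00:05', '2005-01-01 00:06'],
    utc=True))
steps = t.diff()              # first step is NaT and is skipped by mean/std
print(steps.mean())           # 0 days 00:01:40
print(steps.std())            # non-zero because of the gap

# Timestamps that end a gap (step larger than 1 min)
gaps = t[steps > pandas.Timedelta(minutes=1)]
```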

Set datetime as index: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.set_index.html

In [None]:
Data.set_index('Datetime',drop=True,inplace=True)

Create a 1-min date vector starting at 20050101T00:01:00Z and ending at 20051231T24:00Z (that is, 20060101T00:00Z) <br>
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.date_range.html

In [None]:
tzinfo=pytz.timezone('utc')
date_begin=datetime(2005,1,1,0,tzinfo=tzinfo)
date_end=datetime(2005,12,31,0,0,tzinfo=tzinfo)
complete_index=pandas.date_range(date_begin+timedelta(minutes=1),date_end+timedelta(days=1),freq='min',tz='utc')
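The same vector can also be built from string endpoints. This is an equivalent alternative to the cell above, not the lab's required method. Its length gives a sanity check: 2005 is not a leap year, so the vector should hold 365 × 1440 = 525600 timestamps.

```python
import pandas

# 1-min UTC vector from 2005-01-01 00:01 to 2006-01-01 00:00, both ends included
idx = pandas.date_range('2005-01-01 00:01', '2006-01-01 00:00', freq='min', tz='UTC')
print(len(idx))  # 525600
```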

Use the reindex command to apply the previous vector as the index (reindex returns a new dataframe, so the result has to be assigned back): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reindex.html

In [None]:
Data=Data.reindex(complete_index)
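A small sketch of what reindex does, on a hypothetical series with one missing minute. Timestamps absent from the original data appear as NaN, and the original object is left unchanged.

```python
import pandas

idx = pandas.date_range('2005-01-01 00:01', periods=4, freq='min', tz='UTC')
# Hypothetical data where the 00:03 sample is missing
s = pandas.Series([1.0, 2.0, 4.0], index=idx[[0, 1, 3]])

filled = s.reindex(idx)     # new Series on the complete index; 00:03 becomes NaN
print(filled.isna().sum())  # 1
print(len(s))               # 3 (s itself was not modified)
```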

Replace all the missing values (flagged as -999) with the numpy not-a-number value np.nan (pandas replace command)

In [None]:
Data=Data.replace(-999,np.nan)

Check whether data were missing in the initial dataset, and output the dates where data are missing

In [None]:
Data.index[Data.isnull().any(axis=1)]
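The two previous steps can be tried on a tiny hypothetical dataframe. It shows the -999 sentinels turning into NaN, and `isnull().any(axis=1)` selecting every timestamp where at least one column is missing. A per-column count is also printed, which can be handy on the full year.

```python
import numpy as np
import pandas

idx = pandas.date_range('2005-01-01 00:01', periods=3, freq='min', tz='UTC')
# Hypothetical values with -999 used as the missing-data flag
df = pandas.DataFrame({'GHI': [0.0, -999.0, 5.0], 'T2': [-2.1, -2.0, -999.0]}, index=idx)

df = df.replace(-999, np.nan)                   # sentinel -> NaN
missing = df.index[df.isnull().any(axis=1)]     # rows with at least one NaN
print(len(missing))  # 2
print(df.isnull().sum())                        # number of missing samples per column
```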

Plot the values. In Jupyter there is a shortcut for importing everything you need: %pylab inline <br>
But be careful: it overwrites the names you imported before. We will not use it here since all libraries are imported in the first cell.<br>
In pandas you can call plot directly on the desired column.

In [None]:
Data['GHI'].plot(title='1-min GHI - 2005')
plt.ylabel(r'W.$m^{-2}$')
plt.xlabel('Date')

Do the same but for a restricted date range (10 days during 2005).

In [None]:
#select 10 days: from 2005-07-01 00:01 to 2005-07-11 00:00
period=(Data.index >datetime(2005,7,1,tzinfo=tzinfo)) & (Data.index <= datetime(2005,7,11,tzinfo=tzinfo))
Data['GHI'].loc[period].plot(title='1-min GHI - 2005 - 10 days in summer')
plt.ylabel(r'W.$m^{-2}$')
plt.xlabel('Date')
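A shorter alternative, sketched here on placeholder values, is partial string indexing on the datetime index. The end date in `.loc['2005-07-01':'2005-07-10']` includes that whole day, so the slice covers 10 full days. Unlike the lab's 00:01 to 24:00 day convention, it runs from 00:00 to 23:59.

```python
import numpy as np
import pandas

idx = pandas.date_range('2005-01-01 00:01', '2006-01-01 00:00', freq='min', tz='UTC')
ghi = pandas.Series(np.arange(len(idx), dtype=float), index=idx)  # placeholder values

# Both string bounds are inclusive at day resolution
ten_days = ghi.loc['2005-07-01':'2005-07-10']
print(len(ten_days))  # 14400 = 10 days x 1440 min
```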

Plot the data as a 2D image: minute of the day against day of the year.<br>
Use the imshow command. You need to reshape the data to the right dimensions, which means knowing the number of days in 2005 and the number of minutes in a day. Finally, add a colorbar.

In [None]:
days=pandas.date_range(date_begin,date_end,freq='D')
nb_days=len(days)
nb_min=1440
GHI=np.reshape(Data.GHI.values,(nb_days,nb_min))
plt.figure(figsize=(15,5))
plt.imshow(np.transpose(GHI),aspect=.1)
plt.colorbar(fraction=0.046, pad=0.04)
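To see why the reshape puts one day per row, here is the same operation on toy sizes (3 days of 4 "minutes" instead of 365 × 1440). numpy fills rows in order by default, so consecutive samples fill one day before moving to the next. The transpose then places minutes on the vertical axis of the image.

```python
import numpy as np

nb_days, nb_min = 3, 4                        # toy sizes instead of 365 x 1440
series = np.arange(nb_days * nb_min)          # consecutive samples, day after day
grid = np.reshape(series, (nb_days, nb_min))  # row d holds all minutes of day d
image = np.transpose(grid)                    # minutes vertical, days horizontal
print(image[:, 1].tolist())  # [4, 5, 6, 7] -> the second day
```

The reshape only works if the series length is exactly `nb_days * nb_min`. This is one reason the data were first reindexed on the complete 1-min vector.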

Do the same for all the relevant columns of the data ['GHI', 'DHI', 'BNI', 'T2', 'RH'] using a loop and the subplot command.

In [None]:
plt.figure(figsize=(15,10))
for i,k in enumerate(['GHI','DHI','BNI','T2','RH']):
    plt.subplot(3,2,i+1)
    dat=np.reshape(Data[k].values,(nb_days,nb_min))
    plt.imshow(np.transpose(dat),aspect=.1)
    plt.colorbar(fraction=0.046, pad=0.04)
    plt.title(k)
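The same grid can also be drawn with matplotlib's object-oriented interface (`plt.subplots` plus `fig.colorbar`), which makes it easy to hide the sixth, unused panel. This is a sketch under stated assumptions. Random numbers stand in for the reshaped lab data, and the column list is taken from the text above.

```python
import matplotlib.pyplot as plt
import numpy as np

nb_days, nb_min = 365, 1440
columns = ['GHI', 'DHI', 'BNI', 'T2', 'RH']
rng = np.random.default_rng(0)
# Placeholder data: one random year of 1-min values per column
values = {k: rng.random(nb_days * nb_min) for k in columns}

fig, axes = plt.subplots(3, 2, figsize=(15, 10))
for ax, k in zip(axes.flat, columns):
    # one day per row, transposed so minutes run vertically
    img = ax.imshow(values[k].reshape(nb_days, nb_min).T, aspect=.1)
    fig.colorbar(img, ax=ax, fraction=0.046, pad=0.04)
    ax.set_title(k)
axes[-1, -1].axis('off')  # 6 panels for 5 variables: hide the spare one
fig.tight_layout()
```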