# A Python short course on Atmospheric Data Analysis - Week 3

This Python tutorial was written in June 2024 by Ludving Cano, Research Assistant at the [Laboratory for Atmospheric Physics](http://www.chacaltaya.edu.bo) - UMSA (lcano@chacaltaya.edu.bo). It shows how to handle time data and make basic plots with it.

In **week 3** we will cover:

 - Datetime objects
   - Formatting datetimes
 - Pandas functions for datetime
 - Simple time plots
 - Plotting multiple things together

## 1. The `datetime` library
Let's start with a library that's going to be useful in general: the [datetime library](https://docs.python.org/3/library/datetime.html) consists of functions and objects dedicated to handling... time!

For example, when we declare a date we usually write it as a string. For the computer it is a string like any other, so it can't interpret its year, month, day, etc.

In [None]:
day0s = '2024/06/27'
type(day0s) #its type is just an ordinary string

If we want, for example, to add one day to this string, it's PAINFULLY hard, and it can bring some problems!
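
To see why, here is a small sketch of the naive approach. The date `2024/06/30` is a made-up example, chosen because it falls at the end of a month. Splitting the string and adding 1 to the day looks fine until the month ends:

```python
# Naive attempt: split the string and add 1 to the day part
day_str = '2024/06/30'                      # hypothetical date, last day of June
year, month, day = day_str.split('/')
next_day_str = f'{year}/{month}/{int(day) + 1:02d}'
print(next_day_str)                         # 2024/06/31, a date that does not exist!
```

To do it properly we would have to code the length of every month (and leap years) ourselves.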

Our saviour for today: the datetime library. Let's import it.

In [None]:
import datetime as dt     #it's very common to import it with the dt alias

### 1.1. Creating a dt.datetime object
We have three objects to work with:
 1. `date`, consisting only of the date information (not knowing the exact time)
 2. `time`, consisting only of the time (not knowing the date)
 3. `datetime`, consisting of both date and time information.
 
We'll work mainly on the last one, but feel free to explore the other two using the [documentation](https://docs.python.org/3/library/datetime.html).

Okay, let's create a datetime object. It's important to notice that the object has the same name as the library (datetime too), so to create the object we use `dt.datetime`.

First, let's take a look at what we need to pass to create a datetime object:

In [None]:
# help(dt.datetime) #uncomment it to use

The basic way to create the object is to pass the year, month, day, hour, minute and second. The time information is optional; if we leave it out, it is assumed to be 00:00.

For the sake of the example we will set the date and time for the start of this class.

In [None]:
day0 = dt.datetime(2024, 6, 27, 13, 30)
day0

If we want to create a datetime object with the current date and time, we use:

In [None]:
dt.datetime.now()

### 1.2. Getting information from an existing datetime object

We can get any information by simply calling the attributes:

In [None]:
day0.hour
# try changing it to: year, month, day, minute, second, etc.

We can also get a `date` and a `time` object.

In [None]:
day0.date()

In [None]:
day0.time()
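
As a small sketch of how these pieces fit together (re-creating `day0` so the cell runs on its own): `dt.datetime.combine()` glues a `date` and a `time` back into a `datetime`, and the `weekday()` method tells us which day of the week it was.

```python
import datetime as dt

day0 = dt.datetime(2024, 6, 27, 13, 30)   # same start-of-class datetime as above

d = day0.date()                           # only the date part
t = day0.time()                           # only the time part

rebuilt = dt.datetime.combine(d, t)       # join both parts again
print(rebuilt == day0)                    # True

print(day0.weekday())                     # 3 (Monday is 0, so 3 means Thursday)
```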

### 1.3. Operations using datetime objects

#### 1.3.1. Adding or subtracting time

What if we want to change the time to the end of the next class? Then we add time! (That's beautiful: it even accounts for leap years.)

For this we simply add a `dt.timedelta()` object, in which we put the desired change.

In [None]:
dayf = day0 + dt.timedelta(days = 7, hours = 3)
dayf

If you want to go back in time, you can subtract the timedelta object OR pass negative arguments to `dt.timedelta()`.
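
Both options can be sketched like this (the 7-day step is just an example value), and they give the same result:

```python
import datetime as dt

day0 = dt.datetime(2024, 6, 27, 13, 30)   # same start-of-class datetime as above

# Option 1: subtract a positive timedelta
back1 = day0 - dt.timedelta(days=7)

# Option 2: add a timedelta built with negative arguments
back2 = day0 + dt.timedelta(days=-7)

print(back1)            # 2024-06-20 13:30:00
print(back1 == back2)   # True
```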

#### 1.3.1. Replacing parts of the object

Oh no! I put the wrong month. What if we want to replace the month? We'll use the `.replace()` method of the datetime object, which returns a new object with the change.

In [None]:
daymod = day0.replace(month = 8)
daymod

#### 1.3.2. Subtracting datetime objects

What if we want to know how many days have passed since something? For example, since Christmas? We can subtract datetime objects and get a timedelta as a result.

In [None]:
last_xmas = dt.datetime(2023, 12, 25)
today = dt.datetime.now()

dif = today - last_xmas
dif

Take into account that the resulting timedelta only has the attributes `days`, `seconds` and `microseconds`.
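
A minimal sketch of what those attributes hold, using a fixed end date instead of `now()` so the numbers don't change every time we run it. If we need the whole difference in one unit, the `total_seconds()` method is the way to go:

```python
import datetime as dt

last_xmas = dt.datetime(2023, 12, 25)
class_start = dt.datetime(2024, 6, 27, 13, 30)   # fixed date, used instead of now()

dif = class_start - last_xmas

print(dif.days)              # 185   -> whole days
print(dif.seconds)           # 48600 -> seconds left over after the whole days (13.5 hours)
print(dif.total_seconds())   # the complete difference expressed in seconds

# there is no dif.hours, so we compute it ourselves
hours = dif.total_seconds() / 3600
print(hours)
```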

<b><font color="green" size=5>Exercise 1: Your real age</font></b>

Create the necessary datetimes and answer the following questions:
1. How much time has passed since your last birthday?
2. How much time is left until your next birthday?


#### 1.3.3. Comparing dates

Just for info, we can compare two datetime objects to know whether one happened before or after the other, using `<` and `>`:

In [None]:
day0 < dayf

#### 1.3.4. Sorting a list of datetimes
Since we can apply the operators `<` and `>` to these objects, we can also sort a list made of them. For example, I can compile some of the datetime objects we created before into a list and show it.


In [None]:
lst_datetimes = [daymod, day0, dayf]
lst_datetimes

To sort it, we just do:


In [None]:
sorted(lst_datetimes)

## 1.4. Formatting datetimes
### 1.4.1. String to datetime
Remember our first example? `day0s`

In [None]:
day0s

How can we convert that string to a datetime? For this the library has the `strptime()` function, which takes a string and a format. With those, the function parses the string and returns the desired object.

For example, our day0s string has the format year/month/day. According to the [format codes in the documentation](https://docs.python.org/3/library/datetime.html#format-codes), this format can be written as `%Y/%m/%d`. So we pass it to the function:

In [None]:
day0f = dt.datetime.strptime(day0s, '%Y/%m/%d')
day0f
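
Here is one more sketch with a made-up string that also carries the time (day first, then month). It also shows what happens when the format does not match the string: `strptime()` raises a `ValueError`.

```python
import datetime as dt

stamp = '27/06/2024 13:30:00'   # hypothetical string: day/month/year hour:minute:second

parsed = dt.datetime.strptime(stamp, '%d/%m/%Y %H:%M:%S')
print(parsed)                   # 2024-06-27 13:30:00

# A format that does not match the string raises a ValueError
try:
    dt.datetime.strptime(stamp, '%Y/%m/%d')
except ValueError as err:
    print('Wrong format:', err)
```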

<b><font color="green" size=4>Exercise 2: Format a date</font></b>

Convert the following string to a datetime object.

In [None]:
test_str = '10-13-2024 10:00'

### 1.4.2. Datetime to string
Sometimes we want to print a datetime in a certain format, whether to write it to a file in a simple way, display it on screen, or any other use.

For this we pass a datetime object and the desired output format to the `dt.datetime.strftime()` function.
For example, I can print the `.now()` datetime in the format of Exercise 2:

In [None]:
dt.datetime.strftime(dt.datetime.now(), '%m-%d-%Y %H:%M')
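
As a sketch of some equivalent spellings: `strftime()` can also be called directly on the object, the same format codes work inside an f-string, and `isoformat()` gives the standard ISO 8601 text. A fixed datetime is used here so the output is always the same.

```python
import datetime as dt

day0 = dt.datetime(2024, 6, 27, 13, 30)

print(day0.strftime('%m-%d-%Y %H:%M'))   # 06-27-2024 13:30
print(f'{day0:%Y-%m-%d}')                # 2024-06-27
print(day0.isoformat())                  # 2024-06-27T13:30:00
```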

## 2. Pandas functions for datetime

### 2.1. Columns of datetime

In the last session we observed that pandas doesn't automatically infer every data type. For example, a column that clearly contains dates is interpreted as `object`.

We will use the IUV file from last week: `data_samples/IUV_18_06.csv`

In [None]:
# let's not forget to import pandas!
import pandas as pd

In [None]:
df1 = pd.read_csv('data_samples/IUV_18_06.csv')
df1.dtypes

In [None]:
type(df1.TIME[0])

To convert the TIME column to a datetime-like column, we use the `pd.to_datetime()` function, passing the desired column as a parameter. **NOTE: It will do its best to infer which format we are using, but the guess can be wrong for ambiguous dates, so you can pass the exact format with the `format` argument just in case.**

In [None]:
pd.to_datetime(df1.TIME)
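
A minimal sketch of passing the exact format. The series below is made up (it is not the IUV file), and `errors='coerce'` is an optional extra that turns anything unparseable into `NaT` instead of raising an error.

```python
import pandas as pd

# hypothetical column of strings written day first, with one bad value
times = pd.Series(['18/06/2024 06:00', '18/06/2024 06:30', 'not a date'])

parsed = pd.to_datetime(times, format='%d/%m/%Y %H:%M', errors='coerce')
print(parsed)
print(parsed.isna().sum())   # number of values that could not be parsed
```

Without `errors='coerce'`, the bad value would stop the conversion with an error, which is often what we want when checking a new file.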

To replace the column in our original dataframe, we assign the result back to it, as we did before:


In [None]:
df1['TIME'] = pd.to_datetime(df1.TIME)
df1.dtypes

### 2.2. Some functions of a datetimelike column
Sometimes we need to get something from each datetime, like the hour. Since we have a Series, we would need to get the value for each row, which can be done (manually) like this:

In [None]:
### NOT RECOMMENDED
pd.Series([i.hour for i in df1.TIME])

But it is easier with the `Series.dt` accessor, which lets us get that information for the whole column at once.

In [None]:
## this is the recommended way!
df1.TIME.dt.hour
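
The accessor offers more than the hour. Here is a short sketch on a made-up series (three invented timestamps, not the IUV data):

```python
import pandas as pd

# hypothetical datetime column
times = pd.Series(pd.to_datetime(['2024-06-18 06:00',
                                  '2024-06-18 12:30',
                                  '2024-06-19 18:45']))

print(times.dt.hour)                # 6, 12, 18
print(times.dt.date)                # only the date part of each row
print(times.dt.day_name())          # Tuesday, Tuesday, Wednesday
print(times.dt.strftime('%H:%M'))   # back to strings, in the format we choose
```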

### 2.3. An example for different columns
Let's use, for example, the DECADE file we worked with last week. It has separate columns for year, month and day, so to get a single datetime column we will do the following:

In [None]:
## This example will be worked with the instructor, solutions below



<details><summary><b><font color="black">If the instructor is not available click here</font></b></summary>

```
# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df_dec = pd.read_csv('data_samples/EL_ALTO_AEROPUERTO_simple.dat',
                sep=r'\s+', skiprows = 12)
df_dec = df_dec[['Year','Mo','Da','Param=1']]
df_dec.columns = ['Year','Month', 'Day', 'PRCP']
# to_datetime can build a datetime from columns named Year, Month and Day
df_dec['datetime'] = pd.to_datetime(df_dec[['Year','Month','Day']])
```

</details>


## (Little parenthesis): Reading a Davis AWS file
The Davis Automatic Weather Station was located at an experimental site last year. Its file is a bit complicated, as the header consists of two rows!

The file is located at `data_samples/FabricaForno_Davis_WS.txt`.

In [None]:
davis = pd.read_csv('data_samples/FabricaForno_Davis_WS.txt', skiprows=2, sep = '\t', header = None)

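# recreating the header: read the two header rows and join them column by column
# (rstrip('\n') keeps the line break out of the last column name)
with open('data_samples/FabricaForno_Davis_WS.txt') as f:
    hd1, hd2 = [f.readline().rstrip('\n').split('\t') for _ in range(2)]
hdd = [(i+' '+j).strip() for i, j in zip(hd1, hd2)]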

davis.columns = hdd
davis

Now we want to create a datetime column using the first two columns, Date and Time. To do this we will join both strings with a space and pass the result to `pd.to_datetime()`:

In [None]:
davis['datetime'] = pd.to_datetime(davis.Date + ' ' + davis.Time)
# drop() returns a new dataframe; davis keeps Date and Time unless we assign the result back
davis.drop(['Date', 'Time'], axis = 1)

## 3. Plotting time series  
(If you are not following the class, go to the slides related to this week)

As said in the theory, there are different kinds of plots. Each one can represent one, two, or more variables, and each is useful for a certain purpose.

For most of our plotting we will use the library `matplotlib`.

In [None]:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates #this is gonna be useful later

### 3.1. Distributions (histograms)

A distribution shows how the data of one variable is spread, that is, how many of its values lie in a certain range. Please do not confuse histograms with bar plots, which we can cover if we have the time.

A distribution is best seen with a **histogram**, and to create a simple one we can call `plt.hist()` and pass our series, in this case the Ultraviolet Index Value (IUV) column of `df1`.

In [None]:
plt.hist(df1.IUV)

Note a few things: on the x-axis our data is distributed in 10 bars (called bins), and on the y-axis we have the number of values that lie in each range.

For the sake of the experiment we can change the number of bins, also we can set our x and y labels using the following commands:

In [None]:
plt.hist(df1.IUV, bins = 20)
plt.xlabel('IUV')
plt.ylabel('Frequency')
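
If we want the numbers behind the bars, `plt.hist()` also returns them. The sketch below uses made-up random values (not the IUV column) generated with `numpy`, just to show what comes back:

```python
import numpy as np
import matplotlib.pyplot as plt

# hypothetical data: 500 random values, only for illustration
rng = np.random.default_rng(0)
values = rng.normal(loc=5, scale=2, size=500)

# hist returns the count per bin, the bin edges and the drawn bars
counts, edges, bars = plt.hist(values, bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')

print(len(counts), len(edges))   # 20 bins are delimited by 21 edges
print(counts.sum())              # every value falls in some bin, so this is 500
```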

Histograms can be as complicated as one wants; you are invited to read the [official documentation](https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.hist.html).

### 3.2. Plotting time series: Scatter vs Line Plot

So, here's the dilemma: when is it better to show a line that crosses the points? When do we only need the points? Or when do we need only a line?

The answer is... yes.

It's hard, since it depends on what you want to show. A correlation between two variables? A scatter! The fitted line? A line plot!

Let's put an example on the table: I'll show a scatter and a line plot for IUV.

Both of our options take two important parameters, x and y, and they **have** to have the same length.

In [None]:
plt.plot(df1.TIME, df1.IUV)

Something is... weird. To configure the way the date is shown we can use the following commands:

In [None]:
plt.plot(df1.TIME, df1.IUV)
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))

In [None]:
plt.scatter(df1.TIME, df1.IUV)
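
And if we want both at once (a line that crosses the points), `plt.plot()` accepts a `marker` argument. This is only a sketch with a made-up time series, built with `pd.date_range` and a sine curve, since the shape of the real IUV data may differ:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# hypothetical series: one value every 30 minutes from 06:00 to 18:00
t = pd.date_range('2024-06-18 06:00', periods=25, freq='30min')
y = np.sin(np.linspace(0, np.pi, len(t)))   # made-up daytime "bump"

# x and y must have the same length
print(len(t), len(y))

plt.plot(t, y, marker='o')                  # line and points together
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
plt.xlabel('Time')
plt.ylabel('Made-up value')
```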