### This is a title

In [None]:
import netCDF4
import numpy as np
import pandas as pd
import os

In [None]:
# Let's load the ".nc" file using netCDF4 library and
# store a pointer to it in the "data" variable
# we assume that the file you want to load is in the same folder
# as this .ipynb file

data = netCDF4.Dataset('2011.daily_rai.nc', 'r')

> If I wanted to, I could store ALL the arrays for the different keys in the 'variables' dictionary as shown below (uncomment the lines if you want to play with that). We won't do that now, since we are trying to understand what **shape** the data has.

In [None]:
#lat = data.variables['lat'][:]
#lon = data.variables['lon'][:]
#time = data.variables['time'][:]

Let's look at the **shape** of the data we have at hand, beginning with the different variables available to us.

In [None]:
# What are my available dimensions? (the ones that describe the general 'shape' of the data)
data.dimensions

In [None]:
# What are my variables? (the ones actually containing the data)
data.variables

If we want to see which **methods** (functions that *do something*) and **attributes** (as the name indicates, these are mere *properties*: they return a value rather than *performing an action on the data*) are available on one of the variables, in this case "lat" (latitude), we can use the built-in **dir** function.

> **NOTE**: "dir" is built into Python; it has nothing to do with netCDF4 or any other imported library. You can use "dir" to inspect any kind of Python object.
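
To see what "dir" returns without needing the dataset, here is a minimal sketch that inspects a plain Python list. The names `sample` and `public_names` are made up for this illustration.

```python
# Inspect a plain Python list with the built-in dir() function.
sample = [3, 1, 2]

# dir() returns a list of attribute and method names as strings.
all_names = dir(sample)

# Names wrapped in double underscores are internal; hide them for readability.
public_names = [name for name in all_names if not name.startswith('__')]
print(public_names)

# callable() tells methods apart from plain attributes.
print(callable(getattr(sample, 'append')))
```

The same filtering can be applied to `dir(data.variables['lat'])` to get a shorter, easier-to-read listing.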

What description does *https://silo.longpaddock.qld.gov.au/data-products* give us about the data?

```
Gridded daily climate surfaces which have been derived either by splining or kriging the observational data:

The grid spans 112°E to 154°E, 10°S to 44°S with resolution 0.05° latitude by 0.05° longitude (approximately 5 km × 5 km).
```
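
The grid description already tells us how many points to expect along each axis. A small worked calculation, assuming both end points of each range belong to the grid:

```python
# Grid limits and resolution taken from the SILO description above.
lon_min, lon_max = 112, 154   # degrees East
lat_min, lat_max = 10, 44     # degrees South
step = 0.05                   # grid resolution in degrees

# Both end points are part of the grid, hence the "+ 1".
# round() guards against floating-point noise in the division.
n_lon = round((lon_max - lon_min) / step) + 1
n_lat = round((lat_max - lat_min) / step) + 1

print(n_lon, n_lat)
```

These counts should match the sizes of the `lon` and `lat` dimensions reported by the file.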

In [None]:
dir(data.variables['lat'])

## Let's explore the contents of the "lat" key inside the "variables" dictionary

In [None]:
data.variables['lat']

In [None]:
# first value
data.variables['lat'][0]

# last value
data.variables['lat'][680]

# Total values?
len(data.variables['lat'])

# Ok, let's get into the juicy stuff

If we recall from our previous "data shape" exploration, we saw this for the "max temp" variable: 

```
('max_temp', <class 'netCDF4._netCDF4.Variable'>
              int16 max_temp(time, lat, lon)
                  _FillValue: -32768
                  add_offset: 0.0
                  long_name: Maximum temperature
                  units: Celsius
                  scale_factor: 0.1
              unlimited dimensions: time
              current shape = (365, 681, 841)
              filling on)
```

In *pythonic terms* this is actually an instance of a **class** (*<class 'netCDF4._netCDF4.Variable'>*) but we don't have to worry about that at the moment. Suffice it to say that this is an object called **max_temp** that uses the previous variables (*time, lat, lon*) to pinpoint a particular value out of the many. The object also shows us its "current shape" (structure) and tells us that we need to pass a **time**, a **lat** and a **lon** index (in that order).
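
The index order can be tried out on a tiny NumPy array that stands in for the real variable. This is only a sketch: `toy` is a made-up array with 2 days, 3 latitudes and 4 longitudes, not the real data.

```python
import numpy as np

# A tiny stand-in for max_temp: 2 days, 3 latitudes, 4 longitudes.
toy = np.arange(24).reshape(2, 3, 4)

# Chained indexing, as used in this notebook: one axis at a time.
chained = toy[1][2][3]

# Equivalent single lookup: time, lat, lon in one pair of brackets.
direct = toy[1, 2, 3]

print(chained, direct)
print(toy.shape)
```

Both forms return the same element. On NumPy arrays the single-bracket form avoids building an intermediate array for every step.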

In [None]:
data.variables['max_temp'][45][234][80]

In [None]:
# Accessing data for day 45 at latitude index 234 (-21.7 degrees) and longitude index 80 (116 degrees)
data.variables['max_temp'][45][234][80]

# How many datapoints are there inside the pair (time=45,lat=234)? 
# (we know it's going to be 841 but we will use the "len" built-in function
# that returns the length of an object in python)
len(data.variables['max_temp'][45][234])


In [None]:
# How to access multiple data
for p in range(len(data.variables['max_temp'][45][234])):
    print(((p*0.05)+112), data.variables['max_temp'][45][234][p])

## Prototype function for obtaining "a" value based on a combination of three variables ("keys" in python terminology): time, lat, lon. 

For our next examples, we want to make our lives easier by capturing the entire contents of the max_temp object in a variable that we can easily manipulate. By doing this, we avoid having to call `data.variables['max_temp']` **every** time we want to access a datapoint inside "max_temp". We can just call `maxtemp` (our newly created variable).

In [None]:
maxtemp = data.variables['max_temp'][:]

Ok, now we need to create our "mother" function (template function) that will help us derive multiple different functions by adapting it. 
Because we need to extract data for the *Tasmania* region only, we first grabbed the 8 (lat,lon) pairs that make up the rectangle where Tasmania fits in. However, the data that we will be accessing **doesn't understand lat and lon**; it only understands **integer indices**. Our lat dimension had 681 elements, which is the result of dividing the range of *lat* (34 degrees) by the resolution (0.05) and adding one for the end point; the same logic gives 841 elements for *lon*. In other words, each *degree* is divided into 20 steps. We need to capture that range, but *only for Tasmania*. That's why we will use the *arange* function provided to us by **numpy** to create a range of (lat,lon) degrees that span the Tasmania region only (we will also capture the "time"): 

```python
lat_range = np.arange(39,44,0.05)
lon_range = np.arange(143,150,0.05)
t = np.arange(0,365,1)
```

After doing this, we need to define the function with **def**. The function should accept three parameters (time, lat, lon)

> NOTE: the name you give to these parameters doesn't matter, they don't necessarily *need to match* with those found in your dataset. We could have named them (time, latit, longit) and the function would still work. These values are only relevant to the "function" itself.

```python
def get_cdf_values(t,lat,lon):
```

Now we need to calculate the actual *slice* (index) within the 841 slices of *lon* and 681 slices of *lat* that a given **real latitude or longitude value** corresponds to. For that we will create two new variables:

```python
detailed_lat = int((lat - 10)/ 0.05)
detailed_lon = int((lon - 112)/ 0.05)
```
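
One caveat with this conversion: `int()` truncates towards zero, and dividing by 0.05 in floating point can land a hair below the intended whole number. The sketch below rounds to the nearest index instead. `to_index` is a hypothetical helper name, and the origins (10 and 112) are the ones used above.

```python
def to_index(value, origin, step=0.05):
    """Convert a coordinate in degrees to a grid index (hypothetical helper)."""
    # round() picks the nearest grid cell; int() alone would truncate,
    # so a quotient such as 580.9999999 would silently become 580.
    return int(round((value - origin) / step))

print(to_index(39.05, 10))    # latitude index
print(to_index(145.10, 112))  # longitude index
```

Whether the truncation actually bites depends on the specific coordinate, so rounding is the safer default.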

Finally, the function prints the latitude/longitude values we passed to it (captured in the "lat" and "lon" parameters of the function) together with the actual datapoint for "max_temp". To get to the "max_temp" value we need to index into the original dataset (remember we captured the whole range of values inside the **maxtemp** variable) and pass in the *time, lat, lon* indices. This is what we do when we write `maxtemp[t][detailed_lat][detailed_lon]`.

The complete function follows:

In [None]:
lat_range = np.arange(39,44,0.05)
lon_range = np.arange(143,150,0.05)
t = np.arange(0,365,1)

def get_cdf_values(t,lat,lon):
    detailed_lat = int((lat - 10)/ 0.05)
    detailed_lon = int((lon - 112)/ 0.05)
    print('time:', t, 'lat:', lat, 'lon:', lon, 'value:', maxtemp[t][detailed_lat][detailed_lon])
    

In [None]:
# If you want to get ALL values (uncomment the following lines) 
'''
for day in t:
    for lat in lat_range:
        for lon in lon_range:
            get_cdf_values(day,lat,lon)
'''

### Let's make a function that will accept two spatial variables (lat, long) and will return each value for all days (time datapoints)

In [None]:
lat_range = np.arange(39,44,0.05)
lon_range = np.arange(143,150,0.05)
t = np.arange(0,365,1)
jday = np.arange(1,366,1)

def get_cdf_values_bytime(lat,lon):
    detailed_lat = int((lat - 10) / 0.05)
    detailed_lon = int((lon - 112) / 0.05)
    #print(detailed_lat)
    #print(detailed_lon)
    values = []
    
    for day in t:
        val = maxtemp[day][detailed_lat][detailed_lon]
        val = round(val,1)
        values.append(val)
        
        #print('jday', jday, 'value', val)
    
    df = pd.DataFrame(values, columns=['Tmax'], index=jday)
    return df
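
Since `maxtemp` is a NumPy array, the day-by-day loop could probably be replaced by a single slice along the time axis. A minimal sketch on synthetic data follows. The array `toy_maxtemp` and the two indices are made up for illustration and are not taken from the real file.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for maxtemp: 365 days on a tiny 4 x 5 grid.
rng = np.random.default_rng(0)
toy_maxtemp = rng.uniform(5, 35, size=(365, 4, 5))

lat_idx, lon_idx = 2, 3          # grid indices, already converted from degrees
jday = np.arange(1, 366)         # day-of-year labels, 1..365

# The ":" selects every day at once, replacing the explicit loop.
series = np.round(toy_maxtemp[:, lat_idx, lon_idx], 1)

df = pd.DataFrame({'Tmax': series}, index=jday)
print(df.head())
```

The result has the same layout as the DataFrame returned by `get_cdf_values_bytime`: one `Tmax` column indexed by Julian day.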


In [None]:
int((116 - 112) / 0.05)
#int((80 * 0.05) + 112)

In [None]:
df2 = get_cdf_values_bytime(39.05,145.10)
df2.plot()

In [None]:
df2.to_csv('3905-14510.csv')

In [None]:
from collections import OrderedDict

real_lat_range = np.arange(39,44,0.05)
real_lon_range = np.arange(143,150,0.05)
slice_lon_range = [int((x - 112)/ 0.05) for x in real_lon_range]

t = np.arange(0,365,1)
jday = np.arange(1,366,1)

def get_values_FuckYeah_export_csv(t,lat):
    detailed_lat = int((lat - 10)/ 0.05)
    
    data_values = []
    lon_values = []
    
    for lon_slice in slice_lon_range:
        # let's retrieve the specific data value first
        val = maxtemp[t][detailed_lat][lon_slice]

        if type(val) is not np.ma.core.MaskedConstant:
            val = round(val,1)
            # if the value is NOT "masked" then we have actually a value
            # and we will append it to the data_values list
            data_values.append(val)
            
        else:
            # if the value is "masked" then it's non-existent
            # and we will append a "None" instead of a real value
            data_values.append(None)
        
        # let's now append the "real longitude" value for this loop iteration
        #detailed_lon = int((lon - 112)/ 0.05)
        #real_lon = int((lon_slice * 0.05) + 112)
        #print(real_lon)
        #lon_values.append(real_lon)

    # we need to get the total amount of values collected
    total_values = len(data_values)
    
    # let's now create a numpy array containing the "latitude" value
    # we are pivoting off in this loop iteration. The same value
    # should be repeated as many times as there are "lon" values
    # in slice_lon_range
    lat_values = np.full(total_values, lat)

    # let's do the same as above but for the "date" dimension this time
    time_values = np.full(total_values, t)

    # now we need to fill a PANDAS DataFrame with the lists we've been 
    # compiling
    pandas_dict_of_items = {'Lat': lat_values,
                            'Lon': real_lon_range,
                            'jday': time_values,
                            'Tmax': data_values}
    
    df = pd.DataFrame.from_dict(pandas_dict_of_items)
      
    return df
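
The masked-value check inside the function can be tried on a small array. This sketch assumes the fill value (-32768) and scale factor (0.1) shown in the `max_temp` metadata earlier; the three raw numbers are invented.

```python
import numpy as np

# Three packed values; -32768 plays the role of the _FillValue.
raw = np.array([253, -32768, 187], dtype=np.int16)
temps = np.ma.masked_equal(raw, -32768) * 0.1

cleaned = []
for i in range(len(temps)):
    val = temps[i]
    # Indexing a masked element returns the np.ma.masked singleton.
    if val is np.ma.masked:
        cleaned.append(None)
    else:
        cleaned.append(round(float(val), 1))

print(cleaned)
```

Comparing against `np.ma.masked` with `is` is a common way to detect a missing grid cell (for example, a point over the ocean).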

## Invoking La Bestia Pop

**La Bestia Pop** would create a df of 306965 rows (841 x 365) for each latitude if we used every longitude, **but** we are limiting this to the data within the Tasmanian region. Thus, the total rows per latitude will be **365 x 140**: the total Julian days *by all the points between lon 143 and 150 in 0.05 increments* (`np.arange` excludes the end point, so there are 140 of them). So that we don't end up with a *super-massive-beast* df, we export each of those df to its own csv file (i.e. every 365 x 140 rows). Essentially, you will end up with one csv file **per 0.05 fraction of a latitude degree** or, in other words, **20 files** for each full latitude degree.
The expected output that you will see is similar to this: 

```
Writting CSV file Lat-39.0.csv to C:\Users\PopBeast\Documents\Project-01
Writting CSV file Lat-39.05.csv to C:\Users\PopBeast\Documents\Project-01
Writting CSV file Lat-39.099999999999994.csv to C:\Users\PopBeast\Documents\Project-01
Writting CSV file Lat-39.14999999999999.csv to C:\Users\PopBeast\Documents\Project-01
```

If you go to your current folder, you should find the files there; each will be a 14MB file.
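
The long file names in the sample output come from floating-point noise in the latitude values. If tidier names are preferred, one option is to format the latitude with two decimals. This is a sketch only, and not what the loop in this notebook does.

```python
lat = 39.099999999999994  # a value as produced by np.arange(39, 44, 0.05)

# "{:.2f}" keeps two decimals, hiding the floating-point noise.
csv_name = 'Lat-{:.2f}.csv'.format(lat)
print(csv_name)
```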

In [None]:
# let's iterate through each "latitude" value and, 
# within this loop, let's create another one that will
# iterate through the different "days" in the year

for lat in lat_range:
    # collect one small df per day, then join them in a single step
    # (DataFrame.append was removed in pandas 2.0, so we use pd.concat)
    daily_frames = [get_values_FuckYeah_export_csv(day, lat) for day in t]
    monster_df = pd.concat(daily_frames, ignore_index=True)
    
    # let's build the name of the file based on the value of the 
    # first row for latitude
    csv_name = 'Lat-{}.csv'.format(monster_df['Lat'][0])
    
    # let's get the current directory so we can let the user
    # know where the file is going to get written to
    current_dir = os.getcwd()
    
    # return info to user and write df to csv
    print("Writting CSV file", csv_name, "to", current_dir)
    monster_df.to_csv(csv_name)
    # monster_df is rebuilt from scratch on the next latitude,
    # so there is nothing to reset here

In [None]:
lon_range = np.arange(143,150,0.05)
det_range = [int((x - 112)/ 0.05) for x in lon_range]
[((x * 0.05) + 112) for x in det_range]
len(real_lon_range)
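
When checking the length of these ranges, it helps to remember that `np.arange` leaves out the stop value. If the end point is wanted, `np.linspace` is one alternative. The sketch below compares the two; the variable names are made up.

```python
import numpy as np

# np.arange excludes the stop value, so 150 itself is not in the range.
lon_arange = np.arange(143, 150, 0.05)

# np.linspace takes the number of points instead and includes both ends:
# 7 degrees / 0.05 = 140 intervals, hence 141 points.
lon_linspace = np.linspace(143, 150, 141)

print(len(lon_arange), lon_arange[-1])
print(len(lon_linspace), lon_linspace[-1])
```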

## What is a dictionary in Python?

In [None]:
ejemplodic = {"lat": "45", "lon": "56"}

In [None]:
# What happens if I want to access the "lat" key?
ejemplodic['lat']

In [None]:
# What is a list?
example_list = ['zero', 'one', 'two']
example_list2 = [1,2,3,4,5]
example_list3 = [1,2,3,'four']


In [None]:
example_list[1]
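
As a recap of both structures, here is a small side-by-side sketch. The names `point` and `days` are invented for this example.

```python
# A dictionary maps keys to values; look values up by key.
point = {'lat': 39.05, 'lon': 145.10}
print(point['lat'])
print(list(point.keys()))

# A list keeps items in order; look items up by position, starting at 0.
days = ['zero', 'one', 'two']
print(days[1])
print(len(days))
```

This is the same access pattern we used on `data.variables`, which behaves like a dictionary keyed by variable name.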