# Convert CSV data to netCDF

Read the CSV file, generate the grid description file from the longitude and
latitude dimensions of the CSV data, and write the data to a text file. Then use CDO to convert the data to a netCDF file.

- read the ASCII file
- generate the grid description file
- write the netCDF file

Input data: ```data/1901_1.csv```

The data is on a grid where the rows are longitudes and the columns are latitudes.

Output file: ```1901_1.nc```



In [1]:
import numpy as np
from cdo import Cdo

Create a Cdo instance so that the CDO operators can be called as Python methods

In [2]:
cdo = Cdo()

Read the ASCII data

In [3]:
# dtype=float yields a plain 2D float array; empty fields become NaN
ascii_data = np.genfromtxt('data/1901_1.csv', dtype=float, delimiter=',')

Get number of lines and columns

In [4]:
nlines = ascii_data.shape[0]
ncols = ascii_data.shape[1]

Print some information

In [9]:
print('--> data shape        = (%d,%d) ' % ascii_data.shape)
print('--> number of lines   = %d ' % nlines)
print('--> number of columns = %d ' % ncols)

--> data shape        = (720,360) 
--> number of lines   = 720 
--> number of columns = 360 


The data is in the wrong shape: (nlon, nlat) = (720, 360).
The rows and columns must be swapped to get (nlat, nlon) = (360, 720).

In [10]:
data = ascii_data.T
nlat = data.shape[0]
nlon = data.shape[1]
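To see what the transposition does to individual values, here is a minimal sketch on a made-up 3 x 2 array (the names `lon_by_lat` and `lat_by_lon` are hypothetical and not part of the notebook):

```python
import numpy as np

# Made-up stand-in for the CSV: 3 longitudes (rows) x 2 latitudes (columns)
lon_by_lat = np.array([[10, 11],
                       [20, 21],
                       [30, 31]])

# After transposing: 2 latitudes (rows) x 3 longitudes (columns)
lat_by_lon = lon_by_lat.T

print(lon_by_lat.shape, '->', lat_by_lon.shape)

# The value at (lon index 2, lat index 1) moves to (lat index 1, lon index 2)
print(lon_by_lat[2, 1], lat_by_lon[1, 2])
```

Each row of the transposed array now holds all longitudes of one latitude, which is the layout used in the rest of the notebook.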

Print the information about the transposed data

In [13]:
print('\nCorrect shape of data!\n')
print('--> data shape      = (%d,%d) ' % data.shape)
print('--> number of lat   = %d ' % nlat)
print('--> number of lon   = %d ' % nlon)


Correct shape of data!

--> data shape      = (360,720) 
--> number of lat   = 360 
--> number of lon   = 720 


Set variable name

In [14]:
varname = 't'

Set missing value

In [15]:
missing = 1e20

Set time and reference time

In [16]:
reftime = '1900-01-01,00:00:00,1day'
time = '1901-01-01,12:00:00,1day'
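A netCDF time axis is usually stored as an offset from the reference time, so with these settings the single time step would presumably be stored in days since 1900-01-01 00:00:00. A minimal sketch of that arithmetic using only the standard library (the names `ref`, `stamp` and `days_since_ref` are hypothetical):

```python
from datetime import datetime

# Same date and time as in the cell above, without the trailing '1day' unit
ref = datetime.strptime('1900-01-01 00:00:00', '%Y-%m-%d %H:%M:%S')
stamp = datetime.strptime('1901-01-01 12:00:00', '%Y-%m-%d %H:%M:%S')

# Offset of the time stamp from the reference time, expressed in days
days_since_ref = (stamp - ref).total_seconds() / 86400.0
print(days_since_ref)
```

Since 1900 is not a leap year in the Gregorian calendar, the offset is 365 days plus 12 hours.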

Replace NaN values with the missing value

In [17]:
data = np.nan_to_num(data, nan=missing)
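Note that `np.nan_to_num` also replaces infinite values with very large finite numbers. If the input could contain infinities that should be kept, one possible alternative is `np.where` with `np.isnan`. A small sketch on made-up values (the names `sample`, `via_nan_to_num` and `via_where` are hypothetical):

```python
import numpy as np

missing = 1e20
sample = np.array([1.5, np.nan, np.inf])  # made-up values, not from the CSV

# Replaces NaN with the missing value, and inf with the largest finite float
via_nan_to_num = np.nan_to_num(sample, nan=missing)

# Replaces only NaN; inf is left untouched
via_where = np.where(np.isnan(sample), missing, sample)

print(via_nan_to_num)
print(via_where)
```

For input without infinite values both variants give the same result.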

Write data array to file data/var.txt

In [18]:
np.savetxt('data/var.txt', data, delimiter=', ', fmt='%1.2e') 

Write grid description file.

In [19]:
# Use a context manager so the file is opened once and closed reliably
with open('data/gridfile_ascii.txt', 'w') as f:
    f.write('gridtype  = lonlat\n')
    f.write('gridsize  = ' + str(nlon*nlat) + '\n')
    f.write('xsize     = ' + str(nlon) + '\n')
    f.write('ysize     = ' + str(nlat) + '\n')
    f.write('xname     = lon\n')
    f.write('xlongname = longitude\n')
    f.write('xunits    = degrees_east\n')
    f.write('xfirst    = -179.75\n')
    f.write('xinc      = 0.5\n')
    f.write('yname     = lat\n')
    f.write('ylongname = latitude\n')
    f.write('yunits    = degrees_north\n')
    f.write('yfirst    = -89.75\n')
    f.write('yinc      = 0.5\n')
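The values for xfirst, xinc, yfirst and yinc above are hard-coded for a 0.5 degree grid. As a sketch, they could instead be derived from the array dimensions. The helper `global_grid_description` below is hypothetical and assumes a global, equally spaced, cell-centred grid with latitudes ordered from south to north:

```python
def global_grid_description(nlon, nlat):
    """Build a CDO lonlat grid description as text.

    Hypothetical helper: assumes the grid spans -180..180 and -90..90
    with equal spacing and cell-centred coordinates.
    """
    xinc = 360.0 / nlon
    yinc = 180.0 / nlat
    entries = [
        ('gridtype', 'lonlat'),
        ('gridsize', nlon * nlat),
        ('xsize', nlon),
        ('ysize', nlat),
        ('xname', 'lon'),
        ('xlongname', 'longitude'),
        ('xunits', 'degrees_east'),
        ('xfirst', -180.0 + xinc / 2),  # centre of the westernmost cell
        ('xinc', xinc),
        ('yname', 'lat'),
        ('ylongname', 'latitude'),
        ('yunits', 'degrees_north'),
        ('yfirst', -90.0 + yinc / 2),   # centre of the southernmost cell
        ('yinc', yinc),
    ]
    # Pad the keys so the '=' signs line up as in the hand-written file
    return ''.join('%-9s = %s\n' % (key, value) for key, value in entries)


grid_text = global_grid_description(720, 360)
print(grid_text)
```

For nlon = 720 and nlat = 360 this gives the same first coordinates and increments as the hand-written file; the text could then be written to data/gridfile_ascii.txt.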

CDO command:
- read the ASCII data
- set variable name
- set the calendar, time and reference time
- set the missing value
- convert to netCDF file format

In [20]:
# Build the operator chain from the variables defined above
cdo.settaxis(
        time,
        input='-setreftime,' + reftime + ' ' +
              '-setcalendar,standard ' +
              '-setmissval,' + str(missing) + ' ' +
              '-setname,' + varname + ' ' +
              '-input,data/gridfile_ascii.txt < data/var.txt',
        output='tmp.nc',
        options='-f nc')

'tmp.nc'

CDO command:
- add variable attributes long_name and units
- add global attribute source

In [21]:
cdo.setattribute(
        varname+'@long_name="monthly mean temperature",'+\
        varname+'@units="deg C",'+ \
        'source="CRU"',
        input='tmp.nc', 
        output='1901_1.nc')

'1901_1.nc'