# 2. Data Manipulation

## Definitions

* **Data type** refers to the kind or modality of data. Some examples:
    * Structured, or *quantitative* data - floats and integers
    * Unstructured, or *qualitative* or categorical data - strings, audio, images, etc.
* **Data format** refers to the file type used to encode the data.
* **Data frame** refers to a tabular structure within which data is organized as a 2-D array.
* **Data schema** refers to the way in which the data is laid out in a data frame, e.g. the column names and the data type of each column.
* **Data structures** are containers that organize data, such as arrays and tables, together with associated metadata.
* **Dimensionality** refers to the axes along which the data vary (e.g. spatial, temporal, etc.)


## Data formats

Different data types are best stored in different data formats.

For tabular data:
* `CSV` (`.csv`)
* `Parquet`
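
As a minimal sketch of working with a tabular format, the snippet below writes a small data frame to CSV text and reads it back with `pandas`. The station names and values are made up for illustration, and an in-memory buffer stands in for a file on disk. CSV is plain text and carries no type information, so `pandas` infers the column types again on reading.

```python
import io

import pandas as pd

# Hypothetical example table (values are invented for illustration).
df = pd.DataFrame(
    {
        "station": ["A", "B", "C"],
        "elevation_m": [12.5, 340.0, 87.25],
        "n_obs": [10, 20, 30],
    }
)

# Write to CSV text in memory instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back; column types are inferred from the text.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.dtypes)
```

The same calls accept a file path (for example `df.to_csv("table.csv", index=False)`) when the data should be saved to disk.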

For unstructured, non-relational data:
* `JSON`: JavaScript Object Notation, an open standard file format that uses human-readable text. Data are stored as attribute-value pairs and arrays. It is language-independent. The syntax looks like this:

                    {
                        "firstName": "John",
                        "lastName": "Smith",
                        "isAlive": true,
                        "age": 27,
                        "address": {
                            "streetAddress": "21 2nd Street",
                            "city": "New York",
                            "state": "NY",
                            "postalCode": "10021-3100"
                        }
                    }
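
A short sketch of how such a record could be read in Python, using the standard-library `json` module. The record is a shortened version of the example above; the variable names are chosen here for illustration.

```python
import json

# A shortened version of the record above, as JSON text.
text = """
{
    "firstName": "John",
    "age": 27,
    "isAlive": true,
    "address": {"city": "New York", "state": "NY"}
}
"""

# Parse the text into Python objects: JSON objects become dicts,
# numbers become int/float, and true/false become True/False.
record = json.loads(text)
city = record["address"]["city"]

# Serialize back to JSON text.
round_trip = json.dumps(record, indent=2)
print(city)
```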

For pixelized raster data (can be read using `rasterio`):
* `GeoTIFF`: a metadata standard that embeds georeferencing information in a TIFF (Tagged Image File Format) file. The Cloud Optimized GeoTIFF (COG) variant is laid out for efficient access from cloud storage.

For vector geographic data (can be read using `geopandas`):
* `GeoJSON`: a format for encoding a variety of geographic data structures (points, lines, polygons) in the JSON format.
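
Because GeoJSON is plain JSON, a small feature can be built and parsed with the standard-library `json` module alone. The sketch below uses an illustrative point and property name that are not part of these notes. Note that GeoJSON positions are ordered longitude first, then latitude.

```python
import json

# A single point feature; coordinates are [longitude, latitude].
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [-74.0060, 40.7128]},
    "properties": {"name": "New York"},  # illustrative attribute
}
collection = {"type": "FeatureCollection", "features": [feature]}

# Encode to GeoJSON text and decode it again.
encoded = json.dumps(collection)
decoded = json.loads(encoded)
lon, lat = decoded["features"][0]["geometry"]["coordinates"]
print(lon, lat)
```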

For big heterogeneous data with different data types:
* `NetCDF4`
* `HDF5`
* `Zarr`



This notebook didn't get completed beyond this point. 

### Reading GeoTIFF raster data with rasterio

Here is an example of reading a GeoTIFF:

    # Libraries needed
    import io
    import os
    import zipfile

    import folium
    import geopandas as gpd
    import h5py
    import matplotlib.pyplot as plt
    import netCDF4 as nc
    import numpy as np
    import pandas as pd
    # import pycrs
    import rasterio
    import requests
    import wget

    from folium.plugins import MarkerCluster
    from rasterio.mask import mask
    from rasterio.plot import show

Next we want to download the data file, rename it, unzip it if needed, and then place it in the right location to use here. 

    # Download the data with wget.
    fname = 'HYP_50M_SR'
    # The trailing `?dl=1` asks Dropbox to serve the file itself rather than a preview page.
    url = "https://www.dropbox.com/s/j5lxhd8uxrtsxko/" + fname + "?dl=1"
    wget.download(url)

    # The file is saved to the lecture notes folder; move it into a
    # separate folder for data associated with these notes.
    os.makedirs('./data_for_notes', exist_ok=True)  # create the folder if it is missing
    os.replace(fname + '.tif', './data_for_notes/' + fname + '.tif')
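
The "unzip it if needed" step is not shown above. One possible sketch, using only the standard library, is given below. The helper name `extract_if_zip` and the demo file names are hypothetical, and the bytes are built in memory here instead of being downloaded.

```python
import io
import os
import tempfile
import zipfile


def extract_if_zip(payload: bytes, dest_dir: str, fallback_name: str) -> list:
    """Save payload into dest_dir, unpacking it first if it is a ZIP archive.

    Returns the names of the files written (hypothetical helper).
    """
    os.makedirs(dest_dir, exist_ok=True)
    buffer = io.BytesIO(payload)
    if zipfile.is_zipfile(buffer):
        with zipfile.ZipFile(buffer) as archive:
            archive.extractall(dest_dir)
            return sorted(archive.namelist())
    # Not an archive: write the bytes out unchanged.
    with open(os.path.join(dest_dir, fallback_name), "wb") as f:
        f.write(payload)
    return [fallback_name]


# Demo with an in-memory archive instead of a real download.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as z:
    z.writestr("example.tif", b"not a real raster")

out_dir = tempfile.mkdtemp()
zipped_names = extract_if_zip(demo.getvalue(), out_dir, "example.bin")
raw_names = extract_if_zip(b"plain bytes", out_dir, "raw.bin")
print(zipped_names, raw_names)
```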

Now the `.tif` file is in the data folder. Next we want to read the digital elevation map from it using `rasterio`. Some basic commands to use here:

    # Open the GeoTIFF; this returns a rasterio dataset object.
    elevation = rasterio.open("./data_for_notes/" + fname + ".tif")
