<a href="https://colab.research.google.com/github/ASUcicilab/GIS322/blob/main/notebook/Module_1_(3)_Module_1_(3)_Review_of_Numpy_for_Basic_Usage_and_File_Read.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Review of numpy for basic usage and file read

One of the "classical" problems in GIS arises when you have a set of coordinates in a file and need to get them onto a map (or into GIS software). Python is a handy tool for this problem because it can read data from almost any kind of input file (such as csv, txt, Excel, or gpx files (GPS data)) as well as from different databases.

Tasks:
1. Read the sample dataset travelTimes_example_2019.txt using NumPy. This file contains travel times for a sample of trips.

The first four rows of our data look like this:
   from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;total_route_time;route_time;route_distance;route_total_lines
   5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;24.8560344;60.399940599999994;125.0;99.0;22917.6;2.0
   5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;24.8605682;60.4000135;123.0;102.0;23123.5;2.0
   5861326;5785642;5861326_5785642;1;08:10;24.9704379;60.3119173;24.865102;60.4000863;125.0;103.0;23241.3;2.0

The important columns are:

**from_x**:	    x-coordinate of the origin location (longitude)

**from_y**:	    y-coordinate of the origin location (latitude)

**to_x**:	      x-coordinate of the destination location (longitude)

**to_y**:	y-coordinate of the destination location (latitude)

**total_route_time**:	Travel time by public transportation along the route




## A review of numpy

reference: https://docs.scipy.org/doc/numpy/user/quickstart.html

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are called axes.

For example, the array holding the coordinates of a point in 3D space, [1, 2, 1], has one axis. That axis has 3 elements in it, so we say it has a length of 3.
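
As a small sketch of the idea of axes, the cell below contrasts the one-axis point with a two-axis array. The second array, `points`, is a made-up example that is not part of the exercise data.

```python
import numpy as np

# A single point in 3D space: one axis of length 3
point = np.array([1, 2, 1])
print(point.ndim, point.shape)    # 1 (3,)

# Two hypothetical points stacked as rows: two axes,
# the first of length 2 (rows) and the second of length 3 (columns)
points = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 2.0]])
print(points.ndim, points.shape)  # 2 (2, 3)
```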

#### Let's take a look at a few examples of numpy

In [None]:
import numpy as np
# create a numpy array
a = np.array([7, 8, 9])
print(a)

[7 8 9]


In [None]:
# type of numpy array
type(a)

numpy.ndarray

In [None]:
# the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension.
# for a matrix with n rows and m columns, shape will be (n,m).
a.shape

(3,)

In [None]:
# the number of axes (dimensions) of the array
a.ndim

1

In [None]:
# the name of the type of the elements in the array
a.dtype.name

'int64'

In [None]:
# the size in bytes of each element of the array
a.itemsize

8

In [None]:
# the total number of elements of the array
a.size

3

#### Basic Operations:

**1. create a numpy array:**
*   **arange(m,n)**: A one dimensional array starting from m, ending in n-1
*   **array()**: Create an array from a list


   
**2. mathematical operations:** Use +, -, *, just as you do for operations of numbers

**3. element-wise product:** A*B

**4. matrix product:** A.dot(B) or A@B

See below examples.

In [None]:
# Create a new numpy array.
a = np.array( [5,20,50,60] )  # create a numpy array from a list
b = np.arange(2, 6)  # arange(m,n): set a range starting from m (2 in this case), ending with n-1 (6-1 in this case)
print(b)

[2 3 4 5]


In [None]:
# element-wise subtraction; the two numpy arrays must have the same shape
c = a-b
print(c)

[ 3 17 46 55]


In [None]:
# take the square of each element in the array
b**2

array([ 4,  9, 16, 25])

In [None]:
# product with a scalar
# take the sin() of each element in the array, then multiply each result by 10
print(np.sin(a))
print(10*np.sin(a))

[-0.95892427  0.91294525 -0.26237485 -0.30481062]
[-9.58924275  9.12945251 -2.62374854 -3.04810621]


In [None]:
# true or false
a < 35 # compare each element in a with 35

array([ True,  True, False, False])

Unlike in many matrix languages, the product operator * operates element-wise on NumPy arrays. The matrix product can be computed with the @ operator (in Python >= 3.5) or with the dot function or method.

#### Element-wise Product

![alt text](http://www.public.asu.edu/~wenwenl1/gis322o/images/dot_product.png)



In [None]:
A = np.array( [[1,0],
               [1,2]] )
B = np.array( [[3,0],
               [2,4]] )

In [None]:
# elementwise product
A * B

array([[3, 0],
       [2, 8]])

#### Matrix product:


![alt text](http://www.public.asu.edu/~wenwenl1/gis322o/images/matrix_product.png)

In [None]:
# matrix product
A @ B

array([[3, 0],
       [7, 8]])

In [None]:
# matrix product, equivalent to A@B
A.dot(B)

array([[3, 0],
       [7, 8]])

**Create new matrix by defining its dimensions**

1. **np.ones((m,n))**: create a matrix of ones with m rows and n columns

2. **np.random.random((m,n))**: create a matrix with m rows and n columns filled with random values in the half-open interval [0, 1)



In [None]:
a = np.ones((2,3), dtype=int) # create a matrix of ones with 2 rows and 3 columns
b = np.random.random((2,3)) # create a matrix with 2 rows and 3 columns of random values in [0, 1)

print(a)
print(b)

[[1 1 1]
 [1 1 1]]
[[0.76602104 0.03129752 0.22519871]
 [0.91153471 0.6780027  0.01916153]]


 **Some operations, such as += and *=, act in place to modify an existing array rather than create a new one.**


In [None]:
a *= 3
a

array([[3, 3, 3],
       [3, 3, 3]])

In [None]:
b += a
b

array([[3.76602104, 3.03129752, 3.22519871],
       [3.91153471, 3.6780027 , 3.01916153]])
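
One consequence of in-place operators is worth a short sketch: because the result is written back into the existing array, it must fit that array's element type. The cell below recreates `a` (integers) and `b` (floats) as above and assumes a reasonably recent NumPy, where the failing case raises a `TypeError`.

```python
import numpy as np

a = np.ones((2, 3), dtype=int)   # integer array
b = np.random.random((2, 3))     # float array

# Works: the integers are converted to float and the result is stored in b
b += a

# Fails: the float result cannot be stored in the integer array a
try:
    a += b
except TypeError as err:
    print("in-place add failed:", err)

print(a.dtype.name, b.dtype.name)
```

If a float result is wanted, `c = a + b` creates a new float array instead of modifying `a`.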

**Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.**

**sum()**: get the summation of all elements in the matrix

**max()**: get the maximum value within the matrix

**min()**: get the minimum value within the matrix

In [None]:
a = np.random.random((4,8))
a

array([[0.77664715, 0.75842532, 0.85372134, 0.24085538, 0.34550379,
        0.98421482, 0.62625349, 0.1828257 ],
       [0.58968397, 0.84653846, 0.65814266, 0.52920124, 0.97476592,
        0.08348534, 0.6262228 , 0.04749273],
       [0.55461608, 0.50874813, 0.21853266, 0.7688291 , 0.71368085,
        0.04922459, 0.70320796, 0.95678416],
       [0.66780285, 0.09672928, 0.4422787 , 0.74093819, 0.82692854,
        0.44804998, 0.38576318, 0.28894064]])

In [None]:
a.sum()

17.495034978019074

In [None]:
a.min()

0.04749272773712887

In [None]:
a.max()

0.9842148240632305

**By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array:**

In [None]:
b = np.arange(12).reshape(3,4) # reshape the one-dimensional array into a matrix with three rows and four columns
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [None]:
# sum of each column
b.sum(axis=0)

array([12, 15, 18, 21])

In [None]:
# min of each row
b.min(axis=1)

array([0, 4, 8])

In [None]:
# cumulative sum along each row
b.cumsum(axis=1)

array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]])

## 2. read data from a file

Often you will need to read a file containing numerical data in Python. One option for importing such a file is Python’s NumPy library.

There are a number of advantages to using NumPy. It is designed to deal with numerical data, it is fast, and it has many built-in functions that let us import and analyze data easily. Let us see how to use NumPy to read a numerical data file.

### Load NumPy library

In [None]:
# import necessary library
import numpy as np    # import numpy library as np

# numerical data file
filename="travelTimes_example_2019.txt"

### Load a csv file with NumPy

NumPy’s loadtxt() function reads a numerical data file in text format into Python. To load a CSV (Comma Separated Values) file, we set the delimiter to “,”.
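
A minimal sketch of this call is shown below. To keep the cell self-contained, it reads from an in-memory `io.StringIO` object instead of a file on disk; the comma-separated values are made up for illustration.

```python
import io
import numpy as np

# Hypothetical comma-separated content standing in for a file on disk
csv_text = io.StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")

# loadtxt accepts a file name or a file-like object
values = np.loadtxt(csv_text, delimiter=",")
print(values.shape)   # (2, 3)
print(values)
```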

### Load a csv file with NumPy and skip a row

NumPy’s loadtxt() function offers numerous options for loading data. For example, if the file has header information in its first line and we want to ignore it, we can use the “skiprows” option.

By specifying “skiprows=1” while loading the numerical data, we skip the first line of the file. A larger “skiprows” value skips that many lines from the top of the file.

In [None]:
# use skiprows to skip rows
# data = np.loadtxt(filename, delimiter=",", skiprows=1)

### Load a csv file with NumPy, skip a row and select columns

If we want to load just a few columns of the data file, we can use the “usecols” option and specify the indices of the columns to load. Column indices start at 0, so to load the 1st, 2nd, and 5th columns we use:

In [None]:
# usecols select columns
# data = np.loadtxt(filename, delimiter=",", skiprows=1,
#                   usecols=[0,1,4])
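
The commented-out calls above can be tried on a small in-memory example. This is only a sketch: the header names and values in `csv_text` are invented, and `io.StringIO` stands in for a real file.

```python
import io
import numpy as np

# Hypothetical CSV content: one header line followed by two data rows
csv_text = (
    "id,x,y,code,time\n"
    "1,24.97,60.31,7,125.0\n"
    "2,24.86,60.40,7,123.0\n"
)

# skiprows=1 ignores the header line;
# usecols=[0, 1, 4] keeps the 1st, 2nd, and 5th columns (zero-based indices)
data = np.loadtxt(io.StringIO(csv_text), delimiter=",",
                  skiprows=1, usecols=[0, 1, 4])
print(data.shape)   # (2, 3)
print(data)
```

Without `skiprows=1`, loadtxt would try to convert the header text to numbers and raise an error.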

### Now let's try to read the text file provided for exercise

In [None]:
# load the data with the NumPy function "loadtxt"
# Before running this statement, you need to upload the file "travelTimes_example_2019.txt" to your Google Colab
# See tutorial in Module 0
data = np.loadtxt("travelTimes_example_2019.txt", delimiter=';',skiprows=1, usecols=[5,6,7,8,9])
data    # check the input

array([[ 33.,  11.,  44.,  38., 125.],
       [ 71.,  24.,  17.,  49., 123.],
       [ 52.,  52.,  69.,  48., 125.],
       [ 68.,  48.,  67.,  34., 129.],
       [ 11.,  76.,  43.,  24., 118.],
       [ 28.,  36.,  84.,  53., 119.],
       [ 49.,  19.,  30.,  48., 123.],
       [ 47.,  56.,  20.,  28., 129.],
       [ 66.,  78.,  71.,  43., 125.],
       [ 64.,  49.,  32.,  89., 129.],
       [ 51.,  23.,  61.,  16., 173.],
       [ 28.,  37.,  82.,  67.,  86.],
       [ 90.,  41.,  65.,  58.,  90.],
       [ 37.,  84.,  50.,  48.,  92.],
       [ 20.,  35.,  46.,  58., 122.],
       [ 41.,  77.,  57.,  49., 122.],
       [ 31.,  36.,  16.,  57., 122.],
       [ 40.,  80.,  14.,  45., 123.],
       [ 40.,  57.,  22.,  21., 124.],
       [ 25.,  17.,  76.,  22., 126.],
       [ 85.,  31.,  64.,  22., 171.],
       [ 56.,  70.,  14.,  23.,  94.],
       [ 27.,  25.,  55.,  44.,  90.],
       [ 41.,  31.,  76.,  54.,  87.],
       [ 86.,  74.,  71.,  45.,  87.],
       [ 52.,  62.,  50.,

In [None]:
# Use .shape to check the data dimensions
print(data.shape)

(40, 5)
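
Once loaded, each selected column can be pulled out by slicing. The sketch below assumes only the header and the three sample rows shown at the top of this notebook, held in an in-memory `io.StringIO` object so that it runs without the uploaded file; the variable names `from_x` and `total_time` are chosen here for illustration.

```python
import io
import numpy as np

# Header plus the three sample rows from the top of this notebook
sample = io.StringIO(
    "from_id;to_id;fromid_toid;route_number;at;from_x;from_y;to_x;to_y;"
    "total_route_time;route_time;route_distance;route_total_lines\n"
    "5861326;5785640;5861326_5785640;1;08:10;24.9704379;60.3119173;"
    "24.8560344;60.399940599999994;125.0;99.0;22917.6;2.0\n"
    "5861326;5785641;5861326_5785641;1;08:10;24.9704379;60.3119173;"
    "24.8605682;60.4000135;123.0;102.0;23123.5;2.0\n"
    "5861326;5785642;5861326_5785642;1;08:10;24.9704379;60.3119173;"
    "24.865102;60.4000863;125.0;103.0;23241.3;2.0\n"
)

# Columns 5-9: from_x, from_y, to_x, to_y, total_route_time
data = np.loadtxt(sample, delimiter=";", skiprows=1, usecols=[5, 6, 7, 8, 9])
print(data.shape)   # (3, 5)

from_x = data[:, 0]       # first selected column: origin longitude
total_time = data[:, 4]   # last selected column: total travel time

print(from_x)
print(total_time.mean())  # average travel time over the sample rows
```

Note that the column positions inside `data` follow the order given in `usecols`, not the positions in the original file.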
