## <font color=#00cc00>Processing data with NumPy</font>

In [1]:
import numpy as np

In [2]:
# path to the data source
fp = r"C:\Users\Gustavocolmenares\Documents\SchoolStaff\GIS_PCC\Courses_Training\Geo_python_Course\Exercise-5\Pandas\Kumpula-June-2016-w-metadata.txt"
data = np.genfromtxt(fp, skip_header=9, delimiter=',')

In [3]:
# Assign a name to each column of the array read from the text file
date = data[:, 0]
temp = data[:, 1]
temp_max = data[:, 2]
temp_min = data[:, 3]
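
The loading and column-slicing steps can be tried without the original file. The sketch below is only an illustration: it reads a small in-memory text (a hypothetical stand-in with made-up header lines and three sample rows, not the real Kumpula file) and uses `skip_header` the same way as above.

```python
import io
import numpy as np

# Hypothetical stand-in for the data file: two metadata lines,
# one column-name row, then three comma-separated data rows.
sample_file = io.StringIO(
    "Station: example\n"
    "Units: Fahrenheit\n"
    "YEARMODA,TEMP,MAX,MIN\n"
    "20160601,65.5,73.6,54.7\n"
    "20160602,65.8,80.8,55.0\n"
    "20160603,68.4,77.9,55.6\n"
)

# skip_header=3 drops the two metadata lines and the column-name row
sample_data = np.genfromtxt(sample_file, skip_header=3, delimiter=",")

# Each column becomes its own 1-D array
sample_date = sample_data[:, 0]
sample_temp = sample_data[:, 1]

print(sample_data.shape)
print(sample_date)
```

Note that `genfromtxt` returns floating-point values by default, which is why the dates print as numbers such as `20160601.` rather than as integers.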

## <font color=#ffbf00>Calculating with NumPy arrays</font>

In [4]:
# Create a new array by calculating with other arrays (columns)
# Way 1:
# Create an array of zeros with the same length as the existing arrays and use it as blank space for the calculation
diff = np.zeros(len(date))  # array of zeros from np.zeros(), same length as the date array

In [5]:
print(diff)

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0.]


## <font color=#ffbf00>Calculating values using other arrays</font>

In [6]:
# Calculate the difference between temp_max and temp_min and store it in 'diff'
diff = temp_max - temp_min
print(diff)

[18.9 25.8 22.3 23.6 15.1 16.9 19.2 12.9  8.4 12.9 20.4 18.2 20.9 20.
 21.  11.9 14.8  8.8  5.1 16.9 21.  14.8 12.2 12.2 17.5 17.4 12.4 17.2
 13.5 13.5]


In [7]:
# Arithmetic between arrays creates a new array automatically, so no zeros array is needed first
diff_min = temp - temp_min
print(diff_min)

[10.8 10.8 12.8 10.2  8.2  9.4 11.   6.7  3.7  6.5 12.3  9.4 11.  11.9
 14.1  2.2  4.5  3.3  2.2  7.1 12.2  6.3  6.   4.4  7.8  9.3  3.1  9.6
  6.1  6.5]


In [8]:
# We can confirm this by checking the type of the diff_min array
type(diff_min)

numpy.ndarray
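
To make the two approaches concrete, here is a minimal sketch comparing a pre-allocated array filled in a loop with direct array arithmetic. The arrays `highs` and `lows` are hypothetical sample values, not the course data.

```python
import numpy as np

# Hypothetical sample values (degrees Fahrenheit)
highs = np.array([73.6, 80.8, 77.9])
lows = np.array([54.7, 55.0, 55.6])

# Way 1: pre-allocate an array of zeros, then fill it element by element
spread_loop = np.zeros(len(highs))
for i in range(len(highs)):
    spread_loop[i] = highs[i] - lows[i]

# Way 2: subtract the arrays directly; NumPy works element-wise
spread_vec = highs - lows

print(type(spread_vec))
print(np.allclose(spread_loop, spread_vec))
```

Both approaches give the same values; the direct form is shorter and avoids the explicit loop.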

In [9]:
# Let's convert the temperatures from Fahrenheit to Celsius and store them in the temp_Celsius array
temp_Celsius = (temp - 32) / (9 / 5)
temp_Celsius

array([18.61111111, 18.77777778, 20.22222222, 14.16666667, 10.77777778,
       11.22222222, 13.83333333, 12.33333333,  9.66666667,  9.72222222,
       12.22222222, 13.        , 14.61111111, 15.38888889, 17.44444444,
       14.33333333, 15.77777778, 14.05555556, 13.5       , 15.16666667,
       17.        , 16.5       , 16.05555556, 16.16666667, 18.72222222,
       20.88888889, 15.94444444, 18.55555556, 18.77777778, 18.72222222])
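
A quick way to sanity-check the conversion formula is to apply it to temperatures whose Celsius equivalents are well known. The helper name `fahrenheit_to_celsius` below is introduced only for this illustration.

```python
import numpy as np

def fahrenheit_to_celsius(temp_f):
    """Convert Fahrenheit to Celsius; works on scalars and NumPy arrays."""
    return (temp_f - 32) / (9 / 5)

# Reference points: freezing point of water, boiling point of water,
# and -40, where both scales coincide
reference_f = np.array([32.0, 212.0, -40.0])
reference_c = fahrenheit_to_celsius(reference_f)

print(reference_c)
print(np.allclose(reference_c, [0.0, 100.0, -40.0]))
```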

## <font color=#ffbf00>Filtering Data</font>

In [10]:
# We can look for a subset of the data that matches some criterion.
# For example, create an array called w_temps that contains only 'warm' temperatures above 15 °C
w_temps = temp_Celsius[temp_Celsius > 15.0]
w_temps

array([18.61111111, 18.77777778, 20.22222222, 15.38888889, 17.44444444,
       15.77777778, 15.16666667, 17.        , 16.5       , 16.05555556,
       16.16666667, 18.72222222, 20.88888889, 15.94444444, 18.55555556,
       18.77777778, 18.72222222])

<p><b>Combining multiple criteria at the same time:</b></p>
<ul>
<li> Select temperatures above 15 that were recorded in the second half of June 2016 (date >= 20160615).</li>
<li> Multiple criteria can be combined with the " & " (AND) operator or the " | " (OR) operator.</li>
<li> Enclose each clause in its own parentheses.</li>
</ul>

In [11]:
# temperatures above 15 °C in the second half of June
w_temp2 = temp_Celsius[(temp_Celsius > 15) & (date >= 20160615)]
w_temp2

array([17.44444444, 15.77777778, 15.16666667, 17.        , 16.5       ,
       16.05555556, 16.16666667, 18.72222222, 20.88888889, 15.94444444,
       18.55555556, 18.77777778, 18.72222222])
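
The sketch below illustrates the AND, OR, and NOT operators on Boolean arrays, and what happens when the parentheses are left out. The arrays `temps_c` and `obs_dates` are hypothetical sample values chosen for the illustration.

```python
import numpy as np

# Hypothetical sample values
temps_c = np.array([18.6, 14.2, 9.7, 17.4, 20.9])
obs_dates = np.array([20160601, 20160604, 20160609, 20160615, 20160626])

warm = temps_c > 15           # element-wise comparison gives a Boolean array
late = obs_dates >= 20160615

warm_and_late = temps_c[warm & late]   # both conditions must hold
warm_or_late = temps_c[warm | late]    # at least one condition must hold
not_warm = temps_c[~warm]              # ~ inverts a Boolean array

# Without parentheses, & binds more tightly than > and >=, so Python
# evaluates 15 & obs_dates first and then a chained comparison of arrays,
# which raises a ValueError.
try:
    temps_c[temps_c > 15 & obs_dates >= 20160615]
    parentheses_needed = False
except ValueError:
    parentheses_needed = True

print(warm_and_late)
print(warm_or_late)
print(not_warm)
print(parentheses_needed)
```

Python's `and` and `or` keywords do not work element-wise on arrays, which is why `&` and `|` are used here.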

## <font color=#33cc33>Using Data Masks</font>
<p>We can also identify the dates on which the temperature was above 15 °C, as in the example above, and keep only those dates in our other data columns, such as date, temp, etc.<br>
A mask array is a Boolean (True/False) array used to take a subset of data from another array, rather than extracting w_temps directly.<br>
Let's start by identifying where temp_Celsius is above 15 (True) and where it is not (False).
</p>





In [12]:
w_temps_mask = temp_Celsius > 15.0
print(w_temps_mask)

[ True  True  True False False False False False False False False False
 False  True  True False  True False False  True  True  True  True  True
  True  True  True  True  True  True]


<p>The result is an array of True/False values with the same size as temp_Celsius.<br>
This array shows whether the condition we stated is <i>True</i> or <i>False</i> for each index.
If we want to know the dates when the temperature was above 15 °C, we simply take the values from the date array using the mask we just created.
</p>

In [13]:
w_temp_dates = date[w_temps_mask]
print(w_temp_dates)  # Shows the subset of dates that match the condition of temperature above 15 °C

[20160601. 20160602. 20160603. 20160614. 20160615. 20160617. 20160620.
 20160621. 20160622. 20160623. 20160624. 20160625. 20160626. 20160627.
 20160628. 20160629. 20160630.]


In [14]:
# As a check, let's compare the lengths of w_temps and w_temp_dates.
# If they are the same, both arrays cover the same days on which the temperature exceeded 15 °C
print(len(w_temps))
print(len(w_temp_dates))

17
17
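
The main advantage of a mask is that the same Boolean array can be reused on every column, so the selected rows stay aligned. Here is a minimal sketch of that idea; the arrays `obs_dates` and `temps_c` are hypothetical sample values.

```python
import numpy as np

# Hypothetical sample values
obs_dates = np.array([20160601, 20160604, 20160609, 20160615, 20160626])
temps_c = np.array([18.6, 14.2, 9.7, 17.4, 20.9])

warm_mask = temps_c > 15.0

# Apply the same mask to each column
warm_dates = obs_dates[warm_mask]
warm_temps = temps_c[warm_mask]

# True counts as 1, so summing the mask counts the selected rows
n_warm = int(warm_mask.sum())

for day, value in zip(warm_dates, warm_temps):
    print(day, value)
print(n_warm)
```

Because both subsets come from one mask, element `i` of `warm_dates` always belongs to element `i` of `warm_temps`, which a length comparison alone cannot guarantee.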


## <font color=#33cc33>Removing Missing/Bad Data</font>
<p>
Sometimes data comes with missing values or values that cannot be read. These may be replaced by nan ("not a number"), and you will usually want to get rid of these values.
</p>

In [16]:
# let's create an array called bad_data with the same size as the date array
bad_data = np.zeros(len(date))

In [20]:
# add nan values to the first 5 rows
bad_data[:5] = np.nan
print(bad_data)

[nan nan nan nan nan  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.
  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]


<p>
To keep only the values of a data column that correspond to positions in bad_data
where there are no nan values, use the <b>isfinite()</b> function in NumPy.
isfinite() checks whether a value is defined, that is, neither nan nor infinite.
</p>

In [22]:
# let's create a mask with bad_data
bad_data_mask = np.isfinite(bad_data)
print(bad_data_mask)

[False False False False False  True  True  True  True  True  True  True
  True  True  True  True  True  True  True  True  True  True  True  True
  True  True  True  True  True  True]


In [23]:
# If we want to include only the dates with good data, we can use the mask as we did before
good_dates = date[bad_data_mask]
print(good_dates)

[20160606. 20160607. 20160608. 20160609. 20160610. 20160611. 20160612.
 20160613. 20160614. 20160615. 20160616. 20160617. 20160618. 20160619.
 20160620. 20160621. 20160622. 20160623. 20160624. 20160625. 20160626.
 20160627. 20160628. 20160629. 20160630.]
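
As a small illustration of how `np.isfinite()` differs from `np.isnan()`, the sketch below uses a hypothetical `readings` array that contains both nan and infinite values.

```python
import numpy as np

# Hypothetical readings containing nan and infinite values
readings = np.array([np.nan, 65.5, np.inf, 68.4, -np.inf])

finite_mask = np.isfinite(readings)   # False for nan, inf and -inf
not_nan_mask = ~np.isnan(readings)    # False only for nan

clean_readings = readings[finite_mask]

print(finite_mask)
print(not_nan_mask)
print(clean_readings)
```

If a dataset can contain infinite values as well as nan, `np.isfinite()` removes both, while `~np.isnan()` would keep the infinities.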


## <font color=#432541>Rounding and Finding Unique Values</font>
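<p>We can round the values of an array with the <b>round()</b> method and then find the unique values using the <b>unique()</b> function</p>

In [24]:
# Round temp_Celsius to 0 decimals first, so that temp_celsius_rounded exists before calling unique()
temp_celsius_rounded = temp_Celsius.round(0)
unique = np.unique(temp_celsius_rounded)
print(unique)

[10. 11. 12. 13. 14. 15. 16. 17. 19. 20. 21.]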
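
Rounding and `np.unique()` can be combined to count how often each rounded value occurs. The sketch below is one way to do this, using a hypothetical `temps_c` array and the `return_counts` option of `np.unique()`.

```python
import numpy as np

# Hypothetical sample values in degrees Celsius
temps_c = np.array([18.61, 18.78, 20.22, 14.17, 10.78, 11.22])

# Round to 0 decimals; the result is still a float array
rounded = temps_c.round(0)

# np.unique returns the sorted distinct values; return_counts adds their frequencies
values, counts = np.unique(rounded, return_counts=True)

for value, count in zip(values, counts):
    print(value, count)

# NumPy rounds exact halves to the nearest even number
halves = np.round(np.array([0.5, 1.5, 2.5]))
print(halves)
```

The last lines show that exact halves round to the nearest even value, which can be surprising when a value such as 16.5 appears in the data.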