### Intro to lists

Lists are data structures built into Python, so no package (NumPy included) is needed to use them.

We start with a brief introduction to lists because they are a basic and widely used data structure.

In [1]:
# Let's create a list, L1, with five numbers in it

L1= [8,23,22,0,10]
L1

[8, 23, 22, 0, 10]

In [2]:
type(L1)

list

In [3]:
L2= L1 + ['a','b']
L2

[8, 23, 22, 0, 10, 'a', 'b']

With lists, the "+" operator concatenates: it returns a new list containing the elements of both operands. L1 itself is not modified.
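As a small illustration of the point above, the sketch below contrasts `+`, which builds a new list, with `+=`, which grows the existing list in place. The names `base`, `combined`, and `alias` are made up for this example and are not part of the notebook.

```python
# Illustrative names; not part of the original notebook
base = [8, 23, 22]
combined = base + ['a', 'b']   # builds a brand-new list

print(base)      # [8, 23, 22]  (unchanged)
print(combined)  # [8, 23, 22, 'a', 'b']

alias = base     # a second name for the same list object
base += [0]      # in place: the same list object grows
print(alias)     # [8, 23, 22, 0]
```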

In [4]:
type(L2)

list

In [5]:
# Another way of adding elements to a list is by using the extend() method

L2.extend(['c', 'd'])

In [6]:
L2

[8, 23, 22, 0, 10, 'a', 'b', 'c', 'd']

The append() method adds a single element to the end of a list:

In [7]:
L1.append(5)

In [8]:
L1

[8, 23, 22, 0, 10, 5]

In [9]:
L1.extend(20)

TypeError: 'int' object is not iterable

Use extend() to add several items at once (its argument must be an iterable, such as a list) and append() to add a single item.
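The TypeError above arises because extend() loops over its argument. A minimal sketch, using a hypothetical list called `items`, shows that any iterable is accepted, and that a string is therefore split into characters:

```python
# Hypothetical list used only for this illustration
items = [1, 2]
items.extend((3, 4))        # a tuple works
items.extend(range(5, 7))   # so does a range
items.extend("ab")          # a string is iterated character by character
print(items)  # [1, 2, 3, 4, 5, 6, 'a', 'b']

items.append("cd")          # append() adds the string as one element
print(items)  # [1, 2, 3, 4, 5, 6, 'a', 'b', 'cd']
```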

In [10]:
L1.append(20)

In [11]:
L1

[8, 23, 22, 0, 10, 5, 20]

Does append() work for more than one item? Yes, but the whole list is added as a single, nested element:

In [12]:
L1.append([30, 40])

In [13]:
L1

[8, 23, 22, 0, 10, 5, 20, [30, 40]]

In [14]:
# Remove the last element of L1 (the nested list) and add its items back using extend()

L1.remove([30,40])

In [15]:
L1

[8, 23, 22, 0, 10, 5, 20]

In [16]:
L1.extend([30, 40])

In [17]:
L1

[8, 23, 22, 0, 10, 5, 20, 30, 40]

#### Indexing lists

In [18]:
L2

[8, 23, 22, 0, 10, 'a', 'b', 'c', 'd']

In [19]:
# Retrieve the first element of the L2 list and return its type

L2[0]

8

In [20]:
type(L2[0])

int

In [21]:
# Retrieve the last element of the L2 list and return its type

L2[-1]

'd'

In [22]:
# Retrieve the last element of the L2 list and return its type. Alternative 2

L2[len(L2)-1]

'd'

In [26]:
type(L2[-1])

str

In [27]:
L2

[8, 23, 22, 0, 10, 'a', 'b', 'c', 'd']

In [28]:
# Return the index of an item from a list

# Example: Get the index of the value 23 in L2

L2.index(23)

1

In [29]:
L2.index('d')

8

In [None]:
# Example: Get the index of the value 'a' in L2

In [30]:
L2.index('a')

5
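Two details of index() are worth a quick sketch: it returns the position of the first match only, and it raises a ValueError when the value is absent. The list `values` and the `None` sentinel below are choices made for this example.

```python
# Hypothetical list with a repeated value
values = [8, 23, 22, 23, 'a']
print(values.index(23))   # 1 (the first of the two 23s)

# Check membership first to avoid an exception
target = 'z'
if target in values:
    position = values.index(target)
else:
    position = None       # sentinel chosen for this sketch
print(position)           # None

# Or catch the exception directly
try:
    values.index('z')
except ValueError as err:
    print("not found:", err)
```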

### NumPy arrays

Unlike Python lists, NumPy arrays can only contain data of the same type. If types do not match, NumPy will cast the values to have the same type.

__Note__: Casting is what R calls coercion.
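A short sketch of how this casting plays out, assuming NumPy's default type promotion: booleans become integers, integers become floats, and anything mixed with text becomes a string. The variable names are illustrative.

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])
print(ints_and_floats)         # [1.  2.  3.5]
print(ints_and_floats.dtype)   # float64

bools_and_ints = np.array([True, False, 7])
print(bools_and_ints)          # [1 0 7]

numbers_and_text = np.array([1, 2.5, 'a'])
print(numbers_and_text.dtype.kind)   # U (Unicode string)
```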

In [32]:
# Import the NumPy package

import numpy as np

#### Create an array from a list

In [33]:
# Here we create an array from L1 by using the function np.array()

a1=np.array(L1)
a1

array([ 8, 23, 22,  0, 10,  5, 20, 30, 40])

In [34]:
type(a1)

numpy.ndarray

In [35]:
a2= np.array(L2)
a2

array(['8', '23', '22', '0', '10', 'a', 'b', 'c', 'd'], dtype='<U11')

__dtype__ is the property of an array that tells us the type of data it contains

https://numpy.org/doc/stable/reference/arrays.dtypes.html

In [36]:
type (a2)

numpy.ndarray

In [39]:
# Retrieve the first element of the a2 array and return its type

type(a2[0])

numpy.str_

All elements of a2 were cast to strings: L2 mixes integers and strings, and an array can hold only one type, so NumPy converted the integers to strings.
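If the original Python types must be preserved, one option (not used in this notebook) is to request `dtype=object`. The sketch below uses a shortened, hypothetical list; note that object arrays give up most of NumPy's speed advantages, so this is rarely what you want for numeric work.

```python
import numpy as np

mixed = [8, 23, 'a', 'b']                    # hypothetical mixed list
as_text = np.array(mixed)                    # default: everything becomes a string
as_objects = np.array(mixed, dtype=object)   # keeps the original Python objects

print(type(as_text[0]))      # <class 'numpy.str_'>
print(type(as_objects[0]))   # <class 'int'>
```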

In [45]:
# Another way of creating an array from a list:

arrayx= np.array([5,4,8,10, 1])
arrayx

array([ 5,  4,  8, 10,  1])

#### Create an array using NumPy's array-generating functions

In [41]:
# We can use np.full() to create an array with the same number repeated many times
# Ex, create an array with 1 repeated 10 times

np.full(10, 1)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [42]:
np.full(10, 0)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [43]:
# We can use the NumPy arange() method to generate sequences
# Ex. Let's generate a sequence with the numbers from 1 to 20

np.arange(1,21)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [44]:
# Ex. Let's generate a sequence with the numbers from 1 to 20, in steps of 2

np.arange(1,21,2)

array([ 1,  3,  5,  7,  9, 11, 13, 15, 17, 19])

In [55]:
# Use linspace() to generate evenly spaced numbers over an interval (both endpoints are included by default)
# Ex: Use it to generate 10 numbers between 0 and 1

np.linspace(0,1,10)

array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,
       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])

In [56]:
np.linspace(0,1,11)

array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])

In [60]:
np.linspace(0,1,20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])

In [61]:
np.linspace(1,2,11)

array([1. , 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. ])

In [62]:
np.linspace(1,21,11)

array([ 1.,  3.,  5.,  7.,  9., 11., 13., 15., 17., 19., 21.])

In [None]:
np.arange(1,22,2)
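The last two cells produce the same values in two ways. As a sketch of the difference: arange() takes a step and excludes the stop value, while linspace() takes a number of points and includes the stop value by default. The names `by_step` and `by_count` are illustrative.

```python
import numpy as np

by_step = np.arange(1, 22, 2)       # stop (22) is excluded, step is 2
by_count = np.linspace(1, 21, 11)   # stop (21) is included, 11 points

print(by_step.dtype.kind, by_count.dtype.kind)   # i f (integers vs floats)
print(np.array_equal(by_step, by_count))         # True

# endpoint=False makes linspace() exclude the stop value, like arange()
print(np.linspace(0, 1, 10, endpoint=False))
```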

#### NumPy methods to generate random arrays

In [63]:
# Use randint() to draw a random sample of integers from an interval of integers (the upper bound is excluded)
# Ex: generate five random integers from 1 to 20

np.random.seed(1)

np.random.randint(1,21,5)

array([ 6, 12, 13,  9, 10])

In [66]:
# Use choice() to take a random sample from a group of values

# Ex: Get 5 random numbers from x without replacement

x= np.array([5,6, 34, 23, -5, 10, 23, 45,7,9])
x

array([ 5,  6, 34, 23, -5, 10, 23, 45,  7,  9])

In [67]:
np.random.seed(1)
np.random.choice(x,5,replace=False)

array([34,  9, 23, -5,  5])

In [68]:
np.random.seed(1)
np.random.choice(x,5,replace=True)

array([10,  7,  9, 10,  5])

In [73]:
# Generate an array with Normal data

# Ex: Generate 20 values from a normal distribution with mean = 0 and standard deviation = 1 (so the variance is also 1)

np.random.seed(1)
np.random.normal(0,1,20)

array([ 1.62434536, -0.61175641, -0.52817175, -1.07296862,  0.86540763,
       -2.3015387 ,  1.74481176, -0.7612069 ,  0.3190391 , -0.24937038,
        1.46210794, -2.06014071, -0.3224172 , -0.38405435,  1.13376944,
       -1.09989127, -0.17242821, -0.87785842,  0.04221375,  0.58281521])
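The cells above use the legacy `np.random.seed()` interface. As an alternative sketch (not the approach taken in this notebook), recent NumPy versions also provide a `Generator` object created with `np.random.default_rng()`. The seed and variable names are arbitrary, and the numbers drawn will differ from the outputs shown above.

```python
import numpy as np

rng = np.random.default_rng(1)   # seed value chosen arbitrarily

ints = rng.integers(1, 21, size=5)             # upper bound excluded, like randint()
pool = np.array([5, 6, 34, 23, -5, 10, 23, 45, 7, 9])
sample = rng.choice(pool, 5, replace=False)    # no position is drawn twice
normal = rng.normal(loc=0, scale=1, size=20)   # scale is the standard deviation

print(ints, sample, normal.shape)
```

Because `pool` contains the value 23 twice, a sample drawn without replacement can still show 23 twice: replacement applies to positions, not to values.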

### Indexing and slicing arrays

Let's practice indexing and slicing with two-dimensional arrays. But first, let's create a one-dimensional array to see how it differs from a two-dimensional one.

In [69]:
np.full(10, 1)

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [70]:
np.full(10, 1).ndim

1

In [71]:
np.full(10, 1).shape

(10,)

Now, let's create a two-dimensional array with 6 rows and 4 columns

In [72]:
np.full((6,4),1)

array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])

In [74]:
np.arange(1,21)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20])

In [75]:
np.arange(1,21).reshape(4,5)

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

In [76]:
np.arange(1,21).reshape(5,4)

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])
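reshape() only works when the new shape holds exactly the same number of elements. A brief sketch, with the illustrative name `seq`, also shows that -1 can stand in for one dimension that NumPy should infer:

```python
import numpy as np

seq = np.arange(1, 21)            # 20 elements

print(seq.reshape(4, -1).shape)   # (4, 5): -1 lets NumPy infer the columns
print(seq.reshape(-1, 4).shape)   # (5, 4)

try:
    seq.reshape(3, 7)             # 3 * 7 = 21, which is not 20
except ValueError as err:
    print("reshape failed:", err)
```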

In [78]:
# Let's create a two dimensional array called "mat"

mat= np.arange(1,21).reshape(5,4)
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

##### Indexing

In [79]:
mat[0,0]

1

In [80]:
mat[2,2]

11

In [81]:
mat[0,-1]

4

In [82]:
mat[-1,-1]

20

##### Slicing

In [83]:
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [84]:
# Slicing the first two rows

mat[0:2,]

# 0:2 selects indices 0 and 1, since the stop index is excluded

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [85]:
# Selecting the first two rows with a list of row indices

mat[[0,1],]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [89]:
# Selecting the first two rows with a list of row indices and an explicit ":" for all columns

mat[[0,1],:]

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [88]:
# Selecting the first two columns with a list of column indices

mat[:,[0,1]]

array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14],
       [17, 18]])

In [90]:
# The first and third rows

# Well... the index of the first row is 0 and the index of the third row is 2

mat[[0,2],]

array([[ 1,  2,  3,  4],
       [ 9, 10, 11, 12]])

In [91]:
# The first two rows and columns

mat[0:2,0:2]

array([[1, 2],
       [5, 6]])

In [93]:
# The first two rows and columns

mat [[0,1],0:2]

array([[1, 2],
       [5, 6]])

In [95]:
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [96]:
# The first two rows and cols 2 and 3

print(mat[0:2, 1:3])

# or ...

print(mat[0:2,[1,2]])

[[2 3]
 [6 7]]
[[2 3]
 [6 7]]


In [98]:
# What if you do not specify whether you want rows or columns?
# A single index applies to the first axis, so it returns a row

mat[1]

array([5, 6, 7, 8])
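The cells above select rows and columns with both slices (`0:2`) and lists of indices (`[0,1]`). The results look identical, but there are two differences worth a sketch: a slice returns a view of the original data while a list returns a copy, and two index lists are paired element by element rather than crossed. `np.ix_` is one way to get the crossed block; the variable names are illustrative.

```python
import numpy as np

mat = np.arange(1, 21).reshape(5, 4)

print(np.array_equal(mat[1], mat[1, :]))   # True: a single index picks a row

by_slice = mat[0:2, :]     # slice -> view that shares data with mat
by_list = mat[[0, 1], :]   # list of indices -> independent copy
print(np.shares_memory(mat, by_slice))   # True
print(np.shares_memory(mat, by_list))    # False

# Two index lists are paired: elements (0,0) and (1,1)
print(mat[[0, 1], [0, 1]])               # [1 6]
# np.ix_ crosses them instead, giving the 2x2 block with rows [1, 2] and [5, 6]
print(mat[np.ix_([0, 1], [0, 1])])
```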

### Some methods and properties of arrays

In [99]:
a1

array([ 8, 23, 22,  0, 10,  5, 20, 30, 40])

In [100]:
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [101]:
print(type(a1))
print(type(mat))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [102]:
a1.dtype

dtype('int32')

In [103]:
a2

array(['8', '23', '22', '0', '10', 'a', 'b', 'c', 'd'], dtype='<U11')

In [104]:
a2.dtype

dtype('<U11')

In [105]:
a1.size

9

In [106]:
mat.size

20

In [107]:
a1.ndim

1

In [108]:
mat.ndim

2

In [109]:
a1.shape

(9,)

In [110]:
mat.shape

(5, 4)

In [111]:
# Vectorized operations on NumPy arrays

a1

array([ 8, 23, 22,  0, 10,  5, 20, 30, 40])

In [112]:
a1 + 1

array([ 9, 24, 23,  1, 11,  6, 21, 31, 41])

In [113]:
a1*2

array([16, 46, 44,  0, 20, 10, 40, 60, 80])

In [115]:
# The NumPy sum() function

np.sum(a1)

158

In [117]:
# We can also use the Python built-in sum to add the elements of an array

sum(a1)

158

In [118]:
mat

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16],
       [17, 18, 19, 20]])

In [119]:
# Row total

np.sum(mat,axis=1)

array([10, 26, 42, 58, 74])

In [120]:
# Column total

np.sum(mat,axis=0)

array([45, 50, 55, 60])

In [121]:
np.sum(mat)

210

In [122]:
sum(mat)

array([45, 50, 55, 60])
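The last result may be surprising: the built-in sum() does not return the grand total for a two-dimensional array. It iterates over the first axis and adds the rows element by element, which gives the column totals. A sketch that rebuilds `mat` so it runs on its own:

```python
import numpy as np

mat = np.arange(1, 21).reshape(5, 4)

builtin_total = sum(mat)               # adds the five rows element-wise
column_total = np.sum(mat, axis=0)
grand_total = np.sum(mat)

print(builtin_total)                                 # [45 50 55 60]
print(np.array_equal(builtin_total, column_total))   # True
print(grand_total)                                   # 210
print(mat.sum(axis=1))                               # [10 26 42 58 74]
```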

np.sort() can be used to sort an array

In [48]:
arrayx

array([ 5,  4,  8, 10,  1])

In [50]:
np.sort(arrayx)

array([ 1,  4,  5,  8, 10])

np.sort() does not work "in place": it returns a sorted copy and leaves the original array unchanged

In [47]:
arrayx

array([ 5,  4,  8, 10,  1])

In contrast to np.sort(), the list method sort() sorts in place (that is, it changes the original list)

In [51]:
listx= [5,  4,  8, 10,  1]
listx

[5, 4, 8, 10, 1]

In [53]:
listx.sort()

In [54]:
listx

[1, 4, 5, 8, 10]
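Both kinds of sorting exist for both containers. As a sketch with illustrative names: arrays also have a sort() method that works in place, and the built-in sorted() returns a new list without touching the original.

```python
import numpy as np

arr = np.array([5, 4, 8, 10, 1])
arr.sort()               # ndarray method: sorts in place, returns None
print(arr)               # [ 1  4  5  8 10]

lst = [5, 4, 8, 10, 1]
new_lst = sorted(lst)    # built-in: returns a new list
print(lst)               # [5, 4, 8, 10, 1]
print(new_lst)           # [1, 4, 5, 8, 10]
```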

#### Boolean arrays
#### Indexing using Boolean arrays (= using a Boolean mask)

In [123]:
a3= np.array([4,7,1,3,10,9,9,0])
a3

array([ 4,  7,  1,  3, 10,  9,  9,  0])

In [124]:
a3>5

array([False,  True, False, False,  True,  True,  True, False])

In [125]:
a4= a3>5
a4

array([False,  True, False, False,  True,  True,  True, False])

In [126]:
a4.dtype

dtype('bool')

In [128]:
# Let's get the elements in a3 greater than 5

a3[a3>5]

array([ 7, 10,  9,  9])

In [129]:
# Let's get the elements in a3 greater than 5 and less than 10

a3[(a3>5) & (a3<10)]

array([7, 9, 9])

In [130]:
a3>5

array([False,  True, False, False,  True,  True,  True, False])

In [131]:
# Let's get how many elements in a3 are greater than 5

np.sum (a3>5)

4

In [133]:
# Let's get how many elements in a3 are greater than 5 and less than 10

np.sum((a3>5) & (a3<10))

3
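Besides `&`, Boolean masks can be combined with `|` (or) and negated with `~` (not). The sketch below rebuilds `a3` so it runs on its own, and also shows why Python's `and` keyword cannot be used in place of `&`.

```python
import numpy as np

a3 = np.array([4, 7, 1, 3, 10, 9, 9, 0])

print(a3[(a3 < 2) | (a3 > 9)])   # [ 1 10  0]
print(a3[~(a3 > 5)])             # [4 1 3 0]
print(np.mean(a3 > 5))           # 0.5 (proportion of elements greater than 5)

try:
    a3[(a3 > 5) and (a3 < 10)]   # 'and' needs a single True/False value
except ValueError as err:
    print("use & instead of 'and':", err)
```

The parentheses around each comparison matter because `&` and `|` bind more tightly than `>` and `<`.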

### Practice
#### Read the precipitation dataset

In [None]:
# Read the precipitation data as a local csv file

# The Pandas package is needed to read this csv file

import pandas as pd

In [None]:
rain_df= pd.read_csv('C:\\Users\\jheredi2\\Documents\\PythonDataAnalytics\\1-Datasets\\rain_dataset.csv')

In [None]:
rain_df.info()

In [None]:
# Create an array with the values from the column "PRCP" (the precipitation data)

rain_array= rain_df['PRCP'].values

In [None]:
type(rain_array)

Alternative way of creating the array: 

rain_array= rain_df.PRCP.values

#### Applying NumPy methods to the rain array

In [None]:
rain_array[0:5]

First, convert rain_array to inches by dividing by 254 (the rain data is in tenths of a millimeter, and 1 inch = 25.4 mm = 254 tenths of a millimeter)

In [None]:
rain_array=rain_array/254
rain_array[0:20]
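A quick sanity check of the conversion, using made-up values rather than the real dataset. The sketch stores the result under a new name (`rain_inches`, an illustrative choice) so that re-running the cell does not divide the data a second time, which can happen when the converted values are assigned back to the same variable.

```python
import numpy as np

TENTHS_OF_MM_PER_INCH = 254                    # 1 inch = 25.4 mm

rain_tenths_mm = np.array([0, 254, 127, 508])  # made-up readings
rain_inches = rain_tenths_mm / TENTHS_OF_MM_PER_INCH

print(rain_inches)         # [0.  1.  0.5 2. ]
print(rain_inches.dtype)   # float64: true division always returns floats
```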

In [None]:
# Count how many days it rained

np.sum(rain_array>0)

In [None]:
# Count how many days the rainfall was between 0.5 and 1 inches (inclusive)

np.sum((rain_array>=0.5)&(rain_array<=1))

In [None]:
# Get the median precipitation on rainy days in 2014 (inches)

# How do we get the rain values on rainy days?

rain_array[rain_array>0]

In [None]:
# Now apply np.median() to the previous array

np.median(rain_array[rain_array>0])

In [None]:
round(np.median(rain_array[rain_array>0]),2)

In [None]:
# Median precipitation on summer days in 2014 (inches)
# What are the summer days? 
# The first day of summer is day 172 (which is June 21st) and the upper limit is day 262
# First, create an array with the day of the year (from 1 to 365)

day_year= np.arange(1,366)

In [None]:
# Create a Boolean array with True for Summer days

is_summer_day= (day_year>=172) & (day_year<=262)

is_summer_day.dtype

In [None]:
np.median(rain_array[is_summer_day])

In [None]:
# Mean precipitation on summer days in 2014 (inches)

np.mean(rain_array[is_summer_day])

In [None]:
# Maximum precipitation on summer days in 2014 (inches)

np.max(rain_array[is_summer_day])

In [None]:
# Median precipitation on non-summer rainy days (inches)

# Here we have two conditions: rainy and non-summer

# We can use ~ 

# This operator (~) negates the Boolean values of a NumPy array

np.median(rain_array[(rain_array>0)&~(is_summer_day)])
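Since the CSV file is local, here is a self-contained sketch of the same masking logic on synthetic data. The array `fake_rain` is randomly generated for illustration and has nothing to do with the real precipitation values; only the day numbers 172 and 262 come from the cells above.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
# Made-up data: 365 daily values, roughly 60% of them dry (zero)
fake_rain = np.where(rng.random(365) < 0.6, 0.0, rng.random(365))

day_year = np.arange(1, 366)
is_summer_day = (day_year >= 172) & (day_year <= 262)

# Combine the two conditions: rainy AND not summer
rainy_not_summer = (fake_rain > 0) & ~is_summer_day

print(is_summer_day.sum())   # 91 summer days
print(np.median(fake_rain[rainy_not_summer]))
```

The mask and the data must have the same length (365 here), because the mask is applied position by position.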