In [1]:
# REPL -- read, eval, print loop 
# Jupyter is based on a browser

# Jupyter's server runs on your computer (or somewhere else)
# Jupyter's client runs in your browser

# you get the illusion of running Python inside of your browser

# Course agenda

1. Jupyter (Monday)
2. NumPy  (Monday)
    - Arrays
    - Data types (dtypes)
    - Operations with NumPy arrays
    - Working with files (external data)
    - Boolean indexing
    - Searching, sorting, retrieving data
    - Plotting with Matplotlib
3. Pandas (Tuesday-Thursday)
    - Series and data frames
    - Working with data (setting, retrieving) in Pandas
    - Importing and exporting data (in various formats)
    - Filtering by rows and columns
    - Working with string data (text)
    - Indexing (regular indexes and multi-indexes)
    - Pivot tables
    - Grouping
    - Sorting
    - Joining
    - Categories
    - Working with dates and times
    - Plotting and visualization using Pandas

In [2]:
# Input in Jupyter is put into "cells"
# Each cell can contain either Python code (like this) or Markdown (like above, which creates HTML).

# If I have Python code, I can just execute it with shift+enter.

x = 10
y = 20

print(x+y)

30


In [3]:
# the entire Python backend running on my computer is one Python process
# variables and functions stick around from one cell to another

# so even though I'm in a new cell, I can still say:

print(x+y)

30


In [4]:
# A cell may contain any number of lines of Python code
# If the final line is an expression (i.e., it gives us a value back)
# then we'll see its value, even without printing

x+y

30

In [5]:
x

10

In [6]:
y

20

# Modes in Jupyter

Jupyter actually has two different "modes" -- meaning, what happens when you type.

- Edit mode (green frame around the cell, press ENTER or click inside of the cell to activate) is what you use to enter text or code.  It's what I'm using right now.
- Command mode (blue frame around the cell, press ESC or click to the left of the cell to activate) is what you use to give Jupyter commands.  

When you're in command mode, you can type many one-character commands to Jupyter:

- `c` -- copy the current cell
- `v` -- paste the copied (or cut) cell below the current one
- `x` -- cut the current cell
- `h` -- get help about command mode
- `a` -- add a new, empty cell *above* the current one
- `b` -- add a new, empty cell *below* the current one
- `m` -- set the cell type to Markdown (like now) for easy-to-write HTML
- `y` -- set the cell type to code, for writing Python
- `r` -- set the cell type to "raw," meaning just text that isn't marked up or executed
- `z` -- undo the latest action

In either mode, we can always use shift+Enter to execute the current cell.

# Installing and starting Jupyter

1. Download and install it with `pip install -U jupyter`.  This is a command-line command, not a Python command.
2. At the command line, type `jupyter notebook`.
3. You'll see a "new" menu on the top right, and you should choose "Python 3 notebook."
4. You can rename your notebook by clicking on the title and changing it.  (View->Header, and click on the title, to rename).

# Exercise: Starting with Jupyter

1. Start up a Jupyter server on your computer. (If you haven't yet installed it, now is a good time!)
2. Start a new notebook with Jupyter.
3. Rename it to reflect today's date
4. Write some simple Python code, and execute it in Jupyter.

# Magic commands

In Jupyter, you can type whatever Python code you want, and execute it.  In addition, Jupyter has its own "magic commands," all of which start with `%` (a Python statement can't legally start with `%`, so Jupyter can tell them apart from Python code).



In [7]:
%pwd

'/Users/reuven/Courses/Current/Cisco-2022-06June-06-analytics'

In [8]:
%ls 

Cisco-2022-06June-06-analytics.ipynb


In [9]:
%ls /etc/*.conf

/etc/AFP.conf	       /etc/newsyslog.conf	    /etc/resolv.conf@
/etc/asl.conf	       /etc/nfs.conf		    /etc/rtadvd.conf
/etc/autofs.conf       /etc/notify.conf		    /etc/slpsa.conf
/etc/kern_loader.conf  /etc/ntp.conf		    /etc/syslog.conf
/etc/launchd.conf      /etc/ntp_opendirectory.conf
/etc/man.conf	       /etc/pf.conf


In [10]:
# get a list of magic commands
%magic

In [11]:
%autosave 30

Autosaving every 30 seconds


# Shell commands

You can execute a program in your computer's shell (either the Unix shell or Windows CMD) by putting `!` at the start of a line.

In [12]:
!ls /etc/*.conf

/etc/AFP.conf	       /etc/newsyslog.conf	    /etc/resolv.conf
/etc/asl.conf	       /etc/nfs.conf		    /etc/rtadvd.conf
/etc/autofs.conf       /etc/notify.conf		    /etc/slpsa.conf
/etc/kern_loader.conf  /etc/ntp.conf		    /etc/syslog.conf
/etc/launchd.conf      /etc/ntp_opendirectory.conf
/etc/man.conf	       /etc/pf.conf


In [14]:
# I will, very often, use the "cat" and "head" commands in Unix.

!head -12 /etc/passwd

##
# User Database
# 
# Note that this file is consulted directly only when the system is running
# in single-user mode.  At other times this information is provided by
# Open Directory.
#
# See the opendirectoryd(8) man page for additional information about
# Open Directory.
##
nobody:*:-2:-2:Unprivileged User:/var/empty:/usr/bin/false
root:*:0:0:System Administrator:/var/root:/bin/sh


# Examining our environment

Jupyter defines `In`, a list of all inputs we have entered, and `Out`, a dict of all return values we've received.

In [15]:
2+2

4

In [16]:
# The %whos magic command shows me all variables and their values

%whos

Variable   Type    Data/Info
----------------------------
x          int     10
y          int     20


In [17]:
s = 'abcd'
d = {'a':1, 'b':2}

def hello(name):
    return f'Hello, {name}'

In [18]:
%whos

Variable   Type        Data/Info
--------------------------------
d          dict        n=2
hello      function    <function hello at 0x10f6711b0>
s          str         abcd
x          int         10
y          int         20


In [19]:
# if I want to see a function's source code, I can put ?? after its name
hello??

# NumPy

In [20]:
# If I have an integer in Python, how many bytes does it take up?
# if ints were plain 64-bit values, each would take 8 bytes -- but a Python int is a full object, with overhead

import sys

s = 1234
sys.getsizeof(s)

28

In [22]:
s = 12345678901234567890
sys.getsizeof(s)

36

# What is NumPy?

At its core, a NumPy array is a C-language array of values (integers, for example), with a thin wrapper of Python around it. This allows us to benefit from the best of both worlds -- we can get the speed, small size, and efficiency of C, but still work in Python.

NumPy basically defines one thing, namely a new kind of array. The array type that's defined here is known as `ndarray`, short for "n-dimensional array."

We're going to use 1- and 2-dimensional NumPy arrays, nothing too wacky or weird.
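To make the "small size" claim concrete, here is a minimal sketch comparing one Python `int` object with the per-element storage of a NumPy array. The array length (1,000) and the explicit `dtype=np.int64` are assumptions chosen for illustration; the exact size of a Python `int` depends on your Python build.

```python
import sys
import numpy as np

# Size of one Python int object (the object header plus the digits)
python_int_size = sys.getsizeof(1234)

# 1,000 integers stored as raw 64-bit values in one contiguous block
a = np.arange(1000, dtype=np.int64)

print(python_int_size)   # bytes for a single Python int object
print(a.itemsize)        # bytes per array element
print(a.nbytes)          # bytes used by all of the array's data
```

Each array element takes 8 bytes, which is less than a single Python `int` object needs on its own.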

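Normally, in Python, to create an object of type X, you execute X and get back a new object. NumPy arrays are an exception: rather than calling `ndarray` directly, we normally create one by calling `np.array` (or one of the other functions we'll see shortly).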

In [23]:
# Load NumPy
import numpy as np

# create a new array of integers
a = np.array([10, 20, 30, 40, 50])
a



array([10, 20, 30, 40, 50])

In [24]:
type(a)

numpy.ndarray

In [25]:
# it often looks like NumPy arrays are just like lists
a[0]

10

In [27]:
a[-1]  # final element

50

In [28]:
a[2:4]  # slice

array([30, 40])

In [29]:
# NumPy arrays are mutable
a[3] = 999
a

array([ 10,  20,  30, 999,  50])

In [30]:
# run the len function on it
len(a)

5

In [31]:
# NumPy has a bunch of methods
a.sum()

1109

In [32]:
a.mean()

221.8

In [33]:
a.std()  

388.82510207032675

In [34]:
a.min()

10

In [35]:
a.max()

999

# A few other ways to create NumPy arrays



In [36]:
np.arange(10)   # new array with 10 elements starting at 0

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [37]:
np.arange(10, 20)   # new array with 10 elements starting at 10, ending (before) 20

array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [38]:
np.arange(10, 20, 3)   # new array starting at 10, ending before 20, step size 3

array([10, 13, 16, 19])

In [39]:
# get random integers
np.random.randint(0, 100, 5)   # 5 integers, each pulled randomly from 0-99 (the upper bound, 100, is excluded)

array([62, 89, 21, 83, 71])
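The upper bound passed to `np.random.randint` is exclusive, just like the end of a `range` or of `np.arange`. A small sketch to check this for yourself; the seed and the sample size of 10,000 are arbitrary choices for illustration:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed, so that repeated runs give the same numbers

# Draw many values, so that the smallest and largest possible values are likely to appear
samples = np.random.randint(0, 100, 10_000)

print(samples.min())   # never smaller than 0
print(samples.max())   # never larger than 99
```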

In [41]:
# get random floats -- each number in the array is at least 0 and less than 1
np.random.rand(10)

array([0.89531392, 0.52862809, 0.43669085, 0.02876322, 0.38268141,
       0.95171115, 0.25083314, 0.07558865, 0.95633308, 0.88245151])

# Exercise: Simple NumPy arrays

1. Create a NumPy array with three elements -- the year, the month, and the day of your birthday.
2. Retrieve the year. Retrieve the month.
3. Replace the year with the current year.
4. Create an array with all of the numbers from 567 to 890, skipping by 3s. What is the number at index 7? What is the mean of the entire array?
5. Create an array with 500 integers, randomly chosen from 0-100.  What is the mean? What is the standard deviation?
6. Create an array of 5 ints, from 0 to 100.  What is the mean, and what is the standard deviation? How are these different from #5?

In [42]:
a = np.array([1970, 7, 14])

In [43]:
type(a)

numpy.ndarray

In [44]:
a

array([1970,    7,   14])

In [45]:
a[0]

1970

In [46]:
a[1]

7

In [47]:
a[2]

14

In [48]:
a[0] = 2022
a

array([2022,    7,   14])

In [50]:
# create an array with all of the numbers from 567 to 890, step size of 3
a = np.arange(567, 890, 3)
a

array([567, 570, 573, 576, 579, 582, 585, 588, 591, 594, 597, 600, 603,
       606, 609, 612, 615, 618, 621, 624, 627, 630, 633, 636, 639, 642,
       645, 648, 651, 654, 657, 660, 663, 666, 669, 672, 675, 678, 681,
       684, 687, 690, 693, 696, 699, 702, 705, 708, 711, 714, 717, 720,
       723, 726, 729, 732, 735, 738, 741, 744, 747, 750, 753, 756, 759,
       762, 765, 768, 771, 774, 777, 780, 783, 786, 789, 792, 795, 798,
       801, 804, 807, 810, 813, 816, 819, 822, 825, 828, 831, 834, 837,
       840, 843, 846, 849, 852, 855, 858, 861, 864, 867, 870, 873, 876,
       879, 882, 885, 888])

In [51]:
a[7]

588

In [52]:
a.mean()

727.5

In [54]:
a.sum() / len(a)

727.5

In [55]:
a = np.random.randint(0, 100, 500)
a

array([41, 52, 68, 28, 25, 27, 95, 89, 37, 60, 63,  3,  3, 57, 65, 73, 26,
       10, 12, 29, 46, 23, 57, 50, 28, 31, 52, 17,  3, 75, 23, 67, 98, 44,
       95, 93, 52, 18, 73, 67,  2, 21, 55, 89, 98, 55, 66, 28, 96, 52, 33,
       96, 85, 52, 24, 50, 15, 82, 32, 36, 79, 93, 48, 27, 46, 80, 48,  0,
       38, 16, 70, 97, 83, 28, 63,  7, 43, 71, 49, 51, 25, 64, 66, 54, 71,
       77, 16, 86, 57, 90, 65,  9, 21, 21, 50, 79, 96, 11, 70, 77, 44, 86,
       43, 29, 50, 59, 24, 89, 67, 69, 56, 90,  6, 39, 84, 97, 79, 29, 36,
       65, 72, 49, 17, 61, 65, 56, 99, 97,  4, 97, 63, 59, 40, 44,  4, 91,
       93, 60, 89, 69,  8, 97, 88, 55, 92, 71, 29,  5, 16, 60, 52, 70, 71,
       72, 76, 38, 84, 28, 39,  1,  7, 34, 62, 52, 22, 92, 43, 57, 61, 35,
       65, 51, 70, 21, 38,  0,  2, 16, 82, 57, 14, 66, 56, 29, 97, 48, 41,
       98, 66, 97,  7, 89, 87, 27, 33, 47, 31, 35, 89, 85, 95,  6, 38, 72,
       68, 66, 64, 69, 70, 49, 12, 83, 84, 26, 24, 44, 25, 95, 69, 26, 83,
       66, 99, 49, 57, 13

In [56]:
a.mean()

52.242

In [57]:
a.std()

29.026116447089507

In [58]:
a = np.random.randint(0, 100, 5)
a

array([49, 56, 16, 50, 76])

In [59]:
a.mean()

49.4

In [60]:
a.std()

19.324595726689857

In [61]:
a = np.random.randint(0, 100, 5000)
a.mean()

49.724

In [62]:
a = np.random.randint(0, 100, 50000)
a.mean()

49.53526

In [63]:
a = np.random.randint(0, 100, 5000000)
a.mean()

49.524672

In [64]:
a = np.random.randint(0, 100, 500000000)
a.mean()

49.500133526
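The runs above suggest that the sample mean settles toward 49.5 (the mean of the integers 0 through 99) as the array grows. A minimal sketch of the same experiment in a loop; the seed and the particular sizes are assumptions for illustration:

```python
import numpy as np

np.random.seed(0)   # arbitrary seed for repeatable output

# The mean of the values we can draw: 0, 1, ..., 99
expected_mean = np.arange(100).mean()

for size in [5, 500, 50_000, 5_000_000]:
    sample = np.random.randint(0, 100, size)
    # Show how far each sample's mean is from the expected mean
    print(size, sample.mean(), abs(sample.mean() - expected_mean))
```

With only 5 values the mean can land far from 49.5; with millions of values it is typically within a small fraction of 1.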

# Where are NumPy arrays different from lists?

In [65]:
mylist = [10, 20, 30]
mylist + mylist  # can I add a list to a list?

[10, 20, 30, 10, 20, 30]

In [67]:
a = np.array([10, 20, 30])

# NumPy arrays are added element by element, matching up their indexes

a + a   # what will I get back now?

array([20, 40, 60])

In [68]:
a = np.array([10, 20, 30])
b = np.array([10, 20, 30, 40])

a + b

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

In [69]:
a * b

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
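If two arrays have different lengths, one option is to slice the longer one so that the shapes match before combining them. The sketch below catches the error and then retries with a slice; whether dropping the extra element is appropriate depends on your data, so treat this as an illustration rather than a recommendation.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([10, 20, 30, 40])

message = None
try:
    a + b                      # shapes (3,) and (4,) don't match
except ValueError as error:
    message = str(error)       # keep the error text so we can look at it

print(message)

# Use only as many elements of b as there are in a
trimmed_sum = a + b[:a.size]
print(trimmed_sum)
```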

# Operations on arrays

We can use any operator on two NumPy arrays of the same length. The operation will be executed on each index. So, given arrays A and B, if we have operator x, NumPy will give us back a new array in which index 0 is `A[0] x B[0]`, and index 1 is `A[1] x B[1]`, and so forth.
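One way to picture this is to write the loop by hand and compare it with the operator. This is only a sketch of the idea; NumPy does not actually run a Python-level loop, which is why the operator version is much faster on large arrays.

```python
import numpy as np

a = np.array([10, 20, 30])
b = np.array([100, 200, 300])

# Hand-written version: pair up the elements and add each pair
looped = np.array([x + y for x, y in zip(a, b)])

# Operator version: NumPy does the pairing for us
vectorized = a + b

print(looped)
print(vectorized)
print(np.array_equal(looped, vectorized))   # True
```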

In [70]:
a = np.array([10, 20, 30])
b = np.array([100, 200, 300])

a + b

array([110, 220, 330])

In [71]:
a - b

array([ -90, -180, -270])

In [72]:
a * b

array([1000, 4000, 9000])

In [73]:
a / b  # this is "truediv", always giving a float

array([0.1, 0.1, 0.1])

In [74]:
a // b  # this is "floordiv", rounding each result down to the nearest integer

array([0, 0, 0])

In [75]:
a % b  # return the remainder

array([10, 20, 30])

In [76]:
a ** b  # exponentiation; these results are too big for 64-bit integers, so they overflow and we get 0

array([0, 0, 0])
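Those zeros are not the real answers: 10 to the 100th power is far too large for a 64-bit integer, so the result overflows without any error message. One possible workaround, sketched below, is an array with `dtype=object`, which stores ordinary Python integers that can grow as large as needed. That choice gives up NumPy's speed and compact storage, so it is an illustration of the problem, not a general recommendation.

```python
import numpy as np

a = np.array([10, 20, 30], dtype=np.int64)
b = np.array([100, 200, 300], dtype=np.int64)

# The true results don't fit in 64 bits, so the values wrap around
wrapped = a ** b
print(wrapped)

# dtype=object stores regular Python ints, which have no fixed size
a_obj = np.array([10, 20, 30], dtype=object)
b_obj = np.array([100, 200, 300], dtype=object)
exact = a_obj ** b_obj

print(exact[0] == 10 ** 100)   # True
print(len(str(exact[0])))      # number of digits in 10**100
```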

# What about operations with an array and a scalar value?

We've now seen that if we apply an operator to two arrays, the operation is handled at each index (almost as if we had a `for` loop), one by one.  The return value from index `i` will be based on `A[i]` and `B[i]`.

But if we use a scalar value, then the value is "broadcast" to each of the elements of the array.



In [77]:
a

array([10, 20, 30])

In [78]:
a + 3

array([13, 23, 33])

In [80]:
a  # we didn't change a

array([10, 20, 30])

In [81]:
a - 3

array([ 7, 17, 27])

In [82]:
a * 3

array([30, 60, 90])

In [83]:
a / 3

array([ 3.33333333,  6.66666667, 10.        ])

In [84]:
a // 3

array([ 3,  6, 10])

In [85]:
a % 3

array([1, 2, 0])

In [86]:
a ** 3


array([ 1000,  8000, 27000])

In [88]:
# broadcasting and random floats

# we've seen that we can get an array of random floats between 0 and 1:

# now get random floats between 0 and 10
np.random.rand(10) * 10

array([6.76927194, 5.23189611, 0.41635728, 1.71524373, 9.44062929,
       9.01420103, 0.8278796 , 1.3268744 , 8.19596229, 4.58059884])
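The same broadcasting trick extends to any range: multiply by the width of the range, then add the lower bound. In this sketch the bounds `low` and `high` are hypothetical values chosen for illustration.

```python
import numpy as np

np.random.seed(0)     # arbitrary seed for repeatable output

low, high = 20, 30    # hypothetical bounds

# rand gives floats from 0 up to (but not including) 1;
# stretch them to the width of the range, then shift them up by low
scaled = low + np.random.rand(10) * (high - low)

print(scaled)
print(scaled.min(), scaled.max())
```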

# Exercise: Vectorized and broadcast operations

1. Create two arrays, each with 20 random integers from 0 to 1,000.
2. What are the mean and std of the array we get after adding them together?
3. Take the first array, and multiply it by 5. What is the mean of the new array you got?
4. What are the min and max values (using the `min` and `max` array methods) for each of the two arrays?

In [93]:
np.random.seed(0)   # reset the random-number generator, so that we get deterministic values back

a = np.random.randint(0, 1000, 20)
b = np.random.randint(0, 1000, 20)

# c = a+b
# c.mean()

(a+b).mean()

1107.65

In [94]:
(a+b).std()

355.2476987962061

In [96]:
a

array([684, 559, 629, 192, 835, 763, 707, 359,   9, 723, 277, 754, 804,
       599,  70, 472, 600, 396, 314, 705])

In [97]:
a*5

array([3420, 2795, 3145,  960, 4175, 3815, 3535, 1795,   45, 3615, 1385,
       3770, 4020, 2995,  350, 2360, 3000, 1980, 1570, 3525])

In [98]:
(a*5).mean()

2612.75

In [99]:
a.mean()

522.55

In [100]:
a.mean() * 5

2612.75

In [101]:
a.min()

9

In [102]:
a.max()

835

In [103]:
b.min()

72

In [104]:
b.max()

976

In [106]:
np.random.seed(0)

a = np.random.randint(0, 100, 5)
a

array([44, 47, 64, 67, 67])

In [107]:
a[2]

64

In [108]:
a[4]

67

In [109]:
# I can do "fancy indexing" -- giving a list of indexes

a[[2, 4]]  # notice: double square brackets!

array([64, 67])

In [110]:
a[[3, 0, 2, 0, 1, 0]]

array([67, 44, 64, 44, 47, 44])

In [111]:
# I can pass a list of True/False (boolean) values, and get only those elements
# where we have a True value -- this is known as a "mask index" or a "boolean index"

a[[True, False, True, False, True]]

array([44, 64, 67])

In [112]:
a

array([44, 47, 64, 67, 67])

In [113]:
# how can we create a boolean index, if not manually?
# answer: broadcasting comparison operators

a + 5

array([49, 52, 69, 72, 72])

In [114]:
a + 200

array([244, 247, 264, 267, 267])

In [115]:
# what if I ask about equality?

a == 67  # broadcast the == operator, and get back a boolean index

array([False, False, False,  True,  True])

In [116]:
# let's use that boolean index as a mask index on our array

a[a == 67]

array([67, 67])

In [117]:
a[a < 50]

array([44, 47])

In [118]:
a[a > 30]

array([44, 47, 64, 67, 67])

In [119]:
a[a > a.mean()]

array([64, 67, 67])

# Exercise: Mask indexes

1. Create a NumPy array with the temperature forecast for your city over the next 10 days.
2. On how many days will the temperature be above the average?
3. On how many days will the temperature be very hot - that is, more than the mean + std?

In [120]:
a = np.array([30, 30, 32, 33, 33, 31, 30, 29, 27, 28])
a

array([30, 30, 32, 33, 33, 31, 30, 29, 27, 28])

In [121]:
a.mean() 

30.3

In [122]:
# when will the temperature be greater than a.mean()?

a > 30.3   # broadcasting > on each element of a, comparing it with 30.3

array([False, False,  True,  True,  True,  True, False, False, False,
       False])

In [129]:
# when will the temperature be greater than a.mean()?

a > a.mean()

array([False, False,  True,  True,  True,  True, False, False, False,
       False])

In [128]:
2 > 5

False

In [124]:
# show me elements of a
# where the element is greater than 30.3

# when I apply a boolean array as an index to a, we get back only those elements of a where it's True

# [30,    30,     32,     33,    33,    31,   30,    29,    27,    28]
# [False, False,  True,  True,  True,  True, False, False, False, False]

a[a > 30.3]   

array([32, 33, 33, 31])

In [125]:
# show me elements of a
# where the element is greater than a.mean()

# (1) calculate a.mean()
# (2) broadcast a > a.mean(), getting an array of True/False values
# (3) apply that boolean array as a mask index onto a, giving us a new NumPy array with those
#  elements of a that are > a.mean()

a[a > a.mean()]

array([32, 33, 33, 31])
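The exercise asks "on how many days," so we want a count rather than the values themselves. Because `True` counts as 1 and `False` as 0, one way to get that count is to sum the boolean array. A sketch using the same temperatures as above (the variable name `temps` is only for illustration):

```python
import numpy as np

temps = np.array([30, 30, 32, 33, 33, 31, 30, 29, 27, 28])

# Summing a boolean array counts its True values
above_mean = (temps > temps.mean()).sum()

# "Very hot" days: more than one standard deviation above the mean
very_hot = (temps > temps.mean() + temps.std()).sum()

print(above_mean)   # 4
print(very_hot)     # 2
```

Calling `len` on the filtered array, as in `len(temps[temps > temps.mean()])`, gives the same count.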

# Next up

1. Boolean indexes and floats
2. Complex comparisons
3. Assignments based on conditions
4. Dtypes


In [130]:
# I'm going to create an array of 30 floats from 0-1,000.
# I want to find all of the numbers < the mean
# Following that, I want to find all of the numbers < the mean - 1 standard deviation

In [131]:
a = np.array([10, 10, 10, 10, 10 ,10, 10, 10])

In [132]:
a.mean()

10.0

In [133]:
a.std()

0.0

In [134]:
a = np.array([6,7,8,9,10,11,12,13,14])

In [135]:
a.mean()

10.0

In [136]:
a.std()

2.581988897471611
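The standard deviation can itself be computed with the broadcast operations we've already seen: subtract the mean from each element, square the differences, take their mean, and then take the square root. A minimal sketch, which should match `a.std()` with its default settings:

```python
import numpy as np

a = np.array([6, 7, 8, 9, 10, 11, 12, 13, 14])

# Broadcast the subtraction: how far is each element from the mean?
deviations = a - a.mean()

# Mean of the squared deviations, then the square root
manual_std = np.sqrt((deviations ** 2).mean())

print(manual_std)
print(a.std())
```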

In [137]:
# first step toward 30 floats from 0-1,000: create 30 random floats from 0-1

np.random.seed(0)
a = np.random.rand(30)
a

array([0.5488135 , 0.71518937, 0.60276338, 0.54488318, 0.4236548 ,
       0.64589411, 0.43758721, 0.891773  , 0.96366276, 0.38344152,
       0.79172504, 0.52889492, 0.56804456, 0.92559664, 0.07103606,
       0.0871293 , 0.0202184 , 0.83261985, 0.77815675, 0.87001215,
       0.97861834, 0.79915856, 0.46147936, 0.78052918, 0.11827443,
       0.63992102, 0.14335329, 0.94466892, 0.52184832, 0.41466194])

In [138]:
np.random.seed(0)
a = np.random.rand(30) * 1000
a

array([548.81350393, 715.18936637, 602.76337607, 544.883183  ,
       423.65479934, 645.89411307, 437.58721126, 891.77300078,
       963.6627605 , 383.44151883, 791.72503808, 528.89491975,
       568.04456109, 925.59663829,  71.0360582 ,  87.1292997 ,
        20.21839744, 832.61984555, 778.15675095, 870.01214825,
       978.61834223, 799.15856422, 461.47936225, 780.52917629,
       118.27442587, 639.92102133, 143.35328741, 944.66891705,
       521.84832175, 414.66193999])

In [140]:
# find numbers < mean

a < a.mean()

array([ True, False, False,  True,  True, False,  True, False, False,
        True, False,  True,  True, False,  True,  True,  True, False,
       False, False, False, False,  True, False,  True, False,  True,
       False,  True,  True])

In [141]:
# let's apply that boolean array as a mask index on a

a[a < a.mean()]

array([548.81350393, 544.883183  , 423.65479934, 437.58721126,
       383.44151883, 528.89491975, 568.04456109,  71.0360582 ,
        87.1292997 ,  20.21839744, 461.47936225, 118.27442587,
       143.35328741, 521.84832175, 414.66193999])

In [144]:
# let's find the *really* small values -- those that are 
# < a.mean() - a.std() 

a[a < a.mean() - a.std()]

array([ 71.0360582 ,  87.1292997 ,  20.21839744, 118.27442587,
       143.35328741])

# Complex comparisons

What if I have an array of 20 integers from 0-100, and I want to find even numbers?

In [145]:
np.random.seed(0)
a = np.random.randint(0, 100, 20)
a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87, 70, 88, 88, 12, 58, 65, 39,
       87, 46, 88])

In [146]:
a%2 == 0   # if the remainder from dividing by 2 is 0, the number is even

array([ True, False,  True, False, False, False, False, False,  True,
       False,  True,  True,  True,  True,  True, False, False, False,
        True,  True])

In [149]:
# apply that boolean array as a mask index
a[a%2 == 0] # get all of the even numbers in a

array([44, 64, 36, 70, 88, 88, 12, 58, 46, 88])

In [150]:
a[a%2 == 1]  # get all of the odd numbers in a

array([47, 67, 67,  9, 83, 21, 87, 65, 39, 87])

In [151]:
# I want all of the even numbers in a, that are also < the mean

a[a%2 == 0 and a<a.mean()]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

# Boolean context

`and` (as well as `not` and `or`, and also `if` and `while`) works with boolean (`True` and `False`) values. If one of them sees a non-boolean value, it asks whether that value counts as `True` or `False`.

In Python, everything is considered `True` in this "boolean context" (i.e., when we force data to be boolean) except for:

- `None`
- 0
- `False`
- anything empty 

NumPy breaks this rule a bit -- it refuses to treat an array as either `True` or `False` in boolean context unless the array contains exactly one element. (Older versions of NumPy treat an empty array as `False`, but that behavior is deprecated.)

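As a result, don't use `and`, `or`, and `not` with NumPy arrays.  Instead, you'll use the operators `&` (for and), `|` (for or), and `~` (for not). These operators have higher precedence than comparisons such as `==` and `<`, so put each comparison inside parentheses.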
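Here is a short sketch of `~` and `|` on a small, hand-made array (the values are arbitrary). Note the parentheses around each comparison.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# ~ flips every True to False and vice versa: "not even" means odd
odds = a[~(a % 2 == 0)]

# | is True wherever at least one of the two comparisons is True
ends = a[(a < 2) | (a > 5)]

print(odds)   # [1 3 5]
print(ends)   # [1 6]
```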

In [153]:
# let's then try & instead of "and"
# the idea is that & will operate on two NumPy arrays
# wherever both items at a given index are True, we'll get True
# if only one of them (or neither) is True, then we'll get False

a[(a%2 == 0) & (a<a.mean())]

array([44, 36, 12, 58, 46])

In [154]:
(a%2 == 0)    # generate a boolean array

array([ True, False,  True, False, False, False, False, False,  True,
       False,  True,  True,  True,  True,  True, False, False, False,
        True,  True])

In [155]:
(a<a.mean())  # generate a second boolean array

array([ True,  True, False, False, False,  True, False,  True,  True,
       False, False, False, False,  True,  True, False,  True, False,
        True, False])

In [156]:
(a%2 == 0)  & (a<a.mean()) 

array([ True, False, False, False, False, False, False, False,  True,
       False, False, False, False,  True,  True, False, False, False,
        True, False])

In [158]:
a[(a%2 == 0)  & (a<a.mean()) ]    # what even numbers, less than the mean, do we see?

array([44, 36, 12, 58, 46])

# Exercises: Complex comparisons

1. Create a NumPy array of 20 random integers from 0-100.
2. What's the smallest even number that's also greater than the mean?
3. Show all numbers that are either < mean-std or > mean+std (i.e., outliers, kind of)
4. Show odd numbers that are < mean, and even numbers that are > mean.  (This will be long and horrible looking.)

In [159]:
np.random.seed(0)

a = np.random.randint(0, 100, 20)
a

array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87, 70, 88, 88, 12, 58, 65, 39,
       87, 46, 88])

In [160]:
# even numbers
# greater than the mean
# smallest of them

a%2 == 0  # boolean array indicating which are even

array([ True, False,  True, False, False, False, False, False,  True,
       False,  True,  True,  True,  True,  True, False, False, False,
        True,  True])

In [161]:
a > a.mean()   # boolean array indicating which elements of a are greater than a.mean()

array([False, False,  True,  True,  True, False,  True, False, False,
        True,  True,  True,  True, False, False,  True, False,  True,
       False,  True])

In [163]:
(a%2==0) & (a>a.mean())   # boolean array indicating which are both even and greater than a.mean()

array([False, False,  True, False, False, False, False, False, False,
       False,  True,  True,  True, False, False, False, False, False,
       False,  True])

In [164]:
# apply this boolean index to a
a[(a%2==0) & (a>a.mean())]

array([64, 70, 88, 88, 88])

In [165]:
# get the smallest of these even numbers that are > mean
a[(a%2==0) & (a>a.mean())].min()

64

In [None]:
# 3. Show all numbers that are either < mean-std or >mean+std (i.e., outliers, kind of)
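One possible approach to this question, sketched here rather than given as the official solution, combines two comparisons with `|`. The array is typed in by hand from the output above, so the example doesn't depend on the random seed; the names `low`, `high`, and `outliers` are only for illustration.

```python
import numpy as np

a = np.array([44, 47, 64, 67, 67,  9, 83, 21, 36, 87,
              70, 88, 88, 12, 58, 65, 39, 87, 46, 88])

# Boundaries: one standard deviation below and above the mean
low = a.mean() - a.std()
high = a.mean() + a.std()

# Keep the elements that fall outside of those boundaries
outliers = a[(a < low) | (a > high)]

print(low, high)
print(outliers)
```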
