 -------------------------- 9/23/25 ------------------------
# Ch7 Array-Oriented Programming with NumPy

- High-performance, richly functional n-dimensional array type called **ndarray**.
- Critical in big-data processing, AI applications and much more.
- Functional-style programming with internal iteration makes array-oriented manipulations concise and straightforward, and reduces the possibility of error.

- Functional programming (FP) is a programming paradigm — a style of building programs — where functions are treated as the fundamental building blocks.

## 7.2 Creating arrays from Existing Data
- Creating an array with the array function
- Argument is an array or another sequence, such as a list, tuple or range
- Returns a new array containing the argument’s elements
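The sketch below shows which arguments np.array accepts, assuming a recent NumPy. Lists, tuples and ranges are sequences and work directly. A generator expression is not a sequence, so np.array stores the generator object itself. np.fromiter, which these notes do not otherwise cover, builds an array from a generator instead.

```python
import numpy as np

# Lists, tuples and ranges are sequences, so np.array copies their elements
from_list = np.array([3, 5, 6, 7, 2])
from_tuple = np.array((1.5, 2.5))
from_range = np.array(range(5))

# Mixing ints and floats upcasts every element to a floating-point dtype
mixed = np.array([1, 2.5, 3])

# np.array does not iterate over a generator: it wraps the generator object
# itself in a zero-dimensional array of dtype object
wrapped = np.array(x * x for x in range(5))

# np.fromiter consumes the generator element by element (dtype is required)
squares = np.fromiter((x * x for x in range(5)), dtype=np.int64)

print(from_list, from_tuple, from_range, mixed, squares, sep="\n")
print(wrapped.ndim, wrapped.dtype)
```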

In [1]:
import numpy as np  # NumPy provides the ndarray array type

numbers = np.array([3,5,6,7,2])

In [2]:
type(numbers)

numpy.ndarray

In [3]:
numbers

array([3, 5, 6, 7, 2])

### Multidimensional Arguments

In [5]:
np.array([[1,5,6],[6,8,3]])

array([[1, 5, 6],
       [6, 8, 3]])

## 7.3 array Attributes
- An array's attributes enable you to discover information about its structure and contents

In [18]:
import numpy as np

In [19]:
integers = np.array([[3,5,6],[3,1,5]])
integers

array([[3, 5, 6],
       [3, 1, 5]])

NumPy does not display trailing 0s in floating-point values (0.0 is shown as 0.):

In [20]:
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
floats

array([0. , 0.1, 0.2, 0.3, 0.4])

### Determining an array’s Element Type

In [12]:
integers.dtype

dtype('int64')

In [13]:
floats.dtype

dtype('float64')

In [14]:
numbers.dtype

dtype('int64')

### Determining an array’s Dimensions

- ndim contains an array’s number of dimensions
- shape contains a tuple specifying an array’s dimensions

In [15]:
integers.ndim

2

In [17]:
numbers.ndim

1

In [21]:
integers.shape
# 2 rows (dimension 0), each with 3 elements (dimension 1)

(2, 3)

In [19]:
numbers.shape

(5,)

### Determining an array’s Number of Elements and Element Size
- view an array’s total number of elements with size
- view number of bytes required to store each element with itemsize

In [20]:
integers.size

6

In [21]:
floats.size

5

In [23]:
floats.itemsize

8
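The attributes above are related to one another, and the sketch below checks those relationships. It also uses nbytes (total bytes of element data), which these notes do not cover. It assumes NumPy's default integer type, which is int64 on most platforms but was int32 on Windows before NumPy 2.0, so itemsize for integers may differ on your machine.

```python
import numpy as np

integers = np.array([[3, 5, 6], [3, 1, 5]])
floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])

for name, arr in (("integers", integers), ("floats", floats)):
    # ndim is len(shape); size is the product of shape; nbytes is size * itemsize
    print(f"{name}: ndim={arr.ndim} shape={arr.shape} size={arr.size} "
          f"itemsize={arr.itemsize} nbytes={arr.nbytes} dtype={arr.dtype}")
```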

### Iterating through a Multidimensional array’s Elements

In [26]:
for row in integers: 
    for column in row:
        print(column, end='  ')
    print()

3  5  6  
3  1  5  


- Iterate through a multidimensional array as if it were one-dimensional by using flat

In [27]:
for i in integers.flat: 
    print(i, end = '  ')

3  5  6  3  1  5  
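If you need each element's position as well as its value, np.ndenumerate (not covered in these notes) yields ((row, column), value) pairs in the same row-major order that flat uses. Here is a small sketch:

```python
import numpy as np

integers = np.array([[3, 5, 6], [3, 1, 5]])

# np.ndenumerate yields ((row, column), value) pairs in row-major order
for (row, col), value in np.ndenumerate(integers):
    print(f"integers[{row}, {col}] = {value}")

# .flat visits the same values in the same order, without the indices
flat_values = [int(v) for v in integers.flat]
```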

## 7.4 Filling arrays with Specific Values
- Functions zeros, ones and full create arrays containing 0s, 1s or a specified value, respectively

In [29]:
import numpy as np
np.zeros(5)

array([0., 0., 0., 0., 0.])

- When passed a tuple of integers, these functions return a multidimensional array with those dimensions

In [30]:
np.ones((2,4), dtype = int)

array([[1, 1, 1, 1],
       [1, 1, 1, 1]])

In [32]:
np.full((3,5), 13)

array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])
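The element type of these filled arrays follows a few simple rules. The sketch below illustrates them: zeros and ones default to float64 unless you pass dtype, and full infers the dtype from the fill value. np.zeros_like, which these notes do not otherwise cover, copies another array's shape and dtype.

```python
import numpy as np

z = np.zeros(5)                  # default dtype is float64
o = np.ones((2, 4), dtype=int)   # the dtype keyword overrides the default
f = np.full((3, 5), 13)          # dtype is inferred from the fill value (an integer)
h = np.full((2, 2), 0.5)         # a float fill value gives a float array
like = np.zeros_like(o)          # same shape and dtype as o, filled with 0s

print(z.dtype, o.dtype, f.dtype, h.dtype, like.dtype)
```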

## 7.5 Creating arrays from Ranges
- NumPy provides optimized functions for creating arrays from ranges

### Creating Integer Ranges with arange

In [56]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [57]:
np.arange(5,10)

array([5, 6, 7, 8, 9])

In [58]:
np.arange(10,1,-2)

array([10,  8,  6,  4,  2])

### Creating Floating-Point Ranges with linspace
- Produce evenly spaced floating-point ranges with NumPy’s linspace function
- Ending value is included in the array

In [59]:
np.linspace(0.0, 1.0, num=5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
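The sketch below compares linspace with arange for floating-point ranges. The endpoint=False keyword excludes the ending value. With a non-integer step, arange's element count depends on floating-point rounding, and NumPy's documentation suggests linspace for that case.

```python
import numpy as np

with_end = np.linspace(0.0, 1.0, num=5)                     # 1.0 is included
without_end = np.linspace(0.0, 1.0, num=5, endpoint=False)  # step becomes 0.2
stepped = np.arange(0.0, 1.0, 0.25)                         # stop value is excluded

print(with_end, without_end, stepped, sep="\n")
```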

### Reshaping an array
- Array method reshape transforms an array into a different number of dimensions
- New shape must have the same number of elements as the original

In [37]:
np.arange(1,21).reshape(4,5)

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
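Here is a short sketch of two reshape details. Passing -1 for one dimension lets NumPy compute it from the element count. Requesting a shape whose element count differs from the original raises a ValueError.

```python
import numpy as np

values = np.arange(1, 21)          # 20 elements
grid = values.reshape(4, 5)
inferred = values.reshape(4, -1)   # -1 asks NumPy to compute this dimension (5)

try:
    values.reshape(3, 7)           # needs 21 elements, but only 20 exist
except ValueError as err:
    message = str(err)
print(message)
```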

### Displaying Large arrays
- When an array has more than 1000 elements (a threshold you can change with np.set_printoptions), NumPy summarizes the display: it drops the middle rows, columns or both and shows ... in their place

In [39]:
np.arange(1,100001).reshape(4,25000)

array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])

In [41]:
np.arange(1, 100001).reshape(100, 1000)

array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
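The sketch below demonstrates the summarization threshold, assuming the default print options have not been changed. np.array2string returns the text NumPy would display. The np.printoptions context manager changes the settings only inside its with block.

```python
import numpy as np

big = np.arange(1, 1002)     # 1,001 elements: one more than the default threshold
summary = np.array2string(big)
print(summary)               # the middle elements are replaced by ...

# np.printoptions restores the previous settings when the with block ends
with np.printoptions(threshold=2000):
    full = np.array2string(big)

exactly_1000 = np.array2string(np.arange(1000))   # not above the threshold
```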

------------------------- 9/24/25 -----------------------

# 7.6 List vs. array Performance: Introducing %timeit
- Most array operations execute significantly faster than corresponding list operations
- IPython %timeit magic command times the average duration of operations

### Magics 
- Magics are special commands available in IPython and Jupyter Notebooks.
- They are not part of standard Python syntax; they are extensions that make interactive work easier.
- They always start with % (line magics) or %% (cell magics).

### Two kinds of magics
**Line magics (%)**
Apply to a single line. Example:

- `%timeit [x**2 for x in range(1000)]` times only this one line


**Cell magics (%%)**
Apply to an entire cell and must be the cell's first line. Example cell:

- `%%time`
- `squares = [x**2 for x in range(10_000_000)]`
- `print(len(squares))`

### Timing the Creation of a List Containing Results of 6,000,000 Die Rolls


- By default, %timeit executes a statement in a loop, and it runs the loop seven times
- If you do not indicate the number of loops, %timeit chooses an appropriate value
- After executing the statement, %timeit displays the statement’s average execution time, as well as the standard deviation of all the executions

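In [60]:
import random
# caution: %time alone on a line times an empty statement, not the line below it
%time
rolls_list = [random.randrange(1, 7) for _ in range(6_000_000)]

CPU times: total: 0 ns
Wall time: 3.81 μs

- The near-zero time above does not measure the list comprehension, because a line magic times only the code on its own line. To time it, write `%time rolls_list = [random.randrange(1, 7) for _ in range(6_000_000)]` on one line, or make `%%time` the first line of the cell.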


In [61]:
# %timeit is an IPython magic: it works in IPython and Jupyter, not in a plain .py script
%timeit rolls_list = \
   [random.randrange(1, 7) for i in range(0, 6_000_000)]

2.42 s ± 204 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Timing the Creation of an array Containing Results of 6,000 Die Rolls

In [62]:
import numpy as np
%timeit rolls = np.random.randint(1,7,6000)

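48.8 μs ± 5.22 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

- This cell creates only 6,000 rolls, 1000 times fewer than the list version above, so the two timings are not directly comparable; pass 6_000_000 to compare equal amounts of work.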


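### Customizing the %timeit Iterations
- The -n option sets the number of loops (executions of the statement) per run, and -r sets the number of runs
- Here, the statement executes 3 times in each of 2 runs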

In [63]:
%timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000)


207 μs ± 80.9 μs per loop (mean ± std. dev. of 2 runs, 3 loops each)
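%timeit is available only in IPython and Jupyter. In a plain Python script, the standard library's timeit module can run a similar comparison. This is a minimal sketch with a few assumptions: it uses 300,000 rolls rather than 6,000,000 to keep it quick, and it uses NumPy's Generator API (np.random.default_rng and its integers method) rather than the legacy np.random.randint used above.

```python
import random
import timeit

import numpy as np

N = 300_000  # fewer rolls than the 6,000,000 above, to keep this sketch quick

def roll_list():
    # pure-Python loop: one randrange call per roll
    return [random.randrange(1, 7) for _ in range(N)]

rng = np.random.default_rng()

def roll_array():
    # vectorized: NumPy generates all N rolls in one call; the upper bound 7 is exclusive
    return rng.integers(1, 7, N)

# timeit.repeat returns one total time per run; number=1 executes the call once per run
list_times = timeit.repeat(roll_list, number=1, repeat=3)
array_times = timeit.repeat(roll_array, number=1, repeat=3)
print(f"list : best of 3 = {min(list_times):.4f} s")
print(f"array: best of 3 = {min(array_times):.4f} s")
```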


### Other IPython Magics

IPython provides dozens of magics for a variety of tasks. For a complete list, see the [IPython magics documentation](https://ipython.readthedocs.io/en/stable/interactive/magics.html).  
Here are a few helpful ones:

- `%load` — read code into IPython from a local file or URL.
- `%save` — save snippets to a file.
- `%run` — execute a `.py` file from IPython.
- `%precision` — change the default floating-point precision for IPython outputs.
- `%cd` — change directories without having to exit IPython first.
- `%edit` — launch an external editor (useful for modifying more complex snippets).
- `%history` — view a list of all snippets and commands executed in the current IPython session.

### 7.7 array Operators
- array operators perform operations on entire arrays.
- Can perform arithmetic between arrays and scalar numeric values, and between arrays of the same shape.

In [41]:
import numpy as np

number = np.arange(1,6)
number

array([1, 2, 3, 4, 5])

In [42]:
number * 2

array([ 2,  4,  6,  8, 10])

In [43]:
number ** 3

array([  1,   8,  27,  64, 125])

In [44]:
number # number is unchanged by arithmetic operators 

array([1, 2, 3, 4, 5])

In [46]:
number += 10
number

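array([21, 22, 23, 24, 25])

- This output shows 21 through 25 because the augmented assignment was executed twice, adding 10 each time; running it once gives array([11, 12, 13, 14, 15]). The outputs below reflect the values 21 through 25.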
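Arithmetic operators create new arrays, but an augmented assignment such as += updates the existing array and keeps its dtype. The sketch below shows both behaviors, assuming NumPy's default "same_kind" casting rule for in-place operations. Under that rule, adding a float to an integer array in place raises a TypeError instead of silently truncating.

```python
import numpy as np

number = np.arange(1, 6)
original_id = id(number)

number += 10                  # updates the existing array in place
same_object = id(number) == original_id

doubled = number * 2          # a binary operator creates a new array

# In-place operations keep the integer dtype, so a float result cannot be stored
try:
    number += 0.5
except TypeError as err:      # NumPy raises a TypeError subclass here
    error = err
```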

### Broadcasting
- Element-wise arithmetic normally requires two operands of the same size and shape.
- When one operand is a scalar, NumPy treats it as an array of the other operand's shape, so number * 2 behaves like number * [2, 2, 2, 2, 2] for a 5-element array.
- Stretching the smaller operand to match the larger one in this way is called broadcasting.
- Broadcasting also works between arrays of different sizes and shapes when their shapes are compatible, enabling some concise and powerful manipulations.
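The sketch below shows the compatibility rule, following NumPy's broadcasting rules. Shapes are compared from the trailing dimension, and each pair of sizes must be equal or one of them must be 1. np.broadcast_shapes, which requires NumPy 1.20 or later, reports the resulting shape.

```python
import numpy as np

number = np.arange(1, 6)                 # shape (5,)
scaled = number * 2                      # the scalar is broadcast to shape (5,)

column = np.array([[10], [20], [30]])    # shape (3, 1)
table = column + number                  # (3, 1) and (5,) broadcast to (3, 5)
print(np.broadcast_shapes((3, 1), (5,)))

try:
    np.arange(3) + np.arange(4)          # (3,) and (4,): neither size is 1, so incompatible
except ValueError as err:
    message = str(err)
```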

### Arithmetic Operations Between arrays
- Can perform arithmetic operations and augmented assignments between arrays of the same shape

In [56]:
number2 = np.linspace(1.1, 5.5, 5)
number2

array([1.1, 2.2, 3.3, 4.4, 5.5])

In [57]:
number * number2

array([ 23.1,  48.4,  75.9, 105.6, 137.5])

### Comparing arrays
- Can compare arrays with individual values and with other arrays
- Comparisons performed element-wise
- Produce arrays of Boolean values in which each element’s True or False value indicates the comparison result

In [58]:
number >=13

array([ True,  True,  True,  True,  True])

In [60]:
number2 < number

array([ True,  True,  True,  True,  True])
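Boolean arrays produced by comparisons are often summarized rather than read element by element. The sketch below assumes number holds 11 through 15, which is what a single `number += 10` produces. It counts True values and checks all() and any().

```python
import numpy as np

number = np.array([11, 12, 13, 14, 15])   # assumed values for this sketch
number2 = np.linspace(1.1, 5.5, 5)

at_least_13 = number >= 13                # element-wise comparison gives a bool array
count = np.count_nonzero(at_least_13)     # True counts as 1
every = (number2 < number).all()          # True only if every comparison is True
some = (number == 12).any()               # True if at least one comparison is True
```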

# 7.8 NumPy Calculation Methods
- By default (with no axis argument), these methods ignore the array's shape and use all the elements in the calculations.
- Consider an array representing four students’ grades on three exams:

In [61]:
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

- Can use methods to calculate sum, min, max, mean, std (standard deviation) and var (variance)
- Each is a functional-style programming reduction

In [62]:
grades.sum()

np.int64(1054)

In [64]:
grades.min()

np.int64(70)

In [65]:
grades.max()

np.int64(100)

In [66]:
grades.mean()

np.float64(87.83333333333333)

In [67]:
grades.std()

np.float64(8.792357792739987)

In [68]:
grades.var()

np.float64(77.30555555555556)
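The sketch below spells out these reductions by hand, assuming NumPy's default ddof=0. Under that default, var is the mean of the squared deviations from the mean (population variance), and std is its square root.

```python
from functools import reduce

import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# sum is a reduction: combine the elements pairwise into a single value
total = reduce(lambda a, b: a + b, grades.flat)

mean = total / grades.size                 # 1054 / 12
variance = ((grades - mean) ** 2).mean()   # mean squared deviation (ddof=0)
std_dev = np.sqrt(variance)                # standard deviation is sqrt(variance)
```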

### Calculations by Row or Column
- You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
- An array has one axis per dimension
- In a 2D array, axis=0 performs the calculation column-by-column, producing one result per column
- axis=1 performs the calculation row-by-row, producing one result per row

In [69]:
grades.mean(axis =0)

array([95.25, 85.25, 83.  ])

In [70]:
grades.mean(axis=1)

array([84.33333333, 92.33333333, 87.        , 87.66666667])
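The axis argument names the axis that is collapsed. The sketch below confirms the resulting shapes for the grades array. It also uses keepdims=True (not covered above), which keeps the reduced axis with length 1 so that the result broadcasts back against the original array.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

exam_means = grades.mean(axis=0)      # collapse the rows: one mean per exam (column)
student_means = grades.mean(axis=1)   # collapse the columns: one mean per student (row)
best_per_exam = grades.max(axis=0)

# shape (4, 1) means broadcast across each student's row
centered = grades - grades.mean(axis=1, keepdims=True)
```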

# 7.9 Universal Functions
- Standalone universal functions (ufuncs) perform element-wise operations using one or two array or array-like arguments (like lists)
- Each returns a new array containing the results
- Some ufuncs are called when you use array operators like + and *
- Create an array and calculate the square root of its values, using the sqrt universal function

In [64]:
import numpy as np
numbers = np.array([1, 4, 9, 16, 25, 36])
np.sqrt(numbers)


array([1., 2., 3., 4., 5., 6.])

- Add two arrays with the same shape, using the add universal function
- Equivalent to numbers + numbers2

In [3]:
numbers2 = np.arange(1, 7) * 10
numbers2

array([10, 20, 30, 40, 50, 60])

In [4]:
np.add(numbers, numbers2)


array([11, 24, 39, 56, 75, 96])

### Broadcasting with Universal Functions
- Universal functions can use broadcasting, just like NumPy array operators

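In [5]:
print(np.multiply(numbers2, 5))  # the scalar 5 is broadcast to every element
numbers3 = numbers2.reshape(2, 3)
numbers3

[ 50 100 150 200 250 300]
array([[10, 20, 30],
       [40, 50, 60]])

- numbers4 has shape (3,), so np.multiply broadcasts it across each row of numbers3, which has shape (2, 3)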

In [6]:
numbers4 = np.array([2, 4, 6])
np.multiply(numbers3, numbers4)


array([[ 20,  80, 180],
       [ 80, 200, 360]])

### Other Universal Functions
NumPy universal functions by category:
- **Math**: add, subtract, multiply, divide, remainder, exp, log, sqrt, power, and more.
- **Trigonometry**: sin, cos, tan, hypot, arcsin, arccos, arctan, and more.
- **Bit manipulation**: bitwise_and, bitwise_or, bitwise_xor, invert, left_shift, and right_shift.
- **Comparison**: greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, logical_not, minimum, maximum, and more.
- **Floating point**: floor, ceil, isinf, isnan, fabs, trunc, and more.
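Binary ufuncs also provide methods such as reduce, accumulate and outer, and every ufunc accepts an out argument for writing into an existing array. These go beyond what these notes cover. Here is a small sketch:

```python
import numpy as np

numbers = np.array([1, 4, 9, 16, 25, 36])

total = np.add.reduce(numbers)                  # same result as numbers.sum()
running = np.add.accumulate(numbers)            # running totals
table = np.multiply.outer([1, 2, 3], [10, 20])  # every pairing of the two inputs

out = np.empty(6)
np.sqrt(numbers, out=out)                       # write the results into an existing array
```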


### 7.10 Indexing and Slicing
- One-dimensional arrays can be indexed and sliced like lists.
- Indexing with two-dimensional arrays: to select an element, specify a tuple containing the element’s row and column indices in square brackets

In [8]:
import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])
grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

In [9]:
grades[0, 1]  # row 0, column 1

np.int64(96)

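### Selecting a Subset of a Two-Dimensional array’s Rows
- To select a single row, specify only one index in square brackets
- To select multiple sequential rows, use a slice
- To select multiple nonsequential rows, use a list of row indices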

In [10]:
grades[1]


array([100,  87,  90])

In [68]:
grades[0:2]

array([[ 87,  96,  70],
       [100,  87,  90]])

In [70]:
grades[[1, 3]]

array([[100,  87,  90],
       [100,  81,  82]])

### Selecting a Subset of a Two-Dimensional array’s Columns
- The column index also can be a specific index, a slice or a list

In [13]:
grades[:, 0]

array([ 87, 100,  94, 100])

In [14]:
grades[:, 1:3]

array([[96, 70],
       [87, 90],
       [77, 90],
       [81, 82]])

In [15]:
grades[:, [0, 2]]

array([[ 87,  70],
       [100,  90],
       [ 94,  90],
       [100,  82]])
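Two details of 2D indexing are easy to miss. An integer index drops that axis while a slice keeps it. Also, two index lists are paired element by element rather than crossed. The sketch below uses np.ix_ (not covered in these notes) to select a rows-by-columns block.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

row = grades[1]          # integer index drops the row axis: shape (3,)
rows = grades[1:2]       # a slice keeps it: shape (1, 3)

# Two lists are paired element by element: selects (0, 0) and (2, 2)
pairs = grades[[0, 2], [0, 2]]

# np.ix_ crosses the lists: rows 0 and 2, columns 0 and 2
block = grades[np.ix_([0, 2], [0, 2])]
```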

## 7.11 Views: Shallow Copies
- Views “see” the data in other objects, rather than having their own copies of the data
- Views are shallow copies
- Array method view returns a new array object with a view of the original array object’s data

In [18]:
import numpy as np
numbers = np.arange(1, 6)
numbers

array([1, 2, 3, 4, 5])

In [19]:
numbers2 = numbers.view()
numbers2

array([1, 2, 3, 4, 5])

In [22]:
id(numbers)

1880940299152

In [21]:
id(numbers2)

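1880940043344

- The different id values show that numbers2 is a separate array object, even though it shares numbers’ data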

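In [23]:
numbers[1] *= 10  # modify an element through the original array
numbers2          # the view sees the same change

array([ 1, 20,  3,  4,  5])

In [None]:
numbers           # expected: array([ 1, 20,  3,  4,  5])


In [None]:
numbers2[1] /= 10  # modify the shared element through the view; 20 / 10 = 2.0 is stored as the integer 2


In [None]:
numbers           # expected: array([1, 2, 3, 4, 5]), because the original shares the view's data


In [None]:
numbers2          # expected: array([1, 2, 3, 4, 5])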

## Slice Views
- Slices also create views

In [25]:
numbers2 = numbers[0:3]
numbers2

array([ 1, 20,  3])

In [26]:
id(numbers)

1880940299152

In [27]:
id(numbers2)

1880940299248

In [None]:
# numbers2 views only the first three elements of numbers,
# so index 3 is out of range and this raises an IndexError
numbers2[3]


- Modify an element both arrays share to show that both are updated

In [28]:
numbers[1] *= 20
numbers


array([  1, 400,   3,   4,   5])

In [29]:
numbers2


array([  1, 400,   3])
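You can also check whether arrays share data without modifying them. The sketch below uses an array's base attribute, which refers to the array that owns the data, and np.shares_memory. Neither is covered above. It also shows that indexing with a list produces an independent copy, not a view.

```python
import numpy as np

numbers = np.arange(1, 6)
view = numbers.view()
sliced = numbers[0:3]          # basic slicing gives a view
fancy = numbers[[0, 1, 2]]     # indexing with a list gives an independent copy

numbers[1] = 99                # visible through view and sliced, but not fancy
```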

-- -------------- 9/30/25 --------------------

# 7.12 Deep Copies
- When sharing mutable values, sometimes it’s necessary to create a deep copy of the original data
- Especially important in multi-core programming, where separate parts of your program could attempt to modify your data at the same time, possibly corrupting it
- array method copy returns a new array object with an independent copy of the original array's data

In [3]:
import numpy as np

numbers = np.arange(1,6)
numbers

array([1, 2, 3, 4, 5])

In [4]:
numbers2 = numbers.copy()

In [5]:
numbers2

array([1, 2, 3, 4, 5])

In [6]:
numbers[1] = 10
numbers

array([ 1, 10,  3,  4,  5])

In [7]:
numbers2

array([1, 2, 3, 4, 5])
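The sketch below contrasts plain assignment (no copy at all), view and copy. It also covers one caveat from NumPy's documentation: for arrays of dtype object, copy duplicates the references but not the objects they refer to, so the standard library's copy.deepcopy is needed to copy the contained objects too.

```python
import copy

import numpy as np

a = np.arange(1, 6)
alias = a              # plain assignment: no copy, same object
shallow = a.view()     # new array object, shared data
deep = a.copy()        # new array object, independent data
a[1] = 10

# For dtype=object arrays, copy() duplicates references, not the referenced objects
lists = np.empty(2, dtype=object)
lists[0], lists[1] = [1, 2], [3]
ref_copy = lists.copy()
full_copy = copy.deepcopy(lists)
ref_copy[0].append(99)   # mutates the same list object that lists[0] refers to
```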

# 7.13 Reshaping and Transposing
- reshape vs. resize
- Method reshape returns a view (shallow copy) of the original array with new dimensions when possible; otherwise, it returns a copy
- Does not modify the original array

In [8]:
import numpy as np
grades = np.array([[87,96,70],[100,87,90]])
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [9]:
grades.reshape(1,6)

array([[ 87,  96,  70, 100,  87,  90]])

In [10]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

- Method resize modifies the original array’s shape

In [11]:
grades.resize(1,6)

In [12]:
grades

array([[ 87,  96,  70, 100,  87,  90]])
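NumPy also has an np.resize function, which differs from the resize method. The function returns a new array and leaves the original unchanged. When the new shape needs more elements, it repeats the original data from the beginning, whereas the method fills the new entries with zeros. The sketch below also confirms that reshape's result shares memory with the original here.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

reshaped = grades.reshape(1, 6)          # a view: shares grades' data
shares = np.shares_memory(grades, reshaped)

# The np.resize function repeats the data to fill the larger shape
grown = np.resize(grades, (3, 3))
```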

### flatten vs. ravel
- Can flatten a multidimensional array into a single dimension with methods flatten and ravel
- flatten deep copies the original array’s data

In [14]:
grades = np.array([[87,96,70],[100,87,90]])

In [15]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [16]:
flattened = grades.flatten()
flattened

array([ 87,  96,  70, 100,  87,  90])

In [17]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [18]:
flattened[0] = 100
flattened

array([100,  96,  70, 100,  87,  90])

In [19]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

- Method ravel produces a view of the original array (when the array's memory layout allows it), which shares the grades array's data

In [21]:
raveled = grades.ravel()
raveled

array([ 87,  96,  70, 100,  87,  90])

In [22]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [23]:
raveled[0] = 100
raveled

array([100,  96,  70, 100,  87,  90])

In [24]:
grades

array([[100,  96,  70],
       [100,  87,  90]])
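ravel returns a view only when the elements are already laid out row by row in memory. A transposed array is not stored that way, so ravel must copy. The sketch below checks all three cases with np.shares_memory.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

flattened = grades.flatten()        # always a copy
raveled = grades.ravel()            # a view: grades is stored row by row
from_transpose = grades.T.ravel()   # grades.T is not row-major contiguous, so ravel copies
```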

### Transposing Rows and Columns
- Can quickly transpose an array’s rows and columns
- “flips” the array, so the rows become the columns and the columns become the rows
- T attribute returns a transposed view (shallow copy) of the array

In [26]:
grades.T

array([[100, 100],
       [ 96,  87],
       [ 70,  90]])

In [25]:
grades

array([[100,  96,  70],
       [100,  87,  90]])

### Horizontal and Vertical Stacking
- Can combine arrays by adding more columns or more rows—known as horizontal stacking and vertical stacking

In [28]:
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

- Combine grades and grades2 with NumPy’s hstack (horizontal stack) function by passing a tuple containing the arrays to combine
- The extra parentheses are required because hstack expects a single argument: a sequence of the arrays to stack
- Adds more columns

In [29]:
np.hstack((grades, grades2))

array([[100,  96,  70,  94,  77,  90],
       [100,  87,  90, 100,  81,  82]])

- Combine grades and grades2 with NumPy’s vstack (vertical stack) function
- Adds more rows

In [30]:
np.vstack((grades, grades2))

array([[100,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
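For 2D arrays, hstack and vstack match np.concatenate along axis 1 and axis 0, respectively. The sketch below checks that equivalence. It also shows that stacking arrays whose non-joined dimensions differ raises a ValueError.

```python
import numpy as np

grades = np.array([[100, 96, 70], [100, 87, 90]])
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

side_by_side = np.concatenate((grades, grades2), axis=1)   # like hstack for 2D arrays
stacked = np.concatenate((grades, grades2), axis=0)        # like vstack

try:
    np.vstack((grades, np.array([[1, 2]])))   # 3 columns vs. 2 columns
except ValueError as err:
    message = str(err)
```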

-- ----------------- 10/1/25 ------------
## 7.14.1 pandas **Series**
- An enhanced one-dimensional array
- Supports custom indexing, including even non-integer indices like strings
- Offers additional capabilities that make them more convenient for many data-science oriented tasks
  - Series may have missing data
  - Many Series operations ignore missing data by default
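The sketch below shows what "ignore missing data" means in practice, using a hypothetical Series with one missing value. count excludes the missing value, mean skips it unless skipna=False, and the NaN forces a floating-point dtype.

```python
import numpy as np
import pandas as pd

scores = pd.Series([87, np.nan, 94])   # hypothetical data with one missing value

print(scores.dtype)                # NaN is a float, so the Series is float64
print(scores.count())              # count excludes missing values
print(scores.mean())               # skipna=True by default
print(scores.mean(skipna=False))   # nan
```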

### Creating a Series with Default Indices
- By default, a Series has integer indices numbered sequentially from 0

In [5]:
import pandas as pd
grades = pd.Series([87,100,94])

### Creating a Series with All Elements Having the Same Value
- Second argument is a one-dimensional iterable object (such as a list, an array or a range) containing the Series’ indices
- Number of indices determines the number of elements

In [3]:
pd.Series(98.6, range(3))

0    98.6
1    98.6
2    98.6
dtype: float64

### Accessing a Series’ Elements

In [4]:
grades[0]

np.int64(87)

### Producing Descriptive Statistics for a Series
- Series provides many methods for common tasks including producing various descriptive statistics
- Each of these is a functional-style reduction

In [6]:
grades.count()

np.int64(3)

In [7]:
grades.mean()

np.float64(93.66666666666667)

In [8]:
grades.min()

87

In [9]:
grades.max()

100

In [10]:
grades.std()

6.506407098647712
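pandas' std and NumPy's std use different defaults. pandas divides by n - 1 (sample standard deviation, ddof=1), while NumPy divides by n (population standard deviation, ddof=0). This is why NumPy's grades.std() earlier in these notes uses a different formula. The sketch below shows that the ddof argument makes them agree.

```python
import numpy as np
import pandas as pd

grades = pd.Series([87, 100, 94])

sample_std = grades.std()                    # pandas default: ddof=1
population_std = np.std(grades.to_numpy())   # NumPy default: ddof=0

print(sample_std, population_std)
print(grades.std(ddof=0))                    # matches NumPy's default
```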

- Series method **describe** produces all these stats and more
- The 25%, 50% and 75% rows are quartiles:
  - 50% is the median of the sorted values.
  - 25% and 75% are the values below which one quarter and three quarters of the sorted data fall.
- pandas computes quartiles by linear interpolation between neighboring sorted values. Here, 25% is halfway between 87 and 94 (90.5), and 75% is halfway between 94 and 100 (97). For other data sets, this can differ from the "median of each half" rule of thumb.

In [11]:
grades.describe()

count      3.000000
mean      93.666667
std        6.506407
min       87.000000
25%       90.500000
50%       94.000000
75%       97.000000
max      100.000000
dtype: float64
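With an even number of values, the interpolation rule and the "median of each half" rule give different answers. The sketch below uses hypothetical data to show this. pandas' quantile and NumPy's percentile both default to linear interpolation, and quantile's interpolation argument can select other methods such as "midpoint".

```python
import numpy as np
import pandas as pd

values = pd.Series([1, 2, 3, 4])   # hypothetical data with an even count

linear = values.quantile(0.25)                              # default: linear interpolation
midpoint = values.quantile(0.25, interpolation="midpoint")  # halfway between neighbors
numpy_linear = np.percentile(values.to_numpy(), 25)         # NumPy's default matches pandas
lower_half_median = np.median([1, 2])                       # the "median of each half" rule

print(linear, midpoint, numpy_linear, lower_half_median)
```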

### Creating a Series with Custom Indices
- Can specify custom indices with the index keyword argument

In [12]:
grades = pd.Series([87,100,94], index=['Wally', 'Eva', 'Sam'])

In [13]:
grades

Wally     87
Eva      100
Sam       94
dtype: int64

### Accessing Elements of a Series Via Custom Indices
- Can access individual elements via square brackets containing a custom index value

In [14]:
grades['Eva']

np.int64(100)

- If custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the Series as attributes

In [15]:
grades.Wally


np.int64(87)

- dtype attribute returns the underlying array’s element type

In [16]:
grades.dtype

dtype('int64')

- values attribute returns the underlying array

In [17]:
grades.values

array([ 87, 100,  94])

### Creating a Series of Strings
- In a Series of strings, you can use str attribute to call string methods on the elements

In [18]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

In [19]:
hardware

0    Hammer
1       Saw
2    Wrench
dtype: object

In [20]:
hardware.str.contains('a')

0     True
1     True
2    False
dtype: bool

In [71]:
hardware.str.upper()

0    HAMMER
1       SAW
2    WRENCH
dtype: object
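The str accessor offers many other vectorized string methods. Its Boolean results can also filter the Series itself. Here is a small sketch using str.len and the case keyword of str.contains:

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

lengths = hardware.str.len()                     # characters in each string
has_w = hardware.str.contains('w', case=False)   # case-insensitive search
with_a = hardware[hardware.str.contains('a')]    # Boolean Series used as a filter

print(lengths, has_w, with_a, sep="\n")
```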

--------- 10 / 6/ 25 --------------

# 7.14.2 DataFrames
- Enhanced two-dimensional array
- Can have custom row and column indices
- Offers additional operations and capabilities that make them more convenient for many data-science oriented tasks
- Support missing data
- Each column in a DataFrame is a Series

### Creating a DataFrame from a Dictionary
- Create a DataFrame from a dictionary that represents student grades on three exams

In [1]:
import pandas as pd
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

In [2]:
grades = pd.DataFrame(grades_dict)

In [3]:
grades

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| 0 | 87 | 100 | 94 | 100 | 83 |
| 1 | 96 | 87 | 77 | 81 | 65 |
| 2 | 70 | 90 | 90 | 82 | 85 |


### Customizing a DataFrame’s Indices with the index Attribute
- Can use the index attribute to change the DataFrame’s indices from sequential integers to labels
- Must provide a one-dimensional collection that has the same number of elements as there are rows in the DataFrame

In [81]:
grades.index = ['Test1', 'Test2', 'Test3']

In [75]:
grades

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
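The row labels can also be supplied when the DataFrame is created, through the constructor's index argument. The sketch below does that and shows that assigning an index of the wrong length raises a ValueError.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

# index= sets the row labels at construction time
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

try:
    grades.index = ['Test1', 'Test2']   # 2 labels for 3 rows
except ValueError as err:
    message = str(err)
```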


### Accessing a DataFrame’s Columns
- Can quickly and conveniently look at your data in many different ways, including selecting portions of the data
- Get Eva’s grades by name
- Displays her column as a Series

In [76]:
grades['Eva']


Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64

- If a DataFrame’s column-name strings are valid Python identifiers, you can use them as attributes

In [77]:
grades.Sam

Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

### Selecting Rows via the loc and iloc Attributes
- DataFrames support indexing capabilities with [], but pandas documentation recommends using the attributes loc, iloc, at and iat
- Optimized to access DataFrames and also provide additional capabilities
- Access a row by its label via the DataFrame’s loc attribute

In [78]:
grades.loc['Test1']

Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

- Access a row by its zero-based integer position via the iloc attribute

In [80]:
grades.iloc[1]

Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64
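loc works only with labels and iloc only with positions, so mixing them up raises an error rather than guessing. Here is a short sketch; negative iloc positions count from the end, as with lists.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test2']      # label-based
by_position = grades.iloc[1]        # position-based: the second row
last_row = grades.iloc[-1]          # negative positions count from the end

loc_failed = False
try:
    grades.loc[1]                   # loc looks for a row labeled 1, which does not exist
except KeyError:
    loc_failed = True
```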

### Selecting Rows via Slices and Lists with the loc and iloc Attributes
- Index can be a slice
  - Slicing extracts a range of elements from a larger data structure, such as a list, string or array, specified by a start, a stop and an optional step.
- When using slices containing labels with loc, the range specified includes the high index ('Test3'):

In [12]:
grades.loc['Test1':'Test3']

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


- When using slices containing integer indices with iloc, the range you specify excludes the high index (2):

In [13]:
grades.iloc[0:2]

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |


- Select specific rows with a list

In [14]:
grades.loc[['Test1', 'Test3']]

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


In [15]:
grades.iloc[[0, 2]]

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |


### Selecting Subsets of the Rows and Columns
- View only Eva’s and Katie’s grades on Test1 and Test2

In [16]:
grades.loc['Test1':'Test2', ['Eva', 'Katie']]

|   | Eva | Katie |
|---|---|---|
| Test1 | 100 | 100 |
| Test2 | 87 | 81 |


In [84]:
grades.loc['Test3',['Bob']]

Bob    85
Name: Test3, dtype: int64

- Use iloc with a list and a slice to select the first and third tests and the first three columns for those tests


In [87]:
grades.iloc[[0, 2], 0:3]

|   | Wally | Eva | Sam |
|---|---|---|---|
| Test1 | 87 | 100 | 94 |
| Test3 | 70 | 90 | 90 |


### Boolean Indexing
- One of pandas’ more powerful selection capabilities is Boolean indexing
- Select all the A grades—that is, those that are greater than or equal to 90:
- Pandas checks every grade to determine whether its value is greater than or equal to 90 and, if so, includes it in the new DataFrame.
- Grades for which the condition is False are represented as NaN (not a number) in the new DataFrame
- NaN is pandas’ notation for missing values

In [23]:
grades[grades >= 90]


|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | NaN | 100.0 | 94.0 | 100.0 | NaN |
| Test2 | 96.0 | NaN | NaN | NaN | NaN |
| Test3 | NaN | 90.0 | 90.0 | NaN | NaN |


- Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise AND), not the and Boolean operator
- For or conditions, use | (bitwise OR)
- Parenthesize each condition, because & and | have higher precedence than comparison operators such as >= and <
- NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional array containing only the values that satisfy the condition

In [88]:
grades[(grades >= 80) & (grades < 90)]

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87.0 | NaN | NaN | NaN | 83.0 |
| Test2 | NaN | 87.0 | NaN | 81.0 | NaN |
| Test3 | NaN | NaN | NaN | 82.0 | 85.0 |
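The sketch below compares pandas and NumPy Boolean indexing on the same grades. pandas keeps the DataFrame's shape and fills non-matches with NaN, just like DataFrame.where (a method not covered above). NumPy returns a 1D array of the matching values in row-major order.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

a_grades = grades[grades >= 90]                       # same shape; NaN where False
same_thing = grades.where(grades >= 90)               # equivalent method form
values = grades.to_numpy()
a_values = values[values >= 90]                       # NumPy: 1D array of matches
b_grades = grades[(grades >= 80) & (grades < 90)]     # parentheses are required
```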


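### Accessing a Specific DataFrame Cell by Row and Column
- DataFrame attributes at and iat get a single value from a DataFrame: at uses row and column labels, and iat uses integer positions
- Both can also assign a new value to a cell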

In [103]:
grades.at['Test2', 'Eva']


np.int64(87)

In [104]:
grades.iat[2, 0]


np.int64(70)

In [105]:
grades.at['Test2', 'Eva'] = 100

In [106]:
grades.at['Test2', 'Eva']


np.int64(100)

In [107]:
grades.iat[1, 2] = 87


In [108]:
grades.iat[1, 2]

np.int64(87)

### Descriptive Statistics
- DataFrame method describe calculates basic descriptive statistics for the data and returns them as a DataFrame
- Statistics are calculated by column

In [109]:
grades.describe()

|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 84.333333 | 96.666667 | 90.333333 | 87.666667 | 77.666667 |
| std | 13.203535 | 5.773503 | 3.511885 | 10.692677 | 11.015141 |
| min | 70.0 | 90.0 | 87.0 | 81.0 | 65.0 |
| 25% | 78.5 | 95.0 | 88.5 | 81.5 | 74.0 |
| 50% | 87.0 | 100.0 | 90.0 | 82.0 | 83.0 |
| 75% | 91.5 | 100.0 | 92.0 | 91.0 | 84.0 |
| max | 96.0 | 100.0 | 94.0 | 100.0 | 85.0 |


### Quick way to summarize your data
- Nicely demonstrates the power of array-oriented programming with a clean, concise functional-style call
- Can control the precision and other default settings with pandas’ set_option function

In [114]:
pd.set_option('display.precision', 2)
# show floating-point values in DataFrames and Series with 2 digits after the decimal point;
# this changes only how values are displayed, not the stored data
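pd.set_option changes the setting for the rest of the session. To change it for only one block of code, pandas provides the pd.option_context context manager. The sketch below assumes the option has not been changed elsewhere.

```python
import pandas as pd

default = pd.get_option('display.precision')   # 6 unless changed earlier

with pd.option_context('display.precision', 2):
    inside = pd.get_option('display.precision')
    print(pd.Series([84.333333, 96.666667]))   # displayed with 2 decimal places

after = pd.get_option('display.precision')     # restored when the with block ends
```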

In [115]:
grades.describe()

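|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 | 3.0 | 3.0 |
| mean | 84.33 | 96.67 | 90.33 | 87.67 | 77.67 |
| std | 13.2 | 5.77 | 3.51 | 10.69 | 11.02 |
| min | 70.0 | 90.0 | 87.0 | 81.0 | 65.0 |
| 25% | 78.5 | 95.0 | 88.5 | 81.5 | 74.0 |
| 50% | 87.0 | 100.0 | 90.0 | 82.0 | 83.0 |
| 75% | 91.5 | 100.0 | 92.0 | 91.0 | 84.0 |
| max | 96.0 | 100.0 | 94.0 | 100.0 | 85.0 |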


- For student grades, the most important of these statistics is probably the mean
- Can calculate that for each student simply by calling mean on the DataFrame

In [119]:
grades.mean()

Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.67
dtype: float64

### Transposing the DataFrame with the T Attribute
- Can quickly transpose rows and columns—so the rows become the columns, and the columns become the rows—by using the T attribute to get a view

In [121]:
grades.T

|   | Test1 | Test2 | Test3 |
|---|---|---|---|
| Wally | 87 | 96 | 70 |
| Eva | 100 | 100 | 90 |
| Sam | 94 | 87 | 90 |
| Katie | 100 | 81 | 82 |
| Bob | 83 | 65 | 85 |


- Assume that rather than getting the summary statistics by student, you want to get them by test
- Call describe on grades.T

In [123]:
grades.T.describe()  # summary statistics for each test, since each test is a column of the transposed DataFrame


|   | Test1 | Test2 | Test3 |
|---|---|---|---|
| count | 5.0 | 5.0 | 5.0 |
| mean | 92.8 | 85.8 | 83.4 |
| std | 7.66 | 13.81 | 8.23 |
| min | 83.0 | 65.0 | 70.0 |
| 25% | 87.0 | 81.0 | 82.0 |
| 50% | 94.0 | 87.0 | 85.0 |
| 75% | 100.0 | 96.0 | 90.0 |
| max | 100.0 | 100.0 | 90.0 |


In [124]:
grades.T.mean()


Test1    92.8
Test2    85.8
Test3    83.4
dtype: float64
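Transposing is not the only way to get per-test results. Most DataFrame reductions accept an axis argument, and axis=1 computes across each row. The sketch below assumes the grades after the at and iat updates above, with Eva's and Sam's Test2 grades set to 100 and 87.

```python
import numpy as np
import pandas as pd

# grades after the at/iat updates (Eva's Test2 = 100, Sam's Test2 = 87)
grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

by_test = grades.mean(axis=1)     # average across each row: no transpose needed
via_transpose = grades.T.mean()   # the approach used above
```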

### Sorting Rows by Their Indices
- Can sort a DataFrame by its rows or columns, based on their indices or values
- Sort the rows by their indices in descending order using sort_index and its keyword argument ascending=False

In [125]:
grades.sort_index(ascending=False)


|   | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test3 | 70 | 90 | 90 | 82 | 85 |
| Test2 | 96 | 100 | 87 | 81 | 65 |
| Test1 | 87 | 100 | 94 | 100 | 83 |


### Sorting by Column Indices
- Sort columns into ascending order (left-to-right) by their column names
- axis=1 keyword argument indicates that we wish to sort the column indices, rather than the row indices
- axis=0 (the default) sorts the row indices

In [127]:
grades.sort_index(axis=1)


|   | Bob | Eva | Katie | Sam | Wally |
|---|---|---|---|---|---|
| Test1 | 83 | 100 | 100 | 94 | 87 |
| Test2 | 65 | 100 | 81 | 87 | 96 |
| Test3 | 85 | 90 | 82 | 90 | 70 |


### Sorting by Column Values
- To view Test1’s grades in descending order so we can see the students’ names in highest-to-lowest grade order, call method sort_values 
- by and axis arguments work together to determine which values will be sorted
- Here, by='Test1' with axis=1 reorders the columns according to the values in the row labeled 'Test1'

In [130]:
grades.sort_values(by='Test1', axis=1, ascending=False)


|   | Eva | Katie | Sam | Wally | Bob |
|---|---|---|---|---|---|
| Test1 | 100 | 100 | 94 | 87 | 83 |
| Test2 | 100 | 81 | 87 | 96 | 65 |
| Test3 | 90 | 82 | 90 | 70 | 85 |


- Might be easier to read the grades and names if they were in a column
- Sort the transposed DataFrame instead

In [131]:
grades.T.sort_values(by='Test1', ascending=False)


|   | Test1 | Test2 | Test3 |
|---|---|---|---|
| Eva | 100 | 100 | 90 |
| Katie | 100 | 81 | 82 |
| Sam | 94 | 87 | 90 |
| Wally | 87 | 96 | 70 |
| Bob | 83 | 65 | 85 |


- Since we’re sorting only Test1’s grades, we might not want to see the other tests at all
- Combine selection with sorting

In [132]:
grades.loc['Test1'].sort_values(ascending=False)


Eva      100
Katie    100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64
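Two related Series tools are worth knowing. nlargest returns the top values directly. Passing kind='stable' to sort_values guarantees that tied values, such as Eva's and Katie's 100s, keep their original relative order. Both are beyond what these notes cover, so treat this as a sketch.

```python
import pandas as pd

test1 = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94, 'Katie': 100, 'Bob': 83},
                  name='Test1')

top3 = test1.nlargest(3)                                      # the three highest grades
ranked = test1.sort_values(ascending=False, kind='stable')    # ties keep original order

print(top3, ranked, sep="\n")
```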

### Copy vs. In-Place Sorting
- sort_index and sort_values return a copy of the original DataFrame
- Could require substantial memory in a big data application
- Can sort in place by passing the keyword argument inplace=True
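The sketch below contrasts the two styles. Without inplace, sort_index returns a new sorted DataFrame and leaves the original alone. With inplace=True, it reorders the original and returns None, so assigning its result to a variable is a common mistake.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

sorted_copy = grades.sort_index(ascending=False)   # new DataFrame; grades unchanged
before = list(grades.index)

result = grades.sort_index(ascending=False, inplace=True)   # reorders grades itself
```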