# Chapter 7: Array-Oriented Programming with NumPy

[NumPy documentation here](https://numpy.org/doc/stable/index.html)

### 7.2 Creating arrays from existing data

In [2]:
import numpy as np

numbers = np.array([2, 3, 5, 7, 11])
print(numbers)

type(numbers)

[ 2  3  5  7 11]


numpy.ndarray

Multidimensional arguments

In [3]:
np.array([[1, 2, 3], [4, 5, 6]])

array([[1, 2, 3],
       [4, 5, 6]])

### 7.2 Self Check

In [7]:
numbers1 = np.array([item for item in range(2, 21, 2)])
print(numbers1)

[ 2  4  6  8 10 12 14 16 18 20]


In [9]:
numbers2 = np.array([[item for item in range(2, 11, 2)], [item for item in range(1, 10, 2)]])
print(numbers2)

[[ 2  4  6  8 10]
 [ 1  3  5  7  9]]


### 7.3 array attributes

In [13]:
import numpy as np

integers = np.array([[1, 2, 3], [4, 5, 6]])
print(integers)

floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
print(floats)

[[1 2 3]
 [4 5 6]]
[0.  0.1 0.2 0.3 0.4]


**Determining an array's element type**

In [17]:
integers.dtype

dtype('int64')

In [18]:
floats.dtype

dtype('float64')

**Determining an array's dimensions**

In [19]:
integers.ndim

2

In [20]:
floats.ndim

1

In [21]:
integers.shape

(2, 3)

In [22]:
floats.shape

(5,)

**Determining an array's number of elements and element size**
- size: total number of elements
- itemsize: number of bytes required to store each element

In [23]:
integers.size

6

In [24]:
integers.itemsize

8

In [25]:
floats.size

5

In [26]:
floats.itemsize

8
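
As a quick check of how `size` and `itemsize` relate, the sketch below multiplies them and compares the product with the array's `nbytes` attribute. The explicit `dtype=np.int64` is an assumption added here so that each element takes 8 bytes on every platform.

```python
import numpy as np

# dtype is fixed explicitly (an assumption) so itemsize is 8 on every platform
integers = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

# bytes used by the elements = number of elements * bytes per element
total_bytes = integers.size * integers.itemsize

print(total_bytes)      # 48
print(integers.nbytes)  # 48
```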

**Iterating through a multidimensional array's elements**

In [29]:
for row in integers:
    for column in row:
        print(column, end=' ')
    print()

1 2 3 
4 5 6 


**You can iterate through a multidimensional array as if it were one-dimensional by using its flat attribute**

In [30]:
for i in integers.flat:
    print(i, end=' ')

1 2 3 4 5 6 

### 7.3 Self Check

In [32]:
print(numbers2.ndim)
print(numbers2.shape)

2
(2, 5)


### 7.4 Filling arrays with specific values

In [34]:
import numpy as np

np.zeros(5)

array([0., 0., 0., 0., 0.])

In [38]:
np.ones((2, 4), dtype=float) #Specify the array's element type with the dtype argument

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [37]:
np.full((3, 5), 13)

array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])

### 7.5 Creating arrays from ranges

**Creating integer ranges with arange**

In [40]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [41]:
np.arange(5, 10)

array([5, 6, 7, 8, 9])

In [43]:
np.arange(10, 1, -2)

array([10,  8,  6,  4,  2])

**Creating floating-point ranges with linspace**: create evenly spaced floating-point ranges

In [52]:
np.linspace(0.0, 1.0, num=5)
# first and second arguments: starting and ending values (the ending value is included in the array)
# num: number of evenly spaced values to produce (the default is 50)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [53]:
np.linspace(0.0, 1.0, 20)

array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
       0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
       0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
       0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ])
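
To make the spacing in `linspace` explicit, here is a minimal sketch using two optional parameters that these notes do not otherwise cover: `retstep=True`, which also returns the step between values, and `endpoint=False`, which leaves out the ending value.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive values
values, step = np.linspace(0.0, 1.0, num=5, retstep=True)
print(values)  # [0.   0.25 0.5  0.75 1.  ]
print(step)    # 0.25

# endpoint=False excludes the ending value, as arange does
half_open = np.linspace(0.0, 1.0, num=5, endpoint=False)
print(half_open)  # [0.  0.2 0.4 0.6 0.8]
```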

**Reshaping an array**

In [55]:
np.arange(1, 21).reshape(4, 5) # chained method call

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])

**Displaying large arrays**

In [56]:
np.arange(1, 100001).reshape(4, 25000)

array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])

In [57]:
np.arange(1, 100001).reshape(100, 1000)

array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])

### 7.5 Self Check

In [58]:
np.arange(2, 41, 2).reshape(4, 5)

array([[ 2,  4,  6,  8, 10],
       [12, 14, 16, 18, 20],
       [22, 24, 26, 28, 30],
       [32, 34, 36, 38, 40]])

### 7.6 IPython Magics

- %timeit: to time a statement; with -n3 -r2, each loop executes the statement 3 times and the loop is repeated 2 times
- %load: to read code into IPython from a local file or URL
- %run: to execute a .py file from IPython
- %precision: to change the default floating-point precision for IPython outputs
- %cd: to change directories without having to exit IPython first
- %edit: to launch an external editor
- %history: to view a list of all snippets and commands you've executed in the current IPython session
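
Magics only work inside IPython or Jupyter. As a rough equivalent of `%timeit -n3 -r2` in a plain Python script, the sketch below uses the standard library's `timeit.repeat`. The timed statement (summing a million-element array) is only an illustration.

```python
import timeit

import numpy as np

# number=3 mirrors -n3 (executions per loop); repeat=2 mirrors -r2 (loops)
timings = timeit.repeat(lambda: np.arange(1_000_000).sum(), number=3, repeat=2)

# each entry is the total time in seconds for one loop of 3 executions
print(len(timings))
print(min(timings) / 3)  # best average time per execution
```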

### 7.7 array operators

**Arithmetic operations with arrays and individual numeric values**: each returns a new array containing the result

In [4]:
import numpy as np
numbers = np.arange(1, 6)
numbers * 2

array([ 2,  4,  6,  8, 10])

In [5]:
numbers ** 3

array([  1,   8,  27,  64, 125])

In [6]:
numbers # numbers is unchanged by the arithmetic operations

array([1, 2, 3, 4, 5])

**Augmented assignments modify every element in the left operand**

In [7]:
numbers += 10
print(numbers)

[11 12 13 14 15]


**Broadcasting**: When one operand is a single value, called a scalar, NumPy performs the element-wise calculations as if the scalar were an array of the same shape as the other operand, but with the scalar value in all its elements.

In [8]:
numbers * [2, 2, 2, 2, 2]

array([22, 24, 26, 28, 30])

In [9]:
numbers * 2

array([22, 24, 26, 28, 30])
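
The "as if the scalar were an array" idea can be made visible with `np.broadcast_to`, which presents a value with a given shape without copying it. This is only a sketch of the concept, not a description of how NumPy implements broadcasting internally.

```python
import numpy as np

numbers = np.arange(11, 16)

# broadcast_to presents the scalar as a read-only array with numbers' shape
expanded = np.broadcast_to(2, numbers.shape)
print(expanded)            # [2 2 2 2 2]
print(numbers * expanded)  # [22 24 26 28 30]

# multiplying by the scalar gives the same result
print(np.array_equal(numbers * 2, numbers * expanded))  # True
```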

**Arithmetic operations between arrays**: you may perform arithmetic operations and augmented assignments between arrays of the same shape

In [12]:
numbers2 = np.linspace(1.1, 5.5, 5)
print(numbers2)

numbers * numbers2

[1.1 2.2 3.3 4.4 5.5]


array([12.1, 26.4, 42.9, 61.6, 82.5])

**Comparing arrays**: comparisons are performed element-wise

In [13]:
numbers

array([11, 12, 13, 14, 15])

In [14]:
numbers >= 13

array([False, False,  True,  True,  True])

In [15]:
numbers2

array([1.1, 2.2, 3.3, 4.4, 5.5])

In [16]:
numbers2 < numbers

array([ True,  True,  True,  True,  True])

In [17]:
numbers == numbers2

array([False, False, False, False, False])

In [18]:
numbers == numbers

array([ True,  True,  True,  True,  True])

### 7.7 Self Check

In [1]:
import numpy as np

numbers5 = np.array(range(1,6))

numbers6 = numbers5 ** 2

print(numbers6)

[ 1  4  9 16 25]


### 7.8 NumPy Calculation Methods

By default, these methods ignore the array's shape and use all the elements in the calculations.

In [3]:
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                  [94, 77, 90], [100, 81, 82]])

grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

Each of the following is a functional-style programming reduction:

In [4]:
grades.sum()

1054

In [5]:
grades.min()

70

In [6]:
grades.max()

100

In [7]:
grades.mean()

87.83333333333333

In [8]:
grades.std()

8.792357792739987

In [9]:
grades.var()

77.30555555555556

**Calculations by row or column**

In [19]:
grades.mean(axis=0) # Specifying axis=0 performs the calculation on all the row values within each column.
# This calculates the mean test grade for each test

array([95.25, 85.25, 83.  ])

In [20]:
grades.mean(axis=1) # Specifying axis=1 performs the calculation on all the column values within each individual row.
# This calculates the mean test grade for each student.

array([84.33333333, 92.33333333, 87.        , 87.66666667])
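
One way to keep the two axes straight is to spell the calculation out with explicit loops. The sketch below (the names `by_test` and `by_student` are illustrative) rebuilds both results by hand and compares them with the `axis` versions.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# axis=0: one result per column (per test)
by_test = np.array([grades[:, col].mean() for col in range(grades.shape[1])])

# axis=1: one result per row (per student)
by_student = np.array([row.mean() for row in grades])

print(np.allclose(by_test, grades.mean(axis=0)))     # True
print(np.allclose(by_student, grades.mean(axis=1)))  # True
```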

### 7.8 Self Check

In [18]:
import numpy as np

df1 = np.random.randint(60, 100, 12).reshape(3, 4)

print(df1)

print(df1.mean()) # average of all numbers

print(df1.mean(axis=0)) # column average

print(df1.mean(axis=1)) # row average

[[74 84 72 70]
 [99 77 88 63]
 [94 91 62 60]]
77.83333333333333
[89.         84.         74.         64.33333333]
[75.   81.75 76.75]


### 7.9 Universal Functions

In [21]:
import numpy as np

numbers = np.array([1, 4, 9, 16, 25, 36])

np.sqrt(numbers)

array([1., 2., 3., 4., 5., 6.])

In [23]:
numbers2 = np.arange(1, 7) * 10
numbers2

array([10, 20, 30, 40, 50, 60])

In [24]:
np.add(numbers, numbers2)

array([11, 24, 39, 56, 75, 96])

**Broadcasting with universal functions**

In [25]:
np.multiply(numbers2, 5)

array([ 50, 100, 150, 200, 250, 300])

In [26]:
numbers3 = numbers2.reshape(2, 3)
numbers3

array([[10, 20, 30],
       [40, 50, 60]])

In [28]:
numbers4 = np.array([2, 4, 6])
np.multiply(numbers3, numbers4)

array([[ 20,  80, 180],
       [ 80, 200, 360]])

If a universal function receives two arrays of different shapes that do not support broadcasting, a ValueError occurs.
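
A small sketch of that failure case: a `(2, 3)` array cannot be broadcast with a two-element array because the trailing dimensions (3 and 2) differ. The `mismatched` array is a hypothetical operand chosen to trigger the error.

```python
import numpy as np

numbers3 = np.arange(10, 70, 10).reshape(2, 3)  # shape (2, 3)
mismatched = np.array([2, 4])                   # shape (2,), hypothetical operand

try:
    np.multiply(numbers3, mismatched)
    message = None
except ValueError as error:
    message = str(error)

print(message)
```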

**Other Universal Functions**

[List of universal functions here](https://numpy.org/doc/stable/reference/ufuncs.html)

### 7.9 Self Check

In [32]:
import numpy as np

array1 = np.array(range(1, 6))

np.power(array1, 3)



array([  1,   8,  27,  64, 125])

### 7.10 Indexing and Slicing

One-dimensional arrays can be indexed and sliced using the same syntax and techniques used with lists and tuples.

**Indexing with two-dimensional arrays**

In [34]:
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                    [94, 77, 90], [100, 81, 82]])

grades

array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])

In [36]:
grades[0, 1] # row 0, column 1

96

**Selecting a subset of a two-dimensional array's rows**

In [38]:
grades[1] # Selects the second row

array([100,  87,  90])

In [39]:
grades[0:2] # Selects multiple sequential rows

array([[ 87,  96,  70],
       [100,  87,  90]])

In [41]:
grades[[1, 3]] # Selects multiple non-sequential rows (note extra brackets)

array([[100,  87,  90],
       [100,  81,  82]])

**Selecting a subset of a two-dimensional array's columns**

In [43]:
grades[:, 0] # Selects all elements in the first column

array([ 87, 100,  94, 100])

In [46]:
grades[:, 1:3] # Selects all elements in columns 1:3

array([[96, 70],
       [87, 90],
       [77, 90],
       [81, 82]])

In [47]:
grades[:, [0, 2]] # Selects all elements in a list of columns [0, 2]

array([[ 87,  70],
       [100,  90],
       [ 94,  90],
       [100,  82]])

### 7.10 Self Check

In [58]:
import numpy as np

scores = np.array([[1, 2, 3, 4, 5],
                  [6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15]])


In [59]:
scores[:, 1] # Select the second column

array([ 2,  7, 12])

In [60]:
scores[[0, 2]] # Select the first and third row

array([[ 1,  2,  3,  4,  5],
       [11, 12, 13, 14, 15]])

In [61]:
scores[:, 1:4]

array([[ 2,  3,  4],
       [ 7,  8,  9],
       [12, 13, 14]])

### 7.11 Views: Shallow Copies

View objects see the data in other objects. Views are also known as shallow copies. The view method returns a new object with a view of the original array object's data.

In [1]:
import numpy as np

numbers = np.arange(1, 6)
numbers

array([1, 2, 3, 4, 5])

In [2]:
numbers2 = numbers.view()
numbers2

array([1, 2, 3, 4, 5])

In [3]:
print(id(numbers))
print(id(numbers2))

4573752368
4573025456


In [4]:
numbers[1] *= 10
print(numbers)

[ 1 20  3  4  5]


In [5]:
print(numbers2)

[ 1 20  3  4  5]


In [6]:
numbers2[1] /= 10 # Changing a value in the view also changes that value in the original array

In [7]:
print(numbers2)
print(numbers)

[1 2 3 4 5]
[1 2 3 4 5]


**Slice views**

In [8]:
numbers2 = numbers[0:3]

print(numbers2)

[1 2 3]


In [10]:
print(id(numbers))
print(id(numbers2))

4573752368
4936715312


In [11]:
numbers[1] *= 20

In [12]:
print(numbers)
print(numbers2)

[ 1 40  3  4  5]
[ 1 40  3]


### 7.12 Deep Copies

The array method copy returns a new array object with a deep copy of the original array object's data.

In [3]:
import numpy as np

numbers = np.arange(1, 6)

print(numbers)

numbers2 = numbers.copy()
print(numbers2)

[1 2 3 4 5]
[1 2 3 4 5]


In [4]:
numbers[1] *= 10
print(numbers)
print(numbers2)

[ 1 20  3  4  5]
[1 2 3 4 5]
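
Comparing `id` values only shows that two array objects are distinct; it does not show whether they share data. A more direct check, sketched here, is `np.shares_memory`, which reports whether two arrays overlap in memory.

```python
import numpy as np

numbers = np.arange(1, 6)

view = numbers.view()    # shallow copy
sliced = numbers[0:3]    # slices are views too
copied = numbers.copy()  # deep copy

print(np.shares_memory(numbers, view))    # True
print(np.shares_memory(numbers, sliced))  # True
print(np.shares_memory(numbers, copied))  # False
```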


### 7.13 Reshaping and Transposing

**reshape vs resize**
- reshape returns a view of the original array with the new dimensions
- resize modifies the original array's shape

In [5]:
import numpy as np
grades = np.array([[87, 96, 70], [100, 87, 90]])

grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [8]:
grades.reshape(1, 6)

array([[ 87,  96,  70, 100,  87,  90]])

In [9]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [10]:
grades.resize(1, 6)
grades

array([[ 87,  96,  70, 100,  87,  90]])

**flatten vs. ravel**
- flatten creates a deep copy of the original array's data
- ravel produces a view of the original array, which shares the original array's data

In [11]:
grades = np.array([[87, 96, 70], [100, 87, 90]])

grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [12]:
flattened = grades.flatten()

flattened

array([ 87,  96,  70, 100,  87,  90])

In [13]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [15]:
flattened[0] = 100

flattened

array([100,  96,  70, 100,  87,  90])

In [16]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [20]:
raveled = grades.ravel()

raveled

array([ 87,  96,  70, 100,  87,  90])

In [18]:
grades

array([[ 87,  96,  70],
       [100,  87,  90]])

In [19]:
raveled[0] = 100

raveled

array([100,  96,  70, 100,  87,  90])

In [21]:
grades

array([[100,  96,  70],
       [100,  87,  90]])
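
The difference between `flatten` and `ravel` can also be confirmed without modifying any elements. The sketch below uses `np.shares_memory`; the last line illustrates a caveat that goes beyond these notes: `ravel` falls back to a copy when the elements are not laid out contiguously, as with a transposed array.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

flattened = grades.flatten()
raveled = grades.ravel()

print(np.shares_memory(grades, flattened))  # False
print(np.shares_memory(grades, raveled))    # True

# ravel has to copy when the elements are not contiguous in memory
print(np.shares_memory(grades, grades.T.ravel()))  # False
```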

**Transposing rows and columns**: the T attribute returns a transposed view of the array. 

In [22]:
grades.T

array([[100, 100],
       [ 96,  87],
       [ 70,  90]])

In [23]:
grades

array([[100,  96,  70],
       [100,  87,  90]])

**Horizontal and vertical stacking**

In [24]:
grades2 = np.array([[94, 77, 90], [100, 81, 82]])
np.hstack((grades, grades2))

array([[100,  96,  70,  94,  77,  90],
       [100,  87,  90, 100,  81,  82]])

In [26]:
np.vstack((grades, grades2))

array([[100,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
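
For two-dimensional arrays like these, `hstack` and `vstack` behave like `np.concatenate` along axis 1 and axis 0, respectively. A minimal sketch of that equivalence:

```python
import numpy as np

grades = np.array([[100, 96, 70], [100, 87, 90]])
grades2 = np.array([[94, 77, 90], [100, 81, 82]])

side_by_side = np.concatenate((grades, grades2), axis=1)  # like hstack
stacked = np.concatenate((grades, grades2), axis=0)       # like vstack

print(side_by_side.shape)  # (2, 6)
print(stacked.shape)       # (4, 3)
```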

### 7.13 Self Check

In [32]:
import numpy as np

numbers = np.array([[1, 2, 3],
                    [4, 5, 6]])

numbers2 = np.hstack((numbers, numbers))

numbers3 = np.vstack((numbers2, numbers2))

numbers3

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6],
       [1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])

### 7.14 Intro to Data Science: pandas Series and DataFrames

[Pandas documentation here](https://pandas.pydata.org/docs/)

**pandas Series**
- A Series is an enhanced one-dimensional array
- Series support custom indexing
- Many Series operations ignore missing data by default
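
To illustrate the point about missing data, the sketch below builds a hypothetical Series with one missing value (`np.nan`). `count` and `mean` skip it by default, and `skipna=False` turns that behavior off.

```python
import numpy as np
import pandas as pd

# hypothetical grades with one missing value
grades_with_gap = pd.Series([87, np.nan, 94])

print(grades_with_gap.count())             # 2
print(grades_with_gap.mean())              # 90.5
print(grades_with_gap.mean(skipna=False))  # nan
```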

**Creating a Series with default indices**
- By default, a Series has integer indices numbered sequentially from 0
- The initializer may be a list, tuple, dictionary, array, another Series, or a single value

In [1]:
import pandas as pd
grades = pd.Series([87, 100, 94])

**Displaying a Series**

In [2]:
grades

0     87
1    100
2     94
dtype: int64

**Creating a Series with all elements having the same value**

In [8]:
pd.Series(98.6, range(3)) 
# The second argument is a one-dimensional iterable object containing the Series' indices.

0    98.6
1    98.6
2    98.6
dtype: float64

In [7]:
pd.Series(98.6, range(2, 5)) 

2    98.6
3    98.6
4    98.6
dtype: float64

**Accessing a Series' Elements**

In [9]:
grades[0]

87

**Producing descriptive statistics for a Series**
- Each of these is a functional-style reduction

In [10]:
grades.count()

3

In [11]:
grades.mean()

93.66666666666667

In [12]:
grades.min()

87

In [13]:
grades.max()

100

In [14]:
grades.std()

6.506407098647712

In [15]:
grades.describe()

count      3.000000
mean      93.666667
std        6.506407
min       87.000000
25%       90.500000
50%       94.000000
75%       97.000000
max      100.000000
dtype: float64

**Creating a Series with custom indices**

In [16]:
grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
grades

Wally     87
Eva      100
Sam       94
dtype: int64

**Dictionary initializers**
- If you initialize a Series with a dictionary, its keys become the Series' indices, and its values become the Series' element values.

In [17]:
grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})
grades

Wally     87
Eva      100
Sam       94
dtype: int64

**Accessing elements of a Series via custom indices**

In [18]:
grades['Eva']

100

- If the custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the Series as attributes that you can access via a dot.

In [20]:
grades.Wally

87

- Series also has built-in attributes

In [21]:
grades.dtype

dtype('int64')

In [22]:
grades.values

array([ 87, 100,  94])

**Creating a Series of strings**
- If a Series contains strings, you can use its str attribute to call string methods on the elements.

In [23]:
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

hardware

0    Hammer
1       Saw
2    Wrench
dtype: object

In [24]:
hardware.str.contains('a')

0     True
1     True
2    False
dtype: bool

In [25]:
hardware.str.upper()

0    HAMMER
1       SAW
2    WRENCH
dtype: object

In [26]:
hardware

0    Hammer
1       Saw
2    Wrench
dtype: object

In [28]:
hardware.str.find('a')
# -1 indicates that the substring was not found
# if the substring occurs more than once, the index of the first occurrence is returned

0    1
1    1
2   -1
dtype: int64

[10 Most Useful String functions in Pandas Link](https://www.aboutdatablog.com/post/10-most-useful-string-functions-in-pandas)

### 7.14.1 Self Check

In [39]:
import numpy as np
import pandas as pd

random_numbers1 = np.random.randint(60, 101, 5) # the upper bound is exclusive, so pass 101 to include 100

temperatures = pd.Series(random_numbers1)

print(temperatures)

0    89
1    85
2    87
3    81
4    70
dtype: int64


In [35]:
temperatures.min()

70

In [36]:
temperatures.max()

89

In [37]:
temperatures.mean()

82.4

In [38]:
temperatures.describe()

count     5.000000
mean     82.400000
std       7.536577
min      70.000000
25%      81.000000
50%      85.000000
75%      87.000000
max      89.000000
dtype: float64

### 7.14.2 DataFrames

**DataFrame**
- An enhanced two-dimensional array
- Can have custom row and column indices
- Offer additional operations and capabilities that make them more convenient for data science tasks
- Each column in a DataFrame is a Series
- Different columns can contain different element types

**Creating a DataFrame from a Dictionary**
- Dictionary keys become the column names
- Dictionary values become the element values in the corresponding column
- By default, the row indices are auto-generated integers starting from 0

In [42]:
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 64, 85]}

grades = pd.DataFrame(grades_dict)

grades # Wow

   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   64
2     70   90   90     82   85


**Customizing a DataFrame's indices with the index attribute**

In [45]:
# First option
pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   64
Test3     70   90   90     82   85


In [48]:
# Another option using index attribute
grades.index = ['Test1', 'Test2', 'Test3']
grades

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   64
Test3     70   90   90     82   85


**Accessing a DataFrame's columns**

In [49]:
grades['Eva']

Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64

In [50]:
grades.Sam

Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

**Selecting rows via the loc and iloc attributes**

In [51]:
# Access a row by its label using the loc attribute
grades.loc['Test1']

Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

In [52]:
# Access a row by integer zero-based indices using iloc
grades.iloc[1]

Wally    96
Eva      87
Sam      77
Katie    81
Bob      64
Name: Test2, dtype: int64

**Selecting rows via slices and lists with the loc and iloc attributes**

In [53]:
# Using loc with a slice
grades.loc['Test1':'Test2']

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   64


In [54]:
# Using iloc with a slice
grades.iloc[0:2]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   64


In [55]:
# Using loc with a list
grades.loc[['Test1', 'Test3']]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85


In [56]:
# Using iloc with a list
grades.iloc[[0, 2]]

       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
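
One detail worth checking when slicing: a `loc` slice includes its ending label, while an `iloc` slice excludes its ending position, as list slices do. The sketch below uses a two-column subset of the grades data to keep the example short.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90]},
                      index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test1':'Test2']  # ending label 'Test2' is included
by_position = grades.iloc[0:2]          # ending position 2 is excluded

print(by_label.equals(by_position))  # True
print(len(grades.iloc[0:1]))         # 1
```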


**Boolean indexing**

In [58]:
grades[grades >= 90]
# pandas checks every grade to determine whether its value is greater than or equal to 90; grades that fail the test appear as NaN

       Wally    Eva   Sam  Katie  Bob
Test1    NaN  100.0  94.0  100.0  NaN
Test2   96.0    NaN   NaN    NaN  NaN
Test3    NaN   90.0  90.0    NaN  NaN


In [65]:
grades[(grades >= 80) & (grades < 90)] # parentheses around each condition are required; combine conditions with & (and) or | (or)

       Wally   Eva  Sam  Katie   Bob
Test1   87.0   NaN  NaN    NaN  83.0
Test2    NaN  87.0  NaN   81.0   NaN
Test3    NaN   NaN  NaN   82.0  85.0
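
The comparison itself produces a DataFrame of True/False values, which can be useful on its own. As a sketch (the names `mask` and `a_counts` are illustrative), summing that mask counts the grades of 90 or above per student, and a Boolean Series built from one column selects whole rows.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 64, 85]},
                      index=['Test1', 'Test2', 'Test3'])

mask = grades >= 90    # DataFrame of True/False values
a_counts = mask.sum()  # True counts as 1, so this counts grades >= 90 per column
print(a_counts)

# a Boolean Series selects whole rows: tests on which Eva scored 90 or above
eva_high = grades[grades['Eva'] >= 90]
print(eva_high)
```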


**Accessing a Specific DataFrame Cell by Row and Column**

In [66]:
grades.at['Test2', 'Eva']

87

In [67]:
grades.iat[2, 0]

70

**Assign new values to specific elements**

In [68]:
grades.at['Test2', 'Eva'] = 100

grades.at['Test2', 'Eva']

100

In [69]:
grades.iat[1, 2] = 87

grades.iat[1, 2]

87

**Descriptive statistics**

In [70]:
grades.describe()

           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   96.666667  90.333333   87.666667  77.333333
std    13.203535    5.773503   3.511885   10.692677  11.590226
min    70.000000   90.000000  87.000000   81.000000  64.000000
25%    78.500000   95.000000  88.500000   81.500000  73.500000
50%    87.000000  100.000000  90.000000   82.000000  83.000000
75%    91.500000  100.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000


In [71]:
# Control the precision and other default settings with pandas' set_option function

pd.set_option('display.precision', 2) # the full option name; recent pandas versions reject the short form 'precision'

grades.describe()

       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   96.67  90.33   87.67  77.33
std    13.20    5.77   3.51   10.69  11.59
min    70.00   90.00  87.00   81.00  64.00
25%    78.50   95.00  88.50   81.50  73.50
50%    87.00  100.00  90.00   82.00  83.00
75%    91.50  100.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00
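
`set_option` changes the setting for the rest of the session. If the change should apply only to a few statements, pandas also provides `option_context`; the sketch below shows the precision reverting once the `with` block ends.

```python
import pandas as pd

default_precision = pd.get_option('display.precision')

# the setting applies only inside the with block
with pd.option_context('display.precision', 2):
    inside = pd.get_option('display.precision')

after = pd.get_option('display.precision')

print(inside)                      # 2
print(after == default_precision)  # True
```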


In [73]:
grades.loc[['Test1', 'Test2']].describe()

       Wally     Eva    Sam   Katie    Bob
count   2.00    2.00   2.00    2.00   2.00
mean   91.50  100.00  90.50   90.50  73.50
std     6.36    0.00   4.95   13.44  13.44
min    87.00  100.00  87.00   81.00  64.00
25%    89.25  100.00  88.75   85.75  68.75
50%    91.50  100.00  90.50   90.50  73.50
75%    93.75  100.00  92.25   95.25  78.25
max    96.00  100.00  94.00  100.00  83.00


In [74]:
grades.mean()

Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.33
dtype: float64

**Transposing the DataFrame with the T attribute**

In [75]:
grades.T

       Test1  Test2  Test3
Wally     87     96     70
Eva      100    100     90
Sam       94     87     90
Katie    100     81     82
Bob       83     64     85


In [76]:
grades.T.describe()

        Test1   Test2  Test3
count    5.00    5.00   5.00
mean    92.80   85.60  83.40
std      7.66   14.19   8.23
min     83.00   64.00  70.00
25%     87.00   81.00  82.00
50%     94.00   87.00  85.00
75%    100.00   96.00  90.00
max    100.00  100.00  90.00


In [79]:
grades.T.mean()

Test1    92.8
Test2    85.6
Test3    83.4
dtype: float64
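
Transposing is one route to per-test averages. As with NumPy arrays, the DataFrame `mean` method also accepts an `axis` argument, so `axis=1` should give the same result without transposing. The sketch rebuilds the grades as they stand at this point in the notes (after the two assignments above).

```python
import numpy as np
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 64, 85]},
                      index=['Test1', 'Test2', 'Test3'])

per_test = grades.mean(axis=1)  # one mean per row (per test)
print(per_test)

print(np.allclose(per_test, grades.T.mean()))  # True
```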

**Sorting rows by their indices**

In [80]:
grades.sort_index(ascending=False)

       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96  100   87     81   64
Test1     87  100   94    100   83


**Sorting by Column Indices**

In [82]:
grades.sort_index(axis=1)

       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   64  100     81   87     96
Test3   85   90     82   90     70


**Sorting by column values**

In [85]:
grades.sort_values(by='Test1', axis=1, ascending=False)

       Eva  Katie  Sam  Wally  Bob
Test1  100    100   94     87   83
Test2  100     81   87     96   64
Test3   90     82   90     70   85


In [86]:
grades.T.sort_values(by='Test1', ascending=False)

       Test1  Test2  Test3
Eva      100    100     90
Katie    100     81     82
Sam       94     87     90
Wally     87     96     70
Bob       83     64     85


In [87]:
grades.loc['Test1'].sort_values(ascending=False)

Eva      100
Katie    100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64

**Copy vs. in-place sorting**
- sort_index and sort_values return a copy of the original DataFrame 
- To sort the DataFrame in place rather than copying the data, pass inplace=True

In [93]:
grades.sort_values(by='Test1', axis=1, ascending=False, inplace=True)
grades

       Eva  Katie  Sam  Wally  Bob
Test1  100    100   94     87   83
Test2  100     81   87     96   64
Test3   90     82   90     70   85
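
A short sketch of the copy versus in-place distinction, using `sort_index` on a two-column subset of the grades. Note that with `inplace=True` the method returns `None`, so its result should not be assigned back to the variable.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90]},
                      index=['Test1', 'Test2', 'Test3'])

# default: a sorted copy is returned and the original is left alone
sorted_copy = grades.sort_index(ascending=False)
print(list(sorted_copy.index))  # ['Test3', 'Test2', 'Test1']
print(list(grades.index))       # ['Test1', 'Test2', 'Test3']

# inplace=True: the original is modified and the call returns None
result = grades.sort_index(ascending=False, inplace=True)
print(result)              # None
print(list(grades.index))  # ['Test3', 'Test2', 'Test1']
```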


### 7.14 Self Check

In [103]:
import pandas as pd

temps = {'Mon': [68, 89], 'Tue': [71, 93], 'Wed': [66, 82], 'Thu': [75, 97], 'Fri': [62, 79]}

temperatures = pd.DataFrame(temps, index=['Low', 'High'])

In [107]:
temperatures.loc[:, 'Mon':'Wed']

      Mon  Tue  Wed
Low    68   71   66
High   89   93   82


In [108]:
temperatures.loc['Low']

Mon    68
Tue    71
Wed    66
Thu    75
Fri    62
Name: Low, dtype: int64

In [112]:
pd.set_option('display.precision', 2)

temperatures.mean()

Mon    78.5
Tue    82.0
Wed    74.0
Thu    86.0
Fri    70.5
dtype: float64

In [113]:
temperatures.loc['High'].mean()

88.0

In [114]:
temperatures.loc['Low'].mean()

68.4