<div style="color:white;
           display:fill;
           border-radius:5px;
           background-color:#5642C5;
           font-size:200%;
           font-family:Arial;letter-spacing:0.5px">

<p width = 20%, style="padding: 10px;
              color:white;">
Pandas: Data Structures and Accessing the Data
              
</p>
</div>

Data Science Cohort Live NYC Feb 2022
<p>Phase 1: Topic 4</p>
<br>
<br>

<div align = "right">
<img src="Images/flatiron-school-logo.png" align = "right" width="200"/>
</div>
    
    

In [1]:
# Import libraries
import numpy as np
import pandas as pd # import pandas library

Pandas has two core data structures:
- Series: a 1D labeled array with native support for many data operations that NumPy arrays lack.
- DataFrame: tabular data with a range of table manipulation operations. Individual columns and rows are pandas Series.
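
The relationship between the two structures can be sketched on a tiny table. The frame `toy_df` and its values are hypothetical placeholders, not part of the lesson data.

```python
import pandas as pd

# Hypothetical toy table; the values are placeholders for illustration only.
toy_df = pd.DataFrame(
    {"height_cm": [170, 182], "weight_kg": [65.5, 80.0]},
    index=["a", "b"],
)

column = toy_df["height_cm"]   # one column
row = toy_df.loc["a"]          # one row

print(type(toy_df).__name__)   # DataFrame
print(type(column).__name__)   # Series
print(type(row).__name__)      # Series
print(column.name, row.name)   # height_cm a
```

The Series name carries the column label or the row label it was extracted from.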

#### Pandas Series

We have data on the highest number of cars that a few famous people have owned. 

| Person | Max number of Cars |
| --- | --- | 
| Muammar Qaddafi | 25000 |
| Mohandas Gandhi | 0 |
| Saddam Hussein | 4500 |
| Kevin Bacon | 2 |
| Billy Bob Thornton | 8 |

Let's represent this as a series.

In [2]:
pd.Series([25000,0,4500,2,8],
          index = ['Muammar Qaddafi', 'Mohandas Gandhi', 'Saddam Hussein', 'Kevin Bacon', 'Billy Bob Thornton'], 
          name = 'Max Number Cars Owned')

Muammar Qaddafi       25000
Mohandas Gandhi           0
Saddam Hussein         4500
Kevin Bacon               2
Billy Bob Thornton        8
Name: Max Number Cars Owned, dtype: int64

In [3]:
# A Series can be built more naturally from a dict.
car_dict = {'Muammar Qaddafi': 25000, 'Mohandas Gandhi': 0, 
            'Saddam Hussein': 4500, 'Kevin Bacon': 2, 'Billy Bob Thornton': 8}

car_owner_series = pd.Series(car_dict)
car_owner_series

Muammar Qaddafi       25000
Mohandas Gandhi           0
Saddam Hussein         4500
Kevin Bacon               2
Billy Bob Thornton        8
dtype: int64

Why use Pandas series?

Combines:
- Dictionary style fast lookup.
- Numpy style vectorized operations on the values.
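
The two points above can be sketched together on a shortened version of the car data. The variable names here are illustrative assumptions.

```python
import pandas as pd

cars = pd.Series({"Kevin Bacon": 2, "Billy Bob Thornton": 8, "Mohandas Gandhi": 0})

# Dictionary-style behaviour: key lookup, membership test, default on a missing key.
print(cars["Kevin Bacon"])             # 2
print("Saddam Hussein" in cars)        # False
print(cars.get("Saddam Hussein", -1))  # -1

# NumPy-style behaviour: arithmetic and comparisons apply to every value at once.
doubled = cars * 2
owns_cars = cars > 0
print(cars[owns_cars].index.tolist())  # ['Kevin Bacon', 'Billy Bob Thornton']
```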


In [4]:
# indexed on sensible keys. 
car_owner_series['Billy Bob Thornton']

8

In [5]:
# can slice on these keys
car_owner_series["Mohandas Gandhi"
                 :"Kevin Bacon"]

Mohandas Gandhi       0
Saddam Hussein     4500
Kevin Bacon           2
dtype: int64

In [6]:
#can do fast computation like a numpy array

# A new set of values. Kevin Bacon bought an extra car and Billy Bob bought two more. 
delta_cars = {'Mohandas Gandhi': 0, 'Billy Bob Thornton': 2, 
              'Saddam Hussein': 0, 'Kevin Bacon': 1, 'Muammar Qaddafi': 0}

delta_cars_series = pd.Series(delta_cars)

In [7]:
print(delta_cars_series)

Mohandas Gandhi       0
Billy Bob Thornton    2
Saddam Hussein        0
Kevin Bacon           1
Muammar Qaddafi       0
dtype: int64


In [8]:
print(car_owner_series)

Muammar Qaddafi       25000
Mohandas Gandhi           0
Saddam Hussein         4500
Kevin Bacon               2
Billy Bob Thornton        8
dtype: int64


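We want to update the counts, but the two Series are not in the same order.

No problem for pandas: arithmetic between Series aligns values on their index labels, not on their positions.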

In [9]:
new_car_series = car_owner_series + delta_cars_series
print(new_car_series)

Billy Bob Thornton       10
Kevin Bacon               3
Mohandas Gandhi           0
Muammar Qaddafi       25000
Saddam Hussein         4500
dtype: int64
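
Label alignment also has a consequence worth sketching: a label present in only one Series yields NaN. The `delta` Series below is a hypothetical update that mentions a single owner, and `Series.add` with `fill_value` is one possible way to handle the gap.

```python
import pandas as pd

owned = pd.Series({"Kevin Bacon": 2, "Billy Bob Thornton": 8, "Mohandas Gandhi": 0})
# Hypothetical update that mentions only one of the owners.
delta = pd.Series({"Billy Bob Thornton": 2})

# Plain + aligns on labels; labels missing from either side become NaN.
naive = owned + delta
print(naive)

# Series.add with fill_value treats a missing entry as 0 before adding.
updated = owned.add(delta, fill_value=0).astype(int)
print(updated)
```

Because NaN is a float, the naive sum is upcast to float64, which is why the sketch converts back to integers at the end.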


#### Some important Series attributes

- The Series.index attribute: the index labels (keys), returned as a pandas Index

In [10]:
new_car_series.index

Index(['Billy Bob Thornton', 'Kevin Bacon', 'Mohandas Gandhi',
       'Muammar Qaddafi', 'Saddam Hussein'],
      dtype='object')

- The Series.values attribute: the Series values, returned as a NumPy array

In [11]:
new_car_series.values

array([   10,     3,     0, 25000,  4500], dtype=int64)

- The Series.name attribute: the name of the series

In [13]:
new_car_series.name = 'Max cars owned'
print(new_car_series)

Billy Bob Thornton       10
Kevin Bacon               3
Mohandas Gandhi           0
Muammar Qaddafi       25000
Saddam Hussein         4500
Name: Max cars owned, dtype: int64


In [14]:
new_car_series.name

'Max cars owned'

- The Series.dtype: data type for Series values

In [18]:
new_car_series.dtype

dtype('int64')

Series come with many attached methods.

Example: sorting by max cars in descending order:

In [17]:
new_car_series.sort_values(ascending = False)

Muammar Qaddafi       25000
Saddam Hussein         4500
Billy Bob Thornton       10
Kevin Bacon               3
Mohandas Gandhi           0
Name: Max cars owned, dtype: int64
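
A few more Series methods, sketched on the same values as above. This is a small sample chosen for illustration, not a list of the methods the course goes on to cover.

```python
import pandas as pd

cars = pd.Series(
    {"Billy Bob Thornton": 10, "Kevin Bacon": 3, "Mohandas Gandhi": 0,
     "Muammar Qaddafi": 25000, "Saddam Hussein": 4500},
    name="Max cars owned",
)

# sort_values returns a new Series; the original order is left untouched.
sorted_cars = cars.sort_values(ascending=False)
print(cars.index[0])      # Billy Bob Thornton

print(cars.idxmax())      # label of the largest value: Muammar Qaddafi
print(cars.nlargest(2))   # the two largest values, in descending order
print(cars.sum())         # 29513
```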

Series have:
- native methods for handling time series data
- a whole host of other nice methods.

We will see these later.

#### Pandas DataFrames

We saw these before with the heart disease dataset. A DataFrame is a tabular data structure.

Let's take a new dataset that has data about various breakfast cereals.

In [19]:
cereal_df = pd.read_csv('Data/cereal.csv', index_col = 'name')

Often want a quick view of the first few entries in the table data.

The .head() method:

In [24]:
cereal_df.head(2) # first 2 rows; with no argument, returns the first 5

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |


Less common, take a look at the end:

The .tail() method:

In [25]:
cereal_df.tail()

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Triples | G | C | 110 | 2 | 1 | 250 | 0.0 | 21.0 | 3 | 60 | 25 | 3 | 1.0 | 0.75 | 39.106174 |
| Trix | G | C | 110 | 1 | 1 | 140 | 0.0 | 13.0 | 12 | 25 | 25 | 2 | 1.0 | 1.0 | 27.753301 |
| Wheat Chex | R | C | 100 | 3 | 1 | 230 | 3.0 | 17.0 | 3 | 115 | 25 | 1 | 1.0 | 0.67 | 49.787445 |
| Wheaties | G | C | 100 | 3 | 1 | 200 | 3.0 | 17.0 | 3 | 110 | 25 | 1 | 1.0 | 1.0 | 51.592193 |
| Wheaties Honey Gold | G | C | 110 | 2 | 1 | 200 | 1.0 | 16.0 | 8 | 60 | 25 | 1 | 1.0 | 0.75 | 36.187559 |


Good common practice: 

Start by looking at some metadata and descriptive statistics on the DataFrame.

- .info() method: column data types and non-null counts. Any nulls?
- .describe() method: summary statistics for each numeric column

In [26]:
cereal_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 77 entries, 100% Bran to Wheaties Honey Gold
Data columns (total 15 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   mfr       77 non-null     object 
 1   type      77 non-null     object 
 2   calories  77 non-null     int64  
 3   protein   77 non-null     int64  
 4   fat       77 non-null     int64  
 5   sodium    77 non-null     int64  
 6   fiber     77 non-null     float64
 7   carbo     77 non-null     float64
 8   sugars    77 non-null     int64  
 9   potass    77 non-null     int64  
 10  vitamins  77 non-null     int64  
 11  shelf     77 non-null     int64  
 12  weight    77 non-null     float64
 13  cups      77 non-null     float64
 14  rating    77 non-null     float64
dtypes: float64(5), int64(8), object(2)
memory usage: 9.6+ KB


In [26]:
cereal_df.describe()

|  | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| count | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 | 77.0 |
| mean | 106.883117 | 2.545455 | 1.012987 | 159.675325 | 2.151948 | 14.597403 | 6.922078 | 96.077922 | 28.246753 | 2.207792 | 1.02961 | 0.821039 | 42.665705 |
| std | 19.484119 | 1.09479 | 1.006473 | 83.832295 | 2.383364 | 4.278956 | 4.444885 | 71.286813 | 22.342523 | 0.832524 | 0.150477 | 0.232716 | 14.047289 |
| min | 50.0 | 1.0 | 0.0 | 0.0 | 0.0 | -1.0 | -1.0 | -1.0 | 0.0 | 1.0 | 0.5 | 0.25 | 18.042851 |
| 25% | 100.0 | 2.0 | 0.0 | 130.0 | 1.0 | 12.0 | 3.0 | 40.0 | 25.0 | 1.0 | 1.0 | 0.67 | 33.174094 |
| 50% | 110.0 | 3.0 | 1.0 | 180.0 | 2.0 | 14.0 | 7.0 | 90.0 | 25.0 | 2.0 | 1.0 | 0.75 | 40.400208 |
| 75% | 110.0 | 3.0 | 2.0 | 210.0 | 3.0 | 17.0 | 11.0 | 120.0 | 25.0 | 3.0 | 1.0 | 1.0 | 50.828392 |
| max | 160.0 | 6.0 | 5.0 | 320.0 | 14.0 | 23.0 | 15.0 | 330.0 | 100.0 | 3.0 | 1.5 | 1.5 | 93.704912 |


Important basic DataFrame attributes:

- DataFrame.index: the row labels, as a pandas Index
- DataFrame.columns: the column labels, as a pandas Index
- DataFrame.shape: a (number of rows, number of columns) tuple.


In [27]:
cereal_df.columns

Index(['mfr', 'type', 'calories', 'protein', 'fat', 'sodium', 'fiber', 'carbo',
       'sugars', 'potass', 'vitamins', 'shelf', 'weight', 'cups', 'rating'],
      dtype='object')

In [28]:
cereal_df.index[0:10]

Index(['100% Bran', '100% Natural Bran', 'All-Bran',
       'All-Bran with Extra Fiber', 'Almond Delight',
       'Apple Cinnamon Cheerios', 'Apple Jacks', 'Basic 4', 'Bran Chex',
       'Bran Flakes'],
      dtype='object', name='name')

In [29]:
cereal_df.shape

(77, 15)

#### Accessing data in a DataFrame

Accessing data in a Series by named index is easy. Remember:

In [30]:
new_car_series['Billy Bob Thornton']

10

DataFrames: can access entire **columns** in a similar way. Access the calories column.

In [31]:
cereal_df['calories']

name
100% Bran                     70
100% Natural Bran            120
All-Bran                      70
All-Bran with Extra Fiber     50
Almond Delight               110
                            ... 
Triples                      110
Trix                         110
Wheat Chex                   100
Wheaties                     100
Wheaties Honey Gold          110
Name: calories, Length: 77, dtype: int64

In [35]:
cereal_df.calories # equivalent to cereal_df['calories']

name
100% Bran                     70
100% Natural Bran            120
All-Bran                      70
All-Bran with Extra Fiber     50
Almond Delight               110
                            ... 
Triples                      110
Trix                         110
Wheat Chex                   100
Wheaties                     100
Wheaties Honey Gold          110
Name: calories, Length: 77, dtype: int64

Wait a minute... this is returning a Series with name "calories"!

Individual columns and rows of a DataFrame are extracted as pandas Series.
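
Attribute-style access is convenient, but it has limits that bracket access does not. A minimal sketch, assuming a hypothetical frame whose column names are chosen to trigger the two common problems:

```python
import pandas as pd

# Hypothetical frame: "count" clashes with a DataFrame method and
# "serving size" is not a valid Python identifier.
toy_df = pd.DataFrame(
    {"calories": [70, 120], "count": [1, 2], "serving size": [0.33, 1.0]}
)

print(type(toy_df.calories).__name__)   # Series: attribute access works here
print(callable(toy_df.count))           # True: this is the DataFrame.count method, not the column
print(toy_df["count"].tolist())         # [1, 2]: bracket access always reaches the column
print(toy_df["serving size"].tolist())  # [0.33, 1.0]
```

Bracket access is the safer default when column names are not known in advance.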

Can also extract data from a subset of the columns by passing in a list of column names.

DataFrame[list of column names in subset]: returns a DataFrame

In [33]:
col_list = ['calories', 'fat', 'sugars']
cereal_df[col_list]

| name | calories | fat | sugars |
| --- | --- | --- | --- |
| 100% Bran | 70 | 1 | 6 |
| 100% Natural Bran | 120 | 5 | 8 |
| All-Bran | 70 | 1 | 5 |
| All-Bran with Extra Fiber | 50 | 0 | 0 |
| Almond Delight | 110 | 2 | 8 |
| ... | ... | ... | ... |
| Triples | 110 | 1 | 3 |
| Trix | 110 | 1 | 12 |
| Wheat Chex | 100 | 1 | 3 |
| Wheaties | 100 | 1 | 3 |


This is a new DataFrame containing just the columns named in the list. We can access a particular row and column as follows:

DataFrame[column_name][row_name]

In [35]:
cereal_df['sugars']['Fruity Pebbles']

12
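
The chained lookup above performs two steps: it extracts a column, then indexes into the resulting Series. The same cell can be read in a single step with `.loc` or `.at`. A minimal sketch, assuming a three-row frame `mini_df` rebuilt by hand from rows shown in this notebook:

```python
import pandas as pd

# Three rows copied from the cereal table shown in this notebook.
mini_df = pd.DataFrame(
    {"calories": [70, 110, 110], "sugars": [5, 8, 14]},
    index=["All-Bran", "Almond Delight", "Apple Jacks"],
)

chained = mini_df["sugars"]["Apple Jacks"]      # two lookups: column, then row
single = mini_df.loc["Apple Jacks", "sugars"]   # one lookup: row label, column label
scalar = mini_df.at["Apple Jacks", "sugars"]    # .at is specialised for a single cell

print(chained, single, scalar)  # 14 14 14
```

Note the argument order: chained indexing goes column then row, while `.loc` and `.at` go row then column.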

#### The .loc[] accessor:

- Access single row by named index
- Complex selections: slicing across both rows and columns, etc
- Really important to use when assigning values in selections.

1. DataFrame.loc[row_accessor]
2. DataFrame.loc[row_accessor, column_accessor]
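
The point about assignment can be sketched as follows. Chained indexing may operate on a temporary copy, so an assignment written that way is not guaranteed to reach the original DataFrame, whereas a single `.loc` call targets it directly. The frame `mini_df` and the edits below are hypothetical.

```python
import pandas as pd

mini_df = pd.DataFrame(
    {"calories": [70, 110, 110], "sugars": [5, 8, 14]},
    index=["All-Bran", "Almond Delight", "Apple Jacks"],
)

# Hypothetical edit: overwrite one cell by row label and column label.
mini_df.loc["All-Bran", "sugars"] = 6

# Hypothetical edit: overwrite one column for a list of rows.
mini_df.loc[["Almond Delight", "Apple Jacks"], "sugars"] = 0

print(mini_df)
```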


Accessing a single row with .loc[]

In [36]:
cereal_df.head(8)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| Basic 4 | G | C | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |


In [37]:
cereal_df.loc['All-Bran']

mfr                 K
type                C
calories           70
protein             4
fat                 1
sodium            260
fiber             9.0
carbo             7.0
sugars              5
potass            320
vitamins           25
shelf               3
weight            1.0
cups             0.33
rating      59.425505
Name: All-Bran, dtype: object

Accessing multiple rows:

In [40]:
cereal_df.head(8)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| Basic 4 | G | C | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |


In [31]:
# select rows by list of index names
row_list = ['All-Bran', 'Almond Delight', 'Apple Jacks']
cereal_df.loc[row_list]

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |


In [42]:
#slice rows by name
cereal_df.loc['All-Bran':'Apple Jacks']

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |


Note: with .loc[], the final entry *is included* in the slice.

Accessing multiple columns:

In [43]:
cereal_df.head(8)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| Basic 4 | G | C | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |


In [32]:
# select columns by list
listcol = ["calories", "protein", "fat", "sodium"]
cereal_df.loc["All-Bran", listcol]

calories     70
protein       4
fat           1
sodium      260
Name: All-Bran, dtype: object

In [45]:
# slice on columns by name
cereal_df.loc["All-Bran", 
              "calories":"sodium"]


calories     70
protein       4
fat           1
sodium      260
Name: All-Bran, dtype: object

Putting it all together (selections on rows and columns):

In [46]:
cereal_df.head(8)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |
| Apple Cinnamon Cheerios | G | C | 110 | 2 | 2 | 180 | 1.5 | 10.5 | 10 | 70 | 25 | 1 | 1.0 | 0.75 | 29.509541 |
| Apple Jacks | K | C | 110 | 2 | 0 | 125 | 1.0 | 11.0 | 14 | 30 | 25 | 2 | 1.0 | 1.0 | 33.174094 |
| Basic 4 | G | C | 130 | 3 | 2 | 210 | 2.0 | 18.0 | 8 | 100 | 25 | 3 | 1.33 | 0.75 | 37.038562 |


In [38]:
# slicing on rows AND columns
cereal_df.loc["All-Bran":"Almond Delight", 
              "calories":"sodium"]

| name | calories | protein | fat | sodium |
| --- | --- | --- | --- | --- |
| All-Bran | 70 | 4 | 1 | 260 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 |
| Almond Delight | 110 | 2 | 2 | 200 |


In [39]:
# accessing all rows and a column subset 
# with .loc accessor 
cereal_df.loc[:, ['protein', 'fat']]

| name | protein | fat |
| --- | --- | --- |
| 100% Bran | 4 | 1 |
| 100% Natural Bran | 3 | 5 |
| All-Bran | 4 | 1 |
| All-Bran with Extra Fiber | 4 | 0 |
| Almond Delight | 2 | 2 |
| ... | ... | ... |
| Triples | 2 | 1 |
| Trix | 1 | 1 |
| Wheat Chex | 3 | 1 |
| Wheaties | 3 | 1 |


In [49]:
cereal_df[['calories','protein']]

| name | calories | protein |
| --- | --- | --- |
| 100% Bran | 70 | 4 |
| 100% Natural Bran | 120 | 3 |
| All-Bran | 70 | 4 |
| All-Bran with Extra Fiber | 50 | 4 |
| Almond Delight | 110 | 2 |
| ... | ... | ... |
| Triples | 110 | 2 |
| Trix | 110 | 1 |
| Wheat Chex | 100 | 3 |
| Wheaties | 100 | 3 |


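Passing a list of column names works the same way with or without .loc[]. The only difference arises when slicing on columns:
- Really need to use the .loc[] accessor for this; a slice inside plain [] is applied to the rows instead.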

In [34]:
cereal_df.loc[:, 'calories':'sodium']

| name | calories | protein | fat | sodium |
| --- | --- | --- | --- | --- |
| 100% Bran | 70 | 4 | 1 | 130 |
| 100% Natural Bran | 120 | 3 | 5 | 15 |
| All-Bran | 70 | 4 | 1 | 260 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 |
| Almond Delight | 110 | 2 | 2 | 200 |
| ... | ... | ... | ... | ... |
| Triples | 110 | 2 | 1 | 250 |
| Trix | 110 | 1 | 1 | 140 |
| Wheat Chex | 100 | 3 | 1 | 230 |
| Wheaties | 100 | 3 | 1 | 200 |


In [35]:
cereal_df['calories':'sodium']

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |

The result has no rows: a slice inside plain [] is applied to the row labels, and no cereal name falls between 'calories' and 'sodium'.
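
The row-versus-column behaviour of slices can be sketched on a small frame. The frame `mini_df` is an assumption, rebuilt by hand from three rows shown in this notebook.

```python
import pandas as pd

mini_df = pd.DataFrame(
    {"calories": [70, 50, 110], "protein": [4, 4, 2], "fat": [1, 0, 2]},
    index=["All-Bran", "All-Bran with Extra Fiber", "Almond Delight"],
)

# A slice inside plain [] is interpreted as a slice over the row labels.
rows = mini_df["All-Bran":"All-Bran with Extra Fiber"]

# A slice in the second position of .loc is a slice over the column labels.
cols = mini_df.loc[:, "calories":"protein"]

print(rows.shape)  # (2, 3): two rows, every column
print(cols.shape)  # (3, 2): every row, two columns
```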


#### The .iloc[] accessor:

- Access rows and columns by their integer position instead of named index.
- Everything else pretty much the same as .loc[]

In [52]:
cereal_df.head(5)

| name | mfr | type | calories | protein | fat | sodium | fiber | carbo | sugars | potass | vitamins | shelf | weight | cups | rating |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 100% Bran | N | C | 70 | 4 | 1 | 130 | 10.0 | 5.0 | 6 | 280 | 25 | 3 | 1.0 | 0.33 | 68.402973 |
| 100% Natural Bran | Q | C | 120 | 3 | 5 | 15 | 2.0 | 8.0 | 8 | 135 | 0 | 3 | 1.0 | 1.0 | 33.983679 |
| All-Bran | K | C | 70 | 4 | 1 | 260 | 9.0 | 7.0 | 5 | 320 | 25 | 3 | 1.0 | 0.33 | 59.425505 |
| All-Bran with Extra Fiber | K | C | 50 | 4 | 0 | 140 | 14.0 | 8.0 | 0 | 330 | 25 | 3 | 1.0 | 0.5 | 93.704912 |
| Almond Delight | R | C | 110 | 2 | 2 | 200 | 1.0 | 14.0 | 8 | -1 | 25 | 3 | 1.0 | 0.75 | 34.384843 |


In [53]:
cereal_df.iloc[1:4, 2:6]

| name | calories | protein | fat | sodium |
| --- | --- | --- | --- | --- |
| 100% Natural Bran | 120 | 3 | 5 | 15 |
| All-Bran | 70 | 4 | 1 | 260 |
| All-Bran with Extra Fiber | 50 | 4 | 0 | 140 |


Note: with an .iloc[] slice, the last index is *NOT included* in the slice.
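
The two slicing conventions can be compared side by side. A minimal sketch, assuming a small frame `mini_df` rebuilt by hand from three rows shown in this notebook:

```python
import pandas as pd

mini_df = pd.DataFrame(
    {"calories": [70, 50, 110], "protein": [4, 4, 2], "fat": [1, 0, 2]},
    index=["All-Bran", "All-Bran with Extra Fiber", "Almond Delight"],
)

# Label slices keep both endpoints.
by_label = mini_df.loc["All-Bran":"All-Bran with Extra Fiber", "calories":"protein"]

# Position slices drop the stop position, like a Python list slice.
by_position = mini_df.iloc[0:1, 0:1]

print(by_label.shape)      # (2, 2)
print(by_position.shape)   # (1, 1)

# To match the label slice by position, the stop must be one past the last wanted entry.
print(mini_df.iloc[0:2, 0:2].equals(by_label))  # True
```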