<H1>Intro to Python</H1>

Here are some extra resources for learning Python:

**Getting Started with Python**:

* https://www.codecademy.com/learn/python
* http://docs.python-guide.org/en/latest/intro/learning/
* https://learnpythonthehardway.org/book/
* https://www.codementor.io/learn-python-online

**Learning Python in Notebooks**:

* http://mbakker7.github.io/exploratory_computing_with_python/

This is handy to always have available for reference:

**Python Reference**:

* https://docs.python.org/3.5/reference/


There are also Python courses in the MDST datacamp!

## 0. Jupyter Notebook

Welcome to Jupyter Notebook! Jupyter lets you develop documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

All the cells so far are _markdown_ cells, meaning they contain only text. You can enter edit mode on one by pressing ENTER.

In [None]:
# Jupyter also has code cells, like this one! You can write and run a single line of code or more
# In python, the '#' symbol creates a comment line
# To run a cell, press CTRL-ENTER or SHIFT-ENTER (also moves down to the next cell)

In [1]:
# Below, we define the variable a to be the number 1
a = 1

Now, the variable we defined is accessible from any other code cell.

In [2]:
a

1

Some basics:

In [3]:
print('Hello World!')

Hello World!


In [4]:
# The notebook executes every line in a cell, but only displays the value of the last expression
a = a + 1
a
a + 1

3

In [5]:
# You can get around this by calling print explicitly
print(a)
print(a + 1)

2
3


# 0. Imports

Python has tons of cool, prewritten libraries that implement code for you! All you have to do is import them

In [6]:
import antigravity

Most libraries won't automatically do anything -- they will just become a module which you can call

In [7]:
import time
time.time()

# be careful not to overwrite the library by creating a new variable of the same name...
# e.g. time = 2 will overwrite the library time and you won't be able to use its functions unless you import it again

1630353726.4881635

You can import individual functions and not have to use the module name...

In [8]:
from random import random
random()

# if you just ran `import random`, you would have to run random.random()

0.47592843563311105

or import all the functions within a library with *

In [9]:
from math import *
sqrt(3)

1.7320508075688772
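
One caveat with `from module import *`: every public name in the module replaces any existing name in your notebook, built-ins included. The sketch below avoids the star import and simply compares the built-in `pow` with `math.pow`, which is the function that `from math import *` binds to the name `pow`. The variable names are only for illustration.

```python
import builtins
import math

# The built-in pow keeps integer inputs as an integer result
builtin_result = builtins.pow(3, 3)

# math.pow always converts its arguments to floats
math_result = math.pow(3, 3)

print(builtin_result, type(builtin_result).__name__)  # 27 int
print(math_result, type(math_result).__name__)        # 27.0 float
```

Importing only the names you need (for example `from math import sqrt`) keeps this kind of silent replacement from happening.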

You can also import your own files... we will see this in the first checkpoint!

## 1. Data Types

### 1.0 Dynamic Typing

Python does not require you to declare a variable's data type when you create it -- the type is inferred from the value you assign

In [10]:
x = 1
type(x)

int

In statically typed languages such as C++ or Java, defining a variable looks more like the line below, which is a syntax error in Python:

In [11]:
int i = 3

SyntaxError: invalid syntax (<ipython-input-11-9fcf0b6bc4c8>, line 1)

This also means a variable's data type can change

In [12]:
x = str(x)
print(x)
type(x)

1


str

In [13]:
x = [1,2,3]
type(x)

list

### 1.1 Numbers

This is an integer...

In [14]:
1

1

In [15]:
type(0)

int

so is this

In [16]:
-3

-3

This is a float

In [17]:
2.0

2.0

In [18]:
type(0.0)

float

You can do math with them

In [19]:
3 + 2

5

In [21]:
1.1 - 9.0


-7.9

In [20]:
1 * 5

5

In [22]:
# float division
1 / 2

0.5

In [23]:
# integer division
13 // 5

2

In [24]:
# modulus (remainder)
13 % 5

3

In [25]:
# exponent
4 ** 2

16

ints and floats are mostly interchangeable

In [26]:
3 * 3.0

9.0

In [27]:
9.8 // 2

4.0

but can also be cast (i.e. converted) to the other type

In [28]:
float(3)

3.0

In [29]:
int(2.9)

2
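
Note that `int()` does not round: it drops the fractional part. A small sketch comparing it with `math.floor` and `round`, which behave differently, especially for negative numbers:

```python
import math

# int() truncates toward zero
print(int(2.9))    # 2
print(int(-2.9))   # -2

# math.floor always rounds down, which differs for negative numbers
print(math.floor(-2.9))  # -3

# round() goes to the nearest integer instead
print(round(2.9))  # 3
```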

### 1.2 Strings

This is a string...

In [30]:
s = "apple"
s

'apple'

so is this

In [31]:
'apple'

'apple'

### 1.3 Boolean Values

In [32]:
True

True

In [33]:
False

False

### 1.4 Data Structures

#### 1.4.1 Lists

This is a list

In [34]:
l = [42, 7, 13, 24601, 2001, 3.50]

In [35]:
len(l)

6

In [36]:
r = [32, 'wse', True]

In [37]:
len(r)

3

#### _a sidenote on indexing_

Python uses 0-indexing

In [38]:
l[0]

42

In [39]:
l[5]

3.5

Negative numbers index from the end

In [40]:
l[-1]

3.5

You can index lists to get individual items...

In [41]:
l[3]

24601

and strings to get individual characters (substring of length 1)

In [42]:
'string'[3]

'i'

You can use indexing to get subarrays/substrings

syntax: [start:end:step]

the subarray will include the start index but not the end

In [43]:
# the second and third elements
print(l[1:3])
print([l[1],l[2]])

[7, 13]
[7, 13]


In [44]:
# the last 8 characters
'this is a sentence'[-8:]

'sentence'

In [45]:
# every other element
l[::2]

[42, 13, 2001]
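
Any part of `[start:end:step]` can be left out, and the step can be negative. Here is a short sketch of a few more slices, using a fresh list called `nums` (a name chosen for this example) with the same values as `l`:

```python
nums = [42, 7, 13, 24601, 2001, 3.5]  # same values as l above

first_three = nums[:3]             # start defaults to 0
from_index_two = nums[2:]          # end defaults to the end of the list
every_other_from_one = nums[1::2]  # start at index 1, take every second item
reversed_copy = nums[::-1]         # a negative step walks backwards

print(first_three)           # [42, 7, 13]
print(from_index_two)        # [13, 24601, 2001, 3.5]
print(every_other_from_one)  # [7, 24601, 3.5]
print(reversed_copy)         # [3.5, 2001, 24601, 13, 7, 42]
```

Slicing always builds a new list, so `nums` itself is unchanged.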

#### back to lists

Elements can be added...

In [46]:
l.append(7)
l

[42, 7, 13, 24601, 2001, 3.5, 7]

removed (the first occurrence)...

In [47]:
l.remove(7)
l

[42, 13, 24601, 2001, 3.5, 7]

and modified

In [48]:
l[0] = 100
l

[100, 13, 24601, 2001, 3.5, 7]

You can make lists like this:

In [49]:
list1 = list()
list1

[]

In [50]:
list2 = []
list2

[]

In [51]:
list3 = [10,30,20]
list3

[10, 30, 20]

In [52]:
# list4 uses the range function, which generates a sequence of numbers
list4 = list(range(3))
list4

[0, 1, 2]

In [53]:
# list5 uses a list comprehension -- it's like a one-line for loop
list5 = [i*2 for i in list4]
list5

[0, 2, 4]
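
A list comprehension can also filter items with an `if` clause. The sketch below writes the same logic twice, once as an explicit loop and once as a comprehension, so you can compare them (the variable names are just for this example):

```python
values = list(range(6))  # [0, 1, 2, 3, 4, 5]

# Explicit loop: keep the even numbers and square them
squares_loop = []
for v in values:
    if v % 2 == 0:
        squares_loop.append(v ** 2)

# The same logic as a list comprehension with a filter
squares_comp = [v ** 2 for v in values if v % 2 == 0]

print(squares_loop)  # [0, 4, 16]
print(squares_comp)  # [0, 4, 16]
```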

#### 1.4.2 Tuples

This is a python tuple

In [54]:
t = (1, 2, 3)
t

(1, 2, 3)

You can index them like lists

In [55]:
t[0]

1

but CAN'T modify them (they are immutable)

In [56]:
t[2] = 10

TypeError: 'tuple' object does not support item assignment

In [57]:
t.append(10)

AttributeError: 'tuple' object has no attribute 'append'

In most cases lists are more useful, but the immutable property of tuples means they can be used as dictionary keys, in sets, etc.
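
Here is a minimal sketch of that last point. The `grid` dictionary and `seen` set are made-up examples; note that a tuple is only hashable if everything inside it is hashable too.

```python
# Tuples are hashable, so they work as dictionary keys
grid = {(0, 0): "origin", (1, 2): "somewhere else"}
print(grid[(1, 2)])  # somewhere else

# ...and as set members (duplicates collapse as usual)
seen = {(0, 0), (1, 2), (0, 0)}
print(len(seen))  # 2

# A list is mutable and therefore unhashable, so using one as a key fails
error_message = None
try:
    bad_key = {[0, 0]: "origin"}
except TypeError as err:
    error_message = str(err)
print("TypeError raised:", error_message is not None)  # TypeError raised: True
```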

#### 1.4.3 Dictionaries

Dictionaries store (key: value) pairs

In [58]:
d = {"apple": "a fruit", "basil": "an herb", "monkey": "a mammal"}
d

{'apple': 'a fruit', 'basil': 'an herb', 'monkey': 'a mammal'}

In [59]:
d.keys()

dict_keys(['apple', 'basil', 'monkey'])

In [60]:
d.values()

dict_values(['a fruit', 'an herb', 'a mammal'])

They are indexed by their keys and return the corresponding values

In [61]:
d['basil']

'an herb'
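
Dictionaries are mutable, so you can add or overwrite pairs after creating them. A short sketch of a few everyday operations, reusing the dictionary from above (the new keys and values are invented for illustration):

```python
d = {"apple": "a fruit", "basil": "an herb", "monkey": "a mammal"}

# Add a new pair or overwrite an existing one by assigning to a key
d["carrot"] = "a vegetable"
d["apple"] = "a tasty fruit"

# Indexing a missing key raises KeyError; get() returns a default instead
print(d.get("zebra", "unknown"))  # unknown

# "in" checks the keys
print("basil" in d)  # True

# items() yields (key, value) pairs
for key, value in d.items():
    print(key, "->", value)
```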

#### 1.4.4 Sets

Sets store unique elements

In [62]:
s = set([1,2,3,1,2,3])
s

{1, 2, 3}

In [63]:
s.add(3)
s

{1, 2, 3}

In [64]:
s.add(12)
s

{1, 2, 3, 12}

In [65]:
s.update([3,4,5])
s

{1, 2, 3, 4, 5, 12}
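
Sets also support the usual mathematical set operations. A small sketch, with `left` and `right` as example names:

```python
left = {1, 2, 3, 4}
right = {3, 4, 5}

union = left | right         # everything in either set
intersection = left & right  # only what both sets share
difference = left - right    # in left but not in right

print(union == {1, 2, 3, 4, 5})  # True
print(intersection == {3, 4})    # True
print(difference == {1, 2})      # True
print(3 in left)                 # True (membership test)

# A common use: removing duplicates from a list (the order is not preserved)
unique = set(["x", "y", "x"])
print(len(unique))  # 2
```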

## 2. Statements and loops

### 2.1 If statements

Conditional statements to handle different cases

In [66]:
if True:
    print('true')

true


In [67]:
if 1 == 3:
    print('1')
elif 2 == 3:
    print('2')
elif 3 == 3:
    print('3')
else:
    print('4')

3


### 2.2 For loops

Iterate through a group of items

In [68]:
for i in range(3):
    print(i)

0
1
2


In [69]:
# enumerate generates a sequence of index, value pairs
for i,x in enumerate(['a','b','c']):
    print(i,':',x)

0 : a
1 : b
2 : c


### 2.3 While loops

While loops keep iterating as long as a condition is true, and stop once it becomes false

In [70]:
i = 0
while i < 3:
    print(i)
    i+=1
print('after loop ends: i =',i)

0
1
2
after loop ends: i = 3


## 3. Functions

### 3.1 Built-in functions

In [71]:
abs(-3)

3

In [72]:
all([True, True, False])

False

In [73]:
dir()

['In',
 'Out',
 '_',
 '_10',
 '_12',
 ...

In [74]:
eval('1+3')

4

In [75]:
isinstance('a',str)

True

In [76]:
len([1,2,3])

3

In [77]:
max([3,4,5])

5

In [78]:
min([-3,3,9])

-3

In [79]:
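# this is math.pow (the earlier `from math import *` replaced the built-in), so the result is a float; the built-in pow(3, 3) returns 27
pow(3,3)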

27.0

In [80]:
list(reversed([1,2,3]))

[3, 2, 1]

In [81]:
round(3.8)

4

In [82]:
round(3.3)

3

In [83]:
list(sorted([4,2,6]))

[2, 4, 6]

In [84]:
sum([1,3,5])

9

In [85]:
type(3.3)

float

### 3.2 Custom functions

You can write your own functions to minimize duplicate or long sections of code

These can be used to just 'do' something, such as print

In [86]:
def foo(arg1, arg2=2):
    # print arg1 only when arg2 is even (% gives the remainder; // would give the quotient)
    if arg2 % 2 == 0:
        print(arg1)

Or you can use them to calculate and return a value

In [87]:
def bar(arg1):
    if arg1 > 10:
        return arg1 - 10
    
    return arg1 

In [88]:
bar(12)

2

In [89]:
bar(2)

2
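
Arguments with a default value, like `arg2=2` in `foo`, are optional when you call the function, and any argument can be passed by name. A minimal sketch with a made-up function called `describe`:

```python
def describe(name, greeting="Hello", punctuation="!"):
    """Return a greeting string; greeting and punctuation are optional."""
    return greeting + ", " + name + punctuation

print(describe("Ada"))                   # Hello, Ada!
print(describe("Ada", "Hi"))             # Hi, Ada!
print(describe("Ada", punctuation="?"))  # Hello, Ada?
```

Passing arguments by name lets you skip over defaults you do not want to change.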

# 4. Pickle

Pickle is a Python library which we will be using a lot... all it does is serialize Python objects (convert them to bytes) so they can be saved to files and loaded back later

In [90]:
import pickle

Dumping saves an object:

In [94]:
# the first argument is the object you want to save
# the second is the open file you are saving it to

sample_list = [1,2,3]
b = iter(sample_list)

# 'wb' stands for write-binary: 'w' opens the file for writing, 'b' because pickle stores bytes rather than text
with open('b.pkl','wb') as f:
    pickle.dump(b, f)

Loading lets you load and reassign the object:

In [95]:
# the 'rb' stands for read-binary
with open('b.pkl','rb') as f:
    c = pickle.load(f)

In [96]:
c

<list_iterator at 0x2145a099d00>

In [97]:
b

<list_iterator at 0x2145a11dac0>
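
The two addresses above differ because loading creates a new object rather than handing back the original. The sketch below shows a full round trip with a plain dictionary, which is easier to compare than an iterator. The `record` contents and the file name are made up, and the file goes into a temporary directory so nothing is left behind.

```python
import os
import pickle
import tempfile

record = {"name": "All-Bran", "calories": 70, "tags": ["fiber", "bran"]}

# Write to a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "record.pkl")

    # "with" closes the file automatically, even if an error occurs
    with open(path, "wb") as f:
        pickle.dump(record, f)

    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == record)  # True: same contents
print(restored is record)  # False: a new, independent object
```

One thing to keep in mind: loading a pickle can run arbitrary code, so only unpickle files that come from a source you trust.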

# 5. Numpy

Numpy (short for Numerical Python) is a library built for large arrays and matrices. Its array operations are typically much faster than the equivalent loops in regular Python. It is the first of the three big libraries used for data science!

In [98]:
import numpy as np

## 5.1 Arrays

Numpy arrays can be created from a python list

In [99]:
a = [1,2,3,4,5,6]
b = np.array(a)
b

array([1, 2, 3, 4, 5, 6])

Right now, it looks an awful lot like a Python list, but there are some key points you should know.

Numpy arrays are:
- homogeneous (all elements in an array have the same type)
- multidimensional

In [100]:
# Homogeneous: all numpy arrays have an associated data type
# numbers are usually ints or floats
b.dtype

dtype('int32')

In [101]:
# Multidimensional: numpy arrays can have multiple dimensions, like a nested list
# We can reshape b into a 3x2 matrix
# Note: this doesn't change b. That's why we assign it to a new variable: m
m = b.reshape(3, 2)
m

array([[1, 2],
       [3, 4],
       [5, 6]])

In [102]:
# Each dimension is called an axis
# The size across each axis is called the shape
# These are two very important concepts!
m.shape

(3, 2)

## 5.2 Math

Numpy gives us a lot of math functions to work with. You can find them all in the [documentation](https://numpy.org/doc/stable/reference/index.html).

In [103]:
np.sum(b)

21

In [104]:
np.mean(b)

3.5

In [105]:
# for convenience, you can also call
b.mean()

3.5

You can also apply these functions to only one axis

In [106]:
# sum within each row (read: apply the sum along axis 1, collapsing the columns)
np.sum(m, axis=1)

array([ 3,  7, 11])
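
The `axis` argument names the dimension that gets collapsed. A small sketch that rebuilds the same 3x2 matrix and sums along each axis in turn, so you can compare the shapes of the results:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])  # same 3x2 matrix as above

row_sums = np.sum(m, axis=1)  # collapse the columns: one value per row
col_sums = np.sum(m, axis=0)  # collapse the rows: one value per column

print(row_sums)  # [ 3  7 11]
print(col_sums)  # [ 9 12]
print(row_sums.shape, col_sums.shape)  # (3,) (2,)
```

A quick check is to compare shapes: summing a `(3, 2)` array along axis 1 leaves shape `(3,)`, and along axis 0 leaves shape `(2,)`.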

# 6. Pandas

Pandas is another python library which we will be using _a lot!_ It lets us look at data in tabular format and is well integrated with other libraries for plotting, machine learning, etc. (we'll see some of these next week).

In [107]:
import pandas as pd

## 6.1 Dataframes & Series

Pandas puts data into dataframes, which are made up of series. A dataframe is like a table:

In [108]:
df = pd.read_csv("../data/cereal.csv")
type(df)

# here, we're reading in data from a 'csv', or comma-separated value, file

pandas.core.frame.DataFrame

In [109]:
df

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.00,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.50,93.704912
4,Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
72,Triples,G,C,110,2,1,250,0.0,21.0,3,60,25,3,1.0,0.75,39.106174
73,Trix,G,C,110,1,1,140,0.0,13.0,12,25,25,2,1.0,1.00,27.753301
74,Wheat Chex,R,C,100,3,1,230,3.0,17.0,3,115,25,1,1.0,0.67,49.787445
75,Wheaties,G,C,100,3,1,200,3.0,17.0,3,110,25,1,1.0,1.00,51.592193


We can use head(), tail(), or sample() to take a look at the data

In [110]:
# head returns the first 5 rows in the dataframe, tail returns the last 5
df.head()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
4,Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843


In [111]:
df.sample()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
16,Corn Flakes,K,C,100,2,0,290,1.0,21.0,2,35,25,1,1.0,1.0,45.863324


Each column is a pandas Series (pd.Series)

In [112]:
df["name"]

0                     100% Bran
1             100% Natural Bran
2                      All-Bran
3     All-Bran with Extra Fiber
4                Almond Delight
                ...            
72                      Triples
73                         Trix
74                   Wheat Chex
75                     Wheaties
76          Wheaties Honey Gold
Name: name, Length: 77, dtype: object

In [113]:
type(df["name"])

pandas.core.series.Series

Series are similar to numpy arrays

In [114]:
df["carbo"].mean()

14.597402597402597

In [115]:
# we can turn pd.Series into a numpy array
df["carbo"].to_numpy()

array([ 5. ,  8. ,  7. ,  8. , 14. , 10.5, 11. , 18. , 15. , 13. , 12. ,
       17. , 13. , 13. , 12. , 22. , 21. , 13. , 12. , 10. , 21. , 21. ,
       11. , 18. , 11. , 14. , 14. , 12. , 14. , 13. , 11. , 15. , 15. ,
       17. , 13. , 12. , 11.5, 14. , 17. , 20. , 21. , 12. , 12. , 16. ,
       16. , 16. , 17. , 15. , 15. , 21. , 18. , 13.5, 11. , 20. , 13. ,
       10. , 14. , -1. , 14. , 10.5, 15. , 23. , 22. , 16. , 19. , 20. ,
        9. , 16. , 15. , 21. , 15. , 16. , 21. , 13. , 17. , 17. , 16. ])

The key difference is that Series are indexed

In [116]:
# See the 0, 1, ... 76 on the left? That is the index of each item.
# Right now they are just positions, but theoretically they can be any identifier for the row

df["carbo"].index

RangeIndex(start=0, stop=77, step=1)

## 6.2 Pandas Indexing

The index in a pandas series/dataframe can be any list of values (row number, ID, time, etc.)

In [117]:
# a range index is just a numeric index
df.index

RangeIndex(start=0, stop=77, step=1)

In [118]:
# see how the leftmost column is now replaced with the cereal names
df_ = df.set_index('name')
df_.head()

name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843


In [119]:
df_.index

Index(['100% Bran', '100% Natural Bran', 'All-Bran',
       'All-Bran with Extra Fiber', 'Almond Delight',
       'Apple Cinnamon Cheerios', 'Apple Jacks', 'Basic 4', 'Bran Chex',
       'Bran Flakes', 'Cap'n'Crunch', 'Cheerios', 'Cinnamon Toast Crunch',
       'Clusters', 'Cocoa Puffs', 'Corn Chex', 'Corn Flakes', 'Corn Pops',
       'Count Chocula', 'Cracklin' Oat Bran', 'Cream of Wheat (Quick)',
       'Crispix', 'Crispy Wheat & Raisins', 'Double Chex', 'Froot Loops',
       'Frosted Flakes', 'Frosted Mini-Wheats',
       'Fruit & Fibre Dates; Walnuts; and Oats', 'Fruitful Bran',
       'Fruity Pebbles', 'Golden Crisp', 'Golden Grahams', 'Grape Nuts Flakes',
       'Grape-Nuts', 'Great Grains Pecan', 'Honey Graham Ohs',
       'Honey Nut Cheerios', 'Honey-comb', 'Just Right Crunchy  Nuggets',
       'Just Right Fruit & Nut', 'Kix', 'Life', 'Lucky Charms', 'Maypo',
       'Muesli Raisins; Dates; & Almonds', 'Muesli Raisins; Peaches; & Pecans',
       'Mueslix Crispy Blend', 'Multi-Gr

Indexing in pandas is a bit different than in built-in Python

`iloc` is used to index by row number in a dataframe

In [120]:
# this returns the first row of the dataframe
df_.iloc[0]

mfr              N
type             C
calories        70
protein          4
fat              1
sodium         130
fiber           10
carbo            5
sugars           6
potass         280
vitamins        25
shelf            3
weight           1
cups          0.33
rating      68.403
Name: 100% Bran, dtype: object

`loc` is used to index by the series/dataframe index

In [121]:
df_.loc['All-Bran']

mfr               K
type              C
calories         70
protein           4
fat               1
sodium          260
fiber             9
carbo             7
sugars            5
potass          320
vitamins         25
shelf             3
weight            1
cups           0.33
rating      59.4255
Name: All-Bran, dtype: object

In [122]:
# multiple indices work
df.iloc[[1, 2, 3]]

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
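
To see `iloc` and `loc` side by side without the CSV file, here is a sketch on a tiny hand-made dataframe called `small` (an example name; the values are copied from the first rows shown above). It also shows one difference that often surprises people: slices with `loc` include the end label, while slices with `iloc` exclude the end position.

```python
import pandas as pd

# A tiny hand-made table with cereal names as the index
small = pd.DataFrame(
    {"calories": [70, 120, 70], "protein": [4, 3, 4]},
    index=["100% Bran", "100% Natural Bran", "All-Bran"],
)

by_position = small.iloc[1]                  # second row, whatever its label is
by_label = small.loc["All-Bran"]             # row whose index label is "All-Bran"
one_cell = small.loc["All-Bran", "protein"]  # row label, then column label

# loc slices include the end label; iloc slices exclude the end position
label_slice = small.loc["100% Bran":"100% Natural Bran"]
position_slice = small.iloc[0:1]

print(by_position["calories"])                # 120
print(by_label["calories"])                   # 70
print(one_cell)                               # 4
print(len(label_slice), len(position_slice))  # 2 1
```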


We can also use boolean indexing to conditionally select data

In [124]:
df[[True] + [False] * 76]

# [True] + [False] * 76 gives us a list that looks like [True, False, ..., False] with 1 True and 76 Falses
# This matches the number of rows in our data (77)
# pandas returns all the rows with a corresponding True (in this case, only the first one)

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973


This is powerful because we can also make comparisons with Series and values

In [125]:
df["protein"] > 3

0      True
1     False
2      True
3      True
4     False
      ...  
72    False
73    False
74    False
75    False
76    False
Name: protein, Length: 77, dtype: bool

Combining these two things, we have a very expressive way of filtering

In [126]:
# This gives us all the rows in which the protein is greater than 3.
df[df["protein"] > 3]

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
11,Cheerios,G,C,110,6,2,290,2.0,17.0,1,105,25,1,1.0,1.25,50.764999
41,Life,Q,C,100,4,2,150,2.0,12.0,6,95,25,2,1.0,0.67,45.328074
43,Maypo,A,H,100,4,1,0,0.0,16.0,3,95,25,2,1.0,1.0,54.850917
44,Muesli Raisins; Dates; & Almonds,R,C,150,4,3,95,3.0,16.0,11,170,25,3,1.0,1.0,37.136863
45,Muesli Raisins; Peaches; & Pecans,R,C,150,4,3,150,3.0,16.0,11,170,25,3,1.0,1.0,34.139765
56,Quaker Oat Squares,Q,C,100,4,1,135,2.0,14.0,6,110,25,3,1.0,0.5,49.511874
57,Quaker Oatmeal,Q,H,100,5,2,0,2.7,-1.0,-1,110,0,1,1.0,0.67,50.828392
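
Conditions can be combined too. With pandas you use `&` (and), `|` (or) and `~` (not) instead of Python's `and`, `or` and `not`, and each comparison needs its own parentheses. A sketch on a small hand-made dataframe (values copied from the cereal data; the name `small` is just for this example):

```python
import pandas as pd

small = pd.DataFrame({
    "name": ["100% Bran", "Cheerios", "Trix", "Maypo"],
    "protein": [4, 6, 1, 4],
    "calories": [70, 110, 110, 100],
})

# Wrap each comparison in parentheses before combining
high_protein_low_cal = small[(small["protein"] > 3) & (small["calories"] < 105)]
either = small[(small["protein"] > 5) | (small["calories"] < 80)]
not_high_protein = small[~(small["protein"] > 3)]

print(high_protein_low_cal["name"].tolist())  # ['100% Bran', 'Maypo']
print(either["name"].tolist())                # ['100% Bran', 'Cheerios']
print(not_high_protein["name"].tolist())      # ['Trix']
```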


##  6.3 Manipulating Data

Often when we're preprocessing data, we want to make changes to a specific column. We can do this by applying functions.

In [127]:
# Suppose we want to make the cereals more appetizing.
# Let's add "Delicious " to the beginning of every name.

# The pattern is we define a function for a single entry
def make_delicious(name):
    return "Delicious " + name

# and then call apply on the series to apply the function to each element in the series
df["name"].apply(make_delicious)

0                     Delicious 100% Bran
1             Delicious 100% Natural Bran
2                      Delicious All-Bran
3     Delicious All-Bran with Extra Fiber
4                Delicious Almond Delight
                     ...                 
72                      Delicious Triples
73                         Delicious Trix
74                   Delicious Wheat Chex
75                     Delicious Wheaties
76          Delicious Wheaties Honey Gold
Name: name, Length: 77, dtype: object

In [128]:
# this returns the changes, but doesn't apply them in place.
# that means on our original dataframe, the cereals are still bland
df.head()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
2,All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
4,Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843


In [129]:
# we can fix this by assigning the new names to the column.
df["name"] = df["name"].apply(make_delicious)
df.head()

Unnamed: 0,name,mfr,type,calories,protein,fat,sodium,fiber,carbo,sugars,potass,vitamins,shelf,weight,cups,rating
0,Delicious 100% Bran,N,C,70,4,1,130,10.0,5.0,6,280,25,3,1.0,0.33,68.402973
1,Delicious 100% Natural Bran,Q,C,120,3,5,15,2.0,8.0,8,135,0,3,1.0,1.0,33.983679
2,Delicious All-Bran,K,C,70,4,1,260,9.0,7.0,5,320,25,3,1.0,0.33,59.425505
3,Delicious All-Bran with Extra Fiber,K,C,50,4,0,140,14.0,8.0,0,330,25,3,1.0,0.5,93.704912
4,Delicious Almond Delight,R,C,110,2,2,200,1.0,14.0,8,-1,25,3,1.0,0.75,34.384843
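
For a simple change like this one you do not always need a named function. The sketch below, on a small made-up Series called `names`, shows the same idea with a `lambda`, with plain string concatenation applied to the whole Series, and with the `.str` accessor for string methods.

```python
import pandas as pd

names = pd.Series(["Trix", "Wheaties", "Kix"])

# apply with a small anonymous function (lambda)
with_lambda = names.apply(lambda name: "Delicious " + name)

# the same result with vectorized string concatenation, no apply needed
vectorized = "Delicious " + names

# string methods are available through the .str accessor
upper = names.str.upper()

print(with_lambda.tolist())  # ['Delicious Trix', 'Delicious Wheaties', 'Delicious Kix']
print(with_lambda.tolist() == vectorized.tolist())  # True
print(upper.tolist())        # ['TRIX', 'WHEATIES', 'KIX']
```

Keep in mind that a cell which assigns back to a column, like the one above, changes the data every time it runs, so running it twice would add "Delicious " twice.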
