# Lecture 23

### pandas; Selecting Columns; Selecting Rows; Pause; Column Operations; Broadcasting; Boolean Indexing; More Sophisticated Boolean Operations; Iterating Through Rows; Basic Statistics; Dropping Missing Values With .dropna(), Sorting With .sort_values(), Scatter Plot; Summary; Recursion

# 1. pandas

### * pandas (the name comes from "panel data") is a package that allows you to create objects called `DataFrame`s, which, roughly speaking, function as spreadsheet tables.


In [1]:
# EXAMPLE 1a: A pandas DataFrame

import pandas as pd # "pd" is a widely used nickname for the pandas module.

netflix_df = pd.read_csv('netflix.csv')

print('############################\nnetflix_df.columns:\n############################\n')
print(netflix_df.columns)

print('\n')

print('############################\nnetflix_df.head():\n############################\n')
print(netflix_df.head())

############################
netflix_df.columns:
############################

Index(['Title', 'Genre', 'Tags', 'Languages', 'Series or Movie',
       'Hidden Gem Score', 'Country Availability', 'Runtime', 'Director',
       'Writer', 'Actors', 'View Rating', 'IMDb Score',
       'Rotten Tomatoes Score', 'Metacritic Score', 'Awards Received',
       'Awards Nominated For', 'Boxoffice', 'Release Date',
       'Netflix Release Date', 'Production House', 'Netflix Link', 'IMDb Link',
       'Summary', 'IMDb Votes', 'Image', 'Poster', 'TMDb Trailer',
       'Trailer Site'],
      dtype='object')


############################
netflix_df.head():
############################

                 Title                                   Genre  \
0     Lets Fight Ghost  Crime, Drama, Fantasy, Horror, Romance   
1  HOW TO BUILD A GIRL                                  Comedy   
2           Centigrade                         Drama, Thriller   
3                ANNE+                                   D

In [3]:
# EXAMPLE 1b: Printing the DataFrame

import pandas as pd
netflix_df = pd.read_csv('netflix.csv')

print(netflix_df)

                                           Title  \
0                               Lets Fight Ghost   
1                            HOW TO BUILD A GIRL   
2                                     Centigrade   
3                                          ANNE+   
4                                          Moxie   
...                                          ...   
15475                     K-POP Extreme Survival   
15476            DreamWorks Shreks Swamp Stories   
15477  DreamWorks Happy Holidays from Madagascar   
15478                DreamWorks Holiday Classics   
15479   DreamWorks Kung Fu Panda Awesome Secrets   

                                        Genre  \
0      Crime, Drama, Fantasy, Horror, Romance   
1                                      Comedy   
2                             Drama, Thriller   
3                                       Drama   
4                     Animation, Short, Drama   
...                                       ...   
15475                           

### * So:

### --- `import pandas as pd` loads the pandas module.
### --- `pd.read_csv()` produces a `DataFrame` object from a Comma Separated Value (or CSV) spreadsheet; this is the simplest way to create a `DataFrame`.
### --- A `DataFrame` is essentially a table.  It has many rows, each with its own row index (which by default is 0-based numbering), and many columns, each with its own name.  Every row has a value for each column.  When you print a `DataFrame`, it shows it as a truncated table.
### --- If `df` is the name of a `DataFrame`, then `df.columns` provides the column names of this `DataFrame`. (When you load a spreadsheet into a `DataFrame`, pandas is smart enough to figure out that the top row is probably column names, rather than data!)
### --- If you want to look at the first 5 rows of `df` just to see what the data "looks like", use `df.head()`.
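
### * To experiment with these calls without `netflix.csv`, here is a minimal sketch that builds a tiny `DataFrame` from a Python dictionary instead of a CSV file.  The name `tiny_df` and the choice of three rows are just for illustration; the titles and scores are copied from the Netflix data.

```python
import pandas as pd

# A tiny hand-made table (illustrative; NOT read from netflix.csv).
# Each dictionary key becomes a column name, each list becomes a column.
tiny_df = pd.DataFrame({
    'Title': ['Lets Fight Ghost', 'HOW TO BUILD A GIRL', 'Centigrade'],
    'IMDb Score': [7.9, 5.8, 4.3],
})

print(tiny_df.columns)   # the column names
print(tiny_df.head(2))   # head() also accepts the number of rows to show
print(tiny_df.shape)     # (number of rows, number of columns)
```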


<br><br><br><br><br>
<br><br><br><br><br>


# 2. Selecting Columns with Dictionary-like Syntax

### * Think of a `DataFrame` as being almost like a dictionary: *keys* would be the column names, and the *values* would be the list of values in each column.  (Actually, the proper pandas name for the "list" of values is `Series`.) 


In [5]:
# EXAMPLE 2a: Dictionary-like syntax

import pandas as pd
netflix_df = pd.read_csv('netflix.csv')

# We're treating netflix_df like a dictionary here.
# The key is a column name, the value is the column itself.
one_col = netflix_df['Title']
print(one_col)

# Note that when you print out the titles, the corresponding row indices are also displayed.

0                                 Lets Fight Ghost
1                              HOW TO BUILD A GIRL
2                                       Centigrade
3                                            ANNE+
4                                            Moxie
                           ...                    
15475                       K-POP Extreme Survival
15476              DreamWorks Shreks Swamp Stories
15477    DreamWorks Happy Holidays from Madagascar
15478                  DreamWorks Holiday Classics
15479     DreamWorks Kung Fu Panda Awesome Secrets
Name: Title, Length: 15480, dtype: object


In [7]:
# EXAMPLE 2b: More dictionary-like syntax

import pandas as pd
netflix_df = pd.read_csv('netflix.csv')

# Here's something you CAN'T do with dictionaries:
# instead of providing ONE key, you can provide a LIST of them.
# (This of course makes perfect sense for tables.)

smaller_df = netflix_df[ ['Title', 'Director', 'IMDb Score'] ]
print(smaller_df)

                                           Title         Director  IMDb Score
0                               Lets Fight Ghost  Tomas Alfredson         7.9
1                            HOW TO BUILD A GIRL    Coky Giedroyc         5.8
2                                     Centigrade    Brendan Walsh         4.3
3                                          ANNE+              NaN         6.5
4                                          Moxie    Stephen Irwin         6.3
...                                          ...              ...         ...
15475                     K-POP Extreme Survival              NaN         NaN
15476            DreamWorks Shreks Swamp Stories              NaN         NaN
15477  DreamWorks Happy Holidays from Madagascar              NaN         6.8
15478                DreamWorks Holiday Classics              NaN         6.4
15479   DreamWorks Kung Fu Panda Awesome Secrets              NaN         6.2

[15480 rows x 3 columns]


### * If `df` is a `DataFrame`:

### --- `df[ <column_name> ]` produces a single column with the given name.  Strictly speaking, this is a `Series` rather than a `DataFrame`, but it still has the same row indices.
### --- `df[ <list> ]` also produces a smaller `DataFrame`, which contains all the columns whose names are in `<list>`.
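
### * A quick way to see what each form of column selection returns is to check the type of the result.  This is a small sketch on a hand-made table (the name `tiny_df` is made up for illustration; the rows are copied from the Netflix data).

```python
import pandas as pd

# Illustrative data, NOT read from netflix.csv.
tiny_df = pd.DataFrame({
    'Title': ['Jurek', 'Campo Grande', 'Rogue City'],
    'Director': ['Pawel Wysoczanski', 'Sandra Kogut', 'Olivier Marchal'],
    'IMDb Score': [7.4, 6.3, 6.1],
})

one_col = tiny_df['Title']                    # a single column name
two_cols = tiny_df[['Title', 'IMDb Score']]   # a list of column names
one_col_df = tiny_df[['Title']]               # a list containing ONE name

print(type(one_col).__name__)      # Series
print(type(two_cols).__name__)     # DataFrame
print(type(one_col_df).__name__)   # DataFrame
```

### * So if you need a one-column `DataFrame` rather than a `Series`, put the single name inside a list.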

<br><br><br><br><br>
<br><br><br><br><br>




# 3. Selecting Rows with `.iloc[]`

### * If you want to select a subset of rows from a `DataFrame`, use `.iloc[]` (short for "integer location"): it selects rows by their position, counting from 0.


In [9]:
# EXAMPLE 3a: .iloc[]

import pandas as pd
netflix_df = pd.read_csv('netflix.csv')

# This is how you select just row 5540 (actually, the 5541st row, since numbering starts at 0)
print('############################\nRow 5540:\n############################\n')
print(netflix_df.iloc[5540])   # Note the SQUARE brackets: .iloc isn't a function!


# And this is how you select several rows
print('\n############################\nRows 840-849:\n############################\n')
print(netflix_df.iloc[840:850])

############################
Row 5540:
############################

Title                                                           Its Bruno!
Genre                                                               Comedy
Tags                                   TV Comedies,US TV Shows,US TV Shows
Languages                                                          English
Series or Movie                                                     Series
Hidden Gem Score                                                       7.5
Country Availability     India,Lithuania,Portugal,Brazil,Israel,Argenti...
Runtime                                                       < 30 minutes
Director                                                               NaN
Writer                                                         Solvan Naim
Actors                     Solvan Naim, Donnell Rawlings, Bruno, Sam Eliad
View Rating                                                          TV-MA
IMDb Score                     

### * If `df` is a DataFrame, 

### --- `df.iloc[]` can be used to select an individual row (which comes back as a `Series`), or to produce a new `DataFrame` containing a range of rows (via slicing).  You can also combine this with column selection, as before.
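
### * Here is a small sketch of both uses of `.iloc[]`, plus the combination with column selection, on a hand-made table (`tiny_df` is a made-up name; the rows are copied from the Netflix data).  Note that, as with list slicing, the end of the slice is excluded.

```python
import pandas as pd

# Illustrative data, NOT read from netflix.csv.
tiny_df = pd.DataFrame({
    'Title': ['Jurek', 'Campo Grande', 'Rogue City', 'His House'],
    'IMDb Score': [7.4, 6.3, 6.1, 6.5],
})

row = tiny_df.iloc[1]                 # one row (a Series)
block = tiny_df.iloc[1:3]             # rows at positions 1 and 2; 3 is excluded
titles = tiny_df.iloc[1:3]['Title']   # slice the rows, then pick a column

print(row['Title'])    # Campo Grande
print(len(block))      # 2
print(list(titles))    # ['Campo Grande', 'Rogue City']
```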

<br><br><br><br><br>
<br><br><br><br><br>


# 4.  Pause: Summary, Practice

### * `import pandas as pd` loads the pandas module.
### * `pd.read_csv()` produces a `DataFrame` object from a Comma Separated Value (or CSV) spreadsheet.  
### * `df.columns` would provide the column names of this `df`.
### * `df.head()` provides the first 5 rows of `df`.
### * `df[ <column_name> ]` produces a single column with the given name (strictly speaking, a `Series`).  
### * `df[ <list> ]` produces a smaller `DataFrame`, which contains all the columns whose names are in `<list>`.
### * `df.iloc[]` can be used to select an individual row (as a `Series`), or to produce a new `DataFrame` containing a range of rows (via slicing).  You can also combine this with column selection.



In [19]:
# EXAMPLE 4a: Basic .iloc[] practice

# How would you print just the Title, Director, and IMDb Score columns for just rows 1230 - 1239?

import pandas as pd
df = pd.read_csv('netflix.csv')
sdf = df.iloc[1230:1240]
print(sdf[['Title','Director','IMDb Score']])




                                    Title           Director  IMDb Score
1230                                Jurek  Pawel Wysoczanski         7.4
1231                         Campo Grande       Sandra Kogut         6.3
1232                   Pagpag: Nine Lives      Frasco Mortiz         5.5
1233                           Rogue City    Olivier Marchal         6.1
1234  Guillermo Vilas: Settling the Score   Matías Gueilburt         7.1
1235                        Blood of Zeus                NaN         7.6
1236          Secrets of the Saqqara Tomb       James Tovell         7.3
1237                          Kaali Khuhi    Terrie Samundra         3.4
1238                            His House        Remi Weekes         6.5
1239                             Holidate                NaN         6.9


<br><br><br><br><br>
<br><br><br><br><br>

# 5. Column Operations

### * pandas allows you to perform operations on columns in a couple of different ways.  

### * If you want to add or subtract or multiply or divide two columns, then pandas will perform those operations item-by-item, and produce a new column from them.  (pandas pairs items up by row index, so if the two columns don't have the same row indices, any index that appears in only one of them gets the missing value `NaN` in the result.)


In [21]:
# EXAMPLE 5a: Column operations

import pandas as pd
n_df = pd.read_csv('netflix.csv')

# Here is a basic column operation: subtracting two columns, to produce a new
# column. (Obviously, ordinary Python lists don't behave this way!!!!)

new_col = n_df['Rotten Tomatoes Score'] - n_df['Metacritic Score']

print(new_col)

0        16.0
1        10.0
2         NaN
3         NaN
4         NaN
         ... 
15475     NaN
15476     NaN
15477     NaN
15478     NaN
15479     NaN
Length: 15480, dtype: float64
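
### * What happens when two columns don't line up?  The sketch below uses two small hand-made columns (`a` and `b` are made-up names) to show that pandas matches entries by row index, not by position, and fills in `NaN` wherever one side has no entry.

```python
import pandas as pd

a = pd.Series([10, 20, 30])             # row indices 0, 1, 2
b = pd.Series([1, 2], index=[1, 2])     # row indices 1 and 2 only

diff = a - b
print(diff)
# Row 0 has no partner in b, so the result there is NaN.
# Rows 1 and 2 are matched by index: 20 - 1 and 30 - 2.
```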


<br><br><br><br><br>
<br><br><br><br><br>


# 6. Broadcasting

### * You can also perform an operation between a column and a single value. pandas will interpret such an expression to mean that you want to perform that operation between your single value and *each individual entry* of the column, to form a new column.  

### * This is called *broadcasting*.

In [24]:
# EXAMPLE 6a: More column operations

import pandas as pd
n_df = pd.read_csv('netflix.csv')

imdb_col = 10 * n_df['IMDb Score']

# n_df['IMDb Score'] is a column, whereas 10 is just a plain old number.  
# However, pandas is built to deal with operations like this.
# Whenever it sees an expression like "column times number", it multiplies the number by 
# EACH ENTRY of the column. This is called BROADCASTING.
print(imdb_col)


0        79.0
1        58.0
2        43.0
3        65.0
4        63.0
         ... 
15475     NaN
15476     NaN
15477    68.0
15478    64.0
15479    62.0
Name: IMDb Score, Length: 15480, dtype: float64


In [28]:
# EXAMPLE 6b: Boolean broadcasting

import pandas as pd
n_df = pd.read_csv('netflix.csv')

# As we'll see, broadcasting is particularly useful for Boolean operations.
# The following, for example, creates a column which contains True and False values.

my_bool_col = n_df['IMDb Score'] > 8.0
print(my_bool_col)


0        False
1        False
2        False
3        False
4        False
         ...  
15475    False
15476    False
15477    False
15478    False
15479    False
Name: IMDb Score, Length: 15480, dtype: bool


<br><br><br><br><br>
<br><br><br><br><br>

# 7. Boolean Indexing

### * Suppose that you have a `DataFrame` called `df`, and that you also have a column `b` of Boolean values of the same length.   Then `df[b]` will produce a new `DataFrame` with fewer rows: specifically, it will keep all the rows of `df` where `b` has the value `True`, and throw out all the rows of `df` where `b` has the value `False`.  

### * This is called *Boolean indexing*.

### * Note: before, when we wrote something like `df[ ]`, we were selecting **columns**, whereas now we're using `df[  ]` to select **rows**.  

### * The difference is what goes inside those square brackets: pandas is smart enough to know that if there is a list of strings, it's selecting columns, and if there is a pandas column containing Boolean values, it's selecting rows.
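
### * Here is a minimal sketch of Boolean indexing on a tiny table.  Everything in it (the name `tiny_df`, the titles, and the scores) is made up for illustration.

```python
import pandas as pd

# Made-up data, for illustration only.
tiny_df = pd.DataFrame({
    'Title': ['Film A', 'Film B', 'Film C'],
    'Score': [8.5, 4.3, 9.1],
})

b = tiny_df['Score'] >= 8.0   # a Boolean column, one entry per row
kept = tiny_df[b]             # keeps only the rows where b is True

print(list(b))            # [True, False, True]
print(list(kept.index))   # [0, 2]
print(kept)
```

### * Notice that the surviving rows keep their original row indices (0 and 2); they are not renumbered.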

In [44]:
# EXAMPLE 7a: Boolean indexing with DataFrames

import pandas as pd
n_df = pd.read_csv('netflix.csv')

# First, identify the movies with an IMDb Score of at least 8

my_bool_col = n_df['IMDb Score'] >= 8.0

# Now, select the movies

n_df = n_df[my_bool_col]
print(n_df)




                                       Title  \
15                                     Joker   
17                          Harrys Daughters   
27             Comrades: Almost a Love Story   
33           When a Woman Ascends the Stairs   
34                                  Yearning   
...                                      ...   
15409                         House of Cards   
15429  The Lord of the Rings: The Two Towers   
15444                 Hunter X Hunter (2011)   
15451                          Stargate SG-1   
15454                         Inazuma Eleven   

                                                   Genre  \
15                                Crime, Drama, Thriller   
17                    Adventure, Drama, Fantasy, Mystery   
27                                        Drama, Romance   
33                                                 Drama   
34                                                 Drama   
...                                                  ...   
154

In [46]:
# EXAMPLE 7b: Let's get the movies that have at least 100000 votes on IMDb

# The desired column name is 'IMDb Votes'

import pandas as pd
n_df = pd.read_csv('netflix.csv')

# i_df is a Boolean column: True for each movie with at least 100000 votes

i_df = n_df['IMDb Votes'] >= 100000

# Now, select the rows for those movies (the output starts with their titles)

print(n_df[i_df])


                                          Title  \
0                              Lets Fight Ghost   
15                                        Joker   
16                                            I   
17                             Harrys Daughters   
69                                  Stand by Me   
...                                         ...   
15453                              3:10 to Yuma   
15456                            50 First Dates   
15457                                        21   
15460  The Twilight Saga: Breaking Dawn: Part 1   
15465                            13 Going on 30   

                                              Genre  \
0            Crime, Drama, Fantasy, Horror, Romance   
15                           Crime, Drama, Thriller   
16               Action, Adventure, Fantasy, Sci-Fi   
17               Adventure, Drama, Fantasy, Mystery   
69                                 Adventure, Drama   
...                                             ...   
15

<br><br><br><br><br>
<br><br><br><br><br>

# 8. More Sophisticated Boolean Operations: Watch Out!

### * If you want to perform more sophisticated Boolean operations involving multiple columns, you can, but there are a couple of sticking points:

### --- Use parentheses LIBERALLY. 

### --- `and` doesn't work for pandas columns -- use `&` in its place.  Likewise, use `|` in place of `or`, and `~` in place of `not`.

In [1]:
# EXAMPLE 8a: Composite Boolean indexing

import pandas as pd
n_df = pd.read_csv('netflix.csv')


# Let's get the movies that have high IMDb ratings AND high votes

# This feels like it should work.  But it doesn't: "and" cannot combine two columns.
# (So, COMMENT IT OUT!)
# select = n_df['IMDb Score'] > 8.0 and n_df['IMDb Votes'] > 100000

# Instead, you should use the AMPERSAND, &.  But this doesn't work either,
# because & is evaluated BEFORE the > comparisons.
# (So, COMMENT THIS OUT TOO!)
# select = n_df['IMDb Score'] > 8.0 & n_df['IMDb Votes'] > 100000

# FINALLY: put both individual expressions in parentheses.
select = (n_df['IMDb Score'] > 8.0) & (n_df['IMDb Votes'] > 100000)


# Now, we can use this column to select the best movies.  
print(n_df[select][ ['Title', 'IMDb Score'] ])   


### * Running the first attempt (the one with `and`) uncommented produces this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
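
### * The sketch below shows `&`, `|`, and `~` side by side on a tiny table.  All of the data and names (`tiny_df`, `high_score`, `many_votes`) are made up for illustration.  Giving each condition its own name is one alternative to wrapping it in parentheses.

```python
import pandas as pd

# Made-up data, for illustration only.
tiny_df = pd.DataFrame({
    'Title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'Score': [8.5, 8.2, 6.0, 9.0],
    'Votes': [500000, 900, 300000, 150000],
})

high_score = tiny_df['Score'] > 8.0
many_votes = tiny_df['Votes'] > 100000

both = tiny_df[high_score & many_votes]     # AND: both conditions hold
either = tiny_df[high_score | many_votes]   # OR: at least one holds
not_high = tiny_df[~high_score]             # NOT: the condition fails

print(list(both['Title']))       # ['Film A', 'Film D']
print(list(either['Title']))     # ['Film A', 'Film B', 'Film C', 'Film D']
print(list(not_high['Title']))   # ['Film C']
```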

<br><br><br><br><br>
<br><br><br><br><br>

# 9. Iterating Through DataFrames

### * Looping through `DataFrame`s is discouraged.  That being said, given that I'm giving a limited introduction to pandas, perhaps you should be aware of how to loop through a `DataFrame`.

### * Tempting:

In [None]:
for row in df:
    <do something with <row>>

### * But that probably doesn't do what you want.

In [3]:
# EXAMPLE 9a: Looping through a DataFrame gives column names :(

import pandas as pd
n_df = pd.read_csv('netflix.csv')

for r in n_df:
    print('*', r)
    
# It's going to just print the names of the columns.  That could be useful,
# but it sure wasn't what I was expecting.

* Title
* Genre
* Tags
* Languages
* Series or Movie
* Hidden Gem Score
* Country Availability
* Runtime
* Director
* Writer
* Actors
* View Rating
* IMDb Score
* Rotten Tomatoes Score
* Metacritic Score
* Awards Received
* Awards Nominated For
* Boxoffice
* Release Date
* Netflix Release Date
* Production House
* Netflix Link
* IMDb Link
* Summary
* IMDb Votes
* Image
* Poster
* TMDb Trailer
* Trailer Site


<br><br><br><br><br>
<br><br><br><br><br>

### * So, how can I loop through the rows of a `DataFrame`?  The preferred way is to use `.iterrows()`.


### * If `df` is a `DataFrame`, then

In [None]:
for idx, r in df.iterrows():

### will loop through the rows of `df`.  For each iteration of the loop, `idx` will be assigned to be the *index* of one row after another, and `r` will be assigned to be the corresponding *row itself*.  

### * If you are interested not in entire rows, but just some specific columns in each row, you can write `r[<col name>]` to retrieve a specific entry in the row `r`.

In [5]:
# EXAMPLE 9b: .iterrows()

import pandas as pd
n_df = pd.read_csv('netflix.csv')

shorter_df = n_df[ (n_df['IMDb Score'] > 8.0) & (n_df['IMDb Votes'] > 100000) ]

for idx, r in shorter_df.iterrows():
    print('Row', idx, ':', r['Title']) # idx will be the row index
    


Row 15 : Joker
Row 17 : Harrys Daughters
Row 69 : Stand by Me
Row 294 : The Hunt
Row 420 : The Elephant Man
Row 696 : Me & You vs The World
Row 864 : Fiction.
Row 1138 : Girls Revenge
Row 1208 : Homeland
Row 1217 : 1917
Row 1256 : Rebecca
Row 1337 : Spotlight
Row 1358 : Aliens
Row 1594 : Parasite
Row 1640 : Cobra Kai
Row 1721 : Joker
Row 1754 : Grave
Row 1903 : Downton Abbey
Row 2789 : 2 дня
Row 2840 : Честное пионерское 2
Row 2843 : Честное пионерское 3
Row 3265 : The Kid
Row 3359 : The Gold Rush
Row 3362 : City Lights
Row 3441 : The 400 Blows
Row 3632 : Howl’s Moving Castle
Row 3811 : Nausicaä of the Valley of the Wind
Row 4230 : The Witcher
Row 4424 : Gods Little Village
Row 4516 : Klaus
Row 4602 : Three Billboards Outside Ebbing, Missouri
Row 4868 : Green Book
Row 5340 : Spider-Man: Into the Spider-Verse
Row 6364 : Sex Education
Row 6634 : Blade Runner: The Final Cut
Row 6834 : Death Note
Row 6858 : Alien
Row 6981 : The Haunting of Hill House
Row 8258 : Aliens
Row 8645 : Monty Pyth

In [16]:
# EXAMPLE 9c: Genres

import pandas as pd
n_df = pd.read_csv('netflix.csv')

shorter_df = n_df[ (n_df['IMDb Score'] > 8.0) & (n_df['IMDb Votes'] > 100000) & (n_df['Genre'] == 'Comedy')]


# Let's identify the best movies in a given genre

hmovie = ''
hscore = 0

for idx,r in shorter_df.iterrows():
    if r['IMDb Score'] > hscore:
        hmovie = r['Title']
        hscore = r['IMDb Score']

print(hmovie,hscore)
        

The Office (U.S.) 8.9
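
### * Since looping through a `DataFrame` is discouraged, it may help to see the loop from Example 9c next to a loop-free version.  This is only a sketch on made-up data (`tiny_df` and its contents are invented).  It uses two pandas features not covered above: `.idxmax()`, which gives the row index of the largest value in a column, and `.loc[]`, which looks up a row by its index.

```python
import pandas as pd

# Made-up data, for illustration only.
tiny_df = pd.DataFrame({
    'Title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'Genre': ['Comedy', 'Drama', 'Comedy', 'Comedy'],
    'Score': [7.1, 9.0, 8.4, 6.2],
})

comedies = tiny_df[tiny_df['Genre'] == 'Comedy']

# Loop version, in the style of Example 9c.
hmovie, hscore = '', 0
for idx, r in comedies.iterrows():
    if r['Score'] > hscore:
        hmovie, hscore = r['Title'], r['Score']

# Loop-free version: find the index of the top score, then fetch that row.
best = comedies.loc[comedies['Score'].idxmax()]

print(hmovie, hscore)                 # Film C 8.4
print(best['Title'], best['Score'])   # Film C 8.4
```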


<br><br><br><br><br>

# 10. Basic Statistics

### --- `df.describe()` and `df.corr()` give basic statistical info about the columns (the latter provides correlations between columns). 

In [18]:
# EXAMPLE 10a: .describe()

import pandas as pd
n_df = pd.read_csv('netflix.csv')

n_df.describe()


       Hidden Gem Score  IMDb Score  Rotten Tomatoes Score  Metacritic Score  Awards Received  Awards Nominated For  IMDb Votes
count           13379.0     13381.0                 6382.0            4336.0           6075.0                7661.0     13379.0
mean           5.937551    6.496054              59.523034         56.813653         8.764444             13.983161    42728.41
std            2.250202     1.14691              26.999173         17.582545        18.311171             29.821052    125701.2
min                 0.6         1.0                    0.0               5.0              1.0                   1.0         5.0
25%                 3.8         5.8                   38.0              44.0              1.0                   2.0       403.5
50%                 6.8         6.6                   64.0              57.0              3.0                   5.0      2322.0
75%                 7.9         7.3                   83.0              70.0              8.0                  12.0     20890.5
max                 9.8         9.7                  100.0             100.0            300.0                 386.0   2354197.0


In [27]:
# EXAMPLE 10b: .corr()

import pandas as pd
n_df = pd.read_csv('netflix.csv')

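# In recent versions of pandas, plain n_df.corr() fails on this data set with
#     ValueError: could not convert string to float: 'Lets Fight Ghost'
# because columns such as 'Title' contain text.  Passing numeric_only=True
# restricts the correlations to the numeric columns.
n_df.corr(numeric_only=True)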

<br><br><br><br><br>
<br><br><br><br><br>


# 11. Dropping Missing Values With .dropna(), Sorting With .sort_values(), Scatter Plot





In [None]:
# EXAMPLE 11a: Dropping Missing Values With .dropna

import pandas as pd
n_df = pd.read_csv('netflix.csv')

# This operation removes every row that has a missing value (NaN) in at least one of the listed columns.
scores = n_df.dropna(subset = ['IMDb Score', 'Rotten Tomatoes Score', 'Metacritic Score'])

print(scores)

In [None]:
# EXAMPLE 11b: Sorting Values

import pandas as pd
n_df = pd.read_csv('netflix.csv')
scores = n_df.dropna(subset = ['IMDb Score', 'Rotten Tomatoes Score', 'Metacritic Score'])

# This sorts by Rotten Tomatoes Scores, then resolves ties by sorting by Metacritic Score
rtscores = scores.sort_values(by = ['Rotten Tomatoes Score', 'Metacritic Score'], ascending = False)


print(rtscores[['Title', 'Rotten Tomatoes Score', 'Metacritic Score', 'Genre']])
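
### * Since the two examples above need `netflix.csv` to run, here is a small sketch of `.dropna()` and `.sort_values()` on made-up data, so you can check the results by hand.  The name `tiny_df`, the column names `RT` and `Meta`, and all of the values are invented for illustration.

```python
import pandas as pd

# Made-up data; None becomes a missing value (NaN) in the table.
tiny_df = pd.DataFrame({
    'Title': ['Film A', 'Film B', 'Film C', 'Film D'],
    'RT': [90.0, None, 90.0, 75.0],
    'Meta': [70.0, 80.0, 85.0, None],
})

# Film B (missing RT) and Film D (missing Meta) are dropped.
complete = tiny_df.dropna(subset=['RT', 'Meta'])

# Highest RT first; the tie at 90 is broken by the higher Meta.
ranked = complete.sort_values(by=['RT', 'Meta'], ascending=False)

print(list(complete['Title']))   # ['Film A', 'Film C']
print(list(ranked['Title']))     # ['Film C', 'Film A']
```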

In [None]:
# EXAMPLE 11c: Plot

import pandas as pd
n_df = pd.read_csv('netflix.csv')
scores = n_df.dropna(subset = ['IMDb Score', 'Rotten Tomatoes Score', 'Metacritic Score'])

scores.plot.scatter('Rotten Tomatoes Score', 'Metacritic Score')

# What's that outlier?


In [None]:
# EXAMPLE 11d: Find the outlier

import pandas as pd
n_df = pd.read_csv('netflix.csv')
scores = n_df.dropna(subset = ['IMDb Score', 'Rotten Tomatoes Score', 'Metacritic Score'])

# FIND THE OUTLIER!

<br><br><br><br><br>
<br><br><br><br><br>

# 12. Final Summary

### * Tons more to say: you can create `DataFrame`s from other sources than CSV spreadsheets; you can create `DataFrame`s from scratch in Python, and then export them to spreadsheets; you can edit entries in a `DataFrame`, and add or delete columns; you can sort, reindex, perform simple statistics, etc. 


### * Look at *Python for Data Analysis*, or the page http://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html, as starting points for further study.

### * Final summary (where `df` always denotes some `DataFrame` object):

### --- `import pandas as pd` loads the pandas module.
### --- `pd.read_csv()` produces a `DataFrame` object from a Comma Separated Value (or CSV) spreadsheet.  
### --- `df.columns` would provide the column names of this `df`.
### --- `df.head()` provides the first 5 rows of `df`.
### --- `df[ <column_name> ]` produces a single column with the given name (strictly speaking, a `Series`).  
### --- `df[ <list> ]` produces a smaller `DataFrame`, which contains all the columns whose names are in `<list>`.
### --- `df.iloc[]` can be used to select an individual row (as a `Series`), or to produce a new `DataFrame` containing a range of rows (via slicing).  You can also combine this with column selection.
### --- You can perform column operations to create new columns.
### --- If you create a Boolean column called `b` (of the same length as `df`), then `df[b]` will select the rows from `df` where `b` has the value `True`.
### --- `for idx, r in df.iterrows():` will iterate through the rows of `df`: for each row in `df`, `idx` will be assigned the row index, and `r` will be assigned the row itself.
### --- `df.describe()` and `df.corr()` give basic statistical info about the DataFrame.
### --- `df.dropna(subset = ['Col name 1', 'Col name 2'])` produces a dataframe where rows with missing values in the given columns are dropped.
### --- `df.sort_values(by = ['Col name 1', 'Col name 2'])` sorts the dataframe, first on one column, then resolving ties by sorting on a second column.
### --- `df.plot.scatter('Col name 1', 'Col name 2')` produces a scatter plot of your data set.



<br><br><br><br><br><br><br><br><br><br>

# 13. Recursion

### * Recall that $n!$, read as n factorial, is an important mathematical operation.  How can we implement it as a function?  Well, a loop's not too hard.


In [None]:
# EXAMPLE 13a: Factorial function

def factorial(n):
    """Return n factorial."""
    product = 1
    # Multiply product by each number between 1 and n. 
    for factor in range(1, n + 1):
        product *= factor
    return product

print(factorial(5), ' (should be 120)')
print(factorial(42), ' (should be...big)')
print(factorial(0), ' (0! = 1 by definition)')


<br><br><br><br><br>
<br><br><br><br><br>

### * $n!$ can ALSO be defined by:

* $0! = 1$
* $n! = n \cdot (n-1)!$ if $n \geq 1$

### * For example, $5! = 5\cdot 4 \cdot 3 \cdot 2 \cdot 1 = 5 \cdot (4 \cdot 3 \cdot 2 \cdot 1) = 5\cdot 4!$.  

### * Can you use a function to compute the value of *that same* function?  Let's give it a try!

In [None]:
# EXAMPLE 13b: Recursive factorial

def rfact(n):
    """Return n factorial, computed recursively."""
    if n <= 1:
        return 1
    else:
        return n * rfact(n-1)
    
print(rfact(4))


### * This is an example of a **recursive function**: this simply means a function that makes a call to itself.  

### * Recursive functions will have some simple base cases (like $1! = 1$) that can be computed without recursive calls.  

### * Furthermore, an effective recursive function will frequently find the answer to a "big" problem using calls to the same function applied to a "smaller" problem.  (For example, to compute $10!$, first we solve the smaller problem $9!$, and multiply the answer by 10.)
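
### * To watch the "big problem, smaller problem" pattern unfold, here is a sketch of the same recursive factorial with print statements added.  The function name `rfact_traced` and the extra `depth` parameter (used only to indent the printout) are additions for illustration.

```python
def rfact_traced(n, depth=0):
    """Return n factorial recursively, printing each call and each return."""
    indent = '    ' * depth
    print(f'{indent}rfact_traced({n}) called')
    if n <= 1:
        result = 1                                    # base case: no recursive call
    else:
        result = n * rfact_traced(n - 1, depth + 1)   # solve the smaller problem first
    print(f'{indent}rfact_traced({n}) returns {result}')
    return result

answer = rfact_traced(4)
print(answer)   # 24
```

### * In the printout, the calls pile up (4, then 3, then 2, then 1) until the base case is reached, and only then do the multiplications happen, from the innermost call outward.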


