<font color='darkred'> Unless otherwise noted, **this notebook will not be reviewed or autograded.**</font> You are welcome to use it for scratchwork, but **only the files listed in the exercises will be checked.**

---

# Exercises

For these exercises, add your functions to the *apputil\.py* file. If you like, you're welcome to adjust the *app\.py* file, but it is not required.

## Notes on Recursion

A [recursive function](https://www.w3schools.com/python/gloss_python_function_recursion.asp) is one which calls itself.

1. When the function is called, your CPU runs through each line of code until the function needs to be called again.
2. At that point, all variables are saved in memory, and the function runs through each line of code again until the function is called (again, but with a different passed argument), and so on.
3. Eventually, this process will stop at the "bottom of the **stack**", where the function doesn't get a chance to call itself again (likely because of some condition un/met by the latest passed argument).
4. Then, your CPU will work its way back up the stack to the final result. For example, take a look at [this visual example](https://realpython.com/python-recursion/#calculate-factorial) of calculating 4!.

When you write these functions, keep a few things in mind:

- You will need a built-in stopping point (i.e., the "bottom"), where your function returns a result without calling itself again.
- **Don't think too hard about this.** Recursion can be perplexing to conceptualize when writing the code. So, when you call the function inside the function, think about it as a magical "hidden" function that has already done what you want it to do.
- [Python Tutor](https://pythontutor.com/) ([editor](https://pythontutor.com/visualize.html#mode=edit)) can be a helpful resource for this exercise!
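
As a minimal sketch of these ideas, the factorial example linked above could be written roughly as follows. The name `factorial` is only illustrative and is not part of the exercises.

```python
def factorial(n):
    """Return n! for a non-negative integer n (illustrative sketch)."""
    # Stopping point (the "bottom" of the stack): no further recursive call.
    if n <= 1:
        return 1
    # Trust that factorial(n - 1) already works,
    # then finish the job for n.
    return n * factorial(n - 1)


print(factorial(4))  # 4 * 3 * 2 * 1 = 24
```

Pasting this into Python Tutor shows each call being stacked up until `n` reaches 1, and then the results being multiplied on the way back.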

## Exercise 1

The Fibonacci Series starts with 0 and 1. Each of the following numbers is the sum of the previous two numbers in the series:

`0 1 1 2 3 5 8 13 21 34 ...`

So, `fib(9) = 34`.

Write a recursive function (`fib`) that, given `n`, will return the `n`th number of the Fibonacci Series.

*Test your function using Google or any other tool that can calculate the Fibonacci Series.*

In [None]:
def fib_rec(n):
    """
    Recursively return the nth number of the Fibonacci series.

    Parameters:
        n: position of the number in the series (a non-negative integer).
    Returns:
        The nth number of the Fibonacci series.
    """
    # Base cases: these stop the recursion.
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    # Recursive case: the sum of the two preceding numbers.
    return fib_rec(n - 1) + fib_rec(n - 2)


def fib(n):
    """Print the series up to position n and return the nth Fibonacci number."""
    fibonacci = [fib_rec(i) for i in range(n + 1)]
    print(fibonacci)
    return fibonacci[n]

In [32]:
print(fib(9))

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
34
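
One way to test the result without an external tool is to compare the recursive version against a simple loop. The sketch below uses hypothetical helper names (`fib_iterative`, `fib_cached`) that are not part of the exercise. It also caches results with `functools.lru_cache`, since plain recursion recomputes the same values many times and slows down quickly as `n` grows.

```python
from functools import lru_cache


def fib_iterative(n):
    """Reference implementation using a loop (hypothetical helper)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


@lru_cache(maxsize=None)
def fib_cached(n):
    """Recursive Fibonacci with cached results (hypothetical helper)."""
    # Base cases: positions 0 and 1 (negative input is treated as 0).
    if n < 2:
        return max(n, 0)
    return fib_cached(n - 1) + fib_cached(n - 2)


# The recursive and iterative versions should agree.
assert all(fib_cached(i) == fib_iterative(i) for i in range(30))
print(fib_cached(9))  # 34
```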



## Exercise 2

Write a (single) recursive function, `to_binary()`, that [converts](https://en.wikipedia.org/wiki/Binary_number#Conversion_to_and_from_other_numeral_systems) an integer into its [binary](https://en.wikipedia.org/wiki/Binary_number) representation. So, for example:

```python
to_binary(2)   -->  10
to_binary(12)  -->  1100
```

*Note: you can test your function with the built in `bin()` function.*

In [None]:
def to_binary(n):
    """
    Recursively convert an integer into its binary representation.

    Parameters:
        n: a non-negative integer to convert.
    Returns:
        The binary representation of n, as a string.
    """
    # Base cases: these stop the recursion.
    if n == 0:
        return '0'
    if n == 1:
        return '1'
    # Recursive case: convert n // 2, then append the last binary digit (n % 2).
    return to_binary(n // 2) + str(n % 2)

In [10]:
to_binary(2)

'10'

In [12]:
to_binary(12)


'1100'
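
The comparison with `bin()` mentioned in the note could look roughly like this. Keep in mind that `bin()` returns a string with a `0b` prefix. The function name `to_binary_sketch` is hypothetical; it is a compact stand-in so that the snippet runs on its own.

```python
def to_binary_sketch(n):
    """Recursive conversion of a non-negative integer to a binary string."""
    # Base case: 0 and 1 are their own binary representation.
    if n < 2:
        return str(n)
    return to_binary_sketch(n // 2) + str(n % 2)


# bin() prefixes its result with '0b', so strip the first two characters.
for value in [0, 1, 2, 12, 255]:
    assert to_binary_sketch(value) == bin(value)[2:]

print(bin(12))               # 0b1100
print(to_binary_sketch(12))  # 1100
```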

## Exercise 3 

Use the raw Bellevue Almshouse Dataset (`df_bellevue`) extracted at the top of the lab (i.e., with `pd.read_csv ...`).

**Write a function for each of the following tasks. Name these functions `task_i()`** (i.e., without any input arguments).

1. Return a list of all column names, *sorted* such that the first column has the *least* missing values, and the last column has the *most* missing values (use the raw column names).
   - *Note: there is an issue with the `gender` column you'll need to remedy first ...*
2. Return a **data frame** with two columns:
   - the year (for each year in the data), `year`
   - the total number of entries (immigrant admissions) for each year, `total_admissions`
3. Return a **series** with:
   - Index: gender (for each gender in the data)
   - Values: the average age for the indexed gender.
4. Return a list of the 5 most common professions *in order of prevalence* (so, the most common is first).

For each of these, if there are messy data issues, use the `print` statement to explain.


In [None]:
import pandas as pd
import numpy as np

url = 'https://github.com/melaniewalsh/Intro-Cultural-Analytics/raw/master/book/data/bellevue_almshouse_modified.csv'
# Read the CSV file into a data frame.
df_bellevue = pd.read_csv(url)
# Treat empty strings as missing values (NaN).
df_bellevue.replace("", np.nan, inplace=True)


def task_1():
    """Return all column names, sorted from fewest to most missing values."""
    # Work on a copy so the raw data frame is not modified.
    temp_df = df_bellevue.copy()

    # Clean the gender column: treat stray codes as missing values.
    temp_df['gender'] = temp_df['gender'].replace(['?', 'g', 'h'], np.nan)

    # Count the missing values in each column.
    column_null_count = temp_df.isnull().sum().to_dict()

    # Sort the columns by missing-value count, in ascending order.
    sorted_column_null_count = dict(sorted(column_null_count.items(), key=lambda item: item[1]))
    print(sorted_column_null_count)

    # Fewest missing values first, most missing values last.
    return list(sorted_column_null_count.keys())

print(task_1())

    

{'date_in': 0, 'last_name': 0, 'first_name': 4, 'gender': 5, 'age': 50, 'profession': 1019, 'disease': 3087, 'children': 9547}
['date_in', 'last_name', 'first_name', 'gender', 'age', 'profession', 'disease', 'children']
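
The same ordering can also be obtained with pandas alone, without going through a dictionary. The sketch below runs on a small made-up data frame (`toy`, with invented column names and values), not on the Bellevue data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'name' has 0 missing values, 'age' has 1, 'job' has 3.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "age": [30, np.nan, 41, 25],
    "job": [np.nan, np.nan, "cook", np.nan],
})

# Count missing values per column, then sort the counts in ascending order.
missing = toy.isna().sum().sort_values(kind="stable")
columns_by_missing = missing.index.tolist()
print(columns_by_missing)  # ['name', 'age', 'job']
```

Using `kind="stable"` keeps columns with equal counts in their original order.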


In [None]:
def task_2():
    """Return a data frame with each year and its total number of admissions."""
    # Extract the year from the admission date.
    df_years = pd.DataFrame({'year': pd.to_datetime(df_bellevue['date_in']).dt.year})

    # Count the entries for each year.
    total_admissions = df_years.groupby('year').size().reset_index(name='total_admissions')
    return total_admissions

task_2()

|   | year | total_admissions |
|---|------|------------------|
| 0 | 1846 | 3073 |
| 1 | 1847 | 6511 |
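
As an alternative sketch, the per-year counts can also be built with `value_counts()`. The example uses a tiny made-up frame (`toy`) with invented dates, so the numbers are not the Bellevue results.

```python
import pandas as pd

# Hypothetical toy data with the same 'date_in' column name as the dataset.
toy = pd.DataFrame({"date_in": ["1846-01-01", "1846-05-12", "1847-03-03"]})

# Parse the dates and keep only the year.
years = pd.to_datetime(toy["date_in"]).dt.year

# Count rows per year, order by year, and turn the result into two columns.
counts = (
    years.value_counts()
    .sort_index()
    .rename_axis("year")
    .reset_index(name="total_admissions")
)
print(counts)
```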


In [None]:
def task_3():
    """Return a series with the average age for each gender, indexed by gender."""
    # Note: the raw gender column contains stray codes (e.g. 'g' and 'h'),
    # which appear as their own groups in the result.
    average_age_by_gender = df_bellevue.groupby('gender')['age'].mean().dropna()
    return average_age_by_gender
task_3()

gender
g    59.000000
h    56.000000
m    31.813433
w    28.725162
Name: age, dtype: float64
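
The `g` and `h` rows above come from stray codes in the raw `gender` column. One possible way to exclude them is sketched below on made-up data. It assumes that `m` and `w` are the only valid codes, which should be checked against the real column before relying on it.

```python
import pandas as pd

# Hypothetical toy data, including stray gender codes ('g' and '?').
toy = pd.DataFrame({
    "gender": ["m", "w", "g", "w", "?"],
    "age": [30, 20, 59, 40, 25],
})

# Assumption: only these codes are valid; everything else becomes NaN.
valid = ["m", "w"]
cleaned = toy["gender"].where(toy["gender"].isin(valid))

# groupby drops NaN keys by default, so the stray codes vanish from the result.
avg_age = toy.groupby(cleaned)["age"].mean()
print(avg_age.to_dict())  # {'m': 30.0, 'w': 30.0}
```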

In [None]:
def task_4():
    """Return a list of the 5 most common professions, most common first."""
    # value_counts() sorts by frequency in descending order.
    top_5_professions = df_bellevue['profession'].value_counts().head(5).index.tolist()
    return top_5_professions

task_4()

