---
---
Recitation 3: Numerical Python

Applied Data Science in Python for Social Scientists

New York University, Abu Dhabi

Dated: 12th Sept 2023

---
---
# Start Here
## Learning Goals
### General Goals
- Learn the basics of programming in Python
- Learn to process data
- Learn to represent arrays for efficient computation

### Specific Goals
- Learn the numpy library and some of its basic functions
- Learn to use numpy for different problems

## Distribution of Class Materials
These problem sets and recitations are the intellectual property of NYUAD. We request that students **not** distribute them or their solutions to anyone who is not enrolled in this class, including students who intend to enroll in the future. We also request that you not post these problem sets and recitations online or on any public platform.

## Submission
Submit all of your code as a Jupyter notebook through [Brightspace](https://brightspace.nyu.edu/), named **R3_YOUR NETID.ipynb**.

---




# General Instructions
This recitation is worth 50 points and has 4 parts. Complete all parts in the Jupyter (Colab) notebook attached to this handout.



# Part I: Evenly-spaced columns (15 points)

Write a function `even_column_array(start_lst, end_lst, r, c)` that takes four arguments, `start_lst`, `end_lst`, `r`, and `c`, and returns a two-dimensional numpy array with `r` rows and `c` columns in which each column holds evenly spaced values, starting at the corresponding entry of `start_lst` and ending at the corresponding entry of `end_lst`.

For example, suppose we call

`even_column_array([2,5,9], [100,130,160], 10, 3)`

so that

```
start_lst = [2,5,9]
end_lst = [100,130,160]
r = 10
c = 3
```

Your function should then return a two-dimensional array with 3 columns, each containing 10 evenly spaced values: the first column runs from 2 to 100, the second from 5 to 130, and the third from 9 to 160.

For the example above, your output should therefore be:

```
array([[  2.        ,   5.        ,   9.        ],
       [ 12.88888889,  18.88888889,  25.77777778],
       [ 23.77777778,  32.77777778,  42.55555556],
       [ 34.66666667,  46.66666667,  59.33333333],
       [ 45.55555556,  60.55555556,  76.11111111],
       [ 56.44444444,  74.44444444,  92.88888889],
       [ 67.33333333,  88.33333333, 109.66666667],
       [ 78.22222222, 102.22222222, 126.44444444],
       [ 89.11111111, 116.11111111, 143.22222222],
       [100.        , 130.        , 160.        ]])
```
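To see where these numbers come from: with `r` evenly spaced values from `start` to `end` (both endpoints included), consecutive rows differ by `(end - start) / (r - 1)`, for instance `(100 - 2) / 9 ≈ 10.89` in the first column. The sketch below rebuilds the example array from that formula using broadcasting and checks it against `np.linspace`. It is only an illustration of the arithmetic, not a required approach; the names `step` and `manual` are illustrative.

```python
import numpy as np

start = np.array([2, 5, 9])
end = np.array([100, 130, 160])
r = 10

# Spacing between consecutive rows, one value per column: (end - start) / (r - 1)
step = (end - start) / (r - 1)
print(step)  # [10.88888889 13.88888889 16.77777778]

# Row i of column j is start[j] + i * step[j]; broadcasting an (r, 1) column of
# row indices against the length-3 step vector gives an (r, 3) array
manual = start + np.arange(r)[:, None] * step

# np.linspace with list/array endpoints produces the same values (up to rounding)
assert np.allclose(manual, np.linspace(start, end, r))
```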

Our reference solution is no more than 4 lines of code. The most robust solution is just 1 line of code.

```python
import numpy as np

def even_column_array(start_lst, end_lst, r, c):
    # Write your code below this line
    ######### SOLUTION #########
    # np.linspace accepts lists as start and stop values and returns an array with r rows,
    # one column per (start, end) pair
    return np.linspace(start_lst, end_lst, r)[:, :c]  # Keep only the first c columns
    ######### SOLUTION END #########

# How we will call your function
# Note: the lists have 4 entries, but only the first c = 3 columns should be returned
even_column_array([2,5,9, 5], [100,130,160, 200], 10, 3)
```

```
array([[  2.        ,   5.        ,   9.        ],
       [ 12.88888889,  18.88888889,  25.77777778],
       [ 23.77777778,  32.77777778,  42.55555556],
       [ 34.66666667,  46.66666667,  59.33333333],
       [ 45.55555556,  60.55555556,  76.11111111],
       [ 56.44444444,  74.44444444,  92.88888889],
       [ 67.33333333,  88.33333333, 109.66666667],
       [ 78.22222222, 102.22222222, 126.44444444],
       [ 89.11111111, 116.11111111, 143.22222222],
       [100.        , 130.        , 160.        ]])
```
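The test call passes lists with four entries while asking for `c = 3` columns, which is why the solution slices its result. The short sketch below prints the shapes involved (the names `full` and `by_row` are illustrative). By default `np.linspace` stacks the samples along axis 0, so each start/end pair becomes a column; its `axis` parameter (available in NumPy 1.16 and later) lays them out along rows instead.

```python
import numpy as np

start_lst = [2, 5, 9, 5]
end_lst = [100, 130, 160, 200]
r, c = 10, 3

# Samples go along axis 0 by default: shape (r, len(start_lst))
full = np.linspace(start_lst, end_lst, r)
print(full.shape)  # (10, 4)

# Slicing keeps every row but only the first c columns
print(full[:, :c].shape)  # (10, 3)

# With axis=1 each sequence runs along a row instead, i.e. the transpose
by_row = np.linspace(start_lst, end_lst, r, axis=1)
print(by_row.shape)  # (4, 10)
assert np.allclose(by_row, full.T)
```

Slicing the inputs first, `np.linspace(start_lst[:c], end_lst[:c], r)`, gives the same values; it simply avoids computing columns that are then discarded.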

## Rubric

- +8 points for correctness (using `numpy` wherever necessary to achieve the desired output)
- +5 points for conciseness (no redundant code)
- +2 points for comments

# Part II: Recruiting Experts (15 points)

We would like to recruit students from this class for capstone projects, using a threshold `t` on grade percentage. In this part, you will help us with this task. We have a list of student names, `students`, and a corresponding list of `grades` in percentages. We would like to recruit the students whose grades are above the threshold `t`.

Write a function `find_experts(students, grades, t)` that takes two lists, `students` and `grades`, and a float `t`, and returns a numpy array containing only the names of students whose grades are above `t`.

*Note: We will always test your function with lists that have equal lengths, so don't worry about other cases.*

```python
def find_experts(students, grades, t):
    # Write your code below this line
    ######### SOLUTION METHOD #########
    # Convert both lists to numpy arrays and use a boolean mask to keep the students whose grades are greater than t
    return np.array(students)[np.array(grades) > t]
    ######### SOLUTION END #########

# How we will call your function
# This should return ['shahan', 'anahit'] as a numpy array
find_experts(["shahan", "bedoor", "anahit"], [89,73.1,94], 80)
```

```
array(['shahan', 'anahit'], dtype='<U6')
```
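The one-line solution combines two steps: an element-wise comparison that produces a boolean mask, and boolean-mask indexing. The sketch below separates them on the example data; the names `mask`, `experts`, and `list_mask_works` are illustrative only. It also shows why the names are converted to a numpy array first: a plain Python list does not accept a list of booleans as an index.

```python
import numpy as np

students = ["shahan", "bedoor", "anahit"]
grades = [89, 73.1, 94]
t = 80

# Comparing a numpy array with a scalar works element-wise and yields a boolean mask
mask = np.array(grades) > t
print(mask)  # [ True False  True]

# Indexing an array with a boolean mask of the same length keeps the True positions
experts = np.array(students)[mask]
print(experts)  # ['shahan' 'anahit']

# A plain Python list cannot be indexed with a list of booleans
try:
    students[list(mask)]
    list_mask_works = True
except TypeError:
    list_mask_works = False
print(list_mask_works)  # False
```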

## Rubric

- +8 points for correctness (using `numpy` wherever necessary to achieve the desired output)
- +5 points for conciseness
- +2 points for comments

# Part III: `where` does the `x` occur `n`th time? (10 points)





Your task in this part is to use `np.where` to define a function `index_of_nth_x(lst_1d, x, n)` that takes a one-dimensional Python list `lst_1d`, a value `x`, and an integer `n`, and returns the index of the `n`th occurrence of `x` in `lst_1d`.

For example, if

```
lst_1d = [1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2]

x = 1

n = 5
```

your function should return `8`.

```python
def index_of_nth_x(lst_1d, x, n):
    # Write your implementation here
    ######### SOLUTION #########
    # np.where returns a tuple of index arrays; [0] holds the positions where lst_1d equals x.
    # The nth occurrence sits at position n - 1, since indexing starts at 0
    return np.where(np.array(lst_1d) == x)[0][n - 1]
    ######### SOLUTION END #########

# How we will call your function
lst_1d = [1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2]
x = 1
n = 5
assert index_of_nth_x(lst_1d, x, n) == 8
```
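When `np.where` is called with only a condition, it returns a tuple containing one index array per dimension, which is why the solution takes `[0]` before picking position `n - 1`. The sketch below walks through those steps on the example list; the names `result`, `positions`, and `too_few` are illustrative. Note that this approach raises an `IndexError` when `x` occurs fewer than `n` times, a case the assignment does not ask you to handle.

```python
import numpy as np

lst_1d = [1, 2, 1, 1, 3, 4, 3, 1, 1, 2, 1, 1, 2]
x = 1

# With a single (condition) argument, np.where returns a tuple holding one
# index array per dimension; a 1-D input therefore gives a 1-tuple
result = np.where(np.array(lst_1d) == x)
print(type(result), len(result))  # <class 'tuple'> 1

# The first (and only) element lists every index where lst_1d equals x
positions = result[0]
print(positions)  # [ 0  2  3  7  8 10 11]

# The 5th occurrence is at position 5 - 1 = 4 of that index array
print(positions[5 - 1])  # 8

# If x occurs fewer than n times, positions[n - 1] is out of range
try:
    positions[10 - 1]
    too_few = False
except IndexError:
    too_few = True
print(too_few)  # True
```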

## Rubric

- +4 points for correctness
- +4 points for conciseness
- +2 points for comments

# Part IV: Were you listening closely? (10 points)

In one line of code, only using numpy functions, and without hard-coding, create the following pattern from the given array `A`.

Given
```
A = [1,2,3,4]
```

Create

`[1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,1,2,3,4,1,2,3,4,1,2,3,4]`

You may use any numpy function to do this; however, you only need the functions covered in lecture. Were you listening closely? :)

```python
A = [1,2,3,4]

# Write your code below this line
######### SOLUTION #########
# Repeat each element of A four times, tile the entire list three times, and concatenate the two arrays
np.concatenate((np.repeat(A, 4), np.tile(A, 3)))
######### SOLUTION END #########
```

```
array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 1, 2, 3, 4, 1, 2,
       3, 4, 1, 2, 3, 4])
```
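The target pattern splits into two halves that map onto `np.repeat` (each element repeated in place) and `np.tile` (the whole sequence repeated end to end). The small sketch below contrasts the two with a repeat count of 2 and then checks the length of the combined result, 16 + 12 = 28 elements. The name `pattern` is illustrative.

```python
import numpy as np

A = [1, 2, 3, 4]

# np.repeat repeats each element in place before moving to the next one
print(np.repeat(A, 2))  # [1 1 2 2 3 3 4 4]

# np.tile repeats the whole sequence end to end
print(np.tile(A, 2))  # [1 2 3 4 1 2 3 4]

# np.concatenate joins the two 1-D arrays along their only axis
pattern = np.concatenate((np.repeat(A, 4), np.tile(A, 3)))
print(pattern.size)  # 28
```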

## Rubric

- +4 points for correctness
- +4 points for conciseness
- +2 points for comments


# Final Remarks on learning

We have learnt many numpy functions in class and in recitation so far. However, we cannot teach you every numpy function out there, and the same will be true for most of the libraries you use in this class. One of the goals of this course is to introduce you to new tools and then help you practice learning them on your own. We have designed the course, its exercises, and its recitations to reinforce the key concepts taught in lectures and also to introduce new ones. We therefore expect you to look up and read the documentation of different libraries and to learn new functions on your own whenever possible. Libraries and tools will keep improving over time, so it is important for prospective data scientists to be able to learn new concepts whenever necessary.