# Lecture 4 – Strings and Arrays
## DSC 10, Fall 2022

### Announcements

- Lab 1 is released and is due **Saturday at 11:59PM**.
- Homework 1 is released and is due **Tuesday at 11:59PM**.
    - Finish the lab before you work on the homework!
- Issues with DataHub? [See here.](https://dsc10.com/debugging/#when-i-click-a-link-on-the-course-website-i-see-a-black-screen-with-text-and-a-red-error-bar-that-says-error-undefined-what-should-i-do) Issues with Gradescope? [See here.](https://edstem.org/us/courses/29053/discussion/1827122) Other issues? [Post on EdStem.](https://edstem.org/us/courses/29053/discussion/)

### Agenda

- Strings.
- Lists.
- Arrays.
- Ranges.

### Resources

- We're covering **a lot** of content very quickly. If you're overwhelmed, just know that we're here to support you! 
    - Office hours and EdStem are your friends 🤝.
- Remember to check the [Resources tab of the course website](https://dsc10.com/resources/) for programming resources.

## Strings

### Strings

- A string is a snippet of text of any length.
- In Python, strings are enclosed by either single quotes or double quotes.

In [None]:
'woof'

In [None]:
type('woof')

In [None]:
"woof"

In [None]:
# A string, not an int!
"1998"

### String arithmetic

When using the `+` symbol between two strings, the operation is called "concatenation".

In [None]:
s1 = 'baby'
s2 = '🐼'

In [None]:
s1 + s2

In [None]:
s1 + ' ' + s2

In [None]:
s2 * 3
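
As a quick sketch of what these operators produce (the strings `first` and `second` below are made up for illustration and are not part of the lecture's examples):

```python
# Hypothetical example strings, chosen only for illustration.
first = 'data'
second = 'science'

# + joins strings exactly as written; no space is added automatically.
joined = first + second
spaced = first + ' ' + second

# * repeats a string a whole number of times.
repeated = 'ha' * 3

print(joined)    # datascience
print(spaced)    # data science
print(repeated)  # hahaha
```

Note that `+` never inserts a space for you; if you want one, concatenate `' '` yourself.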

### String methods
* Strings are associated with certain functions called **string methods**.
* Access string methods with a `.` after the string (dot notation).    
    * For instance, to use the `upper` method on string `s`, we write `s.upper()`.
* Examples include `upper`, `title`, and `replace`.

In [None]:
my_cool_string = 'data science is super cool!'

In [None]:
my_cool_string.title()

In [None]:
my_cool_string.upper()

In [None]:
my_cool_string.replace('super cool', '💯' * 3)

In [None]:
# len is not a method, since it doesn't use dot notation
len(my_cool_string)
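
One detail worth seeing in action: string methods give back a **new** string and leave the original untouched. The sketch below reuses the lecture's example text under the illustrative name `original`; the chained call on the last lines is an extra illustration, not something the lecture requires.

```python
original = 'data science is super cool!'

# Each method call returns a new string.
shouted = original.upper()
titled = original.title()

print(shouted)   # DATA SCIENCE IS SUPER COOL!
print(titled)    # Data Science Is Super Cool!
print(original)  # data science is super cool!  (unchanged)

# Because each call returns a string, method calls can be chained.
chained = original.replace('super cool', 'fun').upper()
print(chained)   # DATA SCIENCE IS FUN!
```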

### Special characters in strings

Single quotes and double quotes are usually interchangeable, except when the string itself contains a single or double quote.

In [None]:
'my string's full of apostrophes!'

In [None]:
"my string's full of apostrophes!"

In [None]:
# escape the apostrophe with a backslash!
'my string\'s "full" of apostrophes!'

In [None]:
print('my string\'s "full" of apostrophes!')
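
To convince yourself that the two working approaches store the same text, here is a small sketch (the variable names are illustrative):

```python
# Double quotes on the outside let us use an apostrophe inside.
with_double_quotes = "my string's full of apostrophes!"

# Single quotes on the outside require escaping the apostrophe.
with_escape = 'my string\'s full of apostrophes!'

# Both spellings create exactly the same string.
print(with_double_quotes == with_escape)  # True

# The backslash is not stored as a character of the string.
print(len('it\'s'))  # 4
```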

### Aside: `print`
- By default, Jupyter notebooks display the "raw" value of the last expression in a cell.
- The `print` function displays a value as human-readable text whenever it is called, no matter where it appears in the cell.

In [None]:
12 # 12 won't be displayed, since Jupyter only shows the value of the last expression in a cell
23

In [None]:
# Note, there is no Out[number] to the left! That only appears when displaying a non-printed value.
# But both 12 and 23 are displayed.
print(12)
print(23)

In [None]:
# '\n' inserts a new line
my_newline_str = 'here is a string with two lines.\nhere is the second line'  
my_newline_str

In [None]:
# The quotes disappeared and the newline is rendered!
print(my_newline_str)  
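
The sketch below puts the two views side by side in plain Python. It uses the built-in `repr` function, which this lecture does not cover, as a stand-in for the "raw" form that Jupyter displays for a string.

```python
two_lines = 'first line\nsecond line'

# The "raw" form keeps the quotes and shows the newline as \n.
print(repr(two_lines))  # 'first line\nsecond line'

# print renders the text itself, so the output spans two lines.
print(two_lines)
```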

### Type conversion to and from strings
* Any value can be converted to a string using `str`.
* Some strings can be converted to `int` and `float`.

In [None]:
str(3)

In [None]:
float('3')

In [None]:
int('4')

In [None]:
int('baby panda')
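
Here is a sketch of the same conversions with their results saved in variables. The `try`/`except` statement is not covered in this lecture; it is used here only so that the failing conversion prints a message instead of stopping the code.

```python
as_text = str(3)       # the string '3'
as_float = float('3')  # the float 3.0
as_int = int('4')      # the int 4

# After converting, the result behaves like any other number.
print(as_int + 1)  # 5

# Text that doesn't look like a number can't be converted.
try:
    int('baby panda')
except ValueError as error:
    print('Conversion failed:', error)
```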

### Concept Check ✅ – Answer at [cc.dsc10.com](http://cc.dsc10.com) 

Assume you have run the following statements:

```py
x = 3
y = '4'
z = '5.6'
```

Choose the expression that will be evaluated **without** an error.

A. `x + y`

B. `x + int(y + z)`

C. `str(x) + int(y)`

D. `str(x) + z`

E. All of them have errors

## Lists

### Motivation

How would we store the temperatures for each of the first 6 days in the month of September?

Our best solution right now is to create a separate variable for each day.

In [None]:
temperature_on_sept_01 = 84
temperature_on_sept_02 = 78
temperature_on_sept_03 = 81
temperature_on_sept_04 = 75
temperature_on_sept_05 = 79
temperature_on_sept_06 = 75

This _technically_ allows us to do things like compute the average temperature through the first 6 days:

```
avg_temperature = 1/6 * (
    temperature_on_sept_01
    + temperature_on_sept_02
    + temperature_on_sept_03
    + ...)
```

Imagine a whole month's data, or a whole year's data. It seems like we need a better solution.


### Lists in Python

In Python, a list is used to store multiple values under a single variable name. To create a new list from scratch, we use `[`square brackets`]`.


In [None]:
temperature_list = [84, 78, 81, 75, 79, 75]

In [None]:
len(temperature_list)

Notice that the elements in a list don't need to be unique!

### Lists make working with sequences easy!

To find the average temperature, we just need to divide the **sum of the temperatures** by the **number of temperatures recorded**:

In [None]:
temperature_list

In [None]:
sum(temperature_list) / len(temperature_list)
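
Broken into steps, the calculation above looks like this (a sketch; the names `total`, `count`, and `average` are illustrative, and `round` is used only to shorten the printed value):

```python
temperature_list = [84, 78, 81, 75, 79, 75]

total = sum(temperature_list)  # adds up all six temperatures
count = len(temperature_list)  # number of temperatures recorded
average = total / count

print(total, count)       # 472 6
print(round(average, 2))  # 78.67
```

The same two function calls work unchanged for a month's or a year's worth of temperatures, which is the point of using a list.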

### Types

The `type` of a list is... `list`.

In [None]:
temperature_list

In [None]:
type(temperature_list)

Within a list, you can store elements of different types.

In [None]:
mixed_list = [-2, 2.5, 'ucsd', [1, 3]]
mixed_list

### There's a problem...

- Doing arithmetic on the entries of a list is **very slow** compared to the alternative we'll see next.
- This is not a big deal when there aren't many entries, but it's a big problem when there are millions or billions of entries.

## Arrays

### NumPy

<center>
<img src='images/numpy.png' width=400>
</center>

- NumPy (pronounced "num pie") is a Python library (module) that provides support for **arrays** and operations on them.

- The `babypandas` library, which you will learn about next week, goes hand-in-hand with NumPy.
    - NumPy is used heavily in the real world.

- To use `numpy`, we need to import it. It's usually imported as `np` (but doesn't have to be!).

In [None]:
import numpy as np

### Arrays

Think of NumPy arrays (just "arrays" from now on) as fancy, faster lists.

<center><img src="images/squid.png" width=30%></center>

To create an array, we pass a list as input to the `np.array` function.

In [None]:
np.array([4, 9, 1, 2])

<center>
<img src='images/brackets.png' width=50%>
</center>

In [None]:
temperature_array = np.array([84, 78, 81, 75, 79, 75])
temperature_array

In [None]:
temperature_list

In [None]:
# No square brackets, because temperature_list is already a list!
np.array(temperature_list)
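
To get a feel for the "faster" claim, here is a rough timing sketch. It relies on the standard-library `timeit` module and on `lambda`, neither of which is covered in this lecture, and the sizes are arbitrary choices. The exact timings depend on your machine, so treat the numbers as a rough comparison rather than a benchmark.

```python
import timeit

import numpy as np

# One million copies of 0.5, stored as a list and as an array.
number_list = [0.5] * 1_000_000
number_array = np.array(number_list)

# Time five repetitions of adding up all the entries each way.
list_seconds = timeit.timeit(lambda: sum(number_list), number=5)
array_seconds = timeit.timeit(lambda: number_array.sum(), number=5)

print('list :', list_seconds, 'seconds')
print('array:', array_seconds, 'seconds')

# Both approaches compute the same total.
print(sum(number_list) == number_array.sum())  # True
```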

### Positions

When people stand in a line, each person has a position.

<center><img src="images/position.png" width=50%></center>
    
Similarly, each element of an array (and list) has a position.

### Accessing elements by position

- Python, like most programming languages, is "0-indexed." 
    - This means that the position of the first element in an array is 0, not 1. 
    - One reason: an element's position represents the number of elements in front of it.
- To access the element in array `arr_name` at position `pos`, we use the syntax `arr_name[pos]`.

In [None]:
temperature_array

In [None]:
temperature_array[0]

In [None]:
temperature_array[1]

In [None]:
temperature_array[3]

In [None]:
# Access last element
temperature_array[5]

In [None]:
temperature_array[6]

In [None]:
# If a position is negative, count from the end!
temperature_array[-1]
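
The cells above can be summarized in one sketch. The array is re-created here under the illustrative name `temps` so the block runs on its own, and `try`/`except` (not covered in this lecture) is used only so the out-of-range access prints a message instead of stopping the code.

```python
import numpy as np

temps = np.array([84, 78, 81, 75, 79, 75])

print(temps[0])   # 84, the first element
print(temps[5])   # 75, the last element (6 elements, positions 0 to 5)
print(temps[-1])  # 75, counting from the end

# The last position is always one less than the length.
print(temps[len(temps) - 1])  # 75

# Position 6 does not exist in an array of length 6.
try:
    temps[6]
except IndexError as error:
    print('No position 6:', error)
```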

### Types

Earlier in the lecture, we saw that lists can store elements of multiple types.

In [None]:
nums_and_strings_lst = ['uc', 'sd', 1961, 3.14]
nums_and_strings_lst

**This is not true of arrays – all elements in an array must be of the same type.**

In [None]:
# All elements are converted to strings!
np.array(nums_and_strings_lst)
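
A short sketch of what that conversion means in practice. The second array, `ints_and_floats`, is an extra illustration that is not in the lecture: when an array mixes ints and floats, NumPy stores everything as floats.

```python
import numpy as np

mixed = np.array(['uc', 'sd', 1961, 3.14])

# 1961 is now stored as text, so + concatenates instead of adding.
print(mixed[2])             # 1961
print(mixed[2] + mixed[0])  # 1961uc

# Mixing ints and floats converts everything to floats.
ints_and_floats = np.array([1, 2.5])
print(ints_and_floats[0])   # 1.0
```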

### Array-number arithmetic

Arrays make it easy to perform the same operation on every element. This behavior is formally known as "broadcasting".

In [None]:
temperature_array

In [None]:
# Increase all temperatures by 3 degrees
temperature_array + 3

In [None]:
# Halve all temperatures
temperature_array / 2

In [None]:
# Convert all temperatures to Celsius
(5 / 9) * (temperature_array - 32)

**Note:** In none of the above cells did we actually modify `temperature_array`! Each of those expressions created a new array.

In [None]:
temperature_array

To actually change `temperature_array`, we need to reassign it to a new array.

In [None]:
temperature_array = (5 / 9) * (temperature_array - 32)

In [None]:
# Now in Celsius!
temperature_array
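
Here is the same idea on a smaller, made-up array (`fahrenheit` is an illustrative name with values chosen so the Celsius results are easy to check by hand):

```python
import numpy as np

fahrenheit = np.array([32, 50, 212])

# Arithmetic with a number creates a new array...
celsius = (5 / 9) * (fahrenheit - 32)

# ...and leaves the original untouched.
print(fahrenheit)  # still 32, 50, 212
print(celsius)     # approximately 0, 10, 100

# Reassigning the name is what actually changes it.
fahrenheit = fahrenheit + 3
print(fahrenheit)  # now 35, 53, 215
```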

### Element-wise arithmetic

- We can apply arithmetic operations to multiple arrays, provided they have the same length.
- The result is computed **element-wise**, which means that the arithmetic operation is applied to one pair of elements from each array at a time.
- For example, `a + b` is an array whose first element is the sum of the first element of `a` and the first element of `b`.

In [None]:
a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])

In [None]:
a + b

In [None]:
a / b

In [None]:
a ** 2 + b ** 2
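
The sketch below spells out the results of the cells above and shows what happens when the lengths don't match. The array `c` is a made-up example, and `try`/`except` (not covered in this lecture) is used only so the error is printed instead of stopping the code.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-4, 5, 9])

# Each result pairs up elements at the same position.
print(a + b)            # elements: -3, 7, 12
print(a ** 2 + b ** 2)  # elements: 17, 29, 90

# Hypothetical array with a different length than a.
c = np.array([10, 20])

try:
    a + c
except ValueError as error:
    print('Lengths 3 and 2 do not match:', error)
```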

### Example: TikTok views 🎬

Baby Panda made a series of five TikTok videos called "A Day In the Life of a Data Science Mascot". The number of views each video has received is stored in the array `views` below.

In [None]:
views = np.array([158, 352, 195, 1423916, 46])

Some questions:

What was their average view count?

In [None]:
views

In [None]:
sum(views) / len(views)

In [None]:
# The mean method exists for arrays (but not for lists)
views.mean()

How many views did their most and least popular videos receive?

In [None]:
views

In [None]:
views.max()

In [None]:
views.min()

How many views **above average** did each of their videos receive? How many views above average did their most viewed video receive?

In [None]:
views

In [None]:
views - views.mean()

In [None]:
(views - views.mean()).max()

It has been [estimated](https://www.ngpf.org/blog/question-of-the-day/question-of-the-day-how-much-can-a-creator-on-tiktok-make-if-their-video-receives-1-million-views/) that TikTok pays their creators \\$0.03 per 1000 views. If this is true, how many dollars did Baby Panda earn on their most viewed video?

In [None]:
views

In [None]:
views.max() * 0.03 / 1000
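
The same calculation extends to every video at once, since multiplying an array by a number applies to each element. This is an illustrative extension of the question above, using the same estimated rate; the names `rate` and `earnings` are made up for this sketch.

```python
import numpy as np

views = np.array([158, 352, 195, 1423916, 46])

# Estimated dollars earned per single view ($0.03 per 1000 views).
rate = 0.03 / 1000

# Earnings for each of the five videos, in dollars.
earnings = views * rate

print(earnings.max())  # about 42.72, the most viewed video
print(earnings.sum())  # about 42.74, all five videos combined
```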

## Ranges

### Motivation

We often find ourselves needing to make arrays like this:

In [None]:
days_in_september = np.array([
    1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
    13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 
    23, 24, 25, 26, 27, 28, 29, 30
])

There needs to be an easier way to do this!

### Ranges
* A **range** is an array of evenly spaced numbers. We create ranges using `np.arange`.
* The most general way to create a range is `np.arange(start, end, step)` (NumPy's own documentation calls the second argument `stop`). This returns an array such that:
    - The first number is `start`. **By default, `start` is 0.**
    - All subsequent numbers are spaced out by `step`, until (but excluding) `end`. **By default, `step` is 1.**

In [None]:
# Start at 0, end before 8, step by 1
# This will be our most common use-case!
np.arange(8)

In [None]:
# Start at 5, end before 10, step by 1
np.arange(5, 10)

In [None]:
# Start at 3, end before 32, step by 5
np.arange(3, 32, 5)

In [None]:
# Steps can be fractional!
np.arange(-3, 2, 0.5)

In [None]:
# If step is negative, we count backwards.
np.arange(1, -10, -3)
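
With ranges in hand, the long `days_in_september` array from the motivation can be written in one short line. The sketch below also spells out the results of two of the cells above.

```python
import numpy as np

# Start at 1, end before 31, step by 1.
days_in_september = np.arange(1, 31)

print(len(days_in_september))                       # 30
print(days_in_september[0], days_in_september[-1])  # 1 30

# The end value itself is never included.
print(np.arange(3, 32, 5))    # elements: 3, 8, 13, 18, 23, 28
print(np.arange(1, -10, -3))  # elements: 1, -2, -5, -8
```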

### Activity

🎉 Congrats! 🎉 You won the lottery 💰. Here's how your payout works: on the first day of September, you are paid \\$0.01. Every day thereafter, your pay doubles, so on the second day you're paid \\$0.02, on the third day you're paid \\$0.04, on the fourth day you're paid \\$0.08, and so on.

September has 30 days.

Write a **one-line expression** that uses the numbers `2` and `30`, along with the function `np.arange` and the method `.sum()`, that computes the total amount **in dollars** you will be paid in September.

In [None]:
...

## Summary, next time

### Summary

- Strings are used to store text. Enclose them in single or double quotes.
- Lists and arrays are used to store **sequences**.
    - Arrays are faster and more convenient for numerical operations.
    - You can easily perform numerical operations on all elements of an array and perform operations on multiple arrays.
- Ranges are arrays of evenly spaced numbers.
- Remember to refer to the resources from the start of lecture!

### Next time

We'll learn about how to use Python to work with real-world tabular data.