# Lesson 2 - Intro to Python strings and lists

# Reading `str` data from a file

## Basic recipe

This is not the _most_ basic way to read a file, but it is the _cleanest_ of the basic ways.

```python
file_name = "my_file.txt" # Same dir as notebook file
with open(file_name, 'r') as file:
    file_data = file.read()
```

Once that code has run, the variable `file_data` holds a string containing all of the text in the file.

You can see the contents of the file data by doing:
```python
print(file_data)
```

## Understanding the recipe

### 1. Set your file name as a str

```python
file_name = "my_file.txt"
```

### 2. The `open()` function

The function we are using is called `open`. The `open()` function takes one mandatory argument and several optional arguments. For now, we will use the one mandatory argument, `file`, and one of the optional arguments, `mode`.
* `file` -> This is your file name. Can be a `str` or "path-like object" (more next week)
* `mode` -> There are four basic modes:
    * `'r'` - Reading (text) mode
    * `'w'` - Writing (text) mode
    * `'rb'` - Reading binary mode
    * `'wb'` - Writing binary mode
    * We will just be using `'r'` and/or `'w'`. If you leave out the `mode` argument, Python will assume it is `'r'`.
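
As a rough illustration of the two text modes, the sketch below writes a small file with `'w'` and reads it back with `'r'`. The file name `mode_demo.txt` and its contents are made up for this example, and the file is created in a temporary folder (using the standard `tempfile` module) so that nothing in your notebook directory is touched.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary folder for this demo
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, "mode_demo.txt")

# 'w' creates the file (or overwrites it if it already exists)
with open(file_name, 'w') as file:
    file.write("first line\nsecond line\n")

# 'r' opens the same file for reading text
with open(file_name, 'r') as file:
    file_data = file.read()

# Leaving out the mode is the same as passing 'r'
with open(file_name) as file:
    same_data = file.read()

print(file_data)
```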

### 3. Using `with ___ as ___:`

Opening a file like this:
```python
with open(file_name, 'r') as file:
    file_data = file.read()
```

Is roughly equivalent to doing this:

```python
file = open(file_name, 'r')
file_data = file.read()
file.close()
```
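
The word "roughly" matters here: in the manual version, an error raised by `file.read()` would skip `file.close()`. A closer (still simplified) equivalent of the `with` block uses `try`/`finally`, as sketched below. The temporary file and its contents are assumptions made only so the example can run on its own.

```python
import os
import tempfile

# Set up a small throwaway file so this example is self-contained
file_name = os.path.join(tempfile.mkdtemp(), "my_file.txt")
with open(file_name, 'w') as file:
    file.write("hello")

# Manual version that still closes the file if reading fails
file = open(file_name, 'r')
try:
    file_data = file.read()
finally:
    file.close()  # runs whether or not an error occurred

print(file.closed)  # True

# The `with` version closes the file for us in the same way
with open(file_name, 'r') as file:
    file_data_2 = file.read()

print(file.closed)  # True
```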

# Working with `str` data: Words and Text

Strings (the `str` type) are one of the most common data types in Python. Most data that we read from external sources will be "parsed" (processed and interpreted) by Python as strings of characters, which Python calls `str`.

We can make strings by putting quotes around **anything**.

```python
a = "This is a string" # You can use double quotes
b = 'This is a string, too' # Or single quotes
```

These are also strings:
```python
c = "34.5" # This is a str, not a float
d = "28" # This is a str, not an int
e = 'print(2 + 4**3)' # This is a str, not a function call
```

Strings have some special characters that are known as "escaped" characters. Escaped characters start with a backslash `\`.

* `'\n'`: New line character
* `'\t'`: Tab character
* `'\r'`: Carriage return character (brings the "cursor" to the "beginning of the line")
* `'\b'`: Backspace

To actually write a backslash in your string, you have to "escape" the backslash: `"\\"`
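
A short sketch of how escaped characters behave when printed. The variable names and the Windows-style path are made up for illustration.

```python
two_lines = "first\nsecond"     # \n starts a new line
columns = "name\tcity"          # \t inserts a tab
one_backslash = "\\"            # two characters typed, one stored
windows_path = "C:\\Users\\me"  # hypothetical path, for illustration only

print(two_lines)
print(columns)
print(one_backslash)       # \
print(len(one_backslash))  # 1
print(windows_path)        # C:\Users\me
```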

## Representations vs Renderings

You can view a `str`, like other kinds of objects, in one of two ways:
1. The object's "representation" (or "repr")
2. The object's "rendering"

```python
my_str = "This is a string\nsplit over\nthree lines"
my_str
```

vs.

```python
print(my_str)
```

* The "repr" is viewable whenever you "inspect" an object on the command line or when you use the `repr()` function.
* The "rendering", for a `str`, is viewable whenever you use `print()` or when you write the string to a file.
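
To compare the two views side by side without relying on how a notebook displays a cell, one option is to capture both as variables with the built-in `repr()` and `str()` functions. The names `as_repr` and `as_text` are made up for this sketch.

```python
my_str = "This is a string\nsplit over\nthree lines"

as_repr = repr(my_str)  # what you see when you inspect the object
as_text = str(my_str)   # what print() displays

print(as_repr)  # one line, with quotes and a visible \n escape
print(as_text)  # three lines, no quotes
```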

## A brief note about string encoding

Nowadays, we expect computers to be able to work with ANY character set. Older encodings such as ASCII and the "ANSI" code pages cover only a limited set of characters, so the `UTF-8` encoding was created in 1992 to handle them all.

UTF-8 is the default encoding for Python source code, and it is possible to use other encodings if you encounter data that is encoded differently.

```python
# These strings demonstrate Python's utf-8 encodings

f = "如果您可以阅读此内容，则本课程似乎进展顺利。"
g = "إذا كنت تستطيع قراءة هذا ، يبدو أن هذا الدرس يسير على ما يرام."
h = "यदि आप इसे पढ़ सकते हैं, तो ऐसा लगता है कि यह पाठ ठीक चल रहा है।"
j = "ប្រសិនបើអ្នកអាចអានវាហាក់ដូចជាមេរៀននេះដំណើរការល្អ។"
k = "😀"
```
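
As a small sketch of what "encoding" means in practice, the `str` methods `.encode()` and the `bytes` method `.decode()` convert between text and the raw bytes that get stored in a file. The variable names `encoded` and `decoded` are made up for this example.

```python
k = "😀"

encoded = k.encode("utf-8")        # str -> bytes
decoded = encoded.decode("utf-8")  # bytes -> str

print(len(k))        # 1 (one character)
print(len(encoded))  # 4 (that character takes four bytes in UTF-8)
print(decoded == k)  # True
```

When reading files, `open()` also accepts an `encoding` argument, for example `open(file_name, 'r', encoding='utf-8')`. It can be worth passing explicitly because the default used for files can depend on your operating system.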

**In Python, you can convert _anything_ into a `str` with the `str(...)` function.**
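
A few quick conversions, as a sketch. The last two lines go in the other direction with `float(...)` and `int(...)`, which only work when the text actually looks like a number.

```python
number_as_str = str(34.5)     # "34.5"
int_as_str = str(28)          # "28"
list_as_str = str([1, 2, 3])  # "[1, 2, 3]"
bool_as_str = str(True)       # "True"

# Going the other way only works if the text looks like a number
back_to_float = float("34.5")  # 34.5
back_to_int = int("28")        # 28
```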

# Collections of data: `list`

A `list` is a collection of Python objects that can be anything you like:

```python
my_first_list = ["Apples", "Oranges", "Bananas", "Pears"]
my_second_list = [12, 43.3, 56]
my_third_list = ["Cars", 42, "😀"]
```

A `list` can also contain other lists:

```python
two_lists_within_a_list = ["abc", [12, 634], [89.3, 0.0001, 342]]
a_nested_list = [[["a"]]]
```

You can convert other _iterables_ to a list by using the `list(...)` function.

> Note: Converting a `str` to a `list` will break out each character as a list item.
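
A brief sketch of the difference between converting a `str` with `list(...)` and wrapping it in square brackets. The built-in `range(5)` is used here only as another example of an iterable; it is not covered above.

```python
letters = list("cat")     # ['c', 'a', 't']
numbers = list(range(5))  # [0, 1, 2, 3, 4]
one_item = ["cat"]        # square brackets keep the str whole
```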

# What `str` and `list` have in common

Both `str` and `list` are considered _sequences_. In Python, every sequence is also an _**iterable**_: an object whose members you can step through one at a time.

* A `str` is an iterable of individual characters
* A `list` is an iterable of other objects, which may be `str`, `int`, `float`, `bool`, other `list`s, or whatever.
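
Being iterable means you can step through the members one at a time. The sketch below does this with a `for` loop, a construct not covered in this lesson and used here only as an illustration.

```python
word = "cat"
fruits = ["Apples", 42, True]

# Stepping through a str gives one character at a time
for character in word:
    print(character)

# Stepping through a list gives one object at a time
for item in fruits:
    print(item)

# len() counts the members of either one
print(len(word))    # 3
print(len(fruits))  # 3
```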

Because `str` and `list` are both sequences, they have certain useful traits in common:

### 1. Indexing
### 2. Combinations with `+` and `*`
### 3. Methods

# 1. Indexing

Both characters in a `str` and objects in a `list` can be _indexed_. This means we can access individual _members_ of the `str` or `list` if we know their position within the `str` or `list`.

```python
b = 'This is a string.'
my_first_list = ["Apples", "Oranges", "Bananas", "Pears"]
```

```python
b[0] # This accesses the first character in the str
my_first_list[2] # This accesses the third object in the list
```

However, `int`s and `float`s are not indexable (or "subscriptable"):
```python
m = 234.234009234
n = 8982389482
```

```python
m[1] # This won't work to get the second number
n[5] # This won't work either (to get the sixth number)
```

```python
m = "234.234009234" # Now these are subscriptable
n = "8982389482" # Now these are subscriptable
```
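
Rather than retyping the numbers as strings, one possible approach is to convert them with `str(...)` first. The sketch below also catches the `TypeError` that indexing a `float` raises, using `try`/`except` (not covered in this lesson) so the cell keeps running. The names `m_str` and `n_str` are made up for this example.

```python
m = 234.234009234
n = 8982389482

try:
    m[1]
except TypeError as error:
    print(error)  # the message says a 'float' object is not subscriptable

m_str = str(m)
n_str = str(n)

print(m_str[1])  # 3
print(n_str[5])  # 8
```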

### Note: Python is "zero indexed"
Python is what is called a "zero indexed" language. This means that all numbering starts from `0` and goes up. The item in the first position is indexed with `0`.

### Indexing syntax `[start:stop:step]`

Indexing is not just used to get single members of a sequence. It can be used to get a range of members or some selection of members.

Here are examples:
```python
diary_entry = 'If you want to destroy my sweater/Hold this thread as I walk away'

diary_entry[3:6] # you
diary_entry[15:22] # destroy
```

This example shows the same syntax used to get a range of items from a list.


```python
shopping_list = ["Apples", "Oranges", "Bananas", "Pears", "Mangos", "Mangosteens", "Pandan leaf", "Betel leaf"]

shopping_list[2:6] # ["Bananas", "Pears", "Mangos", "Mangosteens"]
shopping_list[6:] # Read as "start at position six, do not stop": ["Pandan leaf", "Betel leaf"]
```

This is an example of how we can use the `step` parameter to get specific sub-sequences:

```python
numbers_0_thru_20 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
two_times_table = numbers_0_thru_20[0::2] # Read as "start at position zero, do not stop, every 2nd item"
two_times_table # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

### Understanding `[start:stop:step]`

* `[x]` - Get the item at position `x`
* `[x:y]` - Get the items starting at `x` and stop BEFORE `y` (i.e., does not include position `y`)
* `[x:y:z]` - Get the items starting at `x` and stop before `y`, retrieving every `z`th item.

#### Variations
* `[x:]` - Get the items starting at `x` and do not stop
* `[x::]` - Get the items starting at `x` and do not stop, retrieving every item (effectively same as above so you wouldn't write this)
* `[x::z]` - Get the items starting at `x` and do not stop, retrieving every `z`th item
* `[::z]` - Get the items starting at the beginning, do not stop, retrieving every `z`th item
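
The patterns above can be tried on a short string. The variable `letters` is made up for this sketch; each comment shows the result of the expression next to it.

```python
letters = "abcdefghij"

letters[3]      # 'd'
letters[2:8]    # 'cdefgh'
letters[2:8:2]  # 'ceg'
letters[3:]     # 'defghij'
letters[3::2]   # 'dfhj'
letters[::3]    # 'adgj'
```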

#### Going backwards
* `[-1]` - Get the last item
* `[-2]` - Get the second-to-last item
* `[::-1]` - Reverse the list (step through the list in reverse)
* `[-2:-5:-1]` - Start at the second-to-last position, stop before the fifth-to-last position, retrieving every item going backward
* `[-2:-5:-2]` - Start at the second-to-last position, stop before the fifth-to-last position, retrieving every second item going backward
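
The backward patterns can be checked the same way on a short list. The variable `numbers` is made up for this sketch.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

numbers[-1]        # 9
numbers[-2]        # 8
numbers[::-1]      # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
numbers[-2:-5:-1]  # [8, 7, 6]
numbers[-2:-5:-2]  # [8, 6]
```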

# 2. Add items with `+` and `*`

The `+` and `*` operators, in addition to being used with numbers, can be used on `str` and `list`. However, `-` and `/` cannot be.

Explanation by examples:

```python
a = "cat"
a*5 # "catcatcatcatcat"

b = ["list item 1"]
b*5 # ["list item 1", "list item 1", "list item 1", "list item 1", "list item 1"]
```

```python
c = "run"
a + c # "catrun"

d = ["list item 2", "list item 3"]
d + b # ["list item 2", "list item 3", "list item 1"]
```
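
Two limits are worth seeing in action: `+` needs the same type on both sides, and `-` is not defined for `str` or `list` at all. The sketch below uses `try`/`except` (not covered in this lesson) only so that the errors are printed instead of stopping the cell.

```python
a = "cat"
b = ["list item 1"]

# + needs the same type on both sides
try:
    a + b
except TypeError as error:
    print("Cannot add str and list:", error)

# - is not defined for str (or list) at all
try:
    a - "c"
except TypeError as error:
    print("Cannot subtract:", error)

# Mixing a str with a number needs str() first
label = a + str(5)  # "cat5"
```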

# 3. Built-in _methods_

In Lesson 1 and Workbook 1, we were introduced to the idea of _functions_ and how to "call" them. 

> We have some object, called a function, that exists on its own. We provide it some input (parameters) and it gives us an output.

**A _method_ is like a function that is _built-in_ to the data. It's like a built-in _process_ that the data can use to transform itself.**

### Explanation of methods by example:
```python
greeting = "good day, archibald"
greeting.title() # "Good Day, Archibald"

shopping_list = ["cheese", "eggs", "bread"]
shopping_list.append("yoghurt") # shopping_list is now ["cheese", "eggs", "bread", "yoghurt"]
```
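
One detail the two examples hide: a `str` method returns a new string and leaves the original alone, while `.append()` changes the list in place and returns `None`. A minimal sketch, with `titled` and `result` as made-up names:

```python
greeting = "good day, archibald"
titled = greeting.title()

print(titled)    # Good Day, Archibald
print(greeting)  # good day, archibald (the original is unchanged)

shopping_list = ["cheese", "eggs", "bread"]
result = shopping_list.append("yoghurt")

print(result)         # None (append does not return the list)
print(shopping_list)  # ['cheese', 'eggs', 'bread', 'yoghurt']
```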

### An (incomplete, but useful) list of methods for `str`

**Bolded method names are perhaps the most useful to remember.**

| Transformation             | Testing                  | Investigation       | Creating            |
| -------------------------- | ------------------------ | ------------------- | ------------------- |
| **`.replace(old=, new=)`** | `.isalpha()`             | `.find(sub_str=)`   | `.format(var=)`     |
| **`.split(sub_str=)`**     | `.isalnum()`             | `.count(sub_str=)`  |                     |
| `.capitalize()`            | `.isdigit()`             | `.rfind(sub_str=)`  | `.join(iterable=)`  |
| `.title()`                 | `.islower()`             |                     |                     |
| `.strip()`                 | `.isupper()`             |                     |                     |
| `.lstrip()`                | `.startswith(sub_str=)`  |                     |                     |
| `.rstrip()`                | `.endswith(sub_str=)`    |                     |                     |
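
A sketch that chains a few of these methods on a made-up string. The arguments are passed by position here (without the `name=` part), which works for all of these methods.

```python
raw = "  apples, oranges, bananas  "

cleaned = raw.strip()                          # "apples, oranges, bananas"
fruits = cleaned.split(", ")                   # ["apples", "oranges", "bananas"]
swapped = cleaned.replace("oranges", "pears")  # "apples, pears, bananas"
position = cleaned.find("oranges")             # 8
how_many_a = cleaned.count("a")                # 5
joined = " | ".join(fruits)                    # "apples | oranges | bananas"
starts = cleaned.startswith("apples")          # True
only_digits = "2024".isdigit()                 # True
```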


### A (complete) list of methods for `list`

| Transformation   | Investigation     | Creating/Editing           |
| ---------------- | ----------------- | -------------------------- |
| `.reverse()`     | `.count(item=)`   | **`.append(item=)`**       |
| `.sort()`        | `.index(item=)`   | `.extend(iterable=)`       |
|                  |                   | `.insert(index=, item=)`   |
|                  |                   | `.pop(index=)`             |
|                  |                   | `.remove(item=)`           |
|                  |                   | `.clear()`                 |
|                  |                   | `.copy()`                  |
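
A walk-through of these methods on a made-up list, with arguments passed by position. The comments describe the effect of each call; most of these methods change the list in place.

```python
fruits = ["pears", "apples", "mangos"]

fruits.append("bananas")            # add one item to the end
fruits.extend(["kiwis", "apples"])  # add each item of another iterable
fruits.insert(1, "limes")           # put "limes" at position 1

apple_count = fruits.count("apples")     # 2
mango_position = fruits.index("mangos")  # 3

last_item = fruits.pop()  # removes and returns the last item: "apples"
fruits.remove("limes")    # removes the first matching item
fruits.sort()             # alphabetical order, in place
sorted_copy = fruits.copy()  # an independent copy of the sorted list

fruits.reverse()          # reverse the order, in place
reversed_copy = fruits.copy()
fruits.clear()            # the list is now empty: []
```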

# Putting Variables into `list`s and `str`s

### To add your variables to a `list`

**Adding items to a list directly**
```python
item_1 = "my list item"
item_2 = "another list item"
my_items_in_a_list = [item_1, item_2]
```

**Adding items to the end of a list with `.append(item=)`**
```python
item_1 = "my list item"
my_list = [item_1]
item_2 = "another list item"
my_list.append(item_2)
```

**Inserting items at a certain position with `.insert(index=, item=)`**
```python
item_1 = "my list item"
item_2 = "another list item"
my_items_in_a_list = [item_1, item_2]
item_3 = "to go in the middle"
my_items_in_a_list.insert(1, item_3) # Pass the index first, then the item (no keyword names)
```

## To add your variables into a `str`

**Using f-strings**

```python
my_number = 34.2
my_string = f"My number is {my_number}"
```

```python
name = "Connor"
city = "Vancouver"
statement = f"{name} is from {city}"
```
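
The curly braces in an f-string can hold more than a bare variable name. The sketch below shows a format specifier (`:.2f` for two decimal places), an expression, and the `.format()` method from the table of `str` methods as an alternative. The names `rounded`, `counted`, and `with_format` are made up for this example.

```python
my_number = 34.2
name = "Connor"
city = "Vancouver"

rounded = f"My number is {my_number:.2f}"         # "My number is 34.20"
counted = f"{name} has {len(name)} letters"       # "Connor has 6 letters"
with_format = "{} is from {}".format(name, city)  # "Connor is from Vancouver"
```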

## Errors you may run into in this week's Workbook

`IndexError`: When you are trying to access an index outside of the range of the collection, you will get an `IndexError`.

`NameError`: When you try to reference a variable name that you have not defined yet, you will get a `NameError`. You may have forgotten to run a cell above where you defined your variable.




```python
# My first IndexError

my_first_list = ["Apples", "Oranges", "Bananas", "Pears"]

# my_first_list[5] # There is nothing in the '5' position in this list, it's not that long
# my_first_list[2] # <- This one will work
```
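
If you would like to see both errors without stopping your notebook, one option is to catch them with `try`/`except` (not covered in this lesson). This is only a sketch; `undefined_variable_for_demo` is a hypothetical name that is deliberately never defined.

```python
my_first_list = ["Apples", "Oranges", "Bananas", "Pears"]

try:
    my_first_list[5]
except IndexError as error:
    index_message = str(error)
    print("IndexError:", index_message)  # list index out of range

try:
    undefined_variable_for_demo  # hypothetical name that was never defined
except NameError as error:
    name_message = str(error)
    print("NameError:", name_message)

# The last valid index is always one less than the length
last_valid_index = len(my_first_list) - 1  # 3
```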