<font size="+3"><strong>Python: Advanced</strong></font>

# Strings

## What's a string? <a id='whats-a-string'></a>

Recall that a `string` is a sequence of characters: letters, digits, punctuation, and spaces. Any information that can be written out as text can be stored as a string.
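As a quick refresher, here is a minimal sketch of a few basic string operations. It uses one of the file names from the next section as sample text; the variable names are only illustrative.

```python
file_name = "mexico-city-real-estate-1.csv"

# A string has a length and can be indexed and sliced like a list.
length = len(file_name)
first_character = file_name[0]
extension = file_name[-4:]

# String methods answer questions about the text or split it into pieces.
is_mexico_file = file_name.startswith("mexico-city")
parts = file_name[:-4].split("-")

print(length, first_character, extension, is_mexico_file, parts)
```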

## Working with strings <a id='working-with-strings'></a>

When working with data, often files and directories have names that fit a pattern. For example, data on property prices in Colombia and Mexico might be stored in files named:

1. `colombia-real-estate-1.csv`
2. `colombia-real-estate-2.csv`
3. `colombia-real-estate-3.csv`
4. `mexico-city-real-estate-1.csv`
5. `mexico-city-real-estate-2.csv`
6. `mexico-city-real-estate-3.csv`
7. `mexico-city-real-estate-4.csv`
8. `mexico-city-real-estate-5.csv`
9. `mexico-city-test-features.csv`
10. `mexico-city-test-labels.csv`

When the list of files is short like this one, it's not difficult to find the ones we want, but if the list were longer, we might need some help. If we're only interested in the Mexico City real estate files, we could search for file names beginning with `mexico-city-real-estate-`. To do this, we'll use the `glob` function from Python's `glob` module. The code looks like this:

In [None]:
import glob

glob.glob("./data/mexico-city-real-estate-[0-9].csv")

The `glob` function allows for pattern matching. In this example `[0-9]` matches any single digit from 0 to 9, but there are lots of other patterns that `glob` can find. Here are a few of the more common ones:
- `*` Match any number of characters, including none
- `?` Match a single character of any kind
- `[a-z]` Match a single lower case letter from a to z
- `[A-Z]` Match a single upper case letter from A to Z
- `[!a-z]` Match a single character that is not a lower case letter from a to z
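
The patterns above can be tried without touching the file system. The sketch below applies them to the file names listed earlier using `fnmatch.filter`, the standard-library helper that follows the same wildcard rules `glob` uses for file names. The variable names are only illustrative.

```python
import fnmatch

# File names copied from the list earlier in this section.
file_names = [
    "colombia-real-estate-1.csv",
    "colombia-real-estate-2.csv",
    "colombia-real-estate-3.csv",
    "mexico-city-real-estate-1.csv",
    "mexico-city-real-estate-2.csv",
    "mexico-city-real-estate-3.csv",
    "mexico-city-real-estate-4.csv",
    "mexico-city-real-estate-5.csv",
    "mexico-city-test-features.csv",
    "mexico-city-test-labels.csv",
]

# `*` matches any run of characters, including none.
mexico_files = fnmatch.filter(file_names, "mexico-city*")

# `?` matches exactly one character of any kind.
colombia_files = fnmatch.filter(file_names, "colombia-real-estate-?.csv")

# `[0-9]` matches exactly one digit.
numbered_files = fnmatch.filter(file_names, "mexico-city-real-estate-[0-9].csv")

# `[!m]` matches exactly one character that is not the letter m.
not_starting_with_m = fnmatch.filter(file_names, "[!m]*")

print(len(mexico_files), len(colombia_files), len(numbered_files))
print(not_starting_with_m)
```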

So, if we wanted to find all the files from Mexico City, we would use code like this:

In [None]:
glob.glob("./data/mexico-city*")

<font size="+1">Practice</font> 

Try it yourself! Find only the data files containing the word `test`.

In [None]:
import glob

glob.glob("./data/mexico-city-test*.csv")

So far, you have only searched for files in one specific directory. It's also possible to search subdirectories. To list all notebook files in the directory above this one and in every directory below it, you can use the `**` pattern together with `recursive=True`:

In [None]:
glob.glob("../**/*.ipynb", recursive=True)
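
To see what `recursive=True` changes, here is a minimal sketch that builds a throwaway directory tree and runs the same pattern with and without the flag. The directory and file names are made up for illustration and are not part of the course data.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small directory tree with hypothetical file names.
    os.makedirs(os.path.join(root, "lessons", "week-1"))
    for relative_path in [
        "intro.ipynb",
        os.path.join("lessons", "strings.ipynb"),
        os.path.join("lessons", "notes.txt"),
        os.path.join("lessons", "week-1", "functions.ipynb"),
    ]:
        open(os.path.join(root, relative_path), "w").close()

    pattern = os.path.join(root, "**", "*.ipynb")

    # Without recursive=True, `**` behaves like `*` and matches one level only.
    one_level = glob.glob(pattern)

    # With recursive=True, `**` matches zero or more directory levels.
    all_levels = glob.glob(pattern, recursive=True)

    one_level_names = sorted(os.path.basename(path) for path in one_level)
    all_level_names = sorted(os.path.basename(path) for path in all_levels)

print(one_level_names)
print(all_level_names)
```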

## Working with f-strings <a id='working-with-f-strings'></a>

We usually use `print` to examine output in Python, and most of the strings we've printed so far have been short. When we need to assemble a longer string from several values, joining the pieces by hand is tedious and error-prone. Formatted string literals, or f-strings, let us place variables and expressions directly inside a string: prefix the string with `f` and wrap each expression in curly braces. The code looks like this:

In [None]:
home = "Mexico City"
f"My home is {home}"

In [None]:
import datetime

python_birthday = datetime.datetime(year=1991, month=2, day=20)
print(
    f"Python first appeared on {python_birthday:%B %d} in the year {python_birthday:%Y}."
)

now = datetime.datetime.now()
print(f"Python is {now.year - python_birthday.year} years old.")
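
The braces in an f-string can hold more than a variable name. Here is a minimal sketch of expressions, format specifications, and multi-line assembly; the sample values are taken from the first property record used later in this lesson, and the variable names are only illustrative.

```python
price_mxn = 1860000.0
area_m2 = 70.0
neighborhood = "Benito Juárez"

# Any expression can go inside the braces, not only a variable name.
price_per_m2 = f"Price per square meter: {price_mxn / area_m2}"

# A format spec after the colon controls presentation:
# `,` adds thousands separators and `.2f` keeps two decimal places.
formatted_price = f"Price: {price_mxn:,.2f} MXN"

# `!r` inserts the repr() of the value, which shows the quotes around a string.
debug_text = f"neighborhood={neighborhood!r}"

# Adjacent f-strings inside parentheses are joined into one long string.
summary = (
    f"The apartment in {neighborhood} covers {area_m2:.0f} square meters "
    f"and costs {price_mxn:,.0f} MXN."
)

print(price_per_m2)
print(formatted_price)
print(debug_text)
print(summary)
```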

<font size="+1">Practice</font>  

Mexico-Tenochtitlan was established on 13 March 1325; use f-strings to indicate how long ago that was.

In [None]:
import datetime

mexico_founding = datetime.datetime(year=1325, month=3, day=13)
now = datetime.datetime.now()

f"Mexico-Tenochtitlan was established {now.year - mexico_founding.year} years ago."

*Sources and further reading* 
- [Online tutorial on finding list lengths in Python](https://www.w3schools.com/python/gloss_python_list_length.asp)
- [Official python documentation on the `len` function](https://docs.python.org/3/library/functions.html?#len)

# Iterators and Iterables 

A list is a container with a countable number of values. That makes a list an **iterable**, meaning that we can **iterate** through it one item at a time. An **iterator** is the object that does the stepping: it hands back the next value only when we ask for it. If we try to bring in a large database (over a million values, for example), loading every value at once will take up a huge amount of memory. Iterators are helpful because they let us work through the values one at a time, leaving memory free for other tasks. We'll spend more time working with databases later on, but for now, let's take a look at some code:

In [None]:
from pymongo import MongoClient

client = MongoClient(host="localhost", port=27017)

list(client.list_databases())

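Setting aside the first two lines of code, which connect to the database server, `client.list_databases()` gives us the databases on that server, and wrapping the call in `list()` turns them into a list of four databases. If we want to examine each database by itself, we can store the result of the call in a variable called `results`, and then try to print it.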

In [None]:
results = client.list_databases()
print(results)

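That doesn't seem like much of anything, because printing an **iterator** shows a description of the object rather than its contents. If we pass `results` to the built-in function `next()`, we'll get back something more useful.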

In [None]:
print(next(results))

That makes much more sense! As you can see, this returns the first row. If we do it again, we'll get the second row:

In [None]:
print(next(results))

We can keep doing this until we get to the end of the results, at which point Python raises a `StopIteration` exception telling us that there's nothing left to return. Here `results` is the iterator, and every call to `next()` asks it for the following item.
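
The same behavior can be reproduced without a database connection. The sketch below uses a plain list of made-up database names as the iterable, shows what happens when the iterator runs out, and compares the memory footprint of a full list with a lazy generator. All names and values are hypothetical.

```python
import sys

# Hypothetical database names standing in for the MongoDB results.
database_names = ["admin", "config", "local"]

# iter() turns an iterable into an iterator; next() asks it for one item.
names_iterator = iter(database_names)
first = next(names_iterator)
second = next(names_iterator)
third = next(names_iterator)

# Once the iterator is exhausted, next() raises StopIteration.
try:
    next(names_iterator)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default value to next() avoids the exception.
fallback = next(names_iterator, None)

# A list stores every value at once; a generator produces them on demand.
squares_list = [n * n for n in range(100_000)]
squares_lazy = (n * n for n in range(100_000))
list_size = sys.getsizeof(squares_list)
lazy_size = sys.getsizeof(squares_lazy)

print(first, second, third, exhausted, fallback)
print(list_size > lazy_size)
```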

## List Comprehension <a id='list-comprehension'></a>

A list comprehension builds a new list by iterating through an existing one without an explicit `for` loop, which is especially useful for transforming data or filtering it according to a specific condition.

Let's take a look at a list that shows property prices in Mexican pesos.

In [None]:
price_mexican_pesos = [
    35000000.0,
    2000000.0,
    2700000.0,
    6347000.0,
    6994543.16,
    6617835.61,
    670000.0,
]

But maybe we're interested in comparing these prices to property values in Colombia. To do that, we'll need to express the prices in our list in Colombian pesos. We can use a `for` loop to make the conversion based on an exchange rate of 1 Mexican peso to 190 Colombian pesos. The code looks like this:

In [None]:
price_colombian_pesos = []
for price in price_mexican_pesos:
    price_colombian_pesos.append(price * 190)

print(price_colombian_pesos)

But what if we could do the same thing, but using fewer lines? That's what `list comprehension` is for. The code looks like this:

In [None]:
price_colombian_pesos = [price * 190 for price in price_mexican_pesos]

print(price_colombian_pesos)

We can use list comprehension to find all the `house` entries in this list of properties, like this:

In [None]:
records = [
    'sell,apartment,|México|Distrito Federal|Benito Juárez|,"19.384467,-99.135872",1860000.0,MXN,1843173.75,97996.85,,70.0,,26571.42857142857',
    'sell,apartment,|México|Distrito Federal|Iztapalapa|Cerro de La Estrella|,"19.324123,-99.074132",700000.0,MXN,693667.44,36880.53,,50.0,,14000.0',
    'sell,house,|México|Distrito Federal|La Magdalena Contreras|San Jerónimo Lídice|,"19.317653,-99.236291",3350000.0,MXN,3319694.98,176499.72,,350.0,,9571.42857142857',
    'sell,apartment,|México|Distrito Federal|Cuauhtémoc|,"19.446313,-99.14006",405108.0,MXN,401443.16,21343.71,,50.0,,8102.16',
    'sell,house,|México|Distrito Federal|Coyoacán|,"19.303906,-99.107812",7200000.0,MXN,7134866.79,379342.68,,250.0,,28800.0',
    'sell,apartment,|México|Distrito Federal|Benito Juárez|,"19.374171,-99.181264",2425000.0,MXN,2403062.73,127764.72,,96.0,,25260.416666666668',
    'sell,apartment,|México|Distrito Federal|Tlalpan|,"19.287428,-99.122283",1250000.0,MXN,1238692.07,65858.1,,65.0,,19230.76923076923',
    'sell,house,|México|Distrito Federal|Venustiano Carranza|,"19.436436,-99.117256",1362000.0,MXN,1349678.96,71758.99,,98.0,,13897.959183673467',
    'sell,apartment,|México|Distrito Federal|Benito Juárez|,"19.382429,-99.160199",2250000.0,MXN,2229645.73,118544.58,,90.0,,25000.0',
    'sell,house,|México|Distrito Federal|Tlalpan|Granjas Coapa|,"19.300456,-99.115741",3900000.0,MXN,3864719.42,205477.28,,153.0,,25490.19607843137',
    'sell,apartment,|México|Distrito Federal|Álvaro Obregón|,"19.363167,-99.276028",9000000.0,MXN,8918583.49,474178.35,,188.0,,47872.34042553192',
    'sell,house,|México|Distrito Federal|Coyoacán|Villa Coyoacán|,"19.348694,-99.16291",1150000.0,USD,21629775.0,1150000.0,,555.0,,2072.072072072072',
    'sell,house,|México|Distrito Federal|Tlalpan|,"19.300963,-99.144237",7500000.0,MXN,7432152.81,395148.62,,385.0,,19480.51948051948',
    'sell,house,|México|Distrito Federal|Coyoacán|Paseos de Taxqueña|,"19.343979,-99.124863",6310000.0,MXN,6252917.98,332451.71,,183.0,,34480.87431693989',
    'sell,apartment,|México|Distrito Federal|Coyoacán|San Diego Churubusco|,"19.354509,-99.149765",10000000.0,MXN,9909537.15,526864.83,,293.0,,34129.69283276451',
]

In [None]:
[row for row in records if "house" in row]
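
Checking `"house" in row` searches the whole line of text, so it would also match a record whose place name happened to contain the word. A stricter variant, sketched below, parses each record with the `csv` module and compares a single field. It uses four rows copied from `records`; the column positions are an assumption read off the sample rows, since the data has no header.

```python
import csv

# Four rows copied from the `records` list above.
records = [
    'sell,apartment,|México|Distrito Federal|Benito Juárez|,"19.384467,-99.135872",1860000.0,MXN,1843173.75,97996.85,,70.0,,26571.42857142857',
    'sell,house,|México|Distrito Federal|La Magdalena Contreras|San Jerónimo Lídice|,"19.317653,-99.236291",3350000.0,MXN,3319694.98,176499.72,,350.0,,9571.42857142857',
    'sell,apartment,|México|Distrito Federal|Tlalpan|,"19.287428,-99.122283",1250000.0,MXN,1238692.07,65858.1,,65.0,,19230.76923076923',
    'sell,house,|México|Distrito Federal|Tlalpan|,"19.300963,-99.144237",7500000.0,MXN,7432152.81,395148.62,,385.0,,19480.51948051948',
]

# csv.reader handles the quoted coordinates, which contain a comma.
parsed = list(csv.reader(records))

# ASSUMPTION: field 1 is the property type, field 2 the place, field 4 the price.
house_prices = [float(fields[4]) for fields in parsed if fields[1] == "house"]

# Conditions can be combined with `and` inside the comprehension.
tlalpan_houses = [
    fields
    for fields in parsed
    if fields[1] == "house" and "|Tlalpan|" in fields[2]
]

print(house_prices)
print(len(tlalpan_houses))
```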

<font size="+1">Practice</font> 

Explore the `records` list, and find all entries located in `Tlalpan`.

In [None]:
[row for row in records if "Tlalpan" in row]

# Functions

When we code in Python, we want to create **readable** programs. One of the easiest ways to make a program readable is to avoid repeating sections of code that do the same thing. We do that by using `functions`. For example, you might have the surface area of a property in square meters, but want to see it in square feet. Keeping in mind that one square meter = 10.76391 square feet, you can write a function that takes the area in square meters as input and returns the area in square feet as output. The code looks like this:

In [None]:
def m2toft2(area_meter2):
    area_feet2 = 10.76391 * area_meter2
    return area_feet2
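
Defining a function does not run it; the body only executes when the function is called. Here is a minimal sketch of calling `m2toft2`, repeated here so the example runs on its own. The docstring and the sample areas (taken from the property records earlier in the lesson) are additions for illustration.

```python
def m2toft2(area_meter2):
    """Convert an area from square meters to square feet."""
    area_feet2 = 10.76391 * area_meter2
    return area_feet2


# Call the function once and keep the returned value.
apartment_area_ft2 = m2toft2(70.0)

# Reuse the same function for several areas instead of repeating the formula.
areas_m2 = [70.0, 50.0, 350.0]
areas_ft2 = [m2toft2(area) for area in areas_m2]

print(f"{apartment_area_ft2:,.1f} square feet")
```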

The code above defines a function called `m2toft2` that takes in a single input, called `area_meter2`, and returns a single output, called `area_feet2`. Let's try another one:

# References and Further Reading

- [Context Manager](https://book.pythontips.com/en/latest/context_managers.html)